Abstract
Background: One of the main challenges in the early stages of drug discovery is the computational assessment of protein-ligand binding affinity. Machine learning techniques can contribute to predicting this type of interaction. These techniques can be applied following two approaches: first, using experimental structures for which affinity data are available; second, using protein-ligand docking simulations.
Objective: In this review, we describe recently published machine learning models based on crystal structures for which binding affinity and thermodynamic data are available.
Method: We used experimental structures available at the Protein Data Bank and accessed binding affinity and thermodynamic data through the BindingDB, Binding MOAD, and PDBbind databases. We reviewed machine learning models to predict binding affinity that were created using open-source programs such as SAnDReS and Taba.
Results: Analysis of machine learning models trained on datasets composed of crystal structures of protein-ligand complexes indicated that these models show high predictive performance when compared with classical scoring functions.
Conclusion: The rapid increase in the number of crystal structures of protein-ligand complexes has created a favorable scenario for developing machine learning models to predict binding affinity. These models rely on experimental data from two sources: structural data and affinity data. Combining these two types of experimental data yields computational models that outperform classical scoring functions.
Keywords: Crystal structures, machine learning, scoring function space, binding affinity, SAnDReS, Taba.