Abstract
Background: Enhancing a compound’s biological activity is the central task of lead optimization in small-molecule drug discovery. However, performing many iterative rounds of compound synthesis and bioactivity testing is laborious. To address this issue, high-quality in silico bioactivity prediction approaches are in high demand, both to prioritize the more active compound derivatives and to reduce trial and error.
Methods: Two kinds of bioactivity prediction models were constructed from a large-scale structure-activity relationship (SAR) database. The first kind is based on the similarity of substituents and implemented by matched molecular pair analysis; it comprises the SA, SA_BR, SR, and SR_BR models. The second kind is based on SAR transferability and implemented by matched molecular series (MMS) analysis; it comprises the Single MMS pair, Full MMS series, and Multi single MMS pairs models. We also defined the application domain of the models using a distance-based threshold.
Results: Among the seven individual models, the Multi single MMS pairs model performed best (R² = 0.828, MAE = 0.406, RMSE = 0.591), and the baseline model (SA) had the lowest prediction accuracy (R² = 0.798, MAE = 0.446, RMSE = 0.637). Consensus modeling further improved predictive accuracy (R² = 0.842, MAE = 0.397, RMSE = 0.563).
Conclusion: A consensus method yielded an accurate bioactivity prediction model that outperformed every individual model. The model should be a valuable tool for lead optimization.
Keywords: Matched molecular pair, matched molecular series, bioactivity prediction, SAR transfer, application domain, lead optimization.
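As an illustration of the reported metrics, the following minimal sketch computes R², MAE, and RMSE and compares two models with their consensus. It rests on assumptions not stated in the abstract: R² is taken as the coefficient of determination, the consensus is taken as an unweighted mean of the individual predictions, and the function names and activity values are hypothetical toy data, not results from this study.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R²)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def consensus(predictions):
    """Unweighted mean across models for each compound (assumed scheme)."""
    n_models = len(predictions)
    return [sum(values) / n_models for values in zip(*predictions)]

# Hypothetical activity values (e.g. pIC50) for four compounds.
y_true = [6.0, 7.0, 8.0, 5.0]
model_a = [6.5, 7.5, 7.5, 5.5]   # toy predictions from one model
model_b = [5.7, 6.9, 8.3, 4.9]   # toy predictions from another model
y_cons = consensus([model_a, model_b])

for name, pred in [("A", model_a), ("B", model_b), ("consensus", y_cons)]:
    print(name, round(r2(y_true, pred), 3), round(mae(y_true, pred), 3),
          round(rmse(y_true, pred), 3))
```

In this toy case the errors of the two models partly cancel, so the averaged prediction has a lower MAE than either model; whether that holds in practice depends on how correlated the individual models' errors are.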