Abstract
Background: Machine learning has become an essential tool in drug research for generating pertinent structural information to design drugs with higher biological activity. Quantitative structure-activity relationship (QSAR) modeling is one such technique. A QSAR study involves two main steps: the first is the generation of descriptors, and the second is building and validating the models.
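The two steps can be illustrated with a deliberately small sketch in plain Python. It assumes step 1 has already produced one numeric descriptor per compound (the values below are invented, not CDK output), fits a straight line by ordinary least squares on a training set, and then checks R² on compounds held out from fitting. The function names are hypothetical.

```python
from statistics import mean


def fit_simple_qsar(descriptor, activity):
    """Step 2a (build): least-squares fit of activity = intercept + slope * descriptor."""
    x_bar, y_bar = mean(descriptor), mean(activity)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r_squared(y_true, y_pred):
    """Step 2b (validate): coefficient of determination, 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Step 1 output (hypothetical values): (descriptor, measured activity) per compound.
train = [(1.0, 4.5), (2.0, 5.0), (3.0, 5.5), (4.0, 6.0)]
test = [(5.0, 6.5), (6.0, 7.0)]  # held out from fitting

slope, intercept = fit_simple_qsar([d for d, _ in train], [a for _, a in train])
predicted = [intercept + slope * d for d, _ in test]
test_r2 = r_squared([a for _, a in test], predicted)
print(slope, intercept, test_r2)
```

Real QSAR data are never this clean; the point is only the order of operations: descriptors first, then fitting on one set of compounds and scoring on another.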
Aim: To build a QSAR model, using the Python programming language, for pyrazoline derivatives that inhibit Mycobacterium tuberculosis, based on activity data collected from diverse literature sources. Pyrazoline, a small-molecule scaffold, could block the biosynthesis of mycolic acids, resulting in mycobacterial death, which makes it a candidate scaffold for anti-tubercular drug discovery.
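Activity values gathered from different publications usually need converting to a common scale before modeling. The abstract does not say how this was done; a common convention, assumed here, is that MIC is reported in µg/mL and converted to pMIC = -log10(MIC in mol/L) using each compound's molecular weight. `mic_to_pmic` is a hypothetical helper name.

```python
import math


def mic_to_pmic(mic_ug_per_ml, molecular_weight):
    """Convert a MIC in µg/mL to pMIC = -log10(MIC in mol/L).

    molecular_weight is in g/mol. Both arguments must be positive.
    """
    if mic_ug_per_ml <= 0 or molecular_weight <= 0:
        raise ValueError("MIC and molecular weight must be positive")
    # 1 µg/mL = 1 mg/L = 1e-3 g/L; dividing g/L by g/mol gives mol/L.
    mic_molar = mic_ug_per_ml * 1e-3 / molecular_weight
    return -math.log10(mic_molar)


if __name__ == "__main__":
    # Invented example: MIC of 6.25 µg/mL for a compound of MW 350 g/mol.
    print(round(mic_to_pmic(6.25, 350.0), 2))
```

For instance, a 1 µg/mL MIC for a compound of 1000 g/mol corresponds to 1 µM and therefore a pMIC of 6. A higher pMIC means a lower inhibitory concentration, i.e. a more potent compound, which is why the log-transformed value is a convenient regression target.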
Methods: We developed a new Python script that uses CDK descriptors as the independent variables and anti-tubercular bioactivity as the dependent variable to build and validate the best QSAR model. The resulting QSAR model was then externally validated using the test set compounds. Finally, three algorithms, viz. multiple linear regression, support vector machine, and partial least squares, were applied and their R² values compared.
Results: The QSAR model generated with an open-source Python program predicted the external test set compounds well. Ordinary least squares (OLS) regression gave an R² of 0.514, an F value of 5.083, an adjusted R² of 0.413, and a standard error of 0.092. Multiple linear regression gave an R² of 0.5143, regression coefficients (reg.coef_) of -0.07795 (PC1), 0.01619 (PC2), 0.03763 (PC3), 0.07849 (PC4), and -0.09726 (PC5), and an intercept (reg.intercept_) of 4.8324. Model performance was also assessed with the support vector machine module of sklearn, which gave a model score of 0.5901, and was further supported by partial least squares regression, which gave an R² of 0.5901. On validation, the model's predictions for the test set were similar to those for the training set, and in the plot of predicted versus actual pMIC50 values, all data points fell along the central line.
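The reported MLR coefficients define a linear equation, pMIC50 = 4.8324 - 0.07795 PC1 + 0.01619 PC2 + 0.03763 PC3 + 0.07849 PC4 - 0.09726 PC5, which the sketch below evaluates, alongside the standard formulas linking R² to the adjusted R² and the overall F value. Two caveats: the PC scores depend on PCA loadings that the abstract does not give, so the inputs are placeholders, and the training-set size n is not reported, so `N_TRAIN = 30` is a hypothetical value. With n = 30 and p = 5 predictors, the formulas give an adjusted R² of about 0.413 and an F of about 5.08, close to the reported figures.

```python
# MLR coefficients (PC1..PC5) and intercept as reported in the Results.
COEFFICIENTS = (-0.07795, 0.01619, 0.03763, 0.07849, -0.09726)
INTERCEPT = 4.8324


def predict_pmic50(pc_scores):
    """Evaluate pMIC50 = intercept + sum(coef_i * PC_i) for five PC scores."""
    if len(pc_scores) != len(COEFFICIENTS):
        raise ValueError("expected exactly five principal-component scores")
    return INTERCEPT + sum(c * s for c, s in zip(COEFFICIENTS, pc_scores))


def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def f_statistic(r2, n, p):
    """Overall regression F = (R²/p) / ((1 - R²)/(n - p - 1))."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


N_TRAIN = 30  # hypothetical: the abstract does not report the training-set size
N_PREDICTORS = 5  # PC1..PC5

if __name__ == "__main__":
    # Placeholder PC scores; real ones come from the paper's PCA loadings.
    print(predict_pmic50([0.0, 0.0, 0.0, 0.0, 0.0]))  # equals the intercept, 4.8324
    print(round(adjusted_r2(0.5143, N_TRAIN, N_PREDICTORS), 3))
    print(round(f_statistic(0.5143, N_TRAIN, N_PREDICTORS), 3))
```

The adjusted R² penalizes the five predictors relative to the number of compounds, which is why it sits well below the raw R² when the training set is small.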
Conclusion: The model scores obtained with this script using three different algorithms agreed well, with little difference between them. The model may therefore be useful in designing related pyrazoline analogs as anti-tubercular agents.
Keywords: Machine learning, QSAR, Python, H37Rv strain, Mycobacterium tuberculosis, Pyrazoline derivatives.