Drug-Protein Interactions Prediction Models Using Feature Selection and Classification Techniques

T. Idhaya; A. Suruliandi; S. P. Raja

doi:10.2174/0113892002268739231211063718

Abstract

Background: Drug-Protein Interaction (DPI) identification is crucial in drug discovery. The high dimensionality of drug and protein features poses challenges for accurate interaction prediction, necessitating the use of computational techniques. Docking-based methods rely on 3D structures, while ligand-based methods have limitations such as reliance on known ligands and neglecting protein structure. Therefore, the preferred approach is the chemogenomics-based approach using machine learning, which considers both drug and protein characteristics for DPI prediction.

Methods: In machine learning, feature selection improves model performance, reduces overfitting, enhances interpretability, and makes the learning process more efficient. It helps extract meaningful patterns from drug and protein data while eliminating irrelevant or redundant information, resulting in more effective machine-learning models. Classification, in turn, enables pattern recognition, decision-making, predictive modeling, anomaly detection, data exploration, and automation, and it is the step that produces the DPI predictions. For this research work, protein data were sourced from the KEGG database, while drug data were obtained from the DrugBank database.
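As an illustration of how a filter-style feature selection method can rank features, the following minimal sketch scores discrete features by information gain and keeps the top k. It is not the authors' implementation: the function names (`entropy`, `information_gain`, `select_top_k`) and the toy binary feature matrix are hypothetical, and real drug and protein descriptors would typically need discretization first.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature_values, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def select_top_k(rows, labels, k):
    """Return the indices of the k highest-scoring features and all scores."""
    n_features = len(rows[0])
    scores = [information_gain([r[j] for r in rows], labels)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

# Hypothetical toy data: each row is a binary feature vector for one
# drug-protein pair; label 1 = interacting, 0 = non-interacting.
rows = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
labels = [1, 1, 0, 0]

top, scores = select_top_k(rows, labels, k=1)
print(top, scores)
```

In this toy example only the first feature separates the two classes, so it receives the highest score; the other two features carry no information about the label.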

Results: To address the imbalance among Drug-Protein Pairs (DPP), three balancing techniques were employed: Random Over Sampling (ROS), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive SMOTE. Given the large number of features associated with drugs and proteins, feature selection is necessary. Four feature selection methods were evaluated: Correlation, Information Gain (IG), Chi-Square (CS), and Relief. Multiple classification methods, including Support Vector Machines (SVM), Random Forest (RF), AdaBoost, and Logistic Regression (LR), were used to predict DPI. Finally, this research identifies the best balancing, feature selection, and classification methods for accurate DPI prediction.
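To make the difference between two of the balancing techniques concrete, the sketch below contrasts random over-sampling, which duplicates existing minority rows, with a simplified SMOTE-style step, which interpolates between a minority row and one of its nearest minority neighbours. This is an assumption-laden illustration for a binary problem, not the authors' code: the helper names (`random_over_sample`, `smote_like`), the choice of `k`, and the toy data are all hypothetical.

```python
import random

def random_over_sample(X, y, minority_label, rng):
    """Duplicate randomly chosen minority rows until the classes are equal in size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    X_out, y_out = list(X), list(y)
    for _ in range(majority_count - len(minority)):
        X_out.append(list(rng.choice(minority)))
        y_out.append(minority_label)
    return X_out, y_out

def smote_like(X, y, minority_label, rng, k=2):
    """Add synthetic minority rows by interpolating toward a near minority neighbour.

    Assumes at least two minority rows and numeric features.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    X_out, y_out = list(X), list(y)
    for _ in range(majority_count - len(minority)):
        base = rng.choice(minority)
        # Sort the other minority rows by squared Euclidean distance to `base`.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        X_out.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
        y_out.append(minority_label)
    return X_out, y_out

# Hypothetical toy data: six majority rows (label 0), three minority rows (label 1).
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2], [0.1, 0.1],
     [0.9, 0.8], [0.8, 0.9], [1.0, 1.0]]
y = [0] * 6 + [1] * 3

rng = random.Random(0)  # fixed seed so the sketch is repeatable
X_ros, y_ros = random_over_sample(X, y, 1, rng)
X_sm, y_sm = smote_like(X, y, 1, rng)
print(y_ros.count(0), y_ros.count(1))
print(y_sm.count(0), y_sm.count(1))
```

Both helpers bring the minority class up to the size of the majority class. The over-sampled rows are exact copies, whereas the SMOTE-style rows are new points lying between existing minority rows.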

Conclusion: This comprehensive approach aims to overcome the limitations of existing methods and provide more reliable and efficient predictions in drug-protein interaction studies.




Journal: Current Drug Metabolism
Publisher: Bentham Science Publisher
Print ISSN: 1389-2002
Online ISSN: 1875-5453
