Survey of Machine Learning Techniques in Drug Discovery

Natalie Stephenson; Emily Shane; Jessica Chase; Jason Rowland; David Ries; Nicola Justice; Jie Zhang; Leong Chan; Renzhi Cao

doi:10.2174/1389200219666180820112457

Abstract

Background: Drug discovery, the process of identifying new candidate medications, is central to the pharmaceutical industry. At present, discovering new drugs is still expensive and time-consuming, requiring Phase I, II and III clinical trials. Recently, machine learning techniques from Artificial Intelligence (AI), especially deep learning techniques, which use computational models composed of multiple processing layers, have been widely applied and have achieved state-of-the-art performance in fields such as speech recognition, image classification and bioinformatics. One important application of these AI techniques is drug discovery.

Methods: We conducted a large-scale literature search of scientific databases (e.g., ScienceDirect, arXiv) and reviewed startup companies to understand the current status of machine learning techniques in drug discovery.

Results: Our experiments showed that the machine learning and drug discovery fields exhibit different patterns. For example, keywords such as prediction, brain, discovery, and treatment commonly appear in drug discovery publications. In addition, the total number of drug discovery papers that use machine learning techniques has increased every year.
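The two kinds of comparison described in the Results (keyword frequencies per field and paper counts per year) can be illustrated with a minimal sketch. It assumes a hypothetical toy corpus of (field, year, abstract) records, a tiny hand-picked stop-word list and simple lower-case alphabetic tokenization. The records, the stop words and the function names `keyword_counts`, `papers_per_year` and `distinctive_keywords` are invented for illustration; this is not the authors' actual pipeline or data.

```python
import re
from collections import Counter

# Hypothetical toy corpus of (field, year, abstract text) records.
# Invented for illustration only; NOT data from the survey.
RECORDS = [
    ("drug_discovery", 2016, "Prediction of drug treatment response in brain tumors"),
    ("drug_discovery", 2017, "Deep learning for drug discovery and treatment prediction"),
    ("drug_discovery", 2017, "Machine learning prediction of brain penetration for discovery"),
    ("machine_learning", 2016, "Convolutional networks for image classification"),
    ("machine_learning", 2017, "Recurrent networks for speech recognition"),
]

# Assumed minimal stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"of", "for", "and", "in", "the"}


def tokenize(text):
    """Lower-case the text, split it into alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def keyword_counts(records, field):
    """Count how often each token appears in the abstracts of one field."""
    counts = Counter()
    for rec_field, _year, text in records:
        if rec_field == field:
            counts.update(tokenize(text))
    return counts


def papers_per_year(records, field):
    """Return {year: number of records} for one field, ordered by year."""
    per_year = Counter(year for rec_field, year, _text in records if rec_field == field)
    return dict(sorted(per_year.items()))


def distinctive_keywords(records, field, other, top_n=5):
    """Tokens seen in `field` but never in `other`, ranked by descending count."""
    mine = keyword_counts(records, field)
    theirs = keyword_counts(records, other)
    ranked = [(tok, n) for tok, n in mine.most_common() if tok not in theirs]
    return ranked[:top_n]


if __name__ == "__main__":
    print(distinctive_keywords(RECORDS, "drug_discovery", "machine_learning"))
    print(papers_per_year(RECORDS, "drug_discovery"))
```

On this toy corpus, `papers_per_year(RECORDS, "drug_discovery")` returns `{2016: 1, 2017: 2}`, and "prediction" ranks first among the drug-discovery keywords with three occurrences. A real study would replace the toy records with search results from the databases listed in the Methods and would likely normalize counts by the number of papers per field.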

Conclusion: The main focus of this survey is to understand the current status of machine learning techniques in drug discovery in both academic and industrial settings, and to discuss their potential future applications. Several notable patterns in the use of machine learning techniques for drug discovery are also discussed.

Keywords: Drug discovery, artificial intelligence, machine learning, deep learning, drug development, pharmacology.



Current Drug Metabolism, Bentham Science Publishers. Print ISSN 1389-2002; Online ISSN 1875-5453.
