Genetic Algorithm-based Feature Selection Approach for Enhancing the Effectiveness of Similarity Searching in Ligand-based Virtual Screening

Fouaz Berrhail; Hacene Belhadef

doi:10.2174/1574893614666191119123935

Abstract

Background: In recent years, similarity searching has become a widely used method for Ligand-Based Virtual Screening (LBVS). This screening technique compares the features of the target compound with those of each compound in a database. It is well known that no single similarity measure consistently gives the best performance for every active compound structure across all activity classes. Several techniques and strategies have therefore been proposed in the literature to improve the overall effectiveness of ligand-based virtual screening approaches.
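
The comparison step described above can be sketched as a minimal Python example, assuming binary fingerprints stored as sets of on-bit indices and the Tanimoto coefficient as the similarity measure. The helper names `tanimoto` and `similarity_search` and the toy compounds are hypothetical illustrations, not taken from the article.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient for binary fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here: two empty fingerprints score 0
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def similarity_search(reference: Set[int], database: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Score every database compound against the reference and rank by decreasing similarity."""
    scores = [(cid, tanimoto(reference, fp)) for cid, fp in database.items()]
    # Sort by score descending; ties are broken by compound ID for a deterministic order.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical toy fingerprints (on-bit indices), not real screening data.
reference = {1, 4, 7, 9}
database = {
    "cpd_A": {1, 4, 7, 9},
    "cpd_B": {1, 4, 8},
    "cpd_C": {2, 3, 5},
}
for cid, score in similarity_search(reference, database):
    print(cid, round(score, 3))
```

The top of the resulting ranking holds the database compounds most similar to the target, which under the similar-property principle are the ones most likely to share its activity.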

Objective: The main objective of this work is to propose a feature selection approach based on a genetic algorithm (FSGASS) to improve similarity searching in ligand-based virtual screening.

Methods: The proposed approach identifies the most important and relevant features of chemical compounds and minimizes the number of features used to represent them. This aims to reduce the feature space, eliminate redundancy, shorten training execution time, and improve the performance of the screening process.
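
The abstract does not specify the chromosome encoding, genetic operators, or fitness function used by FSGASS, so the following is only a generic sketch of GA-based feature selection under stated assumptions: each chromosome is a binary mask over fingerprint bits, the fitness is a hypothetical score (fraction of known actives retrieved in the top-k of a masked Tanimoto ranking, minus a small penalty per selected bit), and selection uses tournaments, one-point crossover and bit-flip mutation. All names, parameters and toy data are illustrative, not the authors' implementation.

```python
import random
from typing import Dict, List, Set, Tuple

N_BITS = 16  # hypothetical fingerprint length for the toy example


def masked_tanimoto(a: Set[int], b: Set[int], mask: List[int]) -> float:
    """Tanimoto similarity restricted to the fingerprint bits selected by the mask."""
    keep = {i for i, gene in enumerate(mask) if gene}
    a, b = a & keep, b & keep
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fitness(mask: List[int], reference: Set[int], database: Dict[str, Set[int]],
            actives: Set[str], top_k: int = 3, penalty: float = 0.01) -> float:
    """Hypothetical fitness: share of actives in the top-k, minus a small cost per selected bit."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot rank anything meaningfully
    ranked = sorted(database, key=lambda cid: -masked_tanimoto(reference, database[cid], mask))
    hits = sum(1 for cid in ranked[:top_k] if cid in actives)
    return hits / top_k - penalty * sum(mask) / len(mask)


def tournament(pop: List[List[int]], scores: List[float], rng: random.Random, k: int = 3) -> List[int]:
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def evolve(reference: Set[int], database: Dict[str, Set[int]], actives: Set[str],
           pop_size: int = 20, generations: int = 30, p_mut: float = 0.05,
           seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    # Seed with the full feature set so the result is never worse than using all bits.
    pop = [[1] * N_BITS] + [[rng.randint(0, 1) for _ in range(N_BITS)] for _ in range(pop_size - 1)]
    best_mask, best_score = pop[0][:], float("-inf")
    for _ in range(generations):
        scores = [fitness(m, reference, database, actives) for m in pop]
        for m, s in zip(pop, scores):
            if s > best_score:
                best_mask, best_score = m[:], s
        children = [best_mask[:]]  # elitism: keep the best mask found so far
        while len(children) < pop_size:
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            cut = rng.randint(1, N_BITS - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return best_mask, best_score


# Hypothetical toy data: actives share bits 0-3 with the reference, decoys share bits 8-9.
reference = {0, 1, 2, 3, 8, 9}
database = {
    "act_1": {0, 1, 2, 3, 12}, "act_2": {0, 1, 2, 14}, "act_3": {1, 2, 3, 15},
    "dec_1": {8, 9, 10, 11}, "dec_2": {8, 9, 12, 13}, "dec_3": {8, 9, 14, 15},
}
actives = {"act_1", "act_2", "act_3"}
mask, score = evolve(reference, database, actives)
print(sum(mask), "of", N_BITS, "bits kept; fitness", round(score, 3))
```

Seeding the population with the all-bits mask and keeping an elite individual means that, under this fitness, the selected subset never ranks worse than the full fingerprint; the penalty term then pushes the search toward smaller subsets, in line with the stated goal of minimizing the number of features.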

Results: The obtained results show superior performance compared with those obtained using the Tanimoto coefficient, which is considered the most widely used coefficient for quantifying similarity in LBVS.
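
For reference, the Tanimoto coefficient used as the baseline is commonly defined on binary fingerprints A and B as follows, where a and b are the numbers of bits set in A and B, and c is the number of bits set in both:

```latex
S_T(A,B) = \frac{c}{a + b - c}
```

For example, with a = 4, b = 3 and c = 2, S_T = 2/(4 + 3 - 2) = 2/5 = 0.4. The coefficient ranges from 0 (no bits in common) to 1 (identical, non-empty bit patterns).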

Conclusion: Our results show that significant improvements can be obtained by using molecular similarity searching methods based on feature selection.

Keywords: Feature selection, genetic algorithm, ligand-based virtual screening, similarity searching, similarity coefficients, molecular descriptors, drug discovery.

Journal: Current Bioinformatics
Publisher: Bentham Science Publisher
Print ISSN: 1574-8936
Online ISSN: 2212-392X
