Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Genetic Algorithm-based Feature Selection Approach for Enhancing the Effectiveness of Similarity Searching in Ligand-based Virtual Screening

Author(s): Fouaz Berrhail* and Hacene Belhadef

Volume 15, Issue 5, 2020

Pages: 431-444 (14 pages)

DOI: 10.2174/1574893614666191119123935

Abstract

Background: In recent years, similarity searching has become a widely used method for Ligand-Based Virtual Screening (LBVS). The technique compares the features of a target compound with those of each compound in a database. It is well known that no single similarity measure gives the best performance for every active compound structure across all types of activity classes. Several techniques and strategies have been proposed in the literature to improve the overall effectiveness of ligand-based virtual screening approaches.
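The comparison described in the Background can be illustrated with a minimal sketch. It assumes binary fingerprints stored as sets of "on" bit indices and uses the Tanimoto coefficient purely as an example measure; the compound names and fingerprints are hypothetical toy data, not taken from the article.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1 (assumed encoding)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient c / (|a| + |b| - c), where c is the number of shared bits."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def rank_database(query: Fingerprint,
                  database: Dict[str, Fingerprint]) -> List[Tuple[str, float]]:
    """Score every database compound against the query and sort by decreasing similarity."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one query and three database compounds.
query = frozenset({1, 4, 7, 9})
database = {
    "cpd_A": frozenset({1, 4, 7, 9}),
    "cpd_B": frozenset({1, 4, 8}),
    "cpd_C": frozenset({2, 3, 5}),
}

ranking = rank_database(query, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In a screening setting, the compounds at the top of such a ranking are the ones that would be prioritized for testing.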

Objective: The main objective of this work is to propose a feature selection approach based on a genetic algorithm (FSGASS) to improve similarity searching in ligand-based virtual screening.

Methods: The proposed approach identifies the most important and relevant features of chemical compounds and minimizes the number of features used in their representations. This reduces the feature space, eliminates redundancy, shortens training time, and increases the performance of the screening process.
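The abstract does not specify the chromosome encoding, the genetic operators, or the fitness function, so the following is only a generic sketch of genetic-algorithm feature selection, not the FSGASS implementation. It assumes a binary mask over fingerprint bits, tournament selection, one-point crossover, bit-flip mutation, and a hypothetical fitness that counts known actives retrieved in the top-ranked positions minus a small penalty per selected feature. All names, parameters, and data are illustrative.

```python
import random
from typing import List, Sequence, Tuple

Bits = Tuple[int, ...]  # binary fingerprint, one 0/1 entry per feature


def masked_tanimoto(a: Bits, b: Bits, mask: Sequence[int]) -> float:
    """Tanimoto coefficient computed only on features where mask == 1."""
    both = sum(1 for x, y, m in zip(a, b, mask) if m and x and y)
    either = sum(1 for x, y, m in zip(a, b, mask) if m and (x or y))
    return both / either if either else 0.0


def fitness(mask: Sequence[int], query: Bits, database: List[Bits],
            labels: List[int], top_k: int = 2) -> float:
    """Hypothetical fitness: actives found in the top_k, minus a small size penalty."""
    order = sorted(range(len(database)),
                   key=lambda i: masked_tanimoto(query, database[i], mask),
                   reverse=True)
    hits = sum(labels[i] for i in order[:top_k])
    return hits - 0.01 * sum(mask)


def select_features(query: Bits, database: List[Bits], labels: List[int],
                    top_k: int = 2, pop_size: int = 20, generations: int = 30,
                    p_mut: float = 0.05, seed: int = 0) -> List[int]:
    """Evolve a binary feature mask; returns the best mask found."""
    rng = random.Random(seed)
    n = len(query)

    def score(mask: Sequence[int]) -> float:
        return fitness(mask, query, database, labels, top_k)

    # Start from the full fingerprint plus random masks, so the result is
    # never worse (on this data) than using every feature.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    best = max(population, key=score)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: keep the best mask unchanged
        while len(next_pop) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=score)
    return best


# Hypothetical toy data: label 1 marks a known active, 0 an inactive.
query = (1, 1, 1, 0, 0, 1, 0, 0)
database = [
    (1, 1, 1, 0, 0, 0, 1, 1),
    (1, 1, 0, 0, 0, 1, 1, 0),
    (0, 0, 1, 1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1, 0, 1, 1),
]
labels = [1, 1, 0, 0]

best_mask = select_features(query, database, labels)
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
```

Seeding the population with the all-ones mask and keeping the best individual each generation is a design choice made here so that the selected subset can be compared directly against the unreduced representation. A real study would evaluate the mask on held-out queries rather than on the data used for selection.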

Results: The results show better performance than that obtained with the Tanimoto coefficient, which is considered the most widely used coefficient for quantifying similarity in LBVS.

Conclusion: Our results show that significant improvements can be obtained by using molecular similarity searching methods based on feature selection.

Keywords: Feature selection, genetic algorithm, ligand-based virtual screening, similarity searching, similarity coefficients, molecular descriptors, drug discovery.
