Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

General Research Article

Feature Selection Algorithm for High-dimensional Biomedical Data Using Information Gain and Improved Chemical Reaction Optimization

Author(s): Ge Zhang, Pan Yu, Jianlin Wang* and Chaokun Yan*

Volume 15, Issue 8, 2020

Page: [912 - 926] Pages: 15

DOI: 10.2174/1574893615666200204154358

Abstract

Background: Rapid developments in bioinformatics technologies have led to the accumulation of large amounts of biomedical data. However, these datasets usually involve thousands of features and include much irrelevant or redundant information, which can confuse diagnosis. Feature selection addresses this by searching for an optimal feature subset, a problem known to be NP-hard, since the number of candidate subsets grows exponentially with the number of features.

Objective: To address this issue, this paper proposes a hybrid feature selection method, called IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach.

Methods: IG is adopted to select a set of important features. A neighborhood search mechanism is then combined with ICRO to increase the diversity of the population and improve its local search capability.
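
To make the two ingredients named above concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes features have already been discretized (gene expression values are usually continuous), ranks them by information gain, and shows one simple neighborhood move on a binary feature mask. The function names, the toy data, and the one-bit-flip neighborhood are illustrative assumptions; the abstract does not specify how IGICRO defines its neighborhood or how many features the IG filter keeps.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_by_information_gain(columns, labels, k):
    """Return the indices of the k highest-IG features, best first."""
    scores = [information_gain(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def one_flip_neighbors(mask):
    """Yield every binary mask differing from `mask` in exactly one bit.

    Assumption: a one-bit-flip neighborhood, chosen only for illustration.
    """
    for i in range(len(mask)):
        neighbor = list(mask)
        neighbor[i] = 1 - neighbor[i]
        yield neighbor


# Hypothetical toy data: three discrete features, four samples.
labels = [0, 0, 1, 1]
columns = [
    [0, 0, 1, 1],  # perfectly predicts the label
    [0, 1, 0, 1],  # independent of the label
    [0, 0, 0, 1],  # partially informative
]
selected = top_k_by_information_gain(columns, labels, k=2)
neighbors = list(one_flip_neighbors([1, 0, 1]))
```

In a filter-then-wrapper pipeline of this kind, the indices in `selected` would define the reduced search space, and a metaheuristic such as CRO would then explore binary masks over those features, using moves like `one_flip_neighbors` for local refinement.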

Results: Experimental results on eight publicly available datasets demonstrate that the proposed approach outperforms the original CRO and other state-of-the-art approaches.

Keywords: Feature selection, chemical reaction optimization algorithm (CRO), information gain, neighborhood search mechanism, biomedical data, optimal subset.
