Abstract
Background: Rapid developments in bioinformatics technologies have led to the accumulation of large amounts of biomedical data. However, these datasets usually involve thousands of features and contain much irrelevant or redundant information, which can confuse diagnosis. Feature selection addresses this by finding an optimal feature subset, a problem known to be NP-hard because the search space grows exponentially with the number of features.
Objective: To address this issue, this paper proposes a hybrid feature selection method, called IGICRO, based on an improved chemical reaction optimization algorithm (ICRO) and an information gain (IG) approach.
Methods: IG is adopted to select important features. A neighborhood search mechanism is combined with ICRO to increase population diversity and improve local search capability.
Results: Experimental results on eight publicly available datasets demonstrate that the proposed approach outperforms the original CRO and other state-of-the-art approaches.
Keywords: Feature selection, chemical reaction optimization algorithm (CRO), information gain, neighborhood search mechanism, biomedical data, optimal subset.
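As an illustration of the IG filtering step named in the Methods, the following is a minimal sketch of ranking features by information gain, IG(Y; X) = H(Y) - H(Y | X). It assumes features have already been discretized, and the function names (`entropy`, `information_gain`, `top_k_features`) and the cutoff `k` are hypothetical choices for this example, not details taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: weighted average of the per-group entropies.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(columns, labels, k):
    """Return indices of the k columns with the highest information gain."""
    scores = [information_gain(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data (assumed for illustration): three discrete features, four samples.
columns = [
    [0, 0, 1, 1],  # perfectly predicts the label
    [0, 1, 0, 1],  # independent of the label
    [0, 0, 0, 1],  # partially informative
]
labels = [0, 0, 1, 1]
selected = top_k_features(columns, labels, k=2)
```

In a hybrid scheme of this kind, the indices in `selected` would define the reduced search space that a wrapper-style optimizer then explores; how IGICRO sets the cutoff is not stated in the abstract.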