Abstract
Background: Microarray data are widely used for disease analysis and diagnosis. However, they are hard to process directly with high classification accuracy because of their high dimensionality and small sample sizes. Feature selection, an important data preprocessing technique, is commonly used to reduce the dimensionality of such datasets.
Methods: Given the limitations of using filter or wrapper approaches alone for feature selection, this study proposes a novel hybrid filter-wrapper approach, CS-IFOA, for high-dimensional datasets. First, the Chi-square Test is used to filter out irrelevant or redundant features. Next, an improved binary Fruit Fly Optimization Algorithm searches for the optimal feature subset without degrading classification accuracy. Classification accuracy is evaluated with a KNN classifier under 10-fold cross-validation (10-fold CV).
Results: Extensive experiments on six benchmark biomedical datasets show that the proposed CS-IFOA achieves superior performance compared with other state-of-the-art methods: it selects fewer features while achieving higher classification accuracy. Furthermore, the standard deviation of the experimental results is relatively small, which indicates that the proposed algorithm is relatively robust.
Conclusion: The results confirm the efficiency of our approach in identifying important genes in high-dimensional biomedical datasets. The approach can serve as a pre-processing tool that helps optimize the feature selection process and improves the efficiency of disease diagnosis.
Keywords: Feature selection, fruit fly optimization algorithm, Chi-square Test, Lévy flight, Gaussian mutation, algorithm.
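The two evaluation pieces named in the Methods (a Chi-square filter followed by a KNN accuracy estimate under 10-fold CV) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`chi2_score`, `chi2_filter`, `knn_predict`, `cv_accuracy`), the toy data, the choice of k = 1, the interleaved fold assignment, and the assumption that features are already discretized are all illustrative. The improved binary Fruit Fly Optimization search itself is not shown; `cv_accuracy` only plays the role of the score such a wrapper search could use for one candidate binary feature mask.

```python
from collections import Counter
import math


def chi2_score(column, labels):
    """Pearson chi-square statistic between one discrete feature and the labels."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    value_totals = Counter(column)
    class_totals = Counter(labels)
    score = 0.0
    for value, value_count in value_totals.items():
        for cls, class_count in class_totals.items():
            expected = value_count * class_count / n
            score += (observed[(value, cls)] - expected) ** 2 / expected
    return score


def chi2_filter(X, y, keep):
    """Filter stage: indices of the `keep` features with the highest scores."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]


def cv_accuracy(X, y, mask, k=1, folds=10):
    """Pooled KNN accuracy over `folds` folds, using features where mask[j] == 1.

    Assumption: sample i is assigned to fold i % folds (no shuffling).
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    reduced = [[row[j] for j in cols] for row in X]
    correct = 0
    for fold in range(folds):
        train = [i for i in range(len(reduced)) if i % folds != fold]
        test = [i for i in range(len(reduced)) if i % folds == fold]
        train_X = [reduced[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            if knn_predict(train_X, train_y, reduced[i], k) == y[i]:
                correct += 1
    return correct / len(reduced)


# Hypothetical toy data: feature 0 equals the label, feature 1 is constant,
# feature 2 alternates independently of the label.
y = [0] * 10 + [1] * 10
X = [[y[i], 0, i % 2] for i in range(20)]

selected = chi2_filter(X, y, keep=1)             # filter stage
mask = [1 if j in selected else 0 for j in range(3)]
accuracy = cv_accuracy(X, y, mask)               # score for one candidate subset
print(selected, accuracy)
```

In a hybrid scheme of this kind, the filter shrinks the search space first, so the wrapper stage only has to evaluate masks over the surviving features. Real microarray features are continuous, so a Chi-square filter would need a discretization step or a library implementation that accepts non-negative numeric features.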