
Current Materials Science


ISSN (Print): 2666-1454
ISSN (Online): 2666-1462

Research Article

Optimal Feature Selection from High-dimensional Microarray Dataset Employing Hybrid IG-Jaya Model

Author(s): Bibhuprasad Sahu* and Sujata Dash

Volume 17, Issue 1, 2024

Published on: 22 February, 2023

Pages: 21-43 (23 pages)

DOI: 10.2174/2666145416666230124143912


Abstract

Background: Feature selection (FS) is a crucial strategy for dimensionality reduction in data preprocessing since microarray data sets typically contain redundant and extraneous features that degrade the performance and complexity of classification models.

Objective: The purpose of feature selection is to reduce the number of features in high-dimensional cancer datasets and enhance classification accuracy.

Methods: This research presents a wrapper-based hybrid model integrating information gain (IG) and the Jaya algorithm (JA) to determine the optimal gene features from high-dimensional microarray datasets. The study is divided into two stages. In the first stage, we employed the parameterless JA, without any filter method, to identify the featured gene subsets; JA's performance is evaluated with several classifiers, namely SVM, LDA, NB, and DT. In the second stage, we introduce the hybrid IG-JA model: IG acts as a filter to eliminate redundant and noisy features, and the reduced feature subset is then passed to the JA wrapper to improve the hybrid model's performance, assessed with the same classifiers.
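The two-stage pipeline described above can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the top-k cutoff, population size, iteration count, the SVM-only fitness, and the sigmoid-free threshold binarization of the Jaya positions are all assumptions made for brevity. Mutual information is used as the information-gain filter, as is common in scikit-learn-based pipelines.

```python
# Hypothetical sketch of an IG-filter + Jaya-wrapper feature selector.
# All hyperparameters (k, population size, iterations) are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=100, n_features=50,
                           n_informative=8, random_state=0)

# Stage 1 (filter): rank features by information gain (mutual information)
# and keep the top k, discarding redundant and noisy features.
k = 20
ig = mutual_info_classif(X, y, random_state=0)
top = np.argsort(ig)[-k:]
X_f = X[:, top]

def fitness(mask):
    # Wrapper fitness: cross-validated SVM accuracy on the selected subset.
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(), X_f[:, mask.astype(bool)], y, cv=3).mean()

# Stage 2 (wrapper): parameterless binary Jaya search on the reduced subset.
pop = rng.random((10, k))                       # positions in [0, 1]
scores = np.array([fitness(m) for m in (pop > 0.5).astype(int)])
for _ in range(15):
    best, worst = pop[scores.argmax()], pop[scores.argmin()]
    r1, r2 = rng.random((2, 10, k))
    # Jaya update: move toward the best solution, away from the worst.
    cand = np.clip(pop + r1 * (best - np.abs(pop))
                       - r2 * (worst - np.abs(pop)), 0, 1)
    cscores = np.array([fitness(m) for m in (cand > 0.5).astype(int)])
    improved = cscores > scores                 # greedy replacement, as in Jaya
    pop[improved], scores[improved] = cand[improved], cscores[improved]

best_mask = pop[scores.argmax()] > 0.5
print("selected:", int(best_mask.sum()), "accuracy: %.3f" % scores.max())
```

The Jaya update needs no algorithm-specific parameters (only population size and iteration budget), which is the "parameterless" property the abstract refers to.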

Results: We used 13 benchmark microarray data sets from a public repository for the experimental analysis. Notably, the hybrid IG-JA model performs better than its counterparts.

Conclusion: Tests and statistical analysis show that the suggested model outperforms standard feature selection with JA alone as well as other existing models. Although the proposed model does not always achieve the best accuracy among existing approaches, it is stable and performs consistently well. In the future, this work could be extended with other filter methods and real-time data sets; a multi-filter approach combined with the Jaya algorithm could be used to verify the efficiency of the proposed model, and a chaos-based hybrid with Jaya could further improve feature selection accuracy on high-dimensional datasets.



© 2024 Bentham Science Publishers