Abstract
Recently, many soft computing methods have been applied to extract
information from big data. DNA microarray technology provides a standardized format for
measuring the expression levels of thousands of genes, and cancers of several
anatomical regions can be identified from the gene expression patterns that
microarrays capture. However, microarray data are difficult to process because their very
high dimensionality gives rise to the curse of dimensionality.
Methodology: Therefore, in this chapter, a hybrid machine learning framework that uses
soft computing techniques for feature selection is designed and implemented to
eliminate unnecessary genes and identify the genes that matter for cancer identification. In
the first stage, the genes (features) are extracted with the higher-order
Independent Component Analysis (ICA) technique. Then, a wrapper algorithm based on
Spider Monkey Optimization (SMO) combined with a Genetic Algorithm (GA) is used to find the set of
genes that improves the classification accuracy of the Naïve Bayes (NB) and Support
Vector Machine (SVM) classifiers. For comparison, three other optimization techniques
are considered in this chapter: Particle Swarm Optimization (PSO), Artificial Bee Colony
(ABC), and GA alone. After the relevant expressed genes are selected, the two
popular classifiers, NB and SVM, are trained
on the selected genes, and finally the classification accuracy is measured on test data.
Result: The experimental results on five benchmark cancer microarray datasets show that
Genetic Spider Monkey (GSM) with ICA is a more efficient approach than the compared
techniques for improving the classification performance of both classifiers.
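
The wrapper idea in the Methodology can be illustrated with a minimal sketch. It is not the chapter's GSM algorithm: a plain Genetic Algorithm stands in for the SMO-GA hybrid, a small hand-written Gaussian Naïve Bayes stands in for the NB classifier, and synthetic data stand in for ICA components of a microarray dataset. All names (`make_data`, `gaussian_nb_accuracy`, `ga_select`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def make_data(n_samples, n_features, informative, seed):
    """Synthetic two-class data; only the 'informative' columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        X.append([rng.gauss(2.0 * label if j in informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y


def gaussian_nb_accuracy(X_train, y_train, X_test, y_test, mask):
    """Fit Gaussian Naive Bayes on the features kept by mask; return test accuracy."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    stats = {}
    for label in set(y_train):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        means = [sum(r[j] for r in rows) / len(rows) for j in cols]
        variances = [sum((r[j] - m) ** 2 for r in rows) / len(rows) + 1e-6
                     for j, m in zip(cols, means)]
        stats[label] = (math.log(len(rows) / len(y_train)), means, variances)

    def log_posterior(x, label):
        prior, means, variances = stats[label]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (x[j] - m) ** 2 / (2 * v)
                           for j, m, v in zip(cols, means, variances))

    hits = sum(max(stats, key=lambda c: log_posterior(x, c)) == t
               for x, t in zip(X_test, y_test))
    return hits / len(y_test)


def ga_select(X_train, y_train, X_val, y_val, n_features,
              pop_size=20, generations=15, seed=0):
    """Wrapper feature selection: a GA searches binary masks scored by NB accuracy."""
    rng = random.Random(seed)

    def fitness(mask):
        # Validation accuracy with a small penalty on the number of kept features.
        acc = gaussian_nb_accuracy(X_train, y_train, X_val, y_val, mask)
        return acc - 0.01 * sum(mask) / n_features

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # keep the better half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < 0.05 else g for g in child]  # bit-flip mutation
            children.append(child)
        population = parents + children
    return max(population, key=fitness)


N_FEATURES = 10
INFORMATIVE = {0, 3, 7}                            # assumed ground truth for the toy data
X_train, y_train = make_data(60, N_FEATURES, INFORMATIVE, seed=1)
X_val, y_val = make_data(40, N_FEATURES, INFORMATIVE, seed=2)
X_test, y_test = make_data(40, N_FEATURES, INFORMATIVE, seed=3)

best_mask = ga_select(X_train, y_train, X_val, y_val, N_FEATURES)
val_accuracy = gaussian_nb_accuracy(X_train, y_train, X_val, y_val, best_mask)
test_accuracy = gaussian_nb_accuracy(X_train, y_train, X_test, y_test, best_mask)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("validation accuracy:", val_accuracy, "test accuracy:", test_accuracy)
```

The sketch keeps the validation set used for fitness separate from the test set used for the final accuracy, mirroring the abstract's last step of measuring accuracy on test data. In a setting closer to the chapter's, the synthetic columns would be replaced by ICA components and the GA loop by the SMO-GA hybrid.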