Soft Computing Techniques for Cancer Classification of Gene Expression Microarray Data: A Three-Phase Hybrid Approach

Abstract

Soft computing methods have recently been applied widely to extract information from big data. DNA microarray technology provides a standardized format for measuring the expression levels of thousands of genes, and the gene expression patterns it reveals can help identify cancers of several anatomical regions. However, microarray data are difficult to process directly because of the curse of dimensionality.

Methodology: In this chapter, a hybrid machine learning framework that uses soft computing techniques for feature selection is designed and implemented to eliminate irrelevant genes and identify the genes that matter for cancer identification. In the first stage, features (genes) are extracted with the higher-order Independent Component Analysis (ICA) technique. Then, a wrapper algorithm that combines Spider Monkey Optimization (SMO) with a Genetic Algorithm (GA) is used to find the set of genes that improves the classification accuracy of the Naïve Bayes (NB) and Support Vector Machine (SVM) classifiers. For comparison, three other optimization techniques are considered in this chapter: Particle Swarm Optimization (PSO), Artificial Bee Colony (ABC), and GA. After the relevant expressed genes are selected, the NB and SVM classifiers are trained on the selected genes, and the classification accuracy is then determined on test data.
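To make the wrapper idea concrete, the following is a minimal Python sketch, not the chapter's implementation. It assumes a plain GA stands in for the hybrid SMO and GA search, omits the ICA extraction stage, uses a hand-written Gaussian NB in place of a library classifier, and runs on synthetic data rather than a microarray dataset. All function names, parameters, and the small size penalty in the fitness are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_nb_accuracy(train, test, mask):
    """Accuracy of a Gaussian Naive Bayes model that only sees features where mask is True."""
    idx = [i for i, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    stats = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        means = [sum(r[i] for r in rows) / len(rows) for i in idx]
        # Small constant keeps variances strictly positive.
        varis = [sum((r[i] - m) ** 2 for r in rows) / len(rows) + 1e-6
                 for i, m in zip(idx, means)]
        stats[label] = (math.log(len(rows) / len(train)), means, varis)

    def log_posterior(x, label):
        prior, means, varis = stats[label]
        return prior + sum(-0.5 * math.log(2 * math.pi * v) - (x[i] - m) ** 2 / (2 * v)
                           for i, m, v in zip(idx, means, varis))

    correct = sum(max(stats, key=lambda c: log_posterior(x, c)) == y for x, y in test)
    return correct / len(test)


def ga_select(train, valid, n_features, pop_size=20, generations=15, p_mut=0.05, seed=0):
    """Plain GA wrapper (assumption: stands in for the SMO+GA hybrid) over boolean gene masks."""
    rng = random.Random(seed)

    def fitness(mask):
        # Validation accuracy minus a small, hypothetical penalty on subset size.
        return gaussian_nb_accuracy(train, valid, mask) - 0.01 * sum(mask) / n_features

    pop = [[rng.random() < 0.5 for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]          # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([(not g) if rng.random() < p_mut else g for g in child])
        pop = parents + children
    return max(pop, key=fitness)


def make_data(n, n_features, informative, rng):
    """Synthetic two-class data: only the 'informative' features shift with the class label."""
    data = []
    for _ in range(n):
        y = rng.randrange(2)
        x = [rng.gauss(2.0 * y if i in informative else 0.0, 1.0) for i in range(n_features)]
        data.append((x, y))
    return data


rng = random.Random(42)
informative = {0, 1, 2}
train = make_data(60, 12, informative, rng)
valid = make_data(40, 12, informative, rng)
test = make_data(40, 12, informative, rng)

best_mask = ga_select(train, valid, 12)
selected = [i for i, keep in enumerate(best_mask) if keep]
acc = gaussian_nb_accuracy(train, test, best_mask)
print("selected features:", selected, "test accuracy:", acc)
```

The mask is scored on a validation split and the final accuracy is reported on a separate test split, mirroring the abstract's step of determining accuracy with test data after gene selection.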

Result: Experimental results on five benchmark cancer microarray datasets show that Genetic Spider Monkey (GSM) combined with ICA is a more efficient approach for improving the classification performance of both classifiers.
