Abstract
Background: Tumor classification is one of the most important applications of gene expression data. Because microarray data are high-dimensional, dimensionality reduction plays a crucial role in tumor classification based on gene expression profiles.
Objective: The primary objective of this study is to increase the accuracy of tumor classification by reducing the dimensionality of gene expression data with feature extraction methods.
Method: We propose a novel supervised feature extraction method for tumor classification, called discriminant hybrid structure preserving projections. The method uses a hybrid representation, combining neighbor representation and sparse representation, to characterize the structure of gene expression data efficiently. It enhances data separability after dimensionality reduction by simultaneously minimizing the within-class distance and maximizing the between-class distance. It also employs an imbalance adjustment factor during extraction to mitigate the class imbalance problem in tumor datasets.
Results: Experiments on five publicly available tumor datasets demonstrate the effectiveness of the proposed method compared with several state-of-the-art feature extraction and feature selection methods.
Conclusion: The proposed algorithm enhances the separability of the data after projection and thus improves the accuracy of tumor classification from gene expression data.
Keywords: Tumor classification, gene expression data, dimensionality reduction, feature extraction, neighbor representation, sparse representation.
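Illustrative sketch: the Method paragraph describes a projection that shrinks within-class distances, enlarges between-class distances, and compensates for class imbalance. The code below is not the proposed algorithm; it is a generic Fisher-style scatter criterion with inverse-class-frequency weights, included only to make that idea concrete. The function name, the weighting scheme `n / (k * n_c)`, the regularization term `reg`, and the synthetic data are all assumptions, and the hybrid neighbor/sparse representation is not modeled here.

```python
import numpy as np


def weighted_discriminant_projection(X, y, n_components=1, reg=1e-3):
    """Fisher-style projection with hypothetical inverse-frequency class weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    n, d = X.shape
    mean_all = X.mean(axis=0)

    # Assumed imbalance adjustment: smaller classes get larger weights.
    weights = n / (len(classes) * counts)

    Sw = np.zeros((d, d))  # weighted within-class scatter
    Sb = np.zeros((d, d))  # weighted between-class scatter
    for c, n_c, w_c in zip(classes, counts, weights):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        diff = Xc - mu_c
        Sw += w_c * (diff.T @ diff)
        m = (mu_c - mean_all)[:, None]
        Sb += w_c * n_c * (m @ m.T)

    # Regularize so Sw is positive definite (needed when d exceeds n).
    Sw += reg * np.eye(d)

    # Solve Sb v = lambda Sw v by whitening with the Cholesky factor of Sw.
    L = np.linalg.cholesky(Sw)
    L_inv = np.linalg.inv(L)
    evals, evecs = np.linalg.eigh(L_inv @ Sb @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_components]
    return L_inv.T @ evecs[:, order]


# Synthetic, imbalanced two-class data (60 vs. 15 samples, 5 features).
rng = np.random.default_rng(0)
X_major = rng.normal(0.0, 1.0, size=(60, 5))
X_minor = rng.normal(0.0, 1.0, size=(15, 5))
X_minor[:, 0] += 4.0  # classes differ along the first feature
X = np.vstack([X_major, X_minor])
y = np.array([0] * 60 + [1] * 15)

W = weighted_discriminant_projection(X, y, n_components=1)
Z = X @ W  # low-dimensional representation passed to a classifier
```

With the weights chosen this way, each class contributes equally to the between-class scatter regardless of its size, which is one simple way to keep a minority class from being dominated. Real gene expression data have far more features than samples, so the dense `d × d` scatter matrices used here would typically need a preliminary reduction step.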