Feature Selection Using Information Distance Measure for Gene Expression Data

Jie Cai; Cheng Liang; Jiawei Luo

doi:10.2174/1570164615666180716144945

Abstract

Background: The accurate classification of microarray data has been a great challenge in machine learning due to its high dimensionality and small number of samples. Feature selection is an effective way to deal with such data.

Objective: To select a feature subset that maximizes both feature-feature diversity and feature-class relevance, thereby improving predictive efficiency and reducing the cost of feature acquisition. In addition, the selection of features with high entropy but low classification performance is restricted.

Method: We first present a feature selection criterion based on an information distance measure, obtained by introducing a self-redundancy factor into the maximum relevance and maximum redundancy criterion; the self-redundancy factor acts as a penalty for features with high entropy. We then propose MFFID, an incremental-search-based feature selection method that uses this criterion to maximize the information distance between features.
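The abstract does not state the exact MFFID criterion, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes discretized expression values, uses the standard information distance d(X, Y) = H(X|Y) + H(Y|X), and scores candidates with a hypothetical formula (relevance plus mean distance to already selected features, minus an entropy penalty). The function names, the `penalty` weight, and the scoring formula are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def information_distance(xs, ys):
    """d(X, Y) = H(X|Y) + H(Y|X) = 2 H(X, Y) - H(X) - H(Y)."""
    return 2 * entropy(list(zip(xs, ys))) - entropy(xs) - entropy(ys)


def greedy_select(features, labels, k, penalty=0.5):
    """Incremental (greedy forward) search over a dict name -> discrete values.

    The score below is a hypothetical stand-in for the paper's criterion:
    feature-class relevance, plus mean information distance to the features
    already selected, minus a penalty proportional to the feature's entropy.
    """
    selected = []
    remaining = list(features)  # keeps insertion order, so ties break predictably

    def score(name):
        f = features[name]
        relevance = mutual_information(f, labels)
        if selected:
            diversity = sum(
                information_distance(f, features[s]) for s in selected
            ) / len(selected)
        else:
            diversity = 0.0
        return relevance + diversity - penalty * entropy(f)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With this kind of score, an exact copy of an already selected feature has distance 0 to it and gains nothing from the diversity term, while a feature that takes a distinct value in every sample is relevant but pays a large entropy penalty. That mirrors the two behaviours the abstract describes, although the real weighting in MFFID may differ.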

Results: On twelve high-dimensional microarray datasets, the proposed method MFFID achieves higher classification accuracy than four representative feature selection methods.

Conclusion: This study proposes MFFID, a novel feature selection method expressed in terms of an information distance measure and obtained by introducing a self-redundancy factor into CMRMR. The experimental results demonstrate that MFFID is an effective and stable feature selection method for the classification of tumor datasets.

Keywords: Classification, feature selection, information distance measure, diversity, entropy, accuracy.

DOI https://dx.doi.org/10.2174/1570164615666180716144945	Print ISSN 1570-1646
Publisher Name Bentham Science Publisher	Online ISSN 1875-6247

Current Proteomics
