Abstract
Background: Cancer is a serious threat to human health. Diagnosing cancer through gene expression analysis is an active topic in cancer research.
Objective: This study aimed to accurately diagnose the type of lung cancer and to identify pathogenic genes.
Methods: Affinity Propagation (AP) clustering with a similarity score was applied to each type of lung cancer and to normal lung in order to group genes. Sparse group lasso was then used to construct four binary classifiers, which were integrated by a voting strategy.
Results: Among 73 gene groups, this study screened six that may be associated with different lung cancer subtypes, and identified three possible key pathogenic genes: KRAS, BRAF, and VDR. Furthermore, it achieved improved classification accuracy for the minority classes SQ and COID compared with four other methods.
Conclusion: We propose AP clustering-based sparse group lasso (AP-SGL), which provides an alternative approach to simultaneous diagnosis and gene selection in lung cancer.
Keywords: Lung cancer, gene selection, affinity propagation clustering, sparse group lasso, multi-classification, miRNA.
Graphical Abstract