Abstract
Background: Various regularization methods have been proposed to improve prediction accuracy in cancer diagnosis. Elastic net regularized logistic regression has been widely adopted for cancer classification and gene selection in genetics and molecular biology, but it is commonly applied only to binary classification and regression. In practice, however, a cancer often has more than two subtypes, and these subtypes usually cannot be determined precisely.
Objective: Besides the multi-class issue, feature selection is also a critical problem for cancer subtype classification.
Methods: An Elastic Net Regularized Softmax Regression (ENRSR) is put forward to tackle the multi-class classification problem. As an extension of elastic net regularized logistic regression, ENRSR enforces structured sparsity and a ‘grouping effect’ for gene selection based on gene expression data, which may be highly correlated. The sparse structure and the ‘grouping effect’ help to select more appropriate discriminative features for multi-class classification.
Results: ENRSR achieves more accurate and robust performance, in terms of F-measure, than six competing algorithms (K-means, Hierarchical Clustering, Expectation Maximization, Nonnegative Matrix Factorization, Support Vector Machine and Random Forest) in predicting cancer subtypes on both simulated data and real cancer gene expression data.
Conclusion: The proposed ENRSR method is a reliable regularized softmax regression for multi-subtype classification.
Keywords: regularization, softmax regression, elastic net, multiple classification, gene selection, cancer.
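To make the idea in the Methods paragraph concrete, the following is a minimal sketch of softmax regression with an elastic net penalty, fitted by proximal gradient descent. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the ENRSR objective or solver, so the element-wise penalty `lam * (alpha * ||W||_1 + (1 - alpha) / 2 * ||W||_2^2)`, the proximal gradient solver, the function names (`fit_enet_softmax`, `soft_threshold`, `predict`), the hyperparameter values, and the toy data are all hypothetical choices.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def soft_threshold(w, t):
    # Proximal operator of t * ||w||_1: shrinks entries toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_enet_softmax(X, y, n_classes, lam=0.05, alpha=0.5, lr=0.1, n_iter=500):
    # Assumed objective (not taken from the paper):
    #   mean cross-entropy + lam * (alpha * ||W||_1 + (1 - alpha) / 2 * ||W||_2^2)
    n, p = X.shape
    W = np.zeros((p, n_classes))   # one weight column per class
    b = np.zeros(n_classes)        # unpenalized intercepts
    Y = np.eye(n_classes)[y]       # one-hot labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        # Gradient of the smooth part: cross-entropy plus the ridge term.
        G = X.T @ (P - Y) / n + lam * (1 - alpha) * W
        # Gradient step, then soft-thresholding for the L1 term.
        W = soft_threshold(W - lr * G, lr * lam * alpha)
        b -= lr * (P - Y).mean(axis=0)
    return W, b

def predict(X, W, b):
    return np.argmax(X @ W + b, axis=1)

# Toy data (hypothetical): 3 classes, 20 features, only features 0..2 informative.
rng = np.random.default_rng(0)
n, p, k = 150, 20, 3
y = rng.integers(0, k, size=n)
X = rng.normal(size=(n, p))
X[:, :3] += 3.0 * np.eye(k)[y]     # feature j is shifted when the class is j

W, b = fit_enet_softmax(X, y, k)
accuracy = (predict(X, W, b) == y).mean()
# A feature counts as "selected" if any class gives it a nonzero weight.
selected = np.flatnonzero(np.abs(W).sum(axis=1) > 0)
print("training accuracy:", accuracy)
print("selected features:", selected)
```

In this sketch the L1 term sets weights of uninformative features exactly to zero, while the ridge term keeps the weights of correlated features close to one another, which is the usual reading of the ‘grouping effect’. Any structured (for example, row-wise) sparsity that ENRSR may impose is not modeled here.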