Abstract
Diagnosing cancer and identifying the disease gene by using DNA microarray gene expression data are the hot topics in current bioinformatics. This paper is devoted to the latest development in cancer diagnosis and gene selection via statistical machine learning. A support vector machine is firstly introduced for the binary cancer diagnosis. Then, 1-norm support vector machine, doubly regularized support vector machine, adaptive huberized support vector machine and other extensions are presented to improve the performance of gene selection. Lasso, elastic net, partly adaptive elastic net, group lasso, sparse group lasso, adaptive sparse group lasso and other sparse regression methods are also introduced for performing simultaneous binary cancer classification and gene selection. In addition to introducing three strategies for reducing multiclass to binary, methods of directly considering all classes of data in a learning model (multi_class support vector, sparse multinomial regression, adaptive multinomial regression and so on) are presented for performing multiple cancer diagnosis. Limitations and promising directions are also discussed.
Keywords: Cancer diagnosis, gene selection, machine learning, support vector machine, lasso, group lasso.
Graphical Abstract