Abstract
Background: Accurate classification of tumor types is important for cancer treatment. With advances in microarray expression profiling, many methods have been proposed to analyze these data. However, because the feature dimension of tumor gene expression profiles is very high, many machine learning algorithms fail.
Objective & Methods: In this paper, a novel method, probabilistic classification vector machines (PCVM) with feature selection, is proposed for tumor type detection using gene expression data. PCVM adopts a signed and truncated Gaussian prior to address the unstable solutions that arise when the same prior is used for different classes, as in relevance vector machines (RVMs); the truncated Gaussian prior also controls the complexity of the model. The performance of PCVM is evaluated on two datasets using four metrics (accuracy, sensitivity, specificity, and precision).
Results: PCVM achieves 84.21% accuracy on the leukemia dataset and 95.24% accuracy on the prostate dataset. PCVM obtains much higher performance than Support Vector Machines (SVM), Naïve Bayes (NB), RBF Neural Networks (RBF), K-nearest Neighbor (KNN), and Random Forest (RF), except for SVM on the prostate dataset. To reduce computational time, we adopt a feature selection method (DX) to rank the features and search for the optimal feature combination based on PCVM. PCVM with DX (PCVM-DX) achieves 94.74% accuracy, 100% sensitivity, 85.71% specificity, and 92.31% precision on the leukemia dataset, and obtains the same result as PCVM on the prostate dataset. We also compare DX with other feature selection methods; the results show that PCVM-DX is effective for tumor classification.
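The four metrics reported above all follow from the counts of a binary confusion matrix. The sketch below is a minimal illustration, not the authors' evaluation code: the function name `binary_metrics` is hypothetical, and the counts (12 true positives, 1 false positive, 6 true negatives, 0 false negatives) are an assumption chosen because they reproduce the PCVM-DX leukemia percentages; the abstract itself does not state the test-set size or the confusion matrix.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute four standard metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # fraction of predicted positives that are correct
    }


# Assumed counts (not given in the source) that match the reported percentages.
metrics = binary_metrics(tp=12, fp=1, tn=6, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts, the four values round to 94.74%, 100%, 85.71%, and 92.31%, matching the reported figures; other count combinations with the same ratios would give the same percentages.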
Conclusion: PCVM-DX performs better than the other methods on the two datasets. The novelty of this approach lies in applying PCVM to address the problem that using the same prior for different classes may lead to unstable solutions in RVMs, and in using feature selection to identify the important feature subset in the microarray expression profile.
Keywords: Probabilistic classification vector, feature selection, tumor classification, DX, machine learning, kernel function.