Abstract
Aim: To search for genes related to the mechanisms underlying glioma occurrence and to build a prediction model for glioblastomas.
Background: Glioblastomas have very high morbidity and mortality and pose a serious threat to human health. At present, research on gliomas focuses mainly on understanding the causes and mechanisms of these tumors at the molecular level and on exploring clinical diagnostic and treatment methods. However, there is still no effective method for early diagnosis, and effective prevention and treatment measures are lacking.
Methods: First, gene expression profiles were downloaded from the Gene Expression Omnibus (GEO). Then, differentially expressed genes (DEGs) between the disease samples and the control samples were identified. Next, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses of the DEGs were performed with DAVID. Furthermore, the correlation-based feature subset (CFS) method was applied to select key DEGs. Finally, a Support Vector Machine (SVM) classification model distinguishing glioblastoma samples from controls was built on the selected key genes.
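The CFS step can be illustrated with a minimal sketch. CFS scores a candidate subset S of k genes by Hall's merit, Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean gene-class correlation and r_ff is the mean gene-gene correlation within S, so it rewards genes that track the class label and penalises genes that are redundant with one another. The sketch assumes absolute Pearson correlation against a 0/1 class label and a simple greedy forward search; both are simplifications for illustration (Hall's formulation typically uses symmetrical uncertainty on discretized attributes, and CFS is usually paired with a best-first search), not the configuration used in this study. The helper names, gene names, and expression values are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, features, labels):
    """Hall-style merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute correlation between each gene and the class label.
    r_cf = sum(abs(pearson(features[g], labels)) for g in subset) / k
    # Mean absolute correlation over all gene pairs (redundancy term).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_cfs(features, labels):
    """Forward search (hypothetical helper): add the gene that most raises
    the merit; stop as soon as no remaining gene improves it."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        candidate = max(sorted(remaining),
                        key=lambda g: cfs_merit(selected + [g], features, labels))
        merit = cfs_merit(selected + [candidate], features, labels)
        if merit <= best:
            break
        selected.append(candidate)
        best = merit
        remaining.remove(candidate)
    return selected, best


# Hypothetical toy expression values for four samples.
features = {
    "gene_A": [0.0, 0.0, 1.0, 1.0],  # tracks the class label
    "gene_B": [0.0, 1.0, 1.0, 1.0],  # partly informative, partly redundant with gene_A
    "gene_C": [1.0, 0.0, 1.0, 0.0],  # unrelated to the label
}
labels = [0, 0, 1, 1]  # 0 = control, 1 = glioblastoma

print(greedy_cfs(features, labels))  # → (['gene_A'], 1.0)
```

On this toy data, gene_A alone reaches a merit of 1.0; adding gene_B (merit about 0.888) or gene_C (about 0.707) would lower the score, so the search stops after a single gene. This redundancy penalty is why CFS can return a compact gene panel rather than simply the top-ranked DEGs.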
Results and Discussion: Using the CFS method, 36 DEGs (17 upregulated and 19 downregulated) were selected as feature genes to build the classification model distinguishing glioma samples from control samples. The accuracy of the model was 76.25% in 10-fold cross-validation and 70.3% on the independent test set. In addition, PPP2R2B and CYBB were also among the top 5 hub genes identified in the protein-protein interaction (PPI) network.
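The hub-gene screen can be sketched as a degree ranking over an undirected PPI graph. The abstract does not state which centrality measure was used; degree (the number of distinct interaction partners) is assumed here because it is a common choice, and the function name, interaction list, and gene labels below are hypothetical rather than taken from this study's network.

```python
from collections import defaultdict


def top_hubs(edges, n=5):
    """Rank nodes of an undirected PPI graph by degree (hypothetical helper).

    Ties are broken alphabetically so the ranking is deterministic.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        # Sets collapse duplicate or reversed edges into one partner.
        neighbours[a].add(b)
        neighbours[b].add(a)
    ranked = sorted(neighbours, key=lambda g: (-len(neighbours[g]), g))
    return [(g, len(neighbours[g])) for g in ranked[:n]]


# Hypothetical interactions; ("GENE_B", "GENE_A") repeats an existing edge.
edges = [
    ("GENE_A", "GENE_B"),
    ("GENE_A", "GENE_C"),
    ("GENE_A", "GENE_D"),
    ("GENE_B", "GENE_C"),
    ("GENE_D", "GENE_E"),
    ("GENE_B", "GENE_A"),
]

print(top_hubs(edges, n=3))  # → [('GENE_A', 3), ('GENE_B', 2), ('GENE_C', 2)]
```

Counting distinct partners rather than raw edge records keeps repeated database entries from inflating a gene's hub score.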
Conclusion: This study indicates that the CFS method is a useful tool for identifying key genes in glioblastomas. We also predict that genes such as PPP2R2B and CYBB may be potential biomarkers for the diagnosis of glioblastomas.
Keywords: Feature selection, glioblastomas, correlation-based feature subset, machine learning, support vector machine, K-nearest neighbor.