Abstract
Background: Type 1 diabetes is a chronic autoimmune disease characterized by insulin deficiency resulting from pancreatic β-cell loss, which leads to hyperglycaemia.
Objective: There is currently no cure for type 1 diabetes, and patients depend on lifelong insulin injections. Identifying potential diagnostic biomarkers by analysing large-scale data with bioinformatics tools and machine learning is therefore important for this disease.
Methods: We collected two mRNA expression datasets of peripheral blood samples from patients with type 1 diabetes from the Gene Expression Omnibus (GEO), screened for differentially expressed genes (DEGs) using R, and performed Gene Ontology (GO) and KEGG pathway enrichment analyses on the DEGs. The STRING database and Cytoscape were then used to build a protein-protein interaction (PPI) network and to predict hub genes. Finally, we constructed a logistic regression model from the hub genes to predict sample type.
Results: Bioinformatic analysis of the GEO datasets revealed 92 DEGs in GSE50098 and 75 DEGs in GSE9006, with 10 DEGs common to both. The PPI network of these 10 overlapping DEGs identified 7 hub genes: EGR1, LTF, CXCL1, TNFAIP6, PGLYRP1, CHI3L1 and CAMP. We built a logistic regression model based on these hub genes and then reduced it to a three-gene model (LTF, CAMP and PGLYRP1). The area under the curve (AUC) was 0.8452 for the training set (GSE50098) and 0.8083 for the testing set (GSE9006), indicating that the model performs well.
Conclusion: The integrated bioinformatic analysis of gene expression in type 1 diabetes and the logistic regression model built in this study may offer a promising diagnostic approach for type 1 diabetes.
Keywords: Type 1 diabetes, bioinformatic analysis, logistic regression model, diagnosis, GO, KEGG.
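
For illustration only, the model-building and evaluation step summarized above (a three-feature logistic regression scored by AUC on a training set and a separate testing set) can be sketched in Python. This is a minimal sketch under stated assumptions, not the study's pipeline: the study used R, the data below are randomly simulated rather than taken from GSE50098 or GSE9006, and the names `simulate`, `fit_logistic`, `predict` and `auc` are hypothetical helpers introduced here. The three simulated features merely stand in for the expression values of three genes.

```python
import math
import random


def auc(labels, scores):
    """AUC as the probability that a random positive sample scores
    higher than a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(X, w, b):
    """Predicted probability of the positive class for each sample."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent on the log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi, pi in zip(X, y, predict(X, w, b)):
            err = pi - yi  # gradient of the log-loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def simulate(n, rng):
    """ASSUMPTION: synthetic data, not real expression values. Positive
    samples have each of the 3 features shifted upward by 1.0."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(1.0 * label, 1.0) for _ in range(3)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = simulate(100, rng)  # stand-in for a training cohort
X_test, y_test = simulate(60, rng)     # stand-in for an independent test cohort

w, b = fit_logistic(X_train, y_train)
train_auc = auc(y_train, predict(X_train, w, b))
test_auc = auc(y_test, predict(X_test, w, b))
print(f"train AUC = {train_auc:.3f}, test AUC = {test_auc:.3f}")
```

Fitting on one cohort and scoring on another, as in the sketch, is what makes the test AUC an estimate of how well the model generalizes rather than how well it memorized the training samples.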