Abstract
Background: As quality of life improves, people have more time and energy to attend to their own health. Diabetes, one of the most common and fastest-growing diseases, has attracted widespread attention from bioinformatics researchers. It affects people of all ages worldwide and can shorten patients' life spans. It can also lead to serious complications, especially in the elderly, such as cardiovascular and cerebrovascular diseases, stroke, and multiple organ damage. Because of this impact on health, an accurate initial diagnosis is essential: early diagnosis reduces the likelihood that the disease worsens. Identifying and analyzing potential risk factors across different physical attributes can help diagnose diabetes, and the more accurate the diagnosis, the more likely it is that the incidence of complications can be reduced.
Methods: In this paper, we use the open-source NHANES data set to identify and analyze potential risk factors for diabetes, using improved versions of Logistic Regression, SVM, and other machine learning algorithms.
Results: Experimental results show that the improved version of Random Forest performs best, with a classification accuracy of 92%. Age, diabetes in blood relatives, high blood pressure, cholesterol, and BMI are found to be the most important risk factors for diabetes.
Conclusion: The proposed machine learning method can cope with class imbalance and outlier detection problems.
Keywords: Health problems, diabetes, risk factors, machine learning, class imbalance, outlier detection.
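The abstract names class imbalance and outlier detection but does not say how they are handled. As a rough illustration only, the sketch below shows two common generic techniques: inverse-frequency ("balanced") class weights and an interquartile-range outlier rule. These are assumptions chosen for illustration, not the authors' method. The function names `balanced_class_weights` and `iqr_outlier_mask`, and the toy labels and BMI values, are hypothetical and are not taken from NHANES.

```python
from collections import Counter
from statistics import quantiles


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so a classifier that accepts
    per-class weights pays more attention to them.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] as outliers."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v < low or v > high for v in values]


# Hypothetical toy data (made up for illustration, not from NHANES):
# label 1 = diabetic, label 0 = non-diabetic.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
bmi = [22.0, 24.5, 23.1, 27.8, 25.0, 26.3, 21.9, 24.0, 31.2, 95.0]

weights = balanced_class_weights(labels)  # minority class gets the larger weight
outliers = iqr_outlier_mask(bmi)          # True marks a suspected outlier
```

In practice, weights like these would typically be passed to a classifier that supports per-class weighting, and flagged records would be inspected before being removed or corrected.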