Current Bioinformatics

ISSN (Print): 1574-8936
ISSN (Online): 2212-392X

Research Article

Diabetes Induced Factors Prediction Based on Various Improved Machine Learning Methods

Author(s): Jun Wu, Lulu Qu*, Guoping Yang and Nan Han*

Volume 17, Issue 3, 2022

Published on: 29 December, 2021

Pages: 254-262 (9 pages)

DOI: 10.2174/1574893616666211130125206

Abstract

Background: As quality of life improves, people have more time and energy to attend to their own health. Diabetes, one of the most common and fastest-growing diseases, has attracted widespread attention from experts in bioinformatics. It affects people of all ages worldwide and can shorten patients' life spans. Diabetes can also lead to serious complications, especially in the elderly, such as cardiovascular and cerebrovascular diseases, stroke, and multiple organ damage. Because of this impact on human health, an accurate initial diagnosis is essential, and early diagnosis can reduce the likelihood of deterioration. Identifying and analyzing potential risk factors across different physical attributes can help in diagnosing diabetes. The more accurate the diagnosis, the more likely it is that the incidence of complications can be reduced.

Methods: In this paper, we use the open-source NHANES data set to identify and analyze potential risk factors for diabetes using improved versions of Logistic Regression, SVM, and other machine learning algorithms.

Results: Experimental results show that the improved version of Random Forest performs best, with a classification accuracy of 92%. Age, diabetes in blood relatives, high blood pressure, cholesterol, and BMI are found to be the most important risk factors related to diabetes.

Conclusion: The proposed machine learning methods can cope with class imbalance and outlier detection problems.
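
The abstract does not describe how class imbalance and outliers are handled, so the following is only a minimal sketch of two common techniques and not the authors' method. It assumes an interquartile-range (IQR) fence for flagging outliers and inverse-frequency ("balanced") class weights for imbalance. The toy records, the function names, and the 1.5 multiplier are all hypothetical and are not drawn from NHANES.

```python
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) fences; values outside them are treated as outliers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy records: (BMI, has_diabetes). Illustrative only.
records = [(22.0, 0), (24.5, 0), (27.1, 0), (23.3, 0), (26.0, 0),
           (25.2, 0), (31.4, 1), (33.0, 1), (95.0, 0), (28.7, 0)]

# Step 1: drop records whose BMI falls outside the IQR fences.
low, high = iqr_bounds([bmi for bmi, _ in records])
kept = [(bmi, y) for bmi, y in records if low <= bmi <= high]

# Step 2: compute per-class weights so the minority class counts for more.
weights = balanced_class_weights([y for _, y in kept])
```

In this sketch the implausible BMI of 95.0 is filtered out, and the minority (diabetic) class receives a larger weight than the majority class. Such weights could then be passed to a classifier that accepts per-class weighting.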

Keywords: Health problems, diabetes, risk factors, machine learning, class imbalance, outlier detection.

© 2024 Bentham Science Publishers