Current Women's Health Reviews

ISSN (Print): 1573-4048
ISSN (Online): 1875-6581

Research Article

Risk Factors Identification and Prediction of Anemia among Women in Bangladesh using Machine Learning Techniques

Author(s): Md. Merajul Islam, Md. Jahanur Rahman, Dulal Chandra Roy, Md. Moidul Islam, Most. Tawabunnahar, N.A.M. Faisal Ahmed and Md. Maniruzzaman*

Volume 18, Issue 1, 2022

Published on: 15 February, 2021

Article ID: e041021191430 Pages: 16

DOI: 10.2174/1573404817666210215161108

Abstract

Background: Anemia is a major public health problem with rising prevalence worldwide, including in Bangladesh.

Objectives: To identify the risk factors for anemia among women in Bangladesh and to predict anemia using Machine Learning (ML)-based techniques.

Methods: The anemia dataset, comprising 3,020 respondents, was extracted from the Bangladesh Demographic and Health Survey (BDHS). Two feature selection techniques, Logistic Regression (LR) and Random Forest (RF), were used to determine the risk factors for anemia. Eight ML-based techniques, namely LR, Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Quadratic Discriminant Analysis (QDA), Neural Network (NN), Classification And Regression Tree (CART), and RF, were then used to predict anemia among women in Bangladesh. Classification accuracy and Area Under the Curve (AUC) were used to evaluate the performance of these classifiers.
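
As an illustration of the two evaluation measures named above, the following minimal sketch computes classification accuracy and AUC from scratch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical, and AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the BDHS): 1 = anemic, 0 = not anemic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.3f}, AUC={auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank the cases, which is one reason the two are often reported together.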

Results: Of the 15 factors considered, LR-based feature selection identifies 13 and RF-based feature selection identifies 14 as significant risk factors for anemia among women. All predictive models achieve their highest classification accuracy (74.10-81.29%) and AUC (0.744-0.819) with the RF-selected features. Among them, the combination of RF-based feature selection with the RF-based classifier gives the highest classification accuracy (81.29%) and AUC (0.819).
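
The sketch below shows one way an RF-based feature selection step can be chained with an RF classifier, in the spirit of the RF-RF combination reported above. It is an illustration under stated assumptions, not the authors' pipeline: it uses scikit-learn, synthetic data in place of the BDHS records, a median-importance selection threshold, an 80/20 train/test split, and 200 trees, none of which are specified in the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the survey data: 3,020 rows and 15 candidate factors.
# The real BDHS variables are NOT used here.
X, y = make_classification(
    n_samples=3020, n_features=15, n_informative=8, n_redundant=2, random_state=0
)

# Assumed 80/20 stratified split; the abstract does not state the protocol.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Step 1: keep features whose RF importance is at least the median importance
# (an assumed criterion chosen only for illustration).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0), threshold="median"
)
# Step 2: fit a second RF on the selected features only.
classifier = RandomForestClassifier(n_estimators=200, random_state=0)

model = make_pipeline(selector, classifier)
model.fit(X_train, y_train)

n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
accuracy = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"selected={n_selected}, accuracy={accuracy:.3f}, AUC={auc:.3f}")
```

Putting the selector inside the pipeline means the feature importances are learned from the training split only, which avoids leaking information from the held-out data into the selection step.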

Conclusion: Of the eight predictive models, the RF-RF combination shows the best performance for the prediction of anemia. This study suggests that policymakers could use the RF-RF combination to make appropriate decisions for controlling anemia among Bangladeshi women, saving time and reducing cost.

Keywords: Non-pregnant women of childbearing age, risk factors, identification, model, LR, RF, prediction, anemia, machine learning, Bangladesh.


© 2024 Bentham Science Publishers