Submitted: 01 August 2024
Posted: 02 August 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Study Design
2.2. Bayesian Multinomial Mixed Effect Model
2.3. Classification Using Machine Learning Model
2.4. Simulation Study
3. Results
3.1. Bayesian Multinomial Mixed Effect Model Result
3.1.1. Prior Predictive Check
3.1.2. Posterior Distribution
3.1.3. Model Diagnostics Checks
3.2. Classification of Diabetic Status
3.3. Simulation Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References




| Variables | Categories | Frequency | Percentage (%) | Proportion Test (p value) |
|---|---|---|---|---|
| Division | Barisal | 1272 | 10.4 | <0.001 |
| | Chittagong | 1661 | 13.5 | |
| | Dhaka | 1675 | 13.6 | |
| | Khulna | 1665 | 13.6 | |
| | Mymensingh | 1408 | 11.5 | |
| | Rajshahi | 1591 | 13.0 | |
| | Rangpur | 1545 | 12.6 | |
| | Sylhet | 1461 | 11.9 | |
| Residence | Rural | 7837 | 63.8 | <0.001 |
| | Urban | 4441 | 36.2 | |
| Wealth | Poorest | 2419 | 19.7 | <0.001 |
| | Middle | 2315 | 18.9 | |
| | Poorer | 2389 | 19.5 | |
| | Richest | 2389 | 19.5 | |
| | Richer | 2766 | 22.5 | |
| Gender | Male | 6943 | 56.5 | <0.001 |
| | Female | 5335 | 43.5 | |
| Age | Adult | 1174 | 9.6 | <0.001 |
| | Young | 2518 | 20.5 | |
| | Middle | 5465 | 44.5 | |
| | Old | 3121 | 25.4 | |
| Education | Primary | 2025 | 16.5 | <0.001 |
| | No Education | 3009 | 24.5 | |
| | Secondary | 3715 | 30.3 | |
| | Higher | 3529 | 28.7 | |
| Employment | No | 7458 | 60.7 | <0.001 |
| | Yes | 4820 | 39.3 | |
| BMI | Normal Weight | 7285 | 59.3 | <0.001 |
| | Underweight | 537 | 4.4 | |
| | Overweight | 2314 | 18.8 | |
| | Obese | 2142 | 17.4 | |
| Hypertension | No | 2971 | 24.2 | <0.001 |
| | Yes | 9307 | 75.8 | |
| Diabetic Status | Diabetic | 1527 | 12.4 | <0.001 |
| | Normal | 9232 | 75.2 | |
| | Prediabetic | 1519 | 12.4 | |

| Variables | Categories | Diabetic N (%) | Non-Diabetic N (%) | Prediabetic N (%) | Chi Square (p value) |
|---|---|---|---|---|---|
| Division | Barisal | 152 (11.9%) | 925 (72.7%) | 195 (15.3%) | <0.001 |
| | Chittagong | 237 (14.3%) | 1199 (72.2%) | 225 (13.5%) | |
| | Dhaka | 363 (21.7%) | 982 (58.6%) | 330 (19.7%) | |
| | Khulna | 180 (10.8%) | 1322 (79.4%) | 163 (9.8%) | |
| | Mymensingh | 158 (11.2%) | 1093 (77.6%) | 157 (11.2%) | |
| | Rajshahi | 159 (10.0%) | 1288 (81.0%) | 144 (9.1%) | |
| | Rangpur | 100 (6.5%) | 1311 (84.9%) | 134 (8.7%) | |
| | Sylhet | 178 (12.2%) | 1112 (76.1%) | 171 (11.7%) | |
| Residence | Rural | 805 (10.3%) | 6105 (77.9%) | 927 (11.8%) | <0.001 |
| | Urban | 722 (16.3%) | 3127 (70.4%) | 592 (13.3%) | |
| Wealth | Poorest | 178 (7.5%) | 1949 (81.6%) | 262 (11.0%) | <0.001 |
| | Middle | 244 (10.1%) | 1923 (79.5%) | 252 (10.4%) | |
| | Poorer | 186 (8.0%) | 1904 (82.2%) | 225 (9.7%) | |
| | Richest | 606 (21.9%) | 1701 (61.5%) | 459 (16.6%) | |
| | Richer | 313 (13.1%) | 1755 (73.5%) | 321 (13.4%) | |
| Gender | Male | 671 (12.6%) | 4010 (75.2%) | 654 (12.3%) | <0.001 |
| | Female | 856 (12.3%) | 5222 (75.2%) | 865 (12.5%) | |
| Age | Adult | 207 (8.2%) | 2070 (82.2%) | 241 (9.6%) | <0.001 |
| | Young | 627 (11.5%) | 4150 (75.9%) | 688 (12.6%) | |
| | Middle | 519 (16.6%) | 2163 (69.3%) | 439 (14.1%) | |
| | Old | 174 (14.8%) | 849 (72.3%) | 151 (12.9%) | |
| Education | Primary | 459 (12.4%) | 2816 (75.8%) | 440 (11.8%) | <0.001 |
| | No Education | 351 (11.7%) | 2270 (75.4%) | 388 (12.9%) | |
| | Secondary | 444 (12.6%) | 2653 (75.2%) | 432 (12.2%) | |
| | Higher | 273 (13.5%) | 1493 (73.7%) | 259 (12.8%) | |
| Employment | Work | 719 (14.9%) | 3477 (72.1%) | 624 (12.9%) | <0.001 |
| | No Work | 808 (10.8%) | 5755 (77.2%) | 895 (12.0%) | |
| BMI | Normal Weight | 818 (11.2%) | 5617 (77.1%) | 850 (11.7%) | <0.001 |
| | Underweight | 175 (8.2%) | 1734 (81.0%) | 233 (10.9%) | |
| | Overweight | 413 (17.8%) | 1566 (67.7%) | 335 (14.5%) | |
| | Obese | 121 (22.5%) | 315 (58.7%) | 101 (18.8%) | |
| Hypertension | No | 1000 (10.7%) | 7176 (77.1%) | 1131 (12.2%) | <0.001 |
| | Yes | 527 (17.7%) | 2056 (69.2%) | 388 (13.1%) | |
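
As an illustration of how a chi-square p value of this kind can be obtained, the sketch below computes a Pearson chi-square test of independence for the Residence rows of the cross-tabulation above. It assumes the reported test is the standard Pearson test on the raw counts, which the table does not state explicitly; the function name `chi_square_statistic` is hypothetical. With 2 degrees of freedom the p value has a closed form, so only the standard library is needed.

```python
import math

# Residence rows of the cross-tabulation above;
# columns are (diabetic, non-diabetic, prediabetic) counts.
observed = {
    "Rural": [805, 6105, 927],
    "Urban": [722, 3127, 592],
}


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, count in enumerate(row):
            # Expected count under independence of residence and status.
            expected = row_totals[r] * col_totals[c] / grand_total
            stat += (count - expected) ** 2 / expected
    return stat


chi2 = chi_square_statistic(observed)

# A 2 x 3 table has (2 - 1) * (3 - 1) = 2 degrees of freedom, and the
# chi-square survival function with 2 degrees of freedom is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.1f}, p = {p_value:.2e}")
```

Under these assumptions the p value falls well below 0.001, in line with the entry reported for Residence.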

| Predictor | Levels | Diabetic OR | Diabetic SE | Diabetic 95% CI | Prediabetic OR | Prediabetic SE | Prediabetic 95% CI |
|---|---|---|---|---|---|---|---|
| Intercept | | 0.65 | 48.91 | (0.00, 1702.75) | 0.35 | 42.10 | (0.00, 544.57) |
| Gender | Female | (Ref1) | | | | | |
| | Male | 0.97 | 1.04 | (0.90, 1.05) | 0.98 | 1.12 | (0.79, 1.22) |
| Age | Adult | (Ref1) | | | | | |
| | Young | 1.26 | 1.03 | (1.19, 1.34) | 1.95 | 1.15 | (1.48, 2.59) |
| | Middle | 1.20 | 1.06 | (1.06, 1.35) | 1.95 | 1.15 | (1.49, 2.56) |
| | Old | 0.73 | 1.07 | (0.64, 0.84) | 1.95 | 1.31 | (1.15, 3.39) |
| Residence | Urban | (Ref1) | | | | | |
| | Rural | 0.95 | 1.03 | (0.90, 1.01) | 1.22 | 1.19 | (0.87, 1.72) |
| Wealth | Poor | (Ref1) | | | | | |
| | Poorer | 1.09 | 1.05 | (0.99, 1.21) | 0.94 | 1.23 | (0.63, 1.43) |
| | Middle | 1.13 | 1.06 | (1.00, 1.26) | 1.22 | 1.23 | (0.81, 1.86) |
| | Richer | 1.17 | 1.05 | (1.06, 1.30) | 1.38 | 1.23 | (0.92, 2.08) |
| | Richest | 1.22 | 1.05 | (1.11, 1.35) | 1.79 | 1.27 | (1.13, 2.80) |
| Education | No | (Ref1) | | | | | |
| | Primary | 0.57 | 1.34 | (0.32, 1.01) | 1.20 | 1.21 | (0.83, 1.73) |
| | Secondary | 0.73 | 1.35 | (0.40, 1.34) | 1.06 | 1.16 | (0.79, 1.42) |
| | Higher | 0.67 | 1.35 | (0.38, 1.20) | 1.06 | 1.17 | (0.77, 1.45) |
| Employment | No | (Ref1) | | | | | |
| | Yes | 1.14 | 1.06 | (1.02, 1.28) | 0.95 | 1.13 | (0.75, 1.21) |
| BMI | Normal | (Ref1) | | | | | |
| | Underweight | 1.30 | 1.14 | (1.00, 1.67) | 0.97 | 1.15 | (0.74, 1.27) |
| | Overweight | 1.36 | 1.05 | (1.25, 1.51) | 1.08 | 1.12 | (0.87, 1.34) |
| | Obese | 1.26 | 1.03 | (1.19, 1.34) | 1.31 | 1.19 | (0.95, 1.84) |
| Hypertension | No | (Ref1) | | | | | |
| | Yes | 3.13 | 1.35 | (1.75, 5.64) | 0.98 | 1.12 | (0.79, 1.21) |
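
The SE values in the table above sit close to 1, which suggests they may be reported on the exponentiated (odds ratio) scale rather than the log-odds scale. The sketch below checks that reading for two diabetic-response rows. It rests on two assumptions not stated in the table: that SE is the exponentiated standard error of the log-odds coefficient, and that the interval is roughly symmetric on the log scale (a normal approximation, whereas a Bayesian model would typically report posterior quantiles). The function name `ratio_scale_interval` is hypothetical.

```python
import math


def ratio_scale_interval(odds_ratio, ratio_scale_se, z=1.96):
    """Approximate 95% interval for an odds ratio.

    Assumes (not confirmed by the table) that `ratio_scale_se` is the
    exponentiated standard error of the log-odds coefficient.
    """
    log_or = math.log(odds_ratio)
    log_se = math.log(ratio_scale_se)
    return (math.exp(log_or - z * log_se), math.exp(log_or + z * log_se))


# Rows copied from the diabetic-response columns of the table above:
# (label, OR, SE, reported 95% interval).
rows = [
    ("Gender: Male", 0.97, 1.04, (0.90, 1.05)),
    ("Hypertension: Yes", 3.13, 1.35, (1.75, 5.64)),
]

for label, odds_ratio, se, reported in rows:
    low, high = ratio_scale_interval(odds_ratio, se)
    print(f"{label}: approx ({low:.2f}, {high:.2f}), reported {reported}")
```

For both rows the approximation lands within rounding distance of the reported interval, which is consistent with, though not proof of, the ratio-scale reading of the SE column.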

| Models | Accuracy: Model 1 | Accuracy: Model 2 |
|---|---|---|
| Logistic Regression | 0.752 | 0.875 |
| Decision Tree | 0.752 | 0.875 |
| KNN | 0.753 | 0.875 |
| Linear Discriminant Analysis | 0.752 | 0.875 |
| Random Forest | 0.743 | 0.869 |

| Models | Trinomial 40-30-30 (%) | Trinomial 40-40-20 (%) | Trinomial 50-30-20 (%) | Trinomial 60-30-10 (%) | Trinomial 70-20-10 (%) | Trinomial 80-10-10 (%) | Binomial 50-50 (%) | Binomial 60-40 (%) | Binomial 70-30 (%) | Binomial 80-20 (%) | Binomial 90-10 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LR | 0.389 | 0.390 | 0.503 | 0.608 | 0.705 | 0.799 | 0.499 | 0.608 | 0.705 | 0.799 | 0.900 |
| KNN | 0.363 | 0.364 | 0.394 | 0.493 | 0.588 | 0.729 | 0.500 | 0.532 | 0.606 | 0.737 | 0.881 |
| RF | 0.356 | 0.393 | 0.463 | 0.580 | 0.695 | 0.796 | 0.506 | 0.568 | 0.684 | 0.792 | 0.898 |
| DT | 0.394 | 0.406 | 0.507 | 0.608 | 0.705 | 0.799 | 0.493 | 0.608 | 0.705 | 0.799 | 0.900 |
| LDA | 0.389 | 0.389 | 0.507 | 0.608 | 0.705 | 0.799 | 0.499 | 0.608 | 0.705 | 0.799 | 0.900 |
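
One way to read the accuracies in the table above is against a majority-class baseline, that is, the accuracy of always predicting the most frequent class. The sketch below compares the logistic regression (LR) row with that baseline for each simulated class distribution. The comparison is an illustrative reading of the table, not a procedure taken from the paper, and the function name `majority_baseline` is hypothetical.

```python
def majority_baseline(class_shares):
    """Accuracy obtained by always predicting the most common class."""
    return max(class_shares) / sum(class_shares)


# LR accuracies copied from the table above, keyed by the simulated
# class distribution in percent.
lr_accuracy = {
    (40, 30, 30): 0.389,
    (40, 40, 20): 0.390,
    (50, 30, 20): 0.503,
    (60, 30, 10): 0.608,
    (70, 20, 10): 0.705,
    (80, 10, 10): 0.799,
    (50, 50): 0.499,
    (60, 40): 0.608,
    (70, 30): 0.705,
    (80, 20): 0.799,
    (90, 10): 0.900,
}

# Difference between reported accuracy and the baseline, per distribution.
gaps = {}
for shares, accuracy in lr_accuracy.items():
    baseline = majority_baseline(shares)
    gaps[shares] = accuracy - baseline
    label = "-".join(str(s) for s in shares)
    print(f"{label}: LR = {accuracy:.3f}, baseline = {baseline:.3f}")
```

In every column the LR accuracy stays within about 0.01 of the majority share, so accuracy in this table tracks class imbalance closely.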

| | 40-30-30 (%) | 40-40-20 (%) | 50-30-20 (%) | 60-30-10 (%) | 70-20-10 (%) | 80-10-10 (%) | 50-50 (%) | 60-40 (%) | 70-30 (%) | 80-20 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| 40-40-20 (%) | 1.000 | - | - | - | - | - | - | - | - | - |
| 50-30-20 (%) | 0.984 | 0.999 | - | - | - | - | - | - | - | - |
| 60-30-10 (%) | 0.647 | 0.874 | 0.999 | - | - | - | - | - | - | - |
| 70-20-10 (%) | 0.082 | 0.214 | 0.744 | 0.994 | - | - | - | - | - | - |
| 80-10-10 (%) | 0.003 | 0.013 | 0.153 | 0.647 | 0.997 | - | - | - | - | - |
| 50-50 (%) | 0.984 | 0.999 | 1.000 | 0.999 | 0.744 | 0.153 | - | - | - | - |
| 60-40 (%) | 0.579 | 0.828 | 0.998 | 1.000 | 0.997 | 0.713 | 0.998 | - | - | - |
| 70-30 (%) | 0.082 | 0.214 | 0.744 | 0.994 | 1.000 | 0.997 | 0.744 | 0.997 | - | - |
| 80-20 (%) | 0.003 | 0.013 | 0.153 | 0.647 | 0.997 | 1.000 | 0.153 | 0.713 | 0.997 | - |
| 90-10 (%) | 0.000 | 0.001 | 0.018 | 0.192 | 0.852 | 1.000 | 0.018 | 0.237 | 0.852 | 1.000 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).