Submitted: 13 June 2025
Posted: 16 June 2025
Abstract
Keywords:
Introduction
Methods
Data
Data Cleaning and Preprocessing
Summary Statistics
Feature Engineering
Modeling Strategy and SHAP-Based Interpretation
Results
Exploratory Analysis and Correlation Insights

Model Performance Using XGBoost

Feature Importance and Ranking

SHAP Summary and Global Explanation

SHAP Dependence and Interaction Effects

Force and Waterfall Plots

Interactions and Local Impact Visualizations
Effect Size Based on Correlation Analysis

Extended SHAP Interpretations

SHAP Summary and Class-Level Feature Impact

Cholesterol Check Feature Dependence

Force Plot Interpretation – Individual Risk Breakdown


Discussion
Conclusions
Future Work
Data Availability Statement
Code Availability Statement
Appendix



Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).