Submitted:
10 December 2024
Posted:
11 December 2024
Abstract
This article analyzes machine learning methods for predicting diabetes using clinical data. The study focuses on four models: Logistic Regression, Decision Tree, Gradient Boosting, and XGBoost. Patient data, including age, gender, blood glucose levels, and body mass index, were used to evaluate the accuracy of these models. Data preprocessing techniques such as scaling and normalization were applied to improve results. The findings reveal that ensemble methods like Gradient Boosting and XGBoost outperform traditional models in prediction accuracy. These machine learning approaches not only enhance prediction but also identify key risk factors, aiding in early diagnosis and timely prevention of diabetes.

Keywords:
Introduction
Data and Methods
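The abstract names scaling and normalization as the preprocessing steps but does not say which variants were used. The sketch below assumes the two most common ones, min-max normalization and z-score standardization. The function names and the sample glucose and BMI values are hypothetical and are not taken from the study's dataset.

```python
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical values for illustration only (assumption, not study data).
glucose = [85.0, 110.0, 140.0, 165.0, 200.0]
bmi = [21.5, 24.0, 28.3, 31.0, 36.2]

glucose_norm = min_max_normalize(glucose)  # smallest value -> 0, largest -> 1
bmi_std = standardize(bmi)                 # mean 0, standard deviation 1
```

Scaling of this kind matters mainly for Logistic Regression, whose coefficients depend on feature magnitude; tree-based models such as Decision Tree, Gradient Boosting, and XGBoost are largely insensitive to monotonic rescaling.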
Experiments and Results
Discussion
Conclusion
References
- Roglic G. (ed.). Global report on diabetes. – World Health Organization, 2016.
- Zou H., Hastie T. Regularization and variable selection via the elastic net //Journal of the Royal Statistical Society Series B: Statistical Methodology. – 2005. – Vol. 67. – No. 2. – P. 301-320.
- Breiman L. Random forests //Machine Learning. – 2001. – Vol. 45. – P. 5-32.
- Choi B. G. et al. Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks //Yonsei Medical Journal. – 2019. – Vol. 60. – No. 2. – P. 191.
- Hosmer Jr D. W., Lemeshow S., Sturdivant R. X. Applied logistic regression. – John Wiley & Sons, 2013.
- Chen T., Guestrin C. XGBoost: A scalable tree boosting system //Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. – 2016. – P. 785-794.
- International Diabetes Federation. IDF Diabetes Atlas (9th edition) [Electronic resource].
- Gaso M. S. et al. Utilizing Machine and Deep Learning Techniques for Predicting Re-admission Cases in Diabetes Patients //Proceedings of the International Conference on Computer Systems and Technologies 2024. – 2024. – P. 76-81.
- Emmanuel T. et al. A survey on missing data in machine learning //Journal of Big Data. – 2021. – Vol. 8. – P. 1-37.
- Friedman J. H. Greedy function approximation: a gradient boosting machine //Annals of statistics. – 2001. – P. 1189-1232.
- Shaiakhmetov D. et al. Morphological classification of galaxies using SpinalNet //2021 16th International Conference on Electronics Computer and Computation (ICECCO). – IEEE, 2021. – P. 1-5.
- Toktosunova A. et al. Developing an Artificial Intelligence Tool for Image Generation Using a Unique Dataset with Image-to-Image Functionality //Proceedings of the International Conference on Computer Systems and Technologies 2024. – 2024. – P. 132-136.
- Sadriddin Z., Mekuria R. R., Gaso M. S. Machine Learning Models for Advanced Air Quality Prediction //Proceedings of the International Conference on Computer Systems and Technologies 2024. – 2024. – P. 51-56.

Classification report for Logistic Regression (test set of 231 samples):

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.77 | 0.87 | 0.82 | 150 |
| 1 | 0.68 | 0.52 | 0.59 | 81 |
| accuracy | | | 0.74 | 231 |

Classification report for Decision Tree (test set of 231 samples):

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.78 | 0.87 | 0.83 | 150 |
| 1 | 0.70 | 0.56 | 0.62 | 81 |
| accuracy | | | 0.76 | 231 |

Classification report for Gradient Boosting (test set of 150 samples):

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.88 | 0.84 | 0.86 | 75 |
| 1 | 0.85 | 0.88 | 0.86 | 75 |
| accuracy | | | 0.86 | 150 |

Classification report for XGBoost (test set of 150 samples):

| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 | 0.86 | 0.80 | 0.83 | 75 |
| 1 | 0.81 | 0.87 | 0.84 | 75 |
| accuracy | | | 0.83 | 150 |

Accuracy of the four models (as fractions of correctly classified test samples):

| Method | Logistic Regression | Decision Tree | Gradient Boosting | XGBoost |
|---|---|---|---|---|
| Accuracy | 0.7446 | 0.7619 | 0.8600 | 0.8333 |
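
The precision, recall, F1-score, and accuracy figures reported above all follow from a binary confusion matrix. The sketch below shows that calculation. The source does not publish the confusion matrices, so the counts used here are an assumption: they were inferred from the rounded recall and support values, and they reproduce the reported figures to two decimals. The function name `classification_report` and the dictionary layout are illustrative only.

```python
def classification_report(tn, fp, fn, tp):
    """Per-class (precision, recall, F1) and overall accuracy for binary labels.

    tn, fp, fn, tp are confusion-matrix counts with class 1 as the positive class.
    """
    def prf(correct, predicted, actual):
        precision = correct / predicted if predicted else 0.0
        recall = correct / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f1

    total = tn + fp + fn + tp
    return {
        0: prf(tn, tn + fn, tn + fp),   # class 0: predicted-0 column, actual-0 row
        1: prf(tp, tp + fp, tp + fn),   # class 1: predicted-1 column, actual-1 row
        "accuracy": (tn + tp) / total,
    }


# ASSUMPTION: (tn, fp, fn, tp) counts inferred from the rounded tables,
# not published in the source.
inferred_counts = {
    "Logistic Regression": (130, 20, 39, 42),
    "Decision Tree": (131, 19, 36, 45),
    "Gradient Boosting": (63, 12, 9, 66),
    "XGBoost": (60, 15, 10, 65),
}

reports = {name: classification_report(*c) for name, c in inferred_counts.items()}

for name, rep in reports.items():
    p1, r1, f1 = rep[1]
    print(f"{name}: accuracy={rep['accuracy']:.4f} "
          f"precision(1)={p1:.2f} recall(1)={r1:.2f} f1(1)={f1:.2f}")
```

Under these inferred counts, the accuracy values correspond to 172/231, 176/231, 129/150, and 125/150 correct predictions. The first two models were scored on 231 samples and the last two on 150.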

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).