Submitted: 21 June 2025
Posted: 26 June 2025
Abstract
Keywords:
1. Introduction
2. Related Works
2.1. Synopsis of Diabetes Mellitus
2.2. Existing Comparative Analysis of ML, DL, and Ensemble Models for DM Prediction
3. Materials and Methods
3.1. Sampling Techniques for Dataset Imbalance
3.1.1. Oversampling Techniques
- a) Synthetic Minority Oversampling Technique (SMOTE): SMOTE balances the class distribution by creating artificial samples for the minority class. Instead of duplicating existing samples, it generates new instances by selecting one of a sample's k nearest minority-class neighbours and interpolating between the two with a random factor to promote diversity [48]. SMOTE is represented as:
  x_new = x_i + λ(x_nn − x_i), λ ∈ [0, 1],
  where x_i is a minority-class sample and x_nn is one of its k nearest minority-class neighbours.
- b) Adaptive Synthetic Sampling (ADASYN): ADASYN, an adaptive extension of SMOTE, emphasizes hard-to-learn minority class samples by assigning greater weights to those near the decision boundary or surrounded by majority class samples. It generates synthetic data in these difficult regions, improving model robustness and refining the decision boundary in imbalanced datasets. Mathematically, the number of synthetic samples generated for each minority sample x_i is:
  g_i = r̂_i · G, where r̂_i = r_i / Σ_j r_j and r_i = Δ_i / k,
  with Δ_i the number of majority-class samples among the k nearest neighbours of x_i and G the total number of synthetic samples to be generated.
- c) SMOTE-ENN and Random Oversampling are further techniques for addressing class imbalance. SMOTE-ENN sharpens decision boundaries by generating synthetic samples for the minority class and then removing ambiguous instances with Edited Nearest Neighbours [49,50]. Random Oversampling, by contrast, enlarges the minority class by duplicating existing samples through sampling with replacement; it is simple and efficient, but the duplication carries a risk of overfitting, so the resampled data should be checked for sufficient diversity and balance [51].
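To make the interpolation step shared by these oversamplers concrete, here is a minimal NumPy sketch of SMOTE-style synthetic sample generation. The `smote_sample` helper and its brute-force neighbour search are illustrative assumptions for exposition, not the library implementation used in this study:

```python
import numpy as np

def smote_sample(X_min, k=5, n_new=100, rng=None):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    For a random minority point x_i and one of its k nearest minority
    neighbours x_nn, emit x_new = x_i + lam * (x_nn - x_i), lam ~ U(0, 1).
    """
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # brute-force pairwise distances among minority samples only
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude each point from its own neighbours
    nn = np.argsort(d, axis=1)[:, :k]      # k nearest-neighbour indices per sample
    new = np.empty((n_new, X_min.shape[1]))
    for j in range(n_new):
        i = rng.integers(n)                # pick a random minority sample
        x_nn = X_min[rng.choice(nn[i])]    # pick one of its k neighbours
        lam = rng.random()                 # random interpolation factor
        new[j] = X_min[i] + lam * (x_nn - X_min[i])
    return new

# toy minority class: four points in the unit square
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(X_min, k=2, n_new=50, rng=0)
print(synthetic.shape)  # (50, 2)
```

Because each synthetic point lies on a segment between two existing minority points, the generated samples stay inside the convex hull of the minority class.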
3.1.2. Undersampling Techniques
3.2. Machine Learning and Deep Learning Techniques Employed
3.2.1. Machine learning (ML)
- c) Decision Trees (DT): This supervised learning method splits data into subsets based on feature values to make predictions. A tree consists of nodes (decisions), branches (outcomes), and leaves (predictions), and uses criteria such as MSE or the Gini Index to determine splits [58].
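The split criterion mentioned above can be illustrated with a short sketch; the `gini` and `best_split` helpers below are illustrative names, showing how a single-feature threshold is chosen by minimising the weighted Gini impurity of the two child nodes:

```python
import numpy as np

def gini(y):
    """Gini impurity: 1 - sum over classes of p_c^2."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Find the threshold on one feature that minimises the
    weighted Gini impurity of the two resulting child nodes."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_g = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                       # no boundary between equal values
        t = (x[i] + x[i - 1]) / 2          # midpoint candidate threshold
        left, right = y[:i], y[i:]
        g = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

# toy example: a glucose-like feature with a binary outcome
x = np.array([85, 90, 110, 140, 150, 165], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
t, g = best_split(x, y)
print(t, g)  # 125.0 0.0 (a perfect split: both children are pure)
```

A full decision tree applies this search recursively over all features, stopping when nodes are pure or a depth limit is reached.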
3.2.2. Deep Learning Models
- a) Convolutional Neural Networks (CNN): CNNs are deep learning models for grid-like data (e.g., images). They use convolutional layers for spatial feature extraction, pooling layers for dimensionality reduction, and fully connected layers for classification or regression, leveraging weight sharing and local connectivity [16,64,65].
- c) Recurrent Neural Networks (RNN): RNNs retain a memory of previous inputs through hidden states, making them suitable for interpreting sequential data and capturing temporal dependencies [16].
- e) Gated Recurrent Unit (GRU): GRUs are a type of RNN that uses gating mechanisms to manage information flow, retaining important historical information while discarding irrelevant details [16].
3.2.3. Hybrid and Ensemble Strategies
3.3. Performance Metrics and Tools
3.3.1. Hyperparameter Tuning
3.3.2. Evaluation Metrics
3.4. Datasets
3.4.1. Dataset 1
3.4.2. Dataset 2
3.4.3. Dataset 3
3.4.4. Dataset 4
3.4.5. Dataset 5
3.5. Preprocessing
- Median Imputation: In each column, zeros are replaced with the median of that column's non-zero values.
- Minimum Imputation: Rather than reflecting actual measurements, the zeros may mean data was not collected, which could indicate that the physiological levels of the patients with missing results were normal. Consequently, we replaced zeros with each column's smallest non-zero value.
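The two imputation rules can be sketched as follows; the `impute_zeros` helper is an illustrative assumption, not the exact preprocessing code used in this study:

```python
import numpy as np

def impute_zeros(X, strategy="median"):
    """Replace zeros column-by-column with the median (or minimum)
    of that column's non-zero entries."""
    X = X.astype(float).copy()
    for j in range(X.shape[1]):
        col = X[:, j]
        nonzero = col[col != 0]
        if nonzero.size == 0:
            continue                      # nothing to impute from
        fill = np.median(nonzero) if strategy == "median" else nonzero.min()
        col[col == 0] = fill              # in-place on the copied array
    return X

# toy column resembling Insulin, with missing values coded as zero
X = np.array([[0.0], [80.0], [0.0], [100.0], [120.0]])
med = impute_zeros(X, "median")   # zeros -> 100.0 (median of 80, 100, 120)
mn = impute_zeros(X, "min")       # zeros ->  80.0 (smallest non-zero value)
print(med.ravel(), mn.ravel())
```

Note that both strategies are computed from non-zero entries only, so the coded-missing zeros never bias the fill value.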
4. Methodology Flow Diagram
5. Results Analysis
5.1. Result Analysis on Dataset 1
5.2. Result Analysis on Dataset 2
5.3. Result Analysis on Dataset 3
5.4. Result Analysis on Dataset 4
5.5. Result Analysis on Dataset 5
6.1. Top-performing Models and Their Implications
6.2. Comparative Analysis of Results with Existing Diabetes Prediction Models
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
| DM | Diabetes Mellitus |
| ML | Machine Learning |
| DL | Deep Learning |
| AU-ROC | Area under the ROC curve |
| KPI | Key Performance Indicators |
| IDF | International Diabetes Federation |
| T1DM | Type 1 DM |
| T2DM | Type 2 DM |
| GDM | Gestational DM |
| RF | Random Forest |
| LR | Logistic Regression |
| XGBoost | Extreme Gradient Boosting |
| NB | Naive Bayes |
| SVM | Support Vector Machine |
| NN | Neural Networks |
| RNN | Recurrent NN |
| CNN | Convolutional NN |
| DNN | Deep NN |
| QML | Quantum ML |
| KNN | k-Nearest Neighbour |
| CVD | Cardiovascular diseases |
| DT | Decision Trees |
| LSTM | Long Short-Term Memory |
| AdaBoost | Adaptive Boosting |
| GRU | Gated Recurrent Unit |
| ANN | Artificial Neural Networks |
| MU | Memory Usage |
| TT | Inference time |
References
- Kavakiotis, I.; Tsave, O.; Salifoglou, A.; Maglaveras, N.; Vlahavas, I.; Chouvarda, I. Machine Learning and Data Mining Methods in Diabetes Research. Comput. Struct. Biotechnol. J. 2017, 15, 104–116. [CrossRef]
- IDF. International Diabetes Federation (IDF) Diabetes Atlas 2021 (IDF Atlas 2021); International Diabetes Federation: Brussels, Belgium, 2021; pp. 1–141.
- Refat, M.A.R.; Amin, M.A.; Kaushal, C.; Yeasmin, M.N.; Islam, M.K. A Comparative Analysis of Early Stage Diabetes Prediction using Machine Learning and Deep Learning Approach. In Proceedings of the 6th IEEE International Conference on Signal Processing, Computing and Control (ISPCC), Solan, India, 7–9 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 654–659. [CrossRef]
- Ayon, I.S.; Islam, M.M. Diabetes Prediction: A Deep Learning Approach. Int. J. Inf. Eng. Electron. Bus. 2019, 11, 21–27. [CrossRef]
- Butt, U.M.; Letchmunan, S.; Ali, M.; Hassan, F.H.; Baqir, A.; Sherazi, H.H.R.; Espino, D. Machine Learning Based Diabetes Classification and Prediction for Healthcare Applications. J. Healthc. Eng. 2021, 2021, 9930985. [CrossRef]
- David, S.A.; Varsha, V.; Ravali, Y.; Naga Amrutha Saranya, N. Comparative Analysis of Diabetes Prediction Using Machine Learning. In Soft Computing for Security Applications; Ranganathan, G., Fernando, X., Piramuthu, S., Eds.; Advances in Intelligent Systems and Computing; Springer: Singapore, 2022; Volume 1428, pp. 155–163, Chapter 13.
- Longato, E.; Fadini, G.P.; Sparacino, G.; Avogaro, A.; Tramontan, L.; Di Camillo, B. A Deep Learning Approach to Predict Diabetes’ Cardiovascular Complications From Administrative Claims. IEEE J. Biomed. Health Inform. 2021, 25, 3608–3617. [CrossRef]
- Saeedi, P.; Petersohn, I.; Salpea, P.; Malanda, B.; Karuranga, S.; Unwin, N.; Colagiuri, S.; Guariguata, L.; Motala, A.A.; Ogurtsova, K.; et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas, 9(th) editions. Diabetes Res. Clin. Pract. 2019, 157, 107843. [CrossRef]
- Zarkogianni, K.; Athanasiou, M.; Thanopoulou, A.C.; Nikita, K.S. Comparison of Machine Learning Approaches Toward Assessing the Risk of Developing Cardiovascular Disease as a Long-Term Diabetes Complication. IEEE J. Biomed. Health Inform. 2018, 22, 1637–1647. [CrossRef]
- Dinh, A.; Miertschin, S.; Young, A.; Mohanty, S.D. A data-driven approach to predicting diabetes and cardiovascular disease with machine learning. BMC Med. Inf. Decis. Mak. 2019, 19, 211. [CrossRef]
- Hasan, M.M.; Ahmad, S.; Ahmed, A.H.; Sayed, A.; Mia, T.; Ayon, E.H.; Koli, T.; Thakur, H.N. Cardiovascular Disease Prediction Through Comparative Analysis of Machine Learning Models. In Proceedings of the 2023 International Conference on Modelling & E-Information Research, Artificial Learning and Digital Applications (ICMERALDA), Karawang, Indonesia, 24 November 2023.
- Lin, X.; Xu, Y.; Pan, X.; Xu, J.; Ding, Y.; Sun, X.; Song, X.; Ren, Y.; Shan, P.F. Global, regional, and national burden and trend of diabetes in 195 countries and territories—An analysis from 1990 to 2025. Sci. Rep. 2020, 10, 14790. [CrossRef]
- Kodama, S.; Fujihara, K.; Horikawa, C.; Kitazawa, M.; Iwanaga, M.; Kato, K.; Watanabe, K.; Nakagawa, Y.; Matsuzaka, T.; Shimano, H.; et al. Predictive ability of current machine learning algorithms for type 2 diabetes mellitus: A meta-analysis. J. Diabetes Investig. 2022, 13, 900–908. [CrossRef]
- Larabi-Marie-Sainte, S.; Aburahmah, L.; Almohaini, R.; Saba, T. Current Techniques for Diabetes Prediction: Review and Case Study. Appl. Sci. 2019, 9, 4604. [CrossRef]
- Islam, S.; Tariq, F. Machine Learning-Enabled Detection and Management of Diabetes Mellitus. In Artificial Intelligence for Disease Diagnosis and Prognosis in Smart Healthcare; Mostefaoui, G.K., Islam, S.M.R., Tariq, F., Eds.; CRC Press: Boca Raton, FL, USA, 2023; pp. 203–218. [CrossRef]
- Afsaneh, E.; Sharifdini, A.; Ghazzaghi, H.; Ghobadi, M.Z. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: A comprehensive review. Diabetol. Metab. Syndr. 2022, 14, 196. [CrossRef]
- Giacomo, C.; Martina, V.; Giovanni, S.; Andrea, F. Continuous Glucose Monitoring Sensors for Diabetes Management—A Review of Technologies and Applications. Diabetes Metab. J. 2019, 43, 383–397. [CrossRef]
- Nomura, A.; Noguchi, M.; Kometani, M.; Furukawa, K.; Yoneda, T. Artificial Intelligence in Current Diabetes Management and Prediction. Curr. Diab Rep. 2021, 21, 61. [CrossRef]
- Guan, Z.; Li, H.; Liu, R.; Cai, C.; Liu, Y.; Li, J.; Wang, X.; Huang, S.; Wu, L.; Liu, D.; et al. Artificial intelligence in diabetes management: Advancements, opportunities, and challenges. Cell Rep. Med. 2023, 4, 101213. [CrossRef]
- Lu, H.Y.; Ding, X.; Hirst, J.E.; Yang, Y.; Yang, J.; Mackillop, L.; Clifton, D.A. Digital Health and Machine Learning Technologies for Blood Glucose Monitoring and Management of Gestational Diabetes. IEEE Rev. Biomed. Eng. 2024, 17, 98–117. [CrossRef]
- Ba, T.; Li, S.; Wei, Y. A data-driven machine learning integrated wearable medical sensor framework for elderly care service. Measurement 2021, 167, 108383. [CrossRef]
- Kakoly, I.J.; Hoque, M.R.; Hasan, N. Data-Driven Diabetes Risk Factor Prediction Using Machine Learning Algorithms with Feature Selection Technique. Sustainability 2023, 15, 4930. [CrossRef]
- Mora, T.; Roche, D.; Rodriguez-Sanchez, B. Predicting the onset of diabetes-related complications after a diabetes diagnosis with machine learning algorithms. Diabetes Res. Clin. Pract. 2023, 204, 110910. [CrossRef]
- Han, B.C.; Kim, J.; Choi, J. Prediction of complications in diabetes mellitus using machine learning models with transplanted topic model features. Biomed. Eng. Lett. 2024, 14, 163–171. [CrossRef]
- Dagliati, A.; Marini, S.; Sacchi, L.; Cogni, G.; Teliti, M.; Tibollo, V.; De Cata, P.; Chiovato, L.; Bellazzi, R. Machine Learning Methods to Predict Diabetes Complications. J. Diabetes Sci. Technol. 2018, 12, 295–302. [CrossRef]
- Ochocinski, D.; Dalal, M.; Black, L.V.; Carr, S.; Lew, J.; Sullivan, K.; Kissoon, N. Life-Threatening Infectious Complications in Sickle Cell Disease: A Concise Narrative Review. Front. Pediatr. 2020, 8, 38. [CrossRef]
- Tan, K.R.; Seng, J.J.B.; Kwan, Y.H.; Chen, Y.J.; Zainudin, S.B.; Loh, D.H.F.; Liu, N.; Low, L.L. Evaluation of Machine Learning Methods Developed for Prediction of Diabetes Complications: A Systematic Review. J. Diabetes Sci. Technol. 2023, 17, 474–489. [CrossRef]
- Chauhan, A.S.; Varre, M.S.; Izuora, K.; Trabia, M.B.; Dufek, J.S. Prediction of Diabetes Mellitus Progression Using Supervised Machine Learning. Sensors 2023, 23, 4658. [CrossRef]
- Skyler, J.S.; Bakris, G.L.; Bonifacio, E.; Darsow, T.; Eckel, R.H.; Groop, L.; Groop, P.-H.; Handelsman, Y.; Insel, R.A.; Mathieu, C.; et al. Differentiation of Diabetes by Pathophysiology, Natural History, and Prognosis. Diabetes 2017, 66, 241–255. [CrossRef]
- Banday, M.Z.; Sameer, A.S.; Nissar, S. Pathophysiology of diabetes—An overview. Avicenna J. Med. 2020, 10, 174–188. [CrossRef]
- Fujimoto, W.Y. The Importance of Insulin Resistance in the Pathogenesis of Type 2 Diabetes Mellitus. Am. J. Med. 2000, 108, 9S–14S. [CrossRef]
- Galicia-Garcia, U.; Benito-Vicente, A.; Jebari, S.; Larrea-Sebal, A.; Siddiqi, H.; Uribe, K.B.; Ostolaza, H.; Martín, C. Pathophysiology of Type 2 Diabetes Mellitus. Int. J. Mol. Sci. 2020, 21, 6275. [CrossRef]
- Agliata, A.; Giordano, D.; Bardozzo, F.; Bottiglieri, S.; Facchiano, A.; Tagliaferri, R. Machine Learning as a Support for the Diagnosis of Type 2 Diabetes. Int. J. Mol. Sci. 2023, 24, 6775. [CrossRef]
- McIntyre, H.D.; Catalano, P.; Zhang, C.; Desoye, G.; Mathiesen, E.R.; Damm, P.; Primers, N.R.D. Gestational diabetes mellitus. Nat. Reviews. Dis. Primers 2019, 5, 47. [CrossRef]
- Plows, J.F.; Stanley, J.L.; Baker, P.N.; Reynolds, C.M.; Vickers, M.H. The Pathophysiology of Gestational Diabetes Mellitus. Int. J. Mol. Sci. 2018, 19, 3342. [CrossRef]
- Ahmad, R.; Narwaria, M.; Haque, M. Gestational diabetes mellitus prevalence and progression to type 2 diabetes mellitus: A matter of global concern. Adv. Hum. Biol. 2023, 13, 232–237. [CrossRef]
- Mahajan, P.; Uddin, S.; Hajati, F.; Moni, M.A.; Gide, E. A comparative evaluation of machine learning ensemble approaches for disease prediction using multiple datasets. Health Technol. 2024, 14, 597–613. [CrossRef]
- Flores, L.; Hernandez, R.M.; Macatangay, L.H.; Garcia, S.M.G.; Melo, J.R. Comparative analysis in the prediction of early-stage diabetes using multiple machine learning techniques. Indones. J. Electr. Eng. Comput. Sci. 2023, 32, 887. [CrossRef]
- Gupta, H.; Varshney, H.; Sharma, T.K.; Pachauri, N.; Verma, O.P. Comparative performance analysis of quantum machine learning with deep learning for diabetes prediction. Complex. Intell. Syst. 2022, 8, 3073–3087. [CrossRef]
- Aggarwal, N.; Basha, C.B.; Arya, A.; Gupta, N. A Comparative Analysis of Machine Leaming-Based Classifiers for Predicting Diabetes. In Proceedings of the 2023 International Conference on Advanced Computing & Communication Technologies (ICACCTech), Banur, India, 23–24 December 2023.
- Swathy, M.; Saruladha, K. A comparative study of classification and prediction of Cardio-vascular diseases (CVD) using Machine Learning and Deep Learning techniques. ICT Express 2022, 8, 109–116. [CrossRef]
- Fregoso-Aparicio, L.; Noguez, J.; Montesinos, L.; Garcia-Garcia, J.A. Machine learning and deep learning predictive models for type 2 diabetes: A systematic review. Diabetol. Metab. Syndr. 2021, 13, 148. [CrossRef]
- Uddin, S.; Khan, A.; Hossain, M.E.; Moni, M.A. Comparing different supervised machine learning algorithms for disease prediction. BMC Med. Inform. Decis. Mak. 2019, 19, 281. [CrossRef]
- Naz, H.; Ahuja, S. Deep learning approach for diabetes prediction using PIMA Indian dataset. J. Diabetes Metab. Disord. 2020, 19, 391–403. [CrossRef]
- Hasan, M.K.; Alam, M.A.; Das, D.; Hossain, E.; Hasan, M. Diabetes Prediction Using Ensembling of Different Machine Learning Classifiers. IEEE Access 2020, 8, 76516–76531. [CrossRef]
- Sahoo, A.K.; Pradhan, C.; Das, H.; Rout, M.; Das, H.; Rout, J.K. Performance Evaluation of Different Machine Learning Methods and Deep-Learning Based Convolutional Neural Network for Health Decision Making. In Nature Inspired Computing for Data Science; Rout, M., Rout, J.K., Das, H., Eds.; Studies in Computational Intelligence; Springer International Publishing AG: Cham, Switzerland, 2020; Volume 871, pp. 201–212, Chapter 8.
- Lai, H.; Huang, H.; Keshavjee, K.; Guergachi, A.; Gao, X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr. Disord. 2019, 19, 101. [CrossRef]
- Elreedy, D.; Atiya, A.F. A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance. Inf. Sci. 2019, 505, 32–64. [CrossRef]
- Wongvorachan, T.; He, S.; Bulut, O. A Comparison of Undersampling, Oversampling, and SMOTE Methods for Dealing with Imbalanced Classification in Educational Data Mining. Information 2023, 14, 54. [CrossRef]
- Kaur, R.; Sharma, R.; Dhaliwal, M.K. Evaluating Performance of SMOTE and ADASYN to Classify Falls and Activities of Daily Living. In Proceedings of the 12th International Conference on Soft Computing for Problem Solving. SocProS 2023; Pant, M., Deep, K., Nagar, A., Eds.; Lecture Notes in Networks and Systems; Springer: Singapore, 2024; Volume 995. [CrossRef]
- Panigrahi, R.; Kumar, L.; Kuanar, S.K. An Empirical Study to Investigate Different SMOTE Data Sampling Techniques for Improving Software Refactoring Prediction. In Neural Information Processing. ICONIP 2020. Communications in Computer and Information Science; Yang, H., Pasupa, K., Leung, A.C., Kwok, J.T., Chan, J.H., King, I.e., Eds.; Springer: Cham, Switzerland, 2020; Volume 1332, pp. 23–31.
- Sahlaoui, H.; Alaoui, E.A.A.; Agoujil, S.; Nayyar, A. An empirical assessment of smote variants techniques and interpretation methods in improving the accuracy and the interpretability of student performance models. Educ. Inf. Technol. 2023, 29, 5447–5483. [CrossRef]
- Haibo, H.; Yang, B.; Garcia, E.A.; Shutao, L. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, China, 1–8 June 2008.
- Elsoud, E.A.; Hassan, M.; Alidmat, O.; Al Henawi, E.; Alshdaifat, N.; Igtait, M.; Ghaben, A.; Katrawi, A.; Dmour, M. Under Sampling Techniques for Handling Unbalanced Data with Various Imbalance Rates—A Comparative Study. Int. J. Adv. Comput. Sci. Appl. (IJACSA) 2024, 15, 1274–1284.
- Bach, M.; Werner, A. Improvement of Random Undersampling to Avoid Excessive Removal of Points from a Given Area of the Majority Class. In Computational Science—ICCS 2021, Proceedings of the 21st International Conference, Krakow, Poland, 16–18 June 2021; Part III; Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2021; Volume 12744, pp. 172–186. [CrossRef]
- Rekha, G.; Tyagi, A.K.; Reddy, V.K. Performance Analysis of Under-Sampling and Over-Sampling Techniques for Solving Class Imbalance Problem. In Proceedings of the International Conference on Sustainable Computing in Science, Technology & Management (SUSCOM-2019), Jaipur, India, 26–28 February 2019; pp. 1305–1315.
- Joshi, R.D.; Dhakal, C.K. Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. Int. J. Environ. Res. Public. Health 2021, 18, 7346. [CrossRef]
- Maniruzzaman, M.; Rahman, J.; Hasan, A.M.; Suri, H.S.; Abedin, M.; El-Baz, A.; Suri, J.S. Accurate Diabetes Risk Stratification Using Machine Learning: Role of Missing Value and Outliers. J. Med. Syst. 2018, 42, 92. [CrossRef]
- Mittal, S.; Hasija, Y. Applications of Deep Learning in Healthcare and Biomedicine. In Deep Learning Techniques for Biomedical and Health Informatics; Dash, S., Acharya, B.R., Mittal, M., Abraham, A., Kelemen, A., Eds.; Springer International Publishing AG: Cham, Switzerland, 2019; Volume 68, pp. 57–78, Chapter 4.
- Iyer, A.; Jeyalatha, S.; Sumbaly, R. Diagnosis of Diabetes Using Classification Mining Techniques. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1–14. [CrossRef]
- Barik, S.; Mohanty, S.; Mohanty, S.; Singh, D. Analysis of Prediction Accuracy of Diabetes Using Classifier and Hybrid Machine Learning Techniques. In Intelligent and Cloud Computing; Mishra, D., Buyya, R., Mohapatra, P., Patnaik, S., Eds.; Smart Innovation, Systems and Technologies; Springer: Singapore, 2020; pp. 399–409.
- Ganie, S.M.; Malik, M.B.; Arif, T. Performance analysis and prediction of type 2 diabetes mellitus based on lifestyle data using machine learning approaches. J. Diabetes Metab. Disord. 2022, 21, 339–352. [CrossRef]
- Iparraguirre-Villanueva, O.; Espinola-Linares, K.; Castaneda, R.O.F.; Cabanillas-Carbonell, M. Application of Machine Learning Models for Early Detection and Accurate Classification of Type 2 Diabetes. Diagnostics 2023, 13, 2383. [CrossRef]
- Altamimi, A.; Alarfaj, A.A.; Umer, M.; Alabdulqader, E.A.; Alsubai, S.; Kim, T.-H.; Ashraf, I. An automated approach to predict diabetic patients using KNN imputation and effective data mining techniques. BMC Med. Res. Methodol. 2024, 24, 221. [CrossRef]
- Suriya, S.; Muthu, J.J. Type 2 Diabetes Prediction using K-Nearest Neighbor Algorithm. J. Trends Comput. Sci. Smart Technol. 2023, 5, 190–205. [CrossRef]
- Salam, S.S.; Rafi, R. Deep Learning Approach for Sleep Apnea Detection Using Single Lead ECG: Comparative Analysis Between CNN and SNN. In Proceedings of the 2023 26th International Conference on Computer and Information Technology (ICCIT), Cox’s Bazar, Bangladesh, 13–15 December 2023.
- Rahman, M.; Islam, D.; Mukti, R.J.; Saha, I. A deep learning approach based on convolutional LSTM for detecting diabetes. Comput. Biol. Chem. 2020, 88, 107329. [CrossRef]
- Nadesh, R.K.; Arivuselvan, K. Type 2: Diabetes mellitus prediction using Deep Neural Networks classifier. Int. J. Cogn. Comput. Eng. 2020, 1, 55–61. [CrossRef]
- Wadghiri, M.Z.; Idri, A.; Idrissi, T.E.; Hakkoum, H. Ensemble blood glucose prediction in diabetes mellitus: A review. Comput. Biol. Med. 2022, 147, 105674. [CrossRef]
- Guan, Y.; Plotz, T. Ensembles of Deep LSTM Learners for Activity Recognition using Wearables. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2017, 1, 1–28. [CrossRef]
- Shams, M.Y.; Tarek, Z.; Elshewey, A.M. A novel RFE-GRU model for diabetes classification using PIMA Indian dataset. Sci. Rep. 2025, 15, 982. [CrossRef]
- Hossain, M.R.; Hossain, M.J.; Rahman, M.M.; Alam, M.M. Machine Learning Based Prediction and Insights of Diabetes Disease: Pima Indian and Frankfurt Datasets. J. Mech. Contin. Math. Sci. 2025, 20, 99–114. [CrossRef]
- Mousa, A.; Mustafa, W.; Marqas, R.B. A Comparative Study of Diabetes Detection Using the Pima Indian Diabetes Database. J. Univ. Duhok 2023, 26, 277–288. [CrossRef]
- Zargar, O.S.; Bhagat, A.; Teli, T.A.; Sheikh, S. Early Prediction of Diabetes Mellitus on Pima Dataset Using ML And DL Techniques. J. Army Eng. Univ. PLA 2023, 23, 230–249.
- Chang, V.; Bailey, J.; Xu, Q.A.; Sun, Z. Pima Indians diabetes mellitus classification based on machine learning (ML) algorithms. Neural Comput. Appl. 2022, 35, 16157–16173. [CrossRef]
- Xie, Z.; Nikolayeva, O.; Luo, J.; Li, D. Building Risk Prediction Models for Type 2 Diabetes Using Machine Learning Techniques. Prev. Chronic Dis. 2019, 16, E130. [CrossRef]
- Islam, M.M.F.; Ferdousi, R.; Rahman, S.; Bushra, H.Y. Likelihood Prediction of Diabetes at Early Stage Using Data Mining Techniques. In Computer Vision and Machine Intelligence in Medical Image Analysis; Gupta, M., Konar, D., Bhattacharyya, S., Biswas, S., Eds.; Advances in Intelligent Systems and Computing; Springer: Singapore, 2020; Chapter 12, pp. 113–125. [CrossRef]
- Sadhu, A.; Jadli, A. Early-Stage Diabetes Risk Prediction—A Comparative Analysis of Classification Algorithms. Int. Adv. Res. J. Sci. Eng. Technol. 2021, 8, 193–201. [CrossRef]
- Al-Haija, Q.A.; Smadi, M.; Al-Bataineh, O.M. Early Stage Diabetes Risk Prediction via Machine Learning. In Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021); Springer: Cham, Switzerland, 2022; Volume 417, pp. 451–461. [CrossRef]
- Chatrati, S.P.; Hossain, G.; Goyal, A.; Bhan, A.; Bhattacharya, S.; Gaurav, D.; Tiwari, S.M. Smart home health monitoring system for predicting type 2 diabetes and hypertension. J. King Saud. Univ.—Comput. Inf. Sci. 2022, 34, 862–870. [CrossRef]
- Bozkurt, M.R.; Yurtay, N.; Yilmaz, Z.; Sertkaya, C. Comparison of different methods for determining diabetes. Turk. J. Electr. Eng. Comput. Sci. 2014, 22, 1044–1055. [CrossRef]
- Bashir, S.; Qamar, U.; Khan, F.H. IntelliHealth: A medical decision support application using a novel weighted multi-layer classifier ensemble framework. J. Biomed. Inf. 2016, 59, 185–200. [CrossRef]
- Wang, Q.; Cao, W.; Guo, J.; Ren, J.; Cheng, Y.; Davis, D.N. DMP_MI: An Effective Diabetes Mellitus Classification Algorithm on Imbalanced Data with Missing Values. IEEE Access 2019, 7, 102232–102238. [CrossRef]
- Kaur, H.; Kumari, V. Predictive modelling and analytics for diabetes using a machine learning approach. Appl. Comput. Inform. 2020, 18, 90–100. [CrossRef]
- Yuvaraj, N.; SriPreethaa, K.R. Diabetes prediction in healthcare systems using machine learning algorithms on Hadoop cluster. Clust. Comput. 2017, 22, 1–9. [CrossRef]
| Description | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 |
| Source | UCI Machine Learning Repository, Kaggle, and CDC websites |
| Samples | 768 | 2000 | 253,680 | 70,692 | 520 |
| Features | 9 | 9 | 21 | 21 | 17 |
| Positive instances | 268 | 684 | 35,346 | 35,346 | 320 |
| Negative instances | 500 | 1316 | 218,334 | 35,346 | 200 |
| Feature | Zero-value count (Dataset 1) | Zero-value count (Dataset 2) |
| Pregnancies | 111 | 301 |
| Glucose | 5 | 13 |
| BloodPressure | 35 | 90 |
| SkinThickness | 227 | 573 |
| Insulin | 374 | 956 |
| BMI | 11 | 28 |
| DiabetesPedigreeFunction | 0 | 0 |
| Age | 0 | 0 |
| Outcome (Target class) | Dataset 1 | Dataset 2 | Dataset 3 | Dataset 4 | Dataset 5 |
| 0 | 400 | 1053 | 213,703 | 218,334 | 200 |
| 1 | 214 | 547 | 4,631 | 35,346 | 320 |
| 2 | - | - | 35,346 | - | - |
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC | R2 | MSE | MAE | RMSE | TT | MU | NOP |
| AdaBoost | 0.7987 | 0.6716 | 0.8333 | 0.7438 | 0.8386 | 0.1159 | 0.2013 | 0.2013 | 0.4487 | 1.1812 | 0.0 B | 987 |
| Bagging AdaBoost | 0.7857 | 0.6615 | 0.7963 | 0.7227 | 0.8439 | 0.0589 | 0.2143 | 0.2143 | 0.4629 | 2.1976 | 80.0 kB | 1392 |
| RNN | 0.7792 | 0.6471 | 0.8148 | 0.7213 | 0.8202 | 0.0304 | 0.2208 | 0.2208 | 0.4699 | 7.9969 | 76.0 kB | 7831 |
| Bagging DNN | 0.7727 | 0.6338 | 0.8333 | 0.72 | 0.8198 | 0.0019 | 0.2273 | 0.2273 | 0.4767 | 60.4006 | 1228.0 kB | 110652 |
| RF | 0.7792 | 0.6563 | 0.7778 | 0.7119 | 0.8304 | 0.0304 | 0.2208 | 0.2208 | 0.4699 | 0.4222 | 36.0 kB | 6269 |
| Bagging XGBoost | 0.7727 | 0.6418 | 0.7963 | 0.7107 | 0.828 | 0.0019 | 0.2273 | 0.2273 | 0.4767 | 1.4211 | 0.0 B | 19164 |
| XGBoost | 0.7662 | 0.6286 | 0.8148 | 0.7097 | 0.8381 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 0.395 | 0.0 B | 1618 |
| Stacking Classifier | 0.7727 | 0.6462 | 0.7778 | 0.7059 | 0.8302 | 0.0019 | 0.2273 | 0.2273 | 0.4767 | 60.8967 | 428.0 kB | 26603 |
| Bagging RF | 0.7662 | 0.6324 | 0.7963 | 0.7049 | 0.832 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 9.6091 | 24.0 kB | 243870 |
| LR-MLP | 0.7662 | 0.6364 | 0.7778 | 0.7000 | 0.8248 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 4.4264 | 0.0 B | 10 |
| LR | 0.7597 | 0.6269 | 0.7778 | 0.6942 | 0.8244 | -0.0552 | 0.2403 | 0.2403 | 0.4902 | 0.2405 | 0.0 B | 9 |
| XGBoost-CNN | 0.7662 | 0.6452 | 0.7407 | 0.6897 | 0.8126 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 6.3776 | 0.0 B | 20809 |
| SVM | 0.7597 | 0.6308 | 0.7593 | 0.6891 | 0.8271 | -0.0552 | 0.2403 | 0.2403 | 0.4902 | 0.9922 | 0.0 B | 8 |
| Bagging RNN | 0.7468 | 0.6056 | 0.7963 | 0.688 | 0.8128 | -0.1122 | 0.2532 | 0.2532 | 0.5032 | 42.0664 | 492.0 kB | 113785 |
| XGBoost-LSTM | 0.7338 | 0.5867 | 0.8148 | 0.6822 | 0.8145 | -0.1693 | 0.2662 | 0.2662 | 0.516 | 9.9012 | 0.0 B | 3168 |
| RF-GRU | 0.7403 | 0.6000 | 0.7778 | 0.6774 | 0.8189 | -0.1407 | 0.2597 | 0.2597 | 0.5096 | 7.5671 | 108.0 kB | 28314 |
| KNN | 0.7338 | 0.589 | 0.7963 | 0.6772 | 0.806 | -0.1693 | 0.2662 | 0.2662 | 0.516 | 0.1758 | 16.0 kB | 6288 |
| RF-CNN | 0.7338 | 0.5915 | 0.7778 | 0.672 | 0.8174 | -0.1693 | 0.2662 | 0.2662 | 0.516 | 4.1751 | 40.0 kB | 5572 |
| SVM-RNN | 0.7338 | 0.5915 | 0.7778 | 0.672 | 0.8152 | -0.1693 | 0.2662 | 0.2662 | 0.516 | 6.396 | 0.0 B | 9 |
| AdaBoost-DBN | 0.7208 | 0.5714 | 0.8148 | 0.6718 | 0.7947 | -0.2263 | 0.2792 | 0.2792 | 0.5284 | 24.0248 | 4.0 kB | 1491 |
| KNN-Autoencoders | 0.6948 | 0.5432 | 0.8148 | 0.6519 | 0.7839 | -0.3404 | 0.3052 | 0.3052 | 0.5524 | 10.054 | 0.0 B | 24366 |
| NB | 0.7208 | 0.5797 | 0.7407 | 0.6504 | 0.7804 | -0.2263 | 0.2792 | 0.2792 | 0.5284 | 0.2365 | 0.0 B | 34 |
| DNN | 0.7338 | 0.6032 | 0.7037 | 0.6496 | 0.807 | -0.1693 | 0.2662 | 0.2662 | 0.516 | 5.0588 | 92.0 kB | 8067 |
| DT-CNN | 0.7013 | 0.5526 | 0.7778 | 0.6462 | 0.712 | -0.3119 | 0.2987 | 0.2987 | 0.5465 | 5.1779 | 0.0 B | 81 |
| CNN | 0.7208 | 0.5846 | 0.7037 | 0.6387 | 0.807 | -0.2263 | 0.2792 | 0.2792 | 0.5284 | 4.9617 | 68.0 kB | 1579 |
| DT | 0.7208 | 0.5873 | 0.6852 | 0.6325 | 0.7915 | -0.2263 | 0.2792 | 0.2792 | 0.5284 | 0.2345 | 0.0 B | 31 |
| LSTM | 0.6818 | 0.5424 | 0.5926 | 0.5664 | 0.7085 | -0.3974 | 0.3182 | 0.3182 | 0.5641 | 18.418 | 2052.0 kB | 64639 |
| GRU | 0.6753 | 0.5333 | 0.5926 | 0.5614 | 0.7081 | -0.4259 | 0.3247 | 0.3247 | 0.5698 | 20.1196 | 1404.0 kB | 34126 |
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC | R2 | MSE | MAE | RMSE | TT | MU | NOP |
| RF | 0.7857 | 0.6567 | 0.8148 | 0.7273 | 0.8341 | 0.0589 | 0.2143 | 0.2143 | 0.4629 | 0.6486 | 32.0 kB | 13697 |
| Bagging DNN | 0.7792 | 0.6429 | 0.8333 | 0.7258 | 0.8283 | 0.0304 | 0.2208 | 0.2208 | 0.4699 | 69.5682 | 1256.0 kB | 154790 |
| Bagging RNN | 0.7792 | 0.6429 | 0.8333 | 0.7258 | 0.8207 | 0.0304 | 0.2208 | 0.2208 | 0.4699 | 56.3799 | 1452.0 kB | 31885 |
| Bagging AdaBoost | 0.7792 | 0.6471 | 0.8148 | 0.7213 | 0.8446 | 0.0304 | 0.2208 | 0.2208 | 0.4699 | 5.1711 | 8.0 kB | 3036 |
| AdaBoost | 0.7727 | 0.6338 | 0.8333 | 0.7200 | 0.8401 | 0.0019 | 0.2273 | 0.2273 | 0.4767 | 1.8813 | 0.0 B | 1500 |
| XGBoost | 0.7727 | 0.6338 | 0.8333 | 0.7200 | 0.8376 | 0.0019 | 0.2273 | 0.2273 | 0.4767 | 0.4041 | 0.0 B | 1272 |
| Stacking Classifier | 0.7662 | 0.6286 | 0.8148 | 0.7097 | 0.8344 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 52.1551 | 320.0 kB | 57376 |
| Bagging XGBoost | 0.7857 | 0.6780 | 0.7407 | 0.7080 | 0.8274 | 0.0589 | 0.2143 | 0.2143 | 0.4629 | 3.1946 | 4.0 kB | 118269 |
| DNN | 0.7662 | 0.6324 | 0.7963 | 0.7049 | 0.8170 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 7.7426 | 48.0 kB | 11339 |
| RF-GRU | 0.7597 | 0.6197 | 0.8148 | 0.7040 | 0.8161 | -0.0552 | 0.2403 | 0.2403 | 0.4902 | 8.3945 | 60.0 kB | 27302 |
| RF-CNN | 0.7468 | 0.5974 | 0.8519 | 0.7023 | 0.8159 | -0.1122 | 0.2532 | 0.2532 | 0.5032 | 7.3780 | 40.0 kB | 4184 |
| LR-MLP | 0.7532 | 0.6143 | 0.7963 | 0.6935 | 0.8222 | -0.0837 | 0.2468 | 0.2468 | 0.4967 | 14.3566 | 28.0 kB | 10 |
| Bagging RF | 0.7662 | 0.6452 | 0.7407 | 0.6897 | 0.8298 | -0.0267 | 0.2338 | 0.2338 | 0.4835 | 11.5716 | 12024.0 kB | 342125 |
| SVM | 0.7532 | 0.6176 | 0.7778 | 0.6885 | 0.8219 | -0.0837 | 0.2468 | 0.2468 | 0.4967 | 0.2106 | 0.0 B | 8 |
| XGBoost-LSTM | 0.7338 | 0.5844 | 0.8333 | 0.6870 | 0.8228 | -0.1693 | 0.2662 | 0.2662 | 0.5160 | 9.3506 | 40.0 kB | 3214 |
| SVM-RNN | 0.7468 | 0.6087 | 0.7778 | 0.6829 | 0.8148 | -0.1122 | 0.2532 | 0.2532 | 0.5032 | 5.1516 | 0.0 B | 9 |
| LR | 0.7468 | 0.6119 | 0.7593 | 0.6777 | 0.8215 | -0.1122 | 0.2532 | 0.2532 | 0.5032 | 0.4155 | 0.0 B | 9 |
| KNN | 0.7273 | 0.5789 | 0.8148 | 0.6769 | 0.7935 | -0.1978 | 0.2727 | 0.2727 | 0.5222 | 0.1915 | 36.0 kB | 6304 |
| AdaBoost-DBN | 0.7273 | 0.5833 | 0.7778 | 0.6667 | 0.8135 | -0.1978 | 0.2727 | 0.2727 | 0.5222 | 24.3952 | 0.0 B | 1314 |
| XGBoost-CNN | 0.7532 | 0.6333 | 0.7037 | 0.6667 | 0.7983 | -0.0837 | 0.2468 | 0.2468 | 0.4967 | 8.3154 | 0.0 B | 10900 |
| KNN-Autoencoders | 0.7273 | 0.5857 | 0.7593 | 0.6613 | 0.7693 | -0.1978 | 0.2727 | 0.2727 | 0.5222 | 10.7923 | 84.0 kB | 10244 |
| RNN | 0.7338 | 0.5970 | 0.7407 | 0.6612 | 0.8087 | -0.1693 | 0.2662 | 0.2662 | 0.5160 | 9.4642 | 436.0 kB | 3539 |
| DT | 0.7597 | 0.6667 | 0.6296 | 0.6476 | 0.7770 | -0.0552 | 0.2403 | 0.2403 | 0.4902 | 0.2572 | 0.0 B | 129 |
| DT-CNN | 0.7208 | 0.5902 | 0.6667 | 0.6261 | 0.7525 | -0.2263 | 0.2792 | 0.2792 | 0.5284 | 6.4722 | 0.0 B | 101 |
| NB | 0.6948 | 0.5522 | 0.6852 | 0.6116 | 0.7676 | -0.3404 | 0.3052 | 0.3052 | 0.5524 | 0.1878 | 0.0 B | 34 |
| CNN | 0.6818 | 0.5352 | 0.7037 | 0.6080 | 0.7665 | -0.3974 | 0.3182 | 0.3182 | 0.5641 | 4.4649 | 16.0 kB | 50587 |
| LSTM | 0.6688 | 0.5231 | 0.6296 | 0.5714 | 0.7059 | -0.4544 | 0.3312 | 0.3312 | 0.5755 | 12.0811 | 240.0 kB | 9649 |
| GRU | 0.6883 | 0.5517 | 0.5926 | 0.5714 | 0.7256 | -0.3689 | 0.3117 | 0.3117 | 0.5583 | 13.9710 | 564.0 kB | 1401 |
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC | R2 | MSE | MAE | RMSE | TT | MU | NOP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RNN | 0.7144 | 0.4387 | 0.4982 | 0.4414 | 0.7008 | -0.7099 | 0.8334 | 0.4682 | 0.9129 | 141.7094 | 600.0 kB | 14277 |
| CNN | 0.6984 | 0.4403 | 0.5171 | 0.4411 | 0.7064 | -0.8212 | 0.8877 | 0.4970 | 0.9422 | 77.7925 | 216.0 kB | 31823 |
| DNN | 0.6975 | 0.4397 | 0.5149 | 0.4401 | 0.7055 | -0.8398 | 0.8967 | 0.5006 | 0.9470 | 959.0492 | 10316.0 kB | 19371 |
| AdaBoost | 0.6898 | 0.4337 | 0.5155 | 0.4330 | 0.7128 | -1.0948 | 1.0210 | 0.5471 | 1.0105 | 38.0504 | 0.0 B | 11696 |
| XGBoost | 0.6834 | 0.4301 | 0.5109 | 0.4270 | 0.7143 | -1.1936 | 1.0692 | 0.5674 | 1.0340 | 1.4653 | 4.0 kB | 1244 |
| XGBoost-LSTM | 0.7004 | 0.4301 | 0.5079 | 0.4252 | 0.7184 | -1.2789 | 1.1108 | 0.5700 | 1.0539 | 385.1103 | 272.0 kB | 34830 |
| RF | 0.6755 | 0.4296 | 0.5119 | 0.4251 | 0.7091 | -1.2227 | 1.0834 | 0.5775 | 1.0408 | 11.1173 | 68.0 kB | 738710 |
| RF-CNN | 0.6734 | 0.4302 | 0.5104 | 0.4245 | 0.7107 | -1.1799 | 1.0625 | 0.5719 | 1.0308 | 44.9699 | 2220.0 kB | 281808 |
| RF-GRU | 0.6639 | 0.4307 | 0.5115 | 0.4229 | 0.7097 | -1.1512 | 1.0485 | 0.5736 | 1.0240 | 136.2781 | 1240.0 kB | 269827 |
| DT-CNN | 0.6890 | 0.4227 | 0.4783 | 0.4218 | 0.6566 | -0.9623 | 0.9564 | 0.5261 | 0.9780 | 22.8690 | 0.0 B | 15 |
| LR | 0.6260 | 0.4499 | 0.5147 | 0.4194 | 0.7077 | -0.6358 | 0.7973 | 0.5151 | 0.8929 | 2.9852 | 336.0 kB | 64 |
| LR-MLP | 0.5930 | 0.4561 | 0.5197 | 0.4116 | 0.7118 | -0.6116 | 0.7855 | 0.5332 | 0.8863 | 23.3801 | 20.0 kB | 67 |
| DT | 0.6384 | 0.4235 | 0.4935 | 0.4085 | 0.6876 | -1.2310 | 1.0874 | 0.6036 | 1.0428 | 0.3724 | 0.0 B | 233 |
| NB | 0.6245 | 0.4364 | 0.4892 | 0.4083 | 0.6803 | -0.7259 | 0.8412 | 0.5307 | 0.9172 | 0.2969 | 0.0 B | 129 |
| SVM | 0.5759 | 0.4591 | 0.5116 | 0.4044 | 0.7084 | -0.5697 | 0.7651 | 0.5378 | 0.8747 | 277.1082 | 136.0 kB | 63 |
| KNN | 0.5626 | 0.4238 | 0.4778 | 0.3794 | 0.6617 | -0.9820 | 0.9660 | 0.6136 | 0.9829 | 114.0517 | 0.0 B | 547344 |
| KNN-Autoencoders | 0.5279 | 0.4251 | 0.4760 | 0.3651 | 0.6665 | -0.9109 | 0.9314 | 0.6252 | 0.9651 | 69.0643 | 1208.0 kB | 1537776 |
| Bagging XGBoost | 0.6899 | 0.4298 | 0.5098 | 0.4290 | 0.7029 | -1.1983 | 1.0715 | 0.5639 | 1.0351 | 9.1290 | 72.0 kB | 15380 |
| Stacking Classifier | 0.6632 | 0.4266 | 0.5095 | 0.4169 | 0.7101 | -1.3910 | 1.1654 | 0.6130 | 1.0795 | 552.4895 | 252.0 kB | 73996 |
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC | R2 | MSE | MAE | RMSE | TT | MU | NOP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| DNN | 0.7548 | 0.3286 | 0.7280 | 0.4528 | 0.8233 | -1.0445 | 0.2452 | 0.2452 | 0.4951 | 153.6247 | 464.0 kB | 22892 |
| GRU | 0.7542 | 0.3243 | 0.7053 | 0.4443 | 0.8141 | -1.0496 | 0.2458 | 0.2458 | 0.4958 | 1005.0657 | 1292.0 kB | 21498 |
| CNN | 0.7364 | 0.3145 | 0.7564 | 0.4443 | 0.8218 | -1.1985 | 0.2636 | 0.2636 | 0.5135 | 89.3823 | 1096.0 kB | 55909 |
| Bagging AdaBoost | 0.7249 | 0.3080 | 0.7816 | 0.4419 | 0.8250 | -1.2942 | 0.2751 | 0.2751 | 0.5245 | 172.3706 | 12.0 kB | 44017 |
| Bagging XGBoost | 0.7184 | 0.3050 | 0.7986 | 0.4414 | 0.8265 | -1.3483 | 0.2816 | 0.2816 | 0.5307 | 13.8929 | 204.0 kB | 59953 |
| AdaBoost | 0.7206 | 0.3057 | 0.7908 | 0.4409 | 0.8250 | -1.3300 | 0.2794 | 0.2794 | 0.5286 | 55.0732 | 0.0 B | 68311 |
| XGBoost | 0.7177 | 0.3044 | 0.7987 | 0.4408 | 0.8259 | -1.3545 | 0.2823 | 0.2823 | 0.5314 | 2.1093 | 0.0 B | 851 |
| LR-MLP | 0.7259 | 0.3077 | 0.7739 | 0.4403 | 0.8206 | -1.2858 | 0.2741 | 0.2741 | 0.5236 | 34.6740 | 0.0 B | 23 |
| LR | 0.7250 | 0.3069 | 0.7741 | 0.4396 | 0.8196 | -1.2935 | 0.2750 | 0.2750 | 0.5244 | 1.5440 | 0.0 B | 22 |
| Stacking Classifier | 0.7168 | 0.3032 | 0.7953 | 0.4390 | 0.8248 | -1.3612 | 0.2832 | 0.2832 | 0.5321 | 3029.2078 | 1148.0 kB | 173546 |
| XGBoost-LSTM | 0.7140 | 0.3016 | 0.8007 | 0.4382 | 0.8240 | -1.3854 | 0.2860 | 0.2860 | 0.5348 | 227.6599 | 364.0 kB | 3377 |
| RF-CNN | 0.7097 | 0.2997 | 0.8109 | 0.4377 | 0.8252 | -1.4207 | 0.2903 | 0.2903 | 0.5388 | 93.7112 | 1596.0 kB | 658218 |
| RF | 0.7124 | 0.3002 | 0.7994 | 0.4365 | 0.8226 | -1.3986 | 0.2876 | 0.2876 | 0.5363 | 15.5934 | 28.0 kB | 1554750 |
| XGBoost-CNN | 0.7076 | 0.2983 | 0.8124 | 0.4363 | 0.8261 | -1.4387 | 0.2924 | 0.2924 | 0.5408 | 70.9944 | 0.0 B | 11896 |
| RF-GRU | 0.7067 | 0.2974 | 0.8107 | 0.4351 | 0.8247 | -1.4456 | 0.2933 | 0.2933 | 0.5415 | 553.4871 | 272.0 kB | 498831 |
| DT-CNN | 0.7121 | 0.2990 | 0.7928 | 0.4342 | 0.8178 | -1.4005 | 0.2879 | 0.2879 | 0.5365 | 66.3589 | 24.0 kB | 31 |
| Bagging DNN | 0.7060 | 0.2960 | 0.8053 | 0.4329 | 0.8222 | -1.4518 | 0.2940 | 0.2940 | 0.5422 | 447.2553 | 352.0 kB | 56725 |
| SVM | 0.7089 | 0.2967 | 0.7946 | 0.4321 | 0.8189 | -1.4272 | 0.2911 | 0.2911 | 0.5395 | 882.2086 | 232.0 kB | 21 |
| Bagging GRU | 0.7206 | 0.3013 | 0.7622 | 0.4319 | 0.8120 | -1.3297 | 0.2794 | 0.2794 | 0.5286 | 2105.2017 | 3892.0 kB | 139076 |
| SVM-RNN | 0.7023 | 0.2930 | 0.8046 | 0.4296 | 0.8183 | -1.4827 | 0.2977 | 0.2977 | 0.5456 | 788.8983 | 28.0 kB | 674564 |
| LSTM | 0.7334 | 0.3049 | 0.7142 | 0.4274 | 0.8016 | -1.2235 | 0.2666 | 0.2666 | 0.5164 | 1245.5357 | 404.0 kB | 61624 |
| AdaBoost-DBN | 0.7089 | 0.2942 | 0.7785 | 0.4270 | 0.8129 | -1.4276 | 0.2911 | 0.2911 | 0.5396 | 592.0342 | 188.0 kB | 1317 |
| DT | 0.7124 | 0.2954 | 0.7687 | 0.4268 | 0.8077 | -1.3987 | 0.2876 | 0.2876 | 0.5363 | 0.3540 | 0.0 B | 127 |
| Bagging CNN | 0.6939 | 0.2880 | 0.8127 | 0.4252 | 0.8191 | -1.5526 | 0.3061 | 0.3061 | 0.5533 | 528.4152 | 992.0 kB | 87108 |
| NB | 0.7235 | 0.2941 | 0.7029 | 0.4147 | 0.7799 | -1.3055 | 0.2765 | 0.2765 | 0.5258 | 0.2801 | 0.0 B | 86 |
| KNN-Autoencoders | 0.7156 | 0.2892 | 0.7141 | 0.4117 | 0.7808 | -1.3713 | 0.2844 | 0.2844 | 0.5332 | 295.8879 | 16.0 kB | 1696620 |
| KNN | 0.7058 | 0.2848 | 0.7356 | 0.4107 | 0.7857 | -1.4531 | 0.2942 | 0.2942 | 0.5424 | 66.8340 | 0.0 B | 1187634 |
| RNN | 0.6555 | 0.2602 | 0.7986 | 0.3925 | 0.7866 | -1.8726 | 0.3445 | 0.3445 | 0.5869 | 98.3237 | 496.0 kB | 12431 |
| Model | Accuracy | Precision | Recall | F1 Score | AUC-ROC | R2 | MSE | MAE | RMSE | TT | MU | NOP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| RF | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 0.5874 | 24.0 kB | 11455 |
| Stacking Classifier | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 0.0000 | 0.0000 | 0.0000 | 37.0527 | 296.0 kB | N/A |
| DT-CNN | 0.9904 | 0.9846 | 1.0000 | 0.9922 | 0.9992 | 0.9594 | 0.0096 | 0.0096 | 0.0981 | 5.7288 | 0.0 B | 27 |
| Bagging SVM | 0.9904 | 0.9846 | 1.0000 | 0.9922 | 0.9992 | 0.9594 | 0.0096 | 0.0096 | 0.0981 | 0.5832 | 0.0 B | 4360 |
| DT | 0.9904 | 1.0000 | 0.9844 | 0.9921 | 0.9922 | 0.9594 | 0.0096 | 0.0096 | 0.0981 | 0.2275 | 0.0 B | 67 |
| AdaBoost | 0.9904 | 1.0000 | 0.9844 | 0.9921 | 1.0000 | 0.9594 | 0.0096 | 0.0096 | 0.0981 | 0.7496 | 0.0 B | 9146 |
| Bagging DT | 0.9904 | 1.0000 | 0.9844 | 0.9921 | 1.0000 | 0.9594 | 0.0096 | 0.0096 | 0.0981 | 1.0822 | 16.0 kB | 10691 |
| SVM | 0.9808 | 0.9844 | 0.9844 | 0.9844 | 0.9977 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 0.3871 | 0.0 B | 1312 |
| DNN | 0.9808 | 0.9844 | 0.9844 | 0.9844 | 0.9988 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 7.3960 | 88.0 kB | 10911 |
| RF-CNN | 0.9808 | 0.9844 | 0.9844 | 0.9844 | 0.9980 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 5.2281 | 68.0 kB | 4230 |
| Bagging RF | 0.9808 | 0.9844 | 0.9844 | 0.9844 | 0.9965 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 1.6575 | 128.0 kB | 10245 |
| XGBoost | 0.9808 | 1.0000 | 0.9688 | 0.9841 | 0.9992 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 0.3725 | 0.0 B | 4488 |
| AdaBoost-DBN | 0.9808 | 1.0000 | 0.9688 | 0.9841 | 0.9984 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 15.2999 | 0.0 B | 1275 |
| RF-GRU | 0.9808 | 1.0000 | 0.9688 | 0.9841 | 1.0000 | 0.9188 | 0.0192 | 0.0192 | 0.1387 | 8.4001 | 56.0 kB | 6094 |
| CNN | 0.9712 | 0.9841 | 0.9688 | 0.9764 | 0.9980 | 0.8781 | 0.0288 | 0.0288 | 0.1698 | 6.8386 | 236.0 kB | 174659 |
| SVM-RNN | 0.9712 | 0.9841 | 0.9688 | 0.9764 | 0.9977 | 0.8781 | 0.0288 | 0.0288 | 0.1698 | 8.0761 | 0.0 B | 1394 |
| XGBoost-LSTM | 0.9712 | 1.0000 | 0.9531 | 0.9760 | 0.9973 | 0.8781 | 0.0288 | 0.0288 | 0.1698 | 14.8126 | 0.0 B | 1489 |
| Bagging AdaBoost | 0.9615 | 0.9688 | 0.9688 | 0.9688 | 0.9859 | 0.8375 | 0.0385 | 0.0385 | 0.1961 | 5.3577 | 0.0 B | 2970 |
| LR-MLP | 0.9615 | 0.9839 | 0.9531 | 0.9683 | 0.9984 | 0.8375 | 0.0385 | 0.0385 | 0.1961 | 11.7585 | 0.0 B | 18 |
| XGBoost-CNN | 0.9615 | 0.9839 | 0.9531 | 0.9683 | 0.9947 | 0.8375 | 0.0385 | 0.0385 | 0.1961 | 6.5654 | 0.0 B | 5082 |
| Bagging CNN-DT | 0.9615 | 0.9839 | 0.9531 | 0.9683 | 0.9969 | 0.8375 | 0.0385 | 0.0385 | 0.1961 | 38.0482 | 1072.0 kB | N/A |
| KNN | 0.9519 | 0.9836 | 0.9375 | 0.9600 | 0.9820 | 0.7969 | 0.0481 | 0.0481 | 0.2193 | 0.1675 | 0.0 B | 6656 |
| LR | 0.9519 | 1.0000 | 0.9219 | 0.9593 | 0.9918 | 0.7969 | 0.0481 | 0.0481 | 0.2193 | 0.2580 | 0.0 B | 17 |
| KNN-Autoencoders | 0.9519 | 1.0000 | 0.9219 | 0.9593 | 0.9949 | 0.7969 | 0.0481 | 0.0481 | 0.2193 | 9.9981 | 0.0 B | 14560 |
| NB | 0.9423 | 0.9677 | 0.9375 | 0.9524 | 0.9863 | 0.7563 | 0.0577 | 0.0577 | 0.2402 | 0.2241 | 0.0 B | 66 |
| RNN | 0.9327 | 0.9831 | 0.9063 | 0.9431 | 0.9934 | 0.7156 | 0.0673 | 0.0673 | 0.2594 | 11.3673 | 408.0 kB | 15749 |
| LSTM | 0.8942 | 0.9206 | 0.9063 | 0.9134 | 0.9711 | 0.5531 | 0.1058 | 0.1058 | 0.3252 | 20.4970 | 1220.0 kB | 60635 |
| GRU | 0.8846 | 0.9643 | 0.8438 | 0.9000 | 0.9559 | 0.5125 | 0.1154 | 0.1154 | 0.3397 | 14.1540 | 712.0 kB | 38753 |
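In the binary-outcome tables above, the MSE and MAE columns coincide and equal 1 − Accuracy; this is a property of hard 0/1 predictions, not a coincidence. The pure-Python sketch below (`binary_metrics` is a hypothetical helper, not the study's own code) makes the metric definitions explicit:

```python
import math

def binary_metrics(y_true, y_pred):
    """Standard classification metrics for hard 0/1 predictions
    (illustrative helper; the study's implementation is not shown)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    n = len(y_true)
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # For 0/1 labels, (t - p)^2 == |t - p|, so MSE == MAE == error rate.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "mse": mse, "mae": mae, "rmse": math.sqrt(mse)}

m = binary_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 1, 0])
```

On this toy input the helper returns Accuracy 0.75 with MSE = MAE = 0.25 = 1 − Accuracy, matching the pattern visible in the binary tables (e.g. Accuracy 0.7597 alongside MSE and MAE of 0.2403).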
| Datasets | Models | Accuracy | Precision | Recall | F1-score | TT (s) | MU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| D1 | AdaBoost | 0.798 | 0.671 | 0.833 | 0.743 | 1.181 | 0 B |
| D2 | RF | 0.785 | 0.656 | 0.814 | 0.727 | 0.648 | 32 kB |
| D3 | RNN | 0.714 | 0.438 | 0.498 | 0.441 | 141.709 | 600 kB |
| D4 | DNN | 0.754 | 0.328 | 0.728 | 0.452 | 153.624 | 464 kB |
| D5 | RF | 1.000 | 1.000 | 1.000 | 1.000 | 0.587 | 24 kB |
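Assuming the TT and MU columns denote wall-clock training time in seconds and peak memory allocated during fitting (an interpretation of the abbreviations, consistent with the units shown), comparable measurements can be obtained with the standard library alone. Here `profile_fit` and `toy_fit` are hypothetical stand-ins, not the study's code:

```python
import time
import tracemalloc

def profile_fit(fit_fn, *args):
    """Measure wall-clock fit time (seconds) and peak Python-level
    memory allocated during fitting (kB)."""
    tracemalloc.start()
    t0 = time.perf_counter()
    model = fit_fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return model, elapsed, peak / 1024.0

def toy_fit(xs, ys):
    # Stand-in "model": threshold at the mean of the positive-class values.
    pos = [x for x, y in zip(xs, ys) if y == 1]
    return {"threshold": sum(pos) / len(pos)}

model, tt_seconds, mu_kb = profile_fit(toy_fit, [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1])
```

One caveat: `tracemalloc` only tracks allocations made through the Python allocator, so figures for models backed by C or GPU libraries would understate true memory use and may differ from the MU values reported above.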
| Datasets | Authors | Outliers | Missing Values | Model | Precision | Accuracy | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset 1 | [44] | IQR | Attribute Mean | AB + XB | – | – | 0.7900 | – |
| | [46] | – | – | GBM | – | – | 0.8700 | – |
| | [80] | – | – | DA | – | 0.7400 | 0.7200 | – |
| | [81] | – | – | ANN | – | 0.7600 | 0.5300 | – |
| | [82] | ESD | k-NN | HM-BagMoov | – | 0.8600 | 0.8500 | 0.7900 |
| | Our Study | IQR | ADASYN | AdaBoost | 0.6716 | 0.7987 | 0.8333 | 0.7438 |
| Dataset 2 | [39] | IQR | CWM | QML | 0.7400 | 0.8600 | 0.8500 | 0.7900 |
| | [83] | – | NB | RF | 0.8100 | 0.8700 | 0.8500 | 0.8300 |
| | [84] | – | – | k-NN | 0.8700 | 0.8800 | 0.9000 | 0.8800 |
| | [56] | GM | Median | RF | – | 0.9300 | 0.7970 | – |
| | [85] | – | – | RF | 0.9400 | 0.9400 | 0.8800 | 0.9100 |
| | [39] | IQR | CWM | DL | 0.9000 | 0.9500 | 0.9500 | 0.9300 |
| | Our Study | IQR | ADASYN | RF | 0.6567 | 0.7857 | 0.8148 | 0.7273 |
| Dataset 3 | [76] | – | Excluded | NN | – | 0.8240 | 0.3781 | – |
| | Our Study | IQR | Clustering | RNN | 0.4387 | 0.7144 | 0.4982 | 0.4414 |
| Dataset 4 | Our Study | IQR | Clustering | DNN | 0.3286 | 0.7548 | 0.7280 | 0.4526 |
| Dataset 5 | [77] | – | Ignoring Tuple | RF | 0.9740 | 0.9740 | 0.9740 | 0.9740 |
| | Our Study | IQR | – | RF | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
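The "Our Study" rows above pair IQR-based outlier handling with the resampling techniques of Section 3.1.1 (e.g. ADASYN). As a minimal illustration of the IQR rule alone — `iqr_filter` is a hypothetical helper using the conventional 1.5 × IQR fences, not the study's implementation:

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR];
    k = 1.5 gives the conventional Tukey fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

clean = iqr_filter([10, 12, 11, 13, 12, 95])  # 95 falls outside the fences
```

In practice the rule would be applied per feature column before resampling, and the fence multiplier `k` can be tuned; ADASYN or SMOTE would then be run on the filtered training split only, so that synthetic samples never leak into the test data.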
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).