Submitted: 30 November 2025
Posted: 02 December 2025
Abstract

Keywords:
Introduction
Research Questions
Literature Review
Machine Learning Approaches to Dropout Prediction
Comparison of Algorithm Performance
Forecasting in Dropout Networks
Indicators of Academic Performance
Engagement and Behavioral Characteristics
Demographic and Socio-Economic Factors
Gaps in the Literature and Positioning of the Present Study
Method
Overview of the Methodological Approach
Selection of Dataset and Institutional Context
Feature Set and Outcome Variable
Data Preprocessing
Selection of ML Algorithms
Hyperparameter Tuning, Evaluation, and Model Training
Results
Descriptive Statistics and Class Distribution
Baseline Model Performance
Tuned Models and Multi-Seed Evaluation
Confusion Matrices
Feature Importance and Predictive Factors
Summary of Key Findings
| Variable | N | M | SD | Min |
|---|---|---|---|---|
| Age at enrollment | 4424 | 23.27 | 7.59 | 17 |
| Previous qualification (grade) | 4424 | 132.61 | 13.19 | 95.0 |
| Admission grade | 4424 | 126.98 | 14.48 | 95.0 |
| Tuition fees up to date | 4424 | 0.88 | 0.32 | 0 |
| Nacionality | 4424 | 1.87 | 6.91 | 1 |
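
The N, M, SD, and Min columns in the table above are standard per-variable summaries. A minimal sketch of how one such row could be computed with Python's standard library follows. The `describe` helper and the sample ages are hypothetical illustrations, not the study's code or data, and SD is taken here as the sample standard deviation, which is an assumption because the source does not say which form was used.

```python
import statistics


def describe(values):
    """Return N, mean, sample SD, and minimum for one numeric variable."""
    return {
        "N": len(values),
        "M": statistics.mean(values),
        # Sample SD (n - 1 denominator); assumed, not confirmed by the source.
        "SD": statistics.stdev(values),
        "Min": min(values),
    }


# Hypothetical ages at enrollment, for illustration only.
ages = [18, 19, 19, 21, 24, 30, 45]
summary = describe(ages)
print({name: round(value, 2) for name, value in summary.items()})
```

Binary indicators such as "Tuition fees up to date" can be summarized the same way, in which case M is simply the proportion of ones.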
| Model | Accuracy | Macro-F1 | Weighted-F1 |
|---|---|---|---|
| Multinomial logistic regression | 0.7650 | 0.6798 | 0.7518 |
| Random forest | 0.7797 | 0.7041 | 0.7659 |
| Extreme gradient boosting | 0.7785 | 0.7148 | 0.7710 |
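
The three metrics in the table above weight the classes differently: accuracy counts every case equally, macro-F1 averages per-class F1 scores without regard to class size, and weighted-F1 averages them in proportion to class support. The sketch below computes all three from scratch. The `f1_report` helper, the class labels, and the toy predictions are assumptions made for illustration, not the study's outputs.

```python
from collections import Counter


def f1_report(y_true, y_pred):
    """Compute accuracy, macro-F1, and weighted-F1 for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true cases per class
    per_class_f1 = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class_f1[label] = 2 * tp / denom if denom else 0.0
    n = len(y_true)
    return {
        "accuracy": sum(t == p for t, p in zip(y_true, y_pred)) / n,
        "macro_f1": sum(per_class_f1.values()) / len(labels),
        "weighted_f1": sum(per_class_f1[c] * support[c] for c in labels) / n,
    }


# Hypothetical three-class outcome with one rare class that is always missed.
y_true = ["Graduate"] * 4 + ["Dropout"] * 3 + ["Enrolled"]
y_pred = ["Graduate"] * 4 + ["Dropout", "Dropout", "Graduate", "Graduate"]
report = f1_report(y_true, y_pred)
print({name: round(value, 4) for name, value in report.items()})
```

In this toy case the missed minority class pulls macro-F1 well below accuracy and weighted-F1, which is the usual reason to report macro-F1 alongside accuracy on imbalanced outcomes.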
| Model | Macro-F1 M (SD) | ROC–AUC M (SD) |
|---|---|---|
| Multinomial logistic regression | 0.6709 | 0.8778 |
| Random forest | 0.6828 | 0.8832 |
| Extreme gradient boosting | 0.7068 | 0.8933 |
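
A multi-seed evaluation of the kind summarized in the table above repeats the same train-and-score procedure under several random seeds and reports the mean and standard deviation of the resulting scores. The sketch below shows only that aggregation step. `summarize_over_seeds` and `toy_evaluate` are hypothetical names, and `toy_evaluate` merely simulates a score instead of training a model.

```python
import random
import statistics


def summarize_over_seeds(evaluate, seeds):
    """Call evaluate(seed) for each seed; return (mean, sample SD) of scores."""
    scores = [evaluate(seed) for seed in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


def toy_evaluate(seed):
    """Hypothetical stand-in for 'split, train, and score with this seed'."""
    rng = random.Random(seed)
    return 0.70 + rng.uniform(-0.02, 0.02)


mean_f1, sd_f1 = summarize_over_seeds(toy_evaluate, seeds=[0, 1, 2, 3, 4])
print(f"Macro-F1 M (SD): {mean_f1:.4f} ({sd_f1:.4f})")
```

In a real pipeline, `evaluate` would reseed the data split and the model before fitting, so that the SD reflects run-to-run variability and not a single lucky partition.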

Discussion
Discussion of Results Relating to the Literature
Early Warning and Institutional Practice Implications
Methodological Contributions
Limitations and Future Research Directions
Restricted Feature Space
Scales of Imbalance Management and Assessment
Interpretability, Fairness, and Ethical Concerns
Future Research Directions
- Cross-institutional and cross-cohort validation: Train and test models across multiple institutions and time periods to assess generalizability and concept drift.
- Richer feature spaces and temporal modeling: Integrate enrollment-time data with LMS logs, assessment records, and advising notes; explore time-series and sequence models that capture behavioral trajectories.
- Advanced imbalance and calibration techniques: Compare class-weighted losses, resampling strategies, and calibration methods to improve minority-class performance and probability reliability.
- Explainable and fair machine learning: Incorporate SHAP or related methods to provide case-level explanations, and systematically evaluate fairness metrics across key demographic and socio-economic groups.
- Intervention and impact evaluation: Move beyond predictive accuracy to study how model-informed interventions affect actual retention and graduation outcomes, ideally through pilot programs or randomized controlled trials.
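
Two of the directions listed above, class-weighted losses and group-level fairness checks, can be illustrated in a few lines. The sketch below is an assumption-laden example, not a method taken from the study: `balanced_class_weights` uses the common inverse-frequency heuristic (the same formula scikit-learn documents for its "balanced" class-weight option), and `recall_by_group` compares recall for one outcome class across groups as a simple equal-opportunity style check. All labels and group names are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * m) for c, m in counts.items()}


def recall_by_group(y_true, y_pred, groups, positive):
    """Recall for the `positive` class, computed separately for each group."""
    hits, totals = Counter(), Counter()
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            totals[group] += 1
            hits[group] += int(pred == positive)
    return {group: hits[group] / totals[group] for group in totals}


# Hypothetical imbalanced outcome labels.
outcomes = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"]
weights = balanced_class_weights(outcomes)

# Hypothetical predictions for two demographic groups, A and B.
y_true = ["Dropout", "Dropout", "Dropout", "Dropout", "Graduate", "Graduate"]
y_pred = ["Dropout", "Dropout", "Dropout", "Graduate", "Graduate", "Dropout"]
groups = ["A", "A", "B", "B", "A", "B"]
recalls = recall_by_group(y_true, y_pred, groups, positive="Dropout")
recall_gap = max(recalls.values()) - min(recalls.values())
print(weights, recalls, recall_gap)
```

Rarer classes receive larger weights, so their errors count for more during training; a large recall gap between groups would suggest that at-risk students in one group are being missed more often than in another.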
Conclusions
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).