Submitted: 25 September 2025
Posted: 25 September 2025
Abstract
Keywords:
I. Introduction
II. Literature Review
III. Methodology
A Data Source and Acquisition
B Data Integration and Cleaning
C Feature Engineering
- Single-Feature Model: total enrollment (EFTOTLT) from previous years is the only input feature; this configuration serves as a strong baseline.
- Core Demographics Model: total enrollment plus enrollment of men (EFTOTLM) and women (EFTOTLW), capturing gender-specific trends.
- Expanded Demographics Model: the three core features plus nine race/ethnicity and residency features (EFAIANT, EFASIAT, EFBKAAT, EFHISPT, EFNHPIT, EFWHITT, EF2MORT, EFUNKNT, EFNRALT), for 12 features in total, providing fine-grained subgroup detail as advocated in the literature [9,10,26,27].
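The three feature configurations can be written down as lists of IPEDS column names, which also makes the 1/3/12 feature counts explicit. The sketch below is illustrative only: the variable names are the IPEDS codes listed above, while `FEATURE_SETS` and `select_features` are hypothetical names, and representing an institution-year record as a plain dictionary is an assumption.

```python
# IPEDS fall-enrollment variable codes, as listed in the text.
# FEATURE_SETS and select_features are hypothetical helper names.
CORE = ["EFTOTLT", "EFTOTLM", "EFTOTLW"]  # total, men, women
RACE_RESIDENCY = [
    "EFAIANT", "EFASIAT", "EFBKAAT", "EFHISPT", "EFNHPIT",
    "EFWHITT", "EF2MORT", "EFUNKNT", "EFNRALT",
]

FEATURE_SETS = {
    "single": ["EFTOTLT"],
    "core": CORE,
    "expanded": CORE + RACE_RESIDENCY,  # 3 + 9 = 12 features
}


def select_features(row, config):
    """Return the feature vector for one institution-year record (assumed to be a dict of IPEDS columns)."""
    return [float(row[col]) for col in FEATURE_SETS[config]]


for name, cols in FEATURE_SETS.items():
    print(name, len(cols))
```

Keeping EFTOTLT as the first column in every configuration means the prediction target (total enrollment) sits at the same column index regardless of which feature set is used.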
D Sequence Construction for Time-Series Modeling
- Each input window contained feature values for years t-L to t-1, where L is the sequence length.
- Total enrollment in year t served as the target variable.
- For example, a 5-year sequence used years 2018–2022 as input to forecast enrollment in 2023. All such rolling windows were extracted for every institution, yielding more than 500,000 training examples per configuration.
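The rolling-window construction can be sketched in NumPy as below, following the t-L to t-1 input / year-t target definition above. This is a minimal sketch, not the authors' code: it assumes each institution's data is a year-ordered array with no missing years, and `make_windows` and `build_dataset` are hypothetical names.

```python
import numpy as np


def make_windows(series, target_col, seq_len):
    """Rolling windows for ONE institution.

    series: (num_years, num_features) array ordered by year.
    Returns X of shape (n, seq_len, num_features) holding years t-L..t-1,
    and y of shape (n,) holding the target column at year t.
    """
    series = np.asarray(series, dtype=float)
    n = series.shape[0] - seq_len
    if n <= 0:  # not enough history for even one window
        return np.empty((0, seq_len, series.shape[1])), np.empty(0)
    X = np.stack([series[k:k + seq_len] for k in range(n)])
    y = series[seq_len:, target_col]  # year k + seq_len for window k
    return X, y


def build_dataset(institutions, target_col, seq_len):
    """Concatenate windows across institutions; no window spans two institutions."""
    Xs, ys = [], []
    for series in institutions.values():
        X, y = make_windows(series, target_col, seq_len)
        Xs.append(X)
        ys.append(y)
    return np.concatenate(Xs), np.concatenate(ys)


# Toy check: a single column holding the year itself, 2018..2023.
toy = np.arange(2018, 2024).reshape(-1, 1)
X, y = make_windows(toy, target_col=0, seq_len=5)
print(X[0, :, 0], y[0])  # [2018. 2019. 2020. 2021. 2022.] 2023.0
```

Windowing each institution separately before concatenating is the design choice that keeps one institution's history from leaking into another institution's target.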
E Model Architecture
- Input shape: (sequence_length, num_features), e.g., (5, 12) for a 5-year, 12-feature input.
- Core layers: an LSTM layer (48 units) followed by a dropout layer (rate 0.15) to reduce overfitting.
- Hidden layer: a dense layer (24 units, ReLU activation) for nonlinear feature transformation.
- Output layer: a dense layer with 1 unit and linear activation for regression.
- Hyperparameters: all hyperparameters (layer sizes, dropout rate, learning rate, batch size) were selected by grid search on the validation set.
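To make the data flow through this architecture concrete, the sketch below implements its inference pass in plain NumPy. It is not the authors' implementation (the paper does not say which deep-learning framework was used): the weights are random placeholders, the gate ordering (input, forget, candidate, output) follows a common convention such as the one Keras uses, and `init_params`, `forward`, and `count_params` are hypothetical names. Dropout is omitted because it is the identity at inference time.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(num_features, lstm_units=48, dense_units=24, seed=0):
    """Random placeholder weights with the layer sizes described above."""
    rng = np.random.default_rng(seed)
    return {
        # LSTM weights for the four gates (input, forget, candidate, output), stacked column-wise
        "W": rng.normal(0.0, 0.1, (num_features, 4 * lstm_units)),  # input-to-hidden
        "U": rng.normal(0.0, 0.1, (lstm_units, 4 * lstm_units)),    # hidden-to-hidden
        "b": np.zeros(4 * lstm_units),
        # Dense layer (24 units, ReLU)
        "W1": rng.normal(0.0, 0.1, (lstm_units, dense_units)),
        "b1": np.zeros(dense_units),
        # Output layer (1 unit, linear)
        "W2": rng.normal(0.0, 0.1, (dense_units, 1)),
        "b2": np.zeros(1),
    }


def forward(params, x):
    """Inference pass. x: (batch, sequence_length, num_features) -> (batch, 1)."""
    batch, seq_len, _ = x.shape
    units = params["U"].shape[0]
    h = np.zeros((batch, units))  # hidden state
    c = np.zeros((batch, units))  # cell state
    for t in range(seq_len):
        z = x[:, t, :] @ params["W"] + h @ params["U"] + params["b"]
        i, f, g, o = np.split(z, 4, axis=1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    # Dropout (rate 0.15) acts only during training, so it is skipped here.
    hidden = np.maximum(0.0, h @ params["W1"] + params["b1"])  # ReLU dense layer
    return hidden @ params["W2"] + params["b2"]                # linear regression output


def count_params(params):
    return sum(p.size for p in params.values())


params = init_params(num_features=12)
batch = np.random.default_rng(1).normal(size=(4, 5, 12))  # 4 windows of shape (5, 12)
print(forward(params, batch).shape)  # (4, 1)
print(count_params(params))          # 12913
```

Only the final hidden state of the LSTM feeds the dense layers, which matches a single-step regression setup. For a 12-feature input the model has 12,913 trainable parameters: 4·((12 + 48)·48 + 48) = 11,712 in the LSTM, 1,176 in the 24-unit dense layer, and 25 in the output layer.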
F Training and Validation
G Baseline and Evaluation Metrics
- Mean Absolute Error (MAE): the average absolute prediction error.
- Root Mean Squared Error (RMSE): the square root of the mean squared error; it penalizes large errors more heavily and is therefore more sensitive to outliers.
- Both metrics are reported on the standardized (z-score) scale and, after inverse transformation, in actual enrollment counts for interpretability.
- Models were compared using paired t-tests where appropriate.
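The metrics above, the conversion back to enrollment counts, and the paired t statistic can be sketched as follows. The sketch assumes a single global mean and standard deviation were used for z-scoring; if the data were standardized per institution, each institution's own statistics would be needed instead. The mean of 1,000 and standard deviation of 200 in the example are made-up values for illustration, and the function names are hypothetical.

```python
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def inverse_zscore(z, mean, std):
    """Map standardized values back to enrollment counts (assumes one global mean/std)."""
    return np.asarray(z) * std + mean


def paired_t_statistic(errors_a, errors_b):
    """t statistic of a paired t-test on per-example errors of two models."""
    d = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))


# Toy example in z-units, with an assumed mean of 1,000 students and std of 200.
y_true = np.array([0.0, 1.0, -1.0, 2.0])
y_pred = np.array([0.5, 1.0, -2.0, 2.0])
print(mae(y_true, y_pred))  # 0.375
print(mae(inverse_zscore(y_true, 1000, 200), inverse_zscore(y_pred, 1000, 200)))  # 75.0
```

Under a single global standardization, MAE and RMSE in enrollment counts are simply the z-scale values multiplied by the standard deviation, because the mean cancels in each difference. For the significance test, `scipy.stats.ttest_rel(errors_a, errors_b)` returns the same statistic together with a p-value.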
H Software and Reproducibility

IV. Experimental Results
A Comparative Model Performance
B Model Calibration and Error Analysis
C Institutional-Level Forecasts
V. Discussion
A Interpretation of Findings
- Adding features beyond total enrollment generally improves prediction, particularly for demographically diverse and larger institutions.
- Longer input sequences (5 years vs. 3 years) yield lower error, reflecting the importance of historical trends for capturing latent patterns and volatility in institutional enrollment.
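These findings can be quantified directly from the MAE and RMSE values in the results table as relative error reductions. The short sketch below copies those values verbatim; the dictionary keys and `relative_reduction` are hypothetical names.

```python
# MAE and RMSE (standardized scale) copied from the results table.
RESULTS = {
    "naive": (0.31, 0.85),
    "lstm_total_3y": (0.31, 0.85),
    "lstm_core_3y": (0.27, 0.77),
    "lstm_expanded_3y": (0.27, 0.80),
    "lstm_expanded_5y": (0.25, 0.74),
}


def relative_reduction(baseline, model):
    """Fractional error reduction of `model` relative to `baseline` (positive = lower error)."""
    return (baseline - model) / baseline


base_mae, base_rmse = RESULTS["naive"]
for name, (m, r) in RESULTS.items():
    print(f"{name:18s} MAE {relative_reduction(base_mae, m):6.1%}  RMSE {relative_reduction(base_rmse, r):6.1%}")
```

With these values, the 5-year expanded model reduces MAE by about 19% and RMSE by about 13% relative to the naive baseline. The table also shows that, at 3-year windows, moving from the core to the expanded feature set leaves MAE unchanged (0.27) while RMSE rises slightly (0.77 to 0.80).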
B Practical Implications
- Improved resource allocation: More accurate, robust enrollment forecasts enable better budgeting, staffing, and planning.
- Equity and inclusion: Disaggregated demographic predictions can help target interventions and monitor progress on diversity, equity, and inclusion (DEI) initiatives.
- Reproducibility and scalability: The fully open, Python-based pipeline allows other institutions to adapt and deploy the framework to their own data.
C Limitations and Future Work
- Data dependency: The models require multi-year, high-quality institutional data, which may not be uniformly available for all institutions or countries.
- Unobserved shocks: Sudden external disruptions (e.g., pandemics, policy shifts) may not be predictable from past trends alone.
- Interpretability: Deep neural networks can be less interpretable than linear or rule-based models; integrating explainable AI (XAI) methods remains an area for future research [30].
VI. Conclusions
References
- Y. A. Chen, R. Li, and L. S. Hagedorn, “Undergraduate international student enrollment forecasting model: An application of time series analysis,” ERIC, 2019.
- A. P. Dela Cruz et al., “Higher education institution enrollment forecasting using data mining technique,” Int. J. Adv. Trends Comput. Sci. Eng., 2020.
- University of Massachusetts Boston, “Factors and techniques for projecting enrollment,” Research Brief, 2017.
- J. Ward, “Forecasting enrollment to achieve institutional goals,” Seattle Pacific Univ., 2007.
- H. Kaur and G. Jagdev, “A comprehensive review on time series forecasting techniques,” JETIR, 2023.
- S. Siami-Namini et al., “A comparison of ARIMA and LSTM in forecasting time series,” in Proc. IEEE Int. Conf. Mach. Learn. Appl. (ICMLA), 2018.
- L. Gao et al., “Advancing temporal forecasting: A comparative analysis of conventional paradigms and deep learning architectures,” Springer, 2025.
- L. Mozaffari and J. Zhang, “Predictive modeling of stock prices using transformer model,” in Proc. ACM Int. Conf. Mach. Learn. Technol. (ICMLT), 2024.
- K. Cao et al., “Advanced hybrid LSTM–transformer architecture for real-time multi-task prediction in engineering systems,” Sci. Rep., Nature, 2024.
- Y. Zhao et al., “Hybrid LSTM–transformer architecture with multi-scale feature fusion for high-accuracy gold futures price forecasting,” Mathematics, MDPI, 2025.
- M. Nawar et al., “Transfer learning in deep learning models for building load forecasting,” arXiv, 2023.
- G. Al-Naymat and M. A. Al-Betar, “University student enrollment prediction: A machine learning framework,” in Lect. Notes Netw. Syst. (LNNS), Springer, 2024.
- L. Schmid et al., “Comparing statistical and machine learning methods for time series forecasting in data-driven logistics,” arXiv, 2023.
- V. I. Kontopoulou et al., “A review of ARIMA vs. machine learning approaches for time series forecasting,” Future Internet, MDPI, 2023.
- S. Jin et al., “A comparative analysis of traditional and machine learning methods in forecasting the stock markets,” The Society of AI, 2024.
- King County Public Health, “Demographic data toolkit,” King County, 2025.
- J. L. Hughes et al., “Guidance for researchers when using inclusive demographic questions,” USU IRB, 2023.
- CMS Office of Minority Health, “Inventory of resources for standardized demographic and language data collection,” CMS, 2024.
- MIT Institutional Research, “Inclusive language for collecting demographic data,” MIT IR, 2025.
- Community Commons, “Data granularity in demographic analysis,” Community Commons, 2024.
- National Center for Education Statistics (NCES), “IPEDS fall enrollment data,” IPEDS Portal, 2025.
- NCES Data Explorer, “Integrated Postsecondary Education Data System (IPEDS),” Data Explorer, 2025.
- ICPSR, “IPEDS series archive,” ICPSR, 2025.
- Datalumos, “IPEDS complete 1980–2023 dataset,” Datalumos, 2025.
- NCES, “IPEDS tools and resources,” Use the Data Portal, 2025.
- Springer, “Benchmarking deep learning vs. traditional forecasting models,” Springer, 2025.
- MDPI, “Hybrid statistical-AI models in forecasting,” Future Internet, 2023.
- arXiv, “Simulation study of forecasting methods in logistics,” arXiv, 2023.
- JETIR, “Comparative analysis of forecasting techniques,” JETIR, 2023.
- MDPI, “Multi-scale feature fusion in forecasting,” Mathematics, 2025.
- OECD, “The state of higher education one year into the COVID-19 pandemic,” OECD Report, 2021.







| Model | Seq. Length (years) | Features | MAE (z-score) | RMSE (z-score) |
|---|---|---|---|---|
| Naive Last-Value Baseline | 1 | 1 | 0.31 | 0.85 |
| LSTM (Total Only) | 3 | 1 | 0.31 | 0.85 |
| LSTM (Core) | 3 | 3 | 0.27 | 0.77 |
| LSTM (Expanded) | 3 | 12 | 0.27 | 0.80 |
| LSTM (Expanded) | 5 | 12 | 0.25 | 0.74 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).