Submitted: 23 July 2025
Posted: 24 July 2025
Abstract
Keywords:
1. Introduction
- We introduce TSA-GRU, a deep learning module that combines the outputs of Temporal Sparse Attention (TSA) and a Gated Recurrent Unit (GRU) to produce context-aware, enriched embeddings of student interactions.
- The TSA embeddings are merged with GRU-based sequential modeling, which captures both short-term fluctuations and long-term temporal dependencies in student behavior and thereby improves the prediction of learning outcomes and response times.
- Extensive experiments and validation on large-scale educational datasets show that TSA-GRU outperforms existing knowledge-tracing models, suggesting that it is suitable for real-world use in adaptive learning systems.
2. Related Works
1. Learner behavior analytics in MOOCs, which examines the intricate patterns of student interactions.
2. Dropout prediction studies, which use traditional machine learning techniques to detect early indicators of disengagement and improve learner retention.
3. MOOC engagement prediction strategies, which rely on deep learning methods to identify subtle patterns in learner interaction and engagement.
4. Personalized learning in MOOCs, which uses adaptive algorithms to tailor pedagogical content and interventions to individual learners.
5. Student performance prediction in online learning, which combines deep and traditional learning methods to forecast academic outcomes.
2.1. Learner Behavior Analytics in MOOCs
2.2. MOOC Engagement Prediction
2.3. MOOC Dropout Prediction Models
2.4. Personalized Learning in MOOCs
2.5. Student Performance Prediction in Online Learning
Review and Comparative Analysis of Prior Work
Problem Analysis and Proposed Solution
- Can integrating the TSA method with GRU-based sequential modeling in the TSA-GRU module help filter out noise and capture both short- and long-term dependencies in students’ interactions?
- Can the TSA-GRU model enrich the representation of learner behavior by suppressing noise while retaining contextually relevant features?
- Does this, in turn, improve predictive accuracy for complex behavior and support generalizable, targeted interventions across the learner contexts found in MOOCs?
3. Methodology
- TSA Module:
- Using a sparse multi-head attention mechanism, this module extracts the most salient temporal features by focusing on the most relevant parts of the sequence.
- GRU Sequential Encoder and Fusion Module:
- This module extracts local sequential dynamics from the attention-enhanced features and fuses the global and local representations for the final prediction.
3.1. Temporal Sparse Attention (TSA)
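- Embedding Transformation: $E = X W_e + b_e$, where
- $X$ is the input sequence of shape $(B, T, F)$ (batch size, sequence length, number of features),
- $W_e \in \mathbb{R}^{F \times H}$ and $b_e \in \mathbb{R}^{H}$ are learnable parameters,
- $E$ is the embedded sequence of shape $(B, T, H)$, with $H$ being the hidden dimension.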
- Multi-Head Sparse Attention
- Contextual Feature Compression
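
The sparse selection step of the TSA module might be sketched as follows. This is a minimal single-head illustration in plain Python, not the authors' implementation: it assumes the threshold (50% of the maximum attention score, per the hyperparameter table) is applied per query to the softmax weights, and that the surviving weights are renormalized. The function name `sparse_attention` and the `ratio` parameter are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention(Q, K, V, ratio=0.5):
    """Single-head scaled dot-product attention with a thresholded binary mask.

    Assumption: weights below `ratio * max(weights)` are zeroed for each
    query, and the remaining weights are renormalized to sum to 1.
    """
    d = len(Q[0])
    outputs, masks = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        cutoff = ratio * max(weights)
        mask = [1 if w >= cutoff else 0 for w in weights]
        kept = [w * m for w, m in zip(weights, mask)]
        norm = sum(kept)  # always > 0: the maximum weight survives the mask
        kept = [w / norm for w in kept]
        outputs.append(
            [sum(w * v[j] for w, v in zip(kept, V)) for j in range(len(V[0]))]
        )
        masks.append(mask)
    return outputs, masks


# Toy example: the second key is irrelevant to the query and is masked out.
Q = [[1.0, 0.0]]
K = [[4.0, 0.0], [0.0, 0.0], [4.0, 0.0]]
V = [[1.0], [100.0], [3.0]]
out, masks = sparse_attention(Q, K, V)
```

In the toy example the masked-out value (100.0) no longer influences the output, which is the intended noise-filtering effect. A multi-head version would apply the same masking independently within each head.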

3.2. GRU Sequential Encoder and Fusion Module
- GRU Sequential Encoding
- Global Pooling and Feature Fusion
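
For reference, the two steps above can be written out using one common formulation of the GRU together with the pooling and fusion described in the implementation section. The notation is an assumption introduced here, since the paper's own equations are not available in this copy: $a_t$ denotes the TSA output at step $t$, $h_t$ the GRU hidden state, $T$ the sequence length, and $H$ the hidden dimension.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z a_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r a_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h a_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}\\[4pt]
\bar{a} &= \frac{1}{T}\sum_{t=1}^{T} a_t, \qquad
\bar{h} = \frac{1}{T}\sum_{t=1}^{T} h_t && \text{(global average pooling)}\\
f &= \left[\bar{a}\,;\,\bar{h}\right] \in \mathbb{R}^{2H}, \qquad
\hat{y} = w^{\top} f + b && \text{(fusion and prediction)}
\end{aligned}
```

Some libraries swap the roles of $z_t$ and $1 - z_t$ in the state update; the two conventions are equivalent up to relabeling of the gate.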
3.3. Final Prediction

4. Experimental Results and Comparison
4.1. Dataset
- Input Data:
  - Student interaction logs with records featuring original, correct, attempt_count, and position.
  - The variable to predict is ms_first_response_time.
- Preprocessing:
  - Cleaning: Removing duplicate records.
  - Feature Scaling: Standardizing input features and the target using z-score normalization.
- Sequence Generation:
  - For each user, a fixed-length sequence (e.g., 10 timesteps) is generated.
  - The last value within each sequence is used as the prediction target.
- Data Split:
  - Processed data is split into training and testing sets.
  - Data is wrapped in a custom PyTorch Dataset and DataLoader for efficient batching during model training.
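
The preprocessing steps above can be illustrated with a small standard-library sketch. It is not the authors' pipeline: the helper names are hypothetical, population standard deviation is assumed for the z-score, and windows are assumed to slide with a stride of one step (the source does not state how sequences are cut). The train/test split and the PyTorch Dataset wrapper are omitted.

```python
from statistics import mean, pstdev


def drop_duplicates(records):
    """Remove exact duplicate records (dicts), keeping first occurrences."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def make_sequences(rows, targets, seq_len=10):
    """Cut one user's time-ordered rows into fixed-length windows.

    Each window is paired with the target value at its last step.
    Assumption: windows slide forward one step at a time.
    """
    sequences = []
    for end in range(seq_len, len(rows) + 1):
        window = rows[end - seq_len:end]
        sequences.append((window, targets[end - 1]))
    return sequences


# Toy usage: 12 interactions from one user yield 3 windows of length 10.
rows = [[float(i)] for i in range(12)]
targets = [float(i) for i in range(12)]
seqs = make_sequences(rows, targets, seq_len=10)
scaled = zscore([1.0, 2.0, 3.0])
deduped = drop_duplicates([{"a": 1}, {"a": 1}, {"a": 2}])
```

In practice the scaling statistics would be computed on the training split only and reused for the test split to avoid leakage.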
4.2. Materials
4.3. Implementation of the TSA-GRU for Learner Behavior Analytics
- Linear Embedding Layer: The input sequences (10 steps with multiple numerical features per step) are projected into a 128-dimensional space using a linear embedding layer. With 640 learnable parameters, this layer maps the raw features into a richer representation on which the attention and GRU modules operate.
- TSA-Attention Module: The embedded features are refined through linear transformations of queries, keys, and values. A sparse selection mechanism, based on binary masks and thresholding, filters out irrelevant interactions and enhances the most pertinent temporal patterns.
  - Query transformation: 16,512 parameters
  - Key transformation: 16,512 parameters
  - Value transformation: 16,512 parameters
  - Output transformation: 16,512 parameters

  The total number of parameters in the TSA-Attention module is therefore 66,048 (4 × 16,512), enabling effective feature selection and the extraction of global behavioral representations.
- GRU Module for Sequential Dependency Modeling: The refined features from the TSA module are passed to a GRU layer. Using update and reset gates, the GRU captures temporal dependencies in student behavior, retaining important historical information while discarding less useful patterns. This module contains 99,072 tunable parameters.
- Fusion and Prediction Layer: Outputs from both the TSA-Attention and GRU modules undergo global average pooling to reduce sequence length. The pooled outputs (each 128-dimensional) are concatenated into a 256-dimensional vector. This vector is passed through a fully connected layer (with 257 parameters) to predict the normalized response time for the next interaction.
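
The parameter counts quoted in this section can be checked with a few lines of arithmetic. The sketch below assumes four input features per step (inferred from the 640-parameter embedding layer and the four listed fields) and a single-layer GRU with two bias vectors per gate, as in PyTorch's `nn.GRU`; the helper names are hypothetical.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def gru_params(n_in, n_hidden):
    """Single-layer GRU: 3 gates, each with input weights, hidden weights,
    and two bias vectors (the PyTorch convention)."""
    return 3 * (n_in * n_hidden + n_hidden * n_hidden + 2 * n_hidden)


N_FEATURES = 4  # assumption: original, correct, attempt_count, position
HIDDEN = 128    # hidden dimension from the hyperparameter table

counts = {
    "embedding": linear_params(N_FEATURES, HIDDEN),
    "tsa_attention": 4 * linear_params(HIDDEN, HIDDEN),  # Q, K, V, output
    "gru": gru_params(HIDDEN, HIDDEN),
    "prediction": linear_params(2 * HIDDEN, 1),
}
total = sum(counts.values())

for name, n in counts.items():
    print(f"{name:>14}: {n:,}")
print(f"{'total':>14}: {total:,}")
```

Under these assumptions the per-module counts sum to the total of 166,017 parameters reported in the hyperparameter table.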
4.4. TSA-GRU Performance Across Epochs and with Other Models
- TSA-GRU Performance Across Epochs
Model Performance Across 5 Epochs
Model Performance Across 10 Epochs
Model Performance Across 20 Epochs
Model Performance Across 30 Epochs
Model Performance Across 40 Epochs
Model Performance Across 100 Epochs
- Comparison of Knowledge Tracing Models
4.5. Cross-Validation and Statistical Significance Analysis
- 5-Fold Cross-Validation Procedure and Results
- Confidence Intervals and Variability Analysis
- Statistical Significance Testing
- Visual Comparative Analysis
5. Discussion
- Performance Across the Training Epochs
- Comparative Analysis with Other Models
- Adaptive Learning and Knowledge Tracing
- Cross-Validation and Statistical Significance Analysis
6. Implications and Future Directions
7. Conclusion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
1. Boufaida, S.O.; Benmachiche, A.; Maatallah, M.; Chemam, C. An Extensive examination of varied approaches in e-learning and MOOC Research: A thorough overview. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE; 2024; pp. 1–8.
2. Nadira, B.; Makhlouf, D.; Amroune, M. Personalized Online Learning: Context Driven Massive Open Online Courses. International Journal of Web-Based Learning and Teaching Technologies 2021, 16, 1–15.
3. Boutabia, I.; Benmachiche, A.; Betouil, A.A.; Chemam, C. A survey in the use of deep learning techniques in the open classroom approach. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE; 2024; pp. 1–7.
4. Kahil, M.S.; Bouramoul, A.; Derdour, M. Big data visual exploration as a recommendation problem. International Journal of Data Mining, Modelling and Management 2023, 15, 133–153.
5. Benmachiche, A.; Sahia, A.; Boufaida, S.O.; Rais, K.; Derdour, M.; Maazouzi, F. Enhancing learning recommendations in mooc search engines through named entity recognition. Education and Information Technologies 2025, pp. 1–31.
6. Almaiah, M.A.; Al-Khasawneh, A.; Althunibat, A. Exploring the critical challenges and factors influencing the E-learning system usage during COVID-19 pandemic. Education and Information Technologies 2020, 25, 5261–5280.
7. Wunnasri, W.; Musikawan, P.; So-In, C. A two-phase ensemble-based method for predicting learners’ grade in MOOCs. Applied Sciences 2023, 13, 1492.
8. Wang, W.; Guo, L.; Sun, R. Rational herd behavior in online learning: Insights from MOOC. Computers in Human Behavior 2019, 92, 660–669.
9. Ghanem, M.; Mouloudi, A.; Mourchid, M. Towards a scientific research based on semantic web. Procedia Computer Science 2015, 73, 328–335.
10. Sedraoui, B.K.; Benmachiche, A.; Makhlouf, A.; Chemam, C. Intrusion Detection with deep learning: A literature review. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS). IEEE; 2024; pp. 1–8.
11. Kim, S.; Cho, S.; Kim, J.Y.; Kim, D.J. Statistical assessment on student engagement in asynchronous online learning using the k-means clustering algorithm. Sustainability 2023, 15, 2049.
12. Mukala, P. Unveiling the Synergy of Process Mining, Explainable AI, and Learning Analytics in Advancing Educational Data Interpretability: Paving the Way for a New Era in Educational Analytics. 24 January 2025.
13. Doss, A.N.; Krishnan, R.; Karuppasamy, A.D.; Sam, B. Learning analytics model for predictive analysis of learners behavior for an indigenous MOOC platform (tadakhul system) in Oman. International Journal of Information and Education Technology 2024, 14, 961–967.
14. Şahin, M. Advances in video analytics. Technology, Knowledge and Learning 2024, 29, 1869–1875.
15. Su, B.; Peng, J. Sentiment analysis of comment texts on online courses based on hierarchical attention mechanism. Applied Sciences 2023, 13, 4204.
16. Kong, B.; Hemberg, E.; Bell, A.; O’Reilly, U.M. Investigating Student’s Problem-solving Approaches in MOOCs using Natural Language Processing. In Proceedings of the LAK23: 13th International Learning Analytics and Knowledge Conference; 2023; pp. 262–272.
17. Chonraksuk, J.; Boonlue, S. Development of an AI predictive model to categorize and predict online learning behaviors of students in Thailand. Heliyon 2024, 10.
18. Benabbes, K.; Housni, K.; Hmedna, B.; Zellou, A.; El Mezouary, A. A new hybrid approach to detect and track learner’s engagement in e-learning. IEEE Access 2023, 11, 70912–70929.
19. Deng, R. Effect of video styles on learner engagement in MOOCs. Technology, Pedagogy and Education 2024, 33, 1–21.
20. Hu, Y.; Jiang, Z.; Zhu, K. An optimized cnn model for engagement recognition in an e-learning environment. Applied Sciences 2022, 12, 8007.
21. Anand, G.; Kumari, S.; Pulle, R. Fractional-Iterative BiLSTM Classifier: A Novel Approach to Predicting Student Attrition in Digital Academia. SSRG International Journal of Computer Science and Engineering 2023, 10, 1–9.
22. Baron, M.J.S.; Sanabria, J.S.G.; Diaz, J.E.E. Deep Neural Network (DNN) applied to the analysis of student dropout in a Higher Education Institution (HEI). Investigación e Innovación en Ingenierías 2022, 10, 202–214.
23. Al Amoudi, S.; Alhothali, A.; Mirza, R.; Assalahi, H.; Aldosemani, T. Click-Based Representation Learning Framework of Student Navigational Behavior in MOOCs. IEEE Access 2024.
24. Rizwan, S.; Nee, C.K.; Garfan, S. Identifying the factors affecting student academic performance and engagement prediction in mooc using deep learning: A systematic literature review. IEEE Access 2025.
25. Xia, X.; Qi, W. Driving STEM learning effectiveness: dropout prediction and intervention in MOOCs based on one novel behavioral data analysis approach. Humanities and Social Sciences Communications 2024, 11, 1–19.
26. Agarwal, A.; Mishra, D.S.; Kolekar, S.V. Knowledge-based recommendation system using semantic web rules based on Learning styles for MOOCs. Cogent Engineering 2022, 9, 2022568.
27. Wu, S.; Cao, Y.; Cui, J.; Li, R.; Qian, H.; Jiang, B.; Zhang, W. A comprehensive exploration of personalized learning in smart education: From student modeling to personalized recommendations. arXiv preprint arXiv:2402.01666, 2024.
28. Chanaa, A.; et al. Sentiment analysis on massive open online courses (MOOCs): multi-factor analysis, and machine learning approach. International Journal of Information and Communication Technology Education (IJICTE) 2022, 18, 1–22.
29. Chen, Q.; Yu, X.; Liu, N.; Yuan, X.; Wang, Z. Personalized course recommendation based on eye-tracking technology and deep learning. In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). IEEE; 2020; pp. 692–968.
30. Wang, Y.; Lai, Y.; Huang, X. Innovations in Online Learning Analytics: A Review of Recent Research and Emerging Trends. IEEE Access 2024.
31. Ismanto, E.; Ab Ghani, H.; Saleh, N.I.M.; Al Amien, J.; Gunawan, R. Recent systematic review on student performance prediction using backpropagation algorithms. Telkomnika (Telecommunication Computing Electronics and Control) 2022, 20, 597–606.
32. Yuan, J.; Qiu, X.; Wu, J.; Guo, J.; Li, W.; Wang, Y.G. Integrating behavior analysis with machine learning to predict online learning performance: A scientometric review and empirical study. arXiv preprint arXiv:2406.11847, 2024.
33. Li, Y.; Wang, X.; Chen, F.; Zhao, B.; Fu, Q. Online Learning Behavior Analysis and Prediction Based on Spiking Neural Networks. Journal of Social Computing 2024, 5, 180–193.
34. Althibyani, H.A. Predicting student success in MOOCs: a comprehensive analysis using machine learning models. PeerJ Computer Science 2024, 10, e2221.
35. 2009-2010 ASSISTment Data. https://t.ly/oSggh. Accessed: 2025-07-19.



















| Method | Problem Addressed | Gaps Resolved by the Approach | Accuracy | Scalability | Interpretability | Resource Consumption | Adaptability to Large Datasets |
|---|---|---|---|---|---|---|---|
| Learner Behavior Analytics in MOOCs | | | | | | | |
| Machine Learning [11] | Traditional engagement metrics do not capture true student engagement. | Introduces k-means clustering to indicate the necessity of behavioral, emotional, and cognitive measures. | – | High | High | Low-Medium | Medium |
| Process Mining + XAI [12] | Opaque analytics in MOOCs lead to a lack of trust and insight. | Combines process mining, learning analytics, and XAI to identify hidden patterns and provide transparent recommendations. | – | High | Very High | Medium | Very Good |
| Tadakhul System [13] | Cold start issues and insufficient personalization on MOOC platforms. | Combines BiLSTM and CNN to predict behavior, engagement, and drop-out rates. | High | Medium | Medium | Medium | Very Good |
| Video Analytics [14] | Limited understanding of how video interactions affect learner engagement. | Analyzes video metrics and recommends real-time feedback via dashboards. | Medium | High | Medium | Medium | Good |
| Sentiment Analysis [15] | Comment texts are underexplored in MOOCs. | Combines CNN and LSTM in a hierarchical attention model for sentiment detection. | High | Medium | Medium | Medium | Good |
| NLP Framework [16] | Log data analysis overlooks off-platform problem-solving. | Uses topic modeling on free-text responses to identify 18 problem-solving types. | Medium | Medium | High | Medium | Good |
| Predictive AI [17] | Need for improved learner classification for interventions. | Uses k-means and decision trees to segment students for targeted support. | High | Medium | High | Medium | Very Good |
| MOOC Engagement Prediction | | | | | | | |
| Hybrid Engagement Model [18] | Difficulty predicting engagement from complex MOOC data. | Combines unsupervised clustering, BiLSTM, and decision trees. | High | Medium | Medium | Medium | Very Good |
| Video Style Impact Study [19] | Uncertainty on best video modalities for engagement. | Classifies video styles and correlates with engagement types. | Medium | High | Very High | Medium | Good |
| Optimized CNN Model [20] | Need for lightweight real-time model. | Upgrades ShuffleNet v2 with attention; outperforms Inception V3 and ResNet. | High | High | High | Low | Very Good |
| MOOC Dropout Prediction Models | | | | | | | |
| Fractional-Iterative BiLSTM [21] | Conventional models struggle with feature identification. | Uses fractional calculus in BiLSTM to capture nuanced behavior. | High | Medium | Medium | Medium | Good |
| Deep Neural Network [22] | Sparse hand-crafted features hinder prediction. | Uses 17 features from 3,000 students in a DNN. | High | High | Medium | High | Very Good |
| Self-Supervision from Clickstream [23] | Clickstream behavior is underutilized. | Uses self-supervised skip-gram + PCA for better dropout prediction. | Medium | High | Medium | Medium | Good |
| Integrated deep learning and machine learning [24] | Dropout factors are multifaceted and underaddressed. | Integrates DL and ML for adaptive interventions. | Medium | High | Medium | Medium | Good |
| CNN + RNN Hybrid [25] | High dropout in STEM MOOCs; few early warnings. | Combines CNN and LSTM to analyze behavioral sequences. | High | High | Medium | High | Very Good |
| Personalized Learning in MOOCs | | | | | | | |
| Knowledge-based Recommender [26] | Cold start and lack of personalization in recommendations. | Combines semantic web rules and clustering based on learning styles. | High | Medium | High | Medium | Very Good |
| Personalized Learning Exploration [27] | Need for diverse behavior-based recommendations. | Deep learning-based analysis of diverse learner data. | Medium | High | Medium | High | Very Good |
| Sentiment Analysis for Personalization [28] | Learner feedback is underused for personalization. | Combines sentiment analysis with engagement metrics. | Medium | Medium | High | Medium | Good |
| Eye-Tracking Recommendations [29] | Observing non-intrusive behavior for personalization is difficult. | Uses eye-tracking and DL to suggest courses. | Medium | Medium | High | Medium | Good |
| Student Performance Prediction in Online Learning | | | | | | | |
| Behavior-Integrated Prediction [32] | ML models often ignore behavior logs. | Integrates clustering with ML models to include behavior. | High | High | Medium | Medium | Very Good |
| Spiking Neural Networks [33] | Course completion prediction from massive logs is hard. | Uses SNNs to connect behavioral data to completion. | High | High | Low-Medium | High | Very Good |
| ML on Diverse Features [34] | Traditional models use limited data for predictions. | Combines LR and RF on demographics, assessments, and logs. | Medium | High | High | Medium | Very Good |

| Hyperparameter | Value |
|---|---|
| Number of Attention Heads | 4 |
| Hidden Dimension (Embedding) | 128 |
| Learning Rate | 0.001 |
| Batch Size | 64 |
| Number of Epochs | 5 |
| Optimizer | Adam |
| Dropout Rate | Not used |
| Loss Function | Binary Cross-Entropy |
| Sequence Length | 10 |
| GRU Hidden Units | 128 |
| Weight Initialization | Xavier Normal |
| Sparsity Threshold | 50% of max attention score |
| Total Parameters | 166,017 |

| Model | Train Loss | Train Accuracy | Test Loss | Test Accuracy |
|---|---|---|---|---|
| MSA-GRU | 0.0667 | 81.36% | 0.0222 | 80.00% |
| BKT | 0.6651 | 64.10% | 0.6691 | 64.01% |
| PFA | 0.6334 | 63.63% | 0.6363 | 64.11% |
| TSA | 0.0678 | 75.06% | 0.0238 | 74.83% |
| TSA-GRU | 0.0296 | 95.62% | 0.0209 | 95.60% |

| Model | Mean Accuracy | Standard Deviation (SD) | 95% Confidence Interval (CI) |
|---|---|---|---|
| TSA-GRU | 95.63% | ±0.0014 | ±0.0013 |
| MSA-GRU | 65.21% | ±0.0346 | ±0.0430 |
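
As a worked example of how a confidence interval of this kind is typically obtained, the sketch below computes the half-width of a 95% interval from a standard deviation over 5 folds using Student's t distribution. The formula the authors used is not stated in this copy, so this is an assumption; the constant 2.776 is the two-sided 95% critical value for 4 degrees of freedom, and the function name is hypothetical.

```python
import math

T_CRIT_DF4 = 2.776  # two-sided 95% Student's t critical value, 4 degrees of freedom


def ci_half_width(sd, n_folds=5, t_crit=T_CRIT_DF4):
    """Half-width of a t-based confidence interval for a mean over n_folds."""
    return t_crit * sd / math.sqrt(n_folds)


# Standard deviation reported for MSA-GRU in the table above.
msa_gru_hw = ci_half_width(0.0346)
print(f"MSA-GRU 95% CI half-width: ±{msa_gru_hw:.4f}")
```

Under this assumption the computed half-width for MSA-GRU agrees with the tabulated ±0.0430 to within rounding.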

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).