Submitted:
24 October 2025
Posted:
27 October 2025
Abstract
Keywords:
1. Introduction
- Section 2: Survey of Enabling Technologies – A comprehensive review of existing AI tools and technologies relevant to real-time content analysis, learner modeling, and multi-modal delivery. The section highlights current capabilities and identifies the limitations of existing systems, underscoring the need for the proposed STREAM framework.
- Section 3: STREAM Framework: Concepts, Architecture, Components – This section introduces the conceptual framework that outlines the end-to-end flow of content from instructional source to personalized delivery. It details the system’s key components, including content decomposition, learner profiling, and adaptive multi-modal presentation.
- Section 4: Feasibility and Early Prototype Design – This section presents the initial implementation strategy, including the design of a pilot study that uses pre-recorded lectures. It outlines the methodological approach to analyzing content and simulating adaptive delivery based on learners’ preferences.
- Section 5: Discussion – An analytical discussion of how STREAM addresses existing gaps in the literature. The section examines the theoretical and practical implications of implementing such a system, with a focus on enhancing equity, engagement, and real-time responsiveness in virtual learning.
- Section 6: Conclusion and Next Steps – A summary of key contributions, followed by a roadmap for future research, including the development of subsequent papers that will explore specific components of the framework in depth.
2. Survey of Enabling Technologies
2.1. Real-Time Content Analysis Tools
2.2. Learner Modeling and Preference Detection
2.3. Multimodal Delivery Tools
2.4. Existing Adaptive Learning Systems
3. STREAM Framework: Concepts, Architecture, Components
3.1. Conceptual Flow
3.1.1. Source Side

3.1.2. Middle Layer
3.1.3. Receiver Side
3.2. Key Components
3.2.1. Knowledge Point Extraction
3.2.2. Metadata Generation
3.2.3. Learner Profiling
3.2.4. Adaptive Content Generation
3.3. Modularity
4. Feasibility and Early Prototype Design
4.1. Source Side
4.1.1. Pre-Recorded Lecture (e.g., from Pree”)
S1 [00:02:10–00:02:15]: From the start, [step]forward, forward, right[/step] to reach the [entity]zoo[/entity].
S2 [00:02:15–00:02:17]: [prompt]Show me the path[/prompt].
S3 [00:02:17–00:02:19]: [prompt]Test it[/prompt] with your [entity]car[/entity].
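The tagged transcript above uses inline markers such as [step]...[/step]. As a minimal sketch of how such lines could be read back into structured records, the following Python parser recovers the sentence id, timecodes, plain text, and tag contents. The function name `parse_tagged_line` and both regular expressions are illustrative assumptions, not part of the STREAM prototype.

```python
import re

# Matches one inline tag pair such as [step]...[/step]. The tag names used
# below (step, prompt, entity) are the ones visible in the tagged transcript.
TAG_RE = re.compile(r"\[(\w+)\](.*?)\[/\1\]")
# Matches the "S1 [00:02:10–00:02:15]: text" layout (en dash or hyphens).
LINE_RE = re.compile(
    r"^(S\d+) \[(\d{2}:\d{2}:\d{2})[–-]+(\d{2}:\d{2}:\d{2})\]: (.*)$"
)


def parse_tagged_line(line):
    """Split one tagged line into id, timecodes, plain text, and tags."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized transcript line: {line!r}")
    sentence_id, start, end, body = match.groups()
    tags = TAG_RE.findall(body)  # list of (tag_name, tagged_text) pairs
    plain = TAG_RE.sub(lambda m: m.group(2), body)  # drop the markers
    return {"id": sentence_id, "start": start, "end": end,
            "text": plain, "tags": tags}


lines = [
    "S1 [00:02:10–00:02:15]: From the start, [step]forward, forward, "
    "right[/step] to reach the [entity]zoo[/entity].",
    "S2 [00:02:15–00:02:17]: [prompt]Show me the path[/prompt].",
    "S3 [00:02:17–00:02:19]: [prompt]Test it[/prompt] with your "
    "[entity]car[/entity].",
]
parsed = [parse_tagged_line(line) for line in lines]
for item in parsed:
    print(item["id"], item["tags"])
```

The parser does not handle nested tags; the three sample lines do not need them.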
4.2. Middle Layer
4.2.1. Content Component Extraction
S1 [00:02:10.00–00:02:15.00]: "From the start, go forward, forward, then right to reach the zoo."
S2 [00:02:15.10–00:02:16.80]: "Show me the path."
S3 [00:02:17.00–00:02:18.50]: "Test it with your car."
Sentence segmentation (spaCy) [ S1 | S2 | S3 ]
Multi-label classifier outputs (BERT + sigmoid)
# probs shown only for labels >= 0.05
S1: knowledge_point: 0.88   entity: 0.11 (token "zoo")   example: 0.07
S2: prompt: 0.95
S3: prompt: 0.93
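The listing above reports independent per-label probabilities and hides labels below 0.05. A minimal sketch of that sigmoid-and-threshold step follows. The logits are hypothetical: three are back-derived from the probabilities reported for S1, and the "prompt" value is invented to show a suppressed label. No BERT model is loaded here.

```python
import math

DISPLAY_THRESHOLD = 0.05  # from the listing: only labels >= 0.05 are shown


def sigmoid(logit_value):
    """Map a raw score to an independent per-label probability."""
    return 1.0 / (1.0 + math.exp(-logit_value))


def logit(prob):
    """Inverse of sigmoid, used only to back-derive illustrative logits."""
    return math.log(prob / (1.0 - prob))


def visible_labels(logits, threshold=DISPLAY_THRESHOLD):
    """Apply sigmoid per label and keep labels at or above the threshold."""
    probs = {label: sigmoid(value) for label, value in logits.items()}
    return {label: round(p, 2) for label, p in probs.items() if p >= threshold}


# Hypothetical logits for S1 (assumption, see lead-in).
s1_logits = {
    "knowledge_point": logit(0.88),
    "entity": logit(0.11),
    "example": logit(0.07),
    "prompt": -4.0,  # sigmoid(-4.0) is about 0.018, below the threshold
}
s1_visible = visible_labels(s1_logits)
print(s1_visible)
```

Because each label passes through its own sigmoid rather than a shared softmax, the probabilities need not sum to 1, which matches the S1 values (0.88 + 0.11 + 0.07).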
Rule-assisted passes
Imperative detector:
  S2 → prompt=True (verb-initial "Show")
  S3 → prompt=True (verb-initial "Test")
Arrow-sequence parser (token collapse):
  S1 evidence span chars [16,47) = "go forward, forward, then right"
  canonical_steps: ["forward","forward","right"]
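The two rule-assisted passes can be approximated with a few lines of Python. This is a sketch under stated assumptions: the verb lexicon `IMPERATIVE_VERBS` and the direction vocabulary (beyond "forward" and "right") are hypothetical, and the prototype's actual rules are not given in the source.

```python
import re

# Hypothetical lexicon of classroom imperative verbs (assumption).
IMPERATIVE_VERBS = {"show", "test", "draw", "move", "try"}
# Direction vocabulary; "backward" and "left" are assumed additions.
DIRECTION_RE = re.compile(r"\b(forward|backward|left|right)\b", re.IGNORECASE)


def is_imperative(sentence):
    """Verb-initial check: the first word must be a known imperative verb."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return bool(words) and words[0].lower() in IMPERATIVE_VERBS


def parse_steps(sentence):
    """Collapse direction words into canonical steps plus an evidence span.

    The span is half-open [start, end) in sentence-relative characters and
    is widened to include a leading "go " when one is present.
    """
    matches = list(DIRECTION_RE.finditer(sentence))
    if not matches:
        return [], None
    start = matches[0].start()
    if sentence[max(0, start - 3):start].lower() == "go ":
        start -= 3
    end = matches[-1].end()
    return [m.group(1).lower() for m in matches], (start, end)


S1 = "From the start, go forward, forward, then right to reach the zoo."
S2 = "Show me the path."
S3 = "Test it with your car."

steps, span = parse_steps(S1)
print(steps, span, repr(S1[span[0]:span[1]]))
print(is_imperative(S1), is_imperative(S2), is_imperative(S3))
```

A keyword rule like this would also fire on non-directional uses of "right" (for example "that is right"), which is one reason to treat it as an assist to the classifier rather than a replacement.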
Exported tag items (JSON Lines)
{"sentence_id":"S1","time":{"start":"00:02:10.00","end":"00:02:15.00"},"text":"From the start, go forward, forward, then right to reach the zoo.","labels":{"knowledge_point":0.88,"entity":0.11,"example":0.07},"evidence_spans":{"steps":[16,47]},"canonical_steps":["forward","forward","right"]}
{"sentence_id":"S2","time":{"start":"00:02:15.10","end":"00:02:16.80"},"text":"Show me the path.","labels":{"prompt":0.95},"evidence_spans":{"prompt":[66,83]}}
{"sentence_id":"S3","time":{"start":"00:02:17.00","end":"00:02:18.50"},"text":"Test it with your car.","labels":{"prompt":0.93},"evidence_spans":{"prompt":[84,106]}}
"id",
"timecodes",
"text",
"visual_refs",
"type" {definition, step, prompt, example},
"bloom_level",
"difficulty",
"prerequisites"
"id": "KO-0142",
"timecodes": {"start": "00:02:11.20", "end": "00:02:14.90"},
"text": "forward, forward, right",
"type": "step",
"asr_conf": 0.93,
"tag_conf_map": {"knowledge_point": 0.88},
"visual_refs": [{"frame": 3187, "bbox": [412, 276, 86, 44], "ocr": null}],
"bloom_level": "apply",
"difficulty": "intro",
"prerequisites": ["KO-0061: forward arrow meaning"],
"source_hash": "c7a9f2"
"id": "KO-0151",
"timecodes": {"start": "00:01:40.00", "end": "00:01:48.00"},
"text": "test it with your car",
"type": "prompt",
"asr_conf": 0.91,
"tag_conf_map": {"prompt": 0.95},
"visual_refs": [],
"bloom_level": "apply",
"difficulty": "intro",
"prerequisites": ["KO-0148: plan a path"],
"source_hash": "5d2e61"
"id": "KO-0163",
"timecodes": {"start": "00:02:12.10", "end": "00:02:13.20"},
"text": "zoo",
"type": "entity",
"asr_conf": 0.00,
"ocr_conf": 0.88,
"visual_refs": ["frame": 3190, "bbox": 508, 240, 92, 36, "ocr": "zoo"],
"bloom_level": "remember",
"difficulty": "intro",
"prerequisites": [],
"source_hash": "c7a9f2"
4.3. Receiver Side
4.3.1. Single Student Style Adaptation
4.4. Tools
4.5. Feasibility Criteria & Quick Evaluation
4.6. Scope
4.7. Risks & Immediate Mitigations
4.8. Pilot Study Outcomes
4.8.1. Accuracy
4.8.2. Latency and Resources
4.8.3. Output Quality
4.8.4. Traceability and Provenance
4.8.5. Lightweight Ablations
5. Discussion
5.1. Why STREAM Fills an Important Research Gap
5.2. Alignment with Personalized Learning Theories
5.3. Role of AI in Equity and Access
5.4. Possibility for Cross-Disciplinary Collaboration
5.5. Limitations and Roadmap for Validation
- Phase 1: Expanded content testing (short-term, 3-6 months): Apply the STREAM framework to a corpus of 20-30 lessons (5-15 minutes each) spanning STEM, humanities, and languages. Include varied input qualities (e.g., noisy audio from actual classrooms, handwritten slides, multilingual content). Metrics: ASR accuracy (>85%), tagging fidelity (inter-rater agreement >0.8 via human annotation), latency (<5 seconds end-to-end). This will test robustness to content diversity.
- Phase 2: Multi-profile learner validation (medium-term, 6-12 months): Conduct user studies with 50-100 diverse participants (e.g., multilingual learners, students with disabilities like dyslexia or ADHD, varying ages/levels). Simulate profiles beyond the visual (e.g., auditory, kinesthetic) and measure outcomes such as comprehension retention (pre/post-tests), engagement (time-on-task, self-reported via surveys), and preference matching. Use A/B testing to compare adapted vs. non-adapted content. This will evaluate equity impacts in controlled lab settings.
- Phase 3: Real-world deployment pilots (long-term, 12-24 months): Deploy in 3-5 virtual classrooms (e.g., K-12 and higher ed, urban/rural sites) with bandwidth constraints. Integrate with platforms like Zoom or Moodle, tracking scalability metrics (e.g., concurrent users without latency spikes, edge vs. cloud performance). Include ethical reviews for privacy and bias audits. Longitudinal data will assess sustained impacts on learning outcomes and accessibility.
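Two of the Phase 1 metrics can be computed with short standard-library routines. The sketch below makes two assumptions that the roadmap does not state: "ASR accuracy" is read as 1 minus word error rate, and "inter-rater agreement" is measured with Cohen's kappa. The sample strings and labels are illustrative only.

```python
from collections import Counter


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


wer = word_error_rate("show me the path", "show me a path")
kappa = cohen_kappa(["prompt", "prompt", "step", "step"],
                    ["prompt", "prompt", "step", "prompt"])
print(f"accuracy={1 - wer:.2f} meets >0.85 target: {1 - wer > 0.85}")
print(f"kappa={kappa:.2f} meets >0.8 target: {kappa > 0.8}")
```

In this toy case one substituted word in four gives a WER of 0.25, and one disagreement in four labels gives a kappa of 0.5, so neither target would be met.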
6. Conclusions
6.1. Contribution Summary
6.2. Content Decomposition and Prompt Generation
6.3. Scope Alignment
Author Contributions
Data Availability Statement
Conflicts of Interest
Abbreviations
| API | Application Programming Interface |
| ASR | Automatic Speech Recognition |
| BERT | Bidirectional Encoder Representations from Transformers |
| CI | Confidence Interval |
| CV | Computer Vision |
| fps | frames per second |
| GPU | Graphics Processing Unit |
| NER | Named Entity Recognition |
| NLP | Natural Language Processing |
| OCR | Optical Character Recognition |
| PCA | Principal Component Analysis |
| PNG | Portable Network Graphics |
| SER | Sentence Error Rate |
| STREAM | Semantic Transformation and Real-Time Educational Adaptation Multimodal |
| SVG | Scalable Vector Graphics |
| T5 | Text-to-Text Transfer Transformer |
| TTS | Text-to-Speech |
| UDL | Universal Design for Learning |
| VAD | Voice Activity Detection |
| VARK | Visual, Auditory, Reading/Writing, Kinesthetic |
| VRAM | Video Random-Access Memory |
| YOLO | You Only Look Once |
| WCAG | Web Content Accessibility Guidelines |
| WER | Word Error Rate |
| 1 | Contrast ratio $(L_1 + 0.05)/(L_2 + 0.05)$, where $L$ is relative luminance ($L_1$ for the lighter color, $L_2$ for the darker). |





| Category | Technology/Platform | Function | Limitation |
|---|---|---|---|
| Real-Time Content Analysis Tools | T5, BERT, GPT | Text-based content analysis and semantic extraction | Requires fine-tuning for educational contexts |
| Real-Time Content Analysis Tools | Whisper, Google Speech-to-Text | Speech recognition and lecture transcription | Accuracy may drop with noisy inputs or accents |
| Real-Time Content Analysis Tools | OpenCV, YOLO, Vision API | Visual content segmentation and object recognition | Limited interpretation of abstract visuals |
| Learner Modeling & Preference Detection | Emotion APIs, Affective Computing Tools | Detects emotional and motivational states in learners | Potential bias; limited granularity without hardware |
| Learner Modeling & Preference Detection | Eye-tracking (Tobii, iMotions) | Tracks gaze, attention, and behavioral engagement | Intrusive or costly; sensitive to setup |
| Learner Modeling & Preference Detection | VARK, Felder-Silverman Models | Categorizes learners by preferred learning modalities | Contested theoretical validity |
| Multimodal Delivery Tools | TTS Engines (Polly, WaveNet) | Delivers content in natural spoken formats | Modality fidelity varies by language and platform |
| Multimodal Delivery Tools | ChatGPT, Gemini, GenAI | Generates custom content for adaptive instruction | Limited control over depth and granularity |
| Multimodal Delivery Tools | Semantic Communication Models | Optimizes message meaning in low-bandwidth settings | Still emerging; high technical complexity |
| Existing Adaptive Learning Systems | Khan Academy, Coursera, Smart Sparrow | Personalized paths based on performance history | Lacks real-time adaptation and multimodal personalization |
| Feature | STREAM | Khan Academy | Coursera | Smart Sparrow | Duolingo | ALEKS |
|---|---|---|---|---|---|---|
| Real-Time Adaptation | Yes: Decomposes and adapts content during live/streamed lessons with s latency on standard hardware. | Partial: AI feedback via Khanmigo; adapts between exercises from post-performance data; limited in-lesson processing. | Partial: Recommends modules post-quiz; no live decomposition/regeneration. | Partial: Adaptive simulations, but rule-based and not real-time for all content types. | Partial: Adjusts difficulty in-session, but limited to gamified drills without full content transformation. | No: Adapts paths based on assessments; offline processing dominates. |
| Multimodal Content Delivery | Yes: Dynamically generates/regenerates across text, audio, video, diagrams; fuses ASR/NLP/CV for seamless integration. | Partial: Text, video, exercises with some AI narration; no real-time modality switching or generation. | Partial: Video lectures, quizzes, text; limited to pre-made formats without fusion. | Yes: Interactive simulations with text/video; not AI-driven regeneration for live contexts. | Partial: Audio/text drills, images; app-based, no video decomposition or custom generation. | No: Primarily text-based math problems; minimal multimodal support. |
| Personalization Depth | High: Dynamic learner profiles (cognitive/affective states via eye-tracking, emotion APIs); adapts to preferences, disabilities, multilingual needs with UDL compatibility. | Medium: Performance-based paths with AI tutoring; basic mastery tracking, limited affective or real-time behavioral modeling. | Medium: Skill-based recommendations; learner profiles limited to progress/history. | High: Scenario-based adaptation; includes some behavioral cues, but not deeply affective. | Medium: Skill/decay models; gamified, but no deep affective or disability-focused profiles. | High: Knowledge-space theory for math; detailed but domain-specific; no multimodal/affective. |
| Content Decomposition & Tagging | Yes: AI-driven (BERT/T5 for semantics, Whisper for speech, YOLO/OpenCV for visuals); tags units with metadata for traceability. | No: Relies on pre-tagged content; no automated decomposition. | No: Courses are pre-structured; no real-time tagging. | Partial: Tags simulations; manual/author-driven, not AI-automated. | No: Pre-built lessons; algorithmic but not decomposed via multimodal AI. | No: Pre-defined knowledge points; no real-time multimodal tagging. |
| Equity & Accessibility Focus | High: Designed for diverse populations; supports multilingual use and disabilities via regenerated formats and provenance links. | Medium: Free access, subtitles, AI for underserved areas; largely one-size-fits-all. | Medium: Subtitles, mobile access; partnerships for equity, but not adaptive regeneration. | Medium: Customizable for inclusivity; deployment-limited. | High: Multilingual support, gamification for engagement; app-centric, less for disabilities. | Medium: Adaptive pacing; limited multimodal accessibility. |
| Scalability & Hardware Needs | High: Modular pipeline for classroom-grade hardware; pilot-tested in clean conditions with a roadmap for noisy/bandwidth-constrained extensions. | High: Web/app-based; scales globally. | High: Cloud-based; accessible worldwide. | Medium: Requires authoring tools; less scalable for non-experts. | High: Mobile-first; scales via app ecosystem. | High: Web-based; LMS integrations; math-focused. |
| Validation & Evidence | Pilot-based: Feasibility on a 5-minute STEM clip; staged roadmap for diverse testing (e.g., multilingual, disabilities). | Extensive: Data from millions; A/B tests on mastery learning and AI efficacy. | Extensive: University partnerships; completion-rate analyses. | Research-backed: Studies on adaptive simulations. | Extensive: App metrics; language-retention studies. | Research-backed: Knowledge-space model validated in education studies. |
| Component | Purpose | Technologies Used | Role in Framework | Conceptual Flow Location |
|---|---|---|---|---|
| Knowledge Point Extraction | Identifies and isolates core instructional concepts (definitions, skills, etc.) from multimodal content. | Transformer-based NLP (e.g., BERT), OCR, semantic parsing. | Converts instructional content into modular, meaningful learning units. | Middle layer (content analysis / decomposition) |
| Metadata Generation | Adds descriptive and pedagogical tags (e.g., type, difficulty, modality) to enable intelligent retrieval and alignment. | Heuristic tagging, Bloom’s taxonomy mapping, prosodic/emotional analysis. | Provides a metadata layer for content organization and adaptive use. | Middle layer (content analysis / decomposition) |
| Learner Profiling | Builds dynamic profiles based on learner preferences, behaviors, and emotional states to guide personalization. | Machine learning, affective computing, behavioral analytics. | Guides decision-making on what and how to present content. | Receiver side (student model) |
| Adaptive Content Delivery | Delivers content in customized formats and sequences across modalities, adapting in real-time to learner responses. | Decision algorithms, multimodal rendering engines, real-time feedback loops. | Implements learner-facing adaptations for engagement and mastery. | Receiver side (personalized delivery) |
| Label | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|
| Knowledge Point | 75.0 | 65.0 | 69.7 |
| Prompt | 90.0 | 85.0 | 87.4 |
| Entity | 80.0 | 70.0 | 74.7 |
| Example | 70.0 | 60.0 | 64.7 |
| Overall (Macro Avg.) | 78.8 | 70.0 | 74.1 |
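The macro-average row can be re-derived from the per-label rows. The sketch below takes the macro F1 as the unweighted mean of per-label F1 scores, which is an assumption about how the table was computed. Recomputing F1 from the rounded precision and recall gives 69.6 and 64.6 for Knowledge Point and Example, 0.1 below the tabulated values, which may reflect F1 having been computed from unrounded counts.

```python
# Precision and recall (in percent) as reported in the accuracy table.
reported = {
    "Knowledge Point": (75.0, 65.0),
    "Prompt": (90.0, 85.0),
    "Entity": (80.0, 70.0),
    "Example": (70.0, 60.0),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_label_f1 = {label: f1(p, r) for label, (p, r) in reported.items()}
macro_precision = sum(p for p, _ in reported.values()) / len(reported)
macro_recall = sum(r for _, r in reported.values()) / len(reported)
macro_f1 = sum(per_label_f1.values()) / len(per_label_f1)

for label, score in per_label_f1.items():
    print(f"{label}: F1 = {score:.1f}")
print(f"macro P = {macro_precision:.2f}, R = {macro_recall:.1f}, "
      f"F1 = {macro_f1:.1f}")
```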
| Stage | Median Time (min) | 90th Percentile (min) |
|---|---|---|
| ASR | 4.1 | 4.5 |
| NLP (Tagging) | 1.2 | 1.4 |
| Vision (OCR + Arrow Detection) | 1.4 | 1.6 |
| Rendering | 0.5 | 0.6 |
| Total | 7.2 | 7.8 |
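One way to read the latency table is as a real-time factor, that is, processing time divided by media duration. The sketch below assumes the timings were measured on the 5-minute STEM clip mentioned in the comparison table; that pairing is an assumption, and `CLIP_MINUTES` should be changed if the clip length differs.

```python
# Median per-stage processing times (minutes) from the latency table.
stage_median_min = {
    "ASR": 4.1,
    "NLP (Tagging)": 1.2,
    "Vision (OCR + Arrow Detection)": 1.4,
    "Rendering": 0.5,
}
CLIP_MINUTES = 5.0  # assumption: timings refer to the 5-minute pilot clip

total_min = sum(stage_median_min.values())
real_time_factor = total_min / CLIP_MINUTES  # > 1 means slower than playback
shares = {stage: t / total_min for stage, t in stage_median_min.items()}
bottleneck = max(shares, key=shares.get)

print(f"total = {total_min:.1f} min, real-time factor = {real_time_factor:.2f}")
for stage, share in shares.items():
    print(f"{stage}: {share:.1%} of median processing time")
print(f"largest stage: {bottleneck}")
```

The stage medians sum to the tabulated total of 7.2 minutes. The 90th-percentile column does not sum to its total (8.1 versus 7.8), which is expected because percentiles of a sum are not sums of percentiles.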
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
