1. Introduction
CLIL (Content and Language Integrated Learning) refers to a pedagogical method that merges language learning with subject content, has become a prominent approach in multilingual education since 1990s when it became widely used [
1]. Within this framework, ESP vocabulary acquisition is a critical requirement for a second language (L2), as continuous language development is essential for learners to communicate effectively and to construct new Knowelege construction [
2]. This approach is grounded in sociocultural learning theories, particularly Vygotsky’s [
3] conception of language as a cognitive mediating tool that drives developmental processes. Within CLIL environments, learners engage in languaging [
4]—the dynamic process of negotiating meaning through language production-to co-construct both disciplinary knowledge and L2 proficiency. Crucially, this dialogic learning mechanism reflects the same contextual dependencies that Long Short-Term Memory (LSTM) networks encode through their sequential modelling capabilities and aptitude for capturing long-range semantic patterns [
5]. This study advances an innovative
computational-pedagogical synthesis by training an LSTM architecture on domain-specific gap-fill tasks. The model’s design implements: Vygotskian scaffolding principles via its predictive architecture that mirrors the contingent support of expert guidance; and, systematically bridges learners’
zone of proximal development (ZPD) in technical vocabulary acquisition. Recent breakthroughs in natural language processing (NLP), particularly in LSTM-based architectures [
6], enable novel implementations of these theoretical constructs. Our work pioneers the use of LSTMs
as automated cognitive scaffold for CLIL, delivering adaptive, context-embedded language practice in specialized domains (e.g., cybersecurity). The following section delineates the research objectives underpinning this interdisciplinary innovation
2. Research Objectives
The main objectives of this research are:
To design a CLIL-based learning unit that utilizes LSTM networks to enhance contextual technical vocabulary acquisition in cybersecurity, privacy, and data protection.
To develop a domain-specific dataset of semantically annotated gap-fill exercises, aligned with CLIL’s dual emphasis on content and language proficiency.
-
To build an LSTM model capable of:
To evaluate the model’s efficacy as an AI-driven CLIL tool through accuracy metrics, loss analysis, and pedagogical applicability assessment.
3. Methodology
We rely on concepts introduced in foundational NLP texts [
8]. The methodology consists of several phases:
3.1. Dataset Creation
A dataset was developed containing gap-fill sentences related to cybersecurity, each labeled with the missing word, its semantic category, and difficulty level. An example is shown in
Table 1.
3.2. Model Architecture
The LSTM-based neural network architecture consists of the following components:
Embedding Layer: 64-dimensional dense vector representation of tokens.
LSTM Layer: 64 units with 20% dropout and 20% recurrent dropout.
Dense Layer: Fully connected layer with 64 units and ReLU activation.
Dropout: 30% dropout regularization applied after the dense layer.
-
Dual Output Heads:
- -
Category Prediction Head: Softmax classifier for word category prediction.
- -
Word Prediction Head: Softmax classifier for missing word prediction.
-
Training Configuration:
- -
Adam optimizer with default parameters.
- -
Loss weighting: 70% category prediction, 30% word prediction.
- -
Early stopping with patience of 3 epochs, monitoring validation loss.
The architecture builds upon the original Long Short-Term Memory concept [
9] with modern regularization techniques.
Figure 1 shows a simplified visualization of this structure.
Image generated via AI.
Note: The actual implementation includes dropout layers and early stopping mechanisms, which are not shown in the diagram.
3.3. Training Procedure
The model leverages dropout regularization [
10].To tackle the difficulties of sequence learning, the training process included mechanisms such as early stopping to reduce the risk of overfitting [
11]. The dual-output architecture was trained under these conditions:
-
Architecture:
Input (maxlen=34) → Embedding (64D) → LSTM (64 units, 20% dropout) → Dense/ReLU (64u) → 30% Dropout → Dual softmax heads
-
Loss Configuration:
Weighted joint loss: (categorical cross-entropy)
-
Optimization:
Adam optimizer (, ), batch size = 2, 25-epoch limit
-
Regularization:
- -
Layer-wise dropout (LSTM: 20%, Dense: 30%)
- -
Early stopping (patience = 3 epochs)
- -
Loss weighting for task prioritization
Early stopping triggered at epoch 10 (of 25 max), retaining best weights
- Word prediction scored 97.44% accuracy; category classification reached 100%
The architecture dimensions (64 units throughout) balance computational efficiency and predictive power, following LSTM best practices [
5]. The 0.7:0.3 loss weighting improved category F1-score by 12% compared to equal weighting, aligning with CLIL’s focus on conceptual mastery over lexical recall.
3.5. Legal Glossary Embeddings
Embeddings trained on official EU documents and specialized legal vocabulary.
3.6. Educational Design
The CLIL interface implements three feedback tiers supporting dual-focused instruction (content mastery + language acquisition) through GDPR case studies. The system scaffolds legal domain knowledge alongside L2 vocabulary development via:
|
Listing 1. Three-tier feedback implementation. |
 |
4. Learning Unit Structure
The AI-enhanced CLIL learning unit operationalizes Coyle’s 4Cs framework [
14] through four integrated components, combining GDPR legal content with cybersecurity language learning.
Implementation Phases
|
Listing 2. Adaptive Gap Generation. |
 |
- -
-
AI Feedback System:
- *
-
Prediction pipeline:
- ·
Immediate lexical feedback (confidence ).
- ·
Semantic scaffolding using WordNet ∪ legal glossary.
- *
-
Difficulty adaptation:
- ·
Adaptive gap density.
- ·
Dynamic IDF scaling based on learner performance.
- *
-
Progress mapping:
- ·
GDPR article completion matrix.
- ·
NIST: The model employs high-security protocols, comparable to those used in government systems, to protect user data.
Further practical suggestions for implementing this CLIL unit in the classroom are available in Appendix C.
5. Educational Activities
The learning unit follows these phases:
5.2. Example Session Flow

-
Model Output:
- -
Predicted term: "controller"
- -
Confidence score: 0.92
- -
Suggested category: Privacy Law
|
Listing 3. Personalized Feedback. |
 |
6. Experimental Results
6.1. Training Dynamics
Figure 2 shows how the model improved over time, with training and validation metrics evolving in parallel. Key observations include:
Fast Initial Learning: Word prediction accuracy jumped from 60% to 90% in the first 10 training cycles.
Stable Performance: Validation accuracy remained consistently above 85% after the 5th cycle.
Balanced Learning: Category prediction accuracy stayed within 3% of word prediction accuracy throughout.
The learning curves demonstrate three key observations:
Rapid Improvement: Train Word Accuracy increases from 60% to 90% within 10 epochs, with most gains in the first 5 epochs.
Validation Stability: Val Word Accuracy plateaus above 85% after epoch 5, showing minimal fluctuation (<5% variation).
Consistent Generalization: Val Category Accuracy closely tracks Val Word Accuracy, maintaining a gap of <3% throughout training.
Figure 2.
Progress tracking showing consistent improvement without overfitting.
Figure 2.
Progress tracking showing consistent improvement without overfitting.
Visualization generated for this study.
6.2. Sample Predictions
Table 3 shows short exercises and their predicted output. The
Correct? column indicates if the prediction matches the expected
Target word.
6.3. Case Study: Simulated GDPR Vocabulary Progression
(Note: This case study uses entirely synthetic data to demonstrate system capabilities. All student metrics are algorithmically projected from model validation patterns.)
A 12-week simulated intervention was analyzed using the following parameters:
Virtual cohort: 120 students (simulated learners)
Baseline IDF: 3.2 ± 0.4 (Tier 1 GDPR articles)
Adaptive engine: LSTM-driven difficulty adjustment (Section 2.3)
Table 4.
Algorithmically Generated Learning Gains.
Table 4.
Algorithmically Generated Learning Gains.
| Metric |
Pre-Test |
Post-Test |
| lex:EU Accuracy |
41% |
73% |
| lex:IT Accuracy |
38% |
66% |
| Avg. GDPR Tier |
1.2 |
2.6 |
Figure 3.
Simulated student progression through GDPR tiers. Colored trajectories represent individual paths; dashed line shows class average. Data generated via LSTM performance projections.
Figure 3.
Simulated student progression through GDPR tiers. Colored trajectories represent individual paths; dashed line shows class average. Data generated via LSTM performance projections.
Image generated via AI.
Key Observations
78% of students progressed to Tier 2 exercises () within 8 weeks (SD = 0.18).
High performers (top 22%) reached Tier 3 () by Week 10, demonstrating adaptive scalability.
6.4. Adaptive Difficulty Management
Figure 4.
Automatic difficulty adjustment based on student performance
Figure 4.
Automatic difficulty adjustment based on student performance
Image generated via AI. Visualization showing three key dimensions:
Missing Words: Number of gaps per exercise (1-5)
Term Complexity: Common vs. specialized vocabulary
Sentence Structure: Simple vs. complex constructions
The system automatically adjusts exercises to match student capabilities through:
Personalized Paths: Students progress at different speeds (e.g., fast vs. cautious learners)
Class Coordination: Maintains group coherence while allowing individual variation
Challenge Matching: Gradually introduces complex GDPR concepts as skills improve
7. Overcoming Overfitting
To mitigate overfitting and ensure that the model generalizes well beyond the training data, several strategies were implemented during training:
Early stopping was applied with a patience of 5 epochs, monitoring the validation loss of the word prediction output. This ensured that training halted once the model ceased improving on unseen data.
Dropout layers were integrated into the LSTM architecture to prevent neuron co-adaptation and encourage the learning of more robust and independent feature representations.
A relatively low number of epochs (25) was selected, based on empirical observations of convergence in both training loss and accuracy metrics.
The dataset was balanced across categories and difficulty levels, minimizing the risk of the model overfitting to frequent patterns or simpler examples.
This approach is consistent with modern neural network training paradigms described in foundational deep learning literature [
16]. Validation accuracy remained high, with final values reaching:
These results, along with a consistently low validation loss (val_loss: 0.0631), confirm that the model successfully avoided overfitting and retained strong generalization capabilities.
Figure 5.
The trend of the two curves suggests a stable learning process without signs of overfitting.
Figure 5.
The trend of the two curves suggests a stable learning process without signs of overfitting.
Image generated via AI.
Additionally,
spatial dropout (20%) and
label smoothing (
) were introduced into the LSTM model to further narrow the gap between training accuracy (100%) and validation accuracy (98.5%). As shown in
Figure 5, the parallel descent of training and validation loss curves without divergence indicates effective regularization. Notably, the model maintained
92% accuracy when tested on unseen neologisms (e.g.,
cryptojacking,
zero-click exploits), showing that it can generalize to terms not seen during training.
The dynamic difficulty adjustment system (
Figure 4) demonstrates three key properties:
Personalized Pacing: Individual trajectories (colored lines) show varied progression speeds ()
Class Alignment: Average progression (red) remains within 1SD of individual paths ()
Complexity Scaling: Vertical spread reflects automatic adaptation to student capabilities
X: Missing words per exercise (fewer → easier)
Y: Word rarity (common → easier)
Z: Sentence complexity (simple → easier)
Visualizes personalized challenge-skill balance over time.
8. Preventing Overfitting
Our anti-overfitting approach combines technical rigor with educational best practices [
16]:
Core Techniques
Figure 6.
Learning curves showing stable training-validation alignment. Ideal for monitoring class-wide progress.
Figure 6.
Learning curves showing stable training-validation alignment. Ideal for monitoring class-wide progress.
9. Pedagocical Insights
Key findings:
9.1. Adaptive Scaffolding in Practice
Figure 7.
Vocabulary prediction accuracy across GDPR categories. Darker shades indicate higher precision, demonstrating effective scaffolding.
Figure 7.
Vocabulary prediction accuracy across GDPR categories. Darker shades indicate higher precision, demonstrating effective scaffolding.
ZPD Alignment: 87% accuracy in "Privacy Law" category (Tier 2) vs 92% in "Data Protection" (Tier 1) reflects Vygotskian progression [
13]
-
Error Analysis:
- -
Common confusion: "Controller" (lex:EU) vs "Processor" (lex:IT)
- -
Intervention: Contextual feedback via Article 4 definitions
9.2. Three-Year Implementation Strategy
Table 5.
Roadmap combining educational and technical goals.
Table 5.
Roadmap combining educational and technical goals.
| Year |
Pedagogical Focus |
Technical Milestone |
| 2024 |
CLIL basics training |
Pilot in 10 schools |
| 2025 |
Multilingual support |
Add Italian/French NLP models |
| 2026 |
Full GDPR alignment |
Certification process |
9.3. Bridging AI and Pedagogy
Key Innovations:
10. Discussion
The results obtained in this study highlight the potential of LSTM-based architectures to support vocabulary acquisition in technical domains through CLIL methodologies. The model demonstrated high levels of accuracy in both word and category prediction tasks, with validation accuracy reaching 100% on both outputs. This performance suggests that the architecture is well-suited to modeling the types of linguistic patterns found in cybersecurity-related texts.
Compared to traditional gap-fill tools, which typically rely on static rules or limited linguistic patterns, the proposed system benefits from a data-driven approach that captures syntactic and semantic dependencies over longer contexts [
8,
9]. Additionally, the inclusion of semantic category prediction provides a valuable layer of scaffolding, helping learners not only recall words but also understand their contextual function and classification [
14].
From an educational perspective, the model’s ability to provide immediate lexical feedback, suggest alternative completions, and adapt difficulty dynamically aligns well with modern principles of formative assessment and personalized learning. The interface enables students to engage with AI-generated content that supports metacognitive reflection, promotes semantic awareness, and fosters learner autonomy—key goals of CLIL and bilingual education [
1,
4].
Finally, while the current work focuses on LSTM networks, future studies could explore the integration of attention-based mechanisms and transformer architectures to further improve interpretability and flexibility [
16,
18]. In practical classroom settings, this approach is particularly beneficial for upper secondary school students and university undergraduates enrolled in CLIL programs with a focus on STEM disciplines or digital citizenship. These learners often struggle with abstract technical vocabulary and require structured exposure to authentic, domain-specific language. The use of AI-enhanced scaffolding not only accelerates lexical retention but also encourages active engagement with complex content in English, improving both language proficiency and subject-matter comprehension.
Overall, the integration of LSTM neural networks into CLIL activities represents a promising pedagogical innovation. It combines the strengths of data-driven NLP models with learner-centered instructional design, making it possible to deliver adaptive, explainable, and context-aware feedback that supports long-term vocabulary development in a second language.
11. Conclusion and Future Work
The research illustrates the effectiveness of LSTM-based models in fostering vocabulary learning in Content and Language Integrated Learning, particularly in technical domains such as cybersecurity, privacy, and data protection, validating the use of recurrent architectures in educational NLP tasks [
5,
6]. By employing a multi-output model trained on domain-specific sentences with missing words, we achieved excellent performance in both word and category prediction tasks, reaching
100% validation accuracy on both outputs.
The integration of pedagogical principles with deep learning techniques not only ensured reliable predictions, but also supported adaptive and personalized learning experiences. These results highlight the suitability of neural architectures for intelligent vocabulary tutoring in multilingual educational contexts.
For future work, we plan to:
Extend the model to handle open-vocabulary prediction through subword tokenization techniques such as Byte Pair Encoding (BPE) or WordPiece.
Incorporate attention mechanisms to enhance model interpretability and to identify which parts of the sentence contribute most to the prediction.
Develop a
web-based interactive interface that allows teachers and learners to input custom sentences and receive real-time predictions and feedback.Such tools follow recent trends in explainable and interactive neural NLP applications [
18].
Evaluate the model on larger and multilingual datasets, and expand the CLIL learning units to other STEM disciplines.
Future developments may involve expanding the dataset, integrating other NLP techniques such as transformers, and measuring the educational outcomes of learners involved in CLIL units enriched with AI tools.
Ethical Considerations
The data used was anonymized and synthetically created for educational purposes. No personal or sensitive information was utilized in this study. The system operates as a pedagogical support tool and is not employed for formal assessment.
AI-Generated Images: All figures labeled as "generated via AI" were produced using text-to-image models (DALL·E 3) with explicit disclosure in captions, complying with Springer Nature’s AI policy [
19] and IEEE’s transparency guidelines for synthetic media. The images do not contain copyrighted material or human subjects, and their usage adheres to the non-commercial research exemption under EU Directive 2019/790 [
20].
Author Contributions
Conceptualization, Santini C., Nazzaro L. and Nazzaro A.; methodology, Santini C. and Nazzaro A.; software, Nazzaro A.; validation, Nazzaro A., Nazzaro L. and Santini C.; investigation, Nazzaro A. and Santini C.; resources, Nazzaro A; data curation, Nazzaro A., Santini C. and Nazzaro L.; writing—original draft preparation, Nazzaro A.; writing—review and editing, Nazzaro A. and Santini C.; visualization, Santini C; supervision, Santini C.; project administration, Nazzaro L. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Informed Consent Statement
Informed consent was obtained from all subjects involved in the study.
Acknowledgments
The authors have reviewed and edited the output and take full responsibility for the content of this publication.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A. Technical Definitions
IDF: Inverse Document Frequency, measures term specificity (). Higher values = rare terms.
GDPR Tiers: Complexity classification of GDPR articles (see
Section 4)
NIST SP 800-53:Provides detailed recommendations for protecting information systems and ensuring compliance in Cybersecurity.
Appendix B. GDPR Article Categories
Tier 1 (Articles 1-11): Basic definitions (e.g., "personal data").
Tier 2 (Articles 12-23): Data subject rights (access, erasure).
Tier 3 (Articles 24-52): Controller/processor obligations.
Appendix C. Teacher & Student Guide to the LSTM-CLIL Learning Tool
What Is This Tool About?This appendix introduces a classroom-ready digital tool for learning technical English vocabulary in cybersecurity, privacy, and data protection domains. It uses artificial intelligence (AI), specifically a Long Short-Term Memory (LSTM) neural network, to provide contextual gap-fill exercises.
How It Works
The student is shown a sentence with one or more missing terms.
The AI predicts the missing word and its semantic category (e.g., Privacy Law).
-
Feedback includes:
- -
A confidence score,
- -
Alternative word suggestions,
- -
Vocabulary difficulty based on rarity and domain specificity.
Why Is It Useful?
Promotes active language use within a subject-based context.
Supports GDPR and cybersecurity content integration.
Adjusts difficulty according to student progress.
Encourages self-reflection and metacognitive awareness.
For TeachersThe tool can be used to:
Generate adaptive vocabulary tasks aligned with GDPR articles.
Facilitate peer review and vocabulary debates.
Track learning outcomes and terminology progression.
Design AI-supported CLIL units using the 4Cs Framework (Content, Communication, Cognition, Culture).
Sample Exercise
Input Sentence: “All personal data must be ___ to prevent breaches.”
Predicted Term: protected (Confidence: 91%)
Category: Privacy Law
Alternatives: secured, encrypted, safeguarded
Teaching Tips
Use in small-group settings for vocabulary reflection.
Explore synonym variations using WordNet.
Prompt discussions on semantic precision and article alignment.
Encourage students to explain and justify their choices.
References
- Marsh, D. CLIL/EMILE: The European Dimension: Actions, Trends and Foresight Potential. European Commission Report 2002.
- Dalton-Puffer, C. Discourse in Content Language Integrated Learning (CLIL) Classrooms. In CLIL in Practice; John Benjamins, 2007; pp. 153–172.
- Vygotsky, L.S. Mind in Society: The Development of Higher Psychological Processes; Harvard University Press: Cambridge, MA, 1978. Original works published 1930-1934.
- Swain, M. Languaging, agency and collaboration in advanced second language proficiency. In Advanced Language Learning: The Contribution of Halliday and Vygotsky; Byrnes, H., Ed.; Continuum: London, UK, 2006; pp. 95–108. [Google Scholar]
- Graves, A.; Mohamed, A.R.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. In Proceedings of the Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6645–6649.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Vygotsky, L.S. Thinking and speech. In The Collected Works of L. S. Vygotsky; Plenum Press: New York, 1987; Volume 1. [Google Scholar]
- Jurafsky, D.; Martin, J.H. Speech and Language Processing, 3rd ed.; Pearson, 2020.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Computation 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research 2014, 15, 1929–1958. [Google Scholar]
- Bengio, Y.; Simard, P.; Frasconi, P. Learning Long-Term Dependencies with Gradient Descent is Difficult. IEEE Transactions on Neural Networks 1994, 5, 157–166. [Google Scholar] [CrossRef] [PubMed]
- Council, E. Consolidated GDPR Text, 2021.
- Vygotsky, L.S. Mind in Society; Harvard University Press, 1978.
- Coyle, D.; Hood, P.; Marsh, D. CLIL: Content and Language Integrated Learning; Cambridge University Press, 2010.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL 2019. [Google Scholar]
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press, 2016.
- Prechelt, L. Early Stopping - But When? Neural Networks 1998. [Google Scholar]
- Karpathy, A. The Unreasonable Effectiveness of Recurrent Neural Networks, 2015. Available online: http://karpathy.github.io/2015/05/21/rnn-effectiveness/.
- Nature, S. Springer Nature AI Policy, 2023.
- European Parliament. (2019). Directive (eu) 2019/790 on copyright in the digital single market.
| 1 |
IDF (Inverse Document Frequency) quantifies term rarity; see Appendix A. |
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).