Submitted: 09 April 2025
Posted: 10 April 2025
Abstract
Keywords:
I. Introduction
II. Methods
A. Research Design
B. Search Strategy
- “generative AI” OR “generative artificial intelligence”
- “large language model*” OR “LLM*”
- “transformer model*” OR “attention mechanism*”
- “GPT” OR “BERT” OR “LaMDA” OR “PaLM”
- “neural language model*” OR “self-attention”
- “LLM model*” OR “LLM training”
- “chain-of-thought prompting*” OR “prompt engineering”
- “tokenization” OR “text-to-text transfer”
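For illustration, the terms above can be combined into a single Boolean search string. The following sketch assumes Scopus-style TITLE-ABS-KEY fielding; the exact operator and wildcard syntax varies by database and is not the authors' verbatim query:

TITLE-ABS-KEY ( "generative AI" OR "generative artificial intelligence" OR "large language model*" OR "LLM*" OR "transformer model*" OR "attention mechanism*" OR "GPT" OR "BERT" OR "LaMDA" OR "PaLM" OR "neural language model*" OR "self-attention" OR "LLM model*" OR "LLM training" OR "chain-of-thought prompting*" OR "prompt engineering" OR "tokenization" OR "text-to-text transfer" )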
C. Inclusion and Exclusion Criteria
Inclusion criteria:
- Peer-reviewed journal articles, conference proceedings, and technical reports
- Literature focusing on generative AI and large language models
- Studies examining technical architecture, applications, limitations, or future directions
- Publications in English
- Literature published between 2017 and March 2025
- Websites of institutions (e.g., the European Commission) and blogs of major LLM providers (OpenAI, DeepSeek, Google)
Exclusion criteria:
- Non-English publications
- Opinion pieces without substantial technical or empirical content
- Studies focusing exclusively on other AI technologies without significant discussion of generative AI or LLMs
- Duplicate publications or multiple reports of the same study
D. Data Extraction and Synthesis
- Publication details (authors, year, journal/conference)
- Study objectives and methodology
- Key findings and contributions
- Technical details of models or architectures discussed
- Applications and use cases
- Limitations and challenges identified
- Future research directions proposed
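As a minimal sketch of how these fields could map onto a per-study extraction record, the following Python structure illustrates one possible instrument; the field names and the example values are illustrative assumptions, not the authors' actual extraction form:

from dataclasses import dataclass

@dataclass
class ExtractionRecord:
    authors: str             # Publication details: authors
    year: int                # Publication year
    venue: str               # Journal/conference
    objectives: str          # Study objectives
    methodology: str         # Study methodology
    key_findings: str        # Key findings and contributions
    technical_details: str   # Models or architectures discussed
    applications: str        # Applications and use cases
    limitations: str         # Limitations and challenges identified
    future_directions: str   # Future research directions proposed

# Hypothetical record for one study in the reference list (Vaswani et al., 2017):
example = ExtractionRecord(
    authors="Vaswani, A.; et al.",
    year=2017,
    venue="arXiv (presented at NeurIPS 2017)",
    objectives="Propose a sequence-transduction architecture based solely on attention",
    methodology="Architecture design evaluated on machine-translation benchmarks",
    key_findings="The Transformer outperforms recurrent and convolutional baselines",
    technical_details="Multi-head scaled dot-product self-attention; no recurrence",
    applications="Machine translation; basis of later generative LLMs",
    limitations="Attention cost grows quadratically with sequence length",
    future_directions="Extension to other tasks and modalities",
)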
E. Quality Assessment and Limitations of Review Methodology
III. Results
A. Technical Architecture of Generative LLMs
1) Transformer Architecture
2) Components of Modern LLMs
3) Architectural Variations
4) Scaling Properties, Emergent Capabilities and Efficiency Innovations
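As a point of reference for this subsection: Kaplan et al. (see References) report that test loss falls as a power law in the number of non-embedding parameters N, approximately L(N) = (N_c / N)^α_N with fitted values α_N ≈ 0.076 and N_c ≈ 8.8 × 10^13, and Hoffmann et al. subsequently found that compute-optimal training scales model size and training-token count in roughly equal proportion.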
B. Applications and Use Cases
C. Limitations and Challenges
1) Technical Limitations
2) Ethical Concerns
3) Regulatory and Compliance Challenges
4) Environmental Impact
5) Implementation and Integration Challenges
D. Future Directions and Emerging Trends
IV. Discussion
A. Implications for Research and Practice
B. Ethical Considerations
C. Limitations of the Current Review
V. Conclusions
Acknowledgment
References
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar] [CrossRef]
- Brown, T.B.; et al. Language Models are Few-Shot Learners. arXiv 2020, arXiv:2005.14165. [Google Scholar] [CrossRef]
- Bommasani, R.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog. Available online: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (accessed on 1 April 2025).
- Yu, P.; Xu, H.; Hu, X.; Deng, C. Leveraging Generative AI and Large Language Models: A Comprehensive Roadmap for Healthcare Integration. Healthcare 2023, 11, 2776. [Google Scholar] [CrossRef]
- Salierno, G.; Leonardi, L.; Cabri, G. Generative AI and Large Language Models in Industry 5.0: Shaping Smarter Sustainable Cities. Encyclopedia 2025, 5, 30. [Google Scholar] [CrossRef]
- Sallam, M. ChatGPT Utility in Healthcare Education, Research, and Practice: Systematic Review on the Promising Perspectives and Valid Concerns. Healthcare 2023, 11, 887. [Google Scholar] [CrossRef]
- Chen, M.; et al. Evaluating Large Language Models Trained on Code. arXiv 2021, arXiv:2107.03374. [Google Scholar] [CrossRef]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA; 2022; pp. 10674–10685. [Google Scholar] [CrossRef]
- Shuster, K.; Poff, S.; Chen, M.; Kiela, D.; Weston, J. Retrieval Augmentation Reduces Hallucination in Conversation. Findings of the Association for Computational Linguistics: EMNLP 2021; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 3784–3803. [Google Scholar] [CrossRef]
- Bengio, Y.; Ducharme, R.; Vincent, P. A neural probabilistic language model. In Proceedings of the 14th International Conference on Neural Information Processing Systems, in NIPS’00; MIT Press: Cambridge, MA, USA, 2000; pp. 893–899. [Google Scholar]
- Vaswani, A.; et al. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Kaplan, J.; et al. Scaling Laws for Neural Language Models. arXiv 2020, arXiv:2001.08361. [Google Scholar] [CrossRef]
- Touvron, H.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar] [CrossRef]
- DeepSeek-AI; et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning. arXiv 2025, arXiv:2501.12948. [Google Scholar] [CrossRef]
- Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), Virtual Event, Canada; ACM, 2021; pp. 610–623. [CrossRef]
- Bolukbasi, T.; Chang, K.-W.; Zou, J.; Saligrama, V.; Kalai, A. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. arXiv 2016, arXiv:1607.06520. [Google Scholar] [CrossRef]
- Patterson, D.; et al. Carbon Emissions and Large Neural Network Training. arXiv 2021, arXiv:2104.10350. [Google Scholar] [CrossRef]
- Tamkin, A.; Brundage, M.; Clark, J.; Ganguli, D. Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. arXiv 2021, arXiv:2102.02503. [Google Scholar] [CrossRef]
- Moher, D.; Liberati, A.; Tetzlaff, J.; Altman, D.G.; The PRISMA Group. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement. PLoS Med. 2009, 6, e1000097. [Google Scholar] [CrossRef]
- Critical Appraisal Skills Programme. CASP Systematic Review Checklist. 2018. Available online: https://casp-uk.net/casp-tools-checklists/systematic-review-checklist/ (accessed on 1 April 2025).
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The Efficient Transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar] [CrossRef]
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar] [CrossRef]
- Shaw, P.; Uszkoreit, J.; Vaswani, A. Self-Attention with Relative Position Representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers); Association for Computational Linguistics: New Orleans, Louisiana, 2018; pp. 464–468. [Google Scholar] [CrossRef]
- Sennrich, R.; Haddow, B.; Birch, A. Neural Machine Translation of Rare Words with Subword Units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Berlin, Germany, 2016; pp. 1715–1725. [CrossRef]
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar] [CrossRef]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016; pp. 770–778. [Google Scholar]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. OpenAI Technical Report, 2018. [Google Scholar]
- Raffel, C.; et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. arXiv 2019, arXiv:1910.10683. [Google Scholar] [CrossRef]
- Chowdhery, A.; et al. PaLM: Scaling Language Modeling with Pathways. arXiv 2022, arXiv:2204.02311. [Google Scholar] [CrossRef]
- Hoffmann, J.; et al. Training Compute-Optimal Large Language Models. arXiv 2022, arXiv:2203.15556. [Google Scholar] [CrossRef]
- Henighan, T.; et al. Scaling Laws for Autoregressive Generative Modeling. arXiv 2020, arXiv:2010.14701. [Google Scholar] [CrossRef]
- Wei, J.; et al. Chain-of-thought prompting elicits reasoning in large language models. In Proceedings of the 36th International Conference on Neural Information Processing Systems, in NIPS ’22; Curran Associates Inc.: Red Hook, NY, USA, 2022. [Google Scholar]
- Hu, E.J.; et al. LoRA: Low-Rank Adaptation of Large Language Models. arXiv 2021, arXiv:2106.09685. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar] [CrossRef]
- Zhang, J.; Zhao, Y.; Saleh, M.; Liu, P. PEGASUS: Pre-training with Extracted Gap-Sentences for Abstractive Summarization. In Proceedings of the International Conference on Machine Learning; PMLR, 2020; pp. 11328–11339. [Google Scholar]
- Johnson, M.; et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. Trans. Assoc. Comput. Linguist. 2017, 5, 339–351. [Google Scholar] [CrossRef]
- Roller, S.; et al. Recipes for building an open-domain chatbot. arXiv 2020, arXiv:2004.13637. [Google Scholar] [CrossRef]
- Chu, Z.; et al. LLM Agents for Education: Advances and Applications. arXiv 2025. [Google Scholar] [CrossRef]
- Ouyang, X.; et al. ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora. arXiv 2021, arXiv:2012.15674. [Google Scholar] [CrossRef]
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A Pretrained Language Model for Scientific Text. arXiv 2019, arXiv:1903.10676. [Google Scholar] [CrossRef]
- Esteva, A.; et al. CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization. arXiv 2020, arXiv:2006.09595. [Google Scholar] [CrossRef]
- Chow, J.C.L.; Wong, V.; Li, K. Generative Pre-Trained Transformer-Empowered Healthcare Conversations: Current Trends, Challenges, and Future Directions in Large Language Model-Enabled Medical Chatbots. BioMedInformatics 2024, 4, 837–852. [Google Scholar] [CrossRef]
- Ciubotaru, B.-I.; et al. Frailty Insights Detection System (FIDS)—A Comprehensive and Intuitive Dashboard Using Artificial Intelligence and Web Technologies. Appl. Sci. 2024, 14, 16. [Google Scholar] [CrossRef]
- Svyatkovskiy, A.; Deng, S.K.; Fu, S.; Sundaresan, N. IntelliCode compose: code generation using transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event; ACM: USA, 2020; pp. 1433–1443. [Google Scholar] [CrossRef]
- Iyer, S.; Konstas, I.; Cheung, A.; Zettlemoyer, L. Summarizing Source Code using a Neural Attention Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Association for Computational Linguistics: Berlin, Germany, 2016; pp. 2073–2083. [Google Scholar] [CrossRef]
- Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794. [Google Scholar] [CrossRef]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Hawthorne, C.; et al. Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset. arXiv 2019, arXiv:1810.12247. [Google Scholar] [CrossRef]
- Amodei, D.; Olah, C.; Steinhardt, J.; Christiano, P.; Schulman, J.; Mané, D. Concrete Problems in AI Safety. arXiv 2016, arXiv:1606.06565. [Google Scholar] [CrossRef]
- Wei, J.; He, J.; Chen, K.; Zhou, Y.; Tang, Z. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 2017, 69, 29–39. [Google Scholar] [CrossRef]
- Hendrycks, D.; et al. Measuring Mathematical Problem Solving With the MATH Dataset. arXiv 2021, arXiv:2103.03874. [Google Scholar] [CrossRef]
- Lemley, M.A.; Casey, B. Fair Learning. SSRN Electron. J. 2020. [Google Scholar] [CrossRef]
- European Commission. Proposal for a Regulation laying down harmonised rules on artificial intelligence. Available online: https://digital-strategy.ec.europa.eu/en/library/proposal-regulation-laying-down-harmonised-rules-artificial-intelligence (accessed on 4 January 2025).
- Doshi-Velez, F.; Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
- Strubell, E.; Ganesh, A.; McCallum, A. Energy and Policy Considerations for Deep Learning in NLP. arXiv 2019, arXiv:1906.02243. [Google Scholar] [CrossRef]
- Guu, K.; Lee, K.; Tung, Z.; Pasupat, P.; Chang, M.-W. REALM: Retrieval-Augmented Language Model Pre-Training. arXiv 2020, arXiv:2002.08909. [Google Scholar] [CrossRef]
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning; ACM: Montreal, QC, Canada, 2009; pp. 41–48. [Google Scholar] [CrossRef]
- Tuan, N.T.; Moore, P.; Thanh, D.H.V.; Pham, H.V. A Generative Artificial Intelligence Using Multilingual Large Language Models for ChatGPT Applications. Appl. Sci. 2024, 14, 3036. [Google Scholar] [CrossRef]
- Bengio, Y.; et al. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms. arXiv 2019, arXiv:1901.10912. [Google Scholar] [CrossRef]
- Leike, J.; Krueger, D.; Everitt, T.; Martic, M.; Maini, V.; Legg, S. Scalable agent alignment via reward modeling: a research direction. arXiv 2018, arXiv:1811.07871. [Google Scholar] [CrossRef]
- Mitchell, M.; et al. Model Cards for Model Reporting. arXiv 2018, arXiv:1810.03993. [Google Scholar] [CrossRef]
- Acemoglu, D.; Restrepo, P. Automation and New Tasks: How Technology Displaces and Reinstates Labor. J. Econ. Perspect. 2019, 33, 3–30. [Google Scholar] [CrossRef]


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).