Submitted: 18 August 2024
Posted: 21 August 2024
1. Introduction
- Introduces the term “XtremeLLM” and defines it within the context of current advancements in language model technology.
- Examines the technological, ethical, and practical challenges associated with developing and deploying these models.
- Explores the potential applications and impacts of XtremeLLMs across various sectors, offering insight into how they can be harnessed to drive innovation and address complex industry challenges.
2. Related Work
3. Overview of Large Language Models
3.1. GPT Series
3.2. BERT and Its Variants
4. XtremeLLMs
5. Technological Foundations for Scaling Up
5.1. Hardware Innovations
5.2. Software and Algorithmic Developments
5.3. Data Requirements
- Data Ingestion: Integrating data from various sources into a single repository. This stage may involve dealing with different data formats and merging data while ensuring consistency.
- Data Cleaning: Removing inaccuracies and preparing the data by fixing or discarding incorrect records, filling missing values, and resolving inconsistencies.
- Data Standardization: Applying uniform formats and labels to ensure that data from various sources can be used interchangeably in training.
- Data Retrieval: Developing systems that can quickly access required data subsets during the training process, which is critical for efficient use of computational resources. A minimal pipeline covering all four stages is sketched below.
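To ground these stages, the following is a minimal, self-contained sketch of such a pipeline in Python. The file formats, field names (`text`, `source`), and normalization choices are illustrative assumptions, not a prescription from the survey or any production corpus tooling.

```python
import csv
import json
import unicodedata
from pathlib import Path

def ingest(paths):
    """Data ingestion: merge JSONL and CSV sources into one record list."""
    records = []
    for path in map(Path, paths):
        if path.suffix == ".jsonl":
            with path.open() as f:
                records.extend(json.loads(line) for line in f if line.strip())
        elif path.suffix == ".csv":
            with path.open(newline="") as f:
                records.extend(dict(row) for row in csv.DictReader(f))
    return records

def clean(records):
    """Data cleaning: discard records with no usable text, fill missing fields."""
    cleaned = []
    for record in records:
        text = (record.get("text") or "").strip()
        if not text:
            continue  # irreparable records are discarded
        record["text"] = text
        record.setdefault("source", "unknown")  # fill missing values
        cleaned.append(record)
    return cleaned

def standardize(records):
    """Data standardization: uniform keys, Unicode normalization, lowercase labels."""
    return [
        {"text": unicodedata.normalize("NFC", r["text"]),
         "source": str(r["source"]).strip().lower()}
        for r in records
    ]

def build_index(records):
    """Data retrieval: map each source label to its records for fast subset access."""
    index = {}
    for record in records:
        index.setdefault(record["source"], []).append(record)
    return index
```

A call such as `build_index(standardize(clean(ingest(["corpus_a.jsonl", "corpus_b.csv"]))))` then gives the training loop constant-time access to per-source subsets, which is the property the retrieval stage is after.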
6. Challenges in Developing XtremeLLMs
6.1. Computational and Financial Costs
6.2. Issues of Diminishing Returns
- Overfitting Risk: Larger models, especially those that significantly outsize their training data, are at a higher risk of memorizing rather than generalizing from their training sets.
- Parameter Efficiency: There is an upper limit to how effectively additional parameters can be utilized due to inherent limitations in training data diversity and model architecture.
- Optimization Challenges: As models grow larger, they become increasingly difficult to tune and optimize. Advanced optimization techniques that work well for smaller models might not scale with size, leading to suboptimal training outcomes. The scaling-law sketch below illustrates this diminishing-returns dynamic.
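One way to make diminishing returns concrete is through a parametric scaling law of the form L(N, D) = E + A/N^α + B/D^β, as fitted by Hoffmann et al. (2022) for Chinchilla. The short sketch below uses their published coefficients; the fixed token budget D is an illustrative assumption. Holding data fixed, each tenfold increase in parameters buys a smaller absolute reduction in predicted loss.

```python
# Chinchilla-style scaling law: predicted pretraining loss as a function of
# parameter count N and training tokens D, L(N, D) = E + A/N**a + B/D**b.
# Coefficients are those reported by Hoffmann et al. (2022).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    return E + A / n_params**ALPHA + B / n_tokens**BETA

D = 300e9  # token budget held fixed (illustrative assumption)
previous = None
for n in (1e9, 10e9, 100e9, 1000e9):
    loss = predicted_loss(n, D)
    delta = "" if previous is None else f"  (improvement {previous - loss:.3f})"
    print(f"N = {n / 1e9:5.0f}B  loss = {loss:.3f}{delta}")
    previous = loss
# The improvements shrink (~0.19, ~0.09, ~0.04): with data fixed, the
# B/D**BETA term becomes the binding constraint, so adding parameters
# alone yields diminishing returns.
```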
6.3. Environmental Impacts of Energy Consumption
7. Ethical and Societal Considerations
7.1. Privacy and Security Concerns
7.1.1. Advanced Security Measures
7.1.2. Potential Vulnerabilities
7.1.3. Access Controls and Audits
7.1.4. Federated Learning
7.1.5. Transparency and User Control
7.2. Regulatory and Policy Challenges
7.2.1. Adapting Legal Frameworks
7.2.2. Enhancing Explainability and Accountability
7.2.3. Developing Comprehensive Governance Frameworks
7.2.4. Fostering International Cooperation
7.2.5. Anticipating Future Challenges
7.3. Bias and Fairness
8. Potential Applications and Impacts of XtremeLLMs
8.1. Revolutionary Uses in Technology and Industry
8.2. Economic and Social Implications
8.3. Comparisons with Human Cognitive Capabilities
9. Future Research Directions and Prospects
9.1. Exploration of Alternative Architectures
9.2. Energy Efficiency and Sustainability
9.3. Mitigating Bias and Enhancing Fairness
9.4. Interdisciplinary Applications
9.5. Ethical Implications and AI Governance
9.6. Enhancing Human-AI Interaction
10. Conclusions


| Name | Year | Developer/Company | Method | Parameter Size | Unique Contribution |
|---|---|---|---|---|---|
| GPT-1 | 2018 | OpenAI | Transformer-based, unsupervised pre-training | 117M | Introduced generative pre-training of transformers for NLP |
| BERT | 2018 | Google | Transformer-based, bidirectional training | 110M (Base), 340M (Large) | Deep contextual understanding from bidirectional training |
| GPT-2 | 2019 | OpenAI | Transformer-based, scaled-up GPT-1 | 1.5B | Scale-up in size and training data for greater generality |
| RoBERTa | 2019 | Facebook AI | Optimized BERT pre-training approach | 125M (Base), 355M (Large) | Removed BERT’s next-sentence prediction |
| DistilBERT | 2019 | Hugging Face | Knowledge distillation from BERT | 66M | Reduced size with largely preserved performance |
| ALBERT | 2019 | Google | Parameter-reduction techniques | 12M (Base), 18M (Large) | Factorized embedding and cross-layer parameter sharing |
| ERNIE | 2019 | Baidu | Integrating structured knowledge into pre-training | Similar to BERT | Leveraging real-world knowledge |
| GPT-3 | 2020 | OpenAI | Transformer-based, unsupervised learning | 175B | Scalability to a very large number of parameters |
| LaMDA | 2021 | Google | Language model for dialog applications | 137B | Specialized in conversational understanding |
| Bard | 2023 | Google | Conversational assistant built on LaMDA and later PaLM, tuned with RLHF | Undisclosed | Enhanced user interaction capabilities |
| Gemini | 2023 | Google DeepMind | Natively multimodal transformer architecture | Undisclosed | Integrated reasoning across text, images, audio, and video |
| GPT-3.5 | 2022 | OpenAI | Refinement of GPT-3 architecture | 175B | Improved training techniques and instruction fine-tuning |
| Orca | 2023 | Microsoft | Imitation learning from GPT-4 explanation traces | 13B | Strong reasoning ability at a small parameter scale |
| PaLM | 2022 | Google | Pathways language modeling | 540B | Pathways approach for simultaneous multiple tasks |
| Falcon | 2023 | Technology Innovation Institute (TII) | Decoder-only transformer with multi-query attention | 7B, 40B, 180B | Open-weight models trained on the RefinedWeb corpus |
| GPT-4 | 2023 | OpenAI | Further scaled and optimized GPT architecture | Undisclosed | Extended capabilities in multilingual tasks and complex problem-solving |
| LLaMA | 2023 | Meta AI | Efficient open foundation models trained on public data | 7B–65B | Strong performance from smaller models trained on more tokens |
| YaLM | 2022 | Yandex | GPT-like pretrained transformer | 100B | One of the largest openly released bilingual (English/Russian) models |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).