Submitted: 05 February 2025
Posted: 06 February 2025
Abstract
Keywords:
1. Introduction
2. Understanding Large Language Models Beyond the Buzzwords
2.1. The Transformer Revolution
2.2. Strengths, Caveats, and Uncharted Territories
2.3. Where to Draw the Line: Titles, Abstracts, or Full Text?
3. Evaluation Metrics for Classification Tasks
3.1. The Basics of Classification Metrics
3.2. Bringing Metrics to Life: A Practical Example
4. Speak and It Shall Be Done: The Art of Prompt Engineering
4.1. Building a Good Prompt
4.2. Zero, One, or Few Shots?
4.3. Soft Versus Strict: Setting the Bar for Inclusion
4.4. Illustrative Examples
4.5. Avoiding Pitfalls with Prompt Refinement
5. Future Horizons and Potential Advancements
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Appendix A
Practical Guidelines for Prompt Engineering
A.1. Define Clear Criteria
Map PICO Elements Precisely:
- Population: Clearly state the target population (e.g., “adult patients, defined as individuals ≥18 years”).
- Intervention: Specify the intervention details (e.g., “use of EMD combined with bone graft”).
- Comparison: Define what the intervention is being compared against (e.g., “bone graft alone”).
- Outcome: Identify the primary outcome or endpoints (e.g., “clinical measures of periodontal regeneration”).
- Use Unambiguous Language: avoid vague terms. For example, write “adult patients (≥18 years)” instead of “adults.” Include specific keywords or phrases that the model can recognize as indicators of a particular criterion.
- Incorporate Contextual Examples (if necessary): provide a brief example or keyword list that exemplifies the criterion, so the model captures the intended meaning.
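The mapping above can be made concrete in code. The sketch below (all names, criteria wording, and the prompt template are illustrative, not taken from the article's own implementation) assembles a screening prompt from explicitly mapped PICO elements:

```python
# Illustrative sketch: building an abstract-screening prompt from PICO
# criteria defined with unambiguous language, as recommended in A.1.
# The criteria text and prompt wording are hypothetical examples.

PICO = {
    "Population": "adult patients, defined as individuals >=18 years",
    "Intervention": "use of EMD combined with bone graft",
    "Comparison": "bone graft alone",
    "Outcome": "clinical measures of periodontal regeneration",
}

def build_screening_prompt(title: str, abstract: str) -> str:
    """Compose an inclusion/exclusion prompt from the PICO criteria."""
    criteria = "\n".join(f"- {key}: {value}" for key, value in PICO.items())
    return (
        "You are screening studies for a systematic review.\n"
        "Include the study only if ALL of the following criteria are met:\n"
        f"{criteria}\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}\n\n"
        "Answer with exactly one word: INCLUDE or EXCLUDE."
    )

prompt = build_screening_prompt(
    "EMD plus bone graft in intrabony defects",
    "Randomized trial of 40 adults comparing EMD + graft vs graft alone.",
)
```

Constraining the answer format ("exactly one word") makes the model's output trivially machine-parsable, which matters when screening thousands of records.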
A.2. Choose an Appropriate Prompting Approach
Zero-Shot vs. One-Shot vs. Few-Shot:
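The difference between these approaches is mainly in how the prompt is assembled. A minimal sketch, with hypothetical task wording and labelled examples, might look like this:

```python
# Hypothetical sketch contrasting zero-shot and few-shot prompt assembly
# for abstract screening. Task wording and example abstracts are invented.

TASK = ("Decide whether the abstract meets the inclusion criteria. "
        "Reply INCLUDE or EXCLUDE.")

def zero_shot(abstract: str) -> str:
    # Zero-shot: instructions only, no worked examples.
    return f"{TASK}\n\nAbstract: {abstract}\nAnswer:"

def few_shot(abstract: str, examples: list) -> str:
    # Few-shot: prepend labelled examples so the model can infer the
    # decision pattern before seeing the new abstract.
    shots = "\n\n".join(
        f"Abstract: {text}\nAnswer: {label}" for text, label in examples
    )
    return f"{TASK}\n\n{shots}\n\nAbstract: {abstract}\nAnswer:"

z = zero_shot("A cohort study of periodontal regeneration in adults.")
demo = few_shot(
    "A cohort study of periodontal regeneration in adults.",
    [("RCT of EMD plus graft in 60 adults.", "INCLUDE"),
     ("Narrative review of graft materials.", "EXCLUDE")],
)
```

One-shot is simply the few-shot case with a single example; the trade-off is that each added example consumes context length and can bias the model toward the examples' surface features.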
A.3. Employ Iterative Testing and Refinement
Pilot Testing: Use a small, representative validation set of abstracts to assess prompt performance.
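Pilot testing means comparing the model's labels against human reference labels on the validation set and computing the classification metrics from Section 3. A minimal sketch, with invented labels for illustration:

```python
# Minimal sketch of pilot testing a prompt: score model screening decisions
# against human reference labels on a small validation set. The label
# lists below are invented for illustration.

def screening_metrics(human: list, model: list) -> dict:
    """Precision, recall, F1, and accuracy for INCLUDE/EXCLUDE labels."""
    tp = sum(h == m == "INCLUDE" for h, m in zip(human, model))
    fp = sum(h == "EXCLUDE" and m == "INCLUDE" for h, m in zip(human, model))
    fn = sum(h == "INCLUDE" and m == "EXCLUDE" for h, m in zip(human, model))
    tn = sum(h == m == "EXCLUDE" for h, m in zip(human, model))
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (sensitivity) is the critical metric in screening: a missed
    # eligible study cannot be recovered at full-text review.
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": (tp + tn) / len(human)}

human = ["INCLUDE", "INCLUDE", "EXCLUDE", "EXCLUDE", "INCLUDE", "EXCLUDE"]
model = ["INCLUDE", "EXCLUDE", "EXCLUDE", "EXCLUDE", "INCLUDE", "INCLUDE"]
m = screening_metrics(human, model)
```

If recall on the pilot set is unacceptably low, the prompt is refined (criteria reworded, threshold softened) and the pilot is rerun before scaling to the full corpus.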
A.4. Integrate Human Oversight
Appendix B
Comprehensive Example: Zero-Shot Soft Prompt
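The full example is not reproduced in this excerpt. The sketch below is a hypothetical reconstruction of what a zero-shot "soft" prompt (in the Section 4.3 sense of a lenient inclusion threshold) could look like; the criteria and wording are illustrative:

```python
# Hypothetical zero-shot "soft" prompt: when the abstract is ambiguous,
# the model is instructed to err on the side of inclusion, preserving
# recall and deferring borderline cases to full-text screening.

SOFT_ZERO_SHOT_PROMPT = """You are screening abstracts for a systematic review.

Inclusion criteria:
- Population: adult patients (>=18 years)
- Intervention: EMD combined with bone graft
- Comparison: bone graft alone
- Outcome: clinical measures of periodontal regeneration

Read the abstract below. If it clearly fails any criterion, answer EXCLUDE.
If it meets all criteria, or if the abstract does not give enough
information to rule it out, answer INCLUDE.

Abstract: {abstract}
Answer (INCLUDE or EXCLUDE):"""

filled = SOFT_ZERO_SHOT_PROMPT.format(
    abstract="A trial of EMD plus graft in intrabony defects."
)
```

The "soft" behaviour lives entirely in the instruction to include when information is insufficient; a strict variant would instead exclude on any missing criterion.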
References

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).