Submitted: 11 July 2024
Posted: 11 July 2024
Abstract
Keywords:
1. Introduction
2. Defining In-Context Learning in Large Language Models
2.1. Characteristics of ICL
2.2. Research Gaps and Suggested Agenda
- Investigate the threshold at which in-context learning emerges in language models of varying sizes and architectures.
- Explore the differences in in-context learning capabilities between models trained on diverse datasets.
- Examine the impact of model scaling on the robustness and generalizability of in-context learning; a minimal probing harness is sketched below.
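The scaling question above can be probed with a very small evaluation harness. The Python sketch below assumes only a callable mapping a prompt string to a completion string; the checkpoint names and the `load_model` loader in the commented sweep are hypothetical placeholders rather than a real API.

```python
from typing import Callable, List, Tuple

def build_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Concatenate k labeled demonstrations followed by the unlabeled query."""
    lines = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)

def few_shot_accuracy(generate_fn: Callable[[str], str],
                      demos: List[Tuple[str, str]],
                      test_set: List[Tuple[str, str]]) -> float:
    """Fraction of test queries whose completion begins with the gold label."""
    hits = 0
    for query, gold in test_set:
        completion = generate_fn(build_prompt(demos, query)).strip()
        hits += completion.startswith(gold)
    return hits / len(test_set)

# Hypothetical sweep: plotting accuracy against parameter count makes any
# abrupt emergence threshold visible.
# for name, n_params in [("tiny-125m", 125e6), ("base-1b", 1e9), ("large-7b", 7e9)]:
#     model = load_model(name)          # placeholder loader, not a real API
#     print(n_params, few_shot_accuracy(model.generate, demos, test_set))
```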
3. Theoretical Foundations of ICL
3.1. Implicit Meta-Learning During Pre-training
3.2. Emergence of Task Vectors
3.3. Research Gaps and Suggested Agenda
- Further investigation into the precise mechanisms by which transformers represent and utilize in-context information is needed, building on the preliminary identification of induction heads [6]; a toy task-vector probing sketch follows this list.
- There is a need for a deeper understanding of the distributional properties that facilitate ICL, especially in real-world data, beyond synthetic datasets [3].
- Expanding on the theoretical frameworks that link language modeling to downstream task performance, including the exploration of alternative objectives to cross-entropy, could yield insights into more efficient pretraining strategies [4].
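One concrete handle on these mechanisms is the task-vector phenomenon of Section 3.2, sketched below in a toy form: cache the residual-stream state at the final prompt position after the demonstrations, then patch it into a zero-shot forward pass for a new query. The choice of layer 6, the GPT-2 checkpoint, and the antonym task are illustrative assumptions; actual studies sweep layers and tasks.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LAYER = 6  # assumed probe layer in GPT-2's 12-block stack

def last_token_state(prompt: str) -> torch.Tensor:
    """Cache the block-LAYER hidden state at the final prompt position."""
    cache = {}
    def grab(module, inputs, output):
        cache["v"] = output[0][:, -1, :].detach().clone()
    handle = model.transformer.h[LAYER].register_forward_hook(grab)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return cache["v"]

def predict_with_patch(query: str, task_vec: torch.Tensor) -> str:
    """Zero-shot forward pass with the final position overwritten by task_vec."""
    def patch(module, inputs, output):
        output[0][:, -1, :] = task_vec
        return output
    handle = model.transformer.h[LAYER].register_forward_hook(patch)
    with torch.no_grad():
        logits = model(**tok(query, return_tensors="pt")).logits
    handle.remove()
    return tok.decode(logits[0, -1].argmax().item())

demos = "hot -> cold\nbig -> small\nfast ->"  # antonym demonstrations
vec = last_token_state(demos)
print(predict_with_patch("tall ->", vec))      # ideally continues the antonym task
```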
4. Mechanisms Underlying ICL
4.1. Gradient Descent-like Behavior
4.2. Statistical Learning in Transformers
4.3. Research Gaps and Suggested Agenda
- Understanding the role of softmax in attention mechanisms, and how it gives rise to the abrupt learning improvements reported in [9], would be valuable for enhancing ICL; a minimal sketch of the closely related attention-as-gradient-descent correspondence follows.
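The gradient-descent view of Section 4.1 has a crisp special case that also illustrates why softmax matters: with linear (softmax-free) attention over in-context linear-regression examples, the attention readout for the query equals one gradient step from zero weights (a von Oswald et al.-style construction, named here as a stand-in; it is not drawn from [9] itself). Softmax normalization breaks this exact equivalence, which is one way to see why it complicates optimization. A self-contained numpy sketch, with arbitrary dimensions and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 32                      # feature dimension, number of in-context demos
w_true = rng.normal(size=d)       # latent task encoded by the demonstrations
X = rng.normal(size=(n, d))       # demonstration inputs
y = X @ w_true                    # demonstration labels
x_q = rng.normal(size=d)          # query input
lr = 1.0 / n                      # learning rate, absorbed into the attention scale

# One gradient step on L(w) = 0.5 * sum_i (w @ x_i - y_i)^2, starting from w = 0:
w_gd = lr * (y @ X)               # the negative gradient at w = 0 is sum_i y_i x_i
pred_gd = w_gd @ x_q

# Linear attention: query = x_q, keys = demo inputs, values = demo labels.
pred_attn = lr * np.sum((X @ x_q) * y)

print(pred_gd, pred_attn)         # agree up to floating-point error
assert np.isclose(pred_gd, pred_attn)
```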
5. Optimizing ICL Performance
5.1. Prompt Engineering Techniques
5.2. Demonstration Selection Strategies
5.3. Example Ordering Methods
5.4. Research Gaps and Suggested Agenda
6. Enhancing ICL Generalization
6.1. Cross-task Transfer in ICL
6.2. Few-shot and Zero-shot ICL Approaches
6.3. Research Gaps and Suggested Agenda
- Mitigating Label Biases: Despite attempts to control label biases in ICL, as highlighted by [27], further research is needed on robust mitigation strategies that handle the nuanced and diverse label biases occurring across domains and model scales; a minimal calibration sketch follows this list.
- Knowledge Editing and Updating: The ability to edit and update factual knowledge within LLMs without retraining is a crucial area of exploration. Studies like [26] show the potential of in-context knowledge editing, but more scalable and efficient methods need to be developed to handle the dynamic nature of knowledge.
- Diverse Demonstration Effectiveness: While the use of diverse demonstrations for ICL has been proposed [32], understanding the optimal strategies for selecting and updating these demonstrations to maximize generalization across various tasks remains an open question.
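For the label-bias item above, a standard baseline is contextual calibration in the style of Zhao et al. (2021): estimate the model's prior over labels with a content-free input and divide it out of the class scores. The sketch below is a rough stand-in for the more refined estimators studied in [27]; `label_probs` is a hypothetical placeholder for any function returning the model's probability of a label given a query appended to the few-shot prompt.

```python
import numpy as np

def calibrated_predict(label_probs, labels, query):
    """Pick the label whose probability most exceeds the model's prior bias."""
    p_query = np.array([label_probs(query, lab) for lab in labels])
    p_prior = np.array([label_probs("N/A", lab) for lab in labels])  # content-free probe
    scores = p_query / np.maximum(p_prior, 1e-9)  # divide out the estimated prior
    return labels[int(np.argmax(scores))]
```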
7. Applications and Use Cases of ICL
7.1. Natural Language Processing Tasks
7.2. Visual and Multimodal Applications
7.3. Research Gaps and Suggested Agenda
- Enhancing the in-context learning capabilities of VLMs for complex multi-modal prompts, building on the work in [39], and addressing language biases in these models.
- Developing more effective methods for selecting in-context examples to improve ICL performance, as example quality is critical for tasks like visual in-context learning [40]; a minimal similarity-based selection sketch follows this list.
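A simple baseline for the example-selection item above is nearest-neighbor retrieval in an embedding space, the kind of similarity-based strategy analyzed in [40] for visual ICL. In the sketch below, `embed` is a hypothetical placeholder for any encoder (e.g., a CLIP image encoder or a sentence encoder):

```python
import numpy as np

def select_demonstrations(embed, pool, query, k=4):
    """Return the k pool items most cosine-similar to the query."""
    E = np.stack([embed(item) for item in pool])
    E /= np.linalg.norm(E, axis=1, keepdims=True)
    q = embed(query)
    q = q / np.linalg.norm(q)
    top = np.argsort(E @ q)[::-1][:k]   # indices of highest cosine similarity
    return [pool[i] for i in top]
```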
8. Conclusion
References
- Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; Chi, E.; Hashimoto, T.; Vinyals, O.; Liang, P.; Dean, J.; Fedus, W. Emergent Abilities of Large Language Models 2022.
- Srivastava, A.; Rastogi, A.; Rao, A.; Shoeb, A.A.M.; Abid, A.; Fisch, A.; Brown, A.R.; Santoro, A.; Gupta, A.; Garriga-Alonso, A.; et al. Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models 2022.
- Xie, S.M.; Raghunathan, A.; Liang, P.; Ma, T. An Explanation of In-context Learning as Implicit Bayesian Inference 2021.
- Saunshi, N.; Malladi, S.; Arora, S. A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks 2020.
- Bai, Y.; Chen, F.; Wang, H.; Xiong, C.; Mei, S. Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection 2023.
- Olsson, C.; Elhage, N.; Nanda, N.; Joseph, N.; Dassarma, N.; Henighan, T.; Mann, B.; Askell, A.; Bai, Y.; Chen, A.; Conerly, T.; Drain, D.; Ganguli, D.; Hatfield-Dodds, Z.; Hernandez, D.; Johnston, S.; Jones, A.; Kernion, J.; Lovitt, L.; Ndousse, K.; Amodei, D.; Brown, T.B.; Clark, J.; Kaplan, J.; McCandlish, S.; Olah, C. In-context Learning and Induction Heads 2022.
- Dai, D.; Sun, Y.; Dong, L.; Hao, Y.; Sui, Z.; Wei, F. Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers 2023.
- Natan, T.B.; Deutch, G.; Magar, N.; Dar, G. In-context Learning and Gradient Descent Revisited 2023.
- Hoffmann, D.T.; Schrodi, S.; Behrmann, N.; Fischer, V.; Brox, T. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems 2023.
- Min, S.; Lyu, X.; Holtzman, A.; Artetxe, M.; Lewis, M.; Hajishirzi, H.; Zettlemoyer, L. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? 2022.
- Wei, J.W.; Wei, J.; Tay, Y.; Tran, D.; Webson, A.; Lu, Y.; Chen, X.; Liu, H.; Huang, D.; Zhou, D.; Ma, T. Larger language models do in-context learning differently 2023.
- Wang, L.; Li, L.; Dai, D.; Chen, D.; Zhou, H.; Meng, F.; Zhou, J.; Sun, X. Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning 2023.
- Mao, H.; Liu, G.D.; Ma, Y.; Wang, R.; Tang, J. A Data Generation Perspective to the Mechanism of In-Context Learning 2024.
- Gao, X.; Das, K. Customizing Language Model Responses with Contrastive In-Context Learning 2024.
- Do, X.L.; Zhao, Y.; Brown, H.; Xie, Y.; Zhao, J.X.; Chen, N.F.; Kawaguchi, K.; Xie, M.Q.; He, J. Prompt Optimization via Adversarial In-Context Learning 2023.
- Li, D.; Liu, Z.; Hu, X.; Sun, Z.; Hu, B.; Zhang, M. In-Context Learning State Vector with Inner and Momentum Optimization 2024.
- Li, X.; Lv, K.; Yan, H.; Lin, T.; Zhu, W.; Ni, Y.; Xie, G.; Wang, X.; Qiu, X. Unified Demonstration Retriever for In-Context Learning 2023.
- Ye, J.; Wu, Z.; Feng, J.; Yu, T.; Kong, L. Compositional Exemplars for In-context Learning 2023.
- Mavromatis, C.; Srinivasan, B.; Shen, Z.; Zhang, J.; Rangwala, H.; Faloutsos, C.; Karypis, G. Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection 2023.
- Luo, M.; Xu, X.; Dai, Z.; Pasupat, P.; Kazemi, M.; Baral, C.; Imbrasaite, V.; Zhao, V. Dr.ICL: Demonstration-Retrieved In-context Learning 2023.
- Li, X.; Qiu, X. Finding Support Examples for In-Context Learning 2023.
- Zhang, K.; Lv, A.; Chen, Y.; Ha, H.; Xu, T.; Yan, R. Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning 2024.
- Wu, Z.; Lin, X.; Dai, Z.; Hu, W.; Shu, Y.; Ng, S.K.; Jaillet, P.; Low, B. Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars 2024.
- Xu, Z.; Cohen, D.; Wang, B.; Srikumar, V. In-Context Example Ordering Guided by Label Distributions 2024.
- Ye, Q.; Beltagy, I.; Peters, M.E.; Ren, X.; Hajishirzi, H. FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning 2023.
- Zheng, C.; Li, L.; Dong, Q.; Fan, Y.; Wu, Z.; Xu, J.; Chang, B. Can We Edit Factual Knowledge by In-Context Learning? 2023.
- Fei, Y.; Hou, Y.; Chen, Z.; Bosselut, A. Mitigating Label Biases for In-context Learning 2023.
- Song, F.; Fan, Y.; Zhang, X.; Wang, P.; Wang, H. ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization 2024.
- Chen, W.L.; Wu, C.K.; Chen, H.H. Self-ICL: Zero-Shot In-Context Learning with Self-Generated Demonstrations 2023.
- Lyu, X.; Min, S.; Beltagy, I.; Zettlemoyer, L.; Hajishirzi, H. Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations 2022.
- Xu, B.; Wang, Q.; Mao, Z.; Lyu, Y.; She, Q.; Zhang, Y. kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference 2023.
- He, J.; Wang, L.; Hu, Y.; Liu, N.; Liu, H.J.; Xu, X.; Shen, H. ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction 2023.
- Kung, T.H.; Cheatham, M.; Medenilla, A.; Sillos, C.; Leon, L.D.; Elepaño, C.; Madriaga, M.; Aggabao, R.; Diaz-Candido, G.; Maningo, J.; Tseng, V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models 2022.
- Gilson, A.; Safranek, C.; Huang, T.; Socrates, V.; Chi, L.; Taylor, R.; Chartash, D. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment 2023.
- Li, T.; Ma, X.; Zhuang, A.; Gu, Y.; Su, Y.; Chen, W. Few-shot In-context Learning on Knowledge Base Question Answering 2023.
- Seegmiller, P.; Gatto, J.; Basak, M.; Cook, D.J.; Ghasemzadeh, H.; Stankovic, J.; Preum, S. The Scope of In-Context Learning for the Extraction of Medical Temporal Constraints 2023.
- Liu, S.; Cai, Z.; Chen, G.; Li, X. Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification 2024.
- Zhang, X.; Ghosh, S.; Bansal, C.; Wang, R.; Ma, M.J.; Kang, Y.; Rajmohan, S. Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 2024.
- Zhao, H.; Cai, Z.; Si, S.; Ma, X.; An, K.; Chen, L.; Liu, Z.; Wang, S.; Han, W.; Chang, B. MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning 2023.
- Zhang, Y.; Zhou, K.; Liu, Z. What Makes Good Examples for Visual In-Context Learning? 2023.
- Wang, Z.; Jiang, Y.; Lu, Y.; Shen, Y.; He, P.; Chen, W.; Wang, Z.; Zhou, M. In-Context Learning Unlocked for Diffusion Models 2023.
- Zhang, M.; He, J.; Lei, S.; Yue, M.; Wang, L.; Lu, C.T. Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization 2023.
1. This survey can be reproduced with Scholar-Chat AI.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).