Submitted: 23 December 2024
Posted: 25 December 2024
Abstract
Keywords:
1. Introduction
2. Related Work on Model Compression
3. Techniques for Knowledge Distillation in LLMs
4. Future Work
4.1. Emergent Behavior Distillation
4.2. Dynamic and Adaptive Distillation
4.3. Multimodal Distillation
4.4. Generalization Across Domains
4.5. Scalability of Distillation Techniques
4.6. Integration with Other Compression Techniques
4.7. Environmental and Ethical Considerations
4.8. Automated Distillation Frameworks
4.9. Evaluation Metrics and Benchmarks
5. Conclusion
References
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
