Submitted: 12 March 2025
Posted: 13 March 2025
Abstract
Keywords:
1. Introduction
- The main techniques and approaches used in RL-driven KD.
- The challenges and limitations of RL-driven KD.
- The applications of RL-driven KD in various domains.
- Future research directions and potential breakthroughs.
2. Background and Problem Definition
3. RL-Driven Knowledge Distillation Techniques
3.1. Policy Distillation
3.2. Value Function Distillation
3.3. Dynamic Reward-Guided Distillation
4. Challenges and Solutions
4.1. Capacity Mismatch
4.2. Temporal Dependency
4.3. Reward Design
5. Applications
5.1. Large Language Model Compression
5.2. Autonomous Driving
5.3. DeepSeek
6. Conclusions
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).