Submitted: 23 May 2025
Posted: 24 May 2025
Abstract
Keywords:
1. Introduction
1.1. History and Motivation
1.2. Limitations of Traditional Models
1.3. Why Reinforcement Learning?
1.4. Key Contributions of This Survey
- We provide a structured taxonomy of reinforcement learning frameworks applied in recommendation systems, categorized by decision structure, adaptability, and application focus.
- We critically compare the strengths, weaknesses, and deployment contexts of value-based, policy-based, and hierarchical RL methods.
- We highlight key challenges in real-time personalization, fairness, and reward modeling, and propose emerging directions such as offline RL and hybrid policy learning.
- We contextualize reinforcement learning applications beyond e-commerce, with attention to healthcare, education, and high-stakes decision domains where interpretability and safety are paramount.
2. Background and Foundations
2.1. Fundamentals of Reinforcement Learning
2.2. Modeling Recommendations as Markov Decision Processes
3. Reinforcement Learning Frameworks in Recommendation
3.1. Policy-Guided Path Reasoning over Knowledge Graphs
3.2. Hierarchical Reinforcement Learning for Structured User Goals
3.3. Adaptive Deep Q-Networks for Real-Time Personalization
3.4. Comparative Summary of RL Frameworks
4. Methodology
4.1. Evaluation Frameworks and Metrics
4.2. Practical Implementation Considerations
5. Discussion
5.1. Core Insights and Practical Lessons
5.2. Ongoing Challenges in RL-based Recommendation
6. Future Directions and Open Challenges
7. Conclusions
7.1. Final Summary
7.2. The Road Ahead for RL in Recommendation Systems
References
- Yikun Xian, Zuohui Fu, S. Muthukrishnan, Gerard de Melo, and Yongfeng Zhang. Reinforcement knowledge graph reasoning for explainable recommendation. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019.
- Xiting Wang, Kunpeng Liu, Dongjie Wang, Le Wu, Yanjie Fu, and Xing Xie. Multi-level recommendation reasoning over knowledge graphs with reinforcement learning. Proceedings of the ACM Web Conference, 2022.
- Pengyang Wang, Kunpeng Liu, Lu Jiang, Xiaolin Li, and Yanjie Fu. Incremental mobile user profiling: Reinforcement learning with spatial knowledge graph for modeling event streams. Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2020.
- Dongjie Wang, Pengyang Wang, Kunpeng Liu, Yuanchun Zhou, Charles Hughes, and Yanjie Fu. Reinforced imitative graph representation learning for mobile user profiling: An adversarial training perspective. Proceedings of the AAAI Conference on Artificial Intelligence, 2021.
- Dongjie Wang, Pengyang Wang, Yanjie Fu, Kunpeng Liu, Hui Xiong, and Charles Hughes. Reinforced imitative graph learning for mobile user profiling. IEEE Transactions on Knowledge and Data Engineering, 2023.
- Shipeng Guo, Kunpeng Liu, Pengfei Wang, Weiwei Dai, Yi Du, Yuanchun Zhou, and Wenjuan Cui. RDKG: A reinforcement learning framework for disease diagnosis on knowledge graph. Proceedings of the IEEE International Conference on Data Mining (ICDM), 2023.
- Lu Jiang, Kunpeng Liu, Dongjie Wang, and Pengyang Wang. Reinforced explainable knowledge concept recommendation in MOOCs. ACM Transactions on Intelligent Systems and Technology, 2023.
- Yanan Xiao, Lu Jiang, Kunpeng Liu, Yuanbo Xu, Pengyang Wang, and Minghao Yin. Hierarchical reinforcement learning for point of interest recommendation. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI), 2024.
- Xinhao Zhang, Jinghan Zhang, Wujun Si, and Kunpeng Liu. Dynamic weight adjusting Deep Q-Networks for real-time environmental adaptation. arXiv preprint arXiv:2411.02559, 2024.
- Kunpeng Liu, Xiaolin Li, Cliff C. Zou, Haibo Huang, and Yanjie Fu. Ambulance dispatch via deep reinforcement learning. Proceedings of the 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL), 2020.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).