Submitted: 29 May 2025
Posted: 30 May 2025
Abstract
Keywords:
1. Introduction
- An original DRL agent architecture that successfully integrates Knowledge Graph Embeddings (KGEs) during the learning process. We modify the baseline layers to allow for concatenating the KGEs with the hidden activations from the “visual layers”.
- A methodology to quantify how much KGEs improve the DRL agents’ performance, in terms of time and accuracy, relative to a baseline agent with no semantic information about the environment and to an agent with only partial information.
- An analysis of the possible improvement of the agents’ exploration and exploitation capabilities by examining the distribution of each joint orientation during the evaluation episodes.
- A quantitative and qualitative evaluation of the embeddings’ influence across different environments and robot manipulators.
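
The first contribution, concatenating the KGEs with the hidden activations of the visual layers, can be illustrated with a minimal sketch. This is not the authors' implementation: the layer sizes, the random weights, and the helper names (`dense`, `init_layer`, `forward`) are assumptions made for illustration, and the visual features stand in for the output of the convolutional layers. The only step taken from the text is the fusion, where the two vectors are joined before the actor and critic heads used by actor-critic methods such as A3C.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(visual_features, kge, actor, critic):
    """Concatenate visual activations with the KGE, then apply both heads."""
    fused = list(visual_features) + list(kge)   # the concatenation step
    policy = softmax(dense(fused, *actor))      # action probabilities
    value = dense(fused, *critic)[0]            # scalar state value
    return policy, value


# Hypothetical sizes, not taken from the paper.
VISUAL_DIM, KGE_DIM, N_ACTIONS = 8, 4, 3
rng = random.Random(0)
actor = init_layer(N_ACTIONS, VISUAL_DIM + KGE_DIM, rng)
critic = init_layer(1, VISUAL_DIM + KGE_DIM, rng)

visual_features = [rng.random() for _ in range(VISUAL_DIM)]  # stand-in for conv output
kge = [rng.random() for _ in range(KGE_DIM)]                 # stand-in for an embedding
policy, value = forward(visual_features, kge, actor, critic)
```

In this sketch a baseline agent without semantic information corresponds to dropping `kge` from the fused vector and shrinking the head inputs to `VISUAL_DIM`.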
2. Related Work
3. Environment Setup
3.1. Experiment Description
3.2. Setup Description
3.3. MDP Definition
4. Methodology
4.1. Knowledge Graph and Embeddings
4.2. Deep Reinforcement Learning Framework
4.2.1. Asynchronous Advantage Actor Critic (A3C)
4.2.2. Proposed Architecture
4.2.3. Training Procedure
4.2.4. Post-Training Evaluation Procedure
4.2.5. Experiment Description
5. Results
5.1. Experiments without DR
5.2. Experiments with DR
6. Discussion
7. Limitations
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest

| MDP Definition | |
|---|---|
| Episode length | 50 steps maximum |
| Reward configuration | 100 if grasping point reached |
| Reward constraints | < 5 cm |
| | < 15° |
| Action set | 0 |
| Initial robot config | for joints 1 and 2 |
| Initial target config | x |
| | y |

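
The success condition in the MDP table can be sketched as a per-step reward check. The values 50 steps, 100, 5 cm and 15° come from the table; everything else is an assumption. In particular, the sketch assumes that both constraints must hold at the same time, that the 5 cm bound applies to the Euclidean distance between gripper and grasping point, that the 15° bound applies to a single orientation angle, and that the reward is 0 otherwise (`STEP_REWARD` is a placeholder, since the table does not give the non-success reward). The function names are hypothetical.

```python
import math

MAX_STEPS = 50           # episode length (from the table)
SUCCESS_REWARD = 100.0   # reward when the grasping point is reached (from the table)
MAX_DISTANCE_M = 0.05    # position constraint: < 5 cm (from the table)
MAX_ANGLE_DEG = 15.0     # orientation constraint: < 15 degrees (from the table)
STEP_REWARD = 0.0        # ASSUMPTION: non-success reward is not given in the table


def angle_error_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    diff = (a - b) % 360.0
    return min(diff, 360.0 - diff)


def step_outcome(gripper_xyz, target_xyz, gripper_angle_deg, target_angle_deg, step):
    """Return (reward, done) for one environment step.

    ASSUMPTION: success requires both the distance and the angle
    constraint to be satisfied simultaneously.
    """
    distance = math.dist(gripper_xyz, target_xyz)
    angle = angle_error_deg(gripper_angle_deg, target_angle_deg)
    if distance < MAX_DISTANCE_M and angle < MAX_ANGLE_DEG:
        return SUCCESS_REWARD, True
    # Otherwise the episode ends only when the step budget is exhausted.
    return STEP_REWARD, step >= MAX_STEPS
```

For example, `step_outcome((0, 0, 0), (0.03, 0, 0), 10, 0, 1)` satisfies both constraints and ends the episode with the success reward, while the same call with a target 10 cm away continues until step 50.
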
| Experiment Set | Robot | Agent Type | Accuracy (%) | Best Model Step |
|---|---|---|---|---|
| Without DR | TIAGo | BM | 60 | 60 M |
| Without DR | TIAGo | Partial KGE | 70 | 36 M |
| Without DR | TIAGo | Full KGE | 72 | 48 M |
| Without DR | IRB120 | BM | 80 | 60 M |
| Without DR | IRB120 | Partial KGE | 91 | 59 M |
| Without DR | IRB120 | Full KGE | 92 | 60 M |
| With DR | TIAGo | BM | 63 | 57 M |
| With DR | TIAGo | Partial KGE | 62 | 59 M |
| With DR | TIAGo | Full KGE | 79 | 24 M |
| With DR | IRB120 | BM | 76 | 56 M |
| With DR | IRB120 | Partial KGE | 87 | 43 M |
| With DR | IRB120 | Full KGE | 86 | 24 M |
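
The accuracy gains implied by the results table can be read off with a short script. The figures are copied from the table; BM is taken here to be the baseline agent, and the gain is reported in percentage points. The helper name `gains_over_baseline` is hypothetical, and this is only one possible way to summarise the results, not necessarily the authors' evaluation procedure.

```python
# (experiment set, robot) -> {agent type: accuracy in %}, copied from the table.
ACCURACY = {
    ("Without DR", "TIAGo"): {"BM": 60, "Partial KGE": 70, "Full KGE": 72},
    ("Without DR", "IRB120"): {"BM": 80, "Partial KGE": 91, "Full KGE": 92},
    ("With DR", "TIAGo"): {"BM": 63, "Partial KGE": 62, "Full KGE": 79},
    ("With DR", "IRB120"): {"BM": 76, "Partial KGE": 87, "Full KGE": 86},
}


def gains_over_baseline(accuracy):
    """Accuracy difference (percentage points) of each KGE agent versus BM."""
    gains = {}
    for key, agents in accuracy.items():
        base = agents["BM"]
        gains[key] = {name: acc - base
                      for name, acc in agents.items() if name != "BM"}
    return gains


gains = gains_over_baseline(ACCURACY)
for (experiment, robot), delta in gains.items():
    print(f"{experiment:10s} {robot:6s} {delta}")
```

On these figures the Full KGE agent gains between 10 and 16 percentage points over BM in every configuration, while the Partial KGE agent falls one point below BM for TIAGo with DR.
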
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).