Submitted: 30 June 2025
Posted: 01 July 2025
Abstract
Keywords:
1. Introduction
- Unified RL Framework: Development of a comprehensive, ROS-based framework (UniROS) for creating reinforcement learning environments that work seamlessly across simulation and real-world settings (a minimal interface sketch follows this list).
- Concurrent Environment Learning Support: Extension of the framework to support vectorized [16] and multi-robot/multi-task learning, enabling efficient learning across multiple environments at once.
- Real-Time Capabilities: Introduction of a ROS-centric implementation strategy for real-time RL environments, reducing latency and keeping agent-environment interactions synchronized.
- Benchmarking and Evaluation: Empirical demonstration of the proposed framework on benchmark learning tasks spanning three distinct scenarios, addressing these challenges.
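As context for the first contribution, here is a minimal, hypothetical sketch (not the actual UniROS API) of the pattern such a framework builds on: a Gymnasium-style task environment whose `step()` exchanges actions and observations through a robot interface, so the same task code can face either a simulated or a physical robot behind ROS topics. All helper names (`robot_interface`, `publish_joint_targets`, `ee_position`, `joint_positions`, `move_to_home`) are illustrative assumptions, and the observation layout mirrors the reach tasks in Appendix A.

```python
import numpy as np
import gymnasium as gym


class ReachTaskEnv(gym.Env):
    """Hypothetical reach task; the robot interface hides whether ROS
    topics connect to a simulator or to real hardware."""

    def __init__(self, robot_interface, goal_threshold=0.02):
        self.robot = robot_interface          # assumed wrapper over ROS pub/sub
        self.goal_threshold = goal_threshold  # success radius in metres
        self.goal = np.zeros(3)
        self.last_action = np.zeros(5)
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(5,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(20,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.robot.move_to_home()             # assumed helper: reposition the robot
        self.goal = self.np_random.uniform(-0.3, 0.3, size=3)
        self.last_action = np.zeros(5)
        return self._observe(), {}

    def step(self, action):
        self.robot.publish_joint_targets(action)   # assumed helper: publish to a ROS topic
        self.last_action = np.asarray(action)
        obs = self._observe()
        dist = float(np.linalg.norm(self.robot.ee_position() - self.goal))
        terminated = dist < self.goal_threshold
        return obs, -dist, terminated, False, {}   # dense (negative-distance) reward variant

    def _observe(self):
        # Layout mirrors Appendix A: EE position (3), displacement to goal (3),
        # distance (1), joint values (8), previous action (5) -> 20 values.
        ee = self.robot.ee_position()
        disp = self.goal - ee
        dist = np.array([np.linalg.norm(disp)])
        return np.concatenate([ee, disp, dist,
                               self.robot.joint_positions(),
                               self.last_action]).astype(np.float32)
```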
2. Background
2.1. Formulation of Reinforcement Learning Tasks for Robotics
2.2. Applying Reinforcement Learning to Real-World Robots
2.3. Use of Simulation Models for Robotic Reinforcement Learning
3. Related Work
4. Learning Across Simulated and Real-World Robotics Using UniROS
4.1. Unified Framework Formulation
4.2. Modularity of the Framework
4.3. Role of Concurrent Environments
4.4. Python Bindings for ROS
4.5. Additional Supporting Utilities
5. An In-Depth Look into the ROS-Based Reinforcement Learning Package for Real Robots (RealROS)
5.1. Base Env
5.2. Robot Env
5.3. Task Env
6. ROS-Based Concurrent Environment Management
6.1. Launching ROS-Based Concurrent Environments
6.2. Maintaining Communication with Concurrent Environments
7. Setting Up Real-Time RL Environments with the Proposed Framework
7.1. Overview of the Real-Time Environment Implementation Strategy
7.2. Reading Sensor Data with ROS
7.3. Sending Actuator Commands with ROS
7.4. Environment Loop
8. Benchmark Task Creation
9. Evaluation and Discussion of the Real-Time Environment Implementation Strategy
9.1. Impact of Action Cycle Time on Learning
9.2. Impact of Environment Loop Rate on Learning
9.3. Empirical Evaluation of Asynchronous Scheduling of the Real-Time Environment Implementation Strategy
9.4. Discussion
10. Use Cases
10.1. Training Robots Directly in the Real World
10.2. Simulation to Real-World
10.3. Concurrent Training in Real and Simulation Environments
10.3.1. Learning a Generalized Policy
10.3.2. Multi-Task Learning
Algorithm 1: Multi-Task Training Strategy for TD3/TD3+HER
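The full listing of Algorithm 1 appears in the original figure. The snippet below is only a rough, hypothetical illustration of one common way to organize such a loop, assuming a generic TD3-style agent exposing `act()`/`update()` methods and a shared replay buffer; it is not the authors' algorithm. It cycles round-robin over concurrent task environments and tags each stored transition with its task index (for TD3+HER, achieved and desired goals would be stored as well).

```python
def multi_task_training(agent, task_envs, episodes_per_task=10, rounds=100):
    """Hypothetical round-robin multi-task loop; not the authors' Algorithm 1."""
    for _ in range(rounds):
        for task_id, env in enumerate(task_envs):    # alternate between tasks
            for _ in range(episodes_per_task):
                obs, _ = env.reset()
                done = False
                while not done:
                    action = agent.act(obs, task_id)             # assumed interface
                    next_obs, reward, term, trunc, _ = env.step(action)
                    agent.buffer.add(obs, action, reward, next_obs, task_id)
                    agent.update()                               # TD3 gradient step
                    obs, done = next_obs, term or trunc
```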
11. ROS1 vs ROS2 Support in the UniROS Framework
12. Conclusion & Future Work
Funding
Data Availability Statement
Acknowledgments
Abbreviations
| Abbreviation | Meaning |
|---|---|
| ROS | Robot Operating System |
| RL | Reinforcement Learning |
| MDP | Markov Decision Process |
| CLI | Command Line Interface |
| API | Application Programming Interface |
| DRL | Deep Reinforcement Learning |
| DQN | Deep Q-Network |
| TD3 | Twin Delayed Deep Deterministic Policy Gradient |
| SB3 | Stable Baselines3 |
| GIL | Global Interpreter Lock |
| URDF | Unified Robot Description Format |
| IK | Inverse Kinematics |
| FK | Forward Kinematics |
| EE | End-Effector |
| DOF | Degree of Freedom |
| HER | Hindsight Experience Replay |
| SSH | Secure Shell |
| PC | Personal Computer |
| VRAM | Video Random Access Memory |
| GPU | Graphics Processing Unit |
| CPU | Central Processing Unit |
Appendix A
| | Standard Env | Goal Env | Dim. |
|---|---|---|---|
| Action Space | Joint positions, continuous | Joint positions, continuous | 5 |
| Observation | EE position | EE position | 3 |
| | Cartesian displacement | Cartesian displacement | 3 |
| | Euclidean distance (EE to reach goal) | Euclidean distance (EE to reach goal) | 1 |
| | Current joint values | Current joint values | 8 |
| | Previous action | Previous action | 5 |
| Achieved Goal | N/A | EE position | 3 |
| Desired Goal | N/A | Goal position | 3 |
| Reward Architecture | Dense / Sparse | Dense / Sparse | |
| | Standard Env | Goal Env | Dim. |
|---|---|---|---|
| Action Space | Joint positions, continuous | Joint positions, continuous | 6 |
| Observation | EE position | EE position | 3 |
| | Cartesian displacement | Cartesian displacement | 3 |
| | Euclidean distance (EE to reach goal) | Euclidean distance (EE to reach goal) | 1 |
| | Current joint values | Current joint values | 6 |
| | Previous action | Previous action | 6 |
| Achieved Goal | N/A | EE position | 3 |
| Desired Goal | N/A | Goal position | 3 |
| Reward Architecture | Dense / Sparse | Dense / Sparse | |
Algorithm A1: Dense Reward Architecture
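The authors' exact Algorithm A1 is given in the original figure; the following is a minimal sketch of the two reward architectures named in the tables above, under the usual assumptions for reach tasks: the dense reward is the negative Euclidean EE-to-goal distance, and the sparse reward pays 0 on success (within the 0.02 m threshold from Appendix B) and -1 otherwise.

```python
import numpy as np


def dense_reward(ee_position, goal_position):
    # Negative Euclidean distance: smoothly shaped, maximal (0) at the goal.
    return -float(np.linalg.norm(np.asarray(ee_position) - np.asarray(goal_position)))


def sparse_reward(ee_position, goal_position, threshold=0.02):
    # Binary signal: 0 within the success radius (metres), -1 elsewhere.
    dist = np.linalg.norm(np.asarray(ee_position) - np.asarray(goal_position))
    return 0.0 if dist < threshold else -1.0
```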
Appendix B
| Hyperparameter | Value |
|---|---|
| Actor and critic learning rate | 0.0003 |
| Replay buffer size | 1,000,000 |
| Minibatch size per gradient update | 256 |
| Soft update coefficient | 0.005 |
| Discount factor | 0.99 |
| Entropy regularization coefficient | 0.01 |
| Maximum episode time steps | 100 |
| Success distance threshold | 0.02 m |
| Hidden layers | 400, 300 |
| Optimizer | Adam |
| Hyperparameter | Value |
|---|---|
| Actor and critic learning rate | 0.0003 |
| HER replay buffer size | 1,000,000 |
| Goal selection strategy | future |
| Number of future goals | 4 |
| Minibatch size per gradient update | 256 |
| Soft update coefficient | 0.005 |
| Discount factor | 0.99 |
| Entropy regularization coefficient | 0.01 |
| Maximum episode time steps | 100 |
| Success distance threshold | 0.02 m |
| Hidden layers | 400, 300 |
| Optimizer | Adam |
| Policy delay | 2 |
| Target policy (smoothing) noise | 0.2 |
| Absolute clip on target policy smoothing noise | 0.5 |
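For reference, a sketch of how these values map onto Stable Baselines3's TD3 constructor with HER, assuming a registered goal-conditioned environment; the environment ID is hypothetical, and the entropy regularization row has no direct TD3 argument in SB3 and is omitted here.

```python
import gymnasium as gym
from stable_baselines3 import TD3, HerReplayBuffer

env = gym.make("RealROSReach-v0")    # hypothetical env ID for a Goal Env
model = TD3(
    "MultiInputPolicy",              # handles the dict observations of Goal Envs
    env,
    learning_rate=3e-4,              # actor and critic learning rate
    buffer_size=1_000_000,           # (HER) replay buffer size
    batch_size=256,                  # minibatch size per gradient update
    tau=0.005,                       # soft update coefficient
    gamma=0.99,                      # discount factor
    policy_delay=2,                  # delayed policy updates
    target_policy_noise=0.2,         # target policy smoothing noise
    target_noise_clip=0.5,           # absolute clip on smoothing noise
    policy_kwargs=dict(net_arch=[400, 300]),  # hidden layers
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4,  # number of future goals
                              goal_selection_strategy="future"),
)
model.learn(total_timesteps=100_000)
```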
Appendix C
References
- Toner, T.; Saez, M.; Tilbury, D.M.; Barton, K. Opportunities and Challenges in Applying Reinforcement Learning to Robotic Manipulation: An Industrial Case Study. Manuf. Lett. 2023, 35, 1019–1030. [Google Scholar] [CrossRef]
- Ibarz, J.; Tan, J.; Finn, C.; Kalakrishnan, M.; Pastor, P.; Levine, S. How to Train Your Robot with Deep Reinforcement Learning: Lessons We Have Learned. Int. J. Robot. Res. 2021, 40, 698–721. [Google Scholar] [CrossRef]
- Chebotar, Y.; Handa, A.; Makoviychuk, V.; Macklin, M.; Issac, J.; Ratliff, N.; Fox, D. Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), May 2019; pp. 8973–8979. [Google Scholar]
- Bousmalis, K.; Irpan, A.; Wohlhart, P.; Bai, Y.; Kelcey, M.; Kalakrishnan, M.; Downs, L.; Ibarz, J.; Pastor, P.; Konolige, K.; et al. Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), May 2018; pp. 4243–4250. [Google Scholar]
- Dulac-Arnold, G.; Levine, N.; Mankowitz, D.J.; Li, J.; Paduraru, C.; Gowal, S.; Hester, T. Challenges of Real-World Reinforcement Learning: Definitions, Benchmarks and Analysis. Mach. Learn. 2021, 110, 2419–2468. [Google Scholar] [CrossRef]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning, Second Edition: An Introduction; MIT Press, 2018; ISBN 978-0-262-35270-3. [Google Scholar]
- Rupam Mahmood, A.; Korenkevych, D.; Komer, B.J.; Bergstra, J. Setting up a Reinforcement Learning Task with a Real-World Robot. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2018; pp. 4635–4640. [Google Scholar]
- Fan, T.; Long, P.; Liu, W.; Pan, J. Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Navigation in Complex Scenarios. Int. J. Robot. Res. 2020, 39, 856–892. [Google Scholar] [CrossRef]
- Gleeson, J.; Gabel, M.; Pekhimenko, G.; de Lara, E.; Krishnan, S.; Janapa Reddi, V. RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads. Proc. Mach. Learn. Syst. 2021, 3, 783–799. [Google Scholar]
- Wiggins, S.; Meng, Y.; Kannan, R.; Prasanna, V. Evaluating Multi-Agent Reinforcement Learning on Heterogeneous Platforms. In Proceedings of the Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications V, SPIE, 12 June 2023; Volume 12538, pp. 501–505. [Google Scholar]
- Wu, Y.; Yan, W.; Kurutach, T.; Pinto, L.; Abbeel, P. Learning to Manipulate Deformable Objects without Demonstrations. Available online: https://arxiv.org/abs/1910.13439v2 (accessed on 4 December 2023).
- Zamora, I.; Lopez, N.G.; Vilches, V.M.; Cordero, A.H. Extending the OpenAI Gym for Robotics: A Toolkit for Reinforcement Learning Using ROS and Gazebo. arXiv 2017, arXiv:1608.05742. [Google Scholar]
- Fajardo, J.M.; Roldan, F.G.; Realpe, S.; Hernández, J.D.; Ji, Z.; Cardenas, P.-F. FRobs_RL: A Flexible Robotics Reinforcement Learning Library. In Proceedings of the 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), August 2022; pp. 1104–1109. [Google Scholar]
- Deisenroth, M.P.; Englert, P.; Peters, J.; Fox, D. Multi-Task Policy Search for Robotics. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), May 2014; pp. 3876–3881. [Google Scholar]
- Alet, F.; Lozano-Perez, T.; Kaelbling, L.P. Modular Meta-Learning. In Proceedings of The 2nd Conference on Robot Learning, PMLR, 23 October 2018; pp. 856–868. [Google Scholar]
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of The 33rd International Conference on Machine Learning, PMLR, 11 June 2016; pp. 1928–1937. [Google Scholar]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
- Fujimoto, S.; Hoof, H.; Meger, D. Addressing Function Approximation Error in Actor-Critic Methods. In Proceedings of the 35th International Conference on Machine Learning, PMLR, 3 July 2018; pp. 1587–1596. [Google Scholar]
- Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. J. Mach. Learn. Res. 2021, 22, 268:12348–268:12355. [Google Scholar]
- Weng, J.; Chen, H.; Yan, D.; You, K.; Duburcq, A.; Zhang, M.; Su, Y.; Su, H.; Zhu, J. Tianshou: A Highly Modularized Deep Reinforcement Learning Library. J. Mach. Learn. Res. 2022, 23. [Google Scholar]
- Schaarschmidt, M.; Kuhnle, A.; Ellis, B.; Fricke, K.; Gessert, F.; Yoneki, E. LIFT: Reinforcement Learning in Computer Systems by Learning From Demonstrations. 2018. [Google Scholar]
- Huang, S.; Dossa, R.F.J.; Ye, C.; Braga, J.; Chakraborty, D.; Mehta, K.; Araújo, J.G.M. CleanRL: High-Quality Single-File Implementations of Deep Reinforcement Learning Algorithms. J. Mach. Learn. Res. 2022, 23, 1–18. [Google Scholar]
- Liang, E.; Liaw, R.; Nishihara, R.; Moritz, P.; Fox, R.; Goldberg, K.; Gonzalez, J.; Jordan, M.; Stoica, I. RLlib: Abstractions for Distributed Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, PMLR, 3 July 2018; pp. 3053–3062. [Google Scholar]
- Kormushev, P.; Calinon, S.; Caldwell, D.G. Reinforcement Learning in Robotics: Applications and Real-World Challenges. Robotics 2013, 2, 122–148. [Google Scholar] [CrossRef]
- Gomes, N.M.; Martins, F.N.; Lima, J.; Wörtche, H. Reinforcement Learning for Collaborative Robots Pick-and-Place Applications: A Case Study. Automation 2022, 3, 223–241. [Google Scholar] [CrossRef]
- Liu, D.; Wang, Z.; Lu, B.; Cong, M.; Yu, H.; Zou, Q. A Reinforcement Learning-Based Framework for Robot Manipulation Skill Acquisition. IEEE Access 2020, 8, 108429–108437. [Google Scholar] [CrossRef]
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum Learning. In Proceedings of the 26th Annual International Conference on Machine Learning; Association for Computing Machinery: New York, NY, USA, 14 June 2009; pp. 41–48. [Google Scholar]
- Beltran-Hernandez, C.C.; Petit, D.; Ramirez-Alpizar, I.G.; Harada, K. Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study. 2022. [Google Scholar]
- Santos Pessoa de Melo, M.; Gomes da Silva Neto, J.; Jorge Lima da Silva, P.; Natario Teixeira, J.M.X.; Teichrieb, V. Analysis and Comparison of Robotics 3D Simulators. In Proceedings of the 2019 21st Symposium on Virtual and Augmented Reality (SVR), October 2019; pp. 242–251. [Google Scholar]
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. 2016. [Google Scholar]
- Ferigo, D.; Traversaro, S.; Metta, G.; Pucci, D. Gym-Ignition: Reproducible Robotic Simulations for Reinforcement Learning. In Proceedings of the 2020 IEEE/SICE International Symposium on System Integration (SII), January 2020; pp. 885–890. [Google Scholar]
| Frameworks | UniROS | OpenAI_ROS | gym-gazebo2 | FRobs_RL | ros-gazebo-gym | SenseAct | Orbit |
|---|---|---|---|---|---|---|---|
| Real-Time Capability | ✓ | ✗ | Limited | ✗ | ✗ | ✓ | ✓ |
| Supports Real Robots | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Partial |
| Simulation Support | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Sim-to-Real Support | ✓ | Partial | Partial | Partial | Partial | ✗ | ✓ |
| ROS Integration | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
| Concurrent Multi-Robot Support | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | Partial |
| Latency Handling/Modeling | ✓ | ✗ | ✗ | ✗ | ✗ | ✓ | Partial |
| Python API for Env Management | ✓ | ✗ | Manual | ✗ | Partial | ✗ | ✓ |
| Actively Maintained | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ | ✓ |
| Open Source | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Partial |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

