Submitted: 02 April 2024
Posted: 04 April 2024
Abstract
Keywords:
1. Introduction
2. Problem Statement
2.1. Space Manipulator Dynamics
2.2. Target Dynamics
3. Reinforcement Learning Guidance
3.1. Proximal Policy Optimization
- An agent is initially situated at a state s and perceives its environment through observations o.
- Based on o, the actor autonomously decides which action a to take and applies it in the environment, moving to a new state.
- Depending on the definition of the reward, the critic evaluates the action that has been taken and guides the parameter updates of the actor through stochastic gradient descent on a loss function.
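The critic-guided update described in the last step can be illustrated with the clipped surrogate loss from the PPO paper by Schulman et al. The following is a minimal NumPy sketch, not the authors' implementation: the function name `ppo_clip_loss`, the clipping parameter `clip_eps=0.2`, and the toy numbers are assumptions made for illustration, and the advantages stand in for the critic's evaluation of each action.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped PPO surrogate, returned as a loss to be minimized.

    clip_eps=0.2 is an assumed value, not one taken from this article.
    """
    # Probability ratio between the updated policy and the one that collected the data.
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    # Clipping removes the incentive to move the ratio outside [1 - eps, 1 + eps].
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic (element-wise minimum) objective, negated to obtain a loss.
    return -np.mean(np.minimum(unclipped, clipped))

# Toy batch of three actions; advantages play the role of the critic's evaluation.
logp_old = np.log(np.array([0.5, 0.5, 0.5]))
logp_new = np.log(np.array([0.9, 0.5, 0.1]))
advantages = np.array([1.0, 1.0, -1.0])
loss = ppo_clip_loss(logp_new, logp_old, advantages)
```

In a full training loop this scalar would be differentiated with respect to the actor parameters by an automatic-differentiation framework; the sketch only evaluates the loss.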
3.2. GNC Implementation and Environment
3.3. Action Space and Observation Space
3.4. Reward
4. Training and Results
- Target’s major-axis spin rate deg/s.
- Each initial manipulator joint angle is perturbed by a random value deg.
- Desired end-effector state is randomized on the whole SR-facing side of the target, both in terms of position and attitude.
- Distance between SR and target is perturbed by a random value cm.
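The randomization listed above can be sketched as a sampler that draws one set of initial conditions per training episode. The numeric ranges did not survive in this copy of the text, so every bound below is a hypothetical placeholder, as are the function name, the dictionary keys, the number of joints, and the `to_sr` direction. Only the position part of the desired end-effector state is sampled; attitude randomization is left out for brevity.

```python
import numpy as np

# HYPOTHETICAL placeholder ranges: these are NOT the values used in the article.
BOUNDS = {
    "spin_rate_deg_s": (0.0, 1.0),
    "joint_offset_deg": (-1.0, 1.0),
    "distance_offset_cm": (-1.0, 1.0),
}

def sample_initial_conditions(rng, n_joints=7, to_sr=(1.0, 0.0, 0.0), bounds=BOUNDS):
    """Draw one randomized episode start (n_joints and to_sr are assumptions)."""
    to_sr = np.asarray(to_sr, dtype=float)
    # Random unit vector, flipped if needed so that it lies on the SR-facing side.
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    if direction @ to_sr < 0.0:
        direction = -direction
    return {
        "spin_rate_deg_s": rng.uniform(*bounds["spin_rate_deg_s"]),
        "joint_offsets_deg": rng.uniform(*bounds["joint_offset_deg"], size=n_joints),
        "distance_offset_cm": rng.uniform(*bounds["distance_offset_cm"]),
        "goal_direction": direction,
    }

ic = sample_initial_conditions(np.random.default_rng(0))
```

Passing an explicit `numpy.random.Generator` keeps each episode reproducible from its seed.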
4.1. Agent performance
4.2. Agent Robustness
- The curves associated with the left axis show how the success rate varies as a function of the thresholds on , , and , while keeping the one on fixed.
- The curves associated with the right axis show how the success rate varies as a function of the thresholds on and , while keeping the ones on and fixed.
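The threshold sweeps behind such curves can be sketched as follows, assuming that an episode counts as a success when every final error is within its threshold. The metric names are missing from this copy, so the columns are treated generically; the function names and the toy error values are hypothetical and not taken from the article.

```python
import numpy as np

def success_rate(final_errors, thresholds):
    """Fraction of episodes whose final errors all lie within their thresholds.

    final_errors: array of shape (n_episodes, n_metrics)
    thresholds:   array of shape (n_metrics,)
    """
    within = np.abs(final_errors) <= thresholds
    return np.all(within, axis=1).mean()

def sweep_thresholds(final_errors, base_thresholds, swept_cols, scales):
    """Scale the thresholds in swept_cols while keeping the others fixed."""
    rates = []
    for scale in scales:
        thresholds = np.array(base_thresholds, dtype=float)
        thresholds[swept_cols] *= scale
        rates.append(success_rate(final_errors, thresholds))
    return np.array(rates)

# Toy data: four episodes, two error metrics (values are made up).
errors = np.array([[0.5, 0.5], [1.5, 0.5], [0.5, 2.0], [3.0, 3.0]])
rates = sweep_thresholds(errors, base_thresholds=[1.0, 1.0],
                         swept_cols=[0], scales=[1.0, 2.0, 4.0])
```

Here only the first metric's threshold is relaxed, so the success rate saturates once the remaining failures are caused by the fixed threshold on the second metric.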
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| APF | Artificial Potential Field |
| CIM | Convective Inertia Matrix |
| DNN | Deep Neural Network |
| DoF | Degree of Freedom |
| DRL | Deep Reinforcement Learning |
| FNN | Feedforward Neural Network |
| GIM | Generalized Inertia Matrix |
| GNC | Guidance, Navigation, and Control |
| IOS | In-Orbit Servicing |
| LR | Learning Rate |
| MDP | Markov Decision Process |
| ORM | Orbital Robotics Mission |
| POMDP | Partially Observable Markov Decision Process |
| PPO | Proximal Policy Optimization |
| RL | Reinforcement Learning |
| SR | Space Robot |
| TRPO | Trust-Region Policy Optimization |












| DoFs | Proportional gain | Derivative gain |
|---|---|---|
| Base | 0.4 | 0.3 |
| Manipulator | 2.5 | 1.25 |

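As an illustration of how the gains in the table above might enter a control law, the sketch below applies a standard proportional-derivative (PD) law to a tracking error. The controller structure, the function name `pd_command`, the units, and the example error values are assumptions; only the four gain values come from the table.

```python
import numpy as np

# Gains from the table above: (proportional, derivative) per group of DoFs.
GAINS = {"base": (0.4, 0.3), "manipulator": (2.5, 1.25)}

def pd_command(q_ref, q, qd_ref, qd, kp, kd):
    """Textbook PD law: u = kp * (q_ref - q) + kd * (qd_ref - qd)."""
    q_ref, q = np.asarray(q_ref, dtype=float), np.asarray(q, dtype=float)
    qd_ref, qd = np.asarray(qd_ref, dtype=float), np.asarray(qd, dtype=float)
    return kp * (q_ref - q) + kd * (qd_ref - qd)

# Hypothetical single-joint example: position error 0.2, velocity error -0.1.
kp, kd = GAINS["manipulator"]
u = pd_command(q_ref=[0.2], q=[0.0], qd_ref=[0.0], qd=[0.1], kp=kp, kd=kd)
```

With these numbers the proportional term contributes 2.5 × 0.2 and the derivative term contributes 1.25 × (−0.1).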
| Layers | Actor neurons | Critic neurons |
|---|---|---|
| Input | 32 | 32 |
| 1st hidden | 300 | 300 |
| 2nd hidden | 300 | 300 |
| 3rd hidden | 300 | 300 |
| Output | 14 | 1 |
| Learning Rate | 1e-5 | 1e-5 |
| Activation |
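
The layer sizes in the table above can be turned into a small feedforward sketch to check dimensions and parameter counts. This is a NumPy illustration rather than the authors' code: the activation function is missing from the table, so `tanh` is an assumption, as are the weight initialization and the helper names. The learning rate of 1e-5 would belong to the optimizer and is not used here.

```python
import numpy as np

# Layer widths from the table: input, three hidden layers, output.
ACTOR_SIZES = [32, 300, 300, 300, 14]
CRITIC_SIZES = [32, 300, 300, 300, 1]

def init_mlp(sizes, rng):
    """Create (weight, bias) pairs; the scaled-normal init is an assumption."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, activation=np.tanh):
    """Forward pass; tanh is ASSUMED because the table's activation row is empty."""
    for weight, bias in params[:-1]:
        x = activation(x @ weight + bias)
    weight, bias = params[-1]
    return x @ weight + bias  # linear output layer

def n_params(params):
    """Total number of trainable scalars."""
    return sum(weight.size + bias.size for weight, bias in params)

rng = np.random.default_rng(0)
actor, critic = init_mlp(ACTOR_SIZES, rng), init_mlp(CRITIC_SIZES, rng)
observation = np.zeros(32)
action_out = forward(actor, observation)   # 14 values
value_out = forward(critic, observation)   # 1 value
```

How the 14 actor outputs map to actions (for example, means and standard deviations of a stochastic policy) is not recoverable from the table, so the sketch leaves them uninterpreted.
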
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).