Submitted: 11 March 2024
Posted: 11 March 2024
Abstract
Keywords:
1. Introduction
1.1. Motivation
1.2. Problem Definition, Objectives and Contribution
2. Dynamics Modeling
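Since the modeling details themselves are not reproduced here, the following is a minimal sketch of the kind of rigid-body model a quadrotor dynamics section derives, restricted to the altitude channel only; the mass, gravity constant, time step, and Euler discretization are illustrative assumptions, not the paper's model.

```python
def altitude_step(z, vz, thrust, m=1.0, g=9.81, dt=0.01):
    """One Euler integration step of the altitude dynamics m * z_ddot = thrust - m * g."""
    az = thrust / m - g
    return z + dt * vz, vz + dt * az

# At hover thrust (thrust = m * g) the vertical velocity stays constant.
z, vz = 0.0, 0.0
for _ in range(100):
    z, vz = altitude_step(z, vz, thrust=9.81)
```

The full model adds the three attitude angles and the horizontal channels, which couple through the rotation of the thrust vector; the structure of each channel, however, is the same double-integrator form shown here.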
3. Controlling the Quadrotor
3.1. Controller Structure
3.2. Position Controller
3.3. Attitude Controller
3.4. Underlying LQ control
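The LQ control named in this subsection can be sketched for the scalar case. The values of a, b, q, and r below are illustrative, not the paper's; the point is the Riccati recursion and the resulting state-feedback gain.

```python
def dlqr_scalar(a, b, q, r, iters=500):
    """Discrete-time LQR for x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2.
    Iterates the scalar Riccati recursion to a fixed point p, then returns the gain k
    of the optimal state feedback u = -k*x and the cost-to-go coefficient p."""
    p = q
    for _ in range(iters):
        # Scalar discrete algebraic Riccati recursion
        p = q + a * a * p - (a * p * b) ** 2 / (r + b * b * p)
    k = a * p * b / (r + b * b * p)
    return k, p

# Scalar example with a = b = q = r = 1
k, p = dlqr_scalar(1.0, 1.0, 1.0, 1.0)
print(round(k, 3), round(p, 3))  # gain ≈ 0.618, cost-to-go ≈ 1.618
```

For matrix-valued systems the same recursion runs over matrices; in practice one would call a solver such as SciPy's `solve_discrete_are` rather than hand-iterating.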
4. Reinforcement Learning
4.1. Introduction
4.2. Dynamic Programming and Q-Function
4.3. Approximate Dynamic Programming
4.4. Q Learning Policy Iteration
5. Simulation Study
- kg
- m
- kg·m²
- kg·m²
5.1. Experiment 1
- Position controller: = diag(200,1), = 100;
- Attitude controller (except ): = diag(100,1), = 10;
- Attitude controller (): = diag(10,5), = 10;
5.2. Experiment 2
- Position controller: = diag(200,1), = 100;
- Attitude controller: = diag(100,1), = 10;
- Third experiment: kg·m²
- Fourth experiment: kg·m²
- Fifth experiment: kg·m²
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| RL | Reinforcement Learning |
| LQR | Linear Quadratic Regulator |
| VFA | Value Function Approximation |
| TD | Temporal Difference |
| RLS | Recursive Least-Squares |
| MDP | Markov Decision Process |
| ZOH | Zero-Order Hold |
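The RLS estimator listed above can be sketched in a few lines. This is the standard recursive least-squares update with a forgetting factor; the model being fitted, the forgetting factor value, and the initialization below are illustrative assumptions.

```python
def rls_update(theta, P, phi, y, lam=0.99):
    """One recursive least-squares step for a 2-parameter linear model y ≈ theta . phi,
    with forgetting factor lam. P is the 2x2 inverse-covariance-like matrix."""
    Pphi = [P[0][0] * phi[0] + P[0][1] * phi[1],
            P[1][0] * phi[0] + P[1][1] * phi[1]]
    denom = lam + phi[0] * Pphi[0] + phi[1] * Pphi[1]
    gain = [Pphi[0] / denom, Pphi[1] / denom]          # Kalman-style gain
    err = y - (theta[0] * phi[0] + theta[1] * phi[1])  # prediction error
    theta = [theta[0] + gain[0] * err, theta[1] + gain[1] * err]
    # P <- (P - gain * phi^T P) / lam; phi^T P equals Pphi^T since P is symmetric
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(2)] for i in range(2)]
    return theta, P

# Fit the noiseless line y = 2x + 1 with regressor phi = [x, 1]
theta, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    theta, P = rls_update(theta, P, [x, 1.0], 2.0 * x + 1.0)
print([round(t, 2) for t in theta])  # approaches [2.0, 1.0]
```

The forgetting factor discounts old data geometrically, which lets the estimator track slowly varying parameters at the cost of higher variance; lam = 1 recovers ordinary recursive least squares.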
References
| Parameter | Value |
| --- | --- |
| Weight matrices | |
| | 0.99 |
| Update step | 1000 (10 seconds) |
| | 0.99 |
| | 0.5 |
| w | |

| Parameter | Value |
| --- | --- |
| Weight matrices | |
| | 0.99 |
| Update step | 1000 (10 seconds) |
| | 0.99 |
| | 0.1 |
| w | |

| Parameter | Value |
| --- | --- |
| Weight matrices | |
| | 0.99 |
| Update step | 1000 (10 seconds) |
| | 0.99 |
| | 0.05 |
| w | |
| 500 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).