Submitted: 13 August 2024
Posted: 14 August 2024
Abstract
Keywords:
1. Introduction
- Typical application scenarios for communication equipment such as UAVs are modeled abstractly, and the simulation environment is designed on the Atari platform.
- A more effective action selection strategy replaces the ε-greedy strategy used by the traditional DQN algorithm and improves its performance.
- The reward function is redesigned to improve the efficiency and stability of the algorithm, and the changes are verified experimentally.
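The exact action selection strategy and reward function are not given here, so the following Python sketch is illustrative only. It contrasts the standard ε-greedy rule used by DQN with Boltzmann (softmax) selection, one common alternative that favors higher-valued actions instead of exploring uniformly, and shows potential-based reward shaping (Ng, Harada and Russell, 1999), one common way to redesign a reward without changing the optimal policy. Whether the paper uses either technique is an assumption, and all function names are hypothetical.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Baseline DQN rule: a uniformly random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann (softmax) distribution over actions; lower temperature -> greedier."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, temperature, rng=random):
    """Hypothetical alternative to epsilon-greedy: sample actions by softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def shaped_reward(reward, phi_s, phi_next, gamma, done):
    """Potential-based shaping: add F = gamma * Phi(s') - Phi(s) to the environment reward.

    Phi is taken as 0 at terminal states, so episode-ending transitions
    are not credited with a future potential.
    """
    future = 0.0 if done else gamma * phi_next
    return reward + future - phi_s
```

In a relay-placement task, a natural (hypothetical) potential Φ(s) is the negative distance from the UAV to a good relay position, so steps that bring the UAV closer receive a larger shaping bonus than steps that do not.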
2. Materials and Methods
2.1. Reinforcement Learning
2.2. DQN
2.3. System Model
- Signal attenuation and other transmission losses are ignored, which simplifies the system model.
- The relay node forwards the received signal downstream immediately, with no intermediate delay.
- The UAV flies at a constant, slow speed and moves only in a two-dimensional plane.
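The three assumptions above can be made concrete with a minimal grid-world sketch in Python. It assumes, hypothetically, that the UAV moves one cell per step (constant, slow speed), that a ground node is reachable exactly when it lies within a fixed communication radius (no attenuation), and that the relay link is established as soon as both ground nodes are in range (instant forwarding). The class name, grid size, radius, node positions, and reward values are illustrative choices, not parameters from the paper.

```python
import math
from dataclasses import dataclass

# Four moves in the horizontal plane; one grid cell per step models the
# assumed constant, slow flight speed.
ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}


@dataclass
class RelayEnv:
    size: int = 10            # hypothetical grid side length
    comm_radius: float = 3.0  # hypothetical range; no attenuation, so links are on/off
    source: tuple = (2, 4)    # hypothetical ground transmitter cell
    dest: tuple = (7, 4)      # hypothetical ground receiver cell
    start: tuple = (0, 0)     # hypothetical UAV take-off cell

    def __post_init__(self):
        self.reset()

    def reset(self):
        self.pos = self.start
        return self.pos

    def in_range(self, node):
        return math.dist(self.pos, node) <= self.comm_radius

    def link_up(self):
        # The relay forwards instantly, so the end-to-end link exists exactly
        # when both ground nodes lie inside the UAV's communication range.
        return self.in_range(self.source) and self.in_range(self.dest)

    def step(self, action):
        dx, dy = ACTIONS[action]
        x = min(max(self.pos[0] + dx, 0), self.size - 1)  # stay on the grid
        y = min(max(self.pos[1] + dy, 0), self.size - 1)
        self.pos = (x, y)
        done = self.link_up()
        reward = 1.0 if done else -0.01  # hypothetical sparse reward with a step cost
        return self.pos, reward, done
```

With the defaults, flying four cells right and then four cells up places the UAV at (4, 4), the first cell on that path that covers both ground nodes.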
3. Improvement and Experiment
3.1. Improvement of Algorithm
3.2. Experiment
4. Conclusions and Future Work



| Hyperparameter | Value |
|---|---|
| Seed | 25 |
| Learning_rate | e-4 |
| Grad_clipping_value | 5 |
| Replay_buffer_size | 1000000 |
| Batch_size | 32 |
| Gamma | 0.98 |
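The table's values can be wired into a DQN training loop roughly as follows. This Python sketch covers only the pieces the table parameterizes: a replay buffer of size Replay_buffer_size sampled in batches of Batch_size, the one-step TD target with discount Gamma, and gradient clipping. It assumes Grad_clipping_value bounds the global L2 norm of the gradient; element-wise value clipping is an equally plausible reading. Class and function names are hypothetical, and Learning_rate is left out because its leading coefficient is not recoverable from the table.

```python
import math
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    # Values copied from the hyperparameter table; Learning_rate is omitted
    # because its leading coefficient ("e-4") is not legible in the source.
    seed: int = 25
    grad_clipping_value: float = 5.0
    replay_buffer_size: int = 1_000_000
    batch_size: int = 32
    gamma: float = 0.98


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first
        self.rng = random.Random(seed)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)


def td_targets(rewards, next_q_max, dones, gamma):
    """y = r + gamma * max_a' Q_target(s', a'); no bootstrapping at terminal states."""
    return [r + (0.0 if d else gamma * q) for r, q, d in zip(rewards, next_q_max, dones)]


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat gradient vector so its L2 norm is at most max_norm (assumed reading)."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    return [g * (max_norm / norm) for g in grads]
```

For example, with Gamma = 0.98 a non-terminal transition with reward 1 and a target-network maximum of 2 gives y = 1 + 0.98 × 2 = 2.96, while a terminal transition keeps only its reward.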
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
