Submitted: 10 March 2025
Posted: 11 March 2025
Abstract
Keywords:
1. Introduction
- We propose a two-layer hierarchical task scheduling and resource allocation scheme for the UAV data collection task, in which the amount of collected data is maximized and the UAV energy consumption is minimized by jointly optimizing the target allocation, trajectory, communication power, and CPU frequency of the UAV in a decentralized manner.
- In the lower layer, we design a PPO-based algorithm to optimize the continuous actions, where the Lin-Kernighan-Helsgaun (LKH) method is used to obtain a near-optimal trajectory that serves as a given input to the lower-layer network (LLN).
- In the upper layer, we leverage a DQN-based algorithm to optimize the discrete actions with the assistance of the nested, well-trained LLN, which provides rapid feedback on global rewards and accelerates the training of the upper-layer network (ULN).
- Extensive experiments demonstrate the effectiveness of the proposed two-layer HDRL-TSRAM framework, which outperforms several baseline methods.
| Section | Content |
|---|---|
| 1 | Introduction |
| 2 | Related work |
| 3 | System model |
| 4 | Hierarchical DRL method |
| 5 | Experimental results and analysis |
| 6 | Conclusions |
| 7 | Discussion |
2. Related Work
2.1. Scenarios of the UAV Data Collection
2.2. Corresponding Approaches
| References | Method | Advantages | Disadvantages |
|---|---|---|---|
| [39] | Reinforcement learning-based training methods | Practicality enhanced by grid-based path planning and autonomous UAV decision-making | Limited by assumptions on user mobility |
| [40] | Q-learning | Focuses on energy efficiency, extending UAV operation time with wireless charging | Limited adaptability to dynamic environments |
| [41] | Deep deterministic policy gradient (DDPG) | Significant reduction in data packet loss with continuous action space optimization | High computational complexity and resource demands |
| [42] | A multi-agent reinforcement learning approach | Significant improvements in reducing AoI, increasing throughput, and energy efficiency | High complexity and resource consumption |
2.3. Comparative Analysis of UAV Task Scheduling and Resource Allocation Models
| Model | Hierarchical RL | Resource Allocation | Energy Efficiency | Dynamic Adaptability | Real-time Decision |
|---|---|---|---|---|---|
| QEDU | ✓ | ✗ | ✓ | ✗ | ✗ |
| TD3-OHT | ✓ | ✗ | ✓ | ✗ | ✗ |
| SCP | ✓ | ✓ | ✓ | ✓ | ✗ |
| MADRL | ✓ | ✓ | ✗ | ✓ | ✗ |
| DTDE | ✗ | ✓ | ✓ | ✗ | ✗ |
| HDRL-TSRAM | ✓ | ✓ | ✓ | ✓ | ✓ |
3. System Model
| Parameters | Meaning | Parameters | Meaning |
|---|---|---|---|
| n | Time slot | | Main frequency |
| | Flight time | | Dynamic transition matrix |
| | Collection time | M | Number of ground targets |
| | Flight power | B | Allocated communication bandwidth |
| | Hovering power | | Power of the random noise |
| | Distance | | Offloading data transmission |
| | Transmission power | | Effective conversion rate of UAV |
| | Energy consumption | | Prioritization matrix |
| C | Number of CPU cycles | | Energy consumption penalty coefficient |
3.1. Flight Model
3.2. Communication Model
3.3. UAV Computation Model
3.4. Optimization Model
4. Hierarchical DRL Method
4.1. Lower-layer Network
4.1.1. LKH-based UAV Trajectory Planning Algorithm
Algorithm 1: LKH-based UAV Trajectory Planning Method
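As a rough illustration of the tour-improvement idea behind Lin-Kernighan-style methods, the sketch below orders ground targets into a closed tour using plain 2-opt local search. This is a deliberately simplified stand-in and not LKH itself, which applies deeper, adaptively chosen k-opt moves. The function names, the coordinates, and the convention that index 0 is the UAV's start point are all hypothetical.

```python
import math


def tour_length(points, tour):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, tour):
    """Improve a closed tour by reversing segments until no reversal helps.

    2-opt is a simplified stand-in; LKH uses deeper k-opt moves.
    Position 0 of the tour (the assumed start point) is never moved.
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                a, b = points[tour[i - 1]], points[tour[i]]
                c, d = points[tour[j]], points[tour[(j + 1) % n]]
                # Change in length when edges (a,b),(c,d) become (a,c),(b,d).
                delta = (math.dist(a, c) + math.dist(b, d)
                         - math.dist(a, b) - math.dist(c, d))
                if delta < -1e-12:
                    tour[i:j + 1] = reversed(tour[i:j + 1])
                    improved = True
    return tour


# Hypothetical layout: index 0 is the start point, the rest are targets.
points = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
initial = [0, 1, 2, 3]          # crosses itself on the unit square
best = two_opt(points, initial)
print(best, tour_length(points, best))
```

In a pipeline like the one described, a visiting order of this kind would be computed once per target allocation and handed to the lower-layer network as a fixed input, so the learning agent does not have to rediscover the routing.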
4.1.2. DRL-based UAV Resource Allocation Algorithm
Algorithm 2: PPO-based Training Scheme for the Lower-layer Network
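For orientation, the core of any PPO update is the clipped surrogate objective. The sketch below computes it in plain Python for a small batch. It is a generic illustration of PPO rather than the authors' training scheme. The clipping range of 0.2 is a common default that is assumed here and does not come from the paper, and the function and argument names are hypothetical.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate, averaged over a batch of samples.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed value.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: keep the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Two hypothetical samples: one probability ratio above, one below the clip range.
loss = ppo_clip_loss(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(loss)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is one reason PPO is a frequent choice for continuous controls such as transmission power and CPU frequency.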
4.2. Upper-layer Network
Algorithm 3: Training Strategy for the ULN
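The three building blocks that a DQN-style upper layer typically relies on can be sketched as follows: epsilon-greedy action selection, the one-step temporal-difference target, and soft updating of the target network. The default values (0.1, 0.99, 0.005) are taken from the DQN hyperparameter table reported with the experiments. Reading the "greed coefficient" as the epsilon of epsilon-greedy exploration is an assumption, and all function names are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step DQN target: r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t
            for t, w in zip(target_params, online_params)]


# Hypothetical Q-values for three candidate target allocations.
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
target = td_target(reward=1.0, next_q_values=[0.5, 2.0], done=False)
new_target_params = soft_update([0.0, 1.0], [1.0, 1.0])
print(action, target, new_target_params)
```

In the hierarchical setting described above, the reward passed to `td_target` would come from evaluating the chosen allocation with the already trained lower-layer network, which is what makes the feedback to the upper layer fast.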
5. Experimental Results and Analysis
5.1. Experimental Setup
5.2. Numerical Results
5.2.1. Experimental Verification of Lower Layer Network
5.2.2. Experimental Verification of Upper Layer Network
6. Conclusions
7. Discussion
References
- Hanscom, A.; Bedford, M. Unmanned aircraft system (UAS) service demand 2015–2035: Literature review and projections of future usage, 2013. [Google Scholar]
- Liang, Y.; Xu, W.; Liang, W.; Peng, J.; Jia, X.; Zhou, Y.; Duan, L. Nonredundant information collection in rescue applications via an energy-constrained UAV. IEEE Internet of Things Journal 2018, 6, 2945–2958. [Google Scholar]
- Nguyen, M.T.; Nguyen, C.V.; Do, H.T.; Hua, H.T.; Tran, T.A.; Nguyen, A.D.; Ala, G.; Viola, F. UAV-assisted data collection in wireless sensor networks: A comprehensive survey. Electronics 2021, 10, 2603. [Google Scholar] [CrossRef]
- Yuan, X.; Hu, Y.; Zhang, J.; Schmeink, A. Joint User Scheduling and UAV Trajectory Design on Completion Time Minimization for UAV-Aided Data Collection. IEEE Transactions on Wireless Communications 2022. [Google Scholar]
- Wei, Z.; Zhu, M.; Zhang, N.; Wang, L.; Zou, Y.; Meng, Z.; Wu, H.; Feng, Z. UAV-assisted data collection for internet of things: A survey. IEEE Internet of Things Journal 2022, 9, 15460–15483. [Google Scholar]
- Khan, A.; Gupta, S.; Gupta, S.K. Emerging UAV technology for disaster detection, mitigation, response, and preparedness. Journal of Field Robotics 2022, 39, 905–955. [Google Scholar]
- Fu, Y.; Li, D.; Tang, Q.; Zhou, S. Joint speed and bandwidth optimized strategy of UAV-assisted data collection in post-disaster areas. In Proceedings of the 2022 20th Mediterranean Communication and Computer Networking Conference (MedComNet). IEEE; 2022; pp. 39–42. [Google Scholar]
- Xie, Z.; Song, X.; Cao, J.; Qiu, W. Providing aerial MEC service in areas without infrastructure: A tethered-UAV-based energy-efficient task scheduling framework. IEEE Internet of Things Journal 2022, 9, 25223–25236. [Google Scholar] [CrossRef]
- Halder, S.; Ghosal, A.; Conti, M. Dynamic Super Round-Based Distributed Task Scheduling for UAV Networks. IEEE Transactions on Wireless Communications 2022, 22, 1014–1028. [Google Scholar]
- Zhou, S.; Cheng, Y.; Lei, X.; Peng, Q.; Wang, J.; Li, S. Resource allocation in UAV-assisted networks: A clustering-aided reinforcement learning approach. IEEE Transactions on Vehicular Technology 2022, 71, 12088–12103. [Google Scholar] [CrossRef]
- Liu, B.; Wan, Y.; Zhou, F.; Wu, Q.; Hu, R.Q. Resource allocation and trajectory design for MISO UAV-assisted MEC networks. IEEE Transactions on Vehicular Technology 2022, 71, 4933–4948. [Google Scholar]
- Zhang, F.; Ding, Y.; Cao, M.; Wu, M.; Lu, W.; Nallanathan, A. Energy Efficiency Optimization of RIS-Assisted UAV Search-Based Cognitive Communication in Complex Obstacle Avoidance Environments. IEEE Transactions on Vehicular Technology.
- Wang, Z.; Wen, J.; He, J.; Yu, L.; Li, Z. Resource and Trajectory Optimization for Secure Enhancement in IRS-Assisted UAV-MEC Systems. IEEE Transactions on Cognitive Communications and Networking.
- Chen, Y.; Yang, Y.; Wu, Y.; Huang, J.; Zhao, L. Joint Trajectory Optimization and Resource Allocation in UAV-MEC Systems: A Lyapunov-Assisted DRL Approach. IEEE Transactions on Services Computing.
- Shen, L.; Zhang, H.; Wang, N.; Cui, Y.; Cheng, X.; Mu, X. Joint Clustering and 3-D UAV Deployment for Delay-Aware UAV-Enabled MTC Data Collection Networks. IEEE Sensors Letters 2024, 8, 1–4. [Google Scholar]
- Gong, J.; Chang, T.H.; Shen, C.; Chen, X. Aviation time minimization of UAV for data collection from energy constrained sensor networks. In Proceedings of the 2018 IEEE Wireless Communications and Networking Conference (WCNC). IEEE; 2018; pp. 1–6. [Google Scholar]
- Gong, J.; Chang, T.H.; Shen, C.; Chen, X. Flight time minimization of UAV for data collection over wireless sensor networks. IEEE Journal on Selected Areas in Communications 2018, 36, 1942–1954. [Google Scholar]
- Zhu, K.; Xu, X.; Han, S. Energy-efficient UAV trajectory planning for data collection and computation in mMTC networks. In Proceedings of the 2018 IEEE Globecom Workshops (GC Wkshps). IEEE; 2018; pp. 1–6. [Google Scholar]
- Song, C.; Zhang, X.; She, Y.; Li, B.; Zhang, Q. Trajectory Planning for UAV Swarm Tracking Moving Target Based on an Improved Model Predictive Control Fusion Algorithm. IEEE Internet of Things Journal 2025, 1–1. [Google Scholar]
- Zhang, H.; Li, B.; Rong, Y.; Zeng, Y.; Zhang, R. Joint Optimization of Transmit Power and Trajectory for UAV-Enabled Data Collection With Dynamic Constraints. IEEE Transactions on Communications 2025, 1–1. [Google Scholar]
- Li, J.; Shi, Y.; Dai, C.; Yi, C.; Yang, Y.; Zhai, X.; Zhu, K. A Learning-Based Stochastic Game for Energy Efficient Optimization of UAV Trajectory and Task Offloading in Space/Aerial Edge Computing. IEEE Transactions on Vehicular Technology 2025, 1–16. [Google Scholar]
- Wang, M.; Zhang, D.; Wang, B.; Li, L. Dynamic Trajectory Planning for Multi-UAV Multi-Mission Operations Using a Hybrid Strategy. IEEE Transactions on Aerospace and Electronic Systems 2025, 1–19. [Google Scholar]
- Zhang, X.; Yu, X.; Cai, H. Joint Trajectory and Resource Allocation Optimization for UAV-Assisted Edge Computing. In Proceedings of the 2024 16th International Conference on Wireless Communications and Signal Processing (2024 WCSP). IEEE; 2024; pp. 1068–1073. [Google Scholar]
- Qin, P.; Wu, X.; Ding, R.; Fu, M.; Zhao, X.; Chen, Z.; Zhou, H. Joint Resource Allocation and UAV Trajectory Design for D2D-Assisted Energy-Efficient Air–Ground Integrated Caching Network. IEEE Transactions on Vehicular Technology 2024, 73, 17558–17571. [Google Scholar]
- Khan, N.; Ahmad, A.; Alwarafy, A.; Shah, M.; Lakas, A.; Azeem, M. Efficient Resource Allocation and UAV Deployment in STAR-RIS and UAV-Relay Assisted Public Safety Networks for Video Transmission. IEEE Open Journal of the Communications Society 2025, 1–1. [Google Scholar]
- Qian, J.; Yan, Y.; Gao, F.; Ge, B.; Wei, M.; Shangguan, B.; He, G. C3DGS: Compressing 3D Gaussian Model for Surface Reconstruction of Large-Scale Scenes Based on Multiview UAV Images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2025, 18, 4396–4409. [Google Scholar]
- Aldao, E.; Veiga-López, F.; Miguel González-deSantos, L.; González-Jorge, H. Enhancing UAV Classification With Synthetic Data: GMM LiDAR Simulator for Aerial Surveillance Applications. IEEE Sensors Journal 2024, 24, 26960–26970. [Google Scholar]
- Qiu, J.; Kuang, Z.; Huang, Z.; Lin, S. Security Offloading Scheduling and Caching Optimization Algorithm in UAV Edge Computing. IEEE Systems Journal 2025, 1–11. [Google Scholar]
- Wan, L.; Wang, J.; Sun, L.; Li, K.; Xiong, X.; Lin, Y. Heterogeneous UAV Resource Scheduling for Dynamic Time Sensitive Target Detection and Interference. IEEE Internet of Things Journal 2025, 1–1. [Google Scholar]
- Chen, X.; Chen, X. The UAV dynamic path planning algorithm research based on Voronoi diagram. In Proceedings of the 26th Chinese Control and Decision Conference (2014 CCDC). IEEE; 2014; pp. 1069–1071. [Google Scholar]
- Hsu, D.; Latombe, J.C.; Kurniawati, H. On the Probabilistic Foundations of Probabilistic Roadmap Planning. In Proceedings of the Robotics Research: Results of the 12th International Symposium ISRR. Springer Science and Business Media; 2007; Vol. 28, p. 83. [Google Scholar]
- Liu, X. Four alternative patterns of the Hilbert curve. Applied mathematics and computation 2004, 147, 741–752. [Google Scholar]
- Mokrane, A.; Braham, A.C.; Cherki, B. UAV path planning based on dynamic programming algorithm on photogrammetric DEMs. In Proceedings of the 2020 International Conference on Electrical Engineering (ICEE). IEEE; 2020; pp. 1–5. [Google Scholar]
- Binney, J.; Sukhatme, G.S. Branch and bound for informative path planning. In Proceedings of the 2012 IEEE international conference on robotics and automation. IEEE; 2012; pp. 2147–2154. [Google Scholar]
- Samir, M.; Sharafeddine, S.; Assi, C.; Nguyen, T.; Ghrayeb, A. UAV trajectory planning for data collection from time-constrained IoT devices. IEEE Transactions on Wireless Communications 2019, 19, 34–46. [Google Scholar]
- Sun, Y.; Babu, P.; Palomar, D.P. Majorization-minimization algorithms in signal processing, communications, and machine learning. IEEE Transactions on Signal Processing 2016, 65, 794–816. [Google Scholar]
- Xu, Y.; Wang, J.; Wang, J.; Que, X.; Lu, D. Trajectory Design and Resource Allocation for UAV-Assisted Computation Offloading. In Proceedings of the 2024 IEEE 7th International Conference on Information Systems and Computer Aided Education (2024 ICISCAE). IEEE; 2024; pp. 963–967. [Google Scholar]
- Nguyen, M.; Ajib, W.; Zhu, W. Joint UAV Trajectory Control and Channel Assignment for UAV-Based Networks with Wireless Backhauling. In Proceedings of the 2024 IEEE 99th Vehicular Technology Conference (VTC2024-Spring). IEEE; 2024; pp. 1–5. [Google Scholar]
- Nguyen, K.K.; Duong, T.Q.; Do-Duy, T.; Claussen, H.; Hanzo, L. 3D UAV trajectory and data collection optimisation via deep reinforcement learning. IEEE Transactions on Communications 2022, 70, 2358–2371. [Google Scholar]
- Fu, S.; Tang, Y.; Wu, Y.; Zhang, N.; Gu, H.; Chen, C.; Liu, M. Energy-efficient UAV-enabled data collection via wireless charging: A reinforcement learning approach. IEEE Internet of Things Journal 2021, 8, 10209–10219. [Google Scholar]
- Kurunathan, H.; Li, K.; Ni, W.; Tovar, E.; Dressler, F. Deep reinforcement learning for persistent cruise control in UAV-aided data collection. In Proceedings of the 2021 IEEE 46th Conference on Local Computer Networks (LCN). IEEE; 2021; pp. 347–350. [Google Scholar]
- Oubbati, O.S.; Atiquzzaman, M.; Lim, H.; Rachedi, A.; Lakas, A. Synchronizing UAV teams for timely data collection and energy transfer by deep reinforcement learning. IEEE Transactions on Vehicular Technology 2022, 71, 6682–6697. [Google Scholar]
- Liao, Z.; Li, H.; Cai, W.; Zhong, Y.; Zhang, X. Phase Sensitivity-Based Fringe Angle Optimization in Telecentric Fringe Projection Profilometry. IEEE Transactions on Instrumentation and Measurement 2025, 74, 1–10. [Google Scholar]
- Gao, M.; Xu, G.; Song, Z.; Cheng, Y.; Niyato, D. Performance Analysis of Random 3D mmWave-Assisted UAV Communication System. IEEE Transactions on Vehicular Technology 2022, 73, 19169–19185. [Google Scholar]

| Priority | I | II | III | IV |
|---|---|---|---|---|
| I | 0.8 | 0.1 | 0.08 | 0.02 |
| II | 0 | 0.8 | 0.16 | 0.04 |
| III | 0 | 0 | 0.9 | 0.1 |
| IV | 0 | 0 | 0 | 1 |
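Each row of the table above sums to 1, so it can be read as a row-stochastic transition matrix over the four priority levels. The sketch below samples how a single target's priority might evolve under that reading. Treating rows as the current priority and columns as the next one, and applying one transition per time slot, are assumptions made for illustration. The function names are hypothetical.

```python
import random

PRIORITIES = ["I", "II", "III", "IV"]
# Rows: current priority; columns: next priority (values from the table above).
TRANSITION = [
    [0.8, 0.1, 0.08, 0.02],
    [0.0, 0.8, 0.16, 0.04],
    [0.0, 0.0, 0.9, 0.1],
    [0.0, 0.0, 0.0, 1.0],
]


def step_priority(current, rng):
    """Sample the next priority index from the row of the current one."""
    return rng.choices(range(len(PRIORITIES)), weights=TRANSITION[current])[0]


def simulate(start, n_slots, seed=0):
    """Return the priority indices visited over n_slots assumed time slots."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_slots):
        path.append(step_priority(path[-1], rng))
    return path


path = simulate(start=0, n_slots=20)
print([PRIORITIES[i] for i in path])
```

Because the matrix is upper triangular, a priority can only stay where it is or move toward IV, and IV is absorbing: once a target reaches it, it remains there.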
| Priority | I | II | III | IV |
|---|---|---|---|---|
| The amount of information | 1 | 2 | 3 | 4 |
| Algorithm | Parameter | Value |
|---|---|---|
| PPO | Learning Rate | 3 |
| PPO | Network Structure | 128 × 128 |
| PPO | Batch Size | 64 |
| PPO | Discount Factor | 0.99 |
| SAC | Learning Rate | 3 |
| SAC | Network Structure | 128 × 128 |
| SAC | Batch Size | 64 |
| SAC | Discount Factor | 0.99 |
| GA | Population Size | 200 |
| GA | Iterations | 1000 |
| Algorithm | Parameter | Value |
|---|---|---|
| DQN | Energy Consumption Penalty Coefficient | 1 |
| DQN | Learning Rate | 3 |
| DQN | Greed Coefficient | 0.1 |
| DQN | Size of Experience Buffer | 500000 |
| DQN | Parameters of soft updating | 0.005 |
| DQN | Batch Size | 256 |
| DQN | Discount Factor | 0.99 |
| DQN | Training Step | 10000000 |
| GA | Population Size | 200 |
| GA | Iterations | 1000 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).