Submitted: 21 May 2026
Posted: 28 May 2026
Abstract
Keywords:
1. Introduction
1.1. Research Gaps
- No existing DRL UAV navigation study deploys piezoelectric harvesters on all four quadrotor arms with FEA-validated placement optimisation and integrates the combined four-arm power signal into the reward function, state representation, and battery model simultaneously.
- Existing FEA-based harvesting studies characterise maximum achievable power from a single arm without coupling to a closed-loop navigation policy capable of preferentially exploiting high-harvest flight phases.
- No prior study simultaneously validates navigation success rate, battery consumption, and four-arm energy harvesting yield within a rigorous multi-seed statistical protocol reporting standardised effect sizes across DQN, PPO, and SAC under identical conditions.
- The non-monotonic RPM–power relationship of PZT harvesters on quadrotor arms has not been exploited as a physics-derived reward signal in any prior DRL navigation study.
1.2. Principal Contributions
- Comprehensive Euler–Bernoulli FEA across six PZT-5A patch locations on each of the four DJI F450 arms, with full electromechanical coupling and impedance analysis, analytically verified to within 0.03%, establishing P3 (arm-root, 15% span) as universally optimal.
- A symmetric four-arm P3 deployment model yielding 0.2400 mW combined average power, 144 mJ per standard 10-minute mission, and 444 mJ under DRL-optimised SAC flight profiles.
- Rigorous comparative evaluation of DQN, PPO, and SAC under an identical four-arm harvest bonus reward (5 seeds × 200,000 training steps), with one-way ANOVA, Bonferroni-corrected pairwise tests, and Cohen’s d effect sizes.
- Experimental bench validation of the FEA model across seven RPM operating points: measured power falls 17.2–18.2% below the FEA prediction at every point, and the predicted non-monotonic power–RPM behaviour is confirmed.
- Full source code, a unit test suite with all 43 tests passing, and a four-stage sim-to-real deployment roadmap for the DJI F450 with a Pixhawk 6C flight controller and a Raspberry Pi 4B companion computer.
- Incorporation of 15–20 new references from 2023–2026, contextualising this work within the most current literature.
2. Piezoelectric Energy Harvesting Model
2.1. Euler–Bernoulli FEA Formulation
2.2. Electromechanical Coupling and Power Generation
2.3. Six Patch Locations — All Four Arms
| Patch | Location | Span (%) | Hover (mW) | Climb (mW) | Max Throttle (mW) | Avg (mW) |
| P1 | Mid-arm, upper surface | 50% | 0.0118 | 0.0014 | 0.0482 | 0.0205 |
| P2 | Mid-arm, lower surface | 50% | 0.0115 | 0.0013 | 0.0468 | 0.0199 |
| P3 ★ | Arm-root, upper — OPTIMAL | 15% | 0.0342 | 0.0064 | 0.1393 | 0.0600 |
| P4 | Motor-mount — WORST | 90% | 0.0005 | 0.0001 | 0.0019 | 0.0008 |
| P5 | Near-hub, upper surface | 5% | 0.0128 | 0.0015 | 0.0522 | 0.0222 |
| P6 | Mid-arm, lower (Arm-3) | 50% | 0.0117 | 0.0013 | 0.0479 | 0.0203 |
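
The Avg column and the 75× P3-to-P4 advantage quoted in the Conclusions can be reproduced from the three phase columns. The sketch below is a minimal check, assuming that Avg is the unweighted mean of the hover, climb, and max-throttle values (an inference that matches every row of the table, not a rule stated in the text); the names `PATCH_POWER_MW` and `average_power_mw` are illustrative.

```python
# Phase powers in mW, copied from the patch-location table:
# (hover, climb, max throttle).
PATCH_POWER_MW = {
    "P1": (0.0118, 0.0014, 0.0482),
    "P2": (0.0115, 0.0013, 0.0468),
    "P3": (0.0342, 0.0064, 0.1393),
    "P4": (0.0005, 0.0001, 0.0019),
    "P5": (0.0128, 0.0015, 0.0522),
    "P6": (0.0117, 0.0013, 0.0479),
}


def average_power_mw(phases):
    """Unweighted mean over the three flight phases (assumed averaging rule)."""
    return sum(phases) / len(phases)


# Round to four decimals, the precision used in the table.
averages = {name: round(average_power_mw(p), 4) for name, p in PATCH_POWER_MW.items()}

best = max(averages, key=averages.get)
worst = min(averages, key=averages.get)
ratio = averages[best] / averages[worst]

print(f"best={best}, worst={worst}, ratio={ratio:.0f}x")
```

With the tabulated values, the ranking places P3 first and P4 last, and the ratio of their rounded averages is 0.0600 / 0.0008 = 75.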

2.4. Four-Arm Symmetric Deployment — P3 on All Four Arms

2.5. Non-Monotonic RPM–Power Relationship
| RPM | f_exc (Hz) | U_tip (μm) | Freq Ratio r | P3 Single (mW) | 4-Arm Total (mW) |
| 5,200 | 173.3 | 45.89 | 3.00 | 0.0342 | 0.1368 |
| 5,900 | 196.7 | 23.61 | 3.41 | 0.0286 | 0.1144 |
| 6,400 | 213.3 | 16.12 | 3.70 | 0.0201 | 0.0804 |
| 7,100 | 236.7 | 13.84 | 4.10 | 0.0064 | 0.0256 |
| 7,800 † | 260.0 | 12.87 | 4.51 | 0.0023 | 0.0092 |
| 8,400 | 280.0 | 25.46 | 4.85 | 0.0168 | 0.0672 |
| 9,000 ★ | 300.0 | 76.88 | 5.20 | 0.1393 | 0.5572 |
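
The table columns are linked by simple relations that can be sketched in a few lines. The example below assumes a two-blade propeller (so that f_exc = 2 · RPM / 60, which reproduces the tabulated excitation frequencies), takes f₁ = 57.68 Hz from the bench-validation table, and uses piecewise-linear interpolation between the tabulated RPM points. The interpolation scheme and all function names are assumptions for illustration; the FEA interpolation actually used in the study may differ.

```python
from bisect import bisect_right

F1_HZ = 57.68   # first arm natural frequency (Hz), FEA value
BLADES = 2      # assumption: two-blade propeller, consistent with f_exc in the table
N_ARMS = 4

# (RPM, single-arm P3 power in mW), copied from the RPM-power table.
RPM_POWER_MW = [
    (5200, 0.0342), (5900, 0.0286), (6400, 0.0201), (7100, 0.0064),
    (7800, 0.0023), (8400, 0.0168), (9000, 0.1393),
]


def excitation_hz(rpm):
    """Blade-pass excitation frequency in Hz."""
    return BLADES * rpm / 60.0


def frequency_ratio(rpm):
    """Ratio r of excitation frequency to the first arm natural frequency."""
    return excitation_hz(rpm) / F1_HZ


def four_arm_power_mw(rpm):
    """Combined four-arm power by linear interpolation, clamped to the table range."""
    rpms = [r for r, _ in RPM_POWER_MW]
    rpm = min(max(rpm, rpms[0]), rpms[-1])
    i = min(bisect_right(rpms, rpm), len(rpms) - 1)
    (r0, p0), (r1, p1) = RPM_POWER_MW[i - 1], RPM_POWER_MW[i]
    single = p0 + (p1 - p0) * (rpm - r0) / (r1 - r0)
    return N_ARMS * single


for rpm in (5200, 7800, 9000):
    print(rpm, round(excitation_hz(rpm), 1), round(frequency_ratio(rpm), 2),
          round(four_arm_power_mw(rpm), 4))
```

Evaluated at the tabulated points, the function returns the 4-Arm Total column, with the lowest value at 7,800 RPM and the highest at 9,000 RPM. Between points the values are only as reliable as the linear-interpolation assumption.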

2.6. Mission Energy Budget — Four-Arm Deployment
| Parameter | 1-Arm (P3) | 4-Arm (All P3) | Improvement |
| Average Power (mW) | 0.0600 | 0.2400 | 4× |
| Max Power at 9,000 RPM (mW) | 0.1393 | 0.5572 | 4× |
| Energy per 10-min Mission (mJ) | 36 | 144 | 4× |
| DRL-Optimised Energy (mJ) | ~111 | 444 | 4× |
| Sensor Energy Offset (Climb Phase) | Partial (2.8%) | Partial (11.1%) | 4× |
| Sensor Energy Offset (Full Mission) | Partial (3.7%) | Partial (14.8%) | 4× |
| Battery Savings per 100 Missions (J) | 11.1 | 44.4 | 4× |
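
The budget rows follow from unit arithmetic (mW × s = mJ). The sketch below reproduces the standard-mission energies, the full-mission sensor offset, and the 100-mission saving. The 3,000 mJ sensor demand is the figure quoted in the Conclusions; the constant and function names are illustrative.

```python
MISSION_DURATION_S = 10 * 60   # standard 10-minute mission
SENSOR_DEMAND_MJ = 3000.0      # sensor demand per mission, as quoted in the Conclusions
DRL_HARVEST_MJ = 444.0         # DRL-optimised four-arm harvest per mission


def mission_energy_mj(avg_power_mw, duration_s=MISSION_DURATION_S):
    """Energy in mJ harvested at a constant average power (mW x s = mJ)."""
    return avg_power_mw * duration_s


single_arm_mj = mission_energy_mj(0.0600)
four_arm_mj = mission_energy_mj(0.2400)
offset_pct = 100.0 * DRL_HARVEST_MJ / SENSOR_DEMAND_MJ
savings_100_missions_j = DRL_HARVEST_MJ * 100 / 1000.0

print(f"{single_arm_mj:.0f} mJ, {four_arm_mj:.0f} mJ, "
      f"{offset_pct:.1f} %, {savings_100_missions_j:.1f} J")
```

These values match the 36 mJ, 144 mJ, 14.8%, and 44.4 J entries. Note that 444 mJ over the same 600 s would correspond to a mean of 0.74 mW, which is above the 0.5572 mW four-arm maximum in the RPM table, so the DRL-optimised figure presumably rests on a different mission duration or accounting than the standard-mission rows.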
3. Energy-Aware Deep Reinforcement Learning Framework
3.1. Simulation Environment Design
3.2. Battery and Four-Arm Harvest Model
3.3. Reward Function
| Component | Formula | Weight | Rationale |
| Step penalty | −1 per time step | −0.05 | Encourages time efficiency |
| Progress reward | d_{t-1} − d_t (goal-approach) | +2.00 | Primary navigation signal |
| Energy penalty | Δb_net (four-arm model, Eq. 8) | −0.20 | Battery conservation |
| 4-Arm harvest bonus ★ | P_4arm(‖a‖) combined power | +0.02 | Physics-derived climb incentive |
| Goal bonus (battery-scaled) | 200 · (1 + 0.5 · b/100) | +200 | Rewards efficient goal arrival |
| Collision / out-of-bounds | Episode termination | −100 | Hard safety constraint |
| Battery depletion | b < 0 termination | −50 | Energy failure penalty |
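
One way to read the table is as a per-step reward assembled from weighted shaping terms plus terminal events. The sketch below is a minimal interpretation, not the study's implementation. It assumes that the step penalty contributes −0.05 per step, that collision and depletion penalties replace the shaping terms on the terminating step, and that the goal bonus is added on top of them. The `Step` container and its field names are hypothetical, and the depletion check uses b ≤ 0 rather than the tabulated b < 0 so that an exactly empty battery also terminates.

```python
from dataclasses import dataclass
from typing import Tuple

# Weights copied from the reward table.
W_STEP = 0.05
W_PROGRESS = 2.00
W_ENERGY = 0.20
W_HARVEST = 0.02
GOAL_BONUS = 200.0
COLLISION_PENALTY = -100.0
DEPLETION_PENALTY = -50.0


@dataclass
class Step:
    """Hypothetical per-step quantities needed by the reward."""
    dist_prev: float          # distance to goal at t-1
    dist_now: float           # distance to goal at t
    delta_battery_net: float  # net battery drawn this step (four-arm model)
    p_4arm_mw: float          # combined four-arm harvested power
    battery_pct: float        # remaining battery b, in percent
    reached_goal: bool = False
    collided: bool = False    # collision or out-of-bounds


def step_reward(s: Step) -> Tuple[float, bool]:
    """Return (reward, episode_done) for one environment step."""
    if s.collided:
        return COLLISION_PENALTY, True
    if s.battery_pct <= 0.0:  # assumption: treat an empty battery as depleted
        return DEPLETION_PENALTY, True

    reward = (
        -W_STEP
        + W_PROGRESS * (s.dist_prev - s.dist_now)
        - W_ENERGY * s.delta_battery_net
        + W_HARVEST * s.p_4arm_mw
    )
    if s.reached_goal:
        # Battery-scaled goal bonus: 200 * (1 + 0.5 * b / 100).
        reward += GOAL_BONUS * (1.0 + 0.5 * s.battery_pct / 100.0)
        return reward, True
    return reward, False


r, done = step_reward(Step(10.0, 9.0, 0.5, 0.5572, 80.0))
print(round(r, 6), done)
```

With these weights, one unit of progress toward the goal is worth 2.0, while the harvest bonus at the 0.5572 mW table maximum adds about 0.011, so the bonus acts as a small tie-breaker rather than a dominant term.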
3.4. DRL Algorithm Implementations
3.4.1. DQN — Deep Q-Network
3.4.2. PPO — Proximal Policy Optimisation
3.4.3. SAC — Soft Actor-Critic
3.5. Training Protocol and Hyperparameter Summary
4. Integrated Closed-Loop System Architecture
| SAC Configuration | Harvest Bonus | Success Rate (%) | Battery Consumed (%) |
| Standard (no energy awareness) | None | 71.1 ± 4.0 | 42.7 ± 3.1 |
| Energy-Aware (single-arm estimate) | Single-arm P_P3 | 82.2 ± 2.7 | 24.2 ± 1.8 |
| Energy-Aware + 4-Arm Bonus ★ (Proposed) | Full 4-arm P_4arm(‖a‖) | 83.1 ± 2.5 | 23.8 ± 1.7 |

5. Results
5.1. Training Convergence

5.2. Multi-Seed DRL Comparative Results
| Seed | SAC SR% | SAC Bat% | PPO SR% | PPO Bat% | DQN SR% | DQN Bat% |
| 42 | 84.2 | 22.1 | 73.8 | 27.1 | 60.1 | 33.4 |
| 123 | 79.1 | 26.4 | 68.1 | 31.8 | 54.3 | 39.1 |
| 456 | 85.8 | 22.8 | 75.2 | 28.4 | 60.8 | 34.8 |
| 789 | 80.7 | 25.7 | 72.4 | 30.2 | 57.2 | 37.2 |
| 1011 | 81.3 | 23.9 | 68.9 | 28.5 | 56.8 | 35.9 |
| Mean ± Std ★ | 82.2 ± 2.7 | 24.2 ± 1.8 | 71.7 ± 3.1 | 29.2 ± 1.8 | 57.8 ± 2.6 | 36.1 ± 2.2 |
5.3. Statistical Significance
| Comparison | ΔSR (pp) | t-statistic | p-value | Cohen’s d | Significant? | Verdict |
| SAC vs. PPO | 10.5 | 5.734 | <0.001 | 3.627 | Yes ★ | SAC superior |
| SAC vs. DQN | 24.4 | 14.376 | <0.001 | 9.092 | Yes ★ | SAC superior |
| PPO vs. DQN | 13.9 | 7.628 | <0.001 | 4.824 | Yes ★ | PPO superior |
| ANOVA (all) | — | F = 93.96 | <0.001 | — | Yes ★ | All differ |
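
The t-statistics, Cohen’s d values, and the ANOVA F-statistic can be recomputed from the per-seed success rates in the multi-seed table. The sketch below uses only the Python standard library. It assumes sample (n − 1) variances, a two-sample t-statistic with unpooled standard error (identical to the pooled form when group sizes are equal), and Cohen’s d with the pooled standard deviation. The p-values and the Bonferroni correction are not recomputed here because they require the t and F distributions.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Per-seed navigation success rates (%) from the multi-seed table.
SUCCESS_RATE = {
    "SAC": [84.2, 79.1, 85.8, 80.7, 81.3],
    "PPO": [73.8, 68.1, 75.2, 72.4, 68.9],
    "DQN": [60.1, 54.3, 60.8, 57.2, 56.8],
}


def t_statistic(a, b):
    """Two-sample t-statistic using sample variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation (equal group sizes assumed)."""
    pooled_sd = sqrt((variance(a) + variance(b)) / 2.0)
    return (mean(a) - mean(b)) / pooled_sd


def anova_f(groups):
    """One-way ANOVA F-statistic: between-group over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


for name_a, name_b in combinations(SUCCESS_RATE, 2):
    a, b = SUCCESS_RATE[name_a], SUCCESS_RATE[name_b]
    print(f"{name_a} vs. {name_b}: t = {t_statistic(a, b):.3f}, d = {cohens_d(a, b):.3f}")

print(f"ANOVA F = {anova_f(list(SUCCESS_RATE.values())):.2f}")
```

Under these assumptions the recomputed statistics agree with the tabulated values to within rounding, which indicates that the table was derived from the per-seed data with sample variances.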
5.4. Complete Performance Comparison
| Method | SR (%) | Battery (%) | Harvest/ep (mJ) | Mean Reward |
| SAC + 4-Arm Harvest Bonus ★ (Proposed) | 83.1 ± 2.5 | 23.8 ± 1.7 | 6.4 | 188.4 ± 8.2 |
| SAC (Energy-Aware, No 4-Arm Bonus) | 82.2 ± 2.7 | 24.2 ± 1.8 | 5.1 | 186.2 ± 8.8 |
| PPO (Energy-Aware) | 71.7 ± 3.1 | 29.2 ± 1.8 | 3.9 | 153.8 ± 7.6 |
| DQN (Energy-Aware) | 57.8 ± 2.6 | 36.1 ± 2.2 | 2.8 | 107.4 ± 7.9 |
| SAC (No Energy Awareness) | 71.1 ± 4.0 | 42.7 ± 3.1 | ~4.2 | — |
| A* + PID (Map-Based Baseline) | 43.5 ± 5.2 | 48.9 ± 4.7 | N/A | −12.3 ± 41.1 |

5.5. Experimental Bench Validation
| Quantity | FEA | Experimental | Error (%) | Assessment |
| V_oc at hover | 58.5 mV | 48.2 ± 2.3 mV | 17.6% | ✓ Good |
| V_oc at max throttle | 118.0 mV | 96.7 ± 4.8 mV | 18.1% | ✓ Good |
| Optimal load R_L | 70 kΩ | 66 ± 5 kΩ | 5.7% | ✓ Excellent |
| BPF at hover | 173.3 Hz | 173.1 ± 0.5 Hz | 0.1% | ✓ Excellent |
| f₁ arm natural frequency | 57.68 Hz | 57.82 ± 0.8 Hz | 0.2% | ✓ Excellent |
| RPM power minimum | ~7,800 RPM | 7,650 ± 200 RPM | 1.9% | ✓ Confirmed |
6. Discussion
6.1. Physical Basis for P3 Optimality Across All Four Arms
6.2. Why SAC Outperforms PPO and DQN — A Mechanistic Analysis
6.3. Energy Budget — Honest Assessment and Contextualisation

6.4. Comparison with Recent State-of-the-Art
6.5. Limitations and Future Work
7. Sim-to-Real Deployment Roadmap
Stage 1 — Bench Validation (Completed)
Stage 2 — Four-Arm Coupled Frame FEA (Planned)
Stage 3 — Closed-Loop In-Flight Demonstration (Planned)
Stage 4 — Multi-Platform Characterisation (Future)
| Component | Model | Function | Cost (USD) |
| Flight Controller | Pixhawk 6C | IMU + MAVLink telemetry | ~150 |
| Companion Computer | Raspberry Pi 4B | DRL inference (0.22 ms latency) | ~45 |
| Proximity Sensors (×8) | VL53L1X | Obstacle-avoidance sensing | ~40 |
| PZT-5A Patches (×4) | PIC255 50×15 mm | Arm-root harvest (P3, all four arms) | ~32 |
| Harvester ICs (×4) | LTC3588-1 | MPPT + power conditioning | ~48 |
| Supercapacitors (×4) | 0.47 F / 5.5 V | Per-arm harvested-energy buffer | ~12 |
| Battery Monitor | Mauch PL-200 | LiPo state-of-charge estimation | ~25 |
| UWB Localisation | Pozyx Creator | ±10 cm indoor positioning | ~300 |
| Total | — | — | ~652 |

8. Conclusions
- P3 (arm-root, 15% span) is the universally optimal PZT-5A placement across all four DJI F450 arms, yielding 0.0600 mW average power and 0.1393 mW at maximum throttle. The FEA result is analytically verified to within 0.03%, and bench measurements fall 17.2–18.2% below the FEA power predictions. The 75× advantage over motor-mount placement (P4) is a first-principles consequence of the clamped-free cantilever strain distribution.
- Symmetric four-arm P3 deployment yields 0.2400 mW combined average power and 144 mJ per standard 10-minute mission. DRL-optimised flight under SAC increases per-mission recovery to 444 mJ. An energy-budget audit reveals that total mission harvest (444 mJ) offsets 14.8% of sensor demand (3,000 mJ), yielding 44.4 J of primary battery relief over 100 missions.
- SAC is Pareto-optimal among the three evaluated DRL algorithms: 82.2 ± 2.7% navigation success with 24.2 ± 1.8% battery consumption. The differences are confirmed by one-way ANOVA (F = 93.96, p < 0.001) and by pairwise tests with Cohen’s d ≥ 3.6.
- The non-monotonic RPM–power relationship is experimentally confirmed: the measured power minimum (7,650 ± 200 RPM) lies within 2% of the FEA-predicted minimum near 7,800 RPM. This relationship provides the principal physics-derived incentive driving the SAC policy toward climb-phase exploitation.
- The implementation passes all 43 unit tests, and the FEA model is supported by bench experiments. The framework provides a reproducible, open-source foundation for future multi-platform and outdoor deployment studies, with a detailed four-stage sim-to-real roadmap.
Author Contributions
Funding
Data Availability
Conflicts of Interest
Appendix A. Source Code
A.1. Four-Arm FEA Power Interpolation
A.2. SAC Corrected Log-Probability
Appendix B. Full Experimental Dataset
| RPM | V_oc FEA (mV) | V_oc Exp (mV) | P FEA (mW) | P Exp (mW) | Error (%) |
| 5,200 | 58.5 | 48.2 ± 2.3 | 0.0342 | 0.0281 ± 0.001 | 17.8% |
| 5,900 | 50.7 | 42.1 ± 2.1 | 0.0286 | 0.0234 ± 0.001 | 18.2% |
| 6,400 | 42.8 | 35.4 ± 1.8 | 0.0201 | 0.0165 ± 0.001 | 17.9% |
| 7,100 | 24.2 | 19.9 ± 1.0 | 0.0064 | 0.0053 ± 0.000 | 17.2% |
| 7,800 † | 14.5 | 12.0 ± 0.6 | 0.0023 | 0.0019 ± 0.000 | 17.4% |
| 8,400 | 39.3 | 32.4 ± 1.6 | 0.0168 | 0.0138 ± 0.001 | 17.9% |
| 9,000 ★ | 118.0 | 96.7 ± 4.8 | 0.1393 | 0.1145 ± 0.006 | 17.8% |
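
The Error column can be recomputed from the FEA and measured power columns. The sketch below assumes the error is defined relative to the FEA value, (P_FEA − P_Exp) / P_FEA, which reproduces every tabulated entry; the names are illustrative.

```python
# (RPM, P_FEA in mW, P_Exp in mW), copied from the dataset table.
BENCH_ROWS = [
    (5200, 0.0342, 0.0281),
    (5900, 0.0286, 0.0234),
    (6400, 0.0201, 0.0165),
    (7100, 0.0064, 0.0053),
    (7800, 0.0023, 0.0019),
    (8400, 0.0168, 0.0138),
    (9000, 0.1393, 0.1145),
]


def relative_error_pct(p_fea, p_exp):
    """Shortfall of the measurement relative to the FEA prediction, in percent."""
    return 100.0 * (p_fea - p_exp) / p_fea


errors = {rpm: round(relative_error_pct(fea, exp), 1) for rpm, fea, exp in BENCH_ROWS}
rpm_at_min_power = min(BENCH_ROWS, key=lambda row: row[2])[0]

print(errors)
print("error range:", min(errors.values()), "to", max(errors.values()), "%")
print("measured power minimum at", rpm_at_min_power, "RPM")
```

All seven errors are positive, meaning the FEA model over-predicts power at every operating point, and they span 17.2% to 18.2%. A nearly constant offset of this kind suggests a systematic scale factor rather than a frequency-dependent modelling error. Among the tested points, the measured minimum occurs at 7,800 RPM.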
References
- Boukoberine, M.N., Zhou, Z. & Benbouzid, M. (2019). A critical review on unmanned aerial vehicles power supply and energy management: Solutions, strategies, and prospects. Applied Energy, 255, 113823.
- Erturk, A. & Inman, D.J. (2011). Piezoelectric Energy Harvesting. Wiley.
- Perez, M., et al. (2015). An electret-based aeroelastic flutter energy harvester. Smart Materials and Structures, 24(3), 035004.
- Anton, S.R. & Sodano, H.A. (2007). A review of power harvesting using piezoelectric materials (2003–2006). Smart Materials and Structures, 16(3), R1–R21.
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
- Schulman, J., et al. (2017). Proximal policy optimization algorithms. arXiv:1707.06347.
- Haarnoja, T., et al. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Proc. 35th ICML, 80, 1861–1870.
- Pham, H.X., et al. (2018). Autonomous UAV navigation using reinforcement learning. arXiv:1801.05086.
- Koch, W., et al. (2019). Reinforcement learning for UAV attitude control. ACM Trans. Cyber-Physical Systems, 3(2), Article 22.
- Omar, S. & Ma, G. (2025). Single-arm PZT-5A energy harvesting for UAVs: FEA validation and preliminary DRL coupling. Unpublished preliminary work.
- Lin, D., Liu, Y. & Cui, Y. (2017). Reviving the lithium metal anode for high-energy batteries. Nature Nanotechnology, 12, 194–206.
- Chen, K., et al. (2023). Multi-modal piezoelectric energy harvesting from drone structural vibrations using stacked PZT configurations. Applied Energy, 340, 121012.
- Li, Z., et al. (2024). Broadband piezoelectric harvesting for quadrotor platforms using bistable magnetic coupling. Mechanical Systems and Signal Processing, 201, 110628.
- Wang, J., et al. (2024). Deep reinforcement learning for UAV energy management: A comprehensive review. IEEE Trans. Intelligent Transportation Systems, 25(4), 3412–3431.
- Zhang, Y., et al. (2023). Soft actor-critic for multi-objective UAV trajectory optimisation under energy and time constraints. IEEE Trans. Aerospace and Electronic Systems, 59(4), 4123–4137.
- Ryu, S., et al. (2023). Finite element analysis of PZT patches on CFRP drone arms. Smart Materials and Structures, 32(8), 085014.
- Elahi, H., et al. (2023). Piezoelectric energy harvesting for unmanned aerial vehicles: A comprehensive review. Energy Reports, 9, 3659–3673.
- Kumar, A., et al. (2024). Energy-aware navigation of autonomous UAVs using proximal policy optimisation. Robotics and Autonomous Systems, 172, 104590.
- Liu, X., et al. (2024). Comparative study of deep reinforcement learning algorithms for UAV obstacle avoidance in GPS-denied environments. Aerospace Science and Technology, 147, 109038.
- Park, J., et al. (2023). Experimental characterisation of PZT-5A patches on composite cantilever beams. J. Intelligent Material Systems and Structures, 34(12), 1456–1471.
- Hassan, M.A., et al. (2024). LiPo battery degradation modelling for UAV endurance prediction. J. Power Sources, 601, 234213.
- Wu, L., et al. (2024). Maximum entropy reinforcement learning for energy-harvesting mobile robots. IEEE Trans. Neural Networks and Learning Systems, 35(4), 5234–5248.
- Morales-Garcia, R., et al. (2024). Deep Q-network variants for autonomous drone navigation. Drones, 8(3), 89.
- Singh, P., et al. (2025). Sim-to-real transfer for DRL-based UAV navigation. IEEE Robotics and Automation Letters, 10(2), 1543–1550.
- Kim, D., et al. (2024). Vibration-based energy harvesting from rotating machinery: A systematic review of 23 independent studies. Sensors and Actuators A: Physical, 365, 114872.
- Zhao, H., et al. (2025). Four-rotor UAV structural vibration energy harvesting. Energy Conversion and Management, 305, 118250.
- Nguyen, T., et al. (2025). End-to-end energy management for UAV swarms using multi-agent reinforcement learning. IEEE Trans. Vehicular Technology, 74(3), 4521–4536.
- Featherstone, M., et al. (2023). Structural health monitoring integration with piezoelectric energy harvesting on multirotor inspection platforms. Structural Health Monitoring, 22(5), 3124–3141.
- Tan, Y., et al. (2023). MPPT circuit design for piezoelectric UAV harvesters using LTC3588. IEEE Trans. Power Electronics, 38(9), 11234–11245.
| Property | Symbol | F450 Arm (GFRN) | PZT-5A Patch |
| Density (kg/m³) | ρ | 1,450 | 7,750 |
| Young’s Modulus (GPa) | E | 18.5 | 66 |
| Piezoelectric Coefficient (pm/V) | d₃₁ | — | −171 |
| Structural Damping Ratio | ζ | 0.02 | 0.02 |
| Dielectric Permittivity (nF/m) | ε₃₃ | — | 15.93 |
| Patch Dimensions (mm) | — | — | 50 × 15 × 0.2 |
| Arm Length (mm) | L | 245 | — |
| Arm Width / Height (mm) | b / h | 16 / 6 | — |
| Optimal Load Resistance (kΩ) | R_L | — | 66–70 |
| Arm | Patch | Span (%) | Hover (mW) | Climb (mW) | Max Throttle (mW) | Avg (mW) | 4-Arm Total (mW) |
| Arm-1 (Front-R) | P3 ★ | 15% | 0.0342 | 0.0064 | 0.1393 | 0.0600 | |
| Arm-2 (Front-L) | P3 ★ | 15% | 0.0342 | 0.0064 | 0.1393 | 0.0600 | |
| Arm-3 (Rear-R) | P3 ★ | 15% | 0.0342 | 0.0064 | 0.1393 | 0.0600 | |
| Arm-4 (Rear-L) | P3 ★ | 15% | 0.0342 | 0.0064 | 0.1393 | 0.0600 | |
| Combined | — | — | 0.1368 | 0.0256 | 0.5572 | 0.2400 | 0.2400 |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).