Submitted:
23 December 2025
Posted:
24 December 2025
Abstract
Unmanned Aerial Vehicles (UAVs) are essential tools for high-resolution urban remote sensing; however, their operational efficiency is constrained by the Size, Weight, and Power (SWaP) limits inherent to aerial platforms. High-end sensors (e.g., LiDAR) provide dense data but reduce flight endurance and require extensive post-processing, delaying actionable intelligence. To maximize data utility under these constraints through cost-effective means, this study evaluates an adaptive multi-modal monitoring framework based on high-resolution RGB imagery. Using a DJI Matrice 300 RTK, we assessed the performance of advanced RGB-based AI architectures across urban zones of varying density. We stress-tested end-to-end deep learning models (Mask R-CNN, YOLOv8-seg) and a hybrid approach (U-Net++ fused with RGB-derived Canopy Height Models) to determine their viability as replacements for active sensors in precision analysis. Results indicate that the RGB-based hybrid model achieved the highest Semantic IoU (0.551), demonstrating that optical imagery combined with deep learning can substitute for heavy active sensors in area-based estimation tasks. Crucially for autonomous UAV operations, YOLOv8-seg achieved inference speeds of 3.89 seconds per tile, approximately 1.86 times faster than Mask R-CNN, validating its suitability for onboard inference on embedded systems. This study establishes a protocol for high-precision analysis using standard RGB sensors, offering a strategic pathway for deploying scalable, consumer-grade UAV fleets in complex urban environments.
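The Semantic IoU reported above is conventionally computed per class as the ratio of intersection to union between predicted and reference masks, averaged over classes. A minimal sketch of this metric, assuming integer-labeled segmentation masks (the array shapes and class count here are illustrative, not from the study):

```python
import numpy as np

def semantic_iou(pred, gt, num_classes):
    """Mean intersection-over-union across classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        p = pred == c
        g = gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks; skip it
        inter = np.logical_and(p, g).sum()
        ious.append(inter / union)
    return float(np.mean(ious))

# Toy 2x2 example: class 0 IoU = 1/2, class 1 IoU = 2/3
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(round(semantic_iou(pred, gt, 2), 4))  # → 0.5833
```

Averaging only over classes that actually appear avoids rewarding trivially empty classes, which matters when tiles from sparse urban zones contain few categories.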
