Preprint
Article

This version is not peer-reviewed.

SGW-DETR: A Spectral-Guided Graph-Structured Wavelet Transformer for UAV Infrared Object Detection under Degradation

Submitted: 13 May 2026

Posted: 14 May 2026

Abstract
Infrared object detection from unmanned aerial vehicles (UAVs) is critically challenged by composite degradation of multiple types, including noise, blur, and low contrast, which severely undermines feature discriminability and multi-scale target perception. This study proposes SGW-DETR (Spectral-Guided Graph-structured Wavelet Detection Transformer), a novel framework built upon RT-DETR that incorporates three synergistic modules across the backbone, neck, and encoder. FDSANet (Frequency Domain Spectral Awareness Network) replaces the conventional ResNet backbone, integrating the Multi-Scale Frequency Perception Module (MSFPM), Selective Channel Frequency Decomposition (SCFD), and Dynamic Kernel Spectral Modulation (DKSM) to achieve instance-level adaptive spectral feature extraction without degradation-type supervision. The Graph-Structured Fusion Network (GSFN) combines the Adaptive Semantic Fusion Module (ASFM) with the Graph Structure Perception Module (GSPM), employing Gaussian kernel soft membership and two-stage message passing to explicitly model spatial topological dependencies among object components. The Wavelet-guided Contrast Feature Aggregation module (WCFA) restructures the Attention-based Intra-scale Feature Interaction (AIFI) encoder via a Haar-based Frequency Decomposition Unit (HFDU), which decomposes features into foreground-edge and background-thermal components, and achieves hierarchical foreground–background decoupling through nested dual-path causal contrastive attention. A UAV infrared degradation dataset comprising 4,686 images spanning six degradation types with component-level annotations was constructed for evaluation. SGW-DETR achieves 75.2% mAP50, outperforming RT-DETR by 3.5%, while simultaneously reducing GFLOPs and parameter count by 16.8% and 9.9%, respectively, at an inference speed of 85.5 FPS. Sustained performance gains on the M3FD and IndraEye benchmarks further demonstrate the framework's cross-domain generalization capability, offering practical value for UAV-based surveillance, search-and-rescue, and border monitoring under adverse imaging conditions.
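As an illustration of the kind of Haar-based decomposition the abstract refers to, the following minimal sketch computes a single-level 2D Haar transform of a small feature map in plain Python. It is not the authors' HFDU: the function name `haar_dwt2`, the sub-band naming, and the reading of the low-frequency band as a "background-thermal" component and the three detail bands as "foreground-edge" components are assumptions made here for illustration only.

```python
def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (illustrative sketch).

    x is a 2D list of numbers with even height and width.
    Returns four sub-bands (ll, lh, hl, hh), each half the size of x.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must both be even")

    ll, lh, hl, hh = [], [], [], []
    for i in range(0, h, 2):
        row_ll, row_lh, row_hl, row_hh = [], [], [], []
        for j in range(0, w, 2):
            # Each non-overlapping 2x2 block is laid out as [[a, b], [c, d]].
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            row_ll.append((a + b + c + d) / 2)  # local average (low frequency)
            row_lh.append((a + b - c - d) / 2)  # top row minus bottom row
            row_hl.append((a - b + c - d) / 2)  # left column minus right column
            row_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(row_ll)
        lh.append(row_lh)
        hl.append(row_hl)
        hh.append(row_hh)
    return ll, lh, hl, hh


# Hypothetical 4x4 "thermal" patch: a flat background with one hot pixel.
patch = [
    [1, 1, 1, 1],
    [1, 9, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
low, det_rows, det_cols, det_diag = haar_dwt2(patch)
# Assumed interpretation: `low` carries the smooth background, while the
# three detail bands are nonzero only in the block containing the hot pixel.
```

With the 1/2 scaling used here the transform is orthonormal, so the total squared energy of the four sub-bands equals that of the input. How the actual module weights, fuses, or attends over these bands is not specified in the abstract.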
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

© 2026 MDPI (Basel, Switzerland) unless otherwise stated