SGW-DETR: A Spectral-Guided Graph-Structured Wavelet Transformer for UAV Infrared Object Detection under Degradation

Kaipeng Wang; Guanglin He; Yuzhe Fu; Zelong Chen; Hao Zhang

doi:10.20944/preprints202605.0956.v1

Submitted: 13 May 2026

Posted: 14 May 2026


Abstract

Infrared object detection from unmanned aerial vehicles (UAVs) is critically challenged by multi-type composite degradation—including noise, blur, and low contrast—which severely undermines feature discriminability and multi-scale target perception. This study proposes SGW-DETR (Spectral-Guided Graph-structured Wavelet Detection Transformer), a novel framework built upon RT-DETR, incorporating three synergistic modules across the backbone, neck, and encoder. FDSANet (Frequency Domain Spectral Awareness Network) replaces the conventional ResNet backbone, integrating the Multi-Scale Frequency Perception Module (MSFPM), Selective Channel Frequency Decomposition (SCFD), and Dynamic Kernel Spectral Modulation (DKSM) to achieve instance-level adaptive spectral feature extraction without degradation-type supervision. The Graph-Structured Fusion Network (GSFN) combines the Adaptive Semantic Fusion Module (ASFM) with the Graph Structure Perception Module (GSPM), employing Gaussian kernel soft membership and two-stage message passing to explicitly model spatial topological dependencies among object components. The Wavelet-guided Contrast Feature Aggregation module (WCFA) restructures the Attention-based Intra-scale Feature Interaction (AIFI) encoder via a Haar-based Frequency Decomposition Unit (HFDU), decomposing features into foreground-edge and background-thermal components and achieving hierarchical foreground–background decoupling through nested dual-path causal contrastive attention. A UAV infrared degradation dataset comprising 4,686 images spanning six degradation types with component-level annotations was constructed for evaluation. SGW-DETR achieves 75.2% mAP50, outperforming RT-DETR by 3.5%, while simultaneously reducing GFLOPs and parameter count by 16.8% and 9.9% at an inference speed of 85.5 FPS. 
Sustained performance gains on M3FD and IndraEye benchmarks further demonstrate the framework’s cross-domain generalization capability, offering practical value for UAV-based surveillance, search-and-rescue, and border monitoring under adverse imaging conditions.
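The abstract describes a Haar-based decomposition that separates features into a foreground-edge part and a background-thermal part, but it does not give the internals of the HFDU. As a rough illustration of the general idea only, the sketch below applies a standard single-level 2D Haar transform to one feature map and treats the low-frequency sub-band as smooth "background" content and the combined high-frequency sub-bands as "edge" content. The function names `haar_dwt2` and `split_components`, and the choice to merge the three detail sub-bands by magnitude, are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of an (H, W) array.

    H and W are assumed to be even. Returns four (H/2, W/2) sub-bands.
    """
    # The four pixels of every non-overlapping 2x2 block.
    a = x[0::2, 0::2]  # top-left
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right

    ll = (a + b + c + d) / 2.0  # low-pass in both directions (smooth content)
    lh = (a + b - c - d) / 2.0  # difference between rows (horizontal edges)
    hl = (a - b + c - d) / 2.0  # difference between columns (vertical edges)
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh


def split_components(x):
    """Hypothetical split into a smooth part and an edge-magnitude part."""
    ll, lh, hl, hh = haar_dwt2(x)
    background = ll
    edges = np.sqrt(lh ** 2 + hl ** 2 + hh ** 2)
    return background, edges


if __name__ == "__main__":
    # A 4x4 map with a vertical step between columns 0 and 1.
    feature = np.zeros((4, 4))
    feature[:, 1:] = 1.0
    background, edges = split_components(feature)
    print(background)
    print(edges)
```

In this toy input the edge response is nonzero only in the blocks that straddle the step, while the smooth sub-band carries the overall intensity level. A learned module would typically operate per channel on deep features and feed the two parts into separate attention paths, which this sketch does not attempt.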

Keywords: UAV infrared detection; composite image degradation; spectral feature learning; graph neural network; wavelet decomposition; transformer detection

Subject: Engineering - Electrical and Electronic Engineering

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
