Infrared object detection from unmanned aerial vehicles (UAVs) is hampered by composite degradation of multiple types, including noise, blur, and low contrast, which undermines feature discriminability and multi-scale target perception. This study proposes SGW-DETR (Spectral-Guided Graph-structured Wavelet Detection Transformer), a framework built upon RT-DETR that incorporates three complementary modules across the backbone, neck, and encoder. FDSANet (Frequency Domain Spectral Awareness Network) replaces the conventional ResNet backbone and integrates the Multi-Scale Frequency Perception Module (MSFPM), Selective Channel Frequency Decomposition (SCFD), and Dynamic Kernel Spectral Modulation (DKSM) to achieve instance-level adaptive spectral feature extraction without degradation-type supervision. The Graph-Structured Fusion Network (GSFN) combines the Adaptive Semantic Fusion Module (ASFM) with the Graph Structure Perception Module (GSPM), employing Gaussian kernel soft membership and two-stage message passing to explicitly model spatial topological dependencies among object components. The Wavelet-guided Contrast Feature Aggregation module (WCFA) restructures the Attention-based Intra-scale Feature Interaction (AIFI) encoder via a Haar-based Frequency Decomposition Unit (HFDU), which decomposes features into foreground-edge and background-thermal components; nested dual-path causal contrastive attention then provides hierarchical foreground–background decoupling. For evaluation, a UAV infrared degradation dataset was constructed, comprising 4,686 images that span six degradation types and carry component-level annotations. SGW-DETR achieves 75.2% mAP50, outperforming RT-DETR by 3.5%, while reducing GFLOPs and parameter count by 16.8% and 9.9%, respectively, at an inference speed of 85.5 FPS. Consistent performance gains on the M3FD and IndraEye benchmarks further demonstrate the framework’s cross-domain generalization capability, offering practical value for UAV-based surveillance, search-and-rescue, and border monitoring under adverse imaging conditions.
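
To make the Haar-based decomposition idea concrete, the following is a minimal sketch of a single-level 2D Haar transform on one feature map, written in plain Python. It is not the HFDU implementation: the function names `haar_dwt2` and `split_components` are hypothetical, and the grouping of the low-frequency band as a "background-thermal" component and of the three detail bands as a "foreground-edge" component is an assumption made here for illustration only.

```python
import math


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform of a 2D list (even height and width).

    Returns four half-resolution maps: the low-frequency average, the
    difference across columns, the difference across rows, and the diagonal
    difference of every non-overlapping 2x2 block.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    low, col_diff, row_diff, diag = [], [], [], []
    for i in range(0, h, 2):
        r_low, r_col, r_row, r_diag = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_low.append((a + b + c + d) / 2)   # smooth content of the block
            r_col.append((a - b + c - d) / 2)   # responds to vertical edges
            r_row.append((a + b - c - d) / 2)   # responds to horizontal edges
            r_diag.append((a - b - c + d) / 2)  # responds to diagonal detail
        low.append(r_low)
        col_diff.append(r_col)
        row_diff.append(r_row)
        diag.append(r_diag)
    return low, col_diff, row_diff, diag


def split_components(x):
    """Hypothetical grouping (an assumption, not the paper's HFDU):
    low band -> 'background', magnitude of detail bands -> 'edge'."""
    low, col_diff, row_diff, diag = haar_dwt2(x)
    edge = [
        [math.sqrt(p * p + q * q + r * r) for p, q, r in zip(rc, rr, rd)]
        for rc, rr, rd in zip(col_diff, row_diff, diag)
    ]
    return low, edge


# A 2x2 patch containing a vertical step edge.
background, edge = split_components([[0.0, 1.0], [0.0, 1.0]])
print(background, edge)  # [[1.0]] [[1.0]]
```

A flat patch yields zero detail coefficients, so all of its energy lands in the low-frequency map, whereas a step inside a 2x2 block produces a non-zero detail response; this is the property that a frequency-based foreground/background split can exploit.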