In unmanned pharmacy and home-care medicine management applications, reliable pillbox localization is a prerequisite for automated dispensing and grasping. However, existing detectors still perform poorly in complex environments where dense stacking, occlusion, weak illumination, and high inter-class similarity occur simultaneously. To address this problem, we propose GSPM-YOLO, an improved detector built on the YOLOv11 framework for complex pillbox recognition, and develop four plug-and-play lightweight modules: GSimConv, a dual-branch convolution module that incorporates the Attention Weight Calculation Algorithm in HardSAM for edge-preserving feature extraction; PSCAM, for position-sensitive coordinate attention; MSAAM, a multi-scale strip-pooling module that integrates the Horizontal Context-Aware Attention weight calculation algorithm to strengthen features of occluded targets; and LGPFH, for bidirectional ghost pyramid fusion. To simulate the complex operating environments of dispensing robots, we construct MBox-Complex, a dataset of 3{,}041 images with 8{,}153 annotations across 25 drug categories. Ablation experiments first validate the four-module composition, with F1 rising from 0.641 to 0.714; dedicated substitution experiments then compare each module individually against advanced replacement schemes to verify its own effectiveness. On MBox-Complex, the integrated model is benchmarked against advanced detectors and domain-specific methods, achieving 0.727 mAP@50 and 0.427 mAP@50-95 with 3.8M parameters; on these two metrics it surpasses YOLOv11 by 7.1 and 4.0 percentage points and YOLOv12 by 4.3 and 3.1 percentage points, respectively. Further cross-dataset evaluation on the VOC and Brain Tumor benchmark datasets verifies the transferability of the proposed model.
Finally, Grad-CAM is used to visualize the detector's attention distribution; the resulting heatmaps, together with detection visualizations, indicate that the proposed model focuses more precisely on stacked and occluded regions.