1. Introduction
Precision pesticide application in orchard environments is increasingly critical for reducing chemical overuse, environmental contamination, and exposure of non-target species [1]. Conventional spraying methods often apply excessive pesticide, leading to ecological degradation and the development of pesticide resistance. To address these limitations, intelligent agricultural systems that leverage robotics, sensor fusion, and machine learning have emerged as viable alternatives. Autonomous spraying robots equipped with real-time perception modules enable selective application by identifying tree structures and environmental conditions. These systems integrate multimodal sensing, including visual, infrared, and LiDAR data, to support adaptive spraying control [2,3]. Embedded AI algorithms further improve decision-making accuracy, allowing precise targeting and significant reductions in pesticide usage. Related progress has been made in robotic manipulation: a Fin Ray-inspired soft gripper with integrated force feedback and ultrasonic slip detection was developed for apple harvesting, achieving a 0% fruit damage rate when force feedback was enabled, compared to 20% without it, and demonstrating effective non-destructive picking in orchard trials [37]. Beyond manipulation systems, these developments represent a practical advancement in smart agricultural technology within the context of industrial informatics. Recent work [4] emphasizes the role of computer vision in automating agricultural practices such as fruit picking, disease monitoring, and spraying, which align closely with the goals of this study.
Recent advances in deep learning have significantly improved pest detection and precision pesticide application in agriculture [5,6,7,8]. Lightweight segmentation models and optimized YOLO architectures have enabled real-time inference with reduced computational complexity, facilitating deployment on edge devices in resource-constrained environments [9,10]. To improve small-target detection, techniques such as hybrid SGD-GA optimization and enhanced feature extraction layers have been incorporated into models like YOLOX and YOLOv7, achieving high accuracy in field conditions [11,12,13]. An improved YOLOv4-based detection method was proposed for recognizing apples in complex orchard environments, significantly enhancing detection accuracy and reducing model size for efficient deployment in real-world scenarios [27]. A high-efficiency target detection algorithm, Seedling-YOLO, was developed based on YOLOv7-Tiny to assess broccoli seedling transplanting quality, achieving a mean average precision (mAP@0.5) of 94.3% and a detection speed of 29.7 frames per second (FPS) in field conditions [9]. Using a ZED 2 stereo camera and YOLOv4-Tiny, a potted flower detection and localization system was implemented, yielding a mean average precision of 89.72%, a recall of 80%, and an average detection speed of 16 FPS, with a mean absolute localization error of 18.1 mm [16]. Similarly, an apple detection framework integrating ShuffleNetV2 and YOLOX-Tiny, enhanced with attention and adaptive feature fusion modules, achieved 96.76% average precision and operated at 65 FPS in complex orchard environments [36]. An improved YOLOX-based method utilizing RGB-D imaging was developed for real-time apple detection and 3D localization, achieving a mean average precision of 94.09%, an F1 score of 93%, and spatial positioning errors under 7 mm along the X and Y axes and under 5 mm along the Z axis [38]. Advanced frameworks such as CEDAnet and Transformer-based modules have been applied to UAV-based orchard monitoring, achieving precise tree segmentation in dense canopies [14]. Additionally, multi-sensor fusion systems integrating LiDAR, vision, and IMUs have shown promise for dynamic tree localization and pose estimation in semi-structured orchards [15,40]. A deep learning-based variable-rate agrochemical spraying system was developed for targeted weed control in strawberry crops, utilizing VGG-16 for real-time classification of spotted spurge and shepherd’s purse, achieving a 93% complete spray rate on target weeds [41]. These developments highlight the importance of real-time, adaptable solutions for intelligent pesticide management in modern orchard environments.
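The detection metrics quoted in this paragraph (mAP@0.5, recall, F1 score) all build on the intersection over union (IoU) between a predicted box and a ground-truth box; under mAP@0.5, a prediction is usually counted as a true positive when its IoU with a matching ground-truth box is at least 0.5. The following is a minimal sketch of these quantities, assuming axis-aligned boxes given as (x1, y1, x2, y2). The function names are illustrative and are not taken from any of the cited works.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: two 2x2 boxes offset by one unit overlap in a 1x1 square.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))   # 1 / 7, below the 0.5 threshold
is_true_positive = overlap >= 0.5
```

Mean average precision additionally averages the area under the precision-recall curve over classes, which is omitted here for brevity.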
Deep learning has also significantly enhanced plant disease and pest detection in precision agriculture. Lightweight models such as ResNet50 and MobileNetV2 have shown efficiency in image classification tasks, while YOLO-based detectors, particularly YOLOv5 and YOLOv8, have achieved high accuracy in real-time object detection [16,17,18]. For tomato crops, an improved Faster R-CNN model incorporating ResNet-50, K-means clustering for anchor box optimization, and Soft-NMS achieved a mean average precision of 90.7% in detecting flowers and fruits in complex environments [39]. Similarly, a CNN enhanced with attention mechanisms yielded 96.81% accuracy in classifying tomato leaf diseases [23,28], while a hybrid model combining Competitive Adaptive Reweighted Sampling (CARS) and CatBoost algorithms estimated tomato transpiration rates with an R² of 0.92 and an RMSE of 0.427 g·m⁻²·h⁻¹ [34]. Furthermore, a comprehensive review on CNN-based vegetable disease detection emphasized the dominance of VGG models and underscored challenges in data limitations and generalization performance [35]. Complementary work reviewed nondestructive quality assessment in tomatoes using mechanical, electromagnetic, and electrochemical sensing integrated with deep learning [43]. In parallel, deep learning-based approaches for apple quality assessment have achieved promising results. A multi-dimensional view processing method leveraging Swin Transformer and an enhanced YOLOv5s framework attained 94.46% grading accuracy and 96.56% defect recognition at 32 FPS [29]. Another YOLOv5-based model incorporating Mish activation, DIoU loss, and Squeeze-and-Excitation modules reached a grading accuracy of 93% with a throughput of four apples per second on an automatic grading machine [33]. In addition, a CNN-VGG16-based system classified apple color and deformity with 92.29% accuracy [31], and recent reviews have reported that deep learning combined with spectral imaging can achieve up to 98.7% mean average precision in apple maturity estimation [32]. Across both crop types, architectural innovations such as multi-scale feature fusion, Receptive Field Attention Convolution, and advanced loss functions like WIoUv3 have further enhanced detection robustness in field conditions [18,19]. Techniques including transfer learning and mini-batch k-means++ clustering have also proven effective in reducing training overhead and improving bounding box precision in low-data regimes [20]. Meanwhile, data augmentation and semantic segmentation strategies have contributed to better handling of small, occluded, or overlapping targets in dense crop environments [21,22]. An early study [24] demonstrated the use of YOLO for pest detection in greenhouse settings, underscoring the long-term potential of real-time detection frameworks in integrated pest management. Collectively, these developments form a foundation for the design of scalable, edge-deployable detection systems in smart agricultural applications.
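Several of the works surveyed here replace the plain IoU objective with distance-aware variants such as DIoU loss. As a point of reference, the sketch below follows the standard DIoU definition, in which the loss is 1 − IoU plus the squared distance between box centers divided by the squared diagonal of the smallest enclosing box. It assumes axis-aligned boxes in (x1, y1, x2, y2) form, and the function name is illustrative rather than taken from the cited implementations.

```python
def diou_loss(box_a, box_b):
    """DIoU loss for two axis-aligned boxes (x1, y1, x2, y2): 1 - IoU + d^2 / c^2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Plain IoU term.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0

    # d^2: squared Euclidean distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # c^2: squared diagonal of the smallest box enclosing both inputs.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2

    penalty = d2 / c2 if c2 > 0 else 0.0
    return 1.0 - iou + penalty
```

Unlike plain IoU, the center-distance penalty remains informative when the boxes do not overlap at all, which is one reason such losses are attractive for small or partially occluded targets.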
Despite these advances, several challenges remain. Although precision spraying technologies have shown promise in reducing pesticide usage, most current systems still face significant limitations in real-time deployment, particularly in complex orchard environments. Dense tree canopies, variable lighting, and occlusions present serious obstacles to reliable pest detection and precise spraying. Existing YOLO architectures, including YOLOv9, are often too computationally demanding for resource-constrained industrial robotics platforms such as autonomous orchard spraying robots. To address these challenges, we propose YOLOv9-SEDA, which combines depthwise separable convolutions, Efficient Channel Attention (ECA), Swish activation, and Lookahead-AdamW optimization. This combination targets the real-time performance and reliability requirements that are crucial for industrial robotics applications.
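To make the computational argument concrete, the sketch below illustrates three of the ingredients named in this paragraph using only the Python standard library: the weight count of a depthwise separable convolution versus a standard one, the Swish activation x·sigmoid(βx), and ECA-style channel gating with the adaptive kernel-size rule from the original ECA formulation (γ = 2, b = 1). The layer sizes and function names are hypothetical, and this section does not state whether YOLOv9-SEDA uses these exact settings.

```python
import math


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out


def sigmoid(z):
    """Logistic function written via tanh, which avoids overflow for large |z|."""
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def swish(x, beta=1.0):
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd value of (log2(C) + b) / gamma."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_channel_weights(channel_means, conv_weights):
    """Gate per channel: zero-padded 1D conv over pooled channel means, then sigmoid."""
    pad = len(conv_weights) // 2
    padded = [0.0] * pad + list(channel_means) + [0.0] * pad
    return [
        sigmoid(sum(w * padded[i + j] for j, w in enumerate(conv_weights)))
        for i in range(len(channel_means))
    ]


# Hypothetical 3x3 layer with 128 input and 256 output channels.
full = standard_conv_params(128, 256, 3)             # 294912 weights
separable = depthwise_separable_params(128, 256, 3)  # 33920 weights
```

For this hypothetical layer the separable form needs roughly 8.7 times fewer weights, and the ratio approaches 1/k² as the number of output channels grows. The ECA gate adds only as many weights as its kernel size (5 for 256 channels under the rule above), which is why it is often paired with lightweight backbones.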
This paper proposes YOLOv9-SEDA (hereafter SEDA), an enhanced object detection framework tailored for real-time pesticide application in orchard environments. A systematic literature review [25] supports the growing use of YOLO-based models in agricultural object detection, particularly in scenarios requiring real-time decision making. The main contributions of this work are as follows:
Model Architecture Enhancement: SEDA integrates depthwise separable convolutions and Efficient Channel Attention (ECA) to improve feature extraction while significantly reducing computational complexity.
Improved Training Stability and Non-Linearity Handling: The Lookahead optimizer combined with AdamW is employed to enhance convergence speed and stability, while the Swish activation function mitigates vanishing gradients and improves non-linear representation.
Edge Device Optimization: The framework is designed for real-time operation on resource-constrained edge devices, enabling practical deployment in field scenarios.
Robust Detection in Complex Environments: SEDA is particularly effective at detecting small and overlapping targets under variable environmental conditions, addressing challenges common in dense orchards.
Sustainable Agricultural Impact: By improving detection precision, the system supports targeted pesticide application, leading to significant reductions in chemical usage and environmental impact.
Comprehensive Validation: Extensive experimental evaluations demonstrate that SEDA outperforms YOLOv9 and other state-of-the-art models in both accuracy and inference speed, validating its effectiveness for intelligent agricultural spraying systems.
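As an illustration of the optimizer named in the contributions, the following sketch shows how Lookahead can wrap AdamW on a single scalar parameter: the inner AdamW optimizer updates "fast" weights for k steps, after which the "slow" weights move a fraction α toward them and the fast weights are reset to the slow ones. All hyperparameters (learning rate, k, α, weight decay) and the toy objective are assumptions chosen for illustration and are not the settings used for SEDA.

```python
import math


def adamw_step(w, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on a scalar weight with decoupled weight decay."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    w = w * (1 - lr * weight_decay)                  # decay applied directly to the weight
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def lookahead_adamw(grad_fn, w0, steps=200, k=5, alpha=0.5):
    """Lookahead around AdamW: sync slow and fast weights every k inner steps."""
    slow = fast = w0
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for step in range(1, steps + 1):
        fast = adamw_step(fast, grad_fn(fast), state)
        if step % k == 0:
            slow = slow + alpha * (fast - slow)  # interpolate slow weights toward fast
            fast = slow                          # restart fast weights from slow
    return slow


# Toy objective (w - 3)^2 with gradient 2 * (w - 3); purely illustrative.
w_final = lookahead_adamw(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Setting α = 1 recovers plain AdamW, while smaller values damp the oscillations of the inner optimizer at the cost of slower progress per step, which is the stability trade-off the second contribution refers to.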