Preprint
Article

This version is not peer-reviewed.

LDTC-YOLO: A Lightweight Detection Model for Typical Citrus Leaf and Fruit Diseases in Real Orchard Environments

Submitted: 19 May 2026

Posted: 21 May 2026


Abstract
Accurate detection of citrus leaf and fruit diseases is important for precision orchard management. However, real orchard images often contain small disease symptoms, leaf and fruit overlap, illumination variation, and cluttered backgrounds, making reliable detection challenging. This study proposes LDTC-YOLO, a lightweight YOLOv8n-based detection model for typical citrus leaf and fruit diseases in real orchard environments. To improve detection accuracy and model compactness, LDTC-YOLO integrates an Adaptive Feature Pyramid Network (AFPN) for cross-level feature fusion, Coordinate Attention (CA) for disease-region feature enhancement, a Lightweight Shared Convolutional Detection (LSCD) head for reducing parameter redundancy, and Wise-IoU (WIoU) for bounding-box regression optimization. In addition, a self-collected handheld citrus disease dataset, HOCD-4, was constructed using close-range smartphone images captured in real orchards. The dataset covers leaf and fruit symptoms of four typical citrus diseases: Huanglongbing/citrus greening (HLB), black spot, canker, and melanose. Experimental results show that LDTC-YOLO achieved precision, recall, mAP@0.5, and mAP@0.5:0.95 values of 0.915, 0.843, 0.894, and 0.648, respectively. Compared with YOLOv8n, LDTC-YOLO reduced parameters, GFLOPs, and model size from 3.006 M to 1.887 M, 8.1 to 7.4, and 5.97 MB to 3.83 MB, while increasing inference speed from 43.14 FPS to 47.45 FPS. These results indicate that LDTC-YOLO improves detection performance while maintaining a compact and efficient model profile, providing a potential reference for citrus disease detection under real orchard imaging conditions.

1. Introduction

Citrus is one of the most important fruit crops worldwide and is widely cultivated in tropical and subtropical regions for fresh consumption and industrial processing. Owing to its nutritional value, commercial importance, and broad adaptability, citrus production plays an important role in agricultural economies and rural livelihoods in many producing countries [1].
However, citrus production is continuously threatened by various diseases. Major diseases, such as Huanglongbing/citrus greening (HLB), citrus canker, citrus black spot, and citrus melanose, can cause yield reduction, increased fruit drop, deterioration of fruit appearance quality, and loss of commercial value. Among them, HLB is widely regarded as one of the most destructive citrus diseases, leading to severe tree decline and substantial yield loss. Citrus canker and citrus black spot can cause visible fruit-surface damage and may result in economic losses due to quarantine requirements and trade restrictions, while citrus melanose commonly affects young leaves and fruits under warm and humid conditions, reducing fruit appearance and market value [2,3,4]. Because these diseases can produce visible symptoms on leaves and/or fruits, image-based detection provides a feasible way to support rapid orchard disease screening.
Timely and accurate disease identification is essential for effective prevention and control. At present, orchard disease surveys still largely rely on manual visual inspection, which is labor-intensive, time-consuming, and susceptible to subjective experience. In addition, background clutter, illumination variation, occlusion, leaf or fruit overlap, and blurred lesion boundaries in field images further increase the difficulty of disease recognition and localization [5,6]. These limitations make purely manual inspection insufficient for rapid and stable disease monitoring in modern citrus production [7]. Therefore, developing objective and efficient citrus disease detection methods is important for precision orchard management and sustainable citrus production. Automated detection can reduce dependence on professional experience and improve the efficiency of orchard disease screening, thereby providing a useful reference for future intelligent orchard inspection and disease management.
In practical orchard management, citrus disease detection is not only an image classification or object detection task, but also a practical problem involving imaging conditions, model efficiency, and potential field application. Close-range image acquisition using handheld devices such as smartphones is low-cost, portable, and suitable for daily orchard inspection, making it a practical approach for non-destructive disease image collection in orchards. However, images acquired in such scenarios often contain variable shooting angles, natural illumination changes, cluttered backgrounds, frequent leaf occlusion and overlap, fruit overlap, and small or locally distributed disease regions. In potential field applications, models are also expected to have moderate computational cost, storage requirements, and inference latency [8]. Therefore, achieving accurate citrus disease detection under real close-range orchard conditions while maintaining model efficiency remains a practical challenge. In this study, lightweight design refers to reducing model parameters, computational cost, and storage size while maintaining competitive detection accuracy and acceptable inference speed. These indicators are used to evaluate whether the model has a compact and efficient profile for potential resource-constrained orchard inspection scenarios.
With the development of computer vision and deep learning, plant disease recognition has gradually shifted from traditional machine learning methods based on handcrafted features to automatic feature-learning methods represented by convolutional neural networks. Traditional methods can perform well under controlled conditions, but their robustness to background interference, illumination fluctuations, occlusion, and sample variation is limited. In contrast, deep learning methods can automatically extract discriminative features from raw images and have been widely used in agricultural disease recognition [9,10,11]. Among object detection frameworks, the YOLO series has been widely adopted in agricultural detection tasks because it balances detection accuracy and inference efficiency. Compared with two-stage detectors, one-stage detection methods have advantages in speed and are more suitable for potential field applications. Recent studies based on YOLO models have verified their practical potential in agricultural disease detection and other complex agricultural scenarios [12,13,14].
For citrus disease detection, existing studies have gradually shifted from laboratory or simple-background scenarios to natural orchard scenes, where leaf occlusion, fruit overlap, background interference, and illumination variation significantly increase the difficulty of disease recognition and localization. Dai et al. proposed YOLOv8-GABNet for citrus disease and nutrient deficiency recognition by introducing ADown, BiFPN, and GLSA modules [15]. Feng et al. proposed YOLO-Citrus for citrus leaf disease detection under uneven illumination, branch and leaf occlusion, fruit overlap, and background variation [16]. Chen et al. proposed BD-YOLOv8 to balance real-time performance and accuracy through lightweight feature extraction and a parameter-sharing detection head [17]. These studies have advanced lightweight citrus disease detection in natural scenes. Nevertheless, several issues remain insufficiently addressed. First, many studies mainly consider either leaf symptoms or fruit symptoms, whereas real orchard inspection often involves both. Second, feature enhancement and model compactness need to be balanced more explicitly for small, occluded, and locally distributed disease regions. Third, for handheld close-range orchard images, module combinations should be justified by their complementary roles rather than by simple structural stacking.
To address these requirements, this study proposes LDTC-YOLO, where LDTC denotes Lightweight Detection for Typical Citrus Diseases. The model is developed based on YOLOv8n for citrus leaf and fruit disease detection in real orchard environments. Rather than simply stacking additional modules, LDTC-YOLO follows a problem-oriented design. AFPN is introduced to strengthen multi-scale feature interaction for disease regions with different sizes, CA is embedded to emphasize spatially informative lesion responses under cluttered backgrounds, LSCD is adopted to reduce detection-head redundancy, and WIoU is used to improve bounding-box regression optimization during training. In addition, a self-collected handheld orchard dataset, HOCD-4, is constructed to evaluate the model using close-range smartphone images captured in real orchards. Experimental results show that LDTC-YOLO improves detection accuracy while reducing parameters, GFLOPs, and model size, and slightly increasing inference speed compared with YOLOv8n. The main contributions of this study are as follows:
  • A self-collected handheld orchard citrus disease dataset, HOCD-4, was constructed using close-range smartphone images captured in real orchards. The dataset covers leaf and fruit symptoms of four typical citrus diseases, including HLB, citrus black spot, citrus canker, and citrus melanose, and contains practical field imaging characteristics such as illumination variation, background clutter, partial occlusion, leaf or fruit overlap, and small disease regions.
  • A lightweight YOLOv8n-based detection model, LDTC-YOLO, was proposed for citrus leaf and fruit disease detection in real orchard environments. The model aims to improve detection performance for small and partially occluded disease regions while maintaining a compact model structure.
  • A problem-oriented feature enhancement strategy was designed by combining AFPN and CA. AFPN strengthens multi-scale feature interaction, while CA enhances spatially informative disease responses in key feature layers, improving the representation of small and background-interfered disease regions without simply increasing network depth or width.
  • A compact detection structure was adopted by introducing LSCD and WIoU. LSCD reduces detection-head parameter redundancy through shared convolution, and WIoU improves bounding-box regression optimization during training, helping the model balance detection accuracy, localization performance, and compactness.
The remainder of this paper is organized as follows. Section 2 introduces data acquisition, dataset construction, preprocessing, and data augmentation. Section 3 describes the proposed model architecture and key modules. Section 4 presents the experimental settings, evaluation metrics, comparative experiments, and ablation results. Section 5 discusses model performance, limitations, and potential application scenarios. Finally, Section 6 summarizes the study and outlines future research directions.

2. Dataset Construction and Preprocessing

2.1. Data Collection

In this study, a self-collected dataset named HOCD-4 (Handheld Orchard Citrus Disease Dataset) was constructed for citrus leaf and fruit disease detection under close-range orchard imaging conditions. The dataset contains 3,511 high-resolution images with an original resolution of 1920 × 1080. It covers four typical citrus diseases, namely HLB, citrus black spot, citrus canker, and citrus melanose, and includes visible disease symptoms occurring on both leaves and fruits. These diseases were selected because they are common or economically important in citrus production and present visually observable symptoms suitable for image-based detection. The dataset and corresponding annotations are publicly available at https://github.com/AlanPikaso/HOCD-4.git.
The four disease categories exhibit different visual symptoms on citrus leaves and fruits. HLB is mainly characterized by asymmetric leaf yellowing, mottled leaves, and abnormal fruit coloration; melanose usually appears as densely distributed small black or dark-brown spots; canker is characterized by brown raised lesions with relatively clear boundaries; and black spot often appears as black necrotic spots on fruit or leaf surfaces. The disease categories were confirmed with the assistance of agricultural experts. Representative examples of the four disease categories are shown in Figure 1. All representative images shown in the figure were selected from HOCD-4.
The HOCD-4 dataset was constructed from self-collected images acquired under real close-range orchard conditions. The image acquisition process focused on practical orchard inspection scenarios rather than controlled laboratory conditions. Therefore, the collected images include variations in shooting angle, object distance, natural illumination, background clutter, leaf occlusion, fruit overlap, and lesion scale. These characteristics make HOCD-4 suitable for evaluating citrus disease detection models under real orchard imaging conditions.
Image acquisition was conducted in 12 orchard bases located in the main production area of Bingtang sweet orange in Yongxing County, Chenzhou City, Hunan Province, China. Images were collected between 1 January 2023 and 26 March 2024. All images were manually captured at close range using a HUAWEI P40 5G smartphone under natural illumination conditions without using a flash. The acquired images were stored in JPG format with an original resolution of 1920 × 1080. The final HOCD-4 dataset contains four representative citrus disease categories and includes visible disease symptoms on both leaves and fruits, providing a data basis for evaluating citrus disease detection models in real orchard images involving background clutter, occlusion, overlapping targets, and small disease regions.

2.2. Data Annotation and Dataset Partitioning

To ensure the accuracy and consistency of data annotation, LabelImg software was used to manually annotate the citrus disease images [18]. The annotated objects were visible symptomatic regions of HLB, melanose, canker, and black spot on citrus leaves and fruits. During annotation, the disease categories were confirmed with the assistance of agricultural experts, and the bounding boxes were drawn according to visible disease symptoms. Lesion color, morphology, boundary characteristics, and spatial distribution were jointly considered to improve the reliability and consistency of the annotation results.
All annotation files were saved in the .txt format required for YOLO object detection. Each image corresponded to an annotation file with the same name, in which the target category ID and normalized bounding-box coordinates were recorded. Before final dataset partitioning, duplicate images, severely blurred images, and images with unclear disease symptoms were removed to reduce the interference of low-quality samples in model training and evaluation.
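As an illustration of the label format described above, the following minimal sketch parses one YOLO annotation line and converts the normalized box to pixel corners. The function names are hypothetical, and the example class ID does not correspond to any published HOCD-4 class mapping; only the field layout (class ID followed by normalized center coordinates, width, and height) and the 1920 × 1080 resolution come from the text.

```python
from typing import Tuple


def parse_yolo_line(line: str) -> Tuple[int, float, float, float, float]:
    """Parse '<class_id> <x_center> <y_center> <width> <height>' (all normalized)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:])
    return class_id, xc, yc, w, h


def to_pixel_box(xc: float, yc: float, w: float, h: float,
                 img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a normalized center-format box to pixel corners (x1, y1, x2, y2)."""
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return x1, y1, x2, y2


# Hypothetical label line; the class ID 2 is illustrative only.
cls, xc, yc, w, h = parse_yolo_line("2 0.5 0.5 0.25 0.5")
box = to_pixel_box(xc, yc, w, h, img_w=1920, img_h=1080)
```

Because the coordinates are normalized, the same label file remains valid if the image is resized before training.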
The dataset was then divided into training, validation, and test sets at a ratio of 8:1:1, which were used for model training, parameter tuning, and final performance evaluation, respectively. Dataset partitioning was performed before data augmentation, and augmentation was applied only to the training set to avoid data leakage. The image distribution of each category in different subsets is shown in Table 1.
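The partitioning step can be sketched as follows. This is a minimal example that assumes a plain seeded random split over file names; the text does not state whether the split was stratified by class, and the function name, seed, and file names are hypothetical.

```python
import random


def split_dataset(names, parts=(8, 1, 1), seed=0):
    """Shuffle file names reproducibly and split them by integer ratio parts."""
    items = sorted(names)                 # fixed starting order for reproducibility
    random.Random(seed).shuffle(items)    # seeded shuffle, independent of global state
    total = sum(parts)
    n_train = len(items) * parts[0] // total
    n_val = len(items) * parts[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]        # remainder goes to the test set
    return train, val, test


# Hypothetical file names; augmentation would be applied to `train` only, after this step.
train, val, test = split_dataset([f"img_{i:04d}.jpg" for i in range(100)])
```

Splitting before augmentation, as described above, keeps augmented copies of a training image from leaking into the validation or test sets.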

2.3. Data Augmentation

To reduce overfitting and improve the robustness of the model to common imaging variations in close-range orchard scenarios, data augmentation was applied only to the training set. The validation and test sets were not augmented and were used to evaluate model performance on original orchard images. The adopted augmentation strategies included Mosaic augmentation, horizontal flipping, HSV color perturbation, random translation, and random scaling.
These operations were selected to simulate common variations in handheld orchard imaging. Mosaic augmentation was used to enrich background composition and target-scale diversity during training. Horizontal flipping was used to simulate different leaf and fruit orientations. HSV perturbation was adopted to imitate color and brightness changes caused by natural illumination differences, while random translation and scaling were used to simulate target-position shifts and scale changes during close-range image acquisition. Although data augmentation cannot replace the collection of more diverse original field samples, it can improve the utilization of the available training data and reduce the model’s sensitivity to common image-level variations. The parameter settings are shown in Table 2.
Overall, these augmentation strategies were used to improve the diversity of training samples and enhance the model’s tolerance to common variations in handheld orchard images, such as illumination changes, shooting-angle differences, target-position shifts, and scale changes.
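Geometric augmentations must update the bounding-box labels together with the image. The sketch below shows this for horizontal flipping on normalized YOLO boxes, where only the x-center changes. The function names and the probability `p` are hypothetical and do not reproduce the settings in Table 2.

```python
import random


def hflip_boxes(boxes):
    """Mirror normalized (class_id, xc, yc, w, h) boxes about the vertical image axis."""
    return [(cls, 1.0 - xc, yc, w, h) for cls, xc, yc, w, h in boxes]


def maybe_hflip(boxes, p, rng):
    """Flip with probability p; the flag tells the caller to flip the image as well."""
    if rng.random() < p:
        return hflip_boxes(boxes), True
    return list(boxes), False


rng = random.Random(0)
# p = 0.5 is an illustrative value, not the setting used in the study.
new_boxes, flipped = maybe_hflip([(0, 0.25, 0.5, 0.1, 0.2)], p=0.5, rng=rng)
```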

3. Methods

3.1. Baseline Model and Design Motivation

In this study, YOLOv8n was selected as the baseline detection framework [19]. As a compact one-stage detector, YOLOv8n consists of a Backbone, Neck, and Detection Head. The Backbone extracts hierarchical features through convolutional layers, C2f modules, and an SPPF module; the Neck performs multi-scale feature fusion; and the decoupled Detection Head predicts object categories and bounding boxes at different scales. Compared with larger YOLO variants, YOLOv8n has fewer parameters and lower computational cost, making it a suitable baseline for lightweight-oriented citrus disease detection.
However, YOLOv8n was originally developed for general object detection and may still have limitations when applied to citrus leaf and fruit disease detection in real orchard images. Citrus disease symptoms are often small, locally distributed, and affected by background clutter, illumination variation, leaf occlusion, and fruit overlap. Under these conditions, the original feature fusion structure may not sufficiently coordinate fine-grained low-level details and high-level semantic information for disease regions with different scales. In addition, disease-related responses may be weakened by surrounding non-target regions such as leaf veins, branches, shadows, and background textures. The original Detection Head also contains repeated convolutional computation across different prediction branches, while the default regression loss may not always provide sufficient localization optimization for small or ambiguous disease regions.
Therefore, this study improves YOLOv8n from four task-oriented perspectives: multi-scale feature fusion, disease-region feature enhancement, detection-head compactness, and bounding-box localization optimization. Specifically, AFPN is introduced to strengthen cross-level feature interaction, CA is embedded to enhance spatially informative disease-related responses, LSCD is adopted to reduce detection-head parameter redundancy, and WIoU is used to improve bounding-box regression optimization during training. These modifications aim to improve citrus disease detection under real orchard imaging conditions while maintaining a compact model profile, rather than simply increasing network depth or width.

3.2. Overall Architecture of the Proposed Model

To improve multi-scale feature representation, disease-region perception, and model compactness for citrus disease detection, an improved YOLOv8n-based framework, named LDTC-YOLO, was constructed. The overall architecture of the proposed model is shown in Figure 2.
LDTC-YOLO uses YOLOv8n as the baseline framework and retains its original Backbone for hierarchical feature extraction, which helps preserve the compact structure of the baseline model. On this basis, the original Neck is replaced with an AFPN-based feature fusion structure, and CA is embedded into selected output feature layers for further feature recalibration. In addition, the original Detection Head is replaced with LSCD to reduce prediction-branch redundancy, and WIoU is incorporated as the bounding-box regression loss during training. This design improves disease-related feature representation, reduces detection-head parameter redundancy, and enhances localization optimization without simply increasing network depth or width.
In the overall workflow, the input image is first processed by the Backbone to extract multi-scale feature maps, denoted as $C_3$, $C_4$, and $C_5$. These feature maps are then fed into the AFPN-based fusion structure to strengthen cross-level feature interaction. AFPN is used to integrate complementary information from different feature levels, which is beneficial for disease regions with different sizes, locally distributed symptoms, and partially occluded targets. After multi-level feature fusion, CA is applied to the $P_3$ and $P_4$ output feature layers to recalibrate spatial and channel responses. This sequential design allows the model to first coordinate multi-scale information and then emphasize disease-related responses in key feature layers, thereby reducing responses from non-target regions such as leaf veins, branches, shadows, and background textures.
The enhanced feature maps are subsequently fed into the Lightweight Shared Convolutional Detection (LSCD) head for classification and localization prediction. Compared with the original Detection Head, LSCD reduces repeated convolutional computation across different prediction branches through shared convolution, thereby decreasing parameter redundancy and improving model compactness. In addition, WIoU is used during training to improve bounding-box regression optimization for small, ambiguous, or difficult disease regions. Because WIoU is used only during training, it does not introduce additional parameters or inference cost.
Overall, AFPN and CA mainly improve multi-scale feature fusion and disease-region perception, while LSCD and WIoU contribute to detection-head compactness and localization optimization, respectively. These components form a problem-oriented design for balancing detection performance and model compactness in citrus leaf and fruit disease detection under real orchard imaging conditions.

3.3. Coupled AFPN–CA Feature Enhancement Module

To improve feature representation for citrus leaf and fruit disease detection under real orchard imaging conditions, a coupled AFPN–CA feature enhancement module was introduced into the Neck of LDTC-YOLO. The module was designed in a sequential manner, following a “cross-level fusion first, key-region recalibration second” strategy.
Before constructing the final AFPN–CA module, preliminary module-selection experiments were conducted to avoid arbitrary structural stacking. Alternative attention and feature-fusion designs, including RFAConv, different CA insertion positions, BiFPN, and several AFPN configurations, were compared under the same training settings. The results showed that CA inserted into the $P_3$ and $P_4$ layers provided a better balance between detection accuracy and additional complexity than RFAConv-based variants, while the AFPN-Refine1 configuration achieved a more compact feature-fusion structure than BiFPN. These preliminary comparisons guided the final selection of the coupled AFPN–CA structure in LDTC-YOLO.
Specifically, AFPN strengthens interactions among pyramid layers with different spatial resolutions and semantic depths, after which CA recalibrates selected outputs to highlight lesion-related spatial and channel responses. This design addresses two practical challenges in orchard disease detection: scale variation of disease regions and interference from background clutter, occlusion, and overlapping leaves or fruits.

3.3.1. Progressive Cross-Level Feature Fusion with AFPN

In this study, multi-scale features refer to the feature pyramid representations extracted from different stages of the YOLOv8n Backbone, namely $C_3$, $C_4$, and $C_5$. These feature layers have different spatial resolutions and semantic levels. In general, lower-level features such as $C_3$ preserve more spatial details, while higher-level features such as $C_5$ are more abstract but have lower spatial resolution. The $C_4$ layer provides an intermediate representation between these two levels. Based on this hierarchical structure, the purpose of introducing AFPN is to coordinate features with different resolutions and semantic levels, so that disease symptoms with different sizes and locations can be represented more effectively. Therefore, the multi-scale issue considered in this study is feature coordination across existing pyramid levels rather than the introduction of a new multi-scale detection paradigm.
The original Neck of YOLOv8n provides basic feature fusion for general object detection. However, citrus disease symptoms in orchard images may appear as small spots, local patches, or irregular fruit-surface regions, and they are often affected by illumination variation, occlusion, and background clutter. To better coordinate low-level details and high-level semantics, the original Neck was replaced with an AFPN-based progressive fusion structure, as shown in Figure 3.
Let the three feature layers output by the Backbone be denoted as $C_3$, $C_4$, and $C_5$. Since these feature maps have different channel dimensions, 1 × 1 convolutions are first used for channel alignment:
$$\hat{C}_i = \phi_i(C_i), \quad i \in \{3, 4, 5\},$$
where $\phi_i(\cdot)$ denotes the corresponding 1 × 1 convolution mapping, and $\hat{C}_i$ denotes the aligned feature map with a unified channel dimension. This operation provides a consistent channel representation for subsequent feature fusion.
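A 1 × 1 convolution mixes channels independently at every spatial location, so it changes the channel dimension while leaving the height and width untouched. The following minimal sketch shows this on nested lists; the function name is hypothetical, and bias, normalization, and activation are omitted as simplifying assumptions.

```python
def conv1x1(feature, weight):
    """Apply a 1x1 convolution.

    feature: nested lists with shape [C_in][H][W]
    weight:  nested lists with shape [C_out][C_in]
    returns: nested lists with shape [C_out][H][W]
    """
    c_in = len(feature)
    h = len(feature[0])
    w = len(feature[0][0])
    out = []
    for kernel in weight:
        if len(kernel) != c_in:
            raise ValueError("kernel length must equal the number of input channels")
        # Each output pixel is a weighted sum over input channels at the same location.
        out.append([[sum(kernel[c] * feature[c][y][x] for c in range(c_in))
                     for x in range(w)]
                    for y in range(h)])
    return out


# Two input channels on a 1 x 2 map, projected to three output channels.
aligned = conv1x1([[[1.0, 2.0]], [[3.0, 4.0]]],
                  [[1.0, 1.0], [1.0, -1.0], [0.0, 2.0]])
```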
AFPN performs progressive cross-level fusion to coordinate adjacent and non-adjacent pyramid features. In the first stage, adjacent low- and middle-level features are fused to reduce the semantic gap between neighboring feature levels. In the second stage, the high-level feature is further introduced, and features from different levels are resized to the target output scale before adaptive spatial fusion. The final pyramid outputs can be summarized as
$$(P_3, P_4, P_5) = \mathcal{F}_{\mathrm{AFPN}}(\hat{C}_3, \hat{C}_4, \hat{C}_5),$$
where $\mathcal{F}_{\mathrm{AFPN}}(\cdot)$ denotes the progressive cross-level fusion process. Through this process, low-level texture information, intermediate structural information, and high-level semantic information are coordinated before being passed to the detection head.
In the adaptive spatial fusion operation, the input branches are not simply concatenated or directly summed. Instead, spatial weights are assigned to different input branches according to their feature responses. For a target pyramid level $l$, let $\{X_1^l, X_2^l, \ldots, X_K^l\}$ denote the aligned input features to be fused, where $K$ is the number of input branches. The fused feature can be written as
$$F^l = \sum_{k=1}^{K} \alpha_k^l \odot X_k^l, \qquad \sum_{k=1}^{K} \alpha_k^l = 1,$$
where $\alpha_k^l$ denotes the spatial weight map assigned to the $k$-th input branch at level $l$, and $\odot$ denotes element-wise multiplication. The spatial weights are generated by the adaptive spatial fusion operation and normalized along the branch dimension. Therefore, the fusion operation can adaptively adjust the contribution of different feature levels at each spatial location.
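The weighted fusion can be illustrated with a small sketch on single-channel maps. It assumes that the branch normalization is a softmax over per-branch logits, which is a common choice but is not stated in the text. In a trained model the logits would come from learned layers, whereas here they are passed in directly. All names are hypothetical.

```python
import math


def adaptive_spatial_fusion(features, logits):
    """Fuse K branches with per-location weights that sum to 1 across branches.

    features, logits: lists of K maps, each with shape [H][W]
    returns: (fused map [H][W], weights [K][H][W])
    """
    k = len(features)
    h = len(features[0])
    w = len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    weights = [[[0.0] * w for _ in range(h)] for _ in range(k)]
    for y in range(h):
        for x in range(w):
            # Softmax over the branch dimension (assumed normalization).
            peak = max(logits[b][y][x] for b in range(k))
            exps = [math.exp(logits[b][y][x] - peak) for b in range(k)]
            total = sum(exps)
            for b in range(k):
                alpha = exps[b] / total
                weights[b][y][x] = alpha
                fused[y][x] += alpha * features[b][y][x]
    return fused, weights


# Two branches on a 1 x 2 map: equal logits at x=0, second branch favored at x=1.
fused, weights = adaptive_spatial_fusion(
    features=[[[2.0, 2.0]], [[4.0, 4.0]]],
    logits=[[[0.0, 0.0]], [[0.0, 5.0]]],
)
```

With equal logits the result is the plain average of the branches; where one logit dominates, the fused value moves toward that branch.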

3.3.2. Key-Region Recalibration with Coordinate Attention

Although AFPN strengthens feature interaction across different pyramid levels, the fused feature maps may still contain responses from non-lesion regions, such as leaf veins, branches, shadows, and background textures. These responses can interfere with the detection of small and locally distributed disease regions. Therefore, Coordinate Attention (CA) is applied after AFPN to further recalibrate key feature layers and enhance lesion-related responses, as shown in Figure 4.
CA embeds positional information into channel attention. Instead of compressing the whole feature map into a single channel descriptor through two-dimensional global pooling, CA aggregates features along the horizontal and vertical directions separately. This allows the module to model channel dependencies while preserving direction-aware positional information. Such a mechanism is suitable for citrus disease detection because disease regions are often small, irregularly distributed, and sensitive to spatial location.
Let the input feature map be $X \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$, and $C$ denote the height, width, and number of channels, respectively. CA generates attention maps along the height and width directions, denoted as $a^h$ and $a^w$, through direction-aware pooling and convolutional transformations. The recalibrated output feature can be expressed as
$$Y = X \odot a^h \odot a^w,$$
where $Y$ denotes the output feature after coordinate attention, and $\odot$ denotes element-wise multiplication. Through this operation, CA preserves positional cues while enhancing informative disease-related responses.
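The directional pooling and the broadcast multiplication can be sketched as follows. This is a deliberately simplified illustration: the learned convolutional transformations of CA are replaced by a plain sigmoid on the pooled values, which is an assumption made only to keep the example self-contained. The function names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def coordinate_attention_sketch(x):
    """Recalibrate a feature map with height-wise and width-wise attention.

    x: nested lists with shape [C][H][W]; returns a map of the same shape.
    The learned transforms of real CA are omitted (simplifying assumption).
    """
    out = []
    for channel in x:
        h = len(channel)
        w = len(channel[0])
        # Pool along the width: one descriptor per row (height direction).
        row_mean = [sum(row) / w for row in channel]
        # Pool along the height: one descriptor per column (width direction).
        col_mean = [sum(channel[y][c] for y in range(h)) / h for c in range(w)]
        a_h = [sigmoid(v) for v in row_mean]
        a_w = [sigmoid(v) for v in col_mean]
        # Broadcast both attention vectors over the map and multiply element-wise.
        out.append([[channel[y][c] * a_h[y] * a_w[c] for c in range(w)]
                    for y in range(h)])
    return out


y = coordinate_attention_sketch([[[1.0, -1.0], [1.0, -1.0]]])
```

Each output value is scaled by one row weight and one column weight, which is how the position along both axes enters the attention.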
In LDTC-YOLO, CA is applied only to the $P_3$ and $P_4$ output layers of AFPN. The $P_3$ layer has a higher spatial resolution and retains richer texture and edge information, which is important for small disease-region detection. The $P_4$ layer provides a balance between spatial detail and semantic representation, which helps distinguish disease regions from cluttered backgrounds. In contrast, the $P_5$ layer has a lower spatial resolution and mainly carries high-level semantic information. Applying CA to $P_5$ may introduce additional computation while providing limited positional-detail enhancement. Therefore, $P_5$ is kept unchanged in this study. This setting was also supported by preliminary comparisons of different CA insertion positions.
The enhanced features are denoted as
$$P_3^{CA} = \mathcal{A}_{\mathrm{CA}}(P_3), \qquad P_4^{CA} = \mathcal{A}_{\mathrm{CA}}(P_4),$$
where $\mathcal{A}_{\mathrm{CA}}(\cdot)$ denotes the CA recalibration operation. The final feature set fed into the Detection Head is
$$\mathcal{P} = \{P_3^{CA}, P_4^{CA}, P_5\}.$$
Overall, the AFPN–CA module forms a sequential coupling structure. AFPN first coordinates feature information from $C_3$, $C_4$, and $C_5$ through progressive cross-level fusion, while CA further recalibrates selected output layers to highlight lesion-related spatial responses. In this structure, AFPN mainly addresses scale variation and semantic coordination among pyramid features, whereas CA further reduces non-target background responses and strengthens key-region perception after feature fusion. This design provides more discriminative input features for the subsequent detection head without simply increasing network depth or width.

3.4. Lightweight Shared Convolutional Detection Head

After feature enhancement by the AFPN–CA module, the output features are fed into the Detection Head for classification and bounding-box regression. Although the original YOLOv8n Detection Head adopts a decoupled prediction structure, different pyramid levels still use separate convolutional parameters, which may introduce redundant computation in the prediction branches. Considering that the detection task involves a limited number of citrus disease categories and that disease-related regions share certain low-level visual cues, such as local texture changes, color variation, and boundary patterns, excessive scale-specific prediction parameters may lead to unnecessary redundancy in the detection head. Therefore, a Lightweight Shared Convolutional Detection Head (LSCD) was adopted to replace the original Detection Head [20], as shown in Figure 5.
LSCD reduces parameter redundancy while preserving the decoupled prediction form for classification and localization. Specifically, features from different pyramid levels are first processed by level-specific 1 × 1 Conv_GN layers for channel alignment. Shared 3 × 3 Conv_GN layers are then used to extract common prediction features across pyramid levels, allowing convolutional parameters to be reused among different detection branches. Two lightweight shared prediction layers are further used to generate bounding-box regression and class prediction outputs, denoted as $\mathrm{Conv}_{\mathrm{Reg}}$ and $\mathrm{Conv}_{\mathrm{Cls}}$, respectively.
For the regression branch, a learnable scaling layer is attached to each pyramid level to adapt the shared regression predictor to features with different spatial resolutions. This scaling layer is not an additional detection scale, but a learnable scalar factor used to adjust the magnitude of regression outputs at each pyramid level. In addition, Group Normalization (GN) is used instead of Batch Normalization (BN) to help maintain training stability under small-batch conditions [21]. Overall, LSCD follows a “level-specific alignment–shared convolution–shared prediction” design, which reduces repeated convolutional computation in the detection head and helps balance detection performance and model compactness.
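The parameter saving obtained by sharing convolutions across pyramid levels can be illustrated with a simple count. The sketch below compares a head in which each of three pyramid levels owns its own stack of 3 × 3 convolutions with a head in which one stack is reused and only a scalar scale factor is kept per level. The channel width (64), the number of stacked layers (2), and the function names are illustrative assumptions rather than values taken from LDTC-YOLO, and normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Number of learnable weights in a k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def head_params(levels: int, channels: int, layers: int, shared: bool) -> int:
    """Weights of the stacked 3x3 convolutions in a detection head.

    shared=False: every pyramid level owns its own stack of convolutions.
    shared=True:  one stack is reused by all levels, plus one learnable
                  scalar per level that rescales the regression output.
    """
    stack = layers * conv_params(channels, channels, 3)
    if shared:
        return stack + levels  # one scale factor per pyramid level
    return stack * levels


# Hypothetical configuration: 3 pyramid levels, 64 channels, 2 conv layers.
separate = head_params(levels=3, channels=64, layers=2, shared=False)
shared = head_params(levels=3, channels=64, layers=2, shared=True)
print(f"separate: {separate} weights, shared: {shared} weights")
```

Under these assumed sizes, the shared variant keeps roughly one third of the 3 × 3 convolution weights, which is the kind of redundancy reduction the shared design targets.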

3.5. Regression Loss Based on WIoU

After integrating the AFPN–CA module and LSCD head into YOLOv8n, Wise-IoU (WIoU) was incorporated to optimize bounding-box regression during training [22]. Unlike AFPN–CA and LSCD, WIoU only modifies the regression loss and does not change the Backbone, Neck, or Detection Head. Therefore, it does not affect the inference path or introduce additional parameters and computational cost.
Citrus disease regions are often small, irregularly distributed, and affected by blurred boundaries, occlusion, illumination variation, and background clutter, which may lead to unstable localization during training. For this reason, WIoU v3 was adopted to provide more adaptive regression optimization. WIoU v3 introduces a distance-aware penalty and a dynamic focusing mechanism to adjust regression gradients according to localization quality. The loss can be generally expressed as
$$L_{\mathrm{WIoUv3}} = r \, R_{\mathrm{WIoU}} \, L_{\mathrm{IoU}},$$
where $L_{\mathrm{IoU}}$ denotes the basic IoU loss, $R_{\mathrm{WIoU}}$ represents the distance-aware penalty related to the center distance between the predicted and ground-truth boxes, and $r$ is the dynamic focusing factor used to reweight samples according to localization quality. This mechanism helps reduce the influence of samples with extremely poor localization quality while maintaining effective gradients for samples with useful optimization value.
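A minimal single-box sketch of this loss is given below for illustration. It assumes corner-format boxes (x1, y1, x2, y2), an exponential distance penalty normalized by the smallest enclosing box, and a focusing factor of the form r = β / (δ · α^(β − δ)), where β is the ratio of the current IoU loss to its running mean, following the formulation commonly given for WIoU v3 [22]. The values α = 1.9 and δ = 3 and all function names are assumptions, not settings reported in this study.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v3(pred, gt, mean_iou_loss, alpha=1.9, delta=3.0):
    """Sketch of L = r * R_WIoU * L_IoU for one predicted/ground-truth pair.

    mean_iou_loss is the running mean of L_IoU over recent samples
    (maintained by the training loop; passed in here for simplicity).
    """
    l_iou = 1.0 - iou(pred, gt)

    # Squared distance between box centers.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2

    # Width and height of the smallest enclosing box (treated as constants
    # during training, i.e. no gradient flows through them).
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))

    # Outlier degree and non-monotonic focusing factor (assumed form).
    beta = l_iou / mean_iou_loss
    r = beta / (delta * alpha ** (beta - delta))
    return r * r_wiou * l_iou


print(wiou_v3((0, 0, 2, 2), (1, 1, 3, 3), mean_iou_loss=0.5))
```

With this assumed form, r equals 1 when β = δ and shrinks for very large β, which matches the described behavior of down-weighting samples with extremely poor localization quality.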
In LDTC-YOLO, WIoU complements AFPN–CA and LSCD from the perspective of regression optimization. AFPN–CA enhances disease-related feature representation, LSCD reduces detection-head redundancy, and WIoU helps improve bounding-box localization during training. Since WIoU is used only in the training stage, it does not affect the compact inference structure of the model.

3.6. Training Settings and Evaluation Metrics

To verify the effectiveness of the proposed model for citrus disease detection, all experiments were conducted under Ubuntu 22.04 LTS using Python 3.8 and the PyTorch framework [23]. Model training and testing were performed on a workstation equipped with a single NVIDIA GeForce RTX 4080 GPU with 16 GB memory and CUDA 12.2. All models were trained and evaluated under the same experimental environment, including the same input size, training epochs, optimizer settings, data augmentation strategies, and evaluation protocol, to ensure fair comparison.
During training, the input images were resized to 640 × 640, the batch size was set to 32, and the number of training epochs was set to 300. Stochastic Gradient Descent (SGD) was used as the optimizer, with an initial learning rate of 0.01, a momentum of 0.937, and a weight decay of 0.0005. A warmup period of 3 epochs was used, followed by a cosine annealing learning rate schedule. The number of data-loading workers was set to 2, and the random seed was fixed at 42. Data augmentation was applied only to the training set, while the validation and test sets remained unchanged.
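The reported settings and the shape of a warmup-plus-cosine schedule can be summarized in a short sketch. The dictionary below collects the values stated above, while the key names, the linear warmup shape, and the final learning-rate fraction (0.01) are assumptions, since the text does not specify them.

```python
import math

# Settings reported in the text; key names are illustrative.
TRAIN_CONFIG = {
    "imgsz": 640,
    "batch": 32,
    "epochs": 300,
    "optimizer": "SGD",
    "lr0": 0.01,
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "warmup_epochs": 3,
    "workers": 2,
    "seed": 42,
}


def lr_at_epoch(epoch, cfg=TRAIN_CONFIG, final_lr_fraction=0.01):
    """Learning rate for a 0-based epoch index.

    Linear warmup to lr0, then cosine annealing down to
    lr0 * final_lr_fraction (both shapes are assumptions).
    """
    base_lr = cfg["lr0"]
    warmup = cfg["warmup_epochs"]
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(1, cfg["epochs"] - warmup - 1)
    min_lr = base_lr * final_lr_fraction
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for e in (0, 2, 3, 150, 299):
    print(e, round(lr_at_epoch(e), 6))
```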
Precision (P), Recall (R), Average Precision (AP), and mean Average Precision (mAP) were used to evaluate detection performance. Precision and Recall are defined as
$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}},$$
where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. AP is calculated as the area under the Precision–Recall curve,
$$\mathrm{AP} = \int_{0}^{1} P(R)\, dR,$$
and mAP is computed as the average AP over all categories:
$$\mathrm{mAP} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{AP}_i,$$
where $N$ is the number of categories and $\mathrm{AP}_i$ denotes the average precision of the i-th category. In this study, mAP@0.5 and mAP@0.5:0.95 were reported. The former is calculated at an IoU threshold of 0.5, while the latter averages the results over IoU thresholds from 0.5 to 0.95 with a step size of 0.05.
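These definitions can be sketched in a few lines. The example below computes Precision and Recall from counts, approximates AP as the area under a monotone precision envelope of the PR curve (a common all-point interpolation, assumed here because the exact integration scheme is not stated), and averages AP over classes. All input numbers are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and Recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(recalls, precisions):
    """Area under the PR curve; points must be sorted by increasing recall."""
    r = [0.0] + list(recalls) + [1.0]
    p = [1.0] + list(precisions) + [0.0]
    # Build the envelope: precision becomes non-increasing in recall.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas between consecutive recall values.
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_average_precision(ap_per_class):
    """Average of per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical values for illustration only.
p_val, r_val = precision_recall(tp=90, fp=10, fn=30)
ap_val = average_precision([0.5, 1.0], [1.0, 0.5])
map_val = mean_average_precision([0.9, 0.8])
print(p_val, r_val, ap_val, map_val)
```

For mAP@0.5:0.95, the same computation would be repeated at each IoU threshold from 0.5 to 0.95 in steps of 0.05 and the results averaged.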
The lightweight and efficiency characteristics of the model were evaluated using the number of parameters (Params), floating-point operations (GFLOPs), model size, and inference speed. Params reflects the structural compactness of the model, GFLOPs represents the theoretical computational cost of a single forward pass, and model size indicates the storage requirement of the trained weights. Inference speed was measured using frames per second (FPS), which indicates the number of images processed per second under the same hardware and software environment. In this study, FPS was measured on the test set with an input size of 640 × 640 and a batch size of 1. The reported FPS values were obtained under identical inference settings for all compared models.
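A batch-size-1 FPS measurement of this kind can be sketched as follows. The timing procedure itself is not described in detail in the text, so the warm-up count, the function name `measure_fps`, and the placeholder inference callable are assumptions. On a GPU, the device would also need to be synchronized before each clock reading, which is omitted here.

```python
import time


def measure_fps(infer, inputs, warmup=10):
    """Images per second for a single-image inference callable.

    infer:  callable that processes one image (batch size 1).
    inputs: sequence of images (already resized, e.g. to 640 x 640).
    warmup: number of untimed calls made first to exclude start-up cost.
    """
    for x in inputs[:warmup]:
        infer(x)
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed if elapsed > 0 else float("inf")


# Placeholder workload standing in for a detector forward pass.
fps = measure_fps(lambda x: sum(range(1000)), list(range(200)))
print(f"{fps:.1f} FPS")
```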
In this study, a lightweight and efficient model refers to a model that reduces Params, GFLOPs, and model size while maintaining competitive detection accuracy and favorable inference speed. Since FPS is hardware-dependent, it was reported as a supplementary efficiency indicator together with Params, GFLOPs, and model size.

4. Experimental Results and Analysis

4.1. Overall Performance Analysis of the Proposed Model

To evaluate the overall performance of LDTC-YOLO for citrus leaf and fruit disease detection, the proposed model was compared with YOLOv8n, YOLOv8-GABNet, and YOLO-Citrus. YOLOv8n was used as the baseline, while YOLOv8-GABNet and YOLO-Citrus were selected as representative lightweight citrus disease detection models. For a fair comparison, all models were trained and tested on the HOCD-4 dataset using the same data partitioning, training settings, and testing conditions. The results are shown in Table 3.
As shown in Table 3, LDTC-YOLO achieved the best overall detection performance among the compared models, with Precision, Recall, mAP@0.5, and mAP@0.5:0.95 values of 0.915, 0.843, 0.894, and 0.648, respectively. Compared with YOLOv8n, LDTC-YOLO reduced the number of parameters, GFLOPs, and model size by approximately 37.2%, 8.6%, and 35.8%, respectively, while increasing the inference speed from 43.14 FPS to 47.45 FPS under the same testing conditions.
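The relative changes quoted above follow directly from the reported absolute values, as the short check below shows. The dictionary keys and the helper name are illustrative; the numbers are those reported for YOLOv8n and LDTC-YOLO.

```python
def relative_change(before, after):
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0


yolov8n = {"params_M": 3.006, "gflops": 8.1, "size_MB": 5.97, "fps": 43.14}
ldtc_yolo = {"params_M": 1.887, "gflops": 7.4, "size_MB": 3.83, "fps": 47.45}

changes = {k: relative_change(yolov8n[k], ldtc_yolo[k]) for k in yolov8n}
for name, value in changes.items():
    print(f"{name}: {value:+.1f}%")
```

The same arithmetic indicates an FPS gain of roughly 10% on the test hardware.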
Compared with YOLOv8-GABNet and YOLO-Citrus, LDTC-YOLO also obtained higher detection metrics with fewer parameters, lower GFLOPs, and a smaller model size. The improvement in Recall is relevant for citrus disease screening because missed detections may delay disease management, while the higher mAP@0.5:0.95 suggests more stable localization under stricter IoU thresholds.
Overall, the results indicate that AFPN–CA, LSCD, and WIoU provide complementary contributions to feature enhancement, detection-head compactness, and regression optimization. LDTC-YOLO therefore achieves a favorable balance between detection performance, model compactness, and inference efficiency. Although further validation on actual edge devices is still needed, the reduced model size and improved FPS suggest its potential for future resource-constrained orchard inspection applications.

4.2. Comparative Analysis of Different Detection Models

To further evaluate the detection performance and lightweight characteristics of LDTC-YOLO, the proposed model was compared with representative detection models, including Faster R-CNN, SSD, YOLOv5n, YOLOv8s, YOLOv8n, YOLOv10n, and YOLO11n. These models cover two-stage and one-stage detection frameworks as well as several lightweight YOLO variants. All models were trained and evaluated on the HOCD-4 dataset using the same data partitioning, training settings, and evaluation protocol. The results are shown in Table 4.
As shown in Table 4, Faster R-CNN and SSD achieved lower detection accuracy and substantially higher model complexity than the YOLO-series models under the current HOCD-4 setting. Faster R-CNN had a large model size of 158.11 MB and 134.0 GFLOPs, while SSD also required 92.14 MB of storage. These results indicate that, for the citrus disease detection task considered in this study, heavier two-stage detectors or earlier one-stage detectors are less favorable when both detection accuracy and model efficiency are considered.
Among the YOLO-series models, LDTC-YOLO achieved the highest Precision, Recall, mAP@0.5, and mAP@0.5:0.95, indicating better overall detection performance on HOCD-4. Compared with YOLOv10n and YOLO11n, LDTC-YOLO increased mAP@0.5:0.95 by 0.010 and 0.008, respectively. Although these margins are modest, LDTC-YOLO also obtained higher Precision and Recall, a smaller model size, and competitive inference speed. These results suggest improved robustness on the HOCD-4 dataset, which contains small disease regions, partial occlusion, overlapping targets, and background clutter.
From a lightweight perspective, YOLOv5n had the fewest parameters, the lowest GFLOPs, and the highest FPS among the compared models. However, its detection accuracy was lower than that of LDTC-YOLO, especially in mAP@0.5:0.95 (0.553 vs. 0.648), indicating weaker localization performance under stricter IoU thresholds. In contrast, LDTC-YOLO achieved the smallest model size and the best detection metrics, while maintaining an inference speed close to that of YOLOv5n. This suggests that LDTC-YOLO provides a more favorable trade-off between detection accuracy, model compactness, and inference efficiency.
Overall, the results indicate that the AFPN–CA feature enhancement module, LSCD detection head, and WIoU regression loss contribute complementary effects. AFPN–CA improves disease-related feature representation, LSCD reduces detection-head redundancy, and WIoU enhances bounding-box regression optimization during training. Within the experimental scope of this study, LDTC-YOLO can be considered a lightweight-oriented alternative to YOLOv8n for citrus leaf and fruit disease detection in real orchard images. Further validation on practical edge devices is still needed before making stronger deployment claims.

4.3. Ablation Experiment Analysis

To evaluate the contribution of each core component in LDTC-YOLO, ablation experiments were conducted on the HOCD-4 dataset. YOLOv8n was used as the baseline model, and AFPN, CA, LSCD, and WIoU were introduced individually and in selected combinations. Since CA is used in LDTC-YOLO as a post-fusion recalibration module for the AFPN output layers, the combination experiments were mainly organized around the AFPN-based structure. The single +C setting was included to evaluate the independent effect of CA, whereas C+L, C+W, and C+L+W were not considered because they do not correspond to the intended AFPN–CA coupling strategy. The results are shown in Table 5.
Here, A, C, L, and W denote AFPN, CA, LSCD, and WIoU, respectively. The symbol ✓ indicates that the corresponding module is used, whereas – indicates that it is not used.
As shown in Table 5, each component contributes to the model from different aspects. When introduced individually, CA and AFPN improved the detection metrics, indicating that key-region recalibration and cross-level feature fusion are beneficial for disease feature representation. LSCD reduced the number of parameters from 3.006 M to 2.362 M while improving mAP@0.5:0.95, showing its effectiveness in reducing detection-head redundancy. WIoU did not change the parameter count but improved mAP@0.5:0.95, which is consistent with its role as a training-level regression optimization component.
The combination results further show the complementarity of the proposed modules. The +A+C configuration achieved higher Precision and mAP@0.5:0.95 than the baseline, suggesting that AFPN-based multi-scale fusion and CA-based recalibration work jointly to enhance disease-related features. After LSCD was added, the parameter count decreased from 2.254 M to 1.887 M while competitive detection performance was maintained. With the further incorporation of WIoU, LDTC-YOLO achieved the best Precision, Recall, mAP@0.5, and mAP@0.5:0.95 values of 0.915, 0.843, 0.894, and 0.648, respectively.
Overall, AFPN enhances multi-scale feature representation, CA strengthens key disease-region responses, LSCD reduces detection-head redundancy, and WIoU optimizes bounding-box regression during training. These components are functionally complementary and enable LDTC-YOLO to achieve a favorable balance between detection performance and model compactness under real orchard imaging conditions.

4.4. Visualization and Qualitative Analysis

4.4.1. Confusion Matrix Analysis

To further evaluate class-wise recognition and inter-class confusion, normalized confusion matrices of YOLOv8n, YOLOv8-GABNet, YOLO-Citrus, and LDTC-YOLO were compared, as shown in Figure 6. Diagonal entries indicate the proportion of correctly recognized samples for each category, whereas off-diagonal entries reflect misclassification between disease categories or confusion with the background.
As shown in Figure 6, LDTC-YOLO achieved higher diagonal values for the four disease categories than the compared models, indicating improved class-wise recognition on the HOCD-4 test set. The correct recognition ratios for HLB, melanose, canker, and black spot were 0.87, 0.91, 0.85, and 0.83, respectively. The corresponding background-related entries were 0.12, 0.08, 0.14, and 0.13, respectively. These values indicate that some missed or background-confused detections still exist, but the overall background-related confusion is reduced compared with the other models. This result is consistent with the higher Recall and mAP values reported in the quantitative comparisons.
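For reference, a normalized confusion matrix of this kind can be obtained by dividing each entry by the total of its true-class column, so that the diagonal gives the proportion of correctly recognized samples per category. The sketch below assumes rows index predictions and columns index true classes, with the last row and column standing for background; this layout and the counts are hypothetical and are not taken from Figure 6.

```python
def normalize_by_true_class(counts):
    """Divide each entry by its column total (columns = true classes)."""
    n = len(counts)
    col_sums = [sum(counts[r][c] for r in range(n)) for c in range(n)]
    return [
        [counts[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]


# Hypothetical counts: two disease classes plus background.
# counts[pred][true]; the last index stands for background.
counts = [
    [80, 5, 6],
    [4, 90, 4],
    [16, 5, 0],
]
norm = normalize_by_true_class(counts)
for row in norm:
    print([round(v, 2) for v in row])
```

In this layout, the bottom entry of each disease column is the fraction of true instances that were missed, that is, assigned to background.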
Combined with the ablation results, the improved class-wise recognition can be partly explained by the complementary design of LDTC-YOLO. AFPN and CA enhance disease-related feature representation and reduce interference from non-target regions such as leaf veins, shadows, and background textures, while LSCD maintains prediction capability with fewer redundant parameters. WIoU further improves bounding-box regression optimization during training, which may help reduce localization-related errors for small or irregular disease regions.
Overall, the confusion matrix analysis supports the quantitative results by showing that LDTC-YOLO improves class-wise recognition and reduces background-related confusion on HOCD-4. These findings further support the effectiveness of LDTC-YOLO for citrus leaf and fruit disease detection in real orchard images.

4.4.2. Precision–Recall Curve Analysis

Class-wise Precision–Recall (PR) curves were used to further compare the detection behavior of different models on the HOCD-4 dataset, as shown in Figure 7. A curve closer to the upper-right region generally indicates a better precision–recall balance for the corresponding disease category.
As shown in Figure 7, LDTC-YOLO maintained competitive class-wise detection performance across the four disease categories. It achieved AP values of 0.906, 0.932, 0.875, and 0.876 for HLB, melanose, canker, and black spot, respectively. Compared with the other models, LDTC-YOLO obtained the highest AP for melanose, canker, and black spot, while its AP for HLB was comparable to YOLOv8-GABNet (0.906 vs. 0.905). The improvement was most evident for melanose, suggesting better detection stability for small and densely distributed disease regions. For HLB and canker, the gains were relatively modest but remained consistent with the overall quantitative results.
Overall, the class-wise PR curves further support the effectiveness of LDTC-YOLO in maintaining a favorable precision–recall balance across most disease categories in real orchard images.

4.4.3. Comparison of Detection Results

Based on the quantitative evaluation, several representative disease samples were selected to visually compare the ground-truth annotations, the detection results of YOLOv8n, and the detection results of LDTC-YOLO, as shown in Figure 8.
The selected samples contain typical close-range orchard imaging conditions, including small disease regions, background clutter, partial occlusion, and dense lesion distribution. As shown in Figure 8, YOLOv8n produced some false detections, missed detections, or duplicate prediction boxes, especially when lesion boundaries were unclear or background textures were visually similar to disease symptoms.
Compared with YOLOv8n, LDTC-YOLO produced predictions that were generally more consistent with the ground-truth annotations in the selected samples. For the black spot and melanose examples, LDTC-YOLO localized the main disease regions more compactly and reduced some background-related false responses. In samples with dense or partially occluded lesions, the proposed model retained the main disease regions with fewer duplicate predictions. This observation is consistent with the ablation results, where AFPN–CA contributed to disease-related feature representation and WIoU helped improve bounding-box regression optimization during training.
Overall, the visual comparison supports the quantitative results by showing improved localization and fewer false or duplicate predictions in the selected real orchard samples. These qualitative results further illustrate the effectiveness of LDTC-YOLO for citrus leaf and fruit disease detection in real orchard images.

4.4.4. Heatmap Analysis

Based on the comparison of detection results, heatmap visualization was further used to examine the regional feature responses of YOLOv8n and LDTC-YOLO on representative disease samples, as shown in Figure 9. In this study, the heatmaps were generated by averaging the feature responses across channels at the selected feature layer and then projecting the response map onto the original image. Unlike Grad-CAM, which uses the gradients of a target class score to weight feature maps and generate class-discriminative localization maps, this visualization reflects the general activation distribution of the selected layer rather than the class-specific evidence for a particular prediction. The color transition from cool to warm indicates increasing response intensity. The same visualization procedure and settings were used for both models.
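The channel-averaging step described above can be sketched as follows. The example uses plain nested lists for a feature map of shape C × H × W; the min-max normalization to [0, 1] and the function name are assumptions, and the upsampling and color overlay onto the original image are omitted.

```python
def activation_heatmap(feature_map):
    """Average a C x H x W feature map over channels and scale to [0, 1]."""
    c = len(feature_map)
    h = len(feature_map[0])
    w = len(feature_map[0][0])

    # Mean response across channels at every spatial position.
    mean = [
        [sum(feature_map[k][i][j] for k in range(c)) / c for j in range(w)]
        for i in range(h)
    ]

    # Min-max normalization so that values map onto a cool-to-warm color scale.
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = hi - lo
    if span == 0:
        return [[0.0] * w for _ in range(h)]
    return [[(v - lo) / span for v in row] for row in mean]


# Tiny hypothetical feature map with 2 channels and 2 x 2 spatial size.
fm = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
heat = activation_heatmap(fm)
print(heat)
```

Because no class score or gradient enters this computation, the resulting map reflects overall activation strength rather than class-specific evidence, in line with the distinction from Grad-CAM drawn above.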
As shown in Figure 9, YOLOv8n exhibited relatively scattered high-response regions in some selected samples, with additional activations appearing around leaf edges, background textures, or other non-target areas. In contrast, LDTC-YOLO produced more concentrated feature responses around the main disease regions, while the responses in surrounding non-target areas were reduced. These observations suggest that LDTC-YOLO provides better region-of-interest focus at the feature-response level in the selected real orchard samples.
Overall, the heatmap analysis provides qualitative interpretation of the regional activation patterns of the two models. It is consistent with the confusion matrix, PR curve, and detection result comparisons, further supporting that LDTC-YOLO improves disease-region focus and reduces background-related activation in representative real orchard images.

5. Discussion

5.1. Model Effectiveness Under Real Orchard Imaging Conditions

Citrus disease detection in real close-range orchard images differs from general object detection because disease regions are usually small, locally distributed, and easily affected by branch and leaf occlusion, fruit overlap, illumination variation, shadows, and background clutter. In addition, different disease categories may show similar color or texture characteristics, while the same disease may present different visual appearances under different growth stages and imaging conditions. These factors increase the difficulty of disease-region localization and class discrimination.
The experimental results show that LDTC-YOLO improves detection performance while maintaining a compact and efficient model profile. Compared with YOLOv8n, the proposed model improved Precision, Recall, mAP@0.5, and mAP@0.5:0.95, while reducing the number of parameters from 3.006 M to 1.887 M, GFLOPs from 8.1 to 7.4, and model size from 5.97 MB to 3.83 MB. In addition, the inference speed increased from 43.14 FPS to 47.45 FPS under the same testing conditions. These results suggest that the combination of AFPN, CA, LSCD, and WIoU provides a favorable balance among feature representation, localization optimization, model compactness, and inference efficiency.
The improvements can be interpreted from the complementary roles of the proposed components. AFPN strengthens cross-level feature fusion and helps coordinate disease features with different scales. CA further recalibrates key feature layers and enhances disease-related responses in spatially informative regions. LSCD reduces repeated convolutional computation in the detection head, thereby decreasing parameter redundancy, while WIoU improves bounding-box regression optimization during training without increasing inference cost. Therefore, LDTC-YOLO is not designed as a simple stacking of modules, but as a problem-oriented structure for balancing detection accuracy and model compactness in real orchard images.

5.2. Implications for Citrus Orchard Disease Monitoring

From an agricultural application perspective, improving Recall and localization stability is important for orchard disease screening because missed detections may delay disease management. The results of this study indicate that LDTC-YOLO can provide a compact detection baseline for identifying visible citrus disease symptoms on both leaves and fruits in close-range orchard images. Such a model may support future smartphone-based or edge-assisted orchard inspection systems after further deployment validation. However, the current results should be interpreted as an offline evaluation on the HOCD-4 dataset rather than evidence of completed field deployment.

5.3. Limitations and Future Work

Despite the favorable results, several limitations should be acknowledged. First, although HOCD-4 was collected from 12 orchard bases, all images were acquired from the main production area of Bingtang sweet orange in Yongxing County, Hunan Province. Therefore, the cross-regional generalization of the model has not been fully verified. Second, the dataset includes only four typical citrus disease categories and does not distinguish disease development stages or severity levels. Other citrus diseases, pest symptoms, nutrient deficiencies, and non-disease stress symptoms were not included, which may limit the applicability of the model in broader orchard scenarios. In practical orchard management, the same disease may present different visual symptoms at early, middle, and late stages, and stage-aware recognition may provide more useful information for disease monitoring and control decisions. Third, although disease categories were confirmed with the assistance of agricultural experts, the annotation process was mainly based on visible symptoms in images. Additional diagnostic evidence, such as pathogen detection or laboratory confirmation, was not used in this study. Fourth, all experiments were conducted in an offline server environment, and the model has not yet been deployed or evaluated on smartphones or embedded edge devices. Therefore, practical inference latency, memory consumption, energy efficiency, and usability in real orchard inspection remain to be further evaluated.
Future work will focus on addressing these limitations. The dataset will be expanded to include more citrus disease categories, pest symptoms, nutrient deficiencies, healthy samples, and samples from different geographic regions, cultivars, seasons, and acquisition devices. In addition, disease development stages and severity levels will be annotated to support stage-aware citrus disease recognition, including the identification of early and late disease symptoms. Independent cross-region and cross-season tests will also be conducted to evaluate model generalization. Finally, real-device deployment on smartphones and edge platforms will be performed to assess inference speed, memory usage, energy consumption, and practical usability in orchard inspection scenarios. These efforts will further improve the robustness, generalization ability, and practical applicability of LDTC-YOLO for intelligent citrus orchard management.

6. Conclusions

To address the challenges of small disease regions, background clutter, and model compactness in citrus leaf and fruit disease detection, this study proposed LDTC-YOLO, a lightweight YOLOv8n-based detection model for real orchard images. The model integrates AFPN for multi-scale feature fusion, CA for key-region recalibration, LSCD for reducing detection-head redundancy, and WIoU for bounding-box regression optimization during training.
Experimental results on the HOCD-4 dataset show that LDTC-YOLO achieved Precision, Recall, mAP@0.5, and mAP@0.5:0.95 values of 0.915, 0.843, 0.894, and 0.648, respectively. Compared with YOLOv8n, the proposed model reduced the number of parameters to 1.887 M, GFLOPs to 7.4, and model size to 3.83 MB, while increasing the inference speed to 47.45 FPS under the same testing conditions. Comparative experiments, ablation studies, and visualization analyses further showed that the proposed components contribute to feature representation, detection-head compactness, and localization optimization.
Overall, LDTC-YOLO provides a compact and effective approach for detecting visible citrus leaf and fruit disease symptoms in real orchard images. However, further validation across regions, seasons, disease categories, acquisition devices, and actual edge platforms is still needed. Future work will focus on expanding HOCD-4 and evaluating real-device deployment to improve the generalization and practical applicability of the proposed model.

Author Contributions

Conceptualization, B.J. and B.G.; methodology, B.G.; software, B.G.; validation, B.G.; formal analysis, B.G.; investigation, B.G.; resources, B.J.; data curation, B.G. and W.W.; writing—original draft preparation, B.G.; writing—review and editing, W.W. and B.J.; visualization, B.G.; supervision, B.J.; project administration, B.J.; funding acquisition, B.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Fundamental Research Program (Grant No. JCYJ20230807094104009).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available on GitHub at https://github.com/AlanPikaso/HOCD-4.git.

Acknowledgments

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Arutselvan, R.; Pati, K.; Dolatabadian, A.; Dutta, S.K. Citrus Diseases and Management. In Recent Advances in Citrus Fruits; Springer International Publishing: Cham, 2023; pp. 501–526.
  2. Gottwald, T.R.; Aubert, B.; Xue-Yuan, Z. Preliminary Analysis of Citrus Greening (Huanglungbin) Epidemics in the People’s Republic of China and French Reunion Island. Phytopathology 1989, 79, 687–693.
  3. Ali, S.; Hameed, A.; Muhae-Ud-Din, G.; Ikhlaq, M.; Ashfaq, M.; Atiq, M.; Ali, F.; Zia, Z.U.; Naqvi, S.A.H.; Wang, Y. Citrus Canker: A Persistent Threat to the Worldwide Citrus Industry—An Analysis. Agronomy 2023, 13.
  4. Torsoni, G.B.; Aparecido, L.E.d.O.; Baratti, A.C.C.; Rossi, M.F.d.M.; Lorençone, P.A.; Lorençone, J.A. Climatic zoning and future projections of citrus black spot in Brazil: adaptive strategies for sustainable citrus farming in the face of climate change. J. Plant Pathol. 2025, 107, 1745–1758.
  5. Barbedo, J.G.A. A review on the main challenges in automatic plant disease identification based on visible range images. Biosyst. Eng. 2016, 144, 52–60.
  6. Habib, A.; Abdullah, A.; Puyam, A. Visual Estimation: A Classical Approach for Plant Disease Estimation. In Trends in Plant Disease Assessment; Springer Nature Singapore: Singapore, 2022; pp. 19–45.
  7. da Silva, J.C.F.; Silva, M.C.; Luz, E.J.S.; Delabrida, S.; Oliveira, R.A.R. Using Mobile Edge AI to Detect and Map Diseases in Citrus Orchards. Sensors 2023, 23.
  8. Barbedo, J.G. Factors influencing the use of deep learning for plant disease recognition. Biosyst. Eng. 2018, 172, 84–91.
  9. Faisal, S.; Javed, K.; Ali, S.; Alasiry, A.; Marzougui, M.; Khan, M.A.; Cha, J.H. Deep Transfer Learning Based Detection and Classification of Citrus Plant Diseases. Comput. Mater. Contin. 2023, 76, 895–914.
  10. Raut, S.C.; Kasat, N.N. A Review: Citrus Disease Detection Using Machine Learning Approach. In Proceedings of the 2024 2nd DMIHER International Conference on Artificial Intelligence in Healthcare, Education and Industry (IDICAIEI), 2024; pp. 1–5.
  11. Upadhyay, A.; Chandel, N.S.; Singh, K.P.; Chakraborty, S.K.; Nandede, B.M.; Kumar, M.; Subeesh, A.; Upendar, K.; Salem, A.; Elbeltagi, A. Deep learning and computer vision in plant disease detection: a comprehensive review of techniques, models, and trends in precision agriculture. Artif. Intell. Rev. 2025, 58, 92.
  12. Wijaya, R.S.; Santonius, S.; Wibisana, A.; Jamzuri, E.R.; Nugroho, M.A.B. Comparative Study of YOLOv5, YOLOv7 and YOLOv8 for Robust Outdoor Detection. J. Appl. Electr. Eng. 2024.
  13. Peng, G.; Wang, K.; Ma, J.; Cui, B.; Wang, D. AGRI-YOLO: A Lightweight Model for Corn Weed Detection with Enhanced YOLO v11n. Agriculture 2025, 15.
  14. Zhao, X.; Chi, J.; Wang, F.; Li, X.; Yuwen, X.; Li, T.; Shi, Y.; Xiao, L. YOLO-MSPM: A Precise and Lightweight Cotton Verticillium Wilt Detection Network. Agriculture 2025, 15.
  15. Dai, Q.; Xiao, Y.; Lv, S.; Song, S.; Xue, X.; Liang, S.; Huang, Y.; Li, Z. YOLOv8-GABNet: An Enhanced Lightweight Network for the High-Precision Recognition of Citrus Diseases and Nutrient Deficiencies. Agriculture 2024, 14.
  16. Feng, W.; Liu, J.; Li, Z.; Lyu, S. YOLO-Citrus: a lightweight and efficient model for citrus leaf disease detection in complex agricultural environments. Front. Plant Sci. 2025, 16–2025.
  17. Chen, X. BD-YOLOv8: a lightweight method for real-time citrus disease detection in precision agriculture. Appl. Fruit. Sci. 2025, 67, 173.
  18. Tzutalin, HumanSignal. LabelImg. Available online: https://github.com/HumanSignal/labelImg (accessed on 2026-04-17).
  19. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO Version 8.0.0, computer software. 2023. Available online: https://github.com/ultralytics/ultralytics.
  20. Yin, B. Lightweight fire detection algorithm based on LSCD-FasterC2f-YOLOv8. In Proceedings of the 2024 5th International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), 2024; pp. 64–67.
  21. Wu, Y.; He, K. Group Normalization. CoRR 2018, abs/1803.08494.
  22. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:cs.
  23. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. CoRR 2019, abs/1912.01703.
Figure 1. Representative images of typical citrus diseases in HOCD-4. Each disease category includes one leaf image and one fruit image. From left to right, the disease categories are black spot, canker, HLB, and melanose.
Figure 2. Overall architecture of the proposed LDTC-YOLO model.
Figure 3. Schematic diagram of the AFPN-based progressive cross-level feature fusion structure.
Figure 4. Schematic diagram of the Coordinate Attention module for spatial–channel feature recalibration.
Figure 5. Structure of the Lightweight Shared Convolutional Detection Head.
Figure 6. Normalized confusion matrices of different models on the HOCD-4 dataset.
Figure 7. Class-wise Precision–Recall curve comparison of different models on the HOCD-4 dataset. Each subplot corresponds to one citrus disease category, and each curve represents one detection model.
Figure 8. Visual comparison of detection results on representative citrus disease samples from HOCD-4. The three columns show ground-truth annotations, YOLOv8n predictions, and LDTC-YOLO predictions, respectively.
Figure 9. Heatmap comparison between YOLOv8n and LDTC-YOLO on representative disease samples from HOCD-4. The heatmaps show channel-averaged feature response distributions at the selected feature layer.
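The caption of Figure 9 describes the heatmaps as channel-averaged feature responses. A minimal sketch of that operation is given below in plain Python. The function name `channel_average_heatmap`, the min-max scaling to [0, 1] for display, and the toy input are assumptions made for illustration; the caption does not state how the averaged map was normalized or which layer was selected.

```python
def channel_average_heatmap(feature_map):
    """Average a C x H x W feature map over channels, then min-max scale to [0, 1].

    `feature_map` is a nested list indexed as [channel][row][column].
    The min-max scaling step is an assumption, not taken from the paper.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])

    # Mean response across channels at every spatial position.
    mean_map = [
        [sum(feature_map[c][y][x] for c in range(channels)) / channels
         for x in range(width)]
        for y in range(height)
    ]

    lo = min(min(row) for row in mean_map)
    hi = max(max(row) for row in mean_map)
    span = hi - lo
    if span == 0:
        # Constant response: return zeros rather than dividing by zero.
        return [[0.0] * width for _ in range(height)]
    return [[(v - lo) / span for v in row] for row in mean_map]


# Toy 2-channel, 2 x 2 feature map (made-up values, not model output).
toy = [
    [[0.0, 2.0], [4.0, 6.0]],
    [[2.0, 4.0], [6.0, 8.0]],
]
heatmap = channel_average_heatmap(toy)
print(heatmap)
```

In practice the resulting map would be resized to the input resolution and overlaid on the image; those steps are omitted here.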
Table 1. Distribution of images in the training, validation, and test sets of HOCD-4.

| Category | Training Set | Validation Set | Test Set | Total | Proportion (%) |
|---|---|---|---|---|---|
| HLB | 906 | 113 | 113 | 1,132 | 32.24 |
| Melanose | 670 | 84 | 83 | 837 | 23.84 |
| Canker | 640 | 80 | 80 | 800 | 22.79 |
| Black Spot | 594 | 74 | 74 | 742 | 21.13 |
| Total | 2,810 | 351 | 350 | 3,511 | 100.00 |
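As a quick consistency check on Table 1, the class proportions and the split fractions can be recomputed from the image counts. The sketch below is illustrative only; the dictionary layout and variable names are assumptions and do not come from the authors' code.

```python
# Image counts per class as (train, validation, test), copied from Table 1.
counts = {
    "HLB": (906, 113, 113),
    "Melanose": (670, 84, 83),
    "Canker": (640, 80, 80),
    "Black Spot": (594, 74, 74),
}

total = sum(sum(c) for c in counts.values())

# Share of the whole dataset contributed by each class, in percent.
proportions = {
    name: round(100 * sum(c) / total, 2) for name, c in counts.items()
}

# Number of images in each split, and each split's fraction of the dataset.
split_totals = [sum(c[i] for c in counts.values()) for i in range(3)]
split_ratios = [round(s / total, 3) for s in split_totals]

print(total, proportions, split_totals, split_ratios)
```

The recomputed values agree with the Total and Proportion columns, and the split fractions come out close to 0.8, 0.1, and 0.1.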
Table 2. Parameter settings for data augmentation.

| Augmentation Method | Parameter Setting | Main Purpose |
|---|---|---|
| Mosaic augmentation | mosaic=1.0 | Background and scale diversity |
| Horizontal flipping | fliplr=0.5 | Orientation variation |
| HSV perturbation | hsv_h=0.015, hsv_s=0.7, hsv_v=0.4 | Color and illumination variation |
| Translation and scaling | translate=0.1, scale=0.5 | Position and scale variation |
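The parameter names in Table 2 match training hyperparameters of the Ultralytics YOLO package [19], so the settings can be collected into one dictionary and passed as keyword arguments. The sketch below is a hedged illustration: the helper `validate_fractions` is a hypothetical sanity check suited to these particular values rather than a statement of the library's accepted ranges, and the dataset file name `hocd4.yaml` in the commented usage is an assumption. Other training settings are not listed in this table and are left out.

```python
# Augmentation settings copied from Table 2; keys follow the table's names.
augmentation = {
    "mosaic": 1.0,     # mosaic augmentation setting
    "fliplr": 0.5,     # horizontal flip setting
    "hsv_h": 0.015,    # hue perturbation
    "hsv_s": 0.7,      # saturation perturbation
    "hsv_v": 0.4,      # value (brightness) perturbation
    "translate": 0.1,  # translation
    "scale": 0.5,      # scaling
}


def validate_fractions(settings):
    """Return True if every value lies in [0, 1]; raise ValueError otherwise.

    Hypothetical helper for illustration only.
    """
    for key, value in settings.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{key}={value} is outside [0, 1]")
    return True


print(validate_fractions(augmentation))

# Hypothetical usage with the Ultralytics package (not run here):
# from ultralytics import YOLO
# model = YOLO("yolov8n.yaml")
# model.train(data="hocd4.yaml", **augmentation)
```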
Table 3. Overall performance comparison of different models on the HOCD-4 dataset.

| Model | P | R | mAP@0.5 | mAP@0.5:0.95 | Params (M) | GFLOPs | Model Size (MB) | FPS |
|---|---|---|---|---|---|---|---|---|
| YOLOv8n | 0.829 | 0.805 | 0.866 | 0.589 | 3.006 | 8.1 | 5.97 | 43.14 |
| YOLOv8-GABNet [15] | 0.887 | 0.799 | 0.880 | 0.627 | 2.746 | 7.9 | 5.53 | 40.01 |
| YOLO-Citrus [16] | 0.857 | 0.786 | 0.867 | 0.618 | 2.177 | 7.6 | 4.52 | 24.38 |
| LDTC-YOLO | 0.915 | 0.843 | 0.894 | 0.648 | 1.887 | 7.4 | 3.83 | 47.45 |
Table 4. Performance comparison of different detection models on the HOCD-4 dataset.

| Model | P | R | mAP@0.5 | mAP@0.5:0.95 | Params (M) | GFLOPs | Model Size (MB) | FPS |
|---|---|---|---|---|---|---|---|---|
| Faster R-CNN | 0.584 | 0.804 | 0.789 | 0.509 | 41.310 | 134.0 | 158.11 | 8.33 |
| SSD | 0.579 | 0.652 | 0.648 | 0.413 | 24.150 | 30.6 | 92.14 | 6.32 |
| YOLOv5n | 0.840 | 0.799 | 0.846 | 0.553 | 1.765 | 4.1 | 3.90 | 48.21 |
| YOLOv8s | 0.879 | 0.770 | 0.859 | 0.585 | 11.127 | 28.4 | 22.50 | 44.13 |
| YOLOv8n | 0.829 | 0.805 | 0.866 | 0.589 | 3.006 | 8.1 | 5.97 | 43.14 |
| YOLOv10n | 0.879 | 0.817 | 0.883 | 0.638 | 2.266 | 7.8 | 5.51 | 46.18 |
| YOLO11n | 0.879 | 0.826 | 0.891 | 0.640 | 2.583 | 7.7 | 5.23 | 41.09 |
| LDTC-YOLO | 0.915 | 0.843 | 0.894 | 0.648 | 1.887 | 7.4 | 3.83 | 47.45 |
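The efficiency figures reported for YOLOv8n and LDTC-YOLO can also be read as relative changes against the baseline. The following sketch computes them from the tabulated values; the dictionary keys and the helper `relative_change` are illustrative names, not part of the original work.

```python
# Baseline and proposed-model figures copied from the comparison tables.
yolov8n = {"params_m": 3.006, "gflops": 8.1, "size_mb": 5.97, "fps": 43.14}
ldtc_yolo = {"params_m": 1.887, "gflops": 7.4, "size_mb": 3.83, "fps": 47.45}


def relative_change(new, old):
    """Signed change of `new` relative to `old`, in percent."""
    return 100.0 * (new - old) / old


# Negative values are reductions; positive values are increases.
changes = {
    key: round(relative_change(ldtc_yolo[key], yolov8n[key]), 1)
    for key in yolov8n
}
print(changes)
```

With these inputs, parameters fall by about 37%, GFLOPs by about 9%, and model size by about 36%, while FPS rises by about 10%.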
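Table 5. Ablation experiment results on the HOCD-4 dataset.

| Method | A | C | L | W | P | R | mAP@0.5 | mAP@0.5:0.95 | Params (M) |
|---|---|---|---|---|---|---|---|---|---|
| YOLOv8n | | | | | 0.829 | 0.805 | 0.866 | 0.589 | 3.006 |
| +C | | ✓ | | | 0.895 | 0.800 | 0.886 | 0.628 | 3.011 |
| +A | ✓ | | | | 0.903 | 0.799 | 0.885 | 0.635 | 2.251 |
| +L | | | ✓ | | 0.890 | 0.803 | 0.884 | 0.625 | 2.362 |
| +W | | | | ✓ | 0.881 | 0.803 | 0.877 | 0.626 | 3.006 |
| +L+W | | | ✓ | ✓ | 0.868 | 0.806 | 0.882 | 0.632 | 2.362 |
| +A+L | ✓ | | ✓ | | 0.881 | 0.802 | 0.882 | 0.635 | 1.885 |
| +A+W | ✓ | | | ✓ | 0.892 | 0.804 | 0.883 | 0.638 | 2.251 |
| +A+L+W | ✓ | | ✓ | ✓ | 0.887 | 0.807 | 0.884 | 0.640 | 1.885 |
| +A+C | ✓ | ✓ | | | 0.911 | 0.807 | 0.889 | 0.642 | 2.254 |
| +A+C+W | ✓ | ✓ | | ✓ | 0.900 | 0.816 | 0.890 | 0.643 | 2.254 |
| +A+C+L | ✓ | ✓ | ✓ | | 0.892 | 0.821 | 0.889 | 0.641 | 1.887 |
| LDTC-YOLO | ✓ | ✓ | ✓ | ✓ | 0.915 | 0.843 | 0.894 | 0.648 | 1.887 |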
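To make the ablation easier to read, the change in mAP@0.5:0.95 and in parameter count relative to YOLOv8n can be computed for the single-module variants and the full model. This is a minimal sketch over values copied from Table 5. Reading A, C, L, and W as AFPN, CA, LSCD, and WIoU is an assumption based on the modules named in the abstract, and the variable names are illustrative.

```python
# (mAP@0.5:0.95, Params in M) per configuration, copied from Table 5.
ablation = {
    "YOLOv8n": (0.589, 3.006),
    "+C": (0.628, 3.011),
    "+A": (0.635, 2.251),
    "+L": (0.625, 2.362),
    "+W": (0.626, 3.006),
    "LDTC-YOLO": (0.648, 1.887),
}

base_map, base_params = ablation["YOLOv8n"]

# Difference from the baseline as (delta mAP@0.5:0.95, delta Params in M).
deltas = {
    name: (round(m - base_map, 3), round(p - base_params, 3))
    for name, (m, p) in ablation.items()
    if name != "YOLOv8n"
}

for name, (d_map, d_params) in deltas.items():
    print(f"{name:10s} mAP@0.5:0.95 {d_map:+.3f}  Params {d_params:+.3f} M")
```

On these numbers, each single module raises mAP@0.5:0.95 by roughly 0.04 to 0.05, +A and +L also cut parameters, +W leaves the parameter count unchanged, and the full model gains 0.059 with 1.119 M fewer parameters.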
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated