Submitted:
14 June 2024
Posted:
14 June 2024
Abstract

Keywords:
1. Introduction
2. Materials and Methods
2.1. Dataset Construction
2.2. YOLOv8 Algorithm
2.3. Improved YOLOv8 Algorithm
- (1) The Bottleneck in the C2f module is replaced with the RFMD Module, which consists of the CBS module, RFCA Conv, and the MDFA module proposed in this paper. MDFA attends to features along the pixel-level, spatial, and channel dimensions, enhancing the extraction of effective channel information while integrating global and local spatial information. RFCA Conv focuses on the important local information within each receptive field, enables the model to locate defect positions more precisely during detection, and addresses the parameter-sharing issue inherent in standard convolutions.
- (2) The CBS modules at P3, P4, and P5 of the backbone network, as well as the CBS modules in the neck network, are replaced with the ADown downsampling structure. ADown combines several downsampling operations to extract features, preventing the loss of important features while reducing model size and computational complexity.
- (3) The original YOLOv8 loss function, CIOU Loss, is replaced with NWD Loss, which improves detection precision for small targets.
2.3.1. MDFA Attention
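The exact layer composition of MDFA is given in the full text; as a rough illustration only, a minimal PyTorch-style sketch of an attention block that re-weights features along the channel, spatial, and pixel-level dimensions is shown below. The branch structure, layer sizes, and combination by elementwise multiplication are assumptions for illustration, not the authors' definitive MDFA design.

```python
import torch
import torch.nn as nn

class MDFASketch(nn.Module):
    """Illustrative multi-dimension attention (assumed structure, not the paper's exact MDFA):
    channel, spatial, and pixel-level branches each produce weights that re-scale the input."""
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        mid = max(1, channels // reduction)
        # Channel branch: global context -> one weight per channel
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.SiLU(),
            nn.Conv2d(mid, channels, 1), nn.Sigmoid(),
        )
        # Spatial branch: channel-pooled maps -> one weight per spatial position
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2), nn.Sigmoid(),
        )
        # Pixel-level branch: one weight per pixel and channel from a 1x1 convolution
        self.pixel_conv = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        ca = self.channel_fc(x)                                    # (B, C, 1, 1)
        sa = self.spatial_conv(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1))  # (B, 1, H, W)
        pa = self.pixel_conv(x)                                    # (B, C, H, W)
        return x * ca * sa * pa
```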
2.3.2. RFCA Conv in the RFMD Module
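RFCA Conv combines the receptive-field attention convolution of RFAConv [30] with Coordinate Attention (CA) [17]. The receptive-field feature generation is not reproduced here; the sketch below shows only the CA component, which encodes position-aware channel attention by pooling separately along the height and width directions. It follows the published CA design, with the reduction ratio and activation chosen for illustration.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention [17]: direction-aware pooling followed by per-row and
    per-column attention weights (sketch; reduction ratio is an assumption)."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                           # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        y = torch.cat([xh, xw], dim=2)                # shared transform over both directions
        y = self.act(self.bn1(self.conv1(y)))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                        # (B, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))    # (B, C, 1, W)
        return x * ah * aw
```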
2.3.3. Lightweight Downsampling Structure ADown
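A minimal sketch of the ADown block is given below, following the structure described in the YOLOv9 work [31]: the input is lightly smoothed by average pooling, split in half along the channel dimension, and downsampled by two complementary paths (a strided 3×3 convolution and a max-pooling path) before concatenation. The CBS helper is assumed to match the Conv-BatchNormalization-SiLU block used throughout the model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CBS(nn.Module):
    """Conv-BatchNorm-SiLU block (assumed helper matching the paper's CBS module)."""
    def __init__(self, c_in, c_out, k=1, s=1, p=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()
    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ADown(nn.Module):
    """2x downsampling via two parallel paths, so a single pooling scheme
    does not discard important features."""
    def __init__(self, c_in, c_out):
        super().__init__()
        half = c_out // 2
        self.cv1 = CBS(c_in // 2, half, k=3, s=2, p=1)   # strided-convolution path
        self.cv2 = CBS(c_in // 2, half, k=1, s=1, p=0)   # max-pooling path
    def forward(self, x):
        x = F.avg_pool2d(x, 2, 1, 0, False, True)        # kernel 2, stride 1: light smoothing
        x1, x2 = x.chunk(2, dim=1)                       # split channels into two halves
        x1 = self.cv1(x1)                                # branch 1: 3x3 conv, stride 2
        x2 = F.max_pool2d(x2, 3, 2, 1)                   # branch 2: 3x3 max pool, stride 2
        x2 = self.cv2(x2)
        return torch.cat((x1, x2), dim=1)
```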
2.3.4. Normalized Wasserstein Distance (NWD) Loss Function [32]
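NWD models each bounding box as a two-dimensional Gaussian and measures similarity through the second-order Wasserstein distance between the two Gaussians, which, unlike IoU, remains informative even when small boxes do not overlap [32]. A minimal sketch of the resulting loss is shown below; the normalizing constant `c` is dataset-dependent, and the value used here is only a placeholder.

```python
import torch

def nwd_loss(pred, target, c=12.8, eps=1e-7):
    """Normalized Wasserstein Distance loss between axis-aligned boxes [32].
    pred, target: (N, 4) tensors in (cx, cy, w, h) format.
    c: dataset-dependent normalizing constant (placeholder value; the original
    work ties it to the average size of the targets)."""
    # Model each box as a 2-D Gaussian N(center, diag(w/2, h/2)^2); the squared
    # second-order Wasserstein distance between two such Gaussians has a closed form.
    w2_sq = ((pred[:, 0] - target[:, 0]) ** 2
             + (pred[:, 1] - target[:, 1]) ** 2
             + ((pred[:, 2] - target[:, 2]) / 2) ** 2
             + ((pred[:, 3] - target[:, 3]) / 2) ** 2)
    nwd = torch.exp(-torch.sqrt(w2_sq + eps) / c)   # similarity in (0, 1]
    return 1.0 - nwd                                # per-box loss
```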
2.4. Training Environment and Evaluation Metrics
2.4.1. Training Environment
2.4.2. Evaluation Metrics
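The metric formulas are not reproduced in this excerpt; the standard definitions assumed throughout the results are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i
```

Here mAP50 denotes mAP evaluated at an IoU threshold of 0.5, and mAP50:95 averages mAP over IoU thresholds from 0.5 to 0.95 in steps of 0.05.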
3. Results
3.1. Performance Comparison of Various Object Detection Models
3.2. Detection Performance Comparison of Different Attention Modules
3.3. Results of the YOLOv8 Model Before and After Improvement
3.3.1. Comparison of mAP50 and mAP50:95 Before and After Improvement
3.3.2. Confusion Matrix
3.4. Performance Comparison of Ablation Experiments
3.5. Visualization Analysis of Detection Results from Different Models
4. Discussion
5. Conclusions
- (1) This study proposed the MDFA module, which selects important feature information along the pixel-level, channel, and spatial dimensions. Feature maps processed by MDFA not only carry more effective channel information but also contain global and local information in the spatial dimension. The CBS module, RFCA Conv, and MDFA module together form the RFMD Module, which replaces the Bottleneck in the original YOLOv8 C2f module to create RFMD-C2f; RFMD-C2f is applied at the positions of the original C2f modules, where RFCA Conv focuses on important local features at the receptive-field level and locates disease positions more precisely. The ADown downsampling structure replaces the CBS modules at P3, P4, and P5 of the original YOLOv8 backbone and in the neck network, combining several downsampling and feature-extraction strategies to avoid the loss of important information that a single downsampling method might cause, while reducing model size and computational complexity. NWD Loss replaces the original CIOU loss function, improving detection precision for small disease features through a new similarity measure.
- (2) The improved lightweight model was compared experimentally with other mainstream detection models. The results show that YOLOv8-RFMD improves mAP50 by 2.9% and mAP50:95 by 4.3% relative to the original model, while reducing model size by 0.53 MB and FLOPs by 0.3 G. The improved model is compact, with FLOPs of only 7.8 G, meeting the deployment requirements of mobile embedded devices. It provides technical support for intelligent spraying equipment for mulberry leaves and offers more precise disease diagnosis for mulberry gardens and related professionals.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Full Name |
|---|---|
| YOLOv8 | You Only Look Once version 8 |
| MDFA | Multi Dimension Feature Attention |
| CBS | Conv-BatchNormalization-SiLU |
| C2f | Faster Implementation of CSP Bottleneck with 2 convolutions |
| C3 | CSP Bottleneck with 3 convolutions |
| NWD | Normalized Wasserstein Distance |
| IoU | Intersection over Union |
| mAP | Mean average precision |
| mAP50 | mAP at an IoU threshold of 50% |
| mAP50:95 | mAP averaged over IoU thresholds from 50% to 95% |
| VGG | Visual Geometry Group |
| CNN | Convolutional neural network |
| Faster R-CNN | Faster region-based convolutional neural networks |
| SPP | Spatial pyramid pooling |
| CSP | Cross Stage Partial |
| ELAN | Efficient Layer Aggregation Network |
| SPPF | Spatial pyramid pooling fusion |
| PAN | Path aggregation network |
| FPN | Feature pyramid network |
| DF Loss | Distribution focal loss |
| CIOU | Complete intersection over union |
| BCE | Binary cross-entropy |
| SE | Squeeze-and-excitation |
| ECA | Efficient Channel Attention |
| CBAM | Convolutional Block Attention Module |
| UNAP | Un-average pooling |
| RFCA | Receptive-Field Coordinate Attention |
| CA | Coordinate Attention |
| SSD | Single Shot Multibox Detector |
| MLCA | Mixed Local Channel Attention |
| EMA | Efficient Multi-Scale Attention |
| LSKA | Large Separable Kernel Attention |
| FLOPs | Floating-point operations |
References
- Rohela, G.K.; Shukla, P.; Kumar, R.; Chowdhury, S.R. Mulberry (Morus spp.): An ideal plant for sustainable development. Trees, Forests and People, 2020, 2, 100011. [Google Scholar] [CrossRef]
- Reddy, M.P.; Deeksha, A. Mulberry leaf disease detection using yolo. International Journal of Advance Research, Ideas and Innovations in Technology, 2021, 7, 3. [Google Scholar]
- Gnanesh, B.N.; Arunakumar, G.S.; Tejaswi, A.; Supriya, M.; Pappachan, A.; Harshitha, M.M. Molecular Diagnostics of Soil-Borne and Foliar Diseases of Mulberry: Present Trends and Future Perspective. The Mulberry Genome, 2023, 215–241. [Google Scholar]
- Ngugi, H.N.; Ezugwu, A.E.; Akinyelu, A.A.; Abualigah, L. Revolutionizing crop disease detection with computational deep learning: a comprehensive review. Environmental Monitoring and Assessment, 2024, 196, 302. [Google Scholar] [CrossRef] [PubMed]
- Javidan, S.M.; Banakar, A.; Vakilian, K.A.; Ampatzidis, Y. Diagnosis of grape leaf diseases using automatic K-means clustering and machine learning. Smart Agricultural Technology, 2023, 3, 100081. [Google Scholar] [CrossRef]
- Sladojevic, S.; Arsenovic, M.; Anderla, A.; Culibrk, D.; Stefanovic, D. Deep neural networks based recognition of plant diseases by leaf image classification. Computational Intelligence and Neuroscience, 2016, 2016, 3289801. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Rangarajan, A.K.; Purushothaman, R.; Ramesh, A. Tomato crop disease classification using pre-trained deep learning algorithm. Procedia computer science, 2018, 133, 1040–1047. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Nahiduzzaman, M.; Chowdhury, M.E.H.; Salam, A.; Nahid, E.; Ahmed, F.; AL-Emadi, N.; Ayari, M.A.; Khandakar, A.; Haider, J. Explainable deep learning model for automatic mulberry leaf disease classification. Frontiers in Plant Science, 2023, 14, 1175515. [Google Scholar] [CrossRef]
- Waheed, A.; Goyal, M.; Gupta, D.; Khanna, A.; Hassanien, A.E.; Pandey, H.M. An optimized dense convolutional neural network model for disease recognition and classification in corn leaf. Computers and Electronics in Agriculture, 2020, 175, 105456. [Google Scholar] [CrossRef]
- Wen, C.; He, W.; Wu, W.; Liang, X.; Yang, J.; Nong, H.; Lan, Z. Recognition of mulberry leaf diseases based on multi-scale residual network fusion SENet. PLoS ONE, 2024, 19, e0298700. [Google Scholar]
- Xue, Z.; Xu, R.; Bai, D.; Lin, H. YOLO-tea: A tea disease detection model improved by YOLOv5. Forests, 2023, 14, 415. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Part I. [Google Scholar]
- Li, Y.; Sun, S.; Zhang, C.; Yang, G.; Ye, Q. One-stage disease detection method for maize leaf based on multi-scale feature fusion. Applied Sciences, 2022, 12, 7960. [Google Scholar] [CrossRef]
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Nashville, TN, USA, 20-25 June 2021. [Google Scholar]
- Nie, X.; Wang, L.; Ding, H.; Xu, M. Strawberry verticillium wilt detection network based on multi-task learning and attention. IEEE access, 2019, 7, 170003–170011. [Google Scholar] [CrossRef]
- Dwivedi, R.; Dey, S.; Chakraborty, C.; Tiwari, S. Grape disease detection network based on multi-task learning and attention features. IEEE Sensors Journal, 2021, 21, 17573–17580. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Vancouver, BC, Canada, 17-24 June 2023. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, Honolulu, HI, USA, 21-26 July 2017. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, 18-23 June 2018. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, Seattle, WA, USA, 13-19 June 2020. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the 38th International Conference on Machine Learning, virtual, 18–24 July 2021. [Google Scholar]
- Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating spatial attention and standard convolutional operation. arXiv 2023, arXiv:2304.03198. [Google Scholar]
- Wang, C.Y.; Yeh, I.H.; Liao, H.Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
- Wang, J.; Xu, C.; Yang, W.; Yu, L. A normalized Gaussian Wasserstein distance for tiny object detection. arXiv 2021, arXiv:2110.13389. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE international conference on computer vision, Venice, Italy, 22-29 October 2017. [Google Scholar]
- Wan, D.; Lu, R.; Shen, S.; Xu, T.; Lang, X.; Ren, Z. Mixed local channel attention for object detection. Engineering Applications of Artificial Intelligence, 2023, 123, 106442. [Google Scholar] [CrossRef]
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient multi-scale attention module with cross-spatial learning. In Proceedings of the ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 04-10 June 2023. [Google Scholar]
- Lau, K.W.; Po, L.M.; Rehman, Y.A.U. Large separable kernel attention: Rethinking the large kernel attention design in CNN. Expert Systems with Applications, 2024, 236, 121352. [Google Scholar] [CrossRef]
- Lan, Y.; Sun, B.; Zhang, L.; Zhao, D. Identifying diseases and pests in ginger leaf under natural scenes using improved YOLOv5s. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE). 2024, 40, 210–246. [Google Scholar]

| Model | Precision(%) | Recall(%) | mAP50(%) | mAP50:95(%) | Model size(MB) | FLOPs(G) |
|---|---|---|---|---|---|---|
| YOLOv8-RFMD | 92.6 | 89.5 | 94.3 | 67.8 | 5.45 | 7.8 |
| YOLOv8n | 90.1 | 84.8 | 91.4 | 63.5 | 5.98 | 8.1 |
| YOLOv7-tiny | 90.8 | 88.1 | 92.2 | 56.1 | 11.7 | 13.2 |
| YOLOv5s | 90.1 | 85.1 | 91.6 | 61.8 | 5.04 | 7.1 |
| Faster R-CNN | 79.4 | 83.2 | 86.4 | 59.5 | 314 | 954 |
| SSD | 59.1 | 57.3 | 60.7 | 43.5 | 60.3 | 162 |
| RetinaNet | 64.2 | 61.7 | 64.5 | 46.3 | 338 | 150 |

| Attention | Precision(%) | Recall(%) | mAP50(%) | mAP50:95(%) | Model size(MB) | FLOPs(G) |
|---|---|---|---|---|---|---|
| MDFA | 92.6 | 89.5 | 94.3 | 67.8 | 5.45 | 7.8 |
| MLCA | 91.6 | 88.6 | 93.8 | 66.6 | 5.46 | 7.8 |
| EMA | 92.4 | 87.8 | 93.6 | 66.8 | 5.49 | 7.8 |
| LSKA | 92.5 | 88.8 | 93.9 | 66.9 | 5.60 | 7.8 |
| SE | 92.3 | 88.3 | 93.8 | 66.1 | 5.47 | 7.8 |
| ECA | 92.4 | 88.0 | 93.6 | 66.7 | 5.45 | 7.8 |
| CBAM | 91.7 | 88.8 | 93.8 | 67.0 | 5.49 | 7.8 |

| Test | MDFA | RFCA Conv | ADown | NWD Loss | Precision (%) | Recall (%) | mAP50 (%) | mAP50:95 (%) | Model size (MB) | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | - | - | - | - | 90.1 | 84.8 | 91.4 | 63.5 | 5.98 | 8.1 |
| 2 | √ | - | - | - | 90.7 | 86.3 | 92.4 | 64.3 | 5.99 | 8.1 |
| 3 | - | √ | - | - | 90.0 | 86.5 | 92.4 | 64.9 | 6.22 | 8.5 |
| 4 | √ | √ | - | - | 90.5 | 86.2 | 92.6 | 65.4 | 6.23 | 8.5 |
| 5 | - | - | √ | - | 90.4 | 86.2 | 92.3 | 64.9 | 5.20 | 7.4 |
| 6 | - | - | - | √ | 91.4 | 86.2 | 92.4 | 63.9 | 5.98 | 8.1 |
| 7 | √ | √ | √ | √ | 92.6 | 89.5 | 94.3 | 67.8 | 5.45 | 7.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
