Submitted: 14 August 2025
Posted: 15 August 2025
Abstract
Keywords:
1. Introduction
- Small Target Detection: Residual peel fragments are typically tiny, often measuring less than 20 × 20 pixels in a high-resolution (1920 × 1080) frame. Standard detectors tend to lose the features of such small objects during successive down-sampling operations;
- Low Contrast and Complex Backgrounds: The texture and color of the peel can be remarkably similar to those of the taro flesh itself. Furthermore, the conveyor belts used in processing plants often carry grid-like or striped patterns that are easily misidentified as peel fragments, leading to a high false positive rate;
- Dense and Occluded Scenes: Multiple peel fragments commonly appear in close proximity, forming dense clusters. These fragments may also be partially occluded by the main body of the taro or by other debris on the conveyor.

The main contributions of this study are as follows:

- The proposal of MDB-YOLO, a novel, lightweight architecture based on YOLOv8s. MDB-YOLO integrates six enhancements (C2f_EMA, DySample, ODConv2d, BiFPN_Concat2, the WIoU loss, and Soft-NMS) designed specifically to improve the detection of small, irregular, and low-contrast targets while maintaining a low computational footprint;
- The development and public release of the TPID, a challenging, real-world benchmark dataset captured in an operational food processing facility. This dataset, with its detailed annotations, serves to validate our model and facilitate future research in industrial object detection;
- A comprehensive empirical evaluation, including extensive ablation studies and comparisons with other state-of-the-art (SOTA) models. The results demonstrate that MDB-YOLO achieves a superior balance of detection accuracy and computational efficiency, establishing it as a highly effective and practical solution for this industrial application.
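The small-target challenge listed above can be made concrete with a little arithmetic. The Python sketch below assumes, since the input size and detection strides are not given in this section, that frames are resized from a 1920-pixel long side to the 640-pixel default input of YOLOv8 and that detection runs at the default strides of 8, 16 and 32; `footprint` is a hypothetical helper.

```python
# Side length, in feature-map cells, of a small square object after resizing
# and down-sampling. Input size and strides are ASSUMED YOLOv8 defaults, not
# settings confirmed for MDB-YOLO.
SOURCE_LONG_SIDE = 1920  # TPID frame width in pixels
INPUT_LONG_SIDE = 640    # assumed network input size (YOLOv8 default)
STRIDES = (8, 16, 32)    # P3, P4 and P5 detection levels


def footprint(object_px: float) -> dict[int, float]:
    """Return the object's side length in cells at each detection stride."""
    resized = object_px * INPUT_LONG_SIDE / SOURCE_LONG_SIDE
    return {stride: resized / stride for stride in STRIDES}


if __name__ == "__main__":
    for stride, cells in footprint(20).items():
        print(f"stride {stride:>2}: {cells:.2f} cells per side")
```

Under these assumptions a 20 × 20 pixel fragment spans less than one cell even on the finest (stride-8) map, which illustrates why its features are easily diluted; the figures change if the network is trained at a larger input size.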
2. The MDB-YOLO Method
2.1. Overall Network Architecture
2.2. Enhanced Feature Representation with C2f_EMA
2.3. High-Resolution Feature Recovery via DySample
2.4. Adaptive Kernel Learning with ODConv2d
2.5. Efficient Cross-Scale Integration with BiFPN_Concat2
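As a point of reference, BiFPN (introduced with EfficientDet) weights each incoming feature map with a learnable, non-negative scalar and normalizes the weights so that they sum to roughly one ("fast normalized fusion"). A two-input concatenation node in this spirit can be sketched in Python as below. Feature maps are represented as plain lists of flattened channels, and the epsilon value, the ReLU clamp and the helper name `bifpn_concat2` are assumptions; the BiFPN_Concat2 module used in MDB-YOLO may differ in detail.

```python
# Fast normalized fusion followed by channel concatenation (BiFPN-style).
# Feature maps are lists of flattened channels; the weights would be learnable
# parameters in a real network but are plain floats in this sketch.
Channel = list[float]
FeatureMap = list[Channel]

EPS = 1e-4  # assumed stabilizing constant


def bifpn_concat2(inputs: tuple[FeatureMap, FeatureMap],
                  weights: tuple[float, float]) -> FeatureMap:
    """Scale each input by its normalized weight, then concatenate channels."""
    # ReLU keeps the fusion weights non-negative, as in EfficientDet.
    clamped = [max(0.0, w) for w in weights]
    total = sum(clamped) + EPS
    normalized = [w / total for w in clamped]
    fused: FeatureMap = []
    for w, fmap in zip(normalized, inputs):
        fused.extend([w * v for v in channel] for channel in fmap)
    return fused


if __name__ == "__main__":
    top_down = [[1.0, 1.0], [2.0, 2.0]]  # 2 channels of 2 pixels
    lateral = [[4.0, 4.0]]               # 1 channel of 2 pixels
    out = bifpn_concat2((top_down, lateral), weights=(1.0, 3.0))
    print(len(out), [round(v, 3) for v in out[2]])
```

Unlike a plain concatenation, such a node can learn to emphasize one input (in the example, the lateral branch receives about three quarters of the total weight) at the cost of only two extra parameters per node.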
2.6. Optimizing Bounding Box Regression with WIoU Loss
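For reference, the v1 form of Wise-IoU (Tong et al., 2023) multiplies the IoU loss (1 - IoU) by a distance-based factor R_WIoU = exp(((x - x_gt)² + (y - y_gt)²) / (W_g² + H_g²)), where (x, y) and (x_gt, y_gt) are the box centres and W_g, H_g are the width and height of the smallest box enclosing both. The Python sketch below implements only this v1 form for a single box pair; which WIoU version MDB-YOLO uses (the later v3 adds a dynamic focusing coefficient) is not established here, and the function names are illustrative.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def wiou_v1(pred: Box, gt: Box) -> float:
    """Wise-IoU v1 loss: R_WIoU * (1 - IoU)."""
    # Size of the smallest box enclosing both boxes.
    wg = max(pred[2], gt[2]) - min(pred[0], gt[0])
    hg = max(pred[3], gt[3]) - min(pred[1], gt[1])
    # Offset between the two box centres.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # In training, (wg^2 + hg^2) is detached from the gradient graph;
    # plain floats make that distinction moot here.
    r_wiou = math.exp((dx * dx + dy * dy) / (wg * wg + hg * hg))
    return r_wiou * (1.0 - iou(pred, gt))


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    print(wiou_v1(gt, gt))                                # perfect prediction
    print(round(wiou_v1((2.0, 0.0, 12.0, 10.0), gt), 4))  # shifted prediction
```

Because the centre distance cannot exceed the diagonal of the enclosing box, R_WIoU stays between 1 and e, so the factor amplifies the loss of poorly centred predictions by at most a factor of e.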
2.7. Improving Detection in Dense Scenes with Soft-NMS
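Soft-NMS (Bodla et al., 2017) replaces the hard suppression step of standard NMS: rather than deleting every box whose IoU with a higher-scoring detection exceeds a threshold, it decays that box's score, which helps retain adjacent fragments in dense clusters. The single-class Python sketch below uses the Gaussian decay exp(-IoU²/σ); the values σ = 0.5 and the 0.001 score cut-off are illustrative defaults rather than settings reported for MDB-YOLO, and `soft_nms` is a hypothetical helper.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = tuple[Box, float]            # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(dets: list[Detection], sigma: float = 0.5,
             score_thresh: float = 0.001) -> list[Detection]:
    """Gaussian Soft-NMS: decay, rather than delete, overlapping boxes."""
    remaining = list(dets)
    kept: list[Detection] = []
    while remaining:
        # Promote the highest-scoring remaining detection.
        best = max(range(len(remaining)), key=lambda i: remaining[i][1])
        top_box, top_score = remaining.pop(best)
        kept.append((top_box, top_score))
        # Decay every other score by its overlap with the promoted box and
        # drop detections whose score falls below the cut-off.
        decayed = []
        for box, score in remaining:
            score *= math.exp(-iou(top_box, box) ** 2 / sigma)
            if score >= score_thresh:
                decayed.append((box, score))
        remaining = decayed
    return kept


if __name__ == "__main__":
    a = ((0.0, 0.0, 10.0, 10.0), 0.9)
    b = ((2.0, 0.0, 12.0, 10.0), 0.8)  # overlaps a with IoU = 2/3
    for box, score in soft_nms([a, b]):
        print(box, round(score, 3))
```

Hard NMS with an IoU threshold of 0.5 would discard the second box outright; Soft-NMS keeps it with its score reduced to about 0.33, so a genuinely distinct neighbouring fragment can still be reported at a suitable confidence threshold.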
3. Results
3.1. Experimental Setup
3.1.1. The TPID
3.1.2. Evaluation Metrics
- Precision: Measures the accuracy of positive predictions. It is the ratio of true positives (TP) to the total number of predicted positives (TP + FP), where FP denotes false positives;
- Recall: Measures the model's ability to find all relevant instances. It is the ratio of TP to the total number of actual positives (TP + FN), where FN denotes false negatives;
- mean Average Precision (mAP): The primary metric for evaluating object detection performance. The Average Precision (AP) for a single class is the area under the precision-recall curve, and the mAP is the average of the AP values across all classes. Because this study has only one class ("peel fragment"), AP and mAP are equivalent;
- mAP@0.5: The mAP calculated at a fixed Intersection over Union (IoU) threshold of 0.5. This metric is often used to evaluate general detection capability;
- mAP@0.5:0.95: The mean of the mAP values calculated at ten IoU thresholds from 0.5 to 0.95 in steps of 0.05. This metric provides a more stringent evaluation of localization accuracy;
- Parameters (M): The total number of learnable parameters in the model, in millions. This is a key indicator of model size and memory footprint;
- GFLOPs: Giga floating-point operations (10⁹ FLOPs), i.e., the number of floating-point operations required for a single forward pass. It is an operation count that quantifies the model's computational complexity, not a throughput measured per second.
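The metric definitions above can be tied together in a short single-class sketch in Python. It assumes greedy matching of score-sorted predictions to unmatched ground-truth boxes within one image and computes AP by all-point interpolation of the precision envelope. Evaluation toolkits differ in such details (COCO-style evaluators, for example, sample the curve at 101 recall points), so this is illustrative rather than the exact evaluation code behind the reported results; all function names are hypothetical.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)
Prediction = tuple[Box, float]           # (box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def match(preds: list[Prediction], gts: list[Box], thr: float) -> list[bool]:
    """TP/FP flag per prediction, in descending score order."""
    used = [False] * len(gts)
    flags = []
    for box, _ in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not used[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            used[best_j] = True  # each ground truth is matched at most once
        flags.append(best_j >= 0)
    return flags


def precision_recall(flags: list[bool], n_gt: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) = TP / n_gt."""
    tp = sum(flags)
    precision = tp / len(flags) if flags else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall


def average_precision(flags: list[bool], n_gt: int) -> float:
    """Area under the precision envelope (all-point interpolation)."""
    if n_gt == 0:
        return 0.0
    tp = fp = 0
    recalls, precisions = [0.0], [1.0]
    for is_tp in flags:
        tp += int(is_tp)
        fp += int(not is_tp)
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # Make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangles wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def map_50_95(preds: list[Prediction], gts: list[Box]) -> float:
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95 (single class)."""
    thresholds = [t / 100 for t in range(50, 100, 5)]
    aps = [average_precision(match(preds, gts, t), len(gts)) for t in thresholds]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    gts = [(0.0, 0.0, 10.0, 10.0), (20.0, 0.0, 30.0, 10.0)]
    preds = [((0.0, 0.0, 10.0, 10.0), 0.9),    # exact hit
             ((21.0, 0.0, 31.0, 10.0), 0.8),   # IoU = 90/110, about 0.82
             ((50.0, 50.0, 60.0, 60.0), 0.3)]  # false positive
    flags = match(preds, gts, 0.5)
    print(precision_recall(flags, len(gts)))
    print(average_precision(flags, len(gts)), map_50_95(preds, gts))
```

In this toy example the second prediction counts as a true positive up to an IoU threshold of 0.80 and as a false positive from 0.85 upwards, so its AP drops from 1.0 to 0.5 at the three strictest thresholds and mAP@0.5:0.95 comes out at 0.85, whereas mAP@0.5 is 1.0. This is why mAP@0.5:0.95 rewards tight localization more than mAP@0.5 does.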
3.1.3. Implementation and Training Details
3.2. Ablation Studies
3.3. Comparative Analysis with SOTA Models
3.4. Qualitative and Visual Analysis
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| MDB-YOLO | Multi-Dimensional Bionic YOLO |
| DySample | Dynamic Upsampling |
| ODConv2d | Omni-Dimensional Dynamic Convolution |
| WIoU | Wise-IoU |
| TPID | Taro Peel Industrial Dataset |
| FDA | Food and Drug Administration |
| EFSA | European Food Safety Authority |
| AI | Artificial intelligence |
| YOLO | You Only Look Once |
| SOTA | State-of-the-art |
| NMS | Non-Maximum Suppression |
| SE | Squeeze-and-Excitation |
| EMA | Efficient Multi-Scale Attention |
| FPN | Feature Pyramid Networks |
| BiFPN | Bidirectional FPN |
| CIoU | Complete IoU |
| TP | True positive |
| AP | Average Precision |
| mAP | mean Average Precision |

| Characteristic | Value |
|---|---|
| Data Source | Operational Taro Processing Line (Dongguan Deying Food Machinery) |
| Image Resolution | 1920 × 1080 pixels |
| Total Images (Original / Augmented) | 282 / 1056 |
| Train/Val/Test Split (Images) | 739 / 212 / 105 |
| Total Instances | 18,341 |
| Train/Val/Test Split (Instances) | 13,071 / 3,225 / 2,045 |
| Annotation Tool | X-Anylabeling |
| Key Visual Challenges | Small targets (<20 px), low contrast, dense clustering, occlusion, complex background textures, variable lighting, motion blur |

| Model ID | Modifications | Parameters (M) | GFLOPs | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|---|
| 1 | YOLOv8s (Baseline) | 11.14 | 28.6 | 90.8 | 65.7 |
| 2 | Model 1 + C2f_EMA + WIoU | 11.18 | 29.3 | 91.4 | 65.2 |
| 3 | Model 2 + DySample | 11.20 | 29.3 | 91.6 | 65.8 |
| 4 | Model 3 + BiFPN_Concat2 | 11.20 | 29.3 | 90.7 | 65.4 |
| 5 | Model 4 + Soft-NMS | 11.20 | 29.3 | 90.9 | 68.6 |
| 6 | Model 5 + ODConv2d | 13.44 | 28.4 | 91.2 | 68.3 |
| 7 | MDB-YOLO (Model 6 + Hyperparameter Tuning) | 13.44 | 28.4 | 92.1 | 69.7 |

| Model | Parameters (M) | GFLOPs | Inference Time (ms) | Precision (%) | Recall (%) | mAP@0.5 (%) | mAP@0.5:0.95 (%) |
|---|---|---|---|---|---|---|---|
| YOLOv8s | 11.14 | 28.6 | 3.4 | 89.2 | 85.2 | 90.8 | 65.7 |
| YOLOv9-C | 25.53 | 103.7 | 7.3 | 90.1 | 84.5 | 90.9 | 68.3 |
| RT-DETR-L | 32.87 | 108 | 7.3 | 90.8 | 86.4 | 92.1 | 67.6 |
| MDB-YOLO (ours) | 13.44 | 28.4 | 2.0 | 90.9 | 88 | 92.1 | 69.7 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).