Submitted:
13 September 2024
Posted:
14 September 2024
Abstract
Keywords:
1. Introduction
2. Materials and Methods
2.1. Materials of Dataset
2.2. IRB-YOLO Network
Algorithm 1: Training Algorithm of IRB-YOLO
2.3. Efficiency Components
2.4. Inverted Residual Block
2.5. Complete IoU: More Efficient Regression Loss Function
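The body of this section is not present in this text, so the paper's exact formulation cannot be confirmed here. As a minimal sketch, assuming the commonly used Complete IoU definition (IoU term, normalised centre-distance term, and aspect-ratio term), the loss for a single pair of axis-aligned boxes could be computed as follows. The function name `ciou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabiliser are illustrative assumptions, not taken from the source.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Complete IoU loss for one predicted box and one ground-truth box.

    Both boxes are (x1, y1, x2, y2) tuples with x2 > x1 and y2 > y1.
    This follows the commonly used CIoU definition; it is an illustrative
    sketch, not the authors' implementation.
    """
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h = x2 - x1, y2 - y1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Plain IoU: overlap area divided by union area.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    union = w * h + gw * gh - inter
    iou = inter / (union + eps)

    # Squared distance between box centres.
    rho2 = ((x1 + x2) / 2 - (gx1 + gx2) / 2) ** 2 \
         + ((y1 + y2) / 2 - (gy1 + gy2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both boxes.
    enclose_w = max(x2, gx2) - min(x1, gx1)
    enclose_h = max(y2, gy2) - min(y1, gy1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the centre-distance term still gives a useful signal when the two boxes do not overlap at all, which is the usual motivation for preferring CIoU in box regression.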
3. Results
3.1. Comparison with Mainstream Segmentation Models
3.2. Ablation Experiment
3.3. Multi-Species Mixed Dataset Experiments
3.4. Heatmap Analysis Based on EigenCAM
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
| Format | Model | Metadata | Argument |
|---|---|---|---|
| PyTorch | Taoism.pt | | - |
| TorchScript | Taoism.torchScript | | imgsz, optimize |
| ONNX | Taoism.onnx | | imgsz, half, dynamic, simplify, opset |
| OpenVINO | Taoism.openvino_model | | imgsz, half, int8 |
| TensorRT | Taoism.engine | | imgsz, half, dynamic, simplify, workspace |
| TF Lite | Taoism.tflite | | imgsz, half, int8 |
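
The export arguments listed in the table above can be checked before an export is attempted. The sketch below is a hedged illustration: the helper `build_export_kwargs`, the lowercase format keys (`torchscript`, `onnx`, `openvino`, `engine`, `tflite`), and the value `imgsz=640` are assumptions introduced here. The keys follow the Ultralytics export convention, which the table appears to mirror; the source does not state which toolkit was used.

```python
# Arguments accepted per export format, transcribed from the table above.
# The dictionary keys are assumed format identifiers, not taken from the source.
EXPORT_ARGS = {
    "torchscript": {"imgsz", "optimize"},
    "onnx": {"imgsz", "half", "dynamic", "simplify", "opset"},
    "openvino": {"imgsz", "half", "int8"},
    "engine": {"imgsz", "half", "dynamic", "simplify", "workspace"},  # TensorRT
    "tflite": {"imgsz", "half", "int8"},
}


def build_export_kwargs(fmt, **kwargs):
    """Return export keyword arguments after validating them against the table."""
    allowed = EXPORT_ARGS.get(fmt)
    if allowed is None:
        raise ValueError(f"unknown export format: {fmt!r}")
    unknown = set(kwargs) - allowed
    if unknown:
        raise ValueError(f"{fmt} does not accept: {sorted(unknown)}")
    return {"format": fmt, **kwargs}


# Illustrative values only; the source does not give the settings that were used.
kwargs = build_export_kwargs("onnx", imgsz=640, half=True, simplify=True)

# Assuming the weights were trained with the Ultralytics package, the export
# itself would then look roughly like this (not executed here):
#   from ultralytics import YOLO
#   YOLO("Taoism.pt").export(**kwargs)
```

Validating up front catches mismatches such as passing `opset` to a TF Lite export before any model is loaded.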
References
1. Shen, S.; Li, D.; Mei, L.; Xu, C.; Ye, Z.; Zhang, Q.; Hong, B.; Yang, W.; Wang, Y. DFA-Net: Multi-scale dense feature-aware network via integrated attention for unmanned aerial vehicle infrared and visible image fusion. Drones 2023, 7, 517.
2. Zhang, R.; Luo, B.; Su, X.; Liu, J. GA-Net: Accurate and Efficient Object Detection on UAV Images Based on Grid Activations. Drones 2024, 8, 74.
3. Zhang, Z. Drone-YOLO: An Efficient Neural Network Method for Target Detection in Drone Images. Drones 2023, 7, 526.
4. Lu, J.; Chen, W.; Lan, Y.; Qiu, X.; Huang, J.; Luo, H. Design of citrus peel defect and fruit morphology detection method based on machine vision. Computers and Electronics in Agriculture 2024, 108721.
5. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. IEEE Transactions on Pattern Analysis & Machine Intelligence.
6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 2015, 28.
7. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015; pp. 3431–3440.
8. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Transactions on Pattern Analysis & Machine Intelligence.
9. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. IEEE 2016.
10. Lin, G.; Milan, A.; Shen, C.; Reid, I. RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
11. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. IEEE Computer Society 2016.
12. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
13. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 568–578.
14. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 10012–10022.
15. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. IEEE Computer Society 2014.
16. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis & Machine Intelligence 2014, 37, 1904–1916.
17. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv e-prints, 2018.
18. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015.
19. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Computer Science 2014, 357–361.
20. Bengio, Y.; Schwenk, H.; Senécal, J.S.; Morin, F.; Gauvain, J.L. Neural Probabilistic Language Models. The Journal of Machine Learning Research 2003, 3, 1137–1155.
21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems 2017, 30.
22. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861, 2017.
23. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 4510–4520.
24. Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. UnitBox: An Advanced Object Detection Network. ACM 2016.
25. Rezatofighi, H.; Tsoi, N.; Gwak, J.Y.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. IEEE 2019.
26. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976, 2022.
27. Muhammad, M.B.; Yeasin, M. Eigen-CAM: Class activation map using principal components. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN); IEEE, 2020; pp. 1–7.
28. Mai, Y.; Zheng, J.; Luo, Z.; Yu, C.; Lu, J.; Yu, C.; Lin, Z.; Liao, Z. Taoism-Net: A Fruit Tree Segmentation Model Based on Minimalism Design for UAV Camera. Agronomy 2024, 14, 1155.

| Model | Param (M) | FLOPs (G) | Latency (ms) | Precision (%) | Recall (%) | mAP50 (%) |
|---|---|---|---|---|---|---|
| YOLOv3 [17] | 103.67 | 282.2 | 5.4 | 71.5 | 70.4 | 70.5 |
| YOLOv3-tiny [17] | 12.13 | 18.9 | 1.6 | 65.9 | 70.1 | 68.5 |
| YOLOv3-spp [17] | 104.71 | 283.1 | 7.0 | 66.0 | 73.4 | 70.8 |
| YOLOv5-m | 25.05 | 64.0 | 4.0 | 70.7 | 70.0 | 72.2 |
| YOLOv5-l | 53.13 | 134.7 | 5.1 | 72.5 | 67.6 | 70.9 |
| YOLOv6-m [26] | 51.98 | 161.1 | 4.7 | 71.5 | 71.8 | 73.9 |
| YOLOv6-l [26] | 110.86 | 391.2 | 7.0 | 69.5 | 71.5 | 72.5 |
| YOLOv8-m | 27.22 | 110.0 | 6.4 | 69.7 | 72.2 | 73.8 |
| YOLOv8-l | 45.91 | 220.1 | 6.8 | 71.6 | 70.6 | 74.0 |
| IRB-YOLO-v2 | 28.26 | 177.9 | 9.1 | 75.9 | 71.6 | 77.3 |
| IRB-YOLO-v5 | 31.05 | 230.5 | 8.5 | 75.3 | 73.0 | 77.9 |
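
To make the comparison in the table above easier to read, the short sketch below derives two summary figures from the reported precision, recall, and mAP50: an F1 score and the mAP50 gain over a chosen baseline. F1 is not reported in the source; it is computed here only as a derived illustration, and the names `RESULTS`, `f1`, and `summarize` are hypothetical. Only a subset of rows is transcribed.

```python
# (precision %, recall %, mAP50 %) transcribed from the comparison table above.
RESULTS = {
    "YOLOv6-m": (71.5, 71.8, 73.9),
    "YOLOv8-m": (69.7, 72.2, 73.8),
    "YOLOv8-l": (71.6, 70.6, 74.0),
    "IRB-YOLO-v2": (75.9, 71.6, 77.3),
    "IRB-YOLO-v5": (75.3, 73.0, 77.9),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


def summarize(results, baseline):
    """Return the F1 score and mAP50 gain over `baseline` for every model."""
    base_map = results[baseline][2]
    return {
        name: {
            "f1": round(f1(p, r), 2),
            "map50_gain": round(m - base_map, 2),
        }
        for name, (p, r, m) in results.items()
    }


summary = summarize(RESULTS, baseline="YOLOv8-l")
for name, stats in summary.items():
    print(f"{name:12s} F1={stats['f1']:.2f}  mAP50 gain={stats['map50_gain']:+.1f}")
```

Relative to YOLOv8-l, the strongest baseline by mAP50 in the table, IRB-YOLO-v5 gains 3.9 points of mAP50 (77.9 versus 74.0), at the cost of higher latency (8.5 ms versus 6.8 ms).
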

| Strategies | Param (M) | FLOPs (G) | Precision (%) | mAP50 (%) |
|---|---|---|---|---|
| Backbone–YOLOv8-l | 45.91 | 220.1 | 71.6 | 74.0 |
| Backbone–IRB-YOLO | 31.05 | 230.5 | 70.3 | 76.7 |
| Backbone–IRB-YOLO and CIoU | 31.05 | 230.5 | 75.3 | 77.9 |
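
The ablation rows above can be read as two successive changes: swapping the backbone, then adding the CIoU loss. The sketch below computes the step-by-step differences from the tabulated values; the names `ABLATION` and `stepwise_deltas` are illustrative and not from the source.

```python
# (strategy, params in M, FLOPs in G, precision %, mAP50 %) from the table above.
ABLATION = [
    ("Backbone-YOLOv8-l", 45.91, 220.1, 71.6, 74.0),
    ("Backbone-IRB-YOLO", 31.05, 230.5, 70.3, 76.7),
    ("Backbone-IRB-YOLO and CIoU", 31.05, 230.5, 75.3, 77.9),
]


def stepwise_deltas(rows):
    """Compare each row with the one before it.

    Parameter and FLOP changes are relative (percent); precision and mAP50
    changes are absolute (percentage points).
    """
    deltas = []
    for prev, cur in zip(rows, rows[1:]):
        deltas.append({
            "step": f"{prev[0]} -> {cur[0]}",
            "param_pct": round(100 * (cur[1] - prev[1]) / prev[1], 1),
            "flops_pct": round(100 * (cur[2] - prev[2]) / prev[2], 1),
            "precision_pts": round(cur[3] - prev[3], 1),
            "map50_pts": round(cur[4] - prev[4], 1),
        })
    return deltas


deltas = stepwise_deltas(ABLATION)
for d in deltas:
    print(d)
```

Read this way, the backbone swap cuts parameters by about 32.4% and raises mAP50 by 2.7 points while precision drops by 1.3 points; adding CIoU leaves model size unchanged and adds 5.0 points of precision and 1.2 points of mAP50.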
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
