Submitted: 26 September 2024
Posted: 26 September 2024
Abstract
Keywords:
1. Introduction
- Datasets: Two observations on datasets emerge from the evaluation and analysis. First, the performance of YOLO models trained and evaluated on different datasets can be disparate; in particular, YOLO models perform better on PASCAL VOC than on COCO. Second, YOLOv5 and YOLOv8 are the dominant models on both datasets. The performance disparity of YOLO models across the two datasets can be attributed to various factors, including differences in object classes, dataset sizes, object characteristics, architectural design, optimization techniques, and dataset biases. These factors affect model performance on various datasets, leading to variations in AP scores.
- IoU thresholds: The observations on IoU thresholds indicate that as the IoU threshold increases, the mean AP (mAP) of a YOLO model decreases, and that YOLOv5 and YOLOv8 perform better than the other YOLO models across various IoU thresholds. The first observation follows from the essence of the IoU threshold: a higher threshold demands a higher standard for accurate predictions. For the second observation, YOLOv5 and YOLOv8 outperform the other models because of their bag of freebies (BoF) and bag of specials (BoS) techniques, such as their backbone, neck, and head architectures, loss functions, anchor-based mechanisms, modules for enhancing the receptive field, data augmentation strategies, feature integration, and activation functions.
- Class of objects: Based on the observations on object classes, the performance of YOLO models can fluctuate or stabilize depending on the quality of the datasets and the characteristics of the objects within them. Similar to the observations above, YOLOv5 and YOLOv8 outperform the other models on most object classes in the two datasets. For the first observation, we find that differences in the AP values of YOLO models across object classes stem from factors such as object class imbalance and class-specific characteristics. For example, objects with smaller sizes or complex shapes can pose challenges for detection, resulting in lower AP values for those object classes. For the second observation, YOLOv5 and YOLOv8 maintain their lead because of their BoF and BoS techniques, as analyzed in the observations related to IoU thresholds.
2. Background and Motivation
2.1. Object Detection in Road Transportation
2.1.1. Object Detection-based Applications
- Intelligent traffic management: Object detection is applied to monitor and analyze traffic conditions in real-time in traffic surveillance systems. For example, sensors in these systems, combined with object detection techniques, can identify and track vehicles, pedestrians, and other road users. The data gathered from detection and tracking is then employed for traffic flow management, congestion detection, and incident identification, helping to inform decisions and enhance traffic management strategies.
- Advanced driver assistance systems (ADAS): Object detection is a core technique in ADAS designed to improve driver safety and assist with driving tasks. By using sensors, object detection in ADAS can identify and track road objects, enabling features such as forward collision warnings, lane departure alerts, blind-spot detection, and adaptive cruise control.
- Autonomous vehicles: Object detection using multiple sensors, such as cameras, LiDAR (light detection and ranging), radar, and ultrasonic sensors, enables vehicles to perceive and comprehend their surroundings, facilitating autonomous driving without human intervention. The data collected by these sensors allow autonomous vehicles to identify and classify objects like vehicles, pedestrians, cyclists, traffic signs, and traffic lights. This information supports decision-making and ensures safe navigation, enabling vehicles to operate independently.
- Traffic sign detection and recognition: Object detection is used to detect and recognize traffic signs and road markings, such as speed limit signs, stop signs, yield signs, and lane markings. This information assists drivers in real-time compliance with traffic regulations and supports autonomous vehicles in understanding and obeying traffic signs.
- In summary, object detection techniques are vital in road transportation systems. They enhance safety and efficiency for drivers, passengers, and pedestrians. Additionally, these techniques enable smart transportation systems to perceive and adapt to dynamic road conditions.
2.1.2. One-Stage-Based Object Detection System
- Backbone network: acts as a feature generation network that extracts features from an image. It takes an image as input and generates a feature map while preserving the spatial information of the image. Popular backbone networks used in object detection methods include Darknet [20], CSPDarknet [39], ResNet [40], VGG [41], EfficientNet [42], and MobileNet [43].
- RoI pooling: is a method that focuses only on certain portions of the feature map generated by the backbone to output fixed-length feature vectors. Using the feature map from the backbone network alone is ineffective for locating objects, so RoI pooling addresses this by focusing on a set number of Regions of Interest (RoIs), determined by a region proposal network (RPN), and producing a fixed-length feature vector for each RoI. Thus, RoI pooling takes two inputs: the feature map from the backbone and the RoI proposals from an RPN.
- Classification and location: the fixed-length feature vectors from RoI pooling are fed into fully connected layers for classification and bounding box regression (i.e., location).
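As a rough illustration of the RoI pooling step described above (not the implementation used by any particular detector), the sketch below crops one region of a 2D feature map and max-pools it into a fixed output grid, showing how RoI pooling produces a fixed-length feature regardless of the region's size:

```python
def roi_max_pool(feature_map, roi, out_size=(2, 2)):
    """feature_map: 2D list; roi: (x1, y1, x2, y2) in feature-map
    coordinates (exclusive upper bounds); returns an out_size grid
    of per-bin maxima."""
    x1, y1, x2, y2 = roi
    oh, ow = out_size
    h, w = y2 - y1, x2 - x1
    pooled = []
    for i in range(oh):
        row = []
        # Split the RoI height/width into roughly equal bins.
        ys, ye = y1 + i * h // oh, y1 + (i + 1) * h // oh
        for j in range(ow):
            xs, xe = x1 + j * w // ow, x1 + (j + 1) * w // ow
            row.append(max(feature_map[y][x]
                           for y in range(ys, max(ye, ys + 1))
                           for x in range(xs, max(xe, xs + 1))))
        pooled.append(row)
    return pooled

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(roi_max_pool(fmap, (0, 0, 4, 4)))  # [[6, 8], [14, 16]]
```

Whatever the RoI size, the output is always a 2x2 grid here, which is what lets the subsequent fully connected layers accept a fixed-length vector.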
2.2. YOLO Models
- Backbone: is a CNN that extracts features from input images, typically pre-trained on a classification dataset, usually ImageNet, as explained in Section 2.1.2. All backbone models are classification models. Well-known backbones used in object detection include VGG16 [41], ResNet50 [44], EfficientNet [42], and CSPDarknet53 [45].
- Neck: collects feature maps from different backbone stages; simply put, it works as a feature aggregator. It is composed of several bottom-up and top-down paths, as depicted in the figure. Neck networks used in object detection include spatial pyramid pooling (SPP) [46], the feature pyramid network (FPN) [47], and the path aggregation network (PANet) [48].
- Head/dense prediction: works as an object detector, which determines the object region but does not specify which object classes are present in the region. It corresponds to the RoI pooling stage of the object detection system in Figure 2. Head networks can be classified into anchor-based and anchor-free detectors [27].
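To make the anchor-based head concrete, the sketch below shows the box decoding used by YOLOv2/YOLOv3-style heads: the raw network outputs (tx, ty, tw, th) are turned into a box via the grid-cell offset (cx, cy) and the anchor prior size (pw, ph). This is an illustrative sketch, not the exact code of any model evaluated here.

```python
import math

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    sigmoid = lambda z: 1 / (1 + math.exp(-z))
    bx = cx + sigmoid(tx)    # center x, offset within grid cell (cx, cy)
    by = cy + sigmoid(ty)    # center y
    bw = pw * math.exp(tw)   # width scales the anchor prior
    bh = ph * math.exp(th)   # height scales the anchor prior
    return bx, by, bw, bh

print(decode_box(0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=2.0, ph=1.5))
# (3.5, 4.5, 2.0, 1.5): centered in cell (3, 4), with the anchor's size
```

Anchor-free heads (as in YOLOv8) instead regress box coordinates directly, without the (pw, ph) priors.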
2.3. Metrics for Evaluating Object Detection
- According to the IoU definition and its score threshold (e.g., a prediction with an IoU score above the threshold is considered a good prediction), the number of correctly detected objects in an image can be determined.
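The IoU score itself is the ratio of the overlap area to the union area of a predicted and a ground-truth box. A minimal sketch for axis-aligned boxes in (x1, y1, x2, y2) format (detector codebases use vectorized versions of this):

```python
def iou(box_a, box_b):
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ~= 0.1429
```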
- Other basic concepts used in accuracy metrics are the true positive (TP), false positive (FP), and false negative (FN), which are defined as follows:
- True positive (TP): indicates a correct detection of a ground-truth bounding box [49], i.e., the IoU score of the prediction is greater than the predefined threshold.
- False positive (FP): indicates an incorrect detection of a ground-truth bounding box [49], i.e., the IoU score of the prediction is smaller than the predefined threshold.
- False negative (FN): indicates a ground-truth bounding box that goes undetected [49].
- A true negative (TN) indicates a bounding box that should not be detected in an image. This concept is not considered in object detection because the number of such bounding boxes in an image is massive. Thus, metrics that rely on TNs, such as receiver operating characteristic (ROC) curves, the FP rate (FPR), and the TP rate (TPR), are not used to assess object detection methods [49].
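A toy sketch of how these counts turn into precision, recall, and F1-score (the per-prediction IoU scores are assumed to be precomputed against matched ground truths; 0.5 is the IoU threshold):

```python
def detection_counts(pred_ious, num_ground_truths, iou_thr=0.5):
    tp = sum(1 for s in pred_ious if s >= iou_thr)  # good predictions
    fp = len(pred_ious) - tp                        # bad predictions
    fn = num_ground_truths - tp                     # missed objects
    return tp, fp, fn

tp, fp, fn = detection_counts([0.9, 0.7, 0.3, 0.6], num_ground_truths=5)
precision = tp / (tp + fp)  # 3 / 4 = 0.75
recall = tp / (tp + fn)     # 3 / 5 = 0.6
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 4))
```

Note that real evaluators also match each ground truth to at most one prediction (extra detections of the same object count as FPs); that matching step is omitted here for brevity.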
- The average precision (AP) is the area under the precision-recall curve evaluated at a given IoU threshold: $AP = \int_0^1 p(r)\,dr$, where $p(r)$ denotes the precision at recall $r$. One of the most popular metrics developed from AP is the mean average precision (mAP). mAP is used to assess object detection methods over all classes in a particular dataset [49]. Basically, it is the average AP over all classes: $mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$, where $AP_i$ denotes the AP of the $i$-th class in the dataset, and $N$ is the total number of classes in the dataset.
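The AP and mAP metrics described above can be sketched as follows. This is a simplified version using all-point interpolation over a given set of (recall, precision) points, assumed sorted by increasing recall; benchmark evaluators derive these points from ranked detections.

```python
def average_precision(recalls, precisions):
    # Make precision monotonically non-increasing (the envelope).
    precs = list(precisions)
    for i in range(len(precs) - 2, -1, -1):
        precs[i] = max(precs[i], precs[i + 1])
    # Sum rectangle areas between consecutive recall points.
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precs):
        ap += (r - prev_r) * p
        prev_r = r
    return ap

def mean_average_precision(ap_per_class):
    # mAP = average AP over all classes.
    return sum(ap_per_class) / len(ap_per_class)

ap = average_precision([0.2, 0.4, 1.0], [1.0, 0.8, 0.5])
print(round(ap, 3))  # 0.2*1.0 + 0.2*0.8 + 0.6*0.5 = 0.66
```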
- Choosing the right metric for measuring object detection performance is crucial. Following the steps of an object detection problem, mAP is suggested for use on the validation set, whereas precision, recall, and F1-score are adequate on the test set.
- In this article, we focus on analyzing and evaluating the performance of different YOLO versions, thus we consider mAP as the primary performance metric.
2.4. Datasets for Object Detection
2.4.1. COCO (Common Objects in Context)
2.4.2. PASCAL VOC (Visual Object Classes)
3. Evaluation and Result Analysis
3.1. Experimental Setup
- We validate various YOLO models on COCO with pre-trained weights. Additionally, we train and validate various YOLO models on COCO and PASCAL VOC with CPUs.
3.2. Evaluation Overview
3.3. Experimental Results and Observations
3.3.1. Dataset
- Observation 1: YOLO models trained or evaluated on different datasets can exhibit disparate performance.
- Furthermore, YOLOv5 outperforms other models because it uses the CSPDarknet53 backbone, PANet for effective multi-scale feature integration, the auto-anchor technique for bounding box adjustment, and an improved loss function. Additionally, YOLOv8 demonstrates superior performance due to its modified CSPDarknet53 backbone, PAN feature fusion, anchor-free detection, class loss optimization, complete IoU loss for precise bounding boxes, and distribution focal loss to enhance accuracy across diverse object classes. Therefore, we make the following observation related to the datasets.
- Observation 2: The YOLOv5 model is dominant on the COCO dataset, and YOLOv8 is dominant on the PASCAL VOC dataset.
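The complete IoU (CIoU) loss mentioned in the analysis above adds two penalties to plain IoU: the normalized center distance and an aspect-ratio consistency term. Below is an illustrative sketch for axis-aligned (x1, y1, x2, y2) boxes, following the usual CIoU formulation; it is not the exact code of the evaluated models.

```python
import math

def ciou_loss(box_p, box_g):
    px1, py1, px2, py2 = box_p
    gx1, gy1, gx2, gy2 = box_g
    # Plain IoU.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared center distance over squared enclosing-box diagonal.
    cpx, cpy = (px1 + px2) / 2, (py1 + py2) / 2
    cgx, cgy = (gx1 + gx2) / 2, (gy1 + gy2) / 2
    rho2 = (cpx - cgx) ** 2 + (cpy - cgy) ** 2
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw ** 2 + ch ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                              - math.atan((px2 - px1) / (py2 - py1))) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v

print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))  # 0.0 for identical boxes
```

Because the extra terms stay positive for misplaced or misshapen boxes, CIoU gives a useful gradient even when plain IoU is zero, which helps explain the bounding-box precision attributed to YOLOv8 above.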
3.3.2. Intersection over Union (IoU) Thresholds
- Observation 3: The mAP of the YOLO models becomes lower with a higher IoU threshold.
- Observation 4: The YOLOv5 and YOLOv8 models outperform other YOLO models across IoU thresholds.
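A toy illustration of Observation 3: the same set of predictions yields fewer true positives, and thus lower precision and AP, as the IoU threshold rises. The list below holds hypothetical IoU scores of predictions against their matched ground truths.

```python
pred_ious = [0.92, 0.81, 0.74, 0.66, 0.55, 0.48]

for thr in (0.5, 0.75, 0.9):
    tp = sum(1 for s in pred_ious if s >= thr)  # predictions surviving thr
    precision = tp / len(pred_ious)
    print(f"IoU threshold {thr}: TP={tp}, precision={precision:.2f}")
```

Raising the threshold from 0.5 to 0.9 drops the TP count from 5 to 1 in this example, mirroring the mAP decline observed for all YOLO models.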
3.3.3. Class of Objects
- Observation 5: The performance of YOLO models on various object classes can fluctuate or stabilize according to the quality of the datasets.
- Observation 6: Both YOLOv5 and YOLOv8 outperform the other models on most of the object classes.
- The following section summarizes the overall performance results, demonstrating that YOLOv4, YOLOv5, and YOLOv8 outperform the other YOLO models on the two datasets.
3.4. Summary
4. Related Work
5. Conclusions
References
- Butler, L.; Yigitcanlar, T.; Paz, A. Smart Urban Mobility Innovations: A Comprehensive Review and Evaluation. IEEE Access 2020, 8, 196034–196049. doi:10.1109/ACCESS.2020.3034596. [CrossRef]
- Golpayegani, F.; Guériau, M.; Laharotte, P.A.; Ghanadbashi, S.; Guo, J.; Geraghty, J.; Wang, S. Intelligent Shared Mobility Systems: A Survey on Whole System Design Requirements, Challenges and Future Direction. IEEE Access 2022, 10, 35302–35320. doi:10.1109/ACCESS.2022.3162848. [CrossRef]
- Longo, A.; Zappatore, M.; Navathe, S.B. The unified chart of mobility services: Towards a systemic approach to analyze service quality in smart mobility ecosystem. Journal of Parallel and Distributed Computing 2019, 127, 118–133.
- Peak, E. What makes transportation smart? defining intelligent transportation. https://www.iotforall.com/what-makes-transportation-smart-defining-intelligent-transportation. Accessed: 2023-06-13.
- Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A review of machine learning and IoT in smart transportation. Future Internet 2019, 11, 94.
- IBM. What is smart transportation. https://www.ibm.com/blog/smart-transportation/. Accessed: 2023-06-13.
- Chen, B.H.; Huang, S.C. An Advanced Moving Object Detection Algorithm for Automatic Traffic Monitoring in Real-World Limited Bandwidth Networks. IEEE Transactions on Multimedia 2014, 16, 837–847. doi:10.1109/TMM.2014.2298377. [CrossRef]
- Ulmer, B. VITA-an autonomous road vehicle (ARV) for collision avoidance in traffic. Proceedings of the Intelligent Vehicles’ 92 Symposium, 1992, pp. 36–41. doi:10.1109/IVS.1992.252230. [CrossRef]
- Vu, V.T.; Bremond, F.; Davini, G.; Thonnat, M.; Pham, Q.C.; Allezard, N.; Sayd, P.; Rouas, J.L.; Ambellouis, S.; Flancquart, A. Audio-Video Event Recognition System for Public Transport Security. 2006 IET Conference on Crime and Security, 2006, pp. 414–419.
- Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005, Vol. 1, pp. 886–893 vol. 1. doi:10.1109/CVPR.2005.177. [CrossRef]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object Detection with Discriminatively Trained Part-Based Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010, 32, 1627–1645. doi:10.1109/TPAMI.2009.167. [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015, 37, 1904–1916. doi:10.1109/TPAMI.2015.2389824. [CrossRef]
- Girshick, R.B. Fast R-CNN. CoRR 2015, abs/1504.08083, [1504.08083].
- Yang, Z.; Liu, S.; Hu, H.; Wang, L.; Lin, S. RepPoints: Point Set Representation for Object Detection. CoRR 2019, abs/1904.11490, [1904.11490].
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. CoRR 2020, abs/2005.12872, [2005.12872].
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.S.; Zhang, L. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. CoRR 2020, abs/2012.15840, [2012.15840].
- Sun, Z.; Cao, S.; Yang, Y.; Kitani, K.M. Rethinking transformer-based set prediction for object detection. Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 3611–3620.
- Girshick, R. Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2015.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. Computer Vision – ECCV 2016; Leibe, B.; Matas, J.; Sebe, N.; Welling, M., Eds.; Springer International Publishing: Cham, 2016; pp. 21–37.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. Proceedings of the IEEE international conference on computer vision, 2017, pp. 2980–2988.
- Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. CoRR 2018, abs/1808.01244, [1808.01244].
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digital Signal Processing 2022, p. 103514.
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.; Chen, J.; Liu, X.; Pietikäinen, M. Deep learning for generic object detection: A survey. International journal of computer vision 2020, 128, 261–318. [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525.
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. ArXiv 2018, abs/1804.02767.
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. ArXiv 2020, abs/2004.10934.
- YOLOv5: The friendliest AI architecture you’ll ever use. https://ultralytics.com/yolov5. Accessed: 2023-06-12.
- Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; Li, Y.; Zhang, B.; Liang, Y.; Zhou, L.; Xu, X.; Chu, X.; Wei, X.; Wei, X. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. ArXiv 2022, abs/2209.02976.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. ArXiv 2022, abs/2207.02696.
- Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8. ArXiv 2023, abs/2305.09972.
- Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent Advances in Deep Learning for Object Detection. ArXiv 2019, abs/1908.03673.
- Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868.
- Tong, K.; Wu, Y.; Zhou, F. Recent advances in small object detection based on deep learning: A review. Image Vis. Comput. 2020, 97, 103910.
- Nazir, A.; Wani, M.A. You Only Look Once - Object Detection Models: A Review. 2023 10th International Conference on Computing for Sustainable Global Development (INDIACom), 2023, pp. 1088–1095.
- Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. Automatic Bunch Detection in White Grape Varieties Using YOLOv3, YOLOv4, and YOLOv5 Deep Learning Algorithms. Agronomy 2022, 12. doi:10.3390/agronomy12020319. [CrossRef]
- Pal, S.K.; Pramanik, A.; Maiti, J.; Mitra, P. Deep learning in multi-object detection and tracking: state of the art. Applied Intelligence 2021, 51, 6400–6429. [CrossRef]
- Jiao, L.; Zhang, F.; Liu, F.; Yang, S.; Li, L.; Feng, Z.; Qu, R. A Survey of Deep Learning-Based Object Detection. IEEE Access 2019, 7, 128837–128868. doi:10.1109/ACCESS.2019.2939201. [CrossRef]
- Bochkovskiy, A.; Wang, C.; Liao, H.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. CoRR 2020, abs/2004.10934, [2004.10934].
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi:10.1109/CVPR.2016.90. [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 2014.
- Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. International conference on machine learning. PMLR, 2019, pp. 6105–6114.
- Howard, A.G. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 2017.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
- Wang, C.Y.; Liao, H.Y.M.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W.; Yeh, I.H. CSPNet: A new backbone that can enhance learning capability of CNN. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020, pp. 390–391.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015, 37, 1904–1916. doi:10.1109/TPAMI.2015.2389824. [CrossRef]
- Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
- Padilla, R.; Netto, S.L.; da Silva, E.A.B. A Survey on Performance Metrics for Object-Detection Algorithms. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020, pp. 237–242. doi:10.1109/IWSSIP48289.2020.9145130. [CrossRef]
- Liu, L.; Ouyang, W.; Wang, X.; Fieguth, P.W.; Chen, J.; Liu, X.; Pietikäinen, M. Deep Learning for Generic Object Detection: A Survey. International Journal of Computer Vision 2018, 128, 261–318.
- Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.N.; Lee, B. A Survey of Modern Deep Learning based Object Detection Models. Digit. Signal Process. 2021, 126, 103514.
- Two-i. What are some interesting applications of object detection. https://www.two-i.com/blog/what-are-some-interesting-applications-of-object-detection. Accessed: 2023-07-07.
- Lee, J.; il Hwang, K. YOLO with adaptive frame control for real-time object detection applications. Multimedia Tools and Applications 2021, 81, 36375–36396.
- LearnopenCV.com. FPS performance comparison of YOLO models. https://learnopencv.com/performance-comparison-of-yolo-models/#FPS-Performance-Comparison-of-YOLO-Models-on-CPU. Accessed: 2023-07-07.
- KDnuggets.com. Metrics evaluate deep learning object detectors. https://www.kdnuggets.com/2020/08/metrics-evaluate-deep-learning-object-detectors.html. Accessed: 2023-07-11.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255. doi:10.1109/CVPR.2009.5206848. [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. Springer, 2014, pp. 740–755.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. International journal of computer vision 2010, 88, 303–338. [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? the kitti vision benchmark suite. 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 3354–3361.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 3974–3983.
- v7labs. COCO dataset guide. https://www.v7labs.com/blog/coco-dataset-guide. Accessed: 2023-08-07.
- PASCAL. Pascal2. http://host.robots.ox.ac.uk/pascal/VOC/. Accessed: 2023-08-07.
- Liu, K.; Tang, H.; He, S.; Yu, Q.; Xiong, Y.; Wang, N. Performance validation of YOLO variants for object detection. Proceedings of the 2021 International Conference on bioinformatics and intelligent computing, 2021, pp. 239–243.
| Features | YOLOv3 | YOLOv4 | YOLOv5 |
|---|---|---|---|
| Backbone | Darknet53 | CSPDarknet53 | CSPDarknet53 |
| Neck | FPN | SPP, PANet | PANet |
| Anchor box | Scaled size, varied shape | Scaled size, varied shape, K-Means generating | Auto-anchor |
| Activation function | ReLU | Leaky ReLU | Swish |
| Loss function | Localization loss, confidence loss, class loss | Localization loss, confidence loss, class loss, IoU loss, CIoU loss, focal loss, GHM loss | Localization loss, confidence loss, class loss, IoU loss, CIoU loss |
| Data augmentation | Mosaic | Mosaic | Mosaic, random shapes training |
| Features | YOLOv6 | YOLOv7 | YOLOv8 |
|---|---|---|---|
| Backbone | EfficientRep | E-ELAN | Modified CSPDarknet53 |
| Neck | Rep-PAN | CSPSPP, ELAN, E-ELAN | PAN |
| Anchor box | Anchor-free | Auto-anchor | Anchor-free |
| Activation function | ReLU | SiLU | Mish |
| Loss function | Varifocal loss, distribution focal loss | Varifocal loss, distribution focal loss | Class loss, CIoU loss, distribution focal loss |
| Data augmentation | Mosaic | Mosaic, mixup | Mosaic |
| | COCO | PASCAL VOC |
|---|---|---|
| Num. Images (Training / Validation) | - / 5000 | 11530 / 3422 |
| Classes | person, parking meter, stop sign, traffic light, train, bus, motorcycle, car, truck, bicycle | bicycle, car, motorbike, bus, person, train |
| Num. Labels | [60, 10777] | [142, 7835] |
| models | bicycle | car | motorcycle | bus | train | truck | traffic light | stop sign | parking meter | person | AP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8 | 0.584 | 0.613 | 0.686 | 0.767 | 0.827 | 0.46 | 0.516 | 0.752 | 0.679 | 0.729 | 0.6613 |
| YOLOv7 | 0.507 | 0.575 | 0.628 | 0.764 | 0.851 | 0.45 | 0.395 | 0.659 | 0.616 | 0.687 | 0.6132 |
| YOLOv6 | 0.656 | 0.739 | 0.817 | 0.904 | 0.956 | 0.648 | 0.59 | 0.822 | 0.732 | 0.862 | 0.7726 |
| YOLOv5 | 0.695 | 0.743 | 0.79 | 0.864 | 0.906 | 0.633 | 0.622 | 0.818 | 0.746 | 0.813 | 0.763 |
| YOLOv4 | 0.695 | 0.766 | 0.811 | 0.916 | 0.939 | 0.696 | 0.644 | 0.844 | 0.877 | 0.835 | 0.8023 |
| YOLOv3 | 0.594 | 0.669 | 0.758 | 0.894 | 0.925 | 0.673 | 0.568 | 0.766 | 0.788 | 0.77 | 0.7405 |
| models | bicycle | car | motorcycle | bus | train | truck | traffic light | stop sign | parking meter | person | AP |
|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv8 | 0.44 | 0.481 | 0.521 | 0.699 | 0.694 | 0.366 | 0.339 | 0.722 | 0.569 | 0.584 | 0.5415 |
| YOLOv7 | 0.338 | 0.426 | 0.443 | 0.671 | 0.688 | 0.349 | 0.239 | 0.615 | 0.463 | 0.524 | 0.4756 |
| YOLOv6 | 0.413 | 0.52 | 0.564 | 0.779 | 0.78 | 0.471 | 0.332 | 0.748 | 0.545 | 0.636 | 0.5788 |
| YOLOv5 | 0.507 | 0.576 | 0.598 | 0.789 | 0.773 | 0.507 | 0.394 | 0.776 | 0.614 | 0.662 | 0.6196 |
| YOLOv4 | 0.36943 | 0.4536 | 0.4549 | 0.6782 | 0.6047 | 0.4425 | 0.2869 | 0.6777 | 0.6169 | 0.5179 | 0.4614 |
| YOLOv3 | 0.2472 | 0.3126 | 0.356 | 0.5961 | 0.5685 | 0.3361 | 0.202 | 0.5133 | 0.4747 | 0.3936 | 0.4 |
| models | bicycle | bus | car | motorbike | person | train | AP |
|---|---|---|---|---|---|---|---|
| YOLOv8 | 0.883 | 0.875 | 0.901 | 0.857 | 0.864 | 0.89 | 0.8783 |
| YOLOv7 | 0.378 | 0.627 | 0.458 | 0.409 | 0.463 | 0.688 | 0.5038 |
| YOLOv6 | 0.494 | 0.691 | 0.51 | 0.477 | 0.632 | 0.603 | 0.5678 |
| YOLOv5 | 0.801 | 0.866 | 0.82 | 0.799 | 0.844 | 0.811 | 0.8235 |
| YOLOv4 | 0.9737 | 1 | 0.9854 | 1 | 0.9795 | 0.9875 | 0.9877 |
| YOLOv3 | 0.62 | 0.637 | 0.644 | 0.607 | 0.731 | 0.602 | 0.6402 |
| models | bicycle | bus | car | motorbike | person | train | AP |
|---|---|---|---|---|---|---|---|
| YOLOv8 | 0.724 | 0.788 | 0.769 | 0.673 | 0.669 | 0.707 | 0.7217 |
| YOLOv7 | 0.298 | 0.539 | 0.353 | 0.3 | 0.33 | 0.523 | 0.3905 |
| YOLOv6 | 0.375 | 0.589 | 0.38 | 0.334 | 0.417 | 0.429 | 0.4207 |
| YOLOv5 | 0.611 | 0.742 | 0.637 | 0.581 | 0.619 | 0.624 | 0.6357 |
| YOLOv4 | 0.6283 | 0.7861 | 0.7136 | 0.7021 | 0.6418 | 0.6984 | 0.6951 |
| YOLOv3 | 0.292 | 0.34 | 0.347 | 0.261 | 0.364 | 0.258 | 0.3103 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).