Submitted: 26 February 2025
Posted: 28 February 2025
Abstract
Keywords:
1. Introduction
- Two custom datasets, Korzo and Düsseldorf, capturing diverse real-world scenarios in public spaces with annotation of people and luggage,
- Fine-tuned YOLOv8, YOLOv11 and DETR models for detecting people and luggage in surveillance camera footage,
- Performance analysis of variants of the fine-tuned YOLOv8 and YOLOv11 model families on a demanding set of visual scenes consisting only of small and medium-sized objects,
- A fine-tuned YOLOv11-l person and luggage detection model that achieves an mAP of 71% in demanding surveillance camera scenarios, including a medium-object precision (AP_medium) of 94% and a small-object precision (AP_small) of 69%,
- An abandoned luggage detection algorithm that uses temporal and spatial analysis of object detections and tracks to infer luggage ownership/responsibility and to recognize abandoned luggage effectively.
2. Related Work
3. Proposed Luggage Detection System
- A. Definition of Abandoned Luggage
- 1. The luggage must be unattended: all individuals who were within a predefined radius around the luggage at the time of its initial detection must have moved out of that radius.
- 2. The luggage must remain in the same location for a specified period.
- B. System and Algorithm for Abandoned Luggage Detection
- C. Key Algorithm Parameters
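
The two conditions in the definition of abandoned luggage above can be sketched as a small rule over tracked detections. This is a minimal illustration, not the authors' implementation: the `LuggageTrack` structure, the function names, and the parameters `radius`, `min_seconds` and `move_tolerance` are all hypothetical, and positions are assumed to be box centres in pixels supplied by some detector and tracker.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class LuggageTrack:
    """State kept for one tracked luggage item (hypothetical structure)."""
    position: tuple            # luggage centre (x, y) in pixels
    stationary_since: float    # time (s) since which the luggage has not moved
    owner_ids: set = field(default_factory=set)  # people nearby at first detection


def nearby_ids(center, people, radius):
    """Return IDs of people whose centre lies within `radius` of `center`."""
    cx, cy = center
    return {pid for pid, (px, py) in people.items()
            if hypot(px - cx, py - cy) <= radius}


def start_track(center, people, now, radius):
    """Create a track and remember who was near the luggage at first detection."""
    return LuggageTrack(center, now, nearby_ids(center, people, radius))


def is_abandoned(track, center, people, now, radius, min_seconds, move_tolerance):
    """Apply both conditions of the definition to the current frame."""
    dx, dy = center[0] - track.position[0], center[1] - track.position[1]
    if hypot(dx, dy) > move_tolerance:
        # The luggage moved: restart the stationary timer (assumed behaviour).
        track.position = center
        track.stationary_since = now
        return False
    # Condition 2: the luggage has stayed in the same location long enough.
    stationary = now - track.stationary_since >= min_seconds
    # Condition 1: everyone who was nearby at first detection has left the radius.
    unattended = not (track.owner_ids & nearby_ids(track.position, people, radius))
    return unattended and stationary


people_t0 = {1: (110.0, 100.0), 2: (400.0, 400.0)}
track = start_track((100.0, 100.0), people_t0, now=0.0, radius=50.0)
people_t40 = {1: (300.0, 100.0), 2: (105.0, 100.0)}
print(is_abandoned(track, (101.0, 100.0), people_t40, now=40.0,
                   radius=50.0, min_seconds=30.0, move_tolerance=5.0))
```

In this sketch only the people recorded at first detection count as attending the luggage, following the wording of condition 1; a passer-by who later enters the radius does not reset the decision. The `move_tolerance` parameter is an added assumption that absorbs small detection jitter.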
4. Development of Abandoned Luggage System
5. Model for People and Luggage Detection
- A. YOLOv8 and YOLOv11 families
- B. DETR ResNet-50
6. Custom Dataset of Public Places
- A. CCTV-Korzo Dataset
- B. CCTV-Düsseldorf Dataset
7. Evaluation of Fine-Tuned Person and Luggage Detection Models
- A. Evaluation metrics
- B. Quantitative results and discussion
- C. Quantitative results related to Object Size and discussion
- Small objects: Area < 32² pixels.
- Medium objects: 32² ≤ Area < 96² pixels.
- Large objects: Area ≥ 96² pixels.
- D. Qualitative Comparison
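
The object-size categories listed under C can be illustrated with a short sketch. It assumes the area is taken as the bounding-box width times height in pixels, which is a simplification chosen for this example; the function and constant names are hypothetical.

```python
from collections import Counter

SMALL_MAX_AREA = 32 ** 2    # 1024 pixels: upper bound (exclusive) for small objects
MEDIUM_MAX_AREA = 96 ** 2   # 9216 pixels: upper bound (exclusive) for medium objects


def size_category(width, height):
    """Classify a box by its area using the thresholds listed above."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def count_by_size(boxes):
    """Count how many (width, height) boxes fall into each size category."""
    return Counter(size_category(w, h) for w, h in boxes)


# Example boxes given as (width, height) in pixels; the values are made up.
boxes = [(20, 30), (32, 32), (50, 100), (96, 96)]
print(count_by_size(boxes))
```

A count like this is a quick way to check how a test set is distributed across the three size categories before interpreting per-size AP values.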
8. Different Scenarios for Testing Abandoned Luggage Detection System
- A. Single Person Abandoning Luggage
- B. Group of People with Luggage
- C. Restroom Break
- D. Shaking and Video Disruptions
- E. Crowded Environments
9. Discussion
10. Conclusions
References
- [1] E. Luna, J. C. San Miguel, D. Ortego, and J. M. Martínez, “Abandoned Object Detection in Video-Surveillance: Survey and Comparison,” Sensors, vol. 18, no. 12, 4290, 2018.
- [2] K. Smith, P. Quelhas, and D. Gatica-Perez, “Detecting Abandoned Luggage Items in a Public Space,” IEEE Performance Evaluation of Tracking and Surveillance Workshop (PETS), 2006.
- [3] S. Smeureanu and R. T. Ionescu, “Real-Time Deep Learning Method for Abandoned Luggage Detection in Video,” in 26th European Signal Processing Conference (EUSIPCO), Rome, Italy, 2018.
- [4] T. &. S. P. &. S. K. &. C. W. Santad, “Application of YOLO Deep Learning Model for Real Time Abandoned Baggage Detection,” in IEEE 7th Global Conference on Consumer Electronics, Nara, Japan, 2018.
- [5] J. L. H. &. C. L. Chang, “Localized Detection of Abandoned Luggage,” EURASIP Journal on Advances in Signal Processing, 2010.
- [6] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, 2017.
- [7] Ultralytics, “Ultralytics Documentation,” 2025. [Online]. Available: https://docs.ultralytics.com/. [Accessed Jan 2025].
- [8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” arXiv preprint arXiv:1506.02640, 2016.
- [9] G. Jocher et al., “YOLOv5 and YOLOv8 implementation documentation,” 2022. [Online]. Available: https://github.com/ultralytics/ultralytics. [Accessed 11 2024].
- [10] S. J. Pan and Q. Yang, “A Survey on Transfer Learning,” IEEE Transactions on Knowledge and Data Engineering, pp. 1345-1359, 2010.
- [11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, and X. Wang, “Bytetrack: Multi-object tracking by associating every detection box,” in European Conference on Computer Vision, Cham, Switzerland, 2021.
- [12] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-End Object Detection with Transformers,” in European Conference on Computer Vision (ECCV), Glasgow, United Kingdom, 2020.
- [13] R. Padilla, S. L. Netto, and E. A. B. da Silva, “A Survey on Performance Metrics for Object-Detection Algorithms,” in International Conference on Systems, Signals and Image Processing (IWSSIP), Niteroi, Brazil, 2020.
- [14] L. Q. H. Q. J. S. a. J. J. S. Liu, “A survey and performance evaluation of deep learning methods for small object detection,” Expert Systems with Applications, vol. 172, no. 114602, 2021.
- [15] N. A. A. A. a. B. A. A. -R. A.-G. A. M. Qasim, “Abandoned Object Detection and Classification Using Deep Embedded Vision,” IEEE Access, vol. 12, pp. 35539-35551, 2024.
- [16] R. Khanam, M. Hussain: YOLOv11: An Overview of the Key Architectural Enhancements, 2024, arXiv:2410.17725, [Accessed 01.2025].
- [17] G. Jocher and A. Chaurasia and J. Qiu, Ultralytics YOLOv8, 2023. https://docs.ultralytics.com/models/yolov8/#yolov8-usage-examples [Accessed 01.2025].
- [18] S. Sambolek and M. Ivasic-Kos, “Automatic Person Detection in Search and Rescue Operations Using Deep CNN Detectors,” IEEE Access, vol. 9, pp. 37905-37922, 2021.

| Approach | Ref. | Main Features | Advantages | Disadvantages |
|---|---|---|---|---|
| Background Subtraction Methods (traditional) | [1,3,4] | Static/moving parts of the image; foreground segmentation | Simple to implement; low computational requirements | Sensitive to crowded scenes; low accuracy |
| MCMC and SVM | [2,3] | Bayesian networks; classification (SVM) | Tracks short displacements well | Sensitive to variations |
| Hybrid Approaches (CNN + Background Subtraction) | [1,5,15] | Combination of background subtraction and CNN | Improved accuracy; fewer false positives | Limited to sudden changes |
| YOLO | [4,7,8,9,10] | Single-pass detection; transfer learning | High accuracy even in crowded scenes; easily adaptable | GPU-intensive during training |

| Version | YOLOv8: parameters (M) | YOLOv8: model size (MB) | YOLOv8: inference time (ms) * | YOLOv8: mAPval (0.50:0.95) | YOLOv11: parameters (M) | YOLOv11: model size (MB) | YOLOv11: inference time (ms) * | YOLOv11: mAPval (0.50:0.95) |
|---|---|---|---|---|---|---|---|---|
| Nano (n) | 3.2 | 8.9 | 11.04 | 37.3 | 2.6 | 9.9 | 10.90 | 39.5 |
| Small (s) | 11.0 | 30.6 | 9.86 | 44.9 | 9.4 | 35.8 | 10.85 | 47 |
| Medium (m) | 25.0 | 68.9 | 16.71 | 50.2 | 20.1 | 76.7 | 16.75 | 51.5 |
| Large (l) | 40.0 | 113.3 | 23.98 | 52.9 | 25.3 | 96.5 | 20.39 | 53.4 |
| XL (x) | 68.0 | 189.0 | 36.48 | 53.9 | 56.9 | 217.1 | 36.30 | 54.7 |

| Model | Parameters (M) | Model size (MB) | Inference time (ms) * | mAPval (0.50:0.95) |
|---|---|---|---|---|
| DETR ResNet-50 | 41.0 | 156.4 | 93.6 | 45.9 |


| Metric | YOLOv8(KD) | YOLOv11(KD) | Difference |
|---|---|---|---|
| Nano version | | | |
| Precision | 0.71 | 0.73 | 0.03 |
| Recall | 0.68 | 0.70 | 0.02 |
| mAP@50 | 0.69 | 0.71 | 0.02 |
| Small version | | | |
| Precision | 0.75 | 0.75 | 0.00 |
| Recall | 0.76 | 0.73 | -0.03 |
| mAP@50 | 0.76 | 0.75 | -0.01 |
| Medium version | | | |
| Precision | 0.79 | 0.72 | -0.07 |
| Recall | 0.76 | 0.80 | 0.04 |
| mAP@50 | 0.77 | 0.76 | -0.01 |
| Large version | | | |
| Precision | 0.76 | 0.73 | -0.02 |
| Recall | 0.77 | 0.81 | 0.04 |
| mAP@50 | 0.76 | 0.76 | 0.00 |
| Best results | | | |
| Precision | 0.79 | 0.75 | -0.05 |
| Recall | 0.77 | 0.81 | 0.04 |
| mAP@50 | 0.77 | 0.76 | -0.01 |


| Metric | YOLOv8(KD)-m | YOLOv8(KD)-l | YOLOv11(KD)-m | YOLOv11(KD)-l |
|---|---|---|---|---|
| mAP@0.5 | 0.6766 | 0.6787 | 0.7004 | 0.7153 |
| AP_small | 0.6397 | 0.6450 | 0.6673 | 0.6904 |
| AP_medium | 0.9240 | 0.8831 | 0.9390 | 0.9376 |
| AP_large (-1 indicates no large objects in the evaluation set) | -1.0000 | -1.0000 | -1.0000 | -1.0000 |


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).