Submitted: 27 May 2024
Posted: 28 May 2024
Abstract
Keywords:
1. Introduction
- (1) In response to complex and dynamic fire environments, ConvNeXt is adopted as the backbone to enhance the algorithm's ability to extract features at different scales (see the backbone sketch after this list).
- (2) Alongside the introduced CNN-based cross-scale feature-fusion module (CCFM), a separable single-scale feature interaction (SSFI) module is proposed. Together these modules form the Mixed encoder, which improves detection accuracy and greatly reduces computation.
- (3) The loss function is changed to PIoU v2, which resolves the slow-convergence issue and improves the model's stability in complex fire scenarios.
- (4) Humans are added as a detection class, which helps firefighters identify potential victims in fires and improves the practical value of the model.
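As a concrete illustration of contribution (1), here is a minimal sketch (not the paper's code) of extracting a multi-scale feature pyramid from a ConvNeXt-Tiny backbone with the timm library; the model variant, input resolution, and selected feature levels are illustrative assumptions.

```python
import timm
import torch

# Hypothetical wiring: pull a stride-8/16/32 feature pyramid from
# ConvNeXt-Tiny, the three scales a Deformable-DETR-style encoder
# typically consumes.
backbone = timm.create_model(
    "convnext_tiny",
    pretrained=True,
    features_only=True,     # return intermediate feature maps, not logits
    out_indices=(1, 2, 3),  # stages at strides 8, 16, and 32
)

images = torch.randn(1, 3, 640, 640)
features = backbone(images)
for f in features:
    print(tuple(f.shape))
# (1, 192, 80, 80), (1, 384, 40, 40), (1, 768, 20, 20)
```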
2. Related Works
3. Methodology
3.1. Overall Architecture of FCM-DETR

3.2. ConvNeXt Backbone
3.3. Mixed Encoder
3.3.1. Separable Single-Scale Feature Interaction
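The paper's SSFI module is not reproduced here; its name points to the separable self-attention of Mehta and Rastegari (cited in the references), whose linear-complexity core primitive is sketched below. Class and parameter names are ours, not the paper's.

```python
import torch
from torch import nn

class SeparableSelfAttention(nn.Module):
    # Separable self-attention (Mehta & Rastegari, 2022): tokens are pooled
    # into a single context vector via learned scores, then broadcast back,
    # giving O(N) token mixing instead of the O(N^2) of standard attention.
    def __init__(self, dim: int):
        super().__init__()
        self.scores = nn.Linear(dim, 1)   # one context logit per token
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        attn = self.scores(x).softmax(dim=1)                      # (B, N, 1)
        context = (attn * self.key(x)).sum(dim=1, keepdim=True)  # (B, 1, D)
        return self.out(torch.relu(self.value(x)) * context)     # broadcast over N

# e.g. the flattened 20x20 stride-32 feature map of a 640x640 input
tokens = torch.randn(2, 400, 256)
print(SeparableSelfAttention(256)(tokens).shape)  # torch.Size([2, 400, 256])
```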
3.3.2. CNN-Based Cross-Scale Feature-Fusion Module
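Likewise, the exact CCFM block design is not reproduced here; the sketch below only illustrates the general pattern the name describes, fusing adjacent pyramid scales with plain convolutions rather than attention. The channel counts and the fusion block itself are illustrative assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

class FuseBlock(nn.Module):
    # Merge a coarse (higher-stride) map into the next finer one:
    # upsample, concatenate along channels, fuse with a 3x3 conv.
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
        )

    def forward(self, fine, coarse):
        coarse = F.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
        return self.conv(torch.cat([fine, coarse], dim=1))

# top-down pass over a three-level pyramid, all projected to 256 channels
p3, p4, p5 = (torch.randn(1, 256, s, s) for s in (80, 40, 20))
fuse45, fuse34 = FuseBlock(256), FuseBlock(256)
p4 = fuse45(p4, p5)   # fuse stride-32 into stride-16
p3 = fuse34(p3, p4)   # fuse stride-16 into stride-8
print(p3.shape, p4.shape)
```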
3.4. IoU-Based Loss Function
3.4.1. GIoU
3.4.2. DIoU
3.4.3. CIoU
3.4.4. SIoU
3.4.5. PIoU
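For reference alongside these subsections, the snippet below implements the standard GIoU, DIoU, and CIoU penalties from the cited papers. PIoU and PIoU v2, the loss ultimately adopted, are not reproduced: PIoU normalizes an edge-distance penalty by the target box size, and PIoU v2 adds a non-monotonic focusing factor on top, so only the well-established variants are sketched here.

```python
import math
import torch

def iou_family_losses(pred, target, eps=1e-7):
    """Reference GIoU/DIoU/CIoU losses for boxes in (x1, y1, x2, y2) format.

    pred, target: tensors of shape (N, 4). Returns 1 - <metric> per box.
    """
    # intersection and union
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # smallest enclosing box C
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    cw, ch = cx2 - cx1, cy2 - cy1

    # GIoU: subtract the fraction of C not covered by the union
    c_area = cw * ch + eps
    giou = iou - (c_area - union) / c_area

    # DIoU: subtract the normalized squared center-point distance
    rho2 = ((pred[:, 0] + pred[:, 2] - target[:, 0] - target[:, 2]) ** 2
            + (pred[:, 1] + pred[:, 3] - target[:, 1] - target[:, 3]) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    diou = iou - rho2 / c2

    # CIoU: DIoU plus an aspect-ratio consistency term alpha * v
    wp = (pred[:, 2] - pred[:, 0]).clamp(min=eps)
    hp = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    wt = (target[:, 2] - target[:, 0]).clamp(min=eps)
    ht = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(wt / ht) - torch.atan(wp / hp)) ** 2
    with torch.no_grad():
        alpha = v / (1 - iou + v + eps)
    ciou = diou - alpha * v

    return {"giou": 1 - giou, "diou": 1 - diou, "ciou": 1 - ciou}
```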
4. Experiment Settings
4.1. Image Dataset
4.2. Evaluation Metrics
4.3. Experimental Environment
4.4. Optimization Method and Other Details
5. Result Analysis
5.1. Effectiveness of Backbone
5.2. Effectiveness of PIoU v2
5.3. Ablation Experiments
- (1) Comparing the first and fourth experimental groups shows that using PIoU v2 as the loss function slightly improves detection precision while having almost no effect on parameter count or computational complexity.
- (2) Comparing the first and second groups reveals that the ConvNeXt backbone markedly reduces the model's computation and slightly reduces its parameter count, while improving Accuracy_fire, Accuracy_smoke, Accuracy_human, and mAP.
- (3) Comparing the first and third groups shows that upgrading the original encoder to the Mixed encoder reduces the computational complexity but increases the number of parameters and slightly lowers Accuracy_smoke and Accuracy_human.
- (4) Comparing the sixth and seventh groups shows that although the Mixed encoder is the main source of the increase in model parameters, it also secures the improvements in fire and human detection accuracy, as well as in mAP.
5.4. Comparison with Other Models
6. Conclusion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Hall, Shelby, and Ben Evarts. “Fire loss in the United States during 2021.” National Fire Protection Association (NFPA) (2022).
- Wang, Z., Wang, Z., Zou, Z. et al. Severe Global Environmental Issues Caused by Canada’s Record-Breaking Wildfires in 2023. Adv. Atmos. Sci. (2023).
- M. D. Nguyen, H. N. Vu, D. C. Pham, B. Choi and S. Ro, “Multistage Real-Time Fire Detection Using Convolutional Neural Networks and Long Short-Term Memory Networks,” in IEEE Access, vol. 9, pp. 146667-146679, 2021. [CrossRef]
- Çetin, A. Enis, et al. “Video fire detection–review.” Digital Signal Processing 23.6 (2013): 1827-1843.
- Töreyin, B. Uğur, et al. “Computer vision based method for real-time fire and flame detection.” Pattern recognition letters 27.1 (2006): 49-58. [CrossRef]
- P. V. Koerich Borges, J. Mayer and E. Izquierdo, “Efficient visual fire detection applied for video retrieval,” 2008 16th European Signal Processing Conference, Lausanne, Switzerland, 2008, pp. 1-5.
- Y. H. Habiboğlu, O. Günay and A. E. Çetin, “Flame detection method in video using covariance descriptors,” 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011, pp. 1817-1820. [CrossRef]
- Li, Pu, and Wangda Zhao. “Image fire detection algorithms based on convolutional neural networks.” Case Studies in Thermal Engineering 19 (2020): 100625. [CrossRef]
- J. Dunnings and T. P. Breckon, “Experimentally Defined Convolutional Neural Network Architecture Variants for Non-Temporal Real-Time Fire Detection,” 2018 25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 2018, pp. 1558-1562.
- Huang, J.; Zhou, J.; Yang, H.; Liu, Y.; Liu, H. A Small-Target Forest Fire Smoke Detection Model Based on Deformable Transformer for End-to-End Object Detection. Forests 2023, 14, 162. [CrossRef]
- Muhammad, Khan, Jamil Ahmad, and Sung Wook Baik. “Early fire detection using convolutional neural networks during surveillance for effective disaster management.” Neurocomputing 288 (2018): 30-42.
- Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
- Redmon, Joseph, and Ali Farhadi. “YOLO9000: better, faster, stronger.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
- Redmon, Joseph, and Ali Farhadi. “YOLOv3: An incremental improvement.” arXiv preprint arXiv:1804.02767 (2018).
- Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. “YOLOv4: Optimal speed and accuracy of object detection.” arXiv preprint arXiv:2004.10934 (2020).
- Li, Chuyi, et al. “YOLOv6: A single-stage object detection framework for industrial applications.” arXiv preprint arXiv:2209.02976 (2022).
- Wang, Chien-Yao, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023.
- Uijlings, Jasper RR, et al. “Selective search for object recognition.” International journal of computer vision 104 (2013): 154-171.
- Girshick, Ross. “Fast R-CNN.” Proceedings of the IEEE international conference on computer vision. 2015.
- Ren, Shaoqing, et al. “Faster R-CNN: Towards real-time object detection with region proposal networks.” Advances in neural information processing systems 28 (2015).
- Cai, Zhaowei, and Nuno Vasconcelos. “Cascade R-CNN: High quality object detection and instance segmentation.” IEEE Transactions on Pattern Analysis and Machine Intelligence 43.5 (2019): 1483-1498.
- Sun, Peize, et al. “Sparse R-CNN: End-to-end object detection with learnable proposals.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2021.
- Zhao, L.; Zhi, L.; Zhao, C.; Zheng, W. Fire-YOLO: A Small Target Object Detection Method for Fire Inspection. Sustainability 2022, 14, 4930. [CrossRef]
- Carion, Nicolas, et al. “End-to-end object detection with transformers.” European conference on computer vision. Cham: Springer International Publishing, 2020. [CrossRef]
- Zhu, Xizhou, et al. “Deformable DETR: Deformable Transformers for End-to-End Object Detection.” International Conference on Learning Representations. 2020.
- Hu, Yaowen, et al. “Fast forest fire smoke detection using MVMNet.” Knowledge-Based Systems 241 (2022): 108219. [CrossRef]
- Dai, Jifeng, et al. “Deformable convolutional networks.” Proceedings of the IEEE international conference on computer vision. 2017.
- Geng, X., Su, Y., Cao, X. et al. YOLOFM: an improved fire and smoke object detection algorithm based on YOLOv5n. Sci Rep 14, 4543 (2024). [CrossRef]
- Chen, G.; Cheng, R.; Lin, X.; Jiao, W.; Bai, D.; Lin, H. LMDFS: A Lightweight Model for Detecting Forest Fire Smoke in UAV Images Based on YOLOv7. Remote Sens. 2023, 15, 3790. [CrossRef]
- Pan, Jin, Xiaoming Ou, and Liang Xu. “A collaborative region detection and grading framework for forest fire smoke using weakly supervised fine segmentation and lightweight faster-RCNN.” Forests 12.6 (2021): 768. [CrossRef]
- Feng, Qihan, Xinzheng Xu, and Zhixiao Wang. “Deep learning-based small object detection: A survey.” Mathematical Biosciences and Engineering 20.4 (2023): 6551-6590. [CrossRef]
- P. Barmpoutis, K. Dimitropoulos, K. Kaza and N. Grammalidis, “Fire Detection from Images Using Faster R-CNN and Multidimensional Texture Analysis,” ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 8301-8305.
- Chaoxia, Chenyu, Weiwei Shang, and Fei Zhang. “Information-guided flame detection based on faster R-CNN.” IEEE Access 8 (2020): 58923-58932. [CrossRef]
- Duan, Kaiwen, et al. “Corner proposal network for anchor-free, two-stage object detection.” European Conference on Computer Vision. Cham: Springer International Publishing, 2020.
- Zhang, Shifeng, et al. “Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.
- Zhao, Mingxin, et al. “Quantizing oriented object detection network via outlier-aware quantization and IoU approximation.” IEEE Signal Processing Letters 27 (2020): 1914-1918. [CrossRef]
- Mardani, Konstantina, Nicholas Vretos, and Petros Daras. “Transformer-based fire detection in videos.” Sensors 23.6 (2023): 3035. [CrossRef]
- Li, Yuming, et al. “An efficient fire and smoke detection algorithm based on an end-to-end structured network.” Engineering Applications of Artificial Intelligence 116 (2022): 105492. [CrossRef]
- Huang, Jingwen, et al. “A small-target forest fire smoke detection model based on deformable transformer for end-to-end object detection.” Forests 14.1 (2023): 162. [CrossRef]
- Meng, Depu, et al. “Conditional detr for fast training convergence.” Proceedings of the IEEE/CVF international conference on computer vision. 2021.
- Liu, Shilong, et al. “DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR.” International Conference on Learning Representations. 2021.
- Li, Feng, et al. “Lite DETR: An interleaved multi-scale encoder for efficient detr.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems 30 (2017).
- He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
- Liu, Ze, et al. “Swin transformer: Hierarchical vision transformer using shifted windows.” Proceedings of the IEEE/CVF international conference on computer vision. 2021.
- Liu, Zhuang, et al. “A convnet for the 2020s.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2022.
- Mehta, Sachin, and Mohammad Rastegari. “Separable Self-attention for Mobile Vision Transformers.” Transactions on Machine Learning Research (2022).
- Lv, Wenyu, et al. “DETRs beat YOLOs on real-time object detection.” arXiv preprint arXiv:2304.08069 (2023).
- Rezatofighi, Hamid, et al. “Generalized intersection over union: A metric and a loss for bounding box regression.” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019. [CrossRef]
- Zheng, Zhaohui, et al. “Distance-IoU loss: Faster and better learning for bounding box regression.” Proceedings of the AAAI conference on artificial intelligence. Vol. 34. No. 07. 2020.
- Gevorgyan, Zhora. “SIoU loss: More powerful learning for bounding box regression.” arXiv preprint arXiv:2205.12740 (2022).
- Liu, Can, et al. “Powerful-IoU: More straightforward and faster bounding box regression loss with a nonmonotonic focusing mechanism.” Neural Networks 170 (2024): 276-284.
- Zhang, Yi-Fan, et al. “Focal and efficient IOU loss for accurate bounding box regression.” Neurocomputing 506 (2022): 146-157. [CrossRef]
- Tong, Zanjia, et al. “Wise-IoU: Bounding box regression loss with dynamic focusing mechanism.” arXiv preprint arXiv:2301.10051 (2023).
- Kingma, Diederik P., and Jimmy Ba. “Adam: A method for stochastic optimization.” arXiv preprint arXiv:1412.6980 (2014).
- Tan, Mingxing, and Quoc Le. “Efficientnet: Rethinking model scaling for convolutional neural networks.” International conference on machine learning. PMLR, 2019.
- Woo, Sanghyun, et al. “Convnext v2: Co-designing and scaling convnets with masked autoencoders.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.
- Lyu, Chengqi, et al. “RTMDet: An empirical study of designing real-time object detectors.” arXiv preprint arXiv:2212.07784 (2022).
- Chen, Qiang, et al. “Group detr: Fast detr training with group-wise one-to-many assignment.” Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023.

Experimental environment (Section 4.3):

| Configuration | Details |
| --- | --- |
| Hardware configuration | CPU: Intel Xeon Platinum 8255C @ 2.50 GHz |
| | GPU: NVIDIA GeForce RTX 3090 |
| | GPU number: 4 |
| Software configuration | GCC: Ubuntu 9.4.0-1ubuntu1~20.04.1 |
| | CUDA: 11.7 |
| | Python: 3.8.10 |
| | PyTorch: 1.11.0 |
| | CuDNN: 8.2 |
Training hyperparameters (Section 4.4):

| Parameter Name | Parameter Value |
| --- | --- |
| epoch | 100 |
| batch size | 4 |
| accumulative counts | 4 |
| optimizer | AdamW |
| learning rate | 0.0002 |
| weight decay | 0.0001 |
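A minimal sketch of how the settings above map to PyTorch; everything except the AdamW hyperparameters, the epoch count, and the accumulation logic is a placeholder.

```python
import torch
from torch import nn

# Placeholders stand in for the detector and data pipeline; only the AdamW
# hyperparameters and the 4-step gradient accumulation mirror the table
# (micro-batch 4 x 4 accumulation steps = effective batch size 16).
model = nn.Linear(16, 4)
criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)

EPOCHS, STEPS_PER_EPOCH, ACCUM = 100, 32, 4  # STEPS_PER_EPOCH is synthetic

for epoch in range(EPOCHS):
    for step in range(STEPS_PER_EPOCH):
        x, y = torch.randn(4, 16), torch.randn(4, 4)   # micro-batch of 4
        loss = criterion(model(x), y) / ACCUM          # average over the window
        loss.backward()
        if (step + 1) % ACCUM == 0:                    # "accumulative counts" = 4
            optimizer.step()
            optimizer.zero_grad()
```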
Backbone comparison (Section 5.1):

| Backbone | mAP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%) | FLOPs (G) | Params (M) | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ResNet-50 | 65.5 | 84.0 | 63.7 | 45.1 | 53.7 | 70.6 | 126.0 | 41.1 | 25.1 |
| EfficientNet-b0 | 64.9 | 81.6 | 63.3 | 35.6 | 53.4 | 69.8 | 71.3 | 16.4 | 18.9 |
| ConvNeXtv2-A | 60.2 | 74.4 | 60.1 | 27.2 | 49.4 | 65.2 | 74.4 | 41.9 | 19.6 |
| ConvNeXt-tiny | 66.1 | 84.3 | 65.2 | 53.6 | 53.1 | 71.6 | 70.8 | 40.8 | 29.8 |
IoU loss function comparison (Section 5.2):

| IoU loss function | mAP (%) | AP50 (%) | AP75 (%) | APS (%) | APM (%) | APL (%) | Total training time (h) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| GIoU | 65.5 | 84.0 | 63.7 | 45.1 | 53.7 | 70.6 | 23.2 |
| DIoU | 65.4 | 82.8 | 63.8 | 39.1 | 52.4 | 70.6 | 19.2 |
| CIoU | 65.6 | 83.8 | 64.4 | 43.2 | 54.3 | 70.8 | 18.0 |
| SIoU | 65.5 | 83.6 | 64.6 | 41.1 | 53.1 | 70.6 | 19.2 |
| PIoU v1 | 65.2 | 83.3 | 64.5 | 48.7 | 51.7 | 70.5 | 18.9 |
| PIoU v2 | 65.6 | 83.6 | 64.8 | 48.2 | 52.8 | 70.7 | 19.5 |
Ablation experiments (Section 5.3); √ indicates the corresponding improvement is enabled, × that it is not:

| Model | ConvNeXt | Mixed encoder | PIoU v2 loss | mAP (%) | Accuracy_fire (%) | Accuracy_smoke (%) | Accuracy_human (%) | FLOPs (G) | Params (M) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 (Deformable DETR) | × | × | × | 65.5 | 96.89 | 73.97 | 79.88 | 126.0 | 41.1 |
| 2 | √ | × | × | 66.1 | 97.50 | 80.48 | 80.17 | 70.8 | 40.8 |
| 3 | × | √ | × | 65.8 | 98.01 | 73.27 | 79.99 | 75.5 | 46.3 |
| 4 | × | × | √ | 65.6 | 97.21 | 76.91 | 78.62 | 123.0 | 40.1 |
| 5 | √ | √ | × | 66.6 | 98.05 | 78.09 | 78.89 | 77.5 | 50.1 |
| 6 | √ | × | √ | 66.2 | 97.62 | 80.75 | 79.40 | 79.8 | 40.8 |
| 7 | √ | √ | √ | 66.7 | 98.05 | 78.78 | 80.22 | 77.5 | 50.8 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).