Submitted: 19 May 2025
Posted: 20 May 2025
Abstract
Keywords:
1. Introduction
- Multiscale Convolutions with Rectangular Kernels: SmokeNet integrates a multiscale convolution module that uses rectangular kernels alongside standard square kernels, allowing it to adapt to the irregular shapes smoke often takes. Vertically oriented kernels capture the tall, narrow columns typical of wildfire smoke, while horizontally oriented kernels suit the wide, low plumes produced by quarry blasts.
- Lightweight Multiview Linear Attention: To enhance feature integration without imposing high computational costs, SmokeNet incorporates a linear attention mechanism with multi-view element-wise multiplication, enabling the model to selectively attend to both spatial and channel-wise features. This design preserves accuracy while significantly reducing the parameter count, allowing smoke segmentation even in GPU-constrained settings.
- Layer-Specific Loss: To optimize feature refinement, we introduce a layer-specific loss strategy that minimizes feature gaps across the network’s layers, fostering more detailed and precise feature learning. This approach enhances segmentation accuracy by aligning intermediate feature representations throughout the network, thereby supporting consistent and refined feature extraction without increasing model complexity.
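The rectangular-kernel idea in the first contribution can be illustrated with a minimal NumPy sketch. The kernel sizes (3×3, 7×1, 1×7) and the simple averaging kernels are illustrative assumptions, not SmokeNet's actual learned configuration; the point is only that an elongated kernel aggregates evidence along a plume's long axis:

```python
import numpy as np

def conv2d_same(x, k):
    """Single-channel 2D cross-correlation with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def multiscale_rect(x):
    """Parallel square, vertical, and horizontal branches (sizes assumed)."""
    square = np.ones((3, 3)) / 9.0      # standard kernel
    vertical = np.ones((7, 1)) / 7.0    # tall, narrow plumes (wildfire)
    horizontal = np.ones((1, 7)) / 7.0  # wide, low plumes (quarry blasts)
    return np.stack([conv2d_same(x, k) for k in (square, vertical, horizontal)])

# A tall, narrow "plume": the vertical branch averages along the plume and
# keeps a strong response, while the horizontal branch dilutes it.
img = np.zeros((16, 16))
img[2:14, 7] = 1.0
feats = multiscale_rect(img)
print(feats[1].max(), feats[2].max())  # vertical response > horizontal
```

In a real network the branch outputs would be concatenated or fused and the kernels learned; this sketch only shows why kernel shape matters for elongated structures.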
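The second contribution's key property, attending to spatial and channel views at linear cost, can be sketched as follows. This is a simplified stand-in, not the paper's exact multi-view formulation: each view produces a sigmoid gate from a cheap reduction, and both gates are applied by element-wise multiplication, so no HW×HW attention matrix is ever formed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multiview_linear_attention(f):
    """Gate a (C, H, W) feature map from two views (simplified sketch).

    Spatial view: one gate per pixel, computed from the channel mean.
    Channel view: one gate per channel, computed from the spatial mean.
    Cost is linear in the number of pixels and channels.
    """
    spatial_gate = sigmoid(f.mean(axis=0, keepdims=True))       # (1, H, W)
    channel_gate = sigmoid(f.mean(axis=(1, 2), keepdims=True))  # (C, 1, 1)
    return f * spatial_gate * channel_gate                      # broadcast multiply

f = np.random.default_rng(0).normal(size=(8, 16, 16))
out = multiview_linear_attention(f)
print(out.shape)  # (8, 16, 16)
```

Because both gates lie in (0, 1), the module can only attenuate features; learned projections (omitted here) would let the real mechanism reweight them more expressively.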
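The layer-specific loss in the third contribution amounts to adding a weighted per-layer alignment term to the main segmentation loss. The sketch below is a hedged illustration: the pairing of each layer with a reference representation, the MSE distance, and the weights are all assumptions, since the paper's exact formulation is not reproduced here:

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def layer_specific_loss(seg_loss, layer_feats, layer_targets, weights):
    """Total loss = segmentation loss + weighted per-layer alignment terms.

    layer_feats / layer_targets: lists of intermediate feature maps and the
    reference representations they should match (e.g. features from a deeper
    stage); weights: per-layer coefficients. All are illustrative assumptions.
    """
    align = sum(w * mse(f, t) for f, t, w in zip(layer_feats, layer_targets, weights))
    return seg_loss + align

rng = np.random.default_rng(1)
feats = [rng.normal(size=(4, 8, 8)) for _ in range(3)]
targets = [f + 0.1 * rng.normal(size=f.shape) for f in feats]
total = layer_specific_loss(0.5, feats, targets, weights=[0.1, 0.2, 0.3])
print(total)  # segmentation loss plus small per-layer alignment penalties
```

Since the alignment terms reuse activations already computed in the forward pass, a loss of this shape adds no parameters at inference time, which is consistent with the claim that feature alignment comes without extra model complexity.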
2. Related Work
2.1. Deep Learning Methods for Smoke and Fire Segmentation
2.2. Advances in Attention Mechanisms and Multiscale Feature Representation
2.3. Enhancing Model Robustness and Computational Efficiency
3. Methodology
3.1. Encoder
3.1.1. Multiscale Feature Extraction
3.1.2. Multiview Linear Attention Mechanism
3.2. Decoder
3.2.1. Decoder with Skip Connections
3.2.2. Decoder Stage Operations
3.3. Loss Function
4. Experiments
4.1. Experimental Setup
4.1.1. Deep Learning Architecture
4.1.2. Dataset
- Fire Smoke [25]: A real-world dataset with 3,826 images, including 3,060 training images and 766 test images, as illustrated in Figure 4 (c). It captures both outdoor wildfire smoke and indoor smoke scenarios, providing realistic environments where smoke detection is critical for early fire warning and safety monitoring.
- Quarry Smoke: An industrial dataset comprising 3,703 images, including 2,962 training images and 741 test images, as illustrated in Figures 4 (d), (e), and (f). It represents dense, irregular smoke plumes mixed with dust and debris from quarry blasts, testing the model’s ability to segment smoke in dynamic and high-variability environments.
4.1.3. Quarry Smoke Dataset Collection
4.1.4. Data Augmentation
4.1.5. Performance Metrics
- Mean Intersection over Union (mIoU): Assesses segmentation accuracy by quantifying the overlap between predicted and ground-truth masks.
- Parameter Count: Indicates model scalability and resource usage. Reported in millions (M) or thousands (K), with units specified in each table.
- Floating Point Operations (FLOPs): Measures computational complexity. Reported in gigaflops (GFLOPs), as indicated in the tables.
- Frames per Second (FPS): Reflects inference speed, critical for computationally constrained applications in dynamic environments like quarry blast monitoring.
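Of these metrics, mIoU is the one most often re-implemented, and its standard definition is easy to state in a few lines (binary smoke/background shown; the class count is a parameter):

```python
import numpy as np

def miou(pred, gt, num_classes=2):
    """Mean IoU over classes: |intersection| / |union| of predicted and ground-truth masks."""
    ious = []
    for c in range(num_classes):
        p, g = pred == c, gt == c
        union = np.logical_or(p, g).sum()
        if union == 0:
            continue  # class absent from both masks; skip rather than divide by zero
        ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
# smoke class (1): inter 2, union 3 -> 2/3; background (0): inter 1, union 2 -> 1/2
print(miou(pred, gt))  # (2/3 + 1/2) / 2 ≈ 0.583
```

Parameter count, FLOPs, and FPS, by contrast, are measured directly from the model and hardware rather than computed from predictions.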
4.2. Results and Discussion
4.2.1. Results
4.2.2. Impact of Architectural Innovations
4.2.3. Segmentation Performance Comparison
4.2.4. Model Efficiency Comparison
5. Conclusion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015, Proceedings, Part III; Springer, 2015; pp. 234–241.
2. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015; pp. 3431–3440.
3. Howard, A.G. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
4. Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning; PMLR, 2019; pp. 6105–6114.
5. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 6848–6856.
6. Dosovitskiy, A. An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
7. Yuan, F.; Zhang, L.; Xia, X.; Wan, B.; Huang, Q.; Li, X. Deep Smoke Segmentation. Neurocomputing 2019, 357, 248–260.
8. Frizzi, S.; Bouchouicha, M.; Ginoux, J.M.; Moreau, E.; Sayadi, M. Convolutional Neural Network for Smoke and Fire Semantic Segmentation. IET Image Processing 2021, 15, 634–647.
9. Yuan, F.; Li, K.; Wang, C.; Fang, Z. A Lightweight Network for Smoke Semantic Segmentation. Pattern Recognition 2023, 137, 109289.
10. Hou, F.; Rui, X.; Chen, Y.; Fan, X. Flame and Smoke Semantic Dataset: Indoor Fire Detection with Deep Semantic Segmentation Model. Electronics 2023, 12, 3778.
11. Hu, X.; Jiang, F.; Qin, X.; Huang, S.; Yang, X.; Meng, F. An Optimized Smoke Segmentation Method for Forest and Grassland Fire Based on the UNet Framework. Fire 2024, 7, 68.
12. Marto, T.; Bernardino, A.; Cruz, G. Fire and Smoke Segmentation Using Active Learning Methods. Remote Sensing 2023, 15, 4136.
13. Wang, H.; Peng, L.; Sun, Y.; Wan, Z.; Wang, Y.; Cao, Y. Brightness Perceiving for Recursive Low-Light Image Enhancement. IEEE Transactions on Artificial Intelligence 2025, early access.
14. Peng, L.; Cao, Y.; Sun, Y.; Wang, Y. Lightweight Adaptive Feature De-drifting for Compressed Image Classification. IEEE Transactions on Multimedia 2024, early access.
15. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support: DLMIA 2018 and ML-CDS 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, 20 September 2018; Springer, 2018; pp. 3–11.
16. Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation. IEEE Transactions on Intelligent Transportation Systems 2017, 19, 263–272.
17. Li, H.; Xiong, P.; Fan, H.; Sun, J. DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp. 9522–9531.
18. Oktay, O. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018, arXiv:1804.03999.
19. Wu, T.; Tang, S.; Zhang, R.; Cao, J.; Zhang, Y. CGNet: A Light-weight Context Guided Network for Semantic Segmentation. IEEE Transactions on Image Processing 2020, 30, 1169–1179.
20. Mehta, S.; Rastegari, M. Separable Self-Attention for Mobile Vision Transformers. arXiv 2022, arXiv:2206.02680.
21. Valanarasu, J.M.J.; Patel, V.M. UNeXt: MLP-based Rapid Medical Image Segmentation Network. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer, 2022; pp. 23–33.
22. Ruan, J.; Xiang, S.; Xie, M.; Liu, T.; Fu, Y. MALUNet: A Multi-Attention and Light-weight UNet for Skin Lesion Segmentation. In Proceedings of the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); IEEE, 2022; pp. 1150–1156.
23. Wang, Y.; Zhou, Q.; Liu, J.; Xiong, J.; Gao, G.; Wu, X.; Latecki, L.J. LEDNet: A Lightweight Encoder-Decoder Network for Real-Time Semantic Segmentation. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP); IEEE, 2019; pp. 1860–1864.
24. Cheng, H.Y.; Yin, J.L.; Chen, B.H.; Yu, Z.M. Smoke 100k: A Database for Smoke Detection. In Proceedings of the 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE); IEEE, 2019; pp. 596–597.
25. Kaabi, R.; Bouchouicha, M.; Mouelhi, A.; Sayadi, M.; Moreau, E. An Efficient Smoke Detection Algorithm Based on Deep Belief Network Classifier Using Energy and Intensity Features. Electronics 2020, 9, 1390.





| Model Configuration | Smoke100k mIoU (%) | DS01 mIoU (%) | Fire Smoke mIoU (%) | Quarry Smoke mIoU (%) | #Params (M) ↓ | GFLOPs ↓ | FPS ↑ |
|---|---|---|---|---|---|---|---|
| Baseline | 72.40±0.08 | 70.83±0.06 | 70.45±0.11 | 69.12±0.05 | 0.42 | 0.24 | 54.25 |
| + Multiscale | 72.19±0.10 | 69.78±0.05 | 67.22±0.07 | 63.71±0.06 | 0.23 | 0.08 | 128.65 |
| + MultiviewAttn | 73.81±0.07 | 71.53±0.12 | 69.62±0.04 | 67.74±0.10 | 0.71 | 0.12 | 56.03 |
| + LayerLoss | 70.75±0.09 | 67.45±0.06 | 66.16±0.08 | 63.52±0.07 | 0.42 | 0.24 | 54.25 |
| + Multiscale + LayerLoss | 72.24±0.04 | 71.41±0.05 | 68.95±0.08 | 66.67±0.07 | 0.23 | 0.08 | 128.65 |
| + MultiviewAttn + LayerLoss | 74.10±0.07 | 73.14±0.06 | 72.24±0.05 | 71.67±0.08 | 0.71 | 0.12 | 56.03 |
| + Multiscale + MultiviewAttn | 75.63±0.05 | 73.83±0.09 | 71.22±0.06 | 70.34±0.07 | 0.34 | 0.07 | 77.05 |
| Full Model (SmokeNet) | 76.45±0.10 | 74.43±0.04 | 73.43±0.03 | 72.74±0.06 | 0.34 | 0.07 | 77.05 |
| Methods | Smoke100k mIoU (%) | DS01 mIoU (%) | Fire Smoke mIoU (%) | Quarry Smoke mIoU (%) | #Params (M) ↓ | GFLOPs ↓ | FPS ↑ |
|---|---|---|---|---|---|---|---|
| UNet (2015) | 66.13±0.10 | 61.32±0.08 | 60.14±0.06 | 57.18±0.05 | 28.24 | 35.24 | 75.58 |
| UNet++ (2018) | 69.12±0.09 | 64.65±0.04 | 61.77±0.10 | 58.44±0.06 | 9.16 | 10.72 | 91.25 |
| AttentionUNet (2018) | 69.68±0.05 | 66.59±0.12 | 64.15±0.07 | 59.64±0.09 | 31.55 | 37.83 | 46.48 |
| UNeXt-S (2022) | 72.25±0.11 | 71.62±0.07 | 69.59±0.12 | 64.54±0.04 | 0.77 | 0.08 | 202.06 |
| MobileViTv2 (2022) | 71.73±0.10 | 71.54±0.07 | 70.23±0.04 | 69.12±0.11 | 2.30 | 0.09 | 98.84 |
| MALUNet (2022) | 71.81±0.05 | 70.16±0.10 | 69.42±0.04 | 67.64±0.07 | 0.17 | 0.09 | 87.72 |
| ERFNet (2017) | 71.84±0.09 | 71.38±0.10 | 66.59±0.06 | 66.24±0.07 | 2.06 | 3.32 | 61.22 |
| LEDNet (2019) | 70.76±0.10 | 71.63±0.07 | 70.13±0.08 | 67.74±0.11 | 0.91 | 1.41 | 60.19 |
| DFANet (2019) | 66.91±0.05 | 63.87±0.10 | 62.76±0.07 | 70.21±0.09 | 2.18 | 0.44 | 31.05 |
| CGNet (2020) | 75.64±0.07 | 73.76±0.11 | 72.04±0.10 | 71.91±0.08 | 0.49 | 0.86 | 53.53 |
| DSS (2019) | 73.25±0.05 | 72.17±0.04 | 69.78±0.12 | 69.81±0.07 | 30.20 | 184.90 | 32.56 |
| Frizzi (2021) | 73.44±0.06 | 71.67±0.09 | 70.51±0.07 | 70.40±0.11 | 20.17 | 27.90 | 60.32 |
| Yuan (2023) | 75.57±0.07 | 74.84±0.06 | 71.94±0.10 | 70.92±0.08 | 0.88 | 1.15 | 68.81 |
| SmokeNet | 76.45±0.10 | 74.43±0.04 | 73.43±0.03 | 72.74±0.06 | 0.34 | 0.07 | 77.05 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).