Submitted: 06 January 2026
Posted: 07 January 2026
Abstract
Keywords:
1. Introduction
2. Methodology
2.1. Literature Search Strategy
2.2. Inclusion and Exclusion Criteria
2.3. Screening and Selection Approach
2.4. Data Extraction and Categorization
3. Architectural Landscape of Face Mask Detection Models
3.1. Conventional CNN-Based Approaches
3.2. Lightweight Convolutional Models
3.3. Hybrid Architectures
4. Comparative Performance Analysis
4.1. Evaluation Metrics
- TP (True Positive): correctly predicted positive instances
- FP (False Positive): incorrect positive predictions
- FN (False Negative): missed positive instances
- TN (True Negative): correctly predicted negative instances
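
As a minimal sketch, the four metrics named in Sections 4.1.1 to 4.1.4 can be computed from these counts using their standard definitions: accuracy = (TP + TN) / (TP + FP + FN + TN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The function name `classification_metrics` and the example counts are illustrative assumptions, not values taken from any reviewed study.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a binary mask / no-mask test set of 200 images.
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With imbalanced classes, accuracy alone can look high while recall on the minority class stays low, which is why the four metrics are usually reported together.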
4.1.1. Accuracy
4.1.2. Precision
4.1.3. Recall (Sensitivity)
4.1.4. F1-Score
4.2. Trade-Offs Between Accuracy and Efficiency
4.2.1. Impact of Model Size and Architecture on Accuracy and Efficiency
4.2.2. Comparative Performance of Lightweight and Heavyweight Models
5. Future Research Directions
5.1. Improper Mask Detection and Multi-Class Analysis
5.2. Domain Adaptation and Real-World Variability
- Unsupervised domain adaptation (UDA) for aligning feature distributions across environments
- Self-supervised representation learning to reduce dependency on labels
- Cross-dataset training pipelines that incorporate heterogeneous noise, mask materials, and cultural variations
- Synthetic domain randomization to simulate low-quality or occluded footage
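
The last item, synthetic domain randomization, can be illustrated with a minimal sketch. It assumes a grayscale image stored as a nested list of integers in the range 0 to 255 and applies three randomized degradations: a global brightness shift, additive Gaussian noise, and a flat rectangular occluder. The function name `randomize_domain` and all parameter ranges are hypothetical choices made for illustration; practical pipelines would typically rely on an image library and a richer set of transforms.

```python
import random


def randomize_domain(image, rng, max_shift=40, noise_std=10.0, max_occluder=0.4):
    """Return a degraded copy of a grayscale image (list of rows of 0-255 ints)."""
    height, width = len(image), len(image[0])
    shift = rng.randint(-max_shift, max_shift)  # global brightness change
    # Occluder size is a random fraction of each dimension (at least 1 pixel).
    occ_h = max(1, int(height * rng.uniform(0.1, max_occluder)))
    occ_w = max(1, int(width * rng.uniform(0.1, max_occluder)))
    top = rng.randint(0, height - occ_h)
    left = rng.randint(0, width - occ_w)
    occ_value = rng.randint(0, 255)  # flat-colored occluder

    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, pixel in enumerate(row):
            if top <= y < top + occ_h and left <= x < left + occ_w:
                new_row.append(occ_value)
            else:
                noisy = pixel + shift + rng.gauss(0.0, noise_std)
                # Clamp back into the valid 8-bit range.
                new_row.append(min(255, max(0, int(round(noisy)))))
        out.append(new_row)
    return out


rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [[128] * 32 for _ in range(32)]
augmented = randomize_domain(clean, rng)
```

Drawing fresh random parameters for every training sample exposes a model to many simulated capture conditions without requiring additional labeled footage.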
5.3. Expanding Applications Beyond Mask Detection
- PPE compliance monitoring (helmets, gloves, lab coats)
- Human behavior analysis (face-touching detection, cough detection, proximity violations)
- Health screening (visible respiratory cues, temperature screening integration)
- Access-control and identity verification under occlusion
- Crowd analytics and anomaly detection for smart-city infrastructure
6. Conclusion
References
- Liang, M.; Gao, L.; Cheng, C.; Zhou, Q.; Uy, J.P.; Heiner, K.; Sun, C. Efficacy of face mask in preventing respiratory virus transmission: A systematic review and meta-analysis. Travel Med. Infect. Dis. 2020, 36, 101751. [Google Scholar] [CrossRef]
- Sethi, S.; Kathuria, M.; Kaushik, T. Face mask detection using deep learning: An approach to reduce risk of Coronavirus spread. J. Biomed. Informatics 2021, 120, 103848. [Google Scholar] [CrossRef]
- Wu, P.; Li, H.; Zeng, N.; Li, F. FMD-Yolo: An efficient face mask detection method for COVID-19 prevention and control in public. Image Vis. Comput. 2021, 117, 104341. [Google Scholar] [CrossRef]
- Kolosov, D.; Kelefouras, V.; Kourtessis, P.; Mporas, I. Anatomy of Deep Learning Image Classification and Object Detection on Commercial Edge Devices: A Case Study on Face Mask Detection. IEEE Access 2022, 10, 109167–109186. [Google Scholar] [CrossRef]
- Ullah, N.; Javed, A.; Ghazanfar, M.A.; Alsufyani, A.; Bourouis, S. A novel DeepMaskNet model for face mask detection and masked facial recognition. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 9905–9914. [Google Scholar] [CrossRef] [PubMed]
- Abbas, S.F.; Shaker, S.H.; Abdullatif, F.A. Face Mask Detection Based on Deep Learning: A Review. J. Soft Comput. Comput. Appl. 2024, 1, 7. [Google Scholar] [CrossRef]
- Amer, F.; Ali, M.; Al-Tamimi, M.S.H. Face mask detection methods and techniques: A review. Int. J. Nonlinear Anal. Appl. 2022, 13 (ISSN 2008-6822). [Google Scholar] [CrossRef]
- Vibhuti; Jindal, N.; Singh, H.; Rana, P.S. Face mask detection in COVID-19: a strategic review. Multimedia Tools Appl. 2022, 81, 40013–40042. [Google Scholar] [CrossRef]
- Alturki, R.; Alharbi, M.; AlAnzi, F.; Albahli, S. Deep learning techniques for detecting and recognizing face masks: A survey. Front. Public Heal. 2022, 10, 955332. [Google Scholar] [CrossRef] [PubMed]
- Anggraini, N.; Ramadhani, S.H.; Wardhani, L.K.; Hakiem, N.; Shofi, I.M.; Rosyadi, M.T. Development of Face Mask Detection using SSDLite MobilenetV3 Small on Raspberry Pi 4. 2022 5th International Conference of Computer and Informatics Engineering (IC2IE), Indonesia; pp. 209–214.
- Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), May 2019; vol. 2019-June, pp. 10691–10700. Available online: https://arxiv.org/pdf/1905.11946 (accessed on 1 December 2025).
- Sanjaya, S.A.; Rakhmawan, S.A. Face Mask Detection Using MobileNetV2 in The Era of COVID-19 Pandemic. 2020 International Conference on Data Analytics for Business and Industry: Way Towards a Sustainable Economy (ICDABI), Bahrain; pp. 1–5.
- Shao, Y.; Ning, J.; Shao, H.; Zhang, D.; Chu, H.; Ren, Z. Lightweight face mask detection algorithm with attention mechanism. Eng. Appl. Artif. Intell. 2024, 137. [Google Scholar] [CrossRef]
- Dodda, R.; C, R.; R.S., U.; Azmera, C.N.; M, S.; Nimmala, S. Real-Time Face Mask Detection Using Deep Learning: Enhancing Public Health and Safety. p. 02013.
- Sheikh, B.U.H.; Zafar, A. RRFMDS: Rapid Real-Time Face Mask Detection System for Effective COVID-19 Monitoring. SN Comput. Sci. 2023, 4, 1–19. [Google Scholar] [CrossRef]
- Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), Conference Track Proceedings; September 2014. Available online: https://arxiv.org/pdf/1409.1556 (accessed on 4 December 2025).
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar] [CrossRef]
- Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Proc. AAAI Conf. Artif. Intell. 2017, 31. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. CoRR 2015. [Google Scholar]
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar] [CrossRef]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
- Radosavovic, I.; Kosaraju, R.P.; Girshick, R.; He, K.; Dollar, P. Designing Network Design Spaces. Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020, Volume 10, 10425–10433. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar] [CrossRef]
- Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar] [CrossRef]
- Cabani, A.; Hammoudi, K.; Benhabiles, H.; Melkemi, M. MaskedFace-Net – A dataset of correctly/incorrectly masked face images in the context of COVID-19. Smart Health 2021, 19, 100144. [Google Scholar] [CrossRef]
- Jiang, X.; Gao, T.; Zhu, Z.; Zhao, Y. Real-Time Face Mask Detection Method Based on YOLOv3. Electronics 2021, 10, 837. [Google Scholar] [CrossRef]
- Mahmoud, M.; Kasem, M.S.; Kang, H.-S. A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking. Appl. Sci. 2024, 14, 8781. [Google Scholar] [CrossRef]
- Howard, A.G. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. April 2017. Available online: https://arxiv.org/pdf/1704.04861 (accessed on 1 December 2025).
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, Salt Lake City, UT, USA, Jun. 2018; IEEE; pp. 4510–4520. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chen, B.; Wang, W.; Chen, L.-C.; Tan, M.; Chu, G.; Vasudevan, V.; Zhu, Y.; Pang, R.; et al. Searching for MobileNetV3. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 1314–1324. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv 2017. [Google Scholar] [CrossRef]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. Available online: https://arxiv.org/pdf/1602.07360 (accessed on 1 December 2025).
- Nagrath, P.; Jain, R.; Madan, A.; Arora, R.; Kataria, P.; Hemanth, J. SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustain. Cities Soc. 2021, 66, 102692. [Google Scholar] [CrossRef]
- Al-Rammahi, A.H.I. Face mask recognition system using MobileNetV2 with optimization function. Appl. Artif. Intell. 2022, 36. [Google Scholar] [CrossRef]
- Fadly, F.; Kurniawan, T.B.; Dewi, D.A.; Zakaria, M.Z.; Hisham, P.A.A.B. Deep Learning Based Face Mask Detection System Using MobileNetV2 for Enhanced Health Protocol Compliance. J. Appl. Data Sci. 2024, 5, 2067–2078. [Google Scholar] [CrossRef]
- Sharma, M.; Gunwant, H.; Saggar, P.; Gupta, L.; Gupta, D. EfficientNet-B0 Model for Face Mask Detection Based on Social Information Retrieval. Int. J. Inf. Syst. Model. Des. 2022, 13, 1–15. [Google Scholar] [CrossRef]
- Azouji, N.; Sami, A.; Taheri, M. EfficientMask-Net for face authentication in the era of COVID-19 pandemic. Signal, Image Video Process. 2022, 16, 1991–1999. [Google Scholar] [CrossRef]
- Chakma, B.B.; Masud, M.A.; Ahamed, T.; Tusher, M.H. Identification of Face Mask Using Convolutional Neural Network-Based EfficientNet Model. Khulna Univ. Stud. 2022, 531–538. [Google Scholar] [CrossRef]
- Benitez-Garcia, G.; Prudente-Tixteco, L.; Olivares-Mercado, J.; Takahashi, H. SqueezeMaskNet: Real-Time Mask-Wearing Recognition for Edge Devices. Big Data Cogn. Comput. 2025, 9, 10. [Google Scholar] [CrossRef]
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. Fighting against COVID-19: A novel deep learning model based on YOLO-v2 with ResNet-50 for medical face mask detection. Sustain. Cities Soc. 2021, 65, 102600. [Google Scholar] [CrossRef] [PubMed]
- Balaji, K.; Gowri, S. A Real-Time Face Mask Detection Using SSD and MobileNetV2. 2021 4th International Conference on Computing and Communications Technologies (ICCCT), India; pp. 144–148.
- Pham, T.-N.; Nguyen, V.-H.; Huh, J.-H. Integration of improved YOLOv5 for face mask detector and auto-labeling to generate dataset for fighting against COVID-19. J. Supercomput. 2023, 79, 8966–8992. [Google Scholar] [CrossRef]
- Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 2020, 167, 108288. [Google Scholar] [CrossRef]
- Tabassum, T.; Talukder, A.; Rahman, M.; Rashiduzzaman; Kabir, Z.; Islam, M.; Uddin, A. A Parallel Convolutional Neural Network for Accurate Face Mask Detection in the Fight Against COVID-19. Biomed. Mater. Devices 2025, 1–11. [Google Scholar] [CrossRef]
- Haque, S.B.U. A fuzzy-based frame transformation to mitigate the impact of adversarial attacks in deep learning-based real-time video surveillance systems. Appl. Soft Comput. 2024, 167. [Google Scholar] [CrossRef]
- Dubey, P.; Dubey, P.; Iwendi, C.; Biamba, C.N.; Rao, D.D. Enhanced IoT-Based Face Mask Detection Framework Using Optimized Deep Learning Models: A Hybrid Approach With Adaptive Algorithms. IEEE Access 2025, 13, 17325–17339. [Google Scholar] [CrossRef]
- Parikh, D.; Karthikeyan, A.; Ravi, V.; Shibu, M.; Singh, R.; Sofana, R.S. IoT and ML-driven framework for managing infectious disease risks in communal spaces: a post-COVID perspective. Front. Public Heal. 2025, 13, 1552515. [Google Scholar] [CrossRef]
- Truong, C.; Mishra, S.; Long, N.Q.; Ngoc, L.A. Efficient Face Mask Detection for Banking Information Systems. Creative Approaches Towards Development of Computing and Multidisciplinary IT Solutions for Society 2024, 435–454. [Google Scholar] [CrossRef]
- Jiang, X.; Gao, T.; Zhu, Z.; Zhao, Y. Real-Time Face Mask Detection Method Based on YOLOv3. Electronics 2021, 10, 837. [Google Scholar] [CrossRef]
- Himeur, Y.; Al-Maadeed, S.; Varlamis, I.; Al-Maadeed, N.; Abualsaud, K.; Mohamed, A. Face Mask Detection in Smart Cities Using Deep and Transfer Learning: Lessons Learned from the COVID-19 Pandemic. Systems 2023, 11, 107. [Google Scholar] [CrossRef]
- George, A.; Ecabert, C.; Shahreza, H.O.; Kotwal, K.; Marcel, S. EdgeFace: Efficient Face Recognition Model for Edge Devices. IEEE Trans. Biom. Behav. Identity Sci. 2024, 6, 158–168. [Google Scholar] [CrossRef]
- Anh, T.N.; Nguyen, V.D. MAPBoost: augmentation-resilient real-time object detection for edge deployment. J. Real-Time Image Process. 2025, 23, 10. [Google Scholar] [CrossRef]
- Hamdi, A.; Noura, H.; Azar, J.; Pujolle, G. Frugal Object Detection Models: Solutions, Challenges and Future Directions. In 2025 International Wireless Communications and Mobile Computing (IWCMC); pp. 1694–1701.
- Qian, J.; Mu, S.; Lu, H.; Xu, S. Two-stage model re-optimization and application in face recognition. Neurocomputing 2025, 651. [Google Scholar] [CrossRef]
- Mostafa, S.A.; Ravi, S.; Zebari, D.A.; Zebari, N.A.; Mohammed, M.A.; Nedoma, J.; Martinek, R.; Deveci, M.; Ding, W. A YOLO-based deep learning model for Real-Time face mask detection via drone surveillance in public spaces. Inf. Sci. 2024, 676. [Google Scholar] [CrossRef]
- Hussain, D.; Ismail, M.; Hussain, I.; Alroobaea, R.; Hussain, S.; Ullah, S.S. Face Mask Detection Using Deep Convolutional Neural Network and MobileNetV2-Based Transfer Learning. Wirel. Commun. Mob. Comput. 2022, 2022, 1–10. [Google Scholar] [CrossRef]
- Hagui, I.; Msolli, A.; Helali, A.; Fredj, H. Face Mask Detection using CNN: A Fusion of Cryptography and Blockchain. Eng. Technol. Appl. Sci. Res. 2024, 14, 17156–17161. [Google Scholar] [CrossRef]
- Umer, M.; Sadiq, S.; Alhebshi, R.M.; Alsubai, S.; Al Hejaili, A.; Eshmawi, A.A.; Nappi, M.; Ashraf, I. Face mask detection using deep convolutional neural network and multi-stage image processing. Image Vis. Comput. 2023, 133. [Google Scholar] [CrossRef]
- Benifa, J.V.B.; Chola, C.; Muaad, A.Y.; Bin Hayat, M.A.; Bin Heyat, B.; Mehrotra, R.; Akhtar, F.; Hussein, H.S.; Vargas, D.L.R.; Castilla, Á.K.; et al. FMDNet: An Efficient System for Face Mask Detection Based on Lightweight Model during COVID-19 Pandemic in Public Areas. Sensors 2023, 23, 6090. [Google Scholar] [CrossRef]
- Bania, R.K. Ensemble of deep transfer learning models for real-time automatic detection of face mask. Multimedia Tools Appl. 2023, 82, 25131–25153. [Google Scholar] [CrossRef] [PubMed]
- Habeeb, Z.Q.; Al-Zaydi, I. Incorrect facemask-wearing detection using image processing and deep learning. Bull. Electr. Eng. Informatics 2023, 12, 2212–2219. [Google Scholar] [CrossRef]
- Kumar, A.; Kalia, A.; Kalia, A. ETL-YOLO v4: A face mask detection algorithm in era of COVID-19 pandemic. Optik 2022, 259, 169051. [Google Scholar] [CrossRef]
- Hosny, K.M.; Ibrahim, N.A.; Mohamed, E.R.; Hamza, H.M. Artificial intelligence-based masked face detection: A survey. Intell. Syst. Appl. 2024, 22. [Google Scholar] [CrossRef]
- Mahmoud, M.; Kasem, M.S.; Kang, H.-S. A Comprehensive Survey of Masked Faces: Recognition, Detection, and Unmasking. Appl. Sci. 2024, 14, 8781. [Google Scholar] [CrossRef]
- Mbunge, E.; Simelane, S.; Fashoto, S.G.; Akinnuwesi, B.; Metfula, A.S. Application of deep learning and machine learning models to detect COVID-19 face masks - A review. Sustain. Oper. Comput. 2021, 2, 235–245. [Google Scholar] [CrossRef]
- Mulani, A.O.; Kulkarni, T.M. Face Mask Detection System Using Deep Learning: A Comprehensive Survey. Communications in Computer and Information Science 2025, vol. 2439 CCIS, 25–33. [Google Scholar] [CrossRef]
- Jayaswal, R.; Dixit, M. AI-based face mask detection system: a straightforward proposition to fight with Covid-19 situation. Multimedia Tools Appl. 2022, 82, 13241–13273. [Google Scholar] [CrossRef] [PubMed]
- Vukicevic, A.M.; Petrovic, M.; Milosevic, P.; Peulic, A.; Jovanovic, K.; Novakovic, A. A systematic review of computer vision-based personal protective equipment compliance in industry practice: advancements, challenges and future directions. Artif. Intell. Rev. 2024, 57, 1–28. [Google Scholar] [CrossRef]
- Benitez-Baltazar, V.H. Autonomic Face Mask Detection with Deep Learning: an IoT Application. Revista mexicana de ingeniería biomédica 2021, vol. 42(no. 2), 160–170. [Google Scholar] [CrossRef]
- Han, Z.; Huang, H.; Fan, Q.; Li, Y.; Li, Y.; Chen, X. SMD-YOLO: An efficient and lightweight detection method for mask wearing status during the COVID-19 pandemic. Comput. Methods Programs Biomed. 2022, 221, 106888. [Google Scholar] [CrossRef]
- Biswas, A.K.; Roy, K. A comparative study on ‘face mask detection’ using machine learning and deep learning algorithms. Artificial Intelligence in e-Health Framework, Volume 1: AI, Classification, Wearable Devices, and Computer-Aided Diagnosis 2025, vol. 1, 193–200. [Google Scholar] [CrossRef]
- Masud, U.; Siddiqui, M.; Sadiq, M.; Masood, S. SCS-Net: An efficient and practical approach towards Face Mask Detection. Procedia Comput. Sci. 2023, 218, 1878–1887. [Google Scholar] [CrossRef]
- Koh, E.J.; Amini, E.; McLachlan, G.J.; Beaton, N. Utilising convolutional neural networks to perform fast automated modal mineralogy analysis for thin-section optical microscopy. Miner. Eng. 2021, 173. [Google Scholar] [CrossRef]
- Sahoo, M.P.; Sridevi, M.; Sridhar, R. Covid prevention based on identification of incorrect position of face-mask. Procedia Comput. Sci. 2024, 235, 1222–1234. [Google Scholar] [CrossRef]
- Koklu, M.; Cinar, I.; Taspinar, Y.S. CNN-based bi-directional and directional long-short term memory network for determination of face mask. Biomed. Signal Process. Control. 2021, 71, 103216. [Google Scholar] [CrossRef] [PubMed]
- Wang, J.; Yuan, S.; Lu, T.; Zhao, H.; Zhao, Y. Fusing YOLOv5s-MediaPipe-HRV to classify engagement in E-learning: From the perspective of external observations and internal factors. Knowledge-Based Syst. 2024, 305. [Google Scholar] [CrossRef]
- Kuriakose, B.; Shrestha, R.; Sandnes, F.E. DeepNAVI: A deep learning based smartphone navigation assistant for people with visual impairments. Expert Syst. Appl. 2022, 212. [Google Scholar] [CrossRef]

| Ref | Experiment | Goal | Materials | Methods | Results | Conclusion |
|---|---|---|---|---|---|---|
| [10] | Object detection | To develop a Raspberry Pi 4–based SSDLite MobileNetV3 Small device capable of detecting correct and incorrect cloth masks, correct and incorrect medical masks, and cases where no mask is worn or the face is obscured. | Raspberry Pi 4 Model B 4 GB, Raspberry Pi 4 Model B Cam V.1, monitor, non-momentary push-button switch, fan, 1N4001 diode, three 470 Ω resistors, 2N2222 transistor | 1. Trained the SSDLite MobilenetV3 Small model with and without fine-tuning. 2. Compared the detection performance of SSDLite MobilenetV3 Small with other models such as SSDLite MobilenetV3 Large, SSDLite MobilenetV2, SSD MobilenetV2, SSDLite Mobiledets, and SSDMNV2. 3. Evaluated the detection performance, FPS, and power consumption of the models. | The SSDLite MobilenetV3 Small model with fine-tuning had the highest FPS of the compared models but could not accurately detect incorrect mask use. The overall accuracy of the SSDLite MobilenetV3 Small model was 70%. | The SSDLite MobilenetV3 Small model offers faster detection than the others but is less effective than SSDLite MobilenetV2 in identifying incorrect mask usage. The tool also faces limitations. |
| | Object detection model comparison | To compare the performance of different object detection models including SSDLite MobilenetV3 Small, SSDLite MobilenetV3 Large, SSDLite MobilenetV2, SSD MobilenetV2, SSDLite Mobiledets, and SSDMNV2 for face mask detection on Raspberry Pi 4. | Raspberry Pi 4 Model B 4 GB, Raspberry Pi 4 Model B Cam V.1, dataset of face images with and without masks | 1. Trained the different object detection models on the face mask dataset. 2. Evaluated the detection accuracy, FPS, and power consumption of the models on the Raspberry Pi 4. | The SSDLite MobilenetV2 model with fine-tuning had the best detection performance, able to detect all the test cases correctly. The SSDLite MobilenetV3 Small model had the highest FPS but struggled to detect incorrect mask usage. | The SSDLite MobilenetV2 model is the most suitable for face mask detection on Raspberry Pi 4 among the models tested, providing good accuracy and detection speed. |
| [11] | Empirical study | To systematically study model scaling and identify that carefully balancing network depth, width, and resolution can lead to better performance. | Convolutional Neural Networks (ConvNets) | Systematically studied scaling up ConvNets by adjusting network depth, width, and resolution. | Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models. It is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling. | Carefully balancing network width, depth, and resolution is an important but missing piece, preventing us from better accuracy and efficiency. |
| | Methodology development | To propose a new scaling method that uniformly scales all dimensions of depth/width/resolution using a simple yet highly effective compound coefficient. | Convolutional Neural Networks (ConvNets) | Proposed a compound scaling method that uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients. | The proposed compound scaling method can achieve better accuracy and efficiency compared to conventional single-dimension scaling methods. | The compound scaling method enables scaling up a baseline ConvNet to any target resource constraints in a more principled way, while maintaining model efficiency. |
| | Neural architecture search and model scaling | To use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. | Convolutional Neural Networks (ConvNets) | Used neural architecture search to develop a new baseline network called EfficientNet-B0, and then applied the proposed compound scaling method to scale it up and obtain a family of EfficientNet models. | The scaled EfficientNet models significantly outperform other ConvNets in terms of accuracy and efficiency. EfficientNet-B7 achieves state-of-the-art 84.4% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. | The EfficientNet models, developed using the proposed compound scaling method, achieve much better accuracy and efficiency than previous ConvNets. |
| [12] | Image classification using MobileNetV2 | To develop a face mask detection model that can be used by authorities to make mitigation, evaluation, prevention, and action planning against COVID-19. | 1,916 images of people wearing masks, 1,930 images of people not wearing masks, image size of 224×224 pixels | 1) Collect data, 2) Pre-process data (resize images, convert to array, pre-process using MobileNetV2, one-hot encode labels), 3) Split data into 75% training and 25% testing, 4) Construct training image generator for augmentation, build base model with MobileNetV2, add model parameters, compile model, train model, save model, 5) Test model on testing set and evaluate performance metrics (precision, recall, F1-score, accuracy) | The built model can detect people wearing and not wearing face masks with an accuracy of 96.85%. | The face mask detection model developed in this study can be used by authorities to monitor and evaluate the implementation of face mask wearing policies, and help with mitigation, prevention, and action planning against COVID-19. |
| | Application of the face mask detection model to real-world data | To apply the developed face mask detection model to images from 25 cities in Indonesia and analyze the percentage of people wearing face masks in each city. | Images from various sources (public place CCTV, shops, traffic cameras) in 25 cities in Indonesia, selected based on data availability | Apply the trained face mask detection model to the images from the 25 cities, calculate the percentage of people wearing and not wearing face masks in each city. | The percentage of people not wearing face masks ranged from 64.14% (Surabaya) to 82.76% (Jambi). | Face mask usage differs across cities, with some showing notably lower compliance. This helps authorities target interventions and allocate resources to areas with the weakest mask-wearing. |
| | Correlation analysis | To evaluate the validity of the face mask wearing percentage data by correlating it with the COVID-19 vigilance index. | Percentage of people wearing face masks in the 25 cities, COVID-19 vigilance index data | Conduct a bivariate correlation analysis between the percentage of people wearing face masks in the cities and the COVID-19 vigilance index. | The percentage of people wearing face masks and the COVID-19 vigilance index have a strong, negative, and significant correlation of -0.62. | The model’s mask-wearing data aligns with the COVID-19 vigilance index, showing that cities with lower mask-wearing rates require higher vigilance against transmission. |
| [3] | Face mask detection | To propose a novel face mask detection framework, FMD-Yolo, that monitors whether people in public wear masks correctly, an effective way to block virus transmission. | Im-Res2Net-101 feature extractor, enhanced path aggregation network En-PAN, localization loss, Matrix NMS method | The feature extractor, Im-Res2Net-101, combines the Res2Net module with a deep residual network; its hierarchical convolutional structure, deformable convolution, and non-local mechanisms enable thorough information extraction from the input. An enhanced path aggregation network (En-PAN) performs feature fusion, merging high-level semantic information with low-level details. A localization loss is designed for the training phase, and the Matrix NMS method is used at inference. | FMD-Yolo achieved the best AP50 of 92.0% and 88.4% on the two datasets, and AP75 (IoU = 0.75) improved by 5.5% and 3.9%, respectively, over the second-best method. | The results demonstrate the superiority of FMD-Yolo in face mask detection, with both theoretical value and practical significance. |
| [13] | Algorithm development | To propose a novel object detector, a lightweight face mask detector based on You Only Look Once (LFMD-YOLO), which can achieve an excellent balance of precision and speed. | C3E (CSP Bottleneck with 3 convolutions-ECA) module, MECAPF (max-pooling ECA pyramid-fast) module, new backbone network combining C3E and MECAPF modules, weighted bidirectional feature pyramid network based on the C3E module (E-BiFPN), detection heads, intersection over union (IoU) | 1. Designed the C3E (CSP Bottleneck with 3 convolutions-ECA) module and the MECAPF (max-pooling ECA pyramid-fast) module based on the efficient channel attention (ECA) mechanism to enrich channel information. 2. Proposed a new backbone network combining C3E and MECAPF modules. 3. Designed the weighted bidirectional feature pyramid network based on the C3E module (E-BiFPN) as the feature fusion neck, making full use of multi-scale features to mine more local information and enhance the representation of small face mask objects. 4. Further enhanced model performance by adding detection heads and improving intersection over union (IoU). | The proposed LFMD-YOLO achieves higher detection accuracy with mAPs of 68.7% and 60.1%, respectively, while having fewer parameters and lower GFLOPs. | The proposed LFMD-YOLO can achieve an excellent balance of precision and speed for lightweight face mask detection. |
| [14] | Deep learning-based face mask detection | To develop a deep learning-based system for real-time face mask detection to enhance public health monitoring in environments where mask compliance is critical. | Convolutional Neural Network (CNN) built with TensorFlow and Keras, diverse input images, Google Colab, Google Drive. | Utilize a CNN model to effectively classify individuals as mask-wearing or non-mask-wearing. Apply data preprocessing and augmentation techniques to improve model robustness and generalizability. Leverage cloud-based resources for efficient model training and deployment. | The system achieved high training and validation accuracy, consistent loss reduction, and strong real-time detection. It remained reliable despite minor validation fluctuations, demonstrating resilience and suitability for varied environments. | The DL–based system detects mask usage in real time. Data augmentation improves generalization, allowing reliable performance across varied scenarios and image conditions. |
| [15] | Face mask detection system development | To develop a rapid real-time face mask detection system (RRFMDS) for effective COVID-19 monitoring | Single-shot multi-box detector based on ResNet-10, fine-tuned MobileNetV2, custom dataset of 14,535 images with 5000 incorrect masks, 4789 with masks, and 4746 without masks | Used single-shot multi-box detector for face detection and fine-tuned MobileNetV2 for face mask classification. Trained the system on the custom dataset. | The system can detect all three classes (incorrect masks, with mask and without mask faces) with an average accuracy of 99.15% and 97.81% on training and testing data respectively. The system takes on average 0.14201142 s to process a single frame. | The proposed RRFMDS system is a lightweight and efficient approach for real-time face mask detection from video data. It outperforms existing state-of-the-art models in terms of accuracy and processing speed. |
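
The compound scaling rule summarized for [11] in the table above can be made concrete with a short sketch. It assumes the coefficients reported for the EfficientNet-B0 baseline in [11] (α = 1.2, β = 1.1, γ = 1.15), under which depth, width, and input resolution grow as α^φ, β^φ, and γ^φ, so that total FLOPs grow roughly as (α · β² · γ²)^φ ≈ 2^φ. The function name `compound_scale` is a hypothetical helper introduced only for illustration.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """Return (depth, width, resolution, flops) multipliers for coefficient phi.

    Default alpha/beta/gamma are assumed to be the EfficientNet-B0 values from [11].
    """
    depth = alpha ** phi        # multiplier on the number of layers
    width = beta ** phi         # multiplier on the number of channels
    resolution = gamma ** phi   # multiplier on the input image side length
    # FLOPs scale linearly with depth and quadratically with width and resolution.
    flops = (alpha * beta ** 2 * gamma ** 2) ** phi
    return depth, width, resolution, flops


for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
          f"resolution x{r:.2f}, FLOPs x{f:.2f}")
```

Because α · β² · γ² is close to 2 for these coefficients, each unit increase in φ roughly doubles the compute budget, which gives a single knob for trading accuracy against efficiency.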

| Architecture | Year | Key Innovation | Parameter Count | Strengths | Limitations | Original Source |
|---|---|---|---|---|---|---|
| LeNet-5 | 1998 | Early CNN architecture (conv + pooling) | ~60K | Simple, stable | Too shallow for modern tasks | [16] |
| AlexNet | 2012 | ReLU, dropout, GPU training | ~60M | Started modern deep learning | Heavy; not edge-friendly | [17] |
| VGG16/VGG19 | 2014 | Deep stacks of 3×3 conv layers | ~138M | Strong features | Extremely large & slow | [18] |
| Inception-v1 | 2015 | Multi-branch convolutions | ~6.8M | Efficient, flexible | Complex structure | [19] |
| Inception-ResNet | 2017 | Residual + inception blocks | 23–55M | Very accurate | Heavy | [20] |
| ResNet (18–101) | 2016 | Skip connections | 11–44M | Deep & stable | Still heavy for edge | [21] |
| DenseNet121 | 2017 | Dense connectivity | ~8M | High feature reuse | Slow inference | [22] |
| Xception | 2017 | Depthwise separable conv | ~22M | Good efficiency | Not lightweight enough | [23] |
| Faster R-CNN | 2015 | Two-stage region detector | Backbone-dependent | Accurate | Slow without GPU | [27] |
| Mask R-CNN | 2017 | Adds segmentation branch | Backbone-dependent | Detects improper masks | Heavy for edge | [28] |
| RegNet | 2020 | Regular network design space | 10–50M | Strong accuracy | Rarely used in mask detection | [24] |

| Model Type | Key Architectural Concept | Approx. Parameters / Complexity | Typical Usage in Mask Detection |
|---|---|---|---|
| MobileNetV2 | Depthwise separable convolutions with inverted residual bottlenecks | ~3.4M parameters (α=1.0) | Most widely adopted lightweight backbone; real-time mask/no-mask or 3-class classification on embedded devices. |
| EfficientNet-B0 | Compound scaling of depth, width, and resolution | ~5.3M parameters | Used in high-accuracy systems (e.g., EfficientMask-Net); suitable for improper mask detection with slightly higher computational needs. |
| ShuffleNet | Grouped 1×1 convolution with channel shuffle | ~2.3M parameters (1.0×) | Limited adoption; tested in low-resource conditions but less consistent than MobileNet. |
| SqueezeNet / SqueezeMaskNet | Fire module (1×1 squeeze + expand) with attention extensions | ~1.2M (SqueezeNet), ~1.5M (SqueezeMaskNet) | Designed for real-time multi-class classification; high FPS on Jetson-class edge hardware. |
| EfficientMask-Net (2022) | EfficientNet-B0 backbone with large-margin piecewise-linear classifier (LMPL) | ~5.3M parameters | Achieves up to 99.6% accuracy; offers detailed detection of improper mask positioning (nose/chin uncovered). |
| Hybrid CNN–YOLO variants (e.g., MobileNetV2 + YOLO) | Lightweight backbone with optimised detection head | Varies (<8M total) | Used for real-time detection + localisation in surveillance and compliance monitoring; effective for streaming environments. |
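
The parameter savings behind depthwise separable convolutions, the building block listed for MobileNetV2 above, can be illustrated with a short calculation. This is a sketch only: the layer shape (3×3 kernel, 128 input and 128 output channels) is an assumed example rather than a layer taken from any reviewed model, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed example layer shape, chosen only for illustration.
k, c_in, c_out = 3, 128, 128
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {std}, separable: {sep}, reduction: {std / sep:.1f}x")
```

For this assumed layer, the separable form needs 17,536 weights instead of 147,456, roughly an 8.4-fold reduction.
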

| Hybrid Architecture | Backbone Type | Detection/Classification Head | Key Idea | Reported Strengths | Representative Study |
|---|---|---|---|---|---|
| YOLOv2–ResNet50 | ResNet50 (heavy backbone) | YOLOv2 one-stage detector | Combine high-level semantic features with fast one-stage detection | High accuracy in medical mask detection; good robustness | Loey et al. (2021) [44] |
| YOLOv5 + Coordinate Attention | YOLOv5 backbone | Attention-enhanced detection head | Spatial refinement + auto-labelling | Strong mAP improvement; suitable for embedded devices | Pham et al. (2023) [46] |
| MobileNetV2 + SSD | MobileNetV2 (lightweight) | SSD one-stage detector | Lightweight backbone with efficient localization | Real-time mask detection on edge devices | Balaji & Gowri (2021) [45] |
| CNN Feature Extractor + SVM/ML Classifier | VGG19, ResNet, MobileNet | SVM / KNN / RF classifiers | Deep features + classical ML | Good performance on small datasets; simpler deployment | Loey et al. (2021) [47] |
| YOLOv3-Based Hybrid Detector | CSPDarknet-style backbone | YOLOv3 detection head | Full detector tailored to mask usage | Real-time performance with strong localization | Jiang et al. (2021) [30] |
| Smart-City System-Level Hybrid | CNN/YOLO backbone | IoT + Edge-tier inference pipeline | Combines DL, transfer learning, and IoT | Scalable deployment across large environments | Himeur et al. (2023) [54] |
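
Several of the hybrid designs above share a detect-then-classify structure: one component localizes faces and a second component labels each face crop. The sketch below shows only that control flow. The detector and classifier are stand-in stubs invented for illustration (`stub_detector`, `stub_classifier`); a real system would substitute trained models such as an SSD face detector and a MobileNetV2 classifier.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x_min, y_min, x_max, y_max) in pixels
Image = Sequence[Sequence[float]]    # toy grayscale image: rows of pixel values


@dataclass
class MaskPrediction:
    box: Box
    label: str


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the region covered by box out of the image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def detect_then_classify(
    image: Image,
    detect_faces: Callable[[Image], List[Box]],
    classify_face: Callable[[List[List[float]]], str],
) -> List[MaskPrediction]:
    """Run the face detector, then classify each detected face crop."""
    return [
        MaskPrediction(box, classify_face(crop(image, box)))
        for box in detect_faces(image)
    ]


def stub_detector(image: Image) -> List[Box]:
    # Hypothetical stand-in: treats the whole image as a single face.
    return [(0, 0, len(image[0]), len(image))]


def stub_classifier(face: List[List[float]]) -> str:
    # Hypothetical placeholder rule, not a real model: a bright lower half
    # of the crop is labeled "with_mask".
    lower = face[len(face) // 2:]
    mean = sum(sum(row) for row in lower) / sum(len(row) for row in lower)
    return "with_mask" if mean > 0.5 else "without_mask"


toy_image = [[0.2, 0.2], [0.9, 0.9]]
predictions = detect_then_classify(toy_image, stub_detector, stub_classifier)
print(predictions)
```

Keeping the two stages behind plain callables makes it possible to swap a heavier or lighter backbone into either stage without changing the surrounding pipeline.
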

| Study (Ref.) | Accuracy | Precision | Recall | F1-Score | AP | mAP | ROC / AUC | Use Case / Interpretation in Mask Detection |
|---|---|---|---|---|---|---|---|---|
| [2] Sethi et al., 2021 | ✓ | ✓ | ✓ | ✓ | | | | Binary classifier; strong balanced metrics on curated datasets |
| [3] Wu et al., 2022 (FMD-YOLO) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | YOLO detection; AP/mAP used for bounding-box evaluation |
| [5] Ullah et al., 2022 (DeepMaskNet) | ✓ | ✓ | ✓ | ✓ | | | | Detection + masked-face recognition; reports full metric suite |
| [18] VGG (Simonyan & Zisserman) | (✓ ImageNet) | | | | | | | Backbone for early mask-classification pipelines |
| [21] ResNet (He et al.) | (✓ ImageNet) | | | | | | | Backbone widely reused in mask detection & compliance tasks |
| [25] R-CNN (Girshick et al.) | | | | | ✓ | ✓ | | Basis for two-stage detectors adapted for mask detection |
| [27] Faster R-CNN (Ren et al.) | | | | | ✓ | ✓ | | Used in early mask detectors assessing region-level AP/mAP |
| [33] MobileNetV2 (Sandler et al.) | ✓ | ✓ | ✓ | ✓ | | | | Lightweight backbone for fast mask/no-mask classification |
| [37] Nagrath et al., 2021 (SSDMNV2) | ✓ | ✓ | ✓ | ✓ | | | | SSD + MobileNetV2; used in real-time mask detection systems |
| [44] Loey et al., 2021 (YOLOv2–ResNet50) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | Hybrid YOLO-based medical mask detector |
| [64] Bania et al., 2023 (Ensemble TL) | ✓ | ✓ | ✓ | ✓ | | | ✓ | Ensemble ResNet50/Inception/VGG; includes ROC curve & AUC ≈ 0.99 |
| [73] Benítez-Baltazar et al., 2021 (IoT Mask Detection) | ✓ | ✓ | ✓ | ✓ | | | ✓ | IoT access-control system; explicitly reports ROC curve & AUC ≈ 0.96 |
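
The four classification metrics tracked in the table above follow directly from the TP, FP, FN, and TN counts defined in Section 4.1. As a minimal sketch, the function below computes them from a confusion matrix; the example counts are hypothetical and are not drawn from any reviewed study.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1-score from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a "with mask" positive class (illustrative only).
metrics = classification_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, accuracy is 0.925 while recall (about 0.947) exceeds precision (0.900), which illustrates why studies that report accuracy alone can hide an imbalance between false positives and false negatives.
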

| Architecture Family | Representative Models | Parameter Scale | FPS on Edge Devices (Jetson / RPi / low-power GPU) | Memory / Deployment Characteristics | Suitability for Real-Time Mask Detection |
|---|---|---|---|---|---|
| Conventional CNN Backbones | VGG16/19, ResNet50 [18,21], DenseNet121 [22], InceptionV3 [19], classical transfer-learning approaches | High (8M–140M+) | Low–Moderate (<10–15 FPS without optimization) | Require GPU-class memory; heavy compute | High accuracy on controlled datasets but generally not suitable for real-time edge deployment |
| Two-Stage Detectors (R-CNN Family) | R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], Mask R-CNN [28] | High + region proposal overhead | Low (<5–10 FPS on Jetson; often <5 FPS on RPi) | Large VRAM usage; very slow on CPUs | Excellent detection accuracy, but too slow for practical edge-device mask monitoring |
| Single-Stage Detectors (Heavy Backbones) | YOLOv2–ResNet50 [44], ETL-YOLOv4 [66], drone-based YOLO [59] | Moderate–High (40M–60M+) | Moderate (10–30 FPS on Jetson Xavier; <15 FPS on Nano/RPi) | Need GPU acceleration; moderate memory | Suitable for edge devices only with optimization; strong accuracy but mixed speed |
| Lightweight CNN Backbones (Classification) | MobileNetV1/V2/V3 [32,33,34], EfficientNet-B0 [11], ShuffleNet [35], SqueezeNet [36]; mask-detection works [37,38,39,42] | Low (1M–5M range) | High (30–60 FPS on Jetson Nano; usable on RPi) | Very small footprint; easy to quantize and prune; CPU-friendly | Excellent for fast mask classification once faces are detected; ideal for edge and mobile deployment |
| Lightweight Single-Stage Detectors | SSD-MobileNetV2 (SSDMNV2) [37,45], EfficientMask-Net [41], YOLOv4-tiny / YOLOv5-s variants, SqueezeMaskNet [43] | Low–Moderate (2M–10M) | High (25–90 FPS depending on platform) | Optimized for low memory; fits into IoT/embedded systems | Best trade-off between accuracy and speed; preferred choice for real-time mask detection on edge devices |
| Hybrid & Attention-Enhanced Architectures | SqueezeMaskNet with attention [43], YOLOv5+CoordAttention [46], hybrid MobileNetV2 + detection head [37], IoT-optimized deep learning [50,54] | Low–Moderate (slightly higher due to attention modules) | High (25–60 FPS with optimized pipelines) | Slightly heavier than lightweight CNNs but still edge deployable | Very promising direction: improved robustness (occlusion, clutter) while remaining efficient |
| Extreme Lightweight / Frugal / Deployment-Engineered Models | Pruned & quantized SqueezeNet/SqueezeMaskNet [43], frugal object detectors [57], augmentation-resilient edge detectors [56] | Very Low (<1M–3M) | Very High (60+ FPS even on modest devices) | Minimal memory; optimized for microcontrollers, NPUs, or minimal-GPU boards | Ideal for massive IoT, smart-city nodes, or hundreds of camera feeds with strict power limits; slight accuracy trade-off |
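
The memory characteristics summarized above can be related to parameter counts with a back-of-the-envelope estimate. The sketch below multiplies the approximate parameter counts quoted in this review by the bytes needed per weight at a given numeric precision. It covers weight storage only and assumes every weight is stored at the same precision; activations, framework overhead, and any accuracy change after quantization are outside its scope.

```python
BYTES_PER_WEIGHT = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_mb(num_params: float, precision: str = "fp32") -> float:
    """Approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[precision] / 1e6


# Approximate parameter counts as quoted in this review.
models = {
    "VGG16": 138e6,
    "EfficientNet-B0": 5.3e6,
    "MobileNetV2": 3.4e6,
    "SqueezeNet": 1.2e6,
}

for name, params in models.items():
    fp32 = weight_memory_mb(params, "fp32")
    int8 = weight_memory_mb(params, "int8")
    print(f"{name}: {fp32:.1f} MB (FP32), {int8:.1f} MB (INT8)")
```

Under these assumptions, the VGG16 weights alone occupy about 552 MB in FP32, compared with about 13.6 MB for MobileNetV2, which is consistent with the deployment characteristics listed for the two families.
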

| Model | Parameters (approx.) | Accuracy (%) | Speed/Resource Use | Notes |
|---|---|---|---|---|
| YOLOv4-tiny | ~6M | Lower than YOLOv4 | Fast, low resource | About one-tenth the parameters of YOLOv4 [74] |
| MobileNetV2 | Lightweight | ~92.6 | Real-time, embedded devices | Robust for real-time use [37,62,75] |
| DenseNet201 | Heavyweight | 99 | Slower, high resource | Highest accuracy in comparison [78] |
| Mask R-CNN ConvNeXt-T | Heavyweight | Highest | Not suitable for real-time | Best accuracy, poor efficiency [80] |
| Custom Lightweight Net | 0.12M | ~95.5 | Highly efficient | Up to 496× parameter reduction [76] |
| Ensemble (single-stage + two-stage detectors) | - | 98.2 | 0.05 s/image | High accuracy and speed [2] |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
