Submitted: 03 April 2026
Posted: 08 April 2026
Abstract
Keywords:
1. Introduction
2. Fundamentals of Image Segmentation
2.1. Frameworks
2.2. Datasets
2.3. Quality Control and Assurance in Industry
2.4. Traditional Methods
- Thresholding
- Region Growing
- Clustering
- Watershed Algorithm
3. Types of Image Segmentation
3.1. Semantic Segmentation
- Fully Convolutional Networks (FCNs)
- U-Net
- Efficient Neural Network (ENet)
- V-Net
- Efficient Residual Factorized Network (ERFNet)
- SegNet
- Pyramid Scene Parsing Network (PSPNet)
- DeepLab
- Image Cascade Network (ICNet)
- Bilateral Segmentation Network (BiSeNet)
- Attention U-Net
- SEgmentation TRansformer (SETR)
- Segmenter
- SegFormer
| Model | Year | Description |
|---|---|---|
| FCN (32s,16s,8s) [28] | 2015 | First fully convolutional network for pixel-wise segmentation |
| U-Net [29] | 2015 | Encoder-decoder with skip connections, biomedical focus |
| ENet [30] | 2016 | Lightweight real-time network |
| V-Net [31] | 2016 | 3D extension of U-Net for volumetric data |
| ERFNet [32] | 2017 | Efficient residual factorized convs |
| SegNet [33] | 2017 | Encoder-decoder with pooling indices for efficient upsampling |
| PSPNet [34] | 2017 | Pyramid scene parsing, global context pooling |
| DeepLab v1 [35] | 2017 | Atrous convolutions, multi-scale context |
| ICNet [36] | 2018 | Cascade for real-time semantic segmentation |
| BiSeNet [37] | 2018 | Bilateral path for speed + accuracy balance |
| DeepLab v2–v4 | 2018–2020 | Improved ASPP + encoder-decoder refinement |
| SETR [39] | 2021 | First pure Vision Transformer for segmentation |
| Segmenter [40] | 2021 | Transformer encoder + lightweight decoder |
| SegFormer [41] | 2021 | CNN-Transformer hybrid, efficient |
| Attention U-Net [38] | 2022 | U-Net with attention gates for better localization |
3.2. Instance Segmentation
- R-CNN
- Fast R-CNN
- Sequential Grouping Networks (SGN)
- Mask R-CNN
- PANet
- MaskLab
- Cascade Mask R-CNN
- Hybrid Task Cascade (HTC)
- You Only Look At CoefficienTs (YOLACT)
- TensorMask
- Segmenting Objects by Locations (SOLO)
- Segmenting Objects by Locations (SOLOv2)
- Conditional Instance Segmentation (CondInst)
- DEtection TRansformer (DETR)
- Deformable DETR
- Conditional DETR
3.3. Panoptic Segmentation
- Unified Panoptic Segmentation Network (UPSNet)
- Adaptive Instance Selection Network (AdaptIS)
- Efficient Panoptic Segmentation Network (EPSNet)
- Fast Panoptic Segmentation Network (FPSNet)
- Panoptic-DeepLab
- Efficient Panoptic Segmentation (EfficientPS)
- Mask2Former
- Mask DINO
- Video Panoptic Segmentation Network (VPSNet)
3.4. Hybrid and Combined Models
- Diffusion Network (DifNet)
- SEG-YOLO
- DeepLabCut
- SegDiff
- Swin-Unet
- Segment Anything Model (SAM)
- FastSAM
- DiffuMask
- Grounded-SAM
- PS-YOLO-Seg
- GS-YOLO-Seg
4. Method Comparison: Advantages and Limitations
5. Application Examples in the Industry
5.1. Common Requirements and Issues in Industrial Environments
6. Meta Analysis of Image Segmentation Methods
6.1. Search Strategy and Selection Criteria
6.2. Qualitative Evaluation
7. Discussion
8. Conclusion and Outlook
References
- Machado, N.C.; Illes, B.; Glistau, E. Logistik und Qualitätsmanagement.
- Owen, D.G. Manufacturing defects. SCL Rev. 2001, 53, 851.
- 3.
- 4.
- 5.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proceedings of the Advances in Neural Information Processing Systems, 2019, Vol. 32, pp. 8024–8035.
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265–283.
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in vision: A survey. ACM Computing Surveys (CSUR) 2022, 54, 1–41.
- Wu, Y.; Kirillov, A.; Massa, F.; Lo, W.Y.; Girshick, R. Detectron2. https://github.com/facebookresearch/detectron2, 2019.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.
- Pinheiro, P.O.; Collobert, R.; Dollár, P. Learning to Segment Object Candidates. In Proceedings of the NIPS, 2015.
- Pinheiro, P.O.; Lin, T.Y.; Collobert, R.; Dollár, P. Learning to Refine Object Segments. In Proceedings of the ECCV, 2016.
- Zagoruyko, S.; Lerer, A.; Lin, T.Y.; Pinheiro, P.O.; Gross, S.; Chintala, S.; Dollár, P. A MultiPath Network for Object Detection. In Proceedings of the BMVC, 2016.
- Bradski, G. The OpenCV Library. Dr. Dobb’s Journal of Software Tools 2022.
- Torrey, L.; Shavlik, J. Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques; IGI Global Scientific Publishing, 2010; pp. 242–264.
- 16.
- Everingham, M.; Gool, L.V.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision 2010, 88, 303–338. https://doi.org/10.1007/s11263-009-0275-4.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. CoRR 2016, abs/1604.01685.
- Gupta, A.; Dollár, P.; Girshick, R.B. LVIS: A Dataset for Large Vocabulary Instance Segmentation. CoRR 2019, abs/1908.03195, 88–97.
- Azamfirei, V.; Psarommatis, F.; Lagrosen, Y. Application of automation for in-line quality inspection, a zero-defect manufacturing approach. Journal of Manufacturing Systems 2023, 67, 1–22.
- Wu, Z.G.; Lin, C.Y.; Chang, H.W.; Lin, P.T. Inline Inspection with an Industrial Robot (IIIR) for Mass-Customization Production Line. Sensors 2020, 20, 3008.
- Kim, H.; Frommknecht, A.; Bieberstein, B.; Stahl, J.; Huber, M.F. Automated end-of-line quality assurance with visual inspection and convolutional neural networks. tm - Technisches Messen 2023, 90, 196–204.
- Sahoo, P.K.; Soltani, S.; Wong, A.K. A survey of thresholding techniques. Computer Vision, Graphics, and Image Processing 1988, 41, 233–260.
- Hojjatoleslami, S.; Kittler, J. Region growing: A new approach. IEEE Transactions on Image Processing 1998, 7, 1079–1084.
- 25.
- 26.
- Kornilov, A.S.; Safonov, I.V. An Overview of Watershed Algorithm Implementations in Open Source Libraries. Journal of Imaging 2018, 4. https://doi.org/10.3390/jimaging4100123.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation, 2015, [arXiv:cs.CV/1505.04597].
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 2016.
- Milletari, F.; Navab, N.; Ahmadi, S.A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), IEEE, 2016, pp. 565–571.
- Romera, E.; Alvarez, J.M.; Bergasa, L.M.; Arroyo, R. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation. IEEE Transactions on Intelligent Transportation Systems 2017, 19, 263–272.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39, 2481–2495.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2881–2890.
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 40, 834–848.
- Zhao, H.; Qi, X.; Shen, X.; Shi, J.; Jia, J. ICNet for real-time semantic segmentation on high-resolution images. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 405–420.
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 325–341.
- Zhu, Z.; Yan, Y.; Xu, R.; Zi, Y.; Wang, J. Attention-Unet: A deep learning approach for fast and accurate segmentation in medical imaging. Journal of Computer Science and Software Applications 2022, 2, 24–31.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H.; et al. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 6881–6890.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 7262–7272.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems 2021, 34, 12077–12090.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587.
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
- Liu, S.; Jia, J.; Fidler, S.; Urtasun, R. SGN: Sequential grouping networks for instance segmentation. In Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 3496–3504.
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
- Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8759–8768.
- Chen, L.C.; Hermans, A.; Papandreou, G.; Schroff, F.; Wang, P.; Adam, H. MaskLab: Instance segmentation by refining object detection with semantic and direction features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4013–4022.
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154–6162.
- Chen, K.; Pang, J.; Wang, J.; Xiong, Y.; Li, X.; Sun, S.; Feng, W.; Liu, Z.; Shi, J.; Ouyang, W.; et al. Hybrid task cascade for instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4974–4983.
- 50.
- Chen, X.; Girshick, R.; He, K.; Dollár, P. TensorMask: A foundation for dense object segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.
- Wang, X.; Kong, T.; Shen, C.; Jiang, Y.; Li, L. SOLO: Segmenting objects by locations. In Proceedings of the European Conference on Computer Vision. Springer, 2020.
- 53.
- Tian, Z.; Shen, C.; Chen, H. Conditional convolutions for instance segmentation. In Proceedings of the European Conference on Computer Vision. Springer, 2020, pp. 282–298.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision. Springer, 2020, pp. 213–229.
- Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 2020.
- Chen, X.; Wei, F.; Zeng, G.; Wang, J. Conditional DETR v2: Efficient detection transformer with box queries. arXiv preprint arXiv:2207.08914 2022.
- Meng, D.; Chen, X.; Fan, Z.; Zeng, G.; Li, H.; Yuan, Y.; Sun, L.; Wang, J. Conditional DETR for fast training convergence. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3651–3660.
- 59.
- Xiong, Y.; Liao, R.; Zhao, H.; Hu, R.; Bai, M.; Yumer, E.; Urtasun, R. UPSNet: A unified panoptic segmentation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 8818–8826.
- Sofiiuk, K.; Barinova, O.; Konushin, A. AdaptIS: Adaptive instance selection network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 7355–7363.
- Chang, C.Y.; Chang, S.E.; Hsiao, P.Y.; Fu, L.C. EPSNet: Efficient panoptic segmentation network with cross-layer attention fusion. In Proceedings of the Asian Conference on Computer Vision, 2020.
- De Geus, D.; Meletis, P.; Dubbelman, G. Fast panoptic segmentation network. IEEE Robotics and Automation Letters 2020, 5, 1742–1749.
- Cheng, B.; Collins, M.D.; Zhu, Y.; Liu, T.; Huang, T.S.; Adam, H.; Chen, L.C. Panoptic-DeepLab: A simple, strong, and fast baseline for bottom-up panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12475–12485.
- Mohan, R.; Valada, A. EfficientPS: Efficient panoptic segmentation. International Journal of Computer Vision 2021, 129, 1551–1579.
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-attention mask transformer for universal image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290–1299.
- Li, F.; Zhang, H.; Xu, H.; Liu, S.; Zhang, L.; Ni, L.M.; Shum, H.Y. Mask DINO: Towards a unified transformer-based framework for object detection and segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3041–3050.
- Wen, J.; Zhang, Q.; Zhang, G. VPSNet: 3D object detection with voxel purification and fully sparse convolutional networks. The Journal of Supercomputing 2025, 81, 466.
- Jiang, P.; Gu, F.; Wang, Y.; Tu, C.; Chen, B. Difnet: Semantic segmentation by diffusion networks. Advances in Neural Information Processing Systems 2018, 31.
- 70.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
- Nath, T.; Mathis, A.; Chen, A.C.; Patel, A.; Bethge, M.; Mathis, M.W. Using DeepLabCut for 3D markerless pose estimation across species and behaviors. Nature protocols 2019, 14, 2152–2176.
- Amit, T.; Shaharbany, T.; Nachmani, E.; Wolf, L. SegDiff: Image segmentation with diffusion probabilistic models. arXiv preprint arXiv:2112.00390 2021.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like pure transformer for medical image segmentation. In Proceedings of the European Conference on Computer Vision. Springer, 2022, pp. 205–218.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
- Zhao, X.; Ding, W.; An, Y.; Du, Y.; Yu, T.; Li, M.; Tang, M.; Wang, J. Fast segment anything. arXiv preprint arXiv:2306.12156 2023.
- Wu, W.; Zhao, Y.; Shou, M.Z.; Zhou, H.; Shen, C. DiffuMask: Synthesizing images with pixel-level annotations for semantic segmentation using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 1206–1217.
- Ren, T.; Liu, S.; Zeng, A.; Lin, J.; Li, K.; Cao, H.; Chen, J.; Huang, X.; Chen, Y.; Yan, F.; et al. Grounded sam: Assembling open-world models for diverse visual tasks. arXiv preprint arXiv:2401.14159 2024.
- Qiu, Z.; Huang, X.; Deng, Z.; Xu, X.; Qiu, Z. PS-YOLO-seg: A Lightweight Instance Segmentation Method for Lithium Mineral Microscopic Images Based on Improved YOLOv12-seg. Journal of Imaging 2025, 11, 230.
- Qiu, Z.; Huang, X.; Sun, Z.; Li, S.; Wang, J. GS-YOLO-Seg: A Lightweight Instance Segmentation Method for Low-Grade Graphite Ore Sorting Based on Improved YOLO11-Seg. Sustainability 2025, 17, 5663.
- Yao, K.; Ortiz, A.; Bonnin-Pascual, F. A weakly-supervised semantic segmentation approach based on the centroid loss: Application to quality control and inspection. IEEE Access 2021, 9, 69010–69026.
- Chen, M.C.; Yen, S.Y.; Lin, Y.F.; Tsai, M.Y.; Chuang, T.H. Intelligent Casting Quality Inspection Method Integrating Anomaly Detection and Semantic Segmentation. Machines 2025, 13. https://doi.org/10.3390/machines13040317.
- Shi, C.; Wang, K.; Zhang, G.; Li, Z.; Zhu, C. Efficient and accurate semi-supervised semantic segmentation for industrial surface defects. Scientific Reports 2024, 14, 21874.
- Tabernik, D.; Šela, S.; Skvarč, J.; Skočaj, D. Segmentation-based deep-learning approach for surface-defect detection. Journal of Intelligent Manufacturing 2020, 31, 759–776.
- Schack, T.; Coenen, M.; Haist, M. Image-based quality control of fresh concrete based on semantic segmentation algorithms. Civil Engineering Design 2024, 6, 96–105.
- Valente, A.; Wada, C.; Neves, D.; Neves, D.; Perez, F.; Megeto, G.; Cascone, M.; Gomes, O.; Lin, Q. Print defect mapping with semantic segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2020, pp. 3551–3559.
- Knott, M.; Odion, D.; Sontakke, S.; Karwa, A.; Defraeye, T. Weakly Supervised Panoptic Segmentation for Defect-Based Grading of Fresh Produce. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 5462–5471.
- Nivaggioli, A.; Hullo, J.; Thibault, G. Using 3D models to generate labels for panoptic segmentation of industrial scenes. ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2019, 4, 61–68.
- Ji, X.; Allebach, J.P.; Shakouri, A.; Zhu, F. Efficient Microscopic Image Instance Segmentation for Food Crystal Quality Control. In Proceedings of the 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP), 2024, pp. 1–6. https://doi.org/10.1109/MMSP61759.2024.10743276.
- Marchi, E.; Fornasier, D.; Miorin, A.; Foresti, G.L. Segmentation networks for detecting overlapping screws in 3D and color images for industrial quality control. Integrated Computer-Aided Engineering 2025, 32, 244–257. https://doi.org/10.1177/10692509251328780.
- Chiu, M.C.; Chen, T.M. Applying Data Augmentation and Mask R-CNN-Based Instance Segmentation Method for Mixed-Type Wafer Maps Defect Patterns Classification. IEEE Transactions on Semiconductor Manufacturing 2021, 34, 455–463.
- Kriegler, J.; Liu, T.; Hartl, R.; Hille, L.; Zaeh, M.F. Automated Quality Evaluation for Laser Cutting in Lithium Metal Battery Production Using an Instance Segmentation Convolutional Neural Network. Journal of Laser Applications 2023, 35, 042072.
- Ferguson, M.; Ak, R.; Lee, Y.T.T.; Law, K.H. Detection and segmentation of manufacturing defects with convolutional neural networks and transfer learning. Smart and Sustainable Manufacturing Systems 2018, 2, 137–164.
- Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1135–1144.
- Upadhyaya, N. Low-Code/No-Code platforms and their impact on traditional software development: A literature review. March 21, 2023.
- Chen, J.; Geng, Y.; Chen, Z.; Pan, J.Z.; He, Y.; Zhang, W.; Horrocks, I.; Chen, H. Zero-shot and few-shot learning with knowledge graphs: A comprehensive survey. Proceedings of the IEEE 2023, 111, 653–685.
| Model | Year | Description |
|---|---|---|
| R-CNN [42] | 2014 | Region proposals + CNN classification |
| Fast R-CNN [43] | 2015 | Faster training with ROI pooling |
| Faster R-CNN [59] | 2015 | Introduced RPN for detection backbone |
| SGN [44] | 2017 | Sequential grouping of pixels into instances |
| Mask R-CNN [45] | 2017 | Adds mask head for pixel-wise instance masks |
| PANet [46] | 2018 | Improves Mask R-CNN with bottom-up path |
| MaskLab [47] | 2018 | Combines semantic + detection features |
| Cascade Mask R-CNN [48] | 2018 | Multi-stage refinement for robust masks |
| HTC [49] | 2019 | Joint box and mask optimization cascade |
| YOLACT [50] | 2019 | Real-time instance segmentation with prototypes |
| TensorMask [51] | 2019 | Dense sliding-window instance masks |
| SOLO [52] | 2019 | Anchor-free instance segmentation |
| SOLOv2 [53] | 2020 | Improved SOLO with dynamic assignment |
| CondInst [54] | 2020 | Dynamic filters for instance-specific masks |
| DETR [55] | 2020 | Transformer for detection, extended to masks |
| Deformable DETR [56] | 2020 | Deformable attention for faster convergence and high-res images |
| Conditional DETR [57,58] | 2021 | Improved DETR convergence and mask quality |
| Model | Year | Description |
|---|---|---|
| Panoptic FPN [5] | 2019 | FPN with semantic + instance heads |
| UPSNet [60] | 2019 | Unified panoptic segmentation network |
| AdaptIS [61] | 2019 | Pixel-wise instance parameter regression |
| FPSNet [63] | 2020 | Lightweight fast panoptic segmentation |
| EPSNet [62] | 2020 | Efficient panoptic segmentation with unified backbone |
| Panoptic-DeepLab [64] | 2020 | Bottom-up approach for things + stuff |
| EfficientPS [65] | 2021 | Efficient panoptic segmentation CNN |
| Mask2Former [66] | 2022 | Transformer with masked attention |
| Mask DINO [67] | 2023 | Extends Mask2Former + DETR |
| VPSNet [68] | 2025 | Video panoptic segmentation network |
| Model | Year | Description |
|---|---|---|
| DifNet [69] | 2018 | Diffusion-based refinement for object boundaries |
| SEG-YOLO [70] | 2019 | YOLO detection with added segmentation heads |
| DeepLabCut [72] | 2019 | Hybrid: segmentation + keypoint-based pose estimation |
| SegDiff [73] | 2021 | Diffusion-based generative segmentation model |
| Swin-Unet [74] | 2022 | U-Net with hierarchical Swin Transformer encoder |
| DiffuMask [77] | 2023 | Iterative diffusion-based mask generation |
| SAM [10] | 2023 | Prompt-based universal segmentation model |
| FastSAM [76] | 2023 | Optimized, real-time variant of SAM |
| Grounded-SAM [78] | 2024 | SAM extended with text/context-driven segmentation |
| PS-YOLO-Seg [79] | 2025 | YOLO with instance segmentation heads |
| GS-YOLO-Seg [80] | 2025 | Enhanced YOLO segmentation for overlapping objects |
| Paper | Method | Applications | Advantage | Segmentation type |
|---|---|---|---|---|
| Yao et al. [81] | Weakly supervised segmentation, centroid loss | Industrial quality control | Pixel annotation not necessary, robust with little data | Semantic |
| Chen et al. [82] | Combination of anomaly detection + segmentation | Defect detection in cast and manufactured parts | Higher detection accuracy | Semantic |
| Shi et al. [83] | Semi-supervised learning | Surface inspection of industrial products | Efficient and accurate with little annotated data | Semantic |
| Tabernik et al. [84] | Surface inspection | Industrial quality control | Pixel-accurate defect detection | Semantic |
| Schack et al. [85] | Semantic segmentation | Fresh concrete quality control | Detection of air pockets and material distribution | Semantic |
| Valente et al. [86] | DeepLab-v3+ segmentation | Print defect mapping | High accuracy through synthetic training data | Semantic |
| Knott et al. [87] | Weakly supervised panoptic segmentation | Automated defect classification of fruit | Combination of semantic + instance, low annotation required | Panoptic |
| Nivaggioli et al. [88] | Synthetic training data + Panoptic segmentation | Industrial scenes | Reduced manual annotation effort, realistic training data | Panoptic |
| Ji et al. [89] | Instance segmentation on microscopic images | Food crystal quality control | Automated precise control | Instance |
| Marchi et al. [90] | 3D + Color image segmentation | Overlapping screws in manufacturing | Robust detection despite overlaps | Instance |
| Jin et al. [91] | Real-time defect detection in moving objects | Production processes | Real-time capability in production | Instance |
| Kriegler et al. [92] | Instance Segmentation CNN | Laser cutting quality in batteries | High-precision defect detection | Instance |
| Chiu et al. [91] | Mask R-CNN + data augmentation | Wafer defect classification | High classification accuracy (97.7%) | Hybrid / Combination |
| Ferguson et al. [93] | CNN + Transfer Learning | Manufacturing defects | Better accuracy with small datasets | Hybrid / Combination |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).