Submitted: 08 July 2024
Posted: 09 July 2024
Abstract
Keywords:
1. Introduction
2. Methods
2.1. Model Overview
2.2. Feature Extraction
2.3. Stripe-wise Context Attention
2.4. Feature Fusion
2.5. Auxiliary Branch
3. Experimental Datasets and Setup
3.1. Datasets
3.2. Parameter Setting
3.2.1. Training and Fine-Tuning Configurations
3.2.2. Inference Configurations
3.3. Evaluation Metrics
4. Experiment
4.1. Comparative Experiments
4.2. Scaling Study
4.3. Diagnostic Experiments
4.4. Experiment Conclusion
5. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Minh Dang, L.; Wang, H.; Li, Y.; Nguyen, L.Q.; Nguyen, T.N.; Song, H.K.; Moon, H. Deep learning-based masonry crack segmentation and real-life crack length measurement. 359, 129438.
- Zheng, M.; Lei, Z.; Zhang, K. Intelligent detection of building cracks based on deep learning. 103, 103987.
- Ha, J.; Kim, D.; Kim, M. Assessing severity of road cracks using deep learning-based segmentation and detection. 78(16), 17721–17735. Springer.
- Zhang, J.; Qian, S.; Tan, C. Automated bridge surface crack detection and segmentation using computer vision-based deep learning model. 115, 105225.
- Deng, J.; Singh, A.; Zhou, Y.; Lu, Y.; Lee, V.C.S. Review on computer vision-based crack detection and quantification methodologies for civil structures. 356, 129238.
- Gavilán, M.; Balcones, D.; Marcos, O.; Llorca, D.F.; Sotelo, M.A.; Parra, I.; Ocaña, M.; Aliseda, P.; Yarza, P.; Amírola, A. Adaptive Road Crack Detection System by Pavement Classification. 11(10), 9628–9657. Molecular Diversity Preservation International.
- Jahanshahi, M.R.; Jazizadeh, F.; Masri, S.F.; Becerik-Gerber, B. Unsupervised Approach for Autonomous Pavement-Defect Detection and Quantification Using an Inexpensive Depth Sensor. American Society of Civil Engineers.
- Zhang, D.; Zou, Q.; Lin, H.; Xu, X.; He, L.; Gui, R.; Li, Q. Automatic pavement defect detection using 3D laser profiling technology. 96, 350–365.
- Iyer, S.; Sinha, S.K. Segmentation of Pipe Images for Crack Detection in Buried Sewers. 21, 395–410. https://onlinelibrary.wiley.com/doi/pdf/10.1111/j.1467-8667.2006.00445.x.
- Sun, B.C.; Qiu, Y.j. Automatic Identification of Pavement Cracks Using Mathematic Morphology. Proceedings.
- Kamaliardakani, M.; Sun, L.; Ardakani, M.K. Sealed-Crack Detection Algorithm Using Heuristic Thresholding Approach. 30, 04014110. American Society of Civil Engineers.
- Mohan, A.; Poobal, S. Crack detection using image processing: A critical review and analysis. 57, 787–798.
- Qu, Z.; Lin, L.D.; Guo, Y.; Wang, N. An improved algorithm for image crack detection based on percolation model. 10, 214–221. https://onlinelibrary.wiley.com/doi/pdf/10.1002/tee.22056.
- Cha, Y.J.; Ali, R.; Lewis, J.; Buyukozturk, O. Deep learning-based structural health monitoring. 161, 105328.
- Liu, Z.; Cao, Y.; Wang, Y.; Wang, W. Computer vision-based concrete crack detection using U-net fully convolutional networks. 104, 129–139.
- Yang, J.; Wang, W.; Lin, G.; Li, Q.; Sun, Y.; Sun, Y. Infrared Thermal Imaging-Based Crack Detection Using Deep Learning. IEEE Access, 7, 182060–182077.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; Uszkoreit, J.; Houlsby, N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ICLR 2021.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012–10022.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Advances in Neural Information Processing Systems. Curran Associates, Inc., Vol. 34, pp. 12077–12090.
- Lin, Q.; Li, W.; Zheng, X.; Fan, H.; Li, Z. DeepCrackAT: An effective crack segmentation framework based on learning multi-scale crack features. 126, 106876.
- Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection. IEEE Transactions on Intelligent Transportation Systems, 21, 1525–1535.
- Chu, H.; Wang, W.; Deng, L. Tiny-Crack-Net: A multiscale feature fusion network with attention mechanisms for segmentation of tiny cracks. 37, 1914–1931. https://onlinelibrary.wiley.com/doi/pdf/.
- Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818.
- Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.m. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. 35, 1140–1156.
- Duan, Z.; Liu, J.; Ling, X.; Zhang, J.; Liu, Z. ERNet: A Rapid Road Crack Detection Method Using Low-Altitude UAV Remote Sensing Images. 16(10), 1741. Multidisciplinary Digital Publishing Institute.
- Meng, S.; Gao, Z.; Zhou, Y.; He, B.; Djerrad, A. Real-time automatic crack detection method based on drone. 38, 849–872. John Wiley & Sons, Ltd.
- Paszke, A.; Chaurasia, A.; Kim, S.; Culurciello, E. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation. arXiv:1606.02147 [cs].
- Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: Bilateral segmentation network for real-time semantic segmentation. Proceedings of the European Conference on Computer Vision (ECCV), pp. 325–341.
- Li, Y.; Ma, R.; Liu, H.; Gaoli, C. Real-time high-resolution neural network with semantic guidance for crack segmentation. 156, 105112. Elsevier.
- Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11976–11986.
- Yu, W.; Luo, M.; Zhou, P.; Si, C.; Zhou, Y.; Wang, X.; Feng, J.; Yan, S. MetaFormer is actually what you need for vision. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10819–10829.
- Cai, X.; Lai, Q.; Wang, Y.; Wang, W.; Sun, Z.; Yao, Y. Poly kernel inception network for remote sensing detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 27706–27716.
- Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11963–11975.
- Tang, Y.; Han, K.; Guo, J.; Xu, C.; Xu, C.; Wang, Y. GhostNetv2: Enhance cheap operation with long-range attention. 35, 9969–9982.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19.
- Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. pp. 2881–2890.
- Kulkarni, S.; Singh, S.; Balakrishnan, D.; Sharma, S.; Devunuri, S.; Korlapati, S.C.R. CrackSeg9k: A Collection and Benchmark for Crack Segmentation Datasets and Frameworks. Computer Vision – ECCV 2022 Workshops; Karlinsky, L., Michaeli, T., Nishino, K., Eds.; Springer Nature Switzerland: Cham, 2023; pp. 179–195.
- Dais, D.; Bal, E.; Smyrou, E.; Sarhosis, V. Automatic crack classification and segmentation on masonry surfaces using convolutional neural networks and transfer learning. 125, 103606.
- Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic Road Crack Detection Using Random Structured Forests. IEEE Transactions on Intelligent Transportation Systems, 17, 3434–3445.
- Zou, Q.; Cao, Y.; Li, Q.; Mao, Q.; Wang, S. CrackTree: Automatic crack detection from pavement images. 33, 227–238.
- Pak, M.; Kim, S. Crack Detection Using Fully Convolutional Network in Wall-Climbing Robot. Advances in Computer Science and Ubiquitous Computing; Park, J.J., Fong, S.J., Pan, Y., Sung, Y., Eds.; Springer Singapore: Singapore, 2021; pp. 267–272.
- Liu, Y.; Yao, J.; Lu, X.; Xie, R.; Li, L. DeepCrack: A deep hierarchical feature learning architecture for crack segmentation. 338, 139–153.
- Junior, G.S.; Ferreira, J.; Millán-Arias, C.; Daniel, R.; Junior, A.C.; Fernandes, B.J.T. Ceramic Cracks Segmentation with Deep Learning. 11(13), 6017. Multidisciplinary Digital Publishing Institute.
- Dorafshan, S.; Thomas, R.J.; Maguire, M. SDNET2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks. 21, 1664–1668.
- Eisenbach, M.; Stricker, R.; Seichter, D.; Amende, K.; Debes, K.; Sesselmann, M.; Ebersbach, D.; Stoeckert, U.; Gross, H.M. How to get pavement distress detection ready for deep learning? A systematic approach. 2017 International Joint Conference on Neural Networks (IJCNN), pp. 2039–2047. ISSN 2161-4407.
- Özgenel, F. Concrete Crack Segmentation Dataset.
- Hong, Z.; Yang, F.; Pan, H.; Zhou, R.; Zhang, Y.; Han, Y.; Wang, J.; Yang, S.; Chen, P.; Tong, X.; Liu, J. Highway Crack Segmentation From Unmanned Aerial Vehicle Images Using Deep Learning. IEEE Geoscience and Remote Sensing Letters, 19, 1–5.
- Liu, Y.; Chu, L.; Chen, G.; Wu, Z.; Chen, Z.; Lai, B.; Hao, Y. PaddleSeg: A High-Efficient Development Toolkit for Image Segmentation, 2021. arXiv:2101.06175.
- Shi, P.; Zhu, F.; Xin, Y.; Shao, S. U2CrackNet: a deeper architecture with two-level nested U-structure for pavement crack detection. 22, 2910–2921. SAGE Publications.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Navab, N.; Hornegger, J.; Wells, W.M.; Frangi, A.F., Eds. Springer International Publishing, pp. 234–241.
- Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. 39, 2481–2495. IEEE.
- Yu, C.; Gao, C.; Wang, J.; Yu, G.; Shen, C.; Sang, N. BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation. 129, 3051–3068.
- Fan, M.; Lai, S.; Huang, J.; Wei, X.; Chai, Z.; Luo, J.; Wei, X. Rethinking BiSeNet for Real-Time Semantic Segmentation. pp. 9716–9725.
- Zhang, W.; Huang, Z.; Luo, G.; Chen, T.; Wang, X.; Liu, W.; Yu, G.; Shen, C. TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation. pp. 12083–12093.
- Wan, Q.; Huang, Z.; Lu, J.; Yu, G.; Zhang, L. SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation. International Conference on Learning Representations (ICLR), 2023.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes dataset for semantic urban scene understanding. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3213–3223.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted residuals and linear bottlenecks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
- Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning Deep Features for Discriminative Localization. CVPR 2016.

| Stage | Downsampling operation | In ch. | Out ch. | Upsampling operation | In ch. | Out ch. | Output size |
|---|---|---|---|---|---|---|---|
| S0 | Input | | 3 | Seg Head | 96 | 2 | 400x400 |
| S1 | Stem | 3 | 32 | Concatenate | 64 | 96 | 100x100 |
| S2 | CS Stage x 3 | 32 | 32 | Up-samp. | 128 | 64 | 100x100 |
| S3 | Down-samp. | 32 | 64 | Concatenate | 64 | 128 | 50x50 |
| | CS Stage x 3 | 64 | 64 | Up-samp. | 128 | 64 | 50x50 |
| S4 | Down-samp. | 64 | 128 | | | | 25x25 |
| | CS Stage x 4 | 128 | 128 | | | | 25x25 |
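
The channel bookkeeping in the architecture table can be checked with a short script. This is only a sketch: it reads the two numbers after each operation as input and output channel counts, and it assumes each Concatenate step joins the upsampled feature with the encoder feature of the same spatial size. Both readings are inferred from the numbers in the table, not stated in it, and the names `ENCODER`, `DECODER`, `skip_channels`, and `trace_decoder` are hypothetical.

```python
# Rows copied from the architecture table:
# (stage, operation, in_channels, out_channels, output_size)
ENCODER = [
    ("S1", "Stem", 3, 32, 100),
    ("S2", "CS Stage x 3", 32, 32, 100),
    ("S3", "Down-samp.", 32, 64, 50),
    ("S3", "CS Stage x 3", 64, 64, 50),
    ("S4", "Down-samp.", 64, 128, 25),
    ("S4", "CS Stage x 4", 128, 128, 25),
]

# Decoder rows in data-flow order (bottom of the table upwards):
# (operation, in_channels, out_channels, output_size)
DECODER = [
    ("Up-samp.", 128, 64, 50),
    ("Concatenate", 64, 128, 50),
    ("Up-samp.", 128, 64, 100),
    ("Concatenate", 64, 96, 100),
    ("Seg Head", 96, 2, 400),
]


def skip_channels(size):
    """Channels of the last encoder feature map with the given spatial size."""
    return [out for (_, _, _, out, s) in ENCODER if s == size][-1]


def trace_decoder():
    """Walk the decoder and check that every row is consistent with the previous one."""
    channels = ENCODER[-1][3]  # the decoder starts from the S4 output
    for op, c_in, c_out, size in DECODER:
        assert c_in == channels, f"{op}: expected {channels} input channels"
        if op == "Concatenate":
            # Assumed skip connection: upsampled feature + same-size encoder feature.
            assert c_out == c_in + skip_channels(size)
        channels = c_out
    return channels


final_channels = trace_decoder()
print("decoder output channels:", final_channels)
```

Under these assumptions the rows are mutually consistent: 64 + 64 = 128 channels at 50x50, 64 + 32 = 96 channels at 100x100, and a two-channel output from the segmentation head.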

| Dataset | Number of images | Material |
|---|---|---|
| Masonry[39] | 240 | Masonry structures |
| CFD[40] | 118 | Paths and sidewalks |
| CrackTree200[41] | 175 | Pavement |
| Volker[42] | 427 | Concrete structures |
| DeepCrack[43] | 443 | Concrete and asphalt surfaces |
| Ceramic[44] | 100 | Ceramic tiles |
| SDNET2018[45] | 1,411 | Building facades, bridges, sidewalks |
| Rissbilder[42] | 2,736 | Building surfaces (walls, bridges) |
| Crack500[21] | 3,126 | Pavement |
| GAPS384[46] | 383 | Pavement and concrete surfaces |

| Item | Setting |
|---|---|
| Epochs | 200 |
| Batch size | 16 |
| Optimizer | AdamW |
| Weight decay | 0.01 |
| Beta1 | 0.9 |
| Beta2 | 0.999 |
| Initial learning rate | 0.005 |
| Learning rate decay type | poly |
| GPU memory | 12 GB |
| Image size | 400x400 |
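
The "poly" learning-rate decay in the settings table is commonly implemented as a polynomial schedule that starts at the initial learning rate and decays towards an end value. The sketch below uses the initial rate of 0.005 from the table; the exponent `power=0.9`, the end value `end_lr=0.0`, and the step count in the example are assumptions, since the table does not give them, and `poly_lr` is a hypothetical helper rather than the training code.

```python
def poly_lr(step, total_steps, base_lr=0.005, power=0.9, end_lr=0.0):
    """Polynomial ("poly") learning-rate decay.

    base_lr comes from the settings table; power and end_lr are assumed
    values and should be replaced by the ones actually used in training.
    """
    step = min(step, total_steps)  # hold the final value after the last step
    decay = (1.0 - step / total_steps) ** power
    return (base_lr - end_lr) * decay + end_lr


# Illustrative schedule over a hypothetical 1000 optimizer steps.
schedule = [poly_lr(s, 1000) for s in (0, 250, 500, 750, 1000)]
print(schedule)
```

The schedule starts at the initial learning rate, decreases monotonically, and reaches `end_lr` at the final step.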

| Model | mIoU(%) | Pr(%) | Re(%) | F1(%) | Params | FLOPs |
|---|---|---|---|---|---|---|
| Classical | | | | | | |
| U-Net[51] | 81.36 | 90.60 | 87.00 | 88.76 | 13.40M | 75.87G |
| PSPNet[37] | 81.69 | 89.19 | 88.72 | 88.95 | 21.06M | 54.20G |
| SegNet[52] | 80.50 | 89.71 | 86.57 | 88.11 | 29.61M | 103.91G |
| DeepLabV3+[23] | 80.96 | 88.555 | 88.29 | 88.42 | 2.76M | 2.64G |
| SegFormer[19] | 81.63 | 89.815 | 88.05 | 88.92 | 3.72M | 4.13G |
| SegNeXt[24] | 81.55 | 89.28 | 88.44 | 88.86 | 4.23M | 3.72G |
| Lightweight | | | | | | |
| BiSeNet[28] | 81.01 | 89.74 | 87.26 | 88.48 | 12.93M | 34.57G |
| BiSeNetV2[53] | 80.66 | 89.36 | 87.11 | 88.22 | 2.33M | 4.93G |
| STDC[54] | 80.84 | 88.92 | 87.76 | 88.34 | 8.28M | 5.22G |
| STDC2[54] | 80.94 | 89.54 | 87.33 | 88.42 | 12.32M | 7.26G |
| TopFormer[55] | 80.96 | 89.28 | 87.60 | 88.43 | 5.06M | 1.00G |
| SeaFormer[56] | 79.13 | 87.29 | 87.19 | 87.20 | 4.01M | 0.64G |
| Specific | | | | | | |
| U2Crack[50] | 81.45 | 90.125 | 87.52 | 88.80 | 1.19M | 31.21G |
| HrSegNetB48[29] | 81.07 | 90.39 | 86.78 | 88.55 | 5.43M | 5.59G |
| HrSegNetB64[29] | 81.28 | 90.44 | 87.03 | 88.70 | 9.65M | 9.91G |
| CrackScopeNet | 82.15 | 89.34 | 89.24 | 89.29 | 1.047M | 1.58G |
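
The columns of the comparison table follow the usual definitions of segmentation metrics. The sketch below computes them from pixel counts for a binary crack/background task, treating crack pixels as the positive class and taking mIoU as the mean of the crack and background IoU. That treatment is an assumption about the evaluation protocol, and the function names and pixel counts are hypothetical.

```python
def segmentation_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and mIoU from pixel counts (crack = positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou_crack = tp / (tp + fp + fn)
    iou_background = tn / (tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "miou": (iou_crack + iou_background) / 2,
    }


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Made-up pixel counts, for illustration only.
metrics = segmentation_metrics(tp=80, fp=20, fn=20, tn=880)
print(metrics)

# Consistency check against the CrackScopeNet row of the table.
crackscopenet_f1 = f1_from_precision_recall(89.34, 89.24)
print(round(crackscopenet_f1, 2))
```

Applying the harmonic mean to the reported precision (89.34) and recall (89.24) of CrackScopeNet reproduces the F1 of 89.29 listed in the table.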

| Model | CrackSeg9k mIoU | CrackSeg9k Param | CrackSeg9k FLOPs | Ozgenel mIoU | Ozgenel mIoU(F) | Ozgenel FLOPs | Aerial Track Dataset mIoU | Aerial Track Dataset mIoU(F) | Aerial Track Dataset FLOPs |
|---|---|---|---|---|---|---|---|---|---|
| CSNet | 82.15% | 1.05M | 1.58G | 90.05% | 92.11% | 1.98G | 79.12% | 82.63% | 2.59G |
| CSNet_L | 82.48% | 2.20M | 5.09G | 90.71% | 92.36% | 6.38G | 81.04% | 83.43% | 8.33G |

| Attention | Multi-branch | Decoder | mIoU(%) | FLOPs(G) | |||
| SWA | CA | CBAM | ours | ASPP | |||
| ✓ | 81.34 | 1.57 | |||||
| ✓ | ✓ | ✓ | 81.98 | 1.58 | |||
| ✓ | ✓ | ✓ | 81.95 | 1.58 | |||
| ✓ | ✓ | 81.91 | 1.61 | ||||
| ✓ | ✓ | 82.14 | 2.89 | ||||
| ✓ | ✓ | ✓ | 82.15 | 1.58 | |||

| Model | mIoU(%) | Pr(%) | Re(%) | F1(%) | Params | FLOPs |
|---|---|---|---|---|---|---|
| HrSegNetB48 | 81.07 | 90.39 | 86.78 | 88.55 | 5.43M | 5.59G |
| HrSegNetB48+CBAM | 81.16 | 90.40 | 86.90 | 88.61 | 5.44M | 5.60G |
| HrSegNetB48+CA | 81.20 | 90.24 | 87.08 | 88.63 | 5.44M | 5.60G |
| HrSegNetB48+SWA | 81.72 | 89.65 | 88.33 | 88.98 | 5.48M | 5.60G |
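
The effect of each attention module on the HrSegNetB48 baseline can be read off the table as a simple difference in mIoU. The short calculation below uses only the mIoU values listed above; the variable names are hypothetical.

```python
# mIoU(%) values copied from the table above.
BASELINE_MIOU = 81.07  # HrSegNetB48
VARIANT_MIOU = {"CBAM": 81.16, "CA": 81.20, "SWA": 81.72}

# Gain in percentage points over the baseline, rounded to two decimals.
gains = {name: round(miou - BASELINE_MIOU, 2) for name, miou in VARIANT_MIOU.items()}
best_module = max(gains, key=gains.get)

print(gains)
print("largest gain:", best_module)
```

With these figures, SWA adds 0.65 percentage points of mIoU, against 0.09 for CBAM and 0.13 for CA.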

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).