Submitted: 24 October 2023
Posted: 25 October 2023
Abstract
Keywords:
1. Introduction
- A model called the DPWRN was proposed for image semantic segmentation; it integrates a Pyramid of Kernel Paralleled with Dilation (PKD) module and Multi-Feature Fusion (MFF) blocks to improve segmentation accuracy.
- Three datasets of different image styles were used to evaluate the generalization of the DPWRN, whereas existing semantic segmentation models are typically tested on datasets of a single style.
- Compared with state-of-the-art models, the DPWRN ranked second on CamVid (mIoU 75.95%), first on DRIVE (F1-score 83.60%), and first on eBDtheque (F1-score 86.87%).
2. Related Work
3. Proposed Models
3.1. Wide Residual Networks
3.2. Pyramid of Kernel Paralleled with Dilation
3.3. Multi-Feature Fusion
3.4. Decoder
4. Experiments
4.1. Datasets
4.2. Experimental Environments
4.3. Evaluation Indicators
4.4. Experimental Results on CamVid
4.4.1. Ablation Study for the PKD Module and MFF Blocks
4.4.2. Progressive Training
4.4.3. Comparisons and Segmentation Results
4.5. Experimental Results on DRIVE
4.6. Experimental Results on eBDtheque

5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2015, arXiv:1409.1556v6.
2. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27-30 June 2016.
3. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21-26 July 2017.
4. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2016, arXiv:1511.07122v3.
5. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587v3.
6. Yamashita, T.; Furukawa, H.; Fujiyoshi, H. Multiple skip connections of dilated convolution network for semantic segmentation. In Proceedings of the 25th IEEE International Conference on Image Processing, Athens, Greece, 7-10 October 2018.
7. Liu, L.; Pang, Y.; Zamir, S.W.; Khan, S.; Khan, F.S.; Shao, L. Filling the gaps in atrous convolution: semantic segmentation with a better context. IEEE Access 2020, 8, 34019–34028.
8. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21-26 July 2017.
9. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018, 40, 834–848.
10. Mehta, S.; Rastegari, M.; Caspi, A.; Shapiro, L.; Hajishirzi, H. ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8-14 September 2018.
11. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7-12 June 2015.
12. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5-9 October 2015.
13. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18-23 June 2018.
14. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: convolutional block attention module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8-14 September 2018.
15. Brostow, G.J.; Shotton, J.; Fauqueur, J.; Cipolla, R. Segmentation and recognition using structure from motion point clouds. In Proceedings of the European Conference on Computer Vision, Marseille, France, 12-18 October 2008.
16. Staal, J.; Abramoff, M.D.; Niemeijer, M.; Viergever, M.A.; van Ginneken, B. Ridge-based vessel segmentation in color images of the retina. IEEE Transactions on Medical Imaging 2004, 23, 501–509.
17. Guérin, C. et al. eBDtheque: a representative database of comics. In Proceedings of the 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25-28 August 2013.
18. Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv 2017, arXiv:1605.07146v4.
19. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21-26 July 2017.
20. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, Savannah, GA, USA, 2-4 November 2016.
21. Kingma, D.P.; Ba, J. Adam: a method for stochastic optimization. arXiv 2017, arXiv:1412.6980v9.
22. CIFAR-100. Available online: https://www.cs.toronto.edu/~kriz/cifar.html.
23. Cordts, M. et al. The Cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27-30 June 2016.
24. Guo, C.; Szemenyei, M.; Yi, Y.; Wang, W.; Chen, B.; Fan, C. SA-UNet: spatial attention U-Net for retinal vessel segmentation. In Proceedings of the 25th International Conference on Pattern Recognition, Milan, Italy, 10-15 January 2021.
25. Zhu, Y. et al. Improving semantic segmentation via video propagation and label relaxation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15-20 June 2019.
26. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017, 39, 2481–2495.
27. Huang, P.Y.; Hsu, W.T.; Chiu, C.Y.; Wu, T.F.; Sun, M. Efficient uncertainty estimation for semantic segmentation in videos. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8-14 September 2018.
28. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. BiSeNet: bilateral segmentation network for real-time semantic segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8-14 September 2018.
29. Bilinski, P.; Prisacariu, V. Dense decoder shortcut connections for single-pass semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18-23 June 2018.
30. Chandra, S.; Couprie, C.; Kokkinos, I. Deep spatio-temporal random fields for efficient video segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18-23 June 2018.
31. Li, K.C.; Chiu, C.T.; Hsiao, S.C. Semantic segmentation via enhancing context information by fusing multiple high-level features. In Proceedings of the IEEE Workshop on Signal Processing Systems, Coimbra, Portugal, 20-22 October 2020.
32. Nakayama, Y.; Lu, H.; Li, Y.; Kamiya, T. WideSegNeXt: semantic image segmentation using wide residual network and next dilated unit. IEEE Sensors Journal 2021, 21, 11427–11434.
33. Li, F. Fully convolutional pyramidal networks for semantic segmentation. IEEE Access 2020, 8, 229132–229140.
34. Liskowski, P.; Krawiec, K. Segmenting retinal blood vessels with deep neural networks. IEEE Transactions on Medical Imaging 2016, 35, 2369–2380.
35. Orlando, J.I.; Prokofyeva, E.; Blaschko, M.B. A discriminatively trained fully connected conditional random field model for blood vessel segmentation in fundus images. IEEE Transactions on Biomedical Engineering 2017, 64, 16–27.
36. Yan, Z.; Yang, X.; Cheng, K.T. Joint segment-level and pixel-wise losses for deep learning based retinal vessel segmentation. IEEE Transactions on Biomedical Engineering 2018, 65, 1912–1923.
37. Wu, Y.; Xia, Y.; Song, Y.; Zhang, Y.; Cai, W. Multiscale network followed network model for retinal vessel segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Granada, Spain, 16-20 September 2018.
38. Wang, B.; Qiu, S.; He, H. Dual encoding U-Net for retinal vessel segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13-17 October 2019.
39. Wu, Y.; Xia, Y.; Song, Y.; Zhang, D.; Liu, D.; Zhang, C.; Cai, W. Vessel-Net: retinal vessel segmentation under multi-path supervision. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13-17 October 2019.
40. Zhang, S.; Fu, H.; Yan, Y.; Zhang, Y.; Wu, Q.; Yang, M.; Tang, M.; Xu, Y. Attention guided network for retinal image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13-17 October 2019.
41. Li, L.; Verma, M.; Nakashima, Y.; Nagahara, H.; Kawasaki, R. IterNet: retinal image segmentation utilizing structural redundancy in vessel networks. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass Village, CO, USA, 1-5 March 2020.
42. Zhou, Y.; Yu, H.; Shi, H. Study group learning: improving retinal vessel segmentation trained with noisy labels. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Strasbourg, France, 27 September–1 October 2021.
43. Dubray, D.; Laubrock, J. Deep CNN-based speech balloon detection and segmentation for comic books. In Proceedings of the International Conference on Document Analysis and Recognition, Sydney, Australia, 20-25 September 2019.
44. Arai, K.; Tolle, H. Method for real time text extraction of digital manga comic. International Journal of Image Processing 2011, 4, 669–676.
45. Ho, A.K.N.; Burie, J.; Ogier, J. Panel and speech balloon extraction from comic books. In Proceedings of the 10th IAPR International Workshop on Document Analysis Systems, Gold Coast, Queensland, Australia, 27-29 March 2012.
46. Rigaud, C.; Burie, J.; Ogier, J.; Karatzas, D.; van de Weijer, J. An active contour model for speech balloon detection in comics. In Proceedings of the 12th International Conference on Document Analysis and Recognition, Washington, DC, USA, 25-28 August 2013.
47. Rigaud, C.; Burie, J.; Ogier, J. Text-independent speech balloon segmentation for comics and manga. In Proceedings of the IAPR International Workshop on Graphics Recognition, Sousse, Tunisia, 20-21 August 2015.
48. Nguyen, N.V.; Rigaud, C.; Burie, J.C. Multi-task model for comic book image analysis. In Proceedings of the International Conference on Multimedia Modeling, Thessaloniki, Greece, 8-11 January 2019.
49. Wang, C.M.; Huang, Y.F. Self-adaptive harmony search algorithm for optimization. Expert Systems with Applications 2010, 37, 2826–2837.










Ablation study for the PKD module and MFF blocks on CamVid (mIoU, %).

| Encoder | PKD module | MFF blocks | mIoU |
|---|---|---|---|
| √ | - | - | 73.71 |
| √ | √ | - | 74.00 |
| √ | - | √ | 75.60 |
| √ | √ | √ | 75.83 |

Progressive training on CamVid (mIoU, %).

| Data1 | Data2 | Data3 | Data4 | Data5 | Data6 | Data7 | Data8 | mIoU |
|---|---|---|---|---|---|---|---|---|
| √ | - | - | - | - | - | - | - | 69.83 |
| √ | √ | - | - | - | - | - | - | 71.99 |
| √ | √ | √ | - | - | - | - | - | 72.85 |
| √ | √ | √ | √ | - | - | - | - | × |
| √ | √ | √ | - | √ | - | - | - | × |
| √ | √ | √ | - | - | √ | - | - | 73.58 |
| √ | √ | √ | - | - | √ | √ | - | × |
| √ | √ | √ | - | - | √ | - | √ | 75.95 |

Comparison with existing methods on CamVid (per-class results and mIoU, %).

| Methods | Year | Building | Tree | Sky | Car | Sign | Road | Pedestrian | Fence | Pole | Sidewalk | Cyclist | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dilate8 [4] | 2015 | 82.6 | 76.2 | 89.0 | 84.0 | 46.9 | 92.2 | 56.3 | 35.8 | 23.4 | 75.3 | 55.5 | 65.3 |
| PSPNet [8] | 2017 | - | - | - | - | - | - | - | - | - | - | - | 69.1 |
| SegNet [26] | 2017 | 89.6 | 83.4 | 96.1 | 87.7 | 52.7 | 96.4 | 62.2 | 53.4 | 32.1 | 93.3 | 36.5 | 60.1 |
| RTA [27] | 2018 | 88.4 | 89.3 | 94.9 | 88.9 | 48.7 | 95.4 | 73.0 | 45.6 | 41.4 | 94.0 | 51.6 | 62.5 |
| BiSeNet [28] | 2018 | 83.0 | 75.8 | 92.0 | 83.7 | 46.5 | 94.6 | 58.8 | 53.6 | 31.9 | 81.4 | 54.0 | 68.7 |
| DenseDecoder [29] | 2018 | - | - | - | - | - | - | - | - | - | - | - | 70.9 |
| VideoGCRF [30] | 2018 | 86.1 | 78.3 | 91.2 | 92.2 | 63.7 | 96.4 | 67.3 | 63.0 | 34.4 | 87.8 | 66.4 | 75.2 |
| DeepLabV3Plus+SDCNetAug [25] | 2019 | 90.9 | 82.9 | 92.8 | 94.2 | 69.9 | 97.7 | 76.2 | 74.7 | 51.0 | 91.1 | 78.0 | 81.7 |
| Li et al. [31] | 2020 | - | - | - | - | - | - | - | - | - | - | - | 70.5 |
| WideSeg [32] | 2020 | 84.4 | 77.9 | 92.4 | 84.8 | 52.2 | 95.1 | 67.2 | 50.0 | 45.1 | 83.9 | 65.0 | 72.5 |
| Additive FC-PRnets94 [33] | 2020 | 89.0 | 91.2 | 94.6 | 77.8 | 60.1 | 97.0 | 46.5 | 73.3 | 32.6 | 86.3 | 80.9 | 75.4 |
| Ours (encoder only) | - | 86.19 | 79.08 | 91.37 | 86.52 | 54.58 | 96.82 | 65.43 | 57.71 | 37.77 | 88.62 | 64.70 | 73.71 |
| Ours (full) | - | 87.44 | 80.37 | 93.28 | 88.42 | 58.91 | 97.08 | 71.08 | 57.52 | 45.69 | 89.08 | 66.62 | 75.95 |
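
mIoU is conventionally the unweighted mean of the per-class IoU values. As a quick sanity check under that assumption, the sketch below averages the eleven class scores of the "Ours (full)" row; the function and variable names are illustrative and are not taken from the paper.

```python
# Per-class scores (%) copied from the "Ours (full)" row of the CamVid table.
per_class_iou = {
    "Build": 87.44, "Tree": 80.37, "Sky": 93.28, "Car": 88.42,
    "Sign": 58.91, "Road": 97.08, "Pedes.": 71.08, "Fence": 57.52,
    "Pole": 45.69, "Swalk": 89.08, "Cyclist": 66.62,
}


def mean_iou(scores):
    """Unweighted mean of per-class IoU values (assumed definition of mIoU)."""
    values = list(scores.values())
    return sum(values) / len(values)


miou = mean_iou(per_class_iou)
print(f"mIoU = {miou:.2f}")  # prints "mIoU = 75.95"
```

The result agrees with the mIoU reported for the full model. Other rows need not reproduce exactly, since the values reported in the cited works may be rounded or computed under a different protocol.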

Comparison with existing methods on DRIVE (%).

| Methods | Year | SE | SP | ACC | AUC | F1-score |
|---|---|---|---|---|---|---|
| Liskowski et al. [34] | 2016 | 78.11 | 98.07 | 95.35 | 97.90 | - |
| Orlando et al. [35] | 2017 | 78.97 | 96.85 | 94.54 | 95.07 | - |
| Yan et al. [36] | 2018 | 76.53 | 98.18 | 95.42 | 97.52 | - |
| MS-NFN [37] | 2018 | 78.44 | 98.19 | 95.67 | 98.07 | - |
| DEU-Net [38] | 2019 | 79.40 | 98.16 | 95.67 | 97.72 | 82.70 |
| Vessel-Net [39] | 2019 | 80.38 | 98.02 | 95.78 | 98.21 | - |
| AG-Net [40] | 2019 | 81.00 | 98.48 | 96.92 | 98.56 | - |
| IterNet [41] | 2020 | 77.35 | 98.38 | 95.73 | 98.16 | 82.05 |
| SA-UNet [24] | 2020 | 82.12 | 98.40 | 96.98 | 98.64 | 82.63 |
| Study Group Learning [42] | 2021 | 83.80 | 98.34 | 97.05 | 98.86 | 83.16 |
| Ours (full) | - | 83.07 | 97.80 | 95.98 | 97.96 | 83.60 |
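
The SE, SP, ACC, and F1-score columns follow the standard definitions for binary (vessel vs. background) pixel classification. A minimal sketch of those definitions is given below, assuming pixel-level confusion counts are available; the function name and the example counts are hypothetical and do not come from the paper. AUC is omitted because it requires the predicted probabilities rather than thresholded counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary segmentation metrics from pixel-level confusion counts."""
    se = tp / (tp + fn)                    # sensitivity (recall on vessel pixels)
    sp = tn / (tn + fp)                    # specificity (recall on background pixels)
    acc = (tp + tn) / (tp + fp + tn + fn)  # overall pixel accuracy
    f1 = 2 * tp / (2 * tp + fp + fn)       # harmonic mean of precision and recall
    return {"SE": se, "SP": sp, "ACC": acc, "F1": f1}


# Hypothetical counts for illustration only (not taken from the paper).
m = binary_metrics(tp=80, fp=20, tn=880, fn=20)
print({name: round(100 * value, 2) for name, value in m.items()})
```

Because background pixels dominate retinal images, ACC and SP are typically high for every method, which is one reason the F1-score is informative when comparing rows.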

Comparison with existing methods on eBDtheque (%).

| Methods | Year | Recall | Precision | F1-score |
|---|---|---|---|---|
| Arai and Tolle [44] | 2011 | 18.70 | 23.14 | 20.69 |
| Ho et al. [45] | 2012 | 14.78 | 32.37 | 20.30 |
| Rigaud et al. [46] | 2013 | 69.81 | 32.83 | 44.66 |
| Rigaud et al. [47] | 2015 | 62.92 | 62.27 | 63.59 |
| Nguyen et al. [48], Mask R-CNN | 2019 | 75.31 | 92.42 | 82.99 |
| Nguyen et al. [48], Comic MTL | 2019 | 74.94 | 92.77 | 82.91 |
| Dubray and Laubrock [43] | 2019 | 75.19 | 89.05 | 78.42 |
| Ours (full) | - | 84.86 | 88.98 | 86.87 |
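
The F1-score is the harmonic mean of precision and recall, so the last column can be recomputed from the other two. The sketch below does this for two rows of the eBDtheque table, assuming that standard definition; the helper name `f1_from_pr` is illustrative.

```python
def f1_from_pr(recall, precision):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


# Recall and precision copied from the eBDtheque table.
ours = f1_from_pr(recall=84.86, precision=88.98)
mask_rcnn = f1_from_pr(recall=75.31, precision=92.42)

print(f"Ours (full): {ours:.2f}")       # prints "Ours (full): 86.87"
print(f"Mask R-CNN:  {mask_rcnn:.2f}")  # prints "Mask R-CNN:  82.99"
```

Both recomputed values match the reported F1-scores. The proposed model's precision is lower than that of the Mask R-CNN baseline, so its higher F1-score comes from the gain in recall.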

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).