Submitted: 15 May 2024
Posted: 15 May 2024
Abstract
Keywords:
1. Introduction
- We propose MDCFD, a decoder based on multi-dilation-rate convolution fusion. Its fusion module integrates the outputs of convolutions with different dilation rates, mitigating the information loss of dilated convolution and improving segmentation of targets at varying scales.
- We introduce LKSHAM, a hybrid attention mechanism built on large-kernel convolutions. The spatial attention submodule embeds a convolutional kernel selection strategy to accommodate varying segmentation scales, while the channel attention submodule applies large-kernel convolution-based attention to enlarge the model's receptive field, sharpening foreground-background discrimination and suppressing irrelevant background noise.
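The fusion idea behind MDCFD can be sketched as parallel dilated convolutions whose outputs are concatenated and merged by a 1x1 convolution. This is a minimal illustration, not the paper's exact module; the class name, channel counts, and dilation rates (1, 2, 4) are our assumptions:

```python
import torch
import torch.nn as nn

class MultiDilationFusion(nn.Module):
    """Illustrative sketch of multi-dilation-rate convolution fusion.

    Each 3x3 branch uses padding == dilation, so all branches keep the
    same spatial size; a 1x1 convolution then fuses the concatenated
    branch outputs, combining several receptive-field sizes instead of
    relying on a single dilated convolution.
    """

    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # same spatial size
        return self.fuse(torch.cat(feats, dim=1))        # channel-wise fusion
```

Because every branch preserves resolution, the fused feature map has the same height and width as the input, which is what lets the decoder stack such blocks freely.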
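The kernel selection strategy in the spatial attention submodule can be illustrated in the spirit of LSKNet's selective large-kernel design: two depth-wise branches with different effective kernel sizes are softly selected per pixel. This is a hedged sketch under our own assumptions (branch kernel sizes, the pooled-statistics selector, and the sigmoid gating are ours, not necessarily the paper's):

```python
import torch
import torch.nn as nn

class KernelSelectSpatialAttention(nn.Module):
    """Illustrative sketch of large-kernel-selection spatial attention.

    A 5x5 depth-wise branch and a dilated 7x7 depth-wise branch
    (effective kernel 19x19) give two spatial contexts; avg- and
    max-pooled channel statistics drive a 2-channel selection map that
    softly weights the two branches per pixel, and the fused context
    gates the input features.
    """

    def __init__(self, ch):
        super().__init__()
        self.small = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)
        self.large = nn.Conv2d(ch, ch, 7, padding=9, dilation=3, groups=ch)
        self.select = nn.Conv2d(2, 2, 7, padding=3)  # from [avg, max] maps

    def forward(self, x):
        a, b = self.small(x), self.large(x)
        stats = torch.cat([a, b], dim=1)
        avg = stats.mean(dim=1, keepdim=True)
        mx = stats.max(dim=1, keepdim=True).values
        w = torch.sigmoid(self.select(torch.cat([avg, mx], dim=1)))
        fused = a * w[:, 0:1] + b * w[:, 1:2]   # per-pixel kernel selection
        return x * torch.sigmoid(fused)          # spatial attention gating
```

Per-pixel selection is what lets small targets favor the small kernel while large homogeneous regions favor the large one.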
2. Methods
2.1. Multi-Dilation Rate Convolutional Fusion Module
2.2. Large Kernel Selection Hybrid Attention Module
2.2.1. Large Kernel Selection Spatial Attention Module
2.2.2. Large Kernel Channel Attention Module
3. Results
3.1. Datasets and Data Pre-Processing
3.2. Implementation Details
3.2.1. Hardware Environment
3.2.2. Software Environment
| Hardware/Software | Parameter/Version |
|---|---|
| CPU | Intel Core i9-9900K |
| GPU | GeForce RTX 2080 Ti |
| Memory | 64 GB |
| Storage | 4 TB |
| Operating System | Ubuntu 20.04 |
| Python | 3.8.2 |
| CUDA | 11.3 |
| PyTorch | 1.12.1 |
| mmcv | 2.0.0 |
| mmsegmentation | 1.2.2 |
| numpy | 1.24.4 |
| opencv | 4.9.0 |
3.2.3. Hyperparameter Settings
3.3. Ablation Study
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zheng, Z.; Zhong, Y.; Wang, J.; Ma, A. Foreground-Aware Relation Network for Geospatial Object Segmentation in High Spatial Resolution Remote Sensing Imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020; pp. 4096–4105.
- Liu, W.; Li, Q.; Lin, X.; Yang, W.; He, S.; Yu, Y. Ultra-High Resolution Image Segmentation via Locality-Aware Context Fusion and Alternating Local Enhancement. arXiv 2021, arXiv:2109.02580.
- Ma, A.; Wang, J.; Zhong, Y.; Zheng, Z. FactSeg: Foreground Activation-Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery. IEEE Transactions on Geoscience and Remote Sensing 2021, 60, 1–16.
- Xu, Z.; Zhang, W.; Zhang, T.; Yang, Z.; Li, J. Efficient Transformer for Remote Sensing Image Segmentation. Remote Sensing 2021, 13, 3585.
- Wang, L.; Li, R.; Duan, C.; Zhang, C.; Meng, X.; Fang, S. A Novel Transformer Based Semantic Segmentation Scheme for Fine-Resolution Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters 2022, 19, 1–5.
- Xu, R.; Wang, C.; Zhang, J.; Xu, S.; Meng, W.; Zhang, X. RSSFormer: Foreground Saliency Enhancement for Remote Sensing Land-Cover Segmentation. IEEE Transactions on Image Processing 2023, 32, 1052–1064.
- Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Advances in Neural Information Processing Systems 2021, 34, 12077–12090.
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Fu, Y.; Feng, J.; Xiang, T.; Torr, P.H. Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021; pp. 6881–6890.
- Li, Y.; Hou, Q.; Zheng, Z.; Cheng, M.-M.; Yang, J.; Li, X. Large Selective Kernel Network for Remote Sensing Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2023; pp. 16794–16805.
- Wang, J.; Sun, K.; Cheng, T.; Jiang, B.; Deng, C.; Zhao, Y.; Liu, D.; Mu, Y.; Tan, M.; Wang, X. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020, 43, 3349–3364.
- Li, G.; Yun, I.; Kim, J.; Kim, J. DABNet: Depth-Wise Asymmetric Bottleneck for Real-Time Semantic Segmentation. arXiv 2019, arXiv:1907.11357.
- Hu, P.; Perazzi, F.; Heilbron, F.C.; Wang, O.; Lin, Z.; Saenko, K.; Sclaroff, S. Real-Time Semantic Segmentation with Fast Attention. IEEE Robotics and Automation Letters 2020, 6, 263–270.
- Li, R.; Zheng, S.; Zhang, C.; Duan, C.; Wang, L.; Atkinson, P.M. ABCNet: Attentive Bilateral Contextual Network for Efficient Semantic Segmentation of Fine-Resolution Remotely Sensed Imagery. ISPRS Journal of Photogrammetry and Remote Sensing 2021, 181, 84–98.
- Srinivas, A.; Lin, T.-Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck Transformers for Visual Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021; pp. 16519–16529.
- Wang, L.; Li, R.; Wang, D.; Duan, C.; Wang, T.; Meng, X. Transformer Meets Convolution: A Bilateral Awareness Network for Semantic Segmentation of Very Fine Resolution Urban Scene Images. Remote Sensing 2021, 13, 3065.
- Strudel, R.; Garcia, R.; Laptev, I.; Schmid, C. Segmenter: Transformer for Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021; pp. 7262–7272.
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019; pp. 3146–3154.
- Li, X.; Zhong, Z.; Wu, J.; Yang, Y.; Lin, Z.; Liu, H. Expectation-Maximization Attention Networks for Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019; pp. 9167–9176.
- Li, X.; He, H.; Li, X.; Li, D.; Cheng, G.; Shi, J.; Weng, L.; Tong, Y.; Lin, Z. PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2021; pp. 4217–4226.








| Dataset | Model | mIoU (MiT, %) | mIoU (HRNetV2, %) | mF1 (MiT, %) | mF1 (HRNetV2, %) |
|---|---|---|---|---|---|
| Vaihingen | Baseline | 80.41 | 79.11 | 88.93 | 88.17 |
| Vaihingen | Baseline + MDCFD | 81.56 | 79.49 | 89.68 | 88.41 |
| Vaihingen | Baseline + MDCFD + LKSHAM | 82.14 | 80.27 | 90.06 | 88.89 |
| Potsdam | Baseline | 78.10 | 77.42 | 86.39 | 85.96 |
| Potsdam | Baseline + MDCFD | 79.13 | 78.19 | 87.21 | 86.46 |
| Potsdam | Baseline + MDCFD + LKSHAM | 79.63 | 78.80 | 87.56 | 87.07 |
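The mIoU and mF1 columns are the intersection-over-union and F1 scores averaged over classes. A minimal NumPy sketch of how such metrics are typically computed from a pixel-level confusion matrix (function names are ours, not the paper's; real evaluations usually also handle ignored labels):

```python
import numpy as np

def confusion_matrix(pred, target, num_classes):
    """Pixel-level confusion matrix: rows = ground truth, cols = prediction."""
    idx = num_classes * target.ravel() + pred.ravel()
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def miou_mf1(pred, target, num_classes):
    """Class-averaged IoU and F1 (in percent) from two label maps."""
    cm = confusion_matrix(pred, target, num_classes).astype(float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class c but actually another
    fn = cm.sum(axis=1) - tp   # actually class c but predicted otherwise
    iou = tp / np.maximum(tp + fp + fn, 1e-12)
    f1 = 2 * tp / np.maximum(2 * tp + fp + fn, 1e-12)
    return iou.mean() * 100, f1.mean() * 100
```

For a perfect prediction both metrics reach 100; any confusion between classes lowers IoU faster than F1, which is why mIoU values in the tables sit below mF1.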
| Dataset | Model | Imp. Surf. | Building | Low veg. | Tree | Car | mF1 | mIoU |
|---|---|---|---|---|---|---|---|---|
| Vaihingen | DABNet [11] | 87.8 | 88.8 | 74.3 | 84.9 | 60.2 | 79.2 | 70.2 |
| Vaihingen | ERFNet | 88.5 | 90.2 | 76.4 | 85.8 | 53.6 | 78.9 | 69.1 |
| Vaihingen | PSPNet | 89.0 | 93.2 | 81.5 | 87.7 | 43.9 | 79.0 | 68.6 |
| Vaihingen | FANet [12] | 90.7 | 93.8 | 82.6 | 88.6 | 71.6 | 85.4 | 75.6 |
| Vaihingen | ABCNet [13] | 92.7 | 95.2 | 84.5 | 89.7 | 85.3 | 89.5 | 81.3 |
| Vaihingen | BoTNet [14] | 89.9 | 92.1 | 81.8 | 88.7 | 71.3 | 84.8 | 74.3 |
| Vaihingen | BANet [15] | 92.2 | 95.2 | 83.8 | 89.9 | 86.8 | 89.6 | 81.4 |
| Vaihingen | Segmenter [16] | 89.8 | 93.0 | 81.2 | 88.9 | 67.6 | 84.1 | 73.6 |
| Vaihingen | DeepLabV3+ | 90.1 | 93.2 | 82.1 | 88.0 | 84.1 | 87.5 | 78.0 |
| Vaihingen | SegFormer | 92.0 | 95.5 | 83.3 | 89.2 | 84.6 | 88.9 | 80.4 |
| Vaihingen | HRNetV2 | 91.0 | 94.4 | 82.8 | 88.8 | 83.8 | 88.2 | 79.1 |
| Vaihingen | MiT & Ours | 92.8 | 95.7 | 85.1 | 89.6 | 87.0 | 90.1 | 82.1 |
| Vaihingen | HRNetV2 & Ours | 91.8 | 95.1 | 84.0 | 89.0 | 84.5 | 88.9 | 80.3 |
| Potsdam | DeepLabV3+ | 92.6 | 96.4 | 86.3 | 87.8 | 95.4 | 85.6 | 77.1 |
| Potsdam | DANet [17] | 88.5 | 92.7 | 78.8 | 85.7 | 73.7 | 77.1 | 65.3 |
| Potsdam | CCNet | 88.3 | 92.5 | 78.8 | 85.7 | 73.9 | 75.9 | 64.3 |
| Potsdam | EMANet [18] | 88.2 | 92.7 | 78.0 | 85.7 | 72.7 | 77.7 | 65.6 |
| Potsdam | SegFormer | 92.9 | 96.4 | 86.9 | 88.1 | 95.2 | 86.4 | 78.1 |
| Potsdam | PFNet [19] | 91.5 | 95.9 | 85.4 | 86.3 | 91.1 | 84.8 | 58.6 |
| Potsdam | HRNetV2 | 92.7 | 96.4 | 87.1 | 88.2 | 94.4 | 86.0 | 77.4 |
| Potsdam | MiT & Ours | 93.3 | 96.8 | 87.9 | 89.3 | 96.2 | 87.6 | 79.6 |
| Potsdam | HRNetV2 & Ours | 93.7 | 96.9 | 87.6 | 88.8 | 96.2 | 87.1 | 78.8 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).