Submitted: 16 April 2025
Posted: 17 April 2025
Abstract

Keywords:
1. Introduction
- We propose a lightweight hybrid CNN-Transformer network that uses dynamic snake convolution to extract strip-like features in remote sensing images (RSIs).
- We introduce the CBAMUpConv module into the image reconstruction framework. It combines upsampling convolution with the CBAM attention mechanism to improve both spatial and channel feature learning.
- We validate the proposed modules through extensive ablation experiments on the AID remote sensing dataset and show that the resulting models achieve higher PSNR and SSIM than the compared lightweight methods at the ×2, ×3, and ×4 scales.
2. Related Work
2.1. Conventional Methods for SR
2.2. CNN-Based Models for SR
2.3. Transformer-Based Models for SR
2.4. Lightweight Models for SR
3. Methodology
3.1. Architecture
3.2. Strip-like Feature Superpixel Cross and Intra Interaction Module
3.2.1. Dynamic Snake Convolution
3.2.2. Superpixel Clustering
3.2.3. Inter-Superpixel Attention
3.2.4. Intra-Superpixel Attention
3.3. Convolutional Block Attention Module with Upsampling Convolution
4. Experiments
4.1. Experimental Settings
4.1.1. Dataset and Evaluation
4.1.2. Implementation Details
4.1.3. Training Settings
4.2. Comparison with Other Lightweight Methods
4.2.1. Quantitative Evaluation
4.2.2. Visual Quality Analysis
4.3. Ablation Studies
4.3.1. Impact of DSConv
4.3.2. Impact of CBAMConv and CBAMUpConv
5. Conclusion and Future Work
Funding
Conflicts of Interest
References
1. Turner, W.; Spector, S.; Gardiner, N.; Fladeland, M.; Sterling, E.; Steininger, M. Remote sensing for biodiversity science and conservation. Trends in Ecology & Evolution 2003, 18, 306–314.
2. Herold, M.; Liu, X.; Clarke, K.C. Spatial metrics and image texture for mapping urban land use. Photogrammetric Engineering & Remote Sensing 2003, 69, 991–1001.
3. Thenkabail, P.S.; Lyon, J.G.; Huete, A. Advances in hyperspectral remote sensing of vegetation and agricultural crops. In Fundamentals, Sensor Systems, Spectral Libraries, and Data Mining for Vegetation; CRC Press, 2018; pp. 3–37.
4. Joyce, K.E.; Belliss, S.E.; Samsonov, S.V.; McNeill, S.J.; Glassey, P.J. A review of the status of satellite remote sensing and image processing techniques for mapping natural hazards and disasters. Progress in Physical Geography 2009, 33, 183–207.
5. Shen, H.; Zhang, L.; Huang, B.; Li, P. A MAP approach for joint motion estimation, segmentation, and super resolution. IEEE Transactions on Image Processing 2007, 16, 479–490.
6. Köhler, T.; Huang, X.; Schebesch, F.; Aichert, A.; Maier, A.; Hornegger, J. Robust multiframe super-resolution employing iteratively re-weighted minimization. IEEE Transactions on Computational Imaging 2016, 2, 42–58.
7. Dosovitskiy, A.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
8. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1833–1844.
9. Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12299–12310.
10. Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 2015, 38, 295–307.
11. Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 136–144.
12. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1646–1654.
13. Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 2472–2481.
14. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 286–301.
15. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407.
16. Liu, Z.; Sun, M.; Zhou, T.; Huang, G.; Darrell, T. Rethinking the value of network pruning. arXiv 2018, arXiv:1810.05270.
17. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122.
18. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable convolutional networks. Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 764–773.
19. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic snake convolution based on topological geometric constraints for tubular structure segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 6070–6079.
20. Jampani, V.; Sun, D.; Liu, M.Y.; Yang, M.H.; Kautz, J. Superpixel sampling networks. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 352–368.
21. Li, X.; Dong, J.; Tang, J.; Pan, J. DLGSANet: Lightweight dynamic local and global self-attention networks for image super-resolution. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12792–12801.
22. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3–19.
23. Zhang, L.; Wu, X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Transactions on Image Processing 2006, 15, 2226–2238.
24. Hung, K.W.; Siu, W.C. Robust soft-decision interpolation using weighted least squares. IEEE Transactions on Image Processing 2011, 21, 1061–1069.
25. Lu, X.; Yuan, H.; Yuan, Y.; Yan, P.; Li, L.; Li, X. Local learning-based image super-resolution. 2011 IEEE 13th International Workshop on Multimedia Signal Processing. IEEE, 2011, pp. 1–5.
26. Kim, K.I.; Kwon, Y. Single-image super-resolution using sparse regression and natural image prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010, 32, 1127–1133.
27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
28. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4681–4690.
29. Wang, J.; Yang, W.; Guo, H.; Zhang, R.; Xia, G.S. Tiny object detection in aerial images. 2020 25th International Conference on Pattern Recognition (ICPR). IEEE, 2021, pp. 3791–3798.
30. Wang, L.; Li, R.; Zhang, C.; Fang, S.; Duan, C.; Meng, X.; Atkinson, P.M. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery. ISPRS Journal of Photogrammetry and Remote Sensing 2022, 190, 196–214.
31. Chen, K.; Chen, B.; Liu, C.; Li, W.; Zou, Z.; Shi, Z. RSMamba: Remote sensing image classification with state space model. IEEE Geoscience and Remote Sensing Letters 2024.
32. Liebel, L.; Körner, M. Single-image super resolution for multispectral remote sensing data using convolutional neural networks. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2016, 41, 883–890.
33. Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local–global combined network. IEEE Geoscience and Remote Sensing Letters 2017, 14, 1243–1247.
34. Xu, W.; Guangluan, X.; Wang, Y.; Sun, X.; Lin, D.; Yirong, W. High quality remote sensing image super-resolution using deep memory connected network. IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2018, pp. 8889–8892.
35. Ma, W.; Pan, Z.; Guo, J.; Lei, B. Achieving super-resolution remote sensing images via the wavelet transform combined with the recursive Res-Net. IEEE Transactions on Geoscience and Remote Sensing 2019, 57, 3512–3527.
36. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
37. Fang, J.; Lin, H.; Chen, X.; Zeng, K. A hybrid network of CNN and transformer for lightweight image super-resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1103–1112.
38. Lei, S.; Shi, Z.; Mo, W. Transformer-based multistage enhancement for remote sensing image super-resolution. IEEE Transactions on Geoscience and Remote Sensing 2021, 60, 1–11.
39. He, J.; Yuan, Q.; Li, J.; Xiao, Y.; Liu, X.; Zou, Y. DsTer: A dense spectral transformer for remote sensing spectral super-resolution. International Journal of Applied Earth Observation and Geoinformation 2022, 109, 102773.
40. Tu, J.; Mei, G.; Ma, Z.; Piccialli, F. SWCGAN: Generative adversarial network combining Swin transformer and CNN for remote sensing image super-resolution. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2022, 15, 5662–5673.
41. Shang, J.; Gao, M.; Li, Q.; Pan, J.; Zou, G.; Jeon, G. Hybrid-scale hierarchical transformer for remote sensing image super-resolution. Remote Sensing 2023, 15, 3442.
42. Liu, J.; Tang, J.; Wu, G. Residual feature distillation network for lightweight image super-resolution. Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer, 2020, pp. 41–55.
43. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 252–268.
44. Lu, Z.; Li, J.; Liu, H.; Huang, C.; Zhang, L.; Zeng, T. Transformer for single image super-resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 457–466.
45. Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. European Conference on Computer Vision. Springer, 2022, pp. 649–667.
46. Zhang, A.; Ren, W.; Liu, Y.; Cao, X. Lightweight image super-resolution with superpixel token interaction. Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12728–12737.
47. Wang, X.; Wu, Y.; Ming, Y.; Lv, H. Remote sensing imagery super resolution based on adaptive multi-scale feature fusion network. Sensors 2020, 20, 1142.
48. Vaswani, A.; et al. Attention is all you need. Advances in Neural Information Processing Systems 2017.
49. Wang, S.; Zhou, T.; Lu, Y.; Di, H. Contextual transformation network for lightweight remote-sensing image super-resolution. IEEE Transactions on Geoscience and Remote Sensing 2021, 60, 1–13.
50. Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing 2017, 55, 3965–3981.
51. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 2004, 13, 600–612.
52. Wang, C.; Li, Z.; Shi, J. Lightweight image super-resolution with adaptive weighted learning network. arXiv 2019, arXiv:1904.02358.
53. Luo, X.; Xie, Y.; Zhang, Y.; Qu, Y.; Li, C.; Fu, Y. LatticeNet: Towards lightweight image super-resolution with lattice block. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXII 16. Springer, 2020, pp. 272–289.
54. Muqeet, A.; Hwang, J.; Yang, S.; Kang, J.; Kim, Y.; Bae, S.H. Multi-attention based ultra lightweight image super-resolution. Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer, 2020, pp. 103–118.
Short Biography of Authors
Yanxia Lyu received the Ph.D. degree from the School of Computer Science and Engineering, Northeastern University, Shenyang, China, in 2020. She is currently an associate professor with the School of Computer and Communication Engineering, Northeastern University at Qinhuangdao. Her research interests include pattern recognition, information retrieval, and artificial intelligence applications.

Yuhang Liu is currently pursuing the B.S. degree in Computer Science and Technology at Northeastern University at Qinhuangdao, Qinhuangdao, China. His academic interests focus on computer vision and remote sensing super-resolution.

Qianqian Zhao is currently pursuing the B.S. degree in Computer Science and Technology at Northeastern University at Qinhuangdao, Qinhuangdao, China. Her research interests include data science and computer vision.

Ziwen Hao is currently pursuing the M.S. degree at the School of Computer Science and Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, China. His current research interests include super-resolution of remote sensing images and machine learning.

Xin Song received the Ph.D. degree from Northeastern University in 2008. She is currently a professor with the School of Computer and Communication Engineering, Northeastern University at Qinhuangdao. Her research interests include wireless communication and image processing.







| Method | ×2 Params (k) | ×2 PSNR (dB) | ×2 SSIM | ×3 Params (k) | ×3 PSNR (dB) | ×3 SSIM | ×4 Params (k) | ×4 PSNR (dB) | ×4 SSIM |
|---|---|---|---|---|---|---|---|---|---|
| AWSRN-M [52] | 1,064 | 33.05 | 0.8703 | 1,143 | 30.22 | 0.7835 | 1,254 | 28.55 | 0.7188 |
| RFDN [42] | 626 | 32.95 | 0.8676 | 633 | 30.14 | 0.7801 | 643 | 28.48 | 0.7157 |
| LatticeNet [53] | 756 | 32.96 | 0.8679 | 765 | 30.16 | 0.7810 | 777 | 28.48 | 0.7157 |
| MAFFSRN-L [54] | 791 | 33.05 | 0.8703 | 807 | 30.20 | 0.7826 | 830 | 28.54 | 0.7184 |
| ESRT [44] | 678 | 32.97 | 0.8680 | 770 | 30.12 | 0.7787 | 752 | 28.47 | 0.7143 |
| SPIN [46] | 497 | 33.02 | 0.8693 | 569 | 30.21 | 0.7828 | 555 | 28.54 | 0.7188 |
| ELAN-light [45] | 582 | 33.01 | 0.8693 | 590 | 30.19 | 0.7826 | 601 | 28.52 | 0.7183 |
| LGCNet [33] | 193 | 32.67 | 0.8612 | 193 | 29.82 | 0.7685 | 193 | 28.17 | 0.7023 |
| CTNet [49] | 402 | 32.90 | 0.8667 | 402 | 30.06 | 0.7779 | 413 | 28.42 | 0.7135 |
| AMFFN [47] | 298 | 32.93 | 0.8671 | 305 | 30.09 | 0.7784 | 314 | 28.43 | 0.7135 |
| Ours (SFSIN-S) | 642 | 33.08 | 0.8708 | 714 | 30.23 | 0.7837 | 700 | 28.58 | 0.7205 |
| Ours (SFSIN) | 784 | 33.10 | 0.8715 | 856 | 30.25 | 0.7844 | 842 | 28.57 | 0.7203 |
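
The PSNR columns in the comparison above are conventionally reported in dB. As a reminder of how the metric is computed, the following is a minimal pure-Python sketch. It assumes 8-bit intensities (peak value 255) and single-channel images stored as nested lists. The function name `psnr` and the toy pixel values are illustrative, and the exact evaluation protocol behind the table (colour channel, border cropping) is not stated in this excerpt.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2-D images."""
    same_rows = len(reference) == len(estimate)
    if not same_rows or any(len(r) != len(e) for r, e in zip(reference, estimate)):
        raise ValueError("images must have the same shape")

    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")

    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 3x3 patches (hypothetical values, not taken from the AID dataset).
hr_patch = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
sr_patch = [[54, 55, 60], [59, 77, 61], [75, 62, 60]]
print(f"PSNR: {psnr(hr_patch, sr_patch):.2f} dB")
```

Because PSNR is logarithmic in the mean squared error, differences of a few hundredths of a dB, as seen between the strongest rows of the table, correspond to very small changes in average pixel error.
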

| DSConv | CBAM-Conv | CBAM-UpConv | ×2 Params (k) | ×2 PSNR (dB) | ×2 SSIM | ×3 Params (k) | ×3 PSNR (dB) | ×3 SSIM | ×4 Params (k) | ×4 PSNR (dB) | ×4 SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 497 | 33.02 | 0.8693 | 569 | 30.21 | 0.7828 | 555 | 28.54 | 0.7188 |
| 1 | | | 781 | 33.06 | 0.8704 | 853 | 30.21 | 0.7831 | 839 | 28.55 | 0.7194 |
| 0 | ✓ | ✓ | 500 | 33.05 | 0.8699 | 572 | 30.21 | 0.7832 | 558 | 28.55 | 0.7196 |
| 1 | ✓ | | 783 | 33.08 | 0.8708 | 855 | 30.23 | 0.7836 | 841 | 28.55 | 0.7195 |
| 1 | ✓ | ✓ | 586 | 33.09 | 0.8710 | 588 | 30.24 | 0.7840 | 594 | 28.56 | 0.7200 |
| 2 | ✓ | ✓ | 620 | 33.10 | 0.8712 | 633 | 30.24 | 0.7842 | 629 | 28.56 | 0.7200 |
| 3 | ✓ | ✓ | 660 | 33.11 | 0.8713 | 668 | 30.25 | 0.7843 | 665 | 28.57 | 0.7203 |
| 4 | ✓ | ✓ | 701 | 33.11 | 0.8714 | 704 | 30.25 | 0.7843 | 700 | 28.58 | 0.7205 |
| 5 | ✓ | ✓ | 730 | 33.11 | 0.8714 | 738 | 30.25 | 0.7843 | 736 | 28.57 | 0.7204 |
| 6 | ✓ | ✓ | 752 | 33.12 | 0.8715 | 769 | 30.26 | 0.7844 | 771 | 28.58 | 0.7207 |
| 7 | ✓ | ✓ | 784 | 33.11 | 0.8714 | 805 | 30.25 | 0.7843 | 807 | 28.57 | 0.7204 |
| 8 | ✓ | ✓ | 813 | 33.10 | 0.8715 | 836 | 30.25 | 0.7844 | 842 | 28.57 | 0.7203 |
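
One way to read the ablation rows is to compare the PSNR gain with the extra parameters each configuration costs. The sketch below does this for the ×4 results of the rows in which both CBAM modules are enabled. It assumes the DSConv column is a count of DSConv units, which this excerpt does not state explicitly, and the helper names `gains_over_baseline` and `smallest_best` are illustrative.

```python
# ×4 ablation rows with both CBAM-Conv and CBAM-UpConv enabled, copied from
# the table above: (DSConv value, parameters in thousands, PSNR in dB).
ABLATION_X4 = [
    (0, 558, 28.55),
    (1, 594, 28.56),
    (2, 629, 28.56),
    (3, 665, 28.57),
    (4, 700, 28.58),
    (5, 736, 28.57),
    (6, 771, 28.58),
    (7, 807, 28.57),
    (8, 842, 28.57),
]


def gains_over_baseline(rows):
    """Return (DSConv value, extra params in k, PSNR gain in dB) relative to the first row."""
    _, base_params, base_psnr = rows[0]
    return [
        (depth, params - base_params, round(psnr - base_psnr, 2))
        for depth, params, psnr in rows[1:]
    ]


def smallest_best(rows):
    """Return the row with the highest PSNR, preferring fewer parameters on ties."""
    best_psnr = max(psnr for _, _, psnr in rows)
    return min((row for row in rows if row[2] == best_psnr), key=lambda row: row[1])


for depth, extra_params, gain in gains_over_baseline(ABLATION_X4):
    print(f"DSConv={depth}: +{extra_params}k params, +{gain:.2f} dB")

print("Smallest configuration with the best x4 PSNR:", smallest_best(ABLATION_X4))
```

Under these assumptions, the smallest configuration that reaches the best ×4 PSNR in the ablation is the DSConv = 4 row (700k parameters, 28.58 dB), which coincides with the ×4 figures listed for SFSIN-S in the comparison table.
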
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).




