Submitted:
01 November 2023
Posted:
01 November 2023
Abstract
Keywords:
1. Introduction

2. Materials and Methods
2.1. Overall structure of CloudformerV3
- Multi-Scale Adapter Incorporation in the Encoder: Introducing a multi-scale adapter within the encoder enhances the synergy between the pretrained natural image-based backbone and the remote sensing image cloud detection process. This collaboration facilitates multi-level feature extraction, allowing the model to gain a more profound understanding of image structure and characteristics that are pertinent to the task;
- Multi-Level Large Window Attention Enhanced Decoder Mechanism: In the decoder stage, a novel approach is adopted involving the interaction of low-resolution feature maps with high-resolution feature maps through large window attention. This process encompasses incremental layer-by-layer upsampling and fusion with higher-level feature maps. As a result, the decoder comprehensively integrates feature information across multiple levels, thereby intensifying the ability to detect cloud edges;
- Integration of Dark and Bright Channel Prior Information: During the data preprocessing phase, the dark channel and bright channel priors are computed and then integrated into the generalized backbone through an adapter. This equips the model with prior feature information that enhances its cloud detection capability. In particular, it helps distinguish thin clouds from the ground surface with greater precision.
2.2. Multi-Scale Adapter
2.3. Multi-Level Large Window Attention and Decoder
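Section 2.1 describes the decoder as letting feature maps interact through large window attention. As a rough illustration only, the NumPy sketch below shows one common reading of that idea: the queries come from a P×P window, while the keys and values come from a larger surrounding window that is average-pooled back down to P×P. The function names, the single attention head, the absence of learned projections, and the choice of which feature map supplies the query and which supplies the context are all assumptions made for this sketch; they are not taken from the paper.

```python
import numpy as np


def avg_pool(x, r):
    """Average-pool an (H, W, C) map by an integer factor r (H, W divisible by r)."""
    h, w, c = x.shape
    return x.reshape(h // r, r, w // r, r, c).mean(axis=(1, 3))


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def large_window_attention(query_patch, context_patch, ratio):
    """Hypothetical helper: attend from a small window to a larger, pooled one.

    query_patch:   (P, P, C) window that supplies the queries.
    context_patch: (ratio*P, ratio*P, C) larger window around it.
    Assumption: no learned Q/K/V projections and a single head.
    """
    p, _, c = query_patch.shape
    context = avg_pool(context_patch, ratio)        # (P, P, C)
    q = query_patch.reshape(p * p, c)               # one query per pixel
    kv = context.reshape(p * p, c)                  # pooled context tokens
    weights = softmax(q @ kv.T / np.sqrt(c))        # (P*P, P*P), rows sum to 1
    return (weights @ kv).reshape(p, p, c)


rng = np.random.default_rng(0)
query = rng.standard_normal((4, 4, 8))
context = rng.standard_normal((8, 8, 8))            # ratio = 2
fused = large_window_attention(query, context, ratio=2)
print(fused.shape)
```

Pooling the context keeps the attention cost tied to the query window size while still exposing each query to a wider neighbourhood, which is the usual motivation for this kind of window enlargement.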
2.4. Dark and Bright Channel Prior Information
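Section 2.1 states that the dark channel and bright channel priors are computed during preprocessing. A minimal sketch of how such priors are commonly computed is given below: the dark channel takes the per-pixel minimum over colour channels followed by a local minimum filter, and the bright channel does the same with maxima. The patch size of 15, the edge padding, the function names, and the final stacking into a two-channel array are assumptions for illustration; the paper's exact settings are not given in this excerpt.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _local_filter(plane, patch, reduce_fn):
    """Apply a patch x patch min or max filter with edge padding (patch must be odd)."""
    pad = patch // 2
    padded = np.pad(plane, pad, mode="edge")
    windows = sliding_window_view(padded, (patch, patch))  # (H, W, patch, patch)
    return reduce_fn(windows, axis=(-2, -1))


def dark_channel(image, patch=15):
    """Minimum over colour channels, then a local minimum over each patch."""
    return _local_filter(image.min(axis=2), patch, np.min)


def bright_channel(image, patch=15):
    """Maximum over colour channels, then a local maximum over each patch."""
    return _local_filter(image.max(axis=2), patch, np.max)


# Assumed usage: an (H, W, 3) image with values in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
prior = np.stack([dark_channel(image), bright_channel(image)], axis=-1)
print(prior.shape)
```

Thin cloud tends to raise the dark channel relative to clear ground, which is one intuition for why these priors may help separate the two.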
3. Results
3.1. Dataset and Evaluation Metrics
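The result tables report MIoU, MAcc, and PAcc. This excerpt does not define them, so the sketch below assumes the definitions commonly used in semantic segmentation: MIoU is the mean per-class intersection over union, MAcc is the mean per-class accuracy, and PAcc is the fraction of correctly labelled pixels. The function names and the toy masks are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=2):
    """Rows are ground-truth classes, columns are predicted classes."""
    idx = num_classes * y_true.ravel() + y_pred.ravel()
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def segmentation_metrics(cm):
    """Return MIoU, MAcc and PAcc in percent (assumed standard definitions).

    Assumes every class occurs at least once in the ground truth.
    """
    tp = np.diag(cm).astype(float)
    gt = cm.sum(axis=1)      # pixels per ground-truth class
    pred = cm.sum(axis=0)    # pixels per predicted class
    iou = tp / (gt + pred - tp)
    acc = tp / gt
    return {
        "MIoU": 100.0 * iou.mean(),
        "MAcc": 100.0 * acc.mean(),
        "PAcc": 100.0 * tp.sum() / cm.sum(),
    }


# Toy masks: 0 = clear, 1 = cloud.
y_true = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
y_pred = np.array([[0, 0, 1, 0], [0, 1, 1, 1]])
metrics = segmentation_metrics(confusion_matrix(y_true, y_pred))
print(metrics)  # approximately MIoU 77.5, MAcc 90.0, PAcc 87.5
```

Because MIoU penalises both missed and spurious cloud pixels for every class, it is usually the strictest of the three figures.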

3.2. Ablation Experiment
3.2.1. Performance Verification of the Multi-Scale Adapter
3.2.2. Performance Validation of Multi-Level Large Window Attention
3.2.3. Performance Validation of Dark Channel and Bright Channel Prior Information
3.3. Comparison with State-of-the-Art Methods
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest







| Encoder | MIoU (%) | MAcc (%) | PAcc (%) |
|---|---|---|---|
| Mix transformer | 91.63 | 94.97 | 96.44 |
| + Multi-Scale Adapter | 92.43 | 95.62 | 96.95 |

| Decoder | MIoU (%) | MAcc (%) | PAcc (%) |
|---|---|---|---|
| Single feature layer | 92.26 | 95.54 | 96.83 |
| Multi-feature layer | 92.89 | 96.04 | 97.12 |

| Input channel | MIoU (%) | MAcc (%) | PAcc (%) |
|---|---|---|---|
| RGB | 92.43 | 95.62 | 96.95 |
| Dark | 92.71 | 95.77 | 97.09 |
| Bright | 92.68 | 95.83 | 97.05 |
| Dark+RGB | 92.33 | 95.48 | 96.91 |
| Bright+RGB | 92.14 | 95.51 | 96.84 |
| Dark+Bright | 92.89 | 96.04 | 97.12 |
| Dark+Bright+RGB | 92.03 | 95.45 | 96.78 |

| Method | MIoU (%) | MAcc (%) | PAcc (%) |
|---|---|---|---|
| SwinTransformer-UperNet | 90.47 | 93.37 | 94.12 |
| Mask2former | 90.89 | 94.69 | 94.89 |
| Segformer | 90.65 | 94.92 | 95.61 |
| Lawin transformer | 90.73 | 94.55 | 95.67 |
| Cloudformer | 91.78 | 94.49 | 95.07 |
| CloudformerV2 | 92.52 | 95.66 | 96.75 |
| CloudformerV3 | 92.89 | 96.04 | 97.12 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).