Submitted: 13 November 2024
Posted: 13 November 2024
Abstract
Keywords:
1. Introduction
2. Methods
2.1. MTGAN Model Structure
2.2. MTGAN Generator
2.2.1. Shared Parameter Encoder
2.2.2. Land-Cover Class Generator

2.2.3. Pixel-Level Fusion Network
2.3. MTGAN Discriminator
2.4. Loss Function
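For reference, the Pix2Pix baseline used in the ablations optimizes the standard conditional-GAN objective of Isola et al. (cited below): an adversarial term plus an L1 reconstruction term. Whether MTGAN's full objective adds further task-specific terms beyond this is not assumed here.

```latex
\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}\big[\log D(x,y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_{1}\big]

G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G,D) + \lambda\,\mathcal{L}_{L1}(G)
```

Here $x$ is the input label map, $y$ the real image, and $\lambda$ weights the reconstruction term.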
3. Results
3.1. Experimental Dataset
3.2. Evaluation Metrics
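The segmentation-based metrics reported in the tables below, overall accuracy (OA) and frequency-weighted IoU (FWIoU), follow their standard confusion-matrix definitions: OA is the fraction of correctly classified pixels, and FWIoU weights each class IoU by its ground-truth pixel frequency. A minimal NumPy sketch of those definitions (function names are ours, not from the paper's code):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Pixel-level confusion matrix; rows = ground truth, columns = prediction."""
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(m, (y_true.ravel(), y_pred.ravel()), 1)
    return m

def overall_accuracy(m):
    """OA: correctly classified pixels over all pixels."""
    return float(np.trace(m) / m.sum())

def fwiou(m):
    """FWIoU: per-class IoU weighted by ground-truth class frequency."""
    diag = np.diag(m)
    denom = m.sum(axis=1) + m.sum(axis=0) - diag
    iou = diag / np.maximum(denom, 1)        # guard against empty classes
    freq = m.sum(axis=1) / m.sum()           # ground-truth frequency per class
    return float((freq * iou).sum())
```

LPIPS and FID, the perceptual metrics in the same tables, are normally computed with the reference implementations released by their authors rather than reimplemented.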
3.3. Implementation Details
3.4. Ablation Experiments
| Model | Description |
|---|---|
| Pix2Pix | Baseline model |
| Pix2Pix++ | … and … are added to the Pix2Pix model |
| DBGAN | Dual-branch generative model |
| DBGAN++ | A shared-parameter encoder is added to DBGAN |
| MTGAN | A spatial decoder is added to DBGAN++ |



3.5. Comparison Experiments
4. Discussion
4.1. Analysis of the Interpretability of Generated Images in Segmentation Models
4.2. The Effect of Adding Generated Samples on Segmentation Accuracy


5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhao, C.; Ogawa, Y.; Chen, S.; Yang, Z.; Sekimoto, Y. Label Freedom: Stable Diffusion for Remote Sensing Image Semantic Segmentation Data Generation. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData); IEEE: 2023; pp. 1022–1030.
- Khanna, S.; Liu, P.; Zhou, L.; Meng, C.; Rombach, R.; Burke, M.; Ermon, S. DiffusionSat: A Generative Foundation Model for Satellite Imagery. arXiv 2023, arXiv:2312.03606.
- Dao, T.; Gu, A.; Ratner, A.; Smith, V.; DeSa, C.; Ré, C. A Kernel Theory of Modern Data Augmentation. In Proceedings of the International Conference on Machine Learning; 2019; pp. 1528–1537.
- Mumuni, A.; Mumuni, F. Data Augmentation: A Comprehensive Survey of Modern Approaches. Array 2022, 16, 100258. [CrossRef]
- Perez, L.; Wang, J. The Effectiveness of Data Augmentation in Image Classification Using Deep Learning. arXiv 2017, arXiv:1712.04621.
- Zhang, H.; Cisse, M.; Dauphin, Y. N.; Lopez-Paz, D. Mixup: Beyond Empirical Risk Minimization. arXiv 2017, arXiv:1710.09412.
- Yun, S.; Han, D.; Oh, S. J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2019; pp. 6023–6032.
- Shi, J.; Ghazzai, H.; Massoud, Y. Differentiable Image Data Augmentation and Its Applications: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024, 46(2), 1148–1164. [CrossRef]
- Ma, D.; Tang, P.; Zhao, L.; Zhang, Z. A Review of Deep Learning Image Data Augmentation Methods. Journal of Image and Graphics 2021, 26(3), 487–502.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; et al. Generative Adversarial Nets. In Proceedings of the Neural Information Processing Systems; MIT Press: 2014.
- Wang, C.; Chen, B.; Zou, Z.; Shi, Z. Remote Sensing Image Synthesis via Semantic Embedding Generative Adversarial Networks. IEEE Transactions on Geoscience and Remote Sensing 2023, 61, 1–11. [CrossRef]
- Kuang, Y.; Ma, F.; Li, F.; Liu, Y.; Zhang, F. Semantic-Layout-Guided Image Synthesis for High-Quality Synthetic-Aperture Radar Detection Sample Generation. Remote Sens. 2023, 15, 5654. [CrossRef]
- Remusati, H.; Le Caillec, J.-M.; Schneider, J.-Y.; Petit-Frère, J.; Merlet, T. Generative Adversarial Networks for SAR Automatic Target Recognition and Classification Models Enhanced Explainability: Perspectives and Challenges. Remote Sens. 2024, 16, 2569. [CrossRef]
- Fu, Q.; Xia, S.; Kang, Y.; Sun, M.; Tan, K. Satellite Remote Sensing Grayscale Image Colorization Based on Denoising Generative Adversarial Network. Remote Sens. 2024, 16, 3644. [CrossRef]
- Rui, X.; Cao, Y.; Yuan, X.; Kang, Y.; Song, W. DisasterGAN: Generative Adversarial Networks for Remote Sensing Disaster Image Generation. Remote Sens. 2021, 13, 4284. [CrossRef]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the International Conference on Machine Learning; 2017; pp. 214–223.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- Kong, X.; Shen, Z.; Chen, S. A GAN-Based Algorithm for Generating Samples of Pedestrians in High-Speed Railway Perimeter Environment. Railway Perimeter Environment 2019, 55(7), 57–61.
- Isola, P.; Zhu, J. Y.; Zhou, T.; et al. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017; pp. 1125–1134. [CrossRef]
- Ouyang, X.; Cheng, Y.; Jiang, Y.; et al. Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond. arXiv 2018. [CrossRef]
- Yang, S. Research on Image Generation and Velocity Estimation Based on Generative Adversarial Networks. Master’s Thesis, Zhejiang University of Technology, 2019.
- Wang, Y.; Wang, H.; Xu, T. Aircraft Recognition of Remote Sensing Image Based on Samples Generated by CGAN. Journal of Image and Graphics 2021, 26(3), 663–673. [CrossRef]
- Jiang, Y.; Zhu, B. Data Augmentation for Remote Sensing Image Based on Generative Adversarial Networks under Condition of Few Samples. Journal Name 2021, 58(8), 238–244.
- Karras, T.; Laine, S.; Aittala, M.; et al. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020; pp. 8110–8119. [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III; 2015; pp. 234–241.
- Wang, T. C.; Liu, M. Y.; Zhu, J. Y.; et al. High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018; pp. 8798–8807. [CrossRef]
- Park, T.; Liu, M. Y.; Wang, T. C.; et al. Semantic Image Synthesis with Spatially-Adaptive Normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019; pp. 2337–2346.
- Tang, H.; Bai, S.; Sebe, N. Dual Attention GANs for Semantic Image Synthesis. In Proceedings of the 28th ACM International Conference on Multimedia; 2020; pp. 1994–2002. [CrossRef]
- Tang, H.; Sebe, N. Layout-to-Image Translation with Double Pooling Generative Adversarial Networks. IEEE Transactions on Image Processing 2021, 30, 7903–7913. [CrossRef]
- Tang, H.; Xu, D.; Yan, Y.; et al. Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020; pp. 7870–7879. [CrossRef]
- Li, Y.; Li, Y.; Lu, J.; et al. Collaging Class-Specific GANs for Semantic Image Synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2021; pp. 14418–14427. [CrossRef]
- Zhu, P.; Abdal, R.; Qin, Y.; et al. SEAN: Image Synthesis with Semantic Region-Adaptive Normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2020; pp. 5104–5113.
- Tan, Z.; Chen, D.; Chu, Q.; et al. Efficient Semantic Image Synthesis via Class-Adaptive Normalization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2021, 44(9), 4852–4866. [CrossRef]
- Berrada, T.; Verbeek, J.; Couprie, C.; Alahari, K. Unlocking Pre-Trained Image Backbones for Semantic Image Synthesis. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Seattle, WA, USA, 2024; pp. 7840–7849. [CrossRef]
- Caesar, H.; Uijlings, J.; Ferrari, V. COCO-Stuff: Thing and Stuff Classes in Context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2018; pp. 1209–1218.
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016; pp. 3213–3223.
- Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene Parsing through ADE20K Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017; pp. 633–641.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems; 2017; vol. 30, pp. 1–12.
- Tong, X.-Y.; et al. Land-Cover Classification with High-Resolution Remote Sensing Images Using Transferable Deep Models. Remote Sensing of Environment 2020, 237, Art. no. 111322. [CrossRef]
- Zhang, R.; Isola, P.; Efros, A. A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; Jun. 2018; pp. 586–595.
- Zhu, J.; et al. Label-Guided Generative Adversarial Network for Realistic Image Synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45(3), 3311–3328. [CrossRef]
- Zhou, B.; Khosla, A.; Lapedriza, A.; et al. Learning Deep Features for Discriminative Localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2016; pp. 2921–2929.
- Selvaraju, R. R.; Cogswell, M.; Das, A.; et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. In Proceedings of the IEEE International Conference on Computer Vision; 2017; pp. 618–626.
- Vinogradova, K.; Dibrov, A.; Myers, G. Towards Interpretable Semantic Segmentation via Gradient Weighted Class Activation Mapping (Student Abstract). In Proceedings of the AAAI Conference on Artificial Intelligence; 2020; pp. 13943–13944.








| Methods | Chongzhou LPIPS↓ | Chongzhou FID↓ | Chongzhou FWIoU(%)↑ | Chongzhou OA(%)↑ | Wuzhen LPIPS↓ | Wuzhen FID↓ | Wuzhen FWIoU(%)↑ | Wuzhen OA(%)↑ |
|---|---|---|---|---|---|---|---|---|
| Pix2Pix | 0.6080 | 176.28 | 49.56 | 62.50 | 0.6270 | 225.25 | 57.22 | 71.18 |
| Pix2Pix++ | 0.6008 | 159.23 | 53.26 | 66.02 | 0.6087 | 213.08 | 59.34 | 72.57 |
| GLGAN | 0.6132 | 169.66 | 54.86 | 67.69 | 0.6273 | 236.80 | 61.63 | 74.20 |
| GLGAN++ | 0.5961 | 154.27 | 54.50 | 67.59 | 0.5983 | 179.22 | 59.63 | 73.47 |
| MTGAN | 0.5783 | 123.42 | 60.80 | 72.88 | 0.5551 | 137.96 | 65.98 | 77.35 |
| Methods | Chongzhou LPIPS↓ | Chongzhou FID↓ | Chongzhou FWIoU(%)↑ | Chongzhou PA(%)↑ | Wuzhen LPIPS↓ | Wuzhen FID↓ | Wuzhen FWIoU(%)↑ | Wuzhen PA(%)↑ |
|---|---|---|---|---|---|---|---|---|
| Pix2PixHD | 0.6007 | 145.44 | 57.60 | 69.10 | 0.5707 | 173.17 | 62.03 | 74.55 |
| GauGAN | 0.6008 | 192.04 | 53.26 | 66.02 | 0.5964 | 166.33 | 61.52 | 74.33 |
| DAGAN | 0.5901 | 170.19 | 57.85 | 69.52 | 0.5670 | 150.34 | 62.29 | 74.75 |
| DPGAN | 0.6147 | 167.84 | 56.12 | 69.50 | 0.5917 | 153.51 | 57.60 | 70.72 |
| Lab2Pix-V2 | 0.5844 | 137.88 | 57.51 | 69.90 | 0.5590 | 148.15 | 63.43 | 75.91 |
| MTGAN (ours) | 0.5783 | 123.42 | 60.80 | 72.88 | 0.5551 | 137.96 | 65.98 | 77.35 |
| Region | Dataset | Training set | Test set |
|---|---|---|---|
| Chongzhou | Dataset 1.0 | 845 | 211 |
| Chongzhou | Dataset 0.8 | 676 | 169 |
| Chongzhou | Dataset 0.6 | 507 | 127 |
| Chongzhou | Dataset 0.4 | 338 | 84 |
| Wuzhen | Dataset 1.0 | 706 | 176 |
| Wuzhen | Dataset 0.9 | 634 | 158 |
| Wuzhen | Dataset 0.7 | 483 | 123 |
| Wuzhen | Dataset 0.5 | 352 | 88 |
| Dataset | +0 FWIoU(%) | +0 OA(%) | +500 FWIoU(%) | +500 OA(%) | +1000 FWIoU(%) | +1000 OA(%) | +2000 FWIoU(%) | +2000 OA(%) |
|---|---|---|---|---|---|---|---|---|
| Dataset 1.0 | 67.64 | 78.44 | 68.66 | 79.22 | 69.22 | 79.59 | 68.03 | 78.53 |
| Dataset 0.8 | 66.64 | 77.67 | 68.87 | 79.23 | 68.73 | 79.12 | 67.05 | 77.91 |
| Dataset 0.6 | 64.33 | 75.84 | 68.22 | 78.91 | 68.49 | 78.86 | 68.46 | 78.82 |
| Dataset 0.4 | 64.50 | 76.04 | 64.78 | 75.56 | 65.85 | 76.61 | 65.04 | 75.84 |
| Dataset | +0 FWIoU(%) | +0 OA(%) | +500 FWIoU(%) | +500 OA(%) | +1000 FWIoU(%) | +1000 OA(%) | +1500 FWIoU(%) | +1500 OA(%) |
|---|---|---|---|---|---|---|---|---|
| Dataset 1.0 | 70.76 | 80.84 | 73.57 | 83.05 | 73.60 | 83.23 | 73.36 | 82.91 |
| Dataset 0.9 | 69.78 | 79.93 | 72.32 | 82.17 | 72.26 | 81.94 | 71.75 | 81.93 |
| Dataset 0.7 | 68.38 | 79.15 | 72.85 | 82.38 | 72.72 | 82.14 | 72.13 | 81.90 |
| Dataset 0.5 | 67.87 | 78.72 | 69.17 | 80.32 | 69.17 | 80.63 | 69.35 | 80.24 |
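The +500/+1000/… columns above correspond to appending that many generated image–label pairs to the real training set before retraining the segmentation model. A minimal sketch of that augmentation step (function and variable names are hypothetical, not taken from the paper's code):

```python
import random

def augment_training_set(real_samples, generated_samples, n_extra, seed=0):
    """Return the real training set plus n_extra randomly chosen generated
    image-label pairs, mirroring the +N columns in the tables above."""
    rng = random.Random(seed)  # fixed seed so the augmented set is reproducible
    extra = rng.sample(generated_samples, k=min(n_extra, len(generated_samples)))
    return list(real_samples) + extra
```

The real samples are kept in full; only the number of appended generated samples varies, which is what the tables sweep.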
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).