Submitted: 12 September 2024
Posted: 13 September 2024
Abstract
Keywords:
1. Introduction
2. Related Works and Datasets
2.1. Sparsity in Image Classification
2.2. Sparsity for Vision-Based Geometry
2.3. Datasets
2.3.1. Change Detection
- OSCD [21] is a set of 24 pairs of Sentinel-2 image crops, each with a binary change mask. In this paper only the RGB bands are used, giving 24 image pairs of roughly 400x400 pixels. The annotated changes are mostly land-use changes, e.g., a new road or building.
- LEVIR-CD [22] is a set of pairs of very-high-resolution images extracted from Google Earth, and therefore natively RGB. They are annotated with change masks focusing on urban change (mostly buildings).
- S2Looking [23] is a Sentinel-2 image dataset like OSCD (as with OSCD, only the RGB bands are used in this paper) but at a much larger scale: it contains 5000 pairs of 1024x1024 images annotated with change masks.
2.3.2. Synthetic Registration
3. Methodology
3.1. Overview
3.2. Greedy Strategy
- computing
- computing
- sorting decreasingly according to
- selecting the top-K
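The last two steps above (sorting decreasingly, then keeping the top-K) can be sketched as follows. The quantity being scored is not recoverable from this text, so `scores` and the function name `select_top_k` are hypothetical placeholders; this is a minimal illustration of the selection step only, not the paper's full greedy procedure.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, highest score first.

    `scores` stands in for whatever per-item quantity the method ranks;
    that quantity is an assumption here, not taken from the paper.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort indices by decreasing score. Negating the key keeps the sort
    # stable, so tied items stay in their original order.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Keep only the first k indices (fewer if there are fewer items).
    return order[:k]


if __name__ == "__main__":
    print(select_top_k([0.1, 0.9, 0.5, 0.7], k=2))
```

A full sort costs O(n log n); when K is much smaller than the number of candidates, `heapq.nlargest` from the standard library is a common alternative.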
Algorithm 1: Compute_metric
3.3. Loss
3.4. Implementation Details
- For registration, the task is homography matrix regression. For this purpose, an encoder uses the difference of features from the pair of images to predict the homography matrix. Precisely, if is the feature map of image 1 with dimension and the one of image 2, then the network computes for in 9 independent channels; those 9 channels are then normalized with resulting in a 9-channel map that is normalized and independent of the scale of . The encoder which predicts mainly considers those channels ( over and over ).
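As background for the regression target, a homography is a 3x3 matrix acting on pixel coordinates in homogeneous form, and it is defined only up to a global scale. The sketch below illustrates that general fact; it is not the paper's network or feature pipeline, and the helper names `apply_homography` and `normalize_homography` are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]
Point = Tuple[float, float]


def apply_homography(h: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography using homogeneous coordinates."""
    mapped = []
    for x, y in points:
        u = h[0][0] * x + h[0][1] * y + h[0][2]
        v = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if w == 0:
            raise ValueError("point is mapped to infinity")
        # Divide by the third coordinate to return to pixel coordinates.
        mapped.append((u / w, v / w))
    return mapped


def normalize_homography(h: Matrix3) -> List[List[float]]:
    """Rescale the matrix so that its bottom-right entry equals 1."""
    scale = h[2][2]
    if scale == 0:
        raise ValueError("cannot normalize: h[2][2] is zero")
    return [[value / scale for value in row] for row in h]


if __name__ == "__main__":
    shift = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]
    print(apply_homography(shift, [(1.0, 1.0)]))
```

Because multiplying every entry by the same non-zero constant leaves the mapping unchanged, a regressed 9-value output carries only 8 degrees of freedom, which is why a scale normalization is commonly applied before comparing a prediction with a reference matrix.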

4. Experiments
4.1. Registration
4.2. Change Detection
4.3. Auxiliary Results
4.3.1. EuroSAT
4.3.2. ImageNet
5. Conclusion
5.1. Summary
5.2. Limits
Funding
Data Availability Statement
Conflicts of Interest
References
1. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. International conference on machine learning. PMLR, 2020, pp. 1597–1607.
2. Oquab, M.; Darcet, T.; Moutakanni, T.; Vo, H.; Szafraniec, M.; Khalidov, V.; Fernandez, P.; Haziza, D.; Massa, F.; El-Nouby, A.; et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193 2023.
3. Bardes, A.; Garrido, Q.; Ponce, J.; Chen, X.; Rabbat, M.; LeCun, Y.; Assran, M.; Ballas, N. V-jepa: Latent video prediction for visual representation learning. arXiv 2023.
4. Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 2023.
5. Ravi, N.; Gabeur, V.; Hu, Y.T.; Hu, R.; Ryali, C.; Ma, T.; Khedr, H.; Rädle, R.; Rolland, C.; Gustafson, L.; et al. Sam 2: Segment anything in images and videos. arXiv preprint arXiv:2408.00714 2024.
6. Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning filters for efficient convnets. arXiv preprint arXiv:1608.08710 2016.
7. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 2019.
8. Hu, Q.; Wang, P.; Cheng, J. From hashing to cnns: Training binary weight networks via hashing. Proceedings of the AAAI Conference on Artificial Intelligence, 2018, Vol. 32.
9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 2012, 25.
10. Zhang, L.; Chen, C.; Bu, J.; Chen, Z.; Tan, S.; He, X. Discriminative codeword selection for image representation. Proceedings of the 18th ACM international conference on Multimedia, 2010, pp. 173–182.
11. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
12. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A convnet for the 2020s. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11976–11986.
13. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 2020.
14. Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: part I. IEEE robotics & automation magazine 2006, 13, 99–110.
15. Labatut, P.; Pons, J.P.; Keriven, R. Efficient multi-view reconstruction of large-scale scenes using interest points, delaunay triangulation and graph cuts. 2007 IEEE 11th international conference on computer vision. IEEE, 2007, pp. 1–8.
16. Fitzpatrick, J.M.; Hill, D.L.; Maurer, C.R.; et al. Image registration. Handbook of medical imaging 2000, 2, 447–513.
17. Avetisyan, A.; Xie, C.; Howard-Jenkins, H.; Yang, T.Y.; Aroudj, S.; Patra, S.; Zhang, F.; Frost, D.; Holland, L.; Orme, C.; et al. SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model. arXiv preprint arXiv:2403.13064 2024.
18. DeTone, D.; Malisiewicz, T.; Rabinovich, A. Superpoint: Self-supervised interest point detection and description. Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2018, pp. 224–236.
19. Helber, P.; Bischke, B.; Dengel, A.; Borth, D. Introducing EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium. IEEE, 2018, pp. 204–207.
20. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255.
21. Caye Daudt, R.; Le Saux, B.; Boulch, A.; Gousseau, Y. Urban Change Detection for Multispectral Earth Observation Using Convolutional Neural Networks. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2018.
22. Chen, H.; Shi, Z. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sensing 2020, 12, 1662.
23. Shen, L.; Lu, Y.; Chen, H.; Wei, H.; Xie, D.; Yue, J.; Chen, R.; Lv, S.; Jiang, B. S2Looking: A satellite side-looking dataset for building change detection. Remote Sensing 2021, 13, 5094.
24. Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 2020.
25. Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE transactions on pattern analysis and machine intelligence 2017, 40, 834–848.
26. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 10012–10022.


| % | OSCD | LEVIR | S2Looking |
|---|---|---|---|
| Baseline | 0.02642196 | 0.009819713 | 0.00814459 |
| Baseline + rarity loss | 0.024850774 | 0.00957441 | 0.0076796 |
| Baseline V2 | 0.025716013 | 0.009981078 | 0.0096835742 |
| Baseline V2 + rarity loss | 0.023570214 | 0.009831453 | 0.009405145 |

| % | OSCD mIoU | OSCD IoU1 | LEVIR mIoU | LEVIR IoU1 |
|---|---|---|---|---|
| Baseline | 57.39 | 22.35 | 53.13 | 12.38 |
| Baseline + rarity loss | 58.38 | 25.38 | 55.47 | 17.28 |
| Baseline V2 | 57.54 | 22.21 | 54.19 | 14.61 |
| Baseline V2 + rarity loss | 59.07 | 25.31 | 54.60 | 16.15 |
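
The mIoU and IoU1 columns in the table above can be related to binary change masks with the sketch below. It assumes that IoU1 denotes the intersection-over-union of the "change" class (label 1) and that mIoU is the mean of the IoU over the two classes; the text does not spell out these definitions, so treat both as assumptions. The function name `change_iou` is illustrative.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]


def change_iou(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (mIoU, IoU1) in percent for two binary masks of equal shape.

    Masks are given as rows of 0/1 labels, where 1 marks a changed pixel.
    """
    inter = [0, 0]
    union = [0, 0]
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            for cls in (0, 1):
                p_hit, t_hit = p == cls, t == cls
                inter[cls] += int(p_hit and t_hit)
                union[cls] += int(p_hit or t_hit)
    # Assumed convention: a class absent from both masks scores 100.
    ious = [100.0 * inter[c] / union[c] if union[c] else 100.0 for c in (0, 1)]
    return sum(ious) / 2, ious[1]


if __name__ == "__main__":
    prediction = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    miou, iou1 = change_iou(prediction, reference)
    print(f"mIoU={miou:.2f} IoU1={iou1:.2f}")
```

Since changed pixels are usually rare in these datasets, the "no change" class tends to dominate mIoU, which is one reason the change-class IoU is often reported alongside it.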

| % | EfficientNet | EfficientNetV2 |
|---|---|---|
| Baseline | 77.54 | 72.14 |
| Baseline + rarity loss | 76.51 | 71.02 |

| Encoder | LEVIR | S2Looking | ImageNet |
|---|---|---|---|
| EfficientNet | 0.6869364 | 0.6563601 | 0.797317505 |
| EfficientNetV2 | 0.6720152 | 0.64909315 | 0.7725842 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).