Preprint
Article

This version is not peer-reviewed.

Reconstructing a Century of Urban Growth through Deep Learning-Based Colorization and Segmentation of Historical Aerial and Satellite Imagery: Les Sables-d’Olonne, France (1920–2024)

A peer-reviewed version of this preprint was published in:
Remote Sensing 2026, 18(10), 1517. https://doi.org/10.3390/rs18101517

Submitted: 13 April 2026

Posted: 14 April 2026


Abstract
Coastal urbanization is increasingly constrained by legacy land-use patterns and escalating climate risks, yet long-term morphological trajectories remain poorly quantified due to the absence of multispectral data in pre-satellite archives. This study presents a scalable deep learning pipeline that bridges this century-scale domain gap, enabling automated reconstruction of urban expansion from panchromatic historical aerial imagery (1920–1971) and digital aerial photographs (1997) to contemporary very-high-resolution satellite data (2024) in Les Sables-d’Olonne, France. Spectral restoration was performed using an attention-enhanced Pix2Pix generative adversarial network with hybrid inference, achieving high fidelity (PSNR 35.21 dB, SSIM 0.9762). Semantic segmentation was conducted with U-Net++, yielding strong performance on modern data (mIoU 0.9789). However, direct transfer to historical periods suffered from severe domain shift due to radiometric variations. To overcome this limitation without extensive manual annotation, few-shot adaptation was applied on year-specific calibration sets, producing reliable building footprints (mIoU 0.53–0.65) despite degradation. Multi-scalar analysis of the reconstructed footprints revealed constrained anisotropic expansion: early saturation of the coastal historic core, followed by rapid inland peri-urbanization post-1971 driven by geographic barriers. This spatiotemporal shift has entrenched spatial lock-in, placing recent development in retro-littoral zones vulnerable to submersion and characterized by severe vegetation loss. This framework unlocks previously inaccessible historical archives for quantitative urban monitoring, providing critical insights into legacy effects of unconstrained growth and informing resilient coastal planning under climate change.

1. Introduction

The twenty-first century is marked by a profound demographic shift toward urbanization, with approximately 55% of the global population residing in urban areas in 2018, a figure projected to rise to 68% by 2050, adding another 2.5 billion individuals to city environments [1,2]. This rapid expansion poses a sustainability paradox: cities serve as engines of economic growth but can entrench land-use patterns that exacerbate energy consumption, biodiversity loss, and exposure to climate risks [3,4]. This “spatial lock-in” is particularly acute in coastal zones, where historical urbanization often predates modern understanding of sea-level rise and erosion dynamics, heightening long-term vulnerabilities [5,6]. Consequently, understanding the long-term morphological trajectories of the past is essential for planning resilient future cities. However, the consistent observation of urban dynamics is currently constrained by the timeline of the satellite era. Programs like Landsat and Sentinel provide reliable multispectral data only from the 1970s onward, leaving early-to-mid-20th-century industrial expansions largely unmonitored [7,8].
Historical aerial photography archives provide the only viable solution to this temporal blind spot. National repositories, such as those of the French Institut National de l’Information Géographique et Forestière (IGN), contain high-resolution surveys dating back to the post-WWI era. These datasets represent an untapped resource for analyzing century-scale urban evolution [1,9,10]. However, unlocking quantitative insights from these archives remains challenging due to the “domain gap”: historical images are typically single-band panchromatic (grayscale), with heterogeneous radiometry and lacking the spectral information required by modern computer vision algorithms [11,12]. In grayscale imagery, spectral confusion between impervious surfaces (e.g., roofs, roads) and bare soil significantly reduces the accuracy of semantic segmentation models that rely on color cues [13,14].
Despite recent progress in domain adaptation and colorization techniques, no fully automated pipeline has yet successfully mapped urban expansion over an entire century while simultaneously handling extreme resolution disparities, severe radiometric heterogeneity, and the complete absence of spectral bands in historical aerial photographs.
To address these limitations without prohibitive manual annotation, the field has increasingly turned to unsupervised domain adaptation and generative techniques. Recent advancements (2022–2026) have established Generative Adversarial Networks (GANs) as powerful tools for bridging historical and modern data domains [15,16,17]. In particular, “restoration-then-segmentation” pipelines using attention-based Pix2Pix GANs can effectively “hallucinate” missing spectral bands, improving PSNR and enabling segmentation gains of over 10% in mAP on tasks such as rooftop and impervious surface detection [14,18]. Simou et al. [19] successfully applied this strategy to colorize archived grayscale satellite imagery, achieving robust land-cover classification in urban environments. Large-scale datasets such as FLAIR have further accelerated progress by providing billions of annotated pixels for training modern segmentation architectures (e.g., U-Net++, SegFormer), which perform significantly better on colorized historical data than on raw grayscale inputs [20].
Building on these methodological foundations, this study proposes a scalable end-to-end pipeline for mapping urban expansion from 1920 to 2024 in the coastal region of Vendée, France, with a focus on Les Sables-d’Olonne, an area characterized by intense post-war tourism development and high exposure to erosion and marine submersion [21,22]. The contributions of this work are threefold:
1. Methodological: extension of GAN-based restoration and modern segmentation workflows across a 104-year timeline (1920–2024), effectively bridging severe resolution disparities and unlocking the previously inaccessible “temporal blind spot” of the post-war era,
2. Analytical: development of a multi-scalar morphological framework that quantifies neighborhood-level densification trajectories, enabling precise characterization of urban saturation and spatiotemporal shifts,
3. Empirical: delivery of the first continuous century-scale reconstruction of Les Sables-d’Olonne’s urban fabric, revealing a critical transition from isotropic historic growth to transport-driven peri-urbanization that predates contemporary coastal protection policies.

2. Materials and Methods

2.1. Study Area

The study focuses on the coastal municipality of Les Sables-d’Olonne (46°30′N, 1°47′W), located in the Vendée department of western France (Figure 1). This site was selected for its unique combination of three critical attributes rarely observed together:
(i) one of the fastest post-war urban expansions on the French Atlantic coast (permanent population rose from ≈18,000 in 1954 to 48,402 in 2021, with seasonal peaks exceeding 100,000 inhabitants [23,24]);
(ii) extreme vulnerability to erosion and marine submersion on a low-elevation sandy barrier (maximum altitude <10 m), exemplified by the devastating impacts of Storm Xynthia in 2010 [21];
(iii) an exceptionally rich historical aerial archive from the French Institut National de l’Information Géographique et Forestière (IGN), spanning high-resolution surveys from 1920 to 2023 [9].
Morphologically, the area exhibits a distinctive structure shaped by its geography. The historic fishing quarter of La Chaume, on the western side of the harbor channel, features dense, irregular street patterns rooted in medieval origins. In contrast, the eastern side encompasses the modern resort area along Le Remblai promenade and extensive post-war suburban developments. Urban expansion is strongly constrained by natural barriers: the Atlantic Ocean to the southwest, and protected marshes (Marais d’Olonne) and forests to the north and east. These constraints have directed growth primarily inland toward retro-littoral agricultural lands, resulting in pronounced anisotropic urban development, which makes the site particularly suitable for analyzing spatiotemporal morphological shifts.
The analysis extent covers the most important neighborhoods of Les Sables-d’Olonne as shown in Figure 1.

2.2. Workflow

This study follows a structured workflow (Figure 2).

2.3. Data Collection

The study relied on two main data sources: a primary multi-temporal aerial and satellite survey archive for the historical analysis of Les Sables-d’Olonne, and the FLAIR benchmark dataset for model training.

2.3.1. Multi-Temporal Aerial and Satellite Imagery

The core dataset consists of declassified aerial survey campaigns downloaded from the Remonter le Temps platform, managed by the French Institut National de l’Information Géographique et Forestière (IGN).
The earliest dataset for Les Sables-d’Olonne consists of a mosaic of eight scenes from the 1920 mission CN20000181. To ensure full coverage of the study area, the following specific Entity IDs were selected: ...0040, ...0041, ...0006, ...0007, ...0008, ...0009, ...0009bis, and ...0013 (prefixed by IGNF_PVA_1-0__1920...).
The immediate post-war period is represented by a single scene acquired on 13 July 1945 (Mission FRANCESUD-OUEST7132, Entity ID ...0003), at 0.8 m resolution.
The mid-20th-century dataset comprises a scene acquired on 03 September 1971 (Mission 1971_F1227, Entity ID ...0040), at 0.4 m resolution.
The late 20th-century dataset is derived from the 1997 digital aerial campaign (Mission 1997_FD85), consisting of a mosaic of four scenes acquired on 31 May 1997 (Entity IDs ...0660, ...0658, ...0483, ...0485), at 0.5 m resolution.
Finally, contemporary reference data were obtained from Very High Resolution (VHR) satellite imagery via Google Earth (Maxar Technologies), captured in 2024 with sub-metric resolution (≈0.30 m), serving as ground truth for validation (Table 1).

2.3.2. FLAIR Training Dataset

The FLAIR dataset [20], a large-scale French aerial benchmark comprising over 20,000 annotated patches at 0.2–0.5 m resolution with 19 land-cover classes (including buildings, roads, vegetation, and water), was used for both colorization and segmentation model training. Its high-resolution RGB imagery enabled the creation of paired grayscale-RGB samples for supervised colorization training, while its detailed land-cover annotations (particularly building footprints) provided supervised labels for segmentation pre-training.

2.4. Data Preparation and Harmonization

A comprehensive harmonization framework was implemented to transform the raw heterogeneous archive into a structured dataset suitable for deep learning. This pipeline addressed geometric, radiometric, and spectral disparities between analog film scans (1920) and digital ground truth (2024), comprising general pre-processing, task-specific preparation for spectral restoration (colorization), post-inference refinement, and preparation for semantic segmentation.

2.4.1. General Pre-processing and Geometric Correction

All inputs underwent standardized geometric correction to ensure spatial consistency across epochs.
A manual georeferencing process was conducted using QGIS, with the 2024 Google Earth VHR imagery serving as the reference baseline. For each historical scene (1920–1997), a minimum of 30 Ground Control Points (GCPs) were identified on immutable urban features (e.g., historical stone masonry and port infrastructure). A Polynomial Order 3 transformation was applied to correct non-linear distortions arising from early camera optics and film aging.
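For reference, a third-order polynomial transformation is commonly written in the generic form below. This is the textbook formulation, not a transcription of the authors' QGIS settings, and the coefficient symbols $a_{ij}$ and $b_{ij}$ are notation introduced here for illustration; $(x, y)$ are source image coordinates and $(X, Y)$ are reference map coordinates.

```latex
X = \sum_{i + j \le 3} a_{ij}\, x^{i} y^{j}, \qquad
Y = \sum_{i + j \le 3} b_{ij}\, x^{i} y^{j}
```

Each sum contains 10 terms, so the fit has 10 unknown coefficients per axis and needs at least 10 GCPs. Using 30 or more GCPs per scene therefore leaves substantial redundancy for a least-squares estimate.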
To maintain consistency across the century-scale timeline and prevent model overfitting to high-frequency details available only in modern data, all datasets including training samples were resampled to a uniform Ground Sampling Distance (GSD) of 1.0 m using bicubic interpolation. This "lowest common denominator" strategy ensured that learned features remained robust and detectable throughout the archive.
The co-registered mosaics were partitioned into 512×512 pixel patches with a stride of 384 pixels (25% overlap). A 2D Gaussian weight matrix was subsequently applied during reconstruction to eliminate edge artifacts and ensure seamless stitching between adjacent tiles.
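The tiling and blending step can be sketched as follows. This is a minimal standard-library illustration, assuming tiles are nested lists of floats; the function names, the border-handling rule (one extra tile flush with the far edge), and the Gaussian width `sigma_frac` are hypothetical choices that the text does not specify.

```python
import math

def tile_origins(length, tile=512, stride=384):
    """Offsets along one axis so that tiles of size `tile` cover `length` pixels."""
    if length <= tile:
        return [0]  # a smaller image would need padding, not handled here
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)  # extra tile flush with the far border
    return origins

def gaussian_weights(tile=512, sigma_frac=0.25):
    """Separable 2D Gaussian weight matrix peaking at the tile centre."""
    centre = (tile - 1) / 2.0
    sigma = sigma_frac * tile  # assumed width, not taken from the paper
    g = [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2)) for i in range(tile)]
    return [[gy * gx for gx in g] for gy in g]

def blend(tiles, height, width, tile, sigma_frac=0.25):
    """Weighted average of overlapping tile predictions.

    `tiles` maps (row, col) origins to tile x tile nested lists.
    """
    weights = gaussian_weights(tile, sigma_frac)
    acc = [[0.0] * width for _ in range(height)]
    norm = [[0.0] * width for _ in range(height)]
    for (r0, c0), pred in tiles.items():
        for r in range(tile):
            for c in range(tile):
                acc[r0 + r][c0 + c] += weights[r][c] * pred[r][c]
                norm[r0 + r][c0 + c] += weights[r][c]
    return [[acc[r][c] / norm[r][c] for c in range(width)] for r in range(height)]
```

With a 512-pixel tile and a 384-pixel stride, neighbouring tiles share 128 pixels, which is the 25% overlap quoted above. Because the weights fall off toward tile borders, pixels in the overlap are dominated by whichever tile sees them closer to its centre.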

2.4.2. Preparation for Spectral Restoration (Colorization)

Direct supervised training on historical grayscale imagery was not feasible due to the absence of corresponding RGB ground truth. A synthetic paired dataset was therefore constructed for the Att-Pix2Pix generator, following the method of our previous paper [19].
The FLAIR Vendée subset [20] originally comprised 1,750 samples at higher resolution. After downsampling these samples to 1.0 m resolution to match the information density of the 1920 baseline, a subset of 217 tiles remained viable and representative, maximizing morphological diversity while ensuring computational feasibility.
Synthetic grayscale inputs ($I_{\text{gray}}$) were generated from the RGB targets using luminance-weighted conversion with additive stochastic noise to replicate film grain characteristics:
$$I_{\text{gray}} = 0.299R + 0.587G + 0.114B + \mathcal{N}(0, \sigma),$$
where $\sigma \approx 0.02$ was empirically calibrated to match observed noise in the 1920–1945 scans.
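A minimal sketch of this degradation step is given below. It assumes pixel values scaled to [0, 1] (so that a noise level of 0.02 is relative to full range) and clips the result back into that range; both choices, along with the function name `synth_gray`, are assumptions made here rather than details stated in the text.

```python
import random

def synth_gray(rgb, sigma=0.02, rng=None):
    """Noisy luminance image from an RGB tile.

    `rgb` is a nested list of (R, G, B) tuples with values in [0, 1].
    """
    rng = rng or random.Random(0)  # fixed seed for reproducible examples
    gray = []
    for row in rgb:
        gray.append([
            # luminance weights from the text, plus zero-mean Gaussian noise
            min(1.0, max(0.0, 0.299 * r + 0.587 * g + 0.114 * b
                         + rng.gauss(0.0, sigma)))
            for (r, g, b) in row
        ])
    return gray
```

Setting `sigma=0.0` recovers the plain luminance conversion, which is a convenient way to check the weights in isolation.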
A custom statistical normalization was applied prior to model ingestion:
$$x_{\text{norm}} = \frac{x - \mu}{\sigma} \times 50 + 127,$$
where $\mu$ and $\sigma$ denote the mean and standard deviation of the current tile. This transformation stabilized the input distribution and prevented gradient instability.
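The per-tile normalization can be illustrated with the short sketch below. The small `eps` guard against constant tiles is an assumption added here, and the function and parameter names are hypothetical.

```python
import statistics

def normalize_tile(tile, target_std=50.0, target_mean=127.0, eps=1e-8):
    """Shift and scale a tile to a fixed mean and standard deviation."""
    flat = [v for row in tile for v in row]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)  # population standard deviation of the tile
    return [[(v - mu) / (sigma + eps) * target_std + target_mean for v in row]
            for row in tile]
```

After this step every tile has a mean near 127 and a standard deviation near 50, regardless of how dark or washed-out the original scan was.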
To mitigate overfitting on the limited corpus (217 pairs), aggressive on-the-fly data augmentation was employed, including horizontal/vertical flips and random rotations (90°/180°), effectively expanding the training manifold by a factor of 8×.
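The factor of 8 corresponds to the eight distinct orientations obtainable by combining flips with right-angle rotations. The sketch below enumerates them for a small nested-list tile; it is an illustration of that counting argument, not the Albumentations configuration used in training, and the function names are hypothetical.

```python
def rot90(m):
    """Rotate a 2D nested list by 90 degrees clockwise."""
    return [list(row) for row in zip(*m[::-1])]

def hflip(m):
    """Mirror a 2D nested list left to right."""
    return [row[::-1] for row in m]

def dihedral_variants(m):
    """All eight flip/rotation variants of a square tile."""
    variants = []
    current = [list(row) for row in m]
    for _ in range(4):
        variants.append(current)
        variants.append(hflip(current))
        current = rot90(current)
    return variants
```

A vertical flip does not add new cases because it equals a horizontal flip followed by a 180° rotation.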

2.4.3. Post-Inference Refinement

GAN-generated outputs were refined to suppress spectral artifacts while preserving structural sharpness:
The generator was constrained to predict only the chromatic channels ($a^*$, $b^*$) in CIELAB space. The original grayscale input was retained as the lightness channel ($L^*$) to preserve high-frequency structural details and film grain texture.
A median blur filter (kernel size 7) was applied exclusively to the predicted $a^*$ and $b^*$ channels, effectively removing high-frequency color artifacts ("color dots") without degrading geometric edges.
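The refinement logic can be sketched in plain Python as follows. This is an illustrative stand-in for a library median filter, assuming channels are nested lists and borders are handled by replicating edge pixels; the names `median_blur` and `hybrid_lab` are hypothetical, and the conversion from CIELAB back to RGB is omitted.

```python
import statistics

def median_blur(channel, k=7):
    """k x k median filter with replicated borders."""
    h, w = len(channel), len(channel[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                channel[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = statistics.median(window)
    return out

def hybrid_lab(l_input, a_pred, b_pred, k=7):
    """Keep the input lightness untouched and smooth only predicted chroma."""
    return l_input, median_blur(a_pred, k), median_blur(b_pred, k)
```

A median filter suits this purpose because an isolated outlier never reaches the middle of the sorted window, whereas a straight step edge between two flat regions passes through unchanged.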

2.4.4. Preparation for Segmentation

The FLAIR dataset’s 19-class land-cover annotations were adapted to align with the study’s focus on urban morphology.
The multi-class labels were aggregated into a binary schema: the original "building" class was mapped to foreground (1), while all remaining classes (roads, vegetation, water, impervious surfaces, etc.) were collapsed into background (0).
Synchronized data augmentation was applied during segmentation training. Geometric transformations (flips, rotations) were performed identically on both pseudo-RGB inputs and binary masks to maintain label alignment.

2.5. Deep Learning Architectures

Two specialized deep learning architectures were employed to address the challenges of historical urban reconstruction, where geometric fidelity of building footprints is prioritized over purely perceptual quality. The pipeline combines a modified Attention-Guided Pix2Pix (Att-Pix2Pix) for spectral restoration and U-Net++ for semantic segmentation. These selections were informed by empirical comparisons reported in a prior study on historical aerial imagery [19], which demonstrated superior structural preservation for downstream morphological analysis compared to general-purpose alternatives.

2.5.1. Spectral Restoration: Modified Attention Pix2Pix

Spectral restoration was performed using a modified Pix2Pix conditional Generative Adversarial Network (cGAN) framework [25] with integrated self-attention mechanisms (Att-Pix2Pix), as originally proposed and validated in Simou et al. [19].
The conditional GAN objective conditions both the generator $G$ and discriminator $D$ on an input image $x$, learning a mapping $G : \{x, z\} \rightarrow y$, where $z$ is random noise and $y$ is the target RGB image. The loss is formulated as:
$$\mathcal{L}_{\text{cGAN}}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))].$$
Pix2Pix extends this with an L1 reconstruction term to enforce pixel-wise fidelity:
$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1],$$
yielding the full objective:
$$G^* = \arg\min_G \max_D \mathcal{L}_{\text{cGAN}}(G, D) + \lambda \mathcal{L}_{L1}(G),$$
with $\lambda = 100$ in standard implementations.
The generator follows a U-Net encoder-decoder structure with skip connections. To capture long-range spatial dependencies essential for coherent urban textures, self-attention blocks were inserted at the bottleneck (32×32 resolution). The self-attention mechanism computes relationships across all positions as a weighted sum:
$$\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V,$$
where $Q$, $K$, and $V$ are query, key, and value projections of the feature map, and $d_k$ is the key dimension.
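The scaled dot-product attention formula can be made concrete with a small pure-Python sketch. It operates on lists of row vectors, one per spatial position, and leaves out the learned projections that produce Q, K, and V; the function names are illustrative only.

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(K[0])
    scores = [
        [sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K]
        for qi in Q
    ]
    weights = [softmax(r) for r in scores]  # each row sums to 1
    return [
        [sum(w * v[c] for w, v in zip(wr, V)) for c in range(len(V[0]))]
        for wr in weights
    ]
```

At a 32×32 bottleneck there are 1,024 positions, so the score matrix has 1,024 × 1,024 entries. That size is affordable at the bottleneck but would be far more costly at higher-resolution stages.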
A critical adaptation for historical imagery was the hybrid inference strategy. The generator was constrained to predict only chromatic channels ($a^*$, $b^*$) in CIELAB space, while the original grayscale input was retained as the lightness channel ($L^*$). This ensures preservation of high-frequency structural details and film grain from the historical scans. Post-processing applied a median blur (kernel size 7) exclusively to the predicted chromatic channels, suppressing high-frequency color artifacts without degrading edges.
This attention-enhanced architecture and hybrid reconstruction were validated in prior work [19] as superior for edge-preserving colorization in historical remote sensing compared to standard Pix2Pix and alternative models (Figure 3).

2.5.2. Semantic Segmentation: U-Net++

Building footprint extraction was performed using the U-Net++ architecture [26] with an EfficientNet-B5 encoder (Figure 4).
U-Net++ extends the original U-Net by introducing nested dense skip pathways to reduce the semantic gap between encoder and decoder feature maps. Let $x^{i,j}$ denote the output of node $X^{i,j}$, where $i$ indexes the downsampling layer along the encoder and $j$ indexes the convolution layer along the skip pathway. The feature maps are computed as:
$$x^{i,j} = \begin{cases} \mathcal{H}\left(x^{i-1,j}\right), & j = 0 \\ \mathcal{H}\left(\left[\left[x^{i,k}\right]_{k=0}^{j-1}, \mathcal{U}\left(x^{i+1,j-1}\right)\right]\right), & j > 0, \end{cases}$$
where $\mathcal{H}(\cdot)$ is a convolution followed by activation, $\mathcal{U}(\cdot)$ is upsampling, and $[\cdot]$ denotes concatenation. This dense connectivity enables richer multi-scale feature fusion, particularly effective for detecting both large modern structures and small historical buildings.
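The connectivity rule can be illustrated by listing which feature maps feed each node. The sketch below only enumerates the topology and performs no convolutions; the string labels, the function names, and the choice of a depth of four downsampling levels in the usage note are assumptions for illustration.

```python
def node_inputs(i, j):
    """Labels of the feature maps concatenated at node X^{i,j}."""
    if j == 0:
        # encoder backbone: fed by the level above, or by the image at the top
        return [f"X{i - 1},0"] if i > 0 else ["input"]
    same_level = [f"X{i},{k}" for k in range(j)]   # dense skips on level i
    return same_level + [f"up(X{i + 1},{j - 1})"]  # upsampled deeper node

def all_nodes(depth):
    """Node indices (i, j) for encoder levels 0..depth; a node exists when i + j <= depth."""
    return [(i, j) for j in range(depth + 1) for i in range(depth + 1 - j)]
```

For example, `node_inputs(0, 3)` returns `['X0,0', 'X0,1', 'X0,2', 'up(X1,2)']`, showing that the node receives every earlier node on its own level plus one upsampled map from the level below. With `depth=4` the grid contains 15 nodes.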
Deep supervision was enabled by attaching auxiliary loss branches to intermediate decoder nodes, accelerating convergence and improving feature learning on the limited corpus.
This configuration was selected based on comparative evaluations in Simou et al. [19], demonstrating superior multi-scale performance in historical urban contexts compared to alternatives such as DeepLabV3+ [27] or transformer-based models.

2.6. Implementation and Training Strategy

The deep learning pipeline was implemented using the PyTorch framework on a high-performance computing environment. Training and inference were conducted on an NVIDIA A100 GPU (40 GB VRAM) via Google Colab Pro+, with the Albumentations library employed for efficient on-the-fly data augmentation.

2.6.1. Hyperparameter Configuration

To ensure reproducibility, the training configurations for each network are detailed below.
The spectral restoration model (Att-Pix2Pix) was trained for 100 epochs with a batch size of 16. Optimization was performed using the Adam optimizer ($\beta_1 = 0.5$, $\beta_2 = 0.999$) at an initial learning rate of $2 \times 10^{-4}$. A linear decay schedule was applied after the first 50 epochs, reducing the learning rate to zero over the remaining epochs to promote stable convergence.
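One way to express this schedule is sketched below, assuming zero-indexed epochs and a decay that reaches exactly zero at epoch 100. The exact indexing convention is not stated in the text, so the boundary behaviour and the function name are assumptions.

```python
def pix2pix_lr(epoch, base_lr=2e-4, constant_epochs=50, total_epochs=100):
    """Constant learning rate, then linear decay to zero."""
    if epoch < constant_epochs:
        return base_lr
    decay_epochs = total_epochs - constant_epochs
    # fraction of the decay phase still remaining, floored at zero
    remaining = max(0.0, 1.0 - (epoch - constant_epochs) / decay_epochs)
    return base_lr * remaining
```

Under these assumptions the rate stays at 2e-4 through epoch 49, is halved by epoch 75, and reaches zero at epoch 100.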
The segmentation model (U-Net++ with EfficientNet-B5 encoder) was initialized with weights pre-trained on ImageNet. Training was conducted for 100 epochs with a batch size of 16 using the Adam optimizer at an initial learning rate of $1 \times 10^{-4}$. A ReduceLROnPlateau scheduler was applied to reduce the learning rate by a factor of 0.5 upon validation loss stagnation for 5 consecutive epochs.
To address class imbalance in building footprint extraction, a composite loss function combining binary cross-entropy and Dice loss was minimized:
$$\mathcal{L}_{\text{seg}} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{Dice}}.$$
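A minimal numerical sketch of the composite loss is shown below, operating on flat lists of predicted probabilities and binary targets. The clipping constant `eps` and the additive `smooth` term in the Dice ratio are common stabilizers assumed here; the text does not say whether the authors used them.

```python
import math

def bce_dice_loss(probs, targets, eps=1e-7, smooth=1.0):
    """Binary cross-entropy plus soft Dice loss with equal weights."""
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]  # avoid log(0)
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for p, t in zip(clipped, targets)
    ) / len(probs)
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce + dice
```

The cross-entropy term scores each pixel independently, while the Dice term depends on the overlap ratio, so it does not shrink when background pixels vastly outnumber building pixels.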

2.6.2. Evaluation Metrics

Performance was evaluated quantitatively in two domains: spectral reconstruction quality and morphological segmentation accuracy.
For colorization quality, the following standard metrics were computed on the validation set:
  • Peak Signal-to-Noise Ratio (PSNR): Measures pixel-wise fidelity in decibels:
    $$\text{PSNR} = 10 \log_{10}\left(\frac{R^2}{\text{MSE}}\right),$$
    where $R$ is the maximum pixel value and MSE is the mean squared error.
  • Structural Similarity Index (SSIM): Assesses perceived structural similarity:
    $$\text{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
    with stabilizing constants $C_1$ and $C_2$.
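Both metrics can be sketched on flat lists of pixel values as follows. The SSIM here is a simplified single-window version computed over the whole image, whereas standard implementations average the index over local windows. The constants $C_1 = (0.01R)^2$ and $C_2 = (0.03R)^2$ are the conventional defaults and are assumed here, since the text does not give their values.

```python
import math
import statistics

def psnr(x, y, max_value=255.0):
    """Peak signal-to-noise ratio in decibels."""
    mse = statistics.fmean([(a - b) ** 2 for a, b in zip(x, y)])
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)

def ssim_global(x, y, max_value=255.0):
    """Single-window SSIM over all pixels (simplified illustration)."""
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2
    mx, my = statistics.fmean(x), statistics.fmean(y)
    var_x = statistics.fmean([(a - mx) ** 2 for a in x])
    var_y = statistics.fmean([(b - my) ** 2 for b in y])
    cov = statistics.fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)
    )
```

As a rough guide, if $R = 255$ were used, a PSNR of about 35 dB would correspond to a root-mean-square error of roughly 4 to 5 grey levels.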
The accuracy of building footprint segmentation was evaluated using metrics derived from the confusion matrix:
  • Mean Intersection over Union (mIoU): The primary metric, averaging IoU across both classes (building and background) to provide a balanced assessment:
    $$\text{mIoU} = \frac{1}{2}\left(\text{IoU}_{\text{building}} + \text{IoU}_{\text{background}}\right),$$
    where individual class IoU is defined as:
    $$\text{IoU} = \frac{TP}{TP + FP + FN}.$$
  • Accuracy: Overall proportion of correctly classified pixels.
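The segmentation metrics follow directly from the four confusion-matrix counts, as the sketch below shows for flat lists of binary labels. The function name and the choice to return NaN for a class that is absent from both prediction and reference are assumptions made for illustration.

```python
def binary_segmentation_metrics(pred, truth):
    """IoU per class, mIoU, and accuracy for binary masks (1 = building)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)

    def iou(hits, false_pos, false_neg):
        denom = hits + false_pos + false_neg
        return hits / denom if denom else float("nan")

    iou_building = iou(tp, fp, fn)
    # for the background class the roles of FP and FN are swapped
    iou_background = iou(tn, fn, fp)
    return {
        "iou_building": iou_building,
        "iou_background": iou_background,
        "miou": 0.5 * (iou_building + iou_background),
        "accuracy": (tp + tn) / len(pred),
    }
```

For `pred = [1, 1, 0, 0]` and `truth = [1, 0, 0, 0]`, accuracy is 0.75 while mIoU is about 0.58, which illustrates how accuracy can sit well above mIoU when one class is small.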
These metrics were computed on the validation set and averaged across all historical periods.

2.6.3. Transfer Learning and Temporal Domain Adaptation

A significant domain shift exists between modern digital orthophotos and historical analog scans, primarily in radiometric properties, noise characteristics, and resolution. To address this without requiring extensive manual annotation of historical imagery, a transfer learning strategy was applied to adapt the segmentation model to each target epoch (1920, 1945, 1971, 1997).
The U-Net++ model (EfficientNet-B5 encoder) was first pre-trained on the FLAIR dataset, providing robust feature representations for building footprints.
Subsequently, fine-tuning was performed on small, epoch-specific calibration datasets. For each historical epoch (1920, 1945, 1971, 1997), 50 representative tiles (512×512 pixels) were manually annotated with binary building masks. These samples were selected to cover morphological diversity across the study area:
  • Urban core: high-density clusters in La Chaume and the city center,
  • Peri-urban zones: low-density scattered dwellings and agricultural structures,
  • Coastal interface: complex boundaries between seafront development and the dune system.
Fine-tuning followed a two-stage protocol. Initially, the encoder layers were frozen, and only the decoder was trained for 20 epochs at a reduced learning rate of $1 \times 10^{-5}$ to adapt to epoch-specific radiometric characteristics. The full network was then unfrozen and fine-tuned end-to-end for an additional 30 epochs. This hierarchical approach allowed adaptation to historical imaging conditions while retaining high-level structural knowledge acquired from modern data.
For the contemporary epoch (2024), direct inference was performed using the model pre-trained on FLAIR, as no significant domain shift was present.
This transfer learning strategy enabled effective segmentation across all epochs despite limited annotated historical samples.

3. Results

The performance of the proposed pipeline was evaluated quantitatively in two domains: spectral reconstruction quality and morphological segmentation accuracy.

3.1. Spectral Restoration Performance

The performance of the Att-Pix2Pix model for spectral restoration was evaluated on the validation set.
Quantitative results are summarized in Table 2. The model achieved an average Peak Signal-to-Noise Ratio (PSNR) of 35.21 dB and an average Structural Similarity Index (SSIM) of 0.9762. These metrics indicate high spectral fidelity and effective preservation of structural details, attributable to the hybrid inference strategy (retention of the original grayscale lightness channel) and post-processing refinement of chromatic channels.

3.1.1. Qualitative Assessment of Spectral Restoration

While quantitative metrics confirm high fidelity in spectral reconstruction, visual inspection of the restored outputs (Figure 5) provides further insight into the model’s performance across diverse urban and coastal features.
The restored imagery demonstrates effective recovery of plausible colors consistent with known land-cover types. In the dense historic core (Row 1, Patch 00076), the model distinguishes red clay roofing in residential areas from gray industrial surfaces, providing enhanced contrast absent in the original panchromatic inputs. This spectral separation supports subsequent segmentation by introducing discriminative chromatic cues.
The hybrid inference strategy, which retains the original grayscale lightness channel ($L^*$), preserves fine structural details. In Row 2 (Patch 00075), port infrastructure and road networks remain sharp across epochs, without the blurring commonly associated with full GAN-generated RGB outputs. Undeveloped marshlands in early epochs are rendered in natural earth tones, contrasting with later concrete development.
The model exhibits robustness across morphological zones. Coastal transitions (Row 1) show natural gradients between sandy foreshore and marine water. Salt marsh grids (Row 3, Patch 00074) maintain consistent patterning and channel delineation, reflecting the model’s ability to infer spectral properties from grayscale texture alone.
These qualitative results align with the high PSNR (35.21 dB) and SSIM (0.9762) scores, indicating that the restoration is both statistically accurate and visually coherent.

3.2. Semantic Segmentation Performance

3.2.1. Architecture Comparison on Modern Data

A comparative evaluation was conducted on the modern validation set to validate the selection of the segmentation architecture. Four established models were compared: U-Net++ (with EfficientNet-B5 encoder), SegFormer, DeepLabV3+, and Feature Pyramid Network (FPN).
Results are presented in Table 3. U-Net++ achieved the highest mIoU of 0.9789, outperforming SegFormer (0.9743), DeepLabV3+ (0.9720), and FPN (0.9695). This consistent advantage supports the choice of U-Net++ for its effectiveness in capturing multi-scale building features through nested skip pathways, particularly relevant for the wide range of structure sizes in the dataset.

3.2.2. Zero-Shot Segmentation Performance

To assess the limitations of direct transfer from modern training data, the U-Net++ model (pre-trained on the FLAIR dataset) was applied without fine-tuning to the pseudo-RGB outputs for each year. Results are visualized in Figure 6.
Performance varied substantially across epochs, reflecting differences in radiometric and textural properties between the restored historical imagery and the modern FLAIR training domain.
For 2024, segmentation was highly accurate, with clear delineation of building footprints and minimal errors, consistent with the model’s strong performance on modern validation data.
In contrast, the historical epochs (1920, 1945, 1971, 1997) exhibited varying degrees of degradation. The earliest epochs (1920 and 1945) produced fragmented masks, with low detection rates for smaller structures and incomplete delineation of larger building blocks. Urban areas were partially identified, but boundary precision was limited due to residual noise and intensity variations in the restored inputs.
The 1971 epoch showed improved overall detection but noisy and irregular boundaries, particularly in dense clusters where individual buildings were merged.
The most severe degradation occurred in 1997, where the distinct greenish radiometric profile led to near-complete failure in identifying building footprints, despite visible urban fabric in the pseudo-RGB input.
These results underscore the impact of domain shift on zero-shot transfer. While the colorization stage recovered basic semantic cues and enabled strong performance on 2024, variations in global color histograms and texture across historical epochs restricted direct applicability of the modern pre-trained model. This limitation justified the subsequent few-shot adaptation strategy to align the segmentation network with year-specific characteristics.

3.2.3. Few-Shot Adaptation Performance

The selected U-Net++ model was adapted to each historical year (1920, 1945, 1971, 1997) using few-shot fine-tuning on 50 manually annotated tiles per year.
Results are summarized in Table 4 and visualized in Figure 7. Mean IoU ranged from 0.5287 (1945) to 0.6508 (1920), with accuracy consistently above 0.77. Fine-tuning substantially improved performance compared to the zero-shot baseline (Figure 6).
In 1997, where zero-shot application resulted in near-complete failure due to the distinct greenish radiometric profile, fine-tuning enabled reliable detection of building footprints, including dense coastal development and port structures.
For the earlier years (1920, 1945, 1971), fragmentation and noisy boundaries observed in zero-shot predictions were reduced, yielding more coherent and precise building masks. Individual structures in dense clusters were better delineated, and small buildings exhibited higher detection rates.
Overall, fine-tuning produced more consistent segmentation quality across years, bringing historical outputs closer to the accuracy achieved on the 2024 data. Errors were primarily confined to boundaries, while core footprints were reliably identified. This adaptation enabled standardized morphological analysis over the full century-scale timeline.

3.2.4. Urban Morphological Dynamics

The reconstructed building footprints enabled quantitative assessment of urban expansion across the century-scale timeline. Results are presented in Figure 8 and Figure 9.
Figure 8 shows cumulative built-up area per neighborhood in 2024, ordered spatially from coast (top) to inland (bottom). The highest totals were recorded in Sud (73.5 ha), Les Plesses (72.6 ha), and Chaume Sud (70.2 ha). Coastal neighborhoods exhibit dominance of early construction periods (pre-1945, dark colors), while inland zones show substantial contributions from recent periods (1997–2024, light yellow).
Figure 9 maps building footprints color-coded by construction period. Early periods (pre-1945, purple/dark tones) are concentrated in the historic coastal core (Chaume Nord/Sud, Passage-Notre-Dame-Guynemer). Later periods (post-1971, orange/light tones) dominate inland and peri-urban extensions (Les Plesses, La Métairie, Sud).
These visualizations illustrate a spatiotemporal gradient: early densification in the coastal core and progressive inland expansion in recent decades.

4. Discussion

The results demonstrate the effectiveness of the proposed pipeline in enabling quantitative morphological analysis across a century-scale timeline characterized by significant domain shift.
Spectral restoration achieved high fidelity (PSNR 35.21 dB, SSIM 0.9762), recovering plausible colors while preserving structural sharpness through the hybrid inference strategy. Qualitative inspection confirmed consistent rendering of urban textures and natural gradients in coastal zones (Figure 5).
Semantic segmentation with U-Net++ yielded strong performance on modern data (mIoU 0.9789). Zero-shot transfer to historical years revealed substantial domain shift effects, with mIoU dropping due to radiometric variations (Figure 6). Few-shot adaptation on limited annotations (50 tiles per year) substantially improved performance (mIoU 0.53–0.65), enabling reliable footprint extraction across the archive (Table 4).
The reconstructed building footprints facilitated multi-scalar quantification of urban evolution, visualized in Figure 10. Saturation levels in 2024 (Panel A) are highest in the coastal historic neighborhoods (e.g., Chaume Sud, Passage-Notre-Dame-Guynemer), where they reach 80–100%, while inland zones show moderate saturation (40–60%). This reflects early densification in the constrained coastal core.
The share of new construction (Panel B) indicates limited growth in early periods (1920–1971) across most neighborhoods. Expansion accelerated after 1971, with the 1997–2024 interval contributing the largest shares in inland zones (e.g., Les Plesses, La Métairie, Sud).
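The two neighborhood-level indicators in Panels A and B reduce to simple ratios. The sketch below assumes saturation is built-up area divided by a reference neighborhood area, and that the share of new construction is each period's area divided by the neighborhood's total built-up area in 2024. These definitions, the function names, and the numbers are illustrative assumptions rather than the exact formulation used for Figure 10.

```python
def saturation_pct(built_ha, neighborhood_ha):
    """Built-up area as a percentage of the neighborhood's reference area."""
    return 100.0 * built_ha / neighborhood_ha

def period_shares(area_by_period):
    """Percentage of a neighborhood's built-up area contributed by each period."""
    total = sum(area_by_period.values())
    return {period: 100.0 * area / total for period, area in area_by_period.items()}

# Hypothetical inland neighborhood: built-up area (ha) added per interval
added = {"1920-1945": 10.0, "1945-1971": 10.0, "1971-1997": 30.0, "1997-2024": 50.0}
shares = period_shares(added)
saturation = saturation_pct(sum(added.values()), 200.0)
```

With these toy values, half of the built-up area dates from 1997–2024 and the neighborhood is 50% saturated, which is the kind of profile described for the inland zones.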
Spatial saturation mapping (Panel C) confirms higher density in the historic core and lower values in peri-urban extensions. Directional anisotropy (Panel D) reveals a predominant inland vector (east-northeast), consistent with physical barriers limiting coastal development.
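One way to obtain a directional growth rose such as Panel D is to bin new construction by compass bearing from a fixed reference point. The sketch below assumes projected coordinates (x pointing east, y pointing north), a reference point in the historic core, eight 45° sectors, and area weighting. All of these choices and the sample coordinates are hypothetical and may differ from the procedure used in this study.

```python
import math
from collections import Counter

SECTORS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def bearing_deg(origin, point):
    """Compass bearing (0 = north, clockwise) from origin to point in projected x/y."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

def sector_of(bearing):
    """Map a bearing to one of eight 45-degree sectors centred on the compass points."""
    return SECTORS[int(((bearing + 22.5) % 360.0) // 45.0)]

def growth_rose(origin, new_buildings):
    """Area-weighted share of new construction per compass sector."""
    weights = Counter()
    for centroid, area in new_buildings:
        weights[sector_of(bearing_deg(origin, centroid))] += area
    total = sum(weights.values())
    return {s: weights[s] / total for s in SECTORS}

# Hypothetical centroids (metres relative to the reference point) and areas (m^2)
new_buildings = [((100.0, 50.0), 300.0), ((200.0, 10.0), 100.0), ((-50.0, 0.0), 100.0)]
rose = growth_rose((0.0, 0.0), new_buildings)
dominant = max(rose, key=rose.get)
```

A strongly unequal distribution across sectors indicates anisotropic growth. Here the north-east sector dominates, loosely mirroring the east-northeast vector reported above.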
These patterns reveal constrained anisotropic expansion. Early urbanization saturated the coastal core, while recent growth shifted inland toward retro-littoral agricultural lands, driven by geographic barriers (Atlantic Ocean, protected marshes, and forests). The transition from slow, isotropic growth pre-1971 to rapid peri-urbanization post-1971 aligns with broader coastal dynamics, including tourism and transport development.
The morphological patterns also highlight spatial lock-in effects with implications for climate vulnerability and ecological change (Figure 11 and Figure 12). The pre-1945 urban fabric overlaps substantially with submersion hazard zones affected by Storm Xynthia in 2010 (Figure 11). This indicates that early densification occurred in low-lying areas now recognized as high-risk under the Plan de Prévention des Risques Littoraux (PPRL) du Pays des Olonnes, which maps water heights exceeding 1 m in central and coastal sectors [28]. Official post-Xynthia assessments further confirm legacy exposure in the historic core, with significant damage to the Remblai promenade and port infrastructure [29,30].
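The overlap between the pre-1945 fabric and the hazard zones can be expressed as the fraction of built-up pixels that fall inside the hazard mask. The sketch below assumes both layers have been rasterized to the same grid as binary masks. The grid, the function name `exposed_share`, and the values are hypothetical, and no overlap figure is implied for the study area.

```python
def exposed_share(built_mask, hazard_mask):
    """Fraction of built-up pixels that also lie inside the hazard mask."""
    built = exposed = 0
    for built_row, hazard_row in zip(built_mask, hazard_mask):
        for b, h in zip(built_row, hazard_row):
            if b:
                built += 1
                if h:
                    exposed += 1
    return exposed / built if built else 0.0

# Hypothetical 3 x 4 grids: 1 = pre-1945 building / inside hazard zone
built_pre1945 = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
hazard_zone = [
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
share = exposed_share(built_pre1945, hazard_zone)
```

Computing the same ratio per construction period would make it possible to compare legacy exposure in the historic core with that of more recent inland development.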
Recent inland expansion has occurred in retro-littoral zones characterized by severe vegetation loss (Figure 12), reflecting widespread concretization of former natural and agricultural land. This aligns with official observatory data showing significant net artificialisation in the Les Sables-d’Olonne agglomeration (code INSEE 85194) between 2009 and 2021, primarily in inland peri-urban areas [31]. The reduction in green spaces and marsh buffers exacerbates flood propagation and biodiversity decline, compounding exposure in these newly urbanized areas.
These findings underscore legacy effects of unconstrained post-war peri-urbanization, predating contemporary coastal protection policies. The inland shift, while avoiding direct coastal exposure, has placed new development in terrains vulnerable to indirect submersion risks and runoff amplification.
Limitations include dependence on manual georeferencing and potential residual inconsistencies in the earliest scans. Future work could explore automated registration techniques, unsupervised domain adaptation, and extension to additional coastal sites.
Overall, this framework provides a scalable tool for long-term coastal urban monitoring, with applications to vulnerability assessment and resilient planning in sites like Les Sables-d’Olonne.

5. Conclusions

The proposed pipeline successfully enabled quantitative reconstruction and morphological analysis of urban evolution in Les Sables-d’Olonne across a century-scale timeline (1920–2024). Spectral restoration achieved high fidelity (PSNR 35.21 dB, SSIM 0.9762), recovering plausible colors while preserving structural sharpness through hybrid inference. Semantic segmentation with U-Net++ provided robust building footprint extraction, with few-shot adaptation yielding mIoU values of 0.53–0.65 on historical years despite significant domain shift.
The resulting multi-scalar morphological analysis revealed constrained anisotropic expansion. Early densification saturated the coastal historic core, while growth accelerated post-1971, shifting inland toward retro-littoral zones. This pattern reflects geographic barriers (ocean, protected marshes, and forests) directing development away from the coast. The analysis further highlighted spatial lock-in effects, with pre-1945 urban fabric overlapping submersion hazard zones and recent inland expansion coinciding with severe vegetation loss.
These findings confirm the three contributions of this study: a methodological extension bridging century-scale domain gaps, an analytical framework quantifying neighborhood-level densification and spatiotemporal shifts, and an empirical reconstruction uncovering legacy effects of unconstrained peri-urbanization predating modern protection policies.
The framework offers a scalable approach for long-term coastal urban monitoring, with direct applications to vulnerability assessment and resilient planning in exposed sites. Future extensions could incorporate automated georeferencing and unsupervised adaptation to broaden applicability to additional historical archives.

Author Contributions

Conceptualization, M.R.S., M.M. (Mohamed Maanan), M.M. (Mehdi Maanan), and H.R.; methodology, M.R.S., M.M. (Mohamed Maanan), and M.M. (Mehdi Maanan); software, M.R.S.; validation, M.R.S., and A.H.; formal analysis, M.R.S.; investigation, M.R.S.; resources, M.R.S.; data curation, M.R.S., and A.H.; writing—original draft preparation, M.R.S.; writing—review and editing, M.R.S., M.M. (Mohamed Maanan), A.H., M.B., M.M. (Mehdi Maanan), and H.R.; visualization, M.R.S.; supervision, M.M. (Mohamed Maanan), M.M. (Mehdi Maanan), and H.R.; project administration, M.R.S. and M.M. (Mohamed Maanan); funding acquisition, M.M. (Mohamed Maanan) and M.M. (Mehdi Maanan). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this article are available upon request from the corresponding author.

Acknowledgments

Mohamed Rabii Simou acknowledges the Centre National pour la Recherche Scientifique et Technique (CNRST), Kingdom of Morocco, for the “Ph.D. ASsociate Scholarship-PASS”.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
Att-Pix2Pix Attention-enhanced Pix2Pix
cGAN conditional Generative Adversarial Network
GAN Generative Adversarial Network
PSNR Peak Signal-to-Noise Ratio
SSIM Structural Similarity Index Measure
mIoU mean Intersection over Union
IoU Intersection over Union
FLAIR French Land cover from Aerospace Imagery Resources
IGN Institut National de l’Information Géographique et Forestière
VHR Very High Resolution
GCP Ground Control Point
TPS Thin Plate Spline
GSD Ground Sampling Distance
RMSE Root Mean Square Error
CLAHE Contrast Limited Adaptive Histogram Equalization
CIELAB Commission Internationale de l’Éclairage L*a*b* color space
PPRN Plan de Prévention des Risques Naturels

References

  1. United Nations, Department of Economic and Social Affairs, Population Division. World Urbanization Prospects: The 2018 Revision. Technical report, United Nations, 2019. Accessed: 2025-11-28.
  2. United Nations. 68% of the world population projected to live in urban areas by 2050. https://www.un.org/development/desa/en/news/population/2018-revision-of-world-urbanization-prospects.html, 2018. Accessed: 2025-11-28.
  3. Seto, K.C.; Güneralp, B.; Hutyra, L.R. Global forecasts of urban expansion to 2030 and direct impacts on biodiversity and carbon pools. Proceedings of the National Academy of Sciences 2012, 109, 16083–16088.
  4. Güneralp, B.; Seto, K.C. Futures of global urban expansion: uncertainties and implications for biodiversity conservation. Environmental Research Letters 2013, 8, 014025.
  5. Neumann, B.; Vafeidis, A.T.; Zimmermann, J.; Nicholls, R.J. Future coastal population growth and exposure to sea-level rise and coastal flooding - a global assessment. PLoS ONE 2015, 10, e0118571.
  6. IPCC. Climate Change 2022: Impacts, Adaptation, and Vulnerability – Cross-Chapter Paper 2: Cities and Settlements by the Sea. Technical report, Intergovernmental Panel on Climate Change, 2022.
  7. Wulder, M.A.; Roy, D.P.; Radeloff, V.C.; Loveland, T.R.; Anderson, M.C.; Johnson, D.M.; Healey, S.; Zhu, Z.; Scambos, T.A.; Pahlevan, N.; et al. Fifty years of Landsat science and impacts. Remote Sensing of Environment 2022, 280, 113195.
  8. Zhu, Z.; Wulder, M.A.; Roy, D.P.; Woodcock, C.E.; Hansen, M.C.; Radeloff, V.C.; Healey, S.P.; Schaaf, C.; Hostert, P.; Strobl, P.; et al. Benefits of the free and open Landsat data policy. Remote Sensing of Environment 2019, 224, 382–385.
  9. Institut National de l’Information Géographique et Forestière (IGN France). Remonter le Temps — Couverture photographique aérienne historique 1919–2023. Online, 2024. Accessed: 29 November 2025.
  10. EuroSDR. Archiving and geoprocessing of historical aerial images: current status in Europe. Official publication no. 70, EuroSDR, 2019.
  11. Giordano, S.; Le Bris, A.; Mallet, C. Fully automatic analysis of archival aerial images: current status and challenges. In 2017 Joint Urban Remote Sensing Event (JURSE); 2017; pp. 1–4.
  12. Hao, T.; Zhang, L.; Zhang, Y.; Chen, M.; Zhang, J.; Dong, R.; Fu, H. WakeupUrban: Unsupervised Semantic Segmentation of Mid-20th Century Urban Landscapes with Satellite Imagery. arXiv 2025, arXiv:2506.09476.
  13. Poterek, Q.; Herrault, P.A.; Skupinski, G.; Sheeren, D. Deep learning for automatic colorization of legacy grayscale aerial photographs. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 2020, 13, 2899–2915.
  14. Chen, P.; Wang, S.; Wang, C.; Wang, S.; Huang, B.; Huang, L.; Zang, Z. A GAN-enhanced deep learning framework for rooftop detection from historical aerial imagery. International Journal of Remote Sensing 2025, 46, 6260–6283.
  15. Tasar, O.; Happy, S.L.; Tarabalka, Y.; Alliez, P. ColorMapGAN: Unsupervised domain adaptation for semantic segmentation using color mapping generative adversarial networks. IEEE Transactions on Geoscience and Remote Sensing 2020, 58, 7178–7193.
  16. Benjdira, B.; Bazi, Y.; Koubaa, A.; Ouni, K. Unsupervised domain adaptation using generative adversarial networks for semantic segmentation of aerial images. Remote Sensing 2019, 11, 1369.
  17. Xu, M.; Wu, M.; Chen, K.; Zhang, C.; Guo, J. The eyes of the gods: A survey of unsupervised domain adaptation methods based on remote sensing data. Remote Sensing 2022, 14, 4380.
  18. Farella, E.M.; Malek, S.; Remondino, F. Colorizing the past: Deep learning for the automatic colorization of historical aerial images. Journal of Imaging 2022, 8, 269.
  19. Simou, M.R.; Maanan, M.; Loulad, S.; Maanan, M.; Rhinane, H. New Approach for Mapping Land Cover from Archive Grayscale Satellite Imagery. Technologies 2025, 13, 158.
  20. Garioud, A.; Peillet, S.; Bookjans, E.; Giordano, S.; Wattrelos, B. FLAIR #1: semantic segmentation and domain adaptation dataset. arXiv 2022, arXiv:2211.12979.
  21. Frifra, A.; Maanan, M.; Maanan, M.; Rhinane, H. Simulating future exposure to coastal urban flooding using a neural network–Markov model. Journal of Marine Science and Engineering 2024, 12, 800.
  22. Audère, M.; Robin, M. Assessment of the vulnerability of sandy coasts to erosion (short and medium term) for coastal risk mapping (Vendée, W France). Ocean & Coastal Management 2021, 201, 105452.
  23. Institut National de la Statistique et des Études Économiques (INSEE). Population en 2021 — Commune des Sables-d’Olonne (85194). Online, 2025. Accessed: 29 November 2025.
  24. Institut National de la Statistique et des Études Économiques (INSEE). Population en 2021 — Département de la Vendée (85). Online, 2025. Accessed: 29 November 2025.
  25. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition; 2017; pp. 1125–1134.
  26. Zhou, Z.; Rahman Siddiquee, M.M.; Tajbakhsh, N.; Liang, J. Unet++: A nested u-net architecture for medical image segmentation. In Proceedings of the International workshop on deep learning in medical image analysis; Springer, 2018; pp. 3–11.
  27. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European conference on computer vision (ECCV); 2018; pp. 801–818.
  28. DREAL Pays de la Loire and Préfecture de la Vendée. Plan de Prévention des Risques Littoraux (PPRL) du Pays des Olonnes. Approved official document, 2016. Maps submersion zones with water heights >1 m in the central and coastal sectors of Les Sables-d’Olonne. Accessed: 2026.
  29. Préfecture de la Vendée. 10 ans après la tempête Xynthia : la Vendée se souvient – Impacts aux Sables-d’Olonne. Official anniversary report, 2020. Details Xynthia damage in Les Sables-d’Olonne (Remblai, port, evacuations). Accessed: 2026.
  30. Sénat Français. Xynthia, 5 ans après : pour une véritable culture du risque dans les territoires exposés. Technical Report, Rapport d’information n° 536 (2014-2015), Sénat Français, 2015. Official Senate report with specific references to damage and vulnerability in Les Sables-d’Olonne. Accessed: 2026.
  31. Cerema and Ministère de la Transition écologique et de la Cohésion des territoires. Observatoire national de l’artificialisation des sols – Données pour Les Sables-d’Olonne Agglomération. Official soil artificialisation portal, 2024. Period 2011–2024: 1,895,700 m² (189.57 ha) of newly consumed land, i.e., 2.21% of the communal area, mainly housing and economic activity in retro-littoral zones. Accessed: 2026.
Figure 1. Study area.
Figure 2. Methodology of the study.
Figure 3. Architecture of the Attention-Enhanced Pix2Pix Generator.
Figure 4. Architecture of the U-Net++.
Figure 5. Qualitative Examples of Spectral Restoration. Restored outputs for selected patches, compared to contemporary reference. Rows show different morphological zones: urban core (top), port/infrastructure (middle), and salt marshes (bottom).
Figure 6. Zero-Shot Segmentation Results Across Years. Pseudo-RGB inputs (from Att-Pix2Pix restoration) overlaid with building footprint predictions (U-Net++ pre-trained on FLAIR, no fine-tuning). The rightmost column shows the Ground Truth reference imagery. Rows illustrate different morphological zones, highlighting the contrast between strong modern performance and varying historical degradation due to domain shift.
Figure 7. Segmentation Results After Few-Shot Adaptation. Pseudo-RGB inputs (from Att-Pix2Pix restoration) overlaid with building footprint predictions (U-Net++ after fine-tuning on 50 tiles per year). The rightmost column shows the prediction of the pre-trained model on the 2024 imagery. Rows illustrate different morphological zones (urban core, port/infrastructure, salt marshes), demonstrating improved detection and boundary precision compared to zero-shot results.
Figure 8. Urban Growth Chronology: Cumulative Built-Up Area by Neighborhood (1920–2024). Neighborhoods ordered from coast (top) to inland (bottom). Color gradient represents construction period, from pre-1920 (dark purple) to 1997–2024 (light yellow).
Figure 9. Spatial Distribution of Urban Chronology (1920–2024). Building footprints, color-coded by construction period (pre-1920 dark purple to 1997–2024 light yellow).
Figure 10. Urban Morphological Dynamics: Les Sables-d’Olonne (1920–2024). (A) Saturation matrix showing percentage of built-up area per neighborhood in 2024. (B) Share of new construction per time period, with neighborhoods ordered from coast (top) to inland (bottom). (C) Spatial distribution of saturation in 2024. (D) Directional growth anisotropy polar plot, indicating predominant inland expansion.
Figure 11. Spatial Lock-In: Heritage Assets in Submersion Hazard Zones. Pre-1945 urban fabric overlaid on submersion risk areas (Xynthia-affected zones highlighted), illustrating legacy exposure in the historic coastal core.
Figure 12. Ecological Turnover: Consumed Vegetation (1920–2024). Vegetation loss per neighborhood, showing severe concretization in inland expansion zones.
Table 1. Aerial and Satellite Data Specifications. Inventory of the IGN aerial missions and modern satellite data used for the multi-temporal urban reconstruction.
| Year | Mission / System | Scenes | Resolution | Acquisition Date | Source |
|------|------------------|--------|------------|------------------|--------|
| 1920 | CN20000181 | 8 | ≈1.0 m | 1920 | IGN |
| 1945 | FRANCESUD-OUEST7132 | 1 | 0.8 m | 13 July 1945 | IGN |
| 1971 | 1971_F1227 | 1 | 0.4 m | 03 Sept 1971 | IGN |
| 1997 | 1997_FD85 | 4 | 0.5 m | 31 May 1997 | IGN |
| 2024 | Google Earth VHR | | ≈0.30 m | 2024 | Maxar |
Table 2. Quantitative Evaluation of Spectral Restoration. Average performance metrics on the validation set.
| Metric | Value |
|--------|-------|
| Average PSNR | 35.21 dB |
| Average SSIM | 0.9762 |
Table 3. Architecture Comparison on Modern Validation Data.
| Model | mIoU |
|-------|------|
| FPN | 0.9695 |
| DeepLabV3+ | 0.9720 |
| SegFormer | 0.9743 |
| U-Net++ (selected) | 0.9789 |
Table 4. Segmentation Performance After Few-Shot Adaptation to Historical Years.
| Year | Accuracy | mIoU |
|------|----------|------|
| 1920 | 0.9090 | 0.6508 |
| 1945 | 0.8411 | 0.5287 |
| 1971 | 0.7768 | 0.5295 |
| 1997 | 0.8305 | 0.5818 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.