This case study examines the application of Convolutional Neural Networks (CNNs) to LULC change detection using Sentinel-2 imagery, focusing on a specific region over a defined time period. It provides a concrete example of the theoretical principles discussed in the review and shows how CNNs and remote sensing techniques are applied in a real-world scenario. It also documents the specific steps taken, namely data acquisition, preprocessing, model training, and evaluation, that are critical for successful implementation. This transparency can guide future researchers and practitioners in their own studies.
4.1. Study Area
The study area selected for this case study is the city of Porto, Portugal, known for its diverse land use patterns, including urban, agricultural, and natural landscapes. The region’s rapid urbanization and changing land use dynamics make it a suitable candidate for LULC change detection.
The Sentinel-2 mission provides high-resolution multispectral images with a spatial resolution of 10 to 60 meters across various spectral bands. For this study, we utilized the following bands: Band 2 (Blue, 490 nm), Band 3 (Green, 560 nm), Band 4 (Red, 665 nm), Band 5 (Vegetation Red Edge, 705 nm), Band 6 (Vegetation Red Edge, 740 nm), Band 7 (Vegetation Red Edge, 783 nm), Band 8 (Near Infrared, 842 nm), Band 11 (Shortwave Infrared, 1610 nm), Band 12 (Shortwave Infrared, 2190 nm).
A key objective of this study was to measure the changes in land cover in Porto over a significant time span, utilizing historical remote sensing data. To achieve this, we selected three specific time points for analysis: one representing the present (2020), one reflecting a more distant past (1950), and an intermediate period (1980). The land cover categories used in this study are (1) Artificial Built Elements, ABE; (2) Trees and Shrubs, TRS; (3) Herbaceous, HER; (4) Sparsely Vegetated—Terrestrial, SPV; and (5) Sparsely Vegetated—Aquatic, AQU.
4.2. Methodology
The methodology for Land Use and Land Cover (LULC) change detection using Sentinel-2 imagery and Convolutional Neural Networks (CNNs) encompasses several critical steps, including data preprocessing, labeling, model training, and evaluation. The following sections provide a detailed description of each component of the methodology.
4.2.1. Preprocessing
The preprocessing steps include atmospheric correction, cloud masking, image resampling, and data augmentation. Atmospheric correction is a critical step in remote sensing that aims to remove atmospheric effects from the satellite images, thus providing accurate surface reflectance values. In this study, we applied the Sen2Cor algorithm (https://step.esa.int/thirdparties/sen2cor/2.10.0/Sen2Cor-User-Guide-2.10.0.pdf), which is specifically designed for Sentinel-2 data. Sen2Cor processes Level-1C products (top-of-atmosphere reflectance) and generates Level-2A products (surface reflectance). This algorithm accounts for atmospheric scattering and absorption, as well as the effects of clouds and aerosols. The output is a set of images with surface reflectance values that accurately represent the land cover characteristics without atmospheric interference.
Clouds can significantly affect the quality of satellite imagery by obscuring the surface and introducing noise in the data. To address this issue, we utilized the cloud mask provided with Sentinel-2 products. The cloud mask identifies cloudy pixels based on spectral characteristics, allowing us to exclude these pixels from further analysis. By applying the cloud mask, we ensured that only clear-sky pixels were included in the dataset, leading to more reliable results in the subsequent LULC classification.
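The masking step can be sketched as follows. This is a minimal illustration, assuming the cloud mask has already been read into a boolean array aligned with the reflectance bands; the function name, the array names, and the use of NaN as a no-data marker are assumptions made for the example, not details reported for the study.

```python
import numpy as np

def apply_cloud_mask(reflectance, cloud_mask):
    """Set cloudy pixels to NaN in every band.

    reflectance: float array of shape (bands, height, width)
    cloud_mask:  bool array of shape (height, width), True where cloudy
    """
    masked = reflectance.astype(np.float32, copy=True)
    masked[:, cloud_mask] = np.nan  # the 2-D mask is applied to all bands
    return masked

# Toy example: 2 bands, 2x2 pixels, one cloudy pixel (top-right)
reflectance = np.array([[[0.10, 0.90], [0.12, 0.11]],
                        [[0.30, 0.95], [0.32, 0.31]]])
cloud_mask = np.array([[False, True], [False, False]])

clear = apply_cloud_mask(reflectance, cloud_mask)
clear_fraction = 1.0 - cloud_mask.mean()  # share of usable pixels
```

Tracking the clear-sky fraction per scene is one simple way to decide whether a scene is usable at all before it enters the classification pipeline.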
To maintain consistency across the dataset, all Sentinel-2 images were resampled to a uniform spatial resolution of 10 meters. This step is crucial because Sentinel-2 imagery includes bands with varying spatial resolutions (10 m, 20 m, and 60 m). By resampling all bands to 10 m, we ensured that the CNN model could process the data without encountering issues related to differing resolutions. The resampling process was performed using bilinear interpolation, which helps to preserve the spatial characteristics of the imagery.
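To make the resampling step concrete, the sketch below upsamples a single band by an integer factor (2 for a 20 m band resampled to 10 m) using bilinear interpolation written directly in NumPy. The function name and the pixel-centre alignment convention are assumptions for illustration; in practice a GIS or raster library would normally handle this step.

```python
import numpy as np

def bilinear_upsample(band, factor):
    """Upsample a 2-D band by an integer factor with bilinear interpolation."""
    h, w = band.shape
    # Output pixel centres expressed in input pixel coordinates
    rows = np.clip((np.arange(h * factor) + 0.5) / factor - 0.5, 0, h - 1)
    cols = np.clip((np.arange(w * factor) + 0.5) / factor - 0.5, 0, w - 1)
    r0 = np.floor(rows).astype(int)
    c0 = np.floor(cols).astype(int)
    r1 = np.minimum(r0 + 1, h - 1)
    c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]  # fractional row offsets
    fc = (cols - c0)[None, :]  # fractional column offsets
    top = band[r0][:, c0] * (1 - fc) + band[r0][:, c1] * fc
    bottom = band[r1][:, c0] * (1 - fc) + band[r1][:, c1] * fc
    return top * (1 - fr) + bottom * fr

band_20m = np.array([[0.0, 1.0],
                     [2.0, 3.0]])
band_10m = bilinear_upsample(band_20m, factor=2)  # shape (4, 4)
```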
To enhance the robustness of the model and prevent overfitting, we implemented data augmentation techniques on the training dataset. Data augmentation involves artificially increasing the size of the training set by creating modified versions of the original images. One of the techniques used was rotation, where images were rotated by multiples of 90° (90°, 180°, and 270°) to introduce variability in orientation. Additionally, we implemented flipping, which involved both horizontal and vertical flipping of images to create mirror images. This technique helped the model learn to recognize features from different perspectives. Another augmentation method employed was scaling, where images were randomly scaled to simulate various distances from the sensor. Finally, we made random adjustments to the brightness of the images to account for variations in illumination conditions. Together, these techniques increased the diversity of the training dataset, allowing the CNN model to generalize better to unseen data.
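A minimal sketch of the rotation, flipping, and brightness augmentations is given below. The patch size, the flip probability of 0.5, and the brightness gain range of 0.9 to 1.1 are assumed values chosen for illustration, and random scaling is omitted for brevity.

```python
import numpy as np

def augment(patch, rng):
    """Return a randomly rotated, flipped and brightness-adjusted copy.

    patch: float array of shape (height, width, bands), reflectance in [0, 1]
    """
    k = rng.integers(0, 4)                  # 0, 90, 180 or 270 degrees
    out = np.rot90(patch, k=k, axes=(0, 1))
    if rng.random() < 0.5:
        out = np.flip(out, axis=1)          # horizontal flip
    if rng.random() < 0.5:
        out = np.flip(out, axis=0)          # vertical flip
    gain = rng.uniform(0.9, 1.1)            # assumed brightness range
    return np.clip(out * gain, 0.0, 1.0)

rng = np.random.default_rng(seed=0)
patch = rng.random((64, 64, 9))             # synthetic 9-band patch
augmented = augment(patch, rng)
```

When labels are stored per pixel, the same rotation and flips would have to be applied to the label raster so that images and labels stay aligned.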
4.2.2. Labeling
To create a comprehensive labeled dataset, we drew upon existing land cover maps provided by local authorities and remote sensing organizations. These maps offered valuable baseline information regarding the various land cover types within the study area, serving as essential references for training our model. By integrating this reliable data, we ensured that our labeled dataset accurately reflected the diverse land cover characteristics, thereby enhancing the model’s ability to learn and make accurate predictions.
In addition to utilizing existing land cover maps, we undertook the process of manual digitization to achieve a more accurate definition of land use and land cover (LULC) classes. This involved a careful visual interpretation of the Sentinel-2 imagery, allowing us to delineate various land cover types effectively. Urban areas, characterized by the presence of buildings and infrastructure, were identified by their high reflectance values in the visible spectrum. Agricultural land was recognized through its distinct patterns and colors, often displaying seasonal variations that reflect different crop types and growth stages. Forests were discerned based on their dense green canopy and the higher reflectance observed in the near-infrared spectrum, which is typical of healthy vegetation. Water bodies, on the other hand, were characterized by low reflectance values and specific spectral signatures in the blue and near-infrared bands.
To ensure the accuracy of our digitized features, we validated them using ground truth data where available. This step was crucial in confirming that our interpretations aligned with actual land cover conditions. Each pixel in the labeled dataset was subsequently assigned a class label corresponding to its respective land cover type. This meticulously constructed labeled dataset served as the foundation for training the Convolutional Neural Network (CNN) model, enabling it to learn and accurately predict LULC changes in the study area.
4.2.3. Architecture of the Convolutional Neural Network (CNN)
The architecture of the CNN designed for Land Use and Land Cover (LULC) change detection using Sentinel-2 imagery consists of several key components that work in unison to process input data, extract relevant features, and perform classification.
At the outset, the input layer serves as the gateway to the CNN, where the preprocessed Sentinel-2 images are introduced to the model. For this study, the input layer accepts multi-channel images corresponding to the nine spectral bands of Sentinel-2, which include bands such as Blue, Green, Red, Vegetation Red Edge, Near Infrared, and Shortwave Infrared. Each input image is represented as a three-dimensional tensor, encompassing dimensions that correspond to the height and width of the image as well as the number of channels (bands). This multi-channel input allows the model to utilize the rich spectral information inherent in Sentinel-2 data.
The core of the CNN architecture lies in the convolutional layers, which are responsible for feature extraction from the input images. These layers consist of multiple filters (kernels) of size 3x3 and 5x5 that convolve over the input data to capture spatial hierarchies and patterns. After each convolution operation, a Rectified Linear Unit (ReLU) activation function is applied to introduce non-linearity into the model, allowing it to learn complex patterns. By stacking multiple convolutional layers, the architecture can learn increasingly abstract features, transitioning from basic edges and textures in the initial layers to more complex shapes and structures in deeper layers.
Following the convolutional layers, pooling layers are employed to reduce the spatial dimensions of the feature maps generated. This downsampling is crucial as it decreases the computational load, mitigates the risk of overfitting, and enhances the model’s invariance to small translations in the input data. Max pooling is utilized in this architecture, where the maximum value within a specified window (e.g., 2x2) is taken as the representative value for that region. This operation effectively retains the most salient features while discarding less important information. After 2 convolutional and pooling layers, the high-level features extracted from the input images are flattened into a one-dimensional vector and fed into one or more fully connected layers. These layers are essential for classification, as each neuron in a fully connected layer is linked to all neurons in the preceding layer. This structure enables the model to learn global patterns and relationships among the features. The final fully connected layer outputs a vector of logits corresponding to the number of LULC classes defined in the dataset.
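The effect of the two convolution and pooling stages on the feature map size can be traced with a few lines of arithmetic. The sketch assumes a 64x64 input patch, a 3x3 convolution followed by a 5x5 convolution, no padding, stride 1 for convolutions, and 64 filters in the last convolutional layer; none of these values are stated in the text and they serve only to illustrate the calculation.

```python
def layer_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 64                               # assumed patch width/height in pixels
size = layer_out(size, kernel=3)        # 3x3 convolution  -> 62
size = layer_out(size, kernel=2, stride=2)  # 2x2 max pooling -> 31
size = layer_out(size, kernel=5)        # 5x5 convolution  -> 27
size = layer_out(size, kernel=2, stride=2)  # 2x2 max pooling -> 13

n_filters = 64                          # assumed filter count of the last conv layer
flattened = size * size * n_filters     # length of the vector fed to the dense layers
```

Under these assumptions the flattened vector has 10,816 elements, which shows why pooling matters for keeping the fully connected layers at a manageable size.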
The output layer, the final component of the CNN architecture, is responsible for generating the predicted probabilities for each LULC class. A softmax activation function is applied to the output of the last fully connected layer, converting the logits into probabilities that sum to one across all classes. The class with the highest probability is selected as the model’s prediction for each pixel in the input image.
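The conversion from logits to class probabilities and predicted labels can be illustrated as follows. The logit values are made up for the example; the class abbreviations are those used in this study.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Convert logits to probabilities that sum to one along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

CLASSES = ["ABE", "TRS", "HER", "SPV", "AQU"]

# Hypothetical logits for two pixels
logits = np.array([[2.0, 0.5, 0.1, -1.0, -2.0],
                   [0.0, 0.2, 3.0, 0.1, -0.5]])

probs = softmax(logits)
predicted = [CLASSES[i] for i in probs.argmax(axis=-1)]
```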
For Land Use and Land Cover (LULC) change prediction, the baseline architecture was extended with several enhancements aimed at increasing the model's effectiveness and accuracy.
One of the primary enhancements is the input layer, which benefits from multi-temporal input. Instead of relying solely on a single time slice for predictions, this approach utilizes multi-temporal input images, such as those from 1950, 1980 and 2020. By doing so, the model can effectively learn the temporal dynamics of land cover changes, capturing shifts and transformations over time. Additionally, feature stacking is employed, where spectral bands are stacked alongside derived indices, such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). This technique enriches the model’s input data by providing more context regarding vegetation health and water presence, thereby enhancing classification performance.
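Feature stacking can be sketched as below. The text does not specify which NDWI formulation was used; the example assumes the green/near-infrared form, (Green - NIR) / (Green + NIR), and uses the Sentinel-2 band names B2 to B12 as dictionary keys. The function names and the small `eps` term that guards against division by zero are illustrative choices.

```python
import numpy as np

BAND_ORDER = ["B2", "B3", "B4", "B5", "B6", "B7", "B8", "B11", "B12"]

def normalized_difference(a, b, eps=1e-6):
    """(a - b) / (a + b), with eps guarding against division by zero."""
    return (a - b) / (a + b + eps)

def stack_with_indices(bands):
    """bands: dict mapping band name -> 2-D reflectance array.

    Returns an array of shape (height, width, 11): nine bands plus NDVI and NDWI.
    """
    ndvi = normalized_difference(bands["B8"], bands["B4"])  # (NIR - Red) / (NIR + Red)
    ndwi = normalized_difference(bands["B3"], bands["B8"])  # assumed green/NIR form
    layers = [bands[name] for name in BAND_ORDER] + [ndvi, ndwi]
    return np.stack(layers, axis=-1)

# Synthetic 4x4 scene with constant reflectance per band
bands = {name: np.full((4, 4), 0.2) for name in BAND_ORDER}
bands["B8"] = np.full((4, 4), 0.5)   # NIR
bands["B4"] = np.full((4, 4), 0.1)   # Red
stacked = stack_with_indices(bands)
```

For multi-temporal input, one stacked array per date could be concatenated along the last axis, although the exact arrangement used in the study is not described.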
Advancements in the convolutional layers further strengthen the model. The architecture incorporates residual connections, inspired by ResNet, which facilitate the flow of gradients through the network. This implementation is particularly beneficial for training deeper networks, as it mitigates the issue of vanishing gradients that can hinder learning in complex models. Furthermore, dilated convolutions are utilized to expand the receptive field without sacrificing resolution. This method enables the model to capture broader contextual information within the images, which is crucial for effectively distinguishing between various land cover types.
In addition to these advancements, the model integrates attention mechanisms to enhance its focus on critical areas within the input images. By incorporating spatial attention, the model can prioritize important regions, improving its ability to detect subtle changes in land cover. This focus on relevant features aids in producing more accurate predictions. Additionally, channel attention mechanisms are employed to assess the significance of different spectral bands. This allows the model to determine which bands contribute most to the classification task, ultimately improving overall accuracy and effectiveness in LULC change prediction.
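The text does not state which channel attention design was used. As one possible reading, the sketch below follows the squeeze-and-excitation pattern: each channel is averaged over space, passed through a small bottleneck, and turned into a weight between 0 and 1 that rescales that channel. The weight matrices here are random placeholders standing in for learned parameters.

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation style channel re-weighting (illustrative).

    features: (height, width, channels)
    w1: (channels, hidden), w2: (hidden, channels)
    """
    squeezed = features.mean(axis=(0, 1))            # global average pool -> (channels,)
    hidden = np.maximum(squeezed @ w1, 0.0)          # ReLU bottleneck
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gate in (0, 1)
    return features * weights, weights

rng = np.random.default_rng(0)
features = rng.random((16, 16, 9))   # synthetic feature map with 9 channels
w1 = rng.normal(size=(9, 3))         # placeholder for learned weights
w2 = rng.normal(size=(3, 9))
reweighted, weights = channel_attention(features, w1, w2)
```

After training, inspecting `weights` would indicate which channels the model emphasizes for a given input.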
Together, these enhancements form a robust architecture capable of addressing the complexities of LULC change detection and prediction, making it a powerful tool for environmental monitoring and management.
4.2.4. Training and Validation
Training and validating the CNN model is a critical aspect of the methodology. To ensure robust performance and prevent overfitting, the dataset is divided into three distinct subsets: a training set (70%), a validation set (15%), and a testing set (15%). The training set is utilized to teach the model the relationships between input features and corresponding LULC classes, while the validation set is used to monitor and tune hyperparameters. The testing set is reserved for final evaluation to provide an unbiased assessment of the model’s accuracy.
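A 70/15/15 split of sample indices can be sketched as follows. The function name, the sample count, and the fixed seed are assumptions for the example.

```python
import random

def split_indices(n_samples, train=0.70, val=0.15, seed=42):
    """Shuffle sample indices and split them into train/validation/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(1000)
```

For raster data, splitting by patch or tile rather than by individual pixel is a common way to limit leakage between subsets caused by spatial autocorrelation.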
During the training process, categorical cross-entropy is employed as the loss function, measuring the difference between predicted probabilities and true class labels. This function is particularly well-suited for multi-class classification problems. The Adam optimizer is used to facilitate efficient weight updates, combining the advantages of the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp) to achieve quicker convergence. Throughout training, the model's performance is continuously evaluated using four key metrics: accuracy, precision, recall, and F1-score. These metrics are calculated on the validation set, allowing for adjustments to the model architecture, hyperparameters, and training strategies to optimize performance and ensure generalization to unseen data.
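The loss computation can be illustrated with a small worked example. The probabilities and labels below are invented; deep learning frameworks provide this loss directly, so the function is only meant to show what is being measured.

```python
import numpy as np

def categorical_cross_entropy(probs, one_hot, eps=1e-12):
    """Mean over samples of -sum_c y_c * log(p_c)."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1)))

# Two pixels, five classes (ABE, TRS, HER, SPV, AQU)
probs = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                  [0.20, 0.20, 0.50, 0.05, 0.05]])
one_hot = np.array([[1, 0, 0, 0, 0],    # true class: ABE
                    [0, 0, 1, 0, 0]])   # true class: HER

loss = categorical_cross_entropy(probs, one_hot)
# Equals -(ln 0.70 + ln 0.50) / 2, roughly 0.525
```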
4.2.5. Change Detection Analysis
Change detection analysis plays a pivotal role in understanding how land use and land cover (LULC) have evolved over time. In this study, we applied our trained Convolutional Neural Network (CNN) model to predict LULC classes for Sentinel-2 images captured in 1980, enabling us to identify and quantify changes compared to data from 2020.
To begin the change detection process, we utilized the CNN model trained on the earlier dataset. The model was applied to the preprocessed Sentinel-2 imagery for 2020, where it generated classification maps, assigning each pixel a predicted LULC class based on the features learned during training. This prediction process involved feeding the model with images processed in the same manner as the training data, ensuring consistency in multi-channel inputs corresponding to the nine spectral bands of Sentinel-2.
Once classification maps for 1950, 1980 and 2020 were obtained, we proceeded to generate change detection maps that visually and quantitatively represent the changes in land cover over the study period. The change detection methodology involved a pixel-by-pixel comparison of the predicted LULC classes between dates. By overlaying the classification maps, we were able to identify where changes had occurred.
A change detection matrix was created to systematically categorize all possible transitions between LULC classes. This matrix indicated whether urban land had converted to agricultural land, whether forest areas had been lost, or if new water bodies had formed. The categories defined in this analysis included “No Change,” which represented pixels that retained the same LULC class in both years, and “Conversion,” which captured pixels transitioning from one class to another. Each conversion was further broken down into specific transitions, such as urban to agricultural or forest to urban. Additionally, “New Class” was identified for pixels that changed from an unclassified state to a specific LULC class.
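A pixel-by-pixel transition matrix of this kind can be computed as sketched below, assuming the two classification maps are integer arrays of the same shape with class codes 0 to 4. The toy maps and the function name are illustrative.

```python
import numpy as np

CLASSES = ["ABE", "TRS", "HER", "SPV", "AQU"]

def transition_matrix(before, after, n_classes):
    """Count pixels moving from class i (rows) to class j (columns)."""
    pair_codes = before.ravel() * n_classes + after.ravel()
    counts = np.bincount(pair_codes, minlength=n_classes * n_classes)
    return counts.reshape(n_classes, n_classes)

# Toy 3x3 maps: three herbaceous pixels (code 2) become built-up (code 0)
before = np.array([[2, 2, 1],
                   [2, 0, 1],
                   [2, 2, 4]])
after = np.array([[0, 2, 1],
                  [0, 0, 1],
                  [0, 2, 4]])

matrix = transition_matrix(before, after, len(CLASSES))
changed = before != after              # boolean change map
her_to_abe = matrix[CLASSES.index("HER"), CLASSES.index("ABE")]
```

The diagonal of `matrix` holds the "No Change" pixels, and every off-diagonal cell corresponds to one specific conversion.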
To facilitate interpretation, the results were visualized using Geographic Information System (GIS) software. Change detection maps were created to visually represent areas of change, employing different colors to signify various types of transitions. Heat maps were also generated to indicate the intensity of changes across the study area, highlighting regions experiencing significant land cover transformations. Annotated maps provided context regarding these changes, such as urban expansion zones, deforestation sites, and areas undergoing agricultural conversion.
In addition to visual representation, a quantitative analysis was conducted to summarize the extent and magnitude of the observed changes. This involved calculating the total area affected by each type of change using GIS tools, providing metrics in hectares or square kilometers. The percentage change in each LULC class was determined by comparing the area of each class in one year to its area in another year, offering insights into the relative impact of changes on the landscape. Statistical tests may also be applied to assess the significance of observed changes, particularly in light of local environmental policies or urban planning initiatives.
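The area and percentage change calculations reduce to simple arithmetic once pixel counts per class are available. The sketch assumes 10 m pixels, so each pixel covers 100 square meters (0.01 ha); the pixel counts are hypothetical and do not come from the study.

```python
PIXEL_AREA_HA = (10 * 10) / 10_000   # one 10 m x 10 m pixel = 0.01 ha

def class_area_ha(pixel_count):
    """Area in hectares covered by a number of 10 m pixels."""
    return pixel_count * PIXEL_AREA_HA

def percent_change(area_start, area_end):
    """Relative change of a class area between two dates, in percent."""
    return 100.0 * (area_end - area_start) / area_start

start_ha = class_area_ha(120_000)    # hypothetical pixel count at the first date
end_ha = class_area_ha(150_000)      # hypothetical pixel count at the second date
change = percent_change(start_ha, end_ha)   # 1200 ha -> 1500 ha, a 25% increase
```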
The interpretation of results was crucial in understanding the implications of LULC changes. The analysis revealed patterns of urban expansion, highlighting areas where agricultural land was converted into urban uses, thereby informing urban planners and policymakers. Changes in forest cover or water bodies indicated potential environmental degradation or restoration efforts, prompting further investigation into the causes and implications of such changes. Ultimately, this comprehensive change detection analysis provided valuable insights into the dynamics of land use and land cover changes in the Porto region, supporting informed decision-making for sustainable land management and urban planning.
4.3. Results
For the five classes in the LULC classification (ABE, TRS, HER, SPV, and AQU), the confusion matrix in Figure 1 was obtained.
From the confusion matrix, various performance metrics can be calculated and are shown in Table 1.
Table 1. Classification results for 2020: recall, precision, and F1 score for each class, and overall accuracy.
| Class | Recall | Precision | F1 Score |
| --- | --- | --- | --- |
| Artificial Built Elements (ABE) | 0.945 | 0.978 | 0.961 |
| Trees and Shrubs (TRS) | 0.931 | 0.951 | 0.941 |
| Herbaceous Cover (HER) | 0.9205 | 0.95 | 0.932 |
| Sparsely Vegetated—Terrestrial (SPV) | 0.929 | 0.963 | 0.946 |
| Sparsely Vegetated—Aquatic (AQU) | 0.938 | 0.938 | 0.938 |
| Overall Accuracy | 0.9205 | | |
The CNN achieved an overall accuracy of 92% on the testing dataset, with a Kappa coefficient of 0.88. The F1-scores for individual classes varied, with urban areas achieving the highest scores due to their distinct spectral signatures.
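The way per-class metrics, overall accuracy, and the Kappa coefficient follow from a confusion matrix can be sketched as below. The 3x3 matrix is a made-up example, not the matrix of Figure 1, and the convention that rows hold true classes and columns hold predicted classes is an assumption.

```python
import numpy as np

def per_class_metrics(confusion):
    """confusion[i, j] = pixels of true class i predicted as class j."""
    tp = np.diag(confusion).astype(float)
    recall = tp / confusion.sum(axis=1)
    precision = tp / confusion.sum(axis=0)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def cohen_kappa(confusion):
    """Agreement between prediction and reference, corrected for chance."""
    total = confusion.sum()
    observed = np.trace(confusion) / total
    expected = (confusion.sum(axis=1) * confusion.sum(axis=0)).sum() / total**2
    return (observed - expected) / (1 - expected)

# Hypothetical 3-class confusion matrix
confusion = np.array([[90,  5,  5],
                      [10, 80, 10],
                      [ 0,  5, 95]])

precision, recall, f1 = per_class_metrics(confusion)
accuracy = np.trace(confusion) / confusion.sum()
kappa = cohen_kappa(confusion)
```

Kappa is lower than overall accuracy because it discounts the agreement that would be expected by chance given the class proportions.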
The change detection analysis revealed significant urban expansion in Porto, particularly in previously agricultural areas. Notably, urban land cover increased by approximately 15%, while agricultural land decreased correspondingly. The analysis also identified areas of deforestation and changes in water bodies, indicating shifts in land use patterns. Change detection maps were generated using GIS software, visually representing areas of change. The maps highlighted urban sprawl, conversion of agricultural land to urban use, and changes in green spaces.
From 1950 to 2020, Porto experienced considerable urban development, marked by a significant rise in artificial land use at the expense of green spaces (Figure 2). In 1950, the city had two main urban areas: one smaller zone in the southwest near the Douro River estuary and a larger urban core located in the central-southern part. Over the years, these urban areas expanded significantly, with the urban land cover (ULC) increasing from 31% in 1950 to 62% in 2020, resulting in an almost continuous urban landscape.
Even with this substantial growth in urbanized land, the number of individual artificial built elements remained relatively constant. This indicates that instead of new urban areas forming, the existing ones simply grew larger. This pattern highlights a trend of urban consolidation, where the development of the city focused on intensifying and expanding established urban spaces rather than creating entirely new regions.
Significant areas of tree-covered vegetation (Forest) are primarily located in the outskirts of Porto, particularly at the western and eastern ends, with most of these areas preserved since the mid-20th century (Figure 2). As a result, the percentage of tree cover within the city, along with the number of patches, has remained relatively stable from 1950 to 2020, ranging from 22% to 25% in coverage and between 750 and 783 patches (Figure 2). Conversely, the city has seen a dramatic decline in herbaceous cover (HER): in 1950, herbaceous vegetation dominated the landscape with about 40% coverage, which plummeted to just 10% by 2020 (Figure 2 and Figure 3). A visual assessment of land cover maps, particularly through analyzing the increase in the number of patches alongside the decrease in coverage percentage, reveals that herbaceous habitats have been the most significantly impacted in terms of both availability and connectivity.
While the classes of Sparsely Vegetated—Terrestrial (SPV) and Sparsely Vegetated—Aquatic (AQU) are recognized as part of ecologically important habitats, they have remained stable throughout the period from 1950 to 2020 and are minimal in this study area (always less than 5%; Figure 3). Therefore, SPV and AQU will not be analyzed in depth, as their impact on the interpretation of land cover changes in Porto is limited.