4.1. Atmospheric Correction
The utility of an atmospheric correction algorithm (ACA) for pollutant detection depends on two things: a) the technical characteristics of the output, and b) the quality of the correction, i.e. how accurate it is. The first covers how many bands are processed and the horizontal resolution of the output. The Sentinel-2 bands do not share a single horizontal resolution: the grid spacing is 10, 20 or 60 m depending on the band, which can be a limiting factor for some applications. For example, C2RCC generates output only for the first eight bands, which include Narrow NIR (865 nm) but not NIR (832 nm). The output is given at 10 m pixels, even for bands with an original pixel size of 20 or 60 m. In contrast, iCOR generates output for all MSI bands and retains the resolution of the original Sentinel-2 bands. Polymer generates output for all bands at 10 m resolution, the only exception being the second SWIR band (2202 nm), which is not processed at all. Finally, ACOLITE generates output for all available bands at 10 m resolution; the latter, however, is only partially true. Although the output is delivered at a 10 m pixel size, the “visual resolution” is the same as the resolution of the initial MSI bands. This means that, for a 20 m band like B6 (704 nm), four 10 m pixels forming a square have relatively similar values and visually form one 20 m pixel that cannot be resolved into four 10 m pixels.
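The “visual resolution” effect can be illustrated with a minimal sketch. It assumes that a 20 m band is brought to the 10 m grid by simple pixel replication; this is an assumption made for illustration only, since the resampling actually used by ACOLITE is not described here and the text speaks of "relatively similar" rather than identical values. The function name and the sample values are hypothetical.

```python
def upsample_by_replication(band_coarse, factor=2):
    """Resample a coarse grid to a finer grid by repeating each pixel.

    Assumption: plain replication (nearest neighbour), used here only to
    show why a 10 m output can still look like a 20 m image.
    """
    fine = []
    for row in band_coarse:
        # Repeat every value 'factor' times along the row ...
        fine_row = [value for value in row for _ in range(factor)]
        # ... and repeat the whole row 'factor' times along the column.
        for _ in range(factor):
            fine.append(list(fine_row))
    return fine


# Hypothetical 2x2 patch of a 20 m band
band_20m = [[0.010, 0.020],
            [0.030, 0.040]]

# The 4x4 result has 10 m pixels, but each 2x2 block carries one value
band_10m = upsample_by_replication(band_20m)
```

Each 2x2 block of the result holds a single value, so the nominal pixel size is halved without adding any spatial detail.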
When it comes to the detection of our formations, ACOLITE performs well. All formations are clearly represented with anomalous values in most bands. This representation, however, is not the same for all bands. For example, the first three bands (443, 493, 560 nm) represent these formations with reduced values (negative anomalies) relative to the surrounding clear water, whereas the remaining bands represent them with increased values (positive anomalies) (Figure 2). This is illustrated in the representation of the 18 Jan 2021 oil spill with the Green (560 nm) and NIR bands (Figure 3 and Figure 4, respectively). The transition from negative to positive anomalies implies that there is a wavelength at which the anomaly changes sign, and bands close to this spectral threshold can lack detection capability. This is exactly what happens for the 30 Dec 2017 event, where the Red band does not exhibit a pattern that resembles these formations (Figure 5), indicating a lack of detection capability for that particular event. This does not happen for the other two cases, where the formation can be clearly identified in the Red band. Regarding the ability of the corrected bands to detect the formations, we must look at the band sensitivity. For ACOLITE, the highest sensitivity is achieved with NIR and Narrow NIR (Figure 2), indicating their detection potential. In fact, the sensitivity increases with wavelength from 443 nm to the NIR region and decreases again into the SWIR region.
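The sign change of the anomaly can be sketched as follows. The sketch assumes that the anomaly of a band is the mean value over the formation minus the mean value over the surrounding clear water; this definition, the function names and all numerical values are hypothetical and chosen only to reproduce the qualitative behavior described above.

```python
def band_anomalies(formation_mean, water_mean):
    """Anomaly per band: formation mean minus clear-water mean (assumed definition)."""
    return {band: formation_mean[band] - water_mean[band] for band in formation_mean}


def sign_change_interval(anomalies):
    """Return the pair of adjacent bands between which the anomaly turns positive."""
    bands = sorted(anomalies)
    for lower, upper in zip(bands, bands[1:]):
        if anomalies[lower] < 0 <= anomalies[upper]:
            return lower, upper
    return None


# Hypothetical mean values keyed by central wavelength in nm
water = {443: 0.0200, 493: 0.0220, 560: 0.0250, 665: 0.0120, 832: 0.0060}
formation = {443: 0.0160, 493: 0.0190, 560: 0.0230, 665: 0.0125, 832: 0.0120}

anomalies = band_anomalies(formation, water)
transition = sign_change_interval(anomalies)
```

With these illustrative numbers the anomaly changes sign between the Green and Red bands, and the Red band anomaly is close to zero, which is the situation in which a band near the threshold loses detection capability.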
For iCOR, on the other hand, the bands for the 18 Jan 2021 event behave similarly to those of ACOLITE (Figure 2). The formations are represented with reduced values in the lower part of the MSI spectrum and with increased values in the middle and upper parts, with the maximum again occurring in the NIR region. Things differ only for the weak formations of 2017. In this case, the first three bands (Coastal Aerosol, Blue and Green) exhibit increased values for the polluted pixels. Such inconsistency can add uncertainty to any detection method. Another limitation of iCOR is that the algorithm requires spectral diversity in the processed image to run properly. This is relevant for the 18 Jan 2021 event, when the land surrounding our lake was fully covered with snow, which may have affected the atmospheric correction in ways that are unknown at this point and therefore could not be addressed in this study. Regarding the sensitivity of the iCOR-corrected bands, it is several times (from 2 to 7) greater than that of ACOLITE (Figure 2, Figure 3 and Figure 4), indicating a greater detection potential.
C2RCC, on the other hand, identifies the pollutants only with negative anomalies (Figure 2, Figure 3 and Figure 4). Despite its advantage of delivering the first eight bands at the highest possible resolution (10 m), it has severe limitations for this application. Parts of the formations are automatically masked out, especially where the formations are relatively thicker or more intense. This is particularly the case for the 18 Jan 2021 event, where the formation has land-like optical properties and is identified by C2RCC as land. The reason is that C2RCC is an atmospheric correction based on neural networks, which are trained on a large sample of waters to estimate water optical properties suitable for detecting water constituents, i.e. chl-a and TSM. It is therefore only valid for water bodies. This is a limitation for our application, given that the most intense parts of the formations are not identified as such. Another disadvantage of C2RCC is that its detection capability varies. For example, it can clearly detect the formation of 18 Jan 2021 (even if by masking it as land), but it is unable to detect the one from 30 Dec 2017. Overall, it has a maximum sensitivity around 560 nm, which rapidly decreases to zero towards the Red Edge and NIR spectral regions.
Unlike the first three ACAs, Polymer has a much lower ability to detect such formations overall. The pixels identified are represented by rather weak anomalies. Additionally, these values do not exhibit a robust structure, i.e. they do not form a cluster of uniformly anomalous values that contrasts with the surrounding clear water. This happens across bands and can be clearly seen for the 18 Jan 2021 event (Figure 4), where the pollutants are less well defined than with the other algorithms. However, this is not unexpected, given that Polymer relies on a water reflectance model based on Chl and is simply not designed for polluted water. Both Polymer and C2RCC work pixel by pixel, making assumptions about the water reflectance without using the entire MSI image. ACOLITE and iCOR, on the other hand, make use of the whole image (land and water).
Given that the ultimate goal is to use the corrected bands to detect formations of unknown origin, the question is which algorithm should be used for the correction and which bands are best suited for this. Our results clearly show that only iCOR- and ACOLITE-corrected bands have the potential to successfully detect the pollution events in question. However, iCOR exhibits some inconsistency in the type of anomaly with which our formations are represented, which raises the uncertainty level and leads us to suggest ACOLITE as the better atmospheric correction for such applications. The band choice depends on the band resolution: the higher the resolution, the more useful the band, especially for small events. The recommendation is therefore to use the four 10 m resolution bands (Red, Green, Blue, NIR). In the future, however, the use of the 20 m bands should not be ruled out, given that ACOLITE makes it possible to use these bands for detection. Bands from different algorithms could also be combined.
4.2. Parameter Learning and Optimization
Our study focuses on formations that were observed on 30 Dec 2017 and 18 Jan 2021, in order to examine these specific cases thoroughly. Given the small number of formations on these two days, it would be hard to implement an object-based algorithm, which requires a relatively large number of similar formations for training. Therefore, a pixel-based neural network algorithm was implemented instead.
The first step is to create the training dataset. For this, we create false color images (FCI) and inspect the identified formations. The FCI are generated by combining three atmospherically corrected bands (Red, Green and NIR), which offer the highest possible resolution and detection capacity. Five rectangular patches that depict such formations are chosen (roughly 100x100 pixels each), three from the first image (30 Dec 2017) and two from the second (18 Jan 2021). Figure 6a and Figure 7a illustrate two of the formations used in our study, one from each Sentinel-2 image. The pixels corresponding to each formation in each patch are then annotated via visual inspection as 1, and an equal number of pixels corresponding to clear water are labeled as 0 (Figure 6b and Figure 7b). This resulted in five groups of oil spill pixels containing 108, 74, 75, 156, and 137 pixels. Additionally, given that the first MSI image contained only very small formations, we generated artificial patches for practical reasons by piecing smaller patches together into a larger one (Figure 6a). The annotation could not cover the entire extent of the formations, for two reasons. First, some parts exhibit relatively weak color contrast with the environment and could not be distinguished from the surrounding clear water. This is true mostly at the periphery of the formations, where the pollution layer is relatively thin. Second, including pixels with very weak color contrast with clear water could make our model too sensitive and lead to an increased false positive rate, which had to be avoided.
One thing to note here is that the pixels corresponding to the formations chosen from the two images had different characteristics, forming two distinct groups of values in all bands (Figure 8, scatterplot). This means that the two pixel groups had to be carefully balanced numerically; otherwise, training the DNN on values predominantly from one group would result in a model that might fail to detect pixels corresponding to the other group.
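One possible way to balance the two pixel groups is random undersampling of the larger group, sketched below. This is only an illustration: the balancing procedure actually used is not specified in the text. The sketch further assumes that the first three reported group sizes (108, 74, 75) belong to the 30 Dec 2017 image and the last two (156, 137) to the 18 Jan 2021 image; the function name and the placeholder pixel tuples are hypothetical.

```python
import random


def balance_groups(group_a, group_b, seed=0):
    """Randomly undersample both groups to the size of the smaller one."""
    rng = random.Random(seed)
    n = min(len(group_a), len(group_b))
    return rng.sample(group_a, n), rng.sample(group_b, n)


# Assumed assignment of the reported group sizes to the two images
sizes_2017 = [108, 74, 75]
sizes_2021 = [156, 137]

# Placeholder pixels: (image label, running index) instead of real band values
group_2017 = [("2017", i) for i in range(sum(sizes_2017))]
group_2021 = [("2021", i) for i in range(sum(sizes_2021))]

balanced_2017, balanced_2021 = balance_groups(group_2017, group_2021)
```

Under these assumptions the two groups contain 257 and 293 formation pixels, so both would be reduced to 257 pixels. Oversampling the smaller group or weighting the loss would be alternative choices.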
We conduct three types of experiments. In the first, two vectors containing the corresponding Green and NIR values are fed to the algorithm along with the proper label. In the second, we use Red and NIR instead, and in the third we use all four 10 m bands (492, 560, 665, and 832 nm). In each case, we implement a Deep Neural Network (DNN) with two hidden layers and an output layer. The hidden layers have 12 and 8 units and use a ReLU activation. The output layer uses a sigmoid activation and distinguishes only two output classes. The DNN is run with eight different learning rates (ranging from 0.00005 to 0.1), a batch size of 10, and 100 epochs. Two optimizers are tested, Adam and SGD. Because the weights of the neural network are randomly initialized on every run, leading to a different result each time, we run the model 10 times with the same combination of hyperparameters and obtain the final result by averaging our metrics over these 10 runs.
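The architecture and the averaging over repeated runs can be sketched in plain Python as follows. This is a minimal illustration of the forward pass only, not the implementation used in the study: training (Adam or SGD, batch size 10, 100 epochs) is omitted, the single sigmoid output unit and the Glorot-uniform initialization are assumptions, and the learning rate grid is taken from the values listed in Table 1. The input pixel values are hypothetical.

```python
import math
import random

# 4 input bands, hidden layers of 12 and 8 units, one sigmoid output unit (assumed)
LAYER_SIZES = [4, 12, 8, 1]
# The eight learning rates listed in Table 1
LEARNING_RATES = [0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001, 0.00005]


def init_network(layer_sizes, rng):
    """Randomly initialise one (weights, biases) pair per layer."""
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot-uniform range (assumption)
        weights = [[rng.uniform(-limit, limit) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, pixel):
    """Return the formation probability for one pixel (list of band values)."""
    activations = pixel
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activations)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            activations = [max(0.0, v) for v in z]                  # ReLU
        else:
            activations = [1.0 / (1.0 + math.exp(-v)) for v in z]   # sigmoid
    return activations[0]


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


def mean_over_runs(metric_fn, n_runs=10):
    """Average a metric over runs that differ only in their random seed."""
    return sum(metric_fn(random.Random(seed)) for seed in range(n_runs)) / n_runs


# Hypothetical band values for one pixel (Blue, Green, Red, NIR)
pixel = [0.020, 0.030, 0.015, 0.010]
n_params = count_parameters(init_network(LAYER_SIZES, random.Random(0)))
mean_probability = mean_over_runs(
    lambda rng: forward(init_network(LAYER_SIZES, rng), pixel))
```

With four input bands this network has 173 trainable parameters, which is small relative to the few hundred annotated pixels per class. In practice `mean_over_runs` would wrap a full train-and-evaluate cycle rather than an untrained forward pass.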
4.3. Results
Initially, we take the five identified and annotated formations (observed on 30 Dec 2017 and 18 Jan 2021) from the Sentinel-2 images and run a 5-fold validation process, in which four of the formations are used to train the algorithm and the remaining one is used for testing. We do this five times, rotating the formation held out for testing. The metrics we use to measure the performance of the algorithm are the false positive rate (FPR), intersection over union (IoU), and accuracy. The results share common characteristics across all three types of experiments. First, only the Adam optimizer performs adequately. The SGD optimizer, on the other hand, fails to accurately predict the formations in almost every run, either because of a large number of false positives (FPs) or because of a complete lack of detection capability. Second, the optimal learning rate is roughly in the range 0.0005-0.001 for the IoU and 0.00005-0.0001 for the FPR.
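The rotation of the formations through the test role can be sketched as a leave-one-formation-out loop. The formation labels and the function name below are hypothetical placeholders.

```python
def leave_one_formation_out(formation_ids):
    """Yield (training formations, held-out formation) for each rotation."""
    for held_out in formation_ids:
        train = [f for f in formation_ids if f != held_out]
        yield train, held_out


# Hypothetical labels for the five annotated formations
formations = ["F1", "F2", "F3", "F4", "F5"]
folds = list(leave_one_formation_out(formations))
```

Splitting by formation rather than by pixel keeps all pixels of the test formation out of training, so each fold measures how well the model transfers to a formation it has not seen.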
As for the optimal band combination for detecting the formations, the 4-Band combination outperforms both 2-band combinations, with Red-NIR being the least successful. More specifically, the best results the DNN achieves during training (averaged over all formations) are 0.04, 71.18 and 99.46% for the FPR, IoU, and accuracy, respectively. Figure 6c and Figure 7c show examples of the DNN prediction for two individual formations. A comparison of the annotation with the DNN prediction shows that the model captures the formations very well, which is also evident in the three metrics used.
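For reference, the three metrics can be computed from a flattened annotation mask and a flattened prediction mask as sketched below. The sketch assumes the standard definitions (IoU = TP / (TP + FP + FN), FPR = FP / (FP + TN), accuracy = (TP + TN) / N); the text does not spell out its definitions, and the example masks are hypothetical.

```python
def confusion_counts(truth, pred):
    """Count TP, FP, FN, TN for two equally long lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def iou(truth, pred):
    tp, fp, fn, _ = confusion_counts(truth, pred)
    union = tp + fp + fn
    return tp / union if union else 0.0


def false_positive_rate(truth, pred):
    _, fp, _, tn = confusion_counts(truth, pred)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


def accuracy(truth, pred):
    tp, _, _, tn = confusion_counts(truth, pred)
    return (tp + tn) / len(truth) if truth else 0.0


# Hypothetical flattened masks: 1 = formation, 0 = clear water
annotation = [1, 1, 1, 0, 0, 0, 0, 0]
prediction = [1, 1, 0, 1, 0, 0, 0, 0]

scores = (iou(annotation, prediction),
          false_positive_rate(annotation, prediction),
          accuracy(annotation, prediction))
```

In this toy case the IoU is 0.5, the FPR is 0.2 and the accuracy is 0.75. Because clear water dominates a scene, accuracy can stay high even when the IoU is modest, which is why the IoU and FPR are the more informative of the three.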
Table 1.
IoU and FPR results corresponding to the validation runs for the 4 Bands, Red-NIR and Green-NIR experiments. Values represent averages across the five rotational runs.
| LR | 4 Bands IoU (%) | 4 Bands FPR (%) | Green-NIR IoU (%) | Green-NIR FPR (%) | Red-NIR IoU (%) | Red-NIR FPR (%) |
|---|---|---|---|---|---|---|
| 0.1 | 7.48 | 30.5 | 7.41 | 28.03 | 1.48 | 32 |
| 0.05 | 12.7 | 30.43 | 15.83 | 30.13 | 18 | 18.26 |
| 0.01 | 43.08 | 1.79 | 54.64 | 0.74 | 54.37 | 0.82 |
| 0.005 | 46.84 | 1.8 | 62.94 | 0.63 | 58.15 | 0.93 |
| 0.001 | 60.43 | 0.93 | 69.82 | 0.42 | 63.09 | 0.8 |
| 0.0005 | 71.18 | 0.4 | 68.76 | 0.27 | 58.41 | 0.65 |
| 0.0001 | 33.78 | 0.05 | 40.88 | 0.05 | 31.09 | 0.21 |
| 0.00005 | 26.44 | 0.03 | 29.92 | 0.05 | 28.31 | 0.04 |
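The learning rates that optimize each metric can be read directly from Table 1. The short sketch below does this for the 4 Bands experiment, using the table values as given; the variable names are hypothetical.

```python
# 4 Bands results from Table 1: learning rate -> (IoU, FPR)
four_band_results = {
    0.1: (7.48, 30.5),
    0.05: (12.7, 30.43),
    0.01: (43.08, 1.79),
    0.005: (46.84, 1.8),
    0.001: (60.43, 0.93),
    0.0005: (71.18, 0.4),
    0.0001: (33.78, 0.05),
    0.00005: (26.44, 0.03),
}

# Learning rate with the highest IoU and the one with the lowest FPR
best_iou_lr = max(four_band_results, key=lambda lr: four_band_results[lr][0])
best_fpr_lr = min(four_band_results, key=lambda lr: four_band_results[lr][1])
```

The IoU peaks at a learning rate of 0.0005, while the FPR is lowest at 0.00005, so the two criteria favor different learning rates and the final choice is a trade-off between detection and false alarms.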
Additionally, even a perfect model is not expected to have a zero FPR. In fact, a number of FPs is naturally expected, and these represent predicted formation pixels that were not included in the annotation. This is captured in Figure 6a,b, where it is evident that a number of pixels depicting these formations are not included in the annotation, and therefore not in the training either, leading to a model that fails to detect the formation in its entirety.
Regarding the use of an atmospheric correction specifically designed for water applications, the question is whether this step is necessary. To answer this, we perform similar experiments using Level 2 Sentinel-2 images instead (called No Correction from now on). Note that for these images the default algorithm (Sen2Cor) is applied. Our results indicate that the use of ACOLITE yields far better metrics (IoU, FPR, accuracy) during the 5-fold validation step. Furthermore, the 4 Band and Green-NIR combinations perform slightly better than the Red-NIR configuration, which is broadly similar to the case where the ACOLITE algorithm has been used.
The purpose of this algorithm is to be used operationally to detect similar formations. One of the biggest concerns in such cases is an algorithm that raises false alarms too often, predicting formations when there are none. Therefore, after the validation process and the final training of the algorithm using the optimal hyperparameters and all formations in our training set, the model is tested on a number of Sentinel-2 images (28 images corresponding to 28 different dates) where no formations are present. The results show that the algorithm does not produce any false positives, indicating its potential for operational use.
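A false alarm check of this kind can be sketched as follows, assuming that a pixel is flagged as a formation when the sigmoid output reaches a threshold of 0.5. The threshold, the function name, the image names and the probability values are all hypothetical.

```python
def false_alarm_images(predictions, threshold=0.5):
    """Return the names of images with at least one pixel flagged as a formation."""
    return [name for name, probabilities in predictions.items()
            if any(p >= threshold for p in probabilities)]


# Hypothetical per-pixel probabilities for 28 formation-free images
clean_predictions = {f"image_{i:02d}": [0.01, 0.03, 0.02] for i in range(1, 29)}
alarms = false_alarm_images(clean_predictions)

# The same check on a set where one image contains a flagged pixel
with_alarm = dict(clean_predictions, image_29=[0.02, 0.91, 0.04])
alarms_with_event = false_alarm_images(with_alarm)
```

An empty list for the formation-free set corresponds to the outcome reported above, where none of the 28 images triggers a false alarm.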
To further evaluate the prediction skill of our algorithm, we use the oil spill case from an inland lake in Italy (Lake Pertusillo) published in Lavene et al. (2022). This event was chosen because of its optical similarity to our formations. That study mentions several Sentinel-2 images that might contain oil spills due to inland oil drilling activity. Before running the DNN algorithm, all images were processed with ACOLITE to correct for the atmospheric effect. Our trained DNN turns out to have some detection skill, especially for the 27 Feb 2017 case, where widespread oil contamination is present (and verified) throughout the lake. For the other cases mentioned in Lavene et al. (2022), on the other hand, the model does not exhibit the same detection skill and detects no oil spills. It should be noted, though, that some of these Sentinel-2 images are referred to as “suspect” images and are not accompanied by in situ measurements that could verify the presence of hydrocarbons. It should also be noted that Lake Pertusillo experiences algal blooms at the same time as the confirmed oil spills, which affects the visible part of the spectrum. This could explain why these oil spills do not have the same optical properties as the ones from the Polyphytos reservoir. For example, maximum sensitivity is associated with the Red Edge bands. Additionally, for the widespread oil spill of 27 Feb 2017, the boundaries of the oil spill can only be discerned in the Red Edge and NIR parts of the spectrum. The lack of sensitivity of the visible bands to oil spills could be partly responsible for the lack of detection skill, given that three of the four bands our algorithm uses are from the VIS spectrum. In conclusion, training a universal algorithm for the detection of such formations in inland waters would require a large number of such events covering a broad range of water types, which is not available at the moment.
Figure 9.
Sentinel-2 false color image (Green-Red-NIR) showing the oil spill case for the 27th February 2017 along with the DNN prediction.