1. Introduction
Methane (CH4) is the second most important anthropogenic greenhouse gas after carbon dioxide (CO2), surpassing CO2 in its heat-trapping effectiveness [1]. Human activities, such as livestock digestion, rice cultivation, and the fossil fuel industry, are the primary sources of anthropogenic CH4 emissions [2]. Satellite-based CH4 measurements play an important role in identifying emission sources, quantifying sinks, and devising strategies to mitigate climate change [3,4].
The Tropospheric Monitoring Instrument (TROPOMI) on ESA’s Sentinel-5 Precursor (S5-P) satellite achieves global coverage within a day and provides the column-averaged dry-air mole fraction of methane (XCH4) for clear-sky scenes with a spatial resolution of up to 5.5 × 7 km². TROPOMI is a grating spectrometer with a wide spectral coverage including the ultraviolet (UV), visible (VIS), near-infrared (NIR), and short-wave infrared (SWIR) [5]. The TROPOMI XCH4 data product is retrieved from the instrument’s NIR and SWIR measurements using the RemoTeC algorithm [6]. Note that, due to algorithmic limitations on the viewing zenith angle, the swath is narrower for TROPOMI XCH4, so the XCH4 product itself lacks full daily global coverage.
In recent years, the TROPOMI XCH4 product has been continuously improved. For example, a new bias correction scheme was introduced to correct the dependence of XCH4 on the brightness of the scene [7]. The data set was extended to include observations over oceans under glint geometry [8], and the XCH4 anomalies identified over carbonate rock formations were resolved by improving the spectral fit of the surface reflectivity [9]. The TROPOMI XCH4 dataset is widely used, for example, for the detection of anthropogenic CH4 emissions and for atmospheric modeling [3,4,10].
The TROPOMI XCH4 data product meets the stringent mission requirements on precision (<1.5%) and accuracy (<1%). Extensive validation efforts using ground-based FTIR measurements from the Total Carbon Column Observing Network (TCCON) have confirmed that the errors of the TROPOMI XCH4 data are in line with these requirements [11]. Achieving such precision necessitates rigorous cloud clearing of the TROPOMI measurements, as clouds in the satellite’s observation path can introduce significant retrieval errors. Although the S5-P satellite itself lacks a dedicated cloud imager, it flies in loose formation with the SUOMI-NPP (SUOMI-National Polar-orbiting Partnership) satellite. This mission synergy allows us to use the VIIRS (Visible Infrared Imaging Radiometer Suite) cloud data for cloud clearing of TROPOMI measurements, since both instruments observe the same ground scene within less than 3 minutes [12].
Here, the VIIRS data serve a dual role in ESA’s data processing workflow. First, they are used to pre-filter TROPOMI data, a weak filtering step that reduces the volume of data to be processed. Second, VIIRS data contribute to the a posteriori filtering of the retrieved XCH4 data, a cloud-clearing step that ensures the high quality of the data product. This study focuses primarily on the second step because it is more demanding and scientifically more interesting. However, we also applied our method to the pre-filtering step, where it is implemented in ESA’s processing framework.
In the event of missing VIIRS data, a backup filter was established for the TROPOMI XCH4 retrieval, relying solely on TROPOMI measurements. This alternative filter incorporates information from both weak and strong CH4 absorption lines in the short-wave infrared, which are included in the output of the TROPOMI CO product [12]. A similar approach was applied, for instance, by [13] for the OCO-2 instrument. However, our investigation revealed that the performance of the backup filter is insufficient. It is important to note that the filter thresholds were defined preflight and have not been tested on real data.
SUOMI-NPP is approaching its end of life and will be succeeded by the NOAA-20 satellite [14]. Although NOAA-20 carries a cloud imager equivalent to VIIRS, the different orbit positions cause a time gap of more than 20 minutes with TROPOMI, which presents a challenge for the critical cloud clearing of TROPOMI data. VIIRS outages lasting a full month in August 2022 and 7 days in November 2023 clearly demonstrated the dependence of the TROPOMI XCH4 data quality on the VIIRS data product. The daily distribution of XCH4 values in regions such as North America, Siberia, and Australia showed a significant low bias due to the inclusion of cloud-contaminated scenes. In August 2022, the global XCH4 distribution was skewed towards lower values, changing the mean by 3.9% and the standard deviation by 303% for the three regions while introducing numerous outliers. This clearly showed that the current backup filter is not sufficient and that an alternative cloud clearing of TROPOMI XCH4 data is needed that is complementary to the SUOMI-NPP measurements.
This study introduces a machine learning approach for cloud clearing of the TROPOMI XCH4 data product using a random forest classifier (RFC). Trained on five years of collocated TROPOMI and SUOMI-NPP measurements (about 20,000 orbits), the RFC can replace the VIIRS cloud-clearing process while relying solely on TROPOMI data. It therefore provides an alternative cloud clearing in the absence of VIIRS data and can solve the data processing issues arising from the expected end of life of SUOMI-NPP. Moreover, the RFC approach is an essential step toward a near-real-time TROPOMI XCH4 data product, as it eliminates the need to await the availability of VIIRS data. A near-real-time XCH4 data product is requested for chemical forecasting of the atmosphere, as done by the Copernicus Atmosphere Monitoring Service (CAMS) with the Integrated Forecasting System (IFS) developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). CAMS focuses on monitoring and forecasting atmospheric composition, including greenhouse gases, aerosols, and reactive gases, and already assimilates TROPOMI CO in near real time [15].
Our study is structured as follows. In Section 2, we discuss the datasets utilized in our research, and Section 3 explains our machine learning approach. In Section 4, we apply the RFC to address the one-month absence of VIIRS data in August 2022. Furthermore, a validation is presented for measurements at 12 TCCON stations. Finally, Section 5 summarizes our findings and draws conclusions based on our research.
4. Results
The RFC cloud clearing for TROPOMI XCH4 as described in Section 3.1 depends on the TROPOMI CO data and its availability. This dependency imposes no practical restriction, as the TROPOMI CO data product has significantly larger coverage than TROPOMI XCH4. The CO retrieval is more resilient to cloud contamination and level 1 data quality issues, so data that are rejected by the CO data processing are also not usable for XCH4 processing. To evaluate the performance of our new machine learning-based cloud-clearing approach, we reprocessed the TROPOMI XCH4 data for August 2022, a period when VIIRS data were unavailable, over North America [37.0°±17.0°N, 101.5°±34.5°W], Siberia [62.5°±6.5°N, 110.0°±14.0°E], and Australia [27.2°±16.5°S, 133.45°±20.12°E]. When VIIRS data are not available, the operational TROPOMI XCH4 retrieval applies a simple threshold filter as a backup to remove cloud-contaminated measurements [12]. However, as depicted in Figure 3A, the threshold filtering introduces large errors into the data set and allows many cloud-contaminated measurements to remain in all regions considered. This discrepancy becomes evident when comparing the data distribution for August with those for July and September, when VIIRS data were available.
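The three evaluation regions are given as a center coordinate plus or minus a half-width in latitude and longitude. A minimal sketch of how soundings could be assigned to these boxes is shown below. The sign convention (north and east positive), the dictionary layout, and the function name `in_region` are assumptions made for illustration and are not taken from the operational processor.

```python
# Region boxes as (lat_center, lat_halfwidth, lon_center, lon_halfwidth).
# Assumed convention: degrees north and degrees east are positive, so
# 101.5°W becomes -101.5 and 27.2°S becomes -27.2.
REGIONS = {
    "North America": (37.0, 17.0, -101.5, 34.5),
    "Siberia": (62.5, 6.5, 110.0, 14.0),
    "Australia": (-27.2, 16.5, 133.45, 20.12),
}


def in_region(lat, lon, region):
    """Return True if (lat, lon) lies inside the named region box.

    None of the three boxes crosses the +/-180 degree meridian, so a plain
    absolute-difference test on longitude is sufficient here.
    """
    lat_c, dlat, lon_c, dlon = REGIONS[region]
    return abs(lat - lat_c) <= dlat and abs(lon - lon_c) <= dlon


# Hypothetical sounding locations, for illustration only.
inside_na = in_region(40.0, -100.0, "North America")
inside_au = in_region(-25.0, 135.0, "Australia")
siberia_in_au = in_region(62.0, 110.0, "Australia")
```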
For August, the global XCH4 distribution was skewed towards lower values, changing the mean by 3.9% and the standard deviation by 303% for the three regions while introducing numerous outliers. This is mainly due to the shielding of air masses below the cloud. Thus, for this month, the data no longer meet the mission requirements and are flagged as low quality.
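The effect of cloud-contaminated scenes on the distribution statistics can be illustrated with a small sketch. It assumes that the percentages quoted above are relative changes with respect to a cloud-cleared reference; the function name `relative_change` and all XCH4 values are hypothetical and serve only to show the calculation.

```python
from statistics import mean, stdev


def relative_change(reference, test):
    """Percent change of mean and sample standard deviation of `test`
    relative to `reference` (assumed definition, see lead-in)."""
    ref_mean, ref_std = mean(reference), stdev(reference)
    d_mean = 100.0 * (mean(test) - ref_mean) / ref_mean
    d_std = 100.0 * (stdev(test) - ref_std) / ref_std
    return d_mean, d_std


# Hypothetical XCH4 values in ppb for a well-filtered month.
clear_month = [1850.0, 1860.0, 1870.0, 1880.0, 1890.0]

# The same month with two cloud-contaminated soundings left in; clouds
# shield the air below them, so these columns are biased low.
contaminated_month = clear_month + [1500.0, 1550.0]

d_mean, d_std = relative_change(clear_month, contaminated_month)
print(f"mean changes by {d_mean:.1f}%, standard deviation by {d_std:.0f}%")
```

Even a few low outliers shift the mean only modestly but inflate the standard deviation strongly, which is the qualitative pattern reported for August 2022.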
Figure 3B shows the data filtered using our RFC approach. For July and September, this data set retains the cloud screening based on the resampled VIIRS data. For August, the distribution of XCH4 for the three regions is in good agreement with the other months. The mean and standard deviation are no longer distorted by low XCH4 values, and even the number of outliers, indicated by the red data points, is reduced compared to July and September.
The RFC cloud-clearing approach appears to be more restrictive than the VIIRS-based method. This is illustrated in Figure 4 for TROPOMI overpasses over the US. Here, the RFC cloud clearing identifies similar clear-sky regions as the VIIRS approach but filters out more low XCH4 columns. As previously discussed, lower XCH4 values are indicative of cloud contamination, suggesting that the RFC method may perform even better than VIIRS in this case. One explanation for this difference might be that the RFC approach is less sensitive to challenges in the VIIRS data because it has learned a general correlation between the VIIRS and the TROPOMI CO data. The persistence of cloud-contaminated scenes in the VIIRS-filtered data might be attributed to the 2-3 minute time difference between TROPOMI and VIIRS. Depending on the meteorological situation, this time delay can lead to an incorrect clear-sky classification of the TROPOMI data when using VIIRS.
The quality of the operational TROPOMI XCH4 data is validated and continuously monitored using XCH4 reference data from the TCCON network. To assess the effectiveness of the RFC cloud-clearing approach, we processed five years of TROPOMI data around 12 TCCON sites (within a radius of 300 km), intentionally refraining from applying any cloud screening. This allows a direct comparison between the cloud-clearing technique based on collocated VIIRS data and the RFC approach. For each station, we derived XCH4 time series of daily means for both TCCON and TROPOMI. This is illustrated in Figure 5, which shows data for Sodankylä, Finland, and Wollongong, Australia. The time series of all stations are shown in Appendix A (Figures A2 and A3). When comparing the TROPOMI XCH4 data filtered with the RFC approach (A, C) to the current VIIRS filtering, we observe highly comparable results. Both approaches exhibit a similar data density, but the new approach yields slightly fewer outliers with low XCH4 values. This agrees with the discussion above.
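The collocation step described above (soundings within 300 km of a station, averaged per day) can be sketched as follows. This is a minimal illustration under stated assumptions, not the validation code used in the study: it assumes a spherical Earth with a mean radius of 6371 km, and the tuple layout, function names, station coordinates, and XCH4 values are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_means(soundings, site_lat, site_lon, radius_km=300.0):
    """Average XCH4 per day over all soundings within `radius_km` of the site.

    `soundings` is an iterable of (date_string, lat, lon, xch4_ppb) tuples.
    """
    per_day = defaultdict(list)
    for day, lat, lon, xch4 in soundings:
        if great_circle_km(lat, lon, site_lat, site_lon) <= radius_km:
            per_day[day].append(xch4)
    return {day: sum(vals) / len(vals) for day, vals in sorted(per_day.items())}


# Hypothetical station at 60°N, 25°E and hypothetical soundings.
soundings = [
    ("2022-07-01", 60.5, 25.0, 1850.0),
    ("2022-07-01", 61.0, 26.0, 1870.0),
    ("2022-07-01", 70.0, 25.0, 1700.0),  # more than 1000 km away, excluded
    ("2022-07-02", 60.0, 25.5, 1860.0),
]
result = daily_means(soundings, site_lat=60.0, site_lon=25.0)
```

Applying the same function once to VIIRS-filtered soundings and once to RFC-filtered soundings would give the two TROPOMI time series that are compared against the TCCON daily means.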
Figure 6 shows the validation statistics for all 12 stations. Overall, we see very good agreement in the bias (−7.4 ppb for RFC vs. −5.6 ppb for VIIRS) and in the standard deviation of the bias (11.6 ppb for RFC vs. 12 ppb for VIIRS). The amount of data is about 7% higher for the RFC than for the VIIRS cloud clearing. In Figure 7, we present the correlation between the daily TCCON and TROPOMI XCH4 means for all stations. The error bars in the figure represent the error of the daily means. The TROPOMI-TCCON correlation is highly comparable for the two cloud-clearing approaches. Overall, from the results presented in Figures 6 and 7, we conclude that the data quality around the TCCON sites is very similar for both cloud-clearing approaches.
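A minimal sketch of such validation statistics is given below. It assumes that the bias is the mean of the paired daily differences (TROPOMI minus TCCON) and that the scatter is the sample standard deviation of these differences; the exact definitions behind Figure 6 are not restated in this section, and the function name and all values are hypothetical.

```python
import math
from statistics import mean, stdev


def validation_stats(tropomi, tccon):
    """Bias, scatter, and Pearson correlation of paired daily means (ppb).

    Assumed definitions: bias = mean(TROPOMI - TCCON), scatter = sample
    standard deviation of the same differences.
    """
    diffs = [s - g for s, g in zip(tropomi, tccon)]
    bias = mean(diffs)
    scatter = stdev(diffs)

    # Pearson correlation computed explicitly to avoid depending on
    # statistics.correlation, which requires Python 3.10 or newer.
    ms, mg = mean(tropomi), mean(tccon)
    cov = sum((s - ms) * (g - mg) for s, g in zip(tropomi, tccon))
    var_s = sum((s - ms) ** 2 for s in tropomi)
    var_g = sum((g - mg) ** 2 for g in tccon)
    r = cov / math.sqrt(var_s * var_g)
    return bias, scatter, r


# Hypothetical daily means in ppb for one station.
tccon_daily = [1850.0, 1860.0, 1870.0, 1880.0]
tropomi_daily = [1844.0, 1852.0, 1866.0, 1870.0]

bias, scatter, r = validation_stats(tropomi_daily, tccon_daily)
print(f"bias = {bias:.1f} ppb, scatter = {scatter:.1f} ppb, r = {r:.2f}")
```

Running the same function on the VIIRS-filtered and on the RFC-filtered daily means gives directly comparable numbers for the two cloud-clearing approaches.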
5. Discussions
The quality of the TROPOMI XCH4 data product depends on accurate cloud clearing of the spectrometer data, which is performed using the SUOMI-NPP VIIRS cloud product. During the last six years of mission operation, this approach proved to be very effective because both satellites fly in formation and capture the same ground scene with a minimal delay of 2-3 minutes. However, the upcoming decommissioning of SUOMI-NPP necessitates an alternative cloud-clearing method for the TROPOMI XCH4 data product. This need arises from the significant time delay of about 20 minutes between TROPOMI and NOAA-20, the successor of SUOMI-NPP, and from the inadequate backup cloud filter of the current TROPOMI XCH4 processor. To maintain the accuracy and reliability of the TROPOMI XCH4 data, a new cloud-clearing approach had to be established.
In this study, we introduced a new machine learning approach based on a random forest classifier, which replicates the VIIRS cloud clearing using only TROPOMI data. The classifier is trained on a subset of 5 years of collocated TROPOMI and SUOMI-NPP VIIRS data (about 20,000 orbits). To this end, we used parameters derived from the TROPOMI CO retrieval, which is inherently sensitive to the presence of clouds and is processed prior to the retrieval of TROPOMI XCH4 in the operational pipeline. This choice simplifies the integration of our approach into the existing processing framework. In addition, we presented an efficient and robust method for destriping TROPOMI data, relying on median smoothing techniques. Validation with ground-based TCCON measurements showed that the destriping does not change the bias but slightly improves the standard deviation of the bias. This is a positive result, since removing stripe noise from the data should only reduce the scatter and not introduce a bias change. The destriping will be suggested as an update for the operational TROPOMI XCH4 and CO retrievals in the future.
We demonstrated the performance of the new cloud-clearing approach by filtering three months of TROPOMI data in summer 2022 over North America, Siberia, and Australia. During one of these months, a temporary SUOMI-NPP outage left no VIIRS data available. Our machine learning approach delivered comparable performance during this data gap. Additionally, we conducted a TCCON validation covering the entire 5-year duration of the TROPOMI mission for 12 stations. The results show that the RFC cloud clearing matches the performance of VIIRS filtering, with a very similar bias and about 7% more data.
Currently, our cloud-clearing approach is tailored to land-only scenes, and we plan to extend it to glint geometries observed over the oceans. Moreover, we will also apply the RFC for improved quality filtering of TROPOMI XCH4 data in the future. Once the dependence on SUOMI-NPP data is eliminated, faster processing becomes possible. This is of particular interest for scientific applications such as monitoring CH4 point sources or assimilating TROPOMI data in near real time, as done by CAMS-IFS for TROPOMI CO.