Satellite and in situ Sampling Mismatches: Consequences for the Estimate of Satellite Sea Surface Salinity Uncertainties

Validation of satellite sea surface salinity (SSS) products is typically based on comparisons with in-situ measurements at a few meters depth, that are mostly done at a single location and time. The difference in term of spatio-temporal resolution between the in-situ near-surface salinity and the two-dimensional satellite SSS results in a sampling mismatch uncertainty. The Climate Change Initiative (CCI) project has merged SSS from three satellite missions. Using an optimal interpolation, weekly and monthly SSS and their uncertainties are estimated at a 50 km spatial resolution over the global ocean. Over the 2016-2018 period the mean uncertainty on weekly CCI SSS is 0.13, whereas the standard deviation of weekly CCI minus in-situ Argo salinities is 0.24. Using high resolution SSS simulations, we estimate the expected uncertainty due to the CCI versus Argo sampling mismatch. Most of the largest spatial variability of the satellite minus Argo salinity are observed in regions with large mismatch. A quantitative validation is performed by considering the statistical distribution of the CCI minus Argo salinity normalized by the sampling and retrieval uncertainties. This quantity should follow a Gaussian distribution with a standard deviation of 1, if all uncertainty contributions are properly considered. We find that 1) the sampling mismatch can explain most of the observed differences between Argo and CCI data, especially for monthly products and in dynamical regions (river plumes, fronts), 2) overall, the uncertainties are well estimated in CCI version 3, much better compared to CCI version 2. There are a few dynamical regions where discrepancies remain, and where the satellite SSS, their associated uncertainties and the sampling mismatch estimates should be further validated.


Introduction
Knowledge of salinity is crucial for understanding the thermohaline circulation of the ocean, as salinity has a large contribution on water density. As an advected property, it is an interesting tracer of oceanic phenomena. Salinity also conditions the stratification of the surface layer and thus influences the ability of the ocean surface to mix [1]. Salinity is a key indicator for monitoring the water cycle as it integrates the freshwater input to the ocean, through interactions with the atmosphere (precipitations and evaporation), with the continents (river discharges) or with ice ( [2], and references herein).
Sea Surface Salinity (SSS) has been measured by satellite for more than 12 years, thanks to three satellite missions: the Soil Moisture and Ocean Salinity (SMOS), the first SSS observing satellite mission launched in 2009 and still operating today; the Aquarius mission which operated from 2011 to 2015; and the Soil Moisture Active Passive (SMAP) mission launched in 2015 and still operating today. These three satellite missions measure brightness temperature at 1.4 GHz allowing to retrieve SSS, with unprecedented spatiotemporal coverage (see reviews in [3], and [4]). Although they operate in the same frequency band, they have their own characteristics in terms of spatial resolution: the spatial resolution of SMOS and SMAP retrieved SSS (~40 to 50km) is about a factor 2 to 3 higher than that of the Aquarius SSS (see more details in [3]).
The recent ESA Climate Change Initiative project (CCI+SSS) combines measurements of these three satellite missions resulting in a decrease of the differences between satellite salinity fields and Argo float surface salinities (standard deviation (robust standard deviation) of the monthly CCI SSS minus the Argo float surface salinities over the whole period equal to 0.27 (0.16) globally) [5,6]. However, the interpretation of such comparisons in term of satellite field uncertainties is tricky as it involves not only uncertainties in satellite estimates, but also uncertainties in in-situ estimates and sampling mismatches [7]. In this paper we focus on better understanding these differences, by estimating the contribution of the sampling mismatch between in-situ salinity and the CCI+SSS fields as produced with the last CCI+SSS version (version 3). Satellite SSS are typically compared to in-situ measurements for validation. The insitu data used are of several types: either at fixed point (moorings) or from moving platforms (ships; floats).
A major problem with these comparisons is the difference in sampling between satellite and in-situ datasets [8]: satellite data are integrated over a surface, of nearly 50km in diameter for the CCI products, and are either for a single time or over a weekly or monthly period, in the case a temporal optimal interpolation is applied as done for the CCI fields. In-situ data are generally at a single point in space and time, as is the case for ship, Argo floats and mooring measurements. Moreover, satellite salinity represents the upper 1cm depth whereas in situ salinity are most of the time measured at a few meters depth.
This difference in sampling necessarily implies a different sensitivity of both types of measurements to small-scale variability, and distorts the uncertainties estimate of the satellite data if the in-situ data is used as a reference without taking into account the uncertainty due to the sampling mismatch. This issue is particularly important in regions of high SSS variability such as river plumes (e.g., Amazon plume) or meanders of large currents (e.g., Gulf Stream) and rain as illustrated on Figure 1. The variability of salinity is of 3 kinds, spatial temporal and vertical [8], we mainly focus on spatial variability in this paper.
Thus, to get a valid estimate of the uncertainties of satellite products obtained by comparison with in-situ data, it is important to complement the satellite SSS uncertainty with the uncertainty due to the sampling mismatch [7]. In case of satellite salinities, [9] found that, in several ocean regions including the tropics and western boundary current regions, small-scale variability, simulated with a 1/12° HYCOM model, is an important source of sampling mismatch in comparisons between 1° (~Aquarius footprint) satellite SSS and in situ salinities. More recent studies (see [10] and references herein) used in situ measurements and high resolution models (1/48°) to estimate the sub-footprint variability (SFV) at level 2 and simulate spatial integration for Aquarius at footprint size of 100km and for SMAP at 40km. Strong dependencies of SFV on region, season and footprint size have been found. SFV are particularly high in rainy regions [11], along the continental shelves where strong river outflow is present [12], and near frontal zones such as the western boundary currents [13]. The variabilities obtained range from less than 0.1 to 1 pss. To simulate expected RMS between satellite and Argo salinities, [13] derives Representation Errors (RE) between a footprint integrated satellite SSS and a local SSS using one year of high-resolution simulations. [12] reports RE at a 40km footprint peaks at 0.06 pss, with typical values of 0.01-0.15 pss but also some high values reaching 0.4 pss or even larger (their figure 9b).
The objective of this paper is to interpret the differences observed between the weekly (monthly) satellite CCI+SSS fields and Argo salinities, given the uncertainties attributed to the CCI+SSS fields and the sampling mismatches between both types of measurements. We will then focus on validating the CCI+SSS and associated uncertainties. The sampling mismatch uncertainty is estimated using SSS simulated with a 1/12° resolution model and small-scale variability not resolved by the model will be estimated following a spectral method similar to that proposed by [12]. We conduct our study using three years (2016 -2018) of SSS simulation contemporary to CCI+SSS fields. We consider both the spatial variability within a satellite footprint and the temporal variability within one week (month).
We report results obtained with the weekly products in the main paper; results obtained with the monthly products are reported in the supplementary material. A similar study was conducted on CCI v2 in order to compare the two versions, the results obtained with CCI v2 are also in Supplementary Materials.
The datasets and methods are described in Section 2. Results are presented in Section 3 followed by a discussion (Section 4) and conclusion (Section 5).

Materials
The study is performed from 2016 to 2018, a period during which SMOS and SMAP data are available, and focuses on the interpretation of the comparisons between CCI+SSS and Argo salinities. We use three data sources: -Satellite product: CCI v3.2.
We use version 3.2 of the SSS products generated in the framework of the ESA CCI project (https://catalogue.ceda.ac.uk/uuid/fad2e982a59d44788eda09e3c67ed7d5).
These Level 4 (L4) products are derived using a temporal optimal interpolation of SSS retrieved from the three L-band radiometric satellite missions launched since 2010: SMOS (2010 -present), SMAP (2015 -present), and Aquarius (2012 -2015). These SSS fields span a period from 2010 to 2020, are available on a 25km Equal-Area Scalable Earth (EASE  4 of 19 2) Grid [14], and at weekly and monthly temporal resolutions. During the period studied here (2016-2018), both SMOS and SMAP data are available.
The CCI v3.2 SSS fields have been generated following an optimal interpolation similar to CCI v2 [5]. Compared to the CCI v2 processing, the CCI v3 processing has been updated to improve the long-term stability of the SMOS SSS entering in the CCI L4 optimal interpolation and to improve the L4 SSS uncertainty estimates. In particular: -Instead of using SMOS SSS produced by the Centre Aval de Traitement des Données SMOS (CATDS) operational chain, the SMOS SSS were reprocessed with a modified ESA v662 processor [15].
-Instead of using European Center for Medium Weather Forecast (ECMWF) Integrated Forecast System (IFS) fields as auxiliary parameters, the processing used ECMWF ERA5 fields. - The SMOS vicarious calibration, the so-called Ocean Target Transformation, is estimated using in situ interpolated SSS fields produced by the In Situ Analysis System (ISAS) [16] instead of using a salinity climatology.
-The dielectric constant model is updated as proposed by [5].

-
The SMOS SSS affected by instantaneous rain rate are corrected up to 10mm hr −1 [17] are sorted out in case of stronger RR. SMAP SSS retrieved with RR larger than 0.5mm hr −1 are filtered out. Therefore, in rainy regions the CCI v3 fields are close to bulk salinities.
-In the CCI v3 L4 optimal interpolation, a full least square propagation of the errors is implemented, instead of a simplified propagation.
-Representativity uncertainties between the swath measurements of the SMOS and SMAP level 2 SSS and that of the L4 estimated SSS (weekly or monthly fields), corresponding to the temporal variability of the SSS at the SMOS and SMAP resolution (~50 km), within one week or one month, have been taken into account. The original footprint sizes of level 2 SMOS and SMAP SSS entering in the CCI v3.2 processing are slightly less than 50km (close to 45km for SMOS (depending on the across swath location) and to 43km for SMAP). SMOS original SSS are provided on an icosahedral Snyder equal area (ISEA) grid at 12km resolution. SMAP original SSS are on a regular 0.25° grid. In a first CCI+SSS processing step, SMOS and SMAP SSS are reprojected on the 25km EASE 2 grid using nearest neighbors method. In the later stages of the CCI+SSS processing, no spatial smoothing is applied, each grid point being treated independently. Based on that, we consider in the following that each grid point is representative of SSS obtained with a simple average within 50km grid cell. In order not to add too much complexity to the processing and the computing time, we consider that the weekly and monthly temporal OI is equivalent to a simple average of SSS over one week and one month.
-In-situ products: Argo floats The Argo [18] project is a set of ~3,000 floats moving in the upper 2,000 m of the global ocean. These floats give access to about 100,000 measurements of salinity and temperature per year with a global coverage, and an average spacing of 3° between measurements. These data are collected and made freely available by the international Argo project and the national programs that contribute to it.
We use the collocations provided by the SMOS Pilot-Mission Exploitation Platform (Pi-MEP) [19] between the CCI fields and Argo floats.
The Pi-MEP collocations are made from Argo data with a quality index of 1 or 2. Argo upper measurements between 10m depth and 0m depth are considered as surface data, and used as a comparison to satellite data. Most of the Argo data resulting from this selection are taken at a depth of about 5m. More details about this method are available in the Pi-MEP reports.
We use the daily high-resolution (1/12°) Mercator GLORYS reanalyzed SSS. We consider that it resolves variability above the Nyquist wavelength (1/6°). These SSS simulated fields have a slightly reduced spatial resolution compared to studies [12] and [13] to allow global ocean coverage, while keeping the small scales that contribute much to SSS variability in a satellite pixel [20].
The GLORYS12V1 product is a CMEMS global ocean eddy-resolving (1/12° horizontal resolution, 50 vertical levels) reanalysis. These data are available from 1993 to today.
These reanalysis are based on the NEMO ocean model, forced, at the surface by ECMWF ERA5 data, and by climatological runoffs. Satellite sea level anomalies, SST, sea ice concentration, in-situ temperature and salinity vertical profiles (but not satellite SSS) are assimilated.
Data are available at daily temporal resolution, and on a regular 1/12° grid, for 50 levels, starting from the sea surface with levels at 0.5m, 1.5m, 2.6m, 3.8m and 5m. We use the level at 5 m depth as a proxy for Argo upper salinity. See [21] for a complete description of the model.
An example of the estimation of GLORYS SSS variability within satellite grid cells (the calculation method is described in section 2.2), is shown on Error! Reference source not found.. this map shows the standard deviation of the GLORYS salinities accumulated in boxes of 50km x 50 km and during a 7-day period, corresponding to the spatiotemporal resolution of the weekly CCI data for the day of February 15, 2017. This estimation has been computed for every day over the entire period from 2016 to the end of 2018. This is, as described in the methods section, the variability of salinity at 5m depth. The same maps were generated with surface salinities for a better understanding of the vertical variability of near-surface salinity (see discussion below).
As expected from previous studies [12,22], high standard deviation values are observed in areas of high natural variability, such as river plumes (e.g., Amazon, Congo) and regions with strong salinity gradients (e.g., Gulf Stream). The 30-day variability is similar to the 7-day variability with stronger values, (the map is available in the Supplementary Materials).  Table 1).

-Uncertainties balance
We follow the methodology proposed by [7] to validate the uncertainties of the satellite products, by considering the total uncertainty composed of three terms: is the uncertainty of each satellite estimate, is the uncertainty due to sampling mismatch between satellite data and in-situ data.
corresponds to the uncertainties of the reference in-situ data.
According to [7], uncertainties of satellite measurements are well represented when the observed differences between satellite and reference in-situ measurements, normalized by the quadratic sum of these uncertainties follow a Gaussian distribution with a standard deviation of 1.
In equation (1), STD stands for the standard deviation, represents the satellite measurements and the reference in-situ data. We now detail how the various terms of equation (1) are obtained. -− The colocations between CCI+SSS and Argo floats measurements are performed by the Pi-MEP, as described in [5,19]. The match-up search radius is chosen to select the closest match-up both in time and space, within the spatio-temporal sampling of the satellite fields.
The locations of xREF corresponding to the Argo measurement points made available in the Pi-MEP database are used as the basis for calculating the normalized differences described in equation 1. The terms Usat and Umis are thus collocated at these locations as described below.
-The uncertainty of each CCI+SSS gridpoint is estimated in the course of the CCI+SSS temporal OI using the formalism described in the Supplementary Information of [5]. It takes into account uncertainties of SMOS and SMAP individual SSS, their representativity uncertainties with respect to a weekly (or monthly) integrated field and uncertainties on the bias corrected by the CCI processing. In the CCI+SSS v3 processing, a full least square propagation of the uncertainties is implemented, instead of a simplified propagation involving matrix inversion over limited time period as in version 2. This quantity is described in Figure 3.b.
-In our case, corresponds to the uncertainty of the Argo salinity. As we only use Argo data with a quality index of 1 or 2, these uncertainties are small, on the order of 0.01 or less [23]. They are expected to be much smaller than the other uncertainties in equation (1), and are neglected in the following.
-To estimate the uncertainties due to sampling mismatch, we compute the variability of SSS at different spatiotemporal scales from Mercator GLORYS reanalysis at a resolution of 1/12°.
Our estimation of sampling mismatch corresponds to the standard deviation of GLORYS grid resolution (1/12°) in pixels centered on EASE 2 grid points: for each pixel, we select every point within a radius of R km (R = 25 km) from the center and a temporal range of dt/2 (3.5 days for weekly fields and 15 days for monthly fields). We compute the standard deviation of these points, and consider it as representative of the sub-pixel variability.
As our study focuses on the comparison between in-situ data taken at 5m depth, we compute the standard deviation of the GLORYS model data at 5m depth. CCI+SSS v3.21 fields are corrected from the instantaneous rain effect, based on earlier comparisons between SMOS SSS in rainy conditions and Argo salinities [17], hence reducing the difference expected between satellite SSS in the upper 1cm depth and the Argo salinity at ~5m depth.
We calculate the sampling mismatch for each day over three years (2016-2018).
The above estimate of is limited, however, by the resolution of the GLORYS reanalysis (1/12°). This means that the variability at scales smaller than 1/6° present in the point measurements will not be well represented in GLORYS simulations. We take this limitation into account by estimating the variability missed by the GLORYS reanalysis following a wavenumber spectral analysis as described in [12].
The variance of a signal between a chosen wavelength and the Nyquist wavelength can be inferred assuming that the slope of the spectrum between the two wavelengths is reasonably known. This variance is given by the following formula, which follows from the integration of the power spectral density (PSD): where corresponds to the slope of the PSD when viewed as a log-log plot, is the y-intercept of the PSD, is the wavelength below which we calculate the variance, is the Nyquist wavelength.
What we want to estimate is the influence of a change in Nyquist wavelength on the variance of salinity below a wavenumber of 50km. We therefore calculate 50 for = 20 , which corresponds to the STD actually measured by GLORYS, and 50 for = 0 , which corresponds to a perfect case where all the signal power is considered.
Spectral studies in the Arabian Sea and in the western Pacific [12], as well as in the tropical and subtropical Atlantic Ocean [20] suggest that at the scale of interest, for a wavelength below 50km, the salinity PSD have a slope close to =3.3. This is quite close to a k −3 power law expected for a passive tracer under the influence of advection, the slightly steeper slope being likely attributable to atmospheric processes [12]. Even though this slope is likely to vary slightly from time to time and place to place, we use it to infer an order of magnitude of the ratio between the variability expected in the case where all wavelengths below 50km are considered, Umist and the variability estimated with GLORYS reanalysis resolution, Umis, using equation 2. We find the following relation: The results are given thereafter with and without this adjustment.
We evaluate the gaussian shape of the normalized differences distributions (equation 1) following two methods: -when considering large regions (global ocean or in boxes represented on Error! Reference source not found. and defined in Table 1), we approximate the statistical distributions of the normalized differences with a Gaussian fit and we compare the fitted STD with the expected value of 1.
-In order to make local analysis, we also compute STD of normalized differences in 2° boxes. However, due to the reduced number of colocations in 2° boxes (Error! Reference source not found..b) they are noisy and it is not possible to estimate a reliable fit of the normalized differences distribution in each box. Nevertheless, we look to which extent the statistical distribution of STD estimated in 2° boxes is consistent with Gaussian distributions of the normalized differences in each box. This is performed by considering the histograms of the variances, STD 2 , multiplied by the number of measurements. Indeed, given 1,⋯, a random sample from a Gaussian distribution (with a average and a standard deviation) ( , 2 ), in any of the 2° boxes, with 2 = , the random variable Y=( −1) 2/ 2 = n Var(X)/ 2 follows a 2 n-1 distribution (See p211 of [24]). This choice of representation allows us to compare the histograms obtained to a theoretical curve that Y should follow if the normalized differences distributions in each box followed a (0,1) distribution, as we expect if the uncertainties are correctly estimated. The theoretical curve is deduced by cumulating the distributions of the Y term expected on each of the 2° boxes, considering =1.
The reliability of the salinity variability derived from GLORYS reanalysis will be discussed, based on comparisons with the salinity variabilities measured in-situ by sensors onboard ships.
Actually, Ship ThermoSalinoGraph (TSG) measurements are sampled at a few kilometers resolution along ship transects, hence, allowing an estimate of the variability at a higher resolution than GLORYS. We use the TSG-LEGOS-DM delayed mode data set derived from voluntary observing ships collected, validated, archived, and made freely available by the French Sea Surface Salinity Observation Service [25]. For that, we select adjusted values when available and only TSG data with quality flags=1 and 2 ('good' or 'probably good').
As the SSS variability along a line is different from the SSS variability over a surface, for this exercise, a one-dimensional GLORYS variability is computed along ship lines: the standard deviation of salinity values from GLORYS reanalysis along 50km-long lines, with a direction close to ship transects (with a precision of ± 22.5°). Fitting the variability to the direction of the ship transect allows for the possible anisotropy of the SSS field.

Results
We will first consider global maps of the main terms in the uncertainties balance (equation (1) mean. After this qualitative and raw analysis (temporal correlations between the various terms are neglected in that analysis), a thorough analysis is presented based on an exact computation of equation 1. Before analyzing in details the observed differences between CCI+SSS and Argo salinities normalized by the uncertainties originating from the different sources, we focus on the main contributors to equation (1): the STD differences between Argo and CCI salinities, the quadratic mean of Usat from CCI data, Umis and Umist derived from the GLORYS simulations over the entire study period (2016-2018). Over 2016 -2018, at global scale, Usat is lower than the STD difference between CCI7 and Argo SSS: the quadratic mean of Usat is 0.135 while the STD of CCI7-Argo SSS is 0.24. Nevertheless, when filtering out outliers by using a robust STD, STD* (STD*(x) is defined as the median (abs(x − median (x)))/0.67 ; STD*(x) is equal to STD(x) in case of a Gaussian distribution of x), STD*(CCI7-Argo) becomes closer to 0.135 (it is equal to 0.16).

The different contributions to the uncertainty balance
We look at the spatial distribution of the temporal averages of each term (Figure 3). Qualitatively, Figure 3.a. is similar to Figure 3.b in most open ocean regions, where the natural variability of salinity is expected to be low (below 0.2). The STD of CCI7-Argo SSS seems therefore bounded by the value of the satellite uncertainty Usat in regions of low variability. High values of STD (>0.5, going up to 1) are observed on Figure 3.a in regions where natural variability is important, for example in river plumes (notably the Amazon, Congo, Malvinas, Ganga Brahmaputra, Mississippi) and in areas of important fronts (Gulf Stream region, Aghulas return current, Kurushio). These high values are also observed on maps 3.c and 3.d but only slightly on map 3.b, implying that the CCI7-Argo SSS differences are mainly due to sampling mismatch in regions of high variability.
In a few regions however, notably near Japan and in the Labrador Current region, we observe large values of STD(CCI7-Argo) that are stronger than Usat and Umist. These differences are likely due to the frequent presence of RFIs in these regions ( [26]).

Detailed analysis of uncertainties balance:
3.2.1 Distribution of reduced differences Figure 4. Distribution of differences between Argo floats and CCI data normalized with Usat (orange), with Usat and Umis (red) and with Usat and Umist (blue), and corresponding Gaussian fits, over the global ocean. The black line corresponds to a theoretical Gaussian distribution with a standard deviation of 1. Corresponding STD are reported in Table 2.  Table 1. Corresponding STD are reported in Table 2. We first analyze the distribution of all differences taken between 2016 and 2018. Figure 4 shows the distribution obtained over the global ocean, and Figure 5 shows the distributions obtained in the various areas indicated on Fig. 2 and Table 1. The corresponding STD of the distributions are in Table 2.
In order to estimate the impact of the uncertainty mismatch Umis on the comparisons between in-situ Argo values and CCI products, we analyze the STD of the differences normalized by: (a) the satellite uncertainty Usat only (orange curves), (b) the quadratic mean of Usat and Umis (red curves), and (c) the quadratic mean of Usat and Umist (blue curves).
We then test the closeness of the distribution of the error-reduced salinity differences to a Gaussian distribution with a STD equal to 1 as proposed by [6].
A Gaussian fit (mean and STD estimated using Matlab Non linear Least Squares fitting method), is superimposed on the different histograms, and we consider the standard deviation of this fit in each case. If the uncertainties are correctly estimated, we expect the normalized difference to follow a Gaussian distribution with a standard deviation of 1 and of 0 mean, such a distribution is represented in the black dotted lines. This Gaussian distribution is calculated by considering the same number of measurements as the populations studied, i.e. the distributions have the same integral value. The outliers at the ends of the histograms have been gathered into a single class.
These figures show a clear improvement of the distributions when the sampling mismatch uncertainty is taken into account: over the global ocean, the standard deviation goes from a value of 1.16 with Usat only (orange curve), which is too large, to a value of 1.03 much closer to unity with the quadratic sum of Usat and Umis (red curve). Using Umist (blue curve) results in a STD of 0.99, very close to the value of 1 we are looking for.
In addition, taking into account Umis considerably reduces the distribution tails: the reduced differences of values higher than 3.9 are reduced by about one half, and even more with Umist.
More precise observations can be made by focusing on regions of particularly high or low variability. First, in the low variability region of the eastern South Pacific, we observe that the uncertainty that plays a major role is Usat, and that the addition of Umis has little impact on the distribution of differences. On Fig. 5a, the STD of the red and blue curves taking into account Umis are slightly too low, which means that Umis or Usat is slightly overestimated in this region.
In the variable regions, the improvement is not homogeneous. In the region of the Agulhas current retroflection (Fig.5b) we obtain good results, taking into account Umis and Umist allows to obtain a STD of 1.05 and 1.02 respectively.
In the Gulf Stream region, Fig. 5d, the histograms are under the black curve meaning that Umis or Usat are too low. Taking into account Umis or Umist allows to get closer to a Gaussian of standard deviation of 1, going from a STD=1.42 for the orange curve to a STD=1.20 or STD=1.13 with Umis and with Umist respectively. This corresponds to an improvement of about 30%, but remains imperfect. This may be related to remaining seasonal biases observed in this region in the CCI data which origin remains under study. In the Amazon plume (Fig. 5c), on the other hand, the orange curve is already close to a Gaussian distribution with STD=1, and adding Umis and Umist has a too strong impact. This could be related to uncertainty in GLORYS derived variability as discussed in next section.
To highlight the potential of the method to validate Usat, we compare the standard deviation values to the distributions obtained with the previous version of the CCI products (v2.3), for the global ocean, in which Usat was estimated with a simplified approach.
With version 2.3 at global scale, the STD obtained is less than 1 (standard deviation of 0.93 with Umis and 0.89 with Umist). In version 2.3 the errors were therefore overestimated by about 12% and particularly too high in regions of low variability. This has been corrected in version 3.2. Histograms for version 2.3 are available in Supplementary Materials. Supplementary Materials), the superposition of histograms shows a clear improvement in the distribution of reduced differences when mismatch uncertainty is taken into account, both at the global level and in the variable regions. Indeed, this shifts from a STD always higher than 1 (1.55 at the global scale) to a STD close to 1 (1.1 with Umis, 1.003 with Umist). In the Pacific region the importance of mismatch uncertainty is less, but taking it into account also allows to approach the STD=1 curve. The histograms on the variable regions give the same conclusions as with the weekly products.

Global distributions
We now investigate further the spatial distribution of STD computed in 2° boxes.   Table 2. distribution (dashed line) if equation (1) holds for each pixel of the maps.
In this section, we focus on the validation of the CCI v3.2 weekly SSS and associated uncertainties, taking into account, step by step, the different sources of uncertainties involved in equation 1 (Figure 6), and in 2° boxes to provide a better understanding of the geographical variations of the different sources of uncertainty. As in the previous section, the three normalization methods are evaluated here. Figure 7 shows histograms of the variances multiplied by the number of measurements, in the 2° boxes shown in Figures 6acd. The black dashed curve represents the theoretical curve that these histograms should follow if the normalized differences distributions in each box follow a (0,1) distribution, as we expect when the uncertainties are correctly estimated.
The introduction of Umis (Figure 6c) reduces the very large STD, in highly variable regions such as the Gulf Stream or the Agulhas Current retroflection. This reduction is even stronger with Umist ( Figure 6d). This can be seen in the histograms (Figure 7), where the red and blue histograms, which take into account Umis and Umist respectively, are much closer to the black theoretical curve for high values corresponding to high variances than the orange histogram, where the differences are normalized by Usat only. Quantitatively, the correlation coefficient between the χ 2 theoretical distribution and the one obtained with Usat only is 0.95, and respectively 0.97 and 0.98 when taking into account Umis or Umist.
This result is similar and more pronounced for monthly products (see maps in Supplementary Materials). By normalizing the differences by Usat only, the STD obtained is clearly greater than 1 and often even greater than 2 in a large majority of the 2° boxes. Taking into account Umis or Umist gives quite similar maps to Figures 6c and 6d, which again shows the importance of mismatch uncertainties in the validations, especially for monthly products. The histograms obtained for the monthly products also show this clear improvement which is strengthened with Umist. The correlation coefficient between the χ 2 theoretical distribution and the one obtained with Usat only is 0.81, and respectively 0.97 and 0.98 when taking into account Umis or Umist.
When the same study was conducted on the CCI version 2 weekly products, the results were further from the calculated theoretical χ 2 curve, with a very large number of points at low values of n Var(X) indicating that the Usat of version 2 was overestimated, mostly because of the simplified propagation of errors in version 2 instead of the full least square propagation of the errors implemented in version 3. The correlation coefficient between the χ 2 theoretical distribution and the one obtained with Usat only is 0.71, and respectively 0.75 and 0.76 when taking into account Umis or Umist for CCI v2.

GLORYS salinity variability
One can question the legitimacy to use of the GLORYS model to represent the in-situ point measurement. This model has indeed several limitations.
The first limitation is the spatial resolution of the model, 1/12°, which corresponds to about 10 km near the equator (Nyquist cutoff wavelength of about 20km), which is a higher resolution than the satellite salinity data but is still far from a point data. It is however not feasible to use higher resolution simulations due to computation times.
To counter this limitation, we have multiplied our uncertainties mismatch estimates by a coefficient, in order to adjust these values by taking into account the finer scales. This coefficient is calculated by assuming a PSD slope in k −3.3 constant in time and space, but this slope could vary slightly according to regions and time. We note however a weak variation of the coefficient according to the slope of the spectrum: +(-)2% for a spectrum of slope of k −3.2 (k −3.4 ), compared to the used spectrum of slope k −3.3 .
To validate the sampling mismatch derived from GLORYS simulations, we have compared model variability to in-situ variability derived along ship tracks available between 2016 and 2018. The GLORYS variability compared to TSG variability derived along merchant ships lines (Figure 8a) indicates that the differences are small, less than 0.02 for most measurements, the GLORYS variability being generally slightly higher than the in-situ variability. The difference is however greater in regions of high variability like fronts or river flows. This result seems counter-intuitive as one would expect the GLORYS salinity variability to be lower, since the resolution of the reanalysis is lower than the resolution of the ships measurements and since we did not add contribution for the small scales not resolved by GLORYS in these comparisons.
The observed discrepancies near river plumes may be due to a limitation of the GLORYS reanalysis, which assimilates river run-offs in the form of climatologies [27] (as explained in [28]), and thus does not take into account interannual variability in river discharge. The study [29] shows that the RMSD of the GLORYS model relative to Argo data decreases near the Amazon and Congo estuaries when assimilated with SMOS satellite data. This suggests that the spatial variability of the model is overestimated in the river plume regions, confirming the biases observed in the comparisons to merchant ships in these regions.
Another study [30] shows that sub-mesoscale structures are poorly represented in the GLORYS model, due to the too coarse spatial resolution. This could explain the differences observed in the Gulf Stream.
Overall, however, the observed differences between the TSG and GLORYS model variabilities are small, less than 0.02 in low variability regions, and less than ~20% in very variable regions, as shown in the histogram in Fig 8c. In view of these small differences, we consider the GLORYS variability to be a sufficiently good representation of the natural variability on a global scale, although we note that the differences are greater in regions of high variability.
The way we estimate Umis is close to the one used by [13] to estimate Representation Error (RE), except that [13] uses higher resolution simulations. We have compared both estimates, made within an area of 50 km radius for Umis and within 40 km for RE. Maps of the median of Umis over the 3 years of our study (available in Supplementary Materials) are qualitatively in agreement with the 40 km RE (Table S5 of [13]). As indicated in the methods section, we used the 5m depth layer of the GLORYS model to take into account the fact that the majority of the ARGO data are taken at 5m depth, and that the CCI data are corrected for the effect of rain. However, rain is not the only element generating a salinity gradient in the surface layer. To quantify the vertical variability of salinity and its influence on comparisons between Argo at 5m and satellite surface salinity, we have calculated the GLORYS sub-footprint variability at the surface as well. Figure 9 shows the GLORYS STD at the surface minus the GLORYS STD at 5m, averaged over 3 years.

Vertical near-surface variability
The differences obtained can reach 0.1, in regions where salinity is highly variable. In Figure 9 we can distinguish three types of regions. In regions strongly impacted by rainfall, the variability of measurements at 5m depth is lower than the surface variability. The observed differences are indeed very close to the impact of rain on salinity calculated by [31], from SMOS data, and this effect is corrected in the CCI products. Large differences between SSS and a few meters depth salinity are also expected in river plumes characterized by strong vertical stratification (e.g. [32]). However, in areas off the mouth of the river, the variability calculated at 5m is often higher than the surface variability. The origin of these patterns is not well known, 5m depth could be close to the halocline depth and surface fresh waters could be isolated from the variability of the layers below, but the assimilation of in situ data often performed close 5m depth could also generate some additional variability. Finally, in regions of low variability, the difference between the variability at 5m and the surface variability is close to zero. The CCI version 3.2 products contain a correction for rainfall effects, it is therefore wise to calculate a STD at 5m depth in rainy regions. On the other hand, to be completely consistent, it would be necessary to use a STD based on a comparison of individual points at 5m against a surface average value in regions where a large difference in variability between the surface and 5m is observed but not affected by rain, such as close to fronts or river plumes. However, since the variability derived from GLORYS at 5m depth is most of the time higher than the one obtained with in situ measurements (Figure 8), this effect is likely within the uncertainty of the STD derived from GLORYS simulations.

Conclusion
We find a remarkable agreement between CCI version 3.21 products (SSS and SSS uncertainties) and Argo upper salinities once the uncertainty due to sampling mismatch between CCI+SSS satellite products and in-situ Argo measurements is taken into account. This is particularly striking for the distributions of the differences between satellite and in situ salinities normalized by the sum of the uncertainties associated with each satellite-in situ colocation (equation 1): at global scale, when mismatch uncertainties are taken into account, the normalized distribution is very close to a Gaussian distribution with a unit sigma contrary to the one normalized with the satellite salinity uncertainty only ( Figure  4). The role of the sampling mismatch is particularly important in regions with high variability, as shown in Figure 6. In regions with low variability, on the other hand, the contribution of the sampling mismatch uncertainty is very small compared to the satellite uncertainty. Taking into account Umis or Umist in the validation allows the STD of the normalized differences to get very close to a χ 2 theoretical distribution (Figure 7; correlation coefficient of 0.95 with Usat only and of 0.98 when taking into account sampling mismatches).
Even though the spatial resolution of the GLORYS reanalysis do not resolve very fine scale phenomena (<20 km), we find that they should also be taken into consideration to approximate the sampling mismatch uncertainties. At global scale, the STD of the normalized differences is 1.15 when only the satellite SSS uncertainty is considered, and 1.03 when the GLORYS derived mismatch is taken into account. Moreover, we find that an efficient way to counter the limited spatial resolution of GLORYS is to adjust the GLORYS derived variabilities by a spectral analysis as proposed by [12]. With this correction, the global scale STD of the normalized differences becomes equal to 0.99.
The analyses shown in this main paper have been obtained with weekly CCI fields. Analyses of monthly CCI v3 data compared with Argo data (in Supplementary information) give similar results but with a reinforced importance of the mismatch uncertainty compared to the uncertainty of the monthly satellite SSS (which is reduced with respect to the uncertainty of the weekly fields). Again, the spectral correction of GLORYS variability improves the comparisons.
The results obtained at global scale are corroborated by analyses of distributions over large regions. The standard deviation of the distribution of differences reduced by the quadratic sum of the uncertainties Usat and Umist, has a standard deviation very close to 1 over the global ocean, slightly too high (1.05) in the variable regions and a little too low (0.98) in the region of low variability of the Pacific. These results are significantly better, up to 30% in the Gulf Stream, than when the sampling mismatch uncertainty is not taken into account.
These results highlight the consistency of satellite uncertainties of CCI version 3 products at the global scale, considering the validation method proposed by [7]. We observe an overly pitted distribution of reduced differences in the Amazon region, corresponding to a 15% overestimate of the total variability. Although this could indicate an overestimation of Usat, as we saw in the discussion, the Umis calculated in the river plume regions may be high biased. Future studies should address this issue, by comparing the variabilities obtained from GLORYS to the ones estimated from other models or with other in situ datasets.
Conversely, the distribution of reduced differences is too flat in the Gulf Stream region, the STD is too high by about 15%, which tends to conclude that Usat is too low there. This could be related to remaining seasonal variation of biases at high northern latitudes possibly linked to ice contamination, calibration flaws related to sun or antenna temperature or RFIs impacts, which are not well represented in the Usat values.
The interest of this method is also to carry out comparative validations between different products. The results of these tests on the previous version of the CCI products (version 2), highlights the clear improvement of uncertainty estimates in the new version related to modifications in the error propagation in the optimal interpolation methodology.
This shows the importance of carrying such tests for the validation of satellite products and their uncertainties.
For reasons of data availability and computational time we have restricted the study to three years, over the period common to SMOS and SMAP. It would be interesting for complete validations to conduct a similar study for the whole period of the CCI products. The work on level 4 CCI+SSS data presented here could also be applied to other types of data, by adapting the spatio-temporal scales on which the uncertainties due to sampling mismatches are calculated.