Simultaneous retrieval of soil, leaf, and canopy parameters from Sentinel-3 OLCI and SLSTR multi-spectral top-of-canopy reflectances

Abstract: Multiand hyper-spectral, multi-angular top-of-canopy reflectance data call for an 1 efficient retrieval system which can improve the retrieval of standard canopy parameters (as 2 albedo, LAI, fAPAR), and exploit the information to retrieve additional parameters (e.g. leaf 3 pigments). Furthermore consistency between the retrieved parameters and quantification of 4 uncertainties are required for many applications. We present a retrieval system for canopy and 5 sub-canopy parameters (OptiSAIL), which is based on a model comprising SAIL, PROSPECT6 D (leaf properties), TARTES (snow properties), a soil model (BRDF, moisture), and a cloud 7 contamination model. The inversion is gradient based and uses codes created by Automatic 8 Differentiation. The full per pixel covariance-matrix of the retrieved parameters is computed. For 9 this demonstration, single observation data from the Sentinel-3 SY_2_SYN (synergy) product is 10 used. The results are compared with the MODIS 4-day LAI/fPAR product and PhenoCam site 11 photography. OptiSAIL produces generally consistent and credible results, at least matching the 12 quality of the technically quite different MODIS product. For most of the sites, the PhenoCam 13 images support the OptiSAIL retrievals. The system is computationally efficient with a rate of 14 150 pixel s−1 (7 ms per pixel) for a single thread on a current desktop CPU using observations 15 on 26 bands. Not all of the model parameters are well determined in all situations. Significant 16 correlations between the parameters are found, which can change sign and magnitude over time. 17 OptiSAIL appears to meet the design goals, puts real-time processing with this kind of system 18 into reach, seamlessly extends to hyper-spectral and multi-sensor retrievals, and promises to be 19 a good platform for sensitivity studies. The incorporated cloud and snow detection adds to the 20 robustness of the system. 21


Introduction
Multi-and hyper-spectral multi-angular top-of-canopy reflectance data call for an 24 efficient retrieval system which can improve the retrieval of standard canopy parameters 25 (as albedo, LAI, fAPAR), and exploit the information to retrieve additional parameters 26 (e.g. leaf pigments). Furthermore, consistency between the retrieved parameters and 27 quantification of uncertainties are required for many applications. [1] give an overview 28 of methods for the retrieval of canopy parameters from hyper-spectral data and make 29 suggestions for future developments. Among the identified requirements are: per pixel 30 uncertainties and computational speed. 31 32 In their review on LAI, [2] report from the literature on suggested ways forward 33 for the retrieval algorithms. This comprises the inclusion of contributions from more 34 complicated background elements, including water, snow and understory vegetation. 35 For algorithms that do not rely on land cover information, it is suggested that multiple 36 sensors, multiple spectral bands, and observational geometry are likely to improve the   The full soil and surface state model computes the surface albedo below the canopy 95 and has four independent parameters and three components: dry soil (Sec. 2 The dry soil reflectance model is based on Empirical Orthogonal Functions (EOF, [13], 101 [14]) of the spectra of lab-dried topsoils from the ICRAF-ISRIC Soil VNIR Spectral Li-102 brary ( [15]). The dry soil spectrum is simulated as the unweighted average spectrum  Anthocyanin content µg/cm 2 C brown brown pigments content 1 C w equivalent water thickness cm C m dry matter content g/cm 2 Soil BRDF sub-model (Ross-Li-R) (f iso ) isotropic kernel factor, diagnosed from snow albedo 1 f vol volumetric scattering kernel factor 1 f geo geometric scattering kernel factor 1 Snow sub-model (TARTES) snowheight height of a single snow layer with fixed properties m Soil albedo model (EOF+Philpot) EOF1 factor for empirical soil spectrum variation 1 1 EOF2 factor for empirical soil spectrum variation 2 1 moist relative moisture saturation of soil 1 classes, projection onto the EOFs is largely uncorrelated with soil class. One reason may 112 be that FAO 74 soil classes (and many other soil classification systems) are focusing on 113 the agricultural use of the soil rather than its immediate physical or chemical properties.

114
Therefore it remains to be shown, how well the principle components align with Soil

115
Organic Carbon (SOC) content or other physical soil properties (cf. [16], [17]).   The dry soil spectrum is modified by a third parameter, namely the relative sat-123 uration of the soil with moisture, which drives a moist soil spectral model suggested 124 by [18]. The advantage of this approach is that for a given dry spectrum the moist 125 spectrum is simulated without a priori knowledge of the saturated spectrum. In addition 126 to Philpot's formulation, the two parameters d (mean free path in pore water), and f w 127 (fraction of moisture on the surface), are parameterised by the soil moisture expressed as 128 relative saturation. While this relationship obviously is influenced by the properties of 129 the individual soils, a general tendency can be deduced from a curve fit (Fig. 4) to the 130 data of the four soils presented in Fig. 8 of [18].

131
Hence this parameterisation is chosen as: with d 0 being the maximum mean free path in pore water of saturated soil (set to 305 µm 133 = 0.0305 cm). A similar attempt is made by [19], empirically fitting gravimetric water  misfits for a given soil type seem to balance to a certain extent, for instance, for a given  The surface reflectance anisotropy below the canopy is simulated by the Ross-Thick-

162
Li-Sparse-Reciprocal model as documented in [20], including the approximate integral 163 formulas for white-sky and black-sky albedo. Just two global parameters are used, 164 corresponding to the factors of the geometric and the volume scattering kernels. They 165 are scaled for each band to match the spectral albedo prescribed by the surface model.

166
The factor for the isotropic kernel is thus fully determined and has a spectral dependency.

167
This approach is motivated by the observation that normalised BRDF kernel factors for 168 bare soils show substantial correlation across bands (e.g [21], [22]). We eliminate the 169 spectral dependency of the kernel factors by normalising them with the spectral BHR.

170
This leads to a frequency-independent anisotropy factor (ANIF, see [23]) and preserves 171 the Lambertian equivalent albedo computed by the soil model. Note that this is not the 172 same as normalising with f iso , since the other kernels also contribute to the BHR.  the computation for asymmetry parameter g, absorption coefficient σ abs , and extinction 194 coefficient σ ext as shown in [25], eq. B1-4: where C v is the volumetric droplet concentration, a eff the effective droplet size, λ is 196 the wavelength, k = 2π/λ, χ is the imaginary part of the refractive index of water, 197 κ = 4πχ/λ. Using eqs. 4.42 and 4.48 of [24], we have for the bi-hemispherical reflectance 198 of a weakly absorbing semi-infinite cloud, r ∞ , x and y are called generalised radiative transfer parameters. From Section 2.4 of [25] it follows that thus introducing the cloud thickness L, which is the single parameter of this sub-model (effective droplet size a eff is fixed to 12¯m, and the volumetric droplet concentration C v to 1.0 × 10 −7 ). To compute the reflectance of a finite cloud with black background, we extend eq. 4.30 from [24] with the uncollided radiation, Here, µ 0 = cos(θ 0 ) with solar zenith angle θ 0 , µ = cos(θ) with observer zenith angle θ, and K 0 (µ) = 3 7 (1 + 2µ) is the escape function. Uncollided, directed radiation passes the cloud as T uncoll, down = exp(−τ/µ 0 ), (9) where τ = σ ext L is the optical thickness. Collided directed radiation is treated as diffuse after its passage through the cloud, with factor The single cloud thickness parameter L is expected to show a certain degree of variation   In order to convert spectral information to broadband quantities, e.g. compute 212 the fAPAR, a solar reference spectrum is used. This data is identified as: ASTM G173-213 03 Reference Spectra Derived from SMARTS v. 2.9.2 1 . Fig. 6 gives an example of the 214 simulated spectra from different sub-models and the broadband integrated quantities 215 for a given parameter set. The SY_2_SYN product combines the information from the Ocean and Land Colour      Table 4 gives an overview of the sites, which are used in this study,       The objective function to be minimised in order to retrieve the parameters, is a combination of a data term and a prior term.
The former measures the misfit between model and observation, weighted with the uncertainties, while the latter measures the deviation form prior assumptions as described in section 2.3.1. This is the standard Bayesian approach to find the a posterior probability of variables which are approximately normally distributed. The degrees of freedom for the final χ 2 -test are updated for every pixel, depending on the number of available observations. Failed inversions are marked as untrusted, badly failed inversions are discarded. J data could use inter-band correlation information, if it was available. Presently it is formulated as which corresponds to using a diagonal variance-covariance matrix. N is an index set of valid observations. Since it is assumed that the distribution of the x is standard normal, the prior term is simply

286
The optimal parameters for a single pixel are determined by minimising the cost 287 function. Here we use a quasi Newton, reduced memory, BFGS algorithm ( [29]). Efficient  Having minimised the cost function we want to specify the posterior uncertainties of the optimal model parameters, i.e. the covariance matrix Σ x . It is the inverse of the Hessian matrix at the minimum. The Hessian is the second order derivative of the cost function and computed here by vector forward over scalar reverse mode of AD ( [33]). In essence, that means that the generated gradient code is differentiated again, this time in forward mode, to get the code for the computation of the second order derivatives. In order to compute the full Hessian the vector forward mode is used. It propagates several perturbations simultaneously, allowing the extraction of the full Hessian in a single sweep, and such gaining a speed up for increased temporal and spatial locality ( [34]). Since the number of parameters is small we can directly invert the Hessian matrix. The posterior covariance matrix Σ y of the non-parameter values, such as the diagnosed quantities (sec. 2.1.5), is achieved by applying the Jacobian ∂ x y of a result vector y to the prior covariance: Code for the computation of ∂ x y is generated in vector forward mode AD, which is 300 then used to extract ∂ x y in a single sweep. Fig. 8 gives an overview of the full OptiSAIL 301 system.      data of different years did not seem sufficient to explain the fundamental difference 339 observed in fAPAR after September 2020 (Fig. 11l right) and the persistent abrupt loss 340 of Canopy Chlorophyll Content in January 2020 ( 12). Since the OptiSAIL processing is 341 time-independent, a qualitative difference in the data must have occurred. It is interest-342 ing to note, that also the MODIS product shows less variability of fAPAR for the period 343 after September 2020.   Fig. 13 shows further retrievals at the deciduous broadleaf forest site NEON-D02-SERC. A fairly clear signal is retrieved for the leaf brown matter fraction (Fig. 13a), and for the leaf and canopy chlorophyll content (Fig. 13d). Note that for deciduous sites outside the growing season, leaf pigments are only constrained by the weak prior assumptions and can therefore show quite noisy behaviour. By looking at the pigment content of the canopy rather than the individual leaf, the corresponding temporal evolution is more regular, as for instance for Canopy Chlorophyll Content (CCC). We compute the propagation of the variance-covariance matrix to CCC as

Results and Discussion
with σ LAI,Cab = r LAI,Cab σ LAI σ Cab being the covariance of LAI and Cab, computed from 346 the correlation r LAI,Cab , as provided by OptiSAIL. To demonstrate the importance of the 347 correlation information, the CCC has once been computed neglecting the correlation 348 of LAI and Cab in the uncertainty propagation (Fig. 13c), and once using the full Eq. 349 15 (Fig. 13d). The consequence is a systematically reduced uncertainty, which in turn   determined. Here, they mainly serve to reflect the uncertainty at the lower boundary 372 of the canopy. This is different where more of the bare soil is directly exposed to the 373 observer.

374
The surface moisture saturation parameter ("moist" in Tab. 1) can exhibit a very high 375 natural variability due to rain and dew. Even a superficial assessment of the quality is 376 beyond the scope of this study, even though, especially at some of the NEON sites, soil 377 moisture measurements very near the surface are available.

391
With OptiSAIL, we have documented and demonstrated an inversion system for the 392 retrieval of canopy and soil parameters from multi-angular, multi-to hyper-spectral data.

393
The application to the synergy (SY_2_SYN) top-of-canopy reflectances based on the OLCI 394 and SLSTR sensors of ESA's Sentinel-3A and -3B satellites shows good performance 395 when directly compared to the 4-day LAI and fPAR product based on the MODIS sensor.

396
Existing discrepancies tend to be more in favour of OptiSAIL, even though or because 397 OptiSAIL is not using the biome type as ancillary data. Note that for OptiSAIL, the 398 individual satellites and visits were process independently.

399
OptiSAIL has integrated cloud contamination detection in order to complement the 400 cloud flags of the satellite product. The simple assumptions of the underlying cloud 401 model (spectrum of a semi-infinite water cloud with fixed effective droplet radius and 402 droplet concentration) appear to be effective for the detection of spurious clouds.

403
In order to extend the retrieval into snowy regions, a sub-canopy snow layer is simulated,   posterior covariance matrix as part of the retrieval product.  Increased global performance could be achieved by porting the code to a GPGPU 437 (General Purpose Graphics Processing Unit) or a similar device, since the processing of 438 all pixels is strictly parallel without inter-dependencies, which is in line with the design 439 goals of this type of devices.

440
The present study has focused on processing data from single overpasses of the Sentinels.

441
However, it is common practice to aggregate observations in time over at least a few 442 days (e.g. MODIS 4-day LAI/fPAR product), in order to get better angular sampling and 443 to fill gaps. While the processing of more observations per inversion will take slightly 444 longer, there would only be needed one inversion every n days, such that a net speed-up can be expected for the processing.

446
The system seamlessly extends to multi-sensor and hyper-spectral retrievals. Other

451
The direct coupling of the system with an atmospheric correction model for a joint 452 retrieval from top-of-atmosphere radiances is a viable and logical next step. Resources program for contributing their camera imagery to the PhenoCam archive.

Conflicts of Interest:
The authors declare no conflict of interest.

478
The following abbreviations are used in this manuscript: