A Bayesian model for annual crop phenological parameter estimation using optical high resolution image time series

: Vegetation status assessment is crucial for agricultural monitoring and management. 1 Vegetation indices derived from high resolution image time series can be used to derive key 2 phenological parameters for annual crops. In this work, we propose a procedure for the estimation 3 of these parameters and their associated uncertainties. The approach uses Bayesian inference 4 through Markov Chain Monte Carlo in order to obtain the full joint posterior distribution of the 5 phenological parameters given the satellite observations. The proposed algorithm is quantitatively 6 validated on synthetic data. Its use on real data is presented together with an application to 7 real-time within season estimation allowing for phenology forecasting. 8


Introduction
Vegetation status assessment is a crucial information for agricultural monitoring 11 and management. In the coming years, climate change [1], increasing population [2] 12 among other factors will rise the level of pressure on agricultural production. Also, 13 agricultural practices are an important source of ecosystem degradation at the global 14 scale [3] and therefore, land use monitoring related to farming is crucial for sustainable 15 land management [4]. 16 Remote sensing imagery in general and, in particular, high temporal and high 17 spatial resolution data as the ones provided by Sentinel-2 [5] constitute a major asset for 18 this kind of application. Indeed, high temporal resolution provides access to phenology 19 monitoring through vegetation status indicators. 20 These indicators are usually derived from vegetation indices. For instance, the 21 well-known Normalized Difference Vegetation Index (NDVI) is largely used to monitor 22 fractional vegetation cover. The Leaf Area Index (LAI) is preferred when downstream 23 models requiring a physical magnitude are used. The choice between NDVI, LAI or other 24 vegetation indices is application dependent. Bolton and Friedl [6] show that remotely 25 sensed vegetation indices and phenology metrics can be used to forecast agricultural 26 yields. Using Sentinel-2 image time series, NDVI and LAI maps can be produced at 10 27 meter resolution every 5 days [7]. 28 For most modeling and diagnostic applications a set of metrics describing the 29 temporal evolution of the vegetation is preferred to the raw profiles. These can be for 30 instance the dates of the emergence of the vegetation (start of season), the maturity, the 31 senescence, etc. The extrema values of the vegetation index are also used. 32 All these metrics can be computed at different temporal and spatial scales depend- 33 ing on the application. Precision agriculture can use pixel-level estimations to deal with 34 heterogeneity within the fields. Field-level estimations are enough for other applications. 35 Evaluations at the end of the season can be useful for long term monitoring, while within 36 season estimations are needed for real-time management and forecasting. 37 For all applications, the availability of uncertainty estimates is crucial, since many 38 sources of errors are present in the computation of phenological parameters. Indeed, the 39 remote sensing observations can contain different sources of noise: missing acquisitions approach is illustrated for a specific phenology model for crops (a 6 parameter double 48 logistic function fitted on NDVI), but it is generic enough to be applied to other models. 49 The paper is organized as follows. In section 2, we review the existing literature on  The characterization of vegetation phenology using remote sensing has been widely 57 addressed in the literature. Most of the methods estimate phenological parameters using 58 either sigmoidal fitting based on the work of Zhang et al. [8] or empirical thresholds on 59 vegetation indices. The most usually extracted markers are start of season and end of 60 season [9], [10], which are also referred as onset and offset [11]. These 2 phenophases are 61 useful in determining the length of the growing season.

62
In the published literature, the onset of the vegetation appears under many different 63 names (SoS for start of season or start of spring, emergence date, etc.). It is also defined 64 in multiple ways which actually represent different phenological stages (e.g., the timing 65 when vegetation starts to green up, the timing when vegetation grows the fastest) [12], 66 [13]. For instance, [8] defines it as the first maxima of the rate of change. An inter-67 comparison of SoS retrieved using 10 satellite methods made by Xu et al. [14] shows that 68 the difference between individual methods can be as much as two months. It also shows 69 that most of the methods have large biases with respect to field data. Other works show 70 disparities up to 100 days [15].

71
Some studies also derive more than two phenology markers. For example, Zhang 72 et al. computed four parameters including the onset of green-up, maturity, senescence 73 and dormancy [8].

77
All these papers use the full vegetation cycle either for fitting a parametric model or 78 to produce a smoothed profile on which robust estimators can be applied. The key dates 79 are either estimated from the parameters of the fitted models, or by using constraints on 80 the profile derivatives. For instance, Zangh et al. [8] propose the first maximum of the 81 3rd derivative of the double logistic function, but other versions using the 2nd derivative 82 (the camel-back method [21]) also exist.

83
The simpler methods using empirical thresholds on vegetation index (VI) profiles 84 [22], [23], need both a smoothing window of several months and the knowledge of the 85 maximum value of the VI. For instance, Reed et al. [22] use 9 14-day composites, which 86 corresponds to more than 4 months. Furthermore, the thresholds are dependent on the 87 vegetation type and the site.

88
Most of the literature cited above uses medium or low resolution imagery (AVHRR 89 and MODIS) which ensures regular cloud-free data and smooth temporal behaviors.

90
Only recently, have these approaches been compared using high spatial resolution data 91 (30 m) [24], and even though a 2-day revisit cycle was used, the whole season approach 92 was adopted.

93
A recent review on the topic was done by [25] showing that the choice of the most 94 appropriate model «depends upon the purpose of the study, the growth trajectory to be 95 analyzed and the targeted land cover type(s)». 96 Another recent review, focused on high spatial and temporal resolution image time 97 series [26] gives interesting discussion about the differences between ground phenology 98 and land surface phenology (satellite). This paper also lists performances of phenol-99 ogy estimation methods and concludes that «more efforts are required for testing the 100 efficiency of Sentinel-2 data in estimating phenophases of crops».

101
The difference between ground phenology and land surface phenology must be 102 stressed. Ground phenology is defined by very precise stages that can only be observed 103 on the field, like for instance the emergence of 50% of the seeds to the cotyledon stage.

104
This ground phenology is used by agronomists, but it is very costly to monitor and 105 suffers from observator biases. Remote sensing allows to observe a large number of 106 fields without observer bias, but is less accurate and should be considered as a proxy for 107 the ground phenology.   The approach does not need smoothing or gap-filling of the data as it is the case in 122 most published approaches.

123
It is worth noting that the Bayesian approach proposed here is completely generic 124 and can be applied to phenological models other than the 6-parameter double logistic.   The logistic function has the form: where the t variable represents time in our application.

136
The double logistic is an affine scaling of the difference of 2 logistics: where A + B and B are respectively the maximum and the minimum values of g(t).

137
The other 4 parameters control the slopes of growing and senescence phases: t 0 (resp. t 2 ) is the date of the maximum increasing (resp. decreasing) slopes and t 1 (resp. t 3 ) is 139 related to the speed of growth (resp. senescence). The growing and senescence slopes 140 can be obtained from the derivatives of the double logistic. Since From these expressions, we can derive the 6 phenology metrics of interest: and therefore, If we take this derivative as the slope of the line which joins SoS and Mat in figure   156 1, we have that and therefore, We can also assume t 0 ≈ Mat + SoS 2 .
Using the same reasoning, we can approximate With these approximations, we have direct relation between the phenological  As stated in section 2, the differences between ground and land surface phenology 168 as well as the assumptions made by the phenology models used, can lead to inaccurate 169 phenology characterizations. Being able to asses the uncertainty of the estimations is 170 therefore crucial for downstream uses, either by models or by decision makers.

171
In order to obtain uncertainty estimates of retrieved parameters from an NDVI time 172 profile, we will use a Bayesian approach. The interest of Bayesian estimation is that, 173 given a data generation model (the phenology model described in the previous section), 174 one can estimate the probability distribution of each of the parameters of the model (the 175 phenological variables) given an observation (an NDVI time profile).

176
The Bayes rule is written as where:  What is interesting here, is that we can estimate the complete distribution of the 193 parameters for a given observation, and therefore, have an accurate picture of the 194 uncertainty associated to each parameter.

195
The question now is how to choose the 3 terms in the right-hand side of equation (8). has a unit integral. We can therefore ignore this term and normalize p(θ|D) after the 201 fact.

202
In other words, we need to estimate and normalize to ensure  advantage here is that we can provide much richer information than an initial guess.
where ε(t) is an i.i.d. Gaussian noise. Therefore: µ(t) = m(t, max NDV I , min NDV I , SoS, Mat, Sen, EoS), where m(t, max NDV I , min NDV I , SoS, Mat, Sen, EoS) was defined at the end of section 3. With this model, the vector of parameters is θ = (σ, max NDV I , min NDV I , SoS, Mat, Sen, EoS) and includes the variance of the noise. Given this model, we can simulate NDVI profiles 247 by choosing values for the parameters, obtaining the mean µ(t) and sampling a Gaussian 248 distribution. 249 We now need to choose prior distributions for all the parameters to be estimated. 250 We will choose uniform distributions for simplicity. These are rather uninformative 251 priors.

252
For σ, which represents the standard deviation of the noise in the observations, we 253 will chose a maximum value of 0.1 which corresponds to 10% of the maximum expected 254 range for NDVI values. For min NDV I , we will define a range between 0 (bare soil) and 255 0.4 (presence of vegetation). The maximum value of NDVI will be defined relative to 256 the minimum value: we assume that a crop with a typical phenology value will have a 257 max NDV I at least 0.3 higher than min NDV I , and that the highest value will not be higher 258 than 1. have an SoS of 120 (late April). It is interesting to note that this latent variable will be 269 available after the inference, and therefore, the algorithm also provides a classification 270 output for winter and summer crops if we threshold this probability.

271
With the above considerations, the priors for the parameters are set as follows:

Prior predictive checks 272
To assess the ability of our model to represent realistic crop NDVI profiles, we can

277
An illustration of such prior predictive checks is given in figure 2. The legends 278 show the values for max NDV I , min NDV I , SoS, Mat, Sen, EoS and green (resp. red) plots 279 indicate a probability of Summer crop lower (resp. higher) than 0.5.

280
One can observe a wide variety of behaviors in terms of growing rates, dates, noise 281 level and NDVI dynamics.  The real data set is extracted from the public data sets built for the Sentinel-2

290
Agriculture project [30]. The data collection procedure and protocol are described in 291 [31].

309
In order to simulate this irregular acquisition pattern, we will generate NDVI 310 profiles with a 5 day time step and then randomly choose N dates.

322
We present the results of the experiments on both the synthetic and the real data 323 sets. For all experiments the NUTS algorithm [27] as implemented in the NumPyro 324 library [28,29] was used with 10000 samples after the generation of 100 warming steps.  Table 1 provides the results of the quantitative validation on the simulated data set    The results presented in section 6.2 illustrate the power of the Bayesian approach.

408
On one hand, the model is defined in terms of the application domain (equations (12) 409 through (14)) in a way that is independent of the inference approach. should be used. One approach would be using variational inference [33,34] where the   The proposed approach has been applied to simulated data to provide a quantitative The main limitation of the approach proposed in this paper is its computational cost.