Wheat Yield Forecasting Based on Landsat NDVI and SAVI Time Series

Due to increase demand of food grain in the world, assessment of yield before actual production is important in making policies and decisions in agricultural production system. For a large area, forecast models developed from vegetation indices derived from remote sensing satellite data possesses the potential to give quantitative and timely information on crops over large areas. Different vegetation indices are being made used for this purpose, however, their efficiency in estimating crop yield is needed to be certainly tested. In this study, wheat yield forecast was derived by regressing ground truthing yield data against time series of spatial vegetation indices for the 2013 to 2019 growing seasons. These spatial vegetation indices derived from Landsat 8 image data: Normalized Difference Vegetation Index (NDVI) and Soil Adjusted Vegetation Index (SAVI) were compared to evaluate the most appropriate index that performs better in forecasting wheat production at Karcag, Kunhegyes and Ecsegfalva settlements in Jász-Nagykun-Szolnok county, in the Northern Great Plain region of central Hungary. The best time for making wheat yield prediction with Landsat 8SAVI and NDVI was found to be the beginning of ripening period (160 day of the year) with higher correlation between the vegetation indices and the wheat yield. The validation results revealed that the model from SAVI provides more consistent and accurate forecasts yield compared to NDVI. The SAVI model forecast yield for the validation years, 2018 and 2019 were within 6.00% and 4.41% of the final reported values while that of NDVI model were within 8.31% and 6.27%. Nash-Sutcliffe efficiency index is positive with E1= 0.99 for the model from SAVI and for NDVI, E1=0.57, which connote that the forecasting method developed and evaluated performs acceptable forecast efficiency.


Introduction
In the world today, there has been increase in concern about food security and sustainable agricultural development of which a key component is the accurate estimation of supply and demand of major crops like wheat. Wheat is the third most grown cereal after maize and rice with a production of 3.07metric tons/ha globally (World Wheat Production, 2016). Regarding to the 2008 food crisis and also the challenges of feeding the growing population, global agriculture and yield monitoring initaitives is needed. Due to increase demand of food grain in the world, assessment of yield before actual production is important in making policies and decisions in agricultural production system. Also, estimation of crop yield and early crop yield forecasting has a significant importance for policy makers to make timely decisions in planning for exports. Also, since world trading prices of agricultural products depends largely on their seasonal production levels, crop yields under crop production is important for national and international agricultural agencies, government agencies, and other crop marketing agencies too to set prices based on expected yield and plan earlier transportation of the crops.
The use of remote sensing for biomass monitoring has its base from its close relation to the canopy Leaf Area Index (LAI) and Fraction of Absorbed Photosynthetically Active Radiation (fAPAR) (Baret and Guyot, 1991;Prince 1991). Baret and Guyot (1991) discussed the potentials of different vegetation indices using the normalized difference vegetation indices, soil adjusted vegetation indeices, perpendicular vegetation index, and transformed soil adjusted vegetation index and described the relation between the vegetation indices and the Leaf Area Index (LAI) and fraction of absorbed photosynthetic active radiation. Prince (1991) also developed a model of crop primary production relating to the amount of absorbed photosynthetically active radiation (APAR) to net production of which it was discussed that there is an approximately linear relationship between the absorbed photosynthetically active radiation (APAR) and the normalized difference vegetation index (NDVI). Because of this almost linear relation, vegetation indices can be used as an indirect measure of primary crop productivity. The relationship described here between vegetation indices and biomass/fAPAR helps us to estimate crop yield since yield of many crops is mainly determined by the photosynthetic activity of agricultural plants in certain periods prior to harvest (Benedetti andRossini, 1993, Baret et al., 1989).
The interest in using satellite remote sensed data today for crop monitoring and crop production forecasting has tremendously increased due to its potential to produce data synoptically, with much spatial coverage, potentially at the global scale. Moreover, remote sensing is capable of providing timely (and potentially real-time) and objective data on crop growth at relatively minimum cost. Remote sensed vegetation indices (VIs) such as Normalized Difference Vegetation Index (NDVI) in particular has been greatly utilized for agricultural mapping and monitoring. In the past, National Oceanic and Atmospheric Administration's (NOAA) Advanced Very High-Resolution Radiometer (AVHRR) have been data source since 1980's for large scale crop yield forecasting and monitoring (Tucker et al., 1985;Malingreau, 1986).
Recently, remote sensing-based yield forecasting research has shifted to the National Aeronautics and Space Administration's (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) and other spatial resolution sensors. However, the spatial resolution of MODIS data is relatively low, with an available 250 m, 500 m and 1000 m (Bolton and Friedl, 2013). Cai and Sharma (2010)  Landsat multiband vegetation indices for wheat yield and discussed that vegetation indices (SAVI) derived from Landsat 8 satellite gives better estimation in the area. The Landsat 8 produces data with higher spatial resolution (30m) and free availability (Woodcock et al., 2008).
The higher the spatial resolution of a remote sensing data, the better its assistance in the identification of the low-resolution pixels. Studies on the relationship between crop yield and Landsat-derived vegetation indices are mostly bound to focus on individual fields (Liu et al., 2006;Lyle et al., 2013;Potgieter et al., 2014). Since Landsat derived vegetation indices have been proven to be one of the effective as tools for assessing vegetation condition, it is valid to use them to predict crop yield.
According to some field studies and airborne scanner experiments (e.g. Tucker, 1979;Tucker et al., 1980), green vegetation can be monitored by their spectral reflectance properties and combinations of the red and near-infrared reflecatnces (Vegetation indices). Deering (1978) proposed the Normalized Difference Vegetation Index which has become the most widely used for studying vegetation health and crop production. Moreover, minimizing the distribution of the effects on the relationships between vegetation spectral reflectance and crop yield, some researchers (like; Wu et al., 2007;Pena-Barragan et al., 2011;Dempewolf et al., 2014;Fu et al., 2014;Xie et al., 2015) refer to distance based vegetation indices such as Soil Adjusted Vegetation Index (SAVI). SAVI is applied in correcting Normalized Difference Vegetation Index (NDVI) for influencing of soil brightness in areas where vegetative cover is low. Rembold et al., (2013) gave a comprehensive overview regarding biomass and yield prediction methods. many studies focused on the quantitative relation between satellite or airborne remote sensing data and crop yields and make use of two main methods of the general strategies (Ferencz et al., 2004). The first method is by incorporating satellite remote sensing data into existing or advanced plant physiology, crop growth models (such as in Reynolds et al., 2000;Patel et al., 2001;Richter et al., 2011;Vuolo et al., 2013). The second method of general strategy is using direct mathematical relationships between satellite remote sensing and crop yields; in some researches, models use only satellite remote sensing data with ground-truth reference (reported crop statistics) data for calibration while some improve the predictive power of remotely sensed indicators by adding meteorological and agronomical data into the models. (such as in Rudorff and Barista., 1991;Dempewolf et al., 2014). There other studies on Linear regression models relating vegetation indices to crop yield (Rasmussen, 1992;Groten, 1993;Maselli et al., 1992). Nagy et al., (2018) regressing reported yield values of Wheat and maize yield against time series of 15 different peak-season MODIS-derived NDVI and showed that MODIS-NDVI could be used effectively to predict crop yields for Tisza river catchment area 6-8 weeks before harvest, and Bogná ra et al., (2017) also used time series of MODIS data to estimate and forecast the winter wheat yield of Hungary. This study will be using the same methodology but focusing on using vegetation indices derived from time series of Landsat 8 data to forecast winter wheat yield in Karcag, Ecsegfalva and Kunhegyes settlemets which are part of Já sz-Nagykun-Szolnok county in Hungary. The independent variable will be vegetation indices (SAVI and NDVI) and an empirical regression model will be developed to estimate final wheat yield which is the dependent variable.

2.2.Crop Statistical Data
Crop data/ Ground truth data, the average wheat yield (t/ha) of this area was obtained from the Research Institute of Karcag from 2013 to 2019. The data obtained were the identification of the farm, the year and the crop yield data.

Landsat Data
The time series Landsat 8 satellite images for the growing season 2013 to 2019 was downloaded from USGS website (http://earthexplorer.usgs.gov/). was used. There are 11 bands in Landsat 8. It also has operational land imager (OLI) and thermal infrared sensors (TIRS) with nine spectral bands and 30m spatial resolution. True-colour image are produced when the bands 1 to 7 which have different wavelengths with red, green and blue sensors are combined together. Band 8 has 15m resolution while the new band 9 which is ultra-blue is useful for cloud detection and for coast and aerosol studies. The band 10 and 11 are thermal bands which has their usefulness in providing more accurate surface temperature with 100m resolution (Landsat science, 2013). However, the Band 4 (red) and 5 band (NIR) were used for further processing. Square Error (RMSE) in prediction. The accuracy of the model for prediction is better when the R 2 is higher and the RMSE is lower. The R 2 and the RMSE was calculated using the equations 4 and 5 Where: yi is predicted yield data; y í is the ground truthing yield data; y ̅ is the average yield; n is the number of samples used for validation.
Also Normalized Root Mean Square Error (NRMSE) which is useful for comparing between predicted yield and ground truthing yield data and for comparing between seasons in case of variable yield ranges is calculated as Equation 6:

Equation 3
Another criterion used to assess the accuracy of the predictions of the wheat yield is Nash-Sutcliffe efficiency 'E1'. Nash and Sutcliffe (1970) defined the efficency E1 as one minus the sum of the absolute squared differences between the predicted yield and the ground truthing yield data normalized by the variance of the ground truthing yield data during the period under investigation. This is calculated as Equation 7: Where, Oi is the observed value or ground truthing yield data, Pi is the predicted yield values; O ̅ is the mean/ average observed values. Values of E ranges between 1.0 (which is perfect fit) and −∞. Here, an efficiency that is lower than zero indicates that the average value of the observed yield would have been a better predictor than the model.

Results
All these vegetation indices for each month with the field yield data were used to calculate equations for predicting wheat yield using simple regression analysis (Fig 2.)   . VI's with higher correlation were used respectively to establish regression models. The relationship between Landsat-NDVI and wheat yield were best described by linear function, accordingly, with the strongest correlation and highest accuracy (Fig.4). The deterministic coefficient of prediction models derived based on Landsat NDVI in different dates was varied during the vegetation period, and showed the highest value, R 2 = 0.569 in the 160 th day. This mean that wheat in maximum biomass is the best time to make yield prediction. The performance and the validation of the models created from 6 years vegetation indices for wheat yield forecasting was validated using these 2018-2019 two years data. The results derived were compared to the ground truthing yield for each year. RMSE and the relative deviation (difference in percent) of forecast versus reported yield were calculated (Table 1).  were not much but higher than the 5% threshold that was generally accepted as good value (Ferencz et al., 2004).

SAVI
The SAVI data on day 158, 159 and 160 of the year shows the strongest relationship with the average observed yield (R 2 = 0.500, R 2 = 0.521 and R 2 = 0.534 respectively). Figure 8. shows the R 2 values (coefficient of deterministic) peaked on the 160th day of the year which is during the beginning of ripening. The performance and the validation of the models created from 6 years vegetation indices for wheat yield forecasting was validated using these 2018-2019 two years data. The results derived were compared to the ground truthing yield for each year. RMSE and the relative deviation (difference in percent) of forecast versus reported yield were calculated (Table 2).    average. Figure 12. shows the Average absolute deviations between estimated and actual yield for Day 158, 159 and 160, which are, 5.03%, 6.00% and 4.41% respectively. These values were not much but higher than the 5% threshold that was generally accepted as good value (Ferencz et al., 2004).
But the performance of the Landsat NDVI and SAVI wheat yield forecast was also tested using the Nash-Sutcliffe efficiency index, (E1), which is a global measure of model efficiency. The Nash-Sutcliffe efficiency index is positive with E1= 0.57 in the NDVI wheat forecast and that of SAVI is E1= 0.99.

Discussion
In this study, the Landsat derived vegetation indices during the flowering stage of the wheat development was found to be siginificantly correlated to the wheat yield in all study site. In agreement with previous studies, the current study has demostrated correlations between the VIs during the flowering/fruiting spike period and crop yields (Marti et al., 2007, Labus et al., 2002, Mkhabela et al., 2011, Tiecheng et al., 2019, which has been admited as the most critical period for most crops yield forecasting, however, SAVI demostrated higher correlations during the flowering/fruiting spike period and crop yield. This also is in agreement with Liaqat et al., For crop yield prediction, MODIS data is more suitable at large regional scales. But because of the coarser pixels of the reflectance data from MODIS, a higher scale errors is usually resulted . However, Landsat or any other satellites that has higher spatial resolution do provides better precision fragmented vegetation index for predicting yield at regional scales (Bolton and Friedl, 2013), of which it has been achieved in predicting the yield of wheat in this study. In agreement with other studies, this study has shown the validty of using satellite derived vegetation indices for predicting wheat yield. The result after validation shows a fairly good agreement between the wheat yield derived from Landsat derived SAVI and ground truthing yield; with an average absolute deviation between the predicted yield and observed yield was 4.22% while that of NDVI was 6.47%, of which the SAVI is less than the 5% threshold that was generally accepted as good value (Ferencz et al., 2004) remote sensing images such as Landsat 8 images have the potential for predict wheat yield even at a regional scale like this study area and much more in a small farm because of its spatial resolution.

Conclusion
This study has shown that vegetation indices derived from Landsat 8 time series data images can be used to predict winter wheat yield for this study site. This study was based on high spatial resolution data (Landsat-VIs) and ground truthing data of winter wheat which are grown in Karcag and other nearby settlements. The result of this study has shown that the beginning of ripening period was the best time to predict winter wheat yields. The NDVI and SAVI of 160 th day of the year was verified to best fit for wheat yield prediction with the highest R 2 . In addition to this, the relationship between vegetation indices derived from Landsat (NDVI and SAVI) and wheat yield was well described by a linear regression model. This proposed model showed a better performance for wheat yield prediction with SAVI model based performing better than NDVI model based. Hence, model derived from SAVI is considered to be better than NDVI. This development of wheat yield forecast will provide timely information on wheat production, status and yield in a well standard and regular pattern from this region to the international catchment level.
Aside the accuracy of the yield prediction, another essential component in forecasting is the understanding of the applicability of the prediction because the main goal is to minimize forecast uncertainties for a particular location and for an agricultural or economical sector.
With this prediction method, good estimation is provided on time during the growing season of the crops and this can be updated periodically as the season goes until harvest. Timely information like this to the farmers or decision makers will minimize the impacts of possible losses of crop yield and appropriate mitigation measures and readiness by putting some plans in place. This information will also help stakeholders to take early decisions and locate geographically, areas that has large variation in productivity of which is most vital necessity for food security and trade.