Early crop-type mapping under climate anomalies

: Crop-type mapping is an important intermediate step for cost-effective crop management at the ﬁeld level, as an overview of all ﬁelds with a particular crop type can be used for monitoring or yield forecasting, for instance. Our study used a data set with 2400 ﬁelds and corresponding satellite observations from the federal state of Bavaria, Germany. The study classiﬁed corn, winter wheat, winter barley, sugar beet, potato, and winter rapeseed as the main crops grown in Upper Bavaria. We additionally experimented with a rejection class "Other", which summarised further crop types. Corresponding Sentinel-2 data included the normalised difference vegetation index (NDVI) and raw bands from 2016 to 2018 for each selected ﬁeld. The inﬂuence of raw bands compared to NDVI was analysed and the classiﬁcation algorithms, i.e. support vector machine (SVM) and random forest (RF), were compared. The study showed that the use of an index should be critically questioned and that raw bands provided a wider spectral bandwidth, which signiﬁcantly improved the mapping of crop types. The results underline the use of RF with raw bands and achieved overall accuracies (OA) of up to 92%. We also predicted crop types in an unknown year with signiﬁcantly different weather conditions and several months before the end of the growing season. Thus, the inﬂuence of climate anomalies and the accuracy depending on the time of prediction were assessed. The crop types of a test site and year without labels could be determined with an OA of up to 86%. The results demonstrate the usefulness of the proof-of-concept and its readiness for use in real applications.


Introduction
Agricultural productivity needs to be increased and its environmental impact must be reduced to meet the future demand for food and achieve sustainable development goals (SDGs). The recognition of crop types and their spatial distribution is becoming increasingly important, as many new methodological developments are based on their knowledge [1]. Knowledge of the crop type is essential because it can be used to derive targeted and crop type-specific management zones and management decisions, to make crop type-specific forecasts of crop development, and to make advanced yield prediction on a field-and region-specific basis. In the context of cross-compliance measures, knowledge of the crop grown and documentation of its areas is also crucial, as this is also the basis for support payments and sustainability can be better assessed through knowledge of the crop types. Through "green direct payments" (or "greening"), the European Union (EU) supports farmers who are trying to meet environmental and climate goals. One of the required measures is crop diversification, which must be controlled. Efficient crop mapping will promote this process. Thus, efficient agricultural land classification gives suppliers, policy makers and governments an up-to-date overview of the land under cultivation.
Remote sensing provides multispectral data, and especially since the launch of the Sentinel mission, the spatial and temporal availability of data has been continuously increasing. With its various satellites and sensors, the Sentinel mission enables innovative applications in different domains, such pages 1 -20 3 of 20 of research studies without the need to develop own processing pipelines. Our data set differs from previous studies regarding the number of fields and by inspecting a total of 2400 fields grown in three years with multiple crops, as well as referencing Sentinel-2 imagery with ground observations. The data set covers the years 2016 to 2018. Plant growth in 2018 was affected by a climate anomaly with significant drought and higher temperatures that impacted plant growth.
The objectives of this research are to find out whether (i) index-based classification is suitable for mapping crop types, (ii) raw bands help to improve classification, (iii) how well crop types can be classified in a year with widely differing weather conditions, and (iv) whether the classification accuracy depends on the phenological phase?

Study site
The study area is Upper Bavaria in southern Germany, with a size of approximately 17529 square kilometers, of which about 7875 square kilometers make up agricultural land [37]. The main crop types are corn, winter wheat, barley, rapeseed, sugar beet, and potato. Of these, corn, winter wheat, and barley are the predominant crops. In general, Bavaria has an annual mean temperature of 8.6°C and an annual mean precipitation of 811 mm. Figure 1 shows the typical crop periods with the sowing and harvest months. Figure 2 illustrates the average yearly maximum and minimum temperatures, monthly precipitation, and the test site. We focused on the vegetation period 2016-2018 and summarised the most important weather parameters in Table 1. For example, the average temperature during the March to July growing season in 2016 was 12.3°C, while in 2018 it reached 13.7°C. The rainfall amount as well as the ET0 values (FAO-56) [38,39] are based on the mean sum per field. While the rainfall is similar over the years, the evapotranspiration has steadily increased. This is partly due to rising temperatures and indicates an increased water demand. Unfavourable weather conditions did not only affect crop yields or reflectance values. These anomalies also affected crop-type mapping in 2018. Table 1 also shows an overview of the fields for crop-type classification. A total of 2400 samples were used for the experiments, with all 7 classes evenly distributed. 2099 fields were randomly chosen from different regions in Upper Bavaria to match the whole administrative district. In addition, a validation set of 301 neighboring fields from one region and for the year 2018 was created to apply the trained models to a coherent area and generate images for visual assessment. The Bavarian State Ministry of Agriculture and Forestry (StMELF) provided the underlying ground-truth data with all fields and corresponding crop types. The study focuses on Upper Bavaria and the crops considered are corn, winter wheat, winter barley, sugar beet, potato, winter rapeseed, and an additional rejection class called "Other", which covers all other crops, such as grassland or summer barley. Table 1. Overview of the samples per year with corresponding climatic conditions. The coherent area is shown in Figure 2 Figure a) shows a test site with decimal latitude and longitude coordinates (WGS 84). Figure b) shows Upper Bavaria and Bavaria as an overlay. Figure c) and d) show the average maximum and minimum temperatures and the monthly precipitation.

Sentinel-2
The Sentinel-2 mission, with its two twin satellites Sentinel-2A and Sentinel-2B, monitors the Earth with a resolution of up to 10 m/pixel and a revisit time of 2-5 days depending on latitude. The availability of useful data is nevertheless limited due to clouds. Each identical satellite is equipped with a multispectral sensor, that covers 13 spectral bands and has a swath width of 290 km. Sentinel-2A was launched in June 2015 and Sentinel-2B was launched in March 2017. Sentinel-2 provides visible and near-infrared (NIR) spectral bands (B2, B3, B4, B8) with a resolution of 10 m/pixel and SWIR bands (B11, B12) with a resolution of 20 m/pixel. Band 1 is for aerosol, while bands 9 and 10 are suitable for detecting water vapour and cirrus clouds. These bands provide a resolution of 60 m/pixel for atmospheric corrections and do not need higher resolutions. Table 2 summarises the properties. In this study, Sentinel's Level-1C product was downloaded from Google Earth Engine, which stores terabytes of up-to-date satellite data and provides an efficient and fast way to access this information through a JavaScript or Python interface [35,40]. Raw bands and NDVI time series were downloaded for each test field and year.
Ratio of red to NIR Furthermore, Google Earth Engine provides Level-2A (L2A) images, which are bottom-of-atmosphere images, based on the European Space Agency's Sen2Cor open source processor [42]. This processor handles all the necessary atmospheric corrections and scene classification tasks. Google Earth Engine provides Level-2A images starting from March, 2017. Former data needs to be processed with Sen2Cor. Current methods for generating surface reflectance products are compared in the scope of the Atmospheric Correction Intercomparison Experiment (ACIX) and the research community [43,44]. We decided to use the Level-1C product for this research study because L2A is not mandatory to show that an index is not the appropriate choice for crop-type detection, and to investigate predictive ability. Nevertheless, in a previous work, we observed that L2A improves the quality of images but minimally improves the prediction accuracy for some applications. This has also been observed in other research studies [20]. Figure 3 show the temporal patterns for every crop type. We compared the minimum and maximum values, the standard deviation, and the mean value to determine a significant feature. Time, as a feature, uncovers differences as well.
The mean values separate the crop types very well and are used in our experiments for the raw bands as well as the NDVI. We computed the NDVI using Google Earth Engine and downloaded all the results into a corresponding file for further analysis. The NDVI was developed to monitor biomass or plant health and has been widely used for various applications. For each field, the pixels were aggregated to a mean value to generate a time series for each field. The satellite images were taken from February to August. All data was downloaded using a cloud meta filter of 20%. The CLOUDY_PIXEL_PERCENTAGE parameter considers cloudy pixels for a Sentinel-2 scene but does not recognise all cloudy pixels at the field level.

Temporal Interpolation
Another important task of the study is temporal interpolation. Working with time series entails several problems, such as gaps, different time steps, and noisy measurements. Lepot et all. [45] reviewed interpolation methods, such as nearest-neighbour and linear interpolation. Sentinel images are available every 3 to 5 days for our study sites but cloud coverage reduces the amount of suitable data, which needs to be re-sampled and interpolated. To cover the phenological development, we decided upon a 2-weeks resolution of our time series data. Polynomial interpolation performs slightly better, but does not approximate the real reflectance values when satellite data are not available for the beginning of the vegetation period. Linear interpolation shows stable performance in all the cases and is, therefore, used.

Metrics
The performance evaluation of crop-type mapping is based on overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA) and Kappa statistics [46]. The PA and UA values are presented for each crop type while the OA and Kappa values evaluate the global performance of our trained model. All the metrics are part of scikit-learn [47]. The OA represents the number of correctly classified crops divided by all the reference crops. The Kappa coefficient is a measure in the range 0-1 and quantifies prediction and ground truth. For instance, a value of 1 indicates a perfect match between the prediction and ground truth. The PA shows how well a classification matches with the ground truth while the UA refers to the reliability of the classification. Figure 4 illustrates the general processing chain with its data inputs, pre-processing, training, and final classification. OpenStreetMap was used to remove irrelevant information, such as roads, water bodies, and forests. Furthermore, we applied the standard deviation of NDVI as a filter to remove pixels, which did not change over time. An agricultural area usually changes during a vegetation period. The training and testing were carried out with official field boundaries and crop types from StMELF. The SVM and RF were trained with mean values (NDVI/raw bands) and tested with an independent data set [48][49][50]. Another experiment focused on the prediction for the year 2018 using a model trained with data from 2016 and 2017. This represents a challenging step, mainly because of the target year being divergent.

Crop-type classification with SVM and RF
We compared several interpolation methods in combination with SVMs and RF. Cleaning and interpolating the data set improves the performance of a classifier. We decided upon linear interpolation to fill in the missing values and a temporal resolution of 2 weeks from February to August. Our SVM and RF models were trained with NDVI and all the bands for 2016, 2017, and 2018. NDVI is a good proxy but the usage of all bands outperforms an index-based approach in our experiments. The verification is based on a random train/test split with a test size of 25% and 5 splits for K-Fold cross validation. The entire data set consists of 2099 samples, including 699 samples for 2016, 700 for 2017, and 700 for 2018. The additional coherent test site is visualised in Figure 6. The yearly crop data set is uniformly distributed and includes around 100 samples for every class. RF was parametrised with 1000 decision trees in the forest and SVM applied 10 for the "C" parameter and a linear kernel. Table  3 compares the performance of the classifiers using NDVI and all the bands. It clearly simplifies the collected results and shows that RF performs better in all cases, which is why only the confusion matrices for RF have been presented in Table 4 and Table 5. Nevertheless, SVM also classified the crop types very well. In general, the use of all bands improves the OA for SVM and RF. Table 4 summarises the results of RF using NDVI while Table 5 lists the performance of RF with raw bands and without the rejection class. The greatest influence on accuracy is provided by the raw bands. This is important because they cover a wider spectral range than an index alone. Thus, possible important insights are not lost. Table 5 confirms the performance without the rejection class. The performance of this model is very promising and is analysed in terms of feature importance ( Figure 5). It can be seen that bands B3, B5, B6, B7, B8, B8A, B11 and B12 are the most important features, and mainly from May to August. The results could be increased to 94% using hyperparameter optimisation, but the optimised values were omitted for this publication to keep the configuration simpler. On the one hand, the rejection class worsens the results, on the other hand, it includes all agricultural land. The class "Other" includes various types of crops, such as different types of grasslands. It introduced an error, especially in regions with crops not covered in our model. The best model was additionally applied to the coherent test site of 301 fields and for the year 2018. When training the classifier, this area was not included, which allowed us to test the performance of our model in a real scenario. Figure 6 shows the recorded results from StMELF which act as ground truth data, followed by the classification results. The OA of 96% was achieved without the rejection class and it can be seen that the classifier had problems to distinguish barley and rapeseed. This was a bit surprising since rapeseed is clearly different from other crops in spring.

Prediction in an unknown year with anomalies
The prediction for an unknown year was analysed to develop a system for detecting crop types without labels for the year considered. Especially in times of climate change, this experiment is important to assess to what extent drought or unusual growing conditions affect the performance of classifiers. Figure 7 shows the raw reflectance values for band B6 and the crop types considered in our study. The reflectance values are influenced by weather conditions and the growth progress. Nevertheless, the trend remains comparable across years, and a separation between crops is given as well. We trained a model with data from 2016 and 2017 and predict crop types for 2018 using corresponding satellite observations. The OA was 79% with and 80% without the rejection class "Other" (see Table 6 and 7), which is very accurate considering that 2018 was not comparable to previous years. Applying the classifiers to the test area in Figure 8 reached an accuracy of 75% and 86% without the "Other" class, respectively. The results are even comparable to those of the models trained with data for 2016-2018. The temporal patterns are preserved even in an unusual year, which is why the classification performs well. To get a more detailed insight, the dependence of the accuracy on the classification time was investigated in Figure 9. Here, we see that the PA for winter rapeseed is already very high in spring, which is related to the clear light green color. At the end of July and in August, the accuracy for all other crop types converges. This is also visible in the overall accuracy where from the end of July the maximum is almost reached.

Discussion
An important factor is pre-processing, which includes temporal interpolation and data cleaning. While linear interpolation is the right choice in many applications, polynomial interpolation approximates a time series more accurately, but introduces errors, especially for data sets with gaps at the beginning of the growing season, which is why linear interpolation was used in this study. Two weeks temporal resolution proved to be appropriate as it adequately matched the phenology and was supported by sufficient data in the test region. Single gaps could be filled by linear interpolation. For future works, it would be interesting to extend the observation period, as, in this study, Sentinel-2 observations were used from February to the end of August. This extension would cover the sowing and harvesting periods for all crop types as indicated in Figure 1). In our investigations, a small amount of clouds was acceptable because Random Forest can handle a certain amount of noise [51,52]. Nevertheless, a completely cloud-free time series for each field could minimally improve the results, but at the cost of run time, since each field time series must be processed with respect to cloudy pixels or shadows.
Analysis of the raw bands and comparison with the NDVI showed that the raw bands exerted a significant influence on the accuracy of a classifier. In this work, we chose the NDVI because other indices such as the NDRE did not result in better separation of crops (unpublished results). A specific objective of this work was to compare the index-based classification for crop mapping with the raw band-based classification. The use of all bands exploited additional information that improved differentiation at specific phenological stages. In general, previous multi-temporal approaches point to the advantages of time for crop differentiation but at the same time use an index by which this advantage is partially lost. For example, winter rapeseed fields can be clearly distinguished in spring with RGB bands, which becomes more difficult with a biomass index. Analysis of predictability as a function of time showed that the spring season is important for winter rapeseed. This could avoid misclassification of winter barley as winter rapeseed. Such an advantage of RGB information was explicitly observed for winter rapeseed, but in general the information from the B5, B6, B7, B8, B11, and B12 bands was significant.
The rejection class "Other" for further plant types, such as grassland, reduced the classification accuracies obtained with SVM and RF. Results described in section 3.1 showed that training a classifier with as many crop types as possible and avoiding the use of a rejection class increased the accuracy. However, from a practical point of view, this result was important because sometimes a model does not know all the crop types in a target region. In this context, transfer learning offers a possibility to extend an existing model by further crop types.
This study assessed how well crop types can be classified in a year with widely differing weather conditions, since the weather in 2018 was hot and dry. Therefore, an experiment was conducted to classify crop types in 2018 using a model trained with 2016 and 2017 data. Despite reflectance changes, the trained model was not inordinately affected by this anomaly and predicted good results. The temporal patterns of all crop types in 2018 thus remained comparable and could be distinguished. The early detection of the crop types as outlined in section 3.2 showed that maximal classification accuracy was reached from July onward, which is also consistent with observations in [33]. This emphasises that a model based on training data from average years can also be used for future years without labels for the target year. This experiments were also important because many previous studies are based on a train/test split of all collected data, which accordingly results in high accuracies. We hypothesise that climatic change in our test region will not affect the crop-type classification strongly as long as good water availability can compensate the effects of higher temperatures.
Overall, the results provided very good classification accuracies. The models explained the variance in our data set very well. Referring to the best results in Table 5 where the model was trained with data from all years, it can be seen that sugar beet, winter rapeseed and corn all achieve UA's and PA's above 90%. Winter wheat, winter barley and winter rapeseed are occasionally interchanged, which degrades the accuracies to some degree. Similarly, potatoes are sometimes confused with sugar beet and corn. We assume that due to the slight noise some time series become too similar although RF should deal with a small noise ratio. Special attention should be paid to winter wheat, winter barley and potatoes, as these have the highest potential for improvement in terms of accuracy. As a further extension of this work, it seems possible to apply a simplified approach to identify winter rapeseed based on spring color changes alone. As suggested in other research, radar-based information could also be used, which is also available from Sentinel-1. Retrieval of cloud-independent data may be beneficial in regions of high cloud cover, such as the tropics. However, Sentinel-2 data are preferable if they are available on a regular basis with at least one image every two weeks. For further quantification using other methods, our data set was partially made available to [2].

Conclusion
This study addressed several aspects of crop-type mapping. We used RF and SVM with their underlying configurations and achieved up to 92% overall accuracies. Using our approach, we classified seven crop classes with very high precision. Random Forest slightly outperformed SVMs with input features being critical for accuracy. Especially, the use of all raw bands uncovered additional information that could not be detected with a single index, such as NDVI, alone. In this context, we have shown that an index is not always useful. The use of a multi-temporal approach improved classification accuracies, but data gaps must be avoided, which was realised with a temporal resolution of two weeks and linear interpolation. Prediction of crop types for a year without labels and with extraordinary weather conditions showed that the spectral signature was still intact and crop types could be well determined at our study site. The accuracy of the prediction, depending on the time, increased steadily with the number of observations and reached a maximum from July onwards. An exception was winter rapeseed, which was already classified very well in spring. All other crops needed observations until July to reach the optimal classification level.