Multi-Temporal Crop Type and Field Boundary Classification with Google Earth Engine

Crop type and field boundary mapping enable cost-efficient crop management on the field scale and serve as the basis for yield forecasts. Our study uses a data set with crop types and corresponding field borders from the federal state of Bavaria, Germany, as documented by farmers from 2016 to 2018. The study classified corn, winter wheat, barley, sugar beet, potato, and rapeseed as the main crops grown in Upper Bavaria. Corresponding Sentinel-2 data sets include the normalised difference vegetation index (NDVI) and raw band data from 2016 to 2018 for each selected field. The influences of clouds, raw bands, and NDVI on crop type classification are analysed, and the classification algorithms, i.e., support vector machine (SVM) and random forest (RF), are compared. Field boundary detection and extraction are based on non-iterative clustering and a newly developed procedure based on Canny edge detection. The results emphasise the application of Sentinel’s raw bands (B1–B12) and RF, which outperforms SVM with an accuracy of up to 94%. Furthermore, we forecast data for an unknown year, which slightly reduces the classification accuracy. The results demonstrate the usefulness of the proof-of-concept and its readiness for use in real applications.

Crop type classification is a goal of this research and entails the detection of crop types in unknown areas without a priori information. The detection of agricultural areas gives suppliers, policy makers, and governments an up-to-date overview of areas under crops and their potential yields. Remote sensing provides multispectral data, and the amount of available data has been increasing continuously, especially since the launch of the Sentinel mission. With its various satellites and sensors, the Sentinel mission enables innovative applications in different domains, such as agriculture or geology. Figure 1 presents a sample red-green-blue (RGB) image of Sentinel-2 with three different multispectral visualisations. The spatial and temporal resolutions are suitable for precision agriculture and its applications. In this study, we analyse the usability of the indices and raw bands of Sentinel-2 for crop type and field boundary mapping. A detailed comparison with current research activities is provided in Section 1.2. This publication additionally presents the methods chosen and an in-depth overview of the necessary steps to generate field borders or classify crop types.

Figure 1. Sentinel-2 in RGB and its multispectral indices, which deliver various important growth and health information for every crop type. The visualised area is one of our experimental areas in the scope of this research.

State of the art
The Sentinel mission is part of ESA's Copernicus program and includes several satellites, which monitor the Earth in sun-synchronous orbits and provide various spectral bands with a high and continuous temporal and spatial coverage. This study focuses on the capabilities of Sentinel-2A and 2B for crop mapping, especially the application of all the raw bands, in comparison with index-based approaches. Nevertheless, the lack of benchmarks and various metrics makes it difficult to compare performance. Various optical and radar-based data from Landsat 7, Landsat 8, Sentinel-1, and Planet have been analysed for yield forecasts [1,2]. The combined use of data from Landsat 8 and Sentinel-2 improves the temporal resolution for 2016, before Sentinel-2B was launched. Skakun et al. [3] mentioned that a combination with Landsat outperforms a single sensor especially for 2016-2017 data.
Since 2018, less impact has been observed owing to the nominal 5-day revisit cycle [3]. Data from the Moderate Resolution Imaging Spectroradiometer (MODIS) are generally used to forecast yields at the county level but do not fulfil the spatial resolution requirement of precision farming. Convolutional neural networks have shown excellent results with MODIS data. Crop type mapping is usually based on radar or optical data from Sentinel-1 or Sentinel-2, respectively. Some researchers fused both data sets and attained a maximum accuracy of 82% [4]. The authors state that a combination of Sentinel-1 and Sentinel-2 data outperforms approaches based on either radar or optical measurements alone [4]. Bargiel et al. [5] analysed phenological sequence patterns with Sentinel-1 and attained F1 scores of 0.74-0.79. Rußwurm et al. [6,7] experimented with long short-term memory neural networks and Sentinel-2 raw bands as input data, attaining an overall accuracy (OA) of up to 89.7%. Saini et al. [8] tested single-date images from Sentinel-2 using support vector machines (SVMs) and Random Forest (RF) in the Roorkee district of India. The results showed that the OAs of the classifications achieved by RF and SVM using Sentinel-2 imagery were 84.22% and 81.85%, respectively [8]. Immitzer et al. [9] confirmed the high value of the red-edge and shortwave infrared (SWIR) bands for vegetation mapping, and the usage of these raw bands additionally improved the classification tasks in comparison with single index-based classification studies. Dimitrov et al. [10] also used Sentinel-2 as training data to apply sub-pixel classification to PROBA images with a resolution of 100 m/pixel. The chosen machine learning methods, i.e., SVM and RF, provided classification accuracies ranging from 67% for grasslands to 92% for broad-leaved forests. Random forest performed better than maximum likelihood or SVM classifiers, with OAs in the range 73-93% [11].
Random forest, artificial neural networks, and SVMs have been compared in several publications for crop type mapping [12,13] but the effect of input data or predictions for future years based on past observed values have rarely been addressed.
In general, both object-based and pixel-based classification approaches are detailed in the literature, with an emphasis on the advantages of the former. For instance, an object-based approach was adopted in a previous work [14] with accuracies in the range 78.1-96.2%. Blaschke et al. [15] mentioned that object-based approaches use image segmentation to identify regions with similar characteristics of homogeneity, which reduces feature space and computation time compared with pixel-based approaches [15]. There exist several examples of unmanned aerial vehicle-based classification with high-resolution data [16,17]. Herein, RF achieved accuracies of up to 94.6% through object-based classification. While much research has focused on the mapping of crop species, the mapping of field boundaries seems to be a niche problem. However, it represents an important feature of yield prediction on the field scale, especially for regions without official field boundary data. National authorities can usually accomplish this with inputs from farmers. A satellite-based approach can automate this time-consuming procedure and track boundary changes over time. Contour detection and segmentation with high-resolution images obtained from WorldView data were analysed by Marshall et al. [18]. The results showed that neither method was suitable for challenging areas. While segmentation seemed to lead to over-segmentation, contour detection performed better in the case of fields larger than 0.01 square kilometre. Masoud et al. [19] proposed an approach based on convolutional networks, providing field boundaries and scaling up the resolution from 10 m/pixel to 5 m/pixel. A comparison with RapidEye images, which had an original resolution of 5 m/pixel, showed only minor differences. A convolutional neural network with only one composite image as the input was additionally verified [20].

Overview
The aim of this research was to implement an efficient crop type and field boundary detection method for regions without nationally available information. Object-based approaches increase classification accuracy and reduce computation time. Random Forest and SVMs are the focus of several research activities and attain excellent performance metrics. Nevertheless, all approaches show limitations in the case of time series forecasts. Furthermore, the influences of clouds and interpolation methods have not been analysed in most cases. Inglada et al. [21] considered interpolation methods and used Satellite Pour l'Observation de la Terre (SPOT) 4 and Landsat 8 data for classification. Most publications focus on one or more indices, such as the normalised difference vegetation index (NDVI) (Figure 2). The SWIR bands, B11 and B12, exhibit important properties and hence, we include all the bands and compare the results with those obtained using NDVI. Additionally, simple non-iterative clustering (SNIC) and our own index-based Canny edge procedure for field boundary extraction are introduced. While SNIC is a standard Google Earth Engine algorithm for the clustering of similar pixels, our approach benefits from the uniform colour distribution of the normalised difference water index (NDWI), its emphasised edges, and a compilation of several Canny edge images across the vegetation period. Our data set differs from those used in previous studies regarding the number of fields and temporal distribution. The data set extends from 2016 until 2018 and is also applied to the forecast for 2018, based on a model with data from 2016 and 2017. Time series forecasts dramatically influence the OA because of weather anomalies. The semantic segmentation of significant features, which are weather-independent to the extent possible, can boost classification accuracies and provide a stable classification for unknown regions and years.
Section 1 provides a general introduction including our motivation, state-of-the-art activities, and our goals. Section 2 gives a detailed description of our data set and study site, as well as the methods used such that an interested reader can replicate our work. The results are presented in Section 3, which additionally shows the various experiments and corresponding accuracies. The discussion and conclusions in Sections 4 and 5, respectively, complete this research work and entail proposals for further investigations.

Study site
The study site is Upper Bavaria in Southern Germany, with about 7875 square kilometres of agricultural land and an average field size of 0.02 km² [22]. The main crop types are corn, winter wheat, barley, rapeseed, sugar beet, and potato. Of these, corn, winter wheat, and barley are the predominant crops. Bavaria has an annual average temperature of 8.6 °C and average precipitation of 811 mm. Figure 3 shows the typical crop periods with the sowing and harvest months. Figure 4 illustrates the average maximum and minimum temperatures and the monthly precipitation.

Sentinel-2
The Sentinel-2 mission, with its twin satellites Sentinel-2A and Sentinel-2B, monitors the Earth with a resolution of up to 10 m/pixel and a revisit time of 2-5 days depending on latitude. The availability of useful data is nevertheless limited by clouds. Each identical satellite is equipped with a multispectral sensor that covers 13 spectral bands and has a swath width of 290 km. Sentinel-2A was launched in June 2015, followed by Sentinel-2B in March 2017. Sentinel-2 provides visible and near-infrared (NIR) spectral bands (B2, B3, B4, B8) with a resolution of 10 m/pixel and SWIR bands (B11, B12) with a resolution of 20 m/pixel. Band 1 targets aerosols, while bands 9 and 10 are suitable for detecting water vapour and cirrus clouds. These bands provide a resolution of 60 m/pixel, which is sufficient for atmospheric corrections. Table 1 summarises the properties. In this study, Sentinel's Level 1C product was downloaded from Google Earth Engine, which stores terabytes of up-to-date satellite data and provides an efficient and fast way to access this information through a JavaScript or Python interface [23,24]. The time series data include raw bands and indices for each test field and year from 2016 to 2018. We decided to focus on Level 1C products based on the assumption that they are appropriate for our case study and consistent with our data set, which includes data from 2016 onwards; in contrast, Google Earth Engine provides Level 2A images only from 2017 onwards. We plotted the min, max, standard deviation, and mean of every band to determine significant features differentiating crop types.
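As an illustration of the index computation and per-band statistics described above, the following Python sketch shows how NDVI and the plotted statistics could be derived from raw reflectance arrays; the function names and the epsilon guard are our own assumptions, not the study's actual code:

```python
import numpy as np

def ndvi(nir, red):
    """NDVI from Sentinel-2 B8 (NIR) and B4 (red) reflectances."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    # Small epsilon avoids division by zero over water or no-data pixels.
    return (nir - red) / (nir + red + 1e-10)

def band_stats(pixels):
    """Mean, min, max, and standard deviation of one band over a field."""
    pixels = np.asarray(pixels, dtype=float)
    return {"mean": pixels.mean(), "min": pixels.min(),
            "max": pixels.max(), "std": pixels.std()}
```

Statistics such as these, computed per band and per acquisition date, form the per-field feature vectors used later for classification.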

Figure 5. Reflectance differences for band 6, showing its mean, min, max, and standard deviation.

Figure 8. Cloud detection based on OpenCV
Several cloud detection techniques were evaluated with a focus on fast processing times. For instance, bands 10 and 11 were excellent candidates for cloud detection and were used in our data set. However, they did not guarantee a completely cloud-free time series. Sentinel-Hub [29] implemented a Python framework that detects clouds in Sentinel-2 images at the expense of processing time. We designed an alternative approach using OpenCV and its contour detection. Figure 8 shows three sample images with clouds detected in them. Our OpenCV-based cloud filtering takes RGB images as input and rejects images in which clouds are detected, such that a maximised cloud-free image collection serves as the input for field boundary detection. Clouds introduce noise and falsify the resultant boundaries. In contrast, crop type mapping was accomplished with a maximum cloud cover of 20% for every scene.
Another important task of the study is temporal interpolation. Working with time series data entails several problems, such as gaps, different time steps, and noisy measurements. Lepot et al. [30] reviewed interpolation methods, such as nearest-neighbour and linear interpolation. Sentinel images are available every 3 to 5 days for our study sites, but cloud coverage reduces the amount of suitable data, which needs to be re-sampled and interpolated. To cover the phenological development, we decided upon a 2-week resolution for our time series data. Figure 9 shows an example of raw Sentinel-2 measurements over time in comparison with cloud-filtered and interpolated measurements for the same field and year. The maxima on the left show clouds with high reflectance values. Polynomial interpolation performs slightly better but does not approximate the real reflectance values when satellite data are not available for the beginning of the vegetation period. Linear interpolation shows stable performance in all cases and is, therefore, used.
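The re-sampling step can be sketched with NumPy's linear interpolation; the helper name and day-of-year grid bounds are our own illustrative assumptions:

```python
import numpy as np

def resample_two_weekly(days, values, start=0, end=182):
    """Linearly interpolate an irregular, cloud-filtered time series
    onto a regular 14-day grid (illustrative helper, not the study's
    exact code)."""
    grid = np.arange(start, end + 1, 14)
    return grid, np.interp(grid, days, values)
```

For example, observations at days 0 and 28 with values 0.2 and 0.6 yield an interpolated value of 0.4 at day 14.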

SVMs and RF
Machine learning includes unsupervised, supervised, and reinforcement-learning models, which analyse, classify, and predict trends. Unsupervised learning provides a set of algorithms that uncover insights without additional labels. In contrast, supervised learning uses labels such that a target output is associated with various predictors. Reinforcement learning models entail agents, an environment, and corresponding interactions in order to maximise rewards. A detailed description of this approach can be found in the literature [31] but is not the focus of the present work. Supervised learning, with its methods of SVM and RF for classification analysis, is used for crop type classification in this work. Support vector machines can perform linear and non-linear classification tasks and regression analysis. Non-linear classification in particular uses a kernel that maps an input space to a high-dimensional feature space. An input set is separated into classes or categories with a clear, maximised gap between these categories. Every training sample consists of an assigned class (output) and its observations (input). A theoretical description and overview can be found in the literature [32,33]. Random Forest performs classification and regression tasks as well [34]. It comprises several almost uncorrelated decision trees, each of which is trained on a random subset of the training data set drawn with replacement (bootstrapping). Individual decision trees tend to overfit, but a combination of several uncorrelated trees overcomes this issue. Each decision tree provides a class prediction, and the final result is obtained by aggregating all the predictions. Both algorithms are part of the machine learning Python library scikit-learn, which is used in this research.
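A minimal scikit-learn sketch of the two classifiers follows, using synthetic features in place of the per-field Sentinel-2 time series; the hyperparameters shown are illustrative, not those of the study:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for per-field features (e.g. band means over time).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

svm = SVC(kernel="rbf").fit(X_tr, y_tr)               # kernel-based SVM
rf = RandomForestClassifier(n_estimators=200,         # bootstrapped trees
                            random_state=0).fit(X_tr, y_tr)

print("SVM accuracy:", svm.score(X_te, y_te))
print("RF accuracy:", rf.score(X_te, y_te))
```

With real Sentinel-2 features, `X` would hold one row per field and one column per band/date combination.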

Classification accuracy
The performance evaluation of crop type mapping is based on overall accuracy (OA), producer's accuracy (PA), user's accuracy (UA), and Kappa statistics. The PA and UA values are presented for each crop type, while the OA and Kappa values evaluate the global performance of our trained model. All the metrics are part of scikit-learn too. The OA represents the number of correctly classified crops divided by the number of all reference crops. The Kappa coefficient quantifies the agreement between prediction and ground truth corrected for chance and typically lies in the range 0-1; a value of 1 indicates a perfect match between prediction and ground truth. The PA shows how well a classification matches the ground truth, while the UA refers to the reliability of the classification. A description of the metrics and examples can be found in the literature [35]. Cross-validation (CV) has also been applied to avoid over-fitting and is based on the cross_val_score of scikit-learn. Figure 10 illustrates the general processing chain with its data inputs, pre-processing, training, and final classification. OpenStreetMap was used to remove irrelevant information, such as roads, water bodies, and forests. Furthermore, we applied the standard deviation of NDVI as a filter to remove pixels that did not change over time, since an agricultural area usually changes during a vegetation period. The training and testing of Sentinel-2 data were carried out with official field boundaries from StMELF and self-generated field boundaries. The SVM and RF were trained with mean values (NDVI/raw bands) and tested with an independent data set. Another experiment focused on the prediction for the year 2018 using a model trained with data from 2016 and 2017. This represents a challenging step, mainly because the target year is divergent.
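These metrics can be reproduced with scikit-learn and a confusion matrix, as in this small, self-contained example (the labels are made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2, 2, 1])   # reference crop labels
y_pred = np.array([0, 0, 1, 2, 2, 2, 2, 1])   # classifier output

oa = accuracy_score(y_true, y_pred)           # overall accuracy
kappa = cohen_kappa_score(y_true, y_pred)     # chance-corrected agreement
cm = confusion_matrix(y_true, y_pred)
pa = cm.diagonal() / cm.sum(axis=1)           # producer's accuracy (recall)
ua = cm.diagonal() / cm.sum(axis=0)           # user's accuracy (precision)
```

Here one of eight samples is misclassified, so the OA is 0.875, while the per-class PA and UA pinpoint which crop class absorbed the error.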

Overview of processing
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 19 April 2020 | doi:10.20944/preprints202004.0316.v1

Figure 10. General overview of the processing chain with its input data, processing steps, and models.

Field boundaries
Especially for regions without field border information, field mapping is an important step towards yield forecasts and crop type mapping. We implemented three procedures, compared the resultant polygons visually, and applied them to a challenging area with narrow fields (Figure 12). The first algorithm, SNIC, used all the bands of Sentinel-2 as input. As SNIC over-segments an image and produces many irrelevant boundaries, we decided to introduce a novel approach based on Canny, an algorithm for detecting edges in images. Further information on Canny and SNIC can be found in the literature [36,37]. This procedure additionally includes cloud detection, as introduced in Section 2.4. Adding all the Canny results over a period of time improves boundary detection. Figure 11 shows the processing pipeline with its OpenCV processing steps and geo-referenced polygons as the output.

Field boundary detection
The first experimental design uses the RGB bands of Sentinel-2 as the input for Canny. The area in Figure 12 has very narrow fields and is challenging to map. Comparing Figure 12 with Figure 13 discloses the advantage of NDWI images. While RGB images introduce more noise in the resultant Canny sum, NDWI images reduce the number of in-field edges and correctly identify the boundaries of fields and other objects. The summation of several images in particular improves the results and uncovers edges that would be hidden in a single image. Figure 13 shows a greater number of polygons detected and more distinct borders. Figure 14 shows the final polygons detected. The final output was additionally masked with OSM data in order to remove roads, buildings, and lakes, which were detected as polygons. Small fields (<0.2 km²) or artefacts were filtered for the generation of Figure 14. The RGB-based Canny edge uncovers fewer details than the NDWI does. SNIC over-segments the area and does not show clear boundaries. We applied bicubic resampling to smooth our image collection and could thus slightly improve the boundary visualisation. The resolution of 10 m/pixel is a limiting factor; nonetheless, it still provides good results. The application of bands with lower resolutions, such as B6 or B7, should therefore be avoided. Nevertheless, the field boundary pipeline can be further improved, for instance with higher resolutions or improvements in the processing pipeline. The processing of the images introduced small gaps between the fields, which can be resolved by slightly increasing the polygon size.

Crop type classification
The classification of crop types represents another integral aspect of this research. We compared several interpolation methods in combination with SVMs and RF. Cleaning and interpolating the data set improve the performance of a classifier. We decided upon linear interpolation to fill in the missing values and a resolution of 2 weeks from February to August. Our SVM and RF models were trained with indices, such as NDVI, NDRE, and NDWI, and with all the bands for 2016, 2017, and 2018. Hyperparameter tuning was realised using GridSearchCV. In the following paragraphs, we present the results obtained using NDVI and all the bands. NDVI is a good proxy, but the usage of all bands outperforms an index-based approach in our experiments. The verification is based on a random train/test split with a test size of 25%. The entire data set consists of almost 2100 samples, with 700 samples for each year. The yearly crop data set is uniformly distributed and includes 100 samples for every class. Table 3 and Table 4 show the performance of SVM using NDVI and all the bands. The use of all bands improves the OA to 87% in comparison with 71% achieved using NDVI. The cross-validation scores are based on scikit-learn's cross_val_score and confirm the reliability of all the models. The PA and UA values are additionally listed in Tables 3-9. Table 5 and Table 6 summarise the results of RF using NDVI and all the bands. In general, RF delivers better results irrespective of the input data set. The OA achieved by RF is 80% using NDVI and 92% using all the bands. The trained model, presented in Table 6, was applied to a location near Dürnast for the year 2018. The training data set does not include this area, which allows us to test the performance of our model in a real-world scenario. Figure 16 shows the recorded results from StMELF, which act as ground truth data, followed by the classification results. The OA is 80%, which is lower than expected considering the results in Table 6.
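The hyperparameter tuning step with GridSearchCV can be sketched as follows; the parameter grid is an illustrative assumption, since the exact search space is not listed here:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the per-field feature matrix.
X, y = make_classification(n_samples=200, n_features=12, n_informative=6,
                           n_classes=3, random_state=0)

# Illustrative grid: each combination is evaluated by cross-validation
# and the best estimator is refit on the full training data.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
)
grid.fit(X, y)
```

After fitting, `grid.best_params_` and `grid.best_score_` report the winning configuration and its cross-validated accuracy.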
In another experiment, the same model was applied to the same area but without the additional class 'Other', leading to an accuracy of 96%. Figure 17 shows the classification without the class 'Other'. The model experiences difficulties in differentiating among winter wheat, winter rapeseed, and winter barley. The class 'Other' includes various types of crops, such as different types of grasslands, summer barley, and soybean. It introduces an error, especially in regions with crops not covered in our model. Table 7 confirms the performance without the class 'Other'. The performance of this model is very promising and is analysed in terms of feature importance (Figure 15). The analysis reveals that the bands B3, B4, B5, B6, B7, B8, B8A, B11, and B12 are the most important features, especially in May and August.
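Such a feature importance analysis can be reproduced with the `feature_importances_` attribute of a trained random forest, as in this synthetic sketch (the data are fabricated for illustration):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic features standing in for band/date combinations.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; sorting reveals the dominant band/date features.
order = np.argsort(rf.feature_importances_)[::-1]
```

With real data, mapping the top-ranked indices back to band names and acquisition dates identifies which bands and months drive the classification.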

Time series and prediction for an unknown year
The time series prediction was analysed to develop a system for detecting crop types without data for the year considered. Figure 18 shows the raw reflectance values for band B6 and the crop types considered in our study.

Figure 18. Comparison of B6_mean reflectance across years
The reflectance values are influenced by weather conditions and growth progress. Nevertheless, the trend remains comparable across years, and a separation between crops is given as well. We trained a model with data from 2016 and 2017 and forecast values for 2018. The OA was 77% with a Kappa of 0.89 (see Table 8). The growing season of 2018 was dry and showed an overall decrease in yield, which influenced the spectral response. Weather anomalies influence the accuracy of yield forecasts but do not have a comparable effect on the classification of crops. In another experiment, the class 'Other' was excluded and the model was trained with our six crop types and data from 2016 and 2017. Furthermore, the test data for 2018 did not include the class 'Other'. Table 9 summarises all the results and shows an OA of 81%, which is very accurate considering that 2018 was not comparable with the previous years. Figure 19 and Figure 20 illustrate the corresponding classification results.
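The year-forecast experiment can be mimicked with a synthetic sketch: train on past seasons and test on a shifted future season standing in for the 2018 anomaly (all data, class curves, and the shift are fabricated for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def season(n, shift):
    """Synthetic per-class reflectance curves; `shift` mimics a
    weather anomaly that offsets the whole season."""
    labels = rng.integers(0, 3, n)
    X = np.array([np.sin(np.linspace(0, 3, 12) + c) for c in labels])
    return X + shift + rng.normal(0, 0.05, (n, 12)), labels

X_past, y_past = season(300, 0.0)      # stand-in for 2016/2017
X_future, y_future = season(100, 0.1)  # stand-in for the drier 2018

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_past, y_past)
acc = rf.score(X_future, y_future)     # accuracy on the unseen "year"
```

As in the study, a moderate season-wide offset degrades but does not destroy class separability, because the relative shape of each crop's curve is preserved.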

Discussion
The results obtained and the experimental approaches used for field boundary detection show the influence of input data on performance. Data cleaning and preparation are very important steps at the beginning of every data science project. Crop type mapping too benefits from the type of input and its configuration. Segmentation of images has been studied in several works [36], but its adaptation to Earth observation has rarely been carried out. The selected features are additionally affected by weather conditions but possibly do not respond to weather anomalies as much as raw band reflectances do. Nevertheless, it should be stressed that the performance of our model is very good and renders it suitable for use in a real application. In other research activities, radar-based data, e.g. from Sentinel-1, were used for the mapping of crop species.
Getting data independent of cloud conditions can be advantageous in regions with high cloud coverage, such as the tropics. However, Sentinel-2 data is preferred if it can be obtained on a regular basis (at least one image every 2 weeks). For a quantified comparison, some of our data will be published so that the advantages and disadvantages of other models can be evaluated.

Conclusions
This study addressed several aspects of crop type mapping and field boundary detection. Both applications represent important steps towards yield mapping on the field scale, especially in regions without additional information. We used RF and SVMs with their underlying configurations. Using our approach, we classified several crop types with very high precision. The NDWI-based field boundary detection can be further improved and has the potential to automate a time-consuming and important task. Random forest outperforms SVMs in crop type mapping. In particular, the usage of all raw bands uncovers additional information that cannot be seen using a single index, such as NDVI.