Crop Detection Using Time Series of Sentinel-2 and Sentinel-1 and Existing Land Parcel Information Systems

Satellite crop detection technologies focus on detecting different types of crops in the field at an early stage, before harvesting. Crop detection is usually done by classifying the desired fields on a time series of satellite data. Currently, data obtained from Remote Sensing (RS) are used to identify the type of agricultural crops, and modern AI methods are increasingly sought for the post-processing part. In this challenge, Sentinel-1 and Sentinel-2 time series data were used because of their periodic availability. Our focus was to develop a methodology for classifying time series of Sentinel-2 and Sentinel-1 data and to compare how the accuracy of classification can be increased, but also how the availability of data can be guaranteed. We analysed the phenology of individual crops and, on the basis of this analysis, started the crop classification. The original crop classifications were made from Enhanced Vegetation Index (EVI) layers derived from Sentinel-2 time series data; we then also added Radar Vegetation Index layers derived from Sentinel-1 (RVI4S1). To increase accuracy, we also integrated parcel borders into the process and classified whole fields.


Introduction
Satellite crop detection technology focuses on detecting different types of crops in the field at an early stage, before harvesting. There is a wide range of domains in which such technologies can be used [1][2][3][4][5]. Examples include:
• The public sector and organisations dealing with food security, for example the Common Agricultural Policy [6] in Europe, GEOGLAM/GEO monitoring [7] and FAO agricultural production monitoring [8];
• The food industry, investors and business owners, for strategic decisions, investment making and sustainable forecasting [9];
• Insurance brokers (risk assessment, data collection, verification of clients' claims, etc.) [10];
• Agricultural machinery producers (information about crops is important in combination with other information and for management) [11].
Multispectral satellite images are used in remote sensing crop detection. Remote sensing has the advantage of providing geographic information over a large area in a relatively short time. After processing, the images can be used to produce maps [12]. The act of processing the data into maps is called image classification. Two types of classification exist: supervised and unsupervised classification [13].
In supervised classification, the analyst selects pixels from the image based on knowledge of the land cover (so-called "training sites") [14]. From these it is possible to derive the land cover features that will be used for classification.
Unsupervised classification does not require any prior information about the area of interest. A large number of unknown pixels are analysed and then assigned to several classes based on the natural groupings present in the image [14].
The first technique, supervised classification, is the most commonly used for quantitative analysis of remote sensing image data [13]. This method is built on the concept of segmenting the spectral domain into regions that can be linked to certain land cover classifications for a specific application. However, these regions can sometimes overlap. Different algorithms are used to perform this task.
Four steps are generally necessary to classify the images correctly [12]:
- pre-processing of the image
- selection of a particular criterion to describe the pattern
- selection of a classifier
- evaluation of the accuracy of the image classification.
Supervised classification has the potential to be more accurate than unsupervised classification. However, it is highly dependent on the training sites as well as the skill of the image analyst and the spectral distinction of the classes [14]. If several classes are very similar in terms of spectral reflectance (e.g. annual versus perennial grasslands), classification errors will tend to be high. Supervised classification requires more care in processing the training data. If the training data is poor or unrepresentative, the classification results will also be poor. In general, supervised classification requires more time and money than unsupervised classification, so both methods have advantages and disadvantages.
To increase the accuracy of crop classification, it is possible to exploit the differing phenology of individual crops and analyse time series across the season [15-18].

Overview of relations of phenology and Earth observation
Phenology is a branch of science that studies the periodic events of biological life cycles that depend on many external environmental influences, such as climate, weather changes and other ecological factors. Over time, species have evolved in response to their environment and adapted specifically to biotic and abiotic factors. Because of these interconnections, the study of phenology is useful in many ways. For example, the study of a plant can provide information about the environment in which it evolves, and conversely, the study of biotic and abiotic factors can help to understand how a plant responds to environmental factors [19]. Moreover, phenological events are easy to observe. This is why this science is used in many disciplines such as ecology, climatology, forestry and agriculture. In agriculture and horticulture, phenology has been used for a very long time. Phenological observations usually include the main growth stages of a crop, and several scales have been created to represent them accurately (Figure 1 shows the 13 phenological stages for winter barley according to the BBCH scale) [20]. These observations are essential for many practical purposes. They allow, among other things, the careful selection of crops and varieties adapted to the environment, and the organisation of rotations. They also play an important role in the choice of irrigation, fertilisation and protection against pests and diseases. These observations can also be useful in preventing the risk of frost damage and in predicting harvest dates. By studying the phenophases of different crops and taking the right measures at the right time, it is therefore possible to improve management, increase yields, achieve greater stability in production and have better quality food [21-22].
Today, climate change is impacting all ecosystems, threatening the balance of global food production. In addition, the world population continues to grow (9.7 billion people estimated in 2050 according to the United Nations) [31]. The scientific community must analyse the impacts of climate change and anticipate their consequences, in order to propose concrete solutions in terms of management of living resources. Phenological traits are key characteristics of climate adaptation and are of particular interest to the scientific community [19,24]. Efforts have been made worldwide to increase the phenology databases. Data collection and observations have been facilitated by technological advances, progress in computing, and satellite remote sensing, which has allowed the development of research methods and models on phenology [19]. There are several methods to assess the variations in ecosystems caused by climate change. Phenological observations are made on a long-term, continuous basis, using near-surface as well as satellite remote sensing [25]. Among the near-surface and in situ methods, different digital automated cameras are used. With these technologies, it is possible to detect differences in plant phenological characteristics among ecosystems by analysing the differences between the numbers extracted from daily digital images [26-27]. The cameras are connected to a network and allow real-time monitoring of near-surface vegetation [28-29]. Early phenocams were only sensitive to RGB wavelengths, but technical advances have enabled infrared light to be captured, improving the ability to detect vegetation [30]. Nevertheless, in situ phenological data are only available over limited areas [19]. Since 1970, technical advances in ground-based satellite observations have made it possible to observe phenology on a larger scale.
Several satellites can be used for such observations, such as AVHRR (since 1980), MODIS (since 2000), and more recently VIIRS (since 2012) [19, 31-32]. The phenology observed at the landscape scale by Earth observation satellites is called land surface phenology (LSP) [41]. To study phenology at this scale, vegetation indices (VI) are created from land surface reflectance acquired by satellite optical sensors. The Normalized Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI) are among the most widely used indices [19]. The spectral response of green leaves on which these two indices are based is characterized by strong chlorophyll absorption in the red band and strong reflectance of the leaf structure in the near-infrared band of optical sensors [19]. The phenology observations obtained by LSP are different from in situ phenology observations. Because they are based on a regional and global scale, these observations can be compared with regional climate information. This makes LSP remote sensing an important biological indicator for detecting the response of terrestrial ecosystems to climate variation.
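As a minimal sketch of how the two indices can be computed from surface reflectance, the following assumes the commonly published formulations NDVI = (NIR - Red) / (NIR + Red) and EVI = G (NIR - Red) / (NIR + C1 Red - C2 Blue + L) with the widely used coefficients G = 2.5, C1 = 6, C2 = 7.5, L = 1. The function names and the sample reflectance values are illustrative assumptions and are not taken from the study.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Replace zero denominators with NaN so the result is NaN, not an error.
    safe = np.where(denom == 0, np.nan, denom)
    return (nir - red) / safe

def evi(nir, red, blue, G=2.5, C1=6.0, C2=7.5, L=1.0):
    """Enhanced Vegetation Index; default coefficients are the commonly
    published ones and are an assumption of this sketch."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    blue = np.asarray(blue, dtype=float)
    denom = nir + C1 * red - C2 * blue + L
    safe = np.where(denom == 0, np.nan, denom)
    return G * (nir - red) / safe

# Illustrative reflectance values (0..1) for a vegetated and an empty pixel.
nir = np.array([0.5, 0.0])
red = np.array([0.1, 0.0])
blue = np.array([0.05, 0.0])
ndvi_values = ndvi(nir, red)
evi_values = evi(nir, red, blue)
```

Both functions return one value per pixel, which is what allows a multispectral image to be reduced to a single-band index layer.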

Phenological curves of different crops
The authors of [34] sought to improve the understanding of the relationship between optical and SAR indices over several crops (maize, spring barley, winter wheat and grassland). They studied the correlations of three Sentinel-2 optical indices (Normalized Difference Vegetation Index, Normalized Difference Water Index, and Plant Senescence Reflectance Index) with four Sentinel-1 SAR indices (VV and VH backscatter, VH/VV ratio and Sentinel-1 Radar Vegetation Index). For each land use type, three phenophases were considered (growing, green and senescence). They found that correlations increased when the data were split by land use type and phenophase. A typical bell-shaped curve is observed for NDVI and NDWI for maize, spring barley and winter wheat over the growing period (Figure 2). Winter wheat could not be studied to the end of the growing season due to missing cloudless images before April 2018.

Pilot areas
The first experiments with supervised and unsupervised classification were carried out on the Rostenice farm, South Moravia, the Czech Republic (Figure 3). The farm provided the crop data together with field borders. The Rostenice farm company manages 870 fields with a total area of 10 094 ha. In the study year 2020, the temperature was significantly higher (+2.5 °C) and precipitation was exceptionally high (+144 mm) compared to the long-term average (10.8 °C and 517 mm). In fact, it was the second rainiest year since the beginning of the weather records (1961). Despite the cloudy and rainy weather, it was possible to find useful satellite images for the crop analyses. The terrain is flat, sometimes slightly undulating. The altitude ranges from 194 to 376 metres above sea level. Crops that occupy more than 2 % of the overall area are listed in Table 1. An additional experiment using unsupervised classification was performed on another site, a Sentinel-2 tile covering southern Slovakia (Figure 3). This fertile agricultural region is centered at 18.

Used tools and algorithms
The pre-processed data were loaded into QGIS version 3.16 and classified using the Semi-Automatic Classification Plugin (SCP), version 7. The plugin is actively developed by Luca Congedo and supports many Earth observation operations, including downloading satellite images from various sources, pre-processing, clustering, classification, and accuracy calculation.
Supervised classification is a common tool to determine land cover. The quality of the output product is strongly dependent on the quality of the defined training areas [36]. Defining the output classes is labour-intensive, as the supervisor must select representatives of all the desired classes. The training samples for each class should be distributed throughout the layer.
In contrast, unsupervised classification requires no training. It organizes input layers into classes based on the similarity of spectral characteristics. The variance among the clusters and within them is calculated when creating the output classes [37]. The disadvantage of unsupervised classification is the inability to evaluate the outcomes objectively. One solution is to apply subjective and qualitative criteria, such as the homogeneity of the segments or the degree of fragmentation [38].
The SCP plugin offers two clustering methods:
1) The K-means clustering method was introduced by [39] and has been a popular clustering method for decades. Based on the desired number of output classes, k cluster centers are randomly created in the n-dimensional space, where n is the number of input layers. Squared Euclidean distances are calculated from each point (pixel) to each cluster center, and each point is assigned to the nearest center. Each new cluster center is then calculated as the mean of the points in its cluster. All squared distances are recalculated and the process is iterated until the cluster centers stabilize [40]. According to [41], k-means is suitable for datasets with continuous values but not for categorical values. A drawback of the method is the assumption that the number of output classes is known, which is not true in all applications.
2) The ISODATA clustering method also randomly places cluster centers in the multispectral feature space, and pixels are assigned to the centers based on the shortest distance. In the next step, the standard deviation within each cluster is calculated. When the standard deviation exceeds a user-defined value, the cluster is split. Conversely, clusters are merged when the separation distance between cluster centers is less than a user-specified value. The iteration terminates when one of the following conditions is fulfilled: i) the average center distance falls below the user-defined threshold, ii) the average change in the center distance is less than a threshold, iii) the maximum number of iterations is reached [42]. The ISODATA method overcomes possible disadvantages of the k-means method but requires good knowledge of the dataset being clustered. A study comparing clustering methods [43] found that the K-means method performed best.
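The K-means iteration described in item 1 can be sketched in a few lines of NumPy. This is an illustrative sketch only and not the SCP implementation: it assumes the initial centers are drawn from randomly chosen pixels (rather than arbitrary points in feature space), and the names `kmeans`, `max_iter`, `tol` and `seed` are hypothetical.

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, tol=1e-6, seed=0):
    """Cluster rows of `pixels` (n_pixels x n_layers) into k classes."""
    pixels = np.asarray(pixels, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: start from k distinct, randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].copy()

    def assign(current_centers):
        # Squared Euclidean distance from every pixel to every center.
        diff = pixels[:, None, :] - current_centers[None, :, :]
        return (diff ** 2).sum(axis=2).argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:  # keep the old center for an empty cluster
                new_centers[j] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:  # centers have stabilized
            break
    return assign(centers), centers

# Two well separated groups of two-layer "pixels" (illustrative values).
sample = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
sample_labels, sample_centers = kmeans(sample, k=2)
```

ISODATA would add split and merge steps around the same assignment and update loop, which is why it needs the additional user-defined thresholds mentioned in item 2.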

Experiments provided
The SCP plugin enables the user to create band sets, which are then used as input to further analyses. The bands in a band set can be of different origin. The innovative approach of this experiment was to put index layers from different dates and satellites into one band set. By computing a vegetation index from a multispectral image, we preserve the important information while reducing a multiband layer to a single-band raster. As a result, we can put more layers from different dates into one band set. The index layers come from both Sentinel-1 and Sentinel-2: the Radar Vegetation Index for Sentinel-1 (RVI4S1) and the Enhanced Vegetation Index (EVI), respectively.
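Conceptually, such a band set is a stack of co-registered single-band rasters in which every pixel carries one value per date and index. The sketch below illustrates this idea with plain NumPy arrays; it does not use the SCP API, and the function names, layer labels and values are hypothetical. It assumes all layers have already been resampled to the same grid.

```python
import numpy as np

def build_band_set(index_layers):
    """Stack (label, 2-D array) pairs into an (n_layers, rows, cols) array."""
    labels = [label for label, _ in index_layers]
    arrays = [np.asarray(a, dtype=float) for _, a in index_layers]
    shape = arrays[0].shape
    if any(a.shape != shape for a in arrays):
        raise ValueError("all index layers must share the same grid")
    return labels, np.stack(arrays, axis=0)

def to_pixel_features(stack):
    """Reshape the stack to one row per pixel and one column per layer."""
    n_layers, rows, cols = stack.shape
    return stack.reshape(n_layers, rows * cols).T

# Illustrative 2 x 3 rasters: two EVI dates and one RVI4S1 date.
layers = [
    ("EVI_date1", [[0.2, 0.3, 0.4], [0.1, 0.2, 0.3]]),
    ("EVI_date2", [[0.6, 0.7, 0.8], [0.5, 0.6, 0.7]]),
    ("RVI4S1_date1", [[0.9, 1.0, 1.1], [0.8, 0.9, 1.0]]),
]
layer_labels, band_set = build_band_set(layers)
features = to_pixel_features(band_set)
```

Each row of `features` is then the temporal profile of one pixel, which is the form that pixel-based classifiers and clustering methods typically expect.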
Two basic types of analyses were carried out with each band set: 1) unsupervised classification of the whole tiles (South Moravia tile and Southern Slovakia tile) and 2) supervised classification for the Rostenice test farm. Different algorithms and methods were used and, where possible, the accuracy was calculated.

Unsupervised classification of all images
The unsupervised classification algorithm K-means was applied on both sites. The number of output classes was variable. On the South Moravia tile it was 20, 40 and 50 classes. On the Southern Slovakia tile it was 10 and 40 classes.
To explore more possible outputs, several input band set options were prepared:
A) Radar Vegetation Index for Sentinel-1 layers (26 images)
B) Enhanced Vegetation Index layers (4 images)
C) combination of the index layers (30 images).

Supervised classification for agriculture land
As mentioned before, there are two steps when applying supervised classification to satellite images.
1) The learning step, in which the supervisor (a human) manually identifies the desired categories in the image. The database of polygons containing attribute information about the output class is called the 'seed sample' or, in SCP, the 'training input'. In our experiment there were altogether 43 training inputs for 7 categories of crop. From these vector layers, the signature for each polygon was calculated.
2) The prediction step, in which the algorithm predicts the class for all pixels of the input layers based on the signatures calculated in the first step. In pixel-based classification, the algorithm takes each pixel individually and, using specific decision rules, puts the pixel into one of the predefined classes. In our study, the Minimum Distance algorithm was applied. It assigned each pixel to one of the seven predefined classes.
Because the result contained many isolated pixels, the classification was improved by a sieve filter, which makes the result more compact.
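The two steps and the minimum distance decision rule can be illustrated with the following sketch. It is a simplified assumption rather than the SCP implementation: one mean signature is computed per class (the study calculated a signature per training polygon), Euclidean distance is used, and all names and values are hypothetical.

```python
import numpy as np

def class_signatures(train_features, train_labels):
    """Learning step: mean feature vector (signature) for each class."""
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels)
    classes = np.unique(train_labels)
    means = np.array(
        [train_features[train_labels == c].mean(axis=0) for c in classes]
    )
    return classes, means

def minimum_distance_classify(features, classes, means):
    """Prediction step: assign each pixel to the nearest class signature."""
    features = np.asarray(features, dtype=float)
    diff = features[:, None, :] - means[None, :, :]
    d2 = (diff ** 2).sum(axis=2)  # squared Euclidean distances
    return classes[d2.argmin(axis=1)]

# Illustrative training pixels with two index layers and two crop classes.
train_x = [[0.20, 0.30], [0.25, 0.35], [0.70, 0.80], [0.75, 0.85]]
train_y = [1, 1, 2, 2]
crop_classes, signatures = class_signatures(train_x, train_y)
predicted = minimum_distance_classify(
    [[0.30, 0.30], [0.60, 0.90]], crop_classes, signatures
)
```

Because every pixel is classified independently, neighbouring pixels of the same field can receive different labels, which is the reason for the sieve filtering step.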

Majority classification for field
Although the number of isolated pixels was decreased by the sieve filter, there was usually still more than one class within a single field. When the output should be a single unambiguous label rather than a mixture of the categories present, the whole polygon has to be assigned to the dominant class. Each field was therefore assigned to its most frequent class using the Zonal statistics tool. With label rules, the results can be displayed in a form suitable for visual interpretation.
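The majority rule itself is simple and can be sketched as follows. This is an illustration and not the QGIS Zonal statistics tool: it assumes the field polygons have been rasterized to the same grid as the classification, with an integer field identifier per pixel and 0 meaning "outside any field". The function name and the tie-breaking behaviour (the smallest class value wins) are assumptions of the sketch.

```python
import numpy as np

def majority_class_per_field(class_raster, field_raster, nodata_field=0):
    """Return {field_id: most frequent class among the field's pixels}."""
    class_raster = np.asarray(class_raster)
    field_raster = np.asarray(field_raster)
    result = {}
    for field_id in np.unique(field_raster):
        if field_id == nodata_field:
            continue  # pixels outside any field are ignored
        values = class_raster[field_raster == field_id]
        classes, counts = np.unique(values, return_counts=True)
        # argmax returns the first maximum, so ties go to the smallest class.
        result[int(field_id)] = int(classes[counts.argmax()])
    return result

# Illustrative 3 x 3 pixel classification and rasterized field identifiers.
classified = [[1, 1, 2],
              [1, 3, 2],
              [3, 3, 2]]
fields = [[1, 1, 2],
          [1, 1, 2],
          [0, 1, 2]]
field_classes = majority_class_per_field(classified, fields)
```

In this example field 1 contains three pixels of class 1 and two of class 3, so the whole field is labelled as class 1.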

Unsupervised classification of all images
In the South Moravia tile, several alternatives of unsupervised classification were carried out. There were three options for the input layers: 1) only EVI layers (Figure 6), 2) only RVI4S1 layers, and 3) a combination of both. The number of output classes was set to 20, 40 or 50.
Figure 6. 50 classes of unsupervised classification from EVI layers on the South Moravian tile [44]
For the southern Slovakia tile there were two types of result layers. The first divides the land cover into 10 classes and the second into 40 classes. Both layers were made from a combination of EVI and RVI4S1.

Supervised classification for agriculture land
Several layers of supervised classification were created in order to compare the results from three aspects:
1) The input band sets. Three types of input were compared, with layers derived from EVI, from RVI4S1, or from a combination of both. RVI4S1 layers had the lowest accuracy: when the classification was based on 26 RVI4S1 layers from March to August and improved by the sieve filter, the overall accuracy reached 67 %. With an input of 7 EVI layers, the overall accuracy reached up to 91 % without sieve improvement. The third set of inputs combined EVI and RVI4S1 layers; the overall accuracy of the classification from this band set was 89 % without the sieve filter.
2) Number of layers. As expected, the overall accuracy of the supervised classification rises when more data enter the input band set. If we want to identify the crop in March, the overall accuracy is 51 % with March satellite images. When we add April data, the accuracy jumps to 68 %. By further adding index layers from later months, we can reach up to 93 % overall accuracy.
3) Sieve filtering. The result layers post-processed by the sieve filter show higher accuracy than those without it in all classifications. The effect of the sieve filter is stronger on classifications with RVI4S1, because these contain more isolated pixels and small islands of pixels than classifications derived from EVI. For EVI+RVI4S1 classifications, the sieve filter has a greater effect when there are fewer data and accuracy is low than when accuracy is already high.
In order to make the result easy to understand, every field was assigned one prevalent class based on the highest representation of a class within the field. The real and classified attributes were visualized by multi labels (Figure 8). The accuracy of the majority classification is summarized in Table 2. The accuracy values are mostly slightly higher compared to the pixel classification (Figure 9). The highest value is 96.3 %. When processing Sentinel-1 data, the area of the field influences the accuracy. When only large fields (more than 10 ha) were assigned the majority class, significantly higher accuracy was achieved.
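For reference, overall accuracy is commonly defined as the share of correctly classified samples, that is, the sum of the diagonal of the confusion matrix divided by the total number of samples. The sketch below assumes this standard definition; it is not the SCP accuracy tool, and the function names and the small example are illustrative only.

```python
import numpy as np

def confusion_matrix(reference, predicted, classes):
    """Rows are reference (true) classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = np.zeros((len(classes), len(classes)), dtype=int)
    for ref, pred in zip(reference, predicted):
        matrix[index[ref], index[pred]] += 1
    return matrix

def overall_accuracy(matrix):
    """Correctly classified samples divided by all samples."""
    return np.trace(matrix) / matrix.sum()

# Illustrative reference and predicted crop classes for five samples.
reference = [1, 1, 2, 2, 3]
predicted = [1, 2, 2, 2, 3]
matrix = confusion_matrix(reference, predicted, classes=[1, 2, 3])
accuracy = overall_accuracy(matrix)
```

The same calculation applies to pixel-based and field-based results; only the unit counted as a sample (pixel or field) changes.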

Visual interpretation
Selected classification outputs have been published using the HSLayers-NG web mapping framework [53] and are available on the Agrihub web portal.
Figure 9. Comparison of crop classification in the season [45]
This application allows users to visualize time series of data during the season using the date selection control, apply data transparency, and combine the data with field data and data from ground measurements. The map window can also be split so that multiple layers can be compared at once using the swipe control. Any other relevant data from other sources can also be added to the map to find possible correlations.

Discussion
Our work shows that the use of time series can significantly improve the accuracy of classification of individual crops. Surprisingly, good estimates are already possible in April and very accurate results in May. This could be important for many purposes, especially for food security strategies, but also for the food market. Radar data are not degraded by clouds, so a usable Sentinel-1 image is available on a regular basis. Further pre-processing steps are needed to obtain correct radar signal values. Even after these adjustments, the radar vegetation index values for Sentinel-1 vary significantly within a field. Sentinel-1 does not increase the classification accuracy (the classification of the combination of Sentinel-1 and Sentinel-2 data was less accurate than that of Sentinel-2 data alone), but it does provide information over the entire season. In contrast, obtaining a Sentinel-2 image without clouds is the exception rather than the rule. Unsupervised classification can be a good step for data analysis, but the interpretation of its results must then be ensured. Better accuracy can be achieved if whole plots are classified rather than individual pixels. In this case, classification using Sentinel-1 data is also more accurate.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.