Segmentation of agricultural parcels in satellite images based on historical vegetation index data

: This paper considered the issue of agricultural ﬁelds boundary recognition in satellite images. A novel algorithm based on the aggregated history of vegetation index data obtained via open satellite data, Sentinel-2, was proposed. The proposed algorithm included several basic steps, namely the detection of parcel regions on aggregated index data; the calculation of aggregated edge maps; the segmentation of parcel regions using the edges obtained; the computation of connected components and their contour extraction. In this paper, we showed that the use of aggregated vegetation index data and boundary maps allow for much more accurate agricultural ﬁeld segmentation compared to the instant vegetation index approach. The quality of segmentation within regions of Russia and the Ukraine was estimated. The dataset that was used and Python implementation of the proposed algorithm were provided.


Introduction
The development of modern agriculture directly depends on the efficient use of land, which requires reliable and timely information about the state of land [1]. One of the sources of information regarding agricultural land is cadastre maps that document the boundaries of fields. This information needs to be regularly revised and updated, which highlights the issue of agricultural fields boundary recognition [2]. While the manual markup of fields using satellite images is usually employed, this process has several drawbacks. It is extremely labor intensive, and it is ineffective for use in the annotation of large areas, such as counties or regions [3]. Hence, automatic markup is a relevant issue. Moreover, agricultural fields boundary recognition is a preprocessing step in the crop classification problem. For example, [4] showed that the employment of objects extracted from the recognized boundaries creates better results compared to the analysis of individual pixels.
This work is devoted to agricultural fields boundary recognition using publicly available multispectral satellite data. As it is difficult to find publicly available data that can be used to train and test such an algorithm, we created and published a dataset that includes several agricultural regions located in Russia and the Ukraine. The paper is structured as follows: Section 2 provides a review of the works that discuss remote sensing technologies and agricultural fields boundary recognition in detail; Section 3 introduces considered study area and data used for experiments; Section 4 describes the proposed approach and algorithm; Section 5 provides the list of tested algorithms, formulates the quality assessment metrics and describes the methodology of the experiments, and the results. Finally, Section 6 summarizes the results of the research and details the employed data samples and the implementation of the proposed algorithm. 3 of 19 within the employed dataset were not balanced due to the characteristics of the landscape. Hence, the authors show the possibility of reducing the problem of agricultural fields boundary recognition to the problem of classification.
In another paper [51], the authors investigated the effect of temporal data on the quality of field recognition. This work compared the output of two U-Net neural networks that used the same spatial area for training. In the first case, a single RGB image was used for training; in the second case, three RGB images belonging to the same spatial region at different points in time were used. The vegetation season was conventionally divided into three time periods: January-March, April-June, and July-September. A single image was taken from each period as input data. The results of this paper show that a model that employs both spatial and time data allows for better accuracy compared to a model trained on spatial data alone.
In [55], a method for use in the automated extraction of agricultural fields from Web-Enabled Landsat Data (WELD) time series was proposed. Within the first stages, crop probability and field edges maps were constructed based on five years of weekly WELD data. The green, red, near-infrared, and mid-infrared channels, as well as the Normalized Difference Vegetation Index (NDVI), were used to calculate probabilities. The NDVI time series was formed from the maximum weekly values of the index over a 5-year period. In this study, the authors suggested that pixels with consistently high seasonal NDVI values were most likely related to crop production, but some non-agricultural vegetation types also had high NDVI values, which resulted in inaccuracies. The resulting maps were used for Variational Region-Based Geometric Active Contour (VRGAC) segmentation. Next, the watershed algorithm was used to separate fields that have fuzzy boundaries.
In [47], the authors also used temporary data to extract field boundaries. The authors selected seven Sentinel-2 images on clear days for the vegetation period of 2016-2017. First, field boundaries were extracted for each date on red, green, blue, and NIR channels; then, an aggregated boundary map was obtained by summing 28 layers of boundaries (four channels for seven dates). The resulting aggregated boundary map was fed to the input of the segmentation algorithm to obtain potential agricultural regions. Furthermore, these regions were classified into cultivated and uncultivated objects using a classification and regression tree (CART) algorithm. NDVI indicators aggregated over the observation time, such as min, max, range, and standard deviation for each pixel, were selected as features for classification. Finally, small regions that were obtained as a result of segmentation were filtered out. The authors used the Scharr and Canny operators to obtain boundary maps. Three segmentation algorithms were tested: watershed, multi-resolution, and multi-threshold. Thus, six algorithms that could be used to obtain field boundaries were investigated. The authors showed that watershed segmentation performed best on Canny's boundary map. Canny edge detection provided clearer boundaries compared to Scharr's algorithm. The authors also noted that multi-temporal data application is more effective than instant data analysis due to the dynamic nature of field crops.
In these studies, research data were provided by various satellites such as SPOT [54,61], LANDSAT [55], WorldWide-2/3 [45,57], or Sentinel-2 [4,52,58]. After analyzing the related work, we came to the conclusion that it is advisable to use a hybrid approach to field segmentation, taking into account historical data.
The approach proposed in this paper was closest to a hybrid method as it used the derived edge maps to localize the fields more accurately. Furthermore, the proposed algorithm had low parameters and did not require a large amount of marked-up data for training, unlike neural network solutions [62].
As historical data provides more detailed information, it should be employed. However, at the same time, it is highly preferable to reduce the dimensionality of data within an algorithm. Thus, we proposed the use of time-aggregated historical data to segment the fields.

Sentinel-2 data
This study employed data from the remote sensing programs Sentinel-2A and Sentinel-2B, launched in 2015 and 2017, respectively, which provided data for each area every 5 days on average.
A set of 13 images in Jpeg2000 format, taken in different spectral ranges and with spatial resolutions from 60 m/pixel to 10 m/pixel with a maximum size of 10800x10800 pixels (see Table 1), was provided for each sensing time. The data were geometrically corrected and delivered as 100x100 km tiles in the Military Grid Reference System (MGRS). The localized cloud data, also provided by the Sentinel-2 source along with multispectral images, were used for cloud cover processing.

Data collection and study areas
To compare our approach with the alternatives found in the reviewed works, a unified fields boundaries markup and unified segmentation quality evaluation metrics were required. Analysis of the experimental sections of other works has shown that the accurate comparison of approaches is difficult, mainly due to the inaccessibility of the data samples used in these works. In [1,50,52,55,57,58], the markup was obtained manually and is not accessible; [63] does not mention the origins of the reference fields; [4,45,49,58] list sources, but there are no references to the specific datasets.
We performed our own expert field markup for four regions within Russia and the Ukraine. Each of the areas was a square fragment of a tile of 57300m x 57300m with a spatial resolution of 10 m/pixel. The areas were chosen so that each of them had other types of terrain in addition to fields: urban area, water surface, swamps, and forests.
Ground truth data were segmented by experts on RGB images taken during cloudless, clear days. Table 2 provides information regarding the coordinates of the considered areas, as well as the dates when the chosen images were taken.  Furthermore, we refer to the extracted fragments by the same names as the MGRS tiles to which they belong.
The observation period from 01 January 2016 to 31 December 2020 was chosen for experiments and algorithm tests. To calculate the aggregated vegetation indices, only images with cloud coverage less than 80% were considered, and to calculate the aggregation of edge maps, only images with less less than 1% were considered.  Table 3: Statistics of fields according to markup by experts.
The fragments were divided into training and test samples, so that both groups contained small and large fields of different shapes. Figure 1 illustrates the relative locations of the considered fragments, as well as the training/test division.

Method
The proposed approach to agricultural fields boundary recognition can be described by the following key stages: (1) MSAVI2 calculation and aggregation; (2) the detection of agricultural fields regions; (3) the obtainment and aggregation of edge maps; (4) the segmentation of regions, taking into account the detected edges; (5) the computation of eight connected components and their contour extraction and filtering.
The input data of the algorithm was a set of Sentinel-2 multispectral satellite images and corresponding cloud masks. The set was used to compute the Modified Soil-adjusted Vegetation Index (hereinafter-MSAVI2) for each of the specific dates available (hereinafter, such images are called instant) in the considered time period and their historical aggregation. In contrast to approaches that use satellite data for a specific date, or a list thereof, we proposed the analysis of aggregated data within a historical period. Edge maps were obtained by aggregating the edges obtained on separate instant images. Field regions detection was performed on the aggregated MSAVI2.
Each of the stages within the proposed algorithm are described in detail below.

Aggregated vegetation index data
Single-channel image aggregation is a process of calculating a function of the pixel values for each point of an image. The result of such calculations is also a two-dimensional single-channel image.
The process of calculating aggregations is described below. Consider a set of available images for a specific area and calculate the vegetation index MSAVI2 [64] for each image: where RED and NIR are the red and near-infrared channels, correspondingly. All image operations are pixel-wise. The range of the index values under consideration was chosen experimentally by the experts: [0...1]. Figure 2 illustrates the examples of red and near-infrared channels and an instant MSAVI2 index for the same geographic region. We considered the instant vegetation index as an image I with linear dimensions of W × H. The aggregated imageĨ f was defined as a scalar two-dimensional image (also with dimensions of W × H), the (i, j)-th pixel of which is the value of the aggregation function f calculated for a sample of k i,j instant vegetation data: where f is the aggregation function, I is an image of the vegetation index data for a particular date, and k i,j is a number of measurements available for (i, j)-th pixel. The sampling is determined by the criterion of data certainty: the terrain corresponding to the image point must not be obscured by clouds and must be within the sensor's field of view. At any particular point in time, the agricultural regions in the image may be present at different stages of vegetation and crop production. This results in significant variability in the observed index values both between fields (low vegetation during the early crop production stages, or high in the peak stages) and within a single field, for example, due to the harvesting stage. Thus, an aggregation function can be introduced as a sample mean to smooth the vegetation maps over time, which will provide data with less vegetation variability both between fields and within fields.
Thus, for aggregation as the sample mean, the equation above is expressed as follows:

Searching for field regions
First, we localized the regions of agricultural fields in an input vegetation index image. Region localization is achieved via clustering pixels into two groups: pixel of agricultural field and pixel of non agricultural field, according to their absolute value. The Otsu criterion was employed to find the optimal clustering threshold. The criterion determines the threshold which maximizes the between-class variance [65]. This work assumed that the regions with significantly higher vegetative rates represent wild plants outside the crop production process. Figure 4 illustrates an example of the aggregated vegetation index values histogram. The histogram has two distinguishable peaks, which presumably correspond to the distribution of intensities of the fields' pixels and non-fields' pixels. We searched for the optimal Otsu threshold among such intensity values that exceed the minimum threshold t low , which defines regions of low vegetation (water surface, concrete areas, etc.): where σ B (i) is the between-class variance calculated among index values exceeding threshold t low . The values of t low are determined empirically. The masks of potential fields and regions with very low vegetation are defined as M and L, respectively. Such masks are binary images with dimensions of W × H obtained by thresholdingĨ avg using t O and t low , respectively: Potential field regions mask: Low-vegetation regions mask: To increase the reliability of the localization of water bodies and other regions of the terrain with very low average vegetation, we used the morphological dilatation of the image L according to a structural element s of size w: The result of field localization is the image F, obtained by element-by-element subtraction of the mask of low-vegetation regionsŴ from the image M:

Edges Detection
As in [47], to obtain edge maps, we used the Canny edge detector [48]. However, unlike [47], we extracted edges from pre-computed MSAVI2 images rather than RGB and NIR channels of satellite images.
As the agricultural fields may be at different stages of vegetation on a single selected date, the accumulation of boundaries over the historical period strengthens static field boundaries and accumulates weak ones.
We considered the instant Canny edge map for index I as an image E(I) of size W × H with values of one for the boundary pixels. The accumulated edge mapẼ was calculated as the sum of instant boundaries normalized by the number of measurements: where ξ is the index of the image in the historical dataset.
Despite the known position of clouds in the image, the accuracy of their localization is not perfect, hence the false boundaries for clouds, and their shadows can be detected, and, as a consequence, such false boundaries appear within the aggregated data. To reduce their effect on the signal-to-noise ratio in the final accumulative image, only cloud-free satellite images were considered.
A binary edges maskẼ b from the accumulated imageẼ was obtained via threshold binarization. The threshold was also calculated using the Otsu method. Figure 6 illustrates the example of the aggregate index fragment calculated from 2016 to 2020 inclusive (Figure 6a) and the detected Canny edges (Figure 6b) on the index. Figure 6c shows a binary edge mask obtained via 54 instant edge maps aggregation and binarization.
The boundary accumulation approach provides a more detailed map of field edges compared to boundary localization in the aggregated image. The localizations of the field edges on different images do not always match exactly, and usually, there is a spatial error. To reduce the influence of this effect when accumulating the boundaries, instead of the original edge maps, we employed the results of the morphological dilatation for the latter. This step is also effective when there are inaccuracies in the alignment of satellite images. The effect of this step on the quality of agricultural field boundary segmentation is provided in the Experiments section of this paper.
The binary edges map was subjected to morphological closing. This step allows for a reduction in the proportion of open edges and noisy false positive edges of small regions.

Segmentation of regions
Then, we obtained a mask of individual agricultural fields, taking the detected edges into account. A raster image R of the segmented fields was obtained as follows: where F is an image of localized field regions, andẼ b is the edges image.

Contours extraction and processing
In this step, the R image was searched for eight connected components, which constitute a list C. For each component, its vector contour was determined. Then, it was checked for validity (no self-intersections, the number of points was more than two, etc.). If the contour was not valid, it was corrected. To get rid of noise caused by image over-segmentation, false positive detections, and unlikely big fields, vector contours were filtered by area using the thresholds t area min and t area max . These thresholds were determined empirically. The algorithm can be divided into three blocks: field mask evaluation, edge mask calculation, and field contours obtainment. The calculation of the edges map on instant indices also included morphological dilation of the obtained boundaries in order to improve their further aggregation.

Experiments
We tested the proposed algorithm and its several modifications related to the methods of edge map calculation. We also considered the case wherein the field boundaries were determined from a single instant image (obtained on the same date as the image in which the field markup was performed).
The tests were conducted using data from regions within the Russian Federation and the Ukraine, where agriculture is well developed. The areas were chosen so that each of them would be on a separate MGRS tile. To simplify and speed up the calculations, we considered square fragments rather than complete tiles, which were about four times smaller in terms of area.
Along with the results of the experiments, we provide our expert markup of the regions under study.

Description of the tested algorithms
Along with the proposed algorithm (i), the following modifications were tested: (ii) without edge map dilation in the aggregation process; (iii) with edge maps obtained from aggregated MSAVI2 images alone; (iv) using instant data at a specific date.
The following parameters were investigated: σ (Canny operator parameter) is the standard deviation of the Gaussian operator, which is applied to the original image during preprocessing; w is the radius of the circular structural element used in morphological closing; t area min and t area max are the thresholds for the minimal and the maximal field region area, respectively.
Experiments showed that for the available training sample, the following parameters are the closest to the optimal ones: t area min = 0.05 km 2 , t area min = 10 3 km 2 , and w = 2 pixels. The best result for boundary aggregation (i, ii) was achieved via the accumulation of Canny boundary maps calculated with the parameter σ = 1. The third scenario (iii) considered was an algorithm with the calculation of field boundaries on the image of the aggregated index. In this case, σ was set to 0.01. A (iv) scenario similar to the third was also considered, but one instant index image was fed to the algorithm as input. The model parameters were: σ = 0.5, w = 3.
In Table 4, we introduce the following notation for each tested algorithm.
Algorithm's notation Algorithm description and parameters

Evaluation of segmentation accuracy
To evaluate the accuracy of algorithms in agricultural fields boundary recognition, different approaches are used. Among them, we can distinguish pixel-based and objectbased metrics according to the type of data analyzed. Pixel-based metrics are defined for sets of image pixels with assigned recognition result labels (field/non-field). Their main advantage is that they are simple to implement and quick to compute, parameters which mainly depends on the size of the images. However, their serious disadvantage is low sensitivity to under-or over-segmentation. For example, the DICE [51] might yield a high value both when fields are excessively split and when they are merged, as shown in Figure  8.  We used an object-based approach to estimate agricultural fields boundary segmentation, which is more sensitive to over-and under-segmentation. We also performed pixel-based metrics estimation. We compared the recognized fields to fields segmented by the experts using Jaccard's coefficient (11) and considered that f where S(x) denotes the area of the polygon x. Under the condition J(a, b) > 0.5, if the fields from F exp do not intersect each other (a similar condition is imposed on F re f ), F re f ↔ J(a,b)>0.5 F exp can be considered one-to-one correspondence. We denote the set of N one2one found one-to-one correspondences between elements from F re f and F exp as F one2one : The index of unambiguous field recognition is introduced as the value RecRate: This value equals 100% if each field from the expert markup under the condition J(a, b) > 0.5 has a one-to-one correspondence from the set of recognized fields. It is sensitive to both false positives and false negatives, which are accounted for within both N one2one and N exp .
To assess the accuracy of the determined field area, we considered the following value: The criteria for the accuracy of determined field area are the mean and median of the area errors: The criteria (13) and (15) can be both accounted for in a single criterion RecRate θ if an additional condition is imposed on the recognized fields to allow for area error: where N one2one,δs<θ is the number of correctly recognized fields for which the area error δs i is less than the threshold value θ. Here, we used θ = {0.1, 0.2}. The value (17) is sensitive to over-and under-segmentation and penalizes for merge/split errors. If we did not consider such recognition results as errors, we could introduce a softer measure of accuracy (18) that took not only the one-to-one correspondences between the elements F re f , F exp into account.
We chose the following value as such a criterion: The field f i re f corresponds to the field f j exp = argmax f ∈F exp (Ω( f i re f , f )), provided that does not include one-to-one correspondences alone and allows for merges (when one field f exp corresponds to several fields f re f ) and splits (when one field f re f corresponds to several fields f exp ), as well as chains of correspondences. We denote the set of correspondences found as described above by F Ω , and the number of re f fields from this set by N Ω re f . Then, we can define the «soft» criterion of recognition as follows: We denote the fields from the set F exp that are not in F Ω as F FP = F exp \F Ω with cardinality |F FP | = N FP . We consider the fields within F FP to be false positive predictions and introduce the value FPR: Finally, we also take into account the pixel-based metric DICE, defined for both fields and their boundaries: where TP is the number of true positive predictions, FP is the number of false positive predictions, and FN is the number of false negative predictions. When converting the vector representation of the boundaries of the fields into raster to compute DICE edges , we set the width of the borders to 2 pixels.  Preliminary dilation of the edges when aggregating them also improved the RecRate criterion, but to a lesser extent (48.27% → 51.25%). Using aggregated indices instead of instant ones gave a noticeable increase in quality (23.54% → 29.63%).

Results and discussion
It is worth noting that for the algorithm E(Ĩ avg ), RecRate=29.63% that the criterion RecRate so f t had a higher value than that ofẼ dilated and equaled 80.50%. This effect is explained by the fact that the large under-segmented fields obtained by the algorithm E(Ĩ avg ) covered a large number of fields found within the ground-truth, hence RecRate so f t increased. At the same time, over-segmentation of fields by theẼ dilated algorithm resulted in the contours for small regions, which have no matches within the markup, so they were labeled as false positives.
The accuracy of the computed edges map also significantly influenced over-and undersegmentation. Therefore, improving the stage of edges detection will contribute to the development of the algorithm. For example, as the balance of areas of boundaries and nonboundaries is not guaranteed, a better result in comparison with the original Otsu criterion can be given by its generalization to the case of unbalanced classes [66]. Furthermore, different forms of fields are characteristic for different countries and territories, but within one region, fields often have a similar structure. In such cases, a restriction may be imposed on the possible shapes of field boundaries, as was done in a study regarding the recognition of road markings [67,68]. This will remove false boundaries and reduce over-segmentation. In addition, the search for instant boundaries can be performed on the projection of a multi-spectral image instead of the MSAVI2 index. The conversion of a multispectral satellite image to a single-channel image can be performed by the method of dimensionality reduction, preserving contrast and boundaries [36]. All the proposed improvements also imply classical low-parametric methods, which are realizable within the framework of the proposed approach.
For low thresholds for the minimum area, the quality according to RecRate dropped significantly because of the rapidly increasing number of false positives within the recognized regions. The Gaussian smoothing parameter σ = 1 of the Canny operator, as experiments have shown, is close to optimal: at values of 0.5 or 1.5, the value of RecRate slightly decreased. Small values of σ (on the order of magnitude 0.001 -0.1) when aggregating boundaries resulted in strong noise. It should be noted that for all the algorithms (except for E instant ), the indicators DICE f ields , DICE edges , calculated from pixel data, have similar values. The best DICE f ields was observed for E(Ĩ avg ) = 88.31%. Figure 9 shows the visualization of recognition for the investigated algorithms.

Conclusion
This paper considered the issue of agricultural fields boundary recognition using open Sentinel-2 satellite data. A low-parameter algorithm based on the use of classical image processing tools was proposed for use with vegetation index images. This paper described a method for use in vegetation index data as well as field edge maps aggregation over a historical period. We experimentally showed that this combination used as input data allows for the more accurate recognition of field boundaries compared to the employment of instant satellite data.
The proposed algorithm is easy to implement, and it does not require the use of resource-intensive computing machines equipped with GPU. Due to a small number of parameters and transparent architecture, segmentation quality optimization is much easier compared to the neural networks approach. The approximate running time of the algorithm for Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz machine is ∼ 60 sec for an input image of 5730x5730 pixels.
We also provided the source code of the algorithm implementation in Python3.8 (https://github.com/iitpvisionlab/fields-recognizer) and the expert markup of fields for training (7918 parcel contours) and test (10941 parcel contours) data, which were used in the experimental part of this work (https://doi.org/10.5281/zenodo.5571868). The markup repository also contains precomputed example images that are used as input in the algorithm.
Plans for future work include the investigation of improvements in the detection of field regions in order to decrease false positives and research regarding more accurate edges mask obtainment. Furthermore, research regarding parcel ground-truth dataset extension on other terrains is of interest, along with an investigation into increasing its quality.

ACKNOWLEDGMENTS
This work was supported by the Russian Science Foundation (Project No. 20-61-47089).