A season-independent U-Net model for robust mapping of solar arrays using Sentinel-2 imagery

We have an unprecedented ability to map the Earth’s surface as deep learning technologies are applied to an abundance of high-frequency Earth observation data. Simple, free, and effective methods are needed to enable a variety of stakeholders to use these tools to improve scientific knowledge and decision making. Here we present a trained U-Net model that can map and delineate ground-mounted solar arrays using publicly available Sentinel-2 imagery, and that requires minimal data pre-processing and no feature engineering. By using label overloading and image augmentation during training, the model is robust to temporal and spatial variation in imagery. The trained model achieved a precision and recall of ~90% each and an intersection over union (IoU) of 84.3% on independent validation data from two distinct geographies. This generalizability in space and time makes the model useful for repeatedly mapping solar arrays. We use this model to delineate all ground-mounted solar arrays in North Carolina and the Chesapeake Bay watershed to illustrate how these methods can be used to quickly and easily produce accurate maps of solar infrastructure.

Background & Summary
The proliferation of Earth observation (EO) data has enabled the mapping of features on the Earth's surface at an extraordinary temporal frequency and level of detail.
Publicly available images of the entire Earth are collected every 5 days 1, and global images with ≤1 m resolution are collected daily by private companies. With these data, researchers are mapping the distribution and state of forest loss 2, urban growth 3, water resources 4, and other changes to the Earth's surface with unprecedented accuracy. These endeavours are critical to better understand and conserve natural resources 5. Even greater advances in mapping capabilities arise from the combination of EO data with deep learning approaches [6][7][8]. Yet the ability to apply these tools to environmental challenges remains out of reach for many organizations, as early progress has largely focused on high-resolution imagery and large training datasets that require substantial financial and computing resources 8. Thus, accessible approaches are needed to unlock the potential of combining EO data and deep learning techniques for conservation.
The application of computer vision to EO data has expanded the precision with which researchers are able to map the Earth's surface. Computer vision describes the automated identification of objects in images, often using deep learning models to recognize, locate, and delineate objects like cats, cars, and faces in photographs 9.
More recently, these same models have been applied to satellite and aerial imagery to identify and classify land cover 7,10. In terms of generating geospatial data, the most important development has been the application of convolutional neural networks (CNNs), deep learning architectures that can take advantage of the shape and context of an object, marking a transition away from traditional pixel-based classification approaches like those historically used to create land cover maps 11. Incorporating information about spatial context can be critical for distinguishing features with similar spectral characteristics. For instance, landslides may be spectrally similar to anthropogenic land clearing, but the two are distinguishable based on shape, configuration, and landscape context 12.
Given the ability of CNNs to delineate specific objects, a process known as image segmentation, many variants of this framework have been applied to various EO data sources 7,13. Thus far, many of these advances have focused on model development and training feature engineering, often in specific use cases taken at a snapshot in time. In this paper, we focus on broadening the applicability of computer vision and remote sensing to conservation challenges, relative to historical approaches that can be resource intensive and specialized. To be of maximal use to conservation, computer vision models using EO data need to:

1. Use publicly available data
2. Use data that is updated regularly
3. Be free or of low cost to implement
4. Achieve sufficient accuracy with limited training datasets
5. Generalize to work in a variety of geographic and temporal contexts

Methods and models must meet these five criteria to enable conservationists to regularly use them to create actionable outputs. To address these needs, we sought to develop simple, replicable computer vision models that delineate features of interest using free, publicly available Earth observation data, an approach made possible by the launch of Google Earth Engine 14. We applied our model in North Carolina, which generates more than 4% of its annual energy from solar arrays following substantial growth over the past decade 16 and where there is interest in understanding the current configuration of arrays relative to wildlife habitat. In the Chesapeake Bay watershed, there is interest in quantifying how changes in impervious surface may affect hydrological dynamics. The output data are being used in ongoing conservation and land use research related to the growth of solar energy, and the robustness of the model indicates utility in other geographic areas.

Model Structure
Our implementation of U-Net consisted of 5 consecutive encoder blocks, which increase the feature space of the data while reducing spatial resolution, and 5 decoder blocks that restore spatial detail 20. Each encoder block comprised two sequences of a convolutional layer, a batch normalization layer, and a rectified linear unit activation, followed by a max pooling step that reduces spatial resolution. Decoder blocks consisted of a deconvolution layer that increases spatial resolution, the output of which was concatenated with the output of the reciprocal encoder block, followed by two sequences of convolution, batch normalization, and rectified linear unit activation.
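For concreteness, this block structure can be sketched as follows. This is a minimal sketch written from the description above, assuming PyTorch; the channel widths, 3x3 kernels, and `base` parameter are illustrative choices rather than the exact settings of the implementation in the repository 25.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two (conv -> batch norm -> ReLU) sequences, then 2x2 max pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.convs(x)              # kept for the skip connection
        return self.pool(skip), skip

class DecoderBlock(nn.Module):
    """Deconvolution, concatenation with the reciprocal encoder output,
    then two (conv -> batch norm -> ReLU) sequences."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.convs = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                    # restore spatial resolution
        x = torch.cat([x, skip], dim=1)   # merge reciprocal encoder features
        return self.convs(x)

class UNet(nn.Module):
    def __init__(self, in_ch=6, base=32):
        super().__init__()
        widths = [base * 2 ** i for i in range(5)]    # e.g. 32 ... 512
        self.encoders = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.encoders.append(EncoderBlock(prev, w))
            prev = w
        self.decoders = nn.ModuleList()
        for w in reversed(widths):
            self.decoders.append(DecoderBlock(prev, w))
            prev = w
        self.head = nn.Conv2d(prev, 1, kernel_size=1)  # single-class output

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x, skip = enc(x)
            skips.append(skip)
        for dec, skip in zip(self.decoders, reversed(skips)):
            x = dec(x, skip)
        return torch.sigmoid(self.head(x))  # per-pixel solar array probability
```

With a 6-band 256x256 input, this network returns a 256x256 probability map, matching the chip dimensions described below.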

Earth Observation Data
We trained our U-Net model to delineate solar arrays using top-of-atmosphere reflectance data collected by the Sentinel-2 imaging system 1. To make the model robust to phenological variability, we applied label overloading 21, first separating the collection of 2020 Sentinel-2 images into four seasonal collections: Spring (01Mar20 - 31May20); Summer (01Jun20 - 31Aug20); Fall (01Sep20 - 30Nov20); and Winter (01Dec20 - 28Feb21). Each of these four collections was then reduced to a single image per season by taking the median value at each pixel among all images in the collection. Thus, we generated four images containing six bands of Sentinel-2 reflectance data covering the study areas. We stacked the label image onto each of these four 6-band Sentinel-2 images to create four 7-band rasters containing input variables and labels for each season (Fig. 1). We then sampled image chips, 256x256x7 arrays, from each of these four images.
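The compositing and label-stacking step can be sketched with the Google Earth Engine Python API. The band subset and the label asset path below are illustrative assumptions (note that `filterDate` treats the end date as exclusive):

```python
import ee

ee.Initialize()

# Six Sentinel-2 reflectance bands used as model inputs (this particular
# band subset is an assumption).
BANDS = ['B2', 'B3', 'B4', 'B8', 'B11', 'B12']

# Manually digitized solar arrays rasterized to a 0/1 'label' band;
# the asset path is a placeholder.
labels = ee.Image('projects/example/assets/solar_labels').rename('label')

# Seasonal windows; filterDate end dates are exclusive.
SEASONS = {
    'spring': ('2020-03-01', '2020-06-01'),
    'summer': ('2020-06-01', '2020-09-01'),
    'fall':   ('2020-09-01', '2020-12-01'),
    'winter': ('2020-12-01', '2021-03-01'),
}

def seasonal_stack(start, end):
    """Median top-of-atmosphere composite stacked with the label band."""
    composite = (
        ee.ImageCollection('COPERNICUS/S2')  # Sentinel-2 TOA reflectance
        .filterDate(start, end)
        .select(BANDS)
        .median()                            # per-pixel median composite
    )
    return composite.addBands(labels)        # 7 bands: 6 inputs + 1 label

stacks = {name: seasonal_stack(*dates) for name, dates in SEASONS.items()}
```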
Because solar arrays are relatively sparse features on the landscape, we took two steps to ensure our model had enough positive examples from which to learn to recognize these features. First, we sampled image chips at the centroid of each digitized solar array. Second, we generated a random sample of 1000 points within 5 km of these features and sampled chips at each point. The resulting set of chips was divided into a 70% model training set and a 30% evaluation set.
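This sampling scheme might be sketched as follows with geopandas and shapely; the input file path, the metric CRS, and the rejection-sampling loop for background points are illustrative assumptions, not necessarily the exact procedure used:

```python
import geopandas as gpd
import numpy as np
from shapely.geometry import Point

rng = np.random.default_rng(0)

# Digitized solar arrays (placeholder path), projected to a metric CRS
# so that buffer distances are in meters.
arrays = gpd.read_file('solar_arrays.geojson').to_crs(epsg=3857)

# Positive chip locations: the centroid of each digitized array.
positives = list(arrays.geometry.centroid)

# Background chip locations: 1000 random points within 5 km of any array,
# drawn by rejection sampling from the bounding box of the buffered arrays.
search_area = arrays.buffer(5000).unary_union
minx, miny, maxx, maxy = search_area.bounds
background = []
while len(background) < 1000:
    p = Point(rng.uniform(minx, maxx), rng.uniform(miny, maxy))
    if search_area.contains(p):
        background.append(p)

points = gpd.GeoSeries(positives + background, crs=arrays.crs)

# Random 70/30 split of chip locations into training and evaluation sets.
idx = rng.permutation(len(points))
n_train = int(0.7 * len(points))
train_pts = points.iloc[idx[:n_train]]
eval_pts = points.iloc[idx[n_train:]]
```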

Model Training & Evaluation
We used these image chips to train the U-Net model, applying image augmentation during training to increase the variety of examples seen by the model. To generate predictions, we applied the trained model across each study area, yielding per-pixel solar array probabilities in contiguously adjacent 256x256 patches. We converted these probabilities to predicted solar array polygons using a 0.9 probability threshold.
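Converting the mosaicked probabilities to polygons can be sketched with rasterio and shapely, assuming the per-pixel probabilities have been exported as a GeoTIFF (the file names here are placeholders):

```python
import geopandas as gpd
import rasterio
from rasterio import features
from shapely.geometry import shape

THRESHOLD = 0.9  # probability cutoff used in the text

# Mosaicked per-pixel probabilities exported as a GeoTIFF (placeholder name).
with rasterio.open('probabilities.tif') as src:
    probs = src.read(1)
    mask = (probs >= THRESHOLD).astype('uint8')
    crs = src.crs
    # Vectorize contiguous runs of above-threshold pixels into polygons.
    polygons = [
        shape(geom)
        for geom, value in features.shapes(mask, transform=src.transform)
        if value == 1
    ]

predicted = gpd.GeoDataFrame(geometry=polygons, crs=crs)
predicted.to_file('predicted_solar_arrays.geojson', driver='GeoJSON')
```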
We calculated the recall (i.e., the true positive rate), precision (i.e., the positive predictive value), and IoU of model output polygons relative to manually verified ground truth data. For predictions in North Carolina, we recorded each output polygon as either a true or false positive using 2020 Sentinel-2 imagery as a reference. We also used 1 m resolution NAIP imagery 24 as an additional reference. These spatial data use the EPSG:3857 Web Mercator coordinate reference system.
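As a sketch of these metrics, assuming the predicted and ground-truth polygons are held in geopandas objects; the automated true/false positive rule below (any intersection with a ground-truth polygon) is an assumption that stands in for the manual verification described above:

```python
import geopandas as gpd

def evaluate(predicted: gpd.GeoDataFrame, truth: gpd.GeoDataFrame):
    """Precision, recall, and area-based IoU of predicted vs. truth polygons."""
    pred_union = predicted.unary_union
    truth_union = truth.unary_union

    # Object-level counts: a predicted polygon intersecting any ground-truth
    # polygon counts as a true positive; unmatched truth polygons are misses.
    tp = predicted.intersects(truth_union).sum()
    fp = len(predicted) - tp
    fn = (~truth.intersects(pred_union)).sum()

    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # true positive rate

    # Area-based intersection over union of the two polygon sets.
    iou = (pred_union.intersection(truth_union).area
           / pred_union.union(truth_union).area)
    return precision, recall, iou
```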

Code Availability
All code used to generate images, sample training data, train models, and run predictions is available in the GitHub repository 25 . A persistent snapshot of this repository is provided in an Open Science Framework repository 20 .
Biases may have been unintentionally introduced during the creation of the training data. While we do not believe these biases influenced the development of our U-Net model, they may impact the interpretation and application of its outputs.

Author contributions
M. J. Evans conceptualized the study, developed image sampling and processing methodology, led model training and predictions, and led manuscript writing.
T. Minich digitized solar arrays in North Carolina and contributed to image processing and sampling to generate model training and evaluation data.
R. Soobitsky provided digitized solar array polygons from within the Chesapeake Bay watershed and organized and supervised the quality assessment process.
K. Mainali analysed quality assessment data and contributed to manuscript writing.