Satellite-Based Human Settlement Datasets Inadequately Detect Refugee Settlements: A Critical Assessment of Area, Accuracy, and Agreement at Thirty Refugee Settlements in Uganda

Satellite-based broad-scale (i.e., global and continental) human settlement data offer foundational information for diverse applications spanning climate hazard mitigation, sustainable development monitoring, spatial epidemiology, and demographic modeling. While many human settlement products report exceptional detection accuracies above 85%, there is a substantial blind spot in that product validation is typically centered on large urban areas rather than rural, small-scale settlements that are home to 3.4 billion people. In this study, we make use of a data-rich collection of 30 refugee settlements in Uganda to produce a targeted assessment of small-scale settlement detection by four broad-scale human settlement products: Global Human Settlements Built-Up Sentinel-2 (GHS-BUILT-S2), World Settlement Footprint (WSF), High Resolution Settlement Layer (HRSL), and Geo-Referenced Infrastructure and Demographic Data for Development (GRID3). We measured each product’s areal coverage within refugee settlements, assessed product detection accuracies in comparison to an independent dataset of 317,416 refugee settlement building footprints, and examined agreement between products. For refugee settlements established before 2016, the human settlement products had a low median F1-Score (F1) of 0.24, a high median false alarm rate of 0.59, and tended to only agree at locations of highest building density. Individually, WSF entirely overlooked 8 of the 30 study refugee settlements (median F1=0.17); GHS-BUILT-S2 underestimated the building footprint area by a median 50% (F1=0.15); GRID3 overestimated the building footprint area by a median 280% (F1=0.38); and HRSL underestimated the median area by 7% (F1=0.34). All products suffer from low detection accuracy and high false alarm rates, but GRID3 and HRSL, based on 0.5 meter resolution imagery, offer better detection accuracy than GHS-BUILT-S2 and WSF, which are based on 10-30 meter resolution imagery.
These results show that human settlement products have far to go in providing an accurate depiction of small-scale refugee settlements and would benefit from incorporating refugee settlements in training and validation of broad-scale human settlement detection.


Introduction
Broad-scale (i.e., global and continental) satellite-driven human settlement products including the Global Human Settlement Layer (GHSL; [1,2]), the World Settlement Footprint (WSF; [3]), the High Resolution Settlement Layer (HRSL; [4]), and Geo-Referenced Infrastructure and Demographic Data for Development (GRID3; [5]) attempt to capture the presence and extent of populated built-up regions across an array of urban and rural geographies [6]. These products have broadened our foundational awareness of where humans live and work [7][8][9] and made important contributions to population modeling with broader relevance for small-scale settlement mapping, helps bridge the divide between localized humanitarian and broad-scale human settlement monitoring efforts, and contributes to an ongoing discussion of improving inclusive representation of refugees in geospatial Big Data (e.g., [51,52]).

Study Area
As of writing, Uganda has the third largest refugee population in the world with 1.4 million refugees (nearly 3% of Uganda's total population) under UNHCR protection [53]. Uganda's earliest refugees settled in the 1960s, but the population increased almost tenfold between 2012 and 2017 with the arrival of refugees from South Sudan, the Democratic Republic of the Congo (DRC), and Burundi (Fig. 1). Given that the conflicts that forced refugees to leave their home countries are often ongoing, many refugees in Uganda do not intend to return to their home country and are likely to remain settled in Uganda for years to come [54]. Approximately 94% of Uganda's refugees live in 30 UNHCR-managed settlements in the Northern and Western Regions of the country (Fig. 2), and the remaining 6% of Uganda's refugee population lives in the capital city of Kampala [55]. The abundance of refugee settlements, the diversity of settlement morphologies and densities, and the availability of settlement boundary and building footprint data (described below) make Uganda an ideal case study to evaluate the detection of refugee settlements by broad-scale human settlement products.
The 30 study refugee settlements have a median area of 4 km² but range from 0.2 km² (Mireyi) to 790 km² (Bidi Bidi) in total area. Refugee settlements have a median population of 16,782 people (mean: 47,619; standard deviation: 58,093) as of September 2020 (Table 1). Settlement layouts adhere to predefined UNHCR settlement planning protocols [56], but are uniquely designed to fit site-specific considerations and vary from a grid-like organization of dwellings and other structures delineated by roadways to clustered agglomerations of dwellings.
Settlements are broadly self-contained with housing, water and latrine infrastructure (WASH), markets, financial services, and educational and healthcare facilities on site, as well as buildings for refugee response (i.e., processing and registering arriving refugees) and administration (i.e., settlement management, aid provision, inter-sector coordination, etc.). Each family is allocated a 30 m by 30 m plot for housing and agricultural cultivation, and individual dwellings tend to be small (median area of 25 m²). Typical materials used for dwellings at the time of refugee arrival are plastic tents, tarps, or grass thatching, which may be replaced over time with more durable building materials, potentially including brick and tin roofing [43].

Table 1. Characteristics of 30 study refugee settlements in Uganda. Establishment year based on UNHCR Settlement Factsheets. Primary country of origin and population data (as of September 2020) are from https://data2.unhcr.org/en/documents/details/79424. Settlement boundary areas are from https://data2.unhcr.org/en/documents/details/74116, described below. Building footprint areas are based on 10 meter resolution OSM-MS building footprint data, described below. DRC: Democratic Republic of the Congo.
Settlement Name | Year

UNHCR Refugee Settlement Boundary Data
Refugee settlement boundaries (Fig. 3) demarcate land allocated to refugee settlements for planning and land management purposes and were established by UNHCR in agreement with the Government of Uganda. Several of the larger settlements (Bidi Bidi, Rhino Camp, and Imvepi) include sub-settlement boundaries (i.e., zones, villages, blocks) that delineate specific regions of refugee presence from non-refugee populations within the larger settlement boundary [56]. As a result, it is common for refugee populations to only occupy a portion of the area demarcated by the UNHCR refugee settlement boundary. The boundary does not represent an absolute barrier to refugee land use as structures, transportation infrastructure, and agricultural plots are commonly established alongside or beyond the boundary. UNHCR refugee settlement boundaries are available at https://data2.unhcr.org/en/documents/details/74116.

Human Settlement Datasets
We examined four satellite-derived broad-scale human settlement products at the 30 study refugee settlements. The Global Human Settlement Layer with Sentinel-2 (GHS-BUILT-S2; [2]) offers 10 meter resolution global-scale coverage of probabilistic settlement presence based on a convolutional neural net (CNN) classification of Sentinel-2 Level 1C imagery from January 2017 through December 2018. GHS-BUILT-S2 was publicly released in November 2020. We initially examined settlement detection using >1% and >50% probability thresholds but adopted recommendations from [2] to use a 20% probability threshold for settlement detection in rural, low-density settlement regions. GHS-BUILT-S2 has a nominal accuracy of 85% when using a detection probability threshold of greater than 20% in Africa, but there is no explicit mention of detection accuracy for refugee settlements. GHS-BUILT-S2 is available at https://ghsl.jrc.ec.europa.eu/ghs_bu_s2_2018.php.
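The thresholding step described above can be sketched as follows. This is a minimal illustration, assuming the probability layer has already been read into a nested list of values on a 0-100 scale; the function name `threshold_probability` and the sample values are hypothetical and are not part of the GHS-BUILT-S2 distribution.

```python
def threshold_probability(prob_raster, threshold=20):
    """Convert a probability raster (0-100) to binary settlement presence.

    A pixel counts as settlement only when its probability is strictly
    greater than the threshold, mirroring the ">20%" rule in the text.
    """
    return [[1 if p > threshold else 0 for p in row] for row in prob_raster]


# Hypothetical 2 x 3 probability tile, not real GHS-BUILT-S2 values.
probs = [[0, 15, 20],
         [21, 55, 100]]
binary = threshold_probability(probs)
print(binary)
```

Lowering the threshold admits more low-confidence pixels, which is why a lower cutoff may be preferred in sparse rural settings at the cost of additional false alarms.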
The World Settlement Footprint (WSF; [3]) is a 10 meter resolution global-scale binary human settlement dataset based on multi-temporal Sentinel-1 Synthetic Aperture Radar (SAR, 10 m) and Landsat 8 (30 m) imagery from 2014-2015 classified using a Support Vector Machine (SVM). WSF offers global-scale coverage with a nominal settlement detection accuracy of 86%, and was publicly released in July 2020. WSF producers note that smaller structures made with building materials commonly found in refugee settlements (e.g., mud bricks and straw) were not consistently detected with WSF, but there is no quantitative assessment available to support this observation. WSF is available at https://figshare.com/articles/dataset/World_Settlement_Footprint_WSF_2015/10048412.
The High Resolution Settlement Layer (HRSL; [4]) is a 30 meter resolution population and settlement dataset with coverage across 140 countries and was released in December 2017. HRSL's binary settlement data are based on a CNN classification of buildings using 50 cm resolution Maxar imagery from 2011-2015. HRSL has an average precision and recall of 95% and 91%, respectively, and 99% and 93% for Uganda, respectively. There is no mention of refugee settlements being considered in the development of the settlement detection approach or accuracy assessment. HRSL data are available at https://www.ciesin.columbia.edu/data/hrsl/.
The Geo-Referenced Infrastructure and Demographic Data for Development (GRID3; [5]) offers wall-to-wall coverage for 51 countries in sub-Saharan Africa and was made publicly available in April 2020. GRID3 is a vector (i.e., polygon) product based on Ecopia building footprints derived from 50 cm resolution Maxar imagery. Approximately 77% of the imagery input to GRID3 for Uganda was acquired between 2016 and 2020, with 23% from 2015 or earlier. Settlement extents are created by processing building footprint centroids to a 3 arc-second raster grid of building densities. Based on the underlying building density, polygons are classified as Built-Up Area (BUA; high density), Small Settlement Area (SSA; moderate density), or hamlet (low density). To capture any indication of settlement presence within GRID3, we merged all three settlement classes and refer to this combined coverage as GRID3 henceforth. As with other products, there is no mention of refugee settlements being considered in GRID3's development or validation, but refugee settlements and other settlements with temporary structures are recognized as being difficult to detect. GRID3 data for Uganda are available at https://academiccommons.columbia.edu/doi/10.7916/d8-s1yg-pc20.
In addition to the four products above, we conducted preliminary analyses of the Landsat-based Global Human Settlement Layer (GHS-BUILT; https://ghsl.jrc.ec.europa.eu/ghs_bu2019.php), Global Urban Footprint (GUF; [57]), and the Global Artificial Impervious Area (GAIA; [58]). Since these products had zero or negligible coverage within study refugee settlements, we removed them from further analysis.

Refugee Settlement Building Footprint Data
Two different kinds of building footprint data were used for assessing the areal coverage and detection accuracy of the four settlement products. The open-source OpenStreetMap (OSM) building footprint dataset for Uganda includes 304,482 individual building footprints, covers a total area of 7.26 km² across the 30 study settlements, and was created between 2017 and 2020 according to footprint timestamp data. There is no explicit accuracy assessment for OSM footprint data, but their accuracy, completeness, and geo-precision are recognized to depend on settlement morphology, building density, and rooftop architecture [59]. The OSM footprint dataset is available at https://data.humdata.org/dataset/hotosm_uga_buildings.
We also used building footprints created through the Microsoft (MS) AI for Humanitarian Action program in partnership with the Humanitarian OpenStreetMap Team. This MS dataset spans Uganda and Tanzania and was generated using a deep neural net model trained with 1.2 million labeled buildings and Maxar very high resolution satellite imagery [60]. The MS dataset includes 58,544 individual building footprints that span a total area of 2.16 km² across 27 study settlements; only Ayilo II, Boroli II, and Pagirinya lack MS footprint data. The reported precision and recall for the MS footprint dataset are 95% and 62%, respectively, but both vary, especially between urban and rural settings [61]. The MS building footprint dataset is available at https://github.com/microsoft/Uganda-Tanzania-Building-Footprints.
A fused building footprint product was created by merging the complementary OSM and MS building footprint datasets (Fig. 4). The OSM-MS fused dataset includes 317,416 individual building footprints across 8.47 km² over all 30 refugee settlements; only 0.94 km² was recorded in both the OSM and MS footprint datasets. Per settlement, OSM-MS building footprint coverage has a median area of 0.60 km² (mean: 2.05; min: 0.04; max: 14.93; std: 3.41), and there are a median 589 OSM-MS building footprints per km² within a refugee settlement's UNHCR boundary (mean: 860; min: 68; max: 4213; std: 926). Even with the fused OSM-MS building footprint product, there are notable omission errors (e.g., Fig. 4b-d), which we discuss below.
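As a consistency check on the reported areas, the fused footprint area should follow from inclusion-exclusion over the two input datasets. The symbols below are introduced here for illustration only; the small gap between the computed 8.48 km² and the reported 8.47 km² is what one would expect from rounding each input to two decimals.

```latex
\begin{aligned}
A_{\mathrm{fused}} &= A_{\mathrm{OSM}} + A_{\mathrm{MS}} - A_{\mathrm{overlap}} \\
                   &= 7.26 + 2.16 - 0.94 \\
                   &= 8.48\ \mathrm{km}^2 \approx 8.47\ \mathrm{km}^2 \ \text{(reported)}
\end{aligned}
```

The overlap of 0.94 km² is less than half of the MS area (2.16 km²), which supports the description of the two datasets as complementary.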
To support a pixel-wise validation of each settlement product, we prepared a 10 meter resolution raster georeferenced to the WSF product. We rasterized the fused OSM-MS building footprints to a binary settlement/non-settlement raster. In an effort to increase the likelihood of agreement with human settlement products, we considered any pixel intersecting an OSM-MS building footprint as a settlement pixel; any pixel lacking an OSM-MS building footprint was considered a non-settlement pixel. Rasterized building footprint coverage spans 0.04 to 14.93 km² across the 30 study settlements (Table 1), and all mentions below of the OSM-MS building footprint dataset refer to the 10 meter resolution raster product rather than the input polygons.
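The "any intersecting pixel" rule can be illustrated with a small sketch. It makes a simplifying assumption that is not in the study: footprints are treated as axis-aligned rectangles in metres rather than arbitrary polygons, and the function name `rasterize_all_touched` and the grid layout (row 0 at the bottom, origin at the lower-left corner) are hypothetical.

```python
import math


def rasterize_all_touched(rects, x0, y0, n_cols, n_rows, cell=10.0):
    """Burn rectangles into a binary grid of `cell`-metre pixels.

    rects: list of (xmin, ymin, xmax, ymax) tuples in metres, standing in
           for building footprints (an assumption made for this sketch).
    A pixel is set to 1 if any rectangle overlaps its interior.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for xmin, ymin, xmax, ymax in rects:
        # Index range of the pixels overlapped by the rectangle, clipped to the grid.
        c_lo = max(0, math.floor((xmin - x0) / cell))
        c_hi = min(n_cols - 1, math.ceil((xmax - x0) / cell) - 1)
        r_lo = max(0, math.floor((ymin - y0) / cell))
        r_hi = min(n_rows - 1, math.ceil((ymax - y0) / cell) - 1)
        for r in range(r_lo, r_hi + 1):
            for c in range(c_lo, c_hi + 1):
                grid[r][c] = 1
    return grid


# A 5 m x 5 m dwelling (25 m^2) that straddles a pixel corner.
grid = rasterize_all_touched([(8, 8, 13, 13)], x0=0, y0=0, n_cols=3, n_rows=3)
settled_pixels = sum(sum(row) for row in grid)
print(settled_pixels * 100, "m^2 of settlement pixels from a 25 m^2 footprint")
```

Under this rule a single small dwelling can mark up to four 10 m pixels as settled, which is one plausible reason the rasterized footprint coverage is larger than the polygon area it was derived from.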

Methods
This study has three objectives (Fig. 5): to assess the areal coverage of broad-scale human settlement products across 30 refugee settlements in Uganda, to validate settlement product detection accuracy using OSM-MS building footprint data, and to measure agreement in refugee settlement detection across the settlement products. To support intercomparison between settlement products and OSM-MS building footprint data, HRSL and GHS-BUILT-S2 were resampled and georeferenced to the 10 meter resolution WSF product, and GRID3 vector data were similarly rasterized at 10 meter resolution and georeferenced to WSF. Since WSF and HRSL were predominantly based on imagery from before 2016 (and are therefore less likely to detect refugee settlements established in 2016 or later) while GRID3 and GHS-BUILT-S2 predominantly used imagery acquired in 2016 or later, we report areal coverage, validation, and agreement results for pre-2016 refugee settlements (i.e., settlements established before 2016) separately from post-2016 refugee settlements (i.e., settlements established in 2016 or later).
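To make the resampling step concrete, the sketch below upsamples a 30 m binary raster to a 10 m grid by replicating each coarse pixel into a 3 x 3 block. The text does not state which resampling method was used, so nearest-neighbour replication is an assumption here, as is the premise that the coarse and fine grids are already aligned; `upsample_nearest` is a hypothetical helper name.

```python
def upsample_nearest(raster, factor=3):
    """Replicate each pixel into a factor x factor block (nearest neighbour).

    With factor=3 a 30 m binary raster is expressed on a 10 m grid without
    changing the area that each settlement pixel represents.
    """
    fine = []
    for row in raster:
        # Repeat every value `factor` times along the row...
        fine_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times down the column.
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


coarse = [[1, 0]]            # one 30 m settlement pixel next to one empty pixel
fine = upsample_nearest(coarse)
print(len(fine), "rows x", len(fine[0]), "columns")
```

Replication preserves binary values, whereas interpolating methods would produce fractional values that need a further threshold; real workflows would also handle reprojection and grid alignment, which this sketch omits.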

Objective 1: Measure areal coverage of settlement products at refugee settlements
To measure potential under- or over-representation of refugee settlement area, we measured the areal coverage of each human settlement product at study refugee settlements at two scales. First, we measured the ratio of settlement product area to the UNHCR refugee settlement boundary area, which, as described above, may include large swaths of unpopulated land as well as non-refugee population settlements. Second, we compared human settlement product area to the total area of OSM-MS building footprints within each settlement's UNHCR boundary. Since the timing of settlement establishment partially informs whether a settlement can be detected by a given product, we considered pre- and post-2016 settlements separately.
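The two coverage measures can be sketched as below, assuming all three inputs (product, footprints, boundary mask) are binary rasters already aligned on the same 10 m grid. The function name `areal_coverage`, the dictionary keys, and the toy rasters are illustrative choices rather than the study's implementation.

```python
CELL_AREA_KM2 = (10 * 10) / 1e6  # area of one 10 m pixel in square kilometres


def areal_coverage(product, footprints, boundary):
    """Summarise product coverage inside a settlement boundary.

    Returns the fraction of the boundary covered by the product and the
    signed area difference (product minus footprints, in km^2); a negative
    difference indicates underestimation of footprint area.
    """
    n_boundary = n_product = n_footprint = 0
    for p_row, f_row, b_row in zip(product, footprints, boundary):
        for p, f, b in zip(p_row, f_row, b_row):
            if b:  # only count pixels inside the boundary mask
                n_boundary += 1
                n_product += p
                n_footprint += f
    if n_boundary == 0:
        raise ValueError("boundary mask contains no pixels")
    return {
        "boundary_fraction": n_product / n_boundary,
        "area_difference_km2": (n_product - n_footprint) * CELL_AREA_KM2,
    }


product = [[1, 1, 0], [0, 1, 0]]
footprints = [[1, 0, 0], [0, 1, 0]]
boundary = [[1, 1, 1], [1, 1, 0]]
print(areal_coverage(product, footprints, boundary))
```

Note that a product can match the footprint area closely while placing that area in the wrong pixels, so these area measures are complementary to, not a substitute for, the pixel-wise validation in Objective 2.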

Objective 2: Validate settlement product detection of refugee settlements
We measured the detection accuracy of each human settlement product at refugee settlements by comparison with the 10 meter resolution OSM-MS building footprint data. We measured the total area of "hit" (true positive, TP), "false alarm" (false positive, FP), "miss" (false negative, FN), and "none" (true negative, TN) pixels for each product's coverage within each settlement's UNHCR boundary (Table 2). With these values in hand, we calculated commonly used validation metrics to assess classification performance: Probability of Detection (POD, also known as Recall), Critical Success Index (CSI; [62]), F1-Score (F1), and False Alarm Rate (FAR, also known as the False Positive Ratio) (Table 3). POD indicates the proportion of building footprint area represented in a settlement's product coverage. CSI accounts for potential over- and under-detection of building footprint area by the settlement products. F1 provides additional sensitivity to detection of sparse features. Lastly, FAR indicates over-detection of building footprint area. All metrics range from 0 to 1.
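A minimal sketch of the four metrics follows, using the standard definitions of POD, CSI, and F1 in terms of TP, FP, and FN. The equation for FAR is not reproduced in this text, so the sketch assumes the false alarm ratio FP / (TP + FP), which is bounded between 0 and 1 and describes the share of detected area with no underlying footprint; this choice and the function names are assumptions.

```python
def confusion_counts(product, reference, boundary):
    """Count hit, false alarm, miss, and none pixels inside a boundary mask."""
    tp = fp = fn = tn = 0
    for p_row, r_row, b_row in zip(product, reference, boundary):
        for p, r, b in zip(p_row, r_row, b_row):
            if not b:
                continue  # ignore pixels outside the settlement boundary
            if p and r:
                tp += 1
            elif p and not r:
                fp += 1
            elif r:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def validation_metrics(tp, fp, fn):
    """Compute POD, CSI, F1, and FAR (assumed here to be FP / (TP + FP))."""
    return {
        "POD": _ratio(tp, tp + fn),
        "CSI": _ratio(tp, tp + fp + fn),
        "F1": _ratio(2 * tp, 2 * tp + fp + fn),
        "FAR": _ratio(fp, tp + fp),
    }


product = [[1, 1, 0], [1, 1, 0]]
reference = [[1, 1, 1], [1, 0, 0]]
boundary = [[1, 1, 1], [1, 1, 1]]
tp, fp, fn, tn = confusion_counts(product, reference, boundary)
print(validation_metrics(tp, fp, fn))
```

None of the four metrics uses TN, which is a sensible property here because settlement boundaries contain large unbuilt areas that would otherwise inflate apparent accuracy.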

Validation Metric | Equation
Probability of Detection (POD)

Objective 3: Assess agreement between settlement products
We next examined the agreement between GHS-BUILT-S2, WSF, HRSL, and GRID3 in their detection of pre- and post-2016 refugee settlements. We made a multi-product agreement raster at 10 meter spatial resolution that represents the number of products with coverage per pixel within each settlement boundary. We measured the area detected by a single human settlement product (i.e., unique detection without agreement from another human settlement product) as well as the area detected by two, three, or four products (the last being common detection by all human settlement products) within boundaries and over OSM-MS building footprints. We also measured the respective contributions of each human settlement product to each level of agreement (i.e., one to four products) to identify product-level differences in unique or multi-product agreements. To gauge how accuracy of detection changes with increased agreement, we measured the validation metrics from Objective 2 for the multi-product agreement raster relative to the OSM-MS building footprint dataset.
Since the values in the multi-product agreement raster are mutually exclusive over space, we could not independently assess each value's detection accuracy fairly without being penalized for the footprints accurately detected by other values in the multi-product agreement raster. So, we measured validation metrics cumulatively by incorporating single (unique) coverage in the validation of two-product agreement area; incorporating one, two, and three-product coverage in the validation of the three-product agreement area, etc.
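The agreement raster and the cumulative validation can be sketched as follows. This reflects one reading of the procedure described above, in which the mask for agreement level k includes every pixel detected by between one and k products; the function names and toy rasters are hypothetical, and only POD is computed to keep the example short.

```python
def agreement_raster(products):
    """Sum aligned binary rasters so each pixel holds the number of products detecting it."""
    return [[sum(values) for values in zip(*rows)] for rows in zip(*products)]


def cumulative_mask(agreement, level):
    """Binary mask of pixels detected by at least one and at most `level` products."""
    return [[1 if 1 <= a <= level else 0 for a in row] for row in agreement]


def pod(mask, reference):
    """Probability of detection: share of reference pixels covered by the mask."""
    hits = sum(m and r for m_row, r_row in zip(mask, reference)
               for m, r in zip(m_row, r_row))
    total = sum(sum(row) for row in reference)
    return hits / total if total else 0.0


# Three hypothetical products on a 2 x 2 grid.
a = [[1, 0], [1, 1]]
b = [[1, 0], [0, 1]]
c = [[0, 0], [0, 1]]
reference = [[1, 0], [1, 1]]

agreement = agreement_raster([a, b, c])
for level in (1, 2, 3):
    print(level, pod(cumulative_mask(agreement, level), reference))
```

Because each cumulative mask contains the previous one, POD cannot decrease as the agreement level rises, which is worth keeping in mind when interpreting gains in POD across agreement levels.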

Objective 1: Measure areal coverage of settlement products at refugee settlements
We found large differences in coverage between the four human settlement products within the UNHCR refugee settlement boundaries (Fig. 6). GHS-BUILT-S2, HRSL, and GRID3 have coverage across all 30 settlements, but WSF did not detect a single pixel at eight refugee settlements that were established at various dates spanning 1992 through 2016. GRID3 offers more than five-fold the coverage of other products within UNHCR settlement boundaries, and detected a median 65% (min: 9%; max: 100%; std dev: 26%) of the settlement boundary area for pre-2016 settlements and a median 28% (min: 11%; max: 72%; std dev: 18%) of the boundary area for post-2016 settlements. HRSL has the second largest median coverage with 13% (min: 1%; max: 49%; std dev: 13%) of pre-2016 settlements and 1% (min: 0%; max: 26%; std dev: 8%) of post-2016 settlements. The median coverages of GHS-BUILT-S2 and WSF do not exceed 2% for pre- or post-2016 settlements, and the maximum areas covered by GHS-BUILT-S2 and WSF across all settlements are only 12% (i.e., Boroli I, established 2014) and 25% (i.e., Olua II, established 2012), respectively. Note that the HRSL and WSF coverage of post-2016 settlements, despite these products not using post-2016 source imagery, results from detection of non-refugee populations within the settlement boundary prior to refugee arrival.
We next measured human settlement product coverage compared to OSM-MS building footprint coverage within each settlement's boundary. We found that GHS-BUILT-S2 and WSF underestimate the OSM-MS building footprint area by a median 0.30 km² (min: 0.53 km² overestimate; max: 11.80 km²; std dev: 2.50 km²) and 0.60 km² (min: 0.61 km² overestimate; max: 10.86 km²; std dev: 2.49 km²), respectively (Fig. 7). HRSL offers a closer approximation of building footprint area with a slight median underestimation of 0.04 km² (min: 0.68 km²; max: 4.01 km²; std dev: 1.21 km²) while GRID3 grossly overestimates the OSM-MS footprint area by a median 1.69 km² (min: 129.09 km²; max: 0.07 km²; std dev: 32.95 km²). Thus, while GRID3 may cover more of the area within a settlement boundary, this coverage is an overestimation of building footprint area, and HRSL instead offers the best agreement with building footprint coverage. As above, pre-2016 settlements saw markedly better agreement than post-2016 settlements across all products, and HRSL had the most similar coverage for pre- and post-2016 building footprints.

Objective 2: Validate settlement product detection of refugee settlements
In our validation using the OSM-MS building footprint dataset, we found that the four settlement products (GHS-BUILT-S2, HRSL, WSF, and GRID3) do not accurately detect refugee settlements. For pre-2016 settlements, the settlement products offer a low median F1 of 0.24 and a high median FAR of 0.59. Among the individual products (Fig. 8), GRID3 has the overall highest detection accuracy of pre-2016 refugee settlements with a median POD of 0.97, CSI of 0.21, and F1 of 0.38 but suffers from the highest median FAR of 0.75. HRSL has the second highest detection rate across all metrics and is only marginally below GRID3 for all metrics other than POD. GHS-BUILT-S2 and WSF tend to have the lowest detection rates at less than 0.10 for all metrics other than F1. Detection rates are expected to be higher for pre-2016 settlements given their establishment prior to source image acquisition, yet even pre-2016 detection rates are far below each settlement product's nominal detection accuracy, which ranges from 85% to 99%. These results point to a consistent under-detection of settlement as depicted in the OSM-MS building footprints dataset and an over-detection of settlement area that is not represented in the footprints dataset, with highly variable accuracies from settlement to settlement (see Fig. A1).
There was wide variation in detection, with GRID3 most often offering superior accuracy despite its high FAR. Mireyi, for example, is a small refugee settlement demarcated by a boundary 0.20 km² in size with 0.11 km² of building footprint coverage. Mireyi was established in 1994, meaning that all four products are based on source imagery acquired decades after the earliest refugee arrivals, yet these products offer disparate views of the settlement (Fig. 9). GRID3 has complete coverage of all OSM-MS building footprints (POD=1.00) but yields extensive false alarms elsewhere within the settlement (FAR=0.44). By contrast, HRSL tends to only capture the central region of the settlement (POD=0.51) and overlooks settled regions near the boundary (FAR=0.40). WSF and GHS-BUILT-S2 offer progressively less coverage but manage to capture some settled areas overlooked by HRSL.
Considering the oldest settlements of Kyangwali (1960) and Oruchinga (1961) (see Fig. A2), we find that, in contrast to overall pre-2016 settlement detection trends (Fig. 8), GHS-BUILT-S2 had the highest detection rates in terms of CSI and F1. Moving several decades later to the next established settlements of Rhino Camp and Elema, established in 1980 and 1992, respectively, HRSL excelled in all detection metrics other than POD, which GRID3 once again led. These are exceptions, however, since settlements established in the mid-1990s through the most recently established settlements (such as Boroli I and Ayilo I, below) were best detected by GRID3 across detection metrics.

Objective 3: Assess agreement between settlement products
Given the at times overlapping coverage between the four human settlement products, we next examined the amount and accuracy of multi-product agreement. For pre-2016 settlements, we found that nearly half (45%) of the median coverage within a settlement boundary was unique to a single settlement product. This prevalence of unique detection and the rapidly decreasing coverage of two- (17%), three- (3%), and four- (<1%) product agreement (Fig. 10) suggest high disagreement between the four settlement products. A similar trend in declining coverage with greater levels of agreement was measured in post-2016 settlements and points to an overall low agreement between products in their detection of refugee settlements. GRID3 is almost entirely responsible for unique detection within refugee settlements by providing 99% of single product coverage (Table 4). GRID3 is also present at all locations of detection by other products; in effect, there is not a single pixel detected by GHS-BUILT-S2, HRSL, or WSF that is not also detected by GRID3. After GRID3, HRSL contributes zero unique detection but offers the second most coverage for sites detected by two or three products, while GHS-BUILT-S2 and WSF offer zero unique detection and contribute the least to two- or three-product agreement coverage.
Twenty-one of the 30 study settlements have some coverage by all four products, eight settlements have coverage by three settlement products, and a single settlement (Agojo) only has coverage by two products (Fig. A3). The majority of multi-product agreement occurs in built-up regions with densely arranged, large, and highly reflective structures (Fig. 11). The multi-product agreement is likely associated with refugee settlement administration or UNHCR coordination while individual, small-scale refugee dwellings broadly evade detection by products other than GRID3.
The absence of agreement across much of the largest refugee settlements (e.g., Bidi Bidi, Imvepi, and Kiryandongo) reflects the openness and broad absence of structures. Multi-product agreement outside of settlement boundaries is not uncommon (e.g., Boroli II, Elema, Mungula I, etc.) and may be associated with dwelling spillover (e.g., Maaji I), market structure construction (e.g., Pagirinya), and, less commonly, nearby non-refugee settlements (e.g., Lobule). The prevalence of multi-product agreement outside the UNHCR settlement boundaries suggests that the space directly influenced by refugees and settlement-level land use is not always encapsulated by refugee settlement boundaries, a finding echoed in [46].

Figure 11. Maps of agreement between four settlement products at a) Mireyi, b) Boroli I, and c) Ayilo I. "1 product" indicates unique coverage by a single product. "2 products", "3 products", and "4 products" indicate the number of settlement products that share coverage. "None" indicates zero coverage. Note that the spatial scale of settlement maps varies.

We found that increasing agreement between settlement products also improved POD, CSI, and F1 and decreased FAR (Fig. 12; Fig. A4). The largest gain in accuracy occurred with the transition from unique detection by a single product (i.e., GRID3, as explained above) to agreement by two products (usually HRSL and GRID3), with little subsequent improvement in accuracy with the transition from two-product agreement to three- or four-product agreement. POD showed the greatest overall increase from unique detection by a single product to multi-product agreement, likely because it overlooks false alarms, while CSI, F1, and FAR were much more constrained in their incremental changes between agreement levels. Detection accuracy of pre-2016 settlements benefited more from increasing agreement compared to post-2016 settlements, which is expected given the overall higher accuracy of detection at pre-2016 settlements.

"1 product" indicates unique coverage by a single product. "2 products", "3 products", and "4 products" indicate the number of settlement products that share coverage.

Discussion
This study is the first to systematically examine how well refugee settlements are captured by leading satellite data-based human settlement products: GHS-BUILT-S2, WSF, HRSL, and GRID3. Overall, we found generally low coverage within refugee settlements, which resulted in low detection accuracy, and consistently high FAR, often in excess of 0.50, with GRID3 consistently reaching a FAR above 0.80. GRID3 tends to provide the most coverage within a settlement's boundary but greatly over-represents the building footprint coverage within settlement boundaries, leading to comparatively high detection accuracies (POD, CSI, F1) but also high FAR. HRSL captured far less area than GRID3 but better approximates building footprint coverage within refugee settlement boundaries, and often has CSI and F1 comparable to GRID3 along with a lower FAR. That HRSL does so well at estimating overall footprint area but has low detection accuracy suggests that HRSL provides an adequate allocation of settlement area but that its coverage is not correctly located within the settlement. GHS-BUILT-S2 and WSF tend to capture much less area within refugee settlement boundaries, underestimate building footprint coverage, and have the lowest detection accuracies, albeit with the lowest FARs. Multiple products find common detection in regions with dense arrangements of buildings or with exceptionally large buildings, with similar results to [6,33], but there are few gains in detection accuracy when combining more than two different products.
What underlies the poor coverage and low detection of refugee settlements? While a building-level examination is beyond the scope of this study, the small size of buildings in study refugee settlements likely poses a central challenge to detection. This is suggested by the higher detection rates of GRID3 and HRSL, achieved with 50 cm resolution source imagery that is much more capable of resolving small buildings. Meanwhile, the 10 m resolution imagery used by GHS-BUILT-S2 and the 10-30 m resolution imagery used by WSF increase the likelihood of capturing mixed pixels that may include buildings as well as surrounding vegetation, soil, and, less so, infrastructure. GHS-BUILT-S2 and WSF tend to overlook small buildings typical of family dwellings but do capture larger buildings, which tend to be administrative in function.
Settlement morphology can also affect detection rates as GHS-BUILT-S2 and WSF capture regions of densely clustered buildings that effectively offer a spatially contiguous settlement signal. For many settlements, WSF and GHS-BUILT-S2 exclusively detect structures in densely built-up regions. Such densely arranged buildings contribute to settlement detection accuracy and underlie the pattern of three- and four-product agreement (Fig. 11; Fig. A3). Morphology also influences FAR since false alarms most often occur in open, vegetated lands immediately adjacent to buildings rather than infrastructure. This is especially prominent with GRID3 data due to the processing step of buffering Ecopia building footprints that are at the core of GRID3 coverage. The resulting buffer around buildings that are isolated from others contributes to consistently higher FARs for GRID3, and is most pronounced for settlements that are diffusely settled (Fig. 9; Fig. A2).
The timing of settlement establishment can also contribute to divergent areal coverage and detection accuracy. We found that settlements established before 2016 were regularly better detected than settlements established in 2016 or later, which was mainly a consequence of the pre-2016 acquisition of satellite imagery used to generate the settlement products. However, this study showed that even the earliest and most populated refugee settlements of Kyangwali (1960), Oruchinga (1961), and Rhino Camp (1980) are poorly detected by all four settlement products despite these settlements being persistently inhabited for decades before satellite image acquisition. It is also possible that seasonal and phenological conditions at the time of image acquisition were not favorable for refugee settlement detection [49,63]; however, the specific dates of imagery used in generating the human settlement product coverage over Uganda here were not available to the study.
More difficult to consider is how the characteristics of buildings within refugee settlements compare to the buildings and settlements used to train and validate the settlement detection approaches of the four human settlement products considered here. Image classification approaches for remote sensing-derived global settlement products are commonly trained and evaluated on densely populated human settlements [6,24,64]. If a training dataset did not include refugee or other informal settlements that have construction materials and morphologies distinct from cities and towns that are typically used to train and validate a detection approach, it is likely that settlement detection approaches would have struggled to capture study refugee settlements. Indeed, as described above, there is no evidence that refugee or informal settlements were included in product training or validation. It is worth noting that Ugandan settlements are likely easier to detect than many other refugee settlements in Sub-Saharan Africa that have far lower building densities and lack durable roofing materials and impervious surfaces that would be easier to detect.
Low coverage and poor accuracy of refugee settlement detection are of concern for the producers of human settlement datasets who seek accurate detection of where humans live. Within a settlement, inconsistent and incomplete detection of settlements distorts our understanding of refugee land use. As described above, the UNHCR boundaries used in this study are established for settlement planning purposes before refugee arrival and usually do not represent the actual extent of refugee dwellings or land use. After refugee arrival, the collection of refugee dwellings and administrative buildings makes up a functional settlement extent that reflects the actual distribution of the refugee population's footprint within the encompassing UNHCR boundary. Having an inaccurate representation of this functional settlement extent means that the spatial allocation and configuration of housing, infrastructure, and household agricultural plots are similarly distorted. While helpful for monitoring settlement land use dynamics, having functional settlement extents would also allow researchers to pivot from relying on the centroid of a refugee settlement as the principal geographic identifier.
There are also cascading consequences for modeled estimates of population driven by remote sensing-based human settlement data. Settlement data help refine the spatial allocation of administrative population data in unmapped or otherwise difficult to access regions and inform monitoring, planning, and assessments of development targeting, economic productivity, as well as humanitarian intervention [11,12,35,39]. We examined how well HRSL estimates the refugee population in each settlement by comparing HRSL-estimated populations to UNHCR-reported settlement populations (Table 1). We find that HRSL consistently underestimated settlement-level refugee populations across the 30 study settlements (Fig. 13). HRSL has a median underestimation of 6316 and 57,869 people for pre- and post-2016 settlements, respectively, and undercounts the total population across all 30 settlements by 837,751 people accounting for only 59% of the total refugee population in Uganda. These settlement-level population estimates based on HRSL data are almost certainly an underestimation of refugee populations for three reasons. First, we showed that HRSL underestimates the settlement area at most refugee settlements. Second, the settlement area detected by HRSL that drives the population allocation may include non-refugee settlements located within UNHCR boundaries; this is the case at settlements such as Bidi Bidi, Kyangwali, Imvepi, and Rhino Camp. Since settlement area measurements may be inflated due to the presence of non-refugee settlements, the derived population estimates would be even lower if only refugee settlements were being considered, which would result in an even larger underestimation. Third, since refugees are excluded from national census data, the administrative population being allocated to detected refugee settlement pixels is lower than the actual total population if refugees were included.
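The way these factors compound can be sketched with a simplified allocation model. The example below assumes a census count is spread evenly over detected settlement pixels; this is a simplification for illustration and not a description of how HRSL is produced. All numbers, variable names, and the function name are hypothetical.

```python
def estimated_population(census_population, detected_pixels_in_unit, detected_pixels_in_boundary):
    """Spread a census count evenly over detected settlement pixels, then
    sum the share that falls inside a settlement boundary.

    Even allocation is a simplifying assumption of this sketch.
    """
    if detected_pixels_in_unit == 0:
        return 0.0
    people_per_pixel = census_population / detected_pixels_in_unit
    return people_per_pixel * detected_pixels_in_boundary


# All values below are hypothetical and chosen only to show the mechanism.
host_population = 20_000       # census count, which excludes refugees
refugee_population = 30_000    # reported separately, absent from the census
pixels_outside_boundary = 800  # detected host-community settlement pixels
refugee_pixels_true = 1_000    # pixels that actually contain refugee dwellings
refugee_pixels_detected = 400  # subset of those pixels detected by the product

estimate = estimated_population(
    host_population,
    pixels_outside_boundary + refugee_pixels_detected,
    refugee_pixels_detected,
)
print(f"estimated: {estimate:,.0f}  reported: {refugee_population:,}")
print(f"underestimation: {refugee_population - estimate:,.0f}")
```

Here the estimate falls short both because only part of the settlement area is detected and because the population being allocated never included refugees, which mirrors the first and third reasons given above.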
As is apparent, a more inclusive approach (spatially, temporally, and thematically) to broad-scale detection of refugee and other small-scale human settlements is needed. First and foremost, refugee settlement locations and areas should be included in the training and validation of remote sensing-based human settlement product generation and explicit accuracies should be reported; this will help clarify the relevance of broad-scale human settlement products for detection of refugee and other small-scale settlements.
Second, the disparate detection accuracies between GHS-BUILT-S2 (10 m resolution source imagery) and WSF (10-30 m resolution source imagery) compared to GRID3 and HRSL (0.5 meter resolution source imagery) suggest that automated approaches using very high resolution imagery to detect buildings greatly improve small-scale refugee settlement detection accuracy [6]. While several localized studies have used moderate resolution satellite imagery (i.e., Landsat or Sentinel-2) to capture refugee settlements (e.g., [41,46]), such local successes have not yet translated to broad-scale automated detection of refugee settlements. Third, the establishment of new refugee settlements each year to support the growing global refugee population quickly makes human settlement datasets out-of-date, and is the principal reason for low detection accuracies for post-2016 settlements compared to pre-2016 settlements. In order to capture new refugee settlements, and not only document the growth of existing settlements, remote sensing-based human settlement datasets need timely updates at least every year and cannot linger as static, outdated snapshots of where people live, refugees or otherwise.
This study offered the first systematic analysis of refugee settlement presence within satellite-derived human settlement datasets but was limited in several ways. The diversity of image dates used to produce each of the four products made it uncertain whether a settlement had even been established when imagery was acquired for a given product. Examining coverage at settlements established before the earliest acquired imagery in 2011 helped address this uncertainty, but having a georeferenced metadata product labeling the specific image dates used in each product would have completely clarified the ambiguous product timing at settlements. Though the OSM-MS building footprint dataset was based on imagery collected over 2017-2020, it likely underrepresents the actual collection of buildings in some settlements, which would have contributed to greater FAR than was warranted. Furthermore, using building footprints for validation purposes introduces semantic dissimilarities between the cartographic representation of settlement in each of the products and the use of building footprints for independent validation.
There are ample opportunities for future work to build on the present study. While the study was set in Uganda to benefit from the wealth of ancillary data on refugee settlement boundaries and building footprints that are not systematically available in other refugee-hosting countries, future work would benefit from a broader sample of refugee or internally displaced persons (IDP) settlements in multiple countries that offer even more morphologic diversity. Including additional detail on the size, type, and construction material of structures being detected or excluded by settlement products would be helpful to target improvements in future human settlement products. Similarly, characterizing refugee settlement detection in terms of spectral or textural conditions or quantifying the influence of building size or building density on detection success would offer tailored suggestions for improved detection of small-scale settlements. A comparison between localized refugee settlement detection approaches using high resolution imagery (e.g., [33,34,36]) and GHS-BUILT-S2, WSF, HRSL, and GRID3 would also help demonstrate whether broad-scale products are fit for humanitarian purposes. A follow-on assessment of refugee settlement detection in next-generation human settlement products such as WSF-Evolution, the next iteration of GHS-BUILT, or products based on sub-annual or annual very high resolution imagery from Planet or Maxar would also be valuable. Finally, developing a comprehensive assessment of population estimations in refugee settlements informed by satellite-based human settlement products (e.g., [11,12]) would illuminate refugee, migration, and development policy implications of this study's findings.

Conclusion
This study presents the first systematic analysis of refugee settlement detection in satellite-based broad-scale human settlement products. Using 30 refugee settlements in Uganda, four human settlement products (GHS-BUILT-S2, WSF, HRSL, and GRID3), and 317,416 building footprints for independent validation, this study found that settlement products offered low to moderate detection accuracies and a high rate of false alarm, and tended to agree in detection within settlements only at the very largest or most reflective buildings or in localized regions of high building density. GRID3 and HRSL, based on very high resolution satellite imagery, offered the greatest coverage and detection accuracy and contributed the most unique detections of building footprints within refugee settlements. While these results are explicitly associated with refugee settlements in Uganda, similarly low detection rates are likely present at other small-scale or informal settlements around the world. Such inadequate detection of refugee settlements raises concerns about the continental or global accuracy of human settlement products as well as the quality of follow-on geospatial products such as population maps based in part on remote sensing-based human settlement data.
Settlement products tout exceptional nominal accuracies above 85% for detecting human settlements, yet the poor detection of refugee settlements shows the need for a more inclusive human settlement detection approach. A more inclusive approach could involve formally incorporating refugee settlements in the training and testing of human settlement products, reporting detection accuracy at refugee settlements, using very high resolution imagery to detect small-scale dwellings and other structures that often make up refugee settlements, and providing annual updates to human settlement datasets that keep pace with the establishment of new refugee settlements around the world. The accurate inclusion of refugee settlements in broad-scale human settlement products would appropriately recognize refugee settlements as being long-term residences for millions of people and improve the cartographic visibility of refugees and refugee settlements in data-driven development and climate hazard mitigation efforts.
Data Availability Statement: OSM (OpenStreetMap) refugee settlement boundary data for Uganda are available at https://data2.unhcr.org/en/documents/details/74116 (accessed on 1 December 2020). GHS-BUILT-S2 data are available at https://ghsl.jrc.ec.europa.eu/ghs_bu_s2_2018.php (accessed on 1 December 2020). WSF data are available at https://figshare.com/articles/dataset/World_Settlement_Footprint_WSF_2015/10048412 (accessed on 1 December 2020). HRSL data are available at https://www.ciesin.columbia.edu/data/hrsl/ (accessed on 1 December 2020). GRID3 data for Uganda are available at https://academiccommons.columbia.edu/doi/10.7916/d8-s1yg-pc20 (accessed on 1 December 2020). OSM footprint data for Uganda are available at https://data.humdata.org/dataset/hotosm_uga_buildings (accessed on 1 December 2020). MS building footprint data are available at https://github.com/microsoft/Uganda-Tanzania-Building-Footprints (accessed on 1 December 2020).
, and False Alarm Rate (FAR) for multi-product agreement sites in comparison to OSM-MS building footprint coverage across all refugee settlements. "1 product" indicates unique coverage by a single product. "2 products", "3 products", and "4 products" indicate the number of settlement products that share coverage.