Improving the accuracy of gridded population estimates in cities and slums to monitor SDG 11: Evidence from a simulation study in Namibia

People living in slums and other deprived areas in low- and middle-income country (LMIC) cities are under-represented in censuses, and subsequently in "top-down" gridded population estimates. Modelled gridded population data are a unique source of disaggregated population information to calculate local development indicators such as the Sustainable Development Goals (SDGs). This study evaluates if, and how, WorldPop-Global (WPG) -Unconstrained and -Constrained "top-down" datasets might be improved in a simulated realistic LMIC urban population by incorporating slum profile population counts into model training. We found that the WPG-Unconstrained model with or without slum training data grossly underestimated population in urban deprived areas while grossly overestimating population in rural areas. SDG 11.1.1, the percent of population living in slums, for example, was estimated to be 20% or less compared to a "true" value of 29.5%. The WPG-Constrained model, which included building auxiliary datasets, far more accurately estimated the population in all grid cells (including rural areas), and the inclusion of slum training data further improved cell-level estimates; SDG 11.1.1 was estimated at 27.1% without and 27.0% with slum training data. Inclusion of building metrics and slum training data in "top-down" gridded population models can substantially improve grid cell-level accuracy in both urban and rural areas.


Introduction
The United Nations' 2030 Agenda for Sustainable Development outlines a wide-ranging, integrated set of 17 Sustainable Development Goals (SDGs) and 169 targets coupled with 231 unique indicators for measuring and monitoring progress toward sustainable global development [1]. SDG 11 focuses on sustainable development of cities and communities, including the upgrading and integration of slums, informal settlements, and other deprived areas into cities. SDG indicator 11.1.1 specifically measures the percent of urban populations living in slums, informal settlements, or inadequate housing [2]. Fundamental to SDG 11 is the ability to measure all populations in urban settings; however, those living in deprived areas, particularly in low- and middle-income countries (LMIC), are often grossly under-represented in official demographic datasets such as censuses and household surveys [3,4]. The average LMIC census omits half of residents in slums and informal settlements, though this varies widely with some censuses omitting nearly all slum-dwellers [3,4]. Missing population counts are crucial to SDG efforts because cities that are expected to grow the most over the next 30 years will see much of that growth concentrated in deprived areas [5].
Modelled gridded population datasets that are based on census data can inherit some census data limitations, but they also provide an important proxy for fine-scale population distribution information that otherwise would not exist and are crucial to ensure that no one is left behind, especially in countries with coarse and/or poor quality census data [6]. However, gridded population modelling assumptions and approaches can further mask the urban poorest, for example, by assuming that average population density in large administrative units reflects population density in smaller grid cells [7]. Although the ability to disaggregate population and demographic information at fine scale is vital to measuring more than half of SDG indicators [8], and despite continued advances in the data and techniques for spatial disaggregation of population data, reasonable population estimates for most deprived urban areas are lacking globally [7].

Gridded population accuracy in deprived urban areas
Previous work evaluated the accuracy of nine gridded population datasets in Nigerian and Kenyan slums as compared to mapped and field-referenced data by Slum Dwellers International (SDI) [7]. Findings from that study highlight that the most accurate gridded population datasets predicted only 39% of field-referenced slum residents [7]. There were potentially two reasons for these systematic underestimates of slum dwellers across datasets. First, the implicit assumption in all top-down gridded population models is that average population densities in larger administrative units represent population densities in much smaller grid cells [24]. Second, most gridded products assessed in that study did not include an auxiliary dataset, such as building density, which correlates with variation in urban population density at the scale of the gridded output [7]. The exception was WPG-Constrained, which uses cell-level building footprint metrics (e.g. building density, total building edge length) to estimate population counts at the grid cell-level in most African countries [13].
The evaluation of gridded population dataset accuracy within cities, particularly within urban deprived areas or grid cells, is challenging and generally only performed in high-income countries where highly detailed reference data are available [e.g., 13,14]. An exception is an analysis which evaluated grid cell-level accuracy of WPG-Unconstrained gridded population using a realistic simulated, geo-referenced population in Khomas, Namibia [3]. The simulation analysis showed that even when "true" population counts were input to the model, 0.5 to 1 non-slum dweller and 0.75 to 1.5 slum dwellers were omitted for every 1 person accurately estimated in an urban grid cell [3]. The systematic underestimation of slum populations raised ethical questions about the use of gridded population estimates in development applications such as SDG reporting. How much value does a gridded population dataset, like WPG-Unconstrained, offer for SDG 11.1.1 monitoring if it masks, and therefore reinforces marginalisation of, the urban poorest?
The aim of this study is to evaluate if, and how, gridded population datasets might be improved to overcome systematic under-estimates of populations in deprived areas in a typical low- and middle-income country urban environment. This study focuses on WorldPop open-access data products because they have been widely used in development practice for over a decade, often in urban settings [23,27-30]. It specifically evaluates grid cell-level accuracy in the WPG-Unconstrained and WPG-Constrained datasets, and the effect of supplementing each model with SDI-like slum population counts during model training. This is done by using the earlier referenced simulated "true" population in Khomas, Namibia [3] as the reference population and to calculate the population density inputs to all models.

Setting
This study utilised an existing 2016 simulated "true" population in Khomas, Namibia [3], which is described in detail in section 2.3.1. Khomas is one of 13 Regions (1st-level administrative units) in Namibia, and more than 95% of the population in Khomas lives in Windhoek, Namibia's capital (Figure 2). Windhoek's population and footprint have grown rapidly since 1990 when the country gained independence because many rural populations were able to move freely and legally to cities for the first time [31]. Between the 2001 and 2011 censuses, the population in Windhoek grew by a staggering 37% [32].

Analysis
Four gridded population datasets generated with four different models were compared: (1) WPG-Unconstrained standard model, (2) WPG-Unconstrained model plus slum training data, (3) WPG-Constrained standard model, and (4) WPG-Constrained model plus slum training data. WPG models are based on a Random Forest machine-learning algorithm which builds a series of "trees" that characterise the relationship between population density and a set of covariates [12]. WPG estimates are generated in three steps. First, for each input administrative unit, the corresponding average population density (people per hectare) and average covariate values are calculated, and used to generate the series of model "trees," each with a set of parameters that define the relationship between population density and covariates at the administrative unit level. In the WPG-Unconstrained model, average population density is calculated from the entire area of the administrative unit, while in the WPG-Constrained model, average population density is calculated from only the land area that is classified as settled [33]. Second, gridded datasets representing the same set of covariates used to train the Random Forest model and the "trees" of parameters are used to generate multiple population density predictions in ~100x100m grid cells [12]. The WPG-Unconstrained model contains 23 globally-available covariates [34], and the WPG-Constrained model in Africa contains an additional 10 covariates related to building footprint presence and patterns [35]. The hundreds of cell-level population predictions are averaged to create a final layer of cell-level population density. Third, because the averaged grid cell values do not sum to the administrative unit input population values, the Random Forest output layer is used as a weights layer to disaggregate input administrative population totals into grid cells (see [36] for a visual and details).
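The third step can be illustrated with a minimal sketch in Python. It is not WorldPop's implementation: the function name `disaggregate`, the five cell weights, and the unit total of 1,000 people are hypothetical values chosen for illustration, and the sketch assumes that the averaged Random Forest prediction for each cell is used directly as a proportional weight within a single administrative unit.

```python
def disaggregate(unit_total, weights):
    """Split one administrative-unit total across its grid cells
    in proportion to the cell weights (hypothetical helper)."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        # Without any positive weight there is no basis for redistribution.
        raise ValueError("weights must sum to a positive value")
    return [unit_total * w / weight_sum for w in weights]


# Hypothetical averaged Random Forest densities (people per hectare)
# for the five grid cells that make up one administrative unit.
predicted_density = [0.5, 2.0, 10.0, 0.0, 7.5]
unit_population = 1000  # hypothetical input total for the unit

cell_population = disaggregate(unit_population, predicted_density)
print(cell_population)
```

By construction the cell estimates sum back to the input total, so any error in the output reflects how well the weights track the real within-unit distribution rather than the overall count.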
During the first phase when the relationships between population density and covariate values are being defined at the administrative unit level, it is possible, and even advisable, to incorporate more spatially detailed information about population distribution, especially when the input administrative units are very coarse (geographically much larger than the ~100x100m target grid cells) [12]. This is because the Random Forest model can only predict population densities during the second phase (to ~100x100m grid cells) based on the results of the model fitting at the administrative unit level (first phase). Finer-scale administrative units from a similar, neighbouring country are often used as supplemental training data to ensure the model includes finer and more realistic population densities relevant at the grid cell-level. Here, we test the use of a few slums as supplemental training data in the 2nd and 4th models because slums (a) are geographically smaller than constituencies, and (b) are likely to contain high population densities which, in theory, would enable the Random Forest model to predict more realistic population densities in high-density grid cells.
To assess the magnitude of cell-level error, we calculated root mean square error (RMSE), which penalises large errors, and to assess the direction of cell-level error, we calculated Bias. Both of these measures can be sensitive to population size (such that errors are larger in cells with larger populations), so we also calculated Normalised RMSE (NRMSE) and Relative Bias, which adjust each statistic by average cell-level population. In the error calculations, $y_i$ is the "true" population in cell $i$, $\hat{y}_i$ is the gridded population estimate in cell $i$, and $n$ is the number of cells.
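As a hedged illustration of these four statistics, the sketch below computes them for a handful of made-up cells. It assumes that Bias is the mean of estimate minus "true" (so negative values indicate underestimation, consistent with the signs reported in the Results) and that the normalising quantity is the mean "true" cell population; the function name `error_metrics` and the example values are hypothetical.

```python
import math


def error_metrics(true_pop, est_pop):
    """Return RMSE, Bias, NRMSE and Relative Bias for paired cell values."""
    if len(true_pop) != len(est_pop) or not true_pop:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_pop)
    # Difference per cell: estimate minus "true" (negative = underestimate).
    diffs = [e - t for t, e in zip(true_pop, est_pop)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    bias = sum(diffs) / n
    mean_true = sum(true_pop) / n  # assumed normalising quantity
    if mean_true == 0:
        raise ValueError("mean 'true' population must be non-zero")
    return {
        "rmse": rmse,
        "bias": bias,
        "nrmse": rmse / mean_true,
        "relative_bias": bias / mean_true,
    }


# Hypothetical "true" and estimated populations in four grid cells.
true_cells = [10, 40, 100, 50]
est_cells = [12, 30, 60, 50]
metrics = error_metrics(true_cells, est_cells)
print(metrics)
```

Dividing by the mean cell population expresses error and bias per person, which is what allows comparison between settlement types with very different densities.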

Data
2.3.1 Reference: Population. The simulated "true" population was derived from 2011 census 20% microdata [37] and 2016 building point locations manually digitised using Maxar and SPOT imagery (40cm). The simulated households were joined to digitised building point locations based on probability surfaces of different household types. The input datasets and methodology for the simulated "true" population are detailed elsewhere and available for download and use [3,38]. Note that in the 2011 Namibian census, 89,438 households were recorded in Khomas, and the simulated population put this number at 97,667 households in 2016 (1.8% annual growth rate) [3], which is lower than the projected growth rate estimated by the Namibia Statistics Agency (> 4%) [39]. This means that the actual population in 2016 was likely higher than this simulated "true" population, which likely led to conservative results. However, for the purposes of this analysis, we treat the simulated population as the "true" location of individuals and households in 2016 (Figure 3).
2.3.3 Model input: Population. The model input population was simply the "true" simulated population aggregated from household point locations to constituencies (2nd-level administrative units). While Namibia is one of several countries that provides EA-level population counts to gridded population modellers, a large number of LMICs (for example Nigeria, Cameroon, Chad, Iraq, Uzbekistan, Argentina, or India) do not share similarly disaggregated population counts [41]. In most LMICs, gridded population estimates are derived from coarse (and often outdated or inaccurate) census population counts, thus Namibia's constituency boundaries were used to mimic this common situation.
The "true" reference population was used to derive input population counts in constituencies located in Khomas, while 2016 constituency UN population projections [11] were used for all other Namibian constituencies to (i) eliminate the effect of outdated or inaccurate census data on model outputs, and (ii) isolate the effect of the models used to generate gridded population estimates. As described above, WPG-Unconstrained model inputs reflected average population densities across the entire area of constituencies, while WPG-Constrained model inputs reflected population densities in the settled area only of constituencies, indicated with white shading in the upper-left panel of Figure 3. As with other model input data, the simulated "true" population was summed within slum boundaries to create the corresponding population counts and densities.
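The difference between the two kinds of input density can be shown with a small worked example. The constituency below is entirely hypothetical (12,000 people, 48,000 ha in total, of which 2,400 ha are classified as settled), and `input_density` is an illustrative helper rather than part of any WorldPop tool.

```python
def input_density(population, total_area_ha, settled_area_ha, constrained):
    """People per hectare for one training unit.

    Unconstrained: divide by the unit's full area.
    Constrained: divide by the settled area only.
    """
    area = settled_area_ha if constrained else total_area_ha
    if area <= 0:
        raise ValueError("area must be positive")
    return population / area


# Hypothetical constituency dominated by unsettled land.
population = 12000
total_area_ha = 48000.0
settled_area_ha = 2400.0

unconstrained_density = input_density(
    population, total_area_ha, settled_area_ha, constrained=False
)
constrained_density = input_density(
    population, total_area_ha, settled_area_ha, constrained=True
)
print(unconstrained_density, constrained_density)
```

In this example the same population yields 0.25 people per hectare over the full area but 5 people per hectare over settled land, which is why the two model variants are trained on very different density distributions.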
2.3.5 Model input: Predictive covariates. All 23 standard WorldPop-Global covariates were included in both the WPG-Unconstrained and -Constrained models, with zonal statistics generated by constituency (and slums, as applicable) [34] (Table 2). WPG-Constrained models additionally included 10 covariates derived from Maxar/Ecopia building footprints in 100x100m grid cells [35], which is a defining characteristic of this modelling approach [33]. See Appendix Figure A1 for a map of each covariate in Windhoek, and Figure A2 for importance scores for the most predictive covariates in each model.

Results
The distribution of model input population densities varied widely between the WPG-Unconstrained (Model 1) and -Constrained (Model 3) models due to massive differences in the total constituency area (Model 1) versus total constituency area classified as settled (Model 3). The presence of vast unsettled areas across Namibia meant that the majority of population densities calculated across full constituencies were below 1 person per hectare, with a handful of values registering up to 123 people per hectare (Figure 4). Conversely, because WPG-Constrained densities were calculated from settled areas only, all population densities were at least 2 people per hectare, with the bulk of estimates between 3 and 6 people per hectare, and a maximum of 134 people per hectare (Figure 4). The incorporation of slum training data into models extended the distributions of possible population densities to a maximum of 177 people per hectare (Models 2 and 4), and provided a more complete range of values to the Random Forest model to assign to ~100x100m grid cells (Figure 4).
The impacts of unconstrained versus constrained population density inputs, and of the addition of slum training data, were both strongly apparent in the modelled outputs. Model 1 (WPG-Unconstrained) resulted in highly smoothed estimates across settled and unsettled areas that ranged from less than one person to 112 people in the most populous grid cell (Figure 5). Adding slum training data to the Unconstrained model (Model 2) slightly concentrated population estimates over Windhoek but did not increase the most populous grid cell value (Figure 5). Model 3 (WPG-Constrained) estimated zero population in grid cells with no buildings such that the estimated population was concentrated in settled grid cells, but still highly smoothed; the most populous grid cell was only estimated to have 121 people (Figure 5). The addition of slum training data to the WPG-Constrained model (Model 4) resulted in the most variable output and higher population counts, though the maximum estimate of 131 people in a grid cell was roughly a third of the "true" maximum grid cell population (386 people) (Figure 5). All models incorrectly attributed population to a large unpopulated industrial zone near the city centre (Figure 5).
Next, differences between each of the four estimates and the "true" population were calculated. Figure 6 shows the distribution of these differences in all grid cells with one or more people (omitting grid cells with <1 person), visualised separately for Windhoek not-deprived, Windhoek deprived, and the rest of Khomas. Unsurprisingly, the differences in the rest of Khomas were far more variable in unconstrained models (Models 1 and 2) compared to constrained models (Models 3 and 4) because population was often estimated in unsettled grid cells in unconstrained models. Within Windhoek, the standard WPG-Unconstrained (Model 1) slightly underestimated the population in not-deprived areas (Bias: -3.8), and consistently and severely underestimated the population in deprived areas (Bias: -26.4) (Figure 6). Adding slum training data (Model 2) only marginally improved estimates in Windhoek's not-deprived (Bias: -2.2) and deprived areas (Bias: -22.7) (Figure 6). The WPG-Constrained model (Model 3) was more promising with reduced rates of error in Windhoek's deprived (RMSE: 37) and not-deprived (RMSE: 26) areas, and a better balance in bias across not-deprived (Bias: -0.9) and deprived (Bias: -6.3) areas (Figure 6). Again, adding slum training data (Model 4) had the effect of slightly improving the errors and bias in all areas (Figure 6).
The marked improvement in Models 3 and 4 compared to Models 1 and 2 was driven by constrained input population densities, and by building covariates, in particular building density which aligned with population density (see Appendix Figure A2).
Figure 6. Histograms of differences between predicted and "true" population by cell in Windhoek, and unadjusted RMSE and Bias statistics. Graphs and statistics represent only grid cells with 1+ estimated people, and are presented by settlement type. The tops of histograms are not shown for readability.
Given the sensitivity of these statistics to population size, Table 3 reports RMSE and Bias adjusted by average grid cell population to represent error and bias per person in each model, allowing for a fairer comparison across settlement types (i.e., deprived, not-deprived and rest of Khomas). We further report the percent of population misallocated to "truly" unpopulated grid cells, and the percent of Windhoek's population living in deprived areas (SDG 11.1.1). Findings are consistent with the unadjusted error and bias statistics reported in Figure 6, and Model 4 continued to offer the best balance of low error and bias across all settlement types (Table 3). For accuracy statistics across all cells, including cells in the rest of Khomas located in vast desert areas, see Appendix Table A1. The "true" percent of population living in slums, informal settlements, or other deprived areas in Windhoek (SDG 11.1.1) according to the "true" simulated population is 29.5%.
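The misallocation statistic can be sketched as follows. This is one plausible reading rather than the study's exact definition: it assumes the denominator is the total estimated population and that a cell counts as "truly" unpopulated when its reference population is exactly zero. The function name and the four example cells are hypothetical.

```python
def percent_misallocated(true_pop, est_pop):
    """Percent of the estimated population placed in cells whose
    "true" population is zero (assumed definition)."""
    total_est = sum(est_pop)
    if total_est <= 0:
        raise ValueError("estimated population must be positive")
    misplaced = sum(e for t, e in zip(true_pop, est_pop) if t == 0)
    return 100.0 * misplaced / total_est


# Hypothetical cells: the first two are unpopulated in the reference data.
true_cells = [0, 0, 30, 70]
est_cells = [5, 15, 30, 50]
share = percent_misallocated(true_cells, est_cells)
print(share)
```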
In practical terms, the tendency of WPG-Unconstrained models (Models 1 and 2) to severely underestimate population in deprived areas meant that SDG 11.1.1 in Windhoek was grossly underestimated. According to the "true" simulated population in Windhoek, the actual percent of population living in slums, informal settlements, or inadequate housing (SDG 11.1.1) was 29.5% in 2016, though outputs from Model 1 put this figure at just 19.2%, and Model 2 at 20.0% (Table 1). Model 3 (27.1%) and Model 4 (27.0%) produced reasonably accurate estimates and could, therefore, plausibly be used to measure SDG 11.1.1 in cities with maps of deprived versus not-deprived areas (Table 1). For a map of model differences from the "true" simulated population in deprived versus not-deprived areas of Windhoek, see Appendix Figure A3.
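The indicator calculation itself is a simple ratio, sketched below under the assumption that each urban grid cell carries a binary deprived/not-deprived label taken from a map of deprived areas. The cell values are hypothetical and chosen only to show how underestimating deprived cells pulls the indicator down; they are not the study's data.

```python
def sdg_11_1_1(cell_population, cell_is_deprived):
    """Percent of the urban population living in cells labelled deprived."""
    total = sum(cell_population)
    if total <= 0:
        raise ValueError("total population must be positive")
    deprived = sum(
        p for p, flag in zip(cell_population, cell_is_deprived) if flag
    )
    return 100.0 * deprived / total


# Hypothetical city of four cells; only the first is labelled deprived.
deprived_flags = [True, False, False, False]
true_cells = [120, 80, 60, 140]
est_cells = [70, 95, 75, 160]  # same total, deprived cell underestimated

true_indicator = sdg_11_1_1(true_cells, deprived_flags)
est_indicator = sdg_11_1_1(est_cells, deprived_flags)
print(true_indicator, est_indicator)
```

Because the city total is fixed by the input population, every person removed from a deprived cell is reallocated elsewhere, so cell-level underestimation in deprived areas translates directly into a lower indicator value.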

Discussion
Using a simulated "true" population in a real-world location, grid cell-level accuracy in four gridded population models was compared across three settlement types (urban not-deprived, urban deprived, and rural). Findings show that the recently released WPG-Constrained model, which includes 10 building footprint auxiliary datasets derived from Maxar/Ecopia, performed well on its own across diverse urban and rural settings, and even better with a few slum training locations. In real-world settings, SDI and similar data could be used to identify slum boundaries and realistic population counts. The WPG-Constrained model, however, was imperfect with slight underestimation of urban deprived populations and slight overestimation of rural populations at the grid cell-level. While users should always remain cautious about interpreting population counts in small areas due to the accumulation of model errors and uncertainty at finer scales, our reasonably accurate SDG 11.1.1 results inspire confidence in the WPG-Constrained model with slum training data in terms of the relative distribution of population within urban and rural areas. For example, we would feel comfortable using these model outputs as a household survey sample frame (stratified by urban/rural) [23] or for SDG 11 reporting (within urban areas).
These findings add nuance to previous findings that gridded population datasets tend to vastly underestimate populations in slums [7]. This analysis differed from the previous study by controlling for the accuracy of the input population datasets, using simulated "true" population counts rather than real-world census data which contain unquantifiable inaccuracies. Our results suggest that the addition of building auxiliary data and slum training data can sufficiently improve modelled estimates in urban and rural areas for SDG reporting, if the input population data are relatively accurate. It is worth reiterating that the input population data in the four models were aggregated to constituencies (2nd administrative level), which bodes well for the applicability of these findings across LMIC settings where published census data are often highly aggregated. While countries with the greatest need for gridded population estimates are likely to have the most outdated or inaccurate census data, we can be confident that a gridded population model that includes building auxiliary data and slum training data produces the most accurate, disaggregated top-down estimates possible.
We posit that two additional datasets could improve the accuracy of top-down gridded population data. The first is a settlement classification layer that accurately distinguishes urban deprived from not-deprived areas. We are not yet aware of any such global datasets at fine geographic scale that could be used for this purpose; however, the IDEAMAPS Network is working toward developing such a layer [54], and several algorithms have been published and tested in small areas that could, in theory, be scaled across cities. Jochem and Tatem (2021), for example, published an R package that uses building footprints to classify settlement types, though it has only been tested in Europe and imperfectly distinguishes between urban settlement types [55]. The World Resources Institute released Python code to distinguish urban land use types, including "residential informal" land use, from Sentinel-2 imagery and demonstrated its application in India and Mexico, though substantial bespoke training data were required [56,57]. If, and when, a routine accurate map of deprived urban areas becomes available across cities, population distribution modellers might consider dividing a country by settlement type before modelling (e.g. urban-deprived, urban-not-deprived, rural), given differences in population densities across these settlement types.
A contributor to error in Windhoek's not-deprived areas was the estimation of people in grid cells containing non-residential industrial buildings. Across all four models, hundreds of non-residential cells in an industrial zone near downtown Windhoek were allocated dozens of people each (see Figure 5). Among the building covariates included in the WPG-Constrained model, not a single one distinguished between residential and non-residential buildings, though the algorithms and technologies to extract such information are being developed [58]. Inclusion of a building residential/non-residential dataset would likely improve estimates, especially in urban not-deprived areas, by further constraining settled residential areas. In addition to industrial buildings, this type of dataset should exclude airports, government buildings, universities, hospitals, military bases, and commercial centres, all of which contribute to the inaccurate allocation of population in gridded population model outputs, creating practical challenges for data users in the field [59], and potentially skewing development indicators in urban areas [30].
These findings are particularly relevant for understanding the accuracy of top-down gridded population estimates in grid cells and small areas in LMIC cities at a recent point in time. Numerous types of users will be interested in findings at this fine scale in LMIC contexts, including governments and organisations involved with planning and development. We specifically focus on users involved with monitoring of SDG 11.1.1; however, these results are applicable to other SDG and development indicators calculated within LMIC cities. Other gridded population datasets might be better suited to other types of use cases; for example, GHS-POP is appropriate for longitudinal analyses at more aggregated scales [17], LandScan is often appropriate for day-time regional-scale disaster response when people are likely to be away from home [10], and GPWv4.11 is recommended as an input to climate and other global research models [21]. We echo the findings from other studies that different gridded population datasets are better suited to different use cases, and that users need to be informed about which model outputs are most relevant for their aims [6,60].

Conclusion
Using slum training data in top-down gridded population modelling improves urban population estimates, particularly in deprived areas, which are systematically underrepresented in LMIC population data (including censuses, surveys, and gridded population estimates). The WorldPop-Global-Constrained models with, and without, slum training data produced fairly accurate SDG 11.1.1 estimates for Windhoek, though the inclusion of slum training data reduced errors and bias across urban slum, urban non-slum, and rural areas. This analysis puts us closer to accurately estimating SDG indicators from gridded population data, but additional model improvements are still needed, such as inclusion of slum training data (e.g., from SDI community profiles), deprived area boundaries, and building typology information.
Figure A2. Importance Scores (national constituency inputs + 15 Windhoek slums). Across all WPG models, VIIRS night-time lights intensity was a strong predictor of population density. In the WPG-Constrained models (M3 and M4), building counts were, by far, the most predictive of population density.
Figure A3. Visual comparison of modelled population over/under-estimates in a northern section of Windhoek, Namibia