How accurate are WorldPop-Global gridded population data at the cell-level?: A simulation analysis in urban Namibia

Disaggregated population counts are needed to calculate health, economic, and development indicators in Low- and Middle-Income Countries (LMICs), especially in settings of rapid urbanisation. Censuses are often outdated and inaccurate in LMIC settings, and rarely disaggregated at fine geographic scale. Modelled gridded population datasets derived from census data have become widely used by development researchers and practitioners; however, none of these datasets have been evaluated for accuracy of population estimates at the grid cell-level. This is because the finest-scale population figures generally available to data producers are those input into gridded population models and disaggregated to smaller grid cells (e.g., 100x100m). We simulate a realistic "true" 2016 population in Khomas, Namibia, a majority urban region, and introduce realistic levels of outdatedness (over 15 years) and inaccuracy in slum, non-slum, and rural areas. We then aggregate these simulated realistic populations by census and administrative boundaries (to mimic census data), and generate 32 gridded population datasets that are typical of a LMIC setting using WorldPop-Global's gridded population approach. We evaluate the cell-level accuracy of these simulated WorldPop-Global datasets, using the original "true" population as a reference. In our simulation, we found large cell-level errors, particularly in urban cells, driven by WorldPop-Global's use of average population densities in large areal units to determine cell-level population densities. Age, accuracy, and aggregation of the input data did play a primary role in these errors. We suggest incorporating finer-scale training data into gridded population models generally, and WorldPop-Global in particular (e.g., from simulated populations, routine household surveys, or slum community profiles), and use of new building footprint datasets as a covariate to improve cell-level accuracy of gridded population data. It is important to measure cell-level accuracy of all gridded population datasets, especially if they are to be used for monitoring key development indicators.


Introduction
Small area population counts, especially in low-and middle-income countries (LMICs), provide essential denominators for health, economic, and development indicators [1]. For example, small area population counts are used to calculate vaccination coverage rates [2], understand health service utilisation [3], and estimate infection rates of malaria, COVID-19, and many other health conditions [4]. Censuses are generally collected every ten years, though one in ten LMICs has not held a census in the last 15 years [5], and some national censuses have poor data quality due to negligence (e.g., [6,7]) or deliberate mis-counting of subpopulations for political purposes (e.g., [8][9][10]). Due to increasing rates of mobility and urbanisation worldwide, especially in African and Asian cities where 90% of global population growth is expected in the next 30 years [11], the urban poorest are increasingly difficult to count as more people take-up residence in informal settlements or atypical housing locations (e.g., a shop) [12].
In the absence of updated, fine-scale census data, many policy-makers, researchers, and service providers have turned to gridded population estimates as a source of population counts in their work. Gridded population data provide estimates of the total population in small grid cells, and are derived from geostatistical models using population counts and spatial datasets [13]. "Top-down" gridded population estimates have been available for roughly 15 years and disaggregate census or other complete population counts in areal units (e.g., 3rd-, 4th-, or 5th-level administrative units) to grid cells (e.g., 30x30m, 100x100m, 1x1km) [12]. The simplest models assume a uniform distribution of population within areal units (i.e., GPW [14,15], GHS-POP [16,17], HRSL [18]), while the most complex models use spatial covariates to inform spatial disaggregation of the areal unit population to grid cells (i.e., 20], LandScan [21,22], WPE [23]). Most models aim to reflect the average night-time residential population distribution, though LandScan reflects "ambient" population, the average between night-time residential and daytime commuter populations [21]. To estimate gridded population figures beyond the year of the last census; birth, migration, and death rates are used to project new population totals by areal unit [24]. "Bottom-up" gridded population estimates are derived from micro-census population counts in a sample of areas, or from assumptions about the average household size, and have only recently been developed [25,26]. Most gridded population datasets use a settlement layer to "constrain" population estimates to settled grid cells. Exceptions include Gridded Population of the World (GPW) in which the areal unit population is simply divided equally among cells [14], and WorldPop-Global-unconstrained which uses a complex model to produce disaggregation weights for all land areas, and results in very small estimates of a person (e.g., 0.00001 people) in unsettled grid cells [19]. Read papers by Leyk and colleagues (2019) and Thomson and colleagues (2020) for detailed descriptions and comparisons of these gridded population datasets [12,13].
The accuracy of "top-down" gridded population data is generally calculated at the scale of the input population areal units because these are the finest-scale population counts available to the data producers. A number of factors contribute to gridded population model accuracy including: (1) the modelling algorithm itself, (2) inaccuracy of the input population data, (3) the geographic scale of the input population data (e.g., census tracts versus districts), (4) the age, accuracy, completeness, and type of ancillary data, (5) the nature of the relationship between ancillary data and population density, and (6) the geographic scale of the output grid. Of these, the two strongest predictors of accuracy (at the scale of areal units) in top-down gridded population models are the resolution and age of the data [27]. Among top-down gridded population datasets, the WorldPop-Global Random Forest model is among the best documented and most accurate gridded population models available [19,28] with publicly available model code [29] and pre-processed model covariates [30,31] enabling reproducibility and evaluation. WorldPop-Global and its preceding data products (AfriPop, AsiaPop, and AmeriPop) are unconstrained; however, a new WorldPop-Globalconstrained dataset was recently published [32].
Accuracy of gridded population data at the scale of input areal units is not ideal, as users of these datasets tend to need population counts at finer geographic scales [13]. Compounding this issue is that users often turn to gridded population estimates when census counts are excessively outdated or untrustworthy [12]. The accuracy of gridded population data at the scale of output grid cell is largely unknown due to the challenges of finding reliable population counts to use as a reference. Where extremely fine-scale population counts are available, gridded population estimates are not needed. Given that many gridded population estimates are derived from outdated or inaccurate census data, and use of gridded population data is most common when census data are outdated or inaccurate, it is imperative to understand if, and how, census inaccuracies propagate through gridded population data. Few accuracy assessments have ever been performed on gridded population data at the cell-level, and those that exist are generally from high-income countries (e.g., [33]).
To evaluate cell-level accuracy of gridded population data, actual population counts are needed for each grid cell or in finer-scale units such as household point locations. Few censuses in LMICs collect household latitude-longitude coordinates, and where they do, the data are extremely sensitive and difficult to obtain. Furthermore, even the best census data might be problematic because vulnerable sub-populations including homeless and nomadic populations are supposed to be counted separately in special enumerations, though under-resourced statistical offices are often not able to perform these counts [34], and some censuses do not include certain refugee or internally displaced populations [35]. To ensure that this analysis of cell-level accuracy did not exclude the urban poorest and other hidden populations, we chose to simulate a realistic population in a LMIC setting. It was important that the simulated population was located in a real-world location so that actual covariate datasets -with their own imperfections -could be used to generate realistic gridded population datasets. We adapted methods outlined by Thomson and colleagues (2018) for simulating a geo-located realistic household population, and added classification of urban households by slum/non-slum area in a final step to focus this analysis on dynamic, complex LMICs cities where inaccuracies in gridded data are likely to propagate [36].
This paper describes how we evaluated the cell-level accuracy of 32 simulated 100x100m WorldPop-Globalunconstrained gridded population datasets which reflect realistic scenarios of census (1) outdatedness, (2) inaccuracy, and (3) aggregation in an urban LMIC setting.

Setting
We chose to simulate a population in Khomas, Namibia -in which the vast majority of residents reside in Windhoek, the capital -because the government has produced numerous high-quality population datasets [37], and Windhoek's population is incredibly dynamic (Figure 1)

Simulation Overview
To simulate realistic gridded population datasets for Khomas, Namibia, we first simulated a "true" 2016 population geo-located to realistic manually-generated household point locations; introduced realistic outdatedness by removing households in 2011, 2006, and 2001; introduced realistic inaccuracies among urban-slum, urban-non-slum, and rural sub-populations; and finally aggregated these 16 simulated population scenarios into two geographic areal units (census EA and constituency) to generate 32 realistic WorldPop-Global-unconstrained 100x100m gridded population datasets. This workflow is summarised in Figure 2 and detailed below. Simulate a realistic population geo-located to realistic building point locations, 2) simulate three periods of outdatedness by removing households at point locations not present on satellite imagery in earlier years, 3) simulate low/middle/high census inaccuracy by removing points at random from rural, urban-slum, and urban-non-slum household types, 4) aggregate to 922 census enumeration areas (EAs) and 10 constituencies (admin-2), 5) generate 100x100m gridded population datasets in raster grid format using WorldPop-Global-unconstrained approach and WorldPop-Global spatial covariates.
Simulating a "true" 2016 population geo-located to household latitude-longitude points To simulate a realistic population in Khomas, Namibia, we used all of the same population inputs and spatial auxiliary datasets as Thomson and colleagues (2018) [36]. Broadly, this involved the creation of three datasets (modelled surfaces of household types, manually digitised building point locations, and synthetic (simulated) households), then assigned synthetic households to point locations based on the household type probability surfaces. Household types were defined from the Namibia 2013 Demographic and Health Survey (DHS) data using k-means analysis with variables that were also present in the Namibia 2011 census (e.g., improved sanitation facilities, gender of head of household). Next, probability surfaces of these household types were created using a Random Forrest model and spatial covariates to interpolate the likelihood of a given household type across Namibia between DHS survey locations [36]. The probability surfaces of "urban poor" and "urban non-poor" household types were manually adjusted due to high misclassification. Adjustments were made by manually assigning the proportion of households in each census enumeration area (EA) that appeared to be located in areas of small disorganised buildings based on visual inspection of 30m Quickbird imagery. Separately, we modelled a synthetic population of individuals nested within households across Khomas from Namibia 2011 census microdata using an iterative proportional fitting model and conditional annealing [42]. A third set of data, building point locations, were manually digitised from 2014-2016 30cm Quickbird imagery in ArcGIS 10. To link households with building locations, we calculated the most likely household type of each synthetic household using k-means analysis scores, and iteratively assigned households to building point locations based on the probability of each household type at a given building point. Finally, using the manually classified EAs (with our estimated portion of urban poor households), we classified all urban households as being located in either a slum or non-slum area. All of these steps are detailed in Supplement 1 and the paper by Thomson and colleagues (2018) [36]. This simulated population is meant to represent a realistic "true" reference population for 2016.

Simulating realistic outdatedness of Khomas census population
To simulate population outdatedness in Khomas, we imported the above 2016 "true" population household

Simulating realistic levels of under-count inaccuracy in censuses
To identify realistic levels of under-counts among urban-slum, urban-non-slum, and rural populations in LMIC censuses, we reviewed the scientific and grey literature.  Based on these findings, we simulated three levels of census inaccuracy: low inaccuracy was considered to be missing 2% of rural and urban-non-slum households, and 10% of urban-slum households; medium inaccuracy was considered to be missing 5% of rural and urban-non-slum households, and 30% of urbanslum households; and finally, high inaccuracy was classified as missing 10% of rural and urban-non-slum households, and 60% of urban-slum households (Table 1). We applied the inaccuracy rates at random within rural, urban-slum, and urban-non-slum households such that there was no spatial pattern inherent to the simulated under-counts. This exercise resulted in one "true" and 15 simulated outdated-inaccurate populations to generate realistic gridded population datasets that reflect typical gridded population estimates currently available across LMICs (Table 2).

Simulating realistic gridded population datasets
To simulate realistic gridded population datasets, we aggregated each of the simulated household populations to EA or constituency (second-level administrative unit) boundaries, and applied the WorldPop-Global-unconstrained modelling technique (for a total of 32 datasets). We applied the WorldPop-Globalunconstrained model in three phases as described in their method publication [19] ( Figure 5, Table 3). In the first phase (A), a non-parametric Random Forest ensemble machine-learning algorithm grows a "forest" of decision trees for each input unit (EA or constituency) [65]. Each Random Forest tree is a model of the potential relationships between multiple auxiliary covariates and census population counts. In the Random Forest modelling workflow, this is where model uncertainty is calculated -at the scale of the input population areal unit. In the second phase (B), all of the covariates are prepared in 100x100m cells. In this phase, the split values of each classification tree developed in phase A are used to parameterize corresponding regression models to predict population density within 100x100m cells [19]. For each cell, the predicted population values from all regression models are averaged to make a single population estimate, though these population estimates are not pycnophylactic, meaning that estimates in cells do not necessarily sum to the original areal unit population. Thus the WorldPop-Global-unconstrained workflow involves a third phase (C) outside of the Random Forest model to normalize cell-level predicted population densities to preserve census input population counts [19].

Analysing cell-level accuracy
To empirically measure cell-level accuracy of the 32 gridded population datasets, we compared each celllevel estimate against the "true" 2016 population count in that cell, then calculated root mean square error (RMSE), a measure of error magnitude that penalises large errors. This was performed on 100x100m cells, and then cells were aggregated and assessed for accuracy at 200x200m, 300x300m, and so on up to 1x1km.
To compare RMSE across cells of different geographic sizes, we normalised the statistic by area to represent RMSE per hectare (100x100m unit). We also evaluated RMSE in urban versus rural cells separately. In the calculation of RMSE, is the "true" population count in cell , ̂ is the gridded population estimate in cell , and is the number of grid cells.
To better understand the mechanics of the WorldPop-Global model and workflow, two additional statistics were calculated. Bias, a measure of error direction and magnitude, was calculated for the two gridded population datasets derived from "true" population counts to tease out accuracy related to the model and covariate datasets alone. Bias reveals whether cell-level estimates are systematically under-or overestimated. As above, bias was assessed in 100x100m cells as well as cell sizes that ranged up to 1x1km, and separately in urban versus rural areas. (2) To assess the degree to which WorldPop-Global's non-zero population estimates resulted in misallocation of population, a third analysis was performed counting the entire modelled population in Khomas that was misallocated to cells which were unsettled according to the "true" population. Millions of near-zero cell-level estimates in Khomas located outside of Windhoek severely skewed these statistics and made them appear artificially accurate, therefore gridded population cell-level estimates of less than 1 person were excluded (see Supplement 4 for a visual).

Results
RMSE in grid cells did not differ substantially across the simulated outdated-inaccurate census scenarios ( Figures 6). Furthermore, errors only slightly decreased when the input data were aggregated to EA rather than constituency. The major driver of RMSE in cells was urban location; errors in urban cells were substantially larger than in rural cells ( Figure 6). In urban areas, RMSE per hectare was lowest in 100x100m cells ranging from 25 to 35, while in rural areas, RMSE per hectare was lowest in cells 300x300m to 500x500m with an approximate value of 2 ( Figure 6). Results for select scenarios are presented in Figure 6 ranging from the "true" 2016 population to the most outdated (2001) and inaccurate (missing 10% to 60%) population, though tables of all scenario results are provided in Supplement 5.
Assessment of bias in the two gridded population datasets derived from "true" 2016 population counts revealed systematic and substantial under-estimates of populations in urban cells, and less dramatic overestimates of populations in rural cells (Table 4). For example, the average 300x300m urban cell underestimated the population by more than 200 people, while the average 300x300m rural cell was overestimated by just 3 (constituency-level input) to 14 (EA-level input) people (Table 4).   Table 5 summarises the percent of the estimated population misallocated to "true" unsettled cells. For this analysis, no cells in the estimated population were excluded. Roughly 20% (EA-level input) or 10% (constituency-level input) of the population was misallocated to unsettled 100x100m cells (Table 5). However, as cells were aggregated, the percent of misallocated population dropped precipitously. For example, at 300x300m, approximately 2% (EA-level input) or 1% (constituency-level input) of Khomas's population was misallocated to unsettled cells. This indicated that most of the population was disaggregated within, or near to, settlements. The rates of misallocation were similar across grid cell sizes when cells with less than one person were excluded (not reported).

Discussion
This is among the first accuracy assessments of a gridded population model at the grid-cell level, and the first that we know of in a LMIC setting. By developing a simulated realistic population and several scenarios of the population with realistic levels of outdatedness and inaccuracy, we were able to evaluate the accuracy of a gridded population modelling approach, as well as assess the impact of outdated-inaccurate census inputs on estimates. In this paper, we evaluated just one of several gridded population models -WorldPop-Globalunconstrained -however, a simulated realistic population could be used to assess the accuracy of other gridded population datasets. In the WorldPop-Global-unconstrained model of Khomas, Namibia, cell-level inaccuracies in urban versus rural areas dominated the results. There was limited evidence in this context that outdated or inaccurate census data played a major role in cell-level inaccuracy of gridded population estimates. Here we address three potential sources of the cell-level inaccuracies observed.
The first issue is specific to the modelling approach of WorldPop-Global-unconstrained relating to the use of a log-transformation on input administrative population counts before use of the Random Forest model. While this procedure ensures that population counts are normally distributed during modelling, it also means that cells with zero population are assigned a very small fraction of a person when log-population is untransformed [19]. A concern is that non-zero population estimates across millions of unsettled cells might add up to a sizable portion of the population being misallocated. Our analysis of misallocation, however, indicates that this phenomenon played only a minor role in cell-level inaccuracies. Table 5 demonstrates that even in this context of vast unsettled areas, only a small portion of Khomas's population was misallocated to cells far from actual settlements. Nearly all of the population was estimated to be in cells within 200 to 300 metres of the "true" population. The second potential source of inaccuracy relates to covariate resolution and the relationship of covariates with population density. This issue seems to have contributed more substantially to errors in this analysis, particularly within the city of Windhoek. A number of the Random Forest model covariates, such a land cover type and night-time lights, had an original resolution substantially larger than 100x100m which could have resulted in a "halo" effect around settlements, causing populations to be disaggregated to cells near a settlement, but not directly over it. Table 5 provides evidence of this; the accuracy of the estimated population distribution, and correct allocation of population to settled cells, both performed well when the estimated population was aggregated to 300x300m or larger. Other covariates, such as distance to roads and intersection locations were available at very fine spatial resolution and thus were precise at the 100x100m scale. Although they are good indicators of a settlement, they are not necessarily good indicators of higher or lower population density within a settlement. The lack of fine-scale covariates associated with population density within cities and towns likely explains a portion of the cell-level error observed in Khomas's urban population. Other issues that might further decrease local spatial accuracy are temporal miss-match of covariates [13]  . Building footprints are likely associated with population density within settlements and have a finer spatial resolution than 100x100m, making it a potentially powerful covariate to differentiate low and high population density within urban areas in any gridded population model. The WorldPop team, among other gridded population producers, is currently working to test and incorporate building footprint datasets into gridded population models.
The third potential source of cell-level inaccuracies is use of average population densities from large administrative units to estimate population density in much smaller grid cells. This is known as the ecological fallacy [89], and probably played the largest role in cell-level inaccuracies, especially within urban areas. Population densities are used by the Random Forest model to establish relationships between covariates and population density (total population divided by total area), not population totals. Even with perfect covariates and exclusion of unsettled areas, this would mean that cells with high "true" population counts are likely to be severely underestimated because the geographic size of input population units are larger (and population densities are smaller) than the output grid cells. When this happens, population counts that are not allocated to the densest cells will instead be allocated to other less dense cells in the same input areal unit. Table 4 provides strong evidence of this issue with the population in urban cells systematically underestimated regardless of cell size.
Most top-down gridded population datasets use averaged population densities from input areal units in some way to populate smaller grid cells, and are subject to similar error. The High Resolution Settlement Layer (HRSL), for example, uses uniform areal disaggregation of the population from input units (e.g., EA) to 30x30m grid cells which contain a building footprint [18], and the Global Human Settlement GHS-POP dataset takes a similar approach disaggregating input populations into 250x250m cells that are classified as settled [16,17]. The problem of aggregate averages is accentuated when input areal units are geographically large because average densities tend to be smaller across larger spaces. In these cases, WorldPop-Global will incorporate training data from a neighbouring country that has finer-scale input population counts [19]. Our analysis showed, however, that even when relatively small geographic units (census EAs) were used as the input population area unit, urban cell-level errors were still substantial, and cell-level accuracy was only marginally improved with EA-level inputs compared to constituency-level inputs (Figure 7). This suggests that much finer-scale training data (e.g., 100x100m) should be incorporated into models, particularly from highdensity urban areas, to ensure that the WorldPop Random Forest model contains sufficiently large population density values to assign to cells. Fine-scale training datasets might be simulated, like the dataset that we created here, or they might come from existing geo-referenced household survey enumerations This analysis of WorldPop-Global-unconstrained data raises broader questions about the cell-level accuracy of all gridded population estimates in urban areas, especially the densest parts of cities such as in slums, informal settlements, and neighbourhoods with high-rise apartment buildings [91-93]. New datasets derived from very high resolution satellite imagery, in particular building footprints, are a promising new covariate to reduce the "halo" effect of populations misallocated nearby, but not directly over, the highest density cells. More work will be needed to improve building footprint datasets by distinguishing residential and nonresidential buildings to avoid population being misallocated to business districts, factories, universities, airports, and other non-residential cells. These two steps -use of building footprint covariates and finerscale training data -stand to improve cell-level accuracy of all top-down and bottom-up gridded population The simulation in Khomas, Namibia followed the same steps outlined by Thomson and colleagues (2018) 1 for a simulated population in Oshikoto, Namibia: (1) Use of a supervised clustering k-means algorithm to define realistic and distinct types of households in Khomas, Namibia based on eight variables in the 2013 Demographic and Health Survey (DHS) ( Table  S1.1, A) that were also present in a 20% census microdata sample (Table S1.1, B): urban, improved toilet, improved water source, sufficient sleeping space, durable structure, non-solid fuel for cooking, whether the head of household had any formal education, and whether there were any children under age five. A dendrogram showing the Euclidean distance between each pair of child clusters and their parent cluster in the k-means analysis indicated a sensible cut-off value of 1.0 to define four easy-tointerpret household types: urban poor, urban non-poor, rural poor, rural non-poor ( Figure S1.1).  Annual net primary productivity (3) In step 3, we calculated the main type of household in each 2013 DHS primary sampling unit (PSU) (550 nationally) based on k-means groups defined in Khomas (step 1), and joined the 2km averaged auxiliary data values (step 2) to each PSU point. The distribution of PSU main household type across Namibia was: 185 (34%) urban non-poor, 82 (15%) urban poor, 249 (45%) rural poor, and 34 (6%) rural non-poor. We used these 550 PSU household types as training data, and the average 2km covariate values in a Random Forest machine classification model to predict a probability surface for each household type in each 100x100m cell in Namibia. This model performed well for urban non-poor households (14.6% misclassification) and rural poor households (7.6% misclassification), though classification error was high in areas comprised of mostly urban poor households (58.5% misclassification) and rural non-poor households (76.5% misclassification) (Table S1.2). Errors within urban areas were expected because auxiliary data 2km buffers can mask disparities between neighbourhoods. Although expected, poor performance of the model for urban poor households was problematic and addressed in the next step. Misclassification of rural non-poor households was also not surprising given the small size of this population, though this problem was ignored because non-poor rural households comprised a very small portion of the population in Khomas (<1%).  (4) To improve the accuracy of the urban household probability layers in Khomas, we created an urban poor/non-poor weights layer by manually assigning each census EA with the portion of population that appeared to be located in a slum or informal settlement in 2016 based on visual inspection of 30cm Quickbird satellite imagery. Before beginning this process, we split large EAs at the periphery of Windhoek to create new EAs for areas that had undergone urban expansion since the 2011 census boundaries were drawn (total of 922 EAs). Rural EAs had a null probability in this step. The poor/nonpoor weights layers were multiplied by the predicted household probability surfaces (step 3) to produce final 100x100m household probability surfaces ( Figure S1.2).
Figure S1.2. Household type probability surfaces (steps 1-4) in Khomas, Namibia population simulation (5) In step 5, we manually digitized building locations across Khomas using 2014-2016 high-resolution (30cm) Quickbird imagery in ArcGIS 10. Subjective judgement was required; for example, deciding not to digitize some buildings on main streets in densely populated areas where shops and offices seemed likely. In areas of dense settlement, some points were duplicated to represent more than one household in the same building. A total of 97,667 household points were digitized in 2016. As a benchmark, we exported points to Google Earth and used 2011 Maxar and SPOT (40cm) imagery to identify buildings that were missing in 2011, and ensured that the reduced number of points matched constituency household counts in the 2011 census (Table S1.1, C).
(6) In step 6, we simulated a population of realistic households in Khomas using iterative proportional fitting (IPF) with combinatorial optimisation in the R simPop package 2 (Table S1.3). IPF starts by defining a basic household structure to ensure the synthetic population is realistic. We defined household structure with household size, urban/rural residence, and age and sex of household head at the household-level; and age, sex, and relationship (to head) at the individual-level. Inputs to the model were the 2011 Census 20% microdata sample, as well as urban and rural household sizes, and constituency population by age, sex, and relationship based on the 2011 census report (Table S1.1, C). The IPF model selects random samples of records from the microdata with replacement until each of the household structure targets per constituency are met. Next, using the R simPop package, we added household and individual characteristics present in the 20% microdata census dataset (toilet, water, structure, space, fuel, education) to the simulated dataset using a multinomial logistic regression technique and conditional annealing (Table S1.4Error! Reference source not f ound.). This treated age, sex, relationship, household size, and urban/rural residence as predictors, and each of the household characteristic as a conditional outcome. We confirmed that there were not major differences between the distributions of characteristics in the 20% microdata and simulated dataset (all differences were less than +/-0.002). Confident that the simulated household and individual characteristics were realistic, we calculated the most likely household type for each household based on variable factor weights created in the k-means analysis in step 1.
The 2011 census microdata sample was provided with a weight of approximately five for each observation to scale the 20% microdata sample to the total population in 2011. We calibrated the simulation to create an extra 20% of households to ensure there were enough simulated households to assign to 2016 point locations; left over simulated households were discarded in step 7. This resulted in 122,079 simulated households in Khomas before assignment to point locations.
(7) In step 7, we joined the re-weighted household type probabilities created in step 4 to the household latitude-longitude coordinates created in step 5. For each latitude-longitude coordinate created for 2016 household point locations, we randomly sampled a simulated household created in step 6 from the corresponding constituency and urban/rural strata based on the probabilities of household types at each coordinate. We repeated assignment of simulated households to coordinate point locations until all coordinates were assigned a simulated household, and then discarded the extra unassigned simulated households for a total of 97,667 simulated households located at realistic coordinate locations in Khomas for 2016.
(8) In step 8, we used the 2013 DHS records in Khomas (n=931 households) to develop multinomial models in R to simulate the same three individual and household outcomes as Thomson and colleagues (2018): household wealth quintile (five ordinal categories), woman's use of modern contraception (binary in women age 15 to 49), and child's receipt of 3 rd DPT vaccination (binary in children under five) (Table  S1.5). We used a multinomial model to calculate associations between each outcome and householdlevel covariates in the 2013 DHS dataset, and applied coefficients to the simulated dataset to predict wealth quintile, modern contraceptive use, and receipt of 3 rd DTP vaccine for each household, woman 15 to 49, and child under five, respectively. Note: *p<0.1; **p<0.05; ***p<0.01 (9) To check the realism of this dataset, we compared the distribution of simulated household and individual outcomes (summarised by census enumeration areas -EAs) to households and individuals measured in the 2013 DHS (summarised by primary sampling units -PSUs) in Figure S1.3. The distribution of household characteristics appeared to be consistent between the simulated and DHS populations. However, individual characteristics were less consistent, and more heaped around the mean in the simulated dataset ( Figure S1.3). This may have occurred because there were more observations per unit (EA vs PSU) in the simulated dataset, and more census units (922 EAs) compared to the 2013 DHS dataset (53 PSUs). Due to these inconsistencies, we only report household-level outcomes in the simulated dataset.