A Spatial SEIR Model for COVID-19 in South Africa

The virus SARS-CoV-2 has prompted numerous modelling approaches, developed rapidly, to understand the spread of the disease COVID-19 and to plan for future interventions. Herein, we present an SEIR model with a spatial spread component as well as four infectious compartments to account for the variety of symptom levels and transmission rates. The model takes into account the pattern of spatial vulnerability in South Africa through a vulnerability index that is based on socioeconomic and health susceptibility characteristics. Another spatially relevant factor in this context is the level of mobility throughout the country. The thesis of this study is that without contextual spatial spread modelling, the heterogeneity in COVID-19 prevalence in the South African setting would not be captured. The model is illustrated on South African COVID-19 case counts and hospitalisations.


Introduction
South Africa is a large, diverse country with marked income inequality and differences in access to adequate housing, basic municipal services, transportation and medical care. Many of those affected by poverty also have increased morbidity risks due to TB and HIV. These factors contribute to spatially diverse levels of vulnerability to the COVID-19 pandemic, which will result in limited accuracy if not taken into account when modelling.
At the onset of the COVID-19 pandemic, it remained uncertain how the spread of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) would affect the South African healthcare system, particularly the public system. The ability of the government to provide quality health outcomes and to source necessary equipment was in the spotlight and dominated news discussions. The South African government proclaimed COVID-19 a national disaster on the 15th of March 2020 and thereafter, on the 26th of March, placed the country into a national lockdown, which halted most economic activity and restricted personal movement by permitting only movement associated with essential activity. This most stringent alert level, which later became known as Level 5, was intended to contain the spread of SARS-CoV-2 and to adequately prepare the state, particularly the healthcare system, for the epidemic.
Modelling initiatives to estimate key quantities, such as the potential magnitude of infections across the population and possible hospitalisation requirements, became critical as a means to assist government preparations. Hospitalisation requirements were a key component that would assist in evaluating whether the healthcare system would be able to cope with the cases needing hospital care and in identifying where the impact would be most severe. Assessing these at different spatial scales was essential, as the country's hospital bed capacities and relevant resources vary across administrative boundaries. Most of the hospital-related care planning for COVID-19 occurred at provincial level. This planning included decisions on building new field hospitals or refurbishing existing infrastructure and on identifying quarantine sites.
The Susceptible-Exposed-Infectious-Recovered (SEIR) compartmental model is a well-known epidemiological model, developed by Kermack and McKendrick (28), for predicting the spread of a disease. As with most compartmental models, the underlying assumption is that each person in the model is equally likely to interact with any other person, referred to as 'homogeneous mixing'. While this assumption is untrue for almost all cases of disease spread, it was particularly important to relax it when assessing COVID-19 interventions, as many of those proposed involved creating spatial constraints on movement. In South Africa these interactions are spatially constrained, and this must be accounted for. In (22) it is shown that when an epidemic outbreak exhibits spatial structure, for instance delays in outbreaks between spatial units (as observed for Ebola, for example), this structure should be taken into account, as ignoring it may lead to incorrect estimates of the reproductive number in the spatial units. Moreover, in South Africa, due to high inequality and a history of spatial segregation, there are large differences in the capacity of different segments of society to self-isolate, and therefore in their exposure to the disease. These societal characteristics make modelling the pandemic at national or provincial scale unsatisfactory.
South Africa has deep-rooted inequalities causing stark differences in the quality of, and access of communities to, basic and critical services such as healthcare, running water, sanitation, housing and social amenities. South Africa's current Gini coefficient of 0.65 (51) highlights these inequalities and shows the country to be amongst the most unequal globally with regard to income distribution. Wealth inequality is even starker, with an estimated 10% of the population sharing close to 95% of all wealth (51). With the onset of the COVID-19 pandemic in South Africa, this high level of inequality raised concerns amongst global leaders in the health sector about the detrimental effects that the pandemic might have on society and the economy at large, specifically on those most vulnerable.
Aside from the differences in income and wealth distribution, the differences in housing conditions and access to basic services such as water and sanitation raised critical concerns about the transmission potential of the virus, and whether this would be skewed to higher densities in informal settlements with limited access to basic services. The health susceptibility of communities was also raised with regards to the stark spatial differences in access to quality health services and the underlying comorbidities present within communities.
Including mobility in the model is important in South Africa, even under varying lockdown approaches to the pandemic. The mobility data available for this research show significant mobility still occurring under lockdown, due to very limited access to daily needs such as food and income. A significant proportion of the South African population lives day-to-day and does not have savings or food stores to rely on. Earlier in 2020 the Al Jazeera news network brought to light an issue relevant to all developing countries when it published an article titled "In Africa, social distancing is a privilege few can afford" [1]. This statement makes it clear that many individuals who work and live in African countries do not have the economic means to self-isolate. South Africa is one such country, where the vast majority of the population lives under the poverty line (19). Certain challenges that impoverished South Africans have to contend with can increase their risk of COVID-19 infection. Given this fact, it stands to reason that some areas will have higher transmission rates than others, thus necessitating the inclusion of a spatial element in models of COVID-19 in South Africa. This paper introduces a methodology for a spatial SEIR model for COVID-19 in South Africa. The spatially diverse levels of risk at ward level [2] are incorporated using a South African vulnerability index as well as mobility data obtained from cellular phone records of people's movement between wards. We add spatial elements to the compartmental epidemiology model. We advocate that any model for the spread of COVID-19 in South Africa should make use of a spatial element due to the societal structure within this developing country.
Forecasting cases in South Africa using a single SEIR model at a national level does not allow one to take the spatial diversity present into account or to respond effectively from a health care point of view. The aim, therefore, of creating the spatially modified SEIR model for South Africa is to investigate whether forecasts can be improved at a local level, firstly by tracking and predicting the spatial spread of the infection using the spatial location of cases, and secondly by bringing in factors specific to each area, such as vulnerability and population mobility.
[1] https://www.aljazeera.com/opinions/2020/3/22/in-africa-social-distancing-is-a-privilege-few-canafford [accessed on 31 March 2021]
[2] A ward is defined in South Africa as an administrative area into which larger municipalities are subdivided. http://www.statssa.gov.za/census/census_2001/geo_metadata/geography_metadata.pdf
Other solutions to the homogeneous mixing problem have included running individual-based simulation models. The solution proposed here is less computationally intensive but still, we argue, provides the necessary level of spatial detail for fighting an epidemic. This paper does not claim to provide a model that improves over all other modelling approaches of the COVID-19 pandemic. The aim is rather to illustrate that the proposed model captures the spatial heterogeneity inherent in the nature of the spread of the virus, or in fact any other similar disease, and to demonstrate the importance of incorporating these effects.
The paper proceeds with a literature review in Section 2, and follows with the methodology in Section 3. Section 4 discusses the implementation of the proposed spatial model, Section 5 presents the results, Section 6 provides a critical discussion and Section 7 concludes.

Literature Review
The use of SEIR type models in disease modelling is common, but in their simplest form these models assume homogeneous mixing of the population. Alternatives include stratified models. Stratified models with contact matrices between population strata are used to account for different populations in large areas and movement between areas. One of the first models to introduce mobility between spatial units via contact matrices is (48). Generally the methods using contact matrices are applied when a small number of spatial units are being considered. The homogeneity assumption is relaxed by varying the contact rate between strata, commonly taken as age groups. Rost et al (45) follow the contact matrix approach in incorporating spatial dependency into the model for COVID-19 in Hungary. Mobile phone geolocation data has also been used to inform compartmental models. Peixoto et al (42) estimate probabilities of movement between cities which is used to adjust the infected equation in a simple SI model. A more elaborate stochastic approach for estimating the mobility terms in the differential equations is followed in (3).
The South African healthcare system consists of both a public and a private system. These are characterised by disproportionate spending on healthcare, medical care infrastructure, equipment and supplies, and by differing doctor-to-patient ratios and quality of healthcare. The majority (over 80%) of South Africans rely on public healthcare, while the healthcare requirements of the remaining population (less than 20%) are covered by private health insurance (32; 41). The gap between these two healthcare sectors is widened by disparities in the distribution of medical practitioners, with the public health system being under-resourced and overstretched in comparison to the private health system (24; 32). According to the latest edition of the South African Health Review, South Africa has one of the lowest doctor-to-population ratios, at 0.9 doctors per 1 000 people (24), compared to the ratio of 2.5 medical staff per 1 000 people recommended by the World Health Organization [3]. In the South African private healthcare sector this ratio is believed to be higher (24), although existing estimates lack consensus due to a lack of data, making it difficult to arrive at a widely accepted estimate. The latest available estimates suggest a ratio of around 1.75 doctors per 1 000 people in the private sector (12). A higher doctor-to-population ratio is important for improved health outcomes (6). Other public health challenges include poor governance and management as well as the burden of diseases including HIV and TB. Approximately 7.5 million people were estimated to be living with HIV across all ages by the end of 2019 (21) and, according to the National Institute for Communicable Diseases (NICD) surveillance report, around 4.3 million of them are on anti-retroviral treatment.
Meanwhile, TB is estimated to be responsible for the ill health of around 320 000 people annually and is a leading cause of death, killing about 80 000 people every year [4]. All these factors have continued to influence the quality of healthcare, particularly that of the public sector.
Urban transport in South Africa exhibits a large degree of spatial heterogeneity. This is largely due to the legacy of Apartheid, during which city planning was geared towards restricting rather than facilitating access (23). Formal public transport has not proved sufficient to meet the needs of all lower-income South Africans. This has led to the development of paratransit systems, namely the minibus-taxi transport sector (27). Paratransit systems are not formal (15) and arise ad hoc to fulfil local transport needs (27). In line with the pre-existing patterns of city development, South African paratransit systems transport people from low-income residential areas such as informal settlements, generally located on the fringes of cities, to city centres providing work opportunities (13; 59), while middle-to-upper-class residents travel by private car (23; 60). While research has been done on formalising minibus-taxi transport (27; 49; 10), the current situation persists, and there is a difference in the form of transport used by residents of low- versus middle-to-high-income areas. Since minibus-taxi drivers earn income per passenger (15), minibus-taxis tend to be more densely packed than private cars. This has implications for the risk of COVID-19 infection. Formal public transport, such as buses and trains, also harbours risks in terms of the proximity and contact of passengers. People living in areas serviced by public transport, generally outlying low-income areas (13; 60), therefore have a higher transport-associated risk than those living in higher-income areas, such as suburbs and estates. This increases spatial heterogeneity in the risk of COVID-19 infection across South African cities.
The inherent poverty in communities also affects disease transmission. While some studies conclude that post-Apartheid South Africa has seen an improvement with regards to the intensity of poverty (20), the country still faces a significant challenge regarding poverty and unequal distribution of resources (19), and poverty as well as general inequality vary spatially at a municipality level (14). In particular, areas that historically are known to have experienced higher intensities of poverty and inequality are still experiencing such issues (14). These areas are known as informal settlements, the South African term for what could generally be referred to using the umbrella term "slum" internationally. The precise definition of a slum can vary by source and is very dependent on the relevant context (50; 16). The United Nations (UN) defines a slum household as "A group of individuals that live under the same roof that lack one or more of the following conditions; access to improved water, access to improved sanitation, sufficient living space, durability of housing and secure tenure" (16). As per this definition, individuals who live in slums do not enjoy much free space and live in small dwellings where it is not necessarily possible to isolate themselves from other family members should they become infected with COVID-19 (16). Furthermore, the dwellings in these slums are so densely packed that the risk of infecting a neighbour is also very high. We thus expect the rate of transmission in such areas to be higher than more formally established settlements. Estimating the transmission risk in such areas poses a significant challenge since their actual population size may be unknown (16).
Upon developing symptoms of COVID-19 an individual should proceed to medical services to receive testing and potentially treatment. However, individuals have been shown to seek medical assistance with varying degrees of urgency, with economic status and location of residency often being key factors (31; 5). Individuals often choose not to seek medical attention due to the fact that they (in their opinion) do not feel sick enough to justify such a venture, most likely due to the physical distance that needs to be travelled (31; 5), linked also to transport challenges discussed earlier. This suggests that individuals who develop mild symptoms and live further away will not seek medical attention and will remain undetected, thus leading to more potentially undetected cases. Other reasons include additional costs associated with certain medical procedures, poor quality of medical services and even the potential stigma that comes from being suspected or confirmed to be infected with COVID-19 (31; 5; 11; 9). Once again these factors tend to affect poverty-stricken individuals more severely than their wealthier counterparts and thus their effect will vary spatially (31; 11).

Materials and Methods
The SEIR model is an epidemiological model used for predicting the spread of a disease. It consists of four main compartments, namely: Susceptible (S), Exposed (E), Infected (I) and Recovered (R) (28). In order to overcome the underlying assumption of 'homogeneous mixing' at a national level and to bring in localised factors, as discussed in Section 1, we add a spatial component to a standard compartmental model to simulate the spread of COVID-19 through the 4392 different municipal wards of South Africa.
For the case of South Africa, in each ward the course of the infection was modelled at a daily time-step with an SEIR compartment model parameterised to simulate the epidemiology of COVID-19. At each time-step, before the next SEIR iteration, the movement of infected people was simulated between wards based on movement data from cell phone companies. Population size of the wards ranges from 293 people to >100 000 people and within the wards people were assumed to mix homogeneously. The choice of spatial unit is important and in practice involves some compromise between granularity and computational cost. At one extreme each individual could be located in their own spatial unit, essentially mimicking an agent-based model, and at the other extreme we have the standard model with only one spatial unit.
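The infectious compartment was split into four classes, namely asymptomatic ($I_1$), mild ($I_2$), severe ($I_3$) and critical ($I_4$), and the asymptomatic class was assumed to be $\rho$ times as infectious as all other classes.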
The COVID-19 Vulnerability Index of the population is a composite indicator developed in the early mitigation and prevention phase of the South African government's response (29). In the early disaster management response phase, several sector departments required similar information with regard to the location and characteristics of highly vulnerable communities. The indicator (29) was developed to facilitate a coordinated response by several government sectors with regard to prioritising intervention areas for water provision, sanitation upgrading, social interventions and community risk awareness. The indicator provides a spatial overview of communities that are highly vulnerable to COVID-19 based firstly on how effectively the spread of COVID-19 can be contained (the transmission potential), and secondly on the population's susceptibility to severe disease associated with contracting COVID-19 (the health susceptibility). The transmission potential highlighted areas and communities that would struggle to apply the basic principles of social distancing, hand washing and good basic hygiene by including spatial information on informal settlement areas, communities with a lack of access to basic services and areas of high population density. The health susceptibility of individuals was added to account for older populations and populations with a higher disease burden and inadequate access to medical care, by including data on age cohorts, comorbidities present and poverty levels (as a proxy for health care and the access thereof). About 93% of South African households have access to improved drinking water sources (piped water inside and outside the dwelling) (33) and nearly 80% of households have access to safely managed sanitation services (52). Figure 2 provides an illustration of the spatial vulnerability index across South Africa.
In order to use this COVID-19 Vulnerability Index in the model, it is normalised across all the wards and then scaled to have a range of 0.4 centred around 1, resulting in values from 0.8 to 1.2. It was applied by multiplying the national baseline $R_0$ by this factor to create an $R_{0i}$ for each ward $i = 1, 2, \ldots, n$. For highly vulnerable wards the $R_0$ would increase, since their scaled vulnerability factor would be greater than 1, and for the less vulnerable wards the $R_0$ would reduce. The value of the ward-level modelling is that key parameters could be varied per ward based on the economic and social vulnerability of the people as well as the interventions implemented (see Table 2 for the list of interventions and scaling factors). In addition, the proportions $p_{2i}$, $p_{3i}$, $p_{4i}$ of exposed people moving into the mild, severe and critical infected classes were adjusted based on the age structure within each ward. Figure 3 illustrates the age profile spatially across South Africa. There is a clear indication of a younger population.
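The vulnerability scaling described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes that "normalised" means min-max normalisation, and the function names, the raw index values and the baseline value of 2.5 are hypothetical placeholders.

```python
def scale_vulnerability(index_values, low=0.8, high=1.2):
    """Min-max normalise raw index values, then map them onto [low, high].

    Assumption: "normalised" is read as min-max normalisation across wards.
    """
    lo, hi = min(index_values), max(index_values)
    if hi == lo:
        # All wards equally vulnerable: use the midpoint (1.0 for 0.8..1.2).
        return [(low + high) / 2.0 for _ in index_values]
    return [low + (high - low) * (v - lo) / (hi - lo) for v in index_values]


def ward_r0(baseline_r0, factors):
    """Multiply the national baseline R0 by each ward's scaled factor."""
    return [baseline_r0 * f for f in factors]


# Hypothetical raw index values for five wards (illustrative only).
raw_index = [2.0, 3.5, 5.0, 6.5, 8.0]
factors = scale_vulnerability(raw_index)

# 2.5 is a placeholder baseline R0, not the value used in the paper.
r0_by_ward = ward_r0(2.5, factors)
```

The least vulnerable ward receives a factor of 0.8 and the most vulnerable a factor of 1.2, so a ward in the middle of the index range keeps the national baseline unchanged.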
The incubation period ($\sigma^{-1}$) and the infectious period for each infectious compartment ($\gamma_1^{-1}$, $\gamma_2^{-1}$, $\gamma_3^{-1}$, $\gamma_4^{-1}$) were kept the same across geographic wards. Taking $N_i = S_i + E_i + I_{1i} + I_{2i} + I_{3i} + I_{4i} + R_i$ as the population in ward $i$, with $R_{0i}$ as the ward-level $R_0$ and taking the infection rate, $\beta_i$, for that ward as $\beta_i = R_{0i}\gamma_4$, the flow between the main compartments in the model is implemented within each geographic ward $i$ using the set of differential equations (1)-(7).
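Equations (1)-(7) are not reproduced in this text, so the following Python sketch shows only one plausible within-ward update that is consistent with the compartments and parameters defined above. Several choices are assumptions for illustration: the forward-Euler daily step, the force of infection that weights the asymptomatic class by $\rho$, the asymptomatic share taken as the remainder after the mild, severe and critical shares, and all numerical values.

```python
def seir_step(state, beta, sigma, gammas, props, rho, dt=1.0):
    """Advance one ward by one forward-Euler step of length dt (in days).

    state  : dict with keys S, E, I1, I2, I3, I4, R (people per compartment)
    beta   : ward-level infection rate
    sigma  : rate of leaving the exposed class (inverse incubation period)
    gammas : recovery rates for the four infectious classes
    props  : shares of exposed people entering I1..I4; assumed to sum to 1
    rho    : relative infectiousness of the asymptomatic class I1
    Returns the next state and the number of newly exposed people.
    """
    n = sum(state.values())
    infectious = rho * state["I1"] + state["I2"] + state["I3"] + state["I4"]
    new_exposed = beta * state["S"] * infectious / n * dt
    leaving_e = sigma * state["E"] * dt

    nxt = dict(state)
    nxt["S"] -= new_exposed
    nxt["E"] += new_exposed - leaving_e
    for k, (gamma, p) in enumerate(zip(gammas, props), start=1):
        recovered = gamma * state[f"I{k}"] * dt
        nxt[f"I{k}"] += p * leaving_e - recovered
        nxt["R"] += recovered
    return nxt, new_exposed


# Illustrative values only; none are taken from Table 2.
state = {"S": 990.0, "E": 5.0, "I1": 2.0, "I2": 2.0, "I3": 1.0, "I4": 0.0, "R": 0.0}
nxt, new_e = seir_step(
    state,
    beta=0.5,
    sigma=0.2,
    gammas=(0.2, 0.2, 0.1, 0.1),
    props=(0.5, 0.4, 0.08, 0.02),
    rho=0.5,
)
```

Because every outflow from one compartment is an inflow to another, the ward population is conserved at each step, matching the fixed-population assumption of the model.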
Details of the final parameters chosen for this model, and justification based on the literature, are presented in Table 2. Since the model was initially run in a real-time context rather than a post-pandemic context, only sources that were available up to June 2020 were used to calibrate the model. South African sources of parameter values were predominantly used since these had either been calculated on South African data or decided on by experts from the South African COVID-19 Modelling Consortium (SACMC) [5]. There is a large variation of possible values for $R_0$ (the number of secondary infections produced by one primary infection in a fully susceptible population) in the literature, and since the model is also highly sensitive to $R_0$, choosing an appropriate value is difficult. Herein $R_0$ was chosen as the mean value from a systematic review of 81 papers from January to July 2020 which estimated $R_0$ values for the beginning of the pandemic, i.e. prior to the implementation of non-pharmaceutical interventions (55).
The movement between wards is approximated by daily aggregate movements of the people based in each ward, derived from cellphone location data. An individual is considered to be based in a geographic ward if their cellphone is located there between 8pm (of the previous day) and 4am (of the current day). For each ward (say ward $i$, with $N_i$ residents) and for each day we have a vector of length $k$ (where $k$ is the number of wards), in which element $j$ is the number of residents of ward $i$ who appeared in ward $j$. We use this vector to derive a multinomial distribution (by dividing the vector by its sum), which represents the daily probability that a person from ward $i$ appeared in each ward.
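As a small illustration of the normalisation just described, the sketch below converts one ward's daily count vector into multinomial probabilities. The function name and the counts are hypothetical, and the handling of a ward with no observed residents is an assumption.

```python
def movement_probabilities(counts):
    """Turn the counts of ward-i residents seen in each ward into probabilities."""
    total = sum(counts)
    if total == 0:
        # Assumption: a ward with no observed residents is treated as an error.
        raise ValueError("no observed residents for this ward")
    return [c / total for c in counts]


# Hypothetical counts for residents of ward i observed in wards 0, 1 and 2.
counts_from_ward_i = [800, 150, 50]
probs = movement_probabilities(counts_from_ward_i)
```

Stacking one such vector per ward gives a $k$ by $k$ matrix whose rows each sum to one.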
These movement data were summarised by a $k$ by $k$ matrix representing the average probability of movement between wards for the different interventions imposed by the South African government during this time, i.e. for each lockdown scenario one movement matrix was produced representing the average inter-ward movement under these restriction conditions (see Table 1). A movement matrix for a Business As Usual (BAU) scenario prior to government interventions was also created. Since cell phone data were not available for lockdown levels 1 and 2, level 2 was estimated as a slightly constrained version of the BAU mobility matrix and the BAU matrix was used for level 1.
The South African COVID-19 Modelling Consortium (SACMC) has also produced regular updates of the predicted non-ICU and ICU beds required per province (see (SACMC) for example), although not all updates were released to the public. Using the spatial SEIR model, our aim was to see whether we could create reliable projections at a fine spatial resolution so as to better understand the spatial spread of hospital requirements and expected peaks across the country. Various projections were done from several starting dates. It was found that using a starting date from too early on in South Africa's fight against the pandemic did not provide sufficient case data across the spatial units, with cases being limited to a few epicentres. For the purposes of this paper, the starting date of 1st June 2020 is used to show the model results, since this represents the date at which South Africa moved to level 3 of lockdown, thus allowing for far more freedom of movement of people between spatial units (represented by the L3 mobility matrix). This was also a point in the pandemic at which cases were being detected in all areas across the country.
Since the main focus here is on projecting potential hospitalisation cases of COVID-19 for each spatial unit, an assumption had to be made regarding the proportion of severe ($I_3$) and critical ($I_4$) individuals that would be admitted to hospital. Since critical cases are defined to be the proportion requiring either ICU, ventilation or oxygen, 100% of these individuals are assumed to be hospitalised. Given the difficulty in accessing healthcare for a large portion of the South African population, as discussed in Section 2, it was assumed that not all individuals with severe cases of COVID-19 would report to hospital. It was therefore necessary to determine what proportion of our projected severe cases were likely to go to hospital. Studies from Italy (43) and the US (26) have reported statistics of 20% and 21.1%, respectively, of symptomatic cases requiring hospitalisation, while a South African study focusing on asymptomatic spread (2) calculated an average of 4.02% requiring hospitalisation. The latter figure was based on the percentages of severe symptomatic cases per age group taken from (18) and the age structure of South Africa's population. A statistic close to this 4% was also used by SACMC in their May 2020 presentation (SACMC). Although the proportions of symptomatic cases that fall in the severe and critical compartments in our spatial SEIR model vary per spatial unit based on the age structure in each area, on average across the country the critical cases constitute 2.5% of the symptomatic cases while the severe cases represent 10.7% of all symptomatic cases. Together these are clearly well below the 20% recorded in other countries, but well above the 4% estimated in the two South African studies mentioned.
Using hospitalisation data that were available for the Gauteng [6] and Western Cape [7] provinces on 1st June 2020 (using a 7-day moving average) and comparing these to the confirmed cases (assuming for simplicity's sake that confirmed cases were similar in number to symptomatic cases), we calculated the proportion of our severe ($I_3$) cases reporting to hospitals in Gauteng to be roughly 30%, while for the Western Cape this number was found to be only 10%. Together with the critical ($I_4$) cases, this represents 5.7% of all symptomatic cases for Gauteng and 3.6% for the Western Cape. Although the population in Gauteng is on average much wealthier and has the highest access to medical care of all the provinces in South Africa (53), it is unclear why the proportion in the Western Cape was so much lower. One might surmise that it was due to the province's rigorous testing regime (36) in the early stages of the pandemic, which resulted in a much higher detection ratio than elsewhere in the country.
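The percentages quoted above follow from combining the national average shares of critical cases (2.5%, all assumed hospitalised) and severe cases (10.7%) with the provincial proportion of severe cases reporting to hospital. The short check below reproduces that arithmetic; the constant and function names are illustrative.

```python
CRITICAL_SHARE = 0.025  # critical cases as a share of symptomatic cases
SEVERE_SHARE = 0.107    # severe cases as a share of symptomatic cases


def hospitalised_share(severe_reporting):
    """Share of symptomatic cases in hospital: all critical plus a fraction of severe."""
    return CRITICAL_SHARE + severe_reporting * SEVERE_SHARE


for province, fraction in (("Gauteng", 0.30), ("Western Cape", 0.10)):
    print(f"{province}: {hospitalised_share(fraction):.1%}")
# Gauteng: 5.7%
# Western Cape: 3.6%
```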

Implementation
Since the delay from initial infection to time of getting a confirmed test result was estimated to be about 7 days in the early stages of the pandemic (although this could be longer in some cases), the model was initialised with the number of confirmed active cases in each ward, taken from 7 days after the start date for the simulation run. It is well known that the confirmed cases are only a portion of the actual cases in the population, but the exact ratio is unknown and would depend on the testing strategy per country. An earlier study from Italy (43) used a ratio of 10 undetected cases to every 1 confirmed positive case while a more recent study (7) done on several European countries, including Italy, calculated a ratio of 2.3. A USA study (62) using data up to the 18th April 2020 calculated that the number of infections was nationally 9 times higher than the reported cases, although this factor varied per state and region. We chose to use a ratio of 5 to 1, thus assuming that 20% of the actual infected people were tested and confirmed as cases.
We assume, for simplicity, that the population of each ward is fixed, i.e. no births, non-COVID-19 deaths or permanent movement between wards. At each time-step the number of new exposed people is produced by running the SEIR model for each ward. The new exposed are then randomly distributed over the current and other wards based on the mobility data, essentially using one draw from the appropriate multinomial distribution. This assumes that the spread of the virus matches mobility patterns, in particular that infections are equally likely to occur at any point of an individual's movements throughout the day. At each subsequent time step the spatial allocation of exposed is taken into account by adding up the new exposed that have been allocated to that ward and subtracting the exposed that have been allocated elsewhere, to provide the initial condition for the SEIR at that time step. Explicit in this formulation is the assumption that, for a given ward, the ward-level population parameters (number of susceptible, exposed, infected and recovered) drive the number of new exposed cases. This is clearly a simplification, since as individuals move out of their ward they interact with a wider group, and other individuals may move into the ward for various time periods. This is more likely to be problematic at the end of the epidemic, when some wards have low numbers of susceptible individuals while others are still highly susceptible. Moreover, the mobility data do not allow one to distinguish between transitory movements through a ward and extended stays in a ward. For example, wards which contain transport hubs and train stations may be allocated more infections than is realistic. The model was run on facilities made available by the Centre for High Performance Computing (CHPC) [8], which consisted of 10 machines with 24 cores each.
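The redistribution of newly exposed people described above can be sketched as one multinomial draw per ward. The sketch below is illustrative only: it assumes the number of new exposed has already been rounded to an integer, uses Python's standard library in place of whatever software the authors used, and all names and values are hypothetical.

```python
import random
from collections import Counter


def allocate_exposed(n_new, probs, rng):
    """One multinomial draw: assign n_new people to wards with probabilities probs."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n_new)
    tally = Counter(draws)
    return [tally.get(j, 0) for j in range(len(probs))]


def redistribute(new_exposed, movement, rng):
    """Return the number of new exposed allocated to each ward.

    new_exposed[i] : integer count of new exposed generated by ward i
    movement[i]    : probabilities that a resident of ward i appears in each ward
    """
    k = len(new_exposed)
    arriving = [0] * k
    for i in range(k):
        allocation = allocate_exposed(new_exposed[i], movement[i], rng)
        for j, n in enumerate(allocation):
            arriving[j] += n
    return arriving


# Hypothetical three-ward example; each row of the matrix sums to one.
movement = [
    [0.80, 0.15, 0.05],
    [0.10, 0.85, 0.05],
    [0.20, 0.20, 0.60],
]
new_exposed = [40, 10, 0]
rng = random.Random(42)  # fixed seed so the draw is reproducible
arriving = redistribute(new_exposed, movement, rng)
```

The draw only moves people between wards, so the total number of new exposed is unchanged; repeating the draw with different seeds is what makes each simulation run differ.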
Four simulations were run per core per machine and hence the model was run a total of 960 times, each time with different parameter combinations as presented in Table 2 and random draws from the multinomial movement matrix. The final model outputs were summarised into data structures containing selected percentiles (10th, 25th, 50th, 75th, 90th) calculated across all simulations, as well as the mean and standard deviation, with values for each model compartment ($S_i$, $E_i$, $I_{1i}$, $I_{2i}$, $I_{3i}$, $I_{4i}$, $R_i$) given per ward and per day. We selected the mean for display purposes in the results section. The data used in this paper consist of daily case data at ward level in South Africa from 6 March 2020 to 22 July 2020. In addition, hospitalisation data are freely available for two provinces, Gauteng and Western Cape, for the period June 2020 up to mid-October 2020. In this paper we refer to hospitalisations as the number of cases occupying hospital beds on a daily basis, i.e. capacity requirements, and not the number of daily hospital admissions. The mobility data are cellular data from a local cellular provider, covering the period from before the pandemic (BAU) up to Lockdown Level 3. Figure 4 provides a connectivity visualisation of some of the cellular data available.
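The summary across simulation runs described above can be illustrated for a single compartment, ward and day. This sketch uses Python's standard library; the linear-interpolation percentile method and the use of the sample standard deviation are assumptions, since the text does not specify either, and the input values are placeholders.

```python
import statistics

PERCENTILES = (10, 25, 50, 75, 90)


def summarise_runs(values):
    """Summarise one compartment/ward/day across simulation runs."""
    # quantiles(n=100) returns 99 cut points; entry p - 1 is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    summary = {f"p{p}": cuts[p - 1] for p in PERCENTILES}
    summary["mean"] = statistics.fmean(values)
    summary["sd"] = statistics.stdev(values)  # sample standard deviation
    return summary


# Placeholder values standing in for one quantity across 101 runs.
runs = list(range(101))
summary = summarise_runs(runs)
```

In the full model this summary would be repeated for every compartment, ward and day, with 960 values per summary rather than the 101 placeholders used here.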

Results
The value of including a spatial component in an SEIR model is demonstrated in Figure 5, where the ward-level spatial SEIR model is compared with an SEIR model run at a national level. Since both the national and spatial SEIR models were initialised with cases inflated by the undetected factor (see Section 4), the outputs from these models were divided by this same factor for the comparison with actual confirmed cases in Figure 5, so that they show estimated recorded cases rather than estimated total cases. Figure 5 shows the overestimation of the non-spatial approach. Our proposed spatial model produces more conservative estimates that appear to be a closer match to the reported case numbers. The time course of the spatial model follows the observed case numbers more accurately, while the national model predicts far too rapid an increase in infections at the beginning, as it assumes a homogeneously mixed population. The remaining discrepancy with actual cases can be attributed to a number of complexities in the testing and reporting processes, as well as to cases that were not picked up, such as asymptomatic cases and symptomatic cases avoiding testing, for example due to stigma.
Although the NICD reports hospital admission cases for all provinces (35), these include only a small proportion of public hospitals, and therefore no complete hospital data was available for the remaining 7 provinces, i.e. for all provinces excluding Gauteng and the Western Cape. Due to the lack of other available data, and assuming that the remaining 7 provinces, being more rural in nature, would have a much lower hospitalisation proportion than Gauteng (given in Section 3), a value of 20% was taken as the proportion of severe cases that would actually report to hospital. This equates to 4.6% of the symptomatic population in these provinces. Based on the estimates for all nine provinces, together with a starting date of 1 June 2020 and hospital-related parameters as given in Table 2, the projected hospitalisation cases, i.e. hospital bed requirements, aggregated per province, are shown in Figure 6. This graph shows how the provinces peak at different times, which provides an understanding of how the virus spread across South Africa. Figure 7 provides a closer look at Gauteng and the Western Cape, the provinces for which hospitalisation data was available. The predictions overestimate here too, but follow the data reasonably well. Complexities of the hospitalisation data include COVID-19 deaths not captured as well as excess non-COVID deaths. In addition, considerable stigma exists in the South African population regarding COVID-19 (and other diseases) and many severe cases do not end up in hospitals (57; 44).
Further, to validate the model, we conducted a local sensitivity analysis of the spatial SEIR model. We consider the sensitivity of the proportion of individuals that had been infected by the end of 201 days to the spatial R_0i. The base R_0 was sampled from a uniform(1, 5) distribution over 960 simulations, and the correlation between R_0 and the proportion of infected individuals was calculated.
Figure 10 illustrates the results of this spatial sensitivity analysis, visualised as correlations. Many correlations are between 0.8 and 0.9, indicating a strong relationship between the spatial R_0 and the proportion of infected individuals. Figure 10 further demonstrates that the strength of this relationship varies across geographical space. This supports the use of a spatially varying R_0 to model COVID-19, as is proposed here.
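The structure of this sensitivity analysis can be sketched as follows. This is a toy illustration, not the study's model: it uses a single infectious compartment, no mobility, and hypothetical values for the incubation rate `sigma`, recovery rate `gamma`, ward population and per-ward scaling factors `ward_scale` (standing in for the vulnerability adjustment). Only the uniform(1, 5) sampling of the base R_0, the 960 simulations, the 201-day horizon and the per-ward correlation are taken from the text.

```python
import numpy as np

def attack_rate(r0, days=201, sigma=0.2, gamma=1 / 7, n=100_000.0, e0=10.0):
    """Proportion no longer susceptible after `days` in a daily-step SEIR.

    r0 may be an array; every element is simulated independently.
    sigma, gamma, n and e0 are assumed values for illustration.
    """
    r0 = np.asarray(r0, dtype=float)
    s = np.full(r0.shape, n - e0)
    e = np.full(r0.shape, e0)
    i = np.zeros(r0.shape)
    beta = r0 * gamma
    for _ in range(days):
        new_e = beta * s * i / n   # new exposures
        new_i = sigma * e          # exposed becoming infectious
        new_r = gamma * i          # infectious recovering
        s = s - new_e
        e = e + new_e - new_i
        i = i + new_i - new_r
    return 1.0 - s / n

def ward_correlations(n_sims=960, ward_scale=(0.6, 0.8, 1.0), seed=0):
    """Correlation, per ward, between the base R_0 and the proportion infected."""
    rng = np.random.default_rng(seed)
    base_r0 = rng.uniform(1.0, 5.0, size=n_sims)
    scale = np.asarray(ward_scale)
    # Ward-level R_0i = base R_0 times a hypothetical ward scaling factor.
    infected = attack_rate(base_r0[:, None] * scale[None, :])
    return np.array([np.corrcoef(base_r0, infected[:, w])[0, 1]
                     for w in range(scale.size)])

corr = ward_correlations()
```

Because the proportion infected saturates at high R_0, the correlation differs between wards with different scaling factors, which is the kind of spatial variation in sensitivity that a map of correlations would display.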

Discussion
The spatial SEIR model herein can be used to capture the variation in peaks across the country and to identify wards that could potentially become high risk, by incorporating a vulnerability measure at ward level across the country. Age structure per ward was also allowed for in the study on hospitalisations, which identifies areas where hospitalisations are expected to be higher by taking the more vulnerable age groups into account. The model thereby accounts for the heterogeneous mixing occurring at a societal level.
Figure 8: Hospitalisation cases at local municipality level for the provinces Gauteng and the Western Cape. (a) Gauteng Province predicted hospitalisation cases per local municipality, 13 August 2020. (b) Western Cape Province predicted hospitalisation cases per local municipality, 11 July 2020.
The advantage of using mobility matrices at a fine local level is that mixing between wards is identified, which helps to reduce over-prediction relative to the national model. Using different movement matrices for the different lockdown interventions also allowed this mixing to vary at different stages of the disease spread.
The parameters in Table 2 were obtained using expert knowledge and literature. These parameters can easily be updated and the model re-fitted. In addition, if suitable mobility data and a proxy for the vulnerability data are available for another country, the model can quite easily be extended to other spatial areas.
The model does not perfectly capture or predict COVID-19, as the complexities of the disease are still being researched while the pandemic continues. In addition, the data available have been collected centrally from a number of local sources, such as district clinics and testing centres, and may not be accurate in terms of location. It has become apparent that the address recorded is not always the patient's address but may be that of the referring doctor. Due to stigma the patient's address may not be captured truthfully at collection, and in some cases an algorithm attempts to automate addresses during peaks. The accuracy of the mobility data is also difficult to verify, as the service provider only covers 46% of the market in South Africa. It is likely that certain mobility is not represented in the mobility matrices (56). The advantage, however, of still running the model at ward level is that it allows an aggregation upwards, thereby reducing the noise effect at fine levels, as shown in Figure 8.
Over-prediction in hospitalisations is similar to other projections in South Africa (SACMC; 1) which also expected hospital capacity to be breached in provinces like the Western Cape and Gauteng and which also projected a later peak. Although spatial differences have been captured in the model in terms of vulnerability, age structure and mobility, there appear to be other factors affecting the nature of the disease spread in the different provinces since, in particular, the Western Cape was over-predicted to a greater degree than other provinces in the spatial model and in other projections seen (1). It is unknown why the Western Cape was so much lower than expectations compared to other provinces but one possible reason could be behavioural differences in the population in terms of adhering to restrictions and quarantine rules. Potentially the scaling factors due to government interventions (indicated in Table 1) should differ between provinces due to behavioural differences even though restrictions are uniform across the country.
The prediction of peaks during the pandemic is important in order to understand the hospital capacity preparedness. With good focused health care a severe COVID-19 patient has a higher chance of surviving. Over-stretching the capacities of hospitals during this pandemic has been a major concern across the world, and many studies have been done to estimate the impact of COVID-19 on hospital beds, ventilator beds and ICU beds (IHME; 43; 4).
We have briefly mentioned reasons for the overestimation still present in the proposed spatial model. We expand on these now. Firstly, there is a difference between modelling on a data set post-occurrence, with a disease that is mostly understood, and modelling in a still-evolving pandemic with a disease that is only now slowly being understood. There is specific difficulty in selecting parameters in a real-time environment when much is still unknown. Far more information is now known internationally in the post-first-wave environment. Secondly, using confirmed cases to initialise the model may be unreliable, since screening and testing vary due to factors such as the cost of tests, access to testing sites (closely correlated with poverty and lack of transport), varying local testing strategies, and the stigma of being tested. Some studies (46) use data on deaths as a more reliable indicator of the severity of a disease, while others (61; 17; 58) indicate that COVID-19 related death reporting is unreliable. In South Africa this also seems to be the case, since the study by the South African Medical Research Council (SAMRC) and the University of Cape Town Centre for Actuarial Research (34; 8) reveals that between 6 May and 8 December 2020, excess deaths for persons aged one year and above were around 56 607, while the total COVID-19 deaths reported by the NICD in the same period was 22 432. A September 2020 update of projected cases by the SACMC (1) also estimated that about 80% of the excess deaths are a result of COVID-19.
Furthermore, not all cases that should have been hospitalised were actually admitted, because some individuals died on arrival or before being admitted and tested, as seen from the excess deaths reported. The SACMC September 2020 report (1) estimates that the probability of seeking hospital-level care for severely and critically ill cases ranges from 50% to 97%. Reasons for individuals in South Africa failing to report to hospital include lack of transport, no access to medical facilities and the stigma attached to having COVID-19 in certain communities, which results in some individuals being too afraid to be tested or to go to hospital. The poor quality of care in public hospitals also deters some individuals from seeking care even when they need it. In addition, individuals may not have been admitted to medical facilities during the peak period because capacity at certain facilities was overstretched.
It has more recently become known that some individuals carry immunity (38), and therefore the size of the susceptible population in each ward could be lower than what was initially assumed. Our assumption that all previously infected individuals have permanent immunity also remains to be demonstrated. Relaxing it would add complexity to the model, as the absence of permanent immunity would require a return flow from the R_i to the S_i compartment.
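As a rough illustration of what such a return flow would involve, the sketch below adds a waning-immunity term to a simplified single-ward model. It is an assumption-laden toy, not part of the model presented here: it uses one infectious compartment instead of four, and the function name `seirs_step` and the waning rate `omega` are hypothetical.

```python
def seirs_step(state, beta, sigma, gamma, omega):
    """Advance a single-ward (S, E, I, R) state by one day.

    omega is a hypothetical daily rate at which recovered individuals
    lose immunity; omega = 0 recovers the permanent-immunity assumption.
    """
    s, e, i, r = state
    n = s + e + i + r
    new_e = beta * s * i / n   # new exposures
    new_i = sigma * e          # exposed becoming infectious
    new_r = gamma * i          # infectious recovering
    waned = omega * r          # return flow from R back to S
    return (s - new_e + waned,
            e + new_e - new_i,
            i + new_i - new_r,
            r + new_r - waned)

def run(days, omega, state=(99_990.0, 10.0, 0.0, 0.0)):
    """Iterate the daily step with assumed illustrative rates."""
    for _ in range(days):
        state = seirs_step(state, beta=0.4, sigma=0.2, gamma=1 / 7, omega=omega)
    return state
```

The only structural change relative to an SEIR step is the `waned` term, which leaves the ward population constant but replenishes the susceptible pool over time.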
The model naturally has limitations. These include the data accuracy discussed above and, furthermore, the distribution of exposed cases at each time step. Since the mobility data is not a personal trajectory, the location of an individual at the next time point is not certain. The model simply assumes that an individual can be exposed but will then return to, or remain in, the same ward. Longer-term movement may still play a role. The model also still assumes homogeneous mixing within a ward (while providing spatial heterogeneity nationally), which may not be accurate for a diverse country such as South Africa.

Conclusion
We have demonstrated that a spatially explicit version of a classic SEIR model can improve planning and preparation for COVID-19, and provide a better estimation of both the timing and the peak of the epidemic. This study has shown the benefit of accounting for the spatial dimension by considering local-level spatial units when using an SEIR-type model to model the spread of COVID-19. By adjusting the model for social vulnerability and distributing cases according to mobility data at ward level, we allowed for important spatial influences in predicting the spread of the disease. Setting up a spatial compartmental model and appropriately calibrating it at a low level of spatial units is useful for improving decision making once disease characteristics are better understood after the initial outbreak, especially in contexts where social factors are strongly at play.
Future work includes improving the accuracy of the mobility data through triangulation of multiple data sources, stratifying the model to account for co-morbidity in subpopulations as well as modelling multiple waves of infections and/or strains. The model can also be expanded to include a death compartment, as well as allow for levels of population immunity as data becomes available and vaccines are rolled out.