Estimation of Time-Dependent Reproduction Number for Global COVID-19 Outbreak

Real-time estimation of the parameters characterising infectious disease transmission is important for optimization quarantine interventions during outbreaks. One of the most significant parameters is the effective reproduction number - number of secondary cases produced by a single infection. The current study presents an approach for estimating the effective reproduction number and its application to COVID-19 outbreak. The method is based on fitting SIR epidemic model to observation data in a sliding time window and allows to show real-time dynamics of reproduction number at any phase of epidemic for countries globally. Online data on COVID-19 daily cases of infections, recoveries, deaths are used. Finally, time-dependent reproduction number is explored in connection with dynamics of peoples mobility. The method allows to assess the disease transmission potential and understand the effect of interventions on epidemics spread. It also can be easily adapted to future outbreaks of different pathogens. The tool is available online as Python code from the Github repository.


Introduction
Coronavirus disease (COVID- 19) was first identified in late December 2019 in Wuhan (China) and during the first half of 2020 spread all over the word. On March 11th the coronavirus outbreak was labelled a pandemic by the World Health Organization [1]. One of the key points during an epidemic is to design appropriate mathematical models and tools for effective decision-making [2]. For optimizing the quarantine steps it is important to estimate parameters characterizing infectious disease transmission using real-time data and track the temporal changes in those values [3].
The time-dependent (or effective) reproduction number (Rt) is one of the key parameters characterizing evolution of an epidemic. The value of Rt is a function of time, and it represents the expected number of secondary cases arising from a primary case infected at time t [4]. Rt is affected by many factors, including social distancing and other quarantine steps, and changes during an outbreak. If Rt < 1 the disease will decline and eventually die out. If Rt > 1 the disease will be transmitted between people, and an outbreak is likely to occur. Reducing the reproduction number below one is one of the main goals of implementing quarantine steps [5]. In the report [6] a consistent correlation between reductions in mobility and reductions in transmission intensity of COVID-19 was found.
Different methods are available to define the initial reproduction number R0 and time-dependent reproduction number Rt. The numerical estimations of reproduction number using different methods vary depending on implementation [4,[7][8][9][10][11][12]. Some of the standard methods were applied to assume basic and time-dependent reproduction number for COVID-19 outbreak for different countries using different implementations [6, 13 -16].
In review [17] the differences for estimations of COVID-19 reproduction number for China are shown and mentioned that the different forecasts can be explained by insufficient data and different estimation techniques.
The useful open-source tool for estimating reproduction number in real-time, requiring only data that are commonly recorded during an outbreak (infections, recoveries, deaths, infection duration), with clear science based methodology and without limitations on epidemic phase will allow to forecast the progression of disease outbreaks and understand the effect of interventions on epidemics spread.

Material and Methods
To estimate the time-dependent reproduction number, the following procedure based on classical SIR model is used. The starting point at which the number of infected people in the country gets above 1000 is determined. We then move a sliding window with the width of 7 days over the whole period, and for each step we estimate the time-dependent reproduction number by minimizing square difference between real and predicted number of new infected cases. Finally, time-dependent reproduction number is explored in connection with dynamics of peoples mobility.

Data Sources
Real-time data on COVID-19 daily cases of infections, recoveries, deaths, as well as each country population are retrieved from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [18]. The data are reported by each country's official surveillance system from the January 22nd, 2020.
Data reflecting people mobility (mobility data) are sourced from Apple [19]. It is generated by counting the number of requests made to Apple Maps for directions. The data sets are compared to reflect a change in volume of people driving, walking or taking public transit around the world. Data availability in a particular city, country, or region is subject to a number of factors, including minimum thresholds for direction requests made per day [20]. The mobility data for China are temporarily not available. The data are available from the January 13th (except May, 11th-12th) and updated every day.

COVID-19 data
The online data of numbers of infected, recovered, deceased people and each country population [18] are joined in one table and the dates are transposed, which gives us country dataframe. Then the numbers are smoothed out by computing 3-days rolling average on all the columns. The "Removed" column that contains the number of people removed from the susceptible population (i.e. recovered or deceased) is also computed. The analysis is based on countries which have at least 1000 infections reported in total.

Apple mobility index data
All available apple mobility data is collected and mean values of three data streams ('driving', 'walking' and 'transit') for each country are obtained. 7-days running average is applied to reduce weekly fluctuations.

SIR modelling
SIR modelling is a classical approach to epidemic simulation presented by  for the number of people infected with a contagious illness in a closed population over time. The assumptions of model are: constant (closed) population size (N); constant rates (e.g., transmission, removal rates -the probability of disease transmission in a single contact multiplied by the average number of contacts per person); no demography (i.e., births and deaths); well-mixed population (where 3 of 10 any infected individual has a probability of contacting any susceptible individual that is reasonably well approximated by the average).
The SIR model describes the connection between S (susceptible -number of people who have the potential to be infected), I (infected -number of infected people), and R (removed -number of people who are non susceptible to infection, this includes the number of deceased people as well) with the following system of ordinary differential equations (ODE): = .
β is known as the effective contact rate. It is assumed that in a unit time each infected individual will come into contact with βN people. From those people, the proportion of susceptible people is S/N, thus the speed at which new infections occur is considered to be −βSI/N. γ is the removal rate, and the number 1/γ defines the number of days during which a person stays infected. Thus the term γI defines the speed at which infected individuals are moved from being infected to recovered (or deceased).
The rates are supposed to be constant. In the assumption that the population is completely susceptible (S=N), the basic reproductive number is R0 = β / γ [22].

Determining parameters of the model
Model predictions based on system (1) depend on the parameters (β, γ). Optimization of the model with respect to parameters β and γ and fitting it to the real data allows to find the parameters that correspond to the actual outbreak. It is more consistent to optimize for β, and set the γ parameter according to medical studies. According to WHO [23], the median time from onset to clinical recovery for mild COVID-19 cases is approximately 2 weeks, and it is 3-6 weeks for patients with severe or critical disease. Among patients who have died, the time from symptom onset to outcome ranges from 2 to 8 weeks. Since the time period is wide enough, we considered the mean recovery time of 30 days, and fixed γ as 1/30, respectively.
To smooth inaccuracies in the statistics at beginning of the outbreak the calculations start from day t0, where t0 is the first day when the number of infected people is above 1000: Considering one country at a time, the following denotations for the data are used: V(t) -the number of total (accumulated) infected cases at day t, t> t0, E(t) -number of recovered cases at day t, D(t) -number of fatal cases at day t.
The values corresponding to each β are denoted as Sβ(t), Iβ(t) and Rβ(t), respectively. Sβ(t), Iβ(t) and Rβ(t) are the solutions of system (1) with initial values of: Here N is the population of a country being considered. The accumulated number of total infected people computed by the model is given by: and new total infected cases each day is equal to ′ ( ) = + ( + 1) − + ( ).
The actual number of daily new infected cases ′ ( ) = ( + 1) − ( ). The value β * that corresponds to the minimal discrepancy between V′(t) and Iβ′(t) in the 7-day period is defined below (we will denote the number of days in consideration by n=7): The process of finding argmin (the minimum value along a given axis) is a complex optimization process, and at each step the numerical solution of ODE (1) is needed.
Powell's method (Powell's conjugate direction method) is used for optimization because this algorithm does not rely on gradients and is fast enough [24].

Application of SIR modelling for time-dependent reproduction number estimation
Similar to the basic reproductive number R0 that is a characteristic of a disease, the time-dependent reproductive number Rt is considered. Rt dynamics during the pandemic takes into account isolation measures and the proportion of nonsusceptible population, and can be used to estimate the effectiveness of quarantine measures.
The same approach is followed for estimating Rt in real time: Start from the point at which there are 1000 infected people. Move a sliding window of width n (n=7 is used) over the whole period till present day. At each point, n consecutive days are used to estimate β (and, thus, Rt) by minimizing the square difference between the real and predicted number of new infected people.

Results
The approach described above allows to find real-time dynamics of reproduction number and easily analyze it in connection with other factors directly affecting the epidemic spread (number of daily infected cases, people's mobility level, rate of changes in people's mobility level). The results are presented in Figs.1-4 for 12 countries and can be extended to any country in the world reported in [18] and [19].

Comparison of reproduction number dynamics for different countries
General trends in the COVID-19 pandemic spread connected with Rt dynamics in different countries are described in Fig. 1. The following main trends may be identified: − The Rt values decreased significantly compared to the initial value for all considered countries. − For most of considered countries Rt has already dropped below 1, which is consistent with public statements [25][26][27].  The differences of Rt curve shapes for Italy, US, Russia, Sweden and China are shown. In the beginning of the outbreak the US and China had the highest values of reproductive number and Sweden -the lowest one. But the rate of Rt decrease was the fastest for China, and the slowest for Sweden and Russia.
The shape of Rt for Russia is presented in more detail on Fig.2 (b). Fig. 2 (b) indicates two dates when isolation measures were put in place in Russia: − On the April, 2nd government announced the second wave of quarantine measures (more strict than just recommended self-isolation, but still low enough); − On April, 15th mandatory the quarantine measures became strict -permits for living houses were introduced in Moscow and a few other large cities. In order to get the relationship of people's mobility with reproduction number, and automatically track this connection daily for different countries, the approach developed in this article was supplemented by visualizing Apple Mobility Index.

Relation between Apple mobility index and reproduction number dynamics
Relationships between Rt and the mobility index in different countries are presented in Fig. 3. It is supposed that this Mobility Index reflects the level of lockdown. There is a consistent correlation between reduction in mobility level and reduction in reproduction number values. The correlation is in good agreement with the results of [6]. The first group contains countries that applied lockdown to reduce the epidemic spread. Japan, United States, Germany, France, United Kingdom, Spain, Italy and Russia demonstrate rapid reduction in the mobility index by implementation of strict quarantine measures. The mobility data for China temporarily not available, but lockdown in China was called by WHO as "unprecedented in public health history" and allowed to stop the spread of coronavirus in China.
The second group are countries without lockdown. South Korea and Singapore introduced one of the largest and best-organised epidemic control programs in the world. Different measures have been taken to screen the mass population for the virus, and isolate any infected people as well as trace and quarantine those who contacted them, without further lockdown [28]. Sweden has not imposed a lockdown too, for citizens it was recommended to follow a series of recommendations from the government agency, but without tracing contacts of those infected [29]. It is seen that Swedish model is much less effective than South Korea's one, though in the beginning of the outbreak, the reproductive number for Sweden was only 2.6 because of high social distancing specific for Scandinavian countries.
For the third group of countries (India, Brazil and Turkey) the mobility index was relatively low initially, and quarantine measures did not have any significant impact on its value. So it is difficult to evaluate measures and their impact on epidemic spread.

3.3.Relation between derivative of reproduction number and Apple mobility index
To assess the relation between the dynamics of Rt and the level of mobility, the derivative of Rt in relation to the Apple mobility index was considered (Fig. 4). It can be seen that the highest rate of decrease in Rt correlates with the minimum of mobility index. The non-monotonous curve of Rt derivatives with monotonous mobility curve suggests that the reproduction number is influenced by mobility and non-mobility events (for example, massive violations of social distancing). Local extrema of the Rt derivative graph allow us to suggest the dates of such events. For all groups of countries, the mobility index trend matches the Rt derivative trend in general. It clearly shows that after removal of quarantine restrictions, the rate of Rt growth is increasing.

Discussion
There are some restrictions regarding the results of calculations. First of all, we do not separate internal and external cases, assuming implicitly that all incident cases after the first time-point arise from local transmission, i.e. it does not account for the possibility that cases (other than those appearing at the first timestep) are imported from other locations or derived from alternative host species. This assumption is used, for example, by [30]. In our case, it is applicable since we try to calculate the potential velocity of epidemic spreading and the way of infection appearance is not a case of the study.
Based on previous studies, we also suspect that recovered person can not become susceptible again, after having already recovered from the infection [31].
We do not formally distinguish the non-diagnosed individuals, who spread the infection more because they are not in isolation, and diagnosed individuals, who transmit the disease much less thanks to proper isolation and complying with strict rules, either in hospital or at home. To make predictions more accurate, it makes sense to use more accurate model that accounts for incubation period, such as SEIR model [32]. However, using more accurate models is likely not to have a huge impact on the resulting numbers, by may complicate the optimization process with limited number and accuracy of data points.

Conclusions
The model for calculating the time-dependent reproduction number (Rt) in real-time, based on classical SIR approach, without limitations on epidemic phase and implemented to countries globally based on publicly available data was provided.
The useful open-source tool for estimating Rt in real-time, based on this model, requiring only data that are commonly recorded during an outbreak (infections, recoveries, deaths, infection duration), with clear scientific methodology and without limitations on epidemic phase allows to forecast the progression of disease outbreaks and understand the effect of interventions on epidemics spread. This tool also allows to assess the disease transmission potential, understand the effect of interventions on epidemics spread and can be easily adapted to future outbreaks of different pathogens.
For COVID-19 spread it is shown that decrease in Rt value correlates with level of quarantine measures for group of countries with lockdown and the highest rate of decrease in Rt correlates with the minimum of mobility index.
The tool is available online as Python code from the Github repository.

Supplementary data
A tool for estimating the time-dependent reproduction number during SARS-COVID-19 pandemic for countries worldwide is available online as Python code from the following Github repository: https://github.com/shwars/SlidingSir

Funding
This work was supported as part of the research program of MSU Center for Storage and Analysis of Big Data.