Defining and Computing Reproduction Numbers to Monitor the Outbreak of Covid-19 or Other Epidemics

We present a general approach to define reproduction ratios or numbers to monitor the outbreak of epidemics that are modeled by mathematical evolution equations. This provides a solution to an important topic that has not been completely settled in the literature, especially in the case of complex epidemiological models. We illustrate our procedure with a full implementation of a standard deterministic SEIR model that is applied to examine the Covid-19 outbreaks and the effects of intervention measures in several countries in America (Argentina, Brazil, Mexico, USA) and Europe (France, Italy, Spain and UK) in 2020. Our code is also used to investigate herd immunity levels for Covid-19, indicating values between 85% and 90%.


Introduction
The monitoring of the evolving state of a serious epidemic can be done during and after its outbreak by estimating the daily values of basic ratios generally known as reproductive or reproduction numbers [5,6,7,13]. While not properly geared to allow serious predictions of future values of the epidemic, they are nevertheless able to display the past and present history with amazing clarity. However, as their calculation depends on the values of various mathematical parameters (like the length of transmission and incubation periods), this ability may be impaired by inaccuracies in their estimation. This is particularly true for the widely used basic reproduction number, which measures the average number of secondary cases generated by a typical infectious individual in a full susceptible population ( Figure 1). On the other hand, once some mathematical model has been chosen to simulate the disease dynamics and its parameters determined, several alternative reproductive numbers become automatically available at no additional computational cost, many showing very little dependence on key parameters like transmission or incubation times. We will illustrate this fact in the context of deterministic SEIR models, but our approach can be adapted to other mathematical models (deterministic or stochastic) as well.
The idea is most easily explained by considering the simplest SEIR model of all, defined by the equations (1.1) below. This model divides the entire population in question into four classes: the susceptible individuals (class S), those exposed (class E, formed by infected people who are still inactive (i.e., not yet transmitting the disease), the active infected or infectious individuals (class I) and the removed ones. The latter class is formed by those who have recovered from the disease (class R) or who have died from it (class D). The dynamics between the various classes is given in the universal language of calculus by the differential equations (1.1) see e.g. [2,4,7,12] for a detailed discussion of the various terms and their meanings. The parameters β (average transmission rate) and r (average lethality rate of the population I due to the disease) vary with t (time, here measured in days), but δ and γ are typically positive constants given by where T t denotes the average transmission period and T i stands for the mean incubation time, which will be taken as 14 and 5.2, respectively [9,10,14]). In the system (1.1), N denotes the full size of the susceptible population initially exposed, so that we have S(t 0 ) + E(t 0 ) + I(t 0 ) +R(t 0 ) + D(t 0 ) = N , where t 0 denotes the initial time. Observing that, by the equations (1.1), the sum S(t)+E(t)+I(t)+R(t) +D(t) is invariant, it follows the conservation law since, for simplicity, the model neglects any changes in the population due to birth, migration or death by other causes during the period of the epidemic (of the order of a few months). To well define the model (1.1), besides informing the functions β(t) and r(t) we need to provide the initial values S(t 0 ), E(t 0 ), I(t 0 ), R(t 0 ), D(t 0 ), which is not a trivial task, since not all of these variables are reported, and those reported may be in error -which may well be large in case of significant underreporting. It thus seems clear that predicting reasonably right values for the variables S(t), E(t), I(t), R(t) and D(t) at future times is not a simple problem, especially in the long time range. The situation becomes even more complicated for more complex (i.e., stratified) models, which add other variables and parameters to be determined. Calibrating many parameters can quickly become a nightmare. For all its simplicity, models with few variables and parameters like (1.1) can yield surprisingly good results and thus should not be overlooked, as will be seen in the sequel.

Implementing the SEIR model
Having introduced the SEIR equations (1.1), we now describe an implementation of this model that is suitable for the computation of reproduction numbers.
(i) assigning a value to the population parameter N In the case of Covid-19, which can be considered a new virus (SARS-CoV-2), it has been common to assume the entire population susceptible and assign its whole value to N. This is highly debatable, since this parameter refers to that particular fraction of the susceptible population that is effectively subject to infection. For deterministic models, this introduces the possibility that an outbreak might not happen after the introduction or reintroduction of a few infected individuals, as it has been long recognized in the stochastic literature [1,9]. In any case, it turns out that N is not so much important for the short range dynamics as it proves to be in the long run (see Figures 2a and 2b), so that for our present purposes this is not a serious issue. We have therefore taken for N the full population of the region under consideration.
Initial values S 0 , E 0 , I 0 , R 0 , D 0 for the five variables are generated from a starting date t s on, which is taken so as to meet some minimum value chosen of total reported cases (typically, 100). Denoting by C r (t) the total amount of reported cases up to some time t, and letting EIR(t) be the sum of the populations E(t), I(t) and R(t), we set where f c ≥ 1 denotes a correction factor to account for likely underreportings on the official numbers given. (In (2.1), we have neglected possible underreportings on the number of deaths, which could of course be similarly accounted for if desired.) Again, this factor will not play an important role in this paper and could be safely ignored, but it should be carefully considered in the case of long time predictions.
Having estimated EIR(t s ), we then set where a = T i /(T i + T t ) and b = 0.30, consistently with the literature (see e.g. [14]). The arbitrariness in this choice of weights gets eventually corrected as we compute where t F stands for the final (i.e., most recent) date of reported data available. For each t 0 , the solution of the equations (1.1) with the previously obtained initial data best fit the reported data for these variables on [ t 0 , t 1 ] in the sense of least squares [12]. (Here, d 0 ∈ [ 2, 10 ] is chosen according to the data regularity.) Once this solution (S, and move on to the next time level t 0 + 1, repeating the procedure until t F is reached.
Having completed the previous steps, we can address the possibility of prediction. Although this is not important for our present goals, it is included for completeness. Choosing an initial time t 0 ∈ (t s , t F ], we then take the initial values In order to predict the values of the variables S(t), E(t), I(t), R(t), D(t) for t > t 0 , it is important to have good estimates for the evolution of the key parameters β(t) and r(t) beyond t 0 . This is the most computationally intensive part of the algorithm and is better executed in large computers. Such estimates can be given in the form where β 0 , a β , λ β , r 0 , a r , λ r ∈ R are determined so as to minimize the maximum size of weighted relative errors in the computed values for C r (t), D(t) as compared to the official data reported for these variables on some previous interval [ t 0 − τ 0 , t 0 ] (weighted Chebycheff problem) for some chosen τ 0 > 0 (usually, 20 ≤ τ 0 ≤ 30). This problem is solved iteratively starting with an initial guess obtained from the analysis of the previous values β 0 (t), r 0 (t) computed in the step (ii ) above. The result is illustrated in Figure 3 for the case of β(t), with similar considerations for r(t). Once β(t), r(t) have been obtained, the equations (1.1) are finally solved (Figure 4).

Reproduction numbers
A natural by-product of the results generated by the algorithm is the estimate of reproduction numbers of the epidemic, which measure the intensity of transmission at various times and, in doing so, are useful indicators to monitor the situation and the effects of intervention procedures that may have been taken. Using the generic symbol R t to denote such quantities, 1 they signal a rise in the number of infections in the case R t > 1, their decrease when R t < 1, and temporary steadiness if R t = 1. For instance, rewriting the equation for the critical population I(t) in the form we see that I(t) will increase if α(t) > 0, decrease when α(t) < 0 and stay about the same if α(t) = 0 -or, in terms of the ratio whether we have R t > 1, R t < 1 or R t = 1, respectively. Another natural possibility would be to consider basic ratios like for some chosen d > 0. For example, the choice d = T t /2 corresponds to the standard basic reproduction number, or the mean number of secondary infections caused by a typical infected individual during his transmission period [9,12]. The corresponding expressions would be, using the calculations performed in step (ii ) of the algorithm, where r 0 (t) denotes the lethality rates computed there, or else t seemingly more influenced by seasonal (weekly) variations in the data. We have found R (2) t particularly useful, with numerical results that are consistent with previous analyses 1 The notation R t is natural in stochastic models, and is adopted here as we have already used R(t), R 0 (t) with other meanings (size of the recovered population and their initial values, resp.).
(see e.g. [13]). For time scales such as those of Covid-19, the choice d = 3 is good to zoom in the scenario and facilitate the reading (Figure 6), while not compromising robustness (Figure 7). t with respect to large uncertainties on the value of transmission time. Date zero refers to 100 cases reported, that is: 03/13/2020. (As in Fig. 5 and Fig. 6 above, calculations were based upon official data reported at https://covid.saude.gov.br.)

Applications
In this section we will illustrate the use of reproduction values by examining the evolution of Covid-19 in various countries around the world under the view of such numbers -choosing for definiteness the numeric ratio R (2) t defined in (3.4) above as our basic indicator, unless explicitly stated otherwise. Thus, we set where I 0 (s) is the size of the active infected population at time s as computed in the step (ii ) of the SEIR algorithm (see Section 2).
Taking right decisions about intervention or relaxation measures is a very difficult and complex process that involves a careful consideration of various mathematical indicators and a lot of other factors including many health, economic and social issues. In the following examples we consider only the single factor given by reproduction numbers. For all the simplicity and obvious limitations of this approach, it offers nevertheless precious insight and information about the disease dynamics and evolution. Despite the encouraging behavior of R t shown in the following fortnight (yellow band), the indicator resumed increasing by mid-June due to further disease development in less affected areas of the country, particularly the southern and central western states. Another negative factor is that flexibilization of control measures has been introduced before the various regions had attained a state of epidemic control (R t < 1), which is not ideal.    Despite successfully bringing the epidemic under control, the number of reported cases and deaths was very high due to the initial delay in taking intervention action.