Estimating the Beginning and End Dates of Covid-19 for Several Countries Using the Logistic Model

This note applies the Logistic model (LM) approximation to estimate suitable start and end dates for the epidemic curves of the total number of cases observed in different countries. The LM is presented, and explicit relations are obtained for the beginning and end dates together with the total epidemic duration. The extreme dates are calculated using data from Brazil, Germany, Italy, and South Korea. Since the epidemic onset time is found, the epidemic curves of these countries can be compared fairly. The result does not depend on the poor statistics available in the early phase of the epidemic, when the initial number of infectives is unknown. In fact, the total duration depends only on the characteristic time parameter of the LM.


Introduction
The current Covid-19 epidemic is a new challenge to public health authorities and a threat to the world. Managing the outbreak correctly starts with acquiring accurate information about its beginning date. The end date is much harder to predict, since it results from a complex chain of actions and interactions between the population and the disease: quarantine, testing policy, and vaccination all affect the future circulation of the virus.
Such a challenge calls for models and simulations in order to access information hidden beyond the publicly available data. The so-called Logistic model (LM) [3] is well known to capture the general evolution of many epidemics, and logistic growth has been widely observed in Nature. Since it is the solution of a deterministic model, it gives accurate descriptions of closed populations in the limit of large numbers [1,2,3].
In this work, the LM is used as an approximation to estimate the extreme dates of the epidemic. The beginning date is hard to estimate because the number of initial infectives introduced into the host population is unknown and the first cases are generally poorly notified. The end date is also difficult to determine, for the reasons outlined above. This note fits the LM to the publicly reported number of cases by least squares, as described previously [4], and then calculates the dates as a byproduct of the fit.

The Logistic Model
Given a closed population with N individuals at time t, the rate of change of the population size can be modelled by the differential equation

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right), \qquad (1)$$

where $r$ is the rate of population growth and $K$ is the so-called "carrying capacity" [3].
When $N \ll K$, the growth rate is approximately $rN$. As $t \to \infty$, the population size tends to $K$ and the growth stops. Eq. 1 has the solution

$$N(t) = \frac{K}{1 + A e^{-rt}},$$

where $A$ is a factor associated with the initial population size. The growth factor $r$ is the inverse of a "time constant" $\tau = 1/r$ related to the early growth rate. The idea of this note is to apply the growth given by Eq. 1 to describe the evolution of the number of Covid-19 cases [1,4].
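As a quick consistency check, the Python sketch below evaluates the closed-form solution and verifies numerically, with a central finite difference, that it satisfies $dN/dt = rN(1 - N/K)$. The parameter values (τ = 6.3 days, K = 50 000, A = 1000) are hypothetical choices for illustration, not fitted values, and the function names are illustrative.

```python
import math

def logistic(t, K, A, r):
    """Closed-form Logistic solution N(t) = K / (1 + A exp(-r t))."""
    return K / (1.0 + A * math.exp(-r * t))

def ode_rhs(n, K, r):
    """Right-hand side of the Logistic equation: dN/dt = r N (1 - N/K)."""
    return r * n * (1.0 - n / K)

# Hypothetical parameters for illustration only (not fitted to any country).
tau = 6.3            # time constant, days
r = 1.0 / tau        # growth factor r = 1/tau
K, A = 50_000.0, 1_000.0

h = 1e-4             # step for the central finite difference
residuals = []
for t in (0.0, 20.0, 43.5, 80.0):
    dndt = (logistic(t + h, K, A, r) - logistic(t - h, K, A, r)) / (2.0 * h)
    residuals.append(abs(dndt - ode_rhs(logistic(t, K, A, r), K, r)))
print("max |dN/dt - rN(1 - N/K)| =", max(residuals))
```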
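Given Eq. 1, the growth rate can be written explicitly in terms of time as

$$\dot N(t) = \frac{rAK\,e^{-rt}}{\left(1 + A e^{-rt}\right)^2}. \qquad (2)$$

At $t = 0$ the solution gives

$$N(0) = \frac{K}{1 + A}, \qquad (3)$$

which relates $A$ and $K$ to the initial population number $N(0)$. In the early phase, while $A e^{-rt} \gg 1$ (i.e. $t \ll \ln(A)/r$, which requires $A \gg 1$), the LM reduces to an exponential law, $N(t) \approx (K/A)\,e^{rt}$. Therefore, in the early phase, the growth is simply

$$N(t) \approx N(0)\,e^{rt}. \qquad (4)$$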
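The quality of the exponential approximation can be checked directly. From the solution, the ratio $N(0)e^{rt}/N(t)$ equals $(e^{rt} + A)/(1 + A)$. The relative error is therefore $(e^{rt} - 1)/(1 + A)$. It stays small while $e^{rt} \ll A$ and approaches 1 near $t = \ln(A)/r$. The minimal Python sketch below tabulates this error for hypothetical parameters (K = 50 000, A = 1000, τ = 6.3 days).

```python
import math

def logistic(t, K, A, r):
    """Closed-form Logistic solution N(t) = K / (1 + A exp(-r t))."""
    return K / (1.0 + A * math.exp(-r * t))

# Hypothetical parameters for illustration only.
K, A, r = 50_000.0, 1_000.0, 1.0 / 6.3
n0 = K / (1.0 + A)               # N(0) = K / (1 + A)
t_bar = math.log(A) / r          # around this time the exponential law fails

def early_phase(t):
    """Exponential approximation N(t) ≈ N(0) exp(r t)."""
    return n0 * math.exp(r * t)

errors = {}
for t in (0.0, 10.0, 20.0, t_bar):
    # Algebraically, this ratio minus one equals (exp(r t) - 1) / (1 + A).
    errors[t] = early_phase(t) / logistic(t, K, A, r) - 1.0
    print(f"t = {t:6.2f} d   relative error = {errors[t]:.4f}")
```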
On the other hand, as $t \to \infty$, $N(t) \to K$, so the carrying capacity is the final population size. As described in [4], the model should be applied with care to the evolution of epidemics like Covid-19. In general, the reported number of infections is a compilation of cases from several regions with distinct epidemic starting dates. Using Brazil as an example, the reported case curve is a sum of the contributions from widely separated cities with different populations and starting dates, subject to a variety of errors. However, the growth is dominated by large cities (such as São Paulo and Rio de Janeiro), where the epidemic started almost simultaneously. Within certain limits, therefore, the observed curve of reported cases seems to be fairly well described by the function of Eq. 1, see Fig. 1.
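The fitting procedure itself is described in [4] and is not reproduced here. As a hedged stand-in, the sketch below fits $N(t) = K/(1 + Ae^{-rt})$ to a series of cumulative cases by scanning $K$ on a grid. For each trial $K$, the transform $\ln(K/N - 1) = \ln A - rt$ is linear in $t$, so $\ln A$ and $r$ follow from ordinary least squares. The $K$ giving the smallest squared error in $N$ is kept. This grid-plus-linearization scheme is an assumption of this example, not necessarily the method of [4]. The input series is synthetic, generated from hypothetical parameters, and is not the Brazilian data.

```python
import math

def logistic(t, K, A, r):
    """Closed-form Logistic solution N(t) = K / (1 + A exp(-r t))."""
    return K / (1.0 + A * math.exp(-r * t))

def fit_logistic(days, cases, n_grid=400, k_max_factor=5.0):
    """Least-squares fit of (K, A, r) by a grid scan over K.

    For a trial K, y = ln(K/N - 1) = ln A - r t is linear in t, so ln A and r
    come from ordinary least squares; the K whose curve has the smallest
    squared error in N (not in y) is returned.
    """
    n_max = max(cases)
    m = len(days)
    t_mean = sum(days) / m
    sxx = sum((t - t_mean) ** 2 for t in days)
    best = None
    for i in range(n_grid):
        # Geometric grid for K between 1.01*max(N) and k_max_factor*max(N).
        K = 1.01 * n_max * (k_max_factor / 1.01) ** (i / (n_grid - 1))
        ys = [math.log(K / n - 1.0) for n in cases]
        y_mean = sum(ys) / m
        slope = sum((t - t_mean) * (y - y_mean) for t, y in zip(days, ys)) / sxx
        r = -slope                                # slope of y versus t is -r
        A = math.exp(y_mean - slope * t_mean)     # intercept is ln A
        sse = sum((logistic(t, K, A, r) - n) ** 2 for t, n in zip(days, cases))
        if best is None or sse < best[0]:
            best = (sse, K, A, r)
    return best[1], best[2], best[3]

# Synthetic series from hypothetical parameters (rounded to whole cases).
days = list(range(49))
cases = [round(logistic(t, 50_000.0, 1_000.0, 1.0 / 6.3)) for t in days]

K_fit, A_fit, r_fit = fit_logistic(days, cases)
print(f"K = {K_fit:.0f}, A = {A_fit:.0f}, tau = 1/r = {1.0 / r_fit:.2f} d")
```

Calling `fit_logistic(days[:n], cases[:n])` for decreasing `n` mimics running the fit backwards in time, as in the Application section. With real, noisy data, a proper nonlinear least-squares solver would normally be preferred over this coarse grid.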
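The "maximum epidemic time" $\bar t$ is estimated as the time at which the maximal daily rate occurs. Differentiating Eq. 1 gives

$$\ddot N = r\dot N\left(1 - \frac{2N}{K}\right), \qquad (5)$$

which vanishes when $N = K/2$, that is, when $A e^{-r\bar t} = 1$:

$$\bar t = \frac{\ln A}{r}, \qquad (6)$$

and at this time $\dot N$ is maximum. Since $N(\bar t) = K/2$,

$$\dot N(\bar t) = \frac{rK}{4}. \qquad (7)$$

At time $\bar t$ the maximum "daily" number of cases is attained. This corresponds to the infection of half the projected carrying capacity, also known as the "epidemic maximum". Therefore, there are two maxima: one associated with the maximum growth rate, at $\bar t$, and another, $N \to K$, which is only achieved as $t \to \infty$.
Data set available at [5].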
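The closed-form results $\bar t = \ln(A)/r$, $N(\bar t) = K/2$ and $\dot N(\bar t) = rK/4$ can be checked against a brute-force search for the maximum of $\dot N$. The Python sketch below does this with hypothetical, illustrative parameters (K = 50 000, A = 1000, τ = 6.3 days).

```python
import math

def logistic(t, K, A, r):
    """Closed-form Logistic solution N(t) = K / (1 + A exp(-r t))."""
    return K / (1.0 + A * math.exp(-r * t))

def daily_rate(t, K, A, r):
    """dN/dt = r A K e^{-rt} / (1 + A e^{-rt})^2."""
    e = A * math.exp(-r * t)
    return r * K * e / (1.0 + e) ** 2

# Hypothetical parameters for illustration only.
K, A, r = 50_000.0, 1_000.0, 1.0 / 6.3

t_bar = math.log(A) / r          # closed-form peak time, ln(A)/r
n_dot_max = r * K / 4.0          # closed-form peak daily rate, rK/4

# Brute-force check: evaluate the rate every 0.001 day over 0 to 120 days.
grid = [i * 1e-3 for i in range(120_001)]
t_peak = max(grid, key=lambda t: daily_rate(t, K, A, r))

print(f"t_bar = {t_bar:.3f} d (grid search: {t_peak:.3f} d)")
print(f"N(t_bar) = {logistic(t_bar, K, A, r):.1f} (K/2 = {K / 2:.1f})")
print(f"max rate = {n_dot_max:.1f} cases/day")
```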

Finding the beginning and end dates
It is possible to define an initial time for the epidemic onset, for example by finding the date of the first reported case. However, the evolution given by Eq. 1 never ends. Therefore, we use the daily rate $\dot N(t)$ of Eq. 2 as an indicator for the beginning and end dates, since it falls off as $t \to \infty$. Let α be a positive real number with $\alpha \ll 1$. Define the function

$$f(t) = \frac{\dot N(t)}{\dot N(\bar t)} = \frac{4A e^{-rt}}{\left(1 + A e^{-rt}\right)^2}. \qquad (8)$$

The extreme dates are then found from

$$f(t) = \alpha, \qquad (9)$$

which admits two solutions, $t_0$ and $t_\infty$, associated with dates close to the start and end dates. In other words, the epidemic starts and ends when the daily number of cases is a fraction α of the maximum daily rate $rK/4$ (Eq. 7). The start date is when $f(t)$ rises above α, while the end occurs when it falls back below α.
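Before solving $f(t) = \alpha$ in closed form, the two roots can be located numerically. Substituting $A = e^{r\bar t}$ shows that $f(t) = \mathrm{sech}^2\!\left(r(t - \bar t)/2\right)$. So $f$ rises monotonically to 1 at $\bar t$ and falls symmetrically afterwards. One root lies on each side of $\bar t$, and a bisection on each side finds $t_0$ and $t_\infty$. The Python sketch below assumes hypothetical values A = 1000 and τ = 6.3 days, with α = 0.5%.

```python
import math

def f(t, A, r):
    """Daily rate normalised by its maximum rK/4: 4 A e^{-rt} / (1 + A e^{-rt})^2."""
    e = A * math.exp(-r * t)
    return 4.0 * e / (1.0 + e) ** 2

def bisect(g, lo, hi, tol=1e-10):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid    # root lies in the upper half
        else:
            hi = mid                 # root lies in the lower half
    return 0.5 * (lo + hi)

A, r, alpha = 1_000.0, 1.0 / 6.3, 0.005   # hypothetical A and tau; alpha = 0.5%
t_bar = math.log(A) / r                    # f(t_bar) = 1 is the peak

g = lambda t: f(t, A, r) - alpha
t0 = bisect(g, t_bar - 200.0, t_bar)       # rising side: onset
t_inf = bisect(g, t_bar, t_bar + 200.0)    # falling side: end
print(f"t0 = {t0:.2f} d, t_inf = {t_inf:.2f} d, duration = {t_inf - t0:.2f} d")
```

The symmetry also yields the compact form $\Delta t = (4/r)\,\mathrm{arccosh}(1/\sqrt{\alpha})$, which is independent of $A$.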
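By setting $x = \exp(-rt)$, Eq. 9 leads to

$$\alpha A^2 x^2 - 2A(2 - \alpha)\,x + \alpha = 0, \qquad (10)$$

with discriminant $\Delta = 16A^2(1 - \alpha)$, i.e. $\sqrt{\Delta} = 4A\sqrt{1 - \alpha}$, and two solutions:

$$x_\pm = \frac{2 - \alpha \pm 2\sqrt{1 - \alpha}}{\alpha A} = \frac{\left(1 \pm \sqrt{1 - \alpha}\right)^2}{\alpha A}. \qquad (11)$$

Now, for $0 < \alpha \ll 1$, $\sqrt{1-\alpha} \approx 1 - \alpha/2$, and approximate values for the times are found from

$$x_+ \approx \frac{4}{\alpha A}, \qquad x_- \approx \frac{\alpha}{4A}. \qquad (12)$$

The total number of cases for each solution is

$$N_+ = \frac{K}{1 + A x_+} \approx \frac{\alpha K}{4}, \qquad N_- = \frac{K}{1 + A x_-} \approx K\left(1 - \frac{\alpha}{4}\right). \qquad (13)$$

Therefore, $t_+ = t_0$ and $t_- = t_\infty$, with $t = -\ln(x)/r$. Defining $\Delta t = t_\infty - t_0$ as the total duration of the epidemic, one finds

$$\Delta t = \frac{1}{r}\ln\frac{x_+}{x_-} = \frac{2}{r}\ln\frac{1 + \sqrt{1-\alpha}}{1 - \sqrt{1-\alpha}} \approx 2\tau\ln\frac{4}{\alpha}, \qquad (14)$$

where the approximation holds for $\alpha \ll 1$. This result depends on neither $K$ nor $A$. Therefore, according to the LM, the total period of the epidemic depends only on the growth rate measured at the beginning of the outbreak.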
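The closed-form roots translate directly into code. The Python sketch below computes $t_0$ and $t_\infty$ from $x_\pm = (1 \pm \sqrt{1-\alpha})^2/(\alpha A)$. It then compares the exact duration $(2/r)\ln[(1+\sqrt{1-\alpha})/(1-\sqrt{1-\alpha})]$ with the approximation $2\tau\ln(4/\alpha)$. Finally, it shows that changing $A$ by several orders of magnitude shifts both dates but leaves $\Delta t$ unchanged. The values τ = 6.3 days and α = 0.5% match the Brazilian example; the three values of A are hypothetical.

```python
import math

def extreme_dates(A, r, alpha):
    """(t0, t_inf) from the two roots of alpha*A^2 x^2 - 2A(2 - alpha) x + alpha = 0."""
    s = math.sqrt(1.0 - alpha)
    x_plus = (1.0 + s) ** 2 / (alpha * A)    # larger root  -> earlier time t0
    x_minus = (1.0 - s) ** 2 / (alpha * A)   # smaller root -> later time t_inf
    return -math.log(x_plus) / r, -math.log(x_minus) / r   # t = -ln(x)/r

def duration_exact(r, alpha):
    """Delta t = (2/r) ln[(1 + sqrt(1 - alpha)) / (1 - sqrt(1 - alpha))]."""
    s = math.sqrt(1.0 - alpha)
    return (2.0 / r) * math.log((1.0 + s) / (1.0 - s))

def duration_approx(r, alpha):
    """Small-alpha form Delta t ≈ 2 tau ln(4/alpha), with tau = 1/r."""
    return (2.0 / r) * math.log(4.0 / alpha)

r, alpha = 1.0 / 6.3, 0.005            # tau = 6.3 d, alpha = 0.5%
results = []
for A in (10.0, 1_000.0, 1e6):         # hypothetical, very different initial conditions
    t0, t_inf = extreme_dates(A, r, alpha)
    results.append((A, t0, t_inf))
    print(f"A = {A:9.0f}: t0 = {t0:7.2f} d, t_inf = {t_inf:7.2f} d, duration = {t_inf - t0:.2f} d")
print(f"exact: {duration_exact(r, alpha):.2f} d, approx 2*tau*ln(4/alpha): {duration_approx(r, alpha):.2f} d")
```

For small $A$, $t_0$ can even fall before $t = 0$. Only the duration is invariant, not the dates themselves.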

Application
In practice, the value of $r$ varies between populations. It is also strongly affected by how well the LM assumptions hold when the public number of cases is used to describe the evolution of Covid-19. For the example shown in Fig. 1, τ = 6.3 days, which represents an average growth rate fitted using all data available on that date (April 16th, 2020). Let $n$ be the observation date, so that data points 1 to $n$ are available. The fitting procedure described in [4] then produces a sequence $\{K(n), A(n), r(n)\}$ and, therefore, $t_0(n)$ and $t_\infty(n)$, which evolve with $n$. At the time of writing, the total number of cases for Brazil was available until April 16th ($n = 49$, Fig. 1). Running the least-squares fit backwards in time for $n = \{49, 48, 47, 46, 45, 44, 43\}$ gave the sequences $t_0 = \{3, 2, 4, 4, 3, 4, 5\}$ and $t_\infty = \{87, 86, 83, 83, 84, 83, 79\}$. The value of α is 0.5%, and a fixed time window with $m = 12$ is used (the value of $m$ is defined in [4]). The date $n = 1$ corresponds to February 27th, while $n = 85$ is May 21st. Thus a total duration of ≈ 80 days is predicted for Brazil, with the epidemic ending by the end of May. The maximum daily rate, $rK/4$, is $\dot N_{max} \approx 1900$ cases per day. The epidemic threshold for daily cases in Brazil given α is therefore $\alpha \dot N_{max} \approx 10$.
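The numbers quoted for Brazil can be reproduced with a few lines of arithmetic. The Python sketch below converts observation indices to calendar dates, taking $n = 1$ as February 27th, 2020, as stated above. It also evaluates $\Delta t \approx 2\tau\ln(4/\alpha)$ for τ = 6.3 days and α = 0.5%, and computes the daily-case threshold $\alpha\dot N_{max}$. It assumes that $t_0$ and $t_\infty$ are expressed on the same day index $n$; that convention is an assumption of this example.

```python
import math
from datetime import date, timedelta

DAY_ONE = date(2020, 2, 27)   # n = 1 in the Brazilian series, as stated in the text

def day_to_date(n):
    """Calendar date of observation index n (n = 1 is February 27th, 2020)."""
    return DAY_ONE + timedelta(days=n - 1)

tau, alpha = 6.3, 0.005          # values quoted for Brazil (April 16th fit)
n_dot_max = 1900.0               # quoted peak daily cases, rK/4

duration = 2.0 * tau * math.log(4.0 / alpha)   # Delta t ≈ 2 tau ln(4/alpha)
threshold = alpha * n_dot_max                  # daily cases marking onset and end

t0_seq = [3, 2, 4, 4, 3, 4, 5]            # t0(n) for n = 49, 48, ..., 43
t_inf_seq = [87, 86, 83, 83, 84, 83, 79]  # t_inf(n) for the same n
for t0, t_inf in zip(t0_seq, t_inf_seq):
    print(f"onset {day_to_date(t0)}  end {day_to_date(t_inf)}  span {t_inf - t0} d")
print(f"duration ≈ {duration:.1f} d, threshold ≈ {threshold:.1f} cases/day")
```

With these inputs the formula gives about 84 days. This is slightly longer than the ≈ 80-day estimate, since the individual $t_\infty - t_0$ values range from 74 to 84 days across the backward fits.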
Results are shown in Table 1 for S. Korea, Brazil, Italy and Germany using data available on the HDE website [5]. As can be seen, most countries have total predicted periods in the range of 84 to 90 days. For a given α the total duration depends only on τ, so the value of τ is nearly the same for these countries (≈ 6.5 days). South Korea is the only country for which the value of $K$ has converged; for the others it may still change with time. The predicted final dates are nearly the same for Brazil and Germany and are expected by the end of May. Since the onset dates are determined, one can plot the evolution of the number of cases with day "0" corresponding to $t_0$ of each country, as a kind of time normalization. Fig. 2 reveals some similarities between the curves of Germany and Brazil, as suggested by the parameters in Table 1.
Figure 2: Number of cases with the dates of Table 1 adjusted for $t_0$: Germany, Italy, Brazil and S. Korea (+).
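Two small operations underlie this paragraph. The first inverts $\Delta t \approx 2\tau\ln(4/\alpha)$ to read τ off a predicted duration. The second shifts each country's series so that day 0 is its $t_0$. The Python sketch below does both. It assumes that α = 0.5% also applies to Table 1, since the text states this value only for Brazil. The three-point series is a hypothetical toy input, not real data.

```python
import math

ALPHA = 0.005   # assumed: same alpha = 0.5% as in the Brazil example

def tau_from_duration(delta_t, alpha=ALPHA):
    """Invert Delta t ≈ 2 tau ln(4/alpha) to get the time constant tau in days."""
    return delta_t / (2.0 * math.log(4.0 / alpha))

def normalise_time(days, cases, t0):
    """Re-index a cumulative-case series so that day 0 is the onset t0."""
    return [(t - t0, n) for t, n in zip(days, cases)]

for dt in (84.0, 90.0):
    print(f"duration = {dt:.0f} d  ->  tau ≈ {tau_from_duration(dt):.2f} d")

# Hypothetical toy series (not real data), with onset estimated at t0 = 3.
print(normalise_time([3, 4, 5], [12, 20, 31], t0=3))
```

Under this assumption, durations of 84 to 90 days correspond to τ of roughly 6.3 to 6.7 days, consistent with τ ≈ 6.5 days.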

Conclusions
The assumptions of this note are the following. The LM assumes a closed population; however, it is used here as a function whose parameters characterize the intensity and duration of the outbreak, under the hypothesis of a single outbreak.
To a certain extent, the LM can be fitted to the curves of the number of cases observed in several countries as the present Covid-19 epidemic unfolds.
The accuracy of the approximation for the beginning and end times depends strongly on how data are retrieved and updated. As time goes by and new data are fed into the disease records, the beginning and end times may change.
We derive a simple relation for the total period of the epidemic that depends only on the factor $r$, which is associated with the time evolution of the number of cases in the early phase of the disease. The dates and total periods obtained offer a quantitative estimate of the epidemic dynamics. Under the approximation used, and given the calculated onset dates, we can compare the evolution of the crude number of cases and the epidemic duration for several countries. The same procedure can be applied to small demographic clusters such as states and cities.