A compartmental epidemiological model for the dissemination of the COVID-19 disease

A compartmental epidemiological model with seven groups is introduced herein, to account for the dissemination of diseases similar to the Coronavirus disease 2019 (COVID-19). In its simplified version, the model contains ten parameters, four of which relate to characteristics of the virus, whereas another four are transition probabilities between the groups; the last two parameters enable the empirical modelling of the effective transmissibility, associated in this study with the cumulative number of fatalities due to the disease within each country. The application of the model to the fatality data (the main input herein) of five countries (to be specific, of those which had suffered most fatalities by April 30, 2020) enabled the extraction of an estimate for the basic reproduction number R0 for the COVID-19 disease: R0 = 4.91(33).


Introduction
The dissemination of the Coronavirus disease 2019 (COVID- 19), an infectious disease caused by the 'Severe Acute Respiratory Syndrome Coronavirus 2' pathogen (SARS-CoV-2), has become a major global concern; on March 11, 2020, the World Health Organisation (WHO) declared a pandemia. At the time this work is publicised, no formal treatment or vaccine exists; to alleviate the suffering in the severe and critical cases, a number of empirical treatments are applied, and medicines, which had been developed against other diseases, are administered. The onset of symptoms (fever, cough, difficulty in breathing) occurs within a few days of the exposure. The (mild) first phase of the disease is frequently followed by a more problematic second phase, when complications (pneumonia, acute respiratory distress syndrome, kidney failure, etc.) arise and may prove life-threatening for the patient.
The disease is transmitted via human contact, when respiratory droplets are

The model
The first compartmental models in Epidemiology are approximately one century old. A good review of the efforts to model the dissemination of infectious diseases may be found in Ref. [2]. This note will be kept as short as possible, containing only what (in my judgement) is relevant to the model which is put forward in this study in order to account for the dissemination of the COVID-19 disease (and of similar diseases). The transfer diagram of this model is displayed in Fig. 1.

Definitions
At a given time, each subject is categorised into one of the following seven groups: • The 'susceptible' subjects (S) are those who can be infected.
• The 'exposed' subjects (E) are those who have been infected, but are not yet infective themselves. • The 'asymptomatic infective' subjects (I 0 ) are those who have been infected, but -whichever the reason -are not symptomatic. • The 'symptomatic phase-I infective' subjects (I 1 ) are infective subjects who are developing weak symptoms. • The 'symptomatic phase-II infective' subjects (I 2 ) are infective subjects who are developing strong symptoms (necessitating hospitalisation). • The 'immune' or 'unsusceptible' subjects (S) are those who have recovered with immunity, as well as those who might have been immune to the disease in the first place. • The 'deceased' subjects (D) are those who did not survive the phase II of the disease.
Evidently, these seven populations 1 depend on time t. Two remarks need to be made. a) At this time, it remains unknown whether the asymptomatic subjects are infectious [3], how much infectious they might be, and for how long; they will be treated here as equally infectious (and for an equally long temporal interval) as if they had been I 1 subjects (who recover from the disease). b) It remains unknown whether (at least in some cases) recovery from the COVID-19 disease induces immunity against the virus; also unknown is for how long such immunity might last. In this form (to be referred to as 'simplified version' henceforth), the model contains neither the vital dynamics (i.e., no births or deaths due to other causes) nor the effects of vaccination; the full version of the model, containing both effects, is outlined in Appendix A. Like most compartmental models, this model may be applied to the entire human population within a region of interest (identified herein as one sovereign country) or to subgroups of that population.
The discussion will be facilitated if, prior to entering the details, some additional definitions are given. As far as the early development of the disease for one infected subject is concerned, three are the important moments: • the instant t 1 at which the subject becomes infected, • the instant t 2 at which the subject becomes infective, and • the instant t 3 marking the onset of symptoms.
(For some diseases, t 2 > t 3 : an infected subject becomes infective after the onset of symptoms. In the case of the COVID-19 disease, there are indications of pre-symptomatic transmission.) Were the development of such diseases identical for all infected subjects, an infector and a related infectee would follow time-shifted, yet identical, timelines: if t 1 , t 2 , and t 3 were the infector's timeline characteristics, those of the infectee would be: t ′ 1 = t 1 + ∆t, t ′ 2 = t 2 + ∆t, and t ′ 3 = t 3 + ∆t, where the constant temporal interval ∆t ≥ t 2 − t 1 (and smaller than the temporal interval during which the infector remains infectious). However, neither the infection-to-infectiousness (t 2 −t 1 ) nor the incubation (t 3 −t 1 ) interval may be thought of as δ-distributed quantities; in case of the COVID-19 disease, broad distributions have been established. To describe the characteristics of the dissemination of infectious diseases, Epidemiology makes use of three temporal intervals: • the incubation interval t 3 − t 1 (defined for each subject), • the onset-to-onset interval (serial interval) t ′ 3 − t 3 (defined for each infectorinfectee pair), and • the infection-to-infection interval (generation interval) t ′ 1 − t 1 (defined for each infector-infectee pair).
One would expect that a) these three intervals are positive and b) the serial and generation intervals are equal. In reality, only the incubation and generation intervals are positive; negative estimates for the serial interval have appeared, implying that -on occasion -the infectee develops symptoms before the infector does; in the case of COVID-19, a number of studies [4][5][6] report on distributions of the serial interval with substantial negative tails.

The model parameters
In its simplified version of Fig. 1, the model contains ten parameters, and applies to the short-term (spanning temporal intervals of -typically -a few months) dissemination of diseases (or recurring cycles of diseases) for which no vaccines are available. To be able to model the dissemination of diseases which develop over longer time scales, the model would need enhancement, see Appendix A.
The parameters in the simplified version of the model are: • Parameter α is the rate parameter involved in the exponential decrease of the population of the exposed E, provided that the transmission process ceases. As very few results on the generation interval (which should be associated with α) are available at this time, α will be taken to be the inverse of the average serial interval. • Parameter β (infection rate). Being the product of the average number of contacts per subject per unit time and the probability of virus transmission in a contact between an infective and a susceptible subject, this parameter may be thought of as the propellant in the outbreak of the disease. • Parameters γ 1,2 . These two parameters regulate the rates at which the populations I 1 and I 2 , respectively, would diminish, were they not replenished by new infections. The decrease in the I 1 population is due either to the recovery of the subject or to the development of strong symptoms (in which case, the subject enters the phase II of the disease and the I 2 population). The decrease in the I 2 population is due either to the subject's recovery or to the subject's death. 2 A critical review of the existing literature for the serial interval may be found in Ref. [15], which concludes that the value of about 6.6 d has the best chances to account for the data from European countries. Although the argumentation in Ref. [15] (e.g., on the variability of the social contact across countries) may be valid, I decided to estimate the average serial interval using 'world' data. 3 Tindale and collaborators concluded that the infection was transmitted (on average) 2.55 d before the symptom onset in Singapore and 2.89 d before the symptom onset in Tianjin, China [11]; a similar conclusion was drawn by Ganyani and collaborators [5]. On the contrary, Zhang and collaborators found no evidence that the average incubation and serial intervals significantly differ [13].
• Parameter p 0 is the fraction of the exposed who become asymptomatic infective subjects (E → I 0 ). The remaining subjects, departing from group E, enter the I 1 population. • Parameter p 1 is the fraction of the symptomatic phase-I infective subjects who enter the phase II of the disease (I 1 → I 2 ). • Parameter p 2 is the fraction of the recoveries without immunity. • Parameter p 3 is the fatality rate, defined here 4 as the fraction of the symptomatic phase-II infective subjects who do not recover (I 2 → D).
Two remarks are due. a) From the point of view of the infectiousness and the recovery rate, the asymptomatic infective subjects are treated as if they were symptomatic phase-I infective subjects; unlike in Ref. [19], it will not be assumed here that the I 0 subjects are less infectious than the subjects of the other two infective groups I 1 and I 2 . b) The recoveries without immunity involve the same probability (p 2 ) regardless of the type of the recovered subject (i.e., whether that subject originally belonged to the I 0 , I 1 , or I 2 group).
The most essential factor in the dissemination of a disease is how the transmission compares to the recovery rate. Evidently, if the former is larger, the disease spreads; otherwise, it dies off with time. The aggressiveness of a virus is encoded in the ratio β/γ, identified in Epidemiology as the basic reproduction number R 0 (also known as transmissibility). In this study, R 0 will be the basic reproduction number, the property associated with the free dissemination of the disease, i.e., in the absence of any mitigation measures or (in general) enforced/spontaneous modification of the social behaviour of the subjects. On the contrary, the transmissibility of the disease in an environment where restrictions apply (either due to measures imposed on the social activities of the subjects or due to modification of the social behaviour of the subjects as the disease spreads) will be the effective transmissibility, a time-dependent quantity, and will be denoted by R. The effectiveness of the mitigation measures in controlling the dissemination of the COVID-19 disease has been addressed in Ref. [20].
Under these definitions, it follows that a disease tends to spread if R > 1, whereas it tends to wane if R < 1. The sole aim of the mitigation measures is to bring R to values below 1, and -desirably -make it as small as possible.
The only parameter which pertains to human behaviour (and may therefore be modified) is β. Isolation and restrictions in movement aim at reducing the number of contacts per subject per unit time, whereas the social distancing aims at reducing the probability of the virus transmission per contact. Either way, β is reduced, resulting (under constant γ) in a reduction in R. In this work, R will be identified as the ratio β/γ 1 , where β may be thought of as the effective infection rate, as opposed to the constant β 0 which is a characteristic of each virus.
At the beginning of this section, it was mentioned that the simplified version of the model contains ten parameters, yet eight have been listed thus far. The last two parameters enter the empirical modelling of the effective transmissibility.
The effect of the dissemination of a life-threatening disease is the inevitable increase in the cumulative number of fatalities with increasing time. Mitigation measures aim at disrupting the transmission process of the virus, hence at reducing the number of new infections, hence at reducing the number of new fatalities. Evidently, the reduction in transmissibility may be thought of as due to the imposition of the mitigation measures or to modification of the social behaviour of the subjects as a result of the rise in the exposure to danger, intensifying with increasing cumulative number of fatalities. My thesis is that what drives the reduction in transmissibility, the cause or the effect, is of no relevance: the important issue is the association between the reduction in transmissibility and the cumulative number of fatalities within each country.
To be able to compare the results across countries, the number of fatalities per country must be normalised to the population of that country; equivalently, one may reduce the number of fatalities to a fixed population, say n = 100 000 subjects, an option which will be followed here. If the cumulative number of fatalities in the initial population N(0) is D, then the number of fatalities per n subjects is d = Dn/N(0). The factor (to be applied to R 0 in order that R(t), equivalently β(t), be extracted) should be 1 at t = 0 (in the analysis, D(0) = 0) and continue to be equal to that value provided that D < 1. One may argue that the reduction in transmissibility commences at the moment of the first fatality, i.e., when d = d min = n/N(0). The simple form applicable for d > d min , was found to work reasonably well. The quantities R f and ζ are adjustable (free) parameters; R f would be the transmissibility if d could reach infinity; ζ −1 is related to the number of fatalities (per 100 000 subjects) which would be required so that the transmissibility drop to the average of R 0 and R f . One additional consideration, however inessential as (fortunately) d ≪ n in this work, must be taken into account. In reality, R(n) must identically vanish, as the equality d = n implies that there are no susceptible subjects to whom the disease could be transmitted. To be able to take this condition into account, the aforementioned curve R(d) for R f ≥ 0 will be followed up to the point d = d 0 from where the tangent to the aforementioned curve R(d) passes through the point (n, 0); that tangent will then be followed from d = d 0 up to the point (n, 0). (Of course, R(d) = 0 for d > n.) If R f < 0, the aforementioned curve R(d) will be followed up to the minimal value between the 0-crossing d min − ζ −1 ln (−R f /(R 0 − R f )) and n, above which R(d) will be set to 0. The quantity d 0 may be obtained iteratively when R f > 0 (the convergence is very fast), using the scheme whereas analytical solutions are available for R f ≤ 0. It needs to be emphasised that this correction to Eq. (1) was applied on account of principle and mathematical rigour; to all intents and practical purposes, Eq. (1) suffices.
All model parameters (save for R f ) are non-negative; furthermore, the probabilities p 0,...,3 ≤ 1. In principle, R f > 0, though it cannot be excluded that, R f being an effective parameter, some data might prefer a negative R f value.

Time evolution
The transfer diagram of Fig. 1 leads to the following system of first-order ordinary differential equations (ODEs).
where the dots indicate time derivatives, I = I 0 + I 1 + I 2 is the total number of infective subjects, and N = S +S + E + I is the total number of subjects who are alive at a given time. For fixed parameters, the time evolution of the seven populations may be obtained via standard methods. The fourth-order Runge-Kutta method with a constant increment of 10 −4 d (i.e., 8.64 s) was used, see Ref. [21], pp. 907-910.

Data
To fix the model parameters and be able to make reliable predictions, one needs -in addition to a successful modelling scheme -reliable (and accurate) relevant experimental data. At first, I thought that the number of new infections in each country could be used as input, but this option was abandoned, as it gradually became clearer that the confirmed cases represent a fraction of the true amount of the infective subjects I; the I 0 subjects will not visit a doctor (let alone take any test), and the same goes for many of those who belong to the I 1 group: weak symptoms point to common cold or influenza. I therefore came to disregard the number of new infections as providing credible information.
(For the same reason, I find the recovery rates equally unreliable.) There are two additional reasons why the new infections are not fit for use.
• During the outbreak of the disease, the number of diagnostic tests, performed daily within each country, generally increased with time. In an environment where not everyone can be tested, the consequence of an increasing number of tests from day to day is an increasing number of new infections, even for a constant number of infected subjects. As there is little or no information about how many tests are performed in each country on a specific date, infection rates (positive tests normalised to the total number of tests performed) can hardly be estimated. In addition, 'quantum steps' in the number of tests from day to day (e.g., due to the availability of a more effective test on a specific date) introduce discontinuities in the distribution of the new infections and reduce the reliability of these data further. One may argue that, unless reliable tests are made on random samples of the population in each country (in accordance with the principles of Sampling Theory), the fraction of the infective subjects (in one country, at one time) will remain unknown. • New infections refer to the time when tests are made and turn out positive, not to the time when the infections truly occur. A symptomatic subject will show the first symptoms after the (broadly-distributed) incubation interval elapses, and might wait for a few additional days (during the phase I of the disease) before seeking medical consultation; it is unlikely that any tests are performed within one week of the infection date. As a result, a delay of several days is certainly to be expected between the time instants when new infections truly occur and when they are discovered, consequently reported.
To summarise, yesterday's reported new infections are not 'new' at all; they are one to two weeks 'old', perhaps even older. As a result, any analysis of new-infection data must account for this delay.
It then dawned on me that, in all probability, the most reliable piece of information in each country is the number of new fatalities or their cumulative number D. I decided to make use of the latter and apply the model in the case of the first five countries on the list of the death toll: the US, Italy, the UK, Spain, and France [22]; by mid April, each of these countries had reported over 10 000 fatalities. In the case of France, the onset corresponds to the second fatality 5 . The input-cutoff date in this study is April 30; for the sake of verification, the input data were inspected on May 4, May 28, and June 8. The last inspection revealed extensive corrections in the case of the dataset from the US, spanning several weeks, and also affecting the entries in April [22]. Due to this revision, the number of fatalities by the input-cutoff date were increased (in that dataset) by 162. As a consequence, all fits to the data from the US had to be (and were) made anew; this version of the study represents the up-to-date status, also taking into account the corrections applied to the input dataset from the US sometime between May 28 and June 8.
There are two reasons why I avoided using the global data on the fatalities due to the disease.
• The mitigation measures against the dissemination of the disease were taken by sovereign countries, not at the international level. • The disease reached different parts of the globe at different times. In order to make meaningful use of the global data, one would have to shift each dataset in time. However, even this operation might not be sufficient.
To explain quickly why I do not consider the use of the global data (even after the shifting in time) meaningful, I will refer to the cumulative distribution of new infections and fatalities in Greece and in Switzerland, two European countries with comparable population; these two countries reported their first infections with one day of difference (February 26 and 25, respectively) and their first fatalities within one week (March 12 and 5, respectively). By April 30, 2 591 subjects had been confirmed as COVID-19-positive in Greece; 29 586 in Switzerland. By that time, Greece counted 140 dead; Switzerland 1 737. With hindsight, the dissemination of the disease in each country depends on a variety of factors, namely the timing of the mitigation measures, the efficiency of the healthcare system, the availability of tests, the population density, the social behaviour, the level of cross-border labour mobility (which, in my judgement, had the biggest impact in the aforementioned comparison between Greece and Switzerland), etc. Although the same argument also holds for the counties, provinces, departments, states within one country, their differences are expected to be less prominent than those across sovereign countries.
There are some unexplained features in the data, as they are given in Ref.

Optimisation
For the optimisation, the MINUIT software package [24] of the CERN library was used. The MINUIT software package is the standard in Particle Physics, and its robustness has been tested and proved during four decades of application. Other of its advantages include that it is freeware, it is downloadable from a safe site, and it is available in two versions: FORTRAN (the version used here) and C/C++.
Each fit involved four minimisation methods: SIMPLEX, MINIMIZE, MI-GRAD, and MINOS.
• SIMPLEX uses the simplex method of Nelder and Mead.
• MINIMIZE minimises the user-defined function, see Eq. (3), by calling MI-GRAD, but reverts to SIMPLEX in case that MIGRAD fails to converge. • MIGRAD is the workhorse of the MINUIT software package. It is a variablemetric method, also checking for the positive-definiteness of the Hessian matrix. • MINOS performs a detailed error analysis, separately for each free parameter. Given that it takes into account the non-linearities in the problem, as well as the correlations among the model parameters, MINOS yields reliable estimates for the fitted uncertainties.
For a fixed parameter vector and initial populations S(0),S(0), E(0), I 0 (0), I 1 (0), I 2 (0), and D(0), the system of ODEs of Eq. (2) is solved, yielding the seven populations S(t),S(t), E(t), I 0 (t), I 1 (t), I 2 (t), and D(t) up to time t, representing the sum of two temporal intervals: the first interval is the time required for D(t) to reach the first fatality/fatalities in the input dataset, whereas the second interval reflects the temporal span of the specific dataset.
As the fatalities are reported daily, the following minimisation function has been implemented: where D th i is the fitted value (corresponding to the solution D(t) for the i-th day in the dataset), D exp  3) is submitted to minimisation and the free parameters are varied (internally by MINUIT) until the χ 2 minimum is reached. Apart from the important results of each fit (i.e., the minimal χ 2 value, the fitted values and uncertainties of the model parameters, the Hessian matrix), the MINUIT software package also outputs (by default) a full report, containing the important results obtained in the intermediate steps of the optimisation. To ascertain the successful termination of the application and the convergence of its methods, the report file was routinely inspected.

Results
In this study, all results are accompanied by 1σ uncertainties, the standard in Physics, representing a confidence interval (CI) of ≈ 68.27%. To transform the results into corresponding to 95% CI, a routinely-used value in several other scientific domains, one must multiply the quoted uncertainties by ≈ 1.96. If exceeding 1, the Birge factor χ 2 /NDF, which accounts for the quality of each fit, was applied to the fitted uncertainties; NDF denotes the number of degrees of freedom in each case, i.e., the number of input datapoints minus the number of free parameters in that fit.

Fixation of the model parameters
As only the cumulative number of fatalities will be fitted to, efforts need to be made to fix as many parameters as possible from external sources; the data cannot determine more than three parameters (and, of course, also the question of which three parameters are varied is relevant).
• Parameter α. Using the average serial interval, extracted from Refs. [4][5][6][7]11,13,[16][17][18] (see end of Section 2.1), one may obtain: α = 0.230(13) d −1 . • Under constant γ 1 , the variation of R 0 is equivalent to that of the model parameter β. A compilation of early R 0 estimates in the case of the COVID-19 disease may be found in Ref. [25]. In Ref. [ [18], the average value 1.94 being accompanied by a 1σ uncertainty well below 0.1. Although an R 0 value was not explicitly reported in Ref. [28], one may obtain R 0 ≈ 3.42 from the authors' Fig. 2, along with an estimated 1σ uncertainty of ≈ 0.07, i.e., a value which appears to be compatible with the effective transmissibility values, extracted in Ref. [20] using data from Wuhan, China, before January 23 (see Fig. 4 therein). Finally, assuming a serial interval between 6 and 9 d, a broader R 0 distribution was favoured in Ref. [14], the R 0 range extending between 3.8 and 8.9 with 95% confidence.
With such an abundance of R 0 results, one might have expected that the fixation of this parameter would be straightforward. Unfortunately however, the minimal χ 2 value of the reproduction of these R 0 results by one constant is about 413.77 for seven degrees of freedom, or 59.11 per degree of freedom, enormous by all standards; one can hardly believe that the aforementioned studies report on the same quantity. Undoubtedly, the R 0 results are model-dependent; consequently, R 0 must be treated as a free parameter.
In the optimisation, the (unweighted) average of 3.3 of the values reported in Refs. [7,11,14,18,[26][27][28][29] will be used as initial guess for R 0 , and the square root of the unbiased variance of the same values (1.8) as a generous initial guess for the R 0 uncertainty. • Parameters γ 1,2 . A ballpark estimate of the values of these two parameters was contained in the WHO Director-General's opening remarks at a briefing on February 24 [30]: "for people with mild disease, recovery time is about two weeks, while people with severe or critical disease recover within three to six weeks." To obtain the recovery interval in the case of weak symptoms, four estimates for the time between the onset of the illness and hospitalisation have been used; the hospitalisation was taken as indication that the phase I of the disease was completed without obvious recovery. Of the four estimates, -two came from Ref. [7]: average value 12.5 d, ranging from 10.3 to 14.8 d with 95% confidence (data before January 1), and average value 9.1 d, ranging from 8.6 to 9.7 d with 95% confidence (data between January 1 and 11); -one from Ref. [13]: average value 4.4 d, entire distribution from 1 310 cases ranging from 0.0 to 14.0 d with 95% confidence (data between December 24 and 27, 2019); and -one from Ref. [14]: average value 5.5 d, ranging from 4.6 to 6.6 d with 95% confidence (data before January 18). One thus obtains a recovery interval for the phase I of the disease, T 1 = 4.98(90) d, and from that result: γ 1 = 0.201(36) d −1 . To extract an estimate for γ 2 , five results for the time between the onset of the illness and the conclusion of the hospitalisation, i.e., recovery or death 6 of the subject, were used: -one from Ref. [9]: average value 15.0 d, ranging from 12.8 to 17.5 d with 95% confidence (data from hospitalisation to death); -two from Ref. [31]: average value 18.0 d, entire distribution from 54 cases ranging from 15.0 to 22.0 d with 50% confidence (data from hospitalisation to death), and average value 22.0 d, entire distribution from 137 cases ranging from 18.0 to 25.0 d with 50% confidence (data from hospitalisation to recovery); -one from Ref. [18]: average value 20.0 d, ranging from 17.0 to 24.0 d with 95% confidence (data from hospitalisation to death); and -one from Ref. [14]: average value 16.1 d, ranging from 13.1 to 20.2 d with 95% confidence (data from hospitalisation to death). The weighted average of T t = 20.2(1.2) d is thus extracted. To be able to obtain γ 2 , one must subtract from this estimate the recovery interval of the phase I, in which case the characteristic time scale for the development of the phase II of the disease would be 15.2(1.5) d, which finally yields γ 2 = 0.0658(66) d −1 .
• Parameter p 0 . To the best of my knowledge, Mizumoto, Kagaya, Zarebski, and Chowell [3] were the first to report on p 0 ; their result came from an analysis of data acquired on board of the Diamond Princess cruise ship. In their work, the number of true asymptomatic was inferred from the time evolution of the ratio of the confirmed subjects who had developed symptoms (by a given time) to those who had not; their result was a true asymptomatic population of 113.3 out of 634 confirmed cases, resulting in p 0 = 17.9(1.2)%. More recently, the results of an analysis of five studies of the fraction of the asymptomatic subjects appeared [32]: the authors came up with a total of 65 asymptomatic subjects out of 413 confirmed cases, yielding p 0 = 15.7(1.8)%. Although the two p 0 results agree within the uncertainties, the estimate of Ref. [3] was not used in Ref. [32] on account of principle (as aforementioned, the true asymptomatic population was inferred in Ref. [3], not established via medical examination). I will extract a p 0 estimate from all six studies: summing up the entries of Refs. [3,32], one obtains 178.3(10.7) expected true asymptomatic subjects in a sample of 1 047 confirmed cases, yielding p 0 = 17.0(1.0)%. • Parameter p 1 . According to Ref. [33], 8 255 of 44 415 confirmed cases exhibited strong symptoms; this corresponds to 18.59(18)% of the confirmed cases. A p 1 result was also obtained in Ref. [10]: 173 out of 1 099 confirmed cases exhibited strong symptoms; their p 1 value (15.7(1.1)%) is somewhat lower than that of Ref. [33]. By summing up the corresponding entries in these two studies, one obtains p 1 = 18.52(18)%. • Parameter p 2 . Very little is known about this parameter, but (owing to the shortness of the follow-up time -the outstanding majority of the originally susceptible subjects remain unexposed) its impact on the results is deemed minor. According to a recent scientific communication by the WHO [34], "there is currently no evidence that people who have recovered from COVID-19 and have antibodies are protected from a second infection." Given the expected near insensitivity of the main results of this study to p 2 , the parameter was originally fixed to 1 (no recovery with immunity). However, I decided to opt for p 2 = 0.90(5), allowing for a small fraction of recoveries with immunity. Although the odds are not in favour 7 , knowing more about p 2 will be helpful in fighting the long-term development of the disease (i.e., potential recurrent cycles of the pandemia). • Parameter p 3 . According to Ref. [33], 1 023 out of 44 672 confirmed cases resulted in the subject's death. As 8 255 out of 44 415 confirmed cases developed strong symptoms, the probability p 3 is 1 023 · 44 415/(8 255 · 44 672) = 12.32(40)%.
• Parameters R f and ζ will be varied. After analysing the results of ten fits to the five datasets of this work, the initial guesses for the average value and uncertainty of the parameter R f were fixed to 0.22 and 0.20, respectively; of the parameter ζ to 0.78 and 0.40, respectively.
One last remark is due. In Ref. [35], a simple compartmental model with three groups (SIR) was proposed for the dissemination of the COVID-19 disease.
Given the two recovery rates of this work, the recovery rate γ of Ref. [35] may be thought as an effective one, related to the quantities γ 1,2 (hence to T 1 and T t ) of this work. To obtain an effective recovery rate herein, one must first obtain an effective recovery time, e.g., T eff = (1 − p 1 )T 1 + p 1 T t ; using the average values and uncertainties of T 1 , T t , and p 1 , as detailed above, one extracts: T eff = 7.80(77) d. The suggested value in Ref. [35], namely 8.05 d, is in good agreement with the T eff result of this work.
In summary, all the model parameters can be fixed from external sources, save for those three (to be specific, R 0 , R f , and ζ) which determine the effective transmissibility.

Description of the input data, optimal values of the free parameters
Each call to a MINUIT method generates a growth scenario; in each case, the initial populationsS, I 0 , I 1 , I 2 , and D are set to 0, and the number of exposed subjects E(0) is set to 10 −7 N(0); of course, S(0) = (1 − 10 −7 )N(0). (The populations N(0) of the five countries are fixed to estimates obtained in 2019 or 2020.) Therefore, the disease is assumed to start spreading (at t = 0) from ≈ 33 exposed subjects in the US, from 6 in Italy, etc. Either these subjects became infected in their respective countries (e.g., via contacts with infective tourists) or after a trip abroad; the details of the exposures are irrelevant. The system of ODEs of Eq.(2) is solved, yielding S(t), E(t), I 0 (t), I 1 (t), I 2 (t),S(t), and D(t) at t > 0; after the first fatality, the effective transmissibility R(d) of Eq.
(1) is used in the calculation.
The important results of the fits to the five datasets of this study are given in Table 1 Table 1 and Figs. 2-4 demonstrate, the quality of the fits is poor in the pure statistical sense. A few remarks on the five datasets follow.
• Regarding the data from the US, an undulating behaviour is noticeable in the reported number of new fatalities (the same goes for the corresponding data from the UK), with typically 3 − 5 d of systematically more reported fatalities, followed by 2 − 3 d of fewer incidents. • The dataset from Italy is best described with the model. • The data from Spain pose some questions; the transmission of the virus appears to have been faster there: the number of fatalities rises more sharply with time (than in the other four countries) and approaches saturation faster. • There is considerable fluctuation in the data from France; that dataset is worst described with the model. The data in the first half of April contain several spikes. The result of these fluctuations on the description of the input data is an unavoidable rise in the χ 2 values of the fits.
There are established ways to smooth 'noisy' data (e.g., LOESS, LOWESS, the Savitzky-Golay filter, the application of running-average windows, etc.). However, I decided to avoid applying any kind of smoothing algorithm and to continue instead to use the original values, as they appear in Ref. [22].
The analysis was repeated after removing from the input datasets the datapoints with fewer than 100 fatalities. This way, the relative uncertainty in each input datapoint does not exceed 10%. It was found that this cut has no sizeable impact on the results, see Table 1, lower part. Therefore, the low-N fluctuations cannot be blamed for the general poorness of the quality of the fits.
To be able to include the effects of the variation of the parameters α, γ 1,2 , and p 0,...,3 in the results for the free parameters, the following procedure was Table 1 The results of the fits of the model to the five datasets used as input (upper part: results from the analysis of the entire datasets; lower part: accepted in the analysis were only the entries exceeding 99 fatalities). The countries are listed in descending order of the fatalities reported by the input-cutoff date in this study (April 30). The first column identifies the country. The second column represents the cumulative number of fatalities D, whereas the adjacent column gives the number of fatalities d per 100 000 subjects (see end of Section 2.2) in that country. The last four columns contain the reduced χ 2 values of the each fit (χ 2 per degree of freedom), and the fitted values and uncertainties of the free parameters R 0 , R f , and ζ.
(The fitted uncertainties of the parameters in the MINUIT output represent standard errors of the means.) In each fit, the remaining model parameters were fixed to the corresponding central values as detailed in Section 3.1. For the fits to the data from France, R f was set to 0. put forward. One hundred combinations of random, normally-distributed, uncorrelated values for these parameters were generated, using the results for the central values and the uncertainties according to Section 3.1. Each such combination was used in one fit to the data from one country; as a result, twenty fits per country were performed. The important results (i.e., a country identifier, the minimal χ 2 , the NDF, the current values of the seven fixed parameters, as well as the results for the fitted values and uncertainties of the free parameters) were stored in binary files for later processing.
The parameter R f was fixed to 0 in the fits to the data from Spain and France; the three-parameter fits occasionally fail as the Hessian matrix either cannot be evaluated or does not come out as expected (i.e., positive definite). After this fixation, no problems were found in the fits. The twenty R 0 values and fitted uncertainties for each country were analysed further; to avoid the introduction of bias into the results due to any small fitted uncertainties, a median uncertainty was evaluated (per country) and replaced those of the fitted uncertainties (in the results of that country) which were short of that median value. A weighted average was next extracted for each country, along with the rms (root-mean-square) of the R 0 distribution, see Table 2.
The five average R 0 values of Table 2 may be thought of as 'independent observations'. The grand mean was obtained from these values and the standard error of the means was assigned to the grand mean as final uncertainty. The main result of this work is R 0 = 4.91 (33) .   Table 1), indicating a poor description of the input data in the pure statistical sense. Table 2 Weighted averages and rms values per country for the model parameter R 0 , obtained from 100 fits (in total) taking account of the effects of the variation of the parameters α, γ 1,2 , and p 0,...,3 according to Section 3.1. Contained in the second column is the range of the χ 2 /NDF values of the relevant fits. This result lies in between the values reported by Cao and collaborators [29], and by Tang and collaborators [27]. It agrees better with the former result (even though the R 0 value of Ref. [29] is not accompanied by an uncertainty). The difference to the result of Ref. [27], the largest of the reported R 0 values for the COVID-19 virus, is equivalent to an effect of about 3σ in the normal distribution. The R 0 result of this work provides support for the large-R 0 thesis in the case of the COVID-19 disease.
It must be mentioned that the scheme used in the empirical modelling of the effective transmissibility might have an impact on the estimate for R 0 . The reliable determination of the effective transmissibility in the dissemination of the COVID-19 disease is an interesting subject, which is surely worth additional investigation. One interesting transmissibility-reduction scheme, fine-tuned to the dissemination of the disease in Wuhan, was introduced in Refs. [27,36]. That scheme quantifies the impact of each of the mitigation measures and appears to be generally applicable (after its parameters have been appropriately fixed, separately for each country).
To provide estimates for the effective transmissibility, other methods have been put forward, e.g., see Ref. [37] for a review. One method, which can easily be implemented and requires only the time series of the new infections in each country, was introduced in Ref. [38]; The most probable temporal interval between the first exposures and the first fatality in each country may also be derived from the data. The results are:

Predictions
The results of the optimisation may be used in order to predict the time evolution of the populations of the seven groups of the model. For the sake of example, one obtains Fig. 6 for the time evolution of the groups E, I, I 2 , and D in the US. For this plot, the parameters α, γ 1,2 , and p 0,...,3 were fixed to their central values as detailed in Section 3.1. The fitted values of R 0 , R f , and ζ for the US were given in Table 1, upper part. An estimate for the total number of fatalities due to the virus (in fact, for the number of fatalities after this cycle of the pandemia is over) may be obtained by letting the application run for an adequately long follow-up time (representing infinity); running the software for 200 d (up to September 1) yielded the result: D ∞ ≈ 137 584.
At this time, meaningful uncertainties cannot be assigned to the model predictions; the procedure to obtain these uncertainties is straightforward, but has not yet been implemented. However, working uncertainties may be obtained indirectly, after using empirical models in the optimisation, e.g., models based on the log-normal, the Weibull, or the Gamma distribution; such models yield a relative uncertainty in the cumulative number of fatalities D ∞ around 5%.
It would be interesting to investigate how the model predictions compare with  Table 3. The differences between the predicted and observed values do not exceed the few-percent level.
It must be emphasised that predictions make sense if the collective behaviour continues to be shaped in accordance to the scheme used in the empirical modelling of the effective transmissibility. If the collective behaviour is modified, be that due to the relaxation of the mitigation measures or to re-adjusted norms following the long-term exposure of the human population to a lasting phenomenon (habituation), any predictions are likely to fail.

Conclusions and discussion
This work introduces a compartmental epidemiological model based on seven groups and tailored to the characteristics of diseases similar to the COVID-19 disease. In its simplified version, the model contains ten parameters, four of which relate to the physical characteristics of the virus, and four are transition probabilities between the groups; the last two parameters enable the empirical modelling of the effective transmissibility within each country, which is associated in this study with the cumulative number of fatalities in that country.
The novel part in this work is that it uses only the cumulative number of fatalities due to the disease in each country, i.e., data from the public domain. Given that, due to the unavailability of a fast and reliable diagnostic test, the number of infected subjects remains uncertain, the number of fatalities may be regarded (in my judgement) as the most reliable piece of information regarding this disease. Of course, 'being the most reliable' is not synonymous to 'being reliable'. Although I have not seen scientific studies on the reliability of the reported number of fatalities, several experts (and non-experts) have expressed their scepticism in newspaper and journal articles, in interviews, and in blogs. In some cases, it has been suggested that the true numbers may be as much as 60% higher than those reported [39].
Fits of the model to the data from five countries, namely from those which have been hit most severely by the disease (regarding the number of fatalities), have been performed following a transmissibility-reduction scheme based on the cumulative number of fatalities. An estimate for the basic reproduction number R 0 , the most important measure of the aggressiveness of a virus, for the free dissemination of the COVID-19 disease (no mitigation measures, no modification of the social behaviour of the subjects as a result of the rising death toll) has been obtained: R 0 = 4.91(33), see Eq. (4). This result disagrees with several early estimates, and points to a more aggressive virus (than those early estimates had suggested); the result of this work places the COVID-19 disease at equal level with poliomyelitis, rubella, pertussis, smallpox, and distances it from common cold and influenza.
Provided that the future behaviour of the susceptible subjects matches their past behaviour (e.g., regarding social distancing and self-isolation in suspect cases), predictions for the time evolution of the populations of the seven groups of the model may be obtained from the results of the fits.
Last but not least, the reliability of the results extracted herein relies on the validity of a number of assumptions.
• No significant contributions have been left out of the transfer diagram of the model (see Fig. 1). • The association between the reduction in transmissibility and the cumulative number of fatalities in each country, as explained in Section 2.2, holds. It might be that the scheme used in the empirical modelling of the effective transmissibility has an impact on the extracted estimate for the nominal transmissibility R 0 .  I have no affiliations with or involvement in any organisation, institution, company, or firm with/without financial interest in the subject matter of this work.

A The inclusion of vital dynamics
The inclusion of the vital dynamics, as well as of the effects of the vaccination, introduces three additional parameters: • The birth rate λ in the country under investigation.
• The mortality rate µ in the country under investigation.
Discarding passive immunity, the complete version of the model rests upon the following system of ODEs. where I = I 1 + I 2 + I 3 and N = S +S + E + I. This model may be used when treating diseases which spread over temporal intervals longer than a few months. Of course, a lack of vaccine implies that ǫ = 0.