Mathematical modeling of the development of confirmed daily infection numbers in the COVID-19 pandemic by a special exponential function

The developments of confirmed daily Sars-CoV-2 infections can be modeled a posteriori in a comparatively simple and satisfactory way by means of a special exponential function: its exponential coefficient varies exponentially with time itself. This property enables the function to directly simulate the „daily case“ curve fragments observed for the first waves of the current epidemics in a number of counties and countries. Linear combinations of two or more of the functions allow for the modeling of the complete curves observed in the vast majority of all regions for which case number developments have been tabulated.


Introduction
For the mathematical modeling of the progress of infection numbers in the course of epidemics, e.g., the COVID-19 pandemic, usually theoretical models are employed, which base on systems of differential equations and most often are copies or extensions of the well-known SIR model [1][2][3][4][5][6] (but see also, e.g. [7,8]). The letter "I" in SIR (or -expressed as a time-dependent function -"I(t)") represents the number of infectious persons in a certain pool of people (e.g. the population of a country) at a certain day. Differing from that, the symbol Î(t) will represent, in this paper, the number of confirmed new infections at a certain day t, or the timedependent development of this number if t is growing continuously.
While there generally are no real I(t) data which is, of course, also valid with respect to COVID-19, Î(t) real numbers for the pandemic have been registered and documented for numerous countries and counties [9]. Under the assumption that the relation between the number of confirmed cases and the number of the actually newly infected individuals doesn't vary too much in time, the development of the former should still allow conclusions about the spreading of the epidemic in a certain country.
As a relation between I(t i ) and Î(t i ) can -under certain simplifying conditions -be formulated quantitatively (e.g., by assuming a relative dark number), the above mentioned mathematical models can be calibrated by comparison of their predictions with the real case number developments Î(t) real , which is equivalent to modeling such developments a posteriori [see, e.g., [3][4][5][6].
A somewhat different strategy is to model the progress of case numbers Î(t) real (or the hereof derivable development of the cumulated confirmed case numbers (t) real ) in a country directly by a single mathematical function with the aim to be able to predict the further development of the epidemic. Modeling attempts by using a modified Gauss error function [10] or a logarithmic normal distribution function [11], for example, led to satisfying matches with the corresponding real case number developments. However, in the first case, the symmetry of the model functions is incompatible with the skewness of almost all documented (t) real (and Î(t) real ) developments, in the second case the models predicted a soon-to-be end of the epidemics (far in the past, as seen from now) for numerous (e.g. West European) countries, which obviously is inconsistent with reality.
In the following we present the alternative mathematical function (1) to be used for a posteriori modeling of Î(t) real -developments (COVID-19) with respect to -in principle -arbitrary countries or counties, which doesn't suffer from the two mentioned drawbacks:

Results
By means of the Python3 software mentioned below a satisfying fitting of a single function of type (1) to the developments of the numbers of confirmed new infections Î(t) real could be obtained for numerous countries and US counties (status: end of June, 2020). The quality of the fit increased (not unexpectedly) when 7-days averages were modeled instead of daily values. Averaging (performed symmetrically in time, see section Methods) smoothes the frequently observable fluctuations of confirmed case numbers in a weekly rhythm and other discontinuities which accompany the execution of tests and the reporting of positively tested cases. A similar (but asymmetric) averaging is, for example, performed for the determination of the so-called reproduction number [12]. For 33 countries und 24 US counties a very good fit (R 2 > 0,97) of a function of type (1) to the smoothed case number curve could be obtained (status: end of June, 2020) ( Fig. 1 and Supplementary Material: Cov19_1F.pdf). Seen from today (end of October, 2020), the corresponding case number developments represented the "first waves" of the epidemics in the corresponding countries and counties.
In more than 180 additional cases (56 countries, 125 US counties), for which a model consisting of a single function of type (1) had led to unsatisfactory results, modeling of the smoothed case number curves succeeded in a simple and optically satisfactory way (R 2 > 0,97) for the sum of two or three of such functions, in principle corresponding to the occurence of a second and third infection wave (status: end of June, 2020; see also Supplementary Material: Cov19_2F.pdf and Cov19_3F.pdf).
The number of successful fittings can be expanded to far beyond 238, when sums of more than three functions of type (1) are allowed for the models. By end of October, 2020, this measure is now actually required for almost all countries (including those represented in Fig. 1) if modeling of the complete case number developments is desired (Fig. 2). An answer to the question, how far the single curves in the different cases actually correspond to real processes, still requires additional investigations.
It should be noted here, finally, that (1) is suited to model, e.g., "daily deaths"developments D(t) in a similarly satisfying way as compared to Î(t), an aspect which however will not further be discussed in this paper.

Discussion
One advantage of the modeling method presented here is that it doesn't base upon differential equations, which means that it is comparatively easy to understand and comparatively simple to apply. Our model function (1) is an exponential function with another exponential function in the exponent. It shows the asymmetry (relative to the curve maximum) which could be observed for almost all real pure "first wave" case number developments. When the foothills of the waves had been reached (as to be seen in Fig. 1), the function did not predict a soon-to-come end of the epidemics. As (1) doesn't fall below Î(0) at its right-hand side, it predicted insteadhitherto undisproved for the majority of all countries -a long persistence of the epidemics.
Î(0) represents the function value at t = 0 and plays the role of a proportionality factor. An Î(0) value of 1 could formally be assigned to a single infected person who starts the infection wave, Î(0) >1 then would represent the more or less "synchronous" start of more than one "individual" waves superimposing each other within the population of question. Of the two additional parameters in (1) a seems to quantify the initial unrestricted exponential growth of the number of daily confirmed infection cases due to (2): In (1), the constant a from (2) is replaced by the time-dependent , which means that the initial growth of the function is increasingly weakened and finallyafter having passed the curve maximum -replaced by a decrease. The amount of the weakening of (2) (and thus -at first sight -the success of the national efforts to stop the epidemic) is quantified by b. An irritating fact in this context is, however, that b is a constant which doesn't change in time. Measures like "social distancing" or "lockdown" obviously don't lead to an increase of b in the course of the development (but see below).
Of the almost 60 countries and counties, of which the smoothed case number developments could be described in a satisfactory manner (R 2 > 0,97) by a single function (1), t max was clearly exceeded in 27 cases (status: end of June, 2020). Of these, the nine West European countries are presented in Tab. 1 together with the corresponding parameter values and additional specific data. As can be seen, the highest determined values for a as well as for b differ from the lowest ones by about a factor of 3. The quotients a/b, however, scatter only slightly, i.e., with a standard deviation of 1 about their common mean value 10. If the additional 18 countries/counties (not in Tab. 1) are also accounted for, the scattering of the quotient values grows significantly, thus, a value of about 10 cannot be expected in general for a/b. If (1) is fitted to the first parts of the curves of the nine countries only (the fragments ending at about half of the maximum value of the smoothed curves) this generally results in a larger a and a smaller b value as compared to the values in Tab. 1. Thus, national antiviral measures become visible at least in this comparison. Furthermore, this findings insinuate that, in contrast to the initial approach given above, a and b can not be discussed separately with respect to their epidemiological meanings.
The quotient a/b determines via Î(0) the maximum value Î max of the function (see Appendix: Function maximum). Because of the relatively small variation width of the a/b values the differences between the Î max values in Tab. 1 are mainly caused by the differences in the Î(0) values. Î(0) itself is positively correlated with the population number Z. Via Î(0) / Z as a new proportionality factor normalized graphs of the modeled case number developments can be obtained (Fig. 3), which allow a direct comparison of the progress of the epidemics in the different countries.
Tab.1. Parameters of (1) 1) and other data for some West European countries with R 2 >0.97  Another characteristical quantity, which is determined by both parameters a und b, is the full width at half maximum t h of (1). It can be obtained as the difference of the two solutions of equation ( By the possibility to expand the model to a sum of n functions of type (1) the case number developments of most of the countries and counties for which corresponding data are documented become possible objects for modeling. In the first instance, mainly those cases are of interest where a satisfactory modeling can be achieved for a comparatively small number n, as is the case for the examples shown in Figs. 2 and 3, with n = 3 to 7. In such cases, the spread of the virus which is mirrored in the development of the case numbers can be thought of as being composed of n spatially and/or temporally separated infection waves. We cannot say, however, how far this attribution is generally justified.

Methods a) Development of the function equation
Summing up the daily published Î(t) real values in the past leads to the number of cumulated confirmed new infections (t) real . For an unrestricted spread of an epidemic one would expect for (t) (and -with a smaller growth -also for Î(t)) an exponential initial increase as given in (4): For any two function values (t) and (t+1) equation ( From the numbers published by the Robert-Koch-Institut (RKI) for Germany [14] one could realize as early as March 2020, that the progress of (t) real stayed behind the one to be expected according to (4).
Therefore, one of us (MS) investigated, how changes in time, when it is calculated (via (5)) by employing the real cumulated case numbers(t+1) real und (t) real . The result was an exponential decrease (per average) of  due to  = ( Fig. 4).
Following this result, the "behaviour" of a simple exponential function C = C(0) was investigated, when the constant c in the exponent is replaced by the variable, time-dependent expression .
The resulting function graphs strongly resembled the development of daily cases Î(t) real in Germany, which immediately led to the tentative formulation of (1). In (1) a and b determine the shape of the function curve (blue in Fig. 5), while Î(0) acts as a proportionality factor. The function exhibits a single maximum at t=1/b (see Appendix: Function maximum). With an estimated value of t max (according to -8 --9 -the real data) b could be determined. b and an estimated value for Î max then allowed the calculation of a. By an adjustment of the parameters Î(0), a und b with the coefficient of determination, R 2 , as the only criterion, (1) could be made to model the real case number development up to end of May 2020 in a satisfactory way (Fig.  5).

b) Conception and development of used software
Starting from the above results, two of us (EK and JK) developed concepts for the automatic modelling of COVID-19 case number developments by means of (1)  averaging for day t j is performed symmetric-in-time across the days t j -3, t j -2, … t j +3.  Optimization of coarsely estimated starting values for the parameters Î(0), t(0), a, b by means of a least-squares method [17]. The additional parameter t(0) allows positioning of the model on the t axis and is introduced by replacing all instances of t in the right side of (1) by the expression (t -t(0)).  Optional extension of the model towards a sum of different functions of type (1).  Optionally adjustable temporal limitation of the case number developments at one or both sides.  Output of diagrams and refined (= optimized) parameter values

Conclusion
For quite a number of countries and counties distributed across the globe a satisfactory a posteriori modeling of the development of the "first wave" fragments of the confirmed daily new infections by Sars-CoV-2 is possible by means of a single function of type (1). The number of these countries and counties can be increased substantially, and the restriction to the first wave can be given up, when the models are extended to a sum of few functions of type (1) (which formally corresponds to a superposition of few different infection waves). Thus, our modeling approach seems to be generally applicable as far as the COVID-19 pandemic is concerned. Insofar, the mathematical structure of (1) which is characterized by an exponentially changing factor in the exponent seems to express an up to now unnamed feature of the pandemic.

Function maximum
The first derivative of equation (1)