Covid-19 Fatality Rate Between 0.1 and 0.3%, Gangelt and Santa Clara Combined

Covid-19 fatality rates of 0.37% (Gangelt, Germany) and 0.17% (Santa Clara, USA) have been reported, estimated from serological studies. We show that the two confidence intervals strongly overlap when the uncertainty in the number of deaths is taken into account, so that the two investigations may be regarded as representative of the same population (tentatively: “Western society with no overload of the Health Care System during the pandemic”). Combining the results, the Covid-19 fatality rate is estimated to lie, with 95% confidence, in the range [0.1%; 0.3%].


Introduction
In this paper, we combine the infection fatality rates for Covid-19 reported in two studies (Streeck and others 2020; Bendavid and others 2020) in which the infection rate in the respective county is estimated from the fraction of people with antibodies against SARS-CoV-2 in a representative sample.
We shall neglect all possible systematic deficiencies of the studies, take the reported numbers at face value, and simplify the statistical calculations by approximating the uncertainties with standard errors. This simplification is expected to yield an essentially correct estimate and to change the numerical values only slightly.

Results and discussion

Infection fatality rates
In the German community of Gangelt (12,597 inhabitants), the site of a superspreading event, a representative sample of 919 individuals was tested and 15.5% (CI95 [12.3%; 19.0%]) were found to have been infected (Streeck and others 2020). This leads to an estimated 1946 infections in the county and, with the 7 officially reported deaths related to Covid-19, to an infection fatality rate (IFR) of 0.36% with a CI95 of [0.29%; 0.45%], estimated from the uncertainty in the number of infected alone.
In the Californian county of Santa Clara (1.9 million inhabitants), a sample of 3,300 was tested for antibodies with 50 positive results (Bendavid and others 2020). The sample itself was not representative of Santa Clara, so the data had to be re-weighted to reflect the demographic structure of that county. The infection rate was then estimated to be 2.8% (CI95 [1.3%; 4.7%]), corresponding to 54,000 individuals (CI95 [25,000; 91,000]). The IFR is calculated as 0.17% = (94 deaths in the region)/(54,000 infected). The authors do not specify a confidence interval.
In the two studies, the infection fatality rate (IFR) is calculated as:

$$ f = \frac{d}{I} \qquad \text{(Equation 1)} $$

with f = fatality rate, d = number of Covid-19 fatalities reported for a county, and I = number of infected persons in the county estimated from a sample. In both studies, it is explicitly ensured and assumed that the samples are representative of the respective population.
Furthermore, it is implicitly assumed that the demographic distribution of the actually infected persons is similar to that of the entire county.
The IFR of Santa Clara (0.17%) does not fall into the confidence interval ([0.29%; 0.45%]) of the IFR claimed for Gangelt. However, this does not mean that the two studies are incompatible with one another, as will be argued in the next section.
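As a quick check of Equation 1 (IFR = reported deaths divided by estimated infections), the following Python sketch recomputes both IFRs from the figures quoted above. The function and variable names are illustrative and not taken from either study. The Gangelt interval is obtained here by dividing the 7 deaths by the infection counts implied by the seroprevalence bounds; this is an assumption about how that interval arises, with the deaths treated as exact.

```python
def ifr_percent(deaths, infected):
    """Infection fatality rate in percent: deaths divided by infected."""
    return 100.0 * deaths / infected

# Gangelt: figures as quoted in the text
gangelt_population = 12_597
gangelt_deaths = 7
gangelt_infected = 1946
gangelt_ifr = ifr_percent(gangelt_deaths, gangelt_infected)

# Interval from the seroprevalence bounds alone (deaths treated as exact).
# A higher infection rate gives a lower IFR, so the bounds swap places.
gangelt_ifr_low = ifr_percent(gangelt_deaths, 0.190 * gangelt_population)
gangelt_ifr_high = ifr_percent(gangelt_deaths, 0.123 * gangelt_population)

# Santa Clara: 94 deaths, 54,000 estimated infections
santa_clara_ifr = ifr_percent(94, 54_000)

inside = gangelt_ifr_low <= santa_clara_ifr <= gangelt_ifr_high
print(f"Gangelt IFR: {gangelt_ifr:.2f}% "
      f"[{gangelt_ifr_low:.2f}%; {gangelt_ifr_high:.2f}%]")
print(f"Santa Clara IFR: {santa_clara_ifr:.2f}%")
print("Santa Clara inside Gangelt interval:", inside)
```

With these inputs the sketch gives 0.36% [0.29%; 0.45%] for Gangelt and 0.17% for Santa Clara, which lies outside that interval.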

Confidence intervals
The authors of the Gangelt study claim that their IFR may be used to estimate the percentage of infections in other places with similar population statistics. However, as the number of deaths is regarded as a true value without uncertainty, the confidence interval claimed for Gangelt is valid only for this single specific infection process. If the infection process could be repeated, resulting again in about 1950 infected persons, we would not expect exactly 7 fatalities again.
In order to make the estimate valid for a larger population, e.g. for Germany, Gangelt has to be considered a sample, and the number of fatalities must be considered the result of a random process in which each of the 1949 infected individuals dies with a probability of 0.36%. This is a typical Poisson process resulting in 2 to 12 deaths (94% confidence level). Consequently, in Equation 1, both numerator and denominator exhibit uncertainty.
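The Poisson argument can be illustrated with a minimal Python sketch using only the standard library. It treats the number of deaths as Poisson-distributed with mean 7 and approximates the range as the mean plus or minus two standard errors, with the standard error taken as the square root of the mean. The helper names are hypothetical, and the two-standard-error rule is the simplification adopted in this paper, not an exact Poisson interval.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson distribution."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def two_se_range(mean):
    """Approximate range: mean +/- 2 standard errors, with se = sqrt(mean)."""
    two_se = 2.0 * math.sqrt(mean)
    return mean - two_se, mean + two_se

# Gangelt: about 1950 infected times an IFR of about 0.36% gives about 7 deaths
expected_deaths = 7
low, high = two_se_range(expected_deaths)
p_exactly_7 = poisson_pmf(7, expected_deaths)

print(f"range: {low:.1f} to {high:.1f} deaths")
print(f"P(exactly 7 deaths) = {p_exactly_7:.2f}")
```

The range of about 1.7 to 12.3 rounds to the 2 to 12 deaths quoted above, and the probability of observing exactly 7 deaths in a repetition is only about 0.15.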
For products and quotients, the relative variances of the two components add (Mergel 2017), leading to an equation for the uncertainty of the fatality rate:

$$ \left(\frac{f_{2se}}{f}\right)^2 = \left(\frac{d_{2se}}{d}\right)^2 + \left(\frac{I_{2se}}{I}\right)^2 \qquad \text{(Equation 2)} $$

where the subscript "2se" indicates the error margin for a confidence level 0.95, approximately twice the standard error. This gives a CI95 of [0.04%; 0.68%] for the IFR in Gangelt.
Applying the same reasoning to Santa Clara, the number of deaths ranges between 74 and 103, yielding, together with the uncertainty of the number of infected, a CI95 of [0.06%; 0.29%].
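The propagation step for Santa Clara can be sketched in Python as follows. Two inputs are assumptions made for this illustration: the error margin of the deaths is taken as twice the square root of the count (Poisson), and the error margin of the number of infected is taken as half the width of the reported CI95. The function name `ratio_two_se` is hypothetical.

```python
import math

def ratio_two_se(d, d_2se, i, i_2se):
    """Return f = d / i and its error margin, adding relative variances."""
    f = d / i
    relative = math.sqrt((d_2se / d) ** 2 + (i_2se / i) ** 2)
    return f, f * relative

deaths = 94
deaths_2se = 2.0 * math.sqrt(deaths)      # assumption: Poisson, se = sqrt(d)
infected = 54_000
infected_2se = (91_000 - 25_000) / 2.0    # assumption: half-width of CI95

f, f_2se = ratio_two_se(deaths, deaths_2se, infected, infected_2se)
ci_low = 100.0 * (f - f_2se)
ci_high = 100.0 * (f + f_2se)
print(f"IFR = {100.0 * f:.2f}%, CI95 = [{ci_low:.2f}%; {ci_high:.2f}%]")
```

Under these assumptions the sketch yields [0.06%; 0.29%]. The same function can be applied to the Gangelt figures, where the result depends on how the error margins of the inputs are chosen.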
The results are represented graphically in Figure 1. The numerical calculations are reported as a spreadsheet calculation in Figure 2 in the annex. In Figure 1, the top line represents the results of Gangelt and the middle line those of Santa Clara. The extended CI95 of Gangelt includes that of Santa Clara, so that, from a purely statistical point of view, there is no objection to considering the samples as arising from the same population. This holds although Gangelt is a small town in Germany with some medieval buildings, surrounded by villages, whereas Santa Clara is a metropolitan area in the USA. Demographically, the population could be characterized as "Western society with no overload of the health system during the epidemic".
We may then combine the two IFRs ($f_G$ for Gangelt and $f_{SC}$ for Santa Clara) in a weighted average $\bar{f}$ (Mergel 2017):

$$ \bar{f} = \frac{w_G f_G + w_{SC} f_{SC}}{w_G + w_{SC}} \qquad \text{(Equation 3)} $$

with the weights $w_G = (f_{G,2se}^2)^{-1}$ and $w_{SC} = (f_{SC,2se}^2)^{-1}$ being the respective values of $(f_{2se}^2)^{-1}$, proportional to the inverse of the variances. The IFR of Santa Clara gets a higher weight than that of Gangelt because its uncertainty is smaller, with the combined IFR being 0.2%, shifted only slightly from the IFR of Santa Clara towards the IFR of Gangelt.
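The uncertainty of the combined IFR, $\bar{f}_{2se}$, is also calculated with the inverse of the variances (Mergel 2017):

$$ \bar{f}_{2se} = \left( \frac{1}{f_{G,2se}^2} + \frac{1}{f_{SC,2se}^2} \right)^{-1/2} \qquad \text{(Equation 4)} $$

where $f_{G,2se}$ and $f_{SC,2se}$ are the error margins of the IFRs of Gangelt and Santa Clara. The result is CI95 = [0.1%; 0.3%], represented by the bottom line in Figure 1. It is narrower than the CI95 of Santa Clara alone because more data are combined. The numerical calculations are also reported in Figure 2.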
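The combination step can be sketched in Python as an inverse-variance weighted mean. The inputs are the rounded IFRs and extended intervals quoted above, and each error margin is taken as half the width of the corresponding CI95; that symmetry is an assumption of this sketch, and the function name `combine` is hypothetical. The spreadsheet in Figure 2 may use unrounded inputs.

```python
import math

def combine(estimates):
    """Inverse-variance weighted mean of (value, two_se) pairs.

    Returns the combined value and its error margin. Using 2se instead
    of se in the weights leaves the mean unchanged and gives the
    combined error margin directly at the 2se level.
    """
    weights = [1.0 / two_se**2 for _, two_se in estimates]
    total = sum(weights)
    mean = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return mean, 1.0 / math.sqrt(total)

# IFR in percent with error margin = half-width of the extended CI95
gangelt = (0.36, (0.68 - 0.04) / 2.0)
santa_clara = (0.17, (0.29 - 0.06) / 2.0)

combined, combined_2se = combine([gangelt, santa_clara])
ci_low = combined - combined_2se
ci_high = combined + combined_2se
print(f"combined IFR = {combined:.1f}%, "
      f"CI95 = [{ci_low:.1f}%; {ci_high:.1f}%]")
```

With these rounded inputs the sketch gives a combined IFR of 0.2% with CI95 = [0.1%; 0.3%], and the combined error margin is smaller than that of Santa Clara alone.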
This confidence interval is not compatible with the data from New York City (Goodman, Rothfeld 2020), where the fatalities amount to 0.14% of the entire population and about 20% of the population is seropositive, which corresponds to an IFR of about 0.7%. The reason is unknown. We suppose that the assumptions of the calculation may not hold:
- the set of infected persons is representative of the entire population
- the health care system is not overwhelmed.

Conclusions
In order to compare estimates of the IFR from different studies, the number of fatalities in a region has to be regarded as the result of a random process. This widens the confidence interval.
The serological studies in Gangelt (small German town) and in Santa Clara (metropolitan area in the USA) may be considered statistically to be samples of the same population because the extended confidence intervals overlap, so that the estimates of the Covid-19 fatality rate may be combined. The true IFR for the population (tentatively: "Western society with no overload of the health system during the Covid-19 pandemic") is still unknown but may be found with 95% confidence between 0.1% and 0.3%.