The significance of the detection ratio for predictions on the outcome of an epidemic: a message from mathematical modelers

In attempting to predict the further course of the novel coronavirus (COVID-19) pandemic caused by SARS-CoV-2, mathematical models of different types are frequently employed and calibrated to reported case numbers. Among the major challenges in interpreting these data is the uncertainty about the number of undetected infections or, conversely, about the detection ratio. As a result, some models include assumptions about the percentage of detected cases among total infections, while others neglect undetected cases completely. Here, we illustrate how model projections of case and fatality numbers vary significantly under different assumptions on the detection ratio. Uncertainties in model predictions can be significantly reduced by representative testing, both for antibodies and for active virus RNA, to uncover past and current infections that have so far gone undetected.


Background
The World Health Organization declared the outbreak of the respiratory disease COVID-19 (coronavirus disease 2019, caused by the virus SARS-CoV-2) a global pandemic on March 11, 2020. In the past months, numerous efforts have been undertaken to understand the properties of SARS-CoV-2 and to control the spread of the disease. While it has been repeatedly observed that the disease occurs with different severities, from very mild to critical [7], it has yet to be clarified how much pre-symptomatic or asymptomatic infections contribute to the spread of the virus [27,21,8].
In an attempt to predict the course of the outbreak and possibly to achieve its mitigation, mathematical models have been devised, and predictions of the course and possible outcomes for different countries have been presented [10,19,3,24,13]. Several authors [28,5,15] have noted that an important parameter of the epidemic is the detection ratio, i.e., the percentage of infections that are actually discovered. Most mathematical models are able to reproduce the time course of case numbers under widely varying assumptions on the value of the detection ratio. In particular, as has been noted in [9], the dynamic parameters derived from case number data, such as the reproduction number R, are completely independent of the detection ratio, provided the latter is constant over time. Since detection ratios are also notoriously hard to obtain at early stages of an epidemic, many models use overly optimistic detection ratios of 90% or more [13], or do not take undetected cases into account at all [6,1,20]. During the initial phase of an outbreak, the resulting differences in predicted case numbers may not be noticeable. In the long run, however, the number of undetected infections, and in particular of undetected recoveries, plays a major role in reducing the number of susceptible individuals within a population and thereby in reaching the herd immunity threshold. Moreover, as has been pointed out by, e.g., [25], the undiscovered infections are missing from the denominator when case fatality rates are calculated, so that these rates overstate the percentage of infected individuals dying from the disease. The uncertainty about detection ratios seems to be the major determinant of the widely varying estimates for the mortality of COVID-19.
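The independence of the growth dynamics from a constant detection ratio can be made concrete: if reported cases are a fixed fraction DR of true infections, then log(reported) = log(DR) + log(true), so the slope of the log curve, i.e., the exponential growth rate from which R is usually derived, is unchanged. The Python sketch below checks this on a hypothetical, noise-free exponential curve. The growth rate of 0.2 per day, the initial value, and the helper `growth_rate` are illustrative assumptions, not quantities from our model.

```python
import math


def growth_rate(series, dt=1.0):
    """Least-squares slope of log(series) versus time, i.e. the exponential growth rate."""
    n = len(series)
    ts = [k * dt for k in range(n)]
    ys = [math.log(v) for v in series]
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    return cov / var


# Hypothetical true daily infections growing at 0.2 per day for 30 days.
TRUE_GROWTH_RATE = 0.2
true_infections = [100.0 * math.exp(TRUE_GROWTH_RATE * day) for day in range(30)]

for dr in (0.4, 0.1, 0.025):
    # Reported cases are assumed to be a constant fraction dr of true infections.
    reported = [dr * x for x in true_infections]
    print(f"DR = {dr:5.3f}: estimated growth rate = {growth_rate(reported):.4f}")
```

All three detection ratios print the same growth rate, 0.2000: a constant DR only shifts the curve vertically on a log scale. The equivalence no longer holds if the detection ratio changes over time.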
Unfortunately, different approaches to determining these ratios have delivered a huge range of estimates for different geographic regions, with detection ratios ranging from as low as 2% [18] to approximately 35% [18,17]. For Germany, preliminary unofficial results for Gangelt in the county of Heinsberg [26] seem to indicate a case detection ratio of about 10-20%. Screening for antibodies, which counts recovered as well as currently infected individuals, consistently seems to yield lower estimates than screening studies using PCR tests for active virus RNA. Possible reasons include that (i) mild infections could easily go unnoticed, and (ii) PCR tests produce false negative results caused by less than perfect sample collection [29].
We have previously proposed mathematical models for the dynamics of COVID-19 infections in Germany, in particular taking into account the effect of current and possible non-pharmaceutical control measures [2,3]. For the simulated scenarios in our most recent work [3], we assumed detection ratios closer to the upper end of the range detailed above and remarked that the predicted fatality numbers should be expected to look very different when a lower detection ratio is assumed.
To illustrate this effect, we present here the results of simulations assuming different detection ratios, while keeping the assumptions on all other basic model parameters unchanged. For the scenarios shown below, we do not take into account the limited capacity of the health care system (this factor would further aggravate the situation in scenarios with high numbers of active cases). Our simulations show only the time course of both known and total active cases, as well as the cumulative number of fatalities. The latter model output is not only of high importance but also particularly sensitive to assumptions on detection.

Results
In order to illustrate how different assumptions on the detection ratio (DR) affect predictions of the epidemic's course, we show simulation results for a few scenarios, each under the assumption of a high detection ratio (DR ≈ 40%), a medium detection ratio (DR ≈ 10%), and a low detection ratio (DR ≈ 2.5%). The model used for the simulations is an extended version of the classical susceptible-exposed-infectious-recovered (SEIR) system, with three age groups and different compartments of infectious individuals (based on our previous work [3]). Case and death counts reported in Germany by the Robert Koch Institute (RKI) [22] as of April 24, 2020 were used for model calibration. In essence, this means that the effective contact rates are estimated only up to the lockdown situation in force until April 19, 2020. Although lockdown measures were partially relaxed starting on April 20, 2020, this relaxation would not yet show in the most recent data due to both the latency time of infection and delays in reporting. We show model simulations of the following scenarios for Germany.
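To show the bookkeeping that makes the detection ratio matter, the Python sketch below implements a deliberately minimal SEIR system in which a fixed fraction DR of newly infectious individuals is counted as detected. It is not the age-structured model of [3]: it has a single age group, assumes that detected and undetected individuals transmit equally (no isolation of known cases), uses forward-Euler integration, and all parameter values and names (`Params`, `simulate_seir`) are hypothetical. Only cumulative detected infections would be observed in data, while all infections deplete the susceptible pool.

```python
from dataclasses import dataclass


@dataclass
class Params:
    population: float = 1_000_000.0  # hypothetical population size N
    beta: float = 0.3                # hypothetical transmission rate per day
    sigma: float = 1 / 5.5           # hypothetical rate of leaving the exposed state (1/latent period)
    gamma: float = 1 / 7             # hypothetical recovery rate (1/infectious period)
    detection_ratio: float = 0.1     # fraction of new infectious cases that are detected (DR)


def simulate_seir(p: Params, initial_exposed: float = 100.0, days: float = 365.0, dt: float = 0.1) -> dict:
    """Forward-Euler SEIR model in which a constant fraction DR of new infectious cases is detected.

    Detected and undetected infectious individuals are assumed to transmit equally.
    """
    s = p.population - initial_exposed
    e = initial_exposed
    i_det = i_undet = r = 0.0
    cum_detected = cum_infected = 0.0
    for _ in range(round(days / dt)):
        # All flows are computed from the state at the start of the step.
        new_exposed = p.beta * s * (i_det + i_undet) / p.population * dt
        new_infectious = p.sigma * e * dt
        rec_det = p.gamma * i_det * dt
        rec_undet = p.gamma * i_undet * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i_det += p.detection_ratio * new_infectious - rec_det
        i_undet += (1.0 - p.detection_ratio) * new_infectious - rec_undet
        r += rec_det + rec_undet
        cum_detected += p.detection_ratio * new_infectious
        cum_infected += new_infectious
    return {
        "susceptible": s,
        "exposed": e,
        "infectious": i_det + i_undet,
        "recovered": r,
        "cum_detected": cum_detected,
        "cum_infected": cum_infected,
    }


# Same underlying epidemic, three assumed detection ratios.
for dr in (0.4, 0.1, 0.025):
    out = simulate_seir(Params(detection_ratio=dr))
    print(f"DR = {dr:5.3f}: detected = {out['cum_detected']:>9,.0f}, "
          f"all infections = {out['cum_infected']:>9,.0f}")
```

In this sketch the same underlying epidemic yields reported totals that scale with DR. Read in reverse, which is the situation faced when calibrating to data, a given reported count implies roughly 1/DR times as many total infections and correspondingly fewer remaining susceptibles.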
A Nonmedical interventions as of April 25, 2020, including the most recent relaxation of some lockdown measures (starting April 20) and partial reopening of schools.
B Interventions as in A, plus an additional fatigue effect that causes general awareness to wear off.
People are assumed to become less careful in, e.g., sanitizing hands, keeping distance in public spaces, or following coughing and sneezing etiquette. This is assumed to gradually and partially reduce the effect of general awareness over the course of 8 weeks.
C Interventions as in A, assuming increased efforts to isolate known and suspected infected individuals (called strict case isolation, sCI, in [3]) starting on April 27. As long as the number of active cases remains relatively low, strict isolation could be practicable.
D The original baseline scenario from [3] with interventions in place as of April 14. This is a counterfactual scenario assuming the restrictions in place had not been relaxed starting on April 20.
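The fatigue effect in scenario B amounts to a time-dependent modifier on contact rates. The Python sketch below is one hypothetical way to encode it: a linear decline, over 8 weeks (56 days), of the contact reduction attributed to general awareness. The linear shape, the 20% initial awareness reduction, the retained fraction of 50%, and the function names are illustrative assumptions, since the scenario description only states that the effect wears off gradually and partially.

```python
def awareness_weight(day, start_day, ramp_days=56.0, retained=0.5):
    """Fraction of the original awareness effect still active on `day` (hypothetical linear ramp).

    Equals 1 before `start_day`, declines linearly to `retained` over `ramp_days`,
    and stays at `retained` afterwards.
    """
    if day <= start_day:
        return 1.0
    progress = min((day - start_day) / ramp_days, 1.0)
    return 1.0 - (1.0 - retained) * progress


def contact_multiplier(day, start_day, awareness_reduction=0.2, **ramp_kwargs):
    """Multiplier on baseline contact rates; 1.0 would mean no awareness effect at all."""
    return 1.0 - awareness_reduction * awareness_weight(day, start_day, **ramp_kwargs)


# Hypothetical example: fatigue begins on day 0 of the projection.
for day in (0, 14, 28, 42, 56, 90):
    print(day, round(contact_multiplier(day, start_day=0), 3))
```

In a simulation, this multiplier would scale the model's transmission rate at each time point, so contacts rise from 80% to 90% of their baseline level over the 8 weeks under these assumed values.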
Model simulations were run until the end of the epidemic, that is, until the number of active cases becomes insignificant because the reproduction number R remains persistently below one. Note that this may be due to sufficiently low effective contact rates, or to sufficiently many individuals having contracted the infection and hence having been removed from the pool of susceptibles. Such long-term projections are, of course, purely hypothetical, since they neglect any possible reactions to the evolving situation. Specifically, significantly falling case numbers should be expected to lead to a further relaxation of contact restrictions, while rising case numbers might lead to new interventions. The precise numbers predicted by the simulations are not our main concern here. Rather, we want to emphasize the sensitivity of predictions to the detection ratio, that is, the different behavior exhibited by the system under the assumption of low, medium, or high detection ratios.
The effect of different detection ratios is most striking in scenarios A (Fig. 1a) and B (Fig. 1b). At the time of writing, after some of the measures have been relaxed, the reproduction number R in Germany is close to one [23], and the system is very sensitive to the proportion of susceptibles in the population. A high detection ratio implies that only a very limited number of infections have remained undetected; given the current number of detected cases, most individuals would then still be susceptible. In contrast, a low detection ratio suggests that many infections have occurred unobserved, so that there would already be a considerable number of recovered, and hence immune, individuals. This can make the difference between R > 1, leading to a second peak, and R < 1, with the epidemic subsiding. In the other two scenarios (C and D, shown in Fig. 1c and Fig. 1d, respectively), the reproduction number is overall smaller than in A and B for all assumed detection ratios, and we therefore primarily observe a quantitative difference. Again, in these scenarios, assuming a low detection ratio means that more susceptibles could already have become recovered individuals, making R even smaller and therefore leading to the epidemic subsiding faster. The smallest effect of variations in the detection ratio is observed in scenario D, in which the very low effective contact rates in place as of mid-April are maintained over time. This keeps R so small that the lower number of susceptibles in the low-DR case does not make a significant difference.
Particularly noteworthy is the effect of different detection ratios on the projected number of fatalities over the course of the epidemic. In scenarios A and B, the assumption of a high or medium detection ratio leads to the prediction of a second peak of the epidemic, over the course of which many more fatalities are to be expected. These would not be predicted if a low detection ratio were assumed (Fig. 2). The situation is aggravated by the fact that, under a low detection ratio, the case fatality rate (percentage of fatalities among detected cases) is considerably higher than the actual lethality (or infection fatality rate, cf. [25]) of the disease. Projecting observed case fatality rates into the future therefore leads to overestimates of total fatality numbers, and the overestimation becomes more pronounced the lower the detection ratio.
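Both mechanisms discussed above, the depletion of susceptibles and the gap between case and infection fatality rates, can be illustrated with back-of-the-envelope arithmetic. The Python sketch below assumes a homogeneously mixing population, so that the effective reproduction number is R_eff = R_c · S/N, where R_c is the reproduction number under current contact measures in a fully susceptible population. Total infections are taken as detected cases divided by DR, which makes the infection fatality rate IFR = CFR · DR. The population size, case and death counts, R_c = 1.05, and the function name are hypothetical round values chosen for illustration, not calibrated parameters or RKI data.

```python
def implied_quantities(detected_cases, deaths, population, detection_ratio, r_current):
    """Quantities implied by a constant detection ratio (simple homogeneous-mixing sketch)."""
    total_infections = detected_cases / detection_ratio
    susceptible_fraction = 1.0 - total_infections / population
    case_fatality_rate = deaths / detected_cases
    infection_fatality_rate = deaths / total_infections  # equals CFR * DR
    return {
        "total_infections": total_infections,
        "susceptible_fraction": susceptible_fraction,
        "r_effective": r_current * susceptible_fraction,
        "cfr": case_fatality_rate,
        "ifr": infection_fatality_rate,
        # Factor by which applying the CFR to all infections overestimates deaths.
        "cfr_overestimation": case_fatality_rate / infection_fatality_rate,
    }


# Hypothetical round numbers, for illustration only.
POPULATION = 83_000_000
DETECTED = 150_000
DEATHS = 6_000
R_CURRENT = 1.05

for dr in (0.4, 0.1, 0.025):
    q = implied_quantities(DETECTED, DEATHS, POPULATION, dr, R_CURRENT)
    print(
        f"DR = {dr:5.3f}: total infections = {q['total_infections']:>10,.0f}, "
        f"R_eff = {q['r_effective']:.3f}, IFR = {q['ifr']:.4f}, "
        f"CFR overestimates deaths by a factor of {q['cfr_overestimation']:.1f}"
    )
```

With these inputs, R_eff stays above one for DR = 40% and 10% but falls below one for DR = 2.5%, loosely mirroring the qualitative difference described for scenarios A and B, and the CFR-based projection overestimates deaths by exactly 1/DR.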

Conclusion
While enormous efforts are being undertaken all over the globe to stop the COVID-19 pandemic and limit its consequences, a growing number of studies indicate that a large portion of cases could have remained undocumented [14]. Here we have shown that knowledge of the detection ratio of COVID-19 infections is of crucial importance for model-based predictions of the further course of the outbreak and its control. This emphasizes the urgent need to screen representative samples of the population in order to determine the prevalence of antibodies against SARS-CoV-2. Studies like those reported in [26] for Germany, in [4] and [12] for California, or in [16] for New York are promising first steps, but more widespread screening, with selection processes that minimize bias, is necessary to obtain better estimates of past detection ratios. Continuous extensive testing of individuals for virus RNA may further help to uncover temporal changes in detection ratios. It remains to be determined whether widespread screening would also help limit the spread of the disease [11]. We conclude by noting that the difficulties in predicting the outcome of an outbreak are not limited to COVID-19 but pertain to any novel infectious disease, which makes it all the more important not to forget this lesson, even after the COVID-19 pandemic has been resolved.
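As a simple illustration of how a representative antibody survey translates into an estimate of the past detection ratio, the Python sketch below divides the cumulative number of detected cases by the number of past infections implied by the observed seroprevalence p, i.e., DR = C / (p · N). It assumes a perfectly representative sample and a perfect test (no correction for sensitivity or specificity), and uses a normal-approximation 95% interval for the prevalence; the survey size, counts, and function name are hypothetical.

```python
import math


def detection_ratio_from_survey(detected_cases, population, n_tested, n_positive, z=1.96):
    """Point estimate and approximate interval for DR from a seroprevalence survey.

    Assumes a representative sample and a perfect test; the interval comes from a
    normal approximation to the binomial prevalence estimate.
    """
    p = n_positive / n_tested
    se = math.sqrt(p * (1.0 - p) / n_tested)
    p_low, p_high = max(p - z * se, 1e-12), p + z * se
    point = detected_cases / (p * population)
    # Higher prevalence means more past infections and hence a lower detection ratio.
    low = detected_cases / (p_high * population)
    high = detected_cases / (p_low * population)
    return point, low, high


# Hypothetical region: 1,000,000 inhabitants, 5,000 detected cases,
# 50 positives among 1,000 representatively sampled residents.
dr, dr_low, dr_high = detection_ratio_from_survey(5_000, 1_000_000, 1_000, 50)
print(f"DR = {dr:.3f} (approx. 95% interval {dr_low:.3f} to {dr_high:.3f})")
```

Even this idealized survey of 1,000 people leaves a wide interval for DR, which is one reason why larger and carefully sampled screening studies are needed.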