COVID-19 TRANSMISSION ESTIMATES AND FORECASTS: A MODEL TO TEST SAMPLE SELECTION BIAS IN A ‘LOW RISK’ CRISIS

This paper surveys estimates of the transmission features of the novel coronavirus, and then proposes a model to address sample-selection bias in estimated determinants of infection. Containment assumptions of the infection forecasting models depend on assumed effects of policies and self-regulating behavior. In the commons dilemma of the pandemic, the perceived ‘low risks’ of unregulated marginal choices do not reflect the full social cost, implying non-pharmaceutical interventions (NPI) to reduce mortality can enhance social welfare. As more economic activity renews with liftings of restrictive NPI (RNPI), a critical question concerns the ability of milder NPI (MNPI) and voluntary precautions to mitigate the risk of greater infections and deaths while also limiting the pandemic’s economic damage and its social costs. Ineffective NPI could lead to continued COVID-19 waves and new types of crises, worsened expectations and delayed economic recoveries. From the central range of surveyed estimates of transmission and alternative herd-immunity-threshold estimates, a ‘worst-case’ virus guidepost suggests eventual deaths of around 25 to 41 million worldwide and 1.1 to 1.7 million in the U.S. needed to reach herd immunity with no vaccine or treatment. The most optimistic study surveyed (theoretical model from a non-reviewed preprint study) combined with the low end of the range of the estimated mortality rate suggests 6 to 9 million deaths worldwide and 250 to 370 thousand in the U.S. to reach herd immunity. Successes in the mix of NPI, treatments, and vaccine can limit the eventual global death toll of the virus. Improved estimation models for forecasting and decision making may assist in better targeting the local timings and mix of NPI. Diagnostic tests for the virus have been largely limited to symptomatic cases, causing possible sample selection bias.
A recursive bivariate probit model of infection and testing is proposed along with several possible applications from cross-section or panel-data estimation. Multiple potential explanatory variables, data sources, and estimation needs are specified and discussed.

John W. Straka *


Introduction
This paper surveys and analyzes estimates of the transmission features of the virus, and then proposes a model to address sample-selection bias in estimated determinants of infection.
The global surge of the pandemic has sharply lowered economic activity due to the rapid shift in expectations and precautionary behavior coupled with non-pharmaceutical interventions (NPI) for virus containment. In the commons dilemma of the pandemic, the perceived 'low risks' of unregulated marginal choices do not reflect the full social cost, with a socially suboptimal outcome (Eichenbaum et al 2020), implying NPI to reduce mortality can enhance social welfare. Restrictive NPI (RNPI) includes business shutdowns, travel restrictions, and shelter-in-place and nonessential stay-at-home orders.
Milder NPI (MNPI) includes less restrictive measures such as masking, enhanced exposure notification for quarantine, shielding of the vulnerable, social distancing in public spaces and gatherings, enhanced population testing, and messages for self-regulating behavior. As more economic activity renews with liftings of RNPI, bringing new and re-elevated COVID-19 risks, a critical question concerns the ability of MNPI and voluntary precautions to mitigate the risk of greater infections and deaths while also limiting the pandemic's economic damage and its social costs.
Ineffective NPI could lead to continued COVID-19 waves and flare-ups and new types of crises worldwide, worsening expectations and delaying economic recoveries. A worst-case virus assessment guidepost requires only the herd immunity threshold equation or estimate, estimates of the reproduction ratio, and the estimated mortality rate. The analysis in section 2 suggests eventual deaths of around 25 to 41 million worldwide and 1.1 to 1.7 million in the U.S. needed to reach herd immunity with no ultimately successful treatments, vaccine, and NPI. The most optimistic study surveyed (theoretical model from a non-reviewed preprint study) combined with the low end of the range of the estimated mortality rate suggests 6 to 9 million deaths worldwide and 250 to 370 thousand in the U.S. to reach herd immunity.
Successes in RNPI, MNPI, and voluntary behavior have limited the high mortality threat to date, as more focus now shifts to MNPI and self-regulating behavior, with RNPI fallbacks, while hopes for future treatment or a vaccine continue. Improved model estimates and forecasts may enhance policymaking on the local timings and mix of NPI.
Diagnostic testing for the novel coronavirus has been largely limited to symptomatic cases while many unreported mild or asymptomatic infections have been found. To address and test sample selection bias in estimating effects on observed infections, a recursive bivariate probit (RBP) model is proposed in section 3 along with several possible applications in section 4 from cross-section or panel-data estimation.
Multiple potential explanatory variables, data sources, and estimation needs are specified and discussed.
Case-level scoring models built from anonymized data may assist in advanced and under-privileged or constrained clinical diagnostic settings. Key estimated marginal effects on local infection rates include those from RNPI and MNPI to compare with other model results.

Estimates and Assessments of SARS-Cov-2 Transmission, Mortality, and Pandemic Policies
Policymakers in the COVID-19 crisis face many tradeoffs and uncertainties worldwide.
Institutions must consider different scenarios of the pandemic with precautionary reduced spending and investment, ongoing policy interventions, and the resulting macro and local economic effects.
Containment assumptions of the infection forecasting models depend on assumed effects of policies and self-regulating behavior (e.g. Chowdhury et al 2020), which especially impact longer-term projections. RNPI, MNPI, and voluntary increases in precautionary behavior have played simultaneous roles in containment. Varying data and measurements have allowed some first estimations of the effects of NPI and behavior on infection incidence and transmission rates, finding large effects (Courtemanche et al 2020; Unwin et al 2020; Jarvis et al 2020). Ferguson et al (2020) calibrate a model from UK and multicountry data and forecast the effects of more restrictive 'suppression' versus less restrictive 'mitigation' policy interventions, testing the effects of intervention cycles. Andersen et al (2020) exploit a natural experiment: Denmark and Sweden were similarly exposed with only Denmark mandating significant restrictions. Estimated aggregate spending dropped about 25 percent in Sweden and only an additional 4 percent in Denmark, suggesting economic contraction was due largely to the virus itself and resulting precautionary behavior. Economic recoveries may generally depend on consumer and business fears and expectations, likely reacting to NPI successes or failures in containment and hopes for a successful pharmaceutical intervention (PI).
Calibrations of the parameters in forecasting models require continued estimates of the marginal effects of NPIs and other determinants of transmission, infections, and mortality.
Virus transmission propensity depends on the critical base reproduction ratio of the virus, R0 (new cases produced by a typical single infection in a fully susceptible population). This reflects the intensity of effective contact between the infected and susceptible populations, referred to as the transmission coefficient or effective contact rate, β. R is equal to β × d, where d is the average duration of infectiousness. Transmission rates (the speed of spread) depend on both R and the 'serial interval' (average time between onset in primary and secondary cases); i.e., the speed of a pandemic depends on how many other people each case infects and how long it takes for infection between people to spread. 1 Base transmission rates due to differing β's can vary widely across local environments depending on population density, local transit, customary gatherings, etc. This has generally led to much lower infection rates in rural versus urban areas (although highly connected groups may dominate transmission in any environment). Unwin et al (2020) estimated an R0 of 5.0 for New York versus 0.3 for Montana, for example. R is also of course lowered by traditional quarantining of cases and exposures (from contact tracing), and by RNPI, MNPI and voluntary social distancing.
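The relationship R = β × d and the role of the serial interval can be illustrated with a minimal sketch. The numerical inputs below are hypothetical values chosen only for illustration, not estimates from the surveyed studies, and the generation-by-generation growth assumes a fully susceptible population with no interventions.

```python
def reproduction_ratio(beta: float, duration_days: float) -> float:
    """R = beta * d: effective contact rate times average days of infectiousness."""
    return beta * duration_days


def cases_in_generation(r: float, generation: int, seed_cases: float = 1.0) -> float:
    """New cases in a given generation under unchecked spread: seed * R**generation."""
    return seed_cases * r ** generation


def days_to_generation(generation: int, serial_interval_days: float) -> float:
    """Approximate calendar time at which a generation appears."""
    return generation * serial_interval_days


# Hypothetical inputs (assumptions, not estimates from the paper).
beta = 0.25            # effective contacts per day
d = 10.0               # average days infectious
serial_interval = 5.0  # days between onset in primary and secondary cases

r = reproduction_ratio(beta, d)
for g in range(5):
    day = days_to_generation(g, serial_interval)
    print(f"day {day:5.1f}: generation {g}, new cases ~ {cases_in_generation(r, g):.1f}")
```

A shorter serial interval leaves the number of cases per generation unchanged but compresses the calendar time between generations, which is why both R and the serial interval matter for the speed of spread.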
The relationship between the incubation period (how long it takes for symptoms to appear) and the serial interval, and the duration of infectiousness, have both had important implications for the relative spread of the virus (SARS-Cov-2). The median serial interval of the seasonal flu exceeds the average incubation period, and the range of flu incubation periods is distributed comparatively narrowly, as shown in Fig. 1. In contrast, the estimated SARS-Cov-2 serial interval is generally somewhat less than the estimated median incubation period, and the range of incubation periods is distributed across a considerably longer number of days. The median or average incubation period for SARS-Cov-2 is around 5-6 days, but it can vary from 1 to 14 days (WHO 2020). The virus has been characterized by a high R0, a long incubation period, and a relatively shorter serial interval (Xie & Chen 2020; Van Buesekom 2020).

1 The serial interval is important for epidemiological estimates of R0 based on available information as it unfolds.
Furthermore, although ultimate statistical outcomes are less severe across a population of the typical flu-infected versus the SARS-Cov-2 infected, a comparatively larger share experience no or only mild symptoms from SARS-Cov-2 (which most shrug off without seeking treatment or testing).
Symptoms across these respective infected populations are thus more bimodal with SARS-Cov-2. Most people with the flu know they have it and naturally self-quarantine; likewise, with the original SARS in 2003 and its severe symptoms. Not so with SARS-Cov-2, a critical difference.
These key features of the virus clearly elevate the relative public health risk. Each implies a larger scope for 'asymptomatic' or pre-symptomatic transmission. More of the infected population simply passes the virus along to others unwittingly. In addition, a relatively low serial interval exacerbates the health system pressures when COVID-19 cases surface and flow through in faster cycles.
The range of current estimates of the general R0 for SARS-Cov-2 considerably exceeds the R0 of the widely studied seasonal flu, as shown in Fig. 2. Compared with 1.3 for the flu, the estimates for SARS-Cov-2 range from the low 2's to 5.7 or more. The effective contact rate of SARS-Cov-2 is elevated by its high unwitting non-quarantining spread, and the infectiousness duration of someone with SARS-Cov-2 (14 days or more after onset) may be at least twice that of the flu. 2 The higher general base R0 well above 1.0 implies not only that a single case infects more people in an exponential spread, but the ultimate overall need for virus extinction, developed herd immunity plus vaccine efficacy, is also higher. This reflects the basic herd immunity threshold (HIT) equation, $\mathrm{HIT} = 1 - 1/R_0$ (Fine et al 2011). 3 Higher values of R0 increase the HIT at the rate of $1/R_0^2$. Herd immunity will not convey biological immunity to all or stop transmission, but it tends to abate the spread of a virus toward population extinction. NPI policies and behavioral social distancing can also similarly abate the virus at any time if and where they suppress R below 1.0. The studies summarized in Fig. 2  showing a large share of undetected infections (Hendrickson & Strum 2020). Li et al (2020) concluded that 86% of all infections in China were undocumented before travel restrictions and undocumented infections were the source of 79% of the documented cases. Such findings also of course lower the estimated virus mortality rate.
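As a minimal sketch of the basic HIT equation discussed above, the following computes the threshold 1 - 1/R0 and its rate of change 1/R0^2. The values 1.3 and 5.7 are the flu and upper SARS-Cov-2 figures quoted in the text; 2.2 and 3.0 are illustrative picks within the quoted range, and treating the threshold as zero when R0 is at or below 1 is a simplifying assumption of this sketch.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Classic threshold 1 - 1/R0; taken as 0 when R0 <= 1 (sketch assumption)."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


def hit_slope(r0: float) -> float:
    """Derivative of the threshold with respect to R0: 1 / R0**2."""
    return 1.0 / r0 ** 2


for r0 in (1.3, 2.2, 3.0, 5.7):
    hit = herd_immunity_threshold(r0)
    print(f"R0 = {r0:3.1f}: HIT = {hit:.1%}, slope = {hit_slope(r0):.3f}")
```

The slope shrinks as R0 rises, so the threshold increases quickly at low values of R0 and flattens at higher ones.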
Estimates of total (detected and undetected) virus mortality have been generally around 1% (Preidt 2020), although recent estimates suggest it may be as low as 0.5% (Hamilton 2020). A 0.5 to 1% mortality rate is about 5 to 10 times higher than that of influenza, and with SARS-Cov-2 it can impact a much larger share of the population infected from latent spread. Social distancing and case quarantining have reduced observed mortality, and there have been many reports of undercounting of COVID-19 deaths. Higher mortality rates have especially impacted vulnerable (pre-existing conditions and older) and under-privileged populations (e.g. Borjas 2020) and can similarly affect medical staff, first responders or others with high exposure and inadequate personal protective equipment (PPE). On the other hand, although there have clearly been many tragic deaths among the young also, the statistical mortality rate from SARS-Cov-2 has been considerably lower across younger groups without vulnerable conditions.

3 Simplifying assumptions make the equation complex to apply in many aspects of public health practice (Fine et al 2011). Surviving SARS-Cov-2 infection is also assumed to generally convey immunity, similar to other diseases; observed reinfections have been rare, but research continues.

4 During a bad flu season, the city could expect to see around 50,000 flu cases and 50 resulting deaths over the 6-week peak in January-February, as calculated from CDC estimates of 'flu-like illnesses' and the detailed data reported by the New York State Health Department. Over the 6-week period ending April 25, 2020 (6 weeks after the first reported NYC COVID-19 death) the city's bolstered health system saw the flood of COVID-19 cases lead to 11,544 deaths. This is limited to the city of New York only as of April 25. It does not include the thousands of additional deaths in the suburbs and surrounding jurisdictions and thousands more after April 25.
These numbers suggest that from the personal standpoint of most individuals, the virus may well be perceived as fairly 'low risk.' Although worse than the flu, 0.5 to 1% is much lower than the approximate 10% death rate from the original SARS virus in 2003, for example. This, along with strong incentives to resume normal economic activities, clearly elevates the commons problem of SARS-Cov-2.
Public health messaging drawing on insights from behavioral science may be most effective in helping to maintain voluntary social-distancing and support for further MNPI and any needed RNPI (Bonell et al. 2020). An important simple message could emphasize the need for a large share of the population to acquire the virus to suppress it 'naturally,' with national mortality well beyond the already grim toll. The virus has clearly shown its capabilities as an individually 'low risk' but collective social disaster.
Applying the benchmark HIT equation, examples from the estimates of R0 cited in Figure 2, and the estimated mortality rate, Table 1 shows estimates of the implied 'worst case' number of deaths in the U.S. and worldwide. These are the approximate numbers of deaths needed to reach herd immunity with no vaccine. They are labelled 'worst case' because they estimate the approximate eventual outcomes with no ultimately successful treatments, vaccine, and NPI. The shaded part of Table 1 focuses on the central part of the distribution of surveyed estimates. Using a mortality rate of 0.75%, this area of the table implies approximately 25 to 41 million worst-case deaths worldwide and 1.1 to 1.7 million in the U.S.
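The 'worst case' arithmetic described above (population times herd immunity threshold times mortality rate) can be sketched as follows. This is not a reproduction of Table 1: the population figures and the two R0 values are rough assumptions introduced here for illustration, and only the 0.75% mortality rate is taken from the text.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Classic threshold 1 - 1/R0 (Fine et al 2011)."""
    return 1.0 - 1.0 / r0


def worst_case_deaths(population: float, r0: float, mortality_rate: float) -> float:
    """Deaths implied if the infected share reaches the threshold with no vaccine,
    treatment, or NPI: population * HIT * mortality rate."""
    return population * herd_immunity_threshold(r0) * mortality_rate


# Assumed round population figures (not from the paper).
POPULATIONS = {"World": 7.8e9, "U.S.": 330e6}
MORTALITY_RATE = 0.0075  # mid-range rate used in the text

for label, population in POPULATIONS.items():
    for r0 in (2.5, 3.0):  # illustrative R0 values
        deaths = worst_case_deaths(population, r0, MORTALITY_RATE)
        print(f"{label}, R0 = {r0}: about {deaths / 1e6:.1f} million deaths")
```

Lower alternative thresholds, such as those argued in the preprint studies, could be examined by replacing the classic threshold with a directly supplied value.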
In addition to estimates from the basic HIT equation, Table 1 also includes some alternative HIT estimates based on two recent preprint (not peer reviewed) papers that argue for a lower HIT than implied by the classic equation. Although most epidemiologists have centered on 60 to 70% as the likely HIT range for SARS-Cov-2, these alternative more optimistic models and arguments theorize that the HIT is reduced with highly susceptible or exposed groups becoming infected and immune first, preventative measures, and facilitating population connections (Britton et al 2020;Gomes et al 2020).
Although some have advocated it as a social strategy, herd immunity from natural spread through the population is not generally a social goal. Rather, total immunity = herd immunity + vaccinated immunity or effective treatment is the necessary ultimate target for virus suppression. In addition to mitigating health-systems overload, effective social distancing and NPI precautions 'buy time' until vaccinated immunity or successful treatment can substitute to ultimately suppress the virus effects and keep the eventual death toll down. Allen, et al (2020) refer to a maintained normalcy with 'no social distancing' approach as the "surrender" strategy. 5 Maintaining MNPI measures may be better for long-run economic growth. SARS-Cov-2 transmission is insidious, inviting commons-dilemma fueled complacency. Remaining health threats with further economic repercussions, including "second wave" or "saw tooth" behavior in the virus and local economies, should not be under-estimated, even as more locations in the U.S. and globally continue to relax RNPI. Latent infections and transmission may cause outbreaks that can overwhelm any local health care system and economy. COVID-19, like the 1918 pandemic, is likely having a significant negative impact on many businesses and long-term economic growth in any event (Eichenbaum et al 2020;Andersen et al 2020;Garrett 2007;Brainerd & Siegler 2003). Evidence across cities from the 1918 pandemic suggests that cities that intervened earlier and more aggressively if anything grew faster after the pandemic (Correia et al 2020). For a variety of reasons, low-and middle-income countries are the most vulnerable to SARS-Cov-2 and high mortality globally (Khalatbari-Soltani et al. 2020), and they also suffer from the reduced global economic growth. 
Because of many global connections that can only be limited at significant additional economic costs, worldwide herd immunities plus vaccine efficacy or effective treatment and quarantines are likely needed to fully suppress the threats of the virus to the U.S.
Expansion of trade wars and many other socioeconomic and political unintended consequences may follow from less effective policymaking, further slowing both U.S. and global economic growth in both the short and long term. 6 In the prospects for the "time buying" success of NPI, the unprecedented efforts across the globe to achieve one or more effective treatments or vaccine should not be counted out. Based on the constraints evident from past virus-containment and pharmaceutical experiences, no single 100% 'cure' for the pandemic is likely, and no satisfactory vaccine may be found (Picheta 2020). Successes in the mix of NPI, treatments, and vaccine, however, can mitigate the eventual global toll of SARS-Cov-2.

A Proposed Complementary Model for Estimating Determinants of Infection Rates Across Individuals or Local Areas Correcting for Sample-Selection Bias
Individual-level case data also exist, although these developing data must be anonymized.
The key local-area unknown is the total infection rate, i.e. the full infected proportion across or within local-area population(s). The numerator includes the observed test-positive cases plus many unobserved pre-symptomatic and asymptomatic infections in individuals never tested (or who previously tested negative but could test positive now if tested again). We can only observe whether some random individual from the population is infected based not on their true infection status, but on the observed outcome of the sample-selection variable, tested or not tested. The selection process into observability has been nonrandom. This creates the well-known problem of sample selection bias in the estimated parameters of a single-equation model of infection incidence.

6 To be sure, many trade strategies of all countries are being rethought in various ways in the anticipated wake of the COVID-19 crisis.

7 A bivariate logistic approach is also possible, although not common. Broader testing has been widely advocated to enhance quarantining and better target NPI measures (e.g. Berger, Herkenhoff & Mongey 2020; Begley 2020).
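The selection problem can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than as the paper's estimation method. The data-generating process below is hypothetical: latent infection and testing indexes have standard normal errors with correlation `rho`, infection status shifts the testing index by `gamma`, and the intercepts (-1.3 and -1.5) are arbitrary choices that give roughly a 10% true infection rate.

```python
import math
import random


def simulate_testing(n: int, rho: float, gamma: float, seed: int = 0):
    """Return (true infection rate, share tested, positivity among the tested)."""
    rng = random.Random(seed)
    infected_count = tested_count = positive_count = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        u = z1                                          # infection-equation error
        e = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2   # testing error, corr(u, e) = rho
        infected = (-1.3 + u) > 0                       # hypothetical infection index
        tested = (-1.5 + gamma * infected + e) > 0      # testing depends on infection
        infected_count += infected
        tested_count += tested
        positive_count += infected and tested
    positivity = positive_count / tested_count if tested_count else float("nan")
    return infected_count / n, tested_count / n, positivity


# Nonrandom testing: infection raises the chance of a test and the errors correlate.
selected = simulate_testing(200_000, rho=0.5, gamma=1.5)
# Benchmark: testing unrelated to infection status (as if randomly assigned).
random_like = simulate_testing(200_000, rho=0.0, gamma=0.0)

print("nonrandom testing: true rate %.3f, tested %.3f, positivity %.3f" % selected)
print("random-like testing: true rate %.3f, tested %.3f, positivity %.3f" % random_like)
```

Under nonrandom testing, positivity among the tested greatly overstates the population infection rate, whereas the two roughly coincide when testing is unrelated to infection status. A single-equation model fitted only to tested individuals inherits this distortion.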
The explanatory variables are likely to differ somewhat in the infection-incidence equation and the sample-selection equation, as there are factors that affect testing incidence but not the incidence of infections. Consider the following plausible determinants of the Infection versus Testing likelihoods in the general bivariate probit model specification:

The conceptual variables specified in (1) and (2) are largely self-explanatory effects on the effective contact rate and transmission vulnerability; these variables are specified in more detail in Table 2 below, with the expected signs on the variables. The variables are also more or less difficult to capture or measure depending on the nature of available data and level of information sought, and some may not be measurable easily or at all. "Flu" in the testing equation indicates the possibility of an individual likely to be given a SARS-Cov-2 test due to similar symptoms from the flu. If the presence of seasonal flu or other flu strain is always tested first, then testing for SARS-Cov-2 is sequential. Flu testing may also be given simultaneously or not at all. The predicted effect of "Flu" on net is ambiguous. Recent Infectious Contacts is an important variable clinically and in local areas. Some sampled large reductions in transmission across local areas attributed to NPI may reflect high infectious contacts from recent international travel (Miller et al 2020), and the benefits of NPI may well differ in less exposed areas.
Notice that (1)  The postulated model specifications in (1) and (2) lead to the recursive bivariate probit model specification (Maddala 1986; Filippini et al. 2018) given that the endogenous variable, Infected, appears in (2). Using the standard latent-variable specification for limited-dependent variables, this model is:

(3) $y_i^* = \beta_1' X_{1i} + \upsilon_i$, with $y_i = 1$ if $y_i^* > 0$ and $y_i = 0$ otherwise

Greene (1996). Table 2 shows many of the potential data sources and defines the explanatory variables in (1) and (2) more specifically, and their postulated effects. Equations (3) and (4) (3) and (4) as described below. Case-level data requires both anonymizing and careful assessment of the distributions of each of the individual-level data elements to remove erroneous data, address missing information, and so on. Such data could include additional explanatory variables beyond those listed in Table 2, including clinical/medical-condition indicators. Organizations have announced plans and proceeded to assemble case-level data to make it more widely available. 9 As shown in Table 2, group-level local-area data is widely available from multiple data sources.
For the set of endogenous and explanatory variables postulated, these data also have some missing information, differences in reporting, etc. but these data are readily accessed at the current time. With respect to individual-level reporting errors, the Central Limit Theorem suggests that for a sizable local-area population the reported group-level means from individual data should tend to be normally

9 Ciox, a group partnering with LabCorp, for example, announced on April 10, 2020 that they are performing anonymization "with an initial data set based on LabCorp's nearly 500,000 completed COVID-19 tests" (Eddy 2020). Two weeks later, Cerner Corporation announced a similar plan to offer "select U.S. health systems and academic research centers complimentary access to critical de-identified COVID-19 patient data to help fight the pandemic. This offering will provide eligible health care researchers free access to Cerner's COVID-19 data set to support epidemiological studies, clinical trials and medical treatments related to COVID-19, in line with applicable laws and guidelines" (Cerner Corporation 2020). See Jason (2020) who also describes these efforts and another similar one from Health Catalyst. The WHO also has a Global COVID-19 Clinical Data Platform, although it is limited to hospitalized patients. Similarly, the CDC has a database, COVID-NET, that was "implemented to produce robust, weekly, age-stratified COVID-19-associated hospitalization rates" (Garg et al. 2020). Organizations first providing the anonymized-case databases may also request feedback from qualified initial users.

what kinds of conditions. For these reasons also, local-area-data modeling and projections that are faster to produce have significant clear advantages.
To estimate equations (3) and (4) (3) and (4). Other estimation and data considerations for estimating this specific model are described in the Appendix.

Potential Applications of the Recursive Bivariate Probit Model Estimates
Scorecard Models: With anonymized case-level data, an estimated scoring model can rank order the likelihood that an individual with symptoms is infected with SARS-Cov-2. A preliminary scorecard-type model estimation has been reported for the virus using patient and control- and validation-group clinical and other data (Tordjman et al. 2020). 10 These authors found no clinical variables that were statistically significant. A scoring model controlling for sample selection bias could be valuable in different ways in both advanced and under-privileged health systems. With insufficient diagnostic tests available, a scorecard may be able to assist providers in triaging the available tests. It may also help assess more uncertain cases that have tested negative, or patients that can only be evaluated remotely.
More aggressive quarantining may be advised or pursued for those with a higher likelihood of infection.
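A scorecard of this kind can be sketched as a probit index converted to a probability and then used to rank order individuals. Everything numeric below is hypothetical: the coefficients are placeholders, not estimates, and the two indicator names are stand-ins for whatever explanatory variables an estimated model would contain.

```python
from statistics import NormalDist

# Hypothetical probit coefficients (placeholders, not estimated values).
COEFFICIENTS = {
    "intercept": -1.0,
    "symptomatic": 0.8,
    "recent_infectious_contact": 0.9,
}


def infection_score(features: dict) -> float:
    """Predicted probability of infection: standard normal CDF of the linear index."""
    index = COEFFICIENTS["intercept"]
    for name, value in features.items():
        index += COEFFICIENTS[name] * value
    return NormalDist().cdf(index)


def rank_order(individuals: dict) -> list:
    """Return (id, score) pairs sorted from highest to lowest predicted probability."""
    scored = [(person, infection_score(x)) for person, x in individuals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


patients = {
    "A": {"symptomatic": 1, "recent_infectious_contact": 1},
    "B": {"symptomatic": 1, "recent_infectious_contact": 0},
    "C": {"symptomatic": 0, "recent_infectious_contact": 0},
}
for person, score in rank_order(patients):
    print(f"patient {person}: predicted infection probability {score:.2f}")
```

With scarce diagnostic tests, the ranking rather than the absolute probability would drive triage, so the ordering matters more than the calibration of any single score.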
If a stratified random sample of testing has been performed across local areas, it may be used together with individual- or area-scorecard results. Scorecard results can assess and rank order the distributions of infection-risk characteristics within and across local areas. Such complementary findings may allow more precise prediction of the probabilities of infection across local areas, improving identification of emerging and potential nearby outbreaks, including areas currently showing fewer cases.
Individual- or area-scorecard model results may also be used to better target stratification in random testing. Areas with higher infection-risk characteristics may be targeted for higher sample weights.
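One simple way to turn area scorecard results into sample weights is to allocate a fixed testing budget in proportion to population times predicted risk. This weighting rule is an assumption chosen for illustration, not a rule taken from the paper, and the area names, populations, and risk scores are hypothetical.

```python
def allocate_tests(total_tests: float, areas: dict) -> dict:
    """Allocate tests across areas in proportion to population * risk score.

    `areas` maps an area name to a (population, risk_score) pair.
    """
    weights = {name: pop * risk for name, (pop, risk) in areas.items()}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one area needs a positive population and risk score")
    return {name: total_tests * w / total_weight for name, w in weights.items()}


# Hypothetical areas: (population, scorecard-based risk score).
areas = {
    "urban_core": (500_000, 0.12),
    "suburb": (300_000, 0.06),
    "rural": (200_000, 0.02),
}
for name, tests in allocate_tests(10_000, areas).items():
    print(f"{name}: about {tests:.0f} tests")
```

In practice a minimum allocation per area would likely be needed so that low-risk strata are still sampled.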
Scorecards may also be used for initial estimates of area infection rates prior to random sampling.  Vella & Verbeek (1999) and Verbeek (1990).

Improved Assumptions in Epidemiological Forecasting Models: Epidemiological (typically SEIR infectious disease) models remain the key tools for forecasting the future paths of infection across local areas, particularly given the potentials for rapid exponential spread due to the high general R0 of SARS-Cov-2. Empirical hazard-model results may be able to provide useful complementary benchmarks and insights in refining parameter assumptions within the epidemiological models.
As discussed, the most pressing but very uncertain state and local policy questions today concern the effectiveness of various RNPI and MNPI gradations of social distancing in reducing the spread of infection. Courtemanche et al. (2020), for example, used county data across the U.S. to estimate the marginal impacts on reported infection growth rates of specific local-area gradations of RNPI social-distancing measures, controlling for unobserved fixed effects. 11

11 This study included testing incidence as a control variable in the estimation, which implicitly assumes that ρ = 0, a testable assumption in the RBP model. Testing incidence was also measured at the state instead of county level.

question today is whether the macroeconomic effects of the pandemic will be short- or longer-lived.

Conclusion
This paper's survey and assessment of estimates of the transmission features of SARS-Cov-2 highlights the importance of continued estimation of NPI and other covariate effects on the virus's behavior and mortality outcomes. The recursive bivariate probit model proposed can be used to address sample-selection bias in infection data, with multiple applications suggested. Longer-term projections of infection forecasting models especially depend on assumed effects of NPI policies and behavior.
History may provide some additional insight on current pandemic policies. Policies across individual U.S. states today parallel previous global experiences in pandemic policymaking with two alternative directions (Baekkeskov 2016). One policy (correlating with states with lower R0 and more limited outbreaks) emphasizes a national 'theme,' which tends to be followed with countervailing information ignored or 'underreaction' (Maor 2014) (Straka & Straka 2020). U.S. pandemic policies seem likely to continue to reflect the national election year, which may increase the reliance on MNPI and voluntary behavior.

Fig. 1. Estimated Incubation and Serial Interval Days: SARS-Cov-2 vs. Flu
Source: Reported Statistical Estimates and WHO (see references).

Fig. 2. Estimates of R0: SARS-Cov-2 vs. Seasonal Flu
Source: Existing Statistical Estimates (see references).

From two preprint studies (not peer reviewed). Under specific theoretical assumptions about highly susceptible or exposed groups becoming infected and immune first, preventive measures in place, and population connection heterogeneities, transmission may decline at lower threshold levels.

Table 2
a Low income is assumed associated with more dense neighborhoods and closer living quarters and less access to testing (Borjas 2020). High Income is assumed associated with greater likelihood of prior domestic and overseas travel and greater access to testing.
b These vulnerability factors increase the COVID-19 severity risk, but their effect on SARS-Cov-2 incidence likelihood is unclear, all else constant, especially with more self-quarantining conducted by those more vulnerable to severe COVID-19 illness.

Appendix
Estimation of the recursive bivariate probit model with either synthetic assignment of 0's and 1's to represent local-area rates or fractional bivariate probit requires estimates of the local-area population. Figure 3 describes components of the Susceptible, Infectious, and Recovered groups within a local-area population (and the associated mixtures of testing status), beginning from the last known population count or estimate (e.g., from the 2019 CPS estimate for a specific county).

Estimation of Area Population:
The relevant current population of any local area is equal to:

CURR. POP. = Last Estimate - Out-Migration - Deceased (COVID-19, Other) + In-Migration + Births

The values of each of the variables on the right side above, except for Last Estimate, are dynamic and can change daily. For this reason, it seems generally reasonable to assume that Last Estimate provides a sufficiently close estimate of the Current Population, which implicitly assumes that Out-Migration + Deceased = In-Migration + Births. This is a reasonable simplification in general (although more recent data on deaths, births, and migration should be used if available and relevant). If a local area with a large outbreak experienced sizable out-migration (fleeing the outbreak before expected travel restrictions) or deaths, and very little or no in-migration, this suggests the possible need for assessment and adjustment of the assumed area denominator. 12 The best denominator to use depends on the main purpose of the estimation, and the timing of out-migrations. 13

The widely tracked and reported known infected cases in a local area include all those who tested positive for SARS-Cov-2, including the deaths from resulting COVID-19 (in addition to deaths, many areas now also report still-active cases and other information). As depicted in Figure 3, most of this group are at varying stages of recovery or stages of becoming non-infectious (typically thought to be reached about 4 to 6 weeks after release or testing). 14

12 Journalists have documented this in the case of Wuhan, for example; it may have also occurred elsewhere as the growing severity of the virus and a metropolitan-area outbreak became clear.

13 If the purpose pertains to the infection rate of the total exposed population, then using Last Estimate should be reasonably close to the total exposed; however, any diagnostic SARS-Cov-2 tests performed elsewhere on out-migrated individuals are censored from the observed area data. If the main purpose pertains to the current residing population of the area, it is more important to adjust the denominator to reflect materially relevant out-migration.

14 Many of those infected may have lasting physical damages, making them the 'wounded' share of the infected population, which only future research may be able to estimate. Global experience has also shown that some recovered patients have become re-infected, for reasons that are unclear. The share of the infected population that does not become antibody immune is generally believed (or hoped) to be small, but not yet fully known.

Addressing the Different Types of Testing: Figure 3 also describes two types of SARS-Cov-2 testing, and this is relevant for the estimation of the selection equation for Pr (Tested). The number of available tests has continued to grow, with varying accuracy, although accuracy levels have generally improved. Most available kits are a diagnostic test (DT) for the presence of active virus. For simplicity, we can assume that most or all diagnostic testing performed as reported for a local area is reasonably similar in type and similarly allocated in a non-random, largely clinical diagnostic manner (and that any divergences from this are negligible or immaterial). Various states in the U.S. have provided more detailed information on the type of tests in use. 15 Less available but statistically useful kits allow testing a population for the presence of antibodies. Those who previously had the virus typically develop the presence of antibodies. The antibody tests (AT) help to estimate how close a population may be to the estimated 'herd immunity' threshold, for example, and as referenced in the text they can also allow more accurate estimation of the ultimate mortality rate.
The AT types currently available have had more problems and may not be reliable (Vogel 2020), although accuracy has improved. 16 Most states in the U.S. are reporting any AT performed separately from the DT.
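The population accounting identity under 'Estimation of Area Population' can be made concrete with a short sketch. The function names and all numbers below are hypothetical and chosen only for illustration; the point is to show how one might check whether Last Estimate is still an acceptable denominator after an outbreak with sizable out-migration.

```python
def current_population(last_estimate, out_migration=0, deaths_covid=0,
                       deaths_other=0, in_migration=0, births=0):
    """Apply the identity: last estimate minus outflows plus inflows."""
    return (last_estimate - out_migration - deaths_covid - deaths_other
            + in_migration + births)


def relative_gap(last_estimate, current):
    """Share by which Last Estimate overstates the current population."""
    return (last_estimate - current) / last_estimate


# Hypothetical outbreak area with heavy out-migration and little in-migration.
last = 100_000
current = current_population(last, out_migration=5_000, deaths_covid=300,
                             deaths_other=200, in_migration=100, births=250)
gap = relative_gap(last, current)

# With no adjustments the identity collapses to Last Estimate, which is the
# simplifying assumption Out-Migration + Deceased = In-Migration + Births.
unadjusted = current_population(last)
```

In this illustrative case the gap is a little over 5 percent. Whether a gap of that size is material depends on the purpose of the estimation, as footnote 13 discusses.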
Addressing Data Reporting Discrepancies:
Data reporting discrepancies need to be considered for relevance and possible adjustments if needed. Not all sources are reporting and tracking the incidence of SARS-CoV-2 infections and testing in the same way or with the same timeliness. For example, Meyer & Madrigal (2020) report data discrepancies between state health departments' testing data and recently started CDC reports on the incidence of testing by state. Most states have reported confirmed test-positive cases as their total case count, with some discrepancies. 17 For a discussion of COVID-19 data reporting needs, see Pearce et al. (2020). When comparing confirmed COVID-19 deaths with those from the flu, Faust (2020) has shown the difference in reporting of confirmed COVID-19 deaths versus confirmed flu deaths: many observers have not accounted for the CDC's inclusion of total deaths from all 'flu-like' illnesses (including pneumonia); actual confirmed annual flu deaths in the U.S. have been far lower than those from COVID-19 to date.
15 'Kits' must include swabs and other materials needed for any test, and available trained personnel as well.
16 New York State has reported a specificity of 97-100% for the AT used in its population testing. The CDC recently issued guidance on AT, warning of the need to interpret results properly in view of probabilities conditional on an area of high or low infections.
17 Virginia, for example, also includes "probable cases" in its count of total cases.
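The caution in footnote 16 about interpreting AT results conditional on local infection levels follows from Bayes' rule, and a brief sketch may help. The specificity of 0.97 is the low end of the range reported for New York State; the sensitivity of 0.90 and the two prevalence values are assumptions made here purely for illustration.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Pr(truly has antibodies | positive test), by Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Assumed test characteristics (sensitivity is hypothetical).
SENSITIVITY = 0.90
SPECIFICITY = 0.97

# Same test, two hypothetical areas with low and high prior infection.
ppv_low = positive_predictive_value(0.02, SENSITIVITY, SPECIFICITY)
ppv_high = positive_predictive_value(0.20, SENSITIVITY, SPECIFICITY)
```

Under these assumed inputs, fewer than half of positive results in the low-prevalence area would be true positives, while most would be in the high-prevalence area. This illustrates why raw AT positive rates from low-infection areas may need adjustment before they are used in estimation.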