ESTIMATES OF SARS-COV-2 BEHAVIOR IN THE COVID-19 CRISIS
Addressing Sample-Selection Bias for Public Health Applications

This study surveys and assesses the implications of recent empirical studies and reports to highlight the characteristics of SARS-Cov-2 and the COVID-19 crisis, and then proposes a recursive bivariate probit (RBP) model specification and possible applications. The RBP model addresses sample-selection bias to estimate key determinants of virus infection given nonrandom testing. The model is applicable to anonymized case-level data or widely available local-area data in the U.S., and multiple data sources are shown. With suitable data, the model can control for observed (e.g. population density) and unobserved factors to estimate the marginal effects of varying state-prescribed measures and behavioral social distancing. Case-level scoring models may, in addition, eventually assist in clinical diagnostic assessments. Although not proposed as a substitute for more random population testing and other methods, the results could also be used in advance of more testing. Uncertain assumptions in epidemiological models reflect the unclear effects of the gradations of social distancing now occurring. Despite many calls for broader testing and targeted quarantining in the U.S., many practical obstacles remain, leaving unknowns, especially across local areas. Differing local transmission rates respond to stronger or weaker social distancing and quarantining. High risks from latent non-quarantined spread warn of potentially overwhelming local outbreaks. The insidious nature of SARS-Cov-2 invites complacency, especially in non-hotspot areas. Complacent behaviors can fail to adequately address the public-goods problem, leading to various forms of continued local and macro COVID-19 waves and crises. To assess a worst-case scenario, no model projection is needed, only the herd immunity threshold equation, estimates of the reproduction ratio, and the estimated mortality rate.
With no ultimately successful countermeasures in treatment, vaccines, and non-pharmaceutical interventions (NPIs), the analysis here suggests an eventual number of deaths much like that of the 1918 pandemic, both in U.S. deaths per capita (1.8-2.7 million U.S. deaths) and in total deaths worldwide (around 50 million). This toll also reflects a hypothetical global “surrender” strategy of business-as-usual and no social distancing, which in practice no nation has followed. Some success across the three broad social countermeasure efforts, in the mix of outcomes that appears most likely, can lessen the high social costs.


Introduction
Decision makers in the COVID-19 crisis today face tremendous pressures, uncertainties, and statistical unknowns. These require enhanced analytics and model estimates based on available data and information, especially more local information. Countless institutions must also continue to plan and prepare for different possible scenarios of the COVID-19 pandemic and the associated macroeconomic and local economic fallout. Ironically, larger institutions already deploy many models in projections and simulations, and those models are now at elevated model risk. Yet the single largest model risk today probably lies externally, in the key 'containment' (effective contact rate) parameter assumptions of the epidemiological models in such wide use for crisis management. These assumptions depend heavily on differing policy choices and on behavioral gradations of local social distancing and vigilance. All of these are now shifting in view of the socioeconomic tradeoffs and tolerance levels for mandated restrictions on commerce and activity.
COVID-19 policies and social behaviors reflect the ability or willingness of a society and its leaders to address the stark public-goods problem of 'compliant' voluntary social distancing. In a grim variant of the 'tragedy of the commons,' both the insidious behavior of SARS-Cov-2 and strong economic incentives naturally induce more individuals to exercise and demand social and economic freedoms. However, these decisions do not reflect the full social cost of the actions. In aggregate, they produce a daily rising national or local death toll, potential 'waves' in outbreak hotspots of the virus, and potentially longer-lasting macro and local economic effects. The novel coronavirus has thus created a novel public-goods problem for many. Leaders respond to this problem and the social tradeoffs involved using their own calculus and interpretations of available models and other scientific information and data. Sections 2 and 3 address the underlying background and causes of this social dilemma.
This study surveys and assesses the implications of recent empirical studies and reports to highlight the characteristics of SARS-Cov-2 and the COVID-19 crisis (in sections 2 and 3), and then proposes a recursive bivariate probit (RBP) model specification for possible applications (in sections 4 and 5). The RBP model addresses sample-selection bias to estimate key determinants of virus infection given nonrandom testing. The model is applicable to anonymized case-level data or widely available local-area data in the U.S., and multiple data sources are shown. With suitable and sufficient data, the model can control for observed (such as population density) and unobserved factors to estimate the marginal effects of varying state-prescribed measures and behavioral social distancing. Case-level scoring models may, in addition, eventually assist in clinical diagnostic assessments.
High risks from latent non-quarantined spread warn of potential outbreaks that can overwhelm any local health care system and economy. The insidious nature of SARS-Cov-2 invites complacency, especially in non-hotspot areas. Complacent behaviors can fail to adequately address the public-goods problem, leading to various forms of continued local and macro COVID-19 waves and crises. Examples include ignoring social distancing and mask wearing, and 'domino' political effects from local-government relaxations. To assess a worst-case scenario, no model projection is needed, only the herd immunity threshold equation, estimates of the reproduction ratio, and the estimated mortality rate. Suppose no countermeasure ultimately succeeds: not effective treatment, not vaccines, and not non-pharmaceutical interventions (NPIs) such as broad testing with quarantining. In that case, the analysis here, which reflects many empirical studies, suggests an eventual number of deaths much like that of the 1918 pandemic. This holds both in U.S. deaths per capita (1.8-2.7 million U.S. deaths) and in total deaths worldwide (around 50 million). This toll also reflects a hypothetical global "surrender" strategy of business-as-usual and no social distancing, which in practice no nation has followed. A mix of limited successes across the three broad social countermeasures underway, which appears the most likely outcome, can lessen the high social costs.

Problems of Testing and Forecasting Virus Transmission, Mortality, and Global Economics
Most data on the SARS-Cov-2 virus behind COVID-19 suffer from sample-selection bias due to non-random testing. The first-stage triage of available non-random diagnostic testing has, of course, been largely limited to ill or severely ill patients. This makes these data unreliable for many public planning purposes unless they are handled carefully and appropriately. This data limitation has prompted more 'asymptomatic' and randomized population testing in the U.S. and elsewhere, which should continue to grow.[1] Broader motivations have contributed as well, including more technological case identification and quarantining through 'contact tracing.' Berger, Herkenhoff & Mongey (2020) assume an identified-case 'quarantining technology' fixed at the estimated rate at which identified U.S. cases entered quarantine during March 2020. They then suggest that the total deaths occurring under this policy could also be achieved under looser quarantine measures combined with a substantial increase in random testing of asymptomatic individuals. Such testing, in conjunction with targeted quarantine policies, could dampen the economic impact of the COVID-19 crisis and reduce peak symptomatic infections to ease hospital capacity constraints.
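A small Bayes calculation makes the selection problem described above concrete. The sketch below is not the RBP model proposed in this study. It is a hypothetical illustration, assuming a perfectly accurate test and made-up probabilities. It shows how triage-style testing of mostly symptomatic people inflates the observed positivity rate relative to true prevalence, while equal-probability (random) testing recovers it. The function name `observed_positivity` and all inputs are illustrative assumptions.

```python
# Hypothetical illustration of sample-selection bias from symptom-driven testing.
# All probabilities are made-up inputs, and the test is assumed to be perfect.


def observed_positivity(prevalence: float,
                        p_symptoms_infected: float,
                        p_symptoms_uninfected: float,
                        p_test_symptomatic: float,
                        p_test_asymptomatic: float) -> float:
    """Return P(infected | tested) when the chance of testing depends on symptoms."""
    # Chance that an infected person is tested, averaging over symptom status.
    p_test_given_infected = (p_symptoms_infected * p_test_symptomatic
                             + (1 - p_symptoms_infected) * p_test_asymptomatic)
    # Same for an uninfected person (symptoms here come from other illnesses).
    p_test_given_uninfected = (p_symptoms_uninfected * p_test_symptomatic
                               + (1 - p_symptoms_uninfected) * p_test_asymptomatic)
    tested_infected = prevalence * p_test_given_infected
    tested_total = tested_infected + (1 - prevalence) * p_test_given_uninfected
    # With a perfect test, positivity equals the infected share of those tested.
    return tested_infected / tested_total


if __name__ == "__main__":
    true_prevalence = 0.02
    # Triage-style testing: symptomatic people are far more likely to be tested.
    triage = observed_positivity(true_prevalence, 0.5, 0.05, 0.5, 0.01)
    # Random testing: everyone has the same 10% chance of being tested.
    random_sample = observed_positivity(true_prevalence, 0.5, 0.05, 0.1, 0.1)
    print(f"triage positivity: {triage:.3f}")         # 0.131
    print(f"random positivity: {random_sample:.3f}")  # 0.020
```

Under these assumed inputs, triage testing reports about 13% positivity against a true prevalence of 2%. The size of the gap depends entirely on the assumed symptom and testing probabilities, which is why a selection model rather than a fixed correction factor is needed.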
Despite many such calls for more random testing and targeted quarantining, however, many practical obstacles have remained. Technical, manufacturing, logistical, and testing-error problems have been widely reported. These problems arose even before proper statistical stratification in random sampling could cover key vulnerable groups and small-population areas, and they complicate that stratification. State health departments across the U.S. have been allocating limited testing to emerging or potential outbreak hotspots, including nursing homes, other concentrations of vulnerable groups, first responders, and densely populated neighborhoods.
More 'high-tech' targeted trace-and-quarantine approaches can raise privacy concerns. Quarantining of the vulnerable, as in nursing homes, is socially costly for many individuals, facilities, and family members. The same is true of voluntary self-quarantining elsewhere by retirees and other older individuals. Quarantining of the infected population is only as effective as voluntary compliance, penalties, and enforcement can make it, and these vary widely across the globe. It is also obviously limited to test-identified cases.
Less commonly referenced have been the uncertainties around differing virus transmission rates across local environments. The reproduction ratio of the virus, R0, varies across a population as local transmission speeds reflect population and workplace densities, transit, social gatherings, etc.
[1] As of this writing, some broader randomized testing has begun in a few U.S. states, such as New York and California. Some studies have assessed testing needs by state, for example, to 'safely reopen' (Begley 2020). The CDC has just announced a new plan. Global reports include random testing done within Germany and other places.
Estimated characteristics of SARS-Cov-2 (see section 3) also clearly suggest more asymptomatic or pre-symptomatic spread and a greater existing share of the population with current or past (recovered) infection. These characteristics include a short 'serial interval' of spread relative to a longer and more widely distributed incubation period than that of the seasonal flu, along with other factors such as a relatively lengthy period of infectiousness. This was apparently 'confirmed' recently in statewide randomized antibody testing across New York State (with a preliminary estimate, for example, of 1.7 million people infected in NYC; Saplakoglu 2020).[2] Li et al (2020) used a different approach to estimate the proportion of early infections in China that went undetected and their contribution to virus spread. They concluded that 86% of all infections were undocumented before travel restrictions, and that undocumented infections were the source of 79% of the documented cases. This explains the rapid geographic spread and indicates a particularly challenging virus to contain. Such findings have the 'benefit' of expanding the denominator and lowering the estimated total SARS-Cov-2 mortality rate, in the general direction long expected by most experts.
However, at the same time, cautiously determined estimates of overall mortality from SARS-Cov-2 infections (detected and undetected) appear to remain around 1% (Preidt 2020), perhaps somewhat higher or lower. This is not far from initial rough expert gauges of an eventual 1-2%. An estimated rate of around 1% is significantly higher than the mortality rate of influenza, on the order of 10 to 15 times higher. Importantly, as noted above, it also applies to a significantly larger share of the population that has been, is, or will be infected through largely non-quarantined spread. Social distancing and quarantining policies for identified cases have also reduced observed mortality to date by lessening the strains on local health systems, and there have been many reports of undercounting of COVID-19 deaths.
Moreover, evidence suggests higher virus mortality rates in many vulnerable populations (individuals with pre-existing conditions and older individuals) and among overly stressed or under-protected responding medical staff and first responders. The known characteristics of SARS-Cov-2 to date make simple comparisons with the flu fraught with statistical and epidemiological pitfalls.
[2] Questions have been raised about the accuracy of many antibody tests being offered. The antibody test performed in this case was developed by the New York State Department of Health.
High risks from latent non-quarantined spread of SARS-Cov-2 warn of the continued danger of smaller versions of the grim, rapid, and deep NYC-area outbreak, implying a strong need for continued national and global vigilance against complacency. An outbreak can overwhelm any local health system, with further loss of life and other economic damage. Local health systems have typically developed to a size capable of serving the local population under normal circumstances, with various widely reported shortages of care in rural or more impoverished areas and neighborhoods.
It is worth briefly emphasizing here some of the clear underlying and resulting differences between SARS-Cov-2 and the seasonal flu. Unlike the widely studied flu, SARS-Cov-2 obviously has no vaccine at all and, to date, no proven effective treatment. A successful treatment for SARS-Cov-2 has to lower mortality, as treatments for HIV, for example, achieved one decade into that crisis.
NYC is an extreme example, with its high population density, dense mass transit, and undetected spread of SARS-CoV-2 before social distancing began. Even so, its COVID-19 mortality relative to that of the flu has been stunning.
During a typical, relatively bad flu season in New York City (the five boroughs), as reported to the New York State Health Department, the city could expect around 50,000 flu cases and 50 resulting deaths over the 6-week peak period in January-February.[3] Contrast this with the recent 6-week experience in NYC ending April 25, 2020, just 6 weeks after the first reported COVID-19 death in the city. Over this period, the city's health system, amid the widely reported scramble to bolster it, had to deal with a flood of COVID-19 cases leading to 11,544 deaths.[4]
[3] Calculated from CDC estimates pertaining to 'flu-like illnesses' and the detailed data reported by the New York State Health Department. The NYC flu season peak this past January and February was a relatively bad one, and New York State overall had its highest number of flu cases ever in the 2019-2020 flu season (Oct 1 to Apr 4). Unlike SARS-Cov-2, the flu has been widely studied for many years, with much data and history available.
Nationwide, comparing 'apples to apples,' the number of confirmed COVID-19 deaths in less than three months to date (surpassing 90,000 as of this writing) has far exceeded annual confirmed flu deaths. Dr. J.S. Faust (2020) described how the CDC has been estimating flu cases and deaths, for example by including pneumonia in the estimate. He reported that over the past six years, the CDC's reported number of actual confirmed flu deaths has ranged from 3,448 to 15,620, counting flu deaths just as COVID-19 deaths are currently counted. These numbers are far lower than those commonly repeated by U.S. public officials and even various experts. The grim NYC hotspot and national death-toll comparisons highlight the warnings from epidemiologists that dismissive comparisons of SARS-Cov-2 with the seasonal flu are naively inaccurate.
The macro public-health model-risk uncertainties about SARS-Cov-2 and mortality are illustrated by the CDC, which now tracks projections from 14 epidemiological models. All of these models currently project more than 100,000 U.S. deaths by June 1 (CDC 2020). Their differing predictions and uncertainties, with sizable prediction intervals even over a forecast horizon of a couple of weeks, depend on assumptions about varying social distancing policies and other interventions. These assumptions cover both the choice of interventions and their effectiveness in shaping social and business behaviors as well as health systems. A recently released, validated epidemiological model from a research team at Harvard Medical School, Mass General Hospital, and Georgia Tech shows divergent projections through the entire summer. These projections are extremely dependent on assumed policy strategies and their effectiveness.[5] Three choices are shown:
- "current intervention": maintaining stay-at-home orders, except for essentials, in most states;
- "minimal restrictions": an assumed level of learned social awareness built into the model, such as hand-washing and avoiding close contact when sick;
- "lockdown": a complete ban on travel, as imposed in some countries such as Italy, China, and India.
Suppose the current-intervention strategy were hypothetically maintained through July 6, with minimal restrictions thereafter (although in practice it is already loosening). Projected total U.S. deaths from COVID-19 would then rise to about 116,000 by August 31, according to this model as of May 12, 2020. Under a hypothetical minimal-restrictions strategy from May 12 through August 31, essentially a 'full reopening' of restricted commerce and activity, the projection instead rises to a startling 2 million U.S. deaths by summer's end. This reflects the exponential growth of the virus.
This large difference clearly suggests that the hypothetical "minimal restrictions" strategy would essentially be a return to business-as-usual economic and other activity, with only minimal changes in actual social behavior.
Continued precautionary behavior by most businesses and large parts of the public, however, and "smart, phased-in" relaxations of social-distancing (at least on paper) should be able to keep such grim numbers, to some considerable extent, closer to the low end of the range between 116,000 and 2 million deaths; however, any model-based projections of what this "middle ground" might actually yield are highly uncertain, requiring many speculative assumptions. All assumptions affecting virus transmission and mortality could shift greatly at some point in response to a proven effective treatment, for example (in that case, transmission may increase while mortality falls). However, premature speculative hype on apparently promising but unproven pharmaceutical 'solutions' could support relaxations of effective social distancing and other precautions (due to beliefs that an effective 'cure' is coming or at hand), ultimately leading to more deaths.
It is sometimes asked, "If the models are so uncertain, and 'flattening the curve' just means we all have to get to herd immunity anyway, why should society have to endure the large economic and other social costs of social-distancing mandates and interventions? Wouldn't it be better to just 'take our bitter pill to swallow' sooner with no social distancing?" Given the high social costs of very high unemployment and related harms, which governments have been scrambling to mitigate, this is not an unreasonable question.
The common answer concerns the critical need not to overwhelm hospitals and all front-line medical resources with COVID-19 cases. This has indeed been an accurate, important, and strong message, both for the sake of severely stressed medical resources in hotspots and for avoiding the unnecessary deaths that overwhelmed local systems add to the death count. However, the common answer alone invites complacency. When strong social distancing mandates are effective (as they have been), leaders and the public see the strains on medical systems diminishing. That prompts political demands or expectations for a return to more normal overall economic activity. Four other very important parts of an answer to the question above also deserve attention.
First, the critical epidemiological problem from SARS-Cov-2 unfortunately does not simply disappear. It does not depend on any parameterized forecasting model, only on the more basic epidemiological frameworks and knowledge that have been applied to other diseases for decades. The ultimate mortality needed to stop SARS-Cov-2 depends on two critical factors. The first is the estimated reproduction ratio of SARS-Cov-2, used in the foundational herd immunity threshold equation. The second is the virus's raw or base mortality rate, which assumes no effective death-rate-mitigating treatment or vaccine and no other ultimately effective non-pharmaceutical public health intervention. Current estimates of the reproduction ratio are discussed in section 3. Together with the estimated mortality rate of about 1%, they imply about 1.8-2.7 million eventual U.S. deaths from SARS-Cov-2 under a 'just take it sooner with no social distancing' strategy (summarized in section 3, Table 1). The toll could be higher still, given the confidence intervals of available estimates. This is significantly larger than the CDC's estimate of about 675,000 U.S. deaths from the 1918 pandemic, because the U.S. population is now about 3.2 times its size in 1918. Per capita, the estimated number of U.S. deaths based on these estimated features of SARS-Cov-2 is very much the same as in the 1918 pandemic: scaling 675,000 by 3.2 gives about 2.2 million, inside the 1.8-2.7 million range. Under such a hypothetical raw worldwide 'no social distancing' strategy, eventual worldwide deaths would be about 43-64 million. This is very much in line with the CDC's estimate of at least 50 million worldwide deaths from the 1918 pandemic.
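Because the worst-case estimate needs only the threshold equation, R0, and the mortality rate, it can be reproduced in a few lines. This is a minimal sketch, not the study's Table 1. The population figures (about 328 million for the U.S. and 7.8 billion worldwide) are assumptions. The two R0 values (2.2 and 5.7) are also assumptions, back-solved so that a 1% mortality rate reproduces the U.S. range of roughly 1.8 to 2.7 million quoted above.

```python
# Worst-case ("surrender") death toll from the herd immunity threshold:
#   deaths ~= (1 - 1/R0) * population * mortality_rate
# Population figures and R0 values below are assumptions for illustration.


def herd_immunity_threshold(r0: float) -> float:
    """Share of the population that must be immune for sustained spread to stop."""
    if r0 <= 1.0:
        return 0.0  # an epidemic cannot sustain itself when R0 <= 1
    return 1.0 - 1.0 / r0


def worst_case_deaths(r0: float, population: float, mortality_rate: float) -> float:
    """Deaths if the threshold is reached through infection alone."""
    return herd_immunity_threshold(r0) * population * mortality_rate


if __name__ == "__main__":
    mortality_rate = 0.01  # ~1% overall mortality rate discussed in the text
    populations = {"U.S.": 328e6, "World": 7.8e9}  # assumed 2020-era populations
    for name, pop in populations.items():
        for r0 in (2.2, 5.7):  # assumed low and high R0 values
            deaths = worst_case_deaths(r0, pop, mortality_rate)
            print(f"{name} R0={r0}: threshold {herd_immunity_threshold(r0):.1%}, "
                  f"deaths {deaths / 1e6:.1f} million")
```

With these assumed inputs, the sketch gives about 1.8 and 2.7 million U.S. deaths and about 42.5 and 64.3 million worldwide. The worldwide figures are close to the 43-64 million cited in the text.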
The key message in these high 'needed' mortality numbers is that 'herd immunity' (from natural spread through the population, as in any species) is not a 'required' social goal for virus extinction or suppression in the human population. Rather, the necessary ultimate target is total immunity (herd immunity plus vaccinated immunity) or an effective treatment. The key macro public health purpose of social distancing and related intervention strategies is to save lives by lowering the COVID-19 death toll from the buildup of herd immunity. Doing so buys more time for the hoped-for development of a successful vaccine or treatment. In this way, more vaccinated immunity, or a successful treatment, can substitute instead to ultimately eradicate or suppress the virus or its effects and keep the ultimate death toll down. Allen et al (2020) refer to the 'just take it sooner with no social distancing' approach as the "surrender" strategy.
Second, public health intervention through strong social distancing and other measures not only saves lives but also appears to be better for long-run economic growth. The COVID-19 coronavirus pandemic, like the 1918 pandemic, will likely have a significant negative impact on many businesses and long-term economic growth in any event (e.g. Garrett 2007; Brainerd & Siegler 2003). Evidence across cities from the 1918 pandemic examined empirically by Correia et al (2020), for example, finds that cities that intervened earlier and more aggressively did not perform worse and, if anything, grew faster after the pandemic. These authors suggest that non-pharmaceutical interventions (NPIs) not only lower mortality but may also mitigate the long-term economic consequences of a pandemic.
Third, the U.S. economy has many economic (and leadership) interconnections with the rest of the world. As a result, fully eradicating or suppressing the threat of the virus in the U.S. and elsewhere actually requires worldwide herd immunity, or herd immunity combined with effective vaccines, treatments, or quarantine interventions. This is a much more complicated global problem, even in a purely macroeconomic sense. Despite its severe short- and long-term social and moral costs, a go-it-alone, get-to-herd-immunity-fast approach might sound appealing to some. Its appeal lies in forcefully bringing down the U.S. unemployment and business costs of strong social distancing. But such an approach would do little about the long-term economic impacts of the pandemic already in motion globally. It would also have both short- and longer-term macroeconomic effects, for example on net exports, under the severe world travel and commerce restrictions that would need to be maintained. Trade wars could expand, and many other economic, socioeconomic, and political 'unintended consequences' could also quite possibly result, further slowing both U.S. and global macroeconomic growth in the short and long term.[6]
Fourth, herd immunity is not a fixed constant as a 'goal.' Relaxations in effective social distancing have the confounding effect of raising the population's herd immunity threshold, pushing the herd immunity 'goal post' further away. Relaxations in social-distancing practices and other containment measures (such as wearing masks) raise R0. R0 determines the foundational herd immunity threshold of 1 - 1/R0 in a basic epidemiological relationship, as discussed in section 3. As R0 increases, the herd immunity threshold increases at the rate d(1 - 1/R0)/dR0 = 1/R0².
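The 'moving goal post' can be checked numerically. This minimal sketch evaluates the threshold H(R0) = 1 - 1/R0 from the text and its analytic derivative 1/R0². It then confirms the derivative with a central finite difference. The R0 values and function names are hypothetical, chosen only to mimic stepwise relaxations of distancing.

```python
# Sensitivity of the herd immunity threshold H(R0) = 1 - 1/R0 to R0.
# Its derivative is dH/dR0 = 1/R0**2, checked here by finite differences.


def threshold(r0: float) -> float:
    """Herd immunity threshold for a given reproduction ratio (R0 > 1)."""
    return 1.0 - 1.0 / r0


def threshold_slope(r0: float) -> float:
    """Analytic derivative of the threshold with respect to R0."""
    return 1.0 / r0 ** 2


def numeric_slope(r0: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of dH/dR0."""
    return (threshold(r0 + h) - threshold(r0 - h)) / (2 * h)


if __name__ == "__main__":
    # Hypothetical R0 values, e.g. successive relaxations of distancing.
    for r0 in (1.5, 2.0, 2.5, 3.0):
        print(f"R0={r0}: threshold {threshold(r0):.1%}, "
              f"slope {threshold_slope(r0):.3f} (numeric {numeric_slope(r0):.3f})")
```

Raising R0 from 2.0 to 2.5, for instance, lifts the threshold from 50% to 60% of the population. Because the slope 1/R0² shrinks as R0 grows, each further relaxation still moves the goal post, but by a smaller increment per unit of R0.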
At the same time, the 1.8-2.7 million eventual deaths in the U.S. cited above, and around 50 million worldwide, can also be viewed as the eventual outcome of a 'worst case' for the COVID-19 crisis.
A pandemic of this magnitude is beyond the capacity of conventional rapid case identification, contact tracing, and strict quarantining. Those methods succeeded globally in relatively quickly suppressing past dangerous virus outbreaks with no vaccine, such as SARS in the early 2000s. Society has three general 'fallback' strategies, all now underway through actions and developments around the world:
- new treatments;
- new vaccines;
- non-pharmaceutical interventions such as social-distancing practices, broad testing with targeted quarantining, and related crisis-management developments.
However, if all of these fallback countermeasures ultimately fail or prove ineffective, the worst-case eventual total mortality outcome is likely. For a variety of reasons, low- and middle-income countries are the most vulnerable to high mortality globally (Khalatbari-Soltani et al. 2020). Given past experiences with dangerous viruses of this general sort (no vaccine, no treatment), the worst-case outcome is not implausible, sooner or later (e.g. Picheta 2020).
On the other hand, the unprecedented scientific research and government and individual efforts across the globe to achieve one or more effective COVID-19 countermeasures should not be 'counted out.' Based on the economic incentives and constraints and the scientific and other constraints evident from past virus-containment and pharmaceutical experiences, no single 100% 'cure' for the problem is probable; however, a mixture of continued and future more limited successes across the three general countermeasure strategies appears doable and most likely.

Some Key Factors/Parameter Needs and Implications from Epidemiology
As an inducement to voluntary social-distancing behavior, a 1% overall mortality rate alone may not seem much of a threat to most of the general public. The rate is considerably lower for younger and less vulnerable individuals, even though it is appreciably higher than that of the flu. It is also much lower than the approximate 10% death rate of the original SARS virus in 2003, for example. Public health messaging drawing on insights from behavioral science may be most effective in helping to maintain voluntary social distancing and public support for any further needed mandated restrictions in the COVID-19 crisis (Bonell et al. 2020). An important simple message may be that an extraordinarily large share of the population would need to acquire SARS-Cov-2 to eradicate it 'naturally.' Without social interventions, changes in behavior, or a successful treatment or vaccine, around 200-250 million people in the U.S. would become infected, knowingly or unknowingly. The 1% death rate then applies to all of them, potentially leading to more than 2 million eventual U.S. deaths (1% of 200-250 million is 2-2.5 million). These deaths would be concentrated among more vulnerable or unfortunate groups (e.g. Borjas 2020) and first responders.
In a grim variant of the 'tragedy of the commons' as discussed earlier, both the insidious behavior of SARS-Cov-2 and strong economic incentives have created a novel public-goods problem for many.
This section assesses the underlying characteristics of the virus based on the growing body of epidemiological estimates and insights seen to date to better illuminate its insidious features.
Many groups and governments worldwide have been estimating projections of the paths of SARS-Cov-2 cases and mortality outcomes. Such projections are typically based on well-established epidemiological models and principles. The most basic model of a pandemic, the so-called SIR, or "Susceptible-Infectious-Removed," model, is widely used, typically in different varieties and more advanced versions. In the basic model, changes in these three population-group variables are modeled in a parameterized system of ordinary differential equations (Jones 2007; Odetti & Piterbarg 2020).[7] Such model-based estimates have become a critical tool in policymaking around the COVID-19 pandemic in the U.S. and elsewhere, including decisions to relax social-distancing mandates and guidelines.
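The basic SIR system described above can be sketched in a few lines. This is a minimal illustration, not one of the calibrated models discussed here. It assumes a closed, homogeneously mixing population measured in shares and constant β and removal rate ν, so that R0 = β/ν, consistent with footnote 8. It also uses simple forward-Euler integration. The parameter values and the function name `simulate_sir` are hypothetical.

```python
# Minimal SIR sketch: shares of a closed, homogeneously mixing population.
#   dS/dt = -beta*S*I,   dI/dt = beta*S*I - nu*I,   dR/dt = nu*I
# beta (effective contact rate) and nu (removal rate) are assumed constant,
# so R0 = beta / nu. Integrated with forward-Euler steps of dt days.


def simulate_sir(beta: float, nu: float, i0: float = 1e-4,
                 days: float = 730.0,
                 dt: float = 0.05) -> tuple[float, float, float, float]:
    """Return final (S, I, R) shares and the peak infectious share."""
    s, i, r = 1.0 - i0, i0, 0.0
    peak_i = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt
        new_removals = nu * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        peak_i = max(peak_i, i)
    return s, i, r, peak_i


if __name__ == "__main__":
    beta, nu = 0.25, 0.1  # hypothetical: R0 = 2.5, mean infectious period 10 days
    s, i, r, peak = simulate_sir(beta, nu)
    print(f"R0 = {beta / nu:.1f}")
    print(f"peak infectious share: {peak:.1%}")
    print(f"ever infected by end: {r + i:.1%} "
          f"(threshold 1 - 1/R0 = {1 - nu / beta:.1%})")
```

With these inputs, the simulated epidemic ends with close to 90% of the population ever infected, well past the 60% threshold. Infections already in progress when the threshold is crossed keep generating new cases, the 'overshoot' of the basic SIR model. Calibrated models add structure (exposed compartments, age groups, time-varying β) on top of this core.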
Calibrations of the parameters of epidemiological models for local areas may especially benefit from good estimates of the determinants of local-area infection rates and their specific marginal effects.
A critical epidemiological parameter that may be better gauged across local areas is the assumed or estimated reproduction ratio, R0, discussed above. This key parameter is the expected or average number of new cases produced by a typical single infection in a completely susceptible population.[8] R0 is directly related to the intensity of effective contact between the infected and susceptible populations, referred to as the transmission coefficient or effective contact rate, β.
For epidemiological assessments, R0 has many important and useful properties (e.g. Heffernan et al. 2005; Jones 2007) used in analytic estimates and projections based on the calibrated mathematical models. Although it is a key parameter, R0 is also difficult to pin down empirically, especially early in an epidemic outbreak, as it takes time to estimate or gauge the key factors based on available data and information. Kirkeby et al (2017), for example, noted that precise estimates of disease transmission rates are critical for the simulation models. Yet those rates typically must be estimated from longitudinal case data, which are costly and time-consuming to collect and analyze.
[7] "Removed" can also be called "Recovered," since most patients with a typical virus recover. The former is more precise, as it refers to all those removed from the susceptible population, including deaths. An important variant, more common among the advanced models in use, is the Susceptible-Exposed-Infectious-Recovered (SEIR) model.
[8] R0 is also equal to β × d, where d is the average duration of infectiousness. Equivalently, it equals β/ν, where ν is the rate of removal from the infected proportion of the population (so d = 1/ν). All parameters are assumed fixed (Jones 2007).
Epidemiologists helping to manage an unfolding crisis require faster evidence. They have developed statistical methods, mathematical relationships, and adaptable model assumptions that allow faster calculations and estimates, for example by drawing on quickly observed doubling times in a virus's spread. A rapidly growing set of papers, many included in the references, has been using available case information from China, Iran, and the U.S., for example, to estimate key model parameters for projecting the outcomes of SARS-Cov-2. To date, however, these estimates tend to have fairly wide statistical confidence intervals.
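One such shortcut links an observed doubling time to R0. The sketch below assumes the basic SIR model with an exponentially distributed infectious period, using the same β, ν, and d = 1/ν as in footnote 8. Early in an outbreak, when nearly everyone is susceptible, infections grow at rate g = β - ν = ν(R0 - 1). A doubling time T_d means g = ln 2 / T_d, so R0 = 1 + g·d. Other assumptions about when transmission occurs yield different formulas, which is one source of the wide intervals noted above. The doubling time and infectious periods used are hypothetical.

```python
import math

# In the basic SIR model, early growth (S close to 1) is exponential:
#   dI/dt ~= (beta - nu) * I, so the growth rate is g = beta - nu = nu * (R0 - 1).
# With doubling time T_d, g = ln(2) / T_d, and since d = 1/nu:
#   R0 = 1 + g * d


def r0_from_doubling_time(doubling_days: float, infectious_days: float) -> float:
    """R0 implied by an observed doubling time under basic SIR assumptions."""
    growth_rate = math.log(2) / doubling_days  # per-day exponential growth rate
    return 1.0 + growth_rate * infectious_days


if __name__ == "__main__":
    doubling_days = 3.0  # hypothetical observed doubling time
    # The implied R0 is sensitive to the assumed infectious period d.
    for infectious_days in (4.0, 7.0, 10.0):
        r0 = r0_from_doubling_time(doubling_days, infectious_days)
        print(f"d = {infectious_days:.0f} days -> R0 ~ {r0:.2f}")
```

The same observed doubling time implies R0 anywhere from about 1.9 to 3.3 across these assumed infectious periods. This is why early R0 estimates carry wide uncertainty even when case growth itself is measured well.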
As noted earlier, R0 is not simply fixed across a given population. A disease behaves differently in different local environments, depending on population density, public versus private transit, and other determinants of the frequency of social contacts and gatherings. Such factors directly impact the effective contact rate between the 'susceptible' and 'infected' populations. The effective contact rate, and therefore R0, is also of course lowered by traditional quarantining of patients and those exposed (identified through contact tracing), and, as seen so broadly today, by social distancing measures.
Transmission rates (the speed of spread) across a susceptible population depend on R0 and the 'serial interval' (the average time between the onset of symptoms in primary and secondary cases). In other words, the speed of an epidemic depends on how many other people each case infects and how long it takes for infection to pass between people. In assessing the available data used to calibrate their models, epidemiologists seeking to estimate R0 from the latest disease information as it unfolds treat the serial interval as an important factor.
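One quick way that an observed doubling time and a serial interval combine into an R0 estimate can be sketched as follows. The sketch assumes the simplest case, in which every secondary case occurs exactly one serial interval after its primary case, so that R0 = exp(r × Ts) with growth rate r = ln 2 / doubling time. Real serial intervals are distributed, which changes the formula; the inputs below are hypothetical.

```python
import math


def growth_rate_from_doubling(doubling_days: float) -> float:
    """Exponential growth rate r (per day) implied by an observed doubling time."""
    return math.log(2.0) / doubling_days


def r0_from_growth(r: float, serial_interval_days: float) -> float:
    """R0 = exp(r * Ts), assuming a fixed (non-random) serial interval Ts."""
    return math.exp(r * serial_interval_days)


# Hypothetical inputs: cases doubling every 3 days, serial interval of 4.5 days.
r = growth_rate_from_doubling(3.0)
print(round(r0_from_growth(r, 4.5), 2))  # 2 ** 1.5, about 2.83
```

Read the other way, holding R0 fixed, the implied doubling time Ts × ln 2 / ln R0 shrinks as the serial interval shortens, which is the sense in which the speed of spread depends on both quantities.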
Two other important factors are the incubation period (how long it takes for symptoms to appear in a typical infected person) and the average duration of infectiousness. The relationship between the incubation period and the serial interval, as well as the duration of infectiousness, has had important implications for the relative severity of the spread of SARS-Cov-2. These are discussed next through some empirical comparisons between SARS-Cov-2 and the seasonal flu (based on this author's readings cited in the figures below and the references).

Transmission Differences in SARS-Cov-2 vs. Seasonal Influenza
As discussed above, the comparative data observed for SARS-Cov-2 in NYC and nationally dramatically exceed those of a relatively bad flu season. Outbreak comparisons from many other hard-hit cities and nations around the world appear likely to be similar. The apparent underlying epidemiological causes of this difference are illuminating.
First, the median serial interval of the seasonal flu exceeds its average incubation period, and flu incubation periods are distributed comparatively narrowly. See Fig. 1. In contrast, the estimated SARS-Cov-2 serial interval is generally somewhat shorter than the estimated median incubation period, and SARS-Cov-2 incubation periods are spread across a considerably wider range of days. The median or average incubation period for SARS-Cov-2 is around 5-6 days, but it can vary from 1 to 14 days (WHO 2020). The virus has been characterized by a high R0, a long incubation period, and a short serial interval (Xie & Chen 2020). 9

Second, although outcomes across the SARS-Cov-2-infected population are on average more severe than across the typical flu-infected population, a comparatively larger share of the SARS-Cov-2 infected appear to experience no or only mild symptoms (which many simply chalk up to a small cold or seasonal allergy without seeking treatment or testing). In other words, the severity of symptoms across the infected population appears to be more bimodal with SARS-Cov-2. Most people with the flu know they have it quite quickly and naturally self-quarantine by 'staying home sick.' The same was true of SARS in 2003, with its severe symptoms. Not so with SARS-Cov-2, a critical difference.

Source: Existing Statistical Estimates (see refs.) and WHO
Both of these first two factors clearly elevate the relative public health risk from SARS-Cov-2. Each implies a larger scope for 'asymptomatic' or pre-symptomatic transmission: more of the infected population unwittingly passes the virus along to others instead of naturally self-quarantining, as with seasonal flu. In addition, a relatively short serial interval exacerbates pressure on health systems, as COVID-19 cases tend to surface and flow into hospitals and other medical providers in faster cycles.
Third, the widely studied R0 of the flu is considerably lower than the range of current estimates for SARS-Cov-2, as shown in Fig. 2. R0 for the seasonal flu has been estimated at 1.3. In comparison, based on a sizable set of recent studies, estimates of R0 for SARS-Cov-2 range from the low 2's to 5.7 or more. R0 is equal to the effective contact rate times the average duration of infectiousness. The effective contact rate of SARS-Cov-2 is elevated by its high level of unwitting, non-quarantined spread, and the duration of infectiousness of someone with SARS-Cov-2 (14 days or more after becoming ill) is estimated to be at least twice that of the flu. The higher R0 implies two bad things for ultimate mortality. Not only does a single infection with the novel coronavirus on average infect more people, but the combined level of naturally developed 'herd immunity' and vaccine-conferred immunity needed to extinguish the virus is also higher. Sanche et al. (2020), for example, reported that because the threshold of combined vaccine efficacy and herd immunity needed for disease extinction is 1 - 1/R0, the threshold is 55% at R0 = 2.2 and 82% at R0 = 5.7.
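The threshold figures quoted from Sanche et al. (2020) follow directly from 1 - 1/R0. The short Python sketch below reproduces them, adding the flu value of 1.3 for comparison. The choice to return 0 when R0 is at most 1 is an assumption made here, since no immunity is needed to stop spread that is not self-sustaining.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Share of the population that must be immune (naturally or via vaccine): 1 - 1/R0."""
    if r0 <= 1.0:
        return 0.0  # spread is not self-sustaining, so no threshold is needed
    return 1.0 - 1.0 / r0


for r0 in (1.3, 2.2, 5.7):
    print(r0, f"{herd_immunity_threshold(r0):.0%}")
# 1.3 23%
# 2.2 55%
# 5.7 82%
```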
It is easy to show that as R0 increases, the herd immunity threshold increases at the rate of $1/R_0^2$, which has a confounding implication for relaxations in social distancing, as discussed above.
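The one-line derivation behind this statement, writing the threshold as a function H of R0, is:

$$H(R_0) = 1 - \frac{1}{R_0}, \qquad \frac{dH}{dR_0} = \frac{1}{R_0^2} > 0, \qquad \frac{d^2H}{dR_0^2} = -\frac{2}{R_0^3} < 0.$$

The threshold therefore rises with R0, but by progressively smaller increments: the slope is about 0.21 at R0 = 2.2 and about 0.03 at R0 = 5.7.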
The herd immunity threshold equation 1 - 1/R0 used here has been a basic relationship in epidemiology for decades. Despite the simplifying assumptions behind it and the complexities of applying it in public health practice (in addition to other usages of the term 'herd immunity'), it remains a foundational benchmark (Fine et al. 2011). Table 1 applies this equation to examples from the range of R0 estimates in the many recent studies of SARS-Cov-2 cited in Figure 2, showing the implied 'worst case' number of deaths in the U.S. and worldwide, as discussed earlier.

For all the reasons described above, the behavior of SARS-Cov-2 is insidious. It encourages many to "let their guard down," allowing more rapid, often latent, exponential spread. The remaining threats to local and national public health, with further economic repercussions, including various "second wave" or "saw tooth" patterns of SARS-Cov-2 spread, should not be underestimated, even as more locations in the U.S. and around the world continue to relax social distancing measures owing to their high economic and other social costs. Leaders generally wish to avoid, or quickly contain, the type of recent 'reopening' experience seen in Hokkaido, Japan, for example. Hokkaido acted early, on February 28, to contain a rapid SARS-Cov-2 outbreak with a 3-week emergency 'lockdown,' but after 3 weeks it fully lifted the restrictions. This resulted in an even larger second wave of infections, requiring a renewed emergency lockdown on April 14 (Leonard 2020). 10 The resurgence occurred even though the island prefecture of Hokkaido, population 5.4 million, has the lowest population density in Japan, with overall density lower than that of Georgia and Michigan and only a little higher than that of South Carolina and Tennessee. 'Reopening' experiences across U.S. states could become similar if they move too far too fast to relax social distancing. 11
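The 'worst case' arithmetic behind Table 1 needs only the three ingredients named at the outset: the herd immunity threshold equation, an R0 estimate, and a mortality rate. A minimal sketch follows. The population and infection fatality rate (IFR) inputs are hypothetical and are not the values used for Table 1; treating the threshold share as the infected share follows the benchmark logic described above.

```python
def worst_case_deaths(population: float, r0: float, ifr: float) -> float:
    """Deaths if infections reach the threshold 1 - 1/R0 with no countermeasures.

    population: people at risk; r0: reproduction ratio;
    ifr: assumed infection fatality rate (deaths per infection).
    """
    threshold = max(0.0, 1.0 - 1.0 / r0)
    return population * threshold * ifr


# Hypothetical inputs for illustration only.
print(f"{worst_case_deaths(1_000_000, 2.5, 0.01):,.0f}")  # 6,000
```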

A Proposed Complementary Model for Estimating Determinants of Different Infection Rates Across Individuals or Local Areas Correcting for Sample-Selection Bias
Epidemiological and public-health analyses of cases and factors in local transmission may be assisted by a recursive bivariate-probit estimation method that addresses sample-selection bias, a method used previously in other economic and biostatistics applications (Marra et al. 2014). Applicable to individual or local-area data, this proposed modeling approach is complementary and is not proposed to substitute for other methods or broader randomized population testing, although it may also be useful for improved initial estimates in advance of more random testing. 12 Model estimation results from applying this method may be useful in: a) COVID-19 policymaking and tactical decisions; b) more precise parameter estimation and assumptions within the epidemiological projection models, with resulting more confident forecasts, scenarios, and other projections, particularly locally; c) aspects of stratified randomized population testing; and d) benchmarking against results from other modeling and estimation methods.

10 Japan had adopted a 'clustering approach' strategy, and its experts had initially celebrated an apparent early 'Hokkaido success story' in controlling the spread of SARS-Cov-2 (Suzuki 2020, which does not mention the second-wave lockdown but acknowledged shortcomings such as viewing the virus like a bad flu strain; Leonard 2020).

11 Previous global experiences with science-led policymaking during a pandemic point to two alternative policy directions (Baekkeskov 2016), which are also evident across the U.S. states today. One policy (correlating with states with more limited hotspots) emphasizes nationally established ideas that have placed policy on trajectories of first 'ignore,' then 'shut down,' then 'open up.' In this approach a national 'theme' is followed, with any countervailing new information eliciting 'underreaction' (Maor 2014) until such information becomes overwhelming. This approach appears most conducive to recurring waves of the virus. The other approach follows a more local 'knowledge-and-validation community,' where uncertainties cause differences in expert judgments but accruing information leads toward a better state-and-local-specific consensus on appropriate policies. The timing of recent policy changes, with nearly all states relaxing social distancing at least somewhat, some weeks ahead of the timing recommended by most experts, shows a 'domino effect,' i.e. national-idea-led policy affects all states.

12 A bivariate logistic approach is also possible, although not common.
Model estimates combined with other local-area data and information, feeding one or more epidemiological models and decision-making dashboards, may help better manage medical-resource deployments that seek to "skate to where the puck is going." This may help state and local leaders and analysts in various ways and at various points during cautious, phased relaxations of social distancing measures.
The data to estimate this proposed model are now widely reported in aggregated (group-level) form across states, cities, and local-area U.S. counties, and in some cases for neighborhoods in different cities. Individual-level case data also exist; however, these developing data in the U.S. require de-identification for HIPAA compliance, as discussed further below.
A key unknown for public health analysts and decision makers is the total infection rate, i.e. the full infected proportion across or within local-area populations. The numerator of this unknown rate for a local area includes the observed test-positive cases plus many unobserved pre-symptomatic and asymptomatic infections in individuals never tested (or who previously tested negative but could test positive now if tested again). This might be considered a case of partial observability of individual infections across the population or, somewhat more appropriately, of sample selection (the difference between these similar framings favors the latter in this case, as discussed below). We cannot observe whether a randomly chosen individual from the population is infected directly from their true infection status; we observe infection only through the outcome of a different status variable, tested or not tested (and perhaps repeatedly tested). And the selection process into observability is nonrandom. This creates the well-known statistical problem of sample selection bias in the estimated model parameters if we simply estimate a single-equation model to gauge the most important predictors of infection incidence.
A common multivariate estimation tool for addressing sample selection bias is the bivariate probit model estimator (e.g. Greene 2003/12; Cuddeback et al. 2004). This general approach includes a sample-selection equation and a substantive equation of main policy or similar interest. In this case, in general descriptive terms, we have:

(1) $y \sim \Pr(\text{Infected}) = f(\cdot) + \upsilon$   substantive equation of main interest
(2) $s \sim \Pr(\text{Tested}) = g(\cdot) + \nu$   sample-selection equation
(3) $\rho = \operatorname{Corr}(\upsilon, \nu) \neq 0$   nonzero correlation of error terms

The nonzero correlation of the error terms is likely for a number of reasons. One reason, for example, is the false negative rate in diagnostic testing for the novel coronavirus. If a patient presenting SARS-Cov-2-consistent symptoms tests negative for this virus, and particularly if they are also tested and found negative for the flu, they are often simply sent home and instructed to quarantine for 14 days. Only those with worsening symptoms, i.e. the "sickest of the sick," will typically return later for a repeated SARS-Cov-2 test. This does not mean that the "least sick of the sick" did not have the contagious virus.
Because re-testing is also more likely to be performed on those truly infected with SARS-Cov-2, unobservable factors that raise the likelihood of infection also raise the likelihood of testing, producing a positive correlation: υ is correlated with ν. 13 The nonzero correlation of the error terms means that simply estimating the substantive equation (1) alone will yield coefficient estimates that are biased and inconsistent.
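A small simulation can make this bias concrete. Everything in the sketch below is hypothetical: it draws standard normal errors u and v with correlation ρ, uses made-up intercept-only indices for Infected and Tested, and compares the true infection rate with the positivity rate among those tested. In an intercept-only version of (1), estimating on the tested subsample amounts to reading off that positivity rate, so the gap between the two numbers is the selection bias.

```python
import math
import random


def simulate_selection(n: int, rho: float, seed: int = 0):
    """Return (true infection rate, positivity rate among the tested).

    Hypothetical intercept-only indices: Infected if -1.0 + u > 0, Tested if -1.5 + v > 0,
    where u and v are standard normal errors with correlation rho.
    """
    rng = random.Random(seed)
    infected_count = tested_count = tested_positive = 0
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)  # Corr(u, v) = rho
        infected = -1.0 + u > 0
        tested = -1.5 + v > 0
        infected_count += infected
        if tested:
            tested_count += 1
            tested_positive += infected
    return infected_count / n, tested_positive / tested_count


for rho in (0.0, 0.6):
    true_rate, positivity = simulate_selection(100_000, rho)
    print(f"rho={rho}: true rate {true_rate:.3f}, positivity among tested {positivity:.3f}")
```

The true rate is about 0.159 (the standard normal probability above 1) in both runs. With ρ = 0 the positivity rate lands close to it, while with ρ = 0.6 the tested subsample overstates infection several-fold.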
The simultaneous estimation of (1) and (2) with a bivariate probit parameterization yields 'unbiased' and consistent parameter estimates in the substantive equation (1) of policy and public-health analytic interest (i.e. this approach 'corrects' for sample-selection bias). Whether this problem is more appropriately specified as a sample selection model rather than a partial observability model depends on the explanatory variables on the right-hand side of (1) versus (2) (Greene 2003). The explanatory variables are likely to differ somewhat between (1) and (2), as there are factors that affect population testing incidence but not the incidence of SARS-Cov-2 infections, for example.

13 One other causative factor driving this nonzero correlation, for example, has been the widespread public dissemination (easily obtainable via a simple Google search, TV viewing, etc.) of the types of symptoms typically consistent with most SARS-Cov-2 infections. This fact, plus most of the population's heightened fear of entering any medical or testing facility (for fear of contracting the virus), plus many doctors' need to triage limited test kits in favor of more likely cases, has tended to keep spurious test attempts down (mitigated to some extent by the limited "drive-through" testing sites that have been made available).
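The joint estimation of (1) and (2) can be made concrete with the log-likelihood of the standard bivariate probit with sample selection, the textbook form found in econometrics references such as Greene. The stdlib-only Python sketch below rests on several assumptions. It takes the two linear indices as already computed. It evaluates the bivariate normal CDF by Simpson's rule rather than a library routine. It also omits the recursive term in which Infected enters the Testing equation. Maximizing this function over the coefficients and ρ would require an optimizer; in practice, established routines (e.g. Stata's `heckprobit` or `biprobit`) would normally be used.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bvn_cdf(a: float, b: float, rho: float, n: int = 400) -> float:
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with correlation rho, |rho| < 1.

    Simpson's rule on the integral of phi(x) * Phi((b - rho*x) / sqrt(1 - rho^2)) for x <= a.
    """
    lower = -10.0  # phi(x) is negligible below this point
    if a <= lower:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = lower + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += weight * norm_pdf(x) * norm_cdf((b - rho * x) / scale)
    return total * h / 3.0


def selection_loglik(obs, rho: float) -> float:
    """Log-likelihood of a bivariate probit with sample selection.

    obs: iterable of (z_inf, z_test, tested, infected), where z_inf and z_test are the
    linear indices of the Infected and Tested equations; infected is ignored if untested.
    """
    ll = 0.0
    for z_inf, z_test, tested, infected in obs:
        if not tested:
            p = norm_cdf(-z_test)               # Pr(not tested)
        elif infected:
            p = bvn_cdf(z_inf, z_test, rho)     # Pr(tested and positive)
        else:
            p = bvn_cdf(-z_inf, z_test, -rho)   # Pr(tested and negative)
        ll += math.log(p)
    return ll
```

The three cell probabilities sum to one for any indices and ρ, which is a quick sanity check on the sign conventions.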
Consider the following plausible determinants of the Infection versus Testing likelihoods in the general model specifications (1) and (2). The conceptual variables specified in (1a) and (2a) are largely self-explanatory factors affecting the effective contact rate and virus vulnerability; these variables are specified in greater detail in Table 2 below, with the expected sign on each variable in view of the earlier descriptions and discussions. The variables are more or less difficult to capture or measure depending on the nature of the available data and the level of information sought, and some may not be measurable easily, or at all, at the individual or group level. "Flu" in the testing equation indicates the possibility that an individual is likely to be given a SARS-Cov-2 test because of similar symptoms from the flu. If the presence of seasonal flu or another flu strain is always tested first (as in testing-kit triage), then testing for SARS-Cov-2 is sequential. Flu testing may also be given simultaneously or not at all. The net predicted effect of "Flu" is therefore ambiguous.
Notice that (1a) does not include Age or vulnerable Health Condition (controlling for Living Facilities). This maintains the simplifying assumption that other than in more dense living facilities such as nursing homes and high-density housing and neighborhoods, these health-vulnerability conditions primarily affect the likelihood of infection severity, rather than the incidence of infection. A more complete model, beyond the scope of this paper, would also include an equation predicting the likelihood of infection severity with age and health vulnerability (correlated with other variables such as income and race; Borjas 2020) expected to have large effects.
The postulated model specifications in (1a) and (2a) lead to the recursive bivariate probit model specification (Maddala 1986; Filippini et al. 2018), given that the endogenous variable, Infected, appears in (2a). Using the standard latent-variable specification for limited dependent variables, the Infected equation for estimation is:

(3) $y_i^* = \beta_1' X_{1i} + \upsilon_i, \quad y_i = 1 \text{ if } y_i^* > 0, \quad y_i = 0 \text{ otherwise}$

with the Tested equation (4) specified analogously, including Infected among its explanatory variables (Greene 1996). Table 2 shows many of the potential data sources (focused on the U.S.), defines the explanatory variables in (1a) and (2a) more specifically, and shows their postulated effects. Equations (3) and (4) are specified, as usual, at the individual level, as this is where the endogenous individual-case selection and outcome variables, Tested and Infected, are observed directly. Case-level data, however, although potentially valuable across various applications, are not needed to estimate equations (3) and (4), as described below.

Table 2. Data Sources and Expected Effects of Explanatory Variables

a Low income is assumed to be associated with denser neighborhoods, closer living quarters, and less access to testing (Borjas 2020). High income is assumed to be associated with a greater likelihood of prior domestic and overseas travel and greater access to testing.
b These vulnerability factors increase the COVID-19 severity risk, but their effect on SARS-Cov-2 incidence likelihood is unclear, all else constant, especially with more self-quarantining by those more vulnerable to severe COVID-19 illness.
Case-level data require anonymizing (de-identifying, i.e. removing all personally identifiable information (PII) from the data) and careful assessment of the distribution of each individual-level data element to remove erroneous data, address missing information, and so on. 14 After such work, these data could be a rich source of information, with many additional detailed explanatory variables beyond those listed, including clinical/medical-condition indicators for individual patients. Several organizations have recently announced plans to assemble such data and make them more widely available. 15 Identifying reporting errors in individual-case data is time consuming, and identifying them fully may be impossible.
Data problems may be more difficult in the pandemic environment, and in global analyses given international differences in privacy practices and data collection. For all these reasons, sound model estimation and applications are more difficult and time consuming to develop from case data. 16 In contrast, as shown in Table 2, group-level data are already widely available from multiple sources. For the set of explanatory variables postulated, these data also require time and effort to gather and interpret, but they are both more readily accessible at present and generally more standardized in measurement at the state and county level in the U.S. With respect to individual-level reporting errors, the Central Limit Theorem suggests that, so long as a local-area population is large enough and individual errors are independent, reported group-level means computed from individual-level data should be approximately normally distributed.
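The Central Limit Theorem point can be illustrated by simulation. In the sketch below, the individual reporting errors are hypothetical: they are deliberately skewed (exponential draws minus their mean) but independent with mean zero. Their local-area averages still cluster tightly and symmetrically around zero, with a spread of roughly error_sd / sqrt(group size). The caveat is that systematic errors shared across a whole area, such as uniform under-reporting, do not average out in this way.

```python
import random
import statistics


def group_mean_errors(group_size: int, n_groups: int, error_sd: float = 1.0, seed: int = 0):
    """Average reporting error in each of n_groups local areas of group_size people.

    Individual errors are hypothetical: exponential draws minus their mean, so they are
    skewed (clearly non-normal) but independent with mean 0 and SD error_sd.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_groups):
        errors = [rng.expovariate(1.0 / error_sd) - error_sd for _ in range(group_size)]
        means.append(statistics.fmean(errors))
    return means


means = group_mean_errors(group_size=400, n_groups=2_000)
# Expect a mean near 0 and an SD near error_sd / sqrt(400) = 0.05.
print(round(statistics.fmean(means), 4), round(statistics.stdev(means), 4))
```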
For decision-making beyond the case-level clinical setting, state and local public health officials, epidemiologists, and analysts are primarily focused at this time on group-level policy choices, particularly which parts of a state can 're-open,' and remain open, under what kinds of conditions. For these reasons also, local-area-data modeling and projections that are faster to produce have significant, clear advantages.

14 This practice has been followed for years at large scale in credit scoring model development, for example.

15 Ciox, a group partnering with LabCorp, for example, announced on April 10, 2020 that it is taking this type of de-identifying approach "with an initial data set based on LabCorp's nearly 500,000 completed COVID-19 tests" (Eddy 2020). Two weeks later, Cerner Corporation announced a very similar plan of "offering select U.S. health systems and academic research centers complimentary access to critical de-identified COVID-19 patient data to help fight the pandemic. This offering will provide eligible health care researchers free access to Cerner's COVID-19 data set to support epidemiological studies, clinical trials and medical treatments related to COVID-19, in line with applicable laws and guidelines" (Cerner Corporation 2020). See Jason (2020), who also describes these efforts and another similar one from Health Catalyst. The WHO also has a Global COVID-19 Clinical Data Platform, although it is limited to hospitalized patients. Similarly, the CDC has a database, COVID-NET, that was "implemented to produce robust, weekly, age-stratified COVID-19-associated hospitalization rates" (Garg et al. 2020).

16 Organizations first providing the anonymized-case databases may request feedback from qualified initial users.
Estimating equations (3) and (4) with local-area information requires an assumption that the aggregated local-area explanatory variables are sufficient for capturing the underlying individual behavior.

Addressing the Fractional-Response Endogenous Variables in the Group Level Data
Because equations (3) and (4) are specified at the individual level, while group-level data report Tested and Infected only as local-area counts (fractional responses), the 0/1 outcomes are not observed person by person. However, we can reasonably simplify this problem if we know the denominator, the local-area population "count" (or an estimate based on data since the last full census). If we know the local population count, then we know the number of 0's as well as 1's to assign synthetically for a straightforward estimation of equations (3) and (4), with the explanatory variables and predicted outcomes interpreted at the local-area level. The relevant current population of any local area is equal to:

(5) $\text{Current Pop.} = \text{Last Count} - \text{Out-Migration} - \text{Deceased (COVID-19, Other)} + \text{In-Migration} + \text{Births}$
The values of each variable on the right side of (5), except for Last "Count," are dynamic and change daily or weekly. For this reason, it seems generally reasonable to assume that the Last Population Count or Estimate provides a sufficiently close estimate of the Current Population, which implicitly assumes that Out-Migration + Deceased = In-Migration + Births. For model estimation, this simplification is generally reasonable (even though more recent data on deaths and births may be easier to obtain for the period since the Last "Count"). However, this assumption clearly fails for any local area with a sizable outbreak of the virus if the outbreak has led to considerable Out-Migration and Deaths and little or no In-Migration; in that case, the assumed area denominator needs a reasonable adjustment.
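The synthetic assignment of 0's and 1's from local-area counts can be sketched as follows. The helper `expand_area` is hypothetical and not part of any library. It uses frequency weights instead of physically replicating rows, and it leaves Infected unobserved (None) for the untested, which is consistent with the sample-selection framing. In an actual estimation, each row would also carry the area's explanatory variables.

```python
def current_population(last_count, out_migration=0, deceased=0, in_migration=0, births=0):
    """Equation (5): Last Count - Out-Migration - Deceased + In-Migration + Births."""
    return last_count - out_migration - deceased + in_migration + births


def expand_area(population: int, tested: int, positive: int):
    """Convert one area's counts into weighted binary cells (a hypothetical helper).

    Returns (tested, infected, weight) rows. Infected is None for the untested,
    whose infection status is unobserved under the sample-selection framing.
    """
    if not 0 <= positive <= tested <= population:
        raise ValueError("expected 0 <= positive <= tested <= population")
    return [
        (1, 1, positive),                # tested and positive
        (1, 0, tested - positive),       # tested and negative
        (0, None, population - tested),  # not tested: infection unobserved
    ]


# Hypothetical area: Last Count used as-is, i.e. the flows in (5) assumed to offset.
pop = current_population(50_000)
for row in expand_area(pop, tested=2_000, positive=300):
    print(row)
```

For an area with a large outbreak, the optional arguments of `current_population` are where an adjusted denominator would enter.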

Figure 3. Local Population Components of SARS-Cov-2 Susceptible, Infected, and Recovered
The widely tracked and reported Known Infected cases in a local area include everyone who tested positive for SARS-Cov-2, including those who died from the resulting COVID-19. As depicted in Figure 3, most of this group are at varying stages of recovery or of becoming non-infectious (typically thought to be reached about 4 to 6 weeks after release or testing). Some of those infected may have lasting physical damage, making them the 'wounded' share of the infected population, which only future research may be able to estimate. Global experience has also shown that some recovered patients have become re-infected, for reasons that are unclear. The share of the Infected population that does not become Antibody Immune is thought (or hoped) to be small, but it is not yet known.

Addressing the Different Types of Testing
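Figure 3 also describes two types of SARS-Cov-2 testing, diagnostic testing (DT) for current infection and antibody testing (AT) for past infection, and this distinction is relevant for the estimation of the selection equation (4). Antibody tests have had more problems and may not be reliable (Vogel 2020).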
For the estimation of equation (4), researchers should determine whether the reported test counts from a local area are limited to DT. If random AT is included in the total local-area testing count and cannot be identified and removed, it can probably be assumed to be negligible relative to the overall volume of DT, certainly at this time in the U.S. Most state and local reporting of the number of tests performed appears to cover DT only. 18

Addressing Data Reporting Discrepancies
As the earlier description of differences between confirmed deaths from COVID-19 and confirmed deaths from the flu suggests, data reporting discrepancies need to be considered. Not all sources report and track the incidence of SARS-Cov-2 infections and testing in the same way or with the same timeliness. For example, Meyer & Madrigal (2020) report discrepancies between state health departments' testing data and recently started CDC reports on the incidence of testing by state.

17 'Kits' must include swabs and other materials needed for any test, as well as available trained personnel.

Potential Applications of the Recursive Bivariate Probit Model Estimates
With the above data sources and their general pros and cons in mind, an estimated model specification like that in (3) and (4) could support several applications, including scorecard-type scoring models. 19 A preliminary scorecard type of model estimation has already been made for this virus using patient, control-group, and validation-group clinical and other data (Tordjman et al. 2020). 20 These authors found no clinical variables that were statistically significant. A scoring model of this general type, controlling for sample selection bias, could be valuable in different ways, both in the U.S. and particularly in lower-income nations where doctors and health systems have less access to diagnostic testing (or receive test results with much longer delays).

19 Financial institutions, for example, have commonly used scoring models for many years to predict and rank-order the likelihood of default by loan applicants.

20 Note that this is a pre-print study prior to peer review. The authors used univariate and binary logistic analysis to derive their score, apparently without a determined need for, or any attempt at, controlling for possible sample selection bias.

Scoring Based Clinical Testing Triage and Improved Assessments
For any local area in the U.S. or any nation where local medical providers or testing stations do not have sufficient test 'kits,' a sound scorecard model may help providers triage the available tests, performing them first on those statistically most likely to be infected.
Scoring results may also help assess more uncertain clinical cases that have tested negative. Given the chances of false negatives on the diagnostic test, re-testing may be sought or performed first for the more vaguely symptomatic cases that have a high statistical likelihood of infection. More aggressive quarantining may also be advised or pursued for those with a higher likelihood of infection.
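Scorecard triage of this kind reduces to ranking people by a predicted infection probability. A minimal sketch follows, assuming a probit score Φ(x'β) taken from the Infected equation. The coefficient values and the covariates (an intercept, a symptom indicator, and a known-contact indicator) are hypothetical, chosen only for illustration. The same ranking could order negative-tested cases for re-testing.

```python
import math


def infection_score(x, beta):
    """Probit score Phi(x'beta): predicted probability of infection for one person."""
    z = sum(xi * bi for xi, bi in zip(x, beta))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def triage(people, beta, kits):
    """Return the IDs of the `kits` people with the highest predicted infection probability."""
    ranked = sorted(people, key=lambda p: infection_score(p[1], beta), reverse=True)
    return [pid for pid, _ in ranked[:kits]]


# Hypothetical coefficients: intercept, symptom indicator, known-contact indicator.
beta = [-1.5, 1.0, 0.8]
people = [("a", [1, 0, 0]), ("b", [1, 1, 1]), ("c", [1, 1, 0]), ("d", [1, 0, 1])]
print(triage(people, beta, kits=2))  # ['b', 'c']
```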

Scoring, Random-Sample, and Case-Count Based Area Resource Allocations
If a stratified random sample of testing has been performed across local areas, it may be used together with individual-level scorecard results to assign more precise predictions of the overall probabilities of infection across local areas. In this way, the random sample results, together with the number of observed Infection cases and scorecard model results across local areas, may help better identify emerging and potential nearby local hotspots.
Scorecard model results can be applied to rank-order and examine the distributions of infection-risk characteristics within and across local areas. This may provide a confirming or alternative-view benchmark assessment of the mix of risk characteristics, within and across local-area populations, that is driving any hotspot outbreak. Potential nearby trouble spots at elevated risk but currently showing fewer cases may be better identified, helping to prioritize and triage available medical-resource and testing allocations across areas.

Group-Level (Local Area) Scoring Models and Area Resource Allocations
In a very similar way, a scoring model built with local-area data may be able to help score each different local area, and thus assist in rank-ordering the relative infection risks across local areas, also in conjunction with the other data and any random sample findings available.
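Area-level rank-ordering can be sketched in a few lines. In this sketch, the area names, populations, and predicted rates are hypothetical, and the rates stand in for output from a local-area scoring model. Reporting expected infections (population times predicted rate) alongside the rate shows that the two orderings can differ.

```python
def rank_areas(areas):
    """Order areas by predicted infection probability, highest first.

    areas: (name, population, predicted_rate) tuples, where predicted_rate would come
    from a local-area scoring model. Returns (name, predicted_rate, expected_infections).
    """
    rows = [(name, rate, population * rate) for name, population, rate in areas]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical areas and predicted rates.
areas = [("County A", 120_000, 0.012), ("County B", 40_000, 0.031), ("County C", 300_000, 0.008)]
for name, rate, expected in rank_areas(areas):
    print(f"{name}: {rate:.1%} predicted, about {expected:,.0f} infections")
```

Here County C has the lowest predicted rate but the most expected infections. The rate ordering points to concentrated risk, while the count ordering speaks to absolute resource needs.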

Targeting of Stratification in Random Sample Testing
Case or local-area scorecard model results may also be used to better target stratification in random sample testing. Areas with more vulnerable and higher-infection-risk characteristics may be targeted for higher sample weights in stratification for testing allocation.
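One standard way to turn scorecard predictions into stratum sample sizes is Neyman allocation. This is a swapped-in textbook sampling technique, not a method from the sources above. For estimating a prevalence, it sets n_h proportional to N_h × sqrt(p_h(1 - p_h)), where p_h is a planning value for each stratum's infection rate, here assumed to come from a scorecard model. The stratum names and numbers below are hypothetical.

```python
import math


def neyman_allocation(strata, total_tests):
    """Allocate tests with n_h proportional to N_h * sqrt(p_h * (1 - p_h)).

    strata: dict name -> (population N_h, planning prevalence p_h). Rounds with the
    largest-remainder method so the allocations sum exactly to total_tests.
    """
    weights = {h: n * math.sqrt(p * (1 - p)) for h, (n, p) in strata.items()}
    total_w = sum(weights.values())
    raw = {h: total_tests * w / total_w for h, w in weights.items()}
    alloc = {h: math.floor(x) for h, x in raw.items()}
    leftover = total_tests - sum(alloc.values())
    for h in sorted(raw, key=lambda h: raw[h] - alloc[h], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata: a smaller high-risk area and a larger low-risk area.
strata = {"high-risk": (10_000, 0.20), "low-risk": (30_000, 0.02)}
print(neyman_allocation(strata, 1_000))
```

Proportional allocation would give the high-risk stratum a quarter of the tests. Under Neyman allocation it receives nearly half, because its predicted prevalence makes its outcomes more variable.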
Scorecard model results may also be used for initial or complementary estimates of local-area infection rates prior to or along with available broad-based random sampling.

Hazard-Model Predictions of Infection Rates
Case or local-area scorecard and projection models can also be built using panel data for longitudinal assessments; such data are easier to collect and assemble at the local-area level. This allows estimation of hazard-model-based scoring models, for example, that rank-order the relative risks and, in addition, predict the time patterns of infections, estimating the likelihood of more or fewer local-area infections over time and providing benchmarks for epidemiological model assumptions. Common options for a model of this type include the Cox proportional hazards model, or a panel-logit or panel-probit model that estimates a discrete-time hazard.
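The discrete-time hazard idea can be sketched as follows. Each period's hazard is a logit in period effects plus a covariate index, and the predicted share infected by period t is one minus the product of the per-period survival probabilities. The period effects and covariate index below are hypothetical, and fitting them (a logit on person-period or area-period data) is not shown.

```python
import math


def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def cumulative_infection(period_effects, x_beta: float):
    """Discrete-time logit hazard h_t = logistic(alpha_t + x'beta).

    Returns, for each period t, Pr(infected by the end of t) = 1 - prod_{s<=t} (1 - h_s).
    """
    survival = 1.0
    path = []
    for alpha_t in period_effects:
        hazard = logistic(alpha_t + x_beta)
        survival *= 1.0 - hazard
        path.append(1.0 - survival)
    return path


# Hypothetical weekly baseline effects and covariate indices.
weeks = [-4.0, -3.5, -3.0, -3.0]
print([round(p, 3) for p in cumulative_infection(weeks, x_beta=0.0)])
print([round(p, 3) for p in cumulative_infection(weeks, x_beta=-0.5)])  # e.g. stronger distancing
```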
An important benefit of using panel data in estimation is that these data also allow the use of fixed-effects or random-effects estimators that control for unobserved county-level characteristics, for example (including the effects of behavioral changes estimated through random effects). Fitting the bivariate probit model with panel data is described, for example, by Roodman (2011) and Miranda (2010). Mulkay (2015) describes methods for computing a bivariate probit model on panel data with correlated random effects. See also Vella & Verbeek (1999) and Verbeek (1990).

Improved Parameter Assumptions in the Epidemiological Forecasting Models
The epidemiological models remain the primary tools for forecasting the future paths of infection across local areas, particularly given the rapid growth implied by the underlying epidemiological properties of exponential spread. Empirical hazard-model results, particularly panel-based ones, may provide both useful benchmarks and assistance in refining parameter assumptions within the epidemiological models.
As discussed earlier, the most pressing but very uncertain state and local policy questions today concern the effectiveness of various relaxed gradations of social distancing practices in reducing the spread of SARS-Cov-2. To date, statistical estimates of the effects of more stringent social distancing measures on R0 have confirmed the general and fairly obvious expectation that such restrictions result in much more limited social contacts, which do indeed lower the R0 of the virus.
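Using the relationship R0 = β × d from note 8, the link between distancing and R can be sketched. The sketch assumes distancing cuts the effective contact rate β by some fraction while the infectious duration d stays unchanged. Under that assumption, the cut needed to bring R to 1 takes the same form as the herd immunity threshold, 1 - 1/R0.

```python
def effective_r(r0: float, contact_reduction: float) -> float:
    """Reproduction ratio after cutting the effective contact rate beta by a fraction.

    Assumes R = beta * d with the infectious duration d unchanged.
    """
    return r0 * (1.0 - contact_reduction)


def required_reduction(r0: float) -> float:
    """Smallest fractional cut in beta that brings R down to 1: 1 - 1/R0 (0 if R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


for r0 in (2.2, 5.7):
    print(r0, f"{required_reduction(r0):.0%}")
# 2.2 55%
# 5.7 82%
```

Under these assumptions, relaxations that restore even part of the contact rate can push R back above 1, and the margin for relaxation is thinner the higher R0 is.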
International, state, and local differences in the timing and features of the first restrictions, and now in the relaxation of social distancing, have generated a range of empirical variation across countries and across U.S. states and counties that can be tested empirically within the model above.
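Staggered adoption and relaxation dates are what make this variation usable in estimation. A minimal sketch of coding it, assuming hypothetical county start dates and hypothetical 7-day windows, turns each county-day into event-time indicators of the kind an event-study regression might include; areas that never adopt serve as comparisons with every indicator at zero.

```python
from datetime import date
from typing import Optional

# Hypothetical restriction start dates; None means no restriction in the sample period.
POLICY_START: dict[str, Optional[date]] = {
    "county_a": date(2020, 3, 16),
    "county_b": date(2020, 3, 30),
    "county_c": None,
}


def event_time(obs: date, start: Optional[date]) -> Optional[int]:
    """Days since the restriction began (negative before it); None if never adopted."""
    return None if start is None else (obs - start).days


def window_dummies(obs: date, start: Optional[date],
                   edges: tuple[int, ...] = (0, 7, 14, 21)) -> list[int]:
    """One 0/1 indicator per window [edges[k], edges[k+1]); the last window is open-ended."""
    t = event_time(obs, start)
    dummies = []
    for k, lo in enumerate(edges):
        hi = edges[k + 1] if k + 1 < len(edges) else None
        inside = t is not None and t >= lo and (hi is None or t < hi)
        dummies.append(int(inside))
    return dummies


if __name__ == "__main__":
    for county, start in POLICY_START.items():
        for obs in (date(2020, 3, 20), date(2020, 4, 10)):
            print(county, obs, event_time(obs, start), window_dummies(obs, start))
```

Relaxation dates could be coded the same way, giving separate indicators for time since a measure was loosened.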
As noted, this type of estimation has already been performed by Courtemanche et al. (2020) using county-level data across the U.S. These authors estimated the marginal impacts of specific gradations of social-distancing measures on COVID-19 case growth rates based on local-area variation, controlling (to some extent) for voluntary social distancing and for other unobserved factors through fixed effects (see note 21). In addition, as cited in Table 2, aggregated anonymous geo-location data from digital devices, or survey tracking data where available, may provide important measures of otherwise unobserved voluntary social distancing behavior. Although difficult to study and measure, voluntary social distancing reflects the need for largely uncoordinated but effective mass behavioral changes to help overcome the pandemic (Weible et al. 2020) and the novel public-goods problem it presents. Random-effects estimators, combined with anonymous geo-location and survey data, may prove most useful for estimating these varying behaviors and their marginal effects on virus outcomes across local areas.
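A stylized version of the fixed-effects idea fits in a few lines. The sketch below is not Courtemanche et al.'s specification; it assumes a single 0/1 policy regressor, one-way county fixed effects, and the daily log growth of cumulative cases as the outcome, and all counties and values are hypothetical. The within transformation (subtracting each county's own means) removes any time-invariant county characteristic before the slope is estimated.

```python
import math
from collections import defaultdict


def log_growth(cumulative_cases: list[float]) -> list[float]:
    """Daily growth rates ln(C_t) - ln(C_{t-1}); counts must be positive."""
    return [math.log(b) - math.log(a)
            for a, b in zip(cumulative_cases, cumulative_cases[1:])]


def within_slope(rows: list[tuple[str, float, float]]) -> float:
    """Fixed-effects (within) estimate of the slope of y on one regressor x.

    rows are (county, x, y). x and y are demeaned by county, which sweeps out
    county fixed effects, and the slope is OLS on the demeaned data."""
    totals: dict[str, list[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])
    for county, x, y in rows:
        t = totals[county]
        t[0] += x
        t[1] += y
        t[2] += 1.0
    num = den = 0.0
    for county, x, y in rows:
        sx, sy, n = totals[county]
        dx, dy = x - sx / n, y - sy / n
        num += dx * dy
        den += dx * dx
    if den == 0.0:
        raise ValueError("x has no within-county variation")
    return num / den


if __name__ == "__main__":
    # Hypothetical cumulative cases and a 0/1 policy indicator for each growth day.
    panel = {
        "county_a": ([100, 130, 165, 190, 210, 228], [0, 0, 1, 1, 1]),
        "county_b": ([20, 24, 29, 35, 40, 44], [0, 0, 0, 1, 1]),
    }
    rows = []
    for county, (cases, policy) in panel.items():
        for g, p in zip(log_growth(cases), policy):
            rows.append((county, float(p), g))
    print(round(within_slope(rows), 4))  # negative for these hypothetical data
```

A fuller version would add time fixed effects and controls, and the random-effects alternative discussed in this section would model the county effects as draws from a distribution rather than sweeping them out.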
Estimates of the effects of gradations of social-distancing mandates and voluntary behaviors, together with effective NPI strategies (e.g. Ferguson et al. 2020; de Walque et al. 2020; Mello & Wang 2020), may help improve model projections, policy choices, and ultimate mortality and economic outcomes in the 'new normal' of the COVID-19 crisis.

Note 21: In qualifying their conclusions, the authors note the changing nature of voluntary social distancing over time. A random-effects estimation approach may be better able to help quantify this.

Conclusion
This paper has drawn on many of the recently available empirical studies to assess the risks of the […] The third countermeasure, NPIs, includes the continued need for testing and statistical modeling analysis to support better management of local pandemic outbreaks. I proposed and specified a recursive bivariate probit model (for the probability of Infection, given the selection probability into Testing) to help address the problem of sample selection bias and the need for more knowledge about the marginal empirical effects on local transmission. For this model, which is applicable to individual or local-area data, I described several potential explanatory variables and various sources of these types of data in the U.S., concluding that at present local-area model estimation can proceed more quickly and broadly, although case-level scorecard models may come to have important applications. I proposed several applications for assisting in COVID-19 crisis management. Finally, among the specified explanatory variables, several of which may have useful implications for public health decisions and policymaking, I highlighted the potential value of estimating the marginal effects of various state-prescribed and behaviorally realized forms of social distancing, which may help improve model projections, policy decisions, and mortality and economic outcomes.
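The marginal effects highlighted above have a simple form in a single-equation probit, shown here as a minimal sketch: for a continuous regressor j, ∂P/∂x_j = φ(x′β)β_j, and for a binary measure such as a hypothetical stay-at-home indicator, the effect is the difference in Φ(x′β) with the indicator switched on and off. The coefficients and covariates below are hypothetical. In the bivariate model with selection, effects conditional on being tested also involve the correlation ρ and take a longer form not shown here.

```python
import math


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear_index(x: list[float], beta: list[float]) -> float:
    return sum(xj * bj for xj, bj in zip(x, beta))


def continuous_effects(x: list[float], beta: list[float]) -> list[float]:
    """dP/dx_j = phi(x'beta) * beta_j, evaluated at the covariate vector x."""
    scale = norm_pdf(linear_index(x, beta))
    return [scale * bj for bj in beta]


def discrete_effect(x: list[float], beta: list[float], j: int) -> float:
    """Change in P when binary regressor j moves from 0 to 1, others held at x."""
    x_on, x_off = list(x), list(x)
    x_on[j], x_off[j] = 1.0, 0.0
    return norm_cdf(linear_index(x_on, beta)) - norm_cdf(linear_index(x_off, beta))


if __name__ == "__main__":
    # Hypothetical: [constant, density index, stay-at-home indicator]
    beta = [-1.0, 0.3, -0.4]
    x = [1.0, 1.5, 0.0]
    print(round(continuous_effects(x, beta)[1], 4))  # density index (continuous)
    print(round(discrete_effect(x, beta, 2), 4))     # stay-at-home switched on
```

Because φ(x′β) varies with the covariates, marginal effects differ across local areas even with common coefficients; averaging them over the sample is a common summary.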