Estimating COVID-19 related mortality in India: An epidemiological challenge with insufficient data

The harrowing second wave of COVID-19 in India has led to much discussion over the quality and timeliness of reporting of deaths attributed to the pandemic. In this brief report, we present the existing evidence, as well as the broader complexities surrounding the mortality burden of COVID-19 in India. This article sheds light on the following epidemiological issues: (1) general and India-specific challenges to COVID-19 death reporting, (2) the latest COVID-19 mortality estimates in India as of May 16, 2021, (3) the apparent scale of uncaptured COVID-19 deaths, and (4) the role of disaggregated historic mortality trends in the quantification of excess deaths attributed to COVID-19. We conclude with a set of high-level policy recommendations for improving the vital surveillance system and the tracking of causes of death in India. We encourage direct efforts to integrate health data and indirect strategies for cross-validation of registered deaths. Such system-wide advances would drastically aid epidemiological research efforts and strengthen India’s position to overcome future public health crises.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 25 May 2021 doi:10.20944/preprints202105.0617.v1 © 2021 by the author(s). Distributed under a Creative Commons CC BY license.
Background: As of May 16, 2021, India, a country with a population of 1.38 billion, was second only to the United States in the total number of reported SARS-CoV-2 cases (nearly 25 million) and third, following the U.S. and Brazil, in total reported deaths (over 270 thousand). Data from seroprevalence studies and limited excess mortality calculations offer evidence that the actual numbers of infections and deaths are likely much larger than those reported [2]. We recognize that multiple challenges lead to underreporting of COVID-19 fatalities, including: (1) deaths that occur outside of hospitals and either are not captured or are captured with a lag, (2) deaths that are classified under comorbid illnesses, (3) deaths that are due to low access to quality healthcare and/or a shortage of healthcare resources, and (4) deaths that go undetected as a result of an inadequate COVID-19 testing program. Our review of the existing evidence suggests that the problem is particularly acute for India, where a large number of deaths (especially ones happening outside a healthcare facility and/or in rural areas) routinely remain medically unreported.
Current Estimates: We report these numbers as of May 16, 2021. The overall case fatality rate (CFR) in India has remained low (1.09%) relative to estimates from other countries (1.77% in the United States, 2.78% in Brazil, 2.07% globally). However, India has a young population (e.g., the proportion of the population aged 65+ is 6.4% in India versus 9.3% in Brazil and 16.5% in the U.S.), and as such age-specific mortality comparisons are more meaningful. The first and second waves of the pandemic in India are characteristically different in terms of both infections and deaths. The CFR for Wave 1 is 1.4%, while the CFR for Wave 2 is currently 0.8%. Some state-specific numerical estimates are presented in Table 1. It is hypothesized that the reduced CFR in Wave 2 is due to underreporting, pending data reconciliation from diverse sources, and a large number of infections in younger age groups with a lower risk for severe clinical presentation of SARS-CoV-2, an assertion yet to be verified [5].
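As a rough check on the headline figure, the crude CFR can be recomputed from the rounded totals quoted in the Background (nearly 25 million reported cases, over 270 thousand reported deaths). The sketch below is illustrative only: the function name is hypothetical and the inputs are the rounded values from the text, so the result (about 1.08%) differs slightly from the 1.09% obtained from exact counts.

```python
def case_fatality_rate(reported_deaths: int, reported_cases: int) -> float:
    """Crude CFR (%) = reported deaths / reported cases * 100."""
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return 100.0 * reported_deaths / reported_cases


# Rounded totals for India as of May 16, 2021 (assumed from the text above).
india_cfr = case_fatality_rate(reported_deaths=270_000, reported_cases=25_000_000)
print(f"Crude CFR: {india_cfr:.2f}%")
```

Because both the numerator and the denominator are reported counts, this ratio shifts whenever cases and deaths are underreported to different degrees, which is one reason a low CFR should not be read as a low risk of death.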
Underreporting of Infections and Deaths: To estimate infection fatality rates (IFR), we, among other researchers, have used epidemiological models and seroprevalence surveys [6]. Such models [7] indicate the underreporting factor is around 10-20 for cases and around 2-5 for deaths, based on data from Wave 1 in India. According to these studies, the IFR for India is roughly 0.1% using observed death counts and 0.4% after incorporating underreporting of deaths (Table 1). The former resembles early estimates for Mumbai, Srinagar, and Karnataka using observed fatalities (0.09%, 0.06%, and 0.05%, respectively) [8,9].
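The relationship between CFR, IFR, and the underreporting factors can be illustrated with back-of-the-envelope arithmetic. This is not the extended SEIR model behind the reported estimates; it is a simple ratio sketch under stated assumptions. The function name is hypothetical, and the specific factors (14 for cases, 4 for deaths) are illustrative values chosen from inside the quoted ranges of 10-20 and 2-5.

```python
def infection_fatality_rate(cfr_pct: float, case_urf: float, death_urf: float = 1.0) -> float:
    """Approximate IFR (%) from a CFR and underreporting factors (URFs).

    IFR = (reported deaths * death_urf) / (reported cases * case_urf)
        = CFR * death_urf / case_urf
    """
    if case_urf <= 0 or death_urf <= 0:
        raise ValueError("underreporting factors must be positive")
    return cfr_pct * death_urf / case_urf


WAVE1_CFR = 1.4   # Wave 1 CFR (%) quoted in the text
CASE_URF = 14.0   # hypothetical value inside the quoted 10-20 range
DEATH_URF = 4.0   # hypothetical value inside the quoted 2-5 range

ifr_observed = infection_fatality_rate(WAVE1_CFR, CASE_URF)
ifr_adjusted = infection_fatality_rate(WAVE1_CFR, CASE_URF, DEATH_URF)
print(f"IFR from observed deaths: {ifr_observed:.2f}%")
print(f"IFR adjusted for death underreporting: {ifr_adjusted:.2f}%")
```

With these illustrative inputs the two values come out near 0.1% and 0.4%, the same order of magnitude as the model-based estimates, although the agreement depends entirely on the factors chosen.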
We note that anecdotal and media reports corroborate model estimates. For example, during Wave 1, a group of volunteers collected reported deaths from obituaries in newspapers and found the death count to be almost twice that officially reported [10]. Likewise, during this recent surge, a New York Times article noted that authorities in Gujarat reported between 73 and 121 daily COVID-19 related deaths in mid-April, contradicting a leading newspaper in Gujarat, which reported a number several times higher (around 610 daily deaths) [11]. Recently, an excess death calculation based on comparing death certificates issued in the state of Gujarat [12] showed that while the state reported 4,218 COVID-19 deaths during March 1-May 10, 2021, an estimated 61,000 excess deaths remained uncounted, indicating an underreporting factor of nearly 15. Moreover, comparisons of satellite images with those from past years, revealing fires from funeral pyres, have underscored the sheer scale of additional lives lost to the pandemic in April 2021.
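The Gujarat underreporting factor can be reproduced with simple arithmetic. The text does not state which ratio was used, so the sketch below shows two plausible definitions; treating reported COVID-19 deaths as part of total excess deaths in the second ratio is an assumption, and the variable names are illustrative.

```python
reported_covid_deaths = 4_218     # Gujarat, March 1-May 10, 2021 (from the text)
uncounted_excess_deaths = 61_000  # estimated excess deaths missing from official counts

# Definition 1: uncounted excess deaths relative to reported deaths.
ratio_uncounted = uncounted_excess_deaths / reported_covid_deaths

# Definition 2 (assumption): reported plus uncounted deaths relative to reported deaths.
ratio_total = (reported_covid_deaths + uncounted_excess_deaths) / reported_covid_deaths

print(f"Uncounted / reported: {ratio_uncounted:.1f}")
print(f"(Reported + uncounted) / reported: {ratio_total:.1f}")
```

The two definitions give roughly 14.5 and 15.5, so either reading is consistent with a factor of about 15.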
Unique Features: Due to delayed detection, the proportion of COVID-19 deaths with a narrow time-to-death window (from the date of confirmed diagnosis) is higher in select regions than in global findings. For example, a study found that a considerable 18% of deaths across the states of Tamil Nadu and Andhra Pradesh occurred within 24 hours of diagnosis [13], suggesting a substantial lag in the initial diagnosis of COVID-19 compared to other countries. In Wave 2, a strained health system, a deficit of ICU beds, and inadequate oxygen monitoring for at-home isolation have collectively exacerbated this issue. The CFRs in India vary considerably across states (e.g., among large states, Kerala has the lowest and Punjab has the highest case fatality rates). This geographical heterogeneity is also reflected in the (albeit limited) regional excess death calculations available for 2020 [14].
Data Paucity: India, unlike many other countries, does not have robust data that can be used for analysis [15]. The Ministry of Health and Family Welfare shared age- and sex-disaggregated COVID-19 related data at the start of the pandemic, but officials have since stopped reporting this information. We only have access to the sporadic release of charts and tables in briefings and media reports. We join the research community in calling for these data as well as information on comorbidities, which are necessary to track age-sex specific trends, to identify high-risk subpopulations, and to validate hypotheses regarding rates of infections, severe cases, and deaths within subgroups of interest.
In terms of longevity and cause of death, India's most recent reported life expectancy and all-cause mortality estimates are from 2014-2018 and 2010-2013, respectively, precluding any meaningful, timely study of all-cause or excess mortality. According to the latest global excess mortality study (January 2021), 77 countries report data on all-cause mortality, enabling experts to compute country-specific excess mortality, which is largely considered the gold standard for estimating the burden of COVID-19 [15]. India is a notable exception [15]; in our opinion, the release of these figures is sorely needed.
Impact of Insufficient Data: Deficient COVID-19 death reporting has harmful ramifications. It limits modelers' ability to predict the course of the pandemic, gauge its impact, and estimate healthcare resource needs, including oxygen supplies and hospital beds. This data-deficient environment stunts overall policy efforts to improve public health outcomes and healthcare infrastructure. Without disaggregated epidemiological data, linked with genomic sequencing, assessing the lethality of virus strains and evaluating vaccine effectiveness becomes nearly impossible.
Recommendations Moving Forward: We offer general recommendations herein for systematizing the collection and advancing the quality of all-cause and disease-specific mortality data in India. The Indian government recently announced a pilot trial of a personal digital health identifier, which would ultimately serve as an electronic key to a health data repository for each individual nationwide [16]. Integrating data across health systems offers a way to capture all-cause mortality in a more nationally representative manner. With successful implementation and multi-platform linkages, a digital health ID would enable comprehensive analysis of healthcare outcomes via continuous reporting and a breadth of available data.
Heterogeneous data linkage holds promise for approximating unreported deaths, such as through tracking inactive Aadhaar cards (akin to social security cards in the U.S.), bank accounts, phone numbers, and social media accounts. Inspection of life insurance claims may also complement indirect validation efforts. Innovative strategies for surveillance using community healthcare and Accredited Social Health Activist (ASHA) workers are needed in rural India, where a proper reporting system is largely absent. We recommend strengthening the Civil Registration System (CRS) by leveraging community engagement and partnerships as well as collaborating with community and religious leaders to encourage prompt reporting by family members of the deceased. Continued attention to medical certification of deaths, and mandatory linking to the CRS, is needed for India to meet international standards. An unreported death dishonors the entire life of a person. When deaths are not captured and analyzed, existing health inequities are further exacerbated. A fortified nationwide vital surveillance system, as well as timely and comprehensive data reporting, is at the heart of fighting this pandemic. An investment in a robust data ecosystem now will help safeguard India against future health crises.
(1) CFR is the number of reported deaths divided by the number of reported infected cases.
(2) IFR and Adjusted-IFR are estimates from an extended SEIR model, where Adjusted-IFR accounts for under-reporting of COVID-19 deaths.
(3) Excess death calculations vary across regions, as approaches depend on underlying assumptions regarding the number of expected deaths. The general framework includes obtaining the difference between the observed death count and the average expected death count, as derived from previous years.
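The general excess death framework described in note (3) can be sketched as follows. This is a minimal illustration, not the method used for any specific regional estimate: the function name is hypothetical, the death counts are invented purely for demonstration, and a simple mean of prior years ignores trends in population and mortality that a fuller analysis would model.

```python
from statistics import mean


def excess_deaths(observed: int, baseline_years: list[int]) -> float:
    """Observed deaths minus the mean count for the same period in earlier years."""
    if not baseline_years:
        raise ValueError("need at least one baseline year")
    return observed - mean(baseline_years)


# Hypothetical all-cause death counts for one region and calendar period.
baseline = [10_200, 9_900, 10_500]  # same period in three pre-pandemic years
observed_2021 = 16_400              # same period during the pandemic

print(excess_deaths(observed_2021, baseline))
```

The choice of baseline years and the way expected deaths are derived from them are the underlying assumptions that cause estimates to vary across regions.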