Self-healing and Underreporting of Cases of Visceral Leishmaniasis in Bihar , India : A Mathematical Modeling-based Study

1 Background: Underreporting of Visceral Leishmaniasis (VL) in India remains a prob2 lem to public health controls. Effective and reliable surveillance systems are critical for mon3 itoring disease outbreaks and public health control programs. However, in India, government 4 surveillance systems are affected by levels of scarcity in resources and therefore, uncertainty 5 surrounds the true incidence of asymptomatic and clinical cases, affecting morbidity and 6 mortality rates. The State of Bihar alone contributes up to the 40% of the worldwide VL 7 cases. The inefficiency of surveillance systems occurs because of multiple reasons including 8 delay in seeking health care, accessing non-authentic health care clinics, and existence of 9 significant asymptomatic self healing infectious cases. This results in a failure of the system 10 to adequately report true transmission rates and number of symptomatic cases that have 11 sought medical advice (thus, high underreporting of cases). 12 Objectives and Methods: There are several methods to estimate the extent of un13 derreporting in the surveillance system. In this research, we use a mathematical dynamic 14 model and two different types of data sets, namely, monthly incidence for 2003-2005 and 15 yearly incidence from 2006-2012 from the Bihar’s 21 most VL affected districts out of its 38 16 districts. The goals of the study are to estimate critical metrics to measure level of transmis17 sion and to evaluate the estimation process between the two data sets and 21 districts. In 18 particularly, our focus is on (i) estimating infection transmission potential, underreporting 19 level in incidence and proportion of self-healing cases, (ii) quantifying reproduction number 20 of theR0, and (iii) comparing underreporting incidence levels and proportion of self-healing 21 cases between the two periods 2003-2005 and 2006-2012 and between 21 districts. 22 Results: Our research suggests that the number of asymptomatic individuals in the 23 population who eventually self-heal may have a significant effect on the dynamics of VL 24 spread. The estimated mean self-healing proportion (out of all infected) is found to be ∼ 0.6 25 with only 7 out of 21 affected districts having self-healing proportion less than 0.5 for both 26 data sets. The estimated mean underreporting level is at least 64% for the state of Bihar. 27 The estimates of the basic reproduction numbers obtained are similar in magnitude for 28 most of the districts, being in the range of (0.88, 2.79) and (0.98, 1.01) for 2003-2005 and 29 2006-2012, respectively. 30 Conclusions: The estimates for the two types (monthly and yearly) of temporal data 31 suggest that monthly data are better for estimation if less number of data points are avail32 able, however, in general, using such data set results in larger variances in parameters as 33 compared to estimates obtained through aggregated yearly data. Estimated values of trans34 mission related metrics are lower than those obtained from earlier analyses in the literature, 35 and the implications of this for VL control are discussed. The spatial heterogeneity in these 36 control metrics increases the risk of epidemics and makes the control strategies more com37 plex. 38 Keywords— Kala-azar, Dynamical System, Inverse Problem, Spatial Analysis 39


Introduction
Visceral Leishmaniasis characteristics: Visceral Leishmaniasis (VL) is the second deadliest neglected vector-borne disease (VBD) caused by a parasitic protozoan.Leishmaniasis is a neglected tropical disease, characterized by high levels of complexity due to the variations in transmission cycles, clinical manifestations, reactions to therapies, multiple circulating leishmanial genus.Transmission to humans occurs through more than 30 different female phlebotomine sandflies [13].Visceral leishmaniasis (VL), also known as kala-azar (black fever in Hindi) is one of the four main syndromes of Leishmaniasis and by far, the most fatal one.There exist two mechanisms of transmission of VL: zoonotic VL, transmitted from animal hosts to vector and then to humans and, anthroponotic VL when the disease is disseminated from human to vector to human.Two species cause primarily VL: Leishmania donovani and Leishmania infantum.
Geographical connotation of VL.VL is a disease that spreads over large geographical areas and it remains endemic accross tropical environments.The World Health Organization (WHO) estimates around 50,000 to 90,000 new cases of VL every year .More than 90% of the new cases in 2015 occurred in just seven countries: Brazil, Ethiopia, India, Kenya, Somalia, South Sudan and Sudan.India, Bangladesh and Nepal have suffered the largest burden of VL, accounting for 80% of the estimated annual global cases between during the past decade [6].Just within India, 90% of the cases occur in the state Bihar, which was reported to have more than 10,000 cases in 2014 [16].In addition, three other states Jharkhand, West Bengal and Uttar Pradesh suffer from VL as an endemic disease [1].VL affects other areas in the rest of Asia and several countries in South America, with Brazil being by far the one with the majority of cases.
Clinical Epidemiology of VL in Bihar, India.The incubation period for VL lasts commonly between 2 and 6 months [16].Infected patients present several symptoms including fever, loss of appetite and weight loss [16].When infections are persistent, the parasite invades the blood and endothelial system and severe manifestations affect the spleen and liver.Also, anaemia is one of the most common haematological manifestation and further, it may be the key factor triggering leucopenia (reduction of white cells in blood), thrombocytopenia (low levels of thrombocytes), pancytopenia (deficiency of red cells, white cells, and platelets), hemophagocytosis [19].As a directed consequence, the infected patient develops fatigue and weakness.Finally, in advanced stages, hypersplenism also affect the infected individual causing him permanent inflammatory problems and most commonly, bleeding.Its name visceral corresponds to the fatal damages that the parasite produces in internal organs and it results fatal in almost 100% of cases within two years if no treatment is provided.
Socio economic factors affecting VL in India.VL affects some of the poorest population in the world, more frequently, rural areas.The disease is highly associated with conditions of economic stress, such as, overcrowding, malnutrition, poor housing, a fragile immune system, among other characteristics affecting the poor.In India, the disease is believed to attack older children and young adults.The mechanism of transmission may be primarily domestic, and the majority of the incidence is in the male population, in part due to the higher probability of occupational needs in agriculture, cattle grazing, construction, etc. VL becomes a heavy economic burden for communities with both, social and financial costs.It adversely affects individuals and leaves a social stigma in the population,which affects productivity and welfare of communities.Additionally, diagnosis and treatment are associated with high costs for families.Malnutrition status has been considered a significant factor to VL progression and infection level.In fact, malnourished individuals are more likely to show prominent clinical symptoms.World Bank estimates that around 38% of the population in India suffers from malnutrition in year 2015.According to the estimates of the Planning Commission in India [4], poverty decreased in the country from 45% in 1993 to 22% in 2012 but, poverty still remains as one of the most critical issues for well being in the country due to the size of its population.Bihar is the third largest State in India according to population size and sixth largest according to density (inhabitants per square km).
Poverty at the state level decreased from 60.5% in 1993 to 34% in year 2012.The high level of poverty adds further complexity to the issue of VL in the country.
Current Challenges in Bihar: Some of the challenges in the Bihar include high cost of treatment when approximately 75% of the VL cases in Bihar live below the poverty threshold of less than US $1.0 a day, available epidemiology data are scarce, access to public health care clinics are difficult, as well.Estimates conducted by [9] show that there exist levels of up to 90% of under reporting of cases of VL.The present work extends the analysis to estimate self-healing levels and relations with under reporting.In Bihar, humans are the only known reservoir of L. Donovani [10].A significant number of patients in Bihar are living with HIV-VL co-infection [10].Data on the prevalence and associated risk factors on malnutrition among VL patients are scarce.

WHO Elimination targets and Interventions:
In 2005, the governments of India, Bangladesh and Nepal signed a joint memorandum of understanding to eliminate VL with the aim to reduce the incidence at sub-district level [8].Because of the high mortality rate of this disease, the WHO has set many deadlines for VL elimination, with the most recent target set at reducing incidence to less than 1 in 10,000 cases by 2020.While this goal has been achieved in some sub-districts of Bihar because of increased surveillance of clinical cases and vector management, high levels of malnutrition and asymptomatic Leishmaniasis in endemic regions continues to present a significant challenge to achieving this target for all of Bihar.
Focus of this study: This study aims to assess the prevalence of self-healing VL individuals in a population and quantify its basic reproduction number for each of the 21 most affected districts of Indian state of Bihar for 2003-2012 via a simple mathematical model.
The present study includes both practical and theoretical pieces of work, which contain relevant information about difference between time zones and difference between regions.It highlights three critical aspects in the study of VL.First, the spatial and temporal transmission risks characterized by the rate of transmission and the basic reproductive number.
Second aspect of central importance is the asymptomatic prevalence, which has been under studied in the literature.Finally, the under reporting level which, still represents one of the biggest challenges for policy determination and subsequent plans for VL elimination.

Materials and Methods
To estimate the true incidence of the reservoir, namely the asymptomatic population and unreported symptomatic cases, we used a very simple mathematical model to describe the transmission dynamics of VL in the human population.Because our goal is to estimate (1) the prevalence of asymptomatic VL cases and (2) the degree of under-reporting in each of the districts of Bihar for which we have data, and because little is known about the sandfly population in this region, we chose to omit the vector dynamics and model transmission as a directly transmitted, rather than vector-borne disease.

Population Data
Population data from all 38 districts are obtained, however, because VL has been consistently shown in 21 districts which have high prevalence, the focus in this study is on the 21 districts.Data on population density, population size in rural and urban areas, and number of public health medical facilities in each of the 21 districts are generated from the 2001 and 2011 census [5], under the assumptions that the population of each district increases at an annual rate of 2.1% as reported in the Census.Additionally, socio-economic data are also collected from Census of India, through the Bihar state government website.

Incidence Data
Government health institutions in India collect incidence and disease mortality data.A case is reported when the patient is presented for treatment in a public health clinic.However, this passive case reporting grossly underestimates the number of cases with many cases getting treatment at private medical clinics, and thus are not reported to the state health system.

Model Description
The model presented does not follow the structure of a vector borne disease and there are three elements for this reasoning.First, vector-borne models commonly make use of more parameters and there is a serious limitation with data in reference to neglected tropical diseases.The current analysis favored a simple model with relatively few parameters to maintain consistency with data and avoid high levels of complexity.Also, an SIR model has been found to present a substantially better fit than the vector-host model after model selection [12].Finally, the time scale considerations also plays an important role since the mosquito average life span is 20 days while humans live around 70 years.The steady state of the vector, approximated by the quasi steady state, shows that vector dynamics occur very fast and this short time scale may be excluded.
The model used to compute the function C(t, θ) follows a simple SAIR (Susceptible-Asymptomatic-Symptomatic-Recovered) compartmental framework to describe the disease dynamics of VL in the human population.S denotes the number of susceptible individuals, A the number of infected individuals who are infectious but asymptomatic (or have very mild infection), I the number of infected individuals who are infectious and symptomatic (i.e.clinically ill), and R the number of recovered individuals.Define N (t) = S(t) + A(t) + I(t) + R(t) to be the total population size at time t.We assumed that all individuals are born susceptible at the total birth rate Λ, and all individuals in the population are subject to the natural mortality rate µ.Thus, the total population is modeled by N (t) = Λ − µN .Because both asymptomatic and symptomatic individuals are assumed to be infectious and because we are omitting the interaction between humans and sand flies, the force of infection (FOI) is given by λ(t) = β(I + A)/N , so that the incidence is λ(t)S.Asymptomatic individuals either develop symptoms at the rate α, or recover without ever showing symptoms (other than possibly mild symptoms) at the rate η.Symptomatic individuals either recover at a rate γ, or die due to the disease at a rate δ.All state variables and parameters are listed in Table 1 and Table 2, and the above description of the transmission dynamics leads to the system of nonlinear autonomous ordinary differential equations (see Equations ( 3)-( 7) in Appendix 5.1).A flowchart of the model is provided in Figure 4.
Flowchart of disease dynamics.The state variable C(t) is used to fit to the cumulative incidence data.
, where [0, T ] is the time span of the dataset.We define θ := (β, α, γ, η, q) t , where β is the transmission rate, 1/α is the average time spent asymptomatic after exposure to VL before developing symptoms, 1/γ is the average time spent symptomatic, 1/η is the average time spent asymptomatic before recovering from asymptomatic leishmaniasis, and q is the proportion of new symptomatic VL cases that are reported.

Model mathematical analysis
Using the Next Generation Approach [18], we determine that the basic reproduction number is The disease-free equilibrium and endemic equilibrium are E0 and E1, respectively, where their components are given in supplementary document.Following the approach in [2], we find condition when a backward bifurcation can occur at R0 = 1.We found that occurrence of backward bifurcation depends on the sign of the parameter α (see Appendix 5.2 for its expression).That is, if α < 0, then disease can be eliminated easily with existing intervention provided they are systematically implemented (since R0 < 10.However, if α > 0, then VL may persist even when R0 < 1 and much higher efforts to control disease will be needed.To ascertain whether VL can be eliminated from a district, we will calculate R0 and α for each district.
Remark: If R0 > 1, the disease is endemic in that district.If R0 < 1 and α < 0, then VL is moving towards elimination in that district.On the other hand, if R0 < 1 and α > 0, we do not have enough information to conclude whether or not VL is moving towards elimination in that district.
Remark: The mathematical expression representing endemic asymptomatic and symptomatic (clinical) prevalences are given by     The self-healing proportion, corresponding to the percentage of population that cure themselves without any conventional medical treatment and whose recovery process is directed by the patient itself using self knowledge or instinct.Figure 11 intends to present the level of correlation between the basic reproductive number and the self healing proportion among districts and for both periods.This correlation will be used to confirm that high levels of self healing induce higher chances of an epidemic.
The figures present a different slope for every period, negative relationship for 2003-2005 while a positive relation for 2006-2012, suggesting that while self healing used to be more common and tended to reduce slightly the occurrence of and epidemic, during the the second period, a lower SHP might be associated to an increase of an epidemic event.The results from this figures might be the representation of the a behavioral change of communities threatened with kala-azar.
Finally, figure 12 presents the linear relationship connecting the Self-Healing Proportion (SHP) versus the proportion of new cases not reported, q.One important difference between the two periods is the considerable decrease in average proportion of self healing, during the under reporting.However, for the second period of study, it seems that it is not relevant if the proportion of cases not reported increases, since population will maintain a stable behavior towards self healing, fairly stable around 4% on average.This figure might suggest that individuals tend to increase their trust in the health public system across States which can be used as a measure of the improvement in this public service, relevant for further research and evaluation.inhabitants during the period 2006-2012 with 9.8, 6.4 and 5.9 cases, respectively.Tables 5     and 6 summarize the mean estimates obtained through the model for β , q, and η, including the square root of the SSR.

Discussion
Elimination of Visceral Leishmaniasis depends on an adequate knowledge of interactions among factors such as human behavior, the local environment, and the ecology of parasites and vectors.Sociocultural factors such as poverty, educational levels, malnutrition, density and accessibility to interventions in large part determine transmission and persistence of infection in the Bihar [3,15].Interventions designed to control or eliminate VL under any of these conditions may fail if their impact is not thoroughly understood.For example, the implementation of interventions based on true quantification of self healing cases can be significantly improve its effectiveness.
Increased advocacy efforts since the early 2000s have led to new control targets set for 2020 by the World Health Organization.However, the efforts still face long lasting success in the face of elimination goals.There are multifaceted challenges in controlling VL in resource-poor local contexts that need to be understood and quantified and the methods used are novel as it is designed for the scarce data from the region.In light of the 2020 targets, this study attempts to develop methods to measure new metrics using historical data from different districts of Indian state of Bihar.The methodology provide some of the novel ways in which social aspects can be captured effectively that can be used for designing sustainable interventions.More emphasis on measurement of non-trivial but significant aspects for VL are needed that can help VL control to broader social determinants of health, given the major social and economic inequalities that continue to underpin transmission in endemic countries such as India.
We developed a dynamic transmission model for anthroponotic Visceral Leishmaniasis in tainty in the monthly reported cases as compared to annual data, leading to many studies using annual data for estimating parameters.However, use of apparently long-term data may incorporate bias as we expect social and environmental factors to have changed significantly over the time horizon.A better understanding of these factors is important for improving public health policy and aiding public health officials to establish appropriate control strategies.A through investigation on the loss of efficiency in estimation when data are aggregated to form yearly data from monthly or weekly observation, is needed to better conclude model results [20].Hence, we used both the data sets (monthly and annual) to estimate key model parameters.
In our estimation process we form a nonlinear optimization problem using least square metric, incidence data set and our dynamic model.Robust estimation from the available data sets can be a challenging task.The only disease data available are the clinical case reporting (or incidence) of VL.However, the number of asymptomatic individuals in the population who eventually self heal may have a significant effect on the dynamics of disease spread, and little quantitative data are typically available describing this population.Because most data is passively collected, the number of reported cases is often significantly lower than the true number of cases.In this study, we estimate district-wise self healing rate, proportion of case-underreporting and transmission rate using two types of incidence times series (monthly and annually).Our results show that using monthly data self healing proportion (η/(η + α)) varied from 0.01 to 0.98 with only 7 out of 21 affected districts having self healing proportion less than 0.  5 Appendix

Model
The System of the model is given as:
That is, if a < 0, then the system exhibits a forward bifurcation at R0 = 1 and therefore, the disease-free equilibrium is the only equilibrium when R0 < 1.However, if a > 0, then there is a locally asymptotically stable sub threshold endemic equilibrium (i.e. a backward bifurcation); that is, VL may persist even when R0 < 1.To ascertain whether VL can be eliminated from a district, we will calculate R0 and a for each district.If R0 > 1, the disease is endemic in that district.If R0 < 1 and a < 0, then VL is moving towards elimination in that district.On the other hand, if R0 < 1 and a > 0, we do not have enough information to conclude whether or not VL is moving towards elimination in that district.

Fig 1 .
Fig 1. Visceral Leishmaniasis Dynamical Model.Proposed model to estimate and compare distributions of parameters for two different periods of incidence data.

Fig 2 .Fig 3 .
Fig 2. Incidence data.The monthly incidence from years 2003 to 2005 for each district is depicted.The districts are ordered according to the mean reported incidence across the respective time period.White regions indicate months with missing data.

) 3 . 2
Estimation of model parameters for 21 districts via curve fitting 3.2.1 Spatial differences for years 2003-2005 We first fit the cumulative incidence C(t, θ) from the model to the cumulative monthly incidence data from 2003-2005 for each of the 21 districts.The best fit solution curves for Araria, Nalanda, Jahanabad and Samastipur are shown in Fig 5.

6 Preprints
(www.preprints.org)| NOT PEER-REVIEWED | Posted: 23 January 2019 Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted: 23 January 2019 doi:10.20944/preprints201901.0231.v1Highdegree uncertainty is observed in the monthly reported cases as compared to annual incidence data (see Figure2) whereas annual data may change significantly due to changes in social and environmental factors.Hence, we used both the data sets (monthly and annual) to estimate key model parameters and show the differences in our estimates.

Fig 5 .Figure 6
Fig 5. Best fit for Araria, Nalanda, Jahanabad and Samastipur.The best fit of the solution curve C(t, θ) (shown as a solid blue line) is overlaid with the cumulative incidence data (shown with red dots) for the districts with the highest and lowest mean incidence to the monthly 2003-2005 data.

3. 3
Estimation of R 0 The estimation of the basic reproductive number R0 is presented in figure 7.This figure include both periods and the upper graph corresponds to the 2003-2005 period and the lower is for the 2006-2012 period.The upper graph has been descendantly sorted to identify the districts with higher possibilities to present epidemics and the lower figure maintains the same order of districts for easy comparison between the two periods.It is relevant to recognize that the R0 falls below the unit in the majority of district in Bihar during the 2006-2012 period, changing completely the story from past period 2003-2005 where the majority of districts presented epidemics of VL.This result regarding the potential of epidemics calls for further research by exploring the activities conducted in each district to try to control VL.3.3.1 Spatial differences for years 2003-2005 and 2006-2012

Figure 8 7 PreprintsFig 6 .Fig 7 .
Figure 8 presents the distribution of the estimated basic reproductive number R0 for both periods.Both periods maintain the big proportion of R0 around the unit.However, during 2003-2005, a large number of districts of Bihar face the event of an epidemic outbreak while, during the 2006-2012 period, the majority of districts had an R0 less than one, suggesting that epidemic of VL was less likely to occur.

Figure 9
Figure 9  depicts the goodness of fit through the squared root term of the sum of squared residuals from the estimated models per district.Nalanda and Jahanabad present the best possible fits for both periods since their SSR are close to zero.Vaishali and Araria, on the other hand, have the largest differences for fits, specially for the period 2003-2005.It is worth to note, as well, that these differences in fit are less prominent for the 2006-2012 period, which might be related to the quality of information and levels of under reporting.

Figure 10
Figure10summarizes estimates for q and η for the period 2003-2005 (left figures) and also for 2006-2012 (right figures).Recall that q corresponds to the proportion of new cases not reported for every district and η is associated to the asymptomatic recovery rate.The distribution of proportion of new cases not reported tend to be higher during the second period 2006-2012 and also presented a slight higher median.However, the level of asymptomatic recovery rate changed from a concentrated distribution (gamma) on low values during the period 2003-2005 to a higher recovery rate under a normal distribution for the period 2006-2012.

9 PreprintsFig 9 .Fig 10 .
Fig 9. Proportion of Sum of Squared Residuals.Estimates of dispersion through the proportion of the sum of squared residuals divided by the total.

Table 1
The model tracks the cumulative number C(t, θ), given a vector of parameter values θ, of new VL cases in the time interval [0, t].This function is then fit to the cumulative incidence data for each district using maximum likelihood under the assumption of normally distributed errors in order to estimate the transmission rate, the proportion of all cases that are asymptomatic, and the proportion of cases that go unreported.Letting yi denote the number of new cases in time interval [ti−1, ti], i ∈ N, and defining cn := n i=1 yi, we find the vector of parameter values θ that minimized the sum of squared residuals (SSR) 5 Preprints (www.preprints.org)| NOT PEER-REVIEWED | Posted:

11 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 January 2019 Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 23 January 2019 doi:10.20944/preprints201901.0231.v1Table 4 .
Average number of reported cases per yearthe Indian state of Bihar.The model incorporates proportion of self healing individuals with the aim to evaluate spatial distribution of elimination chances of VL.We used district level two different data sets to estimate model parameters.The data sets include a monthly inci-
5. Estimates summarized in figures 5 and 6 have a very important implication for policy considerations.Estimate of the transmission rate β might be a signal confirming the deterioration of potential elimination of VL due to the overall increase in this rate.During 2003-2005, almost no district in the Bihar state had a β higher than 2, with the exception of Nalanda and Vaishali.However, for 2006-2012, the estimates show that every district 12 Preprints (www.