Differences in Household Size, Employment Status and Ability to Pay for the Service are Associated with Distance Travelled for Inpatient Care in Kenya

Methods: Data on four hundred and eighty-one participants of all ages from forty-seven counties in Kenya who sought inpatient care in 2018 were analyzed. Distance to a health facility was self-reported by the respondent and captured as a continuous variable. The response exhibited a discrete mass at zero together with a continuous component, so a Tweedie distribution was adopted for modelling. Because observations within a cluster are correlated, we used the Generalized Estimating Equations (GEE) approach with an exchangeable correlation structure. Since no standard software was available to analyze this problem, we developed R functions. We assessed model fit using the quasi-likelihood information criterion (QICu) and the $R^2$ criterion, preferring the lowest value of the former and the highest value of the latter.


Introduction
Inpatient care is defined as a case where an individual is hospitalized for more than twenty-four hours, and reflects a more serious health problem. An estimated 1.2 million Kenyans required these services in 2013, and the number is predicted to increase exponentially in the coming decade [1]. Among those seeking care, various factors are key in predicting distance travelled. For example, differences in wealth index influence the distance travelled: those in the richer wealth quintiles can afford the fare to any facility, pay insurance premiums that guarantee admission at any facility, or pay cash at a facility of their choice. In contrast, those in the poorer quintiles have less choice of facility because they are limited by finances. Although government facilities are much more affordable and are the best choice for care, most are miles away and out of reach from places of residence.
In an effort to mitigate this, the government has set up as many inpatient services as possible, including upgrading health facilities that used to offer only outpatient services by equipping them with the machines needed to provide inpatient care. However, most of these upgrades require essential services such as water, electricity, accessible roads and housing, which limits the facilities that can be upgraded. Some of these facilities are in rural areas and slums and serve a large number of people, so their upgrade would be a milestone for the residents. As it stands, the poor who live in these areas continue to experience difficulty when they need inpatient care. As a result, policies that ease access to inpatient services should be developed.
Distance to inpatient services can determine the well-being of a population, and is potentially linked to individual survival. For example, longer distances have been linked with worse health outcomes, including longer hospital stays and non-attendance at follow-up [2] and, at worst, patient fatality [3]. A study conducted in Zambia found that longer distances and the lack of geographic access to much-needed obstetric care for pregnant mothers explain why there are still fatalities due to deliveries without skilled care [4], and in Tanzania child mortality is increased by difficulties in access to health facilities [5].
In contrast, shorter distances to a facility were associated with utilization and better health outcomes in sub-Saharan Africa [6]. In an emergency, distance could be a defining factor in a patient's survival, with longer distances predicting higher mortality [5]. Studies across some developing countries, such as Bangladesh [7], Kenya [8], Nigeria [9,10], Afghanistan [11] and Burkina Faso [6], point out the importance of distance in predicting health outcomes.
While a distance decay relationship exists, with those living further away associated with underuse and those living closer associated with proper use, there is little evidence to show how this translates to health outcomes. Therefore, the distance travelled to acquire much-needed inpatient services requires more investigation, and this forms the basis of this paper.
Our work is a secondary analysis of the Kenya Household Health Expenditure and Utilization Survey, collected in 2018. The question of interest was Q68: What distance did <name> cover in Kms to get to the inpatient facility?
This paper builds on our previous work on the analysis of non-normal responses under GEE by Mwenda et al [12] and adopts the approach of Taryn Swan [13] and Kurz Christoph [14]. Kurz fitted health care utilization cost data using a Tweedie distribution, but his work was based on the generalized linear model, meaning that correlation was not considered. In contrast, Taryn and Mwenda et al considered a correlation that decays with time, with applications to rainfall and health data respectively.
Our data are clustered and exhibit patient-to-patient correlation within clusters, so the previous methods (correlation decay) would give wrong results. Therefore, because our data are correlated within counties, we propose a new approach that models the response using the Tweedie distribution with what we refer to as 'decay distance with constant correlation', using the exchangeable correlation structure under a GEE framework.
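To make the contrast between the two working correlation structures concrete, the following base R sketch builds both for a hypothetical cluster of four patients. The function names and the value alpha = 0.3 are illustrative assumptions and are not taken from the authors' code.

```r
# Exchangeable working correlation: every pair of patients in a cluster
# shares the same correlation alpha (hypothetical helper).
exchangeable_corr <- function(n, alpha) {
  R <- matrix(alpha, nrow = n, ncol = n)
  diag(R) <- 1
  R
}

# AR(1) working correlation: correlation decays as alpha^|j - k|
# (hypothetical helper, shown only for comparison).
ar1_corr <- function(n, alpha) {
  alpha^abs(outer(seq_len(n), seq_len(n), "-"))
}

R_exch <- exchangeable_corr(4, 0.3)  # alpha = 0.3 is an assumed value
R_ar1  <- ar1_corr(4, 0.3)

print(R_exch)
print(R_ar1)
```

Under the exchangeable structure the first and fourth patients in a cluster are as strongly correlated as the first and second, whereas under AR(1) the correlation shrinks with the separation between positions.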
Our main goal is to understand which covariates are best associated with distance travelled for inpatient care in Kenya. Since, as far as we are aware, no software can analyze this, we aim to create R functions with a Tweedie distribution and an exchangeable correlation structure under the GEE framework. Because of the difficulty of linking inpatient admissions and accessibility, we rely on respondents' self-reports of the distance they covered to access the health facility.

Statistical Literature review
Tweedie distributions have been applied widely in modelling non-normal response data with a discrete mass at zero, since they are flexible enough to accommodate skewness without any data transformation. Most of the methods suggested in the literature for the analysis of such data rely on data transformation [15], two-way analysis [16] or Bayesian methods [17]. However, these methods are not efficient for our purpose because our data are correlated within clusters. Approaches proposed for analyzing non-normal data in the GLM framework ignore the correlation that may exist among subjects who belong to the same cluster, and they also require the specification of a full likelihood, so a misspecified likelihood gives wrong results. Our proposed approach uses quasi-likelihood methods, which have the advantage of requiring only a specification of how the mean relates to the covariates. They are also flexible in that, if the correlation structure is misspecified, the estimates are still plausible. Moreover, our methods are easy to modify and adapt.
Previous evidence suggests that covariates influence distance to health facilities; however, a research gap remains on the selection of the best-fitting covariates. The aim of this paper is to investigate which combination of covariates influences the distance a Kenyan citizen will travel to seek inpatient care. This work extends the application of the Tweedie distribution to understand the influence of a given set of covariates on distance.

Statistical analysis and Selection of variables
Inpatient data from the Kenya Household Health Expenditure and Utilization Survey (KHHEUS) were obtained, cleaned and coded. We investigated 13 covariates against our dependent variable, distance. Type of residence was categorized into two groups (rural and urban). Five wealth index quintiles, ranging from richest to poorest, were constructed as a score from ownership of different household assets using principal component analysis as described by Filmer and Pritchett [18]. Education was grouped following the justification and methods provided by Rippin et al [19] into four categories: those who never went to school (those under 3 years of age and those who responded "Never went to School"), lower education (pre-primary, primary and informal (madrassa)), intermediate education (secondary, vocational and college) and higher education (university degree and above). Age groups for employment followed the OECD definition of employment rate by age group, giving four categories. Category 1 comprised those aged below 15 and above 65, who are considered unable to work; Category 2 comprised those aged 15-24 years, who are entering the labor market following education; Category 3 comprised those aged 25-54, who are in their prime working lives; and Category 4 comprised those aged 55-64, who are passing the peak of their career and approaching retirement [20].
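The age grouping described above can be sketched in base R as follows. The helper name and the test ages are hypothetical, ages are assumed to be recorded in whole years, and treating exactly 65 as Category 1 is an assumption, since the text only says "above 65".

```r
# Hypothetical helper: map age in whole years to the four
# employment-related age categories described in the text.
age_category <- function(age) {
  ifelse(age < 15 | age >= 65, 1L,    # considered unable to work (65 assumed here)
    ifelse(age <= 24, 2L,             # entering the labor market
      ifelse(age <= 54, 3L, 4L)))     # prime working life, else approaching retirement
}

ages <- c(3, 15, 24, 25, 54, 55, 64, 70)  # illustrative ages, not survey data
cats <- age_category(ages)
print(data.frame(age = ages, category = cats))
```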
Our dependent variable, distance, was set to 0 km for any reported value of less than 2 km, as in other studies in Kenya [21].
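A minimal base R sketch of this recoding is shown below. The distances are made-up values and the variable names are assumptions; the point is only that the 2 km rule produces the discrete mass at zero.

```r
# Hypothetical self-reported distances in km (not survey data).
distance_km <- c(0, 0.5, 1.9, 2, 10, 45.5)

# Any reported distance under 2 km is treated as 0 km.
distance_model <- ifelse(distance_km < 2, 0, distance_km)

# Share of exact zeros after recoding.
prop_zero <- mean(distance_model == 0)

print(distance_model)
print(prop_zero)
```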

Data Availability
The datasets are freely available at Kenya National Bureau of Statistics (KNBS). Interested researchers should register an account to access the datasets at https://www.knbs.or.ke/kenada/. The authors confirm that others would be able to access these data in the same manner as the authors and that the authors did not have any special access privileges that others would not have.

Statistical Methods
In this study a Tweedie distribution is used to model predictors of distance for inpatient care. The choice of the Tweedie distribution is justified by Figure 1, which shows that distance, the dependent variable, has a discrete mass at zero and a continuous component. In addition, Table 1 shows that the data are right skewed, with a skewness value of 4.80. We analyzed distance against the covariates to determine the combinations that best explain any existing association. Detailed methodology is in Appendix 3.

Multiple regression analysis under GEE framework using the Tweedie distribution and Exchangeable correlation structure
Here, we adjusted for household head gender, education, age, household size and wealth index in the regression model, using the exchangeable correlation structure under the GEE approach with the Tweedie distribution. We checked model fit to select the best model using the QICu proposed by Hardin and Hilbe ([22], pages 171-172), which is an extension of the QIC proposed by Pan [23], itself following the AIC developed for GLMs [24]. QICu imposes a penalty based on model complexity so that few covariates are used and model parsimony is achieved.
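As a rough illustration of how the QICu trades fit against complexity, the sketch below computes a quasi-likelihood for the variance function mu^p and the penalized criterion QICu = -2Q + 2k, where k is the number of regression parameters. The function names, the toy data, and the choices p = 1.64 and phi = 1 are assumptions for illustration; terms that depend only on y are dropped, and this is not the authors' implementation.

```r
# Quasi-likelihood for V(mu) = mu^p with 1 < p < 2, up to terms in y only.
tweedie_quasi <- function(y, mu, p, phi = 1) {
  sum((y * mu^(1 - p) / (1 - p) - mu^(2 - p) / (2 - p)) / phi)
}

# QICu: minus twice the quasi-likelihood plus twice the parameter count.
qicu <- function(y, mu, p, n_par, phi = 1) {
  -2 * tweedie_quasi(y, mu, p, phi) + 2 * n_par
}

y        <- c(0, 5, 12, 40)        # toy distances in km
mu_close <- c(2, 5, 12, 40)        # fitted means close to the data
mu_flat  <- rep(mean(y), 4)        # intercept-only style fit

qicu_close <- qicu(y, mu_close, p = 1.64, n_par = 2)
qicu_flat  <- qicu(y, mu_flat,  p = 1.64, n_par = 2)
print(c(close = qicu_close, flat = qicu_flat))
```

With the same number of parameters the better-fitting means give the lower QICu, and each additional parameter adds exactly 2 to the criterion.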
Data in .sav format were imported into R statistical software version 3.6.3 [25] for cleaning, reformatting, recoding and analysis.

Results
We present 10 competing models for distance; the best-fitting model is the one with the lowest QICu, as seen in Appendix 2. We used backward selection [26] as a proxy to identify the best predictors of distance under a GLM. However, our model output and interpretation are based only on distance adjusted for the respective covariates in the GEE framework. We added the covariates to the model and computed the QICu and $R^2$; we then removed the covariates one by one and checked whether each change improved the model fit.
The reported model, which is model 7 from Appendix 2, with the best-fitting covariates is written as $\log(\mu) = 3.093 + 0.222\,(\text{ability to pay})$, where
1. $\mu$ is the expected distance travelled to access inpatient care.
2. Ability to pay (middle, highest) takes a value 0 or 1 depending on which category is being assessed. Group (lowest) is the reference category.
This model reported an = 6.12, = 0.045, $R^2$ = 9.96% and QICu = 13158.23, with a Tweedie index parameter of 1.64, 95% CI (1.59, 1.68). To interpret the coefficients, which are on the logarithmic scale, we take exponentials. From the given model, with all factors held constant, the population average distance to a government inpatient center in Kenya is approximately exp(3.093), which is 22.04 km.
Compared with the lowest pay category (1-3,000 KES), citizens in the middle pay category (3,001-10,000 KES) travelled 1.24 times as far, while those in the highest pay category travelled 3.40 times as far.
The employed covered roughly half the distance to a health facility for inpatient care compared with the unemployed (0.59 times).
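The back-transformation from the log scale can be checked with a few lines of base R. The variable names are hypothetical, and 0.222 is assumed here to be the coefficient for the middle ability-to-pay category; the last two lines simply invert the reported ratios to show the log-scale values they imply.

```r
intercept  <- 3.093   # reported intercept on the log scale
coef_pay   <- 0.222   # reported coefficient (assumed: middle ability to pay)

baseline_km <- exp(intercept)  # approximately 22.04 km
ratio_pay   <- exp(coef_pay)   # approximately 1.25, close to the reported 1.24

# Log-scale values implied by the reported distance ratios.
log_ratio_highest  <- log(3.40)  # positive: further than the reference group
log_ratio_employed <- log(0.59)  # negative: shorter than the unemployed

print(c(baseline_km = baseline_km, ratio_pay = ratio_pay))
print(c(highest = log_ratio_highest, employed = log_ratio_employed))
```

A ratio above 1 corresponds to a positive coefficient and a longer expected distance than the reference group; a ratio below 1 corresponds to a negative coefficient.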

Discussion
This work has demonstrated an alternative technique for clustered, correlated non-normal responses with a discrete mass at zero, under generalized estimating equations. The paper presents the best set of covariates for predicting the distance travelled by Kenyans to access inpatient care across the forty-seven counties. Data from each county are representative, and the pooled data contributed substantial information about distance for inpatient care. A set of potential covariates was investigated to better understand their effects.
The model without covariates showed that, on average, a Kenyan seeking inpatient care travelled a distance of 22.04 km. Although this is a national average, the cost in terms of fare will differ substantially, because roads, the preferred means of reaching hospitals, vary widely across Kenya. Some roads are all-weather but others are seasonal, meaning that accessibility is greatly hampered during the rains. Health system performance is described in terms of equity, which is measured or assessed using health service distribution, access and utilization [27]. Access is mostly determined by cost and distance. This means that, irrespective of the availability of a service in a hospital, its full utility cannot be realized if it is not used by the target group. The aim of SDG 3 is to 'Ensure healthy lives and promote wellbeing for all at all ages', and this work shows the importance of distance in measuring this goal.
Most inpatient care is critical and requires specialized attention from a medical expert, so distance to access could determine survival or fatality. Though some studies have not linked accessibility to use [29], there is evidence that ease of access could save lives, as some life-threatening conditions are worsened by delays in seeing a physician. For example, when a patient suffers a heart attack, how fast they get to the hospital could determine their survival.
In our inpatient data, some patients were admitted to health centers operated by the government. Although not well equipped to handle extreme cases, these centers can handle some cases, such as admissions for childbirth. However, if complications arise and the woman needs a caesarean section, she has to be referred to a larger facility that can offer that service.
This means that major conditions would still need referral to major hospitals, so access to inpatient care for those needing such services at lower-level hospitals will remain a challenge.
Noor et al [28] investigated access to any government health center and reported a distance of less than 10 km. Although they focused on only 4 districts in Kenya, these were diverse and could be used as a proxy for a national estimate of distance. However, their focus on smaller health centers, which are mostly used for outpatient care, could be misleading when predicting inpatient access.
It is important to note, however, that distance in Kenya is very hard to predict because of differences in terrain and road types (tarmacked, murram, earth), so the low $R^2$ value of 9.96% is reasonable.
Distance to inpatient care is central to the Kenyan government's effort to achieve universal health coverage (UHC). The main goal of UHC is to ensure that every citizen has access to quality healthcare services; however, this can only be achieved if the distance to access is reasonable. For the government to achieve this objective, it needs to do more to open up access to inpatient services. Although access to inpatient services is discussed here in general terms, Kenya has a unique geographical terrain. For example, it may be easier to reach a facility that is further away in an urban area than one that is fewer kilometres away in a rural area, because most cities have good road networks that make access easy. If the government wants to increase access, more needs to be done to improve road infrastructure in rural areas.
Our results show that high costs are associated with longer distances travelled to access inpatient care. We can interpret this in two ways. First, the cost incurred could signify an expensive procedure or care. Second, those with the ability to pay could choose facilities that are far away and expensive, even when the required care is not complicated, because they have more confidence in the quality of care at those hospitals.
Those with the highest ability to pay (10,001 KES and above) seem to travel further for inpatient care, up to 3.40 times the distance travelled by those with the lowest ability (1-3,000 KES), and those in the middle (3,001-10,000 KES) seem to travel 1.24 times as far. Ability to pay gives a person the freedom to choose any facility they are comfortable with for inpatient care, and higher hospital bills are mostly associated with complex medical care needs and procedures. For example, a caesarean section will cost more than a normal delivery, although both need inpatient care. A referral hospital could be better suited than other hospitals to handle a caesarean section for a woman with pre-eclampsia, because the procedure requires sophisticated medical equipment that is mostly found in large referral hospitals. Therefore, a patient with greater financial means will travel a longer distance for the procedure and pay a higher medical cost, while one with low ability to pay will check in at the closest and most affordable facility.
Our results indicate that Kenyans travelled longer distances for complex medical procedures or for better services that may not be found in the closest inpatient health facility. Others could have travelled longer distances for privacy about their condition. For example, a person could be more comfortable being admitted for a sexually transmitted disease (STD) in a hospital further away from home. However, our results are not conclusive, and further investigation is needed into why those who travel further paid more for services.
In Kenya, after the new constitution was adopted in 2010, health services were devolved with the aim of proper and closer management at lower levels. However, due to limited budgets and indirect constraints such as roads and electricity, hospital upgrades for inpatient care have grown slowly. For example, setting up an inpatient facility deep in a rural area without a good road network or a proper supply of electricity would do little to serve the people. As a result, facilities will mostly be set up in centers where such services can be accessed easily, and people in remote areas will still have to travel longer distances to get inpatient services.
Inequalities in employment opportunities also determine the distance travelled to access inpatient care. The employed cover roughly half the distance that the unemployed cover, a result supported by Allin et al [30]. The unemployed, who have less financial ability, are forced to use the facilities that are within reach, and may lack sufficient money to access better facilities for specialized treatment should they require it. Additionally, the employed can afford to live where larger inpatient facilities are found. For example, large referral and specialized hospitals are expected in capital cities or large towns, where they are accessible and serve many people. Such facilities also need to be easily reached and connected to an uninterrupted supply of water and electricity. Most employed people would choose to live in places where such services can be found.
Household size was the last covariate associated with distance travelled, with middle-sized and larger households travelling longer distances than small households. This difference could be explained by the fact that larger households are mostly found in rural areas and slums rather than in other urban areas. It is easier to raise a large family in rural areas since food and accommodation are cheap. People in rural areas mostly live on their ancestral land, where they also farm most of their food, so they incur little cost for housing and food. This points to a need to improve inpatient facilities in rural areas and slums. Without a strong policy focus on equal access to inpatient services in Kenya, one that prioritizes rural areas and slums, opens up job opportunities and encourages smaller families, achieving UHC will remain a dream.
This work is the first to estimate distance for inpatient care in Kenya by analyzing responses from all forty-seven counties. This provides the best estimate and evidence on which to formulate policies. This area has been under-studied, with many researchers combining inpatient and outpatient care; such analysis could lead to wrong conclusions and poorly formulated policy. For example, Noor et al [28] reported a distance to access of less than 10 km; a policy based on this figure would imply that every person already has access to health care. However, as stated earlier, most of these facilities are for outpatients, so policies based on this figure may not improve the much-needed inpatient care, which requires sophisticated and complex procedures as well as doctors.
A further advancement of this research would be an individual analysis of each of the 47 counties. This is outside the scope of this work, which has focused only on the population average, and leaves room for further research at county level. Although some studies have found no association between access and use [29], better access, regardless of quality, is still influenced by distance.
A drawback of previous research is that distance was analyzed using summary reports, where researchers do not dig deep into the data and report only averages from summary statistics. From Table 1 alone, we could have reported a median of 10 km, which may be misleading because it does not account for the skewness and correlation in the data. Our statistical analysis provides a more meaningful interpretation of the data by accounting for both skewness and correlation.
This analysis has limitations. Firstly, the data had a lot of missing information, which made imputation very difficult, so we used only complete records. New statistical methodologies for predicting non-normal data need to be developed.

Conclusion
We have demonstrated a new approach to handling correlated non-normal data, and provide R code in Appendix 1 and on GitHub (https://github.com/samwenda/Tweedie-with-Exchengable-Correlation). Our approach has demonstrated how distance decay can affect access to much-needed health services. It can be used with other datasets that have a discrete mass at zero and correlation within clusters. Our approach has also shown a new way of calculating the denominator given in Hardin and Hilbe's book (page 91), as shown in Appendix 2.
Policies that place more government facilities in rural areas and slums should be prioritized, since that is where large populations live. Policies encouraging smaller, manageable families should also be promoted, so that people have families for whom they can afford the best health care. Job availability will also increase flexibility in the choice of facility. Sophisticated services should be brought closer to people on the ground so that they do not have to travel far for much-needed services.
Acknowledgments: Many thanks to my colleagues at KNBS, Mr Mutua Kakinyi for providing the datasets and Robert Buluma for being a great encourager. Thanks also to Joseph Kombe, who assisted in running the R code, which took a very long time. Great thanks to Prof. Taryn Swan, whose R code I adapted to fit the models, and to Noah Mutahi, who assisted in proofreading the manuscript and offered great insights. Finally, I thank all my PhD classmates at Moi University for always being the greatest motivators.
Funding: This work received no funding, and it is part of the first author's PhD work.

... as the best model, even though models 9 and 10 had higher $R^2$ values. We selected the model with the best balance between the QICu and the $R^2$, in which model 7 fits as the best parsimonious model, with the fewest covariates and acceptable QICu and $R^2$ values.

Table 3. Run time in seconds in calculating the denominator of the scale parameter.
Method               Run time in seconds
Hardin and Hilbe     0.033
Proposed method 1    0.0319
Proposed method 2    0.0289

Appendix 3: Methodology
Tweedie distributions are well suited to modelling a mixture of discrete and continuous data because they account for the discrete mass at zero. Tweedie densities generally cannot be expressed in closed form; however, the family is flexible enough to encompass discrete distributions such as the Poisson and continuous distributions such as the gamma, in what we refer to as the compound Poisson-gamma distribution.
Following Hasan and Dunn, for $y$ in the support $S$, for some function $a(y, \phi)$, known functions $\theta$ and $\gamma(\cdot)$, and dispersion parameter $\phi > 0$, the Tweedie family belongs to the exponential dispersion models (EDM), defined by a probability function of the form $f(y; \mu, \phi) = a(y, \phi)\exp\left[\frac{1}{\phi}\{y\theta - \gamma(\theta)\}\right]$. The function $a(y, \phi)$ cannot be written in closed form; it is the normalizing constant that ensures that $f$ integrates to one over its domain.
Following Dunn and Smyth, $Y \sim \mathrm{ED}(\mu, \phi)$ indicates that $Y$ belongs to an EDM with mean $E[Y] = \mu = \gamma'(\theta)$ and variance $\mathrm{Var}[Y] = \phi\,\gamma''(\theta)$. The mean relates to the variance through the mean-variance relationship $\mathrm{Var}[Y] = \phi V(\mu)$, with $V(\mu)$ referred to as the variance function.
The Tweedie distributions are special cases of EDMs that obey a fixed mean-variance relationship $V(\mu) = \mu^p$ for $\mu = E(Y)$ and constant $p \notin (0,1)$. They are denoted $\mathrm{Tw}_p(\mu, \phi)$, with index parameter $p$, dispersion parameter $\phi$ and mean $\mu$. By altering the value of the index parameter $p$, the Tweedie distributions reduce to other common distributions that are also EDM members: $p = 0$ gives the normal distribution, $p = 1$ the Poisson distribution, $p = 2$ the gamma distribution and $p = 3$ the inverse Gaussian distribution. For other values of $p$ greater than 2, the Tweedie densities have no closed form. Our focus is on distributions with $1 < p < 2$ (1 and 2 not included), known as Poisson-gamma distributions, which have a discrete mass at zero and are otherwise continuous.
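The compound Poisson-gamma representation for 1 < p < 2 can be illustrated by simulation in base R. The sketch below uses the standard parameter mapping (Poisson rate mu^(2-p) / (phi (2-p)), gamma shape (2-p)/(p-1), gamma scale phi (p-1) mu^(p-1)); the variable names, the seed, and the dispersion phi = 6 are assumptions chosen only for illustration, not estimates from the paper.

```r
set.seed(1)

mu  <- 22.04  # mean distance in km (illustrative)
p   <- 1.64   # Tweedie index, must lie strictly between 1 and 2
phi <- 6      # dispersion: an assumed value for illustration only

lambda <- mu^(2 - p) / (phi * (2 - p))  # Poisson rate for the number of events
shape  <- (2 - p) / (p - 1)             # gamma shape of each event
scale  <- phi * (p - 1) * mu^(p - 1)    # gamma scale of each event

n <- 100000
N <- rpois(n, lambda)                   # number of gamma events per observation
y <- numeric(n)                         # observations with N = 0 stay exactly 0
pos <- N > 0
# A sum of N independent gamma(shape, scale) draws is gamma(N * shape, scale).
y[pos] <- rgamma(sum(pos), shape = N[pos] * shape, scale = scale)

p_zero_theory <- exp(-lambda)           # probability of an exact zero
print(c(mean = mean(y), prop_zero = mean(y == 0), p_zero_theory = p_zero_theory))
```

The simulated mean should be close to mu, and the share of exact zeros close to exp(-lambda), which is the discrete mass at zero that motivates the Tweedie choice.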
An important property of the Tweedie distribution under GEE is its ability to accommodate both correlation and right skewness, which are characteristics of our continuous data. This approach is used in this paper and complements the work of Taryn Swan, who used the AR(1) correlation structure.
Tweedie regression models relate the mean of distance to the selected covariates. The mean of distance is modelled through a log link as a linear function of the covariates, $\log(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$, where the $\beta_j$ are regression coefficients corresponding to the covariates $x_j$, all fitted based on the Tweedie EDM. Means are calculated to assess the relationship between the covariates and distance.
To fit the models, we need to estimate the index parameter $p$ and the $\beta$'s of the Tweedie distribution using the GLM framework. This could be computationally difficult, but the R packages tweedie and statmod fit this easily. The index parameter calculated by the software is shown in Figure 2 in
summarise(resd = sum(combn(resd2, 2, FUN = prod)), n = n()) %>% mutate(sampl_n = n * (n - 1) / 2)

Proof by induction
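The denominator used to calculate the alpha of an exchangeable correlation, given on page 98 of Hardin and Hilbe [22], is $\sum_{i} 0.5\, n_i (n_i - 1) - q$, where $n_i$ is the size of cluster $i$ and $q$ denotes the number of regression parameters. We propose that the same can be implemented using combinations, and suggest $\sum_{i} \binom{n_i}{2} - q$ for $n_i > 1$, where the $n_i$ are integers.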
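The equivalence of the two forms of the denominator can be checked numerically in base R. In the sketch below the cluster sizes, the residuals and the parameter count n_par are hypothetical; the second half shows that summing pairwise residual products with combn agrees with a closed-form identity, which is the kind of quantity the exchangeable alpha estimator sums within each cluster.

```r
# Hypothetical layouts: a balanced one (2 clusters of size 4) and an unbalanced one.
sizes_balanced   <- c(4, 4)
sizes_unbalanced <- c(2, 5, 3)
n_par <- 1  # hypothetical number of regression parameters

denom_formula <- function(n_i, q) sum(0.5 * n_i * (n_i - 1)) - q  # n_i (n_i - 1) / 2 form
denom_choose  <- function(n_i, q) sum(choose(n_i, 2)) - q         # combinations form

d_bal_formula <- denom_formula(sizes_balanced, n_par)
d_bal_choose  <- denom_choose(sizes_balanced, n_par)
d_unb_formula <- denom_formula(sizes_unbalanced, n_par)
d_unb_choose  <- denom_choose(sizes_unbalanced, n_par)

# Sum of products over all pairs of residuals within one cluster.
res <- c(0.5, -1.2, 0.3, 2.0)                      # made-up residuals
pair_sum_combn  <- sum(combn(res, 2, FUN = prod))  # explicit enumeration of pairs
pair_sum_closed <- (sum(res)^2 - sum(res^2)) / 2   # algebraic identity

print(c(d_bal_formula, d_bal_choose, d_unb_formula, d_unb_choose))
print(c(pair_sum_combn, pair_sum_closed))
```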

Proof by induction
For balanced clusters, assume the data given by Hardin and Hilbe page 66. The data has 2 subjects with 4 time periods.