The Impact of Social Isolation on Subjective Health: An Instrumental Variable Approach

We investigate the relationship between social isolation and subjective health, considering that this relationship is potentially affected by endogeneity due to the presence of self-reported measures. Thus, while an increase in social isolation may affect the perception of health, alternative paths of causality may also be hypothesized. Using data from round 7 of the European Social Survey, we estimate an instrumental variable model in which isolation is explained by being a member of an ethnic minority and by having experienced serious family conflicts in the past. Our results confirm that changes in social isolation influence subjective general health. In particular, greater isolation produces a strong and significant deterioration of the perceived health status. We try to advance the literature on social isolation and health by supporting a path of causality running from social isolation to subjective health.


Introduction
Social isolation is defined in the literature as a condition of deterioration of human relationships from both a quantitative and a qualitative point of view. The discussion concerning the role of social isolation is becoming increasingly central, particularly in the context of industrialized societies, in which individuals experience new patterns in intergenerational ties, social mobility and living arrangements, due also to dual-career families, increased instability in unions and a decline in large families [1].
Recent literature has investigated deeply the consequences of social isolation, mainly documenting the relationship between a deprivation in social relationships and health [1][2][3]. In more detail, the literature has underlined that low levels of social relationships have a negative impact on health conditions. The mechanism through which a deprivation in social contacts undermines health is well summarized in two theoretical streams: the "buffering" [4] and the "direct effect" models. The "buffering" model states that social support is beneficial when individuals are exposed to the pathogenic effect of stressful events, by moderating the negative impact of those stressors.
On the other hand, the "direct effect" model suggests [5] that good social relationships have positive effects irrespective of whether individuals are under the effect of health stressors or not.
Empirical evidence [5] supports both the illustrated points of view, confirming in any case the positive impact of social networks on mental and physical health.
While this field of research has plenty of contributions which investigate and support [6,7] the impact of social isolation (and/or loneliness) on health and, in particular, on the risk of death in the older cohorts, less attention has been devoted to the relationship between human relationships and health in large samples of adults of all ages.
In this paper, we consider a large representative dataset of more than 40,000 individuals aged sixteen and older and coming from 21 countries in Europe, with the aim of establishing a relationship between social isolation and health.
Data include self-reported responses as a suitable measure of health in a cross-sectional setting. Subjective health, being a self-reported measure of health, is commonly used in nationally representative surveys with the objective of measuring a health condition, but the literature shows that self-reported measures of health are in certain cases more subject to measurement errors than more reliable objective measures [8,9].
Nevertheless, subjective measures of health are widely used in the literature when objective measures of health are not available. In addition, self-reported scales are more reliable when individuals are asked to report on their current health and not over any long period of time [10], as in the case of our data.
Using data from the European Social Survey, we aim to assess whether social isolation increases or decreases subjective health. Because measurement errors may be present both in subjective measures of wellbeing and in the perceived component of the social isolation index adopted in the paper, the relationship between the two variables may be affected by endogeneity.
In econometric practice, endogeneity may arise in the presence of unobservable factors that influence both the main predictor (social isolation) and the outcome (subjective health). In this framework, the presence of measurement errors in self-reported indices of social isolation and subjective wellbeing represents one of the most common sources of bias in estimates.
An instrumental variable approach, through the identification of exogenous variables that are strongly correlated with the main predictor (social isolation) but presumably not with the error term of the equation that explains subjective wellbeing as a function of social isolation, may protect estimates from the bias produced by endogeneity.
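The logic of instrumenting can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the estimator used in the paper: it uses a linear outcome and a single instrument rather than an IV-probit, and the data-generating process, the names `simulate`, `z`, `u`, `e` and the true coefficient are all hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance of two equally long lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=5000, true_beta=1.0, seed=0):
    """Compare OLS and IV slopes when the predictor shares an unobserved factor with the outcome."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u and e
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved factor driving both x and y
    e = [rng.gauss(0, 1) for _ in range(n)]  # idiosyncratic noise in the outcome
    x = [zi + ui for zi, ui in zip(z, u)]    # endogenous predictor
    y = [true_beta * xi + ui + ei for xi, ui, ei in zip(x, u, e)]
    ols = cov(x, y) / cov(x, x)  # biased, because x is correlated with u
    iv = cov(z, y) / cov(z, x)   # uses only the variation in x induced by z
    return ols, iv


ols_beta, iv_beta = simulate()
print(f"OLS slope: {ols_beta:.3f}, IV slope: {iv_beta:.3f}, true slope: 1.000")
```

In this setup the OLS slope absorbs the effect of the unobserved factor, while the IV slope stays close to the true coefficient because the instrument is unrelated to that factor by construction.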
After this brief introduction, the paper is organized as follows. The section "Materials" describes the peculiarities of the dataset. The section "Methods" defines subjective well-being analytically, provides full details about the construction of the social isolation index used in the analysis, and illustrates the instrumental variable methodology implemented to evaluate the effect of social isolation on subjective well-being.
The section "Results and Discussion" presents the outcome of the regression analysis and discusses both the main and the corollary results, while the final section concludes.

Materials
To analyze the relationship between social isolation and subjective general health, we employ data from round 7 of the ESS (2014). The ESS is an academically driven cross-national survey which issues a multidimensional questionnaire across several European countries every two years. In the seventh round (the most recent released), there were 21 countries surveyed: Austria, Belgium, Czech Republic, Denmark, Estonia, Finland, France, Germany, Hungary, Ireland, Israel, Lithuania, Netherlands, Norway, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland and the United Kingdom. We decided to choose this particular round not only because it is the most recent available, but also because it is the only one containing a specific module on social inequalities in health. Thus, it includes health measurements which are crucial for our analysis.

The Model
The aim of this paper is to analyze the relationship between the social isolation level of individuals and their health status. Given that (objective) medical records of individuals are unavailable, we estimate the impact of social isolation on subjective general health (SGH).
The main model we use to estimate the impact of social isolation on the health status dummy variable is a probit regression, specified in equations (1) and (2), where Si represents subjective general health for individual i; Yi, the dependent variable, represents the probability that the chosen health-level variable takes the value 1 for individual i; Xi is the main explanatory variable, a measure of social isolation for individual i; and Wi is a vector consisting of all the other exogenous regressors, that is, all the other variables that influence health according to the literature. Finally, εi represents the error term and Φ is the cumulative distribution function (CDF) of the standard normal distribution.
Since, as mentioned before, this relationship is likely to be affected by endogeneity, that is, social isolation is potentially correlated with the error term, we argue that an IV-probit regression model [11] leads to more reliable estimates than a standard probit regression. IV-probit estimation adds a first-stage equation (3), in which Xi is regressed on Wi and on Zi, the vector of instrumental variables not correlated with the error term of equation (2). After estimating [α0, α1, α2] using equation (3), the obtained prediction of Xi replaces Xi in equations (1) and (2), giving robust estimates of the coefficient β1.
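One way to write equations (1) to (3) that is consistent with the symbols defined above is sketched here. It assumes the standard latent-variable form of the IV-probit; the latent variable S_i^*, the coefficient names β0 and β2, and the first-stage error u_i are assumptions introduced for illustration.

```latex
\begin{align}
Y_i &= \Pr(S_i = 1 \mid X_i, W_i) = \Phi\left(\beta_0 + \beta_1 X_i + W_i'\beta_2\right) \tag{1} \\
S_i^{*} &= \beta_0 + \beta_1 X_i + W_i'\beta_2 + \varepsilon_i, \qquad S_i = \mathbf{1}\left[S_i^{*} > 0\right] \tag{2} \\
X_i &= \alpha_0 + Z_i'\alpha_1 + W_i'\alpha_2 + u_i \tag{3}
\end{align}
```

Endogeneity corresponds to a nonzero correlation between ε_i and u_i; the instruments Z_i enter only equation (3).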

Dependent Variables
Subjective general health (SGH) is the dependent variable of the main equation. It has been obtained by generating a dummy from the following question: "How is your health in general? Would you say it is... [Very good, good, fair, bad, or, very bad?]" The dummy SGH takes a value of 1 when respondents choose the answers "very good" or "good" (we denote this condition as characterizing individuals with high self-reported health) and 0 otherwise, denoting those respondents that we classify in a low self-reported health category.
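The recoding described above can be sketched as follows. The function name `sgh_dummy` and the treatment of unrecognized answers as missing (`None`) are assumptions for illustration; the paper does not state how non-responses were handled.

```python
VALID_ANSWERS = {"very good", "good", "fair", "bad", "very bad"}
HIGH_HEALTH_ANSWERS = {"very good", "good"}


def sgh_dummy(answer):
    """Return 1 for 'very good'/'good', 0 for the other valid answers, None if missing."""
    if answer is None:
        return None
    key = answer.strip().lower()
    if key not in VALID_ANSWERS:
        return None  # assumption: anything outside the five categories is treated as missing
    return 1 if key in HIGH_HEALTH_ANSWERS else 0


print([sgh_dummy(a) for a in ["Very good", "good", "fair", "bad", "very bad"]])
```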

Endogenous Independent Variable and Instruments
The main explanatory variable of our analysis is social isolation. The ESS has a specific section about social exclusion. In our analysis, we focus on the measures of social isolation. In order to construct a robust index of isolation, we follow the recent literature [12] and build this measure by adding together the score assigned to each response category of three related ESS variables, namely, family status, social contacts and having close friends. These items have been used in many past studies [13][14][15].

Family status
This item, indicating whether the respondent cohabits with a partner and/or has children living at home, is obtained by the combination of the following two questions of the ESS survey: In particular, the variable family status has the following four response categories, and each possible answer was assigned a score from 1 to 4 points (p):

Social contacts
This variable indicates the frequency of the respondents' social meetings and is recorded after asking this question of the interviewee: "How often do you meet socially with friends, relatives or work colleagues?"
• Never/less than once a month (1p)
• Once a month (2p)
• Several times a month/once a week (3p)
• Several times a week/every day (4p)

Having close friends
This last item indicates whether the respondent has some close friends or not; it is thus coded as a binary variable.
"How many people, if any, are there with whom you can discuss intimate and personal matters?"
The index obtained following the past literature [12] is reversely coded; that is, low values on the index mean that the individual experiences a high degree of social isolation, whereas a high score implies less social isolation. Table 1 shows the social isolation items and the distribution of their response categories. [16] without a great loss of information.
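The additive construction of the index can be sketched as below. The scores for social contacts follow the response categories listed above; the 0/1 coding of the close-friends item and the function name `social_isolation_index` are assumptions, since the text only states that the item is binary.

```python
# Scores for the social contacts item, following the grouping given in the text.
SOCIAL_CONTACT_SCORES = {
    "never": 1,
    "less than once a month": 1,
    "once a month": 2,
    "several times a month": 3,
    "once a week": 3,
    "several times a week": 4,
    "every day": 4,
}


def social_isolation_index(family_status_score, social_contacts_answer, has_close_friends):
    """Sum the three item scores; low totals indicate high isolation (reverse coding)."""
    if family_status_score not in (1, 2, 3, 4):
        raise ValueError("family status score must be between 1 and 4")
    contacts = SOCIAL_CONTACT_SCORES[social_contacts_answer.strip().lower()]
    friends = 1 if has_close_friends else 0  # assumed 0/1 coding of the binary item
    return family_status_score + contacts + friends


print(social_isolation_index(4, "Every day", True))   # least isolated profile
print(social_isolation_index(1, "Never", False))      # most isolated profile
```

Under these assumptions the index ranges from 2 (most isolated) to 9 (least isolated).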
The relationship between social isolation and subjective health is documented in Table 2. The first column shows percentages for the whole sample, while the remaining ones show them at different quartiles of social isolation. Looking at the last four columns, we can immediately see that health increases for individuals who experience higher levels of social inclusion.
As outlined above, the main issue with the estimation strategy is endogeneity. According to the literature [17], individuals belonging to racial and ethnic minorities have less social support and feel that they do not fit properly into society.
The link between these variables is further confirmed in the literature [18]. The first instrument, ethnic minority, is a dummy indicating whether the respondent belongs to a minority ethnic group in his or her country or region. The second one, family conflict, is a categorical variable which we recode as a dummy, taking a value of 1 when the respondent answers "always" or "often" and 0 otherwise.
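The recoding of the two instruments can be sketched as follows. The function names and the handling of missing answers are assumptions; only the "always"/"often" rule and the minority dummy come from the text.

```python
CONFLICT_ANSWERS_CODED_ONE = {"always", "often"}


def family_conflict_dummy(answer):
    """Return 1 if the respondent answered 'always' or 'often', 0 for any other answer."""
    if answer is None:
        return None  # assumption: missing answers stay missing
    return 1 if answer.strip().lower() in CONFLICT_ANSWERS_CODED_ONE else 0


def ethnic_minority_dummy(belongs_to_minority):
    """Return 1 if the respondent reports belonging to a minority ethnic group, else 0."""
    if belongs_to_minority is None:
        return None
    return 1 if belongs_to_minority else 0


print(family_conflict_dummy("Often"), ethnic_minority_dummy(False))
```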

Exogenous Independent Variables
We use as controls some variables which affect health and social contacts. First, we include basic demographic variables: age classes and gender. Second, we include other socioeconomic controls: paid work status, dummies for urban area of residence (big city, suburbs, village) and level of education attained by respondents and also by their parents. Finally, we use country dummies to rule out any country-level effect.
Age: Older people are generally less healthy than younger people [23]. We decided to use a set of dummies for age classes, rather than a continuous variable, because the relationship between age and isolation (or health) may be nonlinear.
Gender: According to past studies [24], the inclusion of gender as a health determinant is crucial in order to control for sex-related social norms and structures that influence vulnerabilities to illness, health status, access to preventive and curative measures and quality of care.
Occupational status: Work status may be associated with anxiety and other health-affecting symptoms [25]. In order to control for these effects, we included in our regression a set of dummies representing the following categories: employed, students, unemployed, retired and other. The latter category includes individuals such as houseworkers and those in community or military service.
Religiosity: Scientific research neglected the connection between religiosity and health until recently; however, during the last few years the interest in the relationship between these two variables has started to grow [26,27]. Given the prevalence and importance of religiosity among the population, it is reasonable to consider the impact that religious beliefs, practices and traditions may have on physical and mental health outcomes. Although a large proportion of published work suggests a positive association between religion and health outcomes [28], some studies show a negative relationship [29]. Considering the contradictory positions in the existing literature, further research is needed to investigate this relationship.
Domicile: The area of residence may significantly impact health outcomes by influencing factors such as healthcare access and behaviors [30][31][32]. However, the impact of area characteristics on health is still not completely clear, because the existing studies show contradictory findings [23]. By including a set of dummies representing residents of big cities, small cities, suburbs, country villages and the countryside, we try to capture these effects.
Education: In the literature, it is well documented that the well-educated have better health than the poorly educated, as indicated by high levels of self-reported health [33][34][35], with a directional path that goes from education to health and not vice versa [36][37][38].
Country dummies: Finally, we included a dummy for each country in order to take into account the residual unobserved heterogeneity at country level, such as country-specific institutional effects. The related coefficients are not displayed in the regression table because they are merely used as controls.
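The sets of dummies described above can be built as in the following sketch, shown here for occupational status. The choice of "employed" as the omitted reference category and the `is_` naming are assumptions; the text does not say which category was omitted.

```python
OCCUPATION_CATEGORIES = ["employed", "student", "unemployed", "retired", "other"]


def one_hot(value, categories, reference):
    """Return a dict of 0/1 dummies for every category except the reference one."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"is_{c}": int(value == c) for c in categories if c != reference}


print(one_hot("retired", OCCUPATION_CATEGORIES, reference="employed"))
```

Omitting one category avoids perfect collinearity with the constant term; a respondent in the reference category has all dummies equal to 0.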

Results and Discussion
We now turn to the results of our analysis. Table 3 displays the results of the first-stage regressions using the social isolation index as the endogenous variable. Notice that, due to missing values in the control variables, the total sample is reduced to 31,726 observations. The results are in line with expectations: individuals who belong to an ethnic minority and/or who grew up in a household with serious conflicts between family members are more likely to be socially isolated. Ethnic minority and family conflict are strong determinants of social isolation, since the coefficients of the two instrumental variables are statistically significant at the 0.1% level. Most of the socioeconomic controls have significant coefficients, namely age class, gender, religion, area of residence and education. The only exception is the education level of the father, although this is compensated for by the results regarding the education of the mother. All the variables used in the regression, i.e. the instruments, the demographic and socioeconomic controls and the country dummies, are jointly significant with an F-statistic greater than 10.
Thus, they are highly correlated with our endogenous variable. This is a good argument in favor of the effectiveness of our IV-probit regression [39], supporting the relevance of the selected instruments.
The last column of Table 3 reports the results of the second-stage regression, where health status is the dependent variable and the index of social isolation is used as an approximation of a continuous endogenous variable. The coefficient of social isolation is positive and strongly significant, meaning that health benefits from a higher level of social integration. This effect is robust to the inclusion of all our controls and country dummies. This basic model shows evidence that social isolation has adverse effects on health as reported in the ESS.
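How a positive probit coefficient on the (reverse-coded) index translates into a higher probability of reporting good health can be sketched as follows. The coefficient values used here are purely hypothetical and are not the estimates of Table 3; controls are omitted for brevity.

```python
import math


def std_normal_cdf(v):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def predicted_good_health(index, beta0, beta1):
    """Probit probability that SGH equals 1 for a given value of the isolation index."""
    return std_normal_cdf(beta0 + beta1 * index)


# Hypothetical coefficients, chosen only to illustrate the direction of the effect.
BETA0, BETA1 = -1.0, 0.2
low = predicted_good_health(2, BETA0, BETA1)   # most isolated value of the index
high = predicted_good_health(9, BETA0, BETA1)  # least isolated value of the index
print(f"P(SGH=1) at index 2: {low:.3f}; at index 9: {high:.3f}")
```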
In order to further support the choice of the IV-probit method with respect to the standard probit regression and to show the validity of our instruments, the Amemiya-Lee-Newey test and the Wald test of exogeneity were run. The results, reported in Table 4, are in favor of the IV-probit method. The Amemiya-Lee-Newey statistic can be calculated for cases in which the model is "overidentified," that is, when the number of instruments exceeds the number of endogenous variables. This test is the counterpart of the Sargan test for standard IV regressions and is implemented in order to judge whether the instruments are valid (i.e. they are exogenous and they affect the dependent variable only indirectly) or not. The results do not reject the null hypothesis that the variables we used to instrument the index of social isolation, i.e. ethnic minority and family conflict, are exogenous. The Wald test is instead an indirect test for the endogeneity of the instrumented variable. Results were against the null hypothesis of exogeneity; thus, the error terms of the first and second-stage regressions are correlated. This suggests that the IV-probit estimator yields consistent estimates of the impact of social isolation on subjective general health, where the standard probit would not.
Our analysis has some objective shortcomings. As explained in the previous sections, there are endogeneity issues which still remain in our model, even after instrumenting.
First, the validity of our instruments must be tested since the exclusion restriction might not be satisfied. In fact, belonging to an ethnic minority might be correlated with health status, even if the direction of the correlation is ambiguous. A stream of literature in public economics focuses on overutilization and underutilization of welfare services, including healthcare, by immigrants and ethnic minorities. A possible explanation for underutilization is that ethnic minorities and immigrants face barriers to access healthcare services due to a lack of information. This could result in worse health conditions for those belonging to an ethnic minority.
Nevertheless, immigrants tend to be younger and, on average, healthier; hence, the health effect of being part of an ethnic minority is ambiguous. However, by controlling for age classes, this problem should be overcome. Our second instrument, namely, family conflict, is a lagged variable because it investigates possible tensions between family members in the past; thus, it is a good candidate for being an exogenous instrument. The validity of both our instruments is supported by the result of the Amemiya-Lee-Newey overidentification test, which does not reject the exogeneity of the instruments used in our model, so the model seems to be specified correctly. However, this test is rather weak for multiple instrument models, since it is based on the crucial hypothesis that at least one of the instruments is valid, which is not verifiable.
A second issue is that ethnic minority and family conflict are not the only determinants of social isolation. In fact, the coefficients from the first-stage regressions are quite small (Table 3), even if jointly strongly significant. The F-statistic on excluded instruments in the first stage is greater than 10, so following the rule of thumb by Staiger and Stock [40], we do not worry about weak instrument problems, even if this does not guarantee that the coefficients are unbiased.
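The F-statistic on excluded instruments compares the first-stage fit with and without the instruments. The sketch below computes it from residual sums of squares and applies the threshold of 10; the function names and the numerical inputs are made up for illustration and do not come from Table 3.

```python
def excluded_instruments_f(rss_restricted, rss_unrestricted, n_instruments, n_obs, n_params):
    """F-statistic for the joint significance of the instruments in the first stage.

    rss_restricted: residual sum of squares with the instruments left out
    rss_unrestricted: residual sum of squares with the instruments included
    n_params: number of coefficients in the unrestricted first stage (constant included)
    """
    if rss_unrestricted <= 0 or n_obs <= n_params or n_instruments <= 0:
        raise ValueError("invalid inputs for the F-statistic")
    numerator = (rss_restricted - rss_unrestricted) / n_instruments
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator


def passes_rule_of_thumb(f_stat, threshold=10.0):
    """Rule of thumb: instruments are not considered weak if F exceeds the threshold."""
    return f_stat > threshold


# Made-up numbers: two instruments, 1003 observations, three first-stage coefficients.
f_stat = excluded_instruments_f(1200.0, 1000.0, n_instruments=2, n_obs=1003, n_params=3)
print(f_stat, passes_rule_of_thumb(f_stat))
```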
Nevertheless, our objective is to test a relationship which is outlined in the literature. In this respect, it is important to focus on the qualitative conclusion: there is strong evidence that isolation has a negative effect on health. Our findings are consistent with those reported in the medical literature.
Our analysis provides additional evidence to support the hypothesis that social isolation deteriorates people's health. Although our results are coherent overall with the existing literature, they are not fully comparable. The first difference is the measure for social isolation: most studies use the UCLA loneliness scale, while, as explained above, we used the social isolation index.
Secondly, while many papers focus on specific medical conditions, often related to the circulatory system, such as cardiovascular diseases, blood pressure, myocardial infarction etc. [41][42][43], we resort to self-reported health status at a single moment in time. A similar measure was, however, employed in the literature [44], together with other measures: using a cross-lagged model, studies found that loneliness has a modest negative impact on self-rated health over two years. As we have mentioned, researchers who focus on health behaviors obtain contrasting figures.

Conclusions
In this paper, we investigate the relationship between social isolation and health.
In particular, we test the hypothesis that higher isolation worsens the health status of individuals. However, since this relationship is ambiguous and affected by endogeneity, a simple correlation between these variables does not prove our hypothesis. For this reason, in order to investigate the causal relationship, we implemented an instrumental variable approach, in which the instruments are the variables "belonging to an ethnic minority" and "family conflict", both of which are highly correlated with social isolation.
Using different data from a cross-sectional survey and a different methodology from those employed before, we find results in line with past research in medicine and psychology but extended to self-reported health for a large, representative sample of European citizens (not necessarily belonging to the older cohorts). High levels of social isolation, defined as a lack of personal contacts with peers and an absence of profound relationships, are found to favor a decline in the perceived health of individuals. The validity of the model providing these results is supported by the implementation of statistical tests of validity and the relevance of the instruments.
All in all, our results show that a lack of social connections impacts the health status of individuals. This is an important result for policymakers, as exploring the channels through which social status affects people's health may help in designing preventative interventions. The relationship between isolation and health becomes even more important if one considers the peculiarities of modern European societies. On the one hand, an aging population is likely to increase the already high healthcare expenditures. On the other hand, family size is decreasing with plausible consequences on social ties and feelings of loneliness. Hence, addressing the issues of social isolation and loneliness might be an effective strategy to improve the population's health, which in turn will benefit governments. Finally, we cannot ignore the role of the internet and social media, the effects of which on social relationships and loneliness are still not completely clear.