Socioeconomic inequalities in COVID-19 confirmed infection across three waves. A multilevel approach in a southern European region

Background: The aim of our study was to analyze the effect of socioeconomic inequalities, both at the individual and area of residence levels, on the probability of COVID-19 confirmed infection, and its variations across three pandemic waves. Methods: Retrospective cohort study. We included data from all individuals tested for COVID-19 during the three waves of the pandemic, from March to December 2020 (357,989 individuals). We studied the effect of inequalities on the risk of having a COVID-19 confirmed diagnosis after being tested using multilevel analyses with two levels of aggregation: individuals and basic healthcare area (BHA) of residence (deprivation level and type of zone). Results: The patient profile changed through the pandemic, with a predominance of low-paid employees living in deprived BHA. Workers with low salaries, the unemployed and people on minimum integration income or who no longer receive the unemployment allowance had a higher probability of COVID-19 infection than workers with salaries ≥€18,000 per year. Inequalities were higher in women and in the second wave. The deprivation level of the BHA of residence influenced the risk of COVID-19 infection, especially in the second wave. Conclusions: There are inequalities in the risk of COVID-19 confirmed infection, both at individual and area level. It is necessary to develop coordinated individual- and area-level measures in the control, diagnosis and treatment of the epidemic, in order to avoid an increase in the already existing inequalities.


Introduction
The 2019 coronavirus disease (COVID-19) outbreak in China has triggered an unprecedented global public health crisis [1]. According to the World Health Organization (WHO) COVID-19 Dashboard [2], in March 2021 there were more than 120 million confirmed cases and more than 2.6 million deaths worldwide. Spain has been one of the European countries most affected by the COVID-19 pandemic. At the time of writing this article, Spain has more than 3 million confirmed cases and a 14-day incidence rate of more than 140 cases per 100,000 inhabitants, with a lethality of 2.3% [3]. outcomes in populations and areas with lower socioeconomic levels [5,6]. Regarding individual socioeconomic characteristics, several authors have pointed out a socioeconomic gradient in COVID-19 outbreaks due to differences in knowledge and practices towards COVID-19 [7,8]. In relation to job type, low-paid workers are more likely to be designated as key workers, with the consequent increased risk of exposure [9]. Other individual factors that can explain these differences have also been described, such as living below the poverty line or the lack of health insurance [10,11]. But socioeconomic differences do not only play a fundamental role at the individual level. Other levels of aggregation, such as area of residence, are key to understanding the existence of inequalities. In Spain, COVID-19 studies conducted at the level of area of residence [12,13] showed that COVID-19 incidence was higher in the most deprived urban areas. In this sense, it has been described that living in disadvantaged environments is related to the existence of chronic stressors that, over time, damage the health of their inhabitants [9]. Living in a deprived area is also associated with poorer access to health care, even in universal healthcare systems [14], dependence on public transport or living in small places shared with other people, where the adoption of appropriate quarantine measures is not possible [15].
Significant variations have been observed in the evolution of the pandemic in Spain. Administratively, Spain is organized into 17 Autonomous Communities (ACs) with independent healthcare management. Aragón is a northeastern AC of 1.3 million inhabitants. The COVID-19 pandemic has had a strong impact on this population, with more than 110,000 confirmed cases at the moment of writing this article [16]. Moreover, Aragón has shown certain differences in the COVID-19 pandemic with respect to the rest of the ACs in Spain. The main difference is that, unlike the rest of the Spanish ACs, which have had three waves, Aragón has registered four waves at the time of writing this article, with varying social and healthcare impact, and changes in the profile of affected individuals [16]. This has been explained by the fact that the second wave in Aragón started earlier than in the rest of the country, being related to the arrival of seasonal fruit pickers in certain farming areas, as well as to the presence of urban neighborhoods where the most disadvantaged population, which is linked to these seasonal workers, is concentrated. The analysis of these variations is crucial to understanding the evolution of COVID-19 spread and the effect of the measures adopted.
Understanding the impact of social inequalities on the risk of COVID-19 infection is therefore essential when designing strategies to reduce COVID-19 incidence, in order to mitigate the social consequences of the pandemic. To this end, the objective of our study is to analyze the effect of socioeconomic inequalities, both at the individual and area of residence levels, on the risk of COVID-19 confirmed infection in a southern European region, and its variations throughout the three waves of the pandemic.

Design, information sources and study population
We conducted a retrospective cohort study using data from the Aragón-COVID19 cohort. This is a health data collection of all individuals tested for COVID-19 in the Spanish region of Aragón. The Aragón-COVID19 cohort includes information gathered from administrative health data sources as well as electronic health records of the Aragón health service. All individuals in the cohort were included from March 9, 2020, the first epidemiological week with COVID-19 cases reported in Aragón, to December 13, 2020, the latest data available at the moment of writing this paper (357,989 individuals). All cases of COVID-19 were confirmed using polymerase chain reaction (PCR) or COVID antigen testing.
The research protocol of this study was approved by The Clinical Research Ethics Committee of Aragón (CEICA) (PI20/184).

Variables of the study
We analyzed sociodemographic and clinical information of all the individuals in the cohort. Regarding sociodemographic characteristics, we considered sex, age (under 15, 15-44, 45-64, 65-79 and 80 years or older), and socioeconomic level. Socioeconomic level was calculated on the basis of pharmacy copayment levels and Social Security benefits received, according to the type of user of the Aragón health service. From the combination of these two variables, eight mutually exclusive categories were obtained: employed individuals earning less than €18,000 per year, employed individuals earning €18,000 per year or more, individuals receiving the unemployment allowance, individuals with a contributory pension of less than €18,000 per year, individuals with a contributory pension of €18,000 per year or more, individuals affiliated to the mutual insurance system for civil servants, individuals receiving free medicines (people with minimum integration income or who no longer receive the unemployment allowance), and other situations not previously considered. The clinical information included was obtained from the morbidity adjusted groups (GMA) [17]. This source of information considers all medical diagnoses available from Primary Healthcare and hospital discharge records (CMBD). We considered GMA information from January 2020 in order to know the status of the cohort individuals prior to the COVID-19 diagnosis. The three variables analyzed from GMA were weight complexity (obtained from the aggregation of the patient's different diagnoses), the presence of chronic morbidities and the presence of respiratory illnesses.
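As an illustration of the age categorization described above, the following minimal Python sketch maps an age in completed years to the five bands used in the study. The function name and the band labels are hypothetical; only the cut points come from the text.

```python
def age_group(age_years: int) -> str:
    """Map an age in completed years to one of the five study age bands.

    The labels are illustrative; the cut points (15, 45, 65, 80) follow
    the categories listed in the text.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 15:
        return "<15"
    if age_years <= 44:
        return "15-44"
    if age_years <= 64:
        return "45-64"
    if age_years <= 79:
        return "65-79"
    return ">=80"
```

For example, `age_group(44)` falls in the "15-44" band and `age_group(80)` in the oldest band.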
We also considered two variables by Basic Healthcare Area (BHA) of residence. The first variable was the deprivation index of the BHA categorized into four quartiles, from least (Q1) to most (Q4) deprived. This deprivation index combines information from four indicators of the Population and Housing Census 2011 (last available): % of unemployment, % of temporary workers, % of people between 16 and 64 years with low educational level and % of immigrants [18]. The other variable obtained by BHA was the classification of the zone into rural or urban, according to the Aragón Government [19]. Urban areas are those that concentrate at least 80% of the BHA population in their municipalities, and rural areas are those that do not meet this criterion.
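The quartile categorization of the deprivation index can be sketched as follows. This is only an illustration under stated assumptions: the text does not give the cut points or the estimation method actually used, so the sketch derives them from the data with Python's `statistics.quantiles`, assumes that a higher index means more deprivation, and uses hypothetical BHA identifiers.

```python
from statistics import quantiles


def deprivation_quartiles(index_by_bha: dict[str, float]) -> dict[str, str]:
    """Label each BHA as Q1 (least deprived) to Q4 (most deprived).

    Assumption: a higher index value means a more deprived area, and the
    quartile cut points are estimated from the BHAs supplied.
    """
    # Three cut points that split the index values into four groups.
    q1_cut, q2_cut, q3_cut = quantiles(index_by_bha.values(), n=4)
    labels = {}
    for bha, value in index_by_bha.items():
        if value <= q1_cut:
            labels[bha] = "Q1"
        elif value <= q2_cut:
            labels[bha] = "Q2"
        elif value <= q3_cut:
            labels[bha] = "Q3"
        else:
            labels[bha] = "Q4"
    return labels
```

With eight hypothetical areas whose index values are 1 to 8, the two lowest values are labelled Q1 and the two highest Q4.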

Statistical analysis
Analyses were performed both globally and separately for the three pandemic waves registered in Aragón until December 2020: from March 9 to June 21; from June 22 to October 11; and from October 12 to December 13. All analyses were stratified by sex.
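The wave definition above can be expressed as a small lookup. The sketch below is illustrative: the function and constant names are hypothetical, and it assumes that each individual is assigned to a wave by a single reference date (for instance the test date), which the text does not specify.

```python
from datetime import date
from typing import Optional

# Wave boundaries as stated in the text (both ends inclusive).
WAVES = (
    (1, date(2020, 3, 9), date(2020, 6, 21)),
    (2, date(2020, 6, 22), date(2020, 10, 11)),
    (3, date(2020, 10, 12), date(2020, 12, 13)),
)


def assign_wave(reference_date: date) -> Optional[int]:
    """Return the wave number (1-3) for a date, or None outside the study period."""
    for wave, start, end in WAVES:
        if start <= reference_date <= end:
            return wave
    return None
```

Dates outside the study period return `None`, which makes records that fall outside the cohort window easy to spot.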
We described sociodemographic and clinical characteristics of all individuals included in the cohort, globally and according to COVID-19 confirmed diagnosis. Sociodemographic and morbidity differences by wave in individuals with a laboratory-confirmed COVID-19 infection were described. Categorical variables were described by percentages. Weight complexity had a non-normal distribution, so median and interquartile range were used to describe this variable. Statistical differences between waves were assessed using Chi-square and Mann-Whitney tests.
In order to study the effect of inequalities on the risk of having a diagnosis of COVID-19, multilevel analyses stratified by sex were developed. Analyses were conducted for the entire period analyzed and by pandemic wave. Two levels of aggregation were considered: individuals and BHA. Each individual included in the study has his/her own characteristics in terms of age, socioeconomic status and previous morbidities, but they also belong to a particular BHA, each with different characteristics in terms of deprivation index and type (rural or urban). When data are grouped together, there is an intra-class correlation, meaning that there are observations that are more similar to others in the same group than to those in other groups. When adjusting the multilevel model using random intercepts, part of the variability in the response variable is divided into each "level" (deprivation index and BHA type, respectively) and variance partition coefficients can be calculated to see how much of the variance of the response belongs to each level.
Individuals could simultaneously belong to more than one group of a given hierarchical level. Thus, an individual belongs at the same time to a BHA with a given deprivation index and to a rural or urban BHA. This leads to a cross-classified structure. In this case, we classified COVID cases by their BHA deprivation index (quartiles) and type of zone (urban or rural), so that both are considered to be random. Crossed random effects are used when each category of one factor co-exists with each category of the other factor (there is at least one observation for every combination of categories of the two factors). In this model, the set of explanatory variables X includes K regressors. Individual sociodemographic characteristics (age and socioeconomic level) and morbidity were considered as explanatory variables. The parameter β represents the fixed effects. This model has three assumptions: first, the random effects u are normally distributed with mean 0 and variance σ²_u, which captures the differences in the outcome (COVID-19 confirmed infection) attributable to the grouping factor (deprivation quartile or type of zone); second, the error component (ε) is also normally distributed with mean 0 and variance σ²_ε; third, the random effects u and the error component (ε) are independent, and the errors (ε) are all independent of each other. Interactions between variables were systematically investigated and collinearity was considered. Finally, the likelihood ratio test (LR test) was used to evaluate the final model. The significance of the fixed effects was also evaluated with the Wald test.
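The model equation itself is not reproduced in the text, so the following is only a sketch of a cross-classified random-intercept model consistent with the description above. The notation is assumed: i indexes individuals, q the deprivation quartile of the BHA, z the type of zone, and u_q and v_z are the two crossed random intercepts. The logistic link is also an assumption, made because the results are reported as odds ratios; in the linear formulation described in the text, the outcome would instead equal the linear predictor plus the error term ε.

```latex
% Sketch of a cross-classified random-intercept model (notation assumed).
% i: individual, q: deprivation quartile of the BHA, z: type of zone.
\begin{align}
  \eta_{i(q,z)} &= \beta_0 + \sum_{k=1}^{K} \beta_k x_{k,i} + u_q + v_z, \\
  u_q &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_z \sim \mathcal{N}(0, \sigma_v^2), \\
  \Pr\bigl(y_{i(q,z)} = 1\bigr) &= \frac{\exp(\eta_{i(q,z)})}{1 + \exp(\eta_{i(q,z)})}.
\end{align}
```

Under this sketch, the between-group variances of the two random intercepts correspond to the τ00 values reported for deprivation quartile and zone of residence.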
All analyses were performed using R Statistical Software, version 4.0.4 (the R Foundation for Statistical Computing, Vienna, Austria). Data were analyzed using a linear mixed-effects regression based on the lme4 package [20].

Results
Data from 357,989 individuals included in the Aragón-COVID19 cohort were analyzed. Of these individuals, 74,039 (20.7%) had a COVID-19 confirmed infection. Women accounted for 53.4% of the studied population, with a COVID-19 positivity of 20.49%; in men, positivity was 20.90%. Sociodemographic and morbidity descriptions of all individuals studied are available in Tables 1 and 2. There were statistical differences between people with no COVID-19 diagnosis and COVID-19 confirmed cases for age, socioeconomic level, deprivation quartile and hospitalization for both men and women. In the case of women, those without a confirmed COVID-19 diagnosis also presented a higher prevalence of respiratory illnesses. In men, differences were observed for all clinical variables considered.
When analyzing COVID-19 positivity rates for the entire period considered (Table 3) by the sociodemographic and morbidity characteristics analyzed, we observed that the rates were similar between men and women. In women, the age groups with the lowest and highest positivity rates were, respectively, the youngest and the oldest group. In men, the lowest positivity rate was observed in those <15 years old, while the highest positivity rate was found in people from 45 to 64 years old (23.84%).
Regarding socioeconomic status, the highest positivity rates were observed in women with free medicines (22.22%) and in men in the "other" category (22.33%). For both sexes, positivity rates were slightly higher in the most deprived quartile and similar in the rural and urban context. In terms of clinical characteristics, the highest positivity rates were observed in those people with a hospitalization. for wave 2. Weight complexity, presence of chronic morbidities and respiratory illnesses were significantly higher (p<0.001) in wave 1 than in the other two waves. We observed similar results in men and in women (Table 5) for individual socioeconomic level, BHA deprivation, type of BHA (rural or urban) and previous morbidities. In the multilevel analysis in women, those with free medicines (people with minimum integration income or who no longer receive the unemployment allowance), workers with low salaries and the unemployed presented a higher risk of COVID-19 infection than workers with salaries ≥€18,000 per year. Finally, women with previous chronic morbidities showed a lower risk of COVID-19 infection than those with no morbidities, after adjusting for the rest of the variables of the model (OR: 0.8; 95%CI 0.8-0.9 for the whole period analyzed).
The highest value of the between-group variance (τ00) was observed in wave 2 for the deprivation quartile (0.0209). This result shows that about 2.09% of the residual variance of the dependent variable (COVID infection) is attributable to differences between deprivation quartiles, after controlling for the explanatory variables. There were differences in the risk of COVID-19 infection depending on the BHA of residence, and especially according to deprivation quartile. This effect was greatest in wave 2, with a median OR (MOR) of 1.15 for BHA deprivation and 1.11 for zone of residence. In all cases, the models with varying intercepts among crossed random effects fit the data significantly better than other models. In both men (Table 7) and women, the probability of COVID-19 infection increased with age. The highest risk of COVID-19 infection was observed in the elderly in wave 1 (OR: 10.9; 95%CI 7.4-15.9) in relation to the youngest group (<15 years old). In men, socioeconomic inequalities in the risk of COVID-19 infection were also observed, especially in wave 2, but these differences by socioeconomic level were smaller than in women. Thus, in wave 2, workers with low salaries had a higher risk of COVID-19 infection than workers with higher salaries (OR: 1.2; 95%CI 1.1-1.2). This result was also observed in men with free medicines (OR: 1.2; 95%CI 1.1-1.3). Regarding the existence of previous morbidities, a lower probability of COVID-19 infection was observed in those with chronic morbidities than in those with no morbidities, after adjusting for the rest of the variables in the model (OR: 0.9; 95%CI 0.8-0.9 for the entire period). As observed in women, the highest value of τ00 was observed in wave 2 for the deprivation quartile (0.0255), whereas the highest value of τ00 for zone of residence was obtained in wave 1 (0.0071). MOR showed its highest effect in wave 2, with values similar to those in women for both deprivation quartile and zone of residence.
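The relationship between τ00 and the MOR can be checked numerically. The sketch below assumes the commonly used formula MOR = exp(sqrt(2·τ00)·Φ⁻¹(0.75)), where Φ⁻¹(0.75) ≈ 0.6745 is the 75th percentile of the standard normal distribution. The text does not state which formula was applied, so this is an assumption, although it reproduces the reported MOR of 1.15 from τ00 = 0.0209. The function name is hypothetical.

```python
import math
from statistics import NormalDist

# 75th percentile of the standard normal distribution (about 0.6745).
Z_75 = NormalDist().inv_cdf(0.75)


def median_odds_ratio(tau00: float) -> float:
    """Median odds ratio for a random-intercept variance tau00 (logit scale)."""
    if tau00 < 0:
        raise ValueError("a variance cannot be negative")
    return math.exp(math.sqrt(2.0 * tau00) * Z_75)


# tau00 reported for deprivation quartile in women, wave 2.
print(round(median_odds_ratio(0.0209), 2))  # prints 1.15
```

A variance of zero gives a MOR of exactly 1, meaning no difference in the odds between clusters.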

Discussion
The objective of this study was to explore the existence of individual and area inequalities in the risk of COVID-19 confirmed infection, and its variations across three pandemic waves (from March to December 2020). As we have observed, Aragón has been severely affected by the COVID-19 pandemic, with high incidence rates for all age groups, especially the young active population and the elderly. COVID-19 incidence rates were higher in women than in men. Different profiles of patients with confirmed COVID-19 diagnosis have been observed among all those tested in the three waves analyzed, with the most striking changes between wave 1 and waves 2 and 3. In wave 1, the highest frequency of confirmed cases was observed in low-income pensioners, with a high prevalence of chronic morbidities and living in BHA with a low deprivation index. On the contrary, in waves 2 and 3 there was a predominance of employees with low salaries and people living in deprived BHA. This profile was similar for both sexes. Regarding multilevel analyses, there were inequalities in the risk of COVID-19 infection according to individual socioeconomic status. Taking workers with salaries ≥€18,000 per year as reference, workers with lower salaries, the unemployed and people with minimum integration income or who no longer receive the unemployment allowance had a higher risk of COVID-19 infection. These inequalities were greater in women and in wave 2. The deprivation level of the BHA of residence influenced the risk of COVID-19 infection, especially in wave 2.
When analyzing the evolution of the pandemic in Aragón, the large difference in incidence rates between the first wave and the other two is striking [16]; it is probably related to test availability and the lack of clear diagnostic protocols at the beginning of the pandemic. As stated by Marí et al. [13], the first Spanish wave was based on hospitalized cases. This is the reason why it affected mainly the elderly and people with chronic conditions. This fact would also explain the differences between wave 1 and waves 2 and 3 in terms of inequalities, as testing accessibility improved over the course of the pandemic, revealing inequalities that had been hidden at the beginning [21]. Finally, the low risk of COVID-19 infection observed in people with chronic morbidities, throughout the waves and for both sexes, could also be related to a higher probability of testing in this patient profile.
In terms of inequalities at the individual level, employees with low salaries presented the highest risk of COVID-19 confirmed diagnosis, especially in wave 2. The second wave in Aragón started with a series of outbreaks among seasonal workers. Seasonal agricultural workers in Spain are mainly migrants, with temporary, low-paid jobs and very poor health and hygiene conditions. These characteristics make them an especially vulnerable group and, although special COVID protocols were implemented, they were clearly insufficient [22]. Also, employment status has been considered especially problematic in the COVID-19 pandemic, due to its relationship with class inequalities in income, employment conditions and security [23]. The lockdown and the general recommendation of "working from home" have exacerbated the differences between those people who can work remotely and those who cannot [24]. This is related to the fact that low-paid workers are less likely to be in jobs where it is possible to work from home [25], and therefore have a higher risk of COVID-19 infection. Finally, individuals belonging to low socioeconomic groups are more likely to have unstable working conditions and income. In Spain, the effect of COVID-19 on employment rates has been huge. According to the Economically Active Population Survey [26], the number of workers in Spain decreased by more than 622,000 people during 2020. This financial uncertainty has been linked to worse mental health conditions and high stress levels, with a high likelihood of health risk behaviors [27]. So, as some authors have pointed out [28], poverty not only increases exposure to the virus, but also reduces immunity, which can translate into a higher risk of COVID-19 infection.
When inequalities by BHA of residence were evaluated, we observed that the deprivation level of the BHA influenced the risk of COVID-19 infection, with statistical differences between the least deprived and the most deprived BHA. The association between deprived areas and high incidence rates of COVID-19 has already been described by other authors [12,13,15]. People living in deprived areas are more likely to live in poor conditions, which involve overcrowded accommodation and limited access to outdoor space [21]. BHA inequities could be associated with differential exposure to the virus and differential susceptibility to infection [21]. Finally, BHA type (rural or urban) did not play a significant role in the risk of COVID-19 infection in Aragón.
The main strength of this study lies in the fact that we analyzed all the individuals tested for COVID-19 in a population of 1.3 million people, including data from administrative health data sources and electronic health records. In addition, we used the combination of two different variables (information used to calculate pharmacy copayment levels and the type of user of the Aragón health service) to categorize the socioeconomic level of the individuals. This provides a better approximation to the real socioeconomic position of the individual.
Also, multilevel regression models allowed us to explore the impact of inequities on COVID-19 infection at different levels. Nonetheless, some aspects must be taken into consideration. First, two of the models presented (those corresponding to women for the overall period analyzed and to men in wave 3) were "singular". Despite this, we presented them in order to maintain comparability across models and waves, but their results must be interpreted cautiously. Second, the values of the intraclass correlation coefficient (ICC) obtained were low, but similar to those of other health studies. We have computed other measures, such as the MOR, which is considered an epidemiologically more suitable option for obtaining measures of variance in logistic regression, as it is not statistically dependent on the prevalence of the outcome and permits expression of the area-level variance on the well-known OR scale. Therefore, it permits comparison of the magnitude of area-level variations with the impact of specific factors [29]. The MOR quantifies the variation between clusters (the second-level variation) by comparing two people from two different, randomly chosen clusters. In our study, the MOR quantifies differences (i.e. the second-level variance σ²_u) between deprivation quartiles and zones of residence by comparing two individuals with the same covariates but from two different, randomly chosen deprivation quartiles or zones of residence. It is well known that individuals within a specific context may be more similar to each other than to individuals from a different context. Therefore, the interpretation of variance in multilevel analysis is pertinent to obtain information about a possible general effect of the context on individual outcomes [30].
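To illustrate why ICC values tend to be small when the between-group variance is small, the sketch below uses the latent-variable approximation for logistic models, in which the individual-level variance is fixed at π²/3. The text does not say how the ICC was computed, so both the method and the function name are assumptions; the optional second argument is a hypothetical way of accounting for the variance of the other crossed random effect.

```python
import math

# Individual-level variance of the standard logistic distribution.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def latent_icc(tau00: float, other_group_variance: float = 0.0) -> float:
    """Share of latent-scale variance attributable to one grouping factor.

    tau00: between-group variance of the factor of interest.
    other_group_variance: variance of any other random effect in the model
    (0 when there is a single grouping factor).
    """
    if tau00 < 0 or other_group_variance < 0:
        raise ValueError("variances cannot be negative")
    total = tau00 + other_group_variance + LOGISTIC_RESIDUAL_VARIANCE
    return tau00 / total
```

Under this approximation, a between-group variance of a few hundredths yields an ICC below 1%, which is why a measure on the OR scale such as the MOR is often easier to interpret alongside the fixed effects.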

Conclusions
Our study shows the existence of inequalities in the risk of COVID-19 confirmed infection, both at individual and area level. As Marmot et al. [25] have stated, the COVID-19 pandemic exposes and amplifies the existing inequalities in society. This requires the implementation of coordinated measures in the control, diagnosis and treatment of the epidemic, in order to avoid an increase in inequalities. In this sense, at the individual level, ensuring safe employment conditions and financial protection during the pandemic is crucial [31,32]. Also, regarding measures at the level of area of residence, disease control efforts should be more intensive in those areas where the most vulnerable population lives [12], and adequate accessibility to diagnosis and treatment should be guaranteed. Finally, we must not overlook the fact that a post-COVID scenario will probably lead to a new global economic crisis, especially if austerity measures are implemented again [9,33]. It is crucial, therefore, to learn from the mistakes of the past and promote a change of scenario in which increased social services for the whole population become a reality.

Institutional Review Board Statement: This study was approved by The Clinical Research Ethics Committee of Aragón (CEICA) (PI20/184).

Informed Consent Statement: Not applicable.
Data Availability Statement: Aragón-COVID19 data are available upon request to IACS.