1. Introduction
A study by van Doremalen et al. [1] demonstrated that SARS-CoV-2 can remain viable and infectious in aerosols for several hours and on certain surfaces for up to days. Building on this finding, other researchers have hypothesized that the spread of COVID-19 may interact with air pollution. Groulx et al. [2] confirm that microbial agents of communicable diseases, such as viruses, interact with air pollution in ways that affect public health. A study conducted in Poland found a significant association between particulate matter and the number of new COVID-19 infections [3]. Similar studies across Europe suggest that short-term exposure to particulate matter (PM) is related to the spread of SARS-CoV-2, with PM levels in England and Italy specifically implicated [4,5]. In the Middle East, a study of Baghdad and Kuwait found that PM2.5 levels were positively related to deaths caused by COVID-19, with a decrease in particulate matter leading to a significant decrease in the death rate. In Kuwait, deaths fell by 38.4% during the travel ban period, while PM2.5 levels fell by 22.3% on average. The same study found that the number of deaths was positively related to air temperature and negatively related to humidity [6].
Thus, some studies have found a relationship between PM and COVID-19 [7,8,9], while others have found no significant association between the two [10]. Some studies have merely reported a correlation between PM and the daily number of confirmed cases without providing a p-value [5]. In a study conducted in Delhi, the number of COVID-19 cases showed a significant negative correlation with PM2.5 levels (r = −0.63, p-value < 0.01) during the pre-lockdown phase but a positive correlation (r = 0.56) during the lockdown phase. Despite these contrasting correlations, the researchers concluded that COVID-19 transmission depends on the concentration of PM2.5 in Delhi's environment [11].
Part of the reason why some studies find a positive relationship between PM and COVID-19 cases while others do not is that these analyses are correlational, and correlation does not imply causation. To establish causation, researchers need carefully designed studies, such as randomized controlled trials or longitudinal studies, that demonstrate a direct cause-and-effect relationship between PM levels and COVID-19 outcomes.
Emissions from the combustion of diesel fuel in cars and other vehicles are recognized as a significant source of particulate matter (PM) in urban areas [12,13]. As a result, regions with higher population density tend to have more transportation activity, which contributes to increased levels of PM [14,15]. During the lockdowns implemented in response to the COVID-19 pandemic, urban activity fell sharply, including transportation and industrial activity. Emissions, including those of particulate matter, dropped noticeably, and air quality improved in many urban areas during the lockdown periods [16].
Population is a crucial factor in urban areas, as it reflects the concentration of individuals in a given space. Areas with larger populations are more likely to experience rapid spread of infectious diseases, including COVID-19 [17]. While areas with larger populations tend to have more reported COVID-19 cases (correlation), this does not necessarily mean that population itself directly causes the spread of the virus (causation). Just as flipping a coin more times increases the likelihood of observing both heads and tails, a larger population might lead to more reported COVID-19 cases simply because the chance of encountering infected individuals increases. Two studies in Malaysia found a strong, positive, and statistically significant correlation between total population and COVID-19 cases, indicating that larger populations were associated with higher case numbers. However, the relationship between population density and the spread of COVID-19 was weaker [18,19].
Two mistakes are common among researchers. The first is using cumulative frequency reports of COVID-19 cases or deaths directly in the analysis without considering their implications. The second is applying linear regression, which assumes a continuous dependent variable, to the discrete number of people infected with COVID-19. Despite this, many studies still use linear regression inappropriately. For instance, two studies on the correlation between population density and COVID-19 in the USA [20,21] used linear mixed models, and the data they used appear to be cumulative frequencies rather than frequencies.
Population density in specific places, such as hospitals, public transportation, and cruise ships [22], can of course contribute significantly to the transmission of COVID-19 in localized settings. However, the primary objective of our study is to investigate the phenomenon on a macro scale, encompassing provinces, cities, and countries. Population density and physical distance are therefore two distinct concepts. Another study, covering 913 counties in the USA, found that metropolitan population density was a significant predictor of infection rates, whereas county density by itself was not significantly related to the infection rate. Instead, the study highlighted that connectivity, which involves factors beyond density alone, appears to have a greater impact on infection rates [23].
Given the complexity of the association between air pollution and the spread of COVID-19, it would be reasonable to expect that, all other factors being equal, regions with higher wind speeds and therefore lower pollution levels would also have fewer COVID-19 cases. However, studies have not consistently shown this relationship between wind speed, pollution, and COVID-19 cases. The Gaussian air pollutant dispersion equation, one of the earliest and simplest pollutant dispersion models, describes how air pollutants disperse in the atmosphere under the influence of wind and other meteorological factors [24]. Higher wind speeds enhance dispersion and are therefore expected to reduce the concentration of pollutants in the air, lowering local pollution levels in densely populated cities. In a study conducted in New York, a Spearman correlation coefficient of +0.172 indicated a positive correlation between wind speed and COVID-19 cases, meaning that higher wind speeds were associated with higher case counts in that area [25]. By contrast, a study in Jakarta, Indonesia, reported a significant negative correlation (r = −0.314; p < 0.05), with low wind speed associated with higher COVID-19 case counts [26]. Moreover, Shao et al. found both positive and negative correlations between wind and the number of infections, indicating a connection between pollution and COVID-19 [27].
The limitations of existing investigations stem from the fact that air pollution and COVID-19 infections are correlated both spatially and temporally, and either kind of correlation can bias the estimated results. Typically, researchers consider either spatial or temporal correlation, depending on the research question and the nature of the data. Our study has several advantages. First, it benefits from a large number of statistical samples, which enhances the robustness and reliability of the findings. Second, it employs two types of correlation, spatial and temporal, to investigate the relationship between air pollution and COVID-19 thoroughly. By combining both correlation analyses with a substantial dataset, this study aims to provide insight into the impact of air pollution on COVID-19.
2. Materials and Methods
2.1. Sample
● This study covers fifty-one (N = 51) state-level jurisdictions in the USA, one of the countries most heavily affected by the COVID-19 pandemic, with over 54 million cases reported over two years (2020 and 2021). Because far more infections than deaths were recorded, the analysis uses the number of infected individuals.
2.2. Sources
● Wind speed and air pollution data were obtained from the United States Environmental Protection Agency (EPA) website [28]. Temperature data, in degrees Fahrenheit, were obtained from the National Centers for Environmental Information [29].
2.3. Measurements
● PM2.5 data are often reported using the Air Quality Index (AQI), which provides an overall measure of air quality based on various pollutants, including PM2.5. However, the AQI is a dimensionless index and is not directly usable for quantitative analyses because of its scale and unitless nature. To facilitate statistical analysis and comparison, researchers often convert AQI values to a physical unit such as micrograms per cubic meter (µg/m³) using the appropriate conversion equation. The converted data are expressed in a standard unit that can be used in statistical models. Although the correlation between AQI and µg/m³ values may not be exactly 1, converting AQI to µg/m³ gives a more accurate representation of PM2.5 concentrations and makes their relationship with other variables easier to interpret. The AQI is given by Equation (1) [30].
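$$\mathrm{AQI} = \frac{\mathrm{AQI}_{Hi} - \mathrm{AQI}_{Lo}}{\mathrm{Conc}_{Hi} - \mathrm{Conc}_{Lo}}\left(\mathrm{Conc}_{i} - \mathrm{Conc}_{Lo}\right) + \mathrm{AQI}_{Lo} \tag{1}$$
where:
$\mathrm{Conc}_{i}$ = input concentration for a given pollutant (here PM2.5)
$\mathrm{Conc}_{Lo}$ = the concentration breakpoint that is less than or equal to $\mathrm{Conc}_{i}$
$\mathrm{Conc}_{Hi}$ = the concentration breakpoint that is greater than or equal to $\mathrm{Conc}_{i}$
$\mathrm{AQI}_{Lo}$ = the AQI breakpoint corresponding to $\mathrm{Conc}_{Lo}$
$\mathrm{AQI}_{Hi}$ = the AQI breakpoint corresponding to $\mathrm{Conc}_{Hi}$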
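Converting a reported AQI value back to a concentration amounts to inverting the piecewise-linear interpolation of Equation (1). The following is a minimal sketch of that inversion, not the code used in this study. The breakpoint table is an assumption: it lists the 24-hour PM2.5 breakpoints that the EPA published for the 2020 and 2021 period and should be checked against reference [30]. The function names are hypothetical.

```python
# Assumed PM2.5 breakpoints: (conc_lo, conc_hi, aqi_lo, aqi_hi),
# with concentrations in µg/m³. Verify against the EPA source before use.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def concentration_to_aqi(conc):
    """Equation (1): interpolate linearly between the enclosing breakpoints.

    Assumes conc is already reported to one decimal place, so it cannot
    fall into the small gaps between consecutive concentration ranges.
    """
    for conc_lo, conc_hi, aqi_lo, aqi_hi in PM25_BREAKPOINTS:
        if conc_lo <= conc <= conc_hi:
            slope = (aqi_hi - aqi_lo) / (conc_hi - conc_lo)
            return round(slope * (conc - conc_lo) + aqi_lo)
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


def aqi_to_concentration(aqi):
    """Invert Equation (1): recover a PM2.5 concentration from an integer AQI."""
    for conc_lo, conc_hi, aqi_lo, aqi_hi in PM25_BREAKPOINTS:
        if aqi_lo <= aqi <= aqi_hi:
            slope = (conc_hi - conc_lo) / (aqi_hi - aqi_lo)
            return slope * (aqi - aqi_lo) + conc_lo
    raise ValueError(f"AQI {aqi} is outside the breakpoint table")
```

Because the slope changes from one breakpoint range to the next, the mapping is monotonic but not linear overall, which is one reason the correlation between AQI and µg/m³ values need not be exactly 1.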
● The average wind speed is measured in meters per second (m/s) with an RM Young Model 05103 instrument, which is designed to measure wind speed at low altitudes. Wind speed varies with height, so different devices and methods may yield different results because wind patterns differ between altitudes.
● Time series data for confirmed COVID-19 cases in the United States for 2020 and 2021 were obtained from the public data archive of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [31]. The archive provides the data as cumulative frequency, that is, the total number of COVID-19 cases up to a given date. For analysis, the data must be transformed into daily frequency by differencing consecutive data points: the number of new cases on a given day is the cumulative count on that day (t1) minus the cumulative count on the previous day (t0), x(t1) − x(t0). In addition, the ratio of the number of cases to the total time the population is at risk of disease can be calculated; this ratio gives the incidence rate of COVID-19 cases per unit time for each state. Population density is obtained by dividing the population of each state by its area.
Most studies use Pearson correlation to assess the relationship between variables. Some use Kendall or Spearman correlation, but the differences in results are not significant. To facilitate comparison with other research, we also use Pearson correlation. Pearson's correlation coefficient (r) is a widely used measure of the strength and direction of the linear relationship between two variables. It is defined in Equation (2) [32].
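$$r = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}} \tag{2}$$
where:
$r$ = correlation coefficient,
$x_i$, $y_i$ = the values of the x-variable and the y-variable in a sample of $n$ paired observations,
$\bar{x}$, $\bar{y}$ = the means of the values of the x-variable and the y-variable.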
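The two computational steps described above, differencing cumulative counts and computing Pearson's r, can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the analysis code of this study: the function names are hypothetical and the numbers in the usage lines are invented toy values, not study data.

```python
import math


def daily_from_cumulative(cumulative):
    """New cases per day: x(t1) - x(t0) for each pair of consecutive days."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def pearson_r(x, y):
    """Pearson correlation coefficient as in Equation (2)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return s_xy / math.sqrt(s_xx * s_yy)


# Toy usage with invented numbers (not study data).
cumulative_cases = [100, 130, 175, 175, 240]
daily_cases = daily_from_cumulative(cumulative_cases)

population = 1_000_000           # hypothetical state population
area_km2 = 50_000                # hypothetical state area
cases_per_capita = sum(daily_cases) / population
population_density = population / area_km2
```

Correlating cumulative series directly would inflate r, because two cumulative series both increase over time regardless of how the daily counts behave; differencing first avoids that artifact.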
3. Results
Figure 1 depicts the number of confirmed COVID-19 cases in the United States during 2020 and 2021. Infections in 2020 peaked in December, while in 2021 the highest number of cases was reported in January. A total of 20,126,950 confirmed COVID-19 cases were recorded in 2020, and this number rose to 34,505,103 in 2021. The figure shows the overall trend over the two-year period, including the month-to-month fluctuations in both years.
Figure 1.
The number of confirmed Covid-19 cases in the years 2020 and 2021.
Figure 2 shows a consistently high number of COVID-19 cases in California, Florida, New York, and Texas throughout the two-year period. The graphs indicate that the pattern of COVID-19 cases across states closely tracks their population sizes: states with larger populations tend to have more COVID-19 cases, suggesting that population size plays a significant role in the number of cases a state reports.
Figure 2.
Number of confirmed COVID-19 cases in the years 2020 and 2021.
The strong spatial correlation between COVID-19 cases in 2020 and 2021 (r = 0.948) suggests that the pattern of infections across states repeated in the following year (Table 1). There is a significant positive correlation between population and COVID-19 cases (r = 0.98), supporting the idea discussed in the introduction that population size can influence the likelihood of infection. By contrast, the correlations between the rate of COVID-19 cases and population, and between population density and COVID-19 cases, are weak and close to zero. Wind speed shows no correlation with COVID-19 cases, indicating that it has little impact on transmission dynamics. Temperature, on the other hand, exhibits a positive correlation with COVID-19 cases (r between 0.333 and 0.349, significant at the 0.05 level). Regarding PM2.5, COVID-19 cases in 2020 show a significant positive correlation with PM2.5 (r = 0.468), while in 2021 the correlation remains positive (r = 0.168) but is not significant. Additionally, the correlation between the rate of COVID-19 cases and PM2.5 is close to zero, suggesting their independence.
Table 1.
Spatial correlation and Covid-19 cases in different states.
| | covid2020 | covid2021 | rate2020 | rate2021 | population | density | pm2020 | pm2021 | temp2020 | temp2021 | wind2020 | wind2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| covid2020 | 1 | .948** | 0.045 | -0.023 | .982** | -0.095 | .468** | .289* | .338* | .333* | -0.012 | -0.011 |
| covid2021 | .948** | 1 | -0.071 | 0.1 | .967** | -0.084 | .340* | 0.168 | .349* | .338* | -0.092 | -0.101 |
| rate2020 | 0.045 | -0.071 | 1 | 0.253 | -0.083 | -0.169 | 0.09 | .368** | -0.13 | -0.113 | .374** | .392** |
| rate2021 | -0.023 | 0.1 | 0.253 | 1 | -0.054 | -0.082 | -0.139 | -0.067 | -0.105 | -0.136 | -.286* | -.304* |
| population | .982** | .967** | -0.083 | -0.054 | 1 | -0.082 | .450** | 0.244 | .324* | .316* | -0.065 | -0.065 |
| density | -0.095 | -0.084 | -0.169 | -0.082 | -0.082 | 1 | 0.092 | 0.09 | 0.12 | 0.104 | -0.1 | -0.104 |
| pm2020 | .468** | .340* | 0.09 | -0.139 | .450** | 0.092 | 1 | .803** | 0.076 | 0.071 | 0.032 | 0.057 |
| pm2021 | .289* | 0.168 | .368** | -0.067 | 0.244 | 0.09 | .803** | 1 | -0.052 | -0.046 | 0.17 | 0.188 |
| temp2020 | .338* | .349* | -0.13 | -0.105 | .324* | 0.12 | 0.076 | -0.052 | 1 | .998** | -0.115 | -0.083 |
| temp2021 | .333* | .338* | -0.113 | -0.136 | .316* | 0.104 | 0.071 | -0.046 | .998** | 1 | -0.078 | -0.046 |
| wind2020 | -0.012 | -0.092 | .374** | -.286* | -0.065 | -0.1 | 0.032 | 0.17 | -0.115 | -0.078 | 1 | .973** |
| wind2021 | -0.011 | -0.101 | .392** | -.304* | -0.065 | -0.104 | 0.057 | 0.188 | -0.083 | -0.046 | .973** | 1 |

**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
Table 2 displays the temporal correlations between the variables. The correlation between COVID-19 cases in 2020 and 2021 is r = 0.384, much weaker than the corresponding spatial correlation, so the pattern of cases over time was not repeated from one year to the next as closely as the pattern across states. The correlation between temperatures in 2020 and 2021 is high (r = 0.986), indicating that the temperature pattern in most US states is consistent and repeats year after year; July and August are typically the hottest months. Wind speed in 2020 and 2021 is also highly and significantly correlated (r = 0.899). Wind speed and temperature are inversely related (r between −0.529 and −0.620), with higher wind speeds associated with cooler temperatures. The correlation between wind speed and PM2.5 is −0.685 in 2020 and −0.613 in 2021 (p-value < 0.05 according to Table 2), indicating that PM2.5 levels tend to be lower when wind speed is higher. COVID-19 cases are positively correlated with PM2.5 in both 2020 (r = 0.111) and 2021 (r = 0.235), but neither correlation is statistically significant.
Table 2.
Temporal correlation between pollution and Covid-19 cases.
| | covid2020 | covid2021 | temp2020 | temp2021 | wind2020 | wind2021 | pm2020 | pm2021 |
|---|---|---|---|---|---|---|---|---|
| covid2020 | 1 | 0.384 | -0.175 | -0.104 | -0.273 | -0.182 | 0.111 | -0.005 |
| covid2021 | 0.384 | 1 | -0.455 | -0.398 | -0.375 | -0.176 | 0.355 | 0.235 |
| temp2020 | -0.175 | -0.455 | 1 | .986** | -0.529 | -.620* | 0.295 | 0.528 |
| temp2021 | -0.104 | -0.398 | .986** | 1 | -0.551 | -.603* | 0.327 | 0.477 |
| wind2020 | -0.273 | -0.375 | -0.529 | -0.551 | 1 | .899** | -.685* | -.689* |
| wind2021 | -0.182 | -0.176 | -.620* | -.603* | .899** | 1 | -0.556 | -.613* |
| pm2020 | 0.111 | 0.355 | 0.295 | 0.327 | -.685* | -0.556 | 1 | 0.331 |
| pm2021 | -0.005 | 0.235 | 0.528 | 0.477 | -.689* | -.613* | 0.331 | 1 |

**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).
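Whether a given r earns a significance star depends on the number of observations behind it, which is why a coefficient of a given size can be significant in Table 1 and not in Table 2. The sketch below converts two-tailed critical t values into critical values of |r| using r = t / sqrt(df + t²) with df = n − 2. Two things are assumptions rather than statements from this study: the critical t values are taken from standard t tables, and n = 12 for Table 2 presumes monthly observations. The names are hypothetical.

```python
import math

# Two-tailed critical t values from standard t tables (an assumption of this
# sketch): degrees of freedom -> (t at alpha = 0.05, t at alpha = 0.01).
CRITICAL_T = {
    10: (2.228, 3.169),  # n = 12, e.g. monthly values (assumed for Table 2)
    49: (2.010, 2.680),  # n = 51 states (Table 1)
}


def critical_r(n, alpha_index):
    """Smallest |r| that is significant for n paired observations.

    alpha_index 0 selects the 0.05 level, 1 selects the 0.01 level.
    """
    df = n - 2
    t_crit = CRITICAL_T[df][alpha_index]
    return t_crit / math.sqrt(df + t_crit ** 2)


def significance_stars(r, n):
    """Return '**', '*' or '' following the footnote convention of the tables."""
    if abs(r) >= critical_r(n, 1):
        return "**"
    if abs(r) >= critical_r(n, 0):
        return "*"
    return ""


# Values taken from Tables 1 and 2.
spatial_pm_cases_2020 = significance_stars(0.468, n=51)
temporal_wind_pm_2020 = significance_stars(-0.685, n=12)
```

Under these assumptions the thresholds are roughly |r| ≥ 0.28 (0.05 level) and |r| ≥ 0.36 (0.01 level) for 51 states, but roughly 0.58 and 0.71 for 12 monthly values, so temporal correlations need to be much larger before they count as significant.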
4. Discussion
Our study used spatial and temporal correlation analyses to explore the relationships between wind, temperature, pollution, population density, and COVID-19 cases. The findings show correlations between pollution and COVID-19 cases but caution against drawing causal conclusions. Many studies have reported a correlation between air pollution and the number of COVID-19 infections, but correlation does not imply causality. Pollution decreased during lockdown periods, and studies have shown that the disease itself caused a decrease in air pollution [33]. This correlation therefore reflects the simultaneous occurrence of two phenomena rather than an effect of pollution on the disease. Observing similar patterns between the graphs of mortality and infection rates in Europe [34], researchers may be inclined to assume automatically that pollution has a strong effect on COVID-19. Caution is necessary for several reasons:
1-Correlation does not imply causation: Just because two variables (in this case, air pollution and COVID-19 outcomes) show similar patterns does not necessarily mean that one directly causes the other. Other factors could be responsible for the observed associations. To demonstrate the potential for such errors, we used the rate of infected people (the number of infected individuals divided by the population of the state) and found that its correlation with air pollution was close to zero. This finding suggests that there is no strong linear relationship between air pollution and the rate of COVID-19 infections.
2-Confounding factors: The observed patterns in COVID-19 cases could be influenced by numerous confounding factors that affect both air pollution levels and the spread of COVID-19 independently [35]. Population is one such factor: the spatial correlations in Table 1 show that population is significantly correlated with COVID-19 cases in both years and with PM2.5 in 2020 (p-value < 0.01). A larger population in an area may lead to more reported COVID-19 cases because of the increased likelihood of encountering infected individuals. However, this correlation does not imply that population size directly causes COVID-19 cases. If population size were the primary determinant of COVID-19 cases, population density would be expected to show a similar relationship with both COVID-19 cases and air pollution, but those correlations are close to zero.
3-Regional variations: Consistent with previous research [26,36], higher wind speeds are associated with lower levels of PM2.5 pollution in the temporal analysis (Table 2). Interestingly, we also observed a temporal correlation between lower wind speeds and increased COVID-19 cases, although it is not statistically significant. This temporal correlation suggests that reduced wind speeds might contribute to higher COVID-19 case numbers. However, in the spatial analysis (Table 1), wind speed was positively associated with the rate of COVID-19 cases in 2020 (r = 0.374), and the sign reversed in 2021 (r = −0.304). This suggests that factors beyond wind speed and pollution influence the spatial distribution of COVID-19 cases.
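The confounding argument in point 2 can be illustrated with a first-order partial correlation, which estimates the association between two variables after the linear effect of a third is removed from both. This is an illustrative sketch only: the study does not report partial correlations, the function name is hypothetical, and the inputs are the rounded coefficients from Table 1, so the result is sensitive to rounding because the population and case-count correlation is very close to 1.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y after controlling for z.

    Standard first-order formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when x or y is perfectly correlated with z")
    return (r_xy - r_xz * r_yz) / denominator


# Spatial coefficients for 2020 from Table 1:
# x = covid2020, y = pm2020, z = population.
pm_cases_given_population = partial_correlation(
    r_xy=0.468,  # covid2020 vs pm2020
    r_xz=0.982,  # covid2020 vs population
    r_yz=0.450,  # pm2020 vs population
)
```

With these inputs the coefficient drops from 0.468 to roughly 0.15, which points in the same direction as the near-zero correlation between PM2.5 and the per-capita rate of cases: most of the raw association between case counts and PM2.5 is shared with population.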