An analysis of Covid-19 in Europe based on fractal dimension and meteorological data

The present paper proposes a fractal analysis of the Covid-19 dynamics in 45 European countries. We introduce the idea of using the box-counting dimension of the epidemiologic curves as a means of classifying the Covid-19 pandemic in the countries considered. The classification can be a useful tool for assessing the quality and accuracy of the available data. We also investigate the reproduction rate, which proves to have significant fractal features, thus offering another perspective on this epidemic characteristic. Moreover, we study the correlation between two meteorological parameters (global radiation and daily mean temperature) and two Covid-19 indicators (daily new cases and reproduction rate). The differences between the fractal dimensions of the analysed time series graphs could serve as a preliminary analysis criterion, increasing research efficiency. Daily global radiation was found to be more strongly linked with Covid-19 new cases than air temperature (correlation coefficient -0.386, compared with -0.318), and it is consequently recommended as the first-choice meteorological variable for prediction models.


Introduction
The importance of thoroughly studying epidemics has recently been clearly emphasised. Acquiring information about a pandemic from every possible source can prove crucial in managing not only the current pandemic, but also future epidemics. Extracting new information from the available data is of utmost importance at present, as the current pandemic seems far from ending, even more than a year and a half after it started. In this respect, the first part of the present paper aims to analyse the epidemic curves of the Covid-19 pandemic as fractals and to determine their complexity based on the associated fractal dimension, which, as far as we know, has not yet been done.
The attention received by the current Covid-19 pandemic has a significant echo in almost every area of research. Researchers from every field have joined forces to reveal as much as possible about the SARS-CoV-2 virus. As far as Mathematics is concerned, Covid-19 has been thoroughly studied. Many researchers have tried to model the virus spread (see, for example, [1][2][3][4][5][6][7][8], and many more). For a literature review on the subject see also [9][10][11][12][13]. Analysing the current pandemic from a fractal point of view has also been tackled in different research studies (see [14][15][16][17][18][19][20]).
Among the numerous research directions related to Covid-19, assessing the correlation between the registered new cases of Covid-19 and meteorological data has been in the spotlight. From the numerous studies on this topic we mention the following, along with all their bibliographic references: the systematic review of Majumder and Ray on the correlation between meteorological data and Covid-19 (see [21]), the systematic review of Briz-Redon and Serrano-Aroca (see [22]), the survey of Chen et al. on the relation between climate and the current pandemic (see [23]), and many others (see [24][25][26]).
The present study aims to contribute to improving the forecasting basis for the evolution of the Covid-19 pandemic, by using fractal analysis in two main directions: assessing the quality of the reported data, and assessing the effect of weather conditions on the disease incidence and reproduction rate.
On the one hand, the Covid-19 epidemic is analysed from a fractal perspective, emphasising the fractal structure of the epidemiologic curves. The fractality of the Covid-19 epidemiological curves has been argued in previous studies (see [20]). However, as far as we know, analysing the complexity of Covid-19 through the fractal dimension of epidemiologic curves has not been done before.
One immediate benefit of comparing the fractal dimensions of the epidemiologic curves in different countries is the possibility of discovering where the communication of Covid-19 related data has been more accurate and where testing for active Covid-19 cases has been most consistent. Moreover, a comparative analysis of the complexity of the epidemiological curves in European countries can help discover where the measures against the pandemic have been the most efficient. In order to better evaluate the latter, we have also studied the reproduction rates in the countries considered.
On the other hand, we investigated the influence of meteorological conditions on the Covid-19 evolution parameters. Many airborne diseases are well known to be highly dependent on weather, with an obvious seasonality in mid-latitude regions; influenza is a typical example, and the new coronavirus infection is no exception. Although analysing the correlations between Covid-19 and meteorological data is not new and has been the subject of several interesting research papers, our analysis reveals some new possible correlations. Two meteorological variables were selected. The first is temperature, which has been considered in a very large number of studies and confirmed to be well correlated with Covid-19 progress. The second is solar radiation, which has seldom been studied, despite its role as the primary energy source for atmospheric processes, which decisively affects the temperature regime, and despite its direct influence on the virus spread through its sanitising action, given the lethal effect of the UV spectral component on viruses.

Materials and methods
The present paper studies, from a fractal perspective, the evolution of Covid-19 in the European countries which have publicly shared Covid-19-related data. All data related to Covid-19 (new cases, reproduction rates, vaccinations) were collected from Our World in Data (see [27], [28]). The analysed data consist of entries from the first day when Covid-19 cases were recorded in each country until 28 July 2021. Table A1 in Appendix A lists the first day when Covid-19 cases were recorded in each of the countries considered.
The climatic data were extracted from the website of the European Climate Assessment & Dataset project (see [29], [30]). We analysed data corresponding to a one-year period, from 1 June 2020 to 31 May 2021.
We investigated the data from a fractal point of view using the fractal dimension. Fractal dimension is a measure of the complexity of an object. There are numerous variants of the fractal dimension. Among them, one of the dimensions which is most used in practice is the box-counting dimension (or Minkowski-Bouligand dimension, or Minkowski dimension, see [31]). Throughout the article, by fractal dimension we mean the box-counting dimension.
Although the box-counting dimension is not usually the most suitable dimension for analysing time series data, in the present case, for analysing the complexity of the epidemiologic curves related to Covid-19, it has proven to be the most appropriate. The box-counting dimension measures the space-filling properties of a structure, so its value depends on the aspect ratio of the graph, which is an impediment in most cases when analysing time series data (see [32]). In the present case, however, the aspect ratio is fixed by scaling each curve relative to the highest number of Covid-19 cases recorded, and this choice does not alter the results used for comparison purposes. The reason is that the complexity of the epidemiologic curves must be analysed on a scale that accounts for the population of the country. Moreover, the chosen scaling is also relevant for understanding which countries have managed the Covid-19 crisis better.
We used the Python programming language to compute the box-counting dimension, adapting the code developed by Rougier [33].
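As an illustration of the technique, the following is a minimal sketch of a box-counting estimate for a time series graph. It is not the code used in this study, nor Rougier's code: the function names `rasterise_curve` and `box_counting_dimension`, the 512 x 512 grid, the scaling of the curve to the unit square and the dyadic box sizes are all assumptions made for the example.

```python
import numpy as np


def rasterise_curve(values, size=512):
    """Draw the polyline (i, values[i]) on a size x size boolean grid.

    Assumption: both axes are rescaled to [0, 1], i.e. the curve is scaled
    relative to its own extreme values. Needs at least two data points.
    """
    y = np.asarray(values, dtype=float)
    x = np.linspace(0.0, 1.0, len(y))
    span = y.max() - y.min()
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    px, py = x * (size - 1), y * (size - 1)
    grid = np.zeros((size, size), dtype=bool)
    for x0, y0, x1, y1 in zip(px[:-1], py[:-1], px[1:], py[1:]):
        # Sample each segment at roughly one point per pixel so that
        # steep jumps between consecutive days leave no gaps.
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 2
        xs = np.linspace(x0, x1, steps)
        ys = np.linspace(y0, y1, steps)
        grid[np.rint(ys).astype(int), np.rint(xs).astype(int)] = True
    return grid


def box_counting_dimension(grid):
    """Estimate the box-counting dimension of a square boolean grid.

    Assumption: the grid side is a power of two; boxes of side 2, 4, ...,
    side/2 are used and the dimension is the negated slope of
    log(count) against log(box side).
    """
    size = grid.shape[0]
    exponent = size.bit_length() - 1
    box_sizes = 2 ** np.arange(1, exponent)
    counts = []
    for s in box_sizes:
        blocks = grid.reshape(size // s, s, size // s, s)
        counts.append(blocks.any(axis=(1, 3)).sum())
    slope, _ = np.polyfit(np.log(box_sizes), np.log(counts), 1)
    return -slope


# Synthetic series for illustration only (not epidemiological data).
rng = np.random.default_rng(0)
line_dimension = box_counting_dimension(rasterise_curve(np.arange(100)))
noise_dimension = box_counting_dimension(rasterise_curve(rng.random(400)))
```

In this sketch a straight line yields an estimate close to 1 and a completely filled grid yields 2, so a noisier series produces a larger value than a smooth one. The absolute values depend on the grid resolution and on the range of box sizes, so estimates should only be compared when computed with identical settings.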
Correlation is a statistical feature of two random variables, or of bivariate data, which measures the relationship between them. In the current paper, by correlation we understand the degree to which two datasets are linearly related. Correlation is a statistic widely used in practice. The correlation coefficient is the measure which quantifies the correlation, i.e. the degree to which variables move in relation to each other. For the current study we use the Pearson correlation coefficient (see [34]), which returns values between -1 and 1.
To analyse the correlations between the datasets considered, we used the correlation method (dataframe.corr()) of the Python package pandas (see [36]). This method computes the pairwise correlation of the columns in the dataset (for more information about the function see [37]).
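A minimal sketch of this step is given below. The column names and the six daily values are hypothetical and serve only to show the call; they are not the data analysed in this study.

```python
import pandas as pd

# Hypothetical daily values, for illustration only.
df = pd.DataFrame({
    "new_cases": [120, 150, 90, 60, 200, 40],
    "mean_temperature": [10.0, 8.5, 14.0, 18.0, 5.0, 21.0],
    "global_radiation": [110, 95, 160, 210, 70, 250],
})

# Pairwise Pearson correlation coefficients between all columns.
corr = df.corr(method="pearson")

# Keep only the weather rows against the Covid-19 column.
print(corr.loc[["mean_temperature", "global_radiation"], ["new_cases"]])
```

The result is a symmetric matrix with ones on the diagonal, so only the block relating weather columns to Covid-19 columns needs to be read.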
As a case study, we chose to analyse the correlations between Covid-19 data and meteorological data for Switzerland. As Covid-19 data are reported for the entire country, while meteorological data are reported by stations in particular locations, we computed the mean value of the meteorological data over three stations in Switzerland, situated in different parts of the country. The chosen stations are Zurich (Zuerich/Fluntern, Latitude: +47:22:59, Longitude: +008:34:00, Elevation: 555 m), Geneva (Geneve Cointrin, Latitude: +46:15:00, Longitude: +006:07:59, Elevation: 420 m) and Lugano (Latitude: +46:00:00, Longitude: +008:58:00, Elevation: 273 m).
The significance and the confidence level of the correlations were determined using the F test (Fisher probability distribution), with the following procedure. For each correlation coefficient r, the corresponding value of F was calculated with the formula in equation (1),
$$F = \frac{r^{2}\,(N - 2)}{1 - r^{2}} \qquad (1)$$
where N − 2 is the number of degrees of freedom, equal to 363, given the sample size of one year (365 days). The calculated values were compared with three thresholds, corresponding to the theoretical F values for 1 and 363 degrees of freedom and significance levels of 5%, 1% and 0.1%, equal to 3.87, 6.71 and 11.01, respectively. A calculated value greater than the first, second or third threshold indicates a significant, distinctly significant or very significant correlation, respectively.
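The significance procedure can be sketched as follows. The sketch assumes the standard relation F = r²(N − 2)/(1 − r²) for testing a Pearson coefficient, which reproduces the F values reported later in Table 3 to within the rounding of the coefficients. The function names and labels are hypothetical, and the thresholds are the three values quoted above.

```python
def f_statistic(r, n=365):
    """F value for a Pearson coefficient r from n paired observations.

    Assumed relation: F = r^2 (n - 2) / (1 - r^2), with 1 and n - 2
    degrees of freedom.
    """
    return r * r * (n - 2) / (1.0 - r * r)


# Thresholds quoted in the text for 1 and 363 degrees of freedom.
THRESHOLDS = [
    (11.01, "very significant (0.1%)"),
    (6.71, "distinctly significant (1%)"),
    (3.87, "significant (5%)"),
]


def significance(r, n=365):
    """Return the strongest significance label reached by r."""
    f_value = f_statistic(r, n)
    for threshold, label in THRESHOLDS:
        if f_value > threshold:
            return label
    return "not significant"


# Correlation coefficients as reported in Table 2 (rounded values).
for name, r in [("radiation vs new cases", -0.386),
                ("radiation vs reproduction rate", 0.097)]:
    print(name, round(f_statistic(r), 2), significance(r))
```

Because the coefficients in Table 2 are rounded to three decimals, F values recomputed from them can differ slightly from those reported in Table 3.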

Fractal dimension of Covid-19 epidemiologic curves
We chose to analyse data related to Covid-19 for European countries. We plotted the new cases registered daily on a two-dimensional canvas, where the horizontal axis corresponds to the date and the vertical axis to the number of new cases recorded, thus obtaining the epidemiologic curves corresponding to each of the countries considered.
We determined the fractal dimension for 45 European countries. The computed box-counting dimensions are shown in Appendix B (see Table A2). The box-counting dimensions obtained, sorted in ascending order, are represented in Figure 1.
The results obtained for the fractal dimension corresponding to each country can give us an insight into the accuracy of the data reported. The highest fractal dimension computed is for Sweden, a dimension of 1.4732, and the smallest is recorded for Vatican, a dimension of 1.2912. However, both extreme values result from inaccuracies in the data provided. The main reason for a fractal dimension too close to 1 is [...]
For further investigations on the fractal dimension we can split the remaining 39 countries into three main categories, as in Table 1.
Classifying the countries based on the box-counting dimension as in Table 1, we obtain a clearer picture of the data sets which can be further taken into consideration. In Figure 2 we depict the classification of the countries based on the box-counting dimension, on the graph of the dimensions of the 39 countries sorted in ascending order. Analysing Appendix B (Table A2) and Table 1, we can observe that, of the 39 countries considered, there is no country with irrelevant data, there are 9 countries with highly accurate data, 26 countries with mostly accurate data and 4 countries with noisy and mostly inaccurate data.
In order to better visualise the relevance of classifying the countries into three categories, we present an example of graphs corresponding to a country from each category, as follows: category A - Figure 3, category B - Figure 4 and category C - Figure 5.
Analysing Figures 3, 4 and 5, we can clearly see that the data in Figure 3, corresponding to Russia, are highly accurate and more frequently reported, as the variations are realistic and faithful to the evolution of the disease. For Figure 3, the box-counting dimension is 1.31. In Figure 4, the data are more chaotic, and although Covid-19 has some chaotic features, [...] analysed and must be subject to serious filtering and corrections before being ready for analysis. The box-counting dimension computed for Sweden is 1.47.

Fractal patterns of reproduction rates
An essential parameter of an epidemic is the reproduction rate (or reproduction number), which indicates the number of new cases that will be generated by each new infection among healthy individuals. Studying the reproduction rate can indicate whether the epidemic is expanding (a higher reproduction rate) or is likely to disappear (a lower reproduction rate). The evolution of the Covid-19 pandemic, which is characterised by the constant appearance of new waves, seems to be governed by periodic functions, which possess, besides a great deal of similarity, an even greater amount of fractality. Plotting the reproduction rates may reveal some interesting fractal features. We can observe fractal patterns emerging from the graphs of the reproduction rates in most of the countries analysed.
Analysing the three example graphs in Figures 6, 7 and 8, we can observe the resemblance between the graphs of the reproduction rates and a classical fractal, the random Blancmange curve (or Takagi curve), which is depicted in Figure 9. Splitting the graphs in Figures 6, 7 and 8 and looking separately at smaller parts of each graph, we can see the resemblance better. As an example, see Figure 10. Thus, treating the reproduction rate as a fractal (as having a fractal behaviour) might be an appropriate way to analyse it better.
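For readers unfamiliar with this curve, the sketch below evaluates the classical Takagi (Blancmange) function, T(x) = Σ s(2^n x) / 2^n, where s(x) is the distance from x to the nearest integer. The function name `takagi`, the truncation at 20 terms and the randomisation used here (an independent random sign for each term) are assumptions made for illustration; the random variant depicted in Figure 9 may be defined differently.

```python
import numpy as np


def takagi(x, terms=20, rng=None):
    """Truncated Takagi (Blancmange) sum evaluated at x.

    With rng=None this is the classical deterministic curve. Passing a
    numpy random Generator multiplies each term by a random sign, which
    is one possible randomisation (an assumption of this sketch).
    """
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for n in range(terms):
        scaled = (2.0 ** n) * x
        tent = np.abs(scaled - np.rint(scaled))  # distance to nearest integer
        sign = 1.0 if rng is None else rng.choice([-1.0, 1.0])
        total += sign * tent / 2.0 ** n
    return total


xs = np.linspace(0.0, 1.0, 1025)
classical = takagi(xs)
randomised = takagi(xs, rng=np.random.default_rng(1))
```

Plotting `classical` or `randomised` against `xs` shows the repeated bumps at ever smaller scales that are compared with the reproduction rate graphs above.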

Meteorological data and Covid-19
Weather conditions play an important role in the evolution of airborne viral diseases such as COVID-19 (the infection with the SARS-CoV-2 coronavirus, which is mainly disseminated through aerosolised droplets). Their influence is both direct, affecting virus survival time and human host vulnerability, and indirect, since weather affects people's behaviour, changing the frequency of interactions. Obviously, the factors that determine people's conduct, especially in a pandemic context, are numerous, and quite often non-meteorological factors, primarily social circumstances, drive the occurrence and dynamics of new "waves". For this study we selected two meteorological parameters, the daily mean values of air temperature and solar radiation, and two parameters describing Covid-19 dynamics, daily new cases and reproduction rate. Temperature was chosen because it has been intensively studied in this pandemic context, with numerous articles reporting a significant correlation with Covid-19 evolution, thus offering an extensive basis for discussing the results. Many other weather data have been considered (relative humidity, rainfall amounts, wind characteristics etc.), but almost never solar radiation. Solar radiation could directly impact the virus spread, due to its lethal effect on the germ, especially that of the ultraviolet component of incoming sunlight, both on contaminated surfaces and in aerosol droplets. Thus, we decided to also investigate the correlation between solar radiation data and the pandemic evolution parameters, for the previously mentioned unique direct effect, but also for an additional reason: solar radiation is the main climate-forming factor, the primary energy source for atmospheric processes, and consequently a good candidate for an independent meteorological variable in a Covid-19 forecasting model. This independence is very important because weather variables are well correlated with one another, and this mutual correlation could cause stability issues for models whose internal regression equations involve multiple meteorological variables.
As regards the Covid-19 parameters, the selection was similar, including the number of daily new cases, which has been widely considered in research on the interactions between weather and Covid-19, and the reproduction rate, which is seldom taken into account in such studies, probably because it is considered to be mainly determined by social behaviour. Taking into account data availability and quality, we chose to test the correlation between meteorological data and Covid-19 data for Switzerland. Since Covid-19 related data are available only for a larger area, in most cases the whole country, we had to process the meteorological data to obtain an overall picture of the area under study. To obtain meteorological parameters representative of the entire country, we computed the mean daily data as averages of the data reported by three stations: Zurich, situated in the North of the country, Geneva, in the Southwest, and Lugano, located in the Southern part of Switzerland. As concerns solar radiation, the Swiss data used were daily values of global solar radiation (the sum of direct beam and diffuse radiation).
The correlation coefficients calculated between the two pairs of datasets (new cases and reproduction rate on the one hand, air temperature and global radiation on the other) are summarised in Table 2.
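The averaging over stations can be sketched as follows. The three daily temperature values per station are hypothetical and only illustrate the computation; the actual series come from the European Climate Assessment & Dataset files.

```python
import pandas as pd

# Hypothetical daily mean temperatures (degrees Celsius), for illustration.
dates = pd.date_range("2020-06-01", periods=3, freq="D")
stations = {
    "Zurich": pd.Series([17.0, 18.5, 16.0], index=dates),
    "Geneva": pd.Series([19.0, 20.5, 17.0], index=dates),
    "Lugano": pd.Series([21.0, 21.0, 18.0], index=dates),
}

# One column per station, aligned on the date index.
table = pd.concat(stations, axis=1)

# Country-level daily value: mean across the three stations.
national_mean = table.mean(axis=1)
```

By default `mean` skips missing values, so a day with a gap at one station would be averaged over the remaining stations. Whether that is acceptable, or whether such days should be dropped, is a choice left open in this sketch.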
                     New cases    Reproduction rate
Mean temperature     -0.318**     0.273**
Global radiation     -0.386**     0.097

Table 2: Correlations between datasets for Switzerland (** highly significant)

The significance of the correlations was established using the procedure described in Section 2 (Materials and methods). The values of the F statistic corresponding to the correlation coefficients in Table 2, calculated with formula (1), are presented in Table 3.

                     New cases    Reproduction rate
Mean temperature     40.44**      29.23**
Global radiation     63.56**      3.45

Table 3: Calculated F values, used for testing the correlation significance (** highly significant)

The values marked with a double asterisk indicate a very high significance, all three being greater than the 11.01 threshold (corresponding to a 0.1% significance level). The F value associated with the low correlation coefficient (0.097) between global radiation and reproduction rate is lower than 3.87 (the threshold corresponding to the 95% confidence level), which indicates no significant correlation.
It is noteworthy that, although both mean temperature and global radiation show a very significant negative correlation with the number of new cases, the correlation with global radiation is considerably stronger, as the F statistic suggests. The significant positive correlation between reproduction rate and mean air temperature possibly indicates that cold weather is associated with lower human mobility, while warmer conditions could increase social interactions.
In addition to this classical approach, the graphical representations of the studied datasets (see Figure 11 and Figure 12) were analysed fractally; their box-counting dimensions are given in Table 4. The values of the fractal dimension are clearly consistent with the visual evaluation of the complexity of the graphs, as can be seen in Figures 11 and 12, with an obviously neater shape for the reproduction rate curve.
To compare the graphs of the study datasets, we calculated the differences between their fractal dimensions, obtaining the values in Table 5.

Daily data           New cases    Reproduction rate
Mean temperature     0.049        0.097
Global radiation     0.047        0.193

Table 5: Differences between the fractal dimensions of the study datasets graphs

Comparing these pairwise differences with the correlation coefficients, we observe a clear similarity. Low differences (3-7% of the average fractal dimension) indicate a good association of the corresponding graphs, and thus a possible correlation between the datasets, while a higher difference (approximately 14% of the mean value of the four fractal dimensions) suggests a lack of correlation. A large difference therefore indicates the absence of correlation, and the pair of variables can be excluded from further investigation. The converse does not hold: a low difference only suggests a possible correlation, which may or may not be confirmed by subsequent testing.
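This screening step can be sketched as below. The four dimension values are hypothetical: they were chosen only so that their pairwise differences match Table 5 and are not the values of Table 4. The function name `screen_pairs` and the 10% cut-off (placed between the 7% and 14% levels mentioned above) are likewise assumptions of the sketch.

```python
from itertools import product


def screen_pairs(weather_dims, covid_dims, max_relative_diff=0.10):
    """Flag weather/Covid-19 pairs whose fractal dimensions are close.

    Returns {(weather, covid): (difference, keep)} where keep is True when
    the difference, relative to the mean of all dimensions, does not
    exceed max_relative_diff (an assumed cut-off).
    """
    all_dims = list(weather_dims.values()) + list(covid_dims.values())
    mean_dim = sum(all_dims) / len(all_dims)
    results = {}
    for (w, dw), (c, dc) in product(weather_dims.items(), covid_dims.items()):
        diff = abs(dw - dc)
        results[(w, c)] = (round(diff, 3), diff / mean_dim <= max_relative_diff)
    return results


# Hypothetical dimensions, chosen to reproduce the differences in Table 5.
weather = {"mean_temperature": 1.347, "global_radiation": 1.443}
covid = {"new_cases": 1.396, "reproduction_rate": 1.250}

screening = screen_pairs(weather, covid)
candidates = [pair for pair, (_, keep) in screening.items() if keep]
```

Pairs that pass the screen are only candidates: as noted above, a small difference must still be confirmed by a correlation test, whereas a pair that fails can be dropped before any further computation.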

Discussions
The results obtained for the box-counting dimension of the new Covid-19 cases recorded in the 39 analysed European countries give us an insight into the accuracy of the data reported. Gathering such information is crucial for determining what kind of prediction model is suitable for each country and whether the available data might be an efficient training dataset for a prediction model. The difference between the box-counting dimension and other means of measuring the complexity of a structure is that the fractal dimension gives an insight into the complexity of the structure and also points to the noise which interferes with the data. Our classification, based on the fractal dimension of the analysed data, allows the exclusion of some countries whose data might reduce the accuracy of the prediction. Thus, besides offering an insight into the complexity of the data, the method we propose aims to facilitate the data selection step and helps obtain more accurate prediction models.
The highly significant negative correlation (-0.318) between mean air temperature and new Covid-19 cases found in this study (with a temperature range from -6.4 °C to 23.7 °C in the study period) is in line with the mainstream of similar research results.
There are many studies, undertaken in different climate regions of the world, reporting a strong negative correlation between these daily parameters. A worldwide study focusing on 166 countries (see [38]), in climates with temperatures ranging from -5.3 °C to 34.3 °C, found that air temperature and relative humidity were negatively correlated with daily cases, the number of new cases decreasing by 3.1% for each 1 °C increase in temperature. Similar results were obtained in a tropical, warm climate (temperatures between 16.8 °C and 27.4 °C) in Brazil, where the reduction rate was estimated at 4.9% for a unit increase in temperature (see [39]). A considerably stronger effect, with a decrease in new cases reaching 13.53% for a 1 °C temperature increment (see [40]), was found in Africa (temperatures in the study samples spanning from -2.4 °C to 33.4 °C). In Jakarta, Indonesia, also in a warm climate region, the average daily temperatures (from 26.1 °C to 28.6 °C) were significantly correlated with COVID-19 cases (correlation coefficient equal to 0.392), while other weather data (Tmin, Tmax, humidity and rainfall) were found not to be correlated (see [41]).
A study (see [42]) focusing on the initial period of the pandemic (from December 2019 to March 2020) for all countries (including China) reported a significant negative correlation between daily air temperatures and COVID-19 incidence (temperature range -33.9 °C to 34.3 °C). The significant effect of daily temperatures on disease incidence was also documented in China (see [43]), for temperatures ranging from -22 °C to 26 °C, in New York State, USA, for a temperature interval between -3.4 °C and 25 °C (see [44]), and in Spain (see [45]), where the temperature interval in the study period was [1 °C, 23.2 °C].
The relationships between air temperature and the occurrence of new Covid-19 cases are far from being fully understood. A large number of research outcomes, some of them mentioned above, indicate a significant negative correlation, but there are also studies reporting the opposite. For instance, a study undertaken in the Norwegian capital (see [46]), for a period with air temperature spanning from -0.5 °C to 21.9 °C, reported positive correlations between daily new cases and both maximum and mean temperatures (correlation coefficients of 0.347 and 0.293, respectively). There is also a Chinese study (see [47]), analysing 122 Chinese cities, documenting a 4.86% increase in COVID-19 cases for each 1 °C temperature increase, but only in cold periods, with temperatures below 3 °C.
The other meteorological variable considered in this study, global radiation, was found to be more strongly correlated with Covid-19 new cases than air temperature (correlation coefficient -0.386, compared with -0.318; the difference is emphasised by the associated F values, 63.56 and 40.44 respectively). As mentioned before, solar radiation has seldom been studied, but a study in Rio de Janeiro, Brazil (see [48]), covering two months in the spring of 2020, found similar results: a very significant correlation of both temperature and solar radiation with Covid-19 incidence, considerably stronger for the latter (correlation coefficient -0.609, compared with -0.406 for air temperature).

Conclusions
The present paper is composed of two main parts which cannot be separated. The main novelty of the first part is computing the fractal dimension of the epidemiologic curves in order to obtain new characteristics of the complexity of the pandemic in the countries in Europe. Moreover, the analysis of the fractal features of the reproduction rates is also a new idea brought by the present paper.
On the one hand, the classification that we propose of the epidemic spread in different countries, based on the fractal dimension of the epidemiologic curves, can be a significant tool in selecting the datasets which can be used for modelling the pandemic and predicting its behaviour. The accuracy of the data, which is of utmost importance, is distinctly tracked by the fractal dimension of the epidemiologic curves. On the other hand, the analysis of the reproduction rate that we propose revealed the existence of noticeable fractal patterns. Treating the reproduction rate as a fractal can further prove to be an efficient tool in discovering more much-needed information, not only about the Covid-19 pandemic, but also about other types of contagious diseases.
Moreover, the differences between the fractal dimensions of the graphical representations of the meteorological time series and of the Covid-19 daily progress parameters could serve as a preliminary analysis criterion, enabling the selection of potentially significantly correlated variable pairs (in practice, eliminating the improbable candidates), thus increasing research efficiency.
Weather conditions influence COVID-19 dynamics (both directly, affecting virus survival time and human host vulnerability, and indirectly, affecting people's behaviour). Of the two meteorological variables considered herein, daily global radiation was found to be more strongly linked with Covid-19 new cases than air temperature (correlation coefficient -0.386, compared with -0.318). Given this stronger correlation, we recommend daily solar radiation amounts as the first choice for forecasting models of pandemic dynamics, especially for those using a single meteorological variable in order to increase stability (avoiding the issues caused by mutually correlated weather variables). Solar radiation, the primary energy source for atmospheric processes, which decisively affects all other weather elements (including air temperature), also has a direct influence on the virus spread, due to its sanitising action, given the lethal effect of the UV spectral component on viruses.