1. Introduction
The COVID-19 pandemic began in late 2019 and quickly manifested itself as a massive increase in global mortality. However, it also raised problems of attribution and causation. As a result, there are two broad approaches to analyzing or modeling COVID-19 mortality data. The first is to use COVID-19-attributed mortality specifically. This has the benefit of a clear causal structure, in which patterns in the data can be more easily connected to the spread of the pandemic [19]. However, there are notable problems with proper attribution. Deaths are not attributed to COVID-19 absent a positive COVID-19 test, which is not always possible in locations with limited testing capacity [26]. This is to say nothing of other infrastructure issues, or of deaths that are missed because they were not directly caused by COVID-19 but by complications of an existing condition or by the response to the pandemic [26, 28]. Nevertheless, mortality data directly attributed to COVID-19 have the major benefit that the cause of each counted death is unambiguous.
The second approach is to analyze overall mortality, usually through the concept of excess mortality. This is defined as the normalized difference between aggregate (observed) deaths and an expected (historical) death count [9]. Although there is some ambiguity in calculating the expected number of deaths for a particular location, this approach has the benefit of capturing systemic effects the pandemic may have had [21, 28]. For example, it captures the increased mortality among people with existing conditions who contract COVID-19 [21], as well as possible increases in suicides [19]. However, the data will also reflect decreases, such as the drop in vehicle-related deaths during lockdowns [26]. This approach therefore captures the net effect of the pandemic on mortality. The drawback is that such offsetting decreases can mask the magnitude of the pandemic's upward effect on mortality.
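The definition of excess mortality above can be made concrete with a small sketch. It rests on assumptions not stated in the text: the expected count for a week is taken as the mean of the same week across a set of baseline (pre-pandemic) years, and the difference is normalized by that expected count (a P-score-style measure). The definition in [9] may use a different baseline, and the function name and input layout are hypothetical.

```python
from statistics import mean


def excess_mortality(observed, baseline_years):
    """Normalized weekly excess mortality (a P-score-style sketch).

    observed: weekly death counts for the study period.
    baseline_years: list of pre-pandemic years, each a list of weekly counts
        aligned week-by-week with `observed` (hypothetical input layout).

    Assumption: expected deaths for a week are the mean of that week across
    the baseline years, and the difference is normalized by the expected count.
    """
    excess = []
    for week, deaths in enumerate(observed):
        # Expected (historical) count for this calendar week.
        expected = mean(year[week] for year in baseline_years)
        # Positive values mean more deaths than expected; negative, fewer.
        excess.append((deaths - expected) / expected)
    return excess


print(excess_mortality([110, 90], [[100, 95], [100, 105]]))  # prints [0.1, -0.1]
```

The negative value in the second week illustrates the netting discussed above: weeks with fewer deaths than expected offset weeks with more.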
Multiple approaches are used to model the resulting data. Ensemble models have been suggested by the Centers for Disease Control and Prevention (CDC) [17] and used in [16] and [26]. Generalized additive models were used in [21] and [4] to relate demographic or location data to mortality. ARIMA models are also used frequently; see, for example, [19] and [26]. Copula models have been used to characterize the dependence between mortality and other time series, for example the correlation of trends across states [18] or the joint behavior of mortality and temperature [3]. Here we follow the method laid out in [19], in which an ARIMA model is fitted to each location separately and the model residuals are then related to each other through copula analysis. This allows seasonality and within-country effects to be accounted for before cross-correlation between countries is addressed. It also accommodates non-normal residuals: although these technically violate the ARIMA assumptions, fat-tailed residual distributions can then be interpreted as an indication of a more complex dependence structure.
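The two-stage structure described above (a per-country time-series model first, then a copula on the residuals) can be sketched as follows. This is a simplified illustration, not the implementation of [19]. As an assumption, a least-squares AR(p) fit with an intercept (no differencing or moving-average terms) stands in for a full ARIMA model. The dependence between residual series is summarized by a Gaussian copula whose correlation parameters are obtained by inverting Kendall's tau through the relation rho = sin(pi * tau / 2), which holds for elliptical copulas. [19] may use other copula families or estimators, and all function names are hypothetical.

```python
import numpy as np


def ar_residuals(series, p=1):
    """Fit an AR(p) model with intercept by least squares and return residuals.

    Simplifying assumption: stands in for a full ARIMA fit (no differencing
    or moving-average terms), purely to illustrate the two-stage structure.
    """
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Columns: intercept, then lags 1..p aligned with the targets x[p:].
    lags = [x[p - k - 1 : n - k - 1] for k in range(p)]
    design = np.column_stack([np.ones(n - p)] + lags)
    target = x[p:]
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def kendall_tau(a, b):
    """Kendall's tau-a between two equal-length sequences (no tie correction)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n = len(a)
    score = 0.0
    for i in range(n - 1):
        # +1 for each concordant pair (i, j > i), -1 for each discordant pair.
        score += np.sum(np.sign(a[i + 1:] - a[i]) * np.sign(b[i + 1:] - b[i]))
    return score / (n * (n - 1) / 2)


def gaussian_copula_correlation(residuals):
    """Pairwise Gaussian-copula correlations between residual series.

    Uses rho = sin(pi * tau / 2); the estimate depends on the residuals
    only through their ranks, as a copula-based measure should.
    """
    k = len(residuals)
    rho = np.eye(k)
    for i in range(k):
        for j in range(i + 1, k):
            tau = kendall_tau(residuals[i], residuals[j])
            rho[i, j] = rho[j, i] = np.sin(np.pi * tau / 2)
    return rho
```

In use, `ar_residuals` would be applied to each country's series with a common `p`, so that the residual series stay aligned week by week, and `gaussian_copula_correlation` to the resulting list of residual arrays. Because each entry is estimated pairwise, the matrix is not guaranteed to be positive semidefinite and may need adjustment before being used as a copula parameter.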
The objective of the present paper is to analyze excess mortality time series for several countries over a period of T weeks and to model the dependence patterns in the resulting vector of series through their ARIMA residuals.
Mortality statistics are particularly difficult to compare across countries for several reasons. First and foremost, different countries have different standards for recording deaths. For example, England records only the date a death is "registered," while the United States records mortality statistics using the date of death [4, 22]. This makes comparisons involving countries that do not record the date of death difficult, as actual mortality experience will not be reflected in the data. While a close reconstruction of weekly data is possible (see [21]), it still leaves open the problem of a death being registered several weeks after it occurs, which makes it impossible to assign the death to the week in which it occurred. Second, countries differ in how they define a "week" and in how many weeks there are in a calendar year. For example, the European countries in this study (France, Germany, Norway, Sweden) record their weekly mortality data as the sum of deaths occurring from Monday to Sunday, while the United States and Canada record theirs as the sum of deaths from Sunday to Saturday [22]. This somewhat weakens the interpretation of the resulting models, but absent large spikes confined to a single day, it should not affect overall trends.
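To illustrate why the week convention matters, the sketch below aggregates daily death counts into weeks that start on a chosen weekday. It assumes daily counts by date of death are available, which, as noted above, is not the case for registration-based series; the function name, input format, and example counts are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def weekly_totals(daily_deaths, week_start):
    """Sum daily death counts into weeks beginning on `week_start`.

    daily_deaths: dict mapping datetime.date -> deaths on that day (hypothetical).
    week_start: weekday number as in date.weekday() (Monday=0, ..., Sunday=6).
    Returns a dict mapping the first day of each week -> total deaths.
    """
    totals = defaultdict(int)
    for day, deaths in daily_deaths.items():
        # Days elapsed since the most recent occurrence of week_start (0..6).
        offset = (day.weekday() - week_start) % 7
        totals[day - timedelta(days=offset)] += deaths
    return dict(sorted(totals.items()))


# Hypothetical spike on Sunday 5 January 2020, then one death on Monday 6 January.
daily = {date(2020, 1, 5): 10, date(2020, 1, 6): 1}
monday_weeks = weekly_totals(daily, week_start=0)  # Monday-Sunday convention
sunday_weeks = weekly_totals(daily, week_start=6)  # Sunday-Saturday convention
print(monday_weeks)
print(sunday_weeks)
```

Under Monday-Sunday weeks, the Sunday spike falls in the week beginning 30 December 2019, whereas under Sunday-Saturday weeks it is grouped with the following Monday in the week beginning 5 January 2020. Weekly totals differ only when deaths fall on the boundary day, which is why the discrepancy matters mainly for single-day spikes.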