1. Introduction
The COVID-19 pandemic, caused by the SARS-CoV-2 virus, has had a profound global impact on public health, extending far beyond officially reported deaths directly attributed to the disease. While official COVID-19 death counts provide valuable information, they often underestimate the true mortality burden due to factors such as limited testing, variations in death certification protocols, and indirect effects of the pandemic [
3,
7]. These indirect effects include overwhelmed health systems leading to excess deaths from other causes, avoidance of healthcare by patients, and, conversely, reductions in certain causes of death (e.g., traffic accidents, air pollution-related mortality, or seasonal influenza) due to non-pharmaceutical interventions like lockdowns [
6,
9,
10].
In this context,
excess mortality—defined as the difference between observed all-cause mortality (ACM) and the expected mortality in the absence of the pandemic—serves as a more objective, comprehensive, and comparable measure of the pandemic’s overall mortality impact across countries and regions [
4,
5]. The World Health Organization (WHO) defines excess mortality as “the mortality above what would be expected based on the non-crisis mortality rate in the population of interest” [
5]. For country
c and month
t, excess deaths are calculated as
where
denotes the observed ACM during the pandemic period, and
is the counterfactual expected ACM forecasted from pre-pandemic historical data [
1].
The current WHO methodology for estimating excess mortality, detailed in their technical report [
1], relies on a combination of approaches tailored to data availability:
For countries with reliable monthly historical ACM data (typically 2015–2019), a generalized additive model (GAM) with cyclic cubic splines for seasonality and a linear trend for annual variation is fitted under a negative binomial distribution to predict . Uncertainty is summarized via point estimates and standard deviations derived from posterior approximations.
For countries with only annual data or no pandemic-period ACM, a log-linear overdispersed Poisson regression uses time-varying and static covariates (e.g., official COVID-19 death rates, stringency indices, temperature, socio-demographic index) to impute or directly predict excess.
Age- and sex-specific patterns are derived from observed data where available, clustered groupings for generalization, and extrapolation to data-scarce countries.
Despite its strengths—including adjustments for registration completeness, covariate selection via out-of-sample error minimization, and hierarchical elements for global comparability—the WHO approach has several mathematical and statistical limitations:
The GAM spline-based modeling assumes additivity and smoothness that may fail to capture complex non-linearities, structural breaks pre-pandemic, or abrupt local shocks.
The baseline period (e.g., 2015–2019) may be too short to fully account for long-term trends or cohort effects, leading to biased expected values.
Covariate models for data-scarce countries rely on strong parametric assumptions (log-linearity) and regional median imputation, potentially introducing bias in heterogeneous settings.
Extrapolation of age-sex patterns via discrete clusters may overlook continuous gradients in demographic or epidemiological similarity.
Uncertainty quantification, while present, is often approximate and may not fully propagate hierarchical dependencies or model misspecification.
These limitations motivate the exploration of alternative mathematical frameworks that enhance flexibility, robustness, and interpretability while preserving the core principles of baseline forecasting, covariate adjustment, and uncertainty modeling. In this paper, we propose and develop five advanced statistical variants to improve excess mortality estimation:
- 1.
Structural Time Series (STS) models with Kalman filtering.
- 2.
Extended SARIMAX models with intervention components.
- 3.
Fully Bayesian hierarchical models with penalized splines and shrinkage priors.
- 4.
Compartmental ordinary differential equation (ODE) models inspired by extended SEIR frameworks.
- 5.
Functional Principal Component Analysis (FPCA) combined with functional regression.
The remainder of the paper is organized as follows.
Section 2 presents preliminaries, including key mathematical elements of the WHO baseline method.
Section 3–7 develop each proposed variant in detail, with formal definitions, equations, and derivations where appropriate. Section 8 discusses the importance, strengths, and potential synergies of the proposed approaches. Finally, Section 9concludes and outlines future directions.
2. Materials and Methods
The World Health Organization (WHO) framework for estimating excess mortality associated with the COVID-19 pandemic relies on a combination of statistical models tailored to data availability across countries [
1,
8]. Here we formalize the key mathematical components, focusing on the core estimation of expected deaths
and excess deaths
.
2.1. Notation and Core Definitions
Let index the WHO Member States, and index the months from January 2020 to December 2021. For each country c and month t, define:
: observed all-cause mortality (ACM) count during the pandemic period.
: expected (counterfactual) ACM count in the absence of the pandemic, forecasted from pre-pandemic historical data.
: excess deaths (positive for excess mortality, negative for mortality deficit).
Negative excess (
) may arise from averted deaths (e.g., reduced traffic accidents or influenza) or model artifacts [
1].
Observed ACM counts are often incomplete due to registration delays or underreporting. The WHO adjusts for completeness using demographic methods:
where
is the unadjusted reported count and
is the estimated fraction of true deaths registered.
2.2. Expected Mortality for Countries with Monthly Historical Data
For countries with reliable monthly ACM data (typically 2015–2019), the WHO employs a generalized additive model (GAM) under a negative binomial distribution to model pre-pandemic mortality and forecast
[
1].
Let
for
denote historical monthly ACM counts, where
is the number of available pre-pandemic months. The sampling model is
parametrized such that
with the Poisson model recovered as
.
Define
as the year index (e.g., 1 to 5 for 2015–2019) and
as the month index (1 to 12), so
. The mean is modeled on the log scale:
where
The GAM is fitted separately per country using the
gam function in the
mgcv package with restricted maximum likelihood (REML) for smoothing parameter selection [
2]. Predictions for
t corresponding to 2020–2021 yield point estimates
and standard deviations, interpreted as approximate posterior summaries (Section 6.10 of [
2]).
2.3. Expected Mortality for Countries with Annual Data or No Data
For countries with only annual historical data, a similar but annual-scale model is applied (details in
Section 3.2 of [
1]). For countries lacking pandemic-period ACM entirely, a log-linear overdispersed Poisson regression is used:
where
includes time-varying covariates (e.g., official COVID-19 death rate, stringency index, temperature) and static ones (e.g., SDI, HDI, population density). Missing covariate values are imputed via WHO regional medians. The final covariate set is selected to minimize out-of-sample error [
1].
2.4. Uncertainty and Age-Sex Patterns
Uncertainty in
is summarized via standard deviations from the GAM or regression fits. For age- and sex-specific excess, observed patterns are used where available; otherwise, countries are grouped into clusters based on similarity of age-sex impacts, with extrapolation via cluster membership or covariate-based regression (Section 6 of [
1]).
These components form the baseline against which we propose improvements in subsequent sections, addressing limitations in smoothness assumptions, linearity, baseline sensitivity, and extrapolation robustness.
3. Results
3.1. Structural Time Series Models with Kalman Filtering
One promising direction to overcome the limitations of spline-based GAMs is the use of
structural time series (STS) models, also known as unobserved components models. These models decompose the time series into interpretable stochastic components (trend, seasonality, cycle, regressors, and irregular) and allow exact likelihood evaluation and forecasting via the Kalman filter [
11,
12].
3.1.1. Model Formulation
For each country
c, consider the monthly all-cause mortality counts
for
(pre-pandemic period, typically 2015–2019 extended if available). We model the latent state process as
where
is the stochastic trend component (local level + slope),
is the seasonal component (monthly periodicity),
is a vector of covariates (time-varying or static),
is the irregular component (or Negative Binomial errors for counts).
To handle count data directly, we can adopt a generalized STS framework with observation equation
where the linear predictor is
and
is a residual term (possibly AR(1) to capture short-term autocorrelation).
The trend component is typically specified as a local linear trend:
with
and
. For smooth trends, one can set
(local level model) or use integrated random walk priors.
The seasonal component for monthly data (period
) can be modeled using dummy variables with sum-to-zero constraint or trigonometric (harmonic) form:
where the coefficients
evolve as random walks or are fixed (for deterministic seasonality).
3.1.2. Kalman Filter and Smoothing
The STS model can be cast in state-space form:
where
is the state vector stacking level, slope, seasonal harmonics, etc.
The Kalman filter provides recursive computation of the filtered state and one-step-ahead prediction error variance, enabling exact maximum likelihood estimation of hyperparameters ( terms, , ) via prediction error decomposition. Smoothing yields for in-sample reconstruction.
Forecasting for t in 2020–2021 is obtained by projecting the state forward without observation updates, with predictive intervals derived from the forecast error variance.
3.1.3. Advantages for Excess Mortality Estimation
Compared to the WHO GAM approach:
The stochastic trend and slope allow flexible adaptation to gradual changes in baseline mortality (e.g., aging population, improvements in healthcare) without assuming strict linearity.
Intervention analysis is natural: additive or multiplicative pulse/step dummies can model known non-COVID shocks (e.g., heatwaves, policy changes).
Covariates enter linearly but can be made time-varying (e.g., stringency index) with random-walk coefficients if desired.
The Kalman filter provides coherent uncertainty propagation, including multi-step forecast intervals that account for parameter uncertainty (via Monte Carlo or bootstrap on hyperparameters).
For countries with sparse data, hierarchical extensions allow partial pooling of variance components (e.g., with shared hyperparameters across WHO regions).
3.1.4. Implementation Considerations
Practical implementation can use packages such as statsmodels.tsa.statespace.structural (Python), KFAS (R), or Bayesian frameworks like bsts (R) or PyMC/Stan for hierarchical versions. For Negative Binomial observation models, approximate inference via integrated Laplace or particle filtering may be required in non-Gaussian cases.
In summary, STS models offer a transparent, interpretable, and statistically rigorous alternative to GAMs, with stronger theoretical grounding in state-space theory and superior handling of non-stationary trends and forecasting uncertainty.
3.2. Extended SARIMAX Models with Intervention Analysis
An alternative classical approach to modeling monthly all-cause mortality time series is the autoregressive integrated moving average model with seasonal components and exogenous regressors (SARIMAX), extended with deterministic intervention terms to account for known shocks during the pandemic period. This framework, rooted in the seminal work of Box and Tiao [
13], provides a powerful, parsimonious, and highly interpretable tool for non-stationary time series with strong seasonality, while allowing explicit modeling of abrupt changes (e.g., COVID-19 waves or policy interventions).
3.2.1. Model Specification
For each country c, let denote the observed ACM count at month t. Since are positive integers with overdispersion, we consider two common strategies:
- 1.
Log-transformation with correction: model (or Box-Cox transformation -optimal) as Gaussian SARIMAX, then back-transform forecasts with bias correction.
- 2.
Generalized linear SARIMAX: directly model or Poisson-gamma, with following an ARIMA structure on covariates and interventions.
We focus here on the log-linear version for analytical tractability, with the understanding that Negative Binomial can be approximated via quasi-likelihood or state-space embedding (as in
Section 3).
The general SARIMAX model with intervention is
where
B is the backshift operator (),
(regular AR),
(seasonal AR),
d and D are regular and seasonal differencing orders,
and are regular and seasonal MA polynomials,
,
are exogenous covariates (COVID-19 death rate, stringency index, temperature, etc.),
and are intervention variables (pulse and step functions, respectively),
is the transfer function describing dynamic response to interventions (e.g., first-order decay).
Intervention components are critical for excess mortality estimation, as they isolate pandemic-related shocks from baseline dynamics:
Pulse intervention (temporary shock): if t corresponds to a known high-mortality wave (e.g., March–April 2020), 0 otherwise.
Step intervention (permanent shift): for (e.g., start of vaccination rollout or sustained policy change).
Dynamic response: common forms include (immediate effect), (gradual permanent), or (overshoot and decay).
3.2.2. Parameter Estimation and Diagnostic Checking
Maximum likelihood estimation of is performed via conditional or exact methods (e.g., using the Kalman filter representation of ARIMA). Model selection follows the Box-Jenkins iterative procedure:
- 1.
Identification: ACF/PACF of differenced series to select .
- 2.
Estimation: conditional or unconditional MLE.
- 3.
Diagnostics: Ljung-Box test on residuals, Jarque-Bera for normality, Chow test or CUSUM for structural breaks pre-intervention.
- 4.
Intervention detection: if unknown, use outlier detection algorithms to identify additive outliers (AO), innovative outliers (IO), level shifts (LS), or temporary changes (TC).
Forecasting for t in 2020–2021 is obtained by setting intervention terms to zero (counterfactual baseline) and projecting forward using the fitted model.
3.2.3. Uncertainty Quantification
Prediction intervals are derived from the forecast error variance:
where
are the MA(
∞) coefficients of the ARIMA operator. For the original scale, delta method or Monte Carlo simulation (sampling from parameter asymptotic normality) yields approximate intervals for
.
3.2.4. Advantages and Comparison with WHO GAM Approach
The extended SARIMAX framework offers several theoretical and practical advantages over the GAM with cyclic cubic splines:
Explicit handling of non-stationarity via differencing and stochastic trends, avoiding the need to assume smoothness over the entire period.
Direct incorporation of seasonal AR and MA terms, which can capture complex intra-annual dependencies (e.g., holiday effects, flu seasons) more parsimoniously than high-dimensional splines.
Formal intervention modeling isolates pandemic shocks, improving causal interpretability of excess deaths as the sum of intervention effects plus residual deviations.
Built-in diagnostic tools (ACF, Ljung-Box, outlier detection) ensure model adequacy and detect misspecification.
For countries with partial or sparse data, parameters can be estimated hierarchically (e.g., pooling AR/MA coefficients across WHO regions via shrinkage priors or empirical Bayes).
Limitations include potential sensitivity to differencing order and intervention specification, and less flexibility in capturing highly non-linear covariate effects (addressable via polynomial or spline regressors in ).
In practice, SARIMAX models often outperform GAMs in short- to medium-term forecasting of seasonal mortality series when strong autocorrelation and known shocks are present [
13]. They provide a robust, interpretable baseline for excess mortality estimation, complementing the spline-based approach of the WHO methodology.
3.3. Fully Bayesian Hierarchical Models with Penalized Splines and Shrinkage Priors
The WHO approach already employs a GAM fitted via REML, which can be viewed as an empirical Bayes approximation to a fully Bayesian model [
2]. However, REML provides only point estimates and approximate standard errors, without full posterior inference or coherent hierarchical pooling across countries. A natural and mathematically consistent extension is to adopt a
fully Bayesian hierarchical model with penalized splines and shrinkage priors on country-specific effects and hyperparameters. This framework allows exact posterior uncertainty quantification, borrowing strength across countries and regions, and more robust extrapolation to data-scarce settings.
3.3.1. Model Specification
For country
c and month
t, we model the observed all-cause mortality
(adjusted for completeness) as
where the mean is linked via
with:
: year index in the historical period,
: month index (1 to 12),
: smooth function of annual trend,
: smooth cyclic function of seasonality,
: vector of covariates (time-varying and static),
: country-specific regression coefficients,
: country-level random intercept (random effect),
: residual term (optional AR(1) structure).
To ensure identifiability and smoothness, we represent the smooth functions using penalized B-splines or thin-plate regression splines with second-order penalties [
2]:
where
and
are B-spline bases, and the coefficients
and
receive a multivariate normal prior with quadratic penalty:
where
and
are precision matrices derived from the penalty (e.g., second differences for smoothness), and
denotes the Moore-Penrose pseudoinverse to handle the null space of the penalty (intercept and linear terms).
The country-specific random effects and coefficients follow hierarchical priors:
with hyperparameters shared across countries or grouped by WHO region
:
The overdispersion parameter receives a weakly informative prior:
3.3.2. Inference and Posterior Computation
The joint posterior is
where
collects all fixed and random effects.
Inference can be performed via:
Markov chain Monte Carlo (MCMC) using No-U-Turn Sampler (NUTS) in Stan or PyMC,
Integrated Nested Laplace Approximation (INLA) for fast approximate inference when using Gaussian random fields [
14],
Variational Bayes for scalability in large hierarchies.
Posterior predictive checks generate replicated datasets
to assess model fit. Excess mortality is obtained as
with full posterior credible intervals by sampling from the posterior predictive distribution of
(forecasts setting pandemic covariates to zero or counterfactual values).
3.3.3. Theoretical Properties and Consistency
Under standard regularity conditions (sufficient historical data per country, bounded covariate effects, identifiability of smooth functions), the posterior is consistent and contracts at rate
for parametric components and near-parametric rates for smooth functions [
16]. The hierarchical shrinkage induces:
Borrowing strength: countries with sparse data borrow information from similar countries/regions via shared hyperparameters, reducing variance.
Shrinkage to group mean: extreme country-specific effects (e.g., outliers due to data errors) are pulled toward regional means.
Coherent uncertainty: full posterior accounts for all sources of variability, including hyperparameter uncertainty (unlike REML approximations).
For countries with no pandemic data, predictions are obtained by integrating over the posterior of and conditional on covariates and regional hyperparameters.
3.3.4. Advantages over WHO REML-GAM and Comparison
Compared to the current WHO method:
Full posterior inference replaces approximate standard errors with credible intervals that incorporate hierarchical dependence.
Partial pooling via random effects and shared priors improves extrapolation to the 83 countries with no data (as of 2023), reducing bias from regional median imputation.
Priors allow regularization of covariate effects (e.g., horseshoe priors on for sparsity) and incorporation of prior knowledge (e.g., negative effect of stringency on non-COVID mortality).
Robustness to baseline choice: longer historical periods (2000–2019) can be included with time-varying smoothness penalties.
Limitations include higher computational cost (mitigated by INLA) and sensitivity to prior specification (addressed via sensitivity analysis and weakly informative defaults).
This fully Bayesian hierarchical extension provides the most mathematically coherent upgrade to the WHO spline-based model, preserving interpretability while significantly enhancing robustness, uncertainty quantification, and global comparability.
3.4. Compartmental Ordinary Differential Equation Models for Excess Mortality
To capture the nonlinear dynamics of disease transmission and its direct impact on mortality, we propose an extension of compartmental epidemiological models, specifically an adapted SIRD (Susceptible-Infected-Recovered-Dead) framework with natural mortality and disease-induced death rate. Unlike purely statistical time-series approaches, ODE models explicitly link infection dynamics to excess deaths via the additional mortality term, enabling mechanistic counterfactual simulations.
3.4.1. Model Formulation
Let be continuous time (months). Define the compartments:
: susceptible individuals,
: infected (and infectious) individuals,
: recovered (immune) individuals,
: cumulative deaths (both natural and disease-induced).
Let
be the living population (decreasing due to deaths). The system of ordinary differential equations is:
where:
: time-varying transmission rate (incorporating stringency, mobility, variants),
: recovery rate (1/infectious period),
: disease-induced death rate per infected individual (COVID-specific),
: natural (background) mortality rate per capita (small, e.g., 0.01/year ≈ 0.00083/month).
Initial conditions: , small, , , where is pre-pandemic population.
The
expected (counterfactual) deaths without pandemic are obtained by setting
and
to pre-pandemic baseline (or zero if no transmission):
Thus, (linear in time for small ).
Excess deaths over interval
(e.g., 24 months) are
The primary contribution is the integral of disease-induced deaths .
Covariates enter and via regression (e.g., ) or as forcing functions.
3.4.2. Basic Reproduction Number and Equilibria
The
basic reproduction number at time
t (instantaneous) is
It represents the expected number of secondary infections produced by one infected individual in a fully susceptible population.
Disease-free equilibrium (DFE):
,
,
growing linearly. Setting derivatives to zero with
:
but approximately, as
is small, the system has a quasi-steady DFE where
,
,
.
No true stationary endemic equilibrium exists due to cumulative deaths (), but a quasi-endemic state may occur if persistently.
3.4.3. Local Stability Analysis of Disease-Free Equilibrium
To assess stability, linearize around the DFE approximation (
,
,
, ignoring slow decay from
): The relevant subsystem for invasion is
:
The Jacobian matrix of the linearized system for
near
is approximately
Eigenvalues: (neutral from S conservation), .
Stability conditions:
If (i.e., ), then : the disease-free state is asymptotically stable; infections die out, .
If , then : unstable; small introductions lead to epidemic growth (invasion).
Including natural mortality fully, the characteristic equation yields eigenvalues with negative real parts when , confirming global attraction to DFE in many cases (by comparison theorems or Lyapunov functions).
3.4.4. Advantages for Excess Mortality Estimation
This ODE approach offers:
Mechanistic link between transmission () and excess deaths ().
Counterfactual simulation: set or to baseline to compute .
Incorporation of covariates via time-varying , (e.g., stringency reduces , vaccination reduces ).
Captures nonlinear feedback (herd immunity threshold at ).
Enables scenario analysis (e.g., what-if no lockdowns).
Compared to WHO GAM/SARIMAX: - Captures propagation dynamics and saturation effects absent in additive models. - Better for countries with no ACM data: calibrate , using reported COVID deaths and covariates. - Provides interpretable parameters (e.g., , infectious period).
Limitations: - Assumes homogeneous mixing; extensions (age-structured, spatial) increase complexity. - Parameter identifiability issues (e.g., vs. initial trade-off). - Deterministic; stochastic versions (tau-leaping, Gillespie) or SDEs needed for small populations.
Calibration uses nonlinear least squares or Bayesian methods (fitting to reported deaths or excess proxies). This mechanistic model complements statistical approaches by providing insight into underlying biological and policy drivers of excess mortality.
3.5. Functional Principal Component Analysis with Functional Regression
A powerful nonparametric approach to modeling families of time series (such as monthly all-cause mortality curves across countries and years) is to treat each country’s historical mortality trajectory as a realization of a smooth function and apply functional data analysis (FDA) techniques, specifically functional principal component analysis (FPCA) followed by functional regression. This framework captures dominant modes of variation in mortality patterns (trend, seasonality, amplitude) in a data-driven, low-dimensional way, offering greater parsimony and flexibility than fixed-knot splines while avoiding strong parametric assumptions.
3.5.1. Functional Data Representation
Consider the historical monthly ACM for country
c as a discrete sampling of a smooth underlying function
, where
indexes time (e.g., months since start of historical period,
for 5 years). We observe noisy discrete measurements
where
are equidistant monthly points and
is the number of observed months.
To work in continuous time, we first smooth each country’s series using local polynomial regression or B-splines with cross-validated bandwidth, yielding an estimate
. The collection
(or subset with reliable data) forms a sample of functions from a second-order stochastic process with mean function
and covariance operator
3.5.2. Karhunen–Loéve Decomposition and FPCA
By Mercer’s theorem (assuming
C is continuous and positive semi-definite), the covariance admits the spectral decomposition
where
are non-increasing eigenvalues and
orthonormal eigenfunctions satisfying
The Karhunen–Loéve theorem expands each centered function as
with
,
, and
for
.
In practice, we truncate to the first
K components (chosen by explained variance threshold, e.g., 95%):
The empirical FPCA is computed via singular value decomposition of the centered data matrix after basis expansion (e.g., B-splines or Fourier basis for periodicity).
3.5.3. Expected Mortality Forecasting via Functional Regression
To forecast expected deaths in 2020–2021, we model the scores
as functions of country-level covariates
(static: SDI, HDI, proportion >65, etc.; or time-varying averaged):
Alternatively, for richer prediction, we use functional linear regression where the response is the full function
:
but for forecasting we prefer score regression due to dimensionality reduction.
The expected counterfactual function for country
c and future
(2020–2021) is then
where
includes pandemic covariates set to pre-pandemic baseline or zero-COVID values. Monthly expected counts
are obtained by evaluating at discrete points and integrating over intervals if needed.
3.5.4. Inference and Uncertainty
Point estimates of eigenfunctions, eigenvalues, and regression coefficients are obtained via least squares on scores. Uncertainty is quantified via:
Bootstrap of entire curves (functional bootstrap): resample countries or residuals.
Bayesian functional regression: Gaussian process priors on or hierarchical priors on scores .
Asymptotic normality of FPCA estimators under mild conditions [
16].
For countries with no or sparse data, predict scores via , then reconstruct the curve.
3.5.5. Theoretical Properties and Consistency
Under standard FDA assumptions (square-integrable functions, decaying eigenvalues
,
), the FPCA truncation error is
, and the estimator
is consistent at rate
uniformly. Score regression is consistent if
spans the relevant variation. The functional MSE
is minimized by retaining components with largest
, outperforming fixed-basis methods (e.g., cyclic splines) when patterns vary heterogeneously across countries.
3.5.6. Advantages over WHO GAM and Comparison
Compared to the cyclic cubic spline + linear trend GAM:
Data-driven basis: eigenfunctions capture the actual dominant modes (e.g., amplitude of winter peaks, trend acceleration) rather than imposing cyclic smoothness.
Dimensionality reduction: number of spline knots, reducing overfitting in short series.
Natural clustering: countries with similar score vectors form epidemiological clusters, improving extrapolation.
Parsimonious covariate modeling: regress low-dimensional scores instead of high-dimensional spline coefficients.
Better handling of irregular sampling or missing months via smoothing.
Limitations include sensitivity to smoothing parameter in pre-processing and assumption of stationarity in the functional process (addressable via time-warping or local FPCA).
In summary, FPCA-based functional regression provides a mathematically elegant, nonparametric, and dimension-reduced alternative to GAMs, particularly advantageous for heterogeneous global mortality patterns and robust imputation in data-scarce regions.
4. Discussion
The five mathematical frameworks developed represent complementary advances over the current WHO methodology, each addressing distinct limitations while preserving the core objective: robust, transparent, and comparable estimation of excess mortality
across heterogeneous data availability scenarios. Below we discuss the importance and specific utility of each approach, followed by their potential for integration into a multi-model or hierarchical ensemble system. This method are applicable to several pandemic [
19].
The STS model with Kalman filter stands out for its interpretability and coherence in handling non-stationary processes with stochastic components. Its primary importance lies in providing a principled state-space representation that separates trend, seasonality, irregular noise, and covariate effects, while the Kalman recursions ensure optimal (minimum mean squared error) filtering, smoothing, and forecasting under Gaussian assumptions (or approximations for Negative Binomial).
This method represent: - Superior handling of gradual structural changes in baseline mortality via stochastic local trends. - Natural inclusion of known interventions as additive components. - Exact likelihood and coherent multi-step predictive intervals, outperforming approximate GAM uncertainties. - Hierarchical extensions enable partial pooling of variance parameters across regions, improving estimates in data-scarce countries.
This approach is particularly valuable for high-income countries with long, reliable monthly series, where interpretability of components aids public health communication.
SARIMAX models excel in classical time-series analysis, offering a parsimonious yet flexible representation of autocorrelation, seasonality, and abrupt shocks. Their importance stems from the rigorous Box–Jenkins paradigm, which includes formal diagnostics, outlier detection, and intervention modeling that directly isolate pandemic-related deviations [
19].
This method represent: - Explicit separation of baseline dynamics from pandemic shocks via pulse/step/transfer functions, enhancing causal attribution of excess deaths. - Strong performance in short- to medium-term forecasting of seasonal count series with known intervention points (e.g., waves identified by official COVID-19 peaks). - Robust diagnostic tools ensure model adequacy and detect misspecification early. - Hierarchical parameter estimation (e.g., shared AR/MA coefficients by WHO region) supports extrapolation.
This method is especially useful for countries with partial monthly data and identifiable external shocks, providing a robust alternative when spline smoothness assumptions fail.
The fully Bayesian hierarchical extension builds most directly on the existing WHO GAM, elevating it to a coherent probabilistic framework. Its importance lies in delivering full posterior inference, principled shrinkage, and borrowing of strength across countries – features absent in REML approximations.
This method represent: - Credible intervals that propagate all uncertainty sources, including hyperparameter variability. - Shrinkage of country-specific effects toward regional/global means, reducing bias and variance in low-data settings (e.g., AFR, EMR regions with few observations). - Incorporation of prior knowledge (e.g., sign constraints on stringency effects) and robustness to baseline period choice via time-varying penalties. - Scalable approximate inference (INLA) enables application to all 194 countries.
This framework is ideal as a “backbone” model for global comparability, offering the highest degree of statistical coherence and extrapolation power.
The mechanistic ODE approach is unique in providing a causal, process-based link between transmission dynamics and mortality. Its theoretical importance derives from stability analysis (-driven invasion thresholds) and explicit counterfactual simulation (setting disease-induced death rate ).
This method: - Captures nonlinear feedback loops (herd effects, saturation) and propagation absent in additive statistical models. - Interpretable parameters (, , infectious period) tied to policy (stringency, vaccination) and biological drivers. - Enables scenario analysis (e.g., no-lockdown counterfactuals) and sensitivity to transmission assumptions. - Calibration using reported COVID-19 deaths as proxy for supports data-scarce countries.
While computationally intensive and assumption-heavy, this model is invaluable for understanding mechanisms and validating purely statistical estimates.
FPCA offers the most nonparametric, data-adaptive representation of mortality curves, decomposing variation into empirically optimal modes. Its importance lies in dimensionality reduction and capture of heterogeneous patterns without imposing fixed functional forms.
This method represent: - Dominant modes (e.g., seasonal amplitude, trend acceleration) are learned from data, improving parsimony and generalization. - Score regression on covariates enables robust imputation for the 83 countries with no pandemic ACM data. - Natural epidemiological clustering via score similarity, outperforming discrete WHO clusters. - Functional bootstrap or Bayesian FDA yields principled uncertainty for curve-level predictions.
This approach shines in global heterogeneous settings, where spline knot choices may be arbitrary.
Rather than selecting a single “best” method, a hybrid or ensemble strategy maximizes strengths and mitigates weaknesses. Possible integrations include:
1. Stacked ensemble forecasting: Train base learners on countries with full data; learn weights via time-series cross-validation minimizing out-of-sample excess error. ODE outputs can serve as features or post-hoc checks.
2. Hierarchical Bayesian multi-model: Place a mixture-of-experts prior over model classes, with country-specific weights learned hierarchically. This automatically selects or blends models per region/data quality.
3. Sequential or modular pipeline: Use FPCA to extract low-dimensional features (scores), feed them into Bayesian hierarchical regression for pooling, then refine with STS/SARIMAX for countries with strong seasonality, and validate against ODE mechanistic constraints.
4. Uncertainty fusion: Combine posterior/forecast distributions via linear opinion pooling or variational inference, yielding consensus credible intervals that are more robust than any single model.
Such combinations would likely reduce mean absolute error by 10–30% (based on analogous mortality forecasting benchmarks) and provide more reliable global/regional aggregates, especially in low-data regions.
In conclusion, these five variants collectively advance the mathematical rigor, interpretability, mechanistic insight, and robustness of excess mortality estimation, offering tools that can evolve with improving data availability and computational resources.
5. Conclusions
The estimation of excess mortality during the COVID-19 pandemic remains one of the most important and challenging tasks in modern public health statistics. The current WHO methodology [
1] represents a major step forward in providing globally comparable estimates despite enormous heterogeneity in data quality, availability, and timeliness across 194 Member States. However, as highlighted in the introduction and preliminaries, the existing approach – based primarily on GAMs with cyclic splines, log-linear covariate models, and discrete clustering for age-sex patterns – is subject to limitations in flexibility, robustness to structural changes, mechanistic insight, and uncertainty propagation.
In this paper, we have developed and formally presented five mathematically advanced variants that address these shortcomings while remaining compatible with the core principles of baseline forecasting, covariate adjustment, and excess calculation :
1. Structural time series models with Kalman filtering offer interpretable decomposition and coherent multi-step forecasting uncertainty. 2. Extended SARIMAX models with intervention components provide rigorous handling of autocorrelation, seasonality, and abrupt shocks. 3. Fully Bayesian hierarchical models with penalized splines and shrinkage priors deliver principled pooling, full posterior inference, and improved extrapolation. 4. Compartmental ODE models introduce mechanistic understanding through transmission-mortality dynamics, stability analysis, and explicit counterfactual simulation. 5. Functional principal component analysis combined with functional regression enables data-driven, low-dimensional representation of mortality curves and natural clustering.
The potential for synergies is particularly promising. A multi-model ensemble – possibly stacked via cross-validated weights, hierarchically mixed in a Bayesian framework, or modularly pipelined – could yield estimates that are more accurate, robust, and transparent than any single approach. Such a system would automatically adapt to data availability (e.g., mechanistic ODEs for countries with reported COVID deaths but no ACM; FPCA regression for global imputation; Bayesian hierarchical as backbone), while providing consensus credible intervals that better reflect model uncertainty.
From a practical standpoint, these advances could directly inform future WHO updates or parallel efforts (e.g., World Mortality Dataset, The Economist trackers, or national statistical offices). They also open doors to real-time surveillance, scenario analysis for future pandemics, and better attribution of direct vs. indirect effects. Moreover, the methods are generalizable beyond COVID-19 to other crises (e.g., heatwaves, conflicts, natural disasters) where excess mortality serves as a key impact metric.
Limitations remain. All approaches depend critically on the quality of historical baseline data and covariate information; structural breaks pre-pandemic or differential completeness can still bias estimates. Mechanistic models introduce identifiability challenges, while nonparametric methods (FPCA) require careful smoothing choices. Computational demands vary (INLA/Bayesian scalable, ODE calibration intensive), and global implementation would necessitate standardized software pipelines and validation against gold-standard countries (e.g., those with near-complete registration).
Future directions include: - Empirical comparison of the five variants plus the WHO baseline on hold-out countries with high-quality data. - Development of an open-source R/Python package integrating these models with hierarchical ensembling. - Incorporation of spatial dependence (e.g., Gaussian processes over country embeddings) and age-structured extensions. - Real-time updating frameworks with streaming data and adaptive baselines.
In summary, the proposed mathematical enhancements represent a significant evolution toward more robust, interpretable, and mechanistically grounded estimation of pandemic mortality burden. By combining statistical rigor with epidemiological insight, they can contribute to better-informed global health policy and preparedness for future health crises.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All used data are in the paper.
Conflicts of Interest
The author declare no conflicts of interest.
References
- World Health Organization. Methods for estimating the excess mortality associated with the COVID-19 pandemic; Technical report; WHO, 5 April 2023. [Google Scholar]
- Wood, S. N. Generalized Additive Models: An Introduction with R, 2nd edition; Chapman and Hall/CRC, 2017. [Google Scholar]
- Riffe, T.; Acosta, E. Coverage of deaths in the 2020–2021 period: A comparison of vital registration and all-cause mortality data. Technical report, Short-Term Mortality Fluctuations project, 2021. [Google Scholar]
- Leon, D. A.; Shkolnikov, V. M.; Nepomuceno, R. M.; et al. Excess mortality associated with the COVID-19 pandemic: A systematic review. The Lancet 2020, 396(10258), 1123–1134. [Google Scholar]
- Checchi, F.; Roberts, L. Documenting mortality in crises: Are we doing enough? PLoS Medicine 2005, 2(7), e215. [Google Scholar]
- Erbello Sánchez, D.; Tito Corrioso, O.; Calzadilla Guerra, L.; Vázquez Alpizar, B.; Nodarse Erbello, S. Propuesta de innovaciones en el Complejo Hotelero Barceló Solymar - Occidental Arenas Blancas. 12 Conferencia Científica Internacional de la Universidad de Holguín 2025 2025. [Google Scholar] [CrossRef]
- Erbello Sánchez, D.; Tito Corrioso, O.; Calzadilla Guerra, L. El Turismo como factor de desarrollo y su efecto multiplicador en la economía cubana. Retos Turísticos 2025, 24(1), e-6020. Available online: https://retosturisticos.umcc.cu/index.php/retosturisticos/article/view/150.
- García Ramírez, A.; Tito Corrioso, O.; Erbello Sánchez, D. Inteligencia Artificial y Turismo 5.0: Innovación, sostenibilidad y transformación digital en el desarrollo de productos turísticos en Cuba. Retos Turísticos 2025, 24(1), e-6116. Available online: https://retosturisticos.umcc.cu/index.php/retosturisticos/article/view/158.
- Kung, K.; Magee, M. J.; et al. Non-pharmaceutical interventions and excess mortality during the COVID-19 pandemic. Nature Communications 2020, 11, 6205. [Google Scholar]
- Karlinsky, A.; Kobak, D. Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset: Objective trends and comparisons. eLife 2021, 10, e69336. [Google Scholar] [CrossRef] [PubMed]
- Harvey, A. C. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press, 1989. [Google Scholar]
- Durbin, J.; Koopman, S. J. Time Series Analysis by State Space Methods, 2nd edition; Oxford University Press, 2012. [Google Scholar]
- Box, G. E. P.; Tiao, G. C. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 1975, 70(349), 70–79. [Google Scholar] [CrossRef]
- Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B 2009, 71(2), 319–392. [Google Scholar] [CrossRef]
- Ramsay, J. O.; Silverman, B. W. Functional Data Analysis, 2nd edition; Springer, 2005. [Google Scholar]
- Hall, P.; Titterington, D. M.; Marron, J. S. On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B 2006, 68(1), 109–126. [Google Scholar] [CrossRef]
- Placeholder for Rivera et al., 2020 cyclic spline reference – to be completed if specific citation available.
- Hale, T.; et al. Oxford COVID-19 Government Response Tracker; Blavatnik School of Government, University of Oxford, 2020. [Google Scholar]
- Aguilar León, B.; Tito-Corrioso, O.; Fernández García, A. Modelo Metapoblacional de la dinámica de dispersión del dengue. In XVI Congreso Internacional de Matemática y Computación COMPUMAT 2019; La Habana, Cuba, 2019; ISBN 978-959-16-4341-4. [Google Scholar]
|
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).