Submitted: 05 February 2026
Posted: 09 February 2026
Abstract
Keywords:
1. Introduction
- For countries with reliable monthly historical ACM data (typically 2015–2019), a generalized additive model (GAM) with cyclic cubic splines for seasonality and a linear trend for annual variation is fitted under a negative binomial distribution to predict expected monthly ACM. Uncertainty is summarized via point estimates and standard deviations derived from posterior approximations.
- For countries with only annual data or no pandemic-period ACM, a log-linear overdispersed Poisson regression uses time-varying and static covariates (e.g., official COVID-19 death rates, stringency indices, temperature, socio-demographic index) to impute missing ACM or to predict excess mortality directly.
- Age- and sex-specific patterns are derived from observed data where available, generalized through clustered country groupings, and extrapolated to data-scarce countries.
- The spline-based GAM assumes additivity and smoothness, which may fail to capture complex non-linearities, pre-pandemic structural breaks, or abrupt local shocks.
- The baseline period (e.g., 2015–2019) may be too short to fully account for long-term trends or cohort effects, leading to biased expected values.
- Covariate models for data-scarce countries rely on strong parametric assumptions (log-linearity) and regional median imputation, potentially introducing bias in heterogeneous settings.
- Extrapolation of age-sex patterns via discrete clusters may overlook continuous gradients in demographic or epidemiological similarity.
- Uncertainty quantification, while present, is often approximate and may not fully account for hierarchical dependencies or model misspecification.
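The covariate model for data-scarce countries relies on an overdispersed Poisson likelihood. A minimal sketch of how overdispersion is commonly quantified, assuming a quasi-Poisson variance $\mathrm{Var}(Y) = \phi\mu$ and fitted means already available from some log-linear fit, is the Pearson estimator of $\phi$; the function name and numbers below are hypothetical.

```python
from typing import Sequence


def pearson_dispersion(y: Sequence[float], mu: Sequence[float], n_params: int) -> float:
    """Pearson estimate of the quasi-Poisson dispersion phi, assuming Var(Y) = phi * mu.

    y: observed counts; mu: fitted means from a log-linear model;
    n_params: number of estimated regression coefficients.
    """
    n = len(y)
    if n != len(mu):
        raise ValueError("y and mu must have the same length")
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    # Sum of squared Pearson residuals, divided by residual degrees of freedom
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (n - n_params)


# Hypothetical observed counts and fitted means
phi = pearson_dispersion([10, 30, 10, 30], [20, 20, 20, 20], n_params=1)
print(round(phi, 3))  # 6.667
```

Values of $\phi$ well above 1 mean that a plain Poisson fit would understate uncertainty; quasi-Poisson standard errors are the Poisson ones scaled by $\sqrt{\phi}$.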
1. Structural Time Series (STS) models with Kalman filtering.
2. Extended SARIMAX models with intervention components.
3. Fully Bayesian hierarchical models with penalized splines and shrinkage priors.
4. Compartmental ordinary differential equation (ODE) models inspired by extended SEIR frameworks.
5. Functional Principal Component Analysis (FPCA) combined with functional regression.
2. Materials and Methods
2.1. Notation and Core Definitions
- Observed deaths: all-cause mortality (ACM) count recorded during the pandemic period.
- Expected deaths: counterfactual ACM count in the absence of the pandemic, forecast from pre-pandemic historical data.
- Excess deaths: observed minus expected deaths (positive for excess mortality, negative for a mortality deficit).
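The sign convention above can be made concrete with a small sketch; the monthly counts are hypothetical and the function names are illustrative only.

```python
from typing import Sequence


def excess_deaths(observed: Sequence[float], expected: Sequence[float]) -> list:
    """Excess deaths per period: observed minus expected ACM.

    Positive values indicate excess mortality, negative values a deficit.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return [o - e for o, e in zip(observed, expected)]


def cumulative_excess(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Total excess over the period; monthly deficits offset monthly excesses."""
    return sum(excess_deaths(observed, expected))


# Hypothetical monthly counts for one country
obs = [1200, 1350, 980]
exp_ = [1100, 1150, 1000]
print(excess_deaths(obs, exp_))      # [100, 200, -20]
print(cumulative_excess(obs, exp_))  # 280
```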
2.2. Expected Mortality for Countries with Monthly Historical Data
- Trend term: a linear trend in year, capturing annual variation,
- Seasonal term: a cyclic cubic spline on month, modeling within-year seasonality [17].
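The following is a minimal sketch of this kind of baseline model, not the WHO implementation. It makes two stated substitutions: a Poisson likelihood instead of the negative binomial, and a pair of Fourier harmonics instead of a cyclic cubic spline (both are periodic in month, so December joins January smoothly). Coefficients are fitted by iteratively reweighted least squares (IRLS) on noise-free synthetic data, and the 2020 baseline is obtained by extrapolating the trend; all names and values are hypothetical.

```python
import numpy as np


def design_matrix(year, month, ref_year=2015, n_harmonics=2):
    """Intercept, linear year trend and Fourier terms with a 12-month period.

    The Fourier terms stand in for a cyclic cubic spline: both are periodic
    in month, so December joins January smoothly.
    """
    year = np.asarray(year, dtype=float)
    month = np.asarray(month, dtype=float)
    cols = [np.ones_like(year), year - ref_year]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * (month - 1.0) / 12.0
        cols += [np.sin(angle), np.cos(angle)]
    return np.column_stack(cols)


def fit_poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit log E[y] = X @ beta by IRLS (Poisson family, log link)."""
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())            # start from the overall mean rate
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu           # working response
        XtW = X.T * mu                    # working weights equal mu under the log link
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical 2015-2019 baseline: 5 years x 12 months, noise-free for illustration
years = np.repeat(np.arange(2015, 2020), 12)
months = np.tile(np.arange(1, 13), 5)
X = design_matrix(years, months)
beta_true = np.array([np.log(1000.0), 0.01, 0.10, 0.05, 0.02, 0.0])
y = np.exp(X @ beta_true)
beta_hat = fit_poisson_irls(X, y)

# Expected (counterfactual) monthly deaths for 2020, extrapolating the trend
X_2020 = design_matrix(np.full(12, 2020), np.arange(1, 13))
expected_2020 = np.exp(X_2020 @ beta_hat)
```

With real counts, overdispersion would motivate the negative binomial family used in the original approach, and a penalized spline's smoothing parameter would be estimated (e.g., by REML [2]) rather than fixed by the choice of harmonics.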
2.3. Expected Mortality for Countries with Annual Data or No Data
2.4. Uncertainty and Age-Sex Patterns
3. Results
3.1. Structural Time Series Models with Kalman Filtering
3.1.1. Model Formulation
- Trend: the stochastic trend component (local level plus slope),
- Seasonal: the seasonal component (monthly periodicity),
- Covariates: a vector of covariates (time-varying or static),
- Irregular: the irregular component (or Negative Binomial errors for counts).
3.1.2. Kalman Filter and Smoothing
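The filtering recursions can be sketched for the trend part alone. This is a minimal illustration, assuming Gaussian errors, known variances, and a local linear trend model with the seasonal component and covariates omitted; in practice the variances would be estimated by maximum likelihood and a smoother would be run backward over the filtered output [12]. Missing months (NaN) are handled by skipping the update step. Names and data are hypothetical.

```python
import numpy as np


def local_linear_trend_filter(y, var_level, var_slope, var_obs, diffuse=1e7):
    """Kalman filter for a local linear trend model with Gaussian errors.

    y_t         = level_t + eps_t,            eps_t  ~ N(0, var_obs)
    level_{t+1} = level_t + slope_t + xi_t,   xi_t   ~ N(0, var_level)
    slope_{t+1} = slope_t + zeta_t,           zeta_t ~ N(0, var_slope)

    Returns the filtered (level, slope) for every t; NaN entries of y are
    treated as missing, so only the prediction step is applied there.
    """
    T = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition matrix
    Z = np.array([1.0, 0.0])                 # observation picks out the level
    Q = np.diag([var_level, var_slope])      # state disturbance covariance
    a = np.zeros(2)                          # state mean
    P = np.eye(2) * diffuse                  # large variance: approximate diffuse prior
    filtered = np.empty((len(y), 2))
    for t, yt in enumerate(y):
        if not np.isnan(yt):
            v = yt - Z @ a                   # one-step-ahead prediction error
            F = Z @ P @ Z + var_obs          # prediction error variance
            K = P @ Z / F                    # Kalman gain
            a = a + K * v                    # update state mean
            P = P - np.outer(K, K) * F       # update state covariance
        filtered[t] = a
        a = T @ a                            # predict next state
        P = T @ P @ T.T + Q
    return filtered


y = 5.0 + 2.0 * np.arange(24)   # hypothetical, perfectly linear series
y[10] = np.nan                  # one missing month
states = local_linear_trend_filter(y, 0.0, 0.0, 1.0)
print(states[-1])               # close to [51.0, 2.0]
```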
3.1.3. Advantages for Excess Mortality Estimation
- The stochastic trend and slope allow flexible adaptation to gradual changes in baseline mortality (e.g., aging population, improvements in healthcare) without assuming strict linearity.
- Intervention analysis is natural: additive or multiplicative pulse/step dummies can model known non-COVID shocks (e.g., heatwaves, policy changes).
- Covariates (e.g., stringency index) enter linearly, and their coefficients can be made time-varying by letting them follow random walks.
- The Kalman filter provides coherent uncertainty propagation, including multi-step forecast intervals; uncertainty in the hyperparameters can be added via Monte Carlo simulation or a bootstrap over the hyperparameters.
- For countries with sparse data, hierarchical extensions allow partial pooling of variance components (e.g., with shared hyperparameters across WHO regions).
3.1.4. Implementation Considerations
3.2. Extended SARIMAX Models with Intervention Analysis
3.2.1. Model Specification
1. Log-transformation with bias correction: model the log counts (or a Box-Cox transformation with an optimally chosen $\lambda$) as a Gaussian SARIMAX, then back-transform the forecasts with a bias correction.
2. Generalized linear SARIMAX: model the counts directly with a Poisson or Poisson-gamma (negative binomial) distribution, with the log-mean following an ARIMA structure with covariates and interventions.
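The bias correction in option 1 follows from a standard lognormal identity: if the log count is Gaussian with mean $m$ and variance $v$, then $\exp(m)$ is the median of the count, while its mean is $\exp(m + v/2)$. A minimal sketch, assuming the log-scale forecast and its variance are already available (hypothetical values and function name):

```python
import math


def lognormal_mean(mean_log: float, var_log: float) -> float:
    """E[Y] when log(Y) ~ Normal(mean_log, var_log): exp(mean_log + var_log / 2)."""
    return math.exp(mean_log + 0.5 * var_log)


# Hypothetical forecast on the log scale for one month
mean_log, var_log = math.log(1000.0), 0.02
naive = math.exp(mean_log)                     # median of Y, about 1000
corrected = lognormal_mean(mean_log, var_log)  # mean of Y, about 1010.05
```

Skipping the correction systematically understates expected deaths, and therefore overstates excess deaths, by a factor that grows with the forecast variance.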
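- $B$ is the backshift operator ($B y_t = y_{t-1}$),
- $\phi_p(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$ is the regular AR polynomial,
- $\Phi_P(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$ is the seasonal AR polynomial,
- $d$ and $D$ are the regular and seasonal differencing orders,
- $\theta_q(B)$ and $\Theta_Q(B^s)$ are the regular and seasonal MA polynomials,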
- $s = 12$ is the seasonal period for monthly data,
- $x_{k,t}$ are exogenous covariates (COVID-19 death rate, stringency index, temperature, etc.),
- $P_t$ and $S_t$ are intervention variables (pulse and step functions, respectively),
- $\omega(B)/\delta(B)$ is the transfer function describing the dynamic response to interventions (e.g., first-order decay).
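For reference, one standard way to write a SARIMAX model with intervention terms, consistent with the polynomials and intervention variables listed above, is the regression-with-SARIMA-errors form below. Here $y_t$ denotes the (possibly transformed) monthly ACM count, $x_{k,t}$ the covariates with coefficients $\beta_k$, $P_t$ and $S_t$ the pulse and step inputs, $\omega_P(B)/\delta_P(B)$ and $\omega_S(B)/\delta_S(B)$ their transfer functions, $N_t$ the SARIMA noise process, and $\varepsilon_t$ white noise; these symbol choices are assumed for illustration.

$$
y_t = \sum_{k} \beta_k\, x_{k,t} + \frac{\omega_P(B)}{\delta_P(B)}\, P_t + \frac{\omega_S(B)}{\delta_S(B)}\, S_t + N_t,
\qquad
\phi_p(B)\,\Phi_P(B^s)\,(1-B)^d\,(1-B^s)^D\, N_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t .
$$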
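- Pulse intervention (temporary shock): $P_t = 1$ if $t$ corresponds to a known high-mortality wave (e.g., March–April 2020), and $P_t = 0$ otherwise.
- Step intervention (permanent shift): $S_t = 1$ for $t \ge T_0$ and $S_t = 0$ otherwise, where $T_0$ marks the onset of the change (e.g., start of vaccination rollout or a sustained policy change).
- Dynamic response: common forms include $\omega_0$ (immediate effect), $\omega_0/(1-\delta B)$ applied to a step (gradual permanent change, with $0 < \delta < 1$), or $\omega_0/(1-\delta B) + \omega_1/(1-B)$ applied to a pulse (overshoot followed by decay toward a new level).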
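The three response shapes can be reproduced by passing pulse or step inputs through a first-order transfer function $\omega/(1-\delta B)$, computed with the recursion $z_t = \delta z_{t-1} + \omega x_t$. This is a minimal sketch with hypothetical function names and parameter values: $\delta = 0$ gives the immediate effect, and $\delta = 1$ turns a pulse into a permanent step, because $1/(1-B)$ accumulates its input.

```python
def pulse(n, t0):
    """P_t = 1 at t = t0, 0 otherwise."""
    return [1.0 if t == t0 else 0.0 for t in range(n)]


def step(n, t0):
    """S_t = 1 for t >= t0, 0 otherwise."""
    return [1.0 if t >= t0 else 0.0 for t in range(n)]


def first_order_response(x, omega, delta):
    """z_t = delta * z_{t-1} + omega * x_t, i.e. z = omega / (1 - delta B) applied to x."""
    z, prev = [], 0.0
    for xt in x:
        prev = delta * prev + omega * xt
        z.append(prev)
    return z


p, s = pulse(6, 1), step(6, 1)
print(first_order_response(p, 2.0, 0.5))  # [0.0, 2.0, 1.0, 0.5, 0.25, 0.125]  decaying shock
print(first_order_response(s, 2.0, 0.5))  # [0.0, 2.0, 3.0, 3.5, 3.75, 3.875]  gradual permanent shift
overshoot = [a + b for a, b in zip(first_order_response(p, 2.0, 0.5),
                                   first_order_response(p, 1.0, 1.0))]
print(overshoot)                          # [0.0, 3.0, 2.0, 1.5, 1.25, 1.125]  decays toward 1
```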
3.2.2. Parameter Estimation and Diagnostic Checking
1. Identification: inspect the ACF/PACF of the differenced series to select the orders $(p, d, q)(P, D, Q)_s$.
2. Estimation: conditional or exact (unconditional) maximum likelihood.
3. Diagnostics: Ljung-Box test on residuals, Jarque-Bera test for normality, and Chow test or CUSUM for pre-intervention structural breaks.
4. Intervention detection: if intervention dates are unknown, use outlier detection algorithms to identify additive outliers (AO), innovative outliers (IO), level shifts (LS), or temporary changes (TC).
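Steps 1 and 3 can be sketched in plain Python: a lag-$k$ differencing helper ($k = 1$ for $(1 - B)$, $k = 12$ for $(1 - B^{12})$) and the Ljung-Box statistic $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-$k$ residual autocorrelation. This is an illustrative implementation with hypothetical function names; in practice $Q$ is compared with a chi-square distribution whose degrees of freedom are $h$ minus the number of estimated ARMA parameters (for example via `scipy.stats.chi2.sf`).

```python
def difference(x, lag=1):
    """Apply (1 - B^lag): lag=1 for regular, lag=12 for seasonal monthly differencing."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def ljung_box(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    e = [r - mean for r in residuals]
    denom = sum(x * x for x in e)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(e[t] * e[t - k] for t in range(k, n)) / denom  # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


print(difference([1, 2, 4, 7]))        # [1, 2, 3]
print(ljung_box([1, -1, 1, -1], 2))    # 7.5 (strongly autocorrelated toy residuals)
```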
3.2.3. Uncertainty Quantification
3.2.4. Advantages and Comparison with WHO GAM Approach
- Explicit handling of non-stationarity via differencing and stochastic trends, avoiding the need to assume smoothness over the entire period.
- Direct incorporation of seasonal AR and MA terms, which can capture complex intra-annual dependencies (e.g., holiday effects, flu seasons) more parsimoniously than high-dimensional splines.
- Formal intervention modeling isolates pandemic shocks, improving causal interpretability of excess deaths as the sum of intervention effects plus residual deviations.
- Built-in diagnostic tools (ACF, Ljung-Box, outlier detection) help verify model adequacy and detect misspecification.
- For countries with partial or sparse data, parameters can be estimated hierarchically (e.g., pooling AR/MA coefficients across WHO regions via shrinkage priors or empirical Bayes).
3.3. Fully Bayesian Hierarchical Models with Penalized Splines and Shrinkage Priors
3.3.1. Model Specification
- Year index: position of the year within the historical period,
- Month index: 1 to 12,
- Trend function: smooth function of the annual trend,
- Seasonal function: smooth cyclic function of seasonality,
- Covariates: vector of time-varying and static covariates,
- Coefficients: country-specific regression coefficients,
- Random intercept: country-level random effect,
- Residual: residual term (optionally with AR(1) structure).
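A minimal sketch of the penalized-spline idea behind the trend and seasonal functions, assuming Gaussian errors and the simplest possible basis (one coefficient per time point, i.e., a Whittaker smoother, which is a special case of P-splines). In the Bayesian reading, the penalty is a Gaussian random-walk prior on the second differences; with known variances the smoothing parameter is $\lambda = \sigma^2_\varepsilon/\sigma^2_f$, and the returned vector is the posterior mean. The function name and data are hypothetical.

```python
import numpy as np


def penalized_smoother(y, lam, order=2):
    """Minimise ||y - f||^2 + lam * ||D f||^2, with D the order-th difference matrix.

    Equivalent to the posterior mean of f under y ~ N(f, s2 I) and a Gaussian
    random-walk prior of the given order on f, with lam = s2 / prior variance.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    D = np.diff(np.eye(n), n=order, axis=0)   # (n - order) x n difference matrix
    A = np.eye(n) + lam * D.T @ D
    return np.linalg.solve(A, y)


# A straight line has zero second differences, so it passes through unchanged
trend = 3.0 + 0.5 * np.arange(10)
print(np.allclose(penalized_smoother(trend, lam=100.0), trend))  # True

# A jagged series is smoothed: its second differences shrink
jagged = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0])
smooth = penalized_smoother(jagged, lam=10.0)
```

In the full hierarchical model the penalty would act on spline coefficients rather than on every month, and $\lambda$ (equivalently the prior variance) would receive its own hyperprior instead of being fixed.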
3.3.2. Inference and Posterior Computation
- Markov chain Monte Carlo (MCMC) using the No-U-Turn Sampler (NUTS) in Stan or PyMC,
- Integrated Nested Laplace Approximation (INLA) for fast approximate inference when using Gaussian random fields [14],
- Variational Bayes for scalability in large hierarchies.
3.3.3. Theoretical Properties and Consistency
- Borrowing strength: countries with sparse data borrow information from similar countries/regions via shared hyperparameters, reducing variance.
- Shrinkage to group mean: extreme country-specific effects (e.g., outliers due to data errors) are pulled toward regional means.
- Coherent uncertainty: full posterior accounts for all sources of variability, including hyperparameter uncertainty (unlike REML approximations).
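Borrowing strength and shrinkage can be illustrated with a normal-normal empirical Bayes sketch, a simplified stand-in for the full hierarchical posterior: each country estimate $\hat\theta_c$ with sampling variance $s_c^2$ is pulled toward the regional mean by the factor $B_c = s_c^2/(s_c^2 + \tau^2)$. The between-country variance $\tau^2$ is estimated here by a crude method of moments and the regional mean is unweighted; both are simplifying assumptions, and the function name is hypothetical.

```python
import statistics


def eb_shrink(estimates, variances):
    """Normal-normal empirical Bayes shrinkage of country estimates toward a regional mean.

    estimates: country-level estimates theta_hat_c
    variances: their sampling variances s_c^2 (must be positive)
    """
    if any(v <= 0 for v in variances):
        raise ValueError("sampling variances must be positive")
    mu = statistics.fmean(estimates)  # regional mean (unweighted, for simplicity)
    # Method-of-moments between-country variance, truncated at zero
    tau2 = max(0.0, statistics.variance(estimates) - statistics.fmean(variances))
    shrunk = []
    for theta, s2 in zip(estimates, variances):
        b = s2 / (s2 + tau2)          # shrinkage factor B_c: noisier estimates shrink more
        shrunk.append(b * mu + (1 - b) * theta)
    return shrunk


print(eb_shrink([0.0, 2.0, 4.0], [1.0, 1.0, 1.0]))  # [0.5, 2.0, 3.5]
# A data-poor third country (variance 9) is pulled much closer to the mean of 2.0
sparse = eb_shrink([0.0, 2.0, 4.0], [1.0, 1.0, 9.0])
```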
3.3.4. Advantages over WHO REML-GAM and Comparison
- Full posterior inference replaces approximate standard errors with credible intervals that incorporate hierarchical dependence.
- Partial pooling via random effects and shared priors improves extrapolation to the 83 countries with no data (as of 2023), reducing bias from regional median imputation.
- Priors allow regularization of covariate effects (e.g., horseshoe priors on the regression coefficients for sparsity) and incorporation of prior knowledge (e.g., negative effect of stringency on non-COVID mortality).
- Robustness to baseline choice: longer historical periods (2000–2019) can be included with time-varying smoothness penalties.
3.4. Compartmental Ordinary Differential Equation Models for Excess Mortality
3.4.1. Model Formulation
- $S(t)$: susceptible individuals,
- $I(t)$: infected (and infectious) individuals,
- $R(t)$: recovered (immune) individuals,
- $D(t)$: cumulative deaths (both natural and disease-induced).
- $\beta(t)$: time-varying transmission rate (incorporating stringency, mobility, variants),
- $\gamma$: recovery rate (1/infectious period),
- Disease-induced death rate: per infected individual (COVID-specific),
- Natural mortality rate: background per-capita rate (small, e.g., 0.01/year ≈ 0.00083/month).
3.4.2. Basic Reproduction Number and Equilibria
3.4.3. Local Stability Analysis of Disease-Free Equilibrium
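- If $R_0 < 1$, all eigenvalues of the Jacobian at the disease-free equilibrium have negative real part: the disease-free state is locally asymptotically stable, and infections die out ($I(t) \to 0$).
- If $R_0 > 1$, the Jacobian has a positive eigenvalue: the disease-free state is unstable, and small introductions lead to epidemic growth (invasion).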
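Because the ODE system itself is not reproduced here, the sketch below assumes a common SIRD variant: births occur at the constant rate $\nu N$ ($N$ the disease-free population size), natural mortality is $\nu$ per capita, disease-induced mortality is $\alpha$, and $dI/dt = \beta S I/N - (\gamma + \alpha + \nu) I$. Under these assumptions $R_0 = \beta/(\gamma + \alpha + \nu)$, the infection eigenvalue at the disease-free equilibrium ($S = N$) is $\beta - (\gamma + \alpha + \nu)$, and the remaining eigenvalues equal $-\nu$, so stability is decided by whether $R_0 < 1$. The symbols $\alpha$ and $\nu$ and the function names are illustrative.

```python
def r0(beta, gamma, alpha, nu):
    """Basic reproduction number of the assumed SIRD variant."""
    return beta / (gamma + alpha + nu)


def dfe_infection_eigenvalue(beta, gamma, alpha, nu):
    """Eigenvalue of the linearised I-equation at the disease-free equilibrium (S = N).

    The other eigenvalues (S and R directions) are -nu < 0 under the assumed births,
    so the disease-free equilibrium is stable iff this value is negative, i.e. R0 < 1.
    """
    return beta - (gamma + alpha + nu)


def herd_immunity_threshold(R0):
    """Fraction immune needed to stop growth: 1 - 1/R0 (zero when R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / R0)


# Hypothetical daily rates
print(r0(0.5, 0.2, 0.04, 0.01))                        # about 2.0 -> epidemic
print(dfe_infection_eigenvalue(0.2, 0.2, 0.04, 0.01))  # about -0.05 -> dies out
```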
3.4.4. Advantages for Excess Mortality Estimation
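- Mechanistic link between transmission and excess deaths.
- Counterfactual simulation: set the transmission rate or the disease-induced death rate to its baseline value to compute expected deaths.
- Incorporation of covariates via time-varying transmission and death rates (e.g., stringency reduces transmission, vaccination reduces disease-induced mortality).
- Captures nonlinear feedback (herd immunity threshold at $1 - 1/R_0$).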
- Enables scenario analysis (e.g., what-if no lockdowns).
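The counterfactual idea can be sketched under an assumed SIRD variant (births at constant rate $\nu N_0$, natural mortality $\nu$, disease-induced mortality $\alpha$, transmission $\beta$, recovery $\gamma$), integrated with a simple forward-Euler scheme: expected deaths come from a run with no infection ($I_0 = 0$), and excess deaths are the difference from the epidemic run. The equations, parameter values, and function names are illustrative assumptions, not a fitted model.

```python
def simulate_sird(n0, i0, beta, gamma, alpha, nu, days, dt=0.1):
    """Forward-Euler integration of an assumed SIRD model; returns cumulative deaths D.

    dS/dt = nu*n0 - beta*S*I/N - nu*S
    dI/dt = beta*S*I/N - (gamma + alpha + nu)*I
    dR/dt = gamma*I - nu*R
    dD/dt = alpha*I + nu*(S + I + R)
    """
    s, i, r, d = float(n0 - i0), float(i0), 0.0, 0.0
    for _ in range(int(round(days / dt))):
        n = s + i + r
        infection = beta * s * i / n
        ds = nu * n0 - infection - nu * s
        di = infection - (gamma + alpha + nu) * i
        dr = gamma * i - nu * r
        dd = alpha * i + nu * n               # disease-induced plus natural deaths
        s, i, r, d = s + ds * dt, i + di * dt, r + dr * dt, d + dd * dt
    return d


def excess_deaths_sird(n0, i0, beta, gamma, alpha, nu, days):
    """Excess = deaths with the epidemic minus deaths in the no-infection counterfactual."""
    observed = simulate_sird(n0, i0, beta, gamma, alpha, nu, days)
    expected = simulate_sird(n0, 0, beta, gamma, alpha, nu, days)  # I0 = 0: no epidemic
    return observed - expected


# Hypothetical daily rates for one year in a population of one million
nu = 0.01 / 365    # natural mortality of 0.01 per year
excess = excess_deaths_sird(1_000_000, 10, beta=0.3, gamma=0.1, alpha=0.001, nu=nu, days=365)
```

Setting $\alpha = 0$ in this sketch makes the excess vanish, which shows that here excess mortality arises only through the disease-induced death term.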
3.5. Functional Principal Component Analysis with Functional Regression
3.5.1. Functional Data Representation
3.5.2. Karhunen–Loève Decomposition and FPCA
3.5.3. Expected Mortality Forecasting via Functional Regression
3.5.4. Inference and Uncertainty
- Bootstrap of entire curves (functional bootstrap): resample countries or residuals.
- Bayesian functional regression: Gaussian process priors on the functional regression coefficients, or hierarchical priors on the FPC scores.
- Asymptotic normality of FPCA estimators under mild conditions [16].
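A minimal sketch of the functional bootstrap, assuming curves observed on a common monthly grid and resampling whole curves (e.g., countries) with replacement. It returns a pointwise percentile band for the mean curve, which is narrower than a simultaneous band. The function name, grid, and settings are hypothetical.

```python
import numpy as np


def bootstrap_mean_curve(curves, n_boot=1000, level=0.95, seed=0):
    """Pointwise percentile bootstrap band for the mean curve, resampling whole curves."""
    curves = np.asarray(curves, dtype=float)   # shape (n_curves, n_points)
    rng = np.random.default_rng(seed)
    n = curves.shape[0]
    boot_means = np.empty((n_boot, curves.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample curves, never individual months
        boot_means[b] = curves[idx].mean(axis=0)
    tail = (1.0 - level) / 2.0
    lower = np.quantile(boot_means, tail, axis=0)
    upper = np.quantile(boot_means, 1.0 - tail, axis=0)
    return curves.mean(axis=0), lower, upper


# Hypothetical: 20 countries, 12 monthly values each
rng = np.random.default_rng(1)
curves = rng.normal(size=(20, 12))
mean_curve, lower, upper = bootstrap_mean_curve(curves, n_boot=500)
```

Resampling whole curves preserves the within-curve (month-to-month) dependence, which resampling individual months would destroy.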
3.5.5. Theoretical Properties and Consistency
3.5.6. Advantages over WHO GAM and Comparison
- Data-driven basis: eigenfunctions capture the actual dominant modes (e.g., amplitude of winter peaks, trend acceleration) rather than imposing cyclic smoothness.
- Dimensionality reduction: the number of retained principal components is typically much smaller than the number of spline knots, reducing overfitting in short series.
- Natural clustering: countries with similar score vectors form epidemiological clusters, improving extrapolation.
- Parsimonious covariate modeling: regress low-dimensional scores instead of high-dimensional spline coefficients.
- Better handling of irregular sampling or missing months via smoothing.
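On a common, equally spaced grid, FPCA reduces to a singular value decomposition of the centered data matrix: the right singular vectors act as discretized eigenfunctions and the projections onto them as FPC scores. A minimal sketch under that assumption (grid-spacing weights omitted; names hypothetical), applied to synthetic curves built from two harmonics so that two components reconstruct them exactly:

```python
import numpy as np


def fpca(curves, n_components):
    """Discretised FPCA for curves (n_curves x n_points) on a common, equally spaced grid."""
    X = np.asarray(curves, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    eigenfunctions = Vt[:n_components]         # rows: discretised eigenfunctions
    scores = Xc @ eigenfunctions.T             # FPC scores, one row per curve
    explained = s**2 / np.sum(s**2)            # share of variance per component
    return mean, eigenfunctions, scores, explained[:n_components]


def reconstruct(mean, eigenfunctions, scores):
    """Truncated Karhunen-Loeve reconstruction: mean + sum_k score_k * eigenfunction_k."""
    return mean + scores @ eigenfunctions


# Hypothetical monthly curves (e.g., one per country) driven by two seasonal modes
m = np.arange(12)
phi1, phi2 = np.sin(2 * np.pi * m / 12), np.cos(2 * np.pi * m / 12)
a = np.array([1.0, -1.0, 2.0, 0.5, -2.5])
b = np.array([0.3, -0.2, 0.1, 0.0, -0.2])
curves = 5.0 + np.outer(a, phi1) + np.outer(b, phi2)
mean, eigfuns, scores, explained = fpca(curves, 2)
```

In practice the number of components would be chosen from the explained-variance shares, and the low-dimensional scores, rather than spline coefficients, would be regressed on covariates.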
4. Discussion
5. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. World Health Organization. Methods for estimating the excess mortality associated with the COVID-19 pandemic; Technical report; WHO, 5 April 2023.
2. Wood, S. N. Generalized Additive Models: An Introduction with R, 2nd edition; Chapman and Hall/CRC, 2017.
3. Riffe, T.; Acosta, E. Coverage of deaths in the 2020–2021 period: A comparison of vital registration and all-cause mortality data. Technical report, Short-Term Mortality Fluctuations project, 2021.
4. Leon, D. A.; Shkolnikov, V. M.; Nepomuceno, R. M.; et al. Excess mortality associated with the COVID-19 pandemic: A systematic review. The Lancet 2020, 396(10258), 1123–1134.
5. Checchi, F.; Roberts, L. Documenting mortality in crises: Are we doing enough? PLoS Medicine 2005, 2(7), e215.
6. Erbello Sánchez, D.; Tito Corrioso, O.; Calzadilla Guerra, L.; Vázquez Alpizar, B.; Nodarse Erbello, S. Propuesta de innovaciones en el Complejo Hotelero Barceló Solymar - Occidental Arenas Blancas. 12 Conferencia Científica Internacional de la Universidad de Holguín 2025, 2025.
7. Erbello Sánchez, D.; Tito Corrioso, O.; Calzadilla Guerra, L. El Turismo como factor de desarrollo y su efecto multiplicador en la economía cubana. Retos Turísticos 2025, 24(1), e-6020. Available online: https://retosturisticos.umcc.cu/index.php/retosturisticos/article/view/150.
8. García Ramírez, A.; Tito Corrioso, O.; Erbello Sánchez, D. Inteligencia Artificial y Turismo 5.0: Innovación, sostenibilidad y transformación digital en el desarrollo de productos turísticos en Cuba. Retos Turísticos 2025, 24(1), e-6116. Available online: https://retosturisticos.umcc.cu/index.php/retosturisticos/article/view/158.
9. Kung, K.; Magee, M. J.; et al. Non-pharmaceutical interventions and excess mortality during the COVID-19 pandemic. Nature Communications 2020, 11, 6205.
10. Karlinsky, A.; Kobak, D. Tracking excess mortality across countries during the COVID-19 pandemic with the World Mortality Dataset: Objective trends and comparisons. eLife 2021, 10, e69336.
11. Harvey, A. C. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press, 1989.
12. Durbin, J.; Koopman, S. J. Time Series Analysis by State Space Methods, 2nd edition; Oxford University Press, 2012.
13. Box, G. E. P.; Tiao, G. C. Intervention analysis with applications to economic and environmental problems. Journal of the American Statistical Association 1975, 70(349), 70–79.
14. Rue, H.; Martino, S.; Chopin, N. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B 2009, 71(2), 319–392.
15. Ramsay, J. O.; Silverman, B. W. Functional Data Analysis, 2nd edition; Springer, 2005.
16. Hall, P.; Titterington, D. M.; Marron, J. S. On properties of functional principal components analysis. Journal of the Royal Statistical Society: Series B 2006, 68(1), 109–126.
17. Placeholder for Rivera et al., 2020 cyclic spline reference – to be completed if specific citation available.
18. Hale, T.; et al. Oxford COVID-19 Government Response Tracker; Blavatnik School of Government, University of Oxford, 2020.
19. Aguilar León, B.; Tito-Corrioso, O.; Fernández García, A. Modelo Metapoblacional de la dinámica de dispersión del dengue. In XVI Congreso Internacional de Matemática y Computación COMPUMAT 2019; La Habana, Cuba, 2019; ISBN 978-959-16-4341-4.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).