Preprint
Article

This version is not peer-reviewed.

Identifiable Regime Detection in Pension Fund Networks via Sticky Hidden Markov Models

Submitted:

30 May 2026

Posted:

02 June 2026

You are already at the latest version

Abstract
Wedocument anetwork-level vulnerability in pension fund systems: lifecycle allocation regulation, designed to protect individual participants, compresses cross-fund return dynamics to the point where provider choice offers little diversification. Using daily net asset value data from second-pillar pension funds in Lithuania over 2019–2025, we find that a single common factor explains at least 73% of total return variance even in the calmest observed periods. We develop an unsupervised regime-detection framework combining a PCA-based absorption ratio, DTW hierarchical clustering, and a Gaussian hidden Markov model with a data-driven crisis threshold. The HMM specification is supported by a dual empirical calibration of the stickiness prior and cross-validated against a fully Bayesian sticky HMM. The framework identifies three latent regimes in which elevated systemic co-movement is the structural norm rather than an exceptional state and shows that funds group primarily by age cohort rather than by provider. The absorption ratio has no significant relationship with global equity benchmarks in either calm or high-risk regimes, indicating that systemic stress is network-internal rather than imported. Cluster-level tracking-error amplification of 1.09× to 1.23× during high-risk episodes confirms that even conservative funds serving retirement-age participants are not insulated.
Keywords: 
;  ;  ;  ;  ;  

1. Introduction

Systemic risk, the potential for shocks to propagate across financial systems, has been prominent since the 2008 crisis, revealing how the failure of individual institutions can destabilise economies [1]. While traditionally associated with banks because of their central role in credit intermediation, attention is now shifting to large non-bank institutional investors, especially pension funds, which hold over $63 trillion in assets globally [2]. The risk arises not simply from scale; research demonstrates that overlapping portfolios among institutions allow asset sales or losses to depress prices and transmit shocks, amplifying stress throughout the system [3,4,5]. Notably, pension funds form networks characterised by such overlaps, making indirect contagion possible even without direct financial links.
Unlike banks, pension funds typically have longer investment horizons, limited leverage, and focus on hedging [6]. Yet these safeguards do not dissolve the vulnerability that [5] identifies, because that vulnerability lives in the overlap structure, not in leverage or direct exposure. Fire-sale dynamics and portfolio overlap can still open indirect contagion channels with no direct financial links between funds [3,4]. Recent events, like the 2026 collapse of the Versorgungswerk der Zahnärztekammer Berlin pension fund, underscore that systemic vulnerabilities in this sector are real [7]. The question, then, is not whether pension networks can transmit systemic stress, but how one can know, from the data, when such a network has entered a high-risk state.
Indeed, existing literature increasingly frames the risks confronting pension schemes as fundamentally unhedgeable, system-level risks that cannot be diversified away through portfolio construction alone [8]. The enterprise risk management tradition reinforces this view [9,10]: [10] shows formally that a firm sponsoring a defined-benefit pension plan incurs compounding vulnerabilities unless pension risk is integrated with all other business risks in a unified optimisation. Their model demonstrates that managing pension risk in a silo, rather than holistically, reduces expected firm value by over 12 % . This is not merely an actuarial accounting problem; it reveals that portfolio homogeneity and shared liability structures across pension participants create interdependencies that travel well beyond the balance sheet of any single fund.
Before turning to how systemic risk has been modelled, it is worth naming the obstacle that (we will argue) every existing approach must confront, and that ultimately defines the gap this paper fills. Systemic episodes in pension funds are rare and slow-moving. [11] addresses this through modular modelling but notes that traditional balance-sheet-based indicators often reflect past conditions and cannot reliably predict the onset of stress. [12] surveying the broader machine learning for financial risk management literature arrives at a complementary diagnosis from the opposite direction: across market, credit, operational, and insurance risk [13], the dominant paradigm is supervised classification trained on labelled crises or externally specified targets, and the field’s principal open challenges, data inaccessibility, label scarcity, imbalanced datasets, and biased portfolio-level estimates, all trace back to the same root that [11] identifies. Where [11] argues from systems-design first principles that the labels cannot be there, [12] documents, study by study, that the field has been quietly assuming they are. Four main modelling traditions respond to this constraint, but each leaves gaps, particularly for pension fund networks.
Financial network frameworks often model systemic risk using explicit matrices of bilateral obligations [14,15], which are well-suited to banking but less applicable to pension funds, whose connections are indirect. A complementary strand models systemic risk through network density and contagion probability directly, showing that the relationship between interconnectedness and expected loss is non-monotonic; neither fully connected nor ring networks are optimal because denser connections simultaneously enable risk sharing and multiply contagion pathways [16]. Formalising when and how such contagion becomes systemic, however, requires a rigorous measurement framework. The mathematical foundations for measuring such systemic risk rigorously were laid by [17], who developed a general framework for systemic risk measures via multi-dimensional acceptance sets and aggregation functions, unifying both the first aggregate, then inject capital, and the first inject capital, then aggregate paradigms into a single axiomatic structure. This framework makes explicit that the choice of aggregation rule and acceptance set jointly determines where the contagion boundary lies, a theoretical insight that later computational work has sought to operationalise. This foundation has recently been extended in two influential directions. [18] generalise systemic risk measures to graph-structured data, modelling the system as a random asset vector together with a random interbank liability matrix, and proving the existence of optimal bailout-capital allocations that secure the network before contagion occurs. In a closely related vein, [19] operationalises the first allocate, then aggregate paradigm from [17], using a deep learning scheme to compute the minimal cash needed to secure a system when no closed-form solution exists. These models are mainly normative, asking how to allocate capital given a detected crisis, rather than diagnosing when a high-risk regime emerges. The supervised machine learning tradition emerged partly to fill this diagnostic gap but, in doing so, introduced a different and equally binding constraint: the need for labelled crises. Recent advances integrating network analysis with extreme dependency modelling and machine learning [20,21,22,23,24,25,26,27,28,29] improve early warning capabilities and interpretability but remain reliant on labelled historical crises.
Deep neural networks and hybrid architectures achieve high predictive accuracy but deepen rather than dissolve this label dependency [23,30,31]. The relationship between risk factors and crisis outcomes is nonlinear [30], and more sophisticated architectures confirm rather than escape this; a hybrid XGBoost-LSTM model can issue warnings 2.3 quarters in advance with 83.7% accuracy [31], yet every such figure is conditional on prior crisis labelling. This label dependency is not merely a technical inconvenience but a structural constraint on the entire supervised paradigm: [32] demonstrates that even well-specified ensemble classifiers such as random forests, which outperform panel logit models on global banking crisis data with AUROC exceeding 81%, require crisis labels drawn from externally defined databases such as [33], and their generalisability across heterogeneous economies remains contingent on the completeness and accuracy of those labels. Interpretability advances do not dissolve it either; [34] applies partial dependence plots across multiple models to identify exchange rates, money supply, and interest rates as dominant risk drivers, yet still requires a continuous quantitative risk target derived from market data, which is unavailable in the pension fund setting where NAV-based returns are the primary observable. These requirements are feasible in banking and broad equity markets, but much less so for pension fund networks, where systemic episodes are rare, slow-moving, and not cleanly labelled, and where the very question of interest is whether elevated-risk regimes can be recognised without prior labels.
Unsupervised data-driven methods present a promising alternative. [35] introduces the absorption ratio, the fraction of total asset-return variance explained by a finite number of eigenvectors from a rolling PCA, as an unsupervised, threshold-free indicator of market fragility. The logic is direct: when a single dominant factor absorbs the bulk of return variation across assets, those assets are tightly coupled, and a shock propagates more quickly and broadly than when risks are dispersed. Applied to U.S. equity industries over 1998-2010, the absorption ratio rose to its highest recorded level during the global financial crisis of 2008, and spikes in the standardised shift in the absorption ratio preceded all of the 1% worst monthly drawdowns. Crucially, [35] derive their signal entirely from observed returns: no crisis labels, no externally chosen thresholds, and no imposed regime classification. The same PCA-based logic has since been applied to broader contagion analysis [1] and forms the methodological core of our own absorption ratio component. Studies also apply clustering and anomaly detection to textual or market data [36,37], but typically do not operate directly on institutional financial positions. Their network is a network of what is said about firms, not of what firms hold, which is exactly the overlap-based contagion substrate that [5] shows to be the operative one for institutions with common asset holdings. The broader systemic-risk machine learning literature confirms how isolated this position is. [38] in their survey of machine learning for systemic risk, classify the field into four branches and identify data-driven systemic risk analysis as a major future direction precisely because real interbank-network data are difficult to obtain and most existing work relies either on proxy networks or on externally specified targets. Endogenous regime detection operating directly on actual financial positions does not appear in their taxonomy as an established branch; it appears as a gap.
The argument across the preceding traditions can now be stated crisply, and it defines the specific opening this paper fills. [15] shows formally that network topology governs how shocks clear but requires observed bilateral liabilities. [39] shows empirically that PageRank and interbank-exposure ratios are the dominant drivers of systemic impact and vulnerability, respectively, but requires observed bilateral exposures and synthetic DebtRank labels. [30] shows that nonlinear classifiers substantially outperform linear ones for crisis detection but amplify rather than resolve the label dependency. Each step increases explanatory power while deepening the data requirements. The pension case sits at the intersection of all three constraints: no bilateral liabilities, no DebtRank-computable exposures, and no labelled crises. The natural response is to step back to an earlier, harder question: can the state of the network be identified directly from returns, without any of these inputs?
Within unsupervised methods applied directly to financial returns, the toolkit is well established: PCA-based connectedness measures [1,35]; DTW-based time-series clustering, robust to temporal misalignment and well suited to financial series [40]; and Markov regime-switching, whose foundations [41] motivate the HMM approach we adopt. The CRISP-DM framework [42] provides a reproducible scaffold for assembling these components. Notably, the most relevant pension-specific precedents identify market regimes via HMM and stress-testing frameworks, anchored to external benchmarks [43,44]. Together, these establish both the network embeddedness of pension funds and the usefulness of regime-switching methods in this exact setting, yet remain conditional on external states rather than allowing risk regimes to emerge endogenously from the pension network itself. In [11]’s terms, a benchmark-anchored regime is a measurement of a previous, externally defined state rather than a forward-looking read of the network’s own condition.
We address this gap by developing an unsupervised regime-detection framework that treats systemic risk as an emergent property of the pension fund network, rather than as a response to externally defined market conditions. The proposed approach integrates three components within a unified empirical pipeline: a PCA-based absorption ratio following the logic of [35] to measure system-wide co-movement; DTW-based hierarchical clustering to capture cross-sectional similarity structure; and a Gaussian HMM to identify latent risk regimes over time. Importantly, and in the spirit of the threshold-free, data-derived discipline shared by [35], [36], and [37], the crisis threshold is derived endogenously from the HMM emission distributions, ensuring internal consistency between regime classification and crisis identification.
Three methodological contributions distinguish this framework. First, the number of latent states is selected by combining BIC with a posterior identifiability analysis in a finite sticky HMM [45], addressing BIC’s tendency to over-select when mixture components overlap. Second, the stickiness prior is calibrated empirically from the data through dual anchors corresponding to the baseline and the BIC-optimal specification, eliminating the circularity of choosing a prior that determines the result. Third, emission estimates from the frequentist Baum-Welch algorithm are cross-validated against a fully Bayesian sticky HMM estimated via the No-U-Turn Sampler, with both approaches producing near-identical results, confirming that the detected regime structure is data-driven rather than model-dependent. The framework is applied to daily NAV data from N = 40 second-pillar pension funds in Lithuania managed by five providers over January 2019 to September 2025, augmented with fund-specific benchmarks and global market indices.
The empirical analysis yields four main findings. First, systemic co-movement is persistently high: the absorption ratio computed using the method of [35] never falls below 0.728, indicating that a single common factor explains at least 73% of total return variance even in the calmest periods. Clustering results show that the funds group primarily by age cohort rather than by provider, suggesting that lifecycle allocation regulation, rather than managerial differences, is the dominant driver of cross-fund dependence, a structural reading consistent with the finding that pension risk behaviour is regulation-driven [10,46]. Second, three latent risk regimes are identified, with moderate- and high-risk states accounting for nearly 87% of trading days, indicating that elevated systemic interaction is the network’s structural norm. Third, systemic stress is not a passive reflection of global markets: the relationship between the absorption ratio and global benchmarks is statistically insignificant across regimes, while tracking-error amplification, the precise quantity practitioners monitor [46], is concentrated in the high-risk state. Fourth, the empirically derived crisis threshold τ = 0.851 exceeds the values commonly used in network-based contagion studies, where thresholds are typically chosen heuristically [47]. This is consistent with [5]’s demonstration that contagion ignites only beyond a system-specific tipping point, one that depends on leverage, concentration, and the commonality of holdings rather than on any universal shock level. More broadly, the finding that an internally derived threshold sits above the heuristic values routinely used in banking-network studies echoes the methodological conclusion that [15] reaches from the opposite direction: the clearing structure of a specific network, its liabilities, its cash-flow distribution, and the topology of its obligations determine where the contagion boundary lies, and that boundary cannot be read off from first principles or borrowed from other sectors.
The remainder of the paper is structured as follows. Section 2 develops the theoretical framework and describes the empirical methodology, including the data, the absorption ratio, DTW clustering, and the Hidden Markov Model. Section 3 presents the main results, including the regime detection, the cluster structure, and the cross-step analysis connecting them. Section 4 discusses the findings and their policy implications. Section 5 concludes and outlines directions for future research.

2. Theoretical Framework and Methodology

This section develops the theoretical framework and explains how it is implemented empirically. The analysis follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) [42], which structures the workflow into data understanding, preparation, modelling, and evaluation. The three-step empirical pipeline, PCA absorption ratio, DTW-based hierarchical clustering, and HMM regime detection are summarised in Figure 1. Each step is presented together with the theoretical results that justify it.

2.1. Data

The dataset comprises daily NAV observations from pension funds operating in Lithuania. The sample covers N = 40 funds managed by five pension fund providers, anonymised as Providers 1 through 5 to comply with data confidentiality requirements. Each provider manages eight age-cohort funds, yielding a structured 5 × 8 cross-sectional panel. The observation period runs from January 2019 to September 2025, yielding approximately T = 1 , 696 daily trading observations per fund and a total of 67,794 fund-day observations. The panel is strongly balanced, with each of the 40 funds contributing between 1,692 and 1,696 daily observations over the sample period.
The age-cohort design reflects differences in lifecycle allocation across funds. Seven cohorts are defined by participant birth decade: cohorts AG1 through AG7 cover birth years from 1954-1960 to 1996-2002, capturing participants at progressively earlier career stages. The eighth cohort, AG8, corresponds to the Target Income Pension Fund (TIPF), a specialist vehicle for participants who have reached retirement age and are actively drawing income. This distinction is empirically consequential: preliminary descriptive analysis of the sample reveals that TIPF funds exhibit substantially lower daily return volatility than age-cohort funds, consistent with their capital preservation mandate (see Table 1). This distinction anticipates the finding in Section 3.2 that extreme-cohort funds (AG1 and AG8) separate sharply from middle cohorts across all five providers, forming dedicated clusters with tracking errors three to six times lower than those of middle-cohort clusters (Table 5). The presence of TIPF funds is therefore analytically significant, as they provide a conservative baseline against which the behaviour of age-cohort funds during stress periods can be assessed.
For each fund and trading day, the dataset contains the fund unit price, an associated comparative benchmark index value, a unique fund identifier, and the corresponding age-group classification. The dataset is further augmented with M = 3 publicly available global benchmark series: the MSCI World Index, the MSCI Europe Index, and the S&P 500 Index. The sample period includes the COVID-19 market shock, providing a natural benchmark for validating the regime detection framework against a well-documented systemic stress event.

2.1.1. Notation and Setup

Let ( Ω , F , P ) be a probability space. Consider N pension funds observed over T trading days, and let X i , t denote the NAV of fund i at time t, for i = 1 , , N and t = 0 , , T . Each fund i is associated with a comparative benchmark index Z i , t ( f ) , reflecting its underlying investment strategy. In addition, let Z m , t ( g ) , m = 1 , , M , denote a set of global benchmark series representing key market factors. The corresponding log returns are defined by
r i , t ( f ) = log ( X i , t ) log ( X i , t 1 ) , i = 1 , , N , t = 1 , , T , r i , t ( b , f ) = log ( Z i , t ( f ) ) log ( Z i , t 1 ( f ) ) , i = 1 , , N , r m , t ( b , g ) = log ( Z m , t ( g ) ) log ( Z m , t 1 ( g ) ) , m = 1 , , M .
Let R ( f ) R T × N denote the pension fund return matrix, R ( b , f ) R T × N the fund-specific benchmark return matrix, and R ( b , g ) R T × M the global benchmark return matrix. When required, we define the augmented return matrix
R = R ( f ) R ( b , f ) R ( b , g ) R T × ( 2 N + M ) .
For a rolling window of length ω , define the windowed return matrix R t for t = ω , , T , containing returns from periods t ω + 1 to t.

2.1.2. Descriptive Statistics

Table 1 reports descriptive statistics for the daily log return series of pension funds, disaggregated by age cohort, and global benchmarks over the full sample period. All series exhibit negative skewness and substantial excess kurtosis, indicating fat-tailed distributions that deviate markedly from normality. This is consistent with the presence of extreme return realisations during stress periods like the COVID-19 pandemic.
A clear lifecycle gradient is visible across age cohorts. The oldest active cohort (AG1, born 1954-1960) and the TIPF funds exhibit the lowest volatility, consistent with their conservative, capital-preservation mandates. Volatility increases monotonically with cohort youth, reaching 0.0083 for the youngest cohort (AG7, born 1996-2002), reflecting progressively higher equity allocations prescribed by lifecycle regulation. Skewness is most negative for the oldest cohorts ( 1.7 for AG1, 1.5 for TIPF), indicating greater exposure to left-tail events relative to the magnitude of their returns. Excess kurtosis is highest for AG1 and AG2, suggesting that conservative funds experience more extreme return realisations relative to their low baseline volatility, a pattern consistent with infrequent but sharp revaluations of bond-heavy portfolios during interest rate shocks.
Importantly, the excess kurtosis values of 10.7–18.0 reported in Table 1 do not invalidate the Gaussian emission assumption of the Hidden Markov Model introduced in Section 2.4: each regime may be approximately Gaussian, while the mixture of regimes across the full sample produces the fat tails observed in the unconditional distribution. This is precisely the distributional flexibility that motivates a regime-switching framework.
Among the global benchmarks, all three indices exhibit similar volatility, negative skewness, and substantial excess kurtosis exceeding 16, confirming that the fat-tailed behaviour observed in pension fund returns is mirrored in global equity markets over the sample period.
Table 1. Descriptive statistics of daily log returns. Pension fund series are reported by age cohort (pooled across providers) and for the full sample. Skewness and excess kurtosis are reported. All series exhibit negative skewness and substantial excess kurtosis, consistent with fat-tailed return distributions.
Table 1. Descriptive statistics of daily log returns. Pension fund series are reported by age cohort (pooled across providers) and for the full sample. Skewness and excess kurtosis are reported. All series exhibit negative skewness and substantial excess kurtosis, consistent with fat-tailed return distributions.
Series Mean Std Min Max Skewness Kurtosis
Pension funds by age cohort
AG1 (54/60) 0.0001 0.0022 −0.0301 0.0159 −1.703 17.977
AG2 (61/67) 0.0002 0.0043 −0.0551 0.0371 −1.526 17.447
AG3 (68/74) 0.0004 0.0069 −0.0867 0.0661 −1.208 13.941
AG4 (75/81) 0.0004 0.0080 −0.0868 0.0662 −1.050 10.825
AG5 (82/88) 0.0004 0.0081 −0.0864 0.0659 −1.042 10.720
AG6 (89/95) 0.0004 0.0081 −0.0866 0.0663 −1.039 10.730
AG7 (96/02) 0.0004 0.0083 −0.0902 0.0695 −1.004 10.668
AG8 (TIPF) 0.0001 0.0019 −0.0235 0.0132 −1.495 14.549
Full sample 0.0003 0.0065 −0.0902 0.0695 −1.217 16.882
Global benchmarks
MSCI World 0.0004 0.0127 −0.1239 0.0871 −1.169 16.188
MSCI Europe 0.0006 0.0121 −0.1208 0.0871 −0.942 17.861
S&P 500 0.0006 0.0127 −0.1277 0.0909 −0.664 16.012

2.2. Step 1: The PCA Absorption Ratio as a Spectral Systemic Risk Measure

2.2.1. Definition and Computation

Definition 1 
(Rolling Covariance Matrix). The rolling sample covariance matrix at time t is
Σ t = 1 ω 1 R t M R t ,
where M = I ω 1 ω 1 ω 1 ω is the centering matrix. Since Σ t is real, symmetric, and positive semi-definite, it admits an eigendecomposition. Let λ 1 ( t ) λ 2 ( t ) λ N + M ( t ) 0 denote the ordered eigenvalues. The first principal component, the linear combination of fund returns with maximum explained variance, corresponds to λ 1 ( t ) .
In financial economics, PCA applied to the return covariance matrix is widely used for dimensionality reduction and for identifying a small number of dominant common factors that capture the co-movement of asset returns [48,49,50].
Definition 2 
(PCA Absorption Ratio). The absorption ratio at time t is the fraction of total fund return variance explained by the first principal component:
A R t = λ 1 ( t ) j = 1 N λ j ( t ) = λ 1 ( t ) tr ( Σ t ) .
A R t is a direct PCA output: it measures how much of the total variation across all N fund returns is driven by a single common factor. High values indicate that funds are moving together, a hallmark of systemic stress.

2.2.2. Empirical Implementation

The absorption ratio is computed from the pension fund return matrix R ( f ) using a rolling window of ω = 60 trading days, balancing estimation stability with the ability to capture regime changes at approximately monthly frequency. At each time point t, the following steps are performed:
1.
Extract the windowed return matrix R t ( f ) ;
2.
Compute the unbiased sample covariance matrix Σ t ;
3.
Perform eigendecomposition to obtain ordered eigenvalues λ 1 ( t ) λ N ( t ) 0 ;
4.
Compute A R t = λ 1 ( t ) / tr ( Σ t ) .
The resulting series { A R t } t = ω T gives the share of total return variance explained by the first principal component of Σ t . A rising A R t signals increasing commonality in fund returns and thus greater systemic co-movement (consistent with the strictly monotone relationship established in Proposition 3).

2.2.3. Key Properties

Proposition 1 
(Boundedness). For all t, A R t 1 N , 1 .
Proof. 
By eigenvalue ordering, λ 1 ( t ) tr ( Σ t ) , giving A R t 1 . Since λ 1 ( t ) λ j ( t ) for all j, summing over j gives N · λ 1 ( t ) tr ( Σ t ) , so A R t 1 N .    □
Proposition 2 
(Interpretation of Extremes). A R t = 1 N if and only if all eigenvalues are equal. A R t = 1 if and only if rank ( Σ t ) = 1 .
Proof. 
A R t = 1 N implies λ 1 = tr ( Σ t ) / N , the mean eigenvalue; equal eigenvalues are the unique PSD solution. A R t = 1 implies λ 1 = tr ( Σ t ) , so λ j = 0 for j 2 , giving rank ( Σ t ) = 1 .    □
Proposition 3 
(Monotonicity in Cross-Fund Correlation). Let ρ ¯ t denote the average pairwise return correlation. Under the equicorrelation model Σ t = σ 2 [ ( 1 ρ ¯ t ) I N + ρ ¯ t 1 N 1 N ] , the absorption ratio satisfies
A R t = 1 + ( N 1 ) ρ ¯ t N ,
which is strictly increasing in ρ ¯ t for ρ ¯ t 1 N 1 , 1 .
Proof. 
Under equicorrelation, Σ t has one eigenvalue λ 1 = σ 2 ( 1 + ( N 1 ) ρ ¯ t ) and N 1 repeated eigenvalues λ j = σ 2 ( 1 ρ ¯ t ) . Therefore, tr ( Σ t ) = N σ 2 and A R t = ( 1 + ( N 1 ) ρ ¯ t ) / N . The derivative A R t / ρ ¯ t = ( N 1 ) / N > 0 establishes strict monotonicity.    □
Proposition 3 is the key justification for using A R t as a systemic risk measure: it is a strictly increasing function of average cross-fund correlation, so a rising absorption ratio directly and monotonically reflects rising systemic co-movement.
Theorem 1 
(Crisis Identification). Let C t = I { A R t > τ } , τ 1 N , 1 , and set ρ ¯ crit = ( N τ 1 ) / ( N 1 ) . Under equicorrelation,
C t = 1 ρ ¯ t > ρ ¯ crit .
The threshold τ provides an economically interpretable mapping between the absorption ratio and the underlying cross-fund correlation structure: a crisis state is identified whenever average inter-fund correlation ρ ¯ t exceeds ρ ¯ crit . Rather than imposing τ a priori, its empirical value is determined from the HMM emission parameters in Step 3 (Section 2.4), ensuring consistency between the regime detection framework and the crisis identification criterion.

2.2.4. Relationship to Standard Risk Decomposition

The absorption ratio admits a direct interpretation within the standard risk decomposition framework used in portfolio performance measurement [50,51]. In factor model terms, the total risk of the pension fund network at time t decomposes as
tr ( Σ t ) Total Risk = λ 1 ( t ) Systematic Risk + j = 2 N λ j ( t ) Specific Risk ,
where systematic risk = λ 1 ( t ) is the variance explained by the first principal component (the dominant common factor), and specific risk = tr ( Σ t ) λ 1 ( t ) is the residual variance attributable to fund-specific (idiosyncratic) behaviour. The absorption ratio, therefore, equals the systematic risk share:
A R t = Systematic Risk Total Risk = λ 1 ( t ) tr ( Σ t ) , 1 A R t = Specific Risk Total Risk .
This equivalence provides a bridge between the spectral systemic risk measure used in this paper and the standard risk metrics employed in portfolio performance evaluation. However, the economic interpretation of a high systematic risk share differs sharply between the two settings. In standard portfolio construction, where PCA-based factor models are used to optimise investor portfolios [49,50], a high systematic risk share is generally interpreted as indicating that idiosyncratic exposures have been diversified away, leaving the investor primarily exposed to common market-wide factors. In the pension fund network context, the same quantity carries the opposite interpretation. A high systematic risk share indicates that lifecycle allocation rules have compressed the cross-section of fund behaviour to the point where provider choice offers negligible risk differentiation. When A R t = 0.87 , 87% of all fund return variance is driven by a single common factor, leaving only 13% as fund-specific risk—a level of structural homogeneity that represents a systemic vulnerability rather than a diversification achievement.

2.2.5. Robustness to Benchmark Inclusion and Covariance Estimator

In the baseline analysis, A R t is constructed from R ( f ) alone, preserving the fund-level focus of the primary systemic risk measure. The benchmark series serves two subsequent roles. First, periods of elevated A R t are examined alongside R ( b , f ) and R ( b , g ) to assess whether heightened systemic co-movement coincides with deviations from fund mandates or with broader market stress. Second, as a robustness exercise, A R t is recomputed using the augmented return matrix R, thereby assessing the sensitivity of regime detection to the inclusion of benchmark market factors.1
2.2.5.1. Choice of Covariance Estimator
The unbiased sample covariance is selected as the baseline because it preserves the full empirical concentration of variance in the leading eigenvalue λ 1 ( t ) , which is the precise quantity A R t is designed to measure. Three alternative families of estimators are commonly considered in this context: shrinkage estimators such as Ledoit-Wolf [52], robust estimators such as the minimum covariance determinant (MCD), and Random Matrix Theory (RMT) filtered estimators based on the Marchenko-Pastur (MP) eigenvalue distribution, which separate signal eigenvalues from finite-sample noise [53,54]. Each is appropriate for a different inferential goal: shrinkage stabilises the inverse covariance matrix for portfolio optimisation, MCD downweights outliers to robustly estimate mean-variance, and RMT filtering removes spurious correlations from short panels. None of these objectives align with the absorption ratio’s purpose, which is to detect rather than smooth concentration in the leading eigenvalue.
To verify that this choice does not drive the empirical results, A R t is recomputed using all three alternative estimators as a robustness exercise (Appendix A). The MP-filtered series tracks the baseline almost exactly (correlation 0.98 , mean absolute deviation 0.011 ), confirming that λ 1 ( t ) lies well outside the MP noise bulk throughout the sample and therefore reflects genuine cross-fund co-movement rather than estimation noise. The MCD estimator yields similarly close agreement on average. The Ledoit-Wolf estimator produces a systematic downward level shift, consistent with its design objective of compressing the eigenvalue spectrum toward a structured target.

2.3. Step 2: DTW Hierarchical Clustering

2.3.1. Formal Definition

Let X = { x = ( x 1 , , x T ) : x t R } be the space of univariate real-valued time series of length T. For x , y X , define the local cost matrix C R T × T with entries C i j = ( x i y j ) 2 .
Definition 3 
(Warping Path). A warping path W = { ( i 1 , j 1 ) , , ( i L , j L ) } satisfies: (i) boundary: ( i 1 , j 1 ) = ( 1 , 1 ) and ( i L , j L ) = ( T , T ) ; (ii) monotonicity: i k + 1 i k and j k + 1 j k ; (iii) step size: ( i k + 1 i k , j k + 1 j k ) { ( 1 , 0 ) , ( 0 , 1 ) , ( 1 , 1 ) } .
Definition 4 
(DTW Distance [55]). The DTW distance between x , y X is DTW ( x , y ) = min W ( i , j ) W C i j , computed via the accumulated cost matrix
D i j = C i j + min { D i 1 , j , D i , j 1 , D i 1 , j 1 }
with D 11 = C 11 , and DTW ( x , y ) = D T T .

2.3.2. Semi-Metric Properties and Consequences for Clustering

Theorem 2. 
DTW dissimilarity satisfies non-negativity, identity ( DTW ( x , x ) = 0 ) , and symmetry ( DTW ( x , y ) = DTW ( y , x ) ) , but does not satisfy the triangle inequality in general and therefore does not define a metric.
Proof. 
Non-negativity follows from C i j 0 . Identity holds since, for x = y , the diagonal warping path yields zero cost. Symmetry follows from the symmetry of the local cost function, as ( x i y j ) 2 = ( y j x i ) 2 . The failure of the triangle inequality is a known property of DTW; see [56].    □
Since DTW operates in a non-Euclidean setting and permits nonlinear temporal alignments, standard Euclidean k-means centroids are generally not well-suited to temporally misaligned sequences. DTW barycenter averaging (DBA) [57] addresses this by constructing centroids consistent with DTW alignments, thereby extending centroid-based clustering to DTW-based dissimilarities. However, the resulting optimisation problem remains nonconvex and does not inherit the theoretical structure of classical Euclidean k-means. Within hierarchical clustering, single-linkage methods may exhibit instability under non-metric dissimilarities and do not, in general, preserve ultrametric structure when the underlying dissimilarity fails to satisfy metric axioms.
We therefore adopt Ward’s hierarchical clustering as the primary method, since it groups observations by minimising within-cluster dispersion and extends naturally to general dissimilarity settings [58]. Ward’s linkage offers two additional advantages relevant to the present application: first, it produces a fully nested hierarchical decomposition, enabling the subdivision analysis used to validate the optimal cluster number (Section 3.2); second, its variance-minimising criterion aligns naturally with the provider-cohort block structure of the pension fund network.

2.3.3. Empirical Implementation

DTW-based hierarchical clustering is applied to the N pension fund return series to identify groups of funds with similar return dynamics over the full sample period. Each return series is first standardised to zero mean and unit variance, so that clustering reflects dynamic patterns rather than differences in scale across cohorts. The N × N pairwise DTW distance matrix is then computed using the dtaidistance Python library without imposing a warping window constraint.
The optimal number of clusters K c * is selected by minimising the Davies-Bouldin (DB) index over K c { 2 , , 8 } , with two diagnostic checks. First, the search range is extended to K c = 15 to confirm that finer partitions reflect over-segmentation rather than new structure (Section 3.2). Second, solutions producing singleton clusters are excluded as degenerate. The DB index is preferred over the silhouette coefficient because it does not require a metric space, making it appropriate for DTW-based dissimilarities. Robustness against k-means alternatives, including k-means with DTW barycenters, is verified in Section 3.2. This clustering step provides a cross-sectional characterisation of fund behaviour that complements the time-series analysis of systemic risk in Steps 1 and 3.

2.4. Step 3: Hidden Markov Model Regime Detection

2.4.1. Model Specification

Let { O t } t = 1 T denote the observed absorption ratio sequence, where O t = A R t [ 1 N , 1 ] . We model { O t } as generated by an unobserved discrete-time Markov chain { S t } t = 1 T taking values in the finite state space S = { 1 , , K } .
Definition 5 
(Hidden Markov Model). A K-state Hidden Markov Model (HMM) is characterised by the parameter set λ = ( π , A , B ) , where:
π k = P ( S 1 = k ) , initial distribution , A i j = P ( S t = j S t 1 = i ) , transition probabilities , B k ( o ) = N ( o ; μ k , σ k 2 ) , Gaussian emission density .
Conditional on the latent state S t = k , observations therefore follow a Gaussian distribution, A R t S t = k N ( μ k , σ k 2 ) .

2.4.2. Estimation, Decoding, and Long-Run Behaviour

Parameter Estimation
Model parameters are estimated via the Baum-Welch (EM) algorithm using the forward-backward recursions. The resulting updates maximise the likelihood P ( O λ ) and converge to a stationary point of the likelihood function under standard regularity conditions [59]. Gaussian HMMs are estimated for K { 2 , , 8 } , with multiple random initialisations to mitigate convergence to local optima; the model achieving the highest log-likelihood is retained for each K.
State Decoding
The most likely latent state sequence S * = ( s 1 * , , s T * ) is obtained using the Viterbi algorithm,
S * = arg max S P ( S O , λ ) ,
with computational complexity O ( K 2 T ) . The states are subsequently ordered by their estimated emission means μ ^ k and labelled as Low-, Moderate-, and High-Risk regimes.
Forecasting and Long-Run Behaviour
Conditional on the current state S t , the distribution of S t + h for any horizon h 1 is obtained by iterating the transition kernel:
P ( S t + h = k S t = i ) = ( A h ) i k ,
where A h denotes the h th matrix power of A. Letting e i be the i th unit vector, the forward distribution over states is given by e i A h . If the chain is irreducible and aperiodic, A h converges as h to a rank-one matrix whose rows equal the stationary distribution π * , defined as the unique probability vector satisfying
π * A = π * , π * 1 = 1 .
Equivalently, π * is the left eigenvector of A associated with the eigenvalue 1, and represents the long-run unconditional probability of occupying each regime, independently of the starting state. Forward state probabilities are reported at horizons of one day, one week, one month, one quarter, and six months.

2.4.3. Model Selection

BIC-Based Screening
Candidate models are first evaluated using the Bayesian Information Criterion (BIC),
BIC ( K ) = 2 log P ( O λ ^ ( K ) ) + d ( K ) log T ,
where d ( K ) = K 2 + 2 K is the number of free parameters: K 2 transition probabilities (including the initial state distribution) and 2 K Gaussian emission parameters. While BIC provides a standard likelihood-based selection criterion, it tends to favour more complex models that may include weakly identified states [60,61]. A complementary identifiability analysis is therefore conducted to discriminate genuine regime structure from over-parameterisation.
Calibration of Regime Persistence
The Bayesian specification employs a sticky transition prior to encourage persistence of latent regimes. For each state k, the transition probabilities follow a Dirichlet prior,
π k Dirichlet ( α 1 + κ e k ) ,
corresponding to the finite-state analogue of the sticky HDP-HMM prior of [45], where κ governs the strength of self-transition persistence. Under this prior, the expected self-transition probability is
E [ π k k ] = α + κ K α + κ ,
which, with α = 1 , inverts to κ = ( K p ¯ 1 ) / ( 1 p ¯ ) , where p ¯ denotes the target average self-transition probability.
Rather than fixing κ arbitrarily, it is calibrated empirically from the EM-fitted transition matrix. To verify that the calibration is not itself contingent on a particular choice of K, two anchors are used: the operational baseline ( K = 3 ) and the BIC-optimal specification ( K = 5 ) identified in the empirical analysis. This dual calibration yields two values of κ , both of which are carried forward into the identifiability analysis.
Bayesian Identifiability Analysis
A Bayesian sticky HMM is estimated for K { 3 , 4 , 5 , 6 } using the No-U-Turn Sampler (NUTS), with four parallel chains per configuration. The procedure is repeated under both calibrated values of κ to ensure that the identifiability conclusions do not depend on the EM specification used to anchor the prior. Three diagnostics are used: (1) the Gelman-Rubin statistic ( R ^ ), reported both raw and after label-switching correction obtained by sorting posterior emission means within each draw; (2) the minimum separation between posterior mean emission parameters; and (3) the number of divergent transitions. A state is considered identified when its label-switching-corrected R ^ < 1.1 .
Sensitivity Analysis
To verify that the identifiability results are not artefacts of the calibrated κ , the analysis is repeated across an additional grid of κ values spanning weak to strong persistence priors. The three-state specification is expected to remain identified across the full range of empirically supported κ values, whereas configurations that depend on a narrow prior window are flagged as less robust.
Selected Specification
The number of states is determined by combining BIC with the Bayesian identifiability and sensitivity results. While BIC favours a higher-dimensional specification, the identifiability analysis reveals that additional states beyond K = 3 either fail to separate cleanly or represent finer subdivisions of the same underlying regime structure rather than genuinely distinct economic states. The three-state specification is therefore retained as the smallest parsimonious model that is both statistically identifiable and economically interpretable across both κ calibrations and the full sensitivity grid. Higher-state specifications are examined as robustness checks in the empirical results. All computations are implemented in Python using the hmmlearn library.

2.4.4. Crisis Identification

In addition to regime classification, a data-driven crisis threshold τ is defined as the absorption ratio level at which the moderate- and high-Risk emission densities have equal mass. Under Gaussian emissions with approximately equal variance, this coincides with the midpoint of the two regime means. Crisis periods are then identified as C t = I { A R t > τ } (Theorem 1), providing a complementary threshold-based measure of extreme observations alongside the Viterbi-decoded regime assignment. To assess economic relevance, the inferred regimes are further compared with the global benchmark return dynamics, allowing periods of elevated absorption ratio to be interpreted in the context of broader market conditions.
Figure 1. Empirical pipeline following the CRISP-DM framework. Step 1 produces the absorption ratio time series { A R t } from the pension fund return matrix R ( f ) R T × N ; Step 2 characterises cross-sectional fund structure via DTW-based hierarchical clustering with Ward linkage; Step 3 infers latent systemic risk regimes via a Hidden Markov Model fitted to { A R t } , with the number of states selected by combining BIC with a Bayesian identifiability analysis and regimes decoded via the Viterbi algorithm. The dashed arrow indicates the robustness exercise in which the augmented return matrix R = R ( f ) R ( b , f ) R ( b , g ) R T × ( 2 N + M ) replaces R ( f ) in Step 1.
Figure 1. Empirical pipeline following the CRISP-DM framework. Step 1 produces the absorption ratio time series { A R t } from the pension fund return matrix R ( f ) R T × N ; Step 2 characterises cross-sectional fund structure via DTW-based hierarchical clustering with Ward linkage; Step 3 infers latent systemic risk regimes via a Hidden Markov Model fitted to { A R t } , with the number of states selected by combining BIC with a Bayesian identifiability analysis and regimes decoded via the Viterbi algorithm. The dashed arrow indicates the robustness exercise in which the augmented return matrix R = R ( f ) R ( b , f ) R ( b , g ) R T × ( 2 N + M ) replaces R ( f ) in Step 1.
Preprints 216176 g001
Taken together, the three steps form a single empirical pipeline. Step 1 produces the absorption ratio series that summarises systemic co-movement over time; Step 2 partitions the cross-section of funds into structurally homogeneous groups; and Step 3 maps the absorption ratio dynamics onto a small number of latent risk regimes whose boundary τ is derived from the data rather than imposed. The interpretive role of the benchmark series, the robustness of the covariance estimator, and the identifiability of the regime structure are all built into the design rather than treated as post-hoc checks. The next section applies this pipeline to the Lithuanian second-pillar pension fund panel and reports what it reveals about the structure of systemic risk in the network.

3. Main Results

The empirical analysis proceeds in three stages following the pipeline described in Section 2 and illustrated in Figure 1. Section 3.1 presents the absorption ratio dynamics and robustness results. Section 3.2 reports the DTW clustering structure. Section 3.3 presents the HMM regime detection results, including the emission distributions, the data-driven crisis threshold, and its economic interpretation. Section 3.4 connects the clustering and regime detection findings through a cross-step analysis of tracking error amplification under systemic stress.

3.1. Absorption Ratio Dynamics

Table 2 reports descriptive statistics of the baseline absorption ratio series { A R t } computed from the pension fund return matrix R ( f ) over the full sample period.
The empirical range [ 0.728 , 0.938 ] lies strictly within the theoretical bounds established in Proposition 3, confirming the internal consistency of the measure. The absorption ratio never approaches the theoretical lower bound of 1 / N = 0.025 , which would obtain only if all 40 fund return series were orthogonal. Even in the calmest periods of the sample, a single common factor explains more than 72% of total return variance across the pension fund network, a level of structural co-movement fundamentally inconsistent with genuine portfolio diversification.
Figure 2 provides direct visual evidence of this synchronisation across all three return matrices. Panels 1 and 2 reveal that all five providers move in unison during the COVID-19 shock and the 2022 rate-hike cycle, with the provider-level tracks collapsing toward each other during stress episodes. The AG8/TIPF funds exhibit a visibly compressed return range relative to the active cohorts, consistent with their capital preservation mandate. Panel 3 shows that the three global equity benchmarks co-move closely with each other but follow a different temporal pattern from the pension fund series, particularly during 2022 - 2023, where global benchmarks show sustained negative returns while pension fund dynamics evolve independently.
Figure 3 places A R t alongside these benchmark series to examine the temporal relationship between systemic co-movement and market conditions. The absorption ratio peaks sharply during the COVID-19 shock (early 2020, A R t = 0.938 ) and exhibits sustained elevation during the 2022 interest rate tightening cycle. However, the correspondence between A R t and the global benchmarks is not straightforward: during the 2022 - 2023 period, benchmark returns in Panel 3 stabilise and recover while A R t in Panel 1 remains elevated, suggesting that systemic co-movement in the pension fund network is not simply a reflection of global equity market stress. A formal statistical analysis of this relationship, including significance testing, is deferred to Section 3.4, where it is integrated with the HMM regime classification to enable regime-conditional interpretation.

3.1.1. Robustness: Augmented Absorption Ratio

As a robustness exercise, A R t is recomputed using the augmented return matrix R = [ R ( f ) R ( b , f ) R ( b , g ) ] R T × ( 2 N + M ) , where 2 N + M = 83 . The augmented series ranges from 0.468 to 0.919, with a mean of 0.712, compared with 0.837 for the baseline. The correlation between the two series is 0.341, indicating that the inclusion of the benchmark series substantially changes the co-movement structure captured by the absorption ratio. The augmented version of the absorption ratio moves somewhat like the original version, but not too closely, because adding benchmarks changes how assets move together. This supports the methodological choice to construct the baseline covariance matrix Σ t exclusively from pension fund returns, ensuring the risk measure remains fund-focused. Figure 4 compares the baseline and augmented absorption ratios over time.
The panel shows both series, the baseline A R t consistently remains higher, while the augmented A R t fluctuates more and exhibits deeper troughs. This divergence reflects the structural impact of adding benchmark-linked returns, which redistribute variance across a broader set of factors and reduce the dominance of the leading eigenvalue. To assess the materiality of regularisation, ridge regularisation is deliberately applied to the full-rank baseline covariance matrix. Although regularisation is required for the rank-deficient augmented specification, applying it to the baseline induces distortions of 0.013 – 0.047 in A R t across representative sample windows (Table 3). This confirms that regularisation must be restricted to the augmented robustness exercise and omitted from the baseline computation.

3.2. DTW Hierarchical Clustering

The clustering analysis proceeds in three stages: selection of the optimal cluster number, characterisation of the resulting structure, and robustness verification against extended search ranges and alternative algorithms.

3.2.1. Optimal Cluster Number

Table 4 reports the Davies-Bouldin (DB)index across K c { 2 , , 8 } clusters for Ward’s linkage applied to the pairwise DTW distance matrix. The index is minimised at K c * = 8 .
Figure 5 presents the corresponding dendrogram with the K c * = 8 cut.

3.2.2. Cluster Structure

Table 5 reports the composition and mean tracking error of each cluster, and Figure 6 maps cluster assignments onto the provider-cohort grid.
Table 5. Cluster composition and mean tracking error at K c * = 8 . Tracking error (TE) is the mean absolute daily deviation of fund returns from their fund-specific benchmark R ( b , f ) .
Table 5. Cluster composition and mean tracking error at K c * = 8 . Tracking error (TE) is the mean absolute daily deviation of fund returns from their fund-specific benchmark R ( b , f ) .
Cluster Composition Size Mean TE
C1 Provider 3, AG2-AG7 6 0.0046
C2 Provider 1, AG2-AG7 6 0.0066
C3 Provider 1, AG1 & AG8 2 0.0014
C4 Providers 2 & 5, AG1 & AG8 4 0.0010
C5 Provider 4, AG1 & AG8 2 0.0012
C6 Provider 3, AG1 & AG8 2 0.0011
C7 Provider 4, AG2-AG7 6 0.0041
C8 Providers 2 & 5, AG2-AG7 12 0.0036
Two structural patterns emerge. First, extreme-cohort funds (AG1, born 1954-1960; and AG8, the Target Income Pension Fund) separate sharply from middle cohorts (AG2-AG7) across all five providers. Clusters C3-C6 group exclusively AG1 and AG8 funds within or across providers, with mean tracking errors of 0.0010-0.0014, three to six times lower than the 0.0036-0.0066 observed for middle-cohort clusters C1, C2, C7, and C8. This is consistent with the conservative, low-turnover mandates governing capital-preservation funds: AG1 funds serve participants nearing retirement, while AG8/TIPF funds serve participants actively drawing retirement income. The compressed return range of TIPF funds is also visible in Figure 2.
Second, clustering is organised primarily by age cohort rather than by provider. Cluster C8 groups the middle cohorts of Providers 2 and 5 across provider boundaries (12 funds), while Cluster C4 groups their extreme cohorts (4 funds), indicating that these two providers operate nearly identical lifecycle allocation strategies. The heatmap in Figure 6 makes this visible as a column-dominant pattern: colour transitions are sharper across cohort columns than across provider rows. This cross-provider grouping by lifecycle stage indicates that regulatory lifecycle allocation constraints – which prescribe similar asset allocations for funds targeting the same birth-year cohort regardless of provider – are the dominant driver of return co-movement in this network, rather than provider-specific investment decisions.

3.2.3. Robustness: Extended Search Range

To verify that K c * = 8 is not an artifact of the search range, the analysis was repeated for K c { 2 , , 15 } , exceeding the eight age cohorts that define the most granular natural grouping in the data. Davies-Bouldin (DB) scores continue to fall mechanically beyond K c = 8 , reaching 0.553 at K c = 15 . However, this trajectory reflects subdivision of existing clusters rather than discovery of new structure, as shown in Table 6.
Two observations follow. First, the K c = 15 partition is fully nested within the K c = 8 partition: every K c = 15 cluster is a strict subset of one K c = 8 cluster, with no reassignment across K c = 8 boundaries. Five of the K c = 15 clusters contain a single fund, confirming that the lower DB scores at higher K are driven by isolating individual funds as singleton clusters rather than identifying genuine structural groups. The K c * = 8 solution is therefore retained as the smallest K minimising DB without producing degenerate clusters.
Second, the subdivision of Cluster C8 yields a substantive observation: when forced to split the 12-fund Providers 2 & 5 middle-cohort group, the algorithm groups the youngest middle cohorts of both providers together (AG2-AG3, TE = 0.0026) while separating the older middle cohorts (AG4-AG7) by provider. This indicates that Providers 2 and 5 follow nearly identical allocation rules for the youngest cohorts but diverge slightly for cohorts approaching mid-career, consistent with greater allocation flexibility at the equity-heavy end of the lifecycle glide path.

3.2.4. Robustness: Algorithm Choice

Two k-means alternatives were also examined to verify that the cohort-dominant structure is not an artifact of the algorithm choice: k-means with Euclidean distance and k-means with DTW barycenters via the tslearn library [62]. Table 7 reports DB scores for all three methods.
Across the seven values of K c tested, Ward+DTW produces the lowest DB score at four ( K c { 2 , 3 , 4 , 6 } ), is marginally outperformed by k-means with Euclidean distance at K c = 5 and by k-means with DTW barycenters at K c = 8 , and produces identical scores at K c = 6 , 7 , & 8 for the first two methods. The differences across methods are small in absolute terms (within 0.04 DB units at the optimum), and the DB-optimal solution across all three methods occurs at K c = 8 .
Unanimous Partition at K c = 7
A particularly informative pattern emerges at K c = 7 , where all three methods produce identical DB scores (0.8688). Computing the adjusted rand index (ARI) across all three pairwise method comparisons at K c = 7 yields ARI = 1 in every case (Ward vs. k-means Euclidean: 1, Ward vs. k-means DTW: 1, k-means Euclidean vs. k-means DTW: 1), confirming that the three methods recover identical partitions at this K c . This convergence is methodologically significant: it demonstrates that the cohort-dominant block structure of the network is detected unanimously across distance metrics (Euclidean and DTW) and across algorithm families (hierarchical and partitional).
The structural relationship between the unanimous K c = 7 partition and the K c * = 8 baseline is reported in Table 8. The collapse from K c = 8 to K c = 7 merges Clusters C5 (Provider 4, AG1 + AG8) and C6 (Provider 3, AG1 + AG8) into a single 4-fund group, while leaving the other six clusters unchanged. This indicates that Providers 3 and 4 follow nearly identical extreme-cohort allocation strategies, second only to Providers 2 and 5, who already share Cluster C4 at K c * = 8 . The implications of this provider-pairing pattern are discussed in Section 4.
Comparison at K c * = 8
At the DB-optimal K c * = 8 , all three methods yield comparable scores (Ward+DTW: 0.7505, k-means Euclidean: 0.7505, k-means DTW: 0.7183), and the cluster assignments produced by k-means with DTW barycenters agree with the Ward+DTW partition at ARI = 0.97 . The full cluster composition under k-means with DTW barycenters is reported in Table 9, and the cross-tabulation with the Ward+DTW partition is reported in Table 10.
Seven of the eight Ward clusters map perfectly onto a single k-mean cluster. The sole disagreement involves Ward Cluster C3 (Provider 1, AG1 + AG8), which k-means partitions by reassigning Provider 1’s AG1 fund to the Provider 1 middle-cohort group (KM2) rather than grouping it with AG8 (KM6). This is the only fund in the network whose cluster assignment is methodologically ambiguous, suggesting that Provider 1’s AG1 fund operates with a slightly more equity-tilted allocation than the AG1 funds of the other four providers, an interpretation consistent with Provider 1’s higher middle-cohort tracking error (C2: TE = 0.0066, the highest in Table 5).
Choice of Ward as Baseline
Ward’s linkage is retained as the baseline for two reasons: it is the only method tested that produces a fully nested hierarchical decomposition, enabling the dendrogram in Figure 5 and the K c = 15 subdivision analysis in Table 6; and it accommodates the provider-cohort block structure of the network through its variance-minimising criterion. The empirical agreement at K c = 7 ( ARI = 1 ) and at K c = 8 ( ARI = 0.97 ) confirms that this methodological choice does not affect the substantive findings of the paper.

3.3. HMM Regime Detection

3.3.1. Model Selection

Selecting the number of latent regimes K in a Gaussian HMM requires balancing in-sample fit against the risk of over-parameterisation. We combine the standard BIC with a Bayesian posterior identifiability analysis under two data-driven κ calibrations, leading to a defensible choice of K = 3 as the baseline.
BIC-Based Comparison
Table 11 reports BIC values for HMM models with K { 2 , , 8 } states estimated by the Baum-Welch (BW) algorithm with 20 random initialisations each.
BIC is minimised at K = 5 . However, BIC penalises model complexity through the number of free parameters but does not assess whether the additional components correspond to interpretable or well-separated states. This limitation is well documented in the mixture-model literature [60,61]: while BIC consistently estimates the number of mixture components asymptotically, in practice it can select extra components when distributions overlap or when a single cluster is represented by multiple mixture components. We therefore complement BIC with a posterior identifiability analysis.
Dual Empirical Calibration of κ
The identifiability analysis relies on a Bayesian finite sticky HMM with the Dirichlet prior specified in Section 2.4. To avoid the circularity of choosing a prior that determines the answer, κ is calibrated empirically by inverting the prior expected self-transition probability against the EM-fitted transition matrix. Because this calibration depends on K, we use two anchors: the operational baseline ( K = 3 ) and the BIC-optimal specification ( K = 5 ). The mean diagonals are p ¯ 3 = 0.971 for the three-state EM model (diagonals 0.948 , 0.980 , 0.985 ) and p ¯ 5 = 0.953 for the five-state model (diagonals 0.953 , 0.957 , 0.946 , 0.976 , 0.934 ), yielding calibrated values κ 3 66 and κ 5 80 . Both values are carried forward into the identifiability analysis, ensuring that conclusions do not hinge on the choice of EM anchor.
Identifiability Under Both Calibrations
We fit the finite sticky HMM via NUTS sampling at each calibrated κ for K { 3 , 4 , 5 , 6 } , using four parallel chains with 1,000 warmup and 1,000 post-warmup iterations per chain. The Gelman–Rubin R ^ statistic for emission means is reported both raw and after label-switching correction (sorting emission means within each posterior draw). A state is considered identified when its corrected R ^ < 1.1 . Table 12 reports the results for both calibrations.
The two calibrations agree on every qualitative conclusion. K = 3 is cleanly identified under both priors (corrected R ^ = 1.000 , minimum gap 0.044 absorption-ratio units). K = 4 fails identification entirely under both priors, with corrected R ^ reaching 13.13 ( κ 3 ) and 14.77 ( κ 5 ). K = 5 is also identified under both priors, but represents a finer subdivision of the three-regime structure with smaller emission-mean gaps ( 0.019 ); this case is examined as a robustness check in Section 3.3.6. K = 6 shows partial identification under both calibrations (2 of 6 states at κ 3 , 1 of 6 at κ 5 ), indicating further over-parameterisation. The agreement between the two calibrations confirms that the identifiability conclusions are properties of the data, not of the prior anchor.
Sensitivity to the Stickiness Prior
To verify that these conclusions extend beyond the calibrated values, the analysis is repeated for κ { 10 , 20 , 30 , 50 } in addition to the two anchors. Table 13 reports the number of identified states across the full grid.
K = 3 is identified at every κ value consistent with the empirical persistence of the data ( κ 20 ). The single failure at κ = 10 corresponds to a prior expected persistence (Equation 2), which ranges from 0.69 at K = 6 to 0.85 at K = 3 , substantially below the empirical mean of 0.97. At this κ , the prior is therefore inconsistent with the data and the chains collapse into degenerate posterior regions.
K = 4 exhibits the opposite pattern. The chains appear to converge only at κ = 10 , where the prior is itself inconsistent with the empirical persistence and therefore cannot be used to support K = 4 as identified. At every κ consistent with the data ( κ 20 ), K = 4 fails identification under both calibrations, confirming that four genuinely distinct regimes do not match this network.
We retain K = 3 as the baseline. It is the smallest specification cleanly identified under both data-calibrated priors and across the full empirically supported κ range, and it aligns with the standard low-, moderate-, and high-risk regime taxonomy in the systemic risk literature [41]. The EM and Bayesian estimates at K = 3 agree to within 0.001 across all emission parameters (Section 3.3.2), and the BIC-optimal K = 5 specification is examined as a finer refinement in Section 3.3.6.

3.3.2. Regime Characterisation

Table 14 reports the emission parameters, state occupancy, and persistence statistics for each of the three regimes under the K = 3 baseline, together with mean daily log returns of global equity benchmarks over the regime-assigned days.
To verify that the regime estimates are not artefacts of the EM optimisation, the emission means are re-estimated under the Bayesian sticky HMM at the data-calibrated prior κ 3 = 66 . Table 15 compares the two estimators after label-switching correction.
The EM and Bayesian point estimates agree to within 0.001 across all three regimes, and the 90% credible intervals are narrow (widths of 0.002-0.007). This convergence between two methodologically distinct estimators, one frequentist and one fully Bayesian with a data-calibrated prior, confirms that the data sharply identify the three-regime structure and not an artefact of either estimation approach. Figure 7 plots the three estimated Gaussian emission distributions in relation to the crisis threshold τ derived in Section 3.3.3.
Figure 8 overlays the regime classification onto the absorption ratio and benchmark return series across the full sample period. The high-risk regime (red shading) captures the COVID-19 shock and the 2022 rate-hike period, as well as additional episodes of elevated co-movement in 2021 and 2024-2025. The decoupling between global benchmark recovery and the persistence of high-risk shading, particularly visible in late 2022, where the bottom panel shows stabilising global benchmark returns while the top panel remains in the high-risk regime, confirms that pension fund network stress is not a mirror of equity market dynamics but reflects a structural synchronisation that persists beyond the initial shock.

3.3.3. Crisis Threshold

The empirical crisis threshold τ is defined as the absorption ratio level at which the moderate- and high-risk emission densities have equal mass. Under Gaussian emissions with approximately equal variance, this coincides with the midpoint of the two regime means:
τ = μ ^ Moderate + μ ^ High 2 = 0.8291 + 0.8737 2 = 0.851 .
By Theorem 1 and Proposition 3, this corresponds to a critical average inter-fund correlation of
ρ ¯ crit = N τ 1 N 1 = 40 × 0.851 1 39 0.848 .
The secondary threshold τ 1 = 0.801 separates low- from moderate-risk, corresponding to ρ ¯ crit , 1 0.796 .
The threshold τ and the Viterbi decoder give two related but distinct classifications. The daily crisis indicator C t = I { A R t > τ } flags 494 days as crisis episodes (31.3% of the sample), whereas the Viterbi-decoded high-risk regime spans 516 days (32.7%). The 22-day difference reflects the overlap zone shown in Figure 7: within this zone, Viterbi assigns the persistent state encoded in the transition matrix rather than the closer emission density on any given day. The downstream econometric analysis in Section 3.4 and Section 3.4.1 uses the Viterbi-decoded high-risk regime, while τ provides the economic interpretation of where that regime boundary lies in observation space.
The empirical value τ = 0.851 substantially exceeds the 0.50-0.60 range commonly assumed in the systemic risk literature. This reflects the structural homogeneity of the pension fund network and confirms that crisis identification criteria must be calibrated to the institutional context rather than imported from cross-sectoral applications.

3.3.4. Regime Persistence and Transition Structure

Beyond regime characterisation, the full transition probability matrix reveals an asymmetric escalation pathway between regimes. Table 16 reports the estimated matrix.
Direct transitions from low-risk to high-risk are very rare (probability 0.0046), meaning the system almost always passes through the Moderate-Risk regime when escalating. This carries early-warning potential: a sustained rise in A R t into the moderate-risk zone is a precursor signal before a full high-risk episode.

3.3.5. Current State and Forward Probabilities

As of the final observation in the sample (29 September 2025), the pension fund network is in the moderate-risk regime, with A R t = 0.8285 , closely aligned with the Moderate-Risk emission mean μ ^ = 0.8291 . The system has remained in this regime for 38 consecutive trading days, against an expected total duration of 51 trading days.
Forward state probabilities P ( S t + h = k S t = Moderate - Risk ) are computed by iterating the transition matrix as defined in Section 2.4. Table 17 reports these probabilities at horizons ranging from one day to six months, alongside the long-run stationary distribution.
At the one-quarter horizon, the probability of transitioning to the high-risk regime rises to 23.1%, exceeding the probability of returning to the low-risk regime (15.5%). This asymmetry reflects the dominant role of the moderate-to-high transition channel established in Table 16: from the moderate-risk regime, escalation is structurally more likely than de-escalation at medium horizons. Over longer horizons, the forward probabilities converge towards the stationary distribution: 30.7% high-risk, 55.3% moderate-risk, and 14.0% low-risk. This long-run equilibrium confirms that the pension fund network spends approximately 86% of its time in moderate- or high-risk regimes, reinforcing the characterisation of elevated systemic co-movement as the structural norm rather than an exceptional state.

3.3.6. Robustness: Higher-State Specifications

The BIC-optimal K = 5 specification was shown to be statistically identified under both κ calibrations (Table 12), with all five emission means distinguishable across chains. The five estimated emission means (0.767, 0.815, 0.836, 0.857, 0.891) cluster into three groups when ordered: a single low state ( 0.77 ), three intermediate states ( 0.81 0.86 ), and a single high state ( 0.89 ). This structure refines rather than contradicts the three-state baseline: the five-state classification partitions the moderate-risk zone into three sub-regimes of varying stress intensity, while the low- and high-state assignments correspond directly to the low-risk and high-risk regimes of the three-state model.
By contrast, the K = 4 specification fails identification across all κ values consistent with the data’s empirical persistence (Table 13), indicating that there is no intermediate four-regime structure between the coarse three-regime classification and the finer five-state subdivision. K = 3 is the smallest identified specification and K = 5 the next, with K = 4 representing a structural discontinuity.
The cross-regime analysis in Section 3.4 is conducted at K = 3 for parsimony and economic interpretability, but the substantive findings are preserved if the analysis is repeated using the K = 5 specification with the three intermediate states aggregated to the Moderate-Risk regime.

3.4. Correlation Analysis and Regime Conditioning

Table 18 reports three correlation measures between A R t and the interpretive benchmark series over the full sample period: Pearson (parametric), Spearman (rank-based), and Kendall’s tau (rank-based, less sensitive to ties and outliers). Each is reported with p-values testing H 0 : ρ = 0 .
The three tests yield a consistent picture. None of the four series exhibits a statistically significant rank-based association with A R t under either Spearman or Kendall, with p-values uniformly above 0.25. The two Pearson coefficients that reach conventional significance, tracking error ( r = 0.086 , p < 0.001 ) and the MSCI World Index ( r = 0.061 , p = 0.016 ), are not corroborated by either rank-based test. Since rank-based correlations are far less sensitive to a small number of extreme observations than Pearson, this divergence indicates that the parametric significance reflects the influence of a subset of high-leverage points rather than a continuous monotonic relationship spanning the full distribution.
Figure 9 provides a visual diagnostic of this pattern. The tracking error panel reveals that the Pearson association is driven primarily by a subset of high-risk observations exhibiting both elevated A R t and elevated tracking error, while moderate- and low-risk observations form a diffuse cloud with no discernible trend. This concentration of the association in one regime explains the divergence between Pearson, which weights extreme values heavily, and the rank-based tests, which treat all observations equally regardless of magnitude. The global benchmark panels confirm the absence of any continuous relationship: the full-sample OLS regression lines are essentially flat, and observations are well mixed across regimes at every level of A R t . The most extreme negative benchmark returns ( 0.10 to 0.12 ) correspond to low-risk rather than high-risk observations—an apparent paradox that reflects the 60-day rolling window construction of A R t , which captures sustained co-movement over the preceding window rather than instantaneous market shocks.

Regime-Conditional Regression

The pattern in Figure 9 suggests that any genuine relationship between A R t and the interpretive series may operate primarily within the high-risk regime rather than uniformly across the sample. To test this directly, each interpretive series is regressed on A R t , a High-Risk regime dummy D High , t , and their interaction:
y t = β 0 + β 1 A R t + β 2 D High , t + β 3 ( A R t · D High , t ) + ε t .
This specification fits two separate lines simultaneously: a calm-regime line with intercept β 0 and slope β 1 , and a high-risk line with intercept β 0 + β 2 and slope β 1 + β 3 . The interaction coefficient β 3 therefore measures the difference in slopes between the two regimes; a significant β 3 indicates a regime-dependent rather than continuous relationship. Standard errors are computed using the HAC (Newey-West) estimator with 7 lags to account for the strong autocorrelation in A R t induced by the 60-day rolling window, consistent with the autocorrelation structure implied by the transition matrix persistence estimates in Table 16.
Figure 10 visualises the regime-conditional fits reported in Table 19, replacing the single full-sample OLS line of Figure 9 with two regime-specific lines per panel. The geometric distance between the two slopes in each panel corresponds directly to the interaction coefficient β 3 .
The regression results in Table 19 and the corresponding visualisation in Figure 10 reinforce three points. First, the non-high-risk slope β 1 is small and statistically indistinguishable from zero for all four dependent variables, confirming the absence of a continuous relationship between A R t and the interpretive series outside crisis periods. This is visible in Figure 10 as the near-horizontal blue line across every panel. Second, the interaction coefficient β 3 for tracking error is positive ( + 0.037 ), with the sign and rough magnitude predicted by the cluster-level amplification analysis in Section 3.4.1. Although β 3 falls short of conventional significance under the conservative HAC adjustment ( p = 0.157 ), its magnitude is economically meaningful: a one-unit increase in A R t during high-risk periods is associated with a 3.7% point increase in average daily tracking error, an order of magnitude consistent with the 1.09× to 1.23× amplification ratios documented at the cluster level. The visual divergence between the calm and high-risk slopes in the tracking error panel of Figure 10 illustrates this regime-dependent amplification clearly. The limited statistical power of the daily-frequency test reflects the conservative HAC adjustment, the concentration of the effect in roughly one-third of observations, and the substantial idiosyncratic variation in tracking error across the 40 funds. Third, no global benchmark exhibits a regime-dependent relationship with A R t : all interaction coefficients are small, of inconsistent sign, and far from significance, with R 2 values below 0.006 . The near-parallel calm and high-risk slopes in the three benchmark panels of Figure 10 make this visually unambiguous. This rules out the alternative interpretation that systemic co-movement in the pension fund network is a passive reflection of global equity market conditions.
Together, the rank-based correlations and the regime-conditional regression resolve the apparent tension in the Pearson results: the parametric significance for tracking error reflects regime-specific amplification rather than a continuous linear effect, but the daily-frequency tests have limited power to confirm this directly. The cluster-level analysis in Section 3.4.1 provides a more powerful test of the same hypothesis by aggregating observations within structurally homogeneous fund groups under each regime, where the amplification effect is unambiguous.

3.4.1. Cluster-Level Tracking Error Amplification

Figure 11 quantifies the tracking error amplification at the cluster level, connecting the DTW clustering results of Section 3.2 with the HMM regime classification of Section 3.3. By aggregating observations within structurally homogeneous fund groups under each regime, this analysis tests directly whether the regime-dependent amplification suggested by Section 3.4 is present in the data.
All eight clusters exhibit positive amplification ratios, confirming that tracking error increases universally during high-risk periods regardless of cohort type or provider. The magnitude differs systematically between cluster types. Middle-cohort clusters (C1, C2, C7, C8) show higher absolute tracking errors in both regimes, reflecting their greater equity exposure. Cluster C2 (Provider 1, AG2 - AG7) exhibits the highest absolute tracking error ( 6.3 × 10 3 during high-risk periods) with an amplification ratio of 1.18 × , while Cluster C7 (Provider 4, AG2 - AG7) records the highest amplification ratio of 1.23 × , tracking error grows by nearly a quarter during systemic stress.
Extreme-cohort clusters (C3 - C6, grouping AG1 and AG8/TIPF) show lower absolute tracking errors but still amplify during stress, with ratios ranging from 1.09 × (C6, Provider 3 extreme cohorts) to 1.17 × (C5, Provider 4 extreme cohorts). The 1.09 × ratio for C6, the lowest in the sample, indicates that even the most conservative cluster is not immune to systemic stress, merely less sensitive to it. This has direct welfare implications: TIPF funds serve participants actively drawing down retirement income, for whom any tracking error amplification translates immediately into income uncertainty at a life stage when recovery is not possible.

4. Discussion

The results in Section 3 point to a network whose risk architecture is shaped less by what providers choose to do than by what regulation requires them all to do in similar ways. We organise the discussion around four findings and what they imply for supervision.

4.1. When “Diversification” Stops Being Diversification

Across the seven-year sample, the absorption ratio never falls below 0.73. Even in the calmest 60-day windows we observe, a single common factor explains nearly three-quarters of the variance across 40 funds. By Proposition 3, this corresponds to an average pairwise correlation above 0.72, which is hard to reconcile with the textbook picture of a diversified system.
The clustering results tell us why. Funds group by age cohort, not by provider. Cluster C8 pulls together 12 middle-cohort funds from Providers 2 and 5, and Cluster C4 does the same for their extreme cohorts. That the three clustering methods agree exactly at K c = 7 ( ARI = 1 ) is reassuring: this is not an artefact of one algorithm or one distance metric.
The mechanism is regulatory. Lifecycle rules tell every provider to allocate similarly within a given birth-year cohort, so participants who choose between providers are, in effect, choosing between portfolios that look almost identical at the cohort level. The same architecture that protects an individual saver from extreme allocation choices removes provider differentiation as a system-level risk-mitigation channel. The cross-provider merging of Providers 2 and 5 at K c * = 8 (Clusters C8 and C4, Table 5) illustrates the extreme case: from a systemic risk perspective, these two providers constitute a single functional node, which amplifies the network concentration finding beyond what fund count alone would suggest. The economic reading of a high systematic risk share, therefore, inverts in this setting: standard portfolio theory treats it as a sign that idiosyncratic risk has been diversified away, but in a pension network, it is a sign that the cross-section has been compressed to the point of fragility. This reframing changes the regulator’s problem. The standard supervisory framework under IORP treats each fund as the unit of risk. The clustering result implies that the operative unit of risk is the cohort, not the fund: a shock affecting any middle-cohort fund simultaneously stresses all middle-cohort funds across all providers, regardless of which provider a participant chose. A participant who switches providers to reduce exposure is not reducing exposure at all. The robustness of this structure across all three clustering methods, with identical partitions at K c = 7 ( ARI = 1 , Section 3.2), confirms that the cohort-dominant pattern is a property of the data and not an artefact of the Ward linkage criterion chosen a priori in Section 2.3. Whether a supervisor addresses this by loosening cohort-level allocation constraints to restore provider differentiation, or by introducing system-level diversification requirements that operate on the network rather than the fund, is a normative question about whose risk the regulation is meant to manage. The empirical analysis cannot resolve it, but it makes the trade-off precise and previously invisible.
The risk of regulatory design producing precisely this kind of portfolio synchronisation is not confined to lifecycle allocation rules. In the context of the proposed IORP II revision, PensionsEurope has warned explicitly that performance benchmarking requirements may induce uniform portfolio alignment across providers: institutions seeking to avoid being flagged as outliers or subject to corrective measures may increasingly allocate toward reference benchmarks, generating concentration risks and reducing the scope for differentiated long-term strategies [63]. The clustering results in Section 3.2 provide empirical content for that warning in the Lithuanian second-pillar context: the cohort-dominant structure documented here is the ex post outcome of a regulatory design that already constrains cross-sectional variation, and the PensionsEurope concern is that the proposed benchmarking layer would deepen that constraint further. The present framework makes the cost of such deepening measurable: any regulatory change that increases portfolio similarity within cohorts would raise the absorption ratio floor above its current minimum of 0.728 and compress the already narrow window between the lower threshold τ 1 = 0.801 and the crisis threshold τ = 0.851 .

4.2. Elevated Co-Movement Is the Norm, Not the Exception

The HMM identifies three regimes. The low-risk regime ( μ ^ = 0.773 ) accounts for only 13.2% of trading days; the moderate-risk regime ( μ ^ = 0.829 ) for 54.1%; and the high-risk regime ( μ ^ = 0.874 ) for 32.7%. The long-run stationary distribution converges to the same proportions, confirming that monitoring frameworks calibrated on the assumption that calm is the default state will systematically underestimate the frequency and persistence of risk in this sector: the network’s baseline condition is already elevated, not one that occasionally enters stress.
Two features of the regime structure deserve emphasis beyond the occupancy shares. First, regimes are sticky: once in the high-risk state, the expected duration is approximately 65 trading days (Table 14), meaning systemic stress episodes in this network do not resolve quickly. Second, the transition matrix is asymmetric: direct escalation from low- to high-risk occurs with probability 0.0046, so the system almost always passes through the moderate-risk band before a full crisis. This asymmetry is operationally significant. It means the moderate-risk regime is not a neutral middle state but the zone in which a supervisor has the most time to act. A sustained rise in A R t above the lower threshold τ 1 = 0.801 is a structural precursor signal, not a coincident one, and the transition probabilities quantify how much time typically elapses before escalation completes. Prior HMM applications to equity markets typically find low-risk states dominating the unconditional distribution; the finding here that low-risk accounts for only 13.2% of trading days is genuinely anomalous and reflects a network whose baseline condition is already elevated rather than one that occasionally enters stress.
The empirical crisis threshold τ = 0.851 is substantially above the range commonly used in the systemic risk literature [47]. This is simply the homogeneity finding stated in different units: in a network whose average pairwise correlation never drops below 0.72, the line between ordinary stress and crisis has to sit higher. Importing a generic threshold here would label roughly 70% of the sample as ‘crisis’ and would not be informative.

4.3. The Network Is Not Just Tracking Global Markets

A natural alternative explanation for the elevated absorption ratio is that it simply mirrors global equity market stress transmitted through fund holdings. The regime-conditional regressions rule this out. Slopes of A R t on the S&P 500, MSCI Europe, and MSCI World are statistically indistinguishable from zero in both calm and high-risk regimes, with R 2 below 0.006. The decoupling is visible in Figure 8: through late 2022 and into 2023, global benchmarks stabilise and recover while the network remains in high-risk.
The cluster-level tracking-error result reinforces this picture. The daily-frequency interaction coefficient is positive and economically meaningful ( + 0.037 ) but, as expected with HAC adjustment and a small effective sample within the high-risk regime, does not reach conventional significance. Aggregating to the cluster level removes that ambiguity: every one of the eight clusters amplifies during high-risk periods, with ratios from 1.09 × (C6, the most conservative) to 1.23 × (C7, the equity-heavy middle cohort). Whatever initiates a high-risk episode, what sustains it is internal to the network: amplification is universal across all eight clusters, not concentrated in the most equity-exposed funds. The practical implication is direct: supervisory tools designed to monitor externally transmitted shocks, global VaR limits, benchmark drawdown alerts, and MSCI factor exposure reports are largely blind to the risk documented here. A regulator relying exclusively on market-linked surveillance would have observed recovering global benchmarks through late 2022 and early 2023 while the pension network remained in the high-risk regime. The monitoring gap is not a calibration problem that better parameters would fix; it is structural, because the signal is in the cross-sectional co-movement of fund returns rather than in any market index. Closing it requires a network-internal indicator of the kind the absorption ratio provides.

4.4. What the Framework Makes Possible

The principal operational advance of this paper is that systemic risk monitoring for a pension fund network can now be conducted using only publicly available net asset value data, with no proprietary holdings information, no bilateral exposure records, and no labelled crisis history. The Bank of Lithuania publishes daily NAV data for all second-pillar funds; the absorption ratio and regime probabilities can be computed from these with a one-day lag and updated as a standing dashboard. This was not possible with previously available unsupervised frameworks, which either required balance-sheet data unavailable at daily frequency or relied on external benchmarks to define regime boundaries.
The framework’s credibility for regulatory use rests on a property that most HMM applications to financial data cannot claim: the regime structure is identified by the data rather than by modelling choices. The crisis threshold τ = 0.851 is derived from the emission distributions, not imposed; the number of states is selected by combining BIC with a Bayesian identifiability analysis under two independent prior calibrations; and the EM and Bayesian point estimates agree to within 0.001 across all three regimes (Table 15). A supervisor implementing this framework does not need to defend a discretionary tuning decision, because no such decision was made.
One identifiability result deserves separate attention. The K = 4 specification fails identification across the full range of empirically supported κ values, while both K = 3 and K = 5 are cleanly identified. The absence of a coherent four-regime partition, with K = 3 and K = 5 separated by a structural gap at K = 4 , is not a modelling failure but an empirical property of the absorption ratio’s dynamics in this network. It implies that the risk surface has a coarse three-level architecture, with finer gradations available only within the moderate-risk band. This is itself a finding about how systemic risk escalates in this sector and one that warrants theoretical attention in future work.

4.5. Implications for Supervision and Regulation

Three operational implications follow directly from the empirical results.
For supervisory monitoring, the framework produces two actionable thresholds from the data. The lower threshold τ 1 = 0.801 separates low- from moderate-risk and, given the asymmetric transition structure (Table 16), constitutes a structural precursor to crisis: a sustained absorption ratio above this level should trigger an enhanced monitoring requirement and increased reporting frequency. The crisis threshold τ = 0.851 marks the boundary of the high-risk regime and should trigger active supervisory engagement. Both thresholds can be computed daily from publicly available NAV data published by the Bank of Lithuania, with no proprietary data requirement.
For lifecycle regulation, the cohort-dominant cluster structure reveals a tension between micro-prudential and macro-prudential objectives that current rules do not resolve. Lifecycle allocation constraints protect individual participants from age-inappropriate risk, but they simultaneously eliminate provider choice as a system-level diversification channel. Two regulatory instruments could address this. The first is a loosening of cohort-level allocation bands to allow more provider variation within cohort, restoring some cross-sectional differentiation. The second is a system-level diversification requirement imposed at the network level rather than the fund level, analogous to concentration limits in banking supervision. Which instrument is preferable depends on whose risk the regulation is primarily meant to manage, a question that is normative rather than empirical, but one that the current analysis makes precise for the first time.
For participant welfare, the 1.09 × amplification observed for AG8/TIPF funds is the finding that most directly contradicts standard risk disclosures for conservative vehicles. Participants drawing down retirement income during a high-risk episode face elevated income uncertainty at a life stage where intertemporal recovery is not available. Risk disclosures and guarantee structures for capital-preservation funds should reflect this residual systemic sensitivity, which is invisible to fund-level volatility metrics, but measurable in the network framework developed here.

5. Conclusion

This paper establishes that systemic risk in a regulated pension fund network is not imported from global markets but generated internally by the network’s own structure. The finding reframes a regulatory design question: lifecycle allocation rules that protect individual participants simultaneously compress the cross-section of fund behaviour to the point where provider choice offers no meaningful risk differentiation at the system level. A monitoring framework that does not account for this internal dynamic will be blind to the dominant source of risk in the sector.
Applied to 40 Lithuanian second-pillar pension funds over January 2019 - September 2025, the framework delivers a finding that inverts the causal direction assumed by most supervisory frameworks. The prior implicit model is that external shocks propagate into pension funds through their market exposures. The result here is that the network generates its own high-risk states, independently of global equity conditions, and spends the majority of its time in them. The absorption ratio has no statistically significant relationship with global equity benchmarks in either calm or high-risk regimes. The high-risk state persists through periods of global market recovery, confirming that network stress is not resolved by external stabilisation. And tracking-error amplification during stress is universal across all eight fund clusters: even the most conservative funds serving retirement-age participants amplify by 1.09 × , at a life stage where intertemporal recovery is unavailable. The source of the vulnerability is structural homogeneity induced by lifecycle allocation regulation, not market exposure.
Three actors face distinct implications. The Bank of Lithuania can implement the monitoring framework immediately: the absorption ratio and regime probabilities are computable from the NAV data it already publishes, and the thresholds τ 1 = 0.801 and τ = 0.851 provide operationally precise triggers. The European Insurance and Occupational Pensions Authority (EIOPA) is positioned to extend the framework to cross-jurisdictional comparison: if lifecycle regulation is the operative mechanism, cohort-dominant cluster structures and internally generated high-risk regimes should appear in any EU second-pillar system with comparable lifecycle allocation mandates, a testable prediction that the pipeline can evaluate using publicly available NAV data from any member state. The Lithuanian pension regulator faces the normative question that the empirical analysis cannot answer: whether to restore provider differentiation by loosening cohort allocation bands or to introduce network-level diversification requirements that operate above the fund. The analysis makes the trade-off visible and measurable for the first time.
By identifying which episodes matter, which threshold separates regimes, and which structural feature drives co-movement, the framework makes four extensions precise rather than merely plausible. Cross-jurisdictional comparison is the most direct: the pipeline requires only publicly available NAV data, so applying it to other EU second-pillar systems would test whether the cohort-dominant structure and internally generated risk regimes documented here are specific to Lithuania or a general consequence of lifecycle regulation. Portfolio-level decomposition is now motivated by a precise question: the framework has identified which episodes are worth examining; holdings data would reveal which asset classes drive the leading eigenvalue during high-risk periods and whether the fire sale dynamics hypothesised by [5] are the operative mechanism. Real-time applicability is now a specific empirical question: the forward state probabilities in Table 17 make quantitative predictions for the six months following September 2025, which subsequent data will allow to be evaluated against the regime sequence. And the K = 4 identification failure is an unexplained structural feature of the absorption ratio’s dynamics in this network that warrants theoretical attention: why a three-level architecture with a structural gap at four states should emerge from a lifecycle-regulated system is a question the current empirical framework identifies but cannot answer. Cross-jurisdictional replication is the natural first step toward an answer: if cohort-dominant cluster structures and K = 4 gaps appear consistently in other EU second-pillar systems subject to comparable lifecycle mandates, that would suggest the 3-level architecture is a general property of the regulatory design rather than a feature of Lithuanian implementation, a distinction that would directly inform EIOPA’s approach to system-level risk surveillance across member states.
The broader question this paper opens is whether the tension between micro-prudential protection and macro-prudential homogeneity is specific to lifecycle-regulated pension systems or a structural feature of any regulatory design that standardises portfolio behaviour across a large institutional population. Sovereign wealth funds, insurance general accounts, and target-date retail funds all operate under mandates that constrain cross-sectional variation. If the mechanism identified here generalises, the implication is that diversification at the participant level and fragility at the network level are not accidental co-occurrences but two consequences of the same regulatory logic and that making one better without making the other worse requires instruments that operate at the network level rather than the fund level. That is the design problem this paper makes legible.

Author Contributions

Conceptualization, M.N.J.S. and A.K.; methodology, M.N.J.S.; software, M.N.J.S.; validation, M.N.J.S. and A.K.; formal analysis, M.N.J.S.; investigation, M.N.J.S.; resources, data curation, A.K.; writing - original draft preparation, M.N.J.S.; writing - review and editing, M.N.J.S. and A.K.; visualization, M.N.J.S.; supervision, A.K.; project administration, A.K.; funding acquisition, A.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the European Union’s Horizon Europe research and innovation programme under MSCA Industrial Doctoral Network on Digital Finance (DIGITAL), grant agreement No. 101119635. More information is available at https://www.digital-finance-msca.com/.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The pension fund net asset value data used in this study are sourced from publicly available records published by the Bank of Lithuania (https://www.lb.lt/en/pf-performance-indicators). Provider identities have been anonymised to preserve confidentiality. Global benchmark series (MSCI World, MSCI Europe, S&P 500) are publicly available via Yahoo Finance. Replication code, including a synthetic dataset generator that reproduces the qualitative structure of the empirical analysis for end-to-end verification, is openly available on GitHub at https://github.com/Staures92/identifiable-regime-detection and permanently archived on Zenodo at https://doi.org/10.5281/zenodo.20408012.

Acknowledgments

The first author thanks Swedbank (Vilnius, Lithuania) for hosting her industrial secondment under the MSCA Industrial Doctoral Network on Digital Finance. During the preparation of this manuscript, the authors used Claude (Anthropic, Opus 4.7) and ChatGPT (OpenAI, variuos models) for language refinement and restructuring of expository passages. GitHub Copilot was used during the development of the Python analysis pipeline for code completions and refactoring suggestions. All analytical methods, model specifications, empirical results, and substantive interpretations are entirely the authors’ own. The authors have reviewed and edited all AI-assisted output and take full responsibility for the content of this publication.

Conflicts of Interest

The first author is currently undertaking an industrial secondment at Swedbank (Vilnius, Lithuania) as part of the MSCA Industrial Doctoral Network on Digital Finance. Swedbank is one of the pension fund providers active in the Lithuanian second-pillar market; provider identities in this study have been anonymised, and the analysis is conducted on publicly available data from the Bank of Lithuania. Neither Swedbank nor any other industry partner had any role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results. The authors declare no other conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AR Absorption Ratio
BIC Bayesian Information Criterion
BW Baum-Welch (algorithm)
CRISP-DM Cross-Industry Standard Process for Data Mining
DB Davies-Bouldin (index)
DBA DTW Barycenter Averaging
DTW Dynamic Time Warping
EM Expectation-Maximization
HAC Heteroskedasticity and Autocorrelation Consistent (estimator)
HMM Hidden Markov Model
IORP Institutions for Occupational Retirement Provision
MCD Minimum Covariance Determinant
ML Machine Learning
MP Marchenko-Pastur
MSCA Marie Skłodowska-Curie Actions
NAV Net Asset Value
NUTS No-U-Turn Sampler
OLS Ordinary Least Squares
PCA Principal Component Analysis
RMT Random Matrix Theory
TIPF Target Income Pension Fund

Appendix A. Covariance Estimator Robustness

This appendix reports the results of a robustness exercise in which A R t is recomputed using three alternative covariance estimators: the Ledoit-Wolf shrinkage estimator [52], the minimum covariance determinant (MCD) estimator, and a covariance matrix filtered through Marchenko-Pastur (MP) eigenvalue clipping with trace-preserving noise replacement [53,54]. Each estimator addresses a different concern about the sample covariance matrix: shrinkage stabilises the spectrum for portfolio optimisation, MCD reduces sensitivity to outliers, and MP filtering separates signal eigenvalues from finite-sample noise.
The MP filter is implemented following the procedure of [54]. Within each rolling window, returns are first standardised to zero mean and unit variance, so that the resulting correlation matrix has elements with σ 2 = 1 . The correlation matrix is eigendecomposed, and all eigenvalues at or below the upper Marchenko-Pastur noise bound
λ + = σ 2 1 + N / T 2
are replaced by their average, while eigenvalues exceeding λ + are retained unchanged. Replacing the bulk eigenvalues by their average preserves the trace of the matrix, ensuring that total variance is conserved while noise variance is redistributed uniformly across the bulk. The filtered correlation matrix is then rescaled to covariance using the original return standard deviations. With N = 40 , T = ω = 60 , and σ 2 = 1 by construction, the noise bound is λ + 3.30 .
Table A1 reports descriptive statistics for the four resulting A R t series, and Table A2 quantifies the deviations of each alternative estimator from the sample covariance baseline. Figure A1 plots the four series over the full sample period.
Table A1. Descriptive statistics of A R t under alternative covariance estimators. All series are computed from the pension fund return matrix R ( f ) with rolling window ω = 60 .
Table A1. Descriptive statistics of A R t under alternative covariance estimators. All series are computed from the pension fund return matrix R ( f ) with rolling window ω = 60 .
Estimator Mean Std Min Max
Sample covariance (baseline) 0.8365 0.0362 0.8582 0.9383
Ledoit - Wolf shrinkage 0.7750 0.0415 0.7997 0.8710
Minimum Covariance Determinant 0.8301 0.0429 0.8612 0.9235
Marchenko - Pastur filtered 0.8254 0.0374 0.8466 0.9329
Table A2. Differences in A R t relative to the sample covariance baseline. Mean Diff is the average pointwise difference (alternative − baseline). MAD is the mean absolute deviation. Correlation is the Pearson correlation with the baseline series.
Table A2. Differences in A R t relative to the sample covariance baseline. Mean Diff is the average pointwise difference (alternative − baseline). MAD is the mean absolute deviation. Correlation is the Pearson correlation with the baseline series.
Estimator Mean Diff Min Diff Max Diff MAD Correlation
Ledoit - Wolf −0.061 −0.255 −0.029 0.061 0.68
MCD −0.006 −0.158 0.072 0.019 0.79
Marchenko - Pastur filtered −0.011 −0.029 −0.0003 0.011 0.98
The three estimators differ from the baseline in distinct and informative ways.
The Ledoit-Wolf estimator produces a systematic downward shift of approximately 6.1 percentage points (mean 0.775 vs. 0.837 ), with the largest individual reduction reaching 0.255 . This attenuation is consistent with the design of shrinkage estimators, which compress the eigenvalue spectrum toward a structured target and thereby reduce the relative dominance of λ 1 ( t ) . While appropriate for portfolio optimisation, this property works against the objective of A R t , which is to measure that dominance directly.
The MCD estimator tracks the baseline closely on average (mean difference 0.006 , MAD = 0.019 ) but exhibits substantial local volatility, with pointwise differences ranging from 0.158 to + 0.072 . This reflects the instability of robust subset selection within short rolling windows: small changes in which observations are flagged as outliers produce visible day-to-day fluctuations in the resulting A R t series.
The Marchenko-Pastur filtered estimator yields the closest agreement with the baseline, with a Pearson correlation of 0.98 and a mean absolute deviation of only 0.011 . All pointwise differences are non-positive (range 0.029 to 0.0003 ), reflecting the trace-preserving construction of the filter. This near-equivalence is economically informative: it implies that the leading eigenvalue λ 1 ( t ) substantially exceeds the MP noise bound λ + 3.30 throughout the sample period and therefore lies well outside the noise bulk. The MP filter, consequently, leaves the dominant signal eigenvalue intact and only redistributes noise variance among the remaining eigenvalues, leaving A R t essentially unchanged. This confirms that elevated values of the absorption ratio reflect genuine cross-fund co-movement rather than finite-sample noise in the leading principal component, even in the moderately undersampled regime where N / T 0.67 .
Figure A1. Absorption ratio A R t computed using four covariance estimators: sample covariance (black, baseline), Ledoit-Wolf shrinkage (blue), minimum covariance determinant (red), and Marchenko-Pastur filtered (green, λ + = 3.30 ). The four series exhibit similar temporal dynamics, with peaks and troughs broadly aligned. The Ledoit-Wolf series runs consistently below the baseline due to spectral shrinkage. The MCD series tracks the baseline closely on average but shows higher day-to-day volatility because robust subset selection is sensitive to which observations are flagged as outliers within each 60-day window. The MP-filtered series tracks the baseline most closely (correlation 0.98 ), confirming that the leading eigenvalue lies well outside the MP noise bulk.
Figure A1. Absorption ratio A R t computed using four covariance estimators: sample covariance (black, baseline), Ledoit-Wolf shrinkage (blue), minimum covariance determinant (red), and Marchenko-Pastur filtered (green, λ + = 3.30 ). The four series exhibit similar temporal dynamics, with peaks and troughs broadly aligned. The Ledoit-Wolf series runs consistently below the baseline due to spectral shrinkage. The MCD series tracks the baseline closely on average but shows higher day-to-day volatility because robust subset selection is sensitive to which observations are flagged as outliers within each 60-day window. The MP-filtered series tracks the baseline most closely (correlation 0.98 ), confirming that the leading eigenvalue lies well outside the MP noise bulk.
Preprints 216176 g0a1
Taken together, these results confirm that the qualitative regime structure identified by the HMM, including the timing and persistence of High-Risk episodes, is robust to the choice of covariance specification. The sample covariance is retained as the baseline because it preserves the full empirical concentration of variance in λ 1 ( t ) , which is the quantity that A R t is designed to measure. The MP-filtered estimator confirms that this concentration reflects a genuine signal rather than estimation noise; the MCD estimator confirms that it is not driven by outliers; and the Ledoit-Wolf estimator demonstrates the level of distortion that occurs when an estimator designed for a different purpose is applied.

References

  1. Billio, M.; Getmansky, M.; Lo, A.W.; Pelizzon, L. Econometric measures of connectedness and systemic risk in the finance and insurance sectors. J. Financ. Econ. 2012, 104, 535–559. [Google Scholar] [CrossRef]
  2. Agrawal, S.; Broszeit, T.; Singh, R.; Sugimoto, N.; Surti, J.; Vogelsang, S. Pension Funds and Financial Stability. Glob. Financ. Stab. Notes 2025, 2025. [Google Scholar] [CrossRef]
  3. Greenwood, R.; Landier, A.; Thesmar, D. Vulnerable banks. J. Financ. Econ. 2015, 115, 471–485. [Google Scholar] [CrossRef]
  4. Delpini, D.; Battiston, S.; Caldarelli, G.; Riccaboni, M. Systemic risk from investment similarities. PLoS ONE 2019, 14. [Google Scholar] [CrossRef]
  5. Cont, R.; Schaanning, E. Fire Sales, Indirect Contagion and Systemic Stress Testing. Working Paper 2/2017, Norges Bank, 2017.
  6. Juneja, H. Pension funds and systemic risks, 2025.
  7. Bobro, N. When a pension fund fails: What the German collapse reveals about systemic weaknesses, 2026.
  8. Project, T.I.I. How Do Asset Owners Address Systemic Risk? Technical report, Investors for Purpose, 2024. White paper.
  9. Micocci, M.; Gregoriou, G.N.; Masala, G.B. Pension fund risk management: financial and actuarial modeling; Chapman & Hall/Crc, 2010. [Google Scholar]
  10. Lin, Y.; MacMinn, R.D.; Tian, R.; Yu, J. Pension Risk Management in the Enterprise Risk Management Framework. J. Risk Insur. 2017, 84, 345–365. [Google Scholar] [CrossRef]
  11. Wever, M.; Shah, M.; O’Leary, N. Designing early warning systems for detecting systemic risk: A case study and discussion. Futures 2022, 136. [Google Scholar] [CrossRef]
  12. Mashrur, A.; Luo, W.; Zaidi, N.A.; Robles-Kelly, A. Machine Learning for Financial Risk Management: A Survey. IEEE Access 2020, 8, 203434–203456. [Google Scholar] [CrossRef]
  13. Leo, M.; Sharma, S.; Maddulety, K. Machine learning in banking risk management: A literature review. Risks 2019, 7. [Google Scholar] [CrossRef]
  14. Caccioli, F.; Barucca, P.; Kobayashi, T. Network models of financial systemic risk: a review. J. Comput. Soc. Sci. 2017, 1, 81–114. [Google Scholar] [CrossRef]
  15. Eisenberg, L.; Noe, T.H. Systemic Risk in Financial Systems. Manag. Sci. 2001, 47, 236–249. [Google Scholar] [CrossRef]
  16. Li, B.; Zhang, X. Systemic risk and financial networks. Q. Rev. Econ. Financ. 2024, 94, 25–36. [Google Scholar] [CrossRef]
  17. Biagini, F.; Fouque, J.P.; Frittelli, M.; Meyer-Brandis, T. A unified approach to systemic risk measures via acceptance sets. Math. Financ. 2018, 29, 329–367. [Google Scholar] [CrossRef]
  18. Gonon, L.; Meyer-Brandis, T.; Weber, N. Computing Systemic Risk Measures with Graph Neural Networks. SIAM J. Financ. Math. 2026, 17. [Google Scholar] [CrossRef]
  19. Feng, Y.; Min, M.; Fouque, J.P. Deep Learning for Systemic Risk Measures. In Proceedings of the Proceedings of the Third ACM International Conference on AI in Finance (ICAIF ’22), New York, NY, USA, 2022; pp. 62–69. [Google Scholar] [CrossRef]
  20. Jeaab, K.; Saoudi, Y.; Ouaharahe, S.; Falloul, M.E.M. Predicting financial contagion: A deep learning-enhanced actuarial model for systemic risk assessment. J. Risk Financ. Manag. 2026, 19. [Google Scholar] [CrossRef]
  21. Balmaseda, V.; Coronado, M.; de Cadenas-Santiago, G. Predicting systemic risk in financial systems using Deep Graph Learning. Intell. Syst. With Appl. 2023, 19, 200240–200240. [Google Scholar] [CrossRef]
  22. Chen, Y.; Wang, G.J.; Zhu, Y.; Xie, C.; Uddin, G.S. Identifying systemic risk drivers of FinTech and traditional financial institutions: Machine learning-based prediction and interpretation. Eur. J. Financ. 2024, 30, 2157–2190. [Google Scholar] [CrossRef]
  23. Cheng, Y.; Xu, Z.; Chen, Y.; Wang, Y.; Lin, Z.; Liu, J. A Deep Learning Framework Integrating CNN and BiLSTM for Financial Systemic Risk Analysis and Prediction. In Proceedings of the Proceedings of the 2025 International Conference on Economic Management and Big Data Application (ICEMBDA ’25), New York, NY, USA, 2025; pp. 270–274. [Google Scholar] [CrossRef]
  24. Wang, L.; Cheng, Y.; Gu, X.; Wu, Z. Design and optimization of financial market risk monitoring system based on big data machine learning. Procedia Comput. Sci. 2025, 262, 553–560. [Google Scholar] [CrossRef]
  25. Yao, Y.; Xu, Z.; Liu, Y.; Ma, K.; Lin, Y.; Jiang, M. Integrating Feature Attention and Temporal Modeling for Collaborative Financial Risk Assessment. arXiv:2508.09399 [cs.LG] 2025, arXiv:2508.09399. [Google Scholar] [CrossRef]
  26. Srour, Z.; Hammoud, J.; Tarabay, M. Forecasting Systemic Risk in the European Banking Industry: A Machine Learning Approach. J. Risk Financ. Manag. 2025, 18, 335. [Google Scholar] [CrossRef]
  27. Tang, P.; Tang, T.; Lu, C. Predicting systemic financial risk with interpretable machine learning. North Am. J. Econ. Financ. 2024, 71, 102088–102088. [Google Scholar] [CrossRef]
  28. Li, W. Design of Financial Risk Warning System Based on AI Algorithm and Deep Learning. In Proceedings of the Proceedings of the 2024 IEEE 2nd International Conference on Image Processing and Computer Applications (ICIPCA), Shenyang, China, 2024; pp. 1264–1268. [Google Scholar] [CrossRef]
  29. Qiu, Y.; Xu, Y.; Zhu, Y. Design of financial risk warning system in digital economy based on deep neural network. KSII Trans. Internet Inf. Syst. 2025, 19, 3110–3129. [Google Scholar] [CrossRef]
  30. An, S.; Chen, Y. Systematic financial risk identification and dynamic evolution based on deep learning. Mob. Inf. Syst. 2022. [Google Scholar] [CrossRef]
  31. Sun, J.; Sun, W.; Sun, L. Design of a Financial Risk Identification and Early Warning Model Based on Machine Learning. In Proceedings of the Recent Developments in Computational Finance and Business Analytics. Springer, Cham, 2025, Vol. 53, Learning and Analytics in Intelligent Systems, pp. 42–51. [CrossRef]
  32. Wang, T.; Zhao, S.; Zhu, G.; Zheng, H. A machine learning-based early warning system for systemic banking crises. Appl. Econ. 2021, 53, 2974–2992. [Google Scholar] [CrossRef]
  33. Laeven, L.; Valencia, F. Systemic Banking Crises Revisited. IMFWorking Paper WP/18/206, International Monetary Fund,Washington, DC, 2018.
  34. Pinheiro, F.; Kim, J.K.; Bao, H. Early Warning and Prediction of Systemic Financial Risk Using Machine Learning Methods. Trans. Comput. Sci. Methods 2026, 6. [Google Scholar]
  35. Kritzman, M.; Li, Y.; Page, S.; Rigobon, R. Principal Components as a Measure of Systemic Risk. J. Portf. Manag. 2011, 37, 112–126. [Google Scholar] [CrossRef]
  36. So, M.K.P.; Mak, A.S.W.; Chu, A.M.Y. Assessing systemic risk in financial markets using dynamic topic networks. Sci. Rep. 2022, 12. [Google Scholar] [CrossRef]
  37. Alonso-Robisco, A.; Azqueta-Gavaldón, A.; Carbó, J.M.; González, J.L.; Hernáez, A.I.; Herrera, J.L.; Quintana, J.; Tarancón, J. Empowering Financial Supervision: A SupTech Experiment Using Machine Learning in an Early Warning System. Occasional Paper 2504, Banco de España, 2025.
  38. Kou, G.; Chao, X.; Peng, Y.; Alsaadi, F.E.; Herrera-Viedma, E. Machine learning methods for systemic risk analysis in financial sectors. Technol. Econ. Dev. Econ. 2019, 25, 716–742. [Google Scholar] [CrossRef]
  39. Alexandre, M.; Silva, T.C.; Connaughton, C.; Rodrigues, F.A. The drivers of systemic risk in financial networks: A data-driven machine learning analysis. Chaos Solitons Fractals 2021, 153, 111588. [Google Scholar] [CrossRef]
  40. Aghabozorgi, S.; Seyed Shirkhorshidi, A.; Ying Wah, T. Time-series clustering – A decade review. Inf. Syst. 2015, 53, 16–38. [Google Scholar] [CrossRef]
  41. Hamilton, J.D. A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle. Econometrica 1989, 57, 357–384. [Google Scholar] [CrossRef]
  42. Wirth, R.; Hipp, J. CRISP-DM: Towards a Standard Process Model for Data Mining, 2000.
  43. Kabašinskas, A.; Kopa, M.; Šutienė, K.; Lakštutienė, A.; Malakauskas, A. Stress testing for IInd pillar life-cycle pension funds using hidden Markov model. Ann. Oper. Res. 2024. [Google Scholar] [CrossRef]
  44. Kabašinskas, A. Systemic risk assessment of Lithuanian second-pillar pension funds through connectedness and spillover. J. Math. Ind. 2024, 14. [Google Scholar] [CrossRef]
  45. Fox, E.B.; Sudderth, E.B.; Jordan, M.I.; Willsky, A.S. A sticky HDP-HMM with application to speaker diarization. Ann. Appl. Stat. 2011, 5. [Google Scholar] [CrossRef]
  46. Halim, S.; Miller, T.T.; Dupont, D.C. How Pension Funds Manage Investment Risks: A Global Survey. Rotman Int. J. Pension. Manag. 2010, 3, 30–38. [Google Scholar]
  47. Ana, I.C.P. Systemic Risk and Default Cascades in Global Equity Markets: A Network and Tail-Risk Approach Based on the Gai Kapadia Framework. 2026. [Google Scholar] [CrossRef]
  48. Jolliffe, I.T.; Cadima, J. Principal component analysis: a review and recent developments. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2016, 374, 20150202. [Google Scholar] [CrossRef] [PubMed]
  49. Ortobelli, S.; Tichý, T. On the impact of semidefinite positive correlation measures in portfolio theory. Ann. Oper. Res. 2015, 235, 625–652. [Google Scholar] [CrossRef]
  50. Neděla, D.; Kabašinskas, A.; Megang, N.J.S. Quantile regression–PCA framework in portfolio selection process. Cent. Eur. J. Oper. Res. 2026. [Google Scholar] [CrossRef]
  51. Scherer, B. Portfolio Construction and Risk Budgeting; Risk Waters Group Ltd, 2002. [Google Scholar]
  52. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411. [Google Scholar] [CrossRef]
  53. Laloux, L.; Cizeau, P.; Bouchaud, J.P.; Potters, M. Noise Dressing of Financial Correlation Matrices. Phys. Rev. Lett. 1999, 83, 1467–1470. [Google Scholar] [CrossRef]
  54. Laloux, L.; Cizeau, P.; Potters, M.; Bouchaud, J.P. RANDOM MATRIX THEORY AND FINANCIAL CORRELATIONS. Int. J. Theor. Appl. Financ. 2000, 03, 391–397. [Google Scholar] [CrossRef]
  55. Sakoe, H.; Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. Speech Signal Process. 1978, 26, 43–49. [Google Scholar] [CrossRef]
  56. Vidal Ruiz, E.; Casacuberta Nolla, F.; Rulot Segovia, H. Is the DTW “distance” really a metric? An algorithm reducing the number of DTW comparisons in isolated word recognition. Speech Commun. 1985, 4, 333–344. [Google Scholar] [CrossRef]
  57. Petitjean, F.; Ketterlin, A.; Gançarski, P. A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognit. 2011, 44, 678–693. [Google Scholar] [CrossRef]
  58. Randriamihamison, N.; Vialaneix, N.; Neuvial, P. Applicability and Interpretability of Ward’s Hierarchical Agglomerative Clustering With or Without Contiguity Constraints. J. Classif. 2020, 38, 363–389. [Google Scholar] [CrossRef]
  59. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. R. Stat. Soc. Ser. B (Methodological) 1977, 39, 1–38. [Google Scholar] [CrossRef]
  60. Keribin, C. Consistent Estimation of the Order of Mixture Models. Sankhyā Indian J. Stat. Ser. A (19612002) 2000, 62, 49–66. [Google Scholar]
  61. Baudry, J.; Raftery, A.E.; Celeux, G.; Lo, K.; Gottardo, R. Combining Mixture Components for Clustering. J. Comput. Graph. Stat. 2010, 19, 332–353. [Google Scholar] [CrossRef]
  62. Tavenard, R.; Faouzi, J.; Vandewiele, G.; Divo, F.; Androz, G.; Holtz, C.; Payne, M.; Yurchak, R.; Rußwurm, M.; Kolar, K.; et al. Tslearn, A Machine Learning Toolkit for Time Series Data. J. Mach. Learn. Res. 2020, 21, 1–6. [Google Scholar]
  63. PensionsEurope. PensionsEurope’s Position Paper on the IORP II Review: Pension Package of Supplementary Pensions. Technical report, PensionsEurope, Brussels, 2025. Position paper. Accessed: June 2025.
1
A small ridge regularisation term ε I , ε = 10 6 , is added to Σ t exclusively in the robustness exercise, where the augmented matrix dimensions ( 2 N + M = 83 > ω = 60 ) render Σ t rank-deficient. Applying the same regularisation to the baseline covariance matrix produces differences in A R t of 0.01–0.05 across sample windows, confirming that regularisation is material in the full-rank setting and must not be applied to the baseline analysis.
Figure 2. Sixty-day moving average of log return series for the three return matrices defined in equation (1). Panel 1: all N = 40 pension fund returns R ( f ) , offset by provider for visual clarity. Panel 2: all N = 40 fund-specific benchmark returns R ( b , f ) , offset by provider. In both panels, colour denotes provider and line style denotes cohort type: solid for AG1 (oldest active), thin for AG2–AG7 (middle cohorts), and dashed for AG8/TIPF. Panel 3: all M = 3 global benchmark returns R ( b , g ) . Shaded regions indicate the COVID-19 shock period (February - May 2020, red) and the 2022 interest rate tightening cycle (March - December 2022).
Figure 2. Sixty-day moving average of log return series for the three return matrices defined in equation (1). Panel 1: all N = 40 pension fund returns R ( f ) , offset by provider for visual clarity. Panel 2: all N = 40 fund-specific benchmark returns R ( b , f ) , offset by provider. In both panels, colour denotes provider and line style denotes cohort type: solid for AG1 (oldest active), thin for AG2–AG7 (middle cohorts), and dashed for AG8/TIPF. Panel 3: all M = 3 global benchmark returns R ( b , g ) . Shaded regions indicate the COVID-19 shock period (February - May 2020, red) and the 2022 interest rate tightening cycle (March - December 2022).
Preprints 216176 g002
Figure 3. Panel 1: baseline absorption ratio { A R t } computed from R ( f ) with rolling window ω = 60 trading days. Panel 2: 60-day moving average of all N = 40 fund-specific benchmark returns R ( b , f ) ; colour denotes provider with eight age-cohort series per provider. Panel 3: 60-day moving average of all M = 3 global benchmark returns R ( b , g ) . Shaded regions indicate the COVID-19 shock (February–May 2020, red) and the 2022 interest rate tightening cycle (March–December 2022, amber). The absorption ratio is constructed exclusively from R ( f ) ; benchmark series serve an interpretive role only.
Figure 3. Panel 1: baseline absorption ratio { A R t } computed from R ( f ) with rolling window ω = 60 trading days. Panel 2: 60-day moving average of all N = 40 fund-specific benchmark returns R ( b , f ) ; colour denotes provider with eight age-cohort series per provider. Panel 3: 60-day moving average of all M = 3 global benchmark returns R ( b , g ) . Shaded regions indicate the COVID-19 shock (February–May 2020, red) and the 2022 interest rate tightening cycle (March–December 2022, amber). The absorption ratio is constructed exclusively from R ( f ) ; benchmark series serve an interpretive role only.
Preprints 216176 g003
Figure 4. Baseline vs Augmented Absorption Ratio
Figure 4. Baseline vs Augmented Absorption Ratio
Preprints 216176 g004
Figure 5. Hierarchical clustering dendrogram based on pairwise DTW distances with Ward linkage. The dashed red line indicates the cut yielding the optimal K c * = 8 clusters. Fund labels follow the convention Pi/AGj for Provider i, age cohort j.
Figure 5. Hierarchical clustering dendrogram based on pairwise DTW distances with Ward linkage. The dashed red line indicates the cut yielding the optimal K c * = 8 clusters. Fund labels follow the convention Pi/AGj for Provider i, age cohort j.
Preprints 216176 g005
Figure 6. Cluster assignments by provider and age cohort ( K c * = 8 ). Cell labels indicate cluster membership.
Figure 6. Cluster assignments by provider and age cohort ( K c * = 8 ). Cell labels indicate cluster membership.
Preprints 216176 g006
Figure 7. Gaussian emission distributions estimated by the BW algorithm for the three-state HMM ( K = 3 ). Each distribution N ( μ ^ k , σ ^ k 2 ) corresponds to one latent regime; dashed vertical lines indicate emission means μ ^ k . The red dashed line marks the empirical crisis threshold τ = 0.8514 , defined as the absorption ratio level where the moderate- and high-risk emission densities have equal mass. The dotted line marks the low |moderate boundary τ 1 = 0.801 . The shaded region marks the overlap zone between the moderate- and high-risk distributions, where regime assignment is governed by the transition matrix rather than by emission density alone.
Figure 7. Gaussian emission distributions estimated by the BW algorithm for the three-state HMM ( K = 3 ). Each distribution N ( μ ^ k , σ ^ k 2 ) corresponds to one latent regime; dashed vertical lines indicate emission means μ ^ k . The red dashed line marks the empirical crisis threshold τ = 0.8514 , defined as the absorption ratio level where the moderate- and high-risk emission densities have equal mass. The dotted line marks the low |moderate boundary τ 1 = 0.801 . The shaded region marks the overlap zone between the moderate- and high-risk distributions, where regime assignment is governed by the transition matrix rather than by emission density alone.
Preprints 216176 g007
Figure 8. Top panel: absorption ratio { A R t } with HMM regime classification (low-risk: blue, moderate-risk: amber, high-risk: red), empirical crisis threshold τ = 0.851 (dashed red line), and theoretical lower bound 1 / N = 0.025 (dotted line). The annotation reports the daily crisis count under the threshold rule C t = I { A R t > τ } (494 of 1,579 trading days, 31.3%); this differs from the Viterbi-decoded high-risk count (516 days, 32.7%) for the reasons discussed in Section 3.3.3. Middle panel: 60-day moving average of all N = 40 fund-specific benchmark returns r i , t ( b , f ) , colour-coded by provider. Bottom panel: 60-day moving average of all M = 3 global benchmark returns r m , t ( b , g ) . Regime shading is consistent across all panels.
Figure 8. Top panel: absorption ratio { A R t } with HMM regime classification (low-risk: blue, moderate-risk: amber, high-risk: red), empirical crisis threshold τ = 0.851 (dashed red line), and theoretical lower bound 1 / N = 0.025 (dotted line). The annotation reports the daily crisis count under the threshold rule C t = I { A R t > τ } (494 of 1,579 trading days, 31.3%); this differs from the Viterbi-decoded high-risk count (516 days, 32.7%) for the reasons discussed in Section 3.3.3. Middle panel: 60-day moving average of all N = 40 fund-specific benchmark returns r i , t ( b , f ) , colour-coded by provider. Bottom panel: 60-day moving average of all M = 3 global benchmark returns r m , t ( b , g ) . Regime shading is consistent across all panels.
Preprints 216176 g008
Figure 9. Scatter plots of the absorption ratio A R t against each interpretive series, with observations coloured by HMM regime (low-risk: blue, moderate-risk: amber, high-risk: red). Solid black lines show OLS regression fits over the full sample. Pearson (r), Spearman ( ρ ), and Kendall ( τ ) correlation coefficients are reported with significance levels (*** p < 0.001 , * p < 0.05 , n.s. = not significant). The vertical dashed line marks the crisis threshold τ = 0.851 .
Figure 9. Scatter plots of the absorption ratio A R t against each interpretive series, with observations coloured by HMM regime (low-risk: blue, moderate-risk: amber, high-risk: red). Solid black lines show OLS regression fits over the full sample. Pearson (r), Spearman ( ρ ), and Kendall ( τ ) correlation coefficients are reported with significance levels (*** p < 0.001 , * p < 0.05 , n.s. = not significant). The vertical dashed line marks the crisis threshold τ = 0.851 .
Preprints 216176 g009
Figure 10. Regime-conditional scatter of A R t against each interpretive series. The blue line is fitted only to non-high-risk observations (slope b 1 ); the dark red line is fitted only to high-risk observations (slope b 1 + b 3 ). Each line is drawn over the range of A R t values observed within its respective regime. The vertical dotted line marks the crisis threshold τ = 0.851 . Precise coefficients and HAC-adjusted p-values are reported in Table 19.
Figure 10. Regime-conditional scatter of A R t against each interpretive series. The blue line is fitted only to non-high-risk observations (slope b 1 ); the dark red line is fitted only to high-risk observations (slope b 1 + b 3 ). Each line is drawn over the range of A R t values observed within its respective regime. The vertical dotted line marks the crisis threshold τ = 0.851 . Precise coefficients and HAC-adjusted p-values are reported in Table 19.
Preprints 216176 g010
Figure 11. Mean tracking error by cluster under normal and High-Risk regimes (left panel) and crisis amplification ratio (right panel). The ratio is defined as the mean tracking error during High-Risk days divided by the mean tracking error during normal days. Blue bars indicate extreme-cohort clusters (AG1 and AG8); red bars indicate middle-cohort clusters (AG2 - AG7). Light shading = normal regime; dark shading = High-Risk regime.
Figure 11. Mean tracking error by cluster under normal and High-Risk regimes (left panel) and crisis amplification ratio (right panel). The ratio is defined as the mean tracking error during High-Risk days divided by the mean tracking error during normal days. Blue bars indicate extreme-cohort clusters (AG1 and AG8); red bars indicate middle-cohort clusters (AG2 - AG7). Light shading = normal regime; dark shading = High-Risk regime.
Preprints 216176 g011
Table 2. Descriptive statistics of the baseline absorption ratio A R t computed from R ( f ) with rolling window ω = 60 trading days. Theoretical bounds follow from Proposition 3: A R t [ 1 / N , 1 ] = [ 0.025 , 1.000 ] for N = 40 .
Table 2. Descriptive statistics of the baseline absorption ratio A R t computed from R ( f ) with rolling window ω = 60 trading days. Theoretical bounds follow from Proposition 3: A R t [ 1 / N , 1 ] = [ 0.025 , 1.000 ] for N = 40 .
Statistic Value
Observations 1,579
Mean 0.8365
Standard deviation 0.0362
Minimum 0.7280
Maximum 0.9383
Table 3. Effect of ridge regularisation ( ε = 10 6 ) on the baseline absorption ratio A R t across three sample windows. Regularisation is applied exclusively in the robustness exercise involving the rank-deficient augmented covariance matrix ( 2 N + M = 83 > ω = 60 ).
Table 3. Effect of ridge regularisation ( ε = 10 6 ) on the baseline absorption ratio A R t across three sample windows. Regularisation is applied exclusively in the robustness exercise involving the rank-deficient augmented covariance matrix ( 2 N + M = 83 > ω = 60 ).
Window A R t (unregularised) A R t (regularised) Difference
Early 0.8993 0.8573 0.0420
Mid 0.8518 0.8391 0.0127
Late 0.8285 0.7817 0.0468
Table 4. DB index for K c { 2 , , 8 } clusters. Lower values indicate better cluster separation.
Table 4. DB index for K c { 2 , , 8 } clusters. Lower values indicate better cluster separation.
K c DB index
2 0.9815
3 1.0295
4 0.9519
5 0.8678
6 0.7983
7 0.8688
8 0.7505
Table 6. Cluster composition at K c = 8 and the corresponding K c = 15 subdivisions. Hierarchical Ward linkage ensures full nesting: every K c = 15 cluster is a strict subset of one K c = 8 cluster. Five of K c = 15 clusters are singletons, confirming that finer partitions reflect over-segmentation rather than new structure.
Table 6. Cluster composition at K c = 8 and the corresponding K c = 15 subdivisions. Hierarchical Ward linkage ensures full nesting: every K c = 15 cluster is a strict subset of one K c = 8 cluster. Five of K c = 15 clusters are singletons, confirming that finer partitions reflect over-segmentation rather than new structure.
K c = 8 Composition (TE) K c = 15 subdivision Size TE
C1 Provider 3, AG2-AG7 (0.0046) Provider 3, AG3-AG7 5 0.0049
Provider 3, AG2 1 0.0026
C2 Provider 1, AG2-AG7 (0.0066) Provider 1, AG2-AG6 5 0.0065
Provider 1, AG7 1 0.0073
C3 Provider 1, AG1 + AG8 (0.0014) Provider 1, AG1 1 0.0017
Provider 1, AG8 1 0.0011
C4 Providers 2 & 5, AG1 + AG8 (0.0010) Provider 5, AG1 + AG8 2 0.0013
Provider 2, AG1 + AG8 2 0.0008
C5 Provider 4, AG1 + AG8 (0.0012) Provider 4, AG1 + AG8 2 0.0012
C6 Provider 3, AG1 + AG8 (0.0011) Provider 3, AG1 + AG8 2 0.0011
C7 Provider 4, AG2-AG7 (0.0041) Provider 4, AG3-AG7 5 0.0044
Provider 4, AG2 1 0.0024
C8 Providers 2 & 5, AG2-AG7 (0.0036) Providers 2 & 5, AG2-AG3 4 0.0026
Provider 2, AG4-AG7 4 0.0042
Provider 5, AG4-AG7 4 0.0039
Table 7. DB scores across three clustering methods. Lower values indicate better separation. The lowest score in each row is shown in bold.
Table 7. DB scores across three clustering methods. Lower values indicate better separation. The lowest score in each row is shown in bold.
K c Ward + DTW k-means (Eucl.) k-means (DTW)
2 0.9815 1.0292 1.0413
3 1.0295 1.0429 1.0831
4 0.9519 0.9685 1.0809
5 0.8678 0.8516 0.8726
6 0.7983 0.7983 0.9290
7 0.8688 0.8688 0.8688
8 0.7505 0.7505 0.7183
Table 8. Cross-tabulation of Ward+DTW cluster assignments at K c * = 8 (rows) against the unanimous partition at K c = 7 (columns). Each row of the K c = 8 partition maps entirely to one K c = 7 cluster, with the sole merge occurring between Clusters C5 and C6 (both extreme-cohort, Providers 4 and 3 respectively). All six remaining clusters pass through the K c = 8 K c = 7 collapse unchanged.
Table 8. Cross-tabulation of Ward+DTW cluster assignments at K c * = 8 (rows) against the unanimous partition at K c = 7 (columns). Each row of the K c = 8 partition maps entirely to one K c = 7 cluster, with the sole merge occurring between Clusters C5 and C6 (both extreme-cohort, Providers 4 and 3 respectively). All six remaining clusters pass through the K c = 8 K c = 7 collapse unchanged.
K c = 7 cluster
K c * = 8 cluster 1 2 3 4 5 6 7
C1 (P3 mid) 6
C2 (P1 mid) 6
C3 (P1 ext) 2
C4 (P2+P5 ext) 4
C5 (P4 ext) 2
C6 (P3 ext) 2
C7 (P4 mid) 6
C8 (P2+P5 mid) 12
Table 9. Cluster composition under k-means with DTW barycenters at K c = 8 . Cluster numbering is method-specific and does not correspond to the Ward labels in Table 5; the mapping between methods is given in Table 10.
Table 9. Cluster composition under k-means with DTW barycenters at K c = 8 . Cluster numbering is method-specific and does not correspond to the Ward labels in Table 5; the mapping between methods is given in Table 10.
Cluster Composition Size Mean TE
KM0 Providers 2 & 5, AG2-AG7 12 0.0036
KM1 Provider 3, AG2-AG7 6 0.0046
KM2 Provider 1, AG1-AG7 7 0.0059
KM3 Provider 4, AG2-AG7 6 0.0041
KM4 Provider 3, AG1 & AG8 2 0.0011
KM5 Providers 2 & 5, AG1 & AG8 4 0.0010
KM6 Provider 1, AG8 1 0.0011
KM7 Provider 4, AG1 & AG8 2 0.0012
Table 10. Cross-tabulation of cluster assignments at K c = 8 . Rows correspond to Ward+DTW clusters (Table 5); columns correspond to k-means with DTW barycenters (Table 9). Empty cells denote zero funds. All non-zero entries except in row C3 lie in a single column, confirming that the two methods recover the same partition up to the assignment of one boundary fund.
Table 10. Cross-tabulation of cluster assignments at K c = 8 . Rows correspond to Ward+DTW clusters (Table 5); columns correspond to k-means with DTW barycenters (Table 9). Empty cells denote zero funds. All non-zero entries except in row C3 lie in a single column, confirming that the two methods recover the same partition up to the assignment of one boundary fund.
k-means (DTW)
Ward KM0 KM1 KM2 KM3 KM4 KM5 KM6 KM7
C1 6
C2 6
C3 1 1
C4 4
C5 2
C6 2
C7 6
C8 12
Table 11. BIC values for HMM models with K { 2 , , 8 } states fitted to the absorption ratio series { A R t } . d ( K ) = K 2 + 2 K denotes the number of free parameters.
Table 11. BIC values for HMM models with K { 2 , , 8 } states fitted to the absorption ratio series { A R t } . d ( K ) = K 2 + 2 K denotes the number of free parameters.
K Log-likelihood d ( K ) BIC
2 3,582.27 8 7 , 105.63
3 4,201.64 15 8 , 292.82
4 4,581.03 24 8 , 985.31
5* 4,870.60 35 9 , 483.44
6 4,579.79 48 8 , 806.08
7 4,860.80 63 9 , 257.64
8 4,860.08 80 9 , 130.99
* BIC-optimal. baseline.
Table 12. Posterior identifiability of the finite sticky HMM under the two data-calibrated stickiness priors κ 3 = 66 and κ 5 = 80 . Raw and label-switching-corrected Gelman-Rubin statistics ( R ^ ) are reported, alongside the number of identified states (# Id, corrected R ^ < 1.1 ), the minimum separation between sorted emission means. Zero divergent transitions across all ( K , κ ) confirm that identifiability failures reflect model over-parameterisation rather than sampler issues.
Table 12. Posterior identifiability of the finite sticky HMM under the two data-calibrated stickiness priors κ 3 = 66 and κ 5 = 80 . Raw and label-switching-corrected Gelman-Rubin statistics ( R ^ ) are reported, alongside the number of identified states (# Id, corrected R ^ < 1.1 ), the minimum separation between sorted emission means. Zero divergent transitions across all ( K , κ ) confirm that identifiability failures reflect model over-parameterisation rather than sampler issues.
K Raw min/max R ^ Corr min/max R ^ # Id Min μ gap Div.
Panel A: κ 3 = 66
3 14.14 / 34.15 1.000 / 1.000 3 0.0444 0*
4 34.00 / 46.09 1.114 / 13.128 0 0.0266 0*
5 33.65 / 53.97 1.000 / 1.000 5 0.0189 0*
6 23.56 / 51.40 1.000 / 12.530 2 0.0166 0*
Panel B: κ 5 = 80
3 24.74 / 34.14 1.000 / 1.000 3 0.0444 0*
4 30.87 / 46.75 1.148 / 14.772 0 0.0248 0*
5 37.81 / 54.11 1.000 / 1.001 5 0.0189 0*
6 1.06 / 65.70 1.000 / 18.827 1 0.0175 0*
Table 13. Sensitivity of posterior identifiability to the stickiness prior κ . Cells report the number of states with corrected R ^ < 1.1 at each ( K , κ ) combination. The data-calibrated values κ 3 = 66 and κ 5 = 80 (highlighted columns) are the two anchors used in the main identifiability analysis.
Table 13. Sensitivity of posterior identifiability to the stickiness prior κ . Cells report the number of states with corrected R ^ < 1.1 at each ( K , κ ) combination. The data-calibrated values κ 3 = 66 and κ 5 = 80 (highlighted columns) are the two anchors used in the main identifiability analysis.
K κ = 10 κ = 20 κ = 30 κ = 50 κ 3 = 66 κ 5 = 80
3 0 3 3 3 3 3
4 4 0 0 0 0 0
5 0 5 5 0 5 5
6 0 2 2 0 2 1
Degenerate identification: chains collapse to a near-uniform mean configuration; (gap = 0.028, smaller than at any higher κ ); see discussion below.
Table 14. HMM regime characterisation ( K = 3 ). Emission parameters ( μ ^ k , σ ^ k ) , regime persistence (diagonal of the estimated transition matrix), and expected regime duration 1 / ( 1 p k k ) are reported in trading days. Global benchmark statistics are mean daily log returns over regime-assigned days.
Table 14. HMM regime characterisation ( K = 3 ). Emission parameters ( μ ^ k , σ ^ k ) , regime persistence (diagonal of the estimated transition matrix), and expected regime duration 1 / ( 1 p k k ) are reported in trading days. Global benchmark statistics are mean daily log returns over regime-assigned days.
Regime Days Share μ ^ k σ ^ k Persistence Duration S&P 500 / MSCI World
Low-Risk 209 13.2% 0.7730 0.0237 0.9478 19.2 + 0.0004 / + 0.0009
Moderate-Risk 854 54.1% 0.8291 0.0127 0.9803 50.9 + 0.0011 / + 0.0008
High-Risk 516 32.7% 0.8737 0.0195 0.9845 64.6 0.0003 / 0.0006
Table 15. EM vs Bayesian estimates of emission means at K = 3 . The Bayesian posterior is computed via NUTS sampling at κ 3 = 66 after label-switching correction; 90% credible intervals are reported.
Table 15. EM vs Bayesian estimates of emission means at K = 3 . The Bayesian posterior is computed via NUTS sampling at κ 3 = 66 after label-switching correction; 90% credible intervals are reported.
Regime μ ^ k EM μ ^ k Bayes | Diff | Bayes 5% Bayes 95%
Low-Risk 0.7730 0.7735 0.0005 0.7702 0.7770
Moderate-Risk 0.8291 0.8292 0.0000 0.8283 0.8301
High-Risk 0.8737 0.8736 0.0001 0.8719 0.8753
Table 16. Estimated HMM transition probability matrix ( K = 3 ). Diagonal entries (persistence) are also reported in Table 14; off-diagonal entries give the probabilities of transitioning between regimes from one trading day to the next.
Table 16. Estimated HMM transition probability matrix ( K = 3 ). Diagonal entries (persistence) are also reported in Table 14; off-diagonal entries give the probabilities of transitioning between regimes from one trading day to the next.
High-Risk Moderate-Risk Low-Risk
High-Risk 0.9845 0.0137 0.0018
Moderate-Risk 0.0074 0.9803 0.0122
Low-Risk 0.0046 0.0476 0.9478
Table 17. Forward regime probabilities conditional on the current moderate-risk state (29 September 2025) and long-run stationary distribution.
Table 17. Forward regime probabilities conditional on the current moderate-risk state (29 September 2025) and long-run stationary distribution.
Horizon Low-Risk Moderate-Risk High-Risk
t + 1 (next day) 0.012 0.980 0.007
t + 5 (1 week) 0.053 0.912 0.035
t + 20 (1 month) 0.131 0.752 0.117
t + 60 (1 quarter) 0.155 0.614 0.231
t + 120 (6 months) 0.145 0.568 0.287
Stationary ( h ) 0.140 0.553 0.307
Table 18. Correlations between the baseline absorption ratio A R t and interpretive benchmark series, with p-values testing H 0 : ρ = 0 . Pearson is parametric; Spearman and Kendall are non-parametric robustness checks. T = 1579 observations.
Table 18. Correlations between the baseline absorption ratio A R t and interpretive benchmark series, with p-values testing H 0 : ρ = 0 . Pearson is parametric; Spearman and Kendall are non-parametric robustness checks. T = 1579 observations.
Pearson Spearman Kendall
Series r p ρ p τ p
Average tracking error 0.0861*** 0.0006 0.0283 0.2608 0.0190 0.2589
S&P 500 −0.0365 0.1476 −0.0127 0.6141 −0.0076 0.6511
MSCI Europe −0.0446 0.0764 −0.0152 0.5461 −0.0093 0.5803
MSCI World −0.0607* 0.0158 −0.0226 0.3695 −0.0152 0.3659
*** p < 0.001 , ** p < 0.01 , * p < 0.05 .
Table 19. Regime-conditional OLS regression of each interpretive series on A R t , a high-risk regime dummy, and their interaction. Coefficient b 1 captures the effect of A R t in non-high-risk regimes; b 3 captures the additional effect during high-risk periods. Standard errors are HAC (Newey-West) with 7 lags. T = 1579 .
Table 19. Regime-conditional OLS regression of each interpretive series on A R t , a high-risk regime dummy, and their interaction. Coefficient b 1 captures the effect of A R t in non-high-risk regimes; b 3 captures the additional effect during high-risk periods. Standard errors are HAC (Newey-West) with 7 lags. T = 1579 .
Dependent variable b 1 (AR) SE b 3 (interaction) SE p ( b 3 ) R 2
Average tracking error −0.0017 0.0051 0.0370 0.0261 0.1570 0.0181
S&P 500 0.0024 0.0115 −0.0301 0.0479 0.5294 0.0025
MSCI Europe 0.0002 0.0115 −0.0387 0.0531 0.4654 0.0034
MSCI World −0.0077 0.0133 −0.0524 0.0616 0.3953 0.0053
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Prerpints.org logo

Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.

Subscribe

Disclaimer

Terms of Use

Privacy Policy

Privacy Settings

© 2026 MDPI (Basel, Switzerland) unless otherwise stated