1. Introduction
Systemic risk, the potential for shocks to propagate across financial systems, has been a prominent concern since the 2008 crisis, which revealed how the failure of individual institutions can destabilise economies [1]. While systemic risk is traditionally associated with banks because of their central role in credit intermediation, attention is now shifting to large non-bank institutional investors, especially pension funds, which hold over $63 trillion in assets globally [2]. The risk arises not simply from scale; research demonstrates that overlapping portfolios among institutions allow asset sales or losses to depress prices and transmit shocks, amplifying stress throughout the system [3, 4, 5]. Notably, pension funds form networks characterised by such overlaps, making indirect contagion possible even without direct financial links.
Unlike banks, pension funds typically have longer investment horizons, limited leverage, and a focus on hedging [6]. Yet these safeguards do not dissolve the vulnerability that [5] identifies, because that vulnerability lives in the overlap structure, not in leverage or direct exposure. Fire-sale dynamics and portfolio overlap can still open indirect contagion channels with no direct financial links between funds [3, 4]. Recent events, such as the 2026 collapse of the Versorgungswerk der Zahnärztekammer Berlin pension fund, underscore that systemic vulnerabilities in this sector are real [7]. The question, then, is not whether pension networks can transmit systemic stress, but how one can know, from the data, when such a network has entered a high-risk state.
Indeed, existing literature increasingly frames the risks confronting pension schemes as fundamentally unhedgeable, system-level risks that cannot be diversified away through portfolio construction alone [8]. The enterprise risk management tradition reinforces this view [9, 10]: [10] shows formally that a firm sponsoring a defined-benefit pension plan incurs compounding vulnerabilities unless pension risk is integrated with all other business risks in a unified optimisation. Their model demonstrates that managing pension risk in a silo, rather than holistically, reduces expected firm value by over
. This is not merely an actuarial accounting problem; it reveals that portfolio homogeneity and shared liability structures across pension participants create interdependencies that travel well beyond the balance sheet of any single fund.
Before turning to how systemic risk has been modelled, it is worth naming the obstacle that (we will argue) every existing approach must confront, and that ultimately defines the gap this paper fills. Systemic episodes in pension funds are rare and slow-moving. [11] addresses this through modular modelling but notes that traditional balance-sheet-based indicators often reflect past conditions and cannot reliably predict the onset of stress. [12], surveying the broader literature on machine learning for financial risk management, arrives at a complementary diagnosis from the opposite direction: across market, credit, operational, and insurance risk [13], the dominant paradigm is supervised classification trained on labelled crises or externally specified targets, and the field’s principal open challenges (data inaccessibility, label scarcity, imbalanced datasets, and biased portfolio-level estimates) all trace back to the same root that [11] identifies. Where [11] argues from systems-design first principles that the labels cannot be there, [12] documents, study by study, that the field has been quietly assuming they are. Four main modelling traditions respond to this constraint, but each leaves gaps, particularly for pension fund networks.
Financial network frameworks often model systemic risk using explicit matrices of bilateral obligations [14, 15], which are well suited to banking but less applicable to pension funds, whose connections are indirect. A complementary strand models systemic risk through network density and contagion probability directly, showing that the relationship between interconnectedness and expected loss is non-monotonic; neither fully connected nor ring networks are optimal because denser connections simultaneously enable risk sharing and multiply contagion pathways [16]. Formalising when and how such contagion becomes systemic, however, requires a rigorous measurement framework. The mathematical foundations for such measurement were laid by [17], who developed a general framework for systemic risk measures via multi-dimensional acceptance sets and aggregation functions, unifying the “first aggregate, then inject capital” and the “first inject capital, then aggregate” paradigms into a single axiomatic structure. This framework makes explicit that the aggregation rule and the acceptance set jointly determine where the contagion boundary lies, a theoretical insight that later computational work has sought to operationalise. This foundation has recently been extended in two influential directions. [18] generalise systemic risk measures to graph-structured data, modelling the system as a random asset vector together with a random interbank liability matrix, and proving the existence of optimal bailout-capital allocations that secure the network before contagion occurs. In a closely related vein, [19] operationalises the “first allocate, then aggregate” paradigm from [17], using a deep learning scheme to compute the minimal cash needed to secure a system when no closed-form solution exists. These models are mainly normative, asking how to allocate capital given a detected crisis, rather than diagnosing when a high-risk regime emerges. The supervised machine learning tradition emerged partly to fill this diagnostic gap but, in doing so, introduced a different and equally binding constraint: the need for labelled crises. Recent advances integrating network analysis with extreme dependency modelling and machine learning [20, 21, 22, 23, 24, 25, 26, 27, 28, 29] improve early warning capabilities and interpretability but remain reliant on labelled historical crises.
Deep neural networks and hybrid architectures achieve high predictive accuracy but deepen rather than dissolve this label dependency [23, 30, 31]. The relationship between risk factors and crisis outcomes is nonlinear [30], and more sophisticated architectures confirm rather than escape this; a hybrid XGBoost-LSTM model can issue warnings 2.3 quarters in advance with 83.7% accuracy [31], yet every such figure is conditional on prior crisis labelling. This label dependency is not merely a technical inconvenience but a structural constraint on the entire supervised paradigm: [32] demonstrates that even well-specified ensemble classifiers such as random forests, which outperform panel logit models on global banking crisis data with AUROC exceeding 81%, require crisis labels drawn from externally defined databases such as [33], and their generalisability across heterogeneous economies remains contingent on the completeness and accuracy of those labels. Interpretability advances do not dissolve it either; [34] applies partial dependence plots across multiple models to identify exchange rates, money supply, and interest rates as dominant risk drivers, yet still requires a continuous quantitative risk target derived from market data, which is unavailable in the pension fund setting, where returns based on net asset value (NAV) are the primary observable. These requirements can be met in banking and broad equity markets, but much less readily for pension fund networks, where systemic episodes are rare, slow-moving, and not cleanly labelled, and where the very question of interest is whether elevated-risk regimes can be recognised without prior labels.
Unsupervised data-driven methods present a promising alternative. [35] introduces the absorption ratio, the fraction of total asset-return variance explained by a fixed number of eigenvectors from a rolling principal component analysis (PCA), as an unsupervised, threshold-free indicator of market fragility. The logic is direct: when a single dominant factor absorbs the bulk of return variation across assets, those assets are tightly coupled, and a shock propagates more quickly and broadly than when risks are dispersed. In an application to U.S. equity industries over 1998-2010, the absorption ratio rose to its highest recorded level during the global financial crisis of 2008, and spikes in the standardised shift in the absorption ratio preceded all of the 1% worst monthly drawdowns. Crucially, [35] derives its signal entirely from observed returns: no crisis labels, no externally chosen thresholds, and no imposed regime classification. The same PCA-based logic has since been applied to broader contagion analysis [1] and forms the methodological core of our own absorption ratio component. Studies also apply clustering and anomaly detection to textual or market data [36, 37], but typically do not operate directly on institutional financial positions. Their network is a network of what is said about firms, not of what firms hold, and holdings are exactly the overlap-based contagion substrate that [5] shows to be the operative one for institutions with common asset holdings. The broader systemic-risk machine learning literature confirms how isolated this position is. [38], in their survey of machine learning for systemic risk, classify the field into four branches and identify data-driven systemic risk analysis as a major future direction precisely because real interbank-network data are difficult to obtain and most existing work relies either on proxy networks or on externally specified targets. Endogenous regime detection operating directly on actual financial positions does not appear in their taxonomy as an established branch; it appears as a gap.
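As a minimal sketch of the absorption-ratio logic described above, the following computes the share of total return variance captured by the first eigenvector of the sample covariance matrix. It is an illustration under stated assumptions rather than the procedure of [35]: it uses a single estimation window instead of a rolling one, fixes the number of eigenvectors at one, and obtains the leading eigenvalue by power iteration so that the example has no external dependencies. All function names and the toy return tables are hypothetical.

```python
def covariance_matrix(returns):
    """Sample covariance of a T x N table of returns (rows = dates, columns = assets)."""
    t, n = len(returns), len(returns[0])
    means = [sum(row[j] for row in returns) / t for j in range(n)]
    cov = [[0.0] * n for _ in range(n)]
    for row in returns:
        dev = [row[j] - means[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                cov[i][j] += dev[i] * dev[j] / (t - 1)
    return cov


def leading_eigenvalue(matrix, iterations=1000):
    """Largest eigenvalue of a symmetric positive semi-definite matrix (power iteration)."""
    n = len(matrix)
    # Assumption: a non-uniform start vector, chosen so that it is unlikely
    # to be orthogonal to the leading eigenvector.
    v = [1.0 / (i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v'Av for the unit-length vector v.
    av = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * av[i] for i in range(n))


def absorption_ratio(returns):
    """Share of total variance explained by the first eigenvector (one window only)."""
    cov = covariance_matrix(returns)
    total_variance = sum(cov[i][i] for i in range(len(cov)))
    return leading_eigenvalue(cov) / total_variance


# Toy data (not from the paper): three assets driven by one common factor,
# and two assets that never move together.
coupled = [[0.01, 0.02, 0.03], [-0.02, -0.04, -0.06], [0.015, 0.03, 0.045]]
dispersed = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(absorption_ratio(coupled), absorption_ratio(dispersed))
```

For the tightly coupled table the ratio is close to 1, while for the two uncorrelated, equal-variance assets it is close to 0.5, the lowest value a two-asset system can take. In practice a numerical library's symmetric eigen-solver would normally replace the power iteration.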
The argument across the preceding traditions can now be stated crisply, and it defines the specific opening this paper fills. [15] shows formally that network topology governs how shocks clear but requires observed bilateral liabilities. [39] shows empirically that PageRank and interbank-exposure ratios are the dominant drivers of systemic impact and vulnerability, respectively, but requires observed bilateral exposures and synthetic DebtRank labels. [30] shows that nonlinear classifiers substantially outperform linear ones for crisis detection but amplify rather than resolve the label dependency. Each step increases explanatory power while deepening the data requirements. The pension case sits at the intersection of all three constraints: no bilateral liabilities, no DebtRank-computable exposures, and no labelled crises. The natural response is to step back to an earlier, harder question: can the state of the network be identified directly from returns, without any of these inputs?
Within unsupervised methods applied directly to financial returns, the toolkit is well established: PCA-based connectedness measures [1, 35]; time-series clustering based on dynamic time warping (DTW), which is robust to temporal misalignment and well suited to financial series [40]; and Markov regime-switching, whose foundations [41] motivate the hidden Markov model (HMM) approach we adopt. The CRISP-DM framework [42] provides a reproducible scaffold for assembling these components. Notably, the most relevant pension-specific precedents identify market regimes via HMM and stress-testing frameworks, anchored to external benchmarks [43, 44]. Together, these establish both the network embeddedness of pension funds and the usefulness of regime-switching methods in this exact setting, yet they remain conditional on external states rather than allowing risk regimes to emerge endogenously from the pension network itself. In [11]’s terms, a benchmark-anchored regime is a measurement of a previous, externally defined state rather than a forward-looking read of the network’s own condition.
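The robustness to temporal misalignment attributed to DTW above can be illustrated with a short sketch of the classic dynamic-programming recursion. This shows only the pairwise distance, not a clustering procedure, and it makes assumptions that the text does not specify: the local cost is the absolute difference, no warping-window constraint is applied, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Toy series (not from the paper): the second is the first delayed by one step.
series = [0.0, 1.0, 2.0, 3.0]
delayed = [0.0, 0.0, 1.0, 2.0, 3.0]
print(dtw_distance(series, delayed))
```

Because the warping path can repeat the initial observation, the delayed copy is aligned with the original at zero cost, whereas a point-by-point comparison would register a difference at almost every date. A matrix of such pairwise distances is the kind of input a hierarchical clustering routine would take.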
We address this gap by developing an unsupervised regime-detection framework that treats systemic risk as an emergent property of the pension fund network, rather than as a response to externally defined market conditions. The proposed approach integrates three components within a unified empirical pipeline: a PCA-based absorption ratio following the logic of [35] to measure system-wide co-movement; DTW-based hierarchical clustering to capture cross-sectional similarity structure; and a Gaussian HMM to identify latent risk regimes over time. Importantly, and in the spirit of the threshold-free, data-derived discipline shared by [35], [36], and [37], the crisis threshold is derived endogenously from the HMM emission distributions, ensuring internal consistency between regime classification and crisis identification.
Three methodological contributions distinguish this framework. First, the number of latent states is selected by combining the Bayesian information criterion (BIC) with a posterior identifiability analysis in a finite sticky HMM [45], addressing BIC’s tendency to over-select when mixture components overlap. Second, the stickiness prior is calibrated empirically from the data through dual anchors corresponding to the baseline and the BIC-optimal specification, eliminating the circularity of choosing a prior that determines the result. Third, emission estimates from the frequentist Baum-Welch algorithm are cross-validated against a fully Bayesian sticky HMM estimated via the No-U-Turn Sampler, with both approaches producing near-identical results, confirming that the detected regime structure is data-driven rather than model-dependent. The framework is applied to daily NAV data from
second-pillar pension funds in Lithuania managed by five providers over January 2019 to September 2025, augmented with fund-specific benchmarks and global market indices.
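The BIC step of the state-selection procedure described above can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes univariate Gaussian emissions when counting free parameters, it covers only the BIC comparison and not the posterior identifiability analysis, and the log-likelihood values and sample size in the example are invented for illustration. The function names are hypothetical.

```python
import math


def gaussian_hmm_free_parameters(n_states):
    """Free parameters of an HMM with univariate Gaussian emissions (assumption)."""
    initial = n_states - 1                   # initial distribution sums to one
    transitions = n_states * (n_states - 1)  # each transition row sums to one
    emissions = 2 * n_states                 # one mean and one variance per state
    return initial + transitions + emissions


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion; lower values are preferred."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


def select_states_by_bic(log_likelihoods, n_observations):
    """Return the state count with the lowest BIC, plus all scores.

    log_likelihoods maps each candidate state count to the maximised
    log-likelihood of the corresponding fitted model.
    """
    scores = {
        k: bic(ll, gaussian_hmm_free_parameters(k), n_observations)
        for k, ll in log_likelihoods.items()
    }
    return min(scores, key=scores.get), scores


# Invented log-likelihoods for 2, 3 and 4 states (not results from the paper).
example = {2: -1500.0, 3: -1450.0, 4: -1445.0}
best, scores = select_states_by_bic(example, n_observations=1700)
print(best, scores)
```

In this invented example the fourth state improves the log-likelihood only marginally, so the penalty term outweighs the gain and three states are selected. When the likelihood gain from an extra, overlapping component is larger, BIC alone can favour it, which is the over-selection tendency that the complementary identifiability analysis is meant to address.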
The empirical analysis yields four main findings. First, systemic co-movement is persistently high: the absorption ratio computed using the method of [35] never falls below 0.728, indicating that a single common factor explains at least 73% of total return variance even in the calmest periods. Clustering results show that the funds group primarily by age cohort rather than by provider, suggesting that lifecycle allocation regulation, rather than managerial differences, is the dominant driver of cross-fund dependence, a structural reading consistent with the finding that pension risk behaviour is regulation-driven [10, 46]. Second, three latent risk regimes are identified, with moderate- and high-risk states accounting for nearly 87% of trading days, indicating that elevated systemic interaction is the network’s structural norm. Third, systemic stress is not a passive reflection of global markets: the relationship between the absorption ratio and global benchmarks is statistically insignificant across regimes, while tracking-error amplification, the precise quantity practitioners monitor [46], is concentrated in the high-risk state. Fourth, the empirically derived crisis threshold
exceeds the values commonly used in network-based contagion studies, where thresholds are typically chosen heuristically [47]. This is consistent with [5]’s demonstration that contagion ignites only beyond a system-specific tipping point, one that depends on leverage, concentration, and the commonality of holdings rather than on any universal shock level. More broadly, the finding that an internally derived threshold sits above the heuristic values routinely used in banking-network studies echoes the methodological conclusion that [15] reaches from the opposite direction: the clearing structure of a specific network, its liabilities, its cash-flow distribution, and the topology of its obligations determine where the contagion boundary lies, and that boundary cannot be read off from first principles or borrowed from other sectors.
The remainder of the paper is structured as follows.
Section 2 develops the theoretical framework and describes the empirical methodology, including the data, the absorption ratio, DTW clustering, and the Hidden Markov Model.
Section 3 presents the main results, including the regime detection, the cluster structure, and the cross-step analysis connecting them.
Section 4 discusses the findings and their policy implications.
Section 5 concludes and outlines directions for future research.