Particle Filtering Estimation of Regime Switching Factor Model and Its Application in Statistical Arbitrage Strategy

Yu Mu; Robert J. Frey

doi:10.20944/preprints202507.2022.v1

Submitted:

21 July 2025

Posted:

24 July 2025

You are already at the latest version

Abstract

Statistical factor models are widely applied across various domains of the financial industry, including risk management, portfolio selection, and statistical arbitrage strategies. However, conventional factor models often rely on unrealistic assumptions and fail to account for the fact that financial markets operate under multiple regimes. In this paper, we propose a regime-switching factor model estimated using a particle filtering algorithm which is a Monte Carlo-based method well-suited for handling nonlinear and non-Gaussian systems. Our empirical results show that incorporating regime-switching dynamics significantly enhances the model’s ability to detect structure breaks and adapt to evolving market conditions. This leads to improved performance and reduced drawdown in the equity statistical arbitrage strategies.

Keywords:

regime switching

;

statistical factor model

;

particle filtering

;

statistical arbitrage

Subject:

Computer Science and Mathematics - Signal Processing

1. Introduction

Factor analysis plays a critical role in various areas of quantitative finance, including asset pricing, risk management, portfolio selection, and statistical arbitrage strategies, etc.. Broadly speaking, factor models can be categorized into two types: explicit factor models, which fall under supervised learning, and statistical factor models, which are typically unsupervised. Compared to explicit factor models such as Barra risk model [1], statistical factor models have two key advantages.

First, by construction, statistical factor models are more numerically stable and can capture the underlying covariance structure using fewer factors. Second, they are asset-class agnostic and therefore more broadly applicable across different markets. Given these strengths, this work focuses on the estimation and application of statistical factor models.

Statistical factor models has been developed as early as the 1930s. Scholars started to use factor models based on principal component analysis (PCA) back then [2]. PCA based factor models failed to provide a parametric model for the observations and perform poorly when within group variation dominates between group variation [3,4]. In the 1980s, maximum likelihood (ML) factor analysis estimated via the EM algorithm was introduced by Rubin and Thayer [5]. This was further generalized in the 1990s by Ghahramani and Hinton [6], who proposed the Mixture of Factor Analyzers. Their approach explains the hidden factors’ structure by combining clustering and local dimension reduction techniques. Mixture of factor analyzers works better than classical ML factor analysis due to its ability of segmenting data into different market regimes, each with its own factor structure. It handles the market regime changes in the covariance structure and hence has a more realistic assumption than the “one-size-fits-all” ML factor models.

Despite this advancement, within each regime, it still uses classical ML factor model which assumes i.i.d. hidden factors and Gaussian innovations. Dynamic factor model has a recursive structure for the hidden factor returns and can better model the factor return process [7,8,9,10]. This paper focus on the estimation of dynamic factor with regime switching factor loading that is governed by first order Markov Chain. The estimation relies on Particle filtering techniques [11,12], a class of Monte Carlo methods designed for solving the recursive Bayesian inference when analytical solutions such as the Kalman filter or Baum-Welch algorithm are infeasible.

Although particle filtering techniques is well developed when model parameter values are known, parameter learning in more complex settings remains challenging. Specifically, when parameters evolve according to a stochastic process rather than being constant, such as the applications in finance, an adaptive parameter learning algorithm becomes critical. To learn the parameter values, one early approach by Liu [13] incorporates parameters into the hidden state vector, but this often exacerbates the degeneracy issue in particle filtering. More computationally robust approaches have been introduced in [14,15]. Instead of combining parameter estimation to the learning algorithm they marginalize over parameters to reduce the estimation variance of their system. In this work, we implemented a particle filtering algorithm inspired by [14,15], and evaluate the estimation of regime switching dynamic factor model in the context of an equity statistical arbitrage strategy.

The rest of paper is organized as follows: Section 2 introduces the state space model and its estimation methods, including Kalman filter, EM algorithm, Particle filter, and the more recent development in the parameter learning - particle learning algorithm. Section 3 presents statistical factor models such as the MLE factor model, mixture of factor analyzers and the regime-switching dynamic factor model. We highlight the evolution of statistical factor models over recent decades and provides a simulation study demonstrating the effectiveness of particle learning algorithm for estimating the regime-switching dynamic factor model. Section 4 presents emprical studies applying the regime switching dynamic factor model in the context of equity stat-arb strategy and compares its performance to the conventional MLE factor model. Finally, Section 5 concludes the paper.

2. State Space Model and Its Estimation

A state space model is a framework used to describe the evolution of a dynamical system over time, particularly when the system is only partially observed. It has numerous applications in finance, such as time series prediction (alpha generation), statistical factor analysis (risk modeling), portfolio selection, etc.. In the most general form, a state space model can be represented by:

\begin{matrix} x_{t} & = f_{t} (x_{t - 1}, w_{t}) \end{matrix}

(1)

\begin{matrix} y_{t} & = g_{t} (x_{t}, v_{t}) \end{matrix}

(2)

where

x_{t}

denote the hidden states with dimension k, and

y_{t}

the observations with dimension s. The terms

w_{t}

and

v_{t}

represent the noise terms for the evolution equation

(1)

and the observation equation

(2)

, with dimensions w and v respectively. The functions

f_{t}

and

g_{t}

are general nonlinear functions defining the state transition and observation processes. When the parameter values of the above equations are known, inference about the hidden states can be efficiently performed using the recursive Bayesian algorithm [12], which is summarized as Algorithm 1:

Algorithm 1 Recursive Bayesian Algorithm

Prediction:

$p (x_{t} ∣ y_{1 : t - 1}) = \int p (x_{t} ∣ x_{t - 1}) p (x_{t - 1} ∣ y_{1 : t - 1}) d x_{t - 1}$

(3)
Update:

$p (x_{t} ∣ y_{1 : t}) = \frac{p (y_{t} ∣ x_{t}) p (x_{t} ∣ y_{1 : t - 1})}{p (y_{t} ∣ y_{1 : t - 1})}$

(4)

where the denominator in equation

(4)

is another intergral needs to be solved in this problem:

p (y_{t} ∣ y_{1 : t - 1}) = \int p (y_{t} ∣ x_{t}) p (x_{t} ∣ y_{1 : t - 1}) d x_{t}

(5)

2.1. Kalman Filter and Smoother

When the dynamical system described by equations

(1)

and

(2)

is linear, time-invariant, and finite-dimensional, functions

f_{t}

, and

g_{t}

can be expressed in the following form:

\begin{matrix} x_{t} & = B x_{t - 1} + w_{t} \end{matrix}

(6)

\begin{matrix} y_{t} & = A x_{t} + v_{t} \end{matrix}

(7)

and if we further assume

w_{t} \sim N (0, Q)

,

v_{t} \sim N (0, R)

, then Kalman filter can be derived to solve equations

(6)

and

(7)

analytically as described in [12] and Algorithm 2.

Algorithm 2 Kalman Filter

Prediction:

$\begin{matrix} m_{t ∣ t - 1} & = B m_{t - 1 ∣ t - 1} \end{matrix}$

(8)

$\begin{matrix} C_{t ∣ t - 1} & = Q + B C_{t - 1 ∣ t - 1} B^{T} \end{matrix}$

(9)
Update:

$\begin{matrix} m_{t ∣ t} & = m_{t ∣ t - 1} + K_{t} (y_{t} - A m_{t ∣ t - 1}) \end{matrix}$

(10)

$\begin{matrix} C_{t ∣ t} & = C_{t ∣ t - 1} - K_{t} A C_{t ∣ t - 1} \end{matrix}$

(11)

where

K_{t}

is the Kalman gain and can be computed by:

\begin{matrix} S_{t} & = A C_{t ∣ t - 1} A^{T} + R \end{matrix}

(12)

\begin{matrix} K_{t} & = C_{t ∣ t - 1} A^{T} S_{t}^{- 1} \end{matrix}

(13)

When estimating hidden states in offline mode (i.e., when the full time series data is available) or when performing EM algorithm to estimate model parameters, the Kalman smoother is typically used to refine those state estimates using future observations as well (See Algorithm 3).

Algorithm 3 Kalman Smoother

\begin{matrix} m_{t ∣ T} & = m_{t ∣ t} + K_{t ∣ T} (m_{t + 1 ∣ T} - B m_{t ∣ t}) \end{matrix}

(14)

\begin{matrix} C_{t ∣ T} & = C_{t ∣ t} + K_{t ∣ T} (C_{t + 1 ∣ T} - C_{t + 1 ∣ t}) K_{t ∣ T}^{T} \end{matrix}

(15)

where

K_{t ∣ T} = C_{t ∣ t} B^{T} C_{t + 1 ∣ t}^{- 1}

is the Kalman smoothing gain.

2.2. Parameter Estimation using EM algorithm

The Kalman filter algorithm introduced above assumes parameters of the system are known and fixed. When parameters are unknown, the EM algorithm [6] can be used to estimate these unknown parameters in a state-space form:

Algorithm 4 EM algorithm

E-Step:

$\begin{matrix} E [x_{t} ∣ y_{1 : T}] & = m_{t ∣ T} \end{matrix}$

(16)

$\begin{matrix} E [x_{t} x_{t}^{T} ∣ y_{1 : T}] & = C_{t ∣ T} + m_{t ∣ T} m_{t ∣ T}^{T} \end{matrix}$

(17)

$\begin{matrix} E [x_{t} x_{t - 1}^{T} ∣ y_{1 : T}] & = C_{t, t - 1 ∣ T} + m_{t ∣ T} m_{t - 1 ∣ T}^{T} \end{matrix}$

(18)
M-Step:

$\begin{matrix} \hat{A} & = (\sum_{t = 1}^{T} y_{t} E {[x_{t} ∣ y_{1 : T}]}^{T}) {(\sum_{t = 1}^{T} E [x_{t} x_{t}^{T} ∣ y_{1 : T}])}^{- 1} \end{matrix}$

(19)

$\begin{matrix} \hat{R} & = \frac{1}{T} \sum_{t = 1}^{T} (y_{t} y_{t}^{T} - \hat{A} E [x_{t} ∣ y_{1 : T}] y_{t}^{T}) \end{matrix}$

(20)

$\begin{matrix} \hat{B} & = (\sum_{t = 2}^{T} E [x_{t} x_{t - 1}^{T} ∣ y_{1 : T}]) {(\sum_{t = 2}^{T} E [x_{t - 1} ∣ y_{1 : T}])}^{- 1} \end{matrix}$

(21)

$\begin{matrix} \hat{Q} & = \frac{1}{T - 1} (\sum_{t = 2}^{T} E (x_{t} x_{t}^{T} ∣ y_{1 : T}) - \hat{A} \sum_{t = 2}^{T} E [x_{t} x_{t - 1}^{T} ∣ y_{1 : T}]) \end{matrix}$

(22)

$\begin{matrix} m_{1} & = E [x_{1} ∣ y_{1 : T}] \end{matrix}$

(23)

$\begin{matrix} C_{1} & = E [x_{1} x_{1}^{T} ∣ y_{1 : T}] - E [x_{1} ∣ y_{1 : T}] E {[x_{1} ∣ y_{1 : T}]}^{T} \end{matrix}$

(24)

where

m_{t ∣ T}

,

C_{t ∣ T}

will be estimated by Kalman filter and smoother.

C_{t, t - 1 ∣ T}

can be updated by backward recursions:

C_{t, t - 1 ∣ T} = C_{t ∣ t} K_{t - 1 ∣ T}^{T} + K_{t ∣ T} (C_{t + 1, t ∣ T} - B C_{t ∣ t}) K_{t - 1 ∣ T}^{T}

(25)

which is initialized by

C_{T, T - 1 ∣ T} = (I - K_{T} A) B C_{T - 1 ∣ T - 1}

.

2.3. Particle Filter

When the dynamical system

(1)

and

(2)

do not admit a linear-Gaussian representation as in

(6)

, and

(7)

, deriving an analytical solution for the recursive Bayesian inference becomes challenging. In such cases, particle filtering techniques has been a very popular approach for solving nonlinear dynamical system, including models like the stochastic volatility model [14]. In this section, we introduce several baseline particle filtering algorithms.

2.3.1. SIS Particle Filter

Sequential Importance Sampling (SIS) particle filter has been introduced as early as 1990s [16]. The main idea is to approximate the joint posterior density at time t using importance sampling method:

p (x_{1 : t} ∣ y_{1 : t}) \approx \sum_{i = 1}^{N} W_{t}^{i} δ (x_{1 : t} - x_{1 : t}^{i})

(26)

where

δ (\cdot)

is the Dirac delta function, and

W_{t}^{i}

is the importance weight for particle i, given by:

W_{t}^{i} \propto \frac{p (x_{1 : t} ∣ y_{1 : t})}{q (x_{1 : t} ∣ y_{1 : t})}

(27)

where

p (x_{1 : t} ∣ y_{1 : t})

is the target distribution and

q (x_{1 : t} ∣ y_{1 : t})

is the importance (or proposal) distribution. A sequential estimation algorithm can be derived if we assume the joint importance density has the following factorization:

q (x_{1 : t} ∣ y_{1 : t}) = q (x_{t} ∣ x_{t - 1}, y_{t}) q (x_{1 : t - 1} ∣ y_{1 : t - 1})

(28)

This factorization allows sequential sampling of the state

x_{t}

given the previous state and current observation, which forms the foundation of the SIS particle filter as in Algorithm 5:

Algorithm 5 SIS Particle Filter

Draw $x_{t}^{i} \sim q (x_{t} ∣ x_{t - 1}^{i}, y_{t})$
Assign the particle $x_{t}^{i}$ a weight, $W_{t}^{i} \propto W_{t - 1}^{i} \frac{p (y_{t} ∣ x_{t}^{i}) p (x_{t}^{i} ∣ x_{t - 1}^{i})}{q (x_{t}^{i} ∣ x_{t - 1}^{i}, y_{t})}$

2.3.2. SIR Particle Filter

The SIS particle filter often fails in practice due to the weight degeneracy issue. The variance of the weight

W_{t}^{i}

in SIS filter will increase over time until only one particle left with large weight. To alleviate this issue, we can resample (with replacement) the particles based on their weights. By doing so we can replicate those particles with large weights, eliminate particles with small weights, and hence reduce the variance of weights

W_{t}^{i}

[11].

Moreover, It has been shown the optimal importance density for a generic particle filter algorithm is

p (x_{t} ∣ x_{t - 1}^{i}, y_{t})

[17] and if we are able to choose the importance density close to the optimal importance density we can further alleviate the weight degeneracy issue. SIR particle filter uses

p (x_{t} ∣ x_{t - 1}^{i})

as important density and applying resampling at each step as shown in the Algorithm 6:

Algorithm 6 SIR Particle Filter

Draw $x_{t}^{i} \sim p (x_{t} ∣ x_{t - 1}^{i})$
Assign the particle $x_{t}^{i}$ a weight, $W_{t}^{i} \propto p (y_{t} ∣ x_{t}^{i})$
Resampling based on weight $W_{t}^{i}$

Although SIR filter improves the weight degeneracy problem present in the SIS filter, it still has its own weakness [18]. First, if there is an outlier at time t, the weights

W_{t}^{i}

will unevenly distributed and the filter will become imprecise. Second, the predictive density

p (x_{t} ∣ x_{t - 1}^{i})

often fails to capture tail behavior accurately, causing some particles to drift into low-likelihood regions during the prediction step (2).

2.3.3. Auxiliary Particle Filter

To address the limitations of the SIR filter, the auxiliary particle filter proposed by [18] introduces a lookahead step - also known as the auxiliary variable mechanism. This approach uses a predictive weighting scheme (See Algorithm 7 step 1) that estimates the likelihood of the upcoming observation given the predicted state, a process often referred to as lookahead weighting.

Algorithm 7 Auxiliary Particle Filter

(Resample) Draw $k^{i} \sim M u l t i \{W_{t}^{1}, \dots, W_{t}^{N}\},$ where $W_{t}^{i} \propto p (y_{t} ∣ μ (x_{t - 1}^{i}))$ .
(Propagate) Draw ${\tilde{x}}_{t}^{i} \sim p (x_{t} ∣ x_{t - 1}^{k^{i}})$ .
(Resample) Draw $x_{t}^{i} \sim M u l t i \{W_{t}^{1}, \dots, W_{t}^{N}\}$ , where $W_{t}^{t} \propto p (y_{t} ∣ {\tilde{x}}_{t}^{i}) / p (y_{t} ∣ μ (x_{t - 1}^{i}))$ .

where

μ (x_{t - 1}^{i})

denotes the mean, median, or mode of the distribution of

x_{t}

given

x_{t - 1}^{i}

. When the observation equation is informative in dynamical system, Auxiliary particle filter usually will reduce variance in particle weights and hence perform better.

2.4. Particle Filter with Parameter Learning

The particle filter algorithms discussed so far assume that model parameters are known. However, in many applications, model parameters are typically unknown. Learning the static parameter values - or more broadly - identifying the coefficient evolution process - is a critical aspect of system identification. The integration of parameter learning into the particle filtering framework has been extensively studied [19].

One idea is to treat parameters as additional hidden state variables in the particle filtering framework, but this approach often leads to severe degeneracy issue. To address this issue, [13] proposed introducing an artificial evolutional model for the parameters and augment the hidden state space by including them.

Another idea is to integrate out (or marginalize out) the parameters whenever feasible [14,20,21]. The benefits of the marginalized particle filter has been discussed in [22].

Building on these ideas, Carvalho et al. [15] introduced particle learning approach, a noval particle filter with the embedded parameter learning step. Particle learning method marginalize both the parameters and the hidden states by tracking only the sufficient statistics of them, which updated later using recursive Bayesian algorithm and Kalman filter, respectively. It is also constructed under the Auxiliary particle filter framework, and the performance improvement has been demonstrated in [15,23].

In our paper, we develop a simplified version of the particle learning algorithm to estimate regime switching dynamic factor model.

2.4.1. Particle Learning

Particle learning algorithm was introduced by Carvalho et al. in [15], extending the idea of auxiliary particle filter to incorporate parameter learning. While the overall filtering scheme is the same as auxiliary particle filter. the key innovation lies in tracking the “essential state”

z_{t}

.

Specifically,

z_{t}

includes:

the sufficient statistics of the parameter vector $s_{t}$ ,
the sufficient statistics of hidden states $x_{t}$ , $s_{t}^{x}$ , and
the current value of the parameter vector, $θ_{t}$ :

$z_{t} = \{s_{t}, s_{t}^{x}, θ_{t}\} .$

The reason we include parameter value in essential state is because for evaluating likelihood and drawing samples from predictive density we need to use parameter values. However, there values are not directly sampled at each step, which most likely will cause degeneracy issue; instead, they are inferred offline from their associated sufficient statistics. This approach is also referred to as marginalization by simulation.

The derivation of this filter start with the recursive relation:

p (z_{t} ∣ y_{1 : t}) \propto \int p (y_{t} ∣ z_{t - 1}) p (z_{t} ∣ z_{t - 1}, y_{t}) p (z_{t - 1} ∣ y_{1 : t - 1}) d z_{t - 1}

where

\begin{matrix} p (z_{t} ∣ z_{t - 1}, y_{t}) = \int p (s_{t} ∣ s_{t - 1}, x_{t}, y_{t}) p (s_{t}^{x} ∣ s_{t - 1}^{x}, x_{t}, y_{t}) \\ \cdot p (x_{t} ∣ s_{t - 1}^{x}, y_{t}) d x_{t} \end{matrix}

and

p (x_{t} ∣ s_{t - 1}^{x}, y_{t}) = \int p (x_{t} ∣ x_{t - 1}, y_{t}) p (x_{t - 1} ∣ s_{t - 1}^{x}, y_{t}) d x_{t - 1}

So given the particles of

s_{t - 1}^{x}

we can construct Monte Carlo estimate for

p (x_{t} ∣ s_{t - 1}^{x}, y_{t})

by sampling from

p (x_{t - 1} ∣ {(s_{t - 1}^{x})}^{i}, y_{t})

and

p (x_{t} ∣ x_{t - 1}^{i}, y_{t})

respectively. After that we can update

s_{t}

and

s_{t}^{x}

deterministically. Particle learning method can be summarized as follows:

Algorithm 8 Particle Learning

(Resample) Draw $k^{i} \sim M u l t i \{W_{t}^{1}, \dots, W_{t}^{N}\}$ , where $W_{t}^{i} \propto p (y_{t} ∣ z_{t - 1}^{i})$
(Propagate) Draw $x_{t - 1}^{i} \sim p (x_{t - 1} ∣ {(s_{t - 1}^{x})}^{k^{i}}, θ_{t - 1}^{k^{i}}, y_{t})$
(Propagate) Draw $x_{t}^{i} \sim p (x_{t} ∣ x_{t - 1}^{i}, θ_{t - 1}^{k^{i}}, y_{t})$
(Update) $s_{t}^{i} = S (s_{t - 1}^{k^{i}}, x_{t}^{i}, x_{t - 1}^{i}, y_{t})$
(Sample) Draw $θ_{t}^{i} \sim p (θ_{t} ∣ s_{t}^{i})$
(Update) ${(s_{t}^{x})}^{i} = K ({(s_{t - 1}^{x})}^{k^{i}}, θ_{t}^{i}, y_{t})$

Where

S

denotes the deterministic updating function for parameter sufficient statistics and

K

denotes Kalman filter for state sufficient statistics update.

3. Statistical Factor Analysis

Statistical factor analysis is a technique used to explain the correlation structure of multivariate observations through a lower-dimensional set of unobserved factors. The model can be expressed as:

y_{t} = B x_{t} + v_{t}

(29)

where

y_{t}

is

(s \times 1)

vector of observations,

x_{t}

is

(k \times 1)

vector of statistical factors, and

B

is

(s \times k)

factor loading matrix.

v_{t}

is the

(s \times 1)

error term that follows certain distribution assumption. The hidden factors

x_{t}

are often assumed to follow an i.i.d. standard normal distribution, i.e.

x_{t} \sim N (0, I_{k})

, and are assumed to be independent of the error term

v_{t}

.

3.1. MLE Factor Analsis

The most commonly used approach for estimating the factor model in equation

(29)

is the Expectation-Maximization (EM) algorithm for maximum likelihood estimation (MLE) [4,5,6]. The EM algorithm for maximum likelihood estimation of the factor model is outlined below as Algorithm 9:

Algorithm 9 EM algorithm for Statistical Factor Analysis

E-Step: Given current fit of $B$ and $R$ , calculate sufficient statistics:

$\begin{matrix} γ & = B^{T} {(B B^{T} + R)}^{- 1} \end{matrix}$

(30)

$\begin{matrix} E [x_{t} ∣ y_{t}] & = γ y_{t} \end{matrix}$

(31)

$\begin{matrix} E [x_{t} x_{t}^{T} ∣ y_{t}] & = I - γ B + γ y_{t} y_{t}^{T} γ \end{matrix}$

(32)
M-Step: Update parameter estimation,

$\begin{matrix} \hat{B} & = (\sum_{t = 1}^{T} y_{t} E [x_{t} ∣ y_{t}]) {(\sum_{t = 1}^{T} E [x_{t} x_{t}^{T} ∣ y_{t}])}^{- 1} \end{matrix}$

(33)

$\begin{matrix} \hat{R} & = \frac{1}{T} diag (\sum_{t = 1}^{T} (y_{t} y_{t}^{T} - \hat{B} E [x_{t} ∣ y_{t}] x_{t}^{T})) \end{matrix}$

(34)

The EM algorithm starts with certain initial values for

B

and

R

and terminates when the rate of change in the log likelihood falls below some critical value. The log likelihood can be computed as:

\begin{matrix} log L (B, R ∣ y_{1 : T}) & = & - \frac{1}{2} (T (s log (2 π) + log (det (B B^{T} + R))) \\ + \sum_{t = 1}^{T} y_{t}^{T} {(B B^{T} + R)}^{- 1} y_{t}) \end{matrix}

(35)

3.2. Mixture of Factor Analyzer

The factor model in equation

(29)

is a global linear model. However, in many real world applications — such as in financial engineering — it fails to reflect the fact that the observed data present multiple regimes whose behaviors differ materially. To address this limitation, the Mixture of Factor Analyzer (MFA) model extends the MLE statistical factor model by combining local dimension reduction method with a Gaussian mixture model for clustering. Specifically, the MFA model can be described in the following form:

y_{t} = B (i) x_{t} + v_{t} with prob . π_{i} (i = 1, \dots, m)

(36)

where

\sum_{i = 1}^{m} π_{i} = 1, π_{i} \geq 0

, and the index i denotes the component (or regime), drawn from a categorical distribution with mixing probabilities

\{π_{i}\}

.

B (i)

is the factor loading matrix specific to component i. Following the formulation in [6], we assume the observation covariance

R

for each component are the same, and the number of mixture is fixed prior to the estimation step. The EM algorithm for estimating the parameters of the MFA model is described below in Algorithm 10:

Algorithm 10 EM algorithm for Mixture of Factor Analyzer

E-Step: Given current fit of $π_{i} for i = 1, \dots, m$ , $B$ and $R$ , calculate,

$\begin{matrix} τ_{i j} & = & \frac{π_{i} N (y_{j}, B (i) B {(i)}^{T} + R)}{\sum_{i = 1}^{m} π_{i} N (y_{j}, B (i) B {(i)}^{T} + R)} \end{matrix}$

(37)

$\begin{matrix} γ (i) & = & B {(i)}^{T} {(B (i) B {(i)}^{T} + R)}^{- 1} \end{matrix}$

(38)

$\begin{matrix} Σ (i) & = & \frac{1}{T} \sum_{j = 1}^{T} (\sqrt{τ_{i j}} y_{j}) {(\sqrt{τ_{i j}} y_{j})}^{T} \end{matrix}$

(39)
M-Step:

$\begin{matrix} \hat{B} (i) & = & \frac{Σ (i) γ {(i)}^{T}}{\frac{1}{T} \sum_{j = 1}^{T} τ_{i j} (I_{s} - γ (i) B (i)) + γ (i) Σ (i) γ {(i)}^{T}} \end{matrix}$

(40)

$\begin{matrix} \hat{R} & = & \frac{1}{T} diag (\sum_{i = 1}^{m} Σ (i) - \sum_{i = 1}^{m} \hat{B} (i) γ (i) Σ (i)) \end{matrix}$

(41)

$\begin{matrix} {\hat{π}}_{i} & = & \frac{1}{T} \sum_{j = 1}^{T} τ_{i j} \end{matrix}$

(42)

Starting with initial values of

π_{i}

for

i = 1, \dots, m

,

B (i)

, and

R

, we can iterate algorithm 10 until convergence. The log-likelihood of the observed data under the MFA model is given by:

log L (Ψ) = \sum_{j = 1}^{T} log \{\sum_{i = 1}^{m} π_{i} N (y_{j}, B (i) B {(i)}^{T} + R)\}

where

Ψ

denotes the set of all model parameters.

3.3. Regime Switching Dynamic Factor Model

Although the mixture of factor analyzer (MFA) model can capture data generated from multiple regimes, it still assumes the observations are i.i.d. and fails to account for the temporal persistence of market regimes. In practice, particularly in financial applications, market regimes tend to persist over time, with each regime exhibiting distinct statistical properties. Regime switching model equipped with Markov chain mechanism can better model the regime persistence and capture the abrupt regime shifts more efficiently than commonly used “low-pass filtering” approaches such as rolling window estimation. It fits better for financial time series, where capturing time dynamics is essential.

3.3.1. Methods

In this work, we represent the regime switching dynamic factor model within a state space framework. The estimation of regime switching state space models has been well studied in the literature [8,24,25], with most approaches relying on the Markov Chain Monte Carlo (MCMC) algorithms. MCMC approaches are not only computationally very expensive, but also suffer slow convergence issue. Moreover, they are batch (offline) algorithms in nature which limits their use in real-time trading.

In this work, we will rely on the particle learning technique that is introduced in section 2.4 as our estimation method. Computationally, it’s significantly faster than MCMC algorithms and, being an online algorithm, it can be adopted in the real time trading environment. The convergence of particle filtering algorithm is well studied in the literature [17,26].

Another key advantage of the particle filtering framework is its modular design, which makes it highly adaptable to complex model structures. Additional structure can be incorporated into the system relatively easier than MCMC framework, as long as it remains possible to draw samples from the importance density and evaluate the likelihood function upon receiving the new observations (See [14,27] for examples).

To introduce our model, we first define a discrete latent regime variable

r_{t} \in \{1, \dots, K\}

which captures the regime at time t. The essential state used in the particle learning algorithm is then extended as:

Z_{t} = \{r_{t}, s_{t}, s_{t}^{x}, θ\},

where

s_{t}

and

s_{t}^{x}

denote the sufficient statistics for the parameters and the hidden states, respectively, and

θ

denotes the model parameters.

The model is specified as:

\begin{matrix} x_{t} & = A x_{t} + w_{t} \end{matrix}

(43)

\begin{matrix} y_{t} & = B (r_{t}) x_{t} + \sqrt{λ_{t}} v_{t} \end{matrix}

(44)

\begin{matrix} λ_{t} & \sim I G (v / 2, v / 2) \end{matrix}

(45)

where

I G (\cdot)

denotes the inverse Gamma distribution,

v_{t} \sim N (0_{s}, R_{s})

,

w_{t} \sim N (0_{k}, I_{k})

. In this representation, we model heavy tailed innovation using the data augmentation scheme of [28], where at each time t, we will follow the two step approach: i)

λ_{t} \sim Inv - Gamma (v / 2, v / 2)

, ii)

η_{t} = {\sqrt{λ}}_{t} v_{t}

, so that

η_{t} \sim t_{v} (0_{s}, R_{s})

where

R_{s}

is a diagonal matrix.

If the degree of freedom is given, the at each time step t, our system is a conditionally Gaussian model. To derive the particle learning algorithm for this model (equation

(43)

to

(45)

), we compute the predictive density

p (y_{t} ∣ z_{t - 1}, λ_{t})

by integrating out the latent regime variable

r_{t}

:

\begin{matrix} p (y_{t} ∣ z_{t - 1}, λ_{t}) & = & \sum_{r_{t} = 1}^{K} p (y_{t} ∣ r_{t}, {(s_{t - 1}^{x}, θ, λ_{t})}^{i}) \\ \cdot p (r_{t} ∣ r_{t - 1}, θ) \end{matrix}

(46)

where

p (y_{t} ∣ r_{t}, {(s_{t - 1}^{x}, θ, λ_{t})}^{i})

is the conditional likelihood, which is derived in Appendix A, and

p (r_{t} ∣ r_{t - 1}, θ)

is given by the estimated transition matrix. To propagate

r_{t}

, we sample from the posterior distribution:

\begin{matrix} p (r_{t} ∣ {(r_{t - 1}, s_{t - 1}^{x}, θ, λ_{t})}^{i}, y_{t}) & \propto & p (y_{t} ∣ r_{t}, {(s_{t - 1}^{x}, θ, λ_{t})}^{i}) \\ \cdot p (r_{t} ∣ r_{t - 1}, θ) \end{matrix}

(47)

Algorithm 11 PL for Regime Switching Dynamic Factor Model

(Sampling) Draw $λ_{t}^{i} \sim I G (v / 2, v / 2)$ ( $λ_{t}^{i} = 1$ for Gaussian model)
(Resample) Draw $k^{i} \sim M u l t i \{W_{t}^{1}, \dots, W_{t}^{N}\}$ , with $W_{t}^{i} \propto p (y_{t} ∣ z_{t - 1}^{i}, λ_{t})$ where

$p (y_{t} ∣ z_{t - 1}^{i}, λ_{t}) = \sum_{k = 1}^{K} N (y_{t}; m_{y} (k), C_{y} (k)) p (r_{t} = k ∣ r_{t - 1}^{i}, θ)$

and $m_{y} (k) = B (k) A m_{t - 1}$ , $C_{y} (k) = B (k) A C_{t - 1} A^{T} B {(k)}^{T} + B (k) B {(k)}^{T} + λ_{t} R_{s}$ .
(Propagate) Draw $r_{t}^{i} \sim p (r_{t} ∣ {(s_{t - 1}^{x}, r_{t - 1}, θ)}^{k^{i}}, y_{t})$ , which follows multinomial distribution with:

$p (r_{t} = k ∣ s_{t - 1}^{x}, r_{t - 1}, θ, y_{t}) \propto N (y_{t}; m_{y}^{k^{i}} (k), C_{y}^{k^{i}} (k)) p (r_{t} = k ∣ r_{t - 1}^{k^{i}}, θ^{k^{i}})$
(Propagate) Draw $x_{t}^{i} \sim p (x_{t} ∣ r_{t}^{i}, x_{t - 1}^{k^{i}}, θ^{k^{i}}, y_{t})$ from normal distribution $N (m_{x}, C_{x})$ with:

$\begin{matrix} m_{x} & = & C_{x} (B {(r_{t})}^{T} {(λ_{t} R_{s})}^{- 1} y_{t} + A x_{t - 1}) \end{matrix}$

(48)

$\begin{matrix} C_{x}^{- 1} & = & B {(r_{t})}^{T} {(λ_{t} R_{s})}^{- 1} B (r_{t}) + I_{k} \end{matrix}$

(49)
(Update) $s_{t}^{i} = S (s_{t - 1}^{k^{i}}, r_{t}^{i}, r_{t - 1}^{k^{i}}, x_{t}^{i}, x_{t - 1}^{k^{i}}, y_{t})$ .
(Sample) Draw $θ^{i} \sim p (θ ∣ s_{t}^{i})$ .
(Update) ${(s_{t}^{x})}^{i} = K ({(s_{t - 1}^{x})}^{k^{i}}, θ^{i}, y_{t})$ and draw estimated states ${\hat{x}}_{t}^{i} \sim N (m_{x 1}^{i}, C_{x 1}^{i})$ .

where the deterministic updating formula

S

in step 5 is given by:

\begin{matrix} v_{t} & = & φ^{2} v_{t - 1} + 1 \end{matrix}

(50)

\begin{matrix} g_{t} & = & Φ_{t - 1} x_{t} \end{matrix}

(51)

\begin{matrix} ζ_{t}^{2} & = & λ_{t} φ^{2} + x_{t}^{T} g_{t} \end{matrix}

(52)

\begin{matrix} {\hat{e}}_{t} & = & y_{t} - b_{t - 1}^{T} x_{t} \end{matrix}

(53)

\begin{matrix} b_{t} & = & b_{t - 1} + \frac{1}{ζ_{t}^{2}} g_{t} {\hat{e}}_{t}^{T} I_{r_{t} = k} \end{matrix}

(54)

\begin{matrix} Φ_{t} & = & \frac{1}{φ^{2}} (Φ_{t - 1} - \frac{1}{ζ_{t}^{2}} g_{t} g_{t}^{T} I_{r_{t} = k}) \end{matrix}

(55)

\begin{matrix} V_{t} & = & φ^{2} (V_{t - 1} + \sum_{k = 1}^{K} \frac{1}{ζ_{t}^{2}} {\hat{e}}_{t} {\hat{e}}_{t}^{T} I_{r_{t} = k}) \end{matrix}

(56)

\begin{matrix} g_{t}^{x} & = & Ψ_{t - 1} x_{t - 1} \end{matrix}

(57)

\begin{matrix} σ_{t}^{2} & = & φ^{2} + x_{t - 1}^{T} g_{t}^{x} \end{matrix}

(58)

\begin{matrix} {\hat{e}}_{t}^{x} & = & x_{t} - a_{t - 1}^{T} x_{t - 1} \end{matrix}

(59)

\begin{matrix} a_{t} & = & a_{t - 1} + \frac{1}{σ_{t}^{2}} g_{t}^{x} {\hat{e}}_{t}^{x T} \end{matrix}

(60)

\begin{matrix} Ψ_{t} & = & \frac{1}{φ^{2}} (Ψ_{t - 1} - \frac{1}{σ_{t}^{2}} g_{t}^{x} g_{t}^{x T}) \end{matrix}

(61)

where

i = 1, \dots, s

. The sampling density in step 4 is derived in Appendix B. As discussed in Appendix C, we can assume each row of transition matrix

T

,

p_{i}

, follows a Dirichlet distribution

D i r (p_{i} ∣ α_{i})

, for

i = 1, \dots, K

, and their sufficient statistics can be updated as:

α_{i, j, t} = α_{i, j, t - 1} + I_{r_{t} = j, r_{t - 1} = i}, for i, j = 1, \dots, K

(62)

For step 6, we can either simulate model parameters from the updated sufficient statistics or calculate them analytically. The analytical solution is give by:

\begin{matrix} B (r_{t}) & = & b_{k, t} \end{matrix}

(63)

\begin{matrix} R_{s} & = & V_{t} / (v_{t} - 2) \end{matrix}

(64)

\begin{matrix} A & = & a_{t} \end{matrix}

(65)

\begin{matrix} T_{i j} & = & \frac{α_{i, j, t} - 1}{α_{0, t} - K} \end{matrix}

(66)

where we use the posterior mode (MAP) as transition probability estimate.

For step 7,

K

stands for Kalman filter, and the specific algorithm is given by 2.

3.3.2. Simulation Analysis

In this section, we apply our particle learning algorithm to synthetic data to validate the estimation algorithm. We generate synthetic scenarios based on a three regime dynamic factor model involving 15 synthetic instruments. Our “true” factor loading matrices, observation covariance and “true” state transition matrix are specified as follows:

B (r_{t} = 1) = [\begin{matrix} 0.93 & 0.316 & 0.184 \\ 0.205 & 0.568 & 0.596 \\ ⋮ & ⋮ & ⋮ \\ 0.81 & 0.096 & 0.219 \end{matrix}]

B (r_{t} = 2) = [\begin{matrix} - 0.56 & 1.011 & 0.945 \\ 2.821 & - 1.165 & 1.486 \\ ⋮ & ⋮ & ⋮ \\ 0.152 & 2.381 & - 0.153 \end{matrix}]

B (r_{t} = 3) = [\begin{matrix} - 0.694 & - 0.654 & - 0.443 \\ - 0.15 & - 2.678 & - 0.268 \\ ⋮ & ⋮ & ⋮ \\ - 0.917 & - 0.849 & - 0.811 \end{matrix}]

Diag R = [\begin{matrix} 0.49 & 0.211 & \dots & 0.518 & 0.488 \end{matrix}]

A = [\begin{matrix} 0.553 & 0 & 0 \\ 0 & 0.477 & 0 \\ 0 & 0 & 0.312 \end{matrix}]

and regime transition matrix

T

is:

T = [\begin{matrix} 0.995 & 0.0025 & 0.0025 \\ 0.0025 & 0.995 & 0.0025 \\ 0.0025 & 0.0025 & 0.995 \end{matrix}]

Since factor loadings and factor returns are not separately identifiable in our model structure, in this analysis, we mainly check regimes detection (See Figure 1), idiosyncratic risk estimation (See Figure 2), and cumulative expected returns’ tracking (See Figure 3). The estimation results are as follows:

Figure 1. Regime Detection.

As shown in Figure 1, the algorithm accurately captures the evolution of hidden regimes from both regime detection plot and cumulative regimes comparison plot.

Figure 2. Idiosyncratic Risk Estimation.

In terms if idiosyncratic risk (or error variance), the algorithm can also converge to their unbiased estimation.

The most important metric in statistical factor analysis is how accurately the model tracks the expected return. In this synthetic setting, our model demonstrates strong performance in the expected return (or cumulative expected return as shown in Figure 3) tracking.

Figure 3. Cumulative Expected Return Tracking.

4. Results in Statistical Arbitrage Strategy

In both academic and industry, equity statistical arbitrage strategy (equity stat-arb) usually means the idea of trading groups of stocks against either explicit or synthetic factors, which can be seen as the generalization of “pairs-trading” (see [29]). In some cases, we would long an individual stock and short the factors, and in others we would short the stock and long the factors. Overall, due to the netting of long and short positions, our exposure to the factors will be small. One key step of the equity stat-arb strategy is the decomposition of assets’ returns into the systematic part and idiosyncratic part. When we use a more effective factor model, we can have a better decomposition and hence the better strategy performance.

4.1. Market Neutral Strategy

In this paper, we evaluate the quality of factor models within the context of an equity statistical arbitrage strategy. Inspired by the approach of [30], in order to avoid the unnecessary complexity of signal generation and potential overfitting issues we adopt a deliberately simple reversion signal which is the negative value of previous day’s residuals. To ensure market neutrality, we construct a dollar-neutral portfolio each day, with equal dollar amounts allocated to long and short positions.

To be specific, we compare the performance of trading strategies derived from both regime switching dynamic factor model and static MLE factor model. For the regime switching dynamic factor model, the factor loading matrices for each regime are initialized using estimates from a mixture of factor analyzer (MFA), trained on a “burnin” data from 2005-01-03 to 2011-01-01. The residualization step is implemented in a rolling window framework so that the residuals are always generated out-of-sample.

After the residualization step, we can generate our signals using naive rule-based approach. If previous day’s residual for asset i has

v_{i, t - 1} > d

, where d is the threshold, then we will say that the asset i is over-priced at time

t - 1

and short this asset for the next time stamp t. For

v_{i, t - 1} < - d

, we will do the opposite operation. At each time step, we form a dollar-neutral portfolio with equally weighted positions on both the long and short sides.

4.2. Performance Comparison

We run this naive strategy using both dynamic factor model and static factor model respectively, and the strategy performance is given as Figure 4:

Figure 4. Equity Stat-Arb Performance Comparison.

We can see from Figure 4 that the strategy has a better Sharpe and lower Maximum drawdown when using regime switching dynamic factor model than using static MLE factor model. The associated historical effective sample size plot and regime detection plot are given in Figure 5:

Figure 5. (a) Effective Sample Size at each day when running particle filter (a) Regime detection when using regime switching particle filter.

From the effective sample size, we can see the particle filter estimation is stable since most of the sample size are larger than 1000 for a three-factor model. From the regimes detection, we can see there are more regime changes in 2024 and indeed the strategy relying on regime switching dynamic factor model performs better in 2024 than the one using static model.

5. Discussion

This paper investigates the estimation of regime switching dynamic factor model using a particle learning algorithm. Through simulation studies, we validate our estimation approach by evaluating regime detection accuracy, idiosyncratic risk estimation, and the model’s ability to track expected returns. Empirical study in the equity statistical arbitrage framework further demonstrates that the regime-switching dynamic factor model outperforms a conventional static MLE factor model in capturing underlying data dynamics.

The particle learning algorithm implemented in this paper builds on the framework of [15], with several modifications. First, we extend the innovation distribution to a mixture of Gaussian distributions. Second, we simplify the parameter learning step by only tracking the sufficient statistics, rather than using the marginalization via simulation. Last, we apply the estimated model to the empirical data and compare its performance against the conventional MLE factor model.

We are able to see the performance improvements in the equity statistical arbitrage strategy when using our model. Our model not only improves the Sharpe ratio but also minimizes the maximum drawdown. These results align with our initial motivation: It is well known that financial market operates under multiple regimes, incorporating the regime switching mechanism that is governed by hidden Markov model in our model representation allows us better model the structural shifts and enhances risk management — particularly during periods of elevated volatility and contagion risk.

While the current particle learning algorithm is based on the vanilla auxiliary particle filter, there remains a lot of room to improve the quality of our particle filtering algorithm and hence improve the tracking of hidden states (or hidden factor in this model). For instance, the ABC-based sequential Monte Carlo filter [31] could further enhance the filtering accuracy. Additionally, incorporating more robust, heavy-tailed innovation distributions [32] may improve the robustness of our estimation, particularly in the presence of extreme market movements.

Appendix A. Probability Density Function of p(y_t ∣z_t−1)

The likelihood function

p (y_{t} ∣ z_{t - 1})

is straightforward to derive. Given

z_{t - 1} = \{s_{t - 1}, m_{t - 1}, C_{t - 1}, θ_{t - 1}\}

, and the equation

y_{t} = B (A x_{t - 1} + w_{t}) + v_{t}

, the likelihood function

p (y_{t} ∣ z_{t - 1})

follows the Normal distribution

N (m_{y}, C_{y})

, where

\begin{matrix} m_{y} & = B A m_{t - 1} \\ C_{y} & = B (A C_{t - 1} A^{T} + I_{k}) B^{T} + R_{s} \end{matrix}

Appendix B. Probability Density Function of p(x_t ∣x_t−1,θ _t−1, y_t

Based on Bayes’ rule, we can factor

p (x_{t} ∣ x_{t - 1}, θ_{t - 1}, y_{t})

as:

\begin{matrix} p (x_{t} ∣ x_{t - 1}, θ_{t - 1}, y_{t}) & \propto & p (y_{t} ∣ x_{t}, θ_{t - 1}) p (x_{t} ∣ x_{t - 1}, θ_{t - 1}) \\ \propto & exp \{- \frac{1}{2} {(y_{t} - B x_{t})}^{T} R_{s}^{- 1} (y_{t} - B x_{t})\} \\ exp \{- \frac{1}{2} {(x_{t} - A x_{t - 1})}^{T} (x_{t} - A x_{t - 1})\} \end{matrix}

Expressing quadratic form to normal form:

\begin{matrix} {(y_{t} - B x_{t})}^{T} R_{s}^{- 1} (y_{t} - B x_{t}) + {(x_{t} - A x_{t - 1})}^{T} (x_{t} - A x_{t - 1}) \\ = {[\begin{matrix} \bar{U} y_{t} - \bar{U} B x_{t} \\ A x_{t - 1} - x_{t} \end{matrix}]}^{T} [\begin{matrix} \bar{U} y_{t} - \bar{U} B x_{t} \\ A x_{t - 1} - x_{t} \end{matrix}] \end{matrix}

where

{\bar{U}}^{T} \bar{U} = R_{s}^{- 1}

. If we let

\bar{V} = [\begin{matrix} \bar{U} y_{t} \\ A x_{t - 1} \end{matrix}]

,

\bar{W} = [\begin{matrix} \bar{U} B \\ I_{k} \end{matrix}]

, then

\begin{matrix} {(\bar{V} - \bar{W} x_{t})}^{T} (\bar{V} - \bar{W} x_{t}) & = & {(\bar{V} - \bar{W} {\hat{x}}_{t})}^{T} (\bar{V} - \bar{W} {\hat{x}}_{t}) \\ + {(x_{t} - {\hat{x}}_{t})}^{T} {\bar{W}}^{T} \bar{W} (x_{t} - {\hat{x}}_{t}) \end{matrix}

where

{\hat{x}}_{t} = {({\bar{W}}^{T} \bar{W})}^{- 1} {\bar{W}}^{T} \bar{V}

. Hence,

\begin{matrix} m_{x} & = & C_{x} (B^{T} R_{s}^{- 1} y_{t} + A x_{t - 1}) \\ C_{x}^{- 1} & = & B^{T} R_{s}^{- 1} B + I_{k} \end{matrix}

Appendix C. Conjugate Prior of Categorical Distribution

The Dirichlet distribution, which is a multivariate generalization of the Beta distribution, is usually used as the conjugate prior distribution of the categorical distribution (also known as multinomial distribution) (See [33]). The probability density function of the Dirichlet distribution is defined as:

D i r (x ∣ α) = \frac{1}{B (α)} \prod_{i = 1}^{K} x_{i}^{α_{i} - 1}

where

x = \{x_{1}, \dots, x_{K}\}

lives in a K (

K \geq 2

) dimensional probability simplex with constraints

\sum_{i = 1}^{K} x_{i} = 1

, and

x_{i} > 0 for all i \in [1, K]

.

α = \{α_{1}, \dots, α_{K}\}

is called concentration parameters with constraint

α_{i} > 0 for all i \in [1, K]

. Assume we have a set of data

y = \{y_{1}, \dots, y_{T}\}

generated from Categorical distribution with parameters

p = \{p_{1}, \dots, p_{K}\}

,

\sum_{i = 1}^{K} p_{i} = 1

and

p_{i} > 0

. Then we can represent likelihood as:

p (y ∣ p) = \prod_{i = 1}^{K} p_{i}^{N_{i}}

where

N_{i} = \sum_{t = 1}^{T} δ (y_{t} = i)

, and

δ (\cdot)

is the Dirac Delta function. Since the parameter vector

p

also lives in a K dimensional probability simplex, and the Dirichlet distribution belongs to exponential family, so it becomes a natural choice of conjugate prior for parameter

p

. If we assume prior distribution for

p

as:

D i r (p ∣ α) = \frac{1}{B (α)} \prod_{i = 1}^{K} p_{i}^{α_{i} - 1}

Hence, the posterior is also Dirichlet:

\begin{matrix} p (p ∣ y) & \propto & D i r (p ∣ α) p (y ∣ p) \\ \propto & \prod_{k = 1}^{K} p_{i}^{N_{i}} p_{i}^{α_{i} - 1} \\ \propto & \prod_{k = 1}^{K} p_{i}^{α_{i} + N_{i} - 1} \\ = & D i r (p ∣ α_{1} + N_{1}, \dots, α_{K} + N_{K}) \end{matrix}

In other words, the posterior sufficient statistics

{\tilde{α}}_{i}

can be updated by simply adding the empirical counts

N_{i}

. It is easy to show the posterior mode (MAP estimate) is given by:

{\tilde{p}}_{i} = \frac{α_{i} + N_{i} - 1}{α_{0} + N - K}

where

α_{0} = \sum_{i = 1}^{K} α_{i}

,

N = \sum_{i = 1}^{K} N_{i}

. At the meanwhile, the posterior mean is given by:

{\hat{p}}_{i} = \frac{α_{i} + N_{i}}{α_{0} + N}

References

Barra, M. Barra Risk Model Handbook; MSCI, 2004.
Jolliffe, I.T. Principal component analysis, 2. ed., [nachdr.] ed.; Springer series in statistics; Springer: New York, 2004. [Google Scholar]
Hinton, G.; Dayan, P.; Revow, M. Modeling the manifolds of images of handwritten digits. IEEE Transactions on Neural Networks 1997, 8, 65–74. [Google Scholar] [CrossRef]
McLachlan, G.; Peel, D. Finite Mixture Models; Wiley, 2000. [CrossRef]
Rubin, D.B.; Thayer, D.T. EM Algorithms for ML Factor Analysis. Psychometrika 1982, 47, 69–76. [Google Scholar] [CrossRef]
Zoubin Ghahramani, G.E.H. Parameter Estimation for Linear Dynamical Systems. Technical Report CRG-TR-96-2, 1996. [Google Scholar]
Bai, J.; Wang, P. Identification and Bayesian Estimation of Dynamic Factor Models. Journal of Business Economic Statistics 2014, 33, 221–240. [Google Scholar] [CrossRef]
Kim, C.J.; Halbert, D.C.R. State-Space Models with Regime Switching: Classical and Gibbs-Sampling Approaches with Applications; The MIT Press, 1999. [CrossRef]
Geweke, J.; Zhou, G. Measuring the price of the Arbitrage Pricing Theory. The Review of Financial Stuidies, vol. 9, no.2, 1996. [Google Scholar]
Forni, Mario, D. G.M.L.; Reichlin, L. Opening the Black Box: Structure Factor Models with Large Cross Sections. Econometric Theory 25, no. 5 2009. [Google Scholar]
Doucet, A.; Johansen, A.M. A tutorial on particle filtering and smoothing: Fifteen years later. Handbook of nonlinear filtering 2009. [Google Scholar]
Arulampalam, M.; Maskell, S.; Gordon, N.; Clapp, T. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 2002, 50, 174–188. [Google Scholar] [CrossRef]
Liu, J.; West, M. , Combined Parameter and State Estimation in Simulation-Based Filtering. In Sequential Monte Carlo Methods in Practice; Springer New York, 2001; pp. 197–223. [CrossRef]
Djuric, P.M.; Khan, M.; Johnston, D.E. Particle Filtering of Stochastic Volatility Modeled With Leverage. IEEE Journal of Selected Topics in Signal Processing 2012, 6, 327–336. [Google Scholar] [CrossRef]
Carvalho, C.M.; Johannes, M.S.; Lopes, H.F.; Polson, N.G. Particle Learning and Smoothing. Statistical Science 2010, 25. [Google Scholar] [CrossRef]
Gordon, N.; Salmond, D.; Smith, A. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F Radar and Signal Processing 1993, 140, 107. [Google Scholar] [CrossRef]
Doucet, A.; Godsill, S.; Andrieu, C. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing 2000, 10, 197–208. [Google Scholar] [CrossRef]
Pitt, M.K.; Shephard, N. Filtering via Simulation: Auxiliary Particle Filters. Journal of the American Statistical Association 1999, 94, 590–599. [Google Scholar] [CrossRef]
Kantas, N.; Doucet, A.; Singh, S.S.; Maciejowski, J.; Chopin, N. On Particle Methods for Parameter Estimation in State-Space Models. Statistical Science 2015, 30. [Google Scholar] [CrossRef]
Schon, T.; Gustafsson, F.; Nordlund, P.J. Marginalized particle filters for mixed linear/nonlinear state-space models. IEEE Transactions on Signal Processing 2005, 53, 2279–2289. [Google Scholar] [CrossRef]
Storvik, G. Particle filters for state-space models with the presence of unknown static parameters. IEEE Transactions on Signal Processing 2002, 50, 281–289. [Google Scholar] [CrossRef]
Özkan, E.; Šmídl, V.; Saha, S.; Lundquist, C.; Gustafsson, F. Marginalized adaptive particle filtering for nonlinear models with unknown time-varying noise parameters. Automatica 2013, 49, 1566–1575. [Google Scholar] [CrossRef]
Lopes, H.F.; Tsay, R.S. Particle filters and Bayesian inference in financial econometrics. Journal of Forecasting 2010, 30, 168–209. [Google Scholar] [CrossRef]
CARTER, C.K.; KOHN, R. On Gibbs sampling for state space models. Biometrika 1994, 81, 541–553. [Google Scholar] [CrossRef]
Andrieu, C.; Doucet, A.; Holenstein, R. Particle Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society Series B: Statistical Methodology 2010, 72, 269–342. [Google Scholar] [CrossRef]
Del Moral, P. , Feynman-Kac Formulae. In Feynman-Kac Formulae; Springer New York, 2004; pp. 47–93. [CrossRef]
Urteaga, I.; Bugallo, M.F.; Djurić, P.M. Sequential Monte Carlo for inference of latent ARMA time-series with innovations correlated in time. EURASIP Journal on Advances in Signal Processing 2017, 2017. [Google Scholar] [CrossRef]
Carlin, B.P.; Polson, N.G. Inference for nonconjugate Bayesian Models using the Gibbs sampler. Canadian Journal of Statistics 1991, 19, 399–405. [Google Scholar] [CrossRef]
Avellaneda, M.; Lee, J.H. Statistical arbitrage in the US equities market. Quantitative Finance 2010, 10, 761–782. [Google Scholar] [CrossRef]
Khandani, A.E.; Lo, A.W. What Happened to the Quants in August 2007?: Evidence from Factors and Transactions Data. SSRN Electronic Journal 2008. [Google Scholar] [CrossRef]
Jasra, A.; Singh, S.S.; Martin, J.S.; McCoy, E. Filtering via approximate Bayesian computation. Statistics and Computing 2010, 22, 1223–1237. [Google Scholar] [CrossRef]
Schoutens, W. Lévy processes in finance, reprinted ed.; Wiley series in probability and statistics; Wiley: Chichester [u.a.], 2005. [Google Scholar]
Murphy, K.P. Machine Learning - A Probabilistic Perspective; Adaptive Computation and Machine Learning, MIT Press: Cambridge, 2014; Description based upon print version of record. [Google Scholar]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.

Particle Filtering Estimation of Regime Switching Factor Model and Its Application in Statistical Arbitrage Strategy

Abstract

Keywords:

Subject:

1. Introduction

2. State Space Model and Its Estimation

2.1. Kalman Filter and Smoother

2.2. Parameter Estimation using EM algorithm

2.3. Particle Filter

2.3.1. SIS Particle Filter

2.3.2. SIR Particle Filter

2.3.3. Auxiliary Particle Filter

2.4. Particle Filter with Parameter Learning

2.4.1. Particle Learning

3. Statistical Factor Analysis

3.1. MLE Factor Analsis

3.2. Mixture of Factor Analyzer

3.3. Regime Switching Dynamic Factor Model

3.3.1. Methods

3.3.2. Simulation Analysis

4. Results in Statistical Arbitrage Strategy

4.1. Market Neutral Strategy

4.2. Performance Comparison

5. Discussion

Appendix A. Probability Density Function of p(y_t ∣z_t−1)

Appendix B. Probability Density Function of p(x_t ∣x_t−1,θ _t−1, y_t

Appendix C. Conjugate Prior of Categorical Distribution

References

MDPI Initiatives

Important Links

Subscribe

Particle Filtering Estimation of Regime Switching Factor Model and Its Application in Statistical Arbitrage Strategy

Abstract

Keywords:

Subject:

1. Introduction

2. State Space Model and Its Estimation

2.1. Kalman Filter and Smoother

2.2. Parameter Estimation using EM algorithm

2.3. Particle Filter

2.3.1. SIS Particle Filter

2.3.2. SIR Particle Filter

2.3.3. Auxiliary Particle Filter

2.4. Particle Filter with Parameter Learning

2.4.1. Particle Learning

3. Statistical Factor Analysis

3.1. MLE Factor Analsis

3.2. Mixture of Factor Analyzer

3.3. Regime Switching Dynamic Factor Model

3.3.1. Methods

3.3.2. Simulation Analysis

4. Results in Statistical Arbitrage Strategy

4.1. Market Neutral Strategy

4.2. Performance Comparison

5. Discussion

Appendix A. Probability Density Function of p(yt ∣zt−1)

Appendix B. Probability Density Function of p(xt ∣xt−1,θ t−1, yt

Appendix C. Conjugate Prior of Categorical Distribution

References

MDPI Initiatives

Important Links

Subscribe

Appendix A. Probability Density Function of p(y_t ∣z_t−1)

Appendix B. Probability Density Function of p(x_t ∣x_t−1,θ _t−1, y_t