Bayesian estimation of the fractional Ornstein-Uhlenbeck instantaneous rate of asset return process: Evidence from high-frequency stock price data

Using recent developments in econometrics and computational statistics we consider the estimation of the instantaneous rate of asset return process when the underlying Data Generating Mechanism (DGM) is an Ornstein-Uhlenbeck process, driven by fractional noise, and sampled at fixed intervals of length h. To address the problem we adopt throughout the paper an exact discretization approach. This enable us to exploit the fact that a flow sampling scheme arises naturally when observing the DGM. For, while the instantaneous rate of return process is unobservable at points in time, its time integral over successive observations is observable since it equals the increment of log-prices. Exact discretization delivers an ARIMA(1,1,1) model for log-prices with a fractional driving noise. Building on the resulting exact discretization formulae and covariance function, a new Markov Chain Monte Carlo (MCMC) scheme is proposed and we examine the properties of both the time and frequency domain likelihoods / posteriors through Monte Carlo. For the exact discrete model we adopt a general sampling interval of length h. This allow us to determine the optimal choice of h independent of the sample size. An empirical application using high frequency stock price data is presented showing the relevance of aggregation over time issues in modelling asset prices.


Introduction
Continuous time models are gaining wide acceptance in econometrics and finance in view of data availability at very high frequencies.Modern finance theory rely heavily on continuous time analysis, see for example the influential book by Merton (1990).However, modelling in discrete time may induce temporal aggregation problems that occur when observing a flow variable or the timing of observations is different from that of the underlying DGM.These issues are largely ignored in applied work, however proper accounting for them is important to ensure the correct specification of asset price dynamics.The seminal article of Bergstrom (1983) and subsequent work provided a rigorous treatment of the estimation problem by adopting an exact discretization procedure and deriving the Exact Discrete Model (EDM) for a range of continuous time models.In this paper we consider the problem of estimating the instantaneous rate of asset return process when sampled at fixed time intervals of length h using exact discretization techniques.Note that we allow for a general, not necessary unit-spacing, sampling interval h.Zhu and Taqqu (2005) noted, in the case of a fBm, that although the rate of convergence of the variance parameter is N , where T N h = is the number of observations and T is the time span, irrespective of the sampling interval the value of optimal spacing depends from the Hurst exponent H. Using Bayesian technics, we are able to determine an optimal h that in general will depend from the values of the parameters and the sample size.We assume that between successive observations the, unobservable, asset return process obeys an Ornstein-Uhlenbeck (OU) process driven by a fractional noise.A flow sampling scheme allow us to express the integral of the realization of an asset return process over successive observations as the increment of log-prices.Thus, we are able to show that for a fixed sampling interval of length h, logprices are generated according to a fractional type ARIMA (1,1,1) model.The source of the particular ARIMA structure is traced back to the temporal aggregation properties of the flow sampling scheme.Furthermore, the derived exact discrete model does not contradict a benchmark EMH since in the stationary region, 0 a < , log prices possess a unit root exhibiting a stochastic trending behaviour.
Using high frequency stock price data from Athens Stock Exchange (ASE) we evaluate the importance of the aggregation over time effect on asset prices.Lo and Wang (1995) as well as Perron and Vodounou (2004, p. 207) noted also, in continuous time, the ARIMA structure of log prices.However, in their work the ARIMA structure is originated in the reduced form version of the stationary AR(1) plus random walk model.In discrete time, Lo and MacKinlay (1999, pp. 79-81) noted the empirical relevance of the ARMA(1,1) model for small to medium size log-price increments.Ait-Sahalia, Mykland and Zhang (2005) also noted the MA(1) structure of innovations in the presence of microstructure noise, and Griffin and Oomen (2008) the ARMA(1,1) dependence structure of high frequency returns in tick time and transaction time.We model persistence, through the Hurst parameter H, by introducing a correlated fractional driving noise in the OU process.Fractional noise is a flexible tool for modelling long term persistence or antipersistence in asset price time series.The OU process is mean reverting and its consequence for portfolio and consumption decisions has been considered in Wachter (2002) while empirical evidence and tests of mean reverting stock returns are stated in Lo and MacKinlay (1999) section 2. Long range dependence and anti-persistence are empirically relevant in high frequency data coming from financial markets where microstructure noise come into play.Moreover, it has important implications both in derivative pricing as well as in testing for possible arbitrage opportunities and departures from the Efficient Market Hypothesis (EMH).The estimation method, in this work, is based on closed-form solutions for the covariance matrix of the instantaneous rate of asset return process.Then, assuming that the instantaneous rate of return is a Gaussian fractional process the likelihood can be formulated easily and formal statistical inferences via an efficient MCMC scheme are possible.We use recently developed differential geometry samplers to cope with the likelihood / posterior and provide Monte Carlo results as well as an application to real-world data from the ASE.We examine both the time-domain and spectral-domain likelihoods / posteriors and compare their relative performance.Using proper numerical techniques we find that the likelihood / posterior is well behaved, autocorrelation of the MCMC samplers is mitigated to a great extent and inferences can be obtained in a fraction of time relative to more conventional MCMC schemes.In the Bayesian literature there have been many studies which implement Markov Chain Monte Carlo (MCMC) sampling schemes to draw from the posterior of the model.Most studies focus on an Euler discretization and treat the missing observations on the continuous path as latent variables.This overhead is quite important because when the time interval of observations is longer relative to the actual spacing of observations, the latent variables are highly autocorrelated and the performance of MCMC schemes can deteriorate very quickly.The difficulty is that the likelihood function is not available in closed form for most models, and so one has to resort to various approximations.Pedersen (1995) and Santa-Clara (1995) (see also Brandt and Santa-Clara, 2002) proposed a simulation-based approach, which is based on integrating out unobserved states of the process between pairs of observations.See also Gallant (2002).Ait-Sahalia (1999, 2002) proposed a method to approximate the transition density in a one-dimensional diffusion setting.The MCMC methods have been used by Eraker (2001), Jones (1999), Elerian et al. (2001), and Kim et al. (1998).Typically, MCMC methods simulate sample paths across unobserved intermediate points between two observations.When the amount of augmentation is large MCMC methods that rely on data augmentation can break down.Papaspiliopoulos et al. (2003) find that high correlation between unknown parameters in a diffusion coefficient and the missing data results in arbitrarily slow rates of convergence of sampling schemes such as the Gibbs sampler proposed in Eraker (2001) and Golightly and Wilkinson (2005).See also Golightly and Wilkinson (2008).Some attempts have been made to rectify this problem; Roberts and Stramer (2001) transform the stochastic differential equation to obtain constant diffusion coefficients so that the dependence problem is overcome.Beskos et al. (2006) propose alternative Monte Carlo methods that rely on noncentered parametrizations of the process.In this paper we show that although the density cannot be obtained in closed form the likelihood function can be obtained analytically for the class of continuous time autoregressive models with a fractional parameter.Since we rely on analytical results computation of the likelihood function is trivial and efficient Monte Carlo methods can be used to evaluate it.In addition, although the simple Metropolis-Hastings algorithm can be used, we implement more efficient alternatives and show in artificial and real world data that they perform well.The analytical methods of the paper in conjunction with Monte Carlo methods of inference can be used in a variety of contexts involving stochastic differential equations and can be thought of as extremely efficient alternatives to existing methods.Tsai and Chan (2005 a, b) consider maximum likelihood estimation of continuous time ARMA (p, q) processes driven by fractional Brownian motion while Tsai and Chan (2005 c) obtain the autocorrelation properties of aggregates from a CARFIMA model.However, it should be noticed, that unlike Tsai and Chan (2005 a, b and c) we base our analysis and empirical application on the derived EDM.The EDM is particularly convenient in the current framework, since, in addition to using it in the estimation procedures, forms the basis for the Monte Carlo investigation of the estimates.The paper is organized as follows.Section 2 sets the continuous time DGM and its assumptions.The exact discrete model, satisfying the underlying continuous time process, along with its second-order properties are stated in section 3. The likelihood functions, in time and spectral domain, are provided in section 4. The algorithms implementing Bayesian inference are described in section 5.An empirical implementation using high frequency data is discussed in section 6. Section 7 concludes.The derivation of the exact discretization formulae is provided in the Appendix.

The data generating mechanism
To fix notation in what follows, let the instantaneous rate of return be defined by ( ) , where ( ) P t is the price of an asset or portfolio at time t.In the sequel, the DGM is formulated assuming that log prices are influenced by the ongoing effects of the continuous time unobserved process ( ) r t .Therefore, although log prices are observed at points in time, as stock variables, their movement between successive observations are unobserved.Using an exact discretization procedure, we are able to express log-returns as an integral of the instantaneous rate of return ( ) r t , over the observational interval.This allows us to examine the implications and the empirical relevance of the fact that ( ) r t is observed as a flow variable (for the econometric implications of flow variables in macroeconomic modelling, see Bergstrom, 1984).In order to obtain the exact discrete model we shall make use of the following assumption: Assumption 1 The instantaneous rate of return is generated according to the Ornstein-Uhlenbeck type diffusion model under the fixed initial condition ( ) where the innovation ( ) Note that H is the Hurst parameter taking values on the interval (0,1).If ( ) 0 c H < and the spectral density tends to zero as 0 and the spectral density function tends to infinity as 0 ω → .Moreover, using a frequency domain approach, the fractional Gaussian noise possess the spectral density function: For a definition and properties of the fractional Gaussian noise see Samorodnitsky and Taqqu (1994).

The exact discrete model
Next, we state the exact discrete model along with its second-order properties.Following the discussion in the Introduction we assume that the investigator observes asset prices at fixed equally spaced points in time h, 2h, 3h, … over a fixed time interval [ ] 0, T .

Theorem.
Define the observed sample , where h is the fixed observational interval.Then under Assumption 1 th x satisfies the exact discrete model: where ( ) x as j j j K x e ds a x j the auto-covariance ( ) ( ) and spectral density functions of the error term are given, respectively, by where ( ) and The derived exact discrete model, stated in the Theorem, is particularly suitable for observable variables that are sampled at different rates since, unlike more conventional discrete-time models, it retains its dynamic structure irrespective of the sampling frequency.

Gaussian likelihood -time and frequency domain methods
To construct the likelihood function conditional on the initial value 0 0 x = , we first express the exact discrete model for t=1,...,T/h in system form as [ ]

. . . '
Next, define the T×T covariance matrix of the innovation vector [ ] and the parameter vector θ={a,c,H,σ²}.Then, apart from a constant, minus twice the logarithm of the Gaussian likelihood function is expressed by ( ) where the matrix is a function of the parameter vector θ , and ( ) − are calculated using by (10) using equations.( 3) and ( 8)-( 9).Let the Cholesky factorization be

MM ′ Ω =
, where M is a lower triangular matrix with positive elements ii m along the diagonal.Then the likelihood function takes the form ( ) where ( ) h ω is the spectral density of the error term th κ .The Whittle likelihood function is given by Where 1 ah e vh μ − = − .

MCMC samples from the posterior
The likelihood function is

( )
; θ L Y and ( ) p θ denotes the prior.Our prior has the standard reference form: . For the likelihood function we use both the time-domain and spectraldomain expressions and compare the results both in terms of parameter estimates, posterior distributions as well as convergence through the autocorrelation functions produced by the MCMC schemes.
The most significant advantage of our numerical scheme is that it reduces the difficult problem of Bayesian inference to sampling from a tri-variate posterior1 .To our knowledge, previous treatments of the problem typically rely on sampling from high-dimensional posteriors by treating the intermediate steps of the diffusion process as latent variables.Although Gibbs sampling can be used in these cases to facilitate the computations, these approaches are always prone to convergence problems associated with high autocorrelation induced when sampling the latent states in one way or another.Although a standard Metropolis-Hastings algorithm can be used, due to the high nonlinearity of the posterior we use the method of Girolami and Calderhead (2011) to perform Bayesian inference.As the authors write "[t]he proposed methodology exploits the Riemannian geometry of the parameter space of statistical models and thus automatically adapts to the local structure when simulating paths across this manifold providing highly efficient convergence and exploration of the target density".Given a kernel posterior distribution the method of Girolami and Calderhead (2011) aims at obtaining draws by exploiting the gradient and the Hessian.Roughly speaking, parameters are updated so that they visit the area of high posterior probability mass, not unlike standard Newton-Raphson numerical optimization techniques.Following Girolami and Calderhead (2011)  ε > is the integration step size.Since the discretization induces an unavoidable error in approximation of the posterior, a Metropolis step is used, where the proposal density is The discrete form of the above stochastic differential equations is: and the acceptance probability has the standard Metropolis form: ( ) Numerical singularity of covariance matrices typically occurs in the time-domain likelihood.In that case, we always used a "sweep inverse" and the results have been checked against Tykhonov regularization, see footnote 1.The spectral likelihood has been found much better behaved, which is probably responsible for the excellent behavior of the Girolami and Calderhead (2011) scheme in this case.Even with the time-domain likelihood, the Girolami and Calderhead (2011) MCMC scheme had no problems converging relatively fast and provide autocorrelations which were significantly lower compared to those of the Metropolis-Hastings update.Our implementation of the Metropolis-Hastings update relies on using a trivariate Student-t distribution with 5 degrees of freedom.The location and scale parameters were set to those obtained through ML estimation.The covariance matrix is scaled so that the algorithm provides an acceptance rate between 20% and 30%.The Girolami and Calderhead (2011) scheme was configured to produce acceptance rates in the same range by setting ε appropriately after experimentation.  1 reveal that the Whittle likelihood method provides better parameter estimates, in terms of MSE, for both the anti-persistent and long memory cases.

Unknown sampling interval
As long as h is not too close to zero (to avoid singularity of Ω) the sampling parameter h can be treated as unknown.The likelihood function is  ( ) ( ) , where ( ) . Other choices of the prior are ( ) ( ) ( ) and Jeffreys' prior ( ) ( ) Although the derivative can be computed numerically, the sampling expectation cannot.For this reason, approximation of Jeffreys' prior requires a simulation of the DGM for given , , a H covariance V which can be computed from the MCMC draws.Therefore, the log marginal likelihood can be computed approximately as: .
There is a related work in several papers that investigate the impact of the sampling rate on the properties of parameter estimates as well as optimality criteria of sampling schemes.See Zhu and Taqqu (2005) that examine the effects of sampling rate in the case of the fBm by utilizing the asymptotic covariance matrix of the estimates.Chambers (2004) considers the consistency properties of tests in unit root cases in flow data when the time span and/or frequency of observations increases.

Empirical results
The exact discretization (5) induces an ARIMA(1,1,1) model for log prices driven by fractional noise.Therefore, a negative a results in mean reversion in asset returns and a stochastic trending behavior for log prices, a well-known stylized feature of asset prices generated by competitive markets.When 0<H<0.5 the driving noise is anti-persistent meaning that it reverts too frequently, while for 0.5<H<1 it exhibits non-periodic long cycles.Econometric implications and empirical evidence of long memory/anti-persistence form a vast area in the relevant literature.Evidence for the fractal behavior of stock prices are provided for example in Lo and MacKinlay (1999, pp 147-184) and in Mulligan (2004).Moreover, Lillo and Farmer (2004) and Bouchaud, Kockelkoren & Potters (2006) investigate the fractal structure of market orders using high frequency data.In this work we compare three priors for the ETE (National Bank of Greece) and ATE (Agricultural Bank of Greece) price data, listed in the ASE, using M=1,000 simulations of the process.We implement the algorithms using two high frequency data sets.The first data set is obtained setting the sampling interval at 48 minutes and the second data set at a denser 5-minute interval.The data are from the period June 2009 to May 2010 (10:00 am to 17:00 pm) comprising 2,611 observations in the first case and 65,534 observations in the second case.The returns are plotted in Figure 1. Figure 1 2 The Jeffreys prior is approximated using 1,000 simulations of the data generating process.We opted for this choice after conducting experiments with 500, 1,000, and 5,000 simulations in a preliminary investigation for a wide range of parameter values.The parameter α, reported in Tables 2 and 3, is always negative but not significantly far from 0. In order to check the robustness of the estimates we report empirical results for different values of the parameter h in Table 2. Parameter H is concentrated around 0.3, in the anti-persistent region, for ETE and 0.5 for ATE.A Hurst exponent H=0.5 corresponds to an uncorrelated driving noise.The spectral domain posteriors seem to be more concentrated around the mean.The effect of changing h for parameters α and H are reported in Figure 3 for ETE and Figure 4 for ATE for the case of 48-min sampling data set.

Figure 3
Figure 4 It is apparent that for h=1/14 marginal posteriors become broadly more concentrated around the posterior mean, therefore this value is close to an optimal one for our model.This effect is stronger in the case of the adjustment parameter α compared to the long memory parameter H. Zhu and Taqqu (2005), from a theoretical standpoint, investigate the determination of an optimal sampling interval h and its dependence from the parameter values.The Monte Carlo results concerning the existence of an optimal sampling deserve further investigation.In Figure 5, we report autocorrelation functions corresponding to the MCMC draws for parameter H for the Girolami-Calderhead sampler (blue bars) and a simple Metropolis-Hastings sampler (red bars).
Figure 5 The behavior of the Girolami-Calderhead sampler is better by far as autocorrelations die out more rapidly and are practically zero after lag 30, which justified our choice to thin every other 50 th draw.
In Figure 6, we report marginal posteriors of α and H for the denser 5-min spacing data set of ETE and ATE from which it can be seen that the spectral domain posteriors behave much better at least for parameter H. Marginal posterior densities of the parameter h for the 48-min sampling data set are reported in Figure 7.

Figure 7
The marginal posteriors are obtained using the Jeffreys prior and the priors A and B that we described earlier and plotted are also the Jeffreys prior itself for comparison.The marginal posteriors are not very different showing that posterior results about the sampling interval parameter h are not excessively sensitive to the choice of the prior.The ETE returns in the time domain show some sensitivity, which is not, however, replicated by the spectral domain posterior, so it must be attributed to the different, perhaps inferior, behavior of the former.From the spectral domain marginal posteriors of h it turns out that the posterior mean is close to 0.07 and between 0.04 and 0.10 with high posterior probability mass.
In figures 8 and 9 we plot the log marginal likelihood for h, considered as a parameter, for the 48-min and 5-min data sets respectively.The results provide an explanation of Figure 3 , since an optimal h for the 48-min sampled data is close to 0.07 (≈1/14) while for the 5-min sampled data an optimal h is close to 0.03.In view of the actual sampling rate of the original data this is quite reasonable and leads us to believe that the new methods perform quite well.
In Table 4, we report posterior moments from MCMC using conventional discretization methods by treating the missing data as parameters to be estimated following the method of Eraker (2001); see also Elerian, Chib and Shephard (2001).The reported posterior standard deviations were computed using a Newey-West HAC covariance matrix with 10 lags.

Table 4
The autocorrelations even at lag 50 are still quite large, and relative numerical efficiency is quite low.Conventional MCMC was run to obtain one million draws which we used to obtain the results in Table 4.This indicates the vast improvement in performance of MCMC which can result from exact discretization and use of the time-domain likelihood function.Moreover, conventional MCMC is reasonably close to the results from time-domain or spectral-domain likelihoods showing that the new methods perform quite well.

Discussion
In this paper, we proposed a methodology for the parametric estimation of the instantaneous rate of asset returns process and assessed the empirical implementation using high frequency stock price data.The assumed data generating process is a mean reverting Ornstein-Uhlenbeck process with fractional noise.The exact discretization delivers an ARIMA(1,1,1) fractional model for log-prices and allows us to use closed-form formulae for the covariance matrix which facilitates greatly the formulation of a Gaussian likelihood / posterior in the time-domain or the frequency-domain.In addition, we derive the exact discrete model using a general non-unit sampling interval h.This permit us to optimize h along with the rest of the parameters.We have introduced a new MCMC scheme and have examined the properties of both time-domain and frequency-domain likelihoods / posteriors through a Monte Carlo application.The new techniques applied to two datasets from the Athens Stock Exchange.Relative to existing MCMC samplers, which use approximate discretization and treat the missing observations of the time path as latent variables, we found that our new t ε is a continuous time fractional Gaussian noise that possesses the auto-covariance function

.
from (12) it follows, after some simple algebraic, manipulations Thus from (11) we deduce the spectral density function of the discrete error term:

.
Our prior has the form: ( ) Marginal posterior densities of parameters α and H are provided in Figure2for both the time domain (straight line) and the spectral domain likelihood functions for ETE (left panel) and ATE (right panel).
Figure 8 auto-covariance function (10).Finally, using the spectral transfer function ( ) K ω , eq. (12), we deduce the continuous time spectral representation of the error term th

Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 4 July 2016 doi:10.20944/preprints201607.0004.v1
] , the spectral density of the exact discrete model is given by we utilize Metropolis adjusted Langevin and Hamiltonian Monte Carlo sampling methods defined on the Riemann manifold, since we are sampling from target densities with high dimensions that exhibit strong degrees of correlation.Consider the Langevin diffusion:

Table 1
reports total MSE for parameters , a H andσ estimated by Whittle and Gaussian likelihood methods on the basis of the exact discrete model stated in the Theorem.The results in Table