Moritz Schauer / Sep 06 2018

MicrostructureNoise

Shota Gugushvili, Moritz Schauer

1.
Introduction

(Available as Jupyter notebook at https://github.com/mschauer/MicrostructureNoise.jl/blob/master/example/MicrostructureNoise.ipynb)

MicrostructureNoise is a Julia package for Bayesian volatility estimation in presence of market microstructure noise implementing the methodology described in our new preprint:

  • Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility learning under microstructure noise. arxiv:1805.05606, 2018.

This blogpost gives a short introduction.

1.1.
Description

MicrostructureNoise estimates the volatility function ss of the stochastic differential equation

dXt=b(t,Xt)dt+s(t)dWt,X0=x0,t[0,T]dX_t = b(t,X_t) dt + s(t) dW_t, \quad X_0 = x_0, \quad t \in [0,T]

from noisy observations of its solution

Yi=X(ti)+Vi,0<t1<<tn=TY_i = X(t_i) + V_i, \quad 0 < t_1 < \ldots < t_n = T

where Vi{V_i} denote unobservable stochastic disturbances. The method is minimalistic in its assumptions on the volatility function, which in particular can itself be a stochastic process.

The estimation methodology is intuitive to understand, given that its ingredients are well-known statistical techniques. The posterior inference is performed via the Gibbs sampler, with the Forward Filtering Backward Simulation algorithm used to reconstruct unobservable states X(ti)X(t_i). This relies on the Kalman filter. The unknown squared volatility function is a priori modelled as piecewise constant and is assigned the inverse Gamma Markov chain prior, which induces smoothing among adjacent pieces of the function. Next to yielding point estimates (e.g. the posterior mean), the method conducts automatic uncertainty quantification via marginal credible bands.

1.2.
Setup

Install MicrostructureNoise via the package manager.

using Pkg
Pkg.add.(["MicrostructureNoise", "Distributions", ]);

ENV["PYTHON"] = ""
Pkg.pkg"add Plots#master PyPlot#master PyCall#master LaTeXStrings#master Conda#master"
Pkg.pkg"precompile"

Then load the packages below.

using MicrostructureNoise, Distributions, Plots
using Random, Dates, LinearAlgebra, DelimitedFiles
pyplot()
Random.seed!(12);

2.
Real data example

As a first example, we apply our methodology to infer volatility of the high frequency foreign exchange rate data made available by Pepperstone Limited, the London based forex broker (https://pepperstone.com/uk/client-resources/historical-tick-data). Specifically, we use the EUR/USD tick data (bid prices) for 2 March 2015. We retrieve, log-transform and subsample the data and express time in milliseconds.

# uncomment if you do not mind to create this large file 
# Base.download("https://www.truefx.com/dev/data//2015/MARCH-2015/EURUSD-2015-03.zip","./data/EURUSD-2015-03.zip")
# run(`unzip ./data/EURUSD-2015-03.zip -d ./data`)
dat = readdlm("../data/EURUSD-2015-03.csv", ',')
times = map(a -> DateTime(a, "yyyymmdd H:M:S.s"), dat[1:10:130260,2])
t = Float64[1/1000*(times[i].instant.periods.value-times[1].instant.periods.value) for i in 1:length(times)]
n = length(t)-1
T = t[end]
y = log.(Float64.(dat[1:10:130260, 3]));
plot(t[1:10:end], y[1:10:end], label="EUR/USD")

The prior specification is done via the Prior struct.

prior = MicrostructureNoise.Prior(
N = 40,

α1 = 0.0,
β1 = 0.0,

αη = 0.01, 
βη = 0.01,

Πα = LogNormal(1., 0.5),
μ0 = 0.0,
C0 = 5.0
);

Most importantly, we set the number of bins N. A second import choice is the hyper-prior of the smoothing parameter Πα. The technical meaning of the other parameter is explained in our paper.

We now sample the posterior using MCMC.

α = 0.3 # Initial smoothness hyperparameter guess
σα = 0.1 # Random walk step size for smoothness hyperparameter

td, θs, ηs, αs, p = MicrostructureNoise.MCMC(prior, t, y, α, σα, 1500, printiter=500);

The inline help gives a synopsis of the function MCMC (and other functions).

?MicrostructureNoise.MCMC
text/markdown
posterior = MicrostructureNoise.posterior_volatility(td, θs)

tt, mm = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_median[:])

plot(tt, mm, label="posterior median", color="dark blue")
plot!(MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qlow[:])...,
    fillrange = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qup[:])[2], 
    fillalpha = 0.2,
    alpha = 0.0,
    fillcolor="blue",
    title="Volatility estimate", label="marginal $(round(Int,posterior.qu*100))% credibility band")

A histogram of sqrt.(ηs) visualises the posterior of the observation error η, indicating that there is indeed a non-negligible microstructure noise.

histogram(sqrt.(ηs[end÷2:end]), nbins=20, label="eta")

3.
How good does it work: Test with the Heston model

In order to test our model, we simulate a trajectory of an asset following the Heston model (https://en.wikipedia.org/wiki/Heston_model).

The Heston model is a widely used stochastic volatility model where it is assumed that the price process of a certain asset, denoted by SS, evolves over time according to the SDE

dSt=μStdt+ZtStdWt,d S_t = \mu S_t d t + \sqrt{Z_t } S_t d W_t,

where the process ZZ follows the CIR or square root process,

dZt=κ(θZt)dt+σZtdBt.d Z_t = \kappa ( \theta - Z_t ) d t + \sigma \sqrt{Z_t} d B_t.

Here WW and BB are correlated Wiener processes with correlation ρ\rho.

For this generated asset price trajectory we can compare our volatility estimate from noisy data with the true volatility known from the simulation.

3.1.
Generate data from the Heston model

To define the model in Julia and to simulate the trajectory use the package Bridge. Install it via the package manager.

Pkg.add("Bridge")

Usually we would not perform inference with SS, but with its logarithm Xt=logStX_t=\log S_t. However, we simulate SS directly.

using Bridge
using StaticArrays
const R2 = SVector{2,Float64};
struct Heston <: ContinuousTimeProcess{R2}
    mu::Float64
    kappa::Float64
    theta::Float64
    sigma::Float64
end
Heston(;mu = NaN, kappa = NaN, theta = NaN, sigma = NaN) = Heston(mu, kappa, theta, sigma)

Define the drift bb and the dispersion coefficient σ\sigma of the process.

Bridge.b(s, x, P::Heston) = R2(P.mu*x[1], P.kappa*(P.theta - x[2]))
Bridge.σ(s, x, P::Heston) = Diagonal(R2(sqrt(x[2])*x[1], P.sigma*sqrt(x[2])))

We choose the following setting, where the parameter values match what one may expect with real data.

T = 1.0
n = 4000
P = Heston(mu = 0.02, kappa = 7.0, theta = 0.04, sigma = 0.6)

The process is first simulated on a fine grid and next subsampled at n=4000n=4000 random time points. Here we define these grids. Random sampling creates a non-uniform grid, which is what one typically sees with real financial data, where observations are not equispaced in time.

nf = 10 # generate the process on a finer grid than it is observed

tfine0 = T .* sort(rand(n*nf-1))
tfine = [0.0; tfine0; T]

is = [i <= n-1 for i in 1:length(tfine)-2]
is = [true; is[randperm(length(is))]; true]
t = tfine[is];
    

Now we generate a sample path with the Euler scheme from correlated Brownian motions, and log\log-transform and subsample the data. By Itô's formula we know the true volatility of the log-transformed process SS is in fact the process ZZ.

u = R2(1., P.theta)
W = sample(tfine, Wiener{R2}())
ρ = -0.6
map!(v->R2(v[1], ρ*v[1] + sqrt(1-ρ^2)v[2]), W.yy, W.yy) # correlate Brownian motions

Xfine = solve(EulerMaruyama(), u, W, P)

xtrue = log.(first.(Xfine.yy[is]))
s0(t) = sqrt(Xfine.yy[searchsortedfirst(Xfine.tt, t)][2])
y = copy(xtrue) 

η0 = 0.005^2
y .+= randn(n + 1) * sqrt(η0);

3.2.
Perform inference using MicrostructureNoise.

This proceeds along the same lines as before.

α = 5.0 # Initial smoothness hyperparameter guess
σα = 3.0 # Random walk step size for smoothness hyperparameter

td, θs, ηs, αs, p = MicrostructureNoise.MCMC(prior, t, y, α, σα, 10000, printiter=500);

println("acceptance probability $p")
posterior = MicrostructureNoise.posterior_volatility(td, θs)

tt, mm = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_median[:])

p = plot(tt, mm, label="posterior median", linewidth=0.5, color="dark blue")
plot!(t[1:10:end], (s0.(t[1:10:end])).^2, label="true volatility", color="red")


plot!(MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qlow[:])...,
    fillrange = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qup[:])[2], 
    fillalpha = 0.2,
    alpha = 0.0,
    fillcolor="darkblue",
    title="Volatility estimate", label="marginal $(round(Int,posterior.qu*100))% credibility band")

display(p)

The greyish band mostly covers the red curve, the true volatility. Also the posterior median (blue line) globally follows the movement of the true volatility.

This is nice: Without any knowledge of the model, completely ignoring the drift, we still get a nice estimate and reasonable uncertainty quantification of the underlying volatility, and that having only indirect and noisy observations at our hand.

A histogram confirms that also the estimate of the observational noise is close to its true value.

histogram(sqrt.(ηs[end÷2:end]), nbins=20, label="eta")

4.
References

  • Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility estimation. arxiv:1801.09956, 2018.

  • Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility learning under microstructure noise. arxiv:1805.05606, 2018.