# MicrostructureNoise

## 1. Introduction

(Available as Jupyter notebook at https://github.com/mschauer/MicrostructureNoise.jl/blob/master/example/MicrostructureNoise.ipynb)

`MicrostructureNoise`

is a Julia package for Bayesian volatility estimation in presence of market microstructure noise implementing the methodology described in our new preprint:

- Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility learning under microstructure noise. arxiv:1805.05606, 2018.

This blogpost gives a short introduction.

### 1.1. Description

MicrostructureNoise estimates the volatility function $s$ of the stochastic differential equation

from noisy observations of its solution

where ${V_i}$ denote unobservable stochastic disturbances. The method is minimalistic in its assumptions on the volatility function, which in particular can itself be a stochastic process.

The estimation methodology is intuitive to understand, given that its ingredients are well-known statistical techniques. The posterior inference is performed via the Gibbs sampler, with the Forward Filtering Backward Simulation algorithm used to reconstruct unobservable states $X(t_i)$. This relies on the Kalman filter. The unknown squared volatility function is a priori modelled as piecewise constant and is assigned the inverse Gamma Markov chain prior, which induces smoothing among adjacent pieces of the function. Next to yielding point estimates (e.g. the posterior mean), the method conducts automatic uncertainty quantification via marginal credible bands.

### 1.2. Setup

Install `MicrostructureNoise`

via the package manager.

using Pkg Pkg.add.(["MicrostructureNoise", "Distributions", ]); ENV["PYTHON"] = "" Pkg.pkg"add Plots#master PyPlot#master PyCall#master LaTeXStrings#master Conda#master" Pkg.pkg"precompile"

Then load the packages below.

using MicrostructureNoise, Distributions, Plots using Random, Dates, LinearAlgebra, DelimitedFiles pyplot() Random.seed!(12);

## 2. Real data example

As a first example, we apply our methodology to infer volatility of the high frequency foreign exchange rate data made available by Pepperstone Limited, the London based forex broker (https://pepperstone.com/uk/client-resources/historical-tick-data). Specifically, we use the EUR/USD tick data (bid prices) for 2 March 2015. We retrieve, log-transform and subsample the data and express time in milliseconds.

# uncomment if you do not mind to create this large file # Base.download("https://www.truefx.com/dev/data//2015/MARCH-2015/EURUSD-2015-03.zip","./data/EURUSD-2015-03.zip") # run(`unzip ./data/EURUSD-2015-03.zip -d ./data`) dat = readdlm("../data/EURUSD-2015-03.csv", ',') times = map(a -> DateTime(a, "yyyymmdd H:M:S.s"), dat[1:10:130260,2]) t = Float64[1/1000*(times[i].instant.periods.value-times[1].instant.periods.value) for i in 1:length(times)] n = length(t)-1 T = t[end] y = log.(Float64.(dat[1:10:130260, 3]));

plot(t[1:10:end], y[1:10:end], label="EUR/USD")

The prior specification is done via the `Prior`

struct.

prior = MicrostructureNoise.Prior( N = 40, α1 = 0.0, β1 = 0.0, αη = 0.01, βη = 0.01, Πα = LogNormal(1., 0.5), μ0 = 0.0, C0 = 5.0 );

Most importantly, we set the number of bins `N`

. A second import choice is the hyper-prior of the smoothing parameter `Πα`

. The technical meaning of the other parameter is explained in our paper.

We now sample the posterior using MCMC.

α = 0.3 # Initial smoothness hyperparameter guess σα = 0.1 # Random walk step size for smoothness hyperparameter td, θs, ηs, αs, p = MicrostructureNoise.MCMC(prior, t, y, α, σα, 1500, printiter=500);

The inline help gives a synopsis of the function `MCMC`

(and other functions).

posterior = MicrostructureNoise.posterior_volatility(td, θs) tt, mm = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_median[:]) plot(tt, mm, label="posterior median", color="dark blue") plot!(MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qlow[:])..., fillrange = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qup[:])[2], fillalpha = 0.2, alpha = 0.0, fillcolor="blue", title="Volatility estimate", label="marginal $(round(Int,posterior.qu*100))% credibility band")

A histogram of `sqrt.(ηs)`

visualises the posterior of the observation error `η`

, indicating that there is indeed a non-negligible microstructure noise.

histogram(sqrt.(ηs[end÷2:end]), nbins=20, label="eta")

## 3. How good does it work: Test with the Heston model

In order to test our model, we simulate a trajectory of an asset following the Heston model (https://en.wikipedia.org/wiki/Heston_model).

The Heston model is a widely used stochastic volatility model where it is assumed that the price process of a certain asset, denoted by $S$, evolves over time according to the SDE

where the process $Z$ follows the CIR or square root process,

Here $W$ and $B$ are correlated Wiener processes with correlation $\rho$.

For this generated asset price trajectory we can compare our volatility estimate from noisy data with the true volatility known from the simulation.

### 3.1. Generate data from the Heston model

To define the model in Julia and to simulate the trajectory use the package `Bridge`

. Install it via the package manager.

Pkg.add("Bridge")

Usually we would not perform inference with $S$, but with its logarithm $X_t=\log S_t$. However, we simulate $S$ directly.

using Bridge using StaticArrays const R2 = SVector{2,Float64};

struct Heston <: ContinuousTimeProcess{R2} mu::Float64 kappa::Float64 theta::Float64 sigma::Float64 end Heston(;mu = NaN, kappa = NaN, theta = NaN, sigma = NaN) = Heston(mu, kappa, theta, sigma)

Define the drift $b$ and the dispersion coefficient $\sigma$ of the process.

Bridge.b(s, x, P::Heston) = R2(P.mu*x[1], P.kappa*(P.theta - x[2])) Bridge.σ(s, x, P::Heston) = Diagonal(R2(sqrt(x[2])*x[1], P.sigma*sqrt(x[2])))

We choose the following setting, where the parameter values match what one may expect with real data.

T = 1.0 n = 4000 P = Heston(mu = 0.02, kappa = 7.0, theta = 0.04, sigma = 0.6)

The process is first simulated on a fine grid and next subsampled at $n=4000$ random time points. Here we define these grids. Random sampling creates a non-uniform grid, which is what one typically sees with real financial data, where observations are not equispaced in time.

nf = 10 # generate the process on a finer grid than it is observed tfine0 = T .* sort(rand(n*nf-1)) tfine = [0.0; tfine0; T] is = [i <= n-1 for i in 1:length(tfine)-2] is = [true; is[randperm(length(is))]; true] t = tfine[is];

Now we generate a sample path with the Euler scheme from correlated Brownian motions, and $\log$-transform and subsample the data. By Itô's formula we know the true volatility of the log-transformed process $S$ is in fact the process $Z$.

u = R2(1., P.theta) W = sample(tfine, Wiener{R2}()) ρ = -0.6 map!(v->R2(v[1], ρ*v[1] + sqrt(1-ρ^2)v[2]), W.yy, W.yy) # correlate Brownian motions Xfine = solve(EulerMaruyama(), u, W, P) xtrue = log.(first.(Xfine.yy[is])) s0(t) = sqrt(Xfine.yy[searchsortedfirst(Xfine.tt, t)][2]) y = copy(xtrue) η0 = 0.005^2 y .+= randn(n + 1) * sqrt(η0);

### 3.2. Perform inference using `MicrostructureNoise`

.

`MicrostructureNoise`

.This proceeds along the same lines as before.

α = 5.0 # Initial smoothness hyperparameter guess σα = 3.0 # Random walk step size for smoothness hyperparameter td, θs, ηs, αs, p = MicrostructureNoise.MCMC(prior, t, y, α, σα, 10000, printiter=500); println("acceptance probability $p")

posterior = MicrostructureNoise.posterior_volatility(td, θs) tt, mm = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_median[:]) p = plot(tt, mm, label="posterior median", linewidth=0.5, color="dark blue") plot!(t[1:10:end], (s0.(t[1:10:end])).^2, label="true volatility", color="red") plot!(MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qlow[:])..., fillrange = MicrostructureNoise.piecewise(posterior.post_t, posterior.post_qup[:])[2], fillalpha = 0.2, alpha = 0.0, fillcolor="darkblue", title="Volatility estimate", label="marginal $(round(Int,posterior.qu*100))% credibility band") display(p)

The greyish band mostly covers the red curve, the true volatility. Also the posterior median (blue line) globally follows the movement of the true volatility.

This is nice: Without any knowledge of the model, completely ignoring the drift, we still get a nice estimate and reasonable uncertainty quantification of the underlying volatility, and that having only indirect and noisy observations at our hand.

A histogram confirms that also the estimate of the observational noise is close to its true value.

histogram(sqrt.(ηs[end÷2:end]), nbins=20, label="eta")

## 4. References

Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility estimation. arxiv:1801.09956, 2018.

Shota Gugushvili, Frank van der Meulen, Moritz Schauer, and Peter Spreij: Nonparametric Bayesian volatility learning under microstructure noise. arxiv:1805.05606, 2018.