Moritz Schauer / Jul 28 2019
Remix of Julia by Nextjournal

Bayesian filter

Introduction

The North Greenland Ice Core Project obtained a core sample of ice from drilling through the arctic ice shield. The oldest ice in the sample formed ca ~120000 years ago. Melting the ice and measuring its chemical composition gives information about past climate on earth. A time series which was obtained this way is shown in Figure 1. In this case, the relative difference of the ratio of occurrence of the two oxygen isotopes and from a reference value (in per mille) was measured,

δ18O=(18O16Orr)×1000.\delta {^{18}O}= \left({\frac{{\dfrac {{ {^{18}O}}}{{ {^{16}O}}} }- r}{r}}\right)\times 1000.

In paleoscience, records from ice cores are used learn about past temperature on earth. Here time is given as years before present time with the year 2000 defined as present time (so years before the year 2000).

Fig. 1: Raw time series.

These measurements are distorted by noise. Scientists are interested in a quantity (the true values) which changes over time, written as . They have produced a sequence of measurements

Y1,,YnY_1, \dots, Y_n

and, as the measurement errors stem from a variety of unrelated causes, model the measurement errors ,

Yt=Xt+ϵt. Y_t = X_t + \epsilon_t.

as independent random variables with Gaussian distribution with variance for each .

Fig. 2: Recent measurements of the timeseries.

The true mechanism under which changes in time is unknown, but the researchers state that a priori is believed to follow a normal distribution and

Xt+1=Xt+ηt, X_{t+1} = X_t + \eta_t,

for , where are independent normally distributed with mean zero and variance for .

Bayesian filter

The filtered distribution of is the conditional distribution of at time given observations up to and including . It captures what we know about if we have seen the measurement points . For a random walk model with independent observation errors as above the filtered distribution of is a normal distribution and its mean and variance can be computed recursively using the filtering equations

pt2=σ2Kt p^2_t = \sigma^2 K_t
μt=μt1+Kt(ytμt1)\mu_t = \mu_{t-1} + K_t(y_t - \mu_{t-1})

with $ the so called Kalman gain

Kt=s2+pt12s2+pt12+σ2,t>1 K_t = \frac{s^2 + p^2_{t-1}}{s^2 + p^2_{t-1} + \sigma^2}, \quad t >1

and . The mean of the filtered distribution, , serves as estimate of the unknown location of and the standard deviation $p_t$ can be seen as measure of uncertainty about the location.

Simulation

Apply the filter to simulated data to illustrate how the filter works. Simulate data from the model

Xt+1=Xt+ηt,X1N(μ0,p02)X_{t+1} = X_t + \eta_t, \quad X_1 \sim N(\mu_0,p^2_0)
Yt=Xt+ϵtY_t = X_t + \epsilon_t

Can you find values of , , and so that the simulated data visually somewhat resembles Figure 2? Use the Bayesian filter to reconstruct the signal from the observations . What happens if you take values for and in the filter which do not match the true values which have been used in the simulation?

Data analysis

Download the ice core isotope time series (see Figure 1) from https://www.dropbox.com/s/yvwgz71zgvkj7sg/icecoredata.csv .

icecoredata.csv
using DelimitedFiles

data, header = readdlm(
icecoredata.csv
, ',', header = true) display(header) [header; data]
10750×2 Array{Any,2}: "years BP 2000" " delta18O" -122280.0 -32.5309 -122270.0 -32.7474 -122260.0 -32.4231 -122250.0 -32.5502 -122240.0 -32.5729 -122230.0 -32.4632 -122220.0 -32.7979 -122210.0 -32.6084 -122200.0 -32.6188 ⋮ -14880.0 -37.9474 -14870.0 -42.0627 -14860.0 -41.6066 -14850.0 -41.1351 -14840.0 -40.411 -14830.0 -42.7777 -14820.0 -41.9283 -14810.0 -40.5719 -14800.0 -40.1437
using Plots
t = data[:, 1]
y = data[:, 2]
plot(t, y)

The second column of the CSV file contains the measurements (these are averaged measurements for ten year intervals from before present to before present.