EnKF.jl : Tools for data assimilation with Ensemble Kalman filter

1. Introduction

EnKF.jl is a framework for data assimilation using ensemble Kalman filter in Julia

The objective of this package is to allow a easy and fast setup of data assimilation problems using the ensemble Kalman filter. This package provides tools for:

- constructing data structure for an ensemble of members

- applying covariance inflation (additive, multiplicative, multiplico-additive...) on these ensemble

- setting the data assimilation problem for linear/nonlinear system with linear/nonlinear measurements

1.1. References

[1]: Evensen, Geir. "The ensemble Kalman filter: Theoretical formulation and practical implementation." Ocean dynamics 53.4 (2003): 343-367.

[2]: Asch, Mark, Marc Bocquet, and Maëlle Nodet. Data assimilation: methods, algorithms, and applications. Vol. 11. SIAM, 2016.

1.2. Installation

This package works on Julia 1.0 and above and is registered in the general Julia registry. In Julia 1.0 REPL, enter the package manager by typing ] and then type add EnKF to install the package.

] add EnKF

EnKF.jl is now installed

Then, in any version, type

using EnKF

The plots in this documentation are generated using Plots.jl (http://docs.juliaplots.org/latest/). You might want to install that, too, to follow the examples.

2. Review of the Ensemble Kalman Filter

2.1. Algorithm

The ensemble Kalman filter is a Monte Carlo view of the Kalman filter applicable to nonlinear systems. Instead of propagating the exact mean and covariance of the Kalman filter. These latter are approximated using an ensemble of size of randomly initialized members. The state of the -th member at time is denoted . Let assume that we know the ensemble members at time step , we can then propagate the state of each member using the nonlinear dynamics:

wheredenotes the estimate - measurements are not yet assimilated - of the state for the -th ensemble member .

Using the different estimate of the ensemble member, we can approximate the a priori mean aand covariance as follows:

From the small ensemble theory, one can show that the \textit{a priori} covariance is better approximated by averaging by rather than the size of the ensemble. For the ensemble Kalman filter, the innovation term - namely the difference between the measurements and the estimated one from the true state - is defined as:

However, Burgers \cite{burgers1998analysis} show we can not use the same innovation term to correct each ensemble member. Otherwise, the random character of is lost and leads to spurious correlations in the ensemble covariance. Therefore they have shown that we need to randomly perturbed the innovation of each ensemble member :

where for each member is drawn from a Gaussian distribution with zero mean and covariance. For now, we assume that measurement are linearly related to the state through the measurement matrix . The issue of a nonlinear measurement function will be addressed in the next section through an implicit linearization introduced by Evensen\cite{evensen1994sequential},\cite{evensen2003ensemble}.The analysis stage is then applied to each member of the ensemble:

The Kalman gain is common for all ensemble members and defined for every time step in its usual form:

2.2. Need for covariance inflation

As shown by Van Leeuwen , small ensemble always underestimates error covariances. As we assimilate more and more measurements into the ensemble Kalman filter, the estimated covariance decreases. Therefore for a given measurement noise level, the weight associated to the measurement in the analysis step will decreases and ultimately converges towards zero. Then measurements are no longer assimilated into the filter, this is called the covariance collapse phenomenon. To counteract this covariance collapse, we can artificially inflate the ensemble covariance to give a bigger weight to the measurements, this method is called covariance inflation.

Fortunately, we can prove the boundedness of the EnKF even for large inflation, Kelly, et al.\cite{kelly2014well}. Whitaker and Hamill\cite{whitaker2012evaluating} have propose to combine a multiplicative and an additive inflation. In practice, we compute the sample mean , apply the covariance inflation procedure to the state (\ref{covariance_inflation}) and then compute the sample covariance. For each ensemble member, the state is transformed through:

where is the scalar multiplicative covariance inflation parameter, common with all the other members. is the additive covariance inflation drawn from a Gaussian distribution with zero mean for each ensemble member and time step. Anderson, et al.\cite{anderson1999monte} have shown that additive and multiplicative inflation account for distinct kind of errors. The multiplicative inflation is better to counteract effects of sampling errors due to the small size of the ensemble, while additive multiplication tends to correct better errors intrinsic to the model. The multiplicative inflation only required the tuning of an unique parameter. Adjusting additive inflation requires a deeper understanding of the dynamic of the system and is usually harder to tune.

3. Handling states in EnKF.jl

In EnKF, ensemble of members are stored as a mutable structure EnsembleState{N, TS} where N is the number of members and TS an arbitrary type for each member.

3.1. Setting your ensemble state

Let's see an example of creating a blank ensemble of 5 members of vector of length 10 and fill each member with a vector full of ones:

ens = EnsembleState(5, zeros(10))
fill!(ens, ones(10))
ens
EnsembleState{5,Array{Float64,1}}(Array{Float64,1}[[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]])

EnsembleState supports three different constructors:

- The canonical one induced by the definition of EnsembleState{N, TS} by providing an Array{TS,1}:

ens = EnsembleState{5,Array{Float64,1}}([rand(10) for i=1:5])
EnsembleState{5,Array{Float64,1}}(Array{Float64,1}[[0.338282, 0.842324, 0.752329, 0.905438, 0.381031, 0.270683, 0.458307, 0.848904, 0.644845, 0.683655], [0.298761, 0.472655, 0.407752, 0.125314, 0.259537, 0.450847, 0.236414, 0.0983727, 0.82089, 0.77053], [0.283015, 0.846918, 0.471368, 0.290231, 0.617439, 0.613369, 0.956138, 0.836803, 0.513247, 0.314717], [0.304298, 0.42234, 0.590646, 0.209724, 0.544144, 0.713348, 0.0168549, 0.214257, 0.00230124, 0.000561584], [0.547115, 0.647394, 0.915026, 0.445036, 0.123147, 0.0144256, 0.99731, 0.130615, 0.775471, 0.664906]])

- Using EnsembleState(N::Int, u), this one create an ensemble of N blankmembers where each member is set to zero(u):

ens = EnsembleState(5, ones(10))
EnsembleState{5,Array{Float64,1}}(Array{Float64,1}[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]])

- Using EnsembleState(States::Array{TS,1}), this one create an ensemble of length length(States)`and fill the i-th member with States[i]: any type of state is supported by EnKF even your own data structures!

struct MyArray
	B::Array{ComplexF64,1}
end

A = MyArray(ones(10)+im*ones(10)) 

ens = EnsembleState(A.B)
EnsembleState{10,Complex{Float64}}(Complex{Float64}[1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im, 1.0+1.0im])

3.2. Operations on EnsembleState

The goal of this subsection is to present all the actions defined for EnsembleState variables. All of them can be found in /src/state.jl

size : returns the number of ensemble members and the size of a member

size(ENS::EnsembleState{N, TS})  where {N, TS}

length : returns the number of ensemble members N of an EnsembleState variable

length(ENS::EnsembleState{N, TS})  where {N, TS} 

mean : compute the mean of an EnsembleState variable

"""
    Return the mean of all the ensemble members
"""
mean(ENS::EnsembleState{N, TS})  where {N, TS}

deviation : compute the deviation from its own mean of an EnsembleState variable

Different methods are available for this function

deviation(ENSfluc::EnsembleState{N, TS}, ENS::EnsembleState{N, TS}) where {N, TS}

deviation(tabfluc::Array{TS,2}, ENS::EnsembleState{N, TS}) where {N, TS}

and the in-place version deviation!

deviation!(ENS::EnsembleState{N, TS})  where {N, TS}

hcat : performs an horizontal concatenation of an EnsembleState variable (extended version for EnsembleState)

hcat(ENS::EnsembleState{N, TS})  where {N, TS}

cut : cut an array along the different columns and create an EnsembleState variable with these columns

cut(A::AbstractMatrix{TR}) where {TR}

fill! : create an EnsembleState variable filled with a vector

fill!(ENS::EnsembleState{N, TS}, A::TS)  where {N, TS}

We can also extend the addition + and subtraction - from the module Base to EnsembleState elements

function (+)(A::EnsembleState{N, TS}, B::EnsembleState{N, TS}) where {N, TS}
(-)(A::EnsembleState{N, TS}, B::EnsembleState{N, TS}) where {N, TS}

For example,

a = EnsembleState(5, zeros(10))
b = EnsembleState(5, zeros(10))

fill!(a, rand(10))
@show a
fill!(b, ones(10))
@show a + b
@show a - b;

As well as adding or subtracting a vector to each member of an EnsembleState variable

a - 2*ones(10)
EnsembleState{5,Array{Float64,1}}(Array{Float64,1}[[-1.53374, -1.90545, -1.71926, -1.18598, -1.74779, -1.91499, -1.26422, -1.70138, -1.45215, -1.38639], [-1.53374, -1.90545, -1.71926, -1.18598, -1.74779, -1.91499, -1.26422, -1.70138, -1.45215, -1.38639], [-1.53374, -1.90545, -1.71926, -1.18598, -1.74779, -1.91499, -1.26422, -1.70138, -1.45215, -1.38639], [-1.53374, -1.90545, -1.71926, -1.18598, -1.74779, -1.91499, -1.26422, -1.70138, -1.45215, -1.38639], [-1.53374, -1.90545, -1.71926, -1.18598, -1.74779, -1.91499, -1.26422, -1.70138, -1.45215, -1.38639]])

4. Inflation in EnKF.jl

Due to the covariance collapse (see section 2.2 for a reminder), the state vector need to be inflated.

EnKF.jl is equipped with several classes of inflation: identity inflation, additive inflation, multiplicative inflation, multiplico-additive inflation. Others inflations such as RPRS will be added soon.

Each type of inflation is defined as a subtype of the abstract type InflationType. For each type of inflation, we define an associated structure which contains its parameters and its action on an EnsembleState object.

For a subtype MyInflation which does, where is a real, we define:

"""
    MyInflation

An type to store MyInflation inflation :

Define MyInflation inflation: x <- β*x + β^3 with β a scalar

# Fields:
- 'β' : MyInflation inflation factor
"""
mutable struct MyInflation <: InflationType
	β :: Real
end

"""
Define action of MyInflation : x <- β*x + β^3
"""
function (A::MyInflation)(ENS::EnsembleState{N, TS}) where {N, TS}
    for s in ENS.S
        s .*= β
    		s .+= β^3
    end
    return ENS
end

4.1. Identity inflation

An identity inflation is of type IdentityInflation returns the inputted EnsembleState variable.

4.2. Additive inflation

An additive inflation is stored as a AdditiveInflation variable and adds a vector drawn from a given multivariate distribution MyDist to each member.

, where is drawn form a random distribution

In EnKF.jl, distributions are handled with the package Distributions.jl (see its documentation for more details)

By default, the additive inflation is a multivariate distribution with zero mean and unitary covariance matrix (i.e.)

"""
    AdditiveInflation
An type to store additive inflation :
Define additive inflation: x <- x + α with α a N-dimensional vector
drawn from a random distribution
# Fields:
- 'α' : Distribution of the additive inflation
"""

Let construct an additive inflation with Gaussian distribution with mean and as the diagonal coefficients of the covariance matrix.

# Add package Distributions.jl
using Pkg
Pkg.add.(["Distributions", "LinearAlgebra"])
using Distributions
using LinearAlgebra
A = AdditiveInflation(MvNormal([1.0; 2.0; 3.0; 4.0], Diagonal([0.1,0.2,0.3,0.4])))
AdditiveInflation{4}(DiagNormal( dim: 4 μ: [1.0, 2.0, 3.0, 4.0] Σ: [0.1 0.0 0.0 0.0; 0.0 0.2 0.0 0.0; 0.0 0.0 0.3 0.0; 0.0 0.0 0.0 0.4] ) )

mean and covariance of A can be obtained by:

@show mean(A)
@show var(A);

4.3. Multiplicative inflation

A multiplicative inflation is stored as a MultiplicativeInflation variable and multiply the deviation from the mean by the multiplicative inflation factor .

, where is the ensemble mean

"""
    MultiplicativeInflation
An type to store multiplicative inflation :
Define multiplicative inflation: x <- x + β*(x - x̂) with β a scalar
# Fields:
- 'β' : multiplicative inflation factor
"""
mutable struct MultiplicativeInflation{NS} <: InflationType

    "Multiplicative inflation factor β"
    β::Real
end

e.g.

M = MultiplicativeInflation(10, 1.02)
MultiplicativeInflation{10}(1.02)

4.4. Multiplico-additive inflation

A multiplico-additive inflation is stored as a MultiAdditiveInflation{NS} variable and multiply the deviation from the mean by the multiplicative inflation factor and add a vector drawn from a random distribution .

, where is drawn from .

"""
    MultiAdditiveInflation
An type to store multiplico-additive inflation :
Define multiplico-additive inflation: x̃⁻ <- x̂⁻ + β*(x̃⁻ - x̂⁻)  + α with β a scalar
# Fields:
- 'β' : Multiplicative inflation factor
- 'α' : Distribution of the additive inflation
"""
mutable struct MultiAdditiveInflation{NS} <: InflationType

    "Multiplicative inflation factor β"
    β::Real

    "Distribution of the additive inflation α"
    α::MultivariateDistribution
end

e.g.

MA = MultiAdditiveInflation(10, 1.0, MvNormal(zeros(10), I))
MultiAdditiveInflation{10}(1.0, IsoNormal( dim: 10 μ: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] Σ: [1.0 0.0 … 0.0 0.0; 0.0 1.0 … 0.0 0.0; … ; 0.0 0.0 … 1.0 0.0; 0.0 0.0 … 0.0 1.0] ) )

4.5. Exotic inflation

To handle exotic inflation, there is a type RecipeInflation which takes as input a vector of real parameters to define your specific inflation. You only need to provide its action on an EnsembleState variable to make things work !

5. Define an ensemble Kalman filter in EnKF.jl

We want users to be able to use EnKF.jl directly into their existing codes, without any modification. To do so, we define the following structures :

5.1. Propagation function

PropagationFunction is a type to handle the state advancement of each ensemble member.

To do so, a PropagationFunction function takes as arguments time t and ens the EnsembleState variable and advance the ensemble in time.

For example, this code will multiply each ensemble member by 2.

function (::PropagationFunction)(t::Float64, ens::EnsembleState{N, TS}) where {N, TS}
  
  for (i, state) in enumerate(ens.S)
    state .*= 2.0
  end
  
  return ens
end

# Define one PropagationFunction instance 
fprop = PropagationFunction()

5.2. Measurement function

MeasurementFunction is a type to define the state measurement function.

A MeasurementFunction function takes as arguments time t and state a variable of arbitrary type TS and returns the measured quantity

Note: MeasurementFunction acts on one member of the ensemble and not the entire ensemble. This choice was motivated by situations where communication is required between different members to propagate the ensemble.

Note: If you have your measurement is linearly dependent on your state, you also have to provide the associated measurement matrix .

To do so, we also define the action of MeasurementFunction on time t to provide .

For nonlinear systems, we use the technique of state augmentation described by van Leuween, et al. to avoid to provide the measurement matrix.

For example, this code will return the sum of the three first components of the state vector as the measurement.

# Define the measurement vector
function (::MeasurementFunction)(t::Float64, s::TS) where TS
    return [s[1]+s[2]+s[3]]
end

# Define the measurement matrix H
function (::MeasurementFunction)(t::Float64) 
    return reshape([1.0, 1.0 , 1.0],(1,3))
end

# Define one MeasurementFunction instance 
m = MeasurementFunction()

5.3. Real measurement function

Shift+Enter to run
""""
Define system ENKF which performs the
Fields:
 - 'f' : propagation function
 - 'A' : inflation
 - 'G' : filtering function acting on the state
 - 'm' : measurement function based on state
 - 'z' : real measurement function
 - 'ϵ' : measurement noise distribution
 - 'bounds' : bounds on certain states
 - 'isinflated' : Bool = true if state is inflated,
     = false otherwise
 - 'isfiltered' : Bool = true if state has to be filtered,
     = false otherwise
 - 'isaugmented' : Bool = true if measurement function is nonlinear,
     = false otherwise
"""
mutable struct ENKF{N, NZ}

    # "Ensemble of states"
    # ENS::EnsembleState{N, NS, TS}

    "Propagation function"
    f::PropagationFunction

    "Covariance Inflation"
    A::Union{InflationType, RecipeInflation}

    "Filter function"
    G::FilteringFunction

    "Measurement function based on state"
    m::MeasurementFunction

    "Real measurement function"
    z::RealMeasurementFunction

    "Measurement noise distribution"
    ϵ::AdditiveInflation{NZ}

    "Boolean: is state vector inflated"
    isinflated::Bool

    "Boolean: is state vector filtered"
    isfiltered::Bool

    "Boolean: is state vector augmented"
    isaugmented::Bool
    # "Bounds on certain state"
    # bounds

end

6. Ensemble Kalman filter in action

7. Examples

Interactive examples to understand how EnKF.jl works:

Data assimilation for the Lorenz attractor: here

Data assimilation for the Duffing oscillator: here