New abstract types: RadialBasis and Kriging.

Setting up Surrogates.jl

Hello everyone!

Over the past two weeks I have been working on the RadialBasis and Kriging types, as explained in my previous post here.

First of all though, I needed to set up my package with all the standard tools: Travis, AppVeyor and Coveralls.

These tools are really useful because they automatically run the test suite and show how much of your code is actually covered by tests. With that out of the way, I then proceeded to work on the implementation of two new types: RadialBasis and Kriging.

Let's take a look at the Surrogates!

Kriging knows how much it's wrong, pretty cool!

Kriging has the nice property of knowing how much error it makes at any evaluation point. All the code below follows the mathematical reasoning from the paper "A Taxonomy of Global Optimization Methods Based on Response Surfaces" by Donald R. Jones.

The statistical interpretation comes from the definition of the correlation of two points given below:

Corr[Y(x_i),Y(x_j)] = \exp\left(-\sum_{l=1}^d \vert x_{il} - x_{jl} \vert^{p}\right)
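To get a feel for this formula, here is a one-dimensional sketch (my own notation, not package code) with theta = 1: identical points are perfectly correlated, and the correlation decays with distance, with p controlling how quickly it drops off near a sample.

```julia
# 1D correlation from the Jones paper, with theta = 1 (illustration only)
corr(xi, xj, p) = exp(-abs(xi - xj)^p)

corr(1.0, 1.0, 1.3)   # identical points are perfectly correlated: 1.0
corr(1.0, 1.5, 1.0)   # ≈ 0.6065
corr(1.0, 1.5, 2.0)   # ≈ 0.7788: higher p keeps nearby points more correlated
```

Near a sample, a higher p keeps the correlation closer to 1, which is why p is interpreted as a smoothness parameter.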

It is now time to show some code.

First of all I define my new type, which stores the variables needed to approximate the value at an unknown point.

abstract type AbstractSurrogate <: Function end
mutable struct Kriging <: AbstractSurrogate
    x
    y
    p
    theta
    mu
    b
    sigma
    inverse_of_R
end

After that, I need a constructor to actually build this type. Suppose we are in dimension one, which makes things clearer (theta is just a scalar and not a vector).

function Kriging(x,y,p::Number)
    # 0 < p <= 2 determines the smoothness of the function:
    # the higher p, the higher the smoothness
    n = length(x)
    theta = 1
    R = zeros(float(eltype(x)), n, n)
    @inbounds for i = 1:n
        for j = 1:n
            R[i,j] = exp(-theta*abs(x[i]-x[j])^p)
        end
    end
    one = ones(n)
    inverse_of_R = inv(R)
    mu = (one'*inverse_of_R*y)/(one'*inverse_of_R*one)
    b = inverse_of_R*(y .- mu)
    sigma = ((y .- mu)'*inverse_of_R*(y .- mu))/n
    Kriging(x,y,p,theta,mu,b,sigma,inverse_of_R)
end

With the function above we can build a Kriging object, which we can then use to make a prediction at an unknown point with the following function:

function (k::Kriging)(val::Number)
    phi(z) = exp(-(abs(z))^k.p)
    n = length(k.x)
    prediction = zero(eltype(k.x))
    r = zeros(float(eltype(k.x)),n,1)
    @inbounds for i = 1:n
        prediction = prediction + k.b[i]*phi(val-k.x[i])
        r[i] = phi(val - k.x[i])
    end
    prediction = k.mu + prediction
    one = ones(n,1)
    one_t = one'
    a = r'*k.inverse_of_R*r
    a = a[1]
    b = one_t*k.inverse_of_R*one
    b = b[1]
    mean_squared_error = k.sigma*(1 - a + (1-a)^2/b)
    std_error = sqrt(mean_squared_error)
    return prediction, std_error
end

With these two methods we can already do a lot of different things. Take a look at the example below:

x = [1 2 3]
y = [4,5,6]
p = 1.3
my_k = Kriging(x,y,p)
new_value = 2.3
approx, std_err = my_k(new_value)
fake_new_value = 3
fake_approx, fake_std_err = my_k(fake_new_value)
# (6.0, 0.0): at a sample point the prediction is exact and the error is zero

Feel free to change those numbers to see how the behavior changes!

Adding samples to the Kriging object

At the end of the day, the goal is to approximate solutions of differential equations whose computation takes too long. Suppose we evaluate 10 sample points, build the surrogate and approximate the solution. However, after some time we get 3 more samples. Then we need to add those samples to the Kriging object with the following function:

function add_point!(k::Kriging,new_x,new_y)
    if Base.size(k.x,1) == 1
        if length(new_x) > 1
            k.x = hcat(k.x,new_x)
            k.y = vcat(k.y,new_y)
            return Kriging(k.x,k.y,k.p)
        else
            k.x = vcat(vec(k.x),new_x)
            k.y = vcat(vec(k.y),new_y)
            return Kriging(k.x,k.y,k.p)
        end
    else
        k.x = vcat(k.x,new_x)
        k.y = vcat(k.y,new_y)
        return Kriging(k.x,k.y,k.p,k.theta)
    end
end
add_point! (generic function with 1 method)

As you can see, it is not very efficient because I actually need to solve the linear system again. At the moment I do not have a better solution; it turns out adaptivity is quite hard. I plan to change the type so that the methods can dispatch properly instead of relying on ugly if-else conditions.
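One direction I may explore is updating inverse_of_R incrementally instead of recomputing it from scratch: when a sample arrives, R changes by a low-rank correction, and Sherman-Morrison style formulas update the inverse without a full O(n^3) refactorization. A minimal self-contained sketch of the idea (my own helper, not Surrogates.jl code):

```julia
using LinearAlgebra

# Sherman-Morrison: update A⁻¹ after a rank-one change A + u*v'
# (a sketch of the idea, not part of the package)
function sherman_morrison(Ainv, u, v)
    Au = Ainv*u           # A⁻¹u
    vA = v'*Ainv          # v'A⁻¹
    Ainv .- (Au*vA)/(1 + dot(v, Au))
end

A = [4.0 1.0; 1.0 3.0]
Ainv = inv(A)
u = [1.0, 0.0]; v = [0.0, 2.0]
updated = sherman_morrison(Ainv, u, v)
# matches inv(A + u*v') without refactorizing A
```

Growing R by a whole bordered row and column needs the block-inverse analogue of this formula, but the cost argument is the same: quadratic updates instead of cubic rebuilds.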

How to choose the Samples?

Let's say my differential equation is defined on some interval [a, b]. Until now it is up to the user to choose the points in the interval and find the solution at those points.

However, I am going to use the packages Sobol.jl and LatinHypercubeSampling.jl to generate samples.

Generally speaking, sampling makes or breaks these kinds of algorithms, so it is important to use efficient libraries rather than letting the user pick points at random.
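To make the point concrete, here is a tiny Van der Corput sequence, the 1D building block behind quasi-random (low-discrepancy) sequences like Sobol. This hand-rolled version is for illustration only; the package will rely on Sobol.jl and LatinHypercubeSampling.jl:

```julia
# Van der Corput base-2 sequence: reverses the digits of n around the
# radix point, producing points that fill [0,1) far more evenly than
# uniform random draws. (Illustration only, not package code.)
function van_der_corput(n, base=2)
    q, bk = 0.0, 1/base
    while n > 0
        q += (n % base)*bk
        n = div(n, base)
        bk /= base
    end
    return q
end

samples = [van_der_corput(i) for i in 1:8]
# [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875, 0.0625]
```

Each new point lands in the largest remaining gap, which is exactly the property a surrogate wants from its initial sample set.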

The end goal is a method that lets the user input a differential equation problem, choose a sampling method and a specific surrogate; the package then finds the approximation autonomously, resampling in the regions where the standard error is highest.

What about RadialBasis?

RadialBasis currently has the same functionality as Kriging. The big drawback is that there is no statistical interpretation, so it is not possible to get the standard error at a point. The big advantage, however, is that users can choose whatever radial function they want. This is very useful because they may want the approximation to have certain properties, based on what they know about the theoretical solution of the differential problem.
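For instance, a few classic radial functions can be passed in as plain Julia closures (these particular definitions are my own illustration, not the package's API); they trade smoothness of the fit for locality:

```julia
# Some classic radial functions ϕ(r). Which one fits best depends on what
# is known about the solution (smoothness, decay, ...). Illustration only.
linear_rbf(r)    = abs(r)
cubic_rbf(r)     = abs(r)^3
thinplate_rbf(r) = r == 0 ? 0.0 : r^2*log(abs(r))
gaussian_rbf(r; eps=1.0) = exp(-(eps*r)^2)

gaussian_rbf(0.0)   # 1.0: maximal at the center, decays smoothly away from it
thinplate_rbf(1.0)  # 0.0
```

The Gaussian gives infinitely smooth, localized surrogates; the polyharmonic choices (linear, cubic, thin-plate) are the ones that pair naturally with the polynomial tail described below.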

The idea behind this is to use radial functions in combination with a low degree polynomial which increases stability. You can see the type RadialBasis below:

abstract type AbstractSurrogate <: Function end
mutable struct RadialBasis{F} <: AbstractSurrogate
    phi::F       # the radial function
    dim_poly::Int
    x
    y
    a
    b
    coeff
end

Sticking to 1D, the constructor is as follows:

using LinearAlgebra # for Symmetric

function RadialBasis(x,y,a::Number,b::Number,phi::Function,q::Int)
    Chebyshev(x,k) = cos(k*acos(-1 + 2/(b-a)*(x-a)))
    n = length(x)
    size = n+q
    D = zeros(float(eltype(x)), size, size)
    d = zeros(float(eltype(x)), size)
    @inbounds for i = 1:n
        d[i] = y[i]
        for j = 1:n
            D[i,j] = phi(x[i] - x[j])
        end
        for k = n+1:size
            D[i,k] = Chebyshev(x[i],k)
        end
    end
    Sym = Symmetric(D,:U)
    coeff = Sym\d
    RadialBasis(phi,q,x,y,a,b,coeff)
end

As you can see, the user specifies an interval [a, b] in which to approximate the solution. Then, alongside the radial basis, I mix in a low-degree polynomial. The system I am going to solve is indeed symmetric, thanks to the radial functions.

The filling of the D matrix is a bit intricate; it is based on the following conditions applied to the problem:

y_i = \sum_{k=1}^m a_k \pi_k(x_i) + \sum_{j=1}^n b_j \phi(x_i - x_j), \quad i = 1:n
\sum_{j=1}^n b_j \pi_k(x_j) = 0, \quad k = 1:m

Where \pi_k is the chosen polynomial basis. If this does not enlighten the construction of the D matrix, my own reasoning could help; below are some pictures of me working on the nitty-gritty details:
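To see the first condition in action, here is a self-contained toy: a pure radial-basis interpolant (no polynomial tail, so only the first set of equations applies) that recovers its sample values exactly at the nodes. The setup and names are mine, not the RadialBasis implementation above:

```julia
# Toy 1D radial-basis interpolant with a Gaussian ϕ and no polynomial tail:
# solve D*b = y with D[i,j] = ϕ(xᵢ - xⱼ), then s(x) = Σⱼ bⱼ ϕ(x - xⱼ).
# (My own minimal illustration, not the package code.)
phi(r) = exp(-r^2)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = sin.(xs)
D = [phi(xi - xj) for xi in xs, xj in xs]   # symmetric interpolation matrix
b = D \ ys
s(val) = sum(b[j]*phi(val - xs[j]) for j in eachindex(xs))

s(0.5) ≈ sin(0.5)   # true: exact at the nodes, smooth in between
```

The Gaussian matrix is positive definite, so no polynomial tail is needed here; for conditionally positive definite choices like the thin-plate spline, the second condition and the Chebyshev columns of D are what make the system solvable.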

If you are interested in trying different radial bases, you can fork my repository, which is now under the DifferentialEquations.jl umbrella (Wooo), and give it a spin! I suggest picking a function, computing some samples, and seeing whether the surrogate can guess it. If it behaves well, please reward it with a shining star ;)!

Next problems to tackle

In the following two weeks I will work on the sampling methods, on integration with DifferentialEquations.jl, and on refactoring to make the code more efficient. After that I will spend some days preparing to submit the code for the first-month review.

Happy coding everyone!