Ludovico / Aug 12 2019

Bonding-time, or should I say coding time?

Starting out

Hello everyone!

I started off the first half of bonding time just reading about what I would need to implement, the libraries I would need to use and best practices for Julia code.

However, I slowly started getting bored just reading and reading. I figured that starting to code earlier would be my best bet, so that's exactly what I did!

I think there are a lot of good reasons for that: I will hit roadblocks earlier, and thus solve them earlier. This means no all-nighters (hopefully) before deadlines. Moreover, if everything goes according to plan I might manage to code the bonus part of my proposal as well. Win-win!

Let's see what I accomplished so far.

One-dimensional response surfaces

Response surfaces account for 1/3 of the functionality of my new library: the main idea is to mix polynomial interpolation with radial basis function interpolation. In these two weeks I only tackled the one-dimensional problem, for simplicity, but after the code review I will also finish the general N-dimensional case.

The interpolation in my case is mainly done with radial basis functions, plus a degree-two or degree-three polynomial added to make the system more stable.

The thing I love the most about my project is that the math looks scary; however, once it is understood, the code turns out to be quite simple.

Suppose we choose a polynomial basis $\{\pi_k\}_{k=1}^m$ and a basis function $\phi$. A quick observation: the choice of the polynomial basis is critical for the stability of the method, even with low-degree polynomials. After talking with the author of the paper I have been reading, in the one-dimensional case the choice of Chebyshev polynomials is the safest one. In the next article you will discover the choice for the general case ;-)!
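To make that concrete, here is a minimal sketch (not the library code) of how the first $m$ Chebyshev polynomials can be evaluated at a sample point. The helper name `chebyshev_basis` and the domain bounds `lb`, `ub` are just placeholders for this sketch:

```julia
# Sketch: evaluate the first m Chebyshev polynomials T_0, ..., T_{m-1}
# at a point x living in [lb, ub]. All names here are illustrative only.
function chebyshev_basis(x, m, lb, ub)
    # Map x from [lb, ub] to [-1, 1], where the Chebyshev polynomials live
    z = 2 * (x - lb) / (ub - lb) - 1
    # T_k(z) = cos(k * acos(z)) for z in [-1, 1]
    return [cos((k - 1) * acos(z)) for k in 1:m]
end

chebyshev_basis(0.3, 4, 0.0, 1.0)  # T_0, ..., T_3 evaluated at the mapped point
```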

If I have a set of $n$ samples $(x_i, y_i)$, my predictor at a new point $x^*$ would be:

$$\hat y (x^*) = \sum_{k=1}^m a_k \pi_k(x^*) + \sum_{j=1}^n b_j \phi(x^* - x_j)$$

To find the $m + n$ coefficients I need $m + n$ equations, given by the two expressions below:

$$y_i = \sum_{k=1}^m a_k \pi_k(x_i) + \sum_{j=1}^n b_j \phi(x_i - x_j), \quad i = 1:n$$
$$\sum_{j=1}^n b_j \pi_k(x_j) = 0, \quad k = 1:m$$

The first $n$ equations are just for interpolation purposes. However, we added polynomial terms, so we also need to impose the second condition. These additional constraints cause the polynomial term to be a projection of the function onto the space of polynomials, and the radial basis term to be an interpolator of the residual, non-polynomial part. The size $m$ of the polynomial basis depends on the kind of basis function we choose.

Now that these two conditions are clear, we can just set up a linear system with matrix $A$, where the unknown is the vector of size $m + n$ containing the coefficients $a_k$ and $b_j$.
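Written out in block form (this is just the two conditions above, with $\Phi$ and $\Pi$ as shorthand for the radial and polynomial evaluation matrices), the system looks like:

$$\begin{pmatrix} \Phi & \Pi \\ \Pi^\top & 0 \end{pmatrix} \begin{pmatrix} b \\ a \end{pmatrix} = \begin{pmatrix} y \\ 0 \end{pmatrix}, \qquad \Phi_{ij} = \phi(x_i - x_j), \quad \Pi_{ik} = \pi_k(x_i)$$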

A neat property to observe: A is always symmetric. How beautiful!
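As a rough sketch of how this could look in Julia, reusing the hypothetical `chebyshev_basis` helper from above (this is not the reviewed library code, and every name is illustrative):

```julia
using LinearAlgebra

# Sketch: build and solve the symmetric system for the 1D response surface.
# `x`, `y` are the n samples, `phi` the radial basis function, `m` the size
# of the polynomial basis. All names are illustrative, not the library API.
function rbf_coefficients(x, y, phi, m, lb, ub)
    n = length(x)
    Φ = [phi(x[i] - x[j]) for i in 1:n, j in 1:n]               # radial block, n×n
    Π = hcat([chebyshev_basis(xi, m, lb, ub) for xi in x]...)'  # polynomial block, n×m
    A = [Φ Π; Π' zeros(m, m)]                                   # symmetric system matrix
    c = A \ [y; zeros(m)]
    return c[1:n], c[n+1:end]                                   # b (radial), a (polynomial)
end

# Toy usage with a Gaussian basis function
phi(z) = exp(-abs2(z))
x = [0.0, 0.25, 0.5, 0.75, 1.0]
y = sin.(2π .* x)
b, a = rbf_coefficients(x, y, phi, 3, 0.0, 1.0)

# The predictor ŷ(x*) from the formula above
predict(xs) = dot(a, chebyshev_basis(xs, 3, 0.0, 1.0)) + dot(b, phi.(xs .- x))
predict(0.4)
```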

Kriging: using a very special basis function

Suppose we have the following basis function:

$$\phi(z) = e^{-\sum_{l=1}^d \theta_l \vert z_l \vert^{p_l}}$$

where:

  • $d$ is the dimension of $z$
  • $\theta_l$ models how active the function is in the $l$-th coordinate
  • $p_l$ determines the smoothness of the function in the $l$-th coordinate: the higher $p_l$, the smoother the function
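A minimal Julia sketch of this basis function, with `theta` and `p` as length-$d$ vectors (the names are just for illustration):

```julia
# Sketch: the Kriging basis function above, for a d-dimensional point z.
# `theta` and `p` are vectors of length d; names are illustrative only.
kriging_phi(z, theta, p) = exp(-sum(theta .* abs.(z) .^ p))

kriging_phi([0.1, 0.2], [2.0, 1.0], [2.0, 2.0])  # a scalar in (0, 1]
```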

We can then build the Kriging predictor.

The idea is the same as before, however with Kriging we have a statistical interpretation of the error at a point. That is, besides giving an estimate at a new point, the method will also explicitly say how sure it is about said prediction.

We start by defining the correlation between two sampled data points as:

$$Corr[Y(x_i), Y(x_j)] = e^{-\sum_{l=1}^d \theta_l \vert x_{il} - x_{jl} \vert^{p_l}}$$

That is, two values are highly correlated if they are close together in space. This definition has the neat property that as the distance goes to infinity the correlation tends to $0$.
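In code, the $n \times n$ correlation matrix of the samples could be assembled like this (again just a sketch, reusing the illustrative `kriging_phi` helper from above):

```julia
# Sketch: n×n correlation matrix of the samples, with the n points stored as
# rows of the matrix X (size n×d). Reuses the illustrative kriging_phi helper.
function correlation_matrix(X, theta, p)
    n = size(X, 1)
    return [kriging_phi(X[i, :] .- X[j, :], theta, p) for i in 1:n, j in 1:n]
end
```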

Now, sparing the mathematical details, we can construct the estimator by maximizing the log of the likelihood function. We then get, for suitable $a$ and $b_i$:

$$\hat y(x^*) = a + \sum_{i=1}^n b_i \phi(x^* - x_i)$$

For every point we can also calculate the mean squared error at that point. Of course, if we try to calculate the error at a point $x_i$ from the sample we get $0$, which is a good sanity check that we are doing something that makes sense.
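For the curious, here is a sketch of the predictor and its mean squared error written with the standard maximum-likelihood Kriging formulas from the literature; this is not necessarily the exact form of my implementation, and it reuses the illustrative helpers from the sketches above:

```julia
using LinearAlgebra  # for dot

# Sketch: Kriging prediction and mean squared error at a new point `xstar`,
# using the standard maximum-likelihood formulas. `X` holds the n samples as
# rows, `y` the responses; names and helpers are illustrative only.
function kriging_predict(xstar, X, y, theta, p)
    n = size(X, 1)
    R = correlation_matrix(X, theta, p)
    r = [kriging_phi(xstar .- X[i, :], theta, p) for i in 1:n]
    ones_n = ones(n)
    a = dot(ones_n, R \ y) / dot(ones_n, R \ ones_n)   # the constant term a
    resid = y .- a
    yhat = a + dot(r, R \ resid)                       # prediction at xstar
    sigma2 = dot(resid, R \ resid) / n
    mse = sigma2 * (1 - dot(r, R \ r) +
                    (1 - dot(ones_n, R \ r))^2 / dot(ones_n, R \ ones_n))
    return yhat, mse
end
```

Plugging a sampled point $x_i$ into this sketch gives back $y_i$ with zero mean squared error, which is exactly the sanity check mentioned above.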

You can find the code for these two methods here. (Bear in mind, no review has been done at the time of this article; things are still rough.)

What to do in the next two weeks?

My plan for the next two weeks is to polish the code for the one-dimensional case, then build the general N-dimensional case. The code should be very similar, so it should not be an issue. After that I will write some tests to make sure everything is working as intended. I will also write some documentation and a few examples drawn from nasty differential equations to show the functionality of the surrogates in different cases. I hope to do this in two weeks, even though it may be a bit of a stretch.

We will see how it goes!

Happy coding,

Ludovico