Ludovico / Aug 12 2019
Remix of Julia by Nextjournal

New Surrogates and final plans

In the previous article I left off talking about new Surrogate methods.

In the past two weeks I indeed managed to code the following new surrogates:

  • Linear
  • Lobachesky spline
  • Neural Network
  • Support vector machine
  • Random forest

I also had to make sure that these new surrogates would comply with the optimization methods I coded beforehand. It turns out I have been quite sloppy: at the end I had to change around a lot of data structures of these surrogates to make everything compatible.

Now, this seems like a lot of work. Actually it was not that bad because I have taken advantage of a great deal of Packages, such as: GLM, Flux, LIBSVM and XGBoost.

Linear Surrogate

The definition and construction of a Linear Surrogate is indeed quite easy:

mutable struct LinearSurrogate{X,Y,C,L,U} <: AbstractSurrogate

function LinearSurrogate(x,y,lb::Number,ub::Number)
    ols = lm(reshape(x,length(x),1),y)

The bounds are needed in the construction because optimization method need to have explicit limitations. The ND case is the same, because I still take advantage of GLM.

Lobachesky spline

The lobachesky spline is super interesting. It is defined in this way:

Fn(x)=j=1ncjh=1dfn(α(xhxhj))F_n(x) = \sum_{j=1}^n c_j \prod_{h=1}^d f_n^*(\alpha(x_h - x_{hj}))

With and parameters and dimension of the problem.

The inner is defined in this way:

fn(α(xhxhj))=n312n(n1)!k=0n(1)k(nk)×[n3α(xhxhj)+(n2k)]+n1f_n^*(\alpha(x_h-x_{hj})) = \sqrt{\frac{n}{3}}\frac{1}{2^n(n-1)!}\sum_{k=0}^n (-1)^k \binom{n}{k}\times[\sqrt{\frac{n}{3}}\alpha(x_h - x_{hj}) + (n-2k)]^{n-1}_{+}

By applying the central limit theorem the d-variate Lobachevsky spline converge to the d-variate Gaussian. Hence, Lobachevsky splines asymptotically behave like radial functions, though they are not radial in themselves.

Let's call our objective function . If we are able to express it in the following way:

F(x)=abf(x)dxF(x) = \int_a^b f(x)dx

Then we can approximate with a Lobachesky spline because there exists a closed form of the integral. Surrogates.jl makes this extremely easy, check it out:

obj = x -> 3*x + log(x)
a = 1.0
b = 4.0
x = sample(2000,a,b,SobolSample())
y = obj.(x)
alpha = 2.0
n = 6
my_loba_surr = LobacheskySurrogate(x,y,alpha,n,a,b)
int_1D = lobachesky_integral(my_loba_surr,a,b)
int = quadgk(obj,a,b)
int_val_true = int[1]-int[2]
@test abs(int_1D - int_val_true) < 10^-5

Neural network and SVM

To build this surrogate I used the library Flux, which makes it rather easy. There is not much to say about this, I think that the syntax is quite convenient:

a = 0.0
b = 10.0
obj_1D = x -> log(x)^2*x+2*x
x = sample(10,a,b,SobolSample())
y = obj_1D.(x);
model = Chain(Dense(1,1))
loss(x, y) = Flux.mse(model(x), y)
opt = Descent(0.01)
n_echos = 5
my_neural = NeuralSurrogate(x,y,a,b,model,loss,opt,n_echos)
val = my_neural(5.0)

The user just needs to define a few things about his NN and then the constructor takes care of it.

For the SVMSurrogate, I used the library LIBSVM. The syntax is exactly the same as the neural network.

Random forest surrogate

To build this surrogate I used the library XGBoost, which makes it rather easy.

The only difference from the other Library-ready surrogates is that the user needs to input the number of rounds, that is the number of trees:

lb = [0.0,0.0]
ub = [10.0,10.0]
s = sample(5,lb,ub,SobolSample())
x = Tuple.(s)
obj_ND = x -> x[1] * x[2]^2
y = obj_ND.(x)
my_forest_ND = RandomForestSurrogate(x,y,lb,ub,num_round)
val = my_forest_ND((1.0,1.0))

Final weeks

In these last two weeks I plan on writing docs, examples and tutorials because a good package is useless if I am the only one that knows how to operate it. Also, I would love to finish the SOP optimization method whose PR is still open. I would also love to code up the MARS spline surrogate.

Anyway, I have a lot more ideas for this package so for sure the work will not end after JSOC. Cannot wait for the last article that will wrap up these amazing three months!

Happy coding,