Ludovico / Aug 12 2019
Remix of Julia by Nextjournal

Surrogate optimization and Sampling

Sampling and optimization

In the last article, I talked about two new abstract types: Kriging and RadialBasis. With these new types, I managed to construct the respective surrogates, and they did work quite well! However, there were still two problems to face:

  1. How to effectively sample the space?
  2. How to choose which points to evaluate my expensive objective function in such a way that I "fill" the space and also look for the minimum of the Surrogate?

I managed to solve both problems, as you can see on my repository: here and here.


I created many sampling Algorithms thanks to the suggestion of my mentor:

I think that the syntax works quite well, as you can see below:

# One dimensional sampling
lb = -2.0
ub = 2.0
n = 100
s = sample(n,lb,ub,SobolSample())

I just created a Sobol sample of length n scaled between -2.0 and 2.0, simple as that!

It works equally well in more dimensions, as you can see below:

# Multiple dimensional sampling 
lb = [-1.0,-5.0]
ub = [4.5,7.0]
n = 100
s = sample(n,lb,ub,LatinHypercubeSample())

In this way I created a sample s with first dimension scaled between -1.0 and 4.5 and the second dimension between -5.0 and 7.0

I am quite happy with how this turned out because the syntax is indeed very straightforward.

Now that we have the samples, how do we proceed? That is, how do I pick the point to add to the Surrogates? For sure I cannot add all the sampling points, because the calculation of the objective function we want to approximate takes a long time.

Well, it turns out there are a lot of optimization methods to try, check them out in the section below!

Optimization methods

For the development of this section, I took inspiration from the following sources:

Following those, I devised two optimization methods that are currently both working in one and multiple dimensions: SRBF and LCBS.


The main theme of this method is to minimize the following convex combination:

fmerit(x)=wS(x)+(1w)D(x)f_{merit}(x) = wS(x)+(1-w)D(x)

Where , , .

and are the minimum and maximum value of the surrogate at the sampled points, while and are the minimum and maximum distance from sample points and evaluated point.

It is clear that a big value of gives importance to surrogate values leading to discovery of the minimum value. On the other hand a small value of gives importance to points that are far away from other, leading to discovery of the sampled space.

It is important to avoid to choose points that are near to each other, because this would lead to singularities in the matrix of coefficients, the paper suggested a minimum tolerance of .


The main theme of this method is to instead minimize the following function:

LCBS(x)=E[x]kV[x]LCBS(x) = \mathbb{E}[x] - k*\sqrt{\mathbb{V}[x]}

Where is a fixed value, and are respectively the expected value and the expected error of the the surrogate evaluation. Note that this method can only work with Kriging at the moment, because it has a statistical interpretation.

The same tolerance applies here as well.

Let's try them out!

Below, some examples on how to use the library.

Let's start with SRBF:

objective_function = x -> 2*x+1
x = [2.0,4.0,6.0]
y = [5.0,9.0,13.0]
p = 1.0
lb = 2
ub = 6

#Using RadialBasis
my_rad = RadialBasis(x,y,a,b,z->norm(z),1)
#my_rad.x = [2.0, 4.0, 6.0, 2.03435, 2.02078, 2.1723, 2.06831, 2.05151, 2.0248, 2.04152, 2.07796, 2.00545, 2.06329, 2.03001, 2.07309, 2.08238, 2.01087, 2.04594, 2.01492]

As you can see this works quite well, it remains around the minimum.

Problems still left to investigate:

  • the choice of the tolerance is crucial I want to study it a bit more.
  • getting stuck at local minimum -> will try to relax neighborhood of the minimum.
  • For a choice of close to , sometimes I have singularities problems, with this method, I need to look at it more closely.

Let's now take a look at LCBS:

objective_function = x -> 2*x+1
x = [2.5,4.0,6.0]
y = [6.0,9.0,13.0]
p = 1.0
a = 2
b = 6
my_k = Kriging(x,y,p)
#my_k.x = [2.5, 4.0, 6.0, 2.375, 2.25, 3.25, 3.75, 4.25, 4.375, 4.75, 5.25, 5.375, 5.75]

As you can see, in this case the points are more spread out.

objective_function_ND = z -> 3*norm(z)+1
x = [(1.2,3.0),(3.0,3.5),(5.2,5.7)]
y = objective_function_ND.(x)
p = [1.0,1.0]
theta = [2.0,2.0]
lb = [1.0,1.0]
ub = [6.0,6.0]

my_k_ND = Kriging(x,y,p,theta)
#my_k.x = [(1.2, 3.0), (3.0, 3.5), (5.2, 5.7), (1.35399, 2.6428), (2.33342, 3.40259), (1.05897, 2.83862), (1.40907, 2.3167), (1.40441, 2.19472), (1.23633, 2.84317), (1.21371, 2.10073), (1.42053, 3.23016), (1.59273, 2.05025), (1.7947, 2.52392)]

The same thing happens also in the N dimensional case.

As a side note: the number of iterations and the number of samples is really small to have a good understanding of the example. If you let the methods run for more time, surely the algorithm will converge to the minimum while sampling.

You can play with it yourself if you are interested at:

Happy Coding!