Surrogate optimization and Sampling
Sampling and optimization
In the last article, I talked about two new abstract types: Kriging and RadialBasis. With these new types, I managed to construct the respective surrogates, and they did work quite well! However, there were still two problems to face:
- How to effectively sample the space?
- How to choose which points to evaluate my expensive objective function in such a way that I "fill" the space and also look for the minimum of the Surrogate?
I created many sampling Algorithms thanks to the suggestion of my mentor:
- Random Sampling
- Uniform Sampling
- Sobol Sampling (Using Sobol.jl)
- Latin hypercube Sampling (Using LatinHypercubeSampling.jl)
- Low discrepancy Sampling
I think that the syntax works quite well, as you can see below:
# One dimensional sampling lb = -2.0 ub = 2.0 n = 100 s = sample(n,lb,ub,SobolSample())
I just created a Sobol sample of length n scaled between -2.0 and 2.0, simple as that!
It works equally well in more dimensions, as you can see below:
# Multiple dimensional sampling lb = [-1.0,-5.0] ub = [4.5,7.0] n = 100 s = sample(n,lb,ub,LatinHypercubeSample())
In this way I created a sample s with first dimension scaled between -1.0 and 4.5 and the second dimension between -5.0 and 7.0
I am quite happy with how this turned out because the syntax is indeed very straightforward.
Now that we have the samples, how do we proceed? That is, how do I pick the point to add to the Surrogates? For sure I cannot add all the sampling points, because the calculation of the objective function we want to approximate takes a long time.
Well, it turns out there are a lot of optimization methods to try, check them out in the section below!
For the development of this section, I took inspiration from the following sources:
- "A stochastic radial basis function method for the global optimization of expensive functions" by Regis and Shoemaker.
Following those, I devised two optimization methods that are currently both working in one and multiple dimensions: SRBF and LCBS.
The main theme of this method is to minimize the following convex combination:
It is clear that a big value of
It is important to avoid to choose points that are near to each other, because this would lead to singularities in the matrix of coefficients, the paper suggested a minimum tolerance of
The main theme of this method is to instead minimize the following function:
The same tolerance applies here as well.
Let's try them out!
Below, some examples on how to use the library.
Let's start with SRBF:
#SRBF objective_function = x -> 2*x+1 x = [2.0,4.0,6.0] y = [5.0,9.0,13.0] p = 1.0 lb = 2 ub = 6 #Using RadialBasis my_rad = RadialBasis(x,y,a,b,z->norm(z),1) SRBF(a,b,my_rad,max_iters=10,UniformSample(),num_samples=10,objective_function) #my_rad.x = [2.0, 4.0, 6.0, 2.03435, 2.02078, 2.1723, 2.06831, 2.05151, 2.0248, 2.04152, 2.07796, 2.00545, 2.06329, 2.03001, 2.07309, 2.08238, 2.01087, 2.04594, 2.01492]
As you can see this works quite well, it remains around the minimum.
Problems still left to investigate:
- the choice of the tolerance is crucial I want to study it a bit more.
- getting stuck at local minimum -> will try to relax neighborhood of the minimum.
- For a choice of
close to , sometimes I have singularities problems, with this method, I need to look at it more closely.
Let's now take a look at LCBS:
#1D objective_function = x -> 2*x+1 x = [2.5,4.0,6.0] y = [6.0,9.0,13.0] p = 1.0 a = 2 b = 6 my_k = Kriging(x,y,p) LCBS(a,b,my_k,10,SobolSample(),10,objective_function) #my_k.x = [2.5, 4.0, 6.0, 2.375, 2.25, 3.25, 3.75, 4.25, 4.375, 4.75, 5.25, 5.375, 5.75]
As you can see, in this case the points are more spread out.
#ND objective_function_ND = z -> 3*norm(z)+1 x = [(1.2,3.0),(3.0,3.5),(5.2,5.7)] y = objective_function_ND.(x) p = [1.0,1.0] theta = [2.0,2.0] lb = [1.0,1.0] ub = [6.0,6.0] #Kriging my_k_ND = Kriging(x,y,p,theta) LCBS(lb,ub,my_k_ND,maxiters=10,UniformSample(),n_sampl=10,objective_function_ND) #my_k.x = [(1.2, 3.0), (3.0, 3.5), (5.2, 5.7), (1.35399, 2.6428), (2.33342, 3.40259), (1.05897, 2.83862), (1.40907, 2.3167), (1.40441, 2.19472), (1.23633, 2.84317), (1.21371, 2.10073), (1.42053, 3.23016), (1.59273, 2.05025), (1.7947, 2.52392)]
The same thing happens also in the N dimensional case.
As a side note: the number of iterations and the number of samples is really small to have a good understanding of the example. If you let the methods run for more time, surely the algorithm will converge to the minimum while sampling.
You can play with it yourself if you are interested at: https://github.com/JuliaDiffEq/Surrogates.jl