# More optimization methods and future surrogates

In the last article I talked about two optimization methods: SRBF and LCBS.

In the meantime I successfully coded three more optimization methods: DYCORS, EI, and SOP. As I write this, I have also added another surrogate, LinearSurrogate, which is under review at: https://github.com/JuliaDiffEq/Surrogates.jl/pull/42

It was very low-hanging fruit that I had not gotten around to, so I blitzed it out in the last two days.

Let's now take a look at the three new optimization methods:

## DYCORS

The DYCORS optimization method is an extension of the SRBF method. The main change is that each coordinate of a candidate point is perturbed only with a certain probability, which decreases as the optimization progresses.

From the pySOT documentation: "The main idea is that many objective functions depend only on a few directions so it may be advantageous to perturb only a few directions. In particular, we use a perturbation probability to perturb a given coordinate and decrease this probability after each function evaluation so fewer coordinates are perturbed later in the optimization."
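To make this concrete, here is a tiny standalone snippet (my own illustration, not part of the package) showing how the perturbation probability decays over the iterations, using the same formula as the implementation below:

```julia
# Toy illustration of the decreasing perturbation probability in DYCORS
d, maxiters = 10, 100
for k in (1, 10, 50, 99)
    p_select = min(20/d, 1)*(1 - log(k)/log(maxiters - 1))
    println("iteration $k: coordinates perturbed with probability ≈ $(round(p_select, digits = 3))")
end
```

With d = 10 and 100 iterations this prints roughly 1.0, 0.5, 0.15, and 0.0: early on every coordinate gets perturbed, while towards the end almost none are.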

Below is the code:

```julia
using LinearAlgebra, Distributions

function surrogate_optimize(obj, ::DYCORS, lb, ub, surrn::AbstractSurrogate, sample_type;
                            maxiters = 100, num_new_samples = 100)
    x_best = collect(surrn.x[argmin(surrn.y)])
    y_best = minimum(surrn.y)
    sigma_n = 0.2*norm(ub - lb)
    d = length(lb)
    sigma_min = 0.2*(0.5)^6*norm(ub - lb)
    t_success = 3
    t_fail = max(d, 5)
    C_success = 0
    C_fail = 0
    for k = 1:maxiters
        # Perturbation probability, decreasing with the iteration number k
        p_select = min(20/d, 1)*(1 - log(k)/log(maxiters - 1))
        new_points = zeros(eltype(surrn.x), num_new_samples, d)
        for j = 1:num_new_samples
            w = sample(d, 0, 1, sample_type)
            I_perturb = w .< p_select
            if !(true in I_perturb)
                # Perturb at least one randomly chosen coordinate
                val = rand(1:d)
                I_perturb = vcat(zeros(Int, val - 1), 1, zeros(Int, d - val))
            end
            I_perturb = Int.(I_perturb)
            for i = 1:d
                if I_perturb[i] == 1
                    new_points[j, i] = x_best[i] + rand(Normal(0, sigma_n))
                else
                    new_points[j, i] = x_best[i]
                end
            end
        end

        # Move points that fall outside the box [lb, ub] back inside
        for i = 1:num_new_samples
            for j = 1:d
                while new_points[i, j] < lb[j] || new_points[i, j] > ub[j]
                    if new_points[i, j] > ub[j]
                        new_points[i, j] = maximum(surrn.x)[j] - norm(new_points[i, j] - maximum(surrn.x)[j])
                    end
                    if new_points[i, j] < lb[j]
                        new_points[i, j] = minimum(surrn.x)[j] + norm(new_points[i, j] - minimum(surrn.x)[j])
                    end
                end
            end
        end

        x_new = select_evaluation_point_ND(new_points, surrn, k, maxiters)
        f_new = obj(x_new)

        if f_new < y_best
            C_success += 1
            C_fail = 0
            x_best = x_new
            y_best = f_new
        else
            C_fail += 1
            C_success = 0
        end
        # Adjust the step size given the success/failure counters, then update the surrogate
        sigma_n, C_success, C_fail = adjust_step_size(sigma_n, sigma_min, C_success, t_success, C_fail, t_fail)
        add_point!(surrn, x_new, f_new)
    end
end
```

There are two accessory functions:

• select_evaluation_point_ND
• adjust_step_size

The former selects the point that best minimizes a convex combination of the surrogate value and the distance from previously evaluated points, just like in SRBF.

The latter is self-explanatory: given the number of failures and successes, it changes the step size sigma.
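For reference, here is a minimal sketch of the step-size update, assuming a doubling/halving rule and the counters from the main loop (this is my reading of it, not necessarily the exact implementation):

```julia
# Sketch of the step-size update: expand after enough consecutive successes,
# shrink (down to sigma_min) after enough consecutive failures.
function adjust_step_size(sigma_n, sigma_min, C_success, t_success, C_fail, t_fail)
    if C_success >= t_success
        sigma_n = 2*sigma_n
        C_success = 0
    end
    if C_fail >= t_fail
        sigma_n = max(sigma_n/2, sigma_min)
        C_fail = 0
    end
    return sigma_n, C_success, C_fail
end
```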

## EI

The EI optimization method stands for expected improvement.

The main idea is to pick the next point by maximizing the expected value of the improvement over the current best value:

$$\mathrm{EI}(x) = \mathbb{E}\left[\max\bigl(f(x) - f_{\max},\, 0\bigr)\right]$$

Under a Gaussian process prior this has a closed form in terms of the posterior mean $\mu(x)$ and standard deviation $\sigma(x)$:

$$\mathrm{EI}(x) = (\mu(x) - f_{\max} - \epsilon)\,\Phi(z) + \sigma(x)\,\phi(z), \qquad z = \frac{\mu(x) - f_{\max} - \epsilon}{\sigma(x)}$$

As you can see, an expected value pops up. We can only use this optimization method under a Gaussian process prior, in our case: Kriging.

```julia
using LinearAlgebra, Distributions

function surrogate_optimize(obj, ::EI, lb, ub, krig::Kriging, sample_type;
                            maxiters = 100, num_new_samples = 100)
    dtol = 1e-3*norm(ub - lb)
    eps = 0.01
    for i = 1:maxiters
        new_sample = sample(num_new_samples, lb, ub, sample_type)
        f_max = maximum(krig.y)
        evaluations = zeros(eltype(krig.x), num_new_samples)
        point_found = false
        new_x_max = zero(eltype(krig.x))
        new_y_max = zero(eltype(krig.x))
        while point_found == false
            for j = 1:length(new_sample)
                std = std_error_at_point(krig, new_sample[j])
                u = krig(new_sample[j])
                if abs(std) > 1e-6
                    z = (u - f_max - eps)/std
                else
                    z = 0
                end
                # Closed-form expected improvement
                evaluations[j] = (u - f_max - eps)*cdf(Normal(), z) + std*pdf(Normal(), z)
            end
            index_max = argmax(evaluations)
            x_new = new_sample[index_max]
            y_new = maximum(evaluations)
            diff_x = abs.(krig.x .- x_new)
            bit_x = diff_x .> dtol
            # new_x_max has to have some distance from krig.x
            if false in bit_x
                # The new point is not actually that new, discard it!
                deleteat!(evaluations, index_max)
                deleteat!(new_sample, index_max)
                if length(new_sample) == 0
                    println("Out of sampling points")
                    return
                end
            else
                point_found = true
                new_x_max = x_new
                new_y_max = y_new
            end
        end
        # Stop when the expected improvement becomes negligible
        if new_y_max < 1e-6*norm(maximum(krig.y) - minimum(krig.y))
            return
        end
        add_point!(krig, new_x_max, obj(new_x_max))
    end
end
```

As you can see, we make use of probability theory when calling the PDF and CDF of the standard normal. The merit function here is the surrogate itself.
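To show how this fits together, here is a hypothetical usage sketch; the Kriging constructor and sampler names are assumptions that may differ across versions of Surrogates.jl. Note that this EI formulation measures improvement over the current maximum, so the toy objective has a maximum:

```julia
# Hypothetical usage sketch, not a verbatim package example
using Surrogates
f = x -> -(x - 2)^2                      # toy objective with a maximum at x = 2
lb, ub = 0.0, 4.0
x = sample(10, lb, ub, SobolSample())    # initial design
y = f.(x)
krig = Kriging(x, y, 1.5)                # assumed 1D Kriging constructor, p = 1.5
surrogate_optimize(f, EI(), lb, ub, krig, UniformSample())
```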

## SOP

The SOP optimization method is implemented following the paper "SOP: parallel surrogate global optimization with Pareto center selection for computationally expensive single objective problems" by Krityakierne, Akhtar, and Shoemaker.

There are many new ideas:

• Look for new points to add in parallel.
• Two-tier ranking of previously evaluated points.
• Points are perturbed with a truncated normal (see the sketch after this list).
• Keep a tabu list of points that did not produce any improvement, so they can be avoided.
• Measure the improvement using the "Expected Hypervolume Improvement" (EHI).
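The truncated normal perturbation, for instance, can be done directly with Distributions.jl; a minimal sketch:

```julia
using Distributions
# Perturb a center with a truncated normal so the candidate never leaves [lb, ub]
lb, ub = 0.0, 10.0
center, radius = 7.5, 1.0
candidate = center + rand(TruncatedNormal(0, radius, lb - center, ub - center))
@assert lb <= candidate <= ub
```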

At the moment the code does not run in parallel, but I expect to add this feature in the near future. My mentor and I do not yet fully understand EHI from the paper's definition, so it is ignored at this stage. This means that the code converges slowly to the minimum; nevertheless, it behaves quite well and searches the whole sample space uniformly.

Code below:

```julia
using LinearAlgebra, Distributions

function surrogate_optimize(obj, sop1::SOP, lb, ub, surrSOP::AbstractSurrogate, sample_type;
                            maxiters = 100, num_new_samples = 100)
    d = length(lb)
    N_fail = 3
    N_tenure = 5
    tau = 10^-5
    num_P = sop1.p
    centers_global = surrSOP.x
    r_centers_global = 0.2*norm(ub - lb)*ones(length(surrSOP.x))
    N_failures_global = zeros(length(surrSOP.x))
    tabu = []
    N_tenures_tabu = []
    for k = 1:maxiters
        N_tenures_tabu .+= 1
        # Delete points that have been in the tabu list for too long
        del = findall(N_tenures_tabu .> N_tenure)
        if length(del) > 0
            deleteat!(N_tenures_tabu, del)
            deleteat!(tabu, del)
        end

        ##### P CENTERS ######

        C = []

        # S(x): set of points already evaluated
        # Rank points in S with:
        # 1) Non-dominated sorting
        Fronts_I = I_tier_ranking_1D(centers_global, surrSOP)

        # 2) Second-tier ranking
        Fronts = II_tier_ranking_1D(Fronts_I, surrSOP)
        ranked_list = []
        for i = 1:length(Fronts)
            for j = 1:length(Fronts[i])
                push!(ranked_list, Fronts[i][j])
            end
        end
        ranked_list = eltype(surrSOP.x).(ranked_list)

        centers_full = 0
        i = 1
        while i <= length(ranked_list) && centers_full == 0
            # Skip candidates too close to tabu points or to already evaluated centers
            flag = 0
            for k = 1:length(tabu)
                if abs(ranked_list[i] - tabu[k]) < tau
                    flag = 1
                end
            end
            for l = 1:length(centers_global)
                if abs(ranked_list[i] - centers_global[l]) < tau
                    flag = 1
                end
            end
            if flag == 0
                push!(C, ranked_list[i])
                if length(C) == num_P
                    centers_full = 1
                end
            end
            i = i + 1
        end

        # I examined all the points in the ranked list but num_selected < num_P:
        # I just iterate again using only the radius rule
        if length(C) < num_P
            i = 1
            while i <= length(ranked_list) && centers_full == 0
                flag = 0
                for l = 1:length(centers_global)
                    if abs(ranked_list[i] - centers_global[l]) < tau
                        flag = 1
                    end
                end
                if flag == 0
                    push!(C, ranked_list[i])
                    if length(C) == num_P
                        centers_full = 1
                    end
                end
                i = i + 1
            end
        end

        # If I still have num_selected < num_P, I double down on some centers iteratively
        if length(C) < num_P
            i = 1
            while length(C) < num_P
                push!(C, ranked_list[mod1(i, length(ranked_list))])
                i = i + 1
            end
        end

        # Here C contains the num_P selected centers
        r_centers = 0.2*norm(ub - lb)*ones(num_P)
        N_failures = zeros(num_P)

        # 2.3 Candidate search
        new_points = zeros(eltype(surrSOP.x), num_P, 2)
        for i = 1:num_P
            N_candidates = zeros(eltype(surrSOP.x), num_new_samples)
            # Using phi(n) just like DYCORS, merit function = surrogate
            # Like in DYCORS, I_perturb = 1 always
            evaluations = zeros(eltype(surrSOP.y), num_new_samples)
            for j = 1:num_new_samples
                a = lb - C[i]
                b = ub - C[i]
                N_candidates[j] = C[i] + rand(TruncatedNormal(0, r_centers[i], a, b))
                evaluations[j] = surrSOP(N_candidates[j])
            end
            x_best = N_candidates[argmin(evaluations)]
            y_best = minimum(evaluations)
            new_points[i, 1] = x_best
            new_points[i, 2] = y_best
        end

        # new_points now contains the num_P candidate pairs:
        # [x_1 y_1; x_2 y_2; ...; x_{num_P} y_{num_P}]

        # 2.4 Adaptive learning and tabu archive
        for i = 1:num_P
            if new_points[i, 1] in centers_global
                r_centers[i] = r_centers_global[i]
                N_failures[i] = N_failures_global[i]
            end
            # In 1D the hypervolume is just the length of the interval
            # Need further explanation
            #println(Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts))
            #=
            if (Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts) < tau)
                # failure
                r_centers[i] = r_centers[i]/2
                N_failures[i] += 1
                if N_failures[i] > N_fail
                    push!(tabu, C[i])
                    push!(N_tenures_tabu, 0)
                end
            else
            =#
            # P_i is a success
            push!(r_centers_global, r_centers[i])
            push!(N_failures_global, N_failures[i])
            #end
        end
    end
end
```

As you can see, I commented out the bit on EHI; even though there are many conditions to pass to get to that point, the algorithm still performs OK.
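For completeness, a hypothetical usage sketch; the RadialBasis constructor and the SOP(p) argument are assumptions based on my reading of the API at the time, so treat them as illustrative only:

```julia
# Hypothetical usage sketch, not a verbatim package example
using Surrogates, LinearAlgebra
f = x -> (x - 3)^2
lb, ub = 0.0, 6.0
x = sample(20, lb, ub, SobolSample())
y = f.(x)
rad = RadialBasis(x, y, lb, ub, z -> norm(z), 1)   # assumed 1D radial basis constructor
surrogate_optimize(f, SOP(5), lb, ub, rad, UniformSample())  # p = 5 centers per iteration
```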

## Another Surrogate and plans for the future

At the moment I am reading these papers: "Multidimensional Lobachevsky Spline Integration on Scattered Data" and "Numerical integration on multivariate scattered data by Lobachevsky splines" by Giampiero Allasia. The main focus is to build a new surrogate type, the Lobachevsky spline, to fit data. I expect to open a PR with the 1D version of it very soon, following this issue: https://github.com/JuliaDiffEq/Surrogates.jl/issues/17

I love the fact that there are still many surrogates to build, so I can pick whichever suits my skills the most! (Knowing full well that I will implement them all sooner or later :-D)

Happy Coding,

Ludovico