More optimization methods and future surrogates
In the last article I talked about two optimization methods: SRBF and LCBS.
In the meantime I have successfully coded three more optimization methods: DYCORS, EI and SOP. As I write, I have also added another surrogate, LinearSurrogate, which is under review at: https://github.com/JuliaDiffEq/Surrogates.jl/pull/42
It was very low-hanging fruit that I had not gotten around to, so I blitzed it out in the last two days.
Let's now take a look at the three new optimization methods:
DYCORS
The DYCORS optimization method is an extension of the SRBF method. The main change is that each coordinate of the current best point is perturbed only with a certain probability, and this probability decreases as the optimization progresses.
From the pySOT documentation: "The main idea is that many objective functions depend only on a few directions so it may be advantageous to perturb only a few directions. In particular, we use a perturbation probability to perturb a given coordinate and decrease this probability after each function evaluation so fewer coordinates are perturbed later in the optimization."
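To make the idea of a decaying perturbation probability concrete, here is a small standalone sketch (not the package code) of how such a coordinate mask can be drawn; the decay formula is the DYCORS-style one used in the implementation below:

```julia
# Sketch only: decide which of the d coordinates to perturb at iteration k.
function perturbation_mask(d, k, maxiters)
    # probability of perturbing a coordinate, decaying as k grows
    p_select = min(20 / d, 1) * (1 - log(k) / log(maxiters - 1))
    mask = rand(d) .< p_select
    if !any(mask)
        mask[rand(1:d)] = true   # always perturb at least one coordinate
    end
    return mask
end

perturbation_mask(10, 2, 100)   # early on: most coordinates get perturbed
perturbation_mask(10, 90, 100)  # late on: only a few coordinates get perturbed
```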
Below is the code:
```julia
# maxiters and num_new_samples are keyword arguments used below (default values assumed here);
# norm comes from LinearAlgebra, Normal from Distributions.
function surrogate_optimize(obj, ::DYCORS, lb, ub, surrn::AbstractSurrogate, sample_type;
                            maxiters = 100, num_new_samples = 100)
    x_best = collect(surrn.x[argmin(surrn.y)])
    y_best = minimum(surrn.y)
    sigma_n = 0.2*norm(ub-lb)
    d = length(lb)
    sigma_min = 0.2*(0.5)^6*norm(ub-lb)
    t_success = 3
    t_fail = max(d,5)
    C_success = 0
    C_fail = 0
    for k = 1:maxiters
        # perturbation probability, decreasing with the iteration counter k
        p_select = min(20/d,1)*(1 - log(k)/log(maxiters-1))
        new_points = zeros(eltype(surrn.x[1]),num_new_samples,d)
        for j = 1:num_new_samples
            w = sample(d,0,1,sample_type)
            I_perturb = w .< p_select
            if ~(true in I_perturb)
                # perturb at least one randomly chosen coordinate
                val = rand(1:d)
                I_perturb = vcat(zeros(Int,val-1),1,zeros(Int,d-val))
            end
            I_perturb = Int.(I_perturb)
            for i = 1:d
                if I_perturb[i] == 1
                    new_points[j,i] = x_best[i] + rand(Normal(0,sigma_n))
                else
                    new_points[j,i] = x_best[i]
                end
            end
        end
        # reflect candidates that fall outside the box [lb,ub]
        for i = 1:num_new_samples
            for j = 1:d
                while new_points[i,j] < lb[j] || new_points[i,j] > ub[j]
                    if new_points[i,j] > ub[j]
                        new_points[i,j] = maximum(surrn.x)[j] - norm(new_points[i,j] - maximum(surrn.x)[j])
                    end
                    if new_points[i,j] < lb[j]
                        new_points[i,j] = minimum(surrn.x)[j] + norm(new_points[i,j] - minimum(surrn.x)[j])
                    end
                end
            end
        end
        x_new = select_evaluation_point_ND(new_points,surrn,k,maxiters)
        f_new = obj(x_new)
        if f_new < y_best
            C_success = C_success + 1
            C_fail = 0
        else
            C_fail = C_fail + 1
            C_success = 0
        end
        sigma_n,C_success,C_fail = adjust_step_size(sigma_n,sigma_min,C_success,t_success,C_fail,t_fail)
        if f_new < y_best
            x_best = x_new
            y_best = f_new
            add_point!(surrn,Tuple(x_best),y_best)
        end
    end
end
```
There are two accessory functions:
- select_evaluation_point_ND
- adjust_step_size
The former selects the candidate point that minimizes a convex combination of criteria, just like in SRBF.
The latter is self-explanatory: given the number of failures and successes so far, it adjusts the step size sigma.
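A simplified version of that rule could look like this (a sketch of the usual SRBF/DYCORS scheme, not necessarily the exact code in the package):

```julia
# Sketch: double sigma after t_success consecutive successes, halve it after
# t_fail consecutive failures, never going below sigma_min.
function adjust_step_size_sketch(sigma_n, sigma_min, C_success, t_success, C_fail, t_fail)
    if C_success >= t_success
        sigma_n = 2 * sigma_n
        C_success = 0
    end
    if C_fail >= t_fail
        sigma_n = max(sigma_n / 2, sigma_min)
        C_fail = 0
    end
    return sigma_n, C_success, C_fail
end
```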
EI
The EI optimization method stands for Expected Improvement.
The main idea is to maximize the expected value of improvement:
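For a candidate point $x$, the improvement over the current best (largest) observed value $f_{\max}$, with a small margin $\epsilon$, is $\max(f(x) - f_{\max} - \epsilon, 0)$, and its expected value under the Kriging prediction has the classic closed form (this is exactly the quantity computed in the inner loop of the code below):

$$\mathrm{EI}(x) = \mathbb{E}\big[\max(f(x) - f_{\max} - \epsilon,\, 0)\big] = \big(\mu(x) - f_{\max} - \epsilon\big)\,\Phi(z) + \sigma(x)\,\varphi(z), \qquad z = \frac{\mu(x) - f_{\max} - \epsilon}{\sigma(x)},$$

where $\mu(x)$ and $\sigma(x)$ are the Kriging prediction and its standard error at $x$, and $\Phi$, $\varphi$ are the CDF and PDF of the standard normal distribution.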
As you can see, an expected value pops up: we can compute it in closed form only under a Gaussian process prior, which in our case means the Kriging surrogate.
```julia
# maxiters and num_new_samples are keyword arguments used below (default values assumed here);
# cdf, pdf and Normal come from Distributions.
function surrogate_optimize(obj, ::EI, lb, ub, krig::Kriging, sample_type;
                            maxiters = 100, num_new_samples = 100)
    dtol = 1e-3*norm(ub-lb)
    eps = 0.01
    for i = 1:maxiters
        new_sample = sample(num_new_samples,lb,ub,sample_type)
        f_max = maximum(krig.y)
        evaluations = zeros(eltype(krig.x[1]),num_new_samples)
        point_found = false
        new_x_max = zero(eltype(krig.x[1]))
        new_y_max = zero(eltype(krig.x[1]))
        while point_found == false
            # expected improvement of every candidate point
            for j = 1:length(new_sample)
                std = std_error_at_point(krig,new_sample[j])
                u = krig(new_sample[j])
                if abs(std) > 1e-6
                    z = (u - f_max - eps)/std
                else
                    z = 0
                end
                evaluations[j] = (u-f_max-eps)*cdf(Normal(),z) + std*pdf(Normal(),z)
            end
            index_max = argmax(evaluations)
            x_new = new_sample[index_max]
            y_new = maximum(evaluations)
            diff_x = abs.(krig.x .- x_new)
            bit_x = diff_x .> dtol  # the new point has to have some distance from krig.x
            if false in bit_x
                # the new point is not actually that new, discard it!
                deleteat!(evaluations,index_max)
                deleteat!(new_sample,index_max)
                if length(new_sample) == 0
                    println("Out of sampling points")
                    return
                end
            else
                point_found = true
                new_x_max = x_new
                new_y_max = y_new
            end
        end
        # stop when the expected improvement becomes negligible
        if new_y_max < 1e-6*norm(maximum(krig.y)-minimum(krig.y))
            return
        end
        add_point!(krig,new_x_max,obj(new_x_max))
    end
end
```
As you can see, we make use of probability theory by calling the PDF and CDF of the standard normal distribution. The merit function is the surrogate itself.
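For completeness, a usage sketch might look like the following. I am assuming here a 1D `Kriging(x, y, p)` constructor, the `sample`/`UniformSample` helpers and an argument-free `EI()` type, so check the package for the exact signatures; note that, as the `f_max` in the code shows, this EI formulation seeks the maximum of the objective.

```julia
# Hypothetical usage sketch (constructor and helper signatures are assumptions).
using Surrogates

f = x -> (x - 2)^2 * sin(3x)              # toy objective
lb, ub = 0.0, 5.0

x = sample(10, lb, ub, UniformSample())   # initial design
y = f.(x)
krig = Kriging(x, y, 1.5)                 # 1D Kriging surrogate

surrogate_optimize(f, EI(), lb, ub, krig, UniformSample())
println(maximum(krig.y))                  # best (largest) objective value found
```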
SOP
The SOP optimization method is implemented following the paper "SOP: parallel surrogate global optimization with Pareto center selection for computationally expensive single objective problems" by Krityakierne, Akhtar and Shoemaker.
There are many new ideas:
- Look for new points to add in parallel.
- Two-tier ranking of previously evaluated points.
- Points are perturbed with a truncated normal distribution (see the short sketch after this list).
- Keep a tabu list of points that did not produce any improvement, so they can be avoided.
- Measure the improvement using the "Expected Hypervolume Improvement" (EHI).
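To illustrate the truncated-normal perturbation on its own, here is a minimal standalone sketch using Distributions.jl; the center, radius and bound names are made up for the example:

```julia
using Distributions

# Perturb a center c with a normal of standard deviation r, truncated so that
# c + step always stays inside [lb, ub].
function perturb_center(c, r, lb, ub; n_candidates = 5)
    a, b = lb - c, ub - c                              # allowed range for the step
    steps = rand(TruncatedNormal(0, r, a, b), n_candidates)
    return c .+ steps                                  # candidates, all inside [lb, ub]
end

perturb_center(2.0, 0.5, 0.0, 5.0)
```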
At the moment the code does not run in parallel, but I expect to add this feature in the near future. My mentor and I do not yet fully understand EHI from the paper's definition, so it is ignored at this stage. This means that the code converges to the minimum more slowly; nevertheless it behaves quite well and searches the whole sample space uniformly.
Code below:
```julia
# maxiters and num_new_samples are keyword arguments used below (default values assumed here);
# TruncatedNormal comes from Distributions.
function surrogate_optimize(obj, sop1::SOP, lb, ub, surrSOP::AbstractSurrogate, sample_type;
                            maxiters = 100, num_new_samples = 100)
    d = length(lb)
    N_fail = 3
    N_tenure = 5
    tau = 10^-5
    num_P = sop1.p
    centers_global = surrSOP.x
    r_centers_global = 0.2*norm(ub-lb)*ones(length(surrSOP.x))
    N_failures_global = zeros(length(surrSOP.x))
    tabu = []
    N_tenures_tabu = []
    for k = 1:maxiters
        N_tenures_tabu .+= 1
        # delete points that have been in the tabu list for too long
        del = N_tenures_tabu .> N_tenure
        if any(del)
            deleteat!(N_tenures_tabu,del)
            deleteat!(tabu,del)
        end

        ##### P CENTERS ######
        C = []
        # S: set of points already evaluated
        # Rank points in S with:
        # 1) Non-dominated sorting
        Fronts_I = I_tier_ranking_1D(centers_global,surrSOP)
        # 2) Second tier ranking
        Fronts = II_tier_ranking_1D(Fronts_I,surrSOP)
        ranked_list = []
        for i = 1:length(Fronts)
            for j = 1:length(Fronts[i])
                push!(ranked_list,Fronts[i][j])
            end
        end
        ranked_list = eltype(surrSOP.x[1]).(ranked_list)

        centers_full = 0
        i = 1
        while i <= length(ranked_list) && centers_full == 0
            # flag = 1 if a candidate is too close to a tabu point or to an old center
            flag = 0
            for j = 1:length(ranked_list)
                for k = 1:length(tabu)
                    if abs(ranked_list[j]-tabu[k]) < tau
                        flag = 1
                    end
                end
                for l = 1:length(centers_global)
                    if abs(ranked_list[j]-centers_global[l]) < tau
                        flag = 1
                    end
                end
            end
            if flag == 1
                # skip this point
            else
                push!(C,ranked_list[i])
                if length(C) == num_P
                    centers_full = 1
                end
            end
            i = i + 1
        end

        # I examined all the points in the ranked list but num_selected < num_P:
        # iterate again using only the radius rule
        if length(C) < num_P
            i = 1
            while i <= length(ranked_list) && centers_full == 0
                flag = 0
                for j = 1:length(ranked_list)
                    for l = 1:length(centers_global)
                        if abs(centers_global[l] - ranked_list[j]) < tau
                            flag = 1
                        end
                    end
                end
                if flag == 1
                    # skip this point
                else
                    push!(C,ranked_list[i])
                    if length(C) == num_P
                        centers_full = 1
                    end
                end
                i = i + 1
            end
        end

        # If I still have num_selected < num_P, I double down on some centers iteratively
        if length(C) < num_P
            i = 1
            while i <= length(ranked_list) && centers_full == 0
                push!(C,ranked_list[i])
                if length(C) == num_P
                    centers_full = 1
                end
                i = i + 1
            end
        end

        # Here C contains the centers
        r_centers = 0.2*norm(ub-lb)*ones(num_P)
        N_failures = zeros(num_P)

        # 2.3 Candidate search
        new_points = zeros(eltype(surrSOP.x[1]),num_P,2)
        for i = 1:num_P
            N_candidates = zeros(eltype(surrSOP.x[1]),num_new_samples)
            # Using phi(n) just like DYCORS, merit function = surrogate
            # Like in DYCORS, I_perturb = 1 always
            evaluations = zeros(eltype(surrSOP.y[1]),num_new_samples)
            for j = 1:num_new_samples
                a = lb - C[i]
                b = ub - C[i]
                N_candidates[j] = C[i] + rand(TruncatedNormal(0,r_centers[i],a,b))
                evaluations[j] = surrSOP(N_candidates[j])
            end
            x_best = N_candidates[argmin(evaluations)]
            y_best = minimum(evaluations)
            new_points[i,1] = x_best
            new_points[i,2] = y_best
        end
        # new_points now contains [x_1 y_1; x_2 y_2; ...; x_{num_P} y_{num_P}]

        # 2.4 Adaptive learning and tabu archive
        for i = 1:num_P
            if new_points[i,1] in centers_global
                r_centers[i] = r_centers_global[i]
                N_failures[i] = N_failures_global[i]
            end
            # In 1D the hypervolume is just the length of the interval; needs further explanation
            #println(Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts[1]))
            #=
            if (Hypervolume_Pareto_improving_1D(new_points[i,1],Fronts[1]) < tau)
                # failure
                r_centers[i] = r_centers[i]/2
                N_failures[i] += 1
                if N_failures[i] > N_fail
                    push!(tabu,C[i])
                    push!(N_tenures_tabu,0)
                end
            else
            =#
            # P_i is a success: adaptive learning
            add_point!(surrSOP,new_points[i,1],new_points[i,2])
            push!(r_centers_global,r_centers[i])
            push!(N_failures_global,N_failures[i])
            #end
        end
    end
end
```
As you can see, I commented out the part about EHI; given that there are already many conditions a candidate has to pass to even get to that point, the algorithm still performs OK without it.
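A hypothetical usage sketch, with the same assumed constructors and helpers as in the EI example above, would look like this; the only SOP-specific ingredient is the number of centers per iteration, which the code above reads from the field `p` via `sop1.p`:

```julia
using Surrogates

f = x -> exp(-x) * sin(5x)
lb, ub = 0.0, 3.0

x = sample(15, lb, ub, UniformSample())
y = f.(x)
surr = Kriging(x, y, 1.5)

# SOP(5): select 5 centers per iteration (stored in the field p)
surrogate_optimize(f, SOP(5), lb, ub, surr, UniformSample())
```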
Another Surrogate and plans for the future
At the moment I am reading these papers: "Multidimensional Lobachevsky Spline Integration on Scattered Data" and "Numerical integration on multivariate scattered data by Lobachevsky splines" by Giampiero Allasia. The main focus is to build a new surrogate type, the Lobachevsky spline, to fit data. I expect to open a PR with the 1D version of it very soon. This follows the issue: https://github.com/JuliaDiffEq/Surrogates.jl/issues/17
I love the fact that there are still many surrogates to build, so I can pick whatever suits my skills the most! (Knowing full well that I will implement them all sooner or later :-D)
Happy Coding,
Ludovico