Rigorous Interval Optimization with Taylor Models
In this blog post, I'm going to talk about how to improve interval optimization methods with the use of Taylor Models, in particular with the use of range bounders, which you can read more about it here.
Optimization
]add TaylorModels IntervalArithmetic BranchAndPruneusing Plotsusing TaylorModelsusing IntervalArithmeticusing BranchAndPruneusing BenchmarkToolsusing WebIO, Interactusing WebSocketsplot(); Formally, an optimization problem is defined by the following
formula not implementedThere are several methods to solve this kind of problems[1], in this blog post I'm going to focus on a simple method to solve plain (with no constraints) optimization problems. This method forms part of the so-called branch-and-bound methods; the key idea behind this method is to recursively partition (typically in disjoint sets to prevent the algorithm to visit the same place more than once) the search space and look for candidate solutions in each of the partitions. These candidate solutions are identified by means of some validated method, in particular by computing the bound for the function range on each partition. Once these candidate solutions are identified there we decide if are good enough or need to be further refined.
To make this more concrete, let's take a look at the next simple example.
Remark: It is advisable to run this blog post interactively, in order to be able to exploit the sliders that are included in some cases.
Suppose we want to find the minimum of the function inline_formula not implemented on the interval inline_formula not implemented. The first step is to compute the bound for the function range on this initial domain,
f(x) = 1 + x^5 - x^4D = 0 .. 1bound = f(D)box = IntervalBox(D, bound)xrange = range(D.lo, D.hi, length=100)plot(box, fill=:green, label="bound")plot!(xrange, x -> f(x), lw=2, label="f(x)", color=:red)ylims!(0.9, 1.1)The bound for this initial partition overestimates the function range greatly, so, the next step is to split this partition into two partitions of equal length and compute the bounds of each of them,
D = 0 .. 1Di = mince(D, 2)bounds = f.(Di)boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]p = plot(legend=false)for i in eachindex(boxes) plot!(boxes[i], c=:green)endplot!(xrange, x -> f(x), lw=2, c=:red)ylims!(0.9, 1.1)Now the bounds look better, and we can begin to think about how to discard uninteresting partitions (the ones that can't have the minimum). We can think in some sort of "threshold" that enables us to discard a partition if the lower bound for that partition lies above that threshold. For simplicity, let's take this threshold as the lowest upper bound of all partitions, then, as all lower bounds lie below that threshold we preserve all partitions and split them again; we call these survival partitions the active partitions.
D = 0 .. 1Di = mince(D, 4)bounds = f.(Di)boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]p = plot(legend=false)for i in eachindex(boxes) plot!(boxes[i], c=:green)endplot!(xrange, x -> f(x), lw=2, c=:red)ylims!(0.9, 1.1)Once again, the bounds were improved, and we can adjust the threshold to the new lowest upper bound. We can repeat this process until the width of the active partitions become smaller than some tolerance value. So, repeating this process we get,
D = 0 .. 1 for n in [2^i for i in 3:10] Di = mince(D, n) bounds = f.(Di) boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)] threshold = minimum([b.hi for b in bounds]) active_boxes = [box for box in boxes if box[2].lo < threshold] p = plot(legend=:topleft) plot!(boxes, c=:green, label="Unactive partitions") plot!(active_boxes, c=:blue, label="Active partitions") plot!(xrange, x -> f(x), lw=2, c=:red, label="f(x)") hline!([threshold], lw=2, c=:blue, label="Threshold value") ylims!(0.9, 1.05)endAs the number of partitions is increased we get a better estimate for the region that contains the minimum; this is the essence of the branch-and-bound method.
So, specifically, the steps for the branch-and-bound method are
Partition inline_formula not implemented into inline_formula not implemented disjoint sets.
Compute the bounds for each of these inline_formula not implementeds by some validated method (previously we used only interval arithmetic but is not restricted to this).
Update the threshold value and discard the uninteresting partitions.
Split the active partitions and go to 1.
This kind of strategy is actually implemented on the BranchAndPrune.jl package. So, let's re-write the previous example with the BranchAndPrune package. This package requests us 3 things:
A struct that contains some information about the search (the function, initial domain box, tolerance, etc.)
A new method for the function
BranchAndPrune.processthat tells the package how to process each of the subdomains, i.e., which calculation needs to be done on each of them.A new method for the function
BranchAndPrune.bisectthat specifies how to split each subdomain if required.
So, for the last example, we have
mutable struct IntervalSearch{T} <: AbstractBreadthFirstSearch{Interval{T}} f::Function initial::Interval{T} tol::Float64 cutoff::Float64endfunction BranchAndPrune.process(search::IntervalSearch, interval) bound = search.f(interval) # compute the bound on the Di subdomain if bound.lo > search.cutoff # if the lower bound is higher than the cutoff value # then we discard the subdomain, as can not contain the minimum return :discard, interval end if bound.hi < search.cutoff # if the upper bound is lower than the cutoff value # then we update it search.cutoff = bound.hi end if diam(interval) < search.tol # if the width of the subdomain is less than # the tolerance, we store that subdomain, as can # potentially have the minimum return :store, interval else # If not, split further return :bisect, interval endendfunction BranchAndPrune.bisect(::IntervalSearch, Di) Di = mince(Di, 2) return (Di...,)end# Now we define a function to actually doing the searchfunction run_search_interval(f, interval, tol; cutoff=1.) history = [] search = IntervalSearch(f, interval, tol, cutoff) local end_tree = nothing for working_tree in search end_tree = working_tree push!(history, data(end_tree)) end return data(end_tree), historyend f(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.xrange = range(D.lo, D.hi, length=100)final, history = run_search_interval(f, D, tol, cutoff=cutoff);history = unique([h for hh in history for h in hh])interval_y = interval(-20, 20)boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))plot!(final_boxes, color=:blue, label="Final boxes")plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)ylims!(0.9, 1)We see that it takes too many steps to actually bound the minimum region and that this region (in blue) is too large; this is a consequence of the overestimation for the range function by plain interval arithmetic. This can be improved if instead of using plain interval arithmetic to compute the bound on each subdomain, we use a Taylor Model since in general gives tighter bounds.
mutable struct TMSearch{T} <: AbstractBreadthFirstSearch{Interval{T}} f::Function initial::Interval{T} tol::Float64 cutoff::Float64 order::Intendfunction BranchAndPrune.process(search::TMSearch, interval) # Defining a Taylor Model with the midpoint as expansion point tm = TaylorModel1(order, mid(interval), interval) ftm = search.f(tm) bound = ftm(ftm.dom - ftm.x0) # compute the bound on the Di subdomain if bound.lo > search.cutoff # if the lower bound is higher than the cutoff value # then we discard the subdomain, as can not contain the minimum return :discard, interval end if bound.hi < search.cutoff # if the upper bound is lower than the cutoff value # then we update it search.cutoff = bound.hi end if diam(interval) < search.tol # if the width of the subdomain is less than # the tolerance, we store that subdomain, as can # potentially have the minimum return :store, interval else # If not, split further return :bisect, interval endendfunction BranchAndPrune.bisect(::TMSearch, Di) Di = mince(Di, 2) return (Di...,)endfunction run_search_tm(f, interval, tol; cutoff=1., order=3) history = [] search = TMSearch(f, interval, tol, cutoff, order) local end_tree = nothing for working_tree in search end_tree = working_tree push!(history, data(end_tree)) end return data(end_tree), historyendf(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.order = 3xrange = range(D.lo, D.hi, length=100)final, history = run_search_tm(f, D, tol, cutoff=cutoff, order=order);history = unique([h for hh in history for h in hh])interval_y = interval(-20, 20)boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))plot!(final_boxes, color=:blue, label="Final boxes")plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)ylims!(0.9, 1)We see a huge impact on the results with the use of a (naive) Taylor Model of order 3, now it takes fewer splits to actually find the minimum and the region that contains it is much tighter.
While in this case the naive Taylor Model gives good results, this can be improved; let's repeat this example but with the use of range bounders.
mutable struct BounderSearch{T} <: AbstractBreadthFirstSearch{Interval{T}} f::Function initial::Interval{T} tol::Float64 cutoff::Float64 order::Int bounder_type::Stringendfunction BranchAndPrune.process(search::BounderSearch, interval) # Defining a Taylor Model with the midpoint as expansion point tm = TaylorModel1(order, mid(interval), interval) ftm = search.f(tm) if search.bounder_type == "ldb" bound = linear_dominated_bounder(ftm) elseif search.bounder_type == "qfb" bound = quadratic_fast_bounder(ftm) else bound = ftm(ftm.dom - ftm.x0) end if bound.lo > search.cutoff # if the lower bound is higher than the cutoff value # then we discard the subdomain, as can not contain the minimum return :discard, interval end if bound.hi < search.cutoff # if the upper bound is lower than the cutoff value # then we update it search.cutoff = bound.hi end if diam(interval) < search.tol # if the width of the subdomain is less than # the tolerance, we store that subdomain, as can # potentially have the minimum return :store, interval else # If not, split further return :bisect, interval endendfunction BranchAndPrune.bisect(::BounderSearch, Di) Di = mince(Di, 2) return (Di...,)endfunction run_search_bounder(f, interval, tol; cutoff=1., order=3, bounder=:ldb) history = [] search = BounderSearch(f, interval, tol, cutoff, order, bounder) local end_tree = nothing for working_tree in search end_tree = working_tree push!(history, data(end_tree)) end return data(end_tree), historyendf(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.order = 3final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="ldb")final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="qfb")final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="naive")xrange = range(D.lo, D.hi, length=100) for bounder in ("ldb", "qfb", "naive") if bounder == "ldb" final, history = final_ldb, history_ldb elseif bounder == "qfb" final, history = final_qfb, history_qfb elseif bounder == "naive" final, history = final_naive, history_naive end history = unique([h for hh in history for h in hh]) interval_y = interval(-20, 20) boxes = [IntervalBox(interval_x, interval_y) for interval_x in history] final_boxes = [IntervalBox(final_x, interval_y) for final_x in final] plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800)) plot!(final_boxes, color=:blue, label="Final boxes") plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2) ylims!(0.9, 1)endf(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 D = -2 .. 4tol = 1e-1cutoff = 2.order = 3final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="ldb")final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="qfb")final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff, order=order, bounder="naive")xrange = range(D.lo, D.hi, length=100) for bounder in ("ldb", "qfb", "naive") if bounder == "ldb" final, history = final_ldb, history_ldb elseif bounder == "qfb" final, history = final_qfb, history_qfb elseif bounder == "naive" final, history = final_naive, history_naive end history = unique([h for hh in history for h in hh]) interval_y = interval(-20, 20) boxes = [IntervalBox(interval_x, interval_y) for interval_x in history] final_boxes = [IntervalBox(final_x, interval_y) for final_x in final] plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800)) plot!(final_boxes, color=:blue, label="Final boxes") plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2) #ylims!(-10, 10)endFrom the previous interactive plots, we see that for some functions, the use of bounders improves the search, but for others, the results are the same as with naive Taylor Model. So, a natural question arises, are these extra computations due to the range bounder worth it?
To answer this, let's compute the timings for each type of search: interval-based, Taylor Model-based, and with range bounders.
f(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 D = -2 .. 4tol = 1e-1cutoff = 2.order = 3 final, history = run_search_interval($f, $D, $tol, cutoff=$cutoff);bounder = "naive" final_naive, history_naive = run_search_bounder($f, $D, $tol, cutoff=$cutoff, order=$order, bounder=$bounder);bounder = "ldb" final_ldb, history_ldb = run_search_bounder($f, $D, $tol, cutoff=$cutoff, order=$order, bounder=$bounder);bounder = "qfb" final_qfb, history_qfb = run_search_bounder($f, $D, $tol, cutoff=$cutoff, order=$order, bounder=$bounder);We see the price for improving the search, the Taylor Model-based searches are 3 orders of magnitude slower! On the other hand, we see that naive and qfb methods have similar computation times, meanwhile, the ldb algorithm has a slightly higher time.
Finally, let's see how this works for two-dimensional functions.
# This struct must be a subtype of AbstractBreadthFirstSearchmutable struct IntervalMinimumSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}} f::Function initial::IntervalBox{N, T} tol::Float64 cutoff::Float64endmutable struct BounderSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}} f::Function initial::IntervalBox{N, T} tol::Float64 cutoff::Float64 order::Int bounder_type::Stringendfunction define_taylor_model(dom::IntervalBox, X0, f; order=3) zi = 0 .. 0 order_max = 2 * (order + 1) set_variables(Float64, [:x, :y], order=order_max) xT = TaylorN(Float64, 1, order=order) yT = TaylorN(Float64, 2, order=order) xT = TaylorN(deepcopy(evaluate(xT, get_variables(order) .+ mid(X0)).coeffs)) yT = TaylorN(deepcopy(evaluate(yT, get_variables(order) .+ mid(X0)).coeffs)) xm = TaylorModelN{2, Float64, Float64}(xT, zi, X0, dom) ym = TaylorModelN{2, Float64, Float64}(yT, zi, X0, dom) ftm = f(xm, ym) return ftmendfunction BranchAndPrune.process(search::BounderSearch, interval) X0 = IntervalBox(mid(interval)) ftm = define_taylor_model(interval, X0, search.f, order=search.order) if search.bounder_type == "ldb" bound = linear_dominated_bounder(ftm, max_iter=5, ϵ=1e-15) elseif search.bounder_type == "qfb" bound = quadratic_fast_bounder(ftm) else bound = ftm(ftm.dom - ftm.x0) end if bound.lo > search.cutoff # if the lower bound is higher than the cutoff value # then we discard the subdomain, as can not contain the minimum return :discard, interval end if bound.hi < search.cutoff # if the upper bound is lower than the cutoff value # then we update it search.cutoff = bound.hi end if prod(diam.(interval)) < search.tol # if the width of the subdomain is less than # the tolerance, we store that subdomain, as can # potentially have the minimum return :store, interval else # If not, split further return :bisect, interval endend# Function to process each subdomainfunction BranchAndPrune.process(search::IntervalMinimumSearch, interval) bound = search.f(interval) if bound.lo > search.cutoff return :discard, interval end if bound.hi < search.cutoff search.cutoff = bound.hi end if prod(diam.(interval)) < search.tol return :store, interval else return :bisect, interval endendfunction BranchAndPrune.bisect(::Union{IntervalMinimumSearch, BounderSearch}, box) interval_x = box[1] interval_y = box[2] w_x, w_y = diam.(box) mid_x, mid_y = mid.(box) if w_x > w_y return IntervalBox(interval_x.lo .. mid_x, interval_y), IntervalBox(mid_x .. interval_x.hi, interval_y) else return IntervalBox(interval_x, interval_y.lo .. mid_y), IntervalBox(interval_x, mid_y .. interval_y.hi) endendfunction run_search_interval(f, interval, tol; cutoff=1.) history = [] search = IntervalMinimumSearch(f, interval, tol, cutoff) local end_tree = nothing for working_tree in search end_tree = working_tree push!(history, data(end_tree)) end return data(end_tree), historyend function run_search_bounder(f, interval, tol; bounder_type="ldb", order=3, cutoff=1.) history = [] search = BounderSearch(f, interval, tol, cutoff, order, bounder_type) local end_tree = nothing for working_tree in search end_tree = working_tree push!(history, data(end_tree)) end return data(end_tree), historyendf(x1, x2) = (1.5 - x1 * (1 - x2))^2 + (2.25 - x1 * (1 - x2^2)) * (2.25 - x1 * (1 - x2^2)) + (2.625 - x1 * (1 - x2^3)) * (2.625 - x1 * (1 - x2^3))f(box) = f(box[1], box[2])box = IntervalBox(-4.5 .. 4.5, -4.5 .. 4.5)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive", order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb", order=order)history_ldb = unique([h for hh in history_ldb for h in hh])final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb", order=order)history_qfb = unique([h for hh in history_qfb for h in hh])searches = (interval = (final_interval, history_interval), naive = (final_naive, history_naive), ldb = (final_ldb, history_ldb), qfb = (final_qfb, history_qfb)) for search_type in ("interval", "naive", "ldb", "qfb") final, history = searches[Symbol(search_type)] plot(history, c=:wheat2, legend=false, ratio=:equal, title="Minimum search for Beale function") plot!(final, c=:blue) scatter!([3], [0.5], c=:red, ms=3) xlims!(box[1].lo, box[2].hi) ylims!(box[2].lo, box[2].hi)endf(x,y)=sin(x+y)+(x-y)^2-1.5x+2.5y+1f(box) = f(box[1], box[2])box = IntervalBox(-1.5 .. 4., -3. .. 4.)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive", order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb", order=order)history_ldb = unique([h for hh in history_ldb for h in hh])searches = (interval = (final_interval, history_interval), naive = (final_naive, history_naive), ldb = (final_ldb, history_ldb)) for search_type in ("interval", "naive", "ldb") final, history = searches[Symbol(search_type)] plot(history, c=:wheat2, legend=false, ratio=:equal, title="Minimum search for McCormick function") plot!(final, c=:blue) scatter!([-0.54719], [-1.54719], c=:red, ms=3) xlims!(box[1].lo, box[1].hi) ylims!(box[2].lo, box[2].hi)endf(x, y) = (1 + (x + y + 1)^2 * (19 - 14x + 3x^2 - 14y + 6 * x * y + 3y^2)) * (30 + (2x - 3y)^2 * (18 - 32x + 12x^2 + 48y - 36 * x * y + 27y^2))f(box) = f(box[1], box[2])box = IntervalBox(-2. .. 2., -2. .. 2.)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive", order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb", order=order)history_ldb = unique([h for hh in history_ldb for h in hh])final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb", order=order)history_qfb = unique([h for hh in history_qfb for h in hh])searches = (interval = (final_interval, history_interval), naive = (final_naive, history_naive), ldb = (final_ldb, history_ldb), qfb = (final_qfb, history_qfb)) for search_type in ("interval", "naive", "ldb", "qfb") final, history = searches[Symbol(search_type)] plot(history, c=:wheat2, legend=false, ratio=:equal, title="Minimum search for Goldstein–Price function") plot!(final, c=:blue) scatter!([0.], [-1.], c=:red, ms=3) xlims!(box[1].lo, box[1].hi) ylims!(box[2].lo, box[2].hi)endAs illustrated in the last plots, in general, the ldb algorithm gives the best results among all Taylor Model-based searches, and in every case, the Taylor Model-based search surpasses the search with plain interval methods.
References
[1] Numerical Optimization, Nocedal & Wright, Springer.