Uziel Linares / Aug 31 2020

Remix of Julia by

Rigorous Interval Optimization with Taylor Models

In this blog post, I'm going to talk about how to improve interval optimization methods with the use of Taylor Models, in particular with the use of range bounders, which you can read more about it here.

Optimization

]add TaylorModels IntervalArithmetic BranchAndPrune

71.3s

Julia

using Plotsusing TaylorModelsusing IntervalArithmeticusing BranchAndPruneusing BenchmarkToolsusing WebIO, Interactusing WebSockets

166.4s

Julia

plot();

33.2s

Julia

Formally, an optimization problem is defined by the following

formula not implemented

There are several methods to solve this kind of problems[1], in this blog post I'm going to focus on a simple method to solve plain (with no constraints) optimization problems. This method forms part of the so-called branch-and-bound methods; the key idea behind this method is to recursively partition (typically in disjoint sets to prevent the algorithm to visit the same place more than once) the search space and look for candidate solutions in each of the partitions. These candidate solutions are identified by means of some validated method, in particular by computing the bound for the function range on each partition. Once these candidate solutions are identified there we decide if are good enough or need to be further refined.

To make this more concrete, let's take a look at the next simple example.

Remark: It is advisable to run this blog post interactively, in order to be able to exploit the sliders that are included in some cases.

Suppose we want to find the minimum of the function inline_formula not implemented on the interval inline_formula not implemented. The first step is to compute the bound for the function range on this initial domain,

f(x) = 1 + x^5 - x^4D = 0 .. 1bound = f(D)box = IntervalBox(D, bound)xrange = range(D.lo, D.hi, length=100)plot(box, fill=:green, label="bound")plot!(xrange, x -> f(x), lw=2, label="f(x)", color=:red)ylims!(0.9, 1.1)

1.0s

Julia

The bound for this initial partition overestimates the function range greatly, so, the next step is to split this partition into two partitions of equal length and compute the bounds of each of them,

D = 0 .. 1Di = mince(D, 2)bounds = f.(Di)boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]p = plot(legend=false)for i in eachindex(boxes)  plot!(boxes[i], c=:green)endplot!(xrange, x -> f(x), lw=2, c=:red)ylims!(0.9, 1.1)

1.6s

Julia

Now the bounds look better, and we can begin to think about how to discard uninteresting partitions (the ones that can't have the minimum). We can think in some sort of "threshold" that enables us to discard a partition if the lower bound for that partition lies above that threshold. For simplicity, let's take this threshold as the lowest upper bound of all partitions, then, as all lower bounds lie below that threshold we preserve all partitions and split them again; we call these survival partitions the active partitions.

D = 0 .. 1Di = mince(D, 4)bounds = f.(Di)boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]p = plot(legend=false)for i in eachindex(boxes)  plot!(boxes[i], c=:green)endplot!(xrange, x -> f(x), lw=2, c=:red)ylims!(0.9, 1.1)

1.0s

Julia

Once again, the bounds were improved, and we can adjust the threshold to the new lowest upper bound. We can repeat this process until the width of the active partitions become smaller than some tolerance value. So, repeating this process we get,

D = 0 .. 1@manipulate for n in [2^i for i in 3:10]  Di = mince(D, n)  bounds = f.(Di)  boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]  threshold = minimum([b.hi for b in bounds])  active_boxes = [box for box in boxes if box[2].lo < threshold]  p = plot(legend=:topleft)  plot!(boxes, c=:green, label="Unactive partitions")  plot!(active_boxes, c=:blue, label="Active partitions")  plot!(xrange, x -> f(x), lw=2, c=:red, label="f(x)")  hline!([threshold], lw=2, c=:blue, label="Threshold value")  ylims!(0.9, 1.05)end

1.6s

Julia

As the number of partitions is increased we get a better estimate for the region that contains the minimum; this is the essence of the branch-and-bound method.

So, specifically, the steps for the branch-and-bound method are

Partition inline_formula not implemented into inline_formula not implemented disjoint sets.
Compute the bounds for each of these inline_formula not implementeds by some validated method (previously we used only interval arithmetic but is not restricted to this).
Update the threshold value and discard the uninteresting partitions.
Split the active partitions and go to 1.

This kind of strategy is actually implemented on the BranchAndPrune.jl package. So, let's re-write the previous example with the BranchAndPrune package. This package requests us 3 things:

A struct that contains some information about the search (the function, initial domain box, tolerance, etc.)
A new method for the function BranchAndPrune.process that tells the package how to process each of the subdomains, i.e., which calculation needs to be done on each of them.
A new method for the function BranchAndPrune.bisect that specifies how to split each subdomain if required.

So, for the last example, we have

mutable struct IntervalSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}  f::Function  initial::Interval{T}  tol::Float64  cutoff::Float64endfunction BranchAndPrune.process(search::IntervalSearch, interval)  bound = search.f(interval) # compute the bound on the Di subdomain    if bound.lo > search.cutoff    # if the lower bound is higher than the cutoff value    # then we discard the subdomain, as can not contain the minimum    return :discard, interval   end    if bound.hi < search.cutoff    # if the upper bound is lower than the cutoff value    # then we update it    search.cutoff = bound.hi  end    if diam(interval) < search.tol    # if the width of the subdomain is less than    # the tolerance, we store that subdomain, as can    # potentially have the minimum    return :store, interval  else    # If not, split further    return :bisect, interval  endendfunction BranchAndPrune.bisect(::IntervalSearch, Di)  Di = mince(Di, 2)  return (Di...,)end# Now we define a function to actually doing the searchfunction run_search_interval(f, interval, tol; cutoff=1.)  history = []  search = IntervalSearch(f, interval, tol, cutoff)    local end_tree = nothing  for working_tree in search    end_tree = working_tree    push!(history, data(end_tree))  end    return data(end_tree), historyend 

0.2s

Julia

run_search_interval (generic function with 1 method)

f(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.xrange = range(D.lo, D.hi, length=100)final, history = run_search_interval(f, D, tol, cutoff=cutoff);history = unique([h for hh in history for h in hh])interval_y = interval(-20, 20)boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))plot!(final_boxes, color=:blue, label="Final boxes")plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)ylims!(0.9, 1)

1.7s

Julia

We see that it takes too many steps to actually bound the minimum region and that this region (in blue) is too large; this is a consequence of the overestimation for the range function by plain interval arithmetic. This can be improved if instead of using plain interval arithmetic to compute the bound on each subdomain, we use a Taylor Model since in general gives tighter bounds.

mutable struct TMSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}  f::Function  initial::Interval{T}  tol::Float64  cutoff::Float64  order::Intendfunction BranchAndPrune.process(search::TMSearch, interval)  # Defining a Taylor Model with the midpoint as expansion point  tm = TaylorModel1(order, mid(interval), interval)  ftm = search.f(tm)  bound = ftm(ftm.dom - ftm.x0) # compute the bound on the Di subdomain    if bound.lo > search.cutoff    # if the lower bound is higher than the cutoff value    # then we discard the subdomain, as can not contain the minimum    return :discard, interval   end    if bound.hi < search.cutoff    # if the upper bound is lower than the cutoff value    # then we update it    search.cutoff = bound.hi  end    if diam(interval) < search.tol    # if the width of the subdomain is less than    # the tolerance, we store that subdomain, as can    # potentially have the minimum    return :store, interval  else    # If not, split further    return :bisect, interval  endendfunction BranchAndPrune.bisect(::TMSearch, Di)  Di = mince(Di, 2)  return (Di...,)endfunction run_search_tm(f, interval, tol; cutoff=1., order=3)  history = []  search = TMSearch(f, interval, tol, cutoff, order)    local end_tree = nothing  for working_tree in search    end_tree = working_tree    push!(history, data(end_tree))  end    return data(end_tree), historyend

0.5s

Julia

run_search_tm (generic function with 1 method)

f(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.order = 3xrange = range(D.lo, D.hi, length=100)final, history = run_search_tm(f, D, tol, cutoff=cutoff, order=order);history = unique([h for hh in history for h in hh])interval_y = interval(-20, 20)boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))plot!(final_boxes, color=:blue, label="Final boxes")plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)ylims!(0.9, 1)

5.8s

Julia

We see a huge impact on the results with the use of a (naive) Taylor Model of order 3, now it takes fewer splits to actually find the minimum and the region that contains it is much tighter.

While in this case the naive Taylor Model gives good results, this can be improved; let's repeat this example but with the use of range bounders.

mutable struct BounderSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}  f::Function  initial::Interval{T}  tol::Float64  cutoff::Float64  order::Int  bounder_type::Stringendfunction BranchAndPrune.process(search::BounderSearch, interval)  # Defining a Taylor Model with the midpoint as expansion point  tm = TaylorModel1(order, mid(interval), interval)  ftm = search.f(tm)    if search.bounder_type == "ldb"    bound = linear_dominated_bounder(ftm)  elseif search.bounder_type == "qfb"    bound = quadratic_fast_bounder(ftm)  else    bound = ftm(ftm.dom - ftm.x0)  end    if bound.lo > search.cutoff    # if the lower bound is higher than the cutoff value    # then we discard the subdomain, as can not contain the minimum    return :discard, interval   end    if bound.hi < search.cutoff    # if the upper bound is lower than the cutoff value    # then we update it    search.cutoff = bound.hi  end    if diam(interval) < search.tol    # if the width of the subdomain is less than    # the tolerance, we store that subdomain, as can    # potentially have the minimum    return :store, interval  else    # If not, split further    return :bisect, interval  endendfunction BranchAndPrune.bisect(::BounderSearch, Di)  Di = mince(Di, 2)  return (Di...,)endfunction run_search_bounder(f, interval, tol; cutoff=1., order=3, bounder=:ldb)  history = []  search = BounderSearch(f, interval, tol, cutoff, order, bounder)    local end_tree = nothing  for working_tree in search    end_tree = working_tree    push!(history, data(end_tree))  end    return data(end_tree), historyend

0.7s

Julia

run_search_bounder (generic function with 1 method)

f(x) = 1 + x^5 - x^4D = 0 .. 1tol = 1e-2cutoff = 2.order = 3final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff,                                            order=order, bounder="ldb")final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff,                                            order=order, bounder="qfb")final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff,                                                order=order, bounder="naive")xrange = range(D.lo, D.hi, length=100)@manipulate for bounder in ("ldb", "qfb", "naive")  if bounder == "ldb"    final, history = final_ldb, history_ldb  elseif bounder == "qfb"    final, history = final_qfb, history_qfb  elseif bounder == "naive"    final, history = final_naive, history_naive  end  history = unique([h for hh in history for h in hh])  interval_y = interval(-20, 20)  boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]  final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]  plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))  plot!(final_boxes, color=:blue, label="Final boxes")  plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)  ylims!(0.9, 1)end

1.4s

Julia

f(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 D = -2 .. 4tol = 1e-1cutoff = 2.order = 3final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff,                                            order=order, bounder="ldb")final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff,                                            order=order, bounder="qfb")final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff,                                                order=order, bounder="naive")xrange = range(D.lo, D.hi, length=100)@manipulate for bounder in ("ldb", "qfb", "naive")  if bounder == "ldb"    final, history = final_ldb, history_ldb  elseif bounder == "qfb"    final, history = final_qfb, history_qfb  elseif bounder == "naive"    final, history = final_naive, history_naive  end  history = unique([h for hh in history for h in hh])  interval_y = interval(-20, 20)  boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]  final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]  plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))  plot!(final_boxes, color=:blue, label="Final boxes")  plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)  #ylims!(-10, 10)end

1.5s

Julia

From the previous interactive plots, we see that for some functions, the use of bounders improves the search, but for others, the results are the same as with naive Taylor Model. So, a natural question arises, are these extra computations due to the range bounder worth it?

To answer this, let's compute the timings for each type of search: interval-based, Taylor Model-based, and with range bounders.

f(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 D = -2 .. 4tol = 1e-1cutoff = 2.order = 3

0.2s

Julia

@btime final, history = run_search_interval($f, $D, $tol, cutoff=$cutoff);

12.9s

Julia

bounder = "naive"@btime final_naive, history_naive = run_search_bounder($f, $D, $tol,  cutoff=$cutoff, order=$order, bounder=$bounder);

12.8s

Julia

bounder = "ldb"@btime final_ldb, history_ldb = run_search_bounder($f, $D, $tol, cutoff=$cutoff,                                            order=$order, bounder=$bounder);

12.6s

Julia

bounder = "qfb"@btime final_qfb, history_qfb = run_search_bounder($f, $D, $tol, cutoff=$cutoff,                                            order=$order, bounder=$bounder);

12.6s

Julia

We see the price for improving the search, the Taylor Model-based searches are 3 orders of magnitude slower! On the other hand, we see that naive and qfb methods have similar computation times, meanwhile, the ldb algorithm has a slightly higher time.

Finally, let's see how this works for two-dimensional functions.

# This struct must be a subtype of AbstractBreadthFirstSearchmutable struct IntervalMinimumSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}}  f::Function  initial::IntervalBox{N, T}  tol::Float64  cutoff::Float64endmutable struct BounderSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}}  f::Function  initial::IntervalBox{N, T}  tol::Float64  cutoff::Float64  order::Int  bounder_type::Stringendfunction define_taylor_model(dom::IntervalBox, X0, f; order=3)    zi = 0 .. 0    order_max = 2 * (order + 1)    set_variables(Float64, [:x, :y], order=order_max)    xT = TaylorN(Float64, 1, order=order)    yT = TaylorN(Float64, 2, order=order)    xT = TaylorN(deepcopy(evaluate(xT, get_variables(order) .+ mid(X0)).coeffs))    yT = TaylorN(deepcopy(evaluate(yT, get_variables(order) .+ mid(X0)).coeffs))    xm = TaylorModelN{2, Float64, Float64}(xT, zi, X0, dom)    ym = TaylorModelN{2, Float64, Float64}(yT, zi, X0, dom)    ftm = f(xm, ym)    return ftmendfunction BranchAndPrune.process(search::BounderSearch, interval)  X0 = IntervalBox(mid(interval))  ftm = define_taylor_model(interval, X0, search.f, order=search.order)   if search.bounder_type == "ldb"    bound = linear_dominated_bounder(ftm, max_iter=5, ϵ=1e-15)  elseif search.bounder_type == "qfb"    bound = quadratic_fast_bounder(ftm)  else    bound = ftm(ftm.dom - ftm.x0)  end    if bound.lo > search.cutoff    # if the lower bound is higher than the cutoff value    # then we discard the subdomain, as can not contain the minimum    return :discard, interval   end    if bound.hi < search.cutoff    # if the upper bound is lower than the cutoff value    # then we update it    search.cutoff = bound.hi  end    if prod(diam.(interval)) < search.tol    # if the width of the subdomain is less than    # the tolerance, we store that subdomain, as can    # potentially have the minimum    return :store, interval  else    # If not, split further    return :bisect, interval  endend# Function to process each subdomainfunction BranchAndPrune.process(search::IntervalMinimumSearch, interval)  bound = search.f(interval)  if bound.lo > search.cutoff    return :discard, interval  end    if bound.hi < search.cutoff    search.cutoff = bound.hi  end    if prod(diam.(interval)) < search.tol    return :store, interval  else    return :bisect, interval  endendfunction BranchAndPrune.bisect(::Union{IntervalMinimumSearch, BounderSearch}, box)  interval_x = box[1]  interval_y = box[2]  w_x, w_y = diam.(box)  mid_x, mid_y = mid.(box)    if w_x > w_y    return IntervalBox(interval_x.lo .. mid_x, interval_y),      IntervalBox(mid_x .. interval_x.hi, interval_y)  else    return IntervalBox(interval_x, interval_y.lo .. mid_y),      IntervalBox(interval_x, mid_y .. interval_y.hi)  endend

0.2s

Julia

function run_search_interval(f, interval, tol; cutoff=1.)  history = []  search = IntervalMinimumSearch(f, interval, tol, cutoff)    local end_tree = nothing  for working_tree in search    end_tree = working_tree    push!(history, data(end_tree))  end    return data(end_tree), historyend function run_search_bounder(f, interval, tol; bounder_type="ldb",    order=3, cutoff=1.)    history = []    search = BounderSearch(f, interval, tol, cutoff,                                 order, bounder_type)    local end_tree = nothing        for working_tree in search        end_tree = working_tree        push!(history, data(end_tree))    end        return data(end_tree), historyend

0.4s

Julia

run_search_bounder (generic function with 1 method)

f(x1, x2) = (1.5 - x1 * (1 - x2))^2 + (2.25 - x1 * (1 - x2^2)) * (2.25 - x1 * (1 - x2^2)) + (2.625 - x1 * (1 - x2^3)) * (2.625 - x1 * (1 - x2^3))f(box) = f(box[1], box[2])box = IntervalBox(-4.5 .. 4.5, -4.5 .. 4.5)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",  order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",  order=order)history_ldb = unique([h for hh in history_ldb for h in hh])final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb",  order=order)history_qfb = unique([h for hh in history_qfb for h in hh])searches = (interval = (final_interval, history_interval),            naive = (final_naive, history_naive),            ldb = (final_ldb, history_ldb),            qfb = (final_qfb, history_qfb))@manipulate for search_type in ("interval", "naive", "ldb", "qfb")  final, history = searches[Symbol(search_type)]  plot(history, c=:wheat2, legend=false, ratio=:equal,    title="Minimum search for Beale function")  plot!(final, c=:blue)  scatter!([3], [0.5], c=:red, ms=3)  xlims!(box[1].lo, box[2].hi)  ylims!(box[2].lo, box[2].hi)end

9.2s

Julia

f(x,y)=sin(x+y)+(x-y)^2-1.5x+2.5y+1f(box) = f(box[1], box[2])box = IntervalBox(-1.5 .. 4., -3. .. 4.)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",  order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",  order=order)history_ldb = unique([h for hh in history_ldb for h in hh])searches = (interval = (final_interval, history_interval),            naive = (final_naive, history_naive),            ldb = (final_ldb, history_ldb))@manipulate for search_type in ("interval", "naive", "ldb")  final, history = searches[Symbol(search_type)]    plot(history, c=:wheat2, legend=false, ratio=:equal,    title="Minimum search for McCormick function")  plot!(final, c=:blue)  scatter!([-0.54719], [-1.54719], c=:red, ms=3)  xlims!(box[1].lo, box[1].hi)  ylims!(box[2].lo, box[2].hi)end

1.6s

Julia

f(x, y) = (1 + (x + y + 1)^2 * (19 - 14x + 3x^2 - 14y + 6 * x * y + 3y^2)) * (30 + (2x - 3y)^2 * (18 - 32x + 12x^2 + 48y - 36 * x * y + 27y^2))f(box) = f(box[1], box[2])box = IntervalBox(-2. .. 2., -2. .. 2.)tol = 1 / 32order = 4final_interval, history_interval = run_search_interval(f, box, tol)history_interval = unique([h for hh in history_interval for h in hh])final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",  order=order)history_naive = unique([h for hh in history_naive for h in hh])final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",  order=order)history_ldb = unique([h for hh in history_ldb for h in hh])final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb",  order=order)history_qfb = unique([h for hh in history_qfb for h in hh])searches = (interval = (final_interval, history_interval),            naive = (final_naive, history_naive),            ldb = (final_ldb, history_ldb),            qfb = (final_qfb, history_qfb))@manipulate for search_type in ("interval", "naive", "ldb", "qfb")  final, history = searches[Symbol(search_type)]  plot(history, c=:wheat2, legend=false, ratio=:equal,    title="Minimum search for Goldstein–Price function")  plot!(final, c=:blue)  scatter!([0.], [-1.], c=:red, ms=3)  xlims!(box[1].lo, box[1].hi)  ylims!(box[2].lo, box[2].hi)end

4.8s

Julia

As illustrated in the last plots, in general, the ldb algorithm gives the best results among all Taylor Model-based searches, and in every case, the Taylor Model-based search surpasses the search with plain interval methods.

References

[1] Numerical Optimization, Nocedal & Wright, Springer.

Rigorous Interval Optimization with Taylor Models

Optimization

References

Runtimes (1)