Rigorous Interval Optimization with Taylor Models

In this blog post, I'm going to talk about how to improve interval optimization methods with the use of Taylor Models, in particular with the use of range bounders, which you can read more about it here.

Optimization

]add TaylorModels IntervalArithmetic BranchAndPrune
71.3s
using Plots
using TaylorModels
using IntervalArithmetic
using BranchAndPrune
using BenchmarkTools
using WebIO, Interact
using WebSockets
166.4s
plot();  
33.2s

Formally, an optimization problem is defined by the following

formula not implemented

There are several methods to solve this kind of problems[1], in this blog post I'm going to focus on a simple method to solve plain (with no constraints) optimization problems. This method forms part of the so-called branch-and-bound methods; the key idea behind this method is to recursively partition (typically in disjoint sets to prevent the algorithm to visit the same place more than once) the search space and look for candidate solutions in each of the partitions. These candidate solutions are identified by means of some validated method, in particular by computing the bound for the function range on each partition. Once these candidate solutions are identified there we decide if are good enough or need to be further refined.

To make this more concrete, let's take a look at the next simple example.

Remark: It is advisable to run this blog post interactively, in order to be able to exploit the sliders that are included in some cases.

Suppose we want to find the minimum of the function inline_formula not implemented on the interval inline_formula not implemented. The first step is to compute the bound for the function range on this initial domain,

f(x) = 1 + x^5 - x^4
D = 0 .. 1
bound = f(D)
box = IntervalBox(D, bound)
xrange = range(D.lo, D.hi, length=100)
plot(box, fill=:green, label="bound")
plot!(xrange, x -> f(x), lw=2, label="f(x)", color=:red)
ylims!(0.9, 1.1)
1.0s

The bound for this initial partition overestimates the function range greatly, so, the next step is to split this partition into two partitions of equal length and compute the bounds of each of them,

D = 0 .. 1
Di = mince(D, 2)
bounds = f.(Di)
boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]
p = plot(legend=false)
for i in eachindex(boxes)
  plot!(boxes[i], c=:green)
end
plot!(xrange, x -> f(x), lw=2, c=:red)
ylims!(0.9, 1.1)
1.6s

Now the bounds look better, and we can begin to think about how to discard uninteresting partitions (the ones that can't have the minimum). We can think in some sort of "threshold" that enables us to discard a partition if the lower bound for that partition lies above that threshold. For simplicity, let's take this threshold as the lowest upper bound of all partitions, then, as all lower bounds lie below that threshold we preserve all partitions and split them again; we call these survival partitions the active partitions.

D = 0 .. 1
Di = mince(D, 4)
bounds = f.(Di)
boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]
p = plot(legend=false)
for i in eachindex(boxes)
  plot!(boxes[i], c=:green)
end
plot!(xrange, x -> f(x), lw=2, c=:red)
ylims!(0.9, 1.1)
1.0s

Once again, the bounds were improved, and we can adjust the threshold to the new lowest upper bound. We can repeat this process until the width of the active partitions become smaller than some tolerance value. So, repeating this process we get,

D = 0 .. 1
@manipulate for n in [2^i for i in 3:10]
  Di = mince(D, n)
  bounds = f.(Di)
  boxes = [IntervalBox(di, bound) for (di, bound) in zip(Di, bounds)]
  threshold = minimum([b.hi for b in bounds])
  active_boxes = [box for box in boxes if box[2].lo < threshold]
  p = plot(legend=:topleft)
  plot!(boxes, c=:green, label="Unactive partitions")
  plot!(active_boxes, c=:blue, label="Active partitions")
  plot!(xrange, x -> f(x), lw=2, c=:red, label="f(x)")
  hline!([threshold], lw=2, c=:blue, label="Threshold value")
  ylims!(0.9, 1.05)
end
1.6s

As the number of partitions is increased we get a better estimate for the region that contains the minimum; this is the essence of the branch-and-bound method.

So, specifically, the steps for the branch-and-bound method are

  1. Partition inline_formula not implemented into inline_formula not implemented disjoint sets.

  2. Compute the bounds for each of these inline_formula not implementeds by some validated method (previously we used only interval arithmetic but is not restricted to this).

  3. Update the threshold value and discard the uninteresting partitions.

  4. Split the active partitions and go to 1.

This kind of strategy is actually implemented on the BranchAndPrune.jl package. So, let's re-write the previous example with the BranchAndPrune package. This package requests us 3 things:

  • A struct that contains some information about the search (the function, initial domain box, tolerance, etc.)

  • A new method for the function BranchAndPrune.process that tells the package how to process each of the subdomains, i.e., which calculation needs to be done on each of them.

  • A new method for the function BranchAndPrune.bisect that specifies how to split each subdomain if required.

So, for the last example, we have

mutable struct IntervalSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}
  f::Function
  initial::Interval{T}
  tol::Float64
  cutoff::Float64
end
function BranchAndPrune.process(search::IntervalSearch, interval)
  bound = search.f(interval) # compute the bound on the Di subdomain
  
  if bound.lo > search.cutoff
    # if the lower bound is higher than the cutoff value
    # then we discard the subdomain, as can not contain the minimum
    return :discard, interval 
  end
  
  if bound.hi < search.cutoff
    # if the upper bound is lower than the cutoff value
    # then we update it
    search.cutoff = bound.hi
  end
  
  if diam(interval) < search.tol
    # if the width of the subdomain is less than
    # the tolerance, we store that subdomain, as can
    # potentially have the minimum
    return :store, interval
  else
    # If not, split further
    return :bisect, interval
  end
end
function BranchAndPrune.bisect(::IntervalSearch, Di)
  Di = mince(Di, 2)
  return (Di...,)
end
# Now we define a function to actually doing the search
function run_search_interval(f, interval, tol; cutoff=1.)
  history = []
  search = IntervalSearch(f, interval, tol, cutoff)
  
  local end_tree = nothing
  for working_tree in search
    end_tree = working_tree
    push!(history, data(end_tree))
  end
  
  return data(end_tree), history
end 
0.2s
run_search_interval (generic function with 1 method)
f(x) = 1 + x^5 - x^4
D = 0 .. 1
tol = 1e-2
cutoff = 2.
xrange = range(D.lo, D.hi, length=100)
final, history = run_search_interval(f, D, tol, cutoff=cutoff);
history = unique([h for hh in history for h in hh])
interval_y = interval(-20, 20)
boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]
final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]
plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))
plot!(final_boxes, color=:blue, label="Final boxes")
plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)
ylims!(0.9, 1)
1.7s

We see that it takes too many steps to actually bound the minimum region and that this region (in blue) is too large; this is a consequence of the overestimation for the range function by plain interval arithmetic. This can be improved if instead of using plain interval arithmetic to compute the bound on each subdomain, we use a Taylor Model since in general gives tighter bounds.

mutable struct TMSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}
  f::Function
  initial::Interval{T}
  tol::Float64
  cutoff::Float64
  order::Int
end
function BranchAndPrune.process(search::TMSearch, interval)
  # Defining a Taylor Model with the midpoint as expansion point
  tm = TaylorModel1(order, mid(interval), interval)
  ftm = search.f(tm)
  bound = ftm(ftm.dom - ftm.x0) # compute the bound on the Di subdomain
  
  if bound.lo > search.cutoff
    # if the lower bound is higher than the cutoff value
    # then we discard the subdomain, as can not contain the minimum
    return :discard, interval 
  end
  
  if bound.hi < search.cutoff
    # if the upper bound is lower than the cutoff value
    # then we update it
    search.cutoff = bound.hi
  end
  
  if diam(interval) < search.tol
    # if the width of the subdomain is less than
    # the tolerance, we store that subdomain, as can
    # potentially have the minimum
    return :store, interval
  else
    # If not, split further
    return :bisect, interval
  end
end
function BranchAndPrune.bisect(::TMSearch, Di)
  Di = mince(Di, 2)
  return (Di...,)
end
function run_search_tm(f, interval, tol; cutoff=1., order=3)
  history = []
  search = TMSearch(f, interval, tol, cutoff, order)
  
  local end_tree = nothing
  for working_tree in search
    end_tree = working_tree
    push!(history, data(end_tree))
  end
  
  return data(end_tree), history
end
0.5s
run_search_tm (generic function with 1 method)
f(x) = 1 + x^5 - x^4
D = 0 .. 1
tol = 1e-2
cutoff = 2.
order = 3
xrange = range(D.lo, D.hi, length=100)
final, history = run_search_tm(f, D, tol, cutoff=cutoff, order=order);
history = unique([h for hh in history for h in hh])
interval_y = interval(-20, 20)
boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]
final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]
plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))
plot!(final_boxes, color=:blue, label="Final boxes")
plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)
ylims!(0.9, 1)
5.8s

We see a huge impact on the results with the use of a (naive) Taylor Model of order 3, now it takes fewer splits to actually find the minimum and the region that contains it is much tighter.

While in this case the naive Taylor Model gives good results, this can be improved; let's repeat this example but with the use of range bounders.

mutable struct BounderSearch{T} <: AbstractBreadthFirstSearch{Interval{T}}
  f::Function
  initial::Interval{T}
  tol::Float64
  cutoff::Float64
  order::Int
  bounder_type::String
end
function BranchAndPrune.process(search::BounderSearch, interval)
  # Defining a Taylor Model with the midpoint as expansion point
  tm = TaylorModel1(order, mid(interval), interval)
  ftm = search.f(tm)
  
  if search.bounder_type == "ldb"
    bound = linear_dominated_bounder(ftm)
  elseif search.bounder_type == "qfb"
    bound = quadratic_fast_bounder(ftm)
  else
    bound = ftm(ftm.dom - ftm.x0)
  end
  
  if bound.lo > search.cutoff
    # if the lower bound is higher than the cutoff value
    # then we discard the subdomain, as can not contain the minimum
    return :discard, interval 
  end
  
  if bound.hi < search.cutoff
    # if the upper bound is lower than the cutoff value
    # then we update it
    search.cutoff = bound.hi
  end
  
  if diam(interval) < search.tol
    # if the width of the subdomain is less than
    # the tolerance, we store that subdomain, as can
    # potentially have the minimum
    return :store, interval
  else
    # If not, split further
    return :bisect, interval
  end
end
function BranchAndPrune.bisect(::BounderSearch, Di)
  Di = mince(Di, 2)
  return (Di...,)
end
function run_search_bounder(f, interval, tol; cutoff=1., order=3, bounder=:ldb)
  history = []
  search = BounderSearch(f, interval, tol, cutoff, order, bounder)
  
  local end_tree = nothing
  for working_tree in search
    end_tree = working_tree
    push!(history, data(end_tree))
  end
  
  return data(end_tree), history
end
0.7s
run_search_bounder (generic function with 1 method)
f(x) = 1 + x^5 - x^4
D = 0 .. 1
tol = 1e-2
cutoff = 2.
order = 3
final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff,
                                            order=order, bounder="ldb")
final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff,
                                            order=order, bounder="qfb")
final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff,
                                                order=order, bounder="naive")
xrange = range(D.lo, D.hi, length=100)
@manipulate for bounder in ("ldb", "qfb", "naive")
  if bounder == "ldb"
    final, history = final_ldb, history_ldb
  elseif bounder == "qfb"
    final, history = final_qfb, history_qfb
  elseif bounder == "naive"
    final, history = final_naive, history_naive
  end
  history = unique([h for hh in history for h in hh])
  interval_y = interval(-20, 20)
  boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]
  final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]
  plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))
  plot!(final_boxes, color=:blue, label="Final boxes")
  plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)
  ylims!(0.9, 1)
end
1.4s
f(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 
D = -2 .. 4
tol = 1e-1
cutoff = 2.
order = 3
final_ldb, history_ldb = run_search_bounder(f, D, tol, cutoff=cutoff,
                                            order=order, bounder="ldb")
final_qfb, history_qfb = run_search_bounder(f, D, tol, cutoff=cutoff,
                                            order=order, bounder="qfb")
final_naive, history_naive = run_search_bounder(f, D, tol, cutoff=cutoff,
                                                order=order, bounder="naive")
xrange = range(D.lo, D.hi, length=100)
@manipulate for bounder in ("ldb", "qfb", "naive")
  if bounder == "ldb"
    final, history = final_ldb, history_ldb
  elseif bounder == "qfb"
    final, history = final_qfb, history_qfb
  elseif bounder == "naive"
    final, history = final_naive, history_naive
  end
  history = unique([h for hh in history for h in hh])
  interval_y = interval(-20, 20)
  boxes = [IntervalBox(interval_x, interval_y) for interval_x in history]
  final_boxes = [IntervalBox(final_x, interval_y) for final_x in final]
  plot(boxes, fill=0., label="Splits", legend=:bottomleft, figsize=(800, 800))
  plot!(final_boxes, color=:blue, label="Final boxes")
  plot!(xrange, x -> f(x), color=:red, label="f(x)", lw=2)
  #ylims!(-10, 10)
end
1.5s

From the previous interactive plots, we see that for some functions, the use of bounders improves the search, but for others, the results are the same as with naive Taylor Model. So, a natural question arises, are these extra computations due to the range bounder worth it?

To answer this, let's compute the timings for each type of search: interval-based, Taylor Model-based, and with range bounders.

f(x) = x^2 * cos(5 - x) + sin(5 - x^2)^2 
D = -2 .. 4
tol = 1e-1
cutoff = 2.
order = 3
0.2s
3
@btime final, history = run_search_interval($f, $D, $tol, cutoff=$cutoff);
12.9s
bounder = "naive"
@btime final_naive, history_naive = run_search_bounder($f, $D, $tol,
  cutoff=$cutoff, order=$order, bounder=$bounder);
12.8s
bounder = "ldb"
@btime final_ldb, history_ldb = run_search_bounder($f, $D, $tol, cutoff=$cutoff,
                                            order=$order, bounder=$bounder);
12.6s
bounder = "qfb"
@btime final_qfb, history_qfb = run_search_bounder($f, $D, $tol, cutoff=$cutoff,
                                            order=$order, bounder=$bounder);
12.6s

We see the price for improving the search, the Taylor Model-based searches are 3 orders of magnitude slower! On the other hand, we see that naive and qfb methods have similar computation times, meanwhile, the ldb algorithm has a slightly higher time.

Finally, let's see how this works for two-dimensional functions.

# This struct must be a subtype of AbstractBreadthFirstSearch
mutable struct IntervalMinimumSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}}
  f::Function
  initial::IntervalBox{N, T}
  tol::Float64
  cutoff::Float64
end
mutable struct BounderSearch{N, T} <: AbstractBreadthFirstSearch{IntervalBox{N, T}}
  f::Function
  initial::IntervalBox{N, T}
  tol::Float64
  cutoff::Float64
  order::Int
  bounder_type::String
end
function define_taylor_model(dom::IntervalBox, X0, f; order=3)
    zi = 0 .. 0
    order_max = 2 * (order + 1)
    set_variables(Float64, [:x, :y], order=order_max)
    xT = TaylorN(Float64, 1, order=order)
    yT = TaylorN(Float64, 2, order=order)
    xT = TaylorN(deepcopy(evaluate(xT, get_variables(order) .+ mid(X0)).coeffs))
    yT = TaylorN(deepcopy(evaluate(yT, get_variables(order) .+ mid(X0)).coeffs))
    xm = TaylorModelN{2, Float64, Float64}(xT, zi, X0, dom)
    ym = TaylorModelN{2, Float64, Float64}(yT, zi, X0, dom)
    ftm = f(xm, ym)
    return ftm
end
function BranchAndPrune.process(search::BounderSearch, interval)
  X0 = IntervalBox(mid(interval))
  ftm = define_taylor_model(interval, X0, search.f, order=search.order)
 
  if search.bounder_type == "ldb"
    bound = linear_dominated_bounder(ftm, max_iter=5, ϵ=1e-15)
  elseif search.bounder_type == "qfb"
    bound = quadratic_fast_bounder(ftm)
  else
    bound = ftm(ftm.dom - ftm.x0)
  end
  
  if bound.lo > search.cutoff
    # if the lower bound is higher than the cutoff value
    # then we discard the subdomain, as can not contain the minimum
    return :discard, interval 
  end
  
  if bound.hi < search.cutoff
    # if the upper bound is lower than the cutoff value
    # then we update it
    search.cutoff = bound.hi
  end
  
  if prod(diam.(interval)) < search.tol
    # if the width of the subdomain is less than
    # the tolerance, we store that subdomain, as can
    # potentially have the minimum
    return :store, interval
  else
    # If not, split further
    return :bisect, interval
  end
end
# Function to process each subdomain
function BranchAndPrune.process(search::IntervalMinimumSearch, interval)
  bound = search.f(interval)
  if bound.lo > search.cutoff
    return :discard, interval
  end
  
  if bound.hi < search.cutoff
    search.cutoff = bound.hi
  end
  
  if prod(diam.(interval)) < search.tol
    return :store, interval
  else
    return :bisect, interval
  end
end
function BranchAndPrune.bisect(::Union{IntervalMinimumSearch, BounderSearch}, box)
  interval_x = box[1]
  interval_y = box[2]
  w_x, w_y = diam.(box)
  mid_x, mid_y = mid.(box)
  
  if w_x > w_y
    return IntervalBox(interval_x.lo .. mid_x, interval_y),
      IntervalBox(mid_x .. interval_x.hi, interval_y)
  else
    return IntervalBox(interval_x, interval_y.lo .. mid_y),
      IntervalBox(interval_x, mid_y .. interval_y.hi)
  end
end
0.2s
function run_search_interval(f, interval, tol; cutoff=1.)
  history = []
  search = IntervalMinimumSearch(f, interval, tol, cutoff)
  
  local end_tree = nothing
  for working_tree in search
    end_tree = working_tree
    push!(history, data(end_tree))
  end
  
  return data(end_tree), history
end 
function run_search_bounder(f, interval, tol; bounder_type="ldb",
    order=3, cutoff=1.)
    history = []
    search = BounderSearch(f, interval, tol, cutoff,
                                 order, bounder_type)
    local end_tree = nothing
    
    for working_tree in search
        end_tree = working_tree
        push!(history, data(end_tree))
    end
    
    return data(end_tree), history
end
0.4s
run_search_bounder (generic function with 1 method)
f(x1, x2) = (1.5 - x1 * (1 - x2))^2 + (2.25 - x1 * (1 - x2^2)) * (2.25 - x1 * (1 - x2^2)) + (2.625 - x1 * (1 - x2^3)) * (2.625 - x1 * (1 - x2^3))
f(box) = f(box[1], box[2])
box = IntervalBox(-4.5 .. 4.5, -4.5 .. 4.5)
tol = 1 / 32
order = 4
final_interval, history_interval = run_search_interval(f, box, tol)
history_interval = unique([h for hh in history_interval for h in hh])
final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",
  order=order)
history_naive = unique([h for hh in history_naive for h in hh])
final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",
  order=order)
history_ldb = unique([h for hh in history_ldb for h in hh])
final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb",
  order=order)
history_qfb = unique([h for hh in history_qfb for h in hh])
searches = (interval = (final_interval, history_interval),
            naive = (final_naive, history_naive),
            ldb = (final_ldb, history_ldb),
            qfb = (final_qfb, history_qfb))
@manipulate for search_type in ("interval", "naive", "ldb", "qfb")
  final, history = searches[Symbol(search_type)]
  plot(history, c=:wheat2, legend=false, ratio=:equal,
    title="Minimum search for Beale function")
  plot!(final, c=:blue)
  scatter!([3], [0.5], c=:red, ms=3)
  xlims!(box[1].lo, box[2].hi)
  ylims!(box[2].lo, box[2].hi)
end
9.2s
f(x,y)=sin(x+y)+(x-y)^2-1.5x+2.5y+1
f(box) = f(box[1], box[2])
box = IntervalBox(-1.5 .. 4., -3. .. 4.)
tol = 1 / 32
order = 4
final_interval, history_interval = run_search_interval(f, box, tol)
history_interval = unique([h for hh in history_interval for h in hh])
final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",
  order=order)
history_naive = unique([h for hh in history_naive for h in hh])
final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",
  order=order)
history_ldb = unique([h for hh in history_ldb for h in hh])
searches = (interval = (final_interval, history_interval),
            naive = (final_naive, history_naive),
            ldb = (final_ldb, history_ldb))
@manipulate for search_type in ("interval", "naive", "ldb")
  final, history = searches[Symbol(search_type)]
  
  plot(history, c=:wheat2, legend=false, ratio=:equal,
    title="Minimum search for McCormick function")
  plot!(final, c=:blue)
  scatter!([-0.54719], [-1.54719], c=:red, ms=3)
  xlims!(box[1].lo, box[1].hi)
  ylims!(box[2].lo, box[2].hi)
end
1.6s
f(x, y) = (1 + (x + y + 1)^2 * (19 - 14x + 3x^2 - 14y + 6 * x * y + 3y^2)) * (30 + (2x - 3y)^2 * (18 - 32x + 12x^2 + 48y - 36 * x * y + 27y^2))
f(box) = f(box[1], box[2])
box = IntervalBox(-2. .. 2., -2. .. 2.)
tol = 1 / 32
order = 4
final_interval, history_interval = run_search_interval(f, box, tol)
history_interval = unique([h for hh in history_interval for h in hh])
final_naive, history_naive = run_search_bounder(f, box, tol, bounder_type="naive",
  order=order)
history_naive = unique([h for hh in history_naive for h in hh])
final_ldb, history_ldb = run_search_bounder(f, box, tol, bounder_type="ldb",
  order=order)
history_ldb = unique([h for hh in history_ldb for h in hh])
final_qfb, history_qfb = run_search_bounder(f, box, tol, bounder_type="qfb",
  order=order)
history_qfb = unique([h for hh in history_qfb for h in hh])
searches = (interval = (final_interval, history_interval),
            naive = (final_naive, history_naive),
            ldb = (final_ldb, history_ldb),
            qfb = (final_qfb, history_qfb))
@manipulate for search_type in ("interval", "naive", "ldb", "qfb")
  final, history = searches[Symbol(search_type)]
  plot(history, c=:wheat2, legend=false, ratio=:equal,
    title="Minimum search for Goldstein–Price function")
  plot!(final, c=:blue)
  scatter!([0.], [-1.], c=:red, ms=3)
  xlims!(box[1].lo, box[1].hi)
  ylims!(box[2].lo, box[2].hi)
end
4.8s

As illustrated in the last plots, in general, the ldb algorithm gives the best results among all Taylor Model-based searches, and in every case, the Taylor Model-based search surpasses the search with plain interval methods.

References

[1] Numerical Optimization, Nocedal & Wright, Springer.

Runtimes (1)