Maximizing in Multiple Dimensions

Demystifying the Multivariate Second Order Condition for Economics

Setup

import Pkg
Pkg.add("ForwardDiff")
Pkg.add("Plots")
using ForwardDiff
using LinearAlgebra
using Plots

Why Should You Read This?

If you've studied economics for any length of time, you've had to solve your fair share of maximization problems. After all, if economics is the study of agents behaving optimally, economists need to be well-versed in optimizing the objective function of the agents being studied.

My experience has been that undergraduate economics students are quite comfortable optimizing functions of a single variable but struggle a bit with the transition to a multivariate setting, especially with the intuition behind the second order condition. This document is intended to walk through that intuition using a "show and tell" method, providing examples and showing how those examples behave.

If you're new to multivariate optimization, or just looking for a quick refresher, read on! (Or feel free to skip to the bottom for the tl;dr)

Univariate Maximization

Suppose we have a function $f(x)$ that we want to maximize. For this example, let $f(x) = 2x - x^2$, which has a maximum at $x_0 = 1$:

f(x) = 2x - x^2
x₀ = 1
f(x₀)

You may be familiar with the necessary and sufficient conditions for a maximum if the function is differentiable: the first derivative must be zero (the first order condition, or FOC), and the second derivative must be negative (the second order condition, or SOC). But in building up our intuition, it is useful to revisit why these conditions must hold by considering the total derivative.

Totally differentiating the function $y = f(x)$ at the point $x_0$ gives us:

$$dy = f'(x_0)\,dx$$

This expression states that at the point $x_0$, $dy$ (the marginal change in $y$) is equal to $dx$ (the marginal change in $x$) times $f'(x_0)$ (the first derivative of $f$ evaluated at $x_0$). If $f'(x_0)$ is positive, the function value can be increased by a marginal increase in $x$, so $x_0$ cannot be a maximum. By a similar argument, $f'(x_0)$ cannot be negative at a maximum, or else we could consider a negative marginal change $dx < 0$, which would make $dy$ positive, which we know cannot happen at a maximum. We need $dy$ to be zero for any marginal change $dx$, for which we need the slope of $f$ (i.e. $f'(x_0)$) to be zero. An $x_0$ such that $f'(x_0) = 0$ is a critical point.

f′(x) = ForwardDiff.derivative(f, x)
f′(x₀)

To distinguish a maximum from a minimum, we need to see how $f$ changes as we pass through the critical point. If $f$ rises as we move away from the critical point in either direction, we have found a minimum (the critical point is the lowest point in its neighborhood); if $f$ falls in either direction, we have found a maximum. To see how $dy$ behaves close to the critical point, we can just totally differentiate again:

$$d^2y = f''(x_0)\,dx^2$$

For a maximum, we require $dy$ to be decreasing as we pass through the critical point, or $d^2y < 0$. Since $dx^2 > 0$ for any nonzero $dx$, this leads to the familiar second derivative test: $f''(x_0) < 0$.

f″(x) = ForwardDiff.derivative(f′, x)
f″(x₀)

Multivariate Maximization

Suppose now we have a function $z = f(x, y)$ that we want to maximize. (For brevity I omit from the notation that all partial derivatives are evaluated at the point $(x_0, y_0)$.) Totally differentiating gives us:

$$dz = f_x\,dx + f_y\,dy$$

This needs to be zero for any combination of marginal changes $(dx, dy)$. The only way that can happen is if $f_x = f_y = 0$, our familiar first order condition. But how do we check that the resulting critical point is a maximum?
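To make this concrete, here is a minimal sketch using a hypothetical function (not used elsewhere in this document), $f(x, y) = -(x-1)^2 - (y-2)^2$, whose maximum is at $(1, 2)$; the partial derivatives are hand-coded:

```julia
# Hypothetical example: f(x, y) = -(x - 1)^2 - (y - 2)^2 has its maximum at (1, 2).
f_x(x, y) = -2 * (x - 1)   # ∂f/∂x
f_y(x, y) = -2 * (y - 2)   # ∂f/∂y

# Both partials vanish at the critical point, so dz = f_x*dx + f_y*dy = 0
# for every combination of marginal changes (dx, dy).
f_x(1, 2), f_y(1, 2)   # (0, 0)

# At a point where a partial is nonzero, a marginal move in that direction
# changes the function value, so the point cannot be a maximum.
f_x(0, 2)   # 2: increasing x from 0 raises f
```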

The Wrong Way

A natural inclination would be to check that each partial derivative is decreasing around the critical point:

$$f_{xx} < 0, \qquad f_{yy} < 0$$

But this may give us an incorrect answer! That is, there are some points that may satisfy this condition but are not local maxima. Let's illustrate this with an example function:

$$g(x, y) = 3xy - x^2 - y^2$$
g(x, y) = 3*x*y - x^2 - y^2
g(v) = g(v[1], v[2]); # add a method that takes a vector input

The first order condition is:

$$g_x = 3y - 2x = 0, \qquad g_y = 3x - 2y = 0$$

The unique solution to these equations yields a critical point at $(x, y) = (0, 0)$. The "naive" second order condition is:

$$g_{xx} = -2 < 0, \qquad g_{yy} = -2 < 0$$

This tells us that our critical point should be a maximum. But we can easily find a contradiction:

$$g(0, 0) = 0 < 1 = g(1, 1)$$
@show g(0, 0)
@show g(1, 1)
g(0, 0) < g(1, 1)

So this point is not a maximum! What happened?

The second partial derivative $g_{xx}$ tells us only how $g_x$ changes as $x$ changes; it says nothing about what happens when $y$ changes. To see this, we can plot this function along a path through the domain that passes through the false maximum, on which $x$ changes but $y$ does not:

xs = range(-1, 1, length=500)
# Plot the path through the domain
px = plot(
  xs, repeat([0], length(xs)), 
  xlims=(-1, 1), ylims=(-1, 1), xlabel = "x", ylabel="y", legend=false, title="Domain"
)
scatter!(px, [0], [0], color=:black) # add a dot at the critical point
# Plot the function value over the path
pxg = plot(
  xs, g.(xs, 0), 
  xlabel="t", ylabel="g(t, 0)", legend=false, title="Function Value"
)
vline!(pxg, [0], linestyle=:dash, color=:black) # add a dashed line at the critical point
# Combine the two subplots
plot(px, pxg, size=(600, 250), layout=2)

This is what the FOC and "SOC" with respect to $x$ are capturing. We can see that at $x = 0$ we have a critical point in the function (i.e. $g_x = 0$) and the function is locally concave with respect to changes in $x$ (i.e. $g_{xx} < 0$).

Now let's do the same thing, but with $y$ instead of $x$:

# Plot the path through the domain
py = plot(
  repeat([0], length(xs)), xs, 
  xlims=(-1, 1), ylims=(-1, 1), xlabel = "x", ylabel="y", legend=false, title="Domain"
)
scatter!(py, [0], [0], color=:black) # add a dot at the critical point
# Plot the function value over the path
pyg = plot(
  xs, g.(0, xs), 
  xlabel="t", ylabel="g(0, t)", legend=false, title="Function Value"
)
vline!(pyg, [0], linestyle=:dash, color=:black) # add a vertical line at the critical point 
# Combine the two subplots
plot(py, pyg, size=(600, 250), layout=2)

Unsurprisingly, we see the same pattern – a critical point with local concavity with respect to changes in $y$, holding $x$ constant.

You might have already guessed what is missing from this analysis. We have checked what happens to $g$ as $x$ changes, holding $y$ constant, and what happens to $g$ as $y$ changes, holding $x$ constant. But what about how $g$ changes as both $x$ and $y$ change? Let's plot a new path through the domain along the line $y = x$ and see what happens to $g$:

# Plot the path through the domain
pxy = plot(
  xs, xs,
  xlims=(-1, 1), ylims=(-1, 1), xlabel = "x", ylabel="y", legend=false, title="Domain"
)
scatter!(pxy, [0], [0], color=:black) # add a dot at the critical point
# Plot the function value over the path
pxyg = plot(
  xs, g.(xs, xs), 
  xlabel="t", ylabel="g(t, t)", legend=false, title="Function Value"
)
vline!(pxyg, [0], linestyle=:dash, color=:black) # add a vertical line at the critical point
# Combine the two subplots
plot(pxy, pxyg, size=(600, 250), layout=2)

Uh oh, from this angle our critical point looks like a minimum rather than a maximum! When we account for both $x$ and $y$ changing together, we find that our point is not a true maximum.

To get a better understanding of what's going on, let's plot our function in three dimensions:

plot(
  xs, xs, g, seriestype=:surface,
  xlabel="x", ylabel="y", zlabel="g(x,y)"
)
scatter!([0], [0], [g(0, 0)], color=:black, label=nothing) # add a dot at the critical point

From this figure we can see that the function value falls when moving straight in the $x$ or $y$ directions but rises along the 45° line. Taking this into account, we can find all of the points that do better (have a higher function value) than our "maximum":

plot(xs, xs, (x, y) -> g(x, y) > g(0, 0), seriestype=:heatmap)

Note that none of these points fall on the first two paths we plotted ($y = 0$ and $x = 0$), but some do fall on the third path we plotted ($y = x$).

Clearly, we need to account for joint changes in $x$ and $y$, but how do we do that?

The Right Way

Let's go back to the beginning and consider the problem using the tool of total differentiation.

Total Differentiation

Remember that the FOC tells us that at the point $(x_0, y_0)$, $z$ cannot be increasing or decreasing, or $dz = 0$:

$$dz = f_x\,dx + f_y\,dy = 0$$

Recall that we want to find a point in the function where $z$ is changing at a decreasing rate, or $d^2z < 0$. Totally differentiating again yields:

$$d^2z = f_{xx}\,dx^2 + 2f_{xy}\,dx\,dy + f_{yy}\,dy^2$$

Here we use the symmetry of second derivatives (Young's theorem, $f_{xy} = f_{yx}$) to combine the two cross-partial terms into $2f_{xy}\,dx\,dy$. We need the second derivatives $f_{xx}$, $f_{xy}$, and $f_{yy}$ to be such that $d^2z$ is negative for all combinations of $dx$ and $dy$. How are we supposed to do that?
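Before answering that, we can already evaluate this expression for the saddle example $g(x, y) = 3xy - x^2 - y^2$ from above, whose second partials are constant; a quick sketch:

```julia
# Second partials of g(x, y) = 3x*y - x^2 - y^2 (constant everywhere)
gxx, gxy, gyy = -2, 3, -2

# d²z as a function of the marginal changes (dx, dy)
d2z(dx, dy) = gxx * dx^2 + 2 * gxy * dx * dy + gyy * dy^2

d2z(1, 0)   # -2: concave along the x-axis
d2z(0, 1)   # -2: concave along the y-axis
d2z(1, 1)   #  2: convex along the 45° line, so (0, 0) cannot be a maximum
```

The naive test only looks at the first two directions; the cross-partial term $2f_{xy}\,dx\,dy$ is what flips the sign along the diagonal.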

Quadratic Forms

The (perhaps disappointing) answer is that we need to use a linear algebra trick where we write our expression for $d^2z$ as a matrix product:

$$d^2z = \begin{bmatrix} dx & dy \end{bmatrix} \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \begin{bmatrix} dx \\ dy \end{bmatrix}$$

What does this trick allow us to do? When written this way, it becomes clear that our expression for $d^2z$ is something called a quadratic form, which is a polynomial where all terms are of degree two. A quadratic form in $n$ variables $x_1, \dots, x_n$ can be represented as a product of an $n \times 1$ vector of variables $x$ and an $n \times n$ matrix of coefficients $A$:

$$Q(x) = x^\top A x$$

($x^\top$ denotes the transpose of $x$.) In our case, we have:

$$x = \begin{bmatrix} dx \\ dy \end{bmatrix}, \qquad A = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix}$$

You may recognize that $A$ is the matrix of second partial derivatives of the function $f$, which has a special name: the Hessian matrix, denoted $H$. Since the Hessian matrix contains the coefficients of our polynomial, our quadratic form will exhibit a certain property (being negative everywhere except at the origin) if our Hessian matrix satisfies a certain condition.
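As a quick consistency check, the matrix product reproduces the expanded quadratic form. A sketch using the (constant) Hessian of $g$ and the diagonal direction $(dx, dy) = (1, 1)$:

```julia
using LinearAlgebra

H = [-2.0 3.0; 3.0 -2.0]   # Hessian of g(x, y) = 3x*y - x^2 - y^2
v = [1.0, 1.0]             # marginal changes (dx, dy) along the 45° line

quad_matrix = v' * H * v                                    # matrix form of d²z
quad_expanded = -2 * 1.0^2 + 2 * 3 * 1.0 * 1.0 - 2 * 1.0^2  # expanded polynomial

quad_matrix == quad_expanded   # true: both equal 2.0
```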

Negative Definiteness

We say that a matrix $A$ is negative definite if and only if the quadratic form $x^\top A x < 0$ for all $x \neq 0$. Clearly, we want the quadratic form $d^2z$ to always be negative, so we need the Hessian matrix $H$ to be negative definite. How do we test this property of $H$?

One way to test negative definiteness is by finding the eigenvalues of the matrix (all eigenvalues of a negative definite matrix are negative). If you don't like the idea of finding eigenvalues each time you need to solve an optimization problem, don't worry – in economics, it is most common to test the leading principal minors of the Hessian matrix. Another definition is in order: the leading principal minors of an $n \times n$ matrix are the determinants of its $n$ top-left submatrices (the top-left $1 \times 1$ submatrix, the top-left $2 \times 2$ submatrix, and so on). Let's use $|H_k|$ to denote the $k$-th leading principal minor of the matrix $H$. For our application, the leading principal minors are:

$$|H_1| = f_{xx}, \qquad |H_2| = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix} = f_{xx}f_{yy} - f_{xy}^2$$

Now, the condition we need these leading principal minors to satisfy is that they alternate in sign in a particular way:

$$|H_1| < 0, \qquad |H_2| > 0$$

Why do we need the first principal minor to be negative and the second positive? Remember that all the eigenvalues of a negative definite matrix are negative, and each top-left submatrix of a negative definite matrix is itself negative definite. $|H_1|$ is the determinant of a $1 \times 1$ submatrix, the product of one (negative) eigenvalue, and so must be negative; $|H_2|$ is the determinant of a $2 \times 2$ submatrix, the product of two (negative) eigenvalues, and so must be positive. If we had more than two variables, we would need $|H_3|$ to be negative, $|H_4|$ to be positive, and so on. (For positive definite matrices, which identify local minima, we need all leading principal minors to be positive since they are products of positive eigenvalues.) A succinct (and hopefully memorable) way of stating this condition is:

$$\operatorname{sign}(|H_k|) = \operatorname{sign}\left((-1)^k\right) \quad \text{for } k = 1, \dots, n$$
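Both characterizations can be checked numerically for the Hessian of $g$ using the standard library; a sketch:

```julia
using LinearAlgebra

H = [-2.0 3.0; 3.0 -2.0]   # Hessian of g(x, y) = 3x*y - x^2 - y^2
λ = eigvals(H)             # eigenvalues of a symmetric matrix are real

all(λ .< 0)                # false: one eigenvalue is positive, so H is not negative definite
det(H) ≈ prod(λ)           # true: the determinant is the product of the eigenvalues
```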

Applying the Test

Finally, we are ready to apply everything we have learned to the problem at hand. The Hessian matrix of the function $g$, evaluated at the critical point $(0, 0)$, is:

$$H = \begin{bmatrix} g_{xx} & g_{xy} \\ g_{yx} & g_{yy} \end{bmatrix} = \begin{bmatrix} -2 & 3 \\ 3 & -2 \end{bmatrix}$$
H = ForwardDiff.hessian(g, [0, 0])

And the leading principal minors are:

$$|H_1| = -2 < 0, \qquad |H_2| = (-2)(-2) - (3)(3) = -5 < 0$$

Since the leading principal minors of the Hessian matrix do not alternate in sign, the Hessian matrix is not negative definite, and consequently the critical point $(0, 0)$ is not a local maximum.

Implementing the Test

Let's write a function that implements this test for the second order condition. We want our function to return true if the SOC is satisfied and false otherwise:

"""
    check_SOC(f, x) -> Bool
Check the second order condition of the function `f` evaluated at the point `x`.
"""
function check_SOC(f, x)
  H = ForwardDiff.hessian(f, x)
  N = size(H, 1)
  SOC = Vector{Bool}(undef, N)
  for n in 1:N
    pm = det(H[1:n, 1:n])
    SOC[n] = sign(pm) == sign((-1)^n)
    msg = SOC[n] ? "passed" : "failed"
    println("n = $n: principal minor = $pm => $msg")
  end
  return all(SOC)
end

If we apply this to our test function $g$, we should find that it fails:

check_SOC(g, [0, 0])
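For contrast, consider a function that does have a maximum at the origin. Here is a sketch with the hypothetical function $h(x, y) = -x^2 + xy - y^2$, checking the leading principal minors directly so the snippet stands on its own:

```julia
using LinearAlgebra

# Hypothetical example: h(x, y) = -x^2 + x*y - y^2 has a strict maximum at (0, 0).
# Its Hessian is constant:
H = [-2.0 1.0; 1.0 -2.0]

det(H[1:1, 1:1])   # -2.0: same sign as (-1)^1, passes
det(H[1:2, 1:2])   #  3.0: same sign as (-1)^2, passes

all(sign(det(H[1:n, 1:n])) == sign((-1)^n) for n in 1:2)   # true: SOC satisfied
```

Applying `check_SOC(v -> -v[1]^2 + v[1]*v[2] - v[2]^2, [0, 0])` should likewise return `true`.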

Application: Returns to Scale in Profit Maximization

You may have learned that if a firm has a Cobb-Douglas production function, we need to assume that it exhibits decreasing returns to scale for the profit maximization problem to have a solution. Now we have the tools to show why!

The firm seeks to choose a capital input $K$ and labor input $L$ to maximize the following profit function:

$$\pi(K, L) = AK^{\alpha}L^{\beta} - rK - wL$$
profit(K, L; α=0.3, β=0.6, A=1.0, r=0.4, w=0.6) = A*(K^α)*(L^β) - r*K - w*L;
profit(x; kwargs...) = profit(x[1], x[2]; kwargs...);

Here $r$ is the rental rate of capital and $w$ is the wage (both normalized by the output price).

The first order condition is:

$$\pi_K = \alpha AK^{\alpha - 1}L^{\beta} - r = 0, \qquad \pi_L = \beta AK^{\alpha}L^{\beta - 1} - w = 0$$

This leads to the familiar solution:

$$K^* = \left[A\left(\frac{\alpha}{r}\right)^{1-\beta}\left(\frac{\beta}{w}\right)^{\beta}\right]^{\frac{1}{1-\alpha-\beta}}, \qquad L^* = \left[A\left(\frac{\alpha}{r}\right)^{\alpha}\left(\frac{\beta}{w}\right)^{1-\alpha}\right]^{\frac{1}{1-\alpha-\beta}}$$
K(r, w; A=1.0, α=0.3, β=0.6) = (A * (α/r)^(1-β) * (β/w)^β)^(1/(1-α-β))
L(r, w; A=1.0, α=0.3, β=0.6) = (A * (α/r)^α * (β/w)^(1-α))^(1/(1-α-β))
r = 0.4
w = 0.6
k, l = (K(r,w), L(r,w))
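As a sanity check, we can plug the closed-form solution back into the analytic first order conditions. A self-contained sketch under the baseline parameters used above:

```julia
# Baseline parameters (α = 0.3, β = 0.6, A = 1, r = 0.4, w = 0.6)
α, β, A, r, w = 0.3, 0.6, 1.0, 0.4, 0.6

# Closed-form input demands
k = (A * (α/r)^(1-β) * (β/w)^β)^(1/(1-α-β))
l = (A * (α/r)^α * (β/w)^(1-α))^(1/(1-α-β))

# Analytic partial derivatives of profit, evaluated at (k, l)
dπdK = α * A * k^(α-1) * l^β - r
dπdL = β * A * k^α * l^(β-1) - w

dπdK, dπdL   # both approximately zero
```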

But what about the second order condition? First, we need to compute the Hessian matrix. Writing output as $Q = AK^{\alpha}L^{\beta}$, we can express the second derivatives compactly as follows:

$$\pi_{KK} = \frac{\alpha(\alpha - 1)Q}{K^2}, \qquad \pi_{LL} = \frac{\beta(\beta - 1)Q}{L^2}, \qquad \pi_{KL} = \pi_{LK} = \frac{\alpha\beta Q}{KL}$$

Now we construct the Hessian matrix:

$$H = \begin{bmatrix} \pi_{KK} & \pi_{KL} \\ \pi_{LK} & \pi_{LL} \end{bmatrix} = \begin{bmatrix} \dfrac{\alpha(\alpha - 1)Q}{K^2} & \dfrac{\alpha\beta Q}{KL} \\ \dfrac{\alpha\beta Q}{KL} & \dfrac{\beta(\beta - 1)Q}{L^2} \end{bmatrix}$$
H = ForwardDiff.hessian(profit, [k, l])

Next, we need to determine the sign of the principal minors. The first principal minor is:

$$|H_1| = \pi_{KK} = \frac{\alpha(\alpha - 1)Q}{K^2}$$

Clearly, the sign depends on a number of factors. In general, we assume that $\alpha > 0$, simply meaning that capital increases output. Because the marginal product of capital is infinite at $K = 0$, the profit-maximizing input levels will be strictly positive (meaning output will be positive), so $K^2 > 0$ and $Q > 0$. This leaves one final requirement for $|H_1| < 0$: we need $\alpha(\alpha - 1) < 0$, i.e. $\alpha < 1$.

Moving on to the second principal minor, we have:

$$|H_2| = \pi_{KK}\pi_{LL} - \pi_{KL}^2 = \frac{\alpha\beta Q^2}{K^2L^2}\left[(\alpha - 1)(\beta - 1) - \alpha\beta\right] = \frac{\alpha\beta Q^2}{K^2L^2}\left(1 - \alpha - \beta\right)$$

Again, we know that $K$, $L$, and therefore $Q$ must be positive at a critical point, so the factored term $\frac{\alpha\beta Q^2}{K^2L^2} > 0$. In order for the second principal minor to be positive (remember, we need it to have the same sign as $(-1)^2$), we require $1 - \alpha - \beta > 0$. If we rearrange this inequality to $\alpha + \beta < 1$, we get precisely the requirement for a Cobb-Douglas production function to have decreasing returns to scale!

If the production function were instead to exhibit increasing returns to scale, we would have $\alpha + \beta > 1$, which implies $|H_2| < 0$. The leading principal minors then fail the alternating-sign test, so the critical point of the profit function is not a local maximum. The profit maximization problem does not have a solution because profits go off to infinity as input levels increase beyond the critical point.
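The unboundedness can be seen directly by scaling both inputs up together under increasing returns ($\alpha = \beta = 0.8$, so $\alpha + \beta = 1.6 > 1$); a sketch:

```julia
# Along the scaling path (K, L) = (t, t), profit is t^1.6 - t,
# which grows without bound as t increases.
profit_irs(K, L) = (K^0.8) * (L^0.8) - 0.4K - 0.6L

vals = [profit_irs(t, t) for t in (1.0, 10.0, 100.0, 1000.0)]
issorted(vals)   # true: profit keeps rising as the firm scales up
```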

To summarize our findings (given $\alpha, \beta > 0$):

$$\alpha + \beta < 1 \;\Longleftrightarrow\; |H_1| < 0 \text{ and } |H_2| > 0 \;\Longleftrightarrow\; (K^*, L^*) \text{ is a local maximum}$$
check_SOC(profit, [k, l])

Note that the SOC fails if we pick parameter values such that $\alpha + \beta > 1$:

(k, l) = (K(r, w; α=0.8, β=0.8), L(r, w; α=0.8, β=0.8))
check_SOC(x -> profit(x; α=0.8, β=0.8), [k, l])

tl;dr

To test whether a critical point $x_0$ of a function $f(x_1, \dots, x_n)$ is a local maximum, we check the leading principal minors of the Hessian matrix (the matrix of second partial derivatives) evaluated at the point $x_0$, requiring that the $k$-th leading principal minor has the same sign as $(-1)^k$.

In the case where $n = 1$, this is the familiar second derivative test:

$$f''(x_0) < 0$$

In the case where $n = 2$, using the notation $f_{xy} = \frac{\partial^2 f}{\partial x \partial y}$, the test is:

$$|H_1| = f_{xx} < 0, \qquad |H_2| = f_{xx}f_{yy} - f_{xy}^2 > 0$$

For the point $x_0$ to be a local minimum, we require that all leading principal minors of the Hessian matrix evaluated at $x_0$ are positive.
