We are interested in solving the constrained optimization problem

$latex \text{(P)}\qquad \text{minimize}\ \ f(x)\quad\text{subject to}\quad h(x)=b,\quad x\in\mathcal{X}.$

- The variables $latex x$ must belong to the constraint region $latex \mathcal{X}$, where $latex \mathcal{X}$ is a subset of $latex \mathbb{R}^n$.
- The variables must satisfy the functional constraints $latex h(x)=b$, where $latex h:{\mathcal X}\rightarrow{\mathbb R}^d$ and $latex b\in\mathbb{R}^d$.
- Variables that belong to the constraint region and satisfy the functional constraints are called feasible and belong to the feasible set $latex \mathcal{X}(b)=\{x\in\mathcal{X}: h(x)=b\}$.
- The variables are judged according to the relative magnitude of the objective function $latex f:\mathcal{X}\rightarrow\mathbb{R}$.
- A feasible solution to the optimization, $latex x^*$, is called optimal when $latex f(x^*)\leq f(x)$ for all $latex x\in\mathcal{X}(b)$.
- In general, an optimum $latex x^*$ need not exist.
- Inequality constraints $latex h(x)\leq b$ can be rewritten as equality constraints $latex h(x)+z=b$, where $latex z\geq 0$. Then an optimization of the form (P) can be solved over the variables $latex (x,z)$. The variables $latex z$ are called slack variables (see the example after this list).
- $latex \max_x f(x) = -\min_x \{-f(x)\}$, so maximizing is the same as minimizing.
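As a concrete illustration of this notation, consider the toy problem of minimizing $latex f(x)=x_1^2+x_2^2$ over $latex \mathcal{X}=\mathbb{R}^2$ subject to the single inequality constraint $latex x_1+x_2\leq b$. Introducing a slack variable $latex z\geq 0$, the constraint becomes the equality $latex x_1+x_2+z=b$, and the problem takes the form (P) with variables $latex (x,z)$, constraint region $latex \mathbb{R}^2\times[0,\infty)$ and functional constraint $latex h(x,z)=x_1+x_2+z=b$. We will return to variants of this example below.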
The set $latex \mathcal{X}$ could incorporate the functional constraints $latex h(x)=b$, but generally we assume $latex \mathcal{X}$ is a set where the unconstrained optimization, $latex \min_{x\in\mathcal{X}} f(x)$, is easier to solve. For example, $latex \mathcal{X}$ might be $latex \mathbb{R}^n$ and $latex \min_{x\in\mathbb{R}^n} f(x)$ might be solved by differentiating and finding local minima. What makes the problem difficult is the functional constraints. To get some solution, we might penalize going over these constraints. If so, we could remove these constraints and solve a different optimization

$latex \text{(L)}\qquad \min_{x\in\mathcal{X}}\ L(x;\lambda) \;=\; \min_{x\in\mathcal{X}}\ \Big[\, f(x) + \lambda^\top\big(b - h(x)\big) \Big].$
- In the above optimization, the objective function $latex L(x;\lambda)= f(x) + \lambda^\top(b-h(x))$ is called the Lagrangian of the optimization problem (P).
- We call the components of the vector $latex \lambda\in\mathbb{R}^d$ Lagrange multipliers.
- We use $latex x^*_\lambda$ to denote the optimum of (L), if it exists.
- For each inequality constraint $latex h_j(x)\leq b_j$, a slack variable $latex z_j\geq 0$ is added and the term $latex \lambda_j(b_j-h_j(x))$ is replaced by $latex \lambda_j(b_j-h_j(x)-z_j)$. Because we introduced a new term $latex -\lambda_j z_j$, for a finite solution to (L), we require $latex \lambda_j\leq 0$ and thus, after minimizing over $latex z_j\geq 0$, we have $latex \lambda_j z_j = 0$. This equality is called complementary slackness (illustrated in the example below).
- If $latex x^*_\lambda$, our optimal Lagrangian solution, is also feasible then complementary slackness states

$latex \lambda_j\big(b_j - h_j(x^*_\lambda)\big) = 0\qquad\text{for each inequality constraint } j.$
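For instance, in the toy problem above (minimize $latex x_1^2+x_2^2$ subject to $latex x_1+x_2\leq b$): when $latex b>0$ the unconstrained minimum $latex x^*=(0,0)$ is already feasible, the constraint is slack with $latex z^*=b>0$, and complementary slackness forces the corresponding multiplier to satisfy $latex \lambda=0$; when $latex b<0$ the constraint must bind at the optimum, so $latex z^*=0$ and a strictly negative multiplier $latex \lambda<0$ is permitted. In both cases $latex \lambda z^*=0$.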
The unconstrained optimization (L) is, in principle, much easier to solve, and a good choice of $latex \lambda$ can be used to penalize constraints that are not satisfied. For example, if we solved (L) with optimum $latex x^*_\lambda$ and found that $latex h_j(x^*_\lambda) > b_j$ then, intuitively, we would make $latex \lambda_j$ more negative in order to make the price of overrunning constraint $latex j$ higher. Hopefully, this would then make the solution of (L) closer to that of (P). The minimum of the Lagrangian is always smaller.
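To see the effect of $latex \lambda$ in the toy problem with $latex b<0$, minimizing the Lagrangian $latex x_1^2+x_2^2+\lambda(b-x_1-x_2)$ over $latex x\in\mathbb{R}^2$ gives $latex x^*_\lambda=(\lambda/2,\lambda/2)$. At $latex \lambda=0$ this is $latex (0,0)$, which overruns the constraint $latex x_1+x_2\leq b$; making $latex \lambda$ more negative moves $latex x^*_\lambda$ towards feasibility, and the constraint is first satisfied (with equality) at $latex \lambda=b$.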
Lemma [Weak Duality] For all $latex \lambda\in\mathbb{R}^d$,

$latex \min_{x\in\mathcal{X}} L(x;\lambda) \;\leq\; \min_{x\in\mathcal{X}(b)} f(x).$

Proof

$latex \min_{x\in\mathcal{X}} L(x;\lambda) \;\leq\; \min_{x\in\mathcal{X}(b)} L(x;\lambda) \;=\; \min_{x\in\mathcal{X}(b)} f(x),$

where the inequality holds because $latex \mathcal{X}(b)\subseteq\mathcal{X}$ and the equality holds because $latex h(x)=b$ for $latex x\in\mathcal{X}(b)$. $latex \square$
In fact, if we can make the two solutions equal then we have optimality.
Theorem 1 [Lagrangian Sufficiency Theorem] If there exist Lagrange multipliers $latex \lambda$ and a solution $latex x^*$ to $latex \min_{x\in\mathcal{X}} L(x;\lambda)$ such that $latex x^*$ is feasible for (P) then $latex x^*$ solves the optimization problem (P).

Proof As $latex x^*$ is feasible then certainly $latex f(x^*) \geq \min_{x\in\mathcal{X}(b)} f(x)$. Now,

$latex f(x^*) \;=\; f(x^*) + \lambda^\top\big(b - h(x^*)\big) \;=\; \min_{x\in\mathcal{X}} L(x;\lambda) \;\leq\; \min_{x\in\mathcal{X}(b)} f(x),$

where the first equality holds because $latex h(x^*)=b$, the second because $latex x^*$ minimizes the Lagrangian, and the inequality is Weak Duality. So, $latex f(x^*) = \min_{x\in\mathcal{X}(b)} f(x)$. $latex \square$
This result gives a simple procedure to solve an optimization:
- Take your optimization (P) and, if necessary, add slack variables to make all inequality constraints equalities.
- Write out the Lagrangian and solve the optimization $latex \min_{x\in\mathcal{X}} L(x;\lambda)$ for $latex x^*_\lambda$.
- By solving the constraints $latex h(x^*_\lambda)=b$ over $latex \lambda$, find a $latex \lambda^*$ so that $latex x^*_{\lambda^*}$ is feasible. By the Lagrangian Sufficiency Theorem, $latex x^*_{\lambda^*}$ is optimal.
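To see the procedure in action, take the equality-constrained toy problem of minimizing $latex f(x)=x_1^2+x_2^2$ over $latex \mathcal{X}=\mathbb{R}^2$ subject to $latex x_1+x_2=b$ (no slack variables are needed). The Lagrangian is

$latex L(x;\lambda) = x_1^2+x_2^2 + \lambda(b-x_1-x_2).$

Setting $latex \partial L/\partial x_i = 2x_i - \lambda = 0$ gives $latex x^*_\lambda = (\lambda/2,\lambda/2)$. Imposing feasibility, $latex x_1+x_2 = \lambda = b$, so we take $latex \lambda^* = b$ and $latex x^* = (b/2,b/2)$. By the Lagrangian Sufficiency Theorem, $latex x^*$ is optimal and the optimal value is $latex f(x^*)=b^2/2$.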
Duality
Since weak duality holds, we want to make the minimized Lagrangian as big as possible. Only then can a feasible Lagrangian optimum be found to solve the optimization (P). Thus we consider the following optimization,

$latex \text{(D)}\qquad \text{maximize}\ \ \min_{x\in\mathcal{X}} L(x;\lambda)\quad\text{over}\quad \lambda\in\Lambda.$

- Here $latex L(x;\lambda)=f(x)+\lambda^\top(b-h(x))$ and $latex \Lambda=\{\lambda\in\mathbb{R}^d : \min_{x\in\mathcal{X}}L(x;\lambda)>-\infty\}$.
- We call this optimization problem the dual optimization (D) and, in turn, we call (P) the primal optimization.
- Making dependence on $latex b$ explicit, we use $latex \phi(b)$ to denote the optimal value of the primal problem and $latex \psi(b)$ to denote the optimal value of the dual. By weak duality $latex \psi(b)\leq\phi(b)$. If $latex \psi(b)=\phi(b)$ then we say strong duality holds.
- Observe, $latex \max_{\lambda} L(x;\lambda) = f(x)$, if $latex h(x)=b$, and $latex \max_{\lambda} L(x;\lambda)=\infty$, otherwise. So, $latex \phi(b)=\min_{x\in\mathcal{X}}\max_{\lambda} L(x;\lambda)$ and by definition $latex \psi(b)=\max_{\lambda}\min_{x\in\mathcal{X}} L(x;\lambda)$. So strong duality corresponds exactly to the saddle point condition that

$latex \min_{x\in\mathcal{X}}\,\max_{\lambda} L(x;\lambda) \;=\; \max_{\lambda}\,\min_{x\in\mathcal{X}} L(x;\lambda).$
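Returning to the toy problem (minimize $latex x_1^2+x_2^2$ subject to $latex x_1+x_2=b$), the minimized Lagrangian is

$latex \min_{x\in\mathbb{R}^2} L(x;\lambda) \;=\; \min_{x\in\mathbb{R}^2}\big[x_1^2+x_2^2+\lambda(b-x_1-x_2)\big] \;=\; \lambda b - \frac{\lambda^2}{2},$

attained at $latex x^*_\lambda=(\lambda/2,\lambda/2)$. Maximizing over $latex \lambda$ gives $latex \lambda^*=b$ and $latex \psi(b)=b^2/2=\phi(b)$, so strong duality holds for this problem.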
Essentially, the Lagrangian approach will only be effective when we can find a $latex \lambda$ such that $latex \min_{x\in\mathcal{X}} L(x;\lambda) = \phi(b)$. This is equivalent to the existence of a supporting hyperplane for $latex \phi$ at $latex b$.
Theorem 2. $latex \min_{x\in\mathcal{X}} L(x;\lambda) = \phi(b)$ iff $latex \phi$ and $latex \lambda$ together satisfy the property

$latex \phi(c) \;\geq\; \phi(b) + \lambda^\top(c-b)\qquad\text{for all } c. \qquad (\ast)$

Proof.

$latex \min_{x\in\mathcal{X}} L(x;\lambda) \;=\; \min_{c}\,\min_{x\in\mathcal{X}(c)} \big[f(x)+\lambda^\top(b-h(x))\big] \;=\; \min_{c}\,\big[\phi(c)+\lambda^\top(b-c)\big].$

So $latex \min_{x\in\mathcal{X}} L(x;\lambda) = \phi(b)$ iff $latex \min_{c}[\phi(c)+\lambda^\top(b-c)] = \phi(b)$ iff $latex \phi(c)+\lambda^\top(b-c)\geq\phi(b)$ for all $latex c$, as required. $latex \square$
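In the toy problem, $latex \phi(c)=c^2/2$ and the candidate multiplier at $latex b$ is $latex \lambda=b$. Property $(\ast)$ reads $latex c^2/2 \geq b^2/2 + b(c-b)$, which rearranges to $latex (c-b)^2/2\geq 0$ and so holds for every $latex c$, confirming that $latex \min_{x} L(x;b) = \phi(b)$.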
- Informally, $(\ast)$ states that there exists a tangent to $latex \phi$ at the point $latex b$ and that the function $latex \phi$ lies above this tangent.
- The set $latex H=\{(c,y): y = \phi(b)+\lambda^\top(c-b)\}$ defines a hyperplane containing $latex (b,\phi(b))$, and $latex \phi$ lies in the half-space above $latex H$. So we call this “tangent” $latex H$ a supporting hyperplane to $latex \phi$ at $latex b$ with slope $latex \lambda$.
- Any function $latex g$ is convex if $latex g(\theta x + (1-\theta)y) \leq \theta g(x) + (1-\theta)g(y)$, for $latex \theta\in[0,1]$. It can be verified that this is essentially equivalent to the condition $(\ast)$ holding at each point $latex b$. Optimization of convex functions may be possible with Lagrangian methods.
- We discuss the relationship between convexity, hyperplanes, duality and Lagrangian optimization in the supplementary section, Section [DHLO].
Given $(\ast)$, for $latex \epsilon>0$ (and $latex e_i$ the $latex i$th standard basis vector), we can say that

$latex \frac{\phi(b+\epsilon e_i)-\phi(b)}{\epsilon} \;\geq\; \lambda_i \;\geq\; \frac{\phi(b)-\phi(b-\epsilon e_i)}{\epsilon}.$

Thus, if $latex \phi$ is differentiable at $latex b$ then

$latex \lambda_i = \frac{\partial \phi}{\partial b_i}(b).$
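In the toy problem, $latex \phi(b)=b^2/2$ is differentiable with $latex \partial\phi/\partial b = b$, which is exactly the optimal multiplier $latex \lambda^*=b$ found earlier.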
- For a maximization problem, if we interpret $latex \phi(b)$ as the optimal profit (or cost, for a minimization problem) made when manufacturing using $latex b$ units of raw material, then $latex \partial\phi/\partial b_i$ can be interpreted as the change in profit made from gaining a small additional quantity of material $latex i$, and thus $latex \lambda_i$ is interpreted as the price the manufacturer is prepared to pay to secure those extra goods.
- From this interpretation, we call the optimal Lagrange multiplier $latex \lambda^*$ a shadow price.
As noted above, if $latex \phi$ is convex then we can find a $latex \lambda$ with which we can apply the Lagrangian approach.
- For a constrained optimization problem we say Slater’s Condition is satisfied if the objective function $latex f$ is a convex function, if the constraint region $latex \mathcal{X}$ is a convex set, if for each equality constraint $latex h_j(x)=b_j$ the function $latex h_j$ is linear, and if there exists a feasible solution $latex \tilde{x}$ such that all inequality constraints are satisfied with strict inequality, $latex h_j(\tilde{x})<b_j$.
- To make life easier, we assume the functions $latex f$ and $latex h$ are continuous at the feasible $latex \tilde{x}$ assumed in Slater’s Condition, and we assume there exists some non-optimal feasible solution in $latex \mathcal{X}(b)$.
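For instance, the toy problem of minimizing $latex x_1^2+x_2^2$ over $latex \mathcal{X}=\mathbb{R}^2$ subject to $latex x_1+x_2\leq b$ satisfies Slater’s Condition: $latex f$ is convex, $latex \mathcal{X}$ is convex, there are no equality constraints, and $latex \tilde{x}=(b/2-1,\,b/2)$ satisfies the inequality constraint strictly, since $latex \tilde{x}_1+\tilde{x}_2=b-1<b$.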
Theorem 3. If Slater’s Condition holds then $latex \phi$ is a convex function and thus there exists $latex \lambda$ such that $latex \min_{x\in\mathcal{X}} L(x;\lambda) = \phi(b)$.
To prove this we will require the following (intuitively true) result.
Theorem 4. [Supporting Hyperplane Theorem] If $latex C$ is a convex set with non-empty interior and $latex x_0\in\partial C$, the boundary of $latex C$, then there exists a supporting hyperplane to $latex C$ at $latex x_0$; that is, a set $latex H=\{x: \eta^\top x = \eta^\top x_0\}$ with $latex \eta\neq 0$ and $latex C$ belonging to a closed half-space associated with $latex H$ (i.e. $latex \eta^\top x\geq\eta^\top x_0$ for all $latex x\in C$, or $latex \eta^\top x\leq\eta^\top x_0$ for all $latex x\in C$).
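For example, the closed unit disc $latex C=\{x\in\mathbb{R}^2:\|x\|\leq 1\}$ is convex with non-empty interior, and at the boundary point $latex x_0=(1,0)$ the hyperplane $latex H=\{x: x_1=1\}$ supports $latex C$: every $latex x\in C$ satisfies $latex x_1\leq 1$.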
We will prove this result later; for now, we use it to prove Theorem 3.
Proof of Theorem 3. By Slater’s condition the equality constraints are given by a matrix $latex A$, i.e. $latex h_j(x)=(Ax)_j$ for each equality constraint $latex j$. We assume that the image of $latex A$ has dimension equal to the number of equality constraints; otherwise, as row rank $latex =$ column rank for matrices, we can remove a redundant constraint.[1] Consider the set

$latex C = \big\{(c,y)\in\mathbb{R}^d\times\mathbb{R} \,:\, \exists\, x\in\mathcal{X}\ \text{with}\ h_j(x)\leq c_j\ \text{for each inequality constraint},\ h_j(x)=c_j\ \text{for each equality constraint},\ \text{and}\ f(x)\leq y\big\}.$

So, roughly speaking, $latex C$ is all the stuff above $latex \phi(c)$ for each $latex c$.
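As a one-dimensional illustration, for the problem of minimizing $latex x^2$ over $latex \mathbb{R}$ subject to the single inequality constraint $latex x\leq c$, we have $latex \phi(c)=0$ for $latex c\geq 0$ and $latex \phi(c)=c^2$ for $latex c<0$, and $latex C=\{(c,y): y\geq\phi(c)\}$ is precisely the region on and above the graph of the convex function $latex \phi$.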
We show that $latex C$ is convex. For $latex (c^0,y^0),(c^1,y^1)\in C$, $latex \theta\in[0,1]$ and $latex x^0,x^1$ the corresponding feasible points used in the definition of $latex C$ above, consider $latex x^\theta = \theta x^1 + (1-\theta)x^0$ and $latex (c^\theta,y^\theta)=\theta(c^1,y^1)+(1-\theta)(c^0,y^0)$. Then

$latex x^\theta\in\mathcal{X},\qquad\text{since } \mathcal{X}\text{ is convex;}$

$latex h_j(x^\theta)\leq c^\theta_j\ \text{for each inequality constraint}\quad\text{and}\quad h_j(x^\theta)=c^\theta_j\ \text{for each equality constraint,}$

by convexity of the constraint functions and linearity of the equality constraints;

$latex f(x^\theta)\leq \theta f(x^1)+(1-\theta)f(x^0)\leq \theta y^1+(1-\theta)y^0 = y^\theta,\qquad\text{since } f\text{ is convex.}$

The three statements above show that $latex (c^\theta,y^\theta)$ belongs to $latex C$. So $latex C$ is convex.
Take $latex (b,\phi(b))$. As $latex (b,\phi(b))$ lies on the boundary of $latex C$, by the Supporting Hyperplane Theorem there exists a $latex (\eta,\mu)\neq(0,0)$ such that

$latex \eta^\top c + \mu y \;\geq\; \eta^\top b + \mu\,\phi(b)\qquad\text{for all } (c,y)\in C.$

Provided we can choose $latex \mu$ to be strictly positive then $latex \phi$ has a supporting hyperplane in the sense of Theorem 2 and so we are done. It is clear $latex \mu$ is non-negative: if we take $latex c=b$ and $latex y=f(\hat{x})$ for some non-optimal feasible $latex \hat{x}$, the above inequality reads $latex \mu\big(f(\hat{x})-\phi(b)\big)\geq 0$ for all such $latex \hat{x}$, which is only true if $latex \mu\geq 0$.
Now we show $latex \mu\neq 0$. Let’s look for a contradiction: if $latex \mu=0$ then the supporting hyperplane would be vertical with respect to the component $latex y$ and we would have $latex \eta^\top c\geq\eta^\top b$ for all $latex (c,y)\in C$, with $latex \eta\neq 0$. Now applying Slater’s condition, there is a point $latex \tilde{x}$ where the inequality constraints are strict. By continuity of our inequality constraint functions and by continuity of the linear map $latex A$ (which has full row rank), for any $latex \epsilon>0$ suitably small we can choose $latex x$ (close to $latex \tilde{x}$) such that

$latex h_j(x)\leq b_j - \epsilon\eta_j\ \text{for each inequality constraint}\qquad\text{and}\qquad h_j(x)= b_j - \epsilon\eta_j\ \text{for each equality constraint.}$

By continuity of $latex f$ we can choose $latex y$ so that $latex f(x)\leq y$. Thus, it is clear that $latex (b-\epsilon\eta,\,y)\in C$ but

$latex \eta^\top(b-\epsilon\eta) = \eta^\top b - \epsilon\|\eta\|^2 < \eta^\top b,$

which contradicts the earlier assertion that $latex \eta^\top c\geq\eta^\top b$ for all $latex (c,y)\in C$; thus $latex \mu>0$. Thus by taking $latex \lambda = -\eta/\mu$ we have

$latex \phi(c)\;\geq\;\phi(b)+\lambda^\top(c-b)\qquad\text{for all } c.$

Thus, by Theorem 2, $latex \min_{x\in\mathcal{X}} L(x;\lambda)=\phi(b)$, as required. $latex \square$
[1] We can choose not to remove these redundant constraints, but then we have to mess around with relative interiors; see Section [DHLO].