Lagrangian Optimization

We are interested in solving the constrained optimization problem

\text{minimize}\quad f(x)\quad \text{subject to}\quad h(x)=b,\quad x\in{\mathcal X}. \qquad (P)


  • The variables x must belong to the constraint region {\mathcal X}, where {\mathcal X} is a subset of {\mathbb R}^p.
  • The variables must satisfy the functional constraints h(x)=b, where h:{\mathcal X}\rightarrow{\mathbb R}^d and b\in{\mathbb R}^d.
  • Variables that belong to the constraint region and satisfy the functional constraints are called feasible and belong to the feasible set {\mathcal X}(b)=\{x\in{\mathcal X} : h(x)=b\}.


  • The variables are judged according to the relative magnitude of the objective function f:{\mathcal X}\rightarrow{\mathbb R}.
  • A feasible solution to the optimization, x^*, is called optimal when f(x^*)\leq f(x) for all x\in{\mathcal X}(b).
  • In general, an optimum x^* need not exist.
  • Inequality constraints h_j(x)\leq b_j can be rewritten as equality constraints h_j(x)+z_j=b_j, where z_j\geq 0. Then an optimization of the form (P) can be solved over variables (x,z)\in{\mathcal X}\times{\mathbb R}_+^{p'}. The variables z are called slack variables.
  • \max_{x\in{\mathcal X}(b)} f(x) = - \min_{x\in{\mathcal X}(b)} -f(x), so maximizing is the same as minimizing.
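These reformulations can be checked numerically. Below is a minimal sketch, assuming a toy problem not from the text (maximize f(x)=-(x-1)^2 over \{x : x\leq 2\}):

```python
# Assumed toy problem: maximize f(x) = -(x-1)^2 over the feasible set {x : x <= 2}.
f = lambda x: -(x - 1) ** 2

xs = [i / 1000 for i in range(-3000, 3001)]   # grid over [-3, 3]
feasible = [x for x in xs if x <= 2]

# max f = -min(-f): maximizing is the same as minimizing
best_max = max(f(x) for x in feasible)
best_min = -min(-f(x) for x in feasible)
assert abs(best_max - best_min) < 1e-12

# slack form: the pairs (x, z) with x + z = 2, z >= 0 sweep exactly
# the feasible x, turning the inequality into an equality constraint
slack_pairs = [(x, 2 - x) for x in xs if 2 - x >= 0]
assert all(x <= 2 for x, _ in slack_pairs)
```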

The set {\mathcal X} could incorporate the functional constraints \{x:h(x)=b\}, but generally we assume {\mathcal X} is a set where the unconstrained optimization, \min_{x\in{\mathcal X}} f(x), is easier to solve. For example, {\mathcal X} might be {\mathbb R}_+^p and \min_{x\in{\mathcal X}} f(x) might be solved by differentiating and finding local minima. What makes the problem difficult is the functional constraints. To get some solution, we might penalize violating these constraints. If so, we could remove these constraints and solve a different optimization:

\min_{x\in{\mathcal X}} \; f(x)+\lambda^\T (b-h(x)). \qquad (L_\lambda)


  • In the above optimization, the objective function

L(x;\lambda)= f(x)+\lambda^\T (b-h(x))

is called the Lagrangian of the optimization problem (P).

  • We call the components of the vector \lambda\in{\mathbb R}^d Lagrange multipliers.
  • We use x^*(\lambda) to denote an optimum of (L_\lambda), if it exists.
  • For each inequality constraint h_j(x)\leq b_j, a variable z_j\geq 0 is added and the term \lambda_j(b_j-h_j(x)) is replaced by \lambda_j(b_j-z_j-h_j(x)). Because we introduced a new term -\lambda_jz_j, for a finite solution to (L_\lambda) we require \lambda_j\leq 0 and thus, after minimizing over z_j, we have z^*_j\lambda_j=0. This equality is called complementary slackness.
  • If the optimal Lagrangian solution (x^*,z^*) is also feasible, then complementary slackness states that \lambda_j( b_j- h_j(x^*))=0.
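Complementary slackness can be seen concretely. A sketch on an assumed toy problem (minimize (x-2)^2 subject to x\leq 1), following the sign convention above:

```python
# Assumed toy problem: minimize f(x) = (x-2)^2 subject to h(x) = x <= 1.
# With slack z >= 0 the constraint becomes x + z = 1 and the Lagrangian
# term is lam*(1 - z - x); a finite minimum over z >= 0 forces lam <= 0.
f = lambda x: (x - 2) ** 2

# Minimizing L over x: 2(x - 2) - lam = 0, so x*(lam) = 2 + lam/2.
# Feasibility with z* = 0 gives 2 + lam/2 = 1, i.e. lam* = -2.
lam_star = -2.0
x_star = 2 + lam_star / 2      # = 1.0
z_star = 1 - x_star            # = 0.0

assert lam_star <= 0 and z_star >= 0
assert z_star * lam_star == 0  # complementary slackness

# x*(lam*) is feasible, so it should match the constrained minimum
brute = min(f(i / 1000) for i in range(-2000, 1001))  # grid over x <= 1
assert abs(f(x_star) - brute) < 1e-9
```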

The unconstrained optimization (L_\lambda) is, in principle, much easier to solve, and a good choice of \lambda can be used to penalize constraints that are not satisfied. For example, if we solved (L_\lambda) with optimum x^*(\lambda) and found that b_i>h_i(x^*(\lambda)) then, intuitively, we would make \lambda_i bigger in order to make the price of violating constraint i higher. Hopefully, this would then bring the solution of (L_\lambda) closer to that of (P). The minimum of the Lagrangian is always smaller.

Lemma [Weak Duality] For all \lambda\in{\mathbb R}^d,

\min_{x\in{\mathcal X}} L(x;\lambda) \leq \min_{x\in{\mathcal X}(b)} f(x).

Proof For any feasible x, L(x;\lambda)=f(x)+\lambda^\T(b-h(x))=f(x), and the minimum over {\mathcal X} is at most the minimum over the smaller set {\mathcal X}(b). \square

In fact, if we can make the two solutions equal then we have optimality.
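Both the inequality and the case of equality can be checked numerically. A minimal sketch, assuming the toy problem minimize x^2 over {\mathcal X}={\mathbb R} subject to x=b, with b=2:

```python
# Assumed toy problem: minimize f(x) = x^2 over X = R subject to h(x) = x = b.
b = 2.0
primal = b ** 2  # f at the only feasible point, x = b

xs = [i / 100 for i in range(-1000, 1001)]  # grid standing in for X = R

def g(lam):
    # inner minimization of the Lagrangian L(x; lam) = x^2 + lam*(b - x)
    return min(x * x + lam * (b - x) for x in xs)

# g(lam) <= primal for every lambda (weak duality) ...
for lam in [-5.0, -1.0, 0.0, 1.0, 2.5, 4.0, 7.0]:
    assert g(lam) <= primal + 1e-9

# ... and the bound is attained at lam* = 2b = 4, where x*(lam*) = b is feasible
assert abs(g(2 * b) - primal) < 1e-9
```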

Theorem 1 [Lagrangian Sufficiency Theorem] If there exist Lagrange multipliers \lambda^* and a solution x^*(\lambda^*) to (L_{\lambda^*}) such that x^*(\lambda^*) is feasible for (P), then x^*(\lambda^*) solves the optimization problem (P).

Proof As x^*(\lambda^*) is feasible then certainly f(x^*(\lambda^*))\geq \min_{x\in{\mathcal X}(b)} f(x). Now,

f(x^*(\lambda^*)) = f(x^*(\lambda^*)) + \lambda^{*\T}\big(b-h(x^*(\lambda^*))\big) = \min_{x\in{\mathcal X}} L(x;\lambda^*) \leq \min_{x\in{\mathcal X}(b)} \big[ f(x) + \lambda^{*\T}(b-h(x)) \big] = \min_{x\in{\mathcal X}(b)} f(x).

So, f(x^*(\lambda^*))= \min_{x\in{\mathcal X}(b)} f(x). \square


This result gives a simple procedure to solve an optimization:

  1. Take your optimization and, if necessary, add slack variables to make all inequality constraints equalities.
  2. Write out the Lagrangian and solve the optimization (L_\lambda) for x^*(\lambda).
  3. By solving the constraints h(x^*(\lambda))=b over \lambda, find a \lambda^* so that x^*=x^*(\lambda^*) is feasible. By the Lagrangian Sufficiency Theorem, x^* is optimal.
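The three steps can be run on a small assumed example (minimize x_1^2+x_2^2 subject to x_1+x_2=b); the names below are illustrative only:

```python
# Assumed example: minimize f(x) = x1^2 + x2^2 subject to x1 + x2 = b, X = R^2.
b = 3.0
f = lambda x1, x2: x1 ** 2 + x2 ** 2

# Step 2: L(x; lam) = x1^2 + x2^2 + lam*(b - x1 - x2);
# stationarity 2*x_i - lam = 0 gives x*(lam) = (lam/2, lam/2).
x_of_lam = lambda lam: (lam / 2, lam / 2)

# Step 3: solve h(x*(lam)) = b, i.e. lam/2 + lam/2 = b, so lam* = b.
lam_star = b
x_star = x_of_lam(lam_star)            # (b/2, b/2)
assert abs(sum(x_star) - b) < 1e-12    # feasible

# Sufficiency: x* should match a brute-force search along the feasible line
brute = min(f(t, b - t) for t in (i / 1000 for i in range(-5000, 5001)))
assert abs(f(*x_star) - brute) < 1e-9
```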


Since weak duality holds, we want \lambda to make the minimized Lagrangian as big as possible. Only then can a feasible Lagrangian optimum be found to solve the optimization (P). Thus we consider the following optimization,

\max_{\lambda\in\Lambda}\; g(\lambda)

  • Here g(\lambda)=\inf_{x\in\mathcal X} L(x;\lambda) and \Lambda=\{ \lambda : \inf_{x\in\mathcal X} L(x;\lambda)>-\infty \}.
  • We call this optimization problem the dual optimization and, in turn, we call (P) the primal optimization.
  • Making dependence on b explicit, use \rho(b) to denote the optimal value of the primal problem and \delta(b) to denote the optimal value of the dual. By weak duality \delta(b)\leq \rho(b). If \delta(b)=\rho(b) then we say strong duality holds.
  • Observe, \max_{\lambda} L(x;\lambda)= f(x) if h(x)=b, and \max_{\lambda} L(x;\lambda)= \infty otherwise. So, \rho(b)=\min_{x\in\mathcal X}\max_{\lambda} L(x;\lambda) and, by definition, \delta(b)=\max_{\lambda} \min_{x\in\mathcal X} L(x;\lambda). So strong duality corresponds exactly to the saddle point condition that

\min_{x\in\mathcal X}\max_{\lambda} L(x;\lambda) = \max_{\lambda}\min_{x\in\mathcal X} L(x;\lambda).
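Without convexity the saddle point condition can fail, i.e. \delta(b)<\rho(b). A sketch with an assumed nonconvex example (minimize -x^2 over {\mathcal X}=[-1,1] subject to x=0):

```python
# Assumed nonconvex example: minimize f(x) = -x^2 over X = [-1, 1]
# subject to x = 0. The primal optimum is rho = f(0) = 0.
xs = [i / 1000 for i in range(-1000, 1001)]  # grid for X = [-1, 1]

def g(lam):
    # g(lam) = min over X of [-x^2 + lam*(0 - x)]; the minimum sits at an
    # endpoint of [-1, 1], giving g(lam) = -1 - |lam|
    return min(-x * x - lam * x for x in xs)

rho = 0.0
delta = max(g(lam / 10) for lam in range(-50, 51))  # dual optimum over a grid

assert delta <= rho             # weak duality holds
assert abs(delta + 1.0) < 1e-9  # but delta = -1 < 0 = rho: a duality gap
```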

Essentially, the Lagrangian approach will only be effective when we can find a \lambda^* such that \rho(b)=\inf_{x\in\mathcal X} L(x;\lambda^*). This is equivalent to the existence of a supporting hyperplane for \rho(b).

Theorem 2. \rho(b)=\inf_{x\in\mathcal X} L(x;\lambda^*) if and only if \lambda^* and \rho together satisfy the property

\rho(b) \leq \rho(c) + \lambda^{*\T} (b-c), \quad \text{for all } c\in\mathbb R^d.
Proof Grouping the points of \mathcal X according to the value of h,

\inf_{x\in\mathcal X} L(x;\lambda^*) = \inf_{c\in\mathbb R^d} \inf_{x\in\mathcal X(c)} \big\{ f(x) + \lambda^{*\T} (b-c) \big\} = \inf_{c\in\mathbb R^d} \big\{ \rho(c) + \lambda^{*\T} (b-c) \big\}.

So \rho(b)=\inf_{x\in\mathcal X} L(x;\lambda^*) iff \rho(b)=\inf_{c\in\mathbb R^d} \big\{ \rho(c) + \lambda^{*\T} (b-c) \big\} iff \rho(b)\leq \rho(c) + \lambda^{*\T} (b-c) for all c, as required. \square

  • Informally, this property states that there exists a tangent to \rho at the point b and that the function \rho lies above this tangent.
  • The set H=\{ (c,\rho(b) + \lambda^{*\T}(c-b))\in\mathbb R^{d+1}: c\in\mathbb R^d \} defines a hyperplane containing (b,\rho(b)), and \{(c,\rho(c)):c\in\mathbb R^d\} lies in the half-space above H. So we call this “tangent” H a supporting hyperplane to \rho at b with slope \lambda^*.
  • A function \rho:\mathbb R^d\rightarrow (-\infty,\infty] is convex if q\rho(b)+(1-q)\rho(c) \geq \rho(qb+(1-q)c) for all b, c and q\in [0,1]. It can be verified that this is equivalent to the existence of a supporting hyperplane at every point. Optimization of convex functions may therefore be possible with Lagrangian methods.
  • We discuss the relationship between convexity, hyperplanes, duality and Lagrangian optimization in the supplementary section, Section [DHLO].

Given the supporting hyperplane property, for h>0 we can say that

\frac{\rho(b+he_i)-\rho(b)}{h} \geq \lambda^*_i \geq \frac{\rho(b)-\rho(b-he_i)}{h}.

Thus, if \rho is differentiable at b then

\frac{\partial \rho}{\partial b_i}(b) = \lambda^*_i.

  • For a maximization problem, if we interpret \rho(b) as the optimal profit (or, for a minimization problem, the optimal cost) made when manufacturing using b units of raw material, then \partial\rho/\partial b_i can be interpreted as the change in profit made from gaining a small additional quantity of material i, and thus \lambda_i^* is interpreted as the price the manufacturer is prepared to pay to secure those extra goods.
  • From this interpretation, we call the optimal Lagrange multiplier \lambda^*_i a shadow price.
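The shadow-price interpretation can be checked by finite differences. A sketch on an assumed example (minimize x_1^2+x_2^2 subject to x_1+x_2=b, where \rho(b)=b^2/2 and \lambda^*=b):

```python
# Assumed example: minimize x1^2 + x2^2 subject to x1 + x2 = b.
# Here rho(b) = b^2/2 with optimal multiplier lambda* = b, so the
# finite-difference slope of rho at b should be lambda*.

def rho(b):
    # brute-force minimum of f along the feasible line x = (t, b - t)
    return min(t ** 2 + (b - t) ** 2 for t in (i / 1000 for i in range(-5000, 5001)))

b, h = 2.0, 0.01
lam_star = b                               # optimal multiplier for this problem
fd = (rho(b + h) - rho(b - h)) / (2 * h)   # central difference
assert abs(fd - lam_star) < 1e-2
```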

As noted above, if \rho is convex then we can find a \lambda^* with which to apply the Lagrangian approach.

  • For a constrained optimization problem, we say Slater’s Condition is satisfied if the objective function f is a convex function, the constraint region \mathcal X is a convex set, each inequality constraint function h_j is convex, for each equality constraint h_j(x)=b_j the function h_j is linear, and there exists a feasible solution x\in\mathcal X(b) such that all inequality constraints are satisfied with strict inequality, h_j(x)< b_j.
  • To make life easier, we assume the functions f and h_j are continuous at the feasible x assumed in Slater’s condition, and we assume there exists some non-optimal feasible solution in \mathcal X(b).

Theorem 3. If Slater’s Condition holds then \rho is a convex function and thus there exists \lambda^* such that \rho(b)=\inf_{x\in\mathcal X} L(x;\lambda^*).

To prove this we will require the following (intuitively true) result.

Theorem 4. [Supporting Hyperplane Theorem] If C is a convex set with non-empty interior and z\notin C^\circ then there exists a supporting hyperplane H, that is, a set H=\{ x : a^\T x=\beta \} with z\in H and C belonging to a closed half-space associated with H (i.e. \{ x : a^\T x\leq \beta \} or \{ x : a^\T x\geq \beta \}).

We prove this result later; for now, we use it to prove Theorem 3.

Proof of Theorem 3. By Slater’s condition the equality constraints are given by a matrix H, i.e. (Hx)_j=b_j for j=1,...,d'. We assume that the image of H has dimension d'; otherwise, as row rank equals column rank for matrices, we can remove a redundant constraint.1 Consider the set

C = \big\{ (c,\rho)\in\mathbb R^d\times\mathbb R : \exists x\in\mathcal X \text{ with } (Hx)_j=c_j \text{ for } j=1,...,d',\; h_j(x)\leq c_j \text{ for } j=d'+1,...,d, \text{ and } f(x)\leq \rho \big\}.

So, roughly speaking, C is all the stuff above \rho(b') for each b'.

We show that C is convex. For p\in (0,1), (b^1,\rho^1),(b^2,\rho^2)\in C and x^1,x^2 corresponding feasible points used in the definition of C above, consider (b^p,\rho^p)= p(b^1,\rho^1)+ (1-p) (b^2,\rho^2) and x^p=px^1+(1-p)x^2.

By convexity of \mathcal X, x^p\in\mathcal X. Further:

  • f(x^p) \leq p f(x^1) + (1-p) f(x^2) \leq p\rho^1 + (1-p)\rho^2 = \rho^p, by convexity of f;
  • (Hx^p)_j = p(Hx^1)_j + (1-p)(Hx^2)_j = b^p_j for j=1,...,d', by linearity;
  • h_j(x^p) \leq p h_j(x^1) + (1-p) h_j(x^2) \leq b^p_j for j=d'+1,...,d, by convexity of h_j.

The three statements above show that (b^p,\rho^p) belongs to C. So C is convex.

Take (b,\rho(b)). As (b,\rho(b))\notin C^\circ, by the Supporting Hyperplane Theorem there exists a (-\lambda,\mu)\in \mathbb R^d\times \mathbb R such that

\mu\rho - \lambda^\T c \geq \mu\rho(b) - \lambda^\T b, \quad \text{for all } (c,\rho)\in C.
Provided we can choose \mu to be strictly positive, then \rho has a supporting hyperplane in the sense of Theorem 2 and we are done. It is clear \mu is non-negative: if we take c=b and \rho=f(x) for some non-optimal x\in\mathcal X(b), the above inequality reads \mu f(x) \geq \mu \rho(b) for all such x, which is only true if \mu \geq 0.

Now we show \mu\neq 0. Let’s look for a contradiction: if \mu=0 then the supporting hyperplane would be vertical with respect to the component \rho and we would have \lambda^\T c \leq \lambda^\T b for all (c,\rho)\in C. Now, applying Slater’s condition, there is a point x where the inequality constraints are strict. By continuity of the inequality constraint functions and by continuity of the linear map H, for any \delta suitably small we can choose x(\delta) (close to x) such that

(Hx(\delta))_j = b_j + \delta\lambda_j \text{ for } j=1,...,d', \qquad h_j(x(\delta)) \leq b_j + \delta\lambda_j \text{ for } j=d'+1,...,d.

By continuity of f, we can choose x(\delta) so that f(x(\delta))<\infty. Thus (b + \delta \lambda, f(x(\delta))) \in C but \lambda^\T (b + \delta \lambda) > \lambda^\T b (the normal (-\lambda,\mu) is non-zero, so \lambda\neq 0 when \mu=0), which contradicts the earlier inequality. Thus \mu>0. Taking \lambda^* = (\lambda_j/\mu: j=1,...,d) and applying the separating inequality to (c,\rho(c)), we have

\rho(b) \leq \rho(c) + \lambda^{*\T}(b-c), \quad \text{for all } c\in\mathbb R^d.

Thus, by Theorem 2, \rho(b)=\inf_{x\in\mathcal X} L(x;\lambda^*), as required. \square

  1. We can choose not to remove these redundant constraints, but then we have to mess around with relative interiors; see Section [DHLO].
