We consider one of the simplest iterative procedures for solving the (unconstrained) optimization
$$\min_{x \in \mathbb{R}^d} f(x),$$
where $f$ is a convex, twice differentiable function. For fixed $x$ and a small step $d$, a first-order Taylor expansion gives $f(x + d) \approx f(x) + \nabla f(x)^\top d$. If $\nabla f(x) \neq 0$, this gradient term is most negative when $d$ points in the direction $-\nabla f(x)$. Thus, starting from some $x_0$ and with step size $\eta > 0$, we define the gradient descent algorithm:
$$x_{t+1} = x_t - \eta\, \nabla f(x_t), \qquad t = 0, 1, 2, \ldots$$
- We assume the Hessian matrix of $f$ at $x$ satisfies
$$m I \;\preceq\; \nabla^2 f(x) \;\preceq\; M I \qquad \text{for all } x,$$
for some $0 < m \leq M < \infty$. So, by a Taylor expansion,
$$f(x) + \nabla f(x)^\top d + \frac{m}{2}\|d\|^2 \;\leq\; f(x + d) \;\leq\; f(x) + \nabla f(x)^\top d + \frac{M}{2}\|d\|^2 .$$
- We assume the step size satisfies $\eta \leq 1/M$. So the right-hand inequality above, applied with $d = -\eta\,\nabla f(x_t)$, guarantees that
$$f(x_{t+1}) \;\leq\; f(x_t) - \frac{\eta}{2}\|\nabla f(x_t)\|^2 \;\leq\; f(x_t)$$
for all $t$.
- We assume there is a unique finite solution, $x^\star$, to our optimization.1
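Putting the update rule and the step-size assumption together, here is a minimal sketch of gradient descent on a strongly convex quadratic; the particular matrix, vector, and iteration count below are illustrative choices for the example, not part of the notes.

```python
import numpy as np

# Minimal gradient descent sketch on a strongly convex quadratic
# f(x) = 0.5 * x^T A x - b^T x, whose Hessian is A.
# The matrix A, the vector b, and the iteration count are illustrative choices.
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite Hessian
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b

M = np.linalg.eigvalsh(A).max()          # largest Hessian eigenvalue
eta = 1.0 / M                            # step size satisfying eta <= 1/M

x = np.zeros(2)                          # starting point x_0
for t in range(100):
    x = x - eta * grad(x)                # x_{t+1} = x_t - eta * grad f(x_t)

x_star = np.linalg.solve(A, b)           # exact minimizer, for comparison
print(f(x) - f(x_star))                  # optimality gap after 100 steps
```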
We now see our algorithm converges exponentially fast to its solution, so-called linear convergence.
Given the assumptions above, there exist constants $c > 0$ and $\rho < 1$ such that
$$f(x_t) - f(x^\star) \;\leq\; c\,\rho^t \qquad \text{and} \qquad \|x_t - x^\star\|^2 \;\leq\; c\,\rho^t .$$
Rewriting the descent inequality from our step-size assumption, we have
$$f(x_{t+1}) - f(x^\star) \;\leq\; f(x_t) - f(x^\star) - \frac{\eta}{2}\|\nabla f(x_t)\|^2 .$$
Using the lower bound on the Hessian, we bound $f(x^\star)$ with the following Taylor expansion:
$$f(x^\star) \;\geq\; \min_{d}\Big\{ f(x_t) + \nabla f(x_t)^\top d + \frac{m}{2}\|d\|^2 \Big\} \;=\; f(x_t) - \frac{1}{2m}\|\nabla f(x_t)\|^2 ,$$
that is, $\|\nabla f(x_t)\|^2 \geq 2m\big(f(x_t) - f(x^\star)\big)$. Combining the last two inequalities, we have, as required,
$$f(x_{t+1}) - f(x^\star) \;\leq\; (1 - \eta m)\big(f(x_t) - f(x^\star)\big), \qquad\text{so}\qquad f(x_t) - f(x^\star) \;\leq\; (1 - \eta m)^t\big(f(x_0) - f(x^\star)\big),$$
and the claim holds with $\rho = 1 - \eta m$. Finally, for $\|x_t - x^\star\|^2$, we reapply our Hessian lower-bound assumption (now at $x^\star$, where $\nabla f(x^\star) = 0$) and use the above bound for $f(x_t) - f(x^\star)$:
$$\frac{m}{2}\|x_t - x^\star\|^2 \;\leq\; f(x_t) - f(x^\star) \;\leq\; (1 - \eta m)^t\big(f(x_0) - f(x^\star)\big).$$
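As a sanity check, the rate $\rho = 1 - \eta m$ from this argument can be verified numerically on a small quadratic; the diagonal Hessian, starting point, and number of iterations below are assumptions made purely for illustration.

```python
import numpy as np

# Check f(x_t) - f(x*) <= (1 - eta*m)^t * (f(x_0) - f(x*)) on a quadratic
# f(x) = 0.5 * x^T A x - b^T x with Hessian bounds m = 1 and M = 4.
A = np.diag([1.0, 4.0])
b = np.array([2.0, -2.0])
f = lambda x: 0.5 * x @ A @ x - b @ x

m, M = 1.0, 4.0
eta = 1.0 / M
x_star = np.linalg.solve(A, b)
gap0 = f(np.zeros(2)) - f(x_star)

x = np.zeros(2)
for t in range(1, 21):
    x = x - eta * (A @ x - b)            # gradient step
    assert f(x) - f(x_star) <= (1 - eta * m) ** t * gap0 + 1e-12
print("linear-convergence bound holds for 20 iterations")
```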
Projected Gradient Descent
We consider a simple iterative procedure for solving a constrained optimization
$$\min_{x \in \mathcal{X}} f(x),$$
where $f$ is a convex, twice differentiable function and where $\mathcal{X} \subseteq \mathbb{R}^d$ is some non-empty closed convex set, e.g. the ball $\{x : \|x\| \leq 1\}$. Like in Section [GD], we want to follow the steepest descent direction and move from $x_t$ to $x_t - \eta_t \nabla f(x_t)$. However, such points need not belong to $\mathcal{X}$. Thus we consider the projection of a point $x$ onto $\mathcal{X}$:
$$\Pi_{\mathcal{X}}(x) \;:=\; \operatorname*{argmin}_{y \in \mathcal{X}} \|x - y\| ,$$
and then, from $x_t$, we define the projected gradient descent by
$$x_{t+1} \;=\; \Pi_{\mathcal{X}}\big(x_t - \eta_t\, \nabla f(x_t)\big).$$
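To make the projection step concrete, here is a small sketch of $\Pi_{\mathcal{X}}$ and one update for the unit-ball example; the ball constraint, the objective, and the step size are illustrative assumptions only.

```python
import numpy as np

# Euclidean projection onto the unit ball {x : ||x|| <= 1} and one
# projected gradient descent update. Constraint set and step are illustrative.
def project_ball(x, radius=1.0):
    """Return the closest point to x in the ball of the given radius."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

grad_f = lambda x: x - np.array([2.0, 0.0])   # gradient of f(x) = 0.5*||x - c||^2

x_t = np.array([0.5, 0.5])
eta_t = 0.1
x_next = project_ball(x_t - eta_t * grad_f(x_t))   # x_{t+1} = Pi_X(x_t - eta_t * grad f(x_t))
print(x_next)
```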
- After projecting, it need not be true that $f(x_{t+1}) \leq f(x_t)$. Thus we adjust the step-size $\eta_t$ over the iterations, and our proof will study the distance $\|x_t - x^\star\|$ between $x_t$ and the optimal solution $x^\star$ rather than the gap $f(x_t) - f(x^\star)$.
- The key observation is that making a projection cannot increase distances:

[PGD:proj] For all $x, x' \in \mathbb{R}^d$,
$$\|\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(x')\| \;\leq\; \|x - x'\| .$$

For all $y \in \mathcal{X}$, we must have
$$\big(x - \Pi_{\mathcal{X}}(x)\big)^\top \big(y - \Pi_{\mathcal{X}}(x)\big) \;\leq\; 0 ,$$
i.e. the plane through $\Pi_{\mathcal{X}}(x)$ orthogonal to $x - \Pi_{\mathcal{X}}(x)$ separates $x$ from $\mathcal{X}$. If this were not true, then we would have a contradiction; in particular, there would be a point on the line segment joining $\Pi_{\mathcal{X}}(x)$ and $y$ that is closer to $x$ than $\Pi_{\mathcal{X}}(x)$ is. We thus have
$$\big(x - \Pi_{\mathcal{X}}(x)\big)^\top \big(\Pi_{\mathcal{X}}(x') - \Pi_{\mathcal{X}}(x)\big) \;\leq\; 0
\qquad\text{and}\qquad
\big(x' - \Pi_{\mathcal{X}}(x')\big)^\top \big(\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(x')\big) \;\leq\; 0 .$$
Adding these together and then applying Cauchy-Schwarz implies
$$\|\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(x')\|^2 \;\leq\; (x - x')^\top\big(\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(x')\big) \;\leq\; \|x - x'\|\,\|\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(x')\| ,$$
as required.
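A quick numerical illustration of this non-expansiveness, again using projection onto the unit ball as an assumed example set:

```python
import numpy as np

# Check ||Pi(x) - Pi(x')|| <= ||x - x'|| on random pairs, with Pi the
# projection onto the unit ball (an illustrative closed convex set).
def project_ball(x, radius=1.0):
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

rng = np.random.default_rng(0)
for _ in range(1000):
    x, x2 = rng.normal(size=3), rng.normal(size=3)
    assert np.linalg.norm(project_ball(x) - project_ball(x2)) <= np.linalg.norm(x - x2) + 1e-12
print("non-expansiveness holds on all sampled pairs")
```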
If the gradients at the iterates are bounded, say $\|\nabla f(x_t)\| \leq B$ for all $t$, then, by the lemma above (and since $\Pi_{\mathcal{X}}(x^\star) = x^\star$),
$$\|x_{t+1} - x^\star\|^2 \;\leq\; \|x_t - \eta_t \nabla f(x_t) - x^\star\|^2 \;\leq\; \|x_t - x^\star\|^2 - 2\eta_t \nabla f(x_t)^\top(x_t - x^\star) + \eta_t^2 B^2 .$$
Hence if $\eta_t = \eta$ for all $t$, and we apply the convexity bound $\nabla f(x_t)^\top(x_t - x^\star) \geq f(x_t) - f(x^\star)$, then
$$\|x_{t+1} - x^\star\|^2 \;\leq\; \|x_t - x^\star\|^2 - 2\eta\big(f(x_t) - f(x^\star)\big) + \eta^2 B^2 .$$
Iterating the above expression yields
$$0 \;\leq\; \|x_T - x^\star\|^2 \;\leq\; \|x_0 - x^\star\|^2 - 2\eta\sum_{t=0}^{T-1}\big(f(x_t) - f(x^\star)\big) + T\eta^2 B^2 .$$
Rearranging the above gives the required result:
$$\min_{t < T}\, f(x_t) - f(x^\star) \;\leq\; \frac{1}{T}\sum_{t=0}^{T-1}\big(f(x_t) - f(x^\star)\big) \;\leq\; \frac{\|x_0 - x^\star\|^2}{2\eta T} + \frac{\eta B^2}{2} ,$$
which, when $\|x_0 - x^\star\| \leq D$ and $\eta = \frac{D}{B\sqrt{T}}$, is at most $\frac{DB}{\sqrt{T}}$.
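For illustration, here is a sketch of the full procedure with this step-size choice on the unit-ball example from before; the objective, the horizon $T$, and the bounds $B$ and $D$ are assumed values chosen for the example.

```python
import numpy as np

# Projected gradient descent on f(x) = 0.5*||x - c||^2 over the unit ball,
# with the constant step eta = D / (B * sqrt(T)) suggested by the analysis.
# The target c, horizon T, and the bounds B and D are illustrative choices.
c = np.array([2.0, 0.0])                  # unconstrained minimizer, outside the ball
f = lambda x: 0.5 * np.sum((x - c) ** 2)
grad = lambda x: x - c

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x|| <= radius}."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

x_star = project_ball(c)                  # constrained minimizer: nearest ball point to c
T = 10_000
D = 2.0                                   # bound on ||x_0 - x*|| (here x_0 = 0)
B = 3.0                                   # bound on ||grad f(x)|| = ||x - c|| over the ball
eta = D / (B * np.sqrt(T))

x, best_gap = np.zeros(2), np.inf
for t in range(T):
    best_gap = min(best_gap, f(x) - f(x_star))
    x = project_ball(x - eta * grad(x))   # x_{t+1} = Pi_X(x_t - eta * grad f(x_t))

print(best_gap, D * B / np.sqrt(T))       # best gap versus the D*B/sqrt(T) guarantee
```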
1. This assumption is not really needed, but it saves space.