We consider one of the simplest iterative procedures for solving the (unconstrained) optimization

$$
\min_{x \in \mathbb{R}^d} f(x),
$$

where $f$ is a convex, twice differentiable function. For fixed $x$, a direction $d$ and a small step size $\alpha > 0$, a first-order Taylor expansion gives

$$
f(x + \alpha d) \;\approx\; f(x) + \alpha \nabla f(x)^\top d .
$$

If $\|d\| = 1$, the gradient term $\nabla f(x)^\top d$ is most negative when $d = -\nabla f(x) / \|\nabla f(x)\|$. Thus, starting from $x_0$, define the *gradient descent algorithm*:

$$
x_{t+1} \;=\; x_t - \alpha \nabla f(x_t), \qquad t = 0, 1, 2, \dots
$$
- We assume the Hessian matrix of $f$ at $x$ satisfies $m I \preceq \nabla^2 f(x) \preceq M I$, for some $0 < m \leq M$. So, by a Taylor expansion, for all $x, y$,
  $$
  f(x) + \nabla f(x)^\top (y - x) + \frac{m}{2} \|y - x\|^2 \;\leq\; f(y) \;\leq\; f(x) + \nabla f(x)^\top (y - x) + \frac{M}{2} \|y - x\|^2 .
  $$
- We assume $\alpha \leq 1/M$. So the right-hand inequality above, applied with $x = x_t$ and $y = x_{t+1} = x_t - \alpha \nabla f(x_t)$, guarantees that
  $$
  f(x_{t+1}) \;\leq\; f(x_t) - \alpha\Big(1 - \frac{M\alpha}{2}\Big) \|\nabla f(x_t)\|^2 \;\leq\; f(x_t) - \frac{\alpha}{2} \|\nabla f(x_t)\|^2
  $$
  for all $t$.
- We assume there is a unique finite solution $x^\star$ to our optimization.^{1}

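To make the update concrete, here is a minimal sketch of the algorithm in Python; the quadratic objective, the matrix `A`, the starting point and the iteration count are illustrative placeholders rather than anything fixed by these notes.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha, num_iters):
    """Iterate x_{t+1} = x_t - alpha * grad_f(x_t) for a fixed number of steps."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - alpha * grad_f(x)
    return x

# Illustrative example: f(x) = 0.5 * x^T A x with A positive definite, so the
# Hessian bounds m, M are the extreme eigenvalues of A and we may take alpha = 1/M.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
grad_f = lambda x: A @ x
M = np.linalg.eigvalsh(A).max()
x_hat = gradient_descent(grad_f, x0=[5.0, -3.0], alpha=1.0 / M, num_iters=200)
print(x_hat)  # close to the unique minimizer x* = 0
```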
We now show that our algorithm converges exponentially fast to its solution, the so-called *linear convergence*.

Given the assumptions above, there exist $C > 0$ and $\rho < 1$ such that

$$
f(x_t) - f(x^\star) \;\leq\; C \rho^t
\qquad \text{and} \qquad
\|x_t - x^\star\|^2 \;\leq\; C \rho^t .
$$
Rewriting $f(x_{t+1}) \leq f(x_t) - \frac{\alpha}{2}\|\nabla f(x_t)\|^2$, we have

$$
f(x_{t+1}) - f(x^\star) \;\leq\; f(x_t) - f(x^\star) - \frac{\alpha}{2} \|\nabla f(x_t)\|^2 .
$$

Using the lower bound on the Hessian, we bound $\|\nabla f(x_t)\|^2$ with the following Taylor expansion: for all $y$,

$$
f(y) \;\geq\; f(x_t) + \nabla f(x_t)^\top (y - x_t) + \frac{m}{2} \|y - x_t\|^2 \;\geq\; f(x_t) - \frac{1}{2m} \|\nabla f(x_t)\|^2 ,
$$

where the second inequality minimizes the quadratic over $y$. Taking $y = x^\star$ gives $\|\nabla f(x_t)\|^2 \geq 2m \big( f(x_t) - f(x^\star) \big)$.
Combining the last two inequalities, we have, as required,

$$
f(x_{t+1}) - f(x^\star) \;\leq\; (1 - \alpha m)\big(f(x_t) - f(x^\star)\big)
\qquad \text{and so} \qquad
f(x_t) - f(x^\star) \;\leq\; (1 - \alpha m)^t \big(f(x_0) - f(x^\star)\big).
$$

Finally, for $\|x_t - x^\star\|$, we reapply our Hessian lower-bound assumption (at $x^\star$, where $\nabla f(x^\star) = 0$) and use the above bound for $f(x_t) - f(x^\star)$:

$$
\frac{m}{2} \|x_t - x^\star\|^2 \;\leq\; f(x_t) - f(x^\star) \;\leq\; (1 - \alpha m)^t \big( f(x_0) - f(x^\star) \big).
$$
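As a quick numerical illustration (a sketch that is not part of the original notes), the geometric decay above can be checked on a small quadratic, where the minimizer and the constants $m, M$ are known exactly:

```python
import numpy as np

# Illustrative quadratic: f(x) = 0.5 * x^T A x, minimizer x* = 0,
# Hessian A with eigenvalue bounds m (smallest) and M (largest).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
m, M = np.linalg.eigvalsh(A)[[0, -1]]

alpha = 1.0 / M
x = np.array([5.0, -3.0])
gap0 = f(x)  # equals f(x_0) - f(x*) since f(x*) = 0
for t in range(1, 21):
    x = x - alpha * grad_f(x)
    # The proven bound: f(x_t) - f(x*) <= (1 - alpha * m)^t * (f(x_0) - f(x*))
    assert f(x) <= (1 - alpha * m) ** t * gap0 + 1e-12
print("gap after 20 steps:", f(x))
```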
# Projected Gradient Descent

We consider a simple iterative procedure for solving a constrained optimization

$$
\min_{x \in \mathcal{X}} f(x),
$$

where $f$ is a convex, twice differentiable function and where $\mathcal{X} \subseteq \mathbb{R}^d$ is some non-empty closed convex set. Like in Section [GD] we want to follow the steepest descent direction, moving from $x_t$ to $x_t - \alpha_t \nabla f(x_t)$. However, such points need not belong to $\mathcal{X}$. Thus we consider the *projection* of a point $y$ onto $\mathcal{X}$:

$$
\Pi_{\mathcal{X}}(y) \;=\; \underset{x \in \mathcal{X}}{\operatorname{argmin}} \, \|x - y\| ,
$$

and then, from $x_0 \in \mathcal{X}$, we define the *projected gradient descent* algorithm by

$$
x_{t+1} \;=\; \Pi_{\mathcal{X}}\big( x_t - \alpha_t \nabla f(x_t) \big), \qquad t = 0, 1, 2, \dots
$$
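Here is a minimal sketch of projected gradient descent in Python, assuming for illustration that $\mathcal{X}$ is a Euclidean ball, for which the projection has a simple closed form; the set, the objective and the step sizes are placeholders rather than anything fixed by these notes.

```python
import numpy as np

def project_onto_ball(y, radius=1.0):
    """Euclidean projection onto {x : ||x|| <= radius} (an illustrative choice of X)."""
    norm = np.linalg.norm(y)
    return y if norm <= radius else (radius / norm) * y

def projected_gradient_descent(grad_f, project, x0, step_sizes):
    """Iterate x_{t+1} = Pi_X(x_t - alpha_t * grad_f(x_t))."""
    x = project(np.asarray(x0, dtype=float))
    for alpha in step_sizes:
        x = project(x - alpha * grad_f(x))
    return x

# Illustrative problem: minimize f(x) = ||x - c||^2 over the unit ball, with c outside the ball.
c = np.array([2.0, 2.0])
grad_f = lambda x: 2.0 * (x - c)
x_hat = projected_gradient_descent(grad_f, project_onto_ball, x0=[0.0, 0.0], step_sizes=[0.1] * 200)
print(x_hat)  # close to c / ||c||, the point of the unit ball nearest to c
```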
- After projecting, it need not be true that $f(x_{t+1}) \leq f(x_t)$. Thus we adjust the step size $\alpha_t$ over time, and our proof will study the distance between $x_t$ and the optimal solution $x^\star$ rather than the gap $f(x_t) - f(x^\star)$.

- The key observation is that making a projection cannot increase distances, as the following result shows.

[PGD:proj] For all $x, y \in \mathbb{R}^d$,

$$
\big\| \Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(y) \big\| \;\leq\; \| x - y \| .
$$

For all $z \in \mathcal{X}$, we must have $\big(z - \Pi_{\mathcal{X}}(y)\big)^\top \big(y - \Pi_{\mathcal{X}}(y)\big) \leq 0$, i.e. the plane through $\Pi_{\mathcal{X}}(y)$ with normal $y - \Pi_{\mathcal{X}}(y)$ separates $y$ from $\mathcal{X}$. If this were not true, then we would have a contradiction; in particular, there would be a point on the line joining $\Pi_{\mathcal{X}}(y)$ and $z$ that is closer to $y$. Applying this with $z = \Pi_{\mathcal{X}}(x)$, and then again with the roles of $x$ and $y$ swapped, we thus have

$$
\big(\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(y)\big)^\top \big(y - \Pi_{\mathcal{X}}(y)\big) \;\leq\; 0
\qquad \text{and} \qquad
\big(\Pi_{\mathcal{X}}(y) - \Pi_{\mathcal{X}}(x)\big)^\top \big(x - \Pi_{\mathcal{X}}(x)\big) \;\leq\; 0 .
$$

Adding these together and then applying Cauchy-Schwarz implies

$$
\big\|\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(y)\big\|^2
\;\leq\;
\big(\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(y)\big)^\top (x - y)
\;\leq\;
\big\|\Pi_{\mathcal{X}}(x) - \Pi_{\mathcal{X}}(y)\big\| \, \|x - y\| ,
$$
as required.
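As a quick sanity check of the lemma (again an illustrative sketch, not from the notes), random pairs of points projected onto the unit Euclidean ball never move further apart:

```python
import numpy as np

# Euclidean projection onto the unit ball (illustrative choice of X).
project = lambda y: y / max(1.0, np.linalg.norm(y))

rng = np.random.default_rng(0)
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    # The lemma: ||Pi(x) - Pi(y)|| <= ||x - y||.
    assert np.linalg.norm(project(x) - project(y)) <= np.linalg.norm(x - y) + 1e-12
```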

If the gradients at the iterates are bounded, say $\|\nabla f(x_t)\| \leq G$ for all $t$, then

$$
\min_{s \leq t} f(x_s) - f(x^\star)
\;\leq\;
\frac{\|x_0 - x^\star\|^2 + G^2 \sum_{s=0}^{t} \alpha_s^2}{2 \sum_{s=0}^{t} \alpha_s} .
$$

Hence if $\alpha_s \to 0$ and $\sum_s \alpha_s = \infty$, then $\min_{s \leq t} f(x_s) \to f(x^\star)$.

Using the projection lemma (and $\Pi_{\mathcal{X}}(x^\star) = x^\star$), convexity (so that $\nabla f(x_t)^\top (x_t - x^\star) \geq f(x_t) - f(x^\star)$), and the gradient bound, each step satisfies

$$
\|x_{t+1} - x^\star\|^2
\;=\; \big\| \Pi_{\mathcal{X}}\big(x_t - \alpha_t \nabla f(x_t)\big) - \Pi_{\mathcal{X}}(x^\star) \big\|^2
\;\leq\; \|x_t - x^\star\|^2 - 2 \alpha_t \big( f(x_t) - f(x^\star) \big) + \alpha_t^2 G^2 .
$$

Applying this bound recursively yields

$$
0 \;\leq\; \|x_{t+1} - x^\star\|^2
\;\leq\; \|x_0 - x^\star\|^2 - 2 \sum_{s=0}^{t} \alpha_s \big( f(x_s) - f(x^\star) \big) + G^2 \sum_{s=0}^{t} \alpha_s^2 .
$$

Rearranging the above, and bounding each $f(x_s) - f(x^\star)$ below by $\min_{r \leq t} f(x_r) - f(x^\star)$, gives the required result.
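For instance, with a constant step size $\alpha_s = \alpha$ the bound above becomes $\frac{\|x_0 - x^\star\|^2}{2\alpha(t+1)} + \frac{\alpha G^2}{2}$; choosing $\alpha = \frac{\|x_0 - x^\star\|}{G\sqrt{t+1}}$ balances the two terms (a standard choice, stated here only as a worked consequence of the bound in the form given above) and gives

$$
\min_{s \leq t} f(x_s) - f(x^\star) \;\leq\; \frac{\|x_0 - x^\star\| \, G}{\sqrt{t+1}} .
$$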

- This assumption is not really needed, but it saves space.