# Online Convex Optimization

We consider the setting of sequentially optimizing the aggregate of a sequence of functions, so-called online convex optimization.

For ${\mathcal C}$ a convex set and discrete time $t=1,2,3,...$, we consider the setting where

1. An unknown convex loss function $l_{t}$ is chosen.
2. The algorithm picks a point $x_t\in{\mathcal C}$ and incurs a loss $l_t(x_t)$.
3. The gradient $\nabla l_t (x_t)$ is revealed.
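The interaction above can be sketched as a simple loop. This is only an illustrative sketch: the `learner` object with `predict()`/`update()` methods and the `ConstantLearner` baseline are hypothetical names, not part of the text.

```python
# A minimal sketch of the online convex optimization protocol above,
# assuming a hypothetical learner with predict()/update() methods.
def play(learner, losses):
    """Run T rounds: the learner picks x_t, incurs l_t(x_t), sees the gradient."""
    total = 0.0
    for loss, grad in losses:          # each round brings an unknown convex l_t
        x_t = learner.predict()        # step 2: the algorithm picks a point x_t
        total += loss(x_t)             # ... and incurs the loss l_t(x_t)
        learner.update(grad(x_t))      # step 3: the gradient at x_t is revealed
    return total

class ConstantLearner:
    """Trivial baseline that always plays the same point and never updates."""
    def __init__(self, x):
        self.x = x
    def predict(self):
        return self.x
    def update(self, g):
        pass
```

For example, with scalar quadratic losses $l_t(x) = (x-z_t)^2/2$, each round supplies the pair `(loss, grad)` and the learner only ever sees gradients at its own iterates.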

The quantity that we are interested in minimizing is the regret: the gap in aggregate loss between the sequence of choices $\{x_t\}_{t=1}^T$ and a fixed choice $x$,
$$\mathrm{Regret}_T(x) \;=\; \sum_{t=1}^T l_t(x_t) \;-\; \sum_{t=1}^T l_t(x).$$

We now define a specific selection rule for $x_t$, which we call online mirror descent. Each mirror descent algorithm is defined by a convex function $F:{\mathcal C}\rightarrow {\mathbb R}$.

## Mirror Descent

Initialize by setting $\theta_1=(0,\dots,0)$, then for $t=1,2,\dots$:

1. Set
$$x_t = \arg\max_{x\in{\mathcal C}} \;\langle \theta_t, x\rangle - F(x).$$
2. As described above, the algorithm incurs the loss $l_t(x_t)$ and observes $\nabla l_t(x_t)$.
3. The algorithm then updates
$$\theta_{t+1} = \theta_t - \nabla l_t(x_t).$$

- Notice, by Fenchel duality, that $x_t=\nabla F^*(\theta_t)$, where $F^*(\theta)=\sup_{x\in{\mathcal C}}\langle\theta,x\rangle - F(x)$ is the Legendre–Fenchel transform of $F$. In other words, we don't need to solve the optimization defining $x_t$; in principle we can just evaluate $\nabla F^*$ for an appropriate convex function $F^*$.
- We will say that $F$ is $\beta$-strongly convex if for all $x, y$,
$$F(y) \;\ge\; F(x) + \langle \nabla F(x), y-x\rangle + \frac{\beta}{2}\|y-x\|^2.$$
- We will say that $G$ is $\gamma$-smooth if it is differentiable and its derivative is Lipschitz continuous with constant $\gamma$, i.e.
$$\|\nabla G(\theta) - \nabla G(\theta')\| \;\le\; \gamma\,\|\theta-\theta'\|,$$
which implies $G(\theta') \le G(\theta) + \langle \nabla G(\theta), \theta'-\theta\rangle + \frac{\gamma}{2}\|\theta'-\theta\|^2$.
- It is left as an exercise to show that if $F$ is $\beta$-strongly convex then its Legendre–Fenchel transform $F^*$ is $\beta^{-1}$-smooth.
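To make the algorithm concrete, here is a minimal sketch of online mirror descent for the $\beta$-strongly convex regularizer $F(x) = \frac{\beta}{2}\|x\|^2$ (my choice for illustration, not fixed by the text), for which $\nabla F^*(\theta) = \theta/\beta$, so the mirror step needs no optimization. Scalars are used for simplicity.

```python
# Online mirror descent with F(x) = (β/2)·x², so that ∇F*(θ) = θ/β.
# grad_fns: a list of callables g_t(x) returning the gradient ∇l_t(x).
def mirror_descent(grad_fns, beta):
    """Play x_t = ∇F*(θ_t), then update θ_{t+1} = θ_t - g_t; return x_1..x_T."""
    theta = 0.0               # θ_1 = (0, ..., 0)
    xs = []
    for grad in grad_fns:
        x_t = theta / beta    # step 1: x_t = ∇F*(θ_t)
        xs.append(x_t)
        g_t = grad(x_t)       # step 2: observe ∇l_t(x_t)
        theta -= g_t          # step 3: θ_{t+1} = θ_t - g_t
    return xs
```

For instance, with the repeated loss $l_t(x) = (x-1)^2/2$ and $\beta = 1$, the first iterate is $x_1 = 0$ and the gradient $g_1 = -1$ immediately moves the play to $x_2 = 1$, the minimizer.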

**Thrm:** If $F$ is $\beta$-strongly convex, then for any fixed $x\in{\mathcal C}$,
$$\sum_{t=1}^T l_t(x_t) - \sum_{t=1}^T l_t(x) \;\le\; F(x) - F(x_1) + \frac{1}{2\beta}\sum_{t=1}^T \|\nabla l_t(x_t)\|^2.$$

**Proof:** Notice that by the convexity of $l_t$, we can upper-bound the regret as follows:
$$\sum_{t=1}^T l_t(x_t) - l_t(x) \;\le\; \sum_{t=1}^T \langle \nabla l_t(x_t), x_t - x\rangle \;=\; \sum_{t=1}^T \langle g_t, x_t - x\rangle,$$
where in the equality we just introduced the notation $g_t = \nabla l_t(x_t)$.
Now notice that, given our definition of $\theta_t$ (so that $\theta_t = -\sum_{s=1}^{t-1} g_s$), the point $x_t$ is given by
$$x_t \;=\; \arg\max_{x\in{\mathcal C}} \;\langle \theta_t, x\rangle - F(x) \;=\; \arg\min_{x\in{\mathcal C}} \;\sum_{s=1}^{t-1}\langle g_s, x\rangle + F(x).$$
In some sense, our choice of $x_{t}$ is attempting to optimize the right-hand side of the regret bound above, using the gradients seen so far, regularized by $F$.

Now, since $F$ is $\beta$-strongly convex, $F^*$ is $\beta^{-1}$-smooth, so
$$F^*(\theta_{t+1}) \;\le\; F^*(\theta_t) + \langle \nabla F^*(\theta_t), \theta_{t+1}-\theta_t\rangle + \frac{1}{2\beta}\|\theta_{t+1}-\theta_t\|^2 \;=\; F^*(\theta_t) - \langle x_t, g_t\rangle + \frac{1}{2\beta}\|g_t\|^2,$$
where we note that, by definition, $x_t=\nabla F^*(\theta_t)$ and $\theta_{t+1}-\theta_t = -g_t$. Further, by the Fenchel–Young inequality, we have that
$$F^*(\theta_{T+1}) \;\ge\; \langle \theta_{T+1}, x\rangle - F(x) \;=\; -\sum_{t=1}^T \langle g_t, x\rangle - F(x),$$
where in the last equality we note that, by definition, $\theta_{T+1}$ is the sum of $-g_1,\dots,-g_T$. Summing the smoothness inequality over $t=1,\dots,T$ (the intermediate $F^*(\theta_t)$ terms telescope) and applying the above bound, we have that
$$-\sum_{t=1}^T \langle g_t, x\rangle - F(x) \;\le\; F^*(\theta_{T+1}) \;\le\; F^*(\theta_1) - \sum_{t=1}^T \langle x_t, g_t\rangle + \frac{1}{2\beta}\sum_{t=1}^T \|g_t\|^2 \;=\; -F(x_1) - \sum_{t=1}^T \langle x_t, g_t\rangle + \frac{1}{2\beta}\sum_{t=1}^T \|g_t\|^2.$$
In the last equality, we apply the Fenchel–Young inequality once more: since $x_1 = \nabla F^*(\theta_1)$ and $\theta_1 = 0$, it holds with equality, giving $F^*(\theta_1) = \langle \theta_1, x_1\rangle - F(x_1) = -F(x_1)$. Now, rearranging the above inequality and applying our first bound on the regret, we have that
$$\sum_{t=1}^T l_t(x_t) - l_t(x) \;\le\; \sum_{t=1}^T \langle g_t, x_t - x\rangle \;\le\; F(x) - F(x_1) + \frac{1}{2\beta}\sum_{t=1}^T \|g_t\|^2,$$
as required. $\square$
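The regret bound can be sanity-checked numerically. The sketch below is my own illustration, not from the text: it runs mirror descent with $F(x) = \frac{\beta}{2}x^2$ (so $x_1 = 0$ and $F(x_1) = 0$) on scalar losses $l_t(x) = (x - z_t)^2/2$, and compares the realized regret against the best fixed point in hindsight with the theorem's bound.

```python
# Numerical sanity check of the regret bound
#   regret ≤ F(x) - F(x_1) + (1/2β)·Σ‖g_t‖²
# for F(x) = (β/2)·x² and scalar losses l_t(x) = (x - z_t)²/2.
def check_regret_bound(zs, beta):
    """Return (regret, bound) against the comparator x = mean(zs)."""
    theta, xs, gs = 0.0, [], []
    for z in zs:
        x_t = theta / beta            # x_t = ∇F*(θ_t)
        g_t = x_t - z                 # g_t = ∇l_t(x_t)
        xs.append(x_t)
        gs.append(g_t)
        theta -= g_t                  # θ_{t+1} = θ_t - g_t
    x = sum(zs) / len(zs)             # best fixed point in hindsight
    regret = sum((x_t - z) ** 2 / 2 - (x - z) ** 2 / 2
                 for x_t, z in zip(xs, zs))
    # F(x) - F(x_1) + (1/2β)·Σ g_t², with F(x_1) = F(0) = 0 here
    bound = beta * x ** 2 / 2 + sum(g * g for g in gs) / (2 * beta)
    return regret, bound
```

On any instance the realized regret should stay below the bound, e.g. `check_regret_bound([1.0, -1.0, 0.5], 1.0)` gives a regret of about $2.54$ against a bound of about $3.64$.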