- Continuous-time dynamic programs
- The HJB equation: a heuristic derivation and a proof of optimality.
Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.
Time is continuous, $t \in [0, T]$; $x_t$ is the state at time $t$; $a_t$ is the action at time $t$.
Def 1 [Plant Equation] Given a function $f$, the state evolves according to a differential equation

$$\frac{dx_t}{dt} = f(x_t, a_t, t).$$

This is called the Plant Equation.
Def 2 A policy $\pi$ chooses an action $a_t$ at each time $t$. The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r(x, a, t)$, and $R(x, T)$ is the reward for terminating in state $x$ at time $T$.
Def 3 [Continuous Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

$$V(x_0) = \max_{\pi} \left\{ \int_0^T r(x_t, a_t, t)\, dt + R(x_T, T) \right\} \quad \text{subject to} \quad \frac{dx_t}{dt} = f(x_t, a_t, t).$$

Further, let $V_\pi(x, \tau)$ (resp. $V(x, \tau)$) be the objective (resp. optimal objective) when the integral is started from time $\tau$, rather than $0$.
When the maximization problem, where we maximize winnings given the rewards received, is replaced with a minimization problem, where we minimize the loss given the costs incurred, the functions $r$, $R$ and $V$ are replaced with the notation $c$, $C$ and $L$.
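As a minimal code sketch of Defs 1–3 (the helper names and the scalar example below are my own, not from the notes), one can simulate the plant equation with small Euler steps and approximate the integral objective by a Riemann sum:

```python
def evaluate_policy(f, r, R_term, pi, x0, T, delta=1e-3):
    """Approximate int_0^T r(x_t, a_t, t) dt + R(x_T, T) under policy pi."""
    x, total = x0, 0.0
    for k in range(round(T / delta)):
        t = k * delta
        a = pi(x, t)                 # action chosen by the policy
        total += r(x, a, t) * delta  # accumulate the instantaneous reward
        x = x + f(x, a, t) * delta   # Euler step of the plant equation
    return total + R_term(x, T)      # add the terminal reward

# Hypothetical scalar example: dx/dt = a, reward -(x^2 + a^2), no terminal reward.
print(evaluate_policy(
    f=lambda x, a, t: a,
    r=lambda x, a, t: -(x**2 + a**2),
    R_term=lambda x, T: 0.0,
    pi=lambda x, t: -x,              # a simple linear feedback policy
    x0=1.0, T=5.0,
))
```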
Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

$$0 = \frac{\partial V}{\partial t}(x, t) + \max_a \left\{ r(x, a, t) + f(x, a, t) \cdot \frac{\partial V}{\partial x}(x, t) \right\}, \qquad V(x, T) = R(x, T),$$

is called the Hamilton-Jacobi-Bellman equation. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].
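For instance (my own illustration, matching the scalar sketch above), for the plant $\frac{dx_t}{dt} = a_t$ with reward $r(x, a, t) = -(x^2 + a^2)$ and no terminal reward, the HJB equation reads

$$0 = \frac{\partial V}{\partial t} + \max_a \left\{ -(x^2 + a^2) + a\, \frac{\partial V}{\partial x} \right\},$$

and the inner maximization is attained at $a = \frac{1}{2} \frac{\partial V}{\partial x}$.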
Ex 1 [A Heuristic derivation of the HJB equation] Argue that, for $\delta > 0$ small, $x_{t+\delta}$ satisfying the recursion

$$x_{t+\delta} = x_t + \delta f(x_t, a_t, t)$$

is a good approximation to the plant equation. (A heuristic argument will suffice.)
Ex 2 [Continued] Argue (heuristically) that the following is a good approximation to the objective of a continuous-time dynamic program:

$$\sum_{t = 0, \delta, 2\delta, \ldots, T - \delta} \delta\, r(x_t, a_t, t) + R(x_T, T).$$
Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the objective of Ex 2 and the plant equation of Ex 1 is

$$V(x, t) = \max_a \left\{ \delta\, r(x, a, t) + V\big(x + \delta f(x, a, t),\, t + \delta\big) \right\}.$$
Ex 4 [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the Hamilton-Jacobi-Bellman equation of Def 4.
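As a quick numerical check of the approximation in Ex 1 (my own example, not from the notes): for the plant $\frac{dx_t}{dt} = x_t$ with $x_0 = 1$, the Euler recursion should approach the exact value $x_1 = e$ as $\delta \rightarrow 0$.

```python
import math

# Check Ex 1 numerically: iterate x_{t+delta} = x_t + delta * x_t up to T = 1
# and compare with the exact solution e; the error shrinks roughly linearly.
for delta in [0.1, 0.01, 0.001]:
    x = 1.0
    for _ in range(round(1.0 / delta)):
        x += delta * x               # one Euler step of the plant equation
    print(f"delta={delta:g}  error={abs(x - math.e):.5f}")
```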
Ex 5 [Optimality of HJB] Suppose that a policy $\pi$ has a value function $V_\pi(x, t)$ that satisfies the HJB equation for all $x$ and $t \leq T$; then show that $\pi$ is an optimal policy.
(Hint: consider $\frac{d}{dt} V_\pi(\tilde{x}_t, t)$, where $\tilde{x}_t$ are the states under another policy $\tilde{\pi}$.)
Linear Quadratic Regulation
Def 5. [LQ Problem] We consider a dynamic program of the form

$$\min_{\pi} \left\{ \int_0^T \left( x_t^\top Q x_t + a_t^\top R a_t \right) dt + x_T^\top \Pi_T x_T \right\} \quad \text{subject to} \quad \frac{dx_t}{dt} = A x_t + B a_t.$$

Here $x_t \in \mathbb{R}^n$ and $a_t \in \mathbb{R}^m$. $A$ and $B$ are matrices. $Q$ and $R$ are symmetric positive definite matrices. This is a Linear-Quadratic problem (LQ problem).
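As a concrete instance of Def 5 (a standard example, the "double integrator"; the specific numbers here are my own choice):

```python
import numpy as np

# Hypothetical LQ instance: state = (position, velocity), scalar action = force.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # d(position)/dt = velocity; d(velocity)/dt = a
B = np.array([[0.0],
              [1.0]])       # the action enters through the velocity equation
Q = np.eye(2)               # state cost matrix, symmetric positive definite
R = np.array([[1.0]])       # action cost matrix, symmetric positive definite
Pi_T = np.eye(2)            # terminal cost: x_T' Pi_T x_T
```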
Ex 6. [LQ Regulation] Show that the HJB equation for an LQ problem is

$$0 = \frac{\partial L}{\partial t}(x, t) + \min_a \left\{ x^\top Q x + a^\top R a + (A x + B a)^\top \frac{\partial L}{\partial x}(x, t) \right\}, \qquad L(x, T) = x^\top \Pi_T x.$$
Ex 7. [Continued] We now "guess" that the solution to the above HJB equation is of the form $L(x, t) = x^\top \Pi_t x$ for some symmetric matrix $\Pi_t$. Argue that the HJB equation is minimized by an action satisfying

$$a = -R^{-1} B^\top \Pi_t x,$$

and thus for the minimum in the HJB equation to be zero we require

$$0 = x^\top \left( \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right) x.$$
Ex 8. [Continued] Show that for the HJB equation to be satisfied it is sufficient that

$$\frac{d\Pi_t}{dt} = \Pi_t B R^{-1} B^\top \Pi_t - A^\top \Pi_t - \Pi_t A - Q, \qquad \Pi_T \text{ given}.$$
Def 6. [Riccati Equation] The differential equation

$$\frac{d\Pi_t}{dt} = \Pi_t B R^{-1} B^\top \Pi_t - A^\top \Pi_t - \Pi_t A - Q$$

is called the Riccati equation.
Ex 9. [Continued] Argue that the policy given by a solution to the Riccati equation is optimal for the LQ problem.
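Putting Ex 7–9 together, here is a sketch (assuming SciPy is available, and reusing the hypothetical double-integrator matrices from above) that integrates the Riccati equation backwards in time from the terminal condition $\Pi_T$ and recovers the feedback gain in $a_t = -R^{-1} B^\top \Pi_t x_t$:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])  # double integrator again
B = np.array([[0.0], [1.0]])
Q, R, Pi_T, T = np.eye(2), np.array([[1.0]]), np.eye(2), 5.0
R_inv = np.linalg.inv(R)

def riccati_rhs(t, p):
    """dPi/dt = Pi B R^-1 B' Pi - A' Pi - Pi A - Q, flattened for the solver."""
    Pi = p.reshape(2, 2)
    dPi = Pi @ B @ R_inv @ B.T @ Pi - A.T @ Pi - Pi @ A - Q
    return dPi.ravel()

# solve_ivp accepts a decreasing time span, so integrate from t = T down to 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), Pi_T.ravel(), rtol=1e-8)
Pi_0 = sol.y[:, -1].reshape(2, 2)
K_0 = R_inv @ B.T @ Pi_0                # optimal action at time 0: a = -K_0 @ x
print(Pi_0, K_0, sep="\n")
```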
Answers
Ans 1 Obvious from the definition of the derivative.
Ans 2 Obvious from the definition of the (Riemann) integral, and since the discretized trajectory converges to the solution of the plant equation as $\delta \rightarrow 0$.
Ans 3 Immediate from the discrete-time Bellman equation.
Ans 4 Subtract $V(x, t)$ from each side in [3], divide by $\delta$ and let $\delta \rightarrow 0$. Further note that

$$V\big(x + \delta f(x, a, t),\, t + \delta\big) = V(x, t) + \delta\, f(x, a, t) \cdot \frac{\partial V}{\partial x}(x, t) + \delta\, \frac{\partial V}{\partial t}(x, t) + o(\delta).$$
Ans 5 Using the shorthand $V = V_\pi$, $\tilde{r}_t = r(\tilde{x}_t, \tilde{a}_t, t)$ and $\tilde{f}_t = f(\tilde{x}_t, \tilde{a}_t, t)$:

$$\frac{d}{dt} V(\tilde{x}_t, t) = \frac{\partial V}{\partial t} + \tilde{f}_t \cdot \frac{\partial V}{\partial x} = \frac{\partial V}{\partial t} + \left[ \tilde{r}_t + \tilde{f}_t \cdot \frac{\partial V}{\partial x} \right] - \tilde{r}_t \leq -\tilde{r}_t.$$

The inequality holds since the term in the square brackets is the objective of the maximization in the HJB equation, which is not necessarily maximized by $\tilde{a}_t$; hence $\frac{\partial V}{\partial t}$ plus the bracketed term is at most zero. Integrating over $[0, T]$ and using the boundary condition $V(x, T) = R(x, T)$ gives

$$V(\tilde{x}_0, 0) \geq \int_0^T \tilde{r}_t\, dt + R(\tilde{x}_T, T),$$

which is the objective under $\tilde{\pi}$. Equality holds when $\tilde{\pi} = \pi$, since $\pi$ attains the maximum in the HJB equation; hence $\pi$ is optimal.
Ans 6. Immediate from Def 4.
Ans 7. Since $L(x, t) = x^\top \Pi_t x$,

$$\frac{\partial L}{\partial x} = 2 \Pi_t x \quad \text{and} \quad \frac{\partial L}{\partial t} = x^\top \frac{d\Pi_t}{dt} x;$$

therefore, substituting into the Bellman equation gives

$$0 = x^\top \frac{d\Pi_t}{dt} x + \min_a \left\{ x^\top Q x + a^\top R a + (A x + B a)^\top 2 \Pi_t x \right\}.$$

Differentiating with respect to $a$ gives the optimality condition

$$2 R a + 2 B^\top \Pi_t x = 0,$$

which implies

$$a = -R^{-1} B^\top \Pi_t x.$$

Finally, substituting this action into the Bellman equation above gives the required expression

$$0 = x^\top \left( \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right) x.$$
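A quick numerical sanity check of this condition (my own illustration, not part of the notes): the candidate $a^\ast = -R^{-1} B^\top \Pi x$ should never be beaten by perturbations of itself in the $a$-dependent part of the HJB objective.

```python
import numpy as np

# Check Ans 7: a* = -R^{-1} B' Pi x should minimize
# h(a) = a' R a + (A x + B a)' (2 Pi x), the a-dependent part of the objective.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
R, Pi = np.array([[2.0]]), np.eye(2)
x = rng.normal(size=(2, 1))

def h(a):
    return float(a.T @ R @ a + (A @ x + B @ a).T @ (2 * Pi @ x))

a_star = -np.linalg.inv(R) @ B.T @ Pi @ x
assert all(h(a_star) <= h(a_star + rng.normal(size=(1, 1)))
           for _ in range(1000))
print("candidate minimizer a* =", a_star.ravel())
```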
Ans 8. For the minimum in the HJB equation to be zero, we require

$$x^\top \left[ \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right] x = 0 \quad \text{for all } x.$$

Thus for the term in square brackets to be zero is sufficient.
Ans 9. This is just applying [5]: the loss function $L(x, t) = x^\top \Pi_t x$ then satisfies the HJB equation, so the corresponding policy $a_t = -R^{-1} B^\top \Pi_t x_t$ is optimal.