# Continuous Time Dynamic Programs

•  Continuous-time dynamic programs
• The HJB equation; a heuristic derivation; and proof of optimality.

Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.

Time is continuous, $t\in\mathbb{R}_+$; $x_t\in \mathcal{X}$ is the state at time $t$; $a_t\in \mathcal{A}$ is the action at time $t$.

Def 1 [Plant Equation] Given a function $f: \mathbb{R}_+\times\mathcal{X}\times \mathcal{A} \rightarrow \mathcal{X}$, written $f_t(x,a)$, the state evolves according to the differential equation

$$\frac{dx_t}{dt} = f_t(x_t,a_t).$$

This is called the Plant Equation.

Def 2 A policy $\pi$ chooses an action $\pi_t$ at each time $t$. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(x,a)$, and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.

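Def 3 [Continuous Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

$$L(x_0) := \min_{{\bf a}}\; C({\bf a}), \qquad C({\bf a}) := \int_0^T e^{-\alpha t} c_t(x_t,a_t)\,dt + e^{-\alpha T} c_T(x_T),$$

over actions $a_t\in\mathcal{A}$, $t\in[0,T]$, subject to the plant equation. Here $c_t$ is the instantaneous cost, $c_T$ the terminal cost and $\alpha$ the discount rate. Further, let $C_\tau({\bf a})$ (resp. $L_\tau(x_\tau)$) be the objective (resp. optimal objective) when the integral is started from $t=\tau$ rather than $t=0$, with discounting measured from time $\tau$:

$$C_\tau({\bf a}) := \int_\tau^T e^{-\alpha (t-\tau)} c_t(x_t,a_t)\,dt + e^{-\alpha (T-\tau)} c_T(x_T).$$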

When a minimization problem, where we minimize loss given the costs incurred, is replaced with a maximization problem, where we maximize winnings given the rewards received, the functions $L$, $C$ and $c$ are replaced with the notation $W$, $R$ and $r$.

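Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

$$0 = \min_{a\in\mathcal{A}} \left\{ c_t(x,a) + \partial_t L_t(x) + f_t(x,a)\,\partial_x L_t(x) - \alpha L_t(x) \right\}, \qquad L_T(x) = c_T(x),$$

is called the Hamilton-Jacobi-Bellman (HJB) equation. It is the continuous-time analogue of the Bellman equation (see Dynamic Programming).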
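As a small worked illustration (not from the original notes), the HJB equation can be solved in closed form for a simple case. The choices here are assumptions made only for this example: scalar $x$ and $a$, $f_t(x,a)=a$, $c_t(x,a)=x^2+a^2$, $c_T(x)=0$ and $\alpha=0$. Take the HJB equation in its standard form $0=\min_a\{c_t(x,a)+\partial_t L_t(x)+f_t(x,a)\,\partial_x L_t(x)-\alpha L_t(x)\}$ with $L_T(x)=c_T(x)$, and try the guess $L_t(x)=p_t x^2$:

$$\begin{aligned}
0 &= \min_{a} \left\{ x^2 + a^2 + \dot{p}_t x^2 + 2 p_t x a \right\} \\
  &= \left(1 + \dot{p}_t - p_t^2\right) x^2, \qquad \text{attained at } a^* = -p_t x .
\end{aligned}$$

So $\dot{p}_t = p_t^2 - 1$ with $p_T = 0$, which is solved by $p_t=\tanh(T-t)$. Under these assumptions the optimal cost is $L_t(x)=\tanh(T-t)\,x^2$ and the minimizing action is $a^*=-\tanh(T-t)\,x$.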

Ex 1 [A Heuristic derivation of the HJB equation] Argue that, for $\delta>0$ small, $x$ satisfying the recursion

$$x_{t+\delta} = x_t + \delta f_t(x_t,a_t)$$

is a good approximation to the plant equation. (A heuristic argument will suffice.)

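Ex 2 [Continued] Argue (heuristically) that the following is a good approximation for the objective of a continuous-time dynamic program:

$$C({\bf a}) \approx \sum_{t\in\{0,\delta,\dots,T-\delta\}} (1-\alpha\delta)^{t/\delta}\, c_t(x_t,a_t)\,\delta + (1-\alpha\delta)^{T/\delta}\, c_T(x_T).$$

Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the objective of Ex 2 and the plant recursion of Ex 1 is

$$L_t(x) = \min_{a\in\mathcal{A}} \left\{ c_t(x,a)\,\delta + (1-\alpha\delta)\, L_{t+\delta}\big(x+\delta f_t(x,a)\big) \right\}.$$

Ex 4 [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the equation

$$0 = \min_{a\in\mathcal{A}} \left\{ c_t(x,a) + \partial_t L_t(x) + f_t(x,a)\,\partial_x L_t(x) - \alpha L_t(x) \right\}.$$

Ex 5 [Optimality of HJB] Suppose that a policy $\Pi$ has a value function $C_t(x,\Pi)$ that satisfies the HJB equation for all $t$ and $x$. Show that $\Pi$ is an optimal policy.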

(Hint: consider $e^{-\alpha t}C_t(\tilde{x}_t,\Pi)$, where $\tilde{x}$ are the states under another policy $\tilde{\Pi}$.)

Ans 1 Obvious from definition of derivative.
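The recursion in Ex 1 is the forward Euler scheme for the plant equation, and the claim can be checked numerically. The sketch below is only an illustration: the function name `euler_path`, the plant $f_t(x,a) = -x + a$ and the constant policy $a_t = 1$ are assumptions made here, chosen because the exact solution $x_t = 1 - e^{-t}$ (from $x_0=0$) is known.

```python
import math

def euler_path(f, x0, policy, T, delta):
    """Iterate x_{t+delta} = x_t + delta * f(t, x_t, a_t) from t = 0 up to T."""
    x = x0
    for k in range(round(T / delta)):
        t = k * delta
        x = x + delta * f(t, x, policy(t, x))
    return x

# Hypothetical plant dx/dt = -x + a, driven by the constant action a = 1.
f = lambda t, x, a: -x + a
policy = lambda t, x: 1.0

# Exact solution of dx/dt = 1 - x with x_0 = 0, evaluated at t = 1.
exact = 1.0 - math.exp(-1.0)

# Error of the recursion at t = 1 for a decreasing sequence of step sizes.
errors = [abs(euler_path(f, 0.0, policy, 1.0, d) - exact) for d in (0.1, 0.01, 0.001)]
print(errors)
```

The error should shrink roughly in proportion to $\delta$, which is the sense in which the recursion is a good approximation.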

Ans 2 Obvious from definition of (Riemann) Integral and since $(1-\alpha \delta)^{t/\delta}\rightarrow e^{-\alpha t}$ as $\delta\rightarrow 0$.
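Both facts in Ans 2 can be checked numerically. The sketch below assumes, purely for illustration, a constant running cost $c_t = 1$ and terminal cost $c_T = 1$, with $\alpha = 0.5$ and $T = 2$, so that the continuous objective has the closed form $(1-e^{-\alpha T})/\alpha + e^{-\alpha T}$. The function name `discrete_objective` is hypothetical.

```python
import math

alpha, T = 0.5, 2.0

def discrete_objective(delta):
    """Discounted sum over t = 0, delta, ..., T - delta with running cost 1,
    plus the discounted terminal cost 1 (both costs are illustrative)."""
    n = round(T / delta)
    running = sum((1 - alpha * delta) ** k * delta for k in range(n))
    terminal = (1 - alpha * delta) ** n
    return running + terminal

# Continuous objective: integral of e^{-alpha t} over [0, T] plus e^{-alpha T}.
continuous = (1 - math.exp(-alpha * T)) / alpha + math.exp(-alpha * T)

# Gap between the discrete and continuous objectives as delta shrinks.
gaps = [abs(discrete_objective(d) - continuous) for d in (0.1, 0.01, 0.001)]
print(gaps)
```

Here $(1-\alpha\delta)^{k}$ with $k = t/\delta$ plays the role of $e^{-\alpha t}$, and the gap should decrease as $\delta$ decreases.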

Ans 3 Immediate from discrete time Bellman Equation.
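To see the discrete-time Bellman recursion at work, here is a minimal grid-based Python sketch. Everything specific in it is an assumption made for illustration: the scalar plant $f_t(x,a)=a$, running cost $c_t(x,a)=x^2+a^2$, zero terminal cost, $\alpha=0$, the grid sizes, and the function name `solve_lq_by_bellman`. For this example the HJB equation has the closed-form solution $L_0(x)=\tanh(T)\,x^2$, which gives something to compare against.

```python
import math

def solve_lq_by_bellman(T=1.0, delta=0.02, alpha=0.0):
    """Backward recursion
    L_t(x) = min_a { c(x,a)*delta + (1 - alpha*delta) * L_{t+delta}(x + delta*f(x,a)) }
    on a uniform state grid, with a finite grid of candidate actions."""
    f = lambda x, a: a                 # hypothetical plant: dx/dt = a
    c = lambda x, a: x * x + a * a     # hypothetical running cost
    h = 0.04
    xs = [-2.0 + h * i for i in range(101)]        # state grid on [-2, 2]
    actions = [-2.0 + 0.1 * j for j in range(41)]  # action grid on [-2, 2]

    def interp(values, x):
        """Linear interpolation on the grid xs, clamped at both ends."""
        u = min(max((x - xs[0]) / h, 0.0), len(xs) - 1.0)
        i = min(int(u), len(xs) - 2)
        w = u - i
        return (1 - w) * values[i] + w * values[i + 1]

    L = [0.0 for _ in xs]              # terminal cost c_T(x) = 0
    for _ in range(round(T / delta)):  # step backwards from T to 0
        L = [min(c(x, a) * delta
                 + (1 - alpha * delta) * interp(L, x + delta * f(x, a))
                 for a in actions)
             for x in xs]
    return xs, L

xs, L0 = solve_lq_by_bellman()
value_at_1 = L0[75]                    # xs[75] is (up to rounding) x = 1
print(value_at_1, math.tanh(1.0))
```

The computed value at $x=1$ should be close to $\tanh(1)\approx 0.76$, with the remaining gap coming from the time step and the grid interpolation.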

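Ans 4 Subtract $L_t(x)$ from each side of the Bellman equation in Ex 3, divide by $\delta$ and let $\delta\rightarrow 0$. Further note that

$$\frac{(1-\alpha\delta)\, L_{t+\delta}\big(x+\delta f_t(x,a)\big) - L_t(x)}{\delta} \longrightarrow \partial_t L_t(x) + f_t(x,a)\,\partial_x L_t(x) - \alpha L_t(x) \quad \text{as } \delta\rightarrow 0.$$

Ans 5 Using shorthand $C=C_t(\tilde{x}_t,\Pi)$:

$$\frac{d}{dt}\left(e^{-\alpha t} C\right) = e^{-\alpha t}\left[ c_t(\tilde{x}_t,\tilde{\pi}_t) + \partial_t C + f_t(\tilde{x}_t,\tilde{\pi}_t)\,\partial_x C - \alpha C \right] - e^{-\alpha t} c_t(\tilde{x}_t,\tilde{\pi}_t) \geq - e^{-\alpha t} c_t(\tilde{x}_t,\tilde{\pi}_t).$$

The inequality holds since the term in the square brackets is the objective of the HJB equation, whose minimum over actions is zero and which is not necessarily minimized by $\tilde{\pi}_t$. Integrating from $0$ to $T$ and using $C_T(x,\Pi)=c_T(x)$ gives

$$C_0(x_0,\Pi) \leq \int_0^T e^{-\alpha t} c_t(\tilde{x}_t,\tilde{\pi}_t)\,dt + e^{-\alpha T} c_T(\tilde{x}_T) = C_0(x_0,\tilde{\Pi}),$$

so $\Pi$ is optimal.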