- Continuous-time dynamic programs
- The HJB equation, a heuristic derivation, and a proof of optimality.
Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.
Time is continuous, $t \in \mathbb{R}_+$; $x_t \in \mathcal{X}$ is the state at time $t$; $a_t \in \mathcal{A}$ is the action at time $t$.
Def 1 [Plant Equation] Given a function $f_t(x,a)$, the state evolves according to the differential equation
$$\frac{dx_t}{dt} = f_t(x_t, a_t).$$
This is called the Plant Equation.
Def 2 A policy $\Pi$ chooses an action $\pi_t$ at each time $t$. The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r_t(x,a)$, and $\bar r_T(x)$ is the reward for terminating in state $x$ at time $T$.
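Def 3 [Continuous Dynamic Program] Given an initial state $x_0$ and a horizon $T$, a dynamic program is the optimization
$$V(x_0) := \max_{\Pi}\; R(x_0,\Pi), \qquad R(x_0,\Pi) := \int_0^T r_t(x_t,\pi_t)\,dt + \bar r_T(x_T),$$
subject to the plant equation $\frac{dx_t}{dt} = f_t(x_t,\pi_t)$.
Further, let $R_\tau(x,\Pi)$ (resp. $V_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $\tau$ in state $x_\tau = x$, rather than from time $0$ in state $x_0$.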
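For a fixed policy, the objective in Def 3 can be approximated on a time grid of step `delta`, using the same forward-Euler and Riemann-sum discretization that Ex 1 and Ex 2 ask you to justify. The Python sketch below assumes a scalar state and action and a feedback policy of the hypothetical form `policy(t, x)`; the names `approx_objective` and `terminal` are our own, not from the text.

```python
from typing import Callable

Dynamics = Callable[[float, float, float], float]  # (t, x, a) -> dx/dt
Reward = Callable[[float, float, float], float]    # (t, x, a) -> instantaneous reward
Policy = Callable[[float, float], float]           # (t, x) -> action (hypothetical feedback form)


def approx_objective(f: Dynamics, r: Reward, terminal: Callable[[float], float],
                     policy: Policy, x0: float, T: float, delta: float) -> float:
    """Approximate the objective R(x0, policy) on a time grid of step delta.

    The state is advanced by a forward-Euler step of the plant equation and the
    running reward by a left Riemann sum; T is assumed to be a multiple of delta.
    """
    x, total = x0, 0.0
    for k in range(round(T / delta)):
        t = k * delta
        a = policy(t, x)
        total += r(t, x, a) * delta   # reward accrued on [t, t + delta)
        x = x + delta * f(t, x, a)    # Euler step: x_{t+delta} = x_t + delta f_t(x_t, a_t)
    return total + terminal(x)        # add the terminal reward at time T


# Example: dx/dt = a with the constant action a = 1, reward x, no terminal reward.
# Then x_t = t and the exact objective is the integral of t over [0, 1], i.e. 1/2.
value = approx_objective(f=lambda t, x, a: a, r=lambda t, x, a: x,
                         terminal=lambda x: 0.0, policy=lambda t, x: 1.0,
                         x0=0.0, T=1.0, delta=1e-3)
print(value)  # the left Riemann sum is approximately (1 - delta) / 2, close to 1/2
```

Halving `delta` roughly halves the gap to the exact value here, as expected of a left Riemann sum.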
For a minimization problem, where we minimize the loss given the costs incurred, rather than a maximization problem, where we maximize winnings given the rewards received, the functions $V$, $R$ and $r$ are replaced with $L$, $C$ and $c$, and maxima become minima.
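Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation
$$0 = \frac{\partial V_t(x)}{\partial t} + \max_{a \in \mathcal{A}} \left\{ r_t(x,a) + f_t(x,a) \cdot \frac{\partial V_t(x)}{\partial x} \right\}, \qquad V_T(x) = \bar r_T(x),$$
is called the Hamilton-Jacobi-Bellman (HJB) equation. It is the continuous-time analogue of the Bellman equation (see Dynamic Programming).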
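The HJB equation can be checked numerically for a candidate value function. As an illustrative example that is not part of the original text, take a scalar state with plant equation $\frac{dx_t}{dt} = a_t$, reward $r_t(x,a) = -a^2/2$ and terminal reward $-x^2/2$ at horizon $T$. For $V_t(x) = -x^2/(2(1+T-t))$ we get $\partial_x V_t = -x/(1+T-t)$, the maximum over $a$ of $-a^2/2 + a\,\partial_x V_t$ is $(\partial_x V_t)^2/2 = x^2/(2(1+T-t)^2) = -\partial_t V_t$, and $V_T(x) = -x^2/2$, so the equation holds. The Python sketch below (the names `V`, `hjb_residual` and `ACTIONS` are our own) approximates the partial derivatives by central finite differences and the maximum by a brute-force search over an action grid, so its residuals should be near zero rather than exactly zero.

```python
import numpy as np

T = 1.0
ACTIONS = np.linspace(-5.0, 5.0, 100_001)  # brute-force action grid, spacing 1e-4


def V(t, x):
    """Candidate value function for the illustrative LQ example."""
    return -x**2 / (2.0 * (1.0 + T - t))


def hjb_residual(t, x, h=1e-5):
    """Right-hand side of the HJB equation at (t, x); it should vanish if V solves it.

    Partial derivatives use central finite differences; the max over actions is
    taken over ACTIONS with r_t(x, a) = -a**2 / 2 and f_t(x, a) = a.
    """
    dV_dt = (V(t + h, x) - V(t - h, x)) / (2 * h)
    dV_dx = (V(t, x + h) - V(t, x - h)) / (2 * h)
    hamiltonian = np.max(-0.5 * ACTIONS**2 + ACTIONS * dV_dx)
    return dV_dt + hamiltonian


for t, x in [(0.0, 1.0), (0.5, -2.0), (0.9, 0.3)]:
    print(t, x, hjb_residual(t, x))  # each residual should be close to zero
```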
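Ex 1 [A heuristic derivation of the HJB equation] Argue that, for $\delta$ small, $\hat x_t$ satisfying the recursion
$$\hat x_{t+\delta} = \hat x_t + \delta\, f_t(\hat x_t, a_t), \qquad t = 0, \delta, 2\delta, \dots, \qquad \hat x_0 = x_0,$$
is a good approximation to the plant equation $\frac{dx_t}{dt} = f_t(x_t, a_t)$. (A heuristic argument will suffice.)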
Ex 2 [Continued] Argue (heuristically) that the following is a good approximation to the objective of a continuous-time dynamic program:
$$\sum_{t \in \{0, \delta, \dots, T-\delta\}} r_t(\hat x_t, a_t)\,\delta + \bar r_T(\hat x_T).$$
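Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the objective of Ex 2 and the plant equation of Ex 1 is
$$V_t(x) = \max_{a \in \mathcal{A}} \left\{ r_t(x,a)\,\delta + V_{t+\delta}\big(x + \delta f_t(x,a)\big) \right\}.$$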
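Ex 4 [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the equation
$$0 = \frac{\partial V_t(x)}{\partial t} + \max_{a \in \mathcal{A}} \left\{ r_t(x,a) + f_t(x,a) \cdot \frac{\partial V_t(x)}{\partial x} \right\}.$$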
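Ex 5 [Optimality of HJB] Suppose that a policy $\Pi$ has a value function $R_t(x,\Pi)$ that satisfies the HJB equation (with $V_t(x)$ replaced by $R_t(x,\Pi)$) for all $t$ and $x$. Show that $\Pi$ is an optimal policy.
(Hint: consider $\frac{d}{dt} R_t(\tilde x_t, \Pi)$, where $\tilde x_t$ are the states under another policy $\tilde \Pi$.)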
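Ans 1 This follows from the definition of the derivative: the recursion gives $\frac{\hat x_{t+\delta} - \hat x_t}{\delta} = f_t(\hat x_t, a_t)$, and the left-hand side tends to $\frac{dx_t}{dt}$ as $\delta \to 0$.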
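A quick numerical illustration of Ans 1: the forward-Euler recursion of Ex 1 applied to the illustrative closed-loop dynamics $\frac{dx_t}{dt} = -x_t$ (our choice, not from the text), whose exact solution is $x_T = x_0 e^{-T}$. The helper `euler` is hypothetical, and `g(t, x)` stands for $f_t(x, \pi_t)$ under a fixed policy. The error at time $T$ should shrink roughly in proportion to $\delta$.

```python
import math


def euler(g, x0, T, delta):
    """Forward-Euler recursion x_{t+delta} = x_t + delta * g(t, x_t) on [0, T].

    g(t, x) plays the role of the closed-loop dynamics f_t(x, pi_t) for a fixed policy.
    """
    x = x0
    for k in range(round(T / delta)):
        x = x + delta * g(k * delta, x)
    return x


# Illustrative choice: g(t, x) = -x, whose exact solution is x_T = x0 * exp(-T).
for delta in (0.1, 0.01, 0.001):
    error = abs(euler(lambda t, x: -x, 1.0, 1.0, delta) - math.exp(-1.0))
    print(f"delta={delta}: error={error:.2e}")
```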
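Ans 2 This follows from the definition of the (Riemann) integral: the sum is a Riemann sum for $\int_0^T r_t(x_t, a_t)\,dt$, and $\hat x_t \to x_t$ as $\delta \to 0$.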
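Ans 3 This is immediate from the discrete-time Bellman equation, with per-step reward $r_t(x,a)\,\delta$, transition $x \mapsto x + \delta f_t(x,a)$ and terminal condition $V_T(x) = \bar r_T(x)$.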
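The Bellman recursion of Ex 3 can be solved by backward induction on a state grid. The NumPy sketch below does this for an illustrative linear-quadratic example (our assumption, not from the text): $f_t(x,a) = a$, $r_t(x,a) = -a^2/2$, terminal reward $-x^2/2$ and $T = 1$, whose value function is $V_t(x) = -x^2/(2(1+T-t))$. Values between grid points are obtained by linear interpolation and the maximum is taken over a finite action grid; both are simplifying choices, so the result should agree with the exact $V_0$ only up to a small discretization error.

```python
import numpy as np

# Illustrative LQ example (an assumption): f_t(x, a) = a, r_t(x, a) = -a**2 / 2,
# terminal reward -x**2 / 2, horizon T = 1.
T, delta = 1.0, 0.01
xs = np.linspace(-3.0, 3.0, 301)    # state grid, spacing 0.02
acts = np.linspace(-2.0, 2.0, 201)  # action grid, spacing 0.02


def bellman_backward(xs, acts, T, delta):
    """Backward induction on V_t(x) = max_a { r_t(x, a) delta + V_{t+delta}(x + delta f_t(x, a)) }."""
    V = -0.5 * xs**2                                  # V_T(x): terminal reward
    for _ in range(round(T / delta)):
        nxt = xs[:, None] + delta * acts[None, :]     # next state for every (x, a) pair
        cont = np.interp(nxt, xs, V)                  # V_{t+delta} by linear interpolation
        # (np.interp holds values constant outside the grid)
        V = np.max(-0.5 * acts[None, :]**2 * delta + cont, axis=1)
    return V                                          # approximation of V_0 on the grid


V0 = bellman_backward(xs, acts, T, delta)
exact = -xs**2 / (2.0 * (1.0 + T))                    # V_0(x) = -x^2 / (2 (1 + T))
inner = np.abs(xs) <= 1.0
print(np.max(np.abs(V0[inner] - exact[inner])))       # small: grid and interpolation error only
```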
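Ans 4 Subtract $V_t(x)$ from each side of the Bellman equation in Ex 3 and divide by $\delta$ to get
$$0 = \max_{a \in \mathcal{A}} \left\{ r_t(x,a) + \frac{V_{t+\delta}\big(x + \delta f_t(x,a)\big) - V_t(x)}{\delta} \right\}.$$
Then let $\delta \to 0$, noting that, by a first-order Taylor expansion,
$$\frac{V_{t+\delta}\big(x + \delta f_t(x,a)\big) - V_t(x)}{\delta} \longrightarrow \frac{\partial V_t(x)}{\partial t} + f_t(x,a) \cdot \frac{\partial V_t(x)}{\partial x}.$$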
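Ans 5 Let $\tilde x_t$ and $\tilde \pi_t$ be the states and actions under another policy $\tilde \Pi$, and use the shorthand $R_t := R_t(\tilde x_t, \Pi)$, with partial derivatives evaluated at $(t, \tilde x_t)$. By the chain rule and the plant equation,
$$\frac{dR_t}{dt} = \frac{\partial R_t}{\partial t} + f_t(\tilde x_t, \tilde \pi_t) \cdot \frac{\partial R_t}{\partial x} = \left[ \frac{\partial R_t}{\partial t} + r_t(\tilde x_t, \tilde \pi_t) + f_t(\tilde x_t, \tilde \pi_t) \cdot \frac{\partial R_t}{\partial x} \right] - r_t(\tilde x_t, \tilde \pi_t) \le -r_t(\tilde x_t, \tilde \pi_t).$$
The inequality holds since the term in the square brackets is the objective of the HJB equation evaluated at the action $\tilde \pi_t$: its maximum over actions is zero, and $\tilde \pi_t$ need not attain that maximum. Integrating from $0$ to $T$ and using $R_T(\tilde x_T, \Pi) = \bar r_T(\tilde x_T)$ gives
$$R_0(x_0, \Pi) \ge \int_0^T r_t(\tilde x_t, \tilde \pi_t)\,dt + \bar r_T(\tilde x_T) = R_0(x_0, \tilde \Pi).$$
Since $\tilde \Pi$ was arbitrary, $\Pi$ is optimal.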
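As a numerical sanity check of Ex 5 (a check, not a proof), the sketch below uses an illustrative linear-quadratic example that is not in the original: $\frac{dx_t}{dt} = a_t$, reward $-a_t^2/2$, terminal reward $-x_T^2/2$, $T = 1$ and $x_0 = 1$. Here $V_t(x) = -x^2/(2(1+T-t))$ solves the HJB equation, and the action attaining its maximum is $a = \partial V_t/\partial x = -x/(1+T-t)$. The helper `objective` is hypothetical and approximates each policy's objective with an Euler step and a left Riemann sum; the HJB policy should come out near $V_0(1) = -1/4$ and above the other hand-picked policies.

```python
T = 1.0


def objective(policy, x0=1.0, delta=1e-4):
    """Euler / left-Riemann approximation of R_0(x0, policy) for the illustrative LQ
    example: dx/dt = a, reward -a^2/2, terminal reward -x_T^2/2."""
    x, total = x0, 0.0
    for k in range(round(T / delta)):
        t = k * delta
        a = policy(t, x)
        total += -0.5 * a * a * delta  # running reward on [t, t + delta)
        x += delta * a                 # Euler step of dx/dt = a
    return total - 0.5 * x * x         # terminal reward


def hjb_policy(t, x):
    """Maximizer in the HJB equation: argmax_a {-a^2/2 + a dV/dx} = dV/dx."""
    return -x / (1.0 + T - t)


alternatives = {
    "do nothing (a = 0)": lambda t, x: 0.0,
    "proportional (a = -x)": lambda t, x: -x,
    "constant (a = -1)": lambda t, x: -1.0,
}

best = objective(hjb_policy)
print("HJB policy:", best)        # close to V_0(1) = -1/4
for name, policy in alternatives.items():
    print(name, objective(policy))  # each should be below the HJB policy's value
```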