- Continuous-time dynamic programs
- The HJB equation; a heuristic derivation; and proof of optimality.
Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.
- Time is continuous, $t \in [0, T]$;
- $x_t$ is the state at time $t$;
- $a_t$ is the action at time $t$.
Def 1 [Plant Equation] Given a function $f$, the state evolves according to the differential equation

$$\frac{dx_t}{dt} = f(t, x_t, a_t).$$

This is called the Plant Equation.
Def 2 A policy $\pi$ chooses an action $a_t$ at each time $t$. The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r(t, x, a)$, and $R(x)$ is the reward for terminating in state $x$ at time $T$.
Def 3 [Continuous Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

$$V(x_0) = \max_{\pi} \left\{ \int_0^T r(t, x_t, a_t)\, dt + R(x_T) \right\}$$

subject to the plant equation. Further, let $V_\pi(\tau, x)$ (resp. $V(\tau, x)$) be the objective (resp. optimal objective) when the integration is started from time $\tau$ rather than $0$.
When the maximization problem, where we maximize winnings given the rewards received, is replaced with a minimization problem, where we minimize loss given the costs incurred, the functions $r$, $R$ and $V$ are replaced with the notation $c$, $C$ and $L$.
Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

$$\frac{\partial V}{\partial t} + \max_{a} \left\{ r(t, x, a) + f(t, x, a) \frac{\partial V}{\partial x} \right\} = 0,$$

with boundary condition $V(T, x) = R(x)$, is called the Hamilton-Jacobi-Bellman equation. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].
Ex 1 [A heuristic derivation of the HJB equation] Argue that, for $\delta > 0$ small, $x_t$ satisfying the recursion

$$x_{t+\delta} = x_t + \delta f(t, x_t, a_t)$$

is a good approximation to the plant equation. (A heuristic argument will suffice.)
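This recursion is just the Euler step for the plant equation, and it is easy to check numerically. The following minimal sketch uses a hypothetical plant $f(t, x, a) = a - x$ (not from the notes) with the constant action $a_t = 0$, for which the exact solution is $x_t = x_0 e^{-t}$:

```python
import math

# Euler recursion x_{t+delta} = x_t + delta * f(t, x_t, a_t) for a
# hypothetical plant f(t, x, a) = a - x with constant action a = 0;
# the exact solution of dx/dt = -x with x_0 = 1 is x_t = exp(-t).

def f(t, x, a):
    return a - x

delta, T = 0.001, 1.0
x, t, a = 1.0, 0.0, 0.0
for _ in range(round(T / delta)):
    x += delta * f(t, x, a)     # one Euler step of the plant equation
    t += delta

err = abs(x - math.exp(-T))     # O(delta): shrinks as delta -> 0
```

Halving `delta` roughly halves `err`, which is the heuristic content of the exercise.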
Ex 2 [Continued] Argue (heuristically) that

$$\sum_{k=0}^{T/\delta - 1} \delta\, r(k\delta, x_{k\delta}, a_{k\delta}) + R(x_T)$$

is a good approximation to the objective of a continuous-time dynamic program.
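The sum above is a Riemann sum along the Euler path, and its convergence can be sketched numerically. The problem data here are hypothetical choices, not from the notes: plant $f(t, x, a) = -x$ (the action plays no role), running reward $r(t, x, a) = x$, no terminal reward, and $x_0 = 1$, so the exact objective is $\int_0^1 e^{-t}\, dt = 1 - e^{-1}$:

```python
import math

# Riemann-sum approximation of the objective along the Euler path.
# Hypothetical data: f(t, x, a) = -x, r(t, x, a) = x, R = 0, x_0 = 1;
# the exact objective is the integral of exp(-t) over [0, 1].

delta, T = 0.001, 1.0
x, total = 1.0, 0.0
for _ in range(round(T / delta)):
    total += delta * x          # delta * r(t, x_t, a_t)
    x += delta * (-x)           # Euler step of the plant equation

err = abs(total - (1.0 - math.exp(-T)))  # -> 0 as delta -> 0
```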
Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the above objective and plant equation is

$$V(t, x) = \max_{a} \left\{ \delta\, r(t, x, a) + V\big(t + \delta,\; x + \delta f(t, x, a)\big) \right\}.$$
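This discrete-time recursion can be solved backwards in time on a grid and checked against an exact HJB solution. The following is a minimal sketch on a hypothetical problem (not from the notes): plant $\dot{x}_t = a_t$, running reward $r(t, x, a) = -a^2/2$, and terminal reward $R(x) = x$, for which the HJB equation $\partial_t V + \max_a \{ -a^2/2 + a\, \partial_x V \} = 0$, $V(T, x) = x$, has the exact solution $V(t, x) = x + (T - t)/2$:

```python
import numpy as np

# Backward Bellman recursion of Ex 3 on a grid, checked against the
# exact HJB solution V(t, x) = x + (T - t)/2 of the hypothetical problem
# dx/dt = a, r(t, x, a) = -a^2/2, R(x) = x, so V(0, 0) = T/2.

T, delta = 1.0, 0.01               # horizon and time step
xs = np.linspace(-3.0, 3.0, 301)   # state grid
acts = np.linspace(-2.0, 2.0, 41)  # action grid (contains the optimum a = 1)

V = xs.copy()                      # terminal condition V(T, x) = R(x) = x
for _ in range(round(T / delta)):
    # Q[i, j] = delta * r(a_i) + V(t + delta, x_j + delta * a_i),
    # with linear interpolation for off-grid successor states
    Q = np.array([-delta * a**2 / 2 + np.interp(xs + delta * a, xs, V)
                  for a in acts])
    V = Q.max(axis=0)              # Bellman maximization over actions

v0 = float(np.interp(0.0, xs, V))  # approximates the exact value T/2 = 0.5
```

Grid clipping pollutes values near the boundary, but the error cannot travel more than $\delta \cdot \max|a|$ per step, so the value at $x = 0$ is unaffected over this horizon.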
Ex 4 [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the equation

$$\frac{\partial V}{\partial t} + \max_{a} \left\{ r(t, x, a) + f(t, x, a) \frac{\partial V}{\partial x} \right\} = 0.$$
Ex 5 [Optimality of HJB] Suppose that a policy $\pi$ has a value function $V_\pi(t, x)$ that satisfies the HJB equation for all $x$ and $t \in [0, T]$; show that $\pi$ is an optimal policy.

(Hint: consider $\frac{d}{dt} V_\pi(t, \tilde{x}_t)$, where $\tilde{x}_t$ are the states under another policy $\tilde{\pi}$.)
Answers
Ans 1 Obvious from the definition of the derivative: $(x_{t+\delta} - x_t)/\delta \to dx_t/dt = f(t, x_t, a_t)$ as $\delta \to 0$.
Ans 2 Obvious from the definition of the (Riemann) integral, since $\sum_{k} \delta\, r(k\delta, x_{k\delta}, a_{k\delta}) \to \int_0^T r(t, x_t, a_t)\, dt$ as $\delta \to 0$.
Ans 3 Immediate from the discrete-time Bellman equation.
Ans 4 Subtract $V(t, x)$ from each side in [Ex 3], divide by $\delta$, and let $\delta \to 0$. Further note that

$$\frac{V\big(t + \delta,\; x + \delta f(t, x, a)\big) - V(t, x)}{\delta} \;\longrightarrow\; \frac{\partial V}{\partial t} + f(t, x, a) \frac{\partial V}{\partial x} \qquad \text{as } \delta \to 0.$$
Ans 5 Using the shorthand $\tilde{x}_t$, $\tilde{a}_t$ for the states and actions of another policy $\tilde{\pi}$:

$$\frac{d}{dt} V_\pi(t, \tilde{x}_t) = \frac{\partial V_\pi}{\partial t} + f(t, \tilde{x}_t, \tilde{a}_t) \frac{\partial V_\pi}{\partial x} = -\,r(t, \tilde{x}_t, \tilde{a}_t) + \left[ \frac{\partial V_\pi}{\partial t} + r(t, \tilde{x}_t, \tilde{a}_t) + f(t, \tilde{x}_t, \tilde{a}_t) \frac{\partial V_\pi}{\partial x} \right] \le -\,r(t, \tilde{x}_t, \tilde{a}_t).$$

The inequality holds since the term in the square brackets is the objective of the HJB equation, which is not necessarily maximized by $\tilde{a}_t$ and so is at most $0$. Integrating over $[0, T]$ and using $V_\pi(T, \tilde{x}_T) = R(\tilde{x}_T)$ gives

$$R(\tilde{x}_T) + \int_0^T r(t, \tilde{x}_t, \tilde{a}_t)\, dt \le V_\pi(0, x_0),$$

so the objective of any policy $\tilde{\pi}$ is at most the value of $\pi$, i.e. $\pi$ is optimal.