- Continuous-time dynamic programs
- The HJB equation: a heuristic derivation and a proof of optimality.

Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.

Time is continuous, $t \in [0, T]$; $x_t$ is the state at time $t$; $u_t$ is the action at time $t$.

**Def 1** [Plant Equation] Given a function $a(x, u, t)$, the state evolves according to a differential equation

$$\frac{dx_t}{dt} = a(x_t, u_t, t).$$

This is called the Plant Equation.
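As a minimal sketch of how a plant equation $\dot{x}_t = a(x_t, u_t, t)$ can be simulated, the snippet below steps the state forward with Euler's method. The dynamics $a(x, u, t) = -x + u$ and the constant-zero policy are illustrative assumptions, not from the text.

```python
# Euler simulation of the plant equation dx/dt = a(x, u, t).
# The dynamics a and the policy used below are illustrative assumptions.

def a(x, u, t):
    # Assumed linear dynamics: drift toward 0, pushed by the action u.
    return -x + u

def simulate(x0, policy, T=1.0, delta=1e-3):
    """Step x_{t+delta} = x_t + a(x_t, u_t, t) * delta from time 0 to T."""
    x, t = x0, 0.0
    while t < T:
        u = policy(x, t)
        x += a(x, u, t) * delta
        t += delta
    return x

x_T = simulate(x0=1.0, policy=lambda x, t: 0.0)
# With u = 0 the dynamics reduce to dx/dt = -x, so x_T ≈ exp(-1) ≈ 0.368.
```

With $u_t \equiv 0$ the simulated terminal state should be close to $x_0 e^{-T}$, which is a quick way to sanity-check the step size $\delta$.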

**Def 2** A policy $\pi$ chooses an action $u_t$ at each time $t$. The (instantaneous) reward for taking action $u$ in state $x$ at time $t$ is $r(x, u, t)$, and $R(x)$ is the reward for terminating in state $x$ at time $T$.

**Def 3** [Continuous Dynamic Program] Given an initial state $x_0$, a dynamic program is the optimization

$$\max_\pi\ \int_0^T r(x_t, u_t, t)\, dt + R(x_T) \qquad \text{subject to the plant equation.}$$

Further, let $W(x, \tau)$ (resp. $V(x, \tau)$) be the objective (resp. optimal objective) when the integral is started from time $\tau$ in state $x$, rather than from time $0$.

Here a minimization problem, where we minimize the loss given the costs incurred, is replaced with a maximization problem, where we maximize the winnings given the rewards received. The cost functions $c$, $C$ and loss $L$ are replaced with the notation $r$, $R$ and $W$.

**Def 4** [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

$$0 = \frac{\partial V}{\partial t}(x, t) + \max_u \left\{ r(x, u, t) + a(x, u, t)\, \frac{\partial V}{\partial x}(x, t) \right\}, \qquad V(x, T) = R(x),$$

is called the Hamilton-Jacobi-Bellman (HJB) equation. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].
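To see the equation in action, here is a small numerical check on an assumed example (not from the text): plant $\dot{x} = u$, reward $r(x, u, t) = -u^2$, terminal reward $R(x) = -x^2$. One can verify by hand that $V(x, t) = -x^2/(1 + T - t)$ satisfies the HJB equation, with the inner maximum over $u$ attained at $u = \frac{1}{2}\, \partial V/\partial x$; the snippet confirms the residual is numerically zero.

```python
# Finite-difference check that V(x,t) = -x^2/(1+T-t) satisfies the HJB
# equation for the assumed problem: dx/dt = u, r = -u^2, R(x) = -x^2.
T = 1.0

def V(x, t):
    return -x**2 / (1.0 + T - t)

def hjb_residual(x, t, h=1e-5):
    V_t = (V(x, t + h) - V(x, t - h)) / (2 * h)   # dV/dt by central difference
    V_x = (V(x + h, t) - V(x - h, t)) / (2 * h)   # dV/dx by central difference
    u = V_x / 2                                   # maximizer of -u^2 + u * V_x
    return V_t + (-u**2 + u * V_x)                # should be ~0 if V solves HJB

residuals = [abs(hjb_residual(x, t)) for x in (-1.0, 0.5, 2.0)
                                     for t in (0.0, 0.4, 0.9)]
print(max(residuals))  # close to zero (finite-difference error only)
```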

**Ex 1** [A Heuristic derivation of the HJB equation] Argue that, for $\delta > 0$ small, $x_t$ satisfying the recursion

$$x_{t+\delta} = x_t + a(x_t, u_t, t)\, \delta$$

is a good approximation to the plant equation. (A heuristic argument will suffice.)

**Ex 2** [Continued] Argue (heuristically) that

$$\sum_{t \in \{0, \delta, 2\delta, \ldots, T-\delta\}} r(x_t, u_t, t)\, \delta + R(x_T)$$

is a good approximation to the objective of a continuous-time dynamic program.

**Ex 3** [Continued] Show that the Bellman equation for the discrete-time dynamic program with the above objective and plant equation is

$$V(x, t) = \max_u \left\{ r(x, u, t)\, \delta + V\big(x + a(x, u, t)\, \delta,\ t + \delta\big) \right\}.$$
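The discrete-time recursion $V(x, t) = \max_u \{ r(x, u, t)\, \delta + V(x + a(x, u, t)\, \delta,\ t + \delta) \}$ can be run directly as a backward value iteration on a grid. The sketch below uses an assumed quadratic example (not from the text): $\dot{x} = u$, $r = -u^2$, $R(x) = -x^2$, for which the continuous-time value $V(x, 0) = -x^2/(1+T)$ is known in closed form, so the grid recursion can be checked against it.

```python
import numpy as np

# Backward Bellman recursion V(x,t) = max_u { r*delta + V(x + a*delta, t+delta) }
# for the assumed example a(x,u,t) = u, r = -u^2, R(x) = -x^2,
# whose continuous-time value is V(x,t) = -x^2 / (1 + T - t).
T, delta = 1.0, 0.02
xs = np.linspace(-3, 3, 301)        # state grid
us = np.linspace(-2, 2, 161)        # action grid

V = -xs**2                          # terminal condition V(x, T) = R(x)
for _ in range(int(round(T / delta))):
    # One-step reward plus linearly interpolated continuation value for each u
    # (np.interp clamps to the edge values outside the grid).
    Q = np.array([-u**2 * delta + np.interp(xs + u * delta, xs, V) for u in us])
    V = Q.max(axis=0)               # maximize over actions

# Compare with the closed form at t = 0: V(x, 0) = -x^2 / (1 + T).
x0 = 1.0
approx = np.interp(x0, xs, V)
exact = -x0**2 / (1 + T)
print(approx, exact)
```

The remaining gap between `approx` and `exact` comes only from the action and state grids; refining them shrinks it further.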

**Ex 4** [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the equation

$$0 = \frac{\partial V}{\partial t}(x, t) + \max_u \left\{ r(x, u, t) + a(x, u, t)\, \frac{\partial V}{\partial x}(x, t) \right\}.$$

**Ex 5** [Optimality of HJB] Suppose that a policy $\pi$ has a value function $V(x, t)$ that satisfies the HJB equation for all $x$ and $t$; show that $\pi$ is an optimal policy.

(Hint: consider $\frac{d}{dt} V(\tilde{x}_t, t)$, where $(\tilde{x}_t)$ are the states under another policy $\tilde{\pi}$.)

## Answers

**Ans 1** Obvious from the definition of the derivative: for $\delta$ small, $x_{t+\delta} - x_t \approx \frac{dx_t}{dt}\, \delta = a(x_t, u_t, t)\, \delta$.

**Ans 2** Obvious from the definition of the (Riemann) integral, since $\sum_t r(x_t, u_t, t)\, \delta \rightarrow \int_0^T r(x_t, u_t, t)\, dt$ as $\delta \rightarrow 0$.

**Ans 3** Immediate from the discrete-time Bellman equation.

**Ans 4** Subtract $V(x, t)$ from each side in [3], divide by $\delta$ and let $\delta \rightarrow 0$. Further note that

$$\frac{V\big(x + a(x, u, t)\, \delta,\ t + \delta\big) - V(x, t)}{\delta} \;\longrightarrow\; \frac{\partial V}{\partial t}(x, t) + a(x, u, t)\, \frac{\partial V}{\partial x}(x, t) \qquad \text{as } \delta \rightarrow 0.$$

**Ans 5** Using the shorthand $\tilde{a}_t = a(\tilde{x}_t, \tilde{u}_t, t)$ and $\tilde{r}_t = r(\tilde{x}_t, \tilde{u}_t, t)$ for the states and actions under $\tilde{\pi}$:

$$\frac{d}{dt} V(\tilde{x}_t, t) = \frac{\partial V}{\partial t}(\tilde{x}_t, t) + \tilde{a}_t\, \frac{\partial V}{\partial x}(\tilde{x}_t, t) \leq -\tilde{r}_t + \frac{\partial V}{\partial t}(\tilde{x}_t, t) + \max_u \left[ r(\tilde{x}_t, u, t) + a(\tilde{x}_t, u, t)\, \frac{\partial V}{\partial x}(\tilde{x}_t, t) \right] = -\tilde{r}_t.$$

The inequality holds since the term in the square brackets is the expression maximized in the HJB equation, which is *not* necessarily maximized by $\tilde{u}_t$; the final equality is the HJB equation itself. Integrating from $0$ to $T$ gives

$$R(\tilde{x}_T) - V(x_0, 0) = V(\tilde{x}_T, T) - V(\tilde{x}_0, 0) \leq -\int_0^T \tilde{r}_t\, dt,$$

so the objective of $\tilde{\pi}$ satisfies $\int_0^T \tilde{r}_t\, dt + R(\tilde{x}_T) \leq V(x_0, 0)$, which is the objective attained by $\pi$. Hence $\pi$ is optimal.
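As a numerical sanity check of this optimality argument, on an assumed example not from the text ($\dot{x} = u$, $r = -u^2$, $R(x) = -x^2$, for which $V(x, t) = -x^2/(1 + T - t)$ satisfies the HJB equation): the objective of the HJB-maximizing policy $u = \frac{1}{2}\, \partial V/\partial x = -x/(1 + T - t)$ should match $V(x_0, 0)$, while other policies should do no better.

```python
# Check that no alternative policy beats V(x_0, 0) for the assumed problem
# dx/dt = u, r = -u^2, R(x) = -x^2, where V(x,t) = -x^2/(1+T-t) solves the HJB.
T, delta, x0 = 1.0, 1e-3, 1.0

def objective(policy):
    """Euler/Riemann approximation of  integral of r dt + R(x_T)  under a policy."""
    x, t, total = x0, 0.0, 0.0
    while t < T:
        u = policy(x, t)
        total += -u**2 * delta          # running reward r = -u^2
        x += u * delta                  # plant equation dx/dt = u
        t += delta
    return total + (-x**2)              # terminal reward R(x_T) = -x^2

V0 = -x0**2 / (1.0 + T)                 # V(x_0, 0) = -1/2

optimal = objective(lambda x, t: -x / (1.0 + T - t))   # HJB-maximizing policy
others = [objective(lambda x, t: u0) for u0 in (0.0, -0.25, -1.0)]

print(V0, optimal, others)  # optimal matches V0; the constant policies fall below it
```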