- The Hamilton-Jacobi-Bellman Equation.
- Heuristic derivation of the HJB equation.
We consider a continuous time analogue of Markov Decision Processes.
Time is continuous, $t \in \mathbb{R}_+$; $X_t \in \mathbb{R}^n$ is the state at time $t$; $a_t \in \mathcal{A}$ is the action at time $t$.
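Def 1 [Plant Equation] Given functions $\mu_t(x,a) \in \mathbb{R}^n$ and $\sigma_t(x,a) \in \mathbb{R}^{n \times m}$, the state evolves according to a stochastic differential equation
$$dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t,$$
where $B_t$ is an $m$-dimensional Brownian motion. This is called the Plant Equation. It determines how our diffusion process evolves as a function of the control actions.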
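To make the plant equation concrete, here is a minimal Euler-Maruyama sketch for a scalar state. The function name `simulate_plant`, the argument order `(t, x, a)`, and the example drift, diffusion and policy are all illustrative assumptions, not part of the definition above.

```python
import math
import random


def simulate_plant(mu, sigma, policy, x0, T, delta, rng=None):
    """Simulate one path of dX = mu dt + sigma dB on the grid 0, delta, ..., T.

    mu(t, x, a) is the drift, sigma(t, x, a) the diffusion coefficient and
    policy(t, x) the action taken; all three are hypothetical stand-ins.
    """
    rng = rng or random.Random()
    n_steps = round(T / delta)
    x = x0
    path = [x0]
    for k in range(n_steps):
        t = k * delta
        a = policy(t, x)
        # Brownian increment over a step of length delta: Normal(0, delta).
        dB = rng.gauss(0.0, math.sqrt(delta))
        x = x + mu(t, x, a) * delta + sigma(t, x, a) * dB
        path.append(x)
    return path


# Example (assumed dynamics): the action is the drift, noise level 0.5,
# and the policy pushes the state back towards zero.
example_path = simulate_plant(
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.5,
    policy=lambda t, x: -x,
    x0=1.0, T=1.0, delta=0.01, rng=random.Random(0),
)
print(example_path[-1])
```

Each step uses the current state and action only, which is the same one-step approximation that Ex 1 asks for.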
Def 2 A policy $\pi$ chooses an action $\pi_t$ at each time $t$. (We assume that $\pi_t$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(x,a)$, and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.
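Def 3 [Diffusion Control Problem] Given initial state $x_0$, a dynamic program is the optimization
$$L(x_0) := \min_{\pi \in \mathcal{P}} C(x_0,\pi), \qquad C(x_0,\pi) := \mathbb{E}_{x_0}\left[ \int_0^T e^{-\alpha t} c_t(X_t,\pi_t)\,dt + e^{-\alpha T} c_T(X_T) \right],$$
where $\alpha \geq 0$ is the discount rate. Further, let $C_\tau(x,\pi)$ (resp. $L_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $t=\tau$ with $X_\tau = x$, rather than from $t=0$ with $X_0 = x_0$, and with discounting measured from time $\tau$.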
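The objective of a fixed policy can be estimated by simulation. The sketch below assumes a scalar state, replaces the time integral by a left Riemann sum with step `delta`, and averages over independent Euler-Maruyama paths. The name `estimate_cost` and every argument are hypothetical choices made for illustration.

```python
import math
import random


def estimate_cost(mu, sigma, cost, terminal_cost, policy, x0, T, alpha,
                  delta=0.01, n_paths=1000, seed=0):
    """Monte Carlo estimate of the discounted cost of a fixed policy.

    mu, sigma, cost take (t, x, a); terminal_cost takes x; policy takes (t, x).
    """
    rng = random.Random(seed)
    n_steps = round(T / delta)
    total = 0.0
    for _ in range(n_paths):
        x = x0
        path_cost = 0.0
        for k in range(n_steps):
            t = k * delta
            a = policy(t, x)
            # Discounted running cost accumulated over [t, t + delta).
            path_cost += math.exp(-alpha * t) * cost(t, x, a) * delta
            # One Euler-Maruyama step of the state dynamics.
            dB = rng.gauss(0.0, math.sqrt(delta))
            x = x + mu(t, x, a) * delta + sigma(t, x, a) * dB
        # Discounted cost of terminating in the final state.
        path_cost += math.exp(-alpha * T) * terminal_cost(x)
        total += path_cost
    return total / n_paths
```

With zero noise, unit running cost and zero terminal cost, the estimate should be close to $(1 - e^{-\alpha T})/\alpha$, which gives a quick sanity check of the discretisation.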
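Def 4 [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation
$$0 = \partial_t L_t(x) + \min_{a \in \mathcal{A}} \left\{ c_t(x,a) + \mu_t(x,a) \cdot \partial_x L_t(x) + \tfrac{1}{2} [\sigma_t(x,a)\sigma_t(x,a)^\top] \cdot \partial_{xx} L_t(x) - \alpha L_t(x) \right\}$$
is called the Hamilton-Jacobi-Bellman (HJB) equation.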
Heuristically deriving the HJB equation
We heuristically develop a Bellman equation for stochastic differential equations using our knowledge of the Bellman equation for Markov decision processes. The following exercises build on the heuristic derivation of Itô's formula to give a heuristic derivation of the HJB equation.
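Ex 1 [Heuristic Derivation of the HJB equation] We suppose (for simplicity) that $X_t$ belongs to $\mathbb{R}$ and is driven by a one-dimensional Brownian motion. Argue that, for small $\delta > 0$, the plant equation (Def 1) is approximated by
$$X_{t+\delta} - X_t = \mu_t(X_t,\pi_t)\,\delta + \sigma_t(X_t,\pi_t)\,(B_{t+\delta} - B_t).$$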
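Ex 2 [Continued] Argue that, for $\delta$ small and positive, the cost function in Def 3 can be approximated by
$$C(x_0,\pi) \approx \mathbb{E}_{x_0}\left[ \sum_{t \in \{0,\delta,\dots,T-\delta\}} (1-\alpha\delta)^{t/\delta}\, c_t(X_t,\pi_t)\,\delta + (1-\alpha\delta)^{T/\delta}\, c_T(X_T) \right].$$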
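Ex 3 [Continued] Argue that the optimal value function $L_t(x)$ approximately satisfies
$$L_t(x) = \min_{a \in \mathcal{A}} \left\{ c_t(x,a)\,\delta + (1-\alpha\delta)\, \mathbb{E}_{x,a}\left[ L_{t+\delta}(X_{t+\delta}) \right] \right\}.$$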
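Ex 4 [Continued] Argue that, given $X_t = x$ and action $a$, $L_{t+\delta}(X_{t+\delta})$ can be approximated as follows:
$$L_{t+\delta}(X_{t+\delta}) \approx L_t(x) + \partial_t L_t(x)\,\delta + \partial_x L_t(x) \left[ \mu_t(x,a)\,\delta + \sigma_t(x,a)\,(B_{t+\delta} - B_t) \right] + \tfrac{1}{2} \sigma_t(x,a)^2\, \partial_{xx} L_t(x)\,\delta.$$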
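Ex 5 [Continued] Argue that $L_t(x)$ satisfies the equation
$$0 = \partial_t L_t(x) + \min_{a \in \mathcal{A}} \left\{ c_t(x,a) + \mu_t(x,a)\, \partial_x L_t(x) + \tfrac{1}{2} \sigma_t(x,a)^2\, \partial_{xx} L_t(x) - \alpha L_t(x) \right\},$$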
i.e. the HJB equation as required.
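The discrete-time Bellman recursion of Ex 3 can also be solved numerically by backward induction, which gives a rough approximation to the HJB solution. The sketch below is one possible discretisation under stated assumptions, not a definitive scheme: the state is scalar and lives on a uniform grid `xs`, the action set is a finite list, the Brownian increment is replaced by a two-point variable taking values $\pm\sqrt{\delta}$ with equal probability (which matches its mean and variance), and off-grid values are linearly interpolated and clamped at the grid edges. The names `interp` and `solve_bellman` and the example problem are hypothetical.

```python
import math


def interp(xs, values, x):
    """Linear interpolation on a uniform grid, clamped at the end points."""
    if x <= xs[0]:
        return values[0]
    if x >= xs[-1]:
        return values[-1]
    h = xs[1] - xs[0]
    i = min(int((x - xs[0]) / h), len(xs) - 2)
    w = (x - xs[i]) / h
    return (1 - w) * values[i] + w * values[i + 1]


def solve_bellman(mu, sigma, cost, terminal_cost, actions, xs, T, alpha, delta):
    """Backward induction for
    L_t(x) = min_a { c_t(x,a) delta + (1 - alpha delta) E[L_{t+delta}(X_{t+delta})] }.

    Returns the approximate time-0 value at each grid point in xs.
    """
    n_steps = round(T / delta)
    values = [terminal_cost(x) for x in xs]  # value at the terminal time T
    for k in reversed(range(n_steps)):
        t = k * delta
        new_values = []
        for x in xs:
            best = math.inf
            for a in actions:
                drift = mu(t, x, a) * delta
                noise = sigma(t, x, a) * math.sqrt(delta)
                # Expectation over the two-point approximation of the
                # Brownian increment.
                expected = 0.5 * (interp(xs, values, x + drift + noise)
                                  + interp(xs, values, x + drift - noise))
                candidate = cost(t, x, a) * delta + (1 - alpha * delta) * expected
                best = min(best, candidate)
            new_values.append(best)
        values = new_values
    return values


# Example (assumed problem): steer the state towards zero at quadratic cost.
grid = [-2.0 + 0.1 * i for i in range(41)]
value_at_zero = solve_bellman(
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.5,
    cost=lambda t, x, a: x * x + a * a,
    terminal_cost=lambda x: x * x,
    actions=[-1.0, 0.0, 1.0],
    xs=grid, T=1.0, alpha=0.1, delta=0.01,
)
print(value_at_zero[20])
```

Clamping at the grid edges distorts values near the boundary, so the grid should be wide relative to the region of interest. A finite-difference scheme applied directly to the HJB equation is the more common approach; the recursion is used here only because it mirrors the exercises.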
Ans 1. Apply the heuristic for Itô's formula [see Ex3].
Ans 2. Follows from the definition of a Riemann integral and since $(1-\alpha\delta)^{t/\delta} \to e^{-\alpha t}$ as $\delta \to 0$.
Ans 3. The approximations in Ex 1 and Ex 2 define the plant equation and objective for a Markov decision process (Def 3). The required equation is the Bellman equation for that MDP.
Ans 4. This is Itô’s formula.
Ans 5. Take expectations in Ex 4 (the Brownian term has expectation zero), substitute into Ex 3, divide by $\delta$, and let $\delta \to 0$.
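The steps of Ans 5 can be written out as follows. The notation is assumed for this sketch: $L_t(x)$ is the optimal value function, $c_t$ the running cost, $\mu_t$ and $\sigma_t$ the drift and diffusion coefficients, $\alpha$ the discount rate, and $G^a_t(x)$ is a shorthand introduced here only to keep the lines short.

```latex
\begin{align*}
\mathbb{E}_{x,a}\big[ L_{t+\delta}(X_{t+\delta}) \big]
  &\approx L_t(x) + \delta\, G^a_t(x),
  \qquad G^a_t(x) := \partial_t L_t(x) + \mu_t(x,a)\,\partial_x L_t(x)
                     + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx} L_t(x), \\
L_t(x)
  &\approx \min_a \Big\{ c_t(x,a)\,\delta
     + (1-\alpha\delta)\big[ L_t(x) + \delta\, G^a_t(x) \big] \Big\}, \\
0 &\approx \min_a \Big\{ c_t(x,a)\,\delta + \delta\, G^a_t(x)
     - \alpha\delta\, L_t(x) - \alpha\delta^2\, G^a_t(x) \Big\}, \\
0 &= \partial_t L_t(x) + \min_a \Big\{ c_t(x,a) + \mu_t(x,a)\,\partial_x L_t(x)
     + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx} L_t(x) - \alpha L_t(x) \Big\}.
\end{align*}
```

The third line subtracts $L_t(x)$ from both sides, which is allowed inside the minimum because $L_t(x)$ does not depend on $a$. The last line divides by $\delta$ and lets $\delta \to 0$, so the $\alpha\delta^2$ term vanishes and $\partial_t L_t(x)$, which does not depend on $a$, moves outside the minimum.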
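- Here $[\sigma\sigma^\top] \cdot \partial_{xx} L_t(x)$ is the dot-product of the Hessian matrix $\partial_{xx} L_t(x)$ with $\sigma\sigma^\top$. I.e. we multiply component-wise and sum up terms: $\sum_{i,j} [\sigma\sigma^\top]_{ij}\, \partial_{x_i x_j} L_t(x)$.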