We consider a continuous time analogue of Markov Decision Processes.
Time is continuous, $t\in\mathbb{R}_+$; $X_t\in\mathbb{R}^n$ is the state at time $t$; $a_t\in\mathcal{A}$ is the action at time $t$.
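Def [Plant Equation] Given functions $\mu_t(x,a)\in\mathbb{R}^n$ and $\sigma_t(x,a)\in\mathbb{R}^{n\times m}$, the state evolves according to a stochastic differential equation
$$dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\cdot dB_t,$$
where $B_t$ is an $m$-dimensional Brownian motion. This is called the Plant Equation.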
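To get a feel for the Plant Equation, the following is a minimal sketch of an Euler-Maruyama simulation of a scalar state. The function name `simulate_plant`, its arguments, and the example drift, diffusion and policy are illustrative assumptions, not part of the original notes.

```python
import math
import random


def simulate_plant(mu, sigma, policy, x0, T, n_steps, rng):
    """Simulate one path of dX = mu dt + sigma dB by the Euler-Maruyama scheme.

    mu(t, x, a) and sigma(t, x, a) are the drift and diffusion coefficients and
    policy(t, x) returns the action; all three are hypothetical callables.
    Returns the list [X_0, X_delta, ..., X_T] for a scalar state.
    """
    delta = T / n_steps
    x = x0
    path = [x]
    for k in range(n_steps):
        t = k * delta
        a = policy(t, x)
        # Brownian increment B_{t+delta} - B_t ~ N(0, delta)
        dB = rng.gauss(0.0, math.sqrt(delta))
        x = x + mu(t, x, a) * delta + sigma(t, x, a) * dB
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate_plant(
        mu=lambda t, x, a: a,        # assumed example: the action is the drift
        sigma=lambda t, x, a: 0.5,   # assumed example: constant noise level
        policy=lambda t, x: -x,      # feedback policy steering the state to 0
        x0=1.0, T=1.0, n_steps=1000, rng=rng,
    )
    print(len(path), path[-1])
```

The scheme replaces $dt$ by a small step and $dB_t$ by an independent Gaussian increment with variance equal to the step size, which is the same approximation used in the heuristic derivation of the HJB equation.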
A policy $\pi$ chooses an action $\pi_t$ at each time $t$. (We assume that $\pi_t$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(x,a)$, and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.
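Def [Diffusion Control Problem] Given initial state $x_0$, a dynamic program is the optimization
$$L(x_0) := \min_{\pi\in\mathcal{P}} C(x_0,\pi), \qquad C(x_0,\pi) := \mathbb{E}\left[\int_0^T e^{-\alpha t}c_t(X_t,\pi_t)\,dt + e^{-\alpha T}c_T(X_T)\,\Big|\,X_0=x_0\right],$$
where $\alpha\geq 0$ is a discount rate. Further, let $C_\tau(x,\pi)$ (resp. $L_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $\tau$ with $X_\tau=x$, rather than from time $0$ with $X_0=x_0$; costs are then discounted relative to time $\tau$.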
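The objective of a fixed policy can be estimated by simulation. The sketch below does this for a toy scalar problem; the problem data (drift equal to the action, constant noise, quadratic costs) and the name `estimate_cost` are assumptions chosen for illustration, not taken from the notes.

```python
import math
import random


def estimate_cost(policy, x0, T=1.0, n_steps=100, n_paths=2000,
                  sigma=0.5, alpha=0.0, seed=0):
    """Monte Carlo estimate of the objective C(x0, pi) for a toy scalar problem.

    Assumed (not from the text): drift mu = a, constant diffusion sigma,
    running cost c(x, a) = x^2 + a^2 and terminal cost c_T(x) = x^2.
    """
    rng = random.Random(seed)
    delta = T / n_steps
    sqrt_delta = math.sqrt(delta)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            t = k * delta
            a = policy(t, x)
            # left-endpoint Riemann sum of the discounted running cost
            cost += math.exp(-alpha * t) * (x * x + a * a) * delta
            # Euler-Maruyama step of the plant equation
            x += a * delta + sigma * rng.gauss(0.0, sqrt_delta)
        cost += math.exp(-alpha * T) * x * x  # discounted terminal cost
        total += cost
    return total / n_paths


if __name__ == "__main__":
    print("a = -x:", estimate_cost(lambda t, x: -x, x0=1.0))
    print("a = 0 :", estimate_cost(lambda t, x: 0.0, x0=1.0))
```

With the default parameters and no discounting, a hand calculation gives continuous-time costs of $1.25$ for the feedback policy $a=-x$ and $2.375$ for the policy $a=0$, so the estimates should land near those values up to Monte Carlo and discretization error.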
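Def [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation
$$0 = \partial_t L_t(x) - \alpha L_t(x) + \min_{a\in\mathcal{A}}\left\{ c_t(x,a) + \mu_t(x,a)\cdot\partial_x L_t(x) + \tfrac{1}{2}\,[\sigma_t\sigma_t^\top](x,a)\cdot\partial_{xx}L_t(x)\right\}$$
is called the Hamilton-Jacobi-Bellman equation.¹ It is the continuous time analogue of the Bellman equation [[DP:Bellman]].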
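As a worked example (an illustration with assumed problem data, not taken from the notes), write $L_t(x)$ for the optimal cost from time $t$ in state $x$ and consider the scalar problem with drift $\mu_t(x,a)=a$, constant diffusion coefficient $\sigma$, costs $c_t(x,a)=x^2+a^2$ and $c_T(x)=x^2$, and no discounting. The HJB equation then reads
$$0 = \partial_t L_t(x) + \min_{a}\left\{ x^2 + a^2 + a\,\partial_x L_t(x) + \tfrac{1}{2}\sigma^2\,\partial_{xx}L_t(x)\right\}.$$
Trying $L_t(x) = x^2 + \sigma^2(T-t)$ gives $\partial_t L_t = -\sigma^2$, $\partial_x L_t = 2x$ and $\partial_{xx}L_t = 2$, so the bracket becomes
$$x^2 + a^2 + 2ax + \sigma^2 = (x+a)^2 + \sigma^2,$$
which is minimized at $a=-x$ with minimum value $\sigma^2$. Hence $\partial_t L_t(x) + \sigma^2 = 0$, and the terminal condition $L_T(x)=x^2=c_T(x)$ also holds. Under these assumptions the candidate optimal policy is $\pi_t = -X_t$, with cost $L_0(x_0) = x_0^2 + \sigma^2 T$.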
Heuristic Derivation of the HJB equation
We heuristically develop a Bellman equation for stochastic differential equations using our knowledge of the Bellman equation for Markov decision processes and our heuristic derivation of stochastic integration. The argument is analogous to the one for continuous time control.
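Perhaps the main thing to remember is that (informally) the HJB equation is
$$0 = \min_{a}\left\{ c_t(X_t,a)\,dt + \mathbb{E}\left[dL_t(X_t)\right] - \alpha L_t(X_t)\,dt \right\}.$$
Here Ito’s formula is applied to the optimal value function at time $t$, $L_t(X_t)$. This is much easier to remember (assuming you know Ito’s formula).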
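We suppose (for simplicity) that $X_t$ belongs to $\mathbb{R}$ and is driven by a one-dimensional Brownian motion. The plant equation in Def [DCP:Plant] is approximated by
$$X_{t+\delta} - X_t = \mu_t(X_t,\pi_t)\,\delta + \sigma_t(X_t,\pi_t)\,(B_{t+\delta}-B_t)$$
for small $\delta>0$ (recall $B_{t+\delta}-B_t\sim N(0,\delta)$). Similarly, the cost function of the Diffusion Control Problem can be approximated by
$$C_t(x,\pi) \approx \mathbb{E}\left[\sum_{s\in\{t,\,t+\delta,\,\dots,\,T-\delta\}} (1-\alpha\delta)^{(s-t)/\delta}\,c_s(X_s,\pi_s)\,\delta + (1-\alpha\delta)^{(T-t)/\delta}\,c_T(X_T)\,\Big|\,X_t=x\right].$$
This follows from the definition of a Riemann integral and since $(1-\alpha\delta)^{t/\delta}\to e^{-\alpha t}$ as $\delta\to 0$. The Bellman equation for this objective function and plant equation is
$$L_t(x) = \min_{a}\left\{ c_t(x,a)\,\delta + (1-\alpha\delta)\,\mathbb{E}\left[L_{t+\delta}(X_{t+\delta})\mid X_t=x,\ \pi_t=a\right]\right\}.$$
Now by Ito’s formula $L_{t+\delta}(X_{t+\delta})$ can be approximated by
$$L_{t+\delta}(X_{t+\delta}) \approx L_t(x) + \partial_t L_t(x)\,\delta + \partial_x L_t(x)\left[\mu_t(x,a)\,\delta + \sigma_t(x,a)\,(B_{t+\delta}-B_t)\right] + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx}L_t(x)\,\delta.$$
Substituting this into the above Bellman equation, taking expectations (the Brownian increment has mean zero), subtracting $L_t(x)$ from both sides, dividing by $\delta$ and letting $\delta\to 0$, we get, as required,
$$0 = \partial_t L_t(x) - \alpha L_t(x) + \min_{a}\left\{ c_t(x,a) + \mu_t(x,a)\,\partial_x L_t(x) + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx}L_t(x)\right\}.$$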
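The small-step Bellman recursion behind this derivation can also be run numerically. The sketch below does so for an assumed toy problem (scalar state, drift equal to the action $a$, constant diffusion coefficient $\sigma$, running cost $x^2+a^2$, terminal cost $x^2$, no discounting), for which a short calculation shows that $x^2+\sigma^2(T-t)$ solves the HJB equation. It relies on a quadratic ansatz $L_t(x)=p_t x^2+q_t$, which is a simplification specific to this example; the function name is hypothetical.

```python
def lq_value_by_backward_induction(x0, T=1.0, sigma=0.5, n_steps=1000):
    """Backward induction on the delta-step Bellman equation for a toy problem.

    Assumed (not from the text): drift mu = a, constant diffusion sigma,
    cost c(x, a) = x^2 + a^2, terminal cost x^2, discount rate alpha = 0.
    With the quadratic ansatz L_t(x) = p_t x^2 + q_t the minimisation over a
    has a closed form, so only the scalars (p_t, q_t) need to be iterated.
    """
    delta = T / n_steps
    p, q = 1.0, 0.0  # terminal condition L_T(x) = x^2
    for _ in range(n_steps):
        # One step of
        #   L_t(x) = min_a {(x^2 + a^2) delta + E[p (x + a delta + sigma dB)^2 + q]}
        # with dB ~ N(0, delta). The minimiser is a = -p x / (1 + p delta).
        q = q + p * sigma ** 2 * delta     # uses p at time t + delta
        p = delta + p / (1.0 + p * delta)  # coefficient of x^2 at time t
    return p * x0 ** 2 + q


if __name__ == "__main__":
    x0, T, sigma = 1.0, 1.0, 0.5
    print("backward induction:", lq_value_by_backward_induction(x0, T, sigma))
    print("HJB closed form   :", x0 ** 2 + sigma ** 2 * T)
```

As the step size shrinks, the recursion's value at time $0$ should approach the closed form $x_0^2+\sigma^2 T$, mirroring the limit $\delta\to 0$ taken in the derivation.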
The following gives a rigorous proof that the HJB equation is the right object to consider for a diffusion control problem.
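Thrm [Davis-Varaiya Martingale Principle of Optimality] Suppose that there exists a function $L_t(x)$ with $L_T(x)=c_T(x)$ and such that for any policy $\pi$ with states $X_t$
$$M_t := \int_0^t e^{-\alpha s}c_s(X_s,\pi_s)\,ds + e^{-\alpha t}L_t(X_t)$$
is a sub-martingale and, moreover, that for some policy $\pi^*$, $M_t$ is a martingale. Then $\pi^*$ is optimal and
$$L_0(x_0) = \min_{\pi\in\mathcal{P}} C(x_0,\pi).$$
Proof. Since $M_t$ is a sub-martingale for all $\pi$, we have
$$L_0(x_0) = M_0 \leq \mathbb{E}[M_T] = \mathbb{E}\left[\int_0^T e^{-\alpha s}c_s(X_s,\pi_s)\,ds + e^{-\alpha T}c_T(X_T)\right] = C(x_0,\pi).$$
Therefore $L_0(x_0)\leq C(x_0,\pi)$ for all policies $\pi$.
If $M_t$ is a martingale for policy $\pi^*$, then by the same argument $L_0(x_0)=C(x_0,\pi^*)$. Thus
$$C(x_0,\pi^*) = L_0(x_0) \leq C(x_0,\pi)$$
for all policies $\pi$ and so $\pi^*$ is optimal, and it holds that
$$L_0(x_0) = \min_{\pi\in\mathcal{P}} C(x_0,\pi).$$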
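One way to see how this theorem connects to the HJB equation is the following sketch in one dimension. It assumes that $L_t(x)$ is smooth enough for Ito’s formula and that the stochastic integral below is a true martingale rather than only a local one. Take $M_t = \int_0^t e^{-\alpha s}c_s(X_s,\pi_s)\,ds + e^{-\alpha t}L_t(X_t)$, with $\alpha\geq 0$ the discount rate. Applying Ito’s formula gives
$$dM_t = e^{-\alpha t}\left[c_t(X_t,\pi_t) + \partial_t L_t(X_t) - \alpha L_t(X_t) + \mu_t(X_t,\pi_t)\,\partial_x L_t(X_t) + \tfrac{1}{2}\sigma_t(X_t,\pi_t)^2\,\partial_{xx}L_t(X_t)\right]dt + e^{-\alpha t}\sigma_t(X_t,\pi_t)\,\partial_x L_t(X_t)\,dB_t.$$
If $L_t(x)$ solves the HJB equation, the term in square brackets is non-negative for every action and equals zero for an action attaining the minimum. So $M_t$ has non-negative drift (a sub-martingale) under any policy and zero drift (a martingale) under a policy that always plays a minimizing action, which is exactly the hypothesis of the theorem.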
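¹ Here $[\sigma_t\sigma_t^\top]\cdot\partial_{xx}L_t(x)$ is the dot-product of the Hessian matrix $\partial_{xx}L_t(x)$ with $\sigma_t\sigma_t^\top$. I.e. we multiply component-wise and sum up terms: $\sum_{i,j}[\sigma_t\sigma_t^\top]_{ij}\,\partial_{x_i x_j}L_t(x)$.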