We consider a continuous time analogue of Markov Decision Processes.
Time is continuous, $t \in \mathbb{R}_+$; $X_t \in \mathbb{R}^n$ is the state at time $t$; $a_t \in \mathcal{A}$ is the action at time $t$.
Def [Plant Equation] Given functions $\mu(t,x,a)$ and $\sigma(t,x,a)$, the state evolves according to a stochastic differential equation
$$
dX_t = \mu(t, X_t, a_t)\, dt + \sigma(t, X_t, a_t)\, dB_t,
$$
where $(B_t)_{t \geq 0}$ is an $m$-dimensional Brownian motion. This is called the Plant Equation.
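To make the Plant Equation concrete, here is a minimal Euler-Maruyama simulation sketch. Everything named below (the drift `mu`, volatility `sigma`, and `policy` functions, and the example dynamics) is an illustrative assumption, not part of the definition above.

```python
import numpy as np

def simulate_plant(mu, sigma, policy, x0, T=1.0, delta=1e-3, rng=None):
    """Euler-Maruyama discretization of dX_t = mu dt + sigma dB_t under a policy."""
    rng = rng or np.random.default_rng()
    n = int(T / delta)
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        t = k * delta
        a = policy(t, X[k])                   # action a_t chosen by the policy
        dB = rng.normal(0.0, np.sqrt(delta))  # Brownian increment ~ N(0, delta)
        X[k + 1] = X[k] + mu(t, X[k], a) * delta + sigma(t, X[k], a) * dB
    return X

# Purely illustrative example: mean-reverting dynamics with a constant action.
path = simulate_plant(
    mu=lambda t, x, a: -a * x,    # drift mu(t, x, a)
    sigma=lambda t, x, a: 0.2,    # volatility sigma(t, x, a)
    policy=lambda t, x: 1.0,
    x0=1.0,
)
```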
A policy $\pi$ chooses an action $a_t$ at each time $t$. (We assume that $(a_t)_{t \geq 0}$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c(t, x, a)$, and $C(x)$ is the cost for terminating in state $x$ at time $T$.
Def [Diffusion Control Problem] Given initial state $x_0$, a diffusion control problem is the optimization
$$
V(0, x_0) = \min_{\pi \in \mathcal{P}} \mathbb{E}\left[ \int_0^T c(t, X_t, a_t)\, dt + C(X_T) \right]
$$
subject to the Plant Equation. Further, let $C_\pi(t, x)$ (resp. $V(t, x)$) be the objective (resp. optimal objective) for when the integral is started from time $t$ with $X_t = x$, rather than from time $0$ with $X_0 = x_0$.
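For a fixed policy, the objective $C_\pi(0, x_0)$ can be estimated by Monte Carlo, replacing the integral with a Riemann sum along simulated paths. This sketch reuses the hypothetical `simulate_plant` helper from above; the quadratic costs are illustrative stand-ins for $c$ and $C$.

```python
def estimate_objective(mu, sigma, policy, c, C, x0, T=1.0, delta=1e-3,
                       n_paths=1000, rng=None):
    """Monte Carlo estimate of E[ int_0^T c(t, X_t, a_t) dt + C(X_T) ]."""
    rng = rng or np.random.default_rng()
    total = 0.0
    for _ in range(n_paths):
        X = simulate_plant(mu, sigma, policy, x0, T, delta, rng)
        running = 0.0
        for k in range(len(X) - 1):           # Riemann sum for the running cost
            t = k * delta
            running += c(t, X[k], policy(t, X[k])) * delta
        total += running + C(X[-1])
    return total / n_paths

# Illustrative quadratic costs c(t, x, a) = x^2 + a^2 and C(x) = x^2.
value = estimate_objective(
    mu=lambda t, x, a: -a * x, sigma=lambda t, x, a: 0.2,
    policy=lambda t, x: 1.0,
    c=lambda t, x, a: x**2 + a**2, C=lambda x: x**2, x0=1.0,
)
```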
Def [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation
$$
0 = \min_{a} \left\{ \frac{\partial V}{\partial t} + c(t, x, a) + \mu(t, x, a) \cdot \frac{\partial V}{\partial x} + \frac{1}{2}\, \sigma\sigma^\top(t, x, a) : \frac{\partial^2 V}{\partial x^2} \right\}, \qquad V(T, x) = C(x),
$$
is called the Hamilton-Jacobi-Bellman equation.¹ It is the continuous time analogue of the Bellman equation [[DP:Bellman]].
Heuristic Derivation of the HJB equation
We heuristically derive a Bellman equation for stochastic differential equations, using our knowledge of the Bellman equation for Markov decision processes and our heuristic understanding of stochastic integration. The argument is analogous to the derivation for deterministic continuous time control.
Perhaps the main thing to remember is that (informally) the HJB equation is
$$
0 = \min_a \Big\{ \mathbb{E}\big[ dV(t, X_t) \big] + c(t, X_t, a)\, dt \Big\}.
$$
Here Itô's formula is applied to the optimal value function at time $t$, $V(t, X_t)$. This is much easier to remember (assuming you know Itô's formula).
We suppose (for simplicity) that $X_t$ belongs to $\mathbb{R}$ and is driven by a one-dimensional Brownian motion. The plant equation in Def [DCP:Plant] is approximated by
$$
X_{t+\delta} \approx X_t + \mu(t, X_t, a_t)\, \delta + \sigma(t, X_t, a_t)\, (B_{t+\delta} - B_t)
$$
for small $\delta$ (recall $B_{t+\delta} - B_t \sim \mathcal{N}(0, \delta)$). Similarly the cost function can be approximated by
$$
\sum_{k=0}^{T/\delta - 1} c(k\delta, X_{k\delta}, a_{k\delta})\, \delta + C(X_T).
$$
This follows from the definition of a Riemann integral, since $\int_0^T c(t, X_t, a_t)\, dt \approx \sum_k c(k\delta, X_{k\delta}, a_{k\delta})\, \delta$. The Bellman equation for this objective function and plant equation is that $V(t, x)$ satisfies
$$
V(t, x) = \min_a \Big\{ c(t, x, a)\, \delta + \mathbb{E}\big[ V(t+\delta, X_{t+\delta}) \,\big|\, X_t = x,\, a_t = a \big] \Big\},
$$
or, equivalently,
$$
0 = \min_a \Big\{ c(t, x, a)\, \delta + \mathbb{E}\big[ V(t+\delta, X_{t+\delta}) - V(t, X_t) \,\big|\, X_t = x,\, a_t = a \big] \Big\}.
$$
Now by Itô's formula, $V(t+\delta, X_{t+\delta}) - V(t, X_t)$ can be approximated by
$$
V(t+\delta, X_{t+\delta}) - V(t, X_t) \approx \left[ \frac{\partial V}{\partial t} + \mu \frac{\partial V}{\partial x} + \frac{1}{2} \sigma^2 \frac{\partial^2 V}{\partial x^2} \right] \delta + \sigma \frac{\partial V}{\partial x}\, (B_{t+\delta} - B_t).
$$
Thus, since the Brownian increment has mean zero,
$$
\mathbb{E}\big[ V(t+\delta, X_{t+\delta}) - V(t, X_t) \,\big|\, X_t = x,\, a_t = a \big] \approx \left[ \frac{\partial V}{\partial t} + \mu \frac{\partial V}{\partial x} + \frac{1}{2} \sigma^2 \frac{\partial^2 V}{\partial x^2} \right] \delta.
$$
Substituting this into the above Bellman equation, dividing by $\delta$ and letting $\delta \rightarrow 0$, we get, as required,
$$
0 = \min_a \left\{ \frac{\partial V}{\partial t} + c(t, x, a) + \mu(t, x, a) \frac{\partial V}{\partial x} + \frac{1}{2} \sigma^2(t, x, a) \frac{\partial^2 V}{\partial x^2} \right\}.
$$
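The discretized Bellman equation can also be solved numerically by backward induction on a state grid, which is one way to approximate $V$ (and hence the HJB solution) for small $\delta$. The sketch below assumes a one-dimensional state and a small finite action set, and approximates the Gaussian expectation with a three-point Gauss-Hermite rule; all dynamics and costs are the illustrative ones from earlier.

```python
import numpy as np

def bellman_backup(mu, sigma, c, C, actions, xs, T=1.0, delta=0.01):
    """Backward induction for V(t, x) = min_a { c*delta + E[V(t+delta, X')] }."""
    n_steps = int(T / delta)
    V = np.array([C(x) for x in xs], dtype=float)      # terminal condition V(T, x) = C(x)
    zs = np.array([-np.sqrt(3.0), 0.0, np.sqrt(3.0)])  # 3-point Gauss-Hermite nodes
    ws = np.array([1.0 / 6, 2.0 / 3, 1.0 / 6])         # matching weights
    for k in reversed(range(n_steps)):
        t = k * delta
        V_next, V = V, np.empty_like(V)
        for i, x in enumerate(xs):
            best = np.inf
            for a in actions:
                # Candidate next states X' = x + mu*delta + sigma*sqrt(delta)*Z.
                xp = x + mu(t, x, a) * delta + sigma(t, x, a) * np.sqrt(delta) * zs
                EV = ws @ np.interp(xp, xs, V_next)    # E[V(t+delta, X')] by quadrature
                best = min(best, c(t, x, a) * delta + EV)
            V[i] = best
    return V    # approximates V(0, x) on the grid xs

# Illustrative use with the running example's dynamics and costs.
xs = np.linspace(-2.0, 2.0, 81)
V0 = bellman_backup(
    mu=lambda t, x, a: -a * x, sigma=lambda t, x, a: 0.2,
    c=lambda t, x, a: x**2 + a**2, C=lambda x: x**2,
    actions=[0.0, 0.5, 1.0], xs=xs,
)
```

Note that `np.interp` extrapolates as a constant beyond the grid, which is a crude truncation of the state space, so this is a rough sketch rather than a convergent scheme.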
The following gives a rigorous proof that the HJB equation is the right object to consider for a diffusion control problem.
Thrm [Davis-Varaiya Martingale Principle of Optimality] Suppose that there exists a function $V(t, x)$ with $V(T, x) = C(x)$ and such that for any policy $\pi$ with states $(X_t)_{t \geq 0}$,
$$
M_t := \int_0^t c(s, X_s, a_s)\, ds + V(t, X_t)
$$
is a sub-martingale and, moreover, that for some policy $\pi^*$, $(M_t)_{t \geq 0}$ is a martingale. Then $\pi^*$ is optimal and
$$
V(0, x_0) = \min_{\pi \in \mathcal{P}} C_\pi(0, x_0).
$$
Since $(M_t)_{t \geq 0}$ is a sub-martingale for all $\pi \in \mathcal{P}$, we have
$$
V(0, x_0) = \mathbb{E}[M_0] \leq \mathbb{E}[M_T] = \mathbb{E}\left[ \int_0^T c(t, X_t, a_t)\, dt + C(X_T) \right] = C_\pi(0, x_0).
$$
Therefore $V(0, x_0) \leq C_\pi(0, x_0)$ for all policies $\pi$. If $(M_t)_{t \geq 0}$ is a martingale for policy $\pi^*$, then by the same argument $V(0, x_0) = C_{\pi^*}(0, x_0)$. Thus $C_{\pi^*}(0, x_0) \leq C_\pi(0, x_0)$ for all policies and so $\pi^*$ is optimal, and it holds that
$$
V(0, x_0) = \min_{\pi \in \mathcal{P}} C_\pi(0, x_0). \qquad \square
$$
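To see why this principle singles out the HJB equation, one can apply Itô's formula to $M_t$ (a heuristic sketch, under the same smoothness assumptions as in the derivation above):
$$
dM_t = c(t, X_t, a_t)\, dt + dV(t, X_t) = \left[ c + \frac{\partial V}{\partial t} + \mu \cdot \frac{\partial V}{\partial x} + \frac{1}{2}\, \sigma\sigma^\top : \frac{\partial^2 V}{\partial x^2} \right] dt + \frac{\partial V}{\partial x} \cdot \sigma\, dB_t.
$$
The sub-martingale condition for every policy asks that the drift term be non-negative for every action, and the martingale condition for $\pi^*$ asks that it vanish along the optimal actions; together these say that the minimum of the drift over $a$ is zero, which is exactly the HJB equation.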
- Here $\sigma\sigma^\top : \frac{\partial^2 V}{\partial x^2}$ is the dot-product of the Hessian matrix $\frac{\partial^2 V}{\partial x^2}$ with $\sigma\sigma^\top$. I.e. we multiply component-wise and sum up terms: $A : B = \sum_{i,j} A_{ij} B_{ij}$.↩