# Diffusion Control Problems

We consider a continuous time analogue of Markov Decision Processes.

Time is continuous $t\in\mathbb{R}_+$; $X_t\in \mathbb{R}^n$ is the state at time $t$; $a_t\in \mathcal{A}$ is the action at time $t$.

Def [Plant Equation] Given functions $\mu_t(X_t,a_t)=(\mu^i_t(X_t,a_t): i=1,\dots,n)$ and $\sigma_t(X_t,a_t)=(\sigma^{ij}_t(X_t,a_t): i=1,\dots,n,\ j=1,\dots,m )$, the state evolves according to the stochastic differential equation

$$dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t,$$

where $B_t$ is an $m$-dimensional Brownian motion. This is called the Plant Equation.
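To make the plant equation concrete, here is a minimal Euler-Maruyama sketch in one dimension, assuming the standard form $dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t$. The function name `simulate_plant`, the feedback form of the policy, and the example coefficients are illustrative assumptions, not part of the notes.

```python
import math
import random

def simulate_plant(mu, sigma, policy, x0, T, n_steps, rng):
    """Euler-Maruyama path of dX = mu dt + sigma dB under a feedback policy.

    mu, sigma: callables (t, x, a) -> float (drift and diffusion coefficients)
    policy:    callable (t, x) -> action (hypothetical feedback form)
    """
    dt = T / n_steps
    x = x0
    xs = [x0]
    for k in range(n_steps):
        t = k * dt
        a = policy(t, x)
        # Brownian increment over [t, t + dt]: Normal(0, dt)
        dB = rng.gauss(0.0, math.sqrt(dt))
        x = x + mu(t, x, a) * dt + sigma(t, x, a) * dB
        xs.append(x)
    return xs

# Hypothetical example: the action is the drift, the noise level is constant,
# and the policy pushes the state back towards zero.
rng = random.Random(0)
path = simulate_plant(
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.5,
    policy=lambda t, x: -x,
    x0=1.0, T=1.0, n_steps=1000, rng=rng,
)
```

Each step adds the drift times $\delta$ and the diffusion coefficient times a Gaussian increment of variance $\delta$, which is the same discretization used in the heuristic derivation of the HJB equation.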

A policy $\Pi$ chooses an action $\pi_t$ at each time $t$. (We assume that $\pi_t$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(a,x)$, and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.

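Def [Diffusion Control Problem] Given an initial state $x_0$, a diffusion control problem is the optimization

$$L(x_0) := \min_{\Pi\in\mathcal{P}} C(x_0,\Pi), \qquad C(x_0,\Pi) := \mathbb{E}_{x_0}\left[ \int_0^T e^{-\alpha t} c_t(\pi_t,X_t)\,dt + e^{-\alpha T} c_T(X_T) \right],$$

where the states $X_t$ follow the plant equation under the actions $\pi_t$ and $\alpha \geq 0$ is the discount rate.

Further, let $C_\tau(x,\Pi)$ (resp. $L_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $t=\tau$ with $X_\tau=x$, rather than from $t=0$ with $X_0=x_0$, with costs discounted from time $\tau$:

$$C_\tau(x,\Pi) := \mathbb{E}\left[ \int_\tau^T e^{-\alpha (t-\tau)} c_t(\pi_t,X_t)\,dt + e^{-\alpha (T-\tau)} c_T(X_T) \,\Big|\, X_\tau = x \right], \qquad L_\tau(x) := \min_{\Pi\in\mathcal{P}} C_\tau(x,\Pi).$$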

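Def [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation

$$0 = \partial_t L_t(x) + \min_{a\in\mathcal{A}} \Big\{ c_t(a,x) + \mu_t(x,a)\cdot \partial_x L_t(x) + \tfrac{1}{2} [\sigma_t \sigma_t^\top](x,a) \cdot \partial_{xx} L_t(x) \Big\} - \alpha L_t(x)$$

is called the Hamilton-Jacobi-Bellman (HJB) equation (see note 1 at the end of this section). Here $[\sigma\sigma^\top]^{ik} = \sum_{j}\sigma^{ij}\sigma^{kj}$. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].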
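As a sanity check on the definition, here is a worked example on a hypothetical one-dimensional linear-quadratic instance (chosen for illustration, not taken from the notes). Assume $\mu_t(x,a) = a$, a constant $\sigma_t(x,a) = \sigma$, costs $c_t(a,x) = x^2 + a^2$ and $c_T(x) = x^2$, and no discounting ($\alpha = 0$). Taking the one-dimensional HJB equation in the form $0 = \partial_t L_t + \min_a\{c_t + \mu_t\,\partial_x L_t + \tfrac12\sigma_t^2\,\partial_{xx}L_t\} - \alpha L_t$, it reads

$$0 = \partial_t L_t(x) + \min_{a}\Big\{ x^2 + a^2 + a\,\partial_x L_t(x) + \tfrac{1}{2}\sigma^2\,\partial_{xx} L_t(x) \Big\}.$$

Try the candidate $L_t(x) = x^2 + \sigma^2 (T-t)$, so that $\partial_t L_t = -\sigma^2$, $\partial_x L_t = 2x$ and $\partial_{xx} L_t = 2$. The minimum of $a^2 + 2xa$ is $-x^2$, attained at $a = -x$, and therefore

$$-\sigma^2 + \big( x^2 - x^2 + \sigma^2 \big) = 0, \qquad L_T(x) = x^2 = c_T(x).$$

So, under these assumptions, the candidate satisfies the equation together with the terminal condition, and the minimizing action is the linear feedback $a = -x$.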

### Heuristic Derivation of the HJB equation

We heuristically develop a Bellman equation for stochastic differential equations, using our knowledge of the Bellman equation for Markov decision processes and our heuristic treatment of stochastic integration. The argument parallels the derivation for continuous-time control.

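Perhaps the main thing to remember is that (informally) the HJB equation is

$$0 = \min_{a} \Big\{ c_t(a,x)\,dt + \mathbb{E}\big[ dL_t(X_t) \,\big|\, X_t = x,\ a_t = a \big] - \alpha L_t(x)\,dt \Big\}.$$

Here Ito’s formula is applied to the optimal value function at time $t$, $L_t(x)$, to expand $dL_t(X_t)$. This form is much easier to remember (assuming you know Ito’s formula).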

We suppose (for simplicity) that $X_t$ belongs to $\mathbb R$ and is driven by a one-dimensional Brownian motion. The plant equation in Def [DCP:Plant] is approximated by

$$X_{t+\delta} - X_t = \mu_t(X_t,\pi_t)\,\delta + \sigma_t(X_t,\pi_t)\,(B_{t+\delta} - B_t)$$

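for small $\delta$. Similarly, the cost function of the diffusion control problem can be approximated by

$$C(x_0,\Pi) \approx \mathbb{E}_{x_0}\Big[ \sum_{t\in\{0,\delta,\dots,T-\delta\}} (1-\alpha\delta)^{t/\delta}\, c_t(\pi_t,X_t)\,\delta + (1-\alpha\delta)^{T/\delta}\, c_T(X_T) \Big].$$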

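This follows from the definition of a Riemann integral and since $(1-\alpha\delta)^{t/\delta} \rightarrow e^{-\alpha t}$ as $\delta \rightarrow 0$. The Bellman equation for this objective function and plant equation is

$$L_t(x) = \min_{a} \Big\{ c_t(a,x)\,\delta + (1-\alpha\delta)\, \mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) \,\big|\, X_t = x,\ a_t = a \big] \Big\}$$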

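or, equivalently,

$$0 = \min_{a} \Big\{ c_t(a,x)\,\delta + \mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) - L_t(x) \,\big|\, X_t = x,\ a_t = a \big] - \alpha\delta\, \mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) \,\big|\, X_t = x,\ a_t = a \big] \Big\}.$$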

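Now, by Ito’s formula, $L_{t+\delta}(X_{t+\delta})$ can be approximated by

$$L_{t+\delta}(X_{t+\delta}) \approx L_t(X_t) + \Big[ \partial_t L_t(X_t) + \mu_t(X_t,a_t)\,\partial_x L_t(X_t) + \tfrac{1}{2}\sigma_t(X_t,a_t)^2\,\partial_{xx} L_t(X_t) \Big]\delta + \sigma_t(X_t,a_t)\,\partial_x L_t(X_t)\,(B_{t+\delta} - B_t).$$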

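Thus, since the Brownian increment $B_{t+\delta} - B_t$ has mean zero,

$$\mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) - L_t(x) \,\big|\, X_t = x,\ a_t = a \big] \approx \Big[ \partial_t L_t(x) + \mu_t(x,a)\,\partial_x L_t(x) + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx} L_t(x) \Big]\delta.$$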

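Substituting this into the above Bellman equation, dividing by $\delta$ and letting $\delta \rightarrow 0$, we get, as required,

$$0 = \partial_t L_t(x) + \min_{a} \Big\{ c_t(a,x) + \mu_t(x,a)\,\partial_x L_t(x) + \tfrac{1}{2}\sigma_t(x,a)^2\,\partial_{xx} L_t(x) \Big\} - \alpha L_t(x).$$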
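The limiting argument can be illustrated numerically. The sketch below runs the discrete-time Bellman recursion $L_t(x) = \min_a\{c_t(a,x)\delta + (1-\alpha\delta)\mathbb{E}[L_{t+\delta}(X_{t+\delta})]\}$ on a hypothetical linear-quadratic instance (an assumption made for illustration): $X_{t+\delta} = x + a\delta + \sigma(B_{t+\delta}-B_t)$, $c_t(a,x) = x^2 + a^2$ and $c_T(x) = x^2$. For this instance the value function stays quadratic, $L_t(x) = p_t x^2 + q_t$, so the recursion reduces to two scalar updates. With $\alpha = 0$ one can check by hand that $L_t(x) = x^2 + \sigma^2(T-t)$ solves the continuous-time HJB equation, so $p_0$ should approach $1$ and $q_0$ should approach $\sigma^2 T$ as $\delta \to 0$.

```python
def discrete_lq_value(T, n_steps, sigma, alpha=0.0, p_T=1.0):
    """Backward Bellman recursion for L_t(x) = p_t x^2 + q_t.

    Assumed model (illustrative): X_{t+d} = x + a d + sigma dB,
    running cost (x^2 + a^2) d, terminal cost p_T x^2.
    Returns (p_0, q_0).
    """
    d = T / n_steps
    beta = 1.0 - alpha * d  # one-step discount factor
    p, q = p_T, 0.0
    for _ in range(n_steps):
        # E[L_{t+d}(X_{t+d})] = p (x + a d)^2 + p sigma^2 d + q.
        # The constant part is discounted and carried into q (uses the old p).
        q = beta * (q + p * sigma ** 2 * d)
        # Minimizing a^2 d + beta p (x + a d)^2 over a gives
        # a = -beta p x / (1 + beta p d) and value beta p x^2 / (1 + beta p d).
        p = d + beta * p / (1.0 + beta * p * d)
    return p, q

# Coarse versus fine time step, with sigma = 0.5 and T = 1 (hypothetical values).
p_coarse, q_coarse = discrete_lq_value(T=1.0, n_steps=10, sigma=0.5)
p_fine, q_fine = discrete_lq_value(T=1.0, n_steps=1000, sigma=0.5)
```

Comparing `p_coarse` and `p_fine` against $1$, and `q_coarse` and `q_fine` against $\sigma^2 T$, shows the discrete recursion approaching the HJB solution as the step shrinks.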

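The following result makes rigorous the claim that the HJB equation is the right object to consider for a diffusion control problem. Note that the function $L_t$ in the theorem is measured in time-$0$ units, as its terminal condition $L_T(x) = e^{-\alpha T} c_T(x)$ shows: it plays the role of $e^{-\alpha t}$ times a value function whose costs are discounted from time $t$.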

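Thrm [Davis-Varaiya Martingale Principle of Optimality] Suppose that there exists a function $L_t(x)$ with $L_T(x)= e^{-\alpha T} c_T(x)$ and such that, for any policy $\Pi$ with states $X_t$,

$$M_t = \int_0^t e^{-\alpha s} c_s(\pi_s,X_s)\,ds + L_t(X_t)$$

is a sub-martingale and, moreover, that for some policy $\Pi^*$, $M_t$ is a martingale. Then $\Pi^*$ is optimal and

$$L_0(X_0) = \min_{\Pi\in\mathcal{P}} \mathbb{E}\left[ \int_0^T e^{-\alpha t} c_t(\pi_t,X_t)\,dt + e^{-\alpha T} c_T(X_T) \right].$$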
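Proof. Since $M_t$ is a sub-martingale for all $\Pi$, we have

$$L_0(X_0) = M_0 \leq \mathbb{E}[M_T] = \mathbb{E}\left[ \int_0^T e^{-\alpha t} c_t(\pi_t,X_t)\,dt + e^{-\alpha T} c_T(X_T) \right] = C(X_0,\Pi).$$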

Therefore $L_0(X_0) \leq C(X_0,\Pi)$ for all policies $\Pi$.

If $M_t$ is a martingale for policy $\Pi^*$, then by the same argument $L_0(X_0) = C(X_0,\Pi^*)$. Thus

$$C(X_0,\Pi^*) = L_0(X_0) \leq C(X_0,\Pi)$$

for all policies $\Pi$ and so $\Pi^*$ is optimal, and it holds that

$$L_0(X_0) = \min_{\Pi\in\mathcal{P}} C(X_0,\Pi).$$

$\square$
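The statement can be checked numerically on a hypothetical linear-quadratic instance (an assumption made for illustration, not part of the notes): $dX_t = a_t\,dt + \sigma\,dB_t$, $c_t(a,x) = x^2 + a^2$, $c_T(x) = x^2$ and $\alpha = 0$. For this instance the candidate $L_t(x) = x^2 + \sigma^2(T-t)$ with feedback $a = -x$ can be verified by hand against the HJB equation, so the theorem suggests that the expected cost under $a = -x$ is close to $L_0(x_0) = x_0^2 + \sigma^2 T$ and that any other policy costs at least as much. The Monte Carlo sketch below, with the illustrative helper `expected_cost`, compares the feedback $a = -x$ with the do-nothing policy $a = 0$.

```python
import math
import random

def expected_cost(policy, x0, T, sigma, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of E[ int_0^T (X^2 + a^2) dt + X_T^2 ]
    for dX = a dt + sigma dB, using an Euler-Maruyama discretization."""
    rng = random.Random(seed)
    dt = T / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            a = policy(k * dt, x)
            cost += (x * x + a * a) * dt          # running cost over one step
            x += a * dt + sigma * rng.gauss(0.0, sd)
        total += cost + x * x                     # add the terminal cost
    return total / n_paths

x0, T, sigma = 1.0, 1.0, 0.5                      # hypothetical parameters
L0 = x0 ** 2 + sigma ** 2 * T                     # candidate value L_0(x0)
cost_opt = expected_cost(lambda t, x: -x, x0, T, sigma, 100, 2000)
cost_zero = expected_cost(lambda t, x: 0.0, x0, T, sigma, 100, 2000)
```

Up to Monte Carlo and discretization error, `cost_opt` should be close to `L0`, while `cost_zero` should be noticeably larger, in line with $L_0(X_0) \leq C(X_0,\Pi)$ for every policy $\Pi$.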

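1. Here $[\sigma \sigma^\top]\cdot \partial_{xx} L_t(x)$ is the dot-product of the Hessian matrix $\partial_{xx} L_t(x)$ with the $n\times n$ matrix $\sigma\sigma^\top$, whose entries are $[\sigma\sigma^\top]^{ik} = \sum_{j=1}^m \sigma^{ij}\sigma^{kj}$ (this is $\sigma^\top\sigma$ if $\sigma$ is indexed in the transposed way). I.e. we multiply component-wise and sum up the terms: $\sum_{i,k} [\sigma\sigma^\top]^{ik}\, \partial_{x_i x_k} L_t(x)$.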