# Diffusion Control Problems

• The Hamilton-Jacobi-Bellman Equation.
• Heuristic derivation of the HJB equation.

We consider a continuous time analogue of Markov Decision Processes.

### Definitions

Time is continuous $t\in\mathbb{R}_+$; $X_t\in \mathbb{R}^n$ is the state at time $t$; $a_t\in \mathcal{A}$ is the action at time $t$.

Def 1 [Plant Equation] Given functions $\mu_t(X_t,a_t)=(\mu^i_t(X_t,a_t): i=1,\dots,n)$ and $\sigma_t(X_t,a_t)=(\sigma^{ij}_t(X_t,a_t): i=1,\dots,n,\ j=1,\dots,m)$, the state evolves according to the stochastic differential equation

$$dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t,$$

where $B_t$ is an $m$-dimensional Brownian motion. This is called the Plant Equation. It determines how our diffusion process evolves as a function of the control actions.
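To make the plant equation concrete, here is a minimal sketch of an Euler-Maruyama simulation in one dimension. The function name `simulate_plant`, the argument order `mu(t, x, a)`, and the example drift, volatility and feedback policy are all assumptions made for illustration; they are not part of the definitions above.

```python
import math
import random

def simulate_plant(x0, mu, sigma, policy, T=1.0, delta=0.01, seed=0):
    """Euler-Maruyama path of dX = mu dt + sigma dB in one dimension.

    mu(t, x, a) and sigma(t, x, a) are the drift and volatility;
    policy(t, x) returns the action taken at time t in state x.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / delta))
    x, path = x0, [x0]
    for k in range(n_steps):
        t = k * delta
        a = policy(t, x)
        # Brownian increment over a step of length delta: N(0, delta)
        dB = rng.gauss(0.0, math.sqrt(delta))
        x = x + mu(t, x, a) * delta + sigma(t, x, a) * dB
        path.append(x)
    return path

# Hypothetical example: the action is the drift, the volatility is constant,
# and the policy pushes the state back towards zero.
path = simulate_plant(
    x0=1.0,
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.2,
    policy=lambda t, x: -x,
)
print(len(path), path[-1])
```

Setting `sigma` to zero reduces the recursion to the deterministic Euler scheme, which is a convenient sanity check.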

Def 2 A policy $\pi$ chooses an action $\pi_t$ at each time $t$. (We assume that $\pi_t$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(a,x)$ and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.

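Def 3 [Diffusion Control Problem] Given initial state $x_0$, a dynamic program is the optimization

$$L(x_0) := \min_{\pi\in\mathcal{P}} C(x_0,\pi), \qquad C(x_0,\pi) := \mathbb{E}_{x_0}\left[\int_0^T e^{-\alpha t}\, c_t(\pi_t,X_t)\,dt + e^{-\alpha T} c_T(X_T)\right],$$

where $\alpha\geq 0$ is the discount rate. Further, let $C_\tau(x,\pi)$ (resp. $L_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $t=\tau$ with $X_\tau=x$, rather than from $t=0$ with $X_0=x$, and with discounting measured from time $\tau$, i.e. with $e^{-\alpha(t-\tau)}$ in place of $e^{-\alpha t}$.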
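The objective of a fixed policy can be estimated by simulation. The following is a rough Monte Carlo sketch for a one-dimensional problem with discount rate `alpha`; the function `estimate_cost`, its signature, and the quadratic costs in the example are hypothetical choices for illustration only.

```python
import math
import random

def estimate_cost(x0, policy, mu, sigma, cost, terminal_cost,
                  alpha=0.1, T=1.0, delta=0.01, n_paths=500, seed=0):
    """Monte Carlo estimate of the discounted objective of a fixed policy.

    cost(t, a, x) is the instantaneous cost and terminal_cost(x) the cost
    at time T. The time integral is replaced by a left Riemann sum.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / delta))
    total = 0.0
    for _ in range(n_paths):
        x, running = x0, 0.0
        for k in range(n_steps):
            t = k * delta
            a = policy(t, x)
            running += math.exp(-alpha * t) * cost(t, a, x) * delta
            # Euler-Maruyama step of the plant equation
            x += mu(t, x, a) * delta + sigma(t, x, a) * rng.gauss(0.0, math.sqrt(delta))
        total += running + math.exp(-alpha * T) * terminal_cost(x)
    return total / n_paths

# Hypothetical example: quadratic costs and the feedback policy a = -x.
est = estimate_cost(
    x0=1.0,
    policy=lambda t, x: -x,
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.2,
    cost=lambda t, a, x: x * x + a * a,
    terminal_cost=lambda x: x * x,
)
print(est)
```

The estimate carries both a discretization error (from the step `delta`) and a sampling error (from `n_paths`); minimizing over policies is a separate and harder task.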

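Def 4 [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem (Def 3), the equation

$$0 = \partial_t L_t(x) + \min_{a\in\mathcal{A}}\left\{ c_t(a,x) + \mu_t(x,a)\cdot \partial_x L_t(x) + \frac{1}{2}[\sigma^T \sigma]\cdot \partial_{xx} L_t(x) \right\} - \alpha L_t(x),$$

where $\sigma=\sigma_t(x,a)$ and $L_T(x)=c_T(x)$, is called the Hamilton-Jacobi-Bellman equation (see note 1). It is the continuous-time analogue of the Bellman equation for Markov decision processes.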
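As an illustrative special case, take $n=m=1$, $\mathcal{A}=\mathbb{R}$, drift $\mu_t(x,a)=a$, constant volatility $\sigma$, and cost $c_t(a,x)=q(x)+\tfrac{1}{2}a^2$ for some function $q$. These choices are assumptions made for the example, not part of the definitions above. The minimization inside the HJB equation can then be carried out explicitly:

$$\begin{aligned}
0 &= \partial_t L_t(x) + \min_{a\in\mathbb{R}}\Big\{ q(x) + \tfrac{1}{2}a^2 + a\,\partial_x L_t(x) \Big\} + \tfrac{\sigma^2}{2}\,\partial_{xx} L_t(x) - \alpha L_t(x), \\
a^* &= -\partial_x L_t(x), \\
0 &= \partial_t L_t(x) + q(x) - \tfrac{1}{2}\big(\partial_x L_t(x)\big)^2 + \tfrac{\sigma^2}{2}\,\partial_{xx} L_t(x) - \alpha L_t(x).
\end{aligned}$$

In this case the minimizing action is read off from the gradient of the value function, and the HJB equation becomes a nonlinear partial differential equation in $L$ alone.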

### Heuristically deriving the HJB equation

We heuristically develop a Bellman equation for stochastic differential equations using our knowledge of the Bellman equation for Markov decision processes. The following exercises build on the heuristic derivation of Itô's formula to give a heuristic derivation of the HJB equation.

Ex 1 [Heuristic Derivation of the HJB equation] We suppose (for simplicity) that $X_t$ belongs to ${\mathbb R}$ and is driven by a one-dimensional Brownian motion. Argue that, for $\delta$ small and positive, the plant equation (Def 1) is approximated by

$$X_{t+\delta} - X_t = \mu_t(X_t,\pi_t)\,\delta + \sigma_t(X_t,\pi_t)\,(B_{t+\delta}-B_t).$$

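Ex 2 [Continued] Argue that, for $\delta$ small and positive, the cost function of the diffusion control problem (Def 3) can be approximated by

$$C(x_0,\pi) \approx \mathbb{E}_{x_0}\left[\sum_{t\in\{0,\delta,\dots,T-\delta\}} (1-\alpha\delta)^{t/\delta}\, c_t(\pi_t,X_t)\,\delta + (1-\alpha\delta)^{T/\delta}\, c_T(X_T)\right].$$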
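The discrete discount factor used in this approximation can be checked numerically. The short sketch below compares $(1-\alpha\delta)^{t/\delta}$ with $e^{-\alpha t}$ for shrinking $\delta$; the helper name `discrete_discount` and the values of `alpha` and `t` are arbitrary choices for illustration.

```python
import math

def discrete_discount(alpha, t, delta):
    """Per-step discount (1 - alpha*delta) compounded over t/delta steps."""
    return (1.0 - alpha * delta) ** (t / delta)

alpha, t = 0.5, 2.0  # illustrative values
gaps = []
for delta in (0.1, 0.01, 0.001):
    gap = abs(discrete_discount(alpha, t, delta) - math.exp(-alpha * t))
    gaps.append(gap)
    print(f"delta={delta:<6} gap={gap:.2e}")
```

The gap shrinks roughly in proportion to $\delta$, which is what the approximation needs.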

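Ex 3 [Continued] Argue that the optimal value function $L_t(x)$ approximately satisfies

$$L_t(x) = \min_{a\in\mathcal{A}}\left\{ c_t(a,x)\,\delta + (1-\alpha\delta)\,\mathbb{E}\left[ L_{t+\delta}(X_{t+\delta}) \mid X_t=x,\ a_t=a \right]\right\}.$$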

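Ex 4 [Continued] Argue that, writing $L= L_t(X_t)$, $\mu=\mu_t(X_t,a_t)$ and $\sigma=\sigma_t(X_t,a_t)$, the value function can be approximated as follows:

$$L_{t+\delta}(X_{t+\delta}) \approx L + \partial_t L\,\delta + \partial_x L\,\big(\mu\,\delta + \sigma\,(B_{t+\delta}-B_t)\big) + \frac{\sigma^2}{2}\,\partial_{xx} L\,\delta.$$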

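Ex 5 [Continued] Argue that $L=L_t(x)$ satisfies the equation

$$0 = \partial_t L + \min_{a\in\mathcal{A}}\left\{ c_t(a,x) + \mu_t(x,a)\,\partial_x L + \frac{\sigma_t(x,a)^2}{2}\,\partial_{xx} L \right\} - \alpha L,$$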

i.e. the HJB equation as required.
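The discrete-time recursion behind this derivation can also be used numerically. The following is a rough sketch, not a production solver: it runs the backward recursion $L_t(x) = \min_a\{c_t(a,x)\delta + (1-\alpha\delta)\mathbb{E}[L_{t+\delta}(X_{t+\delta})]\}$ on a uniform grid, replacing the Brownian increment by $\pm\sqrt{\delta}$ with probability $1/2$ each and using clamped linear interpolation. The function names, the finite action set, and the example costs are all assumptions for illustration.

```python
import math

def interp(grid, values, x):
    """Linear interpolation on a uniform grid, clamped at both ends."""
    lo, hi = grid[0], grid[-1]
    if x <= lo:
        return values[0]
    if x >= hi:
        return values[-1]
    h = grid[1] - grid[0]
    i = min(int((x - lo) / h), len(grid) - 2)
    w = (x - grid[i]) / h
    return (1 - w) * values[i] + w * values[i + 1]

def solve_bellman(grid, actions, mu, sigma, cost, terminal_cost,
                  alpha=0.1, T=1.0, delta=0.01):
    """Backward induction for the discretized problem; returns L_0 on the grid."""
    n_steps = int(round(T / delta))
    L = [terminal_cost(x) for x in grid]  # L_T(x) = c_T(x)
    root = math.sqrt(delta)
    for k in reversed(range(n_steps)):
        t = k * delta
        new_L = []
        for x in grid:
            best = math.inf
            for a in actions:
                mean = x + mu(t, x, a) * delta
                s = sigma(t, x, a) * root
                # Two-point approximation of E[L_{t+delta}(X_{t+delta})]
                cont = 0.5 * (interp(grid, L, mean + s) + interp(grid, L, mean - s))
                best = min(best, cost(t, a, x) * delta + (1 - alpha * delta) * cont)
            new_L.append(best)
        L = new_L
    return L

# Hypothetical example: the action is the drift, with quadratic costs.
grid = [-2.0 + 0.1 * i for i in range(41)]
values = solve_bellman(
    grid,
    actions=[-1.0, -0.5, 0.0, 0.5, 1.0],
    mu=lambda t, x, a: a,
    sigma=lambda t, x, a: 0.2,
    cost=lambda t, a, x: x * x + a * a,
    terminal_cost=lambda x: x * x,
)
print(values[20], values[30])  # approximate L_0 near x = 0 and x = 1
```

Clamping at the grid boundary distorts the values near the edges, so in practice the grid should extend well beyond the region of interest.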

Ans 2. This follows from the definition of the Riemann integral and the fact that $(1-{\alpha}{\delta})^\frac{t}{\delta} \rightarrow e^{-\alpha t}$ as $\delta\rightarrow 0$.
Ans 5. Take expectations in Ex 4 (the Brownian term has expectation zero) and substitute into Ex 3. Then subtract $L_t(x)$ from both sides, divide by $\delta$, and let $\delta\rightarrow 0$.
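1. Here $[\sigma^T \sigma]\cdot \partial_{xx} L_t(x)$ is the dot-product of the Hessian matrix $\partial_{xx} L_t(x)$ with $\sigma^T \sigma$, i.e. we multiply component-wise and sum up the terms: $\sum_{i,k=1}^n \big(\sum_{j=1}^m \sigma^{ij}\sigma^{kj}\big)\,\partial_{x_i x_k} L_t(x)$. (With $\sigma$ indexed as in Def 1, this $n\times n$ matrix is $\sigma\sigma^T$ in row-by-column notation.)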