Diffusion Control Problems

The Hamilton-Jacobi-Bellman Equation.
Heuristic derivation of the HJB equation.

We consider a continuous-time analogue of Markov decision processes.

Definitions

Time is continuous, $t\in\mathbb{R}_+$; $X_t\in \mathbb{R}^n$ is the state at time $t$; $a_t\in \mathcal{A}$ is the action at time $t$.

Def 1 [Plant Equation] Given functions $\mu_t(X_t,a_t)=(\mu^i_t(X_t,a_t): i=1,\dots,n)$ and $\sigma_t(X_t,a_t)=(\sigma^{ij}_t(X_t,a_t): i=1,\dots,n,\ j=1,\dots,m)$, the state evolves according to the stochastic differential equation

$dX_{t}= \mu_t(X_t, a_t) dt + \sigma_t(X_t,a_t) \cdot d B_t$

where $B_t$ is an $m$-dimensional Brownian motion. This is called the Plant Equation; it determines how the diffusion process evolves as a function of the control actions.
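To make the plant equation concrete, here is a minimal pure-Python sketch of the Euler-Maruyama scheme. It steps the state forward on the grid $0,\delta,2\delta,\dots$ and holds $\mu_t$ and $\sigma_t$ fixed over each step, which is the same approximation used in Ex 1. It assumes a feedback policy that chooses $a_t$ from the current state. The names `euler_maruyama`, `mu`, `sigma` and `policy` are illustrative and do not come from the notes.

```python
import math
import random


def euler_maruyama(x0, mu, sigma, policy, T, delta, rng):
    """Simulate dX_t = mu_t(X_t, a_t) dt + sigma_t(X_t, a_t) . dB_t on 0, delta, ..., T.

    x0     : list of n floats (initial state)
    mu     : mu(t, x, a) -> list of n floats (drift)
    sigma  : sigma(t, x, a) -> n x m nested list (volatility matrix)
    policy : policy(t, x) -> action a (hypothetical feedback policy)
    """
    x = list(x0)
    n = len(x)
    path = [list(x)]
    for k in range(int(round(T / delta))):
        t = k * delta
        a = policy(t, x)                      # action chosen from the current state
        drift, vol = mu(t, x, a), sigma(t, x, a)
        m = len(vol[0])
        # Brownian increments B_{t+delta} - B_t are independent N(0, delta)
        dB = [rng.gauss(0.0, math.sqrt(delta)) for _ in range(m)]
        x = [x[i] + drift[i] * delta + sum(vol[i][j] * dB[j] for j in range(m))
             for i in range(n)]
        path.append(list(x))
    return path


# Example: n = 2, m = 1; the action pushes the first coordinate towards 0.
path = euler_maruyama(
    x0=[1.0, 0.0],
    mu=lambda t, x, a: [a, 0.0],
    sigma=lambda t, x, a: [[0.2], [0.5]],
    policy=lambda t, x: -x[0],
    T=1.0, delta=0.01, rng=random.Random(0),
)
print(path[-1])
```

The volatility is passed as an $n\times m$ nested list. Each of the $n$ state coordinates can then load on all $m$ Brownian components, which matches the indices of $\sigma^{ij}_t$ in Def 1.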

Def 2 A policy $\Pi$ chooses an action $\pi_t$ at each time $t$. (We assume that $\pi_t$ is adapted and previsible.) Let $\mathcal{P}$ be the set of policies. The (instantaneous) cost for taking action $a$ in state $x$ at time $t$ is $c_t(x,a)$, and $c_T(x)$ is the cost for terminating in state $x$ at time $T$.

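Def 3 [Diffusion Control Problem] Given initial state $x_0$, the diffusion control problem is the optimization

$L_0(x_0) := \min_{\Pi\in\mathcal{P}} C_0(x_0,\Pi), \qquad C_0(x_0,\Pi) := \mathbb{E}_{x_0}\left[ \int_0^T e^{-\alpha t} c_t(X_t,\pi_t)\,dt + e^{-\alpha T} c_T(X_T) \right],$

where $X_t$ evolves according to the plant equation with $X_0=x_0$ under policy $\Pi$, and $\alpha\geq 0$ is the discount rate.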

Further, let $C_\tau(x,\Pi)$ (resp. $L_\tau(x)$) be the objective (resp. optimal objective) when the integral is started from time $t=\tau$ with $X_\tau=x$, rather than from $t=0$ with $X_0=x$.

Def 4 [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation

$\label{DCP:HJB}\tag{HJB} 0= \min_{a\in\mathcal{A}} \left\{ c_t(x,a)+ \partial_t L_t(x) + \mu_t(x,a)\cdot \partial_x L_t(x) + \frac{1}{2}[\sigma_t(x,a)\sigma_t(x,a)^\top]\cdot \partial_{xx} L_t(x) - \alpha L_t(x) \right\}$

is called the Hamilton-Jacobi-Bellman equation.¹ It is the continuous-time analogue of the Bellman equation for Markov decision processes.

Heuristically deriving the HJB equation

We heuristically derive a Bellman equation for stochastic differential equations, starting from the Bellman equation for Markov decision processes. The exercises below combine that equation with the heuristic derivation of Itô's formula.

Ex 1 [Heuristic Derivation of the HJB equation] Suppose, for simplicity, that $X_t$ belongs to ${\mathbb R}$ and is driven by a one-dimensional Brownian motion. Argue that, for small $\delta>0$, the plant equation is approximated by

$X_{t+\delta} - X_t \approx \mu_t(X_t,\pi_t)\, \delta + \sigma_t(X_t,\pi_t)\, (B_{t+\delta}-B_t)$

Ex 2 [Continued] Argue that, for $\delta$ small and positive, the objective $C_0(x,\Pi)$ of Def 3 can be approximated by

$C_0(x,\Pi) \approx \mathbb{E}_x \Bigg[ \sum_{t\in \{0,\delta,2\delta,\dots,T-\delta\}} \!\! (1-\alpha \delta)^{\frac{t}{\delta}} c_t(X_t,\pi_t)\, \delta + (1-\alpha \delta)^{\frac{T}{\delta}}c_T(X_T) \Bigg].$
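The following sketch gives a Monte Carlo estimate of this discretized objective for a one-dimensional state. Each path is simulated with the step of Ex 1. The helper `discounted_cost` and its arguments are hypothetical. The deterministic example at the end compares the sum with the continuous-time value $\int_0^1 e^{-t}\,dt = 1-e^{-1}$, which illustrates $(1-\alpha\delta)^{t/\delta}\to e^{-\alpha t}$.

```python
import math
import random


def discounted_cost(x0, mu, sigma, policy, c, c_T, alpha, T, delta, n_paths, seed=0):
    """Monte Carlo estimate of the right-hand side of Ex 2 (one-dimensional state).

    Each path uses the Euler step of Ex 1. The cost at t = k*delta is weighted
    by (1 - alpha*delta)**k, and the terminal cost by (1 - alpha*delta)**(T/delta).
    """
    rng = random.Random(seed)
    steps = int(round(T / delta))
    beta = 1.0 - alpha * delta                 # one-step discount factor
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(steps):
            t = k * delta
            a = policy(t, x)
            cost += beta ** k * c(t, x, a) * delta
            x += mu(t, x, a) * delta + sigma(t, x, a) * rng.gauss(0.0, math.sqrt(delta))
        cost += beta ** steps * c_T(x)
        total += cost
    return total / n_paths


# Deterministic check: c = 1, no terminal cost, alpha = 1, T = 1.
# The continuous-time objective is int_0^1 e^{-t} dt = 1 - e^{-1}.
approx = discounted_cost(
    x0=0.0, mu=lambda t, x, a: 0.0, sigma=lambda t, x, a: 0.0,
    policy=lambda t, x: 0.0, c=lambda t, x, a: 1.0, c_T=lambda x: 0.0,
    alpha=1.0, T=1.0, delta=0.001, n_paths=1,
)
print(approx, 1 - math.exp(-1))
```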

Ex 3 [Continued] Argue that the optimal value function $L_t(x)$ approximately satisfies

$L_t(x) = \min_{a\in\mathcal{A}} \left\{ c_t(x,a) \delta + (1-\alpha \delta) \mathbb{E}_{x, a} \left[L_{t+\delta}(X_{t+\delta})\right] \right\}.$
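The approximate Bellman equation can be solved backwards in time, starting from $L_T = c_T$. The sketch below does this on a finite state grid with a finite action set. It makes two assumptions that the notes do not make. First, the Gaussian increment $B_{t+\delta}-B_t$ is replaced by $\pm\sqrt{\delta}$ with probability $1/2$ each, which has the same mean and variance. Second, $L_{t+\delta}$ is linearly interpolated between grid points and held constant beyond the grid. The function names are hypothetical.

```python
import bisect
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation of ys on the sorted grid xs, constant beyond the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x)            # xs[i-1] <= x < xs[i]
    w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return (1 - w) * ys[i - 1] + w * ys[i]


def solve_bellman(xs, actions, mu, sigma, c, c_T, alpha, T, delta):
    """Backward induction for Ex 3:
        L_t(x) = min_a { c_t(x,a) delta + (1 - alpha delta) E_{x,a}[L_{t+delta}(X_{t+delta})] }.
    Assumption: the increment of Ex 1 is replaced by +/- sqrt(delta), each with
    probability 1/2. Returns L_0 evaluated on the grid xs."""
    L = [c_T(x) for x in xs]                  # terminal condition L_T = c_T
    for k in reversed(range(int(round(T / delta)))):
        t = k * delta
        L_next, L = L, []
        for x in xs:
            best = math.inf
            for a in actions:
                centre = x + mu(t, x, a) * delta
                shock = sigma(t, x, a) * math.sqrt(delta)
                expected = 0.5 * (interp(centre + shock, xs, L_next)
                                  + interp(centre - shock, xs, L_next))
                best = min(best, c(t, x, a) * delta + (1 - alpha * delta) * expected)
            L.append(best)
    return L


# Example: steer X towards 0 with speed |a| <= 1, paying c_T(x) = x^2 at T = 1.
xs = [i * 0.1 for i in range(-20, 21)]
L0 = solve_bellman(xs, actions=[-1.0, 0.0, 1.0], mu=lambda t, x, a: a,
                   sigma=lambda t, x, a: 0.0, c=lambda t, x, a: 0.0,
                   c_T=lambda x: x * x, alpha=0.0, T=1.0, delta=0.1)
print(L0[30], L0[40])   # values at x = 1.0 and x = 2.0
```

In the example the control moves the state at speed at most one. From $x=1$ the state can therefore reach $0$ by time $T=1$, giving a value close to $0$. From $x=2$ it can only reach $1$, giving a value close to $1$.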

Ex 4 [Continued] Argue that, writing $L = L_t(X_t)$ with its derivatives evaluated at $(t,X_t)$, the increment of $L$ over $[t,t+\delta]$ can be approximated as follows

$\begin{aligned} &L_{t+\delta}(X_{t+\delta}) - L_t(X_t)\\ \approx &\left[ \partial_t L + \mu_t(X_t,\pi_t) \cdot \partial_x L +\frac{\sigma_t(X_t,\pi_t)^2}{2} \partial_{xx} L \right] \delta + \partial_x L \cdot \sigma_t(X_t,\pi_t) \cdot (B_{t+\delta} - B_t) \end{aligned}$
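The approximation in Ex 4 can be checked numerically on a test function whose conditional expectation is known in closed form. The sketch below assumes constant $\mu$ and $\sigma$ and takes $L_t(x)=e^{-t}\sin x$, a choice made only for illustration. It uses $\mathbb{E}[\sin(b+sZ)]=\sin(b)\,e^{-s^2/2}$ for $Z\sim N(0,1)$. Taking expectations removes the Brownian term, so the expected increment divided by $\delta$ should approach the bracketed drift term as $\delta\to0$.

```python
import math


def ito_check(t, x, mu, sig, delta):
    """For L_t(x) = exp(-t) sin(x) and constant mu, sig, return
    (E[L_{t+delta}(X_{t+delta})] - L_t(x)) / delta and the bracket of Ex 4.

    Uses X_{t+delta} = x + mu delta + sig (B_{t+delta} - B_t) and the identity
    E[sin(b + s Z)] = sin(b) exp(-s^2 / 2) for Z ~ N(0, 1)."""
    L = math.exp(-t) * math.sin(x)
    expected_next = (math.exp(-(t + delta)) * math.sin(x + mu * delta)
                     * math.exp(-sig**2 * delta / 2))
    lhs = (expected_next - L) / delta
    # d_t L + mu d_x L + (sig^2 / 2) d_xx L at (t, x); here d_t L = d_xx L = -L
    rhs = -L + mu * math.exp(-t) * math.cos(x) - sig**2 / 2 * L
    return lhs, rhs


for delta in (1e-1, 1e-2, 1e-3, 1e-4):
    lhs, rhs = ito_check(t=0.0, x=1.0, mu=0.5, sig=1.0, delta=delta)
    print(f"delta={delta:g}  lhs={lhs:.6f}  rhs={rhs:.6f}  gap={abs(lhs - rhs):.2e}")
```

The gap between the two columns shrinks roughly in proportion to $\delta$. This is consistent with the neglected terms in the increment being of order $\delta^2$.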

Ex 5 [Continued] Argue that $L=L_t(x)$ satisfies the equation

$0=\min_{a\in\mathcal{A}} \left\{ c_t(x,a) + \partial_t L + \mu_t(x,a) \partial_x L + \frac{\sigma_t(x,a)^2}{2} \partial_{xx} L -\alpha L \right\}$

i.e. the HJB equation as required.
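As a sanity check of Ex 5, consider a hypothetical time-homogeneous example that does not appear in the notes. Take $\mu(x,a)=a$, constant $\sigma$, cost $c(x,a)=x^2+a^2$, discount rate $\alpha>0$ and an infinite horizon, so that $\partial_t L=0$ and there is no terminal cost. This lies outside the finite-$T$ setting of Def 3. Trying $L(x)=px^2+q$, the minimising action is $a=-px$, and the equation holds for all $x$ when $p^2+\alpha p-1=0$ and $q=\sigma^2p/\alpha$. The sketch below minimises the bracket over a fine grid of actions and returns the residual, which should be close to zero.

```python
import math


def lq_hjb_residual(x, alpha, sig, a_grid):
    """Right-hand side of Ex 5 for the hypothetical example
    mu(x, a) = a, constant sigma = sig, c(x, a) = x^2 + a^2, discount alpha > 0,
    infinite horizon (d_t L = 0), with the candidate L(x) = p x^2 + q."""
    p = (-alpha + math.sqrt(alpha**2 + 4)) / 2     # positive root of p^2 + alpha p - 1 = 0
    q = sig**2 * p / alpha
    L, dL, d2L = p * x * x + q, 2 * p * x, 2 * p

    def bracket(a):
        # c + mu dL + (sig^2 / 2) d2L - alpha L, with d_t L = 0
        return x * x + a * a + a * dL + sig**2 / 2 * d2L - alpha * L

    return min(bracket(a) for a in a_grid)


a_grid = [-10 + 1e-3 * i for i in range(20001)]      # candidate actions in [-10, 10]
for x in (-1.0, 0.0, 0.5, 2.0):
    print(x, lq_hjb_residual(x, alpha=0.1, sig=0.7, a_grid=a_grid))
```

Because the bracket is quadratic in $a$, minimising over a grid with spacing $10^{-3}$ overshoots the exact minimum by at most $(10^{-3}/2)^2$. The printed residuals are therefore within that distance of zero.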

Answers

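Ans 1. Integrate the plant equation over $[t,t+\delta]$ and hold the coefficients at their time-$t$ values, as in the heuristic derivation of Itô's formula [see Ex3].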

Ans 2. This follows from the definition of the Riemann integral, since $(1-\alpha\delta)^{t/\delta} \rightarrow e^{-\alpha t}$ as $\delta\to 0$.

Ans 3. Ex 1 and Ex 2 turn the plant equation and objective of Def 3 into those of a discrete-time Markov decision process with time step $\delta$ and discount factor $1-\alpha\delta$. The required equation is the Bellman equation for that MDP.

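Ans 4. This is Itô's formula. Take a second-order Taylor expansion of $L$ in $t$ and $x$, use $(B_{t+\delta}-B_t)^2\approx\delta$, and drop terms of higher order in $\delta$.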

Ans 5. Take expectations in Ex 4 (the Brownian term has expectation zero) and substitute into Ex 3. Then subtract $L_t(x)$ from both sides, divide by $\delta$ and let $\delta\to 0$.
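Written out at the same heuristic level as the exercises, with $X_t=x$ and $L$ and its derivatives evaluated at $(t,x)$, the computation is as follows. Subtract $L_t(x)$ from both sides of Ex 3 and insert the expectation of Ex 4. Use $\mathbb{E}_{x,a}[L_{t+\delta}(X_{t+\delta})] = L_t(x) + O(\delta)$ in the discount term:

$\begin{aligned} 0 &= \min_{a\in\mathcal{A}} \left\{ c_t(x,a)\,\delta + (1-\alpha\delta)\,\mathbb{E}_{x,a}\!\left[L_{t+\delta}(X_{t+\delta})\right] - L_t(x) \right\}\\ &= \min_{a\in\mathcal{A}} \left\{ c_t(x,a)\,\delta + \mathbb{E}_{x,a}\!\left[L_{t+\delta}(X_{t+\delta}) - L_t(x)\right] - \alpha\delta\,\mathbb{E}_{x,a}\!\left[L_{t+\delta}(X_{t+\delta})\right] \right\}\\ &\approx \min_{a\in\mathcal{A}} \left\{ c_t(x,a) + \partial_t L + \mu_t(x,a)\,\partial_x L + \frac{\sigma_t(x,a)^2}{2}\,\partial_{xx} L - \alpha L \right\}\delta + O(\delta^2). \end{aligned}$

Dividing by $\delta$ and letting $\delta\to0$ gives the equation of Ex 5.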

Here $[\sigma\sigma^\top]\cdot \partial_{xx} L_t(x)$ is the dot product of the Hessian matrix $\partial_{xx} L_t(x)$ with $\sigma\sigma^\top$. That is, we multiply component-wise and sum up the terms: $\sum_{i,j=1}^{n} [\sigma\sigma^\top]_{ij}\,\partial_{x_i x_j} L_t(x)$.
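A minimal sketch of this dot product follows. It includes the factor $\tfrac12$ that makes it agree with the one-dimensional term $\frac{\sigma^2}{2}\partial_{xx}L$ of Ex 5. It assumes $\sigma_t(x,a)$ is given as an $n\times m$ nested list and the Hessian as an $n\times n$ nested list; `second_order_term` is a hypothetical helper name. Since $\sigma$ is $n\times m$, the matrix that pairs entry-by-entry with the $n\times n$ Hessian is $\sigma\sigma^\top$. The sum also equals $\tfrac12\operatorname{tr}(\sigma^\top\,\partial_{xx}L\,\sigma)$.

```python
def second_order_term(sigma, hess):
    """Return (1/2) * sum_{i,j} [sigma sigma^T]_{ij} * hess[i][j].

    sigma : n x m nested list (volatility matrix sigma_t(x, a))
    hess  : n x n nested list (Hessian of L_t at x)
    """
    n, m = len(sigma), len(sigma[0])
    # [sigma sigma^T]_{ij} = sum_k sigma_{ik} sigma_{jk}, an n x n matrix
    ssT = [[sum(sigma[i][k] * sigma[j][k] for k in range(m)) for j in range(n)]
           for i in range(n)]
    # component-wise product with the Hessian, summed over all entries
    return 0.5 * sum(ssT[i][j] * hess[i][j] for i in range(n) for j in range(n))


# n = 2, m = 1 with L(x) = x_1^2 + x_2^2, whose Hessian is 2I.
print(second_order_term([[1.0], [2.0]], [[2.0, 0.0], [0.0, 2.0]]))  # 5.0
# One dimension: reduces to sigma^2 / 2 * L''.
print(second_order_term([[3.0]], [[2.0]]))  # 9.0
```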