Diffusion Control Problems

We consider a continuous time analogue of Markov Decision Processes.

Time is continuous t\in\mathbb{R}_+; X_t\in \mathbb{R}^n is the state at time t; a_t\in \mathcal{A} is the action at time t.

Def [Plant Equation] Given functions \mu_t(X_t,a_t)=(\mu^i_t(X_t,a_t): i=1,...,n) and \sigma_t(X_t,a_t)=(\sigma^{ij}_t(X_t,a_t): i=1,...,n,\ j=1,...,m), the state evolves according to the stochastic differential equation

dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t
where B_t is an m-dimensional Brownian motion. This is called the Plant Equation.
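
To make the Plant Equation concrete, here is a minimal simulation sketch using the Euler–Maruyama discretisation of the SDE above. The one-dimensional coefficients and the constant policy at the end are illustrative assumptions, not part of the definition.

import numpy as np

def simulate_path(x0, policy, mu, sigma, T=1.0, delta=1e-3, rng=None):
    # Euler-Maruyama discretisation of dX_t = mu_t(X_t, a_t) dt + sigma_t(X_t, a_t) dB_t
    rng = rng or np.random.default_rng(0)
    n_steps = int(T / delta)
    X = np.empty(n_steps + 1)
    X[0] = x0
    for k in range(n_steps):
        t = k * delta
        a = policy(t, X[k])                      # action chosen from the current time and state
        dB = rng.normal(0.0, np.sqrt(delta))     # Brownian increment, distributed N(0, delta)
        X[k + 1] = X[k] + mu(t, X[k], a) * delta + sigma(t, X[k], a) * dB
    return X

# Illustrative (assumed) coefficients: mean-reverting drift towards the action, constant volatility.
path = simulate_path(x0=1.0, policy=lambda t, x: 0.0,
                     mu=lambda t, x, a: a - x, sigma=lambda t, x, a: 0.2)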

A policy \Pi chooses an action \Pi_t at each time t. (We assume that \Pi_t is adapted and previsible.) Let \mathcal{P} be the set of policies. The (instantaneous) cost for taking action a in state x at time t is c_t(a,x), and c_T(x) is the cost for terminating in state x at time T.

Def [Diffusion Control Problem] Given initial state x_0, a diffusion control problem is the optimization

\text{minimize over } \Pi\in\mathcal{P}: \quad C(x_0,\Pi) := \mathbb{E}\Big[ \int_0^T e^{-\alpha t}\, c_t(a_t, X_t)\,dt + e^{-\alpha T} c_T(X_T) \Big]

subject to the Plant Equation, where \alpha \geq 0 is a discount rate, and we write L(x_0) := \min_{\Pi\in\mathcal{P}} C(x_0,\Pi) for the optimal value. Further, let C_\tau(x,\Pi) (resp. L_\tau(x)) be the objective (resp. optimal objective) when the integral is started from time t=\tau with X_\tau=x, rather than t=0 with X_0=x.
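
As a sanity check on this definition, the objective C(x_0,\Pi) for a fixed policy can be estimated by Monte Carlo: simulate many discretised paths and average the discounted running cost plus the discounted terminal cost. The coefficients and quadratic costs below are again illustrative assumptions.

import numpy as np

def estimate_cost(x0, policy, mu, sigma, c, c_T, alpha=0.1, T=1.0, delta=1e-3,
                  n_paths=1000, rng=None):
    # Monte Carlo estimate of C(x0, Pi) = E[ int_0^T e^{-alpha t} c_t(a_t, X_t) dt + e^{-alpha T} c_T(X_T) ]
    rng = rng or np.random.default_rng(1)
    n_steps = int(T / delta)
    total = 0.0
    for _ in range(n_paths):
        x, cost = x0, 0.0
        for k in range(n_steps):
            t = k * delta
            a = policy(t, x)
            cost += np.exp(-alpha * t) * c(t, a, x) * delta    # discounted running cost
            x += mu(t, x, a) * delta + sigma(t, x, a) * rng.normal(0.0, np.sqrt(delta))
        cost += np.exp(-alpha * T) * c_T(x)                    # discounted terminal cost
        total += cost
    return total / n_paths

# Illustrative (assumed) coefficients and quadratic costs.
C_hat = estimate_cost(x0=1.0, policy=lambda t, x: 0.0,
                      mu=lambda t, x, a: a - x, sigma=lambda t, x, a: 0.2,
                      c=lambda t, a, x: x**2 + a**2, c_T=lambda x: x**2)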

Def [Hamilton-Jacobi-Bellman Equation] For a Diffusion Control Problem, the equation

0 = \partial_t L_t(x) + \min_{a\in\mathcal{A}} \Big\{ e^{-\alpha t} c_t(a,x) + \mu_t(x,a)\cdot \partial_{x} L_t(x) + \frac{1}{2}\big[\sigma_t(x,a)\,\sigma_t(x,a)^\top\big]\cdot \partial_{xx} L_t(x) \Big\}, \qquad L_T(x) = e^{-\alpha T} c_T(x),

is called the Hamilton-Jacobi-Bellman equation.1 It is the continuous time analogue of the Bellman equation [[DP:Bellman]].


Heuristic Derivation of the HJB equation

We heuristically develop a Bellman equation for stochastic differential equations using our knowledge of the Bellman equation for Markov decision processes and our heuristic derivation of the stochastic integral. The argument is analogous to that for (deterministic) continuous-time control.

Perhaps the main thing to remember is that (informally) the HJB equation is

0 = \min_{a\in\mathcal{A}} \Big\{ e^{-\alpha t} c_t(a,x)\,dt + \mathbb{E}\big[\, dL_t(X_t) \mid X_t = x,\ a_t = a \,\big] \Big\}.
Here Ito’s formula is applied to the optimal value function at time t, L_t(x). This is much easier to remember (assuming you know Ito’s formula).

We suppose (for simplicity) that X_t belongs to \mathbb{R} and is driven by a one-dimensional Brownian motion. The plant equation in Def [DCP:Plant] is approximated by

X_{t+\delta} \approx X_t + \mu_t(X_t,a_t)\,\delta + \sigma_t(X_t,a_t)\,(B_{t+\delta}-B_t)

for small \delta (recall that B_{t+\delta}-B_t \sim N(0,\delta)). Similarly the cost function in the Diffusion Control Problem can be approximated by

C(x_0,\Pi) \approx \mathbb{E}\Big[ \sum_{k=0}^{T/\delta - 1} (1-\alpha\delta)^{k}\, c_{k\delta}(a_{k\delta}, X_{k\delta})\,\delta + (1-\alpha\delta)^{T/\delta}\, c_T(X_T) \Big].

This follows from the definition of a Riemann integral and since (1-\alpha\delta)^{t/\delta} \rightarrow e^{-\alpha t}. The Bellman equation for this objective function and plant equation is

L_t(x) = \min_{a\in\mathcal{A}} \Big\{ e^{-\alpha t} c_t(a,x)\,\delta + \mathbb{E}\big[\, L_{t+\delta}(X_{t+\delta}) \mid X_t = x,\ a_t = a \,\big] \Big\}

or, equivalently,

0 = \min_{a\in\mathcal{A}} \Big\{ e^{-\alpha t} c_t(a,x)\,\delta + \mathbb{E}\big[\, L_{t+\delta}(X_{t+\delta}) - L_t(X_t) \mid X_t = x,\ a_t = a \,\big] \Big\}.
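
The discretised Bellman equation above can be solved numerically by backward recursion on a grid, which gives one crude way to approximate L_t(x). The sketch below assumes a one-dimensional state, a small finite action set and illustrative quadratic costs; the expectation over X_{t+\delta} is taken with a Gaussian increment (via Gauss–Hermite quadrature and linear interpolation), matching the approximate plant equation above.

import numpy as np

def solve_bellman(mu, sigma, c, c_T, alpha=0.1, T=1.0, delta=0.01,
                  x_grid=np.linspace(-3.0, 3.0, 121), actions=np.linspace(-1.0, 1.0, 11)):
    # Backward recursion for L_t(x) = min_a { e^{-alpha t} c_t(a,x) delta + E[ L_{t+delta}(X_{t+delta}) ] }
    n_steps = int(T / delta)
    z, w = np.polynomial.hermite_e.hermegauss(7)   # nodes/weights for E[f(Z)], Z ~ N(0,1)
    w = w / w.sum()
    L = np.exp(-alpha * T) * c_T(x_grid)           # terminal condition L_T(x) = e^{-alpha T} c_T(x)
    values = [L]
    for k in range(n_steps - 1, -1, -1):
        t, L_next = k * delta, values[-1]
        L_new = np.full_like(x_grid, np.inf)
        for a in actions:
            # approximate next state x + mu*delta + sigma*sqrt(delta)*Z at each quadrature node
            x_next = (x_grid[:, None] + mu(t, x_grid, a)[:, None] * delta
                      + sigma(t, x_grid, a)[:, None] * np.sqrt(delta) * z[None, :])
            EL = np.interp(x_next, x_grid, L_next) @ w        # E[ L_{t+delta}(X_{t+delta}) ]
            L_new = np.minimum(L_new, np.exp(-alpha * t) * c(t, a, x_grid) * delta + EL)
        values.append(L_new)
    return values[::-1]                            # values[k] approximates L_{k*delta} on x_grid

# Illustrative (assumed) coefficients and costs; sigma returns an array so broadcasting works.
L = solve_bellman(mu=lambda t, x, a: a - x, sigma=lambda t, x, a: 0.2 + 0.0 * x,
                  c=lambda t, a, x: x**2 + a**2, c_T=lambda x: x**2)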

Now, by Ito’s formula, L_{t+\delta}(X_{t+\delta}) can be approximated by

L_{t+\delta}(X_{t+\delta}) \approx L_t(X_t) + \partial_t L_t(X_t)\,\delta + \partial_x L_t(X_t)\,(X_{t+\delta}-X_t) + \frac{1}{2}\,\partial_{xx} L_t(X_t)\,(X_{t+\delta}-X_t)^2.

Thus, taking expectations and using \mathbb{E}[B_{t+\delta}-B_t]=0 and \mathbb{E}[(B_{t+\delta}-B_t)^2]=\delta,

\mathbb{E}\big[\, L_{t+\delta}(X_{t+\delta}) - L_t(X_t) \mid X_t = x,\ a_t = a \,\big] \approx \Big( \partial_t L_t(x) + \mu_t(x,a)\,\partial_x L_t(x) + \frac{1}{2}\,\sigma_t(x,a)^2\,\partial_{xx} L_t(x) \Big)\delta.
Substituting this into the Bellman equation above, dividing by \delta and letting \delta \rightarrow 0, we get, as required,

0 = \partial_t L_t(x) + \min_{a\in\mathcal{A}} \Big\{ e^{-\alpha t} c_t(a,x) + \mu_t(x,a)\,\partial_x L_t(x) + \frac{1}{2}\,\sigma_t(x,a)^2\,\partial_{xx} L_t(x) \Big\}.

The following gives a rigorous proof that the HJB equation is the right object to consider for a diffusion control problem.

Thrm [Davis-Varaiya Martingale Principle of Optimality] Suppose that there exists a function L_t(x) with L_T(x) = e^{-\alpha T} c_T(x) and such that, for any policy \Pi with states X_t,

M_t := \int_0^t e^{-\alpha s} c_s(a_s, X_s)\,ds + L_t(X_t)

is a sub-martingale and, moreover, that for some policy \Pi^* , M_t is a martingale. Then \Pi^* is optimal and

L_0(X_0) = C(X_0, \Pi^*) = \min_{\Pi\in\mathcal{P}} C(X_0, \Pi).
Proof. Since M_t is a sub-martingale for every policy \Pi, we have

L_0(X_0) = M_0 \leq \mathbb{E}\big[ M_T \big] = \mathbb{E}\Big[ \int_0^T e^{-\alpha s} c_s(a_s, X_s)\,ds + e^{-\alpha T} c_T(X_T) \Big] = C(X_0,\Pi).
Therefore L_0(X_0) \leq C(X_0,\Pi) for all policies \Pi.

If M_t is a martingale for the policy \Pi^*, then by the same argument, with equality in place of inequality, L_0(X_0) = C(X_0,\Pi^*). Thus

C(X_0, \Pi^*) = L_0(X_0) \leq C(X_0, \Pi)

for all policies \Pi, and so \Pi^* is optimal and it holds that

L_0(X_0) = C(X_0, \Pi^*) = \min_{\Pi\in\mathcal{P}} C(X_0, \Pi).
\square
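
To connect the theorem back to the HJB equation, here is an informal sketch (in the one-dimensional setting of the heuristic derivation, and assuming L is smooth) of why a solution of the HJB equation supplies the sub-martingale that the theorem asks for. Applying Ito’s formula to L_t(X_t) under an arbitrary policy \Pi,

\begin{align*}
dM_t &= e^{-\alpha t} c_t(a_t, X_t)\,dt + dL_t(X_t)\\
&= \Big( e^{-\alpha t} c_t(a_t, X_t) + \partial_t L_t(X_t) + \mu_t(X_t,a_t)\,\partial_x L_t(X_t) + \tfrac{1}{2}\,\sigma_t(X_t,a_t)^2\,\partial_{xx} L_t(X_t) \Big)\,dt + \sigma_t(X_t,a_t)\,\partial_x L_t(X_t)\,dB_t .
\end{align*}

If L_t(x) solves the HJB equation, the drift term in brackets is non-negative for every action (and zero for the minimising action), so M_t is a sub-martingale under every policy and a martingale under a policy attaining the minimum in the HJB equation.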


  1. Here [\sigma \sigma^\top]\cdot \partial_{xx} L_t(x) is the dot-product of the Hessian matrix \partial_{xx} L_t(x) with the n\times n matrix \sigma\sigma^\top. I.e. we multiply component-wise and sum up terms.

 
