Diffusion Control Problems

  • The Hamilton-Jacobi-Bellman Equation.
  • Heuristic derivation of the HJB equation.
  • Davis-Varaiya Martingale Principle for Optimality


We consider a continuous time diffusion analogue of Markov Decision Processes.

Definitions

Time is continuous t\in\mathbb{R}_+; X_t\in \mathbb{R}^n is the state at time t; a_t\in \mathcal{A} is the action at time t.

Def 1 [Plant Equation] Given functions \mu_t(X_t,a_t)=(\mu^i_t(X_t,a_t): i=1,...,n) and \sigma_t(X_t,a_t)=(\sigma^{ij}_t(X_t,a_t): i=1,...,n,\ j=1,...,m), the state evolves according to the stochastic differential equation

dX_t = \mu_t(X_t,a_t)\,dt + \sigma_t(X_t,a_t)\,dB_t,

where B_t is an m-dimensional Brownian motion. This is called the Plant Equation. It determines how our diffusion process evolves as a function of the control actions.
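
Below is a minimal simulation sketch (not from the notes) of the plant equation, using the Euler-Maruyama scheme X_{t+\delta} \approx X_t + \mu_t\,\delta + \sigma_t\,(B_{t+\delta}-B_t). The drift mu, volatility sigma and feedback policy below are hypothetical placeholders chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mu(t, x, a):
    return a - x        # hypothetical drift: mean reversion plus control

def sigma(t, x, a):
    return 0.5          # hypothetical constant volatility

def policy(t, x):
    return -x           # hypothetical feedback policy: a_t = -X_t

T, n_steps = 1.0, 1000
delta = T / n_steps     # time step
x = 1.0                 # initial state x_0
for k in range(n_steps):
    t = k * delta
    a = policy(t, x)
    dB = rng.normal(scale=np.sqrt(delta))   # Brownian increment B_{t+delta} - B_t
    x += mu(t, x, a) * delta + sigma(t, x, a) * dB

print("X_T =", x)
```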

Def 2 A policy \pi chooses an action \pi_t at each time t. (We assume that \pi_t is adapted and previsible.) Let \mathcal{P} be the set of policies. The (instantaneous) cost of taking action a in state x at time t is c_t(x,a), and c_T(x) is the cost for terminating in state x at time T.

Def 3 [Diffusion Control Problem] Given initial state x_0, a diffusion control problem is the optimization

L_0(x_0) = \min_{\pi \in \mathcal{P}} C_0(x_0,\pi), \qquad \text{where}\quad C_0(x_0,\pi) = \mathbb{E}\Big[ \int_0^T c_t(X_t,\pi_t)\,dt + c_T(X_T) \Big]

and X_t follows the plant equation under policy \pi.

Further, let C_\tau(x,\pi) (resp. L_\tau(x)) be the objective (resp. optimal objective) when the integral is started from time t=\tau with X_\tau=x, rather than from t=0 with X_0=x.
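
The following sketch (not from the notes) estimates the objective C_0(x_0,\pi) by Monte Carlo, averaging the discretised cost over Euler-Maruyama paths. The policy a = -x, running cost x^2 + a^2, terminal cost x^2, drift \mu = a and volatility \sigma = 0.5 are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_steps, n_paths = 1.0, 200, 2000
delta = T / n_steps

def path_cost(x0):
    """Simulate one path; return the discretised cost int_0^T c_t dt + c_T(X_T)."""
    x, total = x0, 0.0
    for _ in range(n_steps):
        a = -x                              # hypothetical feedback policy
        total += (x**2 + a**2) * delta      # running cost c_t(x, a) = x^2 + a^2
        dB = rng.normal(scale=np.sqrt(delta))
        x += a * delta + 0.5 * dB           # plant equation with mu = a, sigma = 0.5
    return total + x**2                     # terminal cost c_T(x) = x^2

estimate = np.mean([path_cost(1.0) for _ in range(n_paths)])
print("C_0(x_0, pi) estimate:", estimate)
```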

Def 4 [Hamilton-Jacobi-Bellman Equation] For a diffusion control problem, the equation

0 = \partial_t L_t(x) + \min_{a \in \mathcal{A}} \Big\{ c_t(x,a) + \mu_t(x,a) \cdot \partial_x L_t(x) + \frac{1}{2}\,[\sigma_t \sigma_t^\top](x,a) \cdot \partial_{xx} L_t(x) \Big\}, \qquad L_T(x) = c_T(x),

is called the Hamilton-Jacobi-Bellman equation.¹ It is the continuous-time analogue of the Bellman equation for Markov decision processes.
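
To see the HJB equation in action, here is a short worked example (not from the notes): the one-dimensional linear-quadratic problem with plant equation dX_t = a_t\,dt + \sigma\,dB_t, running cost c_t(x,a) = x^2 + a^2 and terminal cost c_T(x) = x^2. The HJB equation reads

0 = \partial_t L_t(x) + \min_{a} \big\{ x^2 + a^2 + a\,\partial_x L_t(x) + \tfrac{\sigma^2}{2}\,\partial_{xx} L_t(x) \big\}, \qquad L_T(x) = x^2.

The minimum is attained at a^* = -\partial_x L_t(x)/2. Trying the ansatz L_t(x) = p_t x^2 + q_t gives

\dot{p}_t = p_t^2 - 1,\ p_T = 1 \ \Rightarrow\ p_t \equiv 1, \qquad \dot{q}_t = -\sigma^2 p_t,\ q_T = 0 \ \Rightarrow\ q_t = \sigma^2 (T-t),

so L_t(x) = x^2 + \sigma^2 (T-t) and the optimal control is the feedback a^*_t = -X_t. (For the illustrative parameters of the Monte Carlo sketch above, this predicts L_0(x_0) = x_0^2 + \sigma^2 T = 1.25, a useful sanity check on the simulation.)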


Heuristically deriving the HJB equation

We heuristically develop a Bellman equation for stochastic differential equations using our knowledge of the Bellman equation for Markov decision processes. The following exercises, which build on the heuristic derivation of Itô's formula, together give a heuristic derivation of the HJB equation.

Ex 1 [Heuristic Derivation of the HJB equation] We suppose (for simplicity) that X_t belongs to \mathbb{R} and is driven by a one-dimensional Brownian motion. Argue that, over a small time step \delta > 0, the plant equation in Def 1 is approximated by

X_{t+\delta} - X_t \approx \mu_t(X_t,a_t)\,\delta + \sigma_t(X_t,a_t)\,(B_{t+\delta} - B_t). \qquad [1]

Ex 2 [Continued] Argue that, for \delta small and positive, the cost function in Def 3 can be approximated by

C_0(x_0,\pi) \approx \mathbb{E}\Big[ \sum_{k=0}^{T/\delta - 1} c_{k\delta}(X_{k\delta}, \pi_{k\delta})\,\delta + c_T(X_T) \Big]. \qquad [2]

Ex 3 [Continued] Argue that the optimal value function L_t(x) approximately satisfies

L_t(x) \approx \min_{a \in \mathcal{A}} \Big\{ c_t(x,a)\,\delta + \mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) \mid X_t = x,\ a_t = a \big] \Big\}. \qquad [3]

Ex 4 [Continued] Argue that L = L_t(X_t) can be approximated as follows:

L_{t+\delta}(X_{t+\delta}) \approx L_t(X_t) + \Big[ \partial_t L_t(X_t) + \mu_t\,\partial_x L_t(X_t) + \frac{\sigma_t^2}{2}\,\partial_{xx} L_t(X_t) \Big]\delta + \sigma_t\,\partial_x L_t(X_t)\,(B_{t+\delta} - B_t), \qquad [4]

where \mu_t = \mu_t(X_t,a_t) and \sigma_t = \sigma_t(X_t,a_t).

Ex 5 [Continued] Argue that L = L_t(x) satisfies the equation

0 = \partial_t L_t(x) + \min_{a \in \mathcal{A}} \Big\{ c_t(x,a) + \mu_t(x,a)\,\partial_x L_t(x) + \frac{\sigma_t(x,a)^2}{2}\,\partial_{xx} L_t(x) \Big\}, \qquad [5]

i.e. the HJB equation, as required.


Answers

Ans 1. Apply the heuristic version of Itô's formula [see Ex 3].

Ans 2. Follows from the definition of the Riemann integral (and, if costs are discounted, since (1-\alpha\delta)^{t/\delta} \rightarrow e^{-\alpha t}).
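
In symbols: the sum in [2] is a Riemann sum,

\sum_{k=0}^{T/\delta - 1} c_{k\delta}(X_{k\delta}, \pi_{k\delta})\,\delta \longrightarrow \int_0^T c_t(X_t,\pi_t)\,dt \quad \text{as } \delta \downarrow 0,

while a per-step discount factor of (1-\alpha\delta) would compound to (1-\alpha\delta)^{t/\delta} = e^{(t/\delta)\log(1-\alpha\delta)} \rightarrow e^{-\alpha t}, recovering the continuous-time discount e^{-\alpha t}.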

Ans 3. [1] and [2] define the plant equation and objective for a Markov decision process (Def 3). The required equation is the Bellman equation for that MDP.

Ans 4. This is Itô’s formula.
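
In slightly more detail: Taylor-expanding to second order,

L_{t+\delta}(X_{t+\delta}) - L_t(X_t) \approx \partial_t L_t(X_t)\,\delta + \partial_x L_t(X_t)\,\Delta X + \tfrac{1}{2}\,\partial_{xx} L_t(X_t)\,(\Delta X)^2,

where, by [1], \Delta X \approx \mu_t\,\delta + \sigma_t\,(B_{t+\delta} - B_t). Keeping only terms of order \delta and applying the heuristic rule (B_{t+\delta} - B_t)^2 \approx \delta gives (\Delta X)^2 \approx \sigma_t^2\,\delta, which yields [4].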

Ans 5. Take expectations in [4] (the Brownian term has expectation zero), substitute into [3], and divide by \delta.
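
Explicitly: taking expectations in [4] given X_t = x and a_t = a,

\mathbb{E}\big[ L_{t+\delta}(X_{t+\delta}) \mid X_t = x,\ a_t = a \big] \approx L_t(x) + \Big[ \partial_t L_t(x) + \mu_t(x,a)\,\partial_x L_t(x) + \frac{\sigma_t(x,a)^2}{2}\,\partial_{xx} L_t(x) \Big]\delta.

Substituting this into [3], the L_t(x) terms cancel, and dividing by \delta gives [5], the HJB equation.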


  1. Here [\sigma\sigma^\top]\cdot \partial_{xx} L_t(x) is the dot-product of the n \times n matrix \sigma\sigma^\top with the Hessian matrix \partial_{xx} L_t(x); i.e. we multiply component-wise and sum up terms, which equals \mathrm{tr}\big(\sigma\sigma^\top\, \partial_{xx} L_t(x)\big).
