Continuous Time Dynamic Programs

  •  Continuous-time dynamic programs;
  •  The HJB equation: a heuristic derivation and a proof of optimality.


Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.

Time is continuous, t\in\mathbb{R}_+; x_t\in \mathcal{X} is the state at time t; and a_t\in \mathcal{A} is the action at time t.

Def 1 [Plant Equation] Given a function f: \mathbb{R}_+\times\mathcal{X}\times \mathcal{A} \rightarrow \mathcal{X}, the state evolves according to the differential equation

\frac{dx_t}{dt} = f(t, x_t, a_t).

This is called the Plant Equation.
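For intuition, here is a minimal numerical sketch (not from the original notes): it simulates a hypothetical plant equation \dot{x}_t = a_t - x_t under a fixed policy with Euler steps x_{t+\delta} = x_t + f(t,x_t,a_t)\delta, exactly the approximation used in Ex 1 below. The dynamics f, the policy, and the step size \delta are all assumptions made for this example.

```python
import numpy as np

def f(t, x, a):
    # Hypothetical plant dynamics: the action a pushes the state,
    # which otherwise decays towards zero.
    return a - x

def simulate(x0, policy, T=5.0, delta=0.01):
    """Euler discretisation: x_{t+delta} = x_t + delta * f(t, x_t, a_t)."""
    ts = np.arange(0.0, T, delta)
    xs = np.empty(len(ts) + 1)
    xs[0] = x0
    for i, t in enumerate(ts):
        a = policy(t, xs[i])
        xs[i + 1] = xs[i] + delta * f(t, xs[i], a)
    return xs

xs = simulate(x0=0.0, policy=lambda t, x: 1.0)
print(xs[-1])  # approaches the equilibrium x = 1 of dx/dt = 1 - x
```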

Def 2 [Policy] A policy \pi chooses an action \pi_t at each time t. The (instantaneous) reward for taking action a in state x at time t is r_t(a,x), and r_T(x) is the reward for terminating in state x at time T.

Def 3 [Continuous Dynamic Program] Given initial state x_0, a dynamic program is the optimization

L_0(x_0) = \max_{\pi} C_0({\bf a}), \qquad\text{where}\qquad C_0({\bf a}) = \int_0^T e^{-\alpha t}\, r_t(a_t,x_t)\, dt + e^{-\alpha T} r_T(x_T),

subject to the plant equation, and where \alpha \geq 0 is the discount rate.

Further, let C_\tau({\bf a}) (resp. L_\tau(x_\tau)) be the objective (resp. optimal objective) when the integral is started from t=\tau rather than t=0.

When the minimization problem, in which we minimize the loss given the costs incurred, is replaced with a maximization problem, in which we maximize the winnings given the rewards received, the functions L, C and c are replaced with the notation W, R and r.
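To make the objective concrete, here is a small sketch (an illustration, not part of the original notes) that approximates C_0({\bf a}) by a Riemann sum along an Euler-simulated trajectory; the dynamics f, reward r, discount rate \alpha, horizon T and policy are all hypothetical choices.

```python
import numpy as np

# Hypothetical model, chosen only for illustration.
alpha, T, delta = 0.1, 5.0, 0.001
f = lambda t, x, a: a - x             # plant dynamics
r = lambda t, a, x: -(x - 1.0) ** 2   # running reward r_t(a, x)
r_T = lambda x: 0.0                   # terminal reward
policy = lambda t, x: 1.0             # a fixed policy

x, C = 0.0, 0.0
for t in np.arange(0.0, T, delta):
    a = policy(t, x)
    C += np.exp(-alpha * t) * r(t, a, x) * delta  # Riemann sum for the integral
    x += f(t, x, a) * delta                       # Euler step of the plant equation
C += np.exp(-alpha * T) * r_T(x)
print(C)  # approximate discounted objective C_0
```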

Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

\alpha L_t(x) = \max_{a\in\mathcal{A}} \Big[ r_t(a,x) + \frac{\partial L_t(x)}{\partial t} + f(t,x,a)\,\frac{\partial L_t(x)}{\partial x} \Big],

with boundary condition L_T(x) = r_T(x), is called the Hamilton-Jacobi-Bellman equation. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].
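To see the HJB equation in action, here is a numerical sketch (not from the original notes) that solves the discrete-time Bellman recursion of Ex 3 below by backward induction on a state grid; as \delta\rightarrow 0 its solution approximates the HJB value function L_t(x). The dynamics, rewards, grids and parameters are all hypothetical choices.

```python
import numpy as np

# Backward induction on the discretised Bellman equation
#   L_t(x) = max_a [ r_t(a,x)*delta + (1 - alpha*delta) * L_{t+delta}(x + f(t,x,a)*delta) ]
alpha, T, delta = 0.1, 1.0, 0.01
xs = np.linspace(-2.0, 2.0, 201)        # state grid
acts = np.linspace(-1.0, 1.0, 21)       # action grid

f = lambda t, x, a: a - x               # hypothetical plant dynamics
r = lambda t, a, x: -(x ** 2 + a ** 2)  # hypothetical running reward
r_T = lambda x: -x ** 2                 # terminal reward

L = r_T(xs)                             # boundary condition L_T(x) = r_T(x)
for t in np.arange(T - delta, -delta / 2, -delta):
    # One-step lookahead for each action, interpolating L_{t+delta} off-grid.
    Q = [r(t, a, xs) * delta
         + (1 - alpha * delta) * np.interp(xs + f(t, xs, a) * delta, xs, L)
         for a in acts]
    L = np.max(Q, axis=0)

print(L[np.searchsorted(xs, 0.0)])      # approximate optimal value L_0(0)
```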

Ex 1 [A heuristic derivation of the HJB equation] Argue that, for \delta>0 small, x satisfying the recursion

x_{t+\delta} = x_t + f(t, x_t, a_t)\,\delta

is a good approximation to the plant equation. (A heuristic argument will suffice.)

Ex 2 [Continued] Argue (heuristically) that the following is a good approximation to the objective of a continuous-time dynamic program:

C_0({\bf a}) \approx \sum_{t=0,\delta,2\delta,\ldots,T-\delta} (1-\alpha\delta)^{t/\delta}\, r_t(a_t,x_t)\,\delta + (1-\alpha\delta)^{T/\delta}\, r_T(x_T).

Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the above objective and plant equation is

L_t(x) = \max_{a\in\mathcal{A}} \Big[ r_t(a,x)\,\delta + (1-\alpha\delta)\, L_{t+\delta}\big(x + f(t,x,a)\,\delta\big) \Big].

Ex 4 [Continued] Argue, by letting \delta approach zero, that the above Bellman equation approaches the equation

\alpha L_t(x) = \max_{a\in\mathcal{A}} \Big[ r_t(a,x) + \frac{\partial L_t(x)}{\partial t} + f(t,x,a)\,\frac{\partial L_t(x)}{\partial x} \Big],

i.e. the HJB equation of Def 4.


Ex 5 [Optimality of HJB] Suppose that a policy \Pi has a value function C_t(x,\Pi) that satisfies the HJB equation for all t and x. Show that \Pi is an optimal policy.

(Hint: consider e^{-\alpha t}C_t(\tilde{x}_t,\Pi), where \tilde{x} are the states under another policy \tilde{\Pi}.)


 

Answers

Ans 1 Obvious from the definition of the derivative: for \delta small,

x_{t+\delta} \approx x_t + \delta\,\frac{dx_t}{dt} = x_t + f(t,x_t,a_t)\,\delta.

Ans 2 Obvious from the definition of the (Riemann) integral, and since (1-\alpha \delta)^{t/\delta}\rightarrow e^{-\alpha t} as \delta\rightarrow 0.
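A quick numerical check of this limit (illustration only):

```python
import numpy as np

t, alpha = 1.0, 0.1
for delta in [0.1, 0.01, 0.001]:
    print(delta, (1 - alpha * delta) ** (t / delta))
print("limit", np.exp(-alpha * t))  # e^{-alpha t} ≈ 0.9048
```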

Ans 3 Immediate from the discrete-time Bellman equation.

Ans 4 Subtract L_t(x) from each side of the Bellman equation in Ex 3, divide by \delta, and let \delta\rightarrow 0. Further note that

\frac{L_{t+\delta}\big(x+f(t,x,a)\,\delta\big) - L_t(x)}{\delta} \longrightarrow \frac{\partial L_t(x)}{\partial t} + f(t,x,a)\,\frac{\partial L_t(x)}{\partial x}

as \delta\rightarrow 0, while the discount factor (1-\alpha\delta) contributes the term -\alpha L_t(x) in the limit.

Ans 5 Using the shorthand C=C_t(\tilde{x}_t,\Pi), where \tilde{x} is the state process under any other policy \tilde{\Pi},

\frac{d}{dt}\Big[e^{-\alpha t}C_t(\tilde{x}_t,\Pi)\Big] = e^{-\alpha t}\Big[-\alpha C + \frac{\partial C}{\partial t} + f(t,\tilde{x}_t,\tilde{\pi}_t)\,\frac{\partial C}{\partial x}\Big] \leq -e^{-\alpha t}\, r_t(\tilde{\pi}_t,\tilde{x}_t).

The inequality holds because, by the HJB equation, r_t(\tilde{\pi}_t,\tilde{x}_t) + \frac{\partial C}{\partial t} + f(t,\tilde{x}_t,\tilde{\pi}_t)\frac{\partial C}{\partial x} \leq \alpha C: the left-hand side is the objective of the HJB maximization, which is not (in general) maximized by \tilde{\pi}_t. Integrating from 0 to T and using C_T(x,\Pi)=r_T(x) gives

C_0(x_0,\Pi) \geq \int_0^T e^{-\alpha t}\, r_t(\tilde{\pi}_t,\tilde{x}_t)\, dt + e^{-\alpha T} r_T(\tilde{x}_T),

so the value of \Pi is at least the objective attained by any other policy \tilde{\Pi}; hence \Pi is optimal.
