Continuous Time Dynamic Programs

  •  Continuous-time dynamic programs
  • The HJB equation; a heuristic derivation; and proof of optimality.


Discrete time Dynamic Programming was given in previously (see Dynamic Programming ). We now consider the continuous time analogue.

Time is continuous t\in\mathbb{R}_+; x_t\in \mathcal{X} is the state at time t; a_t\in \mathcal{A} is the action at time t;

Def 1 [Plant Equation] Given function f: \mathbb{R}_+\times\mathcal{X}\times \mathcal{A}_t \rightarrow \mathcal{X}, the state evolves according to a differential equation

This is called the Plant Equation.

Def 2 A policy \pi chooses an action \pi_t at each time t. The (instantaneous) reward for taking action a in state x at time t is r_t(a,x) and r_T(x) is the reward for terminating in state x at time T.

Def 3 [Continuous Dynamic Program] Given initial state x_0, a dynamic program is the optimization

Further, let C_\tau({\bf a}) (Resp. L_\tau(x_\tau)) be the objective (Resp. optimal objective) for when the summation is started from t=\tau, rather than t=0.

When a minimization problem where we minimize loss given the costs incurred is replaced with a maximization problem where we maximize winnings given the rewards received. The functions L, C and c are replaced with notation W, R and r.

Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program , the equation

is called the Hamilton-Jacobi-Bellman equation. It is the continuous time analogoue of the Bellman equation [[DP:Bellman]].

Ex 1 [A Heuristic derivation of the HJB equation] Argue that, for \delta>0 small, x satisfying the recursion

is a good approximation to the plant equation . (A heuristic argument will suffice)

Ex 2 [Continued] Argue (heuristically) that following is a good approximation for the objective of a continuous time dynamic program is

Ex 3 [Continued]Show that the Bellman equation for the discrete time dynamic program with objective and plant equation is

Ex 4 [Continued]Argue, by letting \delta approach zero, that the above Bellman equation approaches the equation


Ex 5 [Optimality of HJB] Suppose that a policy \Pi has a value function C_t(x,\Pi) that satisfies the HJB-equation for all t and x then, show that \Pi is an optimal policy.

(Hint: consider e^{-\alpha t}C_t(\tilde{x}_t,\Pi) where \tilde{x} are the states another policy \tilde{\Pi}.)


Linear Quadratic Regularization

Def 5. [LQ problem] We consider a dynamic program of the form 5aec10bae4bf64f3e74ea93d44c86d5d.png

Here x_t \in\mathbb{R}^n and a_t\in\mathbb{R}^m. A and B arematrices. Q and R symmetric positive definite matrices. This an Linear-Quadratic problem (LQ problem).

Ex 6. [LQ Regulatization] Show that the HJB equation for an LQ problem is

 

Ex 7. [Continued] We now “guess” that the solution to above HJB equation is of the form L_t(x)=x^\top \Lambda(t) x for some symmetric matrix \Lambda(t). Argue that the HJB equation is minimized by an action satisfying

and thus for the minimum in the HJB equation to be zero we require

 

Ex 8. [Continued] Show that for the HJB equation to be satisfied then it is sufficent that

 

Def 6. [Riccarti Equation] The differential equation is called the Riccarti equation.

Ex 9. [Continued] Argue that a solution to the Riccarti Equation is optimal for the LQ problem.


Answers

Ans 1 Obvious from definition of derivative.

Ans 2 Obvious from definition of (Riemann) Integral and since (1-\alpha \delta)^{t/\delta}\rightarrow e^{-\alpha t} as \delta\rightarrow 0.

Ans 3 Immediate from discrete time Bellman Equation.

Ans 4 Minus L_t(x) from each side in [3] divide by \delta and let \delta\rightarrow 0. Further note that

Ans 5 Using shorthand C=C_t(\tilde{x}_t,\Pi):

The inequality holds since the term in the square brackets is the objective of the HJB equation, which is not maximized by \tilde{\pi}_t.

Ans 6. Immediate from Def 4.

Ans 7. L_t(x) = x^\top \Lambda(x) x therefore

Substituting into the Bellman equation gives

Differentiating with respect to a gives the optimality condition

which implies

Finally substituting into the Bellman equation, above, gives the required expression

Ans 8. For the minimum in the HJB equation to be zero, we require

Thus for the term in square brackets to be zero is sufficent.

Ans 9. This is just applying [5].


 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s