- Continuous-time dynamic programs
- The HJB equation: a heuristic derivation and a proof of optimality.
Discrete-time dynamic programming was covered previously (see Dynamic Programming). We now consider the continuous-time analogue.
Time is continuous, $t \in [0, T]$; $x_t$ is the state at time $t$; $a_t$ is the action at time $t$.
Def 1 [Plant Equation] Given a function $f$, the state evolves according to a differential equation

$$\frac{dx_t}{dt} = f(x_t, a_t, t).$$

This is called the Plant Equation.
Def 2 A policy $\pi$ chooses an action $a_t$ at each time $t$. The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r(x, a, t)$, and $R(x, T)$ is the reward for terminating in state $x$ at time $T$.
Def 3 [Continuous Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

$$V(x_0) = \max_{\pi} \left\{ \int_0^T r(x_t, a_t, t)\, dt + R(x_T, T) \right\} \quad \text{subject to} \quad \frac{dx_t}{dt} = f(x_t, a_t, t).$$

Further, let $V_\pi(x, \tau)$ (resp. $V(x, \tau)$) be the objective (resp. optimal objective) when the integral is started from time $\tau$, rather than $0$.
When the maximization problem, where we maximize winnings given the rewards received, is replaced with a minimization problem, where we minimize the loss given the costs incurred, the functions $r$, $R$ and $V$ are replaced with the notation $c$, $C$ and $L$.
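As a minimal code sketch of Defs 1–3 (the helper names and the scalar example below are my own, not from the notes), one can simulate the plant equation with small Euler steps and approximate the integral objective by a Riemann sum:

```python
def evaluate_policy(f, r, R_term, pi, x0, T, delta=1e-3):
    """Approximate int_0^T r(x_t, a_t, t) dt + R(x_T, T) under policy pi."""
    x, total = x0, 0.0
    for k in range(round(T / delta)):
        t = k * delta
        a = pi(x, t)                 # action chosen by the policy
        total += r(x, a, t) * delta  # accumulate the instantaneous reward
        x = x + f(x, a, t) * delta   # Euler step of the plant equation
    return total + R_term(x, T)      # add the terminal reward

# Hypothetical scalar example: dx/dt = a, reward -(x^2 + a^2), no terminal reward.
print(evaluate_policy(
    f=lambda x, a, t: a,
    r=lambda x, a, t: -(x**2 + a**2),
    R_term=lambda x, T: 0.0,
    pi=lambda x, t: -x,              # a simple linear feedback policy
    x0=1.0, T=5.0,
))
```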
Def 4 [Hamilton-Jacobi-Bellman Equation] For a continuous-time dynamic program, the equation

$$0 = \frac{\partial V}{\partial t}(x, t) + \max_a \left\{ r(x, a, t) + f(x, a, t) \cdot \frac{\partial V}{\partial x}(x, t) \right\}, \qquad V(x, T) = R(x, T),$$

is called the Hamilton-Jacobi-Bellman equation. It is the continuous-time analogue of the Bellman equation [[DP:Bellman]].
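For instance (my own illustration, matching the scalar sketch above), for the plant $\frac{dx_t}{dt} = a_t$ with reward $r(x, a, t) = -(x^2 + a^2)$ and no terminal reward, the HJB equation reads

$$0 = \frac{\partial V}{\partial t} + \max_a \left\{ -(x^2 + a^2) + a\, \frac{\partial V}{\partial x} \right\},$$

and the inner maximization is attained at $a = \frac{1}{2} \frac{\partial V}{\partial x}$.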
Ex 1 [A Heuristic derivation of the HJB equation] Argue that, for $\delta > 0$ small, $x_{t+\delta}$ satisfying the recursion

$$x_{t+\delta} = x_t + \delta f(x_t, a_t, t)$$

is a good approximation to the plant equation. (A heuristic argument will suffice.)
Ex 2 [Continued] Argue (heuristically) that the following is a good approximation to the objective of a continuous-time dynamic program:

$$\sum_{t = 0, \delta, 2\delta, \ldots, T - \delta} \delta\, r(x_t, a_t, t) + R(x_T, T).$$
Ex 3 [Continued] Show that the Bellman equation for the discrete-time dynamic program with the objective of Ex 2 and the plant equation of Ex 1 is

$$V(x, t) = \max_a \left\{ \delta\, r(x, a, t) + V\big(x + \delta f(x, a, t),\, t + \delta\big) \right\}.$$
Ex 4 [Continued] Argue, by letting $\delta$ approach zero, that the above Bellman equation approaches the Hamilton-Jacobi-Bellman equation of Def 4.
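As a quick numerical check of the approximation in Ex 1 (my own example, not from the notes): for the plant $\frac{dx_t}{dt} = x_t$ with $x_0 = 1$, the Euler recursion should approach the exact value $x_1 = e$ as $\delta \rightarrow 0$.

```python
import math

# Check Ex 1 numerically: iterate x_{t+delta} = x_t + delta * x_t up to T = 1
# and compare with the exact solution e; the error shrinks roughly linearly.
for delta in [0.1, 0.01, 0.001]:
    x = 1.0
    for _ in range(round(1.0 / delta)):
        x += delta * x               # one Euler step of the plant equation
    print(f"delta={delta:g}  error={abs(x - math.e):.5f}")
```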
Ex 5 [Optimality of HJB] Suppose that a policy $\pi$ has a value function $V_\pi(x, t)$ that satisfies the HJB equation for all $x$ and $t \leq T$; then show that $\pi$ is an optimal policy.
(Hint: consider $\frac{d}{dt} V_\pi(\tilde{x}_t, t)$, where $\tilde{x}_t$ are the states under another policy $\tilde{\pi}$.)
Linear Quadratic Regulation
Def 5. [LQ Problem] We consider a dynamic program of the form

$$\min_{\pi} \left\{ \int_0^T \left( x_t^\top Q x_t + a_t^\top R a_t \right) dt + x_T^\top \Pi_T x_T \right\} \quad \text{subject to} \quad \frac{dx_t}{dt} = A x_t + B a_t.$$

Here $x_t \in \mathbb{R}^n$ and $a_t \in \mathbb{R}^m$. $A$ and $B$ are matrices. $Q$ and $R$ are symmetric positive definite matrices. This is a Linear-Quadratic problem (LQ problem).
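As a concrete instance of Def 5 (a standard example, the "double integrator"; the specific numbers here are my own choice):

```python
import numpy as np

# Hypothetical LQ instance: state = (position, velocity), scalar action = force.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])  # d(position)/dt = velocity; d(velocity)/dt = a
B = np.array([[0.0],
              [1.0]])       # the action enters through the velocity equation
Q = np.eye(2)               # state cost matrix, symmetric positive definite
R = np.array([[1.0]])       # action cost matrix, symmetric positive definite
Pi_T = np.eye(2)            # terminal cost: x_T' Pi_T x_T
```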
Ex 6. [LQ Regulation] Show that the HJB equation for an LQ problem is

$$0 = \frac{\partial L}{\partial t}(x, t) + \min_a \left\{ x^\top Q x + a^\top R a + (A x + B a)^\top \frac{\partial L}{\partial x}(x, t) \right\}, \qquad L(x, T) = x^\top \Pi_T x.$$
Ex 7. [Continued] We now "guess" that the solution to the above HJB equation is of the form $L(x, t) = x^\top \Pi_t x$ for some symmetric matrix $\Pi_t$. Argue that the HJB equation is minimized by an action satisfying

$$a = -R^{-1} B^\top \Pi_t x,$$

and thus for the minimum in the HJB equation to be zero we require

$$0 = x^\top \left( \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right) x.$$
Ex 8. [Continued] Show that for the HJB equation to be satisfied it is sufficient that

$$\frac{d\Pi_t}{dt} = \Pi_t B R^{-1} B^\top \Pi_t - A^\top \Pi_t - \Pi_t A - Q, \qquad \Pi_T \text{ given}.$$
Def 6. [Riccati Equation] The differential equation

$$\frac{d\Pi_t}{dt} = \Pi_t B R^{-1} B^\top \Pi_t - A^\top \Pi_t - \Pi_t A - Q$$

is called the Riccati equation.
Ex 9. [Continued] Argue that the policy given by a solution to the Riccati equation is optimal for the LQ problem.
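Putting Ex 7–9 together, here is a sketch (assuming SciPy is available, and reusing the hypothetical double-integrator matrices from above) that integrates the Riccati equation backwards in time from the terminal condition $\Pi_T$ and recovers the feedback gain in $a_t = -R^{-1} B^\top \Pi_t x_t$:

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])  # double integrator again
B = np.array([[0.0], [1.0]])
Q, R, Pi_T, T = np.eye(2), np.array([[1.0]]), np.eye(2), 5.0
R_inv = np.linalg.inv(R)

def riccati_rhs(t, p):
    """dPi/dt = Pi B R^-1 B' Pi - A' Pi - Pi A - Q, flattened for the solver."""
    Pi = p.reshape(2, 2)
    dPi = Pi @ B @ R_inv @ B.T @ Pi - A.T @ Pi - Pi @ A - Q
    return dPi.ravel()

# solve_ivp accepts a decreasing time span, so integrate from t = T down to 0.
sol = solve_ivp(riccati_rhs, (T, 0.0), Pi_T.ravel(), rtol=1e-8)
Pi_0 = sol.y[:, -1].reshape(2, 2)
K_0 = R_inv @ B.T @ Pi_0                # optimal action at time 0: a = -K_0 @ x
print(Pi_0, K_0, sep="\n")
```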
Answers
Ans 1 Obvious from the definition of the derivative.
Ans 2 Obvious from the definition of the (Riemann) integral, and since the discretized trajectory converges to the solution of the plant equation as $\delta \rightarrow 0$.
Ans 3 Immediate from the discrete-time Bellman equation.
Ans 4 Subtract $V(x, t)$ from each side in [3], divide by $\delta$ and let $\delta \rightarrow 0$. Further note that

$$V\big(x + \delta f(x, a, t),\, t + \delta\big) = V(x, t) + \delta\, f(x, a, t) \cdot \frac{\partial V}{\partial x}(x, t) + \delta\, \frac{\partial V}{\partial t}(x, t) + o(\delta).$$
Ans 5 Using the shorthand $V = V_\pi$, $\tilde{r}_t = r(\tilde{x}_t, \tilde{a}_t, t)$ and $\tilde{f}_t = f(\tilde{x}_t, \tilde{a}_t, t)$:

$$\frac{d}{dt} V(\tilde{x}_t, t) = \frac{\partial V}{\partial t} + \tilde{f}_t \cdot \frac{\partial V}{\partial x} = \frac{\partial V}{\partial t} + \left[ \tilde{r}_t + \tilde{f}_t \cdot \frac{\partial V}{\partial x} \right] - \tilde{r}_t \leq -\tilde{r}_t.$$

The inequality holds since the term in the square brackets is the objective of the maximization in the HJB equation, which is not necessarily maximized by $\tilde{a}_t$; hence $\frac{\partial V}{\partial t}$ plus the bracketed term is at most zero. Integrating over $[0, T]$ and using the boundary condition $V(x, T) = R(x, T)$ gives

$$V(\tilde{x}_0, 0) \geq \int_0^T \tilde{r}_t\, dt + R(\tilde{x}_T, T),$$

which is the objective under $\tilde{\pi}$. Equality holds when $\tilde{\pi} = \pi$, since $\pi$ attains the maximum in the HJB equation; hence $\pi$ is optimal.
Ans 6. Immediate from Def 4.
Ans 7. Since $L(x, t) = x^\top \Pi_t x$,

$$\frac{\partial L}{\partial x} = 2 \Pi_t x \quad \text{and} \quad \frac{\partial L}{\partial t} = x^\top \frac{d\Pi_t}{dt} x;$$

therefore, substituting into the Bellman equation gives

$$0 = x^\top \frac{d\Pi_t}{dt} x + \min_a \left\{ x^\top Q x + a^\top R a + (A x + B a)^\top 2 \Pi_t x \right\}.$$

Differentiating with respect to $a$ gives the optimality condition

$$2 R a + 2 B^\top \Pi_t x = 0,$$

which implies

$$a = -R^{-1} B^\top \Pi_t x.$$

Finally, substituting this action into the Bellman equation above gives the required expression

$$0 = x^\top \left( \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right) x.$$
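A quick numerical sanity check of this condition (my own illustration, not part of the notes): the candidate $a^\ast = -R^{-1} B^\top \Pi x$ should never be beaten by perturbations of itself in the $a$-dependent part of the HJB objective.

```python
import numpy as np

# Check Ans 7: a* = -R^{-1} B' Pi x should minimize
# h(a) = a' R a + (A x + B a)' (2 Pi x), the a-dependent part of the objective.
rng = np.random.default_rng(0)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))
R, Pi = np.array([[2.0]]), np.eye(2)
x = rng.normal(size=(2, 1))

def h(a):
    return float(a.T @ R @ a + (A @ x + B @ a).T @ (2 * Pi @ x))

a_star = -np.linalg.inv(R) @ B.T @ Pi @ x
assert all(h(a_star) <= h(a_star + rng.normal(size=(1, 1)))
           for _ in range(1000))
print("candidate minimizer a* =", a_star.ravel())
```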
Ans 8. For the minimum in the HJB equation to be zero, we require

$$x^\top \left[ \frac{d\Pi_t}{dt} + Q - \Pi_t B R^{-1} B^\top \Pi_t + A^\top \Pi_t + \Pi_t A \right] x = 0 \quad \text{for all } x.$$

Thus for the term in square brackets to be zero is sufficient.
Ans 9. This is just applying [5]: the loss function $L(x, t) = x^\top \Pi_t x$ then satisfies the HJB equation, so the corresponding policy $a_t = -R^{-1} B^\top \Pi_t x_t$ is optimal.