- Dynamic Programs; Bellman’s Equation; An example.
For this section, consider the following dynamic programming formulation:
Time is discrete, $t \in \{0, 1, \dots, T\}$; $x_t$ is the state at time $t$; $a_t$ is the action at time $t$.
Def 1 [Plant Equation][DP:Plant] The state evolves according to functions $f_t$. Here $x_{t+1} = f_t(x_t, a_t)$.
This is called the Plant Equation.
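To make the plant equation concrete, here is a small hypothetical instance (not from the notes): the state is a savings balance that earns 5% interest, and the action is a deposit added each year.

```python
# Hypothetical plant equation x_{t+1} = f_t(x_t, a_t):
# a balance earning 5% interest, with the action a_t a deposit.
def f(t, x, a):
    return 1.05 * x + a

x = 100.0                                  # initial state x_0
for t, a in enumerate([10.0, 0.0, 5.0]):   # an arbitrary fixed action sequence
    x = f(t, x, a)                         # state evolves via the plant equation
print(round(x, 2))                         # state x_3 after three steps
```

A policy would replace the fixed action sequence with a rule choosing $a_t$ from the current state $x_t$.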
A policy $\pi$ chooses an action $a_t = \pi_t(x_t)$ at each time $t$. The (instantaneous) reward for taking action $a_t$ in state $x_t$ at time $t$ is $r_t(x_t, a_t)$, and $r_T(x_T)$ is the reward for terminating in state $x_T$ at time $T$.
Def 2 [Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization $$W(x_0) := \max_\pi \Big\{ \sum_{t=0}^{T-1} r_t(x_t, a_t) + r_T(x_T) \Big\},$$ where the maximum is over policies $\pi$ and the states obey the plant equation.
Further, let $W_\tau(x_\tau)$ (resp. $V_\tau(x_\tau)$) be the objective (resp. optimal objective) when the summation is started from state $x_\tau$ at time $\tau$, rather than from $x_0$ at time $0$.
Ex 1 [Bellman’s Equation] $V_T(x_T) = r_T(x_T)$, and for $t = T-1, T-2, \dots, 0$, $$V_t(x_t) = \max_{a_t} \big\{ r_t(x_t, a_t) + V_{t+1}(x_{t+1}) \big\},$$ where $x_{t+1} = f_t(x_t, a_t)$ and the maximum is taken over the actions $a_t$ available in state $x_t$.
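Bellman's equation turns the optimization over whole policies into a backward recursion over single actions. A minimal sketch of backward induction on a toy finite DP (all of the dynamics and rewards below are hypothetical, chosen only to illustrate the recursion):

```python
# Backward induction for V_t(x) = max_a { r_t(x, a) + V_{t+1}(f_t(x, a)) },
# with V_T(x) = r_T(x), on a toy DP with small finite state/action sets.

T = 3
states = range(5)           # x in {0, ..., 4}
actions = range(2)          # a in {0, 1}

def f(t, x, a):             # plant equation (hypothetical)
    return min(x + a, 4)

def r(t, x, a):             # instantaneous reward (hypothetical)
    return x - a

def r_T(x):                 # terminal reward (hypothetical)
    return 2 * x

# V[t][x] holds V_t(x); pi[t][x] holds a maximising action at (t, x).
V = [[0.0] * len(states) for _ in range(T + 1)]
pi = [[0] * len(states) for _ in range(T)]
for x in states:
    V[T][x] = r_T(x)                        # boundary condition V_T = r_T
for t in reversed(range(T)):                # sweep t = T-1, ..., 0
    for x in states:
        best = max(actions, key=lambda a: r(t, x, a) + V[t + 1][f(t, x, a)])
        pi[t][x] = best
        V[t][x] = r(t, x, best) + V[t + 1][f(t, x, best)]

print(V[0][0])   # optimal objective from x_0 = 0, i.e. V_0(x_0)
```

Running the greedy choice forward from $x_0$ through `pi` recovers an optimal policy, and `V[0][x0]` equals the optimal objective $V_0(x_0)$ of Def 2.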
Ex 2 An investor has a fund: it holds $x_0$ pounds at time zero; money cannot be withdrawn; it pays interest at rate $\theta$ per year for $T$ years; each year the investor consumes a proportion $u_t \in [0, 1]$ of the interest and reinvests the rest.
What should the investor do to maximize total consumption over the $T$ years?
Ans 1 Let $x_t$ be the fund value at time $t$. The amount consumed in year $t$ is $u_t \theta x_t$, so the objective is to maximize $\sum_{t=0}^{T-1} u_t \theta x_t$.
Ans 2 The yearly fund value satisfies $x_{t+1} = x_t + (1 - u_t)\theta x_t = (1 + (1 - u_t)\theta) x_t$. By backward induction, if the optimal reward from time $t+1$ onward is $V_{t+1}(x) = \rho_{t+1} x$, then $$V_t(x) = \max_{u \in [0,1]} \big\{ u \theta x + \rho_{t+1}(1 + (1 - u)\theta) x \big\} = \rho_t x,$$ where $\rho_t = \max\{\theta + \rho_{t+1},\; (1 + \theta)\rho_{t+1}\}$.
Also, $V_T(x) = 0$, so $\rho_{T-1} = \theta$, i.e. in the last year consume everything. The maximand is linear in $u$, so $u_t = 1$ (consume all interest) is optimal exactly when $\rho_{t+1} \le 1$, and under this policy $\rho_{T-s} = s\theta$. Therefore the solution is to consume everything in the last $s^*$ years, where $s^*$ is the largest integer with $(s^* - 1)\theta \le 1$ (roughly the last $\theta^{-1}$ years), and to consume nothing before that: initially save everything, then consume all interest in the final $s^*$ years.
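The recursion for $\rho_t$ is easy to check numerically. A short sketch (the values $\theta = 0.5$, $T = 10$ are arbitrary, chosen only for illustration):

```python
# Numerical check of the investment answer: rho_T = 0 and
# rho_t = max(theta + rho_{t+1}, (1 + theta) * rho_{t+1}),
# with u_t = 1 (consume all interest) optimal iff rho_{t+1} <= 1.

theta, T = 0.5, 10                 # hypothetical interest rate and horizon
rho = [0.0] * (T + 1)              # rho[t] = V_t(x) / x; rho[T] = 0
consume = [False] * T              # is u_t = 1 optimal at time t?
for t in reversed(range(T)):
    consume[t] = rho[t + 1] <= 1   # maximand is linear in u: check endpoints
    rho[t] = max(theta + rho[t + 1], (1 + theta) * rho[t + 1])

print(consume)                     # save early, consume in the final years
```

With $\theta = 0.5$ the policy consumes only in the last three years, matching the threshold: three is the largest $s$ with $(s-1)\theta \le 1$. Total consumption from an initial fund $x_0$ is then $V_0(x_0) = \rho_0 x_0$.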