Dynamic Programs; Bellman's Equation; An example.
For this section, consider the following dynamic programming formulation:
- Time is discrete, $t = 0, 1, \dots, T$;
- $x_t$ is the state at time $t$;
- $a_t$ is the action at time $t$.
Def 1 [Plant Equation] The state evolves according to functions $f_t$. Here

$$x_{t+1} = f_t(x_t, a_t).$$

This is called the Plant Equation.
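To make the Plant Equation concrete, here is a minimal Python sketch (the linear dynamics below are a made-up illustration, not from these notes): a plant equation is just a function of $(t, x_t, a_t)$, and a trajectory is obtained by iterating it.

```python
def f(t, x, a):
    """Plant equation x_{t+1} = f_t(x_t, a_t); a toy linear rule for illustration."""
    return x + a

def rollout(x0, actions):
    """Evolve the state from x0 under a fixed sequence of actions a_0, a_1, ..."""
    xs = [x0]
    for t, a in enumerate(actions):
        xs.append(f(t, xs[-1], a))
    return xs

print(rollout(0.0, [1.0, -0.5, 2.0]))  # [0.0, 1.0, 0.5, 2.5]
```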
A policy $\pi$ chooses an action $a_t = \pi_t(x_t)$ at each time $t$. The (instantaneous) reward for taking action $a_t$ in state $x_t$ at time $t$ is $r_t(x_t, a_t)$, and $r_T(x_T)$ is the reward for terminating in state $x_T$ at time $T$.
Def 2 [Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

$$V(x_0) = \max_{\pi} \left\{ \sum_{t=0}^{T-1} r_t(x_t, a_t) + r_T(x_T) \right\}$$

over policies $\pi$, subject to the Plant Equation. Further, let $W_\tau(x_\tau)$ (resp. $V_\tau(x_\tau)$) be the objective (resp. optimal objective) when the summation is started from time $\tau$ in state $x_\tau$, rather than from time $0$ in state $x_0$.
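As a sketch of how the objective in Def 2 is evaluated for one fixed policy (the function names and signatures below are my own choices, not from the notes); maximizing this quantity over policies gives $V(x_0)$:

```python
def total_reward(x0, policy, f, r, r_T, T):
    """Objective of Def 2: accumulated rewards r_t(x_t, a_t) plus terminal r_T(x_T)."""
    x, total = x0, 0.0
    for t in range(T):
        a = policy(t, x)      # a_t = pi_t(x_t)
        total += r(t, x, a)   # instantaneous reward r_t(x_t, a_t)
        x = f(t, x, a)        # plant equation
    return total + r_T(x)     # terminal reward r_T(x_T)
```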
Ex 1 [Bellman's Equation] $V_T(x) = r_T(x)$ and, for $t = T-1, \dots, 0$,

$$V_t(x) = \max_{a} \left\{ r_t(x, a) + V_{t+1}(\hat{x}) \right\},$$

where $\hat{x} = f_t(x, a)$ and the maximum is taken over the actions $a$ available in state $x$ at time $t$.
Ex 2 An investor has a fund: it has $x_0$ pounds at time zero; money can't be withdrawn; it pays interest at rate $\theta$ per year for $T$ years; each year the investor consumes a proportion $u_t$ of the interest and reinvests the rest.

What should the investor do to maximize total consumption?
Answers
Ans 1 Let $\pi$ be any policy and write $a_t = \pi_t(x_t)$. The reward from time $t$ onward satisfies the recursion

$$W_t(x_t) = r_t(x_t, a_t) + W_{t+1}(x_{t+1}), \qquad x_{t+1} = f_t(x_t, a_t).$$

Maximizing the right-hand side over $a_t$ and over the policy from time $t+1$ onward gives

$$V_t(x_t) = \max_{a_t} \left\{ r_t(x_t, a_t) + V_{t+1}(f_t(x_t, a_t)) \right\},$$

and $V_T(x) = r_T(x)$ since no decisions remain at time $T$.
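Bellman's equation turns the optimization into a backward recursion in time. Below is a minimal sketch of backward induction over finite state and action sets, assuming the plant equation maps grid states back onto the grid (all names here are placeholders, not from the notes):

```python
def backward_induction(states, actions, f, r, r_T, T):
    """Solve for V_t(x) and an optimal policy via Bellman's equation.

    Assumes f(t, x, a) always returns an element of `states`.
    """
    V = {(T, x): r_T(x) for x in states}    # V_T(x) = r_T(x)
    pi = {}
    for t in range(T - 1, -1, -1):          # t = T-1, ..., 0
        for x in states:
            # V_t(x) = max_a { r_t(x, a) + V_{t+1}(f_t(x, a)) }
            best_a, best_v = None, float("-inf")
            for a in actions:
                v = r(t, x, a) + V[(t + 1, f(t, x, a))]
                if v > best_v:
                    best_a, best_v = a, v
            V[(t, x)], pi[(t, x)] = best_v, best_a
    return V, pi
```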
Ans 2 The yearly fund value satisfies $x_{t+1} = (1 + (1 - u_t)\theta)\, x_t$, and consumption in year $t$ is $u_t \theta x_t$. Backward induction: if the optimal reward from time $t+1$ onward is $V_{t+1}(x) = \rho_{t+1} x$, then

$$V_t(x) = \max_{u \in [0,1]} \left\{ u \theta x + \rho_{t+1} \big(1 + (1-u)\theta\big) x \right\} = \rho_t x,$$

where

$$\rho_t = \begin{cases} \rho_{t+1} + \theta, & \text{if } \rho_{t+1} \le 1 \quad (u_t = 1), \\ \rho_{t+1}(1 + \theta), & \text{if } \rho_{t+1} > 1 \quad (u_t = 0). \end{cases}$$

Also, $\rho_T = 0$, i.e. in the last year consume everything. Iterating backward, $\rho_{T-s} = s\theta$ so long as $(s-1)\theta \le 1$. Therefore the solution is to consume everything in the last $s^*$ years, where $s^*$ is the largest integer with $(s^* - 1)\theta \le 1$, and otherwise consume nothing, i.e. initially save everything and then consume everything in the last years.
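A quick numerical check of the $\rho_t$ recursion (the interest rate and horizon below are arbitrary choices of mine): the computed controls come out as a block of zeros followed by a block of ones, i.e. save first, then consume.

```python
theta, T = 0.1, 25          # hypothetical interest rate and horizon

rho = [0.0] * (T + 1)       # rho_T = 0: no reward remains after time T
u = [0] * T
for t in range(T - 1, -1, -1):
    if rho[t + 1] <= 1:     # consuming the interest is optimal: u_t = 1
        u[t], rho[t] = 1, rho[t + 1] + theta
    else:                   # reinvesting is optimal: u_t = 0
        u[t], rho[t] = 0, rho[t + 1] * (1 + theta)

print(u)  # fourteen 0s then eleven 1s: save early, consume in the last years
```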