Dynamic Programming

Dynamic Programs; Bellman’s Equation; An example.

For this section, consider the following dynamic programming formulation:

Time is discrete $t=0,1,...,T$ ; $x_t\in \mathcal{X}$ is the state at time $t$ ; $a_t\in \mathcal{A}_t$ is the action at time $t$ ;

Def 1 [Plant Equation][DP:Plant] The state evolves according to functions $f_t: \mathcal{X}\times \mathcal{A}_t \rightarrow \mathcal{X}$ . Here

$x_{t+1}= f(x_t, a_t).$

This is called the Plant Equation.

A policy $\pi$ choses an action $\pi_t$ at each time $t$ . The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r_t(a,x)$ and $r_T(x)$ is the reward for terminating in state $x$ at time $T$ .

Def 2 [Dynamic Program] Given initial state $x_0$ , a dynamic program is the optimization

Further, let $R_\tau({\bf a})$ (Resp. $W_\tau(x_\tau)$ ) be the objective (Resp. optimal objective) for when the summation is started from $t=\tau$ , rather than $t=0$ .

Ex 1 [Bellman’s Equation] $W_T(x) = r_T(x)$ and for $t=T-1,...,0$

$\tag{Bell eq} W_t(x_t) = \sup_{a_t\in\mathcal{A}_t}\left\{ r_t(x_t,a_t) + W_{t+1} (x_{t+1}) \right\} ,$

where $x_t\in\mathcal{X}$ and $x_{t+1}=f_t(x_t,a_t)$ .

Ex 2 An investor has a fund: it has $x$ pounds at time zero; money can’t be withdrawn; it pays $r\times 100\%$ interest per-year for $T$ years; the investor consumes proportion $a_t$ of the interest and reinvests the rest.

What should the investor do to maximize consumption?

Answers

Ans 1 Let ${\bf a}_t=(a_t,...,a_{T-1})$ .

$\begin{aligned} W_t(x_t) &= \sup_{{\bf a}_t} \left\{ R_t({\bf a}_t) \right\} = \sup_{a_t} \sup_{{\bf a}_{t+1}}\left\{ r_{t}(x_t,a_t) + R_{t+1}({\bf a}_t) \right\} \\ &= \sup_{a_t} \left\{ r_{t}(x_t,a_t) +\sup_{{\bf a}_{t+1}} R_{t+1}({\bf a}_t) \right\} =\sup_{a_t\in\mathcal{A}_t}\left\{ r_t(x_t,a_t) + W_{t+1} (x_{t+1}) \right\} . \end{aligned}$

Ans 2 The yearly fund value satisfies $x_{t+1}= x_t + rx_t (1-a_t)$ . Backward induction, if reward from time $t=T-s+1$ onward is $R_{T-s+1}(x)=\rho_{s-1} r x_{T-s+1}$ , then $R_{T-s}(x)=\rho_{s} r x_{T-s}$ where

$\rho_{s}= \left\{ \rho_{s-1}(1+r) \right\} \vee \left\{ 1+ \rho_{s-1} \right\}.$

Also, $\rho_1=1$ , i.e. last year consume everything. Therefore solution is consume everything last years while $\rho_s =s \leq (s-1)(1+r)$ otherwise consume nothing i.e. initially save and then consume in last years.