# Dynamic Programming

• Dynamic Programs; Bellman’s Equation; An example.

For this section, consider the following dynamic programming formulation:

Time is discrete $t=0,1,...,T$; $x_t\in \mathcal{X}$ is the state at time $t$; $a_t\in \mathcal{A}_t$ is the action at time $t$;

Def 1 [Plant Equation][DP:Plant] The state evolves according to functions $f_t: \mathcal{X}\times \mathcal{A}_t \rightarrow \mathcal{X}$. Here

This is called the Plant Equation.

A policy $\pi$ choses an action $\pi_t$ at each time $t$. The (instantaneous) reward for taking action $a$ in state $x$ at time $t$ is $r_t(a,x)$ and $r_T(x)$ is the reward for terminating in state $x$ at time $T$.

Def 2 [Dynamic Program] Given initial state $x_0$, a dynamic program is the optimization

Further, let $R_\tau({\bf a})$ (Resp. $W_\tau(x_\tau)$) be the objective (Resp. optimal objective) for when the summation is started from $t=\tau$, rather than $t=0$.

Ex 1 [Bellman’s Equation]$W_T(x) = r_T(x)$ and for $t=T-1,...,0$

where $x_t\in\mathcal{X}$ and $x_{t+1}=f_t(x_t,a_t)$.

Ex 2  An investor has a fund: it has $x$ pounds at time zero; money can’t be withdrawn; it pays $r\times 100\%$ interest per-year for $T$ years; the investor consumes proportion $a_t$ of the interest and reinvests the rest.

What should the investor do to maximize consumption?

## Answers

Ans 1  Let ${\bf a}_t=(a_t,...,a_{T-1})$.

Ans 2  The yearly fund value satisfies $x_{t+1}= x_t + rx_t (1-a_t)$. Backward induction, if reward from time $t=T-s+1$ onward is $R_{T-s+1}(x)=\rho_{s-1} r x_{T-s+1}$, then $R_{T-s}(x)=\rho_{s} r x_{T-s}$ where

Also, $\rho_1=1$, i.e. last year consume everything. Therefore solution is consume everything last years while $\rho_s =s \leq (s-1)(1+r)$ otherwise consume nothing i.e. initially save and then consume in last years.