Dynamic Programming

  • Dynamic Programs; Bellman’s Equation; An example.


For this section, consider the following dynamic programming formulation:

Time is discrete t=0,1,...,T; x_t\in \mathcal{X} is the state at time t; a_t\in \mathcal{A}_t is the action at time t;

Def 1 [Plant Equation][DP:Plant] The state evolves according to functions f_t: \mathcal{X}\times \mathcal{A}_t \rightarrow \mathcal{X}. Here

This is called the Plant Equation.

A policy \pi choses an action \pi_t at each time t. The (instantaneous) reward for taking action a in state x at time t is r_t(a,x) and r_T(x) is the reward for terminating in state x at time T.

Def 2 [Dynamic Program] Given initial state x_0, a dynamic program is the optimization

7904a523ec69f29deca0804f079ee6e3.png

Further, let R_\tau({\bf a}) (Resp. W_\tau(x_\tau)) be the objective (Resp. optimal objective) for when the summation is started from t=\tau, rather than t=0.

Ex 1 [Bellman’s Equation]W_T(x) = r_T(x) and for t=T-1,...,0

where x_t\in\mathcal{X} and x_{t+1}=f_t(x_t,a_t).

Ex 2  An investor has a fund: it has x pounds at time zero; money can’t be withdrawn; it pays r\times 100\% interest per-year for T years; the investor consumes proportion a_t of the interest and reinvests the rest.

What should the investor do to maximize consumption?


Answers

Ans 1  Let {\bf a}_t=(a_t,...,a_{T-1}).

Ans 2  The yearly fund value satisfies x_{t+1}= x_t + rx_t (1-a_t). Backward induction, if reward from time t=T-s+1 onward is R_{T-s+1}(x)=\rho_{s-1} r x_{T-s+1}, then R_{T-s}(x)=\rho_{s} r x_{T-s} where

Also, \rho_1=1, i.e. last year consume everything. Therefore solution is consume everything last years while \rho_s =s \leq (s-1)(1+r) otherwise consume nothing i.e. initially save and then consume in last years.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: