For infinite time MDPs, we cannot apply induction on Bellman’s equation from some initial point, as we could for finite time MDPs. So we need algorithms to solve these MDPs.
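The standard approach is to iterate Bellman’s equation to a fixed point rather than induct on it. As a minimal sketch (the two-state MDP, its transition probabilities and rewards are all made up for illustration), value iteration might look like:

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (numbers invented for illustration).
# P[a][s, s'] is the transition probability under action a; R[a, s] the reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.7, 0.3]],   # transitions under action 1
])
R = np.array([
    [1.0, 0.0],   # rewards under action 0
    [0.5, 0.8],   # rewards under action 1
])
gamma = 0.9  # discount factor, must be < 1 for convergence

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator to its fixed point."""
    V = np.zeros(P.shape[1])
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a][s, s'] * V[s']
        Q = R + gamma * P @ V
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V, policy = value_iteration(P, R, gamma)
```

Because the Bellman operator is a contraction when gamma < 1, the iteration converges geometrically regardless of the starting guess for V.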
This section is intended as a brief introductory recap of Markov chains. A much fuller introduction is provided in standard texts, e.g. Norris, Bremaud, or Levin & Peres (see the references below).
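To fix ideas, here is a minimal numerical sketch of the two facts the recap turns on: n-step transition probabilities are powers of the transition matrix, and (for an irreducible aperiodic finite chain) those powers converge to the stationary distribution. The two-state chain here is invented for illustration.

```python
import numpy as np

# Illustrative two-state chain (not from the notes): rows of P sum to 1.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# n-step transition probabilities are matrix powers: (P^n)[i, j] = P(X_n = j | X_0 = i).
P10 = np.linalg.matrix_power(P, 10)

# The stationary distribution pi solves pi P = pi, i.e. it is a left
# eigenvector of P with eigenvalue 1, normalised to sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
pi = pi / pi.sum()
```

For this chain every row of `P10` is already close to `pi`, since the second eigenvalue (0.4) decays geometrically under powering.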
Thus far we have considered finite time Markov decision processes. We now want to solve MDPs of the form
Markov decision processes are essentially the randomized equivalent of a dynamic program.
We briefly explain the principles behind dynamic programming and then give its definition.
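The principle in question, that an optimal trajectory is built by solving the tail problem first and recursing backwards, can be sketched on a toy finite-horizon problem. All numbers here are invented for illustration; only the backward recursion itself is the point.

```python
import numpy as np

# Toy finite-horizon problem (illustrative data): maximise expected total
# reward over T steps by backward induction on the value function.
T = 3                                         # horizon
states, actions = 2, 2
rng = np.random.default_rng(0)
r = rng.random((T, states, actions))          # r[t, s, a]: reward at time t
P = np.full((actions, states, states), 0.5)   # uniform transition probabilities

V = np.zeros(states)                          # value at the terminal time
for t in reversed(range(T)):
    # Bellman recursion: V_t(s) = max_a [ r_t(s, a) + E[ V_{t+1}(s') ] ]
    Q = r[t] + (P @ V).T                      # Q[s, a]
    V = Q.max(axis=1)                         # V now holds V_t
```

After the loop, `V` holds the time-zero value function; an optimal action at each step is any maximiser of the corresponding `Q` row.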
The link below contains the notes PDF for this year’s stochastic control course.
I’ll upload individual posts for each section. I’ll likely update these notes and add more exercises over the coming semester, and I’ll post the updated version at the end of the course. Comments, typo corrections, and suggestions are always welcome.