I’ve updated the notes for this year’s stochastic control course, here:
Aside from general tidying, new material includes:
- Equilibrium distributions of Markov chains
- Occupancy measure of infinite time horizon MDPs
- Linear programming as an algorithm for solving MDPs
- Convergence of Asynchronous Value Iteration
- (s,S) Inventory Control
- POMDPs (though there is still more to add)
- Calculus of Variations (though there is still more to add)
- Pontryagin’s Maximum Principle
- Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
- (some) Online Convex Optimization
- Stochastic Bandits (UCB and Lai-Robbins Lower bound)
- Gittins’ Index Theorem
- Sequential/Stochastic Linear Regression (Lai and Wei)
- More discussion on TD methods
- Discussion on double Q-learning and Dueling/Advantage updating
- Convergence proof for SARSA
- Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
- Cross Entropy Method (but still more to do)
- Several new appendices (but mostly from old notes)
Like last year, I will likely update the notes further (and correct typos) towards the end of the course.