I’ve updated the notes for this year’s stochastic control course, here:
Aside from general tidying, new material includes:
- Equilibrium distributions of Markov chains
- Occupancy measure of infinite time horizon MDPs
- Linear programming as an algorithm for solving MDPs
- Convergence of Asynchronous Value Iteration
- (s,S) Inventory Control
- POMDP (though there is still more to add)
- Calculus of Variations (though there is still more to add)
- Pontryagin’s Maximum Principle
- Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
- (some) Online Convex Optimization
- Stochastic Bandits (UCB and Lai-Robbins Lower bound)
- Gittins’ Index Theorem.
- Sequential/Stochastic Linear Regression (Lai and Wei)
- More discussion on TD methods
- Discussion on double Q-learning and Dueling/Advantage updating
- Convergence proof for SARSA
- Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
- Cross Entropy Method (but still more to do)
- Several new appendices (but mostly from old notes)
Like last year, I will likely update the notes further (and correct typos) towards the end of the course.
Very nice notes. I especially like the probability appendix (for being concise yet wide-ranging). I was looking at the Robbins-Monro proofs and found a few typos:
– p. 153, “Doob’s Martingale Convergence Theorem” has “Convergnce” by mistake
– The final line in the proof of Thm 120 uses a_n instead of alpha_n
– p. 149, “An Easy Robbin’s Monro Proof”, no apostrophe needed (and elsewhere, replace “Robbin” with “Robbins”)
– In the next paragraph, rather than assuming sum_n E[y_n] is bounded, I think you need y_n^2 in place of y_n, since the proof needs the term e_n to be bounded.
– Need a search-and-replace to replace “Munro” with “Monro”
Typo in the notes on page 8: the transition function f is defined as $f: X \times A \rightarrow A$ but should be $f: X \times A \rightarrow X$.