Stochastic Control Notes Update – 2021 – Applied Probability Notes

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

Equilibrium distributions of Markov chains
Occupancy measure of infinite time horizon MDPs
Linear programming as an algorithm for solving MDPs
Convergence of Asynchronous Value Iteration
(s,S) Inventory Control
POMDP (though there is still more to add)
Calculus of Variations (though there is still more to add)
Pontryagin’s Maximum Principle
Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
(some) Online Convex Optimization
Stochastic Bandits (UCB and Lai-Robbins lower bound)
Gittins’ Index Theorem
Sequential/Stochastic Linear Regression (Lai and Wei)
More discussion on TD methods
Discussion on double Q-learning and Dueling/Advantage updating
Convergence proof for SARSA
Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
Cross Entropy Method (but still more to do)
Several new appendices (but mostly from old notes)

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.
