Stochastic Control Notes Update – 2021

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

  • Equilibrium distributions of Markov chains
  • Occupancy measure of infinite time horizon MDPs
  • Linear programming as an algorithm for solving MDPs
  • Convergence of Asynchronous Value Iteration
  • (s,S) Inventory Control
  • POMDP (though there is still more to add)
  • Calculus of Variations (though there is still more to add)
  • Pontryagin’s Maximum Principle
  • Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
  • (some) Online Convex Optimization
  • Stochastic Bandits (UCB and Lai-Robbins lower bound)
  • Gittins’ Index Theorem
  • Sequential/Stochastic Linear Regression (Lai and Wei)
  • More discussion on TD methods
  • Discussion on double Q-learning and Dueling/Advantage updating
  • Convergence proof for SARSA
  • Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
  • Cross Entropy Method (but still more to do)
  • Several new appendices (but mostly from old notes)

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.
