Stochastic Control Notes Update – 2021

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

  • Equilibrium distributions of Markov chains
  • Occupancy measure of infinite time horizon MDPs
  • Linear programming as an algorithm for solving MDPs
  • Convergence of Asynchronous Value Iteration
  • (s,S) Inventory Control
  • POMDP (though there is still more to add)
  • Calculus of Variations (though there is still more to add)
  • Pontryagin’s Maximum Principle
  • Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
  • (some) Online Convex Optimization
  • Stochastic Bandits (UCB and Lai-Robbins lower bound)
  • Gittins’ Index Theorem
  • Sequential/Stochastic Linear Regression (Lai and Wei)
  • More discussion on TD methods
  • Discussion on double Q-learning and Dueling/Advantage updating
  • Convergence proof for SARSA
  • Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
  • Cross Entropy Method (but still more to do)
  • Several new appendices (but mostly from old notes)

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.

2 thoughts on “Stochastic Control Notes Update – 2021”

  1. Very nice notes. I especially like the probability appendix (for being concise yet wide-ranging). I was looking at the Robbins-Monro proofs and found a few typos:

    – p. 153, “Doob’s Martingale Convergence Theorem” has “Convergnce” by mistake
    – The final line in the proof of Thm 120 uses a_n instead of alpha_n
    – p. 149, “An Easy Robbin’s Monro Proof”: no apostrophe needed (and elsewhere, replace “Robbin” with “Robbins”)
    – In the next paragraph, instead of assuming $\sum_n \mathbb{E}[y_n]$ is bounded, I think you need $y_n^2$ in place of $y_n$, since the proof needs the term $e_n$ to be bounded.
    – Need a search-and-replace to replace “Munro” with “Monro”

  2. Typo in the notes on page 8: the transition function $f$ is defined as $f: X \times A \rightarrow A$, but it should be $f: X \times A \rightarrow X$.
