MATH69122 Stochastic Control for Finance – Applied Probability Notes

Stochastic Control Notes Update – 2021

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

Equilibrium distributions of Markov chains
Occupancy measure of infinite time horizon MDPs
Linear programming as an algorithm for solving MDPs
Convergence of Asynchronous Value Iteration
(s,S) Inventory Control
POMDP (though there is still more to add)
Calculus of Variations (though there is still more to add)
Pontryagin’s Maximum Principle
Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
(some) Online Convex Optimization
Stochastic Bandits (UCB and Lai-Robbins lower bound)
Gittins’ Index Theorem
Sequential/Stochastic Linear Regression (Lai and Wei)
More discussion on TD methods
Discussion on double Q-learning and Dueling/Advantage updating
Convergence proof for SARSA
Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
Cross Entropy Method (but still more to do)
Several new appendices (but mostly from old notes)

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.

I’ve updated the stochastic control notes here:

These still remain a work in progress.

(Reports of any typos, errors, or mishaps are welcome.)

A quick summary of what is new:

I’ve updated a section on the ODE method for stochastic approximation.
I’ve improved the discussion around Temporal Difference methods and included some proofs.
I’ve added a proof of convergence for SARSA.
I’ve added a section on Lyapunov functions in continuous time.
- LaSalle, exponential convergence, and online convex optimization…
I’ve started a section on Policy Gradients, but there are more recent proofs to include.
I’ve started a section on Deep Learning for RL.