Q-learning is an algorithm that contains many of the basic structures required for reinforcement learning and forms the basis of many more sophisticated algorithms. Q-learning can be viewed as an (asynchronous) implementation of the Robbins-Monro procedure for finding fixed points. For this reason, we will need results on Robbins-Monro when proving convergence.
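For orientation, a minimal sketch of the tabular update rule analysed in the post (notation in the full post may differ): when action $a$ is taken in state $s$, yielding reward $r$ and next state $s'$,

\[
Q_{t+1}(s,a) = Q_t(s,a) + \alpha_t(s,a)\Big( r + \gamma \max_{a'} Q_t(s',a') - Q_t(s,a) \Big),
\]

while all other entries of $Q_t$ are left unchanged, which is the asynchronous aspect of the update.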
Continue reading “Q-learning”
We review a method for finding fixed points and then extend it with slightly more general, modern proofs. This is a much more developed version of an earlier post. We now cover the basic Robbins-Monro proof, the Robbins-Siegmund Theorem, stochastic gradient descent, and asynchronous updates (as required for Q-learning).
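As a hedged reminder of the shape of the procedure (the full post fixes the assumptions precisely): to solve the fixed-point equation $F(x) = x$ from noisy evaluations, the iteration is

\[
x_{n+1} = x_n + \alpha_n \big( F(x_n) + M_{n+1} - x_n \big),
\]

where $M_{n+1}$ is martingale-difference noise and the step sizes satisfy $\sum_n \alpha_n = \infty$ and $\sum_n \alpha_n^2 < \infty$.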
Continue reading “Robbins-Monro”
- HJB equation for the Merton problem; CRRA utility solution; proof of optimality (a brief sketch of the single-asset case is given after this list).
- Multiple assets; dual value function approach.
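As a hedged sketch of the single-asset case (notation may differ from the post): with wealth dynamics $dX_t = \big(r X_t + \pi_t(\mu - r)\big)\,dt + \pi_t \sigma\, dW_t$ and CRRA utility $u(x) = x^{\gamma}/\gamma$ with $\gamma < 1$, the HJB equation is

\[
\partial_t V + \sup_{\pi}\Big\{ \big(r x + \pi(\mu - r)\big)\,\partial_x V + \tfrac{1}{2}\pi^2 \sigma^2\, \partial_{xx} V \Big\} = 0,
\]

and the optimal investment is the constant Merton fraction of wealth, $\pi_t^{\star}/X_t = (\mu - r)/\big((1-\gamma)\sigma^2\big)$.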
Continue reading “Merton Portfolio Optimization”
What follows is a heuristic derivation of the Stochastic Integral, Stochastic Differential Equations and Itô’s Formula.
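For reference, the one-dimensional form of Itô's formula that the heuristic derivation leads to: if $dX_t = \mu_t\, dt + \sigma_t\, dW_t$ and $f$ is twice continuously differentiable, then

\[
df(X_t) = f'(X_t)\, dX_t + \tfrac{1}{2} f''(X_t)\, \sigma_t^2\, dt .
\]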
Continue reading “Stochastic Integration: A Quick Summary”
Discrete-time dynamic programming was covered in the post Dynamic Programming. We now consider the continuous-time analogue.
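As a rough sketch of what the continuous-time analogue looks like (assuming a one-dimensional, infinite-horizon discounted setting; the post gives the precise statement): for a controlled diffusion $dX_t = b(X_t, a_t)\,dt + \sigma(X_t, a_t)\,dW_t$ with running reward $r(x,a)$ and discount rate $\rho$, the Bellman equation becomes the Hamilton-Jacobi-Bellman equation

\[
\rho V(x) = \sup_{a}\Big\{ r(x,a) + b(x,a)\, V'(x) + \tfrac{1}{2}\sigma(x,a)^2\, V''(x) \Big\}.
\]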
Continue reading “Continuous Time Dynamic Programming”