Q-learning is an algorithm that contains many of the basic structures required for reinforcement learning and serves as the basis for many more sophisticated algorithms. The Q-learning algorithm can be seen as an (asynchronous) implementation of the Robbins-Monro procedure for finding fixed points. For this reason, we will require results on Robbins-Monro when proving convergence.
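As a minimal illustration of the update described above (a toy sketch, not the setting analysed in the post), here is tabular Q-learning on a hypothetical deterministic chain MDP with four states, where reaching the last state earns reward 1:

```python
import random

random.seed(0)

# Toy deterministic chain MDP: states 0..3, actions 0 (left) and 1 (right).
# Reaching state 3 yields reward 1 and ends the episode; all other steps pay 0.
N_STATES, TERMINAL = 4, 3

def step(s, a):
    s2 = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == TERMINAL else 0.0
    return s2, r, s2 == TERMINAL

alpha, gamma, eps = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: Q[s][a])
        s2, r, done = step(s, a)
        # Q-learning update: a Robbins-Monro style step toward the Bellman target
        target = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2

# The learned greedy policy should move right in every non-terminal state.
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(TERMINAL)])
```

Note the stochastic-approximation shape of the update, `Q += alpha * (target - Q)`: the iterate is nudged toward a noisy estimate of the fixed point of the Bellman operator, and only the visited state-action pair is updated at each step, which is the asynchronous aspect.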
Continue reading “Q-learning”
We review a method for finding fixed points and then extend it to slightly more general, modern proofs. This is a much more developed version of an earlier post. We now cover the basic Robbins-Monro proof, the Robbins-Siegmund Theorem, Stochastic Gradient Descent, and asynchronous updates (as required for Q-learning).
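The basic procedure is easy to sketch. In this hedged toy example (not from the post), we find the root of $f(\theta) = \mathbb{E}[X] - \theta$ using only noisy samples $X_n$, with step sizes $a_n = 1/(n+1)$ satisfying the usual conditions $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$:

```python
import random

random.seed(1)

# Robbins-Monro iteration: theta_{n+1} = theta_n + a_n * (X_n - theta_n),
# where X_n is a noisy observation whose mean, 3, is the root we seek.
theta = 0.0
for n in range(100_000):
    x = random.gauss(3.0, 1.0)   # noisy observation with mean 3
    a_n = 1.0 / (n + 1)          # steps: sum a_n = inf, sum a_n^2 < inf
    theta += a_n * (x - theta)   # move toward the observed value

print(theta)  # close to the true root, theta* = 3
```

With this particular step-size choice the iterate is exactly the running sample mean, which makes the convergence plausible; the general proofs in the post handle far broader settings.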
Continue reading “Robbins-Monro”
- HJB equation for the Merton Problem; CRRA utility solution; proof of optimality.
- Multiple assets; dual value function approach.
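For reference, the classical single-asset CRRA solution yields the well-known Merton fraction (notation assumed here, since the excerpt does not define it: drift $\mu$, risk-free rate $r$, volatility $\sigma$, relative risk aversion $\gamma$):

```latex
\[
  \pi^{*} \;=\; \frac{\mu - r}{\gamma \sigma^{2}},
\]
```

i.e. the optimal proportion of wealth held in the risky asset is constant in both wealth and time.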
Continue reading “Merton Portfolio Optimization”
What follows is a heuristic derivation of the Stochastic Integral, Stochastic Differential Equations and Itô’s Formula.
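The Itô correction can even be seen numerically. As a rough sanity check (a toy simulation, not part of the derivation), Itô's formula gives $\int_0^T W_t \, dW_t = (W_T^2 - T)/2$, where the $-T/2$ term is absent from ordinary calculus; a left-endpoint Riemann sum on a simulated Brownian path recovers it:

```python
import random

random.seed(2)

# Check  int_0^T W dW = (W_T^2 - T)/2  on one simulated Brownian path.
# The left-endpoint sum is the Itô convention: the integrand is evaluated
# before the increment, so it cannot "see" the future noise.
T, n = 1.0, 100_000
dt = T / n

W, integral = 0.0, 0.0
for _ in range(n):
    dW = random.gauss(0.0, dt ** 0.5)   # Brownian increment ~ N(0, dt)
    integral += W * dW                  # left endpoint: Itô convention
    W += dW

print(abs(integral - (W * W - T) / 2.0))  # small discretization error
```

Evaluating the integrand at the right endpoint instead would change the answer by approximately $T$, which is one heuristic way to see why the choice of evaluation point matters for stochastic integrals while it does not for ordinary ones.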
Continue reading “Stochastic Integration: A Quick Summary”
Discrete-time Dynamic Programming was covered in the post Dynamic Programming. We now consider the continuous-time analogue.
Continue reading “Continuous Time Dynamic Programming”
An Optimal Stopping Problem is a Markov Decision Process with two actions: one meaning stop, and one meaning continue. There are two types of cost: one incurred while continuing, and one incurred on stopping. Together these define a stopping problem.
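The two-action, two-cost structure can be sketched with a small backward induction (a hypothetical "house-selling" example, not taken from the post): each period an i.i.d. offer, uniform on $\{1,\dots,10\}$, arrives; we may stop and keep the offer, or pay a continuation cost $c = 1$ and wait, with forced acceptance at the horizon $T$:

```python
# Backward induction for a toy finite-horizon stopping problem.
# U[t] is the expected value of the problem at time t, before the offer
# is seen; stopping keeps the current offer, continuing pays cost c.
offers = list(range(1, 11))
c, T = 1.0, 20

U = [0.0] * (T + 1)
U[T] = sum(offers) / len(offers)      # forced to accept the final offer
for t in range(T - 1, -1, -1):
    cont = U[t + 1] - c               # expected value of waiting, net of cost
    U[t] = sum(max(x, cont) for x in offers) / len(offers)

# The optimal rule is a threshold: at time t, stop iff the current offer
# exceeds the continuation value U[t+1] - c.
print(U[0])
```

The optimal policy that falls out is a threshold rule, which is the typical shape of solutions to stopping problems: continue while the state is below a boundary, stop once it crosses.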
Continue reading “Optimal Stopping”