Q-learning is an algorithm that contains many of the basic structures required for reinforcement learning and serves as the basis for many more sophisticated algorithms. The Q-learning algorithm can be seen as an (asynchronous) implementation of the Robbins-Monro procedure for finding fixed points. For this reason we will require results on Robbins-Monro when proving convergence.
Continue reading “Q-learning”
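To make the fixed-point flavour concrete, here is a minimal sketch of tabular Q-learning on a toy MDP. The environment, rewards, and parameters are all illustrative (not taken from the post); since Q-learning is off-policy, the sketch behaves uniformly at random while still learning the greedy policy.

```python
import numpy as np

# Toy deterministic chain MDP (illustrative): states 0..3, actions 0 = left,
# 1 = right; reaching state 3 yields reward 1 and ends the episode.
n_states, n_actions, goal = 4, 2, 3

def step(s, a):
    s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == goal), s_next == goal

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.9, 0.5

for _ in range(500):
    s, done = 0, False
    while not done:
        a = int(rng.integers(n_actions))   # uniform random behaviour policy
        s_next, r, done = step(s, a)
        # Robbins-Monro style update toward the Bellman optimality target
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # greedy policy: move right in states 0, 1, 2
```

Each update nudges `Q[s, a]` a step-size fraction toward a noisy evaluation of the Bellman operator, which is exactly the Robbins-Monro pattern discussed below.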
We review a method for finding fixed points and then extend it to slightly more general, modern proofs. This is a much more developed version of an earlier post. We now cover the basic Robbins-Monro proof, the Robbins-Siegmund Theorem, Stochastic Gradient Descent, and asynchronous updates (as required for Q-learning).
Continue reading “Robbins-Monro”
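As a warm-up, here is a sketch of the Robbins-Monro iteration itself, on an illustrative problem of my choosing: solving \(h(\theta) = \theta - \mu = 0\) when \(h\) can only be evaluated through noisy samples.

```python
import numpy as np

# Robbins-Monro iteration for solving h(theta) = 0 given only noisy evaluations.
# Illustrative target: h(theta) = theta - mu, observed through samples
# X_n ~ N(mu, 1), so the noisy evaluation at theta_n is theta_n - X_n.
rng = np.random.default_rng(1)
mu = 2.5                           # the unknown root (assumed, for illustration)
theta = 0.0
for n in range(1, 20001):
    x = rng.normal(mu, 1.0)
    a_n = 1.0 / n                  # steps with sum a_n = inf and sum a_n^2 < inf
    theta -= a_n * (theta - x)     # theta_{n+1} = theta_n - a_n * H(theta_n, X_n)
print(theta)                       # close to mu = 2.5
```

With the step sizes \(a_n = 1/n\) this recursion is exactly the running sample mean, which is the simplest instance of the convergence results proved in the post.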
- HJB equation for the Merton problem; CRRA utility solution; proof of optimality.
- Multiple assets; dual value function approach.
Continue reading “Merton Portfolio Optimization”
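To give a flavour of the CRRA solution: for a single risky asset with drift \(\mu\), volatility \(\sigma\), risk-free rate \(r\), and constant relative risk aversion \(R\) (notation assumed here, not necessarily the post's), the classical Merton solution invests a constant fraction of wealth in the risky asset:

```latex
% The Merton fraction: optimal constant proportion of wealth held in the
% risky asset under CRRA utility (all notation illustrative).
\pi^{*} = \frac{\mu - r}{R\,\sigma^{2}}
```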
What follows is a heuristic derivation of the Stochastic Integral, Stochastic Differential Equations and Itô’s Formula.
Continue reading “Stochastic Integration: A Quick Summary”
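As a preview of where the heuristic lands: for a diffusion \(dX_t = \mu\,dt + \sigma\,dB_t\) and a smooth function \(f\) (notation assumed), the rule of thumb \((dB_t)^2 = dt\) gives Itô's formula:

```latex
% Heuristic: expand f to second order and keep (dB_t)^2 = dt,
% discarding the dt * dB_t and (dt)^2 terms.
df(X_t) = f'(X_t)\,dX_t + \tfrac{1}{2} f''(X_t)\,(dX_t)^2
        = \left( \mu f'(X_t) + \tfrac{1}{2}\sigma^{2} f''(X_t) \right) dt
          + \sigma f'(X_t)\,dB_t
```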
Discrete time Dynamic Programming was given in the post Dynamic Programming. We now consider the continuous time analogue.
Continue reading “Continuous Time Dynamic Programming”
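For orientation, in continuous time the dynamic programming principle leads to a Hamilton-Jacobi-Bellman (HJB) equation. A representative one-dimensional, discounted-cost form (all notation assumed here for illustration) is:

```latex
% HJB equation for a controlled diffusion dX_t = b(X_t, a) dt + sigma(X_t, a) dB_t
% with running cost c(x, a) and discount rate rho (notation illustrative).
\rho V(x) = \min_{a} \left\{ c(x, a) + b(x, a) V'(x)
            + \tfrac{1}{2} \sigma^{2}(x, a) V''(x) \right\}
```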
An Optimal Stopping Problem is a Markov Decision Process with two actions: one meaning to stop, and one meaning to continue. Here there are two types of costs: a cost incurred while continuing and a cost incurred on stopping. This defines a stopping problem.
Continue reading “Optimal Stopping”
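For reference, with a continuation cost \(c(x)\), a stopping cost \(c_s(x)\), and discount factor \(\beta\) (names assumed for illustration), the Bellman equation of a stopping problem takes the form:

```latex
% Stop and pay c_s(x) now, or pay c(x), let the chain move to its next
% state X-hat, and face the same decision again (discounted by beta).
V(x) = \min\Big\{ c_s(x),\; c(x)
       + \beta\, \mathbb{E}\big[ V(\hat{X}) \,\big|\, X = x \big] \Big\}
```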
For infinite time MDPs, we cannot apply induction on Bellman's equation from some terminal condition, as we could for finite time MDPs. So we need algorithms to solve these MDPs.
Continue reading “Algorithms for MDPs”
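One such algorithm is value iteration, which applies the Bellman operator repeatedly until it (approximately) reaches its fixed point. A minimal sketch on a made-up two-state, two-action discounted MDP (all numbers illustrative):

```python
import numpy as np

# Value iteration for an infinite-horizon discounted MDP.
# P[a, s, s'] = transition probability, c[s, a] = expected cost, beta = discount.
P = np.array([
    [[0.9, 0.1],    # action 0
     [0.2, 0.8]],
    [[0.5, 0.5],    # action 1
     [0.6, 0.4]],
])
c = np.array([[1.0, 2.0],
              [3.0, 0.5]])
beta = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman operator: (TV)(s) = min_a { c(s, a) + beta * sum_s' P(s'|s, a) V(s') }
    Q = c + beta * (P @ V).T          # Q[s, a]
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmin(axis=1)
print(V, policy)
```

Since the Bellman operator is a beta-contraction in the sup norm, the iterates converge geometrically to the unique fixed point, which is why the stopping tolerance above is reached quickly.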
This section is intended as a brief introductory recap of Markov chains. A much fuller introduction is provided in standard texts, e.g. Norris, Brémaud, or Levin & Peres (see references below).
Continue reading “Markov Chains: A Quick Review”
Thus far we have considered finite time Markov decision processes. We now want to solve MDPs over an infinite time horizon.
Continue reading “Infinite Time Horizon”
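A standard instance of such an infinite time horizon objective (notation assumed here, e.g. minimising discounted expected cost over policies \(\pi\)) is:

```latex
% Infinite time horizon, discounted expected cost with discount factor
% beta in (0, 1), state X_t and action A_t (notation illustrative).
\min_{\pi}\; \mathbb{E}^{\pi}\left[ \sum_{t=0}^{\infty} \beta^{t}\, c(X_t, A_t) \right]
```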