We review the proof by Robbins and Munro for finding fixed points. Stochastic gradient descent, Q-learning and a bunch of other stochastic algorithms can be seen as variants of this basic algorithm. We review the basic ingredients of the original proof.
The Simplex Theorem suggests a method for solving linear programs . It is called the Simplex Algorithm.
A linear program is a constrained optimization problem with a linear objective, , and linear functional constraints, , we could for instance write
- Weighted majority algorithm its variant for Bandit Problems.
- Optimal Stopping Problems; One-Step-Look-Ahead Rule
- The Secretary Problem.
- Infinite Time Stopping
- High level idea: Policy Improvement and Policy Evaluation.
- Value Iteration; Policy Iteration.
- Temporal Differences; Q-factors.