Stochastic Linear Regression

We consider the following formulation of Lai, Robbins and Wei (1979), and Lai and Wei (1982). Consider the following regression problem,

for n=1,2,... where \epsilon_n are unobservable random errors and \beta_1,...,\beta_p are unknown parameters.

Typically for a regression problem, it is assumed that inputs x_{1},...,x_{n} are given and errors are IID random variables. However, we now want to consider a setting where we sequentially choose inputs x_i and then get observations y_i, and errors \epsilon_i are a martingale difference sequence with respect to the filtration \mathcal F_i generated by \{ x_j, y_{j-1} : j\leq i \}.

Continue reading “Stochastic Linear Regression”



Q-learning is an algorithm, that contains many of the basic structures required for reinforcement learning and acts as the basis for many more sophisticated algorithms. The Q-learning algorithm can be seen as an (asynchronous) implementation of the Robbins-Munro procedure for finding fixed points. For this reason we will require results from Robbins-Munro when proving convergence.

Continue reading “Q-learning”