Stochastic Linear Regression

We consider the following formulation of Lai, Robbins and Wei (1979), and Lai and Wei (1982). Consider the following regression problem,

for n=1,2,... where \epsilon_n are unobservable random errors and \beta_1,...,\beta_p are unknown parameters.

Typically for a regression problem, it is assumed that inputs x_{1},...,x_{n} are given and errors are IID random variables. However, we now want to consider a setting where we sequentially choose inputs x_i and then get observations y_i, and errors \epsilon_i are a martingale difference sequence with respect to the filtration \mathcal F_i generated by \{ x_j, y_{j-1} : j\leq i \}.

We let X_n= ( x_{ij} : i= 1,..., n, j = 1,..., p ) be the matrix of inputs and y_n = (y_i : 1\leq i \leq n) be the matrix of outputs. Further we let b_n be the least squares estimate of \beta given X_n and y_n.

The following result gives a condition on the eigenvalues of the design matrix X^\top_n X_n for b_n to converge to \beta and also gives a rate of convergence.

If \lambda_{\min} (n) and \lambda_{\max}(n) are, respectively, the minimum and maximum eigenvalues of the design matrix X^{\top}_n X_n and if we assume that for some \alpha>2, almost surely, \mathbb E [ \epsilon_n^{\alpha} | \mathcal F_{n-1} ] < \infty Then whenever we have

then b_n converges to \beta and

In what follows, || \cdot || is the Euclidean norm for a vector and for a matrix || A|| = \sup_{v: ||v||=1} ||Av||. (Note that it is well known that || A|| = \lambda_{\max}(A) the maximum eigenvalue of A and || A|| = \lambda^-1_{\min}(A))

Outline of proof. The least squares estimate to the above regression problem is given by

So b_n - \beta = ( X^\top_n X_n)^{-1} X^{\top}_n \epsilon_n where \epsilon_n = ( \epsilon_i : i = 1,...,n). To prove the above theorem first note that

The inequality above we apply the Cauchey-Schwartz inequality. We bound Q_n using the Sherman-Morrison formula. Specifically we will show that

where a_N is some positive increasing sequence. So since

convergence is determine by the rate of convergence of the sequence

which, with some linear algebra, can be bounded by \mathcal O(\log \lambda_{\max}(n)). Thus we arrive at a bound of the form

In what follows we must study the asumptotic behaviour of Q_n. What we will show is

Proposition. Almost surely

Proof. To prove this proposition we will require some lemmas, such as the Sherman-Morris formula. These are stated and proven after the proof of this result.

The Sherman-Morrison Formula states that:

Note that


Thus summing and rearranging a little

Notice in the above, the first summation (before the equals sign) only acts to decrease Q_N, while on the right hand side, the first term is a martingale difference sequence and the second term is a quadratic form.

Now because the above Martingale difference sequence is a martingale we have that

In the second equality above, we use that (1+ x^\top_n V_{n-1} x_n)^{-1} \leq 1. Thus we have that

By Lemma 2 (below) we have that

Thus we have that


Lemma 1 [Sherman-Morrison Formula] For an invertible Matrix A and two vectors u and v

Proof. Recalling that the outer-product of two vectors w v^\top is the matrix (w_i v_j )_{i=1,j=1}^{n,n} it holds that

(Nb. This is matrix multiplication: each column is a constant times u and every row is a constant time v, so the dot product comes out.)

Using this identity note that

So (I + w v^\top )^{-1} = I - \frac{wv^\top }{ 1 + v^\top w}. Now letting u = A w,

as required. \square

The following in some sense repeatedly analyses to the determinant under the Sherman-Morrison formula.

Lemma 2. If w_1 , w_2,... are a sequence of vectors and we let A_n = \sum_{k=1}^n w^\top_k w_k then

Proof. First note that if A= B + w^\top w then, as was also in the Sherman-Morris formula


which should remind you of the derivative of the logarithm. (Also note that this tells us that determinant is increasing and that w^\top Aw \leq 1.) If we apply this to the above sum and apply the concavity of the logarithm

Since |A| is the product of all eigenvalues \lambda_{\max}(n)^p \geq |A_N| . So we see that



This is based on reading:

T.L Lai, Herbert Robbins, C.Z Wei, “Strong consistency of least squares estimates in multiple regression II”, Journal of Multivariate Analysis, Volume 9, Issue 3, 1979, Pages 343-361.

Lai, Tze Leung; Wei, Ching Zong. Least Squares Estimates in Stochastic Regression Models with Applications to Identification and Control of Dynamic Systems. Ann. Statist. 10 (1982), no. 1, 154–166. doi:10.1214/aos/1176345697.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: