# Stochastic Linear Regression

We consider the following formulation of Lai, Robbins and Wei (1979), and Lai and Wei (1982). Consider the regression problem

$$
y_n = \beta_1 x_{n1} + \cdots + \beta_p x_{np} + \epsilon_n
$$

for $n=1,2,...$, where $\epsilon_n$ are unobservable random errors and $\beta_1,...,\beta_p$ are unknown parameters.

Typically, for a regression problem, it is assumed that the inputs $x_{1},...,x_{n}$ are given and the errors are IID random variables. However, we now want to consider a setting where we sequentially choose inputs $x_i$ and then receive observations $y_i$, and where the errors $\epsilon_i$ form a martingale difference sequence with respect to the filtration $\mathcal F_i$ generated by $\{ x_j, y_{j-1} : j\leq i \}$.

We let $X_n= ( x_{ij} : i= 1,..., n, j = 1,..., p )$ be the $n \times p$ matrix of inputs and $y_n = (y_i : 1\leq i \leq n)$ be the vector of outputs. Further, we let $b_n$ be the least squares estimate of $\beta$ given $X_n$ and $y_n$.

The following result gives a condition on the eigenvalues of the design matrix $X^\top_n X_n$ for $b_n$ to converge to $\beta$ and also gives a rate of convergence.

Theorem. Let $\lambda_{\min} (n)$ and $\lambda_{\max}(n)$ be, respectively, the minimum and maximum eigenvalues of the design matrix $X^{\top}_n X_n$, and assume that for some $\alpha>2$, almost surely,

$$
\sup_n \mathbb E \big[ | \epsilon_n |^{\alpha} \,\big|\, \mathcal F_{n-1} \big] < \infty .
$$

Then whenever we have, almost surely,

$$
\lambda_{\min}(n) \rightarrow \infty \quad\text{and}\quad \log \lambda_{\max}(n) = o\big( \lambda_{\min}(n) \big) ,
$$

$b_n$ converges to $\beta$ almost surely, and

$$
\| b_n - \beta \|^2 = \mathcal O \left( \frac{ \log \lambda_{\max}(n) }{ \lambda_{\min}(n) } \right) .
$$

In what follows, $|| \cdot ||$ is the Euclidean norm for a vector and, for a matrix, $|| A|| = \sup_{v: ||v||=1} ||Av||$. (Note that it is well known that, for a symmetric positive definite matrix $A$, $|| A|| = \lambda_{\max}(A)$, the maximum eigenvalue of $A$, and $|| A^{-1} || = \lambda_{\min}(A)^{-1}$.)
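Before the proof outline, a quick illustration (not from the original text): IID Gaussian errors are a special case of a martingale difference sequence, and for a Gaussian random design $\lambda_{\min}(n)$ grows linearly in $n$, so the theorem applies and $b_n$ should approach $\beta$. The parameter values below are arbitrary choices for the sketch.

```python
import numpy as np

# Sketch: least squares under IID Gaussian errors (a special case of an MDS).
rng = np.random.default_rng(0)
p = 3
beta = np.array([1.0, -2.0, 0.5])   # "unknown" parameters, chosen for the demo

errs = []
for n in [100, 10_000]:
    X = rng.normal(size=(n, p))     # design matrix X_n
    eps = rng.normal(size=n)        # errors eps_1, ..., eps_n
    y = X @ beta + eps              # observations y_n
    b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares estimate b_n
    errs.append(np.linalg.norm(b - beta))

print(errs)   # the error ||b_n - beta|| shrinks as n grows
```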

Outline of proof. The least squares estimate for the above regression problem is given by

$$
b_n = ( X^\top_n X_n )^{-1} X^{\top}_n y_n .
$$

So $b_n - \beta = ( X^\top_n X_n)^{-1} X^{\top}_n \epsilon_n$, where $\epsilon_n = ( \epsilon_i : i = 1,...,n)$. To prove the above theorem, first note that

$$
\| b_n - \beta \|^2 = \big\| ( X^\top_n X_n )^{-1} X^\top_n \epsilon_n \big\|^2 \leq \frac{ \epsilon_n^\top X_n ( X^\top_n X_n )^{-1} X^\top_n \epsilon_n }{ \lambda_{\min}(n) } =: \frac{ Q_n }{ \lambda_{\min}(n) } .
$$

For the inequality above, we apply the Cauchy–Schwarz inequality. We bound $Q_n$ using the Sherman–Morrison formula. Specifically, we will show that

$$
Q_N = \mathcal O ( a_N ) ,
$$

where $a_N$ is some positive increasing sequence. So since

$$
\| b_n - \beta \|^2 \leq \frac{ Q_n }{ \lambda_{\min}(n) } ,
$$

convergence is determined by the rate of growth of the sequence

$$
Q_n = \epsilon_n^\top X_n ( X^\top_n X_n )^{-1} X^\top_n \epsilon_n ,
$$

which, with some linear algebra, can be bounded by $\mathcal O(\log \lambda_{\max}(n))$. Thus we arrive at a bound of the form

$$
\| b_n - \beta \|^2 = \mathcal O \left( \frac{ \log \lambda_{\max}(n) }{ \lambda_{\min}(n) } \right) .
$$

In what follows, we study the asymptotic behaviour of $Q_n$. What we will show is the following.

Proposition. Almost surely,

$$
Q_N = \mathcal O ( \log \lambda_{\max}(N) ) .
$$

Proof. To prove this proposition we will require some lemmas, such as the Sherman–Morrison formula. These are stated and proven after the proof of this result.

The Sherman–Morrison formula states that

$$
( A + u v^\top )^{-1} = A^{-1} - \frac{ A^{-1} u v^\top A^{-1} }{ 1 + v^\top A^{-1} u } .
$$

Note that, writing $V_n = ( X^\top_n X_n )^{-1}$ and $S_n = X^\top_n \epsilon_n = \sum_{k=1}^n x_k \epsilon_k$ (so that $Q_n = S_n^\top V_n S_n$), we have $V_n^{-1} = V_{n-1}^{-1} + x_n x_n^\top$, and so, by the Sherman–Morrison formula,

$$
V_n = V_{n-1} - \frac{ V_{n-1} x_n x_n^\top V_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } , \qquad x_n^\top V_n = \frac{ x_n^\top V_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } .
$$

Thus

$$
Q_n = Q_{n-1} - \frac{ ( x_n^\top V_{n-1} S_{n-1} )^2 }{ 1 + x_n^\top V_{n-1} x_n } + 2 \epsilon_n \frac{ x_n^\top V_{n-1} S_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } + \epsilon_n^2 \, x_n^\top V_n x_n .
$$

Thus, summing and rearranging a little,

$$
Q_N + \sum_{n=1}^N \frac{ ( x_n^\top V_{n-1} S_{n-1} )^2 }{ 1 + x_n^\top V_{n-1} x_n } = 2 \sum_{n=1}^N \epsilon_n \frac{ x_n^\top V_{n-1} S_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } + \sum_{n=1}^N \epsilon_n^2 \, x_n^\top V_n x_n .
$$

(Here, for simplicity, the sums are understood to start from the first index at which $X^\top_n X_n$ is invertible.)

Notice in the above that the summation on the left-hand side (before the equals sign) only acts to decrease $Q_N$, while on the right-hand side the first term is a sum of martingale differences and the second term is a quadratic form in the errors.
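The one-step recursion behind this decomposition can be checked numerically. This is an illustrative sketch, not part of the original argument: writing $V_n = (X^\top_n X_n)^{-1}$, $S_n = \sum_{k \leq n} x_k \epsilon_k$ and $Q_n = S_n^\top V_n S_n$, the recursion as derived here is $Q_n = Q_{n-1} - d_n^2/c_n + 2\epsilon_n d_n/c_n + \epsilon_n^2 \, x_n^\top V_n x_n$ with $d_n = x_n^\top V_{n-1} S_{n-1}$ and $c_n = 1 + x_n^\top V_{n-1} x_n$ (starting from $A_0 = I$ so that every inverse exists, a convention for the demo only).

```python
import numpy as np

# Numerical check of the one-step recursion for Q_n = S_n^T V_n S_n.
rng = np.random.default_rng(1)
p, N = 3, 50

A = np.eye(p)          # A_0 = I so every V_n exists (demo convention)
S = np.zeros(p)
Q_prev = 0.0
ok = True
for n in range(N):
    x = rng.normal(size=p)
    eps = rng.normal()
    V_prev = np.linalg.inv(A)
    A = A + np.outer(x, x)          # rank-one update of X_n^T X_n
    V = np.linalg.inv(A)
    S_prev = S.copy()
    S = S + x * eps
    Q = S @ V @ S
    d = x @ V_prev @ S_prev
    c = 1.0 + x @ V_prev @ x
    # predicted Q_n from the recursion
    pred = Q_prev - d**2 / c + 2 * eps * d / c + eps**2 * (x @ V @ x)
    ok = ok and abs(Q - pred) < 1e-8
    Q_prev = Q

print(ok)
```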

Now, because the above sum of martingale differences is a martingale, we have by the strong law for martingales that

$$
\sum_{n=1}^N \epsilon_n \frac{ x_n^\top V_{n-1} S_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } = o \left( \sum_{n=1}^N \left( \frac{ x_n^\top V_{n-1} S_{n-1} }{ 1 + x_n^\top V_{n-1} x_n } \right)^2 \right) + \mathcal O (1) = o \left( \sum_{n=1}^N \frac{ ( x_n^\top V_{n-1} S_{n-1} )^2 }{ 1 + x_n^\top V_{n-1} x_n } \right) + \mathcal O (1) .
$$

In the second equality above, we use that $(1+ x^\top_n V_{n-1} x_n)^{-1} \leq 1$. Thus, absorbing this term into the negative summation on the left-hand side, we have that

$$
Q_N \leq \mathcal O (1) + \sum_{n=1}^N \epsilon_n^2 \, x_n^\top V_n x_n .
$$

By Lemma 2 (below) we have that

$$
\sum_{n=1}^N x_n^\top V_n x_n \leq p \log \lambda_{\max}(N) + \mathcal O (1) .
$$

Thus, using the bound on the conditional moments of the errors $\epsilon_n$, we have that

$$
Q_N = \mathcal O ( \log \lambda_{\max}(N) ) ,
$$

as required.

$\square$
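As a sanity check of the proposition (illustrative, with IID Gaussian errors and a Gaussian random design, choices not from the original text), the ratio $Q_N / \log \lambda_{\max}(N)$ stays bounded as $N$ grows:

```python
import numpy as np

# Sketch: Q_N = eps^T X (X^T X)^{-1} X^T eps should be O(log lambda_max(N)).
rng = np.random.default_rng(4)
p = 3
ratios = []
for N in [200, 2000, 20000]:
    X = rng.normal(size=(N, p))
    eps = rng.normal(size=N)
    G = X.T @ X                      # design matrix X_N^T X_N
    s = X.T @ eps                    # S_N = X_N^T eps_N
    Q = s @ np.linalg.inv(G) @ s
    lam_max = np.linalg.eigvalsh(G).max()
    ratios.append(Q / np.log(lam_max))

print(ratios)   # ratios remain of moderate size as N grows
```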

Lemma 1 [Sherman–Morrison Formula]. For an invertible matrix $A$ and two vectors $u$ and $v$ with $1 + v^\top A^{-1} u \neq 0$,

$$
( A + u v^\top )^{-1} = A^{-1} - \frac{ A^{-1} u v^\top A^{-1} }{ 1 + v^\top A^{-1} u } .
$$

Proof. Recalling that the outer product of two vectors, $w v^\top$, is the matrix $(w_i v_j )_{i=1,j=1}^{n,n}$, it holds that

$$
( w v^\top ) ( w v^\top ) = ( v^\top w ) \, w v^\top .
$$

(Nb. this is matrix multiplication: each column of $w v^\top$ is a constant times $w$ and each row is a constant times $v^\top$, so the dot product $v^\top w$ comes out.)

Using this identity, note that

$$
( I + w v^\top ) \left( I - \frac{ w v^\top }{ 1 + v^\top w } \right) = I + w v^\top - \frac{ ( 1 + v^\top w ) \, w v^\top }{ 1 + v^\top w } = I .
$$

So $(I + w v^\top )^{-1} = I - \frac{wv^\top }{ 1 + v^\top w}$. Now, letting $u = A w$,

$$
( A + u v^\top )^{-1} = \big( A ( I + w v^\top ) \big)^{-1} = \left( I - \frac{ w v^\top }{ 1 + v^\top w } \right) A^{-1} = A^{-1} - \frac{ A^{-1} u v^\top A^{-1} }{ 1 + v^\top A^{-1} u } ,
$$

as required. $\square$
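The formula is easy to verify numerically; the matrix below is shifted by a multiple of the identity just so it is safely invertible (an arbitrary choice for the sketch):

```python
import numpy as np

# Numerical check of the Sherman-Morrison formula:
# (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u)
rng = np.random.default_rng(2)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned invertible A
u = rng.normal(size=n)
v = rng.normal(size=n)

Ainv = np.linalg.inv(A)
sm = Ainv - np.outer(Ainv @ u, v @ Ainv) / (1.0 + v @ Ainv @ u)
direct = np.linalg.inv(A + np.outer(u, v))

print(np.allclose(sm, direct))
```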

The following lemma, in some sense, repeatedly analyses the determinant under the rank-one updates appearing in the Sherman–Morrison formula.

Lemma 2. If $w_1 , w_2,...$ are a sequence of vectors in $\mathbb R^p$ and we let $A_n = \sum_{k=1}^n w_k w^\top_k$, then, for any $k_0$ such that $A_{k_0}$ is invertible,

$$
\sum_{k = k_0 + 1}^{n} w_k^\top A_k^{-1} w_k \leq \log \frac{ | A_n | }{ | A_{k_0} | } \leq p \log \lambda_{\max}(n) - \log | A_{k_0} | .
$$

Proof. First note that if $A= B + w w^\top$ then, by the Sherman–Morrison formula,

$$
w^\top A^{-1} w = \frac{ w^\top B^{-1} w }{ 1 + w^\top B^{-1} w } = 1 - \frac{ 1 }{ 1 + w^\top B^{-1} w } = 1 - \frac{ | B | }{ | A | } ,
$$

where the final equality uses the matrix determinant lemma, $| A | = | B | \, ( 1 + w^\top B^{-1} w )$. Thus

$$
w^\top A^{-1} w = 1 - \frac{ | B | }{ | A | } \leq \log \frac{ | A | }{ | B | } ,
$$

which should remind you of the derivative of the logarithm; the inequality is $1 - x \leq \log (1/x)$ for $x > 0$, a consequence of the concavity of the logarithm. (Also note that this tells us that the determinant is increasing and that $w^\top A^{-1} w \leq 1$.) If we apply this to the sum in the statement of the lemma, the logarithms telescope:

$$
\sum_{k = k_0 + 1}^{N} w_k^\top A_k^{-1} w_k \leq \sum_{k = k_0 + 1}^{N} \log \frac{ | A_k | }{ | A_{k-1} | } = \log \frac{ | A_N | }{ | A_{k_0} | } .
$$

Since $|A_N|$ is the product of the eigenvalues of $A_N$, we have $\lambda_{\max}(N)^p \geq |A_N|$. So we see that

$$
\log \frac{ | A_N | }{ | A_{k_0} | } \leq p \log \lambda_{\max}(N) - \log | A_{k_0} | .
$$

$\square$
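A numerical sketch of the bound in Lemma 2 (as read here: $\sum_k w_k^\top A_k^{-1} w_k \leq \log |A_N| - \log |A_{k_0}| \leq p \log \lambda_{\max}(N) - \log |A_{k_0}|$, summing from the first invertible $A_{k_0}$; random Gaussian vectors are an arbitrary choice for the demo):

```python
import numpy as np

# Check the telescoping determinant bound for A_n = sum_{k<=n} w_k w_k^T.
rng = np.random.default_rng(3)
p, N = 3, 200

ws = rng.normal(size=(N, p))
A = np.zeros((p, p))
total = 0.0
k0_det = None
for k, w in enumerate(ws):
    A = A + np.outer(w, w)
    if k + 1 < p:          # A_k is singular until we have p vectors
        continue
    if k0_det is None:     # first (generically) invertible A_{k0}
        k0_det = np.linalg.det(A)
        continue
    total += w @ np.linalg.inv(A) @ w   # terms w_k^T A_k^{-1} w_k

bound = np.log(np.linalg.det(A)) - np.log(k0_det)
eig_bound = p * np.log(np.linalg.eigvalsh(A).max()) - np.log(k0_det)
print(total <= bound <= eig_bound + 1e-12)
```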

References

- Lai, T. L., Robbins, H., and Wei, C. Z. (1979). Strong consistency of least squares estimates in multiple regression II. Journal of Multivariate Analysis, 9(3):343–361.
- Lai, T. L. and Wei, C. Z. (1982). Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Annals of Statistics, 10(1):154–166.