The Law of Large Numbers and Central Limit Theorem

Let’s explain why the normal distribution is so important.

(This is a section in the notes here.)

Suppose that I throw a coin 100 times and count the number of heads.

The proportion of heads should be close to its mean of 1/2, and for 10,000 throws it should be even closer. This can be shown mathematically, not just for coin throws but for quite general random variables.
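A quick simulation makes the coin example concrete. This is a minimal sketch that assumes a fair coin (heads with probability 1/2) and uses Python's standard `random` module. The function name `proportion_of_heads` and the seed are illustrative choices, not part of the notes.

```python
import random

def proportion_of_heads(n_throws: int, rng: random.Random) -> float:
    """Throw a fair coin n_throws times and return the fraction of heads (assumes P(heads) = 1/2)."""
    heads = sum(rng.random() < 0.5 for _ in range(n_throws))
    return heads / n_throws

rng = random.Random(2021)  # fixed seed so the run is reproducible
for n in (100, 10_000):
    p_hat = proportion_of_heads(n, rng)
    print(f"n = {n:>6,}: proportion of heads = {p_hat:.4f}, distance from 1/2 = {abs(p_hat - 0.5):.4f}")
```

Rerunning with different seeds shows that the distance from 1/2 is typically much smaller for 10,000 throws than for 100.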

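Theorem [Weak Law of Large Numbers] For independent random variables X_i, i=1,...,n, with mean \mu and variance bounded above by \sigma^2, if we define

S_n = \sum_{i=1}^n X_i

then for all \epsilon >0

\mathbb P \left( \left| \frac{S_n}{n} - \mu \right| \geq \epsilon \right) \rightarrow 0 \quad \text{as } n \rightarrow \infty .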

We will prove this result a little later. But, continuing the discussion, suppose X_1,...,X_n are independent identically distributed random variables with mean \mu and variance \sigma^2. We see from the above result that S_n /n is getting close to \mu. Nonetheless, in general, there is going to be some error. So let’s define

\epsilon_n = \frac{S_n}{n} - \mu .

So what does \epsilon_n look like? We know that, in some sense, \epsilon_n \rightarrow 0 as n \rightarrow \infty, but how fast?

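For this we can analyze the variance of the random variable \epsilon_n. Since the X_i are independent, the variance of their sum is the sum of their variances, so

\mathbb V ( \epsilon_n ) = \mathbb V \left( \frac{S_n}{n} \right) = \frac{1}{n^2} \sum_{i=1}^n \mathbb V ( X_i ) = \frac{\sigma^2}{n} .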

Thus the standard deviation of \epsilon_n decreases as \sigma / \sqrt{n}. Given this, we can define the rescaled error

Z_n = \frac{\sqrt{n}}{\sigma} \epsilon_n = \frac{S_n - n\mu}{\sigma \sqrt{n}} .

Notice that \mathbb E [Z_n]=0 and

\mathbb V ( Z_n ) = \frac{n}{\sigma^2} \mathbb V ( \epsilon_n ) = \frac{n}{\sigma^2} \cdot \frac{\sigma^2}{n} = 1 .

So Z_n has mean zero and variance one for every n. That is, the error measured on the scale of Z_n does not vanish but stays roughly constant in size. So something non-trivial is happening to this random variable Z_n, and the natural question is what its distribution looks like as n grows. The answer is that Z_n converges to a normal distribution.
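The two scalings can be checked numerically. This sketch again assumes fair-coin throws (heads = 1, tails = 0, so \mu = 1/2 and \sigma = 1/2). It estimates the spread of \epsilon_n and Z_n by repeating the n-throw experiment many times. The helper names, the seed and the number of replications are illustrative choices.

```python
import math
import random
import statistics

MU, SIGMA = 0.5, 0.5  # mean and standard deviation of one fair-coin throw (heads = 1, tails = 0)

def sample_mean(n: int, rng: random.Random) -> float:
    """Return S_n / n for n fair-coin throws."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

def error_spreads(n: int, replications: int, rng: random.Random) -> tuple[float, float]:
    """Estimate the standard deviations of eps_n = S_n/n - mu and Z_n = sqrt(n) * eps_n / sigma."""
    eps = [sample_mean(n, rng) - MU for _ in range(replications)]
    z = [math.sqrt(n) * e / SIGMA for e in eps]
    return statistics.stdev(eps), statistics.stdev(z)

rng = random.Random(1)
for n in (10, 100, 1_000):
    sd_eps, sd_z = error_spreads(n, replications=1_000, rng=rng)
    print(f"n={n:>5}: sd(eps_n)={sd_eps:.4f} (sigma/sqrt(n)={SIGMA / math.sqrt(n):.4f}), sd(Z_n)={sd_z:.3f}")
```

The estimated sd(eps_n) should track \sigma / \sqrt{n} and shrink with n, while sd(Z_n) should stay close to 1.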

This convergence to a normal distribution is a famous and fundamental result in probability and statistics, called the central limit theorem.

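Theorem [Central Limit Theorem] For independent identically distributed random variables X_i with mean \mu and finite variance \sigma^2 > 0, with S_n = \sum_{i=1}^n X_i and

Z_n = \frac{S_n - n\mu}{\sigma \sqrt{n}}

then, for every z \in \mathbb R,

\mathbb P ( Z_n \leq z ) \rightarrow \mathbb P ( Z \leq z ) \quad \text{as } n \rightarrow \infty ,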

where Z is a standard normal random variable.
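To see the theorem at work when the summands are not themselves normal, the sketch below draws Z_n for Exponential summands with rate 1, so \mu = \sigma = 1. It then compares the empirical value of \mathbb P ( Z_n \leq z ) with the standard normal CDF, computed from `math.erf`. The exponential choice is an assumption made because that distribution is visibly skewed. n = 50, the seed and the helper names are also illustrative.

```python
import math
import random

def std_normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def standardized_sum(n: int, rng: random.Random) -> float:
    """Draw Z_n = (S_n - n*mu) / (sigma*sqrt(n)) for Exponential(1) summands (mu = sigma = 1)."""
    s_n = sum(rng.expovariate(1.0) for _ in range(n))
    return (s_n - n * 1.0) / (1.0 * math.sqrt(n))

def empirical_cdf(samples: list[float], z: float) -> float:
    """Fraction of samples that are <= z."""
    return sum(x <= z for x in samples) / len(samples)

rng = random.Random(42)
n, replications = 50, 5_000
samples = [standardized_sum(n, rng) for _ in range(replications)]
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z={z:+.1f}: P(Z_n <= z) ~ {empirical_cdf(samples, z):.3f}, P(Z <= z) = {std_normal_cdf(z):.3f}")
```

Even though each exponential summand is far from normal, the two columns should already agree to within a couple of percent at n = 50.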

Given the discussion above, the Central Limit Theorem roughly says that

\frac{S_n}{n} \approx \mu + \frac{\sigma}{\sqrt{n}} Z ,

where Z is a standard normal random variable. So whenever we measure errors about some expected value, normal random variables are the natural thing to consider.
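As a worked instance of this approximation, return to the coin with n = 10,000 throws. Here \sigma / \sqrt{n} = 0.5 / 100 = 0.005. A deviation of 0.01 from 1/2 is therefore two of these units, and the approximation gives a probability of about P(|Z| \leq 2) \approx 0.954 that the proportion of heads lies within 0.01 of 1/2. The sketch below compares that figure with the exact binomial probability. The function names are hypothetical, and the exact sum assumes a fair coin.

```python
import math

def normal_approx_prob(n: int, p: float, tol: float) -> float:
    """Approximate P(|S_n/n - p| <= tol) using S_n/n ~ p + (sigma/sqrt(n)) Z."""
    sigma = math.sqrt(p * (1.0 - p))       # standard deviation of one Bernoulli(p) throw
    k = tol * math.sqrt(n) / sigma         # tol measured in units of sigma/sqrt(n)
    return math.erf(k / math.sqrt(2.0))    # equals P(|Z| <= k) for a standard normal Z

def exact_fair_coin_prob(n: int, lo: int, hi: int) -> float:
    """Exact P(lo <= S_n <= hi) for n fair-coin throws, by summing binomial coefficients."""
    favourable = sum(math.comb(n, k) for k in range(lo, hi + 1))
    return favourable / 2**n               # exact integer ratio, rounded once to a float

n, tol = 10_000, 0.01
print(f"normal approximation: {normal_approx_prob(n, 0.5, tol):.4f}")
# 4,900 <= S_n <= 5,100 is the same event as |S_n/n - 1/2| <= 0.01 when n = 10,000
print(f"exact binomial:       {exact_fair_coin_prob(n, 4_900, 5_100):.4f}")
```

The exact sum uses integer bounds on S_n rather than the floating-point tolerance, so no throws are lost to rounding at the edges of the interval.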
