Sanov’s Theorem

Sanov’s asks how likely is it that the empirical distribution some IIDRV’s is far from the distribution. And shows that the relative entropy determines the likelihood of being far.

Let X_1,X_2,... be IIDRVs on some finite set \{1,...,m\} with a probability distribution P and let \hat{P}_n be the empirical distribution of X_1,...,X_n, i.e.

Ex 1. Show that, almost surely,

Just like Cramer’s Theorem, we wish to get a large deviations view of this law of large numbers convergence. This is Sanov’s Theorem:

Thrm. [Sanov’s Theorem – finite alphabets]

Ex 2. Convince yourself that n\hat{P}_n is a multinomial distribution with n trails parameters p_1,...,p_n: i.e. for \mathbf{n}=(n_i : i=1,...,m) with \sum_{i=1}^{m}n_i =n,

Ex 3. The support of a multinomial distribution multinom(n; p_1,...,p_m) has size

Ex 4. If n_i/n \rightarrow q_i for each i=1,...,m then

[H(q) is the entropy of the distribution Q=(q_i:i=1,...,m).]

Ex 5. If n_i/n \rightarrow q_i for each i=1,...,m then

The main ideas are now all in place. What follows now are just fiddly details.

We are now more or less done we just need to make the previous result into a LDP upper bound and a lower bound.

Ex 6. [Sanov lower bound] For each open set {\mathcal Q}\subset {\mathcal P}(\{1...m\})


Ex 7. [Sanov upper bound] For each closed set {\mathcal Q}\in {\mathcal P}(\{1,...,m\})


The last two exercises combined prove Sanov’s Theorem. Sanov’s Theorem can be extended two more abstract probability spaces, however we do not pursue such results currently.


Ans 1. This is just a strong law of large numbers.

Ans 2. Obvious from the definition of a multinomial distribution.

Ans 3. There are n points and m-1 dividers. Thus the result holds.

Ans 4. Take logs and apply the Stirling approximation that \log(n!)=n\log n - n + o(n). So,

Ans 5. Note that p_i^{n_i} = e^{nq_i \log p_i}, so combining [2] and [4] gives

Ans 6. Take Q=(q_i : i=1,...,m)\in {\mathcal Q} and take sequence {\mathbf n} such that n_i/n \rightarrow q_i for each i. Since {\mathcal Q} is open, eventually {\mathbf n}/n \in {\mathcal Q}. So

Maximising over Q\in {\mathcal Q} gives the result.

Ans 7. By [3]

Note that the combinatorial term is a polynomial in n (so goes to 0 after taking logs and dividing by n). Let Q^*_n be the distribution achieving the supreme on the right hand side above. The set {\mathcal Q} is compact. Taking an appropriate subsequence if required, Q^*_n \rightarrow Q^* for some Q^* \in {\mathcal Q}. Taking logs limits

The equality above is [5].

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: