Sanov’s Theorem asks how likely it is that the empirical distribution of some IID random variables is far from the true distribution, and shows that the relative entropy determines the probability of being far.
Let $X_1, X_2, \ldots$ be IID random variables on some finite set $\mathcal{X}$ with a probability distribution $\mu = (\mu(x) : x \in \mathcal{X})$, and let $\hat\mu_n$ be the empirical distribution of $X_1, \ldots, X_n$, i.e.

$$\hat\mu_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}[X_i = x], \qquad x \in \mathcal{X}.$$
Ex 1. Show that, almost surely, $\hat\mu_n(x) \to \mu(x)$ for each $x \in \mathcal{X}$, i.e. $\hat\mu_n \to \mu$.
Just like Cramér’s Theorem, we wish to get a large deviations view of this law of large numbers convergence. This is Sanov’s Theorem:
Thrm. [Sanov’s Theorem – finite alphabets] The empirical distributions $\hat\mu_n$ satisfy a large deviation principle with rate given by the relative entropy
$$D(\nu \| \mu) = \sum_{x \in \mathcal{X}} \nu(x) \log \frac{\nu(x)}{\mu(x)} \, ;$$
that is, for every open set $G$ and every closed set $F$ of probability distributions on $\mathcal{X}$,
$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in G) \geq -\inf_{\nu \in G} D(\nu \| \mu), \qquad \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in F) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$
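To get a feel for the statement, here is a minimal numerical sketch (the two-letter alphabet and the parameters $p = 0.3$, $a = 0.6$ are purely illustrative choices). It computes $\mathbb{P}(\hat\mu_n(1) \geq a)$ exactly from the binomial distribution and compares the observed decay rate with the relative entropy predicted by the theorem.

```python
from math import lgamma, log, exp, ceil

def relative_entropy(q, p):
    """D(Ber(q) || Ber(p)) in nats."""
    return q * log(q / p) + (1 - q) * log((1 - q) / (1 - p))

def log_binom_pmf(n, k, p):
    """log P(Binomial(n, p) = k), computed in log space to avoid underflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def log_tail_prob(n, p, a):
    """log P(hat_mu_n(1) >= a) for n IID Bernoulli(p) samples, via log-sum-exp."""
    logs = [log_binom_pmf(n, k, p) for k in range(ceil(a * n), n + 1)]
    m = max(logs)
    return m + log(sum(exp(l - m) for l in logs))

p, a = 0.3, 0.6                  # true parameter and the "far away" threshold
# Since D(Ber(q) || Ber(p)) increases in q for q >= p, the infimum of the rate
# over the closed set {nu : nu(1) >= a} is attained at q = a.
rate = relative_entropy(a, p)
for n in (10, 100, 1000, 10000):
    print(n, -log_tail_prob(n, p, a) / n, rate)
# The observed rate -log(P)/n approaches D(Ber(0.6) || Ber(0.3)) ≈ 0.192 as n grows.
```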
Ex 2. Convince yourself that $n\hat\mu_n = (n\hat\mu_n(x) : x \in \mathcal{X})$ has a multinomial distribution with $n$ trials and parameters $(\mu(x) : x \in \mathcal{X})$: i.e. for $\nu = (\nu(x) : x \in \mathcal{X})$ with $n\nu(x) \in \mathbb{Z}_+$ for each $x$ and $\sum_x \nu(x) = 1$,
$$\mathbb{P}(\hat\mu_n = \nu) = \binom{n}{n\nu(x) : x \in \mathcal{X}} \prod_{x \in \mathcal{X}} \mu(x)^{n\nu(x)}.$$
Ex 3. The support of the multinomial distribution above has size
$$\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}.$$
Ex 4. If $n\nu(x) \in \mathbb{Z}_+$ for each $x \in \mathcal{X}$, then
$$\frac{1}{n} \log \binom{n}{n\nu(x) : x \in \mathcal{X}} = H(\nu) + O\!\left(\frac{\log n}{n}\right).$$
[$H(\nu) := -\sum_{x \in \mathcal{X}} \nu(x) \log \nu(x)$ is the entropy of the distribution $\nu$.]
Ex 5. If $n\nu(x) \in \mathbb{Z}_+$ for each $x \in \mathcal{X}$, then
$$\frac{1}{n} \log \mathbb{P}(\hat\mu_n = \nu) = -D(\nu \| \mu) + O\!\left(\frac{\log n}{n}\right).$$
The main ideas are now all in place; what follows are just fiddly details. We are now more or less done: we just need to turn the previous estimate into an LDP upper bound and lower bound.
Ex 6. [Sanov lower bound] For each open set $G$ of probability distributions on $\mathcal{X}$,
$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in G) \geq -\inf_{\nu \in G} D(\nu \| \mu).$$
Ex 7. [Sanov upper bound] For each closed set $F$ of probability distributions on $\mathcal{X}$,
$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in F) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$
The last two exercises combined prove Sanov’s Theorem. Sanov’s Theorem can be extended to more abstract probability spaces; however, we do not pursue such results here.
Answers
Ans 1. This is just the strong law of large numbers applied, for each $x \in \mathcal{X}$, to the IID indicators $\mathbb{I}[X_i = x]$.
Ans 2. Obvious from the definition of a multinomial distribution.
Ans 3. This is a stars-and-bars count: each point of the support corresponds to an arrangement of $n$ points and $|\mathcal{X}| - 1$ dividers, and there are $\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}$ such arrangements. Thus the result holds.
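For instance, with a binary alphabet the count is easy to check directly:
$$|\mathcal{X}| = 2: \qquad \hat\mu_n \in \left\{ \left(\tfrac{k}{n},\, 1 - \tfrac{k}{n}\right) : k = 0, 1, \ldots, n \right\}, \qquad \binom{n + 2 - 1}{2 - 1} = n + 1.$$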
Ans 4. Take logs and apply the Stirling approximation that $\log k! = k \log k - k + O(\log k)$. So,
$$\frac{1}{n} \log \binom{n}{n\nu(x) : x \in \mathcal{X}} = \frac{1}{n} \Big( \log n! - \sum_{x \in \mathcal{X}} \log (n\nu(x))! \Big) = \frac{1}{n} \Big( n \log n - \sum_{x \in \mathcal{X}} n\nu(x) \log (n\nu(x)) \Big) + O\!\left(\frac{\log n}{n}\right) = -\sum_{x \in \mathcal{X}} \nu(x) \log \nu(x) + O\!\left(\frac{\log n}{n}\right) = H(\nu) + O\!\left(\frac{\log n}{n}\right).$$
(The $-k$ terms from Stirling cancel, since $\sum_{x} n\nu(x) = n$.)
Ans 5. Note that $\frac{1}{n} \log \prod_{x \in \mathcal{X}} \mu(x)^{n\nu(x)} = \sum_{x \in \mathcal{X}} \nu(x) \log \mu(x)$, so combining [2] and [4] gives
$$\frac{1}{n} \log \mathbb{P}(\hat\mu_n = \nu) = H(\nu) + \sum_{x \in \mathcal{X}} \nu(x) \log \mu(x) + O\!\left(\frac{\log n}{n}\right) = -\sum_{x \in \mathcal{X}} \nu(x) \log \frac{\nu(x)}{\mu(x)} + O\!\left(\frac{\log n}{n}\right) = -D(\nu \| \mu) + O\!\left(\frac{\log n}{n}\right).$$
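This limit is easy to check numerically; here is a minimal sketch (the three-letter alphabet and the particular $\mu$, $\nu$ are illustrative choices) that evaluates the exact multinomial probability of a fixed type and compares $\frac{1}{n}\log\mathbb{P}(\hat\mu_n = \nu)$ with $-D(\nu\|\mu)$.

```python
from math import lgamma, log

def log_multinomial_pmf(counts, mu):
    """log P(hat_mu_n = counts / n): the exact multinomial pmf, in log space."""
    n = sum(counts)
    return lgamma(n + 1) + sum(-lgamma(k + 1) + k * log(m)
                               for k, m in zip(counts, mu))

def relative_entropy(nu, mu):
    """D(nu || mu) = sum_x nu(x) log(nu(x) / mu(x)), with the convention 0 log 0 = 0."""
    return sum(q * log(q / p) for q, p in zip(nu, mu) if q > 0)

mu = (0.5, 0.3, 0.2)     # true distribution on a three-letter alphabet
nu = (0.2, 0.4, 0.4)     # a fixed type; n * nu(x) is an integer for each n below
for n in (10, 100, 1000, 10000):
    counts = [round(n * q) for q in nu]
    print(n, log_multinomial_pmf(counts, mu) / n, -relative_entropy(nu, mu))
# (1/n) log P(hat_mu_n = nu) approaches -D(nu || mu) ≈ -0.209, with an O(log n / n) gap.
```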
Ans 6. Take $\nu \in G$ and take a sequence of distributions $\nu_n$ with $n\nu_n(x) \in \mathbb{Z}_+$ for each $x \in \mathcal{X}$ and $\nu_n \to \nu$. Since $G$ is open, eventually $\nu_n \in G$. So, by [5],
$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in G) \geq \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n = \nu_n) = -D(\nu \| \mu).$$
Maximising over $\nu \in G$ gives the result.
Ans 7. By [3],
$$\mathbb{P}(\hat\mu_n \in F) \leq \binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} \sup_{\nu \in F \,:\, n\nu(x) \in \mathbb{Z}_+} \mathbb{P}(\hat\mu_n = \nu).$$
Note that the combinatorial term is a polynomial in $n$ (so its contribution goes to $0$ after taking logs and dividing by $n$). Let $\nu_n$ be the distribution achieving the supremum on the right-hand side above. The set $F$ is compact, being a closed subset of the probability simplex on $\mathcal{X}$. Taking an appropriate subsequence if required, $\nu_n \to \nu^*$ for some $\nu^* \in F$. Taking logs and limits,
$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n \in F) \leq \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(\hat\mu_n = \nu_n) = -D(\nu^* \| \mu) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$
The equality above is [5].