Sanov’s Theorem asks how likely it is that the empirical distribution of some IIDRVs is far from their true distribution, and it shows that the relative entropy determines the likelihood of being far.
Let $X_1, X_2, \dots$ be IIDRVs on some finite set $\mathcal{X}$ with probability distribution $\mu = (\mu_x : x \in \mathcal{X})$, and let $L_n$ be the empirical distribution of $X_1, \dots, X_n$, i.e.

$$L_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}[X_i = x], \qquad x \in \mathcal{X}.$$
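The definition above can be sketched in code (a minimal illustration; the alphabet, the distribution `mu`, and all names below are my choices, not from the text):

```python
# A minimal sketch of the empirical distribution L_n of IID samples.
import random
from collections import Counter

def empirical_distribution(samples, alphabet):
    """Return L_n(x) = (1/n) * #{i : X_i = x} for each x in the alphabet."""
    counts = Counter(samples)
    n = len(samples)
    return {x: counts[x] / n for x in alphabet}

random.seed(0)
alphabet = ["a", "b", "c"]
mu = [0.5, 0.3, 0.2]                       # illustrative true distribution
samples = random.choices(alphabet, weights=mu, k=10_000)
L_n = empirical_distribution(samples, alphabet)
# By Ex 1 below, L_n should be close to mu for large n.
```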
Ex 1. Show that, almost surely, $L_n \rightarrow \mu$ as $n \rightarrow \infty$.
Just like Cramér’s Theorem, we wish to get a large deviations view of this law of large numbers convergence. This is Sanov’s Theorem:
Thrm. [Sanov’s Theorem – finite alphabets] The empirical distributions $(L_n : n \in \mathbb{N})$ satisfy a large deviations principle with rate function $D(\nu \| \mu) = \sum_{x \in \mathcal{X}} \nu_x \log \frac{\nu_x}{\mu_x}$, the relative entropy; that is, for each open set $G \subseteq \mathcal{P}(\mathcal{X})$ and each closed set $F \subseteq \mathcal{P}(\mathcal{X})$,

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in G) \geq -\inf_{\nu \in G} D(\nu \| \mu) \quad\text{and}\quad \limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in F) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$
Ex 2. Convince yourself that $nL_n$ is a multinomial distribution with $n$ trials and parameters $\mu$: i.e. for $\nu \in \mathcal{P}(\mathcal{X})$ with $n\nu_x \in \mathbb{Z}_{\geq 0}$ for each $x \in \mathcal{X}$,

$$\mathbb{P}(L_n = \nu) = \frac{n!}{\prod_{x \in \mathcal{X}} (n\nu_x)!} \prod_{x \in \mathcal{X}} \mu_x^{n\nu_x}.$$
Ex 3. The support of a multinomial distribution with $n$ trials on $\mathcal{X}$ has size

$$\binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1}.$$
Ex 4. If $n\nu_x \in \mathbb{Z}_{\geq 0}$ for each $x \in \mathcal{X}$ then

$$\frac{1}{n} \log \frac{n!}{\prod_{x \in \mathcal{X}} (n\nu_x)!} = H(\nu) + O\left(\frac{\log n}{n}\right).$$
[$H(\nu) = -\sum_{x \in \mathcal{X}} \nu_x \log \nu_x$ is the entropy of the distribution $\nu$.]
Ex 5. If $n\nu_x \in \mathbb{Z}_{\geq 0}$ for each $x \in \mathcal{X}$ then

$$\frac{1}{n} \log \mathbb{P}(L_n = \nu) = -D(\nu \| \mu) + O\left(\frac{\log n}{n}\right).$$
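This estimate can be sanity-checked numerically. The sketch below (the parameters $n$, $\mu$, $\nu$ and the helper names are my choices) compares the exact probability of one type with the standard method-of-types bounds $(n+1)^{-|\mathcal{X}|} e^{-nD(\nu\|\mu)} \leq \mathbb{P}(L_n = \nu) \leq e^{-nD(\nu\|\mu)}$, which give the $O(\log n / n)$ error term:

```python
# Sketch: check the method-of-types bounds for a single type nu.
from math import exp, factorial, log, prod

def multinomial_prob(counts, mu):
    """Exact P(L_n = nu), where counts = (n * nu_x : x) and mu is the true law."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for c, p in zip(counts, mu))

def relative_entropy(nu, mu):
    """D(nu || mu) = sum_x nu_x log(nu_x / mu_x), with the convention 0 log 0 = 0."""
    return sum(q * log(q / p) for q, p in zip(nu, mu) if q > 0)

n = 30
counts = (15, 9, 6)                 # n * nu with nu = (0.5, 0.3, 0.2)
mu = (0.4, 0.4, 0.2)                # illustrative true distribution
nu = tuple(c / n for c in counts)
p = multinomial_prob(counts, mu)
D = relative_entropy(nu, mu)
lower = (n + 1) ** (-len(mu)) * exp(-n * D)
upper = exp(-n * D)
assert lower <= p <= upper
```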
The main ideas are now all in place; what follows are just fiddly details. We are now more or less done: we just need to turn the previous result into an LDP lower bound and upper bound.
Ex 6. [Sanov lower bound] For each open set $G \subseteq \mathcal{P}(\mathcal{X})$,

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in G) \geq -\inf_{\nu \in G} D(\nu \| \mu).$$
Ex 7. [Sanov upper bound] For each closed set $F \subseteq \mathcal{P}(\mathcal{X})$,

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in F) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$
The last two exercises combined prove Sanov’s Theorem. Sanov’s Theorem can be extended to more abstract probability spaces; however, we do not pursue such results here.
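To see the theorem in action, here is a small numerical sketch (the choice of $\mu$, the set $A$, and all names are mine): with $\mu$ uniform on two symbols and $A = \{\nu : \nu(\text{heads}) \geq 0.7\}$, Sanov’s Theorem predicts $\frac{1}{n}\log\mathbb{P}(L_n \in A) \rightarrow -D((0.7,0.3)\|(0.5,0.5))$.

```python
# Sketch: exact computation of (1/n) log P(L_n in A) for a fair coin and
# A = {nu : nu(heads) >= 0.7}; everything here is an illustrative choice.
from math import ceil, comb, log

def empirical_rate(n, a=0.7):
    """(1/n) log P(Binomial(n, 1/2) >= a*n), computed with exact integers."""
    total = sum(comb(n, k) for k in range(ceil(a * n), n + 1))
    return (log(total) - n * log(2)) / n

def relative_entropy(nu, mu):
    """D(nu || mu) with the convention 0 log 0 = 0."""
    return sum(q * log(q / p) for q, p in zip(nu, mu) if q > 0)

D = relative_entropy((0.7, 0.3), (0.5, 0.5))   # the Sanov rate for this A
# empirical_rate(n) approaches -D as n grows, with an O(log n / n) gap.
```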
Ans 1. This is just the strong law of large numbers: for each $x \in \mathcal{X}$, $L_n(x) = \frac{1}{n} \sum_{i=1}^n \mathbb{I}[X_i = x] \rightarrow \mathbb{E}\,\mathbb{I}[X_1 = x] = \mu_x$ almost surely.
Ans 2. This is immediate from the definition of a multinomial distribution: each sequence $(x_1, \dots, x_n)$ with counts $(n\nu_x : x \in \mathcal{X})$ has probability $\prod_x \mu_x^{n\nu_x}$, and there are $n! / \prod_x (n\nu_x)!$ such sequences.
Ans 3. This is a stars-and-bars argument: each point of the support corresponds to an arrangement of $n$ points and $|\mathcal{X}| - 1$ dividers in a row. Thus the result holds.
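The stars-and-bars count can be checked by brute force; this small sketch (the names are mine) enumerates all types directly:

```python
# Sketch: brute-force verification of the stars-and-bars count of types.
from itertools import product
from math import comb

def count_types(n, k):
    """Count k-tuples of non-negative integers summing to n (slow but exact)."""
    return sum(1 for t in product(range(n + 1), repeat=k) if sum(t) == n)

for n in range(1, 7):
    for k in range(1, 5):
        assert count_types(n, k) == comb(n + k - 1, k - 1)
```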
Ans 4. Take logs and apply the Stirling approximation that $\log n! = n \log n - n + O(\log n)$. So,

$$\frac{1}{n} \log \frac{n!}{\prod_x (n\nu_x)!} = \frac{1}{n} \Big[ n \log n - n - \sum_x \big( n\nu_x \log(n\nu_x) - n\nu_x \big) \Big] + O\left(\frac{\log n}{n}\right) = -\sum_x \nu_x \log \nu_x + O\left(\frac{\log n}{n}\right).$$
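This convergence is easy to check numerically; in the sketch below (the distribution $\nu$ and the names are my choices), the normalized log multinomial coefficient approaches $H(\nu)$:

```python
# Sketch: (1/n) log( n! / prod_x (n nu_x)! ) -> H(nu) as n grows.
from math import factorial, log

def entropy(nu):
    """H(nu) = -sum_x nu_x log nu_x (natural log)."""
    return -sum(q * log(q) for q in nu if q > 0)

def log_multinomial_rate(n, nu):
    """Assumes each n * nu_x is an integer."""
    counts = [round(n * q) for q in nu]
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return log(coef) / n

nu = (0.5, 0.3, 0.2)   # illustrative distribution
# The gap |log_multinomial_rate(n, nu) - entropy(nu)| shrinks like log(n)/n.
```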
Ans 5. Note that $\frac{1}{n} \log \prod_x \mu_x^{n\nu_x} = \sum_x \nu_x \log \mu_x = -H(\nu) - D(\nu \| \mu)$, so combining Ex 2 and Ex 4 gives

$$\frac{1}{n} \log \mathbb{P}(L_n = \nu) = H(\nu) + \sum_x \nu_x \log \mu_x + O\left(\frac{\log n}{n}\right) = -D(\nu \| \mu) + O\left(\frac{\log n}{n}\right).$$
Ans 6. Take $\nu \in G$ and take a sequence $(\nu_n)$ of possible values of $L_n$ (i.e. with $n\nu_n(x) \in \mathbb{Z}_{\geq 0}$ for each $x$) such that $\nu_n \rightarrow \nu$. Since $G$ is open, eventually $\nu_n \in G$. So, by Ex 5 and the continuity of $\nu \mapsto D(\nu \| \mu)$,

$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in G) \geq \liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n = \nu_n) = -D(\nu \| \mu).$$

Maximising over $\nu \in G$ gives the result.
Ans 7. By Ex 3 and Ex 5,

$$\mathbb{P}(L_n \in F) = \sum_{\nu \in F \cap \mathcal{P}_n} \mathbb{P}(L_n = \nu) \leq \binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} e^{O(\log n)} \sup_{\nu \in F \cap \mathcal{P}_n} e^{-n D(\nu \| \mu)},$$

where $\mathcal{P}_n$ denotes the set of possible values of $L_n$.

Note that the combinatorial term is a polynomial in $n$ (so it goes to $0$ after taking logs and dividing by $n$). Let $\nu_n$ be the distribution achieving the supremum on the right-hand side above. The set $F$ is compact. Taking an appropriate subsequence if required, $\nu_n \rightarrow \nu^*$ for some $\nu^* \in F$. Taking log limits,

$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(L_n \in F) \leq \limsup_{n \to \infty} \big( -D(\nu_n \| \mu) \big) = -D(\nu^* \| \mu) \leq -\inf_{\nu \in F} D(\nu \| \mu).$$

The equality above is by the continuity of $\nu \mapsto D(\nu \| \mu)$.