# Talagrand’s Concentration Inequality

We prove a powerful inequality which provides very tight Gaussian tail bounds "$e^{-ct^2}$" for probabilities on product state spaces $\Omega^n$. Talagrand's inequality has found many applications in probability and combinatorial optimization and, when one can apply it, it generally outperforms martingale inequalities such as Azuma-Hoeffding.

Theorem [Talagrand's Concentration Inequality] For $A$ a measurable subset of the product space $\Omega^n$, equipped with a product probability measure $\mathbb P$,

$$\mathbb E\left[ e^{d(x,A)^2/4} \right] \leq \frac{1}{\mathbb P(A)},$$

and, consequently, by Markov's inequality,

$$\mathbb P\big( d(x,A) \geq t \big) \leq \frac{e^{-t^2/4}}{\mathbb P(A)}.$$

Here, $d(x,A)$ is Talagrand's convex distance, which has the following two equivalent characterizations:

$$d(x,A) = \sup_{\|\alpha\|_2\leq 1}\, \inf_{y\in A}\, \sum_{k=1}^n \alpha_k\, \mathbb I[x_k\neq y_k] = \min_{v\in \text{conv}(V(x,A))} \|v\|_2,$$

where, $V(x,A)=\{ u\in\{0,1\}^n : u_k\geq \mathbb I[x_k\neq y_k], \text{ for }k=1,\dots,n,\text{ for some }y\in A\}$.

Proof. The duality between the two characterizations of $d(x,A)$ is proved in the appendix to this section.

We will prove the inequality by induction on $n$. Note for the case $n=1$, $d(x,A)=\mathbb I[x\notin A]$, so the result holds since $e^{1/4}(1-\mathbb P(A)) + \mathbb P(A) \leq \mathbb P(A)^{-1}$.1
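As a quick numerical sanity check (not part of the proof), we can verify the base-case inequality $e^{1/4}(1-p) + p \leq p^{-1}$ on a grid of values $p\in(0,1]$:

```python
import math

def base_case_gap(p):
    """Return RHS - LHS of the n=1 inequality: 1/p - (e^{1/4}(1-p) + p)."""
    lhs = math.exp(0.25) * (1 - p) + p
    rhs = 1.0 / p
    return rhs - lhs

# The gap should be nonnegative on (0, 1], with equality exactly at p = 1.
gaps = [base_case_gap(p / 1000) for p in range(1, 1001)]
print(min(gaps))           # nonnegative
print(base_case_gap(1.0))  # → 0.0 (equality at P(A) = 1)
```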

Now, assuming our induction hypothesis up to $n$, we prove the result for the $n+1$ case. We do so by proving the following claim.

Claim: For $z=(w,x)\in\Omega^{n+1}$, $w\in\Omega$ and $x\in\Omega^{n}$, and for each $\lambda\in[0,1]$,

$$d(z,A)^2 \leq \lambda\, d(x,A(w))^2 + (1-\lambda)\, d(x,\tilde{A})^2 + (1-\lambda)^2,$$

where,

$$A(w) = \{ y\in\Omega^{n} : (w,y)\in A \} \qquad\text{and}\qquad \tilde{A} = \{ y\in\Omega^{n} : (w',y)\in A,\text{ for some } w'\in\Omega \}.$$

• In the claimed inequality, we overestimate $d(z,A)$ by two distances on $\Omega^n$: the first, a refined distance using information about $w$, $d(x,A(w))$; and the second, a cruder distance using only information about $x$ and $A$, $d(x,\tilde{A})$. Note that $A(w)=\tilde{A}$ only for rectangular (product) sets.

Proof of claim: We prove the bound by considering convex combinations of points in $V(x,A(w))$ and $V(x,\tilde{A})$. Note if $v_w$ is a point in $V(x,A(w))$ then there exists a $y\in A(w)$ such that the $k$-th coordinate of $v_w$ is at least $\mathbb I[x_k\neq y_k]$ for each $k=1,\dots,n$; for the extra coordinate (the $w$-coordinate of $z$) we have $0 \geq \mathbb I[w\neq w] = 0$, since $(w,y)\in A$. So, if $v_w\in V(x,A(w))$ then $(v_w,0)\in V(z,A)$. By a similar easy argument, we see that if $\tilde{v}\in V(x,\tilde{A})$ then $(\tilde{v},1)\in V(z,A)$. These inclusions extend to convex combinations: $\lambda(v_w,0) + (1-\lambda)(\tilde{v},1) \in \text{conv}(V(z,A))$ for $\lambda\in[0,1]$. Thus we see that if $v_w\in V(x,A(w))$ and $\tilde{v}\in V(x,\tilde{A})$ then

$$d(z,A)^2 \leq \big\| \lambda (v_w,0) + (1-\lambda)(\tilde{v},1) \big\|_2^2.$$

Also, using convexity of $||\cdot||_2^2$,

$$\big\| \lambda (v_w,0) + (1-\lambda)(\tilde{v},1) \big\|_2^2 \leq \lambda\, \|v_w\|_2^2 + (1-\lambda)\, \|\tilde{v}\|_2^2 + (1-\lambda)^2.$$

Optimizing over $v_w \in \text{conv}(V(x,A(w)))$ and $\tilde{v} \in \text{conv}(V(x,\tilde{A}))$ (the inclusions above extend to the convex hulls by linearity), as required, we get

$$d(z,A)^2 \leq \lambda\, d(x,A(w))^2 + (1-\lambda)\, d(x,\tilde{A})^2 + (1-\lambda)^2.$$

This completes the proof of our claim and we continue with the proof of Talagrand’s Inequality.

Continuing to write $z=(w,x)\in \Omega\times \Omega^n$ and applying the claim, we have for the conditional expectation that

$$\begin{aligned}
\mathbb E\left[ e^{d(z,A)^2/4} \,\middle|\, w \right]
&\leq e^{(1-\lambda)^2/4}\, \mathbb E\left[ \left( e^{d(x,A(w))^2/4} \right)^{\lambda} \left( e^{d(x,\tilde{A})^2/4} \right)^{1-\lambda} \right] \\
&\leq e^{(1-\lambda)^2/4}\, \mathbb E\left[ e^{d(x,A(w))^2/4} \right]^{\lambda}\, \mathbb E\left[ e^{d(x,\tilde{A})^2/4} \right]^{1-\lambda} \\
&\leq e^{(1-\lambda)^2/4}\, \mathbb P(A(w))^{-\lambda}\, \mathbb P(\tilde{A})^{-(1-\lambda)} \\
&= \frac{1}{\mathbb P(\tilde{A})}\, e^{(1-\lambda)^2/4} \left( \frac{\mathbb P(A(w))}{\mathbb P(\tilde{A})} \right)^{-\lambda} \\
&\leq \frac{1}{\mathbb P(\tilde{A})} \left( 2 - \frac{\mathbb P(A(w))}{\mathbb P(\tilde{A})} \right).
\end{aligned}$$

In the first inequality, we apply the claim; in the second inequality, we apply Hölder's inequality; in the third inequality, we apply our induction hypothesis (on $\Omega^n$, to both $A(w)$ and $\tilde{A}$); in the fourth inequality, after taking the infimum over $\lambda\in[0,1]$, we use the fact that $\inf_{0 \leq \lambda \leq 1} r^{-\lambda}e^{(1-\lambda)^2/4} \leq 2-r$ for $r = \mathbb P(A(w))/\mathbb P(\tilde{A}) \leq 1$, which we prove in the appendix to this section.

Now taking expectations over $w$, noting $\mathbb E_w\big[\mathbb P(A(w))\big] = \mathbb P(A)$, and observing $x(2-x)\leq 1$, we gain the required result:

$$\mathbb E\left[ e^{d(z,A)^2/4} \right] \leq \frac{1}{\mathbb P(\tilde{A})} \left( 2 - \frac{\mathbb P(A)}{\mathbb P(\tilde{A})} \right) = \frac{1}{\mathbb P(A)} \cdot \frac{\mathbb P(A)}{\mathbb P(\tilde{A})} \left( 2 - \frac{\mathbb P(A)}{\mathbb P(\tilde{A})} \right) \leq \frac{1}{\mathbb P(A)}.$$

$\square$

## Applications

Let's informally discuss the application of Talagrand's Inequality and then give a few examples of how to apply the result. First we recall that the Hamming distance is $d_H(x,y) =\sum_{i=1}^n \mathbb I[x_i\neq y_i]$, so Talagrand considers a weighted Hamming distance from the set of $y\in A$, $d_{\alpha,H}(x,A):=\inf_{y\in A} \sum_{i=1}^n \alpha_i\mathbb I[x_i \neq y_i]$. A result of a similar form was known to hold prior to Talagrand's, namely, for any fixed $\alpha$ with $\|\alpha\|_2 \leq 1$,2

$$\mathbb P(A)\, \mathbb P\big( d_{\alpha,H}(x,A) \geq t \big) \leq e^{-t^2/2}.$$

However, the bound is vastly improved by allowing us to optimize over $\alpha$ given $x$: from an event $A=\{ F(x)\leq s\}$ it is far easier to deduce that for some $s'$, $\{F(x) \geq s'\} \subset \{ \sup_\alpha d_{\alpha,H} (x,A) \geq t\}$. In particular, we are in a strong position if we can show a bound of one of the forms

$$F(y) \leq F(x) + \sum_{i=1}^n \alpha^x_i\, \mathbb I[x_i\neq y_i] \quad \text{for all } y, \qquad\text{or}\qquad F(x) \leq F(y) + \sum_{i=1}^n \alpha^x_i\, \mathbb I[x_i\neq y_i] \quad \text{for all } y.$$

Note, by replacing $F$ with $-F$ (multiplying by $-1$), the two bounds are exchanged, so they are essentially equivalent. In both cases, we imagine that, if making a change to some of the components of $x$ gives $y$, then we have a way of estimating the change in $F(x)$ for each component changed. Note the (left) bound above implies, for any set $A$,

$$d(x,A) \geq \frac{1}{\|\alpha^x\|_2}\, \inf_{y\in A} \sum_{i=1}^n \alpha^x_i\, \mathbb I[x_i\neq y_i] \geq \frac{\inf_{y\in A} F(y) - F(x)}{\|\alpha^x\|_2}.$$

Thus if we can bound $||\alpha^x||_2$ above by $a$ (the smaller the better) and we take an event $A=\{ F(x) \geq s\}$, then $d(x,A) \geq (s-F(x))\, a^{-1}$. Thus, for this choice of $A$, $F(x) \leq s'$ implies $d(x,A) \geq (s-s')\, a^{-1}$. Thus applying Talagrand gives

$$\mathbb P\big( F(x) \leq s' \big)\, \mathbb P\big( F(x) \geq s \big) \leq e^{-\frac{(s-s')^2}{4a^2}}.$$

For the median, $m$, inspecting the above bound with $s=m$ and $s'=m-t$ and then with $s=m+t$ and $s'=m$ (using that $\mathbb P(F(x)\geq m)$ and $\mathbb P(F(x)\leq m)$ are both at least $1/2$) yields the bound

$$\mathbb P\big( |F(x) - m| \geq t \big) \leq 4\, e^{-\frac{t^2}{4a^2}}.$$

At this point, we have a pretty convincing concentration bound about the median of the distribution of $F(x)$.
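As a toy illustration of this recipe (the function $F$ and all parameters here are our own choices, not from the text), take $F(x) = \frac{1}{\sqrt n}\sum_i x_i$ for $x$ uniform on $[0,1]^n$. Changing one coordinate changes $F$ by at most $1/\sqrt n$, so we may take $\alpha^x_i = 1/\sqrt n$ and $a = \|\alpha^x\|_2 = 1$, and the bound above predicts $\mathbb P(|F(x)-m|\geq t) \leq 4e^{-t^2/4}$. A minimal Monte Carlo sketch:

```python
import math
import random

def mc_tail_vs_bound(n=100, trials=20000, t=1.0, seed=0):
    """Monte Carlo estimate of P(|F - m| >= t) for F(x) = sum(x)/sqrt(n),
    x uniform on [0,1]^n, against the bound 4*exp(-t^2/4) (here a = 1)."""
    rng = random.Random(seed)
    samples = sorted(sum(rng.random() for _ in range(n)) / math.sqrt(n)
                     for _ in range(trials))
    m = samples[trials // 2]  # empirical median
    tail = sum(1 for f in samples if abs(f - m) >= t) / trials
    return tail, 4 * math.exp(-t * t / 4)

tail, bound = mc_tail_vs_bound()
print(tail, bound)  # the empirical tail sits below the bound
```

At this scale the constants are loose (the bound only becomes nontrivial for larger $t$), but the Gaussian decay in $t$ is the point.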

One might prefer to work with the mean of $F(x)$, $\mu$, rather than the median, $m$. Note, Cantelli's inequality implies that the mean is within one standard deviation of the median (see Lemma [Ineqs:MeanMedian]). We can estimate the variance of $F(x)$ from the above concentration inequality: the second moment about any point dominates the variance, so

$$\text{Var}\big(F(x)\big) \leq \mathbb E\big[ (F(x)-m)^2 \big] = \int_0^\infty 2t\, \mathbb P\big( |F(x)-m| \geq t \big)\, dt \leq \int_0^\infty 8t\, e^{-\frac{t^2}{4a^2}}\, dt = 16 a^2.$$

So, $|\mu - m|\leq 4 a$, which gives $|F(x)-\mu| \leq |F(x)-m| + 4a$, and so, for $t \geq 4a$,

$$\mathbb P\big( |F(x)-\mu| \geq t \big) \leq \mathbb P\big( |F(x)-m| \geq t - 4a \big) \leq 4\, e^{-\frac{(t-4a)^2}{4a^2}}.$$
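The integral evaluation $\int_0^\infty 8t\, e^{-t^2/(4a^2)}\, dt = 16a^2$ behind the estimate $|\mu - m| \leq 4a$ is easy to sanity-check numerically (a throwaway midpoint-rule computation, with $a$ as a parameter):

```python
import math

def variance_integral(a=1.0, T=40.0, steps=500_000):
    """Midpoint-rule approximation of \\int_0^T 8 t exp(-t^2/(4a^2)) dt.
    For T large the tail is negligible and the answer approaches 16 a^2."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += 8 * t * math.exp(-t * t / (4 * a * a)) * h
    return total

approx = variance_integral()
print(approx)  # close to 16 (= 16 a^2 with a = 1)
```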

So once again we get a tail-bound with the same concentration behaviour.

### Longest Increasing Subsequence

Given independent real-valued random variables $X=(X_1,X_2,\dots,X_N)$, we consider the problem of finding the length of the longest increasing subsequence:

$$I(x) = \max\big\{ k : x_{i_1} < x_{i_2} < \cdots < x_{i_k} \text{ for some } i_1 < i_2 < \cdots < i_k \big\}.$$

Comparing two (similar) sequences $x$ and $y$, notice that a longest increasing subsequence of $x$, with index set $S(x)$, can be used as a (long) increasing subsequence of $y$ once we delete the indices where $x$ and $y$ differ. That is,

$$I(y) \geq I(x) - \big| \{ i\in S(x) : x_i \neq y_i \} \big|.$$
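The deletion argument above is easy to test computationally. The sketch below (our own illustration; `lis_length` uses standard patience sorting and `lis_indices` a simple $O(N^2)$ dynamic program) checks $I(y) \geq I(x) - |\{i\in S(x): x_i \neq y_i\}|$ on randomly perturbed pairs:

```python
from bisect import bisect_left
import random

def lis_length(x):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
    for v in x:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def lis_indices(x):
    """Index set of one longest strictly increasing subsequence (O(N^2) DP)."""
    n = len(x)
    best, prev = [1] * n, [-1] * n
    for i in range(n):
        for j in range(i):
            if x[j] < x[i] and best[j] + 1 > best[i]:
                best[i], prev[i] = best[j] + 1, j
    i = max(range(n), key=best.__getitem__)
    out = []
    while i != -1:
        out.append(i)
        i = prev[i]
    return out[::-1]

def deletion_inequality_holds(trials=200, N=12, resample=3, seed=1):
    """Check I(y) >= I(x) - #{i in S(x): x_i != y_i} on perturbed random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.random() for _ in range(N)]
        y = list(x)
        for i in rng.sample(range(N), resample):
            y[i] = rng.random()
        changed = sum(1 for i in lis_indices(x) if x[i] != y[i])
        if lis_length(y) < lis_length(x) - changed:
            return False
    return True

print(deletion_inequality_holds())  # → True
```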

We take $\alpha$ such that $\alpha_i= {I(x)}^{-1/2}$ if $i\in S(x)$ and $\alpha_i=0$ otherwise; note $\|\alpha\|_2 = 1$. If we consider the distance $d_\alpha(x,y) = {I(x)}^{-1/2} \cdot |\{ i\in S(x) : x_i \neq y_i \}|$, then we have

$$I(y) \geq I(x) - \sqrt{I(x)}\, d_\alpha(x,y),$$

which, on the event $A=\{y : I(y) \leq s \}$, gives

$$d(x,A) \geq \inf_{y\in A} d_\alpha(x,y) \geq \frac{I(x)-s}{\sqrt{I(x)}}.$$

Note, given we are working with an event of the form $A=\{ I(x) \leq s\}$, we look for an event $\{I(x)\geq s'\}$ that implies $d(x,A)$ is of a certain size. Since the function $\frac{z-s}{\sqrt{z}}$ is increasing for $z\geq s$, $I(x) \geq s+t$ implies $\frac{I(x)-s}{\sqrt{I(x)}} \geq \frac{t}{\sqrt{t+s}}$. Applying this and then Talagrand's inequality gives

$$\mathbb P\big( I(x) \geq s+t \big)\, \mathbb P\big( I(x) \leq s \big) \leq e^{-\frac{t^2}{4(s+t)}}.$$

For $m$ the median of this distribution, setting $s=m-t$ and then setting $s=m$ (using that $\mathbb P(I(x)\geq m)$ and $\mathbb P(I(x)\leq m)$ are both at least $1/2$) gives the two tail bounds in the following result.

Theorem For $m$ the median of $I(x)$ and $t\geq 0$,

$$\mathbb P\big( I(x) \leq m-t \big) \leq 2\, e^{-\frac{t^2}{4m}} \qquad\text{and}\qquad \mathbb P\big( I(x) \geq m+t \big) \leq 2\, e^{-\frac{t^2}{4(m+t)}}.$$

Note the bounds do not look terribly symmetric but, in order to achieve a guaranteed probability in both bounds, we must choose $t$ of the same order, $O(\sqrt{m})$, in both bounds.
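As an illustrative experiment (our own; all parameters are arbitrary), simulating $I(X)$ for i.i.d. uniforms shows the promised concentration: the median is of order $2\sqrt N$ while the observed spread is only a few units. (The true fluctuation order is $N^{1/6}$, even smaller than the $\sqrt m \sim N^{1/4}$ scale guaranteed by the theorem above.)

```python
from bisect import bisect_left
import random
import statistics

def lis_length(x):
    """Longest strictly increasing subsequence via patience sorting."""
    tails = []
    for v in x:
        i = bisect_left(tails, v)
        if i == len(tails):
            tails.append(v)
        else:
            tails[i] = v
    return len(tails)

def lis_experiment(N=1000, trials=200, seed=42):
    """Median and standard deviation of I(X) over repeated uniform samples."""
    rng = random.Random(seed)
    vals = [lis_length([rng.random() for _ in range(N)]) for _ in range(trials)]
    return statistics.median(vals), statistics.pstdev(vals)

med, sd = lis_experiment()
print(med, sd)  # median is O(sqrt(N)); the spread is only a few units
```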

# Appendix

We first show the equality of our primal-dual characterization for $d(x,A)$. Since the objective $\langle \alpha, v\rangle$ is bilinear, $\text{conv}(V(x,A))$ is convex and closed, and the ball $\{\alpha : \|\alpha\|_2\leq 1\}$ is convex and compact, a minimax theorem allows us to exchange the order of optimization:

$$\min_{v\in \text{conv}(V(x,A))} \|v\|_2 = \min_{v\in \text{conv}(V(x,A))}\, \sup_{\|\alpha\|_2\leq 1} \langle \alpha, v\rangle = \sup_{\|\alpha\|_2\leq 1}\, \min_{v\in \text{conv}(V(x,A))} \langle \alpha, v\rangle = \sup_{\|\alpha\|_2\leq 1}\, \inf_{y\in A} \sum_{k=1}^n \alpha_k\, \mathbb I[x_k\neq y_k].$$

For the last equality: both suprema are attained at $\alpha$ with nonnegative entries (zeroing a negative entry of $\alpha$ never decreases either inner minimum and keeps $\|\alpha\|_2\leq 1$), a linear function on $\text{conv}(V(x,A))$ attains its minimum at a point of $V(x,A)$, and for $\alpha\geq 0$ the minimum of $\langle \alpha, u\rangle$ over $u\in V(x,A)$ is exactly $\inf_{y\in A}\sum_k \alpha_k\, \mathbb I[x_k\neq y_k]$.

Lemma: For $0\leq r\leq 1$,

$$\inf_{0\leq \lambda\leq 1} r^{-\lambda}\, e^{(1-\lambda)^2/4} \leq 2-r.$$

Proof. First, if $r \leq e^{-1/2}$, then taking $\lambda=0$ the left-hand side is at most $e^{1/4} \leq 2 - e^{-1/2} \leq 2-r$. So we may assume $e^{-1/2}\leq r\leq 1$. Taking logarithms, it is enough to show that

$$\inf_{0\leq\lambda\leq 1}\left\{ -\lambda\log r + \frac{(1-\lambda)^2}{4} \right\} \leq \log(2-r).$$

The left-hand side is minimized by $\lambda= 1+2\log(r)$, which lies in $[0,1]$ precisely when $e^{-1/2}\leq r\leq 1$. Thus, after substituting, we are required to show

$$-\log r - (\log r)^2 \leq \log(2-r).$$

Equivalently, setting $g(r) := \log(2-r) + \log r + (\log r)^2$, we must show $g(r)\geq 0$. Since $g(1)=0$, we can express this inequality as an integral:

$$g(r) = -\int_r^1 g'(x)\, dx = -\int_r^1 \frac{2}{x(2-x)}\Big[ 2\log x - x + 1 - x\log x \Big]\, dx.$$

The term in the square brackets is negative for $0<x<1$: $2\log x-x+1$ is concave and $x\log x$ is convex, and the two are equal with equal derivatives at $x=1$, so for $x\neq 1$ we have $2\log x-x+1< x\log x$. Hence $g(r)\geq 0$ for $r\leq 1$, as required. This completes the lemma. $\square$
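Both inequalities in this lemma are easy to check numerically; the following throwaway script verifies $\inf_{0\leq\lambda\leq1} r^{-\lambda}e^{(1-\lambda)^2/4} \leq 2-r$ on a grid of $r\in(0,1]$, along with the nonpositivity of the bracketed term $2\log x - x + 1 - x\log x$:

```python
import math

def lemma_gap(r, grid=2000):
    """RHS - LHS of  inf_{0<=lam<=1} r^{-lam} e^{(1-lam)^2/4} <= 2 - r,
    with the infimum approximated over a grid of lambda values."""
    lhs = min(r ** (-lam / grid) * math.exp((1 - lam / grid) ** 2 / 4)
              for lam in range(grid + 1))
    return (2 - r) - lhs

# Gap should be nonnegative on (0, 1] (equality at r = 1); the bracket should
# be nonpositive on (0, 1] (equality at x = 1).
gaps = [lemma_gap(r / 200) for r in range(1, 201)]
brackets = [2 * math.log(x / 200) - x / 200 + 1 - (x / 200) * math.log(x / 200)
            for x in range(1, 201)]
print(min(gaps), max(brackets))
```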

1. Hint: after multiplying both sides by $\mathbb P(A)$, the left-hand side, $\mathbb P(A)^2 + e^{1/4}\,\mathbb P(A)(1-\mathbb P(A))$, is increasing in $\mathbb P(A)$ on $[0,1]$ and equals $1$ at $\mathbb P(A)=1$.
2. See Theorem 3.1 of McDiarmid, Colin. "Concentration." Probabilistic Methods for Algorithmic Discrete Mathematics. Springer, Berlin, Heidelberg, 1998. 195-248.