We prove a powerful inequality which provides very tight Gaussian tail bounds “ $e^{-ct^2}$ ” for probabilities on product state spaces $\Omega^n$ . Talagrand’s Inequality has found many applications in probability and combinatorial optimization and, when it can be applied, it generally outperforms inequalities like Azuma-Hoeffding.

Theorem [Talagrand’s Concentration Inequality] For $A$ a measurable subset of the product space $\Omega^n$ and $\bP$ a product probability measure on $\Omega^n$ ,

$\int e^{\frac{1}{4}d(x,A)^2} \bP(dx) \leq \frac{1}{\bP(A)}$

and, consequently,

$\bP( d(x,A) \geq t )\bP(A) \leq e^{-\frac{t^2}{4}}.$

Here,

$\begin{aligned} d(x,A)&= \sup_{\substack{\alpha\in\bR_+^n\\ ||\alpha||_2\leq 1}} \inf_{y\in A} \sum_{k=1}^n \alpha_k \bI[x_k\neq y_k]= \inf_{v\in < V(x,A)>} ||v||_2 \end{aligned}$

where $V(x,A)=\{ u: u_k\geq \mathbb I[x_k\neq y_k], \text{ for }k=1,\dots,n,\text{ for some }y\in A\}$ and $<V(x,A)>$ denotes its convex hull.
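To make the definition concrete, here is a minimal Python sketch that evaluates the weighted Hamming distance for a fixed $\alpha$ and then takes a crude maximum over a grid of unit-norm weights. The helper names (`weighted_hamming`, `convex_distance_lower_bound`), the grid search and the toy set `A` are illustrative assumptions rather than part of the theorem; the grid only gives a lower bound on $d(x,A)$ and is only feasible for very small $n$.

```python
import math
from itertools import product


def weighted_hamming(x, A, alpha):
    """inf over y in A of sum_k alpha_k * 1[x_k != y_k] (A is a finite list)."""
    return min(
        sum(a for a, xk, yk in zip(alpha, x, y) if xk != yk)
        for y in A
    )


def convex_distance_lower_bound(x, A, grid=20):
    """Lower bound on d(x, A): maximise over a finite grid of nonnegative
    weight vectors, each rescaled to unit Euclidean norm."""
    n = len(x)
    best = 0.0
    for raw in product(range(grid + 1), repeat=n):
        norm = math.sqrt(sum(r * r for r in raw))
        if norm == 0:
            continue  # the zero vector cannot be normalised
        alpha = [r / norm for r in raw]
        best = max(best, weighted_hamming(x, A, alpha))
    return best


# Hypothetical toy example on {0, 1}^3.
x = (0, 0, 0)
A = [(1, 1, 0), (0, 1, 1)]
uniform = [1 / math.sqrt(3)] * 3
print(weighted_hamming(x, A, uniform))    # uniform weights: d_H(x, A) / sqrt(n)
print(convex_distance_lower_bound(x, A))  # grid search over alpha
```

In this toy case the grid contains the weights proportional to $(1,2,1)$, which match the minimum-norm point $(\tfrac12,1,\tfrac12)$ of the convex hull, so the two characterizations of $d(x,A)$ can be compared by hand.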

Proof. The duality between the two characterizations of $d(x,A)$ is given in the appendix to this section.

We will prove the inequality by induction on $n$ . For the case $n=1$ , $d(x,A)=\mathbb I[x\notin A]$ , so the inequality holds since $e^{1/4}(1-\mathbb P(A)) + \mathbb P(A) \leq \mathbb P(A)^{-1}$ .¹
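As a quick sanity check of the base case, the following sketch evaluates both sides on a grid of values $p=\mathbb P(A)$. It assumes only that $d(x,A)=\mathbb I[x\notin A]$ when $n=1$, so that the integral equals $e^{1/4}(1-p)+p$; the function name `base_case_gap` is made up for illustration.

```python
import math


def base_case_gap(p):
    """Right-hand side minus left-hand side of the n = 1 inequality."""
    lhs = math.exp(0.25) * (1 - p) + p  # E exp(d^2 / 4) with d = 1[x not in A]
    rhs = 1 / p
    return rhs - lhs


# The gap should be nonnegative on (0, 1] and vanish at p = 1.
gaps = [base_case_gap(k / 1000) for k in range(1, 1001)]
print(min(gaps))
```

A numerical check like this is no substitute for the one-line proof, but it catches sign errors in the exponent quickly.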

Now, assuming the induction hypothesis holds up to $n$ , we prove the result for the $n+1$ case. We do so by proving the following claim.

Claim: For $z=(w,x)\in\Omega^{n+1}$ , $w\in\Omega$ and $x\in\Omega^{n}$ ,

$d(z,A)^2\leq \lambda d(x,A(w))^2+ (1-\lambda) d(x,\tilde{A})^2 + (1-\lambda)^2, \qquad \forall \lambda \in [0,1],$

where,

$A(w)=\{x'\in \Omega^{n}: (w,x')\in A\} \qquad\text{and}\qquad \tilde{A}= \{ \tilde{x} \in \Omega^n: (\tilde{w},\tilde{x})\in A \text{ for some } \tilde{w} \}.$

In the inequality of the claim, we overestimate $d(z,A)$ by two distances on $\Omega^n$ : the first, $d(x,A(w))$ , is a refined distance using information about $w$ ; the second, $d(x,\tilde{A})$ , is a cruder distance using only information about $x$ and $A$ . Note that $A(w)\subseteq\tilde{A}$ always, with $A(w)=\tilde{A}$ for rectangular sets only.

Proof of claim: We prove the bound by considering convex combinations of points in $V(x,A(w))$ and $V(x,\tilde{A})$ . If $v_w$ is a point in $V(x,A(w))$ then there exists a $y\in A(w)$ such that $(v_w)_k \geq \mathbb I[x_k\neq y_k]$ for each $k=1,...,n$ ; also, since $(w,y)\in A$ , the coordinate corresponding to $w$ satisfies $0\geq \mathbb I[w\neq w]$ . So, if $v_w\in V(x,A(w))$ then $(v_w,0)\in V(z,A)$ . By a similar argument, if $\tilde{v}\in V(x,\tilde{A})$ then $(\tilde{v},1)\in V(z,A)$ , since $1\geq \mathbb I[w\neq \tilde{w}]$ for any $\tilde{w}$ . These inclusions extend to the convex hulls $<V(x,A(w))>$ and $<V(x,\tilde{A})>$ . Thus, if $v_w\in <V(x,A(w))>$ and $\tilde{v}\in <V(x,\tilde{A})>$ then

$v(\lambda)=\lambda (v_w,0)+ (1-\lambda) (\tilde{v},1) \in <V(z,A)>.$

Also, using convexity of $||\cdot||_2^2$ ,

$||v(\lambda)||_2^2= ||\lambda v_w + (1-\lambda) \tilde{v} ||_2^2 + (1-\lambda)^2 \leq \lambda ||v_w||_2^2 + (1-\lambda) ||\tilde{v}||_2^2 + (1-\lambda)^2.$

Optimizing over $<V(z,A)>, <V(x,A(w))>$ and $<V(x,\tilde{A})>$ , as required, we get

$d(z,A)^2\leq \lambda d(x,A(w))^2 + (1-\lambda) d(x,\tilde{A})^2 + (1-\lambda)^2.$

This completes the proof of our claim and we continue with the proof of Talagrand’s Inequality.

Continuing to write $z=(w,x)\in \Omega\times \Omega^n$ and applying the inequality of the claim, we have for the conditional expectation given $w$ that

$\begin{aligned} \int e^{\frac{1}{4} d(z,A)^2} \bP(dx \mid w) & \leq \int e^{\frac{1}{4} \lambda d(x,A(w))^2+ \frac{1}{4}(1-\lambda) d(x,\tilde{A})^2 +\frac{1}{4} (1-\lambda)^2} \bP(dx \mid w) \\ & \leq e^{\frac{1}{4} (1-\lambda)^2}\left( \int e^{\frac{1}{4}d(x,A(w))^2}\bP(dx \mid w) \right)^{\lambda}\left( \int e^{\frac{1}{4}d(x,\tilde{A})^2}\bP(dx \mid w) \right)^{(1-\lambda)}\\ & \leq e^{\frac{1}{4}(1-\lambda)^2} \bP(A(w))^{-\lambda} \bP(\tilde{A})^{-(1-\lambda)}\\ &\leq \frac{1}{\bP(\tilde{A})} \left(2 - \frac{\bP(A(w))}{\bP(\tilde{A})} \right).\end{aligned}$

For the second inequality, we apply Hölder’s Inequality; in the third inequality, we apply our induction hypothesis; in the fourth inequality, we write $\bP(A(w))^{-\lambda} \bP(\tilde{A})^{-(1-\lambda)} = \bP(\tilde{A})^{-1} r^{-\lambda}$ with $r=\bP(A(w))/\bP(\tilde{A})\leq 1$ , optimize over $\lambda$ , and use the fact that $\inf_{0 \leq \lambda \leq 1} r^{-\lambda}e^{(1-\lambda)^2/4} \leq 2-r$ , which we prove in the appendix to this section.

Now taking expectations over $w$ , noting that $\bE\, \bP(A(w)) = \bP(A)$ , and observing $x(2-x)\leq 1$ , we obtain the required result:

$\int e^{\frac{1}{4} d(z,A)^2} \bP(dz) \leq \frac{1}{\bP(A)} \frac{\bP(A)}{\bP(\tilde{A})} \left(2 - \frac{\bP(A)}{\bP(\tilde{A})} \right) \leq \frac{1}{\bP(A)}.$

$\square$

Applications

Let’s informally discuss the application of Talagrand’s Inequality and then give a few examples of how to apply the result. First, recall that the Hamming distance is $d_H(x,y) =\sum_{i=1}^n \mathbb I[x_i\neq y_i]$ , so Talagrand’s Inequality considers a weighted Hamming distance from $x$ to the set $A$ , $d_{\alpha,H}(x,A):=\inf_{y\in A} \sum_{i=1}^n \alpha_i\mathbb I[x_i \neq y_i]$ . A result of a similar form was known prior to Talagrand’s, namely, for a fixed unit vector $\alpha\geq 0$ ,²

$\bP( A) \bP( d_{\alpha,H}(x,A) \geq t) \leq e^{-\frac{t^2}{2}}$

However, the bound is vastly improved by allowing us to optimize over $\alpha$ given $x$ : from an event $A=\{ F(x)\leq s\}$ it is much easier to deduce that, for some $s'$ , $\{F(x) \geq s'\} \subset \{ \sup_\alpha d_{\alpha,H} (x,A) \geq t\}$ . In particular, we are in a strong position if we can show a bound of the form

$F(x) \geq F(y) - \sum_{i=1}^n \alpha_i^x \bI [x_i\neq y_i] \qquad\text{or}\qquad F(x) \leq F(y) + \sum_{i=1}^n \alpha_i^x \bI [x_i\neq y_i].$

Note that, replacing $F$ by $-F$ , the two bounds are equivalent. In both cases, we imagine that, if changing some of the components of $x$ gives $y$ , then we have a way of estimating the change in $F(x)$ for each component changed. Note the (left) bound above implies, after normalizing $\alpha^x$ to have unit norm,

$d(x,A) \geq \inf_{y\in A} \frac{F(y) - F(x)}{||\alpha^x ||_2}.$

Thus if we can bound $||\alpha^x||_2$ above by $a$ , (the smaller the better) and we take an event $A=\{ F(x) \geq s\}$ then $d(x,A) \geq (s-F(x)) a^{-1}$ . Thus, for this choice of $A$ , $F(x) \leq s'$ implies $d(x,A) \geq (s-s') a^{-1}$ . Thus applying Talagrand gives

$\bP( F(x) \geq s) \bP(F(x) \leq s') \leq \bP( F(x) \geq s ) \bP( d(x,\{ F(y) \geq s \}) \geq (s-s')/a ) \leq e^{-\frac{1}{4} \frac{(s-s')^2}{a^2} }$

For a median $m$ of $F(x)$ , applying the above bound with $s=m$ and $s'=m-t$ , and then with $s=m+t$ and $s'=m$ , and using that each of $\bP(F(x)\geq m)$ and $\bP(F(x)\leq m)$ is at least $1/2$ , yields the bound

$\bP(|F(x) - m| \geq t) \leq 4 e^{-\frac{1}{4} \frac{t^2}{a^2} }.$

At this point, we have a pretty convincing concentration bound about the median of the distribution of $F(x)$ .
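To see how the constants enter, here is a small exact check on a hypothetical example that is not taken from the discussion above: $F(x)=\sum_i x_i$ for $n$ fair coin flips $x\in\{0,1\}^n$. Changing coordinates changes $F$ by at most the number of coordinates changed, so we may take $\alpha^x_i=1$ and $a=\sqrt{n}$. The function names are illustrative assumptions.

```python
import math


def binomial_tail_about_median(n, t):
    """Exact P(|F - m| >= t) for F ~ Binomial(n, 1/2) with n even, m = n / 2."""
    m = n // 2
    total = sum(math.comb(n, k) for k in range(n + 1) if abs(k - m) >= t)
    return total / 2 ** n


def talagrand_median_bound(t, a):
    """The bound 4 * exp(-t^2 / (4 a^2)) on P(|F - m| >= t)."""
    return 4 * math.exp(-t * t / (4 * a * a))


n = 100
a = math.sqrt(n)  # ||alpha^x||_2 when alpha_i^x = 1 for every i
for t in (5, 10, 20, 30):
    print(t, binomial_tail_about_median(n, t), talagrand_median_bound(t, a))
```

For a sum of independent terms the bound is loose; the example is only meant to show where $a$ and $t$ appear, not to illustrate a case where the inequality is sharp.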

One might prefer to work with the mean of $F(x)$ , $\mu$ , rather than the median, $m$ . Note that Cantelli’s inequality implies that the mean is within one standard deviation of the median (see Lemma [Ineqs:MeanMedian]). We can estimate the variance of $F(x)$ from the above concentration inequality: the mean minimizes the mean squared deviation, so

$\bE ( F(X) -\mu)^2 \leq \bE ( F(X) - m)^2 = \int_0^\infty \bP(|F(x) - m|^2 \geq z) dz \leq 4 \int_0^\infty e^{-\frac{1}{4} \frac{z}{a^2} } dz = 16 a^2.$

So the standard deviation is at most $4a$ and hence $|\mu - m|\leq 4 a$ , which gives $|F(x)-\mu| \leq |F(x)-m| + 4a$ and so, for $t\geq 4a$ ,

$\bP( |F(x) - \mu| \geq t )\leq 4 e^{-\frac{1}{4}\frac{(t-4a)^2}{a^2}}$

So once again we get a tail-bound with the same concentration behaviour.

Longest Increasing Subsequence

Given independent real valued random variables $X=(X_1,X_2,...,X_N)$ , we consider the problem of finding the following Longest Increasing Subsequence:

$I(X):=\max \qquad |S| \qquad \text{subject to} \qquad X_i < X_j,\quad i< j,\quad i,j\in S \quad \text{over} \quad S \subset \{1,...,N\}.$

Comparing two (similar) sequences $x$ and $y$ , notice that the indices $S=S(x)$ of a longest increasing subsequence of $x$ , restricted to the positions where $x$ and $y$ agree, give a (long) increasing subsequence of $y$ . That is,

$I(y) \geq | \{ i \in S : x_i = y_i \} | = I(x) - | \{ i\in S : x_i \neq y_i \} |.$
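The comparison above can be checked numerically. The sketch below recovers the index set $S(x)$ of one longest strictly increasing subsequence (via the standard patience-sorting technique with `bisect`, which is a choice made here rather than anything prescribed by the text), perturbs some coordinates of $x$ to get $y$, and compares $I(y)$ with $I(x)$ minus the number of changed positions inside $S(x)$. The sequence length, number of changes and seed are arbitrary assumptions.

```python
import bisect
import random


def longest_increasing_subsequence(x):
    """Return the indices of one longest strictly increasing subsequence of x."""
    tails = []      # tails[k]: smallest last value of an increasing run of length k + 1
    tails_idx = []  # index in x of tails[k]
    prev = [None] * len(x)
    for i, v in enumerate(x):
        pos = bisect.bisect_left(tails, v)  # bisect_left enforces strict increase
        if pos == len(tails):
            tails.append(v)
            tails_idx.append(i)
        else:
            tails[pos] = v
            tails_idx[pos] = i
        prev[i] = tails_idx[pos - 1] if pos > 0 else None
    S = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:  # walk the predecessor links backwards
        S.append(i)
        i = prev[i]
    return S[::-1]


rng = random.Random(0)
x = [rng.random() for _ in range(200)]
y = list(x)
for i in rng.sample(range(200), 20):  # change 20 coordinates of x to obtain y
    y[i] = rng.random()

S = longest_increasing_subsequence(x)
changed = sum(1 for i in S if x[i] != y[i])
I_x = len(S)
I_y = len(longest_increasing_subsequence(y))
print(I_x, I_y, changed)
```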

We take $\alpha$ such that $\alpha_i= {I(x)}^{-1/2}$ if $i\in S(x)$ and $\alpha_i=0$ otherwise, so that $||\alpha||_2=1$ . If we consider the distance $d_\alpha(x,y) = {I(x)}^{-1/2} \cdot | \{ i\in S(x) : x_i \neq y_i \} |$ , then we have

$I(y) \geq I(x) - \sqrt{I(x)} d_\alpha(x,y)$

which, for the event $A=\{y : I(y) \leq s \}$ , gives

$d(x,A)=\sup_{\alpha'} \inf_{y\in A} d_{\alpha'} (x,y) \geq \inf_{y\in A} d_{\alpha} (x,y) \geq \inf_{y\in A} \frac{I(x) - I(y)}{\sqrt{I(x)}} \geq \frac{I(x) -s}{\sqrt{I(x)}}.$

Given that we are working with an event of the form $A=\{ I(x) \leq s\}$ , we look for an event $\{I(x)\geq s'\}$ that implies $d(x,A)$ is of a certain size. Since the function $\frac{z-s}{\sqrt{z}}$ is increasing for $z\geq s$ , $I(x) \geq s+t$ implies $\frac{I(x)-s}{\sqrt{I(x)}} \geq \frac{t}{\sqrt{t+s}}$ . Applying this and then Talagrand’s Inequality gives

$\bP(I(x)\leq s) \bP(I(x) \geq s+t) \leq \bP(I(x)\leq s) \bP\Big(d(X,A) \geq \frac{t}{\sqrt{t+s}} \Big) \leq e^{-\frac{1}{4}\frac{t^2}{(s+t)}}$

For $m$ a median of this distribution, setting $s=m-t$ and then $s=m$ , and using $\bP(I(x)\geq m)\geq 1/2$ and $\bP(I(x)\leq m)\geq 1/2$ , gives the two tail bounds in the following result.

Theorem

$\bP( I(x) \leq m-t) \leq 2 e^{-\frac{1}{4} \frac{t^2}{m}} ,\qquad \bP(I(x) \geq m+t) \leq 2 e^{-\frac{1}{4}\frac{t^2}{(m+t)}}$

Note the bounds do not look terribly symmetric but, in order to achieve a guaranteed probability in both bounds, we must choose $t$ of the same order, $O(\sqrt{m})$ , in both bounds.
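The two tail bounds can be compared with a small simulation. This is only a rough sketch: it assumes i.i.d. uniform $X_i$, uses the empirical median of the samples as a stand-in for the true median $m$, and the values of `N`, `trials` and `seed` are arbitrary choices.

```python
import bisect
import math
import random
import statistics


def lis_length(x):
    """Length of a longest strictly increasing subsequence (patience sorting)."""
    tails = []
    for v in x:
        pos = bisect.bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)


def simulate(N=400, trials=1000, seed=0):
    """Sample I(X) for i.i.d. uniform X_1, ..., X_N."""
    rng = random.Random(seed)
    return [lis_length([rng.random() for _ in range(N)]) for _ in range(trials)]


samples = simulate()
m = statistics.median(samples)  # empirical median, used in place of the true one
for t in (5, 10, 15):
    lower = sum(s <= m - t for s in samples) / len(samples)
    upper = sum(s >= m + t for s in samples) / len(samples)
    lower_bound = 2 * math.exp(-t * t / (4 * m))
    upper_bound = 2 * math.exp(-t * t / (4 * (m + t)))
    print(t, lower, lower_bound, upper, upper_bound)
```

In runs of this kind the empirical tails should sit well below the bounds, which is consistent with the bounds being valid but conservative for this particular distribution.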

Appendix

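We first show the equality of our primal-dual characterization for $d(x,A)$ :

$\begin{aligned} \sup_{\substack{\alpha\in\bR_+^n\\ ||\alpha||_2\leq 1}} \inf_{y\in A} \sum_{k=1}^n \alpha_k \bI[x_k\neq y_k]= \inf_{v\in < V(x,A)>} ||v||_2 \end{aligned}$

A sketch of the argument: since $\alpha\geq 0$ , we have $\inf_{y\in A} \sum_{k} \alpha_k \bI[x_k\neq y_k] = \inf_{v\in <V(x,A)>} \alpha\cdot v$ , because a linear function has the same infimum over a set and over its convex hull. The set of admissible $\alpha$ is convex and compact, so a minimax theorem lets us exchange the supremum and the infimum, and for $v\geq 0$ we have $\sup_\alpha \alpha\cdot v = ||v||_2$ , attained at $\alpha = v/||v||_2$ .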

Lemma. For $0\leq r\leq 1$ ,

$\inf_{0 \leq \lambda \leq 1} r^{-\lambda}e^{\frac{1}{4}(1-\lambda)^2} \leq 2-r$

Proof. Taking logarithms, this is equivalent to showing that

$\inf_{\lambda\in[0,1]} \frac{1}{4}(1-\lambda)^2 -\lambda \log r \leq \log (2-r).$

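Over all real $\lambda$ , the left-hand side is minimized by $\lambda= 1+2\log(r)$ , which lies in $[0,1]$ when $e^{-1/2}\leq r\leq 1$ . (For $r< e^{-1/2}$ , taking $\lambda=0$ gives $e^{1/4}\leq 2-r$ directly.) Thus, after substituting, we are required to show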

$-\log r - (\log r)^2 \leq \log (2-r)\quad$

Since both sides agree at $r=1$ , we can equivalently express this inequality as an integral

$0\geq \int^1_r \frac{1}{x} +\frac{2\log x}{x} -\frac{1}{2-x} dx = \int_r^1 \frac{2}{x(2-x)}\left[ 2 \log x - x+1 - x\log x \right]dx.$

The term in the square brackets is non-positive on $(0,1]$ : $2\log x-x+1$ is concave and $x\log x$ is convex, and the two functions have the same value and the same derivative at $x=1$ , so they share a tangent line there and $2\log x-x+1< x\log x$ for $x\neq 1$ . This completes the proof of the lemma. $\square$
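The lemma is also easy to check numerically. The sketch below assumes $0<r\leq 1$ and evaluates the infimum at the stationary point $\lambda=1+2\log r$ clipped to $[0,1]$, which is the minimizer because the exponent is convex in $\lambda$; the function name `lemma_lhs` is an illustrative choice.

```python
import math


def lemma_lhs(r):
    """inf over lambda in [0, 1] of r**(-lambda) * exp((1 - lambda)**2 / 4)."""
    lam = min(1.0, max(0.0, 1 + 2 * math.log(r)))  # clipped stationary point
    return r ** (-lam) * math.exp((1 - lam) ** 2 / 4)


# Smallest value of (2 - r) - inf_lambda(...) over a grid of r in (0, 1].
worst = min((2 - r) - lemma_lhs(r) for r in (k / 1000 for k in range(1, 1001)))
print(worst)
```

The gap closes only at $r=1$, where both sides equal $1$, which matches the tangency argument in the proof.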

¹ Hint: writing $p=\mathbb P(A)$ , we have $p^{-1} - p - e^{1/4}(1-p) = (1-p)(1+p^{-1}-e^{1/4}) \geq 0$ , since $1+p^{-1}\geq 2 > e^{1/4}$ .
² See Theorem 3.1 of McDiarmid, Colin. “Concentration.” Probabilistic methods for algorithmic discrete mathematics. Springer Berlin Heidelberg, 1998. 195-248.
