Talagrand’s Concentration Inequality

We prove a powerful inequality which provides very tight Gaussian tail bounds, of the form e^{-ct^2}, for probabilities on product state spaces \Omega^n. Talagrand's Inequality has found many applications in probability and combinatorial optimization and, when it can be applied, it generally outperforms inequalities like Azuma–Hoeffding.

Theorem [Talagrand's Concentration Inequality] For A a measurable subset of the product space \Omega^n and x distributed according to the product measure \mathbb P on \Omega^n,

\mathbb E\Big[ e^{d(x,A)^2/4}\Big] \;\leq\; \frac{1}{\mathbb P(A)}

and, consequently, by Markov's inequality,

\mathbb P\big( d(x,A) \geq t \big) \;\leq\; \frac{e^{-t^2/4}}{\mathbb P(A)}, \qquad t\geq 0.

Here, d(x,A) is Talagrand's convex distance, which has the two equivalent characterizations

d(x,A) \;=\; \min_{v\in <V(x,A)>} \|v\|_2 \;=\; \sup_{\|\alpha\|_2\leq 1}\, \inf_{y\in A}\, \sum_{k=1}^n \alpha_k\, \mathbb I[x_k\neq y_k],

where <\,\cdot\,> denotes the convex hull and V(x,A)=\{ u: u_k\geq \mathbb I[x_k\neq y_k], \text{ for }k=1,\ldots,n,\text{ for some }y\in A\}.
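As a quick illustration of the two characterizations, take \Omega=\{0,1\}, n=2, A=\{(0,0)\} and x=(1,1): then V(x,A)=\{u : u_1\geq 1,\ u_2\geq 1\}, the minimal-norm point of <V(x,A)> is (1,1), so d(x,A)=\sqrt{2}; in the dual form, the choice \alpha=(1/\sqrt{2},1/\sqrt{2}) gives \inf_{y\in A}\sum_k\alpha_k\mathbb I[x_k\neq y_k]=\sqrt{2}, attaining the same value.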

Proof. The duality between the two characterizations of d(x,A) is proved in the appendix to this section.

We will prove the inequality holds by induction on n. Note that for the case n=1, d(x,A)=\mathbb I[x\notin A], so the inequality holds since e^{1/4}(1-\mathbb P(A)) + \mathbb P(A) \leq \mathbb P(A)^{-1}.1
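Indeed, writing p=\mathbb P(A), one can check the base case directly:

\frac{1}{p} - \big( e^{1/4}(1-p) + p \big) \;=\; \frac{(1-p)(1+p)}{p} - e^{1/4}(1-p) \;=\; (1-p)\Big( \frac{1+p}{p} - e^{1/4} \Big) \;\geq\; 0,

since (1+p)/p\geq 2 > e^{1/4} for p\in(0,1].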

Now assuming our induction hypothesis up to n, we prove the result for the n+1 case. We do so by proving the following claim


Claim: For z=(w,x)\in\Omega^{n+1} with w\in\Omega and x\in\Omega^{n}, and any 0\leq\lambda\leq 1,

d(z,A)^2 \;\leq\; (1-\lambda)^2 + \lambda\, d(x,A(w))^2 + (1-\lambda)\, d(x,\tilde{A})^2,

where

A(w) = \{ y\in\Omega^n : (w,y)\in A\} \qquad\text{and}\qquad \tilde{A} = \bigcup_{w'\in\Omega} A(w').

  • In the inequality above, we overestimate d(z,A) by two distances on \Omega^n: the first, a refined distance using the information in w, d(x,A(w)); and the second, a cruder distance using only the information in x and A, d(x,\tilde{A}). Note that A(w)=\tilde{A} for every w precisely when A is a rectangular (product) set.
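For instance, with \Omega=\{0,1\} and n=1, take A=\{(0,0),(1,1)\}\subset\Omega^2: then A(0)=\{0\} and A(1)=\{1\}, while \tilde{A}=\{0,1\}, so knowing w pins down the remaining coordinate exactly, whereas \tilde{A} retains no information from w.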

Proof of claim: We prove the bound by considering convex combinations of points in V(x,A(w)) and V(x,\tilde{A}). Note that if v_w is a point in V(x,A(w)), then there exists a y\in A(w) such that (v_w)_k \geq \mathbb I[x_k\neq y_k] for each k=1,\ldots,n; moreover, since y\in A(w), the point (w,y)\in A agrees with z in the w-coordinate, so the entry in that coordinate may be taken to be 0\geq\mathbb I[w\neq w]. So, if v_w\in V(x,A(w)) then (v_w,0)\in V(z,A). By a similar easy argument, if \tilde{v}\in V(x,\tilde{A}) then (\tilde{v},1)\in V(z,A). These inclusions extend to the convex hulls <V(x,A(w))> and <V(x,\tilde{A})>. Thus, if v_w\in <V(x,A(w))> and \tilde{v}\in <V(x,\tilde{A})>, then

\lambda\,(v_w, 0) + (1-\lambda)\,(\tilde{v}, 1) \;=\; \big(\lambda v_w + (1-\lambda)\tilde{v},\; 1-\lambda\big) \;\in\; <V(z,A)>.
Also, using the convexity of \|\cdot\|_2^2,

d(z,A)^2 \;\leq\; (1-\lambda)^2 + \|\lambda v_w + (1-\lambda)\tilde{v}\|_2^2 \;\leq\; (1-\lambda)^2 + \lambda \|v_w\|_2^2 + (1-\lambda)\|\tilde{v}\|_2^2.
Optimizing over <V(z,A)>, <V(x,A(w))> and <V(x,\tilde{A})>, we get, as required,

d(z,A)^2 \;\leq\; (1-\lambda)^2 + \lambda\, d(x,A(w))^2 + (1-\lambda)\, d(x,\tilde{A})^2.
This completes the proof of our claim and we continue with the proof of Talagrand’s Inequality.


Continuing to write z=(w,x)\in \Omega\times \Omega^n and applying the claim, we have for the conditional expectation that

\mathbb E\big[ e^{d(z,A)^2/4} \,\big|\, w \big] \;\leq\; e^{(1-\lambda)^2/4}\, \mathbb E\Big[ \big(e^{d(x,A(w))^2/4}\big)^{\lambda}\, \big(e^{d(x,\tilde{A})^2/4}\big)^{1-\lambda} \Big] \;\leq\; e^{(1-\lambda)^2/4}\, \mathbb E\big[ e^{d(x,A(w))^2/4}\big]^{\lambda}\, \mathbb E\big[ e^{d(x,\tilde{A})^2/4}\big]^{1-\lambda} \;\leq\; e^{(1-\lambda)^2/4}\, \mathbb P(A(w))^{-\lambda}\, \mathbb P(\tilde{A})^{-(1-\lambda)} \;=\; \frac{1}{\mathbb P(\tilde{A})}\, e^{(1-\lambda)^2/4} \Big( \frac{\mathbb P(A(w))}{\mathbb P(\tilde{A})} \Big)^{-\lambda} \;\leq\; \frac{1}{\mathbb P(\tilde{A})} \Big( 2 - \frac{\mathbb P(A(w))}{\mathbb P(\tilde{A})} \Big).

In the first inequality we apply the claim; in the second inequality, we apply Hölder's Inequality; in the third inequality, we apply our induction hypothesis (to A(w) and \tilde{A} as subsets of \Omega^n); in the fourth inequality, we take the infimum over 0\leq\lambda\leq 1 and use the fact that \inf_{0 \leq \lambda \leq 1} r^{-\lambda}e^{(1-\lambda)^2/4} \leq 2-r for 0\leq r\leq 1, which we prove in the appendix to this section.

Now taking expectations over w, noting that, by Fubini, \mathbb E_w\big[\mathbb P(A(w))\big] = \mathbb P(A), and observing x(2-x)\leq 1, we obtain the required result:

\mathbb E\big[ e^{d(z,A)^2/4}\big] \;\leq\; \frac{1}{\mathbb P(\tilde{A})}\Big(2 - \frac{\mathbb P(A)}{\mathbb P(\tilde{A})}\Big) \;=\; \frac{1}{\mathbb P(A)}\cdot \frac{\mathbb P(A)}{\mathbb P(\tilde{A})}\Big(2 - \frac{\mathbb P(A)}{\mathbb P(\tilde{A})}\Big)\;\leq\; \frac{1}{\mathbb P(A)}.
\square


Applications

Let's informally discuss how Talagrand's Inequality is applied and then give an example of how to apply the result. First we recall that the Hamming distance is d_H(x,y) =\sum_{i=1}^n \mathbb I[x_i\neq y_i], so Talagrand considers a weighted Hamming distance from the set of y\in A, d_{\alpha,H}(x,A):=\inf_{y\in A} \sum_{i=1}^n \alpha_i\mathbb I[x_i \neq y_i]. A result of a similar form, for a fixed weight vector \alpha, was known to hold prior to Talagrand's.2
However, the bound is vastly improved by allowing us to optimize over \alpha given x: from an event A=\{ F(x)\leq s\} it becomes much easier to deduce that, for some s', \{F(x) \geq s'\} \subset \{ \sup_\alpha d_{\alpha,H} (x,A) \geq t\}. In particular, we are in a strong position if we can show a bound of the form

F(y) \;\leq\; F(x) + \sum_{i=1}^n \alpha^x_i\, \mathbb I[x_i\neq y_i] \qquad\text{or}\qquad F(y) \;\geq\; F(x) - \sum_{i=1}^n \alpha^x_i\, \mathbb I[x_i\neq y_i], \qquad \text{for all } y\in\Omega^n,

for some nonnegative weights \alpha^x=(\alpha^x_1,\ldots,\alpha^x_n) which may depend on x.
Note that replacing F by -F swaps one bound for the other, so the two cases are equivalent. In both cases, we imagine that, if changing some of the components of x gives y, then we have a way of estimating the resulting change in F(x) for each component changed. Note the (left) bound above implies

d(x,A) \;\geq\; \inf_{y\in A}\, \sum_{i=1}^n \frac{\alpha^x_i}{\|\alpha^x\|_2}\,\mathbb I[x_i\neq y_i] \;\geq\; \frac{\inf_{y\in A} F(y) - F(x)}{\|\alpha^x\|_2}.
Thus, if we can bound \|\alpha^x\|_2 above by a (the smaller the better) and we take an event A=\{ F(x) \geq s\}, then d(x,A) \geq (s-F(x))\, a^{-1}. Thus, for this choice of A, F(x) \leq s' implies d(x,A) \geq (s-s')\, a^{-1}. Applying Talagrand's Inequality then gives

\mathbb P\big( F(x)\leq s' \big)\,\mathbb P\big( F(x)\geq s \big) \;\leq\; e^{-(s-s')^2/(4a^2)}.
For the median m, applying the above bound first with s=m, s'=m-t and then with s=m+t, s'=m, and using that \mathbb P(F(x)\geq m)\geq 1/2 and \mathbb P(F(x)\leq m)\geq 1/2, yields, via a union bound,

\mathbb P\big( |F(x) - m| \geq t \big) \;\leq\; 4\, e^{-t^2/(4a^2)}.
At this point, we have a pretty convincing concentration bound about the median of the distribution of F(x).
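As a simple sanity check, consider the linear example F(x)=\sum_{i=1}^n c_i x_i on \Omega=[0,1] with weights c_i\geq 0: then |F(x)-F(y)|\leq \sum_{i=1}^n c_i\,\mathbb I[x_i\neq y_i], so we may take \alpha^x_i=c_i for every x and a=\|c\|_2, and the bound above reads \mathbb P\big(|F(x)-m|\geq t\big)\leq 4\,e^{-t^2/(4\|c\|_2^2)}, a Gaussian tail of the same flavour as the Azuma–Hoeffding bound mentioned in the introduction.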

One might prefer to work with the mean of F(x), \mu, rather than the median, m. Note that Cantelli's inequality implies that the mean is within one standard deviation of the median (see Lemma [Ineqs:MeanMedian]). We can estimate the variance of F(x) from the above concentration inequality; since the second moment is minimized by centering at the mean,

\text{Var}\big(F(x)\big) \;\leq\; \mathbb E\big[(F(x)-m)^2\big] \;=\; \int_0^\infty \mathbb P\big(|F(x)-m|\geq \sqrt{u}\,\big)\, du \;\leq\; \int_0^\infty 4\, e^{-u/(4a^2)}\, du \;=\; 16\, a^2.
So |\mu - m|\leq 4 a, which gives |F(x)-\mu| \leq |F(x)-m| + 4a, and so

\mathbb P\big( |F(x)-\mu| \geq t + 4a \big) \;\leq\; \mathbb P\big( |F(x)-m| \geq t \big) \;\leq\; 4\, e^{-t^2/(4a^2)}.
So once again we get a tail-bound with the same concentration behaviour.

Longest Increasing Subsequence

Given X=(X_1,X_2,\ldots,X_N), independent real-valued random variables, we consider the length of the longest increasing subsequence,

I(x) \;=\; \max\big\{ k \;:\; x_{i_1} < x_{i_2} < \cdots < x_{i_k} \text{ for some } i_1 < i_2 < \cdots < i_k \big\}.
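To make the quantity concrete, here is a short Python sketch (illustrative only; the routine lis_length, the patience-sorting approach and the simulation parameters are choices made for this illustration) which computes I(x) in O(N\log N) time and estimates the spread of I(X) for i.i.d. uniform X_i:

    import bisect
    import random

    def lis_length(seq):
        # Length of the longest strictly increasing subsequence, via patience sorting.
        tails = []  # tails[k] = smallest possible tail of an increasing subsequence of length k+1
        for v in seq:
            j = bisect.bisect_left(tails, v)  # first position whose tail is >= v
            if j == len(tails):
                tails.append(v)
            else:
                tails[j] = v
        return len(tails)

    random.seed(0)
    N, trials = 10_000, 200
    samples = sorted(lis_length([random.random() for _ in range(N)]) for _ in range(trials))
    med = samples[trials // 2]            # empirical median; known to grow like 2*sqrt(N)
    print(med, samples[0], samples[-1])   # spread should be at most of order sqrt(med) ~ N**(1/4)

An observed spread of at most order \sqrt{m} is exactly what the tail bounds derived below guarantee.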

Comparing two (similar) sequences x and y, notice that a longest increasing subsequence of x, with index set S(x), could be used as a (long) increasing subsequence for y after discarding the indices where y differs. That is,

I(y) \;\geq\; I(x) - \big|\{\, i\in S(x) : x_i\neq y_i \,\}\big|.
We take \alpha such that \alpha_i= {I(x)}^{-1/2} if i\in S(x) and \alpha_i=0 otherwise; note that \|\alpha\|_2=1. If we consider the distance d_\alpha(x,y) = {I(x)}^{-1/2}\, \big|\{\, i\in S(x) : x_i \neq y_i \,\}\big|, then we have

d_\alpha(x,y) \;\geq\; \frac{I(x)-I(y)}{\sqrt{I(x)}},
which, on the event A=\{y : I(y) \leq s \}, gives

d(x,A) \;\geq\; \frac{I(x)-s}{\sqrt{I(x)}}.
Note that, since we are working with an event of the form A=\{ I(x) \leq s\}, we look for an event \{I(x)\geq s'\} which forces d(x,A) to be of a certain size. Since the function \frac{z-s}{\sqrt{z}} is increasing for z\geq s, I(x) \geq s+t implies \frac{I(x)-s}{\sqrt{I(x)}} \geq \frac{t}{\sqrt{t+s}}. Applying this and then Talagrand's Inequality gives

\mathbb P\big( I(x) \leq s \big)\,\mathbb P\big( I(x) \geq s+t \big) \;\leq\; e^{-t^2/(4(s+t))}.
For m the median of this distribution, setting s=m-t and then setting s=m gives the two tail bounds in the following result.

Theorem For m the median of I(X) and t\geq 0,

\mathbb P\big( I(X) \leq m - t \big) \;\leq\; 2\, e^{-t^2/(4m)} \qquad\text{and}\qquad \mathbb P\big( I(X) \geq m + t \big) \;\leq\; 2\, e^{-t^2/(4(m+t))}.

Note the bounds do not look terribly symmetric but, in order to achieve a guaranteed probability in both bounds, we must choose t of the same order, O(\sqrt{m}), in both.
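To see this, take t=c\sqrt{m} for a constant 0<c\leq \sqrt{m}: the lower-tail bound is exactly 2\,e^{-c^2/4}, while for the upper tail

2\,e^{-t^2/(4(m+t))} \;=\; 2\exp\Big( -\frac{c^2}{4\,(1+c/\sqrt{m})} \Big) \;\in\; \big[\, 2\,e^{-c^2/4},\; 2\,e^{-c^2/8} \,\big],

so both tails become small once t is a suitably large multiple of \sqrt{m}.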

Appendix

We first show the equality of our primal-dual characterizations of d(x,A), namely

\min_{v\in <V(x,A)>} \|v\|_2 \;=\; \sup_{\|\alpha\|_2\leq 1}\, \inf_{y\in A}\, \sum_{i=1}^n \alpha_i\, \mathbb I[x_i\neq y_i].

For one direction, fix \alpha with \|\alpha\|_2\leq 1; we may assume \alpha\geq 0, since replacing any negative coordinate by 0 does not decrease the right-hand side. For any y\in A, write u^y=(\mathbb I[x_k\neq y_k])_k\in V(x,A). For every v\in V(x,A) witnessed by some y\in A, Cauchy–Schwarz gives \|v\|_2 \geq \langle \alpha, v\rangle \geq \sum_i \alpha_i\,\mathbb I[x_i\neq y_i] \geq \inf_{y\in A}\sum_i\alpha_i\,\mathbb I[x_i\neq y_i]; the middle quantity is linear in v, so the bound extends to convex combinations, and hence the left-hand side is at least the right-hand side. For the other direction, let v^\star attain the minimum over the convex set <V(x,A)> (if v^\star=0 both sides vanish) and take \alpha = v^\star/\|v^\star\|_2. By the projection inequality, \langle v^\star, u - v^\star\rangle \geq 0 for every u\in <V(x,A)>; applying this with u=u^y for y\in A gives \sum_i \alpha_i\,\mathbb I[x_i\neq y_i] \geq \|v^\star\|_2, and taking the infimum over y\in A shows the right-hand side is at least the left-hand side. \square

We now prove the lemma used in the induction step: for 0\leq r\leq 1,

\inf_{0\leq\lambda\leq 1} r^{-\lambda}\, e^{(1-\lambda)^2/4} \;\leq\; 2-r.
Taking logarithms, this is equivalent to showing that

\inf_{0\leq \lambda\leq 1}\Big\{ -\lambda \log r + \frac{(1-\lambda)^2}{4}\Big\} \;\leq\; \log(2-r).
The left-hand side is minimized at \lambda= 1+2\log(r), which lies in [0,1] when e^{-1/2}\leq r\leq 1; for r< e^{-1/2} we may simply take \lambda=0, since e^{1/4}\leq 2-r. Thus, after substituting \lambda=1+2\log r, we are required to show that

-\log r - (\log r)^2 \;\leq\; \log(2-r), \qquad\text{equivalently}\qquad \log r + (\log r)^2 + \log(2-r)\;\geq\; 0.
We can equivalently express this inequality as an integral: since the left-hand side vanishes at r=1,

\log r + (\log r)^2 + \log(2-r) \;=\; -\int_r^1 \frac{2\,\big[\, 2\log x - x + 1 - x\log x \,\big]}{x\,(2-x)}\; dx.
The term in the square brackets is negative: 2\log x-x+1 is concave and x\log x is convex; both vanish at x=1 and both have derivative 1 there, so the common tangent line x-1 separates them and 2\log x-x+1\leq x-1\leq x\log x, with strict inequality for x\neq 1. Hence the integrand above is negative on (0,1), the right-hand side is nonnegative for r\leq 1, and the required inequality follows. This completes the lemma. \square
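One can also sanity-check the lemma numerically; the following Python snippet (grid resolution chosen arbitrarily for this illustration) evaluates the infimum on a grid of \lambda, which can only overestimate the true infimum, so a non-negative gap certifies the inequality at each sampled r:

    import math

    # Check inf_{0<=lam<=1} r**(-lam) * exp((1-lam)**2 / 4) <= 2 - r for r in (0,1].
    lams = [k / 1000 for k in range(1001)]
    worst_gap = float("inf")
    for j in range(1, 1001):
        r = j / 1000
        lhs = min(r ** (-lam) * math.exp((1 - lam) ** 2 / 4) for lam in lams)
        worst_gap = min(worst_gap, (2 - r) - lhs)
    print(worst_gap)  # expected to be non-negative, approaching 0 as r -> 1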


  1. Hint: since e^{1/4}\leq 2, the left-hand side is at most 2-\mathbb P(A), and 2-p\leq p^{-1} for p\in(0,1].
  2. See Theorem 3.1 of McDiarmid, Colin. “Concentration.” Probabilistic Methods for Algorithmic Discrete Mathematics. Springer, Berlin, Heidelberg, 1998. 195–248.
