We prove a powerful inequality which provides very tight gaussian tail bounds “” for probabilities on product state spaces . Talagrand’s Inequality has found lots of applications in probability and combinatorial optimization and, if one can apply it, it generally outperforms inequalities like Azzuma-Hoeffding.
Thrm [Talagrand’ Concentration Inequality] For a measurable subset of the product space ,
Proof. The duality between the two characterizations of are given in the appendix to this section.
We will prove the inequality holds by induction on . Note for the case , so holds since .1
Now assuming our induction hypothesis up to , we prove the result for the case. We do so by proving the following claim
Claim: For , and ,
- In equality , we over estimate by two distances on : the first, a refined distance using information about , ; and the second, a cruder distance using only information about and , . for rectangular sets only.
Proof of claim: We prove the bound by considering the convex combination of points in and . Note if is a point in then there exists a such that for each , also for the case. So, if then . By a similar easy argument, we see that if then . It is also clear that these inclusions extend on convex combinations and . Thus we see that if and then
Also, using convexity of ,
Optimizing over and , as required, we get
This completes the proof of our claim and we continue with the proof of Talagrand’s Inequality.
Continuing to write and applying of inequality , we have for the conditional expectation that For second inequality, above, we apply Holder’s Inequality; in the third inequality, we apply our induction hypothesis; in the fourth inequality, we use the fact that , which we prove in the appendix to this section.
Now taking expectations over , and observing , we gain the required result:
Let’s informally discuss the application of Talagrand’s Inequality and then give a few examples of how to apply the result. First we recall that the Hamming distance is , so Talagrand considers a weighted Hamming distance from the set of , . A result of a similar form was known to hold previously to Talagrand’s, namely,2
However the bound is vastly improved by allowing us to optimize over given : from an event it is much more possible to deduce that for some , . In particular, we are in a strong position if we can show a bound of the form
Note by multiplying by both bounds are equivalent. In both cases, we imagine that, if making a change so some of the compenents of gives , then we have a way of estimating the change in for each component changed. Note the (left) bound above implies
Thus if we can bound above by , (the smaller the better) and we take an event then . Thus, for this choice of , implies . Thus applying Talagrand gives
For median, , inspecting the above bound with and and then with and yields a bound
At this point, we have a pretty convincing concentration bound about the median of the distribution of .
One might prefer work with the mean of F(x), , rather than the median, . Note, a Cantelli’s inequality implies that the the mean is withing one standard deviation (see Lemma [Ineqs:MeanMedian]). We can estimate the variance of from the above concentration inequality: the variance about the mean is always minimal so
So, which gives and so
So once again we get a tail-bound with the same concentration behaviour.
Longest Increasing Subsequence
Given independent random variables are independent real valued random variables, we consider the problem of finding the following Longest Increasing Subsequence:
Comparing two (similar) sequences and notice a longest subsequence of , could be used as a (long) subsequence for . That is
We take such that if and otherwise. If we consider the distance , then we have
which, on the event , gives
Note given we are working with an event of the form we look for an event that implies is of a certian size. Since the function is increasing for , then implies . Applying this then Talagrands inequality gives
For the median of this distibution, then setting and setting gives two tail bounds in the following result.
Note the bounds do not look not terribly symmetric but, in order to achieve a guaranteed probability in both bounds, we must choose of the same order, , in both bounds.
We first show the equality of our primal-dual characterization for .
It is that this is equivalent to showing that
The left-hand side is minimized by . Thus, after substituting, we are required to show
We can equivalently expressed this inequality as an integral
The term in the square brackets is negative: is concave and is convex, both terms are equal when so the rest of the time . This completes the lemma.