Blackwell Approachability

Sequentially a player decides to play \{p_t\}_{t=1}^\infty and his adversary decides \{q_t\}_{t=1}^\infty. At time t, a decision (p_t,q_t) results in a vector payoff A(p_t,q_t)\in {\mathbb R}^k. Given a_t is the average vector payoff at time t, Blackwell’s Approachability Theorem is a necessary and sufficient condition so that, regardless of the adversary’s decisions, the player makes the sequence of vectors \{a_t\}_{t=1}^\infty approach a convex set {\mathcal A}.

  • We consider a sequence (p_1,q_1), (p_2,q_2), (p_3,q_3), ... . Here each p_t is a probability distribution on a set of (pure) decisions \{1,...,n\} and similarly q_t is a probability distribution on \{1,...,n\}.
  • For each pair (p,q), there is a payoff vector A(p,q)\in{\mathbb R}^k and average payoff vector a_t\in{\mathbb R}^k. Here
  • At each time t, we assume p_t is chosen as a function of A(\cdot,\cdot) and the past decisions \{(p_\tau,q_\tau)\}_{\tau=1}^{t-1}.
  • We say that \{a_\tau\}_{\tau=1}^\infty approaches a set {\mathcal A}\subset {\mathbb R}^p if the distance between a_t and {\mathcal A} converges to zero, namely,

Theorem (Blackwell’s Approachability Theorem): For closed convex set {\mathcal A}, the following are equivalent

  1. {\mathcal A} is approachable.
  2. For every q there exists p such that A(p,q)\in {\mathcal A}.
  3. Every half space containing {\mathcal A} is approachable.1

Before a proof we comment on how Blackwell’s Approachability Theorem relates to the Minimax Theorem.

  • For A(p,q)\in {\mathbb R}, \max_{q} \min_p A(p,q)=\min_p \max_{q} A(p,q) holds iff the following are equivalent for every v\in {\mathbb R}i. \exists p \forall q s.t. A(p,q)\leq v,\:\:[“without knowing q, we can think of a good p”]

    ii. \forall q\; \exists p s.t. A(p,q)\leq v.

    [“if we know q, we can think of a good p”]

  • Blackwell’s Approachability Theorem now makes the assumption that A(p,q) is a vector.
  • The equivalence between (1) and (2) found in Blackwell’s Approachability Theorem is analogous to the equivalence between (i) and the seemingly weaker statement (ii) found in the Minimax Theorem.
  • However, we note that Approachability (1) is weaker than the statement \exists p \forall q s.t. A(p,q)\in {\mathcal A}. Approachability states \exists \{p_t\}_{t=1}^\infty (chosen sequentially) so that \forall \{ q_t \}_{t=0}^\infty on average A(p_t,q_t) converges to {\mathcal A} as t\rightarrow \infty.
  • Finally, we observe that (3) approaching a half-space {\mathcal H}=\{a : n\cdot a \leq v\} corresponds to applying the Minimax Theorem to game matrix \hat{A} whose i,j-th component is n\cdot A_{ij}.

Proof: 2\implies 3: For every {\mathcal H} = \{a : n\cdot a \leq v\} containing {\mathcal A},

3\implies 2: For every {\mathcal H} = \{a : n\cdot a \leq v\} containing {\mathcal A},

1 \implies 3: A\subset {\mathcal H}=\{a : n\cdot a \leq v\} so is immediate.

3 \implies 1: Firstly note

At time t, let the projection P(a_t) be closest point to a_t in {\mathcal A} and let {\mathcal H}_t be the half-space containing {\mathcal A} defined by normal n_t=P(a_t)-a_t and v_t=P(a_t)\cdot (P(a_t)-a_t). In other words, {\mathcal H}_t is defined by the hyperplane through P(a_t) perpendicular to P(a_t)-a_t. Since {\mathcal H}_t is approachable, we can see from there must exist a p_t such that A(p_t,q)\in {\mathcal H}_t for all distributions q. Using this p_t.

As a_{t+1}=A(p_t,q_t)/(t+1) + ta_t/(t+1), we now extract A(p_{t+1},q_{t+1})-P(a_t) from the term (a_{t+1}-a_t)\cdot (a_t-P(a_t)):

Substituting this expression, we have

Thus multiplying both sides by (t+1)^2 and rearranging gives

Summing these interpolating terms gives

Thus, as required,

\square

  • Note that our proof is constructive. We define a policy for choosing p_t such thatwhere from P(a_t), the projection of a_t onto {\mathcal A} we define n_t=P(a_t)-a_t and v_t=P(a_t)\cdot (P(a_t)-a_t).

We now consider one interesting consequence of this result.

Theorem (Hannan-Gaddum Theorem)

Suppose A(p,q) as defined in is such that A(p,q)\in {\mathcal A}. There exists a playing strategy \{p_t\}_{t=1}^\infty such that for any \{q_t\}_{t=1}^\infty

In other words, our performance in the game is asymptotically as good as the best fixed action.

Proof:

We verify Blackwell’s Approachability Theorem for the vector payoff \hat{A}(p,q)=(A(p,q)-A(i,q)\; :\; i=1,...,m) and convex region {\mathcal A}=\{a: a_i\leq 0, i=1,...,m\}. For all q there exists p such that component-wise \hat{A}(p,q)\leq 0, in particular we choose p_{i^*}=1 where A(i^*,q)=\min_{i=1,...,m} A(i,q). This verifies condition 2. of Blackwell’s Approachability Theorem. Thus there exists a strategy \{p_t\}_{t=1}^\infty such that for all \{q_t\}_{t=1}^\infty

which is equivalent to the required expression above.


  1. Recall a half-space in {\mathbb R}^k is a set {\mathcal H}=\{a\in {\mathbb R}^k : n\cdot a \leq c\} for some n\in {\mathbb R}^k and c\in {\mathbb R}^k.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s