# Blackwell Approachability

Sequentially a player decides to play $\{p_t\}_{t=1}^\infty$ and his adversary decides $\{q_t\}_{t=1}^\infty$. At time $t$, a decision $(p_t,q_t)$ results in a vector payoff $A(p_t,q_t)\in {\mathbb R}^k$. Given $a_t$ is the average vector payoff at time $t$, Blackwell’s Approachability Theorem is a necessary and sufficient condition so that, regardless of the adversary’s decisions, the player makes the sequence of vectors $\{a_t\}_{t=1}^\infty$ approach a convex set ${\mathcal A}$.

• We consider a sequence $(p_1,q_1), (p_2,q_2), (p_3,q_3), ...$. Here each $p_t$ is a probability distribution on a set of (pure) decisions $\{1,...,n\}$ and similarly $q_t$ is a probability distribution on $\{1,...,n\}$.
• For each pair $(p,q)$, there is a payoff vector $A(p,q)\in{\mathbb R}^k$ and average payoff vector $a_t\in{\mathbb R}^k$. Here
• At each time $t$, we assume $p_t$ is chosen as a function of $A(\cdot,\cdot)$ and the past decisions $\{(p_\tau,q_\tau)\}_{\tau=1}^{t-1}$.
• We say that $\{a_\tau\}_{\tau=1}^\infty$ approaches a set ${\mathcal A}\subset {\mathbb R}^p$ if the distance between $a_t$ and ${\mathcal A}$ converges to zero, namely,

Theorem (Blackwell’s Approachability Theorem): For closed convex set ${\mathcal A}$, the following are equivalent

1. ${\mathcal A}$ is approachable.
2. For every $q$ there exists $p$ such that $A(p,q)\in {\mathcal A}$.
3. Every half space containing ${\mathcal A}$ is approachable.1

Before a proof we comment on how Blackwell’s Approachability Theorem relates to the Minimax Theorem.

• For $A(p,q)\in {\mathbb R}$, $\max_{q} \min_p A(p,q)=\min_p \max_{q} A(p,q)$ holds iff the following are equivalent for every $v\in {\mathbb R}$i. $\exists p$ $\forall q$ s.t. $A(p,q)\leq v$,$\:\:$[“without knowing $q$, we can think of a good $p$”]

ii. $\forall q\; \exists p$ s.t. $A(p,q)\leq v$.

[“if we know $q$, we can think of a good $p$”]

• Blackwell’s Approachability Theorem now makes the assumption that $A(p,q)$ is a vector.
• The equivalence between (1) and (2) found in Blackwell’s Approachability Theorem is analogous to the equivalence between (i) and the seemingly weaker statement (ii) found in the Minimax Theorem.
• However, we note that Approachability (1) is weaker than the statement $\exists p$ $\forall q$ s.t. $A(p,q)\in {\mathcal A}$. Approachability states $\exists$ $\{p_t\}_{t=1}^\infty$ (chosen sequentially) so that $\forall$ $\{ q_t \}_{t=0}^\infty$ on average $A(p_t,q_t)$ converges to ${\mathcal A}$ as $t\rightarrow \infty$.
• Finally, we observe that (3) approaching a half-space ${\mathcal H}=\{a : n\cdot a \leq v\}$ corresponds to applying the Minimax Theorem to game matrix $\hat{A}$ whose $i,j$-th component is $n\cdot A_{ij}$.

Proof: $2\implies 3$: For every ${\mathcal H} = \{a : n\cdot a \leq v\}$ containing ${\mathcal A}$,

$3\implies 2$: For every ${\mathcal H} = \{a : n\cdot a \leq v\}$ containing ${\mathcal A}$,

$1 \implies 3$: $A\subset {\mathcal H}=\{a : n\cdot a \leq v\}$ so is immediate.

$3 \implies 1$: Firstly note

At time $t$, let the projection $P(a_t)$ be closest point to $a_t$ in ${\mathcal A}$ and let ${\mathcal H}_t$ be the half-space containing ${\mathcal A}$ defined by normal $n_t=P(a_t)-a_t$ and $v_t=P(a_t)\cdot (P(a_t)-a_t)$. In other words, ${\mathcal H}_t$ is defined by the hyperplane through $P(a_t)$ perpendicular to $P(a_t)-a_t$. Since ${\mathcal H}_t$ is approachable, we can see from there must exist a $p_t$ such that $A(p_t,q)\in {\mathcal H}_t$ for all distributions $q$. Using this $p_t$.

As $a_{t+1}=A(p_t,q_t)/(t+1) + ta_t/(t+1)$, we now extract $A(p_{t+1},q_{t+1})-P(a_t)$ from the term $(a_{t+1}-a_t)\cdot (a_t-P(a_t))$:

Substituting this expression, we have

Thus multiplying both sides by $(t+1)^2$ and rearranging gives

Summing these interpolating terms gives

Thus, as required,

$\square$

• Note that our proof is constructive. We define a policy for choosing $p_t$ such thatwhere from $P(a_t)$, the projection of $a_t$ onto ${\mathcal A}$ we define $n_t=P(a_t)-a_t$ and $v_t=P(a_t)\cdot (P(a_t)-a_t)$.

We now consider one interesting consequence of this result.

Theorem (Hannan-Gaddum Theorem)

Suppose $A(p,q)$ as defined in is such that $A(p,q)\in {\mathcal A}$. There exists a playing strategy $\{p_t\}_{t=1}^\infty$ such that for any $\{q_t\}_{t=1}^\infty$

In other words, our performance in the game is asymptotically as good as the best fixed action.

Proof:

We verify Blackwell’s Approachability Theorem for the vector payoff $\hat{A}(p,q)=(A(p,q)-A(i,q)\; :\; i=1,...,m)$ and convex region ${\mathcal A}=\{a: a_i\leq 0, i=1,...,m\}$. For all $q$ there exists $p$ such that component-wise $\hat{A}(p,q)\leq 0$, in particular we choose $p_{i^*}=1$ where $A(i^*,q)=\min_{i=1,...,m} A(i,q)$. This verifies condition 2. of Blackwell’s Approachability Theorem. Thus there exists a strategy $\{p_t\}_{t=1}^\infty$ such that for all $\{q_t\}_{t=1}^\infty$

which is equivalent to the required expression above.

1. Recall a half-space in ${\mathbb R}^k$ is a set ${\mathcal H}=\{a\in {\mathbb R}^k : n\cdot a \leq c\}$ for some $n\in {\mathbb R}^k$ and $c\in {\mathbb R}^k$.