Sequentially a player decides to play and his adversary decides
. At time
, a decision
results in a vector payoff
. Given
is the average vector payoff at time
, Blackwell’s Approachability Theorem is a necessary and sufficient condition so that, regardless of the adversary’s decisions, the player makes the sequence of vectors
approach a convex set
.
- We consider a sequence
. Here each
is a probability distribution on a set of (pure) decisions
and similarly
is a probability distribution on
.
- For each pair
, there is a payoff vector
and average payoff vector
. Here
- At each time
, we assume
is chosen as a function of
and the past decisions
.
- We say that
approaches a set
if the distance between
and
converges to zero, namely,
Theorem (Blackwell’s Approachability Theorem): For closed convex set , the following are equivalent
is approachable.
- For every
there exists
such that
.
- Every half space containing
is approachable.1
Before a proof we comment on how Blackwell’s Approachability Theorem relates to the Minimax Theorem.
- For
,
holds iff the following are equivalent for every
i.
s.t.
,
[“without knowing
, we can think of a good
”]ii.
s.t.
.
[“if we know
, we can think of a good
”]
- Blackwell’s Approachability Theorem now makes the assumption that
is a vector.
- The equivalence between (1) and (2) found in Blackwell’s Approachability Theorem is analogous to the equivalence between (i) and the seemingly weaker statement (ii) found in the Minimax Theorem.
- However, we note that Approachability (1) is weaker than the statement
s.t.
. Approachability states
(chosen sequentially) so that
on average
converges to
as
.
- Finally, we observe that (3) approaching a half-space
corresponds to applying the Minimax Theorem to game matrix
whose
-th component is
.
Proof: : For every
containing
,
: For every
containing
,
:
so is immediate.
: Firstly note
At time , let the projection
be closest point to
in
and let
be the half-space containing
defined by normal
and
. In other words,
is defined by the hyperplane through
perpendicular to
. Since
is approachable, we can see from there must exist a
such that
for all distributions
. Using this
.
As , we now extract
from the term
:
Substituting this expression, we have
Thus multiplying both sides by and rearranging gives
Summing these interpolating terms gives
Thus, as required,
(Typo: the term 2A_{\max}^2 in the last 4 displays above should be 4A_{\max}^2.)
- Note that our proof is constructive. We define a policy for choosing
such that
where from
, the projection of
onto
we define
and
.
We now consider one interesting consequence of this result.
Theorem (Hannan-Gaddum Theorem)
Suppose as defined in is such that
. There exists a playing strategy
such that for any
In other words, our performance in the game is asymptotically as good as the best fixed action.
Proof:
We verify Blackwell’s Approachability Theorem for the vector payoff and convex region
. For all
there exists
such that component-wise
, in particular we choose
where
. This verifies condition 2. of Blackwell’s Approachability Theorem. Thus there exists a strategy
such that for all
which is equivalent to the required expression above.
- Recall a half-space in
is a set
for some
and
.↩
Excellent notes! One question: in the proof of the Blackwell’s Approachability Theorem, (3) -> (1): I wonder why ||A(p_t, q_t), a_t||^2 can be uniformly bounded from above by a constant 2A_{max}^2?
LikeLike
Excellent notes! For the proof of the Blackwell’s Approachability Theorem, (3) to (1): Can I ask why ||A(p_t, q_t) – a_t||^2 can be uniformly bounded from above by a constant 2 A_{max}^2?
LikeLike