Two-person zero-sum games

We consider a highly simplified game between two players.

Player $\alpha$ has pure strategies $\mathcal{S}^\alpha=\{ \alpha_1, \alpha_2,...,\alpha_m\}$ . Similarly, Player $\beta$ has pure strategies $\mathcal{S}^\beta=\{\beta_1,...,\beta_n\}$ .
If $\alpha$ chooses strategy $i\in {\mathcal S}_\alpha$ and $\beta$ chooses strategy $j\in {\mathcal S}_\beta$ then $\alpha$ receives reward $A_{ij}\in{\mathbb R}$ and $\beta$ receives reward $-A_{ij}$ .¹ We call the game a two-person zero-sum game because the rewards sum to zero.
We can think of the outcome of the game being decided by player $\alpha$ choosing a row $i$ of the matrix $A$ and $\beta$ choosing the column $j$ . Hence we may refer to $\alpha$ as the row player and $\beta$ as the column player.
We can allow players to randomize and choose a mixed strategy. We let $<{\mathcal S}^\alpha >$ be the set of probability distributions on ${\mathcal S}^\alpha$ . If $p=(p_i:i\in{\mathcal S}^\alpha)\in<{\mathcal S}^\alpha>$ , $p_i$ is the probability that player $\alpha$ plays strategy $i$ .
If player $\alpha$ chooses mixed strategy $p\in <{\mathcal S}^\alpha>$ and $\beta$ chooses $q\in<{\mathcal S}^\beta >$ then the probability of pure strategy $i\in{\mathcal S}^\alpha$ with pure strategy $j\in{\mathcal S}^\beta$ is $p_iq_j$ thus the expect reward for $\alpha$ is $\sum_{i=1}^m\sum_{j=1}^m p_iA_{ij}q_j = p^\top A q.$

As noted, we simplify the notion of a game. We do not explicitly consider the rules, movements or explicit structure of the game; we just consider strategies and their outcomes. Such a representation is often called the strategic form of the game. For this section, we consider a non-cooperative game, which means that players will seek to maximize there own rewards regardless of potential costs to others. Given a player cannot guarantee her opponent will act in her favor, what strategy might a rational player choose?

A player plays a maximin criterion if the player chooses the action with the highest guaranteed reward.If the player can only choose amongst pure strategies it chooses $i\in {\mathcal S}^\alpha$ that maximizes $\min_{j=1,...,m} A^\alpha_{ij}$ .If the player can choose mixed strategies it chooses $p\in <\!{\mathcal S}^\alpha\! >$ that maximizes $\min_{q\in<\!{\mathcal S}^\beta\! >} p^\top A q$ .

A maximin criterion is somewhat pessimistic; an opponent need not choose the strategy that gives the worst outcome. This said, one setting where it is in an player’s interest to take its opponents reward is a zero-sum game. Even so, it may still be the case that a player and his opponent choose maximin pure strategies but the opponents choice gives a reward above that anticipated by the players pessimistic maximin strategy. If we now allow players to choose mixed strategies, we see in following theorem that this is no longer the case and the player was correct to be pessimistic.

Thrm [Minimax Theorem] For any two-person zero-sum game

$\label{2PG:MM} \max_{p\in <\!S^\alpha\! >} \min_{q\in <\!S^\beta\! >} p^\top A q = \min_{q\in <\!S^\beta\! >} \max_{p\in <\!S^\alpha\! >} p^\top A q.$

Moreover, there exist $p^*\in <\!S^\alpha\! >$ , $q^*\in <\!S^\beta\! >$ and $v^*\in{\mathbb R}$ such that $p^*$ and $q^*$ achieve the respective maximizations and minimizations in and $v^*=p^{*\top}Aq^*= \max_{p\in <\!S^\alpha\! >} \min_{q\in <\!S^\beta\! >} p^\top A q$ .

If player $\alpha$ observes that on average $\beta$ plays actions with distribution $q\in <\!S^\beta\! >$ then his best strategy is to play $p$ that maximizes $p^\top A q$ . Given this, $\beta$ ’s best response is to minimize $p^\top A q$ over $q$ . This leads to the $\min_q \max_p$ statement in equality . Conversely, $\beta$ observing $\alpha$ and $\alpha$ responding give the $\max_p\min_q$ statement in . Equation asserts that these incentives are aligned.
The fact there exists a $p^*$ and $q^*$ satisfying equality suggests there is a notion of equilibrium for this game: it would make no sense for $\alpha$ (or $\beta$ ) to deviate from $p^*$ (or $q^*$ ). If $\alpha$ did and $\beta$ ’s strategy remain the the same demonstrates that $\alpha$ either be worse off or, at best, be as well off as before: i.e. $\forall p$ , $p^\top A q^* \leq p^{*\top} A q^*$ .
We say that $v^*$ is the value of the game and $p^*$ , $q^*$ is a solution of the game.
The equality is, by construction², remarkably similar to the statement of .

Proof. Player $\beta$ wishes to minimizes $\alpha$ ’s reward. So let’s consider the optimization problem

$\label{2PG:MMP} \text{minimize}\quad v \text{subject to} \sum_{j=1}^n A_{ij}q_j \leq v,\;i=1,...,n,\quad \sum_{j=1}^n q_j =1 \text{over} v\in{\mathbb R},\quad q_j\geq 0,\; j=1,...,n.$

Let’s calculate the dual of this linear program. Minimizing its Lagrangian gives

$\begin{aligned} \min_{v,q\geq 0,z\geq 0}\; L(v,q,z;w,p)&= \min_{v,q\geq 0,z\geq 0}\;\; v - \sum_i p_i\Big( v - \sum_{j} A_{ij}q_j -z_i\Big) + w\Big(1- \sum_i q_i \Big)\\ &= \min_{v,q\geq 0,z\geq 0}\;\; w + \underbrace{\Big(1 - \sum_i p_i\Big)}_{=0} v + \sum_{j} \underbrace{\Big( \sum_i p_i A_{ij} -w\Big)}_{\geq 0} q_j + \underbrace{\sum_ip_i}_{\geq 0}z_i. \end{aligned}$

This gives us a dual

$\label{2PG:MMD} \text{minimize}\quad w \text{subject to} \sum_{i=1}^m p_iA_{ij}\geq w,\;i=1,...,m,\quad \sum_{i=1}^n q_j =1 \text{over} w\geq 0,\quad p_i\geq 0,\; i=1,...,m,$

which we interpret as $\alpha$ attempting to maximize his reward. As Strong Duality holds for the feasible linear programs and , we can find a $v^*$ , $p^*$ and $q^*$ where $v^*$ , $q^*$ solves and $v^*$ , $p^*$ solves .

Because $\sum_{i} p^*_iA_{ij}\geq v^*$ and $\sum_j A_{ij}q_j^*\leq v^*$ , we can say for all $p\in <\! S^\alpha\! >$ and $q\in <\!S^\beta \! >$

$\label{2PG:vpq} \sum_i \sum_j p_iA_{ij} q_j^* \leq \sum_i p_i v^* =v^*=\sum_j v^* q_j \leq \sum_i \sum_j p^*_iA_{ij}q_j.$

Thus

$\label{2PG:MMD2} \min_q \max_p p^\top A q \leq \max_p p^\top A q^* \leq v^* \leq \min_q p^{*\top} A q \leq \max_p \min_q p^\top A q.$

Above, the two middle inequalities are exactly our statement , the left-most and right-most inequalities replace $q^*$ and $p^*$ with an optimization over $q$ and $p$ , respectively. So we know $\min_q \max_p p^\top A q \leq \max_p \min_q p^\top A q$ .

We can demonstrate the reverse inequality

$\max_p p^\top A q \geq \tilde{p}^\top A q,$

for all $q\in<\!S^\beta \! >$ and $\tilde{p}\in <\!S^\alpha\! >$ . Minimizing both sides over $q$

$\min_q \max_p p^\top A q \geq \min_q \tilde{p}^\top A q.$

Finally maximizing the right-hand expression over $\tilde{q}$ gives $\min_q \max_p p^\top A q \geq \max_p \min_q p^\top A q$ . Thus $\min_q \max_p p^\top A q = \max_p \min_q p^\top A q$ as required. Finally, from we see that $v^*$ , $p^*$ and $q^*$ achieves this minimax saddle point. $\square$

Ex 1. [Matching Pennies] Suppose two players each have a coin, say $1$ euro. Without showing their coin to their opponent, each player can choose to place the coin heads up or tails up. The players then reveal their coins to each other. If the coins faces match then the first player wins the coin. If the coins do not match the second player wins the coin. This leads to the game matrix
$\begin{aligned} & \;\quad H\qquad T\\ \begin{array}{c} H \\ T \end{array} & \left(\begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array}\right) \end{aligned}$

Clearly, no pure strategy equilibrium can exist e.g. if the first player always chooses heads then the second always chooses tails and so it is in the first player’s advantage to play tails more. By this same logic, intuitively, no player would want to bias towards heads or tails as their opponent could exploit this. Thus one might expect the solution to be a mixed strategy playing head and tails with equal probability and thus the expected value is zero. Let’s validate this.

$p^\top Aq=\begin{array}{c} (p_H\; p_T)\\ \; \end{array} \left(\begin{array}{rr} 1 & -1 \\ -1 & 1 \end{array}\right) \left(\!\begin{array}{r} q_H \\ q_T \end{array}\!\right) = (p_H-p_T)q_H + (p_T-p_H)q_T.$

Minimizing over $q$ gives $\min_q p^\top A p=-|p_H-p_T|$ . Thus $\max_p\min_q p^\top A p=0$ and is achieved by $p=(1/2, 1/2)$ . By the symmetric argument we see that $q=(1/2, 1/2)$ also.

Ex 2. Find the solution and value of the two-person zero-sum game whose row player rewards are given by the matrix

$\left( \!\! \begin{array}{rr} 2 & 1\\ -1 & 2 \end{array} \right)$

Ex 3. Consider $A$ a two-player zero-sum game where players must play pure strategies under a maximin criterion. 1) Show that

$\max_i \min_j A_{ij} \leq \min_j \max_i A_{ij}.$

2) Construct a $2\times 3$ matrix where this inequality is strict.

Ex 4. We consider a two-person zero-sum game with the row player $\alpha$ ’s rewards given by matrix $A$ .

Strategy $i\in S^\alpha$ is weakly dominated if there exists $i'\in {\mathcal S}^\alpha$ such that $A_{ij}\leq A_{i'j}$ $\forall j\in {\mathcal S}^\beta$ , i.e. the rewards of $i'$ are better than $i$ .

Suppose $i\in{\mathcal S}^\alpha$ is a weakly dominated strategy and consider $A'$ the game where the $i$ th row is removed from matrix $A$ , i.e. $A'$ is the game where player can no long select strategy $i$ . Prove that the game $A'$ has the same value as $A$ . Explain how a solution to the game $A$ can be constructed from a solution of $A'$ .

Ex 5. [Paper, Scissors, Stone] We consider a game of paper, scissors, stone. As a reminder, in this game there are two players. simultaneously the players reveal to each other the sign paper, scissors or stone. Paper is beaten by scissors; scissors is beaten by stone and stone is beaten by paper.

Ex 6. We assume for our game that if a player wins with stone the losing player pays $4$ euro to the winner; similarly, if a player wins with scissors the loser pays $2$ euro to the winner, and if a player wins with paper he pays $1$ euro to the winner. We assume that for a draw no money is exchanged.

a) Write down $A$ , the game matrix for this two-person zero sum game.

b) Prove that, for any game with $A=-A^\top$ , the value of this game is $0$ and there exist a solution where both players use the same (mixed) strategy which we call $p$ .

c) Argue, for the solution $p$ from part b), that if $p_i>0$ then the $i$ -th entry of $A^\top p$ equals zero.

d) Use the equalities from part c) to find a solution to this game.

Ex 7. We say that $(p^*,q^*)$ are the solution of a two-person zero-sum game $A$ if they respectively solve the maximizations and minimizations in the expression

$\max_{p\in <\!S^\alpha\! >} \min_{q\in <\!S^\beta\! >} p^\top A q = \min_{q\in <\!S^\beta\! >} \max_{p\in <\!S^\alpha\! >} p^\top A q.$

We say that $(p',q')$ are an equilibrium pair if they satisfy

$p^\top A q' \leq p'^\top A q' \leq p^\top A q',\qquad \forall p\in <\!S^\alpha\! >, q\in <\!S^\beta\! >.$

a) Show that $v=p'^\top Aq'$ is the same for all equilibrium pairs.

b) Show that $(p',q')$ is an equilibrium pair iff it is a solution.

Ex 8. [A Blotto Game] Two armies battle over two areas of land. Each army must distribute their battalions over these two land regions. Army $A$ has $3$ battalions. Army $B$ has $4$ battalions. Battalions are indivisible and each army must place at least one battalion on each land region.

The army with most battalions depolyed on an area land wins that region of land. If an army places $r$ battalions on a land region and the other army places $s$ battalions on that same region, and without loss of generality $r>s$ , then the army with $r$ battalions wins and recieves a reward of $s+1$ and the other army losses recieves a loss of $s+1$ : $s$ for the battalions defeated and $1$ for the land won/lost. If $r=s$ , the battle results in a draw and the reward/loss to both armies is zero. Rewards are added and losses are subtracted over the land regions to give the total gain of each army.

Write out the game matrix for the total gain of army A in this two-person zero-sum game. Find the value and solutions of this game.

Ex 9. a) State the Minimax Theorem and, for a reward matrix $A$ , define what is meant by the value and solution of a two-person zero-sum game.

Strategy $i\in S^\alpha$ is weakly dominated if there exists $i'\in {\mathcal S}^\alpha$ such that $A_{ij}\leq A_{i'j}$ $\forall j\in {\mathcal S}^\beta$ , i.e. the rewards of $i'$ are better than $i$ .

b) Suppose $i\in{\mathcal S}^\alpha$ is a weakly dominated strategy and consider $A'$ the game where the $i$ th row is removed from matrix $A$ , i.e. $A'$ is the game where player can no long select strategy $i$ . Prove that the game $A'$ has the same value as $A$ . Explain how a solution to the game $A$ can be constructed from a solution of $A'$ .

c) Find the value and a solution of the following two-person zero-sum game where row player rewards are given by the matrix

$\begin{aligned} \left(\begin{array}{rrr} 1 & 0 & 2\\ 2 & - 1 & -3 \\ -1 & 0 & 1 \end{array}\right). \end{aligned}$

A reward need not be money, it is simply some benefit that the player can quantify and order.↩
Von Neumann developed the Minimax Theorem shortly after developing his theory of duality.↩

Share this:

Leave a comment Cancel reply