Two-person zero-sum games

We consider a highly simplified game between two players.

  • Player \alpha has pure strategies \mathcal{S}^\alpha=\{ \alpha_1, \alpha_2,...,\alpha_m\}. Similarly, Player \beta has pure strategies \mathcal{S}^\beta=\{\beta_1,...,\beta_n\}.
  • If \alpha chooses strategy i\in {\mathcal S}_\alpha and \beta chooses strategy j\in {\mathcal S}_\beta then \alpha receives reward A_{ij}\in{\mathbb R} and \beta receives reward -A_{ij}.1 We call the game a two-person zero-sum game because the rewards sum to zero.
  • We can think of the outcome of the game being decided by player \alpha choosing a row i of the matrix A and \beta choosing the column j. Hence we may refer to \alpha as the row player and \beta as the column player.
  • We can allow players to randomize and choose a mixed strategy. We let <{\mathcal S}^\alpha > be the set of probability distributions on {\mathcal S}^\alpha. If p=(p_i:i\in{\mathcal S}^\alpha)\in<{\mathcal S}^\alpha>, p_i is the probability that player \alpha plays strategy i.
  • If player \alpha chooses mixed strategy p\in <{\mathcal S}^\alpha> and \beta chooses q\in<{\mathcal S}^\beta > then the probability of pure strategy i\in{\mathcal S}^\alpha with pure strategy j\in{\mathcal S}^\beta is p_iq_j thus the expect reward for \alpha is

As noted, we simplify the notion of a game. We do not explicitly consider the rules, movements or explicit structure of the game; we just consider strategies and their outcomes. Such a representation is often called the strategic form of the game. For this section, we consider a non-cooperative game, which means that players will seek to maximize there own rewards regardless of potential costs to others. Given a player cannot guarantee her opponent will act in her favor, what strategy might a rational player choose?

  • A player plays a maximin criterion if the player chooses the action with the highest guaranteed reward.If the player can only choose amongst pure strategies it chooses i\in {\mathcal S}^\alpha that maximizes \min_{j=1,...,m} A^\alpha_{ij}.If the player can choose mixed strategies it chooses p\in <\!{\mathcal S}^\alpha\! > that maximizes \min_{q\in<\!{\mathcal S}^\beta\! >} p^\top A q.

A maximin criterion is somewhat pessimistic; an opponent need not choose the strategy that gives the worst outcome. This said, one setting where it is in an player’s interest to take its opponents reward is a zero-sum game. Even so, it may still be the case that a player and his opponent choose maximin pure strategies but the opponents choice gives a reward above that anticipated by the players pessimistic maximin strategy. If we now allow players to choose mixed strategies, we see in following theorem that this is no longer the case and the player was correct to be pessimistic.

Thrm [Minimax Theorem] For any two-person zero-sum game

Moreover, there exist p^*\in <\!S^\alpha\! >, q^*\in <\!S^\beta\! > and v^*\in{\mathbb R} such that p^* and q^* achieve the respective maximizations and minimizations in and v^*=p^{*\top}Aq^*= \max_{p\in <\!S^\alpha\! >} \min_{q\in <\!S^\beta\! >} p^\top A q.

  • If player \alpha observes that on average \beta plays actions with distribution q\in <\!S^\beta\! > then his best strategy is to play p that maximizes p^\top A q. Given this, \beta’s best response is to minimize p^\top A q over q. This leads to the \min_q \max_p statement in equality . Conversely, \beta observing \alpha and \alpha responding give the \max_p\min_q statement in . Equation asserts that these incentives are aligned.
  • The fact there exists a p^* and q^* satisfying equality suggests there is a notion of equilibrium for this game: it would make no sense for \alpha (or \beta) to deviate from p^* (or q^*). If \alpha did and \beta’s strategy remain the the same demonstrates that \alpha either be worse off or, at best, be as well off as before: i.e. \forall p, p^\top A q^* \leq p^{*\top} A q^*.
  • We say that v^* is the value of the game and p^*, q^* is a solution of the game.
  • The equality is, by construction2, remarkably similar to the statement of .

Proof. Player \beta wishes to minimizes \alpha’s reward. So let’s consider the optimization problem

Let’s calculate the dual of this linear program. Minimizing its Lagrangian gives

This gives us a dual

which we interpret as \alpha attempting to maximize his reward. As Strong Duality holds for the feasible linear programs and , we can find a v^*, p^* and q^* where v^*, q^* solves and v^*, p^* solves .

Because \sum_{i} p^*_iA_{ij}\geq v^* and \sum_j A_{ij}q_j^*\leq v^*, we can say for all p\in <\! S^\alpha\! > and q\in <\!S^\beta \! >


Above, the two middle inequalities are exactly our statement , the left-most and right-most inequalities replace q^* and p^* with an optimization over q and p, respectively. So we know \min_q \max_p p^\top A q \leq \max_p \min_q p^\top A q.

We can demonstrate the reverse inequality

for all q\in<\!S^\beta \! > and \tilde{p}\in <\!S^\alpha\! >. Minimizing both sides over q

Finally maximizing the right-hand expression over \tilde{q} gives \min_q \max_p p^\top A q \geq \max_p \min_q p^\top A q. Thus \min_q \max_p p^\top A q = \max_p \min_q p^\top A q as required. Finally, from we see that v^*, p^* and q^* achieves this minimax saddle point. \square

Ex 1. [Matching Pennies] Suppose two players each have a coin, say 1 euro. Without showing their coin to their opponent, each player can choose to place the coin heads up or tails up. The players then reveal their coins to each other. If the coins faces match then the first player wins the coin. If the coins do not match the second player wins the coin. This leads to the game matrix

Clearly, no pure strategy equilibrium can exist e.g. if the first player always chooses heads then the second always chooses tails and so it is in the first player’s advantage to play tails more. By this same logic, intuitively, no player would want to bias towards heads or tails as their opponent could exploit this. Thus one might expect the solution to be a mixed strategy playing head and tails with equal probability and thus the expected value is zero. Let’s validate this.

Minimizing over q gives \min_q p^\top A p=-|p_H-p_T|. Thus \max_p\min_q p^\top A p=0 and is achieved by p=(1/2, 1/2). By the symmetric argument we see that q=(1/2, 1/2) also.

Ex 2. Find the solution and value of the two-person zero-sum game whose row player rewards are given by the matrix

Ex 3.  Consider A a two-player zero-sum game where players must play pure strategies under a maximin criterion. 1) Show that

2) Construct a 2\times 3 matrix where this inequality is strict.

Ex 4. We consider a two-person zero-sum game with the row player \alpha’s rewards given by matrix A.

  • Strategy i\in S^\alpha is weakly dominated if there exists i'\in {\mathcal S}^\alpha such that A_{ij}\leq A_{i'j} \forall j\in {\mathcal S}^\beta, i.e. the rewards of i' are better than i.

Suppose i\in{\mathcal S}^\alpha is a weakly dominated strategy and consider A' the game where the ith row is removed from matrix A, i.e. A' is the game where player can no long select strategy i. Prove that the game A' has the same value as A. Explain how a solution to the game A can be constructed from a solution of A'.

Ex 5. [Paper, Scissors, Stone] We consider a game of paper, scissors, stone. As a reminder, in this game there are two players. simultaneously the players reveal to each other the sign paper, scissors or stone. Paper is beaten by scissors; scissors is beaten by stone and stone is beaten by paper.

Ex 6. We assume for our game that if a player wins with stone the losing player pays 4 euro to the winner; similarly, if a player wins with scissors the loser pays 2 euro to the winner, and if a player wins with paper he pays 1 euro to the winner. We assume that for a draw no money is exchanged.

a) Write down A, the game matrix for this two-person zero sum game.

b) Prove that, for any game with A=-A^\top, the value of this game is 0 and there exist a solution where both players use the same (mixed) strategy which we call p.

c) Argue, for the solution p from part b), that if p_i>0 then the i-th entry of A^\top p equals zero.

d) Use the equalities from part c) to find a solution to this game.

Ex 7. We say that (p^*,q^*) are the solution of a two-person zero-sum game A if they respectively solve the maximizations and minimizations in the expression

We say that (p',q') are an equilibrium pair if they satisfy

a) Show that v=p'^\top Aq' is the same for all equilibrium pairs.

b) Show that (p',q') is an equilibrium pair iff it is a solution.

Ex 8. [A Blotto Game] Two armies battle over two areas of land. Each army must distribute their battalions over these two land regions. Army A has 3 battalions. Army B has 4 battalions. Battalions are indivisible and each army must place at least one battalion on each land region.

The army with most battalions depolyed on an area land wins that region of land. If an army places r battalions on a land region and the other army places s battalions on that same region, and without loss of generality r>s, then the army with r battalions wins and recieves a reward of s+1 and the other army losses recieves a loss of s+1: s for the battalions defeated and 1 for the land won/lost. If r=s, the battle results in a draw and the reward/loss to both armies is zero. Rewards are added and losses are subtracted over the land regions to give the total gain of each army.

Write out the game matrix for the total gain of army A in this two-person zero-sum game. Find the value and solutions of this game.

Ex 9. a) State the Minimax Theorem and, for a reward matrix A, define what is meant by the value and solution of a two-person zero-sum game.

Strategy i\in S^\alpha is weakly dominated if there exists i'\in {\mathcal S}^\alpha such that A_{ij}\leq A_{i'j} \forall j\in {\mathcal S}^\beta, i.e. the rewards of i' are better than i.

b) Suppose i\in{\mathcal S}^\alpha is a weakly dominated strategy and consider A' the game where the ith row is removed from matrix A, i.e. A' is the game where player can no long select strategy i. Prove that the game A' has the same value as A. Explain how a solution to the game A can be constructed from a solution of A'.

c) Find the value and a solution of the following two-person zero-sum game where row player rewards are given by the matrix

  1. A reward need not be money, it is simply some benefit that the player can quantify and order.
  2. Von Neumann developed the Minimax Theorem shortly after developing his theory of duality.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: