EC941 - Game Theory: Prof. Francesco Squintani, Email: F.squintani@warwick.ac.uk

EC941 - Game Theory
Lecture 7
Prof. Francesco Squintani
Email: F.squintani@warwick.ac.uk
Structure of the Lecture
• Infinitely Repeated Games
• Nash and Subgame-Perfect Equilibrium
• Finitely Repeated Games
Repeated Games
• Repeated games are a special class of interactions, represented as extensive form games.
• A simultaneous-move game, represented as a normal form game, is repeated over time.
• Repetition enlarges the set of equilibria if players are sufficiently patient. For example, cooperation can be sustained in a subgame perfect equilibrium of the infinitely repeated prisoner’s dilemma.
Definition. Let $G = (N, A, u)$ be a strategic game. Let $T$ be finite or infinite. The $T$-repeated game of $G$ for the discount factor $\delta$ is the extensive game in which:
• the set of players is $N$;
• the set of terminal histories is the set of sequences $(a^1, a^2, \ldots, a^T)$ of action profiles in $G$ (infinite sequences $(a^1, a^2, \ldots)$ when $T$ is infinite);
• the player function assigns the set of all players to every proper subhistory of every terminal history;
• the set of actions of player $i$ after any history is $A_i$;
• each player $i$ evaluates each terminal history $(a^1, a^2, \ldots)$ according to its discounted average
$(1 - \delta) \sum_{t=1}^{T} \delta^{t-1} u_i(a^t)$.
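As a rough numerical illustration of the discounted average in the definition above, the following Python sketch evaluates it on a finite truncation of a payoff stream. The function name `discounted_average` and the example stream are illustrative assumptions, not part of the lecture.

```python
def discounted_average(payoffs, delta):
    """(1 - delta) * sum over t >= 1 of delta**(t-1) * u_t, for a finite list of payoffs."""
    # enumerate starts at 0, which matches the exponent t - 1 for t = 1, 2, ...
    return (1 - delta) * sum(delta ** t * u for t, u in enumerate(payoffs))


# Hypothetical example: a constant stream of 2s, truncated after 2000 periods.
constant_stream = [2] * 2000
print(discounted_average(constant_stream, 0.9))  # very close to 2
```

The factor (1 - delta) rescales the discounted sum so that a constant stream of payoff c has discounted average c, which is what makes it comparable to stage-game payoffs.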
Repeated Prisoner’s Dilemma
• Suppose that the following game is infinitely repeated with discount factor $\delta$.

          C       D
    C   2, 2    0, 3
    D   3, 0    1, 1
Strategies
• A player’s strategy in an extensive game specifies her action after all possible histories after which it is her turn to move.
• A strategy of player $i$ in an infinitely repeated game of the strategic game $G$ specifies an action of player $i$ (a member of $A_i$) for every finite sequence $(a^1, \ldots, a^T)$ of outcomes of $G$, including the empty sequence at the start of the game.
Grim Trigger Strategy
• Consider the repeated prisoner’s dilemma.
• The strategy prescribes that the player initially cooperates, and continues to do so as long as both players cooperated in all previous periods.
$s_i(a^1, \ldots, a^T) = D$ if $a^t \ne (C, C)$ for some $t = 1, \ldots, T$.
$s_i(a^1, \ldots, a^T) = C$ otherwise (including at the start of the game).
• Note that a player defects if either she or her opponent defected in the past.
Automaton Representation
An automaton for player $i$ is $(X, x_0, f, g)$.
1. $X$ is a set of states.
2. $x_0 \in X$ is the initial state of the automaton.
3. $f: X \times A \to X$ is the transition function across states, as a function of the play.
4. $g: X \to A_i$ is the output function, giving the action played in each state.
The automaton of the Grim Trigger Strategy is as follows:
• There are two states: C, in which C is chosen, and D, in which D is chosen.
• The initial state is C.
• If the play in any period is not (C,C), then the state changes to D.
• Once the automaton is in state D, it remains there forever.
Transition diagram (* marks the initial state):
  * C --(C,C)--> C
    C --(C,D), (D,C), (D,D)--> D
    D --any outcome--> D
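Tit for Tat
• The player initially cooperates.
• In each subsequent round, she plays the action played by the opponent in the previous round.
$s_i(\emptyset) = C$.
$s_i(a^1, \ldots, a^T) = C$ if $a_j^T = C$.
$s_i(a^1, \ldots, a^T) = D$ if $a_j^T = D$.
Transition diagram (* marks the initial state; (·,C) and (·,D) denote any outcome in which the opponent plays C or D):
  * C --(·,C)--> C
    C --(·,D)--> D
    D --(·,D)--> D
    D --(·,C)--> C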
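The two automata above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the class `Automaton`, the helper `play`, and the extra `always_defect` machine are hypothetical names introduced here, and each machine reads outcomes from its own perspective as (own action, opponent's action).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Outcome = Tuple[str, str]  # (own action, opponent's action)


@dataclass
class Automaton:
    initial: str                                # x0
    transition: Callable[[str, Outcome], str]   # f: X x A -> X
    output: Dict[str, str]                      # g: X -> A_i


grim = Automaton(
    initial="C",
    # Stay in C only if the state is C and the outcome was (C, C); otherwise D forever.
    transition=lambda x, a: "C" if x == "C" and a == ("C", "C") else "D",
    output={"C": "C", "D": "D"},
)

tit_for_tat = Automaton(
    initial="C",
    # The next state copies the opponent's last action.
    transition=lambda x, a: a[1],
    output={"C": "C", "D": "D"},
)

# Hypothetical opponent used only to trigger the punishments.
always_defect = Automaton(initial="D", transition=lambda x, a: "D", output={"D": "D"})


def play(m1, m2, periods):
    """Return the path of outcomes (a1, a2) when m1 and m2 play for `periods` periods."""
    x1, x2 = m1.initial, m2.initial
    path = []
    for _ in range(periods):
        a1, a2 = m1.output[x1], m2.output[x2]
        path.append((a1, a2))
        x1 = m1.transition(x1, (a1, a2))
        x2 = m2.transition(x2, (a2, a1))
    return path


print(play(grim, grim, 3))
print(play(grim, always_defect, 3))
```

Against itself, grim trigger stays in state C; against `always_defect`, it cooperates once and then defects in every later period.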
Grim Trigger Nash Equilibrium
• Suppose that player $j$ adopts the grim trigger strategy.
• If player $i$ plays grim trigger, then the outcome is (C, C) in every period, with payoffs (2, 2, . . .). The discounted average is 2.
• If $i$ deviates from the grim trigger strategy, then there is at least one period in which she chooses D.
• In all subsequent periods player $j$ chooses D. So the best deviation for player $i$ is to choose D in every subsequent period (because D is her unique best response to D).
• If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.
• She obtains the stream of payoffs (3, 1, 1, . . .) with discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.
• Thus player $i$ cannot increase her payoff by deviating if and only if
$2 \ge 3(1 - \delta) + \delta$, or $\delta \ge 1/2$.
• Hence, if $\delta \ge 1/2$, then the strategy pair in which both players play the grim trigger strategy is a Nash equilibrium of the infinitely repeated Prisoner’s Dilemma.
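The threshold derived above can be checked with exact arithmetic. The sketch below is illustrative only; the names `deviation_payoff` and `grim_trigger_is_nash` are assumptions introduced here, and the payoffs are those of the Prisoner's Dilemma in the lecture.

```python
from fractions import Fraction

COOPERATION_PAYOFF = Fraction(2)  # discounted average of the stream (2, 2, ...)


def deviation_payoff(delta):
    """Discounted average of the stream (3, 1, 1, ...): 3(1 - delta) + delta."""
    return 3 * (1 - delta) + delta


def grim_trigger_is_nash(delta):
    """No profitable deviation if and only if 2 >= 3(1 - delta) + delta."""
    return COOPERATION_PAYOFF >= deviation_payoff(delta)


for delta in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(delta, deviation_payoff(delta), grim_trigger_is_nash(delta))
```

At delta = 1/2 the deviation payoff equals 2 exactly, so the player is indifferent and the weak inequality holds.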
Tit for Tat Nash Equilibrium
• Suppose that player $j$ adopts the tit for tat strategy.
• If player $i$ plays tit for tat, then the outcome is (C, C) in every period, with payoffs (2, 2, . . .).
• If $i$ deviates, then there is at least one period in which she chooses D.
• In the subsequent period player $j$ chooses D. If player $i$ then plays D, she triggers one further D by $j$. If $i$ plays C, then $j$ returns to C in the period after.
• If $i$ can increase her payoff by deviating, then she can do so by deviating to D in the first period.
• Player $i$ can obtain either the stream (3, 1, 1, 1, …), with discounted average $3(1 - \delta) + \delta$, or the stream (3, 0, 3, 0, . . .), with discounted average
$(1 - \delta)[3 + 0\delta + 3\delta^2 + 0\delta^3 + \cdots] = 3(1 - \delta) \sum_{t=0}^{\infty} \delta^{2t} = 3(1 - \delta)/(1 - \delta^2) = 3/(1 + \delta)$.
• Thus player $i$ cannot increase her payoff by deviating if and only if
$2 \ge 3(1 - \delta) + \delta$ and $2 \ge 3/(1 + \delta)$, or $\delta \ge 1/2$.
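The two deviation streams against tit for tat can be compared numerically. This is a small sketch under the lecture's payoffs; the function names are assumptions introduced for illustration, and `truncated_alternate` only approximates the infinite sum.

```python
from fractions import Fraction


def defect_forever(delta):
    """Stream (3, 1, 1, ...): 3(1 - delta) + delta."""
    return 3 * (1 - delta) + delta


def alternate(delta):
    """Stream (3, 0, 3, 0, ...): closed form 3 / (1 + delta)."""
    return 3 / (1 + delta)


def truncated_alternate(delta, periods):
    """Partial sum of (1 - delta) * [3 + 0*delta + 3*delta**2 + ...] over `periods` periods."""
    return (1 - delta) * sum(3 * delta ** t for t in range(0, periods, 2))


def tit_for_tat_is_nash(delta):
    """Both deviation streams must yield at most the cooperation payoff 2."""
    return 2 >= defect_forever(delta) and 2 >= alternate(delta)


delta = Fraction(1, 2)
print(alternate(delta), float(truncated_alternate(delta, 200)))
print(tit_for_tat_is_nash(delta), tit_for_tat_is_nash(Fraction(2, 5)))
```

Both conditions bind at the same point, delta = 1/2, which is why the slide reports a single threshold.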
Nash Folk Theorem in the Prisoner’s Dilemma
• Definition. The set of feasible payoff profiles of a strategic game is the set of all weighted averages of payoff profiles in the game.
• For any feasible pair $(x_1, x_2)$ of payoffs and any $\varepsilon_1 > 0$, there is a finite sequence $(a^1, \ldots, a^k)$ of outcomes for which each player $i$’s average payoff is within $\varepsilon_1$ of $x_i$:
$[u_i(a^1) + \cdots + u_i(a^k)]/k - \varepsilon_1 < x_i < [u_i(a^1) + \cdots + u_i(a^k)]/k + \varepsilon_1$.
• When this sequence is repeated forever (so that $a^{t+k} = a^t$), the discounted average payoff is arbitrarily close to $x_i$ for a discount factor close enough to 1:
$(1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) - \varepsilon_2 < x_i < (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} u_i(a^t) + \varepsilon_2$.
• Consider the feasible payoff pair $(x_1, x_2)$, and the outcome path $b$ that consists of repetitions of the sequence $(a^1, \ldots, a^k)$: $b^{nk+l} = a^l$ for $l = 1, \ldots, k$ and $n = 0, 1, 2, \ldots$
• Consider the strategy
$s_i(h^1, \ldots, h^{T-1}) = b_i^T$ if $h^t = b^t$ for $t = 1, \ldots, T - 1$,
$s_i(h^1, \ldots, h^{T-1}) = D$ otherwise.
• As long as $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$, the pair of these “grim trigger” strategies is a Nash equilibrium when $\delta$ is close enough to 1.
• We conclude that any feasible payoff pair $(x_1, x_2)$ such that $x_1 > u_1(D, D)$ and $x_2 > u_2(D, D)$ can be approximated by a Nash equilibrium payoff of the infinitely repeated Prisoner’s Dilemma when $\delta$ is close enough to 1.
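To illustrate how the discounted average of a repeated cycle approaches the simple average as the discount factor approaches 1, here is a minimal sketch. The function `cycle_discounted_average` and the example cycle (alternating (C, D) and (D, C), so player 1 receives 0, 3, 0, 3, ...) are assumptions chosen for illustration.

```python
from fractions import Fraction


def cycle_discounted_average(cycle, delta):
    """Discounted average of the infinite path that repeats `cycle` forever.

    Uses the geometric sum over cycles: sum_{n>=0} delta**(n*k) = 1 / (1 - delta**k),
    where k = len(cycle).
    """
    k = len(cycle)
    one_cycle = sum(delta ** l * u for l, u in enumerate(cycle))
    return (1 - delta) * one_cycle / (1 - delta ** k)


# Hypothetical example: player 1's payoffs when (C, D) and (D, C) alternate.
cycle = [0, 3]
simple_average = Fraction(sum(cycle), len(cycle))  # 3/2
for delta in (Fraction(1, 2), Fraction(9, 10), Fraction(99, 100)):
    print(delta, float(cycle_discounted_average(cycle, delta)), float(simple_average))
```

For this cycle the closed form is 3δ/(1 + δ), which tends to 3/2 as δ tends to 1; by symmetry player 2's discounted average tends to 3/2 as well, and both exceed the payoff 1 from (D, D).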
Nash Folk Theorem
• Consider a one-shot game. Suppose that each player $i$ can guarantee herself a “minimum” payoff $m_i$.
• We will show that every feasible payoff profile $w$ such that $w_i > m_i$ for every player $i$ can be achieved as the discounted average payoff profile of a Nash equilibrium of the infinitely repeated game, when $\delta$ is close to 1.
• This payoff can be achieved with strategies similar to grim trigger strategies. A deviation from the path by player $i$ is punished by minimizing $i$’s payoff forever.
• For the Prisoner’s Dilemma, the minimum payoff of player $i$ supported by a Nash equilibrium is $u_i(D, D)$.
• Player $j$ can ensure (by choosing D) that player $i$’s payoff does not exceed $u_i(D, D)$, and there is no lower payoff with this property.
• Hence, $u_i(D, D)$ is the lowest payoff that player $j$ can force upon player $i$.
• What is this minimum payoff for player $i$ in an arbitrary game?
• For any collection $\alpha_{-i}$ of the other players’ mixed actions, player $i$’s highest possible payoff is
$\max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.
• As $\alpha_{-i}$ changes, this maximal payoff changes. The collection $\alpha_{-i}$ of “punishments” that makes this maximum as small as possible is the solution of
$\min_{\alpha_{-i} \in \Delta(A_{-i})} \max_{a_i \in A_i} u_i(a_i, \alpha_{-i})$.
• This payoff is known as player $i$’s minmax payoff.
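The min-max computation can be sketched for a two-player game as below. Note the simplifying assumption, flagged loudly: this sketch minimizes over the opponent's pure actions only, whereas the definition above allows mixed actions, so in general it gives an upper bound on the minmax payoff. The name `pure_minmax` is hypothetical.

```python
def pure_minmax(payoff, own_actions, other_actions):
    """Min over the opponent's pure actions of player i's best-response payoff.

    `payoff[(a_i, a_j)]` is player i's payoff when she plays a_i and the opponent plays a_j.
    Assumption: the opponent is restricted to pure actions.
    """
    return min(
        max(payoff[(a_i, a_j)] for a_i in own_actions)
        for a_j in other_actions
    )


# Player 1's payoffs in the Prisoner's Dilemma from the lecture.
u1 = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
print(pure_minmax(u1, ["C", "D"], ["C", "D"]))  # 1, i.e. u_1(D, D)
```

In the Prisoner's Dilemma the restriction is harmless, because D is a dominant action and the opponent's harshest punishment is the pure action D.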
Theorem (Nash Folk Theorem). Let $G$ be a strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her minmax payoff. Then, for all $\varepsilon > 0$, there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$, then the infinitely repeated game of $G$ has a Nash equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.
• For any discount factor $\delta$ with $0 < \delta < 1$, the discounted average payoff of every player in any Nash equilibrium of the infinitely repeated game of $G$ is at least her minmax payoff.
• Proof sketch. Suppose that the target payoff profile is induced by an action profile $a$, and let $x$ be the payoff profile induced by $a$. By hypothesis, each $x_i$ exceeds player $i$’s minmax payoff.
• For each player $i$, let $p_{-i}$ be a profile of mixed actions for the players other than $i$ that holds player $i$ down to her minmax payoff.
• Informally, each player $i$’s strategy is as follows: in each period, play $a_i$ as long as the play was $a$ in every previous period.
• Let $H^*$ be the set of histories in which there is a period in which exactly one player $j$ chose an action different from $a_j$.
• Refer to $j$ as a lone deviant.
• The strategy of player $i$ is defined as follows:
$s_i(\emptyset) = a_i$,
$s_i(h) = a_i$ if $h$ is not in $H^*$,
$s_i(h) = (p_{-j})_i$ if $h \in H^*$ and $j$ is the first lone deviant in $h$.
• We now show that the profile $s$ is a Nash equilibrium.
• If each player $i$ adheres to $s_i$, then her payoff is $x_i$ in every period.
• If player $i$ deviates from $s_i$, then she may gain in the period in which she deviates, but she loses in every subsequent period, obtaining at most her minmax payoff rather than $x_i$.
• Thus, for a discount factor close enough to 1, $s_i$ is a best response to $s_{-i}$ for every player $i$.
Subgame Perfect Equilibrium
Theorem (One-Shot Deviation Property of subgame perfect equilibria of infinitely repeated games). A strategy profile in an infinitely repeated game with discount factor less than 1 is a subgame perfect equilibrium if and only if no player can gain by changing her action after any history, given both the strategies of the other players and the remainder of her own strategy.
• The One-Shot Deviation Principle is deceptively simple. Its application is often not straightforward.
• First, it requires that players cannot gain by deviating once after any history of play.
• But there are infinitely many histories, so they cannot be checked one by one.
• They must be grouped according to the prescriptions of the strategy profile we are considering.
• Second, the one-shot deviation may change future play, according to the strategy that we are considering. The deviation is one-shot from a strategy, not from a path of play.
Grim Trigger Strategy
• Consider the repeated prisoner’s dilemma.
• The strategy prescribes that the player initially cooperates, and continues to do so as long as both players cooperated in all previous periods. Otherwise, she defects forever.
$s_i(a^1, \ldots, a^T) = D$ if $a^t \ne (C, C)$ for some $t = 1, \ldots, T$.
$s_i(a^1, \ldots, a^T) = C$ otherwise.
Grim Trigger SPE
• Suppose that both players adopt the grim trigger strategy.
• There are two “groups” of histories: those after which the grim trigger strategy prescribes that the players play (C,C), and those after which it prescribes that they play (D,D).
• In the first set of histories, if player $i$ plays grim trigger, then the outcome is (C, C) in every period with payoffs (2, 2, . . .), whose discounted average is 2.
• If $i$ deviates only once, she plays D. Then she reverts to the grim trigger strategy, which prescribes D in all subsequent periods.
• The opponent, playing the grim trigger strategy, plays D forever as a consequence of $i$’s one-shot deviation.
• The one-shot deviation (OSD) yields the stream of payoffs (3, 1, 1, . . .) with discounted average
$(1 - \delta)[3 + \delta + \delta^2 + \delta^3 + \cdots] = 3(1 - \delta) + \delta$.
• Thus player $i$ cannot increase her payoff by deviating if and only if
$2 \ge 3(1 - \delta) + \delta$, or $\delta \ge 1/2$.
• In the second set of histories, if player $i$ plays grim trigger, then the outcome is (D, D) in every period with payoffs (1, 1, . . .), whose discounted average is 1.
• If $i$ deviates only once, she plays C. Then she reverts to the grim trigger strategy, which prescribes D in all subsequent periods.
• The opponent, playing the grim trigger strategy, plays D in every period regardless of $i$’s one-shot deviation.
• The OSD yields the stream of payoffs (0, 1, 1, . . .) with discounted average
$(1 - \delta)[0 + \delta + \delta^2 + \delta^3 + \cdots] = \delta$.
• Player $i$ cannot increase her payoff by deviating, because $1 \ge \delta$.
• We conclude that if $\delta \ge 1/2$, then the strategy pair in which each player’s strategy is the grim trigger strategy is a subgame perfect equilibrium of the infinitely repeated Prisoner’s Dilemma.
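The one-shot deviation check above groups histories by the state of the grim trigger automaton, and this grouping can be sketched in code. The sketch is illustrative: the names `U`, `VALUE`, `next_state`, `osd_profitable` and `grim_trigger_is_spe` are assumptions introduced here, and the continuation values 2 and 1 are the discounted averages computed in the text.

```python
from fractions import Fraction

# Player i's stage payoff, indexed by (own action, opponent's action).
U = {("C", "C"): 2, ("C", "D"): 0, ("D", "C"): 3, ("D", "D"): 1}
PRESCRIBED = {"C": "C", "D": "D"}              # action prescribed in each state
VALUE = {"C": Fraction(2), "D": Fraction(1)}   # discounted average from following the profile


def next_state(state, own, other):
    """Grim trigger transition: stay in C only after (C, C) in state C."""
    return "C" if state == "C" and (own, other) == ("C", "C") else "D"


def osd_profitable(state, delta):
    """True if a single deviation in `state`, followed by adherence, raises the payoff."""
    other = PRESCRIBED[state]
    deviation = "D" if PRESCRIBED[state] == "C" else "C"
    continuation = VALUE[next_state(state, deviation, other)]
    deviation_value = (1 - delta) * U[(deviation, other)] + delta * continuation
    return deviation_value > VALUE[state]


def grim_trigger_is_spe(delta):
    """One-shot deviation check over the two groups of histories (states C and D)."""
    return not any(osd_profitable(state, delta) for state in ("C", "D"))


for delta in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(delta, grim_trigger_is_spe(delta))
```

Only the check in state C depends on the threshold 1/2; in state D the deviation yields δ, which never exceeds 1.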
SPE Folk Theorem
Theorem (Simplified Subgame Perfect Folk Theorem for Two-Player Games). Let $G$ be a two-player strategic game. Let $w$ be a feasible payoff profile of $G$ for which each player’s payoff exceeds her (pure-strategy) minmax payoff. Then for all $\varepsilon > 0$ there exists $\underline{\delta} < 1$ such that if the discount factor exceeds $\underline{\delta}$, then the infinitely repeated game of $G$ has a subgame perfect equilibrium whose discounted average payoff profile $w'$ satisfies $|w' - w| < \varepsilon$.
• Take an outcome $a$ such that both players’ discounted payoffs exceed their pure-strategy minmax payoffs.
• Let $p_j$ be an action of player $i$ that holds player $j$ down to her minmax payoff, and let $p = (p_2, p_1)$.
• If the minmax profile $p$ is a Nash equilibrium of the stage game, then consider a modified grim trigger strategy such that both players play $a_t$ at each time $t$, and, if either player deviates, $p$ is played forever.
• Because both players’ discounted payoffs from $a$ exceed their minmax payoffs, if the discount factor $\delta$ is sufficiently close to one, the players will adhere to the modified grim trigger strategy, yielding the …
• If $p$ is not a Nash equilibrium, the proof is as follows.
• Let $s_i$ be a strategy of player $i$ that starts off choosing $a_{i,0}$, and continues to choose $a_{i,t}$ so long as the previous outcome was $a_t$; otherwise, it chooses the action $p_j$ that holds player $j$ to her minmax payoff.
• Once punishment begins, it continues for $k$ periods, as long as both players choose their punishment actions, and then the players revert to $a$.
• If any player $j$ deviates from the assigned punishment action, then the punishment is restarted, and player $j$ is now punished.
• To prove that $(s_1, s_2)$ is a subgame perfect equilibrium, we now find $\delta'$ and $k(\delta')$ such that if $\delta > \delta'$ then the strategy pair $(s_1, s_2)$ is a subgame perfect equilibrium of the infinitely repeated game.
• Suppose that player $j$ adheres to $s_j$.
• If player $i$ adheres to $s_i$ after any history with no deviations, then her discounted average payoff is $u_i(a)$.
• If she deviates, she obtains at most her maximal payoff in the game, say $u_i^*$, in the period of her deviation, then $u_i(p)$ for $k$ periods, and $u_i(a)$ in every subsequent period.
• Her discounted average payoff from the deviation is at most
$(1 - \delta)[u_i^* + \delta u_i(p) + \cdots + \delta^k u_i(p)] + \delta^{k+1} u_i(a) = (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
• Hence, she does not deviate if
$u_i(a) \ge (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
• If player $i$ adheres to $s_i$ after any history in which the players play $p$, she gets $u_i(p)$ for at most $k$ periods, then $u_i(a)$ in every subsequent period.
• This yields a discounted average payoff of at least $(1 - \delta^k) u_i(p) + \delta^k u_i(a)$.
• Note that $u_i(p) \le m_i$, her minmax payoff, and $u_i(a) > m_i$.
• If she deviates from $s_i$, she obtains at most her minmax payoff $m_i$ in the period of her deviation, then $u_i(p)$ for $k$ periods, then $u_i(a)$ in every subsequent period.
• This yields a discounted average payoff of at most
$(1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$.
• She does not deviate if
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge (1 - \delta) m_i + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
or, equivalently,
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.
• For each value of $\delta$ sufficiently close to 1 we can find $k(\delta)$ such that $(\delta, k(\delta))$ satisfies the two no-deviation inequalities:
$u_i(a) \ge (1 - \delta) u_i^* + \delta(1 - \delta^k) u_i(p) + \delta^{k+1} u_i(a)$,
$(1 - \delta^k) u_i(p) + \delta^k u_i(a) \ge m_i$.
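The search for a punishment length $k(\delta)$ can be illustrated by checking the two no-deviation inequalities directly. This is a sketch with hypothetical numbers that are not taken from the lecture ($u_i(a) = 2$, $u_i^* = 3$, $u_i(p) = 0$, $m_i = 1$); the function names and the cap `max_k` are also assumptions.

```python
from fractions import Fraction


def no_deviation_on_path(delta, k, u_a, u_star, u_p):
    """u_i(a) >= (1 - d) u_i* + d (1 - d^k) u_i(p) + d^(k+1) u_i(a)."""
    rhs = (1 - delta) * u_star + delta * (1 - delta ** k) * u_p + delta ** (k + 1) * u_a
    return u_a >= rhs


def no_deviation_in_punishment(delta, k, u_a, u_p, m):
    """(1 - d^k) u_i(p) + d^k u_i(a) >= m_i."""
    return (1 - delta ** k) * u_p + delta ** k * u_a >= m


def valid_punishment_lengths(delta, u_a, u_star, u_p, m, max_k=50):
    """All k in 1..max_k that satisfy both no-deviation inequalities."""
    return [
        k for k in range(1, max_k + 1)
        if no_deviation_on_path(delta, k, u_a, u_star, u_p)
        and no_deviation_in_punishment(delta, k, u_a, u_p, m)
    ]


# Hypothetical payoffs, chosen only to illustrate the two constraints.
print(valid_punishment_lengths(Fraction(9, 10), u_a=2, u_star=3, u_p=0, m=1))
# -> [1, 2, 3, 4, 5, 6]
```

The first inequality pushes k up (the punishment must be long enough to deter a deviation from the path), while the second pushes k down (the punishment must be short enough that carrying it out beats the minmax payoff). For a low discount factor, such as 1/4 with these numbers, no k satisfies both.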

Finitely Repeated Games
• Consider any subgame perfect equilibrium of a finitely repeated game.
• In the final stage, a Nash equilibrium of the stage game must be played.
• Hence, the set of equilibrium outcomes is enlarged only if there are multiple Nash equilibria in the stage game.
• Otherwise, in the unique subgame perfect equilibrium of the repeated game, the unique Nash equilibrium of the stage game is played in every period.
Prisoner’s Dilemma
• The following game is repeated for T periods.

          C       D
    C   2, 2    0, 3
    D   3, 0    1, 1

• Proceeding by backward induction, in the last period the unique Nash equilibrium is (D,D).
• Because in the last period players play (D,D) regardless of the previous play, in the second-to-last period future payoffs do not depend on current play.
• It is as if the players were playing the following game.

          C       D
    C   2, 2    0, 3
    D   3, 0    1, 1

• The unique Nash equilibrium is (D,D).
• Proceeding by backward induction, the unique subgame perfect equilibrium prescribes (D,D) in every period.
Expanded Prisoner’s Dilemma
The following game is repeated for T periods.

           C         D         E
    C    2, 2      0, 3     -2, -2
    D    3, 0      1, 1     -2, -2
    E   -2, -2    -2, -2    -1, -1

There are two pure-strategy Nash equilibria: (D, D) and (E, E).
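The two pure-strategy equilibria of the expanded game can be confirmed by enumerating all action pairs and checking best responses. The helper `pure_nash` below is a hypothetical name for a brute-force sketch; the payoff table is the one in the lecture.

```python
from itertools import product

ACTIONS = ["C", "D", "E"]
# PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2)
PAYOFFS = {
    ("C", "C"): (2, 2), ("C", "D"): (0, 3), ("C", "E"): (-2, -2),
    ("D", "C"): (3, 0), ("D", "D"): (1, 1), ("D", "E"): (-2, -2),
    ("E", "C"): (-2, -2), ("E", "D"): (-2, -2), ("E", "E"): (-1, -1),
}


def pure_nash(actions, payoffs):
    """Return all action pairs at which neither player has a profitable pure deviation."""
    equilibria = []
    for a1, a2 in product(actions, repeat=2):
        best1 = all(payoffs[(a1, a2)][0] >= payoffs[(d, a2)][0] for d in actions)
        best2 = all(payoffs[(a1, a2)][1] >= payoffs[(a1, d)][1] for d in actions)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


print(pure_nash(ACTIONS, PAYOFFS))  # [('D', 'D'), ('E', 'E')]
```

The enumeration covers pure actions only; it does not search for mixed-strategy equilibria.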
Expanded Prisoner’s Dilemma
• In period T, a Nash equilibrium is played: either (D, D), with payoffs (1, 1), or (E, E), with payoffs (-1, -1).
• We construct a subgame perfect equilibrium as follows.
1. In period T, the profile (D, D) is played.
2. In all periods t = 1, …, T-1, the profile (C, C) is played, with payoffs (2, 2).
3. If either player deviates to D, then play switches to (E, E) in all remaining periods.
• This is an SPE if and only if players do not have an incentive to deviate in period T-1, the period before the last.
• Indeed, the punishment (E, E) in all remaining periods is more severe if there are more periods left to play, so the incentive to deviate is strongest in period T-1.
• In period T-1, each player must prefer playing C, with payoff $2 + \delta$ (2 today and 1 from (D, D) in period T), to playing D, with payoff $3 - \delta$ (3 today and -1 from (E, E) in period T).
• Hence, the strategies are an SPE if and only if
$3 - \delta \le 2 + \delta$, i.e. $\delta \ge 1/2$.
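The claim that period T-1 is the binding constraint can be checked period by period. The sketch below compares, for each t < T, the discounted sum from following the constructed profile with the discounted sum from deviating to D at t. The function names are assumptions, and only the on-path deviation to D discussed above is checked; deviations in period T or during the (E, E) phase are not profitable because (D, D) and (E, E) are stage-game Nash equilibria.

```python
from fractions import Fraction


def on_path_value(t, T, delta):
    """Discounted sum from period t < T: (C, C) until period T - 1, then (D, D) in period T."""
    return sum(2 * delta ** (s - t) for s in range(t, T)) + 1 * delta ** (T - t)


def deviation_value(t, T, delta):
    """Deviate to D in period t < T: 3 today, then (E, E) with payoff -1 through period T."""
    return 3 + sum(-1 * delta ** (s - t) for s in range(t + 1, T + 1))


def finite_profile_is_spe(T, delta):
    """No profitable deviation to D in any period t = 1, ..., T - 1."""
    return all(
        on_path_value(t, T, delta) >= deviation_value(t, T, delta)
        for t in range(1, T)
    )


T = 5
for delta in (Fraction(2, 5), Fraction(1, 2), Fraction(3, 5)):
    print(delta, finite_profile_is_spe(T, delta))
```

With t = T - 1 the two values reduce to 2 + δ and 3 - δ, matching the comparison on the slide; in earlier periods the gap in favour of cooperation is larger.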
Summary of the Lecture
• Infinitely Repeated Games
• Nash and Subgame-Perfect Equilibrium
• Finitely Repeated Games
Preview of the Next Lecture
• Coalitional Games and the Core
• Ownership and the Distribution of Wealth
• Horse Trading and House Exchanges
• Voting and Matching
