Repeated Games with Long-run and Short-run Players

This paper studies the set of equilibrium payoffs in repeated games with long- and short-run players and little discounting. Because the short-run players are unconcerned about the future, each equilibrium outcome is constrained to lie on their static reaction (best-response) curves. The natural extension of the folk theorem to games of this sort would simply include this constraint in the definitions of the feasible payoffs and minmax values. In fact, this extension does obtain under the assumption that each player's choice of a mixed strategy for the stage game is publicly observable but, in contrast to standard repeated games, the set of equilibrium payoffs is different if players can observe only their opponents' realized actions.


INTRODUCTION
The folk theorem for repeated games with discounting asserts that each individually rational payoff can be attained by an equilibrium for a range of discount factors close to one. It has long been realized that results similar to the folk theorem can arise if some players play the stage game infinitely often but others play it only once, so long as each player is aware of all previous play. Consider, for example, one player who plays the following variant of the prisoners' dilemma against a succession of opponents, each of whom plays the game once. In each round of play, the short-run player moves first, deciding whether to "cooperate" or "cheat", and then the long-run player responds. If the discount factor is sufficiently close to one, one equilibrium of this game consists of all players playing cooperatively as long as everyone has cooperated previously. Note that the short-run players cooperate because the long-run player will punish them immediately if they cheat, whereas the long-run player cooperates because of her stake in future play.
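The incentive calculation behind this example can be made concrete with a minimal sketch in Python. The payoff numbers are hypothetical (the text does not specify the stage-game payoffs): mutual cooperation is worth 2 per period to the long-run player, cheating a cooperative opponent yields 3 once, and permanent mutual cheating yields 1 per period.

```python
# Hypothetical stage payoffs for the long-run player; illustrative only.
coop, temptation, punish = 2.0, 3.0, 1.0

def cooperation_sustainable(delta):
    """Compare the normalized payoff from conforming forever with that from
    cheating once and triggering permanent cheating by the short-run players."""
    conform = coop                                    # coop in every period
    deviate = (1 - delta) * temptation + delta * punish
    return conform >= deviate

# The threshold solves coop = (1 - d)*temptation + d*punish, i.e. d = 1/2 here.
threshold = (temptation - coop) / (temptation - punish)
```

With these numbers, cooperation is sustainable exactly when the long-run player's discount factor is at least 1/2.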
Applications of this general notion can be found in the literature: Simon (1951), for example, uses such a construction in his discussion of the employment relationship. (For a reformulation of Simon's analysis in more modern game-theoretic terminology, see Kreps (1986).) Dybvig and Spatt (1980) and Shapiro (1980) appeal to similar equilibria in a discussion of a manufacturer's reputation for producing high-quality goods.
Despite these applications, we are unaware of a general analysis of such games in the style of the folk theorem. It is clear that having some long- and some short-run players is different from having only long-run players. If, in our example above, the stage game is the standard simultaneous-move prisoners' dilemma, or if the short-run players move second, then the short-run players will cheat in every round, and hence the long-run player will do so as well. In contrast, in all three variations of the stage game the cooperative outcome can be obtained in an equilibrium if both sides are long-run.
When all players are long-run, the perfect folk theorem for discounted repeated games (Fudenberg and Maskin (1986)) states that, under a full-dimensionality condition, any feasible payoffs that give all players more than their minmax values can be attained by a perfect equilibrium if the discount factor is near enough one. Our example suggests that the obvious extension of this result to games with short-run players would simply add a constraint that short-run players always play short-run best responses. This constraint would bind both in the definition of the feasible payoffs, as in our example, and in the definition of the minmax values, so that the best-response constraint could raise the lower bound on payoffs for long-run players. In Propositions 1 and 2 of Section 2, we establish this extension under the assumption that each player's choice of a mixed strategy for the stage game is observable. We then turn to the more familiar case in which a player observes only others' realized actions and not their mixed strategies. In standard repeated games the folk theorem obtains in either case, but when there are some short-run players the set of equilibria can be strictly smaller if mixed strategies are not observed than if they are. The logic behind this discrepancy is simple: when all players are long-run, mixed strategies are not needed along the equilibrium path because the threat of subsequent punishment provides sufficient deterrent against deviation. But in order to induce a short-run player to take a particular action, it may be necessary for at least one long-run player to randomize. If mixed strategies are not observable, then randomization requires that the long-run player be made indifferent among the actions in the support of her mixed strategy, so that an outcome favourable for her must be followed by "punishment", while an unfavourable outcome must be followed by "reward". But with even a small degree of discounting, the need for these punishments lowers the long-run player's highest equilibrium payoff. We present a simple example and a general result (Proposition 3) to this effect in Section 3.
This limiting inefficiency may remind the reader of inefficiencies found in the study of repeated games with moral hazard, as in repeated partnerships (Radner, Myerson and Maskin (1986)) and repeated cartels (Green and Porter (1984), Abreu, Pearce and Stacchetti (1986)), but those inefficiencies have different sources. Fudenberg, Levine and Maskin (1989) show that the folk theorem extends to discounted repeated games with moral hazard, so there is no limiting inefficiency, so long as (roughly speaking) the public information permits the "statistical identification" of a deviator. In Radner, Myerson and Maskin (1986), the source of the inefficiency is that each player has many possible actions yet there are only two possible outcomes, so that many different strategy profiles necessarily induce the same probability distributions over outcomes. In the cartel examples the inefficiency is caused by a restriction to symmetric equilibria, which again prevents the statistical identification of a deviator because a deviation by one player generates the same distribution on outcomes as the same deviation by another; when asymmetric equilibria are allowed the limiting inefficiency does not arise. The limiting inefficiency with short-run players differs, as it does not depend on a restriction to symmetric equilibria or on the number of possibly observed outcomes being small. Instead, it is a consequence of some of the players having short-run objectives.
Proposition 4 of Section 3 characterizes the limit set of equilibrium payoffs (as the discount factor goes to one) for games with a single long-run player. Both this characterization and the results of Section 2 assume that there is a publicly observable random variable that players can use as a correlating device. The device makes probabilistic punishments possible: if a long-run player deviates, then players can jointly switch to a "punishment equilibrium" with some probability p < 1. The availability of public randomization is, however, inessential for our results; we show in Section 4 that it does not enlarge the limit set of equilibrium payoffs. In proving this, we construct equilibria of a simple and intuitive form that we call target strategy equilibria. A target strategy equilibrium works roughly as follows: at each date the long-run player is given a target corresponding to the discounted sum of payoffs she should have accumulated to that point in equilibrium. If she has performed excessively better than this target level, she is "punished"; if she has underperformed, she is "rewarded". These punishments and rewards are constructed so that the long-run player is assured of hitting the desired payoff level over the infinite horizon if she conforms to the equilibrium and so that, no matter how she plays, she cannot exceed the target level in the long run.
We make further use of these target strategies in Section 5, where we suppose that, instead of discounting future payoffs, the long-run player weights all periods equally and so seeks to maximize the payoffs' time average. With time-averaging, the cost of keeping the long-run player indifferent among various actions in the support of a randomized strategy vanishes. Roughly put, anything that transpires over a finite number of periods is inconsequential, and so one can rely on the strong law of large numbers to guarantee that the long-run player is precisely compensated for taking actions in the support of a randomized strategy that are, in the short run, disadvantageous for her. This implies that in a formulation with time-averaging the same set of payoffs is obtained as in the case where the player's privately mixed strategies are observable. Thus in passing from discounting to time-averaging, there is a discontinuity of the equilibrium set.

Three extensions and complements to our results that are developed elsewhere are discussed in Section 6. First, except for Proposition 3, we are silent here on the case of several long-run players and unobservable mixed strategies. Using methods different from ours, Fudenberg and Levine (1989b) obtain a multi-player analogue of Proposition 4.
Second, just as one is interested in the folk theorem with noisy observables, so one might consider how noise affects the results obtained here. Results on this extension are also reported in Fudenberg and Levine (1989b). Third, our constructions depend on there being an infinite horizon. One way to get similar results with a finite horizon is to introduce a little bit of incomplete information about the "nature" of long-run players, in the style of Kreps, Milgrom, Roberts and Wilson (1982). When there is only a single long-run player about whom there is incomplete information, this style of analysis can give sharp answers. This theory, developed in Fudenberg and Levine (1989a, 1990), is summarized and contrasted with our results in Section 6.

THE MODEL

The stage game is a finite n-player simultaneous-move game g = (g_1, ..., g_n), where A_i is player i's (finite) action space and g_i is his payoff function. We denote player i's mixed strategies by α_i ∈ Δ_i and mixed strategy profiles by α = (α_1, ..., α_n) ∈ Δ_1 × ... × Δ_n = Δ. Given α ∈ Δ we write α_{-i} for the strategies of all players except for i. We write g_i(α) for the expected value of g_i under the profile α. For i = 1, ..., n, we write g̲_i for min_α g_i(α) and ḡ_i for max_α g_i(α).

Preliminaries and definition of the minmax values
Label the players so that players 1 to l are long-run and l+1 to n are short-run. Let B ⊆ Δ be the set of profiles α such that, for each short-run player j = l+1, ..., n, α_j is a best response to α_{-j}. For each i = 1, ..., l, choose m^i = (m^i_1, ..., m^i_n) ∈ B so that m^i_{-i} solves min_{α∈B} max_{a_i∈A_i} g_i(a_i, α_{-i}), and set v̲_i = max_{a_i} g_i(a_i, m^i_{-i}).
(The profile m^i and the value v̲_i exist because the constraint set B is compact.) The strategies m^i_{-i} minimize long-run player i's maximum attainable payoff subject to the requirement that there exist m^i_i with (m^i_i, m^i_{-i}) ∈ B. The restriction to B reflects the constraint that the short-run players will always choose actions that are optimal in the short run. We claim (and will demonstrate below) that, owing to this constraint, no equilibrium of the repeated game can give long-run player i less than v̲_i. (The short-run players may be able to force player i's payoff even lower using strategies that are not short-run optimal, but this cannot occur in equilibrium.) Let us illustrate these definitions with a two-player example, where a long-run player 1 faces a sequence of players 2 in the normal-form game of Figure 1. Here player 2's minmax strategy against player 1 is m^1_2 = L, which holds player 1's payoff to at most v̲_1 = 0. Playing R would hold player 1's payoff even lower, but R is strictly dominated and so will never be used by a short-run player 2. (If player 2 were a long-run player, we would have v̲_1 = -2.) Note carefully that in order to induce player 2 to play L, player 1 must put probability at least 1/2 on D, and so for any admissible choice of m^1_1, g_1(m^1) ≤ -1/2, which is less than player 1's minmax value v̲_1 = 0. That is, in the general definition of m^i, it is not required that m^i_i be a best response to m^i_{-i}. If we interpret the strategies m^i as a "punishment" against player i, we may have to provide him with the incentive to cooperate in his own punishment. We will show that such incentives can be found for discount factors near one.
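The restricted and unrestricted minmax values can be computed by brute force. Since Figure 1 itself is not reproduced in this text, the payoff matrix below is hypothetical, chosen only to match the discussion: it assumes a third column M that is player 2's best response when Prob(D) < 1/2, while R is strictly dominated.

```python
# Hypothetical payoffs consistent with the Figure 1 discussion (the figure
# is not reproduced here).  Player 1 picks a row (U or D); the short-run
# player 2 picks a column (L, M, or R).  M is an assumed third action.
g1 = {('U', 'L'): 0, ('U', 'M'): 1, ('U', 'R'): -2,
      ('D', 'L'): -1, ('D', 'M'): 2, ('D', 'R'): -2}
g2 = {('U', 'L'): -1, ('U', 'M'): 0, ('U', 'R'): -2,
      ('D', 'L'): 1, ('D', 'M'): 0, ('D', 'R'): -2}

def br2(p):
    """Player 2's pure best responses when player 1 plays D with prob p."""
    u = {c: (1 - p) * g2[('U', c)] + p * g2[('D', c)] for c in 'LMR'}
    best = max(u.values())
    return [c for c, v in u.items() if v > best - 1e-9]

grid = [k / 1000 for k in range(1001)]

# Restricted minmax: player 2's action must be a short-run best response.
# (Pure columns suffice for the minimum in this example.)
v_restricted = min(max(g1[('U', c)], g1[('D', c)])
                   for p in grid for c in br2(p))

# Unrestricted minmax: a long-run player 2 could commit to any column,
# including the dominated R.
v_unrestricted = min(max(g1[('U', c)], g1[('D', c)]) for c in 'LMR')

# Cooperating in player 1's own punishment: probability 1/2 on D makes L a
# best response, and her expected stage payoff against L is then -1/2.
g1_punishment = 0.5 * g1[('U', 'L')] + 0.5 * g1[('D', 'L')]
```

With this matrix the restricted value is 0 (via L), the unrestricted value is -2 (via R), and g_1(m^1) = -1/2, matching the three numbers in the example.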

The repeated game
In the repeated version of g, we suppose that long-run players maximize the normalized discounted sum of their single-period payoffs, with common discount factor δ. That is, long-run player i's payoff is the expectation of (1-δ) Σ_{t=0}^∞ δ^t g_i(a^t), where a^t is the action profile played at date t. Short-run players in each period act to maximize their payoff in that period. Throughout this paper, we assume that all players acting at date t have observed the history of play up to date t. In this section we assume additionally that each player can observe the others' past mixed strategies, i.e. the way they have previously randomized. This assumption (or a restriction to pure strategies) is often made in the standard repeated-games literature, but as Fudenberg and Maskin (1986, 1989) have shown, it is not necessary there. (Here it matters: see the next section!) In this section and the next, we also assume that, before play in each round, players observe the outcome of a "public randomizing device", which is a random variable θ^t with uniform distribution on [0, 1]. Each of these random variables is independent of the others. Their purpose is to allow us to convexify the set of attainable payoffs; see for example the end of the proof of Proposition 2.

We need terminology in what follows that distinguishes strategies and strategy profiles played in a given stage from the overall strategy and strategy profile used by players over the course of the repeated game. We will hereafter use the terms action, mixed action, and (mixed) action profile whenever we refer to the (possibly mixed) actions taken by players at a particular date. That is, α_i ∈ Δ_i is a mixed action for player i, and an action profile is an n-tuple of stage-game (mixed) actions, one for each player. We will use the terms strategy and strategy profile for an overall history-dependent strategy (profile) for the repeated game. We will denote overall strategies for player i by σ_i = (σ_i^0, σ_i^1, ...), where σ_i^t is a function from time-t histories to Δ_i, and strategy profiles by σ = (σ^0, σ^1, ...). When needed, we will write σ_i^t(h^t) for the mixed action played by i at date t given history h^t, but we will usually suppress the history dependence. If players use the strategy profile σ, the expected payoff to long-run player i is the expectation of (1-δ) Σ_t δ^t g_i(σ^t), where this sum may be random because of the suppressed dependence of σ^t on the outcomes of earlier randomizations. Let V denote the convex hull of the set of vectors of long-run players' payoffs (g_1(α), ..., g_l(α)) as α ranges over B, and let V* be the set of v ∈ V such that v_i > v̲_i for each long-run player i.

Proposition 1. In any Nash equilibrium, for any discount factor δ, the vector of payoffs for the long-run players lies in V, and long-run player i's payoff is at least v̲_i.

Proof. Fix a Nash equilibrium with payoff vector v for the long-run players. Short-run players will necessarily be playing short-run optimal actions, so each period's action profile necessarily lies in B. Since v is a convex combination of the expected payoffs in each round, this implies that v ∈ V. Next fix some long-run player i. Since all players begin each period with the same information, in the equilibrium player i correctly anticipates the mixed action profile α_{-i} (which may depend on history) that his opponents are about to play. One feasible strategy for player i is to play in each period the action that maximizes that period's expected payoff against the action profile of his opponents. By the definition of v̲_i and the fact that α_{-i} lies in the projection of B onto Δ_{-i} = Δ_1 × ... × Δ_{i-1} × Δ_{i+1} × ... × Δ_n, this strategy will give i at least v̲_i per period, which means his overall payoff will be at least v̲_i (regardless of δ). ||

Proposition 2. Assume that the dimensionality of V* equals l, the number of long-run players. Then for each v in V*, there is a δ̲ ∈ (0, 1) such that for δ ∈ (δ̲, 1) there is a subgame-perfect equilibrium of the infinitely-repeated game with discount factor δ in which v is the vector of payoffs for the long-run players.
Remark. For the perfect folk theorem when there are three or more players, full dimensionality of the set of feasible and individually rational payoffs is needed; cf. Fudenberg and Maskin (1986). The corresponding assumption in Proposition 2 is that the dimensionality of V* equals the number of long-run players. Since our model includes theirs as a special case, our need for this assumption is hardly surprising. In the case where there is one long-run player, either V* is empty or it is one-dimensional (recall that V* is defined with a strict inequality). Hence for the case of a single long-run player, Proposition 2 could be stated without the qualification.
Proof. Theorem 2 in Fudenberg and Maskin (1986) provides the basic method of analysis. We sketch the steps of the proof here, but most of the detailed calculations are omitted.
Assume that, for the given v, there is some a ∈ B such that g_i(a) = v_i for all the long-run players i. Choose a v' in the interior of V* and an ε > 0 so that for all i from 1 to l, (v'_1 + ε, ..., v'_{i-1} + ε, v'_i, v'_{i+1} + ε, ..., v'_l + ε) ∈ V* and v'_i + ε < v_i.
Assume that for each i = 1, ..., l there is some (mixed) action profile a^i ∈ B that yields v'_j + ε to each long-run player j ≠ i and yields v'_i to i. Recall the definition of m^i from Subsection 2.1, and let w^i_j = g_j(m^i) for i, j = 1, ..., l.
If we think of m^j as the profile used to punish j should j deviate from the equilibrium, then w^j_i will be long-run player i's stage-game payoff while j is being punished (and, note, while j cooperates in this punishment). The equilibrium strategies are as follows: play begins in Phase I, in which a is played each period; if long-run player j deviates, play switches to Phase II_j, in which m^j is played for N periods, followed by Phase III_j, in which a^j is played thereafter; a deviation by any long-run player i in any phase restarts Phase II_i. Choose an integer N so that for each i = 1, ..., l, ḡ_i + N·v̲_i < g̲_i + N·v'_i. To demonstrate that this is an equilibrium, note first that play is always in B, so short-run players are always playing (short-run) optimal strategies. As for the long-run players, by standard results for discounted dynamic programming it suffices to check that in every subgame no player can gain from deviating once and then conforming. If player i deviates in Phase I, II_j or III_j for j ≠ i, she is minmaxed for the next N periods, and Phase III_i play will then give her v'_i. If instead she conforms she gets at worst (1-δ^N)g̲_i + δ^N(v'_i + ε). It is easy to see that conforming is better for δ close enough to one. If player i conforms in II_i (i.e. when she is being punished) and there are N' stages left in the phase, her payoff, under the maintained assumption that no one else deviates, is q_i(N') = (1-δ^{N'})w^i_i + δ^{N'}v'_i, which exceeds v̲_i if δ is close enough to one. If she deviates once and then conforms, she receives at most v̲_i in the period she deviates (because she is being minmaxed) and thereafter receives the payoff q_i(N) ≤ q_i(N'), which lowers her payoff. The condition on N ensures that for δ close to one, the gain to i from deviating in Phase III_i is outweighed by Phase II_i's punishment.
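The one-shot deviation comparison for Phase I can be spelled out explicitly; the display below is a reconstruction in the spirit of Fudenberg and Maskin (1986), not the paper's own equation.

```latex
% Conforming in Phase I (or II_j, III_j for j \neq i) gives player i at worst
(1-\delta^{N})\,\underline{g}_i + \delta^{N}\,(v_i' + \varepsilon),
% while deviating once and then conforming gives at most
(1-\delta)\,\bar{g}_i + \delta(1-\delta^{N})\,\underline{v}_i + \delta^{N+1}\,v_i'.
% As \delta \to 1 the first expression tends to v_i' + \varepsilon and the
% second to v_i', so conforming is strictly better for \delta near one.
```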
All this assumes that v and the Phase III_j payoffs can be achieved by action profiles drawn from B. In case any or all of these payoff vectors arises only as a convex combination of payoffs feasible for profiles from B, we use the publicly observed random variable to convexify in the obvious way. ||

Intuition and an example
We now drop the assumption that players can observe their opponents' mixed actions and assume instead that they can only observe their opponents' realized actions.For the time being we continue to assume that players have access to a public randomizing device.
In ordinary repeated games (privately) mixed actions are needed during punishment phases because in general a player's minmax value is lower when his opponents use mixed actions. However, mixed actions are not required along the equilibrium path, since desired play along the path can be enforced by the threat of future punishments. Note well that one has considerable freedom in constructing punishment-phase payoffs that support a given equilibrium outcome. As long as the punished player's continuation payoff at the start of the punishment phase is sufficiently low to deter deviations, its exact level is irrelevant (since the punishment phase is never reached in equilibrium). Fudenberg and Maskin (1986) show that, under the full-dimension condition of Proposition 2, players can be induced to use mixed actions as punishments if the continuation payoffs at the end of a punishment phase depend on the realized actions in that phase in such a way that each action in the support of the mixed action yields the same overall payoff. This construction restricts somewhat the possible continuation values beginning at a punishment phase, but not so much that the required inequalities are affected. Thus for ordinary repeated games (that meet the full-dimensionality assumption), the inability to observe mixed actions does not affect the set of equilibrium payoffs.
In contrast, with short-run players some payoffs from B can be obtained only if the long-run players privately randomize, so that mixed actions are in general required along the equilibrium path. Moreover, if a long-run player is to be willing to use a mixed action, her continuation payoffs after some of the pure actions in its support must be lower than after others. Since these low continuation payoffs will have positive probability, the highest possible equilibrium payoff for a long-run player will be inefficient relative to what is possible if mixed actions are observable.
To illustrate this point, consider the stage game depicted in Figure 2. Imagine that this game is played between a long-run player 1 and a sequence of short-run players 2. Let p be the probability that player 1 plays D. Player 2's best response is M if 0 ≤ p ≤ 1/2, L if 1/2 ≤ p ≤ 100/101, and R if p ≥ 100/101. There are three static equilibria: the pure action equilibrium (D, R), a second in which p = 1/2 and player 2 mixes between M and L, and a third in which p = 100/101 and player 2 mixes between L and R. Player 1's maximum attainable payoff is 3, attained when p = 1/2 and player 2 plays L.
If player 1's mixed actions are observable, Proposition 2 implies that she can attain the payoff 3 in a subgame-perfect equilibrium of the infinitely-repeated game for δ near enough 1. But we will now demonstrate that, if her mixed actions are not observable, none of her Nash equilibrium payoffs is higher than 2, regardless of δ.
To see this, fix a discount factor δ, and let v*(δ) be the supremum over all Nash equilibria of player 1's equilibrium payoff. Suppose that for some δ, v*(δ) > 2. Let ε = (1-δ)(v*(δ)-2)/2, and choose an equilibrium profile σ such that player 1's payoff at this equilibrium, denoted v(σ), is at least v*(δ)-ε. It is easy to see that the set of equilibrium payoffs is stationary: any equilibrium payoff is an equilibrium payoff for any subgame and conversely. Thus, the highest payoff player 1 can obtain starting from period 2 is also bounded by v*(δ). Now v(σ) is (1-δ) times player 1's first-period payoff at this equilibrium, g_1(σ^0), plus δ times her expected continuation payoff, which is no greater than v*(δ). This gives the inequality v(σ) ≤ (1-δ)g_1(σ^0) + δv*(δ). But in order for player 1's first-period payoff to be at least 2, player 2 must play L with positive probability in the first period. Player 2 will only play L if player 1 randomizes between U and D, hence player 1 must be indifferent between her first-period choices and in particular must be willing to play D. If she plays D, she gets at most 2 for her first-period payoff, and the inequality above becomes v*(δ) - ε ≤ v(σ) ≤ (1-δ)2 + δv*(δ). Substituting ε = (1-δ)(v*(δ)-2)/2 and rearranging yields v*(δ) ≤ 2, a contradiction.
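The bound can also be checked numerically. Figure 2 itself is not reproduced in this text, so the payoff matrix below is hypothetical, chosen only to reproduce the best-response regions and payoffs described above.

```python
# Hypothetical payoffs consistent with the Figure 2 discussion.  Player 1
# picks U or D (p = Prob(D)); the short-run player 2 picks L, M, or R.
g1 = {('U', 'L'): 4, ('U', 'M'): 0, ('U', 'R'): -1,
      ('D', 'L'): 2, ('D', 'M'): 2, ('D', 'R'): 1}
g2 = {('U', 'L'): 0.0, ('U', 'M'): 0.5, ('U', 'R'): -100.0,
      ('D', 'L'): 1.0, ('D', 'M'): 0.5, ('D', 'R'): 2.0}

def br2(p):
    """Player 2's pure best responses when player 1 plays D with prob p."""
    u = {c: (1 - p) * g2[('U', c)] + p * g2[('D', c)] for c in 'LMR'}
    best = max(u.values())
    return [c for c, v in u.items() if v > best - 1e-9]

def support(p):
    return ('U',) if p == 0 else (('D',) if p == 1 else ('U', 'D'))

grid = [k / 1000 for k in range(1001)]

# Best payoff in B when player 1's mixing is observable: 3, at p = 1/2
# against L (pure columns suffice for the maximum on this grid).
best_observable = max((1 - p) * g1[('U', c)] + p * g1[('D', c)]
                      for p in grid for c in br2(p))

# The bound of Proposition 3: the worst action in the support of player 1's
# own mixed action, played against a short-run best response.
v1_star = max(min(g1[(a, c)] for a in support(p))
              for p in grid for c in br2(p))
```

With this matrix the observable-mixing benchmark is 3 while the unobservable-mixing bound is 2: whenever L is a best response, player 1's mix must put weight on D, whose payoff against L is only 2.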

General results
Although our example included only one long-run player, the reasoning by which we computed an upper bound on her payoffs generalizes to any number of such players. Specifically, for each long-run player i = 1, ..., l, define v*_i = max_{α∈B} min_{a_i∈support(α_i)} g_i(a_i, α_{-i}).
That is, v*_i is the most that player i can expect to get from a profile α ∈ B if i takes the worst (for him) action that is called for by α_i. We have:

Proposition 3. For no discount factor less than 1 is there a Nash equilibrium of the repeated game in which, for some i ≤ l, player i's payoff exceeds v*_i.
We omit the proof of Proposition 3, as it follows the argument of the example nearly exactly.
Because v*_i is in general less than max_{α∈B} g_i(α) (the maximum feasible payoff for i), Proposition 3 implies that unobservable mixed actions give rise to a limiting inefficiency: an inefficiency that does not vanish as the discount factor approaches one. To characterize the set of equilibrium payoffs we will henceforth assume that there is a single long-run player, i.e. l = 1.

Proposition 4. Consider a game with a single long-run player, player 1. For any v_1 ∈ [v̲_1, v*_1] there exists a δ' < 1 such that for all δ ∈ (δ', 1), there is a subgame-perfect equilibrium in which player 1's payoff is v_1. For no δ is there an equilibrium in which player 1's payoff exceeds v*_1.
Proof. We begin by constructing a "punishment equilibrium" in which player 1's normalized payoff is exactly v̲_1. If v̲_1 is player 1's payoff in a static equilibrium this is immediate, so assume that all the static equilibria give player 1 more than v̲_1. Fix any static equilibrium α̂, and let ê_1 = g_1(α̂) > v̲_1 be player 1's payoff in this equilibrium.
We construct a two-phase equilibrium for δ sufficiently close to one, in a sense to be made precise in a moment. Play begins in Phase I. (Calling the continuation in Phase II a "punishment equilibrium" is a standard abuse of terminology. Phase II is certainly used as a punishment if an action outside the support of α_1 is chosen by 1 when in Phase I. But if α_1 is non-degenerate, there is probability one that play eventually moves to Phase II, even if player 1 always conforms to the equilibrium.) Equilibrium values between v̲_1 and v*_1 are obtained either by using the initially available public signal to randomize between the two equilibria or by modifying the definition of p*(a_1) in the second equilibrium in the obvious fashion.

Preliminaries
The equilibria that we constructed in the proof of Proposition 4 rely on our assumption that players can condition their play on the outcomes of publicly observable random variables. Although that assumption is not implausible in many settings, one may wonder whether it is necessary for our results. It is known (cf. Fudenberg and Maskin (1988)) that public randomization is not needed for the usual folk theorem. But we have already seen that some features of the usual setting do not apply to ours, so the question remains pertinent. In fact, we will show in this section that public randomization is not necessary (except possibly to implement v̲_1). We do so by constructing a subgame-perfect equilibrium for each payoff except v̲_1 in the range specified by Proposition 4. We provide these equilibria not simply to settle the question of whether a lack of public randomization matters. In our view, the form of these equilibria has intuitive appeal. Essentially, given a target equilibrium value for player 1, the short-run players keep track of how well player 1 is doing relative to "normal progress" toward that target. If player 1 is not making supernormal progress, she is "rewarded" by the others. But if she is too far ahead of normal progress, the short-lived players take actions which bring her trajectory back into line. These strategies work because, thanks to the rewards, player 1 never falls so far behind that she cannot catch up, nor can she get so far ahead by deviating from the equilibrium that the short-lived players cannot bring her back into line.
Fix any static equilibrium α̂ for the stage game, and let v̂_1 be player 1's value in that equilibrium. Playing this static equilibrium in every period implements the value v̂_1 in the repeated game (for any δ). We consider values of v_1 other than v̂_1 in two steps, corresponding to v_1 ∈ (v̂_1, v*_1] and v_1 ∈ (v̲_1, v̂_1).

Target strategy equilibria for v_1 ∈ (v̂_1, v*_1]
Rescale the payoffs so that v̂_1 = 0. Fix v_1 ∈ (0, v*_1] and take δ large enough that (1-δ)ḡ_1 ≤ δv_1. We will recursively construct a strategy profile σ̂ that gives the long-run player a payoff of v_1. Let r_t = (1-δ^t)v_1; then r_t is the (normalized) present value of a payoff of v_1 in each period through period t-1, and this is the "target" value that the strategy we are constructing will aim at. By contrast, I_t = (1-δ) Σ_{s=0}^{t-1} δ^s g_1(a_1^s, σ^s_{-1}) is a measure of the long-run player's actual progress. The index I_t is not quite the present value of 1's payoffs through time t-1, because it is computed with the mixed actions that are meant to be used by the short-run players and not the actions they actually take. But in an equilibrium, from the laws of conditional expectation we know that player 1's expected payoff from taking a given sequence of actions through time t-1 is precisely the expectation of I_t. (The characterization of a target strategy equilibrium given in the introduction was inaccurate in this respect: we use 1's actual action choice and the mixed-action profiles σ^t_{-1} of the others to compute how well player 1 is doing.) Finally, J_t, defined by J_0 = 0 and J_{t+1} = max(r_{t+1}, J_t + (1-δ)δ^t g_1(a_1^t, σ^t_{-1})), is also a measure of player 1's progress. In the strategy profile σ̂, J_t is compared to "normal" progress as measured by r_t, and player 1 is punished if she is too far ahead of normal progress (if J_t ≥ r_{t+1}). We will see that in this equilibrium, as long as player 1 follows the equilibrium strategy, there is no possibility that I_t will ever fall below the measure of normal progress, r_t. But if she deviates this might happen. The measure J_t ignores any shortfall of per-period payoff below v_1, as at each date the "official record" J_t is adjusted up to the level of normal progress, if in fact there was any shortfall on the previous round. By induction, we see immediately that J_t ≥ I_t along any path of play.
As I_t and J_t are discounted sums, as t → ∞ they clearly converge to limits I_∞ and J_∞. And, as noted, if players other than player 1 play according to σ̂, player 1's expected payoff to any strategy is simply the expected value of I_∞.
Since all action profiles are selected from B, none of the short-run players has an incentive to deviate from σ̂. So to verify that we have a Nash equilibrium and that this Nash equilibrium gives payoff v_1 to player 1, it suffices to show: (i) if player 1 follows σ̂_1, then I_t = J_t for all t, so that I_∞ = J_∞ = v_1 with probability one; and (ii) no matter how player 1 plays, J_t ≤ v_1 for all t. For (i) proceed inductively: I_0 = J_0 = 0, and as long as player 1 conforms her progress never falls below normal progress, so the levelling-up never binds and J_{t+1} = J_t + (1-δ)δ^t g_1(a_1^t, σ^t_{-1}) = I_{t+1}, where the second equality uses the induction hypothesis. For (ii) again proceed inductively. Clearly, J_0 = 0 ≤ v_1. Assume that J_t ≤ v_1. Either J_t ≥ r_{t+1} or not. In the first case, σ̂^t prescribes play of α̂. But whatever action player 1 takes against α̂_{-1}, her expected payoff is no greater than zero, since α̂ is a static equilibrium with payoff normalized to be zero. Hence in this case J_{t+1} ≤ J_t ≤ v_1. On the other hand, if J_t < r_{t+1}, then the largest payoff that player 1 can conceivably get, ḡ_1, is by construction insufficient to push her beyond v_1: since J_t < r_{t+1} = (1-δ^{t+1})v_1 and (1-δ)δ^t ḡ_1 ≤ δ^{t+1}v_1, we have J_{t+1} ≤ r_{t+1} + δ^{t+1}v_1 = v_1. This establishes that we have a Nash equilibrium. We should also check whether the equilibrium is subgame-perfect. This is the point of the "levelling-up" which turns I into J. Note first that, by the construction of J_t and σ̂, it is impossible to find any subgame starting point at which J_t > v_1 or J_t < r_t. The proof of part (ii) extends intact to show that lim_{τ→∞} J_τ = v_1 with probability one no matter what strategy player 1 chooses. As for part (i), we might find ourselves at the start of a subgame at which I_t < J_t. But then one adapts the arguments above as follows. In this subgame, the continuation payoffs for player 1 are given by the expectation of I_∞ - I_t. One can then adapt the proof of (i) to show that as long as player 1 follows σ̂_1 in the subgame, I_τ - I_t = J_τ - J_t for τ > t; player 1 will lose no further ground if she keeps to σ̂_1. Of course, I_∞ - I_t ≤ J_∞ - J_t in general, so following σ̂_1 is optimal for player 1 in this subgame.
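The bookkeeping in this construction can be illustrated by a small simulation. The recursions below are a reconstruction (the paper's displayed definitions are omitted from this excerpt): r_t = (1-δ^t)v_1, and the record J_t is levelled up to r_t whenever there is a shortfall. All payoff numbers are hypothetical.

```python
import random

random.seed(1)

delta = 0.99
v1 = 1.0       # target payoff, rescaled so the static equilibrium gives 0
g_max = 3.0    # hypothetical upper bound on player 1's stage payoffs
assert (1 - delta) * g_max <= delta * v1   # the condition used in the text

T = 3000
J = 0.0               # the "official record" J_t
never_exceeded = True

for t in range(T):
    r_next = (1 - delta ** (t + 1)) * v1   # normal progress r_{t+1}
    if J >= r_next:
        g = 0.0       # ahead of target: play the static equilibrium
    else:
        # behind target: play the target profile; each action in its support
        # yields at least v1 against the short-run players' mixed actions,
        # so the realized index increment varies but never falls short
        g = random.uniform(v1, g_max)
    J = max(r_next, J + (1 - delta) * delta ** t * g)
    never_exceeded = never_exceeded and J <= v1 + 1e-12

# J_t stays in [r_t, v1] throughout and converges to v1 = lim r_t.
```

Whatever the realized draws, the levelling-up keeps J_t at or above normal progress, while the condition (1-δ)ḡ_1 ≤ δv_1 prevents J_t from ever overshooting v_1.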

Target strategy equilibria for v_1 ∈ (v̲_1, v̂_1)
Now we show how to construct equilibria (for large enough δ) that yield payoffs v_1 between v̲_1 and the static-equilibrium payoff v̂_1. We continue with the rescaling of payoffs so that v̂_1 = 0. Pick a v_1 ∈ (v̲_1, 0), and choose δ large enough that (1-δ)g̲_1 ≥ v_1 and δv̲_1 ≤ v_1.
Proceeding as above, we make two claims: (i) if player 1 uses strategy σ_1, then I_t ≥ v_1 for all t, which implies that I_∞ ≥ v_1; and (ii) regardless of how player 1 plays, I_∞ ≤ v_1.
Since σ specifies best responses for all the short-run players, the usual argument establishes that we have a Nash equilibrium that gives player 1 an expected payoff of v_1.
To demonstrate (i), we proceed inductively. Clearly I_0 = 0 ≥ v_1, and the inductive step parallels the argument given above. To show subgame-perfection, we note first that our argument that I_∞ ≤ v_1 for any strategy by player 1 works in any subgame. Now if we begin a subgame with I_t ≤ v_1 < r_t, then induction establishes that for all subsequent τ, I_τ = I_t ≤ v_1 < r_τ, and σ_τ prescribes a static equilibrium that gives player 1 a payoff of zero at most. By playing according to σ_1 in these circumstances, player 1 obtains I_∞ = I_t, the best she can do. While if we begin with a subgame where I_t > v_1, then the argument given previously establishes that by following σ_1, I_∞ ≥ v_1.
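To fix ideas, the following simulation implements one reading of this target strategy: player 1 is minmaxed whenever the normalized discounted sum I_t sits at or above the target path r_{t+1} = (1 − δ^{t+1})v_1, and otherwise the static equilibrium (payoff zero) is played. The numbers (δ = 0.99, v_1 = −1, minmax payoff −2), the deterministic stage payoffs, and the exact trigger rule are illustrative assumptions, not taken from the text.

```python
# Sketch of the target strategy for a payoff v1 below the static
# equilibrium payoff (normalized to zero).  All payoff numbers are
# invented for illustration.
delta = 0.99
v1 = -1.0        # target payoff, v1 in (minmax value, 0)
minmax = -2.0    # player 1's stage payoff while being minmaxed

def run(T=3000):
    I = 0.0      # normalized discounted sum (1-delta) * sum_t delta^t * g_1
    for t in range(T):
        r_next = (1 - delta ** (t + 1)) * v1   # target path r_{t+1}
        if I >= r_next:
            # player 1 is ahead of the target: minmax her this period
            I += (1 - delta) * delta ** t * minmax
        # otherwise the static equilibrium is played: payoff 0, I unchanged
    return I

print(round(run(), 2))  # tracks the target path, ending near v1 = -1
```

The discounted sum oscillates within one (geometrically shrinking) increment of the target path r_t, so it converges to v_1, as claims (i) and (ii) together require.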

TIME-AVERAGING
We have observed that the barrier to sustaining equilibrium payoffs higher than v* when player 1's mixed actions are unobservable is the need to provide "rewards" and "punishments" to keep her indifferent among all elements of a mixed action's support. Because no payoff from a single period (or, for that matter, from any finite number of periods) has any effect on a time-average, however, one may suspect that maintaining indifference is easier when discounting is replaced by time-averaging. Put another way, when there is time-averaging, the target strategy profiles of Section 4 can aim at higher targets, since the law of large numbers implies that the time-average is almost surely equal to the expected value, so that any shortfalls are certain to be transitory. This intuition is confirmed by Proposition 5, which demonstrates that the set of equilibrium payoffs changes discontinuously at δ = 1.
An analogous discontinuity for repeated-partnership games is pointed out in Radner (1986) and Radner, Myerson and Maskin (1986). In the games of these papers, two players make unobservable effort decisions that affect the probabilities of observable "good" and "bad" outcomes. In order to deter "shirking", both players must be "punished" when the bad outcome occurs, even though it has positive probability when neither player shirks. As we have noted, this probability of inefficient joint punishment is why the best equilibrium outcome is bounded away from efficiency when the payoff criterion is the discounted normalized value (Radner, Myerson and Maskin (1986)). However, Radner (1986) shows that efficient payoffs can be attained in partnerships with time-averaging. Using the "law of the iterated logarithm" (hereafter, l.i.l.), a technique also pioneered by Rubinstein (1979) and by Rubinstein and Yaari (1983), he constructs strategies such that (i) if players never cheat, punishment occurs only finitely often, which is negligible, and (ii) an infinite number of deviations is very likely to trigger a substantial punishment. Since no finite number of deviations can increase the time-average payoff, in equilibrium no one cheats, yet the punishment costs are negligible.
Since the inefficiencies in repeated partnerships and games with short-run players both stem from the need for punishments along the equilibrium path, it is not surprising that the inefficiencies in our model also disappear when the long-run player is completely patient.
Proposition 5. Imagine that there is a single long-run player, player 1, who evaluates payoff streams with the criterion lim inf_{T→∞} E[(1/T) Σ_{t=0}^{T−1} g_1(a^t)], where E denotes expectation. (All other players are short-run and act to maximize their payoffs in the current period.) Then for all v_1 ∈ V* there is a subgame-perfect equilibrium with payoff v_1 for player 1.
A referee has suggested that the l.i.l. techniques can be used to prove Proposition 5. Although we have not checked the details, this seems to us to be so. We offer a different line of proof which extends the target strategies of Section 4. These strategies differ from those that would be constructed using the l.i.l. in interesting ways that we discuss at the end of the proof. The technical details of the proof, which is based on an extension of the strong law of large numbers for martingales with bounded increments instead of the l.i.l., may also be of interest.
Proof. Fix the payoff v_1 ∈ V* to be implemented and, for notational convenience, normalize payoffs so that v_1 = 0. For each history of play, define I_0 = 0 and I_t = Σ_{τ=0}^{t−1} g_1(a^τ). Note that here we define I_t with the actual play and not with the intended randomizations of the short-run players.
We begin with bounds on the value of I_t, constructed for each path. For each history h^t, let R_t(h^t) = {τ ≤ t − 1 : I_τ ≤ 0} and let P_t(h^t) = {τ ≤ t − 1 : I_τ > 0}. (The letters R and P are chosen as mnemonics for "reward" and "punishment", with reference to the strategy σ we shall shortly construct, although σ will play no role in these bounds.) Let x_t(h^t) = Σ_{τ∈R_t} g_1(a^τ) and y_t(h^t) = Σ_{τ∈P_t} g_1(a^τ). Let a be the (possibly mixed) action profile in B that maximizes player 1's expected payoff (i.e. that attains the value v_1). Note that g_1(a) ≥ 0. Recall that m^1 denotes the (mixed) action profile that minmaxes player 1, and note that max_{a_1} g_1(a_1, m^1_{−1}) = v̲_1 ≤ 0 in this normalization.

Define a strategy profile σ by: at date t, play a if I_t ≤ 0, and play m^1 if I_t > 0.
Since actions are selected from B, the short-run players have no wish to deviate from this strategy profile. As for player 1, we claim (i) that no matter how player 1 plays, her payoff is bounded above by v_1 = 0, and (ii) that by following σ_1 player 1 can attain payoff v_1, both statements holding almost surely (and hence in expectation). These two statements, once established, show that we have a Nash equilibrium. For (i), we note that no matter how player 1 plays, if her opponents play according to σ_{−1}, then {y_t} as defined above is a supermartingale sequence. Either the increment in {y_t} is zero (if I_t ≤ 0) or the increment is, in expectation, less than or equal to zero (if I_t > 0), since the best that player 1 can do (in expectation) against m^1_{−1} is less than or equal to zero. This supermartingale has bounded increments, so Lemma 3 in the Appendix shows that (y_t − Y_t)/t converges to zero almost surely, where Y_t is the running minimum of {y_t}. This implies that lim sup I_t/t ≤ 0 almost surely, and since the per-period payoffs are uniformly bounded, lim sup E(I_t/t) ≤ 0 as well. Thus we have (i): no matter how player 1 plays, she can do no better than v_1 = 0.
And if player 1 conforms to σ_1, then {x_t} is a submartingale sequence with bounded increments. The increments to x_t are either zero (when I_t > 0) or are non-negative in expectation (when I_t ≤ 0). Lemma 4 in the Appendix implies that, in this case, (X_t − x_t)/t converges to zero almost surely, where X_t is the running maximum of {x_t}. Combining this with the previous step, the limit of I_t/t is zero almost surely. Since payoffs are uniformly bounded, the limit of E(I_t/t) is zero, and we have (ii).
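The two-regime structure is easy to visualize in a quick Monte Carlo sketch. The payoff distributions below are invented for illustration; all that matters is that the profile a yields player 1 a nonnegative expected payoff and the minmax profile a nonpositive one, so that I_t mean-reverts and I_t/t goes to zero.

```python
# Simulation of the time-average target strategy: play a (mean payoff
# +0.5 here) while I_t <= 0, and minmax (mean payoff -0.5 here) while
# I_t > 0.  The distributions are illustrative assumptions.
import random

def simulate(T=50_000, seed=0):
    rng = random.Random(seed)
    I = 0.0  # running (undiscounted) sum of player 1's realized payoffs
    for _ in range(T):
        if I <= 0:
            I += rng.uniform(-0.5, 1.5)   # "reward" regime: profile a
        else:
            I += rng.uniform(-1.5, 0.5)   # "punishment" regime: profile m^1
    return I / T

print(simulate())  # time-average close to v_1 = 0
```

In this sketch I_t crosses zero infinitely often, so "punishment" periods recur forever, yet their effect on the time-average vanishes — exactly the pattern discussed below.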
This completes the argument that σ is a Nash equilibrium. The argument extends immediately to yield subgame-perfection, since the bounds that were established do not depend on the strategies but hold pathwise, and play in any finite number of periods will not invalidate the basic martingale arguments. We can show that with our strategies, player 1 is "typically" punished infinitely often with probability one. (That is, the probability that I_t is strictly positive for infinitely many t is one.) This seemingly infinite punishment is compensated for in one of two ways. If v_1 < g_1(a), then in the reward stages (when I_t ≤ 0) player 1 is accruing (on average) at a rate exceeding v_1. On the other hand, if v_1 = g_1(a), then although player 1 is punished infinitely often, the proportion of time that she is punished vanishes. Contrast this with Radner's (1986) construction of efficient equilibria for symmetric time-average partnership games using the l.i.l.; in Radner's equilibrium, the probability of infinite punishment is zero. In a sense, our equilibrium strategies are more sensitive to how well the long-run player is doing and so will typically involve many more "regime shifts"; punishment begins as soon as the player is doing better than she should. Radner's equilibria, in contrast, allow for (ever) larger excursions from what should happen before punishment is initiated; there are "fewer" shifts of regime (in a very strong sense), but the criterion for shifting to punishment is relatively more complex.

CONCLUDING REMARKS
There are several extensions and complements to our results that are worth mentioning. First, we have left open the characterization of the set of equilibria for more than one long-run player without observable mixed actions. This question has been resolved by Fudenberg and Levine (1990), who, using methods different from ours, show that a natural generalization of Proposition 4 obtains. Specifically, for any long-run player i, define v_i* as in Section 3.2. Suppose that the dimensionality of V* equals the number of long-run players. Then, for any v ∈ V* such that v_i < v_i* for all i, and for all discount factors sufficiently near one, there is a subgame-perfect equilibrium (which does not invoke public randomization) in which the long-run players' payoffs are v.
A second natural extension, also developed in Fudenberg and Levine (1989b), is to cases where players' actions are not observed perfectly but are subject to "noise".Think, for example, of a long-run player who engages a sequence of short-run opponents in a standard Cournot duopoly game where, in the style of Green and Porter (1984), equilibrium price depends on quantities supplied and some random shock, and each player observes only the equilibrium prices in each round.Fixing the amount of "noise", Fudenberg and Levine show how to compute the limit set of equilibrium values, as the discount rate approaches one.
In the tradition of the literature on the folk theorem, we have characterized the limit set of equilibrium payoffs. Also in line with tradition, we have not tried to say which of the many points in this set, if any, is the "solution" to the game. In general games, with many long-run players, we cannot even offer a guess about what the "solution" might be. But in games with a single long-run player, one natural candidate for the solution might be the equilibrium that the long-run player prefers. (This would then suggest that, in the limit, she should obtain a payoff of v*.) Fudenberg and Levine (1989a, 1990) give some credence to this hypothesis by taking an approach complementary to the one taken here, considering models where there is a small amount of uncertainty about the payoffs of the single long-run player. That is, they study games with a "little bit" of incomplete information, as analyzed initially in Kreps, Milgrom, Roberts and Wilson (1982) and for which the appropriate folk theorem (for games with only long-run players) is given in Fudenberg and Maskin (1986). Fudenberg and Levine (1989a) suppose there is initially a small chance that one of a number of pure actions is a dominant strategy for the long-run player. Then, as the discount factor goes to one, the long-run player, even if not of such a type, is assured of a payoff at least as large as she would get if she committed to the best (for her) of those actions. Fudenberg and Levine (1990) extend this to the case where there is positive probability assessed that the long-run player will play one of various mixed actions and/or where the long-run player's actions are observed with noise.
In the framework of this paper, the Fudenberg and Levine results imply that, when the motivations of the long-run player are uncertain, there may be no discontinuity as the discount factor reaches one. That is, if in the game of Figure 2 the short-run players initially assess positive probability that player 1 will consistently play a mixed action with the probability of U equal to 1/2, then as the discount factor approaches one from below, the expected payoff to player 1 if she plays to maximize her payoffs tends to 3. This suggests that, if short-run players entertain small doubts about the long-run player, a single long-run player may do even better than v* for discount factors close to one.
characterization of a target strategy equilibrium is inaccurate in one technical respect; see Section 4.
4. The set of equilibrium payoffs is similarly discontinuous in the case of repeated partnerships; Radner (1986) shows that there is no inefficiency with time-averaging.
results are stated in terms of payoffs for the long-run players only.
V* is convex, this assumption is simply that V* contains an open ball in R^ℓ.
Consider the following strategy profile. Begin play in phase I. Deviations from this profile by short-run players are ignored.
Phase I. Play according to a. If all long-run players play according to a, or if two or more long-run players fail to play according to a, then remain in phase I. If a single long-run player j deviates from a, then switch to phase II_j.
Phase II_j. Play according to m^j. If a single long-run player i deviates, begin phase II_i. If no long-run player deviates, or if more than one long-run player deviates, then remain in phase II_j if phase II_j has lasted less than N periods, and switch to phase III_j if phase II_j has lasted N periods.
Phase III_j. Play according to a^j. If no long-run player or more than one long-run player deviates, remain in phase III_j. If one long-run player i deviates, then begin phase II_i.
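The phase structure above is a finite automaton on phase labels. Here is a hypothetical sketch: the state-machine encoding is ours, while the phase names, the lone-deviator rule, and the length-N minmax phase follow the text.

```python
# State machine for the three-phase profile: phase I (play a), phase II_j
# (minmax deviator j for N periods), phase III_j (play the reward profile
# a^j).  Only a lone long-run deviator triggers a transition.
class PhaseAutomaton:
    def __init__(self, N):
        self.N = N                # length of a minmax phase II_j
        self.phase = ("I", None)  # ("I", None), ("II", j) or ("III", j)
        self.clock = 0            # periods spent in the current phase II_j

    def step(self, deviators):
        """Advance one period; `deviators` is the set of long-run players
        who departed from the current prescription."""
        kind, j = self.phase
        if len(deviators) == 1:              # a lone deviator i is minmaxed
            i = next(iter(deviators))
            self.phase, self.clock = ("II", i), 0
        elif kind == "II":                   # zero or multiple deviations
            self.clock += 1
            if self.clock >= self.N:         # minmaxing done: reward phase
                self.phase, self.clock = ("III", j), 0
        # phases I and III_j absorb all other events
        return self.phase
```

For example, with N = 2, a lone deviation by player 3 in phase I triggers ("II", 3), and two quiet periods later the automaton moves to ("III", 3). A deviation by any lone player during phases II or III restarts a fresh minmax phase against that player, as the text prescribes.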

8. As we noted in the introduction, this limiting inefficiency may be reminiscent of similar limiting inefficiencies in the context of repeated games with moral hazard, as in Green and Porter (1984), Abreu, Pearce and Stacchetti (1986), and Radner, Myerson and Maskin (1986). But following the analysis of Fudenberg, Levine and Maskin (1989), which tracks down the causes of the limiting inefficiencies in the earlier work, we think of the inefficiency here, which is rooted in the problems caused by short-run players, as more endemic.
9. See Section 6 for remarks on the case of more than one long-run player.

Phase I. Play m^1. Use the publicly available randomizing device (whose outcome is known after play in the current round but before play in the next round) to make transitions as follows. If player 1's chosen action in the current period is a_1, switch to phase II with a probability that depends on a_1; otherwise remain in phase I.
Phase II. Play a' and remain in phase II no matter what happens.
The switching probability has been constructed so that player 1's normalized payoff is v̲_1 for all her possible actions, including those in the support of m^1_1. Hence she is indifferent among these actions, and so has no incentive to deviate from the specified strategy. And the prescribed strategies are all in B, so the short-run players have no incentive to deviate either.
Now assume that v_1* > v̲_1. (If v_1* = v̲_1, the remainder of the first part of the proposition is trivial.) We will construct an equilibrium which yields v_1* for player 1. Let a* be a profile that satisfies the equation in the definition of v_1*. Consider the following two-phase equilibrium. Play begins in phase I.
Phase I.
Play a*. If player 1 takes any action not in the support of a*, move to phase II. Otherwise, if player 1 takes action a_1 ∈ support(a*), move to phase II with probability p*(a_1) and remain in phase I with complementary probability. (Use the publicly observed signal at the start of the following period to achieve this randomization.)
Phase II. Play the "punishment equilibrium" constructed above.
For δ sufficiently close to one, p*(a_1) ∈ [0, 1] for all a_1. The transition probabilities are selected, moreover, so that for every a_1 ∈ support(a*), the value to player 1 of playing a_1 and then conforming to the equilibrium is v_1*. And for δ sufficiently close to one, since v_1* > v̲_1, playing any other action a_1 (and, therefore, moving immediately to phase II) will be strictly worse. Of course, actions in phase II are best responses for player 1 by construction, and short-run players are always playing best responses, since stage-game strategy prescriptions are always drawn from B.
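The transition probabilities can be recovered from the indifference condition: under our reading, playing any a_1 ∈ support(a*) and then conforming must be worth exactly v_1*, which pins down p*(a_1) given the punishment value. A hypothetical numerical sketch follows; the indifference condition is our reconstruction, and the payoff numbers are invented.

```python
# Solve  v* = (1-delta)*g1(a_1, a*_{-1}) + delta*((1-p)*v* + p*v_bar)
# for p = p*(a_1), where v_bar <= v* is the punishment-equilibrium value.
def p_star(g1_a1, v_star, v_bar, delta):
    """Probability of switching to phase II after action a_1."""
    return (1 - delta) * (g1_a1 - v_star) / (delta * (v_star - v_bar))

# Example: v* = 2 (the minimum stage payoff over support(a*)), punishment
# value 0; actions with higher stage payoff draw higher switch probability.
for delta in (0.9, 0.99):
    print([round(p_star(g, 2.0, 0.0, delta), 4) for g in (2.0, 3.0)])
```

Since p*(a_1) is proportional to 1 − δ, it falls into [0, 1] once δ is close enough to one, which is exactly the claim in the text.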
a_1^{t−1} is player 1's actual choice of action in period t − 1 (not her mixed action choice), and α^{t−1} is the mixed-action profile that the short-run players were meant to play at time t − 1. (We continue to suppress the dependence of α^{t−1} on history up through time t − 2.) Note that I_t and J_t are common knowledge at the start of time t.

(i) If player 1 uses the strategy σ_1, then I_t = J_t. Since J_t ≥ r_t for all times t by definition, this implies that I_∞ ≥ lim_{t→∞} r_t = v_1. (ii) Regardless of how player 1 plays, J_t ≤ v_1, hence I_t ≤ v_1 and hence I_∞ ≤ v_1.
For (i), we use induction. By definition I_0 = J_0 = 0. Suppose then that I_t = J_t. At date t, either J_t ≥ r_{t+1} or J_t < r_{t+1}. In the former case, σ_t prescribes play of a'. Since g_1(a_1, a'_{−1}) = 0 for all a_1 in the support of a'_1, we have I_{t+1} = I_t = J_t = J_{t+1}. In the latter case, σ_t prescribes the play of a*. The minimum expected payoff to player 1, if she plays a_1 ∈ support(a*) and the others play a*_{−1}, is by definition v* ≥ v_1.
so x_t is the sum of player 1's payoffs up through date t − 1 in the periods where she began at or below zero, and y_t(h^t) = Σ_{τ∈P_t} g_1(a^τ) is the sum of her payoffs up through date t − 1 in the periods where she began above zero. (A sum over an empty set is zero by convention; in particular, x_0 = y_0 = 0.) Note that the sets R_t and P_t and the associated sums x_t and y_t are defined pathwise; i.e. they depend on the history h^t. Despite this, we will henceforth omit the history h^t from the notation.

Lemma 1. Let {x_n, F_n} be a martingale sequence with bounded increments (that is, for some number B, |x_{n+1} − x_n| ≤ B for all n) and with x_0 = 0. Then lim_{n→∞} x_n/n = 0 almost surely.
A proof of this lemma can be found in Hall and Heyde (1980, p. 36ff). We also use the following standard adaptation of this strong law:
Lemma 2. For {x_n, F_n} as above, let X_n = min_{1≤i≤n} x_i. Then lim_{n→∞} X_n/n = 0 almost surely.
Proof. Since x_0 = 0, X_n ≤ 0 for all n. Fix a sample path of the stochastic process. Since X_n/n ≤ 0, we only have to show that lim inf X_n/n = 0. Suppose, instead, that {n_i} is a subsequence along which the limit is less than 0. For each n_i, there is m_i ≤ n_i with x_{m_i} = X_{n_i}, and thus 0 > X_{n_i}/n_i = x_{m_i}/n_i ≥ x_{m_i}/m_i. Hence, along the subsequence {m_i}, x_{m_i}/m_i violates the strong law. By Lemma 1, this can happen only on a null set.
Lemma 3. Let {x_n, F_n} be a supermartingale with bounded increments and x_0 = 0, and let {X_n} be defined from {x_n} as in Lemma 2. Then lim_{n→∞} (x_n − X_n)/n = 0 almost surely.
Proof. Since x_n ≥ X_n, we only need to show that the lim sup of the sequence is non-positive. Let y_n be the martingale part of {x_n}, i.e. y_0 = 0 and y_n = Σ_{i=1}^n (x_i − x_{i−1} − E(x_i − x_{i−1} | F_{i−1})), and let Y_n = min_{1≤i≤n} y_i. Then {y_n, F_n} is a martingale sequence with bounded increments, and Lemmas 1 and 2 tell us that lim y_n/n = lim Y_n/n = 0, and thus lim (y_n − Y_n)/n = 0. We are done, therefore, once we show that x_n − X_n ≤ y_n − Y_n pointwise. But this is easily done by induction. It is clearly true for n = 0 by convention. If X_n = X_{n−1}, then since the increment of {x_n} is no larger than that of {y_n} and Y_n ≤ Y_{n−1}, we are done. While if X_n ≠ X_{n−1}, then X_n = x_n, and x_n − X_n = 0 ≤ y_n − Y_n.
By a symmetrical argument we obtain:
Lemma 4. Let {x_n, F_n} be a submartingale with bounded increments and x_0 = 0. Let X_n = max_{1≤i≤n} x_i. Then lim_{n→∞} (X_n − x_n)/n = 0 almost surely.
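These lemmas are easy to illustrate numerically (a sanity check, not a proof): for a ±1 coin-flip martingale, both x_n/n and the running minimum X_n/n shrink toward zero.

```python
# Monte Carlo illustration of Lemmas 1 and 2: x_n is a +-1 coin-flip
# martingale (bounded increments, x_0 = 0); both x_n/n and the running
# minimum X_n/n should be near zero for large n.
import random

def martingale_ratios(n=200_000, seed=1):
    rng = random.Random(seed)
    x, running_min = 0, 0
    for _ in range(n):
        x += rng.choice((-1, 1))
        running_min = min(running_min, x)
    return x / n, running_min / n

print(martingale_ratios())  # both ratios small
```

Typical excursions of x_n are of order √n, so both ratios vanish at rate about 1/√n, consistent with the almost-sure limits in Lemmas 1 and 2.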