A game-theoretic analysis of the ESP game

“Games with a Purpose” are interactive games that users play because they are fun, with the added benefit that the outcome of play is useful work. The ESP game, developed by von Ahn and Dabbish [2004], is an example of such a game, devised to label images on the web. Since labeling images is a hard problem for computer vision algorithms and can be tedious and time-consuming for humans, the ESP game gives humans an incentive to do useful work by being enjoyable to play. We present a simple game-theoretic model of the ESP game and characterize the equilibrium behavior in our model. Our equilibrium analysis supports the observation that users appear to coordinate on low-effort words. We provide an alternate model of user preferences, modeling a change that could be induced through a different scoring method, and show that equilibrium behavior in this model coordinates on high-effort words. We also give sufficient conditions for coordinating on high-effort words to be a Bayesian-Nash equilibrium. Our results suggest the possibility of formal incentive design in achieving desirable system-wide outcomes for the purpose of human computation, complementing existing considerations of robustness against cheating and human factors.


INTRODUCTION
The paradigm of human computation considers the possibility that networks of people can be leveraged to solve large-scale problems that are hard for computers to solve. Showcased by the early success of "Games with a Purpose" (GWAP) [von Ahn 2006], human computation is an example of the broader agenda of "peer production," which seeks to understand and design for large-scale collaboration among humans outside the traditional framework of firms and price signals [Benkler 2002]. Other peer-production systems include Wikipedia, YouTube, question-and-answer forums such as Yahoo! Answers and Naver Knowledge-iN, and Taskcn, a popular Chinese crowdsourcing website.
Work by von Ahn and others has shown the tremendous power that networks of humans possess to solve problems while playing computer games [von Ahn and Dabbish 2004; von Ahn et al. 2006c; von Ahn et al. 2006a; von Ahn et al. 2006b; Law and von Ahn 2009; Hacker and von Ahn 2009]. The ESP game is an example of such human computation; it is an interactive system that pairs users to play a game that labels images on the web [von Ahn and Dabbish 2004]. Users play the ESP game because it is enjoyable, with the side effect that they do useful work in the process. As of July 2008, at least 200,000 people had played the ESP game, which has led to the collection of over 50 million labels [von Ahn and Dabbish 2008]. Follow-up work includes Peekaboom [von Ahn et al. 2006c], a GWAP for locating objects within an image; Phetch [von Ahn et al. 2006a], for gathering useful descriptions of images on the web; Verbosity [von Ahn et al. 2006b], for gathering common-sense facts; TagATune [Law and von Ahn 2009], for gathering tags for music clips; and Matchin [Hacker and von Ahn 2009], for eliciting user preferences. Also in the spirit of human computation, [Kearns et al. 2006] report results from a number of behavioral experiments on how quickly distributed networks of humans can solve graph problems such as graph coloring.
While there has been remarkable progress in building successful applications of human computation, there is still little theory to guide design. For GWAP, it seems appropriate to use game theory to better understand how to design incentives that achieve system-wide goals. For example, anecdotal evidence suggests that during play of the ESP game, people coordinate on easy words, and that the game is less effective at eliciting less obvious, harder words. Google appears to have noticed this behavior: its version of the ESP game, the Google Image Labeler, assigns different scores to different labels depending on the "descriptiveness" of the label. However, [Weber et al. 2008] suggest that this differentiation is not strong enough, and that the resulting labels still contain a high percentage of colors, synonyms, and generic words. [Ho et al. 2009] also observe that the set of labels collected by the ESP game for an image is not very diverse, and develop a three-player version of the ESP game that adds a "blocker" who types in words that the other two players cannot use to match.
Coordination on generic, nondescriptive words is a problem for certain applications of the ESP game. One of the largest applications of the ESP game is obtaining labels for image search. In this case, it is important to have images assigned to all labels that could potentially be queried in a search engine. To this end, we seek to understand the interaction between preference models and game outcomes. We propose a simple model of the game in which players independently choose an effort level (low or high), which dictates the portion of a universe of words from which they sample. If a player chooses low effort, she samples from the most frequent words in the universe, whereas if she chooses high effort, she samples from the entire universe of words. Once players have independently sampled a dictionary (or type), they decide in which order to output their words.
We consider two different models of payoffs, namely match-early preferences and rare-words preferences. Match-early preferences model the setting in which players wish to complete as many rounds as possible and receive the same score irrespective of the words on which they match; this reflects the current method of assigning scores to outcomes in the ESP game. We show that low effort is an ordinal Bayesian-Nash equilibrium for all distributions on word frequencies, with players focusing attention on high-frequency words. More specifically, we show that choosing low effort in conjunction with playing words in order of decreasing frequency is a Bayesian-Nash equilibrium for all utility functions consistent with match-early preferences and all distributions on word frequencies. For the second stage of the game, we show that playing words in order of decreasing frequency is a Bayesian-Nash equilibrium for all distributions of word frequencies and all utility functions consistent with match-early preferences. Moreover, we show that (decreasing, decreasing) is one of the few second-stage strategy profiles with these properties, and that the strategy profiles with these properties satisfy an "almost decreasing" condition. Conditioned on the second-stage strategy of playing words in order of decreasing frequency, we show that playing low effort is an ordinal Bayesian-Nash equilibrium and that playing low effort is an ordinal best response to playing high effort. These results generalize to any number of effort levels.
The main results obtained in the match-early preferences model are:

THEOREM 1.1. The second-stage strategy profile $(s_1^{\downarrow}, s_2^{\downarrow})$, where both players play their words in order of decreasing frequency, is a strict ordinal Bayesian-Nash equilibrium for the second-stage ESP game for every distribution over the universe of words $U$ and every choice of effort levels $e_1, e_2$. Moreover, the almost decreasing strategy profiles are the only strategy profiles (among those in which at least one player plays a consistent strategy) that can be an ordinal Bayesian-Nash equilibrium for every distribution over $U$ and every choice of effort levels $e_1, e_2$.
THEOREM 1.2. The complete-game strategy profile $((L, s_1^{\downarrow}), (L, s_2^{\downarrow}))$, where both players exert low effort and play their words in order of decreasing frequency, is a strict ordinal Bayesian-Nash equilibrium of the complete ESP game under match-early preferences, for every distribution over the universe of words $U$ except the uniform distribution. Moreover, $(L, s_1^{\downarrow})$ is a strict ordinal best response to $(H, s_2^{\downarrow})$ for every distribution over $U$ except the uniform distribution.
To remedy the problem of users coordinating on common words, which arises when players adopt low-effort and decreasing-frequency strategies, we turn to the rare-words preferences model. In this model, players wish to match on infrequent words before frequent words (which we suppose arises from appropriately designed incentives), and the speed with which a match is achieved is no longer a consideration.
We show that under this preference model the equilibrium structure differs significantly: playing words in order of decreasing frequency is now a strictly dominated strategy, and playing words in order of increasing frequency is an ex-post Nash equilibrium. This promotes matching on lower-frequency words: for the same pair of dictionaries, the word matched upon under the (increasing, increasing) strategy profile has frequency at least as low as the word matched upon under the (decreasing, decreasing) strategy profile. Given additional structure on the utility model, we identify equilibrium behavior that usefully focuses on lower-frequency words. We show that high effort is a Bayesian-Nash equilibrium for Zipfian distributions over word frequencies under certain classes of utility functions that satisfy rare-words preferences. One such class satisfies an additive discount property, meaning that the difference in value between successive outcomes is constant. We focus on Zipfian distributions because word frequencies in the English language follow a Zipfian distribution with exponent very close to 1.
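For concreteness, a Zipfian distribution with exponent $s$ assigns the $i$-th most frequent word a frequency proportional to $1/i^s$. The following is a minimal sketch of that normalization; the function name `zipf_frequencies` is a hypothetical helper, and the universe size is an arbitrary illustrative choice.

```python
def zipf_frequencies(n: int, s: float = 1.0) -> list[float]:
    """Zipf frequencies f_1 >= ... >= f_n with f_i proportional to 1 / i**s, summing to 1."""
    if n < 1 or s < 0:
        raise ValueError("need n >= 1 and s >= 0")
    weights = [1.0 / i ** s for i in range(1, n + 1)]  # unnormalized 1 / i^s
    total = sum(weights)
    return [w / total for w in weights]


freqs = zipf_frequencies(10, s=1.0)
# Under Zipf's law the most frequent word is 2**s times as frequent as the second.
print(round(freqs[0] / freqs[1], 6))
```

With $s = 0$ the same function returns the uniform distribution, which Theorem 1.2 excludes; the exponent $s \ge 1$ is the regime covered by Theorem 1.4.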
The main results obtained in the rare-words preferences model are:

THEOREM 1.3. The second-stage strategy profile $(s_1^{\uparrow}, s_2^{\uparrow})$, where players play their words in order of increasing frequency, is a strict ex-post Nash equilibrium for the second stage of the ESP game for every distribution over the universe of words $U$ and every $e_1 = e_2$, under rare-words preferences.
THEOREM 1.4. The complete-game strategy profile $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$, where players exert high effort and play their words in order of increasing frequency, is a Bayesian-Nash equilibrium of the complete ESP game for Zipfian distributions over the universe of words $U$ with exponent $s \ge 1$, for any additive utility function that satisfies rare-words preferences, and for any multiplicative utility function that satisfies rare-words preferences in which the ratio $r$ of point values for successive words satisfies $r \ge 2$.
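Theorem 1.4 covers two families of point schedules. The sketch below builds one member of each, assuming words are indexed so that $w_1$ is the most frequent, that the value of no match is $v(\phi) = 0$, and that a list stores $v(\phi), v(w_1), \ldots, v(w_n)$ in that order. The function names are hypothetical illustrations, not part of the model.

```python
def additive_values(n: int, step: float = 1.0) -> list[float]:
    """values[0] = v(phi); values[i] = v(w_i) = step * i, with w_1 the most frequent word.

    Successive outcomes differ by the constant `step` (the additive case).
    """
    return [step * i for i in range(n + 1)]


def multiplicative_values(n: int, r: float = 2.0) -> list[float]:
    """values[0] = v(phi) = 0; values[i] = r**(i - 1), so successive words differ by the ratio r."""
    return [0.0] + [r ** (i - 1) for i in range(1, n + 1)]


def consistent_with_rare_words(values: list[float]) -> bool:
    """True iff v(w_n) > ... > v(w_1) > v(phi), i.e. the list is strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))


print(additive_values(4))        # constant difference of 1 between successive outcomes
print(multiplicative_values(4))  # ratio r = 2 between successive words
```

Any positive `step` or any `r > 1` yields a schedule consistent with rare-words preferences; the theorem's multiplicative case additionally requires `r >= 2`.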
Many of these results adopt a robust equilibrium concept, namely ordinal Bayesian-Nash equilibrium. Under this notion, we look for Bayesian-Nash equilibria that hold for all valuations satisfying a given preference model. In many cases, our results hold for any distribution over the universe of words $U$ (e.g., Theorems 1.1, 1.2, and 1.3). Note that in these cases the results do not require that player 1 and player 2 have the same distribution over the universe $U$; it is only required that they share the same total ordering of the words in $U$.
The results in this paper provide a simple explanation for why coordination on low-effort, generic words is to be expected in the ESP game, and suggest an alternative incentive structure for obtaining coordination on higher-effort, more descriptive words. More specifically, Theorem 1.4 shows that to achieve coordination on low-frequency, descriptive words, it suffices to have a constant difference in points (i.e., an additive utility function) between each successive word in the relevant universe $U$ of words for an image. Still, the system designer does not know the set of words in the universe a priori. One way to address this is to award points based on current knowledge of the relevant words, and to adjust the points later through a delayed reward system as subsequent game play refines knowledge of the word universe. The frequency ordering within this image-relevant universe can simply be the ordering in the English language. Although this proposed scheme adopts the universe as played by users, which is itself endogenous to the outcomes of the game, it seems likely to provide a reasonable implementation of our preference model.

[Ho et al. 2007; Chang et al. 2007] develop a simple game called PhotoSlap, for determining the content of images, based on the popular card game Snap. These authors provide a game-theoretic analysis of PhotoSlap and establish that the desired behavior from a system-wide perspective is a subgame perfect Nash equilibrium. To our knowledge, [Ho et al. 2007; Chang et al. 2007] is the first application of game theory to human computation; however, their model and analysis are specific to their game and cannot be applied to the ESP game. Our model of the ESP game requires a more intricate analysis, because we model it as a game of imperfect information rather than a game of perfect information, and because its action space is much larger than that of the PhotoSlap model.

Related Work
[von Ahn and Dabbish 2008] provide a classification of games with a purpose: output-agreement games, such as the ESP game; inversion-problem games, such as Peekaboom; and input-agreement games, such as TagATune. They describe the key elements of each class needed to ensure that the intended computation is done, and discuss general design paradigms for increasing enjoyment and output quality. [Ho and Chen 2009] study the verification mechanisms used in various GWAP and classify them into two classes, the sequential verification mechanism (as used in inversion-problem and input-agreement games) and the simultaneous verification mechanism (as used in output-agreement games), and model games that use these mechanisms. These authors model the simultaneous verification mechanism (as in ESP) as a one-shot symmetric coordination game in which each player reports a single word, and need to appeal to a focal point argument to explain why players coordinate on the most frequent word. [Ho and Chen 2009] also model a sequential verification game as an extensive-form game of imperfect information and show that the desirable system-wide outcome is supported in equilibrium.
In an experimental study of the data generated by the Google Image Labeler, [Weber et al. 2008] show that the set of tags for a given image is generated from a low-entropy distribution, and that the labels entered by players are highly predictable given the Taboo Words. To establish the second point, the authors test a bot programmed with a simple language model learned from the Google Image Labeler, which infers what label should come next solely from the set of Taboo Words already present and derives no information from the image itself. They find that this bot agrees with a human player 81% of the time on images that have at least one Taboo Word. The analysis of [Weber et al. 2008] suggests that players tend to match on synonyms and colors. For instance, they find that 81% of images labeled "guy" were also labeled "man", and that over 10% of the Taboo Words from approximately 14.5K images are colors. To remedy this problem, these authors propose two alternative scoring schemes, namely rewarding players for a label with a value inversely proportional to the probability that the label would be entered given the set of Taboo Words, and rewarding players based on the information gain of each new label, but they do not provide a game-theoretic analysis.
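The two scoring ideas can be illustrated with a minimal sketch. It assumes the designer already has an estimate $p$ of the probability that a label would be entered given the current Taboo Words; the probabilities below are made up, and the surprisal $-\log_2 p$ is only one simple proxy for "information gain", not necessarily the exact formulation of [Weber et al. 2008].

```python
import math


def inverse_probability_points(p: float, scale: float = 1.0) -> float:
    """Points inversely proportional to Pr(label | Taboo Words)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return scale / p


def surprisal_points(p: float) -> float:
    """Surprisal -log2 Pr(label | Taboo Words) in bits: one simple proxy for information gain."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


# Hypothetical conditional probabilities given the Taboo Words already on an image
label_probs = {"man": 0.5, "guy": 0.25, "victorian": 0.03125}
for label, p in label_probs.items():
    print(label, inverse_probability_points(p), surprisal_points(p))
```

Both schedules grow as a label becomes less predictable; the inverse-probability rule grows much faster, which is one design choice a game-theoretic analysis could help settle.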
Game-theoretic models of other peer-production systems have been proposed, including a study of scoring mechanisms in Yahoo! Answers [Jain et al. 2009], all-pay auction models of crowdsourcing systems such as Taskcn and TopCoder [DiPalatino and Vojnovic 2009; Archak and Sundararajan 2009; Chawla et al. 2011; Cavallo and Jain 2012], related models of optimal contest design [Moldovanu and Sela 2001; 2006], and analyses of attention mechanisms in social computing systems [Ghosh and McAfee 2011; Ghosh and Hummel 2011]. In addition, a number of empirical studies of user behavior in various peer-production systems show that some fraction of users in these systems behave strategically [Yang et al. 2008; Adamic et al. 2008; Nam et al. 2009], motivating the use of game theory to study such systems.

THE ESP GAME
The ESP Game [von Ahn and Dabbish 2004] is a two-player game for labeling images on the web. Labeling images has proven to be a hard problem for computer vision, yet it is something that humans can do easily [Barnard et al. 2003]. However, humans require some sort of incentive to perform this normally tedious task. The ESP game provides this incentive by making the game fun to play.
In the ESP game, players are randomly paired and each player is presented with the same image. Once the two players have entered a common word, this common word becomes the label for the image. Players cannot communicate with each other while they are entering words for the image, and once they agree on a common word, they see only the word they agreed upon. Players are paired for a set of 15 images, and each pair tries to label as many of the images as they can in 2.5 minutes. Players receive a fixed number of points for each successful label, and bonus points for labeling five, ten, and fifteen images in a set. Players can pass on difficult images, which are revisited at the end of the set. The only word used from the two input streams for an image is the first common word entered. Intuitively, words upon which players agree are likely to be relevant to the image, since the image, and nothing else, is what the players can coordinate around. The game includes a scoreboard, updated daily, with the names of the players with the highest scores. Empirical studies of other peer-production systems have shown that points are a key feature in motivating users [Nam et al. 2009].

Fig. 1. Decision tree for a single player: players choose an effort level, which dictates from which portion of a universe of words they sample their dictionary. The process of sampling a dictionary can be thought of as a move by nature. Finally, a player outputs a permutation on her dictionary.

A Formal ESP Model
We model the ESP game as a two-stage game of imperfect information. In our model, when a player decides to play the ESP game, she is presented with an image and thinks up words to represent the image. She then decides how to enter words depending on how likely she is to match with the other player on those words. We focus on modeling the game associated with one of the images in a set.
Let there be a universe of $n$ words $U = \{w_1, w_2, \ldots, w_n\}$ associated with the image at hand, and let $1 < d < n$ denote the dictionary size, or the number of words that each player samples from the universe$^{1}$. Each word in the universe has an associated frequency, where $f_i$ denotes the frequency of word $w_i$ in the English language and $\sum_{i=1}^{n} f_i = 1$. Each player knows the frequency of the words sampled and can therefore rank words according to frequency$^{2}$.

Even though there is no communication between players, it is useful to decompose the strategy of a player into two components, which we associate with a first stage (choosing an effort level) and a second stage (choosing a permutation on a sampled dictionary). We give a decision tree for a player in Figure 1. In the first stage, a player chooses an effort level $e \in E = \{L, H\}$, for low or high. The choice of effort level determines the set of words in the universe from which a player samples her dictionary. The sampled dictionary can be thought of as a move by nature. If a player chooses $L$ in the first stage, the dictionary is sampled without replacement from the top $n_L$ words, where $n > n_L > 0$. That is, word $i$ in the top $n_L$ words is chosen first with probability $f_{i,L} = f_i / \sum_{j=1}^{n_L} f_j$. Let $U_L$ be the set of the $n_L$ highest-frequency words in $U$, or the "low universe". In addition, let $D_L$ denote the set of all possible dictionaries a player could obtain if she played effort $L$. If a player chooses effort $H$, the dictionary is sampled without replacement from the top $n_H$ words, where $n_H = n$; in other words, the dictionary is sampled from the entire universe of words. That is, word $i$ in $U$ is chosen first with probability $f_{i,H} = f_i$. Similarly, $D_H$ denotes the set of all possible dictionaries a player could obtain if she played effort $H$. Note that we assume $d < n_L$.

$^{1}$ Sometimes we use the additional assumption that $d \le n/2$.
$^{2}$ Additionally, we assume that the words in the universe are ordered in terms of decreasing frequency, i.e., $f_1 \ge f_2 \ge \cdots \ge f_n$.
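The effort-dependent sampling step can be sketched as follows. The model specifies only the first-draw probability $f_{i,L}$; this sketch additionally assumes that every later draw is also proportional to frequency among the words still remaining in the pool. Words are represented by 0-based indices in decreasing-frequency order, and both function names are hypothetical.

```python
import random


def first_draw_probability(freqs, i, effort, n_low):
    """f_{i,e}: probability that word i (0-based) is drawn first under effort "L" or "H"."""
    pool = freqs[:n_low] if effort == "L" else freqs
    return pool[i] / sum(pool) if i < len(pool) else 0.0


def sample_dictionary(freqs, d, effort, n_low, rng):
    """Draw d distinct word indices without replacement.

    freqs lists f_1 >= f_2 >= ... (index 0 is the most frequent word). Effort "L"
    restricts the pool to the n_low most frequent words; effort "H" uses all of U.
    Assumption: every draw (not only the first) is proportional to frequency
    among the words still remaining in the pool.
    """
    if effort not in ("L", "H"):
        raise ValueError("effort must be 'L' or 'H'")
    remaining = list(range(n_low if effort == "L" else len(freqs)))
    if not 1 <= d <= len(remaining):
        raise ValueError("dictionary size must fit in the pool")
    chosen = []
    for _ in range(d):
        pick = rng.choices(remaining, weights=[freqs[w] for w in remaining], k=1)[0]
        remaining.remove(pick)
        chosen.append(pick)
    return frozenset(chosen)


freqs = [0.4, 0.3, 0.2, 0.1]  # f_1..f_4 in decreasing order
rng = random.Random(7)
print(first_draw_probability(freqs, 0, "L", n_low=2))  # 0.4 / 0.7
print(sorted(sample_dictionary(freqs, 2, "H", n_low=2, rng=rng)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when estimating equilibrium quantities by simulation.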
Given a word $x \in U$, we let $f_e(x)$ represent the probability of sampling $x$ given that the player has chosen effort level $e$. This sample is modeled as a move by nature and can be considered the point at which a player learns her "type", namely her dictionary of words. Both players are symmetric, and each player has the same decision space. Note that $n_L$, $n_H$, and $d$ are parameters of the model.
In the second stage, once each player privately learns her dictionary based on the effort level chosen, players choose a permutation on their words. This models the decision in the ESP game about the order in which a player enters words. This order on a player's dictionary defines the second-stage action of each player and determines the outcome of the game. The outcome is defined by the first word that appears in the ordered lists of both players, together with the location at which this occurs, where the location of a common word is the larger of its two positions in the players' ordered lists.
Let $D_1$ be the dictionary for player 1 and $D_2$ be the dictionary for player 2. The second-stage strategy $s_1 \in S_1$ for player 1 defines a specific order $s_1(D_1)$ on $D_1$, for every possible dictionary. Given an effort level, which induces a distribution on sampled dictionaries, the second-stage strategy of a player defines a specific order in which words are played, for every possible dictionary. Likewise, player 2 has a second-stage strategy $s_2 \in S_2$ that defines an order on every possible dictionary.
A complete strategy for the ESP game is a pair $\sigma_i = (e_i, s_i) \in E \times S_i = \Sigma_i$. This defines play in both stages, with the second-stage strategy $s_i$ defining the order in which words in the dictionary are played for all possible dictionaries sampled under effort level $e_i$. We focus on pure strategies, which exist in our game.
Definition 2.1. We define a match as follows. Suppose player 1 outputs a list of words $x_1, x_2, \ldots, x_d$ and player 2 outputs a list of words $y_1, y_2, \ldots, y_d$. If there exist $1 \le i, j \le d$ such that $x_i = y_j$, then there is a match in location $\max(i, j)$. The first match is the pair $i, j$ that minimizes $\max(i, j)$ such that $x_i = y_j$.
Given this, an outcome is a pair $o = (w, l) \in (U \cup \{\phi\}) \times (\{1, \ldots, d\} \cup \{\phi\})$, where $(\phi, \phi)$ indicates that there was no match, and otherwise the pair $(w, l)$ indicates that the first match occurred on word $w \in U$ in location $l \in L$, where $L = \{1, 2, \ldots, d\} \cup \{\phi\}$. Let $O$ denote the set of possible outcomes. Let the outcome function $g(s_1(D_1), s_2(D_2)) \in O$ denote the outcome given $s_1$, $s_2$, $D_1$, and $D_2$; the location (if any) of the first match is denoted $g_l(s_1(D_1), s_2(D_2)) \in L$, and the word on which the first match occurs is denoted $g_w(s_1(D_1), s_2(D_2)) \in U \cup \{\phi\}$.

Each player $i$ has a utility function $v_i : O \to \mathbb{R}^{+}$, which induces a weak total preference ordering on outcomes. We assume that both players have the same utility function. We consider two preference models: match-early preferences and rare-words preferences. In both cases, we work with an ordinal model of preferences.
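The outcome function of Definition 2.1 is short enough to write out directly. In this minimal sketch (the name `first_match` is hypothetical), locations are 1-based and `(None, None)` stands for the no-match outcome $(\phi, \phi)$. Two different words can tie for the first-match location, and the definition does not say which one is the label; the sketch assumes the word appearing earlier in player 1's list is kept.

```python
def first_match(list1, list2):
    """Return (word, location) of the first match between two ordered word lists.

    A word at 1-based position i in list1 and j in list2 matches in location max(i, j);
    the first match minimizes that location. Returns (None, None) if there is no match.
    Definition 2.1 does not fix a tie-break between different words matching in the same
    location; this sketch keeps the one that appears earlier in list1 (an assumption).
    """
    position2 = {w: j for j, w in enumerate(list2, start=1)}
    best_word, best_loc = None, None
    for i, w in enumerate(list1, start=1):
        if w in position2:
            loc = max(i, position2[w])
            if best_loc is None or loc < best_loc:
                best_word, best_loc = w, loc
    return best_word, best_loc


print(first_match(["dog", "red", "ball"], ["ball", "dog", "sky"]))
print(first_match(["dog"], ["cat"]))
```

The first call returns `('dog', 2)`: "dog" sits at positions 1 and 2, whereas "ball" sits at positions 3 and 1 and would only match in location 3.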
Definition 2.2. For match-early preferences, we require the following preference ordering on outcomes: for all words $w, w' \in U$ and locations $1 \le i < j \le d$, $(w, l_i) \succ (w', l_j) \succ (\phi, \phi)$ and $(w, l_i) \sim (w', l_i)$. Since players are indifferent as to which word they match upon under match-early preferences, we can simply describe the outcome of a match by its location, i.e., $l_i$ can be used to describe any element of the set $\{(w_1, l_i), (w_2, l_i), \ldots, (w_n, l_i)\}$. We say that a utility function $v_i$ is consistent with match-early preferences if and only if $v_i(l_1) > v_i(l_2) > \cdots > v_i(l_d) > v_i(\phi)$. This preference model captures the fact that players prefer matching with their opponent to not matching, and prefer to match in an earlier location rather than a later one. Players are agnostic as to which word is matched and care only about location.
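Definition 2.3. For rare-words preferences, we require the following preference ordering on outcomes: for all locations $l, l' \in \{1, \ldots, d\}$ and words $w_i, w_j \in U$ with $i > j$ (so that $w_i$ is less frequent than $w_j$), $(w_i, l) \succ (w_j, l') \succ (\phi, \phi)$ and $(w_i, l) \sim (w_i, l')$. Under rare-words preferences, players are indifferent to the location of a match and care only about the word they match upon. Therefore we can simply use a word to denote the outcome, i.e., $w_i$ can be used to describe any element of the set $\{(w_i, l_1), (w_i, l_2), \ldots, (w_i, l_d)\}$. We say that a utility function $v_i$ is consistent with rare-words preferences if and only if $v_i(w_n) > v_i(w_{n-1}) > \cdots > v_i(w_1) > v_i(\phi)$.

Let $\Pr(D_i \mid e_i)$ denote the probability of dictionary $D_i$ given effort level $e_i$. We often write this as $\Pr(D_i)$ and leave the effort level implicit. Given this, we now define the probability of a first match in a particular location when player $i$ knows her own type but has only probabilistic information about the dictionary of the other player: for $l \in L$, $P_l(s_i(D_i), s_{-i}) = \sum_{D_{-i}} \Pr(D_{-i})\, \mathbb{1}[g_l(s_i(D_i), s_{-i}(D_{-i})) = l]$, and we call $P(s_i(D_i), s_{-i}) = (P_1, \ldots, P_d, P_\phi)$ the probability of first match vector.

We use $u_i(s_i(D_i), s_{-i}) = \sum_{D_{-i}} \Pr(D_{-i})\, v_i(g(s_i(D_i), s_{-i}(D_{-i})))$ to denote the expected (interim) utility to player $i$ given dictionary $D_i$, with respect to the distribution on the other player's dictionary induced by that player's effort level. Another way to express the expected interim utility to player $i$ given dictionary $D_i$ (under match-early preferences, where $v_i$ depends only on the location) uses the probability of first match vector defined above: $u_i(s_i(D_i), s_{-i}) = \sum_{l \in L} P_l(s_i(D_i), s_{-i})\, v_i(l)$. Finally, we use $u_i(\sigma_1, \sigma_2) = \sum_{D_i} \Pr(D_i \mid e_i)\, u_i(s_i(D_i), s_{-i})$ to denote the expected (ex-ante) utility to player $i$ before either dictionary is sampled, given complete strategies $\sigma = (\sigma_1, \sigma_2)$.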
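For a tiny universe the probability of first match vector and the interim utility can be computed exactly by enumerating every dictionary the opponent could draw. The sketch below assumes the opponent plays words in decreasing frequency, assumes the same frequency-proportional without-replacement sampling for every draw, and uses 0-based word indices in decreasing-frequency order; all function names and numbers are illustrative.

```python
from itertools import combinations, permutations


def dictionary_probability(freqs, pool, dictionary):
    """Pr(D): sum over draw orders of sequential without-replacement draws from `pool`.

    Assumes each draw is proportional to frequency among the words still in the pool.
    """
    total = 0.0
    for order in permutations(dictionary):
        mass, p = sum(freqs[w] for w in pool), 1.0
        for w in order:
            p *= freqs[w] / mass
            mass -= freqs[w]
        total += p
    return total


def first_match_location(list1, list2):
    """Location max(i, j) of the first match (Definition 2.1), or None if no match."""
    position2 = {w: j for j, w in enumerate(list2, start=1)}
    locations = [max(i, position2[w]) for i, w in enumerate(list1, start=1) if w in position2]
    return min(locations) if locations else None


def first_match_vector(own_list, freqs, opp_pool, d):
    """P_l for l = 1..d and P_phi (key None), exactly, by enumerating opponent dictionaries.

    The opponent samples d words from opp_pool and plays them in decreasing frequency
    (indices are in decreasing-frequency order, so sorting ascending gives that order).
    """
    probs = {l: 0.0 for l in range(1, d + 1)}
    probs[None] = 0.0
    for opp_dict in combinations(opp_pool, d):
        location = first_match_location(own_list, sorted(opp_dict))
        probs[location] += dictionary_probability(freqs, opp_pool, opp_dict)
    return probs


def interim_utility(probs, values):
    """Match-early interim utility: values[l - 1] = v(l_l) for l = 1..d, values[-1] = v(phi)."""
    d = len(values) - 1
    return sum(probs[l] * values[l - 1] for l in range(1, d + 1)) + probs[None] * values[-1]


freqs = [0.4, 0.3, 0.2, 0.1]  # f_1..f_4, decreasing
values = [2.0, 1.0, 0.0]      # v(l_1) > v(l_2) > v(phi), so d = 2
for own in ([0, 1], [1, 0]):  # own dictionary {w_1, w_2}: decreasing vs. increasing order
    probs = first_match_vector(own, freqs, opp_pool=[0, 1, 2], d=2)
    print(own, round(interim_utility(probs, values), 4))
```

In this instance (opponent with $n_L = 3$), playing the own dictionary in decreasing order yields interim utility $113/63 \approx 1.79$ against $76/63 \approx 1.21$ for the reverse order, in line with Theorem 1.1; a single instance is of course no substitute for the proof.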

Equilibrium Framework
In analyzing the equilibria of the ESP game, it is helpful to isolate a restricted game, namely the game induced by a fixed pair of first-stage strategies (i.e., efforts). For a complete strategy profile $(\sigma_1, \sigma_2)$ to be an equilibrium, it is necessary that neither player can usefully deviate to an alternate second-stage strategy. Of course, this is not sufficient to establish an equilibrium of the full game, since a player might still usefully deviate to an alternate effort in combination with an alternate second-stage strategy. Consider, then, the game induced by fixing effort levels $(e_1, e_2)$ for the two players. This is a restricted game, which we refer to as the second-stage game, conditioned on efforts $e_1$ and $e_2$. In this second-stage game, each player knows her own dictionary but not the dictionary of the other player. Given this, we can define some useful equilibrium concepts.

Definition 2.6. Second-stage strategy profile $s^* = (s^*_1, s^*_2)$ is an ex post Nash equilibrium of the second stage of the ESP game conditioned on effort levels $e_1$ and $e_2$ if, for each player $i$, all dictionaries $D_1 \in D_{e_1}$ and $D_2 \in D_{e_2}$, and every second-stage strategy $s_i$, $v_i(g(s^*_i(D_i), s^*_{-i}(D_{-i}))) \ge v_i(g(s_i(D_i), s^*_{-i}(D_{-i})))$. This equilibrium is strict as long as there exists a pair $D_1, D_2$ such that the above inequality is strict.
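Definition 2.6 can be checked by brute force on a tiny universe. The sketch below enumerates every pair of dictionaries of size $d$ drawn from $n$ words (so it covers any effort levels) and every unilateral reordering, and confirms that neither player gains by leaving the increasing-frequency profile of Theorem 1.3 under rare-words preferences. It is a sanity check on small instances, not a proof. Words are 0-based indices in decreasing-frequency order, ties in the first-match location are assumed to be broken in favor of the earlier word in player 1's list, and all names are hypothetical. Because outcomes are compared only through the rarity ranking, the result holds for every utility consistent with rare-words preferences, not just the one used here.

```python
from itertools import combinations, permutations


def first_match_word(list1, list2):
    """Word of the first match (Definition 2.1), or None; ties keep the earlier word in list1."""
    position2 = {w: j for j, w in enumerate(list2, start=1)}
    best_word, best_loc = None, None
    for i, w in enumerate(list1, start=1):
        if w in position2 and (best_loc is None or max(i, position2[w]) < best_loc):
            best_word, best_loc = w, max(i, position2[w])
    return best_word


def rare_words_value(word):
    """A utility consistent with rare-words preferences: indices are 0-based in
    decreasing-frequency order, so a larger index is rarer; no match is worth 0."""
    return 0 if word is None else word + 1


def increasing(dictionary):
    """s^up: play words in order of increasing frequency (rarest first)."""
    return sorted(dictionary, reverse=True)


def increasing_is_ex_post_nash(n, d):
    """Brute-force check: for every pair of dictionaries, no unilateral reordering helps."""
    dictionaries = list(combinations(range(n), d))
    for d1 in dictionaries:
        for d2 in dictionaries:
            base = rare_words_value(first_match_word(increasing(d1), increasing(d2)))
            if any(rare_words_value(first_match_word(list(dev), increasing(d2))) > base
                   for dev in permutations(d1)):
                return False
            if any(rare_words_value(first_match_word(increasing(d1), list(dev))) > base
                   for dev in permutations(d2)):
                return False
    return True


print(increasing_is_ex_post_nash(6, 3))
```

The check succeeds because, when both lists are sorted rarest first, the rarest common word occupies a strictly earlier position in both lists than any other common word, so it is always the first match, and no reordering can produce a match on a rarer word.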
We adopt an analysis approach that establishes a strict ordinal Bayesian-Nash equilibrium$^{3}$, in the sense that we identify strategies that form an equilibrium for all utility functions consistent with match-early preferences.
Definition 2.7. Strategy profile $s^* = (s^*_1, s^*_2)$ is a strict ordinal Bayesian-Nash equilibrium of the second stage of the ESP game conditioned on effort levels $e_1$ and $e_2$ if, for all $u_i$ consistent with match-early preferences, where the probability adopted in interim utility $u_i$ for the distribution on the dictionary of player $-i$ is induced by the effort of that player in the first stage.
There is an identical definition of ordinal Bayesian-Nash equilibrium for the second stage of the ESP game under rare-words preferences. We also define ordinal Bayesian-Nash equilibrium for the entire game.
Definition 2.8. A strategy profile $\sigma^* = (\sigma^*_1, \sigma^*_2) \in \Sigma_1 \times \Sigma_2$ is a strict ordinal Bayesian-Nash equilibrium of the ESP game if for every $u_i$ consistent with match-early preferences, we have

Likewise, there is an identical definition of ordinal Bayesian-Nash equilibrium for the complete ESP game under rare-words preferences. Since the effort level chosen by each player is not visible to the other player, there is no need for a subgame perfect refinement.
Next we define the notion of stochastic dominance for a general utility function. Our definition uses the following notation. Suppose that $u(s_i, s_{-i}) = \sum_{k=1}^{m} P_k(s_i, s_{-i})\, v(o_k)$, where the outcomes are ordered so that $o_1 \succ o_2 \succ \cdots \succ o_m$ and $P_k(s_i, s_{-i})$ is the probability of outcome $o_k$. We say that the strategies $s_i$ and $s_{-i}$ induce the probability vector $P(s_i, s_{-i}) = (P_1(s_i, s_{-i}), \ldots, P_m(s_i, s_{-i}))$ on outcomes.

Definition 2.9. Strategy $s_i$ stochastically dominates $s'_i$ with respect to opponent strategy $s_{-i}$ and outcome ordering $o_1, o_2, \ldots, o_m$ if and only if $\sum_{j=1}^{k} P_j(s_i, s_{-i}) \ge \sum_{j=1}^{k} P_j(s'_i, s_{-i})$ for all $k \in \{1, \ldots, m\}$. We say that the stochastic dominance is strict if there exists a $k$ such that $\sum_{j=1}^{k} P_j(s_i, s_{-i}) > \sum_{j=1}^{k} P_j(s'_i, s_{-i})$.

The following theorem relates our definition of stochastic dominance to ordinal Bayesian-Nash equilibrium. We omit its proof, which is standard in the stochastic dominance literature [Hanoch and Levy 1969]. The literature on ordinal Bayesian incentive compatibility for representation of committees, stable matchings, etc. likewise uses an analogous definition of stochastic dominance to establish that truth-telling is an ordinal Bayesian-Nash equilibrium [d'Aspremont and Peleg 1988; Majumdar 2003].
THEOREM 2.10. Strategy $s_i$ strictly stochastically dominates $s'_i$ with respect to opponent strategy $s_{-i}$ if and only if $u(s_i, s_{-i}) > u(s'_i, s_{-i})$ for every utility function $v$ consistent with the ordering $o_1 \succ o_2 \succ \cdots \succ o_m$.

This means that we can use the stochastic dominance condition to establish ordinal Bayesian-Nash equilibrium for the second stage of the ESP game and for the complete ESP game. We define stochastic dominance more specifically for the second-stage game and the complete ESP game under each preference model as needed.
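The cumulative-sum test behind first-order stochastic dominance is easy to state in code. This minimal sketch assumes outcome-probability vectors are listed from the most preferred outcome $o_1$ to the least preferred $o_m$; the vectors and utility values are hypothetical, and checking a handful of utilities illustrates Theorem 2.10 rather than proving it.

```python
from itertools import accumulate


def stochastically_dominates(p, q, tol=1e-12):
    """Return (dominates, strict) for outcome-probability vectors ordered best (o_1) to worst (o_m).

    p dominates q iff every cumulative sum of p is >= the matching cumulative sum of q;
    the dominance is strict if some cumulative sum is strictly larger.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    gaps = [a - b for a, b in zip(accumulate(p), accumulate(q))]
    dominates = all(g >= -tol for g in gaps)
    return dominates, dominates and any(g > tol for g in gaps)


def expected_utility(p, values):
    """sum_k P_k * v(o_k) for a utility vector listed in the same best-to-worst order."""
    return sum(pk * vk for pk, vk in zip(p, values))


# Hypothetical outcome distributions over (l_1, l_2, phi) for two second-stage strategies
p = [50 / 63, 13 / 63, 0.0]
q = [13 / 63, 50 / 63, 0.0]
print(stochastically_dominates(p, q))
for values in ([2.0, 1.0, 0.0], [10.0, 1.0, 0.0], [1.0, 0.9, 0.0]):
    print(values, expected_utility(p, values) > expected_utility(q, values))
```

Because the test uses only cumulative probabilities, it never needs concrete utility numbers, which is exactly what makes it suitable for establishing ordinal equilibria.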

Remarks about the Model
We model the ESP game with each player sampling words from a universe of possible words associated with the image, to which we associate a frequency ordering. Players can vary their effort level, which determines how likely they are to sample frequent words as opposed to infrequent words. Players then decide in which order to play their sampled words in the game.
We capture the fact that there are 15 images in a set, to be labeled in a limited amount of time, through match-early preferences. We do not model a sequential decision-making process in which users choose an effort level before sampling and entering each successive word. We omit this because there seems to be little inference a player can make about the strategy of the other player from the limited information revealed: all a player learns is that no match has occurred. This provides little evidence, for example, to discriminate between a player playing frequent words and a player playing rare words. Rather, it seems more likely that strategy updates occur after a successful match, when a player learns which word produced the match. In addition, the time available per image is small: 2.5 minutes for 15 images, or 10 seconds per image on average. Thus it seems unlikely that users update their strategy during play on a particular image. It would be interesting to analyze data from the ESP game empirically to examine whether, and if so when, strategy updates occur.
The universe of words for an image models the set of words that are in some way relevant to the image, and represents the knowledge that the game designer is trying to learn. Each of these words is relevant to the image at hand. For example, for an image of a Victorian house, the two labels "building" and "Victorian house" are both relevant, while one is more descriptive than the other [Weber et al. 2008].
The decision of a player is modeled with a "first stage" choice of effort level and a "second stage" choice of permutation on a sampled dictionary.Note that the use of "stage" does not imply an observable action after the choice of effort level.What we refer to as the "second stage" is merely an effort-constrained game, induced by a pair of effort levels chosen by the players.We often use "second stage" and "effort-constrained game" interchangeably.Given the equilibrium analysis in the effort-constrained game, we analyze the complete game by fixing the strategies determined in the effort-constrained game analysis and examining the choice of effort level in the first stage.We refer to this analysis as the "complete game" analysis.
In many cases, the results in this paper generalize to any number of effort levels, but we describe all results using two effort levels for simplicity. For the results to generalize to any number of effort levels, the mappings in sections 3.2 and 4.2 need to satisfy the following properties: (1) the lower effort level's universe is a strict subset of the higher effort level's universe; and (2) the lower effort level's universe contains the highest-frequency words of the higher effort level's universe. We discuss the generalization of each result as it is introduced.
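The two conditions can be checked mechanically for a proposed chain of universes. The following is a minimal sketch, assuming effort levels are listed from lowest to highest, that word frequencies are distinct, and that condition (2) means the lower universe consists exactly of the most frequent words of the next higher one; the function name and the toy vocabulary are hypothetical.

```python
def nested_effort_levels_ok(universes, freq):
    """Check the generalization conditions for universes ordered lowest to highest effort.

    (1) each lower universe is a strict subset of the next higher one;
    (2) each lower universe is made of the highest-frequency words of the next higher one.
    Assumes distinct frequencies, so "highest-frequency words" is well defined.
    """
    for lower, higher in zip(universes, universes[1:]):
        lower, higher = set(lower), set(higher)
        if not lower < higher:  # strict subset test
            return False
        top = sorted(higher, key=lambda w: freq[w], reverse=True)[:len(lower)]
        if set(top) != lower:
            return False
    return True


# Hypothetical frequencies; larger means more common.
freq = {"house": 50, "building": 30, "roof": 10, "victorian": 5, "gable": 1}
levels = [["house", "building"], ["house", "building", "roof"], list(freq)]
print(nested_effort_levels_ok(levels, freq))  # True
```

Checking only consecutive pairs suffices, since both strict inclusion and "consists of the top words of" compose transitively.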
Additionally, in many cases only the total ordering of the words in the universe $U$ by frequency, and not necessarily the exact frequency distribution, needs to be consistent across players. Whenever a result is stated to hold for any distribution over $U$ without imposing a specific distribution, it also holds when players share only the total ordering of words by frequency. Finally, the model can be generalized to the case where each player has a different universe of words. A player would then have no reason to enter a word that is not in the other player's universe, so the results would hold for the universe given by the intersection of both players' sets of words.
In sampling words, it seems reasonable to model the sampling process for a given image according to the distribution induced on the image-relevant universe by the frequencies of words in the English language, because retrieving less frequently used words requires more cognitive effort. Likewise, the English language is coded efficiently: the more frequent, common words are generally shorter, whereas the less frequent, more descriptive words are generally longer (and thus take more effort to type). We establish that low effort is an equilibrium under match-early preferences even without associating a differential cost with a user's effort; such a cost, increasing with effort, would only strengthen the preference for low effort.
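One way to make the sampling assumption concrete is sketched below. It assumes, as an illustration only, that a dictionary of $d$ distinct words is drawn one word at a time without replacement, each draw proportional to the word's frequency; the function name and the toy frequencies are hypothetical, and $d$ must not exceed the size of the universe.

```python
import random


def sample_dictionary(freq, d, rng):
    """Draw a dictionary of d distinct words, one word at a time, each draw
    proportional to frequency (assumed model: sampling without replacement)."""
    remaining = dict(freq)
    dictionary = []
    for _ in range(d):
        words = list(remaining)
        chosen = rng.choices(words, weights=[remaining[w] for w in words], k=1)[0]
        dictionary.append(chosen)
        del remaining[chosen]  # a word cannot be sampled twice
    return dictionary


# Hypothetical frequencies for an image-relevant universe: common words weigh more.
freq = {"house": 50, "building": 30, "roof": 10, "victorian": 5, "gable": 1}
print(sample_dictionary(freq, 3, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps draws reproducible, which is convenient when comparing strategies on the same sampled dictionaries.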
Match-early preferences model a player who prefers to match sooner rather than later on an image, due to the time constraint on all 15 images in the ESP game. The actual implementation of the ESP game assigns players the same number of points whenever they match, regardless of where the match occurs (e.g., how many words they enter before matching) and regardless of the word on which the match occurs. Even so, players are under a time constraint and should prefer to match sooner rather than later, in order to match on as many images as possible in the allotted time. We adopt an ordinal model of preferences so that we do not have to quantify exactly how much players prefer to match sooner. Rare-words preferences model a player who prefers to match on rarer words rather than more frequent words, and who is indifferent to the location at which the match occurs, presumably because users would be given more time for a set of images, so that time is no longer a key constraint.
We restrict our attention to strategies that involve playing all words in the dictionary, since any strategy that does not play all words is weakly dominated by one that does. Moreover, we look for equilibria of the second stage of the ESP game in consistent strategies, which are strategies that do not change the relative ordering of elements depending on the player's realized dictionary. In other words, a consistent second-stage strategy specifies a total ordering of the elements of $U_e$ (after choosing an effort level $e$) and applies that total ordering to the realized dictionary. We do not restrict agents to playing only consistent strategies; rather, we identify equilibria in which a player does not wish to deviate to an inconsistent strategy. A consistent strategy $s$ specifies a total ordering on the set $U_e$: $w'_1 \succ w'_2 \succ \cdots \succ w'_{|U_e|}$, where $w'_i$ is not necessarily the same as $w_i$, the $i$th most frequent word. In fact, $w'_i = w_i$ for all $i$ if and only if $s = s^{\downarrow}$, where $s^{\downarrow}$ is the strategy in which a player plays her words in order of decreasing frequency. Equilibria in consistent strategies seem natural because of their simplicity: they require that word $x$ is played before word $y$ independently of whether any other word $z$ is present in a player's dictionary.
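A consistent strategy is simply one fixed total order restricted to whatever dictionary is realized. The sketch below illustrates this, together with the order used by $s^{\downarrow}$; the helper names and the vocabulary are hypothetical.

```python
def consistent_play(total_order, dictionary):
    """A consistent strategy: restrict one fixed total order on U_e to the sampled dictionary."""
    members = set(dictionary)
    return [w for w in total_order if w in members]


def decreasing_frequency_order(freq):
    """The total order underlying s-down: most frequent word first."""
    return sorted(freq, key=lambda w: freq[w], reverse=True)


freq = {"house": 50, "building": 30, "roof": 10, "victorian": 5}
s_down = decreasing_frequency_order(freq)
print(consistent_play(s_down, {"victorian", "house", "roof"}))
# ['house', 'roof', 'victorian']
```

Because the relative order of any two words comes from the same `total_order`, whether "roof" precedes "victorian" never depends on which other words were sampled.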

EQUILIBRIUM ANALYSIS UNDER MATCH-EARLY PREFERENCES
In this section, we analyze equilibrium behavior under match-early preferences. We show that playing words in order of decreasing frequency in conjunction with low effort is an ordinal Bayesian-Nash equilibrium of the ESP game. All omitted proofs can be found in the Appendix.

Equilibrium Analysis of the Effort-Constrained Game
First, we see that playing words in order of decreasing frequency is not an ex-post Nash equilibrium of the second-stage game.
PROPOSITION 3.1. Suppose $e = \min(e_1, e_2)$, $U_e = \{w_1, w_2, w_3\}$, and $d = 2$. The second-stage strategy profile $s = (s^{\downarrow}_1, s^{\downarrow}_2)$ is not an ex-post Nash equilibrium.
PROOF. Suppose $D_2 = \{w_2, w_3\}$ and $D_1 = \{w_1, w_2\}$. Then $s_2(D_2)$ dictates that player 2 plays $w_2$ followed by $w_3$, while $s^{\downarrow}_1$ has player 1 play $w_1$ followed by $w_2$. If player 1 deviates from $s^{\downarrow}_1$ and plays $w_2$ followed by $w_1$, the match on $w_2$ occurs earlier, so player 1 gets higher utility.
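The counterexample can be replayed in a few lines. This sketch rests on a timing model that the text does not spell out and that we assume here for illustration: both players enter one word per step, so a common word $w$ matches at step $\max(\text{pos}_1(w), \text{pos}_2(w))$ and the earliest such step is the outcome. The helper `match_outcome` is hypothetical.

```python
def match_outcome(order1, order2):
    """Return (location, word) of the first match, or None if no word is shared.

    Assumed timing model: both players enter one word per step, so a common
    word w matches at step max(pos1(w), pos2(w)). Ties between words at the
    same step go to the word appearing earliest in player 1's ordering.
    """
    pos1 = {w: i for i, w in enumerate(order1, start=1)}
    pos2 = {w: i for i, w in enumerate(order2, start=1)}
    common = [w for w in order1 if w in pos2]
    if not common:
        return None
    word = min(common, key=lambda w: max(pos1[w], pos2[w]))
    return max(pos1[word], pos2[word]), word


# Proposition 3.1, with w1 more frequent than w2, which is more frequent than w3; d = 2.
s_down_1 = ["w1", "w2"]   # player 1 plays D1 = {w1, w2} in decreasing frequency
s_down_2 = ["w2", "w3"]   # player 2 plays D2 = {w2, w3} in decreasing frequency
deviation = ["w2", "w1"]
print(match_outcome(s_down_1, s_down_2))   # (2, 'w2')
print(match_outcome(deviation, s_down_2))  # (1, 'w2')
```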
Since playing words in order of decreasing frequency is not an ex-post Nash equilibrium, we focus instead on establishing an ordinal Bayesian-Nash equilibrium via stochastic dominance. We define stochastic dominance for the ordering on outcomes under match-early preferences.
Definition 3.2. Fixing effort levels $e_1$ and $e_2$, the opponent's second-stage strategy $s_2$, and dictionary $D_1$, we say that the second-level ordering $s_1(D_1)$ stochastically dominates the second-level ordering $s'_1(D_1)$ with respect to match-early preferences if and only if
$$\sum_{j=1}^{i} p(l_j, s_1(D_1), s_2) \ge \sum_{j=1}^{i} p(l_j, s'_1(D_1), s_2) \quad \text{for all } i,$$
where $p(l_j, s_1(D_1), s_2)$ is the probability that the match occurs at location $l_j$.
Definition 3.2 gives the notion of stochastic dominance for an ordering $s_1(D_1)$. We say that a second-level strategy $s_1$ stochastically dominates another second-level strategy $s'_1$ if and only if $s_1(D_1)$ stochastically dominates $s'_1(D_1)$ for all $D_1 \in \mathcal{D}_1$.
In what follows, we show that "playing decreasing frequency" is a strict ordinal Bayesian-Nash equilibrium of the second-stage ESP game for any pair of effort levels $e_1, e_2$ and any distribution over $U$. Moreover, we show that this equilibrium is one of the few ordinal Bayesian-Nash equilibria that hold for every distribution over $U$. We obtain a characterization result showing that the strategy profiles that are ordinal Bayesian-Nash equilibria of the second-stage game for every distribution satisfy an "almost decreasing" property. The crux of the argument is to establish stochastic dominance.
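The dominance test is a comparison of cumulative match probabilities. A minimal sketch, assuming each ordering has already been summarized by its vector of per-location match probabilities (any leftover mass meaning "no match"); the function names and example vectors are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate


def stochastically_dominates(p, q):
    """p[j], q[j]: probability of matching at location j + 1.
    p dominates q if Pr(match by location i) under p >= under q for every i."""
    if len(p) != len(q):
        raise ValueError("distributions must cover the same locations")
    return all(a >= b for a, b in zip(accumulate(p), accumulate(q)))


def strictly_dominates(p, q):
    """Dominance with at least one strictly larger cumulative probability."""
    return stochastically_dominates(p, q) and any(
        a > b for a, b in zip(accumulate(p), accumulate(q))
    )


early = [Fraction(1, 2), Fraction(1, 4), Fraction(0)]   # matches tend to come early
late = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]  # same total match mass, spread later
print(strictly_dominates(early, late), stochastically_dominates(late, early))  # True False
```

Exact `Fraction` arithmetic avoids spurious ties or violations from floating-point rounding when cumulative sums are compared.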
Algorithm 1 describes a possible strategy for player 1 in terms of player 2's second-level strategy $s_2$. It takes as input any sampled dictionary $D_1$, the distribution over $U$, the opponent's effort level $e_2$, and player 2's second-level strategy $s_2$, and it outputs an ordering on the words in $D_1$. To obtain a completely specified strategy for player 1 from Algorithm 1, we run the algorithm for all possible dictionaries $D_1 \in \mathcal{D}_1$.
Algorithm 1 implicitly takes into account the effort level of player 2. If player 2 plays a lower effort level than player 1, player 1 plays the words in $D_1 \cap U_{e_2}$ followed by any words in $D_1$ that are not in $U_{e_2}$ (these are higher-effort words that player 2 cannot have sampled). Likewise, if player 2 plays a higher effort level than player 1, the algorithm still computes a feasible output ordering for player 1: the higher-effort words that player 2 may have are not in player 1's sampled dictionary, so player 1 simply cannot play them.
We say that the output of Algorithm 1 with respect to dictionary $D$ is in agreement with $s_2$ if, for all pairs of words $w_i, w_j \in D$, Algorithm 1 specifies playing $w_i$ before $w_j$ if and only if $w_i \succ w_j$ in $s_2$. Recall that we look for equilibria in consistent strategies, so $s_2$ is associated with a well-defined ordering.
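ALGORITHM 1: Candidate Best Response for Player 1
input: sampled dictionary $D_1$; opponent strategy $\sigma_2 = (e_2, s_2)$
maintain an ordered list $s_1(D_1) = \emptyset$
for $i = 1$ to $d$ do
    add to the end of $s_1(D_1)$ the remaining word $w \in D_1 \setminus s_1(D_1)$ that is most likely to appear among the top $i$ words of player 2, i.e., that maximizes $\Pr(w \in l_{\le i}(s_2(D_2)))$
end
output: $s_1(D_1)$
This algorithm does not always output an ordering that stochastically dominates all other orderings in the sense of Definition 3.2. However, whenever it fails to produce such an output, we show that no such ordering exists.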
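Algorithm 1 can be sketched in Python with exact probabilities on a small universe. The sketch assumes, for illustration, that player 2's dictionary is drawn word by word without replacement with probability proportional to integer frequencies (this choice of universe is how player 2's effort level enters), and it reads the greedy step as "append the remaining word most likely to be among player 2's top $i$ words", as in the proof of Lemma 3.6. All helper names are hypothetical, and ties go to the word listed first in `D1`.

```python
from fractions import Fraction


def dictionary_distribution(universe, freq, d):
    """Exact distribution over d-word dictionaries (frozensets), assuming words are
    drawn one at a time without replacement, proportional to integer frequencies."""
    dist = {}

    def draw(drawn, prob):
        if len(drawn) == d:
            key = frozenset(drawn)
            dist[key] = dist.get(key, Fraction(0)) + prob
            return
        remaining = [w for w in universe if w not in drawn]
        total = sum(freq[w] for w in remaining)
        for w in remaining:
            draw(drawn + [w], prob * Fraction(freq[w], total))

    draw([], Fraction(1))
    return dist


def play(order, dictionary):
    """Consistent strategy: restrict a total order to the words in the dictionary."""
    return [w for w in order if w in dictionary]


def prob_in_top(word, k, s2_order, dist2):
    """Pr(w in l_{<=k}(s2(D2))): word is among player 2's k highest-priority words."""
    return sum((p for D2, p in dist2.items() if word in play(s2_order, D2)[:k]), Fraction(0))


def algorithm1(D1, s2_order, dist2):
    """Greedy candidate best response: at step i, append the remaining word most
    likely to be among player 2's top i words."""
    output = []
    for i in range(1, len(D1) + 1):
        remaining = [w for w in D1 if w not in output]
        output.append(max(remaining, key=lambda w: prob_in_top(w, i, s2_order, dist2)))
    return output


freq = {"a": 3, "b": 2, "c": 1}  # hypothetical: a is the most frequent word
dist2 = dictionary_distribution(list(freq), freq, 2)
print(algorithm1(["c", "b"], ["a", "b", "c"], dist2))  # ['b', 'c']
```

In this toy instance player 2 plays decreasing frequency, and the greedy output agrees with that ordering on every dictionary, which is the behavior Lemma 3.8 describes.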
The following definitions are useful in characterizing the output of Algorithm 1. Note that the set $\{w_1, \ldots, w_n\}$ is ordered according to $s_2$, i.e., $s_2$ specifies the total order $w_1 \succ w_2 \succ \cdots \succ w_n$. We write $w_i \in l_k(s_2(D_2))$ to mean that $w_i$ is the $k$th highest-priority word in dictionary $D_2$ when $s_2$ acts on $D_2$. Similarly, $w_i \in l_{\le k}(s_2(D_2))$ means that $w_i$ is among the $k$ highest-priority words of $D_2$.
Definition 3.3. We say that second-stage strategy $s_2$ satisfies the preservation condition for a particular distribution if, for a fixed effort level of player 2 and for every pair $w_i, w_j$ with $i < j$, we have $\Pr(w_i \in l_{\le k}(s_2(D_2))) \ge \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $k$.
Definition 3.4. We say that $s_2$ satisfies the strong condition for a particular distribution if, for a fixed effort level of player 2 and for every pair $w_i, w_j$ with $i < j - 1$, we have $\Pr(w_i \in D_2) > \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $k$ with $i < k < j$.
In an almost decreasing strategy, the first $n - 1$ words of $s_2$ are sorted in order of decreasing frequency, but the last word is not necessarily the least frequent word of $U$. Therefore, a total of $n$ strategies satisfy this property, one for each choice of the last word. We use the term almost decreasing strategy profile to describe a symmetric strategy profile $(s, s)$, where $s$ is an almost decreasing strategy.
Definition 3.5. We say that a consistent strategy $s_2$ satisfies the almost decreasing property if and only if $f(w_i) > f(w_j)$ for all $1 \le i < j \le n - 1$.
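To make Definition 3.5 and the count of $n$ almost decreasing strategies concrete, the following sketch enumerates every total order on a small hypothetical universe with distinct frequencies and keeps those whose first $n - 1$ words are in strictly decreasing frequency. The function name is hypothetical.

```python
from itertools import permutations


def is_almost_decreasing(order, freq):
    """Definition 3.5: f(w_i) > f(w_j) for all 1 <= i < j <= n - 1, i.e. every word
    except possibly the last one appears in strictly decreasing frequency."""
    head = order[:-1]
    return all(freq[a] > freq[b] for a, b in zip(head, head[1:]))


freq = {"a": 4, "b": 3, "c": 2, "d": 1}  # hypothetical distinct frequencies
almost = [p for p in permutations(freq) if is_almost_decreasing(p, freq)]
print(len(almost))  # 4, one per choice of the final word
```

Once the final word is chosen, the remaining $n - 1$ words have exactly one strictly decreasing arrangement, which is why the count equals $n$.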
LEMMA 3.6. If Algorithm 1 outputs an ordering that does not stochastically dominate all other orderings, with respect to $D_1$ and a fixed opponent strategy $\sigma_2$, then no such ordering exists.
PROOF. Let the output of Algorithm 1 be $s(D_1)$. Suppose there exists another ordering $s'(D_1) \neq s(D_1)$ that stochastically dominates all other orderings. Let $l_i$ be the first coordinate in which $p(s(D_1), s_2)$ and $p(s'(D_1), s_2)$ differ. It must be the case that $p(l_i, s(D_1), s_2) \ge p(l_i, s'(D_1), s_2)$, since Algorithm 1 outputs the word (among the remaining words) that is most likely to appear in the top $i$ words of player 2. Therefore, $\sum_{j=1}^{i} p(l_j, s(D_1), s_2) \ge \sum_{j=1}^{i} p(l_j, s'(D_1), s_2)$.
Recall from Theorem 2.10 that stochastic dominance, as defined in Definition 3.2, is a necessary condition for utility maximization under all utilities consistent with match-early preferences.
The following lemma gives sufficient conditions on the strategy of player 2 under which Algorithm 1 always outputs an ordering in agreement with $s_2$, and this ordering stochastically dominates all other orderings.
LEMMA 3.7. If second-stage strategy $s_2$ satisfies the preservation condition for a particular distribution, then Algorithm 1 always outputs an ordering in agreement with $s_2$, for any sampled dictionary. Moreover, if $s_2$ satisfies the strong condition for this distribution, then always playing an ordering in agreement with $s_2$ strictly stochastically dominates all other strategies, for any sampled dictionary.
PROOF. Since $s_2$ satisfies the preservation condition, for every $w_i, w_j \in D$ with $w_i \succ w_j$ we have $\Pr(w_i \in l_{\le k}(s_2(D_2))) \ge \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $1 \le k \le i$, so $w_j$ cannot be output before $w_i$ by Algorithm 1, for every $D$. Since this is true for all $w_i, w_j \in D$ with $w_i \succ w_j$, Algorithm 1 must output an ordering in agreement with $s_2$. The strong condition tells us that, for every $D_1 = \{w_1, w_2, \ldots, w_d\}$ with $w_1 \succ w_2 \succ \cdots \succ w_d$ under $s_2$, $\Pr(w_i \in D_2) > \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $i, j, k$ with $i < k < j$. Thus $w_1, w_2, \ldots, w_i$ are the words that strictly maximize $\sum_{j=1}^{i} p(l_j, s_1(D_1), s_2)$ for every $i$, and stochastic dominance is satisfied.
LEMMA 3.8. If a consistent strategy $s_2$ satisfies the almost decreasing property, then the strategy profile $(s_2, s_2)$ is a strict ordinal Bayesian-Nash equilibrium of the second-stage ESP game under match-early preferences, for every choice of effort levels $e_1$ and $e_2$ and every distribution over $U$.
LEMMA 3.9. The symmetric strategy profile $(s_2, s_2)$ is a strict ordinal Bayesian-Nash equilibrium of the second-stage ESP game under match-early preferences, for every $s_2$ and every $e_1 = e_2$, under the uniform distribution over $U$.
PROOF. Since the distribution over $U$ is uniform, $\Pr(w_i \in D_2) = \Pr(w_j \in D_2)$ for all $w_i, w_j$ with $w_i \succ w_j$ in $s_2$. Because $w_i$ precedes $w_j$ in $s_2$, this gives $\Pr(w_i \in l_{\le k}(s_2(D_2))) \ge \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $k$. Thus $s_2$ satisfies the preservation condition for every $s_2$, and from Lemma 3.7, $s_2$ is a best response to $s_2$. Finally, for every $s_2$, $\Pr(w_i \in D) = \Pr(w_j \in D)$ for all $i, j$ with $j > i$. Likewise, $\Pr(w_j \in D) > \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $j, k$ with $j > k$. This gives $\Pr(w_i \in D) = \Pr(w_j \in D) > \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $i, j, k$ with $i < k < j$; in other words, the strong condition is satisfied. Therefore Lemma 3.7, together with Theorem 2.10, gives the desired result.
This lemma tells us that a strategy profile $(s_1, s_2)$ with $s_1 \neq s_2$ cannot be an ordinal Bayesian-Nash equilibrium under the uniform distribution: under the uniform distribution over $U$, for every sampled dictionary, $s_2$ generates a distribution on outcomes that strictly stochastically dominates that of $s_1 \neq s_2$. Therefore, for every utility function consistent with match-early preferences, player 1 would prefer to deviate from $s_1$ to $s_2$. Hence, if a strategy profile is an ordinal Bayesian-Nash equilibrium for all distributions over $U$, it must be symmetric.
The statement of Lemma 3.9 can easily be generalized to the case where players play different effort levels, still under the uniform distribution over the words in $U$. If player 1 plays a lower effort level than player 2, $(s'_2, s_2)$ is a strict Bayesian-Nash equilibrium for every $s_2$, where $s_2$ is a total ordering on the set $U_{e_2}$ and $s'_2$ is $s_2$ with all the words in the set $U_{e_2} - U_{e_1}$ removed. Likewise, if player 2 plays a lower effort level than player 1, $(s'_2, s_2)$ is a strict Bayesian-Nash equilibrium for every $s_2$, where $s_2$ is a total ordering on the set $U_{e_2}$ and $s'_2$ is $s_2$ with all the words in the set $U_{e_1} - U_{e_2}$ concatenated to the end, i.e., all words in $U_{e_1} - U_{e_2}$ have lower priority than all words in $U_{e_2}$ under $s'_2$.
Lemma 3.10 tells us that for every symmetric strategy profile except the almost decreasing strategy profiles, there exists a distribution under which that profile is not an ordinal Bayesian-Nash equilibrium. Lemma 3.9 rules out any asymmetric strategy profile as an ordinal Bayesian-Nash equilibrium that holds for every distribution over $U$, and Lemma 3.10 rules out any symmetric strategy profile other than the almost decreasing ones. Therefore, Lemmas 3.9 and 3.10 can be used to establish Theorem 1.1, which shows that the almost decreasing strategy profiles are the only strategy profiles that are ordinal Bayesian-Nash equilibria for every distribution over $U$.
LEMMA 3.10. For every $(e_2, s_2)$ (except for $s_2$ that are almost decreasing), there exist a distribution over $U$, an effort level $e_1$, and a dictionary for which Algorithm 1 does not output an ordering in agreement with $s_2$.
THEOREM 1.1. The second-stage strategy profile $(s^{\downarrow}_1, s^{\downarrow}_2)$ is a strict ordinal Bayesian-Nash equilibrium of the second-stage ESP game for every distribution over $U$ and every choice of effort levels $e_1, e_2$. Moreover, among strategy profiles in which at least one player plays a consistent strategy, the almost decreasing strategy profiles are the only ones that can be an ordinal Bayesian-Nash equilibrium for every distribution over $U$ and every choice of effort levels $e_1, e_2$.
PROOF. Lemma 3.8 tells us that Algorithm 1 always outputs a strategy in agreement with $s^{\downarrow}_2$ if player 2 plays $s^{\downarrow}_2$, regardless of $D_1$ and the distribution over $U$; furthermore, this strategy stochastically dominates all other strategies for every distribution over $U$. Lemma 3.9 tells us that there exists a distribution, namely the uniform distribution, for which Algorithm 1 outputs an ordering in agreement with $s_2$, regardless of the dictionary $D_1$, for every consistent $s_2$; moreover, this strategy stochastically dominates all others under the uniform distribution. Lemma 3.10 tells us that for every $s_2$ that is not almost decreasing, there exist a distribution $F(U)$ and a dictionary $D_1$ for which Algorithm 1 outputs an ordering that is not in agreement with $s_2$. Either this ordering stochastically dominates all others for $F(U)$, or it does not. In the former case, we have exhibited two distributions with two different strategies that stochastically dominate all others. In the latter case, Lemma 3.6 tells us that no strategy stochastically dominates all others for the distribution $F(U)$. Therefore, when player 2 plays an $s_2$ that is not almost decreasing, there is no single strategy for player 1 that stochastically dominates all others for all distributions over $U$ and every utility function that satisfies match-early preferences.
Definition 3.11. We say that the distribution on words in the universe is Zipfian if and only if $f(w_i) = 1/i^s$ for some $s > 0$.
LEMMA 3.12. If there exists an … , then $(s^{\uparrow}_1, s^{\uparrow}_2)$ cannot be a Bayesian-Nash equilibrium for any utility function satisfying match-early preferences.
PROPOSITION 3.13. The second-stage strategy profile $(s^{\uparrow}_1, s^{\uparrow}_2)$ cannot be a Bayesian-Nash equilibrium of the second stage of the ESP game for any Zipfian distribution over $U$ with $s \ge 1$ and for any utility function satisfying match-early preferences.
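Definition 3.11 translates directly into code. The sketch below assumes words are listed from most to least frequent and returns unnormalized Zipfian weights $1/i^s$; the function name is hypothetical.

```python
def zipf_frequencies(words, s):
    """Definition 3.11: the i-th word (1-indexed, most frequent first) gets
    frequency f(w_i) = 1 / i**s, for an exponent s > 0."""
    if s <= 0:
        raise ValueError("s must be positive")
    return {w: 1 / i ** s for i, w in enumerate(words, start=1)}


print(zipf_frequencies(["w1", "w2", "w3", "w4"], s=1.0))
# {'w1': 1.0, 'w2': 0.5, 'w3': 0.3333333333333333, 'w4': 0.25}
```

Larger $s$ concentrates mass on the first few words; Proposition 3.13 concerns the regime $s \ge 1$.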

Equilibrium Analysis of the Complete Game
In the results that follow, we show that playing $L$ at the top level together with playing words in order of decreasing frequency is a strict ordinal Bayesian-Nash equilibrium for all distributions over $U$ except the uniform distribution. For the uniform distribution, $((L, s^{\downarrow}_1), (L, s^{\downarrow}_2))$ is a weak ordinal Bayesian-Nash equilibrium. To show this, we first specify carefully what it means for one strategy to stochastically dominate another at the top level of the game, with the equilibrium strategy for the bottom level fixed. This definition uses the following notation for the $k$-truncation of a dictionary $D$: $D(k)$ is the set of the $k$ highest-frequency words in $D$.
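The truncation $D(k)$ is what links top-level dominance to matching early. Under an assumed timing model (both players enter one word per step, so a common word matches at the later of its two positions), a match by step $k$ under decreasing-frequency play happens exactly when $D_1(k)$ and $D_2(k)$ intersect. The sketch below checks this by brute force on a hypothetical five-word universe; all helper names are illustrative.

```python
from itertools import combinations


def truncate(dictionary, k, freq):
    """D(k): the set of the k highest-frequency words in the dictionary."""
    return set(sorted(dictionary, key=lambda w: freq[w], reverse=True)[:k])


def decreasing(dictionary, freq):
    """Play order under decreasing frequency."""
    return sorted(dictionary, key=lambda w: freq[w], reverse=True)


def match_location(order1, order2):
    """Step of the first match, or None. Assumed timing model: a common word
    matches at step max(pos1, pos2)."""
    pos1 = {w: i for i, w in enumerate(order1, start=1)}
    pos2 = {w: i for i, w in enumerate(order2, start=1)}
    steps = [max(pos1[w], pos2[w]) for w in pos1 if w in pos2]
    return min(steps) if steps else None


def truncation_matches_timing(freq, d):
    """Under decreasing play: match by step k  <=>  D1(k) and D2(k) intersect."""
    for D1 in combinations(freq, d):
        for D2 in combinations(freq, d):
            loc = match_location(decreasing(D1, freq), decreasing(D2, freq))
            for k in range(1, d + 1):
                early = loc is not None and loc <= k
                if early != bool(truncate(D1, k, freq) & truncate(D2, k, freq)):
                    return False
    return True


freq = {"a": 5, "b": 4, "c": 3, "x": 2, "y": 1}
print(sorted(truncate({"c", "a", "y"}, 2, freq)))  # ['a', 'c']
print(truncation_matches_timing(freq, 3))         # True
```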
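Definition 3.14. Fixing player 2's strategy $(e_2, s_2)$, we say that strategy $(e_1, s_1)$ for player 1 stochastically dominates strategy $(e'_1, s'_1)$ for player 1, with respect to the outcome ordering of match-early preferences, if and only if, for all $k$,
$$\sum_{D_{1,e_1}} \sum_{D_{2,e_2}} \Pr(D_{1,e_1}) \Pr(D_{2,e_2})\, I\big(o(s_1(D_{1,e_1}(k)), s_2(D_{2,e_2}(k))) \text{ is a match}\big) \ge \sum_{D_{1,e'_1}} \sum_{D_{2,e_2}} \Pr(D_{1,e'_1}) \Pr(D_{2,e_2})\, I\big(o(s'_1(D_{1,e'_1}(k)), s_2(D_{2,e_2}(k))) \text{ is a match}\big),$$
where $o(s_1(D_{1,e_1}(k)), s_2(D_{2,e_2}(k)))$ gives the outcome when second-stage strategies $s_1$ and $s_2$ act on the truncated dictionaries $D_{1,e_1}(k)$ and $D_{2,e_2}(k)$, and $I(\cdot)$ is the indicator function. We say the stochastic dominance is strict if there exists a $k$ such that the above inequality is strict.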
To establish stochastic dominance, we construct a randomized mapping from each dictionary that can be sampled when playing $H$ to dictionaries that can be sampled when playing $L$. Each dictionary in $\mathcal{D}_H$ is mapped to dictionaries in $\mathcal{D}_L$ that are at least as likely to match against the opponent's dictionary, averaged over the distribution of all possible dictionaries for the opponent; this is shown in Lemma 3.16. To complete the proof, it is necessary to show that no element of $\mathcal{D}_L$ is mapped to with greater probability under the randomized mapping than under the original distribution over $\mathcal{D}_L$; this is shown in Lemma 3.17.
We say that dictionary $D'$ with elements $\{w'_1, w'_2, \ldots, w'_n\}$ (in order of decreasing frequency) dominates dictionary $D$ with elements $\{w_1, w_2, \ldots, w_n\}$ (in order of decreasing frequency) if $f(w'_i) \ge f(w_i)$ for all $i$. We say that the dominance is strict if $D' \neq D$. The following lemma is needed to prove Lemma 3.16.
LEMMA 3.15. For every pair of dictionaries $D$ and $D'$ such that $D'$ dominates $D$, for every effort level of player 2, and when both players play decreasing frequency in the second stage, we have that
$$\Pr\big(D'(k) \cap D_2(k) \neq \emptyset\big) \ge \Pr\big(D(k) \cap D_2(k) \neq \emptyset\big) \quad \text{for all } k. \qquad (4)$$
In addition, when $D'$ strictly dominates $D$, the inequality is strict for all $k \ge k'$, where $k'$ is the first coordinate where $D$ and $D'$ differ.
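Dictionary dominance compares the two sorted frequency profiles coordinate by coordinate. A minimal sketch with hypothetical helper names and frequencies; note that dominance is only a partial order, so some pairs are incomparable.

```python
def dominates(d_prime, d, freq):
    """D' dominates D if, both sorted by decreasing frequency, f(w'_i) >= f(w_i) for all i."""
    a = sorted((freq[w] for w in d_prime), reverse=True)
    b = sorted((freq[w] for w in d), reverse=True)
    return len(a) == len(b) and all(x >= y for x, y in zip(a, b))


def strictly_dominates(d_prime, d, freq):
    """Strict dominance additionally requires D' != D."""
    return dominates(d_prime, d, freq) and set(d_prime) != set(d)


freq = {"a": 5, "b": 4, "c": 3, "x": 2}
print(strictly_dominates({"a", "c"}, {"b", "c"}, freq))  # True
print(dominates({"a", "x"}, {"b", "c"}, freq), dominates({"b", "c"}, {"a", "x"}, freq))  # False False
```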
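PROOF. It suffices to show equation (4) for $D' = \{w'_1, w'_2, \ldots, w'_n\}$ and $D = \{w_1, w_2, \ldots, w_n\}$ (both in sorted order), where $w'_j = w_j$ for all $j \neq i$ and $f(w'_i) > f(w_i)$. If $k < i$, equation (4) holds with equality. Consider $k \ge i$ and let $A = D(k) \setminus \{w_i\}$. We say $D_2$ satisfies property $P$ if $D_2(k) \cap A = \emptyset$ and $w'_i \in D_2(k)$, and property $P'$ if $D_2(k) \cap A = \emptyset$ and $w_i \in D_2(k)$. It suffices to show that $\Pr(D_2 \text{ satisfies } P) > \Pr(D_2 \text{ satisfies } P')$. Consider the transformation $t : D_i \mapsto D'_i$, where $D_i$ is any dictionary that satisfies $P'$. If $D_i$ also contains $w'_i$, $t$ maps $D_i$ to $D'_i = D_i$, and $D'_i$ satisfies property $P$. If $D_i$ does not contain $w'_i$, $t$ replaces $w_i$ with $w'_i$ to yield dictionary $D'_i$, which also satisfies $P$. Since $f(w'_i) > f(w_i)$, each dictionary produced by a replacement is strictly more likely to be sampled than the dictionary it replaces, and $t$ is injective. Thus, we have established that $\Pr(D_2 \text{ satisfies } P) > \Pr(D_2 \text{ satisfies } P')$.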
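For the following lemmas we use the randomized mapping $h$. Consider a dictionary $D \in \mathcal{D}_H$, $D = A \cup B$, where $A$ is the set of "low words" and $B$ is the set of "high words" (in other words, $A = D \cap U_L$ and $B = D \cap (U_H - U_L)$). The mapping $h$ keeps $A$ and completes it to a dictionary in $\mathcal{D}_L$, with the probability that a player would obtain that low dictionary by continuing to sample words until receiving $d$ low words. Note that if $D$ contains only high words, $D$ is mapped to every dictionary in $\mathcal{D}_L$ with non-zero probability. Likewise, if $D$ contains only low words, $D$ is mapped to exactly one dictionary in $\mathcal{D}_L$, namely $D$ itself.
LEMMA 3.16. For every $D_{1,H}$, where $D_{1,H}$ is a dictionary sampled with respect to the $H$ effort level, for every $h$ that maps $D_{1,H}$ to a dictionary in $\mathcal{D}_L$ containing the set $D_{1,H} \cap U_L$, for every effort level of player 2, and when both players play decreasing frequency in the second stage, we have that
$$\Pr\big(h(D_{1,H})(k) \cap D_2(k) \neq \emptyset\big) \ge \Pr\big(D_{1,H}(k) \cap D_2(k) \neq \emptyset\big) \quad \text{for all } k.$$
In addition, the inequality is strict for all $k \ge k'$ when $h(D_{1,H}) \neq D_{1,H}$, where $k'$ is the first coordinate where $h(D_{1,H})$ and $D_{1,H}$ differ.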
PROOF. Due to Lemma 3.15, it suffices to show that $h(D_{1,H}) = \{w'_1, w'_2, \ldots, w'_d\}$ dominates $D_{1,H} = \{w_1, w_2, \ldots, w_d\}$. Assume there exists a coordinate $i$ with $f(w'_i) < f(w_i)$, and let $j$ be the minimum such coordinate. Then $w'_j \in U_L$, since $h(D_{1,H})$ contains only words in $U_L$; because $U_L$ contains the highest-frequency words and $f(w_j) > f(w'_j)$, also $w_j \in U_L$. This means $w_j \in h(D_{1,H})$. Since the dictionaries are in sorted order, $w_j = w'_k$ for some $k < j$; however, this means $h(D_{1,H})$ does not contain all of $D_{1,H} \cap U_L$, a contradiction. When $h(D_{1,H}) \neq D_{1,H}$, the dominance is strict.
Lemma 3.17 states that the distribution obtained from sampling $U_L$ directly is the same as the distribution obtained from sampling a high dictionary followed by the randomized mapping (i.e., sampling $U_H$ until you get $d$ low words). The proof is easy and omitted.
LEMMA 3.17. For every $D_L \in \mathcal{D}_L$, $\Pr(D_L) = \sum_{D_H \in \mathcal{D}_H} \Pr(D_H) \Pr\big(h(D_H) = D_L\big)$.
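Lemma 3.17 can be checked exactly on a toy instance. The sketch assumes a specific sampling model (words drawn one at a time without replacement, proportional to integer frequencies) and computes two distributions with exact fractions: sampling $U_L$ directly, and sampling $U_H$ until $d$ low words appear, which is what drawing $D_H$ and then applying $h$ amounts to. Universe, frequencies, and helper names are hypothetical.

```python
from fractions import Fraction


def dictionary_distribution(universe, freq, d):
    """Exact distribution over d-word dictionaries drawn without replacement,
    each draw proportional to integer frequency."""
    dist = {}

    def draw(drawn, prob):
        if len(drawn) == d:
            key = frozenset(drawn)
            dist[key] = dist.get(key, Fraction(0)) + prob
            return
        remaining = [w for w in universe if w not in drawn]
        total = sum(freq[w] for w in remaining)
        for w in remaining:
            draw(drawn + [w], prob * Fraction(freq[w], total))

    draw([], Fraction(1))
    return dist


def mapped_distribution(high_universe, low_universe, freq, d):
    """Distribution of h(D_H) with D_H itself sampled: keep drawing from U_H
    (without replacement, proportional to frequency) until d low words appear."""
    low = set(low_universe)
    dist = {}

    def draw(drawn, prob):
        lows = frozenset(w for w in drawn if w in low)
        if len(lows) == d:
            dist[lows] = dist.get(lows, Fraction(0)) + prob
            return
        remaining = [w for w in high_universe if w not in drawn]
        total = sum(freq[w] for w in remaining)
        for w in remaining:
            draw(drawn + [w], prob * Fraction(freq[w], total))

    draw([], Fraction(1))
    return dist


freq = {"a": 5, "b": 4, "c": 3, "x": 2, "y": 1}
U_H, U_L = list(freq), ["a", "b", "c"]
print(mapped_distribution(U_H, U_L, freq, 2) == dictionary_distribution(U_L, freq, 2))  # True
```

Under this model the equality holds because drawing high words never changes the relative weights of the remaining low words, so the sequence of low words behaves like direct sampling from $U_L$.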
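Lemma 3.18 uses Lemmas 3.16 and 3.17 to show that playing $L$ stochastically dominates playing $H$ under match-early preferences, assuming players play decreasing frequency in the second stage. This argument is independent of the number of effort levels, so the equilibrium analysis continues to hold as we vary the number of effort levels, as long as there are at least two.
LEMMA 3.18. For every effort level of player 2, and when players play decreasing frequency in the second stage,
$$\sum_{D_{1,L} \in \mathcal{D}_L} \Pr(D_{1,L}) \Pr\big(D_{1,L}(k) \cap D_2(k) \neq \emptyset\big) \ge \sum_{D_{1,H} \in \mathcal{D}_H} \Pr(D_{1,H}) \Pr\big(D_{1,H}(k) \cap D_2(k) \neq \emptyset\big) \quad \text{for all } k.$$
PROOF. From Lemma 3.17, we know that
$$\sum_{D_{1,L} \in \mathcal{D}_L} \Pr(D_{1,L}) \Pr\big(D_{1,L}(k) \cap D_2(k) \neq \emptyset\big) = \sum_{D_{1,L} \in \mathcal{D}_L} \sum_{D_{1,H} \in \mathcal{D}_H} \Pr(D_{1,H}) \Pr\big(h(D_{1,H}) = D_{1,L}\big) \Pr\big(D_{1,L}(k) \cap D_2(k) \neq \emptyset\big).$$
This is equivalent to writing
$$\sum_{D_{1,H} \in \mathcal{D}_H} \Pr(D_{1,H})\, \mathbb{E}_h\Big[\Pr\big(h(D_{1,H})(k) \cap D_2(k) \neq \emptyset\big)\Big] \ge \sum_{D_{1,H} \in \mathcal{D}_H} \Pr(D_{1,H}) \Pr\big(D_{1,H}(k) \cap D_2(k) \neq \emptyset\big),$$
where the inequality follows from Lemma 3.16. We know that the inequality is strict for all $k$, since there exists a $D_{1,H}$ such that $h(D_{1,H}) \neq D_{1,H}$ and $h(D_{1,H})$ and $D_{1,H}$ differ in the first coordinate, for all possible values of $h(D_{1,H})$ under the randomized mapping.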
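A numerical illustration of Lemma 3.18 on a toy, non-uniform instance is sketched below. It assumes the same without-replacement sampling model as above and uses $\Pr(D_1(k) \cap D_2(k) \neq \emptyset)$ as the probability of matching within the first $k$ entries under decreasing-frequency play. The universes $U_L \subset U_H$, the frequencies, and $d = 2$ are hypothetical choices, not data from the ESP game.

```python
from fractions import Fraction


def dictionary_distribution(universe, freq, d):
    """Exact distribution over d-word dictionaries drawn without replacement,
    each draw proportional to integer frequency."""
    dist = {}

    def draw(drawn, prob):
        if len(drawn) == d:
            key = frozenset(drawn)
            dist[key] = dist.get(key, Fraction(0)) + prob
            return
        remaining = [w for w in universe if w not in drawn]
        total = sum(freq[w] for w in remaining)
        for w in remaining:
            draw(drawn + [w], prob * Fraction(freq[w], total))

    draw([], Fraction(1))
    return dist


def top_k(dictionary, k, freq):
    """D(k): the k highest-frequency words of the dictionary."""
    return set(sorted(dictionary, key=lambda w: freq[w], reverse=True)[:k])


def prob_match_by(k, dist1, dist2, freq):
    """Pr(D1(k) and D2(k) share a word), i.e. a match within k entries when both
    players play decreasing frequency (under the max-step timing assumption)."""
    total = Fraction(0)
    for D1, p1 in dist1.items():
        t1 = top_k(D1, k, freq)
        for D2, p2 in dist2.items():
            if t1 & top_k(D2, k, freq):
                total += p1 * p2
    return total


freq = {"a": 5, "b": 4, "c": 3, "x": 2, "y": 1}
d = 2
dist_L = dictionary_distribution(["a", "b", "c"], freq, d)
dist_H = dictionary_distribution(list(freq), freq, d)
for k in range(1, d + 1):
    low = prob_match_by(k, dist_L, dist_H, freq)   # player 1 plays L, player 2 plays H
    high = prob_match_by(k, dist_H, dist_H, freq)  # player 1 plays H, player 2 plays H
    print(k, float(low), float(high))
```

In this instance the low-effort probability is strictly larger for every $k$ against either opponent effort level, consistent with the strict dominance claimed in the proof.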
Lemma 3.8, together with Lemma 3.18 and Theorem 2.10, gives the following result.
THEOREM 1.2. $((L, s^{\downarrow}_1), (L, s^{\downarrow}_2))$ is a strict ordinal Bayesian-Nash equilibrium of the complete ESP game under match-early preferences, for every distribution over $U$ except the uniform distribution. Moreover, $(L, s^{\downarrow}_1)$ is a strict ordinal best response to $(H, s^{\downarrow}_2)$ for every distribution over $U$ except the uniform distribution.
COROLLARY 3.19. $((H, s^{\downarrow}_1), (H, s^{\downarrow}_2))$ is not a Bayesian-Nash equilibrium of the complete ESP game for any distribution over $U$ except the uniform distribution, and for any utility function that satisfies match-early preferences.
These results can be generalized to a model with any number of effort levels. Consider $m$ effort levels, where $m < n$. Playing the lowest effort level in conjunction with decreasing frequency, for both players, would be a strict ordinal Bayesian-Nash equilibrium for any distribution over $U$ and any utility function satisfying match-early preferences. Moreover, playing the lowest effort level with decreasing frequency would be a strict ordinal best response to playing any higher effort level with decreasing frequency, for any distribution over $U$ and any utility function satisfying match-early preferences. The randomized mapping on which these results rely can be generalized. Recall that the randomized mapping maps each high dictionary to a set of low dictionaries that do at least as well in expectation against the opponent's dictionary. Its interpretation is that each higher-effort dictionary is mapped to a lower-effort dictionary with the probability that a player would get that low dictionary by continuing to sample words until receiving $d$ low words. This mapping works as long as the lower effort level's universe is a strict subset of the higher effort level's universe and contains the highest-frequency words of the higher effort level's universe.

THE EFFECT OF RARE-WORDS PREFERENCES
In this section, we consider the effect of rare-words preferences on the equilibrium analysis. We ask whether a high-effort equilibrium exists in this preference model, in which players care about matching on rare words and are indifferent to the location at which they match.
For the second stage, we show that playing words in order of decreasing frequency is strictly dominated, in stark contrast with the previous section. Also, playing words in order of increasing frequency is an ex-post Nash equilibrium of the second-stage game for all pairs of effort levels chosen in the first stage. Although we show that $((H, s^{\uparrow}_1), (H, s^{\uparrow}_2))$ cannot be an ordinal Bayesian-Nash equilibrium of the complete game for any distribution over $U$, we show that for every distribution over $U$ there exists a utility function for which $((H, s^{\uparrow}_1), (H, s^{\uparrow}_2))$ is a Bayesian-Nash equilibrium. We also show that $((L, s^{\uparrow}_1), (L, s^{\uparrow}_2))$ is an ordinal Bayesian-Nash equilibrium for every distribution over $U$, and that it leads to an outcome at least as good, from the system designer's perspective, as $((L, s^{\downarrow}_1), (L, s^{\downarrow}_2))$ for every pair of player dictionaries. Finally, we give sufficient conditions on the utility function for Zipfian distributions under which $((H, s^{\uparrow}_1), (H, s^{\uparrow}_2))$ is a Bayesian-Nash equilibrium under rare-words preferences.

Equilibrium Analysis of the Effort-Constrained Game
We show that in this model, playing words in order of increasing frequency is not a dominant strategy equilibrium.
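Definition 4.1. Second-stage strategy $s^*_1$ is a dominant strategy of the second stage of the ESP game, conditioned on effort levels $e_1$ and $e_2$, if
$$o(s^*_1(D_1), s_2(D_2)) \succeq o(s_1(D_1), s_2(D_2)) \quad \text{for all } s_1,\ s_2,\ D_1 \in \mathcal{D}_{e_1},\ D_2 \in \mathcal{D}_{e_2},$$
where $\succeq$ is the ordering on outcomes.
Definition 4.2. Second-stage strategy $s_1$ is a dominated strategy of the second stage of the ESP game, conditioned on effort levels $e_1$ and $e_2$, if there exists an $s'_1$ such that
$$o(s'_1(D_1), s_2(D_2)) \succeq o(s_1(D_1), s_2(D_2)) \quad \text{for all } s_2,\ D_1 \in \mathcal{D}_{e_1},\ D_2 \in \mathcal{D}_{e_2}.$$
We say that $s_1$ is strictly dominated if there also exist $D_1$, $D_2$, and $s_2$ such that the above inequality is strict.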
PROPOSITION 4.3. Suppose that $e = \min(e_1, e_2)$, the universe $U_e$ has five words $\{w_1, w_2, w_3, w_4, w_5\}$, and $d = 4$. The second-stage strategy profile $s = (s^{\uparrow}_1, s^{\uparrow}_2)$ is not a dominant strategy equilibrium of the second-stage game for any distribution over $U$ under rare-words preferences.

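PROOF. Suppose $D_1 = \{w_1, w_3, w_4, w_5\}$ and $s_2(D_2) \to w_4 \succ w_2 \succ w_5 \succ w_1$. Under $s_2(D_2)$ and $s^{\uparrow}_1 \to w_5 \succ w_4 \succ w_3 \succ w_1$, the match will occur on $w_4$. However, if $s_1 \to w_5 \succ w_3 \succ w_1 \succ w_4$, the match will occur on $w_5$, a strictly better outcome.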
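The proof's example can be replayed under an assumed timing model (both players enter one word per step, so a common word matches at the later of its two positions). The sketch assumes that $w_5$ is the rarest word, that player 1's dictionary is $\{w_1, w_3, w_4, w_5\}$, and that player 2's ordering is $w_4 \succ w_2 \succ w_5 \succ w_1$; the helper name and the numeric frequencies are hypothetical.

```python
def first_match_word(order1, order2):
    """Word of the first match under the assumed max-step timing model (ties go
    to the word player 1 enters first); None if no word is shared."""
    pos1 = {w: i for i, w in enumerate(order1, start=1)}
    pos2 = {w: i for i, w in enumerate(order2, start=1)}
    common = [w for w in order1 if w in pos2]
    if not common:
        return None
    return min(common, key=lambda w: max(pos1[w], pos2[w]))


freq = {"w1": 5, "w2": 4, "w3": 3, "w4": 2, "w5": 1}  # hypothetical: w5 is the rarest word
D1 = ["w1", "w3", "w4", "w5"]
s2_D2 = ["w4", "w2", "w5", "w1"]              # player 2's ordering of D2
s_up_1 = sorted(D1, key=lambda w: freq[w])     # increasing frequency: w5, w4, w3, w1
deviation = ["w5", "w3", "w1", "w4"]
print(first_match_word(s_up_1, s2_D2))     # w4
print(first_match_word(deviation, s2_D2))  # w5
```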
While playing words in order of increasing frequency is not a dominant strategy equilibrium of the second-stage game, it is an ex-post Nash equilibrium under rare-words preferences.
THEOREM 1.3. The second-stage strategy profile $(s^{\uparrow}_1, s^{\uparrow}_2)$ is a strict ex-post Nash equilibrium of the second stage of the ESP game for every distribution over $U$ and every $e_1 = e_2$, under rare-words preferences.
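A brute-force sanity check of the no-profitable-deviation part of Theorem 1.3 on a small universe is sketched below. It assumes the same max-step timing model as the other sketches and equal effort levels (one shared universe), and verifies, for every pair of dictionaries, that against $s^{\uparrow}_2$ no ordering of $D_1$ matches on a rarer word than $s^{\uparrow}_1$ does. It does not test strictness; the universe and helper names are hypothetical.

```python
from itertools import combinations, permutations


def first_match_word(order1, order2):
    """Word of the first match under the assumed max-step timing model."""
    pos1 = {w: i for i, w in enumerate(order1, start=1)}
    pos2 = {w: i for i, w in enumerate(order2, start=1)}
    common = [w for w in order1 if w in pos2]
    if not common:
        return None
    return min(common, key=lambda w: max(pos1[w], pos2[w]))


def increasing(dictionary, freq):
    """Play order under increasing frequency (rarest word first)."""
    return sorted(dictionary, key=lambda w: freq[w])


def s_up_is_ex_post_best(freq, d):
    """Against s-up, no ordering of D1 beats s-up for any pair of dictionaries."""
    for D1 in combinations(freq, d):
        for D2 in combinations(freq, d):
            common = set(D1) & set(D2)
            if not common:
                continue
            target = first_match_word(increasing(D1, freq), increasing(D2, freq))
            if target != min(common, key=lambda w: freq[w]):
                return False  # s-up should match on the rarest shared word
            for deviation in permutations(D1):
                word = first_match_word(list(deviation), increasing(D2, freq))
                if freq[word] < freq[target]:
                    return False
    return True


freq = {"w1": 5, "w2": 4, "w3": 3, "w4": 2, "w5": 1}
print(s_up_is_ex_post_best(freq, 3))  # True
```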
The statement of the above theorem can be generalized to the case where players play different effort levels. If player 1 plays a higher effort level than player 2, player 1's best response is to play increasing frequency on the set $D_1 \cap U_{e_2}$, followed by the words in the set $D_1 \cap (U_{e_1} - U_{e_2})$ (in any order). Likewise, if player 2 plays a higher effort level than player 1, player 2's best response is to play increasing frequency on the set $D_2 \cap U_{e_1}$, followed by the words in the set $D_2 \cap (U_{e_2} - U_{e_1})$ (in any order). This "generalized" increasing strategy can be shown to be an ex-post Nash equilibrium.
PROPOSITION 4.4. Second-stage strategy $s^{\downarrow}_1$ is strictly dominated for any second-stage strategy of player 2, any distribution over $U$, and any choice of effort levels $e_1, e_2$, under rare-words preferences.

Equilibrium Analysis of the Complete Game
In order to analyze the top-level game, we define stochastic dominance under the order on outcomes associated with rare-words preferences.
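Definition 4.5. Fixing player 2's strategy $(e_2, s_2)$, we say that strategy $(e_1, s_1)$ for player 1 stochastically dominates strategy $(e'_1, s'_1)$ for player 1 with respect to the ordering on outcomes given by rare-words preferences if and only if, for every $k = 1, \ldots, n$, the probability (over both players' dictionaries) that the match occurs on one of the $k$ lowest-frequency words of $U$ is at least as large under $(e_1, s_1)$ as under $(e'_1, s'_1)$. In addition, we say the stochastic dominance is strict if this comparison is strict for some $k$.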
The next two propositions give general results about strategies in the complete ESP game; they use the randomized mapping from the previous section to map each high dictionary to a low dictionary that does at least as well in expectation. A designer will prefer the equilibrium $((L, s^{\uparrow}_1), (L, s^{\uparrow}_2))$ of Proposition 4.6 to the equilibrium $((L, s^{\downarrow}_1), (L, s^{\downarrow}_2))$ found for match-early preferences, since it leads to matches on a "rarer" word. We formalize this observation in Remark 4.7.
PROPOSITION 4.6. $((L, s^{\uparrow}_1), (L, s^{\uparrow}_2))$ is a strict ordinal Bayesian-Nash equilibrium of the complete ESP game for every distribution over $U$ under rare-words preferences.
PROOF. We use the randomized mapping from section 3.2 to map each dictionary $D_H \in \mathcal{D}_H$ to dictionaries $D_L \in \mathcal{D}_L$. This allows us to apply Lemma A.3. Since there exists a $D_H \in \mathcal{D}_H$ consisting only of "high words", the inequalities are strict for at least one pair $D_H, D_L$, where $D_H$ is mapped to $D_L$ under the randomized mapping. Using Lemma A.3 and Lemma 3.17, an identical version of Lemma 3.18 holds and establishes stochastic dominance in this new model of preferences.
The previous proposition can be generalized to more than two effort levels. In such a model, exerting the lowest effort level in conjunction with increasing frequency, for both players, would be a strict ordinal Bayesian-Nash equilibrium for every distribution over $U$ under rare-words preferences. We would not have this result for any effort level that is not the lowest, since Proposition 4.8 would generalize to any such effort level.
The following remark establishes that the $(s^{\uparrow}_1, s^{\uparrow}_2)$ strategy profile leads to a match on a "rarer" word than the $(s^{\downarrow}_1, s^{\downarrow}_2)$ strategy profile. Specifically, for any pair of dictionaries $D_1$ and $D_2$, the $(s^{\uparrow}_1, s^{\uparrow}_2)$ profile yields an outcome at least as good as the $(s^{\downarrow}_1, s^{\downarrow}_2)$ profile from the system designer's perspective. In particular, when $D_1$ and $D_2$ overlap in more than one word, $(s^{\uparrow}_1, s^{\uparrow}_2)$ yields a match on the lowest-frequency common word and $(s^{\downarrow}_1, s^{\downarrow}_2)$ yields a match on the highest-frequency common word. When $D_1$ and $D_2$ overlap in only one word, or in no words, the two profiles lead to the same outcome. The proof is easy and omitted.
Remark 4.7. For any pair of dictionaries $D_1 \in \mathcal{D}_{e_1}$, $D_2 \in \mathcal{D}_{e_2}$, the $(s_1^{\uparrow}, s_2^{\uparrow})$ strategy profile yields a match on a word whose frequency is at most that of the word matched under the $(s_1^{\downarrow}, s_2^{\downarrow})$ strategy profile.
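To make Remark 4.7 concrete, the following Python sketch simulates the second stage under a simplifying assumption: each player enters one word per step in the order given by their strategy, so a common word $w$ is matched at step $\max(\mathrm{pos}_1(w), \mathrm{pos}_2(w))$. The helper names (`s_up`, `s_down`, `first_match`, `remark_4_7_holds`) and the frequencies $f(w_i) \propto 1/i$ are illustrative assumptions, not part of the model's specification.

```python
from itertools import combinations


def s_up(dictionary, freq):
    """Increasing-frequency order (rarest word first), modelling s^up."""
    return sorted(dictionary, key=lambda w: freq[w])


def s_down(dictionary, freq):
    """Decreasing-frequency order (most frequent word first), modelling s^down."""
    return sorted(dictionary, key=lambda w: freq[w], reverse=True)


def first_match(order1, order2):
    """Return (step, word) for the first word entered by both players, or None.

    Assumes one word per player per step, so a common word w is matched at
    step max(pos1[w], pos2[w]) (steps are 1-indexed).
    """
    pos1 = {w: k for k, w in enumerate(order1, start=1)}
    pos2 = {w: k for k, w in enumerate(order2, start=1)}
    common = pos1.keys() & pos2.keys()
    if not common:
        return None
    # With both orders sorted by the same frequencies, no two common words
    # share a match step; the word itself only makes the choice deterministic.
    word = min(common, key=lambda w: (max(pos1[w], pos2[w]), w))
    return max(pos1[word], pos2[word]), word


def remark_4_7_holds(freq, d):
    """Exhaustively compare the two profiles over all pairs of d-word dictionaries."""
    dictionaries = [set(c) for c in combinations(freq, d)]
    for D1 in dictionaries:
        for D2 in dictionaries:
            up = first_match(s_up(D1, freq), s_up(D2, freq))
            down = first_match(s_down(D1, freq), s_down(D2, freq))
            if (up is None) != (down is None):
                return False
            if up is not None and freq[up[1]] > freq[down[1]]:
                return False
    return True


freq = {f"w{i}": 1 / i for i in range(1, 11)}  # illustrative Zipf-like frequencies
D1 = {"w1", "w3", "w5", "w8"}
D2 = {"w2", "w3", "w5", "w8"}
print(first_match(s_up(D1, freq), s_up(D2, freq)))      # (1, 'w8'): rarest common word
print(first_match(s_down(D1, freq), s_down(D2, freq)))  # (2, 'w3'): most frequent common word
print(remark_4_7_holds(freq, 3))                        # True
```

The exhaustive check only covers a toy universe; it illustrates the remark rather than proving it.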
The following proposition follows from the fact that, if player 2 plays $H$, player 1 maximizes the probability of matching by playing $L$, yet maximizes the probability of matching on the "rarest" word by playing $H$.
PROPOSITION 4.8. $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ is not an ordinal Bayesian-Nash equilibrium of the complete ESP game for any distribution over $U$ under rare-words preferences.
PROOF. We show that the inequality in Definition 4.5 does not hold for $k = n$ when $(e_2, s_2) = (H, s_2^{\uparrow})$ and $(e_1, s_1) = (H, s_1^{\uparrow})$. This is equivalent to showing that Definition 3.14 does not hold for $k = d$ when $(e_2, s_2) = (H, s_2^{\downarrow})$ and $(e_1, s_1) = (H, s_1^{\downarrow})$, which is established by Lemma 3.18.
The previous proposition can be generalized to the case of more than two effort levels. In such a model, for any effort level $e_i$ that is not the lowest, playing $e_i$ together with increasing frequency is not an ordinal Bayesian-Nash equilibrium for any distribution over $U$ under rare-words preferences.
The implication of the above proposition is that, for every distribution over $U$, there exists a utility function satisfying rare-words preferences for which $(H, s_1^{\uparrow})$ is not a best response to $(H, s_2^{\uparrow})$. Since $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ cannot be an ordinal Bayesian-Nash equilibrium for any distribution over $U$, we seek to understand under what conditions on utility functions and distributions $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ is a Bayesian-Nash equilibrium. We restrict attention to the Zipfian distribution over $U$ and to multiplicative and additive valuation functions.
Definition 4.9. Let $\alpha_i$ denote the ratio of successive outcomes in the utility function $v(o)$ (satisfying rare-words preferences) with total ordering of outcomes $o_1 \succ o_2 \succ \cdots \succ o_m$, i.e., $\alpha_i = v(o_i)/v(o_{i+1})$.
In order to prove positive results for the top level of the game under this new preference model, we use techniques similar to those of the previous section. In particular, we use a "randomized mapping", except that in this case we map each dictionary in $\mathcal{D}_L$ to a subset of dictionaries in $\mathcal{D}_H$. Rather than explicitly defining a mapping and explaining it intuitively, as we did in the previous section, we show that a valid mapping exists. To do so, we first define a linear system of equations that a mapping must satisfy in order to be valid. We then prove that this system has a solution by introducing a second system of equations, namely the linear system that a mapping from the previous section must satisfy in order to be valid. We show that the second system has a solution (Lemma 4.10) and that a solution to the second system yields a solution to the first (Lemma 4.11).
We start by defining a linear system of equations that a mapping must satisfy in order to be valid. Consider a mapping that maps each $D_L \in \mathcal{D}_L$ to some subset of dictionaries in $\mathcal{D}_H$. Suppose the dictionaries in $\mathcal{D}_L$ are indexed by $i = 1, \ldots, |\mathcal{D}_L|$ and the dictionaries in $\mathcal{D}_H$ by $j = 1, \ldots, |\mathcal{D}_H|$. Let the variable $x_{ij}$ denote the probability that $D_L^i$ is mapped to $D_H^j$. A valid mapping must satisfy the following properties:
(1) $\sum_{j=1}^{|\mathcal{D}_H|} x_{ij} = 1$ for all $1 \le i \le |\mathcal{D}_L|$;
(2) $\sum_{i=1}^{|\mathcal{D}_L|} x_{ij} \Pr(D_L^i) = \Pr(D_H^j)$ for all $1 \le j \le |\mathcal{D}_H|$.
In order for the mapping to be valid, it must also be the case that $0 \le x_{ij} \le 1$ for all $i, j$. Note that in the above system of equations, some of the $x_{ij}$ are removed so that they are always set to 0: a variable $x_{ij}$ is removed if and only if $D_H^j \cap U_L \not\subseteq D_L^i$.
In order to show that a solution to the above system exists, we give a second system of linear equations, in which $y_{ij}$ denotes the probability that $D_H^j$ is mapped to $D_L^i$. This system is exactly the set of equations that the mapping from the previous section needed to satisfy in order to be valid:
(1) $\sum_{i=1}^{|\mathcal{D}_L|} y_{ij} = 1$ for all $1 \le j \le |\mathcal{D}_H|$;
(2) $\sum_{j=1}^{|\mathcal{D}_H|} y_{ij} \Pr(D_H^j) = \Pr(D_L^i)$ for all $1 \le i \le |\mathcal{D}_L|$.
As with the previous system, it must also be the case that $0 \le y_{ij} \le 1$ for all $i, j$, and a variable $y_{ij}$ is removed (always set to 0) if and only if $D_H^j \cap U_L \not\subseteq D_L^i$. The following two lemmas are proved in the Appendix and together show that the first system has a solution. Therefore, a valid mapping from $\mathcal{D}_L$ to $\mathcal{D}_H$ with the desired properties exists.
LEMMA 4.10. There exists a solution to the second system of linear equations.
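The relationship between the two systems can be checked numerically on a toy instance. The Python sketch below builds $y_{ij}$ from the description of the randomized mapping (keep $D_H^j \cap U_L$, then continue drawing low words until $d$ of them have been seen), sets $x_{ij} = y_{ij} \cdot \Pr(D_H^j \mid H)/\Pr(D_L^i \mid L)$ as in Lemma 4.11, and reports the largest violation of either system. It assumes Zipfian frequencies with exponent 1, that high effort draws from all of $U$, and that $U_L$ consists of the most frequent words; `sample_prob` and `max_residual` are hypothetical helper names.

```python
from itertools import combinations, permutations


def sample_prob(S, freq, pool):
    """Probability that len(S) draws without replacement from `pool`, each draw
    proportional to frequency, produce exactly the set S (summed over orders)."""
    total = sum(freq[w] for w in pool)
    prob = 0.0
    for order in permutations(S):
        q, remaining = 1.0, total
        for w in order:
            q *= freq[w] / remaining
            remaining -= freq[w]
        prob += q
    return prob


def max_residual(n, n_low, d):
    """Build y (high -> low mapping), set x_ij = y_ij * Pr(D_H^j) / Pr(D_L^i),
    and return the largest violation of either linear system."""
    freq = {f"w{i}": 1.0 / i for i in range(1, n + 1)}  # assumed: Zipf, exponent 1
    U_H = list(freq)             # assumed: high effort draws from all of U
    U_L = U_H[:n_low]            # assumed: low words are the n_low most frequent
    low = set(U_L)
    D_H = [frozenset(c) for c in combinations(U_H, d)]
    D_L = [frozenset(c) for c in combinations(U_L, d)]
    P_H = [sample_prob(D, freq, U_H) for D in D_H]
    P_L = [sample_prob(D, freq, U_L) for D in D_L]

    def y_entry(DL, DH):
        A = DH & low
        if not A <= DL:          # removed variable: D_H intersect U_L not inside D_L
            return 0.0
        rest = [w for w in U_L if w not in A]
        # keep A, then draw further low words until d low words have been seen
        return sample_prob(DL - A, freq, rest)

    rows, cols = range(len(D_L)), range(len(D_H))
    y = [[y_entry(D_L[i], D_H[j]) for j in cols] for i in rows]
    x = [[y[i][j] * P_H[j] / P_L[i] for j in cols] for i in rows]
    assert all(0.0 <= v <= 1.0 + 1e-12 for row in x for v in row)

    residuals = []
    for j in cols:
        residuals.append(abs(sum(y[i][j] for i in rows) - 1.0))              # second system (1)
        residuals.append(abs(sum(x[i][j] * P_L[i] for i in rows) - P_H[j]))  # first system (2)
    for i in rows:
        residuals.append(abs(sum(y[i][j] * P_H[j] for j in cols) - P_L[i]))  # second system (2)
        residuals.append(abs(sum(x[i]) - 1.0))                               # first system (1)
    return max(residuals)


print(max_residual(6, 3, 2) < 1e-12)  # True
```

Here the second system holds because the low words seen in a sequential draw from $U_H$ are themselves a sequential draw from $U_L$; the first system then follows from the change of variables, which is the content of Lemma 4.11.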
LEMMA 4.11. If a solution to the second system of linear equations exists, then a solution to the first system of linear equations exists. Namely, this solution can be obtained by setting $x_{ij} = y_{ij} \cdot \frac{\Pr(D_H^j \mid H)}{\Pr(D_L^i \mid L)}$.
Lemma 4.12 states that the distribution obtained from sampling $U_H$ directly is the same as the distribution obtained from sampling a low dictionary followed by the randomized mapping. This follows immediately from the second set of conditions we require the randomized mapping to satisfy.
LEMMA 4.12. For every $D_H^j \in \mathcal{D}_H$, $\sum_{i=1}^{|\mathcal{D}_L|} \Pr(D_L^i \mid L)\, x_{ij} = \Pr(D_H^j \mid H)$.
We say that dictionary $D$ with elements $\{w_d, w_{d-1}, \ldots, w_1\}$ (in order of increasing frequency) dominates dictionary $D'$ with elements $\{w'_d, w'_{d-1}, \ldots, w'_1\}$ (in order of increasing frequency) if $f(w_i) \le f(w'_i)$ for all $i$. We say that the dominance is strict if $D \ne D'$.
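Dominance is a coordinatewise comparison after sorting, so it can be checked mechanically. The Python sketch below implements it and exhaustively tests the claim of Lemma 4.13 (every $D_H$ with $D_H \cap U_L \subseteq D_L$ dominates $D_L$) on a toy universe. The frequencies $f(w_i) \propto 1/i$, the choice of $U_L$ as the most frequent words, high-effort dictionaries drawn from all of $U$, and the function names are assumptions made for illustration.

```python
from itertools import combinations


def dominates(D, D_prime, freq):
    """True if D dominates D_prime: with both dictionaries sorted by frequency,
    each word of D is at most as frequent as the word of D_prime at the same rank."""
    a = sorted(D, key=freq.get)
    b = sorted(D_prime, key=freq.get)
    return len(a) == len(b) and all(freq[x] <= freq[y] for x, y in zip(a, b))


def lemma_4_13_holds(n, n_low, d):
    """Exhaustively check that every D_H whose low words lie inside D_L dominates D_L."""
    freq = {f"w{i}": 1.0 / i for i in range(1, n + 1)}  # assumed frequencies
    U = list(freq)               # assumed: high-effort dictionaries come from all of U
    U_L = set(U[:n_low])         # assumed: low words are the n_low most frequent
    for D_L in combinations(U[:n_low], d):
        for D_H in combinations(U, d):
            if set(D_H) & U_L <= set(D_L) and not dominates(D_H, D_L, freq):
                return False
    return True


freq = {f"w{i}": 1.0 / i for i in range(1, 11)}
print(dominates({"w2", "w7", "w9"}, {"w1", "w2", "w4"}, freq))  # True
print(dominates({"w1", "w2", "w4"}, {"w2", "w7", "w9"}, freq))  # False
print(lemma_4_13_holds(10, 5, 3))                               # True
```

Sorting both dictionaries in the same direction is all that matters; the pairing by rank is the same whether one sorts by increasing or decreasing frequency.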
For the following lemmas, we use this randomized mapping, which satisfies the following property: each $D_L$ is mapped only to dictionaries $D_H$ with $D_H \cap U_L \subseteq D_L$. We show that if the randomized mapping satisfies this property, then each low dictionary is mapped to a high dictionary that dominates it. ..., w'_1} (sorted in order of increasing frequency), where $D_H \cap U_L \subseteq D_L$. Assume $D_H$ does not dominate $D_L$, so there exists a coordinate $i$ such that $f(w_i) < f(w'_i)$. Let $i$ be the minimum such coordinate. Since $D_L \subset U_L$, $w_i \in U_L$, and so $w'_i \in D_L$. Since the dictionaries are in sorted order, $w'_i = w_j$ for some $j > i$; however, this means there exists an $i$ such that $w_i \ne w'_i$. Therefore, $f(w_i) > f(w'_i)$ and the dominance is strict.
LEMMA 4.14. For every $D_{1,L}$, where $D_{1,L}$ is a dictionary sampled with respect to the $L$ effort level, and for every $g$ that maps $D_{1,L}$ to a dictionary $D_{1,H} \in \mathcal{D}_H$ such that $D_{1,H} \cap U_L \subseteq D_{1,L}$, when both players play increasing frequency in the second stage, and for all utility functions that satisfy rare-words preferences and $\alpha_k \ge \frac{\Pr(w_{n-k} \in D_H)}{\Pr(w_{n-k+1} \in D_H)}$ for all $k$, we have that: In addition, the inequality is strict when $g(D_{1,L}) \ne D_{1,L}$.
Lemma 4.15 uses Lemmas 4.14 and 4.12 to show that playing $H$ yields greater utility than playing $L$, given that the other player is playing $H$ and that players play increasing frequency in the second stage. Note also that this argument is independent of the number of effort levels, so the equilibrium analysis holds as we vary the number of effort levels, as long as there are at least two.
LEMMA 4.15. Given that players are playing words in order of increasing frequency, playing $H$ yields greater expected utility than playing $L$ when the other player plays $H$, for all $u$ that satisfy rare-words preferences and $\alpha_k \ge \frac{\Pr(w_{n-k} \in D_H)}{\Pr(w_{n-k+1} \in D_H)}$ for all $k$.
Theorem 1.3 together with Lemma 4.15 gives us the following result.
PROPOSITION 4.16. $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ is a Bayesian-Nash equilibrium of the ESP game for all distributions over $U$ and any utility function that satisfies rare-words preferences and $\alpha_k \ge \frac{\Pr(w_{n-k} \in D_H)}{\Pr(w_{n-k+1} \in D_H)}$ for all $k$.
COROLLARY 4.17. For every distribution over $U$, there exists a utility function satisfying rare-words preferences for which $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ is a Bayesian-Nash equilibrium of the ESP game.
PROPOSITION 4.18. There exists a distribution over $U$ for which $((H, s_1), (H, s_2))$ cannot be a Bayesian-Nash equilibrium of the ESP game for any pair of consistent second-stage strategies $s_1, s_2$ and for any utility function satisfying match-early preferences.
We interpret the conditions on the utility function in Proposition 4.16 for a specific class of distributions, namely the Zipfian distribution (see Definition 3.11). For this analysis, we restrict attention to the case where the Zipf exponent satisfies $s \le 1$. Theorem 1.4 gives the criteria for $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ to be a Bayesian-Nash equilibrium for a family of Zipfian distributions when the dictionaries are sampled with replacement. We note that for large values of $n$ the conditions for dictionaries sampled without replacement are virtually identical to those for dictionaries sampled with replacement. $((H, s_1^{\uparrow}), (H, s_2^{\uparrow}))$ is a Bayesian-Nash equilibrium of the complete ESP game for Zipfian distributions over $U$ with $s \le 1$, for any additive utility function that satisfies rare-words preferences, and for any multiplicative utility function that satisfies rare-words preferences with $r \ge 2$.
PROOF. From Lemma 4.14, it suffices to show $t_i \cdot v(w_i) > t'_i \cdot v(w'_i)$. First consider the case of additive utility functions. Assume that $w_i$ is the $j$th most frequent word in the universe and $w'_i$ is the $k$th most frequent word in the universe (with $j > k$). Under the Zipfian distribution, ) for all additive utility functions. Now consider the case of multiplicative utility functions. We have shown that ti ) for all multiplicative utility functions with $r \ge 2$.
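As a numerical sanity check on Theorem 1.4, the Python sketch below evaluates the condition $\alpha_k \ge \Pr(w_{n-k} \in D_H)/\Pr(w_{n-k+1} \in D_H)$ of Proposition 4.16 for Zipfian frequencies with exponent $s$, using the inclusion probability $1 - (1 - p_i)^d$ for $d$ draws with replacement. It rests on illustrative assumptions: $w_1$ is the most frequent word, the outcomes are $o_k$ = "match on $w_{n-k+1}$" for $k = 1, \ldots, n$ followed by a no-match outcome $o_{n+1}$ with value 0, and $\alpha_k = v(o_k)/v(o_{k+1})$, so that the additive case gives $\alpha_k = (n+1-k)/(n-k)$. All function names are hypothetical.

```python
def zipf(n, s):
    """Zipf probabilities p_i proportional to 1 / i**s, i = 1..n (w_1 most frequent)."""
    weights = [1.0 / i ** s for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def inclusion_probs(p, d):
    """Pr(w_i in D) when D is built from d draws with replacement."""
    return [1.0 - (1.0 - pi) ** d for pi in p]


def additive_alphas(n, c=1.0):
    """alpha_k = v(o_k) / v(o_{k+1}) for k = 1..n-1, with v(o_j) = c * (m - j), m = n + 1.

    Assumes outcomes o_k = "match on w_{n-k+1}" for k = 1..n and o_{n+1} = no match.
    """
    m = n + 1
    v = [c * (m - j) for j in range(1, m + 1)]  # v[j-1] = v(o_j); v(o_m) = 0
    return [v[k - 1] / v[k] for k in range(1, n)]


def multiplicative_alphas(n, r):
    """Tightest multiplicative case: v(o_j) / v(o_{j+1}) = r for every j."""
    return [r] * (n - 1)


def condition_holds(alphas, incl):
    """alpha_k >= Pr(w_{n-k} in D) / Pr(w_{n-k+1} in D) for k = 1..n-1."""
    n = len(incl)
    return all(alphas[k - 1] >= incl[n - k - 1] / incl[n - k] for k in range(1, n))


n, d = 50, 10
for s in (0.5, 0.8, 1.0):
    incl = inclusion_probs(zipf(n, s), d)
    # each printed line ends with: True True
    print(s, condition_holds(additive_alphas(n), incl),
          condition_holds(multiplicative_alphas(n, 2.0), incl))
```

Because $1 - (1-p)^d$ is concave in $p$, the ratio of inclusion probabilities never exceeds $p_{n-k}/p_{n-k+1} = ((n-k+1)/(n-k))^s$, which is at most both the additive $\alpha_k$ and $2$ when $s \le 1$.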
Theorem 1.4 also generalizes to the model with many effort levels: it holds for any effort level that is not the lowest. For the lowest effort level, we have Proposition 4.6.

CONCLUSIONS
In this paper, we introduced a simple model of the ESP game and provided, in many cases, complete characterizations of the equilibria. We introduced a model of match-early preferences to capture the current set-up of the ESP game. We showed that the strategy profile $(s, s)$, where $s$ is an almost decreasing strategy, is a strict ordinal Bayesian-Nash equilibrium of the second stage of the ESP game (under match-early preferences) for every distribution over $U$, irrespective of the effort levels chosen in the first stage. Moreover, we showed that, among strategy profiles in which at least one player plays a consistent strategy, these are the only ones that are strict ordinal Bayesian-Nash equilibria for every distribution over $U$. These results hold even if players have different distributions over $U$, as long as both distributions induce the same total ordering of the words in $U$ by frequency. Since $(s_1^{\downarrow}, s_2^{\downarrow})$ is the most natural of the strategy profiles satisfying the almost decreasing property, we focused on this equilibrium profile when analyzing the equilibria of the complete game.
The implication of the equilibrium characterization for the second-stage ESP game under match-early preferences is that there exists a distribution such that $(s_1^{\uparrow}, s_2^{\uparrow})$ cannot be an ordinal Bayesian-Nash equilibrium. However, we can make a stronger claim: there exist distributions for which $(s_1^{\uparrow}, s_2^{\uparrow})$ cannot be a Bayesian-Nash equilibrium for any valuation function satisfying match-early preferences. We showed that the Zipfian distribution is one such distribution. The Zipfian distribution is significant in this setting since the distribution of words in the English language is known to follow a Zipfian distribution with exponent very close to 1 [Zipf 1932].
Given the equilibrium analysis for the second stage, we showed that $((L, s_1^{\downarrow}), (L, s_2^{\downarrow}))$ is a strict ordinal Bayesian-Nash equilibrium of the complete ESP game under match-early preferences, for every distribution over $U$ except the uniform distribution. We precluded the existence of a $((H, s_1^{\downarrow}), (H, s_2^{\downarrow}))$ equilibrium for any distribution over $U$ other than the uniform distribution and any utility function that satisfies match-early preferences, by showing that $(L, s_1^{\downarrow})$ is a strict ordinal best response to $(H, s_2^{\downarrow})$. In the case of the uniform distribution over $U$, we established that both $((L, s_1^{\downarrow}), (L, s_2^{\downarrow}))$ and $((H, s_1^{\downarrow}), (H, s_2^{\downarrow}))$ are weak ordinal Bayesian-Nash equilibria of the complete ESP game. While the model of the ESP game assumes that both users have the same utility function, we note that, because these are ordinal Bayesian-Nash equilibria, the strategy profiles remain in equilibrium for any utilities that satisfy match-early preferences. Our equilibrium analysis supports existing empirical results suggesting that users tend to coordinate on low-effort words in the ESP game.
In order to model an alternative set-up for the ESP game in which users may choose to coordinate on more difficult words, we introduced the rare-words preferences model.

A. APPENDIX
LEMMA 3.8. If a consistent strategy $s_2$ satisfies the almost decreasing property, then the strategy profile $(s_2, s_2)$ is a strict ordinal Bayesian-Nash equilibrium of the second-stage ESP game under match-early preferences, for every choice of effort levels $e_1$ and $e_2$ and for every distribution over $U$.
PROOF. For each $w_i, w_j$ such that $w_i \succ w_j$ under $s_2$, where $s_2$ is an almost decreasing strategy, $f(w_i) \ge f(w_j)$ for all distributions over $U$, as long as $w_j$ is not the least priority word under $s_2$. Consider the set $\mathcal{A}$ of dictionaries satisfying $w_i \in l_{\le k}(s_2(D_2))$ and $w_j \notin l_{\le k}(s_2(D_2))$, and the set $\mathcal{B}$ of dictionaries satisfying $w_j \in l_{\le k}(s_2(D_2))$ and $w_i \notin l_{\le k}(s_2(D_2))$, for any $\max(1, i-n+d) \le k \le \max(i, d-1)$. If $w_j$ is the least priority word under $s_2$, then $\mathcal{B} = \emptyset$ and $\mathcal{A} \ne \emptyset$, so $\Pr(D_2 \in \mathcal{B}) < \Pr(D_2 \in \mathcal{A})$. Now suppose $w_j$ is not the least priority element under $s_2$. Notice that $\mathcal{B}$ is exactly the set of dictionaries that satisfy: There exists a mapping $t : \mathcal{B} \to \mathcal{A}$ that takes each $B \in \mathcal{B}$ to an $A \in \mathcal{A}$ by removing $w_j$ and replacing it with $w_i$. The mapping $t$ takes each element $B \in \mathcal{B}$ to a unique element $A \in \mathcal{A}$ with $\Pr(B) < \Pr(A)$, by Lemma A.1. Therefore $\Pr(D_2 \in \mathcal{B}) < \Pr(D_2 \in \mathcal{A})$. Hence the preservation condition is satisfied when $s_2$ is an almost decreasing strategy, regardless of the distribution. Finally, when $s_2$ is an almost decreasing strategy, $\Pr(w_i \in D_2) > \Pr(w_j \in D_2)$ for all $i, j$ such that $n > j > i$. Likewise, $\Pr(w_j \in D_2) \ge \Pr(w_j \in l_{\le k}(s_2(D_2)))$ for all $j, k$. Thus
$\Pr(w_i \in D_2) > \Pr(w_j \in D_2) \ge \Pr(w_j \in l_{\le k}(s_2(D_2)))$
for all $i, j, k$ with $i + 1 < j < n$ and $k \le d - 1$, and for all distributions. When $j = n$, $\Pr(w_n \in l_{\le k}(s_2(D_2))) = 0$ for all $k \le d - 1$, so $\Pr(w_i \in D_2) > \Pr(w_n \in l_{\le k}(s_2(D_2)))$ for all $i < n$ and $k \le d - 1$. Hence, the strong condition is satisfied for all distributions over $U$. Lemma 3.7, along with Theorem 2.10, gives the desired result.
LEMMA 3.10. For every $(e_2, s_2)$
(except for $s_2$ that are almost decreasing), there exists a distribution over $U$, an effort level $e_1$, and a dictionary for which Algorithm 1 will not output an ordering in agreement with $s_2$.
PROOF. Since $s_2$ is not an almost decreasing strategy, there exist adjacent $w_i \succ w_{i+1}$ under $s_2$ such that $f(w_i) < f(w_{i+1})$ and $i < n - 1$. Let $i$ be the smallest such index. Assume that $w_{i+1}$ is the $k$th most frequent element in $U$. Consider the following distribution over $U$: the $k$ most frequent words in $U$ each have frequency $\frac{1-\epsilon}{k}$ and the $n - k$ least frequent words in $U$ each have frequency $\frac{\epsilon}{n-k}$, for a sufficiently small $\epsilon > 0$. Consider the following sets $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$. Set $\mathcal{A}$ contains dictionaries that have $w_i$ in the top $j$ positions of $s_2(D_2)$ (where $\max(1, i - n + d) \le j \le \max(i, d - 1)$) and do not contain $w_{i+1}$. Set $\mathcal{B}$ contains dictionaries that have $w_i$ in the top $j$ positions of $s_2(D_2)$ and also contain $w_{i+1}$. Set $\mathcal{C}$ contains dictionaries that have $w_{i+1}$ in the top $i$ positions and do not contain $w_i$. We construct $t_1 : \mathcal{A} \to \mathcal{C}$ and $t_2 : \mathcal{B} \to \mathcal{C}$. The map $t_1$ replaces $w_i$ with $w_{i+1}$. Since $w_i$ and $w_{i+1}$ are adjacent in $s_2$ and $\mathcal{A}$ and $\mathcal{C}$ are non-empty, $t_1$ is a bijection. From Lemma A.1, each $A \in \mathcal{A}$ that occurs with probability $p_A$ is mapped to a $C \in \mathcal{C}$ that occurs with probability at least $p_A \cdot \frac{f(w_{i+1})}{f(w_i)}$, and there exists at least one case in which this inequality is strict. Next, there exists a $t_2$ that takes each $B \in \mathcal{B}$ to a unique $C \in \mathcal{C}$: $t_2$ removes $w_i$ and replaces it with an $x \in U \setminus B$. The map $t_2$ sends a $B \in \mathcal{B}$ that occurs with probability $p_B$ to a $C \in \mathcal{C}$ that occurs with probability $p_C \ge p_B$. Therefore, Pr(D2∈B)  Pr(D2∈C) ≤ .

Combining this with
Pr(D2∈A) ). This implies $\Pr(w_i \in l_{\le j}(s_2(D_2))) < \Pr(w_{i+1} \in l_{\le j}(s_2(D_2)))$. Consider any $D_1$ with $w_i$ as the $j$th highest priority word under $s_2$. Given the choice of $i$, the first $j - 1$ words of $D_1$ are in order of decreasing frequency. Therefore, Algorithm 1 will output the first $j - 1$ words of $D_1$ according to $s_2$. At the $j$th step of the algorithm, since $\Pr(w_i \in l_{\le j}(s_2(D_2))) < \Pr(w_{i+1} \in l_{\le j}(s_2(D_2)))$, $w_i$ will not be output.
LEMMA 3.12. If there exists an , then $(\uparrow, \uparrow)$ cannot be a Bayesian-Nash equilibrium for any valuation function satisfying match-early preferences.

PROOF. Suppose s
Consider adjacent $w_i$ and $w_{i+1}$ with $f(w_i) > f(w_{i+1})$, and consider a $D_1$ that contains $w_i$ and $w_{i+1}$. We compare player 1's utility of playing $s_1^{\uparrow}(D_1)$ with the utility of playing the same ordering with $w_i$ and $w_{i+1}$ swapped; call the latter strategy $s'_1$. Assume that $w_i$ is the $j$th word of $s_1^{\uparrow}(D_1)$. Consider the following cases:
1. The match happens before $l_j$ or after $l_{j+1}$. The swap does not change the outcome.
2. The match happens in $l_j$. If the match does not happen on $w_i$, the swap does not change the outcome. If the match happens on $w_i$ and $w_{i+1} \notin D_2(j)$, the swap leads to a payoff of $v(l_{j+1})$ instead of $v(l_j)$. Otherwise, if $w_{i+1} \in D_2(j)$, the swap does not change the outcome.
3. The match happens in $l_{j+1}$. If the match does not happen on $w_{i+1}$, the swap does not change the outcome. If the match happens on $w_{i+1}$ and $w_{i+1} \in D_2(j)$, the swap leads to a payoff of $v(l_j)$ instead of $v(l_{j+1})$. Otherwise, if $w_{i+1} = l_{j+1}(s_2(D_2))$, the swap does not change the outcome.
Therefore, u(s . This expression will be negative for some $D_1 \in \mathcal{D}_1$ if and only if there exists a value of $\max(1, i-n+d) \le j \le \max(i, d-1)$ such that Pr(w ).
PROPOSITION 3.13. The second-stage strategy profile $(\uparrow, \uparrow)$ cannot be a Bayesian-Nash equilibrium of the second stage of the ESP game for any Zipfian distribution over $U$ with exponent $s \le 1$ and for any valuation function satisfying match-early preferences.
PROOF. From Lemma 3.12, it suffices to show there exists
LEMMA 4.10. There exists a solution to the second system of linear equations.
PROOF. We define the candidate solution to the linear system of equations as follows: for all $i, j$, if where $\Pr_{U_L^d}(D_L^i) = \Pr(D_L^i)$ denotes the probability of obtaining $D_L^i$ when sampling from the universe $U_L$ $d$ times without replacement. (Most of the time the $U_L^d$ notation is implied, but for the purposes of this proof it is necessary for clarity.) It is easy to see from this definition of $y_{ij}$ that $0 \le y_{ij} \le 1$ for all $i, j$. Also note that, for all $j$: Now it remains to show that the candidate $y_{ij}$ satisfies equations of the second type. Note that: where $a = |D_H^j \cap U_L|$ (i.e., the number of "low words" in $D_H^j$) and $\Pr_{(U_L \setminus D_H^j)^{d-a}}(S)$ denotes the probability of obtaining the set $S$ from $d - a$ samples from the set $U_L \setminus D_H^j$, where the sampling is done without replacement. Since the denominator of Eq. 7 equals 1, we get: where the last expression denotes the probability of obtaining $D_L^i \setminus D_H^j$ when sampling from $U_H \setminus D_H^j$ until getting $d - a$ low words (i.e., $d - a$ words from $U_L$). Now going back to equations of the second type: for all $i$.
LEMMA 4.11. If a solution to the second system of linear equations exists, then a solution to the first system of linear equations exists. Namely, this solution can be obtained by setting $x_{ij} = y_{ij} \cdot \frac{\Pr(D_{H,j} \mid H)}{\Pr(D_{L,i} \mid L)}$.
PROOF. Since a solution exists to the second system of linear equations, there exists a set of $y_{ij}$ that satisfies:
LEMMA 4.14. For every $D_{1,L}$, where $D_{1,L}$ is a dictionary sampled with respect to the $L$ effort level, and for every $g$ that maps $D_{1,L}$ to a dictionary $D_{1,H} \in \mathcal{D}_H$ such that $D_{1,H} \cap U_L \subseteq D_{1,L}$, when both players play increasing frequency in the second stage, and for all utility functions that satisfy rare-words preferences and $\alpha_k \ge \frac{\Pr(w_{n-k} \in D_H)}{\Pr(w_{n-k+1} \in D_H)}$ for all $k$, we have that: In addition, the inequality is strict when $g(D_{1,L}) \ne D_{1,L}$.
PROOF. Lemma 4.13 tells us that each $D_L$ is mapped to a $D_H$ that dominates it. Therefore, it suffices to show: .., w 1 } (sorted in order of increasing frequency), where w j = w j for all j ≠ i and f (w i ) < f (w i ). )) = 1 j=d t j • v(w j ) for analogous definition of the t j 's. Since D and D differ only in the i th coordinate, t j = t j for all d ≥ j ≥ i + 1. Therefore: t j > t j for all i − 1 ≥ j ≥ 1, so t j • v(w j ) > t j • v(w j ) for all i − 1 ≥ j ≥ 1 and therefore,
ACM Journal Name, Vol. 9, No. 4, Article 39, Publication date: March 2010.
Finally it suffices to show that t i • v(w i ) ≥ t i • v(w i ).We know from the statement of the theorem that for all u that satisfy rare-words preferences and α k ≥ Pr(w n−k ∈D H ) Pr(w n−k+1 ∈D H ) for all k.PROOF.From Lemma 4.12, we know that: where the inequality follows from Lemma 4.14.
PROPOSITION 4.18. There exists a distribution over $U$ for which $((H, s_1), (H, s_2))$ cannot be a Bayesian-Nash equilibrium of the ESP game for any pair of consistent second-stage strategies $s_1, s_2$ and for any utility function satisfying match-early preferences.
PROOF. We map each high dictionary to a set of low dictionaries, using the mapping from Section 4. We use the following distribution over $U$: each low word is sampled with probability Since the distribution described is uniform over the low-effort words, if player 1 chooses $L$ effort, his second-stage strategy is given by $s_2$ applied to the set of $L$ words (from Lemma 3.9).
It suffices to show that
$\sum_{D_2} \Pr(D_2) \cdot I\big(l_{\le k}(s_2(h(D_{1,H}))) \cap l_{\le k}(s_2(D_2)) \ne \emptyset\big) \ge \sum_{D_2} \Pr(D_2) \cdot I\big(l_{\le k}(s_2(D_{1,H})) \cap l_{\le k}(s_2(D_2)) \ne \emptyset\big)$
for all $k$ and all $D_{1,H}$, with the inequality strict for some $D_{1,H}$ and some value of $k$. Under the mapping, each dictionary $D_{1,H}$ is mapped to all dictionaries $D_{1,L}$ that satisfy $D_{1,H} \cap U_L \subseteq D_{1,L}$. Consider the prefix $l_{\le k}(s_2(D_{1,H}))$ and the prefix $l_{\le k}(s_2(D_{1,L}))$, where $D_{1,H}$ is mapped to $D_{1,L}$ under $h$. If $l_{\le k}(s_2(D_{1,H})) \ne l_{\le k}(s_2(h(D_{1,H})))$, then $l_{\le k}(s_2(D_{1,H}))$ contains a set of high words and/or a set of low words. If it contains some low words that are not in $l_{\le k}(s_2(D_{1,L}))$, these words are in $D_{1,L}$ but have lower priority than all of the words in $l_{\le k}(s_2(D_{1,L}))$. It suffices to show
$\sum_{D_2} \Pr(D_2) \cdot I\big(l_{\le k}(s_2(D)) \cap l_{\le k}(s_2(D_2)) \ne \emptyset\big) \ge \sum_{D_2} \Pr(D_2) \cdot I\big(l_{\le k}(s_2(D')) \cap l_{\le k}(s_2(D_2)) \ne \emptyset\big)$
for a pair of dictionaries $D$ and $D'$ where $l_{\le k}(s_2(D')) = l_{\le k}(s_2(D)) - \{w_i\} + \{w_j\}$ and either $w_i, w_j \in U_L$ with $w_i \succ w_j$ under $s_2$, or $w_i \in U_L$ and $w_j \in U_H$. First we handle the case where $w_i, w_j \in U_L$ with $w_i \succ w_j$ under $s_2$. Since $w_i \succ w_j$ under $s_2$ and $f(w_i) = f(w_j)$, $\Pr(w_i \in l_{\le k}(s_2(D_2))) > \Pr(w_j \in l_{\le k}(s_2(D_2)))$ by Lemma 3.9. Therefore the desired inequality is satisfied in this case, and it is strict. Now we handle the case where $w_i \in U_L$ and $w_j \in U_H$. Since $w_i$ is in the top $k$ words of $D$, there exists at least one dictionary $D_2$ with $w_i$ in the top $k$ words. This dictionary occurs with greater probability than any high-effort word occurs in a dictionary $D_2$. Therefore the desired inequality is satisfied in this case, and it is strict. Since there exists at least one $D_{1,H}$ such that $h(D_{1,H}) \ne D_{1,H}$, there exist a value of $k$ and a $D_{1,H}$ for which the inequality is strict. Therefore, playing $(L, s_2)$ is a strict ordinal best response to $(H, s_2)$.
LEMMA A.1. If dictionary $D$ and dictionary $D'$ differ in exactly one element, $x_i$ and $x'_i$ respectively, with $f_{e_1}(x_i) < f_{e_2}(x'_i)$, then dictionary $D'$ is sampled with strictly greater probability than dictionary $D$, as long as $e_1 \ge e_2$.
PROOF. If $e_1 > e_2$, then $\Pr(D' \mid e_2) > \Pr(D' \mid e_1)$. Therefore it suffices to show the claim for $e_1 = e_2$. A particular dictionary can be sampled in one of $d!$ orders. Each permutation of $D$ has a corresponding permutation of $D'$ obtained by replacing $x_i$ with $x'_i$. Let $A = a_1, \ldots, a_d$ be a permutation of $D$ and let $A' = a'_1, \ldots, a'_d$ be the corresponding permutation of $D'$, where $A$ and $A'$ differ in coordinate $j$. $A$ is sampled with probability $\Pr(a_1)\Pr(a_2 \mid a_1)\cdots\Pr(a_d \mid a_1, a_2, \ldots, a_{d-1})$ and $A'$ is sampled with probability $\Pr(a'_1)\Pr(a'_2 \mid a'_1)\cdots\Pr(a'_d \mid a'_1, a'_2, \ldots, a'_{d-1})$. We know $\Pr(a_k \mid a_1, \ldots, a_{k-1}) = \Pr(a'_k \mid a'_1, \ldots, a'_{k-1})$ for all $k < j$ and $\Pr(a_k \mid a_1, \ldots, a_{k-1}) < \Pr(a'_k \mid a'_1, \ldots, a'_{k-1})$ for all $k \ge j$: at $k = j$ the drawn element is more frequent, and for $k > j$ the same element is drawn from a pool with smaller remaining mass. Hence, for each permutation of $D$, there exists a corresponding permutation of $D'$ that is sampled with strictly greater probability, and $\Pr(D' \mid e_1) > \Pr(D \mid e_1)$.
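The permutation argument translates directly into a computation of $\Pr(D)$ by summing over the $d!$ orders. The Python sketch below does this for a single effort level (the case $e_1 = e_2$ to which the proof reduces) and checks the lemma exhaustively on a small universe; the frequencies $f(w_i) \propto 1/i$ and the function names are illustrative assumptions.

```python
from itertools import combinations, permutations
from math import isclose


def dictionary_prob(D, freq):
    """Pr(D) under one effort level: sum over the d! orders of D of the product
    of conditional draw probabilities (draws without replacement, proportional
    to frequency, from the whole of `freq`)."""
    total = sum(freq.values())
    prob = 0.0
    for order in permutations(D):
        q, remaining = 1.0, total
        for w in order:
            q *= freq[w] / remaining
            remaining -= freq[w]
        prob += q
    return prob


def lemma_a1_holds(freq, d):
    """Swap one word of each d-word dictionary for a more frequent word outside it
    and check that the sampling probability strictly increases."""
    words = list(freq)
    for D in combinations(words, d):
        for x in D:
            for x_new in words:
                if x_new in D or freq[x_new] <= freq[x]:
                    continue
                D_prime = (set(D) - {x}) | {x_new}
                if not dictionary_prob(D_prime, freq) > dictionary_prob(D, freq):
                    return False
    return True


freq = {f"w{i}": 1.0 / i for i in range(1, 7)}  # illustrative frequencies
print(lemma_a1_holds(freq, 3))  # True
print(isclose(sum(dictionary_prob(D, freq) for D in combinations(freq, 3)), 1.0))  # True
```

The second check confirms that the per-dictionary probabilities form a distribution, which is a useful guard when reusing `dictionary_prob` elsewhere.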
Under our randomized mapping, $D$ is mapped to all dictionaries $D_L \in \mathcal{D}_L$ such that $A \subseteq D_L$. In other words, $D$ is mapped to dictionary $D_L \in \mathcal{D}_L$ with non-zero probability if and only if $A \subseteq D_L$. If $A \subseteq D_L$, then $D$ is mapped to $D_L$ with the same probability that one would obtain $D_L$ by continuing to sample individual words from $U_H$ (without replacement) until getting $d$ "low words".
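LEMMA 4.13. If $D_H \in \mathcal{D}_H$ satisfies the property $D_H \cap U_L \subseteq D_L$ for a dictionary $D_L \in \mathcal{D}_L$, then $D_H$ dominates $D_L$. The dominance is strict when $D_H \ne D_L$.
PROOF. Let $D_L = \{w_d, w_{d-1}, \ldots, w_1\}$ and let $D_H = \{w'_d, w'_{d-1}, \ldots, w'_1\}$ (sorted in order of increasing frequency), where $D_H \cap U_L \subseteq D_L$.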
Definition 4.19. Additive Discount Property: Under rare-words preferences, a utility function $v(o)$ over the total ordering of outcomes $o_1 \succ o_2 \succ \cdots \succ o_m$ satisfies the additive discount property if and only if, for each pair of adjacent outcomes $o_j$ and $o_{j+1}$, $v(o_j) - v(o_{j+1}) = c$ for some constant $c > 0$, and $v(o_m) = 0$.
Definition 4.20. Multiplicative Discount Property: Under rare-words preferences, a utility function $v(o)$ over the total ordering of outcomes $o_1 \succ o_2 \succ \cdots \succ o_m$ satisfies the multiplicative discount property if and only if, for each pair of adjacent outcomes $o_j$ and $o_{j+1}$, $v(o_j)/v(o_{j+1}) \ge r$ for some constant $r > 1$.
THEOREM 1.4.
and replacing it with $w_d$ in $\mathcal{C}$. Since $w_{d-1}$ and $w_d$ are adjacent in $s_2$ and $\mathcal{A}$ and $\mathcal{C}$ are non-empty, $t_1$ is a bijection. Each $A \in \mathcal{A}$ that occurs with probability $p_A$ is mapped to a $C \in \mathcal{C}$ that occurs with probability greater than $p_A \cdot \frac{f(w_d)}{f(w_{d-1})}$ (from Lemma A.1).
$\sum_{j=1}^{|\mathcal{D}_H|} \Pr(D_H^j) \cdot y_{ij} = \Pr(D_L^i)$ for all $1 \le i \le |\mathcal{D}_L|$, or equivalently $\sum_{j=1}^{|\mathcal{D}_H|} \frac{\Pr(D_H^j)}{\Pr(D_L^i)} \cdot y_{ij} = 1$ for all $1 \le i \le |\mathcal{D}_L|$. Observe that this gives a solution to the first type of equation in the first linear system of equations, where $x_{ij} = y_{ij} \cdot \frac{\Pr(D_H^j)}{\Pr(D_L^i)}$. This same set of $y_{ij}$ satisfies $\sum_{i=1}^{|\mathcal{D}_L|} y_{ij} = 1$ for all $1 \le j \le |\mathcal{D}_H|$. Substituting $y_{ij} = x_{ij} \cdot \frac{\Pr(D_L^i)}{\Pr(D_H^j)}$ gives $\sum_{i=1}^{|\mathcal{D}_L|} x_{ij} \cdot \frac{\Pr(D_L^i)}{\Pr(D_H^j)} = 1$ for all $1 \le j \le |\mathcal{D}_H|$, or equivalently $\sum_{i=1}^{|\mathcal{D}_L|} x_{ij} \cdot \Pr(D_L^i) = \Pr(D_H^j)$ for all $1 \le j \le |\mathcal{D}_H|$. Finally, we need to show that if $x_{ij} = y_{ij} \cdot \frac{\Pr(D_H^j)}{\Pr(D_L^i)}$, then $0 \le x_{ij} \le 1$. Note that $y_{ij} = 0$ when $D_H^j \cap U_L \not\subseteq D_L^i$, so $x_{ij} = 0$ in that case. When $D_H^j \cap U_L \subseteq D_L^i$, $0 < y_{ij} \le 1$. Also note that when $D_H^j \cap U_L \subseteq D_L^i$, $D_H^j$ and $D_L^i$ have some number $k$ of words in common and the remaining $d - k$ words in $D_H^j$ are all "high" words. Therefore, the remaining $d - k$ words in $D_H^j$ all have lower frequency than the remaining $d - k$ words in $D_L^i$. By repeatedly applying Lemma A.1, we get $\Pr(D_H^j) < \Pr(D_L^i)$, and therefore $0 \le x_{ij} \le 1$.