Pairwise Comparison and Selection Temperature in Evolutionary Game Dynamics

Recently, the frequency dependent Moran process has been introduced in order to describe evolutionary game dynamics in finite populations. Here, an alternative to this process is investigated that is based on pairwise comparison between two individuals. We follow a long tradition in the physics community and introduce a temperature (of selection) to account for stochastic effects. We calculate the fixation probabilities and fixation times for any symmetric 2 × 2 game, for any intensity of selection and any initial number of mutants. The temperature can be used to gauge continuously from neutral drift to the extreme selection intensity known as imitation dynamics. For some payoff matrices the distribution of fixation times can become so broad that the average value is no longer very meaningful.

Evolutionary game theory has become a standard approach to describe the evolutionary dynamics of a population consisting of different types of interacting individuals under frequency dependent selection. In the traditional approach, one assumes that individuals meet each other at random in infinitely large, well-mixed populations. The replicator dynamics describes how the abundance of strategic types in a population changes based on their fitness, identified with the payoff resulting from the game. In this deterministic formulation, individuals with higher fitness increase in abundance and, ultimately, the system reaches a stable fixed point in which the population may consist either of a single type or of a mixture of different types (Taylor and Jonker, 1978; Sigmund, 1998, 2003).
Recently, it has been shown that the finiteness of populations may lead to fundamental changes in this picture due to stochastic effects (Taylor et al., 2004; Imhof et al., 2005; Imhof and Nowak, 2006). The fitness, $F$, of an individual is a linear function of that individual's payoff $\pi$:

$$F = 1 - w + w\pi. \qquad (1)$$

The parameter $w \in [0, 1]$ denotes the intensity of selection. For $w = 1$, fitness equals payoff. This scenario describes "strong selection". For $w \ll 1$, the payoff only provides a small perturbation to the overall fitness of an individual, a limit known as weak selection. Weak selection is an important concept for two reasons: (i) Many analytical results can only be obtained in the limit of weak selection, but extend in good approximation to much larger values of $w$. (ii) It is not unreasonable to assume that the fitness of an individual is the consequence of many factors (and games) but only a particular game is under consideration here. This assumption naturally leads to the "weak selection" scenario.
In this framework disadvantageous mutants have a small, yet non-zero probability to reach fixation in a finite population. Conversely, it is not always certain that advantageous mutants take over the entire population. Both effects become more pronounced the weaker the selection intensity, and the smaller the population size. Indeed, whenever the degree of stochasticity is high, these effects become important and lead to a new concept of evolutionary stability (Wild and Taylor, 2004; Traulsen et al., 2006c).
Finite-size populations are an ever-present ingredient in individual-based computer simulations which naturally incorporate stochastic effects. Moreover, instead of studying the fixation probability of a single mutant in the limit of weak selection, many individual-based simulation studies address the evolutionary fate of populations that contain a higher number of mutants at start. In this context, different intensities of selection have been employed, ranging from strong selection, captured by the finite-population analogue of replicator dynamics (Hauert and Doebeli, 2004;Santos and Pacheco, 2005;Santos et al., 2006) to an extreme selection pressure modeled in terms of the so-called imitation dynamics, used as a metaphor of cultural evolution (Nowak and May, 1992;Huberman and Glance, 1993;Nowak et al., 1994;Zimmermann et al., 2005).
In this work we present an approach to investigate the evolution of cooperation as a function of the initial fraction of cooperators present in the population at the start of some evolutionary process, and as a function of the intensity of selection. As a result, we bridge the gap between the recently developed evolutionary game theory in finite populations and common practice in individual-based computer simulations. To this end we address the problem of the fixation of a given trait as well as how long it takes for fixation to occur. We are particularly interested in investigating the effects of stochasticity in the distribution of fixation times, and to which extent the average fixation times provide an accurate description of the overall evolutionary dynamics for different games and at all temperatures of selection.
We make use of a simple evolutionary dynamics which recovers the fixation probabilities of the frequency-dependent Moran process in the limit of weak selection but which, unlike the Moran process, enables us to study the fixation probability for any value of the intensity of selection, all the way up to the extreme limit of imitation dynamics. Under such strong selection, an individual with higher fitness will always replace an individual with lower fitness. Evolutionary game dynamics in finite populations has also been studied in a frequency dependent Wright Fisher process (Imhof and Nowak, 2006). For further models of finite population game dynamics, see (Riley, 1979;Schaffer, 1988;Fogel et al., 1998;Ficci and Pollack, 2000;Schreiber, 2001).

Evolutionary Dynamics in finite populations
Let us consider symmetric two-player games in which two types of individuals, A and B, interact via the payoff matrix

$$\begin{array}{c|cc} & A & B \\ \hline A & a_{11} & a_{12} \\ B & a_{21} & a_{22} \end{array}$$

In the simplest case, the payoffs of A and B individuals only depend on the fraction of both types in the population. If there are $i$ individuals of type A and $N - i$ individuals of type B, then the A and B individuals have payoffs $\pi_A = (i-1)a_{11} + (N-i)a_{12}$ and $\pi_B = i\,a_{21} + (N-i-1)a_{22}$, respectively. Self interactions are excluded.
Here, we consider a process based on pairwise comparison between individuals. Two individuals, A and B, are selected at random. The individual chosen for reproduction, A, replaces B with probability $p$, which depends on the payoff difference $\pi_A - \pi_B$ between the two individuals. The composition of the population can only change if the two individuals are of different types. We follow (Blume, 1993; Szabó and Tőke, 1998; Hauert and Szabó, 2005) in choosing the Fermi function from statistical physics for $p$,

$$p = \frac{1}{1 + e^{-\beta(\pi_A - \pi_B)}}.$$

The parameter β ≥ 0, which corresponds to an inverse temperature in statistical physics, controls the intensity of selection and replaces w defined in Eq. 1. Small β (high temperature) means that selection is almost neutral, whereas for large β (low temperature), selection can become arbitrarily strong. With decreasing intensity of selection β, the probability for reproduction of the advantageous type in the population decreases from 1 to 1/2, selection becoming neutral for β = 0.
A major advantage of the pairwise comparison process over the frequency dependent Moran process is that the payoff matrix can contain unrestricted positive and negative entries, whereas the frequency dependent Moran process requires all fitness values to be positive. In contrast to the frequency dependent Moran process, the pairwise comparison process is invariant to adding a constant to all entries of the payoff matrix, as it only depends on payoff differences. Multiplying the payoff matrix by a positive constant is equivalent to rescaling the intensity of selection β by the same constant.
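The two invariance properties just stated can be checked numerically. The following is a minimal sketch, assuming the Fermi function has the standard form p = 1/(1 + exp(−β(π_A − π_B))); the function names and the example payoff entries are hypothetical and chosen only for illustration.

```python
import math

def payoffs(i, N, a11, a12, a21, a22):
    """Payoffs of an A and a B individual when i of N individuals are A
    (self interactions excluded)."""
    pi_A = (i - 1) * a11 + (N - i) * a12
    pi_B = i * a21 + (N - i - 1) * a22
    return pi_A, pi_B

def fermi(delta_pi, beta):
    """Probability that A replaces B for payoff difference delta_pi = pi_A - pi_B."""
    x = -beta * delta_pi
    if x > 700.0:          # exp(x) would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(x))

def replacement_probability(i, N, beta, matrix):
    pi_A, pi_B = payoffs(i, N, *matrix)
    return fermi(pi_A - pi_B, beta)

# Hypothetical example entries (a11, a12, a21, a22), for illustration only.
matrix = (0.75, 0.5, 1.0, 0.0)
shifted = tuple(a + 3.0 for a in matrix)   # add a constant to every entry
doubled = tuple(2.0 * a for a in matrix)   # multiply every entry by 2

p = replacement_probability(5, 20, 0.1, matrix)
print(p, replacement_probability(5, 20, 0.1, shifted))    # same value
print(p, replacement_probability(5, 20, 0.05, doubled))   # same value: beta halved
```

Because only the payoff difference enters, negative payoffs need no special treatment, which is the practical advantage over a fitness that must stay positive.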
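The transition probabilities to change the number of A individuals from $j$ to $j \pm 1$ are given by

$$P_j^{\pm} = \frac{j}{N}\,\frac{N-j}{N}\,\frac{1}{1 + e^{\mp\beta(\pi_A - \pi_B)}}. \qquad (4)$$

For weak selection, $\beta \ll 1$, we can expand the Fermi function and the transition probabilities become

$$P_j^{\pm} \approx \frac{j}{N}\,\frac{N-j}{N}\left[\frac{1}{2} \pm \frac{\beta}{4}(\pi_A - \pi_B)\right]. \qquad (5)$$

In the frequency dependent Moran process, an individual is chosen at random proportional to its fitness. Its identical offspring then replaces a randomly chosen individual. This amounts to the transition probabilities

$$P_j^{+} = \frac{j\,(1 - w + w\pi_A)}{j\,(1 - w + w\pi_A) + (N-j)(1 - w + w\pi_B)}\,\frac{N-j}{N}, \qquad P_j^{-} = \frac{(N-j)(1 - w + w\pi_B)}{j\,(1 - w + w\pi_A) + (N-j)(1 - w + w\pi_B)}\,\frac{j}{N}.$$

The expansion of these transition probabilities for weak selection, $w \ll 1$, leads to

$$P_j^{+} \approx \frac{j}{N}\,\frac{N-j}{N}\left[1 + w\,\frac{N-j}{N}(\pi_A - \pi_B)\right], \qquad P_j^{-} \approx \frac{j}{N}\,\frac{N-j}{N}\left[1 - w\,\frac{j}{N}(\pi_A - \pi_B)\right].$$

While these transition probabilities are different from Eq. (5) for weak selection, the ratio $P_j^-/P_j^+$ is identical for the frequency dependent Moran process and the pairwise comparison process discussed here under weak selection. For $w \ll 1$, we obtain for the frequency dependent Moran process

$$\frac{P_j^-}{P_j^+} = \frac{1 - w + w\pi_B}{1 - w + w\pi_A} \approx 1 - w(\pi_A - \pi_B).$$

For the pairwise comparison process, $P_j^-/P_j^+ = e^{-\beta(\pi_A - \pi_B)} \approx 1 - \beta(\pi_A - \pi_B)$, which is the identical result with $w \leftrightarrow \beta$. As this ratio of transition probabilities determines the fixation probability (as discussed below), both processes have the same fixation properties for weak selection.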
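The agreement of the two ratios under weak selection can be illustrated numerically. This is a sketch under stated assumptions: fitness in the Moran process is taken as 1 − w + wπ, both individuals are drawn uniformly from the whole population (giving the factor j(N − j)/N²), and the function names and numerical values are hypothetical.

```python
import math

def pairwise_transitions(j, N, beta, d):
    """(P_plus, P_minus) for the pairwise comparison process;
    d = pi_A - pi_B evaluated at state j."""
    meet = (j / N) * ((N - j) / N)       # probability of drawing an A-B pair in the right roles
    return meet / (1 + math.exp(-beta * d)), meet / (1 + math.exp(beta * d))

def moran_transitions(j, N, w, pi_A, pi_B):
    """(P_plus, P_minus) for the frequency dependent Moran process
    with fitness 1 - w + w * payoff (assumed form of Eq. 1)."""
    f_A, f_B = 1 - w + w * pi_A, 1 - w + w * pi_B
    total = j * f_A + (N - j) * f_B
    p_plus = (j * f_A / total) * ((N - j) / N)
    p_minus = ((N - j) * f_B / total) * (j / N)
    return p_plus, p_minus

# Illustrative state: N = 20, j = 5, payoffs pi_A = 10.5 and pi_B = 5.0.
N, j, pi_A, pi_B, eps = 20, 5, 10.5, 5.0, 1e-3
pp, pm = pairwise_transitions(j, N, eps, pi_A - pi_B)
mp, mm = moran_transitions(j, N, eps, pi_A, pi_B)
print("pairwise ratio:", pm / pp)
print("Moran ratio:   ", mm / mp)
print("1 - eps * d:   ", 1 - eps * (pi_A - pi_B))
```

The individual transition probabilities differ by roughly a factor of two between the processes, which affects time scales but not the ratio that enters the fixation probability.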

Fixation probabilities
Under pairwise comparison, and in the absence of mutations, the total number of individuals with a given strategy can change by one only when the two individuals chosen have different strategies. This defines a finite state Markov process with an associated tri-diagonal transition matrix, a so-called birth-death process (Karlin and Taylor, 1975; Ewens, 2004). In general, the probability to reach the absorbing state with 100% A given that the initial number of A individuals is $k$ can be written as

$$\phi_k = \frac{\sum_{i=0}^{k-1}\prod_{j=1}^{i} P_j^-/P_j^+}{\sum_{i=0}^{N-1}\prod_{j=1}^{i} P_j^-/P_j^+}. \qquad (11)$$

Here $P_j^+$ is the probability to increase the number of A individuals from $j$ to $j+1$ and $P_j^-$ is the probability to decrease that number from $j$ to $j-1$ (cf. Eq. 4). We use the usual convention that $\prod_{j=1}^{0} x = 1$ for any $x$. Due to the sums of products in this equation, a numerical implementation is prone to errors. In Traulsen et al. (2006b), we have shown that the following analytical expression obtained by replacing the sums by integrals constitutes an excellent approximation for the fixation probability under the pairwise comparison rule:

$$\phi_k = \frac{\operatorname{erf}(\xi_k) - \operatorname{erf}(\xi_0)}{\operatorname{erf}(\xi_N) - \operatorname{erf}(\xi_0)}, \qquad (12)$$

where $\xi_k = \sqrt{\beta/u}\,(ku + v)$, with $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x dy\, e^{-y^2}$ being the error function. The fitness difference can be written as $\pi_A - \pi_B = 2uj + 2v$, with $2u = a_{11} - a_{12} - a_{21} + a_{22}$ and $2v = -a_{11} + N a_{12} - N a_{22} + a_{22}$. The quantity $u$ measures the frequency dependence of payoffs: For $u = 0$, the fitness difference is independent of the number of A and B individuals. For large $N$, the quantity $v$ measures the advantage of an A individual paired against a B individual compared to the interaction of two B individuals.
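To make the comparison between the exact sum and its integral approximation concrete, here is a minimal sketch for the case u > 0. It assumes the payoff difference π_A − π_B = 2uj + 2v with 2u = a11 − a12 − a21 + a22 and 2v = −a11 + N a12 − N a22 + a22 (which follows from the payoffs defined earlier), and it assumes the closed form uses ξ_k = sqrt(β/u)(ku + v). Function names and the example payoff matrix are hypothetical.

```python
import math

def payoff_difference_coeffs(N, a11, a12, a21, a22):
    """Coefficients u, v such that pi_A - pi_B = 2*u*j + 2*v."""
    u = (a11 - a12 - a21 + a22) / 2
    v = (-a11 + N * a12 - N * a22 + a22) / 2
    return u, v

def fixation_exact(k, N, beta, u, v):
    """Exact birth-death result (sum of products of P^-_j / P^+_j).
    For the Fermi rule, prod_{j=1}^{i} P^-_j/P^+_j = exp(-beta*(u*i*(i+1) + 2*v*i))."""
    terms = [math.exp(-beta * (u * i * (i + 1) + 2 * v * i)) for i in range(N)]
    return sum(terms[:k]) / sum(terms)

def fixation_erf(k, N, beta, u, v):
    """Error-function approximation; as written this needs u > 0 and beta > 0."""
    scale = math.sqrt(beta / u)
    xi = lambda m: scale * (m * u + v)
    return (math.erf(xi(k)) - math.erf(xi(0))) / (math.erf(xi(N)) - math.erf(xi(0)))

# Hypothetical coordination-type game with u > 0.
N, beta = 20, 0.05
u, v = payoff_difference_coeffs(N, 1.0, 0.0, 0.0, 0.5)
for k in (1, 5, 10, 15, 19):
    print(k, fixation_exact(k, N, beta, u, v), fixation_erf(k, N, beta, u, v))
```

The exact sum is written with a single exponential per term, which avoids multiplying many ratios explicitly; for very large β·N² the exponentials can still overflow, so this sketch is meant for moderate selection intensities.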
Let us first consider the case of $u > 0$. The fixation probability can be approximated for weak selection, using the expansion of the error function, $\operatorname{erf}(x) \approx \frac{2}{\sqrt{\pi}}\left(x - \frac{x^3}{3}\right)$ for $x \ll 1$. This expansion leads to the fixation probability

$$\phi_k \approx \frac{k}{N} + \beta\,\frac{k(N-k)}{N}\left[\frac{(N+k)\,u}{3} + v\right]. \qquad (13)$$

For $k = 1$, this is identical to the weak selection result for the frequency dependent Moran process. This shows again the equivalence with this process for weak selection. For $u = 0$, we find instead from Eq. (11) (or, equivalently, from Eq. (12) in the limit $u \to 0$)

$$\phi_k = \frac{e^{-2\beta v k} - 1}{e^{-2\beta v N} - 1}, \qquad (14)$$

which is identical to the fixation probability of $k$ individuals with fixed relative fitness $r = e^{2\beta v}$ (Kimura, 1968; Crow and Kimura, 1970; Ewens, 2004). Eq. (14) holds for all payoff matrices where $a_{11} - a_{12} = a_{21} - a_{22}$, a condition known as "equal gains from switching" (Nowak and Sigmund, 1990). For the pairwise comparison process, it actually describes frequency independent selection, because the payoff difference is constant. For weak selection, we can apply $\exp(x) \approx 1 + x + x^2/2$ and end up with

$$\phi_k \approx \frac{k}{N} + \beta v\,\frac{k(N-k)}{N},$$

which is identical to Eq. (13) for $u = 0$, as it should.
Finally, let us discuss the case of $u < 0$. In this case, Eq. (12) is still valid, but the arguments $\xi_j$ of the error functions are now imaginary with vanishing real part. However, since $\operatorname{erf}(ix) = i\,\operatorname{erfi}(x)$, where $\operatorname{erfi}(x)$ is the imaginary error function, $i$ cancels in the equation and the result is a real number which still fulfills $0 \le \phi_k \le 1$. For weak selection, the arguments of this function become small and the imaginary error function can be approximated by $\operatorname{erfi}(x) \approx \frac{2}{\sqrt{\pi}}\left(x + \frac{x^3}{3}\right)$ for $x \ll 1$. This expansion leads again to Eq. (13).
In contrast to Eq. (11), the closed analytical fixation probability Eq. (12) can also be approximated for very strong selection, $\beta \gg 1$, using the appropriate asymptotics of the error functions (Gradshteyn and Ryzhik, 1994). In the limit $\beta \to \infty$ and for $u = 0$ the fixation probability is given by $\phi_k = 1 - \delta_{k,0}$ for advantageous mutants, $v > 0$, and $\phi_k = \delta_{k,N}$ for disadvantageous mutants, $v < 0$. Here, $\delta_{i,j}$ denotes the Kronecker symbol, which is one if both indices are equal and zero otherwise.
Eqs. (12) and (14) are approximations to Eq. (11) with an associated error of order $N^{-2}$. However, even for populations as small as $N = 20$ excellent agreement with numerical simulations is obtained, as shown in Fig. 1, see also Traulsen et al. (2006b). These expressions are valid for any pressure of selection and allow a straightforward analysis of limiting cases: For $\beta = 0$, both equations (12) and (14) reduce to $\phi_k = k/N$, the result for neutral drift (Kimura, 1968). For $\beta \ll 1$ we have weak selection and the linear term in $\beta$ yields an approximation for the fixation probabilities starting from an arbitrary number of mutants. Strong selection is described by $\beta \gg 1$ and reduces the process to a semi-deterministic imitation process. The speed of this process remains stochastic, but the direction always increases individual fitness for $\beta \to \infty$. This limit is outside the realm of the frequency dependent Moran process and results from the nonlinearity of the Fermi function.
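The limiting cases listed above can be explored for the frequency independent case u = 0. The sketch below assumes the closed form φ_k = (exp(−2βvk) − 1)/(exp(−2βvN) − 1), i.e. the fixed relative fitness result with r = exp(2βv); the function name is hypothetical, and the rearrangement for v < 0 is only a numerical device to avoid overflow at large β.

```python
import math

def fixation_equal_gains(k, N, beta, v):
    """Fixation probability of k mutants for u = 0 (constant payoff difference 2v)."""
    a = 2 * beta * v
    if a == 0:
        return k / N                                   # neutral drift
    if a > 0:
        return math.expm1(-a * k) / math.expm1(-a * N)
    # a < 0: factor out the dominant exponential so nothing overflows
    return math.exp(a * (N - k)) * math.expm1(a * k) / math.expm1(a * N)

N = 20
print(fixation_equal_gains(7, N, 0.0, 1.3))            # neutral limit, k/N
print(fixation_equal_gains(10, N, 1e-4, -5.25))        # weak selection
print(0.5 + 1e-4 * (-5.25) * 10 * (N - 10) / N)        # linear approximation in beta
print(fixation_equal_gains(1, N, 100.0, 1.0))          # strong selection, advantageous
print(fixation_equal_gains(19, N, 100.0, -1.0))        # strong selection, disadvantageous
```

Using `math.expm1` keeps the weak selection regime accurate, where exp(x) − 1 would otherwise lose digits to cancellation.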

Fixation Times
Since the evolutionary process in a finite population is intrinsically stochastic, the system will always end up in one of the two absorbing states, corresponding to 100% individuals of type A or of type B. The average time $t_k$ that the system spends in the transient states $1, \ldots, N-1$ starting from $k$ before it reaches fixation in $k = 0$ or $k = N$ is determined by the equation

$$t_k = 1 + P_k^-\,t_{k-1} + \left(1 - P_k^- - P_k^+\right)t_k + P_k^+\,t_{k+1}.$$

Three different fixation times are of interest. Two are conditional fixation times: Given that the process reaches the state $k = 0$ with B individuals only, how long does this process take? If instead the state $k = N$ is reached, what is the associated time? Finally, it is also of interest to find the unconditional fixation time, that is, the time it takes until the process reaches either of the absorbing states $k = 0$ or $k = N$.
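In the Appendix, we show that this average unconditional fixation time is given by

$$t_k = -t_1\sum_{j=k}^{N-1}\chi_{j+1}^{-j} + \sum_{j=k}^{N-1}\sum_{l=1}^{j}\frac{1}{P_l^+}\,\frac{\chi_{j+1}^{-j}}{\chi_{l+1}^{-l}}, \qquad (17)$$

where

$$t_1 = \phi_1\sum_{j=1}^{N-1}\sum_{l=1}^{j}\frac{1}{P_l^+}\,\frac{\chi_{j+1}^{-j}}{\chi_{l+1}^{-l}}$$

and $\chi_l = \exp[\beta l u + 2\beta v]$. For neutral selection ($\beta = 0$), we have $t_1 = t_{N-1} = 2N\sum_{l=1}^{N-1} l^{-1}$ elementary time steps, so that the fixation time measured in generations ($N$ elementary time steps) increases logarithmically with $N$. In general, the unconditional fixation time $t_k$ increases with the distance to the absorbing boundaries. However, when the intensity of selection is so high that the system will virtually always reach fixation in a particular state, the unconditional fixation time can increase monotonically towards the boundary at which fixation is not observed.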
Adopting the theory outlined in (Antal and Scheuring, 2006), we can also compute the conditional average number of time steps $\tau_k^0$ required to reach the absorbing state 0 given that the state 0 is reached (and not state $N$). This conditional fixation time $\tau_k^0$ increases with increasing $k$ for all games, as the system always has to pass states with lower $k$ before fixation. For $k = 0$ we have $\tau_0^0 = 0$, whereas $\tau_k^0$ diverges for $k = N$. Similarly, the average conditional time $\tau_k^N$ to reach state $N$ can be calculated. It is zero for $k = N$ and increases with decreasing $k$, diverging for $k = 0$, independently of the game. For general β, the average fixation times can be computed numerically from Eqs. (17), (28) and (29) (see Appendix). On the other hand, the average fixation times will only provide an accurate description of the game dynamics to the extent that the probability distribution of fixation times is sharply peaked around the average value discussed so far. In the following we examine this issue by means of numerically exact simulations for concrete examples involving different games and intensities of selection.
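The unconditional fixation times can be obtained numerically by iterating the one-step recursion directly rather than evaluating nested sums. The sketch below is one possible implementation under stated assumptions: transition probabilities P^±_j = (j/N)((N − j)/N)/(1 + exp(∓β(2uj + 2v))), and the recursion σ_j = (P^-_j/P^+_j)σ_{j−1} + 1/P^+_j for σ_j = t_j − t_{j+1} with t_0 = t_N = 0. Function names are hypothetical.

```python
import math

def transition_probs(j, N, beta, u, v):
    """(P_plus, P_minus) at state j for payoff difference 2*u*j + 2*v."""
    d = 2 * u * j + 2 * v
    meet = j * (N - j) / N**2
    return meet / (1 + math.exp(-beta * d)), meet / (1 + math.exp(beta * d))

def unconditional_times(N, beta, u, v):
    """Average number of elementary steps t_k until fixation in 0 or N, for k = 0..N."""
    # Write sigma_j = -t_1 * A[j] + B[j]; sigma_0 = -t_1 gives A[0] = 1, B[0] = 0.
    A, B = [1.0], [0.0]
    for j in range(1, N):
        p_plus, p_minus = transition_probs(j, N, beta, u, v)
        q = p_minus / p_plus
        A.append(q * A[-1])
        B.append(q * B[-1] + 1.0 / p_plus)
    # t_N = 0 means t_1 = sum_{j=1}^{N-1} sigma_j, which fixes t_1.
    t1 = sum(B[1:]) / (1.0 + sum(A[1:]))
    t = [0.0] * (N + 1)
    t[1] = t1
    for j in range(1, N):
        t[j + 1] = t[j] - (-t1 * A[j] + B[j])   # t_{j+1} = t_j - sigma_j
    return t

N = 20
t = unconditional_times(N, 0.0, 0.0, 0.0)        # neutral selection
print(t[1], 2 * N * sum(1 / l for l in range(1, N)))
```

For neutral selection the result can be compared with 2N Σ_{l=1}^{N−1} 1/l, which follows from the recursion when every Fermi factor equals 1/2. For strong selection the exponentials may overflow, so this sketch is intended for moderate β.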

Examples
As a first example, we consider the Snowdrift Game (Hauert and Doebeli, 2004), which is structurally identical to the Hawk-Dove game (Maynard Smith, 1982). Two players choose simultaneously between cooperation (C) and defection (D). If at least one of them cooperates, both obtain the benefit $b$. However, cooperation involves a cost $c < b$, which is divided among the two players when both of them cooperate. If both choose defection, their payoff is zero. The situation is characterized by the payoff matrix

$$\begin{array}{c|cc} & C & D \\ \hline C & b - \frac{c}{2} & b - c \\ D & b & 0 \end{array}$$

The deterministic replicator equation for the Snowdrift game exhibits a stable interior equilibrium corresponding to a coexistence of cooperators and defectors. Any initial condition where both strategies are present will lead to this stable equilibrium. However, in finite populations the system will ultimately end up in a state where either C or D individuals have taken over the population. As illustrated in Fig. 1, the fixation probability $\phi_k$ of cooperators becomes arbitrarily close to one for strong selection ($\beta \gg 1$) and $0 < k < N$. Hence, for strong selection, fixation of cooperators becomes certain, as $\lim_{\beta\to\infty}\phi_k = 1$ for $k > 0$. However, a fixation probability of one may be misleading. Indeed, although it is certain that the system will fixate in 100% cooperators, the time required to reach fixation may be arbitrarily large.
Similarly to what happens for large population sizes (Antal and Scheuring, 2006), the fixation time increases exponentially with β. For $\beta = 1$, $N = 20$, $b = 1$ and $c = 0.5$, the fixation time for a single cooperator in the Snowdrift Game is already of the order of $10^{9}$ elementary time steps. For $\beta = 3$, it reaches $10^{42}$ time steps. In other words, a fixation probability of one is not very meaningful in view of the time it would take to reach fixation. Such an increase of fixation time with increasing intensity of selection only takes place in games with mixed Nash equilibria, as shown in Fig. 2, in which the fixation time is plotted as a function of the initial number of cooperators in the population for different selection pressures.
As a second example, we consider the Prisoner's Dilemma. In the Prisoner's Dilemma, two players choose again between cooperation and defection. Cooperation costs $c$, leading to a benefit $b > c$ for the other player. If both individuals cooperate, they obtain the payoff $b - c$, whereas cooperation against a defector leads to a payoff $-c$. On the other hand, a defector playing against a cooperator gets $b$. The payoff matrix reads

$$\begin{array}{c|cc} & C & D \\ \hline C & b - c & -c \\ D & b & 0 \end{array}$$

The fixation probability of cooperators decreases with increasing intensity of selection β. This can be inferred directly from our parametrization in which $u = 0$, as cooperators are then equivalent to disadvantageous mutants in frequency independent selection, for whom "fitness" decreases with increasing intensity of selection β. The fixation time of defectors also decreases with increasing β, as the probability for erroneous steps is reduced. Hence, with increasing β, the fixation time departs from the neutral selection limit, β = 0, in the direction opposite to that of the Snowdrift game. The larger β, the shorter the fixation time in the Prisoner's Dilemma (Fig. 2).
In summary, frequency dependent selection accelerates fixation compared to neutral selection for 2 × 2 games with pure Nash equilibria. On the other hand, for games with mixed Nash equilibria such as the Snowdrift Game, the fixation time can increase exponentially. For increasing intensity of selection β the fixation time increases for the Snowdrift game and decreases for the Prisoner's Dilemma. When the intensity of selection becomes small (β → 0), both games meet at the scenario of neutral drift.
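The opposite trends of the two games can be illustrated with the exact birth-death sum. The following sketch builds the payoff matrices from the verbal descriptions of the two games (cooperator row first), assumes π_A − π_B = 2uj + 2v with 2u = a11 − a12 − a21 + a22 and 2v = −a11 + N a12 − N a22 + a22, and uses hypothetical function names; b = 1, c = 0.5 and N = 20 are the values quoted in the text.

```python
import math

def game_matrix(name, b, c):
    """(a11, a12, a21, a22) with cooperators as type A."""
    if name == "snowdrift":
        return (b - c / 2, b - c, b, 0.0)
    if name == "prisoners_dilemma":
        return (b - c, -c, b, 0.0)
    raise ValueError(name)

def coeffs(N, a11, a12, a21, a22):
    u = (a11 - a12 - a21 + a22) / 2
    v = (-a11 + N * a12 - N * a22 + a22) / 2
    return u, v

def fixation_probability(k, N, beta, u, v):
    """Exact fixation probability of k cooperators for the Fermi rule."""
    terms = [math.exp(-beta * (u * i * (i + 1) + 2 * v * i)) for i in range(N)]
    return sum(terms[:k]) / sum(terms)

N, k = 20, 10
for name in ("snowdrift", "prisoners_dilemma"):
    u, v = coeffs(N, *game_matrix(name, b=1.0, c=0.5))
    probs = [fixation_probability(k, N, beta, u, v) for beta in (0.0, 0.05, 0.1)]
    print(name, "u =", u, "v =", v, probs)
```

In the Prisoner's Dilemma the parametrization gives u = 0, so the closed form for frequency independent selection applies; in the Snowdrift game u < 0 and the exact sum is used instead.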

Stochastic effects on the fixation times
As shown in Figs. 2 and 3, a perfect agreement between the average fixation times is obtained when comparing computer simulations with the theoretical results leading to Eqs. (17), (28) and (29) of the Appendix. However, taking into account the intrinsic stochastic nature of the process, the right quantity to examine is the probability distribution of fixation times. To the extent that this probability distribution is sharply peaked around the average fixation time, the theoretical results provide an accurate description of the dynamical process. As usual, one expects the theory outlined in the previous section to become more accurate for large populations, since in that limit stochastic fluctuations are effectively suppressed.
In Fig. 4 we computed the probability distribution of fixation times for the cases of neutral evolution, as well as for the Prisoner's dilemma and the Snowdrift game considered before in Fig. 2 (β = 0.05, population size N = 20). The results depicted provide an impressive account of the role of stochastic effects in what concerns the fixation times, showing that the behaviour of the probability distribution does not depend solely on population size N, but, more importantly, depends sensitively on the nature of the game and (naturally) on the intensity of selection.
For β = 0.05 and N = 20, the distribution of conditional fixation times in the Prisoner's Dilemma is sharply peaked around the average fixation time. Only relatively small deviations from this average time are observed. With decreasing intensity of selection, β → 0, the probability distribution widens significantly. For neutral selection, β = 0, very long fixation times can occur, leading to an average value that is considerably larger than the most probable fixation time. Such an average value is of limited information, as large deviations are possible. The situation becomes dramatic in the Snowdrift game, in which case the variance of the probability distribution actually exceeds the mean. The distribution is extremely flat and a wide range of fixation times can be observed. Such large fluctuations necessarily question the usefulness of such calculations, not only in small populations, but also as a function of the intensity of selection and the nature of the game. Under such circumstances, stochastic effects provide such an overwhelming contribution to the dynamics that the average fixation time no longer has any predictive meaning.
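A distribution of fixation times like the one discussed above can be sampled with a direct simulation of the birth-death chain. The sketch below is not the simulation code used for the figures; it assumes the transition probabilities P^±_j = (j/N)((N − j)/N)/(1 + exp(∓β(2uj + 2v))), and the function names, seed and number of runs are hypothetical choices. The (u, v) pairs correspond to b = 1, c = 0.5 and N = 20.

```python
import math
import random
import statistics

def simulate_fixation(k, N, beta, u, v, rng):
    """One realization; returns (absorbing state, number of elementary time steps)."""
    j, steps = k, 0
    while 0 < j < N:
        d = 2 * u * j + 2 * v                  # payoff difference pi_A - pi_B at state j
        meet = j * (N - j) / N**2
        p_plus = meet / (1 + math.exp(-beta * d))
        p_minus = meet / (1 + math.exp(beta * d))
        r = rng.random()
        if r < p_plus:
            j += 1
        elif r < p_plus + p_minus:
            j -= 1
        steps += 1                             # steps without a change also count
    return j, steps

def conditional_time_stats(k, N, beta, u, v, target, runs, seed=0):
    """Mean, standard deviation and sample size of fixation times ending in `target`."""
    rng = random.Random(seed)
    results = (simulate_fixation(k, N, beta, u, v, rng) for _ in range(runs))
    times = [steps for end, steps in results if end == target]
    return statistics.mean(times), statistics.stdev(times), len(times)

if __name__ == "__main__":
    N, beta = 20, 0.05
    games = {"neutral": (0.0, 0.0),
             "prisoner's dilemma": (0.0, -5.25),
             "snowdrift": (-0.375, 4.625)}
    for name, (u, v) in games.items():
        b = 0.0 if name == "neutral" else beta
        # single defector among cooperators (k = N - 1), conditioned on defectors fixing
        print(name, conditional_time_stats(N - 1, N, b, u, v, target=0, runs=2000))
```

Comparing the sample standard deviation with the sample mean for each game gives a rough indication of how informative the average fixation time is; a histogram of the sampled times would show the full distribution.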

Games with more than two strategies
So far, we have only discussed 2 × 2 games and the associated fixation times. The mathematical description of evolutionary game dynamics with more than two types is more intricate, but there are several qualitative statements that one can make. For the process introduced here, a strategy that is not present at some time will never appear later, as there are no mutations that lead to new strategies. Hence, starting from d types of individuals, one type will sooner or later go extinct. Then, the dynamics of the system is restricted to a space of d − 1 strategic types. Ultimately, an absorbing point is reached at which only a single type is present. This holds for any type of game if the intensity of selection is finite.
If more than two types of individuals are described, one can introduce a mutation rate which is so small that at most, two types are present in the population (Imhof et al., 2005;Imhof and Fudenberg, 2006). In this case, one can again make use of the fixation probabilities discussed here. Another possibility is to consider large populations. Whereas N → ∞ leads to a deterministic replicator equation (given that the intensity of selection is fixed), finite N leads to stochastic replicator equations. For the process here, the framework discussed in Traulsen et al. (2006a) can be applied. For cyclic games in which the replicator dynamics predicts closed orbits as Rock-Paper-Scissors (Hofbauer and Sigmund, 1998), one can apply such an approximation, introduce angular and radial coordinates and calculate the average fixation time in finite populations, see Reichenbach et al. (2006) for details.

Summary
We have introduced an alternative to the frequency dependent Moran process recently proposed in evolutionary game theory (Taylor et al., 2004). Our new process leads to a simple, closed-form equation for the fixation probabilities, which can be readily computed for any symmetric 2 × 2 game, for any intensity of selection and any initial number of mutants. The intensity of selection is measured by a quantity that resembles temperature in statistical physics. It can be shown that a stochastic evaluation of payoffs in this process decreases the intensity of selection (Traulsen et al., 2007). For high intensity of selection (β → ∞) the process is quasi-deterministic in following the gradient of selection.

Unconditional fixation times
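For the time $t_j$ to reach fixation in either absorbing state (0 or $N$) starting from state $j$, we have

$$t_j = 1 + P_j^-\,t_{j-1} + \left(1 - P_j^- - P_j^+\right)t_j + P_j^+\,t_{j+1},$$

which can be written as

$$\sigma_j = \frac{P_j^-}{P_j^+}\,\sigma_{j-1} + \frac{1}{P_j^+}, \qquad (22)$$

where $\sigma_j = t_j - t_{j+1}$ and $t_0 = t_N = 0$. In the remainder, the product of the ratio of transition probabilities is written as $\prod_{k=1}^{j} P_k^-/P_k^+ = \chi_{j+1}^{-j}$, where $\chi_j = \exp[\beta j u + 2\beta v]$. The transition probabilities can be written in terms of $\chi_j$ as

$$P_j^{\pm} = \frac{j}{N}\,\frac{N-j}{N}\,\frac{1}{1 + \chi_{2j}^{\mp 1}}.$$

Iteration of Eq. (22), starting from $\sigma_0 = -t_1$, yields

$$\sigma_j = -t_1\,\chi_{j+1}^{-j} + \sum_{l=1}^{j}\frac{1}{P_l^+}\,\frac{\chi_{j+1}^{-j}}{\chi_{l+1}^{-l}}.$$

For the fixation time, we obtain $t_j = t_1 - \sum_{k=1}^{j-1}\sigma_k$. For the unconditional fixation time, we have $t_0 = 0$ and $t_N = 0$, as fixation has already occurred. With $t_N = 0$, $t_1$ can be calculated as

$$t_1 = \phi_1\sum_{k=1}^{N-1}\sum_{l=1}^{k}\frac{1}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}}, \qquad \phi_1 = \left[\sum_{k=0}^{N-1}\chi_{k+1}^{-k}\right]^{-1}.$$

The average unconditional fixation time is finally given by

$$t_j = -t_1\sum_{k=j}^{N-1}\chi_{k+1}^{-k} + \sum_{k=j}^{N-1}\sum_{l=1}^{k}\frac{1}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}},$$

where $1/P_l^+ = N^2\left(1 + \chi_{2l}^{-1}\right)/[\,l\,(N-l)\,]$.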

Conditional fixation times
The average conditional fixation times can be computed in an analogous way, as shown in Antal and Scheuring (2006). Here, we just outline the results. The average time $\tau_i^0$ to reach the absorbing state 0 starting from $i$, given that it is reached and not the other absorbing state $N$, is

$$\tau_i^0 = \frac{1}{1-\phi_i}\left[-(1-\phi_1)\,\tau_1^0\sum_{k=i}^{N-1}\chi_{k+1}^{-k} + \sum_{k=i}^{N-1}\sum_{l=1}^{k}\frac{1-\phi_l}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}}\right], \qquad (28)$$

where $\phi_i$ is the probability to end up in $N$ starting from $i$, cf. Eq. (11), $\chi_j = \exp[\beta j u + 2\beta v]$, and

$$\tau_1^0 = \frac{\phi_1}{1-\phi_1}\sum_{k=1}^{N-1}\sum_{l=1}^{k}\frac{1-\phi_l}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}}.$$

Similarly, the conditional average time $\tau_i^N$ to reach absorbing state $N$ (and not state 0) is given by

$$\tau_i^N = \frac{1}{\phi_i}\left[-\phi_1\,\tau_1^N\sum_{k=i}^{N-1}\chi_{k+1}^{-k} + \sum_{k=i}^{N-1}\sum_{l=1}^{k}\frac{\phi_l}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}}\right], \qquad (29)$$

where

$$\tau_1^N = \sum_{k=1}^{N-1}\sum_{l=1}^{k}\frac{\phi_l}{P_l^+}\,\frac{\chi_{k+1}^{-k}}{\chi_{l+1}^{-l}}.$$

Caption to Figure 1. Fixation probabilities in a population of size N = 20. Simulation results (symbols) obtained from averaging over $10^6$ realizations coincide perfectly with the theoretical result, Eq. (12) (solid lines). Arrows indicate increasing intensity of selection. For neutral selection (diamonds), the fixation probability is given by the fraction of cooperators. In the Prisoner's Dilemma, fixation of cooperators becomes less likely with increasing intensity of selection, as shown for β = 0.05 (squares) and β = 0.1 (circles). Only for weak selection and a high initial number of cooperators do they have reasonable chances. In the Snowdrift Game, the fixation probability of cooperators increases with increasing intensity of selection, as the internal equilibrium is closer to pure cooperation. Here, the fixation probabilities are shown for β = 0.05 (squares) and β = 0.1 (circles). However, the fixation time of defectors increases accordingly, see Fig. 2 (b = 1, c = 0.5).

Caption to Figure 2. Conditional fixation times for fixation of defectors in a population of N = 20. Symbols show simulation results whereas lines depict the fixation times obtained according to Eq. (28). Arrows indicate increasing intensity of selection. For neutral selection (diamonds), the fixation time increases with the initial number of cooperators k, as the distance to the point of fixation increases. In the Snowdrift Game, fixation times increase with increasing selection intensity (squares β = 0.05, circles β = 0.1), as the system spends much time near the internal Nash equilibrium. In contrast, for the Prisoner's Dilemma, stronger selection leads to faster fixation (squares β = 0.05, circles β = 0.1). Here, increasing selection intensity induces opposite behaviour in the two games with respect to the average fixation times and the fixation probability, although this is not the case in general (b = 1, c = 0.5, averages over $10^6$ realizations).

Caption to Figure 3. Unconditional fixation times in a population of N = 20. Lines show the theoretical result from Eq. (17) whereas symbols are results from computer simulations. Arrows indicate increasing intensity of selection. For neutral selection (black diamonds), the fixation time increases with increasing distance from the absorbing states. For the Snowdrift Game, fixation times increase with increasing intensity of selection (squares β = 0.05, circles β = 0.1). For the Prisoner's Dilemma, the fixation time increases with the number of cooperators (squares β = 0.05, circles β = 0.1), which results from the high fixation probability in 100% defection. Hence, only close to 100% cooperation, the fixation time decreases (symbols as in Fig. 2, b = 1, c = 0.5, averages over $10^6$ realizations).

Caption to Figure 4. Probability distributions of the conditional fixation times of a single defector in a population of cooperators. While the average fixation times (arrows) agree well with simulations, as shown in Fig. 2, the probability distributions can become extremely broad. For the Prisoner's Dilemma and for neutral selection, the deviations of the fixation time from the average are comparatively small. However, for the Snowdrift game an extremely wide range of fixation times is observed. Hence, the average fixation time is of limited interest, as large deviations are observed with a very high probability (N = 20, β = 0.1).