fixation probability

McCandlish DM, Epstein CL & Plotkin JB 2015 Formal properties of the probability of fixation: identities, inequalities and approximations. Theor Popul Biol, in press.
doi:10.1016/j.tpb.2014.11.004

  • we consider the classical problem of determining the total substitution rate across an ensemble of biallelic loci
  • formulas for the probability of fixation play a widespread role as constituent elements in more complex models of evolution
  • the most common way of constructing such models is to assume that mutation is sufficiently weak that each new mutation is either lost or fixed before the next new mutation enters the population
  • new mutations enter the population with some distribution of selection coefficients and are fixed or lost independently from each other
  • it is unfortunate that the standard formulas for the fixation probability are, mathematically, difficult to understand and manipulate
  • we develop a series of identities, inequalities and approximations for the exact probability of fixation of a new mutation under the Moran process
  • u_N(s) = (1 − e^{−s}) / (1 − e^{−Ns}) ... (1)
  • s is the difference between the log fitness of the invading type and the log fitness of the resident type (i.e. the difference in Malthusian fitness)
  • defining the selection coefficient as the difference in log fitnesses is useful
  • it puts the probability of fixation into a form very similar to that of the approximate probability of fixation under the Wright-Fisher process
  • in these "sequential fixation" models, evolution is described as a Markov chain on the set of alleles (rather than the set of allele frequencies)
  • the logarithm of the probability of fixation must obey certain symmetries
  • the logarithm of the probability of fixation may be easier to understand than the probability of fixation itself
  • the probability of fixation can be additively partitioned into one term, corresponding to the probability of fixation in an infinite population and another term, capturing the effects of finite population-size, which depends only on the magnitude of the selection coefficient and not its sign
  • 0 < u′(s) < 1
  • it is common to approximate u(s) ≈ s for large Ns and small s
  • this approximation thus always overestimates the sensitivity of u(s) to changes in s
  • the derivative d log u(s) / ds provides, in some sense, a more informative measure of how changing s changes the probability of fixation, than u′(s) itself does
  • d log u(s) / ds = u′(s) / u(s)
  • the derivative of the log probability of fixation describes how changes in s affect the probability of fixation in a relative, rather than absolute sense
  • d² log u(s) / ds² = d² log u(−s) / ds²
  • d² log u(s) / ds² < 0 for all s
  • we will refer to this condition as the probability of fixation being "log-concave" in s
  • it will serve as the key fact for deriving inequalities in the next section
  • u(s) − (1 − e^{−s}) = e^{−s} u(−s) ... (I4)
  • w(s) − s = w(−s) ... (I10)
  • it is important to understand what this identity means biologically
  • Eq. (I10) expresses the probability of fixation given by w_N(s) for positive s as the sum of the probability of fixation for that selection coefficient in an infinite population (lim_{N → ∞} w_N(s) = s) plus some extra probability due to the effects of finite population size
  • the amount of extra probability is symmetrical around s = 0
  • the extra probability due to finite population size for s > 0 is precisely equal to the probability of fixation of a deleterious allele whose selection coefficient has the same absolute value
  • the genome is made up of an infinite collection of independently evolving biallelic loci
  • each fixation of an allele with selection coefficient S results in the creation of a potential mutant with selection coefficient − S
  • W(S) = S + W(− S)
  • we think about this as a partitioning of the rate of evolution into one component due to finite population size and another component corresponding to the substitutions that would still occur in an infinite population
  • the rate of evolution due to finite population size depends only on the distribution of |S| and not on the distribution of S itself
  • this increase in the substitution rate due to selection can be found by multiplying the probability that a locus is fixed for the deleterious allele, 1 / (1 + e^{S}), by the rate of advantageous substitutions at such a locus, S, and then integrating over the density of selection coefficients ψ(−|S|) dS
  • importantly, the expression in the last line is just the contribution of the term −S e^{S} to the rate of evolution under the corresponding model of deleterious fixations when analyzed using the approximation given in Eq. (72)
  • to find the rate of evolution under a partially reflected model of evolution, one can simply take an approximation for the corresponding model of deleterious fixations derived using Eq. (72) and double the term corresponding to −S e^{S}
  • this approximation for the partially reflected model gives an upper bound on the rate of evolution as compared to W(S)
  • Eq. (72) provides an upper bound for the rate of evolution due to finite population size effects
  • Eq. (78) is an upper bound for the rate of evolution in an infinite population
  • in the large N, fixed Ns regime, under any partially reflected model, the fraction of substitutions corresponding to selection cannot be much more than one half
  • for any population size, under any partially reflected model, the fraction of substitutions due to selection is strictly less than half
  • the large N fixed s limit of u_N(s) is 1 − e^{−s} for s > 0 and 0 otherwise
  • using Identity (I4), the probability of fixation due to finite population size effects is e^{−s} u(−s) for s > 0 and u(−|s|) otherwise
  • for any single locus with symmetric mutation rates and selection coefficient |s|, at equilibrium the rate of substitution due to finite population effects is always larger than the rate of substitution that would occur if the population was suddenly replaced by an infinite population
  • this result can then be extended to an arbitrary distribution of selection coefficients |s|
  • we choose f(|s|) to be the rate of evolution in an infinite population, g(|s|) to be the rate of evolution due to finite population size effects
  • f(|s|) / g(|s|) < c = 1 for all s
  • we have also shown how these results can be extended to the situation when an allele is initially present in more than a single copy
  • Sella and Hirsh (2005, supporting text) have shown numerically that the resulting approximation for the probability of fixation has accuracy comparable to the Kimura (1957, 1962) expression based on the traditional definition of the selection coefficient
  • if one is interested in controlled approximations, then the traditional choice of selection coefficient may be superior for the Wright-Fisher process
  • this is because the Kimura (1957, 1962) expression with the traditional definition of the selection coefficient provides an upper bound on the true probability of fixation for (1) the haploid Wright-Fisher process (Moran, 1960) and (2) the diploid Wright-Fisher process where fitness is additive within loci (Bürger and Ewens, 1995)
  • w(s) = s + w(− s)
  • the probability of fixation of a mutation with selection coefficient s can be partitioned into two components
  • the first of these components is max(s, 0), which is the probability of fixation for a new mutation with selection coefficient s in an infinite population (the strong-selection limit)
  • the second of these components is w(− |s|), which is symmetric around s = 0 and which captures the effects of finite population size on the probability of fixation
  • Razeto-Barry (unpublished manuscript) has recently suggested this partition of the substitution rate to help resolve the neutralist-selectionist debates
  • the neutralist position is identified with the claim that most substitutions are due to the effects of finite population size and the selectionist position with the claim that most substitutions would still occur in an infinite population
  • we consider the substitution rate at equilibrium for a model with an infinite number of freely-recombining, biallelic loci with unbiased mutation rates and selection coefficients drawn from some arbitrary probability distribution
  • we define the probability of fixation due to selection to be the probability of fixation in the large N, fixed s limit, which is 1 − e^{−s} for s > 0 and 0 otherwise
  • we then identify the probability of fixation in excess of this amount as being due to finite population size
  • using one of our new identities (Identity (I4)) we then prove that, for any finite population, the fraction of substitutions due to selection is strictly less than the fraction of substitutions due to finite population size
  • any model supporting the selectionist position must necessarily rely on complications such as non-equilibrial dynamics, mutational biases, and epistasis
  • suppose we have a sequential-fixations model with rate matrix Q
  • Q(i, j) describes the rate at which a population currently fixed for allele i would become fixed for some other allele j
  • we can write Q = Q_Sel + Q_Neut
  • Q_Sel is the rate matrix for the corresponding strong-selection weak-mutation Markov chain
  • Q_Neut summarizes the effects of finite population size on the evolutionary dynamics
  • this decomposition of the rate matrix Q applies only to the infinitesimal rates and not to the long-term dynamics
  • even if we can decompose the instantaneous rate of evolution into the rate due to selection and the rate due to drift, the long-term effects of selection and drift in finite populations are inextricably intertwined
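
A quick numerical check of the identities quoted above may help when reading these notes. The sketch below assumes Eq. (1) reads u_N(s) = (1 − e^{−s}) / (1 − e^{−Ns}) and that w_N(s) = s / (1 − e^{−Ns}); the second form is not spelled out in the notes and is only an assumption that is consistent with Identity (I10) and with lim_{N → ∞} w_N(s) = s for s > 0. The function names are hypothetical.

```python
import math

def u(s, N):
    """Moran fixation probability of one new mutant; s is the log-fitness difference."""
    if s == 0.0:
        return 1.0 / N  # neutral limit of the formula
    # expm1(x) = e^x - 1, so this ratio equals (1 - e^{-s}) / (1 - e^{-N s})
    return math.expm1(-s) / math.expm1(-N * s)

def w(s, N):
    """Assumed form w_N(s) = s / (1 - e^{-N s}); not given explicitly in the notes."""
    if s == 0.0:
        return 1.0 / N
    return -s / math.expm1(-N * s)

def d2_log_u(s, N, h=1e-3):
    """Central finite-difference estimate of the second derivative of log u in s."""
    f = lambda x: math.log(u(x, N))
    return (f(s + h) - 2.0 * f(s) + f(s - h)) / h**2

def check_identities(s, N):
    """Return the residuals of (I4) and (I10); both should be close to zero."""
    i4 = u(s, N) - (1.0 - math.exp(-s)) - math.exp(-s) * u(-s, N)
    i10 = w(s, N) - s - w(-s, N)
    return i4, i10

if __name__ == "__main__":
    N = 100
    for s in (-0.05, 0.01, 0.05):
        # residuals near zero; second derivative of log u negative (log-concavity)
        print(s, check_identities(s, N), d2_log_u(s, N))
```

The finite-difference step h is an arbitrary choice; for very large N·|s| the call to math.expm1 can overflow, so the sketch is only meant for moderate values.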
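
The claim that selection accounts for less than half of substitutions at equilibrium can be illustrated for a single locus. The sketch below is not the paper's derivation. It assumes a two-state chain per locus (fixed for the advantageous or the deleterious allele), symmetric mutation, substitution rates proportional to the fixation probability with the mutation supply rate factored out, and the Moran formula u_N(s) = (1 − e^{−s}) / (1 − e^{−Ns}). The name rate_partition is hypothetical.

```python
import math

def u(s, N):
    """Moran fixation probability of one new mutant; s is the log-fitness difference."""
    if s == 0.0:
        return 1.0 / N
    return math.expm1(-s) / math.expm1(-N * s)

def rate_partition(s_abs, N):
    """Split the equilibrium substitution rate at one biallelic locus into
    (selection, finite_size) parts, in units of the mutation supply rate.

    s_abs is the magnitude |s| of the selection coefficient at the locus.
    """
    up, down = u(s_abs, N), u(-s_abs, N)
    # Stationary distribution of the two-state chain (detailed balance).
    pi_del = down / (up + down)   # locus currently fixed for the deleterious allele
    pi_adv = up / (up + down)     # locus currently fixed for the advantageous allele
    # Selection part: large-N, fixed-s limit 1 - e^{-s}, advantageous mutants only.
    selection = pi_del * (-math.expm1(-s_abs))
    # Finite-size part: e^{-s} u(-s) for advantageous mutants (Identity I4),
    # plus every deleterious fixation.
    finite_size = pi_del * math.exp(-s_abs) * down + pi_adv * down
    return selection, finite_size

if __name__ == "__main__":
    for N in (10, 100):
        for s in (0.001, 0.01, 0.1):
            sel, fin = rate_partition(s, N)
            print(N, s, sel / (sel + fin))  # stays below one half
```

Under these assumptions the ratio finite_size / selection works out to coth(Ns/2), so the selection fraction is (1 − e^{−Ns}) / 2. It approaches one half only as Ns grows, which agrees with the bounds quoted in the notes.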
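
The decomposition Q = Q_Sel + Q_Neut can also be sketched for a small set of alleles. The construction below is only one plausible reading of the notes, not the paper's definition. It assumes Q(i, j) = N · M(i, j) · u_N(F_j − F_i), where M is a hypothetical per-capita mutation rate matrix and F a hypothetical vector of log fitnesses, and it takes Q_Sel to use the large-N, fixed-s limit max(1 − e^{−s}, 0) in place of u_N(s).

```python
import math

def u(s, N):
    """Moran fixation probability of one new mutant; s is the log-fitness difference."""
    if s == 0.0:
        return 1.0 / N
    return math.expm1(-s) / math.expm1(-N * s)

def decompose(F, M, N):
    """Build Q, Q_sel and Q_neut = Q - Q_sel for a sequential-fixations model.

    F: log fitness of each allele (hypothetical input).
    M: mutation rates, M[i][j] from allele i to allele j (hypothetical input).
    """
    n = len(F)
    Q = [[0.0] * n for _ in range(n)]
    Q_sel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s = F[j] - F[i]
            Q[i][j] = N * M[i][j] * u(s, N)
            # Strong-selection limit: 1 - e^{-s} for s > 0, zero otherwise.
            Q_sel[i][j] = N * M[i][j] * max(-math.expm1(-s), 0.0)
        # Diagonals make each row sum to zero, as in any rate matrix.
        Q[i][i] = -sum(Q[i])
        Q_sel[i][i] = -sum(Q_sel[i])
    Q_neut = [[Q[i][j] - Q_sel[i][j] for j in range(n)] for i in range(n)]
    return Q, Q_sel, Q_neut

if __name__ == "__main__":
    F = [0.0, 0.01, 0.03]
    M = [[0.0, 1e-3, 1e-3], [1e-3, 0.0, 1e-3], [1e-3, 1e-3, 0.0]]
    Q, Q_sel, Q_neut = decompose(F, M, N=100)
    for row in Q_neut:
        print(row)  # off-diagonal entries are positive in this example
```

As the notes caution, the split applies to the instantaneous rates only; the stationary distribution or long-run behaviour of Q is not the sum of the behaviours of the two parts.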