soft sweep

Hermisson J & Pennings PS 2017 Soft sweeps and beyond: understanding the patterns and probabilities of selection footprints under rapid adaptation. Methods Ecol Evol 8:700-716.

  • complex footprints observed in data are, as yet, insufficiently covered by models
  • selective sweeps refer to patterns in genomic diversity that are caused by recent adaptation
  • the key genealogical implication of mutation-limited adaptation is that at the selected locus itself, the time to the most recent common ancestor (MRCA) of the sample, TMRCA, is shorter than the time that has elapsed since the onset of the new selection pressure, TS
  • a hard sweep: TMRCA < TS (a single recent ancestor)
  • a soft sweep: TMRCA > TS (more than a single ancestor)
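  • soft sweep genealogies can arise in two different ways, which lead to different patterns:
  • 1. single-origin soft sweeps: the sample traces back to several copies present at TS that all descend from one mutation predating TS (this requires standing genetic variation, SGV)
  • 2. multiple-origin soft sweeps: the sample traces back to copies from several independent mutational origins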
  • for both hard and soft sweeps, the definitions apply irrespective of whether the beneficial allele has reached fixation (complete sweep) or still segregates in the population (partial sweep), as long as we restrict the sample to carriers of the beneficial allele only
  • for a given sweep locus, we may observe a soft sweep in one sample, but a hard sweep in a different sample
  • the notion of a 'soft sweep' as defined here, following previous work (e.g. Hermisson & Pennings 2005; Messer & Petrov 2013; Jensen 2014; Berg & Coop 2015), is therefore not synonymous with 'adaptation footprint from SGV'
  • sweep types refer to classes of patterns that result from characteristic coalescent genealogies, not to evolutionary processes
  • this leaves us with the task of exploring the probability of each sweep type under a given evolutionary scenario and of using this information for statistical inference of process from pattern
  • x0, the frequency of mutant allele A at time TS, is not a fixed value
  • it is a stochastic variable and follows a distribution
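  • using the deterministic approximation, Jensen (2014) argues that soft sweeps are only likely if sb/sd > 100 when Θ = 10^−2 ('Drosophila') and if sb/sd > 10 000 when Θ = 10^−4 ('humans'), where sb/sd is the ratio of the selective advantage of the allele after TS to its selective disadvantage before TS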
  • in contrast, when the stochasticity of x0 is taken into account, soft sweeps dominate for all Θ values even for a low ratio sb/sd = 10
  • eqn (9) for Pind, the probability of a multiple-origin soft sweep, depends only on Θ and not on any selection parameter
  • it also applies if the selection pressure changes during the course of adaptation
  • Pind(n) also depends only weakly on sample size n
  • for n = 100, multiple-origin soft sweeps start to appear for Θ > 0·01 (>5% soft), become frequent for Θ > 0·1 (~40% soft) and dominate for Θ ≥ 1 (≥99% soft)
  • spatial structure can increase the probability of multiple-origin soft sweeps in global population samples, but not necessarily in local samples from a single region
  • whenever spatial structure and isolation by distance cause soft sweeps despite a low Θ, we expect to find hard sweeps in local samples but soft sweeps in global samples
  • ongoing migration can blur this signal and patterns can look increasingly soft in local samples, too
  • sweeps that appear soft for a recent adaptation may turn hard if we take a sample at some later time
  • this occurs if descendants of only a single copy of the beneficial allele dominate at that later time
  • even for a major haplotype frequency of 99% the expected fixation time is >0·1Ne generations
  • over time scales where sweep patterns are generally visible, a soft sweep will usually not 'harden'
  • under genetic drift alone, soft sweep signals are more likely to fade than to turn hard
  • conversely, a hard sweep can look soft if selection acts on a fully recessive allele
  • a recessive allele behaves essentially neutrally as long as its frequency x is smaller than x0 = 1/√(2Ne·sb)
  • its trajectory therefore resembles that of an allele that derives from neutral SGV with starting frequency x0 at time TS
  • the impact of selection on the sweep depends only on the shape of the trajectory
  • both scenarios can only be distinguished if additional biological information about dominance is available
  • for humans, estimates of a long-term Ne ~ 10^4 (Takahata 1993) are so heavily influenced by sporadic demographic events, such as the bottleneck associated with the out-of-Africa migration, that this number is almost useless for population genetic theory
  • more refined methods based on deep sequencing estimate changes in Ne from ~14 000 pre-agriculture (10 000 years ago) to ~500 000 presently in Africa
  • with a mutation rate of 1–2 × 10^−8 per site per generation, this leads to a 'recent' Θ ≈ 10^−3 for point mutations, consistent with estimates of Θ from singletons
  • we arrive at ~0·01 as a rough estimate for Θl or Θg
  • this is an order of magnitude lower than for Drosophila and in a range where hard sweeps from new mutations dominate
  • however, there are reasons why adaptation from SGV may be more prevalent in humans:
  • population growth protects rare alleles from loss due to drift (Otto & Whitlock 1997; Hermisson & Pennings 2005)
  • many selection pressures result from the dramatic changes in nutrition and population density since the advent of agriculture, or from pathogens that have spread in response to increased human density
  • these selection pressures are very young
  • almost all adaptive alleles that are now at high frequency must have emerged from SGV (Przeworski, Coop & Wall 2005; Novembre & Han 2012)
  • recent sweeps in humans are never species-wide
  • regional patterns arise not only as a response to regional selection pressures (...), but also because of parallel adaptation to the same selection pressure across geographic regions
  • the prime example is light skin pigmentation, a trait that has evolved several times independently in Europeans and Asians
  • parallel adaptation in this case is facilitated by a larger genomic mutation target, comprising several genes
  • the pattern at each single gene locus is still consistent with a hard sweep, either from new mutation after the out-of-Africa migration or from a deleterious standing variant
  • given the low estimated value of Θ, maybe the most surprising finding is the evidence of multiple-origin soft sweeps
  • there is evidence for the influence of geographic structure for both lactase and Duffy (global soft sweep), but both are also locally soft in Africa
  • in HIV, some patients do not evolve drug resistance at all, or only after many years of treatment
  • this suggests a role of de novo mutation in resistance evolution
  • depending on the patient and the treatment, HIV populations may actually be mutation limited
  • better treatments lead to harder sweeps, likely because they reduce the population size and move HIV into a mutation-limited regime
  • a single parameter, the population mutation rate Θ = 4Ne·u, is most important in separating the rapid world from the mutation-limited one
  • this parameter is more complex than it may seem
  • it depends on the specific beneficial allele and its mutational target size
  • it also depends on the timing of the adaptive event and the corresponding short‐term effective population size
  • this leads to a large variance for Θ
  • there is empirical evidence for soft sweeps even in humans
  • real adaptive stories go beyond a simplistic hard‐or‐soft dichotomy
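The bullets on x0 state that the frequency of allele A at TS follows a distribution rather than taking a fixed value. The following is a minimal Wright-Fisher sketch of that idea, not the model used in the paper: before TS, allele A is mildly deleterious (coefficient s_d), is produced by recurrent mutation from a to A at rate u, and drifts among 2N gene copies. The haploid selection step, the one-way mutation, the function name `sample_x0` and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def sample_x0(N: int, u: float, s_d: float, generations: int, reps: int,
              seed: int = 0) -> np.ndarray:
    """Frequency x0 of allele A after `generations` generations of recurrent
    mutation (a -> A at rate u), selection against A (coefficient s_d) and
    Wright-Fisher drift among 2N gene copies; one value per replicate.
    Illustrative model only (assumed, not taken from the paper)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(reps)                        # all replicates start without A
    for _ in range(generations):
        x = x * (1 - s_d) / (1 - s_d * x)     # selection against A (haploid approximation)
        x = x + u * (1 - x)                   # recurrent mutation a -> A
        x = rng.binomial(2 * N, x) / (2 * N)  # drift: binomial resampling of 2N copies
    return x

if __name__ == "__main__":
    # Hypothetical parameters: Theta = 4*N*u = 0.4, deterministic balance u/s_d = 0.01
    x0 = sample_x0(N=1_000, u=1e-4, s_d=0.01, generations=10_000, reps=1_000)
    print(f"mean x0 = {x0.mean():.4f}, sd = {x0.std():.4f}")
    print(f"fraction of replicates with A absent at TS: {(x0 == 0).mean():.2f}")
```

Across replicates, x0 scatters around the deterministic mutation-selection balance value u/s_d, and in some replicates A may be absent at TS altogether, which is why a single deterministic value of x0 can be misleading.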
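The bullets on eqn (9) quote values of Pind without giving the formula. A minimal sketch, assuming eqn (9) takes the Ewens-sampling form Pind(n) = 1 − Γ(Θ + 1)Γ(n)/Γ(Θ + n), i.e. one minus the probability that all n sampled carriers of the beneficial allele descend from a single mutational origin. This assumed form contains no selection parameter and reproduces the figures quoted for n = 100 (about 5%, 40% and 99% for Θ = 0·01, 0·1 and 1). The function name `p_ind` is hypothetical.

```python
from math import exp, lgamma

def p_ind(n: int, theta: float) -> float:
    """Probability that a sample of n carriers of the beneficial allele
    descends from more than one independent mutational origin.

    Assumed form: 1 - prod_{k=1}^{n-1} k / (k + theta)
                = 1 - Gamma(theta + 1) * Gamma(n) / Gamma(theta + n),
    evaluated with log-gamma functions for numerical stability.
    """
    if n < 2:
        return 0.0  # a single sampled lineage cannot show multiple origins
    log_single_origin = lgamma(theta + 1) + lgamma(n) - lgamma(theta + n)
    return 1.0 - exp(log_single_origin)

if __name__ == "__main__":
    # Theta values discussed in the notes, for a sample of n = 100
    for theta in (0.01, 0.1, 1.0):
        print(f"Theta = {theta:<5} P_ind(100) = {p_ind(100, theta):.3f}")
    # Dependence on sample size at Theta = 0.1
    for n in (10, 100, 1000):
        print(f"n = {n:<5} P_ind(n) = {p_ind(n, 0.1):.3f}")
```

At Θ = 1 the assumed expression reduces exactly to 1 − 1/n, and for small Θ it grows only roughly with ln n, in line with the weak sample-size dependence of Pind(n).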
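The note on 'hardening' quotes an expected fixation time of more than 0·1Ne generations for a major haplotype at 99%. Treating the major haplotype as a neutral allele, the classical diffusion result of Kimura & Ohta (1969) for the mean time to fixation, conditional on fixation, in a diploid population gives t(p) = −4Ne(1 − p)ln(1 − p)/p generations. A minimal sketch, assuming this is the relevant formula (the paper's own calculation may differ); the function name and the value Ne = 10 000 are hypothetical, and the output is expressed in units of Ne.

```python
import math

def mean_neutral_fixation_time(p: float, Ne: float) -> float:
    """Mean number of generations for a neutral allele at frequency p to fix,
    conditional on fixation, in a diploid population of effective size Ne
    (Kimura & Ohta diffusion approximation)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # log1p(-p) computes ln(1 - p) accurately, also for very small p
    return -4.0 * Ne * (1.0 - p) * math.log1p(-p) / p

if __name__ == "__main__":
    Ne = 10_000  # hypothetical effective population size
    for p in (0.5, 0.9, 0.99):
        t = mean_neutral_fixation_time(p, Ne)
        print(f"major haplotype at {p:.0%}: ~{t / Ne:.3f} Ne generations")
```

For p = 0·99 this gives roughly 0·19Ne generations, consistent with the '>0·1Ne' figure; lower starting frequencies of the major haplotype take longer still, approaching 4Ne as p goes to 0.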
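The threshold x0 = 1/√(2Ne·sb), below which a fully recessive beneficial allele behaves almost neutrally, follows the usual heuristic: a copy of the allele gains an advantage only when paired with another copy, so its per-copy advantage is about sb·x, and selection overcomes drift once the number of copies times that advantage, 2Ne·x·sb·x, exceeds about 1. The sketch simply evaluates the formula; the function name and the (Ne, sb) pairs are hypothetical.

```python
import math

def recessive_neutral_threshold(Ne: float, s_b: float) -> float:
    """Frequency below which a fully recessive beneficial allele behaves
    essentially neutrally: x0 = 1 / sqrt(2 * Ne * s_b)."""
    return 1.0 / math.sqrt(2.0 * Ne * s_b)

if __name__ == "__main__":
    # Hypothetical population sizes and selection coefficients
    for Ne, s_b in ((1e4, 0.01), (1e4, 0.1), (1e6, 0.01)):
        x0 = recessive_neutral_threshold(Ne, s_b)
        print(f"Ne = {Ne:.0e}, s_b = {s_b}: x0 = {x0:.4f}")
```

For Ne = 10^4 and sb = 0·01 the threshold is about 7%, so a recessive allele can drift to appreciable frequency, and hence into several copies, before selection takes hold.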
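The human estimate of a 'recent' Θ ≈ 10^−3 rests on the product Θ = 4Ne·u. The arithmetic below plugs in the Ne values (~14 000 pre-agriculture, ~500 000 present in Africa) and the mutation rates (1–2 × 10^−8) quoted in the notes; which Ne value enters the 'recent' estimate is not spelled out here, so the sketch just shows both. The function name is hypothetical.

```python
def population_mutation_rate(Ne: float, u: float) -> float:
    """Theta = 4 * Ne * u for a diploid population (u per site per generation)."""
    return 4.0 * Ne * u

if __name__ == "__main__":
    for Ne in (14_000, 500_000):   # pre-agricultural vs present-day African Ne
        for u in (1e-8, 2e-8):     # point-mutation rate range quoted in the notes
            theta = population_mutation_rate(Ne, u)
            print(f"Ne = {Ne:>7}, u = {u:.0e}: Theta = {theta:.1e}")
```

With the pre-agricultural Ne the product is on the order of 10^−3 (5·6 × 10^−4 to 1·1 × 10^−3), whereas the present-day Ne gives values more than an order of magnitude higher (0·02 to 0·04), which illustrates how strongly Θ depends on the timing of the adaptive event.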