soft sweep

Zheng Y & Wiehe T 2019 Adaptation in structured populations and fuzzy boundaries between hard and soft sweeps. PLoS Comput Biol 15:e1007426.

  • train- and test-sets must have the same, or at least similar, demographic parameters so that demographic effects will not be mis-identified as selection signals
  • real population histories may lie outside of the tested parameter space
  • it has been claimed that over 90% of the recent adaptation events in Homo sapiens have been soft sweeps, making hard sweeps the exception rather than the rule [35]
  • this finding is consistent with an earlier study reporting that "classic selective sweeps," i.e., those characterized by a sharp reduction of diversity around an adaptive locus, are rare in human populations
  • recent Drosophila adaptations are largely attributed to hard sweeps
  • we assume a co-dominant fitness scheme
  • wild-type homozygotes have fitness 1
  • heterozygotes have fitness 1 + s
  • mutant homozygotes have fitness (1 + s)² ≈ 1 + 2s
  • we assume further s = 0.02
  • a total of 5,000 parallel samples (100 independent populations, and 50 samples from each) were produced from each scenario, each deme and each time point
  • haplotype-based methods excel with ongoing and early-stage sweeps
  • frequency-spectrum-based ones are more powerful for completed sweeps
  • this is consistent with the fact that methods such as iHS are designed for ongoing sweeps rather than for completed ones
  • generally, we observe a tendency to mis-classify hard sweeps as soft when there is a stage mismatch between train- and test-sets of a predictor
  • we call this effect temporal softening
  • misclassification as soft sweeps is common at both ends of the timeline but rare at time stages close to fixation
  • classification of sweeps as "hard" and "soft" often relies on ideal assumptions such as known time stage and genomic location of the selection site, as well as demographic assumptions such as a panmictic population of constant size
  • with regard to the location-based effect known as "soft shoulder", potential solutions include explicitly modeling regions linked to hard sweeps as well as classifying sweeps based on signal peaks only
  • our "temporal softening" is caused by an early-stage hard sweep mimicking the signal of later-stage soft sweeps
  • multiple haplotypes at the locus
  • weaker reduction of genetic diversity
  • one-peak patterns for statistics like Fay and Wu's H or linkage-based ones
  • (two-peak patterns occur for fixed hard sweeps)
  • the peaks refer to the shape of the statistics along the chromosomes, surrounding the site of the adaptive allele
  • when a machine-learning algorithm is trained with sweeps of only one time stage, or a statistic (especially a likelihood-ratio test) is created based on only ongoing or fixed sweeps, it can be unable to recognize patterns for other stages
  • most previous studies have focused on only ongoing [62] or fixed [21] sweeps
  • so far, little attention has been paid to the question of how robust the tools are with respect to stage mismatch and how much false positive and negative rates may be inflated by this problem
  • we thus argue that searches for sweeps in genomic data, especially those that also try to distinguish hard and soft sweeps, need to explicitly account for the different stages (ongoing, recent or ancient) in the models and (if applicable) machine-learning training sets
  • it is possible that the large number of "soft sweeps" discovered in the human genome [36], [35] are "sweeps by proxy"
  • i.e., hard sweeps that occurred in other populations and were imported by migration
  • mixed samples intensify the "spatial softening" effect in local adaptation scenarios
  • temporal misclassification, including softening and hardening, refers to classification of hard sweeps as soft or vice versa, because the training model does not match the tested data in time stage
  • spatial softening can cause hard sweeps in neighboring demes to be falsely detected as soft
  • if a panmictic population model is used in data analysis but the real situation involves occasional migration, false positive sweeps (mainly classified as soft) may ensue
  • the claim that human populations have overwhelmingly soft sweeps as the mode of adaptation may be a result of biased classification
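
The co-dominant fitness scheme noted above (fitness 1, 1 + s and (1 + s)² with s = 0.02) can be sketched in a few lines of Python. This is only an illustrative, deterministic calculation under random mating in an infinite population. It is not the simulation procedure of the paper, which these notes do not describe in detail. The function names and the start and target frequencies (0.01 and 0.99) are assumptions chosen for the example.

```python
def genotype_fitness(s):
    """Fitness of wild-type homozygote, heterozygote, mutant homozygote."""
    return 1.0, 1.0 + s, (1.0 + s) ** 2


def next_frequency(p, s):
    """Mutant allele frequency after one generation of selection.

    Assumes random mating and no drift (a deterministic sketch,
    not the stochastic model used in the paper).
    """
    w_wt, w_het, w_mut = genotype_fitness(s)
    q = 1.0 - p
    mean_w = q * q * w_wt + 2.0 * p * q * w_het + p * p * w_mut
    return (p * p * w_mut + p * q * w_het) / mean_w


def generations_to_reach(p0, p_target, s):
    """Count generations until the mutant frequency reaches p_target."""
    p, t = p0, 0
    while p < p_target:
        p = next_frequency(p, s)
        t += 1
    return t


if __name__ == "__main__":
    s = 0.02  # selection coefficient from the notes
    print(genotype_fitness(s))
    # p0 and p_target below are hypothetical illustration values
    print(generations_to_reach(0.01, 0.99, s))
```

Because the fitnesses are multiplicative, the update simplifies to p' = p(1 + s) / (1 + ps), so the odds p / (1 − p) grow by a factor of 1 + s per generation. The approximation (1 + s)² ≈ 1 + 2s drops only the s² term, which is 0.0004 for s = 0.02.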