population history

Gutenkunst RN, Hernandez RD, Williamson SH & Bustamante CD 2009 Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet 5:e1000695.

  • combining our demographic model with a previously estimated distribution of selective effects among newly arising amino acid mutations accurately predicts the frequency spectrum of nonsynonymous variants across three continental populations (YRI, CHB, CEU)
  • African and Eurasian populations diverged around 140,000 years ago
  • this is earlier than other genetic studies suggest
  • our model includes the effects of migration, which we found to be important for reproducing observed patterns of variation in the data
  • we find no evidence for recurrent migration after East Asian and Native American populations diverged
  • simulations incorporating linkage are necessary to estimate variances and define critical values for hypothesis testing and model selection
  • numerics
  • we solve the diffusion equation on a regular nonuniform grid, using a finite difference scheme [36]
  • inspired by the method of Chang and Cooper [37]
  • data
  • we considered all diallelic SNPs in 5.01 Mb of sequence from noncoding regions of 219 autosomal genes
  • to estimate the ancestral allele, we aligned to the panTro2 build of the chimp genome
  • like other methods based on the unfolded AFS, our analysis is sensitive to errors in identifying the ancestral allele
  • the human-chimp divergence in the data is 1.13%
  • we assumed a divergence time of 6 My [45] and a generation time of 25 years
  • this yielded an estimated neutral mutation rate of μ = 2.35 × 10⁻⁸ per site per generation
  • which is comparable to direct estimates [46]
  • there is some controversy as to the appropriate generation time to assume in human population genetic studies
  • the human generation time may differ between cultures and may have changed during our biological and cultural evolution
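
The mutation rate quoted above can be checked with simple arithmetic. This is a minimal sketch, assuming the standard relation d = 2μT (mutations accumulate along both the human and the chimp lineage), which these notes do not state explicitly; the variable names are illustrative.

```python
# Back-of-the-envelope check of the neutral mutation rate in the notes.
# Assumption (not stated in the notes): divergence d = 2 * mu * T, with T in
# generations, because mutations accumulate on both lineages since the split.

divergence = 0.0113            # human-chimp divergence per site (1.13%)
divergence_time_years = 6e6    # assumed divergence time, 6 My
generation_time_years = 25     # assumed generation time

generations = divergence_time_years / generation_time_years
mu = divergence / (2 * generations)

print(f"generations since divergence: {generations:.0f}")
print(f"mu per site per generation:   {mu:.3g}")
```

Under this relation μ scales linearly with the assumed generation time, so the generation-time controversy noted above carries over directly to the per-generation mutation rate.
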
  • results
  • first, we explored how various demographic forces affect the AFS, building intuition for our subsequent applications to real data
  • we then compared the performance of diffusion versus coalescent methods for evaluating the AFS, finding that the diffusion approach is substantially faster
  • we then applied our diffusion approach to infer parameters for plausible demographic models for the history of continental human populations
  • we first considered the expansion of humans out of Africa and then the settlement of the New World
  • finally in an application incorporating selection, we predicted the distribution of nonsynonymous variation between populations in our Out of Africa model
  • computational performance
  • implemented in the program ∂a∂i (Diffusion Approximations for Demographic Inference)
  • the computational advantage of the diffusion method is even larger when placed in the context of parameter optimization
  • unlike the coalescent approach, there is no simulation variance
  • efficient derivative-based optimization methods can be used
  • the fits of three-population models with roughly a dozen parameters typically took a few hours to converge from a reasonable initial parameter set
  • this speed allows us to use extensive bootstrapping to estimate variances, overcoming the limitations of composite likelihood
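
The idea of bootstrapping to estimate variances can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: resampling is assumed to be done over whole sequenced regions, the per-region spectra are synthetic, and `watterson_theta` is a hypothetical stand-in for the full demographic model fit that would be rerun on each resampled data set.

```python
import random

def watterson_theta(sfs, n):
    """Hypothetical stand-in for a model fit: Watterson's theta from an AFS.

    sfs[i-1] is the number of sites where the derived allele is seen i times
    among n sampled chromosomes (i = 1 .. n-1).
    """
    harmonic = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / harmonic

def bootstrap_over_regions(region_sfs, n, n_boot=200, seed=0):
    """Resample whole regions with replacement and re-estimate each time.

    Resampling regions rather than single SNPs keeps linked sites together,
    so the spread of the estimates reflects the non-independence of sites.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(region_sfs) for _ in region_sfs]
        pooled = [sum(col) for col in zip(*sample)]
        estimates.append(watterson_theta(pooled, n))
    return estimates

# Synthetic example data: 30 regions, n = 6 chromosomes (5 frequency classes).
data_rng = random.Random(1)
n = 6
regions = [[data_rng.randint(0, 10 // i) for i in range(1, n)] for _ in range(30)]

point = watterson_theta([sum(col) for col in zip(*regions)], n)
boots = bootstrap_over_regions(regions, n)
mean = sum(boots) / len(boots)
sd = (sum((b - mean) ** 2 for b in boots) / (len(boots) - 1)) ** 0.5
print(f"theta_W = {point:.2f}, bootstrap sd = {sd:.2f}")
```

Because every replicate requires a new fit, this approach is only practical when a single fit is fast, which is the point made in the notes above.
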
  • expansion out of Africa
  • our analysis of human expansion out of Africa used data from three HapMap populations
  • 12 Yoruba individuals from Ibadan, Nigeria (YRI)
  • 22 CEPH Utah residents with ancestry from northern and western Europe (CEU)
  • 12 Han Chinese individuals sampled in Beijing, China (CHB)
  • previous analyses found that the YRI spectrum is well-fit by a two-epoch model with ancient population growth [5,17]
  • we found this as well
  • previous analyses of the CEU and CHB populations found that both populations went through bottlenecks [5,11] concurrent with divergence
  • such models qualitatively fit our marginal CEU-CHB spectrum
  • combining these demographic features yields the model illustrated in Figure 2B
  • allowing for asymmetric gene flow yielded very little improvement in fit, as did allowing for growth in the Eurasian ancestral population or allowing the CEU and CHB bottleneck and divergence times to differ
  • our composite likelihood function assumes that polymorphic sites are independent
  • it thus overestimates the number of effective independent data points
  • to control for linkage, we performed both conventional and parametric bootstraps
  • the times for growth in the African ancestral population and divergence of the Eurasian ancestral population (T_AF and T_B) have particularly wide confidence intervals, likely a consequence of the high inferred migration rate m_AF-B between the African and Eurasian ancestral populations
  • T_AF shows high correlation with the ancestral population size N_A
  • T_B shows no strong linear correlation with any other single parameter
  • N_AS0 < N_EU0
  • the CHB population suffered a more severe bottleneck than the CEU population [11]
  • the improvement in fit to the real data upon adding contemporary migration to the model is much larger than would be expected if there were no such migration
  • the contemporary migration we infer is highly statistically significant
  • omitting ancient migration (m_AF-B) reduced fit quality even more
  • the data also demand substantial ancient migration
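
The independence assumption behind the composite likelihood can be made concrete with a small example. The sketch below assumes each AFS entry is an independent Poisson draw around the model expectation, a common form that these notes do not spell out; the function name and the toy spectra are hypothetical.

```python
import math

def poisson_composite_ll(observed, expected):
    """Composite log-likelihood of an observed AFS given model expectations.

    Every entry is treated as an independent Poisson variable, which is
    where the assumption of independent (unlinked) sites enters.
    """
    ll = 0.0
    for obs, exp in zip(observed, expected):
        if exp <= 0:
            raise ValueError("model expectations must be positive")
        ll += obs * math.log(exp) - exp - math.lgamma(obs + 1)
    return ll

# Toy one-population spectra: counts of sites in frequency classes 1..5.
observed = [120, 55, 40, 31, 22]
model_a = [118.0, 59.0, 39.3, 29.5, 23.6]   # roughly theta / i with theta = 118
model_b = [60.0, 60.0, 60.0, 60.0, 60.0]    # flat spectrum, a poor model

ll_a = poisson_composite_ll(observed, model_a)
ll_b = poisson_composite_ll(observed, model_b)
print(f"ll(model_a) = {ll_a:.2f}, ll(model_b) = {ll_b:.2f}")
```

With linked sites the entries are correlated, so differences in this quantity between models cannot simply be compared against standard critical values; that is why the notes call for bootstraps and simulations incorporating linkage.
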
  • settling the New World
  • a model in which the CEU and CHB diverge from an equilibrium population did not reproduce the AFS well
  • a model allowing a prior size change in the ancestral population better fit the AFS but very poorly fit the observed LD decay
  • reproducing the AFS does not guarantee reproduction of LD
  • nonsynonymous polymorphism
  • diffusion approaches are particularly useful for studying such nonsynonymous polymorphism, because they easily incorporate selection
  • the diffusion approximation assumes that sites are unlinked
  • nonsynonymous segregating sites are rare enough that this is often a reasonable approximation
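
To illustrate how selection enters a diffusion calculation, the sketch below computes an expected one-population AFS at equilibrium. It assumes the standard stationary density φ(x) = θ (1 − e^(−2γ(1−x))) / ((1 − e^(−2γ)) x (1 − x)) and binomial sampling of n chromosomes; neither formula appears in these notes, the scaling convention for the selection coefficient γ is an assumption, and the function name and grid are illustrative. It is a single-population toy, not the three-population calculation summarized here.

```python
import math

def expected_afs(n, gamma, theta=1.0, pts=2001):
    """Expected number of sites with derived-allele count i = 1 .. n-1.

    Integrates binomial sampling against the equilibrium density using the
    trapezoid rule on a grid that is denser near x = 0 and x = 1.
    """
    xs = [0.5 * (1.0 - math.cos(math.pi * k / (pts - 1))) for k in range(pts)]

    def sel_factor(x):
        # Selection-dependent part of the density; reduces to (1 - x)
        # in the neutral limit gamma -> 0.
        if gamma == 0:
            return 1.0 - x
        return math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)

    afs = []
    for i in range(1, n):
        coef = theta * math.comb(n, i)
        # The 1 / (x (1 - x)) factor is cancelled analytically against the
        # binomial term so the integrand stays finite at both boundaries.
        ys = [coef * x ** (i - 1) * (1.0 - x) ** (n - i - 1) * sel_factor(x)
              for x in xs]
        total = sum(0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
                    for k in range(pts - 1))
        afs.append(total)
    return afs

n = 10
neutral = expected_afs(n, 0.0)        # should be close to theta / i
deleterious = expected_afs(n, -5.0)   # negative gamma: deleterious mutations
print([round(v, 3) for v in neutral])
print([round(v, 3) for v in deleterious])
```

In the neutral case the result should approach θ/i, and with negative γ the high-frequency classes are depleted relative to neutrality. A distribution of selective effects, as mentioned in these notes, would correspond to averaging such spectra over values of γ.
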
  • discussion
  • Keinan et al. [11] inferred no significant migration between CEU and CHB
  • our inferred divergence time of ≈22 kya between East Asians and Mexican-Americans is somewhat older than the oldest well-accepted New World archeological evidence
  • the divergence we infer may reflect the settlement of Beringia, rather than the expansion into the New World proper
  • the divergence time of ≈140 kya we infer between African and Eurasian populations is consistent with archeological evidence for modern humans in the Middle East ≈100 kya
  • it is much older than other inferences of ≈50 kya divergence from mitochondrial DNA
  • this discrepancy may be explained by our inclusion of migration in the model
  • migration preserves correlation between population allele frequencies
  • an observed correlation across the genome can be explained by either recent divergence without migration or ancient divergence with migration
  • the sampled populations may not best represent those in which historically important divergences occurred
  • we consider only noncoding sequence in fitting our historical model
  • selection on regulatory or linked coding sites may skew the AFS
  • the AFS encodes substantial demographic information
  • the AFS does not capture all the information in the data
  • patterns of linkage disequilibrium encode additional information