population expansion

Gazave E, Ma L, Chang D, Coventry A, Gao F, Muzny D, Boerwinkle E, Gibbs RA, Sing CF, Clark AG & Keinan A 2014 Neutral genomic regions refine models of recent rapid human population growth. PNAS 111:757-762.

  • significance
  • recent studies of the genetic signature left by rapid growth were confounded by purifying selection since they focused on genes
  • to study recent human history with minimal confounding by selection, we sequenced and examined genetic variants far from genes
  • these data point to the human population size growing by about 3.4% per generation over the last 3,000–4,000 y, resulting in a greater than 100-fold increase in population size over that epoch
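The headline numbers can be checked with a little arithmetic. Below is a minimal Python sketch. It uses the 141-generation duration reported for the best-fit model further down. It assumes the 3.4% figure is a per-generation exponential rate and a generation time of 25 years; neither the generation time nor the helper names come from the notes.

```python
import math

YEARS_PER_GENERATION = 25.0  # assumption, not stated in these notes

def fold_continuous(rate: float, generations: float) -> float:
    """N_end / N_start for continuous exponential growth: exp(r * T)."""
    return math.exp(rate * generations)

def fold_compounded(rate: float, generations: float) -> float:
    """N_end / N_start if the rate compounds once per generation: (1 + r) ** T."""
    return (1.0 + rate) ** generations

rate, gens = 0.034, 141  # best-fit growth rate and duration quoted in the notes
print(f"continuous: {fold_continuous(rate, gens):.0f}-fold")
print(f"compounded: {fold_compounded(rate, gens):.0f}-fold")
print(f"duration:   {gens * YEARS_PER_GENERATION:,.0f} years")
```

Read as a continuous rate, the estimate gives roughly a 120-fold increase. Compounded once per generation, it gives roughly 111-fold. Both readings exceed 100-fold, and 141 generations at 25 y each (about 3,500 y) falls inside the quoted 3,000–4,000 y.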
  • more recent studies based on sequencing data from a relatively small number of individuals have considered recent population growth in fitting models to the observed site frequency spectrum (SFS) and reported as much as a 0.5% increase in Ne per generation, culminating in a Ne of a few tens of thousands today (13, 14)
  • these studies could not capture the full scope of population growth
  • a larger sample size of individuals is needed to observe single nucleotide variants (SNVs) that arose during the recent epoch of growth
  • several recent sequencing studies with very large numbers of individuals have observed an unprecedented excess in the proportion of rare SNVs
  • fitting models to the SFS, these studies have captured a clearer, more rapid recent population growth than earlier studies (17–19)
  • at the same time, demographic estimates varied by as much as an order of magnitude between these studies
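An "excess" of rare SNVs is always measured against some baseline. A common baseline is the standard neutral model with constant population size. Under that model, the expected number of SNVs carried by i of n sampled chromosomes is θ/i. The sketch below computes that expectation and the implied share of singletons. It is an illustrative baseline, not one of the models fitted in the paper, and the function names are hypothetical.

```python
def neutral_sfs(n: int, theta: float = 1.0) -> list[float]:
    """Expected unfolded SFS E[xi_i] = theta / i for i = 1..n-1 derived copies
    in a sample of n chromosomes (standard neutral model, constant size)."""
    if n < 2:
        raise ValueError("need at least two chromosomes")
    return [theta / i for i in range(1, n)]

def singleton_fraction(n: int) -> float:
    """Expected share of segregating sites that are singletons, 1 / a_n,
    where a_n = sum_{i=1}^{n-1} 1/i."""
    sfs = neutral_sfs(n)
    return sfs[0] / sum(sfs)

for n in (100, 1_000, 10_000):
    print(n, round(singleton_fraction(n), 3))
```

Under constant size, the singleton share declines slowly as n grows, from about 19% at n = 100 chromosomes to about 10% at n = 10,000. A singleton share well above this baseline in a very large sample is the kind of excess described above.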
  • the genetic signature left by purifying selection on the SFS confounds the signature left by recent growth (21)
  • to minimize this confounding effect, recent studies based on protein-coding genes considered for modeling solely synonymous SNVs
  • however, synonymous mutations have themselves been shown to be targeted by natural selection
  • to study recent human genetic history with minimal effects of selection, it is not only desirable to consider accurate sequencing data from a large number of individuals, but also to focus on genomic regions in which mutations are putatively neutral
  • our best-fit model estimates a growth of 3.4% per generation over the last 141 generations, which is more rapid than estimated in recent large-scale studies (17, 19)
  • differences among previous models (17, 19)—and between these models and ours—can be partially explained by a priori assumptions about more ancient demographic events
  • the neutral regions dataset (NR) consists of loci that are at least 100,000 bp and at least 0.1 cM away from coding or potentially coding sequences
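The NR inclusion rule can be expressed as a distance filter. The following is a minimal sketch. It assumes that each candidate locus and each coding (or potentially coding) element is already annotated with start and end positions in both bp and cM, on the same chromosome. The `Interval` type and the function names are hypothetical, not taken from the paper's pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A genomic segment with physical (bp) and genetic (cM) coordinates."""
    start_bp: int
    end_bp: int
    start_cm: float
    end_cm: float

def gap(a_start, a_end, b_start, b_end):
    """Distance between two closed intervals; 0 if they overlap."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))

def is_neutral_candidate(locus, coding, min_bp=100_000, min_cm=0.1):
    """True if the locus is at least min_bp AND at least min_cm away from
    every coding interval (all intervals assumed on one chromosome)."""
    for c in coding:
        if gap(locus.start_bp, locus.end_bp, c.start_bp, c.end_bp) < min_bp:
            return False
        if gap(locus.start_cm, locus.end_cm, c.start_cm, c.end_cm) < min_cm:
            return False
    return True
```

The two thresholds are checked independently. A locus that is physically far from a gene can still be rejected if the intervening region recombines little, and vice versa.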
  • we compared the SFS from the NR dataset to that from the Exome Sequencing Project (ESP)
  • the highest agreement is observed for intronic and intergenic annotations from the ESP data
  • these are still significantly different from the NR
  • agreement is much worse between synonymous SNVs and the NR data
  • this lack of agreement is due to a higher proportion of very rare variants in the ESP data (...), which is consistent with purifying selection playing a larger role in maintaining alleles at lower frequencies in and around genes
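One way to quantify a "higher proportion of very rare variants" is to convert per-site derived-allele counts into a normalized SFS and then sum the rare bins. The sketch below makes several assumptions. The ancestral allele is known, so the SFS is unfolded. Both datasets are summarized at the same number of chromosomes n. Datasets of different size would first have to be projected to a common n, and this sketch omits that step. "Rare" means a frequency of at most 1%. This is not the statistical test used in the paper, and the names are hypothetical.

```python
from collections import Counter

def sfs_proportions(derived_counts, n):
    """Unfolded SFS as proportions: element i-1 is the share of SNVs carrying
    i derived copies among n chromosomes (monomorphic sites are dropped)."""
    counts = Counter(c for c in derived_counts if 0 < c < n)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no segregating sites")
    return [counts[i] / total for i in range(1, n)]

def rare_fraction(derived_counts, n, max_freq=0.01):
    """Share of SNVs whose derived-allele frequency i/n is at most max_freq."""
    sfs = sfs_proportions(derived_counts, n)
    return sum(p for i, p in enumerate(sfs, start=1) if i / n <= max_freq)
```

Applying `rare_fraction` to the ESP counts and to the NR counts at a common n would express the difference described above as a single pair of numbers.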
  • the SFS predicted by recently published demographic models of European populations that include recent exponential population growth (17, 19) does not closely match the SFS of the NR data
  • we next estimated the magnitude and duration of population growth by fitting the SFS of the NR to several different models of recent history
  • the first demographic model that we fit to the SFS consists of a recent exponential population growth with two free parameters: the time growth started and the extant Ne
  • more ancient demographic events, before the epoch of growth, including two population bottlenecks, were assumed to follow the model and estimates of ref. 9
  • the resulting model (model I) estimates the extant population size and, as a consequence, the growth rate, with very large uncertainty
  • model I, similar to other recent models of population growth (17–19), assumed more ancient demography as fixed, with the intention of obtaining better resolution for estimating the two parameters of recent history
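Model I's recent history can be written as a size trajectory. It has two free parameters, the start of growth T and the extant size N0, and it grows from a fixed ancestral size (10,000 in model I). The following is a minimal sketch with hypothetical function names. Model I takes everything older than T from ref. 9, including two bottlenecks. Here all of that is collapsed into a single constant size.

```python
import math

def growth_rate(n_now, n_anc, t_start):
    """Per-generation exponential rate r such that n_anc * exp(r * t_start) = n_now."""
    return math.log(n_now / n_anc) / t_start

def ne_at(t, n_now, t_start, n_anc=10_000):
    """Ne t generations before present under one epoch of exponential growth
    that began t_start generations ago from n_anc (fixed at 10,000 in model I).
    Older history (fixed ancient model) is simplified to the constant n_anc."""
    if t >= t_start:
        return n_anc
    r = growth_rate(n_now, n_anc, t_start)
    return n_now * math.exp(-r * t)
```

In this parameterization, fitting model I means searching over `n_now` and `t_start` only. The growth rate is then a derived quantity, which is why uncertainty in the extant size carries over directly into the growth rate.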
  • different recent models of population growth have assumed different models of ancient demography
  • to test how sensitive model I is to the details of the assumed ancient history, we repeated fitting model I under each of three ancient demographic models, those of Keinan et al. (9), Gravel et al. (14), and Schaffner et al. (15)
  • our results point to large differences among the three resulting models
  • the assumption of ancient demography has a major effect on estimating the timing and magnitude of recent population growth, which explains some of the differences among the recently published models
  • to reduce this sensitivity to assumptions about ancient demography, we fit model II, which extends model I with an additional free parameter: the effective population size just before exponential growth
  • this three-parameter model fits the NR data significantly better than model I
  • it estimates the ancestral Ne before the growth to be 5,633 (CI of 4,400–7,100), markedly lower than the fixed value of 10,000 in model I
  • it estimates growth starting 141 (117–165) generations ago, which is a little earlier than in model I, with a less rapid growth rate of 3.4% (2.4–5.1%) per generation, which culminates in an extant Ne of 0.65 (0.3–2.87) million individuals
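The model II point estimates are internally consistent. For exponential growth from N_anc to N0 over T generations, the rate is r = ln(N0/N_anc)/T. Below is a quick check in Python using the point estimates above. Confidence intervals cannot be recovered this way, because the parameters were estimated jointly.

```python
import math

# Model II point estimates quoted in these notes
n_anc, n_now, t_start = 5_633, 650_000, 141

r = math.log(n_now / n_anc) / t_start  # implied per-generation growth rate
print(f"r = {r:.2%} per generation; fold increase = {n_now / n_anc:.0f}")
```

This gives r of about 3.37% per generation and a fold increase of about 115, in line with the reported 3.4% and the "greater than 100-fold" summary.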
  • to test whether the improved model II can explain the differences between assumed ancient demographic histories, we refit its three parameters to the NR data under each of them, as done above for model I
  • when ancient demography is assumed from Gravel et al. (14), this model fits the data significantly better (P = 2.9 × 10^-7) than the equivalent of model I with the same ancient demography
  • parameter estimates become practically identical to those of the above model II based on Keinan et al. (9)
  • model II based on Schaffner et al. (15) does not fit the data better than the respective model I (P = 0.52)
  • one notable difference in the model of Schaffner et al. (15) is that the second European population bottleneck is fixed at a time more than twice that estimated in the other two models (9, 14)
  • the improved model II is much less sensitive to assumptions about more ancient demography
  • it goes a long way in closing the gap between different published models of recent growth (17, 19) and between these and model I
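Model II nests model I, since it only frees the ancestral Ne. A natural way to compare the two is therefore a likelihood-ratio test with one degree of freedom. The notes do not say which test produced the P values above, so this sketch assumes an LRT. It uses the identity P(χ²₁ > D) = erfc(√(D/2)), which means only the standard library is needed. The function name is hypothetical.

```python
import math

def lrt_pvalue_1df(loglik_null, loglik_alt):
    """P value of a likelihood-ratio test between nested models that differ by
    one free parameter: D = 2 * (lnL_alt - lnL_null), P = P(chi2_1 > D)."""
    d = max(0.0, 2.0 * (loglik_alt - loglik_null))  # clip numerical noise below 0
    return math.erfc(math.sqrt(d / 2.0))
```

For example, a log-likelihood gain of 2 units (D = 4) gives P ≈ 0.046. A gain of zero gives P = 1.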
  • we considered several additional models in which growth can span two separate epochs with a different growth rate in each
  • none of these more detailed models fit the data better than model II
  • because population structure also affects the distribution of allele frequencies, the data consist of individuals with a homogeneous European ancestry
  • we considered models that allow for acceleration of the rate of growth
  • none supported such acceleration
  • one recent model considered two separate epochs of exponential growth (19)
  • the first captures a slow recovery from the Eurasian population bottleneck ∼23,000 y ago, with a weak growth rate of 0.3% that leads to an Ne of only 9,208, which is similar to the instantaneous recovery from the population bottleneck in other models (14)
  • to date no recent acceleration in the rate of growth that is along the lines suggested by archeological evidence has been observed in genetic data
  • not capturing two separate epochs of growth can be due to limited statistical power or overly simplified models
  • another potential explanation is that effective population size increases extremely slowly with the census population size
  • the particular increase in census population size with the Neolithic revolution has been accompanied by changing social structure that has led to increased variability in reproductive success
  • increased variance in reproductive success results in relatively decreased effective population size
  • this can explain either a lack of growth in effective population size initially or a much milder one than in census size
  • this increased variance, in turn, can lead our models to capture only the more recent and more rapid growth
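The standard textbook approximation for a population of constant census size N makes the link between reproductive variance and effective size concrete: Ne = (4N − 2)/(Vk + 2). Here Vk is the variance in the number of offspring (gametes) contributed per individual. This formula is an illustration and is not part of the paper's models. The function name is hypothetical.

```python
def ne_from_offspring_variance(n_census, var_offspring):
    """Textbook constant-size approximation Ne = (4N - 2) / (Vk + 2),
    with Vk the variance in offspring (gamete) number per individual."""
    return (4 * n_census - 2) / (var_offspring + 2)

for vk in (2, 4, 8):
    print(vk, ne_from_offspring_variance(10_000, vk))
```

With Poisson reproduction (Vk = 2), Ne ≈ N. Raising Vk to 8 cuts Ne to about 40% of N. Census growth accompanied by rising reproductive variance can therefore translate into much slower growth in Ne.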
  • selection of individuals with shared European genetic ancestry
  • PCA was run on 9,716 European Americans from the ARIC cohort
  • individuals inferred as outliers with non-European ancestry were removed, and regions of extended linkage disequilibrium, such as inversions, were excluded
  • a total of 500 individuals that were densely clustered together based on the first four principal components (PCs) were then chosen for sequencing
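The notes do not say how "densely clustered" was defined. One simple, hypothetical criterion is to keep the k individuals closest to the per-PC median in the space of the first four PCs. The sketch below assumes the PC scores are already computed, with one row per individual. The function name and the median-distance rule are illustrative, not the study's procedure.

```python
import math
from statistics import median

def select_core_cluster(pc_scores, k=500, n_pcs=4):
    """Indices of the k individuals nearest (Euclidean distance) to the
    per-PC median over the first n_pcs principal components."""
    centre = [median(row[j] for row in pc_scores) for j in range(n_pcs)]
    dist = [math.dist(row[:n_pcs], centre) for row in pc_scores]
    order = sorted(range(len(pc_scores)), key=dist.__getitem__)
    return order[:k]
```

The median is used instead of the mean so that any remaining outliers do not pull the cluster centre toward themselves.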
  • supplementary information
  • we also studied two three-parameter models:
  • similar to Model I, with the ancestral Ne before growth as an additional free parameter ('Model II')
  • a model with two epochs of growth with the start time of the first growth fixed at 400 generations ago ('Model III')
  • we additionally explored a four-parameter model by allowing the time of the first growth in Model III to be another free parameter ('Model IV')
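Models III and IV allow two back-to-back epochs of growth. A generic piecewise-exponential trajectory can represent both. The sketch below parameterizes each epoch by the time (generations before present) at which it ends going backward, together with its growth rate. For model III, the older epoch's boundary would be fixed at 400 generations. For model IV, it would be free. This parameterization and the function name are hypothetical, since the notes do not list the exact free parameters.

```python
import math

def ne_piecewise(t, n_now, epochs):
    """Ne t generations before present for consecutive exponential epochs.

    epochs: list of (end_time, rate), most recent first; epoch i covers
    generations [previous end_time, end_time) and grows at `rate` per
    generation. Beyond the last end_time, Ne stays constant.
    """
    log_n = math.log(n_now)
    prev = 0.0
    for end, rate in epochs:
        span = min(t, end) - prev  # generations of this epoch between now and t
        if span <= 0:
            break
        log_n -= rate * span  # walk backward in time: undo growth
        prev = end
    return math.exp(log_n)
```

For example, `ne_piecewise(t, n_now, [(t2, r2), (400, r1)])` describes a recent epoch that started `t2` generations ago, preceded by an older epoch that started 400 generations ago.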