population history

Hey J 2010 Isolation with migration models for more than two populations. Mol Biol Evol, in press.
doi:10.1093/molbev/msp296

  • Nielsen and Wakeley (2001) developed the first general procedure for estimating population size, migration, and splitting-time parameters for an isolation-with-migration (IM) model with two populations
  • they adapted the Bayesian Markov chain Monte Carlo (MCMC) approach devised by Wilson and Balding (1998) to estimate the posterior probability of the parameters and the genealogy
  • this approach has two drawbacks: the state space of the Markov chain simulation is very large (i.e., it includes all parameters and genealogies), and the primary results are a list of recorded parameter values rather than an estimate of a posterior density function
  • these difficulties can be partly overcome by calculating the prior probability of the genealogy directly, which allows the Markov chain simulation to run over a space that includes only the genealogy, G, and the population splitting times, t
  • this MCMC simulation generates samples from the posterior density p(G, t|X), where X denotes the data
  • these samples are then used to estimate the joint posterior density of the model parameters, p(Θ|X) (Hey and Nielsen 2007)
  • analyses of pairs of populations assume that each is the other's closest relative and that no other populations have contributed to the divergence process of the sampled populations (e.g., by exchanging genes with them)
  • in cases where levels of gene flow have been low, population splitting times are not very close to one another, and population sizes have not changed greatly over time, the assumptions of an IM analysis on a reduced set of populations will not be greatly violated
  • in such cases, the results of a full multipopulation analysis should be predictable from the results of analyses on reduced samples
  • adding populations and parameters increases the need for data, not only to inform on recent processes involving the newly included populations but also to inform on processes that occurred further back in time
  • increasing the number of gene copies sampled per locus will not help much in reaching older time periods, because coalescent events within populations are heavily concentrated in the recent past (Felsenstein 2006)
  • increasing the number of loci can provide access to older time periods, though the number of loci required may be very large (e.g., hundreds or thousands) for older histories and larger models
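A toy sketch of the density-estimation step noted above (Hey and Nielsen 2007), shrunk to a single population with no migration and no splitting. Each genealogy G_i sampled from p(G|X) contributes p(θ|G_i), and averaging these gives a smooth estimate of p(θ|X) instead of a list of recorded values. Assumptions not taken from the source: a genealogy is summarized as hypothetical (k, t_k) interval pairs in mutation-scaled time, each specific pair coalesces at rate 2/θ (θ = 4Nμ, diploid), the prior on θ is uniform over the grid, and all function names are illustrative. This is not the program described in the paper.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical summary of one sampled genealogy: (k, t_k) pairs, where t_k is the
# length of the interval during which k lineages remain, in mutation-scaled time.
Genealogy = Sequence[Tuple[int, float]]


def log_prob_genealogy(g: Genealogy, theta: float) -> float:
    """log p(G | theta) for one population: the total coalescence rate with k
    lineages is k(k-1)/theta, and each specific pair coalesces at rate 2/theta."""
    return sum(math.log(2.0 / theta) - k * (k - 1) * t / theta for k, t in g)


def trapezoid(ys: Sequence[float], xs: Sequence[float]) -> float:
    """Trapezoid-rule integral of ys over the grid xs."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(xs) - 1))


def posterior_given_genealogy(g: Genealogy, grid: Sequence[float]) -> List[float]:
    """p(theta | G) on the grid, assuming a uniform prior on theta over the grid range."""
    logs = [log_prob_genealogy(g, th) for th in grid]
    top = max(logs)  # subtract the maximum before exponentiating to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    area = trapezoid(weights, grid)
    return [w / area for w in weights]


def estimate_posterior(samples: Sequence[Genealogy], grid: Sequence[float]) -> List[float]:
    """Estimate p(theta | X) as the average of p(theta | G_i) over genealogies
    G_i assumed to have been sampled from p(G | X) by an MCMC run (not simulated here)."""
    per_sample = [posterior_given_genealogy(g, grid) for g in samples]
    return [sum(column) / len(per_sample) for column in zip(*per_sample)]


if __name__ == "__main__":
    grid = [0.01 * i for i in range(1, 1001)]  # theta from 0.01 to 10
    sampled = [[(3, 0.2), (2, 0.6)], [(3, 0.3), (2, 0.9)]]  # made-up genealogies
    density = estimate_posterior(sampled, grid)
    best = max(range(len(grid)), key=density.__getitem__)
    print(f"estimated posterior mode of theta: {grid[best]:.2f}")
```

For the first made-up genealogy alone, p(θ|G) peaks at Σ k(k−1)t_k divided by the number of coalescences, i.e., 2.4/2 = 1.2, its maximum-likelihood value. The averaged curve is a density that can be read, integrated, or maximized directly.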
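The point attributed to Felsenstein (2006) about gene copies per locus can be checked with the standard Kingman coalescent expectation. With time in coalescent units (each pair of lineages coalesces at rate 1, i.e., 2N generations for a constant-size diploid population of N individuals), the expected time spent with k lineages is 2/(k(k−1)). A minimal sketch, assuming a single panmictic population of constant size; the function names are illustrative:

```python
from fractions import Fraction


def expected_interval(k: int) -> Fraction:
    """Expected time during which exactly k lineages remain: 2 / (k(k-1))."""
    return Fraction(2, k * (k - 1))


def expected_tmrca(n: int) -> Fraction:
    """Expected time to the MRCA of n gene copies; the sum telescopes to 2(1 - 1/n)."""
    if n < 2:
        raise ValueError("need at least two gene copies")
    return sum((expected_interval(k) for k in range(2, n + 1)), Fraction(0))


if __name__ == "__main__":
    for n in (2, 10, 100, 1000):
        total = expected_tmrca(n)
        share_last = expected_interval(2) / total  # share of depth spent with 2 lineages
        print(n, float(total), float(share_last))
```

Going from 10 to 1,000 gene copies raises the expected depth of the genealogy only from 1.8 to 1.998 coalescent units, and roughly half of that depth is the final interval with two lineages, whose expected length (1 unit) does not depend on sample size.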
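To contrast the two ways of spending sampling effort described in the last two bullets, this sketch simulates a fixed budget of 2,000 gene copies, allocated either as 200 copies at each of 10 loci or as 10 copies at each of 200 loci, and counts coalescent events older than a threshold τ. Assumptions not from the source: a neutral Kingman coalescent in one constant-size population, independent loci with no intralocus recombination, coalescence times observed directly (in practice they must be inferred from sequence data), a hypothetical threshold τ = 1 coalescent unit, and illustrative function names.

```python
import random
from typing import List


def coalescence_times(n: int, rng: random.Random) -> List[float]:
    """Simulate the neutral Kingman coalescent for n gene copies at one locus.

    Time runs backward from the present in coalescent units (each pair of
    lineages coalesces at rate 1). Returns the times of the n - 1 coalescences.
    """
    t = 0.0
    times = []
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2)  # waiting time while k lineages remain
        times.append(t)
    return times


def count_old_events(copies_per_locus: int, loci: int, tau: float, seed: int = 1) -> int:
    """Total number of coalescent events older than tau across independent loci."""
    rng = random.Random(seed)
    return sum(
        1
        for _ in range(loci)
        for t in coalescence_times(copies_per_locus, rng)
        if t > tau
    )


if __name__ == "__main__":
    tau = 1.0  # hypothetical "old" threshold, in coalescent units
    # Same budget of 2,000 gene copies, allocated two different ways.
    print("200 copies x 10 loci:", count_old_events(200, 10, tau))
    print("10 copies x 200 loci:", count_old_events(10, 200, tau))
```

Each locus yields only a handful of events older than τ no matter how many copies are sampled there, so spreading the same effort over more loci gives many more observations of the older part of the history.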