isolation with migration

Sousa VC, Grelaud A & Hey J 2011 On the nonidentifiability of migration time estimates in isolation with migration models. Mol Ecol 20:3956-3962.

  • some conclusions of previous studies that drew on the posterior distribution of migration times should be discounted
  • although the IM model assumes a constant rate of gene flow since the populations split, it seemed that the posterior density of migration times could also be estimated by examining the genealogies sampled from the posterior (Won & Hey 2005)
  • as Strasburg & Rieseberg (2011) found by simulation, and as we show here by calculating the probability of a genealogy, this is not the case
  • the posterior of the migration timing (eqn 6) is fully characterized by the posterior h(s,Λ|X)
  • the information the data provide about the most likely migration times is captured entirely by the posterior of the summaries s
  • because these summaries are sums of event counts and rates across loci, they introduce an identifiability problem
  • from h(s,Λ|X) we can estimate the most likely values of these sums given the data
  • but we cannot expect to estimate the individual terms of each sum
  • two or more genealogies can therefore have the same posterior probability but different distributions of migration times
  • in such cases the genealogies are said to be nonidentifiable, because they cannot be distinguished by their posterior
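  • when the coalescent is used to calculate the probability of a genealogy under a model with migration, such as the IM model, that probability depends only on a small set of summaries s = (cc,cm,fc,fm): the numbers of coalescent and migration events (cc, cm) and the corresponding sums over time intervals of lineage-count-weighted interval lengths (fc, fm)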
  • genealogies that differ in their times of migration can have the same values for s
  • genealogies with different migration timings can have the same posterior probability
  • the migration timings are statistically nonidentifiable
  • investigators therefore cannot expect to estimate migration times in order to distinguish among models of population or species divergence in which gene flow varies through time
  • this is a general result that applies to genealogies under any neutral, coalescent-based demographic model that includes migration
  • unlike the priors on migration rates, which are usually uniform and specified by the investigator, the priors on migration times are induced by the model assumptions
  • in a model with constant gene flow, the prior on migration times is expected to be not uniform but a decreasing function of time, with a peak close to zero (the present)
  • at any instant, the rate of migration events is proportional to the number of lineages in each population
  • because coalescent events reduce the number of lineages going backwards in time, most migrations are expected to be recent
  • this may explain some earlier results that seemed to suggest recent migration
  • the original motivation for examining the posterior of migration times was to infer how gene flow varied through time
  • we envision at least two approaches to modelling variable migration rates explicitly
  • one is to let migration rates vary through time according to a deterministic function (e.g. exponential change) whose parameters are estimated from the data along with the other model parameters
  • another is to add more migration parameters to the model, each associated with a distinct time period
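The nonidentifiability argument in these notes can be made concrete with a small Python sketch. It assumes the standard structured-coalescent density, with time scaled so that each pair of lineages in population i coalesces at rate 2/θ_i and each lineage leaves population i (backwards in time) at rate m_i, giving log P(G) = Σ_i [cc_i log(2/θ_i) − fc_i/θ_i + cm_i log m_i − m_i fm_i], where fc_i sums k_i(k_i − 1) × interval length and fm_i sums k_i × interval length. It ignores the population split. The event-list representation, the toy two-locus genealogies and all function names (`summaries`, `log_density`, `two_locus_set`, `migration_times`) are hypothetical illustrations, not the paper's code or any IM program's API.

```python
import math
from collections import defaultdict


def summaries(loci):
    """Return s = (cc, cm, fc, fm), each a dict keyed by population, summed over loci.

    Each locus is (initial lineage counts per population, time-ordered events),
    read backwards in time (hypothetical toy format):
      ("coal", t, pop)       two lineages in pop coalesce at time t
      ("mig", t, src, dst)   a lineage in src moves to dst at time t
    """
    cc, cm = defaultdict(int), defaultdict(int)
    fc, fm = defaultdict(float), defaultdict(float)
    for counts, events in loci:
        k = dict(counts)  # current number of lineages in each population
        t_prev = 0.0
        for event in events:
            dt = event[1] - t_prev
            for pop, n in k.items():
                fc[pop] += n * (n - 1) * dt  # coalescent exposure
                fm[pop] += n * dt            # migration exposure
            if event[0] == "coal":
                cc[event[2]] += 1
                k[event[2]] -= 1
            else:
                src, dst = event[2], event[3]
                cm[src] += 1
                k[src] -= 1
                k[dst] = k.get(dst, 0) + 1
            t_prev = event[1]
    return dict(cc), dict(cm), dict(fc), dict(fm)


def log_density(s, theta, m):
    """log P(G | theta, m), computed from the summaries s alone."""
    cc, cm, fc, fm = s
    total = 0.0
    for pop in theta:
        total += cc.get(pop, 0) * math.log(2.0 / theta[pop]) - fc.get(pop, 0.0) / theta[pop]
        total += cm.get(pop, 0) * math.log(m[pop]) - m[pop] * fm.get(pop, 0.0)
    return total


def two_locus_set(a, b, c=1.0):
    """Two loci, each starting with one lineage in population 0 and one in
    population 1; the population-1 lineage moves to population 0 at time a
    (locus 1) or b (locus 2), and the pair coalesces in population 0 at time c."""
    start = {0: 1, 1: 1}
    return [(start, [("mig", a, 1, 0), ("coal", c, 0)]),
            (start, [("mig", b, 1, 0), ("coal", c, 0)])]


def migration_times(loci):
    return sorted(e[1] for _, events in loci for e in events if e[0] == "mig")


set_x = two_locus_set(0.25, 0.75)
set_y = two_locus_set(0.5, 0.5)
theta, m = {0: 1.0, 1: 1.0}, {0: 0.5, 1: 0.5}
print(migration_times(set_x), migration_times(set_y))  # [0.25, 0.75] [0.5, 0.5]
print(summaries(set_x) == summaries(set_y))            # True
```

Both sets yield s = ({0: 2}, {1: 2}, {0: 2.0, 1: 0.0}, {0: 3.0, 1: 1.0}). Because `log_density` reads only s, they receive the same probability for every θ and m, yet their migration times differ. The times are dyadic fractions so the floating-point sums are exact.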
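The point that a constant migration rate induces a decreasing prior on migration times can be checked with a small simulation. This Python sketch is deliberately simplified and is not the IM model. It simulates a single-population coalescent (each pair coalescing at rate 1) and overlays migration events as a Poisson process with rate m per lineage, so migrations never move lineages or alter coalescence. That isolates the mechanism described above: the migration rate at time t is m·k(t), and k(t) only falls going back in time. The function names and parameter values are illustrative assumptions.

```python
import random


def overlay_migrations(n, m, rng):
    """One replicate: n lineages coalesce backwards in time (each pair at rate 1)
    while each lineage experiences migration events at constant rate m.
    In this simplified model migration does not change the lineage count.
    Returns the times of the migration events."""
    t, k, times = 0.0, n, []
    while k > 1:
        coal_rate = k * (k - 1) / 2  # any of the k-choose-2 pairs can coalesce
        mig_rate = m * k             # proportional to the current lineage count
        total = coal_rate + mig_rate
        t += rng.expovariate(total)  # waiting time to the next event
        if rng.random() < mig_rate / total:
            times.append(t)
        else:
            k -= 1
    return times


def migration_time_histogram(n=20, m=1.0, reps=2000, width=0.1, n_bins=20, seed=1):
    """Count migration events in bins [0, width), [width, 2*width), ..."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(reps):
        for t in overlay_migrations(n, m, rng):
            b = int(t / width)
            if b < n_bins:
                counts[b] += 1
    return counts


counts = migration_time_histogram()
print(counts)
```

Even though m is constant, the counts fall off steeply after the first bin, giving the decreasing shape peaked near zero that the notes describe.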
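For the first modelling option, a deterministic function of time, the probability of a genealogy depends on when migrations happen, not only on summed exposures. This is a minimal Python sketch under hypothetical assumptions. It uses exponential change m(t) = m0·exp(α·t), with t measured backwards from the present (α > 0 means more gene flow in the past). It also uses a toy setting in which each locus has one lineage exposed to one-way migration from time 0 until it migrates at time t. The names `integrated_rate` and `migration_log_density` are illustrative, not from the paper.

```python
import math


def integrated_rate(t0, t1, m0, alpha):
    """Integral of m(t) = m0 * exp(alpha * t) from t0 to t1."""
    if alpha == 0.0:
        return m0 * (t1 - t0)
    return m0 * (math.exp(alpha * t1) - math.exp(alpha * t0)) / alpha


def migration_log_density(migration_times, m0, alpha):
    """Sum over toy loci of log m(t_mig) - integral of m over [0, t_mig]:
    the log density of a single lineage first migrating at t_mig."""
    total = 0.0
    for t in migration_times:
        total += math.log(m0) + alpha * t - integrated_rate(0.0, t, m0, alpha)
    return total


set_x = [0.25, 0.75]  # migration times, one per locus
set_y = [0.5, 0.5]
for alpha in (0.0, 1.0):
    print(alpha,
          migration_log_density(set_x, 0.5, alpha),
          migration_log_density(set_y, 0.5, alpha))
```

With α = 0 the two sets of migration times tie, as in the constant-rate IM model, because only their sum enters. With α = 1 they no longer tie, so under such a model the data could in principle carry information about timing through α.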
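For the second option, separate migration parameters for distinct time periods, the constant-rate migration summaries split into one event count and one lineage exposure per epoch. This minimal Python sketch uses the same hypothetical toy setting: one lineage per locus, exposed to one-way migration from time 0 until it migrates. It also uses a single, arbitrarily chosen epoch boundary at t = 0.375. `epoch_summaries` and `piecewise_log_density` are illustrative names, not the paper's.

```python
import bisect
import math


def epoch_summaries(migration_times, breakpoints):
    """Per-epoch migration counts and lineage exposure (lineages x time).
    breakpoints: sorted interior epoch boundaries; [0.375] gives the epochs
    [0, 0.375) and [0.375, inf)."""
    edges = [0.0] + list(breakpoints) + [math.inf]
    n_epochs = len(edges) - 1
    counts = [0] * n_epochs
    exposure = [0.0] * n_epochs
    for t in migration_times:
        counts[bisect.bisect_right(breakpoints, t)] += 1  # epoch containing t
        for e in range(n_epochs):
            lo, hi = edges[e], min(edges[e + 1], t)  # part of [0, t) in epoch e
            if hi > lo:
                exposure[e] += hi - lo
    return counts, exposure


def piecewise_log_density(counts, exposure, rates):
    """Sum over epochs of count * log(rate) - rate * exposure (rates > 0)."""
    return sum(c * math.log(r) - r * x for c, x, r in zip(counts, exposure, rates))


set_x = [0.25, 0.75]
set_y = [0.5, 0.5]
sx = epoch_summaries(set_x, [0.375])
sy = epoch_summaries(set_y, [0.375])
print(sx, sy)  # ([1, 1], [0.625, 0.375]) ([0, 2], [0.75, 0.25])
```

Both sets share the constant-rate summaries (two migrations, total exposure 1.0) and therefore tie when the epoch rates are equal. Their per-epoch summaries differ, though, so their probabilities differ whenever the epoch rates differ.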