Tajima's coalescent

Palacios JA, Véber A, Cappello L, Wang Z, Wakeley J & Ramachandran S 2019 Bayesian estimation of population size changes by sampling Tajima's trees. Genetics 213:967-986.

  • our objective in the implementation of BESTT is to estimate the posterior distribution of model parameters by replacing Kingman's genealogy with Tajima's genealogy g^T
  • replacing Kingman's genealogy by Tajima's genealogy in our posterior distribution exponentially reduces the size of the state space of genealogies
  • Tajima's genealogies
  • our method of computing the probability of the recoded data, Y_{h×m}, uses ranked tree shapes rather than fully labeled histories
  • we refer to these ranked tree shapes as Tajima's genealogies
  • they have also been called unlabeled rooted trees (Griffiths and Tavaré 1995) and evolutionary relationships (Tajima 1983)
  • in Tajima's genealogies, only the internal nodes are labeled and they are labeled by their order in time
  • Tajima's genealogies encode the minimum information needed to compute the probability of the data Y_{h×m}, which consists of nested sets of mutations, without any approximations
  • no other labels matter because individuals are exchangeable in the population model we assume
  • this represents a dramatic coarsening of tree space compared to the classical leaf-labeled binary trees of Kingman's coalescent
  • this provides a much more efficient way to integrate over the key hidden variable, the unknown gene genealogy of the sample, when computing likelihoods
  • we model this hidden variable using the vintaged and sized coalescent (Sainudiin et al. 2015), which corresponds exactly to this coarsening of Kingman's coalescent
  • the main computational bottleneck of coalescent-based inference of evolutionary histories lies in the large cardinality of the hidden state space of genealogies
  • in the standard Kingman coalescent, a genealogy is a random labeled bifurcating tree that models the set of ancestral relationships of the samples
  • a lower-resolution coalescent model on genealogies, Tajima's coalescent, can be used as an alternative to the standard Kingman coalescent model
  • the Tajima coalescent model provides a feasible alternative that integrates over a smaller state space than the standard Kingman model
  • the main advantage of Tajima's coalescent is that it models the ranked tree topology rather than the fully labeled tree topology modeled by Kingman's coalescent
  • our method does not model recombination, population structure, or selection
  • it assumes completely linked and neutral segments from individuals from a single population, and the infinite sites mutation model
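
To make the size of the state-space reduction described above concrete, here is a minimal sketch that counts both kinds of genealogy for a sample of n sequences. It relies on two standard combinatorial facts that are not stated in these notes: the number of ranked, leaf-labeled binary trees (Kingman's genealogies, topology only) is n!(n-1)!/2^(n-1), and the number of ranked unlabeled tree shapes (Tajima's genealogies) is the Euler zigzag number E_(n-1). The function names are illustrative and are not part of BESTT.

```python
from math import factorial


def kingman_labeled_histories(n: int) -> int:
    """Count ranked, leaf-labeled binary trees on n samples: n!(n-1)!/2^(n-1)."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def euler_zigzag(m: int) -> int:
    """Return the m-th Euler zigzag number using Seidel's triangle."""
    row = [1]  # row 0 of the triangle; its last entry is E_0 = 1
    for _ in range(m):
        nxt = [0]
        # Each new row is the running sum over the previous row read backwards.
        for x in reversed(row):
            nxt.append(nxt[-1] + x)
        row = nxt
    return row[-1]


def tajima_ranked_shapes(n: int) -> int:
    """Count ranked unlabeled tree shapes on n samples (assumed to equal E_(n-1))."""
    return euler_zigzag(n - 1)


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        k = kingman_labeled_histories(n)
        t = tajima_ranked_shapes(n)
        print(f"n={n:2d}  Kingman={k}  Tajima={t}  ratio={k / t:.3g}")
```

For n = 4 there are 18 labeled histories but only 2 ranked tree shapes (the caterpillar and the balanced tree), and the gap widens rapidly as n grows. These counts cover tree topologies only; coalescent times are continuous in both models.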