Tajima's coalescent
Palacios JA, Véber A, Cappello L, Wang Z, Wakeley J & Ramachandran S 2019 Bayesian estimation of population size changes by sampling Tajima's trees. Genetics 213:967-986.
- our objective in the implementation of BESTT is to estimate the posterior distribution of model parameters by replacing Kingman's genealogy with Tajima's genealogy g^T
- replacing Kingman's genealogy by Tajima's genealogy in our posterior distribution exponentially reduces the size of the state space of genealogies
- Tajima's genealogies
- our method of computing the probability of the recoded data, Y_{h × m}, uses ranked tree shapes rather than fully labeled histories
- we refer to these ranked tree shapes as Tajima's genealogies
- they have also been called unlabeled rooted trees (Griffiths and Tavaré 1995) and evolutionary relationships (Tajima 1983)
- in Tajima's genealogies, only the internal nodes are labeled and they are labeled by their order in time
- Tajima's genealogies encode the minimum information needed to compute the probability of the data Y_{h × m}, which consists of nested sets of mutations, without any approximations
- no other labels matter because individuals are exchangeable in the population model we assume
- this represents a dramatic coarsening of tree space compared to the classical leaf-labeled binary trees of Kingman's coalescent
- this provides a much more efficient way to integrate over the key hidden variable, the unknown gene genealogy of the sample, when computing likelihoods
- we model this hidden variable using the vintaged and sized coalescent (Sainudiin et al. 2015), which corresponds exactly to this coarsening of Kingman's coalescent
- the main computational bottleneck of coalescent-based inference of evolutionary histories lies in the large cardinality of the hidden state space of genealogies
- in the standard Kingman coalescent, a genealogy is a random labeled bifurcating tree that models the set of ancestral relationships of the samples
- a lower-resolution coalescent model on genealogies, Tajima's coalescent, can be used as an alternative to the standard Kingman coalescent model
- the Tajima coalescent model provides a feasible alternative that integrates over a smaller state space than the standard Kingman model
- the main advantage of Tajima's coalescent is that it models the ranked tree topology rather than the fully labeled tree topology modeled by Kingman's coalescent
- our method does not model recombination, population structure, or selection
- it assumes completely linked and neutral segments from individuals from a single population, and the infinite sites mutation model
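
The reduction in state space noted above can be made concrete with a small sketch. It is not code from the paper or from BESTT, and every function name in it is hypothetical. It relies on two standard combinatorial facts: the number of labeled ranked histories of n samples under Kingman's coalescent is n!(n-1)!/2^(n-1), and the number of ranked tree shapes with n leaves is the Euler zigzag number E_{n-1}. The sampler at the end assumes internal nodes are ranked from the most recent coalescence (rank 1) backward in time; the notes only say that nodes are labeled by their order in time, so the direction is a choice made here.

```python
from math import comb, factorial
import random


def kingman_labeled_histories(n):
    """Number of ranked, leaf-labeled binary trees on n samples."""
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def tajima_ranked_tree_shapes(n):
    """Number of ranked, unlabeled binary tree shapes with n leaves.

    Equals the Euler zigzag number E_{n-1}, built with the recurrence
    2 * E_{m+1} = sum_k C(m, k) * E_k * E_{m-k} for m >= 1.
    """
    if n < 2:
        raise ValueError("need at least two samples")
    zigzag = [1, 1]  # E_0, E_1
    for m in range(1, n - 1):
        total = sum(comb(m, k) * zigzag[k] * zigzag[m - k] for k in range(m + 1))
        zigzag.append(total // 2)
    return zigzag[n - 1]


def sample_ranked_tree_shape(n, rng):
    """Draw a ranked tree shape by merging uniformly chosen pairs of lineages.

    Leaves carry no label (coded 0). The internal node created at the k-th
    coalescence, counting from the present (an assumption), is labeled k.
    Returns a list whose (k-1)-th entry is the sorted pair of child labels
    of internal node k.
    """
    lineages = [0] * n
    merges = []
    for rank in range(1, n):
        i, j = rng.sample(range(len(lineages)), 2)
        merges.append(tuple(sorted((lineages[i], lineages[j]))))
        for idx in sorted((i, j), reverse=True):
            lineages.pop(idx)
        lineages.append(rank)
    return merges


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, kingman_labeled_histories(n), tajima_ranked_tree_shapes(n))
    print(sample_ranked_tree_shape(6, random.Random(1)))
```

Because the sampler discards leaf labels, different labeled histories map to the same output, which is the coarsening described in the notes. It only draws the tree shape; coalescent times and the likelihood of the data are outside this sketch.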