Tajima's coalescent

Palacios JA, Wakeley J & Ramachandran S 2015 Bayesian nonparametric inference of population size changes from sequential genealogies. Genetics 201:281-304.

  • we address a key problem for inference of population size trajectories under sequentially Markov coalescent models
  • we express the transition densities of local genealogies in terms of local ranked tree shapes (Tajima 1983) and coalescent times, and show that these quantities are statistically sufficient for inferring population size trajectories, either directly from sequence data or from the set of local genealogies
  • the use of ranked tree shapes allows us to exploit the state process of local genealogies efficiently since the space of ranked tree shapes has a smaller cardinality than the space of labeled topologies (Sainudiin et al. 2014)
  • sequential Tajima's genealogies are sufficient statistics under the SMC'
  • the sufficient statistics for inferring the population size trajectory N(t) under the SMC' model are the coalescent times together with the local ranked tree shapes (trees with unlabeled tips but ranked coalescent events)
  • for a single locus, the set of coalescent times together with the ranked tree shape corresponds to a realization of Tajima's n-coalescent
  • the set of local Tajima's genealogies is a sufficient statistic for inferring N(t) under the SMC' model
  • our model can be easily modified to model a variable recombination rate along chromosomal segments and to jointly infer variable recombination rates and N(t)
  • under the SMC' model, local ranked tree shapes and coalescent times correspond to a set of local Tajima's genealogies
  • these Tajima's genealogies are sufficient statistics for inferring N(t)
  • under the SMC' model, the state space needed for inferring population size trajectories from sequence data is that of a sequence of local Tajima's genealogies
  • this lumping, or reduction of the original SMC' process, will allow more efficient inference from sequence data directly
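
The cardinality point above (ranked tree shapes versus labeled topologies) can be illustrated by counting. The following is a minimal Python sketch, not taken from the paper. It assumes the standard counts for binary trees on n tips: n!(n-1)!/2^(n-1) ranked labeled trees (the state space of Kingman's coalescent) and the Euler zigzag number E_(n-1) for ranked unlabeled tree shapes (the state space of Tajima's coalescent). The function names are illustrative only.

```python
from math import comb, factorial


def ranked_labeled_trees(n):
    """Count ranked, labeled binary trees on n tips: n!(n-1)!/2^(n-1)."""
    if n < 2:
        raise ValueError("need at least two tips")
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)


def ranked_tree_shapes(n):
    """Count ranked, unlabeled binary tree shapes on n tips.

    Assumes the count equals the Euler zigzag number E_(n-1), computed
    with the recurrence 2*E_(m+1) = sum_k C(m, k) * E_k * E_(m-k), m >= 1.
    """
    if n < 2:
        raise ValueError("need at least two tips")
    e = [1, 1]  # E_0, E_1
    for m in range(1, n - 1):
        total = sum(comb(m, k) * e[k] * e[m - k] for k in range(m + 1))
        e.append(total // 2)
    return e[n - 1]


if __name__ == "__main__":
    # Compare the two state-space sizes for a few sample sizes.
    for n in (4, 6, 8, 10):
        print(n, ranked_tree_shapes(n), ranked_labeled_trees(n))
```

For n = 4 this gives 2 ranked tree shapes against 18 ranked labeled trees, and the gap widens quickly with n, which is the sense in which lumping to Tajima's genealogies shrinks the state space.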