polygenic score

Edge MD & Coop G 2019 Reconstructing the history of polygenic scores using coalescent trees. Genetics 211:235-262.

  • standard population-genetic methods for inferring evolutionary history are ill-suited for polygenic traits
  • when there are many variants of small effect, signatures of natural selection are spread across the genome and are subtle at any one locus
  • various methods have emerged for detecting the action of natural selection on polygenic scores, sums of genotypes weighted by GWAS effect sizes
  • most existing methods do not reveal the timing or strength of selection
  • we present a set of methods for estimating the historical time course of a population-mean polygenic score using local coalescent trees at GWAS loci
  • these time courses are estimated by using coalescent theory to relate the branch lengths of trees to allele-frequency change
  • the resulting time course can be tested for evidence of natural selection
  • complex traits (traits affected by many genetic loci and by environmental variation) are ill-suited to study by single-locus methods
  • selection, even when its effect is spread over many loci, leaves systematic signals in coalescent trees at loci underlying trait variation
  • the population-mean level of a polygenic score (a prediction of phenotype from an individual’s genotype) is the target of estimation
  • if the tree for the locus is known, then the branch on which the mutation must have appeared is known, but the exact time of the mutation is not
  • in practice, we assume that the mutation occurred in the middle of the branch connecting the derived subtree to the rest of the tree
  • in Appendix B, we give a Bayesian interpretation of the proportion-of-lineages estimator that relies on connections between neutral diffusion and the ancestral process (Tavaré 1984)
  • we posit that at each locus, over short timescales, allele-frequency changes between time points follow a normal(0, f p_i(1 − p_i)) distribution (Cavalli-Sforza et al. 1964; Nicholson et al. 2002)
  • approximate normality fails over longer timescales, in part because allele frequencies are bounded by 0 and 1
  • the genotype-phenotype association captured by a polygenic score may be due to causal effects of the included genotypes, to linkage disequilibrium between tag SNPs and ungenotyped causal SNPs, to indirect genetic effects, or to environmental effects that covary with genotype for other reasons
  • any of these sources of association between genotype and phenotype might change as the environment and genetic background of the population change over time, causing time courses estimated by our method to deviate from the history of average trait values in the population
  • the genetic architecture may change across the time period over which estimates are made, for example because of changes in linkage disequilibrium between tag loci and causal loci, or because loci that explained trait variation in the past have since fixed or been lost
  • the methods we propose here have several limitations that suggest directions for future work
  • the theory we develop here is for a single population
  • our setting within a coalescent framework suggests the possibility for extension to multiple populations, perhaps by developing multivariate analogs of our statistics within a coalescent-with-migration framework
  • we work with polygenic scores for a single trait
  • our methods can be extended to consider polygenic scores for multiple, correlated traits
  • Berg et al. (2017) have recently extended the Q_X statistic to multiple correlated traits, drawing inspiration from the framework of Lande and Arnold (1983)
  • Appendix A: the relationship between coalescent rates and phenotypic selection
  • we describe the consequences of directional and stabilizing selection on a trait for coalescence rates at loci associated with the trait
  • Appendix E: simulation details
  • allele-frequency time courses were simulated using the normal approximation to the diffusion
  • in simulations with selection, the derived-allele frequency was simulated forward in time in steps of 1 / (2N) coalescent units (i.e., one diploid generation)
  • the frequency at the next time step, p_{t+1}, was drawn from a normal distribution with expectation p_t + s p_t(1 − p_t) and variance p_t(1 − p_t) / (2N)
  • Appendix F: power comparison with an analog of trait SDS
  • recent selection distorts the terminal branches of the coalescent at a selected locus
  • Field et al. reasoned that recent selection would alter the distribution of singleton mutations around selected sites
  • favored alleles are expected to have short terminal coalescent branches and thus relatively few singletons nearby compared with disfavored alleles
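
The forward simulation step noted under Appendix E can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the clamping of frequencies to [0, 1], the example parameter values, and the random effect sizes are all illustrative. The helper mean_polygenic_score additionally assumes additive 0/1/2 genotype coding in diploids, so that the population-mean score is 2 * sum(beta_i * p_i).

```python
import random


def step_frequency(p, s, n_diploid, rng):
    """Advance the derived-allele frequency by one diploid generation.

    Draws from a normal distribution with expectation p + s*p*(1 - p)
    and variance p*(1 - p) / (2N), as in the notes above.
    """
    mean = p + s * p * (1.0 - p)
    sd = (p * (1.0 - p) / (2.0 * n_diploid)) ** 0.5
    # Assumption: clamp to [0, 1] so a draw never leaves the valid range.
    # The notes do not say how boundary crossings were handled.
    return min(1.0, max(0.0, rng.gauss(mean, sd)))


def simulate_trajectory(p0, s, n_diploid, generations, rng):
    """Return the list of frequencies [p_0, p_1, ..., p_generations]."""
    trajectory = [p0]
    for _ in range(generations):
        trajectory.append(step_frequency(trajectory[-1], s, n_diploid, rng))
    return trajectory


def mean_polygenic_score(effects, freqs):
    """Population-mean polygenic score, assuming additive 0/1/2 coding."""
    return 2.0 * sum(beta * p for beta, p in zip(effects, freqs))


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    n_loci, n_diploid, generations = 50, 10_000, 200

    # Hypothetical effect sizes; s = 0 here, so every locus drifts neutrally.
    effects = [rng.gauss(0.0, 0.1) for _ in range(n_loci)]
    trajectories = [
        simulate_trajectory(0.5, 0.0, n_diploid, generations, rng)
        for _ in range(n_loci)
    ]

    # Mean score at the start and at the end of the simulated period.
    for t in (0, generations):
        freqs_at_t = [traj[t] for traj in trajectories]
        print(t, mean_polygenic_score(effects, freqs_at_t))
```

Setting s to a nonzero value at trait-increasing loci would give the selected case; how a per-locus s relates to selection on the trait is the subject of Appendix A and is not modeled in this sketch.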