polygenic score

Barton N, Hermisson J & Nordborg M 2019 Why structure matters. eLife 8:e45380.

  • the first GWAS for height found a small number of SNPs that jointly explained only a tiny fraction of the variation
  • this was in contrast with the high heritability seen in twin studies
  • this discrepancy was dubbed ‘the missing heritability problem’
  • it was suggested that the problem was simply due to a lack of statistical power to detect polymorphisms of small effect
  • most of the variation remains ‘unmappable’
  • sample sizes on the order of a million are still not large enough to map it
  • one way in which the unmappable component of genetic variation can be included in a statistical measure is via so-called polygenic scores
  • these scores sum the estimated contributions to the trait across many SNPs, including those whose effects, on their own, are not statistically significant
  • polygenic scores thus represent a shift from the goal of identifying major genes to predicting phenotype from genotype
  • when a GWAS is carried out to identify major genes, it is relatively simple to avoid false positives by eliminating associations outside major loci
  • if the goal is to make predictions, or to understand differences among populations (such as the latitudinal cline in height), we need accurate and unbiased estimates for all SNPs
  • accomplishing this is extremely challenging
  • it is also difficult to know whether one has succeeded
  • one possibility is to compare the population estimates with estimates taken from sibling data, which should be relatively unbiased by environmental differences
  • Berg et al. and Sohail et al. independently found that, when effect sizes are estimated from UK Biobank data, the evidence for selection vanishes, along with the evidence for a genetic cline in height across Europe
  • the previously published results (selection on height and the genetic cline) were due to the cumulative effects of slight biases in the effect-size estimates in the GIANT data
  • they also found evidence for confounding in the sibling data used as a control by Robinson et al. and Field et al.
  • we still do not know whether genetics and selection are responsible for the pattern of height differences seen across Europe
  • there is no perfect way to control for complex population structure and environmental heterogeneity
  • biases at individual loci may be tiny, but they become highly significant when summed across thousands of loci, as is done in polygenic scores
  • standard methods to control for these biases, such as principal component analysis, may work well in simulations but are often insufficient when confronted with real data
  • even the data in the UK Biobank seem to contain significant structure
  • quantitative genetics has proved highly successful in plant and animal breeding
  • this success has been based on large pedigrees, well-controlled environments, and short-term prediction
  • when these methods have been applied to natural populations, even the most basic predictions fail, in large part due to poorly understood environmental factors
  • natural populations are never homogeneous
  • it is therefore misleading to imply there is a qualitative difference between ‘within-population’ and ‘between-population’ comparisons
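
The two mechanical points in these notes can be sketched numerically: a polygenic score is a sum of estimated per-SNP effects weighted by allele counts, and per-locus biases that are individually tiny can add up to a large difference between groups. The following is a minimal sketch under stated assumptions, not the method of any of the cited papers. All names (`polygenic_score`, `mean_score_gap`) and all numbers (10,000 SNPs, 200 individuals per population, the size of the noise and of the bias) are hypothetical and chosen only for illustration. The true genetic effect is set to zero at every SNP, so any gap between the two populations is an artefact of the estimates.

```python
import numpy as np


def polygenic_score(genotypes, effect_sizes):
    """Sum estimated per-SNP effects, weighted by allele counts (0, 1 or 2)."""
    return np.asarray(genotypes) @ np.asarray(effect_sizes)


rng = np.random.default_rng(0)
n_snps, n_per_pop = 10_000, 200  # hypothetical sizes, for illustration only

# Two populations whose allele frequencies differ slightly (assumed structure).
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.02, n_snps), 0.01, 0.99)
geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_snps))
geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_snps))

# True effects are zero everywhere, so the estimates are pure estimation noise.
noise = rng.normal(0.0, 0.01, n_snps)
# Assumed confounding: a bias one fifth the size of the noise at each SNP,
# aligned with the direction of the frequency difference between populations.
bias = 0.002 * np.sign(freq_b - freq_a)


def mean_score_gap(effect_sizes):
    """Difference in mean polygenic score, population B minus population A."""
    return (polygenic_score(geno_b, effect_sizes).mean()
            - polygenic_score(geno_a, effect_sizes).mean())


gap_unbiased = mean_score_gap(noise)
gap_biased = mean_score_gap(noise + bias)
within_sd = polygenic_score(geno_a, noise + bias).std()

print(f"gap with unbiased estimates: {gap_unbiased:.3f}")
print(f"gap with biased estimates:   {gap_biased:.3f}")
print(f"within-population score SD:  {within_sd:.3f}")
```

Comparing the two printed gaps with the within-population standard deviation shows how a bias that would be undetectable at any single SNP can produce an apparent between-population difference once it is summed over thousands of loci.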