polygenic score - senecio7's diary

Barton N, Hermisson J & Nordborg M 2019 Why structure matters. eLife 8:e45380.

the first GWAS for height found a small number of SNPs that jointly explained only a tiny fraction of the variation
this contrasted with the high heritability seen in twin studies
the discrepancy was dubbed ‘the missing heritability problem’
it was suggested that the problem was simply due to a lack of statistical power to detect polymorphisms of small effect
larger studies have supported this explanation, yet most of the variation remains ‘unmappable’
presumably this is because sample sizes on the order of a million are still not large enough
one way in which the unmappable component of genetic variation can be included in a statistical measure is via so-called polygenic scores
these scores sum the estimated contributions to the trait across many SNPs, including those whose effects, on their own, are not statistically significant
polygenic scores thus represent a shift from the goal of identifying major genes to predicting phenotype from genotype
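
To make the scoring rule above concrete, here is a minimal sketch of a polygenic score as a weighted sum over SNPs. The function name, the 0/1/2 dosage coding, and the effect-size values are illustrative assumptions, not taken from the paper or from any particular GWAS.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum the estimated per-SNP contributions for one individual.

    dosages:      copies of the effect allele at each SNP (assumed coded 0, 1 or 2)
    effect_sizes: estimated per-allele effects, one per SNP, whether or not
                  each estimate is individually significant
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per SNP")
    return sum(g * b for g, b in zip(dosages, effect_sizes))


# Hypothetical effect-size estimates and genotypes, for illustration only.
betas = [0.5, -0.25, 0.125, 0.0]
person = [2, 1, 0, 1]
score = polygenic_score(person, betas)  # 2*0.5 + 1*(-0.25) + 0 + 0 = 0.75
```

Because every SNP enters the sum, any error in an individual effect-size estimate is carried into the score as well.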

when a GWAS is carried out to identify major genes, it is relatively simple to avoid false positives by eliminating associations outside major loci
if the goal is to make predictions, or to understand differences among populations (such as the latitudinal cline in height), we need accurate and unbiased estimates for all SNPs
accomplishing this is extremely challenging
it is also difficult to know whether one has succeeded
one possibility is to compare the population estimates with estimates taken from sibling data, which should be relatively unbiased by environmental differences
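
The logic of the sibling comparison can be sketched with a toy model. It assumes a single SNP, a purely additive effect, and an environment that is shared perfectly within each family and happens to be correlated with genotype across families. All names and numbers are hypothetical, and real sibling data need not satisfy these assumptions.

```python
def slope(x, y):
    """Least-squares slope of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var


TRUE_EFFECT = 1.0  # assumed per-allele effect in this toy model

# (shared family environment, genotype of sib 1, genotype of sib 2)
# Families with more effect alleles also have a better environment (confounding).
families = [(0.0, 0, 1), (0.0, 1, 0), (2.0, 1, 2), (2.0, 2, 1)]


def phenotype(genotype, env):
    return TRUE_EFFECT * genotype + env


genotypes, phenotypes, d_geno, d_pheno = [], [], [], []
for env, g1, g2 in families:
    y1, y2 = phenotype(g1, env), phenotype(g2, env)
    genotypes += [g1, g2]
    phenotypes += [y1, y2]
    d_geno.append(g1 - g2)   # within-family genotype difference
    d_pheno.append(y1 - y2)  # the shared environment cancels here

population_estimate = slope(genotypes, phenotypes)  # 2.0: inflated by environment
sibling_estimate = slope(d_geno, d_pheno)           # 1.0: the assumed true effect
```

In this toy data the population-level regression doubles the effect, while the sibling-difference regression recovers it, but only because the environment was assumed to be identical for both siblings.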

using effect sizes estimated from the UK Biobank, Berg et al. and Sohail et al. independently found that evidence for selection on height vanishes – along with evidence for a genetic cline in height across Europe
the previously published results were due to the cumulative effects of slight biases in the effect-size estimates in the GIANT data
they also found evidence for confounding in the sibling data used as a control by Robinson et al. and Field et al.
we still do not know whether genetics and selection are responsible for the pattern of height differences seen across Europe
there is no perfect way to control for complex population structure and environmental heterogeneity
biases at individual loci may be tiny
but they become highly significant when summed across thousands of loci – as is done in polygenic scores
standard methods to control for these biases, such as principal component analysis, may work well in simulations but are often insufficient when confronted with real data
even the data in the UK Biobank seem to contain significant structure
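
A back-of-the-envelope calculation shows why tiny per-locus biases matter once they are summed. The sketch assumes that every locus carries the same small bias in the same direction and that estimation errors are independent across loci. Under those assumptions the summed bias grows like n while its standard error grows only like the square root of n. The function name and the numbers are illustrative assumptions.

```python
import math


def summed_z(per_locus_bias, per_locus_se, n_loci):
    """z-score of a bias summed over n_loci loci.

    Assumes an identical, same-direction bias at every locus and
    independent estimation errors, so the standard error of the sum
    is sqrt(n_loci) times the per-locus standard error.
    """
    total_bias = n_loci * per_locus_bias
    total_se = math.sqrt(n_loci) * per_locus_se
    return total_bias / total_se


# A bias one tenth the size of its standard error is undetectable at one locus...
single = summed_z(0.01, 0.1, 1)       # about 0.1
# ...but is highly significant once summed over ten thousand loci.
summed = summed_z(0.01, 0.1, 10_000)  # about 10
```

The z-score scales with the square root of the number of loci, so adding more SNPs to a score amplifies any shared bias rather than averaging it away.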

quantitative genetics has proved highly successful in plant and animal breeding
this success has been based on large pedigrees, well-controlled environments, and short-term prediction
when these methods have been applied to natural populations, even the most basic predictions fail, in large part due to poorly understood environmental factors
natural populations are never homogeneous
it is therefore misleading to imply there is a qualitative difference between ‘within-population’ and ‘between-population’ comparisons