missing heritability

Liu X, Li YI & Pritchard JK 2019 Trans effects on gene expression can drive omnigenic inheritance. Cell 177:1022-1034.

  • we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans
  • if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified
  • nearly all of the genetic variance is driven by weak trans effects
  • most of the missing heritability is due to large numbers of small-effect common variants that are not significant at current sample sizes
  • between 71% and 100% of 1 megabase (Mb) windows in the genome are estimated to contribute to the heritability of schizophrenia
  • much of the trait variance is mediated through genes that are not directly involved in the trait in question
  • these observations appear at odds with conventional ways of understanding the links from genotype to phenotype
  • much of the progress in classical genetics has come from detailed molecular work to dissect the biological mechanisms of individual mutations
  • why does such a large portion of the genome contribute to heritability?
  • why do the lead hits for a typical trait contribute so little to heritability?
  • what factors determine the effect sizes of SNPs on traits?
  • it is essential for the field to develop conceptual models for understanding complex trait architecture
  • the model proposed here is a step in that direction
  • genes not expressed in relevant cell types do not contribute significantly to heritability
  • the per-SNP heritability in tissue-specific regulatory elements is only modestly increased relative to SNPs in broadly active regulatory elements, provided that they are active in relevant tissues
  • the heritability of a typical complex trait is driven by variation in a large number of regulatory elements and genes, spread widely across the genome, and mediated through a wide range of gene functional categories
  • rare variants are generally not major contributors to the overall phenotypic variance
  • protein-coding variants are relatively rare in the genome and thus contribute only a small fraction of heritability
  • the heritability is generally dominated by noncoding variants, especially variants in gene regulatory regions
  • there is strong enrichment of both cis- and trans-eQTLs among GWAS hits, albeit still a considerable gap in linking all hits to eQTLs
  • some genes (and their regulatory networks) are functionally proximate to disease risk
  • these genes tend to produce the biggest signals in common- and rare-variant association studies
  • they tend to be the most illuminating from the point of view of understanding disease etiology
  • they are responsible for only a small fraction of the genetic variance in disease risk
  • the bulk of the heritability is mediated through genes that have a wide variety of functions, many of which have no obvious functional connection to disease
  • most of the GWAS hits are in noncoding, putatively regulatory regions of the genome
  • the primary links between genetic variation and complex disease are via gene regulation
  • the omnigenic model partitions genes into core genes and peripheral genes
  • core genes can affect disease risk directly
  • peripheral genes can only affect risk indirectly through trans-regulatory effects on core genes
  • two key proposals of the omnigenic model are
  • (1) that most, if not all, genes expressed in trait-relevant cells have the potential to affect core-gene regulation
  • (2) that for typical traits, nearly all of the heritability is determined by variation near peripheral genes
  • core genes are the key drivers of disease
  • it is the cumulative effects of many peripheral gene variants that determine polygenic risk
  • "omnigenic" has a more precise meaning than the term "polygenic"
  • polygenic can be used to describe the involvement of anything from tens of loci to every variant in the genome and would include omnigenic as a special case, toward the high end of the polygenic spectrum
  • we also use the term "omnigenic model" to refer to our specific model of complex trait architecture in which heritability is mainly driven by peripheral genes that trans-regulate core genes
  • it is also worth distinguishing our model from Fisher's classic infinitesimal model
  • the infinitesimal model does not tell us how many causal variants to expect in practice, nor does it describe the molecular mechanisms linking genetic variation to phenotypes
  • we define a gene as a "core gene" if and only if the gene product (protein, or RNA for a noncoding gene) has a direct effect—not mediated through regulation of another gene—on cellular and organismal processes leading to a change in the expected value of a particular phenotype
  • all other genes expressed in relevant cell types are considered "peripheral genes" and can only affect the phenotype indirectly through regulatory effects on core genes
  • genes that are "unexpressed" in trait-relevant tissues are assumed not to contribute to heritability
  • most peripheral genes make small contributions to heritability
  • some peripheral genes, such as transcription factors and protein regulators, play important roles because they regulate multiple core genes
  • Equation 3 illustrates the key factors determining how cis- and trans-eQTL effects on core genes impact complex trait heritability
  • the first two groups of terms on the right-hand side of this expression depend on the relative importance of cis and trans effects in determining expression heritability of core genes
  • the third group of terms depends on genetic covariances between pairs of core genes
  • these genetic covariances must arise from trans effects
  • there are many more pairs of core genes (nearly M²) than core genes (M)
  • these terms may dominate the heritability for most traits
  • most studies are hugely underpowered to detect trans-eQTLs
  • estimates of trans heritability must rely on statistical methods that aggregate weak signals
  • the literature is reassuringly consistent across a range of study designs, indicating that around 60%–90% of genetic variance in expression is due to trans-acting variation
  • trans-eQTLs are notoriously difficult to find in humans
  • this is partly due to the extra multiple testing burden on trans-eQTLs but is mainly due to the small effect sizes of trans-eQTLs
  • trans effects are uniformly small compared to cis effects, with only a handful reaching significance
  • typical genes must have very large numbers of weak trans-eQTLs
  • this model starts to explain why so much of the genome contributes heritability for typical traits
  • suppose instead that a considerable fraction of core genes are either co-regulated with shared directions of effects or negatively co-regulated with opposite directions of effects
  • the sum of covariance terms can dominate the genetic variance for trait Y
  • because covariances are primarily driven by trans effects, co-regulated networks could potentially act as strong amplifiers for trans-acting variants that are shared among core genes in those networks
  • there has been little work so far on measuring the genetic basis of gene expression correlations
  • the work to date shows that expression covariance is substantially driven by genetic factors
  • Goldinger et al. (2013) studied heritability of principal components (PCs) in a dataset of whole-blood gene expression from 335 individuals
  • they reported a strong genetic component in the lead PCs, with an average heritability of 0.39 for the first 50 PCs
  • Lukowski et al. (2017) tested for genetic covariance between gene pairs and identified 15,000 gene pairs (0.5% of all gene pairs) with significantly nonzero genetic covariance at 5% false discovery rate
  • for the 10% of gene pairs with the highest phenotypic correlation, the average genetic correlation is 0.12
  • this magnitude is potentially large enough to make an important contribution to heritability
  • if core genes are often co-regulated, with shared directions of effects, as seems likely, then nearly all heritability would be due to trans effects
  • cis-eQTLs usually have much larger effect sizes than trans-eQTLs
  • many of the biggest signals in GWASs are cis regulators of core genes
  • peripheral gene-regulatory variants may become notable hits if they are trans-eQTLs for many core genes with correlated directions of effect
  • the bulk of trait heritability is driven by a huge number of peripheral variants that are weak trans-eQTLs for core genes
  • the 57 genome-wide significant loci explain ~20% of the heritability
  • all variation tagged in current GWASs together explains ~80%
  • 54% of 1 Mb windows in the genome contribute to the heritability of extreme lipid levels
  • we have clear evidence for the involvement of core genes, yet they contribute only a small fraction of the genetic variance in the trait
  • much of the remaining variance is due to the combined contributions of many small trans effects being funneled through the core genes
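
The argument about the three groups of terms (cis variance, trans variance, and pairwise genetic covariances among core genes) can be made concrete with a small numerical sketch. This is not the paper's Equation 3, only a simplified toy version under loudly stated assumptions: all M core genes have the same effect on the trait, the same cis and trans expression variances, and the same pairwise genetic covariance, which comes entirely from shared trans effects. The function name variance_shares and all of its parameters are hypothetical and chosen for illustration.

```python
def variance_shares(m, v_cis, v_trans, rho_trans, gamma=1.0):
    """Share of trait genetic variance from each group of terms (toy model).

    Assumptions (hypothetical, for illustration only):
      - m core genes, each with the same effect gamma on the trait
      - each core gene's expression has cis genetic variance v_cis and
        trans genetic variance v_trans
      - each ordered pair of distinct core genes has genetic covariance
        rho_trans * v_trans, arising purely from shared trans effects
    """
    cis = m * gamma**2 * v_cis                                # m cis terms
    trans_own = m * gamma**2 * v_trans                        # m trans terms
    trans_cov = m * (m - 1) * gamma**2 * rho_trans * v_trans  # m(m-1) pair terms
    total = cis + trans_own + trans_cov
    if total <= 0:
        raise ValueError("total genetic variance must be positive")
    return {
        "cis": cis / total,
        "trans_own": trans_own / total,
        "trans_covariance": trans_cov / total,
    }


if __name__ == "__main__":
    # 70% trans variance in expression sits inside the 60%-90% range quoted above;
    # the other values are arbitrary illustrative choices.
    for rho in (0.0, 0.1):
        shares = variance_shares(m=100, v_cis=0.3, v_trans=0.7, rho_trans=rho)
        print(rho, {k: round(v, 3) for k, v in shares.items()})
```

With 100 core genes and no co-regulation (rho_trans = 0), cis effects keep their 30% share of the trait's genetic variance. With a modest trans correlation of 0.1, the roughly M² covariance terms take about 87% of the variance and the cis share falls to about 3.8%, which is the amplification effect described in the notes.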