missing heritability
Liu X, Li YI & Pritchard JK 2019 Trans effects on gene expression can drive omnigenic inheritance. Cell 177:1022-1034.
- we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans
- if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified
- nearly all of the genetic variance is driven by weak trans effects
- most of the missing heritability is due to large numbers of small-effect common variants that are not significant at current sample sizes
- between 71% and 100% of 1 megabase (Mb) windows in the genome are estimated to contribute to the heritability of schizophrenia
- much of the trait variance is mediated through genes that are not directly involved in the trait in question
- these observations appear at odds with conventional ways of understanding the links from genotype to phenotype
- much of the progress in classical genetics has come from detailed molecular work to dissect the biological mechanisms of individual mutations
- why does such a large portion of the genome contribute to heritability?
- why do the lead hits for a typical trait contribute so little to heritability?
- what factors determine the effect sizes of SNPs on traits?
- it is essential for the field to develop conceptual models for understanding complex trait architecture
- the model proposed here is a step in that direction
- genes not expressed in relevant cell types do not contribute significantly to heritability
- the per-SNP heritability in tissue-specific regulatory elements is only modestly increased relative to SNPs in broadly active regulatory elements, provided that they are active in relevant tissues
- the heritability of a typical complex trait is driven by variation in a large number of regulatory elements and genes, spread widely across the genome, and mediated through a wide range of gene functional categories
- rare variants are generally not major contributors to the overall phenotypic variance
- protein-coding variants are relatively rare in the genome and thus contribute only a small fraction of heritability
- the heritability is generally dominated by noncoding variants, especially variants in gene regulatory regions
- there is strong enrichment of both cis- and trans-eQTLs among GWAS hits, albeit still a considerable gap in linking all hits to eQTLs
- some genes (and their regulatory networks) are functionally proximate to disease risk
- these genes tend to produce the biggest signals in common- and rare-variant association studies
- they tend to be the most illuminating from the point of view of understanding disease etiology
- they are responsible for only a small fraction of the genetic variance in disease risk
- the bulk of the heritability is mediated through genes that have a wide variety of functions, many of which have no obvious functional connection to disease
- most of the GWAS hits are in noncoding, putatively regulatory regions of the genome
- the primary links between genetic variation and complex disease are via gene regulation
- the omnigenic model partitions genes into core genes and peripheral genes
- core genes can affect disease risk directly
- peripheral genes can only affect risk indirectly through trans-regulatory effects on core genes
- two key proposals of the omnigenic model are
- (1) that most, if not all, genes expressed in trait-relevant cells have the potential to affect core-gene regulation
- (2) that for typical traits, nearly all of the heritability is determined by variation near peripheral genes
- core genes are the key drivers of disease
- it is the cumulative effects of many peripheral gene variants that determine polygenic risk
- "omnigenic" has a more precise meaning than the term "polygenic"
- polygenic can be used to describe the involvement of anything from tens of loci to every variant in the genome and would include omnigenic as a special case, toward the high end of the polygenic spectrum
- we also use the term "omnigenic model" to refer to our specific model of complex trait architecture in which heritability is mainly driven by peripheral genes that trans-regulate core genes
- it is also worth distinguishing our model from Fisher's classic infinitesimal model
- Fisher's infinitesimal model does not tell us how many causal variants to expect in practice, nor does it describe the molecular mechanisms linking genetic variation to phenotypes
- we define a gene as a "core gene" if and only if the gene product (protein, or RNA for a noncoding gene) has a direct effect—not mediated through regulation of another gene—on cellular and organismal processes leading to a change in the expected value of a particular phenotype
- all other genes expressed in relevant cell types are considered "peripheral genes" and can only affect the phenotype indirectly through regulatory effects on core genes
- genes that are "unexpressed" in trait-relevant tissues are assumed not to contribute to heritability
- most peripheral genes make small contributions to heritability
- some peripheral genes, such as transcription factors and protein regulators, play important roles because they regulate multiple core genes
- Equation 3 illustrates the key factors determining how cis- and trans-eQTL effects on core genes impact complex trait heritability
- the first two groups of terms on the right-hand side of this expression depend on the relative importance of cis and trans effects in determining expression heritability of core genes
- the third group of terms depends on genetic covariances between pairs of core genes
- these genetic covariances must arise from trans effects
- there are many more pairs of core genes (nearly M^2) than core genes (M)
- these terms may dominate the heritability for most traits
- most studies are hugely underpowered to detect trans-eQTLs
- estimates of trans heritability must rely on statistical methods that aggregate weak signals
- the literature is reassuringly consistent across a range of study designs, indicating that around 60%–90% of genetic variance in expression is due to trans-acting variation
- trans-eQTLs are notoriously difficult to find in humans
- this is partly due to the extra multiple testing burden on trans-eQTLs but is mainly due to the small effect sizes of trans-eQTLs
- trans effects are uniformly small compared to cis effects, with only a handful reaching significance
- typical genes must have very large numbers of weak trans-eQTLs
- this model starts to explain why so much of the genome contributes heritability for typical traits
- suppose instead that a considerable fraction of core genes are either co-regulated with shared directions of effects or negatively co-regulated with opposite directions of effects
- the sum of covariance terms can dominate the genetic variance for trait Y
- because covariances are primarily driven by trans effects, co-regulated networks could potentially act as strong amplifiers for trans-acting variants that are shared among core genes in those networks
- there has been little work so far on measuring the genetic basis of gene expression correlations
- the work to date shows that expression covariance is substantially driven by genetic factors
- Goldinger et al. (2013) studied heritability of principal components (PCs) in a dataset of whole-blood gene expression from 335 individuals
- they reported a strong genetic component in the lead PCs, with an average heritability of 0.39 for the first 50 PCs
- Lukowski et al. (2017) tested for genetic covariance between gene pairs and identified 15,000 gene pairs (0.5% of all gene pairs) with significantly nonzero genetic covariance at 5% false discovery rate
- for the 10% of gene pairs with the highest phenotypic correlation, the average genetic correlation is 0.12
- this magnitude is potentially large enough to make an important contribution to heritability
- if core genes are often co-regulated, with shared directions of effects, as seems likely, then nearly all heritability would be due to trans effects
- cis-eQTLs usually have much larger effect sizes than trans-eQTLs
- many of the biggest signals in GWASs are cis regulators of core genes
- peripheral gene-regulatory variants may become notable hits if they are trans-eQTLs for many core genes with correlated directions of effect
- the bulk of trait heritability is driven by a huge number of peripheral variants that are weak trans-eQTLs for core genes
- the 57 genome-wide significant loci explain ~20% of the heritability
- all variation tagged in current GWASs together explains ~80%
- 54% of 1 Mb windows in the genome contribute to the heritability of extreme lipid levels
- we have clear evidence for the involvement of core genes, yet they contribute only a small fraction of the genetic variance in the trait
- much of the remaining variance is due to the combined contributions of many small trans effects being funneled through the core genes
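
The notes on Equation 3 describe three groups of terms: per-gene cis terms, per-gene trans terms, and pairwise genetic covariance terms among the M core genes. The sketch below illustrates why the covariance terms can dominate as M grows. It is not the paper's Equation 3, only a simplified version under loudly stated assumptions: every core gene has the same effect `gamma` on the trait and the same genetic variance in expression, the trans share is set to 0.7 (inside the 60%–90% range quoted above), and every pair of core genes has the same genetic correlation of 0.12 (borrowing the figure reported for highly correlated gene pairs). The function name and all parameters are hypothetical.

```python
def trait_variance_shares(m, gamma=1.0, v_expr=1.0, trans_frac=0.7, r_g=0.12):
    """Split the genetic variance of a trait into cis, trans, and covariance shares.

    Illustrative assumptions (not taken from the paper):
    - all m core genes have the same effect gamma on the trait;
    - all have the same genetic variance in expression, v_expr;
    - every pair of core genes has the same genetic correlation r_g,
      arising entirely from shared trans effects.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 0.0 <= r_g <= trans_frac <= 1.0:
        # Shared trans variance cannot exceed a gene's total trans variance.
        raise ValueError("need 0 <= r_g <= trans_frac <= 1")

    # Per-gene terms: m genes, each contributing gamma^2 times its variance.
    cis = m * gamma**2 * v_expr * (1.0 - trans_frac)
    trans = m * gamma**2 * v_expr * trans_frac
    # Pairwise terms: m * (m - 1) ordered pairs, each gamma * gamma * covariance.
    cov = m * (m - 1) * gamma**2 * r_g * v_expr

    total = cis + trans + cov
    return {"cis": cis / total, "trans": trans / total, "covariance": cov / total}


if __name__ == "__main__":
    for m in (1, 10, 100, 1000):
        shares = trait_variance_shares(m)
        print(m, {name: round(share, 3) for name, share in shares.items()})
```

Under these assumed values, with M = 100 the covariance terms account for roughly 92% of the genetic variance and cis effects for about 2%, since the number of pairwise terms grows as M(M - 1) while the per-gene terms grow only as M. With mixed signs of effect or weaker co-regulation the covariance share would be smaller, so the numbers should be read as an upper-end illustration rather than an estimate.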