hemiplasy

Lee KM & Coop G 2019 Population genomics perspectives on convergent adaptation. Phil Trans R Soc B 374:20180236.

  • the sharing of traits or alleles incongruent with the population or species tree owing to ancestral variation (i.e. ILS shown in figure 1a) has been termed hemiplasy
  • the key question is whether selection has independently increased the allele to fixation multiple times
  • taxa sharing the same substitution owing to ILS or gene flow means there has not been independence at the level of the mutational change
  • the allele frequency change at the locus can still be independent, that is convergent, across populations
  • even if an allele was shared owing to ILS, selection may have repeatedly driven up the allele in each population separately
  • regardless of the precise terminology we use, it is important to keep clear statements about the level of independence and the role selection has played in these changes
  • we introduce a conceptual framework to quantify the degree to which allele frequency shifts have been independent
  • we should be much more impressed by selection that repeatedly moves an allele from 10^−4 to 50% than from 50% to 100%, despite being comparable frequency changes and taking roughly the same time under an additive model
  • we should be much more impressed by repeated adaptation from very rare standing variation than common variation
  • we cannot simply sum the work done by selection (i.e. selective deaths) across multiple loci, unless the loci contribute multiplicatively to fitness
  • this argument forms the basis of the rejection of Haldane's substitution load argument, and Kimura's gain of information extension, as a limit on the rate of evolution
  • combining information over loci additively, that is assuming multiplicative epistasis, seems a fine first approximation
  • this limitation should be borne in mind
  • convergent adaptation, at the genetic or phenotypic level, can potentially be conceptualized in this way as the independent gain of information owing to selection
  • dissecting whether the same or different mutations gave rise to the adaptive alleles will help us gain insights into questions related to mutational target size or epistatic constraints
  • these questions can be somewhat orthogonal to more basic questions about adaptation
  • human populations have repeatedly adapted to different environments through changes in skin pigmentation
  • this convergent adaptation on the level of the phenotype comes from a mixture of selection on old standing variation, both derived and ancestral variants, and recent mutations
  • in information theory, the negative log probability of an outcome under a particular model is called the surprise (or the self information)
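
A minimal sketch of the two quantities these notes lean on: the surprise (self-information) of an outcome, and the time a deterministic logistic sweep takes between two frequencies. The function names, the selection coefficient s = 0.01, and the use of 1 − 10^−4 as a stand-in for "100%" (a deterministic trajectory never reaches fixation) are illustrative assumptions, not taken from Lee & Coop.

```python
import math


def surprise_bits(probability: float) -> float:
    """Self-information, -log2(p), of an outcome that has probability p."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def logit(frequency: float) -> float:
    """Log-odds of an allele frequency strictly between 0 and 1."""
    return math.log(frequency / (1.0 - frequency))


def sweep_time(start: float, end: float, s: float) -> float:
    """Generations needed under dp/dt = s * p * (1 - p) (assumed genic selection)."""
    return (logit(end) - logit(start)) / s


# Hypothetical selection coefficient, chosen only for illustration.
S = 0.01
rare_to_half = sweep_time(1e-4, 0.5, S)
half_to_near_fixed = sweep_time(0.5, 1.0 - 1e-4, S)

print(f"10^-4 -> 50%: {rare_to_half:.0f} generations")
print(f"50% -> ~100%: {half_to_near_fixed:.0f} generations")
print(f"surprise of an outcome with p = 1e-4: {surprise_bits(1e-4):.2f} bits")
print(f"surprise of an outcome with p = 0.5:  {surprise_bits(0.5):.2f} bits")
```

The two sweep times come out essentially equal because the logistic trajectory is symmetric on the log-odds scale, whereas the surprise of an outcome with probability 10^−4 is far larger than that of one with probability 0.5.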

population expansion

Ragsdale AP & Gravel S 2019 Models of archaic admixture and recent history from two-locus statistics. PLoS Genet 15:e1008204.

  • human expansion models underestimate LD between low frequency variants
  • the demographic model for human out-of-Africa (OOA) expansion proposed and inferred by Gutenkunst et al. [2] has been widely used for subsequent simulation studies
  • parameter estimates have been refined as more data became available [3, 35, 38]
  • we generalized the Gutenkunst model with a number of additional parameters accounting for recent events, including size changes in the YRI population, recent admixture between populations, and substructure within each continental population
  • none of these modifications provided satisfactory fit to the data and some did not converge to biologically realistic parameters
  • Fig 3
  • (A) we fit the 13-parameter Gutenkunst et al. model to statistics in the two-locus, multi-population Hill-Robertson system
  • most statistics were accurately predicted by this model, including
  • (B) the decays of E[D^2] in each population
  • (C) the decay of the covariance of D between populations
  • (E) the joint heterozygosity E[π_2(i)]
  • (D) E[D_i(1 − 2p_i)(1 − 2q_i)] was fit poorly by this model
  • we were unable to find a three-population model that recovered these observed statistics, including with additional periods of growth, recent admixture between human populations, or substructure within modern populations
  • Table 1
  • we fit the commonly used 13-parameter model to the multi-population Hill-Robertson statistics
  • the best fit parameters shown here were fit to the set of statistics without the E[Dz] terms
  • inclusion of those terms led to runaway parameter behavior in the optimization
  • this is often a sign of model mis-specification
  • the same 13-parameter model is augmented by the inclusion of two deeply diverged branches, putatively Neanderthal and an unknown lineage within Africa
  • these branches split from the branch leading to modern humans roughly 460 – 650 kya, and contributed migrants until quite recently (~ 19 kya)
  • times reported here assume a generation time of 29 years and are calibrated by the recombination (rather than mutation) rate
  • Fig 4
  • inferred OOA model with archaic admixture
  • (A) we fit a model for out-of-Africa expansion related to the standard model in Fig 3A
  • we also include two branches with deep split from the ancestral population to modern humans
  • (B-E) this model fits the data much better than the model without archaic admixture, and especially for the Dz statistics (D)
  • beyond the classical recursions for E[D] and E[D^2] [12, 14], two-locus statistics are difficult to compute for non-equilibrium, multi-population demographic models
  • Sved [53] proposed an IBD based recursion to compute E[r^2] across subdivided populations
  • its accuracy and interpretation remain debated [12]
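
For orientation, the quantities inside the expectations above can be written out for a single pair of loci in a single population. This is a plug-in sketch from known haplotype frequencies, assuming the usual definitions D = f_AB − pq, Dz = D(1 − 2p)(1 − 2q) and π_2 = p(1 − p)q(1 − q); it is not the unbiased estimator or the multi-population machinery used in the paper, and the function name is hypothetical.

```python
def two_locus_stats(f_AB: float, f_Ab: float, f_aB: float, f_ab: float) -> dict:
    """Plug-in two-locus statistics from the four haplotype frequencies."""
    total = f_AB + f_Ab + f_aB + f_ab
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p = f_AB + f_Ab  # frequency of allele A at the first locus
    q = f_AB + f_aB  # frequency of allele B at the second locus
    D = f_AB - p * q  # covariance between alleles at the two loci
    return {
        "D": D,
        "D2": D * D,
        "Dz": D * (1 - 2 * p) * (1 - 2 * q),
        "pi2": p * (1 - p) * q * (1 - q),
    }


# Two haplotypes in perfect association at intermediate frequency.
perfect_ld = two_locus_stats(0.5, 0.0, 0.0, 0.5)
# D = 0.25 and pi2 = 0.0625, but Dz = 0 because p = q = 0.5.
print(perfect_ld)

# Rare alleles in association: Dz is now positive.
rare_pair = two_locus_stats(0.05, 0.0, 0.0, 0.95)
print(rare_pair)
```

Dz vanishes when either allele sits at frequency 0.5, so it is carried by pairs of loci away from intermediate frequency, which is consistent with the note above that LD between low frequency variants is where the standard model falls short.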

canalization

Hallgrimsson B, Green RM, Katz DC, Fish JL, Bernier FP, Roseman CC, Young NM, Cheverud JM & Marcucio RS 2019 The developmental-genetics of canalization. Sem Cell Dev Biol 88:67-79.

  • canalization is a potentially significant cause of missing heritability
  • canalization, or the tendency to buffer variation, is about the modulation of phenotypic variance due to factors other than the genetic or environmental variance, per se
  • phenotypic robustness is a more general concept
  • canalization refers to the minimization of variation among individuals
  • developmental stability is the tendency to minimize variation among replicated structures within individuals
  • developmental stability is most often measured via random, normally distributed departures from symmetry where symmetrical development is expected, or fluctuating asymmetry
  • canalization and developmental stability are partially overlapping phenomena
  • it is not known, however, to what extent measures of canalization and developmental stability capture the same biological phenomena
  • canalization was motivated by developmental biology
  • phenotypic plasticity has a more naturalistic or population biology origin
  • coined by Woltereck [37], the concept of a reaction norm was fleshed out by Schmalhausen [38] whose primary interest was in the central role of stabilizing selection in evolution
  • independently of Waddington, Schmalhausen proposed a concept very similar to canalization which, in the English translation, he termed autonomization
  • for him, reaction norms, shaped by stabilizing selection, are adaptive
  • autonomization is not just any minimization of any variation
  • it is specifically the minimization of non-adaptive variation around the reaction norm
  • this contrasts significantly with current thinking on plasticity
  • plasticity is not necessarily adaptive
  • canalization and phenotypic plasticity are merely abstractions from patterns in data
  • they are not processes or mechanisms
  • the problem with viewing canalization as simply the inverse of plasticity is that plasticity is a much more general phenomenon
  • Wagner et al.'s [8] definition of canalization avoids this confusion
  • they define canalization as the suppression of phenotypic variation of either genetic or environmental origin
  • this makes canalization a dispositional concept, referring to a tendency or potential
  • canalization is the tendency to suppress variation
  • canalization is not a component of an observed phenotypic variance
  • it is a component of variability, or the tendency to vary
  • defined in this way, canalization is distinct from the more general concept of phenotypic plasticity
  • this definition also allows for the possibility of changes in among-individual variance along the norm of reaction
  • such changes in among-individual variance can result from a nonlinear norm of reaction curve
  • the shape of such norm of reaction curves can vary among genotypes
  • they might also result from other factors such as destabilizing effects of environmental stress or genetic perturbations
  • these early studies showed that the tendency for increased among-individual variance in mutants responded to selection, suggesting a heritable basis for canalization
  • all genetic canalization effects are attributable to epistasis in a quantitative genetic sense
  • yet large, population-level quantitative genetic effects are no assurance that underlying biological interactions between genes and/or gene products are understood or can be identified in individuals
  • to say epistasis explains genetic canalization in a mechanistic sense is therefore to commit Roff's sin of confounding a quantitative model with a mechanistic explanation
  • at the heart of this issue is the fact that quantitative genetics and developmental biology are concerned with different kinds of questions and have different definitions of ostensibly the same phenomena
  • quantitative genetics is concerned with partitioning phenotypic outcomes into cumulative, statistically-defined phenomena
  • developmental biology is concerned with physical mechanisms such as specific molecular, cellular or tissue-level interactions
  • the genetics of a population of clones is singularly uninteresting
  • its developmental biology may be fascinating
  • quantitative genetics defines most phenomena at the population level and is fundamentally concerned with variation
  • the mechanisms of interest to developmental biology occur in individuals and, at least under the current paradigm, variation is more often a nuisance than a direct object of study
  • in genetics, epistasis is a statistical effect
  • in developmental biology it is generally viewed as a mechanistic interaction between gene products
  • the two versions of epistasis are linked conceptually but are not identical
  • canalization is a property of individuals that is almost always measurable only at a population level
  • in genetic terms, a genetic factor increases canalization if, all else being equal, it reduces the phenotypic variance
  • for genetic canalization, the influence of a gene on the phenotypic effect of other genes is, by definition, a case of epistasis
  • genetic variation in such effects could be described as differential epistasis
  • does this mean that epistasis fully explains genetic canalization?
  • not necessarily
  • gene interaction effects imply nonlinear effects on phenotype
  • this is a statistical model of variation and not a mechanistic description of actual physical interactions
  • nonlinearities in development arise in myriad ways
  • depending on one's perspective (genetic or developmental), epistasis can be seen as either a cause or a consequence of canalization
  • for genes with redundant functions, null mutations might, therefore, be expected to produce a viable but more variable phenotype
  • conversely, redundancy in gene function contributes to robustness to mutation in those genes
  • the mapping of gene expression to phenotype is highly nonlinear
  • loss of over 50% of the wildtype expression level is associated with minimal changes in phenotype
  • beyond this point, the phenotypic effects are large and increasingly severe
  • we extended variation in Fgf8 expression well beyond the range expected in natural populations
  • this begs the question of how such nonlinearities evolve
  • the local flattening of the gene-expression to phenotype around the wildtype could evolve by stabilizing selection acting on gene-interaction effects
  • the question of how stabilizing or directional selection influences variational properties such as canalization has been an area of significant debate
  • diversity, often originating in different disciplines, has also created a situation reminiscent of the parable of the blind men and the elephant
  • each tends to be concerned with a particular aspect of the problem, making it difficult to contextualize this progress in terms of a more general understanding of the mechanisms of canalization in development
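
The point about nonlinear expression-to-phenotype maps can be illustrated with a toy saturating curve. Everything here is hypothetical: the Hill-type function, its half-maximum of 0.25 and steepness of 4, and the expression values are chosen only so that the curve is flat near wildtype expression (scaled to 1.0) and steep well below it. None of it is fitted to the Fgf8 data.

```python
from statistics import pvariance

# Hypothetical curve parameters (illustration only).
HALF_MAX = 0.25
STEEPNESS = 4


def phenotype(expression: float) -> float:
    """Toy saturating map from relative gene expression to phenotype."""
    scaled = expression ** STEEPNESS
    return scaled / (HALF_MAX ** STEEPNESS + scaled)


# Two groups with the same spread in expression (+/- 0.1) but different means.
near_wildtype = [0.9, 1.0, 1.1]
strongly_reduced = [0.15, 0.25, 0.35]

var_wildtype = pvariance([phenotype(x) for x in near_wildtype])
var_reduced = pvariance([phenotype(x) for x in strongly_reduced])

print(f"phenotype at 100% expression: {phenotype(1.0):.3f}")
print(f"phenotype at 50% expression:  {phenotype(0.5):.3f}")
print(f"phenotypic variance near wildtype:      {var_wildtype:.2e}")
print(f"phenotypic variance at low expression:  {var_reduced:.2e}")
```

With identical variation in expression, the group on the flat part of the curve shows far less phenotypic variance than the group on the steep part, which is the sense in which a nonlinear map can change among-individual variance without any dedicated buffering mechanism.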

polygenic adaptation

Thompson KA, Osmond MM & Schluter D 2019 Parallel genetic evolution and speciation from standing variation. Evol Lett 3:129-141.

  • adaptation from standing variation would reduce the evolution of reproductive isolation under parallel selection
  • parental populations would fix more of the same alleles and therefore evolve fewer incompatibilities
  • our simulations consider pairs of populations and multivariate phenotypes determined by multiple additive loci
  • N haploid parents are then randomly sampled with replacement from a multinomial distribution with probabilities proportional to their fitness, W
  • parents then randomly mate and produce two haploid offspring per pair, with free recombination between all loci
  • we assume an effectively infinite number of loci such that all mutations arise at a previously unmutated locus
  • mutational effects are drawn from a multivariate normal distribution ("continuum-of-alleles" sensu Kimura [1965]), with a mean of 0 and an SD of α in all m traits and no correlations among traits
  • i.e., universal pleiotropy
  • our general conclusions hold if the ancestor is under much stronger selection (σ_anc = 1) that puts it into the multivariate "House-of-Cards" regime
  • a parental population was established by first randomly choosing n polymorphic loci in the ancestor
  • each parental individual received the mutant (i.e., "derived") allele at each of these n loci with a probability equal to the allele's frequency in the ancestor
  • this admittedly artificial sampling procedure allowed us more control over the amount of standing genetic variation across simulations with different parameter values
  • further control was achieved by making the second parental population initially identical to the first
  • each possessed the exact same collection of genotypes
  • there were therefore no founder effects
  • populations adapted from only new (i.e., de novo) mutation when n = 0
  • an unavoidable and important effect of standing variation is that it quickens adaptation because populations do not have to wait for beneficial alleles to arise
  • reproductive isolation evolves rapidly during the initial stages of adaptation
  • after populations reach their respective phenotypic optima, genetic divergence accumulates slowly at a rate proportional to the mutation rate
  • our results reflect quasi-equilibrium conditions rather than transient states and are unaffected by standing variation's influence on the speed of adaptation
  • phenotypic variance in parental populations (i.e., before hybridization) is near zero and does not differ between populations founded with versus without standing variation
  • such low variance is expected because our simulations have fixed optima, frequency-independent selection, no migration, and parameter values corresponding to strong selection and relatively weak mutation
  • we determined the fitness (eq. (1)) of each hybrid in both parental environments and recorded its fitness as the larger of the two values
  • genetic parallelism rarely decreases to zero even under completely divergent selection (θ = 180°), indicating that populations fix some deleterious alleles
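
Three of the simulation steps described above can be sketched directly: founding a parental population from ancestral allele frequencies, fitness-proportional sampling of parents with replacement, and scoring a hybrid by its better environment. This is a hedged sketch with hypothetical function names and toy inputs; mating, recombination, mutation, and the fitness function of eq. (1) are not reproduced.

```python
import random


def found_population(ancestral_frequencies, size, rng):
    """Each founder carries the derived allele (1) at a locus with
    probability equal to that allele's frequency in the ancestor."""
    return [
        [1 if rng.random() < freq else 0 for freq in ancestral_frequencies]
        for _ in range(size)
    ]


def sample_parents(individuals, fitnesses, rng):
    """Draw N parents with replacement, with probabilities proportional to fitness."""
    return rng.choices(individuals, weights=fitnesses, k=len(individuals))


def hybrid_fitness(fitness_env1, fitness_env2):
    """Record a hybrid's fitness as the larger of its two environment-specific values."""
    return max(fitness_env1, fitness_env2)


rng = random.Random(1)
# Toy ancestral frequencies at n = 3 chosen polymorphic loci (hypothetical values).
first_population = found_population([0.1, 0.5, 0.9], size=6, rng=rng)
# The second parental population starts as an exact copy of the first.
second_population = [genotype[:] for genotype in first_population]

# Toy fitness values, one per individual (hypothetical).
toy_fitnesses = [1.0, 0.5, 0.5, 0.2, 0.2, 0.1]
parents = sample_parents(first_population, toy_fitnesses, rng)
print(parents)
print(hybrid_fitness(0.2, 0.7))
```

Copying the first population rather than sampling a second one is what removes founder effects in this setup: both populations start from the exact same collection of genotypes.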

polygenic adaptation

Höllinger I, Pennings PS & Hermisson J 2019 Polygenic adaptation: from sweeps to subtle frequency shifts. PLoS Genet 15:e1008035.

  • population genetics views adaptation as a sequence of selective sweeps at single loci underlying the trait
  • quantitative genetics posits a collective response, where phenotypic adaptation results from subtle allele frequency shifts at many loci
  • a synthesis of these views is largely missing
  • we study the architecture of adaptation of a binary polygenic trait (such as resistance) with negative epistasis among the loci of its basis
  • our key analytical result is an expression for the joint distribution of mutant alleles at the end of the adaptive phase
  • for shifts, alleles need to be able to hamper the rise of alleles at other loci via negative epistasis
  • diminishing returns are a consequence of partial or complete redundancy of genetic effects across loci or gene pathways
  • adaptive phenotypes (such as [...]) can often be produced in many alternative ways
  • redundancy is a common characteristic of beneficial mutations
  • we dissect the adaptive process into two phases
  • the early stochastic phase describes the establishment of all mutants that contribute to the adaptive response under the influence of mutation and drift
  • loci can be treated as independent during this phase to derive a joint distribution for ratios of allele frequencies at different loci, Eq (5)
  • during the second, deterministic phase, epistasis and linkage become noticeable
  • mutation and drift can be ignored
  • allele frequency changes during this phase can be described as a density transformation of the joint distribution
  • for the simple model with fully redundant loci, and assuming either LE or complete linkage, this transformation can be worked out explicitly
  • our main result Eq (8) can be understood as a multi-locus extension of Wright's formula
  • for a neutral locus with multiple alleles, Wright's distribution is a Dirichlet distribution, which is reproduced in our model for the case of complete linkage
  • for the opposite case of linkage equilibrium, we obtain a family of inverted Dirichlet distributions
  • the quantitative genetic "small shifts" view of adaptation does not talk about a stationary distribution
  • it does not imply that alleles will never fix over much longer time scales
  • the transient nature of our result means that it reflects the effects of genetic drift only during the early phase of adaptation
  • our result ignores drift after phenotypic adaptation has been accomplished—which is also a reason why it can be derived at all
  • the qualitative pattern of polygenic adaptation is predicted by a single compound parameter: the background mutation rate Θbg
  • i.e., the population mutation rate for the background of a focal locus within the trait basis
  • the role of Θbg for polygenic adaptation is essentially parallel to the one of Θl for soft sweeps
  • the mathematical methods to analyze both cases are different
  • the polygenic scenario does not lend itself to a coalescent approach
  • alternative approaches to polygenic adaptation
  • the theme of "competition of a single locus with its background" relates to previous findings by Chevin and Hospital (2008) [26] in one of the first studies to address polygenic footprints
  • the background is modeled as a normal distribution with a mean that can respond to selection, but with constant variance
  • a drift-related parameter, such as Θbg, has no place in such a framework
  • a sweep at the focal locus is prohibited under two conditions
  • first, the background variation (generated by recurrent mutation in our model, constant in [26]) must be large
  • second, the fitness function must exhibit strong negative epistasis that allows for alternative ways to reach the trait optimum—and thus produces redundancy (due to Gaussian stabilizing selection in [26])
  • finally, while the adaptive trajectory depends on the shape of the fitness function, Chevin and Hospital note that it does not depend on the strength of selection on the trait, as also found for our model
  • de Vladar and Barton [42] and Jain and Stephan [31] [...] study an additive quantitative trait under stabilizing selection with binary loci
  • these models allow for different locus effects, but ignore genetic drift
  • before the environmental change, all allele frequencies are assumed to be in mutation-selection balance
  • sweeps are prevented in [31] if most loci have a small effect and are therefore under weak selection prior to the environmental change
  • this contrasts to our model, where the predicted architecture of adaptation is independent of the selection strength
  • in our model, weak selection does not imply shifts
  • this difference can at least partially be explained by the neglect of drift effects on the starting allele frequencies in the deterministic models
  • in the absence of drift, loci under weak selection start out from frequency x_0 = 0.5 [42]
  • in finite populations, however, almost all of these alleles start from very low (or very high) frequency
  • many alleles at intermediate frequencies at competing background loci are expected only if Θbg≫1, in accordance with our criterion for shifts
  • we have analyzed our model for the case of starting allele frequencies set to the deterministic values of mutation-selection balance, μ/s_d
  • we observe adaptation due to small frequency shifts in a much larger parameter range
  • adaptation by sweeps in a polygenic model requires a mechanism to create heterogeneity among loci
  • both drift and unequal locus effects are included in the simulation studies by Pavlidis et al (2012) [28] and Wollstein and Stephan (2014) [29]
  • due to differences in concepts and definitions there are few comparable results
  • they study long-term adaptation
  • they simulate Ne generations
  • sweeps are defined as fixation of the mutant allele at a focal locus
  • frequency shifts correspond to long-term stable polymorphic equilibria [29]
  • with this definition, a shift scenario is no longer a transient pattern, but depends entirely on the existence (and range of attraction) of polymorphic equilibria
  • the initial stochastic phase is relatively insensitive to interactions via epistasis or linkage
  • as described above, the key qualitative results to distinguish broad categories of adaptive scenarios are due to the initial stochastic phase
  • this holds true, in particular, for the role of the background mutation rate Θbg
  • we therefore expect that these results generalize beyond our basic model
  • Ne is the short-term effective population size [...] during the stochastic phase of adaptation
  • this short-term size is unaffected by demographic events, such as bottlenecks, prior to adaptation
  • it is therefore often larger than the long-term effective size that is estimated from nucleotide diversity
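
As a reminder of what the two distribution families named above look like, both can be generated from independent gamma variates: a Dirichlet vector is a set of gammas divided by their sum, and an inverted Dirichlet vector is a set of gammas divided by one further reference gamma. The sketch below shows only this standard construction; the shape parameters are hypothetical, and how they map onto the paper's Eq (8) is not reproduced here.

```python
import random


def sample_dirichlet(shapes, rng):
    """Dirichlet draw: independent Gamma(shape, 1) variates divided by their sum."""
    gammas = [rng.gammavariate(shape, 1.0) for shape in shapes]
    total = sum(gammas)
    return [g / total for g in gammas]


def sample_inverted_dirichlet(shapes, reference_shape, rng):
    """Inverted Dirichlet draw: Gamma(shape, 1) variates divided by one
    independent reference Gamma(reference_shape, 1) variate."""
    reference = rng.gammavariate(reference_shape, 1.0)
    return [rng.gammavariate(shape, 1.0) / reference for shape in shapes]


rng = random.Random(42)
# Hypothetical shape parameters, for illustration only.
frequencies = sample_dirichlet([0.5, 0.5, 0.5], rng)
ratios = sample_inverted_dirichlet([0.5, 0.5], reference_shape=0.5, rng=rng)

print(frequencies, sum(frequencies))  # components lie in (0, 1) and sum to 1
print(ratios)  # components are positive and not bounded above
```

The Dirichlet components are constrained to sum to one, as frequencies of alleles at a single locus must, while the inverted Dirichlet components are unconstrained positive ratios, which fits the notes' description of a joint distribution for ratios of allele frequencies at different loci.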

polygenic adaptation

Stetter MG, Thornton K & Ross-Ibarra J 2018 Genetic architecture and selective sweeps after polygenic adaptation to distant trait optima. PLoS Genet 14:e1007794.

  • during the stationary phase before the shift and after reaching the new optimum we followed a Gaussian fitness function appropriate for a trait under stabilizing selection
  • during the optimum shift, however, such a model would be problematic, as only a few individuals in the upper tail of the fitness distribution would have extremely high relative fitness, inducing a strong population bottleneck
  • instead, we applied a model of truncation selection, first calculating fitness under the Gaussian fitness function but then assigning a fitness of 1 to the top half of the population and 0 to the bottom half
  • such a model is reasonable for sudden shifts in trait optima that do not lead to the extinction of a population, but where higher trait values are unambiguously advantageous and the maximum population size is limited
  • in natural populations these factors can be observed when sudden changes in the environment favor a specific phenotype for invasive species
  • we modeled 20 QTL resembling 50 kb regions, each with a 4 kb "genic" region centered in a 46 kb "intergenic" region
  • fitness
  • w = exp[−(z − z_opt)^2 / (2V_S)] ... (1)
  • this standard model for traits under stabilizing selection is well suited for populations at equilibrium
  • under strong directional selection, however, this model greatly amplifies fitness differences among individuals in the tails of the phenotypic distribution
  • during the adaptive phase of the simulation, we calculated individual fitness following Eq 1, but then applied truncation selection by assigning a fitness of 1 to the top 50% of the distribution of w and 0 for the remaining 50%
  • this model allowed for truncation selection on z while the population was distant from the new optimum, but also allowed for selection against phenotypes that surpass the new optimum during the final stages of adaptation
  • we stopped truncation selection once the population mean reached the new optimum
  • initial genetic variance
  • the genetic variance at equilibrium can be approximated by the house of cards (HoC) model
  • E[V_G] = 4μV_S ... (2)
  • and the stochastic HoC approximation
  • E[V_G]_SHC = 4μV_S / (1 + V_S / (Nσ_m^2)) ... (3)
  • background
  • computational limitations do not allow simulation of an entire eukaryotic genome
  • we added a heritable background (GB) to our simulations to account for the adaptive potential of the rest of the genome
  • sweeps
  • we defined as a sweep any mutation that fixed faster than 99% of neutral alleles
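
The fitness scheme described above (Gaussian fitness as in Eq 1, then truncation on w during the adaptive phase) and the house-of-cards expectation of Eq 2 can be sketched as follows. Function names and the toy trait values are hypothetical; ties in w and odd population sizes are handled in an arbitrary way that the notes do not specify.

```python
import math


def gaussian_fitness(z: float, z_opt: float, v_s: float) -> float:
    """Eq 1: w = exp(-(z - z_opt)^2 / (2 * V_S))."""
    return math.exp(-((z - z_opt) ** 2) / (2.0 * v_s))


def truncation_fitness(trait_values, z_opt, v_s):
    """Assign fitness 1 to the top half of individuals ranked by w, 0 to the rest."""
    w = [gaussian_fitness(z, z_opt, v_s) for z in trait_values]
    ranked = sorted(range(len(w)), key=lambda i: w[i], reverse=True)
    survivors = set(ranked[: len(w) // 2])
    return [1.0 if i in survivors else 0.0 for i in range(len(w))]


def hoc_genetic_variance(mu: float, v_s: float) -> float:
    """Eq 2: house-of-cards expectation E[V_G] = 4 * mu * V_S."""
    return 4.0 * mu * v_s


# Far from the optimum, truncating on w favours the largest trait values.
far_from_optimum = truncation_fitness([0.0, 1.0, 2.0, 3.0], z_opt=10.0, v_s=1.0)
print(far_from_optimum)  # [0.0, 0.0, 1.0, 1.0]

# Near the optimum, individuals that overshoot it are selected against.
near_optimum = truncation_fitness([9.5, 10.0, 11.0, 14.0], z_opt=10.0, v_s=1.0)
print(near_optimum)  # [1.0, 1.0, 0.0, 0.0]
```

Ranking on w rather than on z is what gives the behaviour described in the notes: while the population is distant from the new optimum the two rankings coincide, but once some individuals surpass the optimum they fall into the bottom half.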

polygenic score

Barton N, Hermisson J & Nordborg M 2019 Why structure matters. eLife 8:e45380.

  • the first GWAS for height found a small number of SNPs that jointly explained only a tiny fraction of the variation
  • this was in contrast with the high heritability seen in twin studies
  • it was dubbed ‘the missing heritability problem’
  • it was suggested that the problem was simply due to a lack of statistical power to detect polymorphisms of small effect
  • most of the variation remains ‘unmappable’
  • sample sizes on the order of a million are still not large enough
  • one way in which the unmappable component of genetic variation can be included in a statistical measure is via so-called polygenic scores
  • these scores sum the estimated contributions to the trait across many SNPs, including those whose effects, on their own, are not statistically significant
  • polygenic scores thus represent a shift from the goal of identifying major genes to predicting phenotype from genotype
  • when a GWAS is carried out to identify major genes, it is relatively simple to avoid false positives by eliminating associations outside major loci
  • if the goal is to make predictions, or to understand differences among populations (such as the latitudinal cline in height), we need accurate and unbiased estimates for all SNPs
  • accomplishing this is extremely challenging
  • it is also difficult to know whether one has succeeded
  • one possibility is to compare the population estimates with estimates taken from sibling data, which should be relatively unbiased by environmental differences
  • Berg et al. and Sohail et al. independently found that evidence for selection vanishes – along with evidence for a genetic cline in height across Europe
  • the previously published results were due to the cumulative effects of slight biases in the effect-size estimates in the GIANT data
  • they also found evidence for confounding in the sibling data used as a control by Robinson et al. and Field et al.
  • we still do not know whether genetics and selection are responsible for the pattern of height differences seen across Europe
  • there is no perfect way to control for complex population structure and environmental heterogeneity
  • biases at individual loci may be tiny
  • they become highly significant when summed across thousands of loci – as is done in polygenic scores
  • standard methods to control for these biases, such as principal component analysis, may work well in simulations but are often insufficient when confronted with real data
  • even the data in the UK Biobank seems to contain significant structure
  • quantitative genetics has proved highly successful in plant and animal breeding
  • this success has been based on large pedigrees, well-controlled environments, and short-term prediction
  • when these methods have been applied to natural populations, even the most basic predictions fail, in large part due to poorly understood environmental factors
  • natural populations are never homogeneous
  • it is therefore misleading to imply there is a qualitative difference between ‘within-population’ and ‘between-population’ comparisons
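
The arithmetic behind the bias argument is simple enough to sketch. A polygenic score is a sum of estimated effect sizes weighted by allele counts, so a bias that is tiny at each locus but consistently aligned with allele frequency differences adds up across loci. The numbers below (10,000 loci, zero true effect, a bias of 0.001 trait units per allele, a mean allele-count difference of 0.1 between two groups) are hypothetical and chosen only to make the accumulation visible.

```python
def polygenic_score(effect_sizes, allele_counts):
    """Sum of per-SNP estimated effects weighted by allele counts."""
    if len(effect_sizes) != len(allele_counts):
        raise ValueError("need one allele count per effect size")
    return sum(beta * count for beta, count in zip(effect_sizes, allele_counts))


# Hypothetical setup: no locus has any true effect on the trait.
N_LOCI = 10_000
PER_LOCUS_BIAS = 0.001  # spurious effect estimate, in trait units per allele
estimated_effects = [PER_LOCUS_BIAS] * N_LOCI

# Mean allele counts in two groups that differ slightly at every locus,
# in the direction that happens to align with the bias.
group_a_counts = [1.1] * N_LOCI
group_b_counts = [1.0] * N_LOCI

score_difference = (
    polygenic_score(estimated_effects, group_a_counts)
    - polygenic_score(estimated_effects, group_b_counts)
)
print(f"per-locus contribution to the difference: {PER_LOCUS_BIAS * 0.1:.4f}")
print(f"score difference summed over loci:        {score_difference:.4f}")
```

Each locus contributes only about 0.0001 trait units to the group difference, yet the summed scores differ by roughly one full unit even though no locus affects the trait.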