compensatory evolution

Povolotskaya IS & Kondrashov FA 2010 Sequence space and the ongoing expansion of the protein universe. Nature 465:922-926.

  • ancient proteins are still diverging from each other
  • indicating an ongoing expansion of the protein sequence universe
  • ~98 per cent of sites cannot accept an amino-acid substitution at any given moment
  • a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur
  • ~3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins
  • the rate of evolution may be inhibited by two factors
  • non-epistatic negative selection, where the fitness impact of every allele is independent of genetic context and environmental factors
  • or a high prevalence of sign epistasis
  • sign epistasis is a property of fitness landscapes such that an amino acid at a specific site confers high fitness in one genetic background and low fitness in another
  • if such situations are common, the fitness landscape can be described as rugged
  • the order in which substitutions accumulate is restricted
  • with the result that only a small fraction of the substitutions that are possible in at least one genetic background can occur at any given moment
  • the rate of evolution will be slow
  • because fitness ridges will consist of fewer and longer evolutionarily accessible paths in sequence space
  • evolving proteins may meander through sequence space on such a rugged fitness landscape for a long time before reaching the limit of their divergence
  • the limit of sequence divergence for proteins under non-epistatic negative selection is much closer, in terms of protein distance, than that for neutrally evolving protein sequences
  • such selection reduces the time to reach this limit
  • the ongoing divergence of ancient proteins (Fig. 3) is inconsistent with non-epistatic negative selection being the main factor responsible for their slow evolution
  • as first recognized by John Maynard Smith
  • "functional proteins must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates"
  • although some aspects of the protein fitness landscape (that is, the network) have been probed, we remain largely ignorant of its global structure spanning sequence space
  • our data indicate high incidence of sign epistasis, such that selection associated with a substitution at one site depends on other sites
  • only a small fraction of all possible sequences confer high fitness, leading to slow divergence of ancient proteins
  • compensatory interactions between different sites in the protein structure are a well-documented phenomenon (refs 11–14, 16–21) that is expected to result in complex, multidimensional sign epistasis and to contribute substantially to the ruggedness of protein fitness landscapes
  • ridges of high fitness corresponding to specific ancient proteins occupy a tiny fraction of the entire volume of the sequence space
  • these ridges are long and thin and can be more accurately visualized as a wide-mesh net spanning a large part of sequence space, rather than as a small volume within the space
  • such fitness ridges imply that sign epistasis and compensatory evolution in ancient proteins must be common
  • our data show that >90% of the sites in any protein can eventually accept a substitution given the right combination of amino acids at other sites
  • although it is not clear whether such substitutions are predominantly neutral or beneficial
  • regardless of the importance of positive selection in protein divergence, it seems that many sites are conserved because there has not been enough time to create the right combination of amino acids at other sites to allow them to evolve, which may take billions of years
  • protein fitness landscapes cannot be described outside the framework of multidimensional epistasis
  • which can reflect non-trivial compensatory interactions
  • simple forms of epistasis, such as synergistic or antagonistic epistasis, do not provide an adequate theoretical framework for understanding protein evolution
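
The notes above define sign epistasis and say it restricts the order in which substitutions can accumulate. A minimal Python sketch of that idea follows. It is not the paper's method: the three-site landscape, its fitness values, and the function names (toy_fitness, has_sign_epistasis, count_accessible_paths) are all hypothetical, chosen only for illustration. Site 0 is deleterious unless site 1 has already changed (a compensatory pair), and site 2 is neutral. A path is counted as accessible if no intermediate falls below a fitness threshold, in the spirit of the Maynard Smith quote.

```python
from itertools import permutations, product


def toy_fitness(genotype):
    """Hypothetical fitness for a 3-site genotype (0 = ancestral, 1 = derived)."""
    a, b, _c = genotype
    if a and not b:
        return 0.2  # substitution at site 0 alone is strongly deleterious
    if a and b:
        return 1.1  # with site 1 also substituted, it is slightly beneficial
    return 1.0      # every other genotype is fully functional


def substitution_effects(site, n_sites=3):
    """Fitness effect of substituting `site` in every possible background."""
    effects = []
    for background in product((0, 1), repeat=n_sites - 1):
        ancestral = background[:site] + (0,) + background[site:]
        derived = background[:site] + (1,) + background[site:]
        effects.append(toy_fitness(derived) - toy_fitness(ancestral))
    return effects


def has_sign_epistasis(site, n_sites=3):
    """True if the substitution is beneficial in one background and deleterious in another."""
    effects = substitution_effects(site, n_sites)
    return min(effects) < 0 < max(effects)


def count_accessible_paths(n_sites=3, threshold=1.0):
    """Count substitution orders whose every intermediate has fitness >= threshold."""
    accessible = 0
    for order in permutations(range(n_sites)):
        genotype = [0] * n_sites
        for site in order:
            genotype[site] = 1
            if toy_fitness(tuple(genotype)) < threshold:
                break  # this order passes through a low-fitness intermediate
        else:
            accessible += 1
    return accessible


if __name__ == "__main__":
    for site in range(3):
        print(site, has_sign_epistasis(site))
    print(count_accessible_paths(), "of 6 substitution orders are accessible")
```

On this toy landscape only site 0 shows sign epistasis, and only the orders in which site 1 changes before site 0 avoid the low-fitness intermediate, so 3 of the 6 possible orders are accessible. At the ancestral genotype, 2 of the 3 sites can accept a substitution immediately, although all 3 can eventually change. This is a small-scale analogue of the gap between the sites permitted at any given moment and the sites permitted eventually.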