Kullback-Leibler divergence

Frank SA 2012 Natural selection. V. How to read the fundamental equations of evolutionary change in terms of information theory. J Evol Biol 25:2377-2396.

  • the variance in fitness is equivalent to a symmetric form of the Kullback–Leibler information that the population acquires about the environment through the changes in gene frequency caused by selection
  • Kullback–Leibler information is closely related to Fisher information, likelihood and Bayesian updating from statistics, as well as Shannon information and the measures of entropy that arise as the fundamental quantities of communication theory and physics
  • the common variances and covariances of evolutionary models are equivalent to the fundamental measures of information that arise in many different fields of study
  • information is a primary quantity with intuitive meaning in the study of selection
  • the genetic variance just happens to be an algebraic equivalence for the measure of information
  • the history of evolutionary theory has it backwards
  • q_i' = q_i (w_i / w̄) ... (1), where w̄ = ∑ q_i w_i is mean fitness
  • economists, mathematicians and game theorists have called this expression the replicator equation
  • it expresses in the simplest way the dynamics of replication
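The replicator dynamics of eqn (1) can be sketched numerically. A minimal illustration in Python (the frequencies and fitnesses are made-up values, not from the paper):

```python
import numpy as np

# Hypothetical type frequencies q_i and fitnesses w_i (illustrative values)
q = np.array([0.5, 0.3, 0.2])
w = np.array([1.0, 1.5, 0.5])

w_bar = q @ w            # mean fitness: w̄ = ∑ q_i w_i
q_next = q * w / w_bar   # replicator map, eqn (1): q_i' = q_i (w_i / w̄)

print(q_next)            # fitter-than-average types gain frequency; sums to 1
```

Dividing by w̄ keeps q_next a probability distribution, so types with w_i > w̄ increase and those with w_i < w̄ decline.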
  • box 3: interpretation of q' and z'
  • selection theory in its most abstract and general form requires a set mapping interpretation
  • q_i' is the frequency of descendants derived from type i in the ancestral population
  • the set mapping interpretation allows one to generalize equations of selection theory and total evolutionary change to a much wider array of problems than would be possible under the common interpretations of the terms
  • entities that depend on ratios have a natural logarithmic scaling
  • it is traditional to describe the logarithm of fitness as the Malthusian expression
  • m_i = log(w_i) = log(w̄) + log(q_i' / q_i)
  • Δs m̄ = ∑ Δq_i log(q_i' / q_i) ... (9)
  • perhaps the most important measure of information in communication, statistics and physics is the Kullback–Leibler divergence
  • D(q' || q) = ∑ q_i' log(q_i' / q_i) ... (10)
  • Δs m̄ = D(q' || q) + D(q || q') ... (11)
  • this sum is known as the Jeffreys divergence
  • J(q', q) = D(q' || q) + D(q || q') ... (12)
  • Δs m̄ = J ... (13)
  • J = β_wm V_m / w̄ = β_mw V_w / w̄ ... (14)
  • this expression shows the relation between the information accumulated by natural selection, J, and the traditional statistical expressions of natural selection in terms of variances and regression coefficients
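Equations (9)–(14) can be verified numerically: the change in mean log fitness caused by selection equals the Jeffreys divergence between q' and q, and also equals Cov(m, w)/w̄. A sketch with illustrative values:

```python
import numpy as np

q = np.array([0.5, 0.3, 0.2])   # ancestral frequencies (illustrative)
w = np.array([1.0, 1.5, 0.5])   # fitnesses (illustrative)
w_bar = q @ w
q1 = q * w / w_bar              # q' from the replicator equation (1)

def D(p, r):
    """Kullback-Leibler divergence D(p || r), eqn (10)."""
    return np.sum(p * np.log(p / r))

m = np.log(w)                   # Malthusian fitness, m_i = log w_i
ds_m = (q1 - q) @ m             # Δs m̄ = ∑ Δq_i m_i, eqn (9)
J = D(q1, q) + D(q, q1)         # Jeffreys divergence, eqn (12)

# eqn (14): J can also be written as Cov(m, w)/w̄ over the ancestral population
cov_mw = q @ (m * w) - (q @ m) * (q @ w)

print(ds_m, J, cov_mw / w_bar)  # all three expressions agree
```

The identity holds because ∑ Δq_i log(q_i'/q_i) = ∑ Δq_i log w_i (the log w̄ terms cancel, since ∑ Δq_i = 0), which is both Δs m̄ and the sum of the two KL divergences.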
  • box 4: selection and information
  • Fisher never developed an information analysis of selection
  • the modern field of information theory only began with Shannon's work on communication
  • the use of Fisher information outside of statistical problems developed later
  • why J rather than D?
  • we obtain the same information gain whether selection moves a population as q → q' or in the reverse direction q' → q
  • we associate a set of values, θ, with each probability distribution
  • as the changes become small, Δθ → 0, the Jeffreys divergence, J(θ), divided by the squared change in scale, Δθ², converges to the important quantity in statistical theory known as Fisher information, F(θ)
  • J(θ) / Δθ² → F(θ)
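The limit J(θ)/Δθ² → F(θ) can be illustrated with a Bernoulli distribution, whose Fisher information is the standard result F(θ) = 1/(θ(1−θ)). The choice of distribution here is mine, for illustration; the paper's statement is general:

```python
import math

def jeffreys_bernoulli(p, r):
    """Jeffreys divergence between Bernoulli(p) and Bernoulli(r)."""
    def D(a, b):
        return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))
    return D(p, r) + D(r, p)

theta, d_theta = 0.3, 1e-4
ratio = jeffreys_bernoulli(theta, theta + d_theta) / d_theta**2
fisher = 1 / (theta * (1 - theta))  # F(θ) for a Bernoulli distribution

print(ratio, fisher)                # ratio ≈ F(θ) as Δθ → 0
```

Shrinking d_theta further drives the ratio closer to F(θ), matching the stated convergence.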
  • Δm̄ = J + Δc m̄
  • we can express the equilibrium condition, Δm̄ = 0
  • under a balance between information gain by selection and information decay by change in coordinates, J = −Δc m̄
  • Δs z̄ = Cov(w, z) / w̄ = β_zw V_w / w̄
  • why should the expression for selection be exactly the covariance, or the regression multiplied by the variance, which capture only the linear component of association?
  • Δs z̄ describes selection by a change in average values
  • to calculate a change in the average, we need only the linear component of association between character and fitness
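The covariance form of the selection differential can be checked against the direct frequency-change computation. A sketch with illustrative values (z is an arbitrary character):

```python
import numpy as np

q = np.array([0.5, 0.3, 0.2])           # frequencies (illustrative)
w = np.array([1.0, 1.5, 0.5])           # fitnesses (illustrative)
z = np.array([2.0, 3.0, 1.0])           # character values (illustrative)

w_bar = q @ w
q1 = q * w / w_bar                      # descendant frequencies, eqn (1)

ds_z = q1 @ z - q @ z                   # Δs z̄: direct change in the average
cov_wz = q @ (w * z) - w_bar * (q @ z)  # Cov(w, z) in the ancestral population

print(ds_z, cov_wz / w_bar)             # the two expressions agree
```

Only the covariance, the linear component of association between w and z, enters the change in the average, which is the point of the question above.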
  • the logarithmic scale compares relative magnitudes
  • we need relative magnitudes because there is no meaning in the number of babies or the number of copies produced with regard to whether a type i is increasing or decreasing in the population
  • we need to know the relative success
  • the logarithmic scale is the natural scale of relative magnitudes
  • when reading the fundamental equations of selection for meaning, we should prefer the information interpretation
  • selection is the process by which populations accumulate information about the environment
  • the correct scale for analysing the changes in fitness is logarithmic
  • fitness is a relative measure
  • logarithmic scaling is the correct scale for relative measures (Wagner, 2010)
  • the partial change gives a clear sense of what selection is doing at any moment, but provides no insight by itself about evolutionary dynamics
  • earlier work implicitly used time as the parameter, which is not a meaningful way of expressing the accumulation of information
  • one does not think of selection as providing information about time
  • past work often tried to make general statements about evolutionary dynamics, which is not possible
  • it is possible to make strong and completely general statements about the partial change caused by selection