
Mutation-Selection Equilibrium

How do you permanently increase or decrease the fitness of a population (in terms of population genetics)? Is it better to intensify the strength of selection so that deleterious mutations are removed, or to relax the strength of selection so that mutations are better tolerated? This is relevant to the current human condition, with an increased amount of medical intervention and (in the very recent past) a relaxation of some adaptive demands in the environment. (This will be misunderstood; human culture has placed additional demands upon humans over the longer timescale, but anyway...) Arguments can be made for either scenario, and it turns out that it really doesn't matter: strangely enough, the strength of selection cannot change the average fitness (at equilibrium) of a population.

For this post, think of mutations as deleterious (they lower an organism's fitness) if they have any effect at all. Most mutations either have no fitness effect (are selectively neutral) or lower fitness; very few mutations in the genome would actually increase the fitness of an organism (be adaptive). Again, think about making random changes to a car. Most changes (slightly reducing the length of the radio antenna) make essentially no difference, while others (rerouting the fuel line to the windshield-wiper pump) are a very bad idea in terms of the performance of the car. It would be very rare for a random change to improve the car's performance.

So, we know there are deleterious mutations in populations that can result in what we identify in humans as genetic disease. Many of these are recessive, such as cystic fibrosis or Tay-Sachs disease; however, some are dominant, such as Huntington's disease or Marfan syndrome, and some, like X-linked hemophilia, do not fit this simple categorization. Why do humans and other species have so many deleterious alleles in the population?

The alleles are generated by mutation and removed by selection. We can have a mutation rate \mu and a strength of selection (relative to unmutated alleles) acting on those mutations s. If we pretended the alleles acted independently, that pairing them together into diploid genotypes did not matter, then we can easily write down the equilibrium allele frequency, \hat{p}, of the mutation.

\hat{p} = \frac{\mu}{\mu + s}

This is the rate at which the allele is generated, \mu, out of the total rate of input by mutation and removal by selection. For example, if the strength of selection against a mutant is 20% (a 20% fitness reduction relative to individuals with unmutated alleles) and the mutation rate is 0.1%, then the equilibrium allele frequency is approximately half a percent, 0.5%. On the other hand, if the fitness reduction is only 1%, the allele can reach a higher frequency at the same mutation rate (\mu=0.1%): the equilibrium becomes approximately 9%, which is a dramatically high frequency for a dominant deleterious allele. (This is mathematically equivalent to bi-directional mutation; however, here the "back mutation" that restores the original state is selection.)
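As a quick numerical check of these two examples (a Python sketch of the formula above; the rates are the illustrative ones from the text):

```python
# Equilibrium frequency of a deleterious allele when allele copies are
# treated independently: p_hat = mu / (mu + s).
def p_hat_haploid(mu, s):
    return mu / (mu + s)

mu = 0.001                        # mutation rate (0.1%)
print(p_hat_haploid(mu, 0.20))    # s = 20%: ~0.005, i.e. ~0.5%
print(p_hat_haploid(mu, 0.01))    # s = 1%:  ~0.091, i.e. ~9%
```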

Now let's keep track of genotypes, starting with the case where the fitness effect of the allele is dominant (with some simplifying assumptions). If it is dominant and deleterious it is probably rare (not like the 9% case above), so homozygotes, at frequency \hat{p}^2, are exceedingly rare and can be safely ignored because they contribute very little to the equilibrium dynamics. Mutations occur on the non-mutant allele copies at a rate \mu(1-p) (regardless of whether those copies are in a homozygote or a heterozygote). Selection removes a portion of the heterozygotes: according to Hardy-Weinberg, heterozygotes are expected at a frequency of 2p(1-p), and an s fraction of these is removed, so the genotype rate of removal is s 2 p (1-p).

(2p(1-p) - s 2 p (1-p) = (1-s) 2p(1-p), the relative fitness to the unmutated homozygote is 1-s which adjusts the frequency of the heterozygotes 2p(1-p) due to selection.)

Only half of the alleles in the heterozygotes are the mutant allele, so the rate of removal of the mutant allele is \frac{s 2 p (1-p)}{2} = s p (1-p). (Here we are focused on the case where the mutant allele is rare, which is what allowed us to ignore p^2 in the first place; so 1-p \approx 1 and the frequency of heterozygotes is approximately 2p. However, we want to keep track of the change in p due to s, so we divide s2p by two. Selection also removes non-mutant alleles in heterozygotes, but non-mutant homozygotes are much more common, so we can ignore the other half of the alleles removed by selection; this follows from the assumptions of large population size and rare mutant alleles.)

At equilibrium the rate of input and removal of the mutation are equal.

\mu(1-p) = s p (1-p)

Rearrange and simplify.

\mu = s p

\hat{p} = \frac{\mu}{s}

Using the two examples above with a mutation rate of \mu = 0.001 the equilibrium allele frequency is predicted to be 0.5% for a fitness reduction of 20% and 10% for a fitness reduction of 1%, which is almost equivalent to the case where alleles are acted on independently (as haploids). (If s \gg \mu then \frac{\mu}{s} \approx \frac{\mu}{s+\mu}.)
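The approximation in the parenthetical can be checked directly (a Python sketch, using the same illustrative rates):

```python
# Dominant case at mutation-selection balance: p_hat = mu / s.
# For s >> mu this is close to the independent-alleles result mu / (mu + s).
mu = 0.001
for s in (0.20, 0.01):
    print(s, mu / s, mu / (mu + s))
```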

What if the fitness effect is recessive? Then selection only removes the mutant homozygotes which are predicted to occur at a frequency of p^2. In this case selection can proceed more efficiently, in a sense, and mutant alleles are removed in pairs. All of the alleles in the homozygote are the mutant form so there is no adjustment necessary like dividing by two in the heterozygotes.

\mu(1-p) = s p^2

To keep this from getting messy, let's use the 1-p \approx 1 trick now (which is actually less appropriate in this case because higher frequencies can be attained, but for now still assume that mutant alleles are very rare and s \gg \mu, which is true for many mutations that result in human genetic diseases).

\mu = s p^2

p^2 = \frac{\mu}{s}

\hat{p} = \sqrt{ \frac{\mu}{s}}

Using our two examples again, with a mutation rate of \mu = 0.001, the equilibrium allele frequency is predicted to be 7% for a fitness reduction of 20% and 32% for a fitness reduction of 1%. Now the equilibrium allele frequencies are much higher; when the allele is kept very rare (when s \gg \mu) the difference from the dominant case can be orders of magnitude. Masking the fitness effects in heterozygotes (carriers) allows the allele to reach unexpectedly high equilibrium frequencies.
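A quick check of the recessive numbers (Python sketch, same illustrative rates):

```python
import math

# Recessive case at mutation-selection balance: p_hat = sqrt(mu / s).
mu = 0.001
for s in (0.20, 0.01):
    print(s, math.sqrt(mu / s))   # ~0.07 (7%) and ~0.32 (32%)
```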

Now let's look at the average effect in the population. The average fitness \bar{w} in a population can be calculated as the fitness of each genotype multiplied by its corresponding frequency. Let's say the fitness of unmutated homozygotes is 100% or 1. In the case of a dominant fitness effect

\bar{w} = 1 (1-p)^2 + (1-s) 2 p (1-p) + (1-s) p^2

Expand and simplify

\bar{w} = 1- 2sp + sp^2

Let's say p^2 \approx 0

\bar{w} \approx 1- 2sp

Substituting in \hat{p} = \mu/s

\bar{w} \approx 1- 2 \mu

The average fitness is one minus twice the mutation rate.

On the other hand, if the fitness effect is recessive,

\bar{w} = 1 (1-p)^2 + 2 p (1-p) + (1-s) p^2

Expand and simplify

\bar{w} = 1- sp^2

Substituting in \hat{p} = \sqrt{\mu/s}

\bar{w} \approx 1- \mu

The average fitness is one minus the mutation rate.

Interestingly, the average fitness in the population does not depend on the fitness of the genotypes within the population; it is only a function of the mutation rate! (However, it does depend on the type of dominance; recessive deleterious effects result in both a higher equilibrium allele frequency and a higher average fitness.) This seems like a paradox at first. In our examples from above, if the mutation is dominant, then with a mutation rate of 0.1% the average fitness within the population is 99.8% regardless of how strong selection is against the mutation. If the mutation is recessive the average fitness is actually higher, 99.9%, again regardless of the strength of selection against the mutation (this is related to selection being more efficient when it acts upon pairs of mutant alleles rather than one at a time in heterozygotes).
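We can verify this numerically by plugging the equilibrium allele frequencies back into the full genotype-frequency expressions for mean fitness (a Python sketch; the values of s are arbitrary):

```python
import math

mu = 0.001  # mutation rate

def wbar_dominant(s):
    # mean fitness at the dominant-case equilibrium p_hat = mu / s
    p = mu / s
    return (1 - p)**2 + (1 - s) * 2 * p * (1 - p) + (1 - s) * p**2

def wbar_recessive(s):
    # mean fitness at the recessive-case equilibrium p_hat = sqrt(mu / s)
    p = math.sqrt(mu / s)
    return (1 - p)**2 + 2 * p * (1 - p) + (1 - s) * p**2

for s in (0.20, 0.05, 0.01):
    print(s, wbar_dominant(s), wbar_recessive(s))
# wbar_dominant stays near 1 - 2*mu = 0.998 and wbar_recessive at
# 1 - mu = 0.999, no matter the strength of selection s.
```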

Why is average fitness only determined by the mutation rate? The trick to understanding this is to realize that the strength of selection against a mutation and the equilibrium frequency of the mutation cancel out in terms of the average effect on the population. A mutation with a strong effect can only exist at a low frequency and affects only a few individuals, while a mutation with a weak effect attains a higher frequency and affects more individuals. At equilibrium, the average effect of these two mutations, despite their very different effects on individual fitness, is the same.

However, going back to the original question of this post, changing the mutation rate can permanently change the average fitness of a population. How do you change the mutation rate? Well, certain chemicals and radiation are well-known examples. (Also, in a recent article Michael Lynch points out that if relaxed selection affects mutations in genes that influence the mutation rate itself, DNA repair, etc., then there could be a feedback effect where selection does influence equilibrium average fitness by altering the mutation rate.)

To put this in perspective, mutations are not rare, unusual events that can safely be ignored; they affect all of us. We all carry new mutations that can be passed on to our children. Whole-genome sequencing estimates this to be around 40-80 new mutations, depending on the age of the father. (Click on the image below for the source information.)

[Image: number of new mutations per child versus father's age]

Each year of the father's age adds, on average, two new mutations. The age of the father is also a risk factor for diseases like autism and schizophrenia, and new mutations might be playing a role.

Above ground nuclear testing released large amounts of radioactive particles into Earth's atmosphere which spread planet-wide. At the height of the Cold War before the 1963 partial test ban the amount of radiation in the atmosphere was almost doubled (click on the image for a link to more information about how it was generated).

[Image: the atmospheric radiocarbon "bomb spike"]

This had very real effects; for example, it created a market for pre-war steel for specialized equipment, like Geiger counters, used to measure low levels of radiation (steel made after WWII was contaminated by the radioactive atmosphere). A particularly valuable source is WWI battleship wrecks, which are protected underwater from contact with the atmosphere.

The increase in radiation alarmed some population geneticists like Hermann Muller who studied mutations and their effects on fitness. He helped raise awareness of the issue and his work among others contributed to the partial test ban treaty.

Furthermore, we come into contact with a wide range of industrial chemicals, many of which are seriously toxic and some of which can cause heritable mutations.

It is not known how this exposure might translate into increased mutation rates (or to what degree it might contribute to rates of genetic disease). It would be interesting to estimate genome-wide mutation rates over the last few centuries, in humans and other species if possible, by comparing relatives who share DNA sequences connected by different numbers of generations spanning certain points in history, to see if there is a measurable effect.

Genetic drift with a beta approximation

There are several ways to model or simulate genetic drift (random changes in allele frequencies) in a population over time.  Some of the simplest and most direct are 1) binomial sampling and 2) forward simulations, where every allele copy is tracked in every generation.  However, these more complete approaches can be very slow (computationally intensive) with large populations over many generations.  Another approach is 3) a diffusion approximation that generates a probability distribution of allele frequencies at a given time point.  Yet another widely used approach is 4) the coalescent.  Coalescent simulations are very efficient because they only track the ancestral lineages of a sample of alleles from the population (most allele transmission events can be ignored because they do not contribute to the sample in the last generation).

I have worked with all of these approaches in one form or another and they all have pluses and minuses that depend on the question being asked.  The diffusion approximation has generated many insights into population genetics; however, in generating outcomes (say for simulation) it can be inefficient.  The probability distribution is generated by a sum of a series of Gegenbauer polynomials, and these are slow to converge.  I was thinking about this recently and wondered if the diffusion approximation could itself be approximated by a beta distribution.  This might be more computationally efficient.

Beta distributions have two parameters, \alpha and \beta.  The mean (expectation) of a beta distribution is E\left[x\right]=\frac{\alpha}{\alpha+\beta}.  In genetic drift, if we imagine a large number of populations drifting away from the same starting allele frequency, the average allele frequency across all of these populations is not expected to change.  Genetic drift does not push allele frequency change in a direction.  Alleles are just as likely to increase or decrease in frequency.  So, if we have p_0 as the initial allele frequency then we expect p_0=E\left[x\right]=\frac{\alpha}{\alpha+\beta} and that this remains constant over time.  Incidentally, you might also see that 1-p_0=\frac{\beta}{\alpha+\beta} because these are both fractions out of the total.

On the other hand, the variance in allele frequencies between replicate populations is expected to increase over time as the populations diverge due to genetic drift.  The variance of a beta distribution is var\left[x\right]=\frac{\alpha \beta}{\left(\alpha+\beta\right)^2\left(\alpha + \beta + 1\right)}.  Right away we can rewrite this as

var\left[x\right]=\frac{\alpha}{\alpha+\beta} \frac{\beta}{\alpha+\beta}\frac{1}{\alpha + \beta + 1}

var\left[x\right]=p \left(1-p\right) \frac{1}{\alpha + \beta + 1}

Under the diffusion approximation the variance in allele frequencies after t generations in a population of size N diploid individuals is

var\left[x\right]=p \left(1-p\right) \left(1-e^{-\frac{t}{2N}}\right)

We want to match the variance of a beta distribution to the variance of the diffusion approximation so we can generate beta distributions in terms of the spread of allele frequencies after t/N=\tau generations have passed.

p \left(1-p\right) \left(1-e^{-\frac{\tau}{2}}\right)=p \left(1-p\right)\frac{1}{\alpha + \beta + 1}

We can cancel out  p \left(1-p\right) to get

 1-e^{-\frac{\tau}{2}}= \frac{1}{\alpha + \beta + 1}

Then rearrange

 \frac{1}{1-e^{-\frac{\tau}{2}}} -1= \alpha + \beta

Realizing again that \alpha and \beta are fractions out of a total we can define them as

\alpha =p \left(\frac{1}{1-e^{-\frac{\tau}{2}}} -1\right)

\beta =\left(1-p\right) \left(\frac{1}{1-e^{-\frac{\tau}{2}}} -1 \right)

And now we have a beta distribution that is defined in population genetic terms as an approximation to a diffusion approximation.  As a side note, sometimes very "bad" distributions can be used as approximations in population genetics and be surprisingly reasonable.  For example, Kimura used a uniform distribution to model binomial changes in allele frequencies between populations and it works well.  Sometimes a process can be relatively insensitive to underlying distributions.  In another example, Crow worked on a problem of truncating selection and the shape of the area (of the state space) exposed to selection had little effect. (I'll add in some references here later.)
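As a quick sanity check of the algebra (a Python sketch; the plots below use R), we can compute \alpha and \beta for one choice of p and \tau and confirm that the beta mean and variance hit their targets:

```python
import math

def alpha_beta(p, tau):
    # alpha + beta is set by matching the beta variance to the diffusion
    # variance p(1-p)(1 - exp(-tau/2)), with tau = t/N generations.
    total = 1.0 / (1.0 - math.exp(-tau / 2.0)) - 1.0
    return p * total, (1 - p) * total

p, tau = 0.5, 0.2
a, b = alpha_beta(p, tau)
mean = a / (a + b)
var = a * b / ((a + b)**2 * (a + b + 1))
print(mean)                                          # equals p
print(var, p * (1 - p) * (1 - math.exp(-tau / 2)))   # these agree
```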

Does this beta approximation work?

Here is a classical result plotted by Kimura (1955, PNAS 41, 144).  It shows the distribution of allele frequencies at a range of N generations after starting from p_0=0.5.

[Image: Kimura's (1955) distributions of allele frequencies, p_0 = 0.5]

And here is the corresponding beta approximation.

[Image: beta approximation of the same distributions, p_0 = 0.5]

Right away you can see that it is a very good match, especially for short times and intermediate allele frequencies.  The big difference is at the edges: the diffusion approximation allows for complete loss or fixation of an allele in a finite population; the beta approximation does not.  An allele frequency can get arbitrarily close to zero or one, but variation is never completely lost.  This is why the distribution "stacks up" near the edges.

By the way, I am still working on learning R and used an R script to generate these plots.  Here is the code if you are interested.

driftcoefficient<-function(tau){
  z<-1/(1-exp(-tau/2))-1 # alpha + beta from matching the variance
  return(z)
}

p<-1/2 # initial allele frequency
h<-3.8 # upper plot limit

x <- seq(0, 1, length = 1001)

taus<-c(0.1, 0.2, 0.5, 1, 2, 3) # t/N generations
labs<-c('t=N/10','t=N/5','t=N/2','t=N','t=2N','t=3N')
heights<-c(3.62, 2.53, 1.55, 1, 0.55, 0.35) # label positions

plot(NULL, xlim=c(0,1), ylim=c(0,h), xlab='allele frequency', ylab='density')
for(i in seq_along(taus)){
  y<-dbeta(x, p*driftcoefficient(taus[i]), (1-p)*driftcoefficient(taus[i]))
  lines(x, y)
  text(0.5, heights[i], labs[i])
}

Here is another example starting from an initial allele frequency of p_0=0.1.

[Image: Kimura's (1955) distributions of allele frequencies, p_0 = 0.1]

And the beta approximation.

[Image: beta approximation of the same distributions, p_0 = 0.1]

Again, you can see that it does well in the intermediate frequencies and diverges near the boundaries of the distribution.

Okay, let's use this beta approximation to play with a model of two populations splitting and evolving over time.  Time is measured in units of N generations after the splitting event.

[Image: diagram of two populations splitting from one ancestral population]

Here is an example of what we might expect to see over time.  The plot below is just 0.01N generations after the splitting event.  (In human terms we have an effective population size of 10,000 and a generation time of 30 years, so this is about 3,000 years.)

[Image: SNP frequencies in population 1 versus population 2, 0.01N generations after the split]

This is from 10,000 randomly generated SNPs (assuming beta-distributed standing variation in the original population).  Most SNPs are near a frequency of zero or one, which is what is typically seen in natural populations.  However, some are at intermediate frequencies.  The SNPs are highly correlated in frequency between the two populations but have begun to diverge.

Here is a plot at 0.1N generations (30,000 years on a human timescale).

[Image: SNP frequencies, 0.1N generations]

Then 0.2N generations.

[Image: SNP frequencies, 0.2N generations]

Then 0.5N generations.

[Image: SNP frequencies, 0.5N generations]

And 1N generations after the division.

[Image: SNP frequencies, 1N generations]

A common theme in population genetics is that after N generations of genetic drift the initial starting point is essentially lost and an allele that is variable (not fixed or lost) can be at essentially any frequency (a uniform distribution).  You can see this in the distribution plots earlier in this post and can get a feeling for this in the interior of the plot above.

Below is the plot at 2N generations (600,000 years in human time).  Many of the alleles have reached fixation or loss in at least one of the populations.  We are also starting to see alleles that are reciprocally fixed and lost between the two populations (in the upper left and lower right corners).

[Image: SNP frequencies, 2N generations]

Okay, logically following this we get 4N generations.

[Image: SNP frequencies, 4N generations]

Then 8N, where no allele is polymorphic in both populations simultaneously.

[Image: SNP frequencies, 8N generations]

Then 16N, where the starting polymorphism is almost completely lost.

[Image: SNP frequencies, 16N generations]

And finally 32N generations results in complete fixation or loss for all 10,000 simulated SNPs.

[Image: SNP frequencies, 32N generations]

However, this simulation ignores new mutations that occur over time (which in general will be unique to a population) and gene flow by migration between populations (which will dramatically slow down the divergence between the populations even with small numbers of migrants each generation).  All of these can be added in to elaborate the model.  However, I am pleased that this beta approximation tends to fit general population genetic expectations well.  The problem is with alleles that have almost lost all polymorphism (near the edges of the distribution); however, there is a strong ascertainment bias in population genetic datasets.  For various reasons, we tend to ignore alleles with little variation in favor of alleles that are highly variable across populations, and this is precisely where the beta approximation tends to behave well.  Furthermore, in populations that are connected by gene flow, if an allele is polymorphic in one population it is never really lost (over the long term) in another population.  So perhaps the beta approximation could work well in more realistic scenarios.

Here is the code for the SNP plots in the second half of this post.

driftcoefficient<-function(tau){
  z<-1/(1-exp(-tau/2))-1
  return(z)
}

tau1<-0.01 # t/N generations of drift in population 1
tau2<-0.01 # t/N generations of drift in population 2
snps<-10000 # number of simulated SNPs
thetau<-0.05 # beta parameters for standing variation
thetav<-0.05
initial<-rbeta(snps, thetau, thetav)
y<-rbeta(snps, initial*driftcoefficient(tau1), (1-initial)*driftcoefficient(tau1))
z<-rbeta(snps, initial*driftcoefficient(tau2), (1-initial)*driftcoefficient(tau2))
plot(y,z,xlab='population 1',ylab='population 2')

Reversible Mutations

In the previous post about mutation predictions I only considered mutations in one direction, for example from functional to non-functional gene sequences (mutations from non-functional to non-functional were ignored because the outcome is the same).  However, some mutations can be thought of as reversible.  We could think of a nucleotide position mutating from an "A" to any other base ("C," "G," or "T") and then back again to an "A."  Or, we could think about changes in a codon that alternate between two different amino acids.  For example, "CAT" and "CAC" both code for histidine, while "CAA" and "CAG" both code for glutamine in the corresponding polypeptide (protein).  The only change is at the third position, so a "T" or "C" gives one amino acid and an "A" or "G" gives the other.  As the sequence mutates and evolves among these four bases at this position, we can think of it as reversible between two amino acid states.

[Image: codon mutations alternating between two amino acid states]

Another type of mutation is even closer to the sense of reversible.  Transposable elements are small stretches of DNA that can insert into a gene sequence, sometimes inactivating the gene, and then later excise out of the sequence, which might restore the original function.  In fact, transposable elements (or "TE's") are quite common in the genome of many organisms.  In the image below is an example from corn (in which TE's were first discovered).  Starting with an individual that has a TE inserted into a gene that produces the purple pigment (so no pigment is produced and the kernels are white) the TE can excise in certain cells (giving purple spots) and may even excise in the germ-line cells so that the next generation has completely restored pigment production.

[Image: transposable element insertion and excision affecting corn kernel pigmentation]

We can write down the expected frequency in the next generation with reversible mutations as a recursion.  To be an allele at frequency p in generation g, you either were already a p-type allele in the previous generation g-1 and did not mutate away, with mutation rate \mu (red), or you were the alternative allele, at frequency 1-p, and mutated at rate \nu (blue).  (Incidentally, the "or" in that sentence implies that we add the two outcomes together, rather than multiply, because they are mutually exclusive; the allele either mutated or it did not.  The "and"s imply multiplying: the mutation rate and the allele frequency are independent events that have to co-occur to produce the effect we are focusing on.)

p_g=\color{red}{p_{g-1}(1-\mu)} + \color{blue}{(1-p_{g-1})\nu}.

We could also write this from the alternative alleles point of view q=1-p.

q_g=\color{red}{q_{g-1}(1-\nu)} + \color{blue}{(1-q_{g-1})\mu}.

Either way works the same in the end, but for the rest of this we are using the p version.

At equilibrium p_g=p_{g-1} so setting the allele frequencies between generations equal to each other gives:

p=p(1-\mu)+(1-p)\nu.

Subtract p from both sides to get the terms together.

p-p=p(1-\mu)+(1-p)\nu-p

0=p(1-\mu)+(1-p)\nu-p

Multiply everything out.

0=p-p\mu+\nu-p\nu-p

Cancel out p-p on the right.

0=-p\mu+\nu-p\nu

Rearrange.

0=-p\mu-p\nu+\nu

Factor.

0=-p(\mu+\nu)+\nu

Subtract \nu from both sides.

-\nu=-p(\mu+\nu)

Multiply both sides by negative one.

\nu=p(\mu+\nu)

Solve for p.

p=\frac{\nu}{\mu+\nu}

This gives us the equilibrium allele frequency as determined by the forward and backward mutation rates.  Equilibrium values are often designated with a "hat" symbol like this:

\hat{p}=\frac{\nu}{\mu+\nu}

From looking at this equation you can see that the equilibrium frequency of an allele is given by the mutation rate toward that allele as a fraction of the total mutation rate, \frac{\nu}{\mu+\nu}.  So, for example, if \nu is one half of the total mutation rate (in other words, if \nu=\mu) then \hat{p}=\frac{1}{1+1}=\frac{1}{2}. This makes intuitive sense: if the alleles mutate into each other at the same rate, then over enough time the population will settle at a 50/50 ratio of the two alleles.
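This equilibrium is easy to confirm by brute force: iterate the recursion from any starting point and it settles on \nu/(\mu+\nu) (a Python sketch with arbitrary rates):

```python
# Iterate p_g = p_{g-1} * (1 - mu) + (1 - p_{g-1}) * nu from p = 1.
mu, nu = 2e-5, 1e-4
p = 1.0
for _ in range(500000):
    p = p * (1 - mu) + (1 - p) * nu
print(p, nu / (mu + nu))   # both ~0.8333
```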

Going back to the original equation and multiplying everything out to rearrange the right hand side:

p_g=p_{g-1}(1-\mu)+(1-p_{g-1})\nu

p_g=p_{g-1}-p_{g-1}\mu+\nu-p_{g-1}\nu

p_g=p_{g-1}-p_{g-1}\mu-p_{g-1}\nu+\nu

p_g=p_{g-1}(1-\mu-\nu)+\nu

Is this helpful?

p_g=p_{g-1}(1-\mu-\nu)+\nu

p_{g-1}=p_{g-2}(1-\mu-\nu)+\nu

p_{g-2}=p_{g-3}(1-\mu-\nu)+\nu

Substituting back in gives:

p_g=p_{g-1}(1-\mu-\nu)+\nu

p_g=(p_{g-2}(1-\mu-\nu)+\nu)(1-\mu-\nu)+\nu

p_g=((p_{g-3}(1-\mu-\nu)+\nu)(1-\mu-\nu)+\nu)(1-\mu-\nu)+\nu

which is

p_g=p_{g-3}(1-\mu-\nu)^3+\nu(1-\mu-\nu)^2+\nu(1-\mu-\nu)+\nu

which can be written in a summation series as

p_g=p_0(1-\mu-\nu)^g + \displaystyle{\sum_{n=0}^{g-1}} \nu(1-\mu-\nu)^n

or (pulling \nu out of the sum)

p_g=p_0(1-\mu-\nu)^g + \nu \displaystyle{\sum_{n=0}^{g-1}} (1-\mu-\nu)^n

I'm not sure this is very helpful; the sum on the right is hard to work with.  It would probably be easier to plot the change over time by simply iterating the original recursion.  However, one interesting thing about the equation above is that the term p_0(1-\mu-\nu)^g goes to zero as g becomes large, because a large number of values less than one are being multiplied together.  This makes sense: as g becomes large the equilibrium is approached and the initial condition p_0 matters less and less.  In fact, using this logic, and letting g go to infinity, \infty, we can write down:

\hat{p}=\frac{\nu}{\mu+\nu}=\nu \displaystyle{\sum_{n=0}^{\infty}} (1-\mu-\nu)^n

dividing by \nu gives:

\frac{1}{\mu+\nu}= \displaystyle{\sum_{n=0}^{\infty}} (1-\mu-\nu)^n

which makes sense in that the equilibrium allele frequency is only a function of the mutation rates (again, the starting point should disappear because after an infinite number of steps toward a single equilibrium the result is the same no matter where you started).

Is it known in general that this infinite series reduction pattern is true?  If we set b=\mu+\nu then

\frac{1}{\mu+\nu}= \displaystyle{\sum_{n=0}^{\infty}} (1-\mu-\nu)^n

becomes:

\frac{1}{b}= \displaystyle{\sum_{n=0}^{\infty}} (1-b)^n.

Then setting a=1-b gives:

\frac{1}{1-a}= \displaystyle{\sum_{n=0}^{\infty}} a^n

This is a classic result: the infinite sum of a geometric series.  (One way to remember the name "geometric": raising a to the power n is like adding dimensions in geometry.  a^0 is a point; a^1 is a line of length a; a^2 is a square with sides of length a; a^3 is a cube with edges of length a; and so on.)

Also, looking at p_g=p_0(1-\mu-\nu)^g + \nu \displaystyle{\sum_{n=0}^{g-1}} (1-\mu-\nu)^n again.  If we have a total mutation rate M=\mu+\nu then the first part of the equation, p_0(1-\mu-\nu)^g = p_0(1-M)^g, is basically the same as the simpler model of irreversible mutation, p_g=p_0(1-\mu)^g.  So we can interpret the first part of this equation, p_0(1-\mu-\nu)^g, as the fraction of alleles that have not mutated yet.  Of course this will disappear as the equilibrium is approached.

Backing up to

p_g=p_0(1-\mu-\nu)^g + \nu \displaystyle{\sum_{n=0}^{g-1}} (1-\mu-\nu)^n

and looking at the sum on the right: now that we recognize this as a geometric series, we can also reduce the finite series.

A finite geometric series can be reduced by

\frac{1-a^g}{1-a}= \displaystyle{\sum_{n=0}^{g-1}} a^n

As above, substituting a=1-b=1-(\mu+\nu)=1-\mu-\nu and realizing 1-a=b=\mu+\nu

\nu \displaystyle{\sum_{n=0}^{g-1}} (1-\mu-\nu)^n = \nu \frac{1 - (1-\mu-\nu)^g}{\mu+\nu} = \frac{\nu - \nu(1-\mu-\nu)^g}{\mu+\nu}.

Again, as g goes to infinity (1-\mu-\nu)^g goes to zero and \frac{\nu - \nu(1-\mu-\nu)^g}{\mu+\nu} becomes \frac{\nu}{\mu+\nu}.

So, to be able to directly calculate the allele frequency at any point along the way as reversible mutations drive the system from a starting point toward equilibrium the equation is:

p_g=p_0(1-\mu-\nu)^g +\frac{\nu- \nu (1-\mu-\nu)^g}{\mu+\nu},

which can be simplified to:

p_g=\frac{\nu+(p_0(\mu+\nu)-\nu)(1-\mu-\nu)^g}{\mu+\nu}.

This can be divided into an equilibrium component \frac{\nu}{\mu+\nu} and a component quantifying the deviation from equilibrium due to the starting point p_0 and the time since starting g:

p_g=\frac{\nu}{\mu+\nu}+\frac{(p_0(\mu+\nu)-\nu)(1-\mu-\nu)^g}{\mu+\nu}

or

p_g=\hat{p}+\frac{(p_0(\mu+\nu)-\nu)(1-\mu-\nu)^g}{\mu+\nu}
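The closed form can be checked against direct iteration of the original recursion (a Python sketch; the starting point and rates are arbitrary):

```python
mu, nu = 2e-5, 1e-4

def p_closed(p0, g):
    # closed-form allele frequency after g generations
    return (nu + (p0 * (mu + nu) - nu) * (1 - mu - nu)**g) / (mu + nu)

p = p0 = 0.25
for _ in range(1000):
    p = p * (1 - mu) + (1 - p) * nu   # iterate the recursion
print(p, p_closed(p0, 1000))          # these agree
```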

Here is an example plot showing the predicted decline in allele frequency from 100% (blue) and the rise from 0% (red) compared to the equilibrium (yellow) where one mutation rate is a fifth of the other: \nu = 10^{-4} and \mu= 2 \times 10^{-5}.  (Generations is on the x-axis and allele frequency is on the y-axis.)

[Image: allele frequency trajectories from p_0 = 1 (blue) and p_0 = 0 (red) approaching the equilibrium (yellow)]

So, by 50,000 generations, which again is not that long compared to geologic timescales, the allele frequency is predicted to have essentially arrived at equilibrium.  In this case we expect it to be \frac{\nu}{\nu + \mu} = \frac{10^{-4}}{10^{-4} + 2 \times 10^{-5}} = \frac{10}{12} = \frac{5}{6} = 0.8\bar{3}.

Exactly how long does it take to converge?  The allele frequency trajectories approach equilibrium asymptotically, so they will never be exactly equal to it.  We can, however, ask when the difference drops below a certain threshold.  If we start one trajectory at p_0=1 and the other at p_0=0, this is the largest difference, d, we can begin with.  The difference between the two trajectories is (with one and zero substituted for p_0 in red):

d=(\frac{\nu}{\mu+\nu}+\frac{(\color{red}{1}(\mu+\nu)-\nu)(1-\mu-\nu)^g}{\mu+\nu}) - (\frac{\nu}{\mu+\nu}+\frac{(\color{red}{0}(\mu+\nu)-\nu)(1-\mu-\nu)^g}{\mu+\nu})

We can immediately get rid of the difference in the equilibrium components \frac{\nu}{\mu+\nu}-\frac{\nu}{\mu+\nu}, which is zero.  Also multiplying by one and zero simplifies the equation a little more.  Long story short, a lot cancels out and we end up with something familiar:

d=(1-\mu-\nu)^g,

which makes a lot of sense.  We realized above that this was the fraction that had not yet mutated, which is the reason the curve deviates from equilibrium (i.e. starting points are different).

Solving for g to get the number of generations gives us:

g=\frac{\log(d)}{\log(1-\mu-\nu)}.

Plugging in the mutation rates from the example above we find that the difference in highest and lowest trajectories drops to less than 1% after 38,375 generations.
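That number comes straight from the formula (a quick Python check):

```python
import math

# Generations until the gap between the p0 = 1 and p0 = 0 trajectories
# drops below d, from d = (1 - mu - nu)^g.
mu, nu = 2e-5, 1e-4
d = 0.01
g = math.log(d) / math.log(1 - mu - nu)
print(g)   # ~38374.1, so the gap is below 1% from generation 38,375 on
```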

The probability that a transposable element inserts into a specific gene sequence is very small.  Once it is inserted (sticking with simplistic assumptions here), the probability that it will excise is likely much higher than the probability of the original insertion.  Say we had data from huge fields of purple-kernel corn maintained for many generations.  In, let's say, 3% of the plants we find a mutation due to a transposable element that prevents the pigment from being produced.  Assuming the population is at or very near equilibrium, what can we say about the relative insertion and excision rates?

\hat{p}=\frac{\nu}{\mu+\nu}

can be rearranged to

\frac{1}{\hat{p}} - 1 = \frac{\mu}{\nu}.

Setting \hat{p} = 0.03 gives

\frac{\mu}{\nu} \approx 32

So in this example, the excision rate is approximately 32 times higher than the insertion rate.  (This assumes we define \mu as the excision rate and \nu as the insertion rate.  We can alternatively set \hat{p} = 1 - 0.03 = 0.97 and swap the symbols for the insertion and excision rates, but the result is the same.)
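The arithmetic, for what it's worth (Python):

```python
# From p_hat = nu / (mu + nu), the ratio of rates is mu / nu = 1 / p_hat - 1.
p_hat = 0.03   # observed equilibrium frequency of the insertion
ratio = 1 / p_hat - 1
print(ratio)   # ~32.3: excision roughly 32x faster than insertion
```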

Backing up to a more basic level, why is there an equilibrium at all?  We might intuitively think that if one mutation rate is higher than the other, we would eventually end up with all of one type (allele) in the population.  (Actually this can happen once we start talking about drift in finite populations, but we are still ignoring that at the moment and pretending populations are so large that they are essentially infinite, so even tiny fractions are present.)  The trick is that as an allele becomes rarer it presents a smaller target for mutation in the population; as it becomes more common, with more copies, there are more opportunities for mutations that change it into the other form.  This frequency effect "buffers" the alleles toward intermediate frequencies, so an allele is never quite lost or fixed and we end up with an equilibrium.