Fluctuating Population Size and Genetic Drift

From Genetics Wiki

In general, the variance of a binomial process is [math]\sigma^2 = n p (1-p)[/math], where n is the number of trials and p is the probability of one of the two outcomes.

The variance in the number of copies of an allele sampled each generation is [math]2N p (1-p)[/math], where N is the diploid population size (so 2N gene copies are drawn) and p is the allele frequency.

We like to work with allele frequencies rather than allele counts in a population. Dividing the standard deviation by 2N, the total number of gene copies, expresses it as a fraction between zero and one.

[math]\frac{\sigma}{2N} = \frac{\sqrt{2N p (1-p)}}{2N}[/math]

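Squaring this gives the variance in allele frequency, which is what [math]\sigma^2[/math] refers to from here on.

[math]\sigma^2 = \left(\frac{\sqrt{2N p (1-p)}}{2N}\right)^2 = \frac{2N p (1-p)}{4N^2} = \frac{p (1-p)}{2N}[/math]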
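The result p(1-p)/2N can be checked numerically. The following is a minimal sketch, not part of the original derivation: it draws 2N gene copies many times and compares the observed variance in allele frequency with the formula. The function names, the seed, and the values N = 50 and p = 0.3 are illustrative assumptions.

```python
import random


def simulate_frequency_variance(n_diploid, p, replicates=20000, seed=1):
    """Estimate the variance in allele frequency after one generation of sampling.

    n_diploid, p, replicates and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    copies = 2 * n_diploid  # 2N gene copies are sampled each generation
    freqs = []
    for _ in range(replicates):
        # Each sampled gene copy is the focal allele with probability p.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        freqs.append(count / copies)  # scale the count to a frequency
    mean_freq = sum(freqs) / replicates
    # Unbiased sample variance of the simulated frequencies.
    return sum((f - mean_freq) ** 2 for f in freqs) / (replicates - 1)


def expected_frequency_variance(n_diploid, p):
    """Variance in allele frequency from the formula p(1-p)/2N."""
    return p * (1 - p) / (2 * n_diploid)


if __name__ == "__main__":
    n, p = 50, 0.3
    print("simulated:", simulate_frequency_variance(n, p))
    print("expected: ", expected_frequency_variance(n, p))
```

The simulated value should fall close to the expected one, with the remaining gap shrinking as the number of replicates grows.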

Imagine the population fluctuating between two sizes, N1 and N2, so that it spends half of its time at N1 and half at N2. The effective population size, Ne, is the size of a constant population that has the same average variance in allele frequency change each generation as the fluctuating population.

[math]\frac{p(1-p)}{2N_e}=\frac{1}{2}\times\frac{p(1-p)}{2N_1}+\frac{1}{2}\times\frac{p(1-p)}{2N_2}[/math]

We can cancel the common factor p(1-p)/2 from every term to get

[math]\frac{1}{N_e}=\frac{1}{2}\times\frac{1}{N_1}+\frac{1}{2}\times\frac{1}{N_2}[/math]

and solve for Ne.

[math]N_e=\frac{1}{\frac{1}{2}\times\frac{1}{N_1}+\frac{1}{2}\times\frac{1}{N_2}}[/math]

This is a harmonic mean, the inverse of the average of the inverses of the individual population sizes.
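For the two-size case, the expression can also be written more compactly by putting the two fractions over a common denominator. This is only a rearrangement of the formula above, not a new result:

[math]N_e=\frac{1}{\frac{N_1+N_2}{2N_1N_2}}=\frac{2N_1N_2}{N_1+N_2}[/math]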

Of the three standard types of means (arithmetic, geometric, and harmonic), the harmonic mean gives the lowest value whenever the numbers being averaged differ from each other. What this means here is that small population sizes count more and have a larger effect on the average long-term dynamics of genetic drift.

For example, say that N1 = 900 and N2 = 1100. The arithmetic mean is 1000. However, the effective population size is the harmonic mean, which is [math]N_e=\frac{1}{\frac{1}{2}\times\frac{1}{900}+\frac{1}{2}\times\frac{1}{1100}}=990[/math]. The difference from 1000 is not large because 900 and 1100 are not very different from each other. The reduction due to the harmonic mean is larger when the individual values deviate from each other to a greater extent. For example, if N1 = 100 and N2 = 1900 the arithmetic mean is still 1000 but [math]N_e=\frac{1}{\frac{1}{2}\times\frac{1}{100}+\frac{1}{2}\times\frac{1}{1900}}=190[/math].
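The two worked examples can be reproduced with Python's standard `statistics` module, which provides all three means. This is a small sketch for checking the arithmetic; the helper name `compare_means` is an assumption made for illustration.

```python
from statistics import geometric_mean, harmonic_mean, mean


def compare_means(sizes):
    """Return the (arithmetic, geometric, harmonic) means of the population sizes."""
    return mean(sizes), geometric_mean(sizes), harmonic_mean(sizes)


if __name__ == "__main__":
    for sizes in ([900, 1100], [100, 1900]):
        arithmetic, geometric, harmonic = compare_means(sizes)
        # The harmonic mean is the effective population size for equal time at each size.
        print(sizes, arithmetic, round(geometric, 1), round(harmonic, 1))
```

For both pairs the ordering is arithmetic, then geometric, then harmonic, from largest to smallest, and the gap widens as the two sizes move apart.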

This simple example illustrated two population sizes with half of the time spent at each. More generally

[math]N_e=\frac{1}{\sum\limits_{i}{\frac{t_i}{N_i}}}[/math],

where ti is the fraction of the total time spent at each corresponding size Ni.
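The general formula translates directly into a few lines of code. The sketch below assumes the time fractions sum to one, as in the definition above; the function name `effective_size` and the bottleneck numbers in the usage example are hypothetical and chosen only for illustration.

```python
def effective_size(sizes, time_fractions):
    """Effective population size as a time-weighted harmonic mean.

    sizes: population sizes N_i
    time_fractions: fraction of total time t_i spent at each size (must sum to 1)
    """
    if len(sizes) != len(time_fractions):
        raise ValueError("sizes and time_fractions must have the same length")
    if abs(sum(time_fractions) - 1.0) > 1e-9:
        raise ValueError("time_fractions must sum to 1")
    # N_e = 1 / sum_i (t_i / N_i)
    return 1.0 / sum(t / n for t, n in zip(time_fractions, sizes))


if __name__ == "__main__":
    # The two-size examples from the text, half of the time at each size.
    print(effective_size([900, 1100], [0.5, 0.5]))
    print(effective_size([100, 1900], [0.5, 0.5]))
    # Hypothetical bottleneck: 90% of the time at 1000, 10% of the time at 10.
    print(effective_size([1000, 10], [0.9, 0.1]))
```

In the hypothetical bottleneck case the result is roughly 92, far below the arithmetic mean of 901, which shows how strongly a short period at small size pulls Ne down.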

to be continued ... (mention upper limit in time to effects, human example).

publications

Notes

Why does something like (working with the original variance)

[math]2N_e p (1-p) = (1/2) 2N_1 p (1-p) + (1/2) 2N_2 p (1-p)[/math]

[math]N_e = (1/2) N_1 + (1/2) N_2 [/math],

which results in an arithmetic mean, not work?

If N1 is very small and N2 is very large then N2 will dominate the average. However, genetic drift is greatest when the population is smallest and this should influence the average more. What we are really interested in is the change in allele frequency per generation, not the change in the number of copies of the allele (these are different things when the total population size changes between generations). So the standard deviation is scaled in terms of an allele frequency, to make different generations comparable on the same scale, by dividing it by the total number of gene copies, 2N, and since the variance is the square of this it is divided by [math]4N^2[/math].
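The point can be illustrated numerically. This sketch averages the per-generation variance in allele frequency over the two sizes and then asks which constant population size would give that same variance. The function names and the choice p = 0.5 are assumptions for illustration; the sizes 100 and 1900 are taken from the earlier example.

```python
def frequency_variance(n_diploid, p):
    """Per-generation variance in allele frequency, p(1-p)/2N."""
    return p * (1 - p) / (2 * n_diploid)


def matching_constant_size(n1, n2, p=0.5):
    """Constant size whose frequency variance equals the average over n1 and n2.

    Assumes half of the time is spent at each size.
    """
    average_variance = 0.5 * frequency_variance(n1, p) + 0.5 * frequency_variance(n2, p)
    # Invert variance = p(1-p)/2N to recover N.
    return p * (1 - p) / (2 * average_variance)


if __name__ == "__main__":
    n1, n2 = 100, 1900
    print("size matching the average frequency variance:", matching_constant_size(n1, n2))
    print("arithmetic mean of the sizes:", (n1 + n2) / 2)
    # A constant population at the arithmetic mean drifts much less than the average.
    print("variance at the arithmetic mean:", frequency_variance((n1 + n2) / 2, 0.5))
    print("average variance over both sizes:",
          0.5 * frequency_variance(n1, 0.5) + 0.5 * frequency_variance(n2, 0.5))
```

The matching size comes out at about 190, the harmonic mean, and a constant population of 1000 has a noticeably smaller frequency variance than the fluctuating one experiences on average.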