Difference between revisions of "Coalescence"

From Genetics Wiki

Revision as of 02:23, 18 September 2018

The coalescence of two lineages

Two lineages coalesce (pick the same gene copy in the previous generation) with probability 1/(2N) per generation, because a diploid population of N individuals has 2N gene copies to choose from.

Because the rate per generation is 1/(2N), the average waiting time until this occurs is 2N generations.

On average, then, two lineages are expected to coalesce to a common ancestor 2N generations in the past.
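
The 2N expectation can be checked with a small simulation. The following is a minimal sketch, not part of the original derivation: the function names, the parameter `two_n` (the number of gene copies, 2N), and the fixed random seed are illustrative assumptions.

```python
import random


def generations_until_coalescence(two_n, rng):
    """Count generations back until two lineages pick the same gene copy."""
    generations = 0
    while True:
        generations += 1
        # Each lineage picks one of the 2N parental copies at random; the
        # picks match (the lineages coalesce) with probability 1/(2N).
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generations


def mean_coalescence_time(two_n, replicates, seed=1):
    """Average the waiting time over many independent replicates."""
    rng = random.Random(seed)  # assumed seed, only for reproducibility
    total = sum(generations_until_coalescence(two_n, rng)
                for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    # With 2N = 20 gene copies the average should be close to 20 generations.
    print(mean_coalescence_time(two_n=20, replicates=20000))
```

Individual replicates vary widely around the mean, which is worth keeping in mind when reading the expectations on this page.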

The coalescence of more than two lineages

The coalescence of an infinite number of lineages

Of course, an infinite number of lineages never coalesces; populations are finite in size. Still, it is useful to understand the upper limit that the expected coalescence time approaches with very large samples or an entire population. Keep in mind that this is still only an expectation, and there is a large variance associated with it.

To find this limit, we sum an infinite series whose terms are the expected waiting times between successive coalescence events as the number of sampled lineages increases.

With i lineages in the current step, any of the i(i-1)/2 possible pairs of lineages can coalesce; these pair counts are the triangular numbers (1, 3, 6, 10, 15, 21, ...). Each pair coalesces with probability 1/(2N) per generation, so the rate of coalescence is [math]\frac{\frac{i(i-1)}{2}}{2N}[/math]. The time that each step adds to the sum of times is the inverse of this rate, or [math]\frac{2N}{\frac{i(i-1)}{2}}[/math].
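
To make the rate and its inverse concrete, here is a minimal sketch that tabulates them with exact fractions. The function names and the example value N = 100 are assumptions chosen for illustration.

```python
from fractions import Fraction


def pairs(i):
    """Number of distinct pairs among i lineages: i(i-1)/2."""
    return i * (i - 1) // 2


def coalescence_rate(i, n_diploid):
    """Per-generation rate of coalescence with i lineages: pairs / (2N)."""
    return Fraction(pairs(i), 2 * n_diploid)


def expected_wait(i, n_diploid):
    """Expected generations until i lineages become i-1: inverse of the rate."""
    return 1 / coalescence_rate(i, n_diploid)


if __name__ == "__main__":
    n_diploid = 100  # assumed example population size
    for i in range(2, 8):
        print(i, pairs(i), coalescence_rate(i, n_diploid),
              expected_wait(i, n_diploid))
```

The waiting time shrinks quickly as i grows, so steps with many lineages contribute little to the total.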

Sum of the infinite series

[math]\sum_{i=2}^\infty\frac{2N}{\frac{i(i-1)}{2}}=\sum_{i=2}^\infty\frac{4N}{i(i-1)}=4N\sum_{i=2}^\infty\frac{1}{i(i-1)}[/math]

Note that the next line shifts the starting index down by one, from i=2 to i=1, so each i in the summand is replaced by i+1.

[math]4N\sum_{i=2}^\infty\frac{1}{i(i-1)}=4N\sum_{i=1}^\infty\frac{1}{i(i+1)}=4N\sum_{i=1}^\infty\left(\frac{1}{i}-\frac{1}{i+1}\right)[/math]

Why is

[math]\frac{1}{i(i+1)}=\frac{1}{i}-\frac{1}{i+1}[/math]?

Multiply each of the two fractions in the difference by a form of one, (i+1)/(i+1) and i/i respectively, to give them a common denominator, then combine.

[math]\frac{1}{i}-\frac{1}{i+1}=\frac{i+1}{i+1}\frac{1}{i}-\frac{i}{i}\frac{1}{i+1}=\frac{i+1-i}{i(i+1)}=\frac{1}{i(i+1)}[/math]

Plug in the first few numbers of the sum to see the pattern.

[math]\sum_{i=1}^\infty\left(\frac{1}{i}-\frac{1}{i+1}\right) = \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \frac{1}{4} - \frac{1}{5} + \cdots[/math]

After the first term, the fractions cancel in pairs: -1/2 + 1/2, -1/3 + 1/3, -1/4 + 1/4, ... and this pattern continues indefinitely. The sum of the first n terms is therefore 1 - 1/(n+1), which approaches 1 as n grows. So,

[math]\sum_{i=2}^\infty\frac{1}{i(i-1)} = \sum_{i=1}^\infty\left(\frac{1}{i}-\frac{1}{i+1}\right) = 1[/math]

[math]4N\sum_{i=2}^\infty\frac{1}{i(i-1)} = 4N[/math]
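
The telescoping argument can be checked with exact rational arithmetic. This is a minimal sketch under the assumption that a finite partial sum is an adequate stand-in for the infinite series; the function name `partial_sum` is illustrative.

```python
from fractions import Fraction


def partial_sum(n):
    """Sum of 1/(i(i+1)) for i = 1..n, computed exactly."""
    return sum(Fraction(1, i * (i + 1)) for i in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        s = partial_sum(n)
        # Telescoping leaves only the first and last fractions: 1 - 1/(n+1).
        print(n, s, s == 1 - Fraction(1, n + 1))
```

Because every partial sum equals 1 - 1/(n+1), the series approaches 1 from below, and multiplying by 4N gives the 4N limit.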

Summary

On average, then, we expect all lineages within a panmictic species to coalesce by 4N generations in the past, when the last coalescence event occurs. From above, the expected time to coalescence of two lineages is 2N generations. Together these predict that, on average, all but the last coalescence event occur in the most recent 4N - 2N = 2N generations, after which the system exists as two lineages until the last coalescence event, 2N + 2N = 4N generations in the past.
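
The split between the early coalescence events and the final two-lineage phase can also be seen for finite samples. The sketch below sums the expected waiting times 4N/(i(i-1)) from i = 2 up to a sample of n lineages; the function name, the sample sizes, and N = 1000 are assumptions for illustration.

```python
from fractions import Fraction


def expected_tmrca(n, n_diploid):
    """Expected generations until n lineages reach one common ancestor."""
    return sum(Fraction(4 * n_diploid, i * (i - 1)) for i in range(2, n + 1))


if __name__ == "__main__":
    n_diploid = 1000  # assumed example population size
    two_lineage_phase = expected_tmrca(2, n_diploid)  # equals 2N
    for n in (2, 10, 100, 1000):
        total = expected_tmrca(n, n_diploid)
        # Time spent going from n lineages down to two, then the final phase.
        print(n, float(total - two_lineage_phase), float(two_lineage_phase))
```

For a sample of n lineages the total is 4N(1 - 1/n), so the time spent with more than two lineages approaches 2N from below while the final two-lineage phase always contributes 2N in expectation.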