Difference between revisions of "Coalescent"

From Genetics Wiki
 
(6 intermediate revisions by the same user not shown)
Line 1: Line 1:
=Coalescence of Two Lineages=
+
=The Coalescence of Two Lineages=
 +
 
 +
The chance two lineages coalesce in the previous generation is the chance that they pick the exact same copy of a gene to inherit which is <math>1/2N</math> because there are 2''N'' total copies of a gene in the population.
 +
 
 +
The chance they did not coalesce is therefore the remaining probability <math>1-1/2N</math>.
 +
 
 +
Each ''g'' generation there is the same independent probability of not coalescing, a total of <math>(1-1/2N)^g</math>, until a coalescent event occurs with probability <math>1/2N</math>.
 +
 
 +
Therefore, the probability of coalescence in the ''g'' generation is
 +
 
 +
<math>P(\text{coalescence at time }g)=\frac{1}{2N}\left( 1 - \frac{1}{2N} \right)^g</math>
 +
 
 +
which is a form of geometric distribution in discrete time.
 +
 
 +
In a large population over many generations this is closely approximated by an exponential distribution
 +
 
 +
<math>P(\text{coalescence at time }g)=\frac{1}{2N}\left( 1 - \frac{1}{2N} \right)^g \approx \frac{1}{2N}e^{-g/2N}</math>
 +
 
 +
 
  
 
The PDF of an exponential distribution of the coalescence of two lineages in a diploid population of size ''N''.
 
The PDF of an exponential distribution of the coalescence of two lineages in a diploid population of size ''N''.
Line 5: Line 23:
 
<math>P(\text{coalescence at time }g)=\frac{1}{2N}e^{-g/2N}</math>
 
<math>P(\text{coalescence at time }g)=\frac{1}{2N}e^{-g/2N}</math>
  
For example, the probability of two lineages coalescing in a small population of 20 individuals in exactly nine generations is 2%.
+
For example, the probability of two lineages coalescing in a small population of 20 individuals in exactly the ninth generation is 2%.
 +
 
  
 
Integrate to get the CDF.  
 
Integrate to get the CDF.  
  
<math>F(\text{coalescence at time }g)=\int_0^g\frac{1}{2N}e^{-g/2N}</math>
+
<math>F(\text{coalescence at time }g)=\int_0^g\frac{1}{2N}e^{-g/2N} \text{d}g</math>
  
<math>F(\text{coalescence at time }g)=\frac{1}{2N}\int_0^g e^{-g\frac{1}{2N}}</math>
+
<math>F(\text{coalescence at time }g)=\frac{1}{2N}\int_0^g e^{-g\frac{1}{2N}} \text{d}g</math>
  
 
<math>F(\text{coalescence at time }g)=\frac{1}{2N} \frac{-e^{-g\frac{1}{2N}}}{\frac{1}{2N}} + C</math>
 
<math>F(\text{coalescence at time }g)=\frac{1}{2N} \frac{-e^{-g\frac{1}{2N}}}{\frac{1}{2N}} + C</math>
Line 17: Line 36:
 
<math>F(\text{coalescence at time }g)=-e^{-g/2N} + C</math>
 
<math>F(\text{coalescence at time }g)=-e^{-g/2N} + C</math>
  
Because the CDF must <math>\lim_{g \to \infty}\left( -e^{-g/2N} + C \right)= 1</math> and <math>\lim_{g \to \infty} -e^{-g/2N}= 0</math> then <math>C = 1</math>.
+
Because by definition the CDF area must sum to one, <math>\lim_{g \to \infty}\left( -e^{-g/2N} + C \right)= 1</math>, and the limit of <math>\lim_{g \to \infty} -e^{-g/2N}= 0</math> then the constant of integration must be one, <math>C = 1</math>.
  
 
<math>F(\text{coalescence at time }g)=-e^{-g/2N} + 1</math>
 
<math>F(\text{coalescence at time }g)=-e^{-g/2N} + 1</math>
  
 
<math>F(\text{coalescence at time }g)=1-e^{-g/2N}</math>
 
<math>F(\text{coalescence at time }g)=1-e^{-g/2N}</math>
 +
 +
For example, there is a 95% probability that two lineages will coalesce within 6''N'' generations.
 +
 +
<math>F(\text{coalescence at time }g)=0.95=1-e^{-g/2N}</math>
 +
 +
<math>1-0.95=e^{-g/2N}</math>
 +
 +
<math>\log_e 0.05=-\frac{g}{2N}</math>
 +
 +
<math>-2N\log_e 0.05=g</math>
 +
 +
<math>-2N\times-3=g</math>
 +
 +
<math>6N=g</math>
 +
 +
=The Coalescence of More than Two Lineages=
 +
 +
=The Coalescence of an Infinite Number of Lineages=
 +
 +
=Coalescence in a Population of Changing Size=

Latest revision as of 11:03, 18 February 2016

The Coalescence of Two Lineages

The chance that two lineages coalesce in the previous generation is the chance that both inherit the exact same copy of a gene, which is [math]1/2N[/math] because there are 2N copies of the gene in a diploid population of size N.

The chance that they do not coalesce is therefore the remaining probability [math]1-1/2N[/math].

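Each generation carries the same independent probability of not coalescing, so the probability that the two lineages remain separate for g generations in a row is [math](1-1/2N)^g[/math]. A coalescent event then occurs in the next generation with probability [math]1/2N[/math].

Therefore, the probability that the lineages stay separate for g generations and then coalesce is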

[math]P(\text{coalescence at time }g)=\frac{1}{2N}\left( 1 - \frac{1}{2N} \right)^g[/math]

which is a form of geometric distribution in discrete time.
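
The generation-by-generation argument above can be checked with a small simulation. The following Python sketch is an illustration only: the function names, the number of replicates and the random seed are assumptions and do not come from the text. It follows two lineages back in time, lets each pick one of the 2N parental gene copies uniformly at random, and counts the generations until both pick the same copy. Counting the first generation back as generation 1, the average waiting time should be close to 2N.

```python
import random


def simulate_coalescence_time(n_diploid, rng):
    """Return the number of generations back until two lineages coalesce."""
    copies = 2 * n_diploid  # 2N gene copies in a diploid population of size N
    generations = 0
    while True:
        generations += 1
        # Each lineage independently picks its parental gene copy.
        if rng.randrange(copies) == rng.randrange(copies):
            return generations


def mean_coalescence_time(n_diploid, replicates=20000, seed=1):
    """Average the simulated coalescence time over many replicates."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(simulate_coalescence_time(n_diploid, rng) for _ in range(replicates))
    return total / replicates


if __name__ == "__main__":
    # For N = 20 the average should be near 2N = 40 generations.
    print(mean_coalescence_time(20))
```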

In a large population over many generations, this is closely approximated by an exponential distribution, because [math](1-1/2N)^g \approx e^{-g/2N}[/math] when 2N is large:

[math]P(\text{coalescence at time }g)=\frac{1}{2N}\left( 1 - \frac{1}{2N} \right)^g \approx \frac{1}{2N}e^{-g/2N}[/math]
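
How close the approximation is can be checked numerically. The sketch below is a minimal Python illustration (the function names are hypothetical) that evaluates both sides of the expression above and reports the relative error of the exponential form. The error shrinks as the population grows.

```python
import math


def geometric_probability(g, n_diploid):
    """Discrete form: (1/2N) * (1 - 1/2N)**g."""
    p = 1.0 / (2 * n_diploid)
    return p * (1.0 - p) ** g


def exponential_density(g, n_diploid):
    """Continuous approximation: (1/2N) * exp(-g/2N)."""
    rate = 1.0 / (2 * n_diploid)
    return rate * math.exp(-rate * g)


def relative_error(g, n_diploid):
    """Relative difference between the approximation and the discrete form."""
    exact = geometric_probability(g, n_diploid)
    return abs(exponential_density(g, n_diploid) - exact) / exact


if __name__ == "__main__":
    for n in (20, 1000, 100000):
        print(n, relative_error(100, n))
```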


The probability density function (PDF) of the exponential distribution for the coalescence time of two lineages in a diploid population of size N is

[math]P(\text{coalescence at time }g)=\frac{1}{2N}e^{-g/2N}[/math]

For example, in a small population of 20 individuals ([math]2N = 40[/math]), the probability that two lineages coalesce in exactly the ninth generation is about 2%: [math]\frac{1}{40}e^{-9/40} \approx 0.020[/math].
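
The arithmetic behind this example can be reproduced in a few lines. This is a minimal Python sketch with a hypothetical function name; it simply evaluates the exponential PDF given above at g = 9 for N = 20.

```python
import math


def coalescence_pdf(g, n_diploid):
    """Exponential PDF (1/2N) * exp(-g/2N) for the coalescence time."""
    two_n = 2 * n_diploid
    return math.exp(-g / two_n) / two_n


if __name__ == "__main__":
    # Population of 20 diploid individuals, ninth generation.
    print(round(coalescence_pdf(9, 20), 3))
```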


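Integrate the PDF to get the cumulative distribution function (CDF), the probability that coalescence has occurred by generation g.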

[math]F(\text{coalescence at time }g)=\int\frac{1}{2N}e^{-g/2N} \text{d}g[/math]

[math]F(\text{coalescence at time }g)=\frac{1}{2N}\int e^{-g\frac{1}{2N}} \text{d}g[/math]

[math]F(\text{coalescence at time }g)=\frac{1}{2N} \frac{-e^{-g\frac{1}{2N}}}{\frac{1}{2N}} + C[/math]

[math]F(\text{coalescence at time }g)=-e^{-g/2N} + C[/math]

Because a CDF must approach one as g goes to infinity, [math]\lim_{g \to \infty}\left( -e^{-g/2N} + C \right)= 1[/math]. Since [math]\lim_{g \to \infty} -e^{-g/2N}= 0[/math], the constant of integration must be one, [math]C = 1[/math].

[math]F(\text{coalescence at time }g)=-e^{-g/2N} + 1[/math]

[math]F(\text{coalescence at time }g)=1-e^{-g/2N}[/math]
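
As a sanity check on the closed form, the PDF can be integrated numerically and compared with the CDF. The following Python sketch is illustrative only: the function names and the choice of the trapezoidal rule are assumptions, not something the text prescribes.

```python
import math


def coalescence_pdf(g, n_diploid):
    """Exponential PDF (1/2N) * exp(-g/2N)."""
    rate = 1.0 / (2 * n_diploid)
    return rate * math.exp(-rate * g)


def coalescence_cdf(g, n_diploid):
    """Closed-form CDF 1 - exp(-g/2N)."""
    return 1.0 - math.exp(-g / (2 * n_diploid))


def integrate_pdf(g, n_diploid, steps=10000):
    """Trapezoidal-rule integral of the PDF from 0 to g."""
    width = g / steps
    total = 0.5 * (coalescence_pdf(0.0, n_diploid) + coalescence_pdf(g, n_diploid))
    for i in range(1, steps):
        total += coalescence_pdf(i * width, n_diploid)
    return total * width


if __name__ == "__main__":
    # The two values should agree to many decimal places.
    print(integrate_pdf(50, 20), coalescence_cdf(50, 20))
```

The CDF is zero at g = 0 and approaches one for large g, which matches the choice C = 1.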

For example, there is about a 95% probability that two lineages will coalesce within 6N generations.

[math]F(\text{coalescence at time }g)=0.95=1-e^{-g/2N}[/math]

[math]1-0.95=e^{-g/2N}[/math]

[math]\log_e 0.05=-\frac{g}{2N}[/math]

[math]-2N\log_e 0.05=g[/math]

[math]-2N\times(-3)\approx g[/math]

[math]6N\approx g[/math]
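
The same rearrangement works for any target probability, not only 95%. The Python sketch below (the function name is hypothetical) inverts the CDF to give the number of generations by which coalescence has occurred with a chosen probability. With N = 1 the result is expressed in units of N, which shows that the 6N figure is a rounding of roughly 5.99N.

```python
import math


def generations_to_coalesce(probability, n_diploid):
    """Solve probability = 1 - exp(-g/2N) for g."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in the interval [0, 1)")
    return -2 * n_diploid * math.log(1.0 - probability)


if __name__ == "__main__":
    print(generations_to_coalesce(0.95, 1))  # 95% point, in units of N
    print(generations_to_coalesce(0.50, 1))  # median, in units of N
```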

The Coalescence of More than Two Lineages

The Coalescence of an Infinite Number of Lineages

Coalescence in a Population of Changing Size