Monthly Archives: April 2015

Harmonics, Convergence, and the Diffusion Approximation

Lot's of different wave forms can be made by adding together harmonic series in certain ways.  A simple sine wave can have harmonics that vibrate twice as fast, three times as fast, etc.  Here is a plot of the odd numbered harmonics of a sine wave.

oddharmonics

As the wavelength is reduced the amplitude of each wave is also purposely reduced.  It turns out that if you add these waves together you start approaching what is called a square wave.

oddharmsquareapproach

However, the approach can be slow.  There is a lot of wiggle in the waveform.  Below is a plot with 50 odd numbered harmonics added together; .

square50

Closer, but you can still see the oscillation at the height of each peak.

If you add together the even harmonics, , you get a sawtooth wave.

saw50

Odd harmonics with a different weighting scheme, , give triangular waves that converge quite fast.

traingle50

Almost any waveform is possible, 

sharkteeth50

including this, 

crazywave50

Okay, so where am I going with this; there is a point here that ties back into population genetics.   Complex curves can be built up from the sum of a series of simpler curves.  However, it is also clear that in some cases the end result can take quite a large sum (the wiggle in the square and sawtooth waves above), in other words the final curve is slow to converge and requires a large number of harmonics.

Kimura's (1955) famous diffusion approximations to model the process of genetic drift in a finite population are built up in a similar fashion.  The final curve is a sum of an infinite series of higher order harmonics.  The math is messy.  It makes use of the hypergeometric function and Gegenbauer polynomials, but the underlying idea is similar to the examples given above.

In the simplest case of the diffusion approximation the series is

In this equation is the allele frequency, is the population size of diploid individuals, is the time in generations, which is often combined with in a parameter like , and is a hypergeometric function (specifically it is the ordinary Gaussian hypergeometric function ; there are many more but this is the most common).  In the graph below are the first six odd order curves of the series with and .

diffusionoddharmonicindividual

You can see that they tend to build (are positive) near the centre of the x-axis near a frequency of 0.5 and tend to alternative positive and negative near the edges cancelling each other out for a sum near zero.

Plotting these as sums of each new harmonic plus the previous ones gives these curves.

diffusionoddharmonicsum

This plot just focuses on the last step, which is the sum up to in the equation.

diffusion12

This is starting to give us the expected distribution of allele frequencies expected after N/10 generation of genetic drift when starting from a frequency of 1/2; however, the wiggle to positive and negative values near the edges means that it has not yet converged satisfactory.

Taking the iterations up to gives a nice result.

diffusion25

However, what if we want to look at even shorter periods of time.  Holding the y-axis scale the same and letting the peak run off the top of the graph look at what happens after just N/100 generations.

diffusionN100i25`

The sum has to be taken up to the to get things to smooth out.

diffusionN100i100

This takes some time on the computer---the hypergeometric function takes a bit of grinding to calculate.  Drift over shorter periods of time is precisely some of the situations where we might want to use this type of approach (it addresses standing variation and ignores new mutations that occur over deeper periods of time). This is why I have been exploring a faster alternative with the beta distribution that I wrote about in an earlier post.

By the way, I used mathematica to generate the plots above.  Here is the code if you are interested.

t = 10;
n = 1000;
p = 0.5;
m = 100;
(*the frequency distribution (probability) of the polymorphic \
fraction*)
poly =
Plot[Sum[p*(1 - p)*i*(i + 1)*(2*i + 1)*
Hypergeometric2F1[1 - i, i + 2, 2, x]*
Hypergeometric2F1[1 - i, i + 2, 2, p]*E^(-t*i*(i + 1)/(4*n)), {i,
1, m}], {x, 0, 1}, PlotRange -> {-1, 4}, Filling -> Axis,
PlotStyle -> Blue,
AxesLabel -> {"allele frequency", "probability density"}]

I also tried to plot this in R and got the following.

Rplot

I'm not sure what is going on but some kind of error seems to be building across the function.

Here is my code
myDiffuse <- function(x,p,t,N,max){
max=25
sum=0
for(i in 1:max){
hypgeosumx=myHypergeoGaussSeries(i,x,max)
hypgeosump=myHypergeoGaussSeries(i,p,max)
sum=sum+p*(1-p)*i*(i+1)*(2*i+1)*hypgeosumx*hypgeosump*exp(-i*(i+1)*t/(4*N))
}
return(sum)
}

myKayAll <- function(i, l){
n=1
for(j in 1:(l-1)){
n=n*(j-i)*(j+1+i)
}
d=factorial(l)*factorial(l-1)
return(n/d)
}

myHypergeoGaussSeries <- function(i, z, m){
h=1
for(j in 2:m){
h=h+myKayAll(i,j)*z^(j-1)
}
return(h)
}

x <- seq(0, 1, len = 10001)
t=10
N=1000
p=0.5
max=5
y<-myDiffuse(x,p,t,N,max)

plot(x,y,type='l',ylim=c(-5, 15))

2015 Tester Symposium

We had a lab presence at this year's annual Tester Symposium.  Aki Laruson gave the first presentation of the symposium and unofficially won the award for best dressed showing up in an Icelandic linen suit with suede shoes (and he had aviator glasses to top it off) and gave a very good presentation on his current and planned work in sea urchins.  Michael Wallstrom had an awesome poster describing his work with a new invasive algal-associated keratose sponge species.  It gained the notice of Dr. Jeremy Jackson (the invited distinguished guest and keynote speaker) who came over to chat with him about it.