Harmonics, Convergence, and the Diffusion Approximation

Lot's of different wave forms can be made by adding together harmonic series in certain ways.  A simple sine wave can have harmonics that vibrate twice as fast, three times as fast, etc.  Here is a plot of the odd numbered harmonics of a sine wave.

As the wavelength is reduced the amplitude of each wave is also purposely reduced.  It turns out that if you add these waves together you start approaching what is called a square wave.

However, the approach can be slow.  There is a lot of wiggle in the waveform.  Below is a plot with 50 odd numbered harmonics added together; $\sum_{i=1}^{50} \frac{\sin((2 i - 1) x )}{2 i - 1}$.

Closer, but you can still see the oscillation at the height of each peak.

If you add together the even harmonics, $\sum_{i=1}^{50} \frac{\sin(2 i x )}{2 i }$, you get a sawtooth wave.

Odd harmonics with a different weighting scheme, $\sum_{i=1}^{50} (-1)^i \frac{\sin((2 i -1) x )}{(2 i -1)^2}$, give triangular waves that converge quite fast.

Almost any waveform is possible,  $\sum_{i=1}^{50} \frac{\sin((2 i -1)^2 x )}{(2 i -1)^2}$

including this,  $\sum_{i=1}^{50} (-1)^i \frac{\sin((3 i -1) x )}{i ^2}$

Okay, so where am I going with this; there is a point here that ties back into population genetics.   Complex curves can be built up from the sum of a series of simpler curves.  However, it is also clear that in some cases the end result can take quite a large sum (the wiggle in the square and sawtooth waves above), in other words the final curve is slow to converge and requires a large number of harmonics.

Kimura's (1955) famous diffusion approximations to model the process of genetic drift in a finite population are built up in a similar fashion.  The final curve is a sum of an infinite series of higher order harmonics.  The math is messy.  It makes use of the hypergeometric function and Gegenbauer polynomials, but the underlying idea is similar to the examples given above.

In the simplest case of the diffusion approximation the series is

$\sum_{i=1}^{\inf} p (1-p) i (i+1) (2 i +1) \,F\!(1-i, i+2, 2, p) \,F\!(1-i, i+2, 2, x) \, e^{-i(i+1)t / 4N}$

In this equation $p$ is the allele frequency, $N$ is the population size of diploid individuals, $t$ is the time in generations, which is often combined with $N$ in a parameter like $\tau=t / N$, and $F$ is a hypergeometric function (specifically it is the ordinary Gaussian hypergeometric function ${}_2F_1$; there are many more but this is the most common).  In the graph below are the first six odd order curves of the series with $p=0.5$ and $\tau=0.1$.

You can see that they tend to build (are positive) near the centre of the x-axis near a frequency of 0.5 and tend to alternative positive and negative near the edges cancelling each other out for a sum near zero.

Plotting these as sums of each new harmonic plus the previous ones gives these curves.

This plot just focuses on the last step, which is the sum up to $i=12$ in the equation.

This is starting to give us the expected distribution of allele frequencies expected after N/10 generation of genetic drift when starting from a frequency of 1/2; however, the wiggle to positive and negative values near the edges means that it has not yet converged satisfactory.

Taking the iterations up to $i=25$ gives a nice result.

However, what if we want to look at even shorter periods of time.  Holding the y-axis scale the same and letting the peak run off the top of the graph look at what happens after just N/100 generations.

The sum has to be taken up to the $i=100$ to get things to smooth out.

This takes some time on the computer---the hypergeometric function takes a bit of grinding to calculate.  Drift over shorter periods of time is precisely some of the situations where we might want to use this type of approach (it addresses standing variation and ignores new mutations that occur over deeper periods of time). This is why I have been exploring a faster alternative with the beta distribution that I wrote about in an earlier post.

By the way, I used mathematica to generate the plots above.  Here is the code if you are interested.

t = 10;
n = 1000;
p = 0.5;
m = 100;
(*the frequency distribution (probability) of the polymorphic \
fraction*)
poly =
Plot[Sum[p*(1 - p)*i*(i + 1)*(2*i + 1)*
Hypergeometric2F1[1 - i, i + 2, 2, x]*
Hypergeometric2F1[1 - i, i + 2, 2, p]*E^(-t*i*(i + 1)/(4*n)), {i,
1, m}], {x, 0, 1}, PlotRange -> {-1, 4}, Filling -> Axis,
PlotStyle -> Blue,
AxesLabel -> {"allele frequency", "probability density"}]

I also tried to plot this in R and got the following.

I'm not sure what is going on but some kind of error seems to be building across the function.

Here is my code
myDiffuse <- function(x,p,t,N,max){
max=25
sum=0
for(i in 1:max){
hypgeosumx=myHypergeoGaussSeries(i,x,max)
hypgeosump=myHypergeoGaussSeries(i,p,max)
sum=sum+p*(1-p)*i*(i+1)*(2*i+1)*hypgeosumx*hypgeosump*exp(-i*(i+1)*t/(4*N))
}
return(sum)
}

myKayAll <- function(i, l){
n=1
for(j in 1:(l-1)){
n=n*(j-i)*(j+1+i)
}
d=factorial(l)*factorial(l-1)
return(n/d)
}

myHypergeoGaussSeries <- function(i, z, m){
h=1
for(j in 2:m){
h=h+myKayAll(i,j)*z^(j-1)
}
return(h)
}

x <- seq(0, 1, len = 10001)
t=10
N=1000
p=0.5
max=5
y<-myDiffuse(x,p,t,N,max)

plot(x,y,type='l',ylim=c(-5, 15))

2015 Tester Symposium

We had a lab presence at this year's annual Tester Symposium.  Aki Laruson gave the first presentation of the symposium and unofficially won the award for best dressed showing up in an Icelandic linen suit with suede shoes (and he had aviator glasses to top it off) and gave a very good presentation on his current and planned work in sea urchins.  Michael Wallstrom had an awesome poster describing his work with a new invasive algal-associated keratose sponge species.  It gained the notice of Dr. Jeremy Jackson (the invited distinguished guest and keynote speaker) who came over to chat with him about it.