# SimpleMathJax

I have neglected the genetics wiki on this site. One of the frustrations was not being able to write formulas correctly in the wiki markup. However, I installed the SimpleMathJax extension, which allows TeX code to be added between `<math>` tags. I added a genetic drift page to the site to test it out: http://hawaiireedlab.com/gwiki/index.php?title=Genetic_Drift

# The evolution of antibiotic resistance

Here is one result from this semester's genetics teaching lab that I wanted to share. The students grew bacteria on a series of gradient media that had increasing concentrations of an antibiotic. At the end of the experiment the bacteria could grow on levels of antibiotic that would have prevented growth before the experiment (which we tested with a control that was genetically identical at the beginning of the experiment and was not exposed to antibiotics). The successive generations of bacteria evolved by mutation and selection to tolerate the antibiotic. (One of the goals of this was to show the students an example of evolution in action and illustrate the risks of over-using antibiotics.) We then measured levels of gene expression for all the genes in the genome, by extracting RNA and hybridizing it to an Affymetrix "GeneChip E. coli Genome 2.0 Array", and identified which genes had increased their activity and which ones had decreased their activity to allow the bacteria to survive. Next year I'm planning to have the students sequence some of the genes involved and try to find the precise mutations that have changed gene expression levels.

# Conway's recipe for success

“His recipe for success is to have 4 problems on the go: a big problem, difficult and important, that will probably depress you before it makes you successful; a workable problem, tedious but with a clear strategy so you can always make some progress and feel a sense of accomplishment; a book problem, for the book you're writing or may eventually write; and a fun problem, since life is hardly worth living if you're not having some fun.” pp. 114-115 Genius At Play: The Curious Mind of John Horton Conway by Siobhan Roberts, Bloomsbury Publishing 2015

# arXiv: Underdominance in Population Networks

We submitted a preprint of our "network" manuscript to arXiv and it will post today (link and info below).  We also submitted it to a journal but went over the page limit so are currently editing it down to be shorter.

http://arxiv.org/abs/1509.02205

arXiv:1509.02205

Title: Stability of Underdominant Genetic Polymorphisms in Population Networks
Authors: Áki J. Láruson and Floyd A. Reed
Categories: q-bio.PE

Heterozygote disadvantage is potentially a potent driver of population
genetic divergence. Also referred to as underdominance, this phenomena
describes a situation where a genetic heterozygote has a lower overall fitness
than either homozygote. Attention so far has mostly been given to
underdominance within a single population and the maintenance of genetic
differences between two populations exchanging migrants. Here we explore the
dynamics of an underdominant system in a network of multiple discrete, yet
interconnected, populations. Stability of genetic differences in response to
increases in migration in various topological networks is assessed. The network
topology can have a dominant and occasionally non-intuitive influence on the
genetic stability of the system. Applications of these results to theories of
speciation, population genetic engineering, and general dynamical systems are
described.

By the way, this is my first arXiv submission. I wish I had done this years ago for other papers that were delayed for months, or in the two worst cases literally years, in review and resubmission cycles, only for us to get scooped in the end. I am planning to use arXiv a lot more in the future.

It feels odd to submit something to be widely available before publication. However, many journals now accept this (see also the discussion here) and there is excellent work that is freely available on arXiv. There is also some discussion about whether authors should cite preprints on arXiv; the general feeling is that this is fine, in fact appropriate, and should be encouraged.

# Go Vulcans!

Jolene Sutton is starting a tenure track position as a new assistant professor in the Biology Department at UH Hilo this fall!

# Harmonics, Convergence, and the Diffusion Approximation

Lots of different waveforms can be made by adding together harmonic series in certain ways. A simple sine wave can have harmonics that vibrate twice as fast, three times as fast, etc. Here is a plot of the odd numbered harmonics of a sine wave.

As the wavelength is reduced the amplitude of each wave is also purposely reduced.  It turns out that if you add these waves together you start approaching what is called a square wave.

However, the approach can be slow.  There is a lot of wiggle in the waveform.  Below is a plot with 50 odd numbered harmonics added together; $\sum_{i=1}^{50} \frac{\sin((2 i - 1) x )}{2 i - 1}$.

Closer, but you can still see the oscillation at the height of each peak.
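The slow approach to a square wave can be checked numerically. The following is a minimal Python sketch (Python and the names `odd_harmonic_sum` and `overshoot` are choices made for this illustration; the plots in this post were made in Mathematica). It evaluates the partial sum and measures how far its highest peak rises above $\pi/4$, the plateau value that the infinite series converges to for $0 < x < \pi$.

```python
import math

PLATEAU = math.pi / 4  # limit of the infinite series for 0 < x < pi


def odd_harmonic_sum(x, terms):
    """Partial sum of sin((2i-1)x)/(2i-1) for i = 1..terms."""
    return sum(math.sin((2 * i - 1) * x) / (2 * i - 1) for i in range(1, terms + 1))


def overshoot(terms, step=1e-4, x_max=0.2):
    """Largest excess of the partial sum over the plateau on a grid in (0, x_max]."""
    n_points = int(round(x_max / step))
    peak = max(odd_harmonic_sum(k * step, terms) for k in range(1, n_points + 1))
    return peak - PLATEAU


if __name__ == "__main__":
    for terms in (50, 200):
        mid_error = abs(odd_harmonic_sum(math.pi / 2, terms) - PLATEAU)
        print(terms, "terms: error at x = pi/2 is", mid_error,
              "and overshoot near x = 0 is", overshoot(terms))
```

Away from the jumps the error shrinks as terms are added, but the overshoot next to each jump keeps roughly the same height and only moves closer to the jump. This is the behaviour usually called the Gibbs phenomenon, and it is the wiggle visible at the edges of each plateau.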

If you add together the even harmonics, $\sum_{i=1}^{50} \frac{\sin(2 i x )}{2 i }$, you get a sawtooth wave.

Odd harmonics with a different weighting scheme, $\sum_{i=1}^{50} (-1)^i \frac{\sin((2 i -1) x )}{(2 i -1)^2}$, give triangular waves that converge quite fast.
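To put a number on "quite fast", here is a small Python sketch (the function names are made up for this example). It relies on a standard Fourier series result that is not derived in this post: with this sign convention the infinite sum equals $-\frac{\pi}{4} x$ on the interval $-\pi/2 \le x \le \pi/2$. The code measures the largest gap between the partial sum and that straight line.

```python
import math


def triangle_partial_sum(x, terms):
    """Partial sum of (-1)^i sin((2i-1)x)/(2i-1)^2 for i = 1..terms."""
    return sum((-1) ** i * math.sin((2 * i - 1) * x) / (2 * i - 1) ** 2
               for i in range(1, terms + 1))


def max_error(terms, points=401):
    """Largest gap between the partial sum and -(pi/4) x on a grid over [-pi/2, pi/2]."""
    worst = 0.0
    for k in range(points):
        x = -math.pi / 2 + math.pi * k / (points - 1)
        target = -math.pi * x / 4
        worst = max(worst, abs(triangle_partial_sum(x, terms) - target))
    return worst


if __name__ == "__main__":
    for terms in (5, 50):
        print(terms, "terms: largest error", max_error(terms))
```

Because the weights fall off as $1/(2i-1)^2$ rather than $1/(2i-1)$, the neglected tail of the series is small everywhere, including at the corners of the triangle, so there is no persistent overshoot.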

Almost any waveform is possible,  $\sum_{i=1}^{50} \frac{\sin((2 i -1)^2 x )}{(2 i -1)^2}$

including this,  $\sum_{i=1}^{50} (-1)^i \frac{\sin((3 i -1) x )}{i ^2}$

Okay, so where am I going with this? There is a point here that ties back into population genetics. Complex curves can be built up from the sum of a series of simpler curves. However, it is also clear that in some cases a good result takes quite a large sum (the wiggle in the square and sawtooth waves above); in other words, the final curve is slow to converge and requires a large number of harmonics.

Kimura's (1955) famous diffusion approximations to model the process of genetic drift in a finite population are built up in a similar fashion.  The final curve is a sum of an infinite series of higher order harmonics.  The math is messy.  It makes use of the hypergeometric function and Gegenbauer polynomials, but the underlying idea is similar to the examples given above.

In the simplest case of the diffusion approximation the series is

$\sum_{i=1}^{\infty} p (1-p) i (i+1) (2 i +1) \,F\!(1-i, i+2, 2, p) \,F\!(1-i, i+2, 2, x) \, e^{-i(i+1)t / (4N)}$

In this equation $p$ is the allele frequency, $N$ is the population size of diploid individuals, $t$ is the time in generations, which is often combined with $N$ in a parameter like $\tau=t / N$, and $F$ is a hypergeometric function (specifically it is the ordinary Gaussian hypergeometric function ${}_2F_1$; there are many more but this is the most common).  In the graph below are the first six odd order curves of the series with $p=0.5$ and $\tau=0.1$.
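As a rough illustration of how the terms of this series can be evaluated, here is a Python sketch using only the standard library. It uses the fact that $F(1-i, i+2, 2, z)$ has a negative integer first argument, so its power series stops after $i$ terms and is just a polynomial. The function names are invented for this example, and time enters through $\tau = t/N$.

```python
import math
from fractions import Fraction


def hyp_coeffs(i):
    """Coefficients c_0..c_{i-1} of the polynomial F(1-i, i+2, 2, z)."""
    coeffs = [Fraction(1)]
    for k in range(i - 1):
        # Ratio of successive 2F1 terms: (a+k)(b+k) / ((c+k)(k+1)), a=1-i, b=i+2, c=2
        ratio = Fraction((1 - i + k) * (i + 2 + k), (2 + k) * (k + 1))
        coeffs.append(coeffs[-1] * ratio)
    return coeffs


def hyp_value(i, z):
    """Evaluate F(1-i, i+2, 2, z) by Horner's rule in floating point."""
    result = 0.0
    for c in reversed(hyp_coeffs(i)):
        result = result * z + float(c)
    return result


def series_term(i, x, p, tau):
    """Term i of the series, with tau = t/N so the exponent is -i(i+1) tau / 4."""
    weight = p * (1 - p) * i * (i + 1) * (2 * i + 1)
    return weight * hyp_value(i, p) * hyp_value(i, x) * math.exp(-i * (i + 1) * tau / 4)


def density(x, p, tau, terms):
    """Sum of the first `terms` terms of the series at frequency x."""
    return sum(series_term(i, x, p, tau) for i in range(1, terms + 1))


if __name__ == "__main__":
    for i in range(1, 7):
        print(i, series_term(i, 0.3, 0.5, 0.1))
```

With $p = 0.5$ the even-order terms drop out, because $F(1-i, i+2, 2, 0.5) = 0$ when $i$ is even. That fits with plotting only the odd-order curves. Floating point evaluation like this is only reasonable for small $i$; higher orders need more care.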

You can see that they tend to build (are positive) near the centre of the x-axis, around a frequency of 0.5, and tend to alternate between positive and negative near the edges, cancelling each other out for a sum near zero.

Plotting these as sums of each new harmonic plus the previous ones gives these curves.

This plot just focuses on the last step, which is the sum up to $i=12$ in the equation.

This is starting to give us the distribution of allele frequencies expected after N/10 generations of genetic drift when starting from a frequency of 1/2; however, the wiggle to positive and negative values near the edges means that it has not yet converged satisfactorily.

Taking the iterations up to $i=25$ gives a nice result.

However, what if we want to look at even shorter periods of time? Holding the y-axis scale the same, and letting the peak run off the top of the graph, look at what happens after just N/100 generations.

The sum has to be taken up to $i=100$ to get things to smooth out.
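A back-of-the-envelope way to see why shorter times need more terms is to look only at the part of each term that does not depend on $x$ or $p$: the weight $i(i+1)(2i+1)$ grows roughly like $2i^3$, while the damping factor $e^{-i(i+1)\tau/4}$ only takes over once $i(i+1)\tau/4$ is large. The Python sketch below (function names and the tolerance are arbitrary choices for illustration) finds the first $i$ at which that product falls below a tolerance. It ignores the hypergeometric factors, so treat it as a heuristic rather than a convergence proof.

```python
import math


def term_scale(i, tau):
    """x-independent size of term i: i(i+1)(2i+1) exp(-i(i+1) tau / 4)."""
    return i * (i + 1) * (2 * i + 1) * math.exp(-i * (i + 1) * tau / 4)


def terms_needed(tau, tol=0.01):
    """Smallest i at which term_scale drops below tol (heuristic only)."""
    i = 1
    while term_scale(i, tau) >= tol:
        i += 1
    return i


if __name__ == "__main__":
    for tau in (0.1, 0.01):
        print("tau =", tau, "-> about", terms_needed(tau), "terms")
```

With this tolerance the heuristic suggests a couple of dozen terms for $\tau = 0.1$ and somewhat under a hundred for $\tau = 0.01$, broadly in line with the $i=25$ and $i=100$ used above.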

This takes some time on the computer; the hypergeometric function takes a bit of grinding to calculate. Drift over shorter periods of time is precisely one of the situations where we might want to use this type of approach (it addresses standing variation and ignores new mutations that occur over deeper periods of time). This is why I have been exploring a faster alternative with the beta distribution that I wrote about in an earlier post.

By the way, I used Mathematica to generate the plots above. Here is the code if you are interested.

t = 10;
n = 1000;
p = 0.5;
m = 100;
(*the frequency distribution (probability) of the polymorphic fraction*)
poly =
Plot[Sum[p*(1 - p)*i*(i + 1)*(2*i + 1)*
Hypergeometric2F1[1 - i, i + 2, 2, x]*
Hypergeometric2F1[1 - i, i + 2, 2, p]*E^(-t*i*(i + 1)/(4*n)), {i,
1, m}], {x, 0, 1}, PlotRange -> {-1, 4}, Filling -> Axis,
PlotStyle -> Blue,
AxesLabel -> {"allele frequency", "probability density"}]

I also tried to plot this in R and got the following.

I'm not sure what is going on but some kind of error seems to be building across the function.

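Here is my code.

myDiffuse <- function(x, p, t, N, max_i) {  # sum of the first max_i terms at frequencies x
  total <- 0
  for (i in 1:max_i) {
    hypgeosumx <- myHypergeoGaussSeries(i, x, max_i)
    hypgeosump <- myHypergeoGaussSeries(i, p, max_i)
    total <- total + p*(1-p)*i*(i+1)*(2*i+1)*hypgeosumx*hypgeosump*exp(-i*(i+1)*t/(4*N))
  }
  return(total)
}

myKayAll <- function(i, l) {  # coefficient of z^(l-1) in F(1-i, i+2, 2, z), for l >= 2
  n <- 1
  for (j in 1:(l-1)) {
    n <- n*(j-i)*(j+1+i)
  }
  d <- factorial(l)*factorial(l-1)
  return(n/d)
}

myHypergeoGaussSeries <- function(i, z, m) {  # power series for F(1-i, i+2, 2, z), m terms
  h <- 1
  for (j in seq_len(m - 1) + 1) {  # j = 2..m; empty when m = 1
    h <- h + myKayAll(i, j)*z^(j-1)  # terms of degree >= i are zero
  }
  return(h)
}

x <- seq(0, 1, length.out = 10001)
gens <- 10  # time in generations
N <- 1000  # diploid population size
p <- 0.5  # starting allele frequency
max_i <- 25  # number of terms in the sum
y <- myDiffuse(x, p, gens, N, max_i)

plot(x, y, type = 'l', ylim = c(-5, 15))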
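One possible explanation for the error in the R plot (a guess that has not been checked against the original R session) is loss of precision from cancellation. The R code above sums the hypergeometric polynomial term by term, and for larger $i$ the individual terms are enormous and alternate in sign, while the polynomial itself never exceeds 1 in absolute value on $0 \le z \le 1$. The Python sketch below (standard library only, with function names invented for this example) compares a plain double precision sum with exact rational arithmetic.

```python
from fractions import Fraction


def hyp_coeffs(i):
    """Coefficients of the polynomial F(1-i, i+2, 2, z) as exact fractions."""
    coeffs = [Fraction(1)]
    for k in range(i - 1):
        ratio = Fraction((1 - i + k) * (i + 2 + k), (2 + k) * (k + 1))
        coeffs.append(coeffs[-1] * ratio)
    return coeffs


def float_sum(i, z):
    """Term-by-term sum in double precision, as a plain power series."""
    return sum(float(c) * z ** k for k, c in enumerate(hyp_coeffs(i)))


def exact_sum(i, z):
    """The same sum in exact rational arithmetic."""
    z = Fraction(z)
    return sum(c * z ** k for k, c in enumerate(hyp_coeffs(i)))


def largest_term(i, z):
    """Magnitude of the largest single term in the sum."""
    z = Fraction(z)
    return max(abs(c * z ** k) for k, c in enumerate(hyp_coeffs(i)))


if __name__ == "__main__":
    i, z = 25, Fraction(9, 10)
    print("largest single term:", float(largest_term(i, z)))
    print("exact value:        ", float(exact_sum(i, z)))
    print("double precision:   ", float_sum(i, float(z)))
```

If this is what is happening, the terms get larger as $z$ moves toward 1, which would fit an error that builds across the plot. Possible remedies, none of them tried here, include higher precision arithmetic, a library hypergeometric routine, or a polynomial recurrence instead of the raw power series.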

# 2015 Tester Symposium

We had a lab presence at this year's annual Tester Symposium. Aki Laruson gave the first presentation of the symposium, a very good talk on his current and planned work in sea urchins. He also unofficially won the award for best dressed, showing up in an Icelandic linen suit with suede shoes (and aviator glasses to top it off). Michael Wallstrom had an awesome poster describing his work with a new invasive algal-associated keratose sponge species. It gained the notice of Dr. Jeremy Jackson (the invited distinguished guest and keynote speaker), who came over to chat with him about it.

# Cantor-SageMath-Maxima

I like a lot of things about Wolfram's Mathematica software; it is an extremely useful tool for symbolic mathematical manipulations and visualization of various functions. However, it is expensive software, so recently I have been exploring whether there is a free alternative. In doing this I came across Cantor, which can run SageMath as a backend, which in turn is built on top of Maxima and other packages.

In the screenshot above I took it for a test drive.  In the first lines I set up an equation $x^2 + x/2 = a$ and told it to solve for $x$.  In the next lines it integrated one of the solutions,

$\int_{}^{} \! \left( \frac{1}{4} \sqrt{16a+1}-\frac{1}{4} \right) \, \mathrm{d}a$.

Right after that I had it display the TeX code for typesetting the equation, which I can reuse to get it displayed below.

$\frac{1}{96} \, {\left(16 \, a + 1\right)}^{\frac{3}{2}} - \frac{1}{4} \, a$
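As a quick hand check (not part of the Cantor session), the quadratic formula applied to $x^2 + x/2 - a = 0$ gives

$x = \frac{-\frac{1}{2} \pm \sqrt{\frac{1}{4} + 4a}}{2} = -\frac{1}{4} \pm \frac{1}{4} \sqrt{16a+1}$,

so the integrand is the root with the plus sign. Differentiating the integrated result recovers that integrand (the constant of integration is omitted):

$\frac{\mathrm{d}}{\mathrm{d}a} \left[ \frac{1}{96} \left(16a+1\right)^{\frac{3}{2}} - \frac{1}{4} a \right] = \frac{1}{96} \cdot \frac{3}{2} \cdot 16 \left(16a+1\right)^{\frac{1}{2}} - \frac{1}{4} = \frac{1}{4} \sqrt{16a+1} - \frac{1}{4}$.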

Next is a five by five matrix of random numbers. From this the eigenvalues (a summary of how the matrix transformation tends to behave when applied to a system) are calculated; in this example two of them happen to be complex numbers (which indicates that the system has a rotational quality in two of its five dimensions). Last, there is a plot to visualize the matrix entries, but it is cut off at the bottom of the window in this image.
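The link between complex eigenvalues and rotation is easiest to see in two dimensions. Here is a small Python sketch using only the standard library; the helper `eigenvalues_2x2` is written for this illustration and is not a SageMath or Maxima function. It gets the eigenvalues of a 2 by 2 matrix from its trace and determinant.

```python
import cmath
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


if __name__ == "__main__":
    theta = math.radians(30)
    rotation = (math.cos(theta), -math.sin(theta), math.sin(theta), math.cos(theta))
    stretch = (2, 0, 0, 3)
    print("rotation by 30 degrees:", eigenvalues_2x2(*rotation))
    print("pure stretch:          ", eigenvalues_2x2(*stretch))
```

A pure rotation gives a complex conjugate pair of eigenvalues with absolute value 1, while a pure stretch along the axes gives real eigenvalues. A complex pair among the five eigenvalues of the random matrix plays the same role within a two-dimensional subspace.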

I am impressed. It does not yet have all the tools built into Mathematica, but I am glad I came across this.

# Some pictures in review

We had a bit of cautious excitement with our mosquito transformation project (link), but it turned out to be a false alarm. To illustrate, the above image is what we expect to see with the mosquito larvae under UV light. It is dark because we need to minimize the regular light to see the fluorescent pattern; plus we are limited to short exposures because the larvae move around. There is some autofluorescence in the thorax (green) and gut (red) but in general the head is dark. We had injected a 3xP3 plasmid and we expected a green glow in a band in the head if the germline transformation was successful. At the end of January Jolene found a few mosquitoes that looked like this.

The image is really dark because the larvae would not stop moving, so we tried to snap the image as fast as possible.  I am avoiding messing with the image too much because I want to present it as close to what we saw as possible.  If you look toward the top you can see that there is a green band in the head that was not there before.  This is kind of what we were looking for but it was not exactly right.  We expected 3xP3 expression to be more associated with the eyes and there is also red expression nearby.  Long story short it was not a successful transformation.  It appears to have been an artifact of the food and algae of the water they were in and it faded over the next few days.  Stay tuned, we are continuing injections and are now using a different set of plasmids...

In other news, I showed some images of newly fertilized sea urchin embryos and mentioned ageing them a few days to the pluteus stage (link).  We did that but I never showed any pictures.  Well here is one, but it is a little fuzzy because the little things zoom around in the sea water scooping up food; this one was rotating so the center is more focused than the exterior.

Aki has extracted RNA from them and it is being sequenced by a genomics facility.  He currently has over 9 million reads with over 16,000 genes from around the genome identified.  I am purposely avoiding posting too much online about his project before he has a chance to analyse and publish the results himself, but here is an image of a tiny part of the genome from his data, less than 16,000 base pairs.  This is the newly sequenced, never seen before, circular mitochondrial genome of Tripneustes gratilla, the collector urchin.

If you flatten the DNA out into a linear sequence and map the RNA reads to it you can see the relative expression levels of the different genes (the blue hills above the different features).

Zooming in you can see the individual sequences that are assembled by aligning overlapping sequences to reconstruct the complete gene sequence. Below is an example from the ND1 gene.

The bases highlighted in blue are disagreements due to occasional sequencing errors. Sometimes, however, the same disagreement is common across reads, in which case it is due to real genetic variation among the individuals sampled. Here is a look at one of these sites, zoomed in even closer, in the COX1 gene.

There is a C/T site that is at approximately 50/50 frequency and these sequence reads are made up of a sample from two different individuals.  In other words this position is a real genetic difference among individuals that are out there in the population.

Finally, some time for some Drosophila pictures.

As the FRT/FLP flies emerge from the vial that was heat shocked (link), the mosaic pattern is encompassing larger and larger sectors. (As a reminder, the eyes should either be all red or all white because the cells within an organism are essentially genetically identical; the cells in these flies' eyes are genetically different from each other.) The flies in these pictures were younger at the time of the heat shock (compared to the earlier post), so the cells that had the genetic rearrangement have larger numbers of daughter cells. This gives us some indication of the physical pattern of relatedness of groups of cells on the body, and it doesn't always follow what we might expect.

# Somatic genetic mosaics

For today I want to show something from a side-project that Aki Laruson is involved in. The aim here is to generate some preliminary data for a later grant application to expand the project in the near future. Here are pictures of two flies we generated that are genetic mosaics. They have groups of cells throughout their bodies that are made up of two different genotypes at a single chromosomal position. (Usually we think of all cells in our bodies as being genetically identical to each other; that is overly simplistic, but for the most part it is true.) In most tissues the genetic difference in these flies does not result in a visible phenotype, but in the eyes we can see the different cells. There are groups of cells that have either red or white phenotypes depending on their underlying genetic make-up. (Normally a Drosophila fruit fly would either have all red or all white cells in its eyes depending on what alleles it inherited from its parents.)

This was done using genetic sequences from a yeast plasmid (the two micron plasmid or $2 \mu m$ plasmid) that are inserted into the fly's genome (pioneered by Golic and Lindquist 1989 (PubMed, Google Scholar)). There is an enzyme called flippase (FLP) that recognizes (binds to) and cuts a specific sequence known as the Flippase Recognition Target (FRT), 5'-GAAGTTCCTATTCtctagaaaGtATAGGAACTTC-3'. In yeast the FLP-FRT system causes rearrangements to the plasmid's structure that allow it to amplify in numbers in the cell when the cell undergoes DNA replication (e.g., Chan et al. 2013 (PubMed, Google Scholar)).

Here in flies the two FRT sites have been positioned on either side of part of the white gene sequence and FLP is placed under the Drosophila heat shock promoter (Hsp70). We exposed the larvae to 37 °C temperatures (by immersing vials containing the larvae in warm water for an hour) to induce the expression of FLP through the flies' normal gene activation reaction to heat. FLP then recombined the two FRT sites together and inactivated the white gene by deleting the part of the gene sequence that was between the FRT sites. This results in a loss of pigment in the cells of the eyes that are descended from cells in which this deletion happened in the larvae. (By the way, a common misconception is that the white gene encodes an eye pigment. It does not, but it is in a pathway that is required for the eventual production of several eye pigments in addition to other important molecules.)

There are two broad categories of cells that we are interested in here, somatic and germline cells. Only germline cells produce gametes that result in the next generation, so only the DNA from germline cells gets passed on to the next generation; however, most of the cells in a multicellular animal's body are somatic cells that contain DNA that does not get passed on to the next generation.

Ultimately we want flies that have the DNA rearrangement in all the cells of their body. We can see that the FLP-FRT rearrangement worked in some cells by the mosaic pattern in the eyes, but we do not know if this has affected the germline of each individual fly. If it has, then any offspring arising from these affected cells will contain the genetic rearrangement in all of their cells (all of the cells will be descended from the same two starting gametes; genetic mosaics of this type should not be able to be passed on from generation to generation). Now we will breed these mosaic flies together and select for all white eyed flies that contain the deletion in all the cells of their body. Then it's on to the next stage of the experiment.
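The deletion step can be pictured with a toy string model. The Python sketch below is only an illustration of the logic, not a model of the real construct: the flanking sequences are made up, the function name is hypothetical, and it assumes two FRT sites in the same orientation (direct repeats). Recombination between such sites removes the DNA between them and leaves a single FRT copy behind, with the excised piece carrying the other copy.

```python
# FRT site as quoted in this post, converted to upper case for matching
FRT = "GAAGTTCCTATTCtctagaaaGtATAGGAACTTC".upper()


def excise_between_frt(seq):
    """Return (remaining, excised) after recombining the first two direct-repeat FRT sites.

    If fewer than two sites are found, the sequence is returned unchanged
    and the excised piece is empty.
    """
    seq = seq.upper()
    first = seq.find(FRT)
    second = seq.find(FRT, first + len(FRT)) if first != -1 else -1
    if second == -1:
        return seq, ""
    excised = seq[first:second]              # one FRT copy plus the DNA between the sites
    remaining = seq[:first] + seq[second:]   # a single FRT copy is left behind
    return remaining, excised


if __name__ == "__main__":
    # Made-up flanking and intervening sequence standing in for part of the white gene
    construct = "AAAA" + FRT + "GGGGCCCC" + FRT + "TTTT"
    remaining, excised = excise_between_frt(construct)
    print("before:  ", construct)
    print("after:   ", remaining)
    print("excised: ", excised)
```

Sites in opposite orientation would invert the intervening DNA instead of deleting it, which this sketch does not handle.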