Hardy 1908
Citation
Hardy, G. H. (1908) Mendelian Proportions in a mixed population. Science 28(706): 49-50.
Links
https://www.jstor.org/stable/1636004
Notes
First Paragraph
The tone of the first paragraph is a bit amusing, at least to a modern reader. It is hard to know how much of this is overt criticism versus the norms of formal language in 1908. However, this sentence is unambiguous, "I should have expected the very simple point which I wish to make to have been familiar to biologists". Hardy definitely felt that the question of expected genotype proportions was far too trivial for him to waste time upon; yet, he was forced to do so by the blatant misunderstandings of others. The irony is that this is probably the result for which he is best known today. Elsewhere he keeps going with "a little mathematics of the multiplication-table type is enough to show ...", "it is easy to see ...", and "there is not the slightest foundation for the idea ...".
Second Paragraph
At the time there was a debate about the general validity of Mendelian genetics in terms of understanding biological heritability. This was part of the Biometric-Mendelian Debate (or mutationalists versus selectionists), which was not resolved until the Modern Synthesis of biology later in the 20th century. One objection to Mendelian genetics that was brought up by Yule was that phenotypes in natural populations do not follow Mendelian proportions. The example used is brachydactyly in humans (shortened fingers and/or toes, one form of which is a dominant trait), with the observation that the ratio of brachydactylous to unaffected individuals is not three to one.
 | A | a |
A | AA | Aa |
a | Aa | aa |
The Punnett Square above for an F2 Cross shows the expected offspring from two heterozygous parents (whose alleles are shown at the top of the columns and the beginning of the rows, and are combined to yield the child's genotype): three brachydactylous offspring in red (genotypes AA and Aa) to one unaffected child in blue (aa).
An F2 cross is a very artificial situation: it begins by crossing pure-breeding "parental" lines (AA and aa in this example) to generate heterozygous offspring (Aa), which are then crossed together to generate the F2s. It is unlikely that Yule was thinking of only this scenario. In a natural population, offspring would be generated from all possible crosses (AA x AA, AA x Aa, AA x aa, Aa x Aa, Aa x aa, and aa x aa), only one of which is the F2 scenario. If the two allele frequencies were precisely 1/2, with the parental genotypes in 1 AA : 2 Aa : 1 aa proportions and mating at random, then the offspring pooled over all possible crosses would also show a 3:1 ratio of phenotypes.
 | AA | 2 Aa | aa |
AA | AA | 1/2 AA, 1/2 Aa | Aa |
2 Aa | 1/2 AA, 1/2 Aa | 1/4 AA, 1/2 Aa, 1/4 aa | 1/2 Aa, 1/2 aa |
aa | Aa | 1/2 Aa, 1/2 aa | aa |
Multiplying out the twos from the heterozygous parents gives:
 | AA | Aa | aa |
AA | AA | AA, Aa | Aa |
Aa | AA, Aa | AA, 2 Aa, aa | Aa, aa |
aa | Aa | Aa, aa | aa |
The total proportions of offspring with all three genotypes are 4 AA, 8 Aa, and 4 aa, or a 1:2:1 ratio of genotypes and a 3:1 ratio of phenotypes (AA and Aa to aa). Mendel's crosses worked with allele frequencies of one half within the cross. Somehow this influenced Yule's thinking that natural populations would also tend toward allele frequencies of one half over time.
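The tally above can be checked with a short Python sketch. The names `PARENTS` and `offspring` are illustrative only, and the 1 : 2 : 1 parental weights are the special case assumed in the table, not a general property of natural populations.

```python
from fractions import Fraction
from itertools import product

# Parental genotype weights for the special case in the table above
# (1 AA : 2 Aa : 1 aa, i.e. both allele frequencies equal to 1/2).
PARENTS = {"AA": 1, "Aa": 2, "aa": 1}

def offspring(mother, father):
    """Return Mendelian offspring genotype probabilities for one cross."""
    probs = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    # Each parent passes on one of its two alleles with probability 1/2,
    # so each of the four allele pairings has probability 1/4.
    for a, b in product(mother, father):
        genotype = "".join(sorted(a + b))  # writes "aA" as "Aa"
        probs[genotype] += Fraction(1, 4)
    return probs

# Sum the weighted offspring over every ordered mother x father pair.
totals = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
for (m, wm), (f, wf) in product(PARENTS.items(), repeat=2):
    for genotype, prob in offspring(m, f).items():
        totals[genotype] += wm * wf * prob

print({g: int(t) for g, t in totals.items()})  # {'AA': 4, 'Aa': 8, 'aa': 4}
```

Changing the weights in `PARENTS` is a quick way to see that the 3:1 phenotype ratio is specific to this starting point.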
In modern population genetic terms we tend to think about the random union of gametes and work with allele frequencies directly rather than keeping track of each genotype pair (usually this works well but it can be problematic in certain cases where the parental genotypes matter such as Medea systems). If p is the frequency of an allele (A) it is fairly easy to show that the expected genotype frequencies (f) are:
- [math]f_{AA}=p^2[/math]
- [math]f_{Aa}=2p(1-p)[/math]
- [math]f_{aa}=(1-p)^2[/math],
and that p can take any value from zero to one.
It is also fairly easy to show that the expected allele frequency in the next generation (p') is equal to the allele frequency in the current generation (p).
[math]p' = f_{AA}+f_{Aa}/2 = p^2 + 2 p (1-p) / 2 = p^2 + p (1-p) = p^2 + p - p^2 = p[/math]
Thus, in the absence of additional forces, neither the allele nor the genotype frequencies are expected to change over time in a deterministic fashion.
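As a quick numerical check of the algebra above, here is a minimal Python sketch. The function names are my own, and exact fractions are used so that the equality p' = p is tested without rounding error.

```python
from fractions import Fraction

def genotype_frequencies(p):
    """Expected (f_AA, f_Aa, f_aa) under random union of gametes."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

def next_allele_frequency(f_AA, f_Aa):
    """Allele frequency of A among the gametes of this generation."""
    return f_AA + f_Aa / 2

for p in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)):
    f_AA, f_Aa, f_aa = genotype_frequencies(p)
    assert f_AA + f_Aa + f_aa == 1                    # frequencies sum to one
    assert next_allele_frequency(f_AA, f_Aa) == p     # p' = p
    print(p, f_AA, f_Aa, f_aa)
```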
Third Paragraph
However, this is not how Hardy framed the discussion beginning in the third paragraph. First of all, he used p, q, and r to represent AA, Aa, and aa genotype frequencies. This can quickly get confusing when you are used to using these variables for allele frequencies. I like to either replace them with x, y, and z or write them out more completely (e.g., [math]f_{AA}[/math]) to keep things clearer. Second, he used q to represent half of the heterozygote frequency, [math]f_{Aa}=2q_h[/math] (I am indicating Hardy's definition of q by subscripting with an h to avoid confusion with allele frequencies). Note that in this third paragraph he is already anticipating the effects of genetic drift ("numbers are fairly large"), non-random mating ("mating may be regarded as random"), unequal allele frequencies between the sexes ("the sexes are evenly distributed"), and selection ("all are equally fertile") on deviations from Hardy-Weinberg genotype predictions.
So the frequency of genotypes in one generation is expected to be:
- [math]f_{AA}=p_h[/math]
- [math]f_{Aa}=2q_h[/math]
- [math]f_{aa}=r_h[/math].
He then states that the frequency of genotypes in the next generation is expected to be:
- [math]f_{AA}'=(p_h+q_h)^2[/math]
- [math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math]
- [math]f_{aa}'=(q_h+r_h)^2[/math].
And, this is equivalent to [math]p_{h,1} : 2q_{h,1} : r_{h,1}[/math], where the subscripted one indicates the next generation.
AA Homozygotes
Let's start with the first part, [math]f_{AA}'=(p_h+q_h)^2[/math]. This is similar to the upper left of the table we constructed above.
 | AA | Aa |
AA | AA | AA, Aa |
Aa | AA, Aa | AA, 2 Aa, aa |
However, [math]q_h = f_{Aa}/2[/math] so let's redraw it and multiply out the halves for the heterozygote frequencies.
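 | [math]f_{AA}[/math] | [math]f_{Aa}/2[/math] |
[math]f_{AA}[/math] | [math]f_{AA}^2[/math] | [math]f_{AA} f_{Aa}/2[/math] |
[math]f_{Aa}/2[/math] | [math]f_{AA} f_{Aa}/2[/math] | [math]f_{Aa}^2/4[/math] |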
All of the offspring of AA x AA crosses are AA. Half of the offspring of AA x Aa crosses are AA. And a quarter of the offspring of Aa x Aa crosses are AA. There are no other crosses that can result in AA offspring. So the frequency of AA individuals in the next generation ([math]f_{AA}'[/math]) is the sum, over these crosses, of the expected frequency of each cross multiplied by the fraction of its offspring that are AA.
[math]f_{AA}' = f_{AA}^2 + f_{AA} f_{Aa} + f_{Aa}^2/4= f_{AA}^2 + 2f_{AA} f_{Aa}/2 + f_{Aa}^2/4 = (f_{AA} + f_{Aa}/2)^2 = (p_h+q_h)^2[/math]
This logic can be applied to the other two genotypes as well in the full table of crosses.
Aa Heterozygotes
For the heterozygotes produced in the next generation, Hardy's expression is [math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math].
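 | [math]f_{AA}[/math] | [math]f_{Aa}/2[/math] |
[math]f_{Aa}/2[/math] | [math]2f_{AA}f_{Aa}/2[/math] | [math]f_{Aa}^2/2[/math] |
[math]f_{aa}[/math] | [math]2f_{AA}f_{aa}[/math] | [math]2f_{aa}f_{Aa}/2[/math] |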
In words, there are two ways to get heterozygotes from AA x Aa crosses (i.e., either the mother could be the heterozygote or the father) and half of the offspring will be heterozygous. There is one way to get heterozygotes from the Aa x Aa cross and half of those offspring will be heterozygous ([math]2f_{Aa}^2/4=f_{Aa}^2/2[/math]). There are two ways to get heterozygous offspring from AA x aa crosses and all of the offspring will be heterozygous, etc.
Just to make it easier to write, I am going to briefly use x for the frequency of AA genotypes, y for Aa, and z for aa. We can write down the production of heterozygotes in the table as follows, writing every term with an explicit factor of two:
[math]y' = 2xy/2 + 2xz + 2y^2/4 + 2zy/2[/math].
Take the two out to the side.
[math]y' = 2(xy/2 + xz + y^2/4 + zy/2)[/math]
Then factor the remaining part (expanding the product below term by term recovers the four terms above).
[math]y' = 2(x+y/2)(y/2+z)[/math]
This is equal to Hardy's
[math]2q_{h,1} = 2(p_h+q_h)(q_h+r_h)[/math]
aa Homozygotes
The calculation for the aa genotypes is identical to the AA genotypes above (with the AA and aa genotypes interchanged) because of symmetry.
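The three results can also be confirmed by brute force. The sketch below sums offspring over the full table of crosses and compares the totals with Hardy's expressions. The segregation table is standard Mendelian inheritance; the starting frequencies are arbitrary values chosen for illustration, and the names are my own.

```python
from fractions import Fraction
from itertools import product

H, Q = Fraction(1, 2), Fraction(1, 4)

# Mendelian offspring probabilities (AA, Aa, aa) for each unordered cross.
SEGREGATION = {
    ("AA", "AA"): (1, 0, 0),
    ("AA", "Aa"): (H, H, 0),
    ("AA", "aa"): (0, 1, 0),
    ("Aa", "Aa"): (Q, H, Q),
    ("Aa", "aa"): (0, H, H),
    ("aa", "aa"): (0, 0, 1),
}

def next_generation(f_AA, f_Aa, f_aa):
    """Sum offspring over every ordered mother x father pair (random mating)."""
    freqs = {"AA": f_AA, "Aa": f_Aa, "aa": f_aa}
    result = [Fraction(0), Fraction(0), Fraction(0)]
    for mother, father in product(freqs, repeat=2):
        key = tuple(sorted((mother, father)))      # look up the unordered cross
        pair_freq = freqs[mother] * freqs[father]  # frequency of this mating
        for i, prob in enumerate(SEGREGATION[key]):
            result[i] += pair_freq * prob
    return tuple(result)

# Arbitrary starting genotype frequencies (an assumption for illustration).
f_AA, f_Aa, f_aa = Fraction(1, 5), Fraction(3, 10), Fraction(1, 2)
p_h, q_h, r_h = f_AA, f_Aa / 2, f_aa               # Hardy's notation

new_AA, new_Aa, new_aa = next_generation(f_AA, f_Aa, f_aa)
assert new_AA == (p_h + q_h) ** 2
assert new_Aa == 2 * (p_h + q_h) * (q_h + r_h)
assert new_aa == (q_h + r_h) ** 2
print(new_AA, new_Aa, new_aa)  # 49/400 91/200 169/400
```

Counting ordered pairs is what produces the "two ways" factor for crosses between different genotypes.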
Fourth Paragraph
Hardy discusses the situation when the genotype proportions are equal between generations. He claims that it is "easy to see" that this is [math]q_h^2 = p_h r_h[/math].
What Hardy was implying was the case where [math]q_{h,1} = q_h[/math], or the frequency of heterozygotes in the next generation is equal to the frequency in the current generation. Set these two equal to each other:
[math]f_{Aa}=2q_h[/math]
[math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math]
So
[math]q_h = (p_h+q_h)(q_h+r_h)[/math].
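Expanding the right-hand side to [math]p_h q_h + p_h r_h + q_h^2 + q_h r_h[/math] and moving every term except [math]q_h^2[/math] to the left, this can be rearranged into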
[math]q_h(1-p_h-r_h) - p_h r_h= q_h^2[/math].
The trick here is to recognize what [math]1-p_h-r_h[/math] is. The frequencies of all the genotypes must add up to one, so one minus the frequencies of the two homozygotes is the frequency of the heterozygotes.
[math]1-p_h-r_h = 2q_h[/math]
Substituting this back in gives
[math]2q_h^2 - p_h r_h= q_h^2[/math].
Rearrange this to
[math]2q_h^2 - q_h^2 = p_h r_h[/math]
and simplify
[math]q_h^2 = p_h r_h[/math].
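A small Python sketch can illustrate the condition. The starting points are arbitrary values I picked so that [math]p_h + 2q_h + r_h = 1[/math] and the condition does not hold initially; `hardy_step` and `at_equilibrium` are illustrative names, not Hardy's.

```python
from fractions import Fraction

def hardy_step(p_h, q_h, r_h):
    """One generation of random mating in Hardy's notation (f_Aa = 2 * q_h)."""
    return (p_h + q_h) ** 2, (p_h + q_h) * (q_h + r_h), (q_h + r_h) ** 2

def at_equilibrium(p_h, q_h, r_h):
    """Hardy's stability condition q^2 = p * r."""
    return q_h ** 2 == p_h * r_h

# Arbitrary starting points, none of which satisfies the condition.
starts = [
    (Fraction(1, 2), Fraction(0), Fraction(1, 2)),
    (Fraction(1, 10), Fraction(2, 5), Fraction(1, 10)),
    (Fraction(7, 10), Fraction(1, 20), Fraction(1, 5)),
]

for start in starts:
    assert start[0] + 2 * start[1] + start[2] == 1
    first = hardy_step(*start)
    second = hardy_step(*first)
    # Prints "False True True" for each starting point: the condition fails
    # at the start, holds after one generation, and nothing changes after that.
    print(at_equilibrium(*start), at_equilibrium(*first), first == second)
```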
Really Hardy? Yes, this is "easy to see" in the sense that it uses simple mathematics but it is not "easy to see" in the sense of being obvious or intuitive to a biologist, which is the tone of the rest of the paper.
The main point here is that, for a given allele frequency, there is a single prediction for the genotype proportions in the next generation regardless of the starting genotype frequencies, and this prediction does not change over time. So once it is reached (in the second generation at most, i.e., after one round of random mating) the genotype proportions are not expected to depart from this set of frequencies.
Let's also rewrite this as
[math]\left(\frac{f_{Aa}}{2}\right)^2=\frac{f_{Aa}^2}{4}=f_{AA} f_{aa}[/math].
And convert it to allele frequency terms.
[math]\left(\frac{2p(1-p)}{2}\right)^2 = p^2 (1-p)^2[/math]
Simplifying the heterozygote side of the equation we get
[math]p^2 (1-p)^2 = p^2 (1-p)^2[/math].
So squaring the frequency of half of the heterozygotes is the same as multiplying the frequencies of the two homozygotes when at equilibrium. I suspect that this is a very indirect way of saying that the genotypes are produced by multiplying the allele frequencies, i.e., random union of gametes, and there is not an excess or deficiency of heterozygosity.
Fifth Paragraph
Hardy then gives a numerical example to illustrate. He starts off with no heterozygotes and an allele frequency (A) of 1/10,001, where there is one AA individual and 10,000 aa individuals in the population.
[math]p_h = 1, q_h = 0, r_h = 10,000[/math]
In the next generation:
- [math]p_{h,1} = (p_h+q_h)^2 = (1+0)^2 = 1[/math]
- [math]q_{h,1} = (p_h+q_h)(q_h+r_h) = (1+0)(0+10,000) = 10,000[/math]
- [math]r_{h,1} = (q_h + r_h)^2 = (0+10,000)^2 = 100,000,000[/math]
So the fraction of people affected with brachydactyly has roughly doubled (the two rare alleles in a single homozygous individual have been distributed to a larger number of heterozygotes). In the first generation it was 1/10,001 = 0.00009999. In the second generation it is (remember that [math]q_h[/math] is half of the heterozygotes) 20,001/100,020,001 = 0.00019997. However, after this point the calculation gives the same result and the proportion will not change.
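The arithmetic in this example can be reproduced with exact integers. This is a sketch with my own function names; it treats Hardy's numbers as ratios rather than normalised frequencies, which works because each of his expressions is a product of two sums, so rescaling all three inputs by a common factor leaves the output ratio unchanged.

```python
from fractions import Fraction

def hardy_step(p_h, q_h, r_h):
    """Hardy's next-generation ratio p1 : 2*q1 : r1 (not normalised)."""
    return (p_h + q_h) ** 2, (p_h + q_h) * (q_h + r_h), (q_h + r_h) ** 2

def affected_fraction(p_h, q_h, r_h):
    """Fraction showing the dominant trait: (AA + Aa) / total, with f_Aa = 2 * q_h."""
    return Fraction(p_h + 2 * q_h, p_h + 2 * q_h + r_h)

gen0 = (1, 0, 10_000)          # one AA individual, no Aa, 10,000 aa
gen1 = hardy_step(*gen0)
gen2 = hardy_step(*gen1)

print(gen1)                                                   # (1, 10000, 100000000)
print(affected_fraction(*gen0))                               # 1/10001
print(affected_fraction(*gen1))                               # 20001/100020001
print(affected_fraction(*gen2) == affected_fraction(*gen1))   # True
```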
...
One thing that might initially be misleading about this paragraph is that he uses a colon to indicate both a ratio of numbers and a proportion out of the total (division).
... to be continued.