Hardy 1908

From Genetics Wiki

Citation

Hardy, G. H. (1908) Mendelian proportions in a mixed population. Science 28(706): 49-50.

Links

Notes

First Paragraph

The tone of the first paragraph is a bit amusing, at least to a modern reader. It is hard to know how much of this is overt criticism and how much is the norm of formal language in 1908. However, this sentence is unambiguous: "I should have expected the very simple point which I wish to make to have been familiar to biologists". Hardy clearly felt that the question of expected genotype proportions was far too trivial for him to waste time on, yet he was forced to address it by the blatant misunderstandings of others. The irony is that this is probably the result for which he is best known today. Elsewhere he continues in the same vein with "a little mathematics of the multiplication-table type is enough to show ...", "it is easy to see ...", and "there is not the slightest foundation for the idea ...".

Second Paragraph

At the time there was a debate about whether Mendelian genetics was generally valid as an explanation of biological heritability. This was part of the Biometric-Mendelian Debate (or mutationists versus selectionists), which was not resolved until the Modern Synthesis of biology later in the 20th century (some divisions have lasted until today, such as Quantitative Genetics versus Population Genetics). One objection to Mendelian genetics, raised by Yule, is that phenotypes in natural populations do not follow Mendelian proportions. He used the example of brachydactyly in humans (shortened fingers and/or toes, one form of which is a dominant trait), observing that the ratio of brachydactylic to unaffected individuals is not three to one.

Gametes | A  | a
A       | AA | Aa
a       | Aa | aa

The Punnett square above for an F2 cross shows the expected offspring from two heterozygous parents (whose alleles are listed at the top of the columns and the start of the rows, and are combined to give each child's genotype): three brachydactylic offspring (genotypes AA and Aa) to one unaffected child (aa).
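The Punnett square logic can be sketched in a few lines of Python. This is a minimal illustration, not anything from Hardy or Yule: the function names (`genotype`, `punnett`, `phenotype`) are hypothetical, alleles are single letters, and brachydactyly is assumed to behave as a simple dominant trait (any copy of A gives the affected phenotype).

```python
from collections import Counter
from itertools import product


def genotype(allele1: str, allele2: str) -> str:
    """Combine two alleles into a genotype, uppercase (dominant) allele first."""
    return "".join(sorted(allele1 + allele2))


def punnett(mother: str, father: str) -> Counter:
    """Count offspring genotypes over every pairing of a maternal and a paternal allele."""
    return Counter(genotype(m, f) for m, f in product(mother, father))


def phenotype(g: str) -> str:
    # Assumed dominant trait: a single A allele is enough to be affected.
    return "affected" if "A" in g else "unaffected"


f2 = punnett("Aa", "Aa")  # the F2 cross: two heterozygous parents
phenotypes = Counter()
for g, n in f2.items():
    phenotypes[phenotype(g)] += n

print(dict(sorted(f2.items())))          # {'AA': 1, 'Aa': 2, 'aa': 1}
print(dict(sorted(phenotypes.items())))  # {'affected': 3, 'unaffected': 1}
```

Sorting the two alleles simply gives each heterozygote a single spelling ("Aa"), so reciprocal combinations are counted together.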

An F2 cross is a very artificial situation: pure-breeding "parental" lines (AA and aa in this example) are crossed to generate heterozygous offspring (Aa), and these are then crossed with each other to generate the F2s. It is unlikely that Yule was thinking of only this scenario. In a natural population, offspring would be generated from all possible crosses (AA x AA, AA x Aa, AA x aa, Aa x Aa, Aa x aa, and aa x aa), only one of which is the F2 scenario. If both allele frequencies were precisely 1/2 (so that genotypes are in 1 AA : 2 Aa : 1 aa proportions), then the offspring of all of these crosses taken together would also show a 3:1 ratio of phenotypes.

Parents | AA             | 2 Aa                   | aa
AA      | AA             | 1/2 AA, 1/2 Aa         | Aa
2 Aa    | 1/2 AA, 1/2 Aa | 1/4 AA, 1/2 Aa, 1/4 aa | 1/2 Aa, 1/2 aa
aa      | Aa             | 1/2 Aa, 1/2 aa         | aa

Multiplying out the twos from the heterozygous parents gives:

Parents | AA     | Aa           | aa
AA      | AA     | AA, Aa       | Aa
Aa      | AA, Aa | AA, 2 Aa, aa | Aa, aa
aa      | Aa     | Aa, aa       | aa

The total proportions of offspring with the three genotypes are 4 AA, 8 Aa, and 4 aa: a 1:2:1 ratio of genotypes and a 3:1 ratio of phenotypes (AA and Aa to aa). Within each of Mendel's crosses the allele frequencies were one half. This seems to have shaped Yule's expectation that natural populations would also tend toward allele frequencies of one half over time.
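The bookkeeping in these tables can also be checked by brute force. The sketch below uses hypothetical helpers (`gametes`, `cross`, `next_generation` are illustrative names) that assume Mendelian segregation and random mating, weight every ordered mother x father cross by the product of the parents' genotype frequencies, and use exact fractions so the 1:2:1 result is not blurred by rounding.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype: str) -> dict:
    """Mendelian segregation: each of the parent's two alleles goes into half of its gametes."""
    out = {}
    for allele in genotype:
        out[allele] = out.get(allele, 0) + Fraction(1, 2)
    return out


def cross(mother: str, father: str) -> dict:
    """Expected offspring genotype proportions from a single mother x father cross."""
    offspring = {}
    for (m, pm), (f, pf) in product(gametes(mother).items(), gametes(father).items()):
        g = "".join(sorted(m + f))
        offspring[g] = offspring.get(g, 0) + pm * pf
    return offspring


def next_generation(freqs: dict) -> dict:
    """Random mating: weight every ordered cross by the product of its parents' frequencies."""
    result = {"AA": Fraction(0), "Aa": Fraction(0), "aa": Fraction(0)}
    for mother, father in product(freqs, repeat=2):
        weight = freqs[mother] * freqs[father]
        for g, share in cross(mother, father).items():
            result[g] += weight * share
    return result


# Parents in 1 AA : 2 Aa : 1 aa proportions, i.e. allele frequencies of 1/2.
parents = {"AA": Fraction(1, 4), "Aa": Fraction(1, 2), "aa": Fraction(1, 4)}
offspring = next_generation(parents)
print({g: int(f * 16) for g, f in offspring.items()})  # {'AA': 4, 'Aa': 8, 'aa': 4}
print(offspring["AA"] + offspring["Aa"])               # 3/4 (affected, if A is dominant)
```

Multiplying by 16 converts the proportions back into counts over the 16 equally weighted parental combinations of the multiplied-out table.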

In modern population genetic terms we tend to think about the random union of gametes and work with allele frequencies directly rather than keeping track of each genotype pair (usually this works well, but it can be problematic in certain cases where the parental genotypes matter, such as Medea systems). If p is the frequency of an allele (A), it is fairly easy to show that the expected genotype frequencies (f) are:

  • [math]f_{AA}=p^2[/math]
  • [math]f_{Aa}=2p(1-p)[/math]
  • [math]f_{aa}=(1-p)^2[/math],

and that p can take any value from zero to one.
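One way to see these expressions is to pair gametes directly. The following is a minimal sketch assuming random union of gametes in a very large population, so that an egg and a sperm are each drawn independently with P(A) = p; the function names are illustrative, not standard.

```python
from fractions import Fraction


def random_union_of_gametes(p):
    """Genotype frequencies from pairing an egg and a sperm drawn independently with P(A) = p."""
    gamete = {"A": p, "a": 1 - p}
    freqs = {"AA": 0, "Aa": 0, "aa": 0}
    for egg, p_egg in gamete.items():
        for sperm, p_sperm in gamete.items():
            freqs["".join(sorted(egg + sperm))] += p_egg * p_sperm
    return freqs


def hardy_weinberg(p):
    """Closed-form expectations: p^2, 2p(1-p), (1-p)^2."""
    return {"AA": p ** 2, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}


# p can be anything from zero to one; the two approaches agree exactly.
for p in (Fraction(0), Fraction(1, 10), Fraction(1, 2), Fraction(7, 9), Fraction(1)):
    assert random_union_of_gametes(p) == hardy_weinberg(p)
    assert sum(hardy_weinberg(p).values()) == 1

print(hardy_weinberg(Fraction(1, 10)))
# {'AA': Fraction(1, 100), 'Aa': Fraction(9, 50), 'aa': Fraction(81, 100)}
```

The heterozygote term picks up its factor of two because an A egg with an a sperm and an a egg with an A sperm both give Aa.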

It is also fairly easy to show that the expected allele frequency in the next generation (p') is equal to the allele frequency in the current generation (p).

[math]p' = f_{AA}+f_{Aa}/2 = p^2 + 2 p (1-p) / 2 = p^2 + p (1-p) = p^2 + p - p^2 = p[/math]

Thus, in the absence of additional forces, neither the allele nor the genotype frequencies are expected to change over time in a deterministic fashion.
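The derivation of p' can be checked numerically by alternating the two steps: genotype frequencies from the allele frequency, then the allele frequency back from the genotypes. This is a small sketch under the same assumptions (random mating, no selection, mutation, migration, or drift); `allele_frequency` and `iterate` are illustrative names.

```python
from fractions import Fraction


def genotype_frequencies(p):
    """Hardy-Weinberg genotype frequencies for allele frequency p."""
    return {"AA": p ** 2, "Aa": 2 * p * (1 - p), "aa": (1 - p) ** 2}


def allele_frequency(freqs):
    """p' = f_AA + f_Aa / 2: all alleles carried by AA plus half of those carried by Aa."""
    return freqs["AA"] + freqs["Aa"] / 2


def iterate(p, generations):
    """Allele frequency in each of the following generations, starting from p."""
    history = [p]
    for _ in range(generations):
        history.append(allele_frequency(genotype_frequencies(history[-1])))
    return history


history = iterate(Fraction(3, 10), 5)
print(all(p == Fraction(3, 10) for p in history))  # True
```

Exact fractions make the point that p' = p holds exactly; with floating-point numbers the same loop would agree only up to rounding error.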

Third Paragraph

However, this is not how Hardy framed the discussion beginning in the third paragraph. First of all, he used p, q, and r to represent the AA, Aa, and aa genotype frequencies. This can quickly get confusing when you are used to using these variables for allele frequencies. I like to either replace them with x, y, and z or write them out more completely (e.g., [math]f_{AA}[/math]) to keep things clearer. Second, he used q to represent half of the heterozygote frequency, [math]f_{Aa}=2q_h[/math] (I indicate Hardy's definition of q with a subscript h to avoid confusion with allele frequencies). Note that in this third paragraph he is already anticipating the effects of genetic drift ("numbers are fairly large"), non-random mating ("mating may be regarded as random"), unequal allele frequencies between the sexes ("the sexes are evenly distributed"), and selection ("all are equally fertile") on deviations from Hardy-Weinberg genotype predictions.

So the proportions of genotypes in the starting generation are defined as:

  • [math]f_{AA}=p_{h,0}[/math]
  • [math]f_{Aa}=2q_{h,0}[/math]
  • [math]f_{aa}=r_{h,0}[/math].

He then states that the genotypes in the next generation are expected to be:

  • [math]f_{AA}'=(p_h+q_h)^2[/math]
  • [math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math]
  • [math]f_{aa}'=(q_h+r_h)^2[/math].

And this is equivalent to [math]p_{h,1} : 2q_{h,1} : r_{h,1}[/math], where the subscripted one indicates the next generation. (Hardy does not use a subscripted zero for the starting generation, since it is implied, but I use one here for greater clarity.)

AA Homozygotes

Let's start with the first part, [math]f_{AA}'=(p_h+q_h)^2[/math]. This is similar to the upper left of the table we constructed above.

Parents | AA     | Aa
AA      | AA     | AA, Aa
Aa      | AA, Aa | AA, 2 Aa, aa

However, [math]q_h = f_{Aa}/2[/math], so let's redraw the table with genotype frequencies as the row and column labels, using [math]f_{Aa}/2[/math] for the heterozygotes, and multiply them out.

Parents | fAA       | fAa/2
fAA     | fAA²      | fAA·fAa/2
fAa/2   | fAA·fAa/2 | fAa²/4

All of the offspring of AA x AA crosses are AA. Half of the offspring of AA x Aa crosses are AA, and a quarter of the offspring of Aa x Aa crosses are AA. No other crosses can produce AA offspring. So the frequency of AA individuals in the next generation ([math]f_{AA}'[/math]) is the sum, over these crosses, of the expected frequency of each cross multiplied by the probability that it produces an AA offspring (the AA x Aa cross appears twice, once for each parent being the heterozygote).

[math]f_{AA}' = f_{AA}^2 + f_{AA} f_{Aa} + f_{Aa}^2/4= f_{AA}^2 + 2f_{AA} f_{Aa}/2 + f_{Aa}^2/4 = (f_{AA} + f_{Aa}/2)^2 = (p_h+q_h)^2[/math]

This logic can be applied to the other two genotypes as well in the full table of crosses.
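The AA bookkeeping can be checked for arbitrary starting genotype frequencies, including ones far from Hardy-Weinberg proportions. This sketch sums the three AA-producing crosses as described above and compares the result with Hardy's (p_h + q_h)^2, where q_h = f_Aa/2; the function names are hypothetical, and random mating is assumed. By symmetry, swapping the roles of the two homozygotes gives the aa result.

```python
from fractions import Fraction


def aa_by_crosses(f_AA, f_Aa):
    """Next-generation AA frequency, summed cross by cross (random mating assumed)."""
    return (f_AA * f_AA * 1                     # AA x AA: all offspring AA
            + 2 * f_AA * f_Aa * Fraction(1, 2)  # AA x Aa, either parent heterozygous: half AA
            + f_Aa * f_Aa * Fraction(1, 4))     # Aa x Aa: a quarter AA


def aa_hardy(p_h, q_h):
    """Hardy's form (p_h + q_h)^2, with q_h = f_Aa / 2."""
    return (p_h + q_h) ** 2


starts = [  # (f_AA, f_Aa, f_aa); the first has no heterozygotes at all
    (Fraction(1, 2), Fraction(0), Fraction(1, 2)),
    (Fraction(1, 5), Fraction(3, 5), Fraction(1, 5)),
    (Fraction(7, 10), Fraction(1, 10), Fraction(1, 5)),
]
for f_AA, f_Aa, f_aa in starts:
    assert aa_by_crosses(f_AA, f_Aa) == aa_hardy(f_AA, f_Aa / 2)
    # Symmetry: the aa calculation is the same with the homozygotes swapped.
    assert aa_by_crosses(f_aa, f_Aa) == aa_hardy(f_aa, f_Aa / 2)
print("AA and aa cross sums match Hardy's expressions")
```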

Aa Heterozygotes

Let's look back at the full table of possible genotype crosses and pick out the crosses that produce heterozygotes.

Parents | AA             | Aa                     | aa
AA      | AA             | 1/2 AA, 1/2 Aa         | Aa
Aa      | 1/2 AA, 1/2 Aa | 1/4 AA, 1/2 Aa, 1/4 aa | 1/2 Aa, 1/2 aa
aa      | Aa             | 1/2 Aa, 1/2 aa         | aa

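Weighting each heterozygote-producing cross by its frequency (counting reciprocal crosses such as AA x Aa and Aa x AA separately) and adding them up gives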

[math]2 f_{Aa}^2/4 + 2 f_{AA} f_{aa} + 2 f_{AA} f_{Aa}/2 + 2 f_{aa} f_{Aa} / 2[/math]

Above, the Aa x Aa contribution [math]f_{Aa}^2/2[/math] is written as [math]2 f_{Aa}^2/4[/math] so that every term carries a factor of two.

[math]2 (f_{Aa}^2/4 + f_{AA} f_{aa} + f_{AA} f_{Aa}/2 + f_{aa} f_{Aa}/2)[/math]

[math]2 (f_{Aa}/2 + f_{AA}) (f_{Aa}/2 + f_{aa})[/math]

Converting this to Hardy's notation, the heterozygotes produced in the next generation are [math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math].

This can be represented by a smaller table.

Parents | fAA        | fAa/2
fAa/2   | 2fAA·fAa/2 | fAa²/2
faa     | 2fAA·faa   | 2faa·fAa/2

In words, there are two ways to get heterozygotes from AA x Aa crosses (either the mother or the father could be the heterozygote), and half of the offspring will be heterozygous (2fAA·fAa/2). There is one way to get heterozygotes from the Aa x Aa cross, and half of those offspring will be heterozygous (fAa²/2 = 2fAa²/4; the second form rescales it to carry a factor of two like the other entries). There are two ways to get heterozygous offspring from AA x aa crosses, and all of those offspring will be heterozygous, and so on.

In case it helps, here is an alternative that briefly uses x for the frequency of AA genotypes, y for Aa, and z for aa. We can write down the production of heterozygotes in the table as follows (scaling everything so that it is multiplied by two):

[math]y' = 2xy/2 + 2xz + 2y^2/4 + 2zy/2[/math].

Take the two out to the side.

[math]y' = 2(xy/2 + xz + y^2/4 + zy/2)[/math]

Then factor the remaining part.

[math]y' = 2(x+y/2)(y/2+z)[/math]

This is equal to Hardy's

[math]2q_{h,1} = 2(p_h+q_h)(q_h+r_h)[/math]
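The factoring step is easy to get wrong by hand, so here is a quick exact check that the cross-by-cross sum, the factored form 2(x + y/2)(y/2 + z), and Hardy's 2(p_h + q_h)(q_h + r_h) agree. It is a sketch that tests a grid of rational genotype frequencies (multiples of 1/12 that sum to one) rather than a symbolic proof; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def het_by_crosses(x, y, z):
    """Heterozygotes summed cross by cross, with x = f_AA, y = f_Aa, z = f_aa."""
    return (2 * x * y * Fraction(1, 2)     # AA x Aa, both orders: half Aa
            + 2 * x * z                    # AA x aa, both orders: all Aa
            + y * y * Fraction(1, 2)       # Aa x Aa: half Aa
            + 2 * z * y * Fraction(1, 2))  # aa x Aa, both orders: half Aa


def het_factored(x, y, z):
    return 2 * (x + y / 2) * (y / 2 + z)


def het_hardy(p_h, q_h, r_h):
    """Hardy's 2 q_{h,1}; note that q_h is half of the heterozygote frequency."""
    return 2 * (p_h + q_h) * (q_h + r_h)


grid = [Fraction(n, 12) for n in range(13)]
checked = 0
for x, y in product(grid, repeat=2):
    z = 1 - x - y
    if z < 0:
        continue  # not a valid set of genotype proportions
    expected = het_by_crosses(x, y, z)
    assert expected == het_factored(x, y, z) == het_hardy(x, y / 2, z)
    checked += 1
print(f"identity holds at {checked} grid points")  # identity holds at 91 grid points
```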

aa Homozygotes

The calculation for the aa genotypes is identical to the AA genotypes above (with the AA and aa genotypes interchanged) because of symmetry.

Fourth Paragraph

Hardy discusses the situation in which the genotype proportions are equal between generations. He claims that it is "easy to see" that the condition for this is [math]q_h^2 = p_h r_h[/math].

What Hardy was implying was the case where [math]q_{h,1} = q_{h,0}[/math], or the frequency of heterozygotes in the next generation is equal to the frequency in the current generation. Set these two equal to each other:

[math]f_{Aa}=2q_h[/math]

[math]f_{Aa}'=2(p_h+q_h)(q_h+r_h)[/math]

So

[math]q_h = (p_h+q_h)(q_h+r_h)[/math].

[math]q_h = p_hq_h + p_hr_h + q_h^2 + q_hr_h[/math].

[math]q_h - p_hq_h - q_hr_h - p_hr_h = q_h^2 [/math].

This can be rearranged into

[math]q_h(1-p_h-r_h) - p_h r_h= q_h^2[/math].

The trick here is to recognize what [math]1-p_h-r_h[/math] is. The frequency of all the genotypes must add up to one. So, one minus the frequency of the two homozygotes is the frequency of the heterozygotes.

[math]1-p_h-r_h = 2q_h[/math]

Substituting this back in gives

[math]2q_h^2 - p_h r_h= q_h^2[/math].

Rearrange this to

[math]2q_h^2 - q_h^2 = p_h r_h[/math]

and simplify

[math]q_h^2 = p_h r_h[/math].

Really Hardy? Yes, this is "easy to see" in the sense that it uses simple mathematics but it is not "easy to see" in the sense of being immediately obvious or intuitive to a biologist, which is the tone of the rest of the paper.

The main point here is that, whatever the starting genotype frequencies, the next generation satisfies this relationship, and the genotype proportions it produces do not change over time. So once they are reached (in the second generation at most), the genotype proportions are not expected to depart from this set of frequencies.

Let's also rewrite this as

[math]\left(\frac{f_{Aa}}{2}\right)^2=\frac{f_{Aa}^2}{4}=f_{AA} f_{aa}[/math].

And convert it to allele frequency terms.

[math]\left(\frac{2p(1-p)}{2}\right)^2 = p^2 (1-p)^2[/math]

Simplifying the heterozygote side of the equation we get

[math]p^2 (1-p)^2 = p^2 (1-p)^2[/math].

So squaring the frequency of half of the heterozygotes is the same as multiplying the frequencies of the two homozygotes when at equilibrium. I suspect that this is a very indirect way of saying that the genotypes are produced by multiplying the allele frequencies, i.e., random union of gametes, and there is not an excess or deficiency of heterozygosity.
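Hardy's claim that this relationship is reached after one round of random mating, and then holds, can be illustrated by iterating his recursion. This is a minimal sketch assuming the three values are proportions (p_h + 2q_h + r_h = 1); `hardy_step` and `at_equilibrium` are illustrative names.

```python
from fractions import Fraction


def hardy_step(p_h, q_h, r_h):
    """One generation of random mating in Hardy's notation (q_h = half the heterozygotes)."""
    return ((p_h + q_h) ** 2, (p_h + q_h) * (q_h + r_h), (q_h + r_h) ** 2)


def at_equilibrium(p_h, q_h, r_h):
    """Hardy's condition q_h^2 = p_h * r_h."""
    return q_h ** 2 == p_h * r_h


starts = [
    (Fraction(1, 2), Fraction(0), Fraction(1, 2)),      # homozygotes only
    (Fraction(0), Fraction(1, 2), Fraction(0)),         # heterozygotes only
    (Fraction(3, 5), Fraction(1, 10), Fraction(1, 5)),
]
for gen0 in starts:
    gen1 = hardy_step(*gen0)
    gen2 = hardy_step(*gen1)
    assert sum(gen1) + gen1[1] == 1  # p + 2q + r still sums to one
    assert at_equilibrium(*gen1)     # q^2 = pr after one generation
    assert gen2 == gen1              # and the proportions then stay put
    print(at_equilibrium(*gen0), [str(f) for f in gen1])
```

None of the three starting points satisfies the condition, yet every one does after a single step.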

Fifth Paragraph

Hardy then gives a numerical example to illustrate. He starts with no heterozygotes and an A allele frequency of 1/10,001: one AA individual and 10,000 aa individuals in the population.

[math]p_h = 1, q_h = 0, r_h = 10,000[/math]

In the next generation:

  • [math]p_{h,1} = (p_h+q_h)^2 = (1+0)^2 = 1[/math]
  • [math]q_{h,1} = (p_h+q_h)(q_h+r_h) = (1+0)(0+10,000) = 10,000[/math]
  • [math]r_{h,1} = (q_h + r_h)^2 = (0+10,000)^2 = 100,000,000[/math]

So the fraction of people affected with brachydactyly has almost doubled (the two rare alleles in a single homozygous individual have been distributed among a larger number of heterozygotes). In the first generation it was 1/10,001 = 0.00009999. In the second generation it is 20,001/100,020,001 = 0.00019997 (remember that [math]q_h[/math] is half of the heterozygotes, so the affected count is [math]p_{h,1} + 2q_{h,1}[/math]). After this point, however, the calculation gives the same proportions and they do not change.

One may wonder why he chose these numbers and this approach. They work out so that the smallest expected number, [math]p_{h,1}[/math], is one and the trait (brachydactyly) is rare.
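Hardy's counts are small enough to reproduce exactly. This sketch applies his recursion to the raw numbers (1, 0, 10,000) and, treating brachydactyly as dominant, computes the affected fraction (p_h + 2q_h)/(p_h + 2q_h + r_h) for three generations; the helper names are hypothetical.

```python
from fractions import Fraction


def hardy_step(p_h, q_h, r_h):
    """Hardy's recursion applied to counts (q_h is half the number of heterozygotes)."""
    return ((p_h + q_h) ** 2, (p_h + q_h) * (q_h + r_h), (q_h + r_h) ** 2)


def affected_fraction(p_h, q_h, r_h):
    """Dominant trait: AA (p_h) and Aa (2 * q_h) individuals are affected."""
    return Fraction(p_h + 2 * q_h, p_h + 2 * q_h + r_h)


gen1 = (1, 0, 10_000)
gen2 = hardy_step(*gen1)
gen3 = hardy_step(*gen2)
print(gen2)  # (1, 10000, 100000000)
for label, gen in (("first", gen1), ("second", gen2), ("third", gen3)):
    frac = affected_fraction(*gen)
    print(label, frac, f"{float(frac):.8f}")
```

The first generation gives 1/10001 (0.00009999); the second and third both reduce to 20001/100020001 (0.00019997), because the third-generation counts are just the second-generation counts scaled by a common factor.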

This also works if we convert the genotype counts to proportions to allow for any population size.

[math]p_h = 9.999\times10^{-5}, q_h = 0, r_h = 0.99990001[/math]

In the next generation:

  • [math]p_{h,1} = (p_h+q_h)^2 = 9.998\times10^{-9} [/math]
  • [math]q_{h,1} = (p_h+q_h)(q_h+r_h) = 9.998\times10^{-5}[/math]
  • [math]r_{h,1} = (q_h + r_h)^2 = 0.99980003[/math]

and the proportion of affected individuals out of the total is the same, 0.00019997. These numbers can be used to calculate a third generation; the genotype proportions remain the same, and the affected fraction is not expected to increase further.

You could also start with the allele frequency of A as [math]p = 9.999\times10^{-5}[/math] and get the same result.

  • [math]p^2 = 9.998\times10^{-9} [/math]
  • [math]2p(1-p) = 0.00019996[/math]
  • [math](1-p)^2 = 0.99980003[/math]
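The proportions version and the allele-frequency version can be compared exactly by keeping p = 1/10,001 as a fraction rather than a rounded decimal. This is a minimal sketch; the variable names are illustrative, and Hardy's q_h is compared with p(1 - p), half of the heterozygote frequency 2p(1 - p).

```python
from fractions import Fraction


def hardy_step(p_h, q_h, r_h):
    """Hardy's recursion on proportions (q_h is half the heterozygote frequency)."""
    return ((p_h + q_h) ** 2, (p_h + q_h) * (q_h + r_h), (q_h + r_h) ** 2)


total = 10_001
gen1 = (Fraction(1, total), Fraction(0), Fraction(10_000, total))  # proportions, not counts
gen2 = hardy_step(*gen1)
gen3 = hardy_step(*gen2)

p = Fraction(1, total)                    # allele frequency of A
hw = (p ** 2, p * (1 - p), (1 - p) ** 2)  # Hardy's q_h is half of 2p(1-p)

assert gen2 == hw and gen3 == gen2
for name, value in zip(("p_h", "q_h", "r_h"), gen2):
    print(name, f"{float(value):.10g}")
print("affected", f"{float(gen2[0] + 2 * gen2[1]):.8f}")
```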

One thing that might initially be misleading about this paragraph is that he uses a colon to indicate both a ratio of numbers and a proportion out of the total (division).

Finally, he gives the phenotype frequency for the case where the trait is recessive and notes that this frequency is likewise not predicted to decline over time.

Sixth Paragraph

In the sixth paragraph he briefly summarizes that allele frequencies are not expected to change over time in a direction reflecting the dominance of the phenotype that they are associated with.

Seventh Paragraph

He goes on to discuss the effects of genetic drift, "the effect of small deviations" ... "which will, of course, occur in every generation". He points out that the genotype proportions calculated are expected, but that small shifts away from these values are also expected by random chance. The following generation will be derived from slightly different genotype frequencies and this will be the new "stable" set of genotype proportions for the population.

He clarifies that stability in the sense of expected genotype proportions over time is not stability in the strong sense (where the system tends to return to an equilibrium after small perturbations away from it) but "stability" in a weak sense. Small differences will accumulate and, over time, the system will wander away from its starting point, but it will not wander far in a single generation. Hardy-Weinberg genotype proportions of the current generation are predictive of future generations, but they are less predictive the farther into the future you go.
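To see this weak "stability" concretely, here is a small simulation of genetic drift. It uses the Wright-Fisher model, a standard later formalization that is not in Hardy's paper: each generation, the 2N gene copies of a diploid population are drawn at random from the previous generation's allele frequency. The function name, population size, and seeds are arbitrary choices for illustration.

```python
import random


def wright_fisher(p0, n_individuals, generations, seed=1):
    """Allele-frequency trajectory under a simple Wright-Fisher model of drift."""
    rng = random.Random(seed)    # seeded so runs are reproducible
    n_genes = 2 * n_individuals  # diploid: two gene copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each gene copy in the next generation is A with probability p (binomial sampling).
        count = sum(rng.random() < p for _ in range(n_genes))
        p = count / n_genes
        trajectory.append(p)
    return trajectory


for seed in range(3):
    traj = wright_fisher(0.5, n_individuals=100, generations=50, seed=seed)
    # Typically each step is small, while the value after many generations can be far from 0.5.
    print(f"seed {seed}: start {traj[0]:.2f}, after 1 {traj[1]:.2f}, after 50 {traj[-1]:.2f}")
```

Under this model the expected allele frequency stays at its starting value, but any single population wanders, and with no opposing force it eventually reaches 0 or 1.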

Eighth Paragraph

Hardy wraps up some other loose ends that were alluded to in the third paragraph. He briefly mentions that non-random mating with respect to genotype will distort the system away from these predictions. Also, if the trait is sex-linked, the results will be affected. Note that at this time it was not yet clear that genes were located on chromosomes (this did not become clear until the work of Bridges and Morgan), so it was not obvious that X or Y linkage in humans would need to be considered. For autosomal loci, Hardy-Weinberg proportions are expected after a single generation of random mating, whereas for X-linked genes the proportions are approached asymptotically over many generations. Finally, he points out that if the trait influences fertility (a form of selection), genotype proportions can also be affected. He anticipated a lot of the future of population genetics by thinking about the requirements needed for his statement to be most valid.
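The slower approach for X-linked genes can be illustrated with the standard textbook recursion for allele frequencies in the two sexes (added here for illustration; Hardy does not give it). It assumes random mating and no selection: sons receive their single X from their mother, so their allele frequency equals the previous generation's female frequency, while daughters receive one X from each parent. Because daughters' genotype frequencies are built from both sexes' allele frequencies, they settle only as the two frequencies converge. The function name is hypothetical.

```python
from fractions import Fraction


def x_linked_step(p_female, p_male):
    """One generation of random mating at an X-linked locus.

    Daughters average their mother's and father's allele frequencies;
    sons carry only their mother's X."""
    return (p_female + p_male) / 2, p_female


p_f, p_m = Fraction(1), Fraction(0)  # A fixed in females, absent in males
for generation in range(8):
    print(generation, p_f, p_m, "difference:", p_f - p_m)
    p_f, p_m = x_linked_step(p_f, p_m)
# The sex difference halves and flips sign each generation, so both frequencies
# approach (2 * 1 + 0) / 3 = 2/3 asymptotically rather than in one step.
```

The quantity 2p_f + p_m (two X chromosomes per female, one per male) is unchanged by each step, which is why the frequencies converge to that weighted average.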

PostScript

In a postscript, Hardy notes that Punnett relayed his results to Yule and that Yule agreed with them. He also points out that the special case of Hardy-Weinberg proportions when p = 1/2 (1 AA : 2 Aa : 1 aa) had already been found by Pearson.

Conclusion

Hardy corrected a misunderstanding in early genetics regarding expected genotype frequencies and their (lack of a) relationship with the dominance of the related phenotypes. However, Hardy's explanation is overly terse, slightly obscuring the math underlying his major point (and clarity of communication is an important component of science). To his credit, he anticipated a number of important aspects of population genetics (selection, non-random mating, genetic drift, and sex linkage) which is impressive for the time. However, it also seems that the basic concept of expected genotype proportions can be much more intuitively explained using basic rules of probability applied to the underlying allele frequencies.

Related Publications

A nice perspective is Crow, J. F. (1999). Hardy, Weinberg and language impediments. Genetics, 152(3), 821-825, which focuses more on the history and personalities of Hardy and Weinberg. An earlier publication, Crow, J. F. (1988). Eighty years ago: the beginnings of population genetics. Genetics, 119, 473-476, goes into more detail about Hardy. It wanders away at points from the focus of the paper, but the anecdotes are interesting in their own right.

Another useful perspective is Edwards, A. W. F. (2008). G. H. Hardy (1908) and Hardy–Weinberg equilibrium. Genetics, 179(3), 1143-1150, which focuses on the historical interconnections of many figures in genetics at the time. It also helpfully points the way in decoding [math]q^2 = pr[/math].

Punnett, R. C. (1908). Mendelism in Relation to Disease. Proceedings of the Royal Society of Medicine, 1(Sect. Epidemiol. State Med.), 135-168. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2046543/ This publication contains the statements made by Yule regarding dominance and allele frequencies including the example of brachydactyly.

Pearson, K. (1904). Mathematical Contributions to the Theory of Evolution. XII. On a Generalised Theory of Alternative Inheritance, with Special Reference to Mendel's Laws. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 203(359-371), 53-86. http://rsta.royalsocietypublishing.org/content/203/359-371/53 This includes the special case of Hardy-Weinberg when p=1/2.