Exome Capture
Latest revision as of 03:32, 17 July 2014
Exome capture is a method used to extract and sequence the exome (the collection of all exons) of a genome and to compare exon variation across a sample of individual organisms. This lets studies focus quickly on the small percentage of the genome that is most likely to contain variation that strongly affects phenotypes of interest, and/or estimate rates of codon evolution among a set of species to infer the effects of mutation and selection among genes.
Only a small fraction of many eukaryote genomes is protein-coding exon sequence; in humans this is approximately 1-2% of the genome, spread over approximately 200,000 exons in approximately 20,000-21,000 genes[1]. The average human exon is only 145 bp long and the average gene contains 8.8 exons[2]. Sequencing and analyzing only the exons in a sample of individuals can make an investigation (say, of genotype-phenotype associations) much more efficient, to the extent that the causative variation is indeed found within the exon sequence. In support of this, arguments have been made that the majority of mutations causing genetic disease are rare coding variants of large effect[3].
Exome capture essentially consists of fragmenting a DNA sample, hybridizing the fragments to a microarray carrying oligos that match the exons of the species of interest, washing away the non-hybridized DNA (there are also non-array, in-solution alternatives to this step), and then sequencing the hybridized fraction using next-generation sequencing technology. This identifies genetic variation in the sample that is concentrated in the exon sequences.
The above sketch works for species that already have a sequenced and annotated genome. For organisms without a preexisting genome sequence, one must first generate the exon sequences for downstream hybridization. This can be done by extracting total RNA from the sample, isolating mRNA using magnetic beads with poly-T oligos that hybridize to the mRNA poly-A tail, reverse transcribing the captured RNA into DNA, and sequencing the DNA using next-generation sequencing approaches.
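The figures quoted above can be checked with a little arithmetic. The sketch below multiplies the exon count by the mean exon length and divides by a genome size; the 3.1 Gb genome size is an assumption that does not come from the cited sources, and the function names are only illustrative.

```python
# Back-of-envelope check of the exome figures quoted in the text.
N_EXONS = 200_000      # approximate number of human exons (from the text)
MEAN_EXON_BP = 145     # average human exon length in bp (from the text)
GENOME_BP = 3.1e9      # ASSUMPTION: approximate haploid human genome size in bp


def exome_size_bp(n_exons: int, mean_exon_bp: float) -> float:
    """Total exon sequence in bp, ignoring any overlap between exons."""
    return n_exons * mean_exon_bp


def exome_fraction(n_exons: int, mean_exon_bp: float, genome_bp: float) -> float:
    """Fraction of the genome occupied by exon sequence."""
    return exome_size_bp(n_exons, mean_exon_bp) / genome_bp


size = exome_size_bp(N_EXONS, MEAN_EXON_BP)
frac = exome_fraction(N_EXONS, MEAN_EXON_BP, GENOME_BP)
print(f"exome size: {size / 1e6:.0f} Mb, fraction of genome: {frac:.2%}")
```

Under these assumptions the exome comes to about 29 Mb, a little under 1% of the genome, which sits at the low end of the 1-2% range given above.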
(should comment on specificity, Hodges et al. 2007, and reference species distance, Bi et al. 2012)
Literature Notes
Hodges et al. 2007
Hodges et al. (2007) describe a method using microarrays to capture targeted DNA sequences from a genome and applied it to human protein coding exons.[4] They outline the following steps in Figure 1.
- Genomic DNA preparation and hybrid selection
  - Randomly fragment high-molecular-weight DNA by sonication (to an average size of 500-600 bp) or nebulization.[5]
  - Repair, blunt and phosphorylate ends.
  - Ligate linkers, denature strands and capture with 385k arrayed probes (on exon tiling arrays).
  - Recover selected fragments by thermal elution (heat denaturation) followed by lyophilization and PCR enrichment of ligated strands.
- 1G Sequencing
  - Blunt asymmetric capture linkers. Phosphorylate and adenylate ends. Ligate Illumina 1G-compatible adaptors. Gel purify and PCR enrich[6].
  - Denatured strands are injected into an eight-lane flow cell. Clusters are generated from single molecules by in situ amplification.
  - Sequencing-by-synthesis primer is hybridized and cluster images are scanned with each successive round of fluorescent nucleotide incorporation.
  - Images are processed with Illumina base-calling software and aligned to the reference.
In practice they used six custom NimbleGen arrays, each with 385,000 unique 60-90 nt probes (with an offset of 20 nt) tiling approximately 25,000 exons, plus a seventh array designed to tile alternative transcripts of the genes included on the first six arrays. In all, this corresponded to 44 million tiled bases. The captured DNA was sequenced on an Illumina 1G platform and they found an average enrichment of exon DNA sequence of 323X.
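To make the idea of tiling probes at a 20 nt offset concrete, here is a minimal sketch that lays fixed-length probes across a single exon. It is not the array design procedure from Hodges et al.: the fixed 60 nt probe length is an assumption (the arrays used variable 60-90 nt probes), and `tile_probes` and the coordinates are hypothetical.

```python
from typing import List, Tuple


def tile_probes(exon_start: int, exon_end: int,
                probe_len: int = 60, offset: int = 20) -> List[Tuple[int, int]]:
    """Return half-open (start, end) probe intervals tiling [exon_start, exon_end).

    Probes start every `offset` bases. The last probe may run past the exon
    end into flanking sequence so that every exon base is covered.
    ASSUMPTION: fixed probe length; real designs vary length per probe.
    """
    if probe_len <= 0 or offset <= 0:
        raise ValueError("probe_len and offset must be positive")
    if exon_end <= exon_start:
        raise ValueError("exon_end must be greater than exon_start")
    probes = []
    start = exon_start
    while True:
        probes.append((start, start + probe_len))
        if start + probe_len >= exon_end:
            break  # the exon is fully covered
        start += offset
    return probes


# Example: an exon of the average length quoted earlier (145 bp)
# at made-up coordinates.
for probe in tile_probes(1000, 1145):
    print(probe)
```

With a 20 nt offset and 60 nt probes, interior bases are covered by up to three overlapping probes, which is the redundancy that a tiling design provides.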
Choi et al. 2009
Choi et al. (2009) describe exome sequencing using NimbleGen microarrays and Illumina sequencing. In the process they made an unexpected genetic/clinical diagnosis of congenital chloride diarrhea in an individual suspected of having Bartter syndrome. The diagnosis was based on a homozygous D652N mutation in SLC26A3, at a site highly conserved across species and within a region of identity by descent (IBD). A follow-up identified 5 more patients in a group of 39 individuals with suspected Bartter syndrome and no previously identified mutations; additional testing in 3 of these cases confirmed that they had initially been misdiagnosed.
They used tiled oligonucleotides from 180,000 exons of 18,637 protein-coding genes and obtained 30X coverage of targeted exons in a sample of 10 individuals with a single lane of paired-end 75-base Illumina sequencing. They experimented with shortening the genomic DNA fragment size and adjusting the wash temperature for greater stringency. One sample was further sequenced to 99X coverage, and its genotypes at known SNP positions were compared to genotype calls from the Illumina 370K chip. Sensitivity to make correct heterozygote genotype calls increased quickly from 5X to 20X coverage (95.2% correct at 20X and approximately 100% at 30X) with a per-base sequencing error rate of 0.75% (Illumina chemistry). They estimate a false heterozygote discovery rate of 6 sites per exome.[7]
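Why heterozygote calls improve so quickly with coverage can be illustrated with a simple binomial model. This is a sketch under stated assumptions, not the genotype caller used by Choi et al.: it assumes each read samples one of the two alleles with probability 0.5, that a heterozygote is called only when both alleles are seen at least `min_reads` times (a hypothetical threshold), and that sequencing errors are independent at the 0.75% rate quoted above.

```python
from math import comb


def binom_pmf(n: int, k: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def het_call_prob(depth: int, min_reads: int = 3) -> float:
    """Probability that both alleles of a true heterozygote are each seen
    at least `min_reads` times at the given depth (allele probability 0.5).
    ASSUMPTION: `min_reads` is an illustrative threshold, not from the paper."""
    if depth < 2 * min_reads:
        return 0.0
    return sum(binom_pmf(depth, k, 0.5)
               for k in range(min_reads, depth - min_reads + 1))


def false_het_prob(depth: int, error_rate: float = 0.0075,
                   min_reads: int = 3) -> float:
    """Probability that a homozygous site shows at least `min_reads`
    erroneous reads, treating every error as the same alternate base
    (a crude upper bound on spurious heterozygote calls)."""
    return sum(binom_pmf(depth, k, error_rate)
               for k in range(min_reads, depth + 1))


for depth in (5, 10, 20, 30):
    print(depth, het_call_prob(depth), false_het_prob(depth))
```

The model is not expected to reproduce the reported 95.2% figure, since real coverage varies from site to site and real callers weigh base qualities, but it shows the same qualitative pattern: detection probability rises steeply between 5X and 20X, while the chance of error-driven false heterozygotes grows with depth.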
Ng et al. 2009
Ng et al. (2009) created a shotgun library of human DNA sequences and hybridized the DNA to Agilent 244K microarrays. The microarrays were designed to contain anchored oligos matching human exon sequences, so exon sequences from the samples are expected to hybridize to the oligos on the microarrays. The remaining DNA is washed away and the hybridized DNA is then eluted for sequencing. Thus, the original DNA sample is greatly enriched for exon sequences. They used an Illumina GA2 system to sequence the post-enrichment DNA fragments and mapped the resulting 76 base-pair reads to a reference human genome (hg18 http://genome.ucsc.edu). Using their approach, the average sequence coverage of each exon in the genome was 51X. The coverage and quality score criteria resulted in 78% of genes having >95% of their exon bases called. In addition to eight reference individuals, they included four unrelated individuals with Freeman-Sheldon syndrome (FSS). They excluded common variants recorded in dbSNP and were able to identify mutations in MYH3, previously considered a candidate causative gene for FSS, establishing that an exome approach can identify causal variants from very small sample sizes.[8]
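The filtering logic described here, discarding known common variants and then asking which genes carry a remaining variant in every affected individual, can be sketched with set operations. This is a simplified illustration and not the pipeline of Ng et al.; the function name, the variant tuples, and the placeholder gene names are all made up for the example.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def candidate_genes(cases: Dict[str, Dict[Variant, str]],
                    known_variants: Set[Variant]) -> Set[str]:
    """Return genes that carry at least one variant absent from
    `known_variants` in every affected individual.

    `cases` maps an individual ID to {variant: gene}. Different individuals
    may carry different novel variants in the same gene.
    """
    genes_per_case = []
    for variants in cases.values():
        novel_genes = {gene for variant, gene in variants.items()
                       if variant not in known_variants}
        genes_per_case.append(novel_genes)
    return set.intersection(*genes_per_case) if genes_per_case else set()


# Toy data: HYPOTHETICAL variants and placeholder gene names.
known = {("chr1", 100, "A", "G"), ("chr2", 500, "C", "T")}
cases = {
    "case1": {("chr1", 100, "A", "G"): "GENE_B", ("chr3", 42, "G", "A"): "GENE_A"},
    "case2": {("chr3", 57, "C", "T"): "GENE_A", ("chr2", 500, "C", "T"): "GENE_C"},
    "case3": {("chr3", 42, "G", "A"): "GENE_A", ("chr4", 7, "T", "C"): "GENE_D"},
}
print(candidate_genes(cases, known))
```

Intersecting at the gene level rather than the variant level matters because unrelated patients with the same disorder often carry different mutations in the same gene.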
To Follow Up On
Bamshad, M. J., Ng, S. B., Bigham, A. W., Tabor, H. K., Emond, M. J., Nickerson, D. A., & Shendure, J. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics, 12(11), 745-755.
Bainbridge, M. N., Wang, M., Burgess, D. L., Kovar, C., Rodesch, M. J., D'Ascenzo, M., ... & Gibbs, R. A. (2010). Whole exome capture in solution with 3 Gbp of data.
Bi, K., Vanderpool, D., Singhal, S., Linderoth, T., Moritz, C., & Good, J. M. (2012). Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales. BMC Genomics, 13(1), 403.
Parla, J. S., Iossifov, I., Grabill, I., Spector, M. S., Kramer, M., & McCombie, W. R. (2011). A comparative analysis of exome capture. Genome Biology, 12(9), R97.
Sulonen, A. M., Ellonen, P., Almusa, H., Lepistö, M., Eldfors, S., Hannula, S., ... & Saarela, J. (2011). Comparison of solution-based exome capture methods for next generation sequencing. Genome Biology, 12(9), R94.
Teer, J. K., & Mullikin, J. C. (2010). Exome sequencing: the sweet spot before whole genomes. Human Molecular Genetics, ddq333.
References
1. Elizabeth Pennisi (2012). "ENCODE Project Writes Eulogy For Junk DNA". Science 337 (6099): 1159–1160. doi:10.1126/science.337.6099.1159
2. Table 21 of International Human Genome Sequencing Consortium (2001). "Initial sequencing and analysis of the human genome". Nature 409 (6822): 860–921. doi:10.1038/35057062
3. See Choi et al. (2009) and the references therein; Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., ... & Lifton, R. P. (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences, 106(45), 19096-19101.
4. Hodges, E., Xuan, Z., Balija, V., Kramer, M., Molla, M. N., Smith, S. W., ... & McCombie, W. R. (2007). Genome-wide in situ exon capture for selective resequencing. Nature Genetics, 39(12), 1522-1527.
5. There is a repetitive word illusion in the original figure at this step.
6. The PCR amplification was for a limited number of cycles and was to optimize the amount of DNA loaded for sequencing; this step can probably be skipped to avoid artifacts introduced by PCR.
7. Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., ... & Lifton, R. P. (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences, 106(45), 19096-19101.
8. Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., ... & Shendure, J. (2009). Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461(7261), 272-276.