Difference between revisions of "Exome Capture"

From Genetics Wiki

Revision as of 21:31, 15 July 2014

Exome capture is a method used to extract and sequence the exome (the collection of all exons) of a genome and to compare exome variation across a sample of individual organisms. This allows studies to focus quickly on the small percentage of the genome that is most likely to contain variation that strongly affects phenotypes of interest.

Only a small fraction of many eukaryotic genomes consists of protein-coding exon sequence. In humans, for example, this is approximately 1-2% of the genome, spread over approximately 200,000 exons in approximately 20,000-21,000 genes[1]. The average human exon is only 145 bp long, and the average gene contains 8.8 exons[2]. Sequencing and analysing only the exons in a sample of individuals can make an investigation (say, of genotype-phenotype associations) much more efficient, but only to the extent that the causative variation does lie within exon sequence.
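As a rough check on these figures, the sketch below simply multiplies them out. The genome size of about 3.1 billion bases is an assumption that is not stated above; the exon count, mean exon length, gene count and exons per gene are the values quoted in the text.

```python
# Back-of-the-envelope estimate of the exome's share of the human genome.
# GENOME_SIZE_BP is an assumed round figure (not given in the text above).
GENOME_SIZE_BP = 3_100_000_000
N_EXONS = 200_000        # approximate exon count quoted above
MEAN_EXON_BP = 145       # average exon length quoted above
N_GENES = 20_000         # lower bound of the gene count quoted above
EXONS_PER_GENE = 8.8     # average exons per gene quoted above


def exome_size_bp(n_exons: int, mean_exon_bp: float) -> float:
    """Total exon sequence in bp, ignoring any overlap between exons."""
    return n_exons * mean_exon_bp


def exome_fraction(n_exons: int, mean_exon_bp: float, genome_bp: int) -> float:
    """Exon sequence as a fraction of the whole genome."""
    return exome_size_bp(n_exons, mean_exon_bp) / genome_bp


if __name__ == "__main__":
    size = exome_size_bp(N_EXONS, MEAN_EXON_BP)
    frac = exome_fraction(N_EXONS, MEAN_EXON_BP, GENOME_SIZE_BP)
    print(f"Exome size: {size / 1e6:.0f} Mb")
    print(f"Share of genome: {frac:.2%}")
    print(f"Exons implied by gene count: {N_GENES * EXONS_PER_GENE:,.0f}")
```

Under these assumptions the exons add up to about 29 Mb, or roughly 1% of the genome, which sits at the lower end of the range quoted above.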

Hodges et al. (2007) describe a method that uses microarrays to capture targeted DNA sequences from a genome, and they applied it to human protein-coding exons.[3] Figure 1 of their paper outlines the following steps.

  1. Genomic DNA preparation and hybrid selection
    1. Randomly fragment high-molecular-weight DNA by sonication (to an average size of 500-600 bp) or nebulization.[4]
    2. Repair, blunt and phosphorylate ends.
    3. Ligate linkers, denature strands and capture with 385k arrayed probes (on exon tiling arrays).
    4. Recover selected fragments by thermal elution (heat denaturation) followed by lyophilization and PCR enrichment of ligated strands.
  2. 1G Sequencing
    1. Blunt asymmetric capture linkers. Phosphorylate and adenylate ends. Ligate Illumina 1G-compatible adaptors. Gel purify and PCR enrich[5].
    2. Denatured strands are injected into an eight-lane flow cell. Clusters are generated from single molecules by in situ amplification.
    3. Sequencing-by-synthesis primer is hybridized and cluster images are scanned with each successive round of fluorescent nucleotide incorporation.
    4. Images are processed with Illumina base-calling software and aligned to the reference.
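The wet-lab steps above have no direct code equivalent, but the logic of hybrid selection can be illustrated with a toy in silico analogue: cut a genome into fragments and keep only those that overlap a target exon. This is a minimal sketch, not the authors' method. All coordinates are made up, the function names are hypothetical, and fixed-size fragments stand in for the random break points that sonication would produce.

```python
# Toy in-silico analogue of hybrid selection: keep only fragments that
# overlap at least one target (exon) interval. All coordinates are made up.
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def fragment(genome_length: int, fragment_size: int) -> List[Interval]:
    """Cut a genome into consecutive fixed-size fragments.

    Real sonication yields random break points; fixed sizes keep this toy
    example deterministic.
    """
    return [
        (start, min(start + fragment_size, genome_length))
        for start in range(0, genome_length, fragment_size)
    ]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def capture(fragments: List[Interval], targets: List[Interval]) -> List[Interval]:
    """Return the fragments that overlap any target interval."""
    return [f for f in fragments if any(overlaps(f, t) for t in targets)]


if __name__ == "__main__":
    exons = [(1_200, 1_350), (4_900, 5_100)]  # hypothetical exon coordinates
    frags = fragment(genome_length=10_000, fragment_size=500)
    kept = capture(frags, exons)
    print(f"{len(kept)} of {len(frags)} fragments captured: {kept}")
```

Note that a captured fragment usually carries flanking non-exon sequence as well, since a 500 bp fragment is longer than a typical exon.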

In practice, they used six custom NimbleGen arrays, each carrying 385,000 unique 60-90 nt probes (tiled with an offset of 20 nt) and covering approximately 25,000 exons, plus a seventh array designed to tile alternative transcripts of the genes included on the first six arrays. In all, this corresponded to 44 million tiled bases. The captured DNA was sequenced on an Illumina 1G platform, and they found an average 323-fold enrichment of exon sequence.
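To make the tiling and enrichment figures more concrete, here is a minimal sketch under stated assumptions. `tile_starts` places a probe every 20 nt across a region and does not model probe length. `fold_enrichment` uses one common definition (observed on-target read fraction divided by the fraction expected from uniform sampling of the genome); whether Hodges et al. computed their 323-fold figure in exactly this way is not stated here. Both function names and all input numbers are hypothetical.

```python
from typing import List


def tile_starts(region_start: int, region_end: int, offset: int = 20) -> List[int]:
    """Start coordinates of probes placed every `offset` nt across
    the half-open region [region_start, region_end)."""
    return list(range(region_start, region_end, offset))


def fold_enrichment(on_target_fraction: float, target_bp: int, genome_bp: int) -> float:
    """Observed on-target read fraction relative to the fraction expected
    if reads were drawn uniformly from the genome."""
    expected_fraction = target_bp / genome_bp
    return on_target_fraction / expected_fraction


if __name__ == "__main__":
    # A 145 bp exon (the average length quoted earlier) at a made-up position.
    starts = tile_starts(10_000, 10_145)
    print(f"{len(starts)} probes starting at {starts}")

    # Hypothetical numbers for a single capture, chosen only for illustration.
    print(fold_enrichment(on_target_fraction=0.5,
                          target_bp=6_000_000,
                          genome_bp=3_000_000_000))
```

Because the probes are 60-90 nt long, those starting near the end of an exon would extend into flanking sequence. The enrichment value also depends on the size of the target used in a single capture: the smaller the target, the higher the fold enrichment that is achievable.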

Ng et al. (2009) created a shotgun library of human DNA sequences and hybridized the DNA to Agilent 244K microarrays. The microarrays were designed to contain anchored oligos matching human exon sequences, so exon sequences in the sample are expected to hybridize to the oligos on the array. The remaining DNA can be washed away and the hybridized DNA then eluted for sequencing, leaving a sample that is greatly enriched for exon sequences. They sequenced the enriched DNA fragments on an Illumina GA2 system and mapped the resulting 76 base-pair reads to a reference human genome (hg18 http://genome.ucsc.edu). With this approach, the average sequence coverage of each exon in the genome was 51X. The coverage and quality score criteria resulted in 78% of genes having >95% of their exon bases called. In addition to eight reference individuals, they included four unrelated individuals with Freeman-Sheldon syndrome (FSS). They excluded common variants recorded in dbSNP and were able to identify mutations in MYH3, previously considered a candidate causative gene for FSS, establishing that an exome approach can identify causal variants from very small sample sizes.[6]
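The filtering logic described above can be sketched as follows: discard variants already known to be common, then look for genes that carry at least one remaining variant in every affected individual. This is an illustrative simplification, not the pipeline used by Ng et al. The `Variant` type, the `candidate_genes` function, and the gene names and variant identifiers are all hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple, Set


class Variant(NamedTuple):
    gene: str
    variant_id: str  # hypothetical identifier for a specific variant


def candidate_genes(cases: Dict[str, List[Variant]],
                    known_common: Set[str]) -> Set[str]:
    """Genes with at least one novel variant in every affected individual.

    `known_common` plays the role of a catalogue of common variants
    (such as dbSNP in the study described above).
    """
    per_case_genes = []
    for variants in cases.values():
        novel = {v.gene for v in variants if v.variant_id not in known_common}
        per_case_genes.append(novel)
    return set.intersection(*per_case_genes) if per_case_genes else set()


if __name__ == "__main__":
    # Made-up data: three affected individuals and one known common variant.
    cases = {
        "case1": [Variant("GENE_A", "v1"), Variant("GENE_B", "v2"),
                  Variant("GENE_C", "v3")],
        "case2": [Variant("GENE_A", "v4"), Variant("GENE_B", "v2")],
        "case3": [Variant("GENE_A", "v1"), Variant("GENE_C", "v5")],
    }
    print(candidate_genes(cases, known_common={"v2"}))
```

The intersection is taken over genes rather than over individual variants, because unrelated affected individuals may carry different mutations in the same gene.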


Bamshad, M. J., Ng, S. B., Bigham, A. W., Tabor, H. K., Emond, M. J., Nickerson, D. A., & Shendure, J. (2011). Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics, 12(11), 745-755.

Bi, K., Vanderpool, D., Singhal, S., Linderoth, T., Moritz, C., & Good, J. M. (2012). Transcriptome-based exon capture enables highly cost-effective comparative genomic data collection at moderate evolutionary scales. BMC Genomics, 13(1), 403.

Choi, M., Scholl, U. I., Ji, W., Liu, T., Tikhonova, I. R., Zumbo, P., ... & Lifton, R. P. (2009). Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences, 106(45), 19096-19101.

Teer, J. K., & Mullikin, J. C. (2010). Exome sequencing: the sweet spot before whole genomes. Human Molecular Genetics, ddq333.

References

  1. Elizabeth Pennisi (2012). "ENCODE Project Writes Eulogy For Junk DNA". Science 337 (6099): 1159–1160. doi:10.1126/science.337.6099.1159
  2. Table 21 of International Human Genome Sequencing Consortium (2001). "Initial sequencing and analysis of the human genome". Nature 409 (6822): 860–921. doi:10.1038/35057062
  3. Hodges, E., Xuan, Z., Balija, V., Kramer, M., Molla, M. N., Smith, S. W., ... & McCombie, W. R. (2007). Genome-wide in situ exon capture for selective resequencing. Nature Genetics, 39(12), 1522-1527.
  4. There is a repetitive word illusion in the original figure at this step.
  5. The PCR amplification was for a limited number of cycles and was to optimize the amount of DNA loaded for sequencing; this step can probably be skipped to avoid artifacts introduced by PCR.
  6. Ng, S. B., Turner, E. H., Robertson, P. D., Flygare, S. D., Bigham, A. W., Lee, C., ... & Shendure, J. (2009). Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461(7261), 272-276.