Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species methods
Aim. Evidence-backed execution summary of the methods in Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species (Peterson et al., 2012).
This experiment, in seven questions
Jump straight to the part of the recipe you need. Data and provenance labels stay close to the action they support.
Shopping and prep list
What do I need before I start?
mouse
Subject model for the experiment.
- Use
- confirm full cohort details in the source paper
Introduction
reagent used in the protocol.
- Use
- The plummeting cost and skyrocketing throughput of DNA sequencing have begun to enable sequencing of entire genomes of study populations of some focal species; however, even in traditional model species (e.g., humans, laboratory mice, and Drosophila) resources for complete genome resequencing of large numbers of...
Methods
reagent used in the protocol.
- Use
- We have developed a protocol that builds on the RADseq method but which differs in two principal respects. First, our method eliminates random shearing and end repair of genomic DNA (an advantage shared with a family of partially overlapping protocols such as MSG, CrOPS, and other recent RADseq derivatives)...
Polymorphism Discovery and Genotyping without a Reference Genome
reagent used in the protocol.
- Use
- Due to short read lengths and high error rates, methods for analyzing second-generation sequencing data generally require mapping sequencing reads to a fully sequenced genome from the same or a very closely related species (divergence must be low enough to expect that seed conditions for read mapping will be met eve...
Reference-free RADseq Analysis by Graph Clustering
reagent used in the protocol.
- Use
- Analysis of RADseq data in the absence of a reference genome has been performed in a small but growing collection of studies employing the open-source Stacks package. Stacks includes a full suite of tools for tracking pooled samples and performing the "off-by-N" assignment of alleles to a locus describe...
Introduction
Segregating genetic markers are used to make inferences about historical processes (e.g., phylogenetic relationships, population structure) and functional mechanisms (e.g., genotype-phenotype mapping), but the optimal number of markers (fraction of the genome) needed to achieve a desired level of resolution differs...
Introduction
The plummeting cost and skyrocketing throughput of DNA sequencing have begun to enable sequencing of entire genomes of study populations of some focal species; however, even in traditional model species (e.g., humans, laboratory mice, and Drosophila) resources for complete genome resequencing of large numbers of...
Introduction
To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to recover a tunable number of regions, which are distributed randomly throughout the genome. Moreover, to maximize our ability to multiplex (i...
Results
We applied the double digest RADseq (ddRADseq) method for genotyping in an emerging model system, the deer mouse (genus Peromyscus). First, we developed and validated our method by genotyping ∼1000 segregating fixed differences in a cross between two sister species (P. maniculatus and P. polionotus). Second...
Conclusions
The ddRADseq method described here, in conjunction with huge strides in both the throughput of sequencing (e.g., Illumina HiSeq 2000) and in genotype analysis based on short read sequence data (e.g., GATK UnifiedGenotyper, samtools) permits high throughput simultaneous discovery and genotyping of sequence polymorphi...
Before you run
What should be confirmed before execution?
First confirmation
Equipment is listed but no product mappings are linked.
Confirm before execution
This page is backed by a publishable Replication Data Ledger package with zero critical source-verification issues.
Confirm before execution
Open the source paper before finalizing run-specific details.
Procurement checkpoint
Use source-stated vendors where present. Treat mapped products as sourcing options unless the page marks an exact source match.
Step-by-step procedure
What do I do, in order?
Introduction
The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical record of the demographic processes and selective forces acting in a given lineage. Exploration of mechanistic details through biochemistry, genetics, and development has led to a deeper understanding of how genotype leads to phenotype, while exploitation of the historical record has enabled the fields of systematics, population genetics, and molecular ecology to elucidate the pressures and processes that shape diversity in populations and divergence between species. Studies of genetic information both encoded and recorded in genomes work with the same currency (comparison of homologous sequences across individuals), but these approaches employ very different modes of inference, and as such the detai...
Introduction
To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to recover a tunable number of regions, which are distributed randomly throughout the genome. Moreover, to maximize our ability to multiplex (i.e., increase the number of samples per sequencing lane), we also have developed a two-index combinatorial tagging approach (e.g., n * m individuals using n+m indices) and an accompanying computational analysis toolkit and lightweight data management component to facilitate high-order multiplexing of many hundreds of individuals. We also developed a graph clustering-based pipeline to maximize sequence read inclusion in analysis and permit detection of orthologous haplotypes regardless of divergence (i.e., without arbitrary similarity requirements), thereby improving analysi...
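The two-index combinatorial tagging idea (n * m individuals from n + m indices) can be illustrated with a minimal sketch. The barcode and index sequences below are made-up placeholders, and the function name `combinatorial_tags` is hypothetical; this is not the authors' toolkit, only a demonstration of how pairing two small index sets labels many samples.

```python
from itertools import product


def combinatorial_tags(adapter_barcodes, pcr_indices):
    """Assign one sample slot to every (barcode, index) pair.

    With n barcodes and m indices this yields n * m unique labels
    while only n + m distinct oligo sequences are needed.
    """
    pairs = product(adapter_barcodes, pcr_indices)
    return {pair: f"sample_{i:03d}" for i, pair in enumerate(pairs, start=1)}


# Hypothetical sequences, chosen only for illustration.
barcodes = ["ACGTAC", "TGCATG", "GATCGA", "CTAGCT"]  # n = 4
indices = ["ATCACG", "CGATGT", "TTAGGC"]             # m = 3

tags = combinatorial_tags(barcodes, indices)
print(len(barcodes) + len(indices), "indices label", len(tags), "individuals")
```

The saving grows quickly with scale: under the same scheme, 48 barcodes and 12 indices (60 oligos) would distinguish 576 individuals.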
Validation of ddRADseq Derived Genotypes in a Laboratory Cross
ddRADseq was used to identify SNPs between two Peromyscus species, neither of which had a genome sequence available, that were crossed as part of a QTL experiment. This yielded 1158 unique markers that were fixed within, but different between, the parental species. By calculating the fraction of recombinant genotypes and LOD of linkage between markers, we generated (A) 24 groups of strongly linked markers, where heatmap colors represent strength of linkage in both recombination frequency (upper left) and LOD (lower right) between all pairs of markers; and (B) a genetic map with average inter-marker distance of 1.6 cM. ddRADseq was also used to genotype wild-caught and lab-reared individuals of P. leucopus. Our ddRADseq method permitted successful genotyping of wild-caught individuals even when the allelic variants within a population are unknown. (C) Estimated site frequency spectrum...
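The two pairwise statistics named above, the fraction of recombinant genotypes and the LOD of linkage, can be sketched as follows. This is a simplified illustration, not the authors' analysis: it assumes backcross-style, phase-known scoring in which an individual counts as recombinant when its genotypes at the two markers differ, and the genotype codes and function names are hypothetical.

```python
import math


def count_recombinants(geno_a, geno_b):
    """Count individuals whose calls differ between two markers.

    Individuals with a missing call (None) at either marker are skipped.
    Returns (recombinants, informative_individuals).
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b)
             if a is not None and b is not None]
    if not pairs:
        raise ValueError("no individuals genotyped at both markers")
    return sum(a != b for a, b in pairs), len(pairs)


def lod_score(recombinants, n):
    """log10 likelihood ratio of r = recombinants / n versus r = 0.5."""
    r = min(recombinants / n, 0.5)  # r above 0.5 carries no linkage signal

    def term(count, p):
        # Treat 0 * log10(0) as 0 so r = 0 is handled cleanly.
        return count * math.log10(p) if count else 0.0

    log_l_linked = term(recombinants, r) + term(n - recombinants, 1 - r)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


# Hypothetical calls: "A" = homozygous, "H" = heterozygous.
marker_1 = ["A", "A", "H", "H", "A", "H", "A", "H", "A", "H"]
marker_2 = ["A", "A", "H", "H", "A", "H", "A", "H", "A", "A"]

k, n = count_recombinants(marker_1, marker_2)
print(f"r = {k / n:.2f}, LOD = {lod_score(k, n):.2f}")
```

Computing both values for every marker pair gives the two triangles of a linkage heatmap; markers can then be grouped wherever pairwise LOD is high and recombination fraction is low.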
Discussion
Here we describe a combination of laboratory and computational methodology to permit highly repeatable and tunable recovery of hundreds to hundreds of thousands of randomly sampled regions from a target genome. In comparison to traditional RADseq methods, ddRADseq library preparation is less expensive and faster (<8 hours hands-on time for dozens to hundreds of samples), completely compatible with microplate format, and can be performed using limited amounts of genomic material (<100 ng). Furthermore, due to the removal of random shearing (and therefore random recovery), correlated recovery of regions across individuals results in increased robustness to variability in read count. As sequencing depth required to reach saturation is a direct function of the number of regions sampled, the number of individuals which can be genotyped in a single sequ...
Measurement outputs
What raw and processed outputs should exist?
The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
While these approaches permit genotyping of multiple individuals with substantially reduced sequencing investment, they are limited in their ability to allow researchers to tune...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
Due to short read lengths and high error rates, methods for analyzing second-generation sequencing data generally require mapping sequencing reads to a fully sequenced genome fr...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
Analysis plan
How should the outputs become interpretable results?
Acquisition
Collect raw experimental outputs with enough metadata to preserve sample identity, condition, and timing.
inferred from protocol
Preprocessing / cleaning
The plummeting cost and skyrocketing throughput of DNA sequencing have begun to enable sequencing of entire genomes of study populations of some focal species; however, even in traditional model species (e.g., humans, laboratory mice, and Drosophila) resources for complete...
from paper
Scoring or quantification
Quantify the primary readouts for this experiment: The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical...; While these approaches permit genotyping of multiple individuals with substantially reduced sequencing investment, they are limited in their ability to allow researchers to tune...; To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to...; Due to short read lengths and high error rates, methods for analyzing second-generation sequencing data generally require mapping sequencing reads to a fully sequenced genome fr....
from paper
Statistical comparison
The plummeting cost and skyrocketing throughput of DNA sequencing have begun to enable sequencing of entire genomes of study populations of some focal species; however, even i...; Precise, repeatable size selection offers two further advantages. First, because only a small fraction of restriction fragments will fall in the target size-selection regime (<5...; Analysis of RADseq data in the absence of a reference genome has been performed in a small but growing collection of studies employing the open-source Stacks package. Stacks in...
from paper
Reporting output
Report representative outputs alongside summary comparisons for The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical..., While these approaches permit genotyping of multiple individuals with substantially reduced sequencing investment, they are limited in their ability to allow researchers to tune..., To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to..., Due to short read lengths and high error rates, methods for analyzing second-generation sequencing data generally require mapping sequencing reads to a fully sequenced genome fr....
inferred from protocol
Structured statistical methods
The plummeting cost and skyrocketing throughput of DNA sequencing have begun to enable sequencing of entire genomes of study populations of some focal species; however, even i...; Precise, repeatable size selection offers two further advantages. First, because only a small fraction of restriction fragments will fall in the target size-selection regime (<5...; Analysis of RADseq data in the absence of a reference genome has been performed in a small but growing collection of studies employing the open-source Stacks package. Stacks in...
source structured
Source and audit
What supports the facts on this page?
Evidence quotes (4)
The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical record of the demographic processes and selective forces acting in a given lineage. Exploration of mechanistic details through biochemistry, genetics, and development has led to a deeper understanding of how genotype leads to phenotype, while exploitation of the historical record has enabled the fields of systematics, population genetics, and molecular ecology to elucidate the pressures and processes that shape diversity in populations and divergence between species. Studies of genetic information both encoded and recorded in genomes work with the same currency (comparison of homologous sequences across individuals), but these approaches employ very different modes of inference, and as such the details of a particular experiment dictate optimal marker resolution. To address the need for flexibility in marker number, we describe a next-generation sequencing-based method for determining individual sequence genotypes that can be tuned to sample a large range (from hundreds to hundreds of thous...
To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to recover a tunable number of regions, which are distributed randomly throughout the genome. Moreover, to maximize our ability to multiplex (i.e., increase the number of samples per sequencing lane), we also have developed a two-index combinatorial tagging approach (e.g., n * m individuals using n+m indices) and an accompanying computational analysis toolkit and lightweight data management component to facilitate high-order multiplexing of many hundreds of individuals. We also developed a graph clustering-based pipeline to maximize sequence read inclusion in analysis and permit detection of orthologous haplotypes regardless of divergence (i.e., without arbitrary similarity requirements), thereby improving analysis sensitivity and efficiency. Our software pipeline utilizes a novel approach for filtering resulting loci independent of coverage depth and converts the resulting haplotype multiple alignments into standard SAM/BAM format for downstream analysis, such as variant detection using the Genome Analysis...
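How a double digest plus size selection makes the number of recovered regions tunable can be shown with a small in silico sketch. Everything here is an illustrative assumption rather than the authors' protocol or software: the enzyme pair (EcoRI, G^AATTC, and MspI, C^CGG) is chosen only as a familiar example, the toy sequence is synthetic, and `double_digest` is a hypothetical helper. Only fragments flanked by one cut site of each enzyme and falling inside the size window are kept.

```python
import re

# (recognition site, cut offset within the site); illustrative enzyme choices.
ECORI = ("GAATTC", 1)
MSPI = ("CCGG", 1)


def cut_positions(seq, site, offset):
    """Return cut coordinates for one enzyme, allowing overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, enzyme_a, enzyme_b, min_len, max_len):
    """Return (start, end, length) for fragments with one end from each
    enzyme whose length lies in [min_len, max_len]."""
    cuts = [(p, "A") for p in cut_positions(seq, *enzyme_a)]
    cuts += [(p, "B") for p in cut_positions(seq, *enzyme_b)]
    cuts.sort()
    selected = []
    # Adjacent cuts bound one fragment; the two sequence ends are ignored
    # because they lack a restriction site on one side.
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        length = end - start
        if left != right and min_len <= length <= max_len:
            selected.append((start, end, length))
    return selected


# Synthetic toy sequence with three EcoRI sites and one MspI site.
toy = ("T" * 50 + "GAATTC" + "T" * 100 + "CCGG" + "T" * 300
       + "GAATTC" + "T" * 20 + "GAATTC" + "T" * 40)

print(double_digest(toy, ECORI, MSPI, 100, 200))
print(double_digest(toy, ECORI, MSPI, 100, 400))
```

Widening or shifting the size window changes how many fragments pass the filter, which is the tuning knob the passage describes; the fragment bounded by two EcoRI cuts is excluded regardless of window.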
ddRADseq was used to identify SNPs between two Peromyscus species, neither of which had a genome sequence available, that were crossed as part of a QTL experiment. This yielded 1158 unique markers that were fixed within, but different between, the parental species. By calculating the fraction of recombinant genotypes and LOD of linkage between markers, we generated (A) 24 groups of strongly linked markers, where heatmap colors represent strength of linkage in both recombination frequency (upper left) and LOD (lower right) between all pairs of markers; and (B) a genetic map with average inter-marker distance of 1.6 cM. ddRADseq was also used to genotype wild-caught and lab-reared individuals of P. leucopus. Our ddRADseq method permitted successful genotyping of wild-caught individuals even when the allelic variants within a population are unknown. (C) Estimated site frequency spectrum of wild P. leucopus caught in a single Louisiana population. (D) Genetic structure between five populations of P. leucopus. Dots represent individuals (N = 92) and color indicates the states from which individuals were collected: LA = Louisiana; NE =...
Here we describe a combination of laboratory and computational methodology to permit highly repeatable and tunable recovery of hundreds to hundreds of thousands of randomly sampled regions from a target genome. In comparison to traditional RADseq methods, ddRADseq library preparation is less expensive and faster (<8 hours hands-on time for dozens to hundreds of samples), completely compatible with microplate format, and can be performed using limited amounts of genomic material (<100 ng). Furthermore, due to the removal of random shearing (and therefore random recovery), correlated recovery of regions across individuals results in increased robustness to variability in read count. As sequencing depth required to reach saturation is a direct function of the number of regions sampled, the number of individuals which can be genotyped in a single sequencing lane is inversely proportional to the number of regions recovered. For example, we chose to recover 15-25 K regions in one experiment described here, for which saturation was achieved at less than 500 K reads per individual.
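The inverse relationship between regions recovered and individuals per lane can be made concrete with a rough calculation. The lane yield below is a hypothetical round number, and the reads-per-region figure is derived from the quoted example (500 K reads for up to 25 K regions, about 20 reads per region) and treated as a constant, which is a simplifying assumption; real saturation depth should be estimated from pilot data.

```python
def individuals_per_lane(lane_reads, regions, reads_per_region):
    """Estimate how many individuals fit in one lane.

    Reads needed per individual scale linearly with the number of regions,
    so capacity falls in inverse proportion to regions recovered.
    """
    reads_per_individual = regions * reads_per_region
    return lane_reads // reads_per_individual


LANE_READS = 150_000_000  # hypothetical lane yield, not from the source
READS_PER_REGION = 20     # 500 K reads / 25 K regions, from the quoted example

for regions in (5_000, 25_000, 100_000):
    capacity = individuals_per_lane(LANE_READS, regions, READS_PER_REGION)
    print(f"{regions:>7} regions: about {capacity} individuals per lane")
```

Under these assumptions a fivefold reduction in regions allows roughly five times as many individuals per lane, which is the trade-off size selection is used to tune.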
Machine-readable layer
[
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species methods",
"description": "Evidence-backed execution summary for Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species methods from Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species.",
"totalTime": "PT960M",
"step": [
{
"@type": "HowToStep",
"position": 1,
"name": "Introduction",
"text": "The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical record of the demographic processes and selective forces acting in a given lineage. Exploration of mechanistic details through biochemistry, genetics, and development has led to a deeper understanding of how genotype leads to phenotype, while exploitation of the historical record has enabled the fields of systematics, population genetics, and molecular ecology to elucidate the pressures and processes that shape diversity in populations and divergence between species. Studies of genetic information both encoded and recorded in genomes work with the same currency (comparison of homologous sequences across individuals), but these approaches employ very different modes of inference, and as such the detai..."
},
{
"@type": "HowToStep",
"position": 2,
"name": "Introduction",
"text": "To increase the breadth of RADseq applications, we have elaborated on the method described by Baird et al. by eliminating random shearing and explicitly using size selection to recover a tunable number of regions, which are distributed randomly throughout the genome. Moreover, to maximize our ability to multiplex (i.e., increase the number of samples per sequencing lane), we also have developed a two-index combinatorial tagging approach (e.g., n * m individuals using n+m indices) and an accompanying computational analysis toolkit and lightweight data management component to facilitate high-order multiplexing of many hundreds of individuals. We also developed a graph clustering-based pipeline to maximize sequence read inclusion in analysis and permit detection of orthologous haplotypes regardless of divergence (i.e., without arbitrary similarity requirements), thereby improving analysi..."
},
{
"@type": "HowToStep",
"position": 3,
"name": "Validation of ddRADseq Derived Genotypes in a Laboratory Cross",
"text": "ddRADseq was used to identify SNPs between two Peromyscus species, neither of which had a genome sequence available, that were crossed as part of a QTL experiment. This yielded 1158 unique markers that were fixed within, but different between, the parental species. By calculating the fraction of recombinant genotypes and LOD of linkage between markers, we generated (A) 24 groups of strongly linked markers, where heatmap colors represent strength of linkage in both recombination frequency (upper left) and LOD (lower right) between all pairs of markers; and (B) a genetic map with average inter-marker distance of 1.6 cM. ddRADseq was also used to genotype wild-caught and lab-reared individuals of P. leucopus. Our ddRADseq method permitted successful genotyping of wild-caught individuals even when the allelic variants within a population are unknown. (C) Estimated site frequency spectrum..."
},
{
"@type": "HowToStep",
"position": 4,
"name": "Discussion",
"text": "Here we describe a combination of laboratory and computational methodology to permit highly repeatable and tunable recovery of hundreds to hundreds of thousands of randomly sampled regions from a target genome. In comparison to traditional RADseq methods, ddRADseq library preparation is less expensive and faster (<8 hours hands-on time for dozens to hundreds of samples), completely compatible with microplate format, and can be performed using limited amounts of genomic material (<100 ng). Furthermore, due to the removal of random shearing (and therefore random recovery), correlated recovery of regions across individuals results in increased robustness to variability in read count. As sequencing depth required to reach saturation is a direct function of the number of regions sampled, the number of individuals which can be genotyped in a single sequ..."
}
],
"tool": [
{
"@type": "HowToTool",
"name": "Introduction"
},
{
"@type": "HowToTool",
"name": "Introduction"
},
{
"@type": "HowToTool",
"name": "Introduction"
},
{
"@type": "HowToTool",
"name": "Results"
},
{
"@type": "HowToTool",
"name": "Conclusions"
}
],
"supply": [
{
"@type": "HowToSupply",
"name": "Introduction"
},
{
"@type": "HowToSupply",
"name": "Methods"
},
{
"@type": "HowToSupply",
"name": "Polymorphism Discovery and Genotyping without a Reference Genome"
},
{
"@type": "HowToSupply",
"name": "Reference-free RADseq Analysis by Graph Clustering"
}
],
"isBasedOn": {
"@type": "ScholarlyArticle",
"headline": "Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species",
"datePublished": "2012",
"author": [
{
"@type": "Person",
"name": "Brant K. Peterson"
},
{
"@type": "Person",
"name": "Jesse N. Weber"
},
{
"@type": "Person",
"name": "Emily H. Kay"
},
{
"@type": "Person",
"name": "Heidi S. Fisher"
},
{
"@type": "Person",
"name": "Hopi E. Hoekstra"
}
],
"identifier": "10.1371/journal.pone.0037135"
}
},
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "Experiments",
"item": "https://replicatescience.com/experiments"
},
{
"@type": "ListItem",
"position": 2,
"name": "Double Digest RADseq: An Inexpensive Method for De Novo SNP Discovery and Genotyping in Model and Non-Model Species methods",
"item": "https://replicatescience.com/experiments/double-digest-radseq-an-inexpensive-method-for-de-novo-snp-discovery-and-genotyping-in-model-and-non-model-species-methods-brant-k-peterson-pmc3365034/double-digest-radseq-an-inexpensive-method-for-de-novo-snp-discovery-and-genotyping-in-model-and-non-mlpgv3fg"
}
]
}
]