A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods
Aim. Evidence-backed execution summary of the methods from A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System.
This experiment, in seven questions
Jump straight to the part of the recipe you need. Data and provenance labels stay close to the action they support.
Shopping and prep list
What do I need before I start?
mouse
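Listed as the subject model, although the source text mentions mouse only as an example of an external contaminant screened out during sequence quality control.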
- Use
- confirm full cohort details in the source paper
Algorithms for OTU Recognition
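Computational methods for OTU recognition compared in the source (ABGD, CROP, GMYC, jMOTU, RESL), listed here in the reagent slot.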
- Use
- Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving entities. This strategy uses a maximum likelihood approach to detect the transition of branching patterns in the gene tree from interspecific b...
Implementation
• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution.
Mapping Animal Diversity: the Barcode Index Number (BIN) System
Although the selection of an effective, rapid algorithm for OTU recognition is a key step in building a DNA-based registry for animal species, it needs to be coupled with a persistent informatics platform which maps each newly acquired sequence to an existing OTU or recognizes it as a founder. Ideally, each OTU shou...
Mapping Animal Diversity: the Barcode Index Number (BIN) System
This paper begins by examining the concordance between species inferred from prior morphological taxonomy and the OTUs recognized by RESL. Its speed and capacity to recover OTUs corresponding to known species are subsequently evaluated against four other algorithms. The final section of the paper describes varied as...
Time Trials
The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorit...
Implementation
Because of its strong taxonomic performance and speed, RESL was adopted to generate OTUs for the barcode sequences on BOLD. Each of the OTUs resulting from this analysis was subsequently assigned a unique alphanumeric code with a standard structure (BOLD: 3 letters, 4 numbers). The overall informatics system suppor...
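The code structure described above (BOLD: 3 letters, 4 numbers) can be checked mechanically. What follows is a minimal sketch rather than BOLD's own validation: it assumes uppercase letters and no space after the colon, neither of which is stated in the text, and `is_bin_code` is a hypothetical helper name.

```python
import re

# Assumed layout: "BOLD:" + 3 uppercase letters + 4 digits.
# The text only says "3 letters, 4 numbers"; case and spacing are assumptions.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def is_bin_code(code: str) -> bool:
    """Return True if the whole string matches the assumed BIN layout."""
    return BIN_PATTERN.fullmatch(code) is not None
```

Using `fullmatch` rather than `match` rejects strings that merely begin with a valid-looking code.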
Implementation
Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designa...
BIN Partitions
As noted earlier, RESL merged 7.9% and split 2.7% of the species in the eight test datasets. These cases of discordance between BIN assignments and current taxonomy can have four explanations. They may reflect taxonomic error, sequence contamination, deficits in RESL, or the inability of sequence variation at COI to...
Before you run
What should be confirmed before execution?
First confirmation
Equipment is listed but no product mappings are linked.
Confirm before execution
This page is backed by a publishable Replication Data Ledger package with zero critical source-verification issues.
Confirm before execution
Open the source paper before finalizing run-specific details.
Procurement checkpoint
Use source-stated vendors where present. Treat mapped products as sourcing options unless the page marks an exact source match.
Step-by-step procedure
What do I do, in order?
Implementation
• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution.
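The tally-and-sort step for attribution can be illustrated with a short sketch. The record layout (a dict per record) and the field spellings are assumptions made for illustration; the text names the five primary fields but not a schema, and `tally_attribution` is a hypothetical helper.

```python
from collections import Counter

# Primary attribution fields named in the text (key spellings are assumed).
FIELDS = ("collector", "identifier", "photographer",
          "sequencing_facility", "specimen_depository")


def tally_attribution(records):
    """Count records per individual or institution for each field.

    Returns {field: [(name, count), ...]} sorted by descending count.
    """
    tallies = {field: Counter() for field in FIELDS}
    for record in records:
        for field in FIELDS:
            name = record.get(field)
            if name:  # skip missing or empty attributions
                tallies[field][name] += 1
    return {field: counter.most_common() for field, counter in tallies.items()}
```

`Counter.most_common()` returns the entries ordered from the most to the least frequent, which matches the "tallied and sorted" description.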
Benchmarking Algorithms for the Recognition of Animal Species
This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordan...
OTU Pipeline On BOLD
Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia ) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that w...
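The first step of this pipeline, the length and ambiguity filter, is simple enough to sketch. This is an illustration under stated assumptions, not BOLD's code: it treats alignment gaps as not counting toward coverage and treats every symbol other than A, C, G, T as ambiguous. The later checks (stop codons, contaminant screening, chimera detection with Bellerophon) are not reproduced here.

```python
MIN_LENGTH = 500          # records with less than 500 bp are excluded
AMBIGUOUS_LIMIT = 0.01    # records with more than 1% ambiguous bases are excluded


def passes_quality_filter(sequence: str) -> bool:
    """Apply the length and ambiguity thresholds to one barcode sequence."""
    # Assumption: gap characters ("-") do not count toward bp coverage.
    bases = sequence.upper().replace("-", "")
    if len(bases) < MIN_LENGTH:
        return False
    # Assumption: anything other than A, C, G, T is an ambiguous base.
    ambiguous = sum(1 for base in bases if base not in "ACGT")
    return ambiguous / len(bases) <= AMBIGUOUS_LIMIT
```

A sequence sitting exactly on either threshold (500 bp, or 1% ambiguous bases) passes, following the "less than" and "more than" wording of the exclusion rule.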
Performance Comparison
Eight datasets were employed to test the performance of the five algorithms available for OTU recognition. These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD, while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1, or GenBank accessions listed in the source paper). These eight datasets have varying sampling d...
Performance Comparison of Algorithms for OTU Recognition
The source paper compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was the top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was the weakest (27.4%).
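The discordance tally can be sketched as a per-species classification. The category definitions used here are assumptions, because the text on this page is cut off before it defines them: MATCH means all specimens of a species sit in one OTU holding no other species, SPLIT means the species spans several OTUs, MERGE means its single OTU is shared with another species, and MIXTURE means both. The function names are hypothetical, and the fraction is computed over species, which may differ from the paper's exact tally.

```python
from collections import defaultdict


def classify_species(assignments):
    """Classify each species from (species, otu) pairs, one pair per specimen."""
    otus_of = defaultdict(set)     # species -> OTUs holding its specimens
    species_in = defaultdict(set)  # OTU -> species it contains
    for species, otu in assignments:
        otus_of[species].add(otu)
        species_in[otu].add(species)

    categories = {}
    for species, otus in otus_of.items():
        is_split = len(otus) > 1
        is_merged = any(len(species_in[otu]) > 1 for otu in otus)
        if is_split and is_merged:
            categories[species] = "MIXTURE"
        elif is_split:
            categories[species] = "SPLIT"
        elif is_merged:
            categories[species] = "MERGE"
        else:
            categories[species] = "MATCH"
    return categories


def discordance(categories):
    """Fraction of species whose category is anything other than MATCH."""
    mismatched = sum(1 for c in categories.values() if c != "MATCH")
    return mismatched / len(categories)
```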
Performance Comparison of Algorithms for OTU Recognition
The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions. RESL performed best or tied for top score in 7 of 8 datasets with this test.
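This page does not give the F-Measure formula, so the sketch below uses a common clustering variant as an assumption: for each reference species, take the best harmonic mean of precision and recall over all OTUs, then weight each species by its share of specimens. It has the stated properties (range 0 to 1, with 1 for a perfect reproduction of the reference partition) but may differ in detail from the paper's definition.

```python
from collections import Counter


def f_measure(species_labels, otu_labels):
    """Class-weighted clustering F-measure of OTUs against reference species."""
    total = len(species_labels)
    class_sizes = Counter(species_labels)
    cluster_sizes = Counter(otu_labels)
    # (species, OTU) -> number of specimens they share
    overlap = Counter(zip(species_labels, otu_labels))

    best = {}  # best F score found for each species
    for (species, otu), shared in overlap.items():
        precision = shared / cluster_sizes[otu]
        recall = shared / class_sizes[species]
        score = 2 * precision * recall / (precision + recall)
        best[species] = max(best.get(species, 0.0), score)

    # Weight each species by its share of all specimens.
    return sum(class_sizes[s] / total * best[s] for s in class_sizes)
```

The two arguments are parallel lists with one entry per specimen: its reference species and the OTU an algorithm assigned it to.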
Time Trials
The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset. However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013).
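The "more than 100 times faster" claim can be checked against the quoted timings with simple arithmetic. Because RESL's run time is given only as an upper bound (less than 2 minutes), both results below are lower bounds; the variable names are illustrative.

```python
# Figures quoted in the text for the largest dataset.
SEQUENCES = 11_100        # "11.1 K sequences"
RESL_MINUTES_MAX = 2      # RESL finished in "less than 2 minutes" (upper bound)
ABGD_MINUTES = 541        # next fastest option

# RESL took under 2 minutes, so the true speedup exceeds this ratio.
min_speedup = ABGD_MINUTES / RESL_MINUTES_MAX

# Lower bound on RESL throughput, in sequences per minute.
min_throughput = SEQUENCES / RESL_MINUTES_MAX

print(f"speedup over ABGD: more than {min_speedup:.1f}x")
print(f"RESL throughput: more than {min_throughput:.0f} sequences/minute")
```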
Implementation
Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derive from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomple...
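The flag rule reads as a simple predicate over the records on a BIN page: green if at least one record is fully compliant, yellow otherwise (the yellow case is stated in the full source quote under Evidence quotes). The sketch below is illustrative only; the `SpecimenRecord` fields are hypothetical names, since the text lists the criteria but no schema, and the thresholds follow the wording of this passage (>500 bp, <1% n).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpecimenRecord:
    # Hypothetical field names for the criteria named in the text.
    sequence_length: int
    ambiguous_fraction: float
    has_trace_files: bool
    has_voucher: bool
    country: Optional[str] = None


def is_fully_compliant(record: SpecimenRecord) -> bool:
    """Check one record against the compliance criteria quoted above."""
    return (
        record.sequence_length > 500
        and record.ambiguous_fraction < 0.01
        and record.has_trace_files
        and record.has_voucher
        and bool(record.country)  # at least country of origin
    )


def bin_flag(records) -> str:
    """Green if any record on the BIN page is fully compliant, else yellow."""
    return "green" if any(is_fully_compliant(r) for r in records) else "yellow"
```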
Measurement outputs
What raw and processed outputs should exist?
Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o...
- Raw artifact
- Per-sample or per-animal endpoint measurements collected during the experiment
- Processed artifact
- Structured table with cleaned measurements ready for comparison
- Reported as
- Summary statistics and between-group or across-timepoint comparisons
Analysis plan
How should the outputs become interpretable results?
Acquisition
Collect raw experimental outputs with enough metadata to preserve sample identity, condition, and timing.
inferred from protocol
Preprocessing / cleaning
Puillandre et al.
from paper
Scoring or quantification
Quantify the primary readouts for this experiment:
- Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
- Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie...
- Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...
- The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o...
from paper
Statistical comparison
- Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
- Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...
- This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species...
- Single linkage clustering is performed on the aligned sequence data. This approach ordinarily requires the generation of a distance matrix for all pairs of sequences followed by...
from paper
Reporting output
Report representative outputs alongside summary comparisons for:
- Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
- Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie...
- Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...
- The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o...
inferred from protocol
Structured statistical methods
- Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
- Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...
- This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species...
- Single linkage clustering is performed on the aligned sequence data. This approach ordinarily requires the generation of a distance matrix for all pairs of sequences followed by...
source structured
Source and audit
What supports the facts on this page?
Evidence quotes (8)
• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution.
This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordance between the OTUs established by each algorithm and recognized species boundaries. This comparison was implemented by examining the correspondence between species and OTU boundaries by placing each taxon into one of four categories: MATCH, SPLIT, MERGE, or MIXTURE. A species joined the MATCH categ...
Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia ) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that would arise if the component sequences inadvertently (e.g. contamination, laboratory error) derived from two different taxa.
Eight datasets were employed to test the performance of the five algorithms available for OTU recognition ( ). These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD ( ), while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1 or GenBank accessions in ). These eight datasets have varying sampling densities, with the average number of specimens per species ranging from 2.2 to 17.3 ( ). Mean intraspecific variation and nearest-neighbour distances also show substantial heterogeneity among the species in each dataset. In testing the performance of the algorithms, the taxonomic assignment for each...
compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was weakest (27.4%).
The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions ( ). RESL performed best or tied for top score in 7 of 8 datasets with this test ( ).
The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset ( ). However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013).
Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derives from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomplete records are marked in yellow.
Machine-readable layer
[
{
"@context": "https://schema.org",
"@type": "HowTo",
"name": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods",
"description": "Evidence-backed execution summary for A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods from A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System.",
"totalTime": "PT2M",
"step": [
{
"@type": "HowToStep",
"position": 1,
"name": "Implementation",
"text": "• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution."
},
{
"@type": "HowToStep",
"position": 2,
"name": "Benchmarking Algorithms for the Recognition of Animal Species",
"text": "This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordan..."
},
{
"@type": "HowToStep",
"position": 3,
"name": "OTU Pipeline On BOLD",
"text": "Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia ) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that w..."
},
{
"@type": "HowToStep",
"position": 4,
"name": "Performance Comparison",
"text": "Eight datasets were employed to test the performance of the five algorithms available for OTU recognition ( ). These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD ( ), while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1 or GenBank accessions in ). These eight datasets have varying sampling d..."
},
{
"@type": "HowToStep",
"position": 5,
"name": "Performance Comparison of Algorithms for OTU Recognition",
"text": "compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was weakest (27.4%)."
},
{
"@type": "HowToStep",
"position": 6,
"name": "Performance Comparison of Algorithms for OTU Recognition",
"text": "The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions ( ). RESL performed best or tied for top score in 7 of 8 datasets with this test ( )."
},
{
"@type": "HowToStep",
"position": 7,
"name": "Time Trials",
"text": "The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset ( ). However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013)."
},
{
"@type": "HowToStep",
"position": 8,
"name": "Implementation",
"text": "Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derives from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomple..."
}
],
"tool": [
{
"@type": "HowToTool",
"name": "Implementation"
},
{
"@type": "HowToTool",
"name": "Mapping Animal Diversity: the Barcode Index Number (BIN) System"
},
{
"@type": "HowToTool",
"name": "Mapping Animal Diversity: the Barcode Index Number (BIN) System"
},
{
"@type": "HowToTool",
"name": "Time Trials"
},
{
"@type": "HowToTool",
"name": "Implementation"
},
{
"@type": "HowToTool",
"name": "Implementation"
},
{
"@type": "HowToTool",
"name": "BIN Partitions"
}
],
"supply": [
{
"@type": "HowToSupply",
"name": "Algorithms for OTU Recognition"
}
],
"isBasedOn": {
"@type": "ScholarlyArticle",
"headline": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System",
"datePublished": "2013",
"author": [
{
"@type": "Person",
"name": "Sujeevan Ratnasingham"
},
{
"@type": "Person",
"name": "Paul D. N. Hebert"
}
],
"identifier": "10.1371/journal.pone.0066213"
}
},
{
"@context": "https://schema.org",
"@type": "BreadcrumbList",
"itemListElement": [
{
"@type": "ListItem",
"position": 1,
"name": "Experiments",
"item": "https://replicatescience.com/experiments"
},
{
"@type": "ListItem",
"position": 2,
"name": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods",
"item": "https://replicatescience.com/experiments/a-dna-based-registry-for-all-animal-species-the-barcode-index-number-bin-system-methods-sujeevan-ratnasingham-pmc3704603/a-dna-based-registry-for-all-animal-species-the-barcode-index-number-bin-system-mlpgzc0o"
}
]
}
]