02 · Shopping and prep

Shopping and prep list

What do I need before I start?

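biological · source linked

mouse

Named in the source only as an example of an external contaminant (alongside human) that sequences are screened against; the quoted evidence does not describe a mouse subject model or cohort.

Use: confirm in the source paper whether any subject model applies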

Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that would arise if the component sequences inadvertently (e.g. contamination, laboratory error) derived from two different taxa.

reagent · source linked

Algorithms for OTU Recognition

Listed as a reagent, but the linked evidence describes an algorithm (the GMYC model), not a physical reagent.

Use: Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving entities. This strategy uses a maximum likelihood approach to detect the transition of branching patterns in the gene tree from interspecific b...

source-linked evidence quote

instrument · source linked

Implementation

Use: • Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution.

source-linked evidence quote

instrument · source linked

Mapping Animal Diversity: the Barcode Index Number (BIN) System

Use: Although the selection of an effective, rapid algorithm for OTU recognition is a key step in building a DNA-based registry for animal species, it needs to be coupled with a persistent informatics platform which maps each newly acquired sequence to an existing OTU or recognizes it as a founder. Ideally, each OTU shou...

source-linked evidence quote

instrument · source linked

Mapping Animal Diversity: the Barcode Index Number (BIN) System

Use: This paper begins by examining the concordance between species inferred from prior morphological taxonomy and the OTUs recognized by RESL. Its speed and capacity to recover OTUs corresponding to known species are subsequently evaluated against four other algorithms. The final section of the paper describes varied as...

source-linked evidence quote

instrument · source linked

Time Trials

Use: The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorit...

source-linked evidence quote

instrument · source linked

Implementation

Use: Because of its strong taxonomic performance and speed, RESL was adopted to generate OTUs for the barcode sequences on BOLD. Each of the OTUs resulting from this analysis was subsequently assigned a unique alphanumeric code with a standard structure (BOLD: 3 letters, 4 numbers). The overall informatics system suppor...

source-linked evidence quote
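The code structure quoted in this item (BOLD: 3 letters, 4 numbers) can be checked with a small validator. This is a minimal sketch, not BOLD's own implementation: the function name is_bin_code, the literal "BOLD:" prefix, the uppercase-only letters, and the example identifiers are all assumptions made for illustration.

```python
import re

# Assumed layout: literal "BOLD:" prefix, then 3 uppercase letters, then 4 digits.
BIN_PATTERN = re.compile(r"BOLD:[A-Z]{3}[0-9]{4}")


def is_bin_code(code: str) -> bool:
    """Return True if the whole string matches the assumed BIN layout."""
    return BIN_PATTERN.fullmatch(code) is not None


if __name__ == "__main__":
    # Hypothetical identifiers, used only to exercise the pattern.
    for candidate in ["BOLD:ABC1234", "BOLD:AB1234", "ABC1234"]:
        print(candidate, is_bin_code(candidate))
```

Using fullmatch rather than search rejects codes with extra leading or trailing characters.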

instrument · source linked

Implementation

Use: Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designa...

source-linked evidence quote

instrument · source linked

BIN Partitions

Use: As noted earlier, RESL merged 7.9% and split 2.7% of the species in the eight test datasets. These cases of discordance between BIN assignments and current taxonomy can have four explanations. They may reflect taxonomic error, sequence contamination, deficits in RESL, or the inability of sequence variation at COI to...

source-linked evidence quote

03 · Execution checks

Before you run

What should be confirmed before execution?

First confirmation

Equipment is listed but no product mappings are linked.

Confirm before execution

This page is backed by a publishable Replication Data Ledger package with zero critical source-verification issues.

Confirm before execution

Open the source paper before finalizing run-specific details.

Procurement checkpoint

Use source-stated vendors where present. Treat mapped products as sourcing options unless the page marks an exact source match.

04 · Procedure

Step-by-step procedure

What do I do, in order?

01 · extracted step

1 evidence link

Implementation

Needed: Implementation

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...

02 · extracted step

1 evidence link

Benchmarking Algorithms for the Recognition of Animal Species

This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordan...

Needed: source paper and local SOP

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
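The source goes on to place each species into one of four categories (MATCH, SPLIT, MERGE, MIXTURE), but the quoted text is cut off before the definitions. The sketch below therefore rests on an assumption inferred from the category names: a species is SPLIT if its records fall in more than one OTU, MERGE if its single OTU also holds other species, MIXTURE if both apply, and MATCH otherwise. The function name and the toy labels are hypothetical; check the rules against the source paper before relying on them.

```python
from collections import defaultdict


def classify_species(species, otus):
    """Map each species to MATCH, SPLIT, MERGE or MIXTURE (assumed rules).

    species[i] and otus[i] are the species name and OTU label of record i.
    """
    if len(species) != len(otus):
        raise ValueError("species and otus must have the same length")

    otus_of = defaultdict(set)     # species -> OTUs holding its records
    species_in = defaultdict(set)  # OTU -> species found inside it
    for sp, otu in zip(species, otus):
        otus_of[sp].add(otu)
        species_in[otu].add(sp)

    categories = {}
    for sp, members in otus_of.items():
        is_split = len(members) > 1
        shares_otu = any(len(species_in[o]) > 1 for o in members)
        if is_split and shares_otu:
            categories[sp] = "MIXTURE"
        elif is_split:
            categories[sp] = "SPLIT"
        elif shares_otu:
            categories[sp] = "MERGE"
        else:
            categories[sp] = "MATCH"
    return categories


if __name__ == "__main__":
    # Toy records: one species name and one OTU label per record.
    names = ["a", "a", "b", "b", "c", "d", "e", "e"]
    labels = [1, 1, 2, 3, 4, 4, 5, 4]
    print(classify_species(names, labels))
```

The share of species outside MATCH gives a discordance figure of the kind the source reports per algorithm, under the same assumed definitions.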

03 · extracted step

1 evidence link

OTU Pipeline On BOLD

Needed: source paper and local SOP

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
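The first stage of the OTU pipeline on BOLD, as quoted elsewhere on this page, excludes any record with less than 500 bp coverage of the COI barcode region or with more than 1% ambiguous bases. A minimal sketch of that filter follows. It is not BOLD's code: the function name, the treatment of every non-ACGT character as ambiguous, and the removal of alignment gaps before counting are assumptions. The later pipeline stages (pseudogene, contaminant, and chimera checks) are not covered.

```python
def passes_quality_filter(sequence, min_bp=500, max_ambiguous_fraction=0.01):
    """Return True if a barcode sequence meets the quoted length and ambiguity limits.

    Assumption: gaps ("-") do not count as coverage, and any base other
    than A, C, G or T counts as ambiguous.
    """
    bases = sequence.upper().replace("-", "")
    if len(bases) < min_bp:
        return False  # less than 500 bp coverage
    ambiguous = sum(1 for base in bases if base not in "ACGT")
    return ambiguous / len(bases) <= max_ambiguous_fraction


if __name__ == "__main__":
    clean = "ACGT" * 125             # 500 bp, no ambiguous bases
    short = "ACGT" * 100             # 400 bp
    noisy = "N" * 6 + "ACGT" * 125   # 506 bp, about 1.2% ambiguous
    for name, seq in [("clean", clean), ("short", short), ("noisy", noisy)]:
        print(name, passes_quality_filter(seq))
```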

04 · extracted step

1 evidence link

Performance Comparison

Eight datasets were employed to test the performance of the five algorithms available for OTU recognition ( ). These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD ( ), while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1 or GenBank accessions in ). These eight datasets have varying sampling d...

Needed: source paper and local SOP

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...

05 · extracted step

1 evidence link

Performance Comparison of Algorithms for OTU Recognition

compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was weakest (27.4%).

Needed: Algorithms for OTU Recognition

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...

06 · extracted step

1 evidence link

Performance Comparison of Algorithms for OTU Recognition

The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions ( ). RESL performed best or tied for top score in 7 of 8 datasets with this test ( ).

Needed: Algorithms for OTU Recognition

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
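The quoted text gives the range of the F-Measure (0 to 1, with 1 meaning perfect reproduction of the ground-truth partitions) but not its formula. The sketch below uses one common clustering F-measure, assumed here rather than taken from the source: for each reference species, take the best F1 score over all OTUs, then average those scores weighted by species size. The function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def clustering_f_measure(species, otus):
    """Size-weighted best-match F-measure between species and OTU labels."""
    if len(species) != len(otus):
        raise ValueError("species and otus must have the same length")
    total = len(species)
    if total == 0:
        raise ValueError("at least one record is required")

    species_sizes = Counter(species)
    otu_sizes = Counter(otus)
    overlap = Counter(zip(species, otus))  # records shared by each (species, OTU) pair

    best_f1 = defaultdict(float)
    for (sp, otu), shared in overlap.items():
        precision = shared / otu_sizes[otu]
        recall = shared / species_sizes[sp]
        f1 = 2 * precision * recall / (precision + recall)
        best_f1[sp] = max(best_f1[sp], f1)

    return sum(species_sizes[sp] * best_f1[sp] for sp in species_sizes) / total


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    print(clustering_f_measure(truth, [1, 1, 2, 2]))  # OTUs reproduce the species
    print(clustering_f_measure(truth, [1, 1, 1, 1]))  # both species merged into one OTU
```

Under this definition a score of 1 is reached only when every species maps to exactly one OTU that contains nothing else, which matches the quoted description of a perfect score.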

07 · extracted step

1 evidence link

Time Trials

The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset ( ). However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013).

Needed: Time Trials

Timing: less than 2 minutes (RESL run time for the largest dataset, 11.1 K sequences)

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
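The speed comparison in this step can be checked with simple arithmetic. The sketch below uses only the figures quoted above (541 minutes for ABGD, less than 2 minutes for RESL, 11.1 K sequences). Because the RESL time is stated only as an upper bound, the results are lower bounds; the variable names are illustrative, and 11.1 K is treated as 11,100 sequences.

```python
# Figures quoted in the Time Trials step.
abgd_minutes = 541.0            # next fastest option (ABGD)
resl_minutes_upper_bound = 2.0  # RESL finished in less than this
sequences = 11_100              # "11.1 K sequences", rounded as quoted

# RESL took less than 2 minutes, so the true values exceed these bounds.
min_speedup = abgd_minutes / resl_minutes_upper_bound
min_resl_rate = sequences / resl_minutes_upper_bound  # sequences per minute

print(f"RESL speed-up over ABGD: more than {min_speedup:.1f}x")
print(f"RESL throughput: more than {min_resl_rate:.0f} sequences per minute")
```

The lower bound of 270.5 is consistent with the quoted claim that RESL was more than 100 times faster than any other method.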

08 · extracted step

1 evidence link

Implementation

Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derives from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomple...

Needed: Implementation

Timing: not specified

Conditions: Directly quoted from source evidence; verify all lab-specific constraints against the source paper before execution.

Output: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...
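The flagging rule from this step (green when a BIN page includes one or more fully compliant records, yellow when it is based only on incomplete records) can be sketched as below. The compliance criteria follow the quoted text: sequence longer than 500 bp with less than 1% n, trace files available, and a voucher specimen with at least a country of origin. The Record class, its field names, and the function names are hypothetical and do not reflect BOLD's data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical specimen record; field names are illustrative only."""
    length_bp: int
    ambiguous_fraction: float  # share of "n" bases, 0.0 to 1.0
    has_trace_files: bool
    has_voucher: bool
    country: Optional[str] = None


def is_fully_compliant(record: Record) -> bool:
    """Apply the quoted criteria for a fully compliant specimen record."""
    return (
        record.length_bp > 500
        and record.ambiguous_fraction < 0.01
        and record.has_trace_files
        and record.has_voucher
        and bool(record.country)
    )


def bin_flag(records: List[Record]) -> str:
    """Green if any record is fully compliant, otherwise yellow."""
    return "green" if any(is_fully_compliant(r) for r in records) else "yellow"


if __name__ == "__main__":
    complete = Record(658, 0.0, True, True, "Canada")
    no_country = Record(658, 0.0, True, True, None)
    print(bin_flag([no_country, complete]))  # one compliant record is enough
    print(bin_flag([no_country]))
```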

05 · Measurement

Measurement outputs

What raw and processed outputs should exist?

from paper

Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...

Raw artifact: Per-sample or per-animal endpoint measurements collected during the experiment
Processed artifact: Structured table with cleaned measurements ready for comparison
Reported as: Summary statistics and between-group or across-timepoint comparisons

from paper

• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie...

Raw artifact: Per-sample or per-animal endpoint measurements collected during the experiment
Processed artifact: Structured table with cleaned measurements ready for comparison
Reported as: Summary statistics and between-group or across-timepoint comparisons

from paper

Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...

Raw artifact: Per-sample or per-animal endpoint measurements collected during the experiment
Processed artifact: Structured table with cleaned measurements ready for comparison
Reported as: Summary statistics and between-group or across-timepoint comparisons

from paper

The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o...

Raw artifact: Per-sample or per-animal endpoint measurements collected during the experiment
Processed artifact: Structured table with cleaned measurements ready for comparison
Reported as: Summary statistics and between-group or across-timepoint comparisons

06 · Analysis

Analysis plan

How should the outputs become interpretable results?

Acquisition

Collect raw experimental outputs with enough metadata to preserve sample identity, condition, and timing.

inferred from protocol

Preprocessing / cleaning

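Filter each new sequence for quality, excluding any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases, then screen for pseudogenes, contaminants, and chimeras as described in the source.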

from paper

Scoring or quantification

Quantify the primary readouts for this experiment: Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...; • Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie...; Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...; The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o....

from paper

Statistical comparison

Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the...; Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent...; This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species...; Single linkage clustering is performed on the aligned sequence data. This approach ordinarily requires the generation of a distance matrix for all pairs of sequences followed by...

from paper
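The single linkage step mentioned above, where a distance matrix over all sequence pairs feeds the clustering, can be illustrated with a small union-find sketch. This is not RESL, which the source describes as a faster approach, and it is not taken from the paper: the function names, the use of uncorrected p-distance, and the threshold value in the example are all assumptions.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(sequences, threshold):
    """Cluster sequences by joining any pair whose distance is <= threshold.

    Returns one cluster label per sequence, numbered from 0 in input order.
    Every pair is compared, so the cost grows quadratically with input size.
    """
    parent = list(range(len(sequences)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sequences)), 2):
        if p_distance(sequences[i], sequences[j]) <= threshold:
            parent[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(sequences))]


if __name__ == "__main__":
    toy = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "CCCCCCCCCC"]
    print(single_linkage(toy, threshold=0.1))  # hypothetical threshold
```

In the toy run the first and third sequences differ by 20% yet land in one cluster, because each is within the threshold of the second; this chaining is the defining behaviour of single linkage.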

Reporting output

Report representative outputs alongside summary comparisons for Puillandre et al. developed a statistical method, Automatic Barcode Gap Discovery (ABGD), to generate OTUs based on features in sequence distance distributions that indicate the..., • Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fie..., Pons et al. proposed a model-based solution, one based in phylogenetic approaches using the General Mixed Yule Coalescent (GMYC) model that represents independently evolving ent..., The study introduces RESL, an algorithm whose design was primarily driven by the need for rapid computation to process the current 1.8 M barcode sequence records and to enable o....

inferred from protocol

Structured statistical methods

source structured

07 · Source layer

Source and audit

What supports the facts on this page?

Source identity: available

Structured protocol: available

Methods evidence: available

Materials/equipment listed: available

Specific product links: needs review

Evidence quotes (8)

• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution.
Source method evidence

This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordance between the OTUs established by each algorithm and recognized species boundaries. This comparison was implemented by examining the correspondence between species and OTU boundaries by placing each taxon into one of four categories: MATCH, SPLIT, MERGE, or MIXTURE. A species joined the MATCH categ...
Source method evidence

Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia ) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that would arise if the component sequences inadvertently (e.g. contamination, laboratory error) derived from two different taxa.
Source method evidence

Eight datasets were employed to test the performance of the five algorithms available for OTU recognition ( ). These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD ( ), while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1 or GenBank accessions in ). These eight datasets have varying sampling densities, with the average number of specimens per species ranging from 2.2 to 17.3 ( ). Mean intraspecific variation and nearest-neighbour distances also show substantial heterogeneity among the species in each dataset. In testing the performance of the algorithms, the taxonomic assignment for each...
Source method evidence

compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was weakest (27.4%).
Source method evidence

The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions ( ). RESL performed best or tied for top score in 7 of 8 datasets with this test ( ).
Source method evidence

The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset ( ). However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013).
Source method evidence

Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derives from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomplete records are marked in yellow.
Source method evidence

Machine-readable layer

[
  {
    "@context": "https://schema.org",
    "@type": "HowTo",
    "name": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods",
    "description": "Evidence-backed execution summary for A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods from A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System.",
    "totalTime": "PT2M",
    "step": [
      {
        "@type": "HowToStep",
        "position": 1,
        "name": "Implementation",
        "text": "• Micro-Attribution: Attribution details for each record are provided with collector, identifier, photographer, sequencing facility, and specimen depository as primary fields. Attribution is tallied and sorted based on the number of records associated with each individual or institution."
      },
      {
        "@type": "HowToStep",
        "position": 2,
        "name": "Benchmarking Algorithms for the Recognition of Animal Species",
        "text": "This study evaluates the performance of five algorithms (ABGD, CROP, GMYC, jMOTU, RESL) from two perspectives - their speed, and their effectiveness in recovering species boundaries. The speed of each algorithm was evaluated by determining the time it required to process eight trial datasets. The efficiency of each algorithm in recovering species boundaries was evaluated by examining the correspondence between the OTUs recovered by it and species memberships for each dataset. One statistical metric, F-Measure, was employed to quantify the ability of each algorithm to reproduce the reference groups (species in this case). Although mathematically concise, this metric has the disadvantage of being abstract and lacks a fixed scale of measurement (i.e. it can only be compared within a single dataset). As a result, performance was also evaluated by direct examination of the concordan..."
      },
      {
        "@type": "HowToStep",
        "position": 3,
        "name": "OTU Pipeline On BOLD",
        "text": "Each new sequence is first filtered for quality, a process that excludes any record with less than 500 bp coverage for the barcode region of COI or with more than 1% ambiguous bases. If a sequence meets these quality requirements, it is then checked for reading frame shifts as indicated by stop codons or improbable peptides given the COI profile. Because sequences showing these attributes are likely to derive from pseudogenes, they are excluded. Sequences are then screened to ensure that they do not derive from bacterial (e.g. Wolbachia ) or certain external (e.g. human, mouse) contaminants by matching the sequence recovered from each specimen against a reference library of bacterial and selected vertebrate sequences. Finally, when a sequence record originates from the assembly of two or more shorter sequences, the Bellerophon package is utilized to check for possible chimeras that w..."
      },
      {
        "@type": "HowToStep",
        "position": 4,
        "name": "Performance Comparison",
        "text": "Eight datasets were employed to test the performance of the five algorithms available for OTU recognition ( ). These datasets include four taxonomic groups (birds, fishes, moths and butterflies, bees) from two climatic regimes (temperate, tropics). Global barcode coverage is not available for any major taxonomic group, but these datasets examine taxon assemblages at both regional and continental scales. Each dataset only includes records that were associated with a valid taxonomic name; sequences associated with interim names were excluded. Seven of the datasets derive from a published study and have been placed in datasets on BOLD ( ), while the eighth includes new records which provide comprehensive coverage for the North American representatives of the Plusiinae, a moth subfamily (dx.doi.org/10.5883/DS-PLUSNA1 or GenBank accessions in ). These eight datasets have varying sampling d..."
      },
      {
        "@type": "HowToStep",
        "position": 5,
        "name": "Performance Comparison of Algorithms for OTU Recognition",
        "text": "compares the performance of ABGD, CROP, jMOTU, and RESL in analysis of the largest dataset, the Lepidoptera of Eastern North America. Results for GMYC are unavailable because analysis was incomplete after the established time limit of two weeks. However, the performance of GMYC and RESL for other datasets is compared later. CROP, jMOTU, and RESL produced an OTU count that closely approximated the actual species number (1327), but the ABGD algorithm inflated it by about 100 species, reflecting its tendency to split sequence clusters. The tally of OTUs involved in MERGES, MIXTURES and SPLITS provides a measure of the departure of the OTUs recovered by each algorithm from recognized taxonomy. Viewed from this perspective, RESL was top performer (12.5% taxonomic discordance) for the Lepidoptera of North America dataset, while CROP was weakest (27.4%)."
      },
      {
        "@type": "HowToStep",
        "position": 6,
        "name": "Performance Comparison of Algorithms for OTU Recognition",
        "text": "The performance of the four algorithms was also compared using the F-Measure index which returns values from 0 to 1 with 1 indicating perfect reproduction of the ground-truth partitions ( ). RESL performed best or tied for top score in 7 of 8 datasets with this test ( )."
      },
      {
        "@type": "HowToStep",
        "position": 7,
        "name": "Time Trials",
        "text": "The run-time for RESL was compared with those for the other three algorithms on a 2012 model iMac with an i7 Intel processor and 8 gigabytes of memory. CROP, jMOTU, and RESL could take advantage of the four CPU cores on this system and were allowed to do so. This analysis revealed that run times for all four algorithms rose in an almost linear fashion with increasing size of the dataset. However, RESL was more than 100 times faster than any of the other methods, completing the largest dataset (11.1 K sequences) in less than 2 minutes versus 541 minutes for the next fastest option (ABGD). More importantly, it showed the closest approach to linear computational complexity, a feature critical to the analyses of the barcode sequences on BOLD (1.81 M circa April 2013)."
      },
      {
        "@type": "HowToStep",
        "position": 8,
        "name": "Implementation",
        "text": "Because the BIN system gains power with increasing species coverage, records have been analyzed which are not fully compliant with the DNA barcode standard. Although all records in the BIN registry meet the sequence standard (>500 bp, <1% n), some lack the specimen data required to qualify for formal barcode designation. These non-compliant records are a minority because 1.68 M of the 1.81 M records derive from BOLD and nearly all (99.7%) have the required linkage to a voucher specimen and geospatial data. However, many of the 0.12 M records from GenBank lack connection to a voucher specimen and only 14.5% possess country information. BIN pages that include one or more fully compliant specimen records (sequence record >500 bp with <1% n and with trace files available, voucher specimen with at least country of origin) have been assigned a green flag, while those only based on incomple..."
      }
    ],
    "tool": [
      {
        "@type": "HowToTool",
        "name": "Implementation"
      },
      {
        "@type": "HowToTool",
        "name": "Mapping Animal Diversity: the Barcode Index Number (BIN) System"
      },
      {
        "@type": "HowToTool",
        "name": "Time Trials"
      },
      {
        "@type": "HowToTool",
        "name": "BIN Partitions"
      }
    ],
    "supply": [
      {
        "@type": "HowToSupply",
        "name": "Algorithms for OTU Recognition"
      }
    ],
    "isBasedOn": {
      "@type": "ScholarlyArticle",
      "headline": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System",
      "datePublished": "2013",
      "author": [
        {
          "@type": "Person",
          "name": "Sujeevan Ratnasingham"
        },
        {
          "@type": "Person",
          "name": "Paul D. N. Hebert"
        }
      ],
      "identifier": "10.1371/journal.pone.0066213"
    }
  },
  {
    "@context": "https://schema.org",
    "@type": "BreadcrumbList",
    "itemListElement": [
      {
        "@type": "ListItem",
        "position": 1,
        "name": "Experiments",
        "item": "https://replicatescience.com/experiments"
      },
      {
        "@type": "ListItem",
        "position": 2,
        "name": "A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System methods",
        "item": "https://replicatescience.com/experiments/a-dna-based-registry-for-all-animal-species-the-barcode-index-number-bin-system-methods-sujeevan-ratnasingham-pmc3704603/a-dna-based-registry-for-all-animal-species-the-barcode-index-number-bin-system-mlpgzc0o"
      }
    ]
  }
]
