Source Paper
Susan Michie, Caroline E Wood, Marie Johnston, Charles Abraham, Jill J Francis et al.
Health Technology Assessment • 2015
Background
Meeting global health challenges requires effective behaviour change interventions (BCIs). This depends on advancing the science of behaviour change, which in turn depends on accurate intervention reporting. Current reporting often lacks detail, preventing accurate replication and implementation. Recent developments have specified intervention content into behaviour change techniques (BCTs) – the ‘active ingredients’, for example goal-setting and self-monitoring of behaviour. BCTs are ‘the smallest components compatible with retaining the postulated active ingredients, i.e. the proposed mechanisms of change. They can be used alone or in combination with other BCTs’ (Michie S, Johnston M. Theories and techniques of behaviour change: developing a cumulative science of behaviour change. Health Psychol Rev 2012;6:1–6). Domain-specific taxonomies of BCTs have been developed, for example for healthy eating and physical activity, smoking cessation and alcohol consumption. We need to build on these to develop an internationally shared language for specifying and developing interventions. This technology can be used for synthesising evidence, implementing effective interventions and testing theory. It has enormous potential added value for science and global health.
Objective
(1) To develop a method of specifying the content of BCIs in terms of component BCTs; (2) to lay a foundation for a comprehensive methodology applicable to different types of complex interventions; (3) to develop resources to support application of the taxonomy; and (4) to achieve multidisciplinary and international acceptance for future development.
Design and participants
Four hundred participants (systematic reviewers, researchers, practitioners, policy-makers) from 12 countries engaged in investigating, designing and/or delivering BCIs. Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with 41 international experts; the hierarchical structure of the list was developed using inductive ‘bottom-up’ and theory-driven ‘top-down’ open-sort procedures (n = 36); training in use of the taxonomy (1-day workshops and distance group tutorials) (n = 161) was evaluated by changes in intercoder reliability and validity (agreement with expert consensus); the taxonomy was evaluated for coding interventions by reliability (intercoder; test–retest) and validity (n = 40 trained coders); and it was evaluated for writing descriptions by reliability (intercoder; test–retest) and by experimentally testing its value (n = 190).
Results
Ninety-three distinct, non-overlapping BCTs with clear labels and definitions formed Behaviour Change Technique Taxonomy version 1 (BCTTv1). BCTs clustered into 16 groupings using a ‘bottom-up’ open-sort procedure; there was overlap between these and the groupings produced by a theory-driven, ‘top-down’ procedure. Both training methods improved validity (both p < 0.05), doubled the proportion of coders achieving competence and improved confidence in identifying BCTs in workshops (both p < 0.001), but did not improve intercoder reliability. Good intercoder reliability was observed for 80 of the 93 BCTs. Good within-coder agreement was observed after 1 month (p < 0.001). Validity was good for 14 of 15 BCTs in the descriptions. Results on the usefulness of BCTTv1 for reporting descriptions of observed interventions were mixed.
Conclusions
The developed taxonomy (BCTTv1) provides a methodology for identifying the content of complex BCIs and a foundation for international cross-disciplinary collaboration on developing more effective interventions to improve health. Further work is needed to examine its usefulness for reporting interventions.
Funding
This project was funded by the Medical Research Council (ref. G0901474/1), with additional funding from the Peninsula Collaboration for Leadership in Applied Health Research and Care.
Objective: To evaluate a taxonomy for writing intervention descriptions through intercoder and test–retest reliability assessment and experimental testing of its usefulness with 190 participants.
This is a taxonomy description-writing evaluation protocol with human participants. The procedure involves 8 procedural steps. Extracted from a 2015 paper published in Health Technology Assessment.
Model and subjects
human • 190
Study window
Estimated timing pending
Core workflow
Taxonomy Development via Delphi Procedure • Hierarchical Structure Development • Training in Taxonomy Use
Primary readouts
Key equipment and reagents
Verified items
0
Direct vendor links
0
Use this page as an execution guide, then fall back to the source paper whenever you need exact exclusions, procedural details, or measure-specific caveats.
Confirm first
Use the page like this
Start here. The step list is optimized for running the experiment, with direct vendor links available inline when you need to source a cited item.
Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with international experts, to create the behaviour change technique taxonomy
Note: Involved 41 international experts in iterative consultation process
“Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with 41 international experts”
Hierarchical structure of the taxonomy list was developed using inductive 'bottom-up' and theory-driven 'top-down' open-sort procedures
Note: 36 participants engaged in open-sort procedures
“hierarchical structure of the list was developed using inductive 'bottom-up' and theory-driven 'top-down' open-sort procedures (n = 36)”
Training in use of the taxonomy was provided through 1-day workshops and distance group tutorials; training was evaluated by changes in intercoder reliability and validity
Note: 161 participants received training; evaluation measured agreement with expert consensus
“training in use of the taxonomy (1-day workshops and distance group tutorials) (n = 161) was evaluated by changes in intercoder reliability and validity”
Evaluating the taxonomy for coding interventions was assessed by reliability measures (intercoder and test-retest) and validity assessment
Note: 40 trained coders participated in this evaluation phase
“evaluating the taxonomy for coding interventions was assessed by reliability (intercoder; test-retest) and validity (n = 40 trained coders)”
Evaluating the taxonomy for writing descriptions was assessed by reliability measures (intercoder and test-retest) and by experimentally testing its value
Note: 190 participants engaged in this evaluation phase
“evaluating the taxonomy for writing descriptions was assessed by reliability (intercoder; test-retest) and by experimentally testing its value (n = 190)”
Assessment of intercoder reliability across the 93 behaviour change techniques in the taxonomy
Note: Good intercoder reliability was observed for 80 of the 93 BCTs
“Good intercoder reliability was observed for 80 of the 93 BCTs”
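Intercoder reliability for presence/absence coding of BCTs is typically summarised with a chance-corrected agreement statistic. The paper's exact statistic and data are not reproduced here; the sketch below illustrates one common choice, Cohen's kappa, on hypothetical ratings from two coders across ten example BCTs.

```python
# Illustrative sketch (not the paper's exact statistic): Cohen's kappa for
# two coders judging presence/absence of each BCT in one description.

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two binary rating sequences of equal length."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of agreements
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal rate of "present" (1)
    p_a = sum(coder_a) / n
    p_b = sum(coder_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical example data: 1 = BCT coded as present, 0 = absent
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(a, b), 3))  # 0.6
```

A per-BCT version of this calculation, run across many intervention descriptions, is how a "good reliability for 80 of 93 BCTs" style summary could be assembled.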
Within-coder agreement was assessed after 1 month to evaluate test-retest reliability
Note: Good within-coder agreement was observed
“Good within-coder agreement was observed after 1 month (p < 0.001)”
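Test-retest (within-coder) reliability asks whether the same coder makes the same presence/absence judgements after a delay, here 1 month. A minimal sketch, using hypothetical session data and plain proportion agreement rather than the paper's exact statistic:

```python
# Hedged sketch: within-coder (test-retest) agreement as the proportion of
# identical presence/absence judgements across two coding sessions.

def within_coder_agreement(session1, session2):
    """Proportion of BCT judgements unchanged between two sessions."""
    assert len(session1) == len(session2)
    return sum(x == y for x, y in zip(session1, session2)) / len(session1)

# Hypothetical judgements at baseline and 1 month later, over eight BCTs
t0 = [1, 1, 0, 0, 1, 0, 1, 0]
t1_month = [1, 1, 0, 1, 1, 0, 1, 0]
print(within_coder_agreement(t0, t1_month))  # 0.875
```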
Validity was assessed for behaviour change technique descriptions in the taxonomy
Note: Good validity was observed for 14 of 15 BCTs in the descriptions
“Validity was good for 14 of 15 BCTs in the descriptions”
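Validity in this design means agreement between a coder's judgements and an expert-consensus gold standard. One way to operationalise that comparison (the function name, scoring rule, and data below are illustrative assumptions, not taken from the paper) is to tally hits, misses, and false alarms against the consensus:

```python
# Hedged sketch: compare one coder's BCT judgements (1 = present) against an
# expert-consensus gold standard, counting hits, misses, and false alarms.

def validity_counts(coder, consensus):
    """Return (hits, misses, false_alarms) versus the consensus codes."""
    hits = sum(c == 1 and g == 1 for c, g in zip(coder, consensus))
    misses = sum(c == 0 and g == 1 for c, g in zip(coder, consensus))
    false_alarms = sum(c == 1 and g == 0 for c, g in zip(coder, consensus))
    return hits, misses, false_alarms

# Hypothetical example over six BCTs
coder = [1, 1, 0, 1, 0, 0]
consensus = [1, 1, 1, 0, 0, 0]
print(validity_counts(coder, consensus))  # (2, 1, 1)
```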
This section explains what the experiment is doing, which readouts matter, what the data artifacts usually look like, and how the analysis should flow from raw capture to reported result.
Objective
To evaluate a taxonomy for writing intervention descriptions through intercoder and test-retest reliability assessment and experimental testing of usefulness with 190 participants
Subjects
From paper: human
Sample count
From paper: 190
Cohort notes
From paper: Participants were systematic reviewers, researchers, practitioners, and policy-makers from 12 countries engaged in investigating, designing and/or delivering behaviour change interventions
Taxonomy Development via Delphi Procedure
Hierarchical Structure Development
Training in Taxonomy Use (1-day workshops plus distance tutorials)
Evaluation of Taxonomy for Coding Interventions
Intercoder reliability for coding interventions
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Test-retest reliability (within-coder agreement)
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Validity of taxonomy descriptions (agreement with expert consensus)
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Proportion of coders achieving competence
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Intercoder reliability for coding interventions
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Test-retest reliability (within-coder agreement)
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Validity of taxonomy descriptions (agreement with expert consensus)
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Proportion of coders achieving competence
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Acquisition
Collect raw experimental outputs with enough metadata to preserve sample identity, condition, and timing.
Preprocessing / cleaning
Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Scoring or quantification
Quantify the primary readouts for this experiment: Intercoder reliability for coding interventions; Test-retest reliability (within-coder agreement); Validity of taxonomy descriptions (agreement with expert consensus); Proportion of coders achieving competence.
Statistical comparison
Statistical method not yet structured for this page.
Reporting output
Report representative outputs alongside summary comparisons for Intercoder reliability for coding interventions, Test-retest reliability (within-coder agreement), Validity of taxonomy descriptions (agreement with expert consensus), Proportion of coders achieving competence.
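One reported summary, the proportion of coders achieving competence before versus after training, can be sketched as below. The competence threshold and the score lists are hypothetical placeholders; the paper defines its own competence criterion.

```python
# Hedged sketch of the reporting step: proportion of coders reaching an
# (assumed) competence threshold before and after training.

COMPETENCE_THRESHOLD = 0.70  # hypothetical agreement-with-consensus cut-off

def proportion_competent(scores, threshold=COMPETENCE_THRESHOLD):
    """Fraction of coders whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical per-coder validity scores pre- and post-training
pre = [0.55, 0.62, 0.71, 0.48, 0.66, 0.74]
post = [0.68, 0.75, 0.81, 0.66, 0.72, 0.79]
print(f"pre: {proportion_competent(pre):.2f}  post: {proportion_competent(post):.2f}")
# pre: 0.33  post: 0.67
```

In this invented example the proportion doubles after training, which mirrors the shape (though not the numbers) of the result reported in the abstract.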
Source links and direct wording from the methods section for validation and deeper review.
Citation
Susan Michie et al. (2015). Behaviour change techniques: the development and evaluation of a taxonomic method for reporting and describing behaviour change interventions (a suite of five studies involving consensus methods, randomised controlled trials and analysis of qualitative data). Health Technology Assessment
Direct vendor pages are linked from the protocol above. This section stays focused on the full comparison view and the prep checklist.
Use this section as the page quality checkpoint. It keeps section navigation, evidence access, readiness, and verification meaning in one place.
Current status surfaces were computed from experiment data updated Feb 28, 2026.
Source access
Jump back into the original paper or the methods evidence section when you need exact wording, exclusions, or method-specific caveats.
This protocol has structured steps plus evidence quotes, and is ready for canonical sync.
Steps
8
Evidence Quotes
8
Protocol Items
0
Linked Products
0
Canonical Sync
Pending
What this means
The completeness score reflects how much structured protocol data is present: steps, methods evidence, listed materials, linked products, and paper provenance.
Computed from the current experiment record updated Feb 28, 2026.
Canonical Sync shows whether a ConductGraph-backed protocol is available for this experiment route right now. It is a sync-status signal, not a claim that every downstream vendor link or step detail is perfect.
Steps
8
Evidence
8
Specific Products
0/0
Canonical Sync
Pending
What this score means
The verification score reflects evidence coverage, subject detail, paper provenance, step depth, and whether linked products resolve to specific item pages instead of generic searches.
A page can have structured steps and still need review when evidence is thin, product links are generic, or canonical protocol coverage is still pending.