Source Paper
Susan Michie, Caroline E Wood, Marie Johnston, Charles Abraham, Jill J Francis et al.
Health Technology Assessment • 2015
Background
Meeting global health challenges requires effective behaviour change interventions (BCIs). This depends on advancing the science of behaviour change, which in turn depends on accurate intervention reporting. Current reporting often lacks detail, preventing accurate replication and implementation. Recent developments have specified intervention content into behaviour change techniques (BCTs) – the ‘active ingredients’, for example goal-setting and self-monitoring of behaviour. BCTs are ‘the smallest components compatible with retaining the postulated active ingredients, i.e. the proposed mechanisms of change. They can be used alone or in combination with other BCTs’ (Michie S, Johnston M. Theories and techniques of behaviour change: developing a cumulative science of behaviour change. Health Psychol Rev 2012;6:1–6). Domain-specific taxonomies of BCTs have been developed, for example for healthy eating and physical activity, smoking cessation and alcohol consumption. We need to build on these to develop an internationally shared language for specifying and developing interventions. This technology can be used for synthesising evidence, implementing effective interventions and testing theory. It has enormous potential added value for science and global health.
Objective
(1) To develop a method of specifying the content of BCIs in terms of component BCTs; (2) to lay a foundation for a comprehensive methodology applicable to different types of complex interventions; (3) to develop resources to support application of the taxonomy; and (4) to achieve multidisciplinary and international acceptance for future development.
Design and participants
Four hundred participants (systematic reviewers, researchers, practitioners, policy-makers) from 12 countries engaged in investigating, designing and/or delivering BCIs. Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with 41 international experts; the hierarchical structure of the list was developed using inductive ‘bottom-up’ and theory-driven ‘top-down’ open-sort procedures (n = 36); training in use of the taxonomy (1-day workshops and distance group tutorials) (n = 161) was evaluated by changes in intercoder reliability and validity (agreement with expert consensus); the taxonomy was evaluated for coding interventions by reliability (intercoder; test–retest) and validity (n = 40 trained coders); and it was evaluated for writing descriptions by reliability (intercoder; test–retest) and by experimentally testing its value (n = 190).
Results
Ninety-three distinct, non-overlapping BCTs with clear labels and definitions formed Behaviour Change Technique Taxonomy version 1 (BCTTv1). BCTs clustered into 16 groupings using a ‘bottom-up’ open-sort procedure; there was overlap between these and the groupings produced by a theory-driven, ‘top-down’ procedure. Both training methods improved validity (both p < 0.05), doubled the proportion of coders achieving competence and improved confidence in identifying BCTs in workshops (both p < 0.001), but did not improve intercoder reliability. Good intercoder reliability was observed for 80 of the 93 BCTs. Good within-coder agreement was observed after 1 month (p < 0.001). Validity was good for 14 of 15 BCTs in the descriptions. Results on the usefulness of BCTTv1 for reporting descriptions of observed interventions were mixed.
Conclusions
The developed taxonomy (BCTTv1) provides a methodology for identifying the content of complex BCIs and a foundation for international cross-disciplinary collaboration on developing more effective interventions to improve health. Further work is needed to examine its usefulness for reporting interventions.
Funding
This project was funded by the Medical Research Council (ref. G0901474/1), with additional funding from the Peninsula Collaboration for Leadership in Applied Health Research and Care.
Objective: To evaluate a taxonomy for writing intervention descriptions through intercoder and test–retest reliability assessment and experimental testing of its usefulness with 190 participants.
This is a taxonomy description-writing evaluation protocol with human participants. The procedure involves 8 procedural steps. Extracted from a 2015 paper published in Health Technology Assessment.
Model and subjects
human • 190
Study window
Estimated timing pending
Core workflow
Taxonomy Development via Delphi Procedure • Hierarchical Structure Development • Training in Taxonomy Use
Primary readouts
Key equipment and reagents
Verified items
0
Direct vendor links
0
Use this page as an execution guide, then fall back to the source paper whenever you need exact exclusions, procedural details, or measure-specific caveats.
Confirm first
Use the page like this
Start here. The step list is optimized for running the experiment, with direct vendor links available inline when you need to source a cited item.
Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with international experts, to create the behaviour change technique taxonomy
Note: Involved 41 international experts in iterative consultation process
“Development of the taxonomy involved a Delphi procedure, an iterative process of revisions and consultation with 41 international experts”
Hierarchical structure of the taxonomy list was developed using inductive 'bottom-up' and theory-driven 'top-down' open-sort procedures
Note: 36 participants engaged in open-sort procedures
“hierarchical structure of the list was developed using inductive 'bottom-up' and theory-driven 'top-down' open-sort procedures (n = 36)”
Training in use of the taxonomy was provided through 1-day workshops and distance group tutorials; training was evaluated by changes in intercoder reliability and validity
Note: 161 participants received training; evaluation measured agreement with expert consensus
“training in use of the taxonomy (1-day workshops and distance group tutorials) (n = 161) was evaluated by changes in intercoder reliability and validity”
Evaluating the taxonomy for coding interventions was assessed by reliability measures (intercoder and test-retest) and validity assessment
Note: 40 trained coders participated in this evaluation phase
“evaluating the taxonomy for coding interventions was assessed by reliability (intercoder; test-retest) and validity (n = 40 trained coders)”
Evaluating the taxonomy for writing descriptions was assessed by reliability measures (intercoder and test-retest) and by experimentally testing its value
Note: 190 participants engaged in this evaluation phase
“evaluating the taxonomy for writing descriptions was assessed by reliability (intercoder; test-retest) and by experimentally testing its value (n = 190)”
Assessment of intercoder reliability across the 93 behaviour change techniques in the taxonomy
Note: Good intercoder reliability was observed for 80 of the 93 BCTs
“Good intercoder reliability was observed for 80 of the 93 BCTs”
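Intercoder reliability for presence/absence coding of BCTs is typically summarised with a chance-corrected agreement statistic. The paper's exact statistic and data are not reproduced here; the sketch below illustrates one common choice, Cohen's kappa, on hypothetical ratings from two coders across ten example BCTs.

```python
# Illustrative sketch (not the paper's exact statistic): Cohen's kappa for
# two coders judging presence/absence of each BCT in one description.

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two binary rating sequences of equal length."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed proportion of agreements
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal rate of "present" (1)
    p_a = sum(coder_a) / n
    p_b = sum(coder_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    return (observed - expected) / (1 - expected)

# Hypothetical example data: 1 = BCT coded as present, 0 = absent
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
print(round(cohens_kappa(a, b), 3))  # 0.6
```

A per-BCT version of this calculation, run across many intervention descriptions, is how a "good reliability for 80 of 93 BCTs" style summary could be assembled.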
Within-coder agreement was assessed after 1 month to evaluate test-retest reliability
Note: Good within-coder agreement was observed
“Good within-coder agreement was observed after 1 month (p < 0.001)”
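Test-retest (within-coder) reliability asks whether the same coder makes the same presence/absence judgements after a delay, here 1 month. A minimal sketch, using hypothetical session data and plain proportion agreement rather than the paper's exact statistic:

```python
# Hedged sketch: within-coder (test-retest) agreement as the proportion of
# identical presence/absence judgements across two coding sessions.

def within_coder_agreement(session1, session2):
    """Proportion of BCT judgements unchanged between two sessions."""
    assert len(session1) == len(session2)
    return sum(x == y for x, y in zip(session1, session2)) / len(session1)

# Hypothetical judgements at baseline and 1 month later, over eight BCTs
t0 = [1, 1, 0, 0, 1, 0, 1, 0]
t1_month = [1, 1, 0, 1, 1, 0, 1, 0]
print(within_coder_agreement(t0, t1_month))  # 0.875
```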
Validity was assessed for behaviour change technique descriptions in the taxonomy
Note: Good validity was observed for 14 of 15 BCTs in the descriptions
“Validity was good for 14 of 15 BCTs in the descriptions”
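Validity in this design means agreement between a coder's judgements and an expert-consensus gold standard. One way to operationalise that comparison (the function name, scoring rule, and data below are illustrative assumptions, not taken from the paper) is to tally hits, misses, and false alarms against the consensus:

```python
# Hedged sketch: compare one coder's BCT judgements (1 = present) against an
# expert-consensus gold standard, counting hits, misses, and false alarms.

def validity_counts(coder, consensus):
    """Return (hits, misses, false_alarms) versus the consensus codes."""
    hits = sum(c == 1 and g == 1 for c, g in zip(coder, consensus))
    misses = sum(c == 0 and g == 1 for c, g in zip(coder, consensus))
    false_alarms = sum(c == 1 and g == 0 for c, g in zip(coder, consensus))
    return hits, misses, false_alarms

# Hypothetical example over six BCTs
coder = [1, 1, 0, 1, 0, 0]
consensus = [1, 1, 1, 0, 0, 0]
print(validity_counts(coder, consensus))  # (2, 1, 1)
```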
This section explains what the experiment is doing, which readouts matter, what the data artifacts usually look like, and how the analysis should flow from raw capture to reported result.
Objective
To evaluate a taxonomy for writing intervention descriptions through intercoder and test-retest reliability assessment and experimental testing of usefulness with 190 participants
Subjects
From paper: human
Sample count
From paper: 190
Cohort notes
From paper: Participants were systematic reviewers, researchers, practitioners, and policy-makers from 12 countries engaged in investigating, designing and/or delivering behaviour change interventions
Taxonomy Development via Delphi Procedure
Hierarchical Structure Development
Training in Taxonomy Use (1-day workshops plus distance tutorials)
Evaluation of Taxonomy for Coding Interventions
Intercoder reliability for coding interventions
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Test-retest reliability (within-coder agreement)
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Validity of taxonomy descriptions (agreement with expert consensus)
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Proportion of coders achieving competence
From paper: Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Artifact type
Endpoint measurements summarized by group or timepoint
Comparison focus
Compare endpoint magnitude between groups, timepoints, or both
Intercoder reliability for coding interventions
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Test-retest reliability (within-coder agreement)
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Validity of taxonomy descriptions (agreement with expert consensus)
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Proportion of coders achieving competence
From paper
Raw artifact
Per-participant (per-coder) endpoint measurements collected during the experiment
Processed artifact
Structured table with cleaned measurements ready for comparison
Final reported form
Summary statistics and between-group or across-timepoint comparisons
Acquisition
Collect raw experimental outputs with enough metadata to preserve sample identity, condition, and timing.
Preprocessing / cleaning
Statistical significance testing was performed (p-values reported); comparison of training methods on validity, competence achievement, and confidence; assessment of reliability and validity across the 93 behaviour change techniques
Scoring or quantification
Quantify the primary readouts for this experiment: Intercoder reliability for coding interventions; Test-retest reliability (within-coder agreement); Validity of taxonomy descriptions (agreement with expert consensus); Proportion of coders achieving competence.
Statistical comparison
Statistical method not yet structured for this page.
Reporting output
Report representative outputs alongside summary comparisons for Intercoder reliability for coding interventions, Test-retest reliability (within-coder agreement), Validity of taxonomy descriptions (agreement with expert consensus), Proportion of coders achieving competence.
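One reported summary, the proportion of coders achieving competence before versus after training, can be sketched as below. The competence threshold and the score lists are hypothetical placeholders; the paper defines its own competence criterion.

```python
# Hedged sketch of the reporting step: proportion of coders reaching an
# (assumed) competence threshold before and after training.

COMPETENCE_THRESHOLD = 0.70  # hypothetical agreement-with-consensus cut-off

def proportion_competent(scores, threshold=COMPETENCE_THRESHOLD):
    """Fraction of coders whose score meets or exceeds the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical per-coder validity scores pre- and post-training
pre = [0.55, 0.62, 0.71, 0.48, 0.66, 0.74]
post = [0.68, 0.75, 0.81, 0.66, 0.72, 0.79]
print(f"pre: {proportion_competent(pre):.2f}  post: {proportion_competent(post):.2f}")
# pre: 0.33  post: 0.67
```

In this invented example the proportion doubles after training, which mirrors the shape (though not the numbers) of the result reported in the abstract.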
Source links and direct wording from the methods section for validation and deeper review.
Citation
Susan Michie et al. (2015). Behaviour change techniques: the development and evaluation of a taxonomic method for reporting and describing behaviour change interventions (a suite of five studies involving consensus methods, randomised controlled trials and analysis of qualitative data). Health Technology Assessment
Direct vendor pages are linked from the protocol above. This section stays focused on the full comparison view and the prep checklist.
Use this section as the page quality checkpoint. It keeps section navigation, evidence access, readiness, and verification meaning in one place.
Current status surfaces were computed from experiment data updated Feb 28, 2026.
Source access
Jump back into the original paper or the methods evidence section when you need exact wording, exclusions, or method-specific caveats.
This protocol has structured steps plus evidence quotes, and is ready for canonical sync.
Steps
8
Evidence Quotes
8
Protocol Items
0
Linked Products
0
Canonical Sync
Pending
What this means
The completeness score reflects how much structured protocol data is present: steps, methods evidence, listed materials, linked products, and paper provenance.
Computed from the current experiment record updated Feb 28, 2026.
Canonical Sync shows whether a ConductGraph-backed protocol is available for this experiment route right now. It is a sync-status signal, not a claim that every downstream vendor link or step detail is perfect.
Steps
8
Evidence
8
Specific Products
0/0
Canonical Sync
Pending
What this score means
The verification score reflects evidence coverage, subject detail, paper provenance, step depth, and whether linked products resolve to specific item pages instead of generic searches.
A page can have structured steps and still need review when evidence is thin, product links are generic, or canonical protocol coverage is still pending.