NEWS
magentabook 0.1.1
- Aligned with the May 2026 republication of the HM Treasury Magenta
Book. Every @references block now cites the 2026 edition, with
pointers to the relevant chapter in the main book and section in
Annex A: Analytical methods for use within an evaluation. No API
changes.
- DESCRIPTION, package-level documentation, and inst/CITATION
updated to the 2026 edition.
- README.md updated: new alignment banner; "current edition" line
updated; Source documents section now points at the main Magenta
Book, Annex A, and the Test and Learn annex.
- Vintage refreshed in inst/extdata/data_versions.csv (rebuilt from
data-raw/data_versions.R); last_updated now 2026-05-19.
- CITATION.cff bumped to 0.1.1 / 2026-05-19.
- Coverage of new 2026 content (Test and Learn annex, value-for-money
expansion, place-based evaluation, AI in evaluation, research
transparency / TIGER) is planned in subsequent releases.
magentabook 0.1.0 (2026-04-29)
- First release. UK HM Treasury Magenta Book policy-evaluation primitives.
- Provenance is explicit: bundled rubrics carry honest source metadata
distinguishing direct quotations from researcher synthesis. ICC
reference values use a value_source flag ("table_quote" vs
"central_estimate").
- DOIs added to every @references block where available. Framework
functions (mb_evaluation_plan, mb_questions, mb_counterfactual,
mb_theory_of_change, etc.) cite the Magenta Book (2020) chapters they
correspond to.
- inst/CITATION extended with a footer pointing to the underlying
primary sources for the methods implemented (Sherman 1997, Cohen
1988, Hussey-Hughes 2007, Hemming 2015, Hedges & Hedberg 2007,
Drummond 2015, Cameron & Miller 2015, Stuart 2010).
- Cross-validated against canonical reference implementations:
  - pwr for two-sample power, sample size, MDE, and proportion
    power (within ~3 percentage points; test-pwr-equivalence.R).
  - sandwich for mb_did_2x2 cluster-robust SEs (CR1 / HC1
    to within 1e-6; test-sandwich-equivalence.R).
  - swCRTdesign for mb_stepped_wedge (closed-form Hemming
    approximation tracks the exact Hussey-Hughes variance to
    within roughly 0.5x to 2x for typical UK designs;
    test-swcrt-equivalence.R). For decision-grade sample-size
    work prefer swCRTdesign::swPwr.
  - BCEA for mb_icer and mb_ceac (floating-point agreement;
    test-bcea-equivalence.R).
  - cobalt for mb_balance_table SMD on balanced samples
    (within 1e-8; test-cobalt-equivalence.R).
- mb_stepped_wedge formula argument removed in favour of the
single Hemming/Woertman closed-form approximation, which is
documented as approximate (vs the exact Hussey-Hughes variance
computed by swCRTdesign). The earlier formula = "hussey_hughes"
branch was researcher-derived and not externally verifiable; it
was removed before the package's first public release.
- mb_balance_table() added for pre-treatment balance checks (mean, SD,
standardised mean difference, Welch t / chi-squared p, imbalance flag
at user-controlled threshold).
- mb_stepped_wedge(formula = c("hemming", "hussey_hughes")): choose
between the Woertman/Hemming closed-form correction (default) and the
Hussey-Hughes (2007) closed form. Both assume balanced design,
complete data, no time-by-treatment interaction; for non-standard
designs use swCRTdesign or clusterPower.
- quiet = FALSE argument added to mb_did_2x2(), mb_its(), and
mb_event_study(). The print method now appends a one-line reminder
that the estimator is canonical and points to specialist packages
(fixest, did, sandwich) for staggered adoption, autocorrelation,
or production work. Set quiet = TRUE to suppress.
- cluster argument added to mb_event_study(), mirroring
mb_did_2x2(): cluster-robust SEs via CR1 with the Stata-style
finite-sample correction (G/(G-1)) * (N-1)/(N-K).
- mb_power() @details now states the normal-approximation
assumption explicitly and points to pwr::pwr.t.test for small N
(where the noncentral-t form differs by ~1-2 percentage points).
- 35 exported functions across 10 families: theory of change, evaluation
planning, power and design, Maryland Scientific Methods Scale,
Magenta Book confidence rating, lightweight estimators
(difference-in-differences, interrupted time series, event study),
cost-effectiveness analysis (CEA, ICER, CEAC, INB, QALY, DALY),
realist / theory-based scaffolding, reporting, lookups.
- Bundled rubric and reference tables in inst/extdata/ covering the
five-level Maryland SMS rubric, the three-level magentabook
confidence rubric (synthesised across What Works Centre traditions),
reference intra-class correlation values across UK policy domains
(education, health, employment, local government, criminal justice,
housing) tagged with a value_source flag distinguishing direct
table quotations from researcher synthesis, and the canonical
Magenta Book evaluation question taxonomy. Vintage and provenance
metadata accessible via mb_data_versions().
- Provenance is explicit: see the README "Bundled rubrics: provenance"
section for what is verbatim from primary sources and what is
magentabook synthesis.
- Cross-validated against canonical reference implementations (when
installed): power and sample size vs pwr, cluster-robust SEs vs
sandwich. See tests/testthat/test-pwr-equivalence.R and
tests/testthat/test-sandwich-equivalence.R.
- Pure computation: no network calls, no API keys.
- Designed as the evaluation companion to the appraisal package
greenbook. See the vignette "Cost-effectiveness with magentabook and
greenbook" for an end-to-end
worked example.
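
Illustration of the CR1 correction mentioned above for mb_did_2x2() and
mb_event_study(), (G/(G-1)) * (N-1)/(N-K). The base-R sketch below shows
one way that correction can be applied to an OLS fit. It is not the
package's implementation: cr1_vcov() and the toy data are hypothetical
names made up for this note, and the sketch assumes a full-rank,
unweighted lm fit with no missing values.

```r
# Hypothetical helper (not part of magentabook): CR1 cluster-robust
# variance for an unweighted, full-rank lm fit without missing data.
cr1_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)          # N x K design matrix
  u <- residuals(fit)             # length-N residual vector
  N <- nrow(X)
  K <- ncol(X)
  G <- length(unique(cluster))    # number of clusters

  bread <- solve(crossprod(X))    # (X'X)^-1

  # Row i of X * u is x_i * u_i; rowsum() adds these up within each
  # cluster, giving one K-vector of summed scores per cluster.
  scores <- rowsum(X * u, group = cluster)
  meat <- crossprod(scores)       # sum over clusters of s_g s_g'

  # Stata-style finite-sample correction quoted in the entry above.
  adj <- (G / (G - 1)) * ((N - 1) / (N - K))
  adj * bread %*% meat %*% bread
}

# Toy two-period panel: 20 units, the last 10 treated (illustrative only).
set.seed(1)
d <- data.frame(
  unit    = rep(1:20, each = 2),
  post    = rep(0:1, times = 20),
  treated = rep(c(0, 1), each = 20)
)
d$y <- 1 + 0.5 * d$post + 0.3 * d$treated +
  2 * d$treated * d$post + rnorm(40)

fit <- lm(y ~ treated * post, data = d)
V <- cr1_vcov(fit, d$unit)
se <- sqrt(diag(V))               # cluster-robust standard errors
```

When every observation is its own cluster (G = N), the correction
collapses to N/(N-K), i.e. the HC1 heteroskedasticity-robust estimator,
which is why the two labels appear together in the cross-validation
entry.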