Changes in version 0.1.1

- Aligned with the May 2026 republication of the HM Treasury Magenta Book. Every @references block now cites the 2026 edition, with pointers to the relevant chapter of the main book and the relevant section of Annex A ("Analytical methods for use within an evaluation"). No API changes.
- DESCRIPTION, the package-level documentation, and inst/CITATION updated to cite the 2026 edition.
- README.md updated: new alignment banner; "current edition" line updated; the Source documents section now points to the main Magenta Book, Annex A, and the Test and Learn annex.
- Data vintage refreshed in inst/extdata/data_versions.csv (rebuilt from data-raw/data_versions.R); last_updated is now 2026-05-19.
- CITATION.cff bumped to version 0.1.1, dated 2026-05-19.
- Coverage of content new in the 2026 edition (the Test and Learn annex, the expanded value-for-money material, place-based evaluation, AI in evaluation, and research transparency / TIGER) is planned for subsequent releases.

Changes in version 0.1.0 (2026-04-29)

- First release: UK HM Treasury Magenta Book policy-evaluation primitives.
- Provenance is explicit: bundled rubrics carry source metadata that distinguishes direct quotations from researcher synthesis, and ICC reference values carry a value_source flag ("table_quote" vs "central_estimate").
- DOIs added to every @references block where available. Framework functions (mb_evaluation_plan(), mb_questions(), mb_counterfactual(), mb_theory_of_change(), etc.) cite the corresponding chapters of the Magenta Book (2020).
- inst/CITATION extended with a footer pointing to the primary sources for the implemented methods (Sherman 1997, Cohen 1988, Hussey & Hughes 2007, Hemming 2015, Hedges & Hedberg 2007, Drummond 2015, Cameron & Miller 2015, Stuart 2010).
- Cross-validated against canonical reference implementations:
  - pwr for two-sample power, sample size, MDE, and proportion power (within ~3 percentage points; test-pwr-equivalence.R).
  - sandwich for mb_did_2x2() cluster-robust SEs (CR1 / HC1, agreement to within 1e-6; test-sandwich-equivalence.R).
  - swCRTdesign for mb_stepped_wedge() (for typical UK designs, the ratio of the closed-form Hemming approximation to the exact Hussey & Hughes variance falls roughly between 0.5 and 2; test-swcrt-equivalence.R). For decision-grade sample-size work, prefer swCRTdesign::swPwr.
  - BCEA for mb_icer() and mb_ceac() (agreement to floating-point precision; test-bcea-equivalence.R).
  - cobalt for mb_balance_table() standardised mean differences on balanced samples (within 1e-8; test-cobalt-equivalence.R).
- mb_stepped_wedge(): the formula argument has been removed in favour of a single Woertman/Hemming closed-form approximation, documented as approximate relative to the exact Hussey & Hughes variance computed by swCRTdesign. The approximation assumes a balanced design, complete data, and no time-by-treatment interaction; for non-standard designs use swCRTdesign or clusterPower. The earlier formula = "hussey_hughes" branch was researcher-derived and could not be externally verified, so it was removed before release.
- mb_balance_table() added for pre-treatment balance checks: means, SDs, standardised mean differences, Welch t-test or chi-squared p-values, and an imbalance flag at a user-controlled threshold.
- quiet argument (default FALSE) added to mb_did_2x2(), mb_its(), and mb_event_study(). By default the print method appends a one-line reminder that the estimator is the canonical form and points to specialist packages (fixest, did, sandwich) for staggered adoption, autocorrelation, or production work. Set quiet = TRUE to suppress the reminder.
- cluster argument added to mb_event_study(), mirroring mb_did_2x2(): cluster-robust SEs via CR1 with the Stata-style finite-sample correction (G/(G-1)) * (N-1)/(N-K), where G is the number of clusters, N the number of observations, and K the number of estimated coefficients.
- mb_power() @details now states the normal-approximation assumption explicitly and points to pwr::pwr.t.test for small N, where the noncentral-t form differs by ~1-2 percentage points.
- 35 exported functions across 10 families: theory of change, evaluation planning, power and design, Maryland Scientific Methods Scale, Magenta Book confidence rating, lightweight estimators (difference-in-differences, interrupted time series, event study), cost-effectiveness analysis (CEA, ICER, CEAC, INB, QALY, DALY), realist / theory-based scaffolding, reporting, and lookups.
- Bundled rubric and reference tables in inst/extdata/, with vintage and provenance metadata accessible via mb_data_versions():
  - the five-level Maryland SMS rubric;
  - the three-level magentabook confidence rubric (synthesised across What Works Centre traditions);
  - reference intra-class correlation (ICC) values across UK policy domains (education, health, employment, local government, criminal justice, housing), each tagged with a value_source flag distinguishing direct table quotations from researcher synthesis;
  - the canonical Magenta Book evaluation question taxonomy.
- Provenance is explicit: see the README "Bundled rubrics: provenance" section for what is verbatim from primary sources and what is magentabook synthesis.
- Cross-validated against canonical reference implementations when they are installed: power and sample size vs pwr, and cluster-robust SEs vs sandwich. See tests/testthat/test-pwr-equivalence.R and tests/testthat/test-sandwich-equivalence.R.
- Pure computation: no network calls, no API keys.
- Designed as the evaluation companion to the appraisal package greenbook. See the vignette "Cost-effectiveness with magentabook and greenbook" for an end-to-end worked example.
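Two closed-form expressions mentioned in the 0.1.0 entries, the Stata-style CR1 finite-sample correction used for cluster-robust SEs and the normal approximation that mb_power() documents, can be reproduced in a few lines. The following is a minimal Python sketch for illustration only, not taken from the magentabook source: cr1_factor() and normal_power() are hypothetical helpers, not part of the magentabook API, and normal_power() assumes a two-sided test comparing two means with equal group sizes and a standardised effect size d.

```python
from math import sqrt
from statistics import NormalDist


def cr1_factor(n_clusters: int, n_obs: int, n_params: int) -> float:
    """Stata-style CR1 finite-sample factor (G/(G-1)) * (N-1)/(N-K).

    G = number of clusters, N = number of observations,
    K = number of estimated coefficients (including the intercept).
    The factor multiplies the uncorrected cluster-robust variance.
    """
    if n_clusters < 2:
        raise ValueError("CR1 needs at least two clusters")
    if n_obs <= n_params:
        raise ValueError("need more observations than coefficients")
    g, n, k = n_clusters, n_obs, n_params
    return (g / (g - 1)) * ((n - 1) / (n - k))


def normal_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Two-sided, two-sample power under the normal approximation.

    Assumption (hypothetical helper): equal group sizes and a
    standardised mean difference d. The noncentrality is
    d * sqrt(n / 2); both rejection tails are included.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ncp = abs(d) * sqrt(n_per_group / 2)
    return std_normal.cdf(ncp - z_crit) + std_normal.cdf(-ncp - z_crit)


if __name__ == "__main__":
    # Illustrative design: 50 clusters, 1,000 observations, 4 coefficients.
    factor = cr1_factor(50, 1000, 4)
    print(f"CR1 variance factor: {factor:.4f}")
    print(f"SE inflation factor: {sqrt(factor):.4f}")
    # Illustrative effect: d = 0.5 with 64 participants per group.
    print(f"Normal-approx power: {normal_power(0.5, 64):.3f}")
```

Because the CR1 factor scales the variance, standard errors grow by its square root, and the factor tends to 1 as the number of clusters grows and N becomes much larger than K. For small samples, compare normal_power() against pwr::pwr.t.test, which uses the noncentral t distribution.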