Changes in version 0.1.1                        

  - Aligned with the May 2026 republication of the HM Treasury Magenta
    Book. Every @references block now cites the 2026 edition, with
    pointers to the relevant chapter in the main book and section in
    Annex A: Analytical methods for use within an evaluation. No API
    changes.
  - DESCRIPTION, package-level documentation, and inst/CITATION updated
    to the 2026 edition.
  - README.md updated: new alignment banner; "current edition" line
    updated; Source documents section now points at the main Magenta
    Book, Annex A, and the Test and Learn annex.
  - Vintage refreshed in inst/extdata/data_versions.csv (rebuilt from
    data-raw/data_versions.R); last_updated now 2026-05-19.
  - CITATION.cff bumped to 0.1.1 / 2026-05-19.
  - Coverage of new 2026 content (Test and Learn annex, value-for-money
    expansion, place-based evaluation, AI in evaluation, research
    transparency / TIGER) is planned for subsequent releases.

                 Changes in version 0.1.0 (2026-04-29)                  

  - First release. UK HM Treasury Magenta Book policy-evaluation
    primitives.
  - Provenance is explicit: bundled rubrics carry honest source metadata
    distinguishing direct quotations from researcher synthesis. ICC
    reference values use a value_source flag ("table_quote" vs
    "central_estimate").
  - DOIs added to every @references block where available. Framework
    functions (mb_evaluation_plan, mb_questions, mb_counterfactual,
    mb_theory_of_change, etc.) cite the Magenta Book (2020) chapters
    they correspond to.
  - inst/CITATION extended with footer pointing to the underlying
    primary sources for the methods implemented (Sherman 1997, Cohen
    1988, Hussey-Hughes 2007, Hemming 2015, Hedges & Hedberg 2007,
    Drummond 2015, Cameron & Miller 2015, Stuart 2010).
  - Cross-validated against canonical reference implementations:
      - pwr for two-sample power, sample size, MDE, and proportion power
        (within ~3 percentage points; test-pwr-equivalence.R).
      - sandwich for mb_did_2x2 cluster-robust SEs (CR1 / HC1 to within
        1e-6; test-sandwich-equivalence.R).
      - swCRTdesign for mb_stepped_wedge (the closed-form Hemming
        approximation stays within a factor of roughly 0.5 to 2 of the
        exact Hussey-Hughes variance for typical UK designs;
        test-swcrt-equivalence.R). For decision-grade sample-size work
        prefer swCRTdesign::swPwr.
      - BCEA for mb_icer and mb_ceac (floating-point agreement;
        test-bcea-equivalence.R).
      - cobalt for mb_balance_table SMD on balanced samples (within
        1e-8; test-cobalt-equivalence.R).
  - mb_stepped_wedge formula argument removed in favour of the single
    Hemming/Woertman closed-form approximation, which is documented as
    approximate (vs the exact Hussey-Hughes variance computed by
    swCRTdesign). The earlier formula = "hussey_hughes" branch was
    researcher-derived and not externally verifiable; it was removed
    before the first release.
  - mb_balance_table() added for pre-treatment balance checks (mean, SD,
    standardised mean difference, Welch t / chi-squared p, imbalance
    flag at user-controlled threshold).
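
    As a rough illustration of the standardised mean difference and
    imbalance flag described in this entry, here is a Python sketch
    (not the package's R code). It assumes the pooled-SD denominator
    sqrt((s_t^2 + s_c^2) / 2); the function names and the 0.1 default
    threshold are hypothetical choices for the example, not values taken
    from mb_balance_table().

```python
from math import sqrt
from statistics import mean, variance


def standardised_mean_difference(treated, control):
    """SMD using the pooled SD sqrt((s_t^2 + s_c^2) / 2).

    Assumption: this denominator is one common convention; the package
    may use a different one.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def is_imbalanced(smd, threshold=0.1):
    """Flag a covariate whose absolute SMD exceeds the threshold.

    The 0.1 default is a hypothetical value for illustration only.
    """
    return abs(smd) > threshold


# Toy pre-treatment covariate for two small groups.
treated = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]
smd = standardised_mean_difference(treated, control)
flag = is_imbalanced(smd)
```
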
  - mb_stepped_wedge(formula = c("hemming", "hussey_hughes")) was added
    during development to choose between the Woertman/Hemming
    closed-form correction (default) and the Hussey-Hughes (2007) closed
    form. The formula argument has since been removed (see the
    mb_stepped_wedge entry above). Both forms assume a balanced design,
    complete data, and no time-by-treatment interaction; for
    non-standard designs use swCRTdesign or clusterPower.
  - quiet = FALSE argument added to mb_did_2x2(), mb_its(), and
    mb_event_study(). The print method now appends a one-line reminder
    that the estimator is canonical and points to specialist packages
    (fixest, did, sandwich) for staggered adoption, autocorrelation, or
    production work. Set quiet = TRUE to suppress.
  - cluster argument added to mb_event_study(), mirroring mb_did_2x2():
    cluster-robust SEs via CR1 with the Stata-style finite-sample
    correction (G/(G-1)) * (N-1)/(N-K).
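
    The finite-sample correction quoted in this entry is plain
    arithmetic, shown here as a Python sketch (not the package's R
    code). The function and argument names are hypothetical; G is the
    number of clusters, N the number of observations, and K the number
    of estimated coefficients.

```python
def cr1_correction(n_clusters, n_obs, n_params):
    """Return (G / (G - 1)) * ((N - 1) / (N - K)).

    This is the multiplier applied to the uncorrected cluster-robust
    covariance matrix; it approaches 1 as G and N grow.
    """
    if n_clusters < 2:
        raise ValueError("need at least two clusters")
    if n_obs <= n_params:
        raise ValueError("need more observations than parameters")
    return (n_clusters / (n_clusters - 1)) * ((n_obs - 1) / (n_obs - n_params))


# Example: 10 clusters, 200 observations, 4 coefficients.
factor = cr1_correction(n_clusters=10, n_obs=200, n_params=4)
```

    With few clusters the factor is noticeably above 1, which is why the
    correction matters most for small-G designs.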
  - mb_power() @details now states the normal-approximation assumption
    explicitly and points to pwr::pwr.t.test for small N (where the
    noncentral-t form differs by ~1-2 percentage points).
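
    To make the normal-approximation assumption concrete, here is a
    Python sketch of the textbook two-sided, two-sample form (not the
    package's R code). The changelog does not give mb_power()'s exact
    formula, so the expression, the function name, and the arguments
    below are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_normal(effect_size, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    Assumes equal arm sizes and treats the test statistic as normal
    with mean effect_size * sqrt(n_per_arm / 2) and unit variance.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_arm / 2)
    # Probability of rejecting in either tail.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Standardised effect of 0.5 with 64 units per arm.
approx_power = power_two_sample_normal(effect_size=0.5, n_per_arm=64)
```

    At small N the noncentral-t calculation in pwr::pwr.t.test returns a
    somewhat lower value than this approximation, which is the gap the
    entry refers to.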
  - 35 exported functions across 10 families: theory of change,
    evaluation planning, power and design, Maryland Scientific Methods
    Scale, Magenta Book confidence rating, lightweight estimators
    (difference-in-differences, interrupted time series, event study),
    cost-effectiveness analysis (CEA, ICER, CEAC, INB, QALY, DALY),
    realist / theory-based scaffolding, reporting, lookups.
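
    For readers new to the cost-effectiveness family, the ICER and the
    incremental net benefit (INB) reduce to the standard definitions
    sketched below in Python (not the package's R code). The function
    names, arguments, and figures are hypothetical and do not reproduce
    the signatures of mb_icer() or the package's INB function.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: delta cost / delta effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when effects are equal")
    return (cost_new - cost_old) / delta_effect


def incremental_net_benefit(cost_new, cost_old, effect_new, effect_old,
                            willingness_to_pay):
    """INB = willingness_to_pay * delta effect - delta cost."""
    delta_effect = effect_new - effect_old
    delta_cost = cost_new - cost_old
    return willingness_to_pay * delta_effect - delta_cost


# Made-up example: costs in pounds, effects in QALYs.
ratio = icer(12000, 10000, 2.5, 2.0)
inb = incremental_net_benefit(12000, 10000, 2.5, 2.0,
                              willingness_to_pay=20000)
```

    A positive INB at a given willingness-to-pay threshold corresponds
    to an ICER below that threshold when the incremental effect is
    positive.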
  - Bundled rubric and reference tables in inst/extdata/, with vintage
    and provenance metadata accessible via mb_data_versions():
      - the five-level Maryland SMS rubric;
      - the three-level magentabook confidence rubric (synthesised
        across What Works Centre traditions);
      - reference intra-class correlation values across UK policy
        domains (education, health, employment, local government,
        criminal justice, housing), tagged with a value_source flag
        distinguishing direct table quotations from researcher
        synthesis;
      - the canonical Magenta Book evaluation question taxonomy.
  - Provenance is explicit: see the README "Bundled rubrics: provenance"
    section for what is verbatim from primary sources and what is
    magentabook synthesis.
  - Cross-validated against canonical reference implementations (when
    installed): power and sample size vs pwr, cluster-robust SEs vs
    sandwich. See tests/testthat/test-pwr-equivalence.R and
    tests/testthat/test-sandwich-equivalence.R.
  - Pure computation: no network calls, no API keys.
  - Designed as the evaluation companion to the appraisal package
    greenbook. See the vignette "Cost-effectiveness with magentabook
    and greenbook" for an end-to-end worked example.