cc-ci-orchestrator/cc-ci-plan/plan-phase-regall-recipe-regression.md
autonomic-bot 65ee741869 plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
2026-06-16 23:55:28 +00:00

Phase regall — full all-recipe regression after the dynamic-base / previous/ change

Mission (operator-specified 2026-06-16): the prevb phase changed the upgrade tier for EVERY recipe (dynamic base resolution: last-green → main → skip; new previous/ lookup; environmental-vs-version overlay split). Run the entire recipe suite through cc-ci to ensure nothing regressed, and fix anything that did. This is the safety net for a cross-cutting harness change.
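
The fallback chain named in the mission (last-green → main → skip) can be sketched as follows. This is a minimal illustration, not the harness's actual code: `resolve_upgrade_base`, the two lookup callables, and the version strings are hypothetical names and placeholder data introduced here.

```python
from typing import Callable, Optional

# Hypothetical lookups: each returns a ref/version string, or None when
# that source has nothing usable for the recipe.
Lookup = Callable[[str], Optional[str]]


def resolve_upgrade_base(recipe: str, last_green: Lookup, main_ref: Lookup) -> Optional[str]:
    """Return the base to upgrade from, or None to skip the upgrade tier."""
    for lookup in (last_green, main_ref):  # order matters: last-green first
        base = lookup(recipe)
        if base is not None:
            return base
    return None  # neither source produced a base: skip


# In-memory stand-ins for the real lookups (placeholder data only).
last_green = {"discourse": "1.2.0"}.get
main_ref = {"discourse": "main", "drone": "main"}.get

for name in ("discourse", "drone", "unknown"):
    print(name, resolve_upgrade_base(name, last_green, main_ref))
```

Returning `None` for "skip" keeps the three outcomes distinguishable by the caller without a separate status flag.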

State files: STATUS-regall.md, BACKLOG-regall.md, REVIEW-regall.md, JOURNAL-regall.md. DECISIONS.md is shared.

1. Scope

  • Every recipe cc-ci tests — the weekly + external rows in cc-ci-plan/used-recipes.md that have a tests/<recipe>/ dir (all the enrolled recipes, ~21). Drone's gitea-dep path counts too.
  • All tiers (install / upgrade / backup / restore / custom / lint / screenshot), run via the real harness / CI path, with focus on the upgrade tier, which prevb changed most.
  • Baseline = the recorded pre-prevb green levels per recipe (cc-ci's level records). A regression = a recipe that dropped a level, or a tier that newly fails, relative to that baseline — not relative to "perfect."
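
The regression definition above (a dropped level, or a tier that newly fails, measured against the baseline rather than against "perfect") might be expressed like this. It is a sketch under stated assumptions: levels are taken to be integers where higher is better, a tier missing from the new run is treated as failing, and `RecipeResult` and `regressions` are hypothetical names, not cc-ci's own record format.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeResult:
    level: int  # assumption: integer level, higher is better
    tiers: dict[str, bool] = field(default_factory=dict)  # tier name -> passed


def regressions(baseline: RecipeResult, new: RecipeResult) -> list[str]:
    """List what got worse relative to the baseline; empty means no regression."""
    found = []
    if new.level < baseline.level:
        found.append(f"level dropped {baseline.level} -> {new.level}")
    for tier, passed_before in baseline.tiers.items():
        # Only a tier that was green at baseline can "newly fail";
        # a tier that was already red is not a regression.
        if passed_before and not new.tiers.get(tier, False):
            found.append(f"tier newly fails: {tier}")
    return found


# Placeholder data: backup was already red, so only upgrade is flagged.
baseline = RecipeResult(4, {"install": True, "upgrade": True, "backup": False})
new = RecipeResult(3, {"install": True, "upgrade": False, "backup": False})
print(regressions(baseline, new))
```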

2. Method

  1. Run each recipe through the harness (parallelize within the shared-swarm budget — ≤23 concurrent deploys; tear down every deploy on every exit path). Capture level + per-tier pass/fail.
  2. Build a results table: recipe | baseline level | new level | per-tier delta | verdict.
  3. Classify each regression: caused by the prevb change (dynamic base resolution / previous/ lookup / overlay split), or pre-existing / flaky / environmental (was already red, or fails on re-run independent of prevb). Be honest — a recipe that was already red stays red; say so, don't claim a fix.
  4. Fix the prevb-caused regressions:
    • Harness-logic regressions → fix in runner/** (e.g. a base that resolves wrong, a previous/ mis-match, an overlay layering bug).
    • A recipe whose last-green base no longer deploys under dynamic resolution → add a MINIMAL previous/ folder for it (same rules as prevb: smallest thing that brings the base up, version-guarded, removable when stale). Do NOT over-add previous/ folders — only where a base genuinely won't deploy.
  5. Re-verify each fix green. Never weaken a test to clear a regression.
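
Steps 2 and 3 can be sketched as a small classifier plus a table renderer. The decision order below is one plausible reading of the plan, not a confirmed rule: already-red recipes are labelled pre-existing first, a failure that clears on re-run is labelled flaky, a failure that also reproduces without the prevb change is labelled environmental, and only what remains is attributed to prevb. The function names, inputs, and row data are hypothetical.

```python
def classify(red_at_baseline: bool, passes_on_rerun: bool, fails_without_prevb: bool) -> str:
    """Assign a verdict to one observed failure (assumed decision order)."""
    if red_at_baseline:
        return "pre-existing"   # was already red: stays red, not a regression
    if passes_on_rerun:
        return "flaky"          # failure does not reproduce
    if fails_without_prevb:
        return "environmental"  # reproduces independent of the prevb change
    return "prevb-caused"       # reproduces only with the prevb change in place


def render_table(rows: list[tuple]) -> str:
    """Render rows in the column order the plan specifies."""
    header = "recipe | baseline level | new level | per-tier delta | verdict"
    lines = [header]
    for row in rows:
        lines.append(" | ".join(str(cell) for cell in row))
    return "\n".join(lines)


# Placeholder rows for illustration only; not real sweep results.
rows = [
    ("discourse", 4, 3, "upgrade: pass -> fail", classify(False, False, False)),
    ("drone", 2, 2, "none", "no regression"),
]
print(render_table(rows))
```

Checking `red_at_baseline` first keeps a pre-existing red from ever being counted as a prevb fix or a prevb regression.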

3. Gates

M1 — full sweep done + classified. Every in-scope recipe run; results table complete with baseline-vs-new levels; each regression classified prevb-caused vs pre-existing/flaky, with evidence. Adversary cold-verifies the classification (spot-re-runs a sample, confirms a claimed flake really is one, confirms a claimed prevb-cause is real).

M2 — regressions fixed, suite back to baseline. Every prevb-caused regression fixed + re-verified green (harness fix and/or a minimal previous/ folder); no recipe below its pre-prevb baseline; any added previous/ folders are minimal + version-guarded; pre-existing reds documented (not silently absorbed). Fresh Adversary PASS → ## DONE.

4. Guardrails

  • Never weaken a test to clear a regression; fix the harness or add a minimal previous/.
  • previous/ stays minimal + non-accumulating — add only where a base won't deploy.
  • Report honestly — pre-existing/flaky reds are labelled as such with evidence, never claimed fixed; any recipe left below baseline is called out explicitly with the reason.
  • Shared swarm: ≤23 concurrent deploys; tear down every deploy on every exit path; /upgrade-all's dev-* reaper is a backstop, not a substitute.
  • Recipe mirrors are PR-only; never merge.
  • Commit author is autonomic-bot; push every commit; run abra over a pseudo-TTY.
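
The shared-swarm guardrail (a bounded number of concurrent deploys, with teardown on every exit path) could look roughly like the sketch below. The `deploy`, `run_tiers`, and `teardown` callables are hypothetical stand-ins for whatever the harness actually invokes; only the limit of 23 comes from the plan.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DEPLOYS = 23  # shared-swarm budget stated in the plan


def run_one(recipe, deploy, run_tiers, teardown):
    """Run one recipe; teardown happens whether deploy or the tiers fail."""
    try:
        deploy(recipe)
        return recipe, run_tiers(recipe), None
    except Exception as exc:
        return recipe, None, exc  # record the failure instead of aborting the sweep
    finally:
        teardown(recipe)  # runs on success, on failure, and on interruption


def sweep(recipes, deploy, run_tiers, teardown, limit=MAX_CONCURRENT_DEPLOYS):
    """Run all recipes with at most `limit` deploys in flight at once."""
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(lambda r: run_one(r, deploy, run_tiers, teardown), recipes))
```

The worker pool size doubles as the concurrency cap, and `finally` covers the exit paths that an `except` clause alone would miss.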

5. Definition of Done

Every enrolled recipe run through cc-ci after the prevb change; a complete baseline-vs-new results table; all prevb-caused regressions fixed + re-verified green (no recipe below its pre-prevb baseline); any needed previous/ folders added minimally; pre-existing reds documented honestly; M1 + M2 fresh Adversary PASSes recorded in REVIEW-regall.md.