Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) -> main fallback -> skip; optional minimal per-recipe previous/ folder for base-only version repairs (ignored for head, version-guarded, removable when stale). Validated on discourse PR #4 (official-image switch the current overlay masks). regall then sweeps all recipes for regressions on sonnet.
Phase regall — full all-recipe regression after the dynamic-base / previous/ change
Mission (operator-specified 2026-06-16): the prevb phase changed the upgrade tier for EVERY recipe
(dynamic base resolution: last-green → main → skip; new previous/ lookup; environmental-vs-version
overlay split). Run the entire recipe suite through cc-ci to ensure nothing regressed, and fix
anything that did. This is the safety net for a cross-cutting harness change.
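The resolution order named above (last-green → main → skip) can be sketched as a simple fallback chain. This is a minimal illustration, not the harness's actual code: the function name and the two lookup callables are hypothetical, and "skip" is modelled as returning None.

```python
from typing import Callable, Optional


def resolve_upgrade_base(
    recipe: str,
    last_green: Callable[[str], Optional[str]],  # hypothetical lookup: last green ref, or None
    main_ref: Callable[[str], Optional[str]],    # hypothetical lookup: ref of main, or None
) -> Optional[str]:
    """Return the ref to use as the upgrade base, or None to skip the upgrade tier."""
    # Try each source in priority order; the first one that yields a ref wins.
    for lookup in (last_green, main_ref):
        ref = lookup(recipe)
        if ref:
            return ref
    # Neither source produced a base: the upgrade tier is skipped.
    return None
```

Keeping the lookups as injected callables means the same chain can be exercised in a test without touching the real level records.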
State files: STATUS-regall.md, BACKLOG-regall.md, REVIEW-regall.md, JOURNAL-regall.md. DECISIONS.md shared.
1. Scope
- Every recipe cc-ci tests: the weekly+external rows in cc-ci-plan/used-recipes.md that have a tests/<recipe>/ dir (all the enrolled recipes, ~21). Drone's gitea-dep path counts too.
- All tiers, with focus on the upgrade tier (changed most by prevb): install / upgrade / backup / restore / custom / lint / screenshot, via the real harness / CI path.
- Baseline = the recorded pre-prevb green levels per recipe (cc-ci's level records). A regression = a recipe that dropped a level, or a tier that newly fails, relative to that baseline, not relative to "perfect."
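The regression definition above can be written down as a small comparison. This is a sketch under assumptions: levels are taken to be integers where higher is better, per-tier results are booleans, and a tier missing from the new run is treated as failing. The RecipeResult type and regressions function are hypothetical names, not part of cc-ci.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecipeResult:
    level: int                                            # assumed: higher is better
    tiers: Dict[str, bool] = field(default_factory=dict)  # tier name -> passed?


def regressions(baseline: RecipeResult, new: RecipeResult) -> List[str]:
    """List what regressed relative to the baseline, not relative to 'perfect'."""
    found = []
    if new.level < baseline.level:
        found.append(f"level {baseline.level} -> {new.level}")
    for tier, passed in baseline.tiers.items():
        # Only a tier that was green in the baseline can newly fail;
        # a tier that was already red stays red and is not a regression.
        if passed and not new.tiers.get(tier, False):
            found.append(f"tier {tier} newly fails")
    return found
```

An empty list means the recipe is at or above its baseline, even if some tiers are still red.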
2. Method
- Run each recipe through the harness (parallelize within the shared-swarm budget — ≤2–3 concurrent deploys; tear down every deploy on every exit path). Capture level + per-tier pass/fail.
- Build a results table: recipe | baseline level | new level | per-tier delta | verdict.
- Classify each regression: caused by the prevb change (dynamic base resolution / previous/ lookup / overlay split), or pre-existing / flaky / environmental (was already red, or fails on re-run independent of prevb). Be honest: a recipe that was already red stays red; say so, don't claim a fix.
- Fix the prevb-caused regressions:
  - Harness-logic regressions → fix in runner/** (e.g. a base that resolves wrong, a previous/ mis-match, an overlay layering bug).
  - A recipe whose last-green base no longer deploys under dynamic resolution → add a MINIMAL previous/ folder for it (same rules as prevb: smallest thing that brings the base up, version-guarded, removable when stale). Do NOT over-add previous/ folders; add one only where a base genuinely won't deploy.
- Re-verify each fix green. Never weaken a test to clear a regression.
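The concurrency and teardown rule in the first Method step can be sketched as below. It is an illustration only: the deploy, run_tiers, and teardown callables are hypothetical stand-ins for the real harness, and the cap of 2 is one value inside the stated ≤2–3 budget.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable

MAX_CONCURRENT_DEPLOYS = 2  # assumed value within the shared-swarm budget


def run_one(recipe: str, deploy: Callable, run_tiers: Callable, teardown: Callable) -> Any:
    """Deploy, run the tiers, and tear down on every exit path."""
    try:
        deploy(recipe)
        return run_tiers(recipe)
    finally:
        # Runs on success, on test failure, and when deploy itself raises.
        teardown(recipe)


def sweep(recipes: Iterable[str], deploy: Callable, run_tiers: Callable,
          teardown: Callable) -> Dict[str, Any]:
    """Run every recipe with at most MAX_CONCURRENT_DEPLOYS in flight."""
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DEPLOYS) as pool:
        futures = {r: pool.submit(run_one, r, deploy, run_tiers, teardown)
                   for r in recipes}
        for recipe, fut in futures.items():
            try:
                results[recipe] = fut.result()
            except Exception as exc:
                # Record the failure instead of aborting the whole sweep.
                results[recipe] = f"error: {exc}"
    return results
```

Putting teardown in a finally block is what keeps the dev-* reaper a backstop rather than the primary cleanup.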
3. Gates
M1 — full sweep done + classified. Every in-scope recipe run; results table complete with
baseline-vs-new levels; each regression classified prevb-caused vs pre-existing/flaky, with evidence.
Adversary cold-verifies the classification (spot-re-runs a sample, confirms a claimed flake really is one,
confirms a claimed prevb-cause is real).
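One way to make the M1 classification repeatable is to derive the label from re-run evidence. The sketch below assumes the failing recipe can be re-run both with and without the prevb change, which this document does not guarantee; the function name, inputs, and label strings are all hypothetical.

```python
from typing import List


def classify(baseline_red: bool, with_prevb: List[bool], without_prevb: List[bool]) -> str:
    """Label a failing recipe from re-run evidence (True = the run passed)."""
    if baseline_red:
        # Already red before prevb: stays red, never claimed as fixed.
        return "pre-existing"
    if any(with_prevb) and not all(with_prevb):
        # Mixed outcomes on identical inputs point to a flake.
        return "flaky"
    if not all(without_prevb):
        # Fails even without the prevb change: independent of prevb.
        return "environmental"
    if with_prevb and not any(with_prevb):
        # Fails consistently with prevb and passes without it.
        return "prevb-caused"
    return "no-regression"
```

The label alone is not the evidence; the re-run outcomes that produced it belong in the results table next to the verdict.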
M2 — regressions fixed, suite back to baseline. Every prevb-caused regression fixed + re-verified
green (harness fix and/or a minimal previous/ folder); no recipe below its pre-prevb baseline; any
added previous/ folders are minimal + version-guarded; pre-existing reds documented (not silently
absorbed). Fresh Adversary PASS → ## DONE.
4. Guardrails
- Never weaken a test to clear a regression; fix the harness or add a minimal previous/.
- previous/ stays minimal + non-accumulating: add only where a base won't deploy.
- Report honestly: pre-existing/flaky reds are labelled as such with evidence, never claimed fixed; any recipe left below baseline is called out explicitly with the reason.
- Shared swarm: ≤2–3 concurrent deploys; tear down every deploy on every exit path; /upgrade-all's dev-* reaper is a backstop, not a substitute. Recipe mirrors PR-only; never merge. Commit author autonomic-bot; push every commit; abra over a pseudo-TTY.
5. Definition of Done
Every enrolled recipe run through cc-ci after the prevb change; a complete baseline-vs-new results table;
all prevb-caused regressions fixed + re-verified green (no recipe below its pre-prevb baseline); any
needed previous/ folders added minimally; pre-existing reds documented honestly; M1 + M2 fresh Adversary
PASSes recorded in REVIEW-regall.md.