Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) -> main fallback -> skip; optional minimal per-recipe previous/ folder for base-only version repairs (ignored for head, version-guarded, removable when stale). Validated on discourse PR #4 (official-image switch the current overlay masks). regall then sweeps all recipes for regressions on sonnet.
66 lines
4.1 KiB
Markdown
66 lines
4.1 KiB
Markdown
# Phase `regall` — full all-recipe regression after the dynamic-base / `previous/` change
|
||
|
||
**Mission (operator-specified 2026-06-16):** the `prevb` phase changed the upgrade tier for EVERY recipe
|
||
(dynamic base resolution: last-green → `main` → skip; new `previous/` lookup; environmental-vs-version
|
||
overlay split). Run the **entire recipe suite** through cc-ci to ensure **nothing regressed**, and **fix
|
||
anything that did**. This is the safety net for a cross-cutting harness change.
|
||
|
||
State files: `STATUS-regall.md`, `BACKLOG-regall.md`, `REVIEW-regall.md`, `JOURNAL-regall.md`. DECISIONS.md shared.
|
||
|
||
## 1. Scope
|
||
|
||
- **Every recipe cc-ci tests** — the `weekly` + `external` rows in `cc-ci-plan/used-recipes.md` that have a
|
||
`tests/<recipe>/` dir (all the enrolled recipes, ~21). Drone's gitea-dep path counts too.
|
||
- **All tiers**, with focus on the **upgrade tier** (changed most by `prevb`): install / upgrade / backup
|
||
/ restore / custom / lint / screenshot, via the real harness / CI path.
|
||
- **Baseline = the recorded pre-`prevb` green levels** per recipe (cc-ci's level records). A regression =
|
||
a recipe that dropped a level, or a tier that newly fails, **relative to that baseline** — not relative
|
||
to "perfect."
|
||
|
||
## 2. Method
|
||
|
||
1. Run each recipe through the harness (parallelize within the shared-swarm budget — ≤2–3 concurrent
|
||
deploys; tear down every deploy on every exit path). Capture level + per-tier pass/fail.
|
||
2. Build a results table: `recipe | baseline level | new level | per-tier delta | verdict`.
|
||
3. **Classify each regression**: caused by the `prevb` change (dynamic base resolution / `previous/`
|
||
lookup / overlay split), or **pre-existing / flaky / environmental** (was already red, or fails on
|
||
re-run independent of `prevb`). Be honest — a recipe that was already red stays red; say so, don't
|
||
claim a fix.
|
||
4. **Fix the `prevb`-caused regressions:**
|
||
- Harness-logic regressions → fix in `runner/**` (e.g. a base that resolves wrong, a `previous/`
|
||
mis-match, an overlay layering bug).
|
||
- A recipe whose **last-green base no longer deploys** under dynamic resolution → add a **MINIMAL**
|
||
`previous/` folder for it (same rules as `prevb`: smallest thing that brings the base up,
|
||
version-guarded, removable when stale). Do NOT over-add `previous/` folders — only where a base
|
||
genuinely won't deploy.
|
||
5. Re-verify each fix green. Never weaken a test to clear a regression.
|
||
|
||
## 3. Gates
|
||
|
||
**M1 — full sweep done + classified.** Every in-scope recipe run; results table complete with
|
||
baseline-vs-new levels; each regression classified `prevb-caused` vs `pre-existing/flaky`, with evidence.
|
||
Adversary cold-verifies the classification (spot-re-runs a sample, confirms a claimed flake really is one,
|
||
confirms a claimed `prevb`-cause is real).
|
||
|
||
**M2 — regressions fixed, suite back to baseline.** Every `prevb`-caused regression fixed + re-verified
|
||
green (harness fix and/or a minimal `previous/` folder); no recipe below its pre-`prevb` baseline; any
|
||
added `previous/` folders are minimal + version-guarded; pre-existing reds documented (not silently
|
||
absorbed). Fresh Adversary PASS → `## DONE`.
|
||
|
||
## 4. Guardrails
|
||
|
||
- **Never weaken a test** to clear a regression; fix the harness or add a minimal `previous/`.
|
||
- **`previous/` stays minimal + non-accumulating** — add only where a base won't deploy.
|
||
- **Report honestly** — pre-existing/flaky reds are labelled as such with evidence, never claimed fixed;
|
||
any recipe left below baseline is called out explicitly with the reason.
|
||
- Shared swarm: ≤2–3 concurrent deploys; tear down every deploy on every exit path; `/upgrade-all`'s
|
||
`dev-*` reaper is a backstop, not a substitute. Recipe mirrors PR-only; never merge. Commit author
|
||
`autonomic-bot`; push every commit; abra over a pseudo-TTY.
|
||
|
||
## 5. Definition of Done
|
||
|
||
Every enrolled recipe run through cc-ci after the `prevb` change; a complete baseline-vs-new results table;
|
||
all `prevb`-caused regressions fixed + re-verified green (no recipe below its pre-`prevb` baseline); any
|
||
needed `previous/` folders added minimally; pre-existing reds documented honestly; M1 + M2 fresh Adversary
|
||
PASSes recorded in REVIEW-regall.md.
|