cc-ci-orchestrator/cc-ci-plan/plan-phase-regall-recipe-regression.md
autonomic-bot 65ee741869 plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
2026-06-16 23:55:28 +00:00


# Phase `regall` — full all-recipe regression after the dynamic-base / `previous/` change
**Mission (operator-specified 2026-06-16):** the `prevb` phase changed the upgrade tier for EVERY recipe
(dynamic base resolution: last-green → `main` → skip; new `previous/` lookup; environmental-vs-version
overlay split). Run the **entire recipe suite** through cc-ci to ensure **nothing regressed**, and **fix
anything that did**. This is the safety net for a cross-cutting harness change.
State files: `STATUS-regall.md`, `BACKLOG-regall.md`, `REVIEW-regall.md`, `JOURNAL-regall.md`. `DECISIONS.md` is shared.
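
The resolution order named in the mission (last-green → `main` → skip) can be sketched as below. This is a minimal illustration, not the harness code: `resolve_upgrade_base`, `last_green`, and `main_deploys` are hypothetical names, and the condition that triggers the final skip (here, `main` not being usable as a base) is an assumption.

```python
from typing import Callable, Optional


def resolve_upgrade_base(
    recipe: str,
    last_green: Callable[[str], Optional[str]],
    main_deploys: Callable[[str], bool],
) -> Optional[str]:
    """Return the upgrade base for a recipe, or None to skip the upgrade tier.

    Both callables are hypothetical stand-ins for harness lookups.
    """
    # 1. Prefer the last version recorded green for this recipe.
    ref = last_green(recipe)
    if ref is not None:
        return ref
    # 2. Fall back to main (assumption: only when main is usable as a base).
    if main_deploys(recipe):
        return "main"
    # 3. Nothing usable: skip the upgrade tier rather than fail it.
    return None
```

Returning `None` for the skip case keeps "no base available" distinct from a failed upgrade, which matters when the sweep compares new levels against the baseline.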
## 1. Scope
- **Every recipe cc-ci tests** — the `weekly` + `external` rows in `cc-ci-plan/used-recipes.md` that have a
`tests/<recipe>/` dir (all the enrolled recipes, ~21). Drone's gitea-dep path counts too.
- **All tiers**, with focus on the **upgrade tier** (changed most by `prevb`): install / upgrade / backup
/ restore / custom / lint / screenshot, via the real harness / CI path.
- **Baseline = the recorded pre-`prevb` green levels** per recipe (cc-ci's level records). A regression =
a recipe that dropped a level, or a tier that newly fails, **relative to that baseline** — not relative
to "perfect."
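
The baseline-relative definition above can be written as a small check. This is a sketch under stated assumptions: levels are modelled as integers where higher is better, a tier missing from the new run is treated as failing, and `RecipeResult` and `regressions` are hypothetical names rather than cc-ci's record format.

```python
from dataclasses import dataclass


@dataclass
class RecipeResult:
    level: int              # assumption: integer level, higher is better
    tiers: dict[str, bool]  # tier name -> passed


def regressions(baseline: RecipeResult, new: RecipeResult) -> list[str]:
    """List what got worse relative to the baseline, not relative to 'perfect'."""
    found = []
    if new.level < baseline.level:
        found.append(f"level {baseline.level} -> {new.level}")
    for tier, passed_before in baseline.tiers.items():
        # Only a tier that passed at baseline can "newly fail".
        if passed_before and not new.tiers.get(tier, False):
            found.append(f"tier {tier} newly fails")
    return found
```

A tier that was already red at baseline and is still red produces no entry, which matches the rule that pre-existing reds are not regressions.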
## 2. Method
1. Run each recipe through the harness (parallelize within the shared-swarm budget — ≤23 concurrent
deploys; tear down every deploy on every exit path). Capture level + per-tier pass/fail.
2. Build a results table: `recipe | baseline level | new level | per-tier delta | verdict`.
3. **Classify each regression**: caused by the `prevb` change (dynamic base resolution / `previous/`
lookup / overlay split), or **pre-existing / flaky / environmental** (was already red, or fails on
re-run independent of `prevb`). Be honest — a recipe that was already red stays red; say so, don't
claim a fix.
4. **Fix the `prevb`-caused regressions:**
   - Harness-logic regressions → fix in `runner/**` (e.g. a base that resolves wrong, a `previous/`
     mis-match, an overlay layering bug).
   - A recipe whose **last-green base no longer deploys** under dynamic resolution → add a **MINIMAL**
     `previous/` folder for it (same rules as `prevb`: smallest thing that brings the base up,
     version-guarded, removable when stale). Do NOT over-add `previous/` folders: only where a base
     genuinely won't deploy.
5. Re-verify each fix green. Never weaken a test to clear a regression.
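
Steps 2 and 3 can be sketched as a classifier plus a table renderer. The column names come from step 2; everything else (`classify`, `render_table`, the two boolean inputs) is a hypothetical illustration, and the real evidence behind each input would come from re-runs recorded in the state files.

```python
HEADER = ["recipe", "baseline level", "new level", "per-tier delta", "verdict"]


def classify(was_red_at_baseline: bool, fails_without_prevb: bool) -> str:
    """Label a failure per step 3.

    Already red, or still failing on a re-run independent of prevb,
    means it is not a prevb regression.
    """
    if was_red_at_baseline or fails_without_prevb:
        return "pre-existing/flaky"
    return "prevb-caused"


def render_table(rows: list[dict[str, str]]) -> str:
    """Render result rows as a pipe-separated table using the step 2 columns."""
    lines = [" | ".join(HEADER), " | ".join("---" for _ in HEADER)]
    for row in rows:
        lines.append(" | ".join(row[col] for col in HEADER))
    return "\n".join(lines)
```

For example, with purely illustrative values (not real results):

```python
rows = [{
    "recipe": "example-recipe",
    "baseline level": "4",
    "new level": "3",
    "per-tier delta": "upgrade: pass -> fail",
    "verdict": classify(was_red_at_baseline=False, fails_without_prevb=False),
}]
print(render_table(rows))
```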
## 3. Gates
**M1 — full sweep done + classified.** Every in-scope recipe run; results table complete with
baseline-vs-new levels; each regression classified `prevb-caused` vs `pre-existing/flaky`, with evidence.
Adversary cold-verifies the classification (spot-re-runs a sample, confirms a claimed flake really is one,
confirms a claimed `prevb`-cause is real).
**M2 — regressions fixed, suite back to baseline.** Every `prevb`-caused regression fixed + re-verified
green (harness fix and/or a minimal `previous/` folder); no recipe below its pre-`prevb` baseline; any
added `previous/` folders are minimal + version-guarded; pre-existing reds documented (not silently
absorbed). Fresh Adversary PASS → `## DONE`.
## 4. Guardrails
- **Never weaken a test** to clear a regression; fix the harness or add a minimal `previous/`.
- **`previous/` stays minimal + non-accumulating** — add only where a base won't deploy.
- **Report honestly** — pre-existing/flaky reds are labelled as such with evidence, never claimed fixed;
any recipe left below baseline is called out explicitly with the reason.
- Shared swarm: ≤23 concurrent deploys; tear down every deploy on every exit path; `/upgrade-all`'s
`dev-*` reaper is a backstop, not a substitute. Recipe mirrors PR-only; never merge. Commit author
`autonomic-bot`; push every commit; abra over a pseudo-TTY.
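
The shared-swarm guardrail (bounded concurrency, teardown on every exit path) might look like the following sketch. The cap of 23 is from the guardrail; `deploy`, `run_tiers`, and `teardown` are hypothetical callables standing in for the real harness steps, and a thread pool is only one way to bound concurrency.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

MAX_CONCURRENT_DEPLOYS = 23  # shared-swarm budget from the guardrail


def run_recipe(recipe: str, deploy: Callable, run_tiers: Callable, teardown: Callable):
    """Deploy, run the tiers, and always tear down, even if deploy itself fails."""
    try:
        deploy(recipe)
        return run_tiers(recipe)
    finally:
        # Runs on success, on test failure, and on a partial or failed deploy.
        teardown(recipe)


def sweep(recipes: list[str], deploy: Callable, run_tiers: Callable, teardown: Callable) -> dict:
    """Run all recipes with at most MAX_CONCURRENT_DEPLOYS in flight."""
    results = {}
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_DEPLOYS) as pool:
        futures = {
            r: pool.submit(run_recipe, r, deploy, run_tiers, teardown)
            for r in recipes
        }
        for recipe, future in futures.items():
            try:
                results[recipe] = future.result()
            except Exception as exc:
                # Record the failure; one broken recipe must not abort the sweep.
                results[recipe] = exc
    return results
```

Putting teardown in `finally` covers the failure paths inside the process; the `dev-*` reaper remains the backstop for cases such as the runner itself being killed.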
## 5. Definition of Done
Every enrolled recipe run through cc-ci after the `prevb` change; a complete baseline-vs-new results table;
all `prevb`-caused regressions fixed + re-verified green (no recipe below its pre-`prevb` baseline); any
needed `previous/` folders added minimally; pre-existing reds documented honestly; M1 + M2 fresh Adversary
PASSes recorded in `REVIEW-regall.md`.