claim(2w): WC8 + WC9 (FINAL gates) — resource-safety consolidation + stale-warm prune + docs/warm.md + --quick rollback proof

WC8: canonical.prune_stale (drop de-enrolled warm data + volumes) wired into the
nightly sweep + df log; consolidated evidence (DRONE_RUNNER_CAPACITY=MAX_TESTS
serialize; autoPrune drops --volumes so warm vols survive; cold teardown sacred;
warm excluded from D8 — no nix source ref). +1 unit (72 pass). WC9: docs/warm.md
documents the full warm/quick model; --quick rollback proof already proven live
(W2 FAIL restores exact known-good; WC4 PASS byte-identical snapshot). On PASS,
all WC1-WC9 (incl WC1.1/WC1.2) verified → DONE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 04:43:34 +01:00
parent b8b698e2f5
commit 40b03a9bf1
6 changed files with 234 additions and 6 deletions


@ -377,3 +377,20 @@ red; added util-linux (matching cc-ci-run). After both fixes, the live SERVICE s
(known-good stayed 1.10.0 — never lose known-good). W3 (WC5+WC6) essentially closed. Remaining:
WC8 (resource/isolation hardening — mostly already in place) + WC9 (docs + --quick rollback proof,
already shown) → then DONE.
## 2026-05-29 — W4 WC8 + WC9 (final gates) built + claimed; DONE pending their PASS
WC6 ADVERSARY PASS (REVIEW-2w b8b698e). Then built the final two:
- **WC8 resource safety + isolation** — most was already in place; consolidated + added the missing
piece: `canonical.prune_stale()` drops `/var/lib/ci-warm/<recipe>/` + the `warm-<recipe>` volumes
for DE-ENROLLED canonicals (keeps enrolled + reconciler dirs keycloak/traefik + alerts/), wired
into the nightly sweep + a `df` log. +1 unit (72 pass). Verified live: DRONE_RUNNER_CAPACITY=maxTests
(serialize); autoPrune flags drop `--volumes` (warm vols survive); `grep ci-warm nix/` = comment
only (excluded from D8); disk 50%, warm ~318M.
- **WC9 docs** — `docs/warm.md`: the full warm/quick model (live/data-warm/cold, warm-<recipe> scheme,
health-gated reconcilers + WC1.2 safety gate + alerts, canonicals + warmsnap + enroll, --quick,
promote-on-green-cold, nightly sweep, resource safety, operate/debug) + the `--quick` rollback proof
(FAIL restores exact known-good; PASS byte-identical snapshot — proven W2/WC4).
Claimed WC8+WC9 (the final gates). On their PASS, EVERY WC1-WC9 (incl WC1.1/WC1.2) is Adversary-verified
→ write `## DONE` to STATUS-2w (handshake: <24h PASS for all + no VETO) → watchdog returns to Phase 2.
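
The stale-warm prune described above can be sketched roughly as follows. This is a minimal illustration, not the code in `runner/harness/canonical.py`: the signature, the `KEEP_DIRS` constant, and the `remove_volumes` callback are assumptions made for the example. Only the behaviour (drop de-enrolled warm dirs and their `warm-<recipe>` volumes, keep enrolled recipes plus keycloak/traefik/alerts) comes from the notes.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable

# Assumption: directories the prune must never touch (reconciler dirs + alerts/).
KEEP_DIRS = frozenset({"keycloak", "traefik", "alerts"})


def prune_stale(
    warm_root: Path,
    enrolled: Iterable[str],
    remove_volumes: Callable[[str], None] = lambda recipe: None,
) -> list[str]:
    """Remove warm data for recipes that are no longer enrolled.

    Returns the names of the pruned recipes in sorted order.
    """
    keep = KEEP_DIRS | set(enrolled)
    pruned: list[str] = []
    if not warm_root.is_dir():
        return pruned
    for entry in sorted(warm_root.iterdir()):
        # Skip stray files and anything that is enrolled or protected.
        if not entry.is_dir() or entry.name in keep:
            continue
        shutil.rmtree(entry)
        # Hypothetical hook: drop the matching warm-<recipe> volumes here.
        remove_volumes(entry.name)
        pruned.append(entry.name)
    return pruned
```

Passing the volume removal in as a callback keeps the directory logic testable without a container runtime; the real function may well call the runtime directly.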


@ -44,15 +44,19 @@ nightly full-cold sweep. Definition of Done = WC1-WC9 (plan §1), each Adversa
health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes
on latest → each green run promotes its canonical (WC5); skips if a test is in flight. Proven via
the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0.
**CLAIMED — see Gate.**
**Adversary PASS @2026-05-29** (REVIEW-2w b8b698e, gate 465e105).
- [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
to cold. **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [ ] **WC8** — Resource safety + isolation: warm runs serialize per app; warm keycloak shared via
per-run realms; disk monitored+pruned; cold teardown sacred; warm data excluded from D8 closure.
- [ ] **WC9** — Docs + cold verify incl. the rollback proof (deliberately fail a PR under `--quick`,
confirm last-known-good restored intact; a `--quick` pass did not move the known-good).
- [x] **WC8** — Resource safety + isolation: serialize via `DRONE_RUNNER_CAPACITY=MAX_TESTS` + serial
nightly that skips-if-test-active; warm keycloak shared via per-run realms (WC1); disk
monitored+pruned (autoPrune drops `--volumes` so warm vols survive; `canonical.prune_stale`
drops de-enrolled warm data nightly; nightly logs `df`); cold teardown sacred; warm data
EXCLUDED from D8 (no Nix module references `/var/lib/ci-warm` as a source). **CLAIMED — see Gate.**
- [x] **WC9** — `docs/warm.md` documents the full warm/quick model; the `--quick` rollback proof
(FAIL restores last-known-good intact; PASS doesn't move it) is proven live (W2 FAIL + WC4
Adversary byte-identical-snapshot verify). **CLAIMED — see Gate.**
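
One way to re-check the "byte-identical snapshot" half of the rollback proof is to hash the known-good snapshot before and after a `--quick` run and compare the digests. The sketch below is an assumption about how such a check could look, not the Adversary's actual procedure; `snapshot_digest` and the directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot_digest(snapshot_dir: Path) -> str:
    """Hash every file's relative path and contents in a stable order."""
    digest = hashlib.sha256()
    files = sorted(p for p in snapshot_dir.rglob("*") if p.is_file())
    for path in files:
        # Include the relative path so renames change the digest too.
        digest.update(path.relative_to(snapshot_dir).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

Taking the digest before the run and again afterwards, then asserting equality, covers file names and contents only; ownership, permissions, and timestamps would need a separate comparison.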
## Milestones (plan §3)
- **W0** — Warm keycloak (WC1/WC1.1-keycloak/WC1.2). ✅ Adversary PASS @2026-05-29.
@ -138,7 +142,42 @@ headline e2e is green (below). No recipe/harness change needed.
## Gate
### Gate: WC6 — CLAIMED, awaiting Adversary (@2026-05-29)
### Gate: WC8 + WC9 — CLAIMED, awaiting Adversary (@2026-05-29) [FINAL gates]
**WHAT.** WC8 resource safety/isolation (consolidated + a stale-warm prune) + WC9 docs + the proven
`--quick` rollback. **WHERE:** `runner/harness/canonical.py` (`prune_stale`), `runner/nightly_sweep.py`
(prune + df after sweep), `nix/modules/{drone-runner,swarm}.nix` (capacity, autoPrune), `docs/warm.md`.
**HOW + EXPECTED (cold):**
1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **72 passed** (incl. test_canonical prune_stale:
drops de-enrolled canonical dirs, keeps enrolled + reconciler dirs + alerts/).
2. **WC8 serialize:** `grep DRONE_RUNNER_CAPACITY nix/modules/drone-runner.nix` → `= maxTests`
(MAX_TESTS, default 1); `nightly_sweep.py` `_another_run_active()` skips if a run is in flight;
sweep loop is serial.
3. **WC8 disk/prune:** `grep flags nix/modules/swarm.nix` → `[ "--all" "--filter" "until=24h" ]`
(NO `--volumes` → warm volumes survive); `canonical.prune_stale()` drops `/var/lib/ci-warm/<r>/`
(+ its `warm-<r>` volumes) for recipes no longer WARM_CANONICAL, run nightly; `df -h /` logged by
the sweep. Live: disk `/` 50% (14G free); warm total ~318M (keycloak DB snapshot dominates).
4. **WC8 cold teardown sacred:** proven across W2/WC5/WC6 (no `<recipe>-<6hex>` leftovers post-run).
5. **WC8 excluded from D8:** `grep -rn ci-warm nix/` → only a COMMENT (no Nix source declares
`/var/lib/ci-warm`); it's runtime cache re-seeded by cold runs.
6. **WC9 docs:** `docs/warm.md` covers live-warm/data-warm/cold, the reconcilers + health-gate +
safety gate + alerts, canonicals + snapshots + enroll, `--quick`, promote-on-green-cold, the
nightly sweep, resource safety, and the `--quick` rollback proof + operate/debug.
7. **WC9 `--quick` rollback proof:** already cold-verified — W2 FAIL run restored the exact
known-good; WC4 Adversary verify confirmed a PASS run leaves the snapshot byte-identical (does NOT
move the known-good). Re-runnable per docs/warm.md "The --quick rollback proof".
**On WC8+WC9 PASS → ALL of WC1-WC9 (incl WC1.1/WC1.2) verified → Builder writes `## DONE`.**
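
For orientation, the serialize and disk-logging behaviour checked in steps 2 and 3 could be sketched as below. This is a simplified illustration under stated assumptions, not the contents of `runner/nightly_sweep.py`: every parameter name is hypothetical, and the in-flight check, cold run, and promotion are passed in as callables so the control flow stands alone.

```python
import shutil
from typing import Callable, Iterable


def nightly_sweep(
    enrolled: Iterable[str],
    another_run_active: Callable[[], bool],
    run_cold: Callable[[str], bool],
    promote: Callable[[str], None],
    log: Callable[[str], None] = print,
    disk_path: str = "/",
) -> list[str]:
    """Run cold suites one recipe at a time; promote only the green ones."""
    if another_run_active():
        log("sweep skipped: a test run is in flight")
        return []
    promoted: list[str] = []
    for recipe in enrolled:  # strictly serial, never concurrent
        if run_cold(recipe):
            promote(recipe)
            promoted.append(recipe)
        else:
            log(f"{recipe}: cold run failed; canonical left untouched")
    # Rough equivalent of logging `df` after the sweep.
    usage = shutil.disk_usage(disk_path)
    percent_used = usage.used * 100 // usage.total
    log(f"disk {disk_path}: {percent_used}% used, {usage.free // 2**30}G free")
    return promoted
```

The sketch checks for an in-flight run once, up front; whether the real sweep re-checks between recipes is not stated in these notes.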
---
### Gate: WC6 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w b8b698e, gate 465e105)
Declarative timer (Persistent) + orchestration + the live systemd-service run (infra roll
health-gated → serial cold sweep → canonical advanced, infra healthy, no leftovers) cold-verified.
Builder may proceed to W4 (WC8/WC9). (claim detail retained below.)
### (claimed, now PASS) Gate: WC6 — CLAIMED detail
**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1)
then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each