claim(2w): WC8 + WC9 (FINAL gates) — resource-safety consolidation + stale-warm prune + docs/warm.md + --quick rollback proof

WC8: canonical.prune_stale (drop de-enrolled warm data + volumes) wired into the
nightly sweep + df log; consolidated evidence (DRONE_RUNNER_CAPACITY=MAX_TESTS
serialize; autoPrune drops --volumes so warm vols survive; cold teardown sacred;
warm excluded from D8 — no nix source ref). +1 unit (72 pass). WC9: docs/warm.md
documents the full warm/quick model; --quick rollback proof already proven live
(W2 FAIL restores exact known-good; WC4 PASS byte-identical snapshot). On PASS,
all WC1-WC9 (incl WC1.1/WC1.2) verified → DONE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 04:43:34 +01:00
parent b8b698e2f5
commit 40b03a9bf1
6 changed files with 234 additions and 6 deletions


@@ -44,15 +44,19 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes
on latest → each green run promotes its canonical (WC5); skips if a test is in flight. Proven via
the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0.
**CLAIMED — see Gate.**
**Adversary PASS @2026-05-29** (REVIEW-2w b8b698e, gate 465e105).
- [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
to cold. **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [ ] **WC8** — Resource safety + isolation: warm runs serialize per app; warm keycloak shared via
per-run realms; disk monitored+pruned; cold teardown sacred; warm data excluded from D8 closure.
- [ ] **WC9** — Docs + cold verify incl. the rollback proof (deliberately fail a PR under `--quick`,
confirm last-known-good restored intact; a `--quick` pass did not move the known-good).
- [x] **WC8** — Resource safety + isolation: serialize via `DRONE_RUNNER_CAPACITY=MAX_TESTS` + serial
nightly that skips-if-test-active; warm keycloak shared via per-run realms (WC1); disk
monitored+pruned (autoPrune drops `--volumes` so warm vols survive; `canonical.prune_stale`
drops de-enrolled warm data nightly; nightly logs `df`); cold teardown sacred; warm data
EXCLUDED from D8 (no Nix module references `/var/lib/ci-warm` as a source). **CLAIMED — see Gate.**
- [x] **WC9** — `docs/warm.md` documents the full warm/quick model; the `--quick` rollback proof
(FAIL restores last-known-good intact; PASS doesn't move it) is proven live (W2 FAIL + WC4
Adversary byte-identical-snapshot verify). **CLAIMED — see Gate.**
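
The stale-warm prune described under WC8 can be sketched as below. This is a minimal illustration, not the code in `runner/harness/canonical.py`: the name `prune_stale_sketch`, its signature, and the `keep` set (standing in for reconciler dirs and `alerts/`) are assumptions, and removal of the matching `warm-<r>` volumes is left out.

```python
import shutil
import tempfile
from pathlib import Path


def prune_stale_sketch(warm_root: Path, enrolled: set[str], keep: set[str]) -> list[str]:
    """Delete per-recipe dirs under warm_root that are neither enrolled nor kept.

    Returns the removed names, sorted, so the caller can log them.
    """
    removed = []
    for entry in sorted(warm_root.iterdir()):
        if not entry.is_dir():
            continue  # only per-recipe directories are prune candidates
        if entry.name in enrolled or entry.name in keep:
            continue  # still enrolled, or a protected non-recipe dir
        shutil.rmtree(entry)
        removed.append(entry.name)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # "keycloak" is a hypothetical reconciler dir name used for the demo.
        for name in ("custom-html", "old-recipe", "keycloak", "alerts"):
            (root / name).mkdir()
        print(prune_stale_sketch(root, {"custom-html"}, {"keycloak", "alerts"}))
```

Passing the enrolled and protected sets in explicitly keeps the sketch testable against a temporary directory instead of `/var/lib/ci-warm`.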
## Milestones (plan §3)
- **W0** — Warm keycloak (WC1/WC1.1-keycloak/WC1.2). ✅ Adversary PASS @2026-05-29.
@@ -138,7 +142,42 @@ headline e2e is green (below). No recipe/harness change needed.
## Gate
### Gate: WC6 — CLAIMED, awaiting Adversary (@2026-05-29)
### Gate: WC8 + WC9 — CLAIMED, awaiting Adversary (@2026-05-29) [FINAL gates]
**WHAT.** WC8 resource safety/isolation (consolidated + a stale-warm prune) + WC9 docs + the proven
`--quick` rollback. **WHERE:** `runner/harness/canonical.py` (`prune_stale`), `runner/nightly_sweep.py`
(prune + df after sweep), `nix/modules/{drone-runner,swarm}.nix` (capacity, autoPrune), `docs/warm.md`.
**HOW + EXPECTED (cold):**
1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **72 passed** (incl. test_canonical prune_stale:
drops de-enrolled canonical dirs, keeps enrolled + reconciler dirs + alerts/).
2. **WC8 serialize:** `grep DRONE_RUNNER_CAPACITY nix/modules/drone-runner.nix` → `= maxTests`
(MAX_TESTS, default 1); `nightly_sweep.py` `_another_run_active()` skips if a run is in flight;
sweep loop is serial.
3. **WC8 disk/prune:** `grep flags nix/modules/swarm.nix` → `[ "--all" "--filter" "until=24h" ]`
(NO `--volumes` → warm volumes survive); `canonical.prune_stale()` drops `/var/lib/ci-warm/<r>/`
(+ its `warm-<r>` volumes) for recipes no longer WARM_CANONICAL, run nightly; `df -h /` logged by
the sweep. Live: disk `/` 50% (14G free); warm total ~318M (keycloak DB snapshot dominates).
4. **WC8 cold teardown sacred:** proven across W2/WC5/WC6 (no `<recipe>-<6hex>` leftovers post-run).
5. **WC8 excluded from D8:** `grep -rn ci-warm nix/` → only a COMMENT (no Nix source declares
`/var/lib/ci-warm`); it's runtime cache re-seeded by cold runs.
6. **WC9 docs:** `docs/warm.md` covers live-warm/data-warm/cold, the reconcilers + health-gate +
safety gate + alerts, canonicals + snapshots + enroll, `--quick`, promote-on-green-cold, the
nightly sweep, resource safety, and the `--quick` rollback proof + operate/debug.
7. **WC9 `--quick` rollback proof:** already cold-verified — W2 FAIL run restored the exact
known-good; WC4 Adversary verify confirmed a PASS run leaves the snapshot byte-identical (does NOT
move the known-good). Re-runnable per docs/warm.md "The --quick rollback proof".
**On WC8+WC9 PASS → ALL of WC1–WC9 (incl WC1.1/WC1.2) verified → Builder writes `## DONE`.**
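
The byte-identical check in step 7 could be reproduced with a tree digest taken before and after a `--quick` run. A minimal sketch, assuming the snapshot is a plain directory tree; `tree_digest` is a hypothetical helper, not part of the harness, and it ignores file modes and timestamps.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix each field so adjacent fields cannot blur together.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        snap = Path(tmp)
        (snap / "db").mkdir()
        (snap / "db" / "dump.sql").write_bytes(b"known-good")
        before = tree_digest(snap)
        # ... a --quick run would happen here ...
        print("unchanged" if tree_digest(snap) == before else "MOVED")
```

Equal digests before and after a PASS run mean the known-good did not move; equal digests after a FAIL run mean the restore was exact.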
---
### Gate: WC6 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w b8b698e, gate 465e105)
Declarative timer (Persistent) + orchestration + the live systemd-service run (infra roll
health-gated → serial cold sweep → canonical advanced, infra healthy, no leftovers) cold-verified.
Builder may proceed to W4 (WC8/WC9). (claim detail retained below.)
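
The orchestration verified here (skip if a run is in flight, then a serial cold run per enrolled recipe, promoting on green) can be sketched as follows. All names are hypothetical stand-ins for whatever `runner/nightly_sweep.py` actually does, and skipping the whole sweep rather than a single recipe is an assumption.

```python
from typing import Callable, Iterable


def run_sweep_sketch(
    recipes: Iterable[str],
    run_cold: Callable[[str], bool],
    promote: Callable[[str], None],
    another_run_active: Callable[[], bool],
) -> dict[str, str]:
    """Run cold suites one recipe at a time; promote only the green ones."""
    if another_run_active():
        return {}  # assumed behaviour: skip the whole sweep if a test is in flight
    results: dict[str, str] = {}
    for recipe in recipes:  # strictly serial, never concurrent
        if run_cold(recipe):
            promote(recipe)
            results[recipe] = "promoted"
        else:
            results[recipe] = "failed"  # canonical left untouched
    return results


if __name__ == "__main__":
    promoted: list[str] = []
    print(run_sweep_sketch(["custom-html"], lambda r: True, promoted.append, lambda: False))
```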
### (claimed, now PASS) Gate: WC6 — CLAIMED detail
**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1)
then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each