claim(2w): WC6 nightly full-cold sweep — timer+service roll warm/infra (health-gated) then serial cold sweep promoting canonicals (WC5); proven live

canonical.enrolled_recipes; runner/nightly_sweep.py (roll keycloak+traefik →
serial full-cold over enrolled on latest → green promotes; skip if test active;
operate against CCCI_REPO checkout for tests/); nix/modules/nightly-sweep.nix
(timer 03:00 Persistent + oneshot service) wired in. 2 bugs fixed via live
service run (repo-relative enrolled scan; util-linux for backup PTY). Live
SERVICE sweep: enrolled=['custom-html'] → all tiers green → canonical advanced
1.10.0→1.11.0; red-run correctly does NOT promote. 71 unit pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 04:33:08 +01:00
parent 1e40a460ba
commit 465e1059b0
7 changed files with 211 additions and 1 deletions

@@ -358,3 +358,22 @@ canonical at latest separately (one extra deploy) so the old known-good is never
(DECISIONS Phase-2w WC5). Next: WC6 nightly sweep (systemd timer: nixos-rebuild switch FIRST then
serial cold sweep over enrolled recipes; need canonical.enrolled_recipes() + a nightly-sweep nix
module). Building WC6 code while the Adversary verifies WC5.
## 2026-05-29 — W3 WC6 nightly full-cold sweep built + proven (systemd service); claiming. WC5+WC6 close W3.
canonical.enrolled_recipes() (scan tests/*/recipe_meta.py for WARM_CANONICAL). runner/nightly_sweep.py
(roll keycloak+traefik via warm_reconcile health-gated → serial full-cold over enrolled recipes on
latest → each green promotes WC5; skip if a run is active; per-recipe red reported not fatal).
nix/modules/nightly-sweep.nix = systemd timer (OnCalendar 03:00 Persistent +RandomizedDelay) + oneshot
service; wired into configuration.nix. 71 unit pass.
Two bugs found via the live SERVICE run (not the direct run): (1) the store packages only runner/ (not
tests/), so enrolled_recipes scanned a nonexistent store/tests → []; fixed nightly_sweep to operate
against $CCCI_REPO=/root/cc-ci (the checkout with tests/) — same place run_recipe_ci runs from. (2) the
sweep wrapper's runtimeInputs lacked util-linux → abra's backup/restore PTY (`script`) failed → backup
red; added util-linux (matching cc-ci-run). After both fixes, the live SERVICE sweep: enrolled=
['custom-html'] → all 5 tiers green → WC5 promote advanced canonical 1.10.0→1.11.0+1.29.0; timer active
(next ~03:00). Also confirmed the red-run path (the util-linux flake) correctly did NOT promote
(known-good stayed 1.10.0 — never lose known-good). W3 (WC5+WC6) essentially closed. Remaining:
WC8 (resource/isolation hardening — mostly already in place) + WC9 (docs + --quick rollback proof,
already shown) → then DONE.
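
A minimal sketch of the enrolled-recipe scan described in this entry, not the actual `canonical.enrolled_recipes()`. It assumes enrollment means a module-level `WARM_CANONICAL = <truthy constant>` in `tests/<recipe>/recipe_meta.py`; the function names `enrolled_recipes_sketch` and `repo_root_from_env` are hypothetical. It also illustrates bug (1): a root with no `tests/` directory yields `[]`.

```python
import ast
import os
from pathlib import Path


def enrolled_recipes_sketch(repo_root: Path) -> list[str]:
    """Return sorted recipe names whose recipe_meta.py sets WARM_CANONICAL truthy."""
    tests_dir = repo_root / "tests"
    if not tests_dir.is_dir():
        # A packaged copy that ships only runner/ has no tests/ at all,
        # so nothing is enrolled (the symptom of bug 1).
        return []
    enrolled = []
    for meta in sorted(tests_dir.glob("*/recipe_meta.py")):
        tree = ast.parse(meta.read_text(encoding="utf-8"))
        for node in tree.body:
            if not isinstance(node, ast.Assign):
                continue
            names = [t.id for t in node.targets if isinstance(t, ast.Name)]
            # Assumption: the flag is a plain constant, e.g. WARM_CANONICAL = True.
            if (
                "WARM_CANONICAL" in names
                and isinstance(node.value, ast.Constant)
                and node.value.value
            ):
                enrolled.append(meta.parent.name)
                break
    return enrolled


def repo_root_from_env() -> Path:
    """Resolve the checkout to scan; the default mirrors $CCCI_REPO=/root/cc-ci."""
    return Path(os.environ.get("CCCI_REPO", "/root/cc-ci"))
```

Parsing with `ast` instead of importing the module avoids executing test code during the scan; whether the real implementation does this is not stated in the entry.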

@@ -39,7 +39,12 @@ nightly full-cold sweep. Definition of Done = WC1-WC9 (plan §1), each Adversa
snapshot+registry; never lose known-good). Proven live: green cold custom-html run advanced the
canonical 1.10.0+1.28.0 → 1.11.0+1.29.0 (snapshot refreshed, idle, per-run app torn down).
`--quick` never promotes (W2). **Adversary PASS @2026-05-29** (REVIEW-2w 5bbc47c, gate 125453d).
- [ ] **WC6** — Nightly full-cold sweep (scheduled, declarative, MAX_TESTS-bounded).
- [x] **WC6** — Nightly full-cold sweep. `nix/modules/nightly-sweep.nix` (systemd TIMER OnCalendar
03:00 Persistent + oneshot service) → `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik
health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes
on latest → each green run promotes its canonical (WC5); skips if a test is in flight. Proven via
the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0.
**CLAIMED — see Gate.**
- [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
@@ -133,6 +138,30 @@ headline e2e is green (below). No recipe/harness change needed.
## Gate
### Gate: WC6 — CLAIMED, awaiting Adversary (@2026-05-29)
**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1)
then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each
canonical's known-good (WC5) + a daily authoritative regression. Declarative, MAX_TESTS-bounded
(serial), skips if a test is in flight. **WHERE:** `nix/modules/nightly-sweep.nix` (timer+service),
`runner/nightly_sweep.py`, `runner/harness/canonical.py` (`enrolled_recipes`). Wired into
`hosts/cc-ci/configuration.nix`.
**HOW + EXPECTED (cold):**
1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **71 passed** (incl. test_canonical enrolled_recipes).
2. **Timer present:** `systemctl is-active nightly-sweep.timer` → active; `systemctl list-timers
nightly-sweep.timer` → next ~03:00 (Persistent).
3. **Live sweep (via the systemd SERVICE, store copy):** set the custom-html canonical to an OLDER
version, then `systemctl start nightly-sweep.service` → journal shows: roll keycloak rc=0 + traefik
rc=0 (health-gated, noop at latest); `enrolled canonicals = ['custom-html']`; full-cold custom-html
install/upgrade/backup/restore/custom **all pass**; `WC5 promote: canonical custom-html advanced to
known-good 1.11.0+1.29.0`; `custom-html: PASS`; afterwards `canonical.json` version ADVANCED to
1.11.0+1.29.0, canonical idle, traefik+keycloak 200, system running. Builder ran this live: **PASS**.
(A red recipe in the sweep is reported FAIL + does NOT promote — known-good safe; verified when a
missing-util-linux backup flake red'd a run and the canonical stayed put, then fixed.)
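
The control flow this gate exercises can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `runner/nightly_sweep.py`: every parameter name is hypothetical, the injected callables stand in for the real roll, cold-run, and promote steps, and the sketch assumes the in-flight check skips the whole sweep before anything is rolled. How a failed infra roll is handled is not described here, so the sketch simply lets such an error propagate.

```python
from typing import Callable, Iterable


def nightly_sweep_sketch(
    enrolled: Iterable[str],
    run_active: Callable[[], bool],
    roll_infra: Callable[[], None],
    run_full_cold: Callable[[str], bool],
    promote: Callable[[str], None],
) -> dict[str, str]:
    """Roll infra, then run each enrolled recipe serially; only green runs promote."""
    if run_active():
        # A test is in flight: skip the sweep rather than contend with it.
        return {}
    roll_infra()  # stands in for the health-gated keycloak+traefik roll
    results: dict[str, str] = {}
    for recipe in enrolled:  # strictly one recipe at a time (serial)
        try:
            green = run_full_cold(recipe)
        except Exception:
            # A red or crashing recipe is reported but does not abort the sweep.
            green = False
        if green:
            promote(recipe)  # advance the canonical's known-good
            results[recipe] = "PASS"
        else:
            results[recipe] = "FAIL"  # known-good is left untouched
    return results
```

Keeping promotion strictly inside the green branch is what makes the red-run case safe: a failed recipe never reaches the promote call, so its canonical stays at the previous known-good.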
---
### Gate: WC5 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 5bbc47c, gate 125453d)
Anti-poison gate predicate + live advancement 1.10.0→1.11.0 (cold-only) cold-verified. Builder may
proceed to WC6. (claim detail retained below.)