# STATUS — Phase 2w (warm canonical deployments + `--quick` CI mode)

## DONE

**Phase 2w COMPLETE @2026-05-29.** Every Definition-of-Done item (WC1–WC9, incl. WC1.1 + WC1.2) is **Adversary cold-verified with a fresh (<24h) PASS in REVIEW-2w, NO `## VETO`, no open `[adversary]` findings** — the Adversary authorized DONE (REVIEW-2w 2822d60: "ALL Phase-2w gates Adversary cold-verified — NO VETO — DONE authorized"). The watchdog now auto-returns to **Phase 2** (resume recipe authoring; STATUS-2/BACKLOG-2 intact).

Evidence (each WC → its REVIEW-2w PASS / gate commit):

| WC | What | PASS (REVIEW-2w / gate) |
|---|---|---|
| WC1 | live-warm UNPINNED keycloak; per-run namespaced realms; concurrency; reaping | 31ac86d / 985686f |
| WC1.1 | health-gated rollback — keycloak (stateful, snapshot) | 31ac86d / 985686f |
| WC1.1 | health-gated rollback — traefik (stateless, version-only) | e3b08a9 / e678d2e |
| WC1.2 | pre-deploy safety gate (major / manual-migration → hold+alert) | 31ac86d / 985686f |
| WC2 | data-warm canonical model + registry | 0246296 / 4ce80f8 |
| WC3 | known-good snapshots (raw-while-undeployed, restore round-trips) | 0246296 / 4ce80f8 |
| WC4 | `--quick` mode (PASS keeps known-good; FAIL restores; never promote) | 31f0e42 / 3ff2bf6 |
| WC5 | promote-on-green-cold (only cold-on-latest advances) | 5bbc47c / 125453d |
| WC6 | nightly full-cold sweep (timer + roll-warm/infra + serial sweep) | b8b698e / 465e105 |
| WC7 | `!testme --quick` trigger / labeling / no-canonical fallback | 31f0e42 / 3ff2bf6 |
| WC8 | resource safety + isolation (serialize, disk prune, D8-excluded) | 2822d60 / 40b03a9 |
| WC9 | docs (`docs/warm.md`) + the `--quick` rollback proof | 2822d60 / 40b03a9 |

Final state: keycloak + traefik 200; custom-html canonical idle@1.11.0+1.29.0; nightly-sweep.timer active; system running (0 failed); disk 50%. No tests softened in the phase.
---

**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2w-warm-canonical-quick.md`

**Loop state for THIS phase:** STATUS-2w / BACKLOG-2w / REVIEW-2w / JOURNAL-2w (DECISIONS.md shared). Phase 1/1b/1c/1d/1e and Phase 2 STATUS/BACKLOG/REVIEW files are NOT this phase's state. Phase 2 is **PAUSED** (STATUS-2/BACKLOG-2 intact) and resumes after 2w `## DONE`.

## Phase

Add a warm-data layer to cc-ci CI: a live-warm shared keycloak for SSO deps, data-warm per-recipe canonicals at stable domains, known-good snapshots, an opt-in `--quick` fast lane that reattaches the canonical and upgrades to PR head (rolling back on failure), cold-only canonical advancement, and a nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversary cold-verified.

## Definition of Done (Phase 2w) — WC1–WC9 (+WC1.1/WC1.2), each Adversary cold-verified in REVIEW-2w

- [x] **WC1** — Live-warm UNPINNED keycloak; per-run namespaced realms (create+delete); concurrent distinct realms; orphan realms reaped. **Adversary PASS @2026-05-29** (REVIEW-2w, gate 985686f).
- [x] **WC1.1** — Health-gated deploy-with-rollback. **keycloak (stateful) — Adversary PASS @2026-05-29** (marquee). **traefik (stateless, version-rollback-only)** — reconciler MIGRATED (W0.10a): `proxy.nix` now drives `warm_reconcile.py traefik` (shared health-gated path, no snapshot; cert/file-provider setup preserved); no-op converge proven live (traefik 200, keycloak-through-traefik 200, 0 failed). **Adversary PASS @2026-05-29** (REVIEW-2w e3b08a9): destructive rollback proven (lint-breaking tag → rollback to 5.1.1, NO TLS outage). **WC1.1 FULLY CLOSED (keycloak stateful + traefik stateless).**
- [x] **WC1.2** — Pre-deploy safety gate (major / manual-migration → hold + alert with notes, no churn, short-circuits before WC1.1). **Adversary PASS @2026-05-29**.
- [x] **WC2** — Data-warm canonical model: per-recipe canonical at stable domain `warm-`, declarative registry (canonical.json + recipe_meta.WARM_CANONICAL) tracking recipe→known-good version/commit; data-warm (undeployed-when-idle, volume retained); re-warmable via seed_canonical. Proven on custom-html (W1.2). **Adversary PASS @2026-05-29** (REVIEW-2w 0246296, gate 4ce80f8).
- [x] **WC3** — Known-good snapshots: raw per-volume tar taken while undeployed under `/var/lib/ci-warm//snapshot/`; one last-good per app, atomic subdir swap; restore round-trips data (W0.5 + W1.2 + Adversary's own mutate→restore). **Adversary PASS @2026-05-29**.
- [x] **WC4** — `--quick` mode (`run_quick` in run_recipe_ci.py): reattach canonical → upgrade to PR head (chaos) → generic UPGRADE+serving+overlay+custom; PASS→undeploy-keep-volume (known-good UNCHANGED, never promote); FAIL→restore last-known-good snapshot then undeploy. Proven live on custom-html (PASS + FAIL). **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [x] **WC5** — Canonical advancement via cold only (promote-on-green-cold). `should_promote_canonical` (enrolled+green+cold+latest) + `promote_canonical` (re-seed canonical at green-verified latest → snapshot+registry; never lose known-good). Proven live: green cold custom-html run advanced the canonical 1.10.0+1.28.0 → 1.11.0+1.29.0 (snapshot refreshed, idle, per-run app torn down). `--quick` never promotes (W2). **Adversary PASS @2026-05-29** (REVIEW-2w 5bbc47c, gate 125453d).
- [x] **WC6** — Nightly full-cold sweep. `nix/modules/nightly-sweep.nix` (systemd TIMER OnCalendar 03:00 Persistent + oneshot service) → `runner/nightly_sweep.py`: roll warm/infra (keycloak+traefik health-gated, WC1.1) → SERIAL full-cold run over enrolled (`canonical.enrolled_recipes`) recipes on latest → each green run promotes its canonical (WC5); skips if a test is in flight.
Proven via the live service: enrolled=['custom-html'] → all tiers green → canonical advanced 1.10.0→1.11.0. **Adversary PASS @2026-05-29** (REVIEW-2w b8b698e, gate 465e105).
- [x] **WC7** — Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified); never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback to cold. **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [x] **WC8** — Resource safety + isolation: serialize via `DRONE_RUNNER_CAPACITY=MAX_TESTS` + serial nightly that skips-if-test-active; warm keycloak shared via per-run realms (WC1); disk monitored+pruned (autoPrune drops `--volumes` so warm vols survive; `canonical.prune_stale` drops de-enrolled warm data nightly; nightly logs `df`); cold teardown sacred; warm data EXCLUDED from D8 (no Nix module references `/var/lib/ci-warm` as a source). **Adversary PASS @2026-05-29** (REVIEW-2w 2822d60, gate 40b03a9).
- [x] **WC9** — `docs/warm.md` documents the full warm/quick model; the `--quick` rollback proof (FAIL restores last-known-good intact; PASS doesn't move it) is proven live (W2 FAIL + WC4 Adversary byte-identical-snapshot verify). **Adversary PASS @2026-05-29** (REVIEW-2w 2822d60, gate 40b03a9).

## Milestones (plan §3)

- **W0** — Warm keycloak (WC1/WC1.1-keycloak/WC1.2). ✅ Adversary PASS @2026-05-29.
- **W1** — Canonical registry + snapshot/restore (WC2, WC3). ✅ Adversary PASS @2026-05-29.
- **W2** — `--quick` mode (WC4, WC7). ✅ Adversary PASS @2026-05-29.
- **W3** — Cold-advances-canonical + nightly sweep (WC5, WC6). ✅ Adversary PASS @2026-05-29.
- **W4** — Resource/isolation hardening + docs + cold verify incl. rollback proof (WC8, WC9). ✅ Adversary PASS @2026-05-29. → DONE.
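
The WC7 trigger rule from the Definition of Done (plain `!testme` runs cold, `!testme --quick` opts in to the fast lane, near-misses such as `!testmexyz` are rejected) can be illustrated with a minimal sketch. This is not the bridge's actual `parse_trigger`; the function name `parse_trigger_sketch`, the `(triggered, quick)` tuple order, and the rejection of unknown flags are assumptions made for illustration.

```python
from typing import Tuple


def parse_trigger_sketch(comment: str) -> Tuple[bool, bool]:
    """Return (triggered, quick) for a PR comment body.

    Hypothetical stand-in for the bridge's parse_trigger; the real
    function may treat whitespace or extra flags differently.
    """
    tokens = comment.strip().split()
    # The first token must be exactly "!testme"; "!testmexyz" is a
    # different token and is therefore rejected.
    if not tokens or tokens[0] != "!testme":
        return (False, False)
    flags = tokens[1:]
    if not flags:
        return (True, False)   # default: full cold run
    if flags == ["--quick"]:
        return (True, True)    # opt-in fast lane
    return (False, False)      # assumption: unknown flags are rejected


if __name__ == "__main__":
    for body in ("!testme", "!testme --quick", "!testmexyz"):
        print(repr(body), parse_trigger_sketch(body))
```

Matching on whole tokens rather than a string prefix is what keeps `!testmexyz` from triggering a run.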
## In flight

**W0 — live-warm keycloak (WC1).** Done so far (commits up to 88c1114):

- W0.1 sso realm lifecycle (list/delete/realms_to_reap/reap) + 8 unit tests (43 unit pass).
- W0.2 orchestrator live-warm dep mode (warm.py + run_recipe_ci split warm/cold; per-run realm).
- **WC1 core mechanism PROVEN** deploy-free on the live warm keycloak: realm create → password-grant JWT → discovery issuer → delete(idempotent) → reap(keeps live hex / deletes orphan). All PASS.
- W0.3 declarative reconciler `nix/modules/warm-keycloak.nix` up; `nixos-rebuild switch` → warm-keycloak.service active, system running (0 failed), /realms/master=200. (INTERIM: pinned + skip-if-healthy; to be replaced by the unpinned + health-gated WC1.1 form.)
- **W0.5 WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`) DONE (4cc1e15). +5 unit tests (48 unit pass). **LIVE round-trip PROVEN on warm keycloak**: marker realm → undeploy → snapshot (mariadb+providers) → deploy → delete marker (mutate DB) → undeploy → restore → deploy → marker realm BACK; keycloak healthy. Snapshots under `/var/lib/ci-warm//`, atomic, one last-good.
- **W0.6 reconciler rewrite** DONE (a044abb). `runner/warm_reconcile.py` (python, packaged into the nix store, replaces the bash reconcile): UNPIN keycloak (deploy latest version TAG; recipe fetched at runtime → D8 closure byte-identical); WC1.2 pre-deploy safety gate (major recipe/app bump OR releaseNotes manual-migration → hold + alert, no churn); WC1.1 health-gated upgrade-with-rollback scaffold (record last-good → keycloak undeploy→snapshot→deploy latest → health-gate → commit-or-restore+redeploy-prior+alert). Alerts = `/var/lib/ci-warm/alerts/*.json`. +8 unit tests (56 unit pass). PROVEN live: `nixos-rebuild switch` → warm-keycloak.service runs the python reconciler → noop-healthy (system 0-failed, 200); **WC1.2 holds proven** (MAJOR → held-major, keycloak untouched; minor+manual-migration notes → held-manual-migration, alert carries notes).
- **W0.9 WC1.1 live proofs** DONE (32f0071). PROVEN on warm keycloak (annotated fake tags + CCCI_SKIP_FETCH): (a) healthy upgrade 10.7.1→10.7.9 — snapshot+deploy+health-pass, last_good committed, marker preserved; (b) **marquee rollback** — broken latest 10.7.10 → deploy fails → rollback to 10.7.9, HEALTHY, marker realm INTACT (data preserved), last_good NOT advanced, rollback alert written (attempted=10.7.10,last_good=10.7.9,recovered=True); recovered to canonical 10.7.1+26.6.2. Fixed 4 issues live (deploy-fail→rollback, warmsnap last_good subdir, wait_undeployed swarm-settle, abra-stdout capture). 57 unit pass. **Reconciler-side WC1/WC1.1/WC1.2 proven.**

**Adversary reproduce (W0.9):** on cc-ci, with the keycloak recipe clone, create annotated fake tags (peel `^{}`, set git identity) `10.7.9+26.6.2`(=good commit) and `10.7.10+26.6.2`(broken KC_HOSTNAME), then `CCCI_SKIP_FETCH=1 cc-ci-run runner/warm_reconcile.py keycloak` twice; observe `upgraded:` then `rolled-back:`, marker realm survives, `/var/lib/ci-warm/keycloak/last_good` unchanged at the prior version, a `*rollback*.json` alert under `/var/lib/ci-warm/alerts/`.

**W0 COMPLETE — Adversary PASS @2026-05-29.** Now in **W1 (canonical registry, WC2/WC3)**.

**W0 ✅ + W1 ✅ + W2 ✅ Adversary PASS. Now in W3 (cold-advances-canonical WC5 + nightly sweep WC6).**

**W3 plan:**

- **WC5 — promote-on-green-cold.** A GREEN full-cold run on the LATEST (not a `--quick` run) of an enrolled (WARM_CANONICAL) recipe re-snapshots + re-tags the canonical known-good instead of deleting the volume at teardown: at the end of a green cold run, undeploy → `canonical.seed_canonical` (snapshot while undeployed + write registry version=the green commit/version) → keep the volume as the new canonical. The FIRST green cold run on latest SEEDS the canonical. ONLY cold advances it (`--quick` never promotes — proven W2).
Wire into run_recipe_ci.py cold teardown, gated on: recipe is WARM_CANONICAL + run was green + deployed LATEST (not a pinned/prev base). Add unit tests + a live proof (green cold custom-html run → canonical re-seeded at the new known-good).
- **WC6 — nightly full-cold sweep.** Declarative scheduler (systemd timer on cc-ci): nightly does `nixos-rebuild switch` FIRST (rolls warm/infra to latest, health-gated per WC1.1) THEN a full-cold sweep across enrolled recipes (serial, MAX_TESTS-bounded), refreshing each canonical's known-good (WC5) + serving as the daily authoritative regression. MUST NOT run while a test is in flight.
- **Quiet-window opportunity (now): W0.10a traefik WC1.1** — Adversary idle post-W2 PASS, so this is the window to migrate traefik onto the health-gated reconciler (tracked-before-DONE; below).

**Tracked before Phase-2w DONE:**

- **W0.10a — traefik WC1.1** (Adversary requires a cold proof): migrate `proxy.nix` onto the shared health-gated reconciler (stateless = version-rollback-only; preserve cert-secret/WILDCARDS_ENABLED/COMPOSE_FILE setup). CAREFUL — traefik serves all TLS; deploy/test only in a quiet window.
- **W0.10b — Builder-loop alert relay**: each wake, scan `/var/lib/ci-warm/alerts/*.json` → PushNotification → archive to `alerts/seen/`.

**Build finding (RESOLVED):** the W0.4 lasuite-docs `setup_custom_tests` redeploy failure (nginx web `host not found in upstream ...backend:8000`) was **transient resource contention** from the since-killed stale Phase-2 run (disk was also tight). On the clean system it converges fine — the headline e2e is green (below). No recipe/harness change needed.

## Gate

### Gate: WC8 + WC9 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 2822d60, gate 40b03a9) [FINAL gates]

**WHAT.** WC8 resource safety/isolation (consolidated + a stale-warm prune) + WC9 docs + the proven `--quick` rollback.
**WHERE:** `runner/harness/canonical.py` (`prune_stale`), `runner/nightly_sweep.py` (prune + df after sweep), `nix/modules/{drone-runner,swarm}.nix` (capacity, autoPrune), `docs/warm.md`.

**HOW + EXPECTED (cold):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **72 passed** (incl. test_canonical prune_stale: drops de-enrolled canonical dirs, keeps enrolled + reconciler dirs + alerts/).
2. **WC8 serialize:** `grep DRONE_RUNNER_CAPACITY nix/modules/drone-runner.nix` → `= maxTests` (MAX_TESTS, default 1); `nightly_sweep.py` `_another_run_active()` skips if a run is in flight; sweep loop is serial.
3. **WC8 disk/prune:** `grep flags nix/modules/swarm.nix` → `[ "--all" "--filter" "until=24h" ]` (NO `--volumes` → warm volumes survive); `canonical.prune_stale()` drops `/var/lib/ci-warm//` (+ its `warm-` volumes) for recipes no longer WARM_CANONICAL, run nightly; `df -h /` logged by the sweep. Live: disk `/` 50% (14G free); warm total ~318M (keycloak DB snapshot dominates).
4. **WC8 cold teardown sacred:** proven across W2/WC5/WC6 (no `-<6hex>` leftovers post-run).
5. **WC8 excluded from D8:** `grep -rn ci-warm nix/` → only a COMMENT (no Nix source declares `/var/lib/ci-warm`); it's runtime cache re-seeded by cold runs.
6. **WC9 docs:** `docs/warm.md` covers live-warm/data-warm/cold, the reconcilers + health-gate + safety gate + alerts, canonicals + snapshots + enroll, `--quick`, promote-on-green-cold, the nightly sweep, resource safety, and the `--quick` rollback proof + operate/debug.
7. **WC9 `--quick` rollback proof:** already cold-verified — W2 FAIL run restored the exact known-good; WC4 Adversary verify confirmed a PASS run leaves the snapshot byte-identical (does NOT move the known-good). Re-runnable per docs/warm.md "The --quick rollback proof".
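
The stale-warm prune checked in steps 1 and 3 (drop de-enrolled canonical directories, keep enrolled ones plus the reconciler directories and `alerts/`) can be sketched roughly as below. This is an illustration only, not the real `canonical.prune_stale`: the name `prune_stale_sketch`, its signature, and the `KEEP_ALWAYS` set are assumptions, and removal of the matching Docker volumes is deliberately left out.

```python
import shutil
from pathlib import Path
from typing import List, Set

# Assumption: directories that are never recipe canonicals
# (reconciler state for keycloak/traefik, and the alert sentinels).
KEEP_ALWAYS = {"alerts", "keycloak", "traefik"}


def prune_stale_sketch(root: Path, enrolled: Set[str]) -> List[str]:
    """Delete per-recipe dirs under root whose recipe is no longer enrolled.

    Returns the names that were removed, sorted, so the caller can log them.
    """
    removed = []
    for child in sorted(root.iterdir()):
        if not child.is_dir():
            continue  # leave stray files alone
        if child.name in KEEP_ALWAYS or child.name in enrolled:
            continue  # still needed
        shutil.rmtree(child)
        removed.append(child.name)
    return removed
```

Returning the removed names keeps the function easy to unit-test against a temporary directory, which matches how the gate describes the `prune_stale` unit test.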
**On WC8+WC9 PASS → ALL of WC1–WC9 (incl WC1.1/WC1.2) verified → Builder writes `## DONE`.**

---

### Gate: WC6 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w b8b698e, gate 465e105)

Declarative timer (Persistent) + orchestration + the live systemd-service run (infra roll health-gated → serial cold sweep → canonical advanced, infra healthy, no leftovers) cold-verified. Builder may proceed to W4 (WC8/WC9). (claim detail retained below.)

### (claimed, now PASS) Gate: WC6 — CLAIMED detail

**WHAT.** Nightly full-cold sweep: a scheduled job rolls warm/infra to latest (health-gated, WC1.1) then runs the full COLD suite serially across enrolled canonical recipes on latest — refreshing each canonical's known-good (WC5) + a daily authoritative regression. Declarative, MAX_TESTS-bounded (serial), skips if a test is in flight.

**WHERE:** `nix/modules/nightly-sweep.nix` (timer+service), `runner/nightly_sweep.py`, `runner/harness/canonical.py` (`enrolled_recipes`). Wired into `hosts/cc-ci/configuration.nix`.

**HOW + EXPECTED (cold):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **71 passed** (incl. test_canonical enrolled_recipes).
2. **Timer present:** `systemctl is-active nightly-sweep.timer` → active; `systemctl list-timers nightly-sweep.timer` → next ~03:00 (Persistent).
3. **Live sweep (via the systemd SERVICE, store copy):** set the custom-html canonical to an OLDER version, then `systemctl start nightly-sweep.service` → journal shows: roll keycloak rc=0 + traefik rc=0 (health-gated, noop at latest); `enrolled canonicals = ['custom-html']`; full-cold custom-html install/upgrade/backup/restore/custom **all pass**; `WC5 promote: canonical custom-html advanced to known-good 1.11.0+1.29.0`; `custom-html: PASS`; afterwards `canonical.json` version ADVANCED to 1.11.0+1.29.0, canonical idle, traefik+keycloak 200, system running. Builder ran this live: **PASS**.
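
The sweep order described above (skip if a run is in flight, roll infra first, then one cold run per enrolled recipe, promoting only green runs) can be sketched as a small orchestration function. All names here are hypothetical and the callables stand in for the real `nightly_sweep.py` steps; this is a sketch of the control flow, not that script.

```python
from typing import Callable, Dict, Iterable


def nightly_sweep_sketch(
    enrolled: Iterable[str],
    another_run_active: Callable[[], bool],
    roll_infra: Callable[[], None],
    run_cold: Callable[[str], bool],
    promote: Callable[[str], None],
) -> Dict[str, str]:
    """Return {recipe: "PASS" | "FAIL"}; an empty dict means the sweep was skipped."""
    if another_run_active():
        return {}  # never compete with a test that is in flight
    roll_infra()  # warm keycloak + traefik first, health-gated elsewhere
    results: Dict[str, str] = {}
    for recipe in enrolled:  # strictly serial: one cold run at a time
        green = run_cold(recipe)
        results[recipe] = "PASS" if green else "FAIL"
        if green:
            promote(recipe)  # a red run leaves the known-good untouched
    return results
```

Keeping promotion inside the `if green` branch is what makes a red recipe safe: it is reported as FAIL and its canonical stays where it was.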
(A red recipe in the sweep is reported FAIL + does NOT promote — known-good safe; verified when a missing-util-linux backup flake red'd a run and the canonical stayed put, then fixed.)

---

### Gate: WC5 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 5bbc47c, gate 125453d)

Anti-poison gate predicate + live advancement 1.10.0→1.11.0 (cold-only) cold-verified. Builder may proceed to WC6. (claim detail retained below.)

### (claimed, now PASS) Gate: WC5 — CLAIMED detail

**WHAT.** Promote-on-green-cold: a GREEN full-cold run on LATEST (no PR head) of an enrolled (WARM_CANONICAL) recipe advances/seeds the canonical known-good; `--quick` never promotes; only cold advances.

**WHERE:** `runner/run_recipe_ci.py` (`should_promote_canonical` gate + `promote_canonical` + the post-green-cold hook in main()), `runner/harness/canonical.py` (seed_canonical).

**HOW + EXPECTED (cold):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **70 passed** (incl. test_promote: the gate fires only for enrolled+green+cold+latest; not on red / quick / PR-head / unenrolled).
2. **Live advancement (custom-html canonical):** set its registry version to an OLDER value (`canonical.write_registry("custom-html", version="1.10.0+1.28.0", …)`), then a full COLD run `RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py` (no REF = latest) → install/upgrade/backup/restore/custom all pass, deploy-count=1, then `WC5 promote-on-green-cold: (re)seed canonical custom-html @ 1.11.0+1.29.0` → afterwards `canonical.json` version **ADVANCED to 1.11.0+1.29.0** (commit=head 8a02606…), snapshot refreshed (`warmsnap.read_meta` version=1.11.0+1.29.0), canonical idle + volume retained, NO `cust-*` per-run service left (cold teardown sacred). Builder ran this live: **advanced 1.10.0→1.11.0**. (A PR `!testme` REF=PR-head does NOT promote; `--quick` never promotes — both gate-checked.)
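
The promotion predicate exercised by test_promote (fires only for enrolled + green + cold + latest) reduces to a four-way conjunction. The sketch below is a hypothetical restatement, not the real `should_promote_canonical`: the `RunResult` type, its field names, and the convention that `ref is None` means "latest" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunResult:
    enrolled: bool        # recipe sets WARM_CANONICAL
    green: bool           # every stage passed
    mode: str             # "cold" or "quick"
    ref: Optional[str]    # assumption: None means latest; a PR head is a concrete ref


def may_promote(run: RunResult) -> bool:
    """True only for an enrolled, green, cold run on latest."""
    return run.enrolled and run.green and run.mode == "cold" and run.ref is None


if __name__ == "__main__":
    print(may_promote(RunResult(True, True, "cold", None)))       # eligible
    print(may_promote(RunResult(True, True, "quick", None)))      # quick never promotes
    print(may_promote(RunResult(True, True, "cold", "87a62a5")))  # pinned ref, not latest
```

Because every condition must hold, any single failure (red, quick, PR head, unenrolled) blocks promotion, which is the anti-poison property the gate is meant to guarantee.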
---

### Gate: W0.10a traefik WC1.1 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w e3b08a9, gate e678d2e)

Migration + no-op converge + destructive rollback (lint-breaking tag → rollback to last-good, NO TLS outage — broken deploy rejected at lint before touching the running proxy) all cold-verified. **WC1.1 now FULLY closed (keycloak + traefik).** (claim detail retained below.)

### (claimed, now PASS) Gate: W0.10a traefik WC1.1 — CLAIMED detail

**WHAT.** traefik migrated onto the shared health-gated reconciler (WC1.1, stateless = version-rollback-only, NO snapshot): record last-good → deploy latest tag → health-gate (routed host ci.commoninternet.net = 200) → healthy commit / unhealthy roll back to last-good + alert. Closes the W0.10a tracked-open item from the W0 gate. traefik's wildcard-cert/file-provider config preserved.

**WHERE.** `runner/warm_reconcile.py` (SPECS["traefik"] stateful=False + `_traefik_setup` + health_domain; reconcile() per-app setup hook; the stateless path skips snapshot/restore — version rollback only), `nix/modules/proxy.nix` (deploy-proxy.service now execs `python3 …/warm_reconcile.py traefik`).

**HOW + EXPECTED (cold):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **65 passed** (incl. test_warm_reconcile traefik spec: stateful=False, callable setup, health_domain=ci.commoninternet.net; keycloak unchanged).
2. **No-op converge (delivered, proven live):** `systemctl is-active deploy-proxy.service` → active; `journalctl -u deploy-proxy.service` → `[traefik] already on latest 5.1.1+v3.6.15 and healthy — no-op`; traefik serving (ci.commoninternet.net=200) + keycloak-through-traefik=200 + system `running` (0 failed). The migration was zero-disruption (traefik was already at the latest tag; I pre-seeded TYPE+last_good to 5.1.1+v3.6.15 so the reconcile is a clean no-op).
3. **Destructive rollback (the Adversary's required cold proof):** stage a fake newer traefik tag with a broken config → `CCCI_SKIP_FETCH=1 cc-ci-run runner/warm_reconcile.py traefik` → broken deploy fails health → reconciler rolls back to last-good 5.1.1+v3.6.15 (version-only, no snapshot — traefik is stateless) → traefik healthy again + a `*-rollback.json` alert. NOTE: a destructive traefik test briefly drops TLS for ALL routes during the broken-deploy window until rollback — run it knowing that + with manual recovery ready (`abra app deploy traefik.ci.commoninternet.net 5.1.1+v3.6.15 -o -n -f`). The rollback logic is the SAME proven keycloak pattern, stateless variant (no snapshot).

Per operator guidance, I delivered the code + the safe no-op converge this iteration and left the destructive rollback as the Adversary's cold proof (a live destructive traefik test risks all TLS).

---

### Gate: WC4 + WC7 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 31f0e42, gate 3ff2bf6)

Cold-verified from the Adversary's own clone: 64 units; WC7 adversarial trigger battery (all negatives rejected, live bridge); WC4 never-promote (snapshot byte-identical, registry unchanged); WC4 FAIL→rollback restored EXACT known-good (marker back, 200, broken image gone, exit 1); no-canonical fallback to a cold per-run domain. Builder may proceed to W3. (claim detail retained below.)

### (claimed, now PASS) Gate: WC4 + WC7 — CLAIMED detail

**WHAT.** The `--quick` opt-in fast lane (W2): reattach the data-warm canonical → upgrade in place to the PR head → assert (generic upgrade reconverge+moved+serving + overlay + custom); PASS → undeploy-keep-volume with the **known-good UNCHANGED (never promote)**; FAIL → restore the last-known-good snapshot + undeploy (roll back, data safe). Opt-in via `!testme --quick`, mode-tagged lower-confidence, never gates merge; clean no-canonical fallback to COLD.
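
The PASS/FAIL branching of the `--quick` lane in the paragraph above can be sketched as follows. This is a hypothetical outline, not the real `run_quick`: the function name, the injected callables, and the use of an exception to signal a failed assertion stage are assumptions. Note that no branch calls any promotion step.

```python
from typing import Callable


def run_quick_sketch(
    reattach: Callable[[], None],
    upgrade_and_assert: Callable[[], None],
    undeploy_keep_volume: Callable[[], None],
    restore_known_good: Callable[[], None],
) -> int:
    """Return a process-style exit code: 0 on PASS, 1 on FAIL."""
    reattach()  # redeploy the data-warm canonical onto its retained volume
    try:
        upgrade_and_assert()  # upgrade in place to the PR head, then run checks
    except Exception:
        # FAIL: put the last-known-good data back, then go idle.
        restore_known_good()
        undeploy_keep_volume()
        return 1
    # PASS: go idle; the known-good snapshot and registry are left untouched.
    undeploy_keep_volume()
    return 0
```

Both branches end with the canonical undeployed and its volume retained, so the next run starts from the same idle state regardless of outcome.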
**WHERE (code).** `runner/run_recipe_ci.py` (`run_quick`, dispatched from `main()` on CCCI_QUICK=1 / MODE=quick; `_wait_undeployed`; no-canonical fallback), `runner/harness/canonical.py` (deploy_canonical resets TYPE; undeploy_keep_volume), `runner/harness/warmsnap.py` (restore), `bridge/bridge.py` (`parse_trigger` + CCCI_QUICK param), `.drone.yml` (quick echo). 64 unit pass.

**HOW + EXPECTED (cold, from your own clone on cc-ci):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **64 passed** (incl. test_bridge_trigger: `!testme`→cold, `!testme --quick`→quick, `!testmexyz`→reject).
2. **WC7 trigger (live in the running bridge):** `cid=$(docker ps -q -f name=ccci-bridge); docker exec $cid python3 -c 'import sys;sys.path.insert(0,"/app");import bridge; print(bridge.parse_trigger("!testme --quick"), bridge.parse_trigger("!testmexyz"))'` → `(True, True) (False, False)`. `trigger_build` adds `CCCI_QUICK=1` (auto-exposed to run_recipe_ci); a `!testme --quick` PR comment is labelled lower-confidence; plain `!testme` stays full cold.
3. **WC4 `--quick` flow (custom-html canonical, currently idle at 1.11.0+1.29.0):**
   - **PASS run:** `RECIPE=custom-html CCCI_QUICK=1 REF=87a62a5 cc-ci-run runner/run_recipe_ci.py` (REF=87a62a5 is the 1.10.0+1.28.0 commit — a different healthy head) → exit 0; SUMMARY shows `mode=quick`, `upgrade: pass`, `custom: pass`, "canonical undeployed, volume retained, known-good UNCHANGED"; afterwards `canonical.json` version STILL 1.11.0+1.29.0 (NOT promoted), canonical idle, content volume retained, known-good marker intact.
   - **FAIL run (rollback):** stage a broken custom-html commit (`image: nginx:99.99.99-doesnotexist`), `RECIPE=custom-html CCCI_QUICK=1 CCCI_SKIP_FETCH=1 REF= cc-ci-run runner/run_recipe_ci.py` → exit 1; SUMMARY shows "rolling back … restored known-good data; canonical idle (NOT promoted)"; afterwards known-good version UNCHANGED, canonical idle, data (marker) intact.
     Builder ran both live: **ALL PASS** (canonical left clean idle@1.11.0+1.29.0).
   - **no-canonical fallback:** MODE=quick for a recipe with no canonical → logs "falling back to COLD" and runs the full cold flow (so the PR is still tested; default `!testme` unaffected).

**Builder will NOT advance into W3 (cold-advances-canonical / nightly) past this gate** until REVIEW-2w shows PASS — but will do the tracked W0.10a (traefik) in a quiet window meanwhile.

---

### Gate: WC2 + WC3 — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 0246296, gate 4ce80f8)

Cold-verified from the Adversary's own clone (its own data-warm round-trip + restore round-trip). Builder may proceed to W2 (`--quick`). custom-html canonical left clean (idle, volume retained, known-good content, snapshot intact, v1.11.0+1.29.0). (claim detail retained below.)

### (claimed, now PASS) Gate: WC2 + WC3 — CLAIMED detail

**WHAT.** The data-warm canonical model (W1): a declarative per-recipe canonical at the stable domain `warm-.ci.commoninternet.net`, kept **data-warm** (undeployed-when-idle, data volume retained), tracked by a registry; **known-good snapshots** (raw per-volume tar while undeployed, one last-good per app, restore round-trips data).

**WHERE (code).** `runner/harness/canonical.py` (registry + data-warm lifecycle), `runner/harness/warmsnap.py` (snapshot/restore), enrollment `tests/custom-html/recipe_meta.py: WARM_CANONICAL=True`. State on cc-ci under `/var/lib/ci-warm//` (`canonical.json`, `snapshot/`, retained volume).

**HOW + EXPECTED (cold, from your own clone on cc-ci):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **61 passed** (incl. test_canonical, test_warmsnap).
2. **WC2/WC3 data-warm round-trip** (custom-html canonical exists idle now): reproduce with a driver that uses `runner/harness/canonical.py` — deploy `warm-custom-html.ci.commoninternet.net` @ `1.11.0+1.29.0`, write a marker file into `/usr/share/nginx/html/`, undeploy, `seed_canonical` (writes `/var/lib/ci-warm/custom-html/canonical.json` + a `snapshot/` while undeployed); confirm **app UNDEPLOYED but the `content` volume RETAINED** (`docker volume ls | grep warm-custom-html`); then `deploy_canonical('custom-html')` → the marker **survives** (data-warm reattach). Builder ran this live: **ALL PASS** (marker `WC2-DATA-MARKER-7f3a9c` survived; registry version=1.11.0+1.29.0; snapshot present). Current live state: `cat /var/lib/ci-warm/custom-html/canonical.json` → status=idle, version=1.11.0+1.29.0; `docker volume ls` shows `warm-custom-html_ci_commoninternet_net_content` retained with NO custom-html service running.
3. **WC3 restore round-trip** already cold-verified in the W0.9/W0.5 keycloak proof (snapshot → mutate DB → restore → data back); same `warmsnap` helper.
4. **D8/WC8:** `/var/lib/ci-warm/` is cache, NOT in the nix closure (no module references it as a source); re-seeded by cold runs, not restored on rebuild.

**Builder will NOT advance into W2 (`--quick`, which consumes the canonical) past this gate** until REVIEW-2w shows PASS — but will do non-disruptive W0.10 follow-ups (alert relay) meanwhile.

---

### Gate: WC1 + WC1.2 + WC1.1(keycloak) — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 31ac86d, gate 985686f)

All 6 checks cold-verified from the Adversary's own clone. Builder may proceed to W1. **Tracked open (must close before Phase-2w DONE, not a blocker now): traefik WC1.1 (W0.10)** — stateless version-rollback not yet on the shared health-gated reconciler; Adversary will require a cold proof.
(claim detail retained below for the record)

**WHAT.** The live-warm keycloak layer (W0): a persistent **unpinned** keycloak at the stable domain `warm-keycloak.ci.commoninternet.net`, declaratively reconciled, that SSO-dependent runs use via a **per-run namespaced realm** (created + deleted) instead of co-deploying; concurrent dependents get distinct realms; orphan realms are reaped (WC1). The reconciler health-gates auto-upgrades with snapshot-backed rollback (WC1.1) behind a pre-deploy safety gate for major/manual-migration bumps (WC1.2).

**WHERE (code).** `runner/warm_reconcile.py` (reconcile logic), `runner/harness/warm.py` (stable domain, per-run realm naming, reaping), `runner/harness/sso.py` (realm lifecycle), `runner/harness/warmsnap.py` (snapshot/restore), `runner/run_recipe_ci.py` (warm/cold dep split), `nix/modules/warm-keycloak.nix` (systemd reconcile unit). Warm state on cc-ci under `/var/lib/ci-warm/`.

**HOW + EXPECTED (cold, from your own clone on cc-ci — tar-sync runner+tests to your /root/):**

1. **Declarative + unpinned + healthy:** `grep -n kcVersion nix/modules/warm-keycloak.nix` → *no match* (pin removed; the unit runs `runner/warm_reconcile.py keycloak`). `ssh cc-ci 'systemctl is-active warm-keycloak.service'` → `active`; `systemctl is-system-running` → `running`. Health: `curl -sk --resolve warm-keycloak.ci.commoninternet.net:443:127.0.0.1 https://warm-keycloak.ci.commoninternet.net/realms/master -o /dev/null -w '%{http_code}'` → `200`. D8: a `nixos-rebuild build` closure hash is unaffected by which keycloak version is live (recipe fetched at runtime).
2. **Units:** `cc-ci-run -m pytest tests/unit -q` → **57 passed** (incl. test_warm_realm, test_warmsnap, test_warm_reconcile).
3. **WC1 headline e2e:** `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` → `install: pass`, `custom: pass`, **`deploy-count = 1 (expect 1)`** (keycloak NOT co-deployed), log shows `dep: using live-warm keycloak @ warm-keycloak...` and `dep: deleted per-run realm lasuite-docs- on warm keycloak`. The 3 custom SSO tests pass (test_health_check, test_oidc_login_via_keycloak, test_oidc_password_grant_against_dep_keycloak). After the run, warm keycloak realms = `['master']` only (no leftover); no `lasu*` docker stack.
4. **WC1 concurrency + reaping (deploy-free):** `realm_for("lasuite-docs","lasu-aaa111...")` = `lasuite-docs-aaa111` and `...bbb222` → distinct (two concurrent same-recipe runs never collide); create realms aaa111/bbb222/ccc333 on the warm kc, each `oidc_password_grant` returns a JWT; `sso.reap_orphaned_realms(D, live_hexes={"aaa111"})` deletes exactly bbb222+ccc333 and KEEPS aaa111. (Builder ran this live: PASS.)
5. **WC1.1 health-gated rollback (live):** with `CCCI_SKIP_FETCH=1` stage two **annotated** fake tags on `~/.abra/recipes/keycloak` — `10.7.9+26.6.2` at the good commit (`git tag -a -m x 10.7.9+26.6.2 10.7.1+26.6.2^{}`) and `10.7.10+26.6.2` at a commit whose compose.yml has a broken `KC_HOSTNAME=:::bad-host:::`. Create a marker realm, set last_good, then run `CCCI_SKIP_FETCH=1 cc-ci-run runner/warm_reconcile.py keycloak` twice → first `RECONCILE RESULT: upgraded:...->10.7.9` (snapshot taken, last_good=10.7.9, marker preserved); second `rolled-back:10.7.10->10.7.9` — keycloak HEALTHY on 10.7.9, **marker realm INTACT** (data preserved), `/var/lib/ci-warm/keycloak/last_good` still `10.7.9` (NOT advanced), a `*-rollback.json` alert under `/var/lib/ci-warm/alerts/` with `attempted=10.7.10 last_good=10.7.9 recovered=true`. (Builder ran this live: ALL PASS; keycloak restored to canonical 10.7.1+26.6.2.)
6. **WC1.2 pre-deploy safety gate (live):** stage an annotated fake tag with a MAJOR bump (`11.0.0+27.0.0`) → `CCCI_SKIP_FETCH=1 ... warm_reconcile.py keycloak` → `RECONCILE RESULT: held-major:...`, a `*-held-major.json` alert written, **keycloak untouched** (TYPE unchanged, 200, no snapshot/deploy churn). Stage a minor tag (`10.7.2+26.6.3`) with `releaseNotes/10.7.2+26.6.3.md` containing "manual migration" → `held-manual-migration`, alert carries the notes. (Builder ran both live: held + untouched.)

**SCOPE (honest).** WC1 and WC1.2 are complete. **WC1.1 is proven for keycloak** — the *stateful* case (snapshot-backed data-integrity rollback), which is the hard part and the Adversary's marquee proof. **traefik's WC1.1** (stateless = version-rollback-only) is **NOT yet migrated** onto the shared health-gated reconciler — it still uses the existing `proxy.nix` chaos-deploy reconciler. That migration is **W0.10** (tracked in BACKLOG-2w), to land before the Phase-2w DONE. If the Adversary wants WC1.1 fully closed (both reconcilers) before PASS, treat this gate as WC1 + WC1.2 + WC1.1(keycloak).

**Alert delivery note (not blocking):** the reconciler WRITES alert sentinels to `/var/lib/ci-warm/alerts/*.json` (proven above). The operator-facing relay (Builder loop scans → PushNotification → archive to `alerts/seen/`) is loop behavior, run each wake when an alert exists; none currently. "Alert fired" for WC1.1/WC1.2 = sentinel written, which is independently checkable.

**Builder will NOT advance past this gate** (to W1/WC2 canonical registry) until REVIEW-2w shows PASS.

## (prior) Gate

(none before this)

## Blocked

(none)

## Notes

- **Disk budget (WC8 watch):** cc-ci `/` was 91% (2.4G free) at phase start; freed orphaned Phase-2 cold apps (lasu-0a6fb2 12-svc, keyc-07d81e, lasu-dbg) → 86% (3.8G free). 9.7GB reclaimable in Docker images kept as warm pull-cache (authenticated pulls now, so re-pull is cheaper but slower).
- Stable-domain scheme (proposed, see DECISIONS): `warm-.ci.commoninternet.net`, distinct from cold `-<6hex>`.