diff --git a/machine-docs/JOURNAL-2w.md b/machine-docs/JOURNAL-2w.md
index 6a3ba9b..9c4c79e 100644
--- a/machine-docs/JOURNAL-2w.md
+++ b/machine-docs/JOURNAL-2w.md
@@ -213,3 +213,23 @@ Claiming the WC1/WC1.1/WC1.2 gate.
 Note: the reconciler WRITES alert sentinels to /var/lib/ci-warm/alerts/ (proven for rollback +
 holds). The Builder-loop RELAY (sentinel → PushNotification + archive to seen/) runs each wake when
 an alert is present; none currently. This delivery layer is loop behavior, not reconciler logic.
+
+## 2026-05-29 — Gate WC1+WC1.2+WC1.1(keycloak) ADVERSARY PASS; advancing to W1
+
+The Adversary cold-verified all 6 checks from its OWN clone (`cc-ci:/root/cc-ci-adv-verify`):
+check1 unpinned/healthy/wired, check2 57 units, check3 headline lasuite-docs SSO e2e (install+custom
+pass, deploy-count=1, per-run realm created+deleted, warm kc left `['master']`, cold teardown sacred),
+check4 concurrency+reaping, check5 WC1.1 marquee rollback (data intact, last_good held, alert), check6
+WC1.2 holds. **Gate verdict: PASS @2026-05-29** (REVIEW-2w 31ac86d) for exactly the claimed scope.
+The Adversary independently hit + correctly attributed the same test-script cleanup footgun to the
+test, not the reconciler. ONE tracked-open before DONE (no finding): traefik WC1.1 (W0.10) — its
+stateless version-rollback isn't yet on the shared reconciler.
+
+**Advancing to W1 (WC2 canonical registry + WC3 closure).** Design intent: a small declarative
+registry of canonical recipes → known-good commit, each at `warm-` kept DATA-warm (undeployed
+when idle, volume retained), re-warmable. warmsnap (W0.5) already provides one-last-good snapshot +
+restore. Need to decide: registry format/location (in-repo declarative) + the data-warm lifecycle
+(deploy→use→undeploy-keep-volume) + how a canonical is seeded/advanced (WC5 cold-only, later). W1
+builds the registry + data-warm reconcile; WC5/WC6 (promote-on-green-cold + nightly) come in W3.
+
+traefik W0.10 + alert-relay deferred to a quiet window before DONE (traefik is critical TLS infra).
diff --git a/machine-docs/STATUS-2w.md b/machine-docs/STATUS-2w.md
index 304c240..d1a9830 100644
--- a/machine-docs/STATUS-2w.md
+++ b/machine-docs/STATUS-2w.md
@@ -12,15 +12,14 @@ canonical and upgrades to PR head (rolling back on failure), cold-only canonical
 nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversary cold-verified.
 
 ## Definition of Done (Phase 2w) — WC1–WC9 (+WC1.1/WC1.2), each Adversary cold-verified in REVIEW-2w
-- [ ] **WC1** — Live-warm keycloak (SSO dep) at a stable domain, **UNPINNED** (fetch latest + chaos
-  deploy, like traefik; keep secret-generate-only-if-missing + health-wait); dependents
-  create+delete per-run namespaced realms; concurrent dependents don't collide; leftover realms reaped.
-- [ ] **WC1.1** — Health-gated deploy-with-rollback in warm/infra reconcilers (traefik+keycloak):
-  record last-good → deploy latest → health-check → healthy commits last-good:=latest; unhealthy
-  rolls back + alerts. Stateful (keycloak): snapshot data volume before upgrade, restore on
-  rollback (reuse WC3 helper). traefik = version rollback only.
-- [ ] **WC1.2** — Pre-deploy safety gate: auto-apply only non-major/no-manual-migration bumps; a
-  MAJOR bump or manual-migration release notes → stay on current + alert with notes (no silent auto-upgrade).
+- [x] **WC1** — Live-warm UNPINNED keycloak; per-run namespaced realms (create+delete); concurrent
+  distinct realms; orphan realms reaped. **Adversary PASS @2026-05-29** (REVIEW-2w, gate 985686f).
+- [~] **WC1.1** — Health-gated deploy-with-rollback. **keycloak (stateful) — Adversary PASS
+  @2026-05-29** (marquee: broken latest → snapshot→restore→prior, data intact, last_good held,
+  alert). **traefik (stateless, version-rollback-only) — NOT yet migrated = W0.10**, MUST close
+  before Phase-2w DONE (Adversary will require a cold proof).
+- [x] **WC1.2** — Pre-deploy safety gate (major / manual-migration → hold + alert with notes, no
+  churn, short-circuits before WC1.1). **Adversary PASS @2026-05-29**.
 - [ ] **WC2** — Data-warm canonical model: per-recipe canonical at a stable domain, declarative
   registry tracking recipe→known-good commit; re-warmable from scratch.
 - [ ] **WC3** — Known-good snapshots: raw volume copy taken while undeployed under stable path; one
@@ -37,7 +36,8 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
   confirm last-known-good restored intact; a `--quick` pass did not move the known-good).
 
 ## Milestones (plan §3)
-- **W0** — Warm keycloak (WC1). ← IN FLIGHT
+- **W0** — Warm keycloak (WC1/WC1.1-keycloak/WC1.2). ✅ Adversary PASS @2026-05-29.
+- **W1** — Canonical registry + snapshot/restore (WC2, WC3). ← IN FLIGHT
 - **W1** — Canonical registry + snapshot/restore (WC2, WC3).
 - **W2** — `--quick` mode (WC4, WC7).
 - **W3** — Cold-advances-canonical + nightly sweep (WC5, WC6).
@@ -82,13 +82,24 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
   `upgraded:` then `rolled-back:`, marker realm survives, `/var/lib/ci-warm/keycloak/last_good`
   unchanged at the prior version, a `*rollback*.json` alert under `/var/lib/ci-warm/alerts/`.
 
-**Next (remaining for WC1 gate):**
-1. **W0.7** — fix the lasuite-docs in-place chaos-redeploy nginx-upstream race (`host not found in
-   upstream ...backend:8000`) OR pick a more-robust SSO dependent for the headline proof.
-2. **W0.8** — headline WC1 e2e: dependent SSO custom test green vs warm keycloak + concurrent
-   distinct realms (no collision) + reaping. → claim WC1/WC1.1/WC1.2.
-3. **Builder-loop alert relay** (deferred wiring) — on each wake, scan `/var/lib/ci-warm/alerts/*.json`,
-   PushNotification + record + archive to `alerts/seen/`; wire when nightly WC6 lands (first real alert).
+**W0 COMPLETE — Adversary PASS @2026-05-29.** Now in **W1 (canonical registry, WC2/WC3)**.
+
+**W1 plan (WC2 data-warm canonical model + WC3 closure):**
+- WC2: a declarative **canonical registry** — which recipes are canonical + at which known-good
+  commit/version — with each canonical app at a **stable domain `warm-`**, kept **data-warm**
+  (undeployed-when-idle, data volume retained). Re-warmable from scratch (cache). Reconciler/registry
+  declared in-repo.
+- WC3: snapshots (warmsnap, W0.5 — done) tied to canonicals: one last-good per canonical under
+  `/var/lib/ci-warm//`, restore proven (done). Close WC3 with the canonical model.
+- Distinguish from W0's live-warm keycloak: canonicals are DATA-warm (undeployed when idle), keycloak
+  is LIVE-warm (always up). Both use the `warm-` stable scheme.
+
+**Tracked before Phase-2w DONE (not blocking W1):**
+- **W0.10a — traefik WC1.1** (Adversary requires a cold proof): migrate `proxy.nix` onto the shared
+  health-gated reconciler (stateless = version-rollback-only; preserve cert-secret/WILDCARDS_ENABLED/
+  COMPOSE_FILE setup). CAREFUL — traefik serves all TLS; deploy/test only in a quiet window.
+- **W0.10b — Builder-loop alert relay**: each wake, scan `/var/lib/ci-warm/alerts/*.json` →
+  PushNotification → archive to `alerts/seen/`.
 
 **Build finding (RESOLVED):** the W0.4 lasuite-docs `setup_custom_tests` redeploy failure (nginx web
 `host not found in upstream ...backend:8000`) was **transient resource contention** from the
@@ -97,7 +108,12 @@ headline e2e is green (below). No recipe/harness change needed.
 
 ## Gate
 
-### Gate: WC1 + WC1.1 + WC1.2 — CLAIMED, awaiting Adversary (@2026-05-29, HEAD = see `git log -1`)
+### Gate: WC1 + WC1.2 + WC1.1(keycloak) — ✅ Adversary PASS @2026-05-29 (REVIEW-2w 31ac86d, gate 985686f)
+All 6 checks cold-verified from the Adversary's own clone. Builder may proceed to W1. **Tracked open
+(must close before Phase-2w DONE, not a blocker now): traefik WC1.1 (W0.10)** — stateless
+version-rollback not yet on the shared health-gated reconciler; Adversary will require a cold proof.
+ +(claim detail retained below for the record) **WHAT.** The live-warm keycloak layer (W0): a persistent **unpinned** keycloak at the stable domain `warm-keycloak.ci.commoninternet.net`, declaratively reconciled, that SSO-dependent runs use via a