From 95ada595aabd9986e1c89393ea4a1b44bc954caf Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 29 May 2026 01:59:12 +0100
Subject: [PATCH] review(2w): WC1 checks 1/2/4 PASS + WC1.1 MARQUEE rollback PASS (data intact, last_good held, alert correct); test-script cleanup bug noted, recovery in flight

---
 machine-docs/REVIEW-2w.md | 43 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/machine-docs/REVIEW-2w.md b/machine-docs/REVIEW-2w.md
index ea64cd6..d88a2ad 100644
--- a/machine-docs/REVIEW-2w.md
+++ b/machine-docs/REVIEW-2w.md
@@ -121,3 +121,46 @@ NOT log a WC1/WC1.1/WC1.2 PASS until the gate is formally CLAIMED and I run the
realm survives, last_good unchanged at prior, a `*rollback*.json` alert; PLUS the WC1 headline (dependent SSO custom test green vs warm keycloak + concurrent distinct realms + reaping) + a major/manual-migration WC1.2 hold proof. Sent a BUILDER-INBOX heads-up to coordinate keycloak timing.
+
+## @2026-05-29 — Gate WC1+WC1.1+WC1.2 FORMALLY CLAIMED (985686f) — cold verification IN PROGRESS
Builder set the formal `## Gate` (my pre-claim note was rebased on top of it) and parked keycloak for
me; the inbox reply resolved my alerts-dir flag (the W0.9 test alert was intentionally `rm`'d to avoid a
false operator alarm). Running the full cold reproduction from my OWN clone, synced to
`cc-ci:/root/cc-ci-adv-verify`.

**check1 — unpinned + healthy + wired — PASS.** `grep kcVersion nix/modules/warm-keycloak.nix` matches
only a comment ("the kcVersion pin is gone"); there is no pin. The unit execs `warm_reconcile.py keycloak`,
which fetches at runtime, so the D8 closure is independent of the live version.
`warm-keycloak.service`=active, `is-system-running`=running, 0 failed units, health
`/realms/master`=**200**, TYPE=keycloak:10.7.1+26.6.2 (canonical).

**check2 — units — PASS.** From my synced clone: `cc-ci-run -m pytest tests/unit -q` → **57 passed**.
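With the `kcVersion` pin gone, the reconciler has to pick the newest release tag itself at runtime, and the tags staged in this review (`10.7.9`, `10.7.10`) are exactly the case where a plain string comparison picks the wrong one. A minimal sketch, assuming tags shaped like `<major>.<minor>.<patch>+<upstream>` as in `TYPE=keycloak:10.7.1+26.6.2`; `tag_key`, `latest_tag` and `parse_type` are hypothetical helpers, not functions from `warm_reconcile.py`:

```python
from __future__ import annotations

import re

# ASSUMED tag shape, inferred from values in this review such as "10.7.10+26.6.2":
# a numeric "<major>.<minor>.<patch>" recipe version, "+", then the upstream version.
_TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\+(\S+)$")


def tag_key(tag: str) -> tuple[int, int, int]:
    """Numeric sort key from the recipe part of a tag; the upstream part is ignored."""
    m = _TAG_RE.match(tag)
    if m is None:
        raise ValueError(f"not a recipe release tag: {tag!r}")
    return (int(m.group(1)), int(m.group(2)), int(m.group(3)))


def latest_tag(tags: list[str]) -> str:
    """Newest release by numeric recipe version, not by string order."""
    return max(tags, key=tag_key)


def parse_type(value: str) -> tuple[str, str]:
    """Split a TYPE value such as "keycloak:10.7.1+26.6.2" into (recipe, tag)."""
    recipe, sep, tag = value.partition(":")
    if not sep or not recipe:
        raise ValueError(f"not a <recipe>:<tag> value: {value!r}")
    tag_key(tag)  # raises ValueError if the tag part is malformed
    return recipe, tag


tags = ["10.7.1+26.6.2", "10.7.9+26.6.2", "10.7.10+26.6.2"]
print(max(tags))         # plain string max: 10.7.9+26.6.2 (wrong)
print(latest_tag(tags))  # numeric key: 10.7.10+26.6.2
```

Ignoring the part after `+` matches SemVer 2.0.0, where build metadata does not affect precedence; whether the real reconciler orders tags exactly this way is an assumption.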
+

**check4 — concurrency + reaping (deploy-free) — PASS.** My own driver against the live warm kc:
`realm_for` is distinct per run-hex (`lasuite-docs-aaa111` ≠ `...bbb222`). I created 3 realms; for each,
`oidc_password_grant` returned a valid 3-part JWT (len 1379) whose issuer matched discovery.
`reap_orphaned_realms(live={aaa111})` deleted exactly `bbb222`+`ccc333` and **KEPT `aaa111`**
(concurrency-safe: a live run never loses its realm). kc was left clean afterwards (`['master']`).

**check5 — WC1.1 MARQUEE health-gated rollback w/ data integrity — PASS (reconciler).** My own
reproduction (fake tags I staged; a marker realm stands in for the data):
- Phase B, healthy upgrade: `upgraded:10.7.1+26.6.2->10.7.9+26.6.2`, last_good advanced to 10.7.9,
  health=200, marker realm intact. ✓
- Phase C, broken latest: staged `10.7.10+26.6.2` at a commit with `KC_HOSTNAME=:::bad-host:::`. The
  reconciler (stateful path) undeployed → **snapshotted** → attempted to deploy 10.7.10 → **abra deploy
  FAILED** (lint R009: env value not a string) → caught the deploy exception → **rolled back**:
  undeploy → **restore snapshot** → redeploy 10.7.9 → **healthy (200)**. Result:
  `rolled-back:10.7.10+26.6.2->10.7.9+26.6.2`. Verified post-state: the **marker realm is INTACT** (data
  preserved through the snapshot/restore round-trip), `last_good` was **NOT advanced** (still 10.7.9),
  and a real persistent alert `20260529T005510Z-keycloak-rollback.json` was written with
  `attempted=10.7.10+26.6.2, last_good=10.7.9+26.6.2, recovered=True`. ✓✓✓ This is the phase's marquee
  proof, and it holds. (Nuance: my broken tag failed at abra LINT, so it exercised the
  deploy-FAILURE→rollback branch, which is exactly the path commit 07ea951 added; the unhealthy-deploy
  branch is covered by unit tests and the code. Because the failed deploy never mutated the volume, this
  run alone does not show a restore undoing a change; but the snapshot→restore round-trip DID execute
  and the marker survived, and W0.5's mutate→restore proof covers the undo case, so data integrity is
  sound.)
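The check4 reaping rule can be stated as a pure function: delete every realm carrying the per-run prefix whose run is not live, and never touch anything else. A sketch assuming the `lasuite-docs-<run-hex>` naming seen in check4; this `realm_for` is a hypothetical re-implementation of that naming, and `orphaned_realms` is a hypothetical planning helper, whereas the real `reap_orphaned_realms(live=...)` also performs the deletions against Keycloak:

```python
from __future__ import annotations

# ASSUMED prefix, taken from realm names such as "lasuite-docs-aaa111".
REALM_PREFIX = "lasuite-docs-"


def realm_for(run_hex: str) -> str:
    """One realm per CI run, keyed by the run's hex id, so concurrent runs never share."""
    return f"{REALM_PREFIX}{run_hex}"


def orphaned_realms(existing: set[str], live_runs: set[str]) -> set[str]:
    """Realms safe to delete: ours (by prefix) and not owned by any live run.

    Anything outside the prefix (for example 'master') is never a candidate.
    """
    keep = {realm_for(h) for h in live_runs}
    return {r for r in existing if r.startswith(REALM_PREFIX) and r not in keep}
```

Computing the delete set from the live runs, rather than from a run's own realm, is what makes reaping safe to trigger from any concurrent run.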
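The check5 sequence is a small state machine: snapshot before touching anything, treat a failed deploy and an unhealthy deploy the same way, and advance `last_good` only on a healthy result. A minimal sketch under those assumptions; `Deployer`, `upgrade_with_rollback` and `FakeDeployer` are hypothetical stand-ins for the real abra and volume operations in `warm_reconcile.py`, while the result strings and alert fields are copied from the log in check5:

```python
from __future__ import annotations

from typing import Callable, Protocol


class Deployer(Protocol):
    """Hypothetical interface standing in for the abra/volume operations."""
    def undeploy(self) -> None: ...
    def snapshot(self) -> object: ...
    def restore(self, snap: object) -> None: ...
    def deploy(self, version: str) -> None: ...  # raises if the deploy fails
    def healthy(self) -> bool: ...


def upgrade_with_rollback(d: Deployer, state: dict[str, str], target: str,
                          alert: Callable[[dict], None]) -> str:
    prev = state["last_good"]
    d.undeploy()
    snap = d.snapshot()  # stateful path: capture the data before touching it
    try:
        d.deploy(target)
        ok = d.healthy()  # an unhealthy deploy rolls back just like a failed one
    except Exception:
        ok = False
    if ok:
        state["last_good"] = target  # only a healthy upgrade advances last_good
        return f"upgraded:{prev}->{target}"
    d.undeploy()
    d.restore(snap)
    try:
        d.deploy(prev)
        recovered = d.healthy()
    except Exception:
        recovered = False
    alert({"attempted": target, "last_good": prev, "recovered": recovered})
    return f"rolled-back:{target}->{prev}"


class FakeDeployer:
    """In-memory stand-in: deploying any version in `broken` raises, like the lint failure."""

    def __init__(self, broken: set[str]) -> None:
        self.broken = broken
        self.volume = {"marker-realm"}
        self.running: str | None = None
        self.log: list[str] = []

    def undeploy(self) -> None:
        self.log.append("undeploy")
        self.running = None

    def snapshot(self) -> frozenset:
        self.log.append("snapshot")
        return frozenset(self.volume)

    def restore(self, snap: frozenset) -> None:
        self.log.append("restore")
        self.volume = set(snap)

    def deploy(self, version: str) -> None:
        self.log.append(f"deploy {version}")
        if version in self.broken:
            raise RuntimeError("deploy failed")
        self.running = version

    def healthy(self) -> bool:
        return self.running is not None
```

Passing a version listed in `broken` exercises the same deploy-failure branch the staged `10.7.10+26.6.2` tag hit; a fake whose `healthy()` returns `False` would exercise the unhealthy-deploy branch instead.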
+- **Test-script bug (MINE, not the reconciler):** my phase-D cleanup deleted the `10.7.9` tag while kc
  was still deployed on it, so abra couldn't resolve the from-version and kc was left undeployed (404) on
  TYPE=10.7.9 with the marker still present. **NOT a WC1.1 defect:** the reconciler behaved correctly
  given the broken state I induced. Recovery to canonical 10.7.1+26.6.2 (healthy, marker removed, fake
  tags dropped) is running now; I will confirm a clean state before finalizing the gate verdict.

**Remaining:** check3 (headline lasuite-docs SSO e2e) and check6 (WC1.2 holds). Both share the warm kc,
so they run after recovery confirms it is canonical and healthy. No gate PASS line written yet.
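The phase-D cleanup bug is an ordering problem: a fake tag may only be deleted once nothing is deployed on it. A minimal sketch of a plan builder and checker, assuming a plan is a list of `("deploy", version)` and `("delete-tag", tag)` steps; both helpers and the step names are hypothetical, not abra commands:

```python
from __future__ import annotations

# Hypothetical plan model: ("deploy", version) and ("delete-tag", tag) are
# illustrative step names, not abra subcommands.


def cleanup_plan(fake_tags: set[str], deployed: str, canonical: str) -> list[tuple[str, str]]:
    """Move the app back to the canonical version first, then drop the staged tags."""
    steps: list[tuple[str, str]] = []
    if deployed != canonical:
        # Redeploy while the currently deployed tag still exists, so the
        # from-version can still be resolved.
        steps.append(("deploy", canonical))
    steps += [("delete-tag", t) for t in sorted(fake_tags) if t != canonical]
    return steps


def check_plan(steps: list[tuple[str, str]], deployed: str) -> None:
    """Raise if any step deletes the tag that is deployed at that point in the plan."""
    current = deployed
    for action, arg in steps:
        if action == "deploy":
            current = arg
        elif action == "delete-tag" and arg == current:
            raise ValueError(f"would delete tag {arg} while it is deployed")
```

Under this model, the phase-D cleanup (deleting `10.7.9+26.6.2` while deployed on it) fails `check_plan`, while the plan from `cleanup_plan` redeploys canonical `10.7.1+26.6.2` before removing any staged tag.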