From 2dc1e6edc7d85a6f6ca9dfa7f4268c51bac7f242 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 29 May 2026 00:00:09 +0100
Subject: [PATCH] =?UTF-8?q?review(2w):=20absorb=20design=20update=20?=
 =?UTF-8?q?=E2=80=94=20WC1=20unpin=20+=20new=20WC1.1=20health-gated=20roll?=
 =?UTF-8?q?back=20proof=20+=20WC6=20reorder=20into=20verification=20map?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 machine-docs/REVIEW-2w.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/machine-docs/REVIEW-2w.md b/machine-docs/REVIEW-2w.md
index 333dea6..83baa51 100644
--- a/machine-docs/REVIEW-2w.md
+++ b/machine-docs/REVIEW-2w.md
@@ -33,3 +33,30 @@ deliberately fail a PR under `--quick` and confirm the canonical's last-known-go
 - COLD access re-verified: `cc-ci-tailscaled` active; `ssh cc-ci` → NixOS 24.11 (50ab793); wildcard `*.ci.commoninternet.net` → gateway 143.244.213.108. Verification path is live.
 - IDLE until the Builder claims a WC gate (watchdog will ping on claim). Standing veto power retained.
+
+## @2026-05-28 — Design update absorbed (orchestrator: unpin + health-gated rollback)
+SSOT updated (committed). Revised/added verification obligations I will hold the gate to:
+- **WC1 (revised)** — keycloak is now **UNPINNED** like traefik: reconciler `abra recipe fetch` + latest + chaos-deploy;
+  `kcVersion` pin DROPPED; MUST keep the *secret-generate-only-if-missing*
+  guard + the health-wait. Cold-check: no version pin in the nix module / reconciler; recipe fetched
+  at activation (runtime) so the nix closure stays byte-identical (D8 preserved — verify closure hash
+  unaffected by which keycloak version is live). Plus original WC1: dependent SSO custom tests pass
+  against warm keycloak; concurrent dependents use distinct namespaced realms (no collision); stale
+  realms reaped.
+- **WC1.1 (NEW)** — health-gated deploy-with-rollback built INTO the warm/infra reconcilers
+  (traefik + keycloak), NOT nix-generation rollback (the swarm app isn't in the generation). Pattern:
+  record running version = last-good → deploy latest → health-check → healthy: commit last-good:=latest;
+  unhealthy: roll back to last-good + `PushNotification` alert. Stateful (keycloak): undeploy →
+  raw-snapshot the data volume → deploy latest → health-check → on fail restore snapshot + redeploy prior
+  version (forward DB migrations make version-only rollback unsafe); reuse WC3 snapshot helper.
+  traefik (stateless) = version rollback only. **ADVERSARY PROOF (mandatory, I must run it):**
+  (a) force/simulate a BROKEN "latest" → confirm the warm app self-reverts to the prior healthy
+  version, keycloak's **pre-upgrade data intact**, and an alert fired; (b) a HEALTHY update commits
+  the new version as last-good. Watch for: silent failure (broken version stays deployed), data loss on
+  revert, no alert, or last-good not advancing on a healthy update.
+- **WC6 (reordered)** — nightly = `nixos-rebuild switch` FIRST (warm/infra → latest, health-gated per
+  WC1.1) THEN full-cold sweep; MUST NOT run while a test run is in flight; if the health-gate rolled
+  an infra app back, an alert fires and the sweep still runs against the healthy prior version.
+- **WC8 carry** — confirm the leftover phase-2 cold app `lasu-0a6fb2` (orchestrator flagged it) is
+  fully torn down (app+volumes+secrets gone), since cold-teardown-sacred + disk budget fall under WC8.
+- Still no gate CLAIMED; W0 in flight. Staying idle until a WC gate is claimed (watchdog will ping).
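The WC1.1 pattern is small enough to model as a state machine, which helps check that the adversary proof's pass conditions (a) and (b) are the right ones. The Python below is a minimal in-memory sketch, not the reconciler. `WarmApp`, `reconcile`, the dict standing in for the keycloak data volume, the list standing in for `PushNotification`, and the version strings are all hypothetical. The real reconciler would drive `abra` and a real health-wait; here the health check and the forward migration are injected callables.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WarmApp:
    """Hypothetical in-memory model of a warm/infra swarm app (not a real API)."""
    name: str
    stateful: bool                     # keycloak: True; traefik: False
    version: str                       # version currently deployed
    last_good: str                     # last version that passed the health check
    volume: Dict[str, str] = field(default_factory=dict)  # stand-in for the data volume
    alerts: List[str] = field(default_factory=list)       # stand-in for PushNotification


def reconcile(app: WarmApp, latest: str,
              is_healthy: Callable[[WarmApp], bool],
              migrate: Callable[[WarmApp], None]) -> bool:
    """Deploy `latest` behind a health gate; return True if it was committed."""
    app.last_good = app.version                 # 1. record running version as last-good
    snapshot = None
    if app.stateful:
        # 2. stateful only: (undeploy, then) raw-snapshot the data volume
        snapshot = copy.deepcopy(app.volume)
    app.version = latest                        # 3. deploy latest ...
    migrate(app)                                #    ... which may run forward DB migrations
    if is_healthy(app):                         # 4. health-check
        app.last_good = latest                  #    healthy: commit last-good := latest
        return True
    if snapshot is not None:                    # 5. unhealthy: restore the snapshot (stateful)
        app.volume = snapshot
    app.version = app.last_good                 #    and redeploy the prior version
    app.alerts.append(f"{app.name}: {latest} unhealthy, rolled back to {app.last_good}")
    return False


if __name__ == "__main__":
    def forward_migration(app: WarmApp) -> None:
        app.volume["schema"] = app.version      # simulate a schema bump on upgrade

    def healthy_unless_broken(app: WarmApp) -> bool:
        return not app.version.endswith("-broken")

    kc = WarmApp("keycloak", stateful=True, version="25.0", last_good="25.0",
                 volume={"schema": "25.0", "realm": "ci"})
    # (a) broken "latest": must self-revert, keep pre-upgrade data, and alert
    assert not reconcile(kc, "26.0-broken", healthy_unless_broken, forward_migration)
    assert kc.version == "25.0" and kc.volume == {"schema": "25.0", "realm": "ci"}
    assert len(kc.alerts) == 1
    # (b) healthy update: must commit the new version as last-good
    assert reconcile(kc, "26.1", healthy_unless_broken, forward_migration)
    assert kc.version == kc.last_good == "26.1"
    print("adversary proof (a) and (b) passed")
```

The snapshot is taken before `migrate` runs, which is the point of the stateful path: restoring only the version could leave a migrated schema that the prior keycloak may not be able to read. For traefik (`stateful=False`) the same function degrades to version-only rollback.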
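The WC6 ordering can be checked the same way. This sketch assumes a single shared lock that test runs hold while in flight; `nightly`, `RUN_LOCK`, and the three callables are hypothetical stand-ins for `nixos-rebuild switch`, the full-cold sweep, and the alert path. Skipping (rather than waiting) when a run is in flight is a choice the SSOT does not specify.

```python
import threading
from typing import Callable, List

# Hypothetical shared lock: the test runner would hold it for the duration of a
# run. threading.Lock only excludes within one process; a real host would need a
# cross-process lock (for example an flock'd file).
RUN_LOCK = threading.Lock()


def nightly(rebuild_switch: Callable[[], List[str]],
            cold_sweep: Callable[[], None],
            alert: Callable[[str], None],
            lock=RUN_LOCK) -> List[str]:
    """Run the WC6 nightly order; return the steps taken so the order can be checked."""
    steps: List[str] = []
    if not lock.acquire(blocking=False):       # a test run is in flight: do not start
        steps.append("skipped: test run in flight")
        return steps
    try:
        rolled_back = rebuild_switch()          # 1. switch FIRST (health-gated per WC1.1);
        steps.append("switch")                  #    returns the apps the gate reverted
        for app in rolled_back:
            alert(f"{app}: health gate rolled back; sweeping against prior version")
            steps.append(f"alert:{app}")
        cold_sweep()                            # 2. THEN the full-cold sweep, even after a rollback
        steps.append("sweep")
    finally:
        lock.release()
    return steps


if __name__ == "__main__":
    sent: List[str] = []
    print(nightly(lambda: ["keycloak"], lambda: None, sent.append))
    # prints ['switch', 'alert:keycloak', 'sweep']
```

Holding the lock for the whole nightly also keeps a new test run from starting mid-switch, which goes slightly beyond the SSOT wording.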