review(2w): absorb design update — WC1 unpin + new WC1.1 health-gated rollback proof + WC6 reorder into verification map
@ -33,3 +33,30 @@ deliberately fail a PR under `--quick` and confirm the canonical's last-known-go
- COLD access re-verified: `cc-ci-tailscaled` active; `ssh cc-ci` → NixOS 24.11 (50ab793);
wildcard `*.ci.commoninternet.net` → gateway 143.244.213.108. Verification path is live.
- IDLE until the Builder claims a WC gate (watchdog will ping on claim). Standing veto power retained.

## @2026-05-28 — Design update absorbed (orchestrator: unpin + health-gated rollback)
SSOT updated (committed). Revised/added verification obligations I will hold the gate to:
- **WC1 (revised)** — keycloak is now **UNPINNED** like traefik: reconciler `abra recipe fetch`
latest + chaos-deploy; `kcVersion` pin DROPPED; MUST keep the *secret-generate-only-if-missing*
guard + the health-wait. Cold-check: no version pin in the nix module / reconciler; recipe fetched
at activation (runtime) so the nix closure stays byte-identical (D8 preserved — verify closure hash
unaffected by which keycloak version is live). Plus original WC1: dependent SSO custom tests pass
against warm keycloak; concurrent dependents use distinct namespaced realms (no collision); stale
realms reaped.
- **WC1.1 (NEW)** — health-gated deploy-with-rollback built INTO the warm/infra reconcilers
(traefik + keycloak), NOT nix-generation rollback (the swarm app isn't in the generation). Pattern:
record running version = last-good → deploy latest → health-check → healthy: commit last-good:=latest;
unhealthy: roll back to last-good + `PushNotification` alert. Stateful (keycloak): undeploy → raw
snapshot data volume → deploy latest → health-check → on fail restore snapshot + redeploy prior
version (forward DB migrations make version-only rollback unsafe); reuse WC3 snapshot helper.
traefik (stateless) = version rollback only. **ADVERSARY PROOF (mandatory, I must run it):**
(a) force/simulate a BROKEN "latest" → confirm the warm app self-reverts to the prior healthy
version, keycloak's **pre-upgrade data intact**, and an alert fired; (b) a HEALTHY update commits
the new version as last-good. Watch for: silent failure (broken stays deployed), data loss on
revert, no alert, or last-good not advancing on a healthy update.
- **WC6 (reordered)** — nightly = `nixos-rebuild switch` FIRST (warm/infra → latest, health-gated per
WC1.1) THEN full-cold sweep; MUST NOT run while a test run is in flight; if the health-gate rolled
an infra app back, alert fires and the sweep still runs against the healthy prior version.
- **WC8 carry** — confirm the leftover phase-2 cold app `lasu-0a6fb2` (orchestrator flagged it) is
fully torn down (app+volumes+secrets gone), since cold-teardown-sacred + disk budget are WC8.
- Still no gate CLAIMED; W0 in flight. Continue idle until a WC gate is claimed (watchdog pings).
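
Working model of the WC1.1 pattern, so the adversary proof has something concrete to be checked against. This is only a sketch under assumptions, not the Builder's reconciler: `WarmApp`, `reconcile`, `fake_deploy` and `fake_healthy` are hypothetical names, the `data` string stands in for the keycloak data volume and its raw snapshot, and the `alerts` list stands in for the `PushNotification` alert.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class WarmApp:
    """Hypothetical model of a warm/infra app; not the real reconciler state."""
    name: str
    running: str                       # version currently deployed
    stateful: bool = False             # keycloak: True, traefik: False
    data: str = "pre-upgrade"          # stand-in for the data volume contents
    last_good: Optional[str] = None
    alerts: List[str] = field(default_factory=list)  # stand-in for PushNotification


def reconcile(
    app: WarmApp,
    latest: str,
    deploy: Callable[[WarmApp, str], None],
    healthy: Callable[[WarmApp], bool],
) -> bool:
    """Deploy `latest`; return True if committed, False if rolled back."""
    app.last_good = app.running                     # record running version = last-good
    snapshot = app.data if app.stateful else None   # stands in for undeploy + raw snapshot
    deploy(app, latest)
    if healthy(app):
        app.last_good = latest                      # healthy: commit last-good := latest
        return True
    if app.stateful:
        app.data = snapshot                         # restore snapshot before redeploying
    deploy(app, app.last_good)                      # roll back to the prior version
    app.alerts.append(f"{app.name}: {latest} unhealthy, rolled back to {app.last_good}")
    return False


def fake_deploy(app: WarmApp, version: str) -> None:
    """Simulated deploy: a broken release also runs a forward migration."""
    app.running = version
    if app.stateful and version == "broken":
        app.data = "migrated-forward"


def fake_healthy(app: WarmApp) -> bool:
    """Simulated health-check: only the 'broken' release is unhealthy."""
    return app.running != "broken"


# Proof (a): a BROKEN "latest" on the stateful app must self-revert with data intact.
kc = WarmApp("keycloak", running="v1", stateful=True)
kc_committed = reconcile(kc, "broken", fake_deploy, fake_healthy)

# Proof (b): a HEALTHY update on the stateless app must advance last-good.
tr = WarmApp("traefik", running="v1")
tr_committed = reconcile(tr, "v2", fake_deploy, fake_healthy)
```

The four failure modes on the watch list map onto these fields: broken stays deployed (`running`), data loss on revert (`data`), no alert (`alerts`), last-good not advancing (`last_good`). The real proof still has to run against the live warm apps; this only pins down what "pass" means.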