review(2w): absorb design update — WC1 unpin + new WC1.1 health-gated rollback proof + WC6 reorder into verification map
@@ -33,3 +33,30 @@ deliberately fail a PR under `--quick` and confirm the canonical's last-known-go
- COLD access re-verified: `cc-ci-tailscaled` active; `ssh cc-ci` → NixOS 24.11 (50ab793);
wildcard `*.ci.commoninternet.net` → gateway 143.244.213.108. Verification path is live.
- IDLE until the Builder claims a WC gate (watchdog will ping on claim). Standing veto power retained.
## @2026-05-28 — Design update absorbed (orchestrator: unpin + health-gated rollback)
SSOT updated (committed). Revised/added verification obligations I will hold the gate to:
- **WC1 (revised):** keycloak is now **UNPINNED** like traefik: the reconciler does an
`abra recipe fetch` of latest, then a chaos-deploy; the `kcVersion` pin is DROPPED. It MUST keep the
*secret-generate-only-if-missing* guard and the health-wait. Cold-check: no version pin in the nix
module or the reconciler; the recipe is fetched at activation (runtime), so the nix closure stays
byte-identical (D8 preserved; verify the closure hash is unaffected by which keycloak version is
live). Original WC1 still applies: dependent SSO custom tests pass against warm keycloak; concurrent
dependents use distinct namespaced realms (no collision); stale realms are reaped.
- **WC1.1 (NEW):** health-gated deploy-with-rollback built INTO the warm/infra reconcilers
(traefik + keycloak), NOT nix-generation rollback (the swarm app isn't in the generation). Pattern:
record the running version as last-good → deploy latest → health-check → if healthy, commit
last-good := latest; if unhealthy, roll back to last-good and send a `PushNotification` alert.
Stateful (keycloak): undeploy → raw snapshot of the data volume → deploy latest → health-check → on
failure, restore the snapshot and redeploy the prior version (forward DB migrations make
version-only rollback unsafe); reuse the WC3 snapshot helper. Stateless (traefik): version rollback
only. **ADVERSARY PROOF (mandatory, I must run it):** (a) force or simulate a BROKEN "latest" and
confirm the warm app self-reverts to the prior healthy version, keycloak's **pre-upgrade data is
intact**, and an alert fired; (b) confirm a HEALTHY update commits the new version as last-good.
Watch for: silent failure (broken version stays deployed), data loss on revert, no alert, or
last-good not advancing on a healthy update.
- **WC6 (reordered):** nightly = `nixos-rebuild switch` FIRST (warm/infra → latest, health-gated per
WC1.1), THEN the full-cold sweep. It MUST NOT run while a test run is in flight. If the health-gate
rolled an infra app back, the alert fires and the sweep still runs against the healthy prior version.
- **WC8 carry:** confirm the leftover phase-2 cold app `lasu-0a6fb2` (flagged by the orchestrator) is
fully torn down (app, volumes, and secrets gone), since cold-teardown-sacred and the disk budget
both fall under WC8.
- Still no gate CLAIMED; W0 in flight. Continue idle until a WC gate is claimed (watchdog pings).
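
For reference while checking WC1.1, the health-gated pattern can be modelled roughly as below. This is an illustrative Python sketch only, not the reconciler's actual code: the `Reconciler` class and its `deploy`, `undeploy`, `is_healthy`, `snapshot`, `restore`, and `alert` callables are all hypothetical stand-ins (for example, `alert` stands in for the `PushNotification` alert and `snapshot`/`restore` for the WC3 snapshot helper).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Reconciler:
    """Hypothetical model of one warm/infra app; every name here is illustrative."""
    deploy: Callable[[str], None]      # deploy the given version
    is_healthy: Callable[[], bool]     # health-check after a deploy
    alert: Callable[[str], None]       # stand-in for the PushNotification alert
    last_good: str                     # running version, recorded as last-good
    # Stateful apps only (keycloak in the log); leave as None for stateless apps.
    undeploy: Optional[Callable[[], None]] = None
    snapshot: Optional[Callable[[], object]] = None
    restore: Optional[Callable[[object], None]] = None

    def update(self, latest: str) -> bool:
        """Try `latest`. Return True if committed, False if rolled back."""
        stateful = self.snapshot is not None and self.restore is not None
        snap = None
        if stateful:
            # Stateful path: undeploy, then take a raw snapshot of the data volume.
            if self.undeploy is not None:
                self.undeploy()
            snap = self.snapshot()

        self.deploy(latest)
        if self.is_healthy():
            self.last_good = latest    # healthy: commit last-good := latest
            return True

        # Unhealthy: restore data first (stateful only), then redeploy last-good.
        if stateful:
            self.restore(snap)
        self.deploy(self.last_good)
        self.alert(f"{latest} unhealthy; rolled back to {self.last_good}")
        return False
```

The two adversary checks map onto the return value: a broken "latest" should return `False` with `last_good` unchanged, data restored, and exactly one alert; a healthy update should return `True` with `last_good` advanced and no alert.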