backlog+decisions(2w): re-sequence W0 (WC3 helper first); unpin/snapshot/alert decisions
@@ -585,3 +585,33 @@ from D8. The keycloak's realm data is ephemeral per-run, so nothing persistent t
from-scratch host before the reconciler has run, or the warm app is down), the keycloak dep path
falls back to the existing cold co-deploy so dependent runs still work. The warm path is preferred
when available.

## Phase 2w — design update: unpinned warm/infra + health-gated rollback (2026-05-28/29)
**Warm/infra apps (traefik + keycloak) auto-update to LATEST nightly, health-gated (operator).**
Supersedes the W0.3 pinned `kcVersion`. Keycloak is now unpinned like traefik: the reconciler runs
`abra recipe fetch` for the latest recipe and chaos-deploys it, keeping the existing
secret-generate-only-if-missing and health-wait steps. D8 holds because the recipe is fetched at
*activation* (runtime), so the nix store closure is byte-identical regardless of which keycloak
version is live.

**Snapshot helper (WC3): format + path.** `runner/harness/warmsnap.py`. A snapshot is a **raw tar
of each docker volume belonging to the app's stack**, taken **while the app is undeployed** (nothing
is writing, so the tar is consistent). Stored under `/var/lib/ci-warm/<recipe>/` as
`<recipe>.snapshot.tar` plus a `<recipe>.meta.json` (commit/version/timestamp/volume list). **One
last-good per app**, replaced **atomically** (write to `.tmp`, then `rename`). Restore: for each
volume, clear `_data` and untar back. Docker volumes are stack-scoped (`<stack>_<vol>`); the helper
enumerates them via `docker volume ls` filtered to the stack. Reused by WC1.1 (pre-upgrade snapshot
of keycloak) and WC5 (promote-on-green-cold). Warm snapshots are **cache, excluded from the D8
closure** (WC8).

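The snapshot format above could be sketched roughly as follows. This is a minimal illustration, not the actual `warmsnap.py`: the function names (`list_stack_volumes`, `take_snapshot`, `restore_snapshot`), the single-tar layout with one top-level directory per volume, and the meta field names are all assumptions, and the caller is assumed to pass in each volume's `_data` directory.

```python
import json
import os
import shutil
import subprocess
import tarfile
import time
from pathlib import Path


def list_stack_volumes(stack: str) -> list[str]:
    """Enumerate docker volumes named <stack>_<vol> (needs a docker CLI)."""
    out = subprocess.run(
        ["docker", "volume", "ls", "--filter", f"name={stack}_",
         "--format", "{{.Name}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    # The name filter is a substring match, so keep only true prefix matches.
    return sorted(n for n in out.splitlines() if n.startswith(f"{stack}_"))


def _atomic_write(path: Path, write) -> None:
    """Write to <path>.tmp, then rename over the last-good file."""
    tmp = path.with_name(path.name + ".tmp")
    write(tmp)
    os.replace(tmp, path)  # atomic when tmp and path share a filesystem


def take_snapshot(recipe: str, volume_dirs: dict[str, Path], root: Path,
                  version: str, commit: str) -> Path:
    """Tar every volume's _data dir; call only while the app is undeployed."""
    dest = root / recipe
    dest.mkdir(parents=True, exist_ok=True)
    tar_path = dest / f"{recipe}.snapshot.tar"

    def write_tar(tmp: Path) -> None:
        with tarfile.open(tmp, "w") as tar:
            for name, data_dir in sorted(volume_dirs.items()):
                tar.add(data_dir, arcname=name)  # assumed layout: <vol>/...

    meta = {  # hypothetical field names for commit/version/timestamp/volumes
        "commit": commit,
        "version": version,
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "volumes": sorted(volume_dirs),
    }
    _atomic_write(tar_path, write_tar)
    _atomic_write(dest / f"{recipe}.meta.json",
                  lambda tmp: tmp.write_text(json.dumps(meta, indent=2)))
    return tar_path


def restore_snapshot(recipe: str, volume_dirs: dict[str, Path], root: Path) -> None:
    """Clear each volume's _data, then untar the last-good snapshot back."""
    for data_dir in volume_dirs.values():
        for child in data_dir.iterdir():
            if child.is_dir() and not child.is_symlink():
                shutil.rmtree(child)
            else:
                child.unlink()
    with tarfile.open(root / recipe / f"{recipe}.snapshot.tar") as tar:
        for member in tar.getmembers():
            vol, _, rel = member.name.partition("/")
            if vol not in volume_dirs or not rel:
                continue  # skip the per-volume root entry and unknown volumes
            member.name = rel  # strip the <vol>/ prefix before extracting
            tar.extract(member, volume_dirs[vol])
```

Writing the tar before the meta file means a crash between the two renames leaves a new tar with stale metadata; a real helper might bundle both into one directory swap instead.
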
**Alert mechanism: sentinel files relayed by the Builder loop.** The warm/infra reconciler is an
autonomous bash systemd unit on cc-ci; it cannot call the agent's `PushNotification` tool. So a
reconciler that rolls back (WC1.1) or holds a major/manual-migration upgrade (WC1.2) writes a JSON
**alert sentinel** to `/var/lib/ci-warm/alerts/<ts>-<app>-<reason>.json` (fields: `app`, `reason`
[`rollback`|`held-major`|`held-manual-migration`], `from_version`, `to_version`, `release_notes`,
`ts`). On each wake, the Builder loop scans that dir; for each new alert it (a) issues
`PushNotification` to the operator, (b) records it in STATUS-2w/JOURNAL-2w, (c) archives it to
`alerts/seen/`. This bridges the autonomous reconciler to operator visibility (latency = next
Builder wake; acceptable for an alert).

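A rough sketch of the sentinel hand-off described above, for illustration only. The real writer is the bash reconciler, so `write_alert` here only demonstrates the file layout; `relay_alerts`, the `notify`/`record` callbacks (stand-ins for `PushNotification` and the STATUS-2w/JOURNAL-2w update), and the timestamp format are assumptions.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable

REASONS = {"rollback", "held-major", "held-manual-migration"}


def write_alert(alerts_dir: Path, app: str, reason: str, from_version: str,
                to_version: str, release_notes: str = "") -> Path:
    """Write one sentinel as <ts>-<app>-<reason>.json (ts format is assumed)."""
    if reason not in REASONS:
        raise ValueError(f"unknown alert reason: {reason}")
    ts = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    alerts_dir.mkdir(parents=True, exist_ok=True)
    path = alerts_dir / f"{ts}-{app}-{reason}.json"
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({
        "app": app, "reason": reason, "from_version": from_version,
        "to_version": to_version, "release_notes": release_notes, "ts": ts,
    }))
    os.replace(tmp, path)  # the scanner never sees a half-written sentinel
    return path


def relay_alerts(alerts_dir: Path, notify: Callable[[dict], None],
                 record: Callable[[dict], None]) -> int:
    """One Builder wake: notify, record, then archive each new sentinel."""
    seen = alerts_dir / "seen"
    seen.mkdir(parents=True, exist_ok=True)
    relayed = 0
    # A non-recursive glob skips seen/ and any in-flight *.json.tmp files.
    for path in sorted(alerts_dir.glob("*.json")):
        alert = json.loads(path.read_text())
        notify(alert)   # (a) operator notification
        record(alert)   # (b) status/journal entry
        os.replace(path, seen / path.name)  # (c) archive
        relayed += 1
    return relayed
```

Archiving last means a crash mid-relay re-sends the alert on the next wake rather than dropping it, which seems the safer failure mode for an operator alert.
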
**Re-sequence:** WC1.1's keycloak rollback needs the WC3 snapshot helper, so build that FIRST, then
rewrite the reconciler ONCE into the unpinned + WC1.2-safety-gated + WC1.1-health-gated-rollback form
(avoids reworking the reconciler twice). The W0.3 reconciler is INTERIM until then.
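The dependency behind this ordering can be seen in a control-flow sketch of the target reconciler form. This is an assumption-heavy illustration in Python (the real reconciler is a bash unit): the `Steps` hooks are hypothetical, and the exact step order is one plausible reading of WC1.1/WC1.2, not a confirmed design.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Steps:
    """Hypothetical hooks; each would wrap an abra, docker or warmsnap call."""
    hold_reason: Callable[[], Optional[str]]  # WC1.2 gate: reason string or None
    undeploy: Callable[[], None]
    snapshot: Callable[[], None]              # WC3 helper, app must be undeployed
    deploy_latest: Callable[[], None]
    healthy: Callable[[], bool]               # health-wait result
    restore: Callable[[], None]               # WC3 helper
    deploy_previous: Callable[[], None]
    alert: Callable[[str], None]              # writes an alert sentinel


def reconcile(s: Steps) -> str:
    """Return 'updated', 'rollback', or the hold reason."""
    reason = s.hold_reason()
    if reason is not None:
        # WC1.2: major/manual-migration upgrades are held, not applied.
        s.alert(reason)
        return reason
    s.undeploy()
    s.snapshot()        # pre-upgrade last-good; this is why WC3 comes first
    s.deploy_latest()
    if s.healthy():
        return "updated"
    # WC1.1: health gate failed, so put the previous data and version back.
    s.undeploy()
    s.restore()
    s.deploy_previous()
    s.alert("rollback")
    return "rollback"
```
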