decisions+status(2w): W0.5 done (WC3 snapshot proven); W0.6 reconciler version model (deploy-by-tag, recipe-semver pre-+, python entrypoint in store)
@@ -53,19 +53,25 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
warm-keycloak.service active, system running (0 failed), /realms/master=200. (INTERIM: pinned +
skip-if-healthy; to be replaced by the unpinned + health-gated WC1.1 form.)

**Re-sequenced after the 2026-05-28/29 design update (unpin + WC1.1 rollback + WC1.2 safety gate):**
WC1.1's keycloak rollback needs the **WC3 snapshot/restore helper**, so build that FIRST, then
rewrite the reconciler ONCE into the unpinned + safety-gated + health-gated-with-rollback form. Next:
1. **WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`): raw copy of an app's data
volume(s) while undeployed, under `/var/lib/ci-warm/<recipe>/`, atomic replace, one last-good;
restore round-trips data. + unit tests + live round-trip proof.
2. Rewrite reconciler: unpin keycloak (fetch latest + chaos); WC1.2 safety gate (major / manual-
migration → hold + alert); WC1.1 record last-good → (keycloak: undeploy→snapshot→deploy latest) →
health-gate → commit-or-rollback+restore+alert.
3. Settle the **alert mechanism** (bash reconciler can't call agent PushNotification — sentinel file
the Builder loop relays, see DECISIONS).
4. Resolve the lasuite-docs in-place-redeploy race (BUILD finding below) OR pick a more-robust
dependent, then the headline WC1 e2e (dependent SSO green vs warm keycloak) + concurrency proof.
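
Steps 2 and 3 above (health-gated commit-or-rollback, with a sentinel-file alert) could be sketched roughly as follows. This is an illustration only, not the reconciler's actual code: `reconcile`, `write_alert`, the callable parameters, and the JSON alert layout are all hypothetical names chosen for the sketch. The real undeploy/deploy/snapshot steps would go through abra and the snapshot helper.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable


def write_alert(alert_dir: Path, recipe: str, message: str) -> Path:
    """Drop a sentinel file that a separate relay loop can pick up later."""
    alert_dir.mkdir(parents=True, exist_ok=True)
    final = alert_dir / f"{recipe}-{time.time_ns()}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps({"recipe": recipe, "message": message}))
    # Rename into place so the relay never reads a half-written alert.
    os.replace(tmp, final)
    return final


def reconcile(recipe: str, last_good: str, latest: str, *,
              undeploy: Callable[[], None],
              snapshot: Callable[[], None],
              restore: Callable[[], None],
              deploy: Callable[[str], None],
              healthy: Callable[[], bool],
              alert_dir: Path) -> str:
    """Try `latest`; return whichever version ends up as last-good."""
    undeploy()
    snapshot()            # data is copied while the app is down
    deploy(latest)
    if healthy():
        return latest     # commit: latest becomes the new last-good
    # Health gate failed: put the data back and redeploy the prior version.
    undeploy()
    restore()
    deploy(last_good)
    write_alert(alert_dir, recipe,
                f"{latest} failed health gate; rolled back to {last_good}")
    return last_good
```

Passing the side-effecting steps in as callables keeps the ordering testable without a live host. A relay reading the alert directory would need to ignore the `.tmp` files.
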
- **W0.5 WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`) DONE (4cc1e15). +5 unit tests
(48 unit pass). **LIVE round-trip PROVEN on warm keycloak**: marker realm → undeploy → snapshot
(mariadb+providers) → deploy → delete marker (mutate DB) → undeploy → restore → deploy → marker
realm BACK; keycloak healthy. Snapshots under `/var/lib/ci-warm/<recipe>/`, atomic, one last-good.
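
For readers unfamiliar with the "atomic, one last-good" pattern, here is a minimal sketch of one way to get it. It is not the contents of `warmsnap.py`: the function names, the `data` subdirectory, and the symlink-swap technique are assumptions made for illustration. The idea is to copy into a fresh directory, then atomically repoint a `last-good` symlink and delete the previous copy.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot_volume(src: Path, snap_root: Path, recipe: str) -> Path:
    """Copy `src` under snap_root/<recipe>/ and make it the only last-good."""
    recipe_dir = snap_root / recipe
    recipe_dir.mkdir(parents=True, exist_ok=True)
    # Copy into a fresh directory; a crash here leaves last-good untouched.
    new_dir = Path(tempfile.mkdtemp(prefix="snap-", dir=recipe_dir))
    shutil.copytree(src, new_dir / "data", symlinks=True)

    link = recipe_dir / "last-good"
    previous = link.resolve() if link.is_symlink() else None

    # Build the new symlink under a temp name, then rename it over the old
    # one: rename is atomic on POSIX, so readers see old or new, never neither.
    tmp_link = recipe_dir / ".last-good.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(new_dir.name, tmp_link)
    os.replace(tmp_link, link)

    # Keep exactly one snapshot: drop the one the link used to point at.
    if previous is not None and previous != new_dir.resolve():
        shutil.rmtree(previous, ignore_errors=True)
    return link


def restore_volume(snap_root: Path, recipe: str, dst: Path) -> None:
    """Replace `dst` with the last-good copy (the app must be undeployed)."""
    data = snap_root / recipe / "last-good" / "data"
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(data, dst, symlinks=True)
```

A raw copy like this is only consistent if nothing is writing to the volume, which is why the round-trip above undeploys before both snapshot and restore.
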

**Next (W0.6 reconciler rewrite — split):**
1. **W0.6a** — Python reconcile entrypoint `runner/warm_reconcile.py`, packaged into the nix store
(systemd unit invokes the store copy of runner/ — D8-clean, reuses warmsnap/sso/abra; replaces the
bash reconciler). UNPIN keycloak (fetch latest + chaos deploy; drop kcVersion); keep secret-guard
+ health-wait.
2. **W0.6b** — WC1.2 pre-deploy safety gate: major recipe-semver bump OR releaseNotes manual-migration
marker → hold-on-current + alert-with-notes (no deploy churn).
3. **W0.6c** — WC1.1 health-gated rollback: record last-good → (keycloak: undeploy→snapshot→deploy
latest) → health-gate → commit-or-(restore+redeploy-prior+alert). Same for traefik (version
rollback only). Alert = sentinel file in `/var/lib/ci-warm/alerts/` relayed by the Builder loop.
4. **W0.7** — resolve the lasuite-docs in-place-redeploy race (finding below) OR pick a more-robust
dependent; then **W0.8** headline WC1 e2e (dependent SSO green vs warm keycloak) + concurrency.
5. **W0.9** — WC1.1/WC1.2 Adversary-facing proofs (broken latest → self-revert + data intact + alert;
healthy → commit last-good; major/manual-migration → hold + alert).
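
The W0.6b gate in the list above could look something like the sketch below. It assumes recipe tags of the form `<recipe-semver>+<upstream-version>`, so only the part before the `+` is compared, as the commit title suggests. The marker string, `GateDecision`, and the function names are hypothetical; the real marker convention in release notes is not specified here.

```python
from dataclasses import dataclass

# Hypothetical marker text; the actual convention may differ.
MIGRATION_MARKER = "manual migration"


@dataclass(frozen=True)
class GateDecision:
    deploy: bool   # False means hold on the current version and alert
    reason: str


def recipe_major(tag: str) -> int:
    """Major component of the recipe semver, i.e. the part before '+'."""
    recipe_part = tag.split("+", 1)[0]
    return int(recipe_part.lstrip("v").split(".", 1)[0])


def safety_gate(current_tag: str, latest_tag: str, release_notes: str) -> GateDecision:
    """Decide before any deploy whether `latest_tag` is safe to roll out unattended."""
    if recipe_major(latest_tag) > recipe_major(current_tag):
        return GateDecision(False, f"major recipe bump {current_tag} -> {latest_tag}")
    if MIGRATION_MARKER in release_notes.lower():
        return GateDecision(False, "release notes flag a manual migration")
    return GateDecision(True, "no major bump and no migration marker")
```

Because the gate runs before any undeploy, a hold costs nothing in downtime: the warm app keeps serving the current version and the `reason` string can be carried in the alert.
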

**Build finding (mine, to fix):** lasuite-docs `setup_custom_tests` in-place `abra app deploy
--force --chaos` (OIDC wiring) fails: nginx `web` fatally exits `[emerg] host not found in upstream