From 4808d0354a110df9ef7020147492f1816ef643af Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Fri, 29 May 2026 00:43:10 +0100
Subject: [PATCH] status(2w): W0.6 reconciler delivered + WC1.2 holds proven; next W0.9 WC1.1 live proofs

---
 machine-docs/STATUS-2w.md | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/machine-docs/STATUS-2w.md b/machine-docs/STATUS-2w.md
index 2e1f065..c3b70eb 100644
--- a/machine-docs/STATUS-2w.md
+++ b/machine-docs/STATUS-2w.md
@@ -58,20 +58,28 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
   (mariadb+providers) → deploy → delete marker (mutate DB) → undeploy → restore → deploy → marker realm
   BACK; keycloak healthy. Snapshots under `/var/lib/ci-warm//`, atomic, one last-good.
 
-**Next (W0.6 reconciler rewrite — split):**
-1. **W0.6a** — Python reconcile entrypoint `runner/warm_reconcile.py`, packaged into the nix store
-   (systemd unit invokes the store copy of runner/ — D8-clean, reuses warmsnap/sso/abra; replaces the
-   bash reconciler). UNPIN keycloak (fetch latest + chaos deploy; drop kcVersion); keep secret-guard
-   + health-wait.
-2. **W0.6b** — WC1.2 pre-deploy safety gate: major recipe-semver bump OR releaseNotes manual-migration
-   marker → hold-on-current + alert-with-notes (no deploy churn).
-3. **W0.6c** — WC1.1 health-gated rollback: record last-good → (keycloak: undeploy→snapshot→deploy
-   latest) → health-gate → commit-or-(restore+redeploy-prior+alert). Same for traefik (version
-   rollback only). Alert = sentinel file in `/var/lib/ci-warm/alerts/` relayed by the Builder loop.
-4. **W0.7** — resolve the lasuite-docs in-place-redeploy race (finding below) OR pick a more-robust
-   dependent; then **W0.8** headline WC1 e2e (dependent SSO green vs warm keycloak) + concurrency.
-5. **W0.9** — WC1.1/WC1.2 Adversary-facing proofs (broken latest → self-revert + data intact + alert;
-   healthy → commit last-good; major/manual-migration → hold + alert).
+- **W0.6 reconciler rewrite** DONE (a044abb). `runner/warm_reconcile.py` (python, packaged into the
+  nix store, replaces the bash reconcile): UNPIN keycloak (deploy latest version TAG; recipe fetched
+  at runtime → D8 closure byte-identical); WC1.2 pre-deploy safety gate (major recipe/app bump OR
+  releaseNotes manual-migration → hold + alert, no churn); WC1.1 health-gated upgrade-with-rollback
+  scaffold (record last-good → keycloak undeploy→snapshot→deploy latest → health-gate →
+  commit-or-restore+redeploy-prior+alert). Alerts = `/var/lib/ci-warm/alerts/*.json`. +8 unit tests
+  (56 unit pass). PROVEN live: `nixos-rebuild switch` → warm-keycloak.service runs the python
+  reconciler → noop-healthy (system 0-failed, 200); **WC1.2 holds proven** (MAJOR → held-major,
+  keycloak untouched; minor+manual-migration notes → held-manual-migration, alert carries notes).
+
+**Next:**
+1. **W0.9 — WC1.1 live proofs** (deploy cycles): (a) healthy upgrade — stage a fake newer tag
+   (re-tag of current → same healthy image) → reconcile snapshots + deploys + commits last-good;
+   (b) **rollback (marquee)** — stage a fake newer tag with a BROKEN compose (bad KC_HOSTNAME →
+   crash-loop) → reconcile snapshots → deploys broken → health-gate fails → restores snapshot +
+   redeploys prior → healthy + data intact (marker realm) + alert written + last_good NOT advanced.
+2. **W0.7** — fix the lasuite-docs in-place-redeploy nginx-upstream race OR pick a more-robust SSO
+   dependent for the headline proof.
+3. **W0.8** — headline WC1 e2e: dependent SSO custom test green vs warm keycloak + concurrent
+   distinct realms (no collision) + reaping. → claim WC1/WC1.1/WC1.2.
+4. **Builder-loop alert relay** — on each wake, scan `/var/lib/ci-warm/alerts/*.json`, PushNotification +
+   record + archive to `alerts/seen/` (wire when first real alert can occur, i.e. with nightly WC6).
**Build finding (mine, to fix):** lasuite-docs `setup_custom_tests` in-place `abra app deploy --force --chaos` (OIDC wiring) fails: nginx `web` fatally exits `[emerg] host not found in upstream