decisions+status(2w): W0.5 done (WC3 snapshot proven); W0.6 reconciler version model (deploy-by-tag, recipe-semver pre-+, python entrypoint in store)
@ -615,3 +615,30 @@ autonomous reconciler to operator visibility (latency = next Builder wake; accep
**Re-sequence:** WC1.1's keycloak rollback needs the WC3 snapshot helper, so build that FIRST, then
rewrite the reconciler ONCE into the unpinned + WC1.2-safety-gated + WC1.1-health-gated-rollback form
(avoids reworking the reconciler twice). The W0.3 reconciler is INTERIM until then.

## Phase 2w — W0.6 reconciler: version model + deploy-by-tag (2026-05-29)

**Reconcile entrypoint in Python, packaged in the nix store.** `runner/warm_reconcile.py`, invoked by
the systemd unit as `${pyEnv}/bin/python3 ${../../runner}/warm_reconcile.py <app>` (the runner/ dir is
copied into the store → D8-clean, no dependence on the /root/cc-ci checkout). Reuses
warmsnap/sso/abra/lifecycle so there is ONE snapshot impl (also used by the runner for WC5). Replaces
the bash reconcile in warm-keycloak.nix.

**"latest" = newest published version TAG, deployed pinned (not chaos-of-main).** WC1.2's "major
recipe-version bump" detection needs comparable versions, which chaos (deploy main HEAD) doesn't give.
So the reconciler resolves latest = `git tag | sort -V | tail -1` (valid coop-cloud version tags),
records current = the app .env `VERSION`, and deploys the chosen tag pinned
(`abra app deploy <domain> <version> -o -n -f`, after `git checkout <tag>`). "Auto-update to latest"
is satisfied by converging to the newest tag; "chaos" in the operator note is read as "auto-deploy
latest", and tag-pinning is the correct mechanism for a version-gated auto-update.

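The tag resolution above can be sketched in Python roughly as follows. This is an illustration only, not the contents of `runner/warm_reconcile.py`: the names `TAG_RE`, `recipe_key`, `latest_tag` and `list_tags` are hypothetical, and the sketch assumes valid tags look like `10.7.1+26.6.2`. Unlike `sort -V`, it orders by the numeric part before `+` only and silently skips tags that do not match.

```python
import re
import subprocess

# Assumed tag shape: three numeric components, a "+", then the app version.
TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)\+(.+)$")


def recipe_key(tag):
    """Return the pre-'+' part as an int tuple, or None for a non-version tag."""
    m = TAG_RE.match(tag)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups()[:3])


def latest_tag(tags):
    """Pick the tag with the highest pre-'+' version; None if no tag is valid."""
    valid = [t for t in tags if recipe_key(t) is not None]
    if not valid:
        return None
    return max(valid, key=recipe_key)


def list_tags(recipe_dir):
    """List all tags of the recipe checkout at recipe_dir (plain `git tag`)."""
    result = subprocess.run(
        ["git", "-C", recipe_dir, "tag"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

Comparing integer tuples rather than strings matters once a component reaches two digits: a plain string sort would rank `9.9.9+...` above `10.7.1+...`.
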
**coop-cloud version format is `<recipe-semver>+<app-version>` (observed), not the plan's
`<upstream>+<recipe-semver>`.** Evidence: keycloak `10.7.1+26.6.2` → image `keycloak:26.6.2`; n8n
`3.2.0+2.20.6` → image `n8nio/n8n:2.20.6` (the post-`+` part is the app image tag). So the **recipe
semver is the part BEFORE `+`**. WC1.2's "major recipe bump = breaking" keys off the major (first)
component of the pre-`+` recipe semver (e.g. 3.x→4.0 = held). Secondary signal: scan the target's
`releaseNotes/<version>.md` for manual-migration markers.

**Scope order for W0.6:** keycloak first (the W0 focus, stateful → snapshot path); apply the same
health-gated + safety-gate pattern to traefik (stateless, version-rollback-only) afterward by
migrating proxy.nix onto the shared reconcile entrypoint.

@ -53,19 +53,25 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
warm-keycloak.service active, system running (0 failed), /realms/master=200. (INTERIM: pinned +
skip-if-healthy; to be replaced by the unpinned + health-gated WC1.1 form.)

**Re-sequenced after the 2026-05-28/29 design update (unpin + WC1.1 rollback + WC1.2 safety gate):**
WC1.1's keycloak rollback needs the **WC3 snapshot/restore helper**, so build that FIRST, then
rewrite the reconciler ONCE into the unpinned + safety-gated + health-gated-with-rollback form. Next:
1. **WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`): raw copy of an app's data
   volume(s) while undeployed, under `/var/lib/ci-warm/<recipe>/`, atomic replace, one last-good;
   restore round-trips data. + unit tests + live round-trip proof.
2. Rewrite reconciler: unpin keycloak (fetch latest + chaos); WC1.2 safety gate (major / manual-
   migration → hold + alert); WC1.1 record last-good → (keycloak: undeploy→snapshot→deploy latest) →
   health-gate → commit-or-rollback+restore+alert.
3. Settle the **alert mechanism** (bash reconciler can't call agent PushNotification — sentinel file
   the Builder loop relays, see DECISIONS).
4. Resolve the lasuite-docs in-place-redeploy race (BUILD finding below) OR pick a more-robust
   dependent, then the headline WC1 e2e (dependent SSO green vs warm keycloak) + concurrency proof.
- **W0.5 WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`) DONE (4cc1e15). +5 unit tests
  (48 unit pass). **LIVE round-trip PROVEN on warm keycloak**: marker realm → undeploy → snapshot
  (mariadb+providers) → deploy → delete marker (mutate DB) → undeploy → restore → deploy → marker
  realm BACK; keycloak healthy. Snapshots under `/var/lib/ci-warm/<recipe>/`, atomic, one last-good.

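For readers unfamiliar with the "atomic replace, one last-good" pattern, here is one way it can be done. It is not the code in `runner/harness/warmsnap.py`, whose internals these notes do not show. The sketch assumes a `last-good` symlink swapped with `os.replace`, and the names `snapshot`, `restore` and the `snap-*` directory layout are hypothetical. It must run while the app is undeployed, as the notes require, and it assumes `last-good` is either absent or a symlink.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(volume_dir, snap_root):
    """Copy volume_dir into a fresh snap-* dir, then atomically repoint last-good."""
    snap_root = Path(snap_root)
    snap_root.mkdir(parents=True, exist_ok=True)
    link = snap_root / "last-good"
    old_target = link.resolve() if link.is_symlink() else None

    # Full copy first; last-good still points at the previous snapshot.
    new_dir = Path(tempfile.mkdtemp(prefix="snap-", dir=snap_root))
    shutil.copytree(volume_dir, new_dir / "data", symlinks=True)

    # Build the new symlink under a temporary name, then rename it over the
    # old one. rename(2) replaces the link itself in one step on POSIX.
    tmp_link = snap_root / "last-good.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(new_dir.name, tmp_link)
    os.replace(tmp_link, link)

    # Keep exactly one last-good: drop the snapshot that was just superseded.
    if old_target is not None:
        shutil.rmtree(old_target, ignore_errors=True)
    return new_dir


def restore(snap_root, volume_dir):
    """Replace volume_dir with the contents of the last-good snapshot."""
    src = Path(snap_root) / "last-good" / "data"
    volume_dir = Path(volume_dir)
    if volume_dir.exists():
        shutil.rmtree(volume_dir)
    shutil.copytree(src, volume_dir, symlinks=True)
```

A crash during the copy leaves the previous `last-good` untouched, because the link only moves after the copy has finished. The live proof above follows the same shape: write a marker, snapshot, mutate, restore, and check that the marker is back.
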
**Next (W0.6 reconciler rewrite — split):**
1. **W0.6a** — Python reconcile entrypoint `runner/warm_reconcile.py`, packaged into the nix store
   (systemd unit invokes the store copy of runner/ — D8-clean, reuses warmsnap/sso/abra; replaces the
   bash reconciler). UNPIN keycloak (fetch latest + chaos deploy; drop kcVersion); keep secret-guard
   + health-wait.
2. **W0.6b** — WC1.2 pre-deploy safety gate: major recipe-semver bump OR releaseNotes manual-migration
   marker → hold-on-current + alert-with-notes (no deploy churn).
3. **W0.6c** — WC1.1 health-gated rollback: record last-good → (keycloak: undeploy→snapshot→deploy
   latest) → health-gate → commit-or-(restore+redeploy-prior+alert). Same for traefik (version
   rollback only). Alert = sentinel file in `/var/lib/ci-warm/alerts/` relayed by the Builder loop.
4. **W0.7** — resolve the lasuite-docs in-place-redeploy race (finding below) OR pick a more-robust
   dependent; then **W0.8** headline WC1 e2e (dependent SSO green vs warm keycloak) + concurrency.
5. **W0.9** — WC1.1/WC1.2 Adversary-facing proofs (broken latest → self-revert + data intact + alert;
   healthy → commit last-good; major/manual-migration → hold + alert).

**Build finding (mine, to fix):** lasuite-docs `setup_custom_tests` in-place `abra app deploy
--force --chaos` (OIDC wiring) fails: nginx `web` fatally exits `[emerg] host not found in upstream