backlog+decisions(2w): re-sequence W0 (WC3 helper first); unpin/snapshot/alert decisions
@@ -5,23 +5,34 @@ Single-writer rule (plan §6.1): Builder edits `## Build backlog` only; Adversar

 ## Build backlog

-### W0 — Live-warm keycloak (WC1)
-- [ ] W0.1 — sso.py: realm lifecycle primitives (`delete_keycloak_realm`, `list_realms`,
-`reap_stale_realms`) + unit tests.
-- [ ] W0.2 — Orchestrator/deps: live-warm keycloak dep mode — stable warm domain + per-run
-namespaced realm; delete realm on teardown (don't undeploy); cold-codeploy fallback if no warm
-keycloak. Per-run realm name unique per (parent, pr, ref) for concurrency isolation.
-- [ ] W0.3 — Declarative Nix reconciler `nix/modules/warm-keycloak.nix` (systemd oneshot converges
-warm keycloak deployed+healthy at stable domain); wired into the host config.
-- [ ] W0.4 — e2e proof: a dependent recipe (lasuite-docs) SSO custom test passes against warm
-keycloak; concurrent dependents use distinct realms (no collision); leftover realms reaped.
-→ claim WC1 gate.
+### W0 — Live-warm keycloak (WC1, WC1.1, WC1.2)
+- [x] W0.1 — sso.py realm lifecycle (`list_realms`/`delete_keycloak_realm`/`realms_to_reap`/
+`reap_orphaned_realms`) + 8 unit tests. DONE (74bf8c1).
+- [x] W0.2 — Orchestrator live-warm dep mode (warm.py + run_recipe_ci warm/cold split, per-run
+namespaced realm, realm-delete teardown, cold fallback, deploy-count). DONE (1b8d26b).
+Core mechanism proven deploy-free on the live warm keycloak.
+- [x] W0.3a — Declarative reconciler `nix/modules/warm-keycloak.nix` up + verified via rebuild.
+DONE (88c1114) but INTERIM (pinned + skip-if-healthy) — superseded by W0.6 below.
+- [ ] **W0.5 — WC3 snapshot/restore helper FIRST** (prereq for WC1.1): `runner/harness/warmsnap.py`
+— raw copy of an app's data volume(s) while undeployed, under `/var/lib/ci-warm/<recipe>/`,
+atomic replace, one last-good, restore round-trips data. + unit tests + live round-trip proof.
+- [ ] **W0.6 — Rewrite reconciler: unpin + WC1.2 safety gate + WC1.1 health-gated rollback.**
+UNPIN keycloak (fetch latest + chaos; drop kcVersion); keep secret-guard + health-wait. WC1.2
+gate: hold-on-current + alert on major/manual-migration bump (no deploy churn). WC1.1: record
+last-good → keycloak undeploy→snapshot→deploy latest → health-gate → commit-or-(restore+
+redeploy-prior+alert). Apply the same health-gated+safety-gate pattern to traefik (version
+rollback only, stateless). Settle the alert mechanism (see DECISIONS).
+- [ ] **W0.7 — Fix lasuite-docs in-place-redeploy race** (nginx web `host not found in upstream
+backend` during chaos redeploy) OR pick a more-robust SSO dependent for the headline proof.
+- [ ] W0.8 — Headline WC1 e2e: dependent SSO custom test green vs warm keycloak; concurrent
+dependents distinct realms (no collision); leftover realms reaped. → claim WC1.
+- [ ] W0.9 — WC1.1/WC1.2 Adversary proofs: simulate broken latest → self-revert + data intact +
+alert; healthy update commits last-good; major/manual-migration → hold + alert-with-notes.
+→ claim WC1.1/WC1.2.

-### W1 — Canonical registry + snapshot/restore (WC2, WC3)
+### W1 — Canonical registry (WC2)
 - [ ] W1.1 — Canonical registry/reconciler (declarative; tracks recipe→known-good commit; stable
-domain `warm-<recipe>`).
-- [ ] W1.2 — Snapshot/restore: raw volume copy while undeployed under `/var/lib/ci-warm/<recipe>/`;
-one last-known-good, atomic replace; prove restore round-trips data.
+domain `warm-<recipe>`). (Snapshot/restore done in W0.5; WC3 closes with W1's canonicals.)

 ### W2 — `--quick` mode (WC4, WC7)
 - [ ] W2.1 — `run_recipe_ci.py --quick` path (reattach → upgrade-to-PR-head → assert → PASS undeploy /
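
Note on W0.5: the "raw copy, atomic replace, one last-good" wording could be realised in several ways. The sketch below is one possibility, not the contents of `warmsnap.py`. It assumes a POSIX filesystem, that the app is already undeployed, and that the snapshot is published through a `last-good` symlink. The function names, the `snap-` prefix, and the symlink layout are all hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot_volume(volume_dir: Path, recipe_root: Path) -> Path:
    """Copy volume_dir under recipe_root and publish it as `last-good`.

    Assumption: the app is undeployed, so the volume is not being written.
    """
    recipe_root.mkdir(parents=True, exist_ok=True)
    link = recipe_root / "last-good"
    previous = link.resolve() if link.is_symlink() else None

    # Raw copy into a fresh, uniquely named directory beside the final link.
    new_dir = Path(tempfile.mkdtemp(prefix="snap-", dir=recipe_root))
    shutil.copytree(volume_dir, new_dir, dirs_exist_ok=True, symlinks=True)

    # Create the new symlink under a temporary name, then rename it over the
    # old one; a rename within one filesystem swaps the link in a single step.
    tmp_link = recipe_root / "last-good.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(new_dir.name, tmp_link)
    os.replace(tmp_link, link)

    # Keep exactly one last-good: remove the directory the old link targeted.
    if previous is not None and previous != new_dir.resolve():
        shutil.rmtree(previous, ignore_errors=True)
    return link


def restore_volume(recipe_root: Path, volume_dir: Path) -> None:
    """Replace volume_dir with the contents of the last-good snapshot."""
    source = (recipe_root / "last-good").resolve(strict=True)
    if volume_dir.exists():
        shutil.rmtree(volume_dir)
    shutil.copytree(source, volume_dir, symlinks=True)
```

A reader following `last-good` sees either the old snapshot or the complete new one, never a half-copied directory, because the copy finishes before the link is swapped.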
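
Note on W0.6: the WC1.1 sequence (record last-good, undeploy, snapshot, deploy latest, health-gate, then commit or restore and redeploy the prior version with an alert) can be written down as a small control flow. This is only an illustrative sketch. `WarmApp` and all of its hooks are hypothetical stand-ins for the real deploy tooling, and undeploying before the restore is an assumption carried over from the "while undeployed" rule for snapshots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WarmApp:
    """Hypothetical hooks; each callable stands in for a real deploy step."""
    current_version: Callable[[], str]
    undeploy: Callable[[], None]
    snapshot: Callable[[], None]
    restore: Callable[[], None]
    deploy: Callable[[str], None]
    healthy: Callable[[], bool]
    alert: Callable[[str], None]


def health_gated_update(app: WarmApp, latest: str) -> str:
    """Try to move the app to `latest`; return the version left deployed."""
    last_good = app.current_version()  # record last-good before any change

    app.undeploy()                     # snapshot is taken while undeployed
    app.snapshot()
    app.deploy(latest)

    if app.healthy():
        return latest                  # health gate passed: commit

    # Health gate failed: put data and version back, then tell a human.
    app.undeploy()                     # assumption: restore also needs this
    app.restore()
    app.deploy(last_good)
    app.alert(f"update to {latest} failed health gate; reverted to {last_good}")
    return last_good
```

Passing the steps in as callables keeps the ordering testable without a live host, which is roughly what the W0.9 "simulate broken latest" proof needs.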