BACKLOG — Phase 2w (warm canonical + --quick)

Single-writer rule (plan §6.1): Builder edits ## Build backlog only; Adversary edits ## Adversary findings only.

Build backlog

W0 — Live-warm keycloak (WC1, WC1.1, WC1.2)

  • W0.1 — sso.py realm lifecycle (list_realms/delete_keycloak_realm/realms_to_reap/reap_orphaned_realms) + 8 unit tests. DONE (74bf8c1).
  • W0.2 — Orchestrator live-warm dep mode (warm.py + run_recipe_ci warm/cold split, per-run namespaced realm, realm-delete teardown, cold fallback, deploy-count). DONE (1b8d26b). Core mechanism proven deploy-free on the live warm keycloak.
  • W0.3a — Declarative reconciler nix/modules/warm-keycloak.nix up + verified via rebuild. DONE (88c1114) but INTERIM (pinned + skip-if-healthy) — superseded by W0.6 below.
  • W0.5 — WC3 snapshot/restore helper FIRST (prereq for WC1.1): runner/harness/warmsnap.py — raw copy of an app's data volume(s) while undeployed, under /var/lib/ci-warm/<recipe>/, atomic replace, one last-good, restore round-trips data. + unit tests + live round-trip proof.
  • W0.6 — Rewrite reconciler: unpin + WC1.2 safety gate + WC1.1 health-gated rollback. UNPIN keycloak (fetch latest + chaos; drop kcVersion); keep secret-guard + health-wait. WC1.2 gate: hold-on-current + alert on major/manual-migration bump (no deploy churn). WC1.1: record last-good → keycloak undeploy→snapshot→deploy latest → health-gate → commit-or-(restore+redeploy-prior+alert). Apply the same health-gated+safety-gate pattern to traefik (version rollback only, stateless). Settle the alert mechanism (see DECISIONS).
  • W0.7 — Fix lasuite-docs in-place-redeploy race (nginx web host not found in upstream backend during chaos redeploy) OR pick a more-robust SSO dependent for the headline proof.
  • W0.8 — Headline WC1 e2e: dependent SSO custom test green vs warm keycloak; concurrent dependents distinct realms (no collision); leftover realms reaped. → claim WC1.
  • W0.9 — WC1.1/WC1.2 Adversary proofs: simulate broken latest → self-revert + data intact + alert; healthy update commits last-good; major/manual-migration → hold + alert-with-notes. → claim WC1.1/WC1.2.
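
The W0.6 flow (WC1.2 safety gate, then WC1.1 health-gated rollback) can be sketched as a small pure-Python control function. This is only an illustration under stated assumptions: the `Ops` hooks, `reconcile`, and `is_major_bump` are hypothetical names, versions are assumed to be plain dotted numbers, and undeploying again before the restore is an assumption carried over from W0.5's "while undeployed" constraint. The real reconciler lives in nix/modules/warm-keycloak.nix and may be structured differently.

```python
from dataclasses import dataclass
from typing import Callable


def is_major_bump(current: str, candidate: str) -> bool:
    """True when the leading version component increases (assumes tags like '1.2.3')."""
    return int(candidate.split(".")[0]) > int(current.split(".")[0])


@dataclass
class Ops:
    """Hypothetical hooks; a real reconciler would call the deploy tooling here."""
    undeploy: Callable[[], None]
    snapshot: Callable[[], None]
    deploy: Callable[[str], None]
    healthy: Callable[[], bool]
    restore: Callable[[], None]
    alert: Callable[[str], None]


def reconcile(current: str, candidate: str, manual_migration: bool, ops: Ops) -> str:
    """Return the version that is live after one reconcile pass."""
    if candidate == current:
        return current
    # WC1.2 gate: hold on current and alert, without touching the deployment.
    if manual_migration or is_major_bump(current, candidate):
        ops.alert(f"held at {current}: {candidate} needs review")
        return current
    # WC1.1: snapshot while undeployed, then try the candidate.
    ops.undeploy()
    ops.snapshot()
    ops.deploy(candidate)
    if ops.healthy():
        return candidate  # commit: the candidate becomes the new last-good
    # Roll back: restore data, redeploy the prior version, alert.
    ops.undeploy()  # assumption: restore also needs the app undeployed
    ops.restore()
    ops.deploy(current)
    ops.alert(f"rolled back {candidate} -> {current}")
    return current
```

Keeping the side effects behind hooks means the W0.9 scenarios (broken latest, healthy update, major bump) can be exercised in unit tests with fakes before any live proof.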

W1 — Canonical registry (WC2)

  • W1.1 — Canonical registry/reconciler (declarative; tracks recipe→known-good commit; stable domain warm-<recipe>). (Snapshot/restore done in W0.5; WC3 closes with W1's canonicals.)
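
The snapshot helper that W1.1 builds on (W0.5: raw copy, atomic replace, one last-good) might look roughly like the following. It is a minimal sketch, not the actual runner/harness/warmsnap.py: the function names, the `last-good` symlink layout, and the `snap-` directory prefix are assumptions made for illustration. It also uses `shutil.copytree`, which does not preserve file ownership, so a real helper for container volumes would likely need a different copy mechanism.

```python
import os
import shutil
import tempfile
from pathlib import Path


def snapshot(volume_dir: Path, snap_root: Path, recipe: str) -> Path:
    """Copy volume_dir under snap_root/<recipe>/ and atomically repoint 'last-good'."""
    recipe_dir = snap_root / recipe
    recipe_dir.mkdir(parents=True, exist_ok=True)
    link = recipe_dir / "last-good"
    previous = link.resolve() if link.is_symlink() else None

    # Stage the full copy first; 'last-good' keeps pointing at the old data meanwhile.
    fresh = Path(tempfile.mkdtemp(prefix="snap-", dir=recipe_dir))
    shutil.copytree(volume_dir, fresh, symlinks=True, dirs_exist_ok=True)

    # Build the new symlink under a temporary name, then rename it over the old
    # one. On POSIX the rename is atomic, so readers see either old or new.
    tmp_link = recipe_dir / ".last-good.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()
    os.symlink(fresh.name, tmp_link)
    os.replace(tmp_link, link)

    # Keep exactly one last-good: drop the snapshot that was just superseded.
    if previous is not None and previous.exists():
        shutil.rmtree(previous)
    return link


def restore(snap_root: Path, recipe: str, volume_dir: Path) -> None:
    """Replace volume_dir with the contents of the last-good snapshot."""
    source = (snap_root / recipe / "last-good").resolve(strict=True)
    if volume_dir.exists():
        shutil.rmtree(volume_dir)
    shutil.copytree(source, volume_dir, symlinks=True)
```

Both functions assume the app is undeployed while they run, as W0.5 requires; copying a live database volume could capture an inconsistent state.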

W2 — --quick mode (WC4, WC7)

  • W2.1 — run_recipe_ci.py --quick path (reattach → upgrade-to-PR-head → assert → PASS undeploy / FAIL restore+undeploy; never promote).
  • W2.2 — Trigger surface + labeling + no-canonical fallback (WC7).
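
The --quick control flow in W2.1, together with the no-canonical fallback in W2.2, can be sketched as below. All names here (`run_quick`, `Outcome`, the hook parameters) are hypothetical and not taken from run_recipe_ci.py. Two choices are assumptions: an exception during upgrade or assertion is treated as FAIL, and the restore runs after the undeploy because W0.5 describes snapshots as copied while the app is undeployed.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    FALLBACK_COLD = "fallback-cold"


def run_quick(
    has_canonical: bool,
    reattach: Callable[[], None],
    upgrade_to_head: Callable[[], None],
    run_assertions: Callable[[], bool],
    undeploy: Callable[[], None],
    restore: Callable[[], None],
) -> Outcome:
    """Run one --quick pass against a warm canonical. Never promotes."""
    if not has_canonical:
        # W2.2: nothing warm to reattach to, so the caller should run cold.
        return Outcome.FALLBACK_COLD
    reattach()
    try:
        upgrade_to_head()
        ok = run_assertions()
    except Exception:
        ok = False  # assumption: any error in upgrade/assert counts as FAIL
    finally:
        undeploy()  # the app is undeployed on both PASS and FAIL
    if not ok:
        restore()  # put the canonical data back to last-good
    return Outcome.PASS if ok else Outcome.FAIL
```

There is deliberately no promote hook in the signature: a quick run can leave the canonical unchanged or restore it, but it has no way to advance it.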

W3 — Cold-advances-canonical + nightly sweep (WC5, WC6)

  • W3.1 — Promote-on-green-cold (snapshot+tag canonical at teardown on green cold; seed on first green).
  • W3.2 — Nightly full-cold sweep (declarative scheduler, MAX_TESTS-bounded).
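
The promotion rule in W3.1 reduces to a small guard: only a green cold run may snapshot and advance the canonical. The sketch below models the W1.1 registry as a plain dict from recipe to known-good commit, which is an assumption; `promote_on_green` and `take_snapshot` are hypothetical names.

```python
from typing import Callable, Dict


def promote_on_green(
    registry: Dict[str, str],
    recipe: str,
    commit: str,
    mode: str,
    green: bool,
    take_snapshot: Callable[[str], None],
) -> bool:
    """Advance the canonical for `recipe` only after a green cold run."""
    if mode != "cold" or not green:
        return False  # quick runs and red runs never promote
    take_snapshot(recipe)  # snapshot at teardown, before recording the commit
    # Seeds the entry on the first green run, overwrites it afterwards.
    registry[recipe] = commit
    return True
```

Because a missing entry and an existing entry take the same path, seeding on first green needs no special case.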

W4 — Hardening + docs + cold verify (WC8, WC9)

  • W4.1 — Resource/isolation hardening: disk monitor+prune, per-app serialize, warm excluded from D8.
  • W4.2 — Docs (warm/quick) + the WC9 rollback proof.

Adversary findings

(none yet)