cc-ci/machine-docs/JOURNAL-2w.md
autonomic-bot 5dd76d7c8c chore(2w): bootstrap Phase 2w loop state + cleanup orphaned cold apps
- Seed STATUS-2w / BACKLOG-2w / JOURNAL-2w (WC1-WC9 DoD, W0-W4 milestones).
- Tore down leftover Phase-2 cold apps (lasu-0a6fb2/keyc-07d81e/lasu-dbg);
  disk 91%->86%.
- DECISIONS: warm-domain scheme, per-run realm isolation, warm keycloak as
  declarative infra, cold fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 23:14:41 +01:00


JOURNAL — Phase 2w (warm canonical + --quick) — Builder

Append-only reasoning log (WHY). Facts/verification go in STATUS-2w; verdicts in REVIEW-2w.

2026-05-28 — Phase 2w bootstrap + cleanup + W0 design

Orientation. Operator interjected Phase 2w into Phase 2 (Phase 2 paused, state preserved). Read the 2w plan + plan.md §6.1/§7/§9. Adversary already online (REVIEW-2w @2026-05-28 start), idle awaiting a WC gate claim. Seeded STATUS-2w/BACKLOG-2w/JOURNAL-2w.

In-flight Phase 2 work committed. Working tree had an uncommitted edit to tests/lasuite-drive/setup_custom_tests.sh (Q3.2 MinIO bucket creation via the createbuckets one-shot) — the continuation of commit 6557197. Committed it (66e065d) with an honest message: not yet live-verified (needs a lasuite-drive deploy once warm keycloak exists). This preserves Phase 2 progress at the pause point; it resumes after 2w DONE.

Cleanup (orchestrator-requested). cc-ci / was at 91% (only 2.4G free) — a real WC8 concern before adding warm volumes/snapshots. Tore down the leftover COLD per-run apps from paused Phase 2 via lifecycle.teardown_app(..., verify=True): lasu-0a6fb2 (12-service lasuite-drive, heaviest), keyc-07d81e (cold keycloak), lasu-dbg (debug lasuite). All TEARDOWN OK, no residual. Disk → 86% (3.8G free). Only infra stacks remain (backups, bridge, dashboard, drone, traefik). Did NOT docker image prune — 9.7GB reclaimable but the image cache is the warm pull-cache; with authed Docker Hub pulls now wired, a re-pull is billed to the account (cheaper) but still slow, so keep the cache. Disk is the Phase-2w budget (WC8) — monitor.
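Sketch of the disk monitoring meant above, using only the standard library. The function name and the 90% threshold are illustrative assumptions, not part of the harness or the WC8 definition.

```python
import shutil


def disk_report(path="/", warn_pct=90.0):
    """Return used percentage and free GiB for the filesystem holding `path`.

    warn_pct is a hypothetical budget line, not a value taken from the plan.
    Note: df computes Use% against used+available, so its figure can differ
    slightly from used/total when the filesystem reserves blocks.
    """
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total
    return {
        "used_pct": round(used_pct, 1),
        "free_gib": round(usage.free / 2**30, 1),
        "over_budget": used_pct >= warn_pct,
    }


if __name__ == "__main__":
    print(disk_report("/"))
```

Running this before and after a teardown gives the same kind of before/after numbers recorded in this entry.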

W0 design (WC1 — live-warm keycloak). The existing SSO harness is already most of the way there:

  • sso.setup_keycloak_realm(provider_domain, realm, client_id, ...) creates a realm+client+user idempotently via the admin API, and _kc_admin_password reads the admin password from inside the running container (docker exec ... cat /run/secrets/admin_password). So it works against ANY running keycloak — cold or warm — with no external password handling.
  • The orchestrator dep flow (run_recipe_ci.py): declared_deps → deploy_deps (fresh co-deploy per run) → _enrich_deps_with_sso (creates realm, realm name currently = parent_recipe) → setup_custom_tests.sh hook → teardown_deps (undeploy).

What WC1 changes:

  1. The realm becomes the per-run isolation unit on a shared live-warm keycloak. Realm name must be unique per (parent, pr, ref) so concurrent dependents don't collide — change from realm=parent_recipe to realm=<parent>-<6hex> (derive the hex from the parent's per-run domain label so it's stable within a run and distinct across concurrent runs).
  2. The keycloak dep is not co-deployed: point at the stable warm domain; on teardown delete the realm (not undeploy keycloak). Fall back to cold co-deploy if no warm keycloak is present (so a from-scratch / no-warm environment still works — the warm keycloak is an optimization layer).
  3. The warm keycloak itself is declarative infra (Nix reconciler, like traefik) — NOT warm data (so it IS in the D8 closure as a reconciler; its realm data is ephemeral per-run anyway). Re-warmable from scratch.
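A minimal sketch of points 1 and 2, assuming the per-run domain label has the cold shape <recipe[:4]>-<6hex> and that the cold domains sit under the same base domain as the warm ones. realm_for_run and plan_keycloak_dep are hypothetical names, not existing harness functions, and the step labels in the returned dict are placeholders.

```python
import re

# Cold per-run label: up to 4 chars of the recipe name, a dash, then 6 hex digits.
_COLD_LABEL = re.compile(r"^.{1,4}-([0-9a-f]{6})$")


def realm_for_run(parent_recipe: str, run_domain: str) -> str:
    """Derive realm=<parent>-<6hex> from the parent's per-run domain label.

    The hex is reused rather than regenerated, so the realm name is stable
    within a run and distinct across concurrent runs.
    """
    label = run_domain.split(".", 1)[0]
    match = _COLD_LABEL.match(label)
    if match is None:
        raise ValueError(f"not a per-run label: {label!r}")
    return f"{parent_recipe}-{match.group(1)}"


def plan_keycloak_dep(warm_available: bool) -> dict:
    """Pick live-warm mode when a warm keycloak exists, else cold co-deploy."""
    if warm_available:
        # Shared keycloak: isolation comes from the realm, so only the realm is removed.
        return {"mode": "warm", "setup": "create_realm", "teardown": "delete_realm"}
    # No warm keycloak: fall back to a fresh keycloak per run.
    return {"mode": "cold", "setup": "deploy_keycloak", "teardown": "undeploy_keycloak"}
```

A label without a 6-hex suffix (such as a hand-named debug app) raises instead of silently sharing a realm, which seems the safer default for concurrency.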

Stable-domain scheme decision: warm-<recipe>.ci.commoninternet.net (here warm-keycloak...), clearly distinct from cold <recipe[:4]>-<6hex>. Risk: longer stack name → swarm 64-char config/secret limit; will verify on first deploy and shorten if it overflows.
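A pre-check for the overflow risk can be sketched as below. Two things here are assumptions about the deploy tooling rather than facts from this journal: that the stack name is the domain with dots replaced by underscores, and that swarm objects are named <stack>_<name>_<version>. The helper names are hypothetical.

```python
BASE_DOMAIN = "ci.commoninternet.net"
SWARM_NAME_LIMIT = 64  # the config/secret name limit noted above


def warm_domain(recipe: str) -> str:
    """Stable warm domain for a recipe, following warm-<recipe>.<base>."""
    return f"warm-{recipe}.{BASE_DOMAIN}"


def stack_name(domain: str) -> str:
    """ASSUMPTION: the stack name is the domain with dots turned into underscores."""
    return domain.replace(".", "_")


def overflowing_names(domain: str, object_names, version: str = "v1") -> list:
    """Return the full config/secret names that would exceed the limit.

    ASSUMPTION: full names follow <stack>_<name>_<version>.
    """
    stack = stack_name(domain)
    full_names = [f"{stack}_{name}_{version}" for name in object_names]
    return [full for full in full_names if len(full) > SWARM_NAME_LIMIT]
```

Feeding it the recipe's real secret and config names before the first deploy would show whether shortening is needed, without waiting for swarm to reject the name.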

Building W0 in increments (each verified): (1) sso realm lifecycle prims + units; (2) deploy warm keycloak manually at the stable domain and prove realm create→delete via admin API; (3) wire the orchestrator live-warm mode; (4) declarative Nix reconciler; (5) e2e + concurrency + reaping proof.
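For increment (2), the realm create→delete round trip could be expressed with plain urllib as sketched here. The paths are the standard Keycloak Admin REST endpoints (older releases prefix them with /auth); obtaining the bearer token is left out, and the function names are hypothetical rather than existing sso helpers.

```python
import json
import urllib.parse
import urllib.request


def create_realm_request(base_url: str, token: str, realm: str) -> urllib.request.Request:
    """Build POST /admin/realms for a new, enabled realm."""
    body = json.dumps({"realm": realm, "enabled": True}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/realms",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def delete_realm_request(base_url: str, token: str, realm: str) -> urllib.request.Request:
    """Build DELETE /admin/realms/{realm}; this is the warm-mode teardown."""
    quoted = urllib.parse.quote(realm, safe="")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/realms/{quoted}",
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


# Usage (needs a reachable keycloak and a valid admin token):
#   urllib.request.urlopen(create_realm_request(base, token, realm))
#   urllib.request.urlopen(delete_realm_request(base, token, realm))
```

Keeping request construction separate from sending makes the units in increment (1) cheap: they can assert on method, URL and body without a live keycloak.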