claim(2w): Gate WC1+WC1.1+WC1.2 CLAIMED — warm keycloak headline e2e GREEN + concurrency/reaping + rollback/holds proven
W0.7 (lasuite-docs race was transient) + W0.8 headline e2e: lasuite-docs custom pass (3 SSO tests incl. oidc_login + password_grant) vs WARM keycloak, deploy-count=1 (keycloak NOT co-deployed), per-run realm lasuite-docs-4c0858 created+deleted; warm kc left with only master realm. Concurrency+reaping proven (distinct realms for concurrent same-recipe runs; reap keeps-live/deletes-orphans). Gate claim in STATUS-2w carries full WHAT/HOW/EXPECTED/WHERE for cold verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -90,13 +90,75 @@ nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversa
3. **Builder-loop alert relay** (deferred wiring): on each wake, scan `/var/lib/ci-warm/alerts/*.json`,
PushNotification + record + archive to `alerts/seen/`; wire when nightly WC6 lands (first real alert).

**Build finding (mine, to fix):** lasuite-docs `setup_custom_tests` in-place `abra app deploy
--force --chaos` (OIDC wiring) fails: nginx `web` fatally exits `[emerg] host not found in upstream
...backend:8000` during the rolling restart → abra converge times out. Independent of warm/cold
keycloak. Blocks the WC1 dependent-green proof until fixed/worked-around.
**Build finding (RESOLVED):** the W0.4 lasuite-docs `setup_custom_tests` redeploy failure (nginx web
`host not found in upstream ...backend:8000`) was **transient resource contention** from the
since-killed stale Phase-2 run (disk was also tight). On the clean system it converges fine; the
headline e2e is green (below). No recipe/harness change needed.

## Gate
(none claimed yet)

### Gate: WC1 + WC1.1 + WC1.2 — CLAIMED, awaiting Adversary (@2026-05-29, HEAD = see `git log -1`)

**WHAT.** The live-warm keycloak layer (W0): a persistent **unpinned** keycloak at the stable domain
`warm-keycloak.ci.commoninternet.net`, declaratively reconciled, that SSO-dependent runs use via a
**per-run namespaced realm** (created + deleted) instead of co-deploying; concurrent dependents get
distinct realms; orphan realms are reaped (WC1). The reconciler health-gates auto-upgrades with
snapshot-backed rollback (WC1.1) behind a pre-deploy safety gate for major/manual-migration bumps
(WC1.2).

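A minimal sketch of the per-run realm naming and orphan reaping that WC1 describes. This is not the code in `runner/harness/warm.py` or `runner/harness/sso.py`: it assumes a realm name is the recipe name plus a six-character hex run id (the `lasuite-docs-4c0858` shape), models the warm keycloak's realm list as a plain set, and uses the hypothetical helper names `realm_name` and `orphans`.

```python
import re

HEX_SUFFIX = re.compile(r"^[0-9a-f]{6}$")  # assumed run-id shape, e.g. "4c0858"


def realm_name(recipe: str, run_hex: str) -> str:
    """Build a per-run realm name such as 'lasuite-docs-4c0858' (hypothetical helper)."""
    if not HEX_SUFFIX.match(run_hex):
        raise ValueError(f"unexpected run id: {run_hex!r}")
    return f"{recipe}-{run_hex}"


def orphans(realms: set[str], live_hexes: set[str]) -> set[str]:
    """Return per-run realms whose run id is not live; 'master' is never a candidate."""
    doomed = set()
    for realm in realms:
        _, _, suffix = realm.rpartition("-")
        if realm != "master" and HEX_SUFFIX.match(suffix) and suffix not in live_hexes:
            doomed.add(realm)
    return doomed


# Two concurrent runs of the same recipe get distinct realms.
a = realm_name("lasuite-docs", "aaa111")
b = realm_name("lasuite-docs", "bbb222")
assert a != b

# Only the realms without a live run are selected for deletion.
existing = {"master", a, b, realm_name("lasuite-docs", "ccc333")}
print(sorted(orphans(existing, live_hexes={"aaa111"})))
# → ['lasuite-docs-bbb222', 'lasuite-docs-ccc333']
```

Keying the reap on the run id rather than the recipe name is what lets a live run keep its realm while sibling realms from dead runs of the same recipe are removed.
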
**WHERE (code).** `runner/warm_reconcile.py` (reconcile logic), `runner/harness/warm.py` (stable
domain, per-run realm naming, reaping), `runner/harness/sso.py` (realm lifecycle),
`runner/harness/warmsnap.py` (snapshot/restore), `runner/run_recipe_ci.py` (warm/cold dep split),
`nix/modules/warm-keycloak.nix` (systemd reconcile unit). Warm state on cc-ci under `/var/lib/ci-warm/`.

**HOW + EXPECTED (cold, from your own clone on cc-ci; tar-sync runner+tests to your `/root/<clone>`):**

1. **Declarative + unpinned + healthy:** `grep -n kcVersion nix/modules/warm-keycloak.nix` → *no
match* (pin removed; the unit runs `runner/warm_reconcile.py keycloak`).
`ssh cc-ci 'systemctl is-active warm-keycloak.service'` → `active`; `systemctl is-system-running` →
`running`. Health:
`curl -sk --resolve warm-keycloak.ci.commoninternet.net:443:127.0.0.1 https://warm-keycloak.ci.commoninternet.net/realms/master -o /dev/null -w '%{http_code}'`
→ `200`.
D8: a `nixos-rebuild build` closure hash is unaffected by which keycloak version is live (recipe
fetched at runtime).
2. **Units:** `cc-ci-run -m pytest tests/unit -q` → **57 passed** (incl. test_warm_realm,
test_warmsnap, test_warm_reconcile).
3. **WC1 headline e2e:** `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run
runner/run_recipe_ci.py` → `install: pass`, `custom: pass`, **`deploy-count = 1 (expect 1)`**
(keycloak NOT co-deployed), log shows `dep: using live-warm keycloak @ warm-keycloak...` and
`dep: deleted per-run realm lasuite-docs-<hex> on warm keycloak`. The 3 custom SSO tests pass
(test_health_check, test_oidc_login_via_keycloak, test_oidc_password_grant_against_dep_keycloak).
After the run, warm keycloak realms = `['master']` only (no leftover); no `lasu*` docker stack.
4. **WC1 concurrency + reaping (deploy-free):** `realm_for("lasuite-docs","lasu-aaa111...")` =
`lasuite-docs-aaa111` and `...bbb222` → distinct (two concurrent same-recipe runs never collide);
create realms aaa111/bbb222/ccc333 on the warm kc, each `oidc_password_grant` returns a JWT;
`sso.reap_orphaned_realms(D, live_hexes={"aaa111"})` deletes exactly bbb222+ccc333 and KEEPS
aaa111. (Builder ran this live: PASS.)
5. **WC1.1 health-gated rollback (live):** with `CCCI_SKIP_FETCH=1` stage two **annotated** fake tags
on `~/.abra/recipes/keycloak`: `10.7.9+26.6.2` at the good commit (`git tag -a -m x 10.7.9+26.6.2
10.7.1+26.6.2^{}`) and `10.7.10+26.6.2` at a commit whose compose.yml has a broken
`KC_HOSTNAME=:::bad-host:::`. Create a marker realm, set last_good, then run `CCCI_SKIP_FETCH=1
cc-ci-run runner/warm_reconcile.py keycloak` twice → first `RECONCILE RESULT: upgraded:...->10.7.9`
(snapshot taken, last_good=10.7.9, marker preserved); second `rolled-back:10.7.10->10.7.9`:
keycloak HEALTHY on 10.7.9, **marker realm INTACT** (data preserved),
`/var/lib/ci-warm/keycloak/last_good` still `10.7.9` (NOT advanced), a `*-rollback.json` alert under
`/var/lib/ci-warm/alerts/` with `attempted=10.7.10 last_good=10.7.9 recovered=true`. (Builder ran
this live: ALL PASS; keycloak restored to canonical 10.7.1+26.6.2.)
6. **WC1.2 pre-deploy safety gate (live):** stage an annotated fake tag with a MAJOR bump
(`11.0.0+27.0.0`) → `CCCI_SKIP_FETCH=1 ... warm_reconcile.py keycloak` → `RECONCILE RESULT:
held-major:...`, a `*-held-major.json` alert written, **keycloak untouched** (TYPE unchanged,
200, no snapshot/deploy churn). Stage a minor tag (`10.7.2+26.6.3`) with
`releaseNotes/10.7.2+26.6.3.md` containing "manual migration" → `held-manual-migration`, alert
carries the notes. (Builder ran both live: held + untouched.)

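The WC1.2 hold decision in step 6 can be sketched as a pure function evaluated before any snapshot or deploy. This is an illustration under assumptions, not the logic in `runner/warm_reconcile.py`: it assumes tags have the `<recipe-version>+<upstream-version>` shape seen above, treats an increase in either major number as a major bump, and uses the hypothetical names `parse_tag`, `gate_decision`, and the `"proceed"` outcome.

```python
def parse_tag(tag: str) -> tuple[int, int]:
    """Return (recipe_major, upstream_major) from a tag shaped like '10.7.1+26.6.2'."""
    recipe, _, upstream = tag.partition("+")
    return int(recipe.split(".")[0]), int(upstream.split(".")[0])


def gate_decision(current: str, candidate: str, release_notes: str = "") -> str:
    """Classify a candidate upgrade before anything touches the running service."""
    cur, cand = parse_tag(current), parse_tag(candidate)
    # Assumption: a bump of either the recipe major or the upstream major is held.
    if cand[0] > cur[0] or cand[1] > cur[1]:
        return "held-major"
    # Assumption: a case-insensitive substring match on the release notes.
    if "manual migration" in release_notes.lower():
        return "held-manual-migration"
    return "proceed"


print(gate_decision("10.7.1+26.6.2", "11.0.0+27.0.0"))
# → held-major
print(gate_decision("10.7.1+26.6.2", "10.7.2+26.6.3", "Needs a manual migration step."))
# → held-manual-migration
print(gate_decision("10.7.1+26.6.2", "10.7.9+26.6.2"))
# → proceed
```

Keeping the check free of side effects is what makes the "keycloak untouched" expectation easy to verify: a held result returns before any snapshot or deploy call is reached.
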
**Alert delivery note (not blocking):** the reconciler WRITES alert sentinels to
`/var/lib/ci-warm/alerts/*.json` (proven above). The operator-facing relay (Builder loop scans →
PushNotification → archive to `alerts/seen/`) is loop behavior, run each wake when an alert exists;
none currently. "Alert fired" for WC1.1/WC1.2 = sentinel written, which is independently checkable.

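A rough sketch of the scan → notify → archive relay described in the note above, run against a throwaway directory rather than `/var/lib/ci-warm/alerts/`. The function name `relay_alerts`, the `notify` callback (a stand-in for PushNotification), and the JSON field names in the demo sentinel are all assumptions made for illustration.

```python
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def relay_alerts(alerts_dir: Path, notify: Callable[[dict], None]) -> list[str]:
    """Notify for each pending alert sentinel, then archive it under seen/."""
    seen = alerts_dir / "seen"
    seen.mkdir(parents=True, exist_ok=True)
    relayed = []
    # glob("*.json") is non-recursive, so archived files in seen/ are not rescanned.
    for sentinel in sorted(alerts_dir.glob("*.json")):
        notify(json.loads(sentinel.read_text()))
        shutil.move(str(sentinel), str(seen / sentinel.name))
        relayed.append(sentinel.name)
    return relayed


# Demo: one hypothetical rollback sentinel, relayed once and then archived.
with tempfile.TemporaryDirectory() as tmp:
    alerts = Path(tmp)
    (alerts / "demo-rollback.json").write_text(
        json.dumps({"attempted": "10.7.10", "last_good": "10.7.9", "recovered": True})
    )
    first = relay_alerts(alerts, notify=print)
    second = relay_alerts(alerts, notify=print)  # nothing left to relay
    print(first, second)
```

Archiving after the notification means a failure during `notify` leaves the sentinel in place, so the next wake retries it instead of silently dropping the alert.
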
**Builder will NOT advance past this gate** (to W1/WC2 canonical registry) until REVIEW-2w shows PASS.

## (prior) Gate
(none before this)

## Blocked
(none)