# STATUS - Phase 2w (warm canonical deployments + `--quick` CI mode)

**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2w-warm-canonical-quick.md`

**Loop state for THIS phase:** STATUS-2w / BACKLOG-2w / REVIEW-2w / JOURNAL-2w (DECISIONS.md is shared).

The Phase 1/1b/1c/1d/1e and Phase 2 STATUS/BACKLOG/REVIEW files are NOT this phase's state.

Phase 2 is **PAUSED** (STATUS-2/BACKLOG-2 intact) and resumes once 2w reaches `## DONE`.

## Phase

Add a warm-data layer to cc-ci CI: a live-warm shared keycloak for SSO deps, data-warm per-recipe
canonicals at stable domains, known-good snapshots, an opt-in `--quick` fast lane that reattaches the
canonical and upgrades it to the PR head (rolling back on failure), cold-only canonical advancement,
and a nightly full-cold sweep. Definition of Done = WC1–WC9 (plan §1), each Adversary cold-verified.

## Definition of Done (Phase 2w) - WC1–WC9 (+WC1.1/WC1.2), each Adversary cold-verified in REVIEW-2w

- [x] **WC1**: Live-warm UNPINNED keycloak; per-run namespaced realms (create+delete); concurrent
  distinct realms; orphan realms reaped. **Adversary PASS @2026-05-29** (REVIEW-2w, gate 985686f).
- [~] **WC1.1**: Health-gated deploy-with-rollback. **keycloak (stateful): Adversary PASS
  @2026-05-29** (marquee: broken latest → snapshot→restore→prior, data intact, last_good held,
  alert). **traefik (stateless, version-rollback-only): NOT yet migrated = W0.10**, MUST close
  before Phase-2w DONE (Adversary will require a cold proof).
- [x] **WC1.2**: Pre-deploy safety gate (major / manual-migration → hold + alert with notes, no
  churn, short-circuits before WC1.1). **Adversary PASS @2026-05-29**.
- [x] **WC2**: Data-warm canonical model: per-recipe canonical at stable domain `warm-<recipe>`,
  declarative registry (canonical.json + recipe_meta.WARM_CANONICAL) tracking recipe→known-good
  version/commit; data-warm (undeployed-when-idle, volume retained); re-warmable via seed_canonical.
  Proven on custom-html (W1.2). **Adversary PASS @2026-05-29** (REVIEW-2w 0246296, gate 4ce80f8).
- [x] **WC3**: Known-good snapshots: raw per-volume tar taken while undeployed under
  `/var/lib/ci-warm/<recipe>/snapshot/`; one last-good per app, atomic subdir swap; restore
  round-trips data (W0.5 + W1.2 + Adversary's own mutate→restore). **Adversary PASS @2026-05-29**.
- [x] **WC4**: `--quick` mode (`run_quick` in run_recipe_ci.py): reattach canonical → upgrade to PR
  head (chaos) → generic UPGRADE+serving+overlay+custom; PASS→undeploy-keep-volume (known-good
  UNCHANGED, never promote); FAIL→restore last-known-good snapshot then undeploy. Proven live on
  custom-html (PASS + FAIL). **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [ ] **WC5**: Canonical advancement via cold only (promote-on-green-cold; seeds on first green cold).
- [ ] **WC6**: Nightly full-cold sweep (scheduled, declarative, MAX_TESTS-bounded).
- [x] **WC7**: Trigger/authority/labeling: default `!testme`=cold (unchanged); `--quick` opt-in via
  bridge `parse_trigger` (`!testme --quick` → CCCI_QUICK=1 Drone param, deployed+live-verified);
  never gates merge; runs carry mode=quick (lower-confidence label); clean no-canonical fallback
  to cold. **Adversary PASS @2026-05-29** (REVIEW-2w 31f0e42, gate 3ff2bf6).
- [ ] **WC8**: Resource safety + isolation: warm runs serialize per app; warm keycloak shared via
  per-run realms; disk monitored+pruned; cold teardown sacred; warm data excluded from D8 closure.
- [ ] **WC9**: Docs + cold verify incl. the rollback proof (deliberately fail a PR under `--quick`,
  confirm last-known-good restored intact; a `--quick` pass did not move the known-good).

## Milestones (plan §3)

- **W0**: Warm keycloak (WC1/WC1.1-keycloak/WC1.2). ✅ Adversary PASS @2026-05-29.
- **W1**: Canonical registry + snapshot/restore (WC2, WC3). ✅ Adversary PASS @2026-05-29.
- **W2**: `--quick` mode (WC4, WC7). ✅ Adversary PASS @2026-05-29.
- **W3**: Cold-advances-canonical + nightly sweep (WC5, WC6). ← IN FLIGHT
- **W4**: Resource/isolation hardening + docs + cold verify incl. rollback proof (WC8, WC9). → DONE.

## In flight

**W0: live-warm keycloak (WC1).** Done so far (commits up to 88c1114):
- W0.1 sso realm lifecycle (list/delete/realms_to_reap/reap) + 8 unit tests (43 unit pass).
- W0.2 orchestrator live-warm dep mode (warm.py + run_recipe_ci split warm/cold; per-run realm).
- **WC1 core mechanism PROVEN** deploy-free on the live warm keycloak: realm create → password-grant
  JWT → discovery issuer → delete (idempotent) → reap (keeps live hex / deletes orphan). All PASS.
- W0.3 declarative reconciler `nix/modules/warm-keycloak.nix` up; `nixos-rebuild switch` →
  warm-keycloak.service active, system running (0 failed), /realms/master=200. (INTERIM: pinned +
  skip-if-healthy; to be replaced by the unpinned + health-gated WC1.1 form.)

- **W0.5 WC3 snapshot/restore helper** (`runner/harness/warmsnap.py`) DONE (4cc1e15). +5 unit tests
  (48 unit pass). **LIVE round-trip PROVEN on warm keycloak**: marker realm → undeploy → snapshot
  (mariadb+providers) → deploy → delete marker (mutate DB) → undeploy → restore → deploy → marker
  realm BACK; keycloak healthy. Snapshots under `/var/lib/ci-warm/<recipe>/`, atomic, one last-good.

- **W0.6 reconciler rewrite** DONE (a044abb). `runner/warm_reconcile.py` (python, packaged into the
  nix store, replaces the bash reconcile): UNPIN keycloak (deploy latest version TAG; recipe fetched
  at runtime → D8 closure byte-identical); WC1.2 pre-deploy safety gate (major recipe/app bump OR
  releaseNotes manual-migration → hold + alert, no churn); WC1.1 health-gated upgrade-with-rollback
  scaffold (record last-good → keycloak undeploy→snapshot→deploy latest → health-gate →
  commit-or-restore+redeploy-prior+alert). Alerts = `/var/lib/ci-warm/alerts/*.json`. +8 unit tests
  (56 unit pass). PROVEN live: `nixos-rebuild switch` → warm-keycloak.service runs the python
  reconciler → noop-healthy (system 0-failed, 200); **WC1.2 holds proven** (MAJOR → held-major,
  keycloak untouched; minor+manual-migration notes → held-manual-migration, alert carries notes).

- **W0.9 WC1.1 live proofs** DONE (32f0071). PROVEN on warm keycloak (annotated fake tags +
  CCCI_SKIP_FETCH): (a) healthy upgrade 10.7.1→10.7.9: snapshot+deploy+health-pass, last_good
  committed, marker preserved; (b) **marquee rollback**: broken latest 10.7.10 → deploy fails →
  rollback to 10.7.9, HEALTHY, marker realm INTACT (data preserved), last_good NOT advanced, rollback
  alert written (attempted=10.7.10,last_good=10.7.9,recovered=True); recovered to canonical
  10.7.1+26.6.2. Fixed 4 issues live (deploy-fail→rollback, warmsnap last_good subdir, wait_undeployed
  swarm-settle, abra-stdout capture). 57 unit pass. **Reconciler-side WC1/WC1.1/WC1.2 proven.**

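The WC1.1 sequence in W0.6/W0.9 (record last-good → undeploy → snapshot → deploy latest → health-gate → commit, or restore + redeploy prior + alert) can be sketched as a pure control-flow function. This is a minimal sketch, not the code in `runner/warm_reconcile.py`. The `Ops` hooks, the `reconcile_upgrade` name and the result strings are hypothetical stand-ins for the real abra/warmsnap calls, modeled on the `upgraded:` / `rolled-back:` results this file reports.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ops:
    """Hypothetical side-effecting hooks (abra/warmsnap calls in the real reconciler)."""
    undeploy: Callable[[], None]
    snapshot: Callable[[], None]
    restore: Callable[[], None]
    deploy: Callable[[str], bool]   # False if the deploy itself fails to converge
    healthy: Callable[[], bool]     # e.g. /realms/master answers 200
    alert: Callable[[dict], None]   # e.g. write a JSON sentinel under /var/lib/ci-warm/alerts/


@dataclass
class State:
    last_good: str


def reconcile_upgrade(state: State, latest: str, ops: Ops) -> str:
    """Health-gated upgrade with snapshot-backed rollback; last_good moves only on success."""
    if latest == state.last_good:
        return "noop"
    prior = state.last_good            # 1. record last-good
    ops.undeploy()                     # 2. stop writers so the snapshot is consistent
    ops.snapshot()                     # 3. snapshot the known-good data while undeployed
    if ops.deploy(latest) and ops.healthy():
        state.last_good = latest       # 4a. commit: only a healthy deploy advances last_good
        return f"upgraded:{prior}->{latest}"
    ops.undeploy()                     # 4b. roll back: restore data, redeploy prior, alert
    ops.restore()
    recovered = ops.deploy(prior) and ops.healthy()
    ops.alert({"attempted": latest, "last_good": prior, "recovered": recovered})
    return f"rolled-back:{latest}->{prior}"
```

In the real reconciler the WC1.2 pre-deploy gate runs first and short-circuits before any of these steps. The sketch only covers what happens once an upgrade is allowed.
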
**Adversary reproduce (W0.9):** on cc-ci, with the keycloak recipe clone, create annotated fake
tags (peel `^{}`, set git identity) `10.7.9+26.6.2` (=good commit) and `10.7.10+26.6.2` (broken
KC_HOSTNAME), then run `CCCI_SKIP_FETCH=1 cc-ci-run runner/warm_reconcile.py keycloak` twice. Observe
`upgraded:` then `rolled-back:`; the marker realm survives, `/var/lib/ci-warm/keycloak/last_good` stays
unchanged at the prior version, and a `*rollback*.json` alert appears under `/var/lib/ci-warm/alerts/`.

**W0 COMPLETE: Adversary PASS @2026-05-29.** Now in **W1 (canonical registry, WC2/WC3)**.

**W0 ✅ + W1 ✅ + W2 ✅ Adversary PASS. Now in W3 (cold-advances-canonical WC5 + nightly sweep WC6).**

**W3 plan:**

- **WC5: promote-on-green-cold.** A GREEN full-cold run on the LATEST version (not a `--quick` run) of an
  enrolled (WARM_CANONICAL) recipe re-snapshots + re-tags the canonical known-good instead of
  deleting the volume at teardown: at the end of a green cold run, undeploy → `canonical.seed_canonical`
  (snapshot while undeployed + write registry version=the green commit/version) → keep the volume as
  the new canonical. The FIRST green cold run on latest SEEDS the canonical. ONLY cold advances it
  (`--quick` never promotes, as proven in W2). Wire this into the run_recipe_ci.py cold teardown, gated on:
  recipe is WARM_CANONICAL + run was green + deployed LATEST (not a pinned/prev base). Add unit
  tests + a live proof (green cold custom-html run → canonical re-seeded at the new known-good).
- **WC6: nightly full-cold sweep.** Declarative scheduler (systemd timer on cc-ci): each night, run
  `nixos-rebuild switch` FIRST (rolls warm/infra to latest, health-gated per WC1.1), THEN a full-cold
  sweep across enrolled recipes (serial, MAX_TESTS-bounded), refreshing each canonical's known-good
  (WC5) and serving as the daily authoritative regression. MUST NOT run while a test is in flight.
- **Quiet-window opportunity (now): W0.10a traefik WC1.1**: the Adversary is idle after the W2 PASS,
  so this is the window to migrate traefik onto the health-gated reconciler (tracked before DONE; see below).

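The WC5 gate in the plan comes down to four conditions, all checked at cold teardown. This is a minimal sketch. The `RunResult` record and its field names are hypothetical, and the real wiring into the run_recipe_ci.py cold teardown is still to be written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Hypothetical summary of a finished run; field names are illustrative only."""
    mode: str        # "cold" or "quick"
    green: bool      # every stage passed
    deployed: str    # version the run ended on
    latest: str      # latest published recipe version when the run started


def should_promote(run: RunResult, warm_canonical: bool) -> bool:
    """WC5 gate: only a green COLD run on LATEST of an enrolled recipe (re)seeds the canonical."""
    return (
        warm_canonical                   # recipe_meta.WARM_CANONICAL
        and run.mode == "cold"           # --quick never promotes
        and run.green
        and run.deployed == run.latest   # not a pinned/prev base
    )
```

When the gate is true, the teardown would undeploy, call `canonical.seed_canonical` and keep the volume. Otherwise the existing cold teardown runs unchanged, which keeps cold teardown sacred (WC8).
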
**Tracked before Phase-2w DONE:**

- **W0.10a: traefik WC1.1** (Adversary requires a cold proof): migrate `proxy.nix` onto the shared
  health-gated reconciler (stateless = version-rollback-only; preserve the cert-secret/WILDCARDS_ENABLED/
  COMPOSE_FILE setup). CAREFUL: traefik serves all TLS; deploy/test only in a quiet window.
- **W0.10b: Builder-loop alert relay**: each wake, scan `/var/lib/ci-warm/alerts/*.json` →
  PushNotification → archive to `alerts/seen/`.

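The W0.10b relay loop (scan → notify → archive) is small enough to sketch in full. It assumes each alert is a standalone JSON file. `notify` is a hypothetical stand-in for PushNotification, and `relay_alerts` is an illustrative name. `Path.replace` is a rename on the same filesystem, so an alert moves to `seen/` only after `notify` has returned.

```python
import json
from pathlib import Path
from typing import Callable

ALERTS = Path("/var/lib/ci-warm/alerts")


def relay_alerts(notify: Callable[[str, dict], None], alerts_dir: Path = ALERTS) -> list[str]:
    """Relay each unseen alert sentinel once, archiving it after notification (sketch)."""
    seen = alerts_dir / "seen"
    seen.mkdir(parents=True, exist_ok=True)
    relayed = []
    for path in sorted(alerts_dir.glob("*.json")):   # non-recursive: seen/ is never rescanned
        try:
            payload = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            continue                                 # half-written or unreadable: retry next wake
        notify(path.name, payload)                   # hypothetical PushNotification hook
        path.replace(seen / path.name)               # archive only after a successful notify
        relayed.append(path.name)
    return relayed
```

If `notify` raises, the file stays in place and is retried on the next wake. That gives at-least-once delivery rather than silent loss.
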
**Build finding (RESOLVED):** the W0.4 lasuite-docs `setup_custom_tests` redeploy failure (nginx web
`host not found in upstream ...backend:8000`) was **transient resource contention** from the
since-killed stale Phase-2 run (disk was also tight). On the clean system it converges fine: the
headline e2e is green (below). No recipe/harness change needed.

## Gate

### Gate: WC4 + WC7 - ✅ Adversary PASS @2026-05-29 (REVIEW-2w 31f0e42, gate 3ff2bf6)

Cold-verified from the Adversary's own clone: 64 unit tests; WC7 adversarial trigger battery (all negatives
rejected, live bridge); WC4 never-promote (snapshot byte-identical, registry unchanged); WC4
FAIL→rollback restored the EXACT known-good (marker back, 200, broken image gone, exit 1); no-canonical
fallback to a cold per-run domain. Builder may proceed to W3. (Claim detail retained below.)

### (claimed, now PASS) Gate: WC4 + WC7 - CLAIMED detail

**WHAT.** The `--quick` opt-in fast lane (W2): reattach the data-warm canonical → upgrade in place to
the PR head → assert (generic upgrade reconverge+moved+serving + overlay + custom); PASS →
undeploy-keep-volume with the **known-good UNCHANGED (never promote)**; FAIL → restore the
last-known-good snapshot + undeploy (roll back, data safe). Opt-in via `!testme --quick`, mode-tagged
lower-confidence, never gates merge; clean no-canonical fallback to COLD.

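The PASS/FAIL branches of the `--quick` lane fit in a small control-flow sketch. The `QuickOps` hooks are hypothetical stand-ins, not the signatures in `runner/run_recipe_ci.py`. The sketch shows one invariant: no branch writes the registry, so the known-good never moves.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QuickOps:
    """Hypothetical hooks standing in for run_recipe_ci.py / canonical.py / warmsnap.py calls."""
    canonical_version: Callable[[str], Optional[str]]   # None when the recipe has no canonical
    reattach: Callable[[str], None]                     # deploy the canonical on its retained volume
    upgrade_to: Callable[[str, str], bool]              # chaos-upgrade to the PR head
    assert_all: Callable[[str], bool]                   # upgrade + serving + overlay + custom
    restore_snapshot: Callable[[str], None]             # assumed to stop the app before restoring
    undeploy_keep_volume: Callable[[str], None]
    run_cold: Callable[[str, str], int]


def run_quick(recipe: str, pr_head: str, ops: QuickOps) -> int:
    """Return an exit code; nothing here promotes the canonical."""
    if ops.canonical_version(recipe) is None:
        return ops.run_cold(recipe, pr_head)     # clean fallback: the PR is still fully tested
    ops.reattach(recipe)
    passed = ops.upgrade_to(recipe, pr_head) and ops.assert_all(recipe)
    if not passed:
        ops.restore_snapshot(recipe)             # roll the data back to last-known-good
    ops.undeploy_keep_volume(recipe)             # canonical returns to idle either way
    return 0 if passed else 1
```
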
**WHERE (code).** `runner/run_recipe_ci.py` (`run_quick`, dispatched from `main()` on CCCI_QUICK=1 /
MODE=quick; `_wait_undeployed`; no-canonical fallback), `runner/harness/canonical.py`
(deploy_canonical resets TYPE; undeploy_keep_volume), `runner/harness/warmsnap.py` (restore),
`bridge/bridge.py` (`parse_trigger` + CCCI_QUICK param), `.drone.yml` (quick echo). 64 unit pass.

**HOW + EXPECTED (cold, from your own clone on cc-ci):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **64 passed** (incl. test_bridge_trigger:
   `!testme`→cold, `!testme --quick`→quick, `!testmexyz`→reject).
2. **WC7 trigger (live in the running bridge):** `cid=$(docker ps -q -f name=ccci-bridge);
   docker exec $cid python3 -c 'import sys;sys.path.insert(0,"/app");import bridge;
   print(bridge.parse_trigger("!testme --quick"), bridge.parse_trigger("!testmexyz"))'` →
   `(True, True) (False, False)`. `trigger_build` adds `CCCI_QUICK=1` (auto-exposed to run_recipe_ci);
   a `!testme --quick` PR comment is labelled lower-confidence; plain `!testme` stays full cold.
3. **WC4 `--quick` flow (custom-html canonical, currently idle at 1.11.0+1.29.0):**
   - **PASS run:** `RECIPE=custom-html CCCI_QUICK=1 REF=87a62a5 cc-ci-run runner/run_recipe_ci.py`
     (REF=87a62a5 is the 1.10.0+1.28.0 commit, a different healthy head) → exit 0; SUMMARY shows
     `mode=quick`, `upgrade: pass`, `custom: pass`, "canonical undeployed, volume retained, known-good
     UNCHANGED"; afterwards the `canonical.json` version is STILL 1.11.0+1.29.0 (NOT promoted), the
     canonical is idle, the content volume is retained and the known-good marker is intact.
   - **FAIL run (rollback):** stage a broken custom-html commit (`image: nginx:99.99.99-doesnotexist`),
     then run `RECIPE=custom-html CCCI_QUICK=1 CCCI_SKIP_FETCH=1 REF=<broken sha> cc-ci-run
     runner/run_recipe_ci.py` → exit 1; SUMMARY shows "rolling back … restored known-good data;
     canonical idle (NOT promoted)"; afterwards the known-good version is UNCHANGED, the canonical is
     idle and the data (marker) is intact. Builder ran both live: **ALL PASS** (canonical left clean
     idle@1.11.0+1.29.0).
   - **no-canonical fallback:** MODE=quick for a recipe with no canonical → logs "falling back to COLD"
     and runs the full cold flow (so the PR is still tested; default `!testme` unaffected).

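The trigger contract checked in step 2 maps a comment to a `(matched, quick)` pair and can be reproduced with a few lines of parsing. This is a hypothetical re-implementation for illustration, not the code in `bridge/bridge.py`. Two behaviors are assumptions: the comment's first line is treated as the command, and any other flag or trailing word is rejected. `build_params` is likewise an illustrative name.

```python
def parse_trigger(comment: str) -> tuple[bool, bool]:
    """Return (is_trigger, quick) for a PR comment (hypothetical sketch).

    Only the exact command `!testme`, optionally followed by the single flag
    `--quick`, is accepted; `!testmexyz`, unknown flags or extra words are rejected.
    """
    lines = comment.strip().splitlines()
    if not lines:
        return (False, False)
    words = lines[0].split()
    if words == ["!testme"]:
        return (True, False)           # default: full cold run
    if words == ["!testme", "--quick"]:
        return (True, True)            # opt-in fast lane
    return (False, False)


def build_params(quick: bool) -> dict[str, str]:
    """Extra Drone build parameters; CCCI_QUICK=1 only for the opt-in fast lane."""
    return {"CCCI_QUICK": "1"} if quick else {}
```

Matching whole tokens rather than prefixes is what rejects `!testmexyz`. A `startswith("!testme")` check would accept it.
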
**Builder will NOT advance into W3 (cold-advances-canonical / nightly) past this gate** until
REVIEW-2w shows PASS, but will do the tracked W0.10a (traefik) in a quiet window meanwhile.

---

### Gate: WC2 + WC3 - ✅ Adversary PASS @2026-05-29 (REVIEW-2w 0246296, gate 4ce80f8)

Cold-verified from the Adversary's own clone (its own data-warm round-trip + restore round-trip).
Builder may proceed to W2 (`--quick`). The custom-html canonical was left clean (idle, volume retained,
known-good content, snapshot intact, v1.11.0+1.29.0). (Claim detail retained below.)

### (claimed, now PASS) Gate: WC2 + WC3 - CLAIMED detail

**WHAT.** The data-warm canonical model (W1): a declarative per-recipe canonical at the stable domain
`warm-<recipe>.ci.commoninternet.net`, kept **data-warm** (undeployed-when-idle, data volume
retained), tracked by a registry; **known-good snapshots** (raw per-volume tar while undeployed, one
last-good per app, restore round-trips data).

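The "one last-good per app, atomic subdir swap" property can be sketched with the standard library. Each volume's tar goes into a fresh directory, then a `snapshot` symlink is repointed with a single `os.replace` so a reader never sees a half-written snapshot. This is an illustrative sketch, not `runner/harness/warmsnap.py`. The symlink flip, the `snap-*` directory names, the `<volume>.tar` layout and the function names are all assumptions. The sketch also presumes the app is already undeployed, as WC3 requires.

```python
import os
import shutil
import tarfile
import tempfile
from pathlib import Path


def take_snapshot(volumes: dict[str, Path], app_dir: Path) -> Path:
    """Tar each volume into a fresh dir, then atomically repoint app_dir/snapshot at it."""
    app_dir.mkdir(parents=True, exist_ok=True)
    target = Path(tempfile.mkdtemp(prefix="snap-", dir=app_dir))
    for name, data_dir in volumes.items():
        with tarfile.open(target / f"{name}.tar", "w") as tar:
            tar.add(data_dir, arcname=".")
    link = app_dir / "snapshot"
    previous = link.resolve() if link.is_symlink() else None
    tmp_link = app_dir / ".snapshot.tmp"
    if tmp_link.is_symlink():
        tmp_link.unlink()                  # leftover from an interrupted swap
    tmp_link.symlink_to(target.name)       # relative link inside app_dir
    os.replace(tmp_link, link)             # rename(2): readers see old or new, never neither
    if previous is not None:
        shutil.rmtree(previous)            # keep exactly one last-good
    return target


def restore_snapshot(snapshot: Path, volumes: dict[str, Path]) -> None:
    """Replace each volume's contents with the tarred known-good data."""
    # Our own trusted tar: keep ownership/modes where the running Python supports extraction filters.
    kwargs = {"filter": "fully_trusted"} if hasattr(tarfile, "fully_trusted_filter") else {}
    for name, data_dir in volumes.items():
        shutil.rmtree(data_dir, ignore_errors=True)
        data_dir.mkdir(parents=True)
        with tarfile.open(snapshot / f"{name}.tar") as tar:
            tar.extractall(data_dir, **kwargs)
```

A real Docker volume restore would typically clear the mountpoint's contents rather than recreate the directory. The sketch glosses over that detail.
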
**WHERE (code).** `runner/harness/canonical.py` (registry + data-warm lifecycle),
`runner/harness/warmsnap.py` (snapshot/restore), enrollment `tests/custom-html/recipe_meta.py: WARM_CANONICAL=True`.
State on cc-ci under `/var/lib/ci-warm/<recipe>/` (`canonical.json`, `snapshot/`, retained volume).

**HOW + EXPECTED (cold, from your own clone on cc-ci):**

1. **Units:** `cc-ci-run -m pytest tests/unit -q` → **61 passed** (incl. test_canonical, test_warmsnap).
2. **WC2/WC3 data-warm round-trip** (the custom-html canonical exists idle now): reproduce with a driver
   that uses `runner/harness/canonical.py`. Deploy `warm-custom-html.ci.commoninternet.net` @
   `1.11.0+1.29.0`, write a marker file into `/usr/share/nginx/html/`, undeploy, then `seed_canonical`
   (writes `/var/lib/ci-warm/custom-html/canonical.json` + a `snapshot/` while undeployed). Confirm the
   **app is UNDEPLOYED but the `content` volume is RETAINED** (`docker volume ls | grep warm-custom-html`);
   then `deploy_canonical('custom-html')` → the marker **survives** (data-warm reattach). Builder ran
   this live: **ALL PASS** (marker `WC2-DATA-MARKER-7f3a9c` survived; registry version=1.11.0+1.29.0;
   snapshot present). Current live state: `cat /var/lib/ci-warm/custom-html/canonical.json` →
   status=idle, version=1.11.0+1.29.0; `docker volume ls` shows
   `warm-custom-html_ci_commoninternet_net_content` retained with NO custom-html service running.
3. **WC3 restore round-trip** is already cold-verified in the W0.9/W0.5 keycloak proof (snapshot →
   mutate DB → restore → data back); same `warmsnap` helper.
4. **D8/WC8:** `/var/lib/ci-warm/` is a cache, NOT in the nix closure (no module references it as a
   source); it is re-seeded by cold runs, not restored on rebuild.

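A registry record like `canonical.json` (status=idle, version=1.11.0+1.29.0 in the live state quoted in step 2) stays consistent if a run dies mid-update when it is written to a temp file, fsynced and renamed into place. This is a minimal sketch under assumptions. The field set (`recipe`, `version`, `commit`, `status`) and the helper names are hypothetical, not the schema in `runner/harness/canonical.py`.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Canonical:
    recipe: str
    version: str          # known-good recipe version, e.g. "1.11.0+1.29.0"
    commit: str           # known-good commit
    status: str = "idle"  # "idle" = undeployed, volume retained


def write_registry(entry: Canonical, root: Path) -> Path:
    """Atomically (re)write <root>/<recipe>/canonical.json."""
    path = root / entry.recipe / "canonical.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".canonical-", suffix=".json")
    with os.fdopen(fd, "w") as f:
        json.dump(asdict(entry), f, indent=2, sort_keys=True)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)   # readers see the old or the new record, never a partial one
    return path


def read_registry(recipe: str, root: Path) -> Optional[Canonical]:
    """Return the canonical for a recipe, or None if it was never seeded (the cold-fallback case)."""
    path = root / recipe / "canonical.json"
    if not path.exists():
        return None
    return Canonical(**json.loads(path.read_text()))
```
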
**Builder will NOT advance into W2 (`--quick`, which consumes the canonical) past this gate** until
REVIEW-2w shows PASS, but will do non-disruptive W0.10 follow-ups (alert relay) meanwhile.

---

### Gate: WC1 + WC1.2 + WC1.1(keycloak) - ✅ Adversary PASS @2026-05-29 (REVIEW-2w 31ac86d, gate 985686f)

All 6 checks cold-verified from the Adversary's own clone. Builder may proceed to W1. **Tracked open
(must close before Phase-2w DONE, not a blocker now): traefik WC1.1 (W0.10)**: its stateless
version-rollback is not yet on the shared health-gated reconciler; the Adversary will require a cold proof.

(claim detail retained below for the record)

**WHAT.** The live-warm keycloak layer (W0): a persistent **unpinned** keycloak at the stable domain
`warm-keycloak.ci.commoninternet.net`, declaratively reconciled, that SSO-dependent runs use via a
**per-run namespaced realm** (created + deleted) instead of co-deploying; concurrent dependents get
distinct realms; orphan realms are reaped (WC1). The reconciler health-gates auto-upgrades with
snapshot-backed rollback (WC1.1) behind a pre-deploy safety gate for major/manual-migration bumps
(WC1.2).

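The WC1.2 decision can be written as a pure classifier over `<recipe-version>+<app-version>` tags such as `10.7.1+26.6.2`. This is a hypothetical sketch with three assumptions. First, "major" means the leading numeric component of either half increases. Second, a case-insensitive "manual migration" phrase in the release notes is the migration signal. Third, `classify_upgrade` and its result strings are illustrative names. The real reconciler may parse tags and notes differently.

```python
import re
from typing import Optional


def _major(part: str) -> int:
    """Leading numeric component of one half of a tag, e.g. '26.6.2' -> 26."""
    m = re.match(r"v?(\d+)", part)
    if m is None:
        raise ValueError(f"unparseable version part: {part!r}")
    return int(m.group(1))


def classify_upgrade(current: str, candidate: str, release_notes: Optional[str] = None) -> str:
    """Pre-deploy safety gate: hold majors and manual migrations before any deploy churn."""
    if candidate == current:
        return "noop"
    cur_recipe, _, cur_app = current.partition("+")
    new_recipe, _, new_app = candidate.partition("+")
    if _major(new_recipe) > _major(cur_recipe) or (
        cur_app and new_app and _major(new_app) > _major(cur_app)
    ):
        return "held-major"
    if release_notes and re.search(r"manual[\s-]+migration", release_notes, re.IGNORECASE):
        return "held-manual-migration"
    return "upgrade"   # proceeds to the WC1.1 health-gated deploy
```

Because the classifier runs before any undeploy or snapshot, a hold leaves the running keycloak untouched. That matches the "no churn" expectation in step 6 of the HOW list.
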
**WHERE (code).** `runner/warm_reconcile.py` (reconcile logic), `runner/harness/warm.py` (stable
domain, per-run realm naming, reaping), `runner/harness/sso.py` (realm lifecycle),
`runner/harness/warmsnap.py` (snapshot/restore), `runner/run_recipe_ci.py` (warm/cold dep split),
`nix/modules/warm-keycloak.nix` (systemd reconcile unit). Warm state on cc-ci under `/var/lib/ci-warm/`.

**HOW + EXPECTED (cold, from your own clone on cc-ci; tar-sync runner+tests to your /root/<clone>):**

1. **Declarative + unpinned + healthy:** `grep -n kcVersion nix/modules/warm-keycloak.nix` → *no
   match* (pin removed; the unit runs `runner/warm_reconcile.py keycloak`). `ssh cc-ci 'systemctl
   is-active warm-keycloak.service'` → `active`; `systemctl is-system-running` → `running`. Health:
   `curl -sk --resolve warm-keycloak.ci.commoninternet.net:443:127.0.0.1
   https://warm-keycloak.ci.commoninternet.net/realms/master -o /dev/null -w '%{http_code}'` → `200`.
   D8: a `nixos-rebuild build` closure hash is unaffected by which keycloak version is live (recipe
   fetched at runtime).
2. **Units:** `cc-ci-run -m pytest tests/unit -q` → **57 passed** (incl. test_warm_realm,
   test_warmsnap, test_warm_reconcile).
3. **WC1 headline e2e:** `RECIPE=lasuite-docs STAGES=install,custom cc-ci-run
   runner/run_recipe_ci.py` → `install: pass`, `custom: pass`, **`deploy-count = 1 (expect 1)`**
   (keycloak NOT co-deployed); the log shows `dep: using live-warm keycloak @ warm-keycloak...` and
   `dep: deleted per-run realm lasuite-docs-<hex> on warm keycloak`. The 3 custom SSO tests pass
   (test_health_check, test_oidc_login_via_keycloak, test_oidc_password_grant_against_dep_keycloak).
   After the run, warm keycloak realms = `['master']` only (no leftover); no `lasu*` docker stack.
4. **WC1 concurrency + reaping (deploy-free):** `realm_for("lasuite-docs","lasu-aaa111...")` =
   `lasuite-docs-aaa111` and `...bbb222` → distinct (two concurrent same-recipe runs never collide);
   create realms aaa111/bbb222/ccc333 on the warm kc, each `oidc_password_grant` returns a JWT;
   `sso.reap_orphaned_realms(D, live_hexes={"aaa111"})` deletes exactly bbb222+ccc333 and KEEPS
   aaa111. (Builder ran this live: PASS.)
5. **WC1.1 health-gated rollback (live):** with `CCCI_SKIP_FETCH=1` stage two **annotated** fake tags
   on `~/.abra/recipes/keycloak`: `10.7.9+26.6.2` at the good commit (`git tag -a -m x 10.7.9+26.6.2
   10.7.1+26.6.2^{}`) and `10.7.10+26.6.2` at a commit whose compose.yml has a broken
   `KC_HOSTNAME=:::bad-host:::`. Create a marker realm, set last_good, then run `CCCI_SKIP_FETCH=1
   cc-ci-run runner/warm_reconcile.py keycloak` twice. The first run gives `RECONCILE RESULT: upgraded:...->10.7.9`
   (snapshot taken, last_good=10.7.9, marker preserved); the second gives `rolled-back:10.7.10->10.7.9`:
   keycloak HEALTHY on 10.7.9, **marker realm INTACT** (data preserved), `/var/lib/ci-warm/keycloak/last_good`
   still `10.7.9` (NOT advanced), and a `*-rollback.json` alert under `/var/lib/ci-warm/alerts/`
   with `attempted=10.7.10 last_good=10.7.9 recovered=true`. (Builder ran this live: ALL PASS; keycloak
   restored to canonical 10.7.1+26.6.2.)
6. **WC1.2 pre-deploy safety gate (live):** stage an annotated fake tag with a MAJOR bump
   (`11.0.0+27.0.0`) → `CCCI_SKIP_FETCH=1 ... warm_reconcile.py keycloak` → `RECONCILE RESULT:
   held-major:...`, a `*-held-major.json` alert written, **keycloak untouched** (TYPE unchanged,
   200, no snapshot/deploy churn). Stage a minor tag (`10.7.2+26.6.3`) with
   `releaseNotes/10.7.2+26.6.3.md` containing "manual migration" → `held-manual-migration`, and the
   alert carries the notes. (Builder ran both live: held + untouched.)

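The naming and reaping rules exercised in step 4 can be sketched as two pure functions. They mirror the described behavior of `realm_for` / `reap_orphaned_realms`, but the names `per_run_realm` / `orphaned_realms` and their signatures are hypothetical. The sketch also makes two assumptions. A cold app name looks like `<recipe[:4]>-<6hex>`, per the Notes scheme. A per-run realm is recognizable by a trailing `-<6hex>` suffix.

```python
import re

_PER_RUN = re.compile(r"^(?P<recipe>.+)-(?P<hex>[0-9a-f]{6})$")


def per_run_realm(recipe: str, app_name: str) -> str:
    """Realm for one run: recipe plus the 6-hex run id of a cold app name like 'lasu-aaa111'."""
    _, _, rest = app_name.partition("-")
    run_hex = rest[:6]
    if not re.fullmatch(r"[0-9a-f]{6}", run_hex):
        raise ValueError(f"no 6-hex run id in {app_name!r}")
    return f"{recipe}-{run_hex}"


def orphaned_realms(realms: list[str], live_hexes: set[str]) -> list[str]:
    """Per-run realms whose run is no longer live; 'master' (no hex suffix) is never selected."""
    doomed = []
    for realm in realms:
        m = _PER_RUN.match(realm)
        if m and m.group("hex") not in live_hexes:
            doomed.append(realm)
    return doomed
```

Keying reaping on the run id rather than the recipe lets two concurrent same-recipe runs coexist. Only runs that are no longer live lose their realms.
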
**SCOPE (honest).** WC1 and WC1.2 are complete. **WC1.1 is proven for keycloak**, the *stateful*
case (snapshot-backed data-integrity rollback), which is the hard part and the Adversary's marquee
proof. **traefik's WC1.1** (stateless = version-rollback-only) is **NOT yet migrated** onto the shared
health-gated reconciler; it still uses the existing `proxy.nix` chaos-deploy reconciler. That
migration is **W0.10** (tracked in BACKLOG-2w), to land before Phase-2w DONE. If the Adversary
requires WC1.1 to be fully closed (both reconcilers) before PASS, scope this gate to WC1 + WC1.2 +
WC1.1(keycloak) only.

**Alert delivery note (not blocking):** the reconciler WRITES alert sentinels to
`/var/lib/ci-warm/alerts/*.json` (proven above). The operator-facing relay (Builder loop scans →
PushNotification → archive to `alerts/seen/`) is loop behavior, run on each wake when an alert exists;
none exist currently. "Alert fired" for WC1.1/WC1.2 means the sentinel was written, which is
independently checkable.

**Builder will NOT advance past this gate** (to W1/WC2 canonical registry) until REVIEW-2w shows PASS.

## (prior) Gate

(none before this)

## Blocked

(none)

## Notes

- **Disk budget (WC8 watch):** cc-ci `/` was 91% (2.4G free) at phase start; freeing orphaned Phase-2
  cold apps (lasu-0a6fb2 12-svc, keyc-07d81e, lasu-dbg) brought it to 86% (3.8G free). 9.7GB reclaimable
  in Docker images is kept as warm pull-cache (authenticated pulls now, so re-pull is cheaper but slower).
- Stable-domain scheme (proposed, see DECISIONS): `warm-<recipe>.ci.commoninternet.net`, distinct
  from cold `<recipe[:4]>-<6hex>`.
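
The two naming schemes in the last note can be written out as small helpers. The `ci.commoninternet.net` suffix comes from the warm domains used throughout this file. Several pieces are assumptions: the helper names, `secrets.token_hex(3)` as the source of the 6-hex run id, and the volume rule (dots become underscores, then `_<volume>`). The volume rule is inferred from the single retained volume quoted in the WC2 check.

```python
import secrets
from typing import Optional

CI_DOMAIN = "ci.commoninternet.net"


def warm_domain(recipe: str) -> str:
    """Stable domain reused by every warm run of a recipe."""
    return f"warm-{recipe}.{CI_DOMAIN}"


def cold_app_name(recipe: str, run_hex: Optional[str] = None) -> str:
    """Per-run cold name <recipe[:4]>-<6hex>; a fresh random id unless one is supplied."""
    return f"{recipe[:4]}-{run_hex or secrets.token_hex(3)}"   # 3 bytes -> 6 hex chars


def volume_name(domain: str, volume: str) -> str:
    """Assumed Docker volume naming: dots in the domain become underscores, then _<volume>."""
    return f"{domain.replace('.', '_')}_{volume}"
```
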