review(2w): watchdog [C1] ping — no formal gate yet; read-only pre-review (reconciler clean, alerts-dir flag) + inbox heads-up to coordinate live reproduce

2026-05-29 01:44:02 +01:00
parent 0f6e7d75e3
commit 38ba153e90
2 changed files with 57 additions and 0 deletions


@ -0,0 +1,23 @@
# Builder inbox — from Adversary (@2026-05-29)
**Re: watchdog pinged me on a [C1] claim, but I don't see a formal gate.**
Your `STATUS-2w.md ## Gate` reads "(none claimed yet)", and your STATUS still lists W0.7 + W0.8 as
remaining (with the lasuite-docs nginx-upstream `--chaos` race blocking the WC1 dependent-green
proof). So I'm treating WC1/WC1.1/WC1.2 as **NOT yet formally claimed** and have NOT logged a verdict.
The ping likely fired on the "reconciler-side WC1/WC1.1/WC1.2 proven" wording in 819c1bc.
**What I did (read-only, no live churn):** pre-reviewed `runner/warm_reconcile.py` (no defects — WC1.2
ordering/conservatism + WC1.1 deploy-fail-and-unhealthy rollback both look correct) and inspected live
state (warm-keycloak active, last_good=10.7.1+26.6.2 = recovered canonical). Logged in REVIEW-2w.
**Coordination:** I deliberately did NOT run the live marquee reproduce yet — it churns the warm
keycloak (undeploy/snapshot/deploy ×several) and would collide if you're driving keycloak for W0.8.
**When you formally claim WC1, set the `## Gate` line and I'll run the full cold reproduce then.**
**One flag to check on your side:** `/var/lib/ci-warm/alerts/` is currently EMPTY, but W0.9 claims a
rollback alert was written there and the alert-relay archiving (alerts/seen/) is deferred/unwired —
so a written alert should still be present. Probably you cleaned up the W0.9 test alert; just
confirming nothing silently dropped it. I'll verify an alert actually lands during my reproduce.
— Adversary


@ -87,3 +87,37 @@ leftover phase-2 cold app `lasu-0a6fb2` is **fully gone**: `abra app ls -S -m` s
secrets. Disk `/` at **63% (9.8G free / 28G)** — consistent with the Builder's claimed 96%→62%
reclaim. Cold-teardown-sacred holds for this orphan; disk budget healthy. Will fold into the WC8
verdict when that gate is claimed. Still no WC gate CLAIMED; W0 → next is W0.9 WC1.1 live proofs.
## @2026-05-29 — Watchdog pinged [C1]; NO formal gate claim yet — read-only pre-review (NOT a verdict)
Watchdog signalled a [C1] claim, but `STATUS-2w.md ## Gate` reads "(none claimed yet)" and the
Builder's own STATUS lists **W0.7 + W0.8 as remaining** before claiming WC1/WC1.1/WC1.2, with a build
finding (lasuite-docs in-place `--chaos` redeploy nginx `host not found in upstream ...backend:8000`
race) currently **blocking the WC1 dependent-green proof**. Per §6.1 there is NO formal gate to pass
yet — ping likely fired on the "reconciler-side WC1/WC1.1/WC1.2 proven" wording in 819c1bc. I will
NOT log a WC1/WC1.1/WC1.2 PASS until the gate is formally CLAIMED and I run the marquee reproduce cold.
**Read-only pre-review done now (no live churn — avoids colliding with the Builder's W0.8 keycloak work):**
- Live state consistent with the W0.9 narrative: `warm-keycloak.service` active; live image
`keycloak/keycloak:26.6.2` + `mariadb:12.2`; `/var/lib/ci-warm/keycloak/last_good = 10.7.1+26.6.2`
(the recovered canonical — correctly NOT advanced to the simulated-broken 10.7.10).
- Static review of `runner/warm_reconcile.py` — no defects:
- WC1.2 safety gate runs BEFORE any snapshot/deploy (L335-343); a hold returns with NO
snapshot/deploy/rollback churn; both `held-major` + `held-manual-migration` alerts carry `release_notes`.
- `is_major_bump` is conservative: it holds on a major bump of EITHER the recipe-semver (pre-`+`) OR
the app-version (post-`+`), so a keycloak app-major (25->26, the DB-migration case) is also held.
This neutralizes a tag-format wording mismatch (plan §WC1.2 says `<upstream>+<recipe-semver>`; the
data the code actually observes is `<recipe-semver>+<app-version>`): checking both sides covers the
intent either way. Not a defect; noted so I don't re-flag it.
- WC1.1 rolls back on BOTH a deploy exception AND an unhealthy result (L356-362); stateful path
restores the snapshot before redeploying the prior version; raises if the rollback itself is
unhealthy. Alert `rollback` carries last_good/attempted/recovered/notes.
- **OPEN FLAG to confirm at the live reproduce:** `/var/lib/ci-warm/alerts/` is currently EMPTY,
though W0.9 claims a rollback alert was written there and the alert-relay archiving to `alerts/seen/`
is explicitly deferred/unwired, so a written alert should still be present. Likely benign (Builder
cleaned up the W0.9 test alert), but I MUST confirm a `*rollback*.json` alert actually lands during
my own cold reproduce (a rollback must not complete without an alert).
- **PLAN for the formal gate:** when WC1 is CLAIMED, run the Builder's reproduce (STATUS L79-83):
fake tags `10.7.9+26.6.2`(good) + `10.7.10+26.6.2`(broken KC_HOSTNAME), `CCCI_SKIP_FETCH=1
cc-ci-run runner/warm_reconcile.py keycloak` x2 → expect `upgraded:` then `rolled-back:`, marker
realm survives, last_good unchanged at prior, a `*rollback*.json` alert; PLUS the WC1 headline
(dependent SSO custom test green vs warm keycloak + concurrent distinct realms + reaping) + a
major/manual-migration WC1.2 hold proof. Sent a BUILDER-INBOX heads-up to coordinate keycloak timing.
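
For reference, the "check both sides of the `+`" conservatism noted in the static review can be sketched as below. This is a minimal illustration, NOT the code in `runner/warm_reconcile.py`: the names `split_tag`, `_major` and `is_major_bump_sketch` are hypothetical, the sketch assumes plain dotted-integer versions (no `v` prefix), and it assumes a "bump" means a strictly higher leading number.

```python
def _major(part: str) -> int:
    """Leading integer of a dotted version string, e.g. '26.6.2' -> 26.

    Assumption: versions are plain dotted integers (no 'v' prefix).
    """
    return int(part.split(".", 1)[0])


def split_tag(tag: str) -> tuple[str, str]:
    """Split a '<left>+<right>' tag into its two halves (hypothetical helper)."""
    left, sep, right = tag.partition("+")
    if not sep or not left or not right:
        raise ValueError(f"expected '<left>+<right>' tag, got {tag!r}")
    return left, right


def is_major_bump_sketch(current: str, candidate: str) -> bool:
    """Hold if EITHER half of the tag increases its major version.

    Checking both halves means the result does not depend on which half
    is the recipe-semver and which is the app-version.
    """
    cur_left, cur_right = split_tag(current)
    new_left, new_right = split_tag(candidate)
    return (
        _major(new_left) > _major(cur_left)
        or _major(new_right) > _major(cur_right)
    )


if __name__ == "__main__":
    # Patch-level recipe bump only: not held.
    print(is_major_bump_sketch("10.7.1+26.6.2", "10.7.10+26.6.2"))
    # App-major 25 -> 26 with a minor recipe bump: held.
    print(is_major_bump_sketch("10.7.1+25.0.0", "10.8.0+26.0.0"))
```

Because both halves are compared, the outcome is the same whichever tag-format reading (plan wording or observed data) turns out to be correct.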