diff --git a/machine-docs/REVIEW-redfix.md b/machine-docs/REVIEW-redfix.md
index e2c6d82..58f93d6 100644
--- a/machine-docs/REVIEW-redfix.md
+++ b/machine-docs/REVIEW-redfix.md
@@ -45,3 +45,22 @@ _(none yet — awaiting Builder bootstrap + first gate claim)_
   "deploys+serves fine, only the overlay test reds." Will run when M1 is claimed and the swarm is
   free (Builder not deploying). Same for bluesky app-alias collision (needs live caddy/getent diag).
   These are NOT verdicts — formal M1 PASS/FAIL awaits the Builder's gate claim.
+- 2026-06-18T00:25Z — **M1 CLAIMED** (commit 0a06c41). Node verified idle/clean before any run
+  (only infra + live warm-keycloak; no bluesky/test stacks; no run_recipe_ci; load 0.03; gitea idle
+  3.5.3) — Builder "node clean" claim ✔. Began my own COLD isolation re-runs (one at a time, no
+  concurrent load), swarm confirmed free.
+- 2026-06-18T00:29Z — **bluesky-pds CONFIRMED by my own reproduction** (`/tmp/adv-bluesky.log`,
+  tag 0.3.0+v0.4.219, RECIPE=bluesky-pds CCCI_SKIP_FETCH=1). Cold lifecycle GREEN (install/backup/
+  restore/custom=pass, upgrade=skip) — reproduced. WC5 promote → unhealthy, 000. DECISIVE live diag
+  inside the warm caddy container (60326521a2ac, nets: proxy=10.10.52.13 + internal=10.0.5.3):
+  * `getent hosts app` → **10.10.0.4** (a *proxy*-net foreign endpoint) — NOT bluesky's own app.
+  * bluesky's OWN app is at internal **10.0.5.6** (real target), never resolved.
+  * caddy TLS log cycles `dial tcp 10.10.0.{4,5,6,8,10,11,12}:3000: connect: connection refused`
+    on `ask http://app:3000/tls-check` → on-demand cert denied → TLS fails → /xrpc/_health = 000.
+  Verdict basis: NOT a flake (deterministic, every retry refused); NOT promote-machinery (the probe
+  correctly refuses an unhealthy endpoint, no false promote); **genuine recipe routing defect** —
+  recipe names its svc `app` + puts caddy on the shared multi-tenant `proxy` net + Caddyfile uses bare
+  `app`, so docker DNS resolves `app` to OTHER stacks' apps. Builder's classification (recipe defect,
+  reverses the plan's "cc-ci warm-machinery" prior) is CORRECT. Sharper than Builder's note (my run's
+  internal IP 10.0.5.6 vs their 10.0.3.3 — same mechanism, different deploy). Letting run finish + will
+  tear down the orphan warm-bluesky stack. [interim — full M1 verdict batched after mumble+discourse.]
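
Reviewer aid, not part of the committed hunk: the "resolved to a foreign endpoint" check in the 00:29Z entry can be sketched as a subnet membership test on whatever `getent hosts app` returns inside the caddy container. This is a minimal sketch under stated assumptions. The subnet masks (`/16` for `proxy`, `/24` for `internal`) are guesses inferred only from the addresses quoted in the entry, and the helper names `classify` and `is_own_app` are hypothetical.

```python
import ipaddress

# ASSUMPTION: the masks below are inferred from the addresses in the log entry
# (proxy=10.10.52.13, internal=10.0.5.3); the real subnet sizes are not stated.
PROXY_NET = ipaddress.ip_network("10.10.0.0/16")    # shared multi-tenant `proxy` net
INTERNAL_NET = ipaddress.ip_network("10.0.5.0/24")  # this deploy's `internal` net


def classify(addr: str) -> str:
    """Return which assumed network a resolved address falls in."""
    ip = ipaddress.ip_address(addr)
    if ip in INTERNAL_NET:
        return "internal"
    if ip in PROXY_NET:
        return "proxy"
    return "unknown"


def is_own_app(resolved: str) -> bool:
    """True only when `app` resolved to an address on the stack's own internal net."""
    return classify(resolved) == "internal"


if __name__ == "__main__":
    # Addresses taken from the log entry: the one docker DNS returned for `app`,
    # and the one bluesky's own app actually had in this deploy.
    for addr in ("10.10.0.4", "10.0.5.6"):
        print(addr, classify(addr), is_own_app(addr))
```

Under these assumed subnets the resolved address lands on the shared net and the real target on the internal one, which matches the entry's reading that bare `app` is answered by another stack. The internal range changes per deploy (the Builder's run saw 10.0.3.3), so `INTERNAL_NET` would need to be set for each run.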