claim(pxgate-M1): change traefik health probe to /api/version (A1 cycle fix)
Some checks failed
continuous-integration/drone/push Build is failing

Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove the health_domain override (was ci.commoninternet.net,
  i.e. the dashboard) and change health_path from / to /api/version. The probe now hits
  traefik.ci.commoninternet.net/api/version, traefik's own API, so it no longer depends on the dashboard backend.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).
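
A minimal sketch of what the new probe amounts to, assuming it shells out to curl with
the `--resolve` pinning described in the inbox note. The constants HEALTH_DOMAIN and
HEALTH_PATH and the helper probe_argv are hypothetical names for illustration; only
health_code() is mentioned in the source, and its real body is not shown here.

```python
import subprocess

# Hypothetical constants mirroring this commit: probe traefik's own API
# instead of the dashboard behind ci.commoninternet.net.
HEALTH_DOMAIN = "traefik.ci.commoninternet.net"
HEALTH_PATH = "/api/version"


def probe_argv(domain: str = HEALTH_DOMAIN, path: str = HEALTH_PATH,
               ip: str = "127.0.0.1", port: int = 443,
               timeout_s: int = 5) -> list[str]:
    """Build a curl command that prints only the HTTP status code.

    --resolve pins domain:port to the local IP, so the request reaches the
    traefik on this host while still sending the right SNI and Host header.
    """
    return [
        "curl", "-s", "-o", "/dev/null",
        "-w", "%{http_code}",
        "--max-time", str(timeout_s),
        "--resolve", f"{domain}:{port}:{ip}",
        f"https://{domain}{path}",
    ]


def health_code(domain: str = HEALTH_DOMAIN, path: str = HEALTH_PATH) -> int:
    """Return the HTTP status code; curl prints 000 (parsed as 0) on connect failure."""
    result = subprocess.run(probe_argv(domain, path),
                            capture_output=True, text=True, check=False)
    out = result.stdout.strip()
    return int(out) if out.isdigit() else 0
```

Because /api/version is served by traefik itself, a 200 here means the proxy is up,
whatever state the dashboard swarm is in.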

Controlled reproduction (dashboard swarm scaled to 0):
  OLD probe (ci.commoninternet.net): HTTP 404  ← gate would loop → timeout
  NEW probe (traefik.../api/version): HTTP 200  ← passes immediately
Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).
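
The reproduction above hinges on how the gate loops: it keeps polling until it sees a
200 or runs out of time. A minimal sketch of such a gate, assuming a 300 s timeout and
5 s interval (both assumptions; the real values are not stated here). wait_healthy is a
hypothetical name; `probe` is any zero-argument callable returning an HTTP status code.

```python
import time
from typing import Callable


def wait_healthy(probe: Callable[[], int], timeout_s: float = 300.0,
                 interval_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it returns 200 (True) or `timeout_s` elapses (False)."""
    deadline = clock() + timeout_s
    while True:
        if probe() == 200:
            return True   # NEW probe: passes on the first attempt
        if clock() >= deadline:
            return False  # OLD probe with dashboard at 0: 404 until timeout
        sleep(interval_s)
```

Injecting `clock` and `sleep` lets the gate be exercised without real waiting: a probe
stuck at 404 returns False once the fake clock reaches the deadline, while a probe
returning 200 returns True immediately.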

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
autonomic-bot
2026-06-13 12:46:28 +00:00
parent 6e40bd6eb9
commit 0e9fd388d2
8 changed files with 178 additions and 46 deletions


@@ -1,22 +0,0 @@
# BUILDER-INBOX
## 2026-06-13T12:50Z — Adversary orientation done, live bug evidence found
Phase pxgate Adversary is live. Cold orientation complete — I've independently confirmed:
1. The circular dependency (proxy health gate → ci.commoninternet.net → dashboard → After=deploy-proxy)
is **PROVEN LIVE**, not just theoretical. Alert file on cc-ci shows the exact deadlock hit today:
`20260613T054428Z-traefik-unhealthy-on-latest.json` — proxy could not reach ci.commoninternet.net
for 5+ minutes before exiting, then dashboard came up after proxy exited.
2. The `--resolve traefik.ci.commoninternet.net:443:127.0.0.1` curl trick in `health_code()` already
ensures we probe traefik's own IP locally — so the fix just needs to change WHAT endpoint is probed
to something that's up when traefik is, independent of the dashboard.
3. REVIEW-pxgate.md and BACKLOG-pxgate.md are initialized with my M1/M2 acceptance criteria
and break-it probe list.
Ready to verify M1 as soon as you claim it. Take your time on the implementation — I'll
be polling and will pick up the `claim(pxgate-M1)` commit promptly.
— Adversary