claim(pxgate-M1): change traefik health probe to /api/version (A1 cycle fix)
Some checks failed
continuous-integration/drone/push Build is failing
Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net, the dashboard). Change health_path from / to /api/version. The probe now uses traefik.ci.commoninternet.net/api/version: traefik's own API, no backend/dashboard dep.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).

Controlled reproduction (dashboard swarm scaled to 0):
  OLD probe (ci.commoninternet.net): HTTP 404 ← gate would loop → timeout
  NEW probe (traefik.../api/version): HTTP 200 ← passes immediately

Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
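For readers who want to see the shape of the gate described in the message, here is a minimal sketch of a poll-until-200 health probe. It is not the code in `runner/warm_reconcile.py`: the function names `probe_status` and `wait_for_healthy`, the 5-second polling interval, and the injectable `fetch`/`sleep`/`clock` parameters are all assumptions made for illustration. Only the probe URL and the 5-minute wait come from this commit.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# URL taken from the commit message; everything else below is hypothetical.
NEW_PROBE_URL = "https://traefik.ci.commoninternet.net/api/version"


def probe_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # Non-2xx responses (e.g. the 404 seen with the old probe) land here.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, TLS error, timeout, ...
        return 0


def wait_for_healthy(
    url: str,
    fetch: Callable[[str], int] = probe_status,
    deadline_s: float = 300.0,   # the 5-minute gate mentioned in DEFERRED.md
    interval_s: float = 5.0,     # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll url until it answers 200 or the deadline passes."""
    start = clock()
    while True:
        if fetch(url) == 200:
            return True
        if clock() - start >= deadline_s:
            return False
        sleep(interval_s)
```

With the dashboard scaled to 0, a gate of this shape pointed at the old target would keep seeing 404 until the deadline. Pointed at `NEW_PROBE_URL`, it returns as soon as Traefik itself answers.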
@ -410,15 +410,5 @@ reachable via the operator/dev STAGES escape — production drone runs always ru
(one-line guard in `should_promote_canonical`), or whether dev hand-runs promoting is acceptable.
### 2026-06-13 — deploy-proxy health-gate circular dependency (D8 risk)
- [ ] **What:** `deploy-proxy.service` health gate waits for `ci.commoninternet.net → 200`, served by
`deploy-dashboard.service` which is ordered `After=deploy-proxy.service`. On a fresh-from-scratch
boot, deploy-proxy waits 5 min for the health gate, then retries up to 15 min (TimeoutStartSec=900),
then fails — deploy-dashboard starts after but proxy is in failed state. Filed as A1 by the Adversary
(2026-06-13, phase pvfix). See `machine-docs/BACKLOG-pvfix.md`.
- [x] **CLOSED @2026-06-13 (Builder, phase pxgate).** Fixed in `runner/warm_reconcile.py` — traefik health probe changed from `ci.commoninternet.net/` (dashboard, ordered After=deploy-proxy) to `traefik.ci.commoninternet.net/api/version` (Traefik's own API, no backend dependency). Cold-boot deadlock eliminated; rollback semantics preserved (broken traefik won't serve /api/version). Controlled reproduction confirmed: dashboard scaled to 0 → old probe returns 404, new probe returns 200. M1 claimed. Adversary PASS pending for DONE. See DECISIONS.md 2026-06-13 pxgate entry.
- **Filed by:** Adversary, phase pvfix (cross-filed by Builder)
- **Reason for deferral:** Fix requires changing the health probe target for traefik to something
available before the dashboard (e.g. a Traefik-internal health path like `https://traefik.ci.commoninternet.net/api/version`)
or moving the health gate out of the deploy-proxy oneshot into a separate converge step. Scope
exceeds pvfix objective; needs consideration against D8 test setup.
- **Re-entry trigger:** Operator decides to harden D8; or a fresh-install attempt fails and triggers a bugfix phase.
- **Needed from operator:** Confirm acceptable health probe target for traefik without dashboard dependency.
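
The deadlock in the entry above can be illustrated as a cycle in a "waits on" graph. The sketch below is only a toy model, not part of the repository: `find_cycle` and the two dictionaries are hypothetical, and treating the health gate as an edge from `deploy-proxy.service` to `deploy-dashboard.service` is a modelling assumption rather than something systemd itself tracks.

```python
from typing import Dict, List, Optional, Set


def find_cycle(waits_on: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Depth-first search; return one cycle as a node path, or None."""
    visiting: List[str] = []   # current DFS path
    done: Set[str] = set()     # nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in visiting:
            # Back edge: slice the path from the first occurrence of node.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for dep in sorted(waits_on.get(node, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in sorted(waits_on):
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# Old probe: the proxy's health gate needs the dashboard to answer,
# while the dashboard is ordered After=deploy-proxy.service.
OLD = {
    "deploy-proxy.service": {"deploy-dashboard.service"},
    "deploy-dashboard.service": {"deploy-proxy.service"},
}

# New probe: the gate only needs Traefik's own /api/version.
NEW = {
    "deploy-proxy.service": set(),
    "deploy-dashboard.service": {"deploy-proxy.service"},
}
```

In this model `find_cycle(OLD)` reports a loop between the two units and `find_cycle(NEW)` reports none, which matches the cold-boot behaviour described in the entry.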