claim(pxgate-M1): change traefik health probe to /api/version (A1 cycle fix)
Some checks failed
continuous-integration/drone/push Build is failing

Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net,
  the dashboard). Change health_path from / to /api/version. The probe now uses
  traefik.ci.commoninternet.net/api/version — traefik's own API, no backend/dashboard dep.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).

Controlled reproduction (dashboard swarm scaled to 0):
  OLD probe (ci.commoninternet.net): HTTP 404  ← gate would loop → timeout
  NEW probe (traefik.../api/version): HTTP 200  ← passes immediately
Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
autonomic-bot
2026-06-13 12:46:28 +00:00
parent 6e40bd6eb9
commit 0e9fd388d2
8 changed files with 178 additions and 46 deletions

View File

@ -6,11 +6,13 @@
#
# Phase-2w / WC1.1: traefik is now UNPINNED + health-gated like keycloak — the deploy is driven by
# the shared `runner/warm_reconcile.py traefik` (STATELESS = version-rollback-only, NO snapshot):
# record last-good version → deploy latest tag → health-gate (a ROUTED host, the dashboard
# ci.commoninternet.net, returns 200) → healthy commits last-good / unhealthy rolls back to last-good
# + alert. traefik's wildcard-cert/file-provider config (ssl_cert/ssl_key secrets, WILDCARDS_ENABLED,
# COMPOSE_FILE) is preserved EXACTLY by the spec's `setup` (warm_reconcile._traefik_setup). The
# runner/ tree is copied into the nix store → D8-clean; recipe fetched at runtime → closure stable.
# record last-good version → deploy latest tag → health-gate (traefik.ci.commoninternet.net/api/version
# returns 200 — traefik's own API, no backend dep) → healthy commits last-good / unhealthy rolls back
# to last-good + alert. Phase-pxgate: changed from ci.commoninternet.net (dashboard) to avoid the
# cold-boot deadlock (deploy-dashboard is After=deploy-proxy; A1 fix). traefik's wildcard-cert/file-
# provider config (ssl_cert/ssl_key secrets, WILDCARDS_ENABLED, COMPOSE_FILE) is preserved EXACTLY by
# the spec's `setup` (warm_reconcile._traefik_setup). The runner/ tree is copied into the nix store →
# D8-clean; recipe fetched at runtime → closure stable.
#
# Idempotent-RECONCILE systemd oneshot (unchanged unit name `deploy-proxy` — other modules order
# after it): converges every activation/boot, self-healing drift. No run-once sentinel.