claim(pxgate-M1): change traefik health probe to /api/version (A1 cycle fix)
Some checks failed
continuous-integration/drone/push Build is failing
Some checks failed
continuous-integration/drone/push Build is failing
Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix): - runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net, the dashboard). Change health_path from / to /api/version. The probe now uses traefik.ci.commoninternet.net/api/version — traefik's own API, no backend/dashboard dep. - nix/modules/proxy.nix: update comment to reflect new health probe. - machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround). - machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed. - Consumed BUILDER-INBOX.md (Adversary orientation msg). Controlled reproduction (dashboard swarm scaled to 0): OLD probe (ci.commoninternet.net): HTTP 404 ← gate would loop → timeout NEW probe (traefik.../api/version): HTTP 200 ← passes immediately Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host. No After=deploy-proxy consumers changed (ordering preserved). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
@ -6,7 +6,7 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
|
||||
|
||||
- **nixos-rebuild submodule protocol — SETTLED (2026-06-13, phase pvfix).** The canonical nixos-rebuild command on the live host is `nixos-rebuild switch --flake "git+file:///root/builder-clone?submodules=1#cc-ci"`. The `path:` scheme does NOT support `?submodules=1` in this Nix version; `git+file://` does. Plain `nixos-rebuild switch --flake /root/builder-clone#cc-ci` fails with `secrets/secrets.yaml does not exist` because the git submodule is not included in the nix store copy.
|
||||
|
||||
- **deploy-proxy health gate ordering — SETTLED (2026-06-13, phase pvfix).** After stack teardown + nixos-rebuild, the deploy-proxy service's health gate (`ci.commoninternet.net → 200`) blocks until the dashboard is deployed. Since the deploy-* chain is `After=`-ordered but not concurrently started on a manual `systemctl restart deploy-proxy`, the fix is to `systemctl start deploy-drone deploy-bridge deploy-dashboard deploy-reports` concurrently in a separate invocation while deploy-proxy waits. Normal boot behavior (all WantedBy=multi-user.target services start concurrently with ordering) handles this automatically; only manual per-service restart needs the workaround.
|
||||
- **deploy-proxy health gate — SETTLED (2026-06-13, phase pxgate, supersedes pvfix workaround).** Changed the traefik health probe from `ci.commoninternet.net/` (dashboard, ordered After=deploy-proxy → circular on cold boot) to `traefik.ci.commoninternet.net/api/version` (Traefik's own API endpoint, no backend/dashboard dependency). A broken traefik still fails the gate (returns non-200 or times out), so rollback semantics are preserved. Controlled reproduction confirms: with dashboard scaled to 0, old probe returns 404, new probe returns 200. Cold-boot deadlock eliminated. DEFERRED item 2026-06-13 closed by this fix. (Old pvfix note about concurrent manual restart workaround is now superseded.)
|
||||
|
||||
- **cfold deprecated-folder policy — SETTLED (2026-06-12, phase cfold).** `tests/<recipe>/custom/`
|
||||
is the canonical home for custom tests. Discovery keeps recognizing legacy `functional/` and
|
||||
|
||||
Reference in New Issue
Block a user