Break the deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix):

- runner/warm_reconcile.py: remove health_domain override (was ci.commoninternet.net, the dashboard). Change health_path from / to /api/version. The probe now uses traefik.ci.commoninternet.net/api/version — traefik's own API, no backend/dashboard dep.
- nix/modules/proxy.nix: update comment to reflect new health probe.
- machine-docs/DECISIONS.md: pxgate fix logged (supersedes pvfix manual workaround).
- machine-docs/DEFERRED.md: 2026-06-13 circular-dependency entry closed.
- Consumed BUILDER-INBOX.md (Adversary orientation msg).

Controlled reproduction (dashboard swarm scaled to 0):

- OLD probe (ci.commoninternet.net): HTTP 404 ← gate would loop → timeout
- NEW probe (traefik.../api/version): HTTP 200 ← passes immediately

Stale false-alarm alert 20260613T054428Z-traefik-unhealthy-on-latest.json cleared on host.
No After=deploy-proxy consumers changed (ordering preserved).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# BACKLOG — phase pxgate

## Build backlog

(Builder-owned — Adversary reads only)

- [x] Create phase state files (STATUS/JOURNAL/BACKLOG-pxgate.md)
- [x] Change `health_path` from `/` to `/api/version`; drop `health_domain` override in `runner/warm_reconcile.py`
- [x] Update stale comments in warm_reconcile.py + proxy.nix
- [x] Update DECISIONS.md + DEFERRED.md
- [x] Run controlled reproduction (dashboard swarm scaled 0 → old=404, new=200)
- [x] Claim M1

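The probe change recorded above can be sketched roughly as follows. This is a minimal illustration, not the code in `runner/warm_reconcile.py`: the signature of `health_code`, the `gate_passes` helper, the `https` scheme in the example URL, and the 5-second timeout are all assumptions made for the sketch.

```python
import urllib.error
import urllib.request


def health_code(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no HTTP response arrived.

    Hypothetical signature; the real helper may differ.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 404).
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout: nothing answered.
        return 0


def gate_passes(code: int) -> bool:
    """Assumed gate rule: only an exact HTTP 200 counts as healthy."""
    return code == 200
```

Under these assumptions, `health_code("https://traefik.ci.commoninternet.net/api/version")` would correspond to the new probe. With the dashboard scaled to 0 the old probe's 404 fails `gate_passes`, while a 200 from traefik's own API passes regardless of the dashboard's state.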
## Adversary findings

No findings yet. Recording break-it probes to run once the fix lands.

### Break-it probes to execute at M1 gate

- [ ] **P1-neg (traefik-down gate fails):** Stop traefik service; verify `health_code` returns non-200
  and the reconciler would roll back. (Prove the new gate has teeth — not always-pass.)
- [ ] **P2-controlled-repro:** Simulate dashboard-absent scenario: with dashboard held back (or stopped),
  run the NEW reconciler → verify it completes healthy (no deadlock). Run the OLD reconciler with
  dashboard held back → verify it hangs/fails (confirm the fix actually breaks the cycle).
- [ ] **P3-ordering:** Confirm `After=deploy-proxy` consumers (drone, warm-keycloak, bridge, dashboard,
  backupbot, reports-nightly) still order correctly. Check `systemctl cat <service>` for each.
- [ ] **P4-alert-cleared:** Verify the 20260613T054428Z unhealthy-on-latest alert is addressed (either
  the Builder explicitly handles it, or the fix makes the next reconcile cycle healthy).
- [ ] **P5-secret-leak:** grep `/var/lib/ci-warm/alerts/` for any secret values (keys, passwords).
  The alert file must contain only version strings, no credentials.
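
One possible way to script P5, as a rough sketch rather than an agreed procedure: scan each alert file for lines that look like credentials. The patterns, the function names `find_suspect_lines` and `scan_alert_dir`, and the `*.json` glob are assumptions for illustration; a real check may need a broader pattern list and still warrants a manual read of the files.

```python
import re
from pathlib import Path

# Heuristic patterns only (assumed for this sketch, not a complete detector).
SUSPECT_PATTERNS = [
    # A credential-like key followed by ':' or '=', e.g. "password": "..."
    re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b\s*[\"']?\s*[:=]"),
    # PEM private key header
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def find_suspect_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) for each line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in SUSPECT_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_alert_dir(alert_dir: str) -> dict[str, list[tuple[int, str]]]:
    """Map each alert file name with suspect lines to those lines."""
    results = {}
    for path in sorted(Path(alert_dir).glob("*.json")):
        hits = find_suspect_lines(path.read_text(errors="replace"))
        if hits:
            results[path.name] = hits
    return results
```

Calling `scan_alert_dir("/var/lib/ci-warm/alerts/")` on the host would return an empty dict when no file matches the heuristics. An empty result only shows that these patterns found nothing, not that the files are free of credentials.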