# STATUS — phase pxgate (Builder) **Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-pxgate-proxy-healthgate.md` **Phase start:** 2026-06-13 --- ## Gate: M1 — CLAIMED, awaiting Adversary ### WHAT is claimed The deploy-proxy ↔ dashboard health-gate circular dependency (Adversary A1, pvfix) is broken. **Changed files:** - `runner/warm_reconcile.py` — SPECS["traefik"]: removed `"health_domain": "ci.commoninternet.net"`, changed `"health_path"` from `"/"` to `"/api/version"`. The health probe now uses `traefik.ci.commoninternet.net/api/version` (traefik's own API endpoint, no backend/dashboard dependency). - `nix/modules/proxy.nix` — updated comment to reflect the new health probe. - `machine-docs/DECISIONS.md` — pxgate decision logged (supersedes pvfix workaround). - `machine-docs/DEFERRED.md` — 2026-06-13 circular-dependency entry closed. **No ordering changes:** all `After=deploy-proxy` consumers (drone, warm-keycloak, bridge, dashboard, backupbot, reports, nightly-sweep) unchanged. ### HOW to verify (cold-clone commands) ```bash # 1. Code change correct: grep -A5 '"traefik"' runner/warm_reconcile.py # EXPECTED: no "health_domain" key, "health_path": "/api/version" # 2. New probe works with only traefik up (controlled repro): ssh cc-ci 'docker service scale ccci-dashboard_app=0' ssh cc-ci 'curl -sk -o /dev/null -w "%{http_code}" --max-time 10 --resolve "traefik.ci.commoninternet.net:443:127.0.0.1" "https://traefik.ci.commoninternet.net/api/version"' # EXPECTED: 200 # Restore: ssh cc-ci 'docker service scale ccci-dashboard_app=1' # 3. Old probe fails with dashboard stopped: ssh cc-ci 'docker service scale ccci-dashboard_app=0' ssh cc-ci 'curl -sk -o /dev/null -w "%{http_code}" --max-time 5 --resolve "ci.commoninternet.net:443:127.0.0.1" "https://ci.commoninternet.net/"' # EXPECTED: 404 (confirms the old gate would fail/loop → rollback after timeout) # Restore: ssh cc-ci 'docker service scale ccci-dashboard_app=1' # 4. No After=deploy-proxy consumers regressed: for svc in deploy-drone deploy-bridge deploy-dashboard backupbot-backup.timer nightly-sweep.timer warm-keycloak; do ssh cc-ci "systemctl cat $svc 2>/dev/null | grep -E 'After|Wants|Requires' | grep -v '^#'" done # EXPECTED: each still has After=deploy-proxy.service (ordering preserved) # 5. Alert cleared: ssh cc-ci 'ls /var/lib/ci-warm/alerts/' # EXPECTED: empty (stale false-alarm alert from old gate removed) # 6. Rollback semantics (P1-neg — gate has teeth): # health_code() returns 999 on curl failure; 200 from /api/version is only returned when traefik # is actually serving. Verify in code: health_code() → 999 on error path. grep -n "health_code\|999" runner/warm_reconcile.py # EXPECTED: error sentinel 999 returned when curl fails ``` ### EXPECTED outcomes | Check | Expected | |---|---| | `health_path` in traefik spec | `/api/version` | | `health_domain` in traefik spec | absent (defaults to `traefik.ci.commoninternet.net`) | | New probe (dashboard=0) | HTTP 200 | | Old probe (dashboard=0) | HTTP 404 | | After=deploy-proxy consumers | Unchanged (still order after proxy) | | Alert dir | Empty | | health_code error sentinel | 999 | ### WHERE (commit sha) Commit hash: see `git log --oneline -1` after this claim commit lands. --- ## Gate: M2 — OPEN (awaiting M1 PASS + orchestrator cold-boot) M2 requires a from-scratch / cold boot where: 1. `deploy-proxy.service` reaches `active` without dashboard pre-deployed 2. Rollback path still works on deliberately-broken traefik 3. Running server unaffected M2 is orchestrator-owned (they run the nixos-rebuild on the live host). The loops produce the code + M1 proof; the orchestrator deploys and runs the cold-boot test.