# BACKLOG — phase pvfix ## Build backlog - [x] Seed pvfix state files - [x] Read plan-phase-pvfix-swarm-proxy.md + runbook - [x] Inspect live host subnets + services on proxy - [x] Patch nix/modules/swarm.nix (add --subnet 10.10.0.0/16) - [x] Write exact maintenance procedure in STATUS-pvfix.md - [x] **CLAIM M1** — awaiting Adversary review - [x] Execute live maintenance (after M1 PASS) - [x] Verify health post-maintenance - [x] **CLAIM M2** — awaiting Adversary verification ## Adversary findings ### A1 [adversary] deploy-proxy health gate circular dependency on fresh boot **Filed:** 2026-06-13T05:49Z **Severity:** D8 risk — from-scratch install deadlocks deploy-proxy for up to 15 min on first boot **Status:** OPEN **Description:** `deploy-proxy.service` runs `warm_reconcile.py traefik` whose health gate checks `ci.commoninternet.net` returns HTTP 200. That URL is served by the dashboard. `deploy-dashboard.service` has `After=deploy-proxy.service` (`nix/modules/dashboard.nix`), so systemd holds deploy-dashboard until deploy-proxy exits. On a fresh-from-scratch boot: 1. deploy-proxy starts, deploys traefik, calls `wait_healthy` → polls `ci.commoninternet.net` 2. deploy-dashboard is blocked by `After=deploy-proxy.service` (systemd won't start it) 3. `ci.commoninternet.net` never returns 200 (dashboard not up) 4. deploy-proxy times out at `TimeoutStartSec=900` (15 min) and fails 5. deploy-dashboard then starts but proxy is in failed state **Repro (controlled):** ```bash # Simulate on live host: systemctl stop deploy-dashboard deploy-proxy systemctl reset-failed deploy-dashboard deploy-proxy # Observe: starting deploy-proxy without deploy-dashboard running → wait_healthy loops until timeout systemctl start deploy-proxy & journalctl -u deploy-proxy -f # confirms repeated curl ci.commoninternet.net failures ``` **Root cause:** `warm_reconcile.py traefik` spec has `health_domain = "ci.commoninternet.net"` (a routed host proving Traefik routes + TLS — valid goal, wrong URL for a service ordered-after). **Fix options for Builder:** 1. Change `health_domain` to a URL independent of ordered services (e.g. a Traefik `api/ping` endpoint on `traefik.ci.commoninternet.net`, or `drone.ci.commoninternet.net` which starts concurrently with deploy-proxy since deploy-drone only has `After=deploy-proxy` — but that would also be circular since drone is after proxy too). 2. Remove `deploy-proxy.service` from deploy-dashboard's `after` list — dashboard becomes concurrent with proxy on boot (fine: it's a static web server, just won't be routable until Traefik is up, which is tolerable). 3. Add `Wants=deploy-dashboard.service` + `After=deploy-dashboard.service` to deploy-proxy, so systemd starts dashboard before proxy runs its health gate (reverses the current ordering). **Note:** Pre-existing, not introduced by pvfix. Manual maintenance worked around it by starting deploy-dashboard concurrently. Only a cold from-scratch boot or deliberate service reset exposes the deadlock. Builder flagged it in STATUS-pvfix.md anomaly note. **Only the Adversary closes this item**, after re-test confirms the fix resolves the deadlock.