# REVIEW — phase pvfix (Adversary)

Adversary clone: `/srv/cc-ci/cc-ci-adv`

Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase-pvfix-swarm-proxy.md`

---

## Phase context (initial orientation, 2026-06-13T05:30Z)

Cold check of live host and current repo:

- `docker network inspect proxy` → Subnet: `10.0.1.0/24` (default /24 — the exhaustion vector)
- `docker network ls | grep proxy` → `ab54qfa7gsk5 proxy overlay swarm`
- `nix/modules/swarm.nix` → `swarm-init` creates proxy without `--subnet`, inheriting Docker's default `/24`. No explicit subnet configured.
- Builder has not started pvfix work yet (no STATUS-pvfix.md in repo).

The fix is needed. Watching for Builder M1 claim (patch + procedure + live inspection proof).

### Break-it probe: live host subnet collision check (2026-06-13T05:31Z)

Existing subnets on host:

- `ingress`: `10.0.0.0/24`
- `proxy` (current): `10.0.1.0/24`
- `docker0`: `172.17.0.0/16`
- `docker_gwbridge`: `172.18.0.0/16`
- Host IP: `91.98.47.73` (public), `100.95.31.88` (tailscale), gateway `172.31.1.1`

**10.10.0.0/16 (proposed):** does NOT collide with any existing subnet. Safe.

Services currently on proxy (will be disrupted during recreation):

- `traefik` → 10.0.1.9
- `ccci-reports` → 10.0.1.7
- `drone` → 10.0.1.12
- `ccci-bridge` → 10.0.1.248
- `ccci-dashboard` → 10.0.1.249
- `warm-keycloak` → 10.0.1.251

Stacks currently running (all will briefly lose routing): `backups`, `ccci-bridge`, `ccci-dashboard`, `ccci-reports`, `drone`, `traefik`, `warm-keycloak`

**Maintenance window status:** CLEAR — no active recipe test stacks (`*-pr*`), no cfold sweep, no /upgrade-all visible. A quiet window is available now.

**Key risk to probe when M2 is claimed:** confirm that after proxy recreation, all 6 services above rejoin with healthy VIP allocations and Traefik routes are reachable end-to-end.
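
The collision verdict in the break-it probe reduces to interval arithmetic: an IPv4 CIDR block is a contiguous range of 32-bit integers, and two blocks overlap iff each one starts no later than the other ends. For scale, a /24 spans 256 addresses and a /16 spans 65,536, before any reserved addresses are subtracted. Below is a minimal bash sketch of that check, assuming the subnet snapshot hard-coded in it (taken from this review's host inventories; refresh it from the live host before reuse). It makes no docker calls, and the helpers `ip_to_int`, `cidr_range` and `cidr_overlap` are hypothetical, not part of the repo.

```shell
#!/usr/bin/env bash
# Sketch: does a candidate IPv4 CIDR overlap any existing subnet?
# Pure bash 64-bit arithmetic, no docker calls. Helper names are
# hypothetical; the subnet snapshot below must be refreshed before reuse.
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
  local ip=${1%/*} len=${1#*/}
  local base mask first
  base=$(ip_to_int "$ip")
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  first=$(( base & mask ))
  echo "$first $(( first | (~mask & 0xFFFFFFFF) ))"
}

# Two blocks overlap iff each one starts no later than the other ends.
cidr_overlap() {
  local a1 a2 b1 b2
  read -r a1 a2 <<<"$(cidr_range "$1")"
  read -r b1 b2 <<<"$(cidr_range "$2")"
  (( a1 <= b2 && b1 <= a2 ))
}

candidate=10.10.0.0/16
# Snapshot of host subnets from this review (probe + M1 inventory).
existing=(10.0.0.0/24 10.0.1.0/24 10.0.2.0/24 10.0.3.0/24 10.0.4.0/24
          172.17.0.0/16 172.18.0.0/16)

for net in "${existing[@]}"; do
  if cidr_overlap "$candidate" "$net"; then
    echo "COLLISION: $candidate overlaps $net"
  else
    echo "ok: $candidate clear of $net"
  fi
done
```

Against this snapshot every line prints `ok:`, matching the probe's verdict; a candidate such as `10.0.0.0/8` would instead report collisions with all the `10.0.x.0/24` networks.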
---

## M1: PASS @2026-06-13T05:33Z

**Claim:** `nix/modules/swarm.nix` patched with `--subnet 10.10.0.0/16`; maintenance procedure documented; chosen /16 proven safe from live host inspection.

**Commit:** `e6349a9` (`claim(pvfix-M1): proxy /16 patch + maintenance plan ready`)

### Cold-run evidence

**1. Patch in repo:**

```
grep -n 'subnet' nix/modules/swarm.nix
→ 47: docker network create --driver overlay --attachable --subnet 10.10.0.0/16 proxy
```

Correct. The `if ! docker network inspect proxy` guard makes the create idempotent. The accompanying comment accurately names the failure mode and the runbook. ✓

**2. Subnet safety — live host inspection:**

```
docker network inspect $(docker network ls -q) --format "{{.Name}}: {{range .IPAM.Config}}{{.Subnet}}{{end}}"
→
backups_ci_commoninternet_net_default: 10.0.4.0/24
bridge: 172.17.0.0/16
docker_gwbridge: 172.18.0.0/16
host: (none)
ingress: 10.0.0.0/24
none: (none)
proxy: 10.0.1.0/24
traefik_ci_commoninternet_net_internal: 10.0.2.0/24
warm-keycloak_ci_commoninternet_net_internal: 10.0.3.0/24
```

Builder's table matches exactly. `10.10.0.0/16` is clear of all existing networks. ✓

**3. Maintenance procedure review:**

- **Service names confirmed correct** against live host: `deploy-proxy`, `deploy-drone`, `deploy-bridge`, `deploy-dashboard`, `deploy-reports`, `warm-keycloak` — all exist as active oneshot services. ✓
- **backups stack correctly excluded** — `backups_ci_commoninternet_net_default` (10.0.4.0/24) is NOT on `proxy` (confirmed via proxy Containers inspection). ✓
- **Step sequencing is safe:** stack rm → drain wait → network rm → nixos-rebuild (triggers swarm-init with new `--subnet`) → restart deploy services. ✓
- **nixos-rebuild will restart swarm-init:** the `swarm-init.service` unit script changed (added `--subnet` flag), and `nixos-rebuild switch` reloads systemd and restarts changed units.
✓
- **Note (non-blocking recommendation):** Builder may want to add an explicit `systemctl restart swarm-init` after nixos-rebuild as belt-and-braces insurance (in case daemon-reload timing is unusual). Not required for correctness, but it eliminates any ambiguity.

**M1 PASS — safe to execute the maintenance procedure.** Waiting for Builder M2 claim.

## M2: PASS @2026-06-13T05:49Z

**Claim:** proxy recreated as 10.10.0.0/16; nixos-rebuild applied; all services healthy; routes up.

**Commits:** `e6349a9` (patch), `71319d7` (M2 claim)

### Cold-run evidence (all 4 acceptance checks + pre-verification probe)

**1. Proxy subnet:**

```
ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}} created={{.Created}}"'
→ 10.10.0.0/16 created=2026-06-13 05:38:02.125154677 +0000 UTC
```

Network recreated at 05:38:02 UTC. ✓

**2. All 9 services at 1/1:**

```
backups_ci_commoninternet_net_app 1/1
ccci-bridge_app 1/1
ccci-dashboard_app 1/1
ccci-reports_app 1/1
drone_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_socket-proxy 1/1
warm-keycloak_ci_commoninternet_net_app 1/1
warm-keycloak_ci_commoninternet_net_db 1/1
```

All 1/1. ✓

**3. swarm-init activation time:**

```
systemctl status swarm-init --no-pager | grep Active
→ Active: active (exited) since Sat 2026-06-13 05:38:17 UTC; 9min ago
```

Activated at 05:38:17 UTC, 15 s after the proxy network was created (05:38:02), consistent with the new swarm-init run having created it. nixos-rebuild applied the new unit. ✓

**4. Core routes:**

```
curl -sI https://ci.commoninternet.net/ → HTTP/2 200
curl -sI https://drone.ci.commoninternet.net/ → HTTP/2 303
```

Both healthy (303 is Drone's login redirect). ✓

**5. Active swarm-init script has `--subnet`:**

```
/nix/store/…/swarm-init-start: docker network create --driver overlay --attachable --subnet 10.10.0.0/16 proxy
```

Confirms the nixos-rebuild was applied. ✓

**M2 PASS — proxy VIP exhaustion fix is live and durable.** See [adversary] finding A1 below (health gate circular dependency, pre-existing, not introduced by pvfix).
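
Acceptance check 2 in M2 is a visual scan of replica counts. A minimal sketch of automating it follows, assuming input lines of the form `NAME REPLICAS`, as printed by `docker service ls --format '{{.Name}} {{.Replicas}}'`. The `check_replicas` helper is hypothetical and reads stdin, so it can be exercised without a swarm.

```shell
#!/usr/bin/env bash
# Sketch: flag swarm services whose running replica count differs from the
# desired count. Expects "NAME REPLICAS" lines on stdin, e.g. from
#   docker service ls --format '{{.Name}} {{.Replicas}}'
# check_replicas is a hypothetical helper, not part of the repo.
set -euo pipefail

check_replicas() {
  local name="" replicas="" rest="" running desired
  local total=0 degraded=0
  # "|| [ -n ... ]" keeps a final line that lacks a trailing newline.
  while read -r name replicas rest || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    total=$(( total + 1 ))
    # REPLICAS is "running/desired"; any trailing annotation such as
    # "(max 1 per node)" lands in $rest and is ignored.
    running=${replicas%/*}
    desired=${replicas#*/}
    if [ "$running" != "$desired" ]; then
      echo "DEGRADED: $name $replicas"
      degraded=$(( degraded + 1 ))
    fi
  done
  echo "$total services checked, $degraded degraded"
  [ "$degraded" -eq 0 ]
}
```

On the host this would run as `docker service ls --format '{{.Name}} {{.Replicas}}' | check_replicas`, returning non-zero if any service is short of its desired count. A service deliberately scaled to `0/0` passes, since the sketch compares running against desired rather than against a fixed `1/1`.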
---

## Pre-verification probe (2026-06-13T05:45Z — before M2 claimed)

Builder has executed the maintenance; M2 has not been formally claimed yet. Independent host check run while waiting:

- `docker network inspect proxy --format "..."` → **Subnet: 10.10.0.0/16** ✓
- Container VIPs on proxy, all inside `10.10.0.0/16`: traefik=10.10.0.2, proxy-endpoint=10.10.0.3, drone=10.10.0.5, warm-keycloak=10.10.0.7, ccci-bridge=10.10.0.9, ccci-dashboard=10.10.0.11, ccci-reports=10.10.0.13 ✓
- `docker service ls` → all 9 services at 1/1 REPLICAS ✓
- `systemctl cat swarm-init` → active script has `--subnet 10.10.0.0/16` (nixos-rebuild applied) ✓
- `https://ci.commoninternet.net` → **HTTP/2 200** ✓
- `https://drone.ci.commoninternet.net` → **HTTP/2 303** (login redirect = healthy) ✓
- `https://bridge.ci.commoninternet.net` → **HTTP/2 404** (root path; 404 is expected and shows Traefik is routing to the service) ✓
- `https://report.ci.commoninternet.net` → **HTTP/2 200** ✓
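
The VIP bullet in the pre-verification probe is a subnet-membership test: an address lies in a CIDR block iff masking both with the prefix yields the same network bits. A minimal sketch follows, assuming `NAME IP/PREFIX` input lines. The Go template in the comment for producing them from `docker network inspect` is an assumption, not necessarily what the probe ran, and `ip_in_cidr` / `check_endpoints` are hypothetical helpers. Docker reports `IPv4Address` together with its prefix length (for example `10.10.0.2/16`), so the sketch strips that suffix before testing.

```shell
#!/usr/bin/env bash
# Sketch: check that every endpoint address on a network lies inside the
# expected CIDR. Expects "NAME IP/PREFIX" lines on stdin. One possible
# (assumed) source:
#   docker network inspect proxy \
#     --format '{{range .Containers}}{{.Name}} {{.IPv4Address}}{{"\n"}}{{end}}'
# ip_in_cidr and check_endpoints are hypothetical helpers.
set -euo pipefail

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<<"$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeeds iff IPv4 address $1 lies in CIDR $2, i.e. both share the
# same network bits under the prefix mask.
ip_in_cidr() {
  local net=${2%/*} len=${2#*/} mask ip_int net_int
  mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  ip_int=$(ip_to_int "$1")
  net_int=$(ip_to_int "$net")
  (( (ip_int & mask) == (net_int & mask) ))
}

# Read "NAME IP/PREFIX" lines; fail if any address is outside $1.
check_endpoints() {
  local expected=$1 name="" addr="" outside=0
  while read -r name addr || [ -n "$name" ]; do
    [ -n "$name" ] || continue
    # Strip the "/16"-style prefix length before the membership test.
    if ip_in_cidr "${addr%/*}" "$expected"; then
      echo "ok: $name ${addr%/*}"
    else
      echo "OUTSIDE: $name $addr (expected $expected)"
      outside=$(( outside + 1 ))
    fi
  done
  [ "$outside" -eq 0 ]
}
```

Fed the addresses listed in the probe, every line would print `ok:`; an endpoint still holding an old `10.0.1.x` address would print `OUTSIDE:` and make `check_endpoints 10.10.0.0/16` return non-zero.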