# REVIEW — phase pvcheck (post-proxy verification)

Adversary-owned. Append-only verdicts. All commands run cold from /srv/cc-ci-orch/cc-ci-adv (own clone).

---

## Adversary baseline probe — 2026-06-13T05:56Z

**Context:** Phase pvfix is DONE (STATUS-pvfix.md `## DONE`). pvcheck preconditions verified cold.

### Precondition checks

| Check | Result |
|---|---|
| pvfix DONE | ✅ STATUS-pvfix.md shows `## DONE`, both M1+M2 PASS |
| `proxy` subnet | ✅ `10.10.0.0/16` (`docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"`) |
| `proxy` IPAM driver | ✅ default, gateway 10.10.0.1 |
| All services 1/1 | ✅ 9 services all `1/1` (backups, bridge, dashboard, reports, drone, traefik×2, keycloak×2) |
| `ci.commoninternet.net` | ✅ HTTP/2 200 |
| `drone.ci.commoninternet.net` | ✅ HTTP/2 303 |
| `report.ci.commoninternet.net` | ✅ HTTP/2 200 |
| VIP exhaustion after 05:38Z | ✅ NONE: `journalctl -u docker --since "2026-06-13 05:38:00" \| grep "available IP while allocating VIP"` → empty |
| Transient errors at 05:35Z | ℹ️ "could not find network allocator STATE" for OLD net IDs (mlxau8…, 85p3aq…); these are expected during proxy recreation (swarm allocator losing state for the deleted /24 network) |
| No new VIP exhaustion | ✅ post-fix journal clean |

**Command evidence:**

```
$ docker network inspect proxy --format "{{json .IPAM}}"
{"Driver":"default","Options":null,"Config":[{"Subnet":"10.10.0.0/16","Gateway":"10.10.0.1"}]}
$ docker service ls --format "{{.Name}}\t{{.Replicas}}"
backups_ci_commoninternet_net_app 1/1
ccci-bridge_app 1/1
ccci-dashboard_app 1/1
ccci-reports_app 1/1
drone_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_socket-proxy 1/1
warm-keycloak_ci_commoninternet_net_app 1/1
warm-keycloak_ci_commoninternet_net_db 1/1
```

### Upgrade-all Step-0 guard — independent check

**Guard location:** `/srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md` §0, lines 61-81

**Guard logic:** `VIPFAIL=$(ssh cc-ci 'journalctl -u docker --since "26 hours ago" | grep -c "available IP while allocating VIP"')` → if >0, `systemctl restart docker`

**Guard exists:** ✅ confirmed cold-read

**Guard would fire:** ✅ triggers on the EXACT original error signature (`"available IP while allocating VIP"`); it would detect and recover if VIP exhaustion recurs despite the /16 fix (belt+suspenders)

**STALE TEXT NOTE:** Skill still says "(The durable fix ... is tracked in plan-proxy-vip-exhaustion-fix.md; this guard is the per-run safety net until that lands.)", but the durable fix HAS now landed. This is a documentation smell, not a functional defect; the guard logic is correct and still useful. Filing as advisory finding [A2].

---

## Adversary independent allocator-headroom probe — 2026-06-13T06:02Z

**Method:** deploy 5 throwaway nginx stacks concurrently joining `proxy`, then remove all 5 concurrently (same concurrent-rm pattern that caused endpoint GC races under the old /24).

| Check | Result |
|---|---|
| BASELINE proxy containers | 9 |
| AFTER DEPLOY (5 stacks added) | 14 |
| AFTER concurrent stack rm | 9 (back to baseline) |
| Leaked endpoints | **0** |
| VIP exhaustion errors during test | **0** |
| Swarm GC race errors (key modified / network proxy remove failed) | **0** |
| Network prune output | empty (nothing to reclaim) |
| AFTER prune residue | **0** |
| All pvcheck-throwaway stacks removed | ✅ confirmed |

**Verdict:** The /16 subnet has sufficient headroom that 5 concurrent deploy/rm cycles produce zero endpoint leaks and zero VIP errors. No residue after prune.

**Note:** 5 stacks is a conservative test: the original exhaustion required ~45 GC races over 11 days uptime. The /16 has 65534 VIPs vs the old /24's 254, so the leak rate would need to be ~258× faster to hit the same ceiling. This probe confirms the allocator is healthy and the /16 provides the claimed headroom.
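The headroom figures in the note above can be re-derived with plain shell arithmetic. This is a minimal sketch, not part of the probe itself: `usable_hosts` is a hypothetical helper name, and the count ignores any addresses the network itself reserves (such as the gateway).

```shell
#!/usr/bin/env bash
# Hypothetical helper: usable IPv4 host addresses for a given prefix length,
# excluding the network and broadcast addresses.
usable_hosts() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 2 ))
}

old=$(usable_hosts 24)   # old proxy network (/24)
new=$(usable_hosts 16)   # current proxy network (/16)

echo "old /24: $old usable addresses"
echo "new /16: $new usable addresses"
# Integer division, so the factor is rounded down.
echo "headroom factor: $(( new / old ))x"
```

Run as a script, this prints 254 and 65534 usable addresses and a factor of 258, matching the ~258× figure quoted in the note.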
---

## M1 — PASS @2026-06-13T06:10Z

**Cold verify run — Adversary's own commands, no cached state.**

| Check | Command | Result |
|---|---|---|
| proxy subnet | `docker network inspect proxy --format "Subnet: {{range .IPAM.Config}}{{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"` | **`10.10.0.0/16`, Endpoints: 7** ✅ |
| 9 services 1/1 | `docker service ls --format "{{.Name}}\t{{.Replicas}}"` | all 1/1 ✅ |
| ci.commoninternet.net | `curl -sk -o /dev/null -w "%{http_code}"` | **200** ✅ |
| drone.ci.commoninternet.net | same | **303** ✅ |
| report.ci.commoninternet.net | same | **200** ✅ |
| VIP exhaustion since 05:38Z | `journalctl -u docker --since "2026-06-13 05:38:00" \| grep -c "available IP while allocating VIP"` | **0** ✅ |
| swarm.nix /16 declared | `grep "10.10" nix/modules/swarm.nix` | `--subnet 10.10.0.0/16` ✅ |
| swarm.nix commit | `git show e6349a9 --stat` | confirmed ✅ |
| Step-0 guard text | `grep -A8 "VIPFAIL" upgrade-all/SKILL.md` | guard exists, checks exact signature ✅ |
| [A2] fix | `git -C /srv/cc-ci-orch log --oneline \| grep 84e13a7` | `fix(pvcheck/A2): update upgrade-all SKILL.md guard description` ✅ |
| [A2] text updated | SKILL.md line ~81 | "belt-and-suspenders even after the /16 fix" ✅ |

**All M1 criteria verified independently from cold start.** Builder's before/after evidence is consistent with what Adversary observed directly. No discrepancies. [A2] CLOSED: fix confirmed in orchestrator commit 84e13a7.
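The VIP-exhaustion check used in the M1 table and in the Step-0 guard comes down to counting one error signature in the Docker journal and acting when the count is non-zero. The sketch below reads journal text from stdin so it can be tried without a swarm. Only the signature string comes from this review; the helper names `count_vip_failures` and `guard_decision`, the action labels, and the sample log lines are hypothetical.

```shell
#!/usr/bin/env bash
# Signature string as checked by the guard; the rest is illustrative.
SIGNATURE="available IP while allocating VIP"

# Hypothetical helper: count stdin lines that contain the signature.
# grep -c exits non-zero on zero matches, so "|| true" keeps the call successful.
count_vip_failures() {
  grep -c -F -- "$SIGNATURE" || true
}

# Hypothetical helper: map a match count to the action the guard would take.
guard_decision() {
  if [ "$1" -gt 0 ]; then
    echo "restart-docker"
  else
    echo "no-action"
  fi
}

# Synthetic sample input (made up for illustration, not real journal output).
sample_log='synthetic line: service started
synthetic line: no available IP while allocating VIP
synthetic line: service converged'

failures=$(printf '%s\n' "$sample_log" | count_vip_failures)
echo "matches: $failures -> $(guard_decision "$failures")"
```

On a live host the same function could be fed from `journalctl -u docker --since "26 hours ago"`, as the guard does; a count of 0 corresponds to the clean result recorded in the table.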
## M2 — PASS @2026-06-13T06:14Z

**Cold verify run — Adversary's own commands, no cached state.**

| Check | Command | Result |
|---|---|---|
| summary.png accessible | `curl -sk -o /dev/null -w "%{http_code}" .../runs/608/summary.png` | **HTTP 200** ✅ |
| badge level | `curl -sk .../badge.svg \| grep -o "level [0-9]"` | **level 5** ✅ |
| proxy endpoints after run | `docker network inspect proxy --format "{{len .Containers}}"` | **7** (clean, same as M1 baseline) ✅ |
| VIP exhaustion since 05:38Z | `journalctl \| grep -c "available IP while allocating VIP"` | **0** ✅ |
| Gitea comment #14506 | `GET /api/v1/repos/recipe-maintainers/hedgedoc/issues/1/comments` | ✅ `hedgedoc @ 441c411c ✅ passed` posted at 06:02:52Z |
| !testme trigger comment | comment #14505 at 06:02:48Z by autonomic-bot | ✅ real !testme trigger |
| Run trigger timing | 06:02:48Z → after proxy fix 05:38Z | ✅ entire run on new /16 |
| Run result filesystem | `/var/lib/cc-ci-runs/608/results.json` | ✅ all tiers pass: install/upgrade/backup/restore/custom |
| clean_teardown flag | `results.json flags.clean_teardown` | **true** ✅ |
| no_secret_leak flag | `results.json flags.no_secret_leak` | **true** ✅ |
| level | `results.json level` | **5** ✅ |
| Drone journal trigger | `journalctl -u docker` for 06:02:52Z | ✅ `[poll] triggered build 608 for hedgedoc@441c411c (PR #1, comment 14505) by autonomic-bot` |
| Drone journal outcome | `journalctl -u docker` for 06:04:23Z | ✅ `reflected outcome build 608 (hedgedoc PR #1): success` |
| Allocator headroom (independent Adversary) | Probe at 06:02Z: 5 stacks, 0 leaks, 0 VIP errors, 0 GC races, 0 residue | ✅ confirmed independently |

**All M2 criteria verified cold. Real recipe CI run through the new /16 proxy confirms it is operationally healthy. Allocator headroom confirmed by both independent Adversary probe and Builder's matching proof.** No discrepancies with Builder's claims.
(Minor: Builder counts proxy baseline as 8, Adversary counts 7 via the same `{{len .Containers}}`; this is a ~1-count fluctuation during concurrent probes, not a functional discrepancy. Both confirm clean return to baseline.)

---

## Adversary findings

### [A2] upgrade-all SKILL.md stale description — guard text still says "until that lands" (2026-06-13T05:56Z)

**Severity:** Documentation / low

**Location:** `/srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md` line 81

**Current text:** "this guard is the per-run safety net until that lands"

**Issue:** the durable fix (proxy /16) has landed, so this text now misleads about the guard's purpose (it IS still useful as belt+suspenders, but no longer "until the fix lands")

**Suggested fix:** update to "this guard remains as belt-and-suspenders even after the /16 subnet fix"

**NOT a VETO:** guard logic is correct; this is documentation only.

Status: open at filing (Builder may fix; Adversary closes after re-read); CLOSED at M1 after the fix was confirmed in orchestrator commit 84e13a7.