5 concurrent throwaway stacks deploy+rm. Zero leaked endpoints, zero GC races, zero VIP exhaustion errors, zero residue after prune. /16 headroom confirmed cold. Still waiting for Builder M1/M2 claims.
REVIEW — phase pvcheck (post-proxy verification)
Adversary-owned. Append-only verdicts. All commands run cold from /srv/cc-ci-orch/cc-ci-adv (own clone).
Adversary baseline probe — 2026-06-13T05:56Z
Context: Phase pvfix is DONE (STATUS-pvfix.md ## DONE). pvcheck preconditions verified cold.
Precondition checks
| Check | Result |
|---|---|
| pvfix DONE | ✅ STATUS-pvfix.md shows ## DONE, both M1+M2 PASS |
| proxy subnet | ✅ 10.10.0.0/16 (`docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"`) |
| proxy IPAM driver | ✅ default, gateway 10.10.0.1 |
| All services 1/1 | ✅ 9 services all 1/1 (backups, bridge, dashboard, reports, drone, traefik×2, keycloak×2) |
| ci.commoninternet.net | ✅ HTTP/2 200 |
| drone.ci.commoninternet.net | ✅ HTTP/2 303 |
| report.ci.commoninternet.net | ✅ HTTP/2 200 |
| VIP exhaustion after 05:38Z | ✅ NONE — `journalctl -u docker --since "2026-06-13 05:38:00" |
| Transient errors at 05:35Z | ℹ️ "could not find network allocator STATE" for OLD net IDs (mlxau8…, 85p3aq…) — these are expected during proxy recreation (swarm allocator losing state for the deleted /24 network) |
| No new VIP exhaustion | ✅ post-fix journal clean |
Command evidence:
$ docker network inspect proxy --format "{{json .IPAM}}"
{"Driver":"default","Options":null,"Config":[{"Subnet":"10.10.0.0/16","Gateway":"10.10.0.1"}]}
$ docker service ls --format "{{.Name}}\t{{.Replicas}}"
backups_ci_commoninternet_net_app 1/1
ccci-bridge_app 1/1
ccci-dashboard_app 1/1
ccci-reports_app 1/1
drone_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_app 1/1
traefik_ci_commoninternet_net_socket-proxy 1/1
warm-keycloak_ci_commoninternet_net_app 1/1
warm-keycloak_ci_commoninternet_net_db 1/1
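The "all services 1/1" check can be mechanized instead of read by eye. A minimal sketch, assuming the two-column name/replicas output shown above; the function name `unhealthy_services` is hypothetical and not part of the Adversary tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "NAME REPLICAS" lines (as printed by
# `docker service ls --format "{{.Name}}\t{{.Replicas}}"`) on stdin and
# prints the name of every service whose running count differs from its
# desired count. Prints nothing when all services are converged.
unhealthy_services() {
  awk '{
    split($2, r, "/")                 # r[1] = running, r[2] = desired
    if (r[1] != r[2]) print $1
  }'
}
```

Usage would be `docker service ls --format "{{.Name}}\t{{.Replicas}}" | unhealthy_services`; empty output corresponds to the ✅ recorded in the table.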
Upgrade-all Step-0 guard — independent check
Guard location: /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md §0, lines 61-81
Guard logic: VIPFAIL=$(ssh cc-ci 'journalctl -u docker --since "26 hours ago" | grep -c "available IP while allocating VIP"') → if >0, systemctl restart docker
Guard exists: ✅ confirmed cold-read
Guard would fire: ✅ triggers on the EXACT original error signature ("available IP while allocating VIP") — would detect and recover if VIP exhaustion recurs despite the /16 fix (belt+suspenders)
STALE TEXT NOTE: Skill still says "(The durable fix ... is tracked in plan-proxy-vip-exhaustion-fix.md; this guard is the per-run safety net until that lands.)" — but the durable fix HAS now landed. This is a documentation smell, not a functional defect; the guard logic is correct and still useful. Filing as advisory finding [A2].
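For reference, the guard logic summarized above can be sketched as follows. This is a reconstruction from the one-line summary in this review, not a copy of SKILL.md: the function names are hypothetical, and running the restart over ssh on cc-ci is an assumption based on where the counting step runs.

```shell
#!/usr/bin/env bash
# Error signature quoted in this review as the original failure text.
VIP_SIG="available IP while allocating VIP"

# Hypothetical helper: count journal lines on stdin carrying the signature.
# grep -c exits 1 when the count is 0, so the exit status is masked.
count_vip_failures() {
  grep -c "$VIP_SIG" || true
}

# Hypothetical helper: succeeds (exit 0) when the guard should fire.
guard_should_fire() {
  [ "$1" -gt 0 ]
}

# Assumed wiring (not executed here):
#   VIPFAIL=$(ssh cc-ci 'journalctl -u docker --since "26 hours ago"' | count_vip_failures)
#   if guard_should_fire "$VIPFAIL"; then ssh cc-ci 'systemctl restart docker'; fi
```

Separating the count from the decision keeps the signature match testable against saved journal excerpts without touching the live daemon.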
Adversary independent allocator-headroom probe — 2026-06-13T06:02Z
Method: deploy 5 throwaway nginx stacks concurrently joining proxy, then remove all 5 concurrently (same concurrent-rm pattern that caused endpoint GC races under the old /24).
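The concurrent deploy and remove pattern described in the method can be sketched as below. This is an illustration, not the exact commands the probe ran: the compose file name is hypothetical, the stack prefix is inferred from the "pvcheck-throwaway" label used in this review, and the `DOCKER` override exists only so the loop can be dry-run.

```shell
#!/usr/bin/env bash
# All values below are illustrative assumptions, not the probe's actual ones.
DOCKER="${DOCKER:-docker}"                           # override for dry runs
STACK_PREFIX="${STACK_PREFIX:-pvcheck-throwaway}"
COMPOSE_FILE="${COMPOSE_FILE:-throwaway-nginx.yml}"  # hypothetical file attaching nginx to proxy

# Deploy N stacks in parallel, then wait for every deploy command to return.
deploy_all() {
  local n="$1" i
  for i in $(seq 1 "$n"); do
    $DOCKER stack deploy -c "$COMPOSE_FILE" "${STACK_PREFIX}-${i}" &
  done
  wait
}

# Remove the same N stacks in parallel (the pattern that raced under the /24).
remove_all() {
  local n="$1" i
  for i in $(seq 1 "$n"); do
    $DOCKER stack rm "${STACK_PREFIX}-${i}" &
  done
  wait
}
```

A run would call `deploy_all 5`, let the tasks converge, then call `remove_all 5` and inspect the docker journal and the proxy network for leftovers.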
| Check | Result |
|---|---|
| BASELINE proxy containers | 9 |
| AFTER DEPLOY (5 stacks added) | 14 |
| AFTER concurrent stack rm | 9 (back to baseline) |
| Leaked endpoints | 0 |
| VIP exhaustion errors during test | 0 |
| Swarm GC race errors (key modified / network proxy remove failed) | 0 |
| Network prune output | empty (nothing to reclaim) |
| AFTER prune residue | 0 |
| All pvcheck-throwaway stacks removed | ✅ confirmed |
Verdict: The /16 subnet has sufficient headroom that 5 concurrent deploy/rm cycles produce zero endpoint leaks and zero VIP errors. No residue after prune.
Note: 5 stacks is a conservative test — the original exhaustion required ~45 GC races over 11 days uptime. The /16 has 65534 VIPs vs the old /24's 254 — the leak rate would need to be ~258× faster to hit the same ceiling. This probe confirms the allocator is healthy and the /16 provides the claimed headroom.
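The headroom figures in the note follow from plain subnet arithmetic. A small sketch, assuming the usual "total addresses minus network and broadcast" host count; the number of VIPs Docker can actually hand out is slightly lower, since the gateway and other reserved addresses also consume entries:

```shell
#!/usr/bin/env bash
# Usable IPv4 host addresses in a subnet of the given prefix length
# (total addresses minus the network and broadcast addresses).
usable_hosts() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

old=$(usable_hosts 24)
new=$(usable_hosts 16)
# Integer division, so the ratio is rounded down.
echo "old /24: $old  new /16: $new  ratio: $(( new / old ))x"
# prints: old /24: 254  new /16: 65534  ratio: 258x
```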
M1 — PENDING (awaiting Builder claim)
Builder has not yet claimed M1 in STATUS-pvcheck.md. Adversary baseline facts are pre-verified above.
M2 — PENDING (awaiting Builder claim)
Real recipe CI run after the proxy fix (05:38Z) still needed. Dashboard shows run #585 (ghost, ~04:56Z) was before the fix — a new !testme run post-fix is required.
Adversary findings
[A2] upgrade-all SKILL.md stale description — guard text still says "until that lands" (2026-06-13T05:56Z)
Severity: Documentation / low
Location: /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md line 81
Current text: "this guard is the per-run safety net until that lands"
Issue: the durable fix (proxy /16) has landed — this text now misleads about the guard's purpose (it IS still useful as belt+suspenders, but no longer "until the fix lands")
Suggested fix: update to "this guard remains as belt-and-suspenders even after the /16 subnet fix"
NOT a VETO — guard logic is correct; this is documentation only.
Status: open (Builder may fix; Adversary closes after re-read)