Phase pvcheck — post-proxy verification and regression proof
Mission: prove that the durable proxy overlay fix is actually safe in production:
the network has the intended headroom, routing works, real recipe CI still deploys through
Traefik, and the IPAM/VIP exhaustion signature no longer threatens the weekly upgrade path.
State files live under machine-docs/: STATUS-pvcheck.md, BACKLOG-pvcheck.md,
REVIEW-pvcheck.md, JOURNAL-pvcheck.md.
Preconditions
- Phase pvfix is ## DONE.
- docker network inspect proxy shows the intended /16 subnet.
- Core control-plane services are back after the proxy recreation.
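The subnet precondition can be checked mechanically rather than by eye. The following is a minimal sketch, assuming the JSON printed by docker network inspect proxy has been captured as a string; the function name, the sample address range, and the container entry are hypothetical and only illustrate the shape of the output.

```python
import ipaddress
import json


def proxy_subnet_report(inspect_json: str, expected_prefix: int = 16) -> dict:
    """Summarise subnet size and locally visible endpoints from inspect JSON."""
    networks = json.loads(inspect_json)  # the CLI prints a JSON array of networks
    net = networks[0]
    configs = (net.get("IPAM") or {}).get("Config") or []
    subnets = [
        ipaddress.ip_network(cfg["Subnet"]) for cfg in configs if cfg.get("Subnet")
    ]
    return {
        "name": net.get("Name"),
        "subnets": [str(s) for s in subnets],
        # True only if at least one subnet exists and all have the expected prefix.
        "prefix_ok": bool(subnets)
        and all(s.prefixlen == expected_prefix for s in subnets),
        "total_addresses": sum(s.num_addresses for s in subnets),
        # "Containers" lists endpoints visible on this node only: a lower bound.
        "local_endpoints": len(net.get("Containers") or {}),
    }


# Hypothetical sample; the address range and container entry are illustrative.
SAMPLE = json.dumps([{
    "Name": "proxy",
    "IPAM": {"Config": [{"Subnet": "10.20.0.0/16", "Gateway": "10.20.0.1"}]},
    "Containers": {"abc123": {"Name": "traefik.1"}},
}])

if __name__ == "__main__":
    print(proxy_subnet_report(SAMPLE))
```

On a multi-node swarm the local endpoint count understates the real total, so it is best recorded as a lower bound alongside the raw inspect output.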
Verification Scope
- Host/network facts. Capture and record:
  - docker network inspect proxy subnet and endpoint count
  - docker stack ls
  - Traefik, Drone, bridge, dashboard, and report service health
  - recent dockerd journal lines for VIP/IPAM errors
- Routing checks. Verify externally visible routes still work:
  - Drone UI/API route
  - dashboard route
  - bridge/poller health if exposed locally
  - report site route
- Real deploy proof. Trigger one low-risk enrolled recipe !testme or equivalent harness run
  that joins proxy, completes all expected tiers, and tears down cleanly. Prefer a small
  stable recipe unless cfold needs a broader sweep at the same time. Do not duplicate an
  active cfold sweep.
- Allocator-headroom proof. Run a bounded reproduction derived from
  plan-proxy-vip-exhaustion-fix.md:
  - deploy/remove a small batch of throwaway published-port stacks, preferably in the same
    concurrent pattern that previously leaked endpoints
  - confirm leaked endpoint count, if any, is tiny relative to /16 headroom
  - confirm no fresh "could not find an available IP while allocating VIP" errors
  - prune throwaway networks/stacks and verify no residue
- Upgrade safety check. Confirm the /upgrade-all Step-0 guard still exists and would
  detect/recover the known VIP exhaustion signature if it ever recurs.
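Two of the allocator checks above reduce to small computations: scanning captured journal lines for the exhaustion signature, and expressing any leaked endpoints as a fraction of the /16 address space. A minimal sketch follows, assuming the journal lines and the before/after endpoint counts have already been collected; the function names are hypothetical, and only the signature string comes from this plan.

```python
# Signature text taken from the allocator-headroom check in this plan.
VIP_SIGNATURE = "could not find an available IP while allocating VIP"


def find_vip_errors(journal_lines):
    """Return the journal lines that contain the VIP exhaustion signature."""
    return [line for line in journal_lines if VIP_SIGNATURE in line]


def leak_fraction(endpoints_before: int, endpoints_after: int,
                  prefix_len: int = 16) -> float:
    """Leaked endpoints as a fraction of an IPv4 subnet's address space."""
    # A drop in endpoint count is not a leak, so clamp at zero.
    leaked = max(endpoints_after - endpoints_before, 0)
    return leaked / (2 ** (32 - prefix_len))
```

For example, three leaked endpoints against a /16 give 3 / 65536, which is a concrete number to record in STATUS-pvcheck.md instead of the word "tiny". What fraction counts as acceptable is a judgment this sketch does not make.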
Gates
M1 — Control plane and routing verified. All cc-ci control-plane routes/services are
healthy after the proxy recreation, with before/after evidence in STATUS-pvcheck.md.
Adversary verifies independently from live commands, not just Builder notes.
M2 — Real CI and allocator proof verified. At least one real recipe deploy/test passes
through proxy and tears down cleanly; bounded allocator reproduction does not threaten the
new /16; no VIP exhaustion signature remains in fresh logs. Adversary verifies all claims
and checks for leaks.
Guardrails
- Do not run a large recipe sweep here if cfold already owns that proof. This phase is the
  proxy-specific post-change proof.
- Keep concurrency bounded. The point is to prove headroom, not stress the host into a new
  unrelated failure.
- Clean up every throwaway stack/network. Zero residue is part of the acceptance criteria.
- If any core route is down, stop new test traffic and fix routing first.
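The zero-residue guardrail is easiest to enforce by diffing name listings taken before and after the reproduction. The sketch below assumes each listing holds one name per line, for instance from docker stack ls or docker network ls with a name-only format. The pvcheck-tmp- prefix is a hypothetical naming convention for throwaway stacks, not something this plan defines.

```python
def parse_names(listing: str) -> set:
    """Split a one-name-per-line listing into a set, ignoring blank lines."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def find_residue(before: str, after: str,
                 throwaway_prefix: str = "pvcheck-tmp-") -> list:
    """Names that appeared during the run or still carry the throwaway prefix."""
    before_names = parse_names(before)
    after_names = parse_names(after)
    new_names = after_names - before_names
    # Catch throwaway objects even if a stale one already existed beforehand.
    tagged = {n for n in after_names if n.startswith(throwaway_prefix)}
    return sorted(new_names | tagged)
```

An empty list from both the stack and the network listings is the evidence to record. Anything else names exactly what still needs pruning.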
Definition of Done
Control-plane routes are healthy, one real proxy-joining recipe CI run succeeds and cleans
up, bounded allocator reproduction is documented, fresh logs show no VIP exhaustion, and
Adversary has signed off on M1 and M2 in machine-docs/REVIEW-pvcheck.md. Builder writes ## DONE
only after both gates have fresh Adversary PASSes.