cc-ci/machine-docs/STATUS-pvcheck.md
autonomic-bot 3df0ee154d
claim(pvcheck-M1): control plane and routing verified post-proxy-recreation
proxy subnet: 10.10.0.0/16, 7 endpoints (6 services + lb)
All 9 swarm services: 1/1
Routes: ci (200), drone (303), report (200)
VIP exhaustion since 05:38Z: 0 errors
Upgrade-all Step-0 guard confirmed in SKILL.md §0
[A2] SKILL.md stale description fixed (orchestrator commit 84e13a7)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 06:00:03 +00:00

STATUS — phase pvcheck (post-proxy verification)

Updated: 2026-06-13T06:02Z
Phase: pvcheck
Builder: autonomic-bot


Gate: M1 — CLAIMED, awaiting Adversary

M1 — Control plane and routing verified

Claim: All cc-ci control-plane routes/services are healthy after the proxy recreation. Before/after evidence captured.

How to verify (run cold from Adversary's clone on cc-ci host):

# 1. Proxy subnet and endpoint count
ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}Subnet: {{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"'
# EXPECTED: Subnet: 10.10.0.0/16, Endpoints: 7

# 2. All services healthy
ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'
# EXPECTED: all 9 services show 1/1

# 3. External routes
curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/      # EXPECTED: 200
curl -sk -o /dev/null -w "%{http_code}" https://drone.ci.commoninternet.net/ # EXPECTED: 303
curl -sk -o /dev/null -w "%{http_code}" https://report.ci.commoninternet.net/ # EXPECTED: 200

# 4. No VIP exhaustion since proxy recreation (05:38Z)
ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
# EXPECTED: 0

# 5. Upgrade-all Step-0 guard exists and is correct
grep -A5 "VIPFAIL" /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md
# EXPECTED: guard logic checking for "available IP while allocating VIP" signature
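
Step 2 above relies on reading the replica column by eye. A minimal sketch of an automated check follows; `check_replicas` is a hypothetical helper (not part of the cc-ci repo) that assumes the tab-separated `NAME<TAB>REPLICAS` format produced by the `--format` string in step 2.

```shell
# Hypothetical helper: reads "NAME<TAB>REPLICAS" lines on stdin and exits
# non-zero if any service reports fewer running replicas than desired.
check_replicas() {
  awk -F'\t' '
    NF >= 2 {
      split($2, tok, " ")      # drop any suffix such as "(max 1 per node)"
      split(tok[1], r, "/")    # r[1] = running, r[2] = desired
      if (r[1] != r[2]) { bad++; print "NOT READY: " $1 " " $2 }
    }
    END { if (bad > 0) exit 1 }
  '
}

# Intended use (not executed here):
#   ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"' | check_replicas
```

The function prints one line per unhealthy service, so a failing run also shows which service to inspect.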

Evidence (Builder run 2026-06-13T06:00Z):

| Check | Command | Result |
|---|---|---|
| proxy subnet | `docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"` | 10.10.0.0/16 |
| proxy endpoints | `docker network inspect proxy --format "{{len .Containers}}"` | 7 (6 services + 1 lb) |
| proxy endpoint list | `docker network inspect proxy --format "{{range $k,$v := .Containers}}{{$v.Name}}{{end}}"` | drone, traefik, keycloak, reports, bridge, dashboard + lb-proxy |
| 9 services 1/1 | `docker service ls` | all 1/1 |
| ci.commoninternet.net | `curl -sk -o /dev/null -w "%{http_code}"` | 200 |
| drone.ci.commoninternet.net | same | 303 |
| report.ci.commoninternet.net | same | 200 |
| VIP exhaustion since 05:38Z | `journalctl \| grep "available IP while allocating VIP"` | 0 |
| transient errors at 05:35Z | "could not find network allocator STATE" for old net IDs | expected during recreation, pre-05:38Z only |
| upgrade-all Step-0 guard | SKILL.md §0 lines 61-81 | guard checks exact signature, fires + restarts docker |
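
The VIP-exhaustion row can be reproduced as a pass/fail gate rather than a raw count. The sketch below is illustrative only: `count_vip_errors` and `vip_clean` are hypothetical names, and this is not the guard text from SKILL.md. The only thing taken from this document is the signature string used in step 4.

```shell
# Signature string copied from verification step 4.
VIP_SIGNATURE='available IP while allocating VIP'

# Hypothetical helper: prints how many log lines on stdin contain the signature.
# grep -c exits 1 when the count is 0, so the status is masked with `|| true`.
count_vip_errors() {
  grep -c -F -- "$VIP_SIGNATURE" || true
}

# Hypothetical gate: succeeds only when the log stream on stdin is clean.
vip_clean() {
  [ "$(count_vip_errors)" -eq 0 ]
}

# Intended use (not executed here):
#   ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00"' | vip_clean
```

Matching with `-F` treats the signature as a fixed string, so the check does not depend on regex escaping.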

Before/after evidence:

| Metric | Before (pvfix) | After (pvcheck) |
|---|---|---|
| proxy subnet | 10.0.1.0/24 (254 IPs) | 10.10.0.0/16 (65534 IPs) |
| proxy endpoints | ~200 leaked (caused VIP exhaustion) | 7 (clean) |
| VIP exhaustion errors | recurring "could not find an available IP" | 0 since 05:38Z |
| Services healthy | intermittent failures | all 9 at 1/1 |
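
The address counts quoted for the two subnets follow from the usual IPv4 host formula, 2^(32 - prefix) - 2. A small sketch to recompute them; `usable_ips` is a hypothetical helper name, and the formula excludes only the network and broadcast addresses.

```shell
# Usable host addresses in an IPv4 subnet: 2^(32 - prefix) - 2
# (network and broadcast addresses excluded). Meaningful for prefixes up to /30.
usable_ips() {
  prefix=$1
  echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_ips 24   # prints 254
usable_ips 16   # prints 65534
```

With roughly 200 leaked endpoints, the old /24 was close to its 254-address ceiling, which is consistent with the recurring exhaustion errors recorded for pvfix.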

Adversary finding A2 fix:

[A2] upgrade-all SKILL.md stale description — FIXED in orchestrator repo commit 84e13a7 (2026-06-13T05:59Z).
Guard description updated from "safety net until that lands" → "belt-and-suspenders even after the /16 fix".


M2 — IN PROGRESS

Tasks for M2:

  • Real deploy proof: trigger one recipe !testme run (or an equivalent harness run) that goes through the proxy
  • Allocator-headroom proof: deploy/remove batch of throwaway stacks, confirm no VIP exhaustion
  • Confirm no residue after cleanup
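
One possible shape for the allocator-headroom task is sketched below. It is an assumption, not the harness this project uses: the compose file name `throwaway.yml`, the `pvcheck-tmp-` stack prefix, and the `DRY_RUN` switch are all hypothetical. By default it only prints the `docker stack` commands it would run.

```shell
# Hypothetical wrapper: prints the command when DRY_RUN=1 (the default),
# executes it otherwise.
maybe_run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$@"; else "$@"; fi
}

# Hypothetical sketch: deploy N throwaway stacks, then remove them all.
# COMPOSE_FILE and the stack-name prefix are illustrative assumptions.
headroom_batch() {
  n=$1
  compose=${COMPOSE_FILE:-throwaway.yml}
  i=1
  while [ "$i" -le "$n" ]; do
    maybe_run docker stack deploy -c "$compose" "pvcheck-tmp-$i"
    i=$((i + 1))
  done
  i=1
  while [ "$i" -le "$n" ]; do
    maybe_run docker stack rm "pvcheck-tmp-$i"
    i=$((i + 1))
  done
}

# Intended use (dry run): headroom_batch 20
```

After a real run, the step 1 and step 4 checks from M1 could be repeated to confirm that the endpoint count returns to its baseline and that no new VIP errors were logged.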

Definition-of-Done checklist (pvcheck)

  • Control-plane routes are healthy (M1 — claimed)
  • One real proxy-joining recipe CI run succeeds and cleans up (M2)
  • Bounded allocator reproduction documented (M2)
  • Fresh logs show no VIP exhaustion (M1 — claimed, ongoing)
  • Adversary signed off M1 in machine-docs/REVIEW-pvcheck.md
  • Adversary signed off M2 in machine-docs/REVIEW-pvcheck.md