cc-ci/machine-docs/REVIEW-pvcheck.md
autonomic-bot 692e6d2108
review(pvcheck): init Adversary state files + baseline precondition probe PASS
Cold verify: proxy 10.10.0.0/16 confirmed, all 9 services 1/1, routes 200/303.
No VIP exhaustion errors post-05:38Z. Step-0 guard verified present in upgrade-all skill.
[A2] filed: stale description in SKILL.md (guard text still says 'until that lands').
M1 and M2 pending Builder claim.
2026-06-13 05:57:07 +00:00


REVIEW — phase pvcheck (post-proxy verification)

Adversary-owned. Append-only verdicts. All commands run cold from /srv/cc-ci-orch/cc-ci-adv (own clone).


Adversary baseline probe — 2026-06-13T05:56Z

Context: phase pvfix is DONE (STATUS-pvfix.md shows ## DONE). The pvcheck preconditions were verified cold.

Precondition checks

| Check | Result |
|---|---|
| pvfix DONE | STATUS-pvfix.md shows ## DONE, both M1+M2 PASS |
| proxy subnet | 10.10.0.0/16 (`docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"`) |
| proxy IPAM | driver default, gateway 10.10.0.1 |
| All services 1/1 | 9 services all 1/1 (backups, bridge, dashboard, reports, drone, traefik×2, keycloak×2) |
| ci.commoninternet.net | HTTP/2 200 |
| drone.ci.commoninternet.net | HTTP/2 303 |
| report.ci.commoninternet.net | HTTP/2 200 |
| VIP exhaustion after 05:38Z | NONE (`journalctl -u docker --since "2026-06-13 05:38:00"`) |
| Transient errors at 05:35Z | "could not find network allocator STATE" for OLD net IDs (mlxau8…, 85p3aq…); these are expected during proxy recreation (the swarm allocator loses state for the deleted /24 network) |
| No new VIP exhaustion | post-fix journal clean |
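
The subnet and route rows of the table can be re-run from a shell. This is a minimal sketch, not the commands the Adversary actually ran: the helper names `expect_eq`, `http_status` and `check_preconditions` are hypothetical, the `https://` scheme is an assumption (the table only records HTTP/2 status lines), and redirects are deliberately not followed so the drone route can still report its 303.

```shell
#!/usr/bin/env bash
# Sketch: re-run the subnet and route rows of the precondition table.
# Helper names are hypothetical; expected values are copied from the table.
set -uo pipefail

# expect_eq LABEL ACTUAL EXPECTED: print PASS or FAIL, return 1 on mismatch.
expect_eq() {
  if [ "$2" = "$3" ]; then
    printf 'PASS %s (%s)\n' "$1" "$2"
  else
    printf 'FAIL %s: got %s, want %s\n' "$1" "$2" "$3"
    return 1
  fi
}

# http_status URL: print the HTTP status code of one request. Redirects are
# not followed (no -L), so a 303 is reported as 303. curl typically prints
# 000 when the request fails outright (DNS, TLS, timeout).
http_status() {
  curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1"
}

# check_preconditions: live checks; needs the docker CLI on a swarm manager
# and network access to the routes. The https:// scheme is an assumption.
check_preconditions() {
  local rc=0
  expect_eq "proxy subnet" \
    "$(docker network inspect proxy --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}')" \
    "10.10.0.0/16" || rc=1
  expect_eq "ci.commoninternet.net" "$(http_status https://ci.commoninternet.net)" 200 || rc=1
  expect_eq "drone.ci.commoninternet.net" "$(http_status https://drone.ci.commoninternet.net)" 303 || rc=1
  expect_eq "report.ci.commoninternet.net" "$(http_status https://report.ci.commoninternet.net)" 200 || rc=1
  return "$rc"
}
```

Run on the swarm manager, `check_preconditions` prints one PASS/FAIL line per row and returns non-zero if any row fails, so it can gate a later step.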

Command evidence:

```
$ docker network inspect proxy --format "{{json .IPAM}}"
{"Driver":"default","Options":null,"Config":[{"Subnet":"10.10.0.0/16","Gateway":"10.10.0.1"}]}

$ docker service ls --format "{{.Name}}\t{{.Replicas}}"
backups_ci_commoninternet_net_app	1/1
ccci-bridge_app	1/1
ccci-dashboard_app	1/1
ccci-reports_app	1/1
drone_ci_commoninternet_net_app	1/1
traefik_ci_commoninternet_net_app	1/1
traefik_ci_commoninternet_net_socket-proxy	1/1
warm-keycloak_ci_commoninternet_net_app	1/1
warm-keycloak_ci_commoninternet_net_db	1/1
```
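
The 1/1 column in the service listing can be checked mechanically rather than by eye. A minimal sketch, assuming input in exactly the `--format "{{.Name}}\t{{.Replicas}}"` shape shown in the command evidence; `not_fully_replicated` is a hypothetical helper name. It also drops anything after the counts (for example a per-node replica limit note, in case Docker appends one), which is an assumption about the Replicas column rather than something observed here.

```shell
#!/usr/bin/env bash
# Sketch: list services whose running replica count differs from the desired
# count. Input: lines of NAME<TAB>RUNNING/DESIRED, as produced by
#   docker service ls --format "{{.Name}}\t{{.Replicas}}"
# not_fully_replicated is a hypothetical helper name.
set -uo pipefail

not_fully_replicated() {
  awk -F'\t' '
    NF < 2 { next }                 # skip blank or malformed lines
    {
      reps = $2
      sub(/ .*/, "", reps)          # drop any trailing note after the counts
      split(reps, r, "/")           # r[1] = running, r[2] = desired
      if (r[1] != r[2]) { print $1 " " $2; bad++ }
    }
    END { exit (bad ? 1 : 0) }      # non-zero if any service is short
  '
}
```

Usage: `docker service ls --format "{{.Name}}\t{{.Replicas}}" | not_fully_replicated && echo "all services fully replicated"` prints nothing but the confirmation when every service is N/N, and names the short services otherwise.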

Upgrade-all Step-0 guard — independent check

Guard location: /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md §0, lines 61-81
Guard logic: `VIPFAIL=$(ssh cc-ci 'journalctl -u docker --since "26 hours ago" | grep -c "available IP while allocating VIP"')` → if >0, `systemctl restart docker`
Guard exists: confirmed by cold read
Guard would fire: it matches the EXACT original error signature ("available IP while allocating VIP"), so it would detect and recover from VIP exhaustion if it recurred despite the /16 fix (belt+suspenders).
STALE TEXT NOTE: the skill still says "(The durable fix ... is tracked in plan-proxy-vip-exhaustion-fix.md; this guard is the per-run safety net until that lands.)", but the durable fix HAS now landed. This is a documentation smell, not a functional defect: the guard logic is correct and still useful. Filed as advisory finding [A2].
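
The guard logic above can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not the contents of SKILL.md §0: `count_vip_failures` and `vip_guard` are hypothetical names, `cc-ci` is taken to be an SSH alias for the swarm host, and restarting Docker with `sudo` over SSH is an assumption about how the skill issues `systemctl restart docker`. Unlike the one-liner in the skill, the sketch counts matches locally, which keeps the matching step testable without a live host.

```shell
#!/usr/bin/env bash
# Sketch of the Step-0 VIP guard summarised above (not the text of SKILL.md).
# Assumptions (hypothetical): "cc-ci" is an SSH alias for the swarm host and
# the remote user may run `sudo systemctl restart docker`.
set -uo pipefail   # pipefail lets a failed ssh surface through the pipeline

VIP_SIGNATURE='available IP while allocating VIP'

# count_vip_failures: count stdin lines carrying the original VIP-exhaustion
# signature. grep -c prints 0 but exits 1 when nothing matches; "|| true"
# keeps that case successful so the count is always printed.
count_vip_failures() {
  grep -c -F -- "$VIP_SIGNATURE" || true
}

# vip_guard: scan the last 26 hours of the docker journal on cc-ci and
# restart docker there if the signature appears at least once.
vip_guard() {
  local vipfail
  if ! vipfail=$(ssh cc-ci 'journalctl -u docker --since "26 hours ago"' | count_vip_failures); then
    echo "could not read the docker journal on cc-ci" >&2
    return 2
  fi
  if [ "${vipfail:-0}" -gt 0 ]; then
    echo "VIP exhaustion seen ${vipfail} time(s); restarting docker on cc-ci"
    ssh cc-ci 'sudo systemctl restart docker'
  else
    echo "no VIP exhaustion in the last 26 hours"
  fi
}
```

Reporting an SSH failure separately (exit status 2) keeps an unreachable host from being mistaken for a clean journal, which would otherwise silently skip the restart.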


M1 — PENDING (awaiting Builder claim)

Builder has not yet claimed M1. Adversary will verify cold once the claim appears in STATUS-pvcheck.md.

M2 — PENDING (awaiting Builder claim)


Adversary findings

[A2] upgrade-all SKILL.md stale description — guard text still says "until that lands" (2026-06-13T05:56Z)

Severity: Documentation / low
Location: /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md line 81
Current text: "this guard is the per-run safety net until that lands"
Issue: the durable fix (proxy /16) has landed, so this text now misleads about the guard's purpose: it IS still useful as belt+suspenders, but no longer as a stopgap "until the fix lands".
Suggested fix: update to "this guard remains as belt-and-suspenders even after the /16 subnet fix"
NOT a VETO: the guard logic is correct; this is documentation only.
Status: open (Builder may fix; Adversary closes after re-read)
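
Closing [A2] only needs a re-read for the stale phrase. A minimal sketch of that check; `a2_status` is a hypothetical helper, and it only tests for the literal phrase "until that lands", so a rewording that keeps the stale meaning would still need a human read.

```shell
#!/usr/bin/env bash
# Sketch: the re-read check for closing [A2]. a2_status is a hypothetical
# helper; it only looks for the literal stale phrase.
set -uo pipefail

STALE_PHRASE='until that lands'

# a2_status FILE: print "open" if FILE still contains the stale phrase
# (matching lines go to stderr with their line numbers), "closable" if not.
# Returns 2 if FILE cannot be read.
a2_status() {
  local file=${1:-}
  if [ ! -r "$file" ]; then
    echo "cannot read: ${file:-<none>}" >&2
    return 2
  fi
  if grep -n -F -- "$STALE_PHRASE" "$file" >&2; then
    echo open
  else
    echo closable
  fi
}
```

Usage: `a2_status /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md` prints `open` (with the offending line numbers on stderr) while the old wording remains, and `closable` once it is gone.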