claim(pvcheck-M2): real CI run + allocator proof — M2 evidence complete
Some checks failed
continuous-integration/drone/push Build is failing
Real deploy: hedgedoc build #608 triggered 06:02Z (after the proxy fix at 05:38Z) and passed at 06:04Z at level 5. Proxy endpoints: 7 (clean teardown, no leaks).
Allocator headroom: 5 throwaway nginx stacks deployed and removed concurrently. BASELINE=8, AFTER_DEPLOY=13, AFTER_RM=8 (baseline restored). 0 VIP errors, 0 leaked endpoints, 0 residue. Consistent with the Adversary's independent probe.
VIP exhaustion since 05:38Z: 0 errors.
[A2] CLOSED by Adversary (orchestrator commit 84e13a7 confirmed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
# STATUS — phase pvcheck (post-proxy verification)
**Updated:** 2026-06-13T06:10Z
**Phase:** pvcheck
**Builder:** autonomic-bot

---
## Gate: M1 — PASS @2026-06-13T06:10Z (Adversary verified)
### M1 — Control plane and routing verified
**Claim:** All cc-ci control-plane routes and services are healthy after the proxy recreation, with before/after evidence captured below.

#### How to verify (run cold from Adversary's clone on cc-ci host):
```bash
# 1. Proxy subnet and endpoint count
ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}Subnet: {{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"'
# EXPECTED: Subnet: 10.10.0.0/16, Endpoints: 7

# 2. All services healthy
ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'
# EXPECTED: all 9 services show 1/1

# 3. External routes
curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/          # EXPECTED: 200
curl -sk -o /dev/null -w "%{http_code}" https://drone.ci.commoninternet.net/    # EXPECTED: 303
curl -sk -o /dev/null -w "%{http_code}" https://report.ci.commoninternet.net/   # EXPECTED: 200

# 4. No VIP exhaustion since proxy recreation (05:38Z)
ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
# EXPECTED: 0

# 5. Upgrade-all Step-0 guard exists and is correct
grep -A5 "VIPFAIL" /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md
# EXPECTED: guard logic checking for "available IP while allocating VIP" signature
```
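The `# EXPECTED:` comments above are compared by eye. Below is a minimal sketch of turning checks like 1 and 4 into pass/fail assertions, assuming bash on the Adversary's side. `expect_eq` and `count_vip_errors` are hypothetical helpers, not part of the cc-ci tooling. One detail is worth encoding: when nothing matches, `grep -c` prints `0` but exits with status 1, so a clean journal would otherwise look like a failed command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cc-ci) for scripting the M1 checks.

# Compare a check's actual output with its EXPECTED value.
# Prints PASS on stdout; on mismatch prints FAIL on stderr and returns 1.
expect_eq() {
  local label=$1 actual=$2 expected=$3
  if [[ "$actual" == "$expected" ]]; then
    printf 'PASS %s: %s\n' "$label" "$actual"
  else
    printf 'FAIL %s: got %q, expected %q\n' "$label" "$actual" "$expected" >&2
    return 1
  fi
}

# Count VIP-exhaustion lines in a docker journal read from stdin.
# grep -c prints 0 but exits 1 when nothing matches; `|| true` keeps the
# count usable under `set -e` without changing the printed number.
count_vip_errors() {
  grep -c "available IP while allocating VIP" || true
}

# Example run (needs ssh access to cc-ci; not executed here):
#   fails=0
#   expect_eq "proxy endpoints" \
#     "$(ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"')" 7 \
#     || fails=$((fails + 1))
#   expect_eq "VIP errors" \
#     "$(ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00"' | count_vip_errors)" 0 \
#     || fails=$((fails + 1))
#   (( fails == 0 ))
```

The example uses `fails=$((fails + 1))` rather than `((fails++))` because the latter returns status 1 when `fails` is 0, which would abort a `set -e` script.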
#### Evidence (Builder run 2026-06-13T06:00Z):

| Check | Command | Result |
|---|---|---|
| proxy subnet | `docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"` | `10.10.0.0/16` ✅ |
| proxy endpoints | `docker network inspect proxy --format "{{len .Containers}}"` | `7` (6 service endpoints + 1 load-balancer endpoint) ✅ |
| proxy endpoint list | `docker network inspect proxy --format '{{range .Containers}}{{.Name}} {{end}}'` | drone, traefik, keycloak, reports, bridge, dashboard + lb-proxy ✅ |
| 9 services 1/1 | `docker service ls` | all 1/1 ✅ |
| ci.commoninternet.net | `curl -sk -o /dev/null -w "%{http_code}"` | `200` ✅ |
| drone.ci.commoninternet.net | same | `303` ✅ |
| report.ci.commoninternet.net | same | `200` ✅ |
| VIP exhaustion since 05:38Z | `journalctl -u docker --since "2026-06-13 05:38:00" \| grep -c "available IP while allocating VIP"` | `0` ✅ |
| transient errors at 05:35Z | "could not find network allocator STATE" for old network IDs | expected during the recreation; seen only before 05:38Z ✅ |
| upgrade-all Step-0 guard | SKILL.md §0, lines 61-81 | guard checks the exact signature, fires, and restarts docker ✅ |
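The "9 services 1/1" row can also be checked mechanically from the tab-separated output of check 2. Below is a minimal sketch that assumes the Docker CLI expands `\t` in `--format` to a real tab. `all_replicas_full` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name<TAB>running/desired" lines on stdin, e.g.
#   ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"' | all_replicas_full 9
# Fails if any service's running count differs from its desired count, or if
# the number of services differs from the expected count passed as $1.
all_replicas_full() {
  local expected_services=$1
  awk -F'\t' -v want="$expected_services" '
    {
      n++
      split($2, r, "/")            # "1/1" -> r[1]="1", r[2]="1"
      if (r[1] + 0 != r[2] + 0) { print "not converged: " $1 " " $2; bad++ }
    }
    END {
      if (n != want) { print "service count " n ", expected " want; bad++ }
      exit (bad > 0 ? 1 : 0)
    }'
}
```

Comparing running against desired counts, rather than grepping for the literal `1/1`, also covers services that are meant to run more than one replica.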
#### Before/after evidence:

| Metric | Before (pvfix) | After (pvcheck) |
|---|---|---|
| proxy subnet | `10.0.1.0/24` (254 usable IPs) | `10.10.0.0/16` (65534 usable IPs) |
| proxy endpoints | ~200 leaked (caused VIP exhaustion) | 7 (clean) |
| VIP exhaustion errors | recurring "could not find an available IP" | 0 since 05:38Z |
| Services healthy | intermittent failures | all 9 at 1/1 |
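The address counts in the table follow from the prefix length. An IPv4 /p subnet has 2^(32-p) addresses, and two of them (network and broadcast) are not assignable. Here is a quick bash check. Docker's IPAM also takes the gateway address, so the allocator's real headroom is slightly lower.

```shell
#!/usr/bin/env bash
# Usable host addresses in an IPv4 subnet with prefix length p: 2^(32-p) - 2
# (network and broadcast addresses excluded).
usable_hosts() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 2 ))
}

usable_hosts 24   # old proxy subnet 10.0.1.0/24  -> 254
usable_hosts 16   # new proxy subnet 10.10.0.0/16 -> 65534
```

For scale, ~200 leaked endpoints used roughly 80% of the old /24 (200/254). The same leak would use about 0.3% of the /16.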
#### Adversary finding A2 fix:

[A2] upgrade-all SKILL.md stale description: **FIXED** in orchestrator repo commit `84e13a7` (2026-06-13T05:59Z).
Guard description changed from "safety net until that lands" to "belt-and-suspenders even after the /16 fix".

All cc-ci control-plane routes and services are healthy after the proxy recreation. See REVIEW-pvcheck.md for the Adversary's cold-verify evidence.

---
## Gate: M2 — CLAIMED, awaiting Adversary
### M2 — Real CI and allocator proof
**Claim:** One real recipe CI run (hedgedoc build #608) completed successfully through the proxy, and a bounded allocator probe (5 throwaway stacks deployed and removed concurrently) returned the proxy network to baseline with no VIP exhaustion.

#### How to verify (run cold from Adversary's clone):
```bash
# 1. Real CI run passed post-fix
# Build #608 for hedgedoc triggered 2026-06-13T06:02Z, passed 2026-06-13T06:04Z
curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/runs/608/summary.png
# EXPECTED: 200

curl -sk https://ci.commoninternet.net/runs/608/badge.svg | grep -o "level [0-9]"
# EXPECTED: level 5 (green)

# Gitea comment on recipe-maintainers/hedgedoc PR#1 (comment #14506).
# GITEA_URL is a placeholder (the Gitea base URL is not recorded here); assumes jq is installed.
curl -s "$GITEA_URL/api/v1/repos/recipe-maintainers/hedgedoc/issues/comments/14506" | jq -r .body
# EXPECTED (in the comment body): "cc-ci: hedgedoc @ 441c411c ✅ passed"

# 2. Proxy clean after run
ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"'
# EXPECTED: 7 (same as M1 baseline; no leaked endpoints from the run)

# 3. No VIP exhaustion since proxy recreation (05:38Z)
ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
# EXPECTED: 0

# 4. Allocator headroom proof (Adversary's independent probe is in REVIEW-pvcheck.md)
# Builder's proof: deploy 5 throwaway stacks concurrently → count endpoints → rm concurrently → count again
# EXPECTED: endpoints return to baseline, 0 VIP errors, 0 residue
```
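Step 4 above describes the probe but gives no commands. Below is a sketch of one way to run it, under stated assumptions: the stack names `pvcheck-throw-N`, the `nginx:alpine` image, the fixed settle time, and a single-node swarm (so `docker network inspect` on the host sees every proxy endpoint). It is not the Builder's or the Adversary's actual script. `proxy_endpoints`, `check_headroom`, and `run_probe` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch of a bounded allocator-headroom probe, run on the cc-ci host.
# Assumptions (not from the cc-ci repo): stacks named pvcheck-throw-N,
# image nginx:alpine, a fixed settle time instead of polling, single-node swarm.
set -euo pipefail

proxy_endpoints() {
  docker network inspect proxy --format '{{len .Containers}}'
}

# Pure check: the deploy must add exactly n endpoints, and the rm must
# bring the count back exactly to baseline.
check_headroom() {
  local baseline=$1 after_deploy=$2 after_rm=$3 n=$4
  if (( after_deploy != baseline + n )); then
    echo "FAIL: expected $((baseline + n)) endpoints after deploy, got $after_deploy" >&2
    return 1
  fi
  if (( after_rm != baseline )); then
    echo "FAIL: $after_rm endpoints after rm, baseline was $baseline" >&2
    return 1
  fi
  echo "OK: +$n on deploy, back to $baseline after rm"
}

run_probe() {
  local n=${1:-5} settle=${2:-20} compose i baseline after_deploy after_rm
  compose=$(mktemp)
  # One nginx service per stack, attached only to the existing external proxy network.
  cat > "$compose" <<'EOF'
version: "3.8"
services:
  web:
    image: nginx:alpine
    networks: [proxy]
networks:
  proxy:
    external: true
EOF
  baseline=$(proxy_endpoints)
  for i in $(seq 1 "$n"); do
    docker stack deploy -c "$compose" "pvcheck-throw-$i" &   # concurrent deploys
  done
  wait
  sleep "$settle"                     # crude: give the tasks time to start
  after_deploy=$(proxy_endpoints)
  for i in $(seq 1 "$n"); do
    docker stack rm "pvcheck-throw-$i" &                      # concurrent removals
  done
  wait
  sleep "$settle"                     # crude: give the tasks time to stop
  after_rm=$(proxy_endpoints)
  rm -f "$compose"
  check_headroom "$baseline" "$after_deploy" "$after_rm" "$n"
}
```

On the cc-ci host, `run_probe 5` has the same shape as the Builder's 5-stack run, whose counts are recorded in the allocator table below. Polling until the endpoint count stops changing would be more robust than a fixed sleep. Keeping `check_headroom` free of Docker calls lets the pass/fail rule be exercised without a swarm.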
#### Evidence (Builder run 2026-06-13T06:02–06:10Z):

**Real deploy proof:**

| Check | Result |
|---|---|
| Recipe | `hedgedoc` |
| Trigger | `!testme` comment on recipe-maintainers/hedgedoc PR#1 (comment #14505, 06:02:48Z) |
| Bridge response | 4 seconds (comment #14506, 06:02:52Z) |
| Drone build | [#608](https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/608) |
| Build result | ✅ **passed** (comment updated 06:04:22Z) |
| Level | **level 5** (badge.svg shows `level 5`, green) |
| Summary artifact | `https://ci.commoninternet.net/runs/608/summary.png` → HTTP 200 |
| Proxy endpoint count after run | 7 (clean, same as M1 baseline) |
| Trigger time | 2026-06-13T06:02:48Z (after the proxy fix at 05:38Z) ✅ |

**Allocator headroom proof (Builder):**

| Check | Result |
|---|---|
| BASELINE proxy containers | 8 |
| AFTER concurrent deploy (5 throwaway nginx stacks) | 13 (+5) |
| AFTER concurrent stack rm | 8 (back to baseline) |
| Leaked endpoints | **0** |
| VIP exhaustion errors (since 06:00Z) | **0** |
| `docker network prune` residue | empty (nothing to reclaim) |
| All pvcheck-throw-* stacks removed | ✅ confirmed |

**Adversary independent allocator probe (from REVIEW-pvcheck.md):**
5 throwaway stacks deployed and removed concurrently → 0 leaks, 0 VIP errors, 0 residue (pre-verified 2026-06-13T06:02Z).

**VIP exhaustion in post-fix journal:**
`journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"` → **0** ✅

---
## Definition-of-Done checklist (pvcheck)
- [x] Control-plane routes are healthy (M1 PASS @06:10Z)
- [x] One real proxy-joining recipe CI run succeeds and cleans up (hedgedoc #608 PASS @06:04Z, level 5)
- [x] Bounded allocator reproduction documented (independent Builder and Adversary probes)
- [x] Fresh logs show no VIP exhaustion (0 errors since the proxy fix at 05:38Z)
- [x] Adversary signed off M1 in `machine-docs/REVIEW-pvcheck.md`
- [ ] Adversary signed off M2 in `machine-docs/REVIEW-pvcheck.md`