claim(pvcheck-M2): real CI run + allocator proof — M2 evidence complete
Some checks failed
continuous-integration/drone/push Build is failing
Real deploy: hedgedoc build #608 triggered 06:02Z (post-proxy-fix at 05:38Z), passed 06:04Z at level 5. Proxy endpoints: 7 (clean teardown, no leaks).

Allocator headroom: 5 throwaway nginx stacks deployed+removed concurrently. BASELINE=8, AFTER_DEPLOY=13, AFTER_RM=8 (baseline restored). 0 VIP errors, 0 leaked endpoints, 0 residue. Consistent with Adversary's independent probe.

VIP exhaustion since 05:38Z: 0 errors.

[A2] CLOSED by Adversary (orchestrator commit 84e13a7 confirmed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -6,10 +6,10 @@
- [x] Fix [A2] upgrade-all SKILL.md stale description (orchestrator commit 84e13a7)
- [x] Collect M1 evidence (proxy subnet, endpoints, service health, routes, VIP journal)
- [x] Claim M1 — control plane and routing verified
- [ ] M2: real recipe CI run through proxy (harness or !testme)
- [ ] M2: bounded allocator headroom proof (deploy/remove throwaway stacks, confirm no VIP exhaustion)
- [ ] M2: cleanup verification (zero residue)
- [ ] M2: claim gate after M1 PASS
- [x] M2: real recipe CI run through proxy — hedgedoc build #608 ✅ passed level 5 (06:04Z post-fix)
- [x] M2: bounded allocator headroom proof — 5 stacks deploy/rm, 0 leaks, 0 VIP errors (06:08Z)
- [x] M2: cleanup verification — proxy endpoints: 7 (baseline), no residue (06:09Z)
- [x] M2: claim gate

## Adversary findings

@@ -45,3 +45,43 @@ M2 requires:
2. Allocator headroom proof — deploy/remove 3-5 throwaway stacks with published ports (simulating concurrent deploys), confirm endpoint count stays small and no VIP exhaustion

Will check what enrolled recipes have open PRs available for !testme first.

---

## 2026-06-13T06:02–06:10Z — M2 execution

**Allocator headroom proof (Builder):**
```
# Baseline
ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"' → 8

# Deploy 5 throwaway nginx stacks concurrently, each joining proxy with published ports
for i in 1..5: docker stack deploy pvcheck-throw-$i (background)
wait; sleep 5
→ AFTER DEPLOY: 13 (+5)

# Concurrent removal (same pattern as original GC race)
for i in 1..5: docker stack rm pvcheck-throw-$i (background)
wait; sleep 8
→ AFTER concurrent rm: 8 (back to baseline)
→ VIP exhaustion errors since 06:00Z: 0
→ docker network prune → empty (no residue)
→ docker stack ls | grep pvcheck → empty (all removed)
```
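
The pass/fail arithmetic behind this proof can be sketched as a small shell helper. This is a minimal sketch and not the Builder's actual script: the function name `check_headroom` is hypothetical, it assumes each throwaway stack adds exactly one endpoint to `proxy`, and the three counts are assumed to come from the `docker network inspect proxy --format "{{len .Containers}}"` command shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original run): validates the endpoint
# counts recorded during one deploy/remove cycle.
# Args: baseline, count after deploy, count after removal, number of stacks.
check_headroom() {
  local baseline=$1 after_deploy=$2 after_rm=$3 stacks=$4
  # Assumption: each throwaway stack contributes exactly one endpoint.
  if [ "$after_deploy" -ne $((baseline + stacks)) ]; then
    echo "FAIL: expected $((baseline + stacks)) endpoints after deploy, got $after_deploy"
    return 1
  fi
  # Any surplus over the baseline after removal is a leaked endpoint.
  local leaked=$((after_rm - baseline))
  if [ "$leaked" -ne 0 ]; then
    echo "FAIL: $leaked leaked endpoint(s)"
    return 1
  fi
  echo "PASS: baseline restored, 0 leaked"
}

# Counts recorded in the run above: BASELINE=8, AFTER_DEPLOY=13, AFTER_RM=8.
check_headroom 8 13 8 5
```

Keeping the comparison separate from the `docker` calls means the same check can be applied to counts gathered by either the Builder or the Adversary.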

**Real recipe CI run:**
```
# Posted !testme on recipe-maintainers/hedgedoc PR#1 at 06:02:48Z (post-proxy-fix)
curl POST /repos/recipe-maintainers/hedgedoc/issues/1/comments body="!testme"
→ comment id: 14505

# Bridge picked up in 4 seconds (06:02:52Z)
# Started Drone build #608 for hedgedoc @ 441c411c

# Monitored: runner process PID 3016375 with RECIPE=hedgedoc, CI_BUILD_NUMBER=608

# Build #608 completed at 06:04:22Z → ✅ passed, level 5
# Proxy endpoint count after run: 7 (same as M1 baseline, clean teardown)
```

Key confirmation: the build was triggered at 06:02Z which is 24 minutes AFTER the proxy recreation at 05:38Z. Recipe containers deployed into and cleaned up from the /16 proxy network without issue.
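
The 24-minute gap can be re-derived from the two timestamps. A minimal sketch, assuming GNU `date` (the `-d` option is not available in BSD/macOS `date`); the function name `minutes_between` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: whole minutes between two UTC timestamps.
# Relies on GNU date's -d option to parse the timestamp strings.
minutes_between() {
  local start end
  start=$(date -u -d "$1" +%s)
  end=$(date -u -d "$2" +%s)
  echo $(( (end - start) / 60 ))
}

# Proxy recreation (05:38Z) to build trigger (06:02Z).
minutes_between "2026-06-13 05:38:00 UTC" "2026-06-13 06:02:00 UTC"   # prints 24
```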

@@ -1,87 +1,91 @@
# STATUS — phase pvcheck (post-proxy verification)

**Updated:** 2026-06-13T06:02Z
**Updated:** 2026-06-13T06:10Z
**Phase:** pvcheck
**Builder:** autonomic-bot

---

## Gate: M1 — CLAIMED, awaiting Adversary
## Gate: M1 — PASS @2026-06-13T06:10Z (Adversary verified)

### M1 — Control plane and routing verified

**Claim:** All cc-ci control-plane routes/services are healthy after the proxy recreation. Before/after evidence captured.

#### How to verify (run cold from Adversary's clone on cc-ci host):

```bash
# 1. Proxy subnet and endpoint count
ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}Subnet: {{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"'
# EXPECTED: Subnet: 10.10.0.0/16, Endpoints: 7

# 2. All services healthy
ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'
# EXPECTED: all 9 services show 1/1

# 3. External routes
curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/ # EXPECTED: 200
curl -sk -o /dev/null -w "%{http_code}" https://drone.ci.commoninternet.net/ # EXPECTED: 303
curl -sk -o /dev/null -w "%{http_code}" https://report.ci.commoninternet.net/ # EXPECTED: 200

# 4. No VIP exhaustion since proxy recreation (05:38Z)
ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
# EXPECTED: 0

# 5. Upgrade-all Step-0 guard exists and is correct
grep -A5 "VIPFAIL" /srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md
# EXPECTED: guard logic checking for "available IP while allocating VIP" signature
```
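
Check 2 above relies on reading the replica column by eye. A minimal sketch of automating it, assuming the tab-separated `{{.Name}}\t{{.Replicas}}` output format used in step 2; the function name `unhealthy_services` is hypothetical and the sample input below is illustrative, not captured output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "name<TAB>running/desired" lines on stdin and
# prints every service whose running count differs from its desired count.
# The "+0" forces a numeric comparison and ignores any trailing annotation
# after the desired count.
unhealthy_services() {
  awk -F'\t' '{ split($2, r, "/"); if ((r[1] + 0) != (r[2] + 0)) print $1 }'
}

# Illustrative input only; these replica counts are made up.
printf 'traefik\t1/1\ndrone\t0/1\n' | unhealthy_services   # prints drone
```

On the host, the same function could be fed from `ssh cc-ci 'docker service ls --format "{{.Name}}\t{{.Replicas}}"'`; empty output would then correspond to the "all 1/1" expectation.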

#### Evidence (Builder run 2026-06-13T06:00Z):

| Check | Command | Result |
|---|---|---|
| proxy subnet | `docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}"` | `10.10.0.0/16` ✅ |
| proxy endpoints | `docker network inspect proxy --format "{{len .Containers}}"` | `7` (6 service + 1 lb) ✅ |
| proxy endpoint list | `docker network inspect proxy --format "{{range $k,$v := .Containers}}{{$v.Name}}{{end}}"` | drone, traefik, keycloak, reports, bridge, dashboard + lb-proxy ✅ |
| 9 services 1/1 | `docker service ls` | all 1/1 ✅ |
| ci.commoninternet.net | `curl -sk -o /dev/null -w "%{http_code}"` | `200` ✅ |
| drone.ci.commoninternet.net | same | `303` ✅ |
| report.ci.commoninternet.net | same | `200` ✅ |
| VIP exhaustion since 05:38Z | `journalctl \| grep "available IP while allocating VIP"` | `0` ✅ |
| transient errors at 05:35Z | "could not find network allocator STATE" for old net IDs | expected during recreation, pre-38Z only ✅ |
| upgrade-all Step-0 guard | SKILL.md §0 lines 61-81 | guard checks exact signature, fires + restarts docker ✅ |

#### Before/after evidence:

| Metric | Before (pvfix) | After (pvcheck) |
|---|---|---|
| proxy subnet | `10.0.1.0/24` (254 IPs) | `10.10.0.0/16` (65534 IPs) |
| proxy endpoints | ~200 leaked (caused VIP exhaustion) | 7 (clean) |
| VIP exhaustion errors | recurring "could not find an available IP" | 0 since 05:38Z |
| Services healthy | intermittent failures | all 9 at 1/1 |
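
The IP counts in the subnet row follow from the prefix length: total addresses minus the network and broadcast addresses. A minimal sketch of that arithmetic (the function name `usable_hosts` is hypothetical; the allocator may reserve further addresses, for example for a gateway, so these figures are best read as upper bounds):

```shell
#!/usr/bin/env bash
# Hypothetical helper: usable host addresses for an IPv4 prefix length,
# i.e. 2^(32 - prefix) minus the network and broadcast addresses.
usable_hosts() {
  echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 24   # prints 254
usable_hosts 16   # prints 65534
```

With roughly 200 leaked endpoints, a /24 was close to full, while the same leak would consume well under 1% of a /16.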

#### Adversary finding A2 fix:

[A2] upgrade-all SKILL.md stale description — **FIXED** in orchestrator repo commit `84e13a7` (2026-06-13T05:59Z).
Guard description updated from "safety net until that lands" → "belt-and-suspenders even after the /16 fix".
All cc-ci control-plane routes/services healthy after proxy recreation. See REVIEW-pvcheck.md for Adversary cold-verify evidence.

---

## M2 — IN PROGRESS
## Gate: M2 — CLAIMED, awaiting Adversary

### Tasks for M2:
- [ ] Real deploy proof: trigger one recipe `!testme` or equivalent harness run through proxy
- [ ] Allocator-headroom proof: deploy/remove batch of throwaway stacks, confirm no VIP exhaustion
- [ ] Confirm no residue after cleanup
### M2 — Real CI and allocator proof

**Claim:** One real recipe CI run (hedgedoc build #608) completed successfully through proxy, and bounded allocator proof confirms no VIP exhaustion risk.

#### How to verify (run cold from Adversary's clone):

```bash
# 1. Real CI run passed post-fix
# Build #608 for hedgedoc triggered 2026-06-13T06:02Z, passed 2026-06-13T06:04Z
curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/runs/608/summary.png
# EXPECTED: 200

curl -sk https://ci.commoninternet.net/runs/608/badge.svg | grep -o "level [0-9]"
# EXPECTED: level 5 (green)

# Gitea comment on recipe-maintainers/hedgedoc PR#1 (comment #14506)
# EXPECTED: "cc-ci: hedgedoc @ 441c411c ✅ passed"

# 2. Proxy clean after run
ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"'
# EXPECTED: 7 (same as M1 baseline — no leaked endpoints from the run)

# 3. No VIP exhaustion since proxy recreation
ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
# EXPECTED: 0

# 4. Allocator headroom proof (Adversary's independent probe is in REVIEW-pvcheck.md)
# Builder's proof: deploy 5 throwaway stacks → rm concurrently → count endpoints
# EXPECTED: endpoints return to baseline, 0 VIP errors, 0 residue
```
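
One practical detail when scripting check 3: `grep -c` prints `0` when nothing matches but also exits with status 1, which a script running under `set -e` would treat as a failure. A minimal sketch of a wrapper that tolerates the clean case (the function name `count_vip_errors` is hypothetical, and the sample input is illustrative rather than real journal output):

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts VIP-exhaustion lines on stdin.
# grep -c prints 0 and exits 1 when there is no match, so "|| true" keeps a
# clean journal from aborting a script that runs under "set -e".
count_vip_errors() {
  grep -c "available IP while allocating VIP" || true
}

# Illustrative input only, not real journal output.
printf 'dockerd: task started\n' | count_vip_errors   # prints 0
```

On the host, the function could be fed from `ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00"'`, and the printed count compared against the expected `0`.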

#### Evidence (Builder run 2026-06-13T06:02–06:10Z):

**Real deploy proof:**

| Check | Result |
|---|---|
| Recipe | `hedgedoc` |
| Trigger | `!testme` comment on recipe-maintainers/hedgedoc PR#1 (comment #14505, 06:02:48Z) |
| Bridge response | 4 seconds (comment #14506, 06:02:52Z) |
| Drone build | [#608](https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/608) |
| Build result | ✅ **passed** (comment updated 06:04:22Z) |
| Level | **level 5** (badge.svg shows `level 5`, green) |
| Summary artifact | `https://ci.commoninternet.net/runs/608/summary.png` → HTTP 200 |
| Proxy endpoint count after run | 7 (clean — same as M1 baseline) |
| Trigger time | 2026-06-13T06:02:48Z (after proxy fix at 05:38Z) ✅ |

**Allocator headroom proof (Builder):**

| Check | Result |
|---|---|
| BASELINE proxy containers | 8 |
| AFTER concurrent deploy (5 throwaway nginx stacks) | 13 (+5) |
| AFTER concurrent stack rm | 8 (back to baseline) |
| Leaked endpoints | **0** |
| VIP exhaustion errors (since 06:00Z) | **0** |
| `docker network prune` residue | empty (nothing to reclaim) |
| All pvcheck-throw-* stacks removed | ✅ confirmed |

**Adversary independent allocator probe (from REVIEW-pvcheck.md):**
5 throwaway stacks deployed/removed concurrently → 0 leaks, 0 VIP errors, 0 residue. (Pre-verified 2026-06-13T06:02Z)

**VIP exhaustion in post-fix journal:**
`journalctl -u docker --since "2026-06-13 05:38:00" | grep "available IP while allocating VIP"` → **0** ✅

---

## Definition-of-Done checklist (pvcheck)

- [ ] Control-plane routes are healthy (M1 — claimed)
- [ ] One real proxy-joining recipe CI run succeeds and cleans up (M2)
- [ ] Bounded allocator reproduction documented (M2)
- [ ] Fresh logs show no VIP exhaustion (M1 — claimed, ongoing)
- [ ] Adversary signed off M1 in `machine-docs/REVIEW-pvcheck.md`
- [x] Control-plane routes are healthy (M1 PASS @06:10Z)
- [x] One real proxy-joining recipe CI run succeeds and cleans up (hedgedoc #608 PASS @06:04Z, level 5)
- [x] Bounded allocator reproduction documented (Builder + Adversary independent probes)
- [x] Fresh logs show no VIP exhaustion (0 errors since proxy fix at 05:38Z)
- [x] Adversary signed off M1 in `machine-docs/REVIEW-pvcheck.md`
- [ ] Adversary signed off M2 in `machine-docs/REVIEW-pvcheck.md`