Some checks failed
continuous-integration/drone/push Build is failing
Real deploy: hedgedoc build #608 triggered 06:02Z (post-proxy-fix at 05:38Z), passed 06:04Z at level 5. Proxy endpoints: 7 (clean teardown, no leaks). Allocator headroom: 5 throwaway nginx stacks deployed+removed concurrently. BASELINE=8, AFTER_DEPLOY=13, AFTER_RM=8 (baseline restored). 0 VIP errors, 0 leaked endpoints, 0 residue. Consistent with Adversary's independent probe. VIP exhaustion since 05:38Z: 0 errors. [A2] CLOSED by Adversary (orchestrator commit 84e13a7 confirmed). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
88 lines
4.0 KiB
Markdown
88 lines
4.0 KiB
Markdown
# JOURNAL — phase pvcheck (post-proxy verification)
|
||
|
||
Builder-private reasoning and working notes. Anti-anchoring: Adversary reads STATUS for claims, not this file.
|
||
|
||
---
|
||
|
||
## 2026-06-13T05:55–06:02Z — Phase orientation and M1 data collection
|
||
|
||
Phase pvfix is DONE. Entered pvcheck. No phase files existed yet — the Adversary had proactively created REVIEW-pvcheck.md and BACKLOG-pvcheck.md with a baseline probe at 05:56Z.
|
||
|
||
**Adversary baseline findings (from REVIEW-pvcheck.md):**
|
||
- All preconditions verified cold (pvfix DONE, proxy /16 live, all services 1/1, all routes 200/303)
|
||
- [A2]: stale text in upgrade-all SKILL.md — "per-run safety net until that lands" (fix: proxy /16 HAS landed)
|
||
|
||
**My verification runs:**
|
||
```
|
||
$ ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"'
|
||
10.10.0.0/16, Endpoints: 7
|
||
|
||
$ curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/ → 200
|
||
$ curl -sk -o /dev/null -w "%{http_code}" https://drone.ci.commoninternet.net/ → 303
|
||
$ curl -sk -o /dev/null -w "%{http_code}" https://report.ci.commoninternet.net/ → 200
|
||
|
||
$ ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
|
||
0
|
||
```
|
||
|
||
The "could not find network allocator STATE" errors in the 05:35Z window are expected transient noise: they occur when swarm tries to allocate VIPs for the old deleted /24 network IDs (mlxau8…, 85p3aq…) during the recreation — not the "available IP while allocating VIP" signature of actual exhaustion.
|
||
|
||
**A2 fix applied:**
|
||
- Edited `/srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md` line 80-81
|
||
- Committed to orchestrator repo as `84e13a7`
|
||
- Guard logic unchanged — only the description now reflects reality (durable fix has landed)
|
||
|
||
**Decision on bridge /hook:** bridge is exposed at `PathPrefix(/hook)` and only accepts POST (webhook). A GET to `/hook` returns 404 — expected; health is confirmed via service logs showing the poller running and commenting on repos.
|
||
|
||
**M1 claim:** All control-plane facts documented. Claiming M1 now. Will work on M2 while awaiting verdict.
|
||
|
||
---
|
||
|
||
## 2026-06-13T06:02Z — M2 planning
|
||
|
||
M2 requires:
|
||
1. Real recipe CI run through proxy — will use a small enrolled recipe like `hedgedoc` or `cryptpad` if a !testme PR exists, or trigger via the harness directly
|
||
2. Allocator headroom proof — deploy/remove 3-5 throwaway stacks with published ports (simulating concurrent deploys), confirm endpoint count stays small and no VIP exhaustion
|
||
|
||
Will check what enrolled recipes have open PRs available for !testme first.
|
||
|
||
---
|
||
|
||
## 2026-06-13T06:02–06:10Z — M2 execution
|
||
|
||
**Allocator headroom proof (Builder):**
|
||
```
|
||
# Baseline
|
||
ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"' → 8
|
||
|
||
# Deploy 5 throwaway nginx stacks concurrently, each joining proxy with published ports
|
||
for i in 1..5: docker stack deploy pvcheck-throw-$i (background)
|
||
wait; sleep 5
|
||
→ AFTER DEPLOY: 13 (+5)
|
||
|
||
# Concurrent removal (same pattern as original GC race)
|
||
for i in 1..5: docker stack rm pvcheck-throw-$i (background)
|
||
wait; sleep 8
|
||
→ AFTER concurrent rm: 8 (back to baseline)
|
||
→ VIP exhaustion errors since 06:00Z: 0
|
||
→ docker network prune → empty (no residue)
|
||
→ docker stack ls | grep pvcheck → empty (all removed)
|
||
```
|
||
|
||
**Real recipe CI run:**
|
||
```
|
||
# Posted !testme on recipe-maintainers/hedgedoc PR#1 at 06:02:48Z (post-proxy-fix)
|
||
curl POST /repos/recipe-maintainers/hedgedoc/issues/1/comments body="!testme"
|
||
→ comment id: 14505
|
||
|
||
# Bridge picked up in 4 seconds (06:02:52Z)
|
||
# Started Drone build #608 for hedgedoc @ 441c411c
|
||
|
||
# Monitored: runner process PID 3016375 with RECIPE=hedgedoc, CI_BUILD_NUMBER=608
|
||
|
||
# Build #608 completed at 06:04:22Z → ✅ passed, level 5
|
||
# Proxy endpoint count after run: 7 (same as M1 baseline, clean teardown)
|
||
```
|
||
|
||
Key confirmation: the build was triggered at 06:02Z which is 24 minutes AFTER the proxy recreation at 05:38Z. Recipe containers deployed into and cleaned up from the /16 proxy network without issue.
|