Files
cc-ci/machine-docs/JOURNAL-pvcheck.md
autonomic-bot 935b6ae7bc
Some checks failed
continuous-integration/drone/push Build is failing
claim(pvcheck-M2): real CI run + allocator proof — M2 evidence complete
Real deploy: hedgedoc build #608 triggered 06:02Z (post-proxy-fix at 05:38Z),
passed 06:04Z at level 5. Proxy endpoints: 7 (clean teardown, no leaks).

Allocator headroom: 5 throwaway nginx stacks deployed+removed concurrently.
BASELINE=8, AFTER_DEPLOY=13, AFTER_RM=8 (baseline restored). 0 VIP errors,
0 leaked endpoints, 0 residue. Consistent with Adversary's independent probe.

VIP exhaustion since 05:38Z: 0 errors.
[A2] CLOSED by Adversary (orchestrator commit 84e13a7 confirmed).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 06:06:23 +00:00

88 lines
4.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# JOURNAL — phase pvcheck (post-proxy verification)
Builder-private reasoning and working notes. Anti-anchoring: Adversary reads STATUS for claims, not this file.
---
## 2026-06-13T05:5506:02Z — Phase orientation and M1 data collection
Phase pvfix is DONE. Entered pvcheck. No phase files existed yet — the Adversary had proactively created REVIEW-pvcheck.md and BACKLOG-pvcheck.md with a baseline probe at 05:56Z.
**Adversary baseline findings (from REVIEW-pvcheck.md):**
- All preconditions verified cold (pvfix DONE, proxy /16 live, all services 1/1, all routes 200/303)
- [A2]: stale text in upgrade-all SKILL.md — "per-run safety net until that lands" (fix: proxy /16 HAS landed)
**My verification runs:**
```
$ ssh cc-ci 'docker network inspect proxy --format "{{range .IPAM.Config}}{{.Subnet}}{{end}}, Endpoints: {{len .Containers}}"'
10.10.0.0/16, Endpoints: 7
$ curl -sk -o /dev/null -w "%{http_code}" https://ci.commoninternet.net/ → 200
$ curl -sk -o /dev/null -w "%{http_code}" https://drone.ci.commoninternet.net/ → 303
$ curl -sk -o /dev/null -w "%{http_code}" https://report.ci.commoninternet.net/ → 200
$ ssh cc-ci 'journalctl -u docker --since "2026-06-13 05:38:00" | grep -c "available IP while allocating VIP"'
0
```
The "could not find network allocator STATE" errors in the 05:35Z window are expected transient noise: they occur when swarm tries to allocate VIPs for the old deleted /24 network IDs (mlxau8…, 85p3aq…) during the recreation — not the "available IP while allocating VIP" signature of actual exhaustion.
**A2 fix applied:**
- Edited `/srv/cc-ci-orch/.claude/skills/upgrade-all/SKILL.md` line 80-81
- Committed to orchestrator repo as `84e13a7`
- Guard logic unchanged — only the description now reflects reality (durable fix has landed)
**Decision on bridge /hook:** bridge is exposed at `PathPrefix(/hook)` and only accepts POST (webhook). A GET to `/hook` returns 404 — expected; health is confirmed via service logs showing the poller running and commenting on repos.
**M1 claim:** All control-plane facts documented. Claiming M1 now. Will work on M2 while awaiting verdict.
---
## 2026-06-13T06:02Z — M2 planning
M2 requires:
1. Real recipe CI run through proxy — will use a small enrolled recipe like `hedgedoc` or `cryptpad` if a !testme PR exists, or trigger via the harness directly
2. Allocator headroom proof — deploy/remove 3-5 throwaway stacks with published ports (simulating concurrent deploys), confirm endpoint count stays small and no VIP exhaustion
Will check what enrolled recipes have open PRs available for !testme first.
---
## 2026-06-13T06:0206:10Z — M2 execution
**Allocator headroom proof (Builder):**
```
# Baseline
ssh cc-ci 'docker network inspect proxy --format "{{len .Containers}}"' → 8
# Deploy 5 throwaway nginx stacks concurrently, each joining proxy with published ports
for i in 1..5: docker stack deploy pvcheck-throw-$i (background)
wait; sleep 5
→ AFTER DEPLOY: 13 (+5)
# Concurrent removal (same pattern as original GC race)
for i in 1..5: docker stack rm pvcheck-throw-$i (background)
wait; sleep 8
→ AFTER concurrent rm: 8 (back to baseline)
→ VIP exhaustion errors since 06:00Z: 0
→ docker network prune → empty (no residue)
→ docker stack ls | grep pvcheck → empty (all removed)
```
**Real recipe CI run:**
```
# Posted !testme on recipe-maintainers/hedgedoc PR#1 at 06:02:48Z (post-proxy-fix)
curl POST /repos/recipe-maintainers/hedgedoc/issues/1/comments body="!testme"
→ comment id: 14505
# Bridge picked up in 4 seconds (06:02:52Z)
# Started Drone build #608 for hedgedoc @ 441c411c
# Monitored: runner process PID 3016375 with RECIPE=hedgedoc, CI_BUILD_NUMBER=608
# Build #608 completed at 06:04:22Z → ✅ passed, level 5
# Proxy endpoint count after run: 7 (same as M1 baseline, clean teardown)
```
Key confirmation: the build was triggered at 06:02Z which is 24 minutes AFTER the proxy recreation at 05:38Z. Recipe containers deployed into and cleaned up from the /16 proxy network without issue.