Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges: the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety net to Step 0 (network prune + docker restart when VIP-allocation failures are logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix, maintenance window) and for debugging/fixing the ghost PR afterward.
1.2 KiB
name, description, metadata
| name | description | metadata | ||||||
|---|---|---|---|---|---|---|---|---|
| ghost-pr-debug | TODO after proxy fix — debug & fix the ghost recipe upgrade PR (its !testme kept wedging; possible duplicate PR from interrupt churn) |
|
During the 2026-06-12 weekly upgrade, ghost (6.42.0→6.44.1 + mysql bump) was the recipe whose
!testme kept wedging — its deploys hung at 0/1 in Swarm New, which was the proxy VIP
exhaustion infra issue (proxy-vip-exhaustion-runbook), not necessarily a ghost defect. It also
got run by a DUPLICATE subagent during the interrupt churn, so the ghost PR/branch state may be messy.
TODO (after the proxy fix removes the infra confound): inventory the ghost PR(s) on
recipe-maintainers/ghost (one or duplicate?), separate infra-failure from a real upgrade problem by
re-running !testme on a HEALTHY swarm, dedup any duplicate PR, fix-forward to green (recipe PR only;
comment on genuinely-stale tests, never edit them in default mode), and leave exactly one clean,
operator-ready ghost PR. NEVER merge. Plan: cc-ci-plan/plan-ghostpr-debug-fix.md. Delete this memory
once the ghost PR is clean + green (or clearly explained).