cc-ci-orchestrator/ghost-pr-debug.md at main

Files

autonomic-bot ca02a0dd6f upgrade-all: proxy VIP-exhaustion guard in Step 0; runbooks for proxy /16 enlarge + ghost PR debug

Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges:
the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks
endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety
net to Step 0 (network prune + docker restart when VIP-allocation failures are
logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix,
maintenance window) and for debugging/fixing the ghost PR afterward.

2026-06-12 03:30:00 +00:00

1.2 KiB

Raw Permalink Blame History

name, description, metadata

name

description

metadata

ghost-pr-debug

TODO after proxy fix — debug & fix the ghost recipe upgrade PR (its !testme kept wedging; possible duplicate PR from interrupt churn)

node_type	type	originSessionId
memory	project	85355980-5e4f-4f90-b1ca-d0e4fe82f04b

During the 2026-06-12 weekly upgrade, ghost (6.42.0→6.44.1 + mysql bump) was the recipe whose !testme kept wedging — its deploys hung at 0/1 in Swarm New, which was the proxy VIP exhaustion infra issue (proxy-vip-exhaustion-runbook), not necessarily a ghost defect. It also got run by a DUPLICATE subagent during the interrupt churn, so the ghost PR/branch state may be messy.

TODO (after the proxy fix removes the infra confound): inventory the ghost PR(s) on recipe-maintainers/ghost (one or duplicate?), separate infra-failure from a real upgrade problem by re-running !testme on a HEALTHY swarm, dedup any duplicate PR, fix-forward to green (recipe PR only; comment on genuinely-stale tests, never edit them in default mode), and leave exactly one clean, operator-ready ghost PR. NEVER merge. Plan: cc-ci-plan/plan-ghostpr-debug-fix.md. Delete this memory once the ghost PR is clean + green (or clearly explained).

1.2 KiB Raw Permalink Blame History

1.2 KiB

Raw Permalink Blame History