71 lines
3.8 KiB
Markdown
# Phase `ghost` — re-evaluate ghost after proxy fix and leave one clean PR

**Mission:** re-evaluate the `ghost` upgrade failure after the Swarm proxy/IPAM infra
confound has been removed, then leave exactly one operator-ready ghost PR: green if the
recipe is sound, or clearly explained with the minimum required recipe fix/comment if a real
Ghost/MySQL upgrade issue remains.

State files live under `machine-docs/`: `STATUS-ghost.md`, `BACKLOG-ghost.md`,
`REVIEW-ghost.md`, `JOURNAL-ghost.md`.

## Context

The 2026-06-12 `/upgrade-all` recorded `ghost` as the only failed recipe, but the evidence
was mixed:

- One failure was definitely infra: shared `proxy` overlay VIP exhaustion left tasks stuck
  in Swarm `New` state.
- A later failure may be recipe-specific: MySQL 8.0 to 8.4 data-dir upgrade timing under
  Swarm's default update monitor, producing `UpdateStatus=paused` under load.
- A previous run on 2026-06-05 passed the Ghost/MySQL path under lighter load.
- Duplicate ghost subagent churn may have left branch/PR/comment state messy.

Existing focused plan/background: `/srv/cc-ci/cc-ci-plan/plan-ghostpr-debug-fix.md`.
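The mixed evidence listed in this section can be sorted mechanically before anyone draws conclusions from it. The following is a minimal sketch, not the project's tooling: `classify_run`, its labels, and the `PROXY_FIXED_AT` placeholder are hypothetical names, and the rules only encode the distinctions stated above (runs before the proxy fix are excluded, tasks stuck in `New` suggest infra, a paused update suggests recipe timing).

```python
from datetime import datetime

# Hypothetical placeholder: the real cut-over time must come from the proxy fix record.
PROXY_FIXED_AT = datetime(2026, 6, 13)


def classify_run(started_at, proxy_fixed_at, task_states, update_state):
    """Label one upgrade run using the distinctions drawn in the Context list."""
    if started_at < proxy_fixed_at:
        # Runs before the proxy fix still carry the infra confound.
        return "excluded: pre-proxy"
    if any(state.lower() == "new" for state in task_states):
        # Tasks that never leave `New` point at scheduling or network allocation.
        return "infra suspect"
    if update_state == "paused":
        # A paused update with tasks already scheduled may be recipe timing.
        return "recipe suspect"
    if update_state == "completed":
        return "passed"
    return "unclassified"
```

A label such as `recipe suspect` is only a prompt to read the service logs; it does not establish the cause by itself.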

## Required Work

1. **Inventory PR state.** On `recipe-maintainers/ghost`, list all open PRs and branches
   related to the upgrade. Identify the correct PR, expected to be ghost PR `#4`, and close
   or clearly mark any duplicate only if it is truly superseded. Never merge recipe PRs.
2. **Separate infra from recipe behavior.** After `pvfix` and `pvcheck`, trigger a fresh
   `!testme` on the correct ghost PR and watch the run. Do not count pre-proxy failures as
   current recipe evidence.
3. **If green:** record that the prior failure was infra/timing-confounded, ensure no stale
   stacks/volumes remain, and leave the PR ready for operator review.
4. **If red for a real recipe reason:** make the smallest recipe PR change needed. The
   suspected fix is a longer Swarm update monitor/start grace around the MySQL 8.0 to 8.4
   data-dir migration, e.g. `update_config.monitor: 300s` and related minimal service health
   timing. Validate the hypothesis with logs; do not cargo-cult timing knobs.
5. **If the test is genuinely stale:** default recipe-upgrade policy applies: leave an
   explanatory PR comment for the operator. Do not edit cc-ci tests in this phase unless the
   operator explicitly asks for a test-update phase.
6. **Deduplicate and clean up.** Ensure exactly one relevant open ghost upgrade PR remains,
   comments explain the final state, and no `ghos-*`/`dev-ghost` stacks or volumes leak.
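Step 4 asks for the timing change to be justified by logs rather than guessed. One way to make that concrete is to compare the proposed monitor window with the startup time actually observed in the MySQL service logs. This is a rough sketch under stated assumptions: `duration_seconds`, `monitor_covers_startup`, the 1.5x margin, and `PROPOSED_OVERRIDE` are illustrative names and values, and the observed startup time has to be read from the logs by hand.

```python
import re

# Seconds per unit for the simple duration strings used in compose files.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}


def duration_seconds(text):
    """Convert a duration such as '300s', '5m' or '1m30s' to seconds."""
    parts = re.findall(r"(\d+)(ms|s|m|h)", text)
    # Reject anything the pattern did not consume completely.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def monitor_covers_startup(monitor, observed_ready_seconds, margin=1.5):
    """True if the monitor window outlasts the observed startup with headroom."""
    return duration_seconds(monitor) >= observed_ready_seconds * margin


# Hypothetical override mirroring the `update_config.monitor: 300s` suggestion.
PROPOSED_OVERRIDE = {"deploy": {"update_config": {"monitor": "300s"}}}
```

If the observed migration time does not come close to the current window, the pause probably has another cause and a longer monitor would only mask it.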

## Gates

**M1 — State inventory and clean retry.** Builder documents PR/branch/comment/build state,
identifies the correct PR, and runs one clean post-proxy `!testme`. Adversary verifies that
pre-proxy infra failures were not misclassified as current recipe failures.

**M2 — Operator-ready outcome.** The ghost PR is green, or it has the minimal justified
recipe fix/comment and a clear current blocker. Duplicate PR/branch mess is resolved and
no ghost resources leak. Adversary verifies live PR state, build evidence, and cleanup.
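The cleanup part of M2 can be checked by matching live stack and volume names against the prefixes named in this plan. A small sketch, assuming the names have already been collected (for example with `docker stack ls --format '{{.Name}}'` and `docker volume ls --format '{{.Name}}'`); `leaked_resources` is a hypothetical helper, and the trailing wildcard on `dev-ghost*` is an assumption so that stack-prefixed volume names also match.

```python
from fnmatch import fnmatchcase

# Patterns taken from the cleanup step; the wildcard after dev-ghost is an assumption.
LEAK_PATTERNS = ("ghos-*", "dev-ghost*")


def leaked_resources(names, patterns=LEAK_PATTERNS):
    """Return the sorted names that match any ghost test pattern."""
    return sorted(
        name
        for name in names
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    )
```

An empty result for both the stack list and the volume list is the evidence the Adversary would look for before passing M2.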

## Guardrails

- Recipe PRs are never merged by agents.
- Do not weaken tests to get green.
- Do not re-run ghost during proxy maintenance or while `cfold` owns a broad CI sweep.
- Keep iterations bounded: at most three fresh post-proxy `!testme` attempts unless the
  operator authorizes more.
- Preserve useful failure evidence in PR comments and `machine-docs/STATUS-ghost.md`.

## Definition of Done

Exactly one ghost upgrade PR is operator-ready, with a fresh post-proxy verdict and clear
classification of the 2026-06-12 failure. Any real recipe fix is minimal and verified;
otherwise the PR is green or has a precise operator-facing explanation. Adversary has
signed off on M1 and M2 in `machine-docs/REVIEW-ghost.md`; Builder writes `## DONE` only
after both gates have fresh Adversary PASSes.
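The final condition can be stated as a small predicate. This sketch assumes a record shape that the plan does not define: `reviews` maps a gate name to a `(verdict, reviewed_at)` pair, and "fresh" is read as "recorded at or after the last change to the PR". Both the shape and that reading are assumptions, as is the `may_write_done` name.

```python
from datetime import datetime

REQUIRED_GATES = ("M1", "M2")


def may_write_done(reviews, last_change_at):
    """True only if every required gate has a PASS no older than the last change."""
    for gate in REQUIRED_GATES:
        verdict, reviewed_at = reviews.get(gate, ("MISSING", None))
        if verdict != "PASS" or reviewed_at is None:
            return False
        if reviewed_at < last_change_at:
            # A PASS recorded before the latest change is stale.
            return False
    return True
```

A stale PASS on either gate keeps the predicate false, which matches the requirement that both sign-offs be fresh.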