Files
cc-ci-orchestrator/cc-ci-plan/plan-phase5-verify-upgrade-flow.md
autonomic-bot 1c2be64124 Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff
Operator: when the final phase completes, install the weekly cron anchored to
actual completion — first run ~1h after the build finishes, weekly from then on
(supersedes the fixed "Sat 03:00 UTC" placeholder).

- plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's
  DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH +
  tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff:
  cheap --dry-run pre-check, then confirm the real T0 fire launched the
  cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced);
  record schedule + verified kickoff in DECISIONS.md.
- upgrade-all skill Cron section + cron memory updated to the completion-anchored
  schedule + first-kickoff verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:21:20 +01:00

117 lines
8.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# cc-ci Phase 5 — verify the `/recipe-upgrade` + `testme-on-pr.sh` end-to-end flow
**Status:** QUEUED — the **FINAL** phase, after Phase 4. Transition: auto (last in the launcher
sequence); the watchdog STOPS after it (build complete). **Owner:** Builder + Adversary loops
(Adversary cold-verifies); the orchestrator may also drive it.
**This file:** `/srv/cc-ci/cc-ci-plan/plan-phase5-verify-upgrade-flow.md`
**Phase order:** … 3 → 4 → **5**.
---
## 0. Why this is LAST
The `/recipe-upgrade` and `/upgrade-all` skills live in the **orchestration repo**
(`/srv/cc-ci/.claude/skills/`) and drive cc-ci by posting **`!testme` on a recipe PR**
(`testme-on-pr.sh`) and reading the verdict back **from the PR**. That flow only works once cc-ci is
**fully built** — specifically it depends on **Phase 3 (results UX)**: the run's result must be posted
back to the PR (a Gitea commit status and/or result comment on the PR head) for `testme-on-pr.sh` to
read it. So this verification can only run after the CI server is complete. It proves the upgrade
tooling actually works end-to-end **before** the weekly `/upgrade-all` cron is activated
([[cc-ci-upgrade-all-cron]] — activate only after the build is done).
**Use a sandbox recipe for the live runs** (e.g. `custom-html-tiny` or a dedicated throwaway recipe),
and **close every PR opened during verification afterward** — do not pollute real recipe mirrors, and
**never merge** anything.
## 1. Definition of Done (Adversary cold-verifies → `machine-docs/REVIEW-5.md`)
- [ ] **V1 — `!testme` on a recipe PR is the real gate.** Posting `!testme` (as the bot, a
collaborator) on a recipe mirror PR triggers bridge → Drone → harness, runs the suite, and
**posts the run + its result back to the PR** (commit status on the PR head and/or a result
comment, per Phase 3). A `!testme` from a **non-collaborator is rejected**. A bad trigger
(`!testmexyz`) does NOT fire. (Re-uses the existing !testme D-gate evidence.)
- [ ] **V2 — `testme-on-pr.sh` reads the verdict correctly.** `POST=1` posts exactly **one**
`!testme`; `POST=0` polls WITHOUT re-triggering; the helper returns `VERDICT=GREEN` on a passing
PR, `VERDICT=RED` on a failing PR, `VERDICT=PENDING` while in flight, and a `BUILD=<url>` that
points at the Drone build. (Drive a known-green and a known-red PR head.)
- [ ] **V3 — `/recipe-upgrade <sandbox-recipe>` (DEFAULT mode) end-to-end GREEN.** On a recipe with a
real available upgrade: it reconciles the mirror, opens a recipe PR with the bump, `!testme`
runs and **results appear in the PR**, GREEN → `RESULT: SUCCESS`. **Nothing merged.**
- [ ] **V4 — 3-iteration regression loop.** Seed a real upgrade regression (e.g. a deliberately bad
image tag) → `!testme` RED → fix on the recipe branch → re-`!testme` → GREEN, all within **≤3
`!testme` runs**, each recorded in the PR. With a persistent regression, after 3 runs it leaves
the PR open and reports `RESULT: FAILED — upgrade not green after 3 !testme runs` (does not
weaken anything to force green).
- [ ] **V5 — stale-test DEFAULT = comment, no test edit.** With a genuinely stale test for the new
version, default mode (no `--with-tests`) leaves an **explanatory comment** on the recipe PR
(upgrade looks correct; which test is stale + why; "re-run `--with-tests`"), modifies **no
test**, and reports `RESULT: SUCCESS-PENDING-TESTS`.
- [ ] **V6 — `--with-tests` opens + verifies a cc-ci test PR.** The same stale-test case under
`--with-tests` opens a cc-ci test-update PR (dedicated branch, separate clone — never `main`,
never the loops' clones), **verifies the recipe upgrade WITH the test change applied** via the
branch-checkout harness run (`verify-pr.sh`, since `!testme` uses prod tests), pairs the two PRs
with cross-notes, and reports `RESULT: SUCCESS+TESTPR`. Nothing merged.
- [ ] **V7 — mirror reconciliation.** After a run, the `recipe-maintainers/<recipe>` mirror `main` is
**identical to upstream main**; an open PR whose change is already in upstream main is
**auto-closed** (merged-upstream); a superseded open PR is **closed and replaced** by the new
one. (Seed each case and confirm.)
- [ ] **V8 — `/upgrade-all` DEFAULT run.** `--dry-run` lists candidates correctly; a live run over a
small set (12 sandbox recipes) opens recipe PRs **using default mode (never `--with-tests`)**,
surfaces "tests look stale" PRs in its own summary section, runs **sequentially with teardown**
between recipes, and the summary leads with the PR list. Confirms the weekly cron never
auto-edits tests.
- [ ] **V8a — the `cc-ci-upgrader` agent.** `cc-ci-plan/launch-upgrader.sh start` spins up a
remote-control `cc-ci-upgrader` session (viewable at claude.ai/code) that runs `/upgrade-all`
(DEFAULT) **to completion, then STOPS and stays idle** (does NOT self-terminate) with the
summary visible. A second `start` while a run is **in flight (busy)** leaves it alone; a `start`
against a **finished/idle (or wedged)** session **kills it and runs fresh**. (Use
`UPGRADER_ARGS=--dry-run` for a cheap check, then a real small-set run.) This is exactly what the
weekly cron invokes — verify the cron-equivalent path end-to-end.
- [ ] **V9 — cleanup.** Every PR opened during verification (recipe + any cc-ci test PR) is **closed**
and any sandbox deploy is **torn down**; the verification `cc-ci-upgrader` session is stopped
(`launch-upgrader.sh stop`); the box is left clean.
## 2. Method / notes
- **Sandbox first.** Prefer `custom-html-tiny` / a throwaway recipe for V3V8 so real recipe mirrors
aren't spammed. For the seeded-regression (V4) and stale-test (V5/V6) cases, construct the seed on a
sandbox recipe / a scratch branch, not a real upstream recipe.
- **Real path throughout** — real `!testme` → bridge → Drone → harness → results-in-PR; real `abra`.
- The skills are at `/srv/cc-ci/.claude/skills/{recipe-upgrade,upgrade-all,ci-test-review}/`; read
their SKILL.md as the spec for expected behavior.
- If Phase 3's PR-result surface differs from what `testme-on-pr.sh` polls (commit status vs comment),
**reconcile them here** — either adjust the helper to read the real surface, or file a CI finding —
and record the resolution in `DECISIONS.md`. (This is the most likely real gap to find.)
## 3. Guardrails
- **NEVER merge** any PR; **never weaken a test** to make a run pass.
- **Close verification PRs + tear down deploys** when done (V9) — leave no junk.
- **Single-writer:** cc-ci test PRs on dedicated branches + separate clones; never push `main` or
touch the loops' working clones; shared Swarm is stateful — serialize + tear down.
- **Bounded:** this phase VERIFIES the flow; it does not redesign the skills. A real defect → fix the
skill/helper (small) or file it; a bigger idea → `IDEAS.md`.
## 4. On phase completion — install the weekly cron (+1h) and verify it fires
Once V1V9 pass (the upgrade flow + the `cc-ci-upgrader` agent are proven) AND this is the final phase
(the whole cc-ci build is complete), the **orchestrator** installs the recurring weekly upgrade cron.
This SUPERSEDES the earlier fixed "Sat 03:00 UTC" placeholder — the schedule is now anchored to actual
completion ([[cc-ci-upgrade-all-cron]]).
1. **Schedule = completion + 1 hour, weekly thereafter.** Compute `T0 = now + 1h`; install a **weekly**
job at T0's day-of-week + `HH:MM` (cron `MM HH * * DOW`) so the FIRST run fires ~1h after the build
finishes, then every 7 days at that same slot.
2. **What it runs:** `/srv/cc-ci/cc-ci-plan/launch-upgrader.sh start` (→ spins up the `cc-ci-upgrader`
agent → `/upgrade-all` DEFAULT → stops idle/viewable). The cron environment must have the `claude`
CLI on `PATH`, `tmux` access, and the box logged into the claude.ai account (remote-control) — a
bare cron env is minimal, so set `PATH`/`HOME` explicitly (mirror how `cc-ci-loops.service`
launches). A user crontab / systemd timer / Claude scheduled task are all acceptable mechanisms;
pick one that gives that environment.
3. **VERIFY the first kickoff (the point of this step).**
- Cheap pre-check first: `UPGRADER_ARGS=--dry-run cc-ci-plan/launch-upgrader.sh fresh` (or schedule
a one-off test fire a couple of minutes out) to confirm the launcher path works from the chosen
cron mechanism's environment.
- Then confirm the **real** first fire at T0 (~1h later) actually kicked it off:
`launch-upgrader.sh status` shows `cc-ci-upgrader RUNNING`, it ran `/upgrade-all`, and produced a
summary. If it did NOT fire/launch (PATH, login, mechanism), fix and re-verify.
- Record the installed schedule + the verified-kickoff result in `DECISIONS.md`.