Files
cc-ci-orchestrator/cc-ci-plan/plan-phase5-verify-upgrade-flow.md
autonomic-bot 1c2be64124 Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff
Operator: when the final phase completes, install the weekly cron anchored to
actual completion — first run ~1h after the build finishes, weekly from then on
(supersedes the fixed "Sat 03:00 UTC" placeholder).

- plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's
  DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH +
  tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff:
  cheap --dry-run pre-check, then confirm the real T0 fire launched the
  cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced);
  record schedule + verified kickoff in DECISIONS.md.
- upgrade-all skill Cron section + cron memory updated to the completion-anchored
  schedule + first-kickoff verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:21:20 +01:00

8.7 KiB
Raw Blame History

cc-ci Phase 5 — verify the /recipe-upgrade + testme-on-pr.sh end-to-end flow

Status: QUEUED — the FINAL phase, after Phase 4. Transition: auto (last in the launcher sequence); the watchdog STOPS after it (build complete). Owner: Builder + Adversary loops (Adversary cold-verifies); the orchestrator may also drive it. This file: /srv/cc-ci/cc-ci-plan/plan-phase5-verify-upgrade-flow.md Phase order: … 3 → 4 → 5.


0. Why this is LAST

The /recipe-upgrade and /upgrade-all skills live in the orchestration repo (/srv/cc-ci/.claude/skills/) and drive cc-ci by posting !testme on a recipe PR (testme-on-pr.sh) and reading the verdict back from the PR. That flow only works once cc-ci is fully built — specifically it depends on Phase 3 (results UX): the run's result must be posted back to the PR (a Gitea commit status and/or result comment on the PR head) for testme-on-pr.sh to read it. So this verification can only run after the CI server is complete. It proves the upgrade tooling actually works end-to-end before the weekly /upgrade-all cron is activated (cc-ci-upgrade-all-cron — activate only after the build is done).

Use a sandbox recipe for the live runs (e.g. custom-html-tiny or a dedicated throwaway recipe), and close every PR opened during verification afterward — do not pollute real recipe mirrors, and never merge anything.

1. Definition of Done (Adversary cold-verifies → machine-docs/REVIEW-5.md)

  • V1 — !testme on a recipe PR is the real gate. Posting !testme (as the bot, a collaborator) on a recipe mirror PR triggers bridge → Drone → harness, runs the suite, and posts the run + its result back to the PR (commit status on the PR head and/or a result comment, per Phase 3). A !testme from a non-collaborator is rejected. A bad trigger (!testmexyz) does NOT fire. (Re-uses the existing !testme D-gate evidence.)
  • V2 — testme-on-pr.sh reads the verdict correctly. POST=1 posts exactly one !testme; POST=0 polls WITHOUT re-triggering; the helper returns VERDICT=GREEN on a passing PR, VERDICT=RED on a failing PR, VERDICT=PENDING while in flight, and a BUILD=<url> that points at the Drone build. (Drive a known-green and a known-red PR head.)
  • V3 — /recipe-upgrade <sandbox-recipe> (DEFAULT mode) end-to-end GREEN. On a recipe with a real available upgrade: it reconciles the mirror, opens a recipe PR with the bump, !testme runs and results appear in the PR, GREEN → RESULT: SUCCESS. Nothing merged.
  • V4 — 3-iteration regression loop. Seed a real upgrade regression (e.g. a deliberately bad image tag) → !testme RED → fix on the recipe branch → re-!testme → GREEN, all within ≤3 !testme runs, each recorded in the PR. With a persistent regression, after 3 runs it leaves the PR open and reports RESULT: FAILED — upgrade not green after 3 !testme runs (does not weaken anything to force green).
  • V5 — stale-test DEFAULT = comment, no test edit. With a genuinely stale test for the new version, default mode (no --with-tests) leaves an explanatory comment on the recipe PR (upgrade looks correct; which test is stale + why; "re-run --with-tests"), modifies no test, and reports RESULT: SUCCESS-PENDING-TESTS.
  • V6 — --with-tests opens + verifies a cc-ci test PR. The same stale-test case under --with-tests opens a cc-ci test-update PR (dedicated branch, separate clone — never main, never the loops' clones), verifies the recipe upgrade WITH the test change applied via the branch-checkout harness run (verify-pr.sh, since !testme uses prod tests), pairs the two PRs with cross-notes, and reports RESULT: SUCCESS+TESTPR. Nothing merged.
  • V7 — mirror reconciliation. After a run, the recipe-maintainers/<recipe> mirror main is identical to upstream main; an open PR whose change is already in upstream main is auto-closed (merged-upstream); a superseded open PR is closed and replaced by the new one. (Seed each case and confirm.)
  • V8 — /upgrade-all DEFAULT run. --dry-run lists candidates correctly; a live run over a small set (12 sandbox recipes) opens recipe PRs using default mode (never --with-tests), surfaces "tests look stale" PRs in its own summary section, runs sequentially with teardown between recipes, and the summary leads with the PR list. Confirms the weekly cron never auto-edits tests.
  • V8a — the cc-ci-upgrader agent. cc-ci-plan/launch-upgrader.sh start spins up a remote-control cc-ci-upgrader session (viewable at claude.ai/code) that runs /upgrade-all (DEFAULT) to completion, then STOPS and stays idle (does NOT self-terminate) with the summary visible. A second start while a run is in flight (busy) leaves it alone; a start against a finished/idle (or wedged) session kills it and runs fresh. (Use UPGRADER_ARGS=--dry-run for a cheap check, then a real small-set run.) This is exactly what the weekly cron invokes — verify the cron-equivalent path end-to-end.
  • V9 — cleanup. Every PR opened during verification (recipe + any cc-ci test PR) is closed and any sandbox deploy is torn down; the verification cc-ci-upgrader session is stopped (launch-upgrader.sh stop); the box is left clean.

2. Method / notes

  • Sandbox first. Prefer custom-html-tiny / a throwaway recipe for V3V8 so real recipe mirrors aren't spammed. For the seeded-regression (V4) and stale-test (V5/V6) cases, construct the seed on a sandbox recipe / a scratch branch, not a real upstream recipe.
  • Real path throughout — real !testme → bridge → Drone → harness → results-in-PR; real abra.
  • The skills are at /srv/cc-ci/.claude/skills/{recipe-upgrade,upgrade-all,ci-test-review}/; read their SKILL.md as the spec for expected behavior.
  • If Phase 3's PR-result surface differs from what testme-on-pr.sh polls (commit status vs comment), reconcile them here — either adjust the helper to read the real surface, or file a CI finding — and record the resolution in DECISIONS.md. (This is the most likely real gap to find.)

3. Guardrails

  • NEVER merge any PR; never weaken a test to make a run pass.
  • Close verification PRs + tear down deploys when done (V9) — leave no junk.
  • Single-writer: cc-ci test PRs on dedicated branches + separate clones; never push main or touch the loops' working clones; shared Swarm is stateful — serialize + tear down.
  • Bounded: this phase VERIFIES the flow; it does not redesign the skills. A real defect → fix the skill/helper (small) or file it; a bigger idea → IDEAS.md.

4. On phase completion — install the weekly cron (+1h) and verify it fires

Once V1V9 pass (the upgrade flow + the cc-ci-upgrader agent are proven) AND this is the final phase (the whole cc-ci build is complete), the orchestrator installs the recurring weekly upgrade cron. This SUPERSEDES the earlier fixed "Sat 03:00 UTC" placeholder — the schedule is now anchored to actual completion (cc-ci-upgrade-all-cron).

  1. Schedule = completion + 1 hour, weekly thereafter. Compute T0 = now + 1h; install a weekly job at T0's day-of-week + HH:MM (cron MM HH * * DOW) so the FIRST run fires ~1h after the build finishes, then every 7 days at that same slot.
  2. What it runs: /srv/cc-ci/cc-ci-plan/launch-upgrader.sh start (→ spins up the cc-ci-upgrader agent → /upgrade-all DEFAULT → stops idle/viewable). The cron environment must have the claude CLI on PATH, tmux access, and the box logged into the claude.ai account (remote-control) — a bare cron env is minimal, so set PATH/HOME explicitly (mirror how cc-ci-loops.service launches). A user crontab / systemd timer / Claude scheduled task are all acceptable mechanisms; pick one that gives that environment.
  3. VERIFY the first kickoff (the point of this step).
    • Cheap pre-check first: UPGRADER_ARGS=--dry-run cc-ci-plan/launch-upgrader.sh fresh (or schedule a one-off test fire a couple of minutes out) to confirm the launcher path works from the chosen cron mechanism's environment.
    • Then confirm the real first fire at T0 (~1h later) actually kicked it off: launch-upgrader.sh status shows cc-ci-upgrader RUNNING, it ran /upgrade-all, and produced a summary. If it did NOT fire/launch (PATH, login, mechanism), fix and re-verify.
    • Record the installed schedule + the verified-kickoff result in DECISIONS.md.