recipe-upgrade: !testme-on-PR verification + make test PRs opt-in (--with-tests)

Per operator:
- Verify via `!testme` posted ON the recipe PR (the real CI path) so results are
  viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test).
  New helper testme-on-pr.sh posts !testme and polls the PR head commit status
  for the verdict (POST=0 to keep polling without re-triggering).
- Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using
  existing tests; if a test fails and is genuinely stale, leave an explanatory
  COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests)
  and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR
  path (verified via the branch-checkout harness run, since !testme uses prod tests).
- upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended;
  surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe.
- New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 20:18:59 +01:00
parent c7da03fa6c
commit 4a1da1dd60
3 changed files with 134 additions and 38 deletions


@@ -1,6 +1,6 @@
---
name: recipe-upgrade
description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), then VERIFIES the change green on cc-ci (full suite, cold, against the PR head) and opens a recipe PR. If the upgrade is correct but a cc-ci TEST is now stale, it also updates the test, verifies that, and opens a second PR to the cc-ci repo. NEVER merges — leaves verified, ready-to-merge PRs. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade <recipe>.
description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), opens a recipe PR, and verifies it by posting `!testme` on the PR (real CI; results visible in the PR; iterates up to 3×). DEFAULT: recipe PR only, using existing tests — if a test fails because it is genuinely stale, it leaves an explanatory COMMENT on the PR for the operator (does NOT touch tests). With `--with-tests`: also opens + verifies a PR to update the stale cc-ci test. NEVER merges. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade <recipe> [--with-tests].
---
# recipe-upgrade
@@ -18,6 +18,15 @@ harness, and the bot token to push mirror branches). The Gitea PR API call uses
cc-ci harness decides pass/fail). **Create + verify, NEVER merge** (operator merges). **Never weaken a
test** to make a PR green.
## Arguments
- `<recipe>` — the recipe to upgrade (required).
- `--with-tests` (optional) — **opt in to updating cc-ci tests.** Without it (the DEFAULT, and what the
weekly `/upgrade-all` cron uses), this skill **only ever opens a recipe PR using the existing tests**;
if a test fails and is diagnosed as genuinely stale, it **leaves a comment** on the recipe PR
explaining why and stops — it does **not** modify any test. With `--with-tests`, a stale-test failure
additionally gets a **verified cc-ci test-update PR** (step 5b). A real upgrade regression is fixed +
re-tested in **both** modes.
## Procedure
### 1. Plan the upgrade (research — follow recipe-upgrade-plan methodology)
@@ -73,32 +82,51 @@ real current upstream main). Capture the `PR_URL`. Optionally export `RECIPE_PR_
table + the planned operator-action notes). Re-running with the same target version updates the
existing same-branch PR rather than duplicating it.
### 4. VERIFY the upgrade on the CI server (the gate)
Run the cc-ci full suite **cold** against the PR head — the dogfood gate from `ci-test-review`:
### 4. VERIFY by running `!testme` ON THE PR (results visible in the PR; iterate ≤3×)
Trigger the **real** CI on the recipe PR by posting `!testme` — the bridge runs the harness on cc-ci
and posts the run + its results back to the PR, so the operator sees them in the PR. Use the PR index
from step 3's `PR_URL` (`.../pulls/<N>`):
```
RECIPE=<recipe> REF=upgrade-<new-version> /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh
/srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh <recipe> <pr-index> # POST=1: posts !testme + polls
# if it prints VERDICT=PENDING, poll again WITHOUT re-triggering:
POST=0 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh <recipe> <pr-index>
```
(`SRC` defaults to `recipe-maintainers/<recipe>`.) **GREEN** ⇔ harness exits 0 → the recipe PR is
verified, ready for operator merge. **General bar = one cold green**; use `REPEAT=3` only for a recipe
already known to be FLAKY (e.g. lasuite-drive). Record the run summary in the report.
- **GREEN** → the recipe PR is verified on cc-ci, results in the PR, ready for operator merge. Done.
- **RED** → go to step 5. You get **up to 3 `!testme` runs total** on this PR (initial + 2 retries):
use them to fix and re-test a real upgrade regression (step 5a). Each retry is a fresh `!testme`
after you push a fix, so every attempt's result is recorded in the PR.
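
The trigger-then-poll flow above can be sketched as a small wrapper. This is a minimal sketch, not part of the skill: it relies only on what `testme-on-pr.sh` documents (`POST=1` posts a fresh `!testme`, `POST=0` polls only, exit codes 0 = GREEN, 2 = RED, 3 = PENDING). The names `pr_index_from_url`, `run_testme_once`, `TESTME_CMD`, and `MAX_POLLS` are hypothetical, introduced for illustration.

```bash
# Sketch: one !testme run = trigger once, then poll without re-triggering.
# TESTME_CMD and MAX_POLLS are illustrative knobs, not part of the skill.
TESTME_CMD="${TESTME_CMD:-/srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh}"

# pr_index_from_url <PR_URL>: print the trailing <N> of ".../pulls/<N>".
pr_index_from_url() {
  local url="${1%/}"          # drop one trailing slash, if present
  printf '%s\n' "${url##*/}"  # keep everything after the last slash
}

# run_testme_once <recipe> <pr-index>
# Returns 0 (GREEN), 2 (RED), or 3 (still PENDING after MAX_POLLS re-polls).
run_testme_once() {
  local recipe="$1" pridx="$2" polls=0 rc=0
  POST=1 "$TESTME_CMD" "$recipe" "$pridx" || rc=$?
  while [ "$rc" -eq 3 ] && [ "$polls" -lt "${MAX_POLLS:-5}" ]; do
    polls=$((polls + 1))
    rc=0
    # POST=0: keep polling the same run; never launch a duplicate.
    POST=0 "$TESTME_CMD" "$recipe" "$pridx" || rc=$?
  done
  return "$rc"
}
```

Each call to `run_testme_once` consumes one of the three `!testme` runs in the budget; the re-polls inside it do not, because they never post a new comment.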
### 5. If verification is RED → diagnose (recipe vs cc-ci TEST), per ci-test-review
Read the harness log and classify the failure:
### 5. If `!testme` is RED → diagnose from the run (recipe regression vs stale cc-ci TEST)
Read the failing run (linked from the PR / the harness log on cc-ci) and classify:
- **The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) → fix it
on the recipe branch (bounded — ≤3 iterations), re-push, re-verify. The recipe PR is only "working"
once cc-ci is green. If still red after the budget, leave the PR open and report
`FAILED — upgrade not green`.
- **The upgrade is correct but a cc-ci TEST is now stale/wrong** for the new version (asserts old
behavior, an overlay needs updating, a readiness gate changed) → this is a **CI-server fix**. Make
it the `ci-test-review` way:
**5a. The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) → fix
it on the recipe branch, push, and re-run `!testme` (within the 3-run budget). The recipe PR is only
"working" once `!testme` is GREEN. If still red after 3 runs, leave the PR open (its red `!testme`
results stand as the evidence) and report `FAILED — upgrade not green after 3 !testme runs`.
**5b. The upgrade is correct but a cc-ci TEST is genuinely stale/wrong** for the new version (asserts
old behavior, an overlay needs updating, a readiness gate changed). `!testme` can't go green without a
test change, and a test change is **gated by `--with-tests`**:
- **DEFAULT (no `--with-tests`) — comment, don't touch tests.** Post a **comment on the recipe PR**
that: states the upgrade itself looks correct; identifies the specific test that failed and **why it
appears stale** (what the new version changed vs what the test asserts); and tells the operator they
can **re-run `/recipe-upgrade <recipe> --with-tests`** to also open + verify a cc-ci test-update PR.
Do **NOT** modify any test. Report `SUCCESS-PENDING-TESTS` (recipe PR open; `!testme` red on a
stale test; operator to decide).
- **`--with-tests` — open + verify a cc-ci test PR.** Make it the `ci-test-review` way:
1. Branch `recipe-maintainers/cc-ci` in a **separate clone** (single-writer: never push `main`,
never touch the build loops' `/cc-ci` `/cc-ci-adv` clones), update the test/overlay.
2. **Verify** the recipe upgrade **with the updated test applied**: check the cc-ci branch out on
cc-ci, rebuild if needed, re-run `verify-pr.sh` for the recipe + a small regression sample.
3. Open the **cc-ci test PR** via the Gitea API (`/srv/cc-ci/.testenv` creds). Capture its URL.
4. The recipe PR + the cc-ci test PR are a **pair** — note in each that it depends on the other.
- **FLAKY** (passes on a re-run) → note it; don't author a fix for a flake.
never touch the build loops' `/cc-ci` `/cc-ci-adv` clones); update the test/overlay.
2. **Verify the recipe upgrade WITH the updated test applied.** `!testme` on the recipe PR uses the
*deployed/main* cc-ci tests, so it can't see the unmerged test change — verify this combo by
running the harness with the cc-ci branch checked out on cc-ci:
`RECIPE=<recipe> REF=upgrade-<new-version> /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh`
(after checking out the cc-ci test branch on cc-ci + rebuilding if needed), plus a small
regression sample. Green ⇔ the upgrade passes under the corrected test.
3. Open the **cc-ci test PR** via the Gitea API. Note in BOTH PRs that they are a dependent pair
(the recipe PR's `!testme` goes green only once the test PR is merged). Report `SUCCESS+TESTPR`.
**FLAKY** (passes on a re-`!testme`) → note it; don't author a fix for a flake.
Never weaken a test to turn a red upgrade green.
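
The branching in steps 4 and 5 can be summarised as a small decision function. This is a hedged sketch: the function name `result_kind` and the diagnosis labels `regression` and `stale-test` are hypothetical, and it assumes the 3-run budget is already spent for a regression and that the cc-ci test PR was verified green in the `--with-tests` case.

```bash
# result_kind <verdict> [diagnosis] [with_tests]
#   verdict:    GREEN | RED             (final !testme verdict on the recipe PR)
#   diagnosis:  regression | stale-test (only consulted when verdict is RED)
#   with_tests: 1 if --with-tests was passed, else 0
# Prints the RESULT keyword the skill would report.
result_kind() {
  local verdict="$1" diagnosis="${2:-}" with_tests="${3:-0}"
  if [ "$verdict" = "GREEN" ]; then
    echo "SUCCESS"
    return 0
  fi
  case "$diagnosis" in
    stale-test)
      if [ "$with_tests" = "1" ]; then
        echo "SUCCESS+TESTPR"          # 5b with --with-tests: paired test PR
      else
        echo "SUCCESS-PENDING-TESTS"   # 5b default: comment only, no test edit
      fi
      ;;
    *)
      echo "FAILED"                    # 5a: still red after the run budget
      ;;
  esac
}
```

A flaky failure needs no branch of its own here: it goes green on a re-run, so it arrives as `GREEN`.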
@@ -106,21 +134,26 @@ Never weaken a test to turn a red upgrade green.
Write the report to `/srv/cc-ci/.cc-ci-logs/upgrades/<recipe>-upgrade-<YYYY-MM-DD>.md` and print, as
the **last line**, one of these exact prefixes (so `/upgrade-all` can collect it):
- `RESULT: SUCCESS — <recipe> <old><new>, cc-ci GREEN, recipe PR: <url>`
- `RESULT: SUCCESS+TESTPR — <recipe> <old><new>, cc-ci GREEN; recipe PR: <url>; cc-ci test PR: <url>`
- `RESULT: FAILED — <recipe> at <step>: <one-line reason>` (e.g. upgrade not green after 3 tries)
- `RESULT: SUCCESS — <recipe> <old><new>, !testme GREEN, recipe PR: <url>`
- `RESULT: SUCCESS-PENDING-TESTS — <recipe> <old><new>, recipe PR: <url>; !testme RED on a stale test (commented; re-run --with-tests to update tests)`
- `RESULT: SUCCESS+TESTPR — <recipe> <old> <new>; recipe PR: <url>; cc-ci test PR: <url> (verified with the test change; pair)`
- `RESULT: FAILED — <recipe> at <step>: <one-line reason>` (e.g. upgrade not green after 3 !testme runs)
- `RESULT: SKIPPED — <recipe>: <up-to-date | dirty-worktree | reason>`
Always state explicitly that **nothing was merged** — the PR(s) await operator review.
## Guardrails
- **cc-ci is the gate** — deterministic harness decides pass/fail; AI plans, implements, diagnoses.
- **cc-ci is the gate, via `!testme` on the PR** — the harness decides pass/fail and the results live
in the PR (operator-visible); AI plans, implements, diagnoses. Up to 3 `!testme` runs per PR.
- **Create + verify, NEVER merge.** Recipe PR (and any cc-ci test PR) are operator-merged after review.
- **Tests are opt-in.** Default = recipe PR only, existing tests; a stale-test failure gets an
explanatory PR **comment**, never a test edit. Only `--with-tests` opens a cc-ci test PR. The weekly
`/upgrade-all` cron always runs the **default** — it never auto-updates tests.
- **Mirror reflects reality.** Each run force-syncs the `recipe-maintainers/<recipe>` mirror `main` to
true upstream main, closes open PRs already merged upstream, and replaces any superseded open PR with
the new one — so an open mirror PR always means "genuinely still open against current upstream main".
- **Prefer a recipe-only PR.** Only open a cc-ci test PR when the upgrade is correct but a test is
genuinely stale — not to paper over a real upgrade regression.
- **Prefer a recipe-only PR.** Only open a cc-ci test PR (under `--with-tests`) when the upgrade is
correct but a test is genuinely stale — never to paper over a real upgrade regression.
- **Never weaken a test**; **bounded** changes (the upgrade + minimal test update, not rewrites).
- **Real abra path** throughout (`abra recipe upgrade` / `abra app deploy` via the harness).
- **Single-writer / coordination:** dedicated branches + separate clones; never push `main` or disturb


@@ -0,0 +1,55 @@
#!/usr/bin/env bash
# recipe-upgrade :: trigger cc-ci on a recipe PR via `!testme` and read the verdict FROM the PR.
# -------------------------------------------------------------------------
# Runs on the ORCHESTRATOR (just hits the Gitea API with /srv/cc-ci/.testenv creds). Posting
# `!testme` as a PR comment is the REAL CI path — the bridge runs the harness on cc-ci and the
# run + its results are posted back to the PR, so the operator can see them in the PR itself.
#
# testme-on-pr.sh <recipe> <pr-index>
# env: POST=1 post a fresh `!testme` first (default). POST=0 = poll an in-flight run only,
# WITHOUT re-triggering (use on follow-up calls so you don't launch duplicate runs).
# MAX_WAIT=480 INTERVAL=30 poll window per call, in seconds. 480s stays under the 10-min
# tool cap — if it prints VERDICT=PENDING, just run again with POST=0 to keep polling.
#
# Prints: VERDICT=GREEN|RED|PENDING and BUILD=<drone-build-url>
set -o errexit -o nounset -o pipefail
RECIPE="${1:?usage: testme-on-pr.sh <recipe> <pr-index>}"
PRIDX="${2:?usage: testme-on-pr.sh <recipe> <pr-index>}"
TESTENV="${TESTENV:-/srv/cc-ci/.testenv}"
set -a; . "$TESTENV"; set +a
: "${GITEA_USERNAME:?}"; : "${GITEA_PASSWORD:?}"; : "${GITEA_URL:?}"
NS="${GITEA_NAMESPACE:-recipe-maintainers}"
API="https://${GITEA_URL}/api/v1"; AUTH=(-u "${GITEA_USERNAME}:${GITEA_PASSWORD}")
POST="${POST:-1}"; MAX_WAIT="${MAX_WAIT:-480}"; INTERVAL="${INTERVAL:-30}"
SHA="$(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/pulls/${PRIDX}" \
| python3 -c "import json,sys;print(json.load(sys.stdin)['head']['sha'])" 2>/dev/null || true)"
[ -n "$SHA" ] || { echo "ERROR: could not resolve ${NS}/${RECIPE} PR #${PRIDX} head sha"; exit 1; }
if [ "$POST" = "1" ]; then
  curl -s -o /dev/null "${AUTH[@]}" -H "Content-Type: application/json" -X POST \
    "${API}/repos/${NS}/${RECIPE}/issues/${PRIDX}/comments" -d '{"body":"!testme"}'
  echo "→ posted !testme on ${NS}/${RECIPE}#${PRIDX} (head ${SHA:0:8}) — results will appear in the PR"
  sleep 5
fi
# Poll the Drone-reported combined commit status on the PR head (terminal = success/failure/error).
deadline=$(( $(date +%s) + MAX_WAIT ))
url=""
while [ "$(date +%s)" -lt "$deadline" ]; do
  # NB: the python -c body below must stay at column 0 (Python rejects unexpected indentation).
  read -r state url < <(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/commits/${SHA}/status" 2>/dev/null \
    | python3 -c "import json,sys
try: d=json.load(sys.stdin)
except Exception: d={}
sts=d.get('statuses') or []
print(d.get('state','') or 'none', (sts[0].get('target_url') if sts else '') or '')" 2>/dev/null || echo "none ")
  case "$state" in
    success) echo "VERDICT=GREEN"; echo "BUILD=${url}"; exit 0;;
    failure|error) echo "VERDICT=RED"; echo "BUILD=${url}"; exit 2;;
  esac
  sleep "$INTERVAL"
done
echo "VERDICT=PENDING"; echo "BUILD=${url:-?}"
echo "(run still in flight — re-run with POST=0 to keep polling without re-triggering)"
exit 3
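
A caller that wants the build URL as well as the exit code can read the `VERDICT=` and `BUILD=` lines the script prints. The following is a minimal sketch under that assumption; the function name `parse_verdict` and the `UNKNOWN` fallback are hypothetical and not part of the helper.

```bash
# parse_verdict: read testme-on-pr.sh output on stdin and print
# "<verdict> <build-url>" on one line. Lines that are not KEY=value
# pairs (such as the "posted !testme" notice) are ignored.
parse_verdict() {
  local line verdict="" build=""
  while IFS= read -r line; do
    case "$line" in
      VERDICT=*) verdict="${line#VERDICT=}" ;;
      BUILD=*)   build="${line#BUILD=}" ;;
    esac
  done
  # UNKNOWN is an illustrative fallback for output with no VERDICT= line.
  printf '%s %s\n' "${verdict:-UNKNOWN}" "${build:-?}"
}
```

For example, `testme-on-pr.sh <recipe> <pr-index> | parse_verdict` would yield one line suitable for the report. With `pipefail` set, as in the helper, the pipeline still returns the script's exit code (2 or 3) when it is not green.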


@@ -51,12 +51,18 @@ A table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade(
For each recipe in `RECIPES_TO_UPGRADE`, spawn an Agent (`subagent_type: "general-purpose"`,
description `"Upgrade <recipe> on cc-ci"`) with a prompt like:
> Run the `/recipe-upgrade <recipe>` skill end-to-end
> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, **verify green on cc-ci**,
> open a recipe PR, and — only if the upgrade is correct but a cc-ci test went stale — open a verified
> cc-ci test PR too. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push upstream. Do NOT merge.
> Print exactly one `RESULT:` line as your final line (the SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED
> forms from the recipe-upgrade skill).
> Run the `/recipe-upgrade <recipe>` skill end-to-end — **DEFAULT mode, do NOT pass `--with-tests`**
> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, open a recipe PR, and verify
> it by posting `!testme` on the PR (results visible in the PR; iterate ≤3×). Use the **existing
> tests** — if a test fails because it is genuinely stale, **leave an explanatory comment on the PR**
> for the operator and do NOT modify any test. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push
> upstream. Do NOT merge. Print exactly one `RESULT:` line as your final line (SUCCESS /
> SUCCESS-PENDING-TESTS / FAILED / SKIPPED — `SUCCESS+TESTPR` will not occur in default mode).
**Why default (no `--with-tests`):** the weekly cron must never auto-edit cc-ci tests unattended — a
test change deserves a human decision. So the cron opens recipe PRs only; where a recipe's existing
test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe
with `/recipe-upgrade <recipe> --with-tests` to also get a verified test-update PR.
- **Sequential (default):** one Agent at a time; wait, collect its `RESULT:`, then the next. If the
Agent tool call itself errors, record `FAILED — agent tool error` and continue. One failure never
@@ -65,8 +71,8 @@ description `"Upgrade <recipe> on cc-ci"`) with a prompt like:
the results). Failures are isolated per recipe.
## 4. Collect results
Parse each final `RESULT:` line into SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED. A subagent that
emitted no `RESULT:` line → `FAILED — no result emitted`.
Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode
won't emit `SUCCESS+TESTPR`). A subagent that emitted no `RESULT:` line → `FAILED — no result emitted`.
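
The collection step can be sketched as a prefix match on the final `RESULT:` line. This is an illustrative sketch only: the function name `result_status` and the idea of reading each subagent's output from a file are assumptions, not something the skill prescribes. The longer `SUCCESS-...` and `SUCCESS+...` prefixes are tested before plain `SUCCESS`, since a glob on `RESULT: SUCCESS*` would otherwise swallow them.

```bash
# result_status <agent-output-file>
# Prints the status keyword of the last "RESULT: " line in the file.
# A file with no RESULT: line is reported as FAILED (no result emitted).
result_status() {
  local last
  last="$(grep '^RESULT: ' "$1" | tail -n 1)" || true
  case "$last" in
    "RESULT: SUCCESS-PENDING-TESTS"*) echo "SUCCESS-PENDING-TESTS" ;;
    "RESULT: SUCCESS+TESTPR"*)        echo "SUCCESS+TESTPR" ;;
    "RESULT: SUCCESS"*)               echo "SUCCESS" ;;
    "RESULT: FAILED"*)                echo "FAILED" ;;
    "RESULT: SKIPPED"*)               echo "SKIPPED" ;;
    *)                                echo "FAILED" ;;  # no result emitted
  esac
}
```

Counting the keywords across all recipes then gives the numbers for the summary header.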
## 5. Write + print the summary
Write `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-<YYYY-MM-DD>.md` and print it, **leading with the
@@ -74,9 +80,11 @@ PR list** (the actionable output):
```markdown
# cc-ci Weekly Upgrade Run — <YYYY-MM-DD>
## Summary
- Considered: N · Upgraded (PR opened): N · With cc-ci test PR: N · Failed: N · Skipped: N
- Considered: N · PR green (!testme): N · PR open but tests look stale (commented): N · Failed: N · Skipped: N
## PRs to review (NOT merged)
- <recipe> <old><new> — recipe PR: <url> [+ cc-ci test PR: <url>]
- <recipe> <old><new> — recipe PR: <url> — !testme GREEN
## PRs where a test looks stale (operator: re-run `--with-tests` to update tests)
- <recipe> <old><new> — recipe PR: <url> — !testme RED on <test>; see PR comment
## Failed (investigate)
- <recipe> — at <step>: <reason> (log: .cc-ci-logs/upgrades/<recipe>-upgrade-<date>.md)
## Skipped