recipe-upgrade: !testme-on-PR verification + make test PRs opt-in (--with-tests)
Per operator:
- Verify via `!testme` posted ON the recipe PR (the real CI path) so results are viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test). New helper testme-on-pr.sh posts !testme and polls the PR head commit status for the verdict (POST=0 to keep polling without re-triggering).
- Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using existing tests; if a test fails and is genuinely stale, leave an explanatory COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests) and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR path (verified via the branch-checkout harness run, since !testme uses prod tests).
- upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended; surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe.
- New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -1,6 +1,6 @@
 ---
 name: recipe-upgrade
-description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), then VERIFIES the change green on cc-ci (full suite, cold, against the PR head) and opens a recipe PR. If the upgrade is correct but a cc-ci TEST is now stale, it also updates the test, verifies that, and opens a second PR to the cc-ci repo. NEVER merges — leaves verified, ready-to-merge PRs. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade <recipe>.
+description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), opens a recipe PR, and verifies it by posting `!testme` on the PR (real CI; results visible in the PR; iterates up to 3×). DEFAULT: recipe PR only, using existing tests — if a test fails because it is genuinely stale, it leaves an explanatory COMMENT on the PR for the operator (does NOT touch tests). With `--with-tests`: also opens + verifies a PR to update the stale cc-ci test. NEVER merges. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade <recipe> [--with-tests].
 ---
 
 # recipe-upgrade
@@ -18,6 +18,15 @@ harness, and the bot token to push mirror branches). The Gitea PR API call uses
 cc-ci harness decides pass/fail). **Create + verify, NEVER merge** (operator merges). **Never weaken a
 test** to make a PR green.
 
+## Arguments
+- `<recipe>` — the recipe to upgrade (required).
+- `--with-tests` (optional) — **opt in to updating cc-ci tests.** Without it (the DEFAULT, and what the
+  weekly `/upgrade-all` cron uses), this skill **only ever opens a recipe PR using the existing tests**;
+  if a test fails and is diagnosed as genuinely stale, it **leaves a comment** on the recipe PR
+  explaining why and stops — it does **not** modify any test. With `--with-tests`, a stale-test failure
+  additionally gets a **verified cc-ci test-update PR** (step 5b). A real upgrade regression is fixed +
+  re-tested in **both** modes.
+
 ## Procedure
 
 ### 1. Plan the upgrade (research — follow recipe-upgrade-plan methodology)
@@ -73,32 +82,51 @@ real current upstream main). Capture the `PR_URL`. Optionally export `RECIPE_PR_
 table + the planned operator-action notes). Re-running with the same target version updates the
 existing same-branch PR rather than duplicating it.
 
-### 4. VERIFY the upgrade on the CI server (the gate)
-Run the cc-ci full suite **cold** against the PR head — the dogfood gate from `ci-test-review`:
+### 4. VERIFY by running `!testme` ON THE PR (results visible in the PR; iterate ≤3×)
+Trigger the **real** CI on the recipe PR by posting `!testme` — the bridge runs the harness on cc-ci
+and posts the run + its results back to the PR, so the operator sees them in the PR. Use the PR index
+from step 3's `PR_URL` (`.../pulls/<N>`):
 ```
-RECIPE=<recipe> REF=upgrade-<new-version> /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh
+/srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh <recipe> <pr-index> # POST=1: posts !testme + polls
+# if it prints VERDICT=PENDING, poll again WITHOUT re-triggering:
+POST=0 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh <recipe> <pr-index>
 ```
-(`SRC` defaults to `recipe-maintainers/<recipe>`.) **GREEN** ⇔ harness exits 0 → the recipe PR is
-verified, ready for operator merge. **General bar = one cold green**; use `REPEAT=3` only for a recipe
-already known to be FLAKY (e.g. lasuite-drive). Record the run summary in the report.
+- **GREEN** → the recipe PR is verified on cc-ci, results in the PR, ready for operator merge. Done.
+- **RED** → go to step 5. You get **up to 3 `!testme` runs total** on this PR (initial + 2 retries):
+  use them to fix and re-test a real upgrade regression (step 5a). Each retry is a fresh `!testme`
+  after you push a fix, so every attempt's result is recorded in the PR.
 
-### 5. If verification is RED → diagnose (recipe vs cc-ci TEST), per ci-test-review
-Read the harness log and classify the failure:
+### 5. If `!testme` is RED → diagnose from the run (recipe regression vs stale cc-ci TEST)
+Read the failing run (linked from the PR / the harness log on cc-ci) and classify:
 
-- **The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) → fix it
-  on the recipe branch (bounded — ≤3 iterations), re-push, re-verify. The recipe PR is only "working"
-  once cc-ci is green. If still red after the budget, leave the PR open and report
-  `FAILED — upgrade not green`.
-- **The upgrade is correct but a cc-ci TEST is now stale/wrong** for the new version (asserts old
-  behavior, an overlay needs updating, a readiness gate changed) → this is a **CI-server fix**. Make
-  it the `ci-test-review` way:
+**5a. The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) — fix
+it on the recipe branch, push, and re-run `!testme` (within the 3-run budget). The recipe PR is only
+"working" once `!testme` is GREEN. If still red after 3 runs, leave the PR open (its red `!testme`
+results stand as the evidence) and report `FAILED — upgrade not green after 3 !testme runs`.
+
+**5b. The upgrade is correct but a cc-ci TEST is genuinely stale/wrong** for the new version (asserts
+old behavior, an overlay needs updating, a readiness gate changed). `!testme` can't go green without a
+test change, and a test change is **gated by `--with-tests`**:
+
+- **DEFAULT (no `--with-tests`) — comment, don't touch tests.** Post a **comment on the recipe PR**
+  that: states the upgrade itself looks correct; identifies the specific test that failed and **why it
+  appears stale** (what the new version changed vs what the test asserts); and tells the operator they
+  can **re-run `/recipe-upgrade <recipe> --with-tests`** to also open + verify a cc-ci test-update PR.
+  Do **NOT** modify any test. Report `SUCCESS-PENDING-TESTS` (recipe PR open; `!testme` red on a
+  stale test; operator to decide).
+- **`--with-tests` — open + verify a cc-ci test PR.** Make it the `ci-test-review` way:
   1. Branch `recipe-maintainers/cc-ci` in a **separate clone** (single-writer: never push `main`,
-     never touch the build loops' `/cc-ci` `/cc-ci-adv` clones), update the test/overlay.
-  2. **Verify** the recipe upgrade **with the updated test applied**: check the cc-ci branch out on
-     cc-ci, rebuild if needed, re-run `verify-pr.sh` for the recipe + a small regression sample.
-  3. Open the **cc-ci test PR** via the Gitea API (`/srv/cc-ci/.testenv` creds). Capture its URL.
-  4. The recipe PR + the cc-ci test PR are a **pair** — note in each that it depends on the other.
-- **FLAKY** (passes on a re-run) → note it; don't author a fix for a flake.
+     never touch the build loops' `/cc-ci` `/cc-ci-adv` clones); update the test/overlay.
+  2. **Verify the recipe upgrade WITH the updated test applied.** `!testme` on the recipe PR uses the
+     *deployed/main* cc-ci tests, so it can't see the unmerged test change — verify this combo by
+     running the harness with the cc-ci branch checked out on cc-ci:
+     `RECIPE=<recipe> REF=upgrade-<new-version> /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh`
+     (after checking out the cc-ci test branch on cc-ci + rebuilding if needed), plus a small
+     regression sample. Green ⇔ the upgrade passes under the corrected test.
+  3. Open the **cc-ci test PR** via the Gitea API. Note in BOTH PRs that they are a dependent pair
+     (the recipe PR's `!testme` goes green only once the test PR is merged). Report `SUCCESS+TESTPR`.
+
+**FLAKY** (passes on a re-`!testme`) → note it; don't author a fix for a flake.
 
 Never weaken a test to turn a red upgrade green.
 
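The default-mode comment described in step 5b could be assembled roughly as follows. This is a minimal sketch and not part of the commit: `stale_comment_payload` and the wording of the comment body are hypothetical, and it assumes `python3` is on the path, as the commit's own helper script does.

```shell
#!/usr/bin/env bash
# Sketch only: stale_comment_payload is a hypothetical helper, and the
# comment wording below is an illustration, not text from the skill.
# It prints a JSON object with a "body" field for a PR comment.
stale_comment_payload() {
  local recipe="$1" test_name="$2" reason="$3"
  # json.dumps takes care of quoting and newlines in the free-text reason.
  python3 - "$recipe" "$test_name" "$reason" <<'PY'
import json, sys
recipe, test_name, reason = sys.argv[1:4]
body = (
    "The upgrade itself looks correct.\n\n"
    f"Failing test: `{test_name}`\n"
    f"Why it appears stale: {reason}\n\n"
    "To also open and verify a cc-ci test-update PR, re-run "
    f"`/recipe-upgrade {recipe} --with-tests`."
)
print(json.dumps({"body": body}))
PY
}
```

The printed JSON could then be sent to the same `issues/<pr-index>/comments` endpoint that `testme-on-pr.sh` uses to post `!testme`.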
@@ -106,21 +134,26 @@ Never weaken a test to turn a red upgrade green.
 Write the report to `/srv/cc-ci/.cc-ci-logs/upgrades/<recipe>-upgrade-<YYYY-MM-DD>.md` and print, as
 the **last line**, one of these exact prefixes (so `/upgrade-all` can collect it):
 
-- `RESULT: SUCCESS — <recipe> <old> → <new>, cc-ci GREEN, recipe PR: <url>`
-- `RESULT: SUCCESS+TESTPR — <recipe> <old> → <new>, cc-ci GREEN; recipe PR: <url>; cc-ci test PR: <url>`
-- `RESULT: FAILED — <recipe> at <step>: <one-line reason>` (e.g. upgrade not green after 3 tries)
+- `RESULT: SUCCESS — <recipe> <old> → <new>, !testme GREEN, recipe PR: <url>`
+- `RESULT: SUCCESS-PENDING-TESTS — <recipe> <old> → <new>, recipe PR: <url>; !testme RED on a stale test (commented; re-run --with-tests to update tests)`
+- `RESULT: SUCCESS+TESTPR — <recipe> <old> → <new>; recipe PR: <url>; cc-ci test PR: <url> (verified with the test change; pair)`
+- `RESULT: FAILED — <recipe> at <step>: <one-line reason>` (e.g. upgrade not green after 3 !testme runs)
 - `RESULT: SKIPPED — <recipe>: <up-to-date | dirty-worktree | reason>`
 
 Always state explicitly that **nothing was merged** — the PR(s) await operator review.
 
 ## Guardrails
-- **cc-ci is the gate** — deterministic harness decides pass/fail; AI plans, implements, diagnoses.
+- **cc-ci is the gate, via `!testme` on the PR** — the harness decides pass/fail and the results live
+  in the PR (operator-visible); AI plans, implements, diagnoses. Up to 3 `!testme` runs per PR.
 - **Create + verify, NEVER merge.** Recipe PR (and any cc-ci test PR) are operator-merged after review.
+- **Tests are opt-in.** Default = recipe PR only, existing tests; a stale-test failure gets an
+  explanatory PR **comment**, never a test edit. Only `--with-tests` opens a cc-ci test PR. The weekly
+  `/upgrade-all` cron always runs the **default** — it never auto-updates tests.
 - **Mirror reflects reality.** Each run force-syncs the `recipe-maintainers/<recipe>` mirror `main` to
   true upstream main, closes open PRs already merged upstream, and replaces any superseded open PR with
   the new one — so an open mirror PR always means "genuinely still open against current upstream main".
-- **Prefer a recipe-only PR.** Only open a cc-ci test PR when the upgrade is correct but a test is
-  genuinely stale — not to paper over a real upgrade regression.
+- **Prefer a recipe-only PR.** Only open a cc-ci test PR (under `--with-tests`) when the upgrade is
+  correct but a test is genuinely stale — never to paper over a real upgrade regression.
 - **Never weaken a test**; **bounded** changes (the upgrade + minimal test update, not rewrites).
 - **Real abra path** throughout (`abra recipe upgrade` / `abra app deploy` via the harness).
 - **Single-writer / coordination:** dedicated branches + separate clones; never push `main` or disturb
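The report location and the exact final line from the report section can be produced mechanically, which helps keep the prefixes stable for `/upgrade-all`. A minimal sketch, assuming the path and format strings shown in the diff above; the function names `report_path` and `result_success_line` are hypothetical and not part of the commit.

```shell
#!/usr/bin/env bash
# Sketch only: both helpers are hypothetical names for illustration.

# Print the report path for a recipe; the date defaults to today (YYYY-MM-DD).
report_path() {
  local recipe="$1" day="${2:-$(date +%F)}"
  printf '/srv/cc-ci/.cc-ci-logs/upgrades/%s-upgrade-%s.md\n' "$recipe" "$day"
}

# Print the SUCCESS line in the new format introduced by this commit.
result_success_line() {
  local recipe="$1" old="$2" new="$3" url="$4"
  printf 'RESULT: SUCCESS — %s %s → %s, !testme GREEN, recipe PR: %s\n' \
    "$recipe" "$old" "$new" "$url"
}
```

Passing the date explicitly, as in `report_path myrecipe 2025-01-06`, keeps the helper deterministic when a run crosses midnight.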
55
.claude/skills/recipe-upgrade/testme-on-pr.sh
Executable file
@@ -0,0 +1,55 @@
+#!/usr/bin/env bash
+# recipe-upgrade :: trigger cc-ci on a recipe PR via `!testme` and read the verdict FROM the PR.
+# -------------------------------------------------------------------------
+# Runs on the ORCHESTRATOR (just hits the Gitea API with /srv/cc-ci/.testenv creds). Posting
+# `!testme` as a PR comment is the REAL CI path — the bridge runs the harness on cc-ci and the
+# run + its results are posted back to the PR, so the operator can see them in the PR itself.
+#
+#   testme-on-pr.sh <recipe> <pr-index>
+#   env: POST=1 post a fresh `!testme` first (default). POST=0 = poll an in-flight run only,
+#        WITHOUT re-triggering (use on follow-up calls so you don't launch duplicate runs).
+#        MAX_WAIT=480 INTERVAL=30 poll window per call, in seconds. 480s stays under the 10-min
+#        tool cap — if it prints VERDICT=PENDING, just run again with POST=0 to keep polling.
+#
+# Prints: VERDICT=GREEN|RED|PENDING and BUILD=<drone-build-url>
+set -o errexit -o nounset -o pipefail
+
+RECIPE="${1:?usage: testme-on-pr.sh <recipe> <pr-index>}"
+PRIDX="${2:?usage: testme-on-pr.sh <recipe> <pr-index>}"
+TESTENV="${TESTENV:-/srv/cc-ci/.testenv}"
+set -a; . "$TESTENV"; set +a
+: "${GITEA_USERNAME:?}"; : "${GITEA_PASSWORD:?}"; : "${GITEA_URL:?}"
+NS="${GITEA_NAMESPACE:-recipe-maintainers}"
+API="https://${GITEA_URL}/api/v1"; AUTH=(-u "${GITEA_USERNAME}:${GITEA_PASSWORD}")
+POST="${POST:-1}"; MAX_WAIT="${MAX_WAIT:-480}"; INTERVAL="${INTERVAL:-30}"
+
+SHA="$(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/pulls/${PRIDX}" \
+  | python3 -c "import json,sys;print(json.load(sys.stdin)['head']['sha'])" 2>/dev/null || true)"
+[ -n "$SHA" ] || { echo "ERROR: could not resolve ${NS}/${RECIPE} PR #${PRIDX} head sha"; exit 1; }
+
+if [ "$POST" = "1" ]; then
+  curl -s -o /dev/null "${AUTH[@]}" -H "Content-Type: application/json" -X POST \
+    "${API}/repos/${NS}/${RECIPE}/issues/${PRIDX}/comments" -d '{"body":"!testme"}'
+  echo "→ posted !testme on ${NS}/${RECIPE}#${PRIDX} (head ${SHA:0:8}) — results will appear in the PR"
+  sleep 5
+fi
+
+# Poll the Drone-reported combined commit status on the PR head (terminal = success/failure/error).
+deadline=$(( $(date +%s) + MAX_WAIT ))
+url=""
+while [ "$(date +%s)" -lt "$deadline" ]; do
+  read -r state url < <(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/commits/${SHA}/status" 2>/dev/null \
+    | python3 -c "import json,sys
+try: d=json.load(sys.stdin)
+except Exception: d={}
+sts=d.get('statuses') or []
+print(d.get('state','') or 'none', (sts[0].get('target_url') if sts else '') or '')" 2>/dev/null || echo "none ")
+  case "$state" in
+    success) echo "VERDICT=GREEN"; echo "BUILD=${url}"; exit 0;;
+    failure|error) echo "VERDICT=RED"; echo "BUILD=${url}"; exit 2;;
+  esac
+  sleep "$INTERVAL"
+done
+echo "VERDICT=PENDING"; echo "BUILD=${url:-?}"
+echo "(run still in flight — re-run with POST=0 to keep polling without re-triggering)"
+exit 3
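The helper above exits 0 for GREEN, 2 for RED, 3 for PENDING and 1 when the PR head cannot be resolved. A caller that respects the 3-run `!testme` budget could map those codes to a next step along these lines. This is a sketch under stated assumptions: `next_action`, `MAX_RUNS` and the action names are hypothetical and do not appear in the commit.

```shell
#!/usr/bin/env bash
# Sketch only: next_action and its output labels are hypothetical.
# Input: the exit code of testme-on-pr.sh and how many !testme runs
# have been posted on this PR so far (the skill allows 3 in total).
MAX_RUNS=3

next_action() {
  local exit_code="$1" runs_used="$2"
  case "$exit_code" in
    0) echo "DONE" ;;   # VERDICT=GREEN: PR verified
    3) echo "POLL" ;;   # VERDICT=PENDING: call again with POST=0
    2)                  # VERDICT=RED: diagnose; retry only if budget remains
      if [ "$runs_used" -lt "$MAX_RUNS" ]; then
        echo "DIAGNOSE"
      else
        echo "DIAGNOSE_FINAL"
      fi
      ;;
    *) echo "ERROR" ;;  # e.g. exit 1: PR head sha could not be resolved
  esac
}
```

`DIAGNOSE_FINAL` only means no further `!testme` should be posted; whether the outcome is `FAILED` or `SUCCESS-PENDING-TESTS` still depends on the step 5 classification.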
@@ -51,12 +51,18 @@ A table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade(
 For each recipe in `RECIPES_TO_UPGRADE`, spawn an Agent (`subagent_type: "general-purpose"`,
 description `"Upgrade <recipe> on cc-ci"`) with a prompt like:
 
-> Run the `/recipe-upgrade <recipe>` skill end-to-end
-> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, **verify green on cc-ci**,
-> open a recipe PR, and — only if the upgrade is correct but a cc-ci test went stale — open a verified
-> cc-ci test PR too. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push upstream. Do NOT merge.
-> Print exactly one `RESULT:` line as your final line (the SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED
-> forms from the recipe-upgrade skill).
+> Run the `/recipe-upgrade <recipe>` skill end-to-end — **DEFAULT mode, do NOT pass `--with-tests`**
+> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, open a recipe PR, and verify
+> it by posting `!testme` on the PR (results visible in the PR; iterate ≤3×). Use the **existing
+> tests** — if a test fails because it is genuinely stale, **leave an explanatory comment on the PR**
+> for the operator and do NOT modify any test. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push
+> upstream. Do NOT merge. Print exactly one `RESULT:` line as your final line (SUCCESS /
+> SUCCESS-PENDING-TESTS / FAILED / SKIPPED — `SUCCESS+TESTPR` will not occur in default mode).
 
+**Why default (no `--with-tests`):** the weekly cron must never auto-edit cc-ci tests unattended — a
+test change deserves a human decision. So the cron opens recipe PRs only; where a recipe's existing
+test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe
+with `/recipe-upgrade <recipe> --with-tests` to also get a verified test-update PR.
+
 - **Sequential (default):** one Agent at a time; wait, collect its `RESULT:`, then the next. If the
   Agent tool call itself errors, record `FAILED — agent tool error` and continue. One failure never
@@ -65,8 +71,8 @@ description `"Upgrade <recipe> on cc-ci"`) with a prompt like:
 the results). Failures are isolated per recipe.
 
 ## 4. Collect results
-Parse each final `RESULT:` line into SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED. A subagent that
-emitted no `RESULT:` line → `FAILED — no result emitted`.
+Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode
+won't emit `SUCCESS+TESTPR`). A subagent that emitted no `RESULT:` line → `FAILED — no result emitted`.
 
 ## 5. Write + print the summary
 Write `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-<YYYY-MM-DD>.md` and print it, **leading with the
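The "Collect results" step in the hunk above amounts to matching the final line against the exact `RESULT:` prefixes. One way to do that is a `case` statement over those prefixes. This is a sketch only: `result_kind` is a hypothetical name, and collapsing a missing `RESULT:` line into plain `FAILED` is a simplification of the `FAILED — no result emitted` rule.

```shell
#!/usr/bin/env bash
# Sketch only: result_kind is a hypothetical helper for illustration.
# It maps a subagent's final output line to one of the RESULT categories.
result_kind() {
  local line="$1"
  case "$line" in
    "RESULT: SUCCESS-PENDING-TESTS "*) echo "SUCCESS-PENDING-TESTS" ;;
    "RESULT: SUCCESS+TESTPR "*)        echo "SUCCESS+TESTPR" ;;
    "RESULT: SUCCESS "*)               echo "SUCCESS" ;;
    "RESULT: FAILED "*)                echo "FAILED" ;;
    "RESULT: SKIPPED "*)               echo "SKIPPED" ;;
    *)                                 echo "FAILED" ;;  # no RESULT line emitted
  esac
}
```

Because each pattern includes the space after the keyword, `SUCCESS` cannot accidentally match the longer `SUCCESS-PENDING-TESTS` or `SUCCESS+TESTPR` forms.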
@@ -74,9 +80,11 @@ PR list** (the actionable output):
 ```markdown
 # cc-ci Weekly Upgrade Run — <YYYY-MM-DD>
 ## Summary
-- Considered: N · Upgraded (PR opened): N · With cc-ci test PR: N · Failed: N · Skipped: N
+- Considered: N · PR green (!testme): N · PR open but tests look stale (commented): N · Failed: N · Skipped: N
 ## PRs to review (NOT merged)
-- <recipe> <old> → <new> — recipe PR: <url> [+ cc-ci test PR: <url>]
+- <recipe> <old> → <new> — recipe PR: <url> — !testme GREEN
+## PRs where a test looks stale (operator: re-run `--with-tests` to update tests)
+- <recipe> <old> → <new> — recipe PR: <url> — !testme RED on <test>; see PR comment
 ## Failed (investigate)
 - <recipe> — at <step>: <reason> (log: .cc-ci-logs/upgrades/<recipe>-upgrade-<date>.md)
 ## Skipped