From 4a1da1dd601da681e110fc307186ff3e11cfc55c Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Fri, 29 May 2026 20:18:59 +0100 Subject: [PATCH] recipe-upgrade: !testme-on-PR verification + make test PRs opt-in (--with-tests) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per operator: - Verify via `!testme` posted ON the recipe PR (the real CI path) so results are viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test). New helper testme-on-pr.sh posts !testme and polls the PR head commit status for the verdict (POST=0 to keep polling without re-triggering). - Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using existing tests; if a test fails and is genuinely stale, leave an explanatory COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests) and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR path (verified via the branch-checkout harness run, since !testme uses prod tests). - upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended; surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe. - New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case. Co-Authored-By: Claude Opus 4.8 --- .claude/skills/recipe-upgrade/SKILL.md | 89 +++++++++++++------ .claude/skills/recipe-upgrade/testme-on-pr.sh | 55 ++++++++++++ .claude/skills/upgrade-all/SKILL.md | 28 +++--- 3 files changed, 134 insertions(+), 38 deletions(-) create mode 100755 .claude/skills/recipe-upgrade/testme-on-pr.sh diff --git a/.claude/skills/recipe-upgrade/SKILL.md b/.claude/skills/recipe-upgrade/SKILL.md index a005da8..778a214 100644 --- a/.claude/skills/recipe-upgrade/SKILL.md +++ b/.claude/skills/recipe-upgrade/SKILL.md @@ -1,6 +1,6 @@ --- name: recipe-upgrade -description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), then VERIFIES the change green on cc-ci (full suite, cold, against the PR head) and opens a recipe PR. If the upgrade is correct but a cc-ci TEST is now stale, it also updates the test, verifies that, and opens a second PR to the cc-ci repo. NEVER merges — leaves verified, ready-to-merge PRs. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade . +description: Upgrade ONE Co-op Cloud recipe end-to-end and verify it on the cc-ci CI server. Researches available upstream upgrades, plans them (breaking changes, migrations, config), implements the bump (image tags + recipe version label + config), opens a recipe PR, and verifies it by posting `!testme` on the PR (real CI; results visible in the PR; iterates up to 3×). DEFAULT: recipe PR only, using existing tests — if a test fails because it is genuinely stale, it leaves an explanatory COMMENT on the PR for the operator (does NOT touch tests). With `--with-tests`: also opens + verifies a PR to update the stale cc-ci test. NEVER merges. The per-recipe worker behind /upgrade-all. Invoke as /recipe-upgrade [--with-tests]. --- # recipe-upgrade @@ -18,6 +18,15 @@ harness, and the bot token to push mirror branches). The Gitea PR API call uses cc-ci harness decides pass/fail). **Create + verify, NEVER merge** (operator merges). **Never weaken a test** to make a PR green. +## Arguments +- `` — the recipe to upgrade (required). +- `--with-tests` (optional) — **opt in to updating cc-ci tests.** Without it (the DEFAULT, and what the + weekly `/upgrade-all` cron uses), this skill **only ever opens a recipe PR using the existing tests**; + if a test fails and is diagnosed as genuinely stale, it **leaves a comment** on the recipe PR + explaining why and stops — it does **not** modify any test. With `--with-tests`, a stale-test failure + additionally gets a **verified cc-ci test-update PR** (step 5b). A real upgrade regression is fixed + + re-tested in **both** modes. + ## Procedure ### 1. Plan the upgrade (research — follow recipe-upgrade-plan methodology) @@ -73,32 +82,51 @@ real current upstream main). Capture the `PR_URL`. Optionally export `RECIPE_PR_ table + the planned operator-action notes). Re-running with the same target version updates the existing same-branch PR rather than duplicating it. -### 4. VERIFY the upgrade on the CI server (the gate) -Run the cc-ci full suite **cold** against the PR head — the dogfood gate from `ci-test-review`: +### 4. VERIFY by running `!testme` ON THE PR (results visible in the PR; iterate ≤3×) +Trigger the **real** CI on the recipe PR by posting `!testme` — the bridge runs the harness on cc-ci +and posts the run + its results back to the PR, so the operator sees them in the PR. Use the PR index +from step 3's `PR_URL` (`.../pulls/`): ``` -RECIPE= REF=upgrade- /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh +/srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh # POST=1: posts !testme + polls +# if it prints VERDICT=PENDING, poll again WITHOUT re-triggering: +POST=0 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh ``` -(`SRC` defaults to `recipe-maintainers/`.) **GREEN** ⇔ harness exits 0 → the recipe PR is -verified, ready for operator merge. **General bar = one cold green**; use `REPEAT=3` only for a recipe -already known to be FLAKY (e.g. lasuite-drive). Record the run summary in the report. +- **GREEN** → the recipe PR is verified on cc-ci, results in the PR, ready for operator merge. Done. +- **RED** → go to step 5. You get **up to 3 `!testme` runs total** on this PR (initial + 2 retries): + use them to fix and re-test a real upgrade regression (step 5a). Each retry is a fresh `!testme` + after you push a fix, so every attempt's result is recorded in the PR. -### 5. If verification is RED → diagnose (recipe vs cc-ci TEST), per ci-test-review -Read the harness log and classify the failure: +### 5. If `!testme` is RED → diagnose from the run (recipe regression vs stale cc-ci TEST) +Read the failing run (linked from the PR / the harness log on cc-ci) and classify: -- **The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) → fix it - on the recipe branch (bounded — ≤3 iterations), re-push, re-verify. The recipe PR is only "working" - once cc-ci is green. If still red after the budget, leave the PR open and report - `FAILED — upgrade not green`. -- **The upgrade is correct but a cc-ci TEST is now stale/wrong** for the new version (asserts old - behavior, an overlay needs updating, a readiness gate changed) → this is a **CI-server fix**. Make - it the `ci-test-review` way: +**5a. The UPGRADE itself is broken** (real regression: bad tag, missing migration, config gap) — fix +it on the recipe branch, push, and re-run `!testme` (within the 3-run budget). The recipe PR is only +"working" once `!testme` is GREEN. If still red after 3 runs, leave the PR open (its red `!testme` +results stand as the evidence) and report `FAILED — upgrade not green after 3 !testme runs`. + +**5b. The upgrade is correct but a cc-ci TEST is genuinely stale/wrong** for the new version (asserts +old behavior, an overlay needs updating, a readiness gate changed). `!testme` can't go green without a +test change, and a test change is **gated by `--with-tests`**: + +- **DEFAULT (no `--with-tests`) — comment, don't touch tests.** Post a **comment on the recipe PR** + that: states the upgrade itself looks correct; identifies the specific test that failed and **why it + appears stale** (what the new version changed vs what the test asserts); and tells the operator they + can **re-run `/recipe-upgrade --with-tests`** to also open + verify a cc-ci test-update PR. + Do **NOT** modify any test. Report `SUCCESS-PENDING-TESTS` (recipe PR open; `!testme` red on a + stale test; operator to decide). +- **`--with-tests` — open + verify a cc-ci test PR.** Make it the `ci-test-review` way: 1. Branch `recipe-maintainers/cc-ci` in a **separate clone** (single-writer: never push `main`, - never touch the build loops' `/cc-ci` `/cc-ci-adv` clones), update the test/overlay. - 2. **Verify** the recipe upgrade **with the updated test applied**: check the cc-ci branch out on - cc-ci, rebuild if needed, re-run `verify-pr.sh` for the recipe + a small regression sample. - 3. Open the **cc-ci test PR** via the Gitea API (`/srv/cc-ci/.testenv` creds). Capture its URL. - 4. The recipe PR + the cc-ci test PR are a **pair** — note in each that it depends on the other. -- **FLAKY** (passes on a re-run) → note it; don't author a fix for a flake. + never touch the build loops' `/cc-ci` `/cc-ci-adv` clones); update the test/overlay. + 2. **Verify the recipe upgrade WITH the updated test applied.** `!testme` on the recipe PR uses the + *deployed/main* cc-ci tests, so it can't see the unmerged test change — verify this combo by + running the harness with the cc-ci branch checked out on cc-ci: + `RECIPE= REF=upgrade- /srv/cc-ci/.claude/skills/ci-test-review/verify-pr.sh` + (after checking out the cc-ci test branch on cc-ci + rebuilding if needed), plus a small + regression sample. Green ⇔ the upgrade passes under the corrected test. + 3. Open the **cc-ci test PR** via the Gitea API. Note in BOTH PRs that they are a dependent pair + (the recipe PR's `!testme` goes green only once the test PR is merged). Report `SUCCESS+TESTPR`. + +**FLAKY** (passes on a re-`!testme`) → note it; don't author a fix for a flake. Never weaken a test to turn a red upgrade green. @@ -106,21 +134,26 @@ Never weaken a test to turn a red upgrade green. Write the report to `/srv/cc-ci/.cc-ci-logs/upgrades/-upgrade-.md` and print, as the **last line**, one of these exact prefixes (so `/upgrade-all` can collect it): -- `RESULT: SUCCESS — , cc-ci GREEN, recipe PR: ` -- `RESULT: SUCCESS+TESTPR — , cc-ci GREEN; recipe PR: ; cc-ci test PR: ` -- `RESULT: FAILED — at : ` (e.g. upgrade not green after 3 tries) +- `RESULT: SUCCESS — , !testme GREEN, recipe PR: ` +- `RESULT: SUCCESS-PENDING-TESTS — , recipe PR: ; !testme RED on a stale test (commented; re-run --with-tests to update tests)` +- `RESULT: SUCCESS+TESTPR — ; recipe PR: ; cc-ci test PR: (verified with the test change; pair)` +- `RESULT: FAILED — at : ` (e.g. upgrade not green after 3 !testme runs) - `RESULT: SKIPPED — : ` Always state explicitly that **nothing was merged** — the PR(s) await operator review. ## Guardrails -- **cc-ci is the gate** — deterministic harness decides pass/fail; AI plans, implements, diagnoses. +- **cc-ci is the gate, via `!testme` on the PR** — the harness decides pass/fail and the results live + in the PR (operator-visible); AI plans, implements, diagnoses. Up to 3 `!testme` runs per PR. - **Create + verify, NEVER merge.** Recipe PR (and any cc-ci test PR) are operator-merged after review. +- **Tests are opt-in.** Default = recipe PR only, existing tests; a stale-test failure gets an + explanatory PR **comment**, never a test edit. Only `--with-tests` opens a cc-ci test PR. The weekly + `/upgrade-all` cron always runs the **default** — it never auto-updates tests. - **Mirror reflects reality.** Each run force-syncs the `recipe-maintainers/` mirror `main` to true upstream main, closes open PRs already merged upstream, and replaces any superseded open PR with the new one — so an open mirror PR always means "genuinely still open against current upstream main". -- **Prefer a recipe-only PR.** Only open a cc-ci test PR when the upgrade is correct but a test is - genuinely stale — not to paper over a real upgrade regression. +- **Prefer a recipe-only PR.** Only open a cc-ci test PR (under `--with-tests`) when the upgrade is + correct but a test is genuinely stale — never to paper over a real upgrade regression. - **Never weaken a test**; **bounded** changes (the upgrade + minimal test update, not rewrites). - **Real abra path** throughout (`abra recipe upgrade` / `abra app deploy` via the harness). - **Single-writer / coordination:** dedicated branches + separate clones; never push `main` or disturb diff --git a/.claude/skills/recipe-upgrade/testme-on-pr.sh b/.claude/skills/recipe-upgrade/testme-on-pr.sh new file mode 100755 index 0000000..dd60982 --- /dev/null +++ b/.claude/skills/recipe-upgrade/testme-on-pr.sh @@ -0,0 +1,55 @@ +#!/usr/bin/env bash +# recipe-upgrade :: trigger cc-ci on a recipe PR via `!testme` and read the verdict FROM the PR. +# ------------------------------------------------------------------------- +# Runs on the ORCHESTRATOR (just hits the Gitea API with /srv/cc-ci/.testenv creds). Posting +# `!testme` as a PR comment is the REAL CI path — the bridge runs the harness on cc-ci and the +# run + its results are posted back to the PR, so the operator can see them in the PR itself. +# +# testme-on-pr.sh +# env: POST=1 post a fresh `!testme` first (default). POST=0 = poll an in-flight run only, +# WITHOUT re-triggering (use on follow-up calls so you don't launch duplicate runs). +# MAX_WAIT=480 INTERVAL=30 poll window per call, in seconds. 480s stays under the 10-min +# tool cap — if it prints VERDICT=PENDING, just run again with POST=0 to keep polling. +# +# Prints: VERDICT=GREEN|RED|PENDING and BUILD= +set -o errexit -o nounset -o pipefail + +RECIPE="${1:?usage: testme-on-pr.sh }" +PRIDX="${2:?usage: testme-on-pr.sh }" +TESTENV="${TESTENV:-/srv/cc-ci/.testenv}" +set -a; . "$TESTENV"; set +a +: "${GITEA_USERNAME:?}"; : "${GITEA_PASSWORD:?}"; : "${GITEA_URL:?}" +NS="${GITEA_NAMESPACE:-recipe-maintainers}" +API="https://${GITEA_URL}/api/v1"; AUTH=(-u "${GITEA_USERNAME}:${GITEA_PASSWORD}") +POST="${POST:-1}"; MAX_WAIT="${MAX_WAIT:-480}"; INTERVAL="${INTERVAL:-30}" + +SHA="$(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/pulls/${PRIDX}" \ + | python3 -c "import json,sys;print(json.load(sys.stdin)['head']['sha'])" 2>/dev/null || true)" +[ -n "$SHA" ] || { echo "ERROR: could not resolve ${NS}/${RECIPE} PR #${PRIDX} head sha"; exit 1; } + +if [ "$POST" = "1" ]; then + curl -s -o /dev/null "${AUTH[@]}" -H "Content-Type: application/json" -X POST \ + "${API}/repos/${NS}/${RECIPE}/issues/${PRIDX}/comments" -d '{"body":"!testme"}' + echo "→ posted !testme on ${NS}/${RECIPE}#${PRIDX} (head ${SHA:0:8}) — results will appear in the PR" + sleep 5 +fi + +# Poll the Drone-reported combined commit status on the PR head (terminal = success/failure/error). +deadline=$(( $(date +%s) + MAX_WAIT )) +url="" +while [ "$(date +%s)" -lt "$deadline" ]; do + read -r state url < <(curl -s "${AUTH[@]}" "${API}/repos/${NS}/${RECIPE}/commits/${SHA}/status" 2>/dev/null \ + | python3 -c "import json,sys +try: d=json.load(sys.stdin) +except Exception: d={} +sts=d.get('statuses') or [] +print(d.get('state','') or 'none', (sts[0].get('target_url') if sts else '') or '')" 2>/dev/null || echo "none ") + case "$state" in + success) echo "VERDICT=GREEN"; echo "BUILD=${url}"; exit 0;; + failure|error) echo "VERDICT=RED"; echo "BUILD=${url}"; exit 2;; + esac + sleep "$INTERVAL" +done +echo "VERDICT=PENDING"; echo "BUILD=${url:-?}" +echo "(run still in flight — re-run with POST=0 to keep polling without re-triggering)" +exit 3 diff --git a/.claude/skills/upgrade-all/SKILL.md b/.claude/skills/upgrade-all/SKILL.md index d1b89cc..4425f47 100644 --- a/.claude/skills/upgrade-all/SKILL.md +++ b/.claude/skills/upgrade-all/SKILL.md @@ -51,12 +51,18 @@ A table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade( For each recipe in `RECIPES_TO_UPGRADE`, spawn an Agent (`subagent_type: "general-purpose"`, description `"Upgrade on cc-ci"`) with a prompt like: -> Run the `/recipe-upgrade ` skill end-to-end -> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, **verify green on cc-ci**, -> open a recipe PR, and — only if the upgrade is correct but a cc-ci test went stale — open a verified -> cc-ci test PR too. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push upstream. Do NOT merge. -> Print exactly one `RESULT:` line as your final line (the SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED -> forms from the recipe-upgrade skill). +> Run the `/recipe-upgrade ` skill end-to-end — **DEFAULT mode, do NOT pass `--with-tests`** +> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, open a recipe PR, and verify +> it by posting `!testme` on the PR (results visible in the PR; iterate ≤3×). Use the **existing +> tests** — if a test fails because it is genuinely stale, **leave an explanatory comment on the PR** +> for the operator and do NOT modify any test. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push +> upstream. Do NOT merge. Print exactly one `RESULT:` line as your final line (SUCCESS / +> SUCCESS-PENDING-TESTS / FAILED / SKIPPED — `SUCCESS+TESTPR` will not occur in default mode). + +**Why default (no `--with-tests`):** the weekly cron must never auto-edit cc-ci tests unattended — a +test change deserves a human decision. So the cron opens recipe PRs only; where a recipe's existing +test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe +with `/recipe-upgrade --with-tests` to also get a verified test-update PR. - **Sequential (default):** one Agent at a time; wait, collect its `RESULT:`, then the next. If the Agent tool call itself errors, record `FAILED — agent tool error` and continue. One failure never @@ -65,8 +71,8 @@ description `"Upgrade on cc-ci"`) with a prompt like: the results). Failures are isolated per recipe. ## 4. Collect results -Parse each final `RESULT:` line into SUCCESS / SUCCESS+TESTPR / FAILED / SKIPPED. A subagent that -emitted no `RESULT:` line → `FAILED — no result emitted`. +Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode +won't emit `SUCCESS+TESTPR`). A subagent that emitted no `RESULT:` line → `FAILED — no result emitted`. ## 5. Write + print the summary Write `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-.md` and print it, **leading with the @@ -74,9 +80,11 @@ PR list** (the actionable output): ```markdown # cc-ci Weekly Upgrade Run — ## Summary -- Considered: N · Upgraded (PR opened): N · With cc-ci test PR: N · Failed: N · Skipped: N +- Considered: N · PR green (!testme): N · PR open but tests look stale (commented): N · Failed: N · Skipped: N ## PRs to review (NOT merged) -- — recipe PR: [+ cc-ci test PR: ] +- — recipe PR: — !testme GREEN +## PRs where a test looks stale (operator: re-run `--with-tests` to update tests) +- — recipe PR: — !testme RED on ; see PR comment ## Failed (investigate) - — at : (log: .cc-ci-logs/upgrades/-upgrade-.md) ## Skipped