ci-test-review: close the loop — author + open + cc-ci-verify fix PRs (never merge)
Per operator: the skill should not just propose, it should CREATE the fix PR (recipe repo or cc-ci repo) and VERIFY it green on its own CI server — but not merge. It drives cc-ci like the loops do. - SKILL.md: diagnose+classify (recipe vs CI-server) -> author the fix + open a PR (recipe-create-pr for recipe PRs; Gitea API for cc-ci PRs, dedicated branch in a separate clone, single-writer safe) -> VERIFY on cc-ci (full suite cold against the PR head = the !testme dogfood path) -> report a verified, ready-to-merge PR. Never merges; never weakens a test; flake != bug. General bar = one cold green; repeated-green (REPEAT=3) only for a known-flaky recipe. Adds coordination/single-writer guardrails (shared Swarm is stateful; tear down deploys; never push main or touch the loops' clones). - verify-pr.sh: deterministic recipe-PR gate — RECIPE + REF -> cold full suite on cc-ci, green iff every repeat exits 0. CI-server-PR verification stays bespoke (branch checkout + rebuild + regression sample) per SKILL.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
@ -1,17 +1,25 @@
|
||||
---
|
||||
name: ci-test-review
|
||||
description: On-demand AI review of the cc-ci server — runs the deterministic recipe test suite across ALL enrolled recipes, and for any failure diagnoses the root cause and classifies it as needing a RECIPE PR (with the proposed change), a CI-SERVER PR, or a stale-test update; or reports "all passed, recipes + tests up to date". Use when the operator wants a full health check + actionable diagnosis of the recipe CI, beyond the deterministic pass/fail of normal !testme runs. Invoke as /ci-test-review.
|
||||
description: On-demand AI review + auto-fix of the cc-ci server. Runs the deterministic recipe test suite across ALL enrolled recipes; for any failure it diagnoses the root cause, classifies it as a RECIPE bug or a CI-SERVER bug, AUTHORS the fix, OPENS a PR (recipe repo or cc-ci repo), and VERIFIES that PR green on the CI server (deploy + full suite against the PR head, dogfood) — but NEVER merges (the operator merges). If everything is green and current it reports "all passed, recipes + tests up to date". Use for a full health check that ends in verified, ready-to-merge fix PRs. Invoke as /ci-test-review.
|
||||
---
|
||||
|
||||
# ci-test-review
|
||||
|
||||
An **on-demand, AI-driven review** of the cc-ci recipe CI server. cc-ci's primary use is
|
||||
**deterministic** testing (normal `!testme` / nightly — no AI). This skill adds a review layer: run
|
||||
the full suite across **all** recipes, then **AI-diagnose** any failures and say exactly what fixes
|
||||
them — a recipe PR, a CI-server PR, or a test update — or confirm everything is green and current.
|
||||
An **on-demand, AI-driven review + auto-fix** layer over the cc-ci recipe CI server. cc-ci's primary
|
||||
use is **deterministic** testing (normal `!testme` / nightly — no AI). This skill runs the full suite
|
||||
across **all** recipes, then for any failure: **diagnoses** the root cause, **classifies** it (recipe
|
||||
vs CI-server), **authors the fix, opens a PR, and verifies that PR green on the CI server** — the same
|
||||
deploy-and-test dogfood the build loops use. It leaves a **verified, ready-to-merge PR**; it does
|
||||
**NOT merge**. If everything's green and current it reports "all passed".
|
||||
|
||||
**Hard boundary:** the test *execution* is deterministic (the cc-ci harness). **AI is used ONLY for
|
||||
diagnosis / classification / proposing the PR — never to decide pass/fail.**
|
||||
**This skill drives its own CI server** (cc-ci) the way the loops do: it can push branches, deploy
|
||||
recipes, and run the harness there to verify a fix end-to-end.
|
||||
|
||||
**Hard boundaries:**
|
||||
- The test *execution and PR verification* are **deterministic** (the cc-ci harness decides
|
||||
pass/fail). **AI never decides pass/fail** — it only diagnoses, authors the fix, and drives the flow.
|
||||
- **Create + verify, never merge.** The operator reviews the cc-ci-green PR and merges.
|
||||
- **Never weaken a test** to make a PR go green.
|
||||
|
||||
## Access (this skill runs from the orchestration repo, drives cc-ci over the tailnet)
|
||||
- cc-ci is reachable via `ssh cc-ci` (root, through the userspace-tailscaled SOCKS proxy on
|
||||
@ -43,38 +51,78 @@ If every recipe's tiers are `pass`/`skip(N/A)` (no `fail`) **and** every recipe
|
||||
|
||||
Stop here. Nothing to fix.
|
||||
|
||||
### 3. For each failure → diagnose root cause + CLASSIFY (this is the AI part)
|
||||
### 3. For each failure → diagnose root cause + CLASSIFY (AI)
|
||||
For every `fail` (after confirming it's not a flake — see below), read the run log, the recipe's
|
||||
compose/code (on cc-ci `~/.abra/recipes/<recipe>`), and the harness, then determine the **root cause**
|
||||
and classify it as **exactly one** of:
|
||||
|
||||
- **RECIPE bug → recipe PR.** The recipe itself is broken/fragile (bad healthcheck, missing backup
|
||||
hook, startup race, etc.). Output the **concrete proposed change** (e.g. "add a collabora WOPI
|
||||
healthcheck + `start_period`", "add a `pg_dump` backup hook"). This is a PR to the recipe
|
||||
(recipe-maintainer); it is "working" only once cc-ci verifies it green, then the operator merges.
|
||||
- **CI-SERVER bug → cc-ci PR.** The harness/test/infra is wrong (a test asserts the wrong thing, a
|
||||
readiness gate is off, an infra config bug). Output the proposed change to the cc-ci repo.
|
||||
- **TEST out-of-date → cc-ci test update.** The recipe legitimately changed upstream and the cc-ci
|
||||
test/overlay needs updating to match. Output the update.
|
||||
- **RECIPE bug → recipe-side fix.** The recipe itself is broken/fragile (bad healthcheck, missing
|
||||
backup hook, startup race, etc.). The fix is a change to the recipe (e.g. "add a collabora WOPI
|
||||
healthcheck + `start_period`", "add a `pg_dump` backup hook").
|
||||
- **CI-SERVER bug → cc-ci-side fix.** The harness/test/infra is wrong (a test asserts the wrong
|
||||
thing, a readiness gate is off, an infra config bug). The fix is a change to the cc-ci repo.
|
||||
- **TEST out-of-date → cc-ci-side fix.** The recipe legitimately changed upstream and the cc-ci
|
||||
test/overlay needs updating to match.
|
||||
- **FLAKY / infra-transient.** Distinguish from a real failure: **re-run the recipe once or twice**;
|
||||
if it then passes, classify as flaky (note it; suggest a robustness fix if there's a pattern) — do
|
||||
NOT report a flake as a recipe/CI bug.
|
||||
NOT author a recipe/CI "fix" for a flake.
|
||||
|
||||
### 4. Freshness (part of "up to date")
|
||||
For each recipe, compare the **tested version** to the **latest published** recipe version (the
|
||||
sweep records both). Flag any recipe tested at a stale pin, and any cc-ci test that lags a recipe
|
||||
change — "all passed" must mean "passes against current upstream", not against an old pin.
|
||||
Freshness is part of this: compare each recipe's **tested version** to the **latest published**
|
||||
version (the sweep records both). A recipe tested at a stale pin, or a cc-ci test lagging an upstream
|
||||
recipe change, is itself a finding — "all passed" must mean "passes against current upstream".
|
||||
|
||||
### 5. Output a structured report
|
||||
### 4. Author the fix + OPEN a PR (AI authors; never merge)
|
||||
For each real (non-flaky) finding, write the actual fix and open a PR. **Never merge.**
|
||||
|
||||
- **Recipe-side fix → recipe PR.** Branch the recipe, apply the fix, and open the PR via the
|
||||
**`recipe-create-pr` skill** (`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md`) —
|
||||
it handles the mirror to `git.autonomic.zone/recipe-maintainers/<recipe>` (upstream
|
||||
`git.coopcloud.tech`). Keep the change **bounded** to the diagnosed root cause; don't rewrite the
|
||||
recipe.
|
||||
- **CI-server-side fix → cc-ci PR.** Branch the cc-ci product repo
|
||||
(`recipe-maintainers/cc-ci`), apply the fix, and open the PR via the Gitea API (use the
|
||||
`GITEA_*` creds from `/srv/cc-ci/.testenv`). **Single-writer discipline:** work on a dedicated
|
||||
branch in a SEPARATE clone — **never push `main`, never touch the build loops' working clones**
|
||||
(`/cc-ci`, `/cc-ci-adv`) or their in-flight state.
|
||||
|
||||
### 5. VERIFY each PR on the CI server (deterministic; still never merge)
|
||||
A PR is only "working" once **cc-ci verifies it green** (operator rule) — dogfood the CI that found
|
||||
the bug. Verification is deterministic (the harness), not an AI judgement.
|
||||
|
||||
- **Recipe PR:** run the **full suite COLD against the PR head** on cc-ci — the same path `!testme`
|
||||
uses — via the helper:
|
||||
```
|
||||
RECIPE=<recipe> REF=<pr-head-branch-or-sha> .claude/skills/ci-test-review/verify-pr.sh
|
||||
```
|
||||
Green ⇔ the harness exits 0. **General bar = one cold green.** Use `REPEAT=3` (repeated-green)
|
||||
**only for a recipe already known to be FLAKY** (e.g. lasuite-drive) as a flakiness proof — this is
|
||||
NOT the general standard.
|
||||
- **CI-server PR:** check the branch out in a clone on cc-ci, rebuild if the change requires it
|
||||
(`nixos-rebuild` / flake), then re-run the **previously-failing recipe(s)** through the harness
|
||||
**plus a small regression sample** of unrelated recipes. Green ⇔ the fix passes AND nothing
|
||||
regressed.
|
||||
- **If verification is RED:** iterate the fix on the same branch (**bounded — ≤3 attempts**) and
|
||||
re-verify. If still red, **leave the PR open** and report it as *"opened, NOT yet green — needs
|
||||
work"* with the failing evidence. Never abandon silently; never weaken a test to force green.
|
||||
|
||||
### 6. Output a structured report
|
||||
- A per-recipe **status table** (recipe | tiers | version tested / latest | verdict).
|
||||
- For each failure: **root cause → classification (recipe PR / CI PR / test-update / flaky) →
|
||||
proposed change** (concrete enough to act on).
|
||||
- For each finding: **root cause → classification (recipe / CI-server) → PR link → verification
|
||||
verdict** (`GREEN — cold-verified on cc-ci, ready for operator merge` / `RED — opened, needs work`).
|
||||
- Or the all-clear summary from step 2.
|
||||
- Always end with the explicit reminder that **nothing was merged** — the PRs await operator review.
|
||||
|
||||
## Guardrails
|
||||
- **Deterministic execution stays AI-free** — only diagnose; never override the harness's pass/fail.
|
||||
- **Propose, don't auto-merge.** This skill outputs recommendations + proposed PRs. It does NOT
|
||||
create or merge PRs. (A recipe PR is operator-merged only after cc-ci verifies it green.) If asked,
|
||||
it may hand a specific proposed recipe PR to the `recipe-create-pr` skill as an explicit follow-up.
|
||||
- **Don't weaken any test** to make the review "pass."
|
||||
- **Flake ≠ bug** — re-run before classifying a failure as a real recipe/CI defect.
|
||||
- **Deterministic execution + verification stay AI-free** — the harness decides pass/fail; AI only
|
||||
diagnoses, authors the fix, and drives the flow.
|
||||
- **Create + verify, NEVER merge.** Leave a cc-ci-green, ready-to-merge PR; the operator merges
|
||||
(recipe PRs especially are operator-merged only after cc-ci verifies them green).
|
||||
- **Don't weaken any test** to make a PR go green — the fix must make the recipe/CI genuinely correct.
|
||||
- **Flake ≠ bug** — re-run before authoring any fix.
|
||||
- **Real abra path** throughout (no docker-level bypass).
|
||||
- **Bounded fixes** — each PR targets one diagnosed root cause; not a rewrite.
|
||||
- **Coordination / single-writer:** while the build loops are actively developing cc-ci, the server
|
||||
**and** the `recipe-maintainers/cc-ci` repo are in use. Always use a dedicated branch + separate
|
||||
clone; never push `main` or disturb the loops' clones. Recipe deploys are **stateful on the shared
|
||||
Swarm**, so run verification when you have effectively-exclusive use of the host (or serialize), and
|
||||
**tear down whatever you deploy**.
|
||||
|
||||
48
.claude/skills/ci-test-review/verify-pr.sh
Executable file
48
.claude/skills/ci-test-review/verify-pr.sh
Executable file
@ -0,0 +1,48 @@
|
||||
#!/usr/bin/env bash
|
||||
# ci-test-review :: deterministic PR verification on the CI server (NO AI)
|
||||
# -------------------------------------------------------------------------
|
||||
# Verifies a RECIPE PR the same way `!testme` does: deploy + run the FULL suite
|
||||
# COLD against the PR head on cc-ci. The harness decides pass/fail — this script
|
||||
# only runs it and reports the exit code + RUN SUMMARY. AI never judges here.
|
||||
#
|
||||
# RECIPE=ghost REF=my-fix-branch verify-pr.sh # 1 cold green = working
|
||||
# RECIPE=lasuite-drive REF=sha REPEAT=3 verify-pr.sh # repeated-green: ONLY
|
||||
# # for a known-flaky recipe
|
||||
#
|
||||
# REF is the recipe PR head (branch name or sha) on the recipe mirror that abra
|
||||
# fetches from. Green iff EVERY repeat exits 0.
|
||||
#
|
||||
# (CI-SERVER PRs are verified differently — check the branch out in a clone on
|
||||
# cc-ci, rebuild, re-run the failing recipe(s) + a regression sample — see
|
||||
# SKILL.md step 5; that path is bespoke and not scripted here.)
|
||||
set -o errexit -o nounset -o pipefail
|
||||
|
||||
SSH="${SSH:-cc-ci}"
|
||||
REPEAT="${REPEAT:-1}"
|
||||
: "${RECIPE:?set RECIPE (e.g. ghost)}"
|
||||
: "${REF:?set REF (the PR head branch or sha)}"
|
||||
|
||||
RUNID="$(date -u +%Y%m%dT%H%M%SZ)"
|
||||
REMOTE_LOG="/root/cc-ci-review-logs/verify-${RECIPE}-${RUNID}"
|
||||
ssh "$SSH" "mkdir -p /root/cc-ci-review-logs"
|
||||
|
||||
echo "verify-pr: RECIPE=$RECIPE REF=$REF cold full-suite x${REPEAT} on ${SSH}" >&2
|
||||
|
||||
green=1
|
||||
for i in $(seq 1 "$REPEAT"); do
|
||||
log="${REMOTE_LOG}.${i}.log"
|
||||
rc=0
|
||||
# Real harness, cold (no --quick), against the PR head — same path as !testme.
|
||||
ssh "$SSH" "cd /root/cc-ci && RECIPE='${RECIPE}' REF='${REF}' cc-ci-run runner/run_recipe_ci.py >'${log}' 2>&1" || rc=$?
|
||||
echo "--- pass ${i}/${REPEAT}: exit ${rc} (log ${SSH}:${log}) ---" >&2
|
||||
ssh "$SSH" "awk '/===== RUN SUMMARY =====/{f=1} f{print}' '${log}'" >&2 || true
|
||||
[ "$rc" = "0" ] || green=0
|
||||
done
|
||||
|
||||
if [ "$green" = "1" ]; then
|
||||
echo "VERDICT: GREEN — ${RECIPE} PR (REF=${REF}) passed cold full-suite x${REPEAT}. Ready for operator merge (NOT merged)." >&2
|
||||
exit 0
|
||||
else
|
||||
echo "VERDICT: RED — ${RECIPE} PR (REF=${REF}) did not pass. Iterate the fix (<=3 attempts) or report needs-work. Do NOT weaken tests; do NOT merge." >&2
|
||||
exit 2
|
||||
fi
|
||||
Reference in New Issue
Block a user