diff --git a/.claude/skills/recipe-upgrade/SKILL.md b/.claude/skills/recipe-upgrade/SKILL.md index 4a52be8..2684ba9 100644 --- a/.claude/skills/recipe-upgrade/SKILL.md +++ b/.claude/skills/recipe-upgrade/SKILL.md @@ -121,6 +121,38 @@ On cc-ci's `~/.abra/recipes/` (wrap every abra call per the pseudo-TTY b - Commit on a branch: `git commit -m "chore: upgrade to "` (the commit message drives the PR branch name `upgrade-`). Do **not** push to upstream; do **not** tag-push. +### 2b. Direct deploy + inspect on cc-ci — live feedback BEFORE CI (recipe-maintainer style) +Before opening the PR / running `!testme`, deploy the WIP recipe **directly** on the cc-ci server and +watch it converge — the way recipe-maintainer tests on `cctest`. This gives you **live logs + +container state** to actually SEE what the upgrade does (a failed migration, a crash-loop, a config +gap), instead of only a pass/fail CI card. It is **ADDITIONAL to — not a replacement for** — the +`!testme` verification + its 3-attempt budget (steps 4–5); it just front-loads the diagnosis so you +arrive at CI already knowing the upgrade converges, and waste fewer of those 3 runs on basics. + +Deploy the recipe head you just edited with **`--chaos`** (deploys the local WIP without a clean +published commit), under a dedicated `dev-` domain on the local swarm (`--server default`). +Apply the usual gotchas (pseudo-TTY wrap every abra call; creds baked into the mirror origin; stash +the untracked overlay). `abra app new` generates the `.env` + secrets: +``` +ssh cc-ci 'export PATH=/run/current-system/sw/bin:$PATH; set -a; . /srv/cc-ci/.testenv; set +a; \ + R=; D=dev-$R.ci.commoninternet.net; \ + script -qec "abra app new $R --server default --domain $D -n" /dev/null 2>&1 | tail -3; \ + script -qec "abra app deploy $D --chaos -n" /dev/null 2>&1 | tail -20' +``` +- **Inspect logs directly** — prefer `docker service logs` over ssh (more reliable than `abra app + logs` non-interactively, per recipe-maintainer `references/recipe-maintainer/learnings.md`): + `ssh cc-ci 'docker stack services ${D//./_}; docker service logs --tail 200 ${D//./_}_app'` + (swap `_app` for the failing service; pseudo-TTY wrap any `abra app logs`/`abra app cmd`). +- **Iterate:** edit the recipe → `abra app deploy $D --chaos --force` to cycle → re-read logs, until it + converges, serves, and any migration/config behaves. This is where you debug the real upgrade with + full visibility — fold what you learn back into the recipe edit (step 2). +- **Tear down when done — ALWAYS** (shared swarm): `script -qec "abra app undeploy $D -n" /dev/null` + then `script -qec "abra app rm $D -n --no-input" /dev/null` (removes volumes/secrets). The + `/upgrade-all` orphan-sweep (Step 0) is the backstop, but clean up explicitly. +- Caveats: shared swarm — keep to **ONE** `dev-` instance at a time and tear it down before the + next recipe; the `dev-` domain is distinct from the harness's per-run domains and from the + `warm-*` canonicals, so the sweep removes a leaked one without touching live services. + ### 3. Open the recipe PR (never merge) ``` set -a; . /srv/cc-ci/.testenv; set +a diff --git a/.claude/skills/upgrade-all/SKILL.md b/.claude/skills/upgrade-all/SKILL.md index 13d8401..a9dedc1 100644 --- a/.claude/skills/upgrade-all/SKILL.md +++ b/.claude/skills/upgrade-all/SKILL.md @@ -109,8 +109,9 @@ For each recipe in `RECIPES_TO_UPGRADE`, spawn an Agent (`subagent_type: "genera description `"Upgrade on cc-ci"`) with a prompt like: > Run the `/recipe-upgrade ` skill end-to-end — **DEFAULT mode, do NOT pass `--with-tests`** -> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, open a recipe PR, and verify -> it by posting `!testme` on the PR (results visible in the PR; iterate ≤3×). Use the **existing +> (`/srv/cc-ci/.claude/skills/recipe-upgrade/SKILL.md`): plan, implement, **directly deploy + inspect +> the WIP on cc-ci for live feedback (step 2b, `--chaos`, tear it down after)**, open a recipe PR, and +> verify it by posting `!testme` on the PR (results visible in the PR; iterate ≤3×). Use the **existing > tests** — if a test fails because it is genuinely stale, **leave an explanatory comment on the PR** > for the operator and do NOT modify any test. Drive cc-ci over `ssh cc-ci`. Do NOT prompt. Do NOT push > upstream. Do NOT merge. Print exactly one `RESULT:` line as your final line (SUCCESS /