upgrade-all: simplify to a rolling pool, alphabetical (drop waves + heavy/light)

Per operator: just work through recipes alphabetically keeping CAP (= DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment one finishes (rolling pool via run_in_background). Removes the wave-barrier and the heavy/light classification entirely — simpler and no slot ever idles.
2026-06-12 01:58:22 +00:00
parent 894d829313
commit 2c5e08f78c
1 changed files with 19 additions and 29 deletions
--- a/.claude/skills/upgrade-all/SKILL.md
+++ b/.claude/skills/upgrade-all/SKILL.md
@ -1,6 +1,6 @@
 ---
 name: upgrade-all
-description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Concurrency-bounded by default — runs up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe subagents at a time; --sequential for one-at-a-time, --capacity N to override, --parallel to fan out all, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all.
+description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Rolling pool by default — works through recipes ALPHABETICALLY keeping DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) subagents running at once, starting the next as each finishes; --sequential for one-at-a-time, --capacity N to override the pool size, --parallel to start all at once, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all.
 ---

 # upgrade-all
@ -23,8 +23,9 @@ session, but the agent is the intended path so the weekly run isn't buried in he
 ## Arguments (optional `$ARGUMENTS`)
 - A space-separated list of recipe names → only those (else all enrolled recipes).
 - `--dry-run` → survey + print what WOULD upgrade; spawn nothing.
- **Default = concurrency-bounded:** run up to **`DRONE_RUNNER_CAPACITY`** recipe subagents at a
-  time (the drone runner's slots — currently `2`). The 2026-06-10 concurrency restructure
+- **Default = rolling pool, alphabetical:** work through the recipes in alphabetical order keeping
+  **`DRONE_RUNNER_CAPACITY`** subagents running at once (the drone runner's slots — currently `2`),
+  starting the next recipe the moment one finishes. The 2026-06-10 concurrency restructure
  (`docs/concurrency.md`) makes concurrent recipe runs SAFE (per-run recipe trees + app-domain
  locks + isolation), and the capacity knob is the operator's resource-tuned ceiling — so matching
  the subagent pool to it uses all available concurrency without oversubscribing.
@ -126,7 +127,7 @@ CAP=${CAP:-2}   # fallback to the documented default if the query fails
 ```
 `--sequential` → `CAP=1`; `--capacity N` → `CAP=N`; `--parallel` → `CAP=∞` (all at once).
 Then print a table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade(s) — plus
-the mode (`Concurrency-bounded (N=<CAP>)` default / `Sequential` / `Parallel`). If `--dry-run`,
+the mode (`Rolling pool (N=<CAP>, alphabetical)` default / `Sequential` / `Parallel`). If `--dry-run`,
 **stop here**.

 ## 3. Upgrade each recipe via a subagent
@ -147,31 +148,20 @@ test change deserves a human decision. So the cron opens recipe PRs only; where
 test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe
 with `/recipe-upgrade <recipe> --with-tests` to also get a verified test-update PR.

- **Concurrency-bounded (default), `CAP` at a time:** process `RECIPES_TO_UPGRADE` in waves of
-  `CAP`. For each wave, emit `CAP` Agent calls **in one message** (they run concurrently — no
-  `run_in_background`, you need their `RESULT:` lines); wait for all `CAP` to return; collect their
-  results; then start the next wave. This keeps the drone runner's `CAP` slots busy without
-  oversubscribing. `CAP=1` degenerates to one-at-a-time (sequential). An Agent tool-call error →
-  record `FAILED — agent tool error` and continue; one failure never aborts the run or its wave.
- **Parallel (`--parallel`):** `CAP=∞` — emit ALL Agent calls in one message. Heaviest load;
-  failures still isolated per recipe.
-
-**Wave composition — alternate heavy and light (do NOT just go heaviest-first).** The binding limit
-is host memory, so never put two HEAVY recipes in the same wave: pair each heavy with a light one so
-peak memory stays bounded while both slots stay busy. Classify from the survey's image set:
- **HEAVY** (multi-GB images / many services / slow deploy): discourse, immich, matrix-synapse,
-  lasuite-drive, mattermost-lts, ghost.
- **LIGHT / moderate** (one small image or a few light services): custom-html*, cryptpad, n8n,
-  uptime-kuma, lasuite-docs, lasuite-meet, keycloak, mailu, plausible.
-Build the wave order by drawing one HEAVY + one LIGHT per wave (capacity 2) until a bucket empties,
-then pair up the remainder. **Always fill all `CAP` slots** — once only heavies are left, run two (or
-`CAP`) heavies per wave rather than leaving a slot idle; the box is tuned for `CAP` concurrent builds.
-The heavy/light alternation is only to *spread* heavies across waves while a light is available, not a
-hard cap. For `CAP>2`, spread heavies across waves where possible but still fill the slots.
-
-Note (wave barrier): a wave waits for its slowest recipe before the next starts, so the light slot
-idles while its heavy wave-mate finishes — accepted for the weekly run. Do NOT maintain a rolling
-pool with `run_in_background` (you'd lose the `RESULT:` lines).
+**Rolling pool (default), `CAP` running at once, recipes in ALPHABETICAL order.** Keep exactly `CAP`
+recipe subagents in flight at all times — as soon as one finishes, immediately start the next
+alphabetical recipe. No waves, no heavy/light classification — just two (=`CAP`) always running.
+How:
+1. Sort `RECIPES_TO_UPGRADE` alphabetically.
+2. Start the first `CAP` as background Agents (`run_in_background: true`) — you are notified when
+   each completes and can read its final output for the `RESULT:` line.
+3. On each completion: read that agent's `RESULT:` and record it; if recipes remain, immediately
+   start the next alphabetical one as a background Agent (keeping `CAP` in flight).
+4. Repeat until every recipe has been started AND every subagent has completed, then go to §4.
+`CAP=1` → strictly one-at-a-time (sequential). An Agent that dies/errors → record `FAILED — agent
+tool error` and still start its replacement; one failure never aborts the run.
+- **Parallel (`--parallel`):** no pool cap — start ALL recipes as background Agents at once. Heaviest
+  load (every step-2b chaos deploy concurrent); only when the box is otherwise idle.

 ## 4. Collect results
 Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode