diff --git a/.claude/skills/upgrade-all/SKILL.md b/.claude/skills/upgrade-all/SKILL.md index db09e1f..f0a76cd 100644 --- a/.claude/skills/upgrade-all/SKILL.md +++ b/.claude/skills/upgrade-all/SKILL.md @@ -1,6 +1,6 @@ --- name: upgrade-all -description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Concurrency-bounded by default — runs up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe subagents at a time; --sequential for one-at-a-time, --capacity N to override, --parallel to fan out all, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all. +description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Rolling pool by default — works through recipes ALPHABETICALLY keeping DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) subagents running at once, starting the next as each finishes; --sequential for one-at-a-time, --capacity N to override the pool size, --parallel to start all at once, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all. --- # upgrade-all @@ -23,8 +23,9 @@ session, but the agent is the intended path so the weekly run isn't buried in he ## Arguments (optional `$ARGUMENTS`) - A space-separated list of recipe names → only those (else all enrolled recipes). - `--dry-run` → survey + print what WOULD upgrade; spawn nothing. -- **Default = concurrency-bounded:** run up to **`DRONE_RUNNER_CAPACITY`** recipe subagents at a - time (the drone runner's slots — currently `2`). The 2026-06-10 concurrency restructure +- **Default = rolling pool, alphabetical:** work through the recipes in alphabetical order keeping + **`DRONE_RUNNER_CAPACITY`** subagents running at once (the drone runner's slots — currently `2`), + starting the next recipe the moment one finishes. The 2026-06-10 concurrency restructure (`docs/concurrency.md`) makes concurrent recipe runs SAFE (per-run recipe trees + app-domain locks + isolation), and the capacity knob is the operator's resource-tuned ceiling — so matching the subagent pool to it uses all available concurrency without oversubscribing. @@ -126,7 +127,7 @@ CAP=${CAP:-2} # fallback to the documented default if the query fails ``` `--sequential` → `CAP=1`; `--capacity N` → `CAP=N`; `--parallel` → `CAP=∞` (all at once). Then print a table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade(s) — plus -the mode (`Concurrency-bounded (N=)` default / `Sequential` / `Parallel`). If `--dry-run`, +the mode (`Rolling pool (N=, alphabetical)` default / `Sequential` / `Parallel`). If `--dry-run`, **stop here**. ## 3. Upgrade each recipe via a subagent @@ -147,31 +148,20 @@ test change deserves a human decision. So the cron opens recipe PRs only; where test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe with `/recipe-upgrade --with-tests` to also get a verified test-update PR. -- **Concurrency-bounded (default), `CAP` at a time:** process `RECIPES_TO_UPGRADE` in waves of - `CAP`. For each wave, emit `CAP` Agent calls **in one message** (they run concurrently — no - `run_in_background`, you need their `RESULT:` lines); wait for all `CAP` to return; collect their - results; then start the next wave. This keeps the drone runner's `CAP` slots busy without - oversubscribing. `CAP=1` degenerates to one-at-a-time (sequential). An Agent tool-call error → - record `FAILED — agent tool error` and continue; one failure never aborts the run or its wave. -- **Parallel (`--parallel`):** `CAP=∞` — emit ALL Agent calls in one message. Heaviest load; - failures still isolated per recipe. - -**Wave composition — alternate heavy and light (do NOT just go heaviest-first).** The binding limit -is host memory, so never put two HEAVY recipes in the same wave: pair each heavy with a light one so -peak memory stays bounded while both slots stay busy. Classify from the survey's image set: -- **HEAVY** (multi-GB images / many services / slow deploy): discourse, immich, matrix-synapse, - lasuite-drive, mattermost-lts, ghost. -- **LIGHT / moderate** (one small image or a few light services): custom-html*, cryptpad, n8n, - uptime-kuma, lasuite-docs, lasuite-meet, keycloak, mailu, plausible. -Build the wave order by drawing one HEAVY + one LIGHT per wave (capacity 2) until a bucket empties, -then pair up the remainder. **Always fill all `CAP` slots** — once only heavies are left, run two (or -`CAP`) heavies per wave rather than leaving a slot idle; the box is tuned for `CAP` concurrent builds. -The heavy/light alternation is only to *spread* heavies across waves while a light is available, not a -hard cap. For `CAP>2`, spread heavies across waves where possible but still fill the slots. - -Note (wave barrier): a wave waits for its slowest recipe before the next starts, so the light slot -idles while its heavy wave-mate finishes — accepted for the weekly run. Do NOT maintain a rolling -pool with `run_in_background` (you'd lose the `RESULT:` lines). +**Rolling pool (default), `CAP` running at once, recipes in ALPHABETICAL order.** Keep exactly `CAP` +recipe subagents in flight at all times — as soon as one finishes, immediately start the next +alphabetical recipe. No waves, no heavy/light classification — just two (=`CAP`) always running. +How: +1. Sort `RECIPES_TO_UPGRADE` alphabetically. +2. Start the first `CAP` as background Agents (`run_in_background: true`) — you are notified when + each completes and can read its final output for the `RESULT:` line. +3. On each completion: read that agent's `RESULT:` and record it; if recipes remain, immediately + start the next alphabetical one as a background Agent (keeping `CAP` in flight). +4. Repeat until every recipe has been started AND every subagent has completed, then go to §4. +`CAP=1` → strictly one-at-a-time (sequential). An Agent that dies/errors → record `FAILED — agent +tool error` and still start its replacement; one failure never aborts the run. +- **Parallel (`--parallel`):** no pool cap — start ALL recipes as background Agents at once. Heaviest + load (every step-2b chaos deploy concurrent); only when the box is otherwise idle. ## 4. Collect results Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode