upgrade-all: simplify to a rolling pool, alphabetical (drop waves + heavy/light)

Per operator: just work through recipes alphabetically keeping CAP (=
DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment
one finishes (rolling pool via run_in_background). Removes the wave-barrier and
the heavy/light classification entirely — simpler and no slot ever idles.
This commit is contained in:
autonomic-bot
2026-06-12 01:58:22 +00:00
parent 894d829313
commit 2c5e08f78c

View File

@ -1,6 +1,6 @@
---
name: upgrade-all
description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Concurrency-bounded by default — runs up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe subagents at a time; --sequential for one-at-a-time, --capacity N to override, --parallel to fan out all, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all.
description: Weekly autonomous upgrade run for the cc-ci CI server. Surveys every enrolled recipe for available upstream upgrades, then runs /recipe-upgrade on each upgradeable one via a subagent — plan, implement, verify green on cc-ci, open a recipe PR (and, only if a cc-ci test went stale, a verified cc-ci test PR). Collects results into one summary listing every PR to review. Rolling pool by default — works through recipes ALPHABETICALLY keeping DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) subagents running at once, starting the next as each finishes; --sequential for one-at-a-time, --capacity N to override the pool size, --parallel to start all at once, --dry-run to preview. NEVER merges. Built to run once weekly on a cron. Invoke as /upgrade-all.
---
# upgrade-all
@ -23,8 +23,9 @@ session, but the agent is the intended path so the weekly run isn't buried in he
## Arguments (optional `$ARGUMENTS`)
- A space-separated list of recipe names → only those (else all enrolled recipes).
- `--dry-run` → survey + print what WOULD upgrade; spawn nothing.
- **Default = concurrency-bounded:** run up to **`DRONE_RUNNER_CAPACITY`** recipe subagents at a
time (the drone runner's slots — currently `2`). The 2026-06-10 concurrency restructure
- **Default = rolling pool, alphabetical:** work through the recipes in alphabetical order keeping
**`DRONE_RUNNER_CAPACITY`** subagents running at once (the drone runner's slots — currently `2`),
starting the next recipe the moment one finishes. The 2026-06-10 concurrency restructure
(`docs/concurrency.md`) makes concurrent recipe runs SAFE (per-run recipe trees + app-domain
locks + isolation), and the capacity knob is the operator's resource-tuned ceiling — so matching
the subagent pool to it uses all available concurrency without oversubscribing.
@ -126,7 +127,7 @@ CAP=${CAP:-2} # fallback to the documented default if the query fails
```
`--sequential``CAP=1`; `--capacity N``CAP=N`; `--parallel``CAP=∞` (all at once).
Then print a table — Recipe | Status (will upgrade / skipped:reason) | Available upgrade(s) — plus
the mode (`Concurrency-bounded (N=<CAP>)` default / `Sequential` / `Parallel`). If `--dry-run`,
the mode (`Rolling pool (N=<CAP>, alphabetical)` default / `Sequential` / `Parallel`). If `--dry-run`,
**stop here**.
## 3. Upgrade each recipe via a subagent
@ -147,31 +148,20 @@ test change deserves a human decision. So the cron opens recipe PRs only; where
test looks stale, the operator sees the explanation in the PR comment and can re-run that one recipe
with `/recipe-upgrade <recipe> --with-tests` to also get a verified test-update PR.
- **Concurrency-bounded (default), `CAP` at a time:** process `RECIPES_TO_UPGRADE` in waves of
`CAP`. For each wave, emit `CAP` Agent calls **in one message** (they run concurrently — no
`run_in_background`, you need their `RESULT:` lines); wait for all `CAP` to return; collect their
results; then start the next wave. This keeps the drone runner's `CAP` slots busy without
oversubscribing. `CAP=1` degenerates to one-at-a-time (sequential). An Agent tool-call error →
record `FAILED — agent tool error` and continue; one failure never aborts the run or its wave.
- **Parallel (`--parallel`):** `CAP=∞` — emit ALL Agent calls in one message. Heaviest load;
failures still isolated per recipe.
**Wave composition — alternate heavy and light (do NOT just go heaviest-first).** The binding limit
is host memory, so never put two HEAVY recipes in the same wave: pair each heavy with a light one so
peak memory stays bounded while both slots stay busy. Classify from the survey's image set:
- **HEAVY** (multi-GB images / many services / slow deploy): discourse, immich, matrix-synapse,
lasuite-drive, mattermost-lts, ghost.
- **LIGHT / moderate** (one small image or a few light services): custom-html*, cryptpad, n8n,
uptime-kuma, lasuite-docs, lasuite-meet, keycloak, mailu, plausible.
Build the wave order by drawing one HEAVY + one LIGHT per wave (capacity 2) until a bucket empties,
then pair up the remainder. **Always fill all `CAP` slots** — once only heavies are left, run two (or
`CAP`) heavies per wave rather than leaving a slot idle; the box is tuned for `CAP` concurrent builds.
The heavy/light alternation is only to *spread* heavies across waves while a light is available, not a
hard cap. For `CAP>2`, spread heavies across waves where possible but still fill the slots.
Note (wave barrier): a wave waits for its slowest recipe before the next starts, so the light slot
idles while its heavy wave-mate finishes — accepted for the weekly run. Do NOT maintain a rolling
pool with `run_in_background` (you'd lose the `RESULT:` lines).
**Rolling pool (default), `CAP` running at once, recipes in ALPHABETICAL order.** Keep exactly `CAP`
recipe subagents in flight at all times — as soon as one finishes, immediately start the next
alphabetical recipe. No waves, no heavy/light classification — just two (=`CAP`) always running.
How:
1. Sort `RECIPES_TO_UPGRADE` alphabetically.
2. Start the first `CAP` as background Agents (`run_in_background: true`) — you are notified when
each completes and can read its final output for the `RESULT:` line.
3. On each completion: read that agent's `RESULT:` and record it; if recipes remain, immediately
start the next alphabetical one as a background Agent (keeping `CAP` in flight).
4. Repeat until every recipe has been started AND every subagent has completed, then go to §4.
`CAP=1` → strictly one-at-a-time (sequential). An Agent that dies/errors → record `FAILED — agent
tool error` and still start its replacement; one failure never aborts the run.
- **Parallel (`--parallel`):** no pool cap — start ALL recipes as background Agents at once. Heaviest
load (every step-2b chaos deploy concurrent); only when the box is otherwise idle.
## 4. Collect results
Parse each final `RESULT:` line into SUCCESS / SUCCESS-PENDING-TESTS / FAILED / SKIPPED (default mode