Add cc-ci-upgrader agent: observable one-shot weekly upgrade-run agent

The weekly upgrade run now executes inside a dedicated, remote-control agent (cc-ci-upgrader) — viewable/steerable at claude.ai/code like the Builder — rather than buried in headless cron output. - launch-upgrader.sh: spins up the cc-ci-upgrader tmux session under --remote-control with a kickoff that runs /upgrade-all (DEFAULT mode) to completion. On finish the agent STOPS and stays idle (does NOT self-terminate) so the run + summary stay reviewable in the web UI. `start` = use-or-create: leaves an in-flight (busy) run alone, else clears a finished/idle/wedged session and runs fresh; `fresh` always restarts. UPGRADER_ARGS passes flags (e.g. --dry-run); never --with-tests. - launch.sh: orchestrator_alive() now also skips the cc-ci-upgrader remote-control name, so the upgrader job isn't mistaken for the orchestrator. - upgrade-all skill: documents it runs as the cc-ci-upgrader agent; the weekly cron invokes `launch-upgrader.sh start` (not /upgrade-all inline). - Phase 5: V8a verifies the agent lifecycle (launch → run to completion → stay idle/viewable → next start clears it); V9 stops the verification session. - cron memory: weekly task = launch-upgrader.sh start at 0 3 * * 6 UTC. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:12:47 +01:00
parent 4f74676c72
commit bf71420106
4 changed files with 151 additions and 7 deletions
--- a/.claude/skills/upgrade-all/SKILL.md
+++ b/.claude/skills/upgrade-all/SKILL.md
@ -12,6 +12,14 @@ recipe, and writes one summary of every PR to review. It **never pushes upstream

 Drives cc-ci over `ssh cc-ci`. Logs/summary go to `/srv/cc-ci/.cc-ci-logs/upgrades/`.

+**Runs as the `cc-ci-upgrader` agent.** This skill is normally executed by a dedicated, observable
+**one-shot job agent** — `cc-ci-upgrader` — spun up under remote-control (viewable/steerable at
+claude.ai/code, like the Builder) by `cc-ci-plan/launch-upgrader.sh`. That agent runs this skill to
+completion, then **stops and stays idle** so the run + summary remain reviewable in the web UI (it does
+NOT self-terminate). The weekly cron just invokes `launch-upgrader.sh start`; the next week's run
+clears the idle session and starts fresh. You can also run `/upgrade-all` inline in any `/srv/cc-ci`
+session, but the agent is the intended path so the weekly run isn't buried in headless output.
+
 ## Arguments (optional `$ARGUMENTS`)
 - A space-separated list of recipe names → only those (else all enrolled recipes).
 - `--dry-run` → survey + print what WOULD upgrade; spawn nothing.
@ -104,16 +112,18 @@ End with the report path and a reminder that **nothing was merged**.
 - **Never merges**; failures/ skips are surfaced and retried next week — safe to re-run anytime.

 ## Cron
-Designed for a weekly Claude Code scheduled task that invokes `/upgrade-all` in `/srv/cc-ci`.
+Designed for a weekly Claude Code scheduled task that runs **`cc-ci-plan/launch-upgrader.sh start`** —
+which spins up the `cc-ci-upgrader` remote-control agent to run this skill to completion (the agent
+then stays idle/viewable). The cron does NOT invoke `/upgrade-all` inline.

 **Agreed schedule:** **Saturday 03:00 UTC** (`0 3 * * 6`) — low-traffic weekend window, PRs waiting by
 Monday.

 **Activation trigger (operator, 2026-05-29):** do NOT activate while the build loops are still
 constructing cc-ci — it would contend with them for the shared host. **Activate this weekly cron only
-once the cc-ci build is complete (loops finished / cc-ci stable.)** Until then it's run manually /
-on-demand. When activating, create a scheduled task that runs `/upgrade-all` in `/srv/cc-ci` at
-`0 3 * * 6` UTC.
+once the cc-ci build is complete (loops finished / cc-ci stable.)** Until then run it manually /
+on-demand via `launch-upgrader.sh start` (or `fresh`). When activating, create a scheduled task that
+runs `/srv/cc-ci/cc-ci-plan/launch-upgrader.sh start` at `0 3 * * 6` UTC.

 Re-running is idempotent: already-current recipes report `SKIPPED — up-to-date`; recipes with an open
 PR for the same branch report the existing PR rather than duplicating it.
--- a/cc-ci-plan/launch-upgrader.sh
+++ b/cc-ci-plan/launch-upgrader.sh
@ -0,0 +1,125 @@
+#!/usr/bin/env bash
+#
+# launch-upgrader.sh — spin up the cc-ci UPGRADER agent in tmux under remote-control.
+#
+# The Upgrader is a ONE-SHOT job agent (not a perpetual loop like the Builder/Adversary): it runs the
+# weekly recipe-upgrade sequence — the /upgrade-all skill in DEFAULT mode — to completion, then STOPS
+# and stays idle (it does NOT self-terminate) so the run + summary remain viewable/steerable at
+# claude.ai/code exactly like the Builder, instead of being buried in headless cron output. The next
+# weekly run starts a fresh session: `start` leaves an in-flight run alone but clears a finished/idle
+# (or wedged) session and starts clean. The weekly cron (Sat 03:00 UTC, once cc-ci is built — see
+# [[cc-ci-upgrade-all-cron]]) invokes `launch-upgrader.sh start`.
+#
+# Naming: tmux session AND remote-control name are both "cc-ci-upgrader" (matching
+# cc-ci-builder / cc-ci-adv / cc-ci-watchdog / cc-ci-orchestrator).
+#
+# Usage:
+#   ./launch-upgrader.sh start          # use-or-create: if a run is actively in flight leave it,
+#                                        #   else (no session / idle-stale) kill any stale + start fresh
+#   ./launch-upgrader.sh fresh          # always kill any existing + start a fresh run
+#   ./launch-upgrader.sh status | attach | stop
+#
+# Env:
+#   UPGRADER_ARGS=""   passthrough args to /upgrade-all (e.g. "--dry-run", "ghost n8n"); default none
+#                      = full default fleet run. NEVER pass --with-tests here (the cron must not
+#                      auto-edit tests; that's the operator's per-recipe opt-in).
+set -euo pipefail
+
+SESSION="${UPGRADER_SESSION:-cc-ci-upgrader}"        # tmux session name == remote-control name
+WORKDIR="${UPGRADER_DIR:-/srv/cc-ci}"                # cwd: where .claude/skills/ + .testenv live
+CLAUDE_BIN="${CLAUDE_BIN:-claude}"
+CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
+REMOTE_CONTROL="${REMOTE_CONTROL:-1}"                # 1 => --remote-control (viewable at claude.ai/code)
+LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
+UPGRADER_ARGS="${UPGRADER_ARGS:-}"
+
+log()  { printf '[upgrader %(%H:%M:%S)T] %s\n' -1 "$*"; }
+die()  { log "ERROR: $*"; exit 1; }
+session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
+# "actively working" = the TUI shows the interrupt hint (a turn in flight). Absent => idle/finished/wedged.
+session_busy() { tmux capture-pane -pt "$SESSION" 2>/dev/null | grep -q 'esc to interrupt'; }
+
+preflight() {
+  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
+  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
+  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
+  [[ -d "$WORKDIR/.claude/skills/upgrade-all" ]] || die "upgrade-all skill not found under $WORKDIR/.claude/skills"
+  mkdir -p "$LOG_DIR"
+}
+
+write_kickoff() {
+  local kf="$LOG_DIR/.kickoff-$SESSION.txt"
+  cat > "$kf" <<KICK
+*** cc-ci UPGRADER — weekly recipe-upgrade job ***
+You are the cc-ci Upgrader: a ONE-SHOT job agent, NOT a perpetual loop. Run the recipe-upgrade
+sequence to completion, then STOP. Your cwd is ${WORKDIR}; reach the CI server with \`ssh cc-ci\`;
+creds are in ${WORKDIR}/.testenv; the skills live in ${WORKDIR}/.claude/skills/.
+
+DO THIS:
+1. Invoke the **/upgrade-all** skill in DEFAULT mode${UPGRADER_ARGS:+ with arguments: ${UPGRADER_ARGS}}
+   (read ${WORKDIR}/.claude/skills/upgrade-all/SKILL.md for the full procedure). It surveys every
+   enrolled recipe and, for each upgradeable one, runs /recipe-upgrade in DEFAULT mode — recipe PR
+   only, verified by posting \`!testme\` on the PR (results visible in the PR, iterate up to 3x). A
+   genuinely stale test gets an explanatory PR COMMENT, never a test edit.
+2. Process recipes via per-recipe SUBAGENTS (as the skill specifies) so your own context stays light.
+   If your context usage climbs (~80%), run /compact before continuing.
+3. Write + push the weekly summary (the PR list is the actionable output for the operator).
+4. WHEN THE RUN IS COMPLETE: STOP. Print the final summary (lead with the PR list) and an
+   \`UPGRADE RUN COMPLETE\` line, then go idle. Do NOT loop, do NOT re-run, and do NOT kill your own
+   session — leave it up so the operator can review your output + the summary in the web UI
+   (claude.ai/code). Next week's run starts a fresh session (the launcher clears this idle one).
+
+GUARDRAILS: NEVER merge any PR. NEVER weaken a test. DEFAULT mode only — do NOT pass --with-tests
+(updating cc-ci tests is the operator's per-recipe opt-in). Single-writer: dedicated branches +
+separate clones, never push main, never touch the build loops' /cc-ci /cc-ci-adv clones. The shared
+Swarm is stateful — go sequentially and tear down what you deploy.
+KICK
+  echo "$kf"
+}
+
+start() {
+  local mode="${1:-use-or-create}"
+  preflight
+  if session_alive; then
+    if [[ "$mode" == "use-or-create" ]] && session_busy; then
+      log "$SESSION already running a job (busy) — leaving it"; return 0
+    fi
+    log "$SESSION exists but idle/stale (or fresh requested) — killing it first"
+    tmux kill-session -t "$SESSION" 2>/dev/null || true; sleep 1
+  fi
+  local kf rc=""
+  kf="$(write_kickoff)"
+  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
+  log "starting $SESSION (cwd=$WORKDIR, rc=$REMOTE_CONTROL, args='${UPGRADER_ARGS:-<none>}')"
+  tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
+    "$CLAUDE_BIN $rc $CLAUDE_FLAGS \"\$(cat '$kf')\""
+  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
+  log "started. status: $0 status | attach: tmux attach -t $SESSION | log: $LOG_DIR/$SESSION.log"
+}
+
+case "${1:-start}" in
+  start)  start use-or-create ;;
+  fresh)  start fresh ;;
+  stop)   if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi ;;
+  status)
+    if session_alive; then
+      log "$SESSION: RUNNING $(session_busy && echo '(busy)' || echo '(idle/finishing)')"
+      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
+    else log "$SESSION: stopped"; fi ;;
+  attach) exec tmux attach -t "$SESSION" ;;
+  *)
+    cat <<EOF
+cc-ci upgrader launcher — one-shot weekly recipe-upgrade job agent (remote-control)
+
+  $0 start    use-or-create: leave an in-flight run alone, else (re)start fresh (DEFAULT; what the cron calls)
+  $0 fresh    always kill any existing + start a fresh run
+  $0 status   show tmux + remote-control state
+  $0 attach   tmux attach to the session
+  $0 stop     kill the session
+
+Env: SESSION=$SESSION  WORKDIR=$WORKDIR  REMOTE_CONTROL=$REMOTE_CONTROL  UPGRADER_ARGS='${UPGRADER_ARGS:-<none>}'
+The agent runs /upgrade-all (DEFAULT mode) to completion, then STOPS and stays idle (viewable in the
+web UI). It does NOT self-terminate; the next weekly `start` clears the idle session and runs fresh.
+EOF
+    ;;
+esac
--- a/cc-ci-plan/launch.sh
+++ b/cc-ci-plan/launch.sh
@ -247,8 +247,9 @@ orchestrator_alive() {
  local pid args
  for pid in $(pgrep -x claude 2>/dev/null); do
    args="$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null || true)"
-    # skip the two loops (matched by their remote-control session NAME, not a stray path mention)
-    printf '%s' "$args" | grep -qE -- "--remote-control +'?cc-ci-(builder|adv)'?" && continue
+    # skip the loops + the one-shot upgrader job (matched by remote-control session NAME, not a
+    # stray path mention) — none of these is the orchestrator.
+    printf '%s' "$args" | grep -qE -- "--remote-control +'?cc-ci-(builder|adv|upgrader)'?" && continue
    return 0   # a non-loop claude process => orchestrator (or operator) is alive
  done
  tmux has-session -t "$ORCH_SESSION" 2>/dev/null && return 0
--- a/cc-ci-plan/plan-phase5-verify-upgrade-flow.md
+++ b/cc-ci-plan/plan-phase5-verify-upgrade-flow.md
@ -60,8 +60,16 @@ and **close every PR opened during verification afterward** — do not pollute r
      surfaces "tests look stale" PRs in its own summary section, runs **sequentially with teardown**
      between recipes, and the summary leads with the PR list. Confirms the weekly cron never
      auto-edits tests.
+- [ ] **V8a — the `cc-ci-upgrader` agent.** `cc-ci-plan/launch-upgrader.sh start` spins up a
+      remote-control `cc-ci-upgrader` session (viewable at claude.ai/code) that runs `/upgrade-all`
+      (DEFAULT) **to completion, then STOPS and stays idle** (does NOT self-terminate) with the
+      summary visible. A second `start` while a run is **in flight (busy)** leaves it alone; a `start`
+      against a **finished/idle (or wedged)** session **kills it and runs fresh**. (Use
+      `UPGRADER_ARGS=--dry-run` for a cheap check, then a real small-set run.) This is exactly what the
+      weekly cron invokes — verify the cron-equivalent path end-to-end.
 - [ ] **V9 — cleanup.** Every PR opened during verification (recipe + any cc-ci test PR) is **closed**
-      and any sandbox deploy is **torn down**; the box is left clean.
+      and any sandbox deploy is **torn down**; the verification `cc-ci-upgrader` session is stopped
+      (`launch-upgrader.sh stop`); the box is left clean.

 ## 2. Method / notes
 - **Sandbox first.** Prefer `custom-html-tiny` / a throwaway recipe for V3–V8 so real recipe mirrors