refactor: rewrite launchers as Python; add orchestrator JOURNAL.md

Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@"
All logic lives in the Python scripts (pure stdlib, no deps).

launch.py — loops + watchdog:
  Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog,
  handoff signalling, stall detection, heal_session, heal_orchestrator.
  Cleaner structure: config block → helpers → phase/kickoff/agent/healing/
  handoff/watchdog/main. LOOP_BACKEND + LOOP_MODEL switches throughout.

launch-orchestrator.py — orchestrator session:
  claude path: --resume <id> preserved (conversation survives reboots).
  opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients
  the new session; reads JOURNAL.md for context).
  STARTUP_PROMPT updated to reference JOURNAL.md on startup.
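
The two orchestrator paths differ mainly in the command string handed to tmux. Below is a minimal Python sketch of that assembly. `agent_cmd` is a hypothetical helper, not part of this commit. It assumes the default binaries and server URL used by the launchers, and it omits `--remote-control` and the `.testenv` sourcing for brevity. Every value is quoted with `shlex.quote`, so a prompt containing a single quote still reaches the agent as one argument.

```python
import shlex


def agent_cmd(backend, model, session, prompt, resume_id=None,
              claude_bin="claude", claude_flags="--dangerously-skip-permissions",
              opencode_bin="opencode", opencode_server="http://127.0.0.1:4096"):
    """Hypothetical helper: build the shell command tmux would run for one agent."""
    model_flag = f"--model {shlex.quote(model)}" if model else ""
    quoted_prompt = shlex.quote(prompt) if prompt else ""
    if backend == "claude":
        # claude keeps the conversation on disk; --resume reattaches to it by id.
        resume = f"--resume {shlex.quote(resume_id)}" if resume_id else ""
        parts = [claude_bin, resume, model_flag, claude_flags, quoted_prompt]
    elif backend == "opencode":
        # opencode has no --resume: every launch is a new session, named by --title.
        parts = [opencode_bin, model_flag, "run",
                 "--attach", shlex.quote(opencode_server),
                 "--title", shlex.quote(session), quoted_prompt]
    else:
        raise ValueError(f"unknown backend {backend!r}: use 'claude' or 'opencode'")
    # Skip empty optional pieces so the command has no stray double spaces.
    return " ".join(p for p in parts if p)


print(agent_cmd("opencode", "tinfoil/deepseek-v4-pro", "cc-ci-orchestrator-vm",
                "Read JOURNAL.md, then resume."))
# opencode --model tinfoil/deepseek-v4-pro run --attach http://127.0.0.1:4096 --title cc-ci-orchestrator-vm 'Read JOURNAL.md, then resume.'
```

Letting `shlex.quote` handle quoting removes the constraint the old bash launcher documented for its startup prompt ("Must contain NO single quotes").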

launch-upgrader.py — one-shot upgrade job:
  LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND / UPGRADER_MODEL.
  Both claude and opencode paths supported.
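
The precedence rule fits in a small resolver. This sketch uses hypothetical names (`resolve`, `upgrader_settings`). It treats an empty variable as unset, which matches the old bash `${LOOP_MODEL:-${UPGRADER_MODEL:-sonnet}}`. A plain `os.environ.get(name, default)` instead returns `""` for a variable that is exported but empty. The per-backend model default follows the old bash comment ("sonnet for claude"). Leaving the model empty for opencode, so that the server's own default applies, is an assumption.

```python
import os


def resolve(env, primary, fallback, default):
    """Return env[primary], else env[fallback], else default; empty counts as unset."""
    return env.get(primary) or env.get(fallback) or default


def upgrader_settings(env=os.environ):
    """Hypothetical: (backend, model) with LOOP_* taking precedence over UPGRADER_*."""
    backend = resolve(env, "LOOP_BACKEND", "UPGRADER_BACKEND", "claude")
    # "sonnet" is a claude model alias; for opencode, assume no --model flag is wanted.
    default_model = "sonnet" if backend == "claude" else ""
    model = resolve(env, "LOOP_MODEL", "UPGRADER_MODEL", default_model)
    return backend, model
```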

cc-ci-plan/JOURNAL.md — new orchestrator handoff file:
  Persistent across conversation resets. Documents the handoff format and
  carries the current session's summary: migration complete, phase 5 in
  progress (V3/V7 PASS), phase 4 deferred, open items for next session.
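
Finding the most recent `## Session` entry, which the startup routine reads first, is mechanical enough to sketch. The function below is hypothetical and not part of the commit. It rests on three assumptions:

- Session entries are level-2 headings that start with `## Session`.
- An entry runs until the next level-2 heading, so its `### Event` sub-entries stay attached.
- Fenced code is dropped first, so the template inside the `## Format` example is not mistaken for a real entry.

```python
import re

FENCE_RE = re.compile(r"(?ms)^```[^\n]*\n.*?^```[^\n]*$")
H2_SPLIT_RE = re.compile(r"(?m)^(?=## )")


def latest_session(journal_text):
    """Return the last '## Session' block of JOURNAL.md, or None if there is none."""
    text = FENCE_RE.sub("", journal_text)   # ignore the format template in the fence
    blocks = H2_SPLIT_RE.split(text)        # each block starts at a '## ' heading
    sessions = [b for b in blocks if b.startswith("## Session")]
    return sessions[-1].rstrip() if sessions else None
```

Splitting on a zero-width lookahead needs Python 3.7 or newer.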

AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff.
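
The append step (step 5) can be sketched the same way. `session_block` and `append_session` are hypothetical helpers. They render the fields of the documented format (`Left off`, `Phase / loop state`, `Open items`, `Notes`) and append the result to the journal. The heading separator is written as the `\u2014` escape to match the dash in the template's heading.

```python
from datetime import datetime, timezone
from pathlib import Path

HEADING_SEP = "\u2014"  # the dash JOURNAL.md's format puts between time and backend


def session_block(backend, model, left_off, state, open_items, notes, now=None):
    """Render one '## Session' entry in the format JOURNAL.md documents."""
    now = now or datetime.now(timezone.utc)
    # Open items: a bullet list on following lines, or the literal "none".
    items = ("\n" + "\n".join(f"- {i}" for i in open_items)) if open_items else " none"
    return (
        f"## Session {now:%Y-%m-%d %H:%M} UTC {HEADING_SEP} {backend} {model}\n"
        f"**Left off:** {left_off}\n"
        f"**Phase / loop state:** {state}\n"
        f"**Open items:**{items}\n"
        f"**Notes:** {notes}\n"
    )


def append_session(journal_path, block):
    """Append a block to the journal, making sure it starts on a fresh line."""
    path = Path(journal_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    prefix = "\n" if existing and not existing.endswith("\n") else ""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(prefix + block)
```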

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
autonomic-bot
2026-05-31 17:50:09 +00:00
parent e0e5bf6e64
commit bca51071bd
8 changed files with 1067 additions and 781 deletions

@@ -16,18 +16,18 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t
 The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
 watches from outside.
-## On startup: announce yourself + report reboots
+## On startup: read the journal, announce yourself, report reboots
-**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
-the operator wants to know the supervising session is back (especially after a reboot, which kills
-this session). Include the current phase and the reboot count. Steps on startup:
-1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and `cc-ci-plan/launch.sh status`
-(current phase + whether the loops/watchdog are running).
-2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
+**Every time you (the orchestrator) start or resume:**
+1. **Read `cc-ci-plan/JOURNAL.md`** — the most recent `## Session` entry is where the previous
+session left off. This is the persistent handoff record; read it before anything else.
+2. Read `cc-ci-plan/REBOOTS.md` (count entries) and run `cc-ci-plan/launch.sh status`
+(current phase + whether loops/watchdog are running).
+3. **`PushNotification`** (proactive): *"cc-ci orchestrator online — phase X, loops+watchdog
 running; N reboots logged (last <date>)."*
-3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the
-loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
-`RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+4. If loops are down, relaunch: `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+5. **On handoff / end of session:** append a `## Session` block to `JOURNAL.md` summarising
+what happened, current state, and open items (see format in that file).
 Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
 to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops +

cc-ci-plan/JOURNAL.md

@@ -0,0 +1,82 @@
# Orchestrator journal
This file is the **persistent handoff record** for the cc-ci orchestrator. Every orchestrator
session (whether Claude or opencode) reads this on startup and appends to it when handing off or
when something noteworthy happens. It survives conversation resets — it is the memory that
`--resume` can't provide for opencode, and a more readable supplement for Claude sessions.
**On startup:** read this file before doing anything else. The most recent `## Session` entry
is where the previous session left off. Carry that context forward.
**On handoff / end of session:** append a `## Session` block (see format below) summarising
what happened, the current state, and anything the next session needs to know.
**On significant events mid-session:** append a `### Event` sub-entry (no need to wait for
handoff).
---
## Format
```markdown
## Session YYYY-MM-DD HH:MM UTC — <backend> <model>
**Left off:** <one sentence what was the last thing done>
**Phase / loop state:** <phase X [N/11], loops RUNNING/stopped, cc-ci healthy/issue>
**Open items:** <bullet list of anything the next session needs to act on, or "none">
**Notes:** <anything surprising, a decision made, a known blocker, etc.>
### Event HH:MM — <short label>
<brief note>
```
---
## Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6
**Left off:** Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public
`168.119.126.100`, tailnet `cc-ci-orchestrator-1` @ `100.84.190.30`). The old Incus VM
(`100.116.55.106`) is still on the tailnet — cold standby, not yet deleted.
**Phase / loop state:** Phases 1c-1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11]
(upgrade-flow verify) in progress — loops running, actively verifying the `!testme`
end-to-end flow on the new Hetzner cc-ci server.
**Open items:**
- Phase 5 is in progress — loops need to finish V1-V9 and write `## DONE` to STATUS-5.md.
- Phase 4 (final review/polish) was deliberately **skipped** this session — it is queued
at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset.
- Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n +
ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE.
- Old Incus orchestrator VM (`cc-ci-orchestrator`, `100.116.55.106`) is still running —
stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at
`/srv/incus-terraform-nix-vm-creator/terraform-secrets/`.
- DNS: `oc.commoninternet.net` A record → `100.84.190.30` still needs adding (operator step).
**Notes:**
- `cc-ci-loops.service` is **enabled** and wired with `reboot-log.sh` ExecStartPre — a reboot
is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1.
- The cc-ci **server** also moved to Hetzner (server 134485294, `ssh cc-ci`,
`100.95.31.88`). It has authenticated Docker Hub pulls and 150 GB disk — the old OOM /
disk-starvation / rate-limit issues are gone.
- All recipe mirrors currently reconcile correctly; no stale open PRs observed.
- `opencode` v1.15.13 installed at `/home/loops/.local/bin/opencode`. Tinfoil API key is in
`.testenv` as `TINFOIL_API_KEY`. Backend switch: `LOOP_BACKEND=opencode
LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
- Launcher scripts rewritten to Python (`launch.py`, `launch-orchestrator.py`,
`launch-upgrader.py`); bash wrappers are now one-liners that `exec python3 <script> "$@"`.
### Event 03:13 — migrated from old Incus VM to Hetzner
Loops were started manually during staging (not by the service); first systemd-managed
boot was later this session. `cc-ci-loops.service` now enabled.
### Event 05:23 — phase 3 (results-UX) completed
All R1-R8 Adversary-verified, no VETO. Watchdog auto-advanced to phase 4.
### Event 13:22 — phase 4 paused, jumped to phase 5
Operator deferred phase 4 (weekly Opus credits exhausted). Phase idx manually set to 10
(phase 5). Loops restarted on Sonnet.
### Event 17:29 — loops stopped pending restart on different model
Operator paused loops to reconfigure backend (opencode/tinfoil exploration). Phase 5
[11/11] was in progress — loops had verified V1/V2/V3/V7 (custom-html-tiny upgrade GREEN).
Phase idx = 10 (phase 5), loops stopped, watchdog stopped.

@@ -0,0 +1,189 @@
#!/usr/bin/env python3
"""
cc-ci orchestrator launcher — start/resume the orchestrator session in tmux.
The orchestrator is the long-lived supervisory session: it watches the Builder/Adversary
loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and owns
the VM-level fallback. It is SEPARATE from the loops that launch.py manages.
Usage:
launch-orchestrator.py start resume the persistent session (default)
launch-orchestrator.py fresh start a NEW session (no --resume)
launch-orchestrator.py stop kill the tmux session (conversation persists on disk)
launch-orchestrator.py status show session state
launch-orchestrator.py attach tmux attach to the session
Env:
LOOP_BACKEND claude (default) | opencode
LOOP_MODEL model flag, e.g. "sonnet" or "tinfoil/deepseek-v4-pro"
claude backend:
CLAUDE_BIN claude
REMOTE_CONTROL 1 (viewable at claude.ai/code)
ORCH_SESSION_ID override the resume id (else read from $ID_FILE)
ORCH_ID_FILE $LOG_DIR/.orchestrator-session-id
ORCH_STARTUP_PROMPT startup nudge injected as the first turn after --resume
opencode backend:
OPENCODE_BIN /home/loops/.local/bin/opencode
OPENCODE_SERVER http://127.0.0.1:4096
(no --resume equivalent; STARTUP_PROMPT is sent as the initial message;
the session title in the web UI is the SESSION name)
"""
import os
import shlex
import shutil
import subprocess
import sys
from datetime import datetime
from pathlib import Path

# ── config ────────────────────────────────────────────────────────────────────
SESSION = os.environ.get("ORCH_SESSION", "cc-ci-orchestrator-vm")
WORKDIR = os.environ.get("ORCH_DIR", "/srv/cc-ci")
LOG_DIR = os.environ.get("LOG_DIR", "/srv/cc-ci/.cc-ci-logs")
# `or` rather than a .get() default, so an exported-but-empty variable falls back like ${VAR:-default}.
BACKEND = os.environ.get("LOOP_BACKEND") or "claude"
LOOP_MODEL = os.environ.get("LOOP_MODEL", "")
# claude-specific
CLAUDE_BIN = os.environ.get("CLAUDE_BIN", "claude")
CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")
REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL", "1") == "1"
DEFAULT_ID = "c746050a-af11-409d-87ba-c05268e2e5d1"
ID_FILE = os.environ.get("ORCH_ID_FILE", f"{LOG_DIR}/.orchestrator-session-id")
# Startup nudge sent as the first turn. An empty ORCH_STARTUP_PROMPT disables it on the
# claude path; the opencode path then falls back to a minimal orienting prompt.
STARTUP_PROMPT = os.environ.get("ORCH_STARTUP_PROMPT", (
    "STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a "
    "reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run "
    "cc-ci-plan/launch.py status, then send a proactive PushNotification that you are online "
    "with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops "
    "+ watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.py start if not). "
    "Also read cc-ci-plan/JOURNAL.md for recent context before resuming supervision."
))
# opencode-specific
OPENCODE_BIN = os.environ.get("OPENCODE_BIN", "/home/loops/.local/bin/opencode")
OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER", "http://127.0.0.1:4096")


# ── helpers ───────────────────────────────────────────────────────────────────
def log(msg):
    ts = datetime.now().strftime("%H:%M:%S")
    print(f"[orchestrator {ts}] {msg}", flush=True)


def die(msg):
    log(f"ERROR: {msg}")
    sys.exit(1)


def session_alive():
    """True if the tmux session already exists."""
    return subprocess.run(
        ["tmux", "has-session", "-t", SESSION], capture_output=True
    ).returncode == 0


def resume_id():
    """$ORCH_SESSION_ID, else the id stored in ID_FILE, else DEFAULT_ID."""
    sid = os.environ.get("ORCH_SESSION_ID")
    if sid:
        return sid
    try:
        return Path(ID_FILE).read_text().strip() or DEFAULT_ID
    except FileNotFoundError:
        return DEFAULT_ID


# ── launch ────────────────────────────────────────────────────────────────────
def start(mode="resume"):
    """Launch the orchestrator in a detached tmux session; mode is "resume" or "fresh"."""
    if not shutil.which("tmux"):
        die("tmux not found")
    if not Path(WORKDIR).is_dir():
        die(f"workdir not found: {WORKDIR}")
    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
    if session_alive():
        log(f"{SESSION} already running — leaving it (use 'stop' first to relaunch)")
        return
    # tmux runs `cmd` through a shell, so interpolated values are shell-quoted; the
    # prompt may therefore contain single quotes (the bash version forbade them).
    model_flag = f"--model {shlex.quote(LOOP_MODEL)}" if LOOP_MODEL else ""
    if BACKEND == "claude":
        if not shutil.which(CLAUDE_BIN):
            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
        if not Path(ID_FILE).exists():
            Path(ID_FILE).write_text(DEFAULT_ID)
        rc = f"--remote-control {shlex.quote(SESSION)}" if REMOTE_CONTROL else ""
        resume = f"--resume {shlex.quote(resume_id())}" if mode == "resume" else ""
        # Positional prompt, not stdin: stdin would force print mode and break remote-control.
        prompt = shlex.quote(STARTUP_PROMPT) if STARTUP_PROMPT else ""
        cmd = f"{CLAUDE_BIN} {resume} {rc} {model_flag} {CLAUDE_FLAGS} {prompt}"
        detail = f"resume={resume_id()}" if mode == "resume" else "fresh"
        log(f"starting {SESSION} (backend=claude, {detail}, model={LOOP_MODEL or 'default'})")
    elif BACKEND == "opencode":
        if not Path(OPENCODE_BIN).exists():
            die(f"opencode not found at {OPENCODE_BIN}")
        # No --resume equivalent in opencode; STARTUP_PROMPT orients the new session.
        # The session title in the web UI identifies it as the orchestrator.
        prompt = STARTUP_PROMPT or (
            "You are the cc-ci orchestrator. Read /srv/cc-ci/AGENTS.md and "
            "cc-ci-plan/JOURNAL.md for context, then resume supervising the loops."
        )
        cmd = (
            "set -a; . /srv/cc-ci/.testenv; set +a; "
            f"{OPENCODE_BIN} {model_flag} run --attach {shlex.quote(OPENCODE_SERVER)} "
            f"--title {shlex.quote(SESSION)} {shlex.quote(prompt)}"
        )
        log(f"starting {SESSION} (backend=opencode, model={LOOP_MODEL or 'default'})")
        log("  visible at http://oc.commoninternet.net (tailnet only)")
    else:
        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION, "-c", WORKDIR, cmd])
    # Mirror everything the pane prints into a per-session log file.
    subprocess.run(["tmux", "pipe-pane", "-o", "-t", SESSION,
                    f"cat >> '{LOG_DIR}/{SESSION}.log'"])
    log(f"started. attach: tmux attach -t {SESSION}")


# ── main ──────────────────────────────────────────────────────────────────────
def main():
    cmd = sys.argv[1] if len(sys.argv) > 1 else "start"
    if cmd == "start":
        start("resume")
    elif cmd == "fresh":
        start("fresh")
    elif cmd == "stop":
        if session_alive():
            log(f"killing {SESSION}")
            subprocess.run(["tmux", "kill-session", "-t", SESSION])
        else:
            log(f"{SESSION} not running")
    elif cmd == "status":
        state = "RUNNING" if session_alive() else "stopped"
        log(f"{SESSION}: {state}")
        # The [r] bracket keeps grep from matching its own command line.
        subprocess.run(
            f"ps -eo pid,etime,args | grep '[r]emote-control {SESSION}' || true",
            shell=True)
        if BACKEND == "claude":
            log(f"resume id: {resume_id()} (file: {ID_FILE})")
        log(f"backend: {BACKEND} model: {LOOP_MODEL or '<default>'}")
    elif cmd == "attach":
        os.execvp("tmux", ["tmux", "attach", "-t", SESSION])
    else:
        backend_note = (
            "claude: --resume preserves conversation across reboots; viewable at claude.ai/code\n"
            " opencode: fresh session each launch (no --resume); viewable at http://oc.commoninternet.net"
        )
        print(f"""cc-ci orchestrator launcher
launch-orchestrator.py start resume the persistent session (default)
launch-orchestrator.py fresh start a new session (no --resume)
launch-orchestrator.py stop kill the tmux session
launch-orchestrator.py status show session state
launch-orchestrator.py attach tmux attach
Backend: {BACKEND} (LOOP_BACKEND env var)
Model: {LOOP_MODEL or '<backend default>'} (LOOP_MODEL env var)
Session: {SESSION} cwd={WORKDIR}
{backend_note}
""")


if __name__ == "__main__":
    main()

@@ -1,118 +1,3 @@
#!/usr/bin/env bash
#
# launch-orchestrator.sh — start/resume the cc-ci ORCHESTRATOR session in tmux under remote-control.
#
# The orchestrator (see /srv/cc-ci/AGENTS.md) is the long-lived SUPERVISORY session: it watches the
# Builder/Adversary loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and
# owns the VM-level fallback. It is SEPARATE from the loops that launch.sh manages — this script only
# brings the orchestrator back (e.g. after a reboot, which kills the tmux server and every session in
# it). The conversation itself survives on disk across exits/reboots; remote-control only stays
# connected while the process is alive, so recovery = relaunch the process and re-attach by --resume.
#
# Naming: tmux session AND remote-control name are both "cc-ci-orchestrator-vm" (the -vm suffix
# distinguishes it from the repo name cc-ci-orchestrator); the loop sessions are cc-ci-builder /
# cc-ci-adv / cc-ci-watchdog.
#
# Usage:
# ./launch-orchestrator.sh start # resume the persistent orchestrator session (DEFAULT)
# ./launch-orchestrator.sh fresh # start a NEW orchestrator session (no --resume)
# ./launch-orchestrator.sh status # show tmux + remote-control state
# ./launch-orchestrator.sh attach # tmux attach to the session (Ctrl-b d to detach)
# ./launch-orchestrator.sh stop # kill the tmux session (conversation persists on disk)
#
# The persistent session id is read from $ID_FILE (seeded on first run with DEFAULT_ID). A Claude
# session keeps the SAME id across --resume, so this stays valid across reboots. To point the script
# at a different session, edit that file or export ORCH_SESSION_ID.
set -euo pipefail
# ----- config -------------------------------------------------------------
SESSION="${ORCH_SESSION:-cc-ci-orchestrator-vm}" # tmux session name == remote-control name
WORKDIR="${ORCH_DIR:-/srv/cc-ci}" # orchestrator cwd (its claude project dir)
CLAUDE_BIN="${CLAUDE_BIN:-claude}"
CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
# REMOTE_CONTROL=1 → --remote-control session, viewable/steerable at claude.ai/code. Needs the box
# logged into the claude.ai account. =0 for a plain local interactive session.
REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
ID_FILE="${ORCH_ID_FILE:-$LOG_DIR/.orchestrator-session-id}"
DEFAULT_ID="c746050a-af11-409d-87ba-c05268e2e5d1" # the orchestrator session as of 2026-05-31 (Hetzner)
# Startup nudge injected as the resumed session's first turn, so an AUTO-launched orchestrator (e.g.
# cc-ci-loops.service ExecStartPost after a reboot) actually RUNS its AGENTS.md startup routine —
# announce itself + report reboots — instead of resuming silently and waiting. Set empty to disable.
# Must contain NO single quotes (it is single-quoted into the tmux command).
STARTUP_PROMPT="${ORCH_STARTUP_PROMPT-STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run cc-ci-plan/launch.sh status, then send a proactive PushNotification that you are online with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops + watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start if not). Then resume supervising.}"
# --------------------------------------------------------------------------
log() { printf '[orchestrator %(%H:%M:%S)T] %s\n' -1 "$*"; }
die() { log "ERROR: $*"; exit 1; }
session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
preflight() {
  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
  mkdir -p "$LOG_DIR"
  [[ -f "$ID_FILE" ]] || echo "$DEFAULT_ID" > "$ID_FILE"
}
resume_id() { echo "${ORCH_SESSION_ID:-$(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")}"; }
# Launch claude in a detached tmux session. $1=resume ("resume"|"fresh").
start() {
  local mode="${1:-resume}"
  preflight
  if session_alive; then
    log "$SESSION already running — leaving it (use '$0 stop' first to relaunch)"
    return 0
  fi
  local rc="" resume="" id=""
  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
  if [[ "$mode" == "resume" ]]; then
    id="$(resume_id)"
    [[ -n "$id" ]] && resume="--resume '$id'"
    log "starting $SESSION (resume=$id, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
  else
    log "starting $SESSION FRESH (no resume, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
  fi
  # Startup nudge as a POSITIONAL prompt (not stdin — stdin would force print mode and break
  # remote-control). On --resume this appends as the session's next turn, triggering the AGENTS.md
  # startup routine (announce + report reboots). Empty STARTUP_PROMPT => clean resume, no nudge.
  local prompt_arg=""
  [[ -n "$STARTUP_PROMPT" ]] && prompt_arg="'$STARTUP_PROMPT'"
  tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
    "$CLAUDE_BIN $resume $rc $CLAUDE_FLAGS $prompt_arg"
  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
  log "started. status: $0 status | attach: tmux attach -t $SESSION"
}
case "${1:-start}" in
  start) start resume ;;
  fresh) start fresh ;;
  stop)
    if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi
    ;;
  status)
    if session_alive; then
      log "$SESSION: RUNNING"
      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
    else
      log "$SESSION: stopped"
    fi
    log "resume id: $(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID") (file: $ID_FILE)"
    ;;
  attach) exec tmux attach -t "$SESSION" ;;
  *)
    cat <<EOF
cc-ci orchestrator launcher
$0 start resume the persistent orchestrator session in tmux + remote-control (default)
$0 fresh start a NEW orchestrator session (no --resume)
$0 status show tmux + remote-control state and the resume id
$0 attach tmux attach to the session
$0 stop kill the tmux session (conversation persists on disk)
Env: SESSION=$SESSION WORKDIR=$WORKDIR REMOTE_CONTROL=$REMOTE_CONTROL CLAUDE_BIN=$CLAUDE_BIN
EOF
    ;;
esac
# Thin wrapper — delegates everything to launch-orchestrator.py in the same directory.
exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch-orchestrator.py" "$@"

@@ -0,0 +1,198 @@
#!/usr/bin/env python3
"""
cc-ci upgrader launcher — one-shot weekly recipe-upgrade job agent.
The upgrader runs /upgrade-all to completion, then stops and stays idle so the
run + summary remain viewable in the web UI. The next weekly run starts a fresh
session (start clears any idle/finished session).
Usage:
launch-upgrader.py start use-or-create: leave an in-flight run alone, else start fresh
launch-upgrader.py fresh always kill any existing session and start fresh
launch-upgrader.py stop kill the session
launch-upgrader.py status show session state
launch-upgrader.py attach tmux attach to the session
Env:
LOOP_BACKEND claude (default) | opencode — also accepts UPGRADER_BACKEND
LOOP_MODEL model flag (overrides UPGRADER_MODEL)
UPGRADER_MODEL sonnet (default for claude) | tinfoil/deepseek-v4-pro (opencode example)
UPGRADER_ARGS extra args passed to /upgrade-all (e.g. "n8n ghost", "--dry-run")
claude backend:
CLAUDE_BIN, CLAUDE_FLAGS, REMOTE_CONTROL
opencode backend:
OPENCODE_BIN, OPENCODE_SERVER
"""
import os
import re
import shutil
import subprocess
import sys
import time
from datetime import datetime
from pathlib import Path

# ── config ────────────────────────────────────────────────────────────────────
SESSION = os.environ.get("UPGRADER_SESSION", "cc-ci-upgrader")
WORKDIR = os.environ.get("UPGRADER_DIR", "/srv/cc-ci")
LOG_DIR = os.environ.get("LOG_DIR", "/srv/cc-ci/.cc-ci-logs")
# LOOP_BACKEND / LOOP_MODEL take precedence (unified control from the operator).
# `or` chains treat an empty variable as unset, like bash's ${A:-${B:-default}}.
BACKEND = os.environ.get("LOOP_BACKEND") or os.environ.get("UPGRADER_BACKEND") or "claude"
# "sonnet" is a claude model; with opencode and no model set, omit --model entirely.
MODEL = (os.environ.get("LOOP_MODEL") or os.environ.get("UPGRADER_MODEL")
         or ("sonnet" if BACKEND == "claude" else ""))
CLAUDE_BIN = os.environ.get("CLAUDE_BIN", "claude")
CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")
REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL", "1") == "1"
OPENCODE_BIN = os.environ.get("OPENCODE_BIN", "/home/loops/.local/bin/opencode")
OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER", "http://127.0.0.1:4096")
UPGRADER_ARGS = os.environ.get("UPGRADER_ARGS", "")


# ── helpers ───────────────────────────────────────────────────────────────────
def log(msg):
    ts = datetime.now().strftime("%H:%M:%S")
    print(f"[upgrader {ts}] {msg}", flush=True)


def die(msg):
    log(f"ERROR: {msg}")
    sys.exit(1)


def session_alive():
    """True if the tmux session already exists."""
    return subprocess.run(
        ["tmux", "has-session", "-t", SESSION], capture_output=True
    ).returncode == 0


def session_busy():
    """True while a turn is actively in flight (not idle/finished/wedged)."""
    r = subprocess.run(["tmux", "capture-pane", "-pt", SESSION],
                       capture_output=True, text=True)
    pane = r.stdout if r.returncode == 0 else ""
    # claude shows its interrupt hint; opencode shows a spinner frame or "Running tool".
    return bool(re.search(r"esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool", pane))


def kill_session():
    subprocess.run(["tmux", "kill-session", "-t", SESSION], capture_output=True)


# ── kickoff prompt ────────────────────────────────────────────────────────────
def build_kickoff():
    args_note = f" with arguments: {UPGRADER_ARGS}" if UPGRADER_ARGS else ""
    return f"""\
*** cc-ci UPGRADER — weekly recipe-upgrade job ***
You are the cc-ci Upgrader: a ONE-SHOT job agent, NOT a perpetual loop. Run the
recipe-upgrade sequence to completion, then STOP. Your cwd is {WORKDIR}; reach the CI
server with `ssh cc-ci`; creds are in {WORKDIR}/.testenv; skills in {WORKDIR}/.claude/skills/.
DO THIS:
1. Invoke the /upgrade-all skill in DEFAULT mode{args_note}
(read {WORKDIR}/.claude/skills/upgrade-all/SKILL.md for the full procedure). It surveys
every enrolled recipe and, for each upgradeable one, runs /recipe-upgrade in DEFAULT
mode — recipe PR only, verified by posting `!testme` on the PR (results visible in the
PR, iterate up to 3x). A genuinely stale test gets an explanatory PR COMMENT, never a
test edit.
2. Process recipes via per-recipe SUBAGENTS so your own context stays light. If your
context usage climbs (~80%), run /compact before continuing.
3. Write + push the weekly summary (the PR list is the actionable output for the operator).
4. WHEN THE RUN IS COMPLETE: STOP. Print the final summary (lead with the PR list) and an
`UPGRADE RUN COMPLETE` line, then go idle. Do NOT loop, do NOT re-run, and do NOT kill
your own session — leave it up so the operator can review the output in the web UI.
Next week's run starts a fresh session (the launcher clears this idle one).
GUARDRAILS: NEVER merge any PR. NEVER weaken a test. DEFAULT mode only — do NOT pass
--with-tests (updating cc-ci tests is the operator's per-recipe opt-in). Single-writer:
dedicated branches + separate clones, never push main, never touch the build loops'
/cc-ci /cc-ci-adv clones. The shared Swarm is stateful — go sequentially.
"""


# ── launch ────────────────────────────────────────────────────────────────────
def start(mode="use-or-create"):
    """Launch the upgrader: "use-or-create" leaves a busy run alone, "fresh" always restarts."""
    if not shutil.which("tmux"):
        die("tmux not found")
    skills = Path(WORKDIR) / ".claude" / "skills"
    if not (skills / "upgrade-all").is_dir():
        die(f"upgrade-all skill not found under {skills}")
    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
    if session_alive():
        if mode == "use-or-create" and session_busy():
            log(f"{SESSION} already running a job (busy) — leaving it")
            return
        log(f"{SESSION} exists but idle/stale (or fresh requested) — killing it first")
        kill_session()
        time.sleep(1)
    # The kickoff goes to a file; the tmux shell expands it via "$(cat ...)" as a single argument.
    kf = Path(LOG_DIR) / f".kickoff-{SESSION}.txt"
    kf.write_text(build_kickoff())
    model_flag = f"--model '{MODEL}'" if MODEL else ""
    log(f"starting {SESSION} (backend={BACKEND}, model={MODEL or 'default'}, args='{UPGRADER_ARGS or '<none>'}')")
    if BACKEND == "claude":
        if not shutil.which(CLAUDE_BIN):
            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
        rc = f"--remote-control '{SESSION}'" if REMOTE_CONTROL else ""
        cmd = f"{CLAUDE_BIN} {rc} {model_flag} {CLAUDE_FLAGS} \"$(cat '{kf}')\""
    elif BACKEND == "opencode":
        if not Path(OPENCODE_BIN).exists():
            die(f"opencode not found at {OPENCODE_BIN}")
        cmd = (
            "set -a; . /srv/cc-ci/.testenv; set +a; "
            f"{OPENCODE_BIN} {model_flag} run --attach '{OPENCODE_SERVER}' "
            f"--title '{SESSION}' \"$(cat '{kf}')\""
        )
        log("  visible at http://oc.commoninternet.net (tailnet only)")
    else:
        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION, "-c", WORKDIR, cmd])
    # Mirror everything the pane prints into a per-session log file.
    subprocess.run(["tmux", "pipe-pane", "-o", "-t", SESSION,
                    f"cat >> '{LOG_DIR}/{SESSION}.log'"])
    log(f"started. attach: tmux attach -t {SESSION} log: {LOG_DIR}/{SESSION}.log")


# ── main ──────────────────────────────────────────────────────────────────────
def main():
    cmd = sys.argv[1] if len(sys.argv) > 1 else "start"
    if cmd == "start":
        start("use-or-create")
    elif cmd == "fresh":
        start("fresh")
    elif cmd == "stop":
        if session_alive():
            log(f"killing {SESSION}")
            kill_session()
        else:
            log(f"{SESSION} not running")
    elif cmd == "status":
        if session_alive():
            busy = "busy" if session_busy() else "idle/finishing"
            log(f"{SESSION}: RUNNING ({busy})")
            subprocess.run(
                f"ps -eo pid,etime,args | grep '[r]emote-control {SESSION}' || true",
                shell=True)
        else:
            log(f"{SESSION}: stopped")
        log(f"backend: {BACKEND} model: {MODEL or '<default>'} args: '{UPGRADER_ARGS or '<none>'}'")
    elif cmd == "attach":
        os.execvp("tmux", ["tmux", "attach", "-t", SESSION])
    else:
        print(f"""cc-ci upgrader launcher — one-shot weekly recipe-upgrade job
launch-upgrader.py start use-or-create (leave busy run alone, else start fresh)
launch-upgrader.py fresh always kill existing + start fresh
launch-upgrader.py stop kill the session
launch-upgrader.py status show session state
launch-upgrader.py attach tmux attach
Backend: {BACKEND} (LOOP_BACKEND or UPGRADER_BACKEND env var)
Model: {MODEL or '<backend default>'} (LOOP_MODEL or UPGRADER_MODEL env var)
Args: {UPGRADER_ARGS or '<none>'} (UPGRADER_ARGS env var, passed to /upgrade-all)
claude: viewable at claude.ai/code
opencode: viewable at http://oc.commoninternet.net server={OPENCODE_SERVER}
""")


if __name__ == "__main__":
    main()
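
`session_busy` is a screen-scraping heuristic: it searches the visible tmux pane for signs that a turn is in flight. Factoring the match into a pure function makes the classification testable without tmux. This is a hypothetical refactor, not part of the commit. The character class below matches the same ten braille spinner frames as the alternation in `session_busy`. Only the visible pane is inspected, so a finished run that still shows a stale marker on screen could be read as busy, and `start` would then leave it alone.

```python
import re

# In-flight markers, as in session_busy: claude's "esc to interrupt" hint,
# a braille spinner frame, or opencode's "Running tool".
BUSY_RE = re.compile(r"esc to interrupt|[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]|Running tool")


def pane_is_busy(pane_text):
    """Classify captured pane text (e.g. the output of `tmux capture-pane -p`)."""
    return BUSY_RE.search(pane_text) is not None
```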

@@ -1,151 +1,3 @@
#!/usr/bin/env bash
#
# launch-upgrader.sh — spin up the cc-ci UPGRADER agent in tmux under remote-control.
#
# The Upgrader is a ONE-SHOT job agent (not a perpetual loop like the Builder/Adversary): it runs the
# weekly recipe-upgrade sequence — the /upgrade-all skill in DEFAULT mode — to completion, then STOPS
# and stays idle (it does NOT self-terminate) so the run + summary remain viewable/steerable at
# claude.ai/code exactly like the Builder, instead of being buried in headless cron output. The next
# weekly run starts a fresh session: `start` leaves an in-flight run alone but clears a finished/idle
# (or wedged) session and starts clean. The weekly cron (Sat 03:00 UTC, once cc-ci is built — see
# [[cc-ci-upgrade-all-cron]]) invokes `launch-upgrader.sh start`.
#
# Naming: tmux session AND remote-control name are both "cc-ci-upgrader" (matching
# cc-ci-builder / cc-ci-adv / cc-ci-watchdog / cc-ci-orchestrator).
#
# Usage:
# ./launch-upgrader.sh start # use-or-create: if a run is actively in flight leave it,
# # else (no session / idle-stale) kill any stale + start fresh
# ./launch-upgrader.sh fresh # always kill any existing + start a fresh run
# ./launch-upgrader.sh status | attach | stop
#
# Env:
# UPGRADER_ARGS="" passthrough args to /upgrade-all (e.g. "--dry-run", "ghost n8n"); default none
# = full default fleet run. NEVER pass --with-tests here (the cron must not
# auto-edit tests; that's the operator's per-recipe opt-in).
set -euo pipefail
SESSION="${UPGRADER_SESSION:-cc-ci-upgrader}" # tmux session name == remote-control name
WORKDIR="${UPGRADER_DIR:-/srv/cc-ci}" # cwd: where .claude/skills/ + .testenv live
# Backend selection — mirrors launch.sh. LOOP_BACKEND overrides for consistency.
UPGRADER_BACKEND="${LOOP_BACKEND:-${UPGRADER_BACKEND:-claude}}" # "claude" or "opencode"
# Model: LOOP_MODEL > UPGRADER_MODEL > backend default (sonnet for claude, provider/model for opencode).
UPGRADER_MODEL="${LOOP_MODEL:-${UPGRADER_MODEL:-sonnet}}"
CLAUDE_BIN="${CLAUDE_BIN:-claude}"
CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
OPENCODE_BIN="${OPENCODE_BIN:-/home/loops/.local/bin/opencode}"
OPENCODE_SERVER="${OPENCODE_SERVER:-http://127.0.0.1:4096}"
REMOTE_CONTROL="${REMOTE_CONTROL:-1}" # 1 => --remote-control / opencode web
LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
UPGRADER_ARGS="${UPGRADER_ARGS:-}"
log() { printf '[upgrader %(%H:%M:%S)T] %s\n' -1 "$*"; }
die() { log "ERROR: $*"; exit 1; }
session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
# "actively working" = claude shows interrupt hint; opencode shows spinner/Running tool.
session_busy() { tmux capture-pane -pt "$SESSION" 2>/dev/null | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool'; }
preflight() {
  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
  case "$UPGRADER_BACKEND" in
    claude) command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)" ;;
    opencode) command -v "$OPENCODE_BIN" >/dev/null 2>&1 || die "opencode not found (set OPENCODE_BIN)"
      [[ -n "$OPENCODE_HOST" ]] || die "could not detect tailscale IP for OPENCODE_HOST" ;;
    *) die "unknown UPGRADER_BACKEND '$UPGRADER_BACKEND' — use 'claude' or 'opencode'" ;;
  esac
  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
  [[ -d "$WORKDIR/.claude/skills/upgrade-all" ]] || die "upgrade-all skill not found under $WORKDIR/.claude/skills"
  mkdir -p "$LOG_DIR"
}
write_kickoff() {
  local kf="$LOG_DIR/.kickoff-$SESSION.txt"
  cat > "$kf" <<KICK
*** cc-ci UPGRADER — weekly recipe-upgrade job ***
You are the cc-ci Upgrader: a ONE-SHOT job agent, NOT a perpetual loop. Run the recipe-upgrade
sequence to completion, then STOP. Your cwd is ${WORKDIR}; reach the CI server with \`ssh cc-ci\`;
creds are in ${WORKDIR}/.testenv; the skills live in ${WORKDIR}/.claude/skills/.
DO THIS:
1. Invoke the **/upgrade-all** skill in DEFAULT mode${UPGRADER_ARGS:+ with arguments: ${UPGRADER_ARGS}}
(read ${WORKDIR}/.claude/skills/upgrade-all/SKILL.md for the full procedure). It surveys every
enrolled recipe and, for each upgradeable one, runs /recipe-upgrade in DEFAULT mode — recipe PR
only, verified by posting \`!testme\` on the PR (results visible in the PR, iterate up to 3x). A
genuinely stale test gets an explanatory PR COMMENT, never a test edit.
2. Process recipes via per-recipe SUBAGENTS (as the skill specifies) so your own context stays light.
If your context usage climbs (~80%), run /compact before continuing.
3. Write + push the weekly summary (the PR list is the actionable output for the operator).
4. WHEN THE RUN IS COMPLETE: STOP. Print the final summary (lead with the PR list) and an
\`UPGRADE RUN COMPLETE\` line, then go idle. Do NOT loop, do NOT re-run, and do NOT kill your own
session — leave it up so the operator can review your output + the summary in the web UI
(claude.ai/code). Next week's run starts a fresh session (the launcher clears this idle one).
GUARDRAILS: NEVER merge any PR. NEVER weaken a test. DEFAULT mode only — do NOT pass --with-tests
(updating cc-ci tests is the operator's per-recipe opt-in). Single-writer: dedicated branches +
separate clones, never push main, never touch the build loops' /cc-ci /cc-ci-adv clones. The shared
Swarm is stateful — go sequentially and tear down what you deploy.
KICK
echo "$kf"
}
start() {
local mode="${1:-use-or-create}"
preflight
if session_alive; then
if [[ "$mode" == "use-or-create" ]] && session_busy; then
log "$SESSION already running a job (busy) — leaving it"; return 0
fi
log "$SESSION exists but idle/stale (or fresh requested) — killing it first"
tmux kill-session -t "$SESSION" 2>/dev/null || true; sleep 1
fi
local kf
kf="$(write_kickoff)"
log "starting $SESSION (backend=$UPGRADER_BACKEND, model=$UPGRADER_MODEL, args='${UPGRADER_ARGS:-<none>}')"
case "$UPGRADER_BACKEND" in
claude)
local rc=""
[[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
"$CLAUDE_BIN $rc --model '$UPGRADER_MODEL' $CLAUDE_FLAGS \"\$(cat '$kf')\""
;;
opencode)
tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
"set -a; . /srv/cc-ci/.testenv; set +a; $OPENCODE_BIN --model '$UPGRADER_MODEL' run --attach '$OPENCODE_SERVER' --title '$SESSION' \"\$(cat '$kf')\""
log "$SESSION visible in web UI at http://oc.commoninternet.net (tailnet only)"
;;
esac
tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
log "started. status: $0 status | attach: tmux attach -t $SESSION | log: $LOG_DIR/$SESSION.log"
}
case "${1:-start}" in
start) start use-or-create ;;
fresh) start fresh ;;
stop) if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi ;;
status)
if session_alive; then
log "$SESSION: RUNNING $(session_busy && echo '(busy)' || echo '(idle/finishing)')"
ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
else log "$SESSION: stopped"; fi ;;
attach) exec tmux attach -t "$SESSION" ;;
*)
cat <<EOF
cc-ci upgrader launcher — one-shot weekly recipe-upgrade job agent (remote-control)
$0 start use-or-create: leave an in-flight run alone, else (re)start fresh (DEFAULT; what the cron calls)
$0 fresh always kill any existing + start a fresh run
$0 status show tmux + remote-control state
$0 attach tmux attach to the session
$0 stop kill the session
Env: UPGRADER_BACKEND=$UPGRADER_BACKEND UPGRADER_MODEL=$UPGRADER_MODEL UPGRADER_ARGS='${UPGRADER_ARGS:-<none>}'
claude: CLAUDE_BIN=$CLAUDE_BIN REMOTE_CONTROL=$REMOTE_CONTROL
opencode: OPENCODE_BIN=$OPENCODE_BIN OPENCODE_SERVER=$OPENCODE_SERVER web=http://oc.commoninternet.net
(LOOP_BACKEND / LOOP_MODEL override UPGRADER_BACKEND / UPGRADER_MODEL for unified control)
The agent runs /upgrade-all (DEFAULT mode) to completion, then STOPS and stays idle (viewable in the
web UI). It does NOT self-terminate; the next weekly `start` clears the idle session and runs fresh.
EOF
;;
esac
# Thin wrapper — delegates everything to launch-upgrader.py in the same directory.
exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch-upgrader.py" "$@"
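The removed bash resolved the upgrader's backend and model with nested defaults such as `${LOOP_MODEL:-${UPGRADER_MODEL:-sonnet}}`. `LOOP_*` therefore wins over `UPGRADER_*`, and a variable exported as an empty string falls through to the next candidate. launch-upgrader.py is not shown in this excerpt. Assuming it should keep exactly that behavior, a minimal sketch of the precedence could use a small helper; `first_set` is a hypothetical name, not part of the commit:

```python
import os
from typing import Mapping


def first_set(env: Mapping[str, str], *names: str, default: str) -> str:
    """Return the first non-empty env[name] in priority order, else `default`.

    Mirrors bash ${A:-${B:-default}}: a variable exported as "" counts as unset,
    which plain os.environ.get(name, default) would NOT do.
    """
    for name in names:
        value = env.get(name, "")
        if value:
            return value
    return default


if __name__ == "__main__":
    # Hypothetical usage: LOOP_* overrides UPGRADER_* for unified control.
    backend = first_set(os.environ, "LOOP_BACKEND", "UPGRADER_BACKEND", default="claude")
    model = first_set(os.environ, "LOOP_MODEL", "UPGRADER_MODEL", default="sonnet")
    print(f"backend={backend} model={model}")
```

Plain `os.environ.get(name, default)` returns `""` for a variable that is exported but empty. The same distinction applies wherever launch.py reads defaults such as `CLAUDE_FLAGS`.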

cc-ci-plan/launch.py Normal file

@ -0,0 +1,582 @@
#!/usr/bin/env python3
"""
cc-ci loop launcher — phase-aware Builder/Adversary loops + watchdog.
Usage:
launch.py start start loops + watchdog (resets to phase 0 unless RESUME_PHASE=1)
launch.py stop stop loops + watchdog
launch.py status show phase + session state
launch.py watchdog run the watchdog in the foreground (called by start_watchdog)
launch.py logs builder|adversary|watchdog tail a log
Env (all optional — defaults shown):
LOOP_BACKEND claude (default) | opencode
LOOP_MODEL model flag, e.g. "sonnet" (claude) or "tinfoil/deepseek-v4-pro" (opencode)
RESUME_PHASE 1 = keep current phase index on start (default resets to 0)
CLAUDE_BIN claude
OPENCODE_BIN /home/loops/.local/bin/opencode
OPENCODE_SERVER http://127.0.0.1:4096
PLAN_DIR /srv/cc-ci/cc-ci-plan
BUILDER_DIR /srv/cc-ci/cc-ci
ADV_DIR /srv/cc-ci/cc-ci-adv
LOG_DIR /srv/cc-ci/.cc-ci-logs
PHASES_SPEC semicolon-separated "id|planfile|statusfile" entries
PHASE_IDX_FILE $LOG_DIR/.phase-idx
WATCH_INTERVAL 300 (seconds between heavy checks: phase DONE / heal sessions)
SIGNAL_INTERVAL 30 (seconds between handoff / stall checks)
STALL_IDLE 300 (idle seconds without a WAITING-UNTIL before reboot)
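STALL_GRACE 180 (seconds past a WAITING-UNTIL before reboot)
REMOTE_CONTROL 1 = claude --remote-control sessions (viewable at claude.ai/code)
CLAUDE_FLAGS --dangerously-skip-permissions (empty when run as root, which uses CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS=1 instead)
ORCH_SESSION cc-ci-orchestrator-vm
ORCH_LAUNCHER $PLAN_DIR/launch-orchestrator.sh
WATCH_ORCHESTRATOR 1 (0 = do not supervise or restart the orchestrator)
"""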
import hashlib, os, re, subprocess, sys, time
from datetime import datetime, timezone
from pathlib import Path
# ── config ────────────────────────────────────────────────────────────────────
PLAN_DIR = os.environ.get("PLAN_DIR", "/srv/cc-ci/cc-ci-plan")
BUILDER_DIR = os.environ.get("BUILDER_DIR", "/srv/cc-ci/cc-ci")
ADV_DIR = os.environ.get("ADV_DIR", "/srv/cc-ci/cc-ci-adv")
LOG_DIR = os.environ.get("LOG_DIR", "/srv/cc-ci/.cc-ci-logs")
BACKEND = os.environ.get("LOOP_BACKEND", "claude")
LOOP_MODEL = os.environ.get("LOOP_MODEL", "")
REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL", "1") == "1"
CLAUDE_BIN = os.environ.get("CLAUDE_BIN", "claude")
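# claude rejects --dangerously-skip-permissions as a FLAG when run as root, so root
# uses the env-var form instead; ordinary users keep the flag as the default.
if os.getuid() == 0:
    os.environ.setdefault("CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS", "1")
    CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "")
else:
    CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")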
OPENCODE_BIN = os.environ.get("OPENCODE_BIN", "/home/loops/.local/bin/opencode")
OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER", "http://127.0.0.1:4096")
ORCH_SESSION = os.environ.get("ORCH_SESSION", "cc-ci-orchestrator-vm")
ORCH_LAUNCHER = os.environ.get("ORCH_LAUNCHER", f"{PLAN_DIR}/launch-orchestrator.sh")
WATCH_ORCHESTRATOR = os.environ.get("WATCH_ORCHESTRATOR", "1") == "1"
BUILDER_SESSION = "cc-ci-builder"
ADV_SESSION = "cc-ci-adv"
WATCHDOG_SESSION = "cc-ci-watchdog"
WATCH_INTERVAL = int(os.environ.get("WATCH_INTERVAL", 300))
SIGNAL_INTERVAL = int(os.environ.get("SIGNAL_INTERVAL", 30))
STALL_IDLE = int(os.environ.get("STALL_IDLE", 300))
STALL_GRACE = int(os.environ.get("STALL_GRACE", 180))
PHASES_SPEC = os.environ.get("PHASES_SPEC", ";".join([
    "1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md",
    "1b|plan-phase1b-review-lint.md|STATUS-1b.md",
    "1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md",
    "1e|plan-phase1e-harness-corrections.md|STATUS-1e.md",
    "2w|plan-phase2w-warm-canonical-quick.md|STATUS-2w.md",
    "2pc|plan-phase2pc-image-cache.md|STATUS-2pc.md",
    "2|plan-phase2-recipe-tests.md|STATUS-2.md",
    "2b|plan-phase2b-test-performance.md|STATUS-2b.md",
    "3|plan-phase3-results-ux.md|STATUS-3.md",
    "4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md",
    "5|plan-phase5-verify-upgrade-flow.md|STATUS-5.md",
]))
# Skip empty entries (e.g. a trailing ";"), as bash's `read -a` did.
PHASES = [p.split("|") for p in PHASES_SPEC.split(";") if p.strip()]
PHASE_IDX_FILE = os.environ.get("PHASE_IDX_FILE", f"{LOG_DIR}/.phase-idx")
# Regex patterns for session-state detection
ACTIVE_RE = re.compile(r"esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool")
LIMIT_RE = re.compile(r"spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)", re.I)
FATAL_RE = re.compile(r"redacted_thinking|blocks cannot be modified|cannot be modified", re.I)
# ── logging ───────────────────────────────────────────────────────────────────
def log(msg):
    ts = datetime.now().strftime("%H:%M:%S")
    print(f"[launch {ts}] {msg}", flush=True)
def die(msg):
    log(f"ERROR: {msg}")
    sys.exit(1)
# ── tmux helpers ──────────────────────────────────────────────────────────────
def session_alive(name):
    return subprocess.run(
        ["tmux", "has-session", "-t", name],
        capture_output=True
    ).returncode == 0
def kill_session(name):
    subprocess.run(["tmux", "kill-session", "-t", name], capture_output=True)
def capture_pane(name, lines=40):
    """Return the last `lines` lines of a session's pane ("" if the session is gone)."""
    r = subprocess.run(["tmux", "capture-pane", "-pt", name], capture_output=True, text=True)
    if r.returncode != 0:
        return ""
    # capture-pane pads its output to the pane height with empty lines; drop that padding
    # so the tail we inspect holds the most recent output rather than blanks.
    return "\n".join(r.stdout.rstrip().splitlines()[-lines:])
def pipe_to_log(session, log_path):
    subprocess.run(["tmux", "pipe-pane", "-o", "-t", session, f"cat >> '{log_path}'"])
def ping_session(session, msg):
    """Type a message into a tmux session and submit it, retrying Enter until accepted."""
    if not session_alive(session):
        return
    prefix = msg[:28]
    subprocess.run(["tmux", "send-keys", "-t", session, "-l", "--", msg], capture_output=True)
    time.sleep(0.5)
    for _ in range(5):
        subprocess.run(["tmux", "send-keys", "-t", session, "Enter"], capture_output=True)
        time.sleep(1)
        if prefix not in capture_pane(session, 4):
            return  # message was accepted
        subprocess.run(["tmux", "send-keys", "-t", session, "C-m"], capture_output=True)
        time.sleep(0.5)
# ── phase helpers ─────────────────────────────────────────────────────────────
def cur_idx():
    try:
        v = Path(PHASE_IDX_FILE).read_text().strip()
        return int(v) if v.isdigit() else 0
    except FileNotFoundError:
        return 0
def phase_id(idx): return PHASES[idx][0]
def phase_plan(idx): return PHASES[idx][1]
def phase_status(idx): return PHASES[idx][2]
def all_ids(): return " ".join(p[0] for p in PHASES)
def resolve_state(repo_dir, basename):
    """Return the path to a loop-state file: machine-docs/ if present, else repo root."""
    p = Path(repo_dir) / "machine-docs" / basename
    return p if p.exists() else Path(repo_dir) / basename
def phase_done(status_basename):
    """True if the status file has a '## DONE' heading (same test as launch.sh's '^##[[:space:]]+DONE')."""
    path = resolve_state(BUILDER_DIR, status_basename)
    try:
        with path.open() as f:
            return any(re.match(r"##\s+DONE", line) for line in f)
    except FileNotFoundError:
        return False
# ── kickoff prompt ────────────────────────────────────────────────────────────
def build_kickoff(role, idx):
    pid, plan, status = phase_id(idx), phase_plan(idx), phase_status(idx)
    preamble = (
        f"*** cc-ci SUB-PHASE {pid} ***\n"
        f"SINGLE SOURCE OF TRUTH for THIS phase: /srv/cc-ci/cc-ci-plan/{plan} — read it in full "
        f"now; it defines this phase's mission and Definition of Done.\n"
        f"The general loop protocol still applies and lives in /srv/cc-ci/cc-ci-plan/plan.md "
        f"(§6.1 coordination, §7 pacing, §9 guardrails) — read those sections too.\n"
        f"Track loop state in PHASE-NAMESPACED files in your repo clone: {status}, "
        f"BACKLOG-{pid}.md, REVIEW-{pid}.md, JOURNAL-{pid}.md. DECISIONS.md is shared (append).\n"
        f'"Done" for this phase = the Builder writes "## DONE" to {status} ONLY after every '
        f"Definition-of-Done item is Adversary-verified with a fresh PASS in REVIEW-{pid}.md "
        f"(handshake per §6.1).\n"
        f"The repo's Phase-1 STATUS.md / BACKLOG.md / REVIEW.md are HISTORY from the completed "
        f"Phase 1 — do NOT use them as your state; use the phase-namespaced files above.\n"
        f'Wherever the standing rules below say "plan.md"/"STATUS.md"/"BACKLOG.md"/"REVIEW.md", '
        f"substitute the phase plan and these phase-namespaced files.\n\n"
        f"=== standing role & rules ===\n"
    )
    role_prompt = (Path(PLAN_DIR) / "prompts" / f"{role}.md").read_text()
    return preamble + role_prompt
# ── agent launch ──────────────────────────────────────────────────────────────
def start_agent(role, session, workdir):
    if session_alive(session):
        log(f"{session} already running — leaving it")
        return
    Path(workdir).mkdir(parents=True, exist_ok=True)
    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
    idx = cur_idx()
    pid, plan = phase_id(idx), phase_plan(idx)
    kf = Path(LOG_DIR) / f".kickoff-{session}.txt"
    kf.write_text(build_kickoff(role, idx))
    model_flag = f"--model '{LOOP_MODEL}'" if LOOP_MODEL else ""
    if BACKEND == "claude":
        rc = f"--remote-control '{session}'" if REMOTE_CONTROL else ""
        cmd = f"{CLAUDE_BIN} {rc} {model_flag} {CLAUDE_FLAGS} \"$(cat '{kf}')\""
        log(f"starting {session} (backend=claude, phase={pid}, plan={plan}, model={LOOP_MODEL or 'default'})")
    elif BACKEND == "opencode":
        cmd = (
            f"set -a; . /srv/cc-ci/.testenv; set +a; "
            f"{OPENCODE_BIN} {model_flag} run --attach '{OPENCODE_SERVER}' "
            f"--title '{session}' \"$(cat '{kf}')\""
        )
        log(f"starting {session} (backend=opencode, phase={pid}, model={LOOP_MODEL or 'default'})")
        log(" visible at http://oc.commoninternet.net (tailnet only)")
    else:
        die(f"unknown BACKEND '{BACKEND}' — set LOOP_BACKEND=claude or LOOP_BACKEND=opencode")
    subprocess.run(["tmux", "new-session", "-d", "-s", session, "-c", workdir, cmd])
    pipe_to_log(session, f"{LOG_DIR}/{session}.log")
def start_loops():
    start_agent("builder", BUILDER_SESSION, BUILDER_DIR)
    start_agent("adversary", ADV_SESSION, ADV_DIR)
def stop_loops():
    for s in (BUILDER_SESSION, ADV_SESSION):
        if session_alive(s):
            log(f"killing {s}")
            kill_session(s)
# ── session healing ───────────────────────────────────────────────────────────
def heal_session(role, session, workdir):
    """Restart a dead session; kill+restart a FATAL-wedged one; nudge a limit-stalled one."""
    if not session_alive(session):
        log(f"{role} ({session}) gone — restarting (phase {phase_id(cur_idx())})")
        start_agent(role, session, workdir)
        return
    pane = capture_pane(session, 25)
    if ACTIVE_RE.search(pane):
        return  # actively working — leave it alone
    if FATAL_RE.search(pane):
        log(f"FATAL session-state error on {role} ({session}) — kill + restart fresh")
        kill_session(session)
        start_agent(role, session, workdir)
        return
    if LIMIT_RE.search(pane):
        log(f"limit-stall on {role} ({session}) — nudging to resume")
        ping_session(session,
                     "watchdog: the usage/spend limit appears lifted — RESUME your loop now. "
                     "Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you "
                     "stopped; re-arm your loop pacing.")
# ── stall detection ───────────────────────────────────────────────────────────
_idle_since: dict[str, float] = {}
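def _parse_waiting_until(pane):
    """Return the epoch timestamp of the LAST WAITING-UNTIL marker in the pane, or None.
    The pane tail can still show older markers, so the most recent one wins."""
    marks = re.findall(r"WAITING-UNTIL:\s*(\S+)", pane)
    if not marks:
        return None
    try:
        # ISO-8601; a trailing "Z" is normalised because fromisoformat() rejects it before Python 3.11.
        dt = datetime.fromisoformat(marks[-1].replace("Z", "+00:00"))
        return dt.timestamp()
    except ValueError:
        return None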
def stall_check_one(role, session, workdir):
    if not session_alive(session):
        _idle_since[session] = 0.0
        return
    now = time.time()
    pane = capture_pane(session, 40)
    if ACTIVE_RE.search(pane):
        _idle_since[session] = 0.0
        return
    since = _idle_since.get(session) or now
    _idle_since[session] = since
    idle = now - since
    until = _parse_waiting_until(pane)
    if until is not None:
        # Declared wait: only reboot once STALL_GRACE seconds past the stated time.
        # Never reboot before — that races with the healthy self-wake.
        if now <= until + STALL_GRACE:
            return
        reason = f"past its WAITING-UNTIL by {int(now - until)}s — self-wake did not fire"
    else:
        if idle < STALL_IDLE:
            return
        reason = f"idle {int(idle)}s with no WAITING-UNTIL marker"
    log(f"stall: {role} ({session}) {reason} — kill + reboot")
    kill_session(session)
    start_agent(role, session, workdir)
    _idle_since[session] = 0.0
def stall_check():
    stall_check_one("builder", BUILDER_SESSION, BUILDER_DIR)
    stall_check_one("adversary", ADV_SESSION, ADV_DIR)
# ── orchestrator healing ──────────────────────────────────────────────────────
def orchestrator_alive():
    """
    True if an orchestrator process is running anywhere.
    Conflict-safety: never launch a second orchestrator resuming the same session
    (double-resume causes "thinking blocks cannot be modified" crashes).
    """
    try:
        out = subprocess.run(["pgrep", "-x", "claude"], capture_output=True, text=True).stdout
    except FileNotFoundError:  # pgrep unavailable: fall back to the tmux check below
        out = ""
    for pid in out.split():
        try:
            cmdline = Path(f"/proc/{pid}/cmdline").read_bytes().decode(errors="replace").replace("\0", " ")
        except OSError:
            continue  # process exited between pgrep and the /proc read
        # Skip the loop sessions and the upgrader: they're not the orchestrator.
        if re.search(r"--remote-control\s+'?cc-ci-(builder|adv|upgrader)'?", cmdline):
            continue
        return True
    return session_alive(ORCH_SESSION)
def heal_orchestrator():
    if not WATCH_ORCHESTRATOR:
        return
    if not Path(ORCH_LAUNCHER).is_file():
        return
    if orchestrator_alive():
        if session_alive(ORCH_SESSION):
            pane = capture_pane(ORCH_SESSION, 25)
            if ACTIVE_RE.search(pane):
                return
            if FATAL_RE.search(pane):
                log(f"FATAL session-state error on orchestrator ({ORCH_SESSION}) — kill + restart")
                kill_session(ORCH_SESSION)
                subprocess.run([ORCH_LAUNCHER, "start"], capture_output=True)
        return
    log(f"orchestrator not running — restarting via {ORCH_LAUNCHER}")
    subprocess.run([ORCH_LAUNCHER, "start"], capture_output=True)
# ── handoff signalling ────────────────────────────────────────────────────────
_last_sha = ""
_adv_inbox_seen = ""
_builder_inbox_seen = ""
def handoff_reset():
    global _last_sha, _adv_inbox_seen, _builder_inbox_seen
    _last_sha = _adv_inbox_seen = _builder_inbox_seen = ""
# Git is invoked with argument lists, not shell strings: repr() quoting ({x!r}) is not shell quoting.
def _fetch_origin():
    subprocess.run(["git", "-C", BUILDER_DIR, "fetch", "-q", "origin"], capture_output=True)
def _show_pushed(path):
    """Read a file from origin/main (machine-docs/ first, then repo root)."""
    for loc in (f"origin/main:machine-docs/{path}", f"origin/main:{path}"):
        r = subprocess.run(["git", "-C", BUILDER_DIR, "show", loc], capture_output=True, text=True)
        if r.returncode == 0:
            return r.stdout
    return ""
def handoff_check():
    global _last_sha, _adv_inbox_seen, _builder_inbox_seen
    _fetch_origin()
    r = subprocess.run(["git", "-C", BUILDER_DIR, "rev-parse", "origin/main"],
                       capture_output=True, text=True)
    head = r.stdout.strip()
    if head:
        if not _last_sha:
            _last_sha = head  # baseline silently on first tick
        elif head != _last_sha:
            subjects = subprocess.run(
                ["git", "-C", BUILDER_DIR, "log", "--format=%s", f"{_last_sha}..origin/main"],
                capture_output=True, text=True).stdout
            if re.search(r"^claim", subjects, re.MULTILINE | re.IGNORECASE):
                log("handoff: new claim(...) commit → pinging Adversary")
                ping_session(ADV_SESSION,
                             "watchdog ping: the Builder pushed a gate CLAIM (claim(...) commit). "
                             "Pull and verify the claimed gate now.")
            if re.search(r"^review", subjects, re.MULTILINE | re.IGNORECASE):
                log("handoff: new review(...) commit → pinging Builder")
                ping_session(BUILDER_SESSION,
                             "watchdog ping: the Adversary pushed a verdict/finding (review(...) commit). "
                             "Pull REVIEW and act — proceed if it PASSes your gate, address it if it's a finding.")
            _last_sha = head
    adv_inbox = _show_pushed("ADVERSARY-INBOX.md")
    builder_inbox = _show_pushed("BUILDER-INBOX.md")
    def md5(s): return hashlib.md5(s.encode()).hexdigest()
    if adv_inbox:
        h = md5(adv_inbox)
        if h != _adv_inbox_seen:
            log("handoff: ADVERSARY-INBOX.md changed → pinging Adversary")
            ping_session(ADV_SESSION,
                         "watchdog ping: the Builder pushed machine-docs/ADVERSARY-INBOX.md — "
                         "pull, read it, act, then delete the file (commit + push) to mark it consumed.")
            _adv_inbox_seen = h
    else:
        _adv_inbox_seen = ""
    if builder_inbox:
        h = md5(builder_inbox)
        if h != _builder_inbox_seen:
            log("handoff: BUILDER-INBOX.md changed → pinging Builder")
            ping_session(BUILDER_SESSION,
                         "watchdog ping: the Adversary pushed machine-docs/BUILDER-INBOX.md — "
                         "pull, read it, act, then delete the file (commit + push) to mark it consumed.")
            _builder_inbox_seen = h
    else:
        _builder_inbox_seen = ""
# ── watchdog loop ─────────────────────────────────────────────────────────────
def watchdog_loop():
    idx = cur_idx()
    log(f"watchdog up — phase={phase_id(idx)} [{idx+1}/{len(PHASES)}] "
        f"seq='{all_ids()}' signal={SIGNAL_INTERVAL}s heavy={WATCH_INTERVAL}s")
    elapsed = WATCH_INTERVAL  # force a heavy check on the first tick
    while True:
        handoff_check()
        stall_check()
        if elapsed >= WATCH_INTERVAL:
            elapsed = 0
            idx = cur_idx()
            pid = phase_id(idx)
            status = phase_status(idx)
            if phase_done(status):
                next_idx = idx + 1
                if next_idx < len(PHASES):
                    log(f"PHASE {pid} DONE — auto-transitioning to {phase_id(next_idx)}")
                    stop_loops()
                    Path(PHASE_IDX_FILE).write_text(str(next_idx))
                    handoff_reset()
                    start_loops()
                else:
                    log(f"PHASE SEQUENCE COMPLETE (last phase {pid} DONE) — stopping loops")
                    stop_loops()
                    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
                    Path(LOG_DIR, "SEQUENCE-COMPLETE").write_text(
                        f"cc-ci phase sequence complete {ts}. Phases: {all_ids()}. "
                        f"Loops stopped; entire build finished.\n")
                    log("watchdog exiting.")
                    return
            else:
                heal_session("builder", BUILDER_SESSION, BUILDER_DIR)
                heal_session("adversary", ADV_SESSION, ADV_DIR)
                heal_orchestrator()
        time.sleep(SIGNAL_INTERVAL)
        elapsed += SIGNAL_INTERVAL
def start_watchdog():
    if session_alive(WATCHDOG_SESSION):
        log("watchdog already running")
        return
    log("starting watchdog")
    script = Path(__file__).resolve()
    subprocess.run([
        "tmux", "new-session", "-d", "-s", WATCHDOG_SESSION, "-c", PLAN_DIR,
        f"exec >>'{LOG_DIR}/watchdog.log' 2>&1; python3 '{script}' watchdog"
    ])
# ── preflight ─────────────────────────────────────────────────────────────────
def preflight():
    import shutil
    if not shutil.which("tmux"):
        die("tmux not found")
    if BACKEND == "claude":
        if not shutil.which(CLAUDE_BIN):
            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
    elif BACKEND == "opencode":
        # which() accepts a bare name on PATH or an explicit path, and checks it is executable.
        if not shutil.which(OPENCODE_BIN):
            die(f"opencode not found at {OPENCODE_BIN}")
    else:
        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
    for phase in PHASES:
        plan = Path(PLAN_DIR) / phase[1]
        if not plan.exists():
            die(f"missing phase plan: {plan}")
    for prompt_file in ("builder.md", "adversary.md"):
        if not (Path(PLAN_DIR) / "prompts" / prompt_file).exists():
            die(f"missing {PLAN_DIR}/prompts/{prompt_file}")
    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
# ── status ────────────────────────────────────────────────────────────────────
def cmd_status():
    idx = cur_idx()
    pid = phase_id(idx)
    print(f" phase: {pid} [{idx+1}/{len(PHASES)}] plan={phase_plan(idx)} status={phase_status(idx)}")
    for s in (BUILDER_SESSION, ADV_SESSION, WATCHDOG_SESSION):
        state = "RUNNING" if session_alive(s) else "stopped"
        print(f" {s}: {state}")
    done_str = "## DONE" if phase_done(phase_status(idx)) else "in progress"
    print(f" phase {pid}: {done_str}")
    seq = Path(LOG_DIR) / "SEQUENCE-COMPLETE"
    if seq.exists():
        print(f" >>> {seq.read_text().strip()}")
# ── main ──────────────────────────────────────────────────────────────────────
def main():
    cmd = sys.argv[1] if len(sys.argv) > 1 else ""
    if cmd == "start":
        preflight()
        stop_loops()
        if os.environ.get("RESUME_PHASE") != "1":
            Path(PHASE_IDX_FILE).write_text("0")
        seq = Path(LOG_DIR) / "SEQUENCE-COMPLETE"
        if seq.exists():
            seq.unlink()
        start_loops()
        start_watchdog()
        log(f"started at phase {phase_id(cur_idx())}.")
    elif cmd == "watchdog":
        preflight()
        watchdog_loop()
    elif cmd == "status":
        cmd_status()
    elif cmd == "stop":
        stop_loops()
        if session_alive(WATCHDOG_SESSION):
            log(f"killing {WATCHDOG_SESSION}")
            kill_session(WATCHDOG_SESSION)
        log("stopped.")
    elif cmd == "logs":
        sub = sys.argv[2] if len(sys.argv) > 2 else ""
        log_files = {
            "builder": f"{LOG_DIR}/{BUILDER_SESSION}.log",
            "adversary": f"{LOG_DIR}/{ADV_SESSION}.log",
            "watchdog": f"{LOG_DIR}/watchdog.log",
        }
        if sub not in log_files:
            die("usage: launch.py logs builder|adversary|watchdog")
        os.execvp("tail", ["tail", "-f", log_files[sub]])
    else:
        print(f"""cc-ci loop launcher (phase-aware)
launch.py start start loops + watchdog (RESUME_PHASE=1 to keep current phase)
launch.py stop stop loops + watchdog
launch.py status show phase + session state
launch.py logs builder|adversary|watchdog tail a log
launch.py watchdog run watchdog in foreground
Backend: {BACKEND} Model: {LOOP_MODEL or '<default>'}
Phase sequence ({len(PHASES)} phases, auto-advance on ## DONE, stop after last):
{all_ids()}
""")
if __name__ == "__main__":
    main()
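`stall_check_one` interleaves the reboot decision with tmux calls and the wall clock. As a sketch that is not part of this commit, the marker parsing and the decision can be pulled into pure functions and checked without tmux. `last_waiting_until` and `stall_reason` are hypothetical names; the ISO-8601 marker format is the one launch.py's `_parse_waiting_until` already assumes, and the defaults mirror `STALL_IDLE=300` and `STALL_GRACE=180`:

```python
import re
from datetime import datetime
from typing import Optional

# Marker format assumed to match launch.py's parser: "WAITING-UNTIL: <ISO-8601 timestamp>".
_MARKER_RE = re.compile(r"WAITING-UNTIL:\s*(\S+)")


def last_waiting_until(pane: str) -> Optional[float]:
    """Epoch seconds of the most recent WAITING-UNTIL marker in `pane`, or None."""
    matches = _MARKER_RE.findall(pane)
    if not matches:
        return None
    try:
        # Normalise a trailing "Z"; fromisoformat() only accepts it from Python 3.11 on.
        return datetime.fromisoformat(matches[-1].replace("Z", "+00:00")).timestamp()
    except ValueError:
        return None


def stall_reason(now: float, idle_since: float, waiting_until: Optional[float],
                 stall_idle: int = 300, stall_grace: int = 180) -> Optional[str]:
    """Why an idle loop session should be rebooted, or None to leave it alone."""
    if waiting_until is not None:
        # A declared wait is honoured until stall_grace seconds past the stated time,
        # so the watchdog never races a healthy self-wake.
        if now <= waiting_until + stall_grace:
            return None
        return f"past its WAITING-UNTIL by {int(now - waiting_until)}s"
    idle = now - idle_since
    if idle < stall_idle:
        return None
    return f"idle {int(idle)}s with no WAITING-UNTIL marker"


if __name__ == "__main__":
    pane = "WAITING-UNTIL: 1970-01-01T00:00:10Z\n...\nWAITING-UNTIL: 1970-01-01T00:01:40Z"
    until = last_waiting_until(pane)
    print(until, stall_reason(now=200.0, idle_since=0.0, waiting_until=until))
```

With the decision free of tmux and `time.time()`, the boundary cases can be tested directly: exactly `STALL_IDLE` seconds idle, or a marker still inside its grace window.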


@ -1,505 +1,3 @@
#!/usr/bin/env bash
#
# launch.sh — start and supervise the two cc-ci autonomous loops + a phase-aware watchdog.
#
# Model (see plan.md §6 / §6.1): two INDEPENDENT Claude Code sessions —
# • Builder (tmux session: cc-ci-builder) working clone /srv/cc-ci/cc-ci
# • Adversary (tmux session: cc-ci-adv) working clone /srv/cc-ci/cc-ci-adv
# coordinating only through the git repo on git.autonomic.zone.
#
# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c → 1b → 1d → 1e → 2w → 2 → 2b → 3 → 4;
# 2w = warm-canonical/--quick, interjected; Phase 2 pauses for it then resumes).
# Each phase has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
# STATUS-<id>.md shows "## DONE", the watchdog AUTO-TRANSITIONS to the next phase; after the LAST
# phase (4, final review/polish/cleanup) it STOPS the loops and exits (end of the whole build).
#
# Three jobs: ITERATION (each agent's /loop), RESILIENCE (restart a dead loop), HANDOFF SIGNALLING
# (ping the waiting loop the moment its counterpart hands off), PHASE SEQUENCING (this file).
#
# Usage:
# ./launch.sh start # start the sequence at phase 0 + watchdog (stops/relaunches loops)
# ./launch.sh watchdog # run only the supervision loop in the foreground
# ./launch.sh status # show phase + session + DONE state
# ./launch.sh logs builder|adversary|watchdog # tail a session/log
# ./launch.sh stop # stop both loops + watchdog
set -euo pipefail
# Absolute path to this script, so the watchdog re-invokes it correctly regardless of cwd.
SELF="$(readlink -f "${BASH_SOURCE[0]}")"
# ----- config -------------------------------------------------------------
PLAN_DIR="${PLAN_DIR:-/srv/cc-ci/cc-ci-plan}"
# ----- backend selection ------------------------------------------------------
# LOOP_BACKEND: "claude" (default) or "opencode" (tinfoil/opencode web, tailscale-only).
# LOOP_MODEL: model to pass to the backend.
# claude: e.g. "sonnet", "opus" (--model flag); empty = use CLI default.
# opencode: "provider/model" e.g. "tinfoil/deepseek-v4-pro".
LOOP_BACKEND="${LOOP_BACKEND:-claude}"
LOOP_MODEL="${LOOP_MODEL:-}"
CLAUDE_BIN="${CLAUDE_BIN:-claude}"
OPENCODE_BIN="${OPENCODE_BIN:-/home/loops/.local/bin/opencode}"
# opencode web server listens on localhost (nginx proxies it at oc.commoninternet.net).
# One shared server hosts all sessions; agents attach with --attach.
OPENCODE_SERVER="${OPENCODE_SERVER:-http://127.0.0.1:4096}"
OPENCODE_PORT="${OPENCODE_PORT:-4096}"
# --dangerously-skip-permissions cannot be passed as a FLAG when running as root (claude blocks it).
# Use the env var form instead; detect root and switch automatically.
if [ "$(id -u)" = "0" ]; then
export CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS=1
CLAUDE_FLAGS="${CLAUDE_FLAGS:-}"
else
CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
fi
# REMOTE_CONTROL=1 → interactive --remote-control sessions (viewable at claude.ai/code), required
# for /loop. The box must be logged into the claude.ai account. =0 for plain interactive.
# For opencode backend this controls whether to start the opencode web server.
REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
BUILDER_DIR="${BUILDER_DIR:-/srv/cc-ci/cc-ci}" # Builder's repo clone
ADV_DIR="${ADV_DIR:-/srv/cc-ci/cc-ci-adv}" # Adversary's repo clone
LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
WATCH_INTERVAL="${WATCH_INTERVAL:-300}" # seconds between HEAVY checks (phase DONE / restart dead loops)
SIGNAL_INTERVAL="${SIGNAL_INTERVAL:-30}" # seconds between HANDOFF checks (ping the waiting loop)
STALL_IDLE="${STALL_IDLE:-300}" # NO-marker case: seconds a loop may sit idle (turn ended
# without declaring a wait) before the watchdog reboots it
STALL_GRACE="${STALL_GRACE:-180}" # marker case: seconds PAST a loop's WAITING-UNTIL before
# reboot. The real ScheduleWakeup fires AT the stated time;
# grace covers wake+start latency + marker/scheduler skew so
# the watchdog never RACES (pre-empts) a healthy self-wake.
BUILDER_SESSION="cc-ci-builder"
ADV_SESSION="cc-ci-adv"
WATCHDOG_SESSION="cc-ci-watchdog"
# Orchestrator (supervisory session) — the watchdog keeps it alive too, via launch-orchestrator.sh.
ORCH_SESSION="${ORCH_SESSION:-cc-ci-orchestrator-vm}"
ORCH_LAUNCHER="${ORCH_LAUNCHER:-$PLAN_DIR/launch-orchestrator.sh}"
# Watchdog supervision of the orchestrator can be disabled (=0) if you run the orchestrator yourself
# and don't want it auto-(re)launched.
WATCH_ORCHESTRATOR="${WATCH_ORCHESTRATOR:-1}"
# Ordered phase sequence: each entry "id|planfile|statusbasename". The watchdog runs them in order,
# auto-transitions on the phase's "## DONE" (in BUILDER_DIR/<statusbasename>), and STOPS after the
# last one (manual gate). Override PHASES_SPEC (semicolon-separated) to change the sequence.
PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md;1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md;1e|plan-phase1e-harness-corrections.md|STATUS-1e.md;2w|plan-phase2w-warm-canonical-quick.md|STATUS-2w.md;2pc|plan-phase2pc-image-cache.md|STATUS-2pc.md;2|plan-phase2-recipe-tests.md|STATUS-2.md;2b|plan-phase2b-test-performance.md|STATUS-2b.md;3|plan-phase3-results-ux.md|STATUS-3.md;4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md;5|plan-phase5-verify-upgrade-flow.md|STATUS-5.md}"
IFS=';' read -r -a PHASES <<< "$PHASES_SPEC"
PHASE_IDX_FILE="${PHASE_IDX_FILE:-$LOG_DIR/.phase-idx}"
# --------------------------------------------------------------------------
log() { printf '[launch %(%H:%M:%S)T] %s\n' -1 "$*"; }
die() { log "ERROR: $*"; exit 1; }
need() { command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
# ----- phase helpers ------------------------------------------------------
cur_idx() { local i; i="$(cat "$PHASE_IDX_FILE" 2>/dev/null || echo 0)"; [[ "$i" =~ ^[0-9]+$ ]] || i=0; echo "$i"; }
phase_id() { echo "${PHASES[$1]}" | cut -d'|' -f1; }
phase_plan() { echo "${PHASES[$1]}" | cut -d'|' -f2; }
phase_status() { echo "${PHASES[$1]}" | cut -d'|' -f3; }
phase_review() { echo "REVIEW-$(phase_id "$1").md"; }
# Loop-state files may sit at the repo root OR under machine-docs/ (the 1b RL6 move). Prefer
# machine-docs/ if present, else root — so the watchdog survives the move whenever it happens.
resolve_state() { local dir="$1" base="$2"; if [[ -f "$dir/machine-docs/$base" ]]; then echo "$dir/machine-docs/$base"; else echo "$dir/$base"; fi; }
phase_done() { grep -qE '^##[[:space:]]+DONE' "$(resolve_state "$BUILDER_DIR" "$1")" 2>/dev/null; } # $1 = status basename (read locally)
all_ids() { local p; for p in "${PHASES[@]}"; do printf '%s ' "$(echo "$p" | cut -d'|' -f1)"; done; }
preflight() {
need tmux
case "$LOOP_BACKEND" in
claude) command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)" ;;
opencode) command -v "$OPENCODE_BIN" >/dev/null 2>&1 || die "opencode not found at $OPENCODE_BIN; install from https://opencode.ai" ;;
*) die "unknown LOOP_BACKEND '$LOOP_BACKEND' — use 'claude' or 'opencode'" ;;
esac
local p plan
for p in "${PHASES[@]}"; do
plan="$(echo "$p" | cut -d'|' -f2)"
[[ -f "$PLAN_DIR/$plan" ]] || die "missing phase plan $PLAN_DIR/$plan"
done
[[ -f "$PLAN_DIR/prompts/builder.md" ]] || die "missing $PLAN_DIR/prompts/builder.md"
[[ -f "$PLAN_DIR/prompts/adversary.md" ]] || die "missing $PLAN_DIR/prompts/adversary.md"
mkdir -p "$LOG_DIR"
}
session_alive() { tmux has-session -t "$1" 2>/dev/null; }
# Build the per-session kickoff (phase preamble + base role prompt) and launch the agent.
# role ∈ {builder, adversary}.
# Backend "claude": prompt passed as a positional arg via $(cat kf), never on stdin (piping breaks /loop).
# Backend "opencode": one shared opencode web server (OPENCODE_SERVER, tailnet-only) hosts all sessions.
# Each agent runs `opencode run --attach <server> --title <session> <kickoff>` in a detached tmux
# session and shows up by title in the web UI for observation (like --remote-control).
start_agent() {
local role="$1" session="$2" workdir="$3"
if session_alive "$session"; then log "$session already running — leaving it"; return 0; fi
mkdir -p "$workdir"
local idx pid plan status kf
idx="$(cur_idx)"; pid="$(phase_id "$idx")"; plan="$(phase_plan "$idx")"; status="$(phase_status "$idx")"
kf="$LOG_DIR/.kickoff-$session.txt"
{
cat <<PREAMBLE
*** cc-ci SUB-PHASE ${pid} ***
SINGLE SOURCE OF TRUTH for THIS phase: /srv/cc-ci/cc-ci-plan/${plan} — read it in full now; it defines this phase's mission and Definition of Done.
The general loop protocol still applies and lives in /srv/cc-ci/cc-ci-plan/plan.md (§6.1 coordination, §7 pacing, §9 guardrails) — read those sections too.
Track loop state in PHASE-NAMESPACED files in your repo clone: ${status}, BACKLOG-${pid}.md, REVIEW-${pid}.md, JOURNAL-${pid}.md. DECISIONS.md is shared (append).
"Done" for this phase = the Builder writes "## DONE" to ${status} ONLY after every Definition-of-Done item is Adversary-verified with a fresh PASS in REVIEW-${pid}.md (handshake per §6.1).
The repo's Phase-1 STATUS.md / BACKLOG.md / REVIEW.md are HISTORY from the completed Phase 1 — do NOT use them as your state; use the phase-namespaced files above.
Wherever the standing rules below say "plan.md"/"STATUS.md"/"BACKLOG.md"/"REVIEW.md", substitute the phase plan and these phase-namespaced files.
=== standing role & rules ===
PREAMBLE
cat "$PLAN_DIR/prompts/$role.md"
} > "$kf"
local model_flag=""
[[ -n "$LOOP_MODEL" ]] && model_flag="--model '$LOOP_MODEL'"
case "$LOOP_BACKEND" in
claude)
local rc=""
[[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$session'"
log "starting $session (backend=claude, phase=$pid, model=${LOOP_MODEL:-default}, cwd=$workdir)"
tmux new-session -d -s "$session" -c "$workdir" \
"$CLAUDE_BIN $rc $model_flag $CLAUDE_FLAGS \"\$(cat '$kf')\""
;;
opencode)
# One shared opencode web server (opencode-web.service or manually started) hosts all sessions.
# Each agent attaches to it as a named session visible in the web UI at oc.commoninternet.net.
log "starting $session (backend=opencode, phase=$pid, model=${LOOP_MODEL:-default}, server=$OPENCODE_SERVER)"
tmux new-session -d -s "$session" -c "$workdir" \
"set -a; . /srv/cc-ci/.testenv; set +a; $OPENCODE_BIN $model_flag run --attach '$OPENCODE_SERVER' --title '$session' \"\$(cat '$kf')\""
log "$session visible in web UI at http://oc.commoninternet.net (tailnet only)"
;;
esac
tmux pipe-pane -o -t "$session" "cat >> '$LOG_DIR/$session.log'"
}
start_loops() {
start_agent builder "$BUILDER_SESSION" "$BUILDER_DIR"
start_agent adversary "$ADV_SESSION" "$ADV_DIR"
}
stop_loops() {
local s
for s in "$BUILDER_SESSION" "$ADV_SESSION"; do
if session_alive "$s"; then log "killing $s"; tmux kill-session -t "$s" || true; fi
done
}
# Wake a loop by typing a message into its tmux session and SUBMITTING it. A single Enter after a
# long `send-keys -l` is often swallowed while the TUI ingests the paste (text left unsent in the
# input box), so retry Enter/C-m until the message's leading text is no longer in the input box.
ping_session() {
local s="$1" msg="$2" prefix i
session_alive "$s" || return 0
prefix="${msg:0:28}"
tmux send-keys -t "$s" -l -- "$msg" 2>/dev/null || return 0
sleep 0.5
for i in 1 2 3 4 5; do
tmux send-keys -t "$s" Enter 2>/dev/null
sleep 1
tmux capture-pane -pt "$s" 2>/dev/null | tail -4 | grep -qF -- "$prefix" || return 0 # submitted
tmux send-keys -t "$s" C-m 2>/dev/null; sleep 0.5
done
}
# A loop can stall ALIVE on a usage/spend-limit notice: the claude process stays up (so the
# dead-session restart never fires) but makes no progress, and the /loop self-pacing is dead because
# the limit interrupted the turn that would have scheduled the next tick. Detect that signature
# (limit text present + no active-turn marker) and re-nudge it each heavy tick — once the limit resets
# the next nudge lands and the loop resumes. Gated on the limit text so we NEVER nudge a loop that is
# just legitimately idle-waiting on a handoff.
LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'
# FATAL = an unrecoverable session-state API error that recurs on EVERY turn (so the session stays
# alive but wedged — a nudge can't fix it; only a fresh session can). The confirmed case: the
# "thinking/redacted_thinking blocks ... cannot be modified" 400 that has hit the Adversary
# repeatedly (interrupted-mid-thinking corrupts the replayed history). Kill + restart fresh; the loop
# re-orients from the repo. Matched conservatively so it never fires on transient/working states.
FATAL_RE='redacted_thinking|blocks cannot be modified|cannot be modified'
# Heal one loop session: dead -> restart; wedged on a FATAL error -> kill + restart fresh; stalled on
# a usage limit -> nudge. No-op while actively working (claude's "esc to interrupt", or an opencode
# spinner / "Running tool" line, on screen).
heal_session() {
local role="$1" s="$2" dir="$3" pane
if ! session_alive "$s"; then
log "$role ($s) gone — restarting (phase $(phase_id "$(cur_idx)"))"
start_agent "$role" "$s" "$dir"; return 0
fi
pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -25 || true)"
# "esc to interrupt" = claude actively working; braille spinner chars or "Running tool" = opencode actively working
printf '%s\n' "$pane" | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool' && return 0
if printf '%s\n' "$pane" | grep -qiE "$FATAL_RE"; then
log "FATAL session-state error on $role ($s) — kill + restart fresh (re-orients from repo)"
tmux kill-session -t "$s" 2>/dev/null || true
start_agent "$role" "$s" "$dir"; return 0
fi
if printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then
log "limit-stall detected on $role ($s) — re-nudging to resume"
ping_session "$s" "watchdog: the usage/spend limit appears lifted — RESUME your loop now. Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you stopped; re-arm your loop pacing."
fi
}
# --- Idle-wedge detection (complements heal_session's dead/FATAL/limit cases) ----------------------
# A loop can sit ALIVE but wedged — e.g. garbled output at the context limit — showing none of the
# heal_session signals (not dead, no FATAL string, no limit notice). The loops therefore DECLARE every
# wait with a final-line marker `WAITING-UNTIL: <ISO-8601 UTC>` and cap each wait at 10 min (plan §7).
# A healthy idle loop ALWAYS has a current marker as its last message; a wedge does not (or has one
# whose time has already passed). So, for a loop that is idle (not actively working), reboot it when
# EITHER it declared a WAITING-UNTIL time and is now more than STALL_GRACE seconds past it, OR it has
# no marker and has been idle for >= STALL_IDLE seconds.
# Runs every signal tick (30 s) for fine resolution; rebooting is safe because the loop re-orients
# from git + its phase STATUS/REVIEW files.
declare -A _wd_idle_since # session -> epoch first seen idle this stretch (0/unset = working)
_parse_waiting_until() { # arg1 = pane text; echoes epoch seconds of the last marker, or nothing
local line ts
line="$(printf '%s\n' "$1" | grep -oE 'WAITING-UNTIL:[[:space:]]*[0-9][0-9T:Z+-]+' | tail -1)"
[[ -n "$line" ]] || return 0
ts="$(printf '%s' "${line#WAITING-UNTIL:}" | tr -d '[:space:]')"
date -u -d "$ts" +%s 2>/dev/null || true
}
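# Example (hypothetical pane text): if the pane ends with
#   "Nothing to do until the Adversary replies. WAITING-UNTIL: 2026-05-31T18:10:00Z"
# then _parse_waiting_until echoes the same epoch as `date -u -d 2026-05-31T18:10:00Z +%s`.
# Only the LAST marker in the pane counts, so an older marker further up is ignored; a pane with
# no marker, or with a timestamp GNU date cannot parse, yields empty output.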
stall_check_one() {
local role="$1" s="$2" dir="$3" pane now until idle since reason
session_alive "$s" || { _wd_idle_since[$s]=0; return 0; } # dead => heal_session handles it
now="$(printf '%(%s)T' -1)"
pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -40 || true)"
if printf '%s\n' "$pane" | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool'; then
_wd_idle_since[$s]=0; return 0 # actively working — not idle
fi
since="${_wd_idle_since[$s]:-0}"
if [[ "$since" == 0 ]]; then since="$now"; _wd_idle_since[$s]="$now"; fi
idle=$(( now - since ))
until="$(_parse_waiting_until "$pane")"
if [[ -n "$until" ]]; then
# Declared wait: the loop's own ScheduleWakeup fires AT 'until'. Reboot ONLY once we are
# STALL_GRACE seconds PAST it — i.e. the self-wake genuinely failed. Never reboot before/at
# 'until' (that races and pre-empts the healthy wake — the original false-reboot bug).
(( now > until + STALL_GRACE )) || return 0
reason="past its WAITING-UNTIL by $(( now - until ))s — self-wake did not fire"
else
# No declared wait: a turn ended without scheduling/declaring. Treat as a wedge once idle a while.
(( idle >= STALL_IDLE )) || return 0
reason="idle ${idle}s with no WAITING-UNTIL marker"
fi
log "stall: $role ($s) $reason — kill + reboot (re-orients from repo)"
tmux kill-session -t "$s" 2>/dev/null || true
start_agent "$role" "$s" "$dir"
_wd_idle_since[$s]=0
}
stall_check() {
stall_check_one builder "$BUILDER_SESSION" "$BUILDER_DIR"
stall_check_one adversary "$ADV_SESSION" "$ADV_DIR"
}
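# Decision summary for stall_check_one (STALL_IDLE / STALL_GRACE are whatever the config sets):
#   actively working (busy marker on screen)          -> reset the idle clock, no action
#   idle, marker names T, now <= T + STALL_GRACE      -> leave it (its own wake is still due)
#   idle, marker names T, now >  T + STALL_GRACE      -> kill + restart
#   idle, no marker, idle <  STALL_IDLE               -> leave it
#   idle, no marker, idle >= STALL_IDLE               -> kill + restart
#   session dead                                      -> no action here (heal_session restarts it)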
# Is an orchestrator process alive ANYWHERE? Conflict-safety: we must NEVER launch a second
# orchestrator that resumes the same conversation while one is already running (that double-resume is
# the likely cause of the "thinking blocks cannot be modified" crashes). The orchestrator may be
# running as a managed tmux session (cc-ci-orchestrator) OR as a plain terminal session the operator
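# started by hand (no flags). So: alive iff any `claude` process exists that is NOT one of the loop
# sessions or the upgrader job (identified by their --remote-control name), or the managed tmux
# session exists. Caveat: loops started with REMOTE_CONTROL!=1 carry no --remote-control name, so
# they are counted as an orchestrator and suppress the restart in heal_orchestrator.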
orchestrator_alive() {
local pid args
for pid in $(pgrep -x claude 2>/dev/null); do
args="$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null || true)"
# skip the loops + the one-shot upgrader job (matched by remote-control session NAME, not a
# stray path mention) — none of these is the orchestrator.
printf '%s' "$args" | grep -qE -- "--remote-control +'?cc-ci-(builder|adv|upgrader)'?" && continue
return 0 # a non-loop claude process => orchestrator (or operator) is alive
done
tmux has-session -t "$ORCH_SESSION" 2>/dev/null && return 0
return 1
}
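# Example (hypothetical argv with NUL bytes shown as spaces; assumes REMOTE_CONTROL=1 and the loop
# session names cc-ci-builder / cc-ci-adv that the regex above expects):
#   claude --remote-control cc-ci-builder ...        -> skipped (Builder loop)
#   claude --remote-control cc-ci-adv ...            -> skipped (Adversary loop)
#   claude --resume <session-id> ...                 -> returns 0 (treated as the orchestrator)
#   no other claude process and no $ORCH_SESSION     -> returns 1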
# Keep the orchestrator alive: restart it (via launch-orchestrator.sh, which resumes its session) ONLY
# when none is running; if it's the managed tmux session and wedged on a FATAL error, kill+restart.
heal_orchestrator() {
[[ "$WATCH_ORCHESTRATOR" == "1" ]] || return 0
[[ -x "$ORCH_LAUNCHER" ]] || return 0
if orchestrator_alive; then
if tmux has-session -t "$ORCH_SESSION" 2>/dev/null; then
local pane; pane="$(tmux capture-pane -pt "$ORCH_SESSION" 2>/dev/null | tail -25 || true)"
printf '%s\n' "$pane" | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool' && return 0
if printf '%s\n' "$pane" | grep -qiE "$FATAL_RE"; then
log "FATAL session-state error on orchestrator ($ORCH_SESSION) — kill + restart fresh"
tmux kill-session -t "$ORCH_SESSION" 2>/dev/null || true
"$ORCH_LAUNCHER" start >/dev/null 2>&1 || true
fi
fi
return 0
fi
log "orchestrator not running anywhere — restarting via $ORCH_LAUNCHER"
"$ORCH_LAUNCHER" start >/dev/null 2>&1 || true
}
# Detect handoffs against the PUSHED origin/main — i.e. exactly what the RECEIVER will pull — NOT the
# writer's local working tree. (Reading the working tree fired on a claim/verdict the writer hadn't
# pushed yet; the receiver then pulled a stale remote, saw "no formal gate", and a clarifying
# inbox round-trip ensued. Mirroring origin/main eliminates that race.) origin/main is the shared
# branch, so the commit log and both INBOX files are read from one clone's origin/main after a
# single best-effort fetch.
_wd_fetch_origin() { git -C "$1" fetch -q origin 2>/dev/null || true; }
_wd_show_pushed() { git -C "$1" show "origin/main:machine-docs/$2" 2>/dev/null || git -C "$1" show "origin/main:$2" 2>/dev/null || true; }
_wd_last_sha=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""
handoff_reset() { _wd_last_sha=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""; } # call on phase transition
# Signal handoffs off the loops' CONVENTIONAL COMMIT PREFIXES on origin/main — NOT by parsing
# free-form markdown prose (brittle). The loops consistently prefix every gate claim `claim(...)`
# and every verdict/finding `review(...)`. So: a new `claim(` commit pushed => ping the Adversary;
# a new `review(` commit => ping the Builder. Edge-triggered on the origin/main SHA (append-only —
# the loops never force-push), so it can't double-fire or mis-route. INBOX files are detected
# separately (which file changed routes the ping). All reads are of the PUSHED state (what the
# receiver pulls).
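# Example subjects (hypothetical) in ${_wd_last_sha}..origin/main:
#   "claim(1c): G2 reproducible build ready for verification"  -> ping Adversary
#   "review(1c): G2 PASS"                                      -> ping Builder
#   "fix(runner): quote test env paths"                        -> no ping
# Matching is a case-insensitive prefix test (grep -qiE '^claim' / '^review'), so a subject such
# as "Claimed ..." or "reviewed ..." would also trigger a ping.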
handoff_check() {
local head subjects adv_inbox builder_inbox h
_wd_fetch_origin "$BUILDER_DIR"
head="$(git -C "$BUILDER_DIR" rev-parse origin/main 2>/dev/null || true)"
if [[ -n "$head" ]]; then
if [[ -z "$_wd_last_sha" ]]; then
_wd_last_sha="$head" # baseline silently on first observation / restart
elif [[ "$head" != "$_wd_last_sha" ]]; then
subjects="$(git -C "$BUILDER_DIR" log --format='%s' "${_wd_last_sha}..origin/main" 2>/dev/null || true)"
if printf '%s\n' "$subjects" | grep -qiE '^claim'; then
log "handoff: new claim(...) commit on origin/main -> pinging Adversary"
ping_session "$ADV_SESSION" "watchdog ping: the Builder pushed a gate CLAIM (claim(...) commit). Pull and verify the claimed gate now."
fi
if printf '%s\n' "$subjects" | grep -qiE '^review'; then
log "handoff: new review(...) commit on origin/main -> pinging Builder"
ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary pushed a verdict/finding (review(...) commit). Pull REVIEW and act — proceed if it PASSes your gate, address it if it's a finding."
fi
_wd_last_sha="$head"
fi
fi
adv_inbox="$(_wd_show_pushed "$BUILDER_DIR" "ADVERSARY-INBOX.md")"
builder_inbox="$(_wd_show_pushed "$BUILDER_DIR" "BUILDER-INBOX.md")"
# INBOX side-channel (§6.1), detected on the pushed state. Receiver deletes after consuming =>
# absent on origin/main => re-arm so the next write re-pings.
if [[ -n "$adv_inbox" ]]; then
h="$(printf '%s' "$adv_inbox" | md5sum | awk '{print $1}')"
if [[ "$h" != "$_wd_adv_inbox_seen" ]]; then
log "handoff: ADVERSARY-INBOX.md new/changed (pushed) -> pinging Adversary"
ping_session "$ADV_SESSION" "watchdog ping: the Builder pushed machine-docs/ADVERSARY-INBOX.md — pull, read it, act, then delete the file (commit + push) to mark it consumed."
_wd_adv_inbox_seen="$h"
fi
else
_wd_adv_inbox_seen=""
fi
if [[ -n "$builder_inbox" ]]; then
h="$(printf '%s' "$builder_inbox" | md5sum | awk '{print $1}')"
if [[ "$h" != "$_wd_builder_inbox_seen" ]]; then
log "handoff: BUILDER-INBOX.md new/changed (pushed) -> pinging Builder"
ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary pushed machine-docs/BUILDER-INBOX.md — pull, read it, act, then delete the file (commit + push) to mark it consumed."
_wd_builder_inbox_seen="$h"
fi
else
_wd_builder_inbox_seen=""
fi
}
watchdog_loop() {
local idx pid status next
idx="$(cur_idx)"; pid="$(phase_id "$idx")"
log "watchdog up (phase=$pid [$((idx+1))/${#PHASES[@]}], seq='$(all_ids)', signal=${SIGNAL_INTERVAL}s, heavy=${WATCH_INTERVAL}s)"
local elapsed="$WATCH_INTERVAL"
while true; do
handoff_check
stall_check
if (( elapsed >= WATCH_INTERVAL )); then
elapsed=0
idx="$(cur_idx)"; pid="$(phase_id "$idx")"; status="$(phase_status "$idx")"
if phase_done "$status"; then
next=$((idx + 1))
if (( next < ${#PHASES[@]} )); then
log "PHASE $pid DONE (## DONE in $status) — auto-transitioning to $(phase_id "$next")."
stop_loops
echo "$next" > "$PHASE_IDX_FILE"
handoff_reset
start_loops
else
log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops; entire build finished."
stop_loops
printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; entire build finished.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
log "watchdog exiting."
exit 0
fi
else
heal_session builder "$BUILDER_SESSION" "$BUILDER_DIR"
heal_session adversary "$ADV_SESSION" "$ADV_DIR"
heal_orchestrator
fi
fi
sleep "$SIGNAL_INTERVAL"
elapsed=$(( elapsed + SIGNAL_INTERVAL ))
done
}
start_watchdog() {
if session_alive "$WATCHDOG_SESSION"; then log "watchdog already running"; return 0; fi
log "starting watchdog"
tmux new-session -d -s "$WATCHDOG_SESSION" -c "$PLAN_DIR" \
"exec >>'$LOG_DIR/watchdog.log' 2>&1; '$SELF' watchdog"
}
cmd_status() {
local idx pid; idx="$(cur_idx)"; pid="$(phase_id "$idx")"
echo " phase: $pid [$((idx+1))/${#PHASES[@]}] plan=$(phase_plan "$idx") status=$(phase_status "$idx")"
local s
for s in "$BUILDER_SESSION" "$ADV_SESSION" "$WATCHDOG_SESSION"; do
if session_alive "$s"; then echo " $s: RUNNING"; else echo " $s: stopped"; fi
done
if phase_done "$(phase_status "$idx")"; then echo " phase $pid: ## DONE"; else echo " phase $pid: in progress"; fi
[[ -f "$LOG_DIR/SEQUENCE-COMPLETE" ]] && echo " >>> $(cat "$LOG_DIR/SEQUENCE-COMPLETE")"
}
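# Example `status` output (hypothetical; assumes the default PHASES_SPEC with .phase-idx = 10, both
# loops running, the watchdog stopped, and no "## DONE" in STATUS-5.md yet):
#   phase: 5 [11/11] plan=plan-phase5-verify-upgrade-flow.md status=STATUS-5.md
#   <builder session>: RUNNING
#   <adversary session>: RUNNING
#   <watchdog session>: stopped
#   phase 5: in progress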
case "${1:-}" in
start)
preflight
# Fresh sequence: stop any running loops, reset to phase 0 (unless RESUME_PHASE=1 keeps the idx).
stop_loops
if [[ "${RESUME_PHASE:-}" != "1" ]]; then echo 0 > "$PHASE_IDX_FILE"; fi
rm -f "$LOG_DIR/SEQUENCE-COMPLETE"
start_loops
start_watchdog
log "started at phase $(phase_id "$(cur_idx)"). status: ./launch.sh status | attach: tmux attach -t $BUILDER_SESSION"
;;
watchdog) preflight; watchdog_loop ;;
status) cmd_status ;;
logs)
case "${2:-}" in
builder) tail -f "$LOG_DIR/$BUILDER_SESSION.log" ;;
adversary) tail -f "$LOG_DIR/$ADV_SESSION.log" ;;
watchdog) tail -f "$LOG_DIR/watchdog.log" ;;
*) die "usage: $0 logs builder|adversary|watchdog" ;;
esac
;;
stop)
stop_loops
if session_alive "$WATCHDOG_SESSION"; then log "killing $WATCHDOG_SESSION"; tmux kill-session -t "$WATCHDOG_SESSION" || true; fi
log "stopped."
;;
*)
cat <<EOF
cc-ci loop launcher (phase-aware)
  $0 start                             start the phase sequence at phase 0 + watchdog (stops any running loops first)
  $0 status                            show phase + session + DONE state
  $0 logs builder|adversary|watchdog   tail a log
  $0 stop                              stop both loops + watchdog
  $0 watchdog                          run supervision loop in foreground
Phase sequence (auto-transition on per-phase ## DONE; STOP after the last = manual gate):
  $(all_ids)
Env: LOOP_BACKEND=$LOOP_BACKEND LOOP_MODEL=${LOOP_MODEL:-<default>}
  claude:   CLAUDE_BIN=$CLAUDE_BIN REMOTE_CONTROL=$REMOTE_CONTROL
  opencode: OPENCODE_BIN=$OPENCODE_BIN OPENCODE_SERVER=$OPENCODE_SERVER
            (one shared server; each session attaches with --title; web UI: http://oc.commoninternet.net)
  WATCH_INTERVAL=${WATCH_INTERVAL}s SIGNAL_INTERVAL=${SIGNAL_INTERVAL}s
  PHASES_SPEC='$PHASES_SPEC'
  RESUME_PHASE=1 to keep the current phase index instead of resetting to 0.
EOF
;;
esac
# Thin wrapper — delegates everything to launch.py in the same directory.
exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch.py" "$@"