refactor: rewrite launchers as Python; add orchestrator JOURNAL.md

Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@"
All logic lives in the Python scripts (pure stdlib, no deps).

launch.py (loops + watchdog):
  Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog,
  handoff signalling, stall detection, heal_session, heal_orchestrator.
  Cleaner structure: config block → helpers → phase/kickoff/agent/healing/
  handoff/watchdog/main. LOOP_BACKEND + LOOP_MODEL switches throughout.

launch-orchestrator.py (orchestrator session):
  claude path: --resume <id> preserved (conversation survives reboots).
  opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients
  the new session; reads JOURNAL.md for context).
  STARTUP_PROMPT updated to reference JOURNAL.md on startup.

launch-upgrader.py (one-shot upgrade job):
  LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND /
  UPGRADER_MODEL. Both claude and opencode paths supported.

cc-ci-plan/JOURNAL.md (new orchestrator handoff file):
  Persistent across conversation resets. Documents the handoff format and
  carries the current session's summary: migration complete, phase 5 in
  progress (V3/V7 PASS), phase 4 deferred, open items for next session.

AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 17:50:09 +00:00
parent e0e5bf6e64
commit bca51071bd
8 changed files with 1067 additions and 781 deletions
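
The precedence rule stated in the commit message (LOOP_BACKEND / LOOP_MODEL over UPGRADER_BACKEND / UPGRADER_MODEL) can be sketched as below. This is a minimal illustration, not the launcher's code: the helper name `resolve` and its `env` parameter are hypothetical, and treating an empty value as unset is an assumption modelled on the `${VAR:-default}` form the removed Bash scripts used.

```python
import os


def resolve(primary, fallback, default, env=None):
    """Return the first non-empty value among env[primary], env[fallback], default.

    Hypothetical helper: an empty string counts as unset, like ${VAR:-default} in Bash.
    """
    env = os.environ if env is None else env
    return env.get(primary) or env.get(fallback) or default


# Variable names and defaults are the ones named in the commit.
backend = resolve("LOOP_BACKEND", "UPGRADER_BACKEND", "claude")
model = resolve("LOOP_MODEL", "UPGRADER_MODEL", "sonnet")
```

Passing a plain dict as `env` makes the rule easy to check without touching the real environment, e.g. `resolve("LOOP_BACKEND", "UPGRADER_BACKEND", "claude", env={"UPGRADER_BACKEND": "opencode"})`.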
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -16,18 +16,18 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t
 The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
 watches from outside.

-## On startup: announce yourself + report reboots
+## On startup: read the journal, announce yourself, report reboots

-**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
-the operator wants to know the supervising session is back (especially after a reboot, which kills
-this session). Include the current phase and the reboot count. Steps on startup:
-1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and `cc-ci-plan/launch.sh status`
-   (current phase + whether the loops/watchdog are running).
-2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
+**Every time you (the orchestrator) start or resume:**
+1. **Read `cc-ci-plan/JOURNAL.md`** — the most recent `## Session` entry is where the previous
+   session left off. This is the persistent handoff record; read it before anything else.
+2. Read `cc-ci-plan/REBOOTS.md` (count entries) and run `cc-ci-plan/launch.sh status`
+   (current phase + whether loops/watchdog are running).
+3. **`PushNotification`** (proactive): *"cc-ci orchestrator online — phase X, loops+watchdog
   running; N reboots logged (last <date>)."*
-3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the
-   loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
-   `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+4. If loops are down, relaunch: `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+5. **On handoff / end of session:** append a `## Session` block to `JOURNAL.md` summarising
+   what happened, current state, and open items (see format in that file).

 Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
 to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops +
--- a/cc-ci-plan/JOURNAL.md
+++ b/cc-ci-plan/JOURNAL.md
@@ -0,0 +1,82 @@
+# Orchestrator journal
+
+This file is the **persistent handoff record** for the cc-ci orchestrator. Every orchestrator
+session (whether Claude or opencode) reads this on startup and appends to it when handing off or
+when something noteworthy happens. It survives conversation resets — it is the memory that
+`--resume` can't provide for opencode, and a more readable supplement for Claude sessions.
+
+**On startup:** read this file before doing anything else. The most recent `## Session` entry
+is where the previous session left off. Carry that context forward.
+
+**On handoff / end of session:** append a `## Session` block (see format below) summarising
+what happened, the current state, and anything the next session needs to know.
+
+**On significant events mid-session:** append a `### Event` sub-entry (no need to wait for
+handoff).
+
+---
+
+## Format
+
+```markdown
+## Session YYYY-MM-DD HH:MM UTC — <backend> <model>
+**Left off:** <one sentence — what was the last thing done>
+**Phase / loop state:** <phase X [N/11], loops RUNNING/stopped, cc-ci healthy/issue>
+**Open items:** <bullet list of anything the next session needs to act on, or "none">
+**Notes:** <anything surprising, a decision made, a known blocker, etc.>
+
+### Event HH:MM — <short label>
+<brief note>
+```
+
+---
+
+## Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6
+
+**Left off:** Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public
+`168.119.126.100`, tailnet `cc-ci-orchestrator-1` @ `100.84.190.30`). The old Incus VM
+(`100.116.55.106`) is still on the tailnet — cold standby, not yet deleted.
+
+**Phase / loop state:** Phases 1c–1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11]
+(upgrade-flow verify) in progress — loops running, actively verifying the `!testme`
+end-to-end flow on the new Hetzner cc-ci server.
+
+**Open items:**
+- Phase 5 is in progress — loops need to finish V1–V9 and write `## DONE` to STATUS-5.md.
+- Phase 4 (final review/polish) was deliberately **skipped** this session — it is queued
+  at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset.
+- Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n +
+  ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE.
+- Old Incus orchestrator VM (`cc-ci-orchestrator`, `100.116.55.106`) is still running —
+  stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at
+  `/srv/incus-terraform-nix-vm-creator/terraform-secrets/`.
+- DNS: `oc.commoninternet.net` A record → `100.84.190.30` still needs adding (operator step).
+
+**Notes:**
+- `cc-ci-loops.service` is **enabled** and wired with `reboot-log.sh` ExecStartPre — a reboot
+  is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1.
+- The cc-ci **server** also moved to Hetzner (server 134485294, `ssh cc-ci` →
+  `100.95.31.88`). It has authenticated Docker Hub pulls and 150 GB disk — the old OOM /
+  disk-starvation / rate-limit issues are gone.
+- All recipe mirrors currently reconcile correctly; no stale open PRs observed.
+- `opencode` v1.15.13 installed at `/home/loops/.local/bin/opencode`. Tinfoil API key is in
+  `.testenv` as `TINFOIL_API_KEY`. Backend switch: `LOOP_BACKEND=opencode
+  LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+- Launcher scripts rewritten to Python (`launch.py`, `launch-orchestrator.py`,
+  `launch-upgrader.py`); bash wrappers are now one-liners that `exec python3 <script> "$@"`.
+
+### Event 03:13 — migrated from old Incus VM to Hetzner
+Loops were started manually during staging (not by the service); first systemd-managed
+boot was later this session. `cc-ci-loops.service` now enabled.
+
+### Event 05:23 — phase 3 (results-UX) completed
+All R1–R8 Adversary-verified, no VETO. Watchdog auto-advanced to phase 4.
+
+### Event 13:22 — phase 4 paused, jumped to phase 5
+Operator deferred phase 4 (weekly Opus credits exhausted). Phase idx manually set to 10
+(phase 5). Loops restarted on Sonnet.
+
+### Event 17:29 — loops stopped pending restart on different model
+Operator paused loops to reconfigure backend (opencode/tinfoil exploration). Phase 5
+[11/11] was in progress — loops had verified V1/V2/V3/V7 (custom-html-tiny upgrade GREEN).
+Phase idx = 10 (phase 5), loops stopped, watchdog stopped.
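
The journal's startup rule (read the most recent `## Session` entry first) could be automated along these lines. This is only a sketch under stated assumptions: the function name `latest_session` is hypothetical, and the pattern assumes every real entry starts with `## Session` followed by a `YYYY-MM-DD` date, which also keeps the placeholder heading inside the Format block from matching.

```python
import re

# A real entry starts with "## Session" plus a numeric date; the template line
# "## Session YYYY-MM-DD ..." in the Format block has no digits and is skipped.
SESSION_RE = re.compile(r"^## Session \d{4}-\d{2}-\d{2}\b.*$", re.MULTILINE)


def latest_session(journal_text):
    """Return the last '## Session' block, including its '### Event' sub-entries.

    Returns an empty string when the journal has no dated session entry.
    """
    matches = list(SESSION_RE.finditer(journal_text))
    if not matches:
        return ""
    return journal_text[matches[-1].start():].strip()
```

A caller would typically pass `Path("cc-ci-plan/JOURNAL.md").read_text()` and hand the result to the new session as context.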
--- a/cc-ci-plan/launch-orchestrator.py
+++ b/cc-ci-plan/launch-orchestrator.py
@@ -0,0 +1,189 @@
+#!/usr/bin/env python3
+"""
+cc-ci orchestrator launcher — start/resume the orchestrator session in tmux.
+
+The orchestrator is the long-lived supervisory session: it watches the Builder/Adversary
+loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and owns
+the VM-level fallback. It is SEPARATE from the loops that launch.py manages.
+
+Usage:
+  launch-orchestrator.py start     resume the persistent session (default)
+  launch-orchestrator.py fresh     start a NEW session (no --resume)
+  launch-orchestrator.py stop      kill the tmux session (conversation persists on disk)
+  launch-orchestrator.py status    show session state
+  launch-orchestrator.py attach    tmux attach to the session
+
+Env:
+  LOOP_BACKEND   claude (default) | opencode
+  LOOP_MODEL     model flag, e.g. "sonnet" or "tinfoil/deepseek-v4-pro"
+
+  claude backend:
+    CLAUDE_BIN          claude
+    REMOTE_CONTROL      1 (viewable at claude.ai/code)
+    ORCH_SESSION_ID     override the resume id (else read from $ID_FILE)
+    ORCH_ID_FILE        $LOG_DIR/.orchestrator-session-id
+    ORCH_STARTUP_PROMPT startup nudge injected as the first turn after --resume
+
+  opencode backend:
+    OPENCODE_BIN        /home/loops/.local/bin/opencode
+    OPENCODE_SERVER     http://127.0.0.1:4096
+    (no --resume equivalent; STARTUP_PROMPT is sent as the initial message;
+     the session title in the web UI is the SESSION name)
+"""
+
+import os, sys, subprocess
+from datetime import datetime
+from pathlib import Path
+
+# ── config ────────────────────────────────────────────────────────────────────
+
+SESSION  = os.environ.get("ORCH_SESSION", "cc-ci-orchestrator-vm")
+WORKDIR  = os.environ.get("ORCH_DIR",    "/srv/cc-ci")
+LOG_DIR  = os.environ.get("LOG_DIR",     "/srv/cc-ci/.cc-ci-logs")
+
+BACKEND    = os.environ.get("LOOP_BACKEND") or "claude"  # empty counts as unset
+LOOP_MODEL = os.environ.get("LOOP_MODEL",   "")
+
+# claude-specific
+CLAUDE_BIN     = os.environ.get("CLAUDE_BIN",   "claude")
+CLAUDE_FLAGS   = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")
+REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL", "1") == "1"
+DEFAULT_ID     = "c746050a-af11-409d-87ba-c05268e2e5d1"
+ID_FILE        = os.environ.get("ORCH_ID_FILE", f"{LOG_DIR}/.orchestrator-session-id")
+STARTUP_PROMPT = os.environ.get("ORCH_STARTUP_PROMPT", (
+    "STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a "
+    "reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run "
+    "cc-ci-plan/launch.py status, then send a proactive PushNotification that you are online "
+    "with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops "
+    "+ watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.py start if not). "
+    "Also read cc-ci-plan/JOURNAL.md for recent context before resuming supervision."
+))
+
+# opencode-specific
+OPENCODE_BIN    = os.environ.get("OPENCODE_BIN",    "/home/loops/.local/bin/opencode")
+OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER", "http://127.0.0.1:4096")
+
+# ── helpers ───────────────────────────────────────────────────────────────────
+
+def log(msg):
+    ts = datetime.now().strftime("%H:%M:%S")
+    print(f"[orchestrator {ts}] {msg}", flush=True)
+
+def die(msg):
+    log(f"ERROR: {msg}")
+    sys.exit(1)
+
+def session_alive():
+    return subprocess.run(
+        ["tmux", "has-session", "-t", SESSION], capture_output=True
+    ).returncode == 0
+
+def resume_id():
+    sid = os.environ.get("ORCH_SESSION_ID")
+    if sid:
+        return sid
+    try:
+        v = Path(ID_FILE).read_text().strip()
+        return v or DEFAULT_ID
+    except FileNotFoundError:
+        return DEFAULT_ID
+
+# ── launch ────────────────────────────────────────────────────────────────────
+
+def start(mode="resume"):
+    import shutil
+    if not shutil.which("tmux"):
+        die("tmux not found")
+    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
+
+    if session_alive():
+        log(f"{SESSION} already running — leaving it (use 'stop' first to relaunch)")
+        return
+
+    model_flag = f"--model '{LOOP_MODEL}'" if LOOP_MODEL else ""
+
+    if BACKEND == "claude":
+        if not shutil.which(CLAUDE_BIN):
+            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
+        if not Path(ID_FILE).exists():
+            Path(ID_FILE).write_text(DEFAULT_ID)
+
+        rc     = f"--remote-control '{SESSION}'" if REMOTE_CONTROL else ""
+        resume = f"--resume '{resume_id()}'" if mode == "resume" else ""
+        prompt = "'" + STARTUP_PROMPT.replace("'", "'\\''") + "'" if STARTUP_PROMPT else ""  # shell-escape quotes
+        cmd    = f"{CLAUDE_BIN} {resume} {rc} {model_flag} {CLAUDE_FLAGS} {prompt}"
+        detail = f"resume={resume_id()}" if mode == "resume" else "fresh"
+        log(f"starting {SESSION} (backend=claude, {detail}, model={LOOP_MODEL or 'default'})")
+
+    elif BACKEND == "opencode":
+        if not Path(OPENCODE_BIN).exists():
+            die(f"opencode not found at {OPENCODE_BIN}")
+        # No --resume equivalent in opencode; STARTUP_PROMPT orients the new session.
+        # The session title in the web UI identifies it as the orchestrator.
+        prompt = (STARTUP_PROMPT or (
+            "You are the cc-ci orchestrator. Read /srv/cc-ci/AGENTS.md and "
+            "cc-ci-plan/JOURNAL.md for context, then resume supervising the loops."
+        )).replace("'", "'\\''")  # escape single quotes; prompt is single-quoted into cmd below
+        cmd = (
+            f"set -a; . /srv/cc-ci/.testenv; set +a; "
+            f"{OPENCODE_BIN} {model_flag} run --attach '{OPENCODE_SERVER}' "
+            f"--title '{SESSION}' '{prompt}'"
+        )
+        log(f"starting {SESSION} (backend=opencode, model={LOOP_MODEL or 'default'})")
+        log(f"  visible at http://oc.commoninternet.net (tailnet only)")
+    else:
+        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
+
+    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION, "-c", WORKDIR, cmd])
+    subprocess.run(["tmux", "pipe-pane", "-o", "-t", SESSION,
+                    f"cat >> '{LOG_DIR}/{SESSION}.log'"])
+    log(f"started. attach: tmux attach -t {SESSION}")
+
+# ── main ──────────────────────────────────────────────────────────────────────
+
+def main():
+    cmd = sys.argv[1] if len(sys.argv) > 1 else "start"
+
+    if cmd == "start":
+        start("resume")
+    elif cmd == "fresh":
+        start("fresh")
+    elif cmd == "stop":
+        if session_alive():
+            log(f"killing {SESSION}")
+            subprocess.run(["tmux", "kill-session", "-t", SESSION])
+        else:
+            log(f"{SESSION} not running")
+    elif cmd == "status":
+        state = "RUNNING" if session_alive() else "stopped"
+        log(f"{SESSION}: {state}")
+        subprocess.run(
+            f"ps -eo pid,etime,args | grep '[r]emote-control {SESSION}' || true",
+            shell=True)
+        if BACKEND == "claude":
+            log(f"resume id: {resume_id()}  (file: {ID_FILE})")
+        log(f"backend: {BACKEND}  model: {LOOP_MODEL or '<default>'}")
+    elif cmd == "attach":
+        os.execvp("tmux", ["tmux", "attach", "-t", SESSION])
+    else:
+        backend_note = (
+            "claude:   --resume preserves conversation across reboots; viewable at claude.ai/code\n"
+            "  opencode: fresh session each launch (no --resume); viewable at http://oc.commoninternet.net"
+        )
+        print(f"""cc-ci orchestrator launcher
+
+  launch-orchestrator.py start    resume the persistent session (default)
+  launch-orchestrator.py fresh    start a new session (no --resume)
+  launch-orchestrator.py stop     kill the tmux session
+  launch-orchestrator.py status   show session state
+  launch-orchestrator.py attach   tmux attach
+
+Backend: {BACKEND}  (LOOP_BACKEND env var)
+Model:   {LOOP_MODEL or '<backend default>'}  (LOOP_MODEL env var)
+Session: {SESSION}  cwd={WORKDIR}
+  {backend_note}
+""")
+
+
+if __name__ == "__main__":
+    main()
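
The launcher hands tmux a single shell string, so every interpolated value has to survive shell parsing. A minimal sketch of one way to build such a string with the standard library's `shlex.quote` follows. The helper name `build_claude_cmd` and its parameters are hypothetical; the flags (`--resume`, `--remote-control`, `--model`) are the ones used in the diff above.

```python
import shlex


def build_claude_cmd(claude_bin, session, resume_id=None, model="", flags="", prompt=""):
    """Assemble a shell-safe command string for `tmux new-session ... <cmd>`.

    Hypothetical helper: every value is quoted, so a prompt containing quotes,
    semicolons, or `$` still arrives as one literal argument.
    """
    parts = [shlex.quote(claude_bin)]
    if resume_id:
        parts += ["--resume", shlex.quote(resume_id)]
    parts += ["--remote-control", shlex.quote(session)]
    if model:
        parts += ["--model", shlex.quote(model)]
    # `flags` is a shell-style string (like CLAUDE_FLAGS); split it, then re-quote each piece.
    parts += [shlex.quote(f) for f in shlex.split(flags)]
    if prompt:
        parts.append(shlex.quote(prompt))
    return " ".join(parts)
```

Round-tripping the result through `shlex.split` is a quick way to confirm that the argument boundaries are what was intended.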
--- a/cc-ci-plan/launch-orchestrator.sh
+++ b/cc-ci-plan/launch-orchestrator.sh
@@ -1,118 +1,3 @@
 #!/usr/bin/env bash
-#
-# launch-orchestrator.sh — start/resume the cc-ci ORCHESTRATOR session in tmux under remote-control.
-#
-# The orchestrator (see /srv/cc-ci/AGENTS.md) is the long-lived SUPERVISORY session: it watches the
-# Builder/Adversary loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and
-# owns the VM-level fallback. It is SEPARATE from the loops that launch.sh manages — this script only
-# brings the orchestrator back (e.g. after a reboot, which kills the tmux server and every session in
-# it). The conversation itself survives on disk across exits/reboots; remote-control only stays
-# connected while the process is alive, so recovery = relaunch the process and re-attach by --resume.
-#
-# Naming: tmux session AND remote-control name are both "cc-ci-orchestrator-vm" (the -vm suffix
-# distinguishes it from the repo name cc-ci-orchestrator); the loop sessions are cc-ci-builder /
-# cc-ci-adv / cc-ci-watchdog.
-#
-# Usage:
-#   ./launch-orchestrator.sh start     # resume the persistent orchestrator session (DEFAULT)
-#   ./launch-orchestrator.sh fresh     # start a NEW orchestrator session (no --resume)
-#   ./launch-orchestrator.sh status    # show tmux + remote-control state
-#   ./launch-orchestrator.sh attach    # tmux attach to the session (Ctrl-b d to detach)
-#   ./launch-orchestrator.sh stop      # kill the tmux session (conversation persists on disk)
-#
-# The persistent session id is read from $ID_FILE (seeded on first run with DEFAULT_ID). A Claude
-# session keeps the SAME id across --resume, so this stays valid across reboots. To point the script
-# at a different session, edit that file or export ORCH_SESSION_ID.
-
-set -euo pipefail
-
-# ----- config -------------------------------------------------------------
-SESSION="${ORCH_SESSION:-cc-ci-orchestrator-vm}"     # tmux session name == remote-control name
-WORKDIR="${ORCH_DIR:-/srv/cc-ci}"                    # orchestrator cwd (its claude project dir)
-CLAUDE_BIN="${CLAUDE_BIN:-claude}"
-CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
-# REMOTE_CONTROL=1 → --remote-control session, viewable/steerable at claude.ai/code. Needs the box
-# logged into the claude.ai account. =0 for a plain local interactive session.
-REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
-LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
-ID_FILE="${ORCH_ID_FILE:-$LOG_DIR/.orchestrator-session-id}"
-DEFAULT_ID="c746050a-af11-409d-87ba-c05268e2e5d1"    # the orchestrator session as of 2026-05-31 (Hetzner)
-# Startup nudge injected as the resumed session's first turn, so an AUTO-launched orchestrator (e.g.
-# cc-ci-loops.service ExecStartPost after a reboot) actually RUNS its AGENTS.md startup routine —
-# announce itself + report reboots — instead of resuming silently and waiting. Set empty to disable.
-# Must contain NO single quotes (it is single-quoted into the tmux command).
-STARTUP_PROMPT="${ORCH_STARTUP_PROMPT-STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run cc-ci-plan/launch.sh status, then send a proactive PushNotification that you are online with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops + watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start if not). Then resume supervising.}"
-# --------------------------------------------------------------------------
-
-log()  { printf '[orchestrator %(%H:%M:%S)T] %s\n' -1 "$*"; }
-die()  { log "ERROR: $*"; exit 1; }
-session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
-
-preflight() {
-  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
-  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
-  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
-  mkdir -p "$LOG_DIR"
-  [[ -f "$ID_FILE" ]] || echo "$DEFAULT_ID" > "$ID_FILE"
-}
-
-resume_id() { echo "${ORCH_SESSION_ID:-$(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")}"; }
-
-# Launch claude in a detached tmux session. $1=resume ("resume"|"fresh").
-start() {
-  local mode="${1:-resume}"
-  preflight
-  if session_alive; then
-    log "$SESSION already running — leaving it (use '$0 stop' first to relaunch)"
-    return 0
-  fi
-  local rc="" resume="" id=""
-  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
-  if [[ "$mode" == "resume" ]]; then
-    id="$(resume_id)"
-    [[ -n "$id" ]] && resume="--resume '$id'"
-    log "starting $SESSION (resume=$id, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
-  else
-    log "starting $SESSION FRESH (no resume, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
-  fi
-  # Startup nudge as a POSITIONAL prompt (not stdin — stdin would force print mode and break
-  # remote-control). On --resume this appends as the session's next turn, triggering the AGENTS.md
-  # startup routine (announce + report reboots). Empty STARTUP_PROMPT => clean resume, no nudge.
-  local prompt_arg=""
-  [[ -n "$STARTUP_PROMPT" ]] && prompt_arg="'$STARTUP_PROMPT'"
-  tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
-    "$CLAUDE_BIN $resume $rc $CLAUDE_FLAGS $prompt_arg"
-  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
-  log "started. status: $0 status | attach: tmux attach -t $SESSION"
-}
-
-case "${1:-start}" in
-  start)  start resume ;;
-  fresh)  start fresh ;;
-  stop)
-    if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi
-    ;;
-  status)
-    if session_alive; then
-      log "$SESSION: RUNNING"
-      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
-    else
-      log "$SESSION: stopped"
-    fi
-    log "resume id: $(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")  (file: $ID_FILE)"
-    ;;
-  attach) exec tmux attach -t "$SESSION" ;;
-  *)
-    cat <<EOF
-cc-ci orchestrator launcher
-
-  $0 start    resume the persistent orchestrator session in tmux + remote-control (default)
-  $0 fresh    start a NEW orchestrator session (no --resume)
-  $0 status   show tmux + remote-control state and the resume id
-  $0 attach   tmux attach to the session
-  $0 stop     kill the tmux session (conversation persists on disk)
-
-Env: SESSION=$SESSION  WORKDIR=$WORKDIR  REMOTE_CONTROL=$REMOTE_CONTROL  CLAUDE_BIN=$CLAUDE_BIN
-EOF
-    ;;
-esac
+# Thin wrapper — delegates everything to launch-orchestrator.py in the same directory.
+exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch-orchestrator.py" "$@"
--- a/cc-ci-plan/launch-upgrader.py
+++ b/cc-ci-plan/launch-upgrader.py
@@ -0,0 +1,198 @@
+#!/usr/bin/env python3
+"""
+cc-ci upgrader launcher — one-shot weekly recipe-upgrade job agent.
+
+The upgrader runs /upgrade-all to completion, then stops and stays idle so the
+run + summary remain viewable in the web UI. The next weekly run starts a fresh
+session (start clears any idle/finished session).
+
+Usage:
+  launch-upgrader.py start    use-or-create: leave an in-flight run alone, else start fresh
+  launch-upgrader.py fresh    always kill any existing session and start fresh
+  launch-upgrader.py stop     kill the session
+  launch-upgrader.py status   show session state
+  launch-upgrader.py attach   tmux attach to the session
+
+Env:
+  LOOP_BACKEND     claude (default) | opencode   — also accepts UPGRADER_BACKEND
+  LOOP_MODEL       model flag (overrides UPGRADER_MODEL)
+  UPGRADER_MODEL   sonnet (default for claude) | tinfoil/deepseek-v4-pro (opencode example)
+  UPGRADER_ARGS    extra args passed to /upgrade-all (e.g. "n8n ghost", "--dry-run")
+
+  claude backend:
+    CLAUDE_BIN, CLAUDE_FLAGS, REMOTE_CONTROL
+  opencode backend:
+    OPENCODE_BIN, OPENCODE_SERVER
+"""
+
+import os, sys, subprocess, re
+from datetime import datetime
+from pathlib import Path
+
+# ── config ────────────────────────────────────────────────────────────────────
+
+SESSION = os.environ.get("UPGRADER_SESSION", "cc-ci-upgrader")
+WORKDIR = os.environ.get("UPGRADER_DIR",     "/srv/cc-ci")
+LOG_DIR = os.environ.get("LOG_DIR",          "/srv/cc-ci/.cc-ci-logs")
+
+# LOOP_BACKEND / LOOP_MODEL take precedence (unified control from the operator).
+BACKEND = os.environ.get("LOOP_BACKEND") or os.environ.get("UPGRADER_BACKEND") or "claude"
+MODEL   = os.environ.get("LOOP_MODEL")   or os.environ.get("UPGRADER_MODEL")   or "sonnet"
+
+CLAUDE_BIN     = os.environ.get("CLAUDE_BIN",   "claude")
+CLAUDE_FLAGS   = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")
+REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL", "1") == "1"
+
+OPENCODE_BIN    = os.environ.get("OPENCODE_BIN",    "/home/loops/.local/bin/opencode")
+OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER", "http://127.0.0.1:4096")
+
+UPGRADER_ARGS = os.environ.get("UPGRADER_ARGS", "")
+
+# ── helpers ───────────────────────────────────────────────────────────────────
+
+def log(msg):
+    ts = datetime.now().strftime("%H:%M:%S")
+    print(f"[upgrader {ts}] {msg}", flush=True)
+
+def die(msg):
+    log(f"ERROR: {msg}")
+    sys.exit(1)
+
+def session_alive():
+    return subprocess.run(
+        ["tmux", "has-session", "-t", SESSION], capture_output=True
+    ).returncode == 0
+
+def session_busy():
+    """True while a turn is actively in flight (not idle/finished/wedged)."""
+    r = subprocess.run(["tmux", "capture-pane", "-pt", SESSION],
+                       capture_output=True, text=True)
+    pane = r.stdout if r.returncode == 0 else ""
+    return bool(re.search(r"esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool", pane))
+
+def kill_session():
+    subprocess.run(["tmux", "kill-session", "-t", SESSION], capture_output=True)
+
+# ── kickoff prompt ────────────────────────────────────────────────────────────
+
+def build_kickoff():
+    args_note = f" with arguments: {UPGRADER_ARGS}" if UPGRADER_ARGS else ""
+    return f"""\
+*** cc-ci UPGRADER — weekly recipe-upgrade job ***
+You are the cc-ci Upgrader: a ONE-SHOT job agent, NOT a perpetual loop. Run the
+recipe-upgrade sequence to completion, then STOP. Your cwd is {WORKDIR}; reach the CI
+server with `ssh cc-ci`; creds are in {WORKDIR}/.testenv; skills in {WORKDIR}/.claude/skills/.
+
+DO THIS:
+1. Invoke the /upgrade-all skill in DEFAULT mode{args_note}
+   (read {WORKDIR}/.claude/skills/upgrade-all/SKILL.md for the full procedure). It surveys
+   every enrolled recipe and, for each upgradeable one, runs /recipe-upgrade in DEFAULT
+   mode — recipe PR only, verified by posting `!testme` on the PR (results visible in the
+   PR, iterate up to 3x). A genuinely stale test gets an explanatory PR COMMENT, never a
+   test edit.
+2. Process recipes via per-recipe SUBAGENTS so your own context stays light. If your
+   context usage climbs (~80%), run /compact before continuing.
+3. Write + push the weekly summary (the PR list is the actionable output for the operator).
+4. WHEN THE RUN IS COMPLETE: STOP. Print the final summary (lead with the PR list) and an
+   `UPGRADE RUN COMPLETE` line, then go idle. Do NOT loop, do NOT re-run, and do NOT kill
+   your own session — leave it up so the operator can review the output in the web UI.
+   Next week's run starts a fresh session (the launcher clears this idle one).
+
+GUARDRAILS: NEVER merge any PR. NEVER weaken a test. DEFAULT mode only — do NOT pass
+--with-tests (updating cc-ci tests is the operator's per-recipe opt-in). Single-writer:
+dedicated branches + separate clones, never push main, never touch the build loops'
+/cc-ci /cc-ci-adv clones. The shared Swarm is stateful — go sequentially.
+"""
+
+# ── launch ────────────────────────────────────────────────────────────────────
+
+def start(mode="use-or-create"):
+    import shutil
+    if not shutil.which("tmux"):
+        die("tmux not found")
+    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
+
+    if session_alive():
+        if mode == "use-or-create" and session_busy():
+            log(f"{SESSION} already running a job (busy) — leaving it")
+            return
+        log(f"{SESSION} exists but idle/stale (or fresh requested) — killing it first")
+        kill_session()
+        import time; time.sleep(1)
+
+    kf = Path(LOG_DIR) / f".kickoff-{SESSION}.txt"
+    kf.write_text(build_kickoff())
+
+    model_flag = f"--model '{MODEL}'" if MODEL else ""
+    log(f"starting {SESSION} (backend={BACKEND}, model={MODEL}, args='{UPGRADER_ARGS or '<none>'}')")
+
+    if BACKEND == "claude":
+        if not shutil.which(CLAUDE_BIN):
+            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
+        rc  = f"--remote-control '{SESSION}'" if REMOTE_CONTROL else ""
+        cmd = f"{CLAUDE_BIN} {rc} {model_flag} {CLAUDE_FLAGS} \"$(cat '{kf}')\""
+
+    elif BACKEND == "opencode":
+        if not Path(OPENCODE_BIN).exists():
+            die(f"opencode not found at {OPENCODE_BIN}")
+        cmd = (
+            f"set -a; . /srv/cc-ci/.testenv; set +a; "
+            f"{OPENCODE_BIN} {model_flag} run --attach '{OPENCODE_SERVER}' "
+            f"--title '{SESSION}' \"$(cat '{kf}')\""
+        )
+        log(f"  visible at http://oc.commoninternet.net (tailnet only)")
+    else:
+        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
+
+    subprocess.run(["tmux", "new-session", "-d", "-s", SESSION, "-c", WORKDIR, cmd])
+    subprocess.run(["tmux", "pipe-pane", "-o", "-t", SESSION,
+                    f"cat >> '{LOG_DIR}/{SESSION}.log'"])
+    log(f"started. attach: tmux attach -t {SESSION}  log: {LOG_DIR}/{SESSION}.log")
+
+# ── main ──────────────────────────────────────────────────────────────────────
+
+def main():
+    cmd = sys.argv[1] if len(sys.argv) > 1 else "start"
+
+    if cmd == "start":
+        start("use-or-create")
+    elif cmd == "fresh":
+        start("fresh")
+    elif cmd == "stop":
+        if session_alive():
+            log(f"killing {SESSION}")
+            kill_session()
+        else:
+            log(f"{SESSION} not running")
+    elif cmd == "status":
+        if session_alive():
+            busy = "busy" if session_busy() else "idle/finishing"
+            log(f"{SESSION}: RUNNING ({busy})")
+            subprocess.run(
+                f"ps -eo pid,etime,args | grep '[r]emote-control {SESSION}' || true",
+                shell=True)
+        else:
+            log(f"{SESSION}: stopped")
+        log(f"backend: {BACKEND}  model: {MODEL}  args: '{UPGRADER_ARGS or '<none>'}'")
+    elif cmd == "attach":
+        os.execvp("tmux", ["tmux", "attach", "-t", SESSION])
+    else:
+        print(f"""cc-ci upgrader launcher — one-shot weekly recipe-upgrade job
+
+  launch-upgrader.py start    use-or-create (leave busy run alone, else start fresh)
+  launch-upgrader.py fresh    always kill existing + start fresh
+  launch-upgrader.py stop     kill the session
+  launch-upgrader.py status   show session state
+  launch-upgrader.py attach   tmux attach
+
+Backend: {BACKEND}  (LOOP_BACKEND or UPGRADER_BACKEND env var)
+Model:   {MODEL}  (LOOP_MODEL or UPGRADER_MODEL env var)
+Args:    {UPGRADER_ARGS or '<none>'}  (UPGRADER_ARGS env var, passed to /upgrade-all)
+
+claude:   viewable at claude.ai/code
+opencode: viewable at http://oc.commoninternet.net  server={OPENCODE_SERVER}
+""")
+
+
+if __name__ == "__main__":
+    main()
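
The busy check in `session_busy()` is a pattern match over captured pane text, which makes it easy to exercise without tmux. The sketch below isolates that logic. The names `BUSY_RE` and `pane_is_busy` are hypothetical, the markers are the ones from the diff (written as a character class for the spinner frames), and the sample pane strings in the usage note are made up for illustration.

```python
import re

# Markers taken from session_busy(): the interrupt hint, any Braille spinner
# frame, or a tool-run banner. Any one of them means a turn is in flight.
BUSY_RE = re.compile(r"esc to interrupt|[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]|Running tool")


def pane_is_busy(pane_text):
    """Return True if the captured pane text shows an in-flight turn."""
    return bool(BUSY_RE.search(pane_text))
```

For instance, `pane_is_busy("⠹ Thinking (esc to interrupt)")` is true, while a pane showing only `UPGRADE RUN COMPLETE` and a prompt is treated as idle, which is the case where `start` clears the session.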
--- a/cc-ci-plan/launch-upgrader.sh
+++ b/cc-ci-plan/launch-upgrader.sh
@@ -1,151 +1,3 @@
 #!/usr/bin/env bash
-#
-# launch-upgrader.sh — spin up the cc-ci UPGRADER agent in tmux under remote-control.
-#
-# The Upgrader is a ONE-SHOT job agent (not a perpetual loop like the Builder/Adversary): it runs the
-# weekly recipe-upgrade sequence — the /upgrade-all skill in DEFAULT mode — to completion, then STOPS
-# and stays idle (it does NOT self-terminate) so the run + summary remain viewable/steerable at
-# claude.ai/code exactly like the Builder, instead of being buried in headless cron output. The next
-# weekly run starts a fresh session: `start` leaves an in-flight run alone but clears a finished/idle
-# (or wedged) session and starts clean. The weekly cron (Sat 03:00 UTC, once cc-ci is built — see
-# [[cc-ci-upgrade-all-cron]]) invokes `launch-upgrader.sh start`.
-#
-# Naming: tmux session AND remote-control name are both "cc-ci-upgrader" (matching
-# cc-ci-builder / cc-ci-adv / cc-ci-watchdog / cc-ci-orchestrator).
-#
-# Usage:
-#   ./launch-upgrader.sh start          # use-or-create: if a run is actively in flight leave it,
-#                                        #   else (no session / idle-stale) kill any stale + start fresh
-#   ./launch-upgrader.sh fresh          # always kill any existing + start a fresh run
-#   ./launch-upgrader.sh status | attach | stop
-#
-# Env:
-#   UPGRADER_ARGS=""   passthrough args to /upgrade-all (e.g. "--dry-run", "ghost n8n"); default none
-#                      = full default fleet run. NEVER pass --with-tests here (the cron must not
-#                      auto-edit tests; that's the operator's per-recipe opt-in).
-set -euo pipefail
-
-SESSION="${UPGRADER_SESSION:-cc-ci-upgrader}"        # tmux session name == remote-control name
-WORKDIR="${UPGRADER_DIR:-/srv/cc-ci}"                # cwd: where .claude/skills/ + .testenv live
-
-# Backend selection — mirrors launch.sh. LOOP_BACKEND overrides for consistency.
-UPGRADER_BACKEND="${LOOP_BACKEND:-${UPGRADER_BACKEND:-claude}}"  # "claude" or "opencode"
-# Model: LOOP_MODEL > UPGRADER_MODEL > backend default (sonnet for claude, provider/model for opencode).
-UPGRADER_MODEL="${LOOP_MODEL:-${UPGRADER_MODEL:-sonnet}}"
-
-CLAUDE_BIN="${CLAUDE_BIN:-claude}"
-CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
-OPENCODE_BIN="${OPENCODE_BIN:-/home/loops/.local/bin/opencode}"
-OPENCODE_SERVER="${OPENCODE_SERVER:-http://127.0.0.1:4096}"
-REMOTE_CONTROL="${REMOTE_CONTROL:-1}"                # 1 => --remote-control / opencode web
-LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
-UPGRADER_ARGS="${UPGRADER_ARGS:-}"
-
-log()  { printf '[upgrader %(%H:%M:%S)T] %s\n' -1 "$*"; }
-die()  { log "ERROR: $*"; exit 1; }
-session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
-# "actively working" = claude shows interrupt hint; opencode shows spinner/Running tool.
-session_busy() { tmux capture-pane -pt "$SESSION" 2>/dev/null | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool'; }
-
-preflight() {
-  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
-  case "$UPGRADER_BACKEND" in
-    claude)   command -v "$CLAUDE_BIN"   >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)" ;;
-    opencode) command -v "$OPENCODE_BIN" >/dev/null 2>&1 || die "opencode not found (set OPENCODE_BIN)"
-              [[ -n "$OPENCODE_HOST" ]] || die "could not detect tailscale IP for OPENCODE_HOST" ;;
-    *) die "unknown UPGRADER_BACKEND '$UPGRADER_BACKEND' — use 'claude' or 'opencode'" ;;
-  esac
-  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
-  [[ -d "$WORKDIR/.claude/skills/upgrade-all" ]] || die "upgrade-all skill not found under $WORKDIR/.claude/skills"
-  mkdir -p "$LOG_DIR"
-}
-
-write_kickoff() {
-  local kf="$LOG_DIR/.kickoff-$SESSION.txt"
-  cat > "$kf" <<KICK
-*** cc-ci UPGRADER — weekly recipe-upgrade job ***
-You are the cc-ci Upgrader: a ONE-SHOT job agent, NOT a perpetual loop. Run the recipe-upgrade
-sequence to completion, then STOP. Your cwd is ${WORKDIR}; reach the CI server with \`ssh cc-ci\`;
-creds are in ${WORKDIR}/.testenv; the skills live in ${WORKDIR}/.claude/skills/.
-
-DO THIS:
-1. Invoke the **/upgrade-all** skill in DEFAULT mode${UPGRADER_ARGS:+ with arguments: ${UPGRADER_ARGS}}
-   (read ${WORKDIR}/.claude/skills/upgrade-all/SKILL.md for the full procedure). It surveys every
-   enrolled recipe and, for each upgradeable one, runs /recipe-upgrade in DEFAULT mode — recipe PR
-   only, verified by posting \`!testme\` on the PR (results visible in the PR, iterate up to 3x). A
-   genuinely stale test gets an explanatory PR COMMENT, never a test edit.
-2. Process recipes via per-recipe SUBAGENTS (as the skill specifies) so your own context stays light.
-   If your context usage climbs (~80%), run /compact before continuing.
-3. Write + push the weekly summary (the PR list is the actionable output for the operator).
-4. WHEN THE RUN IS COMPLETE: STOP. Print the final summary (lead with the PR list) and an
-   \`UPGRADE RUN COMPLETE\` line, then go idle. Do NOT loop, do NOT re-run, and do NOT kill your own
-   session — leave it up so the operator can review your output + the summary in the web UI
-   (claude.ai/code). Next week's run starts a fresh session (the launcher clears this idle one).
-
-GUARDRAILS: NEVER merge any PR. NEVER weaken a test. DEFAULT mode only — do NOT pass --with-tests
-(updating cc-ci tests is the operator's per-recipe opt-in). Single-writer: dedicated branches +
-separate clones, never push main, never touch the build loops' /cc-ci /cc-ci-adv clones. The shared
-Swarm is stateful — go sequentially and tear down what you deploy.
-KICK
-  echo "$kf"
-}
-
-start() {
-  local mode="${1:-use-or-create}"
-  preflight
-  if session_alive; then
-    if [[ "$mode" == "use-or-create" ]] && session_busy; then
-      log "$SESSION already running a job (busy) — leaving it"; return 0
-    fi
-    log "$SESSION exists but idle/stale (or fresh requested) — killing it first"
-    tmux kill-session -t "$SESSION" 2>/dev/null || true; sleep 1
-  fi
-  local kf
-  kf="$(write_kickoff)"
-  log "starting $SESSION (backend=$UPGRADER_BACKEND, model=$UPGRADER_MODEL, args='${UPGRADER_ARGS:-<none>}')"
-  case "$UPGRADER_BACKEND" in
-    claude)
-      local rc=""
-      [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
-      tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
-        "$CLAUDE_BIN $rc --model '$UPGRADER_MODEL' $CLAUDE_FLAGS \"\$(cat '$kf')\""
-      ;;
-    opencode)
-      tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
-        "set -a; . /srv/cc-ci/.testenv; set +a; $OPENCODE_BIN --model '$UPGRADER_MODEL' run --attach '$OPENCODE_SERVER' --title '$SESSION' \"\$(cat '$kf')\""
-      log "$SESSION visible in web UI at http://oc.commoninternet.net (tailnet only)"
-      ;;
-  esac
-  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
-  log "started. status: $0 status | attach: tmux attach -t $SESSION | log: $LOG_DIR/$SESSION.log"
-}
-
-case "${1:-start}" in
-  start)  start use-or-create ;;
-  fresh)  start fresh ;;
-  stop)   if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi ;;
-  status)
-    if session_alive; then
-      log "$SESSION: RUNNING $(session_busy && echo '(busy)' || echo '(idle/finishing)')"
-      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
-    else log "$SESSION: stopped"; fi ;;
-  attach) exec tmux attach -t "$SESSION" ;;
-  *)
-    cat <<EOF
-cc-ci upgrader launcher — one-shot weekly recipe-upgrade job agent (remote-control)
-
-  $0 start    use-or-create: leave an in-flight run alone, else (re)start fresh (DEFAULT; what the cron calls)
-  $0 fresh    always kill any existing + start a fresh run
-  $0 status   show tmux + remote-control state
-  $0 attach   tmux attach to the session
-  $0 stop     kill the session
-
-Env: UPGRADER_BACKEND=$UPGRADER_BACKEND  UPGRADER_MODEL=$UPGRADER_MODEL  UPGRADER_ARGS='${UPGRADER_ARGS:-<none>}'
-     claude:   CLAUDE_BIN=$CLAUDE_BIN  REMOTE_CONTROL=$REMOTE_CONTROL
-     opencode: OPENCODE_BIN=$OPENCODE_BIN  OPENCODE_SERVER=$OPENCODE_SERVER  web=http://oc.commoninternet.net
-     (LOOP_BACKEND / LOOP_MODEL override UPGRADER_BACKEND / UPGRADER_MODEL for unified control)
-The agent runs /upgrade-all (DEFAULT mode) to completion, then STOPS and stays idle (viewable in the
-web UI). It does NOT self-terminate; the next weekly `start` clears the idle session and runs fresh.
-EOF
-    ;;
-esac
+# Thin wrapper — delegates everything to launch-upgrader.py in the same directory.
+exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch-upgrader.py" "$@"
--- a/cc-ci-plan/launch.py
+++ b/cc-ci-plan/launch.py
@ -0,0 +1,582 @@
+#!/usr/bin/env python3
+"""
+cc-ci loop launcher — phase-aware Builder/Adversary loops + watchdog.
+
+Usage:
+  launch.py start     start loops + watchdog (resets to phase 0 unless RESUME_PHASE=1)
+  launch.py stop      stop loops + watchdog
+  launch.py status    show phase + session state
+  launch.py watchdog  run the watchdog in the foreground (called by start_watchdog)
+  launch.py logs builder|adversary|watchdog   tail a log
+
+Env (all optional — defaults shown):
+  LOOP_BACKEND   claude (default) | opencode
+  LOOP_MODEL     model flag, e.g. "sonnet" (claude) or "tinfoil/deepseek-v4-pro" (opencode)
+  RESUME_PHASE   1 = keep current phase index on start (default resets to 0)
+
+  CLAUDE_BIN     claude
+  OPENCODE_BIN   /home/loops/.local/bin/opencode
+  OPENCODE_SERVER  http://127.0.0.1:4096
+
+  PLAN_DIR       /srv/cc-ci/cc-ci-plan
+  BUILDER_DIR    /srv/cc-ci/cc-ci
+  ADV_DIR        /srv/cc-ci/cc-ci-adv
+  LOG_DIR        /srv/cc-ci/.cc-ci-logs
+  PHASES_SPEC    semicolon-separated "id|planfile|statusfile" entries
+  PHASE_IDX_FILE $LOG_DIR/.phase-idx
+  WATCH_INTERVAL    300   (seconds between heavy checks: phase DONE / heal sessions)
+  SIGNAL_INTERVAL   30    (seconds between handoff / stall checks)
+  STALL_IDLE        300   (idle seconds without a WAITING-UNTIL before reboot)
+  STALL_GRACE       180   (seconds past a WAITING-UNTIL before reboot)
+"""
+
+import hashlib, os, re, subprocess, sys, time
+from datetime import datetime, timezone
+from pathlib import Path
+
+# ── config ────────────────────────────────────────────────────────────────────
+
+PLAN_DIR    = os.environ.get("PLAN_DIR",     "/srv/cc-ci/cc-ci-plan")
+BUILDER_DIR = os.environ.get("BUILDER_DIR",  "/srv/cc-ci/cc-ci")
+ADV_DIR     = os.environ.get("ADV_DIR",      "/srv/cc-ci/cc-ci-adv")
+LOG_DIR     = os.environ.get("LOG_DIR",      "/srv/cc-ci/.cc-ci-logs")
+
+BACKEND        = os.environ.get("LOOP_BACKEND",    "claude")
+LOOP_MODEL     = os.environ.get("LOOP_MODEL",      "")
+REMOTE_CONTROL = os.environ.get("REMOTE_CONTROL",  "1") == "1"
+
+CLAUDE_BIN   = os.environ.get("CLAUDE_BIN",   "claude")
+CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "")
+if os.getuid() == 0:
+    os.environ.setdefault("CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS", "1")
+else:
+    CLAUDE_FLAGS = os.environ.get("CLAUDE_FLAGS", "--dangerously-skip-permissions")
+
+OPENCODE_BIN    = os.environ.get("OPENCODE_BIN",     "/home/loops/.local/bin/opencode")
+OPENCODE_SERVER = os.environ.get("OPENCODE_SERVER",  "http://127.0.0.1:4096")
+
+ORCH_SESSION      = os.environ.get("ORCH_SESSION",    "cc-ci-orchestrator-vm")
+ORCH_LAUNCHER     = os.environ.get("ORCH_LAUNCHER",   f"{PLAN_DIR}/launch-orchestrator.sh")
+WATCH_ORCHESTRATOR = os.environ.get("WATCH_ORCHESTRATOR", "1") == "1"
+
+BUILDER_SESSION  = "cc-ci-builder"
+ADV_SESSION      = "cc-ci-adv"
+WATCHDOG_SESSION = "cc-ci-watchdog"
+
+WATCH_INTERVAL  = int(os.environ.get("WATCH_INTERVAL",  300))
+SIGNAL_INTERVAL = int(os.environ.get("SIGNAL_INTERVAL", 30))
+STALL_IDLE      = int(os.environ.get("STALL_IDLE",      300))
+STALL_GRACE     = int(os.environ.get("STALL_GRACE",     180))
+
+PHASES_SPEC = os.environ.get("PHASES_SPEC", ";".join([
+    "1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md",
+    "1b|plan-phase1b-review-lint.md|STATUS-1b.md",
+    "1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md",
+    "1e|plan-phase1e-harness-corrections.md|STATUS-1e.md",
+    "2w|plan-phase2w-warm-canonical-quick.md|STATUS-2w.md",
+    "2pc|plan-phase2pc-image-cache.md|STATUS-2pc.md",
+    "2|plan-phase2-recipe-tests.md|STATUS-2.md",
+    "2b|plan-phase2b-test-performance.md|STATUS-2b.md",
+    "3|plan-phase3-results-ux.md|STATUS-3.md",
+    "4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md",
+    "5|plan-phase5-verify-upgrade-flow.md|STATUS-5.md",
+]))
+PHASES = [p.split("|") for p in PHASES_SPEC.split(";") if p.strip()]
+PHASE_IDX_FILE = os.environ.get("PHASE_IDX_FILE", f"{LOG_DIR}/.phase-idx")
+
+# Regex patterns for session-state detection
+ACTIVE_RE = re.compile(r"esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool")
+LIMIT_RE  = re.compile(r"spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)", re.I)
+FATAL_RE  = re.compile(r"redacted_thinking|blocks cannot be modified|cannot be modified", re.I)
+
+# ── logging ───────────────────────────────────────────────────────────────────
+
+def log(msg):
+    ts = datetime.now().strftime("%H:%M:%S")
+    print(f"[launch {ts}] {msg}", flush=True)
+
+def die(msg):
+    log(f"ERROR: {msg}")
+    sys.exit(1)
+
+# ── tmux helpers ──────────────────────────────────────────────────────────────
+
+def session_alive(name):
+    return subprocess.run(
+        ["tmux", "has-session", "-t", name],
+        capture_output=True
+    ).returncode == 0
+
+def kill_session(name):
+    subprocess.run(["tmux", "kill-session", "-t", name], capture_output=True)
+
+def capture_pane(name, lines=40):
+    r = subprocess.run(["tmux", "capture-pane", "-pt", name], capture_output=True, text=True)
+    return "\n".join(r.stdout.splitlines()[-lines:]) if r.returncode == 0 else ""
+
+def pipe_to_log(session, log_path):
+    subprocess.run(["tmux", "pipe-pane", "-o", "-t", session, f"cat >> '{log_path}'"])
+
+def ping_session(session, msg):
+    """Type a message into a tmux session and submit it, retrying Enter until accepted."""
+    if not session_alive(session):
+        return
+    prefix = msg[:28]
+    subprocess.run(["tmux", "send-keys", "-t", session, "-l", "--", msg], capture_output=True)
+    time.sleep(0.5)
+    for _ in range(5):
+        subprocess.run(["tmux", "send-keys", "-t", session, "Enter"], capture_output=True)
+        time.sleep(1)
+        if prefix not in capture_pane(session, 4):
+            return  # message was accepted
+        subprocess.run(["tmux", "send-keys", "-t", session, "C-m"], capture_output=True)
+        time.sleep(0.5)
+
+# ── phase helpers ─────────────────────────────────────────────────────────────
+
+def cur_idx():
+    try:
+        v = Path(PHASE_IDX_FILE).read_text().strip()
+        return int(v) if v.isdigit() else 0
+    except FileNotFoundError:
+        return 0
+
+def phase_id(idx):     return PHASES[idx][0]
+def phase_plan(idx):   return PHASES[idx][1]
+def phase_status(idx): return PHASES[idx][2]
+def all_ids():         return " ".join(p[0] for p in PHASES)
+
+def resolve_state(repo_dir, basename):
+    """Return the path to a loop-state file — machine-docs/ if present, else repo root."""
+    p = Path(repo_dir) / "machine-docs" / basename
+    return p if p.exists() else Path(repo_dir) / basename
+
+def phase_done(status_basename):
+    path = resolve_state(BUILDER_DIR, status_basename)
+    try:
+        return any(line.startswith("## DONE") for line in path.read_text().splitlines())
+    except FileNotFoundError:
+        return False
+
+# ── kickoff prompt ────────────────────────────────────────────────────────────
+
+def build_kickoff(role, idx):
+    pid, plan, status = phase_id(idx), phase_plan(idx), phase_status(idx)
+    preamble = (
+        f"*** cc-ci SUB-PHASE {pid} ***\n"
+        f"SINGLE SOURCE OF TRUTH for THIS phase: /srv/cc-ci/cc-ci-plan/{plan} — read it in full "
+        f"now; it defines this phase's mission and Definition of Done.\n"
+        f"The general loop protocol still applies and lives in /srv/cc-ci/cc-ci-plan/plan.md "
+        f"(§6.1 coordination, §7 pacing, §9 guardrails) — read those sections too.\n"
+        f"Track loop state in PHASE-NAMESPACED files in your repo clone: {status}, "
+        f"BACKLOG-{pid}.md, REVIEW-{pid}.md, JOURNAL-{pid}.md. DECISIONS.md is shared (append).\n"
+        f'"Done" for this phase = the Builder writes "## DONE" to {status} ONLY after every '
+        f"Definition-of-Done item is Adversary-verified with a fresh PASS in REVIEW-{pid}.md "
+        f"(handshake per §6.1).\n"
+        f"The repo's Phase-1 STATUS.md / BACKLOG.md / REVIEW.md are HISTORY from the completed "
+        f"Phase 1 — do NOT use them as your state; use the phase-namespaced files above.\n"
+        f'Wherever the standing rules below say "plan.md"/"STATUS.md"/"BACKLOG.md"/"REVIEW.md", '
+        f"substitute the phase plan and these phase-namespaced files.\n\n"
+        f"=== standing role & rules ===\n"
+    )
+    role_prompt = (Path(PLAN_DIR) / "prompts" / f"{role}.md").read_text()
+    return preamble + role_prompt
+
+# ── agent launch ──────────────────────────────────────────────────────────────
+
+def start_agent(role, session, workdir):
+    if session_alive(session):
+        log(f"{session} already running — leaving it")
+        return
+
+    Path(workdir).mkdir(parents=True, exist_ok=True)
+    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
+
+    idx = cur_idx()
+    pid, plan = phase_id(idx), phase_plan(idx)
+
+    kf = Path(LOG_DIR) / f".kickoff-{session}.txt"
+    kf.write_text(build_kickoff(role, idx))
+
+    model_flag = f"--model '{LOOP_MODEL}'" if LOOP_MODEL else ""
+
+    if BACKEND == "claude":
+        rc = f"--remote-control '{session}'" if REMOTE_CONTROL else ""
+        cmd = f"{CLAUDE_BIN} {rc} {model_flag} {CLAUDE_FLAGS} \"$(cat '{kf}')\""
+        log(f"starting {session} (backend=claude, phase={pid}, plan={plan}, model={LOOP_MODEL or 'default'})")
+    elif BACKEND == "opencode":
+        cmd = (
+            f"set -a; . /srv/cc-ci/.testenv; set +a; "
+            f"{OPENCODE_BIN} {model_flag} run --attach '{OPENCODE_SERVER}' "
+            f"--title '{session}' \"$(cat '{kf}')\""
+        )
+        log(f"starting {session} (backend=opencode, phase={pid}, model={LOOP_MODEL or 'default'})")
+        log("  visible at http://oc.commoninternet.net (tailnet only)")
+    else:
+        die(f"unknown BACKEND '{BACKEND}' — set LOOP_BACKEND=claude or LOOP_BACKEND=opencode")
+
+    subprocess.run(["tmux", "new-session", "-d", "-s", session, "-c", workdir, cmd])
+    pipe_to_log(session, f"{LOG_DIR}/{session}.log")
+
+def start_loops():
+    start_agent("builder",   BUILDER_SESSION, BUILDER_DIR)
+    start_agent("adversary", ADV_SESSION,     ADV_DIR)
+
+def stop_loops():
+    for s in (BUILDER_SESSION, ADV_SESSION):
+        if session_alive(s):
+            log(f"killing {s}")
+            kill_session(s)
+
+# ── session healing ───────────────────────────────────────────────────────────
+
+def heal_session(role, session, workdir):
+    """Restart a dead session; kill+restart a FATAL-wedged one; nudge a limit-stalled one."""
+    if not session_alive(session):
+        log(f"{role} ({session}) gone — restarting (phase {phase_id(cur_idx())})")
+        start_agent(role, session, workdir)
+        return
+
+    pane = capture_pane(session, 25)
+    if ACTIVE_RE.search(pane):
+        return  # actively working — leave it alone
+
+    if FATAL_RE.search(pane):
+        log(f"FATAL session-state error on {role} ({session}) — kill + restart fresh")
+        kill_session(session)
+        start_agent(role, session, workdir)
+        return
+
+    if LIMIT_RE.search(pane):
+        log(f"limit-stall on {role} ({session}) — nudging to resume")
+        ping_session(session,
+            "watchdog: the usage/spend limit appears lifted — RESUME your loop now. "
+            "Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you "
+            "stopped; re-arm your loop pacing.")
+
+# ── stall detection ───────────────────────────────────────────────────────────
+
+_idle_since: dict[str, float] = {}
+
+def _parse_waiting_until(pane):
+    """Extract the epoch timestamp from a WAITING-UNTIL marker, or None."""
+    markers = re.findall(r"WAITING-UNTIL:\s*(\S+)", pane)
+    if not markers:
+        return None
+    try:
+        ts = markers[-1]  # most recent marker; an older one may still be on screen
+        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
+        return dt.timestamp()
+    except Exception:
+        return None
+
+def stall_check_one(role, session, workdir):
+    if not session_alive(session):
+        _idle_since[session] = 0.0
+        return
+
+    now = time.time()
+    pane = capture_pane(session, 40)
+
+    if ACTIVE_RE.search(pane):
+        _idle_since[session] = 0.0
+        return
+
+    since = _idle_since.get(session) or now
+    _idle_since[session] = since
+    idle = now - since
+
+    until = _parse_waiting_until(pane)
+    if until is not None:
+        # Declared wait: only reboot once STALL_GRACE seconds past the stated time.
+        # Never reboot before — that races with the healthy self-wake.
+        if now <= until + STALL_GRACE:
+            return
+        reason = f"past its WAITING-UNTIL by {int(now - until)}s — self-wake did not fire"
+    else:
+        if idle < STALL_IDLE:
+            return
+        reason = f"idle {int(idle)}s with no WAITING-UNTIL marker"
+
+    log(f"stall: {role} ({session}) {reason} — kill + reboot")
+    kill_session(session)
+    start_agent(role, session, workdir)
+    _idle_since[session] = 0.0
+
+def stall_check():
+    stall_check_one("builder",   BUILDER_SESSION, BUILDER_DIR)
+    stall_check_one("adversary", ADV_SESSION,     ADV_DIR)
+
+# ── orchestrator healing ──────────────────────────────────────────────────────
+
+def orchestrator_alive():
+    """
+    True if an orchestrator process is running anywhere.
+    Conflict-safety: never launch a second orchestrator resuming the same session
+    (double-resume causes "thinking blocks cannot be modified" crashes).
+    """
+    for line in subprocess.run(["pgrep", "-x", "claude"],
+                               capture_output=True, text=True).stdout.splitlines():
+        pid = line.strip()
+        if not pid:
+            continue
+        try:
+            cmdline = Path(f"/proc/{pid}/cmdline").read_bytes().decode(errors="replace").replace("\0", " ")
+            # Skip the loop sessions and the upgrader — they're not the orchestrator.
+            if re.search(r"--remote-control\s+'?cc-ci-(builder|adv|upgrader)'?", cmdline):
+                continue
+            return True
+        except Exception:
+            pass
+    return session_alive(ORCH_SESSION)
+
+def heal_orchestrator():
+    if not WATCH_ORCHESTRATOR:
+        return
+    if not Path(ORCH_LAUNCHER).is_file():
+        return
+
+    if orchestrator_alive():
+        if session_alive(ORCH_SESSION):
+            pane = capture_pane(ORCH_SESSION, 25)
+            if ACTIVE_RE.search(pane):
+                return
+            if FATAL_RE.search(pane):
+                log(f"FATAL session-state error on orchestrator ({ORCH_SESSION}) — kill + restart")
+                kill_session(ORCH_SESSION)
+                subprocess.run([ORCH_LAUNCHER, "start"], capture_output=True)
+        return
+
+    log(f"orchestrator not running — restarting via {ORCH_LAUNCHER}")
+    subprocess.run([ORCH_LAUNCHER, "start"], capture_output=True)
+
+# ── handoff signalling ────────────────────────────────────────────────────────
+
+_last_sha        = ""
+_adv_inbox_seen  = ""
+_builder_inbox_seen = ""
+
+def handoff_reset():
+    global _last_sha, _adv_inbox_seen, _builder_inbox_seen
+    _last_sha = _adv_inbox_seen = _builder_inbox_seen = ""
+
+def _fetch_origin():
+    subprocess.run(["git", "-C", BUILDER_DIR, "fetch", "-q", "origin"], capture_output=True)
+
+def _show_pushed(path):
+    """Read a file from origin/main (machine-docs/ first, then repo root)."""
+    for loc in (f"origin/main:machine-docs/{path}", f"origin/main:{path}"):
+        r = subprocess.run(
+            ["git", "-C", BUILDER_DIR, "show", loc],
+            capture_output=True, text=True)
+        if r.returncode == 0:
+            return r.stdout
+    return ""
+
+def handoff_check():
+    global _last_sha, _adv_inbox_seen, _builder_inbox_seen
+
+    _fetch_origin()
+    r = subprocess.run(
+        ["git", "-C", BUILDER_DIR, "rev-parse", "origin/main"],
+        capture_output=True, text=True)
+    head = r.stdout.strip()
+
+    if head:
+        if not _last_sha:
+            _last_sha = head  # baseline silently on first tick
+        elif head != _last_sha:
+            subjects = subprocess.run(
+                ["git", "-C", BUILDER_DIR, "log", "--format=%s", f"{_last_sha}..origin/main"],
+                capture_output=True, text=True).stdout
+            if re.search(r"^claim", subjects, re.MULTILINE | re.IGNORECASE):
+                log("handoff: new claim(...) commit → pinging Adversary")
+                ping_session(ADV_SESSION,
+                    "watchdog ping: the Builder pushed a gate CLAIM (claim(...) commit). "
+                    "Pull and verify the claimed gate now.")
+            if re.search(r"^review", subjects, re.MULTILINE | re.IGNORECASE):
+                log("handoff: new review(...) commit → pinging Builder")
+                ping_session(BUILDER_SESSION,
+                    "watchdog ping: the Adversary pushed a verdict/finding (review(...) commit). "
+                    "Pull REVIEW and act — proceed if it PASSes your gate, address it if it's a finding.")
+            _last_sha = head
+
+    adv_inbox     = _show_pushed("ADVERSARY-INBOX.md")
+    builder_inbox = _show_pushed("BUILDER-INBOX.md")
+
+    def md5(s): return hashlib.md5(s.encode()).hexdigest()
+
+    if adv_inbox:
+        h = md5(adv_inbox)
+        if h != _adv_inbox_seen:
+            log("handoff: ADVERSARY-INBOX.md changed → pinging Adversary")
+            ping_session(ADV_SESSION,
+                "watchdog ping: the Builder pushed machine-docs/ADVERSARY-INBOX.md — "
+                "pull, read it, act, then delete the file (commit + push) to mark it consumed.")
+            _adv_inbox_seen = h
+    else:
+        _adv_inbox_seen = ""
+
+    if builder_inbox:
+        h = md5(builder_inbox)
+        if h != _builder_inbox_seen:
+            log("handoff: BUILDER-INBOX.md changed → pinging Builder")
+            ping_session(BUILDER_SESSION,
+                "watchdog ping: the Adversary pushed machine-docs/BUILDER-INBOX.md — "
+                "pull, read it, act, then delete the file (commit + push) to mark it consumed.")
+            _builder_inbox_seen = h
+    else:
+        _builder_inbox_seen = ""
+
+# ── watchdog loop ─────────────────────────────────────────────────────────────
+
+def watchdog_loop():
+    idx = cur_idx()
+    log(f"watchdog up — phase={phase_id(idx)} [{idx+1}/{len(PHASES)}] "
+        f"seq='{all_ids()}' signal={SIGNAL_INTERVAL}s heavy={WATCH_INTERVAL}s")
+
+    elapsed = WATCH_INTERVAL  # force a heavy check on the first tick
+    while True:
+        handoff_check()
+        stall_check()
+
+        if elapsed >= WATCH_INTERVAL:
+            elapsed = 0
+            idx    = cur_idx()
+            pid    = phase_id(idx)
+            status = phase_status(idx)
+
+            if phase_done(status):
+                next_idx = idx + 1
+                if next_idx < len(PHASES):
+                    log(f"PHASE {pid} DONE — auto-transitioning to {phase_id(next_idx)}")
+                    stop_loops()
+                    Path(PHASE_IDX_FILE).write_text(str(next_idx))
+                    handoff_reset()
+                    start_loops()
+                else:
+                    log(f"PHASE SEQUENCE COMPLETE (last phase {pid} DONE) — stopping loops")
+                    stop_loops()
+                    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+                    Path(LOG_DIR, "SEQUENCE-COMPLETE").write_text(
+                        f"cc-ci phase sequence complete {ts}. Phases: {all_ids()}. "
+                        f"Loops stopped; entire build finished.\n")
+                    log("watchdog exiting.")
+                    return
+            else:
+                heal_session("builder",   BUILDER_SESSION, BUILDER_DIR)
+                heal_session("adversary", ADV_SESSION,     ADV_DIR)
+                heal_orchestrator()
+
+        time.sleep(SIGNAL_INTERVAL)
+        elapsed += SIGNAL_INTERVAL
+
+def start_watchdog():
+    if session_alive(WATCHDOG_SESSION):
+        log("watchdog already running")
+        return
+    log("starting watchdog")
+    script = Path(__file__).resolve()
+    subprocess.run([
+        "tmux", "new-session", "-d", "-s", WATCHDOG_SESSION, "-c", PLAN_DIR,
+        f"exec >>'{LOG_DIR}/watchdog.log' 2>&1; python3 '{script}' watchdog"
+    ])
+
+# ── preflight ─────────────────────────────────────────────────────────────────
+
+def preflight():
+    import shutil
+    if not shutil.which("tmux"):
+        die("tmux not found")
+    if BACKEND == "claude":
+        if not shutil.which(CLAUDE_BIN):
+            die(f"claude CLI not found — set CLAUDE_BIN (currently: {CLAUDE_BIN})")
+    elif BACKEND == "opencode":
+        if not Path(OPENCODE_BIN).exists():
+            die(f"opencode not found at {OPENCODE_BIN}")
+    else:
+        die(f"unknown LOOP_BACKEND '{BACKEND}' — use 'claude' or 'opencode'")
+
+    for phase in PHASES:
+        plan = Path(PLAN_DIR) / phase[1]
+        if not plan.exists():
+            die(f"missing phase plan: {plan}")
+    for prompt_file in ("builder.md", "adversary.md"):
+        if not (Path(PLAN_DIR) / "prompts" / prompt_file).exists():
+            die(f"missing {PLAN_DIR}/prompts/{prompt_file}")
+    Path(LOG_DIR).mkdir(parents=True, exist_ok=True)
+
+# ── status ────────────────────────────────────────────────────────────────────
+
+def cmd_status():
+    idx = cur_idx()
+    pid = phase_id(idx)
+    print(f"  phase: {pid} [{idx+1}/{len(PHASES)}]  plan={phase_plan(idx)}  status={phase_status(idx)}")
+    for s in (BUILDER_SESSION, ADV_SESSION, WATCHDOG_SESSION):
+        state = "RUNNING" if session_alive(s) else "stopped"
+        print(f"  {s}: {state}")
+    done_str = "## DONE" if phase_done(phase_status(idx)) else "in progress"
+    print(f"  phase {pid}: {done_str}")
+    seq = Path(LOG_DIR) / "SEQUENCE-COMPLETE"
+    if seq.exists():
+        print(f"  >>> {seq.read_text().strip()}")
+
+# ── main ──────────────────────────────────────────────────────────────────────
+
+def main():
+    cmd = sys.argv[1] if len(sys.argv) > 1 else ""
+
+    if cmd == "start":
+        preflight()
+        stop_loops()
+        if os.environ.get("RESUME_PHASE") != "1":
+            Path(PHASE_IDX_FILE).write_text("0")
+        seq = Path(LOG_DIR) / "SEQUENCE-COMPLETE"
+        if seq.exists():
+            seq.unlink()
+        start_loops()
+        start_watchdog()
+        log(f"started at phase {phase_id(cur_idx())}.")
+
+    elif cmd == "watchdog":
+        preflight()
+        watchdog_loop()
+
+    elif cmd == "status":
+        cmd_status()
+
+    elif cmd == "stop":
+        stop_loops()
+        if session_alive(WATCHDOG_SESSION):
+            log(f"killing {WATCHDOG_SESSION}")
+            kill_session(WATCHDOG_SESSION)
+        log("stopped.")
+
+    elif cmd == "logs":
+        sub = sys.argv[2] if len(sys.argv) > 2 else ""
+        log_files = {
+            "builder":   f"{LOG_DIR}/{BUILDER_SESSION}.log",
+            "adversary": f"{LOG_DIR}/{ADV_SESSION}.log",
+            "watchdog":  f"{LOG_DIR}/watchdog.log",
+        }
+        if sub not in log_files:
+            die("usage: launch.py logs builder|adversary|watchdog")
+        os.execvp("tail", ["tail", "-f", log_files[sub]])
+
+    else:
+        print(f"""cc-ci loop launcher (phase-aware)
+
+  launch.py start     start loops + watchdog (RESUME_PHASE=1 to keep current phase)
+  launch.py stop      stop loops + watchdog
+  launch.py status    show phase + session state
+  launch.py logs builder|adversary|watchdog   tail a log
+  launch.py watchdog  run watchdog in foreground
+
+Backend: {BACKEND}   Model: {LOOP_MODEL or '<default>'}
+Phase sequence ({len(PHASES)} phases, auto-advance on ## DONE, stop after last):
+  {all_ids()}
+""")
+
+
+if __name__ == "__main__":
+    main()
--- a/cc-ci-plan/launch.sh
+++ b/cc-ci-plan/launch.sh
@ -1,505 +1,3 @@
 #!/usr/bin/env bash
-#
-# launch.sh — start and supervise the two cc-ci autonomous loops + a phase-aware watchdog.
-#
-# Model (see plan.md §6 / §6.1): two INDEPENDENT Claude Code sessions —
-#   • Builder   (tmux session: cc-ci-builder)   working clone /srv/cc-ci/cc-ci
-#   • Adversary (tmux session: cc-ci-adv)        working clone /srv/cc-ci/cc-ci-adv
-# coordinating only through the git repo on git.autonomic.zone.
-#
-# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c → 1b → 1d → 1e → 2w → 2pc → 2 → 2b → 3 → 4 → 5;
-# 2w = warm-canonical/--quick, interjected; Phase 2 pauses for it then resumes).
-# Each phase has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
-# STATUS-<id>.md shows "## DONE", the watchdog AUTO-TRANSITIONS to the next phase; after the LAST
-# phase (5, verify upgrade flow) it STOPS the loops and exits (end of the whole build).
-#
-# Four jobs: ITERATION (each agent's /loop), RESILIENCE (restart a dead loop), HANDOFF SIGNALLING
-# (ping the waiting loop the moment its counterpart hands off), PHASE SEQUENCING (this file).
-#
-# Usage:
-#   ./launch.sh start       # start the sequence at phase 0 + watchdog (stops/relaunches loops)
-#   ./launch.sh watchdog    # run only the supervision loop in the foreground
-#   ./launch.sh status      # show phase + session + DONE state
-#   ./launch.sh logs builder|adversary|watchdog   # tail a session/log
-#   ./launch.sh stop        # stop both loops + watchdog
-
-set -euo pipefail
-
-# Absolute path to this script, so the watchdog re-invokes it correctly regardless of cwd.
-SELF="$(readlink -f "${BASH_SOURCE[0]}")"
-
-# ----- config -------------------------------------------------------------
-PLAN_DIR="${PLAN_DIR:-/srv/cc-ci/cc-ci-plan}"
-
-# ----- backend selection ------------------------------------------------------
-# LOOP_BACKEND: "claude" (default) or "opencode" (tinfoil/opencode web, tailscale-only).
-# LOOP_MODEL:   model to pass to the backend.
-#   claude:    e.g. "sonnet", "opus" (--model flag); empty = use CLI default.
-#   opencode:  "provider/model" e.g. "tinfoil/deepseek-v4-pro".
-LOOP_BACKEND="${LOOP_BACKEND:-claude}"
-LOOP_MODEL="${LOOP_MODEL:-}"
-
-CLAUDE_BIN="${CLAUDE_BIN:-claude}"
-OPENCODE_BIN="${OPENCODE_BIN:-/home/loops/.local/bin/opencode}"
-# opencode web server listens on localhost (nginx proxies it at oc.commoninternet.net).
-# One shared server hosts all sessions; agents attach with --attach.
-OPENCODE_SERVER="${OPENCODE_SERVER:-http://127.0.0.1:4096}"
-OPENCODE_PORT="${OPENCODE_PORT:-4096}"
-
-# --dangerously-skip-permissions cannot be passed as a FLAG when running as root (claude blocks it).
-# Use the env var form instead; detect root and switch automatically.
-if [ "$(id -u)" = "0" ]; then
-  export CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS=1
-  CLAUDE_FLAGS="${CLAUDE_FLAGS:-}"
-else
-  CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
-fi
-# REMOTE_CONTROL=1 → interactive --remote-control sessions (viewable at claude.ai/code), required
-# for /loop. The box must be logged into the claude.ai account. =0 for plain interactive.
-# For opencode backend this controls whether to start the opencode web server.
-REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
-
-BUILDER_DIR="${BUILDER_DIR:-/srv/cc-ci/cc-ci}"        # Builder's repo clone
-ADV_DIR="${ADV_DIR:-/srv/cc-ci/cc-ci-adv}"            # Adversary's repo clone
-LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
-
-WATCH_INTERVAL="${WATCH_INTERVAL:-300}"   # seconds between HEAVY checks (phase DONE / restart dead loops)
-SIGNAL_INTERVAL="${SIGNAL_INTERVAL:-30}"  # seconds between HANDOFF checks (ping the waiting loop)
-STALL_IDLE="${STALL_IDLE:-300}"           # NO-marker case: seconds a loop may sit idle (turn ended
-                                          # without declaring a wait) before the watchdog reboots it
-STALL_GRACE="${STALL_GRACE:-180}"         # marker case: seconds PAST a loop's WAITING-UNTIL before
-                                          # reboot. The real ScheduleWakeup fires AT the stated time;
-                                          # grace covers wake+start latency + marker/scheduler skew so
-                                          # the watchdog never RACES (pre-empts) a healthy self-wake.
-
-BUILDER_SESSION="cc-ci-builder"
-ADV_SESSION="cc-ci-adv"
-WATCHDOG_SESSION="cc-ci-watchdog"
-# Orchestrator (supervisory session) — the watchdog keeps it alive too, via launch-orchestrator.sh.
-ORCH_SESSION="${ORCH_SESSION:-cc-ci-orchestrator-vm}"
-ORCH_LAUNCHER="${ORCH_LAUNCHER:-$PLAN_DIR/launch-orchestrator.sh}"
-# Watchdog supervision of the orchestrator can be disabled (=0) if you run the orchestrator yourself
-# and don't want it auto-(re)launched.
-WATCH_ORCHESTRATOR="${WATCH_ORCHESTRATOR:-1}"
-
-# Ordered phase sequence: each entry "id|planfile|statusbasename". The watchdog runs them in order,
-# auto-transitions on the phase's "## DONE" (in BUILDER_DIR/<statusbasename>), and STOPS after the
-# last one (manual gate). Override PHASES_SPEC (semicolon-separated) to change the sequence.
-PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md;1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md;1e|plan-phase1e-harness-corrections.md|STATUS-1e.md;2w|plan-phase2w-warm-canonical-quick.md|STATUS-2w.md;2pc|plan-phase2pc-image-cache.md|STATUS-2pc.md;2|plan-phase2-recipe-tests.md|STATUS-2.md;2b|plan-phase2b-test-performance.md|STATUS-2b.md;3|plan-phase3-results-ux.md|STATUS-3.md;4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md;5|plan-phase5-verify-upgrade-flow.md|STATUS-5.md}"
-IFS=';' read -r -a PHASES <<< "$PHASES_SPEC"
-PHASE_IDX_FILE="${PHASE_IDX_FILE:-$LOG_DIR/.phase-idx}"
-# --------------------------------------------------------------------------
-
-log() { printf '[launch %(%H:%M:%S)T] %s\n' -1 "$*"; }
-die() { log "ERROR: $*"; exit 1; }
-need() { command -v "$1" >/dev/null 2>&1 || die "missing dependency: $1"; }
-
-# ----- phase helpers ------------------------------------------------------
-cur_idx()      { local i; i="$(cat "$PHASE_IDX_FILE" 2>/dev/null || echo 0)"; [[ "$i" =~ ^[0-9]+$ ]] || i=0; echo "$i"; }
-phase_id()     { echo "${PHASES[$1]}" | cut -d'|' -f1; }
-phase_plan()   { echo "${PHASES[$1]}" | cut -d'|' -f2; }
-phase_status() { echo "${PHASES[$1]}" | cut -d'|' -f3; }
-phase_review() { echo "REVIEW-$(phase_id "$1").md"; }
-# Loop-state files may sit at the repo root OR under machine-docs/ (the 1b RL6 move). Prefer
-# machine-docs/ if present, else root — so the watchdog survives the move whenever it happens.
-resolve_state() { local dir="$1" base="$2"; if [[ -f "$dir/machine-docs/$base" ]]; then echo "$dir/machine-docs/$base"; else echo "$dir/$base"; fi; }
-phase_done()   { grep -qE '^##[[:space:]]+DONE' "$(resolve_state "$BUILDER_DIR" "$1")" 2>/dev/null; }   # $1 = status basename (read locally)
-all_ids()      { local p; for p in "${PHASES[@]}"; do printf '%s ' "$(echo "$p" | cut -d'|' -f1)"; done; }
-
-preflight() {
-  need tmux
-  case "$LOOP_BACKEND" in
-    claude)   command -v "$CLAUDE_BIN"   >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)" ;;
-    opencode) command -v "$OPENCODE_BIN" >/dev/null 2>&1 || die "opencode not found at $OPENCODE_BIN; install from https://opencode.ai" ;;
-    *) die "unknown LOOP_BACKEND '$LOOP_BACKEND' — use 'claude' or 'opencode'" ;;
-  esac
-  local p plan
-  for p in "${PHASES[@]}"; do
-    plan="$(echo "$p" | cut -d'|' -f2)"
-    [[ -f "$PLAN_DIR/$plan" ]] || die "missing phase plan $PLAN_DIR/$plan"
-  done
-  [[ -f "$PLAN_DIR/prompts/builder.md"   ]] || die "missing $PLAN_DIR/prompts/builder.md"
-  [[ -f "$PLAN_DIR/prompts/adversary.md" ]] || die "missing $PLAN_DIR/prompts/adversary.md"
-  mkdir -p "$LOG_DIR"
-}
-
-session_alive() { tmux has-session -t "$1" 2>/dev/null; }
-
-# Build the per-session kickoff (phase preamble + base role prompt) and launch the agent.
-# role ∈ {builder, adversary}.
-# Backend "claude": prompt passed as positional arg via $(cat kf) — never stdin (piping breaks /loop).
-# Backend "opencode": one shared opencode web server at OPENCODE_SERVER hosts every session; each
-#   agent attaches to it with --attach and is named with --title, so sessions do not collide.
-#   The kickoff prompt is passed via `opencode run <message>` in a detached tmux session; the web
-#   UI (http://oc.commoninternet.net, tailnet-only) allows observation (like --remote-control).
-start_agent() {
-  local role="$1" session="$2" workdir="$3"
-  if session_alive "$session"; then log "$session already running — leaving it"; return 0; fi
-  mkdir -p "$workdir"
-  local idx pid plan status kf
-  idx="$(cur_idx)"; pid="$(phase_id "$idx")"; plan="$(phase_plan "$idx")"; status="$(phase_status "$idx")"
-  kf="$LOG_DIR/.kickoff-$session.txt"
-  {
-    cat <<PREAMBLE
-*** cc-ci SUB-PHASE ${pid} ***
-SINGLE SOURCE OF TRUTH for THIS phase: /srv/cc-ci/cc-ci-plan/${plan} — read it in full now; it defines this phase's mission and Definition of Done.
-The general loop protocol still applies and lives in /srv/cc-ci/cc-ci-plan/plan.md (§6.1 coordination, §7 pacing, §9 guardrails) — read those sections too.
-Track loop state in PHASE-NAMESPACED files in your repo clone: ${status}, BACKLOG-${pid}.md, REVIEW-${pid}.md, JOURNAL-${pid}.md. DECISIONS.md is shared (append).
-"Done" for this phase = the Builder writes "## DONE" to ${status} ONLY after every Definition-of-Done item is Adversary-verified with a fresh PASS in REVIEW-${pid}.md (handshake per §6.1).
-The repo's Phase-1 STATUS.md / BACKLOG.md / REVIEW.md are HISTORY from the completed Phase 1 — do NOT use them as your state; use the phase-namespaced files above.
-Wherever the standing rules below say "plan.md"/"STATUS.md"/"BACKLOG.md"/"REVIEW.md", substitute the phase plan and these phase-namespaced files.
-
-=== standing role & rules ===
-PREAMBLE
-    cat "$PLAN_DIR/prompts/$role.md"
-  } > "$kf"
-
-  local model_flag=""
-  [[ -n "$LOOP_MODEL" ]] && model_flag="--model '$LOOP_MODEL'"
-
-  case "$LOOP_BACKEND" in
-    claude)
-      local rc=""
-      [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$session'"
-      log "starting $session (backend=claude, phase=$pid, model=${LOOP_MODEL:-default}, cwd=$workdir)"
-      tmux new-session -d -s "$session" -c "$workdir" \
-        "$CLAUDE_BIN $rc $model_flag $CLAUDE_FLAGS \"\$(cat '$kf')\""
-      ;;
-    opencode)
-      # One shared opencode web server (opencode-web.service or manually started) hosts all sessions.
-      # Each agent attaches to it as a named session visible in the web UI at oc.commoninternet.net.
-      log "starting $session (backend=opencode, phase=$pid, model=${LOOP_MODEL:-default}, server=$OPENCODE_SERVER)"
-      tmux new-session -d -s "$session" -c "$workdir" \
-        "set -a; . /srv/cc-ci/.testenv; set +a; $OPENCODE_BIN $model_flag run --attach '$OPENCODE_SERVER' --title '$session' \"\$(cat '$kf')\""
-      log "$session visible in web UI at http://oc.commoninternet.net (tailnet only)"
-      ;;
-  esac
-  tmux pipe-pane -o -t "$session" "cat >> '$LOG_DIR/$session.log'"
-}
-
-start_loops() {
-  start_agent builder   "$BUILDER_SESSION" "$BUILDER_DIR"
-  start_agent adversary "$ADV_SESSION"     "$ADV_DIR"
-}
-
-stop_loops() {
-  local s
-  for s in "$BUILDER_SESSION" "$ADV_SESSION"; do
-    if session_alive "$s"; then log "killing $s"; tmux kill-session -t "$s" || true; fi
-  done
-}
-
-# Wake a loop by typing a message into its tmux session and SUBMITTING it. A single Enter after a
-# long `send-keys -l` is often swallowed while the TUI ingests the paste (text left unsent in the
-# input box), so retry Enter/C-m until the message's leading text is no longer in the input box.
-ping_session() {
-  local s="$1" msg="$2" prefix i
-  session_alive "$s" || return 0
-  prefix="${msg:0:28}"
-  tmux send-keys -t "$s" -l -- "$msg" 2>/dev/null || return 0
-  sleep 0.5
-  for i in 1 2 3 4 5; do
-    tmux send-keys -t "$s" Enter 2>/dev/null
-    sleep 1
-    tmux capture-pane -pt "$s" 2>/dev/null | tail -4 | grep -qF -- "$prefix" || return 0  # submitted
-    tmux send-keys -t "$s" C-m 2>/dev/null; sleep 0.5
-  done
-}
-
-# A loop can stall ALIVE on a usage/spend-limit notice: the claude process stays up (so the
-# dead-session restart never fires) but makes no progress, and the /loop self-pacing is dead because
-# the limit interrupted the turn that would have scheduled the next tick. Detect that signature
-# (limit text present + no active-turn marker) and re-nudge it each heavy tick — once the limit resets
-# the next nudge lands and the loop resumes. Gated on the limit text so we NEVER nudge a loop that is
-# just legitimately idle-waiting on a handoff.
-LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'
-# FATAL = an unrecoverable session-state API error that recurs on EVERY turn (so the session stays
-# alive but wedged — a nudge can't fix it; only a fresh session can). The confirmed case: the
-# "thinking/redacted_thinking blocks ... cannot be modified" 400 that has hit the Adversary
-# repeatedly (interrupted-mid-thinking corrupts the replayed history). Kill + restart fresh; the loop
-# re-orients from the repo. Matched conservatively so it never fires on transient/working states.
-FATAL_RE='redacted_thinking|blocks cannot be modified|cannot be modified'
-
-# Heal one loop session: dead -> restart; wedged on a FATAL error -> kill + restart fresh; stalled on
-# a usage limit -> nudge. No-op while actively working ("esc to interrupt" on screen).
-heal_session() {
-  local role="$1" s="$2" dir="$3" pane
-  if ! session_alive "$s"; then
-    log "$role ($s) gone — restarting (phase $(phase_id "$(cur_idx)"))"
-    start_agent "$role" "$s" "$dir"; return 0
-  fi
-  pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -25 || true)"
-  # "esc to interrupt" = claude actively working; spinner chars or "Running tool" = opencode actively working
-  printf '%s\n' "$pane" | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool' && return 0
-  if printf '%s\n' "$pane" | grep -qiE "$FATAL_RE"; then
-    log "FATAL session-state error on $role ($s) — kill + restart fresh (re-orients from repo)"
-    tmux kill-session -t "$s" 2>/dev/null || true
-    start_agent "$role" "$s" "$dir"; return 0
-  fi
-  if printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then
-    log "limit-stall detected on $role ($s) — re-nudging to resume"
-    ping_session "$s" "watchdog: the usage/spend limit appears lifted — RESUME your loop now. Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you stopped; re-arm your loop pacing."
-  fi
-}
-
-# --- Idle-wedge detection (complements heal_session's dead/FATAL/limit cases) ----------------------
-# A loop can sit ALIVE but wedged — e.g. garbled output at the context limit — showing none of the
-# heal_session signals (not dead, no FATAL string, no limit notice). The loops therefore DECLARE every
-# wait with a final-line marker `WAITING-UNTIL: <ISO-8601 UTC>` and cap each wait at 10 min (plan §7).
-# A healthy idle loop ALWAYS has a current marker as its last message; a wedge does not (or has one
-# whose time has already passed). So: reboot a loop that has been idle (no "esc to interrupt") for
-# >= STALL_IDLE seconds AND (has no WAITING-UNTIL marker OR is now past the time that marker named).
-# Runs every signal tick (30 s) for fine resolution; rebooting is safe — the loop re-orients from
-# git + its phase STATUS/REVIEW files.
-declare -A _wd_idle_since   # session -> epoch first seen idle this stretch (0/unset = working)
-
-_parse_waiting_until() {    # arg1 = pane text; echoes epoch seconds of the last marker, or nothing
-  local line ts
-  line="$(printf '%s\n' "$1" | grep -oE 'WAITING-UNTIL:[[:space:]]*[0-9][0-9T:Z+-]+' | tail -1)"
-  [[ -n "$line" ]] || return 0
-  ts="$(printf '%s' "${line#WAITING-UNTIL:}" | tr -d '[:space:]')"
-  date -u -d "$ts" +%s 2>/dev/null || true
-}
-
-stall_check_one() {
-  local role="$1" s="$2" dir="$3" pane now until idle since reason
-  session_alive "$s" || { _wd_idle_since[$s]=0; return 0; }   # dead => heal_session handles it
-  now="$(printf '%(%s)T' -1)"
-  pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -40 || true)"
-  if printf '%s\n' "$pane" | grep -q 'esc to interrupt'; then
-    _wd_idle_since[$s]=0; return 0                            # actively working — not idle
-  fi
-  since="${_wd_idle_since[$s]:-0}"
-  if [[ "$since" == 0 ]]; then since="$now"; _wd_idle_since[$s]="$now"; fi
-  idle=$(( now - since ))
-  until="$(_parse_waiting_until "$pane")"
-  if [[ -n "$until" ]]; then
-    # Declared wait: the loop's own ScheduleWakeup fires AT 'until'. Reboot ONLY once we are
-    # STALL_GRACE seconds PAST it — i.e. the self-wake genuinely failed. Never reboot before/at
-    # 'until' (that races and pre-empts the healthy wake — the original false-reboot bug).
-    (( now > until + STALL_GRACE )) || return 0
-    reason="past its WAITING-UNTIL by $(( now - until ))s — self-wake did not fire"
-  else
-    # No declared wait: a turn ended without scheduling/declaring. Treat as a wedge once idle a while.
-    (( idle >= STALL_IDLE )) || return 0
-    reason="idle ${idle}s with no WAITING-UNTIL marker"
-  fi
-  log "stall: $role ($s) $reason — kill + reboot (re-orients from repo)"
-  tmux kill-session -t "$s" 2>/dev/null || true
-  start_agent "$role" "$s" "$dir"
-  _wd_idle_since[$s]=0
-}
-
-stall_check() {
-  stall_check_one builder   "$BUILDER_SESSION" "$BUILDER_DIR"
-  stall_check_one adversary "$ADV_SESSION"     "$ADV_DIR"
-}
-
-# Is an orchestrator process alive ANYWHERE? Conflict-safety: we must NEVER launch a second
-# orchestrator that resumes the same conversation while one is already running (that double-resume is
-# the likely cause of the "thinking blocks cannot be modified" crashes). The orchestrator may be
-# running as a managed tmux session ($ORCH_SESSION) OR as a plain terminal session the operator
-# started by hand (no flags). So: alive iff any `claude` process exists that is NOT a loop or
-# upgrader session (identified by their --remote-control name), or the managed tmux session exists.
-orchestrator_alive() {
-  local pid args
-  for pid in $(pgrep -x claude 2>/dev/null); do
-    args="$(tr '\0' ' ' < "/proc/$pid/cmdline" 2>/dev/null || true)"
-    # skip the loops + the one-shot upgrader job (matched by remote-control session NAME, not a
-    # stray path mention) — none of these is the orchestrator.
-    printf '%s' "$args" | grep -qE -- "--remote-control +'?cc-ci-(builder|adv|upgrader)'?" && continue
-    return 0   # a non-loop claude process => orchestrator (or operator) is alive
-  done
-  tmux has-session -t "$ORCH_SESSION" 2>/dev/null && return 0
-  return 1
-}
-
-# Keep the orchestrator alive: restart it (via launch-orchestrator.sh, which resumes its session) ONLY
-# when none is running; if it's the managed tmux session and wedged on a FATAL error, kill+restart.
-heal_orchestrator() {
-  [[ "$WATCH_ORCHESTRATOR" == "1" ]] || return 0
-  [[ -x "$ORCH_LAUNCHER" ]] || return 0
-  if orchestrator_alive; then
-    if tmux has-session -t "$ORCH_SESSION" 2>/dev/null; then
-      local pane; pane="$(tmux capture-pane -pt "$ORCH_SESSION" 2>/dev/null | tail -25 || true)"
-      printf '%s\n' "$pane" | grep -qE 'esc to interrupt|⠋|⠙|⠹|⠸|⠼|⠴|⠦|⠧|⠇|⠏|Running tool' && return 0
-      if printf '%s\n' "$pane" | grep -qiE "$FATAL_RE"; then
-        log "FATAL session-state error on orchestrator ($ORCH_SESSION) — kill + restart fresh"
-        tmux kill-session -t "$ORCH_SESSION" 2>/dev/null || true
-        "$ORCH_LAUNCHER" start >/dev/null 2>&1 || true
-      fi
-    fi
-    return 0
-  fi
-  log "orchestrator not running anywhere — restarting via $ORCH_LAUNCHER"
-  "$ORCH_LAUNCHER" start >/dev/null 2>&1 || true
-}
-
-# Detect handoffs against the PUSHED origin/main — i.e. exactly what the RECEIVER will pull — NOT the
-# writer's local working tree. (Reading the working tree fired on a claim/verdict the writer hadn't
-# pushed yet; the receiver then pulled a stale remote, saw "no formal gate", and a clarifying
-# inbox round-trip ensued. Mirroring origin/main eliminates that race.) origin/main is the shared
-# branch, so all four files are read from one clone's origin/main after a single best-effort fetch.
-_wd_fetch_origin() { git -C "$1" fetch -q origin 2>/dev/null || true; }
-_wd_show_pushed()  { git -C "$1" show "origin/main:machine-docs/$2" 2>/dev/null || git -C "$1" show "origin/main:$2" 2>/dev/null || true; }
-
-_wd_last_sha=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""
-handoff_reset() { _wd_last_sha=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""; }   # call on phase transition
-# Signal handoffs off the loops' CONVENTIONAL COMMIT PREFIXES on origin/main — NOT by parsing
-# free-form markdown prose (brittle). The loops consistently prefix every gate claim `claim(...)`
-# and every verdict/finding `review(...)`. So: a new `claim(` commit pushed => ping the Adversary;
-# a new `review(` commit => ping the Builder. Edge-triggered on the origin/main SHA (append-only —
-# the loops never force-push), so it can't double-fire or mis-route. INBOX files are detected
-# separately (which file changed routes the ping). All reads are of the PUSHED state (what the
-# receiver pulls).
-handoff_check() {
-  local head subjects adv_inbox builder_inbox h
-  _wd_fetch_origin "$BUILDER_DIR"
-  head="$(git -C "$BUILDER_DIR" rev-parse origin/main 2>/dev/null || true)"
-  if [[ -n "$head" ]]; then
-    if [[ -z "$_wd_last_sha" ]]; then
-      _wd_last_sha="$head"   # baseline silently on first observation / restart
-    elif [[ "$head" != "$_wd_last_sha" ]]; then
-      subjects="$(git -C "$BUILDER_DIR" log --format='%s' "${_wd_last_sha}..origin/main" 2>/dev/null || true)"
-      if printf '%s\n' "$subjects" | grep -qiE '^claim'; then
-        log "handoff: new claim(...) commit on origin/main -> pinging Adversary"
-        ping_session "$ADV_SESSION" "watchdog ping: the Builder pushed a gate CLAIM (claim(...) commit). Pull and verify the claimed gate now."
-      fi
-      if printf '%s\n' "$subjects" | grep -qiE '^review'; then
-        log "handoff: new review(...) commit on origin/main -> pinging Builder"
-        ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary pushed a verdict/finding (review(...) commit). Pull REVIEW and act — proceed if it PASSes your gate, address it if it's a finding."
-      fi
-      _wd_last_sha="$head"
-    fi
-  fi
-
-  adv_inbox="$(_wd_show_pushed "$BUILDER_DIR" "ADVERSARY-INBOX.md")"
-  builder_inbox="$(_wd_show_pushed "$BUILDER_DIR" "BUILDER-INBOX.md")"
-
-  # INBOX side-channel (§6.1), detected on the pushed state. Receiver deletes after consuming =>
-  # absent on origin/main => re-arm so the next write re-pings.
-  if [[ -n "$adv_inbox" ]]; then
-    h="$(printf '%s' "$adv_inbox" | md5sum | awk '{print $1}')"
-    if [[ "$h" != "$_wd_adv_inbox_seen" ]]; then
-      log "handoff: ADVERSARY-INBOX.md new/changed (pushed) -> pinging Adversary"
-      ping_session "$ADV_SESSION" "watchdog ping: the Builder pushed machine-docs/ADVERSARY-INBOX.md — pull, read it, act, then delete the file (commit + push) to mark it consumed."
-      _wd_adv_inbox_seen="$h"
-    fi
-  else
-    _wd_adv_inbox_seen=""
-  fi
-  if [[ -n "$builder_inbox" ]]; then
-    h="$(printf '%s' "$builder_inbox" | md5sum | awk '{print $1}')"
-    if [[ "$h" != "$_wd_builder_inbox_seen" ]]; then
-      log "handoff: BUILDER-INBOX.md new/changed (pushed) -> pinging Builder"
-      ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary pushed machine-docs/BUILDER-INBOX.md — pull, read it, act, then delete the file (commit + push) to mark it consumed."
-      _wd_builder_inbox_seen="$h"
-    fi
-  else
-    _wd_builder_inbox_seen=""
-  fi
-}
-
-watchdog_loop() {
-  local idx pid status next
-  idx="$(cur_idx)"; pid="$(phase_id "$idx")"
-  log "watchdog up (phase=$pid [$((idx+1))/${#PHASES[@]}], seq='$(all_ids)', signal=${SIGNAL_INTERVAL}s, heavy=${WATCH_INTERVAL}s)"
-  local elapsed="$WATCH_INTERVAL"
-  while true; do
-    handoff_check
-    stall_check
-    if (( elapsed >= WATCH_INTERVAL )); then
-      elapsed=0
-      idx="$(cur_idx)"; pid="$(phase_id "$idx")"; status="$(phase_status "$idx")"
-      if phase_done "$status"; then
-        next=$((idx + 1))
-        if (( next < ${#PHASES[@]} )); then
-          log "PHASE $pid DONE (## DONE in $status) — auto-transitioning to $(phase_id "$next")."
-          stop_loops
-          echo "$next" > "$PHASE_IDX_FILE"
-          handoff_reset
-          start_loops
-        else
-          log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — entire build (1c→3) finished."
-          stop_loops
-          printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; entire build finished.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
-          log "watchdog exiting."
-          exit 0
-        fi
-      else
-        heal_session builder   "$BUILDER_SESSION" "$BUILDER_DIR"
-        heal_session adversary "$ADV_SESSION"     "$ADV_DIR"
-        heal_orchestrator
-      fi
-    fi
-    sleep "$SIGNAL_INTERVAL"
-    elapsed=$(( elapsed + SIGNAL_INTERVAL ))
-  done
-}
-
-start_watchdog() {
-  if session_alive "$WATCHDOG_SESSION"; then log "watchdog already running"; return 0; fi
-  log "starting watchdog"
-  tmux new-session -d -s "$WATCHDOG_SESSION" -c "$PLAN_DIR" \
-    "exec >>'$LOG_DIR/watchdog.log' 2>&1; '$SELF' watchdog"
-}
-
-cmd_status() {
-  local idx pid; idx="$(cur_idx)"; pid="$(phase_id "$idx")"
-  echo "  phase: $pid [$((idx+1))/${#PHASES[@]}]  plan=$(phase_plan "$idx")  status=$(phase_status "$idx")"
-  local s
-  for s in "$BUILDER_SESSION" "$ADV_SESSION" "$WATCHDOG_SESSION"; do
-    if session_alive "$s"; then echo "  $s: RUNNING"; else echo "  $s: stopped"; fi
-  done
-  if phase_done "$(phase_status "$idx")"; then echo "  phase $pid: ## DONE"; else echo "  phase $pid: in progress"; fi
-  [[ -f "$LOG_DIR/SEQUENCE-COMPLETE" ]] && echo "  >>> $(cat "$LOG_DIR/SEQUENCE-COMPLETE")"
-}
-
-case "${1:-}" in
-  start)
-    preflight
-    # Fresh sequence: stop any running loops, reset to phase 0 (unless RESUME_PHASE=1 keeps the idx).
-    stop_loops
-    if [[ "${RESUME_PHASE:-}" != "1" ]]; then echo 0 > "$PHASE_IDX_FILE"; fi
-    rm -f "$LOG_DIR/SEQUENCE-COMPLETE"
-    start_loops
-    start_watchdog
-    log "started at phase $(phase_id "$(cur_idx)"). status: ./launch.sh status | attach: tmux attach -t $BUILDER_SESSION"
-    ;;
-  watchdog)  preflight; watchdog_loop ;;
-  status)    cmd_status ;;
-  logs)
-    case "${2:-}" in
-      builder)   tail -f "$LOG_DIR/$BUILDER_SESSION.log" ;;
-      adversary) tail -f "$LOG_DIR/$ADV_SESSION.log" ;;
-      watchdog)  tail -f "$LOG_DIR/watchdog.log" ;;
-      *) die "usage: $0 logs builder|adversary|watchdog" ;;
-    esac
-    ;;
-  stop)
-    stop_loops
-    if session_alive "$WATCHDOG_SESSION"; then log "killing $WATCHDOG_SESSION"; tmux kill-session -t "$WATCHDOG_SESSION" || true; fi
-    log "stopped."
-    ;;
-  *)
-    cat <<EOF
-cc-ci loop launcher (phase-aware)
-
-  $0 start    start the phase sequence at phase 0 + watchdog (stops any running loops first)
-  $0 status   show phase + session + DONE state
-  $0 logs builder|adversary|watchdog   tail a log
-  $0 stop     stop both loops + watchdog
-  $0 watchdog run supervision loop in foreground
-
-Phase sequence (auto-transition on per-phase ## DONE; STOP after the last = manual gate):
-  $(all_ids)
-Env: LOOP_BACKEND=$LOOP_BACKEND  LOOP_MODEL=${LOOP_MODEL:-<default>}
-     claude:   CLAUDE_BIN=$CLAUDE_BIN  REMOTE_CONTROL=$REMOTE_CONTROL
-     opencode: OPENCODE_BIN=$OPENCODE_BIN  OPENCODE_SERVER=$OPENCODE_SERVER
-               (one shared server; each session attaches with --title; web UI: http://oc.commoninternet.net)
-     WATCH_INTERVAL=${WATCH_INTERVAL}s  SIGNAL_INTERVAL=${SIGNAL_INTERVAL}s
-     PHASES_SPEC='$PHASES_SPEC'
-     RESUME_PHASE=1 to keep the current phase index instead of resetting to 0.
-EOF
-    ;;
-esac
+# Thin wrapper — delegates everything to launch.py in the same directory.
+exec python3 "$(dirname "$(readlink -f "${BASH_SOURCE[0]}")")/launch.py" "$@"
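
The stall rule that the removed `_parse_waiting_until` and `stall_check_one` implemented can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the code in launch.py: the names `parse_waiting_until` and `should_reboot` are hypothetical, and the defaults simply mirror `STALL_IDLE=300` and `STALL_GRACE=180` from the old config block.

```python
import re
from datetime import datetime, timezone

# Same character class the old grep used for the marker's timestamp.
MARKER_RE = re.compile(r"WAITING-UNTIL:[ \t]*([0-9][0-9T:Z+-]+)")


def parse_waiting_until(pane):
    """Return epoch seconds of the LAST WAITING-UNTIL marker in pane text, or None."""
    matches = MARKER_RE.findall(pane)
    if not matches:
        return None
    ts = matches[-1]
    if ts.endswith("Z"):
        # Older datetime.fromisoformat versions reject a trailing "Z".
        ts = ts[:-1] + "+00:00"
    try:
        dt = datetime.fromisoformat(ts)
    except ValueError:
        return None  # unparsable marker: treat as "no marker"
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: markers are UTC
    return int(dt.timestamp())


def should_reboot(now, idle, until, stall_idle=300, stall_grace=180):
    """Decide whether an idle (not actively working) loop should be rebooted.

    until is the parsed marker time or None; idle is seconds spent idle so far.
    """
    if until is not None:
        # Declared wait: reboot only once we are stall_grace seconds PAST it,
        # so the watchdog never pre-empts a healthy self-wake.
        return now > until + stall_grace
    # No declared wait: treat as a wedge once idle long enough.
    return idle >= stall_idle
```

A caller would capture the tmux pane text, skip sessions that show an active-turn indicator, and only then apply `should_reboot` to the parsed marker.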