cc-ci-orchestrator

Author	SHA1	Message	Date
autonomic-bot	f94be45f9c	watchdog: cover all parts of the weekly run + survive the systemd oneshot Two gaps for the scheduled Thursday glm-5.2 run: 1. Survival: the watchdog was a Popen child of the Type=oneshot service, which systemd's cgroup cleanup kills on exit. Spawn it under the persistent tmux server instead (_spawn_watchdog), like the run sessions — survives the oneshot. 2. The report runs on glm-5.2 sharing the same opencode-go budget the upgrade run drains, so it can 429-stall with no recovery. launch-report.py now spawns the SAME watchdog pointed at the cc-ci-report session (generic via UPGRADER_SESSION/ _MODEL/_DONE_MARKER/_RESUME_FILE), with a report-specific resume prompt. Also: _run_pids() is now scoped to the managed session (title or -s <sid>) so the report watchdog can't kill the idle upgrader process and vice-versa; resume() adds --dir and honors a custom resume prompt file. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 02:42:50 +00:00
autonomic-bot	5a6c62e36c	launch-upgrader: fix false completion detection (prompt contains the marker) _completed() grepped the log for UPGRADE RUN COMPLETE, but the kickoff/resume PROMPT (a user message) contains that string verbatim, so it false-positived 'done' while the run was still going. Check the model's ASSISTANT message output via the web server API instead (log grep only as an offline, prompt-excluding fallback). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 01:42:06 +00:00
autonomic-bot	6f9cbc1a56	launch-upgrader: rename babysit -> watchdog (match agents.py convention) Subcommand, function, env (UPGRADER_WATCHDOG), and log file renamed; behavior unchanged. Only the opencode upgrader 'start' auto-spawns it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 01:33:07 +00:00
autonomic-bot	28ef7e44ab	launch-upgrader: add stall-detect + auto-resume watchdog (opencode-go limit) The opencode-go subscription's rolling usage-limit (429) ends the 'opencode run' agent loop mid-run; it does NOT self-resume. Add: - resume: continue the SAME session (context preserved) via 'opencode run -s <id> --continue' — finds the session from the web server, kills the idle proc safely (via /proc scan, never pkill -f self-match), relaunches in the tmux session. - babysit: poll the session log; on a stall (>15min idle) wait out any 429 retry-after then auto-resume. Spawned automatically by an opencode 'start'. So a usage-limit pause now self-heals instead of needing a manual nudge. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-23 01:26:24 +00:00
autonomic-bot	5351ec2e40	launch-upgrader: default to opencode-go/glm-5.2 when unset Weekly upgrade run now defaults backend=opencode, model=opencode-go/glm-5.2 with no env set. Model default tracks backend (claude override → sonnet). Override via LOOP_BACKEND/LOOP_MODEL or /srv/cc-ci/upgrader.env. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 20:23:26 +00:00
autonomic-bot	1443ccaea5	weekly upgrade: optional backend/model via /srv/cc-ci/upgrader.env cc-ci-upgrade-all now reads an optional EnvironmentFile so the weekly run can switch backend/model (e.g. LOOP_BACKEND=opencode LOOP_MODEL=opencode-go/glm-5.2) without a rebuild. Absent file → claude/sonnet (unchanged). Built+switched on cc-ci-orchestrator-hetzner, host verified healthy. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 20:21:16 +00:00
autonomic-bot	ec18c98af6	launch-upgrader: fix opencode --model placement + add web-attach/--share The opencode backend emitted 'opencode --model X run ...' but -m/--model is a flag on the run subcommand, so the model was being ignored. Move it after run. Add OPENCODE_SHARE (default on): attach the session to the shared opencode web server (oc.commoninternet.net) AND create a public --share link for monitoring. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 20:14:27 +00:00
autonomic-bot	bca51071bd	refactor: rewrite launchers as Python; add orchestrator JOURNAL.md Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@" All logic lives in the Python scripts (pure stdlib, no deps). launch.py — loops + watchdog: Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog, handoff signalling, stall detection, heal_session, heal_orchestrator. Cleaner structure: config block → helpers → phase/kickoff/agent/healing/ handoff/watchdog/main. LOOP_BACKEND + LOOP_MODEL switches throughout. launch-orchestrator.py — orchestrator session: claude path: --resume <id> preserved (conversation survives reboots). opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients the new session; reads JOURNAL.md for context). STARTUP_PROMPT updated to reference JOURNAL.md on startup. launch-upgrader.py — one-shot upgrade job: LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND / UPGRADER_MODEL. Both claude and opencode paths supported. cc-ci-plan/JOURNAL.md — new orchestrator handoff file: Persistent across conversation resets. Documents the handoff format and carries the current session's summary: migration complete, phase 5 in progress (V3/V7 PASS), phase 4 deferred, open items for next session. AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:50:09 +00:00

8 Commits