cc-ci-orchestrator

Author	SHA1	Message	Date
autonomic-bot	e144354668	loops: mandate machine-docs/ for ALL coordination files (kickoff/prompts/plan/AGENTS) Recent phases wrote STATUS/BACKLOG/REVIEW/JOURNAL to the repo ROOT because build_kickoff + plan.md's tree used bare filenames, even though the loops' AGENTS.md + INBOX/DECISIONS/DEFERRED conventions already said machine-docs/. Make machine-docs/ the single mandated home everywhere: build_kickoff now emits machine-docs/ paths + an explicit FILE-LOCATION RULE; both loop prompts and plan.md (tree + seed step) updated; orchestrator AGENTS.md documents + enforces it. resolve_state/INBOX handoff already read machine-docs/ first.	2026-06-11 20:56:24 +00:00
autonomic-bot	3fa3178546	watchdog: one-shot /upgrade-all trigger on phase-sequence completion When LOG_DIR/.run-upgrade-on-complete exists, the watchdog launches launch-upgrader.py start the moment the last phase reaches ## DONE (then consumes the flag). Lets the operator replace a scheduled weekly cron run with 'run as soon as the current phase queue finishes' — used tonight: the cc-ci-upgrade-all.timer was stopped (stamp forwarded past tonight's slot) and this flag set instead.	2026-06-11 20:49:54 +00:00
autonomic-bot	4275adc4a5	watchdog: phase_done ignores placeholder '## DONE' sections (skipped mailu) A Builder scaffolded 'STATUS-mailu.md' with a '## DONE / Not yet. Written here only when ...' placeholder section; phase_done's startswith('## DONE') matched it and auto-advanced past mailu without any of its work being done (no recipe PR, no claim, no review). Harden phase_done: a '## DONE' heading counts only when its first non-empty body line is not a placeholder/negation (Not yet / pending / TBD / when all / <...> etc). Verified against all shipped STATUS files (real DONEs still detected; mailu placeholder rejected).	2026-06-11 18:20:21 +00:00
autonomic-bot	211b4e231c	launch: per-phase model override (.loop-model[-adv]-<pid>) Lets a single phase pin a different model, read fresh each role_model call so a phase transition flips it automatically with no watchdog bounce. Operator wants builder on opus for the complex dstamp phase, reverting to sonnet from mailu on: .loop-model-dstamp=opus while base .loop-model stays sonnet.	2026-06-11 16:15:18 +00:00
autonomic-bot	969eb60df1	watchdog: probe-resumed tick returns True — don't evaluate stale pane after resume The tick whose probe resumed a session was continuing into stall logic with its pre-resume pane capture; a 4h-old WAITING-UNTIL in that stale data got the freshly-resumed adversary kill+rebooted (05:52). Treat probe-resume as handled-this-tick; the next 30s tick sees the live session.	2026-06-11 05:53:44 +00:00
autonomic-bot	5ea17fca21	watchdog: fix limit-probe self-match + scrollback dedupe wedge; plan(lvl5): badge shows level only Night-watch findings (monthly-spend-limit window, ~01:49-04:45): - probe text said 'usage limit' which matches LIMIT_RE, so a submitted probe kept limited_now true forever -> reworded to 'quota window' with a CAUTION note (nudge text must never match LIMIT_RE) - dedupe scanned all 40 captured lines, so once a probe scrolled into the conversation no further probe ever fired (builder/adv frozen at nudges=1, orchestrator probes degraded to hourly riding the wake scroll) -> dedupe now only checks the bottom 8 lines (input area) Core invariant HELD: zero kill+reboots during the limit window. plan(lvl5): operator addition - the top-corner level badge (card, dashboard pill, badge SVG) shows only the level number+color, zero capping info; the inline per-rung table keeps intentional-skip/unverified detail.	2026-06-11 05:52:26 +00:00
autonomic-bot	2e1ab8d384	watchdog: hourly orchestrator wake fires even during a limit window Operator request: the hourly supervision prompt should land regardless of limit state, as a fallback that keeps things on track if the limit-state machinery ever breaks. If the limit is genuinely still in force the wake is harmless (the banner just re-prints and limit_tick re-arms); once it lifts, the queued wake doubles as a resume nudge.	2026-06-11 01:00:29 +00:00
autonomic-bot	d6e1a704da	watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session Replace the blind every-300s 'limit appears lifted' nudge (claude) and the opencode-only _maybe_nudge_limit with one unified limit_tick state machine: - parse the reset time from the limit banner (last match wins; stale banners whose time already passed fall back rather than waiting ~a day) - arm a quiet window until reset+45s; parse failure -> flat 5-minute probe loop (operator-specified; not exponential backoff) - while armed, suppress ALL healing: a limit-stalled session is NEVER kill+rebooted (this was the conc-phase churn: claude limit stalls fell through to the generic idle reboot, losing the banner and re-hitting the limit fresh) - at window end send ONE nudge as a self-verifying probe: spinner clears the state; a re-printed banner re-arms from the fresh reset time - dedupe: never stack a probe while our own text is visible in the pane - state persisted per session in LOG_DIR (.limited-<session>) so watchdog restarts keep the window - orchestrator gets the same treatment: limit_tick in heal_orchestrator, a per-signal-tick orch_limit_check, and hourly wakes deferred during limit windows - loud WARNING at 3 probes, then continue flat probes forever Also rename the orchestrator session default cc-ci-orchestrator-vm -> cc-ci-orchestrator (launch.py ORCH_SESSION, launch-orchestrator.py SESSION, docs/scripts references).	2026-06-11 00:55:07 +00:00
autonomic-bot	e0c9f23391	feat(launch): ADV_MODEL — per-role model override for the Adversary loop	2026-06-10 04:03:35 +00:00
autonomic-bot	c0852d2302	feat(logs): readable greppable per-agent transcript logs (agent-log.py) The raw 'tmux pipe-pane' logs are TUI-escape soup (the 191MB builder log). agent-log.py renders Claude's own JSONL transcript into a clean one-event- per-line <agent>.clean.log — read-only on a file the agent writes anyway, so zero agent slowdown and zero extra tokens. Resolves each agent's transcript (disambiguating the shared project dir by kickoff signature; tracks restarts). 'follow-all' runs as the cc-ci-cleanlogs session, wired into launch.py start so it comes up with the loops. render/tail subcommands for ad-hoc use. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 04:35:17 +00:00
autonomic-bot	2b617ba19f	feat(launch): persist PHASES_SPEC to .phases-spec (status/watchdog/reboot agree) Mirror the .loop-backend pattern: env wins, else the persisted file, else the default build sequence. Without this, a custom single-phase run was invisible to bare 'launch.py status' and would NOT survive a reboot (the service has no PHASES_SPEC env). Now the current phase set is durable. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 00:17:34 +00:00
autonomic-bot	d349656c3b	feat(launch): forward PHASES_SPEC/backend to watchdog; mark plan Phase 4 as operator gate The watchdog is spawned into the existing tmux server and didn't reliably inherit a custom PHASES_SPEC — it would fall back to the default 11-phase spec and mis-detect completion. Forward PHASES_SPEC/PHASE_IDX_FILE/ LOOP_BACKEND/LOOP_MODEL explicitly in the watchdog command so custom single-phase runs (like the mirror-enroll plan) work end-to-end. Also make the mirror-enroll plan's live-host-deploy step an explicit claim-and-wait operator gate for the loops. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-02 00:15:42 +00:00
autonomic-bot	ca6e68c08d	feat(orchestrator): fold hourly supervision wake into the watchdog The standalone ai-progress-monitor.sh waker pinged a hardcoded orchestrator session every 15m. Move that into the watchdog loop: ORCH_WAKE_INTERVAL (default 3600s) types the supervision prompt into the live orchestrator session, retrying each tick until it lands so a busy or briefly-absent orchestrator is never interrupted and no hour is skipped. Delete the now-redundant waker script; the prompt file is now driven by the watchdog. Reboot-safe by inheritance (the watchdog is started by cc-ci-loops.service). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 21:46:20 +00:00
autonomic-bot	24bf379b5b	feat(assistant): add opencode launcher and phase 6/7 plans	2026-06-01 12:59:03 +00:00
autonomic-bot	3412100240	fix(opencode): all issues from first live run resolved 1. API key: opencode doesn't support env: substitution in apiKey — write actual key value to ~/.config/opencode/opencode.jsonc at setup time (file is not committed to git; key sourced from .testenv). 2. Permission system: add permission:"allow" to opencode config (equivalent to --dangerously-skip-permissions) to avoid interactive prompts. 3. Submit key: opencode TUI uses Enter (return) to submit; Ctrl+S not needed. ping_session already uses Enter — keep as is. 4. Startup timing: bump opencode TUI init wait from 4s to 8s so the TUI is fully connected to the server before bootstrap is sent. 5. Backend persistence: LOOP_BACKEND/LOOP_MODEL written to .loop-backend / .loop-model so the watchdog uses them when restarting dead sessions. All tested: both builder and adversary sessions alive, deepseek-v4-pro processing kickoffs via tinfoil inference.tinfoil.sh, no API/permission errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 18:21:10 +00:00
autonomic-bot	cd5e645427	fix(opencode): use inference.tinfoil.sh + attach TUI + NO_COLOR Three fixes discovered during first live run: - inference host is inference.tinfoil.sh not api.tinfoil.sh (control plane only serves /v1/models, not /v1/chat/completions) - opencode run exits after one turn; switch to opencode attach for the persistent TUI, then ping_session sends the kickoff prompt - NO_COLOR=1 suppresses the first-run interactive theme picker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:56:06 +00:00
autonomic-bot	bca51071bd	refactor: rewrite launchers as Python; add orchestrator JOURNAL.md Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@" All logic lives in the Python scripts (pure stdlib, no deps). launch.py — loops + watchdog: Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog, handoff signalling, stall detection, heal_session, heal_orchestrator. Cleaner structure: config block → helpers → phase/kickoff/agent/healing/ handoff/watchdog/main. LOOP_BACKEND + LOOP_MODEL switches throughout. launch-orchestrator.py — orchestrator session: claude path: --resume <id> preserved (conversation survives reboots). opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients the new session; reads JOURNAL.md for context). STARTUP_PROMPT updated to reference JOURNAL.md on startup. launch-upgrader.py — one-shot upgrade job: LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND / UPGRADER_MODEL. Both claude and opencode paths supported. cc-ci-plan/JOURNAL.md — new orchestrator handoff file: Persistent across conversation resets. Documents the handoff format and carries the current session's summary: migration complete, phase 5 in progress (V3/V7 PASS), phase 4 deferred, open items for next session. AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:50:09 +00:00
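
Note on 2b617ba19f: the "env wins, else the persisted file, else the default build sequence" precedence can be sketched as below. This is a minimal illustration, not the code in launch.py. The function name `resolve_phases_spec`, the `DEFAULT_PHASES_SPEC` value, and the assumption that `.phases-spec` lives in the log directory are all hypothetical; only the `PHASES_SPEC` variable, the `.phases-spec` filename, and the precedence order come from the commit message.

```python
import os
from pathlib import Path

# Hypothetical stand-in for the default build sequence (the real value is not shown in the log).
DEFAULT_PHASES_SPEC = "default-build-sequence"


def resolve_phases_spec(log_dir, env=None, default=DEFAULT_PHASES_SPEC):
    """Return the phase spec: env wins, else the persisted file, else the default."""
    env = os.environ if env is None else env

    # 1. An explicit, non-blank environment value always wins.
    from_env = env.get("PHASES_SPEC", "").strip()
    if from_env:
        return from_env

    # 2. Otherwise fall back to the persisted file (location in log_dir is an assumption).
    spec_file = Path(log_dir) / ".phases-spec"
    if spec_file.is_file():
        persisted = spec_file.read_text().strip()
        if persisted:
            return persisted

    # 3. Nothing set anywhere: use the default sequence.
    return default
```

Under this sketch, a service started after a reboot with no `PHASES_SPEC` in its environment would still pick up the last persisted spec, which is the behaviour the commit describes.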

17 Commits
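
Note on 4275adc4a5: the hardened check ("a '## DONE' heading counts only when its first non-empty body line is not a placeholder/negation") might look roughly like the sketch below. It is an illustration under stated assumptions, not the watchdog's actual code. The pattern list is built only from the examples named in the commit message (Not yet / pending / TBD / when all / <...>), and treating an empty DONE section as "not done" is an assumption the commit does not confirm.

```python
import re

# Assumed placeholder/negation patterns, taken from the examples in the commit message.
PLACEHOLDER_RE = re.compile(r"^\s*(not yet|pending|tbd|when all|<[^>]*>)", re.IGNORECASE)


def phase_done(status_text):
    """Return True only if a '## DONE' section has a real (non-placeholder) first body line."""
    lines = status_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("## DONE"):
            continue
        # Inspect the first non-empty line after the heading.
        for body in lines[i + 1:]:
            if not body.strip():
                continue
            if body.startswith("#"):
                break  # next heading reached: section is empty (assumed to mean not done)
            if not PLACEHOLDER_RE.match(body):
                return True
            break  # first body line is a placeholder: this heading does not count
    return False
```

With this sketch, a scaffolded section such as "## DONE" followed by "Not yet. Written here only when ..." is rejected, while a heading followed by an actual completion note is accepted.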