Replace the blind every-300s 'limit appears lifted' nudge (claude) and the opencode-only _maybe_nudge_limit with one unified limit_tick state machine:

- parse the reset time from the limit banner (last match wins; stale banners whose time already passed fall back rather than waiting ~a day)
- arm a quiet window until reset+45s; parse failure -> flat 5-minute probe loop (operator-specified; not exponential backoff)
- while armed, suppress ALL healing: a limit-stalled session is NEVER kill+rebooted (this was the conc-phase churn: claude limit stalls fell through to the generic idle reboot, losing the banner and re-hitting the limit fresh)
- at window end send ONE nudge as a self-verifying probe: spinner clears the state; a re-printed banner re-arms from the fresh reset time
- dedupe: never stack a probe while our own text is visible in the pane
- state persisted per session in LOG_DIR (.limited-<session>) so watchdog restarts keep the window
- orchestrator gets the same treatment: limit_tick in heal_orchestrator, a per-signal-tick orch_limit_check, and hourly wakes deferred during limit windows
- loud WARNING at 3 probes, then continue flat probes forever

Also rename the orchestrator session default cc-ci-orchestrator-vm -> cc-ci-orchestrator (launch.py ORCH_SESSION, launch-orchestrator.py SESSION, docs/scripts references).
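The limit_tick state machine described in this commit can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the `launch.py` implementation. Several pieces are hypothetical: the banner wording matched by `LIMIT_RE`/`RESET_RE`, the JSON layout of the `.limited-<session>` file, the `working` flag (spinner visible), the `send_nudge` hook, and the "last non-empty line" dedupe test.

```python
from __future__ import annotations

import json
import logging
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from pathlib import Path
from typing import Callable

# HYPOTHETICAL banner text: the real limit banner wording is not given here.
LIMIT_RE = re.compile(r"limit reached", re.IGNORECASE)
RESET_RE = re.compile(r"resets at (\d{1,2}):(\d{2})", re.IGNORECASE)
RESET_GRACE = timedelta(seconds=45)  # quiet window ends at reset + 45 s
PROBE_EVERY = timedelta(minutes=5)   # flat probe interval (no exponential backoff)
WARN_AFTER = 3                       # loud WARNING when this many probes were sent


@dataclass
class LimitState:
    until: float     # epoch seconds at which the quiet window ends
    probes: int = 0  # probes sent in this limit episode


def parse_reset(pane: str, now: datetime) -> datetime | None:
    """Reset time from the LAST match in the pane; None if absent, invalid or stale."""
    matches = RESET_RE.findall(pane)
    if not matches:
        return None
    hh, mm = (int(x) for x in matches[-1])  # last match wins
    try:
        reset = now.replace(hour=hh, minute=mm, second=0, microsecond=0)
    except ValueError:  # e.g. "resets at 99:00"
        return None
    # A time that already passed is a stale banner: fall back to probing
    # instead of rolling over to tomorrow and waiting ~a day.
    return reset if reset > now else None


def window_end(pane: str, now: datetime) -> float:
    reset = parse_reset(pane, now)
    return (reset + RESET_GRACE if reset else now + PROBE_EVERY).timestamp()


def probe_pending(pane: str, nudge_text: str) -> bool:
    # ASSUMPTION: an unanswered probe is our text on the last non-empty line.
    lines = [ln for ln in pane.splitlines() if ln.strip()]
    return bool(lines) and nudge_text in lines[-1]


def limit_tick(log_dir: Path, session: str, pane: str, now: datetime, working: bool,
               nudge_text: str, send_nudge: Callable[[str, str], None]) -> bool:
    """One watchdog tick for one session. True means: limit-stalled, skip ALL healing."""
    path = log_dir / f".limited-{session}"  # persisted so watchdog restarts keep the window
    st = LimitState(**json.loads(path.read_text())) if path.exists() else None
    if st is None:
        if working or not LIMIT_RE.search(pane):
            return False  # not limited: normal healing applies
        st = LimitState(until=window_end(pane, now))
    elif working:
        path.unlink()  # spinner visible: the probe worked, clear the state
        return False
    elif now.timestamp() < st.until:
        return True  # inside the quiet window: no nudge, no kill+reboot
    elif parse_reset(pane, now) is not None:
        st.until = window_end(pane, now)  # banner re-printed: re-arm from the fresh time
    else:
        if not probe_pending(pane, nudge_text):  # dedupe: never stack probes
            send_nudge(session, nudge_text)
            st.probes += 1
            if st.probes == WARN_AFTER:
                logging.warning("%s still limited after %d probes", session, st.probes)
        st.until = (now + PROBE_EVERY).timestamp()
    path.write_text(json.dumps(asdict(st)))
    return True
```

A caller would run `limit_tick` for each session on every watchdog tick. Whenever it returns True, the caller skips its dead/idle healing for that session.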
orchestration.md — cc-ci agent map (root)
The root structure of the cc-ci build system: who the agents are, where each one's prompt and plan live, and how they're kept on track. Start here. The watchdog and the coordination protocol are the subtleties beneath this map (see the two "Subtlety" sections below).
Agents
| Agent | Role (one line) | Prompt | Plan / SSOT | Session · workdir · launcher |
|---|---|---|---|---|
| Builder | Builds the CI server; one of two independent loops | `cc-ci-plan/prompts/builder.md` | the current phase plan (`launch.py status` names it) + master `cc-ci-plan/plan.md` | `cc-ci-builder` · `/srv/cc-ci/cc-ci` · `launch.py` |
| Adversary | Independently disbelieves & verifies the Builder; owns REVIEW + veto | `cc-ci-plan/prompts/adversary.md` | same current phase plan (verifies against it) + `plan.md` | `cc-ci-adv` · `/srv/cc-ci/cc-ci-adv` · `launch.py` |
| Orchestrator | Keeps everyone on track: supervises, nudges, fixes plans/prompts, owns the host-level fallback | wake: `cc-ci-plan/ai-progress-monitor-prompt.txt` → this doc (§The orchestrator's job) | this doc + `cc-ci-plan/JOURNAL.md` (handoff record) | `cc-ci-orchestrator` · `/srv/cc-ci-orch` · `launch-orchestrator.py` |
| Assistant | One-shot agent dispatched for cross-cutting passes (e.g. mirror reconcile); idle unless dispatched | assignment set at launch (`launch-assistant.py`) | the task it's dispatched with | `cc-ci-assistant` · `/srv/cc-ci-orch` · `launch-assistant.py` |
| Upgrader | Weekly one-shot: runs `/upgrade-all` (recipe-upgrade survey + PRs, never merges) | the `/upgrade-all` skill | triggered by the `cc-ci-upgrade-all` systemd timer (Sun 02:00 UTC) | `cc-ci-upgrader` · `/srv/cc-ci` · `launch-upgrader.py` |
Phases are defined in `.cc-ci-logs/.phases-spec` (`id|planfile|statusfile`, persisted by `launch.py start`); `launch.py status` shows the current one. The backend is claude/sonnet (`.loop-backend` / `.loop-model`).
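Reading `.phases-spec` might look like the sketch below. Only the `id|planfile|statusfile` field order comes from this doc. The one-entry-per-line layout, blank-line skipping and the `read_phases`/`Phase` names are assumptions, not `launch.py`'s own parser.

```python
from __future__ import annotations

from pathlib import Path
from typing import NamedTuple


class Phase(NamedTuple):
    id: str
    planfile: str
    statusfile: str


def read_phases(spec: Path) -> list[Phase]:
    """Parse .phases-spec, ASSUMING one `id|planfile|statusfile` entry per line."""
    phases = []
    for n, line in enumerate(spec.read_text().splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ASSUMPTION: blank lines are ignored
        fields = [f.strip() for f in line.split("|")]
        if len(fields) != 3:
            raise ValueError(f"{spec}:{n}: expected id|planfile|statusfile, got {line!r}")
        phases.append(Phase(*fields))
    return phases
```

Keeping the order of entries matters here: the phase sequence is whatever order the spec lists them in.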
The orchestrator's job — keep everyone on track
On each scheduled wake (and on startup; see `AGENTS.md`):

- Current phase, read live: `python3 cc-ci-plan/launch.py status` → current phase id, its plan file, its `STATUS-<id>.md`. Never assume a phase; whatever it reports IS the phase.
- Live-state checks: builder/adv/watchdog panes (`tmux capture-pane -pt …`); `.loop-backend` = `claude` and `.loop-model` = `sonnet`; `ssh cc-ci hostname` reachable.
- Keep them moving; only intervene where the watchdog can't:
  - Builder stalled / idle past its WAITING-UNTIL with no work → nudge it to continue the current phase.
  - Adversary stale / on old evidence → nudge it to re-orient + verify outstanding claims.
  - Loop at high context (≳85%) → nudge it to `/compact` (lossless; state is in git + STATUS/REVIEW).
  - Loop session missing → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`.
  - Revised a plan a loop is already working in? Ping the session to re-read it: loops read the plan at kickoff and won't see later edits unless told.
- Completion: a phase is done when its `STATUS-<id>.md` has a line starting with `## DONE`; the watchdog auto-advances. When the LAST phase finishes, the watchdog writes `SEQUENCE-COMPLETE`, stops the loops, and exits (so the hourly wake stops too) → append to `JOURNAL.md` + send a proactive PushNotification.
- Be decisive but minimal. Healthy + active → just note the state. Don't make unrelated changes.
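The completion rule above (a line starting with `## DONE` in `STATUS-<id>.md`) reduces to a small check. This is an illustrative sketch only: `phase_done` and `current_phase` are hypothetical names. It assumes the current phase is the first one in spec order that isn't DONE, and that a missing status file means "not done". `None` stands for the point where the watchdog would write `SEQUENCE-COMPLETE`.

```python
from __future__ import annotations

from pathlib import Path


def phase_done(status_file: Path) -> bool:
    """True once STATUS-<id>.md has a line starting with '## DONE'."""
    if not status_file.exists():
        return False  # ASSUMPTION: no status file yet means not done
    return any(line.startswith("## DONE") for line in status_file.read_text().splitlines())


def current_phase(repo: Path, phase_ids: list[str]) -> str | None:
    """First phase (in spec order) that isn't DONE; None means the sequence is complete."""
    for pid in phase_ids:
        if not phase_done(repo / f"STATUS-{pid}.md"):
            return pid
    return None
```

Note the check is anchored at the start of the line, so an indented or quoted `## DONE` does not count.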
Subtlety: the watchdog (launch.py watchdog)
A non-agent supervisor loop (the `cc-ci-watchdog` tmux session, started by `launch.py start` or the
`cc-ci-loops.service` boot unit). It heals dead/wedged loop sessions, pings the other loop on every
`claim(...)`/`review(...)` commit (the handoff signal), enforces liveness (kills and reboots a loop idle
past its WAITING-UNTIL), auto-advances phases when a `STATUS-<id>.md` hits `## DONE`, and writes
`SEQUENCE-COMPLETE` at the end. It also fires the hourly orchestrator wake (this doc, via the wake
prompt). It does NOT compact loops or make decisions; that's the orchestrator's job.
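One way to picture the watchdog's per-loop healing decision is the sketch below. It includes the usage-limit gate (a limit-stalled session is never killed and rebooted) and the hourly wake being deferred during the orchestrator's own limit window. All names are hypothetical (`LoopView`, `Action`, `decide`, `hourly_wake_due`). How `launch.py` actually detects "wedged" or parses WAITING-UNTIL isn't described here, so those arrive as plain inputs.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class Action(Enum):
    NONE = "none"
    REBOOT = "reboot"            # session dead or wedged: restart it
    KILL_REBOOT = "kill+reboot"  # idle past WAITING-UNTIL: enforce liveness


@dataclass
class LoopView:
    alive: bool                     # tmux session exists and responds
    limited: bool                   # a usage-limit window is armed for this loop
    idle: bool                      # no activity in the pane
    waiting_until: datetime | None  # self-declared WAITING-UNTIL, if any


def decide(loop: LoopView, now: datetime) -> Action:
    # The limit gate comes first: rebooting a limit-stalled loop would lose
    # the banner and just hit the limit again.
    if loop.limited:
        return Action.NONE
    if not loop.alive:
        return Action.REBOOT
    if loop.idle and loop.waiting_until is not None and now > loop.waiting_until:
        return Action.KILL_REBOOT
    return Action.NONE


def hourly_wake_due(last_wake: datetime, now: datetime, orch_limited: bool) -> bool:
    """Hourly orchestrator wake, deferred while the orchestrator is limit-stalled."""
    return not orch_limited and (now - last_wake).total_seconds() >= 3600
```

Putting the limit check ahead of every healing branch is the point of the design: no other rule can override it.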
Subtlety: coordination protocol (plan.md §6.1)
The two loops coordinate only through the cc-ci git repo, never directly:

- `git pull --rebase` before every edit; smallest change; commit; push (never `--force`).
- Commit-prefix convention (the watchdog depends on it): `claim(...)` = Builder claims a gate; `review(...)` = Adversary verdict. Those prefixes ARE the handoff signal.
- Phase-namespaced state files in the repo: `STATUS-<id>.md`, `BACKLOG-<id>.md`, `REVIEW-<id>.md`, `JOURNAL-<id>.md`; `DECISIONS.md` is shared.
- Inbox side-channels for non-gate messages: `BUILDER-INBOX.md` / `ADVERSARY-INBOX.md` (the watchdog edge-pings on appearance; the consumer deletes on read).
- Full rules: `plan.md` §6.1 (coordination), §7 (pacing/liveness), §9 (guardrails).
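The commit-prefix handoff and the inbox edge-pings can be sketched as pure functions over what the watchdog would observe on each tick. The inputs are the commit subjects that are new since the last tick and the set of inbox files currently present. Only the `claim(`/`review(` prefixes and the inbox file names come from this doc. The function names, the loop labels and the per-tick bookkeeping are hypothetical.

```python
from __future__ import annotations

# Handoff: a commit prefix tells the watchdog which loop to ping.
PREFIX_TARGET = {"claim(": "adversary", "review(": "builder"}
INBOXES = {"BUILDER-INBOX.md": "builder", "ADVERSARY-INBOX.md": "adversary"}


def handoff_pings(new_subjects: list[str]) -> list[str]:
    """One ping per claim(...)/review(...) commit seen since the last tick, in commit order."""
    pings: list[str] = []
    for subject in new_subjects:
        for prefix, target in PREFIX_TARGET.items():
            if subject.startswith(prefix):
                pings.append(target)
    return pings


def inbox_pings(prev: set[str], now_present: set[str]) -> list[str]:
    """Edge-triggered: ping only when an inbox file newly APPEARS.

    A file that stays present between ticks is not re-pinged; once the
    consumer deletes it, its next appearance pings again.
    """
    return sorted(INBOXES[name] for name in now_present - prev if name in INBOXES)
```

Edge-triggering is what keeps an unread inbox from producing a ping on every tick.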
See also
- `AGENTS.md`: orchestrator on-startup routine + host/reboot facts (Hetzner `cpx22`).
- `plan.md`: master build SSOT.