cc-ci-orchestrator/cc-ci-plan/orchestration.md
autonomic-bot d6e1a704da watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session
Replace the blind every-300s 'limit appears lifted' nudge (claude) and the
opencode-only _maybe_nudge_limit with one unified limit_tick state machine:

- parse the reset time from the limit banner (last match wins; stale banners
  whose time already passed fall back rather than waiting ~a day)
- arm a quiet window until reset+45s; parse failure -> flat 5-minute probe
  loop (operator-specified; not exponential backoff)
- while armed, suppress ALL healing: a limit-stalled session is NEVER
  kill+rebooted (this was the conc-phase churn: claude limit stalls fell
  through to the generic idle reboot, losing the banner and re-hitting
  the limit fresh)
- at window end send ONE nudge as a self-verifying probe: spinner clears
  the state; a re-printed banner re-arms from the fresh reset time
- dedupe: never stack a probe while our own text is visible in the pane
- state persisted per session in LOG_DIR (.limited-<session>) so watchdog
  restarts keep the window
- orchestrator gets the same treatment: limit_tick in heal_orchestrator,
  a per-signal-tick orch_limit_check, and hourly wakes deferred during
  limit windows
- loud WARNING at 3 probes, then continue flat probes forever
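
A minimal sketch of the quiet-window logic described in the list above, not the actual launch.py code. The banner shape matched by `RESET_RE` is an assumption (the real banner text is not shown here), and `quiet_until` and this two-argument `limit_tick` are hypothetical helpers written only for illustration.

```python
import re
from datetime import datetime, timedelta, timezone

# ASSUMPTION: hypothetical banner shape, e.g. "limit reached, resets 14:30 UTC".
RESET_RE = re.compile(r"resets (\d{1,2}):(\d{2}) UTC")
GRACE = timedelta(seconds=45)       # "reset+45s"
FLAT_PROBE = timedelta(minutes=5)   # flat probe loop, no exponential backoff


def quiet_until(pane_text, now):
    """Return the time before which no nudge or heal should be attempted."""
    matches = RESET_RE.findall(pane_text)
    if matches:
        hour, minute = map(int, matches[-1])  # last match wins
        if hour < 24 and minute < 60:
            reset = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
            if reset + GRACE > now:
                return reset + GRACE
    # Parse failure or a stale banner whose time already passed:
    # fall back to the flat 5-minute probe instead of waiting ~a day.
    return now + FLAT_PROBE


def limit_tick(armed_until, now):
    """Decide what one watchdog tick does for a limit-stalled session."""
    return "suppress_healing" if now < armed_until else "send_probe"


if __name__ == "__main__":
    now = datetime(2026, 6, 11, 12, 0, tzinfo=timezone.utc)
    armed = quiet_until("limit reached, resets 14:30 UTC", now)
    print(armed.isoformat(), limit_tick(armed, now))
```

In this sketch a banner time that reads as already past (including one that really refers to just after midnight) is treated as stale and probed on the flat interval, which errs on the side of probing early and never waits a full day.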

Also rename the orchestrator session default cc-ci-orchestrator-vm ->
cc-ci-orchestrator (launch.py ORCH_SESSION, launch-orchestrator.py SESSION,
docs/scripts references).
2026-06-11 00:55:07 +00:00


orchestration.md — cc-ci agent map (root)

This is the root structure of the cc-ci build system: who the agents are, where each one's prompt and plan live, and how they're kept on track. Start here. The watchdog and the coordination protocol are the subtleties beneath this map (see the two "Subtlety" sections below).

Agents

| Agent | Role (one line) | Prompt | Plan / SSOT | Session · workdir · launcher |
|---|---|---|---|---|
| Builder | Builds the CI server; one of two independent loops | cc-ci-plan/prompts/builder.md | the current phase plan (launch.py status names it) + master cc-ci-plan/plan.md | cc-ci-builder · /srv/cc-ci/cc-ci · launch.py |
| Adversary | Independently disbelieves & verifies the Builder; owns REVIEW + veto | cc-ci-plan/prompts/adversary.md | same current phase plan (verifies against it) + plan.md | cc-ci-adv · /srv/cc-ci/cc-ci-adv · launch.py |
| Orchestrator | Keeps everyone on track: supervises, nudges, fixes plans/prompts, owns the host-level fallback | wake: cc-ci-plan/ai-progress-monitor-prompt.txt; this doc (§The orchestrator's job) | this doc + cc-ci-plan/JOURNAL.md (handoff record) | cc-ci-orchestrator · /srv/cc-ci-orch · launch-orchestrator.py |
| Assistant | One-shot agent dispatched for cross-cutting passes (e.g. mirror reconcile); idle unless dispatched | assignment set at launch (launch-assistant.py) | the task it's dispatched with | cc-ci-assistant · /srv/cc-ci-orch · launch-assistant.py |
| Upgrader | Weekly one-shot: runs /upgrade-all (recipe-upgrade survey + PRs, never merges) | the /upgrade-all skill | triggered by the cc-ci-upgrade-all systemd timer (Sun 02:00 UTC) | cc-ci-upgrader · /srv/cc-ci · launch-upgrader.py |

Phases are defined in .cc-ci-logs/.phases-spec (id|planfile|statusfile, persisted by launch.py start); launch.py status shows the current one. Backend is claude/sonnet (.loop-backend / .loop-model).
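
As an illustration of the id|planfile|statusfile layout, here is a small parsing sketch. It assumes one phase per line and no escaping of the `|` character; `Phase` and `parse_phases_spec` are hypothetical names, and the phase ids in the usage lines are invented for the example.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    id: str
    planfile: str
    statusfile: str


def parse_phases_spec(text):
    """Parse 'id|planfile|statusfile' lines, skipping blank ones."""
    phases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Raises ValueError if a line does not have exactly three fields.
        phase_id, planfile, statusfile = line.split("|")
        phases.append(Phase(phase_id, planfile, statusfile))
    return phases


if __name__ == "__main__":
    # Invented example content, not the real spec file.
    sample = "p1|plan-p1.md|STATUS-p1.md\np2|plan-p2.md|STATUS-p2.md\n"
    for phase in parse_phases_spec(sample):
        print(phase.id, phase.planfile, phase.statusfile)
```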

The orchestrator's job — keep everyone on track

On each scheduled wake (and on startup — see AGENTS.md):

  1. Current phase, read live: python3 cc-ci-plan/launch.py status → current phase id, its plan file, its STATUS-<id>.md. Never assume a phase; whatever it reports IS the phase.
  2. Live-state checks: builder/adv/watchdog panes (tmux capture-pane -pt …); .loop-backend=claude & .loop-model=sonnet; ssh cc-ci hostname reachable.
  3. Keep them moving — only intervene where the watchdog can't:
    • Builder stalled / idle past its WAITING-UNTIL with no work → nudge to continue the current phase.
    • Adversary stale / on old evidence → nudge to re-orient + verify outstanding claims.
    • Loop at high context (≳85%) → nudge it to /compact (lossless; state is in git + STATUS/REVIEW).
    • Loop session missing → RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start.
    • Revised a plan a loop is already working in? Ping the session to re-read it — loops read the plan at kickoff and won't see later edits unless told.
  4. Completion: a phase is done when its STATUS-<id>.md has a line starting with ## DONE; the watchdog auto-advances. When the LAST phase finishes the watchdog writes SEQUENCE-COMPLETE, stops the loops, and exits (so the hourly wake stops too) → append to JOURNAL.md + proactive PushNotification.
  5. Be decisive but minimal. Healthy + active → just note the state. Don't make unrelated changes.
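
The completion rule in step 4 can be sketched as a one-line check. This is an illustrative helper (`phase_done` is a hypothetical name), not the watchdog's actual code; it takes "a line starting with ## DONE" literally, so indented or differently leveled headings do not count.

```python
def phase_done(status_text):
    """True if any line of a STATUS-<id>.md body starts with '## DONE'."""
    return any(line.startswith("## DONE") for line in status_text.splitlines())


if __name__ == "__main__":
    print(phase_done("# Status\nwork in progress\n"))
    print(phase_done("# Status\n## DONE\n"))
```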

Subtlety: the watchdog (launch.py watchdog)

A non-agent supervisor loop (cc-ci-watchdog tmux session, started by launch.py start / the cc-ci-loops.service boot unit). It: heals dead/wedged loop sessions, pings the other loop on every claim(...)/review(...) commit (the handoff signal), enforces liveness (kills+reboots a loop idle past its WAITING-UNTIL), auto-advances phases when a STATUS-<id>.md hits ## DONE, and writes SEQUENCE-COMPLETE at the end. It also fires the hourly orchestrator wake (this doc, via the wake prompt). It does NOT compact loops or make decisions — that's the orchestrator.

Subtlety: coordination protocol (plan.md §6.1)

The two loops coordinate only through the cc-ci git repo — never directly:

  • git pull --rebase before every edit; smallest change; commit; push (never --force).
  • Commit-prefix convention (the watchdog depends on it): claim(...) = Builder claims a gate; review(...) = Adversary verdict. Those prefixes ARE the handoff signal.
  • Phase-namespaced state files in the repo: STATUS-<id>.md, BACKLOG-<id>.md, REVIEW-<id>.md, JOURNAL-<id>.md; DECISIONS.md is shared.
  • Inbox side-channels for non-gate messages: BUILDER-INBOX.md / ADVERSARY-INBOX.md (watchdog edge-pings on appearance; consumer deletes on read).
  • Full rules: plan.md §6.1 (coordination), §7 (pacing/liveness), §9 (guardrails).
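
To make the commit-prefix handoff concrete, the sketch below maps a commit subject to the session that would be pinged. It assumes the prefix sits at the very start of the subject line; `handoff_target` is a hypothetical helper, and the commit subjects in the usage lines are invented.

```python
import re

# claim(...) comes from the Builder, review(...) from the Adversary.
PREFIX_RE = re.compile(r"^(claim|review)\(")


def handoff_target(subject):
    """Return the tmux session to ping for a commit subject, or None."""
    match = PREFIX_RE.match(subject)
    if match is None:
        return None  # not a handoff commit
    # The watchdog pings the *other* loop.
    return "cc-ci-adv" if match.group(1) == "claim" else "cc-ci-builder"


if __name__ == "__main__":
    # Invented commit subjects for illustration only.
    for subject in ("claim(gate-1): ready", "review(gate-1): PASS", "docs: typo"):
        print(subject, "->", handoff_target(subject))
```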

See also

  • AGENTS.md — orchestrator on-startup routine + host/reboot facts (Hetzner cpx22).
  • plan.md — master build SSOT.