Replace the blind every-300s 'limit appears lifted' nudge (claude) and the opencode-only `_maybe_nudge_limit` with one unified `limit_tick` state machine:

- parse the reset time from the limit banner (last match wins; stale banners whose time has already passed fall back rather than waiting ~a day)
- arm a quiet window until reset+45s; parse failure → flat 5-minute probe loop (operator-specified; not exponential backoff)
- while armed, suppress ALL healing: a limit-stalled session is NEVER kill+rebooted (this was the conc-phase churn: claude limit stalls fell through to the generic idle reboot, losing the banner and re-hitting the limit fresh)
- at window end, send ONE nudge as a self-verifying probe: a spinner clears the state; a re-printed banner re-arms from the fresh reset time
- dedupe: never stack a probe while our own text is visible in the pane
- state is persisted per session in LOG_DIR (`.limited-<session>`) so watchdog restarts keep the window
- the orchestrator gets the same treatment: `limit_tick` in `heal_orchestrator`, a per-signal-tick `orch_limit_check`, and hourly wakes deferred during limit windows
- loud WARNING at 3 probes, then continue flat probes forever

Also rename the orchestrator session default `cc-ci-orchestrator-vm` → `cc-ci-orchestrator` (`launch.py` `ORCH_SESSION`, `launch-orchestrator.py` `SESSION`, docs/scripts references).
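A minimal Python sketch of the per-session `limit_tick` state machine described above. It is not the `launch.py` code: the banner wording matched by `LIMIT_RE`/`RESET_RE` (a `resets 3pm`-style clock time, read as local time), `NUDGE_TEXT`, the `working` flag (the caller's spinner detection), the JSON layout of `.limited-<session>`, and the `send_nudge` callback are all hypothetical. Only the timing rules (reset+45s window, last banner match wins, stale times fall back, flat 300s probes, WARNING from the 3rd probe) follow the description.

```python
import json
import logging
import re
from dataclasses import asdict, dataclass
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional

GRACE_S = 45          # quiet window ends at reset + 45 s
PROBE_EVERY_S = 300   # flat 5-minute probe loop, never exponential
WARN_AFTER = 3        # loud WARNING from the 3rd probe on; probing continues
# Hypothetical pane text: the real banner wording is not documented here.
NUDGE_TEXT = "The usage limit should have reset; please continue."
LIMIT_RE = re.compile(r"usage limit|limit reached", re.IGNORECASE)
RESET_RE = re.compile(r"resets (\d{1,2})(?::(\d{2}))?\s*([ap]m)", re.IGNORECASE)


@dataclass
class LimitState:
    armed_until: float  # epoch seconds; stay quiet until then
    probes: int = 0     # probes sent since the state was created


def parse_reset(pane: str, now: datetime) -> Optional[datetime]:
    """Reset time from the LAST banner match; None if absent or already passed."""
    matches = RESET_RE.findall(pane)
    if not matches:
        return None
    hour, minute, ampm = matches[-1]
    h = int(hour) % 12 + (12 if ampm.lower() == "pm" else 0)
    reset = now.replace(hour=h, minute=int(minute or 0), second=0, microsecond=0)
    return reset if reset > now else None  # stale: fall back instead of waiting ~a day


def limit_tick(session: str, pane: str, working: bool, log_dir: Path,
               send_nudge: Callable[[str, str], None],
               now: Optional[datetime] = None) -> bool:
    """One tick for one session. True = limit-stalled: caller must skip ALL healing."""
    now = now or datetime.now()
    ts = now.timestamp()
    path = log_dir / f".limited-{session}"  # persisted so watchdog restarts keep the window
    if working:  # spinner visible: the limit has lifted
        path.unlink(missing_ok=True)
        return False
    st = LimitState(**json.loads(path.read_text())) if path.exists() else None
    if st is None and not LIMIT_RE.search(pane):
        return False  # not limit-stalled: normal healing applies
    reset = parse_reset(pane, now)
    fresh_end = reset.timestamp() + GRACE_S if reset is not None else None
    if st is None:  # arm until reset+45s, or one flat interval if parsing failed
        st = LimitState(armed_until=fresh_end if fresh_end is not None else ts + PROBE_EVERY_S)
    elif fresh_end is not None and fresh_end > st.armed_until:
        st.armed_until = fresh_end  # a re-printed banner re-arms from its reset time
    elif ts >= st.armed_until:
        # Dedupe (assumption): our text in the last 3 lines means a probe is still pending.
        tail = pane.rstrip().splitlines()[-3:]
        if not any(NUDGE_TEXT in line for line in tail):
            send_nudge(session, NUDGE_TEXT)
            st.probes += 1
            if st.probes >= WARN_AFTER:
                logging.warning("%s: %d limit probes without recovery", session, st.probes)
            st.armed_until = ts + PROBE_EVERY_S
    path.write_text(json.dumps(asdict(st)))
    return True
```

A `True` return is the watchdog's cue to skip every healing path for that session, which is what keeps a limit stall from falling through to the idle reboot. A re-printed banner needs no special case: because the last match wins, an old banner parses to a past time (fall back to probing) while a fresh one parses to a future time (re-arm).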
# orchestration.md — cc-ci agent map (root)
The root structure of the cc-ci build system: who the agents are, where each one's **prompt** and
**plan** live, and how they're kept on track. **Start here.** The watchdog and the coordination
protocol are the subtleties beneath this map (see the two "Subtlety" sections below).

## Agents

| Agent | Role (one line) | Prompt | Plan / SSOT | Session · workdir · launcher |
|---|---|---|---|---|
| **Builder** | Builds the CI server; one of two independent loops | `cc-ci-plan/prompts/builder.md` | the **current phase plan** (`launch.py status` names it) + master `cc-ci-plan/plan.md` | `cc-ci-builder` · `/srv/cc-ci/cc-ci` · `launch.py` |
| **Adversary** | Independently disbelieves & verifies the Builder; owns REVIEW + veto | `cc-ci-plan/prompts/adversary.md` | same current phase plan (verifies against it) + `plan.md` | `cc-ci-adv` · `/srv/cc-ci/cc-ci-adv` · `launch.py` |
| **Orchestrator** | **Keeps everyone on track**: supervises, nudges, fixes plans/prompts, owns the host-level fallback | wake: `cc-ci-plan/ai-progress-monitor-prompt.txt` → **this doc (§The orchestrator's job)** | this doc + `cc-ci-plan/JOURNAL.md` (handoff record) | `cc-ci-orchestrator` · `/srv/cc-ci-orch` · `launch-orchestrator.py` |
| **Assistant** | One-shot agent dispatched for cross-cutting passes (e.g. mirror reconcile); idle unless dispatched | assignment set at launch (`launch-assistant.py`) | the task it's dispatched with | `cc-ci-assistant` · `/srv/cc-ci-orch` · `launch-assistant.py` |
| **Upgrader** | Weekly one-shot: runs `/upgrade-all` (recipe-upgrade survey + PRs, never merges) | the `/upgrade-all` skill | triggered by the `cc-ci-upgrade-all` systemd timer (Sun 02:00 UTC) | `cc-ci-upgrader` · `/srv/cc-ci` · `launch-upgrader.py` |

Phases are defined in `.cc-ci-logs/.phases-spec` (`id|planfile|statusfile` records, persisted by
`launch.py start`); `launch.py status` shows the current one. The backend is `claude` with model
`sonnet` (set in `.loop-backend` / `.loop-model`).
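`launch.py status` is the authority on the current phase; how it derives it is not described here. A minimal sketch of one plausible derivation, assuming `.phases-spec` holds one `id|planfile|statusfile` record per line in phase order, status paths are relative to a repo root you pass in, and the current phase is the first one whose status file lacks a `## DONE` line (the completion rule in the orchestrator's checklist). `Phase`, `read_phases`, `is_done`, and `current_phase` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Phase:
    id: str
    planfile: str
    statusfile: str


def read_phases(spec: Path) -> List[Phase]:
    """One `id|planfile|statusfile` record per non-blank line, in phase order (assumed)."""
    phases = []
    for line in spec.read_text().splitlines():
        if line.strip():
            pid, plan, status = line.strip().split("|")
            phases.append(Phase(pid, plan, status))
    return phases


def is_done(status: Path) -> bool:
    """Done = the STATUS file has a line starting with '## DONE'."""
    return status.exists() and any(
        line.startswith("## DONE") for line in status.read_text().splitlines())


def current_phase(spec: Path, root: Path) -> Optional[Phase]:
    """First phase that is not DONE; None once every phase is complete."""
    for phase in read_phases(spec):
        if not is_done(root / phase.statusfile):
            return phase
    return None
```

In this sketch, `current_phase(...)` returning `None` would correspond to the point where the watchdog writes `SEQUENCE-COMPLETE`. The orchestrator should still read the phase from `launch.py status` rather than from a re-implementation like this.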
## The orchestrator's job: keep everyone on track

On each scheduled wake (and on startup; see `AGENTS.md`):

1. **Current phase, read live:** `python3 cc-ci-plan/launch.py status` → current phase id, its plan
   file, its `STATUS-<id>.md`. Never assume a phase; whatever it reports IS the phase.
2. **Live-state checks:** builder/adv/watchdog panes (`tmux capture-pane -pt …`); `.loop-backend` is
   `claude` and `.loop-model` is `sonnet`; `ssh cc-ci hostname` is reachable.
3. **Keep them moving**, intervening only where the watchdog can't:
   - Builder stalled / idle past its WAITING-UNTIL with no work → nudge it to continue the current phase.
   - Adversary stale / working from old evidence → nudge it to re-orient and verify outstanding claims.
   - Loop at high context (≳85%) → nudge it to `/compact` (effectively lossless: durable state lives
     in git + STATUS/REVIEW).
   - Loop session missing → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`.
   - **Revised a plan a loop is already working in?** Ping the session to re-read it: loops read the
     plan at kickoff and won't see later edits unless told.
4. **Completion:** a phase is done when its `STATUS-<id>.md` has a line starting with `## DONE`; the
   watchdog then auto-advances. When the LAST phase finishes, the watchdog writes `SEQUENCE-COMPLETE`,
   stops the loops, and exits (so the hourly wake stops too) → append to `JOURNAL.md` and send a
   proactive PushNotification.
5. **Be decisive but minimal.** Healthy + active → just note the state. Don't make unrelated changes.
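Steps 2 and 3 reduce to a few shell probes. A minimal sketch, assuming `.loop-backend` and `.loop-model` are one-word text files in a directory you pass in (their location is not stated in this doc); the function names are hypothetical. `tmux has-session -t`, `tmux capture-pane -p -t`, and `ssh -o BatchMode=yes` are standard tmux/OpenSSH usage; the session names come from the Agents table and the watchdog section.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

EXPECTED = {".loop-backend": "claude", ".loop-model": "sonnet"}
SESSIONS = ["cc-ci-builder", "cc-ci-adv", "cc-ci-watchdog"]  # Agents table + watchdog


def config_problems(state_dir: Path) -> List[str]:
    """Each backend/model file that is missing or not claude/sonnet."""
    problems = []
    for name, want in EXPECTED.items():
        path = state_dir / name
        got = path.read_text().strip() if path.exists() else "<missing>"
        if got != want:
            problems.append(f"{name}={got}, expected {want}")
    return problems


def session_alive(session: str) -> bool:
    """`tmux has-session -t` exits 0 iff the session exists."""
    return subprocess.run(["tmux", "has-session", "-t", session],
                          capture_output=True).returncode == 0


def capture(session: str, scrollback: int = 40) -> str:
    """Visible pane text plus up to `scrollback` lines of history."""
    return subprocess.run(
        ["tmux", "capture-pane", "-p", "-t", session, "-S", f"-{scrollback}"],
        capture_output=True, text=True, check=True).stdout


def host_reachable(host: str = "cc-ci") -> bool:
    """Equivalent of `ssh cc-ci hostname`; BatchMode fails fast instead of prompting."""
    try:
        return subprocess.run(["ssh", "-o", "BatchMode=yes", host, "hostname"],
                              capture_output=True, timeout=30).returncode == 0
    except subprocess.TimeoutExpired:
        return False


def wake_report(state_dir: Path) -> Dict[str, object]:
    """Facts for the wake; the orchestrator decides what, if anything, to do."""
    return {
        "config_problems": config_problems(state_dir),
        "missing_sessions": [s for s in SESSIONS if not session_alive(s)],
        "host_ok": host_reachable(),
    }
```

A loop name in `missing_sessions` maps to the `launch.py start` recovery line in step 3; the text from `capture()` is what the stalled/stale/high-context judgments are read from.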
## Subtlety: the watchdog (`launch.py watchdog`)

A non-agent supervisor loop (the `cc-ci-watchdog` tmux session, started by `launch.py start` or by
the `cc-ci-loops.service` boot unit). It:

- heals dead/wedged loop sessions;
- **pings the other loop on every `claim(...)`/`review(...)` commit** (the handoff signal);
- enforces liveness by killing+rebooting a loop idle past its `WAITING-UNTIL`, except while that
  session is inside a usage-limit window (see `limit_tick`), when all healing is suppressed;
- **auto-advances phases** when a `STATUS-<id>.md` hits `## DONE`, and writes `SEQUENCE-COMPLETE`
  at the end;
- fires the **hourly orchestrator wake** (this doc, via the wake prompt), deferring it during limit
  windows.

It does NOT compact loops or make decisions; that's the orchestrator's job.
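Because the handoff signal is just a commit-subject prefix, the watchdog's routing can be a small pure function. A sketch, assuming the watchdog compares the last-seen commit with the new head and reads subjects via `git log`; `route_handoffs`, `new_subjects`, and the prefix-to-session map are hypothetical, with the direction taken from the coordination protocol (`claim(...)` is the Builder's, `review(...)` the Adversary's).

```python
import subprocess
from typing import List, Tuple

# Assumed routing: a Builder claim wakes the Adversary, an Adversary verdict wakes the Builder.
PEER_FOR_PREFIX = {"claim(": "cc-ci-adv", "review(": "cc-ci-builder"}


def route_handoffs(subjects: List[str]) -> List[Tuple[str, str]]:
    """(session to ping, subject) for each handoff commit, in the order given."""
    pings = []
    for subject in subjects:
        for prefix, session in PEER_FOR_PREFIX.items():
            if subject.startswith(prefix):
                pings.append((session, subject))
    return pings


def new_subjects(repo: str, old: str, new: str) -> List[str]:
    """Subjects of the commits in old..new, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%s", f"{old}..{new}"],
        capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

Only the routing is shown; how the ping text reaches the peer's pane is not documented here. Matching on the literal prefix is also why the convention must be followed exactly: in this sketch, a subject like `claim: G3` or `Claim(G3)` would not wake the other loop.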
## Subtlety: coordination protocol (`plan.md` §6.1)

The two loops coordinate **only** through the cc-ci git repo, never directly:

- `git pull --rebase` before every edit; make the smallest change; commit; push (never `--force`).
- **Commit-prefix convention** (the watchdog depends on it): `claim(...)` = Builder claims a gate;
  `review(...)` = Adversary verdict. Those prefixes ARE the handoff signal.
- Phase-namespaced state files in the repo: `STATUS-<id>.md`, `BACKLOG-<id>.md`, `REVIEW-<id>.md`,
  `JOURNAL-<id>.md`; `DECISIONS.md` is shared.
- Inbox side-channels for non-gate messages: `BUILDER-INBOX.md` / `ADVERSARY-INBOX.md` (the watchdog
  edge-pings the recipient once when the file appears; the consumer deletes it on read).
- Full rules: `plan.md` §6.1 (coordination), §7 (pacing/liveness), §9 (guardrails).
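"Edge-pings on appearance" means the watchdog reacts to an inbox going from absent to present rather than to its mere presence, so an unread message is announced once, not on every tick. A minimal sketch of that edge detection; `InboxWatcher` and the inbox-to-session mapping are assumptions (each inbox is taken to belong to the loop named in its filename).

```python
from pathlib import Path
from typing import Dict, List, Set

INBOXES = {"BUILDER-INBOX.md": "cc-ci-builder", "ADVERSARY-INBOX.md": "cc-ci-adv"}


class InboxWatcher:
    """Report an inbox only on the tick where it first appears."""

    def __init__(self, repo: Path, inboxes: Dict[str, str] = INBOXES) -> None:
        self.repo = repo
        self.inboxes = inboxes
        self.seen: Set[str] = set()  # inboxes present on the previous tick

    def tick(self) -> List[str]:
        """Sessions to ping this tick (rising edges only), sorted by inbox name."""
        present = {name for name in self.inboxes if (self.repo / name).exists()}
        appeared = sorted(present - self.seen)
        self.seen = present  # once the consumer deletes an inbox, its edge re-arms
        return [self.inboxes[name] for name in appeared]
```

Because `seen` lives in memory, a watchdog restart re-announces any inbox that is still present, which errs toward a duplicate ping rather than a missed one.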
## See also

- `AGENTS.md`: orchestrator on-startup routine + host/reboot facts (Hetzner `cpx22`).
- `plan.md`: master build SSOT.