docs: orchestration.md as the root agent map; wake prompt + AGENTS.md point to it

One root doc maps every agent (Builder, Adversary, Orchestrator, Assistant,
Upgrader) -> its prompt + plan, with the watchdog and git coordination
protocol as the subtlety beneath. Fold the orchestrator supervision routine
into it (remove orchestrator-supervision.md). The hourly wake prompt and
AGENTS.md now just point at orchestration.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
autonomic-bot
2026-06-02 01:42:49 +00:00
parent 37a422bc31
commit 35f83a4b74
4 changed files with 71 additions and 50 deletions

@@ -5,16 +5,15 @@ server. It holds the plan, the launch/supervision tooling, and the two loop prom
project (NixOS config, test runner, recipe tests) lives in a **separate** repo the loops create at
`git.autonomic.zone/recipe-maintainers/cc-ci` — do not confuse the two.
## Three roles (don't conflate them)
## The agent map lives in `cc-ci-plan/orchestration.md`
1. **Orchestrator** (*this* session/role). Supervises: checks in on the two loops, reads their
logs/STATUS, makes changes to the plan/prompts, restarts loops, and owns the VM-level fallback.
It is **separate** from the loops and is the only role that should power-cycle/recreate the VM.
2. **Builder loop** — builds the CI server (`cc-ci-plan/prompts/builder.md`).
3. **Adversary loop** — independently disbelieves/verifies (`cc-ci-plan/prompts/adversary.md`).
That doc is the **root structure**: every agent (Builder, Adversary, Orchestrator, Assistant,
Upgrader) → its prompt + plan, plus the watchdog and the git coordination protocol. Read it first.
The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
watches from outside.
In short: the **Orchestrator** (*this* session/role) supervises and keeps everyone on track — it is
**separate** from the loops and is the only role that should power-cycle/recreate the host. The
**Builder** and **Adversary** loops coordinate **only** through the cc-ci git repo (`plan.md` §6.1);
the orchestrator watches from outside.
## On startup: read the journal, announce yourself, report reboots

@@ -1 +1 @@
You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestrator-supervision.md and follow it — it tells you how to find the CURRENT phase (via `python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state.
You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestration.md (the agent map) and do your job per its "The orchestrator's job — keep everyone on track" section — it tells you how to find the CURRENT phase (`python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state.

@@ -0,0 +1,63 @@
# orchestration.md — cc-ci agent map (root)
The root structure of the cc-ci build system: who the agents are, where each one's **prompt** and
**plan** live, and how they're kept on track. **Start here.** The watchdog and the coordination
protocol are the subtlety beneath this map (see the last two sections).
## Agents
| Agent | Role (one line) | Prompt | Plan / SSOT | Session · workdir · launcher |
|---|---|---|---|---|
| **Builder** | Builds the CI server; one of two independent loops | `cc-ci-plan/prompts/builder.md` | the **current phase plan** (`launch.py status` names it) + master `cc-ci-plan/plan.md` | `cc-ci-builder` · `/srv/cc-ci/cc-ci` · `launch.py` |
| **Adversary** | Independently disbelieves & verifies the Builder; owns REVIEW + veto | `cc-ci-plan/prompts/adversary.md` | same current phase plan (verifies against it) + `plan.md` | `cc-ci-adv` · `/srv/cc-ci/cc-ci-adv` · `launch.py` |
| **Orchestrator** | **Keeps everyone on track** — supervises, nudges, fixes plans/prompts, owns the host-level fallback | wake: `cc-ci-plan/ai-progress-monitor-prompt.txt` → **this doc (§The orchestrator's job)** | this doc + `cc-ci-plan/JOURNAL.md` (handoff record) | `cc-ci-orchestrator-vm` · `/srv/cc-ci-orch` · `launch-orchestrator.py` |
| **Assistant** | One-shot agent dispatched for cross-cutting passes (e.g. mirror reconcile); idle unless dispatched | assignment set at launch (`launch-assistant.py`) | the task it's dispatched with | `cc-ci-assistant` · `/srv/cc-ci-orch` · `launch-assistant.py` |
| **Upgrader** | Weekly one-shot: runs `/upgrade-all` (recipe-upgrade survey + PRs, never merges) | the `/upgrade-all` skill | triggered by the `cc-ci-upgrade-all` systemd timer (Sun 02:00 UTC) | `cc-ci-upgrader` · `/srv/cc-ci` · `launch-upgrader.py` |
Phases are defined in `.cc-ci-logs/.phases-spec` (id|planfile|statusfile, persisted by `launch.py
start`); `launch.py status` shows the current one. Backend is `claude`/`sonnet` (`.loop-backend`/
`.loop-model`).
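As an illustration of the `id|planfile|statusfile` layout described above, here is a minimal sketch of a parser for `.phases-spec`. It is not the code in `launch.py`: the `Phase` type, the `parse_phases_spec` name, and the example file names are hypothetical, and it assumes one phase per line with blank lines ignored.

```python
from typing import NamedTuple


class Phase(NamedTuple):
    """One line of the phases spec (hypothetical type, for illustration only)."""
    phase_id: str
    plan_file: str
    status_file: str


def parse_phases_spec(text: str) -> list[Phase]:
    """Parse `id|planfile|statusfile` lines into Phase records, in file order.

    Assumption: one phase per line; blank lines are skipped; any line that
    does not have exactly three non-empty fields is rejected.
    """
    phases = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3 or not all(fields):
            raise ValueError(f"line {lineno}: expected id|planfile|statusfile, got {raw!r}")
        phases.append(Phase(*fields))
    return phases


if __name__ == "__main__":
    # Example file names are made up; the real ones come from `launch.py start`.
    spec = "p1|plan-p1.md|STATUS-p1.md\np2|plan-p2.md|STATUS-p2.md\n"
    for phase in parse_phases_spec(spec):
        print(phase.phase_id, phase.plan_file, phase.status_file)
```

The authoritative view of the current phase remains `python3 cc-ci-plan/launch.py status`; a parser like this would only be useful for ad hoc inspection.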
## The orchestrator's job — keep everyone on track
On each scheduled wake (and on startup — see `AGENTS.md`):
1. **Current phase, read live:** `python3 cc-ci-plan/launch.py status` → current phase id, its plan
file, its `STATUS-<id>.md`. Never assume a phase; whatever it reports IS the phase.
2. **Live-state checks:** builder/adv/watchdog panes (`tmux capture-pane -pt …`); `.loop-backend`=claude
& `.loop-model`=sonnet; `ssh cc-ci hostname` reachable.
3. **Keep them moving** — only intervene where the watchdog can't:
- Builder stalled / idle past its WAITING-UNTIL with no work → nudge to continue the current phase.
- Adversary stale / on old evidence → nudge to re-orient + verify outstanding claims.
- Loop at high context (≳85%) → nudge it to `/compact` (lossless; state is in git + STATUS/REVIEW).
- Loop session missing → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`.
- **Revised a plan a loop is already working in?** Ping the session to re-read it — loops read the
plan at kickoff and won't see later edits unless told.
4. **Completion:** a phase is done when its `STATUS-<id>.md` has a line starting with `## DONE`; the
watchdog auto-advances. When the LAST phase finishes the watchdog writes `SEQUENCE-COMPLETE`, stops
the loops, and exits (so the hourly wake stops too) → append to `JOURNAL.md` + proactive PushNotification.
5. **Be decisive but minimal.** Healthy + active → just note the state. Don't make unrelated changes.
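The completion rule in step 4 (a `STATUS-<id>.md` line starting with `## DONE`) can be sketched as a small check. This is an assumption-laden illustration, not the watchdog's actual logic: `phase_is_done` is a hypothetical helper, and it reads "starting with" strictly, so an indented or mid-line `## DONE` does not count.

```python
def phase_is_done(status_text: str) -> bool:
    """Return True if any line of the STATUS file begins with '## DONE'.

    Assumption: the match is strict. The marker must be at column 0,
    so indented or inline mentions of '## DONE' are ignored.
    """
    return any(line.startswith("## DONE") for line in status_text.splitlines())


if __name__ == "__main__":
    import sys

    # Usage: python3 check_done.py path/to/STATUS-<id>.md
    with open(sys.argv[1], encoding="utf-8") as handle:
        print("done" if phase_is_done(handle.read()) else "in progress")
```

A check like this only reports state; advancing the phase stays with the watchdog.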
## Subtlety: the watchdog (`launch.py watchdog`)
A non-agent supervisor loop (`cc-ci-watchdog` tmux session, started by `launch.py start` / the
`cc-ci-loops.service` boot unit). It: heals dead/wedged loop sessions, **pings the other loop on every
`claim(...)`/`review(...)` commit** (the handoff signal), enforces liveness (kills+reboots a loop idle
past its `WAITING-UNTIL`), **auto-advances phases** when a `STATUS-<id>.md` hits `## DONE`, and writes
`SEQUENCE-COMPLETE` at the end. It also fires the **hourly orchestrator wake** (this doc, via the wake
prompt). It does NOT compact loops or make decisions — that's the orchestrator.
## Subtlety: coordination protocol (`plan.md` §6.1)
The two loops coordinate **only** through the cc-ci git repo — never directly:
- `git pull --rebase` before every edit; smallest change; commit; push (never `--force`).
- **Commit-prefix convention** (the watchdog depends on it): `claim(...)` = Builder claims a gate;
`review(...)` = Adversary verdict. Those prefixes ARE the handoff signal.
- Phase-namespaced state files in the repo: `STATUS-<id>.md`, `BACKLOG-<id>.md`, `REVIEW-<id>.md`,
`JOURNAL-<id>.md`; `DECISIONS.md` is shared.
- Inbox side-channels for non-gate messages: `BUILDER-INBOX.md` / `ADVERSARY-INBOX.md` (watchdog
edge-pings on appearance; consumer deletes on read).
- Full rules: `plan.md` §6.1 (coordination), §7 (pacing/liveness), §9 (guardrails).
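To make the commit-prefix convention concrete, the sketch below classifies a commit subject and names the loop that would be pinged, following the rule that the watchdog pings the *other* loop on each `claim(...)`/`review(...)` commit. The function name, the regex, and the example subjects are hypothetical; the watchdog's real matching logic may differ.

```python
import re
from typing import Optional

# Assumption: the prefix must appear at the very start of the commit subject.
HANDOFF_PREFIX = re.compile(r"^(claim|review)\(")

# claim(...) is written by the Builder, so the Adversary is pinged;
# review(...) is written by the Adversary, so the Builder is pinged.
PING_TARGET = {"claim": "adversary", "review": "builder"}


def handoff_target(subject: str) -> Optional[str]:
    """Return which loop to ping for this commit subject, or None if it is not a handoff."""
    match = HANDOFF_PREFIX.match(subject)
    return PING_TARGET[match.group(1)] if match else None


if __name__ == "__main__":
    for subject in ("claim(gate-3): runner boots", "review(gate-3): PASS", "docs: fix typo"):
        print(f"{subject!r} -> {handoff_target(subject)}")
```

Because the prefixes are the handoff signal, a commit that omits or misspells them would be treated as ordinary work and trigger no ping under this sketch.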
## See also
- `AGENTS.md` — orchestrator on-startup routine + host/reboot facts (Hetzner `cpx22`).
- `plan.md` — master build SSOT.

@@ -1,41 +0,0 @@
# Orchestrator supervision routine
The routine the cc-ci orchestrator follows on each scheduled wake-up. The hourly wake (driven by the
watchdog) types a one-line pointer to THIS file — so the wake prompt itself never needs editing as the
build moves through phases. This doc holds the actual instructions and looks the current phase up live.
Supervise the two worker loops (Builder + Adversary, both on the claude backend), nudge anything
stalled, confirm progress, otherwise stay hands-off. Do NOT make unrelated code changes.
## 1. Current phase — always read it live (never assume a specific phase)
Run `python3 cc-ci-plan/launch.py status`. It reports the CURRENT phase id, its plan file, and its
`STATUS-<id>.md` (from `.phases-spec`). Whatever it says now IS the phase; that phase's plan file (in
`cc-ci-plan/`) is the loops' single source of truth for what they're doing this phase.
## 2. Live-state checks
- builder / adv panes: `tmux capture-pane -pt cc-ci-builder` / `-pt cc-ci-adv`
- watchdog: `tmux capture-pane -pt cc-ci-watchdog` (+ `tail /srv/cc-ci/.cc-ci-logs/watchdog.log`)
- backend sanity: `.loop-backend` = `claude`, `.loop-model` = `sonnet`
- `ssh cc-ci hostname` → CI server reachable
## 3. Keep them moving
- Builder stalled / idle past its WAITING-UNTIL with no work → nudge it to continue the current phase.
- Adversary stale / parked on old evidence → nudge it to re-orient and verify outstanding claims.
- The watchdog heals dead sessions and pings on `claim()`/`review()` commits — only intervene where it
CAN'T: a wedged-but-alive loop, genuine drift, or a loop at high context (≳85%) that should `/compact`
(lossless — state is in git + the phase STATUS/REVIEW files).
- Loop session missing entirely → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`.
- If you revised a plan a loop is already working in, re-read it to them (ping the session) — loops read
the plan at kickoff and won't see later edits unless told.
## 4. Completion
- A phase is done when its `STATUS-<id>.md` (in `/srv/cc-ci/cc-ci/machine-docs/`) has a line starting
with `## DONE` (every gate Adversary-verified, no standing VETO). The watchdog auto-advances to the
next phase (`[n/N]` in status).
- When the LAST phase finishes, the watchdog writes `/srv/cc-ci/.cc-ci-logs/SEQUENCE-COMPLETE`, stops the
loops, and exits — so this hourly wake stops too (it lives in the watchdog). On that event: append a
completion note to `cc-ci-plan/JOURNAL.md` and send a proactive PushNotification to the operator.
## 5. Be decisive but minimal
If everything is healthy and active, make no changes — just note the state. If prior-wake work is still
in progress, continue from the live state instead of restarting your analysis.