From 35f83a4b74eabc7dbb6d81f5b1ef1b76164c6613 Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Tue, 2 Jun 2026 01:42:49 +0000 Subject: [PATCH] docs: orchestration.md as the root agent map; wake prompt + AGENTS.md point to it One root doc maps every agent (Builder, Adversary, Orchestrator, Assistant, Upgrader) -> its prompt + plan, with the watchdog and git coordination protocol as the subtlety beneath. Fold the orchestrator supervision routine into it (remove orchestrator-supervision.md). The hourly wake prompt and AGENTS.md now just point at orchestration.md. Co-Authored-By: Claude Opus 4.8 --- AGENTS.md | 15 +++--- cc-ci-plan/ai-progress-monitor-prompt.txt | 2 +- cc-ci-plan/orchestration.md | 63 +++++++++++++++++++++++ cc-ci-plan/orchestrator-supervision.md | 41 --------------- 4 files changed, 71 insertions(+), 50 deletions(-) create mode 100644 cc-ci-plan/orchestration.md delete mode 100644 cc-ci-plan/orchestrator-supervision.md diff --git a/AGENTS.md b/AGENTS.md index 075f0cf..2eaec26 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -5,16 +5,15 @@ server. It holds the plan, the launch/supervision tooling, and the two loop prom project (NixOS config, test runner, recipe tests) lives in a **separate** repo the loops create at `git.autonomic.zone/recipe-maintainers/cc-ci` — do not confuse the two. -## Three roles (don't conflate them) +## The agent map lives in `cc-ci-plan/orchestration.md` -1. **Orchestrator** — *this* session/role. Supervises: checks in on the two loops, reads their - logs/STATUS, makes changes to the plan/prompts, restarts loops, and owns the VM-level fallback. - It is **separate** from the loops and is the only role that should power-cycle/recreate the VM. -2. **Builder loop** — builds the CI server (`cc-ci-plan/prompts/builder.md`). -3. **Adversary loop** — independently disbelieves/verifies (`cc-ci-plan/prompts/adversary.md`). 
+That doc is the **root structure**: every agent (Builder, Adversary, Orchestrator, Assistant, +Upgrader) → its prompt + plan, plus the watchdog and the git coordination protocol. Read it first. -The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator -watches from outside. +In short: the **Orchestrator** (*this* session/role) supervises and keeps everyone on track — it is +**separate** from the loops and is the only role that should power-cycle/recreate the host. The +**Builder** and **Adversary** loops coordinate **only** through the cc-ci git repo (`plan.md` §6.1); +the orchestrator watches from outside. ## On startup: read the journal, announce yourself, report reboots diff --git a/cc-ci-plan/ai-progress-monitor-prompt.txt b/cc-ci-plan/ai-progress-monitor-prompt.txt index b88ffa6..9732849 100644 --- a/cc-ci-plan/ai-progress-monitor-prompt.txt +++ b/cc-ci-plan/ai-progress-monitor-prompt.txt @@ -1 +1 @@ -You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestrator-supervision.md and follow it — it tells you how to find the CURRENT phase (via `python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state. +You are the cc-ci orchestrator, woken for your scheduled supervision pass. Read /srv/cc-ci/cc-ci-plan/orchestration.md (the agent map) and do your job per its "The orchestrator's job — keep everyone on track" section — it tells you how to find the CURRENT phase (`python3 cc-ci-plan/launch.py status`) and what to check, nudge, and do. Be decisive but minimal; if everything is healthy and active, just note the state. 
diff --git a/cc-ci-plan/orchestration.md b/cc-ci-plan/orchestration.md new file mode 100644 index 0000000..7815057 --- /dev/null +++ b/cc-ci-plan/orchestration.md @@ -0,0 +1,63 @@ +# orchestration.md — cc-ci agent map (root) + +The root structure of the cc-ci build system: who the agents are, where each one's **prompt** and +**plan** live, and how they're kept on track. **Start here.** The watchdog and the coordination +protocol are the subtlety beneath this map (see the last two sections). + +## Agents + +| Agent | Role (one line) | Prompt | Plan / SSOT | Session · workdir · launcher | +|---|---|---|---|---| +| **Builder** | Builds the CI server; one of two independent loops | `cc-ci-plan/prompts/builder.md` | the **current phase plan** (`launch.py status` names it) + master `cc-ci-plan/plan.md` | `cc-ci-builder` · `/srv/cc-ci/cc-ci` · `launch.py` | +| **Adversary** | Independently disbelieves & verifies the Builder; owns REVIEW + veto | `cc-ci-plan/prompts/adversary.md` | same current phase plan (verifies against it) + `plan.md` | `cc-ci-adv` · `/srv/cc-ci/cc-ci-adv` · `launch.py` | +| **Orchestrator** | **Keeps everyone on track** — supervises, nudges, fixes plans/prompts, owns the host-level fallback | wake: `cc-ci-plan/ai-progress-monitor-prompt.txt` → **this doc (§The orchestrator's job)** | this doc + `cc-ci-plan/JOURNAL.md` (handoff record) | `cc-ci-orchestrator-vm` · `/srv/cc-ci-orch` · `launch-orchestrator.py` | +| **Assistant** | One-shot agent dispatched for cross-cutting passes (e.g. 
mirror reconcile); idle unless dispatched | assignment set at launch (`launch-assistant.py`) | the task it's dispatched with | `cc-ci-assistant` · `/srv/cc-ci-orch` · `launch-assistant.py` | +| **Upgrader** | Weekly one-shot: runs `/upgrade-all` (recipe-upgrade survey + PRs, never merges) | the `/upgrade-all` skill | triggered by the `cc-ci-upgrade-all` systemd timer (Sun 02:00 UTC) | `cc-ci-upgrader` · `/srv/cc-ci` · `launch-upgrader.py` | + +Phases are defined in `.cc-ci-logs/.phases-spec` (id|planfile|statusfile, persisted by `launch.py +start`); `launch.py status` shows the current one. Backend is `claude`/`sonnet` (`.loop-backend`/ +`.loop-model`). + +## The orchestrator's job — keep everyone on track + +On each scheduled wake (and on startup — see `AGENTS.md`): +1. **Current phase, read live:** `python3 cc-ci-plan/launch.py status` → current phase id, its plan + file, its `STATUS-.md`. Never assume a phase; whatever it reports IS the phase. +2. **Live-state checks:** builder/adv/watchdog panes (`tmux capture-pane -pt …`); `.loop-backend`=claude + & `.loop-model`=sonnet; `ssh cc-ci hostname` reachable. +3. **Keep them moving** — only intervene where the watchdog can't: + - Builder stalled / idle past its WAITING-UNTIL with no work → nudge to continue the current phase. + - Adversary stale / on old evidence → nudge to re-orient + verify outstanding claims. + - Loop at high context (≳85%) → nudge it to `/compact` (lossless; state is in git + STATUS/REVIEW). + - Loop session missing → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`. + - **Revised a plan a loop is already working in?** Ping the session to re-read it — loops read the + plan at kickoff and won't see later edits unless told. +4. **Completion:** a phase is done when its `STATUS-.md` has a line starting with `## DONE`; the + watchdog auto-advances. 
When the LAST phase finishes the watchdog writes `SEQUENCE-COMPLETE`, stops + the loops, and exits (so the hourly wake stops too) → append to `JOURNAL.md` + proactive PushNotification. +5. **Be decisive but minimal.** Healthy + active → just note the state. Don't make unrelated changes. + +## Subtlety: the watchdog (`launch.py watchdog`) + +A non-agent supervisor loop (`cc-ci-watchdog` tmux session, started by `launch.py start` / the +`cc-ci-loops.service` boot unit). It: heals dead/wedged loop sessions, **pings the other loop on every +`claim(...)`/`review(...)` commit** (the handoff signal), enforces liveness (kills+reboots a loop idle +past its `WAITING-UNTIL`), **auto-advances phases** when a `STATUS-.md` hits `## DONE`, and writes +`SEQUENCE-COMPLETE` at the end. It also fires the **hourly orchestrator wake** (this doc, via the wake +prompt). It does NOT compact loops or make decisions — that's the orchestrator. + +## Subtlety: coordination protocol (`plan.md` §6.1) + +The two loops coordinate **only** through the cc-ci git repo — never directly: +- `git pull --rebase` before every edit; smallest change; commit; push (never `--force`). +- **Commit-prefix convention** (the watchdog depends on it): `claim(...)` = Builder claims a gate; + `review(...)` = Adversary verdict. Those prefixes ARE the handoff signal. +- Phase-namespaced state files in the repo: `STATUS-.md`, `BACKLOG-.md`, `REVIEW-.md`, + `JOURNAL-.md`; `DECISIONS.md` is shared. +- Inbox side-channels for non-gate messages: `BUILDER-INBOX.md` / `ADVERSARY-INBOX.md` (watchdog + edge-pings on appearance; consumer deletes on read). +- Full rules: `plan.md` §6.1 (coordination), §7 (pacing/liveness), §9 (guardrails). + +## See also +- `AGENTS.md` — orchestrator on-startup routine + host/reboot facts (Hetzner `cpx22`). +- `plan.md` — master build SSOT. 
diff --git a/cc-ci-plan/orchestrator-supervision.md b/cc-ci-plan/orchestrator-supervision.md deleted file mode 100644 index ed41500..0000000 --- a/cc-ci-plan/orchestrator-supervision.md +++ /dev/null @@ -1,41 +0,0 @@ -# Orchestrator supervision routine - -The routine the cc-ci orchestrator follows on each scheduled wake-up. The hourly wake (driven by the -watchdog) types a one-line pointer to THIS file — so the wake prompt itself never needs editing as the -build moves through phases. This doc holds the actual instructions and looks the current phase up live. - -Supervise the two worker loops (Builder + Adversary, both on the claude backend), nudge anything -stalled, confirm progress, otherwise stay hands-off. Do NOT make unrelated code changes. - -## 1. Current phase — always read it live (never assume a specific phase) -Run `python3 cc-ci-plan/launch.py status`. It reports the CURRENT phase id, its plan file, and its -`STATUS-.md` (from `.phases-spec`). Whatever it says now IS the phase; that phase's plan file (in -`cc-ci-plan/`) is the loops' single source of truth for what they're doing this phase. - -## 2. Live-state checks -- builder / adv panes: `tmux capture-pane -pt cc-ci-builder` / `-pt cc-ci-adv` -- watchdog: `tmux capture-pane -pt cc-ci-watchdog` (+ `tail /srv/cc-ci/.cc-ci-logs/watchdog.log`) -- backend sanity: `.loop-backend` = `claude`, `.loop-model` = `sonnet` -- `ssh cc-ci hostname` → CI server reachable - -## 3. Keep them moving -- Builder stalled / idle past its WAITING-UNTIL with no work → nudge it to continue the current phase. -- Adversary stale / parked on old evidence → nudge it to re-orient and verify outstanding claims. -- The watchdog heals dead sessions and pings on `claim()`/`review()` commits — only intervene where it - CAN'T: a wedged-but-alive loop, genuine drift, or a loop at high context (≳85%) that should `/compact` - (lossless — state is in git + the phase STATUS/REVIEW files). 
-- Loop session missing entirely → `RESUME_PHASE=1 LOOP_BACKEND=claude LOOP_MODEL=sonnet python3 cc-ci-plan/launch.py start`. -- If you revised a plan a loop is already working in, re-read it to them (ping the session) — loops read - the plan at kickoff and won't see later edits unless told. - -## 4. Completion -- A phase is done when its `STATUS-.md` (in `/srv/cc-ci/cc-ci/machine-docs/`) has a line starting - with `## DONE` (every gate Adversary-verified, no standing VETO). The watchdog auto-advances to the - next phase (`[n/N]` in status). -- When the LAST phase finishes, the watchdog writes `/srv/cc-ci/.cc-ci-logs/SEQUENCE-COMPLETE`, stops the - loops, and exits — so this hourly wake stops too (it lives in the watchdog). On that event: append a - completion note to `cc-ci-plan/JOURNAL.md` and send a proactive PushNotification to the operator. - -## 5. Be decisive but minimal -If everything is healthy and active, make no changes — just note the state. If prior-wake work is still -in progress, continue from the live state instead of restarting your analysis.
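
Reviewer note: the patch states two mechanical rules. A phase is done when its STATUS file has a line starting with `## DONE`, and commits prefixed `claim(...)` / `review(...)` are the handoff signal between the loops. The sketch below shows one way those two checks could be written. It is not taken from `launch.py`, whose implementation this patch does not show. The function names `phase_is_done` and `handoff_kind` are hypothetical, and matching strictly at the start of the line with no leading whitespace is an assumption.

```python
from typing import Optional

# Marker text taken from the patch: a phase is done when its STATUS file
# has a line starting with "## DONE".
DONE_MARKER = "## DONE"


def phase_is_done(status_text: str) -> bool:
    """Return True if any line of the STATUS file text starts with '## DONE'.

    Assumption: the marker must sit at column 0 (no leading whitespace).
    """
    return any(line.startswith(DONE_MARKER) for line in status_text.splitlines())


def handoff_kind(commit_subject: str) -> Optional[str]:
    """Classify a commit subject by the prefix convention described in the patch.

    Returns "claim" for a Builder gate claim, "review" for an Adversary
    verdict, and None for any other commit.
    """
    for kind in ("claim", "review"):
        if commit_subject.startswith(kind + "("):
            return kind
    return None


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    print(phase_is_done("# Status\n## DONE all gates verified\n"))
    print(handoff_kind("review(gate-2): verdict recorded"))
```

A strict prefix match keeps the checks cheap and unambiguous, which suits a convention the watchdog is said to depend on. The real matching rules may be looser or stricter than this.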