feat(po): drop periodic fleet sweep — operator-driven, recover-if-dead only

The PO's job is to manage projects on request, not watch them live. Remove the
hourly wake/sweep entirely:

- agents.toml: watch="heal" (recover-if-dead), no `wake` field
- prompts/supervise.md: deleted
- prompts/orchestrator.md, README.md, docs/bootstrap.md, docs/manage-projects.md:
  drop sweep/wake references; document operator-driven, no periodic sweep

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 15:04:08 +00:00
parent 6cc3ed4f13
commit 0456837444
6 changed files with 19 additions and 33 deletions

@ -10,7 +10,7 @@ is no special "control-plane" code path.
project-orchestrator/
  agents.toml   this project's harness config (one persistent fleet-management agent)
  engine/       the agent-orchestrator harness, pinned as a submodule @ v0.1.0
  prompts/      the PO agent's role (orchestrator.md) + periodic sweep (supervise.md)
  prompts/      the PO agent's role (orchestrator.md)
  fleet.toml    THE fleet registry — the only record of project ↔ harness ↔ ref ↔ location
  scripts/      fleet.py + create/start/stop/update-project.sh — the management helpers
  docs/         runbooks: manage-projects.md, fleet-registry.md, bootstrap.md
@ -41,9 +41,10 @@ python3 engine/agents.py up # start the PO agent + its watchdo
python3 engine/agents.py down # stop everything
```
The PO agent itself is one persistent `claude` session (`prompts/orchestrator.md`) with an hourly
fleet sweep (`prompts/supervise.md`), supervised by the harness watchdog. See `engine/README.md`
for the full harness reference.
The PO agent itself is one persistent `claude` session (`prompts/orchestrator.md`), kept alive by the
harness watchdog (recover-if-dead) but **never** woken on a timer: it is operator-driven and manages
projects on request rather than watching them live. See `engine/README.md` for the full harness
reference.
## Managing the fleet

@ -45,7 +45,8 @@ fatal_re = "redacted_thinking|blocks cannot be modified|cannot be modified"
# A single persistent fleet-management agent is enough to start (the plan: "add a loop only if
# useful"). It is NOT a build loop — it manages a fleet of *other* projects: create / start / stop
# / update / list / status, reading each project's harness docs to work out how to drive it. Its
# startup prompt + periodic wake nudge live in prompts/.
# startup prompt lives in prompts/. It is operator-driven: NO periodic wake — the PO manages
# projects on request, it does not watch them live.
[[agent]]
name = "project-orchestrator" # tmux session: po-project-orchestrator
@ -53,6 +54,6 @@ kind = "persistent"
backend = "claude"
model = "claude-opus-4-8"
resume = true # resume its session across restarts (--resume <state id>)
watch = "heal" # keep it alive/healed; never reboot just for being idle
watch = "heal" # recover-if-dead (crash/wedge/wrong-backend); never reboot for idle
prompt_file = "prompts/orchestrator.md" # startup prompt: read your role + fleet, then report
wake = { interval = 3600, prompt_file = "prompts/supervise.md" } # hourly fleet sweep
# no `wake`: the watchdog sends NO periodic prompts. It heals a dead session but never nudges a live one.

@ -36,13 +36,13 @@ fleet pieces.
3. **Write the harness config** — `agents.toml` declaring the PO's own agent(s). A single
`kind = "persistent"` `project-orchestrator` agent (backend `claude`) is enough to start; its
startup prompt is `prompts/orchestrator.md` and it gets an hourly `wake` →
`prompts/supervise.md`. (You can scaffold a starter with `python3 engine/agents.py init .` and
then edit it, or copy this repo's `agents.toml`.)
startup prompt is `prompts/orchestrator.md`, with `watch = "heal"` (recover-if-dead) and **no**
`wake` — the PO is operator-driven, not woken on a timer. (You can scaffold a starter with
`python3 engine/agents.py init .` and then edit it, or copy this repo's `agents.toml`.)
4. **Add the fleet pieces** (what makes this project a PO):
- `fleet.toml` — the registry (schema: `docs/fleet-registry.md`), starting empty or with a sample.
- `prompts/orchestrator.md` + `prompts/supervise.md` — the PO agent's role and periodic sweep.
- `prompts/orchestrator.md` — the PO agent's role / startup prompt.
- `scripts/` — `fleet.py` (read/validate the registry) and `create/start/stop/update-project.sh`.
- `docs/` — these runbooks.

@ -91,5 +91,5 @@ python3 scripts/fleet.py status # + a total/enabled/disabled summary
```
This reads only `fleet.toml`. To also check live state, drive each enabled project's harness
(`engine/agents.py status --config <project>/agents.toml`); `prompts/supervise.md` does this on the
PO's periodic wake.
(`engine/agents.py status --config <project>/agents.toml`). The PO does this **on request**; there
is no periodic fleet sweep. This repo manages projects when asked; it does not watch them live.

@ -41,7 +41,8 @@ For each flow, follow the runbook in `docs/manage-projects.md`. In short:
1. Read `fleet.toml` and `docs/manage-projects.md` so you know the current fleet and your runbooks.
2. Run `python3 scripts/fleet.py status` to see the fleet's declared state.
3. Report a short summary: how many projects, which are enabled, anything that looks wrong. Then idle
until your next wake or an operator instruction.
until an operator instruction.
Do not invent work. You act when an operator asks you to create/start/stop/update a project, or when
your periodic wake (`prompts/supervise.md`) tells you to sweep the fleet.
Do not invent work. You are operator-driven: you act when an operator asks you to
create/start/stop/update/list/status a project. There is no periodic fleet sweep: this repo's job is
to *manage* projects on request, not to watch them live.

@ -1,17 +0,0 @@
# Periodic fleet sweep
A scheduled wake. Do a light, read-only sweep of the fleet — do not start work unless something is
clearly wrong and a runbook covers the fix.
1. `python3 scripts/fleet.py status` — list every project in `fleet.toml` with its location, harness,
pinned ref, and enabled flag.
2. For each **enabled** project whose location is reachable from this host, optionally check whether
its harness reports it running (for an `agent-orchestrator` project:
`engine/agents.py status --config <project>/agents.toml`). Reading its harness docs first if the
harness is unfamiliar.
3. Report a one-paragraph summary: total / enabled / disabled, anything unreachable or stopped that
should be running. If a fix is needed and `docs/manage-projects.md` covers it, you may apply it;
otherwise just flag it.
Remember the one-directional rule: never write fleet/PO state into a project repo. The fleet's truth
is `fleet.toml` here.