feat(po): drop periodic fleet sweep — operator-driven, recover-if-dead only

The PO's job is to manage projects on request, not watch them live. Remove the
hourly wake/sweep entirely:

- agents.toml: watch="heal" (recover-if-dead), no `wake` field
- prompts/supervise.md: deleted
- prompts/orchestrator.md, README.md, docs/bootstrap.md, docs/manage-projects.md:
  drop sweep/wake references; document operator-driven, no periodic sweep

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -10,7 +10,7 @@ is no special "control-plane" code path.
 project-orchestrator/
 agents.toml this project's harness config (one persistent fleet-management agent)
 engine/ the agent-orchestrator harness, pinned as a submodule @ v0.1.0
-prompts/ the PO agent's role (orchestrator.md) + periodic sweep (supervise.md)
+prompts/ the PO agent's role (orchestrator.md)
 fleet.toml THE fleet registry — the only record of project ↔ harness ↔ ref ↔ location
 scripts/ fleet.py + create/start/stop/update-project.sh — the management helpers
 docs/ runbooks: manage-projects.md, fleet-registry.md, bootstrap.md
@@ -41,9 +41,10 @@ python3 engine/agents.py up # start the PO agent + its watchdo
 python3 engine/agents.py down # stop everything
 ```
 
-The PO agent itself is one persistent `claude` session (`prompts/orchestrator.md`) with an hourly
-fleet sweep (`prompts/supervise.md`), supervised by the harness watchdog. See `engine/README.md`
-for the full harness reference.
+The PO agent itself is one persistent `claude` session (`prompts/orchestrator.md`), kept alive by the
+harness watchdog (recover-if-dead) but **never** woken on a timer: it is operator-driven and manages
+projects on request rather than watching them live. See `engine/README.md` for the full harness
+reference.
 
 ## Managing the fleet
 
@@ -45,7 +45,8 @@ fatal_re = "redacted_thinking|blocks cannot be modified|cannot be modified"
 # A single persistent fleet-management agent is enough to start (the plan: "add a loop only if
 # useful"). It is NOT a build loop — it manages a fleet of *other* projects: create / start / stop
 # / update / list / status, reading each project's harness docs to work out how to drive it. Its
-# startup prompt + periodic wake nudge live in prompts/.
+# startup prompt lives in prompts/. It is operator-driven: NO periodic wake — the PO manages
+# projects on request, it does not watch them live.
 
 [[agent]]
 name = "project-orchestrator" # tmux session: po-project-orchestrator
@@ -53,6 +54,6 @@ kind = "persistent"
 backend = "claude"
 model = "claude-opus-4-8"
 resume = true # resume its session across restarts (--resume <state id>)
-watch = "heal" # keep it alive/healed; never reboot just for being idle
+watch = "heal" # recover-if-dead (crash/wedge/wrong-backend); never reboot for idle
 prompt_file = "prompts/orchestrator.md" # startup prompt: read your role + fleet, then report
-wake = { interval = 3600, prompt_file = "prompts/supervise.md" } # hourly fleet sweep
+# no `wake`: the watchdog sends NO periodic prompts. It heals a dead session but never nudges a live one.
@@ -36,13 +36,13 @@ fleet pieces.
 
 3. **Write the harness config** — `agents.toml` declaring the PO's own agent(s). A single
 `kind = "persistent"` `project-orchestrator` agent (backend `claude`) is enough to start; its
-startup prompt is `prompts/orchestrator.md` and it gets an hourly `wake` →
-`prompts/supervise.md`. (You can scaffold a starter with `python3 engine/agents.py init .` and
-then edit it, or copy this repo's `agents.toml`.)
+startup prompt is `prompts/orchestrator.md`, with `watch = "heal"` (recover-if-dead) and **no**
+`wake` — the PO is operator-driven, not woken on a timer. (You can scaffold a starter with
+`python3 engine/agents.py init .` and then edit it, or copy this repo's `agents.toml`.)
 
 4. **Add the fleet pieces** (what makes this project a PO):
 - `fleet.toml` — the registry (schema: `docs/fleet-registry.md`), starting empty or with a sample.
-- `prompts/orchestrator.md` + `prompts/supervise.md` — the PO agent's role and periodic sweep.
+- `prompts/orchestrator.md` — the PO agent's role / startup prompt.
 - `scripts/` — `fleet.py` (read/validate the registry) and `create/start/stop/update-project.sh`.
 - `docs/` — these runbooks.
 
@@ -91,5 +91,5 @@ python3 scripts/fleet.py status # + a total/enabled/disabled summary
 ```
 
 This reads only `fleet.toml`. To also check live state, drive each enabled project's harness
-(`engine/agents.py status --config <project>/agents.toml`) — `prompts/supervise.md` does this on the
-PO's periodic wake.
+(`engine/agents.py status --config <project>/agents.toml`). The PO does this **on request** — there
+is no periodic fleet sweep; this repo manages projects when asked, it does not watch them live.
@@ -41,7 +41,8 @@ For each flow, follow the runbook in `docs/manage-projects.md`. In short:
 1. Read `fleet.toml` and `docs/manage-projects.md` so you know the current fleet and your runbooks.
 2. Run `python3 scripts/fleet.py status` to see the fleet's declared state.
 3. Report a short summary: how many projects, which are enabled, anything that looks wrong. Then idle
-until your next wake or an operator instruction.
+until an operator instruction.
 
-Do not invent work. You act when an operator asks you to create/start/stop/update a project, or when
-your periodic wake (`prompts/supervise.md`) tells you to sweep the fleet.
+Do not invent work. You are operator-driven: you act when an operator asks you to
+create/start/stop/update/list/status a project. There is no periodic fleet sweep — this repo's job is
+to *manage* projects on request, not to watch them live.
@@ -1,17 +0,0 @@
-# Periodic fleet sweep
-
-A scheduled wake. Do a light, read-only sweep of the fleet — do not start work unless something is
-clearly wrong and a runbook covers the fix.
-
-1. `python3 scripts/fleet.py status` — list every project in `fleet.toml` with its location, harness,
-pinned ref, and enabled flag.
-2. For each **enabled** project whose location is reachable from this host, optionally check whether
-its harness reports it running (for an `agent-orchestrator` project:
-`engine/agents.py status --config <project>/agents.toml`). Reading its harness docs first if the
-harness is unfamiliar.
-3. Report a one-paragraph summary: total / enabled / disabled, anything unreachable or stopped that
-should be running. If a fix is needed and `docs/manage-projects.md` covers it, you may apply it;
-otherwise just flag it.
-
-Remember the one-directional rule: never write fleet/PO state into a project repo. The fleet's truth
-is `fleet.toml` here.