commit 346ed31acbc0d98eeb2881a1b62998ac9544c002
Author: autonomic-bot
Date:   Sat Jun 13 19:15:47 2026 +0000

    feat: project-orchestrator — engine@v0.1.0 submodule, PO config, fleet.toml registry, mgmt scripts, docs, Nix

    The PO is itself a project using the agent-orchestrator harness (engine/ submodule pinned at
    v0.1.0). Adds: agents.toml (one persistent fleet-management agent) + prompts/; fleet.toml (the
    sole project<->harness<->ref registry) + docs/fleet-registry.md; scripts/ (fleet.py +
    create/start/stop/update-project.sh); docs/manage-projects.md + docs/bootstrap.md;
    flake.nix/.lock devShell (python311+tmux+git); README.

    Co-Authored-By: Claude Opus 4.8

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..64c6132
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,9 @@
+# runtime state + logs (never committed)
+.ao-state/
+*.log
+__pycache__/
+*.pyc
+result
+
+# projects scaffolded locally by scripts/create-project.sh are their own repos — not tracked here
+/projects/
diff --git a/.gitmodules b/.gitmodules
new file mode 100644
index 0000000..a48e6a3
--- /dev/null
+++ b/.gitmodules
@@ -0,0 +1,3 @@
+[submodule "engine"]
+	path = engine
+	url = https://git.autonomic.zone/recipe-maintainers/agent-orchestrator.git
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..273d219
--- /dev/null
+++ b/README.md
@@ -0,0 +1,96 @@
+# project-orchestrator (PO)
+
+The **project-orchestrator** manages a *fleet* of independent projects — creating, starting,
+stopping, updating, and monitoring them. It is itself **just a project** that uses the
+[`agent-orchestrator`](https://git.autonomic.zone/recipe-maintainers/agent-orchestrator) harness
+(vendored as the `engine/` submodule). What makes it the PO is its *job*, not its architecture: there
+is no special "control-plane" code path.
+
+```
+project-orchestrator/
+  agents.toml        this project's harness config (one persistent fleet-management agent)
+  engine/            the agent-orchestrator harness, pinned as a submodule @ v0.1.0
+  prompts/           the PO agent's role (orchestrator.md) + periodic sweep (supervise.md)
+  fleet.toml         THE fleet registry — the only record of project ↔ harness ↔ ref ↔ location
+  scripts/           fleet.py + create/start/stop/update-project.sh — the management helpers
+  docs/              runbooks: manage-projects.md, fleet-registry.md, bootstrap.md
+  flake.nix/.lock    a Nix devShell (python311 + tmux + git)
+  .ao-state/         runtime state + logs (gitignored)
+```
+
+## The one rule: knowledge is one-directional (PO → projects, never the reverse)
+
+A project repo contains **nothing** about the PO or the fleet — no fleet metadata, no `fleet.toml`,
+no mention of a PO. A project can be run and inspected entirely by hand and would have no idea a PO
+exists. The *only* record of which projects exist, where, which harness, what ref, and whether
+they're enabled is this repo's **`fleet.toml`**. The PO knows the projects; the projects never know
+the PO.
+
+What a project *does* carry is its `engine/` submodule pin — a plain git fact (this harness, this
+ref) with no fleet semantics. The PO's `fleet.toml` mirrors that for fleet inventory.
+
+## Quick start
+
+```bash
+nix develop # python311 + tmux + git on PATH (see "Nix")
+
+python3 engine/agents.py status # the PO's own agents (reads agents.toml)
+python3 scripts/fleet.py status # the fleet registry: projects, enabled, harness@ref
+
+python3 engine/agents.py up # start the PO agent + its watchdog (needs claude on PATH)
+python3 engine/agents.py down # stop everything
+```
+
+The PO agent itself is one persistent `claude` session (`prompts/orchestrator.md`) with an hourly
+fleet sweep (`prompts/supervise.md`), supervised by the harness watchdog. See `engine/README.md`
+for the full harness reference.
+
+## Managing the fleet
+
+Full runbooks in [`docs/manage-projects.md`](docs/manage-projects.md). In short:
+
+```bash
+# create a new project: scaffolds engine/ submodule + harness config, NO PO/fleet metadata inside it
+scripts/create-project.sh my-new-project --ref v0.1.0 [--register]
+
+# drive an existing fleet project's harness (resolves location via fleet.toml)
+scripts/start-project.sh
+scripts/stop-project.sh
+scripts/update-project.sh # bump that project's engine pin (per-project, opt-in)
+
+# inspect the fleet
+python3 scripts/fleet.py list | status | validate | get
+```
+
+The fleet registry schema is documented in [`docs/fleet-registry.md`](docs/fleet-registry.md).
+
+## Nix
+
+A `flake.nix` provides a reproducible devShell with the runtime deps (`python311` for stdlib
+`tomllib`, plus `tmux` and `git` incl. submodule support):
+
+```bash
+nix develop # enter the shell
+nix develop -c python3 -c 'import tomllib' # sanity: the runtime python has tomllib
+nix develop -c python3 engine/agents.py status # run a command in the shell
+nix flake check # evaluate + build the devShell
+```
+
+The agent CLI the PO uses (`claude`) is an **external, non-Nix tool** — install it per its own docs
+and ensure it is on `PATH` before launching the live agent.
+
+## First-time setup / re-scaffolding
+
+Nothing creates the *first* PO — it is hand-scaffolded once.
+[`docs/bootstrap.md`](docs/bootstrap.md) records exactly how (it is how this repo was made), and how
+an existing PO can later re-scaffold itself.
+
+## Cloning
+
+Because the harness is a submodule, clone recursively:
+
+```bash
+git clone --recurse-submodules https://git.autonomic.zone/recipe-maintainers/project-orchestrator.git
+# or, after a plain clone:
+git submodule update --init --recursive
+```
diff --git a/agents.toml b/agents.toml
new file mode 100644
index 0000000..3c516ae
--- /dev/null
+++ b/agents.toml
@@ -0,0 +1,58 @@
+# project-orchestrator — agent-orchestrator harness config (this project's ONLY config).
+#
+# The PO is just a project that uses the agent-orchestrator harness (vendored as the `engine/`
+# submodule). What makes it the project-orchestrator is its *job* — fleet management — not its
+# architecture. Run it by hand exactly like any other project:
+#
+#   nix develop -c python3 engine/agents.py status
+#   nix develop -c python3 engine/agents.py up # start the PO agent + watchdog
+#   nix develop -c python3 engine/agents.py down
+#
+# Runtime state (resume ids, limit windows) lives under /state/, NOT here. There is NO
+# fleet/PO metadata in any project's config — the fleet lives only in this repo's fleet.toml.
+
+# ─────────────────────────── global watchdog cadence ───────────────────────────
+[watchdog]
+signal_interval = 30 # s between handoff / stall / limit checks (light)
+heavy_interval = 300 # s between heal / phase-advance checks
+limit_probe_fallback = 300 # flat probe cadence when a reset time can't be parsed
+limit_reset_slack = 45 # s past a parsed reset before probing
+stall_grace = 180 # s of slack past a WAITING-UNTIL marker before a stall reboot
+
+# ─────────────────────────── defaults inherited by every agent ───────────────────────────
+[defaults]
+session_prefix = "po-" # REQUIRED — tmux namespace for the project-orchestrator project
+log_dir = ".ao-state" # REQUIRED — logs + state/, resolved relative to this file
+backend = "claude"
+model = "claude-opus-4-8"
+watch = "heal" # none | heal | heal+stall
+
+# ─────────────────────────── backends (declared as data) ───────────────────────────
+[backend.claude]
+bin = "claude"
+flags = "--dangerously-skip-permissions"
+remote_control = true
+supports_resume = true
+prompt_delivery = "arg"
+process_name = "claude"
+submit_key = "Enter"
+stall_idle = 300
+active_re = "esc to interrupt|Running tool|⠇|⠙|· \\d+"
+limit_re = "spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)"
+fatal_re = "redacted_thinking|blocks cannot be modified|cannot be modified"
+
+# ─────────────────────────── the PO agent ───────────────────────────
+# A single persistent fleet-management agent is enough to start (the plan: "add a loop only if
+# useful"). It is NOT a build loop — it manages a fleet of *other* projects: create / start / stop
+# / update / list / status, reading each project's harness docs to work out how to drive it. Its
+# startup prompt + periodic wake nudge live in prompts/.
+
+[[agent]]
+name = "project-orchestrator" # tmux session: po-project-orchestrator
+kind = "persistent"
+backend = "claude"
+model = "claude-opus-4-8"
+resume = true # resume its session across restarts (--resume )
+watch = "heal" # keep it alive/healed; never reboot just for being idle
+prompt_file = "prompts/orchestrator.md" # startup prompt: read your role + fleet, then report
+wake = { interval = 3600, prompt_file = "prompts/supervise.md" } # hourly fleet sweep
diff --git a/docs/bootstrap.md b/docs/bootstrap.md
new file mode 100644
index 0000000..41ee4d0
--- /dev/null
+++ b/docs/bootstrap.md
@@ -0,0 +1,66 @@
+# Bootstrap — how the *first* project-orchestrator is hand-scaffolded
+
+The PO creates projects — but nothing creates the *first* PO. It is hand-scaffolded once. Thereafter
+the PO can create projects, and can even re-scaffold itself. This doc records that one-time setup so a
+fresh PO can be stood up from scratch (it is exactly how *this* repo was made).
+
+## What a PO is
+
+A PO is **just a project** that uses the `agent-orchestrator` harness, plus two things no ordinary
+project has: a **fleet registry** (`fleet.toml`) and **fleet-management agents/runbooks**
+(`prompts/`, `scripts/`, `docs/`). Its architecture is identical to any project; only its *job*
+differs. So bootstrapping a PO = scaffolding a normal agent-orchestrator project, then adding the
+fleet pieces.
+
+## Prerequisites
+
+- `git` (with submodule support), `python3 >= 3.11` (stdlib `tomllib`), `tmux` — all provided by
+  `nix develop` (see the README's Nix section).
+- The agent CLI the PO agent uses (`claude`) on `PATH` — external, not from Nix.
+- A git host for the PO repo (this PO lives at `recipe-maintainers/project-orchestrator`).
+
+## Steps (one time)
+
+1. **Create the PO repo** and clone it.
+   ```bash
+   git init -b main project-orchestrator && cd project-orchestrator
+   ```
+
+2. **Vendor the harness as `engine/`**, pinned at a release tag (the submodule pin IS the engine
+   version):
+   ```bash
+   git submodule add https://git.autonomic.zone/recipe-maintainers/agent-orchestrator.git engine
+   git -C engine fetch --tags && git -C engine checkout v0.1.0
+   git add .gitmodules engine
+   ```
+
+3. **Write the harness config** — `agents.toml` declaring the PO's own agent(s). A single
+   `kind = "persistent"` `project-orchestrator` agent (backend `claude`) is enough to start; its
+   startup prompt is `prompts/orchestrator.md` and it gets an hourly `wake` →
+   `prompts/supervise.md`. (You can scaffold a starter with `python3 engine/agents.py init .` and
+   then edit it, or copy this repo's `agents.toml`.)
+
+4. **Add the fleet pieces** (what makes this project a PO):
+   - `fleet.toml` — the registry (schema: `docs/fleet-registry.md`), starting empty or with a sample.
+   - `prompts/orchestrator.md` + `prompts/supervise.md` — the PO agent's role and periodic sweep.
+   - `scripts/` — `fleet.py` (read/validate the registry) and `create/start/stop/update-project.sh`.
+   - `docs/` — these runbooks.
+
+5. **Add Nix + `.gitignore`** (`.ao-state/` is runtime state, never committed). Commit and push
+   `main` with the submodule pinned.
+
+6. **Bring the PO up** — exactly like any project:
+   ```bash
+   nix develop -c python3 engine/agents.py status # sanity: config parses, agent listed
+   nix develop -c python3 engine/agents.py up # start the PO agent + its watchdog
+   ```
+
+That's it — the PO is live. From here it manages the fleet via the runbooks in
+`docs/manage-projects.md`, and can scaffold further projects (including a replacement PO) with
+`scripts/create-project.sh`.
+
+## Re-scaffolding the PO later
+
+Because a PO is just a project, an existing PO can create a new PO by running the create-a-project
+flow with this repo's URL as the engine consumer and then adding the fleet pieces — or, more simply,
+by cloning this repo recursively. The bootstrap above is only needed when there is no PO at all.
diff --git a/docs/fleet-registry.md b/docs/fleet-registry.md
new file mode 100644
index 0000000..65ab174
--- /dev/null
+++ b/docs/fleet-registry.md
@@ -0,0 +1,55 @@
+# The fleet registry — `fleet.toml`
+
+`fleet.toml` is the project-orchestrator's **authoritative, PO-only** record of every project in the
+fleet. It is the *single* place where project ↔ harness ↔ ref ↔ location is recorded. Projects
+themselves carry none of this: knowledge is one-directional (**PO → projects, never the reverse**), so
+a project repo never contains a `fleet.toml`, a project id, or any mention of the PO.
+
+Validate / inspect it with the helper:
+
+```bash
+python3 scripts/fleet.py validate # parse + schema-check; exit 1 on any error
+python3 scripts/fleet.py list # one line per project
+python3 scripts/fleet.py status # list + a summary count
+python3 scripts/fleet.py get # dump one project's full entry
+```
+
+## Schema
+
+### `[fleet]` — registry metadata (optional)
+
+| key | type | meaning |
+|---|---|---|
+| `version` | integer | registry schema version; bump on a breaking change to this format |
+
+### `[[project]]` — one block per project
+
+| key | required | type | meaning |
+|---|---|---|---|
+| `name` | **yes** | string | unique fleet id, kebab-case. Used by the helper scripts to address the project. |
+| `location` | **yes** | string | where the project lives: a git URL (remote) **or** a local filesystem path (a working clone the PO can drive directly). |
+| `harness` | **yes** | string | which harness runs the project (e.g. `agent-orchestrator`). The PO reads that harness's docs to know how to drive it. |
+| `ref` | **yes** | string | the harness ref the project pins (its `engine/` submodule pin — a tag/branch/SHA). The in-repo submodule pin is the source of truth; this mirrors it for at-a-glance fleet inventory. |
+| `enabled` | **yes** | bool | whether this project is part of the active fleet (a bare fleet "start everything" would skip `enabled = false`). |
+| `secrets` | **yes** | string | where the project's creds live (a path like `.env`, or a secrets-manager ref). Never the secret values themselves; secrets are never in git. |
+| `config` | no | string | the project's harness config file, relative to its root (default `agents.toml`). |
+| `notes` | no | string | free-form operator notes. |
+
+Unknown fields are rejected by `fleet.py validate` (typo guard). Duplicate `name`s are rejected.
+
+## Why the registry, and only the registry, holds this
+
+If a project repo recorded its own harness/ref/fleet-membership, that knowledge would be
+bidirectional and a project would "know" it belongs to a fleet — breaking isolation and making a
+project un-runnable standalone. Instead:
+
+- **what harness + what ref** a project uses is captured *in the project* only as the `engine/`
+  submodule pin (a plain git fact, no fleet semantics), and *in the PO* as this registry's
+  `harness`/`ref`. The project never names a PO.
+- **which projects exist, where, enabled** lives *only* here. Remove a project from the fleet by
+  deleting its `[[project]]` block; the project repo is unaffected and still runnable by hand.
+
+## Adding an entry
+
+Either let `scripts/create-project.sh --register` append one when it scaffolds a project, or add a
+block by hand and run `python3 scripts/fleet.py validate`. See `docs/manage-projects.md`.
diff --git a/docs/manage-projects.md b/docs/manage-projects.md
new file mode 100644
index 0000000..10c15aa
--- /dev/null
+++ b/docs/manage-projects.md
@@ -0,0 +1,95 @@
+# Managing projects — the PO runbooks
+
+These are the flows the project-orchestrator (the AI in `prompts/orchestrator.md`, or a human
+operator) follows. The PO is **AI-driven, not contract-bound**: for any harness it doesn't already
+know, it *reads that harness's docs* and works out how to drive it. The helper scripts here cover the
+common `agent-orchestrator` case; treat them as conveniences, not a rigid interface.
+
+The one invariant across every flow: **knowledge is one-directional (PO → project)**. Nothing about
+the PO or the fleet is ever written into a project repo.
+
+---
+
+## Create a project
+
+**Goal:** a new, self-contained project repo — the chosen harness vendored as `engine/` at a pinned
+ref, a harness config scaffolded by that harness, and **no** PO/fleet metadata — plus (separately) a
+`fleet.toml` entry on the PO side.
+
+```bash
+scripts/create-project.sh \
+  [--dir ] # where to create it (default: ./projects)
+  [--engine-url ] # harness repo (default: the agent-orchestrator repo)
+  [--ref ] # harness ref to pin (default: v0.1.0)
+  [--prefix ] # tmux namespace in the config (default: -)
+  [--register] # also append a [[project]] entry to fleet.toml
+```
+
+What it does, step by step (so the PO can also do it by hand for a non-default harness):
+
+1. `git init -b main /`.
+2. Vendor the harness: `git submodule add engine`, then `git -C engine checkout `.
+   The submodule pin **is** the recorded harness version inside the project (no other metadata).
+3. Scaffold the harness config with the harness's own initializer
+   (`python3 engine/agents.py init .`), and stamp a unique `session_prefix`. This writes
+   `agents.toml` + `prompts/` — **harness config only**.
+4. Add a `.gitignore` for runtime state (`.ao-state/`), and an initial commit.
+5. **Separately** (only with `--register`, or by hand): append a `[[project]]` block to the PO's
+   `fleet.toml`. This is PO-side; it never lands in the project.
+
+Verify the new project standalone — it must work with no knowledge of any PO:
+
+```bash
+( cd / && python3 engine/agents.py status )
+```
+
+And confirm isolation — the project must contain **no** PO/fleet metadata:
+
+```bash
+cd /
+grep -ril -e 'fleet' -e 'project-orchestrator' -e 'project orchestrator' . --exclude-dir=engine --exclude-dir=.git || echo "clean: no PO/fleet metadata"
+```
+
+(Exclude `engine/` — that's the upstream harness, whose own docs legitimately *describe* being
+driven by a PO; that is the harness documenting a consumer, not this project knowing about a fleet.)
+
+---
+
+## Start / stop a project
+
+Drive the project's harness. For an `agent-orchestrator` project:
+
+```bash
+scripts/start-project.sh [agent...] # → engine/agents.py up (in the project dir)
+scripts/stop-project.sh [agent...] # → engine/agents.py down
+```
+
+Both resolve `` → `location` via `fleet.toml`. For a remote-only `location`, clone it locally
+first. For a non-`agent-orchestrator` harness, the wrapper bows out — read that harness's docs and
+drive it directly.
+
+---
+
+## Update a project's harness
+
+Bumping the engine is **per-project and opt-in** — it touches only that project's repo; every other
+project keeps its own pin, so one bump can't break another.
+
+```bash
+scripts/update-project.sh # checkout new ref in the project's engine/ + commit
+```
+
+Then update the project's `ref` in `fleet.toml` and `python3 scripts/fleet.py validate`.
+
+---
+
+## List / status the fleet
+
+```bash
+python3 scripts/fleet.py list # one line per project: name, enabled, harness@ref, location
+python3 scripts/fleet.py status # + a total/enabled/disabled summary
+```
+
+This reads only `fleet.toml`. To also check live state, drive each enabled project's harness
+(`engine/agents.py status --config /agents.toml`) — `prompts/supervise.md` does this on the
+PO's periodic wake.
diff --git a/engine b/engine
new file mode 160000
index 0000000..289ef07
--- /dev/null
+++ b/engine
@@ -0,0 +1 @@
+Subproject commit 289ef07df40a8264f3a36b4e91b923d1424c4658
diff --git a/flake.lock b/flake.lock
new file mode 100644
index 0000000..8eeab1b
--- /dev/null
+++ b/flake.lock
@@ -0,0 +1,27 @@
+{
+  "nodes": {
+    "nixpkgs": {
+      "locked": {
+        "lastModified": 1751274312,
+        "narHash": "sha256-/bVBlRpECLVzjV19t5KMdMFWSwKLtb5RyXdjz3LJT+g=",
+        "owner": "NixOS",
+        "repo": "nixpkgs",
+        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
+        "type": "github"
+      },
+      "original": {
+        "owner": "NixOS",
+        "repo": "nixpkgs",
+        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
+        "type": "github"
+      }
+    },
+    "root": {
+      "inputs": {
+        "nixpkgs": "nixpkgs"
+      }
+    }
+  },
+  "root": "root",
+  "version": 7
+}
diff --git a/flake.nix b/flake.nix
new file mode 100644
index 0000000..ac03596
--- /dev/null
+++ b/flake.nix
@@ -0,0 +1,41 @@
+{
+  description = "project-orchestrator — a project that uses the agent-orchestrator harness to manage a fleet of projects";
+
+  inputs = {
+    nixpkgs.url = "github:NixOS/nixpkgs/50ab793786d9de88ee30ec4e4c24fb4236fc2674";
+  };
+
+  outputs = { self, nixpkgs }:
+    let
+      systems = [ "x86_64-linux" "aarch64-linux" "x86_64-darwin" "aarch64-darwin" ];
+      forAllSystems = f: nixpkgs.lib.genAttrs systems (system: f nixpkgs.legacyPackages.${system});
+    in
+    {
+      # Reproducible devShell for the PO. Everything the PO needs to run itself + drive the fleet:
+      #   - python311 : stdlib tomllib (>=3.11) — engine/agents.py and scripts/fleet.py import it
+      #   - tmux : the harness runs every agent/watchdog in tmux
+      #   - git : the engine submodule + create/update flows need git (incl. submodule support)
+      # The agent CLI the PO agent uses (claude) is an external, non-Nix tool — install it per its
+      # own docs and put it on PATH before `engine/agents.py up`.
+      devShells = forAllSystems (pkgs: {
+        default = pkgs.mkShell {
+          packages = [
+            pkgs.python311 # tomllib (>=3.11)
+            pkgs.tmux # harness sessions + watchdog
+            pkgs.git # engine submodule + project create/update flows
+            pkgs.coreutils
+            pkgs.bash
+          ];
+          shellHook = ''
+            echo "project-orchestrator devShell — $(python3 --version), $(tmux -V), $(git --version)"
+            echo "try: python3 engine/agents.py status | python3 scripts/fleet.py status"
+          '';
+        };
+      });
+
+      # `nix flake check` evaluates this — a cheap smoke that the devShell builds.
+      checks = forAllSystems (pkgs: {
+        devshell-builds = self.devShells.${pkgs.system}.default;
+      });
+    };
+}
diff --git a/fleet.toml b/fleet.toml
new file mode 100644
index 0000000..24240a5
--- /dev/null
+++ b/fleet.toml
@@ -0,0 +1,26 @@
+# fleet.toml — the project-orchestrator's fleet registry.
+#
+# THE authoritative, PO-only record of every project the fleet manages. This is the ONLY place where
+# project ↔ harness ↔ ref ↔ location lives. Projects themselves carry none of this (one-directional
+# knowledge: PO → projects, never the reverse). Full schema + field reference: docs/fleet-registry.md
+#
+# Validate / inspect this file with: python3 scripts/fleet.py list (or `status`).
+
+# ─────────────────────────── registry metadata (optional) ───────────────────────────
+[fleet]
+version = 1 # registry schema version (integer; bump on breaking changes)
+
+# ─────────────────────────── one [[project]] block per project ───────────────────────────
+# Required fields: name, location, harness, ref, enabled, secrets.
+# See docs/fleet-registry.md for the meaning and allowed values of each.
+
+# Sample entry (a representative project; not started by the PO unless an operator asks).
+[[project]]
+name = "example-recipe-ci" # unique fleet id (kebab-case)
+location = "https://git.autonomic.zone/recipe-maintainers/example-recipe-ci.git" # git url or local path
+harness = "agent-orchestrator" # which harness runs it
+ref = "v0.1.0" # pinned harness ref (submodule pin)
+enabled = true # is this project part of the active fleet?
+secrets = ".env" # where the project's creds live (path/ref; never in git)
+config = "agents.toml" # the project's harness config file (relative to the project root)
+notes = "Sample fleet entry — replace with a real project, or remove."
diff --git a/prompts/orchestrator.md b/prompts/orchestrator.md
new file mode 100644
index 0000000..2c5b612
--- /dev/null
+++ b/prompts/orchestrator.md
@@ -0,0 +1,47 @@
+# Role: the project-orchestrator (PO)
+
+You are the **project-orchestrator** — an AI that manages a *fleet* of independent projects. You are
+yourself just a project that uses the `agent-orchestrator` harness (vendored at `engine/`); what is
+special about you is your **job**, not your architecture.
+
+## The one rule that governs everything: knowledge is one-directional
+
+**PO → projects, never the reverse.** A project repo contains *nothing* about you or the fleet — no
+fleet metadata, no `fleet.toml`, no mention of the PO. A project can be run and inspected entirely by
+hand and would have no idea a PO exists. The *only* place that records which projects exist, where
+they live, which harness they use, at what ref, and whether they're enabled is **this repo's
+`fleet.toml`**. Never write PO/fleet metadata into a project repo.
+
+## What you know about (your inputs)
+
+- **`fleet.toml`** — the authoritative registry of every project (schema documented in
+  `docs/fleet-registry.md`). This is your source of truth for the fleet.
+- **`engine/README.md`** — how the `agent-orchestrator` harness works (the harness most projects
+  use). Other projects may use a *different* harness; there is no rigid contract — you **read** each
+  project's harness docs and work out how to drive it.
+- **`docs/`** — your runbooks:
+  - `docs/manage-projects.md` — the create / start / stop / update / list / status flows.
+  - `docs/fleet-registry.md` — the `fleet.toml` schema.
+  - `docs/bootstrap.md` — how the first PO (you) is hand-scaffolded.
+
+## What you do (your job)
+
+For each flow, follow the runbook in `docs/manage-projects.md`. In short:
+
+- **create** a project — scaffold a new repo, add the chosen harness as a submodule at a ref, write
+  the project's harness config (and **no** PO/fleet metadata), then add a `fleet.toml` entry.
+  Helper: `scripts/create-project.sh`.
+- **start / stop / update** a project — drive that project's harness by reading its docs (for an
+  `agent-orchestrator` project: `engine/agents.py up|down`, bump the submodule to update).
+  Helpers: `scripts/start-project.sh`, `scripts/stop-project.sh`, `scripts/update-project.sh`.
+- **list / status** — read `fleet.toml` and report. Helper: `scripts/fleet.py list|status`.
+
+## On startup (now)
+
+1. Read `fleet.toml` and `docs/manage-projects.md` so you know the current fleet and your runbooks.
+2. Run `python3 scripts/fleet.py status` to see the fleet's declared state.
+3. Report a short summary: how many projects, which are enabled, anything that looks wrong. Then idle
+   until your next wake or an operator instruction.
+
+Do not invent work. You act when an operator asks you to create/start/stop/update a project, or when
+your periodic wake (`prompts/supervise.md`) tells you to sweep the fleet.
diff --git a/prompts/supervise.md b/prompts/supervise.md
new file mode 100644
index 0000000..918c8e3
--- /dev/null
+++ b/prompts/supervise.md
@@ -0,0 +1,17 @@
+# Periodic fleet sweep
+
+A scheduled wake. Do a light, read-only sweep of the fleet — do not start work unless something is
+clearly wrong and a runbook covers the fix.
+
+1. `python3 scripts/fleet.py status` — list every project in `fleet.toml` with its location, harness,
+   pinned ref, and enabled flag.
+2. For each **enabled** project whose location is reachable from this host, optionally check whether
+   its harness reports it running (for an `agent-orchestrator` project:
+   `engine/agents.py status --config /agents.toml`). Read its harness docs first if the
+   harness is unfamiliar.
+3. Report a one-paragraph summary: total / enabled / disabled, anything unreachable or stopped that
+   should be running. If a fix is needed and `docs/manage-projects.md` covers it, you may apply it;
+   otherwise just flag it.
+
+Remember the one-directional rule: never write fleet/PO state into a project repo. The fleet's truth
+is `fleet.toml` here.
diff --git a/scripts/_resolve.sh b/scripts/_resolve.sh
new file mode 100755
index 0000000..67baf66
--- /dev/null
+++ b/scripts/_resolve.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# _resolve.sh — shared helper: given a fleet project name, echo "\t\t".
+# Sourced by start/stop/update scripts. Reads the PO's fleet.toml (the only project↔location record).
+set -euo pipefail
+PO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+resolve_project() {
+  local name="$1"
+  python3 - "$PO_ROOT/fleet.toml" "$name" <<'PY'
+import sys, tomllib
+path, name = sys.argv[1], sys.argv[2]
+with open(path, "rb") as f:
+    raw = tomllib.load(f)
+for p in raw.get("project", []):
+    if p.get("name") == name:
+        print("\t".join([p.get("location", ""), p.get("config", "agents.toml"), p.get("harness", "")]))
+        sys.exit(0)
+sys.exit(f"_resolve: no project named {name!r} in {path}")
+PY
+}
diff --git a/scripts/create-project.sh b/scripts/create-project.sh
new file mode 100755
index 0000000..780aa99
--- /dev/null
+++ b/scripts/create-project.sh
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+# create-project.sh — scaffold a NEW project that uses a harness.
+#
+# Produces a self-contained project repo: the chosen harness vendored as the `engine/` submodule at
+# a pinned ref, plus a harness config scaffolded by the harness's own `init`. The project contains
+# NO project-orchestrator / fleet metadata — knowledge is one-directional (PO → project). Registering
+# the project in the PO's fleet.toml is a SEPARATE, PO-side step (use --register, or edit fleet.toml
+# by hand); nothing about the fleet ever lands inside the project repo.
+#
+# Usage:
+#   scripts/create-project.sh [options]
+#
+# Options:
+#   --dir parent directory to create the project under (default: ./projects)
+#   --engine-url harness repo to vendor as engine/ (default: the agent-orchestrator repo)
+#   --ref harness ref to pin the submodule at (default: v0.1.0)
+#   --prefix session_prefix to write into the project's config (default: -)
+#   --register also append a [[project]] entry to this PO's fleet.toml (PO-side only)
+#   --no-commit leave the project tree uncommitted (default: make an initial commit)
+#
+# Drive it by hand afterwards, exactly like any project:
+#   cd / && python3 engine/agents.py status
+set -euo pipefail
+
+die() { echo "create-project: $*" >&2; exit 1; }
+
+[ $# -ge 1 ] || die "usage: create-project.sh [--dir P] [--engine-url U] [--ref R] [--prefix X] [--register] [--no-commit]"
+NAME="$1"; shift
+[[ "$NAME" =~ ^[a-z0-9][a-z0-9-]*$ ]] || die "name must be kebab-case (got: $NAME)"
+
+PO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+PARENT="$PO_ROOT/projects"
+ENGINE_URL="https://git.autonomic.zone/recipe-maintainers/agent-orchestrator.git"
+REF="v0.1.0"
+PREFIX=""
+REGISTER=0
+COMMIT=1
+
+while [ $# -gt 0 ]; do
+  case "$1" in
+    --dir) PARENT="$2"; shift 2;;
+    --engine-url) ENGINE_URL="$2"; shift 2;;
+    --ref) REF="$2"; shift 2;;
+    --prefix) PREFIX="$2"; shift 2;;
+    --register) REGISTER=1; shift;;
+    --no-commit) COMMIT=0; shift;;
+    *) die "unknown option: $1";;
+  esac
+done
+PREFIX="${PREFIX:-${NAME}-}"
+
+command -v git >/dev/null || die "git not on PATH"
+command -v python3 >/dev/null || die "python3 not on PATH"
+
+DEST="$PARENT/$NAME"
+[ -e "$DEST" ] && die "$DEST already exists — refusing to overwrite"
+mkdir -p "$PARENT"
+
+echo "create-project: scaffolding '$NAME' at $DEST (engine $ENGINE_URL @ $REF)"
+git init -q -b main "$DEST"
+cd "$DEST"
+
+# 1) vendor the harness as a pinned submodule under engine/
+git -c protocol.version=2 submodule add -q "$ENGINE_URL" engine
+( cd engine && git fetch -q --tags origin && git checkout -q "$REF" )
+git add .gitmodules engine
+
+# 2) scaffold the harness config + prompts via the harness's OWN init (no PO/fleet metadata)
+python3 engine/agents.py init . >/dev/null
+# stamp the chosen session_prefix into the scaffolded config (keeps namespaces unique per project)
+python3 - "$PREFIX" <<'PY'
+import re, sys, pathlib
+prefix = sys.argv[1]
+p = pathlib.Path("agents.toml")
+txt = p.read_text()
+txt = re.sub(r'session_prefix\s*=\s*"[^"]*"', f'session_prefix = "{prefix}"', txt, count=1)
+p.write_text(txt)
+PY
+
+# 3) ignore runtime state; the project knows nothing about any PO
+cat > .gitignore <<'EOF'
+# runtime state + logs (never committed)
+.ao-state/
+*.log
+__pycache__/
+*.pyc
+result
+EOF
+
+git add agents.toml prompts .gitignore 2>/dev/null || true
+
+if [ "$COMMIT" -eq 1 ]; then
+  git -c user.name="project-orchestrator" -c user.email="po@localhost" \
+    commit -q -m "init: scaffold $NAME (engine @ $REF)"
+fi
+
+echo "create-project: done — $DEST"
+echo " engine pinned at: $(cd engine && git rev-parse HEAD) ($REF)"
+echo " config: agents.toml (session_prefix = $PREFIX)"
+echo " verify it: ( cd $DEST && python3 engine/agents.py status )"
+
+if [ "$REGISTER" -eq 1 ]; then
+  echo "create-project: registering '$NAME' in $PO_ROOT/fleet.toml"
+  cat >> "$PO_ROOT/fleet.toml" < # dump one project's full entry
+  python3 scripts/fleet.py --file PATH ... # use a non-default registry (default: ./fleet.toml)
+
+Needs only the Python stdlib (tomllib → python >= 3.11).
+"""
+import sys
+import tomllib
+from pathlib import Path
+
+REQUIRED = ["name", "location", "harness", "ref", "enabled", "secrets"]
+OPTIONAL = ["config", "notes"]
+
+
+def _registry_path(argv):
+    path = Path("fleet.toml")
+    out = []
+    i = 0
+    while i < len(argv):
+        if argv[i] == "--file" and i + 1 < len(argv):
+            path = Path(argv[i + 1]); i += 2; continue
+        out.append(argv[i]); i += 1
+    return path, out
+
+
+def load(path):
+    if not path.exists():
+        sys.exit(f"fleet: registry not found: {path}")
+    with open(path, "rb") as f:
+        raw = tomllib.load(f)
+    return raw
+
+
+def validate(raw, path):
+    errors = []
+    projects = raw.get("project", [])
+    if not isinstance(projects, list):
+        errors.append("[[project]] must be an array of tables")
+        projects = []
+    seen = set()
+    for i, p in enumerate(projects):
+        tag = p.get("name", f"#{i}")
+        for k in REQUIRED:
+            if k not in p:
+                errors.append(f"project {tag}: missing required field '{k}'")
+        if not isinstance(p.get("enabled", False), bool):
+            errors.append(f"project {tag}: 'enabled' must be a boolean")
+        name = p.get("name")
+        if name in seen:
+            errors.append(f"duplicate project name: {name}")
+        if name:
+            seen.add(name)
+        for k in p:
+            if k not in REQUIRED + OPTIONAL:
+                errors.append(f"project {tag}: unknown field '{k}'")
+    return projects, errors
+
+
+def cmd_list(projects):
+    if not projects:
+        print("(no projects registered)")
+        return
+    w = max(len(p.get("name", "?")) for p in projects)
+    for p in projects:
+        flag = "enabled " if p.get("enabled") else "disabled"
+        print(f" {p.get('name','?'):<{w}} [{flag}] {p.get('harness','?')}@{p.get('ref','?')}"
+              f" {p.get('location','?')}")
+
+
+def main():
+    path, argv = _registry_path(sys.argv[1:])
+    cmd = argv[0] if argv else "list"
+    raw = load(path)
+    projects, errors = validate(raw, path)
+    if cmd == "validate":
+        if errors:
+            for e in errors:
+                print(f" ERROR: {e}")
+            sys.exit(1)
+        print(f"fleet: OK — {len(projects)} project(s), schema v{raw.get('fleet', {}).get('version',
'?')}")
+        return
+    # for list/status/get we still surface errors but don't hard-fail the read
+    for e in errors:
+        print(f" ERROR: {e}", file=sys.stderr)
+    if cmd == "list":
+        cmd_list(projects)
+    elif cmd == "status":
+        cmd_list(projects)
+        en = sum(1 for p in projects if p.get("enabled"))
+        print(f"\n total={len(projects)} enabled={en} disabled={len(projects) - en}"
+              f" (registry schema v{raw.get('fleet', {}).get('version', '?')})")
+    elif cmd == "get" and len(argv) > 1:
+        target = argv[1]
+        for p in projects:
+            if p.get("name") == target:
+                for k in REQUIRED + OPTIONAL:
+                    if k in p:
+                        print(f" {k:<9} = {p[k]!r}")
+                return
+        sys.exit(f"fleet: no project named {target!r}")
+    else:
+        sys.exit(__doc__)
+    if errors:
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/scripts/start-project.sh b/scripts/start-project.sh
new file mode 100755
index 0000000..0608fe4
--- /dev/null
+++ b/scripts/start-project.sh
@@ -0,0 +1,19 @@
+#!/usr/bin/env bash
+# start-project.sh <name> [agent...] — start a fleet project's agents via its harness.
+#
+# Resolves <name> in the PO's fleet.toml, then drives that project's harness. For an
+# agent-orchestrator project that is `engine/agents.py up`. For an unfamiliar harness, READ that
+# project's harness docs first — there is no rigid contract. This wrapper handles the common
+# agent-orchestrator case; extend it (or act by hand) for other harnesses.
+set -euo pipefail
+[ $# -ge 1 ] || { echo "usage: start-project.sh <name> [agent...]" >&2; exit 1; }
+NAME="$1"; shift || true
+source "$(dirname "$0")/_resolve.sh"
+IFS=$'\t' read -r LOCATION CONFIG HARNESS < <(resolve_project "$NAME")
+[ -d "$LOCATION" ] || { echo "start-project: location not a local dir: $LOCATION (clone it first)" >&2; exit 1; }
+case "$HARNESS" in
+  agent-orchestrator)
+    echo "start-project: $NAME → python3 engine/agents.py up $* (in $LOCATION)"
+    ( cd "$LOCATION" && python3 engine/agents.py --config "$CONFIG" up "$@" );;
+  *) echo "start-project: harness '$HARNESS' is not agent-orchestrator — read its docs and drive it by hand." >&2; exit 2;;
+esac
diff --git a/scripts/stop-project.sh b/scripts/stop-project.sh
new file mode 100755
index 0000000..6216c93
--- /dev/null
+++ b/scripts/stop-project.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+# stop-project.sh <name> [agent...] — stop a fleet project's agents via its harness.
+# Mirror of start-project.sh; for an agent-orchestrator project that is `engine/agents.py down`.
+set -euo pipefail
+[ $# -ge 1 ] || { echo "usage: stop-project.sh <name> [agent...]" >&2; exit 1; }
+NAME="$1"; shift || true
+source "$(dirname "$0")/_resolve.sh"
+IFS=$'\t' read -r LOCATION CONFIG HARNESS < <(resolve_project "$NAME")
+[ -d "$LOCATION" ] || { echo "stop-project: location not a local dir: $LOCATION" >&2; exit 1; }
+case "$HARNESS" in
+  agent-orchestrator)
+    echo "stop-project: $NAME → python3 engine/agents.py down $* (in $LOCATION)"
+    ( cd "$LOCATION" && python3 engine/agents.py --config "$CONFIG" down "$@" );;
+  *) echo "stop-project: harness '$HARNESS' is not agent-orchestrator — read its docs and drive it by hand." >&2; exit 2;;
+esac
diff --git a/scripts/update-project.sh b/scripts/update-project.sh
new file mode 100755
index 0000000..fb3ed05
--- /dev/null
+++ b/scripts/update-project.sh
@@ -0,0 +1,21 @@
+#!/usr/bin/env bash
+# update-project.sh <name> <new-ref> — bump a fleet project's harness submodule to a new ref.
+#
+# Updating the engine for a project = checkout a new ref of its `engine/` submodule + commit, IN
+# THAT PROJECT'S REPO ONLY. It touches no other project (each pins its own copy). Afterwards, update
+# the project's `ref` in this PO's fleet.toml so the registry stays accurate.
+set -euo pipefail
+[ $# -ge 2 ] || { echo "usage: update-project.sh <name> <new-ref>" >&2; exit 1; }
+NAME="$1"; NEWREF="$2"
+source "$(dirname "$0")/_resolve.sh"
+IFS=$'\t' read -r LOCATION CONFIG HARNESS < <(resolve_project "$NAME")
+[ -d "$LOCATION" ] || { echo "update-project: location not a local dir: $LOCATION" >&2; exit 1; }
+[ "$HARNESS" = "agent-orchestrator" ] || { echo "update-project: harness '$HARNESS' not agent-orchestrator — update by hand per its docs." >&2; exit 2; }
+
+echo "update-project: $NAME engine → $NEWREF (in $LOCATION)"
+( cd "$LOCATION/engine" && git fetch -q --tags origin && git checkout -q "$NEWREF" )
+( cd "$LOCATION" && git add engine \
+  && git -c user.name="project-orchestrator" -c user.email="po@localhost" \
+       commit -q -m "chore: bump engine to $NEWREF" )
+echo "update-project: project committed. Now update fleet.toml: set ref = \"$NEWREF\" for '$NAME'."
+echo " (edit $(cd "$(dirname "$0")/.." && pwd)/fleet.toml, then: python3 scripts/fleet.py validate)"
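
update-project.sh leaves the registry edit to the operator ("set ref = ... for NAME"). The manual step could be sketched roughly as below. This is not part of the commit: `set_project_ref` is a hypothetical helper, and it assumes each project is a `[[project]]` table whose `name` and `ref` are plain double-quoted strings on their own lines, as fleet.py's `REQUIRED` list suggests. It edits the text directly because the stdlib `tomllib` only reads TOML.

```python
import re

# Assumed layout (not confirmed by the commit): one [[project]] table per
# project, with `name = "..."` and `ref = "..."` each on its own line.
PROJECT_SPLIT = re.compile(r'^(?=\[\[project\]\])', re.M)
NAME_RE = re.compile(r'^[ \t]*name[ \t]*=[ \t]*"([^"]*)"', re.M)
REF_RE = re.compile(r'^([ \t]*ref[ \t]*=[ \t]*)"[^"]*"', re.M)


def set_project_ref(text, name, new_ref):
    """Return registry text with `ref` replaced for the project called `name`.

    Hypothetical helper: raises ValueError unless exactly one matching
    [[project]] table with a `ref` line is found.
    """
    # Split just before every [[project]] header so each chunk holds one table.
    chunks = PROJECT_SPLIT.split(text)
    hits = 0
    for i, chunk in enumerate(chunks):
        if not chunk.startswith("[[project]]"):
            continue  # preamble such as a [fleet] table
        m = NAME_RE.search(chunk)
        if m and m.group(1) == name:
            # A function replacement avoids backslash surprises in new_ref.
            chunks[i], n = REF_RE.subn(
                lambda mm: f'{mm.group(1)}"{new_ref}"', chunk, count=1
            )
            hits += n
    if hits != 1:
        raise ValueError(f"expected exactly one ref for project {name!r}, found {hits}")
    return "".join(chunks)
```

A caller would read `fleet.toml`, pass its text through `set_project_ref`, write the result back, and then run `python3 scripts/fleet.py validate` as the script's final message suggests. Comments and key order are preserved because only the matched `ref` line changes.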