test: add tests/ — unit suite + isolated live claude/opencode smokes + runner
Unit tests (no agents/tmux): config load + defaults merge, kickoff-template assembly, phase machine (advance/idempotent-complete/append-resumes), limit reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity detectors.

Live smokes bring a throwaway project up THROUGH agents.py on each real backend in an isolated sandbox (unique prefix, opencode on a non-4096 port), verify attach + status + down, and clean up.

tests/run.sh runs unit always + smokes when backends present; README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
34
README.md
@@ -17,6 +17,7 @@ agent-log.py render claude JSONL transcripts into clean, greppable logs
agents.example.toml   a self-contained 2-agent example project
prompts/              generic role + kickoff templates (builder / adversary / kickoff)
smoke.sh              bring the example up + tear it down in an isolated sandbox, then clean up
tests/                the test suite: unit tests + isolated live backend smokes + a runner
flake.nix/.lock       a Nix devShell with the runtime deps (python311, tmux, git)
```
@@ -315,6 +316,39 @@ documents this in its banner.

---

## Testing

The `tests/` directory holds the harness's own test suite. One runner drives everything:

```bash
nix develop -c ./tests/run.sh   # unit tests always; live backend smokes when available
# or just: ./tests/run.sh       # (python3 + tmux must be on PATH)
```
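
The policy in the comment above (unit tests always, live smokes only when a backend is available) can be sketched in Python as follows. This is a minimal illustration, not the contents of `tests/run.sh`: the `plan_run` helper is hypothetical, the default binary names `claude` and `opencode` are assumptions, and the sketch only checks for a binary on `PATH`, whereas the real smokes also consider credentials.

```python
import os
import shutil

# Hypothetical mapping: smoke script -> (env override, assumed default binary name).
SMOKES = {
    "tests/smoke_claude.sh": ("CLAUDE_BIN", "claude"),
    "tests/smoke_opencode.sh": ("OPENCODE_BIN", "opencode"),
}


def plan_run(env=None, which=shutil.which):
    """Return (step, action) pairs, where action is "run" or "skip"."""
    env = os.environ if env is None else env
    plan = [("tests/test_unit.py", "run")]  # unit tests are unconditional
    for script, (var, default) in SMOKES.items():
        binary = env.get(var, default)
        # A smoke is only scheduled when its backend binary can be found.
        action = "run" if which(binary) else "skip"
        plan.append((script, action))
    return plan


if __name__ == "__main__":
    for step, action in plan_run():
        print(f"{action:4} {step}")
```

Passing `env` and `which` as parameters keeps the decision logic testable without touching the real environment.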

What it runs:

- **Unit tests** (`tests/test_unit.py`): pure logic, **no agents spawned, no live tmux sessions**.
  They cover config load + defaults merge, kickoff-template assembly, the phase machine (advance on the
  done marker, idempotent sequence-complete, append-a-phase resumes), usage-limit reset-banner
  parsing, `WAITING-UNTIL` / stall parsing, and the per-backend activity detectors (claude +
  opencode footers). They always run; a failure fails the suite. Run them alone with
  `python3 -m unittest discover -s tests` (or `python3 tests/test_unit.py`).
- **Live backend smokes** (`tests/smoke_claude.sh`, `tests/smoke_opencode.sh`): each brings a
  throwaway scratch project up **through `agents.py`** on a real backend, in a fully isolated
  sandbox (its own unique `session_prefix`, a temp `log_dir`, and, for opencode, a dedicated
  server on a non-default port set by `AOTEST_OC_PORT`, default `4097`), confirms the session attaches and
  `status` reports it RUNNING, then `down`s it and cleans up (no leftover sessions, port freed).
  Each **SKIPs gracefully** (exit 0) when its backend's binary or creds are unavailable. Useful env:
  `CLAUDE_BIN` / `OPENCODE_BIN`, `AOTEST_MODEL`, `AOTEST_OC_PORT`, `AOTEST_OC_CREDS`.
- **Isolation sanity**: after the live runs, the runner asserts no `aotest-*` tmux sessions leaked
  and reports that any live sessions are untouched.
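
The isolation-sanity step in the list above amounts to filtering tmux session names by the test prefix. A minimal sketch, assuming the names come from `tmux list-sessions -F '#{session_name}'`; the helper names `leaked_sessions` and `list_tmux_sessions` are hypothetical and are not taken from the runner itself:

```python
import subprocess


def leaked_sessions(session_names, prefix="aotest-"):
    """Return the session names that start with the test prefix."""
    return [name for name in session_names if name.startswith(prefix)]


def list_tmux_sessions():
    """List tmux session names; empty when tmux is missing or no server runs."""
    try:
        result = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return []  # tmux is not installed
    if result.returncode != 0:
        return []  # typically: no tmux server is running
    return result.stdout.splitlines()
```

A runner could then fail the suite when `leaked_sessions(list_tmux_sessions())` is non-empty, while every session outside the `aotest-` prefix is left alone.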

The smokes are safe by construction: a unique per-run session prefix (never `cc-ci-` or any real
project's), a dedicated opencode port (never `4096`), and a cleanup trap that fires on success,
failure, and Ctrl+C.
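
The three safety properties above can be illustrated with a short Python sketch. It is not the smoke scripts' implementation (those are shell scripts using a cleanup trap); `unique_prefix`, `smoke_port`, and `sandbox` are hypothetical names, and the exact prefix format is an assumption. Only `AOTEST_OC_PORT`, its default `4097`, and the forbidden port `4096` come from the text.

```python
import contextlib
import os
import uuid


def unique_prefix(base="aotest"):
    """Build a per-run session prefix (assumed format: 'aotest-<8 hex chars>-')."""
    return f"{base}-{uuid.uuid4().hex[:8]}-"


def smoke_port(env=None, default=4097, forbidden=4096):
    """Read AOTEST_OC_PORT (default 4097) and refuse the forbidden port."""
    env = os.environ if env is None else env
    port = int(env.get("AOTEST_OC_PORT", default))
    if port == forbidden:
        raise ValueError(f"refusing to use port {forbidden}")
    return port


@contextlib.contextmanager
def sandbox(teardown):
    """Yield a fresh prefix and always call teardown(prefix) afterwards."""
    prefix = unique_prefix()
    try:
        yield prefix
    finally:
        # Runs on success, on an exception, and on KeyboardInterrupt (Ctrl+C).
        teardown(prefix)
```

The `try`/`finally` plays the role of the shell cleanup trap: whatever happens inside the `with sandbox(...)` block, the teardown callback receives the prefix it needs to clean up.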

---

## Adding things

- **Add an agent**: add an `[[agent]]` block, then run `agents.py up <name>`. No code change.