test: add tests/ — unit suite + isolated live claude/opencode smokes + runner

Unit tests (no agents/tmux): config load + defaults merge, kickoff-template
assembly, phase machine (advance/idempotent-complete/append-resumes), limit
reset-banner parsing, WAITING-UNTIL/stall parsing, claude+opencode activity
detectors. Live smokes bring a throwaway project up THROUGH agents.py on each
real backend in an isolated sandbox (unique prefix, opencode on a non-4096
port), verify attach + status + down, and clean up. tests/run.sh always runs the unit
tests, plus the smokes when their backends are present; README documents it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 18:55:34 +00:00
parent 289ef07df4
commit cdcece9a9a
5 changed files with 915 additions and 0 deletions


@@ -17,6 +17,7 @@ agent-log.py render claude JSONL transcripts into clean, greppable logs
agents.example.toml a self-contained 2-agent example project
prompts/ generic role + kickoff templates (builder / adversary / kickoff)
smoke.sh bring the example up + tear it down in an isolated sandbox, then clean up
tests/ the test suite — unit tests + isolated live backend smokes + a runner
flake.nix/.lock a Nix devShell with the runtime deps (python311, tmux, git)
```
@@ -315,6 +316,39 @@ documents this in its banner.
---
## Testing
The `tests/` directory holds the harness's own test suite. One runner drives everything:
```bash
nix develop -c ./tests/run.sh # unit tests always; live backend smokes when available
# or just: ./tests/run.sh # (python3 + tmux must be on PATH)
```
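A minimal sketch of the runner's control flow, in the spirit of `tests/run.sh` but not the real script. It assumes a backend counts as "present" when its binary (or the `CLAUDE_BIN` / `OPENCODE_BIN` override) resolves on `PATH`. The function names `backend_present`, `plan_smokes` and `run_all` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of a runner in the spirit of tests/run.sh; NOT the real script.
# Assumption: a backend is "present" when its binary resolves on PATH.
# The real smokes also check creds and SKIP (exit 0) on their own.

backend_present() {
  # Succeeds if "$1" resolves to a runnable command.
  command -v "$1" >/dev/null 2>&1
}

plan_smokes() {
  # Print the smoke scripts whose backend binary is present, one per line.
  if backend_present "${CLAUDE_BIN:-claude}"; then echo tests/smoke_claude.sh; fi
  if backend_present "${OPENCODE_BIN:-opencode}"; then echo tests/smoke_opencode.sh; fi
}

run_all() {
  # Unit tests always run; any failure (unit or smoke) fails the suite.
  local status=0 smoke
  python3 -m unittest discover -s tests || status=1
  for smoke in $(plan_smokes); do
    bash "$smoke" || status=1
  done
  return "$status"
}
```

From the repository root, `run_all; exit $?` would drive the sketch end to end. Keeping the "which smokes?" decision in its own function makes it testable without any backend installed.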
What it runs:
- **Unit tests** (`tests/test_unit.py`) — pure logic, **no agents spawned, no live tmux sessions**.
Cover config load + defaults merge, kickoff-template assembly, the phase machine (advance on the
done marker, idempotent sequence-complete, append-a-phase resumes), usage-limit reset-banner
parsing, `WAITING-UNTIL` / stall parsing, and the per-backend activity detectors (claude +
opencode footers). Always run; a failure fails the suite. Run them alone with
`python3 -m unittest discover -s tests` (or `python3 tests/test_unit.py`).
- **Live backend smokes** (`tests/smoke_claude.sh`, `tests/smoke_opencode.sh`) — each brings a
throwaway scratch project up **through `agents.py`** on a real backend, in a fully isolated
sandbox: its own unique `session_prefix`, a temp `log_dir`, and, for opencode, a dedicated
server on a non-default port (`AOTEST_OC_PORT`, default `4097`). It confirms that the session
attaches and that `status` reports it RUNNING, then `down`s it and cleans up (no leftover sessions, port freed).
Each **SKIPs gracefully** (exit 0) when its backend's binary or creds are unavailable. Useful env:
`CLAUDE_BIN` / `OPENCODE_BIN`, `AOTEST_MODEL`, `AOTEST_OC_PORT`, `AOTEST_OC_CREDS`.
- **Isolation sanity** — after the live runs, the runner asserts no `aotest-*` tmux sessions leaked
and reports that any other live sessions (e.g. a real project's) were left untouched.
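The leak check can be sketched as follows. This is an illustration, not the runner's actual code: `leaked_sessions` and `check_no_leaks` are hypothetical names. The only tmux calls are `tmux list-sessions -F '#{session_name}'`, which lists one session name per line and fails when no server is running.

```bash
#!/usr/bin/env bash
# Sketch of a post-run isolation check; function names are hypothetical.

leaked_sessions() {
  # Read session names (one per line) on stdin; print those starting with "$1".
  # index() == 1 is a literal prefix match, so "-" or "." in the prefix is safe.
  awk -v p="$1" 'index($0, p) == 1'
}

check_no_leaks() {
  # "tmux list-sessions" fails when no server is running: treat as "no sessions".
  local names leaks
  names=$(tmux list-sessions -F '#{session_name}' 2>/dev/null || true)
  leaks=$(printf '%s\n' "$names" | leaked_sessions "${1:-aotest-}")
  if [ -n "$leaks" ]; then
    echo "FAIL: leaked test sessions:" >&2
    printf '%s\n' "$leaks" >&2
    return 1
  fi
  echo "OK: no ${1:-aotest-}* sessions left behind"
}
```

Because the filter only matches the test prefix, sessions such as `cc-ci-*` are never counted, and the check never touches them.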
The smokes are safe by construction: a unique per-run session prefix (never `cc-ci-` or any real
project's), a dedicated opencode port (never `4096`), and a cleanup trap that fires on success,
failure, and Ctrl+C.
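The three safety properties above can be sketched in a few lines of bash. This is a hedged illustration of the pattern, not the smokes' real code. `AOTEST_PREFIX`, `SANDBOX` and `cleanup` are hypothetical names, and the temp directory stands in for the scratch project and `log_dir`.

```bash
#!/usr/bin/env bash
# Sketch of the smoke scripts' safety pattern; AOTEST_PREFIX, SANDBOX and
# cleanup are hypothetical names, not necessarily what the real smokes use.
set -u

# Unique per-run prefix (PID + epoch seconds): never cc-ci- or a real project's.
AOTEST_PREFIX="aotest-$$-$(date +%s)-"
# Dedicated opencode port; refuse the default 4096 outright.
AOTEST_OC_PORT="${AOTEST_OC_PORT:-4097}"
if [ "$AOTEST_OC_PORT" = 4096 ]; then
  echo "refusing to run on the default opencode port 4096" >&2
  exit 2
fi
SANDBOX=$(mktemp -d)   # stand-in for the scratch project dir + log_dir

cleanup() {
  # Kill only sessions carrying this run's prefix ("=" = exact-name target).
  tmux list-sessions -F '#{session_name}' 2>/dev/null |
    while IFS= read -r s; do
      case $s in "$AOTEST_PREFIX"*) tmux kill-session -t "=$s" ;; esac
    done
  rm -rf "$SANDBOX"
}
# EXIT fires on success and on failure; INT/TERM are turned into explicit
# exits so the EXIT trap also runs on Ctrl+C or kill.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM
```

Routing every exit path through a single `EXIT` trap keeps the cleanup logic in one place. The exact-name `=` target keeps `kill-session` from prefix-matching some other session.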
---
## Adding things
- **Add an agent** — add an `[[agent]]` block; `agents.py up <name>`. No code change.