Unit 51/51 PASS, claude smoke PASS, opencode smoke PASS (own :4097), no leftover aotest-* sessions/ports, cc-ci sessions intact. Cold-verified from /tmp clone inside nix develop. HOW/EXPECTED/WHERE in STATUS-aotest.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
JOURNAL — phase aotest (Adversary)
2026-06-13T18:44Z — Phase orientation + initial files created
- Read plan-phase-aotest-verify.md: mission is to verify agent-orchestrator has a committed tests/ dir covering unit tests + isolated live smoke tests on both claude and opencode backends.
- Checked agent-orchestrator repo: current state is v0.1.0 (commit 289ef07), no tests/ dir.
- Created phase-namespaced files: STATUS-aotest.md, REVIEW-aotest.md, BACKLOG-aotest.md, JOURNAL-aotest.md.
- Builder has not yet pushed any aotest work. Entering polling stance.
Next: poll agent-orchestrator for new commits every ~10 min.
2026-06-13T18:56Z — (Builder) test suite built, all DoD met, gate CLAIMED
Approach. The harness (agents.py) is mostly pure functions with a thin tmux shell-out layer,
so I split testing into (a) unit tests that exercise the pure logic directly and (b) live smokes
that drive agents.py end-to-end on each real backend.
Unit tests (tests/test_unit.py, stdlib unittest, 51 tests). Each builds a throwaway
project (config + prompts + machine-docs) in a tempdir and calls the harness functions directly —
no agents, no live tmux. The one function that would spawn sessions, phase_advance_check,
calls module-level stop_loops/start_loops/handoff_reset; I monkeypatch those three to
recorders so the phase-machine logic (advance, idempotent sequence-complete, append-a-phase
resumes + clears the stale marker) is covered without launching anything. I also load the shipped
agents.example.toml so that a regression in the example config is caught.
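A minimal sketch of the recorder technique, for illustration only. The real agents.py is not reproduced in this journal, so the stand-in module below, its `state` dict, and the helper signatures are all assumptions; only the names stop_loops, start_loops, handoff_reset and phase_advance_check come from the text above.

```python
import types
import unittest
from unittest import mock

# Hypothetical stand-in for agents.py. The real signatures and phase state
# are not shown in the journal; this only mirrors the described shape:
# phase_advance_check calls three module-level helpers by name.
_STANDIN_SOURCE = '''
def stop_loops(state):
    raise RuntimeError("would stop live tmux sessions")

def start_loops(state):
    raise RuntimeError("would start live tmux sessions")

def handoff_reset(state):
    raise RuntimeError("would rewrite handoff files")

def phase_advance_check(state):
    """Advance to the next phase if one exists; return True when advanced."""
    if state["index"] + 1 >= len(state["phases"]):
        return False  # sequence complete: nothing to do
    stop_loops(state)
    handoff_reset(state)
    state["index"] += 1
    start_loops(state)
    return True
'''
harness = types.ModuleType("harness_standin")
exec(_STANDIN_SOURCE, harness.__dict__)


class PhaseAdvanceTest(unittest.TestCase):
    def setUp(self):
        # Replace the three helpers with recorders so nothing is launched.
        self.calls = []
        for name in ("stop_loops", "handoff_reset", "start_loops"):
            recorder = lambda state, name=name: self.calls.append(name)
            patcher = mock.patch.object(harness, name, recorder)
            patcher.start()
            self.addCleanup(patcher.stop)

    def test_advance_runs_helpers_in_order(self):
        state = {"phases": ["a", "b"], "index": 0}
        self.assertTrue(harness.phase_advance_check(state))
        self.assertEqual(state["index"], 1)
        self.assertEqual(
            self.calls, ["stop_loops", "handoff_reset", "start_loops"]
        )

    def test_sequence_complete_is_idempotent(self):
        state = {"phases": ["a"], "index": 0}
        self.assertFalse(harness.phase_advance_check(state))
        self.assertFalse(harness.phase_advance_check(state))
        self.assertEqual(self.calls, [])
```

Patching works here because the function looks the helpers up in its module's namespace at call time, so swapping the module attribute is enough. Saved to a file, the sketch runs with `python -m unittest`.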
- Gotcha: my BASE_TOML fixture had \d+/· regexes; in a normal triple-quoted string the doubled backslashes collapse to single ones and tomllib rejects the invalid escape. Fixed by making the fixture a raw string (r"""…""") so the on-disk TOML keeps the doubled backslash, like the real agents.example.toml.
Live smokes. smoke_claude.sh / smoke_opencode.sh each spin up a throwaway persistent
"probe" through agents.py up in a sandbox with a unique session_prefix and temp log_dir,
confirm the session attaches (pane command claude/opencode), status shows RUNNING, down
removes it; a cleanup trap (EXIT INT TERM) kills everything. claude uses the cheap
claude-haiku-4-5. opencode generalizes cc-ci test-opencode.sh onto this repo with its own
server on :4097 (a guard refuses 4096).
- Gotcha: the opencode server runs in a subshell ( … serve … ) &, so $SERVER_PID is the subshell, not the listener; killing it left :4097 held (a DoD-4 leftover-port failure I caught on the first standalone run). Fixed cleanup to also pkill -f "opencode serve.*--port ${PORT}" and wait for the port to free. Re-ran: freed.
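The port guard and the "wait for the port to free" step could be sketched as follows. The smoke scripts themselves are shell and are not reproduced here, so this is only an illustration of the logic: the function names, the timeout values, and the connect-based probe are assumptions rather than what smoke_opencode.sh actually does.

```python
import socket
import time

RESERVED_PORT = 4096  # the port the journal says the guard refuses


def check_port_allowed(port):
    """Refuse the reserved port so the smoke never shares a server."""
    if port == RESERVED_PORT:
        raise ValueError(f"refusing to use reserved port {RESERVED_PORT}")


def port_is_free(port, host="127.0.0.1"):
    """True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        # connect_ex returns 0 on success, an errno value otherwise.
        return probe.connect_ex((host, port)) != 0


def wait_for_port_free(port, timeout=10.0, interval=0.2):
    """Poll until the port stops accepting connections or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_is_free(port):
            return True
        time.sleep(interval)
    return port_is_free(port)
```

Polling the port rather than trusting the recorded PID matches the lesson of the gotcha: the thing to verify is that the listener is gone, not that some parent process exited.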
Verification. Cold-cloned to /tmp/aotest-cold and ran inside nix develop (python311) — the
Adversary's exact path: unit=PASS (51) claude=PASS opencode=PASS isolation=PASS, rc=0; afterwards
no aotest-* sessions, :4097 free, cc-ci-orchestrator/watchdog/assistant3 present. Pushed the
deliverable as cdcece9; clean tree; claimed the gate.
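The post-run isolation check (no aotest-* sessions left, pre-existing sessions still present) can be expressed as two small filters over the tmux session list. This is a hedged sketch, not the check the suite ships: the function names are hypothetical, and the sample session names passed in are illustrative.

```python
import subprocess


def leftover_sessions(session_names, prefix="aotest-"):
    """Sessions that the test run should have cleaned up but did not."""
    return sorted(name for name in session_names if name.startswith(prefix))


def missing_sessions(session_names, required):
    """Pre-existing sessions that should have survived the run but are gone."""
    present = set(session_names)
    return [name for name in required if name not in present]


def list_tmux_sessions():
    """Return tmux session names, or [] if tmux is absent or has no server."""
    try:
        proc = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return []
    if proc.returncode != 0:
        return []
    return proc.stdout.splitlines()


def isolation_ok(session_names, required, prefix="aotest-"):
    """True when nothing leaked and nothing pre-existing was removed."""
    return (
        not leftover_sessions(session_names, prefix)
        and not missing_sessions(session_names, required)
    )
```

A caller would pass `list_tmux_sessions()` as `session_names` together with the list of sessions that must remain untouched.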