Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
STATUS — phase aotest (Builder)
Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md
Deliverable repo: recipe-maintainers/agent-orchestrator on git.autonomic.zone
Builder working clone: /home/loops/aoeng/agent-orchestrator (outside the cc-ci tracked tree)
DONE
All 5 Definition-of-Done items are Adversary-verified with a fresh PASS (@2026-06-13T19:00Z) on
deliverable commit cdcece9a9ac64b458103194025f2c22ba830ce15. No findings, no VETO — the Adversary
cold-cloned to /tmp and re-ran the unit suite + both live smokes + isolation check inside
nix develop (Python 3.11.11, tmux 3.5a) and independently confirmed every item. Full
cold-verification evidence is in REVIEW-aotest.md.
The agent-orchestrator harness now ships a committed test suite under tests/: 51 unit tests
(pure logic — config/defaults, kickoff assembly, phase machine, limit/WAITING-UNTIL parsing,
claude+opencode activity detection), isolated live smokes that bring a throwaway project up THROUGH
agents.py on the real claude and opencode backends (unique session prefix, dedicated opencode
port :4097, full cleanup), and tests/run.sh (unit always + smokes when available + isolation
sanity), documented in the README ## Testing section.
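The "full cleanup" step for the opencode smoke includes waiting for the dedicated port to free once the throwaway server is killed. A minimal sketch of that wait in Python follows, assuming a plain TCP connect probe on localhost. The helper names `port_in_use` and `wait_for_port_free` are hypothetical and are not taken from the committed smoke scripts, which are shell.

```python
import socket
import time


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


def wait_for_port_free(port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until nothing listens on the port; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not port_in_use(port):
            return True
        time.sleep(interval)
    # One last probe so a port freed right at the deadline still counts.
    return not port_in_use(port)
```

A cleanup routine could call `wait_for_port_free(4097)` after killing the test server and fail the smoke if it returns False. The timeout and polling interval shown are illustrative defaults, not values from the harness.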
WHERE (verification inputs)
- Repo: https://git.autonomic.zone/recipe-maintainers/agent-orchestrator.git main HEAD → cdcece9a9ac64b458103194025f2c22ba830ce15 (commit cdcece9, on top of 289ef07 v0.1.0)
- New files: tests/test_unit.py, tests/smoke_claude.sh, tests/smoke_opencode.sh, tests/run.sh; README updated (file-map line + a new ## Testing section).
- Backends present on this host: claude → /home/loops/.local/bin/claude (v2.1.177); opencode → /home/loops/.local/bin/opencode; creds at /srv/cc-ci/.testenv.
HOW to cold-verify (fresh /tmp clone, exactly as the plan specifies)
cd /tmp && rm -rf aotest-cold
git clone https://git.autonomic.zone/recipe-maintainers/agent-orchestrator.git aotest-cold
cd aotest-cold && git rev-parse HEAD # → cdcece9a9ac6...
nix develop -c python3 -m unittest discover -s tests # DoD-1: unit tests
nix develop -c ./tests/run.sh # full suite: unit + both smokes + isolation
Individual smokes (each is also invoked by run.sh):
nix develop -c bash tests/smoke_claude.sh # DoD-2
nix develop -c bash tests/smoke_opencode.sh # DoD-3 (own server on :4097, ≠ live :4096)
Post-run isolation check (DoD-4):
tmux ls | grep '^aotest-' # EXPECTED: no output (no leftover sessions)
ss -ltn | grep ':4097 ' # EXPECTED: no output (port freed)
tmux ls | grep -E 'cc-ci-orchestrator|cc-ci-watchdog|cc-ci-assistant3' # EXPECTED: all 3 present
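The three checks above can also be expressed as one function over captured command output, which makes the pass/fail logic testable without a live tmux server. This is a sketch only: `session_names` and `check_isolation` are hypothetical names, and it assumes the usual `tmux ls` line shape (session name before the first colon) and `ss -ltn` listing format. It mirrors the greps, not the actual logic in tests/run.sh.

```python
TEST_PREFIX = "aotest-"
LIVE_SESSIONS = ("cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3")


def session_names(tmux_ls_output: str) -> list[str]:
    """Extract session names from `tmux ls` output (text before the first colon)."""
    return [line.split(":", 1)[0] for line in tmux_ls_output.splitlines() if ":" in line]


def check_isolation(tmux_ls_output: str, ss_output: str, port: int = 4097) -> dict[str, bool]:
    """Mirror the three greps: no aotest-* sessions, test port free, live sessions present."""
    names = session_names(tmux_ls_output)
    return {
        # grep '^aotest-' must print nothing
        "no_leftover_sessions": not any(n.startswith(TEST_PREFIX) for n in names),
        # grep ':4097 ' must print nothing
        "port_free": f":{port} " not in ss_output,
        # grep -E 'a|b|c' must match all three live sessions (substring match, like grep)
        "live_sessions_present": all(any(live in n for n in names) for live in LIVE_SESSIONS),
    }
```

In practice the two inputs would come from running `tmux ls` and `ss -ltn` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`); the check passes when every value in the returned dict is True.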
EXPECTED outcomes (from my cold run @2026-06-13T18:55Z on cdcece9, /tmp clone, nix develop)
- DoD-1 Unit tests: Ran 51 tests … OK, rc=0. Pure logic: no agents spawned, no tmux sessions created. Covers: config load + defaults merge; kickoff-template assembly; phase machine (advance on ## DONE, idempotent sequence-complete, append-a-phase resumes); limit reset-banner parsing; WAITING-UNTIL/stall parsing; claude + opencode activity detectors; the shipped agents.example.toml loads.
- DoD-2 claude smoke: === CLAUDE BACKEND SMOKE: PASS ===, rc=0. Probe brought up THROUGH agents.py (pane command claude), status shows it RUNNING, down removes it. Isolated prefix aotest-c-<pid>-; trivial probe on claude-haiku-4-5.
- DoD-3 opencode smoke: === OPENCODE BACKEND SMOKE: PASS ===, rc=0. Dedicated opencode server on :4097 (not 4096); probe attaches THROUGH agents.py (pane command opencode), status RUNNING, down removes it; cleanup kills the server and waits for the port to free. (SKIPs gracefully with rc=0 if opencode/creds are absent, which is not the case on this host.)
- DoD-4 isolation: runner prints PASS: no leftover aotest-* tmux sessions and lists cc-ci-orchestrator cc-ci-watchdog cc-ci-assistant3 as present; :4097 free afterwards.
- DoD-5 committed + documented: the four tests/ files are committed at cdcece9; README ## Testing section documents nix develop -c ./tests/run.sh and what each layer covers.
- Runner summary line: SUMMARY: unit=PASS claude=PASS opencode=PASS isolation=PASS → ALL RUN TESTS PASSED (skips are OK), rc=0.
Working tree of the deliverable clone is clean and pushed.
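For anyone scripting against the runner, the SUMMARY line quoted above is regular enough to parse mechanically. The sketch below assumes the `key=VALUE` token layout shown in that line, and it assumes that a skipped layer is reported with the literal value `SKIP`; that literal is a guess based on "skips are OK" and is not confirmed by the runner's source. The function names are hypothetical.

```python
def parse_summary(line: str) -> dict[str, str]:
    """Parse 'SUMMARY: unit=PASS claude=PASS ...' into {'unit': 'PASS', ...}."""
    prefix = "SUMMARY:"
    if not line.startswith(prefix):
        raise ValueError(f"not a summary line: {line!r}")
    # Each whitespace-separated token after the prefix is expected to be key=VALUE.
    return dict(token.split("=", 1) for token in line[len(prefix):].split())


def all_run_tests_passed(results: dict[str, str], ok: tuple[str, ...] = ("PASS", "SKIP")) -> bool:
    """True when every layer either passed or was skipped (SKIP literal is an assumption)."""
    return all(value in ok for value in results.values())
```

For the line reported in this run, `parse_summary` yields four entries, all `PASS`, and `all_run_tests_passed` returns True.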
Gate status
| Gate | Status | Verified |
|---|---|---|
| DoD-1 Unit tests PASS (clean /tmp, nix develop) | PASS | 2026-06-13T19:00Z |
| DoD-2 Claude smoke PASSES via harness | PASS | 2026-06-13T19:00Z |
| DoD-3 opencode smoke PASSES (dedicated port) | PASS | 2026-06-13T19:00Z |
| DoD-4 No leftover aotest-* sessions/ports; cc-ci intact | PASS | 2026-06-13T19:00Z |
| DoD-5 Test suite + runner committed + documented | PASS | 2026-06-13T19:00Z |