# REVIEW — phase aotest (Adversary log)

**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md`
**Deliverable repo:** `recipe-maintainers/agent-orchestrator` on git.autonomic.zone

---

## Adversary orientation @2026-06-13T18:44Z

**Mission:** Verify the agent-orchestrator harness runs a real project generically on BOTH claude and opencode backends, fully isolated, with a committed test suite.

**DoD items to verify (from phase plan):**
1. Unit tests PASS — run from clean /tmp checkout inside `nix develop`
2. claude smoke test PASSES via the harness (isolated, cleaned up)
3. opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
4. No leftover `aotest-*` tmux sessions or held ports after the run; live cc-ci sessions (cc-ci-orchestrator/watchdog/assistant3) untouched
5. Test suite + runner committed and documented in README

**Key guardrails for my verification:**
- Must use a non-`cc-ci-` session prefix (aotest-* is correct)
- opencode port must ≠ 4096 (the live cc-ci port)
- Do NOT touch live launch system: `/srv/cc-ci/cc-ci-plan/agents.py`, `agents.toml`, `cc-ci-plan/state/`, or running tmux sessions
- Verify from COLD START: fresh shell, /tmp checkout, no cached state

**Repo state at orientation:** v0.1.0 (commit `289ef07`) — no tests/ dir present yet. Awaiting Builder to push the aotest deliverable.

**Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):**

Key functions the unit tests MUST exercise (from reading agents.py 929 lines):
- `load_config`: session_prefix required → hard die; log_dir required → hard die; defaults merge; project_dir resolution; agents inherit defaults; services inherit defaults
- `build_loop_kickoff`: reads `[loop].kickoff_template`, fills `{phase_id}/{plan}/{status}/{role}`, then appends `/.md`. No project text in code — must test slot substitution.
- `phase_done`: reads `status_basename` from `handoff_repo(cfg)`, looks for `done_marker` line; skips DONE_PLACEHOLDER_RE lines.
  Must test: file absent → False, no marker → False, marker present → True, placeholder line → False.
- `phase_advance_check`: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists; appending a phase clears SEQUENCE-COMPLETE marker and resumes.
- `_parse_reset_epoch`: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute returns None, no match returns None. Takes the LAST match.
- `_parse_waiting_until`: footer_ui branch uses last non-empty line only; non-footer scans whole pane. ISO-8601 with Z suffix. Invalid format returns None.
- `pane_active`: claude backend uses `active_re` match; opencode uses `footer_ui` branch (only last line of 3 matters); limit banner + idle = not active (tested in selftest).

**Live smoke isolation requirements (DoD verification):**
- claude smoke: session prefix must be `aotest-` (NOT `cc-ci-`), isolated log dir under /tmp
- opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
- Post-run: `tmux ls | grep aotest` → zero results; live sessions intact

**Specific break-it checks I will run:**
1. `tmux ls | grep aotest` before AND after — no leakage
2. `ss -ltn | grep 4096` — opencode test must NOT use this port
3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
4. Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
5. Unit test edge cases:
   - load_config with missing session_prefix → expect die()
   - load_config with missing log_dir → expect die()
   - phase_done with ## DONE followed only by placeholder → expect False
   - _parse_reset_epoch("resets Jun 16, 12pm") → 12:00 (NOT 24:00 which is invalid)
   - _parse_reset_epoch("resets Jun 16, 12am") → 00:00 (not 12:00)
   - _parse_waiting_until with footer_ui=True: only last non-empty line checked
6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes

---

## Verdicts

### ALL DoD items: PASS @2026-06-13T19:00Z

Cold verification from clean `/tmp/ao-adv-check` clone (fresh git clone before pulling the Builder's STATUS — verdict formed independently). Commit verified: `cdcece9a9ac64b458103194025f2c22ba830ce15`.

```
rm -rf /tmp/ao-adv-check
git clone https://...@git.autonomic.zone/recipe-maintainers/agent-orchestrator.git /tmp/ao-adv-check
git -C /tmp/ao-adv-check rev-parse HEAD
# → cdcece9a9ac64b458103194025f2c22ba830ce15  ✓ matches claimed commit
```

---

#### DoD-1 — Unit tests PASS (clean /tmp, nix develop): PASS

```
cd /tmp/ao-adv-check && nix develop -c python3 -m unittest discover -s tests -p 'test_*.py' -v
```

```
Ran 51 tests in 0.062s

OK
```

51 tests, rc=0. Coverage confirmed:
- `TestConfigLoad` (12 tests): session_prefix required die, log_dir required die, defaults merge, explicit session override, per-agent override wins, relative/absolute dir resolution, log_dir resolved, state_dir created, service session named, backend_of resolves, backend_of unknown dies, env AGENT_MODEL override single-invocation
- `TestExampleConfig` (1 test): shipped `agents.example.toml` loads with expected shape
- `TestKickoff` (5 tests): slot fill ({phase_id}/{plan}/{status}/{role}), correct role prompt appended, no unrendered slots, agent_prompt dispatches correctly, role_model phase override
- `TestPhaseMachine` (8 tests): phase_done detects marker, rejects placeholder, false when no marker, false when file missing; cur_idx reads state file; advance on DONE; sequence-complete idempotent (no re-stop on 2nd call); append-phase clears SEQUENCE-COMPLETE and resumes; custom done_marker respected
- `TestLimitParsing` (8 tests): PM, AM+minutes, 12am=midnight, invalid hour=None, no match=None, picks last match, unparsable fallback, within-6h window uses banner, >6h falls back
- `TestWaitingUntil` (5 tests): non-footer finds marker anywhere, non-footer None without marker, footer ignores marker not in last line, footer honors marker as last line, bad timestamp=None
- `TestActivityDetection` (8 tests): claude active_re (esc to interrupt, Running tool, spinner), claude idle not active; opencode active footer, idle footer, active-only-at-top ignored, log_grace fallback via mtime

---

#### DoD-2 — claude smoke PASSES via harness: PASS

```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_claude.sh
```

```
=== claude backend smoke (isolated: prefix=aotest-c-681472-) ===
[agents] starting aotest-c-681472-probe (claude, kind=persistent, model=claude-haiku-4-5)
PASS: session aotest-c-681472-probe created via agents.py (pane command: claude)
PASS: claude TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== CLAUDE BACKEND SMOKE: PASS ===
```

Confirmed: isolated prefix `aotest-c--` (not cc-ci-), temp sandbox log_dir, pane command is `claude` (TUI alive), status RUNNING, down cleans up. Cleanup trap on EXIT/INT/TERM.

---

#### DoD-3 — opencode smoke PASSES via harness (dedicated port ≠ 4096): PASS

```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_opencode.sh
```

```
=== opencode backend smoke (isolated: prefix=aotest-o-681566- port=4097) ===
PASS: dedicated opencode server listening on :4097
[agents] starting aotest-o-681566-probe (opencode, kind=persistent, model=default)
PASS: session aotest-o-681566-probe created via agents.py (pane command: opencode)
PASS: opencode TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== OPENCODE BACKEND SMOKE: PASS ===
```

Confirmed: dedicated server on `:4097` (script has hardcoded guard refusing `4096`); isolated prefix `aotest-o--`; TUI attached; cleanup kills server AND does `pkill -f "opencode serve.*--port ${PORT}"` + waits for port to free.
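
The port guard and the wait-for-port-to-free step confirmed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in `tests/smoke_opencode.sh` (a bash script): the helper names `guard_port`, `port_in_use`, and `wait_for_port_free` are hypothetical, and only the live port number 4096 comes from this log.

```python
import socket
import time

# Port of the live cc-ci opencode server, as recorded in this review.
LIVE_PORT = 4096


def guard_port(port: int) -> int:
    """Refuse to run a test server on the live port (hypothetical helper)."""
    if port == LIVE_PORT:
        raise ValueError(f"refusing to use live port {LIVE_PORT}")
    return port


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


def wait_for_port_free(port: int, timeout: float = 5.0,
                       interval: float = 0.1) -> bool:
    """Poll until nothing listens on the port, or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not port_in_use(port):
            return True
        time.sleep(interval)
    # One final check so a port freed right at the deadline still counts.
    return not port_in_use(port)
```

A cleanup routine along these lines would call `guard_port` before starting the server and `wait_for_port_free` after killing it, so that a follow-up `ss -ltn` check does not race against a server that is still shutting down.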

---

#### DoD-4 — No leftover aotest-* sessions or ports; cc-ci sessions intact: PASS

Post-run isolation check (after full suite via run.sh):

```
tmux ls | grep '^aotest-'    # → (no output) ✓
ss -ltn | grep ':4097 '      # → (no output) ✓
tmux ls | grep -E 'cc-ci-orchestrator|cc-ci-watchdog|cc-ci-assistant3'
# → cc-ci-assistant3, cc-ci-orchestrator, cc-ci-watchdog ✓
```

run.sh isolation sanity block output:

```
>>> ISOLATION SANITY
PASS: no leftover aotest-* tmux sessions
info: live cc-ci sessions present: cc-ci-orchestrator cc-ci-watchdog cc-ci-assistant3
```

---

#### DoD-5 — Test suite + runner committed and documented: PASS

Files at commit `cdcece9`:
- `tests/test_unit.py` — 51-test stdlib unittest suite ✓
- `tests/smoke_claude.sh` — isolated live claude smoke ✓
- `tests/smoke_opencode.sh` — isolated live opencode smoke ✓
- `tests/run.sh` — runner: unit always, live smokes when available, isolation sanity ✓

README `## Testing` section (lines ~321–351):
- Documents `nix develop -c ./tests/run.sh` as the canonical invocation ✓
- Explains what each layer covers (unit vs live vs isolation) ✓
- Documents skip conditions (backend bin/creds absent) ✓
- Documents useful env vars (CLAUDE_BIN, AOTEST_MODEL, AOTEST_OC_PORT, AOTEST_OC_CREDS) ✓
- Notes safety by construction (non-cc-ci prefix, non-4096 port, cleanup trap) ✓

---

### Full suite summary (run.sh output)

```
SUMMARY: unit=PASS claude=PASS opencode=PASS isolation=PASS
ALL RUN TESTS PASSED (skips are OK)
```

rc=0. Verified at commit `cdcece9`, clean /tmp clone, nix develop (Python 3.11.11, tmux 3.5a).

---

### No findings. No veto.

Phase aotest is DONE. All 5 DoD items PASS at 2026-06-13T19:00Z on commit `cdcece9`.