REVIEW — phase aotest (Adversary log)
Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md
Deliverable repo: recipe-maintainers/agent-orchestrator on git.autonomic.zone
Adversary orientation @2026-06-13T18:44Z
Mission: Verify the agent-orchestrator harness runs a real project generically on BOTH claude and opencode backends, fully isolated, with a committed test suite.
DoD items to verify (from phase plan):
- Unit tests PASS: run from clean /tmp checkout inside nix develop
- claude smoke test PASSES via the harness (isolated, cleaned up)
- opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
- No leftover aotest-* tmux sessions or held ports after the run; live cc-ci sessions (cc-ci-orchestrator/watchdog/assistant3) untouched
- Test suite + runner committed and documented in README
Key guardrails for my verification:
- Must use a non-cc-ci- session prefix (aotest-* is correct)
- opencode port must ≠ 4096 (the live cc-ci port)
- Do NOT touch live launch system: /srv/cc-ci/cc-ci-plan/agents.py, agents.toml, cc-ci-plan/state/, or running tmux sessions
- Verify from COLD START: fresh shell, /tmp checkout, no cached state
Repo state at orientation: v0.1.0 (commit 289ef07) — no tests/ dir present yet.
Awaiting Builder to push the aotest deliverable.
Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):
Key functions the unit tests MUST exercise (from reading agents.py 929 lines):
- load_config: session_prefix required → hard die; log_dir required → hard die; defaults merge; project_dir resolution; agents inherit defaults; services inherit defaults
- build_loop_kickoff: reads [loop].kickoff_template, fills {phase_id}/{plan}/{status}/{role}, then appends <roles_dir>/<role>.md. No project text in code, so slot substitution must be tested.
- phase_done: reads status_basename from handoff_repo(cfg), looks for done_marker line; skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False, marker present → True, placeholder line → False.
- phase_advance_check: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists; appending a phase clears SEQUENCE-COMPLETE marker and resumes.
- _parse_reset_epoch: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute returns None, no match returns None. Takes the LAST match.
- _parse_waiting_until: footer_ui branch uses last non-empty line only; non-footer scans whole pane. ISO-8601 with Z suffix. Invalid format returns None.
- pane_active: claude backend uses active_re match; opencode uses footer_ui branch (only last line of 3 matters); limit banner + idle = not active (tested in selftest).
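The slot-substitution check for build_loop_kickoff could look roughly like the sketch below. It is illustrative only: fill_kickoff, unfilled_slots, and their arguments are hypothetical stand-ins, not the signature used in agents.py. The four slot names are the ones listed above. The aim is to assert that every slot is filled and that the role text is appended after the template.

```python
# Hypothetical sketch: fill_kickoff and unfilled_slots are illustrative helpers,
# NOT the real build_loop_kickoff from agents.py.
SLOTS = ("phase_id", "plan", "status", "role")


def fill_kickoff(template: str, values: dict[str, str], role_text: str) -> str:
    """Fill the four named slots, then append the role file's text."""
    filled = template
    for slot in SLOTS:
        filled = filled.replace("{" + slot + "}", values[slot])
    return filled + "\n\n" + role_text


def unfilled_slots(text: str) -> list[str]:
    """Return the slot names that still appear as literal {slot} in text."""
    return [slot for slot in SLOTS if "{" + slot + "}" in text]
```

A unit test against the real function would likely pass a template that contains all four slots and assert that unfilled_slots(result) is empty, which catches a slot that was silently skipped.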
Live smoke isolation requirements (DoD verification):
- claude smoke: session prefix must be aotest- (NOT cc-ci-), isolated log dir under /tmp
- opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
- Post-run: tmux ls | grep aotest → zero results; live sessions intact
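A minimal sketch of how the post-run isolation checks might be scripted, assuming the default tmux ls line format (session name, then a colon) and the usual ss -ltn column order (local address in the fourth column). The helper names are hypothetical and not part of the harness. Matching on a trailing :port avoids a false hit that a plain grep 4096 would give for a port such as 14096.

```python
import subprocess


def sessions_with_prefix(tmux_ls_output: str, prefix: str) -> list[str]:
    """Return session names from `tmux ls` output that start with prefix."""
    names = []
    for line in tmux_ls_output.splitlines():
        name = line.split(":", 1)[0].strip()
        if name.startswith(prefix):
            names.append(name)
    return names


def port_is_listening(ss_output: str, port: int) -> bool:
    """True if a LISTEN row in `ss -ltn` output has a local address ending in :port."""
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed columns: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
        if len(fields) >= 4 and fields[0] == "LISTEN" and fields[3].endswith(f":{port}"):
            return True
    return False


def run(cmd: list[str]) -> str:
    """Run a command and return its stdout; empty string if the binary is missing."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    except FileNotFoundError:
        return ""
```

Usage would be along the lines of sessions_with_prefix(run(["tmux", "ls"]), "aotest-") before and after the smoke run, expecting an empty list both times, plus a check that each live cc-ci session name is still reported.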
Specific break-it checks I will run:
- tmux ls | grep aotest before AND after: no leakage
- ss -ltn | grep 4096: opencode test must NOT use this port
- Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
- Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
- Unit test edge cases:
  - load_config with missing session_prefix → expect die()
  - load_config with missing log_dir → expect die()
  - phase_done with ## DONE followed only by placeholder → expect False
  - _parse_reset_epoch("resets Jun 16, 12pm") → 12:00 (NOT 24:00 which is invalid)
  - _parse_reset_epoch("resets Jun 16, 12am") → 00:00 (not 12:00)
  - _parse_waiting_until with footer_ui=True: only last non-empty line checked
- Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes
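For reference while writing the 12am/12pm assertions, here is a sketch of the conversion those edge cases expect. The regex and both helper names are hypothetical. The real pattern lives in agents.py and also handles 24h input, which this sketch leaves out. It covers only the AM/PM arithmetic, the invalid hour/minute rejection, and the last-match rule.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for illustration; NOT the regex used in agents.py.
_TIME_RE = re.compile(r"(\d{1,2})(?::(\d{2}))?\s*(am|pm)", re.IGNORECASE)


def to_24h(hour: int, minute: int, meridiem: str) -> Optional[Tuple[int, int]]:
    """Convert a 12-hour reading to (hour, minute) on a 24h clock, or None if invalid."""
    if not (1 <= hour <= 12) or not (0 <= minute <= 59):
        return None
    hour %= 12                      # 12 wraps to 0 for both am and pm
    if meridiem.lower() == "pm":
        hour += 12                  # 12pm -> 12, 1pm -> 13, 11pm -> 23
    return hour, minute


def last_time_in(text: str) -> Optional[Tuple[int, int]]:
    """Parse the LAST am/pm time in text; None if nothing matches or it is invalid."""
    matches = list(_TIME_RE.finditer(text))
    if not matches:
        return None
    hour, minute, meridiem = matches[-1].groups()
    return to_24h(int(hour), int(minute or 0), meridiem)
```

The modulo-then-add ordering is what keeps 12pm at 12:00 rather than 24:00, and 12am at 00:00 rather than 12:00.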
Verdicts
(none yet — awaiting Builder push of tests/ dir)