cc-ci/machine-docs/REVIEW-aotest.md
autonomic-bot 1c15cbb934
Some checks failed
continuous-integration/drone/push Build is failing
chore(aotest): add code orientation notes to REVIEW — break-it checklist ready
2026-06-13 18:47:18 +00:00


REVIEW — phase aotest (Adversary log)

Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md
Deliverable repo: recipe-maintainers/agent-orchestrator on git.autonomic.zone


Adversary orientation @2026-06-13T18:44Z

Mission: Verify the agent-orchestrator harness runs a real project generically on BOTH claude and opencode backends, fully isolated, with a committed test suite.

DoD items to verify (from phase plan):

  1. Unit tests PASS — run from clean /tmp checkout inside nix develop
  2. claude smoke test PASSES via the harness (isolated, cleaned up)
  3. opencode smoke test PASSES, or SKIPs with a clear, justified reason recorded here
  4. No leftover aotest-* tmux sessions or held ports after the run; live cc-ci sessions (cc-ci-orchestrator/watchdog/assistant3) untouched
  5. Test suite + runner committed and documented in README

Key guardrails for my verification:

  • Must use a non-cc-ci- session prefix (aotest-* is correct)
  • opencode port must NOT be 4096 (the live cc-ci port)
  • Do NOT touch live launch system: /srv/cc-ci/cc-ci-plan/agents.py, agents.toml, cc-ci-plan/state/, or running tmux sessions
  • Verify from COLD START: fresh shell, /tmp checkout, no cached state
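
A pre-flight check for the first two guardrails could look like the sketch below. The function name guardrail_violations and the constant names are my own, hypothetical, and not part of agents.py; only the cc-ci- prefix and port 4096 come from the guardrails above.

```python
# Hypothetical pre-flight check; not part of the harness itself.
LIVE_PREFIX = "cc-ci-"      # prefix used by the live cc-ci sessions
LIVE_OPENCODE_PORT = 4096   # port used by the live cc-ci opencode server


def guardrail_violations(session_prefix: str, opencode_port: int) -> list[str]:
    """Return human-readable violations; an empty list means safe to run."""
    problems = []
    if not session_prefix:
        problems.append("session prefix is empty")
    elif session_prefix.startswith(LIVE_PREFIX):
        problems.append(f"session prefix {session_prefix!r} collides with live {LIVE_PREFIX!r}")
    if opencode_port == LIVE_OPENCODE_PORT:
        problems.append(f"opencode port {opencode_port} is the live cc-ci port")
    return problems


if __name__ == "__main__":
    # The test port 4097 is an arbitrary example value.
    for problem in guardrail_violations("aotest-", 4097):
        print("GUARDRAIL:", problem)
```

Returning a list rather than raising on the first problem lets the log record every violation from a single run.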

Repo state at orientation: v0.1.0 (commit 289ef07) — no tests/ dir present yet. Awaiting Builder to push the aotest deliverable.

Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):

Key functions the unit tests MUST exercise (from reading agents.py, 929 lines):

  • load_config: session_prefix required → hard die; log_dir required → hard die; defaults merge; project_dir resolution; agents inherit defaults; services inherit defaults
  • build_loop_kickoff: reads [loop].kickoff_template, fills {phase_id}/{plan}/{status}/{role}, then appends <roles_dir>/<role>.md. No project text in code — must test slot substitution.
  • phase_done: reads status_basename from handoff_repo(cfg), looks for done_marker line; skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False, marker present → True, placeholder line → False.
  • phase_advance_check: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists; appending a phase clears SEQUENCE-COMPLETE marker and resumes.
  • _parse_reset_epoch: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute returns None, no match returns None. Takes the LAST match.
  • _parse_waiting_until: footer_ui branch uses last non-empty line only; non-footer scans whole pane. ISO-8601 with Z suffix. Invalid format returns None.
  • pane_active: claude backend uses active_re match; opencode uses footer_ui branch (only last line of 3 matters); limit banner + idle = not active (tested in selftest).
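
To make the slot-substitution expectation for build_loop_kickoff concrete, here is a minimal stand-in I can test against. It is a sketch of the described behaviour, not the code in agents.py: the name build_kickoff_sketch, its signature, and the blank-line separator before the role file are all assumptions.

```python
import re
from pathlib import Path

# Only the four slots named in the orientation notes are substituted.
SLOT_RE = re.compile(r"\{(phase_id|plan|status|role)\}")


def build_kickoff_sketch(template: str, roles_dir: Path, role: str,
                         phase_id: str, plan: str, status: str) -> str:
    """Fill the known slots in one pass, then append <roles_dir>/<role>.md."""
    values = {"phase_id": phase_id, "plan": plan, "status": status, "role": role}
    # Single-pass substitution: unknown {braces} are left untouched, and a
    # substituted value that happens to contain "{plan}" is not re-expanded.
    filled = SLOT_RE.sub(lambda m: values[m.group(1)], template)
    role_text = (Path(roles_dir) / f"{role}.md").read_text(encoding="utf-8")
    # ASSUMPTION: a blank line separates the template from the role text.
    return filled + "\n\n" + role_text
```

A unit test against the real function would follow the same shape: write a role file into a temporary roles directory, pass a template containing all four slots, and assert that no slot survives unfilled and that the role text comes last.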

Live smoke isolation requirements (DoD verification):

  • claude smoke: session prefix must be aotest- (NOT cc-ci-), isolated log dir under /tmp
  • opencode smoke: port must NOT be 4096 (the live cc-ci port), own server, own prefix
  • Post-run: tmux ls | grep aotest → zero results; live sessions intact
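
The post-run checks can be scripted roughly as below, so the before/after comparison is repeatable. The helper names are hypothetical. The live session names are the three listed in the DoD. The port probe only connects to 127.0.0.1, so it is a weaker check than ss -ltn and would miss a listener bound to another interface.

```python
import socket
import subprocess

LIVE_SESSIONS = ("cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3")


def list_tmux_sessions() -> list[str]:
    """Return tmux session names, or [] if tmux is absent or has no server."""
    try:
        proc = subprocess.run(
            ["tmux", "list-sessions", "-F", "#{session_name}"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []
    if proc.returncode != 0:  # tmux exits non-zero when no server is running
        return []
    return [line.strip() for line in proc.stdout.splitlines() if line.strip()]


def leaked_sessions(sessions, needle="aotest"):
    """Sessions that still mention the test prefix (mirrors `grep aotest`)."""
    return sorted(s for s in sessions if needle in s)


def missing_live_sessions(sessions, required=LIVE_SESSIONS):
    """Live cc-ci sessions that are no longer present."""
    return sorted(set(required) - set(sessions))


def port_is_listening(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    names = list_tmux_sessions()
    print("leaked:", leaked_sessions(names))
    print("missing live:", missing_live_sessions(names))
```

Keeping the parsing functions separate from the subprocess call means they can be unit-tested on a machine with no tmux server.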

Specific break-it checks I will run:

  1. tmux ls | grep aotest before AND after — no leakage
  2. ss -ltn | grep 4096 — opencode test must NOT use this port
  3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
  4. Try to interrupt the live smoke test mid-run (if it can be isolated); cleanup must still fire
  5. Unit test edge cases:
    • load_config with missing session_prefix → expect die()
    • load_config with missing log_dir → expect die()
    • phase_done with ## DONE followed only by placeholder → expect False
    • _parse_reset_epoch("resets Jun 16, 12pm") → 12:00 (NOT 24:00, which is invalid)
    • _parse_reset_epoch("resets Jun 16, 12am") → 00:00 (NOT 12:00)
    • _parse_waiting_until with footer_ui=True: only last non-empty line checked
  6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes
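
For reference while writing the _parse_reset_epoch cases, this is the AM/PM arithmetic I expect, as a standalone sketch. It is not the parser in agents.py: the regex, the helper names, and the (hour, minute) return value are assumptions. The real function returns an epoch, which also needs a year and a timezone that these notes do not pin down.

```python
import re

# ASSUMPTION: banner text looks like "resets Jun 16, 12pm" or "resets Jun 16, 17:05".
RESET_RE = re.compile(
    r"resets\s+[A-Za-z]{3}\s+\d{1,2},\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)?",
    re.IGNORECASE,
)


def to_24h(hour, minute, meridiem):
    """Convert to a 24h (hour, minute) pair, or None if out of range."""
    if not 0 <= minute <= 59:
        return None
    if meridiem is None:  # 24h format
        return (hour, minute) if 0 <= hour <= 23 else None
    if not 1 <= hour <= 12:  # 12h format only allows 1..12
        return None
    hour %= 12  # 12am -> 0, 12pm -> 0 (then +12 below)
    if meridiem.lower() == "pm":
        hour += 12
    return (hour, minute)


def last_reset_time(text):
    """Return (hour, minute) of the LAST reset banner in text, else None."""
    matches = list(RESET_RE.finditer(text))
    if not matches:
        return None
    hour, minute, meridiem = matches[-1].groups()
    return to_24h(int(hour), int(minute or 0), meridiem)
```

The hour %= 12 step is what separates the two edge cases: 12am becomes 0 and stays there, while 12pm becomes 0 and then gains 12, so neither can produce 24:00.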

Verdicts

(none yet — awaiting Builder push of tests/ dir)