cc-ci/machine-docs/REVIEW-aotest.md
autonomic-bot 3568754e64
review(aotest): ALL DoD PASS @2026-06-13T19:00Z — phase DONE
2026-06-13 19:02:06 +00:00


REVIEW — phase aotest (Adversary log)

Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md
Deliverable repo: recipe-maintainers/agent-orchestrator on git.autonomic.zone


Adversary orientation @2026-06-13T18:44Z

Mission: Verify the agent-orchestrator harness runs a real project generically on BOTH claude and opencode backends, fully isolated, with a committed test suite.

DoD items to verify (from phase plan):

  1. Unit tests PASS — run from clean /tmp checkout inside nix develop
  2. claude smoke test PASSES via the harness (isolated, cleaned up)
  3. opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
  4. No leftover aotest-* tmux sessions or held ports after the run; live cc-ci sessions (cc-ci-orchestrator/watchdog/assistant3) untouched
  5. Test suite + runner committed and documented in README

Key guardrails for my verification:

  • Must use a non-cc-ci- session prefix (aotest-* is correct)
  • opencode port must ≠ 4096 (the live cc-ci port)
  • Do NOT touch live launch system: /srv/cc-ci/cc-ci-plan/agents.py, agents.toml, cc-ci-plan/state/, or running tmux sessions
  • Verify from COLD START: fresh shell, /tmp checkout, no cached state

Repo state at orientation: v0.1.0 (commit 289ef07) — no tests/ dir present yet. Awaiting Builder to push the aotest deliverable.

Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):

Key functions the unit tests MUST exercise (from reading agents.py 929 lines):

  • load_config: session_prefix required → hard die; log_dir required → hard die; defaults merge; project_dir resolution; agents inherit defaults; services inherit defaults
  • build_loop_kickoff: reads [loop].kickoff_template, fills {phase_id}/{plan}/{status}/{role}, then appends <roles_dir>/<role>.md. No project text in code — must test slot substitution.
  • phase_done: reads status_basename from handoff_repo(cfg), looks for done_marker line; skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False, marker present → True, placeholder line → False.
  • phase_advance_check: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists; appending a phase clears SEQUENCE-COMPLETE marker and resumes.
  • _parse_reset_epoch: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute returns None, no match returns None. Takes the LAST match.
  • _parse_waiting_until: footer_ui branch uses last non-empty line only; non-footer scans whole pane. ISO-8601 with Z suffix. Invalid format returns None.
  • pane_active: claude backend uses active_re match; opencode uses footer_ui branch (only last line of 3 matters); limit banner + idle = not active (tested in selftest).
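The AM/PM rule listed for _parse_reset_epoch is the easiest one to get wrong, so here is a minimal sketch of it. This is not the code in agents.py: the function name parse_reset_time, the regex, and the choice to return a time of day instead of an epoch are all assumptions made for illustration. It only shows the behaviours named above: 12pm maps to 12:00, 12am maps to 00:00, 24h input is accepted, an invalid hour or minute yields None, and the LAST match wins.

```python
import re
from datetime import time
from typing import Optional

# Hypothetical banner shape, e.g. "resets Jun 16, 12pm", "resets Jun 16, 9:30am",
# or "resets Jun 16, 21:15". The real pattern in agents.py may differ.
_RESET_RE = re.compile(
    r"resets\s+\w+\s+\d{1,2},\s*(\d{1,2})(?::(\d{2}))?\s*(am|pm)?",
    re.IGNORECASE,
)


def parse_reset_time(text: str) -> Optional[time]:
    """Return the time of day from the last reset banner in text, or None."""
    matches = list(_RESET_RE.finditer(text))
    if not matches:
        return None  # no banner at all
    hour_s, minute_s, meridiem = matches[-1].groups()  # last match wins
    hour = int(hour_s)
    minute = int(minute_s) if minute_s else 0
    if meridiem:
        if not 1 <= hour <= 12:
            return None  # e.g. "13pm" is not a valid 12-hour time
        hour %= 12  # 12am -> 0 and 12pm -> 0 before the PM offset
        if meridiem.lower() == "pm":
            hour += 12  # 12pm -> 12, 3pm -> 15
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        return None  # invalid 24h hour or invalid minute
    return time(hour, minute)
```

Reducing the hour modulo 12 before adding the PM offset is what keeps 12pm from becoming 24:00.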

Live smoke isolation requirements (DoD verification):

  • claude smoke: session prefix must be aotest- (NOT cc-ci-), isolated log dir under /tmp
  • opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
  • Post-run: tmux ls | grep aotest → zero results; live sessions intact

Specific break-it checks I will run:

  1. tmux ls | grep aotest before AND after — no leakage
  2. ss -ltn | grep 4096 — opencode test must NOT use this port
  3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
  4. Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
  5. Unit test edge cases:
    • load_config with missing session_prefix → expect die()
    • load_config with missing log_dir → expect die()
    • phase_done with ## DONE followed only by placeholder → expect False
    • _parse_reset_epoch("resets Jun 16, 12pm") → 12:00 (NOT 24:00 which is invalid)
    • _parse_reset_epoch("resets Jun 16, 12am") → 00:00 (not 12:00)
    • _parse_waiting_until with footer_ui=True: only last non-empty line checked
  6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes
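The footer_ui edge case in check 5 can be sketched as follows. This is an illustration under stated assumptions, not the agents.py implementation: the marker text WAITING-UNTIL and the exact timestamp layout are hypothetical, and only the two rules from the orientation notes are modelled (footer mode inspects the last non-empty line only; an unparsable timestamp yields None).

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical marker; the real marker string is not given in this review.
_WAIT_RE = re.compile(r"WAITING-UNTIL\s+(\S+)")


def parse_waiting_until(pane_text: str, footer_ui: bool) -> Optional[datetime]:
    """Return the UTC timestamp after the marker, or None if absent or invalid."""
    if footer_ui:
        # Footer-style UIs: only the last non-empty line counts.
        non_empty = [ln for ln in pane_text.splitlines() if ln.strip()]
        haystack = non_empty[-1] if non_empty else ""
    else:
        # Non-footer UIs: scan the whole pane.
        haystack = pane_text
    match = _WAIT_RE.search(haystack)
    if match is None:
        return None
    try:
        # ISO-8601 with a literal Z suffix, e.g. 2026-06-13T19:00:00Z.
        parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)
```

With a marker on the first line and an unrelated footer below it, the footer branch returns None while the non-footer branch returns the timestamp, which is the asymmetry the unit tests need to pin down.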

Verdicts

ALL DoD items: PASS @2026-06-13T19:00Z

Cold verification from clean /tmp/ao-adv-check clone (fresh git clone before pulling the Builder's STATUS — verdict formed independently). Commit verified: cdcece9a9ac64b458103194025f2c22ba830ce15.

rm -rf /tmp/ao-adv-check
git clone https://...@git.autonomic.zone/recipe-maintainers/agent-orchestrator.git /tmp/ao-adv-check
git -C /tmp/ao-adv-check rev-parse HEAD
# → cdcece9a9ac64b458103194025f2c22ba830ce15  ✓ matches claimed commit

DoD-1 — Unit tests PASS (clean /tmp, nix develop): PASS

cd /tmp/ao-adv-check && nix develop -c python3 -m unittest discover -s tests -p 'test_*.py' -v
Ran 51 tests in 0.062s
OK

51 tests, rc=0. Coverage confirmed:

  • TestConfigLoad (12 tests): session_prefix required die, log_dir required die, defaults merge, explicit session override, per-agent override wins, relative/absolute dir resolution, log_dir resolved, state_dir created, service session named, backend_of resolves, backend_of unknown dies, env AGENT_MODEL override single-invocation
  • TestExampleConfig (1 test): shipped agents.example.toml loads with expected shape
  • TestKickoff (5 tests): slot fill ({phase_id}/{plan}/{status}/{role}), correct role prompt appended, no unrendered slots, agent_prompt dispatches correctly, role_model phase override
  • TestPhaseMachine (8 tests): phase_done detects marker, rejects placeholder, false when no marker, false when file missing; cur_idx reads state file; advance on DONE; sequence-complete idempotent (no re-stop on 2nd call); append-phase clears SEQUENCE-COMPLETE and resumes; custom done_marker respected
  • TestLimitParsing (8 tests): PM, AM+minutes, 12am=midnight, invalid hour=None, no match=None, picks last match, unparsable fallback, within-6h window uses banner, >6h falls back
  • TestWaitingUntil (5 tests): non-footer finds marker anywhere, non-footer None without marker, footer ignores marker not in last line, footer honors marker as last line, bad timestamp=None
  • TestActivityDetection (8 tests): claude active_re (esc to interrupt, Running tool, spinner), claude idle not active; opencode active footer, idle footer, active-only-at-top ignored, log_grace fallback via mtime

DoD-2 — claude smoke PASSES via harness: PASS

cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_claude.sh
=== claude backend smoke (isolated: prefix=aotest-c-681472-) ===
[agents] starting aotest-c-681472-probe (claude, kind=persistent, model=claude-haiku-4-5)
  PASS: session aotest-c-681472-probe created via agents.py (pane command: claude)
  PASS: claude TUI attached + alive (driven entirely by agents.py)
  PASS: agents.py status reports probe RUNNING
  PASS: agents.py down cleanly removed the session
=== CLAUDE BACKEND SMOKE: PASS ===

Confirmed: isolated prefix aotest-c-<pid>- (not cc-ci-), temp sandbox log_dir, pane command is claude (TUI alive), status RUNNING, down cleans up. Cleanup trap on EXIT/INT/TERM.


DoD-3 — opencode smoke PASSES via harness (dedicated port ≠ 4096): PASS

cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_opencode.sh
=== opencode backend smoke (isolated: prefix=aotest-o-681566- port=4097) ===
  PASS: dedicated opencode server listening on :4097
[agents] starting aotest-o-681566-probe (opencode, kind=persistent, model=default)
  PASS: session aotest-o-681566-probe created via agents.py (pane command: opencode)
  PASS: opencode TUI attached + alive (driven entirely by agents.py)
  PASS: agents.py status reports probe RUNNING
  PASS: agents.py down cleanly removed the session
=== OPENCODE BACKEND SMOKE: PASS ===

Confirmed: dedicated server on :4097 (script has a hardcoded guard refusing 4096); isolated prefix aotest-o-<pid>-; TUI attached; cleanup kills the server, also runs pkill -f "opencode serve.*--port ${PORT}", and waits for the port to free.
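The port guard and the "wait for port to free" step can be expressed as two small checks. The sketch below is written in Python for illustration only; the smoke script itself is bash, and the function names check_test_port and port_is_free are hypothetical. The only fact taken from this review is that 4096 is the live cc-ci port and must be refused.

```python
import socket

LIVE_PORT = 4096  # live cc-ci opencode port, per the guardrails in this review


def check_test_port(port: int, live_port: int = LIVE_PORT) -> int:
    """Return port unchanged if it is safe for a test server, else raise."""
    if port == live_port:
        raise ValueError(f"refusing port {port}: reserved for the live server")
    if not 1024 <= port <= 65535:
        raise ValueError(f"port {port} is outside the unprivileged range")
    return port


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """True if nothing accepts TCP connections on host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success, i.e. something is listening.
        return sock.connect_ex((host, port)) != 0
```

A cleanup routine could poll port_is_free(4097) in a short loop after killing the server, which mirrors the `ss -ltn` check used in DoD-4.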


DoD-4 — No leftover aotest-* sessions or ports; cc-ci sessions intact: PASS

Post-run isolation check (after full suite via run.sh):

tmux ls | grep '^aotest-'
# → (no output)  ✓

ss -ltn | grep ':4097 '
# → (no output)  ✓

tmux ls | grep -E 'cc-ci-orchestrator|cc-ci-watchdog|cc-ci-assistant3'
# → cc-ci-assistant3, cc-ci-orchestrator, cc-ci-watchdog  ✓
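The same two questions (did any aotest-* session leak, and are the live sessions still there) can be answered from a single capture of `tmux ls` output. The following is a minimal sketch, assuming the usual `name: N windows (...)` line shape; the function names are hypothetical, and the default session names are the three listed in this review.

```python
from typing import Iterable, List, Tuple

LIVE_SESSIONS = ("cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3")


def session_names(tmux_ls_output: str) -> List[str]:
    """Extract session names from `tmux ls` text (name is before the first colon)."""
    names = []
    for line in tmux_ls_output.splitlines():
        if ":" in line:
            names.append(line.split(":", 1)[0])
    return names


def isolation_report(
    tmux_ls_output: str,
    test_prefix: str = "aotest-",
    required_live: Iterable[str] = LIVE_SESSIONS,
) -> Tuple[List[str], List[str]]:
    """Return (leaked test sessions, missing live sessions)."""
    names = session_names(tmux_ls_output)
    leaked = [n for n in names if n.startswith(test_prefix)]
    missing = [n for n in required_live if n not in names]
    return leaked, missing
```

Both lists being empty corresponds to the PASS condition for DoD-4; the input text would come from something like `subprocess.run(["tmux", "ls"], capture_output=True, text=True).stdout`.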

run.sh isolation sanity block output:

>>> ISOLATION SANITY
  PASS: no leftover aotest-* tmux sessions
  info: live cc-ci sessions present: cc-ci-orchestrator cc-ci-watchdog cc-ci-assistant3

DoD-5 — Test suite + runner committed and documented: PASS

Files at commit cdcece9:

  • tests/test_unit.py — 51-test stdlib unittest suite ✓
  • tests/smoke_claude.sh — isolated live claude smoke ✓
  • tests/smoke_opencode.sh — isolated live opencode smoke ✓
  • tests/run.sh — runner: unit always, live smokes when available, isolation sanity ✓

README ## Testing section (lines ~321-351):

  • Documents nix develop -c ./tests/run.sh as the canonical invocation ✓
  • Explains what each layer covers (unit vs live vs isolation) ✓
  • Documents skip conditions (backend bin/creds absent) ✓
  • Documents useful env vars (CLAUDE_BIN, AOTEST_MODEL, AOTEST_OC_PORT, AOTEST_OC_CREDS) ✓
  • Notes safety by construction (non-cc-ci prefix, non-4096 port, cleanup trap) ✓

Full suite summary (run.sh output)

SUMMARY:  unit=PASS  claude=PASS  opencode=PASS  isolation=PASS
ALL RUN TESTS PASSED (skips are OK)

rc=0. Verified at commit cdcece9, clean /tmp clone, nix develop (Python 3.11.11, tmux 3.5a).


No findings. No veto. Phase aotest is DONE.

All 5 DoD items PASS at 2026-06-13T19:00Z on commit cdcece9.