# REVIEW — phase aotest (Adversary log)

**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md`
**Deliverable repo:** `recipe-maintainers/agent-orchestrator` on git.autonomic.zone

---

## Adversary orientation @2026-06-13T18:44Z

**Mission:** Verify the agent-orchestrator harness runs a real project generically on BOTH
claude and opencode backends, fully isolated, with a committed test suite.

**DoD items to verify (from phase plan):**
1. Unit tests PASS — run from clean /tmp checkout inside `nix develop`
2. claude smoke test PASSES via the harness (isolated, cleaned up)
3. opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
4. No leftover `aotest-*` tmux sessions or held ports after the run; live cc-ci sessions
   (cc-ci-orchestrator/watchdog/assistant3) untouched
5. Test suite + runner committed and documented in README

**Key guardrails for my verification:**
- Must use a non-`cc-ci-` session prefix (`aotest-*` is correct)
- opencode port must not be 4096 (the live cc-ci port)
- Do NOT touch the live launch system: `/srv/cc-ci/cc-ci-plan/agents.py`, `agents.toml`,
  `cc-ci-plan/state/`, or running tmux sessions
- Verify from COLD START: fresh shell, /tmp checkout, no cached state

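The first two guardrails can be turned into a preflight check that runs before any live smoke. The sketch below is illustrative only: `check_guardrails`, `LIVE_PREFIX`, and `LIVE_OPENCODE_PORT` are hypothetical names that do not come from the harness; only the `cc-ci-` prefix and port 4096 are taken from this log.

```python
from typing import List

LIVE_PREFIX = "cc-ci-"       # prefix of the live sessions (from this log)
LIVE_OPENCODE_PORT = 4096    # live cc-ci opencode port (from this log)


def check_guardrails(session_prefix: str, opencode_port: int) -> List[str]:
    """Return a list of guardrail violations; an empty list means OK.

    Hypothetical helper for illustration, not part of agents.py.
    """
    problems = []
    if session_prefix.startswith(LIVE_PREFIX):
        problems.append(
            f"session prefix {session_prefix!r} collides with live prefix {LIVE_PREFIX!r}"
        )
    if opencode_port == LIVE_OPENCODE_PORT:
        problems.append(f"opencode port {opencode_port} is the live cc-ci port")
    return problems


if __name__ == "__main__":
    # A test config in the spirit of this phase: own prefix, own port.
    print(check_guardrails("aotest-", 4196))
    # A config that would touch the live system on both counts.
    print(check_guardrails("cc-ci-test-", 4096))
```

Returning a list rather than raising on the first problem lets a verdict entry record every violation from a single run.
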
**Repo state at orientation:** v0.1.0 (commit `289ef07`) — no tests/ dir present yet.
Awaiting Builder to push the aotest deliverable.

**Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):**

Key functions the unit tests MUST exercise (from reading agents.py, 929 lines):
- `load_config`: session_prefix required → hard die; log_dir required → hard die; defaults merge;
  project_dir resolution; agents inherit defaults; services inherit defaults
- `build_loop_kickoff`: reads `[loop].kickoff_template`, fills `{phase_id}/{plan}/{status}/{role}`,
  then appends `<roles_dir>/<role>.md`. No project text in code — must test slot substitution.
- `phase_done`: reads `status_basename` from `handoff_repo(cfg)`, looks for `done_marker` line;
  skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False, marker present
  → True, placeholder line → False.
- `phase_advance_check`: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists;
  appending a phase clears SEQUENCE-COMPLETE marker and resumes.
- `_parse_reset_epoch`: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute
  returns None, no match returns None. Takes the LAST match.
- `_parse_waiting_until`: footer_ui branch uses last non-empty line only; non-footer scans whole
  pane. ISO-8601 with Z suffix. Invalid format returns None.
- `pane_active`: claude backend uses `active_re` match; opencode uses `footer_ui` branch (only
  last line of 3 matters); limit banner + idle = not active (tested in selftest).

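The footer-versus-whole-pane distinction noted for `_parse_waiting_until` is easy to get wrong in a test, so here is a minimal stand-in showing the behaviour the unit tests should pin down. It is not the code in `agents.py`: `find_waiting_until`, `last_non_empty_line`, and the `ISO_Z_RE` pattern are assumptions made for illustration, and the real function may match a different surrounding phrase.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed timestamp shape: ISO-8601 in UTC with a Z suffix, seconds optional.
ISO_Z_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(?::\d{2})?Z")


def last_non_empty_line(pane: str) -> str:
    """Return the last line that contains non-whitespace, or "" if none."""
    for line in reversed(pane.splitlines()):
        if line.strip():
            return line
    return ""


def find_waiting_until(pane: str, footer_ui: bool) -> Optional[datetime]:
    """Hypothetical stand-in: footer_ui looks at the last non-empty line only."""
    text = last_non_empty_line(pane) if footer_ui else pane
    match = ISO_Z_RE.search(text)
    if match is None:
        return None
    stamp = match.group(0)
    fmt = "%Y-%m-%dT%H:%M:%SZ" if stamp.count(":") == 2 else "%Y-%m-%dT%H:%MZ"
    try:
        return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    except ValueError:
        # Shape matched but the values are impossible (e.g. month 13).
        return None


if __name__ == "__main__":
    pane = "waiting until 2026-06-13T19:00Z\nlater output\n\n"
    print(find_waiting_until(pane, footer_ui=True))   # timestamp is not on the footer line
    print(find_waiting_until(pane, footer_ui=False))  # whole pane is scanned
```

The same pane text yields different results in the two branches, which is exactly the case a unit test should cover for both backends.
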
**Live smoke isolation requirements (DoD verification):**
- claude smoke: session prefix must be `aotest-` (NOT `cc-ci-`), isolated log dir under /tmp
- opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
- Post-run: `tmux ls | grep aotest` → zero results; live sessions intact

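The post-run check can also be scripted against captured `tmux ls` output so that the result lands in this log verbatim. The sketch below assumes the default `tmux ls` line shape (`name: N windows (...)`); the helper names and the `LIVE_SESSIONS` list are hypothetical, with the session names taken from the DoD items above. Unlike `grep aotest`, it matches on the session-name prefix only.

```python
from typing import List

# Live sessions that must survive the run (names from this log).
LIVE_SESSIONS = ["cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3"]


def session_names(tmux_ls_output: str) -> List[str]:
    """Extract session names from default `tmux ls` output lines."""
    names = []
    for line in tmux_ls_output.splitlines():
        name, sep, _rest = line.partition(":")
        if sep and name:
            names.append(name)
    return names


def leaked_sessions(tmux_ls_output: str, prefix: str = "aotest-") -> List[str]:
    """Sessions left behind by the test run; should be empty after cleanup."""
    return [n for n in session_names(tmux_ls_output) if n.startswith(prefix)]


def missing_live_sessions(tmux_ls_output: str) -> List[str]:
    """Live sessions that are no longer listed; should be empty."""
    present = set(session_names(tmux_ls_output))
    return [n for n in LIVE_SESSIONS if n not in present]


if __name__ == "__main__":
    sample = (
        "aotest-claude: 1 windows (created Sat Jun 13 18:50:00 2026)\n"
        "cc-ci-orchestrator: 1 windows (created Sat Jun 13 09:00:00 2026)\n"
        "cc-ci-watchdog: 1 windows (created Sat Jun 13 09:00:00 2026)\n"
    )
    print("leaked:", leaked_sessions(sample))
    print("missing live:", missing_live_sessions(sample))
```

Feeding the helpers a captured string (for example from `subprocess.run(["tmux", "ls"], capture_output=True, text=True).stdout`) keeps the parsing testable without a running tmux server.
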
**Specific break-it checks I will run:**
1. `tmux ls | grep aotest` before AND after — no leakage
2. `ss -ltn | grep 4096` — opencode test must NOT use this port
3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
4. Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
5. Unit test edge cases:
   - `load_config` with missing session_prefix → expect `die()`
   - `load_config` with missing log_dir → expect `die()`
   - `phase_done` with `## DONE` followed only by placeholder → expect False
   - `_parse_reset_epoch("resets Jun 16, 12pm")` → 12:00 (NOT 24:00 which is invalid)
   - `_parse_reset_epoch("resets Jun 16, 12am")` → 00:00 (not 12:00)
   - `_parse_waiting_until` with footer_ui=True: only last non-empty line checked
6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes

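The 12am/12pm cases in item 5 come down to one conversion rule, sketched here so the expected values are unambiguous when the Builder's tests arrive. `to_24h` is a hypothetical helper written for this log; it is not the implementation inside `_parse_reset_epoch`, which also handles the date and the regex match.

```python
from typing import Optional, Tuple


def to_24h(hour: int, minute: int = 0, ampm: Optional[str] = None) -> Optional[Tuple[int, int]]:
    """Convert a clock reading to (hour, minute) on a 24h clock, or None if invalid.

    Hypothetical helper for illustration only.
    """
    if not 0 <= minute <= 59:
        return None
    if ampm is None:
        # Plain 24h input: 0..23 is valid, 24 is not.
        return (hour, minute) if 0 <= hour <= 23 else None
    marker = ampm.lower()
    if marker not in ("am", "pm") or not 1 <= hour <= 12:
        return None
    hour %= 12            # 12am -> 0 and 12pm -> 0 before the pm offset
    if marker == "pm":
        hour += 12        # 12pm -> 12, 1pm -> 13, ..., 11pm -> 23
    return (hour, minute)


if __name__ == "__main__":
    print(to_24h(12, 0, "pm"))  # noon
    print(to_24h(12, 0, "am"))  # midnight
    print(to_24h(13, 0, "pm"))  # invalid hour for a 12h clock
```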
---

## Verdicts

*(none yet — awaiting Builder push of tests/ dir)*