# REVIEW — phase aotest (Adversary log)

**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md`
**Deliverable repo:** `recipe-maintainers/agent-orchestrator` on git.autonomic.zone

---

## Adversary orientation @2026-06-13T18:44Z

**Mission:** Verify the agent-orchestrator harness runs a real project generically on BOTH
claude and opencode backends, fully isolated, with a committed test suite.

**DoD items to verify (from phase plan):**
1. Unit tests PASS — run from clean /tmp checkout inside `nix develop`
2. claude smoke test PASSES via the harness (isolated, cleaned up)
3. opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
4. No leftover `aotest-*` tmux sessions or held ports after the run; live cc-ci sessions
   (cc-ci-orchestrator/watchdog/assistant3) untouched
5. Test suite + runner committed and documented in README

**Key guardrails for my verification:**
- Must use a non-`cc-ci-` session prefix (aotest-* is correct)
- opencode port must ≠ 4096 (the live cc-ci port)
- Do NOT touch live launch system: `/srv/cc-ci/cc-ci-plan/agents.py`, `agents.toml`,
  `cc-ci-plan/state/`, or running tmux sessions
- Verify from COLD START: fresh shell, /tmp checkout, no cached state

**Repo state at orientation:** v0.1.0 (commit `289ef07`) — no tests/ dir present yet.
Awaiting Builder to push the aotest deliverable.

**Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):**

Key functions the unit tests MUST exercise (from reading `agents.py`, 929 lines):
- `load_config`: session_prefix required → hard die; log_dir required → hard die; defaults merge;
  project_dir resolution; agents inherit defaults; services inherit defaults
- `build_loop_kickoff`: reads `[loop].kickoff_template`, fills `{phase_id}/{plan}/{status}/{role}`,
  then appends `<roles_dir>/<role>.md`. No project text in code — must test slot substitution.
- `phase_done`: reads `status_basename` from `handoff_repo(cfg)`, looks for `done_marker` line;
  skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False,
  marker present → True, placeholder line → False.
- `phase_advance_check`: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists;
  appending a phase clears SEQUENCE-COMPLETE marker and resumes.
- `_parse_reset_epoch`: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute
  returns None, no match returns None. Takes the LAST match.
- `_parse_waiting_until`: footer_ui branch uses last non-empty line only; non-footer scans whole
  pane. ISO-8601 with Z suffix. Invalid format returns None.
- `pane_active`: claude backend uses `active_re` match; opencode uses `footer_ui` branch (only
  last line of 3 matters); limit banner + idle = not active (tested in selftest).
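
The slot-substitution requirement on `build_loop_kickoff` can be illustrated with a small stand-in. This is a minimal sketch, not the code in `agents.py`: the helper names `render_kickoff` and `unrendered_slots` are hypothetical, the role prompt is passed in as a string rather than read from `<roles_dir>/<role>.md`, and only the four slot names come from the notes above.

```python
import re

# Slot names as listed in the orientation notes; everything else here is illustrative.
SLOT_RE = re.compile(r"\{(phase_id|plan|status|role)\}")


def render_kickoff(template: str, values: dict, role_prompt: str) -> str:
    """Fill every known slot in a single pass, then append the role prompt text."""
    filled = SLOT_RE.sub(lambda m: values[m.group(1)], template)
    return filled.rstrip("\n") + "\n\n" + role_prompt


def unrendered_slots(text: str) -> list:
    """Return any known slots still present, e.g. ['{plan}']; empty means fully rendered."""
    return [m.group(0) for m in SLOT_RE.finditer(text)]


if __name__ == "__main__":
    kickoff = render_kickoff(
        "Phase {phase_id}: plan={plan} status={status} role={role}",
        {"phase_id": "aotest", "plan": "plan.md", "status": "STATUS.md", "role": "adversary"},
        "ROLE PROMPT",
    )
    print(kickoff)
    print(unrendered_slots(kickoff))
```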

**Live smoke isolation requirements (DoD verification):**
- claude smoke: session prefix must be `aotest-` (NOT `cc-ci-`), isolated log dir under /tmp
- opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
- Post-run: `tmux ls | grep aotest` → zero results; live sessions intact

**Specific break-it checks I will run:**
1. `tmux ls | grep aotest` before AND after — no leakage
2. `ss -ltn | grep 4096` — opencode test must NOT use this port
3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
4. Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
5. Unit test edge cases:
   - load_config with missing session_prefix → expect die()
   - load_config with missing log_dir → expect die()
   - phase_done with ## DONE followed only by placeholder → expect False
   - _parse_reset_epoch("resets Jun 16, 12pm") → 12:00 (NOT 24:00 which is invalid)
   - _parse_reset_epoch("resets Jun 16, 12am") → 00:00 (not 12:00)
   - _parse_waiting_until with footer_ui=True: only last non-empty line checked
6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes
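
The 12am/12pm edge cases in item 5 reduce to one clock-conversion rule. The sketch below shows only that rule, under stated assumptions: `to_24h` is a hypothetical helper, not `_parse_reset_epoch` itself, and it does no banner matching or date handling.

```python
from typing import Optional, Tuple


def to_24h(hour: int, minute: int = 0, meridiem: Optional[str] = None) -> Optional[Tuple[int, int]]:
    """Return (hour, minute) on a 24-hour clock, or None when the input is invalid."""
    if not 0 <= minute <= 59:
        return None
    if meridiem is None:
        # No AM/PM suffix: treat the hour as already being in 24-hour format.
        return (hour, minute) if 0 <= hour <= 23 else None
    suffix = meridiem.lower()
    if suffix not in ("am", "pm") or not 1 <= hour <= 12:
        return None
    hour %= 12  # 12am and 12pm both become 0 here
    if suffix == "pm":
        hour += 12  # 12pm ends at 12, 1pm at 13; never 24
    return (hour, minute)


if __name__ == "__main__":
    print(to_24h(12, 0, "pm"))
    print(to_24h(12, 0, "am"))
```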

---

## Verdicts

### ALL DoD items: PASS @2026-06-13T19:00Z

Cold verification from clean `/tmp/ao-adv-check` clone (fresh git clone before pulling the
Builder's STATUS — verdict formed independently). Commit verified: `cdcece9a9ac64b458103194025f2c22ba830ce15`.

```
rm -rf /tmp/ao-adv-check
git clone https://...@git.autonomic.zone/recipe-maintainers/agent-orchestrator.git /tmp/ao-adv-check
git -C /tmp/ao-adv-check rev-parse HEAD
# → cdcece9a9ac64b458103194025f2c22ba830ce15 ✓ matches claimed commit
```

---

#### DoD-1 — Unit tests PASS (clean /tmp, nix develop): PASS

```
cd /tmp/ao-adv-check && nix develop -c python3 -m unittest discover -s tests -p 'test_*.py' -v
```

```
Ran 51 tests in 0.062s
OK
```

51 tests, rc=0. Coverage confirmed:
- `TestConfigLoad` (12 tests): session_prefix required die, log_dir required die, defaults merge,
  explicit session override, per-agent override wins, relative/absolute dir resolution, log_dir
  resolved, state_dir created, service session named, backend_of resolves, backend_of unknown dies,
  env AGENT_MODEL override single-invocation
- `TestExampleConfig` (1 test): shipped `agents.example.toml` loads with expected shape
- `TestKickoff` (5 tests): slot fill ({phase_id}/{plan}/{status}/{role}), correct role prompt
  appended, no unrendered slots, agent_prompt dispatches correctly, role_model phase override
- `TestPhaseMachine` (8 tests): phase_done detects marker, rejects placeholder, false when no
  marker, false when file missing; cur_idx reads state file; advance on DONE; sequence-complete
  idempotent (no re-stop on 2nd call); append-phase clears SEQUENCE-COMPLETE and resumes;
  custom done_marker respected
- `TestLimitParsing` (8 tests): PM, AM+minutes, 12am=midnight, invalid hour=None, no match=None,
  picks last match, unparsable fallback, within-6h window uses banner, >6h falls back
- `TestWaitingUntil` (5 tests): non-footer finds marker anywhere, non-footer None without marker,
  footer ignores marker not in last line, footer honors marker as last line, bad timestamp=None
- `TestActivityDetection` (8 tests): claude active_re (esc to interrupt, Running tool, spinner),
  claude idle not active; opencode active footer, idle footer, active-only-at-top ignored,
  log_grace fallback via mtime
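
The footer versus non-footer behaviour covered by `TestWaitingUntil` can be sketched as follows. This is an illustration under assumptions, not the harness code: the `WAITING-UNTIL:` marker text and the function name `parse_waiting_until` are hypothetical, and only the scanning scope (last non-empty line versus whole pane) and the ISO-8601 `Z` timestamp come from the notes in this log.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical marker text; the real pattern in agents.py may differ.
MARKER_RE = re.compile(r"WAITING-UNTIL:\s*(\S+)")


def parse_waiting_until(pane: str, footer_ui: bool) -> Optional[datetime]:
    """Find a waiting-until timestamp in captured pane text, or return None."""
    lines = [ln for ln in pane.splitlines() if ln.strip()]
    if not lines:
        return None
    # Footer UIs redraw the screen, so only the last non-empty line is trusted.
    scope = lines[-1] if footer_ui else "\n".join(lines)
    match = MARKER_RE.search(scope)
    if match is None:
        return None
    try:
        parsed = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # marker present but timestamp malformed
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    pane = "WAITING-UNTIL: 2026-06-13T20:00:00Z\nidle prompt\n"
    print(parse_waiting_until(pane, footer_ui=False))
    print(parse_waiting_until(pane, footer_ui=True))
```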

---

#### DoD-2 — claude smoke PASSES via harness: PASS

```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_claude.sh
```

```
=== claude backend smoke (isolated: prefix=aotest-c-681472-) ===
[agents] starting aotest-c-681472-probe (claude, kind=persistent, model=claude-haiku-4-5)
PASS: session aotest-c-681472-probe created via agents.py (pane command: claude)
PASS: claude TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== CLAUDE BACKEND SMOKE: PASS ===
```

Confirmed: isolated prefix `aotest-c-<pid>-` (not cc-ci-), temp sandbox log_dir, pane command
is `claude` (TUI alive), status RUNNING, down cleans up. Cleanup trap on EXIT/INT/TERM.

---

#### DoD-3 — opencode smoke PASSES via harness (dedicated port ≠ 4096): PASS

```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_opencode.sh
```

```
=== opencode backend smoke (isolated: prefix=aotest-o-681566- port=4097) ===
PASS: dedicated opencode server listening on :4097
[agents] starting aotest-o-681566-probe (opencode, kind=persistent, model=default)
PASS: session aotest-o-681566-probe created via agents.py (pane command: opencode)
PASS: opencode TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== OPENCODE BACKEND SMOKE: PASS ===
```

Confirmed: dedicated server on `:4097` (script has hardcoded guard refusing `4096`); isolated
prefix `aotest-o-<pid>-`; TUI attached; cleanup kills server AND does
`pkill -f "opencode serve.*--port ${PORT}"` + waits for port to free.

---

#### DoD-4 — No leftover aotest-* sessions or ports; cc-ci sessions intact: PASS

Post-run isolation check (after full suite via run.sh):

```
tmux ls | grep '^aotest-'
# → (no output) ✓

ss -ltn | grep ':4097 '
# → (no output) ✓

tmux ls | grep -E 'cc-ci-orchestrator|cc-ci-watchdog|cc-ci-assistant3'
# → cc-ci-assistant3, cc-ci-orchestrator, cc-ci-watchdog ✓
```

run.sh isolation sanity block output:
```
>>> ISOLATION SANITY
PASS: no leftover aotest-* tmux sessions
info: live cc-ci sessions present: cc-ci-orchestrator cc-ci-watchdog cc-ci-assistant3
```
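
The two session checks above (no `aotest-` leftovers, live sessions still present) can also be expressed as one pure function over captured `tmux ls` text. This is a minimal sketch assuming tmux's default `name: N windows (...)` listing format; `session_names` and `isolation_report` are hypothetical helpers and are not part of `run.sh`.

```python
from typing import Dict, List


def session_names(tmux_ls_output: str) -> List[str]:
    """Extract session names: the text before the first colon on each line."""
    return [ln.split(":", 1)[0] for ln in tmux_ls_output.splitlines() if ":" in ln]


def isolation_report(tmux_ls_output: str, test_prefix: str, live: List[str]) -> Dict[str, List[str]]:
    """Report leftover test sessions and any expected live sessions that vanished."""
    names = session_names(tmux_ls_output)
    return {
        "leftover": [n for n in names if n.startswith(test_prefix)],
        "missing_live": [n for n in live if n not in names],
    }


if __name__ == "__main__":
    # Sample text made up for illustration; real input would be captured `tmux ls` output.
    sample = (
        "cc-ci-assistant3: 1 windows (created Sat Jun 13 10:00:00 2026)\n"
        "cc-ci-orchestrator: 1 windows (created Sat Jun 13 10:00:01 2026)\n"
        "cc-ci-watchdog: 1 windows (created Sat Jun 13 10:00:02 2026)\n"
    )
    live = ["cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3"]
    print(isolation_report(sample, "aotest-", live))
```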

---

#### DoD-5 — Test suite + runner committed and documented: PASS

Files at commit `cdcece9`:
- `tests/test_unit.py` — 51-test stdlib unittest suite ✓
- `tests/smoke_claude.sh` — isolated live claude smoke ✓
- `tests/smoke_opencode.sh` — isolated live opencode smoke ✓
- `tests/run.sh` — runner: unit always, live smokes when available, isolation sanity ✓

README `## Testing` section (lines ~321–351):
- Documents `nix develop -c ./tests/run.sh` as the canonical invocation ✓
- Explains what each layer covers (unit vs live vs isolation) ✓
- Documents skip conditions (backend bin/creds absent) ✓
- Documents useful env vars (CLAUDE_BIN, AOTEST_MODEL, AOTEST_OC_PORT, AOTEST_OC_CREDS) ✓
- Notes safety by construction (non-cc-ci prefix, non-4096 port, cleanup trap) ✓

---

### Full suite summary (run.sh output)

```
SUMMARY: unit=PASS claude=PASS opencode=PASS isolation=PASS
ALL RUN TESTS PASSED (skips are OK)
```

rc=0. Verified at commit `cdcece9`, clean /tmp clone, nix develop (Python 3.11.11, tmux 3.5a).

---

### No findings. No veto. Phase aotest is DONE.

All 5 DoD items PASS at 2026-06-13T19:00Z on commit `cdcece9`.