cc-ci/machine-docs/REVIEW-aotest.md
autonomic-bot 3568754e64
review(aotest): ALL DoD PASS @2026-06-13T19:00Z — phase DONE
2026-06-13 19:02:06 +00:00


# REVIEW — phase aotest (Adversary log)
**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-aotest-verify.md`
**Deliverable repo:** `recipe-maintainers/agent-orchestrator` on git.autonomic.zone

---
## Adversary orientation @2026-06-13T18:44Z
**Mission:** Verify the agent-orchestrator harness runs a real project generically on BOTH
claude and opencode backends, fully isolated, with a committed test suite.
**DoD items to verify (from phase plan):**
1. Unit tests PASS — run from clean /tmp checkout inside `nix develop`
2. claude smoke test PASSES via the harness (isolated, cleaned up)
3. opencode smoke test PASSES or SKIPs with clear, justified reason recorded here
4. No leftover `aotest-*` tmux sessions or held ports after the run; live cc-ci sessions
(cc-ci-orchestrator/watchdog/assistant3) untouched
5. Test suite + runner committed and documented in README
**Key guardrails for my verification:**
- Must use a non-`cc-ci-` session prefix (aotest-* is correct)
- opencode port must ≠ 4096 (the live cc-ci port)
- Do NOT touch live launch system: `/srv/cc-ci/cc-ci-plan/agents.py`, `agents.toml`,
`cc-ci-plan/state/`, or running tmux sessions
- Verify from COLD START: fresh shell, /tmp checkout, no cached state
**Repo state at orientation:** v0.1.0 (commit `289ef07`) — no tests/ dir present yet.
Awaiting Builder to push the aotest deliverable.
**Code orientation @2026-06-13T18:44Z (from clean /tmp/ao-adv-check clone):**
Key functions the unit tests MUST exercise (from reading `agents.py`, 929 lines):
- `load_config`: session_prefix required → hard die; log_dir required → hard die; defaults merge;
project_dir resolution; agents inherit defaults; services inherit defaults
- `build_loop_kickoff`: reads `[loop].kickoff_template`, fills `{phase_id}/{plan}/{status}/{role}`,
then appends `<roles_dir>/<role>.md`. No project text in code — must test slot substitution.
- `phase_done`: reads `status_basename` from `handoff_repo(cfg)`, looks for `done_marker` line;
skips DONE_PLACEHOLDER_RE lines. Must test: file absent → False, no marker → False, marker present
→ True, placeholder line → False.
- `phase_advance_check`: auto-advance on DONE marker; idempotent when SEQUENCE-COMPLETE exists;
appending a phase clears SEQUENCE-COMPLETE marker and resumes.
- `_parse_reset_epoch`: AM/PM handling (12pm=12:00, 12am=00:00), 24h format, invalid hour/minute
returns None, no match returns None. Takes the LAST match.
- `_parse_waiting_until`: footer_ui branch uses last non-empty line only; non-footer scans whole
pane. ISO-8601 with Z suffix. Invalid format returns None.
- `pane_active`: claude backend uses `active_re` match; opencode uses `footer_ui` branch (only
last line of 3 matters); limit banner + idle = not active (tested in selftest).
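As a rough illustration of the slot substitution that the `build_loop_kickoff` tests need to cover, here is a minimal sketch. It is not taken from `agents.py`: `render_kickoff`, `unrendered_slots`, `SLOT_RE`, and `demo` are hypothetical names, the template and role text are made up, and joining the role prompt after a blank line is an assumption.

```python
import re
import tempfile
from pathlib import Path

# The four slot names come from the review text; everything else is illustrative.
SLOT_RE = re.compile(r"\{(phase_id|plan|status|role)\}")


def render_kickoff(template: str, roles_dir: Path, **slots: str) -> str:
    """Fill the four known slots, then append <roles_dir>/<role>.md."""
    filled = SLOT_RE.sub(lambda m: slots[m.group(1)], template)
    role_prompt = (roles_dir / f"{slots['role']}.md").read_text(encoding="utf-8")
    return f"{filled}\n\n{role_prompt}"


def unrendered_slots(text: str) -> list[str]:
    """Return any known slot names still present in rendered text."""
    return SLOT_RE.findall(text)


def demo() -> str:
    """Render a made-up template against a throwaway roles directory."""
    with tempfile.TemporaryDirectory() as tmp:
        roles = Path(tmp)
        (roles / "adversary.md").write_text("You are the Adversary.\n", encoding="utf-8")
        return render_kickoff(
            "Phase {phase_id}: read {plan}, write {status} as {role}.",
            roles,
            phase_id="aotest",
            plan="plan.md",
            status="STATUS.md",
            role="adversary",
        )
```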
**Live smoke isolation requirements (DoD verification):**
- claude smoke: session prefix must be `aotest-` (NOT `cc-ci-`), isolated log dir under /tmp
- opencode smoke: port must ≠ 4096 (live cc-ci port is 4096), own server, own prefix
- Post-run: `tmux ls | grep aotest` → zero results; live sessions intact
**Specific break-it checks I will run:**
1. `tmux ls | grep aotest` before AND after — no leakage
2. `ss -ltn | grep 4096` — opencode test must NOT use this port
3. Check cc-ci sessions: cc-ci-orchestrator, cc-ci-watchdog, cc-ci-assistant3 still present
4. Try to interrupt the live smoke mid-run (if isolatable) — cleanup still fires
5. Unit test edge cases:
   - `load_config` with missing session_prefix → expect die()
   - `load_config` with missing log_dir → expect die()
   - `phase_done` with `## DONE` followed only by placeholder → expect False
   - `_parse_reset_epoch("resets Jun 16, 12pm")` → 12:00 (NOT 24:00, which is invalid)
   - `_parse_reset_epoch("resets Jun 16, 12am")` → 00:00 (not 12:00)
   - `_parse_waiting_until` with footer_ui=True: only last non-empty line checked
6. Confirm selftest (DoD-3 of aoeng) still passes after any test infrastructure changes
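The 12am/12pm edge cases in check 5 can be made concrete with a small sketch. This is an assumption-laden stand-in, not the real `_parse_reset_epoch`: the hypothetical `parse_reset_time` returns only an (hour, minute) pair rather than an epoch, the banner regex is a guess at the format, and letting the last candidate decide (even when it is invalid) is my reading of "takes the LAST match".

```python
import re
from typing import Optional, Tuple

# Hypothetical time pattern: "12pm", "9:30am" or "14:05".
_TIME_RE = re.compile(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", re.IGNORECASE)


def parse_reset_time(text: str) -> Optional[Tuple[int, int]]:
    """Return (hour, minute) in 24h form for the last reset time found, else None."""
    candidates = []
    for line in text.splitlines():
        _, found, rest = line.partition("resets")
        if not found:
            continue
        for m in _TIME_RE.finditer(rest):
            hour_s, minute_s, meridiem = m.groups()
            if minute_s is None and meridiem is None:
                continue  # bare number, such as the day in "Jun 16"
            candidates.append((int(hour_s), int(minute_s or 0), (meridiem or "").lower()))
    if not candidates:
        return None
    hour, minute, meridiem = candidates[-1]  # the last match wins
    if minute > 59:
        return None
    if meridiem:
        if not 1 <= hour <= 12:
            return None
        # 12am -> 0, 12pm -> 12, 1pm..11pm -> 13..23
        hour = hour % 12 + (12 if meridiem == "pm" else 0)
    elif hour > 23:
        return None
    return hour, minute
```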
---
## Verdicts
### ALL DoD items: PASS @2026-06-13T19:00Z
Cold verification from clean `/tmp/ao-adv-check` clone (fresh git clone before pulling the
Builder's STATUS — verdict formed independently). Commit verified: `cdcece9a9ac64b458103194025f2c22ba830ce15`.
```
rm -rf /tmp/ao-adv-check
git clone https://...@git.autonomic.zone/recipe-maintainers/agent-orchestrator.git /tmp/ao-adv-check
git -C /tmp/ao-adv-check rev-parse HEAD
# → cdcece9a9ac64b458103194025f2c22ba830ce15 ✓ matches claimed commit
```
---
#### DoD-1 — Unit tests PASS (clean /tmp, nix develop): PASS
```
cd /tmp/ao-adv-check && nix develop -c python3 -m unittest discover -s tests -p 'test_*.py' -v
```
```
Ran 51 tests in 0.062s
OK
```
51 tests, rc=0. Coverage confirmed:
- `TestConfigLoad` (12 tests): session_prefix required die, log_dir required die, defaults merge,
explicit session override, per-agent override wins, relative/absolute dir resolution, log_dir
resolved, state_dir created, service session named, backend_of resolves, backend_of unknown dies,
env AGENT_MODEL override single-invocation
- `TestExampleConfig` (1 test): shipped `agents.example.toml` loads with expected shape
- `TestKickoff` (5 tests): slot fill ({phase_id}/{plan}/{status}/{role}), correct role prompt
appended, no unrendered slots, agent_prompt dispatches correctly, role_model phase override
- `TestPhaseMachine` (8 tests): phase_done detects marker, rejects placeholder, false when no
marker, false when file missing; cur_idx reads state file; advance on DONE; sequence-complete
idempotent (no re-stop on 2nd call); append-phase clears SEQUENCE-COMPLETE and resumes;
custom done_marker respected
- `TestLimitParsing` (8 tests): PM, AM+minutes, 12am=midnight, invalid hour=None, no match=None,
picks last match, unparsable fallback, within-6h window uses banner, >6h falls back
- `TestWaitingUntil` (5 tests): non-footer finds marker anywhere, non-footer None without marker,
footer ignores marker not in last line, footer honors marker as last line, bad timestamp=None
- `TestActivityDetection` (8 tests): claude active_re (esc to interrupt, Running tool, spinner),
claude idle not active; opencode active footer, idle footer, active-only-at-top ignored,
log_grace fallback via mtime
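The footer versus non-footer behaviour covered by `TestWaitingUntil` can be sketched as follows. This is only an illustration under stated assumptions: the `WAITING-UNTIL:` marker text, the name `parse_waiting_until`, and the choice of the last match in non-footer mode are all hypothetical; only the "last non-empty line" rule and the ISO-8601 `Z` suffix come from the review.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Hypothetical marker text; the real marker used by agents.py is not shown here.
MARKER_RE = re.compile(r"WAITING-UNTIL:\s*(\S+)")


def parse_waiting_until(pane: str, footer_ui: bool) -> Optional[datetime]:
    """Return the UTC timestamp after the marker, or None if absent or malformed."""
    if footer_ui:
        # Footer UIs: only the last non-empty line of the pane is considered.
        lines = [ln for ln in pane.splitlines() if ln.strip()]
        scope = lines[-1] if lines else ""
    else:
        # Other backends: the marker may appear anywhere in the pane.
        scope = pane
    matches = MARKER_RE.findall(scope)
    if not matches:
        return None
    try:
        ts = datetime.strptime(matches[-1], "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None  # bad timestamp
    return ts.replace(tzinfo=timezone.utc)
```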
---
#### DoD-2 — claude smoke PASSES via harness: PASS
```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_claude.sh
```
```
=== claude backend smoke (isolated: prefix=aotest-c-681472-) ===
[agents] starting aotest-c-681472-probe (claude, kind=persistent, model=claude-haiku-4-5)
PASS: session aotest-c-681472-probe created via agents.py (pane command: claude)
PASS: claude TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== CLAUDE BACKEND SMOKE: PASS ===
```
Confirmed: isolated prefix `aotest-c-<pid>-` (not cc-ci-), temp sandbox log_dir, pane command
is `claude` (TUI alive), status RUNNING, down cleans up. Cleanup trap on EXIT/INT/TERM.

---
#### DoD-3 — opencode smoke PASSES via harness (dedicated port ≠ 4096): PASS
```
cd /tmp/ao-adv-check && nix develop -c bash tests/smoke_opencode.sh
```
```
=== opencode backend smoke (isolated: prefix=aotest-o-681566- port=4097) ===
PASS: dedicated opencode server listening on :4097
[agents] starting aotest-o-681566-probe (opencode, kind=persistent, model=default)
PASS: session aotest-o-681566-probe created via agents.py (pane command: opencode)
PASS: opencode TUI attached + alive (driven entirely by agents.py)
PASS: agents.py status reports probe RUNNING
PASS: agents.py down cleanly removed the session
=== OPENCODE BACKEND SMOKE: PASS ===
```
Confirmed: dedicated server on `:4097` (script has hardcoded guard refusing `4096`); isolated
prefix `aotest-o-<pid>-`; TUI attached; cleanup kills server AND does `pkill -f "opencode serve.*--port ${PORT}"` + waits for port to free.
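For readers who want the port guard and the wait-for-free step in one place, here is a Python sketch of the same idea. The actual smoke test is a bash script and its code is not reproduced here; `check_test_port`, `port_is_listening`, and `wait_for_port_free` are hypothetical helpers, and probing via a loopback TCP connect is an assumption about how one might test the port.

```python
import socket
import time

RESERVED_PORT = 4096  # the live cc-ci opencode port named in this review


def check_test_port(port: int) -> int:
    """Refuse the reserved live port; return the port unchanged otherwise."""
    if port == RESERVED_PORT:
        raise ValueError(f"refusing to run the smoke test on live port {RESERVED_PORT}")
    return port


def port_is_listening(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(0.5)
        return probe.connect_ex((host, port)) == 0


def wait_for_port_free(port: int, timeout: float = 5.0, interval: float = 0.1) -> bool:
    """Poll until nothing listens on the port, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not port_is_listening(port):
            return True
        time.sleep(interval)
    return not port_is_listening(port)
```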
---
#### DoD-4 — No leftover aotest-* sessions or ports; cc-ci sessions intact: PASS
Post-run isolation check (after full suite via run.sh):
```
tmux ls | grep '^aotest-'
# → (no output) ✓
ss -ltn | grep ':4097 '
# → (no output) ✓
tmux ls | grep -E 'cc-ci-orchestrator|cc-ci-watchdog|cc-ci-assistant3'
# → cc-ci-assistant3, cc-ci-orchestrator, cc-ci-watchdog ✓
```
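The same post-run checks can be expressed over captured `tmux ls` text, which makes them easy to unit test. This is a sketch, not part of the committed suite: `session_names`, `leaked_sessions`, and `missing_sessions` are hypothetical helpers, and `SAMPLE` is made-up output in the default `tmux ls` layout rather than a capture from the real host.

```python
from typing import Iterable, List


def session_names(tmux_ls_output: str) -> List[str]:
    """Extract session names from `tmux ls` text (the name precedes the first colon)."""
    return [line.split(":", 1)[0] for line in tmux_ls_output.splitlines() if ":" in line]


def leaked_sessions(tmux_ls_output: str, prefix: str = "aotest-") -> List[str]:
    """Sessions that still carry the test prefix after the run."""
    return [name for name in session_names(tmux_ls_output) if name.startswith(prefix)]


def missing_sessions(tmux_ls_output: str, required: Iterable[str]) -> List[str]:
    """Required live sessions that are no longer present."""
    present = set(session_names(tmux_ls_output))
    return [name for name in required if name not in present]


LIVE = ["cc-ci-orchestrator", "cc-ci-watchdog", "cc-ci-assistant3"]

# Made-up sample, not captured from the real host.
SAMPLE = """\
cc-ci-assistant3: 1 windows (created Sat Jun 13 10:00:00 2026)
cc-ci-orchestrator: 1 windows (created Sat Jun 13 10:00:01 2026)
cc-ci-watchdog: 1 windows (created Sat Jun 13 10:00:02 2026)
"""
```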
run.sh isolation sanity block output:
```
>>> ISOLATION SANITY
PASS: no leftover aotest-* tmux sessions
info: live cc-ci sessions present: cc-ci-orchestrator cc-ci-watchdog cc-ci-assistant3
```
---
#### DoD-5 — Test suite + runner committed and documented: PASS
Files at commit `cdcece9`:
- `tests/test_unit.py` — 51-test stdlib unittest suite ✓
- `tests/smoke_claude.sh` — isolated live claude smoke ✓
- `tests/smoke_opencode.sh` — isolated live opencode smoke ✓
- `tests/run.sh` — runner: unit always, live smokes when available, isolation sanity ✓
README `## Testing` section (lines ~321-351):
- Documents `nix develop -c ./tests/run.sh` as the canonical invocation ✓
- Explains what each layer covers (unit vs live vs isolation) ✓
- Documents skip conditions (backend bin/creds absent) ✓
- Documents useful env vars (CLAUDE_BIN, AOTEST_MODEL, AOTEST_OC_PORT, AOTEST_OC_CREDS) ✓
- Notes safety by construction (non-cc-ci prefix, non-4096 port, cleanup trap) ✓
---
### Full suite summary (run.sh output)
```
SUMMARY: unit=PASS claude=PASS opencode=PASS isolation=PASS
ALL RUN TESTS PASSED (skips are OK)
```
rc=0. Verified at commit `cdcece9`, clean /tmp clone, nix develop (Python 3.11.11, tmux 3.5a).

---
### No findings. No veto. Phase aotest is DONE.
All 5 DoD items PASS at 2026-06-13T19:00Z on commit `cdcece9`.