---
name: launch-system-unification
description: Plan to replace 5 bespoke launcher scripts + ~15 dotfiles with one agents.toml + one agents.py driver
metadata:
  node_type: memory
  type: project
  originSessionId: fc17c9c2-ab6e-4c11-856e-a6a6e160a0ec
---
The cc-ci agent launch system was 5 near-duplicate launchers (`launch.py` loops+watchdog,
`launch-orchestrator.py`, `launch-assistant.py`, `launch-upgrader.py`, `launch-report.py`)
each re-implementing claude/opencode backend plumbing, plus ~15 scattered dotfiles in
`/srv/cc-ci/.cc-ci-logs/` (`.loop-backend`, `.loop-model*`, `.orch-model`, `.phases-spec`,
`.phase-idx`, `.*-session-id`, `.limited-*`, …).

**STATUS: IMPLEMENTED + CUT OVER 2026-06-13 ~05:27 UTC.** `agents.toml` + `agents.py` are live;
orchestrator/builder/adversary + watchdog all respawned under the new system and confirmed
working on phase pvfix. `launch.py` + `launch-orchestrator.sh` are now shims → `agents.py` (originals
at `*.orig`); systemd boot chain (`cc-ci-loops-start` → `launch.sh` → `launch.py start`) drives the new
system. State moved to `.cc-ci-logs/state/` (`phase-idx`, `<name>.id` resume, `limited-*.json`). tmux
targeting uses exact match (`=name` / `=name:`) to avoid prefix collisions (`cc-ci-assistant` vs
`cc-ci-assistant3`).

**VERIFIED STABLE over 3 hourly checks (06:30/07:34/08:36 UTC 2026-06-13);
hourly wake ENDED (orchestrator runs its own hourly self-wake via the watchdog).** During
verification the system autonomously ran the build to completion (pvfix→pvcheck→ghost→cf48, an
operator-appended opus review phase): clean auto-advances, handoff pings, and per-phase model
overrides all worked. One port defect found+fixed at check 2: `phase_advance_check` is now
idempotent once `SEQUENCE-COMPLETE` exists (no 5-min log spam) and the watchdog keeps death-healing
the orchestrator without restarting the intentionally-stopped finished loops.

End state: build sequence COMPLETE, loops intentionally stopped, orchestrator supervising;
13 recipe PRs await operator review/merge.

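
A minimal sketch of why the `=name` exact-match targeting noted in the status above matters. tmux resolves a bare session target by exact name first and then by prefix, so a lookup for `cc-ci-assistant` can land on `cc-ci-assistant3` when the former is not running. The helper names here (`exact_session`, `exact_session_window`, `resolve`, `session_alive`) are hypothetical and not taken from `agents.py`; `resolve` is a simplified model of tmux's lookup (real tmux also tries `$id` and fnmatch patterns).

```python
import subprocess
from typing import Optional


def exact_session(name: str) -> str:
    """Build a tmux target that only matches a session named exactly `name`."""
    return f"={name}"


def exact_session_window(name: str) -> str:
    """Exact-match session target in the `session:` form."""
    return f"={name}:"


def resolve(target: str, sessions: list) -> Optional[str]:
    """Simplified model of tmux session lookup: exact name, then unique prefix.

    A leading '=' restricts the lookup to an exact match only.
    """
    if target.startswith("="):
        name = target[1:].rstrip(":")
        return name if name in sessions else None
    name = target.rstrip(":")
    if name in sessions:
        return name
    prefixed = [s for s in sessions if s.startswith(name)]
    return prefixed[0] if len(prefixed) == 1 else None


def session_alive(name: str) -> bool:
    """Ask a real tmux server whether the session exists (needs tmux on PATH)."""
    result = subprocess.run(
        ["tmux", "has-session", "-t", exact_session(name)],
        capture_output=True,
    )
    return result.returncode == 0
```

With only `cc-ci-assistant3` running, `resolve("cc-ci-assistant", ...)` returns `cc-ci-assistant3` in this model, while the `=`-prefixed target returns `None`, which is the collision the exact-match form avoids.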
**Original plan (approved 2026-06-13, operator chose TOML + build-now + de-dupe mailu):** one
declarative `cc-ci-plan/agents.toml` (single source of truth: per-agent backend, model,
prompt, kind, watch policy; backends declared as data) + one `cc-ci-plan/agents.py` driver
(up/down/status/watchdog/logs/phase). Watchdog reads the config file, not env. Config vs
state split: declarative config in TOML, runtime state (phase-idx, resume ids, limit
windows) under a `state/` dir. Full design + behavior-mapping + migration in
`cc-ci-plan/plan-unified-launch.md`. Mimic current behavior first, cut over between phases
(old launchers become shims), then retire them. De-dupe note: live `.phases-spec` lists
`mailu` twice (idx 5 & 7); dropping the 2nd shifts current phase cf55 from idx 10→9.
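
The index shift in the de-dupe note can be checked with a short sketch. Only `mailu` (idx 5 and 7) and `cf55` (idx 10) come from the note; every other phase name below is a placeholder, and `dedupe_keep_first` / `shifted_index` are hypothetical helpers, not functions from `agents.py`.

```python
def dedupe_keep_first(phases):
    """Drop repeated phase names, keeping the first occurrence of each."""
    seen = set()
    result = []
    for phase in phases:
        if phase not in seen:
            seen.add(phase)
            result.append(phase)
    return result


def shifted_index(phases, idx):
    """Return where phases[idx] ends up after de-duplication."""
    return dedupe_keep_first(phases).index(phases[idx])


# Placeholder spec: only the positions of mailu (5, 7) and cf55 (10) are real.
phases = [
    "p0", "p1", "p2", "p3", "p4",
    "mailu",   # idx 5, kept
    "p6",
    "mailu",   # idx 7, dropped
    "p8", "p9",
    "cf55",    # idx 10 before de-dupe
]

new_idx = shifted_index(phases, 10)
```

Dropping the second `mailu` removes one entry ahead of `cf55`, so any stored `phase-idx` at or past idx 7 would need the same one-step adjustment when the de-duplicated spec goes live.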
Agent kinds: loop (builder/adversary, phase machine) | persistent (orchestrator resume+wake,
assistant) | task (upgrader/report one-shot slash command) | service (watchdog/cleanlogs).
See [[orchestrator-backend-switch-gotcha]].