Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@"
All logic lives in the Python scripts (pure stdlib, no deps).

launch.py — loops + watchdog:
Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog, handoff signalling, stall detection, heal_session, heal_orchestrator.
Cleaner structure: config block → helpers → phase/kickoff/agent/healing/handoff/watchdog/main.
LOOP_BACKEND + LOOP_MODEL switches throughout.

launch-orchestrator.py — orchestrator session:
claude path: --resume <id> preserved (conversation survives reboots).
opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients the new session; reads JOURNAL.md for context).
STARTUP_PROMPT updated to reference JOURNAL.md on startup.

launch-upgrader.py — one-shot upgrade job:
LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND / UPGRADER_MODEL.
Both claude and opencode paths supported.

cc-ci-plan/JOURNAL.md — new orchestrator handoff file:
Persistent across conversation resets. Documents the handoff format and carries the current session's summary: migration complete, phase 5 in progress (V3/V7 PASS), phase 4 deferred, open items for next session.

AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
83 lines
4.2 KiB
Markdown
# Orchestrator journal

This file is the **persistent handoff record** for the cc-ci orchestrator. Every orchestrator
session (whether Claude or opencode) reads this on startup and appends to it when handing off or
when something noteworthy happens. It survives conversation resets — it is the memory that
`--resume` can't provide for opencode, and a more readable supplement for Claude sessions.

**On startup:** read this file before doing anything else. The most recent `## Session` entry
is where the previous session left off. Carry that context forward.

**On handoff / end of session:** append a `## Session` block (see format below) summarising
what happened, the current state, and anything the next session needs to know.

**On significant events mid-session:** append a `### Event` sub-entry (no need to wait for
handoff).

---
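The startup rule (read the most recent `## Session` entry) can be sketched in a few lines of stdlib Python. This is only an illustration, not code from the launcher scripts: the function name `latest_session_entry` and the regular expression are assumptions. The pattern requires a numeric date after `## Session`, so a placeholder heading such as `## Session YYYY-MM-DD ...` is skipped.

```python
import re

# Assumed heading shape: "## Session <YYYY-MM-DD> ..." with a numeric date.
SESSION_HEADING = re.compile(r"^## Session \d{4}-\d{2}-\d{2}\b.*$", re.MULTILINE)


def latest_session_entry(journal_text: str) -> str:
    """Return the last '## Session' block (including its '### Event'
    sub-entries), or an empty string if the journal has no session yet."""
    matches = list(SESSION_HEADING.finditer(journal_text))
    if not matches:
        return ""
    # Entries are appended, so the last heading starts the newest block,
    # which runs to the end of the file.
    return journal_text[matches[-1].start():].strip()
```

A caller would typically read `cc-ci-plan/JOURNAL.md` as UTF-8 text and pass the contents in; the sketch relies on entries being appended in chronological order, as this file requires.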

## Format

```markdown
## Session YYYY-MM-DD HH:MM UTC — <backend> <model>
**Left off:** <one sentence — what was the last thing done>
**Phase / loop state:** <phase X [N/11], loops RUNNING/stopped, cc-ci healthy/issue>
**Open items:** <bullet list of anything the next session needs to act on, or "none">
**Notes:** <anything surprising, a decision made, a known blocker, etc.>

### Event HH:MM — <short label>
<brief note>
```
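A minimal sketch of producing a block in this format from Python, assuming nothing about how the real launcher scripts do it. The names `render_session_block` and `append_to_journal` are hypothetical. Fields are separated by blank lines here so that each renders as its own paragraph; that spacing is a choice made for this sketch, not something the template above prescribes.

```python
from datetime import datetime, timezone


def render_session_block(backend, model, left_off, state, open_items, notes, now=None):
    """Render a '## Session' block following the journal's documented fields."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    if open_items:
        # One bullet per open item, starting on the line after the label.
        items = "\n" + "\n".join(f"- {item}" for item in open_items)
    else:
        items = " none"
    lines = [
        # "\u2014" is the dash separator used in the documented heading.
        f"## Session {stamp} \u2014 {backend} {model}",
        f"**Left off:** {left_off}",
        f"**Phase / loop state:** {state}",
        f"**Open items:**{items}",
        f"**Notes:** {notes}",
    ]
    return "\n\n".join(lines) + "\n"


def append_to_journal(path, block):
    """Append a rendered block to the journal, never rewriting earlier entries."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write("\n" + block)
```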

---

## Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6

**Left off:** Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public
`168.119.126.100`, tailnet `cc-ci-orchestrator-1` @ `100.84.190.30`). The old Incus VM
(`100.116.55.106`) is still on the tailnet — cold standby, not yet deleted.

**Phase / loop state:** Phases 1c–1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11]
(upgrade-flow verify) in progress — loops running, actively verifying the `!testme`
end-to-end flow on the new Hetzner cc-ci server.

**Open items:**
- Phase 5 is in progress — loops need to finish V1–V9 and write `## DONE` to STATUS-5.md.
- Phase 4 (final review/polish) was deliberately **skipped** this session — it is queued
  at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset.
- Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n +
  ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE.
- Old Incus orchestrator VM (`cc-ci-orchestrator`, `100.116.55.106`) is still running —
  stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at
  `/srv/incus-terraform-nix-vm-creator/terraform-secrets/`.
- DNS: `oc.commoninternet.net` A record → `100.84.190.30` still needs adding (operator step).

**Notes:**
- `cc-ci-loops.service` is **enabled** and wired with `reboot-log.sh` ExecStartPre — a reboot
  is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1.
- The cc-ci **server** also moved to Hetzner (server 134485294, `ssh cc-ci` →
  `100.95.31.88`). It has authenticated Docker Hub pulls and 150 GB disk — the old OOM /
  disk-starvation / rate-limit issues are gone.
- All recipe mirrors currently reconcile correctly; no stale open PRs observed.
- `opencode` v1.15.13 installed at `/home/loops/.local/bin/opencode`. Tinfoil API key is in
  `.testenv` as `TINFOIL_API_KEY`. Backend switch: `LOOP_BACKEND=opencode
  LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
- Launcher scripts rewritten to Python (`launch.py`, `launch-orchestrator.py`,
  `launch-upgrader.py`); bash wrappers are now one-liners that `exec python3 <script> "$@"`.

### Event 03:13 — migrated from old Incus VM to Hetzner
Loops were started manually during staging (not by the service); first systemd-managed
boot was later this session. `cc-ci-loops.service` now enabled.

### Event 05:23 — phase 3 (results-UX) completed
All R1–R8 Adversary-verified, no VETO. Watchdog auto-advanced to phase 4.

### Event 13:22 — phase 4 paused, jumped to phase 5
Operator deferred phase 4 (weekly Opus credits exhausted). Phase idx manually set to 10
(phase 5). Loops restarted on Sonnet.

### Event 17:29 — loops stopped pending restart on different model
Operator paused loops to reconfigure backend (opencode/tinfoil exploration). Phase 5
[11/11] was in progress — loops had verified V1/V2/V3/V7 (custom-html-tiny upgrade GREEN).
Phase idx = 10 (phase 5), loops stopped, watchdog stopped.