
Orchestrator journal

This file is the persistent handoff record for the cc-ci orchestrator. Every orchestrator session (whether Claude or opencode) reads this on startup and appends to it when handing off or when something noteworthy happens. It survives conversation resets — it is the memory that --resume can't provide for opencode, and a more readable supplement for Claude sessions.

On startup: read this file before doing anything else. The most recent ## Session entry is where the previous session left off. Carry that context forward.

On handoff / end of session: append a ## Session block (see format below) summarising what happened, the current state, and anything the next session needs to know.

On significant events mid-session: append a ### Event sub-entry (no need to wait for handoff).


Format

## Session YYYY-MM-DD HH:MM UTC — <backend> <model>
**Left off:** <one sentence: what was the last thing done>
**Phase / loop state:** <phase X [N/11], loops RUNNING/stopped, cc-ci healthy/issue>
**Open items:** <bullet list of anything the next session needs to act on, or "none">
**Notes:** <anything surprising, a decision made, a known blocker, etc.>

### Event HH:MM — <short label>
<brief note>
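A minimal sketch of rendering a Session block in the format above. The helper name `format_session_entry` and its parameters are hypothetical and not part of the cc-ci tooling; it only illustrates how the fields line up.

```python
from datetime import datetime, timezone


def format_session_entry(when, backend, model, left_off, phase_state, open_items, notes):
    """Render one '## Session' block (hypothetical helper, not part of cc-ci).

    `when` must be a timezone-aware datetime; it is converted to UTC.
    """
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M")
    if open_items:
        # One bullet per open item, starting on the line after the label.
        items = "\n" + "\n".join(f"- {item}" for item in open_items)
    else:
        items = " none"
    return (
        # \u2014 is the dash used in the journal's session header.
        f"## Session {stamp} UTC \u2014 {backend} {model}\n"
        f"**Left off:** {left_off}\n"
        f"**Phase / loop state:** {phase_state}\n"
        f"**Open items:**{items}\n"
        f"**Notes:** {notes}\n"
    )
```

Calling it with an empty `open_items` list writes "none", matching the placeholder text in the format.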

Session 2026-05-31 ~18:30 UTC — Claude Sonnet 4.6

Left off: Got opencode/deepseek-v4-pro working as the loop backend. Both builder and adversary are actively running on tinfoil/deepseek-v4-pro (via inference.tinfoil.sh). Phase 5 [11/11] in progress. The operator is debugging the opencode web UI visibility and wants to continue orchestrating from opencode itself.

Phase / loop state:

  • Phase 5 [11/11] (plan-phase5-verify-upgrade-flow.md), in progress
  • Latest product-repo commit: de635ad, "status(5): V3 DONE (custom-html-tiny upgrade GREEN); V7 DONE; A5-1/A5-2 fixed"
  • Loops RUNNING on opencode/deepseek-v4-pro, actively processing (3262K tokens in flight)
  • Watchdog RUNNING, backend persisted to .loop-backend / .loop-model files

Open items for next session:

  • Phase 5 loops need to finish V1–V9 and write ## DONE to STATUS-5.md. They were at V3+V7 PASS before the backend switch. After completing phase 5, phase 6 (reconcile-only over all 18 recipe mirrors) and phase 7 (full upgrade on n8n + ghost + matrix-synapse) still need to be run.
  • Phase 4 (final review/polish) was deliberately deferred — run it after weekly Opus credits reset. Phase idx currently at 10 (phase 5). To run phase 4 later: set idx to 9, start with LOOP_BACKEND=claude RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • Restart loops after reading this if needed; the current sessions are mid-processing. cc-ci-plan/launch.sh status shows the state; if sessions are stalled, restart them with LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • DNS: the A record oc.commoninternet.net → 100.84.190.30 still needs adding (operator step). Web UI reachable directly at http://100.84.190.30 in the meantime.
  • Old Incus orchestrator VM (cc-ci-orchestrator, 100.116.55.106) still cold standby — stop + delete when confident in Hetzner.
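The relationship between the phase idx and the [N/11] label in the items above can be sketched as follows. This assumes the idx is zero-based (consistent with idx 10 being shown as [11/11]) and that the idx file holds a bare integer; the function names and the file layout are hypothetical.

```python
from pathlib import Path

TOTAL_PHASES = 11  # taken from the "[11/11]" label in this journal


def phase_label(idx, total=TOTAL_PHASES):
    """Map a zero-based phase idx to its '[N/total]' position label."""
    if not 0 <= idx < total:
        raise ValueError(f"phase idx {idx} outside 0..{total - 1}")
    return f"[{idx + 1}/{total}]"


def write_phase_idx(idx_file, idx, total=TOTAL_PHASES):
    """Store idx in idx_file (assumed format: a single integer on one line)."""
    phase_label(idx, total)  # validates the range before touching the file
    Path(idx_file).write_text(f"{idx}\n")
```

Under these assumptions, `phase_label(10)` gives the current "[11/11]" and setting idx to 9 for phase 4 corresponds to "[10/11]".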

Notes — opencode/tinfoil setup (critical for next session):

  • Backend files: LOOP_BACKEND=opencode and LOOP_MODEL=tinfoil/deepseek-v4-pro are persisted in /srv/cc-ci/.cc-ci-logs/.loop-backend and .loop-model. The watchdog reads these to restart dead sessions with the right backend.
  • API key: stored in /srv/cc-ci/.testenv as TINFOIL_API_KEY. Written directly (not via env:) into ~/.config/opencode/opencode.jsonc — opencode doesn't do env substitution in apiKey. The config also has "permission": "allow" (all tool calls auto-approved).
  • Inference URL: https://inference.tinfoil.sh/v1 (NOT api.tinfoil.sh — that's the control plane only). Fixed in both .testenv and opencode.jsonc.
  • Opencode web server: opencode-web.service runs opencode serve --hostname 127.0.0.1 --port 4096. Nginx proxies oc.commoninternet.net → localhost:4096 on tailscale IP. Sessions from the plain opencode TUI DO appear in the shared server's DB (they auto-connect via IPC), so the web UI should show them once DNS is set.
  • Launch command for opencode loops: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start
  • Launch command for claude loops (fallback): LOOP_BACKEND=claude LOOP_MODEL=sonnet RESUME_PHASE=1 cc-ci-plan/launch.sh start
  • Launchers rewritten to Python: launch.py, launch-orchestrator.py, launch-upgrader.py (bash wrappers are one-liners). All committed to recipe-maintainers/cc-ci-orchestrator (HEAD: 3412100).
  • Opencode binary: /home/loops/.local/bin/opencode v1.15.13. Re-install if missing: curl -sL https://github.com/anomalyco/opencode/releases/download/v1.15.13/opencode-linux-x64.tar.gz | tar -xz -C /home/loops/.local/bin opencode
  • Known opencode quirk: the loop bootstrap message (pointing to the kickoff file) is sent via ping_session with submit_key="Enter". The TUI needs ~8s to connect before the message is sent. If a session seems stuck at the blank prompt, manually send the message from .cc-ci-logs/.kickoff-cc-ci-builder.txt (or the adversary equivalent), then press Enter.
  • Orchestrator in opencode: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro cc-ci-plan/launch-orchestrator.sh fresh — no --resume (opencode doesn't support it); reads this JOURNAL.md as startup context.
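As an illustration of the persisted-backend mechanism described in the notes above, here is a rough sketch of how a restart could rebuild its environment from the .loop-backend and .loop-model files. It is not the watchdog's actual code: the function names are hypothetical, and because the exact file contents are not recorded here, the reader accepts either a bare value or a KEY=value line.

```python
from pathlib import Path


def read_persisted(path, key):
    """Return the stored value, accepting either 'value' or 'KEY=value' content."""
    text = Path(path).read_text().strip()
    prefix = f"{key}="
    return text[len(prefix):] if text.startswith(prefix) else text


def restart_env(log_dir):
    """Build the environment variables for relaunching loops on the persisted backend."""
    log_dir = Path(log_dir)
    return {
        "LOOP_BACKEND": read_persisted(log_dir / ".loop-backend", "LOOP_BACKEND"),
        "LOOP_MODEL": read_persisted(log_dir / ".loop-model", "LOOP_MODEL"),
        "RESUME_PHASE": "1",  # resume the current phase, as in the launch commands above
    }
```

The resulting dict could be merged into `os.environ` and passed as `env=` to `subprocess.run(["cc-ci-plan/launch.sh", "start"], ...)`, with `/srv/cc-ci/.cc-ci-logs` as `log_dir`.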

Event 04:13 — migrated orchestrator to Hetzner cpx22

cc-ci-loops.service enabled, reboot-resilient. cc-ci server also Hetzner (server 134485294, ssh cc-ci @ 100.95.31.88).

Event 13:22 — phase 4 paused, phase 5 started

Weekly Opus credits exhausted mid-session. Switched to Sonnet. Phase idx manually set to 10 (phase 5).

Event 17:29 — loops stopped to switch backends

Event 18:20 — opencode/deepseek loops running

After 7 bug fixes (wrong inference host, opencode run exits, --dir exits, env: not substituted in apiKey, permission prompts, submit key, timing), both loops now running on tinfoil/deepseek-v4-pro via the shared opencode-web.service.


Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6

Left off: Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public 168.119.126.100, tailnet cc-ci-orchestrator-1 @ 100.84.190.30). The old Incus VM (100.116.55.106) is still on the tailnet — cold standby, not yet deleted.

Phase / loop state: Phases 1c–1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11] (upgrade-flow verify) in progress: loops running, actively verifying the !testme end-to-end flow on the new Hetzner cc-ci server.

Open items:

  • Phase 5 is in progress: loops need to finish V1–V9 and write ## DONE to STATUS-5.md.
  • Phase 4 (final review/polish) was deliberately skipped this session — it is queued at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset.
  • Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n + ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE.
  • Old Incus orchestrator VM (cc-ci-orchestrator, 100.116.55.106) is still running — stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at /srv/incus-terraform-nix-vm-creator/terraform-secrets/.
  • DNS: oc.commoninternet.net A record → 100.84.190.30 still needs adding (operator step).

Notes:

  • cc-ci-loops.service is enabled and wired with reboot-log.sh ExecStartPre — a reboot is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1.
  • The cc-ci server also moved to Hetzner (server 134485294, ssh cc-ci @ 100.95.31.88). It has authenticated Docker Hub pulls and 150 GB disk, so the old OOM / disk-starvation / rate-limit issues are gone.
  • All recipe mirrors currently reconcile correctly; no stale open PRs observed.
  • opencode v1.15.13 installed at /home/loops/.local/bin/opencode. Tinfoil API key is in .testenv as TINFOIL_API_KEY. Backend switch: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • Launcher scripts rewritten to Python (launch.py, launch-orchestrator.py, launch-upgrader.py); bash wrappers are now one-liners that exec python3 <script> "$@".
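Since the notes above rely on TINFOIL_API_KEY being present in .testenv, a small pre-flight check may be useful before switching backends. The sketch below assumes .testenv is a plain KEY=VALUE file with optional quotes and # comments (its real format is not recorded in this journal); both function names are hypothetical.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def tinfoil_api_key(text):
    """Return TINFOIL_API_KEY from env-file text, or raise if missing or empty."""
    key = parse_env_file(text).get("TINFOIL_API_KEY", "")
    if not key:
        raise KeyError("TINFOIL_API_KEY is missing or empty")
    return key
```

Reading the file is left to the caller, for example `tinfoil_api_key(open("/srv/cc-ci/.testenv").read())`, so the parsing can be exercised without touching the real secret.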

Event 03:13 — migrated from old Incus VM to Hetzner

Loops were started manually during staging (not by the service); first systemd-managed boot was later this session. cc-ci-loops.service now enabled.

Event 05:23 — phase 3 (results-UX) completed

All R1–R8 Adversary-verified, no VETO. Watchdog auto-advanced to phase 4.

Event 13:22 — phase 4 paused, jumped to phase 5

Operator deferred phase 4 (weekly Opus credits exhausted). Phase idx manually set to 10 (phase 5). Loops restarted on Sonnet.

Event 17:29 — loops stopped pending restart on different model

Operator paused loops to reconfigure backend (opencode/tinfoil exploration). Phase 5 [11/11] was in progress — loops had verified V1/V2/V3/V7 (custom-html-tiny upgrade GREEN). Phase idx = 10 (phase 5), loops stopped, watchdog stopped.