Orchestrator journal

This file is the persistent handoff record for the cc-ci orchestrator. Every orchestrator session (whether Claude or opencode) reads this on startup and appends to it when handing off or when something noteworthy happens. It survives conversation resets — it is the memory that --resume can't provide for opencode, and a more readable supplement for Claude sessions.

On startup: read this file before doing anything else. The most recent ## Session entry is where the previous session left off. Carry that context forward.

On handoff / end of session: append a ## Session block (see format below) summarising what happened, the current state, and anything the next session needs to know.

On significant events mid-session: append a ### Event sub-entry (no need to wait for handoff).
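The startup step above can be sketched as a small helper. This is a minimal sketch, not the launcher's actual code: the function name `latest_session_block` is hypothetical, and it assumes that entries are appended in order, so the last dated `## Session` heading in the file is the most recent one. The undated template heading in the Format section is skipped because it contains no digits in the date position.

```python
import re

# Assumption: real entries start with "## Session <YYYY-MM-DD> ...".
# The template line "## Session YYYY-MM-DD ..." does not match \d{4}.
SESSION_HEADING = re.compile(r"^## Session \d{4}-\d{2}-\d{2} .*$", re.MULTILINE)


def latest_session_block(journal_text: str) -> str:
    """Return the last dated '## Session' block in the file, or '' if none."""
    matches = list(SESSION_HEADING.finditer(journal_text))
    if not matches:
        return ""
    # Entries are assumed to be appended, so the last match is the newest.
    return journal_text[matches[-1].start():].strip()
```

If entries are ever inserted out of order, the helper would need to parse and compare the timestamps instead of relying on file position.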


Format

## Session YYYY-MM-DD HH:MM UTC — <backend> <model>
**Left off:** <one sentence: what was the last thing done>
**Phase / loop state:** <phase X [N/11], loops RUNNING/stopped, cc-ci healthy/issue>
**Open items:** <bullet list of anything the next session needs to act on, or "none">
**Notes:** <anything surprising, a decision made, a known blocker, etc.>

### Event HH:MM — <short label>
<brief note>
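As an illustration of the format above, a session block could be generated and appended roughly as follows. This is a sketch under stated assumptions: `format_session_entry` and `append_session_entry` are hypothetical helper names, and rendering open items as `- ` bullets (or the word "none") is one reading of the template rather than a confirmed convention.

```python
from datetime import datetime, timezone
from pathlib import Path


def format_session_entry(backend, model, left_off, state, open_items, notes, now=None):
    """Build a '## Session' block following the journal's Format section."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d %H:%M")
    if open_items:
        # Assumption: open items are rendered as one "- " bullet per line.
        items = "\n" + "\n".join(f"- {item}" for item in open_items)
    else:
        items = " none"
    return (
        f"## Session {stamp} UTC \u2014 {backend} {model}\n"
        f"**Left off:** {left_off}\n"
        f"**Phase / loop state:** {state}\n"
        f"**Open items:**{items}\n"
        f"**Notes:** {notes}\n"
    )


def append_session_entry(journal_path, entry):
    """Append the block to the journal, separated by a blank line."""
    with Path(journal_path).open("a", encoding="utf-8") as fh:
        fh.write("\n" + entry)
```

Appending rather than rewriting keeps earlier entries untouched, which matches the journal's role as a handoff record.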

Session 2026-05-31 ~18:30 UTC — Claude Sonnet 4.6

Left off: Got opencode/deepseek-v4-pro working as the loop backend. Both builder and adversary are actively running on tinfoil/deepseek-v4-pro (via inference.tinfoil.sh). Phase 5 [11/11] in progress. The operator is debugging the opencode web UI visibility and wants to continue orchestrating from opencode itself.

Phase / loop state:

  • Phase 5 [11/11] (plan-phase5-verify-upgrade-flow.md), in progress
  • Latest product-repo commit: de635ad status(5): V3 DONE (custom-html-tiny upgrade GREEN); V7 DONE; A5-1/A5-2 fixed
  • Loops RUNNING on opencode/deepseek-v4-pro, actively processing (32–62K tokens in flight)
  • Watchdog RUNNING, backend persisted to .loop-backend / .loop-model files

Open items for next session:

  • Phase 5 loops need to finish V1–V9 and write ## DONE to STATUS-5.md. They were at V3+V7 PASS before the backend switch. After phase 5 completes, phase 6 (reconcile-only over all 18 recipe mirrors) and phase 7 (full upgrade on n8n + ghost + matrix-synapse) still need to run.
  • Phase 4 (final review/polish) was deliberately deferred — run it after weekly Opus credits reset. Phase idx currently at 10 (phase 5). To run phase 4 later: set idx to 9, start with LOOP_BACKEND=claude RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • Check on the loops after reading this, and restart them only if needed: the current sessions are mid-processing. cc-ci-plan/launch.sh status will show state; if sessions are stalled, run LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • DNS: oc.commoninternet.net A 100.84.190.30 still needs adding (operator step). Web UI reachable directly at http://100.84.190.30 in the meantime.
  • Old Incus orchestrator VM (cc-ci-orchestrator, 100.116.55.106) still cold standby — stop + delete when confident in Hetzner.

Notes — opencode/tinfoil setup (critical for next session):

  • Backend files: LOOP_BACKEND=opencode and LOOP_MODEL=tinfoil/deepseek-v4-pro are persisted in /srv/cc-ci/.cc-ci-logs/.loop-backend and .loop-model. The watchdog reads these to restart dead sessions with the right backend.
  • API key: stored in /srv/cc-ci/.testenv as TINFOIL_API_KEY. Written directly (not via env:) into ~/.config/opencode/opencode.jsonc — opencode doesn't do env substitution in apiKey. The config also has "permission": "allow" (all tool calls auto-approved).
  • Inference URL: https://inference.tinfoil.sh/v1 (NOT api.tinfoil.sh — that's the control plane only). Fixed in both .testenv and opencode.jsonc.
  • Opencode web server: opencode-web.service runs opencode serve --hostname 127.0.0.1 --port 4096. Nginx proxies oc.commoninternet.net → localhost:4096 on the Tailscale IP. Sessions from the plain opencode TUI DO appear in the shared server's DB (they auto-connect via IPC), so the web UI should show them once DNS is set.
  • Launch command for opencode loops: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start
  • Launch command for claude loops (fallback): LOOP_BACKEND=claude LOOP_MODEL=sonnet RESUME_PHASE=1 cc-ci-plan/launch.sh start
  • Launchers rewritten to Python: launch.py, launch-orchestrator.py, launch-upgrader.py (bash wrappers are one-liners). All committed to recipe-maintainers/cc-ci-orchestrator (HEAD: 3412100).
  • Opencode binary: /home/loops/.local/bin/opencode v1.15.13. Re-install if missing: curl -sL https://github.com/anomalyco/opencode/releases/download/v1.15.13/opencode-linux-x64.tar.gz | tar -xz -C /home/loops/.local/bin opencode
  • Known opencode quirk: the loop bootstrap message (pointing to the kickoff file) is sent via ping_session with submit_key="Enter". The TUI needs ~8s to connect before the message can be sent. If a session seems stuck at the blank prompt, manually send the message from .cc-ci-logs/.kickoff-cc-ci-builder.txt (or the adv equivalent), then press Enter.
  • Orchestrator in opencode: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro cc-ci-plan/launch-orchestrator.sh fresh — no --resume (opencode doesn't support it); reads this JOURNAL.md as startup context.
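The backend persistence described in these notes can be sketched as follows. This is not the watchdog's actual code: the helper names are hypothetical, and the sketch assumes that .loop-backend and .loop-model each hold the bare value (for example `opencode`) rather than a `KEY=value` line. It only rebuilds the launch command line quoted above; it does not execute anything.

```python
from pathlib import Path


def read_persisted_backend(log_dir):
    """Read the persisted backend and model names from the log directory.

    Assumption: each file contains just the value, optionally newline-terminated.
    """
    log_dir = Path(log_dir)
    backend = (log_dir / ".loop-backend").read_text(encoding="utf-8").strip()
    model = (log_dir / ".loop-model").read_text(encoding="utf-8").strip()
    return backend, model


def restart_env(log_dir):
    """Environment variables for a restart that resumes the current phase."""
    backend, model = read_persisted_backend(log_dir)
    return {"LOOP_BACKEND": backend, "LOOP_MODEL": model, "RESUME_PHASE": "1"}


def restart_command_line(log_dir):
    """Render the restart as the shell one-liner used in this journal."""
    prefix = " ".join(f"{key}={value}" for key, value in restart_env(log_dir).items())
    return f"{prefix} cc-ci-plan/launch.sh start"
```

Reading the backend from files rather than from the caller's environment is what lets a restart after a crash or reboot come back on the same backend without operator input.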

Event 04:13 — migrated orchestrator to Hetzner cpx22

cc-ci-loops.service enabled, reboot-resilient. cc-ci server also Hetzner (server 134485294, ssh cc-ci @ 100.95.31.88).

Event 13:22 — phase 4 paused, phase 5 started

Weekly Opus credits exhausted mid-session. Switched to Sonnet. Phase idx manually set to 10 (phase 5).

Event 17:29 — loops stopped to switch backends

Event 18:20 — opencode/deepseek loops running

After 7 bug fixes (wrong inference host, opencode run exits, --dir exits, env: not substituted in apiKey, permission prompts, submit key, timing), both loops are now running on tinfoil/deepseek-v4-pro via the shared opencode-web.service.


Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6

Left off: Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public 168.119.126.100, tailnet cc-ci-orchestrator-1 @ 100.84.190.30). The old Incus VM (100.116.55.106) is still on the tailnet — cold standby, not yet deleted.

Phase / loop state: Phases 1c–1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11] (upgrade-flow verify) is in progress: loops are running and actively verifying the !testme end-to-end flow on the new Hetzner cc-ci server.

Open items:

  • Phase 5 is in progress: loops need to finish V1–V9 and write ## DONE to STATUS-5.md.
  • Phase 4 (final review/polish) was deliberately skipped this session — it is queued at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset.
  • Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n + ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE.
  • Old Incus orchestrator VM (cc-ci-orchestrator, 100.116.55.106) is still running — stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at /srv/incus-terraform-nix-vm-creator/terraform-secrets/.
  • DNS: oc.commoninternet.net A record → 100.84.190.30 still needs adding (operator step).

Notes:

  • cc-ci-loops.service is enabled and wired with reboot-log.sh ExecStartPre — a reboot is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1.
  • The cc-ci server also moved to Hetzner (server 134485294, ssh cc-ci @ 100.95.31.88). It has authenticated Docker Hub pulls and 150 GB disk, so the old OOM / disk-starvation / rate-limit issues are gone.
  • All recipe mirrors currently reconcile correctly; no stale open PRs observed.
  • opencode v1.15.13 installed at /home/loops/.local/bin/opencode. Tinfoil API key is in .testenv as TINFOIL_API_KEY. Backend switch: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start.
  • Launcher scripts rewritten to Python (launch.py, launch-orchestrator.py, launch-upgrader.py); bash wrappers are now one-liners that exec python3 <script> "$@".

Event 03:13 — migrated from old Incus VM to Hetzner

Loops were started manually during staging (not by the service); first systemd-managed boot was later this session. cc-ci-loops.service now enabled.

Event 05:23 — phase 3 (results-UX) completed

All R1–R8 Adversary-verified, no VETO. Watchdog auto-advanced to phase 4.

Event 13:22 — phase 4 paused, jumped to phase 5

Operator deferred phase 4 (weekly Opus credits exhausted). Phase idx manually set to 10 (phase 5). Loops restarted on Sonnet.

Event 17:29 — loops stopped pending restart on different model

Operator paused loops to reconfigure backend (opencode/tinfoil exploration). Phase 5 [11/11] was in progress — loops had verified V1/V2/V3/V7 (custom-html-tiny upgrade GREEN). Phase idx = 10 (phase 5), loops stopped, watchdog stopped.


Session 2026-06-01 03:34 UTC — OpenCode GPT-5.4

Left off: Fixed opencode web visibility for the Builder/Adversary loop sessions by switching the loop launcher from plain TUI startup to opencode attach against the shared web server, and patched the orchestrator launcher the same way for the next session.

Phase / loop state:

  • Phase 5 [11/11] (plan-phase5-verify-upgrade-flow.md), still in progress
  • Loops RUNNING on opencode with OpenAI gpt-5.4
  • Watchdog RUNNING
  • opencode-web.service RUNNING and nginx still serving http://oc.commoninternet.net

Open items:

  • Start a fresh orchestrator session in opencode if desired; this current conversation cannot be resumed as an opencode session, only handed off.
  • If you want the orchestrator tmux session to move from Claude to opencode, use LOOP_BACKEND=opencode LOOP_MODEL=openai/gpt-5.4 ORCH_SESSION=cc-ci-orchestrator-oc cc-ci-plan/launch-orchestrator.sh fresh or stop/recreate cc-ci-orchestrator-vm explicitly.
  • Phase 5 work itself is still unfinished; loops should continue from current state.
  • Phase 4 remains deferred; phases 6 and 7 still remain after phase 5 completes.

Notes:

  • The key fix for web visibility was opencode attach http://127.0.0.1:4096 --dir .... Plain opencode TUI sessions were inconsistently recorded and often did not show in the web UI.
  • The path choice was much less important than attach mode. We tested both symlinked and real repo paths. Attach mode was the real fix.
  • One attached loop initially hit python3: not found because tool execution started flowing through the shared opencode-web.service environment. Fixed by broadening the service PATH at runtime and in nix/hosts/cc-ci-orchestrator-hetzner/configuration.nix.
  • Current launcher state: cc-ci-plan/launch.py uses attach mode for opencode loops; cc-ci-plan/launch-orchestrator.py is patched to use attach mode for opencode orchestrator sessions too.
  • A runtime systemd override was applied at /run/systemd/system/opencode-web.service.d/override.conf. Persist the final service environment with nixos-rebuild when convenient.
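The attach-mode startup noted above might look like this inside a launcher. It is a sketch only: `opencode_attach_argv` and `server_is_listening` are hypothetical names, the working directory is a parameter because the journal elides the `--dir` value, and the reachability check is an added assumption rather than something the launchers are known to do.

```python
import socket


def opencode_attach_argv(workdir, server_url="http://127.0.0.1:4096"):
    """Build the argv for attaching a session to the shared opencode server."""
    return ["opencode", "attach", server_url, "--dir", str(workdir)]


def server_is_listening(host="127.0.0.1", port=4096, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Checking the port before attaching gives a clearer failure than a session that starts and then silently fails to register with the shared server.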