diff --git a/AGENTS.md b/AGENTS.md
index e9dfad1..c54783d 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -16,18 +16,18 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t
 The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
 watches from outside.
 
-## On startup: announce yourself + report reboots
+## On startup: read the journal, announce yourself, report reboots
 
-**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
-the operator wants to know the supervising session is back (especially after a reboot, which kills
-this session). Include the current phase and the reboot count. Steps on startup:
-1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and `cc-ci-plan/launch.sh status`
-   (current phase + whether the loops/watchdog are running).
-2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
+**Every time you (the orchestrator) start or resume:**
+1. **Read `cc-ci-plan/JOURNAL.md`** — the most recent `## Session` entry is where the previous
+   session left off. This is the persistent handoff record; read it before anything else.
+2. Read `cc-ci-plan/REBOOTS.md` (count entries) and run `cc-ci-plan/launch.sh status`
+   (current phase + whether loops/watchdog are running).
+3. **`PushNotification`** (proactive): *"cc-ci orchestrator online — phase X, loops+watchdog
    running; N reboots logged (last )."*
-3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the
-   loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
-   `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+4. If loops are down, relaunch: `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+5. **On handoff / end of session:** append a `## Session` block to `JOURNAL.md` summarising
+   what happened, current state, and open items (see format in that file).
 
 Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
 to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops
diff --git a/cc-ci-plan/JOURNAL.md b/cc-ci-plan/JOURNAL.md
new file mode 100644
index 0000000..a10d9bd
--- /dev/null
+++ b/cc-ci-plan/JOURNAL.md
@@ -0,0 +1,82 @@
+# Orchestrator journal
+
+This file is the **persistent handoff record** for the cc-ci orchestrator. Every orchestrator
+session (whether Claude or opencode) reads this on startup and appends to it when handing off or
+when something noteworthy happens. It survives conversation resets — it is the memory that
+`--resume` can't provide for opencode, and a more readable supplement for Claude sessions.
+
+**On startup:** read this file before doing anything else. The most recent `## Session` entry
+is where the previous session left off. Carry that context forward.
+
+**On handoff / end of session:** append a `## Session` block (see format below) summarising
+what happened, the current state, and anything the next session needs to know.
+
+**On significant events mid-session:** append a `### Event` sub-entry (no need to wait for
+handoff).
+
+---
+
+## Format
+
+```markdown
+## Session YYYY-MM-DD HH:MM UTC — 
+**Left off:** 
+**Phase / loop state:** 
+**Open items:** 
+**Notes:** 
+
+### Event HH:MM — 
+
+```
+
+---
+
+## Session 2026-05-31 ~04:00 UTC — Claude Sonnet 4.6
+
+**Left off:** Completed the orchestrator → Hetzner migration (cpx22, server 134487234, public
+`168.119.126.100`, tailnet `cc-ci-orchestrator-1` @ `100.84.190.30`). The old Incus VM
+(`100.116.55.106`) is still on the tailnet — cold standby, not yet deleted.
+
+**Phase / loop state:** Phases 1c–1e, 2w, 2pc, 2, 2b, 3 all DONE. Phase 5 [11/11]
+(upgrade-flow verify) in progress — loops running, actively verifying the `!testme`
+end-to-end flow on the new Hetzner cc-ci server.
+ +**Open items:** +- Phase 5 is in progress — loops need to finish V1–V9 and write `## DONE` to STATUS-5.md. +- Phase 4 (final review/polish) was deliberately **skipped** this session — it is queued + at idx 9 in PHASE_IDX_FILE. Resume it after the weekly Opus credits reset. +- Phase 6 (reconcile-only over all 18 recipe mirrors) and Phase 7 (full upgrade on n8n + + ghost + matrix-synapse) are planned but not yet started — run them after Phase 5 DONE. +- Old Incus orchestrator VM (`cc-ci-orchestrator`, `100.116.55.106`) is still running — + stop it via the b1 Incus API once happy with the Hetzner box. mTLS certs at + `/srv/incus-terraform-nix-vm-creator/terraform-secrets/`. +- DNS: `oc.commoninternet.net` A record → `100.84.190.30` still needs adding (operator step). + +**Notes:** +- `cc-ci-loops.service` is **enabled** and wired with `reboot-log.sh` ExecStartPre — a reboot + is a non-event; loops + watchdog auto-resume via RESUME_PHASE=1. +- The cc-ci **server** also moved to Hetzner (server 134485294, `ssh cc-ci` → + `100.95.31.88`). It has authenticated Docker Hub pulls and 150 GB disk — the old OOM / + disk-starvation / rate-limit issues are gone. +- All recipe mirrors currently reconcile correctly; no stale open PRs observed. +- `opencode` v1.15.13 installed at `/home/loops/.local/bin/opencode`. Tinfoil API key is in + `.testenv` as `TINFOIL_API_KEY`. Backend switch: `LOOP_BACKEND=opencode + LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start`. +- Launcher scripts rewritten to Python (`launch.py`, `launch-orchestrator.py`, + `launch-upgrader.py`); bash wrappers are now one-liners that `exec python3