cc-ci-orchestrator

Author	SHA1	Message	Date
autonomic-bot	d219b0972c	journal: BUILD COMPLETE + weekly-upgrade cron cutover to NixOS timer (Sun 02:00 UTC) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 23:26:59 +00:00
autonomic-bot	d8f558e987	journal: backend reverted to claude, waker folded into watchdog, boot service fixed Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 21:48:09 +00:00
autonomic-bot	2235110e29	journal: phase-5 progress-monitor events (19:04, 19:08) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 21:46:29 +00:00
autonomic-bot	ca6e68c08d	feat(orchestrator): fold hourly supervision wake into the watchdog The standalone ai-progress-monitor.sh waker pinged a hardcoded orchestrator session every 15m. Move that into the watchdog loop: ORCH_WAKE_INTERVAL (default 3600s) types the supervision prompt into the live orchestrator session, retrying each tick until it lands so a busy or briefly-absent orchestrator is never interrupted and no hour is skipped. Delete the now-redundant waker script; the prompt file is now driven by the watchdog. Reboot-safe by inheritance (the watchdog is started by cc-ci-loops.service). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 21:46:20 +00:00
autonomic-bot	8f7265e948	feat(orchestrator): wake the live monitor session	2026-06-01 18:51:05 +00:00
autonomic-bot	9fe9d49cac	journal: record Hetzner rescue recovery for cc-ci	2026-06-01 13:55:15 +00:00
autonomic-bot	8093a95184	journal: session 2026-06-01 03:34 UTC handoff (opencode gpt-5.4 visible)	2026-06-01 13:03:51 +00:00
autonomic-bot	837fed17d2	fix(orchestrator): attach opencode session from orchestrator repo	2026-06-01 13:03:51 +00:00
autonomic-bot	24bf379b5b	feat(assistant): add opencode launcher and phase 6/7 plans	2026-06-01 12:59:03 +00:00
autonomic-bot	6a6c17f526	fix(launch-orchestrator): opencode uses plain TUI + ping, not run --attach Same fix as the loops: opencode run --attach exits after one turn; plain opencode TUI stays alive in tmux. Send startup prompt via ping_session (Enter) after 8s init wait. Bootstrap points to JOURNAL.md rather than sending the full prompt inline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 18:30:09 +00:00
autonomic-bot	2aa3fbda8d	journal: session 2026-05-31 18:30 UTC handoff (opencode/deepseek running, phase 5)	2026-05-31 18:27:17 +00:00
autonomic-bot	3412100240	fix(opencode): all issues from first live run resolved 1. API key: opencode doesn't support env: substitution in apiKey — write actual key value to ~/.config/opencode/opencode.jsonc at setup time (file is not committed to git; key sourced from .testenv). 2. Permission system: add permission:"allow" to opencode config (equivalent to --dangerously-skip-permissions) to avoid interactive prompts. 3. Submit key: opencode TUI uses Enter (return) to submit; Ctrl+S not needed. ping_session already uses Enter — keep as is. 4. Startup timing: bump opencode TUI init wait from 4s to 8s so the TUI is fully connected to the server before bootstrap is sent. 5. Backend persistence: LOOP_BACKEND/LOOP_MODEL written to .loop-backend / .loop-model so the watchdog uses them when restarting dead sessions. All tested: both builder and adversary sessions alive, deepseek-v4-pro processing kickoffs via tinfoil inference.tinfoil.sh, no API/permission errors. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 18:21:10 +00:00
autonomic-bot	cd5e645427	fix(opencode): use inference.tinfoil.sh + attach TUI + NO_COLOR Three fixes discovered during first live run: - inference host is inference.tinfoil.sh not api.tinfoil.sh (control plane only serves /v1/models, not /v1/chat/completions) - opencode run exits after one turn; switch to opencode attach for the persistent TUI, then ping_session sends the kickoff prompt - NO_COLOR=1 suppresses the first-run interactive theme picker Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:56:06 +00:00
autonomic-bot	bca51071bd	refactor: rewrite launchers as Python; add orchestrator JOURNAL.md Bash scripts are now one-liner wrappers: exec python3 <script>.py "$@" All logic lives in the Python scripts (pure stdlib, no deps). launch.py — loops + watchdog: Full port of launch.sh: phase sequencing, start/stop/status/logs/watchdog, handoff signalling, stall detection, heal_session, heal_orchestrator. Cleaner structure: config block → helpers → phase/kickoff/agent/healing/ handoff/watchdog/main. LOOP_BACKEND + LOOP_MODEL switches throughout. launch-orchestrator.py — orchestrator session: claude path: --resume <id> preserved (conversation survives reboots). opencode path: run --attach --title (no --resume; STARTUP_PROMPT orients the new session; reads JOURNAL.md for context). STARTUP_PROMPT updated to reference JOURNAL.md on startup. launch-upgrader.py — one-shot upgrade job: LOOP_BACKEND / LOOP_MODEL take precedence over UPGRADER_BACKEND / UPGRADER_MODEL. Both claude and opencode paths supported. cc-ci-plan/JOURNAL.md — new orchestrator handoff file: Persistent across conversation resets. Documents the handoff format and carries the current session's summary: migration complete, phase 5 in progress (V3/V7 PASS), phase 4 deferred, open items for next session. AGENTS.md: step 1 on startup = read JOURNAL.md; step 5 = append on handoff. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:50:09 +00:00
autonomic-bot	e0e5bf6e64	feat: opencode web at oc.commoninternet.net (one server, named sessions) configuration.nix: - systemd.services.opencode-web: one shared opencode server on 127.0.0.1:4096, EnvironmentFile=/srv/cc-ci/.testenv (TINFOIL_API_KEY), ExecStartPre clears stale /tmp/opencode so restarts never fail on the EEXIST race. - services.nginx: reverse-proxy oc.commoninternet.net → localhost:4096, bound to tailscale IP 100.84.190.30 (tailnet-only, plain HTTP). DNS: A record oc.commoninternet.net → 100.84.190.30 (operator step). launch.sh + launch-upgrader.sh: - Drop per-session ports / OPENCODE_HOST; add OPENCODE_SERVER=http://127.0.0.1:4096. - opencode backend: agents use `opencode run --attach $OPENCODE_SERVER --title $session` so each shows up as a named session in the web UI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:37:03 +00:00
autonomic-bot	a87d42f491	feat: opencode/tinfoil backend support in all launchers Adds LOOP_BACKEND=opencode|claude (+ LOOP_MODEL) to launch.sh and launch-upgrader.sh, enabling the loops/upgrader to run via opencode CLI against the tinfoil.sh API (deepseek-v4-pro etc.) instead of Claude. launch.sh: - LOOP_BACKEND (claude|opencode), LOOP_MODEL env vars - OPENCODE_BIN, OPENCODE_HOST (tailscale IP), OPENCODE_PORT (per-session) - start_agent: backend switch — claude path unchanged; opencode starts `opencode --hostname <ts-ip> --port <N> run <kickoff>` so the web UI is bound to the tailscale interface (tailnet-only observability) - preflight: validates the right binary per backend - heal_session / heal_orchestrator: extend active-work detection to opencode spinner chars + "Running tool" - help: shows both backend configs launch-upgrader.sh: - UPGRADER_BACKEND / UPGRADER_MODEL (LOOP_BACKEND/LOOP_MODEL override) - start: same backend switch as launch.sh - OPENCODE_PORT=4098 (separate from loops 4096/4097) configuration.nix: note opencode binary location + re-install command. Tinfoil config: ~/.config/opencode/opencode.jsonc — provider "tinfoil" with baseURL=https://api.tinfoil.sh/v1, apiKey=env:TINFOIL_API_KEY (key + TINFOIL_MODEL + TINFOIL_BASE_URL stored in .testenv). opencode v1.15.13 installed at /home/loops/.local/bin/opencode. Usage: LOOP_BACKEND=opencode LOOP_MODEL=tinfoil/deepseek-v4-pro RESUME_PHASE=1 cc-ci-plan/launch.sh start Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 17:21:13 +00:00
autonomic-bot	25fd7407fd	launch-upgrader: default model to sonnet (UPGRADER_MODEL) Adds UPGRADER_MODEL env var (default: sonnet) passed as --model to the claude invocation. The cron runs the upgrader on Sonnet so it doesn't consume Opus weekly credits. Override with UPGRADER_MODEL=opus if needed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 13:24:53 +00:00
autonomic-bot	21e7a79f50	orchestrator-hetzner: enable reboot-resilience + record migration Now the workspace is staged on the Hetzner cpx22 (server 134487234, public 91.98.47.73, tailnet cc-ci-orchestrator-1 @ 100.84.190.30): - configuration.nix: enable cc-ci-loops.service (wantedBy multi-user.target) so the loops + watchdog auto-resume on boot; wire reboot-log.sh as ExecStartPre so reboots auto-log to REBOOTS.md (boot_id-gated). - plan-orchestrator-hetzner-migration.md: full migration record. - REBOOTS.md / AGENTS.md: point the orchestrator host at Hetzner; first auto-logged reboot line. - launch-orchestrator.sh: default session id -> the Hetzner orchestrator session. - flake.lock: pin inputs. Verified: nixos-rebuild switch applied; systemctl is-enabled cc-ci-loops.service = enabled; ExecStartPre logged this boot to REBOOTS.md; loops healthy on phase 2. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 03:54:17 +00:00
autonomic-bot	4c418765c8	plan: full migrate-cc-ci-to-hetzner (provision cpx32 → benchmark 2 recipes → cutover loops+pipeline+DNS → retire Incus VM); age key is on the VM so no secret-blocker; harden .gitignore for the age key	2026-05-31 02:04:02 +00:00
autonomic-bot	102427ab5b	plan: full migrate-to-Hetzner (provision → cut over loops → stop old b1 VM); server type cpx31→cpx32 - plan-cc-ci-hetzner-migration.md: 3-phase plan — (1) provision the Hetzner cpx32 cc-ci fully + green !testme readiness gate, (2) repoint the loops + dashboard + *.ci at it (one ssh-config + DNS change), (3) stop the b1 cc-nix-test (cold standby). Parallel bring-up, reversible cutover, b1 freed. - plan-cc-ci-hetzner-terraform.md: cpx31 is retired → default to cpx32 (current dedicated-vCPU 8GB). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-31 01:15:29 +00:00
autonomic-bot	b98e527656	plan: switch cc-ci cloud terraform from DigitalOcean to Hetzner (cx32 8GB, hcloud provider, nixos-infect + D8 flake flow)	2026-05-31 00:25:05 +00:00
autonomic-bot	67226efe72	plan: cc-ci on DigitalOcean — terraform/ + nixos-infect + nix provisioning (8GB droplet, reproducible from the cc-ci flake)	2026-05-31 00:18:27 +00:00
autonomic-bot	01874821f2	decommission Pi: update all docs for VM-only setup The orchestrator Pi is retired (2026-05-31). All agents now run on the cc-ci-orchestrator VM (NixOS, loops user, /srv/cc-ci). The VM is a direct tailnet peer to cc-ci — no SOCKS proxy, no userspace tailscaled, no ProxyCommand. Updated across all affected files: AGENTS.md - Remove Pi from reboot description; migration complete (not "parked") - cc-ci access: direct ssh, not via proxy kickoff.md - Prerequisites: direct tailnet peer, not proxy - Host deps: NixOS (not apt) - Fallback/Incus: b1 reachable directly, no --proxy curl flag plan.md §1 + §1.5 - §1 bootstrap: direct SSH, check tailscale status (not restart proxy) - §1.5 intro: "VM" not "sandbox host"; no proxy - Credentials table: remove TS_AUTH_KEY row; update cc-ci SSH row - Replace "Tailscale connection (proxy)" subsection with direct-peer description plan-orchestrator-migration.md - Mark COMPLETE (2026-05-31); historical record only plan-phase1c-full-reproducibility.md - Incus access: direct, not via SOCKS proxy prompts/builder.md + prompts/adversary.md - cc-ci access language only: direct ssh, no proxy restart instructions - adversary: *.ci.commoninternet.net via plain curl, no proxy flag REBOOTS.md - Retitle for VM; note Pi retired; Pi entries marked historical systemd/cc-ci-loops.service - User/Group/HOME/PATH: notplants → loops - Remove cc-ci-tailscaled.service dependency (no proxy on VM) - Add note about nix/configuration.nix as the authoritative VM declaration test-e2e-testme-acceptance.md - tailscale status: no --socket flag - ssh to throwaway: no ProxyCommand Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 00:16:37 +00:00
autonomic-bot	db375bcc07	rename to cc-ci-orchestrator: update all repo name references Gitea repos renamed: cc-ci-autonomous-orchestrator → cc-ci-orchestrator cc-ci-orchestrator → archived-cc-ci-orchestrator Updated in this workspace: - README.md, AGENTS.md: repo title - cc-ci-plan/plan-orchestrator-migration.md: cc-ci-autonomous-orchestrator refs - cc-ci-plan/plan-repo-consolidation.md: marked complete + Pi remote-update notice - cc-ci-plan/launch-orchestrator.sh, launch.sh: session naming comment cleanup NOTE: Pi clone still has the old origin URL. On the Pi, run: git remote set-url origin https://git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator.git Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-31 00:03:11 +00:00
autonomic-bot	2ef90a4237	launch-assistant.sh: run the assistant on sonnet (ASSISTANT_MODEL, default sonnet)	2026-05-30 23:54:25 +00:00
autonomic-bot	2233c6182a	add launch-assistant.sh: cc-ci-assistant — remote-control, non-loop helper A general-purpose Claude session sharing the orchestrator's workspace + access, under remote-control (cc-ci-assistant), NOT on a loop. Sits idle until the orchestrator/operator hands it a plan/task, does it, reports, waits. Modelled on launch-orchestrator.sh: persistent pinned session-id (resume across relaunch), root-aware --dangerously-skip-permissions handling, start/fresh/status/attach/stop. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 23:52:08 +00:00
autonomic-bot	b550d6c432	plan stub: repo consolidation (merge 2 orchestrator repos) + references/recipe-maintainer as a submodule — deferred until credits (operator 2026-05-30)	2026-05-30 23:47:07 +00:00
autonomic-bot	fffd83fe4b	launch.sh: use CLAUDE_DANGEROUSLY_SKIP_PERMISSIONS env var when running as root (VM uses root; --dangerously-skip-permissions flag blocked by claude for root)	2026-05-30 19:36:35 +01:00
autonomic-bot	fd08a977d0	overlay policy: standardize the ccci overlay filename to compose.ccci.yml Operator: use a single uniform filename `compose.ccci.yml` per recipe (one file holding all cc-ci-side deploy tweaks) rather than per-purpose suffixes like compose.ccci-health.yml. Updated §9 + plan-ccci-compose-overlay-policy.md; added a DoD item to rename tests/{ghost,discourse}/compose.ccci-health.yml -> compose.ccci.yml and update their install_steps.sh cp target + recipe_meta COMPOSE_FILE. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:25:48 +01:00
autonomic-bot	5f34c0ad01	overlay policy (content): §9 guardrail rewrite + plan-ccci-compose-overlay-policy.md The prior commit only captured the file deletion (git add aborted on the already-removed pathspec). This adds the actual content: the reworked §9 guardrail (justified ccci overlays OK; abra can't env start_period; always test upgrade-to-latest, from-version custom tests skippable) and the new policy doc. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:19:18 +01:00
autonomic-bot	6cb5580390	overlay policy: ccci compose overlays OK when justified (abra can't env start_period); keep upgrade-to-latest Operator correction (builder was right): abra does NOT support an env value for healthcheck start_period, so the earlier "parameterize via APP_START_PERIOD env PR" approach is impossible — a ccci compose overlay is the right tool there. - plan.md §9: replace the "don't fork compose / use env PR" guardrail with "avoid where possible + justify each + prefer upstream PR, BUT a uniform optional compose.ccci-*.yml overlay is an acceptable fallback" (esp. for abra-unparameterizable values like start_period). Add the upgrade-tier rule: ALWAYS test the upgrade to latest; a from-version's custom tests may be skipped if it can't fully run, but never drop upgrade-to-latest. - replace plan-prefer-env-over-compose-overlay.md with plan-ccci-compose-overlay-policy.md: ghost/discourse start_period overlays STAY (justified); discourse image re-pin STAYS (keeps the upgrade-to-latest testable; 0.7.0 custom tests may be skipped); mumble old-base host-ports copy DROPPED (skip 0.2.0 voice tests, still upgrade to latest + test there). Each surviving overlay must be minimal + header-justified + Adversary-confirmed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:18:53 +01:00
autonomic-bot	e7f05ceffe	orchestrator-migration: Phase A COMPLETE (cc-ci-orchestrator VM up + ssh) + reboot #3 log Phase A done before the Pi's reboot #3 (commit was interrupted): the loops VM cc-ci-orchestrator is on the tailnet (100.116.55.106) and ssh-able; TS-key finding recorded (VM-creator .test.env key revoked; cc-ci .testenv key valid + persisted). REBOOTS.md carries the auto-logged 2026-05-30 17:03 reboot (cc-ci-loops.service auto-recovered the loops at phase 2; swapfile persisted). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 17:07:02 +01:00
autonomic-bot	742a08b677	orchestrator-migration: Phase A started — cc-ci-orchestrator VM created (2GB/2vCPU) Operator go-ahead (Pi is OOM-thrashing/slow). Created the dedicated loops VM cc-ci-orchestrator (2GB RAM / 2 vCPU / 30GB, incus-base-vm NixOS) on b1 via the Incus API, mirroring the known-good cc-nix-test spec; started it — cloud-init is running nixos-rebuild boot + reboot + tailnet join. Status flipped DRAFT->IN PROGRESS with the remaining Phase-A items noted (add cc-ci-root key via incus exec, confirm tailnet+ssh, write the reproducible TF project). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:53:06 +01:00
autonomic-bot	71a4a1fea4	Reliable loop messaging: msg-loop.sh + hardened ping_session (retry submit) tmux `send-keys -l <long msg>` often leaves the text UNSENT in the input box (the immediate Enter is swallowed while the TUI ingests the paste). Both now type the message then retry Enter/C-m until the leading text is no longer in the input box (= submitted) or a bounded loop gives up. - msg-loop.sh: standalone reliable messenger for orchestrator use. - launch.sh ping_session: same retry-submit (loads on next watchdog restart). Live-tested: delivered first try. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:31:28 +01:00
autonomic-bot	7a1f7f75aa	Policy: prefer upstream env-parameterization over cc-ci compose overlays Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from the recipe users actually run — avoid it wherever possible. - plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer healthcheck start_period for the slow single node), the preferred fix is an UPSTREAM recipe PR exposing it as an env var (e.g. APP_START_PERIOD) with the current value as the default in env.sample — CI sets the env, no new compose. For making the upgrade tier work from an older base version, prefer DECLARING that version not-testable under this CI env over crafting a custom compose. Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR. - plan-prefer-env-over-compose-overlay.md: migrates the existing debt — ghost/discourse compose.ccci-health.yml start_period -> APP_START_PERIOD recipe PRs (default=current) then drop the overlays; discourse image re-pin + mumble old-base host-ports copy -> declare those old versions untestable instead of forking compose. No test weakened; untestable-version is an honest outcome. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 15:17:42 +01:00
autonomic-bot	a89b082240	plan §7: recommend Monitor-on-convergence pattern for long deploys (builder's idea) For a long deploy/convergence, arm a Monitor that polls the node every ~30s and wakes on convergence OR failure, with a longer fallback heartbeat (ScheduleWakeup) as a backstop. Proceeds the instant it converges (no over-waiting), surfaces failures promptly, and the heartbeat bounds the wait. Size the timeout sanely (longer if justified, never absurd like the ~40-min ghost case). Credit: builder. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 05:17:18 +01:00
autonomic-bot	e85e16318c	Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly (bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software micro-optimizations are judged unlikely to help, so: - IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline, attribution) + the optimization menu (image cache/prepull, readiness tuning, warm-SSO start/stop, runner caching, concurrency sizing, resources, secret overhead) under "Phase-2b empirical performance work", revisit only if measurement later proves a specific software bottleneck. - plan-phase2b: reduced to ONE goal — confirm (and fix if needed) that the per-recipe test sequence already uses the minimum deploys (1 base shared by install+functional+backup/restore, +1 for the upgrade tier, +1 per dep), enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 05:07:49 +01:00
autonomic-bot	1c2be64124	Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff Operator: when the final phase completes, install the weekly cron anchored to actual completion — first run ~1h after the build finishes, weekly from then on (supersedes the fixed "Sat 03:00 UTC" placeholder). - plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH + tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff: cheap --dry-run pre-check, then confirm the real T0 fire launched the cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced); record schedule + verified kickoff in DECISIONS.md. - upgrade-all skill Cron section + cron memory updated to the completion-anchored schedule + first-kickoff verification. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 21:21:20 +01:00
autonomic-bot	bf71420106	Add cc-ci-upgrader agent: observable one-shot weekly upgrade-run agent The weekly upgrade run now executes inside a dedicated, remote-control agent (cc-ci-upgrader) — viewable/steerable at claude.ai/code like the Builder — rather than buried in headless cron output. - launch-upgrader.sh: spins up the cc-ci-upgrader tmux session under --remote-control with a kickoff that runs /upgrade-all (DEFAULT mode) to completion. On finish the agent STOPS and stays idle (does NOT self-terminate) so the run + summary stay reviewable in the web UI. `start` = use-or-create: leaves an in-flight (busy) run alone, else clears a finished/idle/wedged session and runs fresh; `fresh` always restarts. UPGRADER_ARGS passes flags (e.g. --dry-run); never --with-tests. - launch.sh: orchestrator_alive() now also skips the cc-ci-upgrader remote-control name, so the upgrader job isn't mistaken for the orchestrator. - upgrade-all skill: documents it runs as the cc-ci-upgrader agent; the weekly cron invokes `launch-upgrader.sh start` (not /upgrade-all inline). - Phase 5: V8a verifies the agent lifecycle (launch → run to completion → stay idle/viewable → next start clears it); V9 stops the verification session. - cron memory: weekly task = launch-upgrader.sh start at 0 3 * * 6 UTC. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 21:12:47 +01:00
autonomic-bot	4f74676c72	Phase 5 (final): verify the /recipe-upgrade + testme-on-pr.sh end-to-end flow Appended as the LAST phase in the launcher sequence (… 3 4 5). It can only run once cc-ci is fully built — the !testme-on-recipe-PR flow depends on Phase 3 (results UX) surfacing the run result back on the PR for testme-on-pr.sh to read. DoD (Adversary cold-verifies): !testme on a recipe PR is the real gate + results land in the PR (V1); testme-on-pr.sh reads GREEN/RED/PENDING + BUILD url, POST=0 polls without re-triggering (V2); /recipe-upgrade default end-to-end green on a sandbox recipe, nothing merged (V3); the ≤3 !testme regression loop (V4); stale test DEFAULT = comment-only, no test edit (V5); --with-tests opens+verifies a cc-ci test PR, paired (V6); mirror reconcile closes merged/superseded PRs and main==upstream (V7); /upgrade-all default dry-run + small live run never edits tests (V8); all verification PRs closed + deploys torn down (V9). Use a sandbox recipe; never merge; never weaken tests. Watchdog reloaded (seq …3 4 5). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 20:38:39 +01:00
autonomic-bot	c7da03fa6c	watchdog: STALL_GRACE so stall_check never races a loop's own ScheduleWakeup Root cause of the adversary "overrun": stall_check rebooted the instant now >= WAITING-UNTIL (zero grace), but the loop's own ScheduleWakeup fires AT that stated time — and the runtime scheduled it ~40s later than the marker (date-vs-scheduler skew). So the watchdog pre-empted a HEALTHY self-wake by ~37s; the loop wasn't wedged, it was killed just before it woke. That was the single false reboot at 18:55Z. Fix: split the two cases cleanly. - Marker present: reboot only when now > WAITING-UNTIL + STALL_GRACE (180s) — covers wake+start latency + marker/scheduler skew, so the watchdog only fires if the self-wake GENUINELY failed. - No marker: unchanged — reboot when idle >= STALL_IDLE (300s). Verified post-fix: adversary self-woke on time and re-paced (WAITING-UNTIL 19:19:30Z); no new stall reboots. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 20:12:46 +01:00
autonomic-bot	e8c4330ce3	watchdog: reboot idle-wedged loops via self-reported WAITING-UNTIL markers The builder wedged at the context limit (garbled output) — alive but matching none of heal_session's signals (dead/FATAL/limit), so the watchdog left it stuck. Fix: loops now declare every wait, and the watchdog reboots a wait that never resumes. - plan.md §7 + both prompts: cap every wait at 10 min (chunk longer waits); before going idle, the loop's FINAL line must be `WAITING-UNTIL: <ISO8601 UTC>` (the resume time, matching its ScheduleWakeup); run /compact proactively at ~80% context to avoid wedging near the limit. - launch.sh: new stall_check (runs every 30s signal tick) — reboots a loop idle >= STALL_IDLE (300s) when it has NO current WAITING-UNTIL marker as its last message OR is past the time the marker named; a healthy paced wait (marker present, before its time) is left alone. Complements heal_session's dead/FATAL/limit cases. Reboot is safe — loops re-orient from git + STATUS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 19:05:29 +01:00
autonomic-bot	27480b3513	Commit the 3r removal + skills-tracking .gitignore (missed in prior 2 commits) The earlier `git add` included an already-`git rm`'d pathspec, so it errored and staged nothing — launch.sh (3r removal) and .gitignore (track .claude/skills/) were left uncommitted while the skill files went in via a separate -f add. Runtime was already correct (watchdog reads the working-tree launch.sh); this just syncs git HEAD to the working tree. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 17:05:43 +01:00
autonomic-bot	2530845e50	orchestrator: add /ci-test-review skill (in THIS repo) + drop Phase 3r from loops queue The on-demand AI review layer is now an orchestration-repo skill built directly by the orchestrator, NOT a loops phase in the cc-ci product repo: - .claude/skills/ci-test-review/{SKILL.md,run-all-recipes.sh}: runs the real cc-ci harness across all enrolled recipes (deterministic, AI-free execution), then AI diagnoses each failure and classifies it as needing a recipe PR / a CI-server PR / a stale-test update — or reports "ALL PASSED, recipes + tests up to date". Proposes PRs; never decides pass/fail; never auto-merges. - .gitignore: track .claude/skills/ (shareable) while still ignoring local claude session state (locks, history) under .claude/. - launch.sh: remove Phase 3r from PHASES_SPEC; loops sequence back to 1c 1b 1d 1e 2w 2pc 2 2b 3 4. Deleted plan-phase3r (superseded by the skill). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 16:57:26 +01:00
autonomic-bot	5f84f8c028	plan: Phase 3r — /ci-test-review Claude skill (on-demand AI review + recipe-vs-CI PR diagnosis) Deterministic CI stays the primary, AI-free path. Adds a separate on-demand skill (ships in the cc-ci repo .claude/skills/ci-test-review/) that runs the full suite across all recipes and, per failure, AI-diagnoses + classifies: recipe PR (+ proposed change) vs CI-server PR vs stale-test; or 'all passed, recipes+tests up to date' (incl. a latest-version freshness check). Proposes, never auto-merges (operator-merge rule). Slotted 3 -> 3r -> 4. AI only diagnoses; execution stays deterministic. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 16:39:07 +01:00
autonomic-bot	61ab3ecb3a	plan: per-test image pre-pull sub-plan (warm images before deploy + upgrade; cheap on warm cache) Resolve a recipe's images (docker compose config --images) and docker pull them (skip-if-present for pinned tags) at the start of the recipe sequence + before the upgrade-new-version deploy, then the normal abra deploy. Separates pull from converge (clear pull failures vs murky convergence timeouts), speeds convergence (fits abra-native window). No layer re-download on warm cache; nightly all-recipes run warms everything. Complements (not replaces) the recipe healthcheck for slow-init convergence. Near-term Phase-2 harness unit; real abra deploy unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 14:55:21 +01:00
autonomic-bot	e7ed0e14b8	lasuite-drive PR: scope the repeated-green/3x bar to lasuite-drive (flakiness proof) — NOT the general standard (operator 2026-05-29) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 13:25:10 +01:00
autonomic-bot	7a87dc02b1	plan: lasuite-drive recipe-robustness PR sub-plan (collabora healthcheck + perms + lazy OIDC) Operator (2026-05-29): dedicated sub-plan for the upstream recipe PR. Fixes collabora WOPI healthcheck/start_period (keystone — fixes F2-12 at the source so cc-ci can return to abra-native convergence + drop the -c/READY_PROBE backstop), backend WOPI retry, gunicorn-perms race, lazy OIDC. PR is 'working' only when cc-ci runs the full suite incl. upgrade tier green + Adversary cold-verify, then operator merges. Broken out from plan-lasuite-drive-oidc-robustness.md Part B (now points here). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 12:58:36 +01:00
autonomic-bot	7f8e6cb13e	guardrail: abra convergence by default; custom READY_PROBE only when necessary + a real strict test (operator 2026-05-29, re F2-12) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 12:56:26 +01:00
autonomic-bot	294a8a1a9e	rename the opt-in heavy-tests flag: --extra-tests -> --extra (operator 2026-05-29) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-29 10:36:04 +01:00


90 Commits
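The stall rule from e8c4330ce3, as amended by c7da03fa6c, can be sketched in Python. Python has been the launchers' language since bca51071bd.

This is a minimal sketch, not the code in launch.py. A few things in it are hypothetical or assumed:

- `parse_marker`, `should_reboot` and the regex for the marker line are hypothetical.
- The caller is assumed to evaluate the check only for a session that already shows no active work.

The thresholds are the ones quoted in the commits: STALL_IDLE 300s and STALL_GRACE 180s.

```python
from datetime import datetime, timedelta, timezone
import re
from typing import Optional

# Thresholds quoted in the commit messages; the names mirror them, the code is hypothetical.
STALL_IDLE = timedelta(seconds=300)   # no marker: reboot once idle this long
STALL_GRACE = timedelta(seconds=180)  # marker present: slack past the named resume time

# Assumed marker shape: the loop's final line, e.g. "WAITING-UNTIL: 2026-05-29T19:19:30Z".
MARKER_RE = re.compile(r"^WAITING-UNTIL:\s*(\S+)$")


def parse_marker(last_line: str) -> Optional[datetime]:
    """Return the resume time named by a WAITING-UNTIL marker, or None if absent/invalid."""
    match = MARKER_RE.match(last_line.strip())
    if match is None:
        return None
    try:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it.
        resume = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
    except ValueError:
        return None
    # Treat a naive timestamp as UTC, since the markers are specified in UTC.
    if resume.tzinfo is None:
        resume = resume.replace(tzinfo=timezone.utc)
    return resume


def should_reboot(last_line: str, idle: timedelta, now: datetime) -> bool:
    """Decide whether an idle loop session counts as wedged.

    Assumes the caller has already seen no active work; `now` must be timezone-aware.
    """
    resume = parse_marker(last_line)
    if resume is not None:
        # Paced wait: only a self-wake overdue by more than the grace counts as failed.
        return now > resume + STALL_GRACE
    # No current marker: fall back to the plain idle threshold.
    return idle >= STALL_IDLE
```

With this split, the single false reboot described in c7da03fa6c stays under the 180s grace and is left alone. In that case the self-wake landed roughly 37s after the marker time.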
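Commit 71a4a1fea4 describes a retry-submit technique: type the text literally, then keep pressing Enter until the leading text has left the input box. It might look like this in Python.

`tmux send-keys -l` and `tmux capture-pane -p` are real tmux commands. Everything else is an assumption:

- treating the bottom few pane lines as "the input box";
- the 40-character probe;
- the retry counts.

The tmux runner and `sleep` are injectable, so the loop can be exercised without a live session.

```python
import subprocess
import time
from typing import Callable


def run_tmux(*args: str) -> str:
    """Run a tmux subcommand and return its stdout."""
    result = subprocess.run(["tmux", *args], check=True, capture_output=True, text=True)
    return result.stdout


def ping_session(
    target: str,
    message: str,
    tmux: Callable[..., str] = run_tmux,
    attempts: int = 10,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    probe_len: int = 40,
    box_lines: int = 3,
) -> bool:
    """Type `message` into a tmux pane, then retry Enter until it is submitted.

    Returns True once the leading text is gone from the bottom `box_lines`
    lines of the pane (assumed to be the TUI input box), False if it gives up.
    """
    # -l sends the text literally, so words like "Enter" are not read as key names.
    tmux("send-keys", "-t", target, "-l", message)
    probe = message[:probe_len]
    for _ in range(attempts):
        # Give the TUI time to ingest the paste; an immediate Enter is often swallowed.
        sleep(delay)
        tmux("send-keys", "-t", target, "Enter")
        sleep(delay)
        pane = tmux("capture-pane", "-p", "-t", target)
        input_box = "\n".join(pane.splitlines()[-box_lines:])
        if probe not in input_box:
            return True  # leading text left the input box: treat as submitted
    return False
```

Bounding the loop with `attempts` keeps a wedged pane from blocking the watchdog forever. A False return leaves the decision to retry or heal the session to the caller.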
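Commit 1c2be64124 anchors the weekly upgrade job to the build's completion time. The first run comes about an hour after completion, then weekly at the same weekday and time.

A minimal sketch of turning that into a crontab line follows. The function name and the round-up-to-the-minute choice are assumptions.

This shows the superseded scheme: d219b0972c later replaced the cron entry with a NixOS timer at a fixed Sun 02:00 UTC.

```python
from datetime import datetime, timedelta, timezone


def weekly_cron_line(completed_at: datetime, command: str,
                     delay: timedelta = timedelta(hours=1)) -> str:
    """Build a weekly crontab entry whose first fire is `delay` after completion (UTC).

    `completed_at` should be timezone-aware; a naive value is read as local time.
    """
    t0 = (completed_at + delay).astimezone(timezone.utc)
    # Cron has minute resolution; round up so the first fire is never earlier than T0.
    if t0.second or t0.microsecond:
        t0 = (t0 + timedelta(minutes=1)).replace(second=0, microsecond=0)
    # isoweekday() is Mon=1..Sun=7; cron's day-of-week field uses Sun=0..Sat=6.
    dow = t0.isoweekday() % 7
    return f"{t0.minute} {t0.hour} * * {dow} {command}"
```

Take a completion at 2026-05-30 02:00 UTC, a Saturday. The sketch yields `0 3 * * 6`, the same slot as the earlier fixed Sat 03:00 UTC placeholder from bf71420106.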
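Commit ca6e68c08d folds the hourly supervision wake into the watchdog. One property is worth pinning down: a due wake is retried on every tick until it lands, rather than being dropped.

A minimal sketch of that bookkeeping follows. `WakeScheduler` and the `try_deliver` callback are hypothetical.

One choice here is an assumption that the commit does not state: the next wake is scheduled one interval after the actual delivery, not after the original due time.

```python
from typing import Callable

ORCH_WAKE_INTERVAL = 3600.0  # default quoted in the commit (seconds)


class WakeScheduler:
    """Hypothetical helper deciding when the watchdog types the supervision prompt."""

    def __init__(self, interval: float = ORCH_WAKE_INTERVAL, start: float = 0.0) -> None:
        self.interval = interval
        self.next_due = start + interval

    def tick(self, now: float, try_deliver: Callable[[], bool]) -> bool:
        """Call once per watchdog tick; returns True only when a wake was delivered."""
        if now < self.next_due:
            return False  # not due yet
        if not try_deliver():
            # Orchestrator busy or briefly absent: keep next_due so the next tick retries.
            return False
        # Delivered: schedule the following wake one interval from now (assumption).
        self.next_due = now + self.interval
        return True
```

The callback is expected to type the prompt only when the orchestrator session is present and idle. That is how a busy orchestrator avoids being interrupted without losing the wake.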
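Phase 2b (e85e16318c) reduces to a counting rule:

- one base deploy, shared by install, functional and backup/restore;
- one more for the upgrade tier;
- one per dependency.

As arithmetic, with a hypothetical helper name and the assumption that every recipe has an upgrade tier:

```python
def min_deploys(n_deps: int) -> int:
    """Minimum deploys per recipe sequence under the Phase 2b rule (hypothetical helper)."""
    if n_deps < 0:
        raise ValueError("n_deps must be non-negative")
    base = 1     # shared by install + functional + backup/restore
    upgrade = 1  # the extra deploy for the upgrade tier
    return base + upgrade + n_deps  # plus one per dependency recipe
```

Under this rule, a recipe with two dependencies needs four deploys. A higher observed count would suggest a redundant deploy.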