Files

autonomic-bot 399e999978 weekly-run: watchdog resumes on proc-death; supervisor defers to watchdog

Live-testing the resume path surfaced two gaps: (1) an `opencode run` proc
EXITS when the model ends its turn, so a long /upgrade-all run's process dies
repeatedly before the whole run completes — and the log mtime freezes on death,
so the watchdog's log-idle>15min signal is both too slow and unreliable. (2) A
resumed run had no watchdog, so nothing re-continued it.

- watchdog(): detect PROC-DEATH (no live `opencode run` proc for the session +
  not completed) and resume promptly, in addition to log-idle. Guarded by
  MAX_RESUMES (default 20) so a no-progress loop (e.g. disk-full) eventually hands
  off to the supervisor/operator instead of spinning forever.
- resume(): auto-spawn a watchdog if none is alive (skips when the watchdog itself
  called resume — it lives in {SESSION}-watchdog — so no duplicate).
- launch-supervisor.py gate: defer while the per-run watchdog is alive (it is the
  single writer for prompt-recovery). The supervisor only takes over once the
  watchdog gives up (MAX_RESUMES) — i.e. a wedge a bare resume can't fix. Removes
  the supervisor/watchdog double-resume race.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WxbpH3DquKzoSTSwGvGuET

2026-07-04 04:39:40 +00:00
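The resume and hand-off rules in the commit message above can be summarized as a small decision function. This is a minimal sketch, not the code in `launch-upgrader.py` or `launch-supervisor.py`. The `RunState` fields, the function names, and the idea of matching the session id in a process argv list are assumptions made for illustration. Only `MAX_RESUMES` (default 20), the 15-minute log-idle signal, the `{SESSION}-watchdog` naming, and the supervisor-defers rule come from the commit.

```python
import os
from dataclasses import dataclass

MAX_RESUMES = 20            # default named in the commit message
LOG_IDLE_LIMIT_S = 15 * 60  # the older, slower log-idle > 15 min signal


@dataclass
class RunState:
    # Hypothetical snapshot of one weekly run, as seen by its watchdog.
    completed: bool    # has the whole run finished?
    proc_alive: bool   # is an `opencode run` process alive for this session?
    log_idle_s: float  # seconds since the run log was last modified
    resumes: int       # resumes this watchdog has already issued


def opencode_run_alive(argvs: list[list[str]], session: str) -> bool:
    """Assumption: the session id appears as its own argv element."""
    return any(
        len(argv) > 1
        and os.path.basename(argv[0]) == "opencode"
        and argv[1] == "run"
        and session in argv[2:]
        for argv in argvs
    )


def watchdog_action(s: RunState) -> str:
    if s.completed:
        return "exit"
    # Proc-death is the fast signal; log-idle stays as a fallback.
    stalled = not s.proc_alive or s.log_idle_s > LOG_IDLE_LIMIT_S
    if not stalled:
        return "wait"
    if s.resumes >= MAX_RESUMES:
        return "give_up"  # no-progress loop: hand off to supervisor/operator
    return "resume"


def resume_should_spawn_watchdog(caller: str, session: str, watchdog_alive: bool) -> bool:
    # Skip when a watchdog is already alive, or when the caller IS the watchdog.
    return not watchdog_alive and caller != f"{session}-watchdog"


def supervisor_may_resume(watchdog_alive: bool) -> bool:
    # The per-run watchdog is the single writer while alive; the supervisor
    # only acts after it has given up and exited (assumed behaviour).
    return not watchdog_alive
```

Keeping the policy a pure function of observed state makes the double-resume race easy to reason about: as long as the watchdog exits after `give_up`, only one of the two actors can issue a resume at any moment.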

prompts

loops: mandate machine-docs/ for ALL coordination files (kickoff/prompts/plan/AGENTS)

2026-06-11 20:56:24 +00:00

systemd

decommission Pi: update all docs for VM-only setup

2026-05-31 00:16:37 +00:00

upstream

upstream(hedgedoc): fix pgautoupgrade source repo URL

2026-07-03 04:09:57 +00:00

adversary-verify-pr6.md

skill(ci-dev-workflow): capture the cc-ci feature-dev flow + adversary plan template

2026-06-09 03:16:47 +00:00

agent-log.py

feat(logs): readable greppable per-agent transcript logs (agent-log.py)

2026-06-02 04:35:17 +00:00

agents.py

watchdog: suppress scheduled wakes once the build sequence is complete

2026-06-13 12:04:49 +00:00

agents.toml

plan: queue redfix — investigate ALL canon-sweep failures + FIX each (recipe PR or harness improvement, opus)

2026-06-17 23:17:51 +00:00

ai-progress-monitor-prompt.txt

wake prompt: remove temporary limit-system night-watch line (condition met)

2026-06-11 06:55:08 +00:00

brief.md

Initial commit: cc-ci autonomous orchestrator

2026-05-26 20:46:28 +01:00

concurrency-restructure-full-plan.md

plan: adapt concurrency restructure to builder/adversary loop protocol (gates M1/M2, phase-namespaced state)

2026-06-10 03:54:31 +00:00

concurrency-restructure-plan.md

plan: concurrency restructure — flock-probe janitor, per-run ABRA_DIR, lock-lifetime chain

2026-06-10 03:41:05 +00:00

IDEAS.md

plan: queue samever (older-base fallback when last-green==head, opus); IDEAS: defer canonical-history (B)

2026-06-17 03:50:49 +00:00

JOURNAL.md

journal: redfix DONE — all 6 canon-sweep failures fixed + verified (4 recipe PRs, 2 harness); SEQUENCE-COMPLETE

2026-06-18 14:54:28 +00:00

kickoff.md

decommission Pi: update all docs for VM-only setup

2026-05-31 00:16:37 +00:00

launch-assistant.py

feat(assistant): add opencode launcher and phase 6/7 plans

2026-06-01 12:59:03 +00:00

launch-assistant.sh

feat(assistant): add opencode launcher and phase 6/7 plans

2026-06-01 12:59:03 +00:00

launch-opencode.sh

orchestrator: restore opencode web launcher

2026-06-12 15:45:09 +00:00

launch-orchestrator.py

orchestrator: restore opencode web launcher

2026-06-12 15:45:09 +00:00

launch-orchestrator.sh

refactor: rewrite launchers as Python; add orchestrator JOURNAL.md

2026-05-31 17:50:09 +00:00

launch-report.py

watchdog: cover all parts of the weekly run + survive the systemd oneshot

2026-06-23 02:42:50 +00:00

launch-supervisor.py

weekly-run: watchdog resumes on proc-death; supervisor defers to watchdog

2026-07-04 04:39:40 +00:00

launch-upgrader.py

weekly-run: watchdog resumes on proc-death; supervisor defers to watchdog

2026-07-04 04:39:40 +00:00

launch-upgrader.sh

refactor: rewrite launchers as Python; add orchestrator JOURNAL.md

2026-05-31 17:50:09 +00:00

launch.py

fix(watchdog): detect idle opencode turns

2026-06-12 21:47:06 +00:00

launch.sh

refactor: rewrite launchers as Python; add orchestrator JOURNAL.md

2026-05-31 17:50:09 +00:00

msg-loop.sh

Reliable loop messaging: msg-loop.sh + hardened ping_session (retry submit)

2026-05-30 15:31:28 +01:00

orchestration.md

watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session

2026-06-11 00:55:07 +00:00

orchestrator-opencode-restart.txt

watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session

2026-06-11 00:55:07 +00:00

overnight-run.sh

watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session

2026-06-11 00:55:07 +00:00

phase6-phase7-summary-2026-06-01.md

feat(orchestrator): wake the live monitor session

2026-06-01 18:51:05 +00:00

plan-cc-ci-hetzner-migration.md

plan: full migrate-to-Hetzner (provision → cut over loops → stop old b1 VM); server type cpx31→cpx32

2026-05-31 01:15:29 +00:00

plan-cc-ci-hetzner-terraform.md

plan: full migrate-to-Hetzner (provision → cut over loops → stop old b1 VM); server type cpx31→cpx32

2026-05-31 01:15:29 +00:00

plan-ccci-compose-overlay-policy.md

overlay policy: standardize the ccci overlay filename to compose.ccci.yml

2026-05-30 17:25:48 +01:00

plan-ghostpr-debug-fix.md

plan(ghost-pr): fold in upgrader's diagnosis — mysql 8.0->8.4 data-dir upgrade race (update_config.monitor too tight); PR#4 open

2026-06-12 13:45:52 +00:00

plan-lasuite-drive-oidc-robustness.md

plan: lasuite-drive recipe-robustness PR sub-plan (collabora healthcheck + perms + lazy OIDC)

2026-05-29 12:58:36 +01:00

plan-lasuite-drive-recipe-pr.md

lasuite-drive PR: scope the repeated-green/3x bar to lasuite-drive (flakiness proof) — NOT the general standard (operator 2026-05-29)

2026-05-29 13:25:10 +01:00

plan-migrate-cc-ci-to-hetzner.md

plan: full migrate-cc-ci-to-hetzner (provision cpx32 → benchmark 2 recipes → cutover loops+pipeline+DNS → retire Incus VM); age key is on the VM so no secret-blocker; harden .gitignore for the age key

2026-05-31 02:04:02 +00:00

plan-mirror-enroll-all-recipes.md

plan(mirror): remove the operator deploy gate — loops deploy+verify autonomously

2026-06-02 00:38:59 +00:00

plan-orchestrator-hetzner-migration.md

orchestrator-hetzner: enable reboot-resilience + record migration

2026-05-31 03:54:17 +00:00

plan-orchestrator-migration.md

decommission Pi: update all docs for VM-only setup

2026-05-31 00:16:37 +00:00

plan-overnight-run.md

plan: overnight run — after assistant, run /upgrade-all + morning report

2026-06-02 02:10:13 +00:00

plan-phase1b-review-lint.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

plan-phase1c-full-reproducibility.md

decommission Pi: update all docs for VM-only setup

2026-05-31 00:16:37 +00:00

plan-phase1d-generic-test-suite.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

plan-phase1e-harness-corrections.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

plan-phase2-recipe-tests.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

plan-phase2b-test-performance.md

Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS

2026-05-30 05:07:49 +01:00

plan-phase2pc-image-cache.md

2pc: drop the pull-through registry cache — single host makes it marginal; keep PC1 prune-policy only

2026-05-29 09:24:56 +01:00

plan-phase2w-warm-canonical-quick.md

plan(2w): WC1.2 — pre-deploy auto-upgrade safety gate (major/manual-migration -> alert, hold)

2026-05-29 00:02:28 +01:00

plan-phase3-results-ux.md

Add Phase-2b plan: test performance (measure, attribute, improve empirically)

2026-05-27 04:26:27 +01:00

plan-phase4-final-review-polish-cleanup.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

plan-phase5-verify-upgrade-flow.md

Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff

2026-05-29 21:21:20 +01:00

plan-phase6-reconcile-all-mirrors.md

feat(assistant): add opencode launcher and phase 6/7 plans

2026-06-01 12:59:03 +00:00

plan-phase7-upgrade-three-recipes.md

feat(assistant): add opencode launcher and phase 6/7 plans

2026-06-01 12:59:03 +00:00

plan-phase-bsky-fix.md

plan: phase 'bsky' — fix bluesky-pds recipe + its screenshot (queued after lvl5)

2026-06-11 11:30:49 +00:00

plan-phase-canon-canonical-sweep.md

plan(canon): retire UPGRADE_BASE_VERSION (gated) — plausible's pin becomes redundant under the dynamic canonical base

2026-06-17 04:43:23 +00:00

plan-phase-cf48-opus-cfold-review.md

plan: queue cf48 — Opus 4.8 post-cfold coverage-loss review (cross-check of cf55 GPT-5.5)

2026-06-13 05:15:05 +00:00

plan-phase-cf55-gpt55-cfold-review.md

plan: add gpt55 cfold review phase

2026-06-12 16:07:48 +00:00

plan-phase-cfold-custom-folder.md

plan: phase 'cfold' — collapse functional/+playwright/ into custom/ + full !testme recipe sweep (queued after drone)

2026-06-11 22:52:45 +00:00

plan-phase-dash-recipe-history.md

plan: queue dash — fix incomplete per-recipe run history on the CI dashboard (opus, after canon)

2026-06-17 04:29:52 +00:00

plan-phase-drone-enroll.md

plan: phases dstamp, mailu, kuma, drone (queued after bsky) + journal

2026-06-11 11:43:03 +00:00

plan-phase-dstamp-discourse-drift.md

plan: phases dstamp, mailu, kuma, drone (queued after bsky) + journal

2026-06-11 11:43:03 +00:00

plan-phase-ghost-reeval.md

plan: queue proxy and ghost follow-up phases

2026-06-12 15:56:03 +00:00

plan-phase-gtea-gitea-fulltests.md

plan: queue gtea — enroll gitea as a fully-tested recipe + verify LFS PR #1

2026-06-15 19:32:14 +00:00

plan-phase-kuma-monitor.md

plan: phases dstamp, mailu, kuma, drone (queued after bsky) + journal

2026-06-11 11:43:03 +00:00

plan-phase-lvl5-lint-rung.md

watchdog: fix limit-probe self-match + scrollback dedupe wedge; plan(lvl5): badge shows level only

2026-06-11 05:52:26 +00:00

plan-phase-mailu-backup.md

plan: phases dstamp, mailu, kuma, drone (queued after bsky) + journal

2026-06-11 11:43:03 +00:00

plan-phase-nixenv-shared-runtime-env.md

plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)

2026-06-17 15:38:45 +00:00

plan-phase-prevb-previous-dynamic-base.md

plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)

2026-06-16 23:55:28 +00:00

plan-phase-pvcheck-post-proxy-verification.md

plan: queue proxy and ghost follow-up phases

2026-06-12 15:56:03 +00:00

plan-phase-pvfix-swarm-proxy.md

plan: queue proxy and ghost follow-up phases

2026-06-12 15:56:03 +00:00

plan-phase-pxgate-proxy-healthgate.md

plan: queue pxgate — fix deploy-proxy/dashboard health-gate circular dependency (D8)

2026-06-13 12:38:40 +00:00

plan-phase-redfix-canon-sweep-failures.md

plan: queue redfix — investigate ALL canon-sweep failures + FIX each (recipe PR or harness improvement, opus)

2026-06-17 23:17:51 +00:00

plan-phase-regall-recipe-regression.md

plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)

2026-06-16 23:55:28 +00:00

plan-phase-samever-older-base-fallback.md

plan(samever): frame the same-version gap as the nightly cold-sweep STEADY STATE (operator insight)

2026-06-17 03:56:09 +00:00

plan-phase-settings-ci-server-config.md

plan(settings): add release-tag-first no-canonical fallback; bump to opus

2026-06-17 09:48:34 +00:00

plan-phase-shot-screenshots.md

plan: phase 'shot' — recipe screenshot audit & repair (queued after rcust)

2026-06-11 01:17:32 +00:00

plan-prepull-images.md

plan: per-test image pre-pull sub-plan (warm images before deploy + upgrade; cheap on warm cache)

2026-05-29 14:55:21 +01:00

plan-proxy-vip-exhaustion-fix.md

upgrade-all: proxy VIP-exhaustion guard in Step 0; runbooks for proxy /16 enlarge + ghost PR debug

2026-06-12 03:30:00 +00:00

plan-recipe-report-skill.md

feat(recipe-report): /recipe-report skill + helper + launcher (default opus); wire into upgrade-all

2026-06-02 23:02:22 +00:00

plan-repo-consolidation.md

rename to cc-ci-orchestrator: update all repo name references

2026-05-31 00:03:11 +00:00

plan-report-pr-status-column.md

plan: finalize report PR-STATUS column (binary open/✓; proxy in reports.nix; decisions locked)

2026-06-09 12:59:41 +00:00

plan-server-regression-canaries.md

plan(regression): add per-tier RED canaries (install/upgrade/backup/restore)

2026-06-02 01:28:23 +00:00

plan-sso-dep-testing.md

rename the opt-in heavy-tests flag: --extra-tests -> --extra (operator 2026-05-29)

2026-05-29 10:36:04 +01:00

plan.md

loops: mandate machine-docs/ for ALL coordination files (kickoff/prompts/plan/AGENTS)

2026-06-11 20:56:24 +00:00

README.md

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

reboot-log.sh

orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

2026-05-28 20:28:10 +01:00

REBOOTS.md

orchestrator-hetzner: enable reboot-resilience + record migration

2026-05-31 03:54:17 +00:00

recipe-custom-restructure-full-plan.md

plan: recipe-customization restructure — full builder/adversary plan (P1-P6 + real-CI regression sweep gate)

2026-06-10 16:28:09 +00:00

recipe-report.py

feat(recipe-report): TESTS rename + live binary STATUS column

2026-06-09 13:15:02 +00:00

task-consolidate-recipe-prs.md

task: assistant — consolidate open recipe PRs to one per recipe

2026-06-02 02:02:00 +00:00

test-e2e-testme-acceptance.md

decommission Pi: update all docs for VM-only setup

2026-05-31 00:16:37 +00:00

used-recipes.md

upgrade-all: skip 'external' recipes (uptime-kuma) + add used-recipes.md inventory

2026-06-15 17:00:28 +00:00

README.md

cc-ci-plan

Self-contained handoff package for building the cc-ci Co-op Cloud recipe CI server with two autonomous Claude loops (a Builder and an adversarial Reviewer) running over days.

Start here

Read plan.md — the full plan and single source of truth (mission, Definition of Done, architecture, milestones, the two-agent coordination protocol, loop discipline).
Read kickoff.md — how to launch and supervise the loops.
Run `./launch.sh start` to bring up both loops + the watchdog.

Files

| File | Purpose |
| --- | --- |
| `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
| `plan-phase1c-full-reproducibility.md` | Phase 1c (runs first): make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the genuine throwaway-VM live rebuild to close D8 honestly (the "infeasible by design" was overstated). |
| `plan-phase1b-review-lint.md` | Phase 1b (after 1c): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets), ending in a full cold re-verification of all D1–D10 — now covering 1c's refactor. |
| `plan-phase1d-generic-test-suite.md` | Phase 1d (after 1b, before 2): a generic install/upgrade/backup/restore suite that runs on any recipe with zero config, with a recipe's own `test_<op>.py` overriding or extending the generic (Builder's call) and reusing the generic's deployment — no redeploy, plus optional custom install-steps; recipes needing special setup fail the generic form gracefully. The test-architecture foundation Phase 2 builds on. |
| `plan-phase1e-harness-corrections.md` | Phase 1e (after 1d, before 2): three operator-review corrections to the shared generic harness — (HC1) upgrade goes previous-release → PR head via `deploy --chaos`; (HC2) repo-local PR code runs only for approved recipes (default = cc-ci overlays + generic only); (HC3) the generic runs by default alongside an overlay, skipped only via explicit opt-out. |
| `plan-phase2-recipe-tests.md` | Phase 2 (after Phase 1e): build on the corrected generic suite — author the recipe overlays (port recipe-maintainer tests as `test_*.py`) + define custom install steps where a recipe fails generically. |
| `plan-phase2b-test-performance.md` | Phase 2b (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. |
| `plan-phase3-results-ux.md` | Phase 3 (after Phase 2b): beautiful YunoHost-style results — per-run level, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
| `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
| `brief.md` | The original one-page brief (context only; `plan.md` supersedes it). |
| `kickoff.md` | Launch & supervision guide. |
| `launch.sh` | Starts both loops + a watchdog; restarts dead loops; stops on `## DONE`. |
| `prompts/builder.md` | Builder loop prompt (fed to `claude` by the script). |
| `prompts/adversary.md` | Adversary loop prompt. |
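The selection rules in the Phase 1d and 1e rows (the generic suite for every op, a recipe's `test_<op>.py` alongside it, HC2's approval gate on repo-local PR code, and HC3's explicit opt-out) can be sketched as one function. Everything here is hypothetical: `RecipeConfig`, its field names, and the `generic:`/`overlay:`/`repo:` labels are illustrative. The sketch only decides which tests run, not how they share the single deployment, and it models an override as opting the generic out for that op while supplying a recipe test.

```python
from dataclasses import dataclass, field

OPS = ("install", "upgrade", "backup", "restore")


@dataclass
class RecipeConfig:
    # Hypothetical shape, not cc-ci's real configuration format.
    overlay_tests: set[str] = field(default_factory=set)     # test_<op>.py in the cc-ci overlay
    repo_local_tests: set[str] = field(default_factory=set)  # test_<op>.py shipped in the recipe PR
    repo_local_approved: bool = False                        # HC2: only approved recipes run PR code
    generic_opt_out: set[str] = field(default_factory=set)   # HC3: ops where the generic is skipped


def plan_tests(cfg: RecipeConfig) -> dict[str, list[str]]:
    """Return, per op, the ordered list of tests to run."""
    plan: dict[str, list[str]] = {}
    for op in OPS:
        name = f"test_{op}.py"
        steps: list[str] = []
        if op not in cfg.generic_opt_out:  # generic runs by default
            steps.append(f"generic:{op}")
        if name in cfg.overlay_tests:  # cc-ci overlays are always trusted
            steps.append(f"overlay:{name}")
        if cfg.repo_local_approved and name in cfg.repo_local_tests:
            steps.append(f"repo:{name}")
        plan[op] = steps
    return plan
```

For example, `plan_tests(RecipeConfig(overlay_tests={"test_backup.py"}))["backup"]` gives `["generic:backup", "overlay:test_backup.py"]`, while `install` falls back to `["generic:install"]` with zero config.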

Before launching

Set the org in plan.md (git.autonomic.zone/recipe-maintainers/cc-ci) and lock the six proof recipes (§8).
Ensure the launching shell has SSH + sudo access to cc-ci, the Gitea token, and access to git.autonomic.zone.
Preconfigure test-app DNS + TLS (plan §4.0): point a wildcard `*.ci.commoninternet.net` DNS record at a gateway that passes TLS through to cc-ci, and pre-issue the wildcard cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`, via Gandi DNS-01) into `/var/lib/ci-certs/live/` on cc-ci. The agent handles everything else on cc-ci (Traefik file provider → that cert, swarm, routing) and does no ACME; renewal (~90 days) is an out-of-band operator task, so the DNS token never goes to the agent.
Run `export CC_CI_REPO=https://git.autonomic.zone/recipe-maintainers/cc-ci.git` so the watchdog can detect `## DONE`.
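The pre-issued cert lists both `*.ci.commoninternet.net` and the bare `ci.commoninternet.net` because, under the usual RFC 6125 wildcard rules that TLS clients apply, a leading `*` matches exactly one DNS label. It covers `myapp.ci.commoninternet.net`, but neither the apex nor a deeper name such as `a.b.ci.commoninternet.net`. The simplified matcher below illustrates that rule for reasoning about which test-app hostnames the cert serves; it ignores partial-label wildcards and IDNA and is not something cc-ci is known to run.

```python
CERT_NAMES = ("*.ci.commoninternet.net", "ci.commoninternet.net")


def name_matches(hostname: str, cert_name: str) -> bool:
    """RFC 6125-style match: a leading '*' stands for exactly one whole label."""
    host = hostname.lower().rstrip(".").split(".")
    pattern = cert_name.lower().rstrip(".").split(".")
    if len(host) != len(pattern):
        return False  # a wildcard never spans zero labels or several labels
    if pattern[0] == "*":
        return host[0] != "" and host[1:] == pattern[1:]
    return host == pattern


def cert_covers(hostname: str) -> bool:
    return any(name_matches(hostname, name) for name in CERT_NAMES)
```

With this cert, test apps therefore need single-label names of the form `<app>.ci.commoninternet.net`.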

What "done" means

The loops stop only when all of plan.md §2 (D1–D10) hold and the Adversary has independently re-verified each within 24h. The watchdog then tears the loops down automatically.
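The stop condition reads as two checks: every deliverable D1 to D10 holds with a fresh Adversary re-verification, and the `## DONE` marker is present for the watchdog to see. A minimal sketch, assuming the 24h window is measured back from the current time and that re-verification results are available as a mapping from deliverable id to timestamp. How the real loops record those results, and which file carries `## DONE`, is not stated in this README; `all_reverified` and `has_done_marker` are hypothetical names.

```python
from datetime import datetime, timedelta

DELIVERABLES = tuple(f"D{i}" for i in range(1, 11))  # plan.md §2: D1..D10
MAX_AGE = timedelta(hours=24)


def all_reverified(verified_at: dict[str, datetime], now: datetime) -> bool:
    """True only if every D1..D10 has a passing Adversary re-verification
    no older than 24h (and not stamped in the future)."""
    return all(
        d in verified_at and timedelta(0) <= now - verified_at[d] <= MAX_AGE
        for d in DELIVERABLES
    )


def has_done_marker(text: str) -> bool:
    # A line that is exactly the markdown heading "## DONE".
    return any(line.strip() == "## DONE" for line in text.splitlines())
```

One plausible split, assumed here: the Adversary applies `all_reverified` before writing the marker, and the watchdog only polls `has_done_marker` before tearing the loops down.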
