The handoff pings fired on the writer's LOCAL working-tree write (before push), so the receiver
pulled a stale origin/main, saw "no formal gate", and a clarifying inbox round-trip ensued
(several minutes + wasted turns per handoff). Separately, the gate-id parser misread "WC1" as
"C1" and could fire on prose mentions.
Fix (1): handoff_check now `git fetch`es and reads origin/main (what the receiver will pull), via
_wd_fetch_origin + _wd_show_pushed, for STATUS / REVIEW / both INBOXes — a ping only fires once the
claim/verdict is actually pushed, so the receiver's pull always sees it. Eliminates the stale-pull
"premature" dance.
Fix (2): gate-claim detection matches ONLY a formal line (Gate: <id> … CLAIMED, awaiting Adversary)
and edge-triggers on a genuinely-new such line compared whole — no firing on historical
"CLAIMED detail" lines or prose; gate-id is a best-effort label only.
Loops' clones have a credential helper (reads .testenv) so the watchdog's fetch works
non-interactively. Verified.
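A minimal sketch of the Fix (2) matching rule, for illustration only. The regex, the function
names, and the silent first observation are assumptions of this sketch; the actual pattern in
handoff_check is not reproduced in this message.

```python
import re

# Assumed shape of a formal claim line, e.g.
# "Gate: WC1 - CLAIMED, awaiting Adversary". Illustrative pattern only.
FORMAL_CLAIM = re.compile(r"^\s*Gate:\s*(\S+).*\bCLAIMED, awaiting Adversary\s*$")


def formal_claims(status_text):
    """Return the set of whole formal claim lines found in a STATUS file."""
    return {
        line.strip()
        for line in status_text.splitlines()
        if FORMAL_CLAIM.match(line)
    }


def new_claims(previous, status_text):
    """Edge trigger: return (lines to ping about, new baseline).

    `previous` is None on the first observation, which only records a
    baseline and never pings (an assumption of this sketch).
    """
    current = formal_claims(status_text)
    if previous is None:
        return set(), current
    return current - previous, current


def gate_label(line):
    """Best-effort label only; it never decides whether a ping fires."""
    m = FORMAL_CLAIM.match(line)
    return m.group(1) if m else "?"
```

Because whole lines are compared, a prose mention of "C1" or an older "CLAIMED detail" line
never matches, and a gate id such as "WC1" is captured as one token rather than a suffix.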
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Operator (2026-05-29): a passing health check does NOT prove a required manual migration ran, so
auto-update needs a PRE-deploy gate in addition to the post-deploy health rollback. Reconciler
auto-applies only non-major (patch/minor) upgrades with no manual-migration release notes; a MAJOR
recipe-version bump (or release notes flagging a manual migration) is held on the current version
with a PushNotification carrying the release notes (operator upgrades manually). Leans on abra's
own major-bump caution + recipe releaseNotes/. Updated WC1.2/WC6/principles/decisions.
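The pre-deploy gate described above could be sketched roughly as follows. The version format,
the release-note marker, and every name here are hypothetical; the reconciler's real rules may
differ.

```python
import re

# Hypothetical marker; the real release-note convention is not given here.
MANUAL_MIGRATION = re.compile(r"manual[ -]migration", re.IGNORECASE)


def major_of(version):
    """Leading integer of a version such as '2.1.0' (assumed format)."""
    m = re.match(r"\s*v?(\d+)", version)
    if m is None:
        raise ValueError(f"unparseable version: {version!r}")
    return int(m.group(1))


def pre_deploy_decision(current, candidate, release_notes=""):
    """Return 'apply' or 'hold' for an upgrade from current to candidate."""
    if major_of(candidate) != major_of(current):
        return "hold"   # major bump: stay on current, operator upgrades manually
    if MANUAL_MIGRATION.search(release_notes):
        return "hold"   # release notes flag a manual migration
    return "apply"      # patch/minor with nothing flagged
```

A "hold" result is where the PushNotification carrying the release notes would be sent; the
post-deploy health rollback still applies to every "apply".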
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Operator decision (2026-05-28): traefik + keycloak stay UNPINNED (fetch latest + chaos deploy);
a nightly `nixos-rebuild switch` rolls them to latest, then the full-cold sweep runs. The nix
closure stays byte-identical (recipe fetched at runtime, not in the store) so D8 holds.
Health-gated rollback is built INTO the reconciler (not nix-generation rollback, since the swarm
app isn't in the generation): record last-good -> deploy latest -> health-check -> commit or
roll back + PushNotification. Stateful apps (keycloak): snapshot the data volume before upgrade
(undeploy->snapshot->deploy-latest) and restore it on rollback, reusing the WC3 snapshot helper;
traefik = version rollback only. Added WC1.1 + updated WC1/WC6/milestones/guardrails/decisions.
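A rough sketch of the record -> deploy -> health-check -> commit-or-roll-back flow. All
callables are injected and hypothetical; the undeploy step before the snapshot is assumed to
live inside the snapshot callable, and none of these names come from the reconciler itself.

```python
def reconcile(app, latest, *, current_version, deploy, healthy, notify,
              snapshot=None, restore=None):
    """Deploy `latest`; keep it if healthy, otherwise roll back and notify.

    Hypothetical callables:
      current_version(app) -> str
      deploy(app, version)
      healthy(app) -> bool
      notify(message)
      snapshot(app) -> handle   (stateful apps only)
      restore(app, handle)      (stateful apps only)
    Returns the version left running.
    """
    last_good = current_version(app)                # record last-good
    handle = snapshot(app) if snapshot else None    # data snapshot, if stateful
    deploy(app, latest)
    if healthy(app):
        return latest                               # commit
    if restore is not None and handle is not None:
        restore(app, handle)                        # data rollback
    deploy(app, last_good)                          # version rollback
    notify(f"{app}: upgrade to {latest} failed health check; "
           f"rolled back to {last_good}")
    return last_good
```

Passing no snapshot/restore gives the traefik case (version rollback only); passing both gives
the keycloak case.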
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Operator-directed: pause Phase 2, build the warm-data + --quick system, then resume Phase 2.
- live-warm keycloak (SSO dep, realm-per-run), data-warm canonicals (undeploy keeps volume),
cold = authoritative default. --quick reattaches the canonical, upgrades to PR head, asserts,
and rolls back to the last-known-good snapshot on failure (never loses working data).
- known-good = raw volume copy taken while undeployed (consistent), one per app, advanced ONLY
by green cold runs; a nightly full-cold sweep refreshes canonicals + is a daily regression run.
- launch.sh: insert 2w at the current index, so Phase 2 resumes once 2w is DONE; seq is now
1c 1b 1d 1e 2w 2 2b 3 4.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- heal_session: detect the unrecoverable "thinking/redacted_thinking blocks cannot
be modified" 400 (recurs every turn, session stays alive so the dead-check misses
it) and kill+restart the loop fresh (re-orients from repo). Consolidates the
dead/fatal/limit handling for builder+adversary.
- heal_orchestrator: keep the orchestrator alive too, conflict-safe. Restarts via
launch-orchestrator.sh ONLY when no orchestrator is alive anywhere — liveness
detects both a managed cc-ci-orchestrator tmux session AND a hand-launched
terminal session (any non-loop claude), so it never double-resumes the
conversation (the likely cause of the thinking-block crashes). Kill+restart if
the managed session is wedged on the FATAL error. Toggle: WATCH_ORCHESTRATOR=0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
handoff_check's now="$(grep CLAIMED.*awaiting ... )" returned non-zero when a phase's STATUS
has no claimed-awaiting lines yet (normal early in a phase); under set -euo pipefail that
assignment exited the whole watchdog. Append `|| true` to the now= and cur= command
substitutions. Verified: watchdog survives the handoff tick on a freshly-created STATUS-1c.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strengthen C4/W5: the genuine reproducibility proof is a clean-room repeat — the Adversary
destroys any existing throwaway VM, creates a brand-new blank VM, and runs the entire install
from scratch per docs/install.md so nothing from the Builder's setup attempt can mask a gap.
Cold, with logged evidence (VM id, exact install commands, convergence + TLS-from-git-cert).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Make the launcher drive an ordered phase sequence (default 1c then 1b). Each phase has its own
plan + phase-namespaced loop-state files (STATUS-<id>.md/BACKLOG/REVIEW/JOURNAL); the watchdog
auto-transitions when the current phase's STATUS-<id>.md shows ## DONE, and STOPS after the last
phase (writes SEQUENCE-COMPLETE, exits) as a manual gate before Phase 2. start_agent injects a
phase preamble (source-of-truth = phase plan; phase-namespaced state) ahead of the base role
prompt. DONE detection reads the builder's local clone (reliable, no push-lag). Handoff signalling
+ resilience preserved and made phase-scoped (reset baseline on transition).
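The transition rule can be sketched as below, assuming the watchdog has already read the
current phase's STATUS file into a string. Function names are hypothetical; the launcher
itself is a shell script and this is not its code.

```python
import re

DONE_HEADING = re.compile(r"^## DONE\b", re.MULTILINE)


def status_file(phase_id):
    """Phase-namespaced STATUS file name, e.g. STATUS-1c.md."""
    return f"STATUS-{phase_id}.md"


def advance(sequence, current, status_text):
    """Return (phase to run next, sequence_complete).

    Stays on `current` until its STATUS shows a '## DONE' heading. After
    the last phase is DONE, reports completion so the caller can write
    SEQUENCE-COMPLETE and exit instead of starting anything new.
    """
    if not DONE_HEADING.search(status_text):
        return current, False
    idx = sequence.index(current)
    if idx + 1 < len(sequence):
        return sequence[idx + 1], False
    return current, True
```

With the default sequence, `advance(["1c", "1b"], "1c", text)` moves to "1b" only once the
builder's local STATUS-1c.md contains the DONE heading.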
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1c (full git reproducibility: cc-ci-secrets split, cert-in-sops, genuine D8 live rebuild)
now runs before 1b. This way 1b's review/lint and its final cold re-verification of all
D1-D10 cover the final refactored state (incl. the secrets split) and the genuine post-1c
D8 — rather than reviewing pre-refactor code and re-verifying a flawed D8. Updated status
lines in 1b/1c and the README ordering. Sequence: 1 -> 1c -> 1b -> 2 -> 2b -> 3.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator: the split boundary is secrecy, not modularity. Only the sops-encrypted secrets
(incl. the wildcard cert) move to a separate private repo `cc-ci-secrets` (extra access-control
layer), consumed by the base via a flake input. Instance non-secret vars (domain, gateway,
recipients) stay in the well-parameterized base cc-ci repo — another admin repoints by editing
params, no second config repo. Guardrail reworded: instance vars in base are fine; only plaintext
SECRETS must never leak into base/store. Updated model/C1/C2/W2/§6/§7 + README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per operator: don't downsize cc-nix-test to 2GB. Instead raise the terraform-ci running-RAM
guideline to ~12GB (it's doc-only — the project has no enforced limits.memory; b1 is 16GB),
resize cc-nix-test 6->4GB, and create the throwaway VM at 4GB (4+4+lichen 4 = 12 <= 16).
Updated W1/W3/C6/§4 and the incus memory note.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D8's throwaway-VM live rebuild was wrongly marked "infeasible by design" — the master
recovery age key defeats the sops-host-key reason, DNS/cert is a precondition not a
rebuild blocker, and Incus was available. Phase 1c (loop-driven): (A) make the VM fully
reproducible from git including ALL secrets — move the wildcard cert + every secret into
sops-in-git, split generic base repo from a private instance repo composed via a flake
input (the only out-of-band secret is the bootstrap age key); (B) actually perform +
cold-verify a blank-VM nixos-rebuild and rewrite D8 honestly. Resize cc-nix-test to 2GB
first to free b1 headroom for a sized throwaway VM; destroy the throwaway VM afterwards, then
restore/promote sizing. Gandi token stays out of repo/agent (only the cert artifact is
committed). Linked README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Adversary got a spurious "gate CLAIMED" ping: STATUS.md keeps historical
"Gate: Mn — CLAIMED, awaiting Adversary" lines after they PASS, and on watchdog restart the
first observation pinged on those already-passed lines. Now track the SET of gate ids on
CLAIMED-awaiting lines and ping only when an id NEWLY appears vs the prior observation, after a
silent baseline. A gate passing (line kept) or evidence edits don't re-ping; restart re-baselines
without pinging. Verified: watchdog restart no longer pings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: start_watchdog used $0, which breaks when launch.sh is called by a relative path
(the watchdog tmux session cd's into PLAN_DIR, so a relative $0 no longer resolves —
"No such file or directory", watchdog dies instantly). Resolve BASH_SOURCE to an absolute
SELF once and use it for the watchdog self-invocation. Verified: watchdog now starts and
its handoff_check immediately pinged the Adversary about a standing CLAIMED gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
launch.sh watchdog now runs a fast (~30s) handoff_check alongside the heavy (300s) restart/DONE
check: when the Builder writes a CLAIMED gate it pings the Adversary to verify now; when the
Adversary updates REVIEW.md it pings the Builder to proceed (edge-triggered, reads local clones).
So a pending handoff resolves in <~30s instead of a whole idle interval. Pacing revised: the
Adversary may idle freely when nothing's pending (no pointless re-verify/busy-poll) and is woken
by the watchdog; Builder waits on the ping + a fallback ~2-4m self-poll. kickoff documents the
new "handoff signalling" role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause of "both waiting": parked-at-gate was lumped into the long idle sleep, so a
pending handoff sat while both loops slept on desynced timers. Fix: three cases — (1) in
flight → ~4m; (2) BLOCKED ON THE OTHER LOOP (Builder at CLAIMED gate / Adversary awaiting a
fix) → ~4m poll for the counterpart, never long-idle; (3) genuinely nothing pending → ~10-15m.
Adversary: a CLAIMED gate is immediate top-priority; otherwise run background probes, rarely
idle while Builder is active. Builder: keep an unblocked item in hand to rarely be fully gated.
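The three pacing cases, written out as a small decision function. This is only an
illustration of the rule stated in the prompts; the loops follow prose instructions, and the
names and exact second counts here are assumptions.

```python
from enum import Enum


class LoopState(Enum):
    IN_FLIGHT = "in_flight"              # a build/deploy is running
    BLOCKED_ON_PEER = "blocked_on_peer"  # waiting on the other loop
    IDLE = "idle"                        # genuinely nothing pending


def classify(work_in_flight, waiting_on_other_loop):
    """Map the two observations onto one of the three cases."""
    if work_in_flight:
        return LoopState.IN_FLIGHT
    if waiting_on_other_loop:
        return LoopState.BLOCKED_ON_PEER
    return LoopState.IDLE


def sleep_seconds(state):
    """~4m for cases (1) and (2); the long idle sleep only for case (3)."""
    if state in (LoopState.IN_FLIGHT, LoopState.BLOCKED_ON_PEER):
        return 4 * 60
    return 10 * 60  # lower end of the ~10-15m idle range
```

The point of the fix is the second branch: a Builder parked at a CLAIMED gate classifies as
BLOCKED_ON_PEER, not IDLE, so it keeps the short poll.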
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
§7 pacing + Builder/Adversary prompts: idle/parked sleep lowered to ~10–15 min so the next
unit of work (or a gate claim) is picked up without long gaps. Unchanged: ~4m polling while
a build/deploy is in flight; keep polling something clearly in-flight rather than treating
it as idle; don't spin on a minutes-long build. Adversary aligned to §7 for consistency.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RL3 strengthened: after lint/review findings are responded to and fixed, the Adversary
independently re-verifies EVERY Phase-1 Definition-of-Done item (D1–D10) from a cold start
to the same bar as Phase 1's own DONE (fresh PASS + evidence in REVIEW.md), proving the
cleanup regressed nothing. 1b cannot be DONE until all D1–D10 are re-confirmed green
post-cleanup. Method/W2 updated to make the ordering explicit (tooling -> fixes -> re-verify).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before scaling to many recipes: (1) deterministic style/hygiene via linters/formatters
(alejandra/statix/deadnix, ruff, shellcheck/shfmt) wired as a .drone.yml stage so commits
stay clean; (2) a white-box review checklist with teeth (real tests not health-only/skipped,
DRY harness, Nix-declared idempotent bring-up, no footguns/secrets-in-code, architecture
matches plan) — blocking fixed, advisory triaged. Bounded pass; never weaken a test for a
nit. Phase 2 now follows 1b. Linked in README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2b (after Phase 2, before Phase 3): instrument per-phase timings, baseline a
representative recipe set (cold vs warm), attribute where time goes (Pareto), then try
improvements as controlled before/after experiments and keep measured winners — image
pull cache/pre-pull, readiness-wait tuning, dedup deploy cycles, warm/shared infra
(isolation-proven), runner caching, concurrency sizing, vCPU. Speed never weakens tests
or isolation (Adversary re-measures + re-verifies). Phase 3 now follows 2b. Linked in README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 (after Phase-2 DONE, manual transition): compute a per-run quality LEVEL, post an
image-forward Gitea PR comment in the YunoHost shape (marker + status/level badge + a
rendered summary card containing a real app screenshot, linking to the run), and polish the
overview dashboard to a ci-apps.yunohost.org look/feel with per-recipe level badges +
screenshots. Reuses the Phase-1 dashboard/bridge/Playwright; presentation never changes the
verdict; no secrets in any artifact; cosmetics never block the pipeline. Linked from README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add §7.1 Adversary mandate: default assumption is everything meaningful is testable
(OIDC/SSO, federation, media, WOPI, WebRTC connectivity, backup data survival) — the job
is a good test, not declaring impossibility. Adversary reads test bodies, rejects
skip/xfail/mock/health-only/empty-assertion tests and bogus parity renames, re-runs cold.
"Untestable" is a rare exception needing a true environment blocker + maximal subset +
Adversary sign-off; "needs browser/SSO/another app" is not valid. Tighten P7 and §8 to match.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 fills the CI machine with good tests for every maintained Co-op Cloud app,
using references/recipe-maintainer as the corpus: port a comparable cc-ci test for
EACH existing recipe-maintainer test (parity, tracked in PARITY.md) + >=2 new
recipe-specific functional tests per recipe, plus real backup data-integrity and SSO
dependency handling. Reuses the Phase-1 harness/stages/trigger/resource-caps; adds
test content + small shared-harness ports from helpers.py. Linked from the package README.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Don't overload the single node: cap concurrent test builds at a configurable MAX_TESTS
(= DRONE_RUNNER_CAPACITY); Drone natively queues excess builds and times out hung ones,
freeing slots — no custom queue. Each run deploys one app then undeploys; the run-start
janitor is the backstop for timed-out/killed builds. At most MAX_TESTS apps live at once.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First item: later, for environments where the CI server has repo-admin, consider an
opt-in (off-by-default) feature to auto-register + idempotently reconcile the issue_comment
webhook — preserving the read-only/polling default. Parked, out of current scope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finalize trigger model per operator: polling is the primary trigger (outbound, read-only,
no admin); the server never self-registers webhooks (that needs admin) — webhook is an
optional push optimization an admin registers manually, documented in enroll-recipe.md.
Commenter auth via org-membership endpoint (read-level), not the admin-only permission
endpoint. Bot's required privilege is read + comment + org-membership, never repo-admin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The repo's explicit collaborator list is empty — bot and maintainers (trav/notplants)
all access via org ownership, so the collaborators check 404s for everyone. Authorize via
GET /collaborators/{user}/permission requiring owner/admin/write (matches the builder's fix).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Record the trigger design: webhook (default/primary, confirmed working) and polling
(kept but disabled behind a flag) are mutually exclusive — only one runs at a time, so no
cross-path dedupe. Poll is the fallback when webhook delivery fails. Also note the
commenter-auth check must count recipe-maintainers org members/admins, not just repo
collaborators (the bot is org admin and was being rejected).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Strengthen the idempotency guardrail: every infra piece (swarm, traefik recipe deploy,
drone, bridge, dashboard) is a systemd oneshot that re-runs each activation/boot and
converges to desired state (like swarm-init) — no manual post-steps, no run-once
sentinels. Goal: from-scratch install = clone + nixos-rebuild switch + preconditions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes the original modules/traefik.nix hand-rolled proxy. cc-ci now deploys the
coop-cloud/traefik recipe via abra in wildcard/file-provider mode, serving the operator's
pre-issued wildcard cert as the recipe's ssl_cert/ssl_key swarm secrets — canonical
web/web-secure + proxy/swarm conventions every recipe expects, no ACME, DNS token never
on cc-ci. Updated §1, §1.5, §3, §4.0, §4.2, §5 (M1), §8.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Records the exact sequence to keep the orchestrator alive in tmux and resume it
with remote control (survives disconnects/laptop close), reconnect commands, and
pointers to launch/supervision docs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clarify the two distinct names (--resume <conversation> vs --remote-control display
label), the in-session /remote-control shortcut, and the persist-vs-reconnect model.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Recipe repos under test live on the private mirror git.autonomic.zone/recipe-maintainers,
mirrored from upstream git.coopcloud.tech. autonomic-bot is admin on that org (can create
repos + add webhooks). A recipe missing from the mirror is not a blocker — fetch from
upstream and open a PR via the recipe-create-pr procedure. Updated D10 (§2) and enrollment (§4.1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Documents the three roles (orchestrator vs Builder/Adversary loops), how to keep
this orchestrator session alive under --remote-control for check-ins/steering via
claude.ai/code, launch/supervision pointers, access/cred locations, and the VM
fallback. Secrets remain gitignored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Planning + launch + setup material for the cc-ci Co-op Cloud recipe CI server:
plan.md (single source of truth), kickoff/launch supervision, and the
Builder/Adversary loop prompts. Secrets (.testenv) and runtime dirs are gitignored.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>