65 Commits

Author SHA1 Message Date
71a4a1fea4 Reliable loop messaging: msg-loop.sh + hardened ping_session (retry submit)
tmux `send-keys -l <long msg>` often leaves the text UNSENT in the input box (the
immediate Enter is swallowed while the TUI ingests the paste). Both now type the
message then retry Enter/C-m until the leading text is no longer in the input box
(= submitted) or a bounded loop gives up.
- msg-loop.sh: standalone reliable messenger for orchestrator use.
- launch.sh ping_session: same retry-submit (loads on next watchdog restart).
Live-tested: delivered first try.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 15:31:28 +01:00
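The retry-submit behaviour described in 71a4a1fea4 can be sketched as a bounded loop. This is a minimal illustration, not the actual msg-loop.sh: the function name `submit_with_retry` and its callback arguments are hypothetical, and the tmux calls appear only in comments as one way the callbacks might be wired.

```shell
# Sketch only: names are hypothetical, not taken from msg-loop.sh.
# Written to run under POSIX sh (with `local`) as well as bash.

# submit_with_retry <press_enter_fn> <still_pending_fn> [max_tries] [delay_s]
# Presses Enter via <press_enter_fn>, then asks <still_pending_fn> whether the
# message is still sitting in the input box. Stops as soon as it is gone
# (= submitted) or after max_tries attempts.
submit_with_retry() {
  local press_enter="$1" still_pending="$2" max_tries="${3:-5}" delay="${4:-1}"
  local i=1
  while [ "$i" -le "$max_tries" ]; do
    "$press_enter"
    sleep "$delay"            # give the TUI time to ingest the paste
    if ! "$still_pending"; then
      return 0                # leading text no longer in the input box
    fi
    i=$((i + 1))
  done
  return 1                    # bounded loop gave up
}

# One possible wiring (assumption; needs a live tmux session "$SESSION"):
#   tmux send-keys -t "$SESSION" -l "$MSG"          # type the message literally
#   press_enter()   { tmux send-keys -t "$SESSION" Enter; }
#   still_pending() { tmux capture-pane -t "$SESSION" -p \
#                       | grep -qF -- "$(printf '%.20s' "$MSG")"; }
#   submit_with_retry press_enter still_pending 5 1
```

Passing the two actions as callbacks keeps the retry policy separate from tmux, so the loop can be exercised with stubs.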
7a1f7f75aa Policy: prefer upstream env-parameterization over cc-ci compose overlays
Operator (2026-05-30): a cc-ci-authored compose overlay risks silent drift from
the recipe users actually run — avoid it wherever possible.

- plan.md §9 guardrail: when a recipe needs a cc-ci-env-tuned value (e.g. a longer
  healthcheck start_period for the slow single node), the preferred fix is an
  UPSTREAM recipe PR exposing it as an env var (e.g. APP_START_PERIOD) with the
  current value as the default in env.sample — CI sets the env, no new compose.
  For making the upgrade tier work from an older base version, prefer DECLARING
  that version not-testable under this CI env over crafting a custom compose.
  Overlay = last resort, Adversary-confirmed non-drifting + paired with the env PR.
- plan-prefer-env-over-compose-overlay.md: migrates the existing debt —
  ghost/discourse compose.ccci-health.yml start_period -> APP_START_PERIOD recipe
  PRs (default=current) then drop the overlays; discourse image re-pin + mumble
  old-base host-ports copy -> declare those old versions untestable instead of
  forking compose. No test weakened; untestable-version is an honest outcome.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 15:17:42 +01:00
a89b082240 plan §7: recommend Monitor-on-convergence pattern for long deploys (builder's idea)
For a long deploy/convergence, arm a Monitor that polls the node every ~30s and
wakes on convergence OR failure, with a longer fallback heartbeat (ScheduleWakeup)
as a backstop. Proceeds the instant it converges (no over-waiting), surfaces
failures promptly, and the heartbeat bounds the wait. Size the timeout sanely
(longer if justified, never absurd like the ~40-min ghost case). Credit: builder.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 05:17:18 +01:00
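The Monitor-on-convergence pattern from a89b082240 can be approximated by a plain polling loop. The sketch below is a stand-in, not the Monitor or ScheduleWakeup tooling itself: `wait_for_convergence`, the probe callback, and its exit-code convention are assumptions made for illustration, and the timeout plays the role of the fallback heartbeat.

```shell
# Sketch only: hypothetical helper, POSIX sh compatible.
# Assumed probe convention: exit 0 = converged, exit 2 = failed,
# any other exit code = still pending.

# wait_for_convergence <probe_fn> [interval_s] [timeout_s]
wait_for_convergence() {
  local probe="$1" interval="${2:-30}" timeout="${3:-600}"
  local waited=0 rc
  while :; do
    rc=0
    "$probe" || rc=$?
    case "$rc" in
      0) echo converged; return 0 ;;   # proceed the instant it converges
      2) echo failed;    return 2 ;;   # surface failures promptly
    esac
    if [ "$waited" -ge "$timeout" ]; then
      echo timeout; return 3           # the backstop bounds the wait
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

A caller would supply a probe that inspects the node (for example, service replica counts) and pick a timeout that matches the deploy in question.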
e85e16318c Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS
Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM
was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly
(bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software
micro-optimizations are judged unlikely to help, so:

- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline,
  attribution) + the optimization menu (image cache/prepull, readiness tuning,
  warm-SSO start/stop, runner caching, concurrency sizing, resources, secret
  overhead) under "Phase-2b empirical performance work", revisit only if
  measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal — confirm (and fix if needed) that the
  per-recipe test sequence already uses the minimum deploys (1 base shared by
  install+functional+backup/restore, +1 for the upgrade tier, +1 per dep),
  enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 05:07:49 +01:00
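The minimum-deploy budget stated in e85e16318c (1 shared base, +1 for the upgrade tier, +1 per dep) is simple arithmetic. The helper names below are hypothetical and do not reflect how the real DG4.1 check is implemented.

```shell
# Sketch only: hypothetical helpers, POSIX sh compatible.

# expected_deploys <num_deps> [has_upgrade_tier: 0|1, default 1]
# 1 base deploy (shared by install + functional + backup/restore)
# + 1 if the upgrade tier runs + 1 per dependency.
expected_deploys() {
  local deps="${1:-0}" upgrade="${2:-1}"
  echo $((1 + upgrade + deps))
}

# deploy_count_ok <actual_deploys> <num_deps> [has_upgrade_tier]
# Succeeds when a run used no more deploys than the budget allows.
deploy_count_ok() {
  [ "$1" -le "$(expected_deploys "$2" "${3:-1}")" ]
}
```

For a recipe with one dependency and an upgrade tier, the budget is three deploys, so a run that used four would fail the check.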
1c2be64124 Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff
Operator: when the final phase completes, install the weekly cron anchored to
actual completion — first run ~1h after the build finishes, weekly from then on
(supersedes the fixed "Sat 03:00 UTC" placeholder).

- plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's
  DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH +
  tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff:
  cheap --dry-run pre-check, then confirm the real T0 fire launched the
  cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced);
  record schedule + verified kickoff in DECISIONS.md.
- upgrade-all skill Cron section + cron memory updated to the completion-anchored
  schedule + first-kickoff verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:21:20 +01:00
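The completion-anchored schedule in 1c2be64124 (T0 = now + 1h, then weekly at T0's day of week and HH:MM) can be derived as follows. This is a sketch under stated assumptions: `weekly_cron_fields` is a hypothetical name, it relies on GNU `date` (`-d @epoch` and the `%-M` / `%-H` padding flags), and it assumes the cron daemon evaluates schedules in UTC.

```shell
# Sketch only: hypothetical helper; requires GNU date.

# weekly_cron_fields <completion_epoch>
# Prints the five cron time fields for a weekly job whose first run
# is one hour after the given completion time (UTC).
weekly_cron_fields() {
  local t0="$(( $1 + 3600 ))"
  date -u -d "@$t0" '+%-M %-H * * %w'   # minute hour dom month dow
}

# Possible use (assumption; the script path is a placeholder):
#   fields="$(weekly_cron_fields "$(date +%s)")"
#   echo "$fields /path/to/launch-upgrader.sh start"
```

A build that finishes on a Saturday at 02:00 UTC yields the same `0 3 * * 6` slot that the earlier fixed placeholder used.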
bf71420106 Add cc-ci-upgrader agent: observable one-shot weekly upgrade-run agent
The weekly upgrade run now executes inside a dedicated, remote-control agent
(cc-ci-upgrader) — viewable/steerable at claude.ai/code like the Builder — rather
than buried in headless cron output.

- launch-upgrader.sh: spins up the cc-ci-upgrader tmux session under
  --remote-control with a kickoff that runs /upgrade-all (DEFAULT mode) to
  completion. On finish the agent STOPS and stays idle (does NOT self-terminate)
  so the run + summary stay reviewable in the web UI. `start` = use-or-create:
  leaves an in-flight (busy) run alone, else clears a finished/idle/wedged
  session and runs fresh; `fresh` always restarts. UPGRADER_ARGS passes flags
  (e.g. --dry-run); never --with-tests.
- launch.sh: orchestrator_alive() now also skips the cc-ci-upgrader
  remote-control name, so the upgrader job isn't mistaken for the orchestrator.
- upgrade-all skill: documents it runs as the cc-ci-upgrader agent; the weekly
  cron invokes `launch-upgrader.sh start` (not /upgrade-all inline).
- Phase 5: V8a verifies the agent lifecycle (launch → run to completion → stay
  idle/viewable → next start clears it); V9 stops the verification session.
- cron memory: weekly task = launch-upgrader.sh start at 0 3 * * 6 UTC.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:12:47 +01:00
4f74676c72 Phase 5 (final): verify the /recipe-upgrade + testme-on-pr.sh end-to-end flow
Appended as the LAST phase in the launcher sequence (… 3 4 5). It can only run
once cc-ci is fully built — the !testme-on-recipe-PR flow depends on Phase 3
(results UX) surfacing the run result back on the PR for testme-on-pr.sh to read.

DoD (Adversary cold-verifies): !testme on a recipe PR is the real gate + results
land in the PR (V1); testme-on-pr.sh reads GREEN/RED/PENDING + BUILD url, POST=0
polls without re-triggering (V2); /recipe-upgrade default end-to-end green on a
sandbox recipe, nothing merged (V3); the ≤3 !testme regression loop (V4); stale
test DEFAULT = comment-only, no test edit (V5); --with-tests opens+verifies a
cc-ci test PR, paired (V6); mirror reconcile closes merged/superseded PRs and
main==upstream (V7); /upgrade-all default dry-run + small live run never edits
tests (V8); all verification PRs closed + deploys torn down (V9). Use a sandbox
recipe; never merge; never weaken tests. Watchdog reloaded (seq …3 4 5).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 20:38:39 +01:00
4a1da1dd60 recipe-upgrade: !testme-on-PR verification + make test PRs opt-in (--with-tests)
Per operator:
- Verify via `!testme` posted ON the recipe PR (the real CI path) so results are
  viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test).
  New helper testme-on-pr.sh posts !testme and polls the PR head commit status
  for the verdict (POST=0 to keep polling without re-triggering).
- Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using
  existing tests; if a test fails and is genuinely stale, leave an explanatory
  COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests)
  and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR
  path (verified via the branch-checkout harness run, since !testme uses prod tests).
- upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended;
  surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe.
- New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 20:18:59 +01:00
c7da03fa6c watchdog: STALL_GRACE so stall_check never races a loop's own ScheduleWakeup
Root cause of the adversary "overrun": stall_check rebooted the instant
now >= WAITING-UNTIL (zero grace), but the loop's own ScheduleWakeup fires AT
that stated time — and the runtime scheduled it ~40s later than the marker
(date-vs-scheduler skew). So the watchdog pre-empted a HEALTHY self-wake by
~37s; the loop wasn't wedged, it was killed just before it woke. That was the
single false reboot at 18:55Z.

Fix: split the two cases cleanly.
- Marker present: reboot only when now > WAITING-UNTIL + STALL_GRACE (180s) —
  covers wake+start latency + marker/scheduler skew, so the watchdog only fires
  if the self-wake GENUINELY failed.
- No marker: unchanged — reboot when idle >= STALL_IDLE (300s).

Verified post-fix: adversary self-woke on time and re-paced (WAITING-UNTIL
19:19:30Z); no new stall reboots.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 20:12:46 +01:00
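The two-case rule from c7da03fa6c can be written as a small decision function. The thresholds come from the commit message; the function name and argument order are assumptions, and the real stall_check in launch.sh may be structured differently.

```shell
# Sketch only: hypothetical helper, POSIX sh compatible.
STALL_GRACE=180   # seconds of grace after WAITING-UNTIL (from the commit)
STALL_IDLE=300    # idle threshold when no marker is present (from the commit)

# should_reboot <now_epoch> <idle_seconds> [waiting_until_epoch]
# Exit 0 = reboot the loop, exit 1 = leave it alone.
should_reboot() {
  local now="$1" idle="$2" until_ts="${3:-}"
  if [ -n "$until_ts" ]; then
    # Marker present: fire only once the self-wake has genuinely failed.
    [ "$now" -gt $((until_ts + STALL_GRACE)) ]
  else
    # No marker: unchanged idle rule.
    [ "$idle" -ge "$STALL_IDLE" ]
  fi
}
```

With a marker at t=1000, a check at t=1037 (roughly the point of the false reboot) now leaves the loop alone; only a check after t=1180 reboots it.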
e8c4330ce3 watchdog: reboot idle-wedged loops via self-reported WAITING-UNTIL markers
The builder wedged at the context limit (garbled output) — alive but matching
none of heal_session's signals (dead/FATAL/limit), so the watchdog left it
stuck. Fix: loops now declare every wait, and the watchdog reboots a wait that
never resumes.

- plan.md §7 + both prompts: cap every wait at 10 min (chunk longer waits);
  before going idle, the loop's FINAL line must be `WAITING-UNTIL: <ISO8601 UTC>`
  (the resume time, matching its ScheduleWakeup); run /compact proactively at
  ~80% context to avoid wedging near the limit.
- launch.sh: new stall_check (runs on every 30s signal tick) that reboots a loop idle
  >= STALL_IDLE (300s) when it has NO current WAITING-UNTIL marker as its last
  message OR is past the time the marker named; a healthy paced wait (marker
  present, before its time) is left alone. Complements heal_session's
  dead/FATAL/limit cases. Reboot is safe — loops re-orient from git + STATUS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 19:05:29 +01:00
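The marker convention introduced in e8c4330ce3 requires the loop's final line to be `WAITING-UNTIL: <ISO8601 UTC>`. A watchdog could extract it roughly as shown below. This is a sketch: `last_wait_marker` is a hypothetical name, and the exact timestamp shape (seconds precision with a trailing `Z`) is an assumption based on the example time quoted in the later STALL_GRACE commit.

```shell
# Sketch only: hypothetical helper, POSIX sh compatible.

# last_wait_marker < captured_output
# Prints the timestamp if the last non-empty line is a WAITING-UNTIL marker;
# returns 1 if the marker is missing, malformed, or not the final line.
last_wait_marker() {
  local line last=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [ -n "$line" ]; then last="$line"; fi
  done
  case "$last" in
    "WAITING-UNTIL: "[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z)
      printf '%s\n' "${last#WAITING-UNTIL: }" ;;
    *)
      return 1 ;;
  esac
}
```

The extracted timestamp can then be converted to epoch seconds (with GNU date, `date -u -d "$ts" +%s`) and compared against the current time.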
62b7af7a97 recipe-upgrade: reconcile mirror to upstream main + close merged/superseded PRs
Per operator: an open mirror PR must mean "genuinely still open against true
current upstream main". On every run the recipe-upgrade flow now:
- force-syncs the recipe-maintainers/<recipe> mirror `main` to be IDENTICAL to
  upstream main (origin/main of the abra checkout = coopcloud);
- closes any open mirror PR whose changes are already in upstream main (merged
  upstream, no-op merge detected via `git merge-tree` vs main's tree) — even
  when the recipe is up to date (new `--reconcile-only` mode, run in step 1);
- when opening a new upgrade PR, closes any other still-open PR for that recipe
  (superseded) and opens the new one IN ITS PLACE; same-version re-runs just
  update the existing same-branch PR.
open-recipe-pr.sh gains the --reconcile-only mode + the close logic (with an
auto-close comment naming the reason). upgrade-all reconciles every candidate's
mirror during the survey so merged PRs are closed fleet-wide. Still never merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:32:34 +01:00
a8b4b4c39e upgrade-all: pin weekly slot (Sat 03:00 UTC) + defer activation until cc-ci is built
Operator: don't run the weekly upgrade-all while the build loops are still
constructing cc-ci (shared-host contention). Activate the Sat 03:00 UTC
(0 3 * * 6) cron only once the build is complete; on-demand until then.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:24:40 +01:00
db31c08d6a Add /recipe-upgrade + /upgrade-all skills (cc-ci-gated upgrades, never merge)
Per-recipe and fleet-wide upgrade skills modelled on recipe-maintainer's
recipe-upgrade-full / recipe-upgrade-cron-all, but gated by the cc-ci CI server
and inheriting ci-test-review's create+verify+never-merge discipline.

- recipe-upgrade/: plan (release notes, breaking changes) -> implement (abra
  recipe upgrade + version bump + config, lint) -> open the recipe PR -> VERIFY
  green on cc-ci (full suite cold against the PR head via verify-pr.sh). If the
  upgrade is correct but a cc-ci TEST went stale, also update the test, verify
  it, and open a second PR to recipe-maintainers/cc-ci. Never merges; never
  weakens a test; prefers a recipe-only PR. Emits a parseable RESULT line.
  + open-recipe-pr.sh: adapted recipe-create-pr; runs on cc-ci (has the recipe
    checkout + bot token), creds passed from the orchestrator .testenv;
    force-syncs the mirror main so the PR diff is exactly the upgrade.
- upgrade-all/: weekly fan-out — enumerate enrolled recipes, survey upgrades,
  run /recipe-upgrade per upgradeable recipe via subagent (sequential default,
  --parallel / --dry-run), collect into one PR-list summary. Coordination +
  single-writer + shared-Swarm-teardown guardrails; built for a weekly cron.
- ci-test-review/verify-pr.sh: pass SRC (recipe-maintainers/<recipe>) alongside
  REF so the harness clones the mirror PR head correctly (its real contract).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:19:20 +01:00
27480b3513 Commit the 3r removal + skills-tracking .gitignore (missed in prior 2 commits)
The earlier `git add` included an already-`git rm`'d pathspec, so it errored and
staged nothing — launch.sh (3r removal) and .gitignore (track .claude/skills/)
were left uncommitted while the skill files went in via a separate -f add.
Runtime was already correct (watchdog reads the working-tree launch.sh); this
just syncs git HEAD to the working tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:05:43 +01:00
cbe1406bce ci-test-review: close the loop — author + open + cc-ci-verify fix PRs (never merge)
Per operator: the skill should not just propose, it should CREATE the fix PR
(recipe repo or cc-ci repo) and VERIFY it green on its own CI server — but not
merge. It drives cc-ci like the loops do.

- SKILL.md: diagnose+classify (recipe vs CI-server) -> author the fix + open a
  PR (recipe-create-pr for recipe PRs; Gitea API for cc-ci PRs, dedicated branch
  in a separate clone, single-writer safe) -> VERIFY on cc-ci (full suite cold
  against the PR head = the !testme dogfood path) -> report a verified,
  ready-to-merge PR. Never merges; never weakens a test; flake != bug. General
  bar = one cold green; repeated-green (REPEAT=3) only for a known-flaky recipe.
  Adds coordination/single-writer guardrails (shared Swarm is stateful; tear
  down deploys; never push main or touch the loops' clones).
- verify-pr.sh: deterministic recipe-PR gate — RECIPE + REF -> cold full suite
  on cc-ci, green iff every repeat exits 0. CI-server-PR verification stays
  bespoke (branch checkout + rebuild + regression sample) per SKILL.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:04:35 +01:00
2530845e50 orchestrator: add /ci-test-review skill (in THIS repo) + drop Phase 3r from loops queue
The on-demand AI review layer is now an orchestration-repo skill built directly
by the orchestrator, NOT a loops phase in the cc-ci product repo:

- .claude/skills/ci-test-review/{SKILL.md,run-all-recipes.sh}: runs the real
  cc-ci harness across all enrolled recipes (deterministic, AI-free execution),
  then AI diagnoses each failure and classifies it as needing a recipe PR / a
  CI-server PR / a stale-test update — or reports "ALL PASSED, recipes + tests
  up to date". Proposes PRs; never decides pass/fail; never auto-merges.
- .gitignore: track .claude/skills/ (shareable) while still ignoring local
  claude session state (locks, history) under .claude/.
- launch.sh: remove Phase 3r from PHASES_SPEC; loops sequence back to
  1c 1b 1d 1e 2w 2pc 2 2b 3 4. Deleted plan-phase3r (superseded by the skill).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 16:57:26 +01:00
5f84f8c028 plan: Phase 3r — /ci-test-review Claude skill (on-demand AI review + recipe-vs-CI PR diagnosis)
Deterministic CI stays the primary, AI-free path. Adds a separate on-demand skill (ships in the
cc-ci repo .claude/skills/ci-test-review/) that runs the full suite across all recipes and, per
failure, AI-diagnoses + classifies: recipe PR (+ proposed change) vs CI-server PR vs stale-test;
or 'all passed, recipes+tests up to date' (incl. a latest-version freshness check). Proposes, never
auto-merges (operator-merge rule). Slotted 3 -> 3r -> 4. AI only diagnoses; execution stays
deterministic.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 16:39:07 +01:00
61ab3ecb3a plan: per-test image pre-pull sub-plan (warm images before deploy + upgrade; cheap on warm cache)
Resolve a recipe's images (docker compose config --images) and docker pull them (skip-if-present for
pinned tags) at the start of the recipe sequence + before the upgrade-new-version deploy, then the
normal abra deploy. Separates pull from converge (clear pull failures vs murky convergence timeouts),
speeds convergence (fits abra-native window). No layer re-download on warm cache; nightly all-recipes
run warms everything. Complements (not replaces) the recipe healthcheck for slow-init convergence.
Near-term Phase-2 harness unit; real abra deploy unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 14:55:21 +01:00
e7ed0e14b8 lasuite-drive PR: scope the repeated-green/3x bar to lasuite-drive (flakiness proof) — NOT the general standard (operator 2026-05-29)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 13:25:10 +01:00
7a87dc02b1 plan: lasuite-drive recipe-robustness PR sub-plan (collabora healthcheck + perms + lazy OIDC)
Operator (2026-05-29): dedicated sub-plan for the upstream recipe PR. Fixes collabora WOPI
healthcheck/start_period (keystone — fixes F2-12 at the source so cc-ci can return to abra-native
convergence + drop the -c/READY_PROBE backstop), backend WOPI retry, gunicorn-perms race, lazy OIDC.
PR is 'working' only when cc-ci runs the full suite incl. upgrade tier green + Adversary cold-verify,
then operator merges. Broken out from plan-lasuite-drive-oidc-robustness.md Part B (now points here).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:58:36 +01:00
7f8e6cb13e guardrail: abra convergence by default; custom READY_PROBE only when necessary + a real strict test (operator 2026-05-29, re F2-12)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 12:56:26 +01:00
294a8a1a9e rename the opt-in heavy-tests flag: --extra-tests -> --extra (operator 2026-05-29)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 10:36:04 +01:00
b4451527c3 builder: clean-tree-before-claim discipline (git status must be clean — Adversary cold-verifies from git)
Cheap guard against the deploy/git divergence: a fix built locally but uncommitted/un-pushed is a
guaranteed Adversary cold-build mismatch. Added to the builder prompt claim discipline + plan.md
§6.1. (Lighter than binding the deploy to a git rev — iteration speed + the Adversary's
cold-from-git verify is the real safety net.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:49:09 +01:00
f7971d949d 2pc: drop the pull-through registry cache — single host makes it marginal; keep PC1 prune-policy only
Operator (2026-05-29): on one host Docker's local image store already IS the cache; the churn was
over-pruning, not a missing cache. So 2pc = conservative prune policy + confirm local-store retention
+ daemon auth (PC1-3). Registry pull-through cache deferred to IDEAS with a concrete revisit
condition (multi-node, or measured cold-deploy bottleneck on recreate-surviving storage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:24:56 +01:00
0352cb5607 plan: Phase 2pc — image pull-through cache + sane prune policy (front-loaded perf interjection)
Operator-directed (2026-05-29): front-load the two EVIDENCE-BASED image wins before grinding the
remaining deploy-heavy recipes — Phase 2 pauses, 2pc runs, Phase 2 resumes (seq: …2w 2pc 2 2b 3 4).
PC1: conservative prune (no reflexive `prune -af`, never mid-run, keep base images) — kills the
documented prune→re-pull→rate-limit churn. PC2: local registry:2 pull-through cache for docker.io,
PAT-authenticated, Nix-reconciled, daemon registry-mirror → transparent to abra/swarm; subsequent
pulls (across recipes/runs/post-prune) are local → faster deploys + rate-limit gone. Bounded scope:
these two only; concurrency/readiness-tuning stay in measurement-driven 2b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:20:30 +01:00
f40ac6d1ad sso-dep: resolve authentik question — default keycloak; authentik ONLY if a recipe requires it; Phase-2 DONE not gated on it (operator 2026-05-29)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 09:08:20 +01:00
269253916c plan: lasuite-drive OIDC-setup flakiness — harness restructure (A) + recipe robustness PR (B)
Deferred lasuite-drive [~] (Q3.2). Two parts: (A) cc-ci wires OIDC at INSTALL against the live-warm
keycloak (WC1) so there's no flaky mid-run 12-service --chaos reconverge — using REAL abra commands
only (no docker service update bypass; operator decision); (B) a lasuite-drive recipe PR fixing the
root cause (collabora WOPI healthcheck-gating + gunicorn-perms race + lazy/retrying OIDC discovery).
Operator rule: a recipe change is "working" only once cc-ci runs the full suite on the PR and it's
repeatedly green (Adversary cold-verified) — then the operator merges. A+B reinforce (lazy OIDC makes
install-time wiring safe for the generic-first invariant). Ground the fix in captured failure logs first.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 08:57:26 +01:00
ae83a8120d watchdog: signal handoffs off claim()/review() commit prefixes (robust) + codify the convention
Replaces the brittle markdown prose-match ("Gate: … CLAIMED, awaiting Adversary") with detection of
the loops' conventional commit prefixes on origin/main: a new `claim(...)` commit pings the
Adversary; a new `review(...)` commit pings the Builder. Edge-triggered on the origin/main SHA
(append-only — no force-push), no file parsing, can't mis-route. The loops already use these
prefixes consistently; codified as a load-bearing contract in plan.md §6.1 + both prompts so it
stays reliable. INBOX detection unchanged (pushed-state, file-routed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 03:10:12 +01:00
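The prefix-based routing in ae83a8120d can be sketched as below: a `claim(...)` commit pings the Adversary, a `review(...)` commit pings the Builder, and nothing fires unless the origin/main SHA has changed. The function names and the sample subjects used in testing are hypothetical; the sketch only inspects the tip commit, which the real watchdog may handle differently.

```shell
# Sketch only: hypothetical helpers, POSIX sh compatible.

# route_handoff <commit_subject>
# Prints which loop to ping, or nothing for any other commit.
route_handoff() {
  case "$1" in
    "claim("*)  echo adversary ;;
    "review("*) echo builder ;;
  esac
}

# handoff_on_new_sha <last_seen_sha> <current_sha> <commit_subject>
# Edge-triggered: only routes when the pushed SHA actually changed.
handoff_on_new_sha() {
  if [ "$1" != "$2" ]; then
    route_handoff "$3"
  fi
}

# Possible inputs (assumption), after `git fetch origin`:
#   current_sha="$(git rev-parse origin/main)"
#   subject="$(git log -1 --format=%s origin/main)"
```

Because the match is anchored at the start of the subject, a prose mention of `claim(` later in a message does not trigger a ping.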
e0e60bc2bc watchdog: fix handoff lag — detect on pushed origin/main + precise formal-claim match
The handoff pings fired on the writer's LOCAL working-tree write (before push), so the receiver
pulled a stale origin/main, saw "no formal gate", and a clarifying inbox round-trip ensued
(several minutes + wasted turns per handoff). And the gate-id parser read "WC1" as "C1" and could
fire on prose mentions.

Fix (1): handoff_check now `git fetch`es and reads origin/main (what the receiver will pull), via
_wd_fetch_origin + _wd_show_pushed, for STATUS / REVIEW / both INBOXes — a ping only fires once the
claim/verdict is actually pushed, so the receiver's pull always sees it. Eliminates the stale-pull
"premature" dance.
Fix (2): gate-claim detection matches ONLY a formal line (Gate: <id> … CLAIMED, awaiting Adversary)
and edge-triggers on a genuinely-new such line compared whole — no firing on historical
"CLAIMED detail" lines or prose; gate-id is a best-effort label only.

Loops' clones have a credential helper (reads .testenv) so the watchdog's fetch works
non-interactively. Verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 02:38:47 +01:00
9f99b134cd ideas: ALT infra-app model — traefik/keycloak/drone as normal coop-cloud abra deployments, maintainer-updated outside Nix (parked, operator-flagged)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 00:12:15 +01:00
00e90bb597 plan(2w): WC1.2 — pre-deploy auto-upgrade safety gate (major/manual-migration -> alert, hold)
Operator (2026-05-29): a passing health check does NOT prove a required manual migration ran, so
auto-update needs a PRE-deploy gate in addition to the post-deploy health rollback. Reconciler
auto-applies only non-major (patch/minor) upgrades with no manual-migration release notes; a MAJOR
recipe-version bump (or release notes flagging a manual migration) is held on the current version
with a PushNotification carrying the release notes (operator upgrades manually). Leans on abra's
own major-bump caution + recipe releaseNotes/. Updated WC1.2/WC6/principles/decisions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-29 00:02:28 +01:00
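The pre-deploy gate in 00e90bb597 reduces to a small decision: auto-apply patch or minor bumps, hold on a major bump or a flagged manual migration. The sketch below assumes the major version is the leading integer of the version string and takes the manual-migration signal as a pre-computed flag; both are illustrative assumptions, as are the function names.

```shell
# Sketch only: hypothetical helpers, POSIX sh compatible.

# major_of <version>: leading integer component, ignoring an optional "v".
major_of() {
  local v="${1#v}"
  echo "${v%%.*}"
}

# upgrade_decision <current_version> <candidate_version> <manual_migration: 0|1>
# Prints "apply" for a non-major upgrade without a manual migration,
# otherwise "hold" (stay on the current version and alert the operator).
upgrade_decision() {
  if [ "$3" = 1 ] || [ "$(major_of "$1")" != "$(major_of "$2")" ]; then
    echo hold
  else
    echo apply
  fi
}
```

How the manual-migration flag is derived from release notes is left out here, since the commit does not specify it.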
c3a572e4b9 plan(2w): warm/infra auto-latest nightly + health-gated rollback (snapshot stateful apps)
Operator decision (2026-05-28): traefik + keycloak stay UNPINNED (fetch latest + chaos deploy);
a nightly `nixos-rebuild switch` rolls them to latest, then the full-cold sweep runs. The nix
closure stays byte-identical (recipe fetched at runtime, not in the store) so D8 holds.
Health-gated rollback is built INTO the reconciler (not nix-generation rollback, since the swarm
app isn't in the generation): record last-good -> deploy latest -> health-check -> commit or
roll back + PushNotification. Stateful apps (keycloak): snapshot the data volume before upgrade
(undeploy->snapshot->deploy-latest) and restore it on rollback, reusing the WC3 snapshot helper;
traefik = version rollback only. Added WC1.1 + updated WC1/WC6/milestones/guardrails/decisions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 23:59:16 +01:00
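The reconciler sequence in c3a572e4b9 (record last-good, deploy latest, health-check, then commit or roll back) can be outlined with injected steps. This is a sketch with hypothetical names; it omits the volume snapshot for stateful apps and the PushNotification, which would live inside the supplied callbacks.

```shell
# Sketch only: hypothetical helper, POSIX sh compatible.

# reconcile_with_rollback <record_last_good_fn> <deploy_latest_fn> \
#                         <health_check_fn> <rollback_fn>
reconcile_with_rollback() {
  local record_last_good="$1" deploy_latest="$2" health_check="$3" rollback="$4"
  # Never deploy without a recorded rollback target.
  "$record_last_good" || return 2
  if "$deploy_latest" && "$health_check"; then
    echo committed
    return 0
  fi
  "$rollback"
  echo rolled-back
  return 1
}
```

For a stateful app such as keycloak, the record step would take the data-volume snapshot and the rollback step would restore it; for traefik, the rollback would only redeploy the previous version.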
a2728eec2d plan: Phase 2w — warm canonical deployments + --quick CI mode (interjected into Phase 2)
Operator-directed: pause Phase 2, build the warm-data + --quick system, then resume Phase 2.
- live-warm keycloak (SSO dep, realm-per-run), data-warm canonicals (undeploy keeps volume),
  cold = authoritative default. --quick reattaches the canonical, upgrades to PR head, asserts,
  and rolls back to the last-known-good snapshot on failure (never loses working data).
- known-good = raw volume copy taken while undeployed (consistent), one per app, advanced ONLY
  by green cold runs; a nightly full-cold sweep refreshes canonicals + is a daily regression run.
- launch.sh: insert 2w at the current index (Phase 2 -> resumes after 2w DONE); seq is now
  1c 1b 1d 1e 2w 2 2b 3 4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 23:04:33 +01:00
11a2ce652d watchdog: self-heal FATAL session-state errors + supervise the orchestrator
- heal_session: detect the unrecoverable "thinking/redacted_thinking blocks cannot
  be modified" 400 (recurs every turn, session stays alive so the dead-check misses
  it) and kill+restart the loop fresh (re-orients from repo). Consolidates the
  dead/fatal/limit handling for builder+adversary.
- heal_orchestrator: keep the orchestrator alive too, conflict-safe. Restarts via
  launch-orchestrator.sh ONLY when no orchestrator is alive anywhere — liveness
  detects both a managed cc-ci-orchestrator tmux session AND a hand-launched
  terminal session (any non-loop claude), so it never double-resumes the
  conversation (the likely cause of the thinking-block crashes). Kill+restart if
  the managed session is wedged on the FATAL error. Toggle: WATCH_ORCHESTRATOR=0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 21:09:21 +01:00
36a6c9872a orchestrator: reboot-resilience + session auto-resume + full session plan/tooling
Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot
  records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the
  orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed
  orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review),
  sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution,
  limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00
5681438b0f launch.sh fix: don't let an empty-match grep kill the watchdog (set -e + pipefail)
handoff_check's now="$(grep CLAIMED.*awaiting ... )" returned non-zero when a phase's STATUS
has no claimed-awaiting lines yet (normal early in a phase); under set -euo pipefail that
assignment exited the whole watchdog. Append `|| true` to the now= and cur= command
substitutions. Verified: watchdog survives the handoff tick on a freshly-created STATUS-1c.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 16:09:01 +01:00
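The failure mode fixed in 5681438b0f is easy to reproduce: `grep` exits 1 when nothing matches, and under `set -e` a plain assignment from a failing command substitution aborts the script. The sketch below shows the `|| true` guard in isolation; `claimed_lines` is a hypothetical function, not the actual handoff_check code.

```shell
# Sketch only: hypothetical helper. Runs under bash; also under POSIX sh,
# where pipefail is enabled only if the shell supports it.
set -eu
(set -o pipefail) 2>/dev/null && set -o pipefail

# claimed_lines <status_text>
# Prints lines matching the claimed-awaiting pattern, or nothing.
claimed_lines() {
  # `now` is declared separately from its assignment: `local now="$(...)"`
  # would mask the substitution's exit status and hide the problem.
  local status_text="$1" now
  # Without `|| true`, an empty match would make this assignment fail
  # and, under set -e, terminate the whole script.
  now="$(printf '%s\n' "$status_text" | grep 'CLAIMED.*awaiting' || true)"
  printf '%s' "$now"
}
```

The guard treats "no match" as a normal empty result, which is the expected state early in a phase. A real grep error (exit code 2) is also swallowed by this pattern, a trade-off worth keeping in mind.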
782a3c7360 Phase-1c: true verification = Adversary deletes the throwaway VM, creates a fresh one, full install
Strengthen C4/W5: the genuine reproducibility proof is a clean-room repeat — the Adversary
destroys any existing throwaway VM, creates a brand-new blank VM, and runs the entire install
from scratch per docs/install.md so nothing from the Builder's setup attempt can mask a gap.
Cold, with logged evidence (VM id, exact install commands, convergence + TLS-from-git-cert).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 16:05:54 +01:00
994e52c101 launch.sh: phase-aware sequencer (run 1c -> auto-transition 1b -> stop for manual gate)
Make the launcher drive an ordered phase sequence (default 1c then 1b). Each phase has its own
plan + phase-namespaced loop-state files (STATUS-<id>.md/BACKLOG/REVIEW/JOURNAL); the watchdog
auto-transitions when the current phase's STATUS-<id>.md shows ## DONE, and STOPS after the last
phase (writes SEQUENCE-COMPLETE, exits) as a manual gate before Phase 2. start_agent injects a
phase preamble (source-of-truth = phase plan; phase-namespaced state) ahead of the base role
prompt. DONE detection reads the builder's local clone (reliable, no push-lag). Handoff signalling
+ resilience preserved and made phase-scoped (reset baseline on transition).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 16:00:51 +01:00
9d13bb0b58 Reorder: Phase 1c before 1b (refactor first, then review/lint + full re-verify)
1c (full git reproducibility: cc-ci-secrets split, cert-in-sops, genuine D8 live rebuild)
now runs before 1b. This way 1b's review/lint and its final cold re-verification of all
D1-D10 cover the final refactored state (incl. the secrets split) and the genuine post-1c
D8 — rather than reviewing pre-refactor code and re-verifying a flawed D8. Updated status
lines in 1b/1c and the README ordering. Sequence: 1 -> 1c -> 1b -> 2 -> 2b -> 3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:51:07 +01:00
c6d27b251a Phase-1c: split only secrets into a separate cc-ci-secrets repo; base stays parameterized
Per operator: the split boundary is secrecy, not modularity. Only the sops-encrypted secrets
(incl. the wildcard cert) move to a separate private repo `cc-ci-secrets` (extra access-control
layer), consumed by the base via a flake input. Instance non-secret vars (domain, gateway,
recipients) stay in the well-parameterized base cc-ci repo — another admin repoints by editing
params, no second config repo. Guardrail reworded: instance vars in base are fine; only plaintext
SECRETS must never leak into base/store. Updated model/C1/C2/W2/§6/§7 + README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:47:57 +01:00
769dfd0c62 Phase-1c: resource plan -> 4GB/4GB under a 12GB guideline (not 2GB)
Per operator: don't downsize cc-nix-test to 2GB. Instead raise the terraform-ci running-RAM
guideline to ~12GB (it's doc-only — the project has no enforced limits.memory; b1 is 16GB),
resize cc-nix-test 6->4GB, and create the throwaway VM at 4GB (cc-nix-test 4 + throwaway 4 + lichen 4 = 12GB <= b1's 16GB).
Updated W1/W3/C6/§4 and the incus memory note.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:29:37 +01:00
d41a76f757 Add Phase-1c plan: full git reproducibility (secrets+cert in sops) + genuine D8 live rebuild
D8's throwaway-VM live rebuild was wrongly marked "infeasible by design" — the master
recovery age key defeats the sops-host-key reason, DNS/cert is a precondition not a
rebuild blocker, and Incus was available. Phase 1c (loop-driven): (A) make the VM fully
reproducible from git including ALL secrets — move the wildcard cert + every secret into
sops-in-git, split generic base repo from a private instance repo composed via a flake
input (the only out-of-band secret is the bootstrap age key); (B) actually perform +
cold-verify a blank-VM nixos-rebuild and rewrite D8 honestly. Resize cc-nix-test to 2GB
first to free b1 headroom for a sized throwaway VM; destroy it after; restore/promote
sizing. Gandi token stays out of repo/agent (only the cert artifact is committed). Linked README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 15:24:48 +01:00
e68a520d4c Fix watchdog false gate-ping: edge-trigger on NEW claimed-awaiting gate ids, baseline silently
The Adversary got a spurious "gate CLAIMED" ping: STATUS.md keeps historical
"Gate: Mn — CLAIMED, awaiting Adversary" lines after they PASS, and on watchdog restart the
first observation pinged on those already-passed lines. Now track the SET of gate ids on
CLAIMED-awaiting lines and ping only when an id NEWLY appears vs the prior observation, after a
silent baseline. A gate passing (line kept) or evidence edits don't re-ping; restart re-baselines
without pinging. Verified: watchdog restart no longer pings.
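
The edge-trigger described above can be sketched as a set difference against the previous observation. This is an illustration in Python, not the shell code in launch.sh; `GateWatcher`, `claimed_gate_ids`, and the regular expression are hypothetical, and the pattern assumes each claimed line has the shape "Gate: <id> <separator> CLAIMED, awaiting Adversary".

```python
import re

# Hypothetical pattern: "Gate: <id> <separator> CLAIMED, awaiting Adversary".
# The separator is matched loosely so any dash style works.
GATE_RE = re.compile(r"Gate:\s*(\S+)\s+\S+\s+CLAIMED, awaiting Adversary")


def claimed_gate_ids(status_text: str) -> set[str]:
    """Collect the ids on all CLAIMED-awaiting lines, historical ones included."""
    return set(GATE_RE.findall(status_text))


class GateWatcher:
    """Ping only for gate ids that newly appear since the prior observation."""

    def __init__(self) -> None:
        self.seen: set[str] | None = None  # None means no baseline yet

    def observe(self, status_text: str) -> set[str]:
        current = claimed_gate_ids(status_text)
        if self.seen is None:
            # First observation (e.g. after a watchdog restart): baseline silently.
            self.seen = current
            return set()
        new_ids = current - self.seen
        self.seen = current
        return new_ids
```

Because a passed gate's line stays in STATUS.md, its id remains in the set and never shows up in the difference again, which is what suppresses the spurious re-ping.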

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:25:09 +01:00
649b90b586 launch.sh: resolve script to absolute path (SELF) so the watchdog re-invokes correctly
Bug: start_watchdog used $0, which breaks when launch.sh is called by a relative path
(the watchdog tmux session cd's into PLAN_DIR, so a relative $0 no longer resolves —
"No such file or directory", watchdog dies instantly). Resolve BASH_SOURCE to an absolute
SELF once and use it for the watchdog self-invocation. Verified: watchdog now starts and
its handoff_check immediately pinged the Adversary about a standing CLAIMED gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:16:54 +01:00
239dfd8e26 Watchdog handoff signalling: ping the waiting loop on gate-claim / verdict (kill double-idle)
launch.sh watchdog now runs a fast (~30s) handoff_check alongside the heavy (300s) restart/DONE
check: when the Builder writes a CLAIMED gate it pings the Adversary to verify now; when the
Adversary updates REVIEW.md it pings the Builder to proceed (edge-triggered, reads local clones).
So a pending handoff resolves in under ~30s instead of waiting out a whole idle interval. Pacing revised: the
Adversary may idle freely when nothing's pending (no pointless re-verify/busy-poll) and is woken
by the watchdog; Builder waits on the ping + a fallback ~2-4m self-poll. kickoff documents the
new "handoff signalling" role.
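
One way to picture the two cadences is a scheduler that reports which checks are due on each tick. The sketch below is a Python illustration under that assumption; `checks_due` and the check labels are hypothetical, and only the ~30s and 300s intervals come from the commit message.

```python
# Intervals taken from the commit message; everything else is illustrative.
FAST_INTERVAL_S = 30    # handoff_check cadence
HEAVY_INTERVAL_S = 300  # restart/DONE check cadence


def checks_due(now_s: float, last_fast_s: float, last_heavy_s: float) -> list[str]:
    """Return the checks whose interval has elapsed at time now_s."""
    due = []
    if now_s - last_fast_s >= FAST_INTERVAL_S:
        due.append("handoff_check")
    if now_s - last_heavy_s >= HEAVY_INTERVAL_S:
        due.append("restart_done_check")
    return due
```

The cheap handoff check fires ten times for every heavy check, so a gate claim or verdict is noticed quickly without running the expensive restart logic more often.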

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:15:25 +01:00
deca47d9c7 Pacing §7: avoid both-loops-idle during a handoff (short-poll when blocked on the counterpart)
Root cause of "both waiting": parked-at-gate was lumped into the long idle sleep, so a
pending handoff sat while both loops slept on desynced timers. Fix: three cases — (1) in
flight → ~4m; (2) BLOCKED ON THE OTHER LOOP (Builder at CLAIMED gate / Adversary awaiting a
fix) → ~4m poll for the counterpart, never long-idle; (3) genuinely nothing pending → ~10-15m.
Adversary: a CLAIMED gate is immediate top-priority; otherwise run background probes, rarely
idle while Builder is active. Builder: keep an unblocked item in hand to rarely be fully gated.
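
The three pacing cases can be read as a small lookup from loop state to sleep interval. The following Python sketch is illustrative only: `LoopState` and `sleep_range_s` are hypothetical names, and the second values are the approximate minute figures from the message converted to seconds.

```python
from enum import Enum


class LoopState(Enum):
    IN_FLIGHT = "in_flight"                  # a build/deploy is running
    BLOCKED_ON_COUNTERPART = "blocked"       # waiting on the other loop
    NOTHING_PENDING = "nothing_pending"      # genuinely idle


def sleep_range_s(state: LoopState) -> tuple[int, int]:
    """Return the (min, max) sleep in seconds for a loop in the given state."""
    if state is LoopState.IN_FLIGHT:
        return (240, 240)        # ~4m poll while work is in flight
    if state is LoopState.BLOCKED_ON_COUNTERPART:
        return (240, 240)        # ~4m poll for the counterpart, never long-idle
    return (600, 900)            # ~10-15m only when nothing is pending
```

The key point of the fix is the middle case: a loop parked at a gate is treated like in-flight work, not like idle time.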

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 06:05:55 +01:00
8a4a010723 Reduce idle loop cadence 20–30m -> ~10–15m (pick up work sooner)
§7 pacing + Builder/Adversary prompts: idle/parked sleep lowered to ~10–15 min so the next
unit of work (or a gate claim) is picked up without long gaps. Unchanged: ~4m polling while
a build/deploy is in flight; keep polling something clearly in-flight rather than treating
it as idle; don't spin on a minutes-long build. Adversary aligned to §7 for consistency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 05:17:29 +01:00
3d198c8c17 Phase-1b: require full cold-start re-verification of all Phase-1 D1–D10 as the final gate
RL3 strengthened: after lint/review findings are responded to and fixed, the Adversary
independently re-verifies EVERY Phase-1 Definition-of-Done item (D1–D10) from a cold start
to the same bar as Phase 1's own DONE (fresh PASS + evidence in REVIEW.md), proving the
cleanup regressed nothing. 1b cannot be DONE until all D1–D10 are re-confirmed green
post-cleanup. Method/W2 updated to make the ordering explicit (tooling -> fixes -> re-verify).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 05:15:34 +01:00
5d90cbd576 Add Phase-1b plan: bounded review & lint pass at the end of Phase 1
Before scaling to many recipes: (1) deterministic style/hygiene via linters/formatters
(alejandra/statix/deadnix, ruff, shellcheck/shfmt) wired as a .drone.yml stage so commits
stay clean; (2) a white-box review checklist with teeth (real tests not health-only/skipped,
DRY harness, Nix-declared idempotent bring-up, no footguns/secrets-in-code, architecture
matches plan) — blocking fixed, advisory triaged. Bounded pass; never weaken a test for a
nit. Phase 2 now follows 1b. Linked in README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 05:11:06 +01:00
2d3c17f4bd Add Phase-2b plan: test performance (measure, attribute, improve empirically)
Phase 2b (after Phase 2, before Phase 3): instrument per-phase timings, baseline a
representative recipe set (cold vs warm), attribute where time goes (Pareto), then try
improvements as controlled before/after experiments and keep measured winners — image
pull cache/pre-pull, readiness-wait tuning, dedup deploy cycles, warm/shared infra
(isolation-proven), runner caching, concurrency sizing, vCPU. Speed never weakens tests
or isolation (Adversary re-measures + re-verifies). Phase 3 now follows 2b. Linked in README.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 04:26:27 +01:00