Commit Graph

199 Commits

Author SHA1 Message Date
7c042c2f2a plan: phase 'shot' — recipe screenshot audit & repair (queued after rcust)
Audit every enrolled recipe's CI badge/card screenshot, diagnose defects
(plausible null-every-run; ~4.8KB blank-frame SPAs: immich/lasuite-meet/
cryptpad/flaky n8n), fix via harness default-wait improvement first, per-recipe
SCREENSHOT hooks second; M1 audit matrix + M2 visually-verified PNGs on fresh
real-CI runs (>=2 !testme). Cosmetics-never-block and secret-safety guardrails
binding. Also: temporary hourly-wake instruction to verify the new limit-wait
system tonight; journal entry.
2026-06-11 01:17:32 +00:00
2e1ab8d384 watchdog: hourly orchestrator wake fires even during a limit window
Operator request: the hourly supervision prompt should land regardless of
limit state, as a fallback that keeps things on track if the limit-state
machinery ever breaks. If the limit is genuinely still in force the wake is
harmless (the banner just re-prints and limit_tick re-arms); once it lifts,
the queued wake doubles as a resume nudge.
2026-06-11 01:00:29 +00:00
d6e1a704da watchdog: parse limit-reset time, never reboot limit-stalled sessions; rename orch session
Replace the blind every-300s 'limit appears lifted' nudge (claude) and the
opencode-only _maybe_nudge_limit with one unified limit_tick state machine:

- parse the reset time from the limit banner (last match wins; stale banners
  whose time already passed fall back rather than waiting ~a day)
- arm a quiet window until reset+45s; parse failure -> flat 5-minute probe
  loop (operator-specified; not exponential backoff)
- while armed, suppress ALL healing: a limit-stalled session is NEVER
  kill+rebooted (this was the conc-phase churn: claude limit stalls fell
  through to the generic idle reboot, losing the banner and re-hitting
  the limit fresh)
- at window end send ONE nudge as a self-verifying probe: spinner clears
  the state; a re-printed banner re-arms from the fresh reset time
- dedupe: never stack a probe while our own text is visible in the pane
- state persisted per session in LOG_DIR (.limited-<session>) so watchdog
  restarts keep the window
- orchestrator gets the same treatment: limit_tick in heal_orchestrator,
  a per-signal-tick orch_limit_check, and hourly wakes deferred during
  limit windows
- loud WARNING at 3 probes, then continue flat probes forever

Also rename the orchestrator session default cc-ci-orchestrator-vm ->
cc-ci-orchestrator (launch.py ORCH_SESSION, launch-orchestrator.py SESSION,
docs/scripts references).
2026-06-11 00:55:07 +00:00
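Note on d6e1a704da: the quiet-window timing in that message can be sketched as below. This is a minimal illustration, not the watchdog's actual code. The banner wording matched by `RESET_RE` and the names `parse_reset` and `next_probe` are assumptions; only the 45-second grace, the flat 5-minute probe, the last-match-wins rule, and the stale-banner fallback come from the commit message.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical banner shape; the real limit banner text is not shown in the log.
RESET_RE = re.compile(r"resets at (\d{1,2}):(\d{2})")
GRACE = timedelta(seconds=45)       # "reset+45s" from the commit message
FLAT_PROBE = timedelta(minutes=5)   # flat probe interval, no exponential backoff


def parse_reset(pane_text: str, now: datetime) -> Optional[datetime]:
    """Return the reset time from the last banner in the pane, or None."""
    matches = RESET_RE.findall(pane_text)
    if not matches:
        return None
    hour, minute = map(int, matches[-1])  # last match wins
    if hour > 23 or minute > 59:
        return None
    reset = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # A time that has already passed is treated as a stale banner,
    # not as "the same time tomorrow".
    return reset if reset > now else None


def next_probe(pane_text: str, now: datetime) -> datetime:
    """When the single probe nudge should be sent."""
    reset = parse_reset(pane_text, now)
    if reset is None:
        return now + FLAT_PROBE  # parse failure or stale banner
    return reset + GRACE
```

While `now` is earlier than the value returned by `next_probe`, a caller would suppress all healing for that session, which is the "never kill+reboot a limit-stalled session" rule from the message.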
aefaf17757 plan: recipe-customization restructure — full builder/adversary plan (P1-P6 + real-CI regression sweep gate) 2026-06-10 16:28:09 +00:00
a6e177e286 journal: phase conc DONE — concurrency restructure landed, M1+M2 adversary-verified 2026-06-10 13:58:25 +00:00
e0c9f23391 feat(launch): ADV_MODEL — per-role model override for the Adversary loop 2026-06-10 04:03:35 +00:00
a1b4943da1 plan: adapt concurrency restructure to builder/adversary loop protocol (gates M1/M2, phase-namespaced state) 2026-06-10 03:54:31 +00:00
520fb18461 plan: full concurrency restructure implementation plan (builder/adversary handoff) 2026-06-10 03:48:14 +00:00
0d169c2a20 plan: concurrency restructure — flock-probe janitor, per-run ABRA_DIR, lock-lifetime chain 2026-06-10 03:41:05 +00:00
335ea1d7c1 journal: session wrap — concurrent CI fixed, immich (245) + plausible (247) both GREEN 2026-06-09 23:18:32 +00:00
08706c665e memory: swarm UpdateStatus convergence gotchas (builds 238/241) 2026-06-09 23:14:18 +00:00
926e4553b7 journal: immich PR #2 GREEN (build 245, level=4); cc-ci PR #9 merged; plausible unblocked 2026-06-09 23:13:56 +00:00
e3e0a9ee80 journal: two harness convergence fixes (UpdateStatus settle + paused-is-settled); immich build 245 in flight 2026-06-09 23:08:59 +00:00
1580738c97 journal: concurrent-CI fixes landed on cc-ci main (build 236 green) 2026-06-09 22:02:08 +00:00
ec3e0c35dd journal: orchestrator handover — concurrent-CI fixes + immich/plausible drive 2026-06-09 19:45:21 +00:00
542ed0afe3 memory: move agent memory into repo (memory/), note in AGENTS.md
Persistent agent memories now live in memory/ in this repo; the Claude
auto-memory path is symlinked here so future memories land in the repo
and get committed like any other change.
2026-06-09 19:25:20 +00:00
330378d30d ideas: fail-fast on crash-looping deploy + don't let one wedged run starve the CI queue
After a live incident: plausible build 220 (ClickHouse exit-1 crash-loop) held the
single serial runner for its full 1200s DEPLOY_TIMEOUT, starving immich PR-2's
queued builds for ~12min until manually torn down. Logs the two fixes (fail-fast
on crash-loop; head-of-line blocking on the serial runner) + the interim
mitigations (step-2b dev loop for debugging; SIGINT to free a wedged run).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 16:29:30 +00:00
a2c1cb550a upstream(immich): release-notes sources + DB-pin & VectorChord backup/restore notes 2026-06-09 15:49:20 +00:00
c60fc6d056 change(cleanup): reap dev deploys at start+end of /upgrade-all instead of a timer
Per operator: drop the hourly cc-ci-reap-dev-deploys systemd timer; instead run the
dev-* reaper at the START (Step 0, alongside the orphan sweep) and END (new step 4b)
of each /upgrade-all run, with THRESHOLD=0 (the run is quiescent then, so clear all
dev-* unconditionally). The reaper keeps its safe default (4h) for ad-hoc use.
Step-2b mandatory teardown is unchanged (primary mechanism); this is the backstop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:47:16 +00:00
23bba98be4 feat(cleanup): guarantee step-2b dev deploys get reaped
- /recipe-upgrade step 2b: teardown is now MANDATORY on every exit path (finally),
  with a verify-no-leak check; tear down even on failure before reporting.
- reap-dev-deploys.sh: safe, age-gated backstop that removes only idle dev-* stacks
  (never CI per-run stacks, warm-*, infra; an active dev loop stays fresh).
- orchestrator: hourly cc-ci-reap-dev-deploys systemd timer runs it against cc-ci,
  bounding any leaked dev deploy from a crashed/abandoned loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:42:23 +00:00
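Note on 23bba98be4 and c60fc6d056: the age-gated selection rule for the dev-* reaper can be sketched as below. This is an illustration under assumptions, not the contents of reap-dev-deploys.sh: the `Stack` type, the `idle_seconds` field, and the `reapable` function are hypothetical. The dev-* prefix rule, the 4h default, and the THRESHOLD=0 mode are taken from the two commit messages.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_THRESHOLD_S = 4 * 3600  # the reaper's safe default for ad-hoc use (4h)


@dataclass
class Stack:
    name: str
    idle_seconds: float  # hypothetical: seconds since the stack was last active


def reapable(stacks: List[Stack], threshold_s: float = DEFAULT_THRESHOLD_S) -> List[str]:
    """Names of stacks the backstop may remove.

    Only dev-* stacks qualify; CI per-run stacks, warm-* and infra never
    match the prefix, and an active dev loop stays below the threshold.
    """
    return [
        s.name
        for s in stacks
        if s.name.startswith("dev-") and s.idle_seconds >= threshold_s
    ]
```

Calling `reapable(stacks, threshold_s=0)` models the /upgrade-all start and end sweeps, where the run is quiescent and every dev-* stack is cleared regardless of age.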
77ba7ee075 guardrail: upgrader never modifies cc-ci tests/harness unless --with-tests
Absolute, mode-gated rule reinforced in /recipe-upgrade (Guardrails + the new
step-2b direct-deploy loop where the upgrader has cc-ci host access) and noted as
the interim safeguard in IDEAS.md until the deploy loop moves to isolated infra.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:32:50 +00:00
98276124e5 ideas: isolate the upgrader's direct deploy onto separate infra (can't tamper with tests)
The step-2b direct deploy-and-inspect runs on the cc-ci server's own swarm today, so
the upgrader holds write access to the host that owns the tests + CI verdict — a
trust hole (could hack the tests). Parked idea: a dedicated throwaway test server
with scoped creds, so the upgrader can deploy+inspect but not modify the gate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:31:20 +00:00
de0baa00b1 feat(upgrade): add direct deploy-and-inspect dev loop (recipe-maintainer style) before CI
The upgrader now deploys the WIP recipe directly on cc-ci (abra app deploy --chaos
under a dev-<recipe> domain on the local swarm) and inspects live logs
(docker service logs) to SEE what the upgrade does, before/alongside the !testme
CI gate. ADDITIONAL to — not a replacement for — the 3-attempt !testme verification;
it front-loads diagnosis so fewer CI attempts are spent on basics. Always torn down
(orphan-sweep is the backstop). /upgrade-all dispatch references the new step 2b.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:28:06 +00:00
f1c63f1ca0 feat(survey): don't skip recipes on abra tag+digest FATA — check upstream directly
abra hard-FATAs on image refs with both a tag and a digest (immich:
postgres:14-vectorchord...@sha256:..., valkey:9@sha256:...), aborting the whole
recipe survey so immich was silently dropped. Per operator: don't normalize the
recipe; catch the failure and check the upstream registry directly.

- /upgrade-all box item 4: a tag+digest parse FATA is NOT not-fetchable. Use abra
  for the images it parses; for the rest, list upstream tags (Docker Hub / ghcr /
  buildx imagetools) and judge availability (match the variant the app supports,
  not blindly the max). Upgradeable if abra OR the direct check finds a newer tag.
- /recipe-upgrade implement: hand-bump tag+digest pins (abra can't), and re-resolve
  + re-append the digest for the new tag so the pin is preserved (never drop it).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:21:36 +00:00
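Note on f1c63f1ca0: detecting a tag+digest image reference, and re-appending a digest after a hand bump, can be sketched as below. This is a simplified parser written for illustration; it is not abra's parser, and `parse_image_ref`, `needs_direct_check`, and `rebuild_pin` are hypothetical names. The digest values in the usage example are placeholders, not real digests.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repo: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_ref(ref: str) -> ImageRef:
    """Split 'repo[:tag][@digest]' into its parts."""
    rest, _, digest = ref.partition("@")
    slash = rest.rfind("/")
    colon = rest.rfind(":")
    # A colon before the last slash belongs to a registry port, not a tag.
    if colon > slash:
        repo, tag = rest[:colon], rest[colon + 1:]
    else:
        repo, tag = rest, None
    return ImageRef(repo, tag, digest or None)


def needs_direct_check(ref: str) -> bool:
    """True for refs carrying both a tag and a digest (the case abra rejects)."""
    parsed = parse_image_ref(ref)
    return parsed.tag is not None and parsed.digest is not None


def rebuild_pin(repo: str, new_tag: str, new_digest: str) -> str:
    """Re-append the freshly resolved digest so the pin is never dropped."""
    return f"{repo}:{new_tag}@{new_digest}"
```

For example, `needs_direct_check("valkey:9@sha256:0000")` is true, so that image would go to the direct upstream check, while `needs_direct_check("valkey:9")` is false and the image stays with abra.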
f687174b53 feat(recipe-report): TESTS rename + live binary STATUS column
Rename the table's Status column -> TESTS (the CI/test verdict, unchanged
content). Add a new STATUS column showing the PR's LIVE state, fetched
client-side: 'open' vs a ✓ for any not-open state (merged or closed). The cell
is a JS hook (data-repo/data-pr) derived from existing recipe+pr fields; an
inline, dependency-free, CSP-safe script GETs the same-origin /pr/<recipe>/<n>
proxy (cc-ci nix/modules/reports.nix) on load and every 30s, and degrades to a
muted '?' if the proxy/repo is unreachable. Blank cell when a row has no PR.
Doc + SKILL updated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:15:02 +00:00
eb1439324e plan: finalize report PR-STATUS column (binary open/✓; proxy in reports.nix; decisions locked)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:59:41 +00:00
a76aca80e2 plan: Recipe Report TESTS rename + live PR-STATUS column (public mirrors + same-origin proxy)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 12:52:59 +00:00
1f52795534 skill(ci-dev-workflow): capture the cc-ci feature-dev flow + adversary plan template
Documents the end-to-end workflow used to land the intentional-skips/4-rung-ladder
feature: explore harness → branch a local cc-ci clone → implement + unit-verify
cold on cc-ci → live full-stage check → open PR (never push main) → independent
adversary verdict → squash-merge on PASS → deploy via /root/builder-clone rebuild.
Includes the adversary-verify-pr6.md plan as a reusable template.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 03:16:47 +00:00
dbafcddb62 feat(upgrade-all): sweep orphans from previous runs at the start of each weekly run
Adds sweep-orphans.sh (safe-by-allowlist: removes orphan test stacks, standalone
debug containers >30m old, leaked dangling volumes, and reparented docker-run
wrappers; spares infra + warm-* canonicals and their retained volumes) and wires
it as Step 0 of /upgrade-all so a prior run's leaked stack/container/process can't
contend for the shared Swarm or skew the survey. Idempotent; no-op when clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 02:39:43 +00:00
d31378b180 feat(recipe-report): restructure page — priority-sorted wire table w/ CVE column, addendum, per-recipe changes
New page order: short lead -> the full wire table (sorted by priority-to-address,
CVE recipes first, new CVEs count column) -> Addendum (bullets of real special
issues, omitted if clean) -> Security Bulletin -> per-recipe "What changed".

- recipe-report.py: _table() gains a CVEs column + recipe-name linking; new
  _changes() helper; render() reordered; docstring SPEC SHAPE updated
  (cve/addendum/changes added, needs_attention/routine removed).
- recipe-report/SKILL.md + example-spec.json: new procedure, spec shape, and
  gold-standard template (2026-06-05, new format).
- launch-report.py: kickoff text reflects the new priority-ordered structure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:06:43 +00:00
49491fcb90 fix(recipe-report): weekly trigger uses launch-report.py 'fresh'; start kills idle/leftover session
A stale cc-ci-report session (from a prior week's run, gone idle) caused this week's
launch-report.py 'start' (use-or-create) to leave it and never run a fresh report.
Fix: upgrade-all step 6 now calls 'fresh', and start only leaves a session that's
actively busy producing a report — an idle/leftover session is killed + restarted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:06:46 +00:00
f397968f47 upstream(uptime-kuma): release-notes sources 2026-06-05 06:18:33 +00:00
a22dc4fc93 upstream(plausible): release-notes sources 2026-06-05 04:35:44 +00:00
4a2af99147 upstream(n8n): release-notes sources 2026-06-05 04:28:33 +00:00
509b36b242 upstream(mattermost-lts): release-notes sources
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 04:25:02 +00:00
538c41810b upstream(matrix-synapse): release-notes sources 2026-06-05 03:13:27 +00:00
9bd5a2baf0 upstream(mailu): release-notes sources 2026-06-05 03:11:09 +00:00
44e396c3fd upstream(lasuite-meet): release-notes sources 2026-06-05 02:59:46 +00:00
b63edbbd7f upstream(lasuite-drive): release-notes sources 2026-06-05 02:49:50 +00:00
a3740e1fdf upstream(lasuite-docs): release-notes sources 2026-06-05 02:43:03 +00:00
f5da8ac3ff upstream(keycloak): release-notes sources 2026-06-05 02:27:00 +00:00
287fb51d91 upstream(ghost): release-notes sources 2026-06-05 02:20:11 +00:00
d24feb0671 upstream(discourse): release-notes sources 2026-06-05 02:02:19 +00:00
85065880a5 upstream(custom-html-tiny): release-notes sources
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-05 01:58:29 +00:00
65a96453fc fix(recipe-upgrade): reconcile mirror from TRUE coopcloud upstream, not from the mirror itself
The reconcile that's supposed to make the mirror main == upstream main was fetching origin/main —
but origin is the cc-ci MIRROR, so it synced the mirror to itself (a no-op) and never pulled real
upstream. Fix: fetch coopcloud explicitly (git.coopcloud.tech/coop-cloud/<recipe>, default branch
main OR master) via an 'upstream' remote and force-sync the mirror main + tags from it. Every recipe
has a coopcloud correspondent; none are forked. Also reorder the skill so the reconcile runs BEFORE
the upgrade check, so the check sees the real current recipe. Verified by divergence test (diverged a
mirror, reconcile snapped it back to coopcloud HEAD).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:44:45 +00:00
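Note on 65a96453fc: the corrected reconcile can be sketched as a list of git invocations, shown below. This is an assumption-laden outline, not the skill's actual steps: the `https://` scheme, the function names, and the exact refspecs are illustrative. The 'upstream' remote name, the git.coopcloud.tech/coop-cloud/<recipe> location, the main-or-master rule, and the force-sync of main plus tags come from the commit message.

```python
from typing import Iterable, List


def pick_default_branch(branches: Iterable[str]) -> str:
    """The upstream default branch is main OR master, per the commit message."""
    available = set(branches)
    for candidate in ("main", "master"):
        if candidate in available:
            return candidate
    raise ValueError("upstream has neither main nor master")


def reconcile_commands(recipe: str, upstream_branch: str) -> List[List[str]]:
    """Commands that sync the mirror (origin) from the true upstream."""
    url = f"https://git.coopcloud.tech/coop-cloud/{recipe}"  # scheme is assumed
    return [
        # Fails if the remote already exists; a real script would guard this.
        ["git", "remote", "add", "upstream", url],
        ["git", "fetch", "--tags", "upstream"],
        # Force the mirror's main to match upstream's default branch.
        ["git", "push", "--force", "origin",
         f"refs/remotes/upstream/{upstream_branch}:refs/heads/main"],
        ["git", "push", "--force", "origin", "--tags"],
    ]
```

The point of the fix is visible in the fetch line: it names `upstream`, whereas the earlier behaviour fetched `origin`, which is the mirror itself.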
167ce2d881 upstream(custom-html): release-notes sources 2026-06-05 01:32:13 +00:00
f0716764db feat(recipe-upgrade): upstream release-notes registry + recipe-README read (recipe-maintainer parity)
Close the two gaps vs recipe-maintainer's recipe-upgrade-plan:
- Per-recipe release-notes registry at cc-ci-plan/upstream/<recipe>.md (discover the source repo +
  releases/changelog URL for each image once, persist+commit, reuse) — fetch release notes FROM those
  URLs instead of rediscovering ad-hoc each run. Format doc + cryptpad seed included.
- Explicitly read the recipe's README for shipped upgrade/migration notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:28:27 +00:00
f4b1befbdd chore(nix): weekly timer = Thu 22:00 America/New_York (Boston 10pm, DST-aware)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:21:41 +00:00
0338dc23fd chore(nix): move weekly upgrade timer to Thursdays 22:00 UTC (was Sun 02:00)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:18:20 +00:00
d8ad5a2805 feat(recipe-report): link recipe names in all story sections (security/needs/routine), not just the lead
_stories() now auto-links whole-word recipe mentions in story titles + bodies to their mirror
repos (same single-pass linkify as the lead); explicit PR/build links are untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 02:21:31 +00:00
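Note on d8ad5a2805: a single-pass, whole-word linkify that leaves explicit links untouched can be sketched as below. This is one way to get that behaviour, not the code of `_stories()`. The `linkify` signature, the HTML output form, and the `base_url` layout for mirror repos are assumptions.

```python
import re
from typing import Iterable


def linkify(text: str, recipes: Iterable[str], base_url: str) -> str:
    """Link whole-word recipe mentions in one pass over the text."""
    names = sorted(recipes, key=len, reverse=True)  # prefer the longest name
    if not names:
        return text
    alternatives = "|".join(re.escape(n) for n in names)
    # Group 1 swallows an existing <a>...</a> so its contents are never
    # re-linked; group 2 is a recipe name not glued to word chars or hyphens.
    pattern = re.compile(
        rf"(<a\b.*?</a>)|(?<![\w-])({alternatives})(?![\w-])", re.S
    )

    def repl(match: "re.Match[str]") -> str:
        if match.group(1):
            return match.group(1)  # explicit PR/build link: leave as-is
        name = match.group(2)
        return f'<a href="{base_url}/{name}">{name}</a>'

    return pattern.sub(repl, text)
```

Treating the hyphen as part of a word keeps `custom-html` from matching inside `custom-html-tiny`, which matters for the recipe names that appear in this log.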