Make explicit that ALL formatting/HTML is owned by recipe-report.py render() and
the model's only artifact is the spec JSON — never hand-write/edit HTML. Matters
now that glm-5.2 drives the report. Also fix stale 'default opus' refs (report
now defaults to opencode-go/glm-5.2, overridable via REPORT_BACKEND/REPORT_MODEL).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Trust abra fully for any image whose tag it can read — a normal semver/calver
tag with no newer version is genuinely up-to-date. Only cross-check upstream for
images abra physically can't parse (tag+digest pins, digest-only pins), which is
the actual immich blind spot. Avoids redundant upstream checks on every recipe.
abra recipe upgrade is the first approach, but it silently contributes no
candidate for tag+digest pins (FATA: tag and digest not supported), digest-only
pins, and non-semver tags. immich kept getting skipped this way. Before
concluding SKIPPED — up-to-date, do a direct upstream tag check for every image
abra could not cleanly evaluate; only skip when BOTH agree nothing is newer.
§2 no longer bumps the coop-cloud version label in the recipe PR (no
--dry-run compute, no tag). It records the recommended 'abra recipe release
<recipe> -<x|y|z>' (no --dry-run) in the PR body (§3) as the operator's
final publish step — run after the upstream PR merges, it bumps the label +
tags + pushes in one go. Bumps recipe-maintainer submodule to 9daddac (same
change across its /recipe-upstream, -upgrade-apply, -upgrade-plan, -new-tag).
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
The durable /16 proxy fix landed in phase pvfix (2026-06-13).
Update the guard description from "safety net until that lands"
to "belt-and-suspenders even after the /16 fix" — guard logic
unchanged, description now accurate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges:
the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks
endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety
net to Step 0 (network prune + docker restart when VIP-allocation failures are
logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix,
maintenance window) and for debugging/fixing the ghost PR afterward.
Per operator: just work through recipes alphabetically keeping CAP (=
DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment
one finishes (rolling pool via run_in_background). Removes the wave-barrier and
the heavy/light classification entirely — simpler and no slot ever idles.
Per operator: always fill all CAP slots. Heavy/light alternation only spreads
heavies across waves while a light is available; once only heavies remain, run
two-per-wave (capacity is the tuned ceiling) instead of one-per-wave.
Host memory is the binding limit, so never schedule two HEAVY recipes in the
same capacity wave — pair each heavy (discourse/immich/matrix-synapse/
lasuite-drive/mattermost-lts/ghost) with a light one to bound peak memory while
keeping both slots busy. Heaviest-first could co-schedule two heavies and OOM/
wedge the box (the disc-50cc8a 'New'-state wedge). For CAP>2 cap heavies at
~CAP/2; if only heavies remain, run one-per-wave.
Now that the 2026-06-10 concurrency restructure makes concurrent recipe runs
safe (per-run trees, app-domain locks, isolation), default /upgrade-all to run
up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe
subagents at a time instead of strictly sequential — using all available
concurrency without oversubscribing. Query the live capacity from
'systemctl show drone-runner-exec' (fallback 2); process recipes in waves of
CAP (emit CAP Agent calls per message, await, next wave). Flags: --capacity N,
--sequential (CAP=1, old default — use when the build loops share the box),
--parallel (unbounded). Applies to the NEXT run; the in-flight run is unaffected.
Per operator: drop the hourly cc-ci-reap-dev-deploys systemd timer; instead run the
dev-* reaper at the START (Step 0, alongside the orphan sweep) and END (new step 4b)
of each /upgrade-all run, with THRESHOLD=0 (the run is quiescent then, so clear all
dev-* unconditionally). The reaper keeps its safe default (4h) for ad-hoc use.
Step-2b mandatory teardown is unchanged (primary mechanism); this is the backstop.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- /recipe-upgrade step 2b: teardown is now MANDATORY on every exit path (finally),
with a verify-no-leak check; tear down even on failure before reporting.
- reap-dev-deploys.sh: safe, age-gated backstop that removes only idle dev-* stacks
(never CI per-run stacks, warm-*, infra; an active dev loop stays fresh).
- orchestrator: hourly cc-ci-reap-dev-deploys systemd timer runs it against cc-ci,
bounding any leaked dev deploy from a crashed/abandoned loop.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Absolute, mode-gated rule reinforced in /recipe-upgrade (Guardrails + the new
step-2b direct-deploy loop where the upgrader has cc-ci host access) and noted as
the interim safeguard in IDEAS.md until the deploy loop moves to isolated infra.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The upgrader now deploys the WIP recipe directly on cc-ci (abra app deploy --chaos
under a dev-<recipe> domain on the local swarm) and inspects live logs
(docker service logs) to SEE what the upgrade does, before/alongside the !testme
CI gate. ADDITIONAL to — not a replacement for — the 3-attempt !testme verification;
it front-loads diagnosis so fewer CI attempts are spent on basics. Always torn down
(orphan-sweep is the backstop). /upgrade-all dispatch references the new step 2b.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
abra hard-FATAs on image refs with both a tag and a digest (immich:
postgres:14-vectorchord...@sha256:..., valkey:9@sha256:...), aborting the whole
recipe survey so immich was silently dropped. Per operator: don't normalize the
recipe; catch the failure and check the upstream registry directly.
- /upgrade-all box item 4: a tag+digest parse FATA is NOT not-fetchable. Use abra
for the images it parses; for the rest, list upstream tags (Docker Hub / ghcr /
buildx imagetools) and judge availability (match the variant the app supports,
not blindly the max). Upgradeable if abra OR the direct check finds a newer tag.
- /recipe-upgrade implement: hand-bump tag+digest pins (abra can't), and re-resolve
+ re-append the digest for the new tag so the pin is preserved (never drop it).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Rename the table's Status column -> TESTS (the CI/test verdict, unchanged
content). Add a new STATUS column showing the PR's LIVE state, fetched
client-side: 'open' vs a ✓ for any not-open state (merged or closed). The cell
is a JS hook (data-repo/data-pr) derived from existing recipe+pr fields; an
inline, dependency-free, CSP-safe script GETs the same-origin /pr/<recipe>/<n>
proxy (cc-ci nix/modules/reports.nix) on load and every 30s, and degrades to a
muted '?' if the proxy/repo is unreachable. Blank cell when a row has no PR.
Doc + SKILL updated.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Documents the end-to-end workflow used to land the intentional-skips/4-rung-ladder
feature: explore harness → branch a local cc-ci clone → implement + unit-verify
cold on cc-ci → live full-stage check → open PR (never push main) → independent
adversary verdict → squash-merge on PASS → deploy via /root/builder-clone rebuild.
Includes the adversary-verify-pr6.md plan as a reusable template.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adds sweep-orphans.sh (safe-by-allowlist: removes orphan test stacks, standalone
debug containers >30m old, leaked dangling volumes, and reparented docker-run
wrappers; spares infra + warm-* canonicals and their retained volumes) and wires
it as Step 0 of /upgrade-all so a prior run's leaked stack/container/process can't
contend for the shared Swarm or skew the survey. Idempotent; no-op when clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New page order: short lead -> the full wire table (sorted by priority-to-address,
CVE recipes first, new CVEs count column) -> Addendum (bullets of real special
issues, omitted if clean) -> Security Bulletin -> per-recipe "What changed".
- recipe-report.py: _table() gains a CVEs column + recipe-name linking; new
_changes() helper; render() reordered; docstring SPEC SHAPE updated
(cve/addendum/changes added, needs_attention/routine removed).
- recipe-report/SKILL.md + example-spec.json: new procedure, spec shape, and
gold-standard template (2026-06-05, new format).
- launch-report.py: kickoff text reflects the new priority-ordered structure.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A stale cc-ci-report session (from a prior week's run, gone idle) caused this week's
launch-report.py 'start' (use-or-create) to leave it and never run a fresh report.
Fix: upgrade-all step 6 now calls 'fresh', and start only leaves a session that's
actively busy producing a report — an idle/leftover session is killed + restarted.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The reconcile that's supposed to make the mirror main == upstream main was fetching origin/main —
but origin is the cc-ci MIRROR, so it synced the mirror to itself (a no-op) and never pulled real
upstream. Fix: fetch coopcloud explicitly (git.coopcloud.tech/coop-cloud/<recipe>, default branch
main OR master) via an 'upstream' remote and force-sync the mirror main + tags from it. Every recipe
has a coopcloud correspondent; none are forked. Also reorder the skill so the reconcile runs BEFORE
the upgrade check, so the check sees the real current recipe. Verified by divergence test (diverged a
mirror, reconcile snapped it back to coopcloud HEAD).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Close the two gaps vs recipe-maintainer's recipe-upgrade-plan:
- Per-recipe release-notes registry at cc-ci-plan/upstream/<recipe>.md (discover the source repo +
releases/changelog URL for each image once, persist+commit, reuse) — fetch release notes FROM those
URLs instead of rediscovering ad-hoc each run. Format doc + cryptpad seed included.
- Explicitly read the recipe's README for shipped upgrade/migration notes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
render() auto-links whole-word recipe mentions in the editorial lead to
git.autonomic.zone/recipe-maintainers/<recipe> (single regex pass, longest-name-first,
no href corruption). Skill: lead is ~3 short paragraphs (~150-180 words) incl. an
'anything strange worth looking into' paragraph. example-spec.json lead updated to the
concise target.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Save the operator-approved 2026-06-02 spec as example-spec.json (gold standard
for voice/structure/specificity). Skill now tells the agent to match it, with
one deliberate change: keep the editorial lead TIGHT (~2 short paragraphs,
~120 words). The live 2026-06-02 page stays as the reference.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Masthead + opus 'lead' editorial (overall fleet state + what to focus on), a Security Bulletin of
critical-CVE upgrades up top (mined from per-recipe upgrade_notes_md), then needs-attention/routine,
and the comprehensive table as 'the full wire' at the end. survey now includes each recipe's
upgrade_notes_md (breaking-change/CVE analysis) so opus can lead with security.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- recipe-report.py: survey (run + per-recipe PRs + CI verdicts) / render (spec->HTML) / publish
(copy to cc-ci:/var/lib/cc-ci-reports + regen index).
- skill .claude/skills/recipe-report: review the weekly run, classify needs-attention vs routine,
publish one public HTML page per week + index at report.ci.commoninternet.net. Read-only.
- launch-report.py: one-shot cc-ci-report agent, REPORT_MODEL default opus (separate from the
sonnet upgrader), REPORT_BACKEND default claude.
- upgrade-all SKILL: closing step launches the report agent.
Serving (nix/modules/reports.nix) already deployed + live.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The actual 'abra auth error' that skipped 8 recipes was go-git failing to
fetch tags from the PRIVATE git.autonomic.zone mirror ('authentication
required: Unauthorized'), NOT the TTY issue. abra/go-git reads
remote.origin.url literally and IGNORES git url.insteadOf + credential
helpers (confirmed: insteadOf left immich Unauthorized; literal embedded URL
fixed it). Skill now bakes $GITEA_USERNAME:$GITEA_PASSWORD into origin for
git.autonomic.zone recipes before the version check, and stashes the
untracked cc-ci overlay so it isn't mis-counted as dirty-worktree.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
abra over plain 'ssh cc-ci abra ...' has no TTY -> FATA 'inappropriate ioctl
for device' (the abra error). The working harness (runner/harness/abra.py)
wraps abra in util-linux 'script' for a pseudo-TTY + passes -n. Apply the
same in the recipe-upgrade and upgrade-all skills: every abra call becomes
ssh cc-ci 'script -qec "abra <args> -n" /dev/null'. Confirmed: abra server
ls FATAs plain, works pty-wrapped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Instead of force-pushing HEAD onto the existing PR branch (history rewrite),
add a commit ON TOP of the branch tip (fast-forward) when it already exists,
so the PR's history is preserved and it re-tests. Fresh branches still push
normally. The only remaining force-push is the mirror-main->upstream sync
(intentional mirroring), never a PR branch.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When an open upgrade PR already exists for a recipe (branch upgrade-*), push
the new work onto ITS branch and update+re-test that PR — one evolving
upgrade PR per recipe instead of spawning a second parallel PR. Only open a
fresh upgrade-<version> PR when none exists. Unrelated open PRs (e.g. backup
fixes) are still never touched; merged-upstream PRs still close.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per operator: opening a new upgrade PR should stack ON TOP of any other
still-open PRs, not close them. Only PRs already merged into upstream
main are closed (merging them is a no-op). This prevents the phase-7
incident where an unrelated open ghost PR was auto-closed as 'superseded'.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Resolve the recipe branch/ref to its head commit sha via the Gitea API
before invoking the cold full-suite run, so the upgrade tier deploys the
exact PR head. From the phase-5 upgrade-flow verification.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
When multiple commit statuses exist (e.g. an Adversary probe + the real run),
the first status in the array may not be the cc-ci run. Filter by context
'cc-ci/testme' to get the correct Drone build URL.
Operator: when the final phase completes, install the weekly cron anchored to
actual completion — first run ~1h after the build finishes, weekly from then on
(supersedes the fixed "Sat 03:00 UTC" placeholder).
- plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's
DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH +
tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff:
cheap --dry-run pre-check, then confirm the real T0 fire launched the
cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced);
record schedule + verified kickoff in DECISIONS.md.
- upgrade-all skill Cron section + cron memory updated to the completion-anchored
schedule + first-kickoff verification.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The weekly upgrade run now executes inside a dedicated, remote-control agent
(cc-ci-upgrader) — viewable/steerable at claude.ai/code like the Builder — rather
than buried in headless cron output.
- launch-upgrader.sh: spins up the cc-ci-upgrader tmux session under
--remote-control with a kickoff that runs /upgrade-all (DEFAULT mode) to
completion. On finish the agent STOPS and stays idle (does NOT self-terminate)
so the run + summary stay reviewable in the web UI. `start` = use-or-create:
leaves an in-flight (busy) run alone, else clears a finished/idle/wedged
session and runs fresh; `fresh` always restarts. UPGRADER_ARGS passes flags
(e.g. --dry-run); never --with-tests.
- launch.sh: orchestrator_alive() now also skips the cc-ci-upgrader
remote-control name, so the upgrader job isn't mistaken for the orchestrator.
- upgrade-all skill: documents it runs as the cc-ci-upgrader agent; the weekly
cron invokes `launch-upgrader.sh start` (not /upgrade-all inline).
- Phase 5: V8a verifies the agent lifecycle (launch → run to completion → stay
idle/viewable → next start clears it); V9 stops the verification session.
- cron memory: weekly task = launch-upgrader.sh start at 0 3 * * 6 UTC.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per operator:
- Verify via `!testme` posted ON the recipe PR (the real CI path) so results are
viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test).
New helper testme-on-pr.sh posts !testme and polls the PR head commit status
for the verdict (POST=0 to keep polling without re-triggering).
- Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using
existing tests; if a test fails and is genuinely stale, leave an explanatory
COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests)
and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR
path (verified via the branch-checkout harness run, since !testme uses prod tests).
- upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended;
surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe.
- New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per operator: an open mirror PR must mean "genuinely still open against true
current upstream main". On every run the recipe-upgrade flow now:
- force-syncs the recipe-maintainers/<recipe> mirror `main` to be IDENTICAL to
upstream main (origin/main of the abra checkout = coopcloud);
- closes any open mirror PR whose changes are already in upstream main (merged
upstream, no-op merge detected via `git merge-tree` vs main's tree) — even
when the recipe is up to date (new `--reconcile-only` mode, run in step 1);
- when opening a new upgrade PR, closes any other still-open PR for that recipe
(superseded) and opens the new one IN ITS PLACE; same-version re-runs just
update the existing same-branch PR.
open-recipe-pr.sh gains the --reconcile-only mode + the close logic (with an
auto-close comment naming the reason). upgrade-all reconciles every candidate's
mirror during the survey so merged PRs are closed fleet-wide. Still never merges.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Operator: don't run the weekly upgrade-all while the build loops are still
constructing cc-ci (shared-host contention). Activate the Sat 03:00 UTC
(0 3 * * 6) cron only once the build is complete; on-demand until then.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per-recipe and fleet-wide upgrade skills modelled on recipe-maintainer's
recipe-upgrade-full / recipe-upgrade-cron-all, but gated by the cc-ci CI server
and inheriting ci-test-review's create+verify+never-merge discipline.
- recipe-upgrade/: plan (release notes, breaking changes) -> implement (abra
recipe upgrade + version bump + config, lint) -> open the recipe PR -> VERIFY
green on cc-ci (full suite cold against the PR head via verify-pr.sh). If the
upgrade is correct but a cc-ci TEST went stale, also update the test, verify
it, and open a second PR to recipe-maintainers/cc-ci. Never merges; never
weakens a test; prefers a recipe-only PR. Emits a parseable RESULT line.
+ open-recipe-pr.sh: adapted recipe-create-pr; runs on cc-ci (has the recipe
checkout + bot token), creds passed from the orchestrator .testenv;
force-syncs the mirror main so the PR diff is exactly the upgrade.
- upgrade-all/: weekly fan-out — enumerate enrolled recipes, survey upgrades,
run /recipe-upgrade per upgradeable recipe via subagent (sequential default,
--parallel / --dry-run), collect into one PR-list summary. Coordination +
single-writer + shared-Swarm-teardown guardrails; built for a weekly cron.
- ci-test-review/verify-pr.sh: pass SRC (recipe-maintainers/<recipe>) alongside
REF so the harness clones the mirror PR head correctly (its real contract).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per operator: the skill should not just propose, it should CREATE the fix PR
(recipe repo or cc-ci repo) and VERIFY it green on its own CI server — but not
merge. It drives cc-ci like the loops do.
- SKILL.md: diagnose+classify (recipe vs CI-server) -> author the fix + open a
PR (recipe-create-pr for recipe PRs; Gitea API for cc-ci PRs, dedicated branch
in a separate clone, single-writer safe) -> VERIFY on cc-ci (full suite cold
against the PR head = the !testme dogfood path) -> report a verified,
ready-to-merge PR. Never merges; never weakens a test; flake != bug. General
bar = one cold green; repeated-green (REPEAT=3) only for a known-flaky recipe.
Adds coordination/single-writer guardrails (shared Swarm is stateful; tear
down deploys; never push main or touch the loops' clones).
- verify-pr.sh: deterministic recipe-PR gate — RECIPE + REF -> cold full suite
on cc-ci, green iff every repeat exits 0. CI-server-PR verification stays
bespoke (branch checkout + rebuild + regression sample) per SKILL.md.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The on-demand AI review layer is now an orchestration-repo skill built directly
by the orchestrator, NOT a loops phase in the cc-ci product repo:
- .claude/skills/ci-test-review/{SKILL.md,run-all-recipes.sh}: runs the real
cc-ci harness across all enrolled recipes (deterministic, AI-free execution),
then AI diagnoses each failure and classifies it as needing a recipe PR / a
CI-server PR / a stale-test update — or reports "ALL PASSED, recipes + tests
up to date". Proposes PRs; never decides pass/fail; never auto-merges.
- .gitignore: track .claude/skills/ (shareable) while still ignoring local
claude session state (locks, history) under .claude/.
- launch.sh: remove Phase 3r from PHASES_SPEC; loops sequence back to
1c 1b 1d 1e 2w 2pc 2 2b 3 4. Deleted plan-phase3r (superseded by the skill).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>