Commit Graph

36 Commits

Author SHA1 Message Date
c60fc6d056 change(cleanup): reap dev deploys at start+end of /upgrade-all instead of a timer
Per operator: drop the hourly cc-ci-reap-dev-deploys systemd timer; instead run the
dev-* reaper at the START (Step 0, alongside the orphan sweep) and END (new step 4b)
of each /upgrade-all run, with THRESHOLD=0 (the run is quiescent then, so clear all
dev-* unconditionally). The reaper keeps its safe default (4h) for ad-hoc use.
Step-2b mandatory teardown is unchanged (primary mechanism); this is the backstop.
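
A minimal Python sketch of the reaper's selection rule, assuming each stack is known by name plus an idle age in seconds. The real reaper is a shell script, and how it measures idleness isn't stated here, so `Stack` and `idle_s` are hypothetical. Only the `dev-` prefix is eligible. The default threshold is the 4h ad-hoc value, and `threshold_s=0` clears every dev-* stack, as the /upgrade-all start and end steps do.

```python
from dataclasses import dataclass

DEFAULT_THRESHOLD_S = 4 * 3600  # safe default for ad-hoc runs (4h)


@dataclass
class Stack:
    name: str
    idle_s: int  # hypothetical: seconds since the stack was last (re)deployed


def stacks_to_reap(stacks, threshold_s=DEFAULT_THRESHOLD_S):
    """Names of dev-* stacks idle for at least threshold_s seconds.

    Only the dev- prefix is eligible, so CI per-run stacks, warm-* and
    infra are never selected. threshold_s=0 selects every dev-* stack.
    """
    return [
        s.name
        for s in stacks
        if s.name.startswith("dev-") and s.idle_s >= threshold_s
    ]


fleet = [
    Stack("dev-immich", 600),         # active dev loop: stays fresh
    Stack("dev-cryptpad", 5 * 3600),  # leaked 5h ago
    Stack("warm-immich", 10**6),      # warm canonical: never eligible
]
print(stacks_to_reap(fleet))     # ['dev-cryptpad']
print(stacks_to_reap(fleet, 0))  # ['dev-immich', 'dev-cryptpad']
```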

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:47:16 +00:00
23bba98be4 feat(cleanup): guarantee step-2b dev deploys get reaped
- /recipe-upgrade step 2b: teardown is now MANDATORY on every exit path (finally),
  with a verify-no-leak check; tear down even on failure before reporting.
- reap-dev-deploys.sh: safe, age-gated backstop that removes only idle dev-* stacks
  (never CI per-run stacks, warm-*, infra; an active dev loop stays fresh).
- orchestrator: hourly cc-ci-reap-dev-deploys systemd timer runs it against cc-ci,
  bounding any leaked dev deploy from a crashed/abandoned loop.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:42:23 +00:00
77ba7ee075 guardrail: upgrader never modifies cc-ci tests/harness unless --with-tests
Absolute, mode-gated rule reinforced in /recipe-upgrade (Guardrails + the new
step-2b direct-deploy loop where the upgrader has cc-ci host access) and noted as
the interim safeguard in IDEAS.md until the deploy loop moves to isolated infra.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:32:50 +00:00
de0baa00b1 feat(upgrade): add direct deploy-and-inspect dev loop (recipe-maintainer style) before CI
The upgrader now deploys the WIP recipe directly on cc-ci (abra app deploy --chaos
under a dev-<recipe> domain on the local swarm) and inspects live logs
(docker service logs) to SEE what the upgrade does, before/alongside the !testme
CI gate. ADDITIONAL to — not a replacement for — the 3-attempt !testme verification;
it front-loads diagnosis so fewer CI attempts are spent on basics. Always torn down
(orphan-sweep is the backstop). /upgrade-all dispatch references the new step 2b.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:28:06 +00:00
f1c63f1ca0 feat(survey): don't skip recipes on abra tag+digest FATA — check upstream directly
abra hard-FATAs on image refs with both a tag and a digest (immich:
postgres:14-vectorchord...@sha256:..., valkey:9@sha256:...), aborting the whole
recipe survey so immich was silently dropped. Per operator: don't normalize the
recipe; catch the failure and check the upstream registry directly.

- /upgrade-all box item 4: a tag+digest parse FATA must NOT be classed as not-fetchable. Use abra
  for the images it parses; for the rest, list upstream tags (Docker Hub / ghcr /
  buildx imagetools) and judge availability (match the variant the app supports,
  not blindly the max). Upgradeable if abra OR the direct check finds a newer tag.
- /recipe-upgrade implement: hand-bump tag+digest pins (abra can't), and re-resolve
  + re-append the digest for the new tag so the pin is preserved (never drop it).
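
The hand-bump can be sketched in Python as: split `name:tag@digest`, swap the tag, and re-append a digest so the pin survives. This is an illustration, not the skill's code. The function names are hypothetical, the digests are placeholders, and resolving the new digest from the registry is left out of the sketch.

```python
def split_image_ref(ref):
    """Split 'name[:tag][@digest]' into (name, tag, digest); absent parts are None."""
    name, _, digest = ref.partition("@")
    # A tag colon must come after the last '/', so a registry port
    # such as 'registry:5000/app' is not mistaken for a tag.
    colon, slash = name.rfind(":"), name.rfind("/")
    tag = None
    if colon > slash:
        name, tag = name[:colon], name[colon + 1:]
    return name, tag, digest or None


def bump_pinned_ref(ref, new_tag, new_digest):
    """Return ref with new_tag, keeping it digest-pinned if it was pinned."""
    name, _old_tag, old_digest = split_image_ref(ref)
    if old_digest and not new_digest:
        raise ValueError(f"{ref}: refusing to drop the digest pin")
    return f"{name}:{new_tag}" + (f"@{new_digest}" if new_digest else "")


print(split_image_ref("valkey:9@sha256:aaa"))  # ('valkey', '9', 'sha256:aaa')
print(bump_pinned_ref("valkey:9@sha256:aaa", "9.1", "sha256:bbb"))  # valkey:9.1@sha256:bbb
```

Raising instead of silently dropping the digest mirrors the "never drop it" rule.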

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 15:21:36 +00:00
f687174b53 feat(recipe-report): TESTS rename + live binary STATUS column
Rename the table's Status column -> TESTS (the CI/test verdict, unchanged
content). Add a new STATUS column showing the PR's LIVE state, fetched
client-side: 'open' vs a ✓ for any not-open state (merged or closed). The cell
is a JS hook (data-repo/data-pr) derived from existing recipe+pr fields; an
inline, dependency-free, CSP-safe script GETs the same-origin /pr/<recipe>/<n>
proxy (cc-ci nix/modules/reports.nix) on load and every 30s, and degrades to a
muted '?' if the proxy/repo is unreachable. Blank cell when a row has no PR.
Doc + SKILL updated.
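
The live cell is an inline JS hook. This Python sketch only restates its mapping from PR state to cell text, assuming the proxy yields a state string such as Gitea's "open"/"closed" (or nothing when unreachable). The fetch and the 30s polling are omitted, and the function names are hypothetical.

```python
def proxy_path(recipe, pr_number):
    """Same-origin proxy path the page polls (shape taken from the commit)."""
    return f"/pr/{recipe}/{pr_number}"


def status_cell(pr_number, pr_state):
    """Text for the live STATUS cell.

    pr_number is None when the row has no PR; pr_state is None when the
    proxy or repo could not be reached.
    """
    if pr_number is None:
        return ""  # blank cell: no PR on this row
    if pr_state is None:
        return "?"  # rendered muted: proxy/repo unreachable
    return "open" if pr_state == "open" else "\u2713"  # merged or closed
```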

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 13:15:02 +00:00
1f52795534 skill(ci-dev-workflow): capture the cc-ci feature-dev flow + adversary plan template
Documents the end-to-end workflow used to land the intentional-skips/4-rung-ladder
feature: explore harness → branch a local cc-ci clone → implement + unit-verify
cold on cc-ci → live full-stage check → open PR (never push main) → independent
adversary verdict → squash-merge on PASS → deploy via /root/builder-clone rebuild.
Includes the adversary-verify-pr6.md plan as a reusable template.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 03:16:47 +00:00
dbafcddb62 feat(upgrade-all): sweep orphans from previous runs at the start of each weekly run
Adds sweep-orphans.sh (safe-by-allowlist: removes orphan test stacks, standalone
debug containers >30m old, leaked dangling volumes, and reparented docker-run
wrappers; spares infra + warm-* canonicals and their retained volumes) and wires
it as Step 0 of /upgrade-all so a prior run's leaked stack/container/process can't
contend for the shared Swarm or skew the survey. Idempotent; no-op when clean.
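
The allowlist idea for one category, standalone debug containers, can be sketched as a Python predicate. This assumes the caller has already listed standalone containers with their ages. The infra names below are hypothetical placeholders, and the real sweep is a shell script covering more categories.

```python
DEBUG_MIN_AGE_S = 30 * 60     # standalone debug containers older than 30m
INFRA = {"traefik", "drone"}  # hypothetical infra names


def is_spared(name, infra=INFRA):
    """Infra and warm-* canonicals are never swept."""
    return name in infra or name.startswith("warm-")


def sweepable_debug_containers(containers, infra=INFRA):
    """containers: iterable of (name, age_s) for standalone (non-service) containers."""
    return [
        name
        for name, age_s in containers
        if not is_spared(name, infra) and age_s > DEBUG_MIN_AGE_S
    ]


print(sweepable_debug_containers([("dbg-1", 3600), ("dbg-2", 60), ("warm-immich", 10**6)]))
# ['dbg-1']
```

Running it twice on a clean host returns an empty list, which matches the idempotent, no-op-when-clean behaviour described above.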

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 02:39:43 +00:00
d31378b180 feat(recipe-report): restructure page — priority-sorted wire table w/ CVE column, addendum, per-recipe changes
New page order: short lead -> the full wire table (sorted by priority-to-address,
CVE recipes first, new CVEs count column) -> Addendum (bullets of real special
issues, omitted if clean) -> Security Bulletin -> per-recipe "What changed".

- recipe-report.py: _table() gains a CVEs column + recipe-name linking; new
  _changes() helper; render() reordered; docstring SPEC SHAPE updated
  (cve/addendum/changes added, needs_attention/routine removed).
- recipe-report/SKILL.md + example-spec.json: new procedure, spec shape, and
  gold-standard template (2026-06-05, new format).
- launch-report.py: kickoff text reflects the new priority-ordered structure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 17:06:43 +00:00
49491fcb90 fix(recipe-report): weekly trigger uses launch-report.py 'fresh'; start kills idle/leftover session
A stale cc-ci-report session (from a prior week's run, gone idle) caused this week's
launch-report.py 'start' (use-or-create) to leave it and never run a fresh report.
Fix: upgrade-all step 6 now calls 'fresh', and start only leaves a session that's
actively busy producing a report — an idle/leftover session is killed + restarted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:06:46 +00:00
65a96453fc fix(recipe-upgrade): reconcile mirror from TRUE coopcloud upstream, not from the mirror itself
The reconcile that's supposed to make the mirror main == upstream main was fetching origin/main —
but origin is the cc-ci MIRROR, so it synced the mirror to itself (a no-op) and never pulled real
upstream. Fix: fetch coopcloud explicitly (git.coopcloud.tech/coop-cloud/<recipe>, default branch
main OR master) via an 'upstream' remote and force-sync the mirror main + tags from it. Every recipe
has a coopcloud correspondent; none are forked. Also reorder the skill so the reconcile runs BEFORE
the upgrade check, so the check sees the real current recipe. Verified by divergence test (diverged a
mirror, reconcile snapped it back to coopcloud HEAD).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:44:45 +00:00
f0716764db feat(recipe-upgrade): upstream release-notes registry + recipe-README read (recipe-maintainer parity)
Close the two gaps vs recipe-maintainer's recipe-upgrade-plan:
- Per-recipe release-notes registry at cc-ci-plan/upstream/<recipe>.md (discover the source repo +
  releases/changelog URL for each image once, persist+commit, reuse) — fetch release notes FROM those
  URLs instead of rediscovering ad-hoc each run. Format doc + cryptpad seed included.
- Explicitly read the recipe's README for shipped upgrade/migration notes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 01:28:27 +00:00
a6efcec720 feat(recipe-report): link recipe names in the lead to their mirror repos; 3-para concise lead
render() auto-links whole-word recipe mentions in the editorial lead to
git.autonomic.zone/recipe-maintainers/<recipe> (single regex pass, longest-name-first,
no href corruption). Skill: lead is ~3 short paragraphs (~150-180 words) incl. an
'anything strange worth looking into' paragraph. example-spec.json lead updated to the
concise target.
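
The single-pass, longest-name-first linking can be sketched in Python with one `re.sub` call. Because the call scans the original text once, hrefs it inserts are never re-matched. Two details are assumptions rather than facts from the commit: the `https://` scheme, and treating `-` as a word character so a short name never matches inside a hyphenated one. The names `foo`/`foo-bar` are hypothetical.

```python
import re

MIRROR = "https://git.autonomic.zone/recipe-maintainers/"  # scheme assumed


def link_recipes(text, recipes):
    """Link whole-word recipe mentions in a single regex pass."""
    # Longest name first, so a longer name wins over any prefix of it.
    names = sorted(set(recipes), key=len, reverse=True)
    if not names:
        return text
    alternation = "|".join(re.escape(n) for n in names)
    # Treat '-' as part of a word so 'foo' never matches inside 'foo-bar'.
    pattern = re.compile(rf"(?<![\w-])({alternation})(?![\w-])")
    # One sub() call: the inserted <a href=...> text is never re-scanned,
    # so a later name cannot match inside an earlier link's URL.
    return pattern.sub(lambda m: f'<a href="{MIRROR}{m[1]}">{m[1]}</a>', text)


print(link_recipes("foo-bar depends on foo", ["foo", "foo-bar"]))
```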

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 02:17:19 +00:00
ea2d8c8210 feat(recipe-report): use approved 2026-06-02 report as the style template; tighter lead for future runs
Save the operator-approved 2026-06-02 spec as example-spec.json (gold standard
for voice/structure/specificity). Skill now tells the agent to match it, with
one deliberate change: keep the editorial lead TIGHT (~2 short paragraphs,
~120 words). The live 2026-06-02 page stays as the reference.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 02:06:45 +00:00
6cf59130db feat(recipe-report): newspaper front-page layout — editorial lead + CVE security bulletin first
Masthead + opus 'lead' editorial (overall fleet state + what to focus on), a Security Bulletin of
critical-CVE upgrades up top (mined from per-recipe upgrade_notes_md), then needs-attention/routine,
and the comprehensive table as 'the full wire' at the end. survey now includes each recipe's
upgrade_notes_md (breaking-change/CVE analysis) so opus can lead with security.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 23:13:40 +00:00
c7301a9e39 feat(recipe-report): /recipe-report skill + helper + launcher (default opus); wire into upgrade-all
- recipe-report.py: survey (run + per-recipe PRs + CI verdicts) / render (spec->HTML) / publish
  (copy to cc-ci:/var/lib/cc-ci-reports + regen index).
- skill .claude/skills/recipe-report: review the weekly run, classify needs-attention vs routine,
  publish one public HTML page per week + index at report.ci.commoninternet.net. Read-only.
- launch-report.py: one-shot cc-ci-report agent, REPORT_MODEL default opus (separate from the
  sonnet upgrader), REPORT_BACKEND default claude.
- upgrade-all SKILL: closing step launches the report agent.
Serving (nix/modules/reports.nix) already deployed + live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 23:02:22 +00:00
5c691cdb66 fix(upgrade skills): real abra-auth fix — embed git.autonomic.zone creds in origin (go-git)
The actual 'abra auth error' that skipped 8 recipes was go-git failing to
fetch tags from the PRIVATE git.autonomic.zone mirror ('authentication
required: Unauthorized'), NOT the TTY issue. abra/go-git reads
remote.origin.url literally and IGNORES git url.insteadOf + credential
helpers (confirmed: insteadOf left immich Unauthorized; literal embedded URL
fixed it). Skill now bakes $GITEA_USERNAME:$GITEA_PASSWORD into origin for
git.autonomic.zone recipes before the version check, and stashes the
untracked cc-ci overlay so it isn't mis-counted as dirty-worktree.
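
Baking the credentials into the literal origin URL can be sketched in Python with `urllib.parse`. The function name is hypothetical. The result would then be set with `git remote set-url origin <url>`. Both parts are percent-encoded, so a password containing `@` or `:` cannot break the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_embedded_creds(url, user, password, host="git.autonomic.zone"):
    """Return url with user:password in the netloc for the mirror host only.

    go-git reads remote.origin.url literally, so the credentials have to
    live in the URL itself; other hosts are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != host:
        return url
    netloc = f"{quote(user, safe='')}:{quote(password, safe='')}@{parts.hostname}"
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```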

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 04:40:59 +00:00
027fdbd161 fix(upgrade skills): run abra over a pseudo-TTY (fixes FATA inappropriate ioctl)
abra over plain 'ssh cc-ci abra ...' has no TTY -> FATA 'inappropriate ioctl
for device' (the abra error). The working harness (runner/harness/abra.py)
wraps abra in util-linux 'script' for a pseudo-TTY + passes -n. Apply the
same in the recipe-upgrade and upgrade-all skills: every abra call becomes
ssh cc-ci 'script -qec "abra <args> -n" /dev/null'. Confirmed: abra server
ls FATAs plain, works pty-wrapped.
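
A Python sketch of building that wrapped command as an argv list for `subprocess.run`. The helper name is hypothetical. It quotes with `shlex` (single quotes) rather than the double quotes shown above, which the remote shell parses the same way for plain arguments. `script -q -e -c CMD /dev/null` runs CMD on a pty, propagates its exit code (`-e`) and discards the typescript.

```python
import shlex


def pty_wrapped_abra(*args):
    """argv for running `abra <args> -n` on cc-ci under a pseudo-TTY."""
    inner = shlex.join(["abra", *args, "-n"])
    return ["ssh", "cc-ci", f"script -qec {shlex.quote(inner)} /dev/null"]


print(pty_wrapped_abra("server", "ls"))
# ['ssh', 'cc-ci', "script -qec 'abra server ls -n' /dev/null"]
```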

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 04:06:38 +00:00
ad7ba8375a fix(recipe-upgrade): extend open upgrade PRs by commit-on-top, no force-push
Instead of force-pushing HEAD onto the existing PR branch (history rewrite),
add a commit ON TOP of the branch tip (fast-forward) when it already exists,
so the PR's history is preserved and it re-tests. Fresh branches still push
normally. The only remaining force-push is the mirror-main->upstream sync
(intentional mirroring), never a PR branch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 01:58:10 +00:00
5f814307ad fix(recipe-upgrade): default to extending an existing open upgrade PR, not a parallel one
When an open upgrade PR already exists for a recipe (branch upgrade-*), push
the new work onto ITS branch and update+re-test that PR — one evolving
upgrade PR per recipe instead of spawning a second parallel PR. Only open a
fresh upgrade-<version> PR when none exists. Unrelated open PRs (e.g. backup
fixes) are still never touched; merged-upstream PRs still close.
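
The branch-selection rule can be sketched in Python, assuming open PRs are reduced to hypothetical `{"number", "head"}` dicts (the real Gitea payload nests the branch under `head.ref`). Picking the lowest-numbered upgrade-* PR when several exist is an assumption. The commit describes only one evolving PR per recipe.

```python
def choose_upgrade_target(open_prs, version):
    """Decide where new upgrade work goes.

    An existing open upgrade-* PR is extended (commit on top of its tip,
    no force-push); otherwise a fresh upgrade-<version> branch is opened.
    Unrelated open PRs (e.g. backup fixes) are never selected.
    """
    for pr in sorted(open_prs, key=lambda p: p["number"]):
        if pr["head"].startswith("upgrade-"):
            return ("extend", pr["number"], pr["head"])
    return ("open", None, f"upgrade-{version}")
```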

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 01:54:58 +00:00
19fda8d2b8 fix(recipe-upgrade): stop auto-closing superseded/unrelated open PRs
Per operator: opening a new upgrade PR should stack ON TOP of any other
still-open PRs, not close them. Only PRs already merged into upstream
main are closed (merging them is a no-op). This prevents the phase-7
incident where an unrelated open ghost PR was auto-closed as 'superseded'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 00:07:05 +00:00
1f96eba577 fix(ci-test-review): resolve PR ref to commit sha in verify-pr.sh
Resolve the recipe branch/ref to its head commit sha via the Gitea API
before invoking the cold full-suite run, so the upgrade tier deploys the
exact PR head. From the phase-5 upgrade-flow verification.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 21:46:29 +00:00
9574972f1d feat(skill): add Hetzner server recovery playbook
2026-06-01 13:48:23 +00:00
a896ee9476 fix(testme-on-pr): wait for a fresh cc-ci status update
2026-06-01 13:03:41 +00:00
2486b7c368 fix(ci-test-review): resolve remote cc-ci worktree
2026-06-01 13:03:41 +00:00
df6ca04611 feat(recipe-upgrade): add stale-test PR helpers
2026-06-01 03:48:05 +00:00
6910b197d0 fix(testme-on-pr): read the cc-ci/testme context URL, not the first status's URL
When multiple commit statuses exist (e.g. an Adversary probe + the real run),
the first status in the array may not be the cc-ci run. Filter by context
'cc-ci/testme' to get the correct Drone build URL.
2026-05-31 14:00:02 +00:00
0df57c6d0c fix(open-recipe-pr): replace python3 with jq (cc-ci has jq, not python3)
2026-05-31 13:35:07 +00:00
1c2be64124 Phase 5 §4: install weekly upgrade cron at completion+1h and verify first kickoff
Operator: when the final phase completes, install the weekly cron anchored to
actual completion — first run ~1h after the build finishes, weekly from then on
(supersedes the fixed "Sat 03:00 UTC" placeholder).

- plan-phase5 §4: orchestrator computes T0=now+1h, installs a weekly job at T0's
  DOW+HH:MM running launch-upgrader.sh start; cron env needs claude on PATH +
  tmux + claude.ai login (mirror cc-ci-loops.service). VERIFY the first kickoff:
  cheap --dry-run pre-check, then confirm the real T0 fire launched the
  cc-ci-upgrader agent (status RUNNING, ran /upgrade-all, summary produced);
  record schedule + verified kickoff in DECISIONS.md.
- upgrade-all skill Cron section + cron memory updated to the completion-anchored
  schedule + first-kickoff verification.
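
Computing the completion-anchored crontab line can be sketched in Python, assuming cron evaluates times in UTC and `now` is a UTC datetime. The function name is hypothetical, and the command is shown without its full path. The only subtlety is the day-of-week mapping: Python's `weekday()` counts Monday as 0, while cron counts Sunday as 0.

```python
from datetime import datetime, timedelta, timezone


def completion_anchored_cron(now, command):
    """Weekly crontab line whose first fire is T0 = now + 1h."""
    t0 = now + timedelta(hours=1)
    cron_dow = (t0.weekday() + 1) % 7  # Python: Mon=0..Sun=6; cron: Sun=0..Sat=6
    return t0, f"{t0.minute} {t0.hour} * * {cron_dow} {command}"


t0, line = completion_anchored_cron(
    datetime(2026, 6, 6, 2, 0, tzinfo=timezone.utc), "launch-upgrader.sh start"
)
print(line)  # 0 3 * * 6 launch-upgrader.sh start
```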

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:21:20 +01:00
bf71420106 Add cc-ci-upgrader agent: observable one-shot weekly upgrade-run agent
The weekly upgrade run now executes inside a dedicated, remote-control agent
(cc-ci-upgrader) — viewable/steerable at claude.ai/code like the Builder — rather
than buried in headless cron output.

- launch-upgrader.sh: spins up the cc-ci-upgrader tmux session under
  --remote-control with a kickoff that runs /upgrade-all (DEFAULT mode) to
  completion. On finish the agent STOPS and stays idle (does NOT self-terminate)
  so the run + summary stay reviewable in the web UI. `start` = use-or-create:
  leaves an in-flight (busy) run alone, else clears a finished/idle/wedged
  session and runs fresh; `fresh` always restarts. UPGRADER_ARGS passes flags
  (e.g. --dry-run); never --with-tests.
- launch.sh: orchestrator_alive() now also skips the cc-ci-upgrader
  remote-control name, so the upgrader job isn't mistaken for the orchestrator.
- upgrade-all skill: documents it runs as the cc-ci-upgrader agent; the weekly
  cron invokes `launch-upgrader.sh start` (not /upgrade-all inline).
- Phase 5: V8a verifies the agent lifecycle (launch → run to completion → stay
  idle/viewable → next start clears it); V9 stops the verification session.
- cron memory: weekly task = launch-upgrader.sh start at 0 3 * * 6 UTC.
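
The `start`/`fresh` semantics reduce to a small decision table, sketched here in Python. How the launcher tells a busy session from a wedged one isn't stated, so the `Session` states are hypothetical inputs.

```python
from enum import Enum


class Session(Enum):
    ABSENT = "absent"
    BUSY = "busy"  # a run is in flight
    IDLE = "idle"  # finished, leftover, or wedged


def plan_launch(mode, session):
    """Action for launch-upgrader.sh-style `start` / `fresh` modes."""
    if mode == "fresh":  # always restarts
        return "create" if session is Session.ABSENT else "restart"
    if mode == "start":  # use-or-create
        if session is Session.BUSY:
            return "leave"
        return "create" if session is Session.ABSENT else "restart"
    raise ValueError(f"unknown mode: {mode}")
```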

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 21:12:47 +01:00
4a1da1dd60 recipe-upgrade: !testme-on-PR verification + make test PRs opt-in (--with-tests)
Per operator:
- Verify via `!testme` posted ON the recipe PR (the real CI path) so results are
  viewable in the PR; iterate up to 3 !testme runs (fix a real regression + re-test).
  New helper testme-on-pr.sh posts !testme and polls the PR head commit status
  for the verdict (POST=0 to keep polling without re-triggering).
- Test updates are now OPT-IN via `--with-tests`. DEFAULT: recipe PR only using
  existing tests; if a test fails and is genuinely stale, leave an explanatory
  COMMENT on the PR (upgrade looks correct; re-run --with-tests to update tests)
  and do NOT touch any test. --with-tests keeps the verified cc-ci test-update PR
  path (verified via the branch-checkout harness run, since !testme uses prod tests).
- upgrade-all (weekly cron) calls the DEFAULT — never auto-edits tests unattended;
  surfaces "tests look stale" PRs in the summary for the operator to opt in per-recipe.
- New RESULT: SUCCESS-PENDING-TESTS for the recipe-green-but-test-stale default case.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 20:18:59 +01:00
62b7af7a97 recipe-upgrade: reconcile mirror to upstream main + close merged/superseded PRs
Per operator: an open mirror PR must mean "genuinely still open against true
current upstream main". On every run the recipe-upgrade flow now:
- force-syncs the recipe-maintainers/<recipe> mirror `main` to be IDENTICAL to
  upstream main (origin/main of the abra checkout = coopcloud);
- closes any open mirror PR whose changes are already in upstream main (merged
  upstream, no-op merge detected via `git merge-tree` vs main's tree) — even
  when the recipe is up to date (new `--reconcile-only` mode, run in step 1);
- when opening a new upgrade PR, closes any other still-open PR for that recipe
  (superseded) and opens the new one IN ITS PLACE; same-version re-runs just
  update the existing same-branch PR.
open-recipe-pr.sh gains the --reconcile-only mode + the close logic (with an
auto-close comment naming the reason). upgrade-all reconciles every candidate's
mirror during the survey so merged PRs are closed fleet-wide. Still never merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:32:34 +01:00
a8b4b4c39e upgrade-all: pin weekly slot (Sat 03:00 UTC) + defer activation until cc-ci is built
Operator: don't run the weekly upgrade-all while the build loops are still
constructing cc-ci (shared-host contention). Activate the Sat 03:00 UTC
(0 3 * * 6) cron only once the build is complete; on-demand until then.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:24:40 +01:00
db31c08d6a Add /recipe-upgrade + /upgrade-all skills (cc-ci-gated upgrades, never merge)
Per-recipe and fleet-wide upgrade skills modelled on recipe-maintainer's
recipe-upgrade-full / recipe-upgrade-cron-all, but gated by the cc-ci CI server
and inheriting ci-test-review's create+verify+never-merge discipline.

- recipe-upgrade/: plan (release notes, breaking changes) -> implement (abra
  recipe upgrade + version bump + config, lint) -> open the recipe PR -> VERIFY
  green on cc-ci (full suite cold against the PR head via verify-pr.sh). If the
  upgrade is correct but a cc-ci TEST went stale, also update the test, verify
  it, and open a second PR to recipe-maintainers/cc-ci. Never merges; never
  weakens a test; prefers a recipe-only PR. Emits a parseable RESULT line.
  + open-recipe-pr.sh: adapted recipe-create-pr; runs on cc-ci (has the recipe
    checkout + bot token), creds passed from the orchestrator .testenv;
    force-syncs the mirror main so the PR diff is exactly the upgrade.
- upgrade-all/: weekly fan-out — enumerate enrolled recipes, survey upgrades,
  run /recipe-upgrade per upgradeable recipe via subagent (sequential default,
  --parallel / --dry-run), collect into one PR-list summary. Coordination +
  single-writer + shared-Swarm-teardown guardrails; built for a weekly cron.
- ci-test-review/verify-pr.sh: pass SRC (recipe-maintainers/<recipe>) alongside
  REF so the harness clones the mirror PR head correctly (its real contract).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:19:20 +01:00
cbe1406bce ci-test-review: close the loop — author + open + cc-ci-verify fix PRs (never merge)
Per operator: the skill should not just propose, it should CREATE the fix PR
(recipe repo or cc-ci repo) and VERIFY it green on its own CI server — but not
merge. It drives cc-ci like the loops do.

- SKILL.md: diagnose+classify (recipe vs CI-server) -> author the fix + open a
  PR (recipe-create-pr for recipe PRs; Gitea API for cc-ci PRs, dedicated branch
  in a separate clone, single-writer safe) -> VERIFY on cc-ci (full suite cold
  against the PR head = the !testme dogfood path) -> report a verified,
  ready-to-merge PR. Never merges; never weakens a test; flake != bug. General
  bar = one cold green; repeated-green (REPEAT=3) only for a known-flaky recipe.
  Adds coordination/single-writer guardrails (shared Swarm is stateful; tear
  down deploys; never push main or touch the loops' clones).
- verify-pr.sh: deterministic recipe-PR gate — RECIPE + REF -> cold full suite
  on cc-ci, green iff every repeat exits 0. CI-server-PR verification stays
  bespoke (branch checkout + rebuild + regression sample) per SKILL.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 17:04:35 +01:00
2530845e50 orchestrator: add /ci-test-review skill (in THIS repo) + drop Phase 3r from loops queue
The on-demand AI review layer is now an orchestration-repo skill built directly
by the orchestrator, NOT a loops phase in the cc-ci product repo:

- .claude/skills/ci-test-review/{SKILL.md,run-all-recipes.sh}: runs the real
  cc-ci harness across all enrolled recipes (deterministic, AI-free execution),
  then AI diagnoses each failure and classifies it as needing a recipe PR / a
  CI-server PR / a stale-test update — or reports "ALL PASSED, recipes + tests
  up to date". Proposes PRs; never decides pass/fail; never auto-merges.
- .gitignore: track .claude/skills/ (shareable) while still ignoring local
  claude session state (locks, history) under .claude/.
- launch.sh: remove Phase 3r from PHASES_SPEC; loops sequence back to
  1c 1b 1d 1e 2w 2pc 2 2b 3 4. Deleted plan-phase3r (superseded by the skill).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 16:57:26 +01:00