271 Commits

Author SHA1 Message Date
02f13ab35f recipe-upgrade: scope upstream cross-check to abra-unparseable tags only
Trust abra fully for any image whose tag it can read — a normal semver/calver
tag with no newer version is genuinely up-to-date. Only cross-check upstream for
images abra physically can't parse (tag+digest pins, digest-only pins), which is
the actual immich blind spot. Avoids redundant upstream checks on every recipe.
2026-06-19 11:54:21 +00:00
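A minimal sketch of the scoping rule in 02f13ab35f, assuming the decision can be made from the image reference string alone. The names PinKind, classify_pin and needs_upstream_check are hypothetical, and the image references in the comments are placeholders, not taken from any recipe.

```python
from enum import Enum


class PinKind(Enum):
    TAG_ONLY = "tag-only"          # e.g. example/app:1.2.3 (or no tag at all)
    TAG_AND_DIGEST = "tag+digest"  # e.g. example/app:1.2.3@sha256:...
    DIGEST_ONLY = "digest-only"    # e.g. example/app@sha256:...


def classify_pin(image_ref: str) -> PinKind:
    """Classify an image reference by how it is pinned (hypothetical helper)."""
    name, at, _digest = image_ref.partition("@")
    if not at:
        return PinKind.TAG_ONLY
    # Only a ':' in the last path segment is a tag; in 'registry:5000/app'
    # the ':' introduces a port, not a tag.
    last_segment = name.rsplit("/", 1)[-1]
    return PinKind.TAG_AND_DIGEST if ":" in last_segment else PinKind.DIGEST_ONLY


def needs_upstream_check(image_ref: str) -> bool:
    """Cross-check upstream only for pins that carry a digest."""
    return classify_pin(image_ref) is not PinKind.TAG_ONLY
```

Plain tags fall through to abra unchanged in this sketch; only the two digest-bearing shapes trigger the extra upstream lookup.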
ce5d2e22cf recipe-upgrade: cross-check upstream when abra is blind (immich tag+digest skip)
abra recipe upgrade is the first approach, but it silently contributes no
candidate for tag+digest pins (FATA: tag and digest not supported), digest-only
pins, and non-semver tags. immich kept getting skipped this way. Before
concluding SKIPPED — up-to-date, do a direct upstream tag check for every image
abra could not cleanly evaluate; only skip when BOTH agree nothing is newer.
2026-06-19 11:51:37 +00:00
68bbfc72f2 upstream(mattermost-lts): update to 11.7 ESR; note restore fix, schemeid gotcha
- Mattermost now on 11.7 ESR (EOL 2027-05-15); 10.11 ESR expires 2026-08-15
- Latest patch: 11.7.5 (2026-06-18)
- Note: avoid 11.7.0–11.7.2 (schemeid bug upgrading from 10.11.17+)
- Backup/restore now uses pg_backup.sh (proper restore hook; PR #2/PR #1 fix)
- Next ESR expected ~Feb 2027

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mbq9p2eCzZH59qfw9WsgKZ
2026-06-19 03:53:48 +00:00
15e88eaca2 upstream(hedgedoc): add release-notes sources registry 2026-06-19 02:49:42 +00:00
f195dfc828 upstream(gitea): release-notes sources 2026-06-19 02:45:01 +00:00
0ccd98fa8e upstream(drone): release-notes sources 2026-06-19 02:21:11 +00:00
e844ae3909 upstream(bluesky-pds): update registry — 0.4.5001 is new versioning scheme, not a mis-tag 2026-06-19 02:12:41 +00:00
a22ae8deed journal: redfix DONE — all 6 canon-sweep failures fixed + verified (4 recipe PRs, 2 harness); SEQUENCE-COMPLETE 2026-06-18 14:54:28 +00:00
ff6c44a627 plan: queue redfix — investigate ALL canon-sweep failures + FIX each (recipe PR or harness improvement, opus)
Operator 2026-06-17: fix all six (discourse timeout, mattermost-lts restore,
mumble handshake, bluesky warm-routing, gitea 3.6.0 advance, keycloak
de-enroll) — none left as a standing exception. Isolation re-run first
(flake vs genuine; operator recalls mattermost-lts/mumble passing). keycloak
+ bluesky likely share a warm-domain-collision harness fix; gitea via recipe
PR or warm-advance fallback. Nothing merged.
2026-06-17 23:17:51 +00:00
359f3f0978 journal: SEQUENCE-COMPLETE — regall/samever/canon/dash/settings/nixenv all DONE, host healthy 2026-06-17 22:36:13 +00:00
bd16123865 plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
2026-06-17 15:38:45 +00:00
f7825f8494 plan(settings): add release-tag-first no-canonical fallback; bump to opus
Operator 2026-06-17: (1) no-canonical upgrade base should prefer the most recent
release TAG < head (a real published predecessor, reusing samever's helper),
with raw main-tip only as a last resort, then skip — always-on, improves this
server too. (2) SKIP_CANONICALS_FOR_UPGRADE=true feeds that same release-tag-first
fallback (so it swaps canonical->latest-release base without losing upgrade
coverage). (3) model bumped sonnet->opus.
2026-06-17 09:48:34 +00:00
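The fallback order described in f7825f8494 might look roughly like the sketch below. resolve_upgrade_base is named in the plan, but this signature, the argument names and the (source, ref) return shape are assumptions made for illustration; each candidate is assumed to be computed elsewhere and passed in as a string or None.

```python
from typing import Optional, Tuple


def resolve_upgrade_base(
    canonical: Optional[str],
    prev_release_tag: Optional[str],
    main_tip: Optional[str],
    skip_canonicals: bool = False,
) -> Optional[Tuple[str, str]]:
    """Return (source, ref) for the upgrade base, or None to skip the upgrade test."""
    candidates = []
    # SKIP_CANONICALS_FOR_UPGRADE=true drops the canonical from the chain,
    # so resolution starts at the release-tag fallback.
    if not skip_canonicals:
        candidates.append(("canonical", canonical))
    candidates.append(("release-tag", prev_release_tag))  # newest release tag < head
    candidates.append(("main-tip", main_tip))             # last resort
    for source, ref in candidates:
        if ref:
            return source, ref
    return None  # nothing usable: skip
```

Modelling the chain as an ordered list keeps the setting to a single branch: turning it on removes one candidate rather than selecting a different code path.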
aa1f897820 plan: queue settings — minimal CI-server settings.toml + SKIP_CANONICALS_FOR_UPGRADE (sonnet, after dash)
Operator 2026-06-17. Introduce a minimal, extensible server-level settings.toml
for the cc-ci server; first value SKIP_CANONICALS_FOR_UPGRADE (bool, default
false, false on this server). When true, resolve_upgrade_base bypasses the
canonical and uses the main-tip predecessor — codifying that canonicals are an
optional optimization for the upgrade base. Scoped to the upgrade base only
(promote/--quick separate). Runs after canon/dash (builds on the final resolver).
2026-06-17 09:44:22 +00:00
ee8d30b43e plan(canon): retire UPGRADE_BASE_VERSION (gated) — plausible's pin becomes redundant under the dynamic canonical base
Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins
3.0.1+v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a
future re-enable). Once canon establishes plausible's canonical at 3.0.1, the
dynamic base resolves correctly without the pin -> strip the key
(meta/resolver/docs/tests) + migrate plausible + update bluesky-pds note. GATED:
keep it if plausible genuinely still needs the escape-hatch (never drop upgrade
coverage).
2026-06-17 04:43:23 +00:00
9e7d76ca1f plan: queue dash — fix incomplete per-recipe run history on the CI dashboard (opus, after canon)
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
2026-06-17 04:29:52 +00:00
4476cf1505 plan(canon): canonical promotes only to TAGGED releases; sweep triggers on a new release tag (not a new commit)
Operator 2026-06-17. (1) should_promote_canonical also requires the tested
version to have a release tag -> canonical is always a real published release,
never an untagged main commit. (2) sweep trigger is version/tag-keyed not
commit-keyed: skip a recipe unless its latest release tag is newer than its
canonical (skip new-untagged-commits-without-a-version). This makes the sweep
and samever ORTHOGONAL (samever never fires in the sweep; it's the PR-path
same-version guard). Updated gates/DoD accordingly.
2026-06-17 04:18:36 +00:00
9a81fe88e8 plan(canon): enroll ALL recipes, weekly cadence, and REQUIRE proof it plays nice with samever
Operator 2026-06-17: all recipes WARM_CANONICAL (watch warm-volume disk),
weekly timer, and an explicit M2 requirement to prove the sweep<->samever
interaction — skip uses COMMIT equality, samever uses VERSION equality; M2
must demonstrate all 3 sweep paths (skip / version-bump upgrade /
same-version-label step-back) and the commit-vs-version boundary.
2026-06-17 04:09:56 +00:00
05e2635019 plan: queue canon — make the canonical sweep actually work (substitute for hollow nightly sweep)
Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
2026-06-17 04:06:59 +00:00
03292c6f57 plan(samever): frame the same-version gap as the nightly cold-sweep STEADY STATE (operator insight)
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
2026-06-17 03:56:09 +00:00
b40dcb504c plan: queue samever (older-base fallback when last-green==head, opus); IDEAS: defer canonical-history (B)
Operator 2026-06-17. Closes the prevb resolver gap: when the last-green
warm-canonical base version equals the PR head version, step back to the
newest published version strictly older than head (design A) instead of a
same-version no-op or a skip. Design B (canonical history for a green-verified
older base) saved to IDEAS. Auto-runs after regall (watchdog advances + switches
to opus).
2026-06-17 03:50:49 +00:00
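Design A from b40dcb504c can be sketched as follows. The function names are hypothetical, and versions are assumed to be plain dotted integers such as 1.10.0; real recipe version labels may need a richer parser.

```python
from typing import Iterable, Optional, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_upgrade_base(last_green: str, head: str, published: Iterable[str]) -> Optional[str]:
    """Use last-green unless it equals head; then step back to the newest older version."""
    if last_green != head:
        return last_green
    older = [v for v in published if version_key(v) < version_key(head)]
    # None means no strictly older published version exists, so there is no base.
    return max(older, key=version_key) if older else None
```

Comparing parsed tuples rather than raw strings matters here: as strings, "1.9.0" sorts above "1.10.0", which would pick the wrong predecessor.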
65ee741869 plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
2026-06-16 23:55:28 +00:00
3c1d48e0b8 journal: gtea phase DONE — gitea fully-tested + LFS PR #1 verified; SEQUENCE-COMPLETE 2026-06-15 22:58:30 +00:00
24b3a25ce6 plan: queue gtea — enroll gitea as a fully-tested recipe + verify LFS PR #1
Operator-requested 2026-06-15. gitea is currently only a dep provider for
drone; this phase builds the full recipe-under-test suite
(install/upgrade/backup/restore/custom + lint + screenshot), ports the upstream
parity corpus
(health_check, git_push), and verifies recipe-maintainers/gitea PR #1
('feat: support Git LFS on plain gitea', branch lfs-plain-gitea) via an LFS
round-trip + JWT-stability capstone test — red/skipped on main, green on the
PR. Central constraint: do NOT break drone's gitea-dep path (shared
recipe_meta.py). builder+adversary on sonnet.

The agents.toml diff also records the already-live aoeng/aotest/porepo/poe2e
queue head (agent-orchestrator workstream; their plan files remain owned by
that effort).
2026-06-15 19:32:14 +00:00
2f2225e466 recipe-upgrade: defer version bump to operator's final 'abra recipe release'
§2 no longer bumps the coop-cloud version label in the recipe PR (no
--dry-run compute, no tag). It records the recommended 'abra recipe release
<recipe> -<x|y|z>' (no --dry-run) in the PR body (§3) as the operator's
final publish step — run after the upstream PR merges, it bumps the label +
tags + pushes in one go. Bumps recipe-maintainer submodule to 9daddac (same
change across its /recipe-upstream, -upgrade-apply, -upgrade-plan, -new-tag).
2026-06-15 18:38:27 +00:00
6acad7b35b recipe-upgrade: abra recipe release for version bump + upstream release-notes links in PR body
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
2026-06-15 18:09:41 +00:00
6a2464469f upgrade-all: skip 'external' recipes (uptime-kuma) + add used-recipes.md inventory
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
2026-06-15 17:00:28 +00:00
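The tier filter from 6a2464469f, sketched under the assumption that used-recipes.md has already been parsed into a name-to-tier mapping. The function name and the mapping shape are hypothetical; the file's on-disk layout is not shown in this log.

```python
from typing import Dict, Iterable, List


def upgrade_candidates(inventory: Dict[str, str], explicit: Iterable[str] = ()) -> List[str]:
    """Return recipes to upgrade: explicit args win, else every non-external recipe."""
    explicit = list(explicit)
    if explicit:
        # Explicitly named recipes override the tier exclusion.
        return explicit
    return sorted(name for name, tier in inventory.items() if tier != "external")
```

With an inventory of {"uptime-kuma": "external", "gitea": "weekly"}, the default call returns only gitea, while passing explicit=["uptime-kuma"] still selects it.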
489f6670da journal: pxgate cold-boot proof passed (real reboot, deploy-proxy active before dashboard) 2026-06-13 13:52:56 +00:00
6005a212d6 memory+journal: cc-ci host rebuild procedure; pxgate M2 deployed + verified on live host 2026-06-13 13:46:19 +00:00
1aee81b4f3 plan: queue pxgate — fix deploy-proxy/dashboard health-gate circular dependency (D8)
Re-target the traefik health gate off ci.commoninternet.net (the dashboard,
which is After=deploy-proxy) onto a traefik-self endpoint, breaking the
fresh-boot deadlock while keeping health-gated rollback. M1 controlled repro by
the loops; M2 from-scratch cold-boot proof owned by the orchestrator.
2026-06-13 12:38:40 +00:00
97303abc25 watchdog: suppress scheduled wakes once the build sequence is complete
The unified agents.py watchdog kept firing the hourly orchestrator supervision
ping even after SEQUENCE-COMPLETE (the old launch.py watchdog exited on
completion, which stopped them). Gate the wake loop on the SEQUENCE-COMPLETE
marker so a finished build stays fully at rest — no pings. Resumes
automatically when new work is queued (that clears the marker, line 631).
2026-06-13 12:04:49 +00:00
84e13a7f23 fix(pvcheck/A2): update upgrade-all SKILL.md guard description
The durable /16 proxy fix landed in phase pvfix (2026-06-13).
Update the guard description from "safety net until that lands"
to "belt-and-suspenders even after the /16 fix" — guard logic
unchanged, description now accurate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 05:58:27 +00:00
1e9337ce89 agents.toml: re-add cf48 (opus cfold review) dropped during the launch-system migration
The unification transcribed phases from .phases-spec before cf48 was added,
so the operator's just-requested opus 4.8 cfold review got dropped. Re-append
it after ghost (system is past cf55/on pvfix, so can't insert before pvfix
without shifting the live phase index). agents.py re-reads config each tick.
2026-06-13 05:32:47 +00:00
b4a6aaea7e plan: queue cf48 — Opus 4.8 post-cfold coverage-loss review (cross-check of cf55 GPT-5.5)
Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
2026-06-13 05:15:05 +00:00
2c64cd69f0 fix(watchdog): detect idle opencode turns 2026-06-12 21:47:06 +00:00
85498931d1 plan: add gpt55 cfold review phase 2026-06-12 16:07:48 +00:00
dea6359bcd plan: queue proxy and ghost follow-up phases 2026-06-12 15:56:03 +00:00
a186f23b37 orchestrator: restore opencode web launcher 2026-06-12 15:45:09 +00:00
03343ed3cf plan(ghost-pr): fold in upgrader's diagnosis — mysql 8.0->8.4 data-dir upgrade race (update_config.monitor too tight); PR#4 open 2026-06-12 13:45:52 +00:00
5141335fb3 upstream(mattermost-lts): update ESR notes — 10.11 ESR (Aug 2026), 10.12 expired, 11.7 next ESR; backup format notes 2026-06-12 04:14:04 +00:00
e89884beba upstream(n8n): update standing notes for 2.26.3 upgrade 2026-06-12 04:12:32 +00:00
c7fae6cbee upstream(matrix-synapse): update notes for 7.3.0+v1.154.0 PR#2 2026-06-12 04:00:59 +00:00
4f31abc0c7 upstream(mailu): update Redis standing note — operator approved 8.8.x jump 2026-06-12 03:48:52 +00:00
d3a9455eb3 upstream(lasuite-meet): document livekit v1.13.1 TURN-auth note + redis 8.8.0 2026-06-12 03:43:29 +00:00
ca02a0dd6f upgrade-all: proxy VIP-exhaustion guard in Step 0; runbooks for proxy /16 enlarge + ghost PR debug
Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges:
the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks
endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety
net to Step 0 (network prune + docker restart when VIP-allocation failures are
logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix,
maintenance window) and for debugging/fixing the ghost PR afterward.
2026-06-12 03:30:00 +00:00
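The "/24=254 VIPs" figure in ca02a0dd6f, and the headroom a /16 would give, can be checked with a few lines. The 10.0.0.0 prefix below is a placeholder, not the proxy network's real subnet, and the count is only the usual "all addresses minus network and broadcast" upper bound.

```python
import ipaddress


def usable_addresses(cidr: str) -> int:
    """Addresses in the subnet, excluding the network and broadcast addresses."""
    return ipaddress.ip_network(cidr).num_addresses - 2


print(usable_addresses("10.0.0.0/24"))  # 254
print(usable_addresses("10.0.0.0/16"))  # 65534
```

The real ceiling is somewhat lower, since addresses in the range are also consumed by things other than service VIPs, but the two orders of magnitude between /24 and /16 are the point.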
7ce898e0e4 upstream(immich): document concurrent app+db restart update_config fix 2026-06-12 03:26:37 +00:00
28b9431035 upstream(immich): note pgvectors 0.3.0 bump in PR #2 + new digest (2026-06-12) 2026-06-12 02:50:45 +00:00
2c5e08f78c upgrade-all: simplify to a rolling pool, alphabetical (drop waves + heavy/light)
Per operator: just work through recipes alphabetically keeping CAP (=
DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment
one finishes (rolling pool via run_in_background). Removes the wave-barrier and
the heavy/light classification entirely — simpler and no slot ever idles.
2026-06-12 01:58:22 +00:00
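The rolling pool in 2c5e08f78c is driven by the skill through background subagents; as an analogy only, the same scheduling shape in Python is a fixed-size thread pool fed in alphabetical order. run_rolling and the worker callable are hypothetical names, and cap=2 mirrors the DRONE_RUNNER_CAPACITY value quoted above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def run_rolling(recipes: Iterable[str], worker: Callable[[str], str], cap: int = 2) -> Dict[str, str]:
    """Process recipes alphabetically with at most `cap` running at once."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=cap) as pool:
        # Submission order is alphabetical; the pool starts the next queued
        # recipe as soon as any slot frees up, so there is no wave barrier.
        futures = {pool.submit(worker, name): name for name in sorted(recipes)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

A call such as run_rolling(["n8n", "ghost", "drone"], upgrade_one) would start drone and ghost first and pick up n8n when either finishes, where upgrade_one stands in for whatever performs a single recipe upgrade.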
894d829313 upgrade-all: at the tail, fill slots with two heavies rather than serialize
Per operator: always fill all CAP slots. Heavy/light alternation only spreads
heavies across waves while a light is available; once only heavies remain, run
two-per-wave (capacity is the tuned ceiling) instead of one-per-wave.
2026-06-12 01:55:29 +00:00
f744c79e2d upgrade-all: alternate heavy/light per wave (not heaviest-first)
Host memory is the binding limit, so never schedule two HEAVY recipes in the
same capacity wave — pair each heavy (discourse/immich/matrix-synapse/
lasuite-drive/mattermost-lts/ghost) with a light one to bound peak memory while
keeping both slots busy. Heaviest-first could co-schedule two heavies and OOM/
wedge the box (the disc-50cc8a 'New'-state wedge). For CAP>2 cap heavies at
~CAP/2; if only heavies remain, run one-per-wave.
2026-06-12 01:47:22 +00:00
a45517b432 upgrade-all: default to concurrency-bounded (DRONE_RUNNER_CAPACITY) subagents
Now that the 2026-06-10 concurrency restructure makes concurrent recipe runs
safe (per-run trees, app-domain locks, isolation), default /upgrade-all to run
up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe
subagents at a time instead of strictly sequential — using all available
concurrency without oversubscribing. Query the live capacity from
'systemctl show drone-runner-exec' (fallback 2); process recipes in waves of
CAP (emit CAP Agent calls per message, await, next wave). Flags: --capacity N,
--sequential (CAP=1, old default — use when the build loops share the box),
--parallel (unbounded). Applies to the NEXT run; the in-flight run is unaffected.
2026-06-12 01:39:44 +00:00