Trust abra fully for any image whose tag it can read — a normal semver/calver
tag with no newer version is genuinely up-to-date. Only cross-check upstream for
images abra physically can't parse (tag+digest pins, digest-only pins), which is
the actual immich blind spot. Avoids redundant upstream checks on every recipe.
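A minimal sketch of the rule above, assuming image references follow the usual name[:tag][@digest] shape. The function names and the example references are hypothetical and are not taken from the skill or from abra itself.

```python
def split_image_ref(ref: str):
    """Split 'name[:tag][@digest]' into (name, tag, digest); missing parts are None."""
    rest, _, digest = ref.partition("@")
    # A tag colon must come after the last '/', otherwise it is a registry port.
    last_slash = rest.rfind("/")
    colon = rest.rfind(":")
    if colon > last_slash:
        name, tag = rest[:colon], rest[colon + 1:]
    else:
        name, tag = rest, ""
    return name, (tag or None), (digest or None)


def needs_upstream_check(ref: str) -> bool:
    """True for tag+digest and digest-only pins (the forms abra cannot parse).

    Plain tagged images return False: abra's verdict is trusted for those.
    """
    _name, _tag, digest = split_image_ref(ref)
    return digest is not None
```

With this split, a plain `nginx:1.25.3` is left to abra, while anything carrying an `@sha256:...` suffix is routed to the direct upstream tag check.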
abra recipe upgrade is the first approach, but it silently contributes no
candidate for tag+digest pins (FATA: tag and digest not supported), digest-only
pins, and non-semver tags. immich kept getting skipped this way. Before
concluding SKIPPED — up-to-date, do a direct upstream tag check for every image
abra could not cleanly evaluate; only skip when BOTH agree nothing is newer.
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
Operator 2026-06-17: (1) no-canonical upgrade base should prefer the most recent
release TAG < head (a real published predecessor, reusing samever's helper),
with raw main-tip only as a last resort, then skip — always-on, improves this
server too. (2) SKIP_CANONICALS_FOR_UPGRADE=true feeds that same release-tag-first
fallback (so it swaps canonical->latest-release base without losing upgrade
coverage). (3) model bumped sonnet->opus.
Operator 2026-06-17. Introduce a minimal, extensible server-level settings.toml
for the cc-ci server; first value SKIP_CANONICALS_FOR_UPGRADE (bool, default
false, false on this server). When true, resolve_upgrade_base bypasses the
canonical and uses the main-tip predecessor — codifying that canonicals are an
optional optimization for the upgrade base. Scoped to the upgrade base only
(promote/--quick separate). Runs after canon/dash (builds on the final resolver).
Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins
3.0.1+v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a
future re-enable). Once canon establishes plausible's canonical at 3.0.1, the
dynamic base resolves correctly without the pin -> strip the key
(meta/resolver/docs/tests) + migrate plausible + update bluesky-pds note.
GATED: keep it if plausible genuinely still needs the escape-hatch (never drop
upgrade coverage).
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
Operator 2026-06-17. (1) should_promote_canonical also requires the tested
version to have a release tag -> canonical is always a real published release,
never an untagged main commit. (2) sweep trigger is version/tag-keyed not
commit-keyed: skip a recipe unless its latest release tag is newer than its
canonical (skip new-untagged-commits-without-a-version). This makes the sweep
and samever ORTHOGONAL (samever never fires in the sweep; it's the PR-path
same-version guard). Updated gates/DoD accordingly.
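The two rules can be sketched as below. Both functions are simplified stand-ins: the real should_promote_canonical has further conditions, should_sweep is a hypothetical name, and running the sweep when no canonical exists yet is an assumption not stated above.

```python
def version_key(label: str):
    """Sort key for labels like '3.0.1+v2.0.0': compares the part before '+'."""
    core = label.split("+", 1)[0].lstrip("v")
    return tuple(int(part) for part in core.split("."))


def should_sweep(latest_release_tag, canonical_version) -> bool:
    """Version/tag-keyed trigger: run only when a newer release tag exists."""
    if latest_release_tag is None:
        return False  # untagged commits without a version never trigger
    if canonical_version is None:
        return True   # assumption: nothing promoted yet, so run once
    return version_key(latest_release_tag) > version_key(canonical_version)


def should_promote_canonical(run_green: bool, tested_version, release_tags) -> bool:
    """Promote only a green run whose tested version is a published release tag."""
    return run_green and tested_version in release_tags
```

Because the trigger only fires on a strictly newer tag, the swept head is always newer than the canonical, which is why the same-version guard never applies inside the sweep.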
Operator 2026-06-17: all recipes WARM_CANONICAL (watch warm-volume disk),
weekly timer, and an explicit M2 requirement to prove the sweep<->samever
interaction: skip uses COMMIT equality, samever uses VERSION equality. M2
must demonstrate all 3 sweep paths (skip / version-bump upgrade /
same-version-label step-back) and the commit-vs-version boundary.
Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
Operator 2026-06-17. Closes the prevb resolver gap: when the last-green
warm-canonical base version equals the PR head version, step back to the
newest published version strictly older than head (design A) instead of a
same-version no-op or a skip. Design B (canonical history for a green-verified
older base) saved to IDEAS. Auto-runs after regall (watchdog advances + switches
to opus).
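Design A reduces to a small guard in front of the existing base. The sketch below is hypothetical (step_back_base and version_key are illustrative names) and assumes dotted numeric version labels with an optional +suffix.

```python
def version_key(label: str):
    """Sort key for labels like '3.0.1+v2.0.0': compares the part before '+'."""
    core = label.split("+", 1)[0].lstrip("v")
    return tuple(int(part) for part in core.split("."))


def step_back_base(head: str, canonical: str, published: list):
    """Keep the canonical base unless it equals head; then step back.

    Returns the newest published version strictly older than head,
    or None when no such version exists.
    """
    if canonical != head:
        return canonical
    older = [v for v in published if version_key(v) < version_key(head)]
    return max(older, key=version_key) if older else None
```

The strict comparison is the point: a base equal to head would make the upgrade test a no-op, so equality is never accepted.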
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
Operator-requested 2026-06-15. gitea is currently only a dep provider for
drone; this phase builds the full recipe-under-test suite (install/upgrade/
backup/restore/custom + lint + screenshot), ports the upstream parity corpus
(health_check, git_push), and verifies recipe-maintainers/gitea PR #1
('feat: support Git LFS on plain gitea', branch lfs-plain-gitea) via an LFS
round-trip + JWT-stability capstone test — red/skipped on main, green on the
PR. Central constraint: do NOT break drone's gitea-dep path (shared
recipe_meta.py). builder+adversary on sonnet.
The agents.toml diff also records the already-live aoeng/aotest/porepo/poe2e
queue head (agent-orchestrator workstream; their plan files remain owned by
that effort).
§2 no longer bumps the coop-cloud version label in the recipe PR (no
--dry-run compute, no tag). It records the recommended 'abra recipe release
<recipe> -<x|y|z>' (no --dry-run) in the PR body (§3) as the operator's
final publish step — run after the upstream PR merges, it bumps the label +
tags + pushes in one go. Bumps recipe-maintainer submodule to 9daddac (same
change across its /recipe-upstream, -upgrade-apply, -upgrade-plan, -new-tag).
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
Re-target the traefik health gate off ci.commoninternet.net (the dashboard,
which is After=deploy-proxy) onto a traefik-self endpoint, breaking the
fresh-boot deadlock while keeping health-gated rollback. M1 controlled repro by
the loops; M2 from-scratch cold-boot proof owned by the orchestrator.
The unified agents.py watchdog kept firing the hourly orchestrator supervision
ping even after SEQUENCE-COMPLETE (the old launch.py watchdog exited on
completion, which stopped the pings). Gate the wake loop on the
SEQUENCE-COMPLETE marker so a finished build stays fully at rest, with no
pings. It resumes automatically when new work is queued (that clears the
marker, line 631).
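The gate can be pictured as below, assuming the marker is a file on disk. The function names and the marker path are hypothetical and do not mirror the actual agents.py code.

```python
from pathlib import Path


def should_send_ping(marker: Path) -> bool:
    """The wake loop pings only while the SEQUENCE-COMPLETE marker is absent."""
    return not marker.exists()


def queue_work(marker: Path) -> None:
    """Queuing new work clears the marker, which re-enables the pings."""
    marker.unlink(missing_ok=True)
```

Checking the marker on every tick, rather than exiting the watchdog, lets the same process go quiet and wake up again without a restart.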
The durable /16 proxy fix landed in phase pvfix (2026-06-13).
Update the guard description from "safety net until that lands"
to "belt-and-suspenders even after the /16 fix" — guard logic
unchanged, description now accurate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The unification transcribed phases from .phases-spec before cf48 was added,
so the operator's just-requested opus 4.8 cfold review got dropped. Re-append
it after ghost (the system is past cf55 and now on pvfix, so it can't be
inserted before pvfix without shifting the live phase index). agents.py
re-reads config each tick.
Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges:
the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks
endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety
net to Step 0 (network prune + docker restart when VIP-allocation failures are
logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix,
maintenance window) and for debugging/fixing the ghost PR afterward.
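The address arithmetic behind the /24 versus /16 figures, as a quick check. The subnet values are placeholders, not the real proxy network range, and the count subtracts only the network and broadcast addresses.

```python
import ipaddress


def usable_addresses(cidr: str) -> int:
    """Addresses in the subnet minus the network and broadcast addresses."""
    return ipaddress.ip_network(cidr).num_addresses - 2


print(usable_addresses("10.0.0.0/24"))  # 254
print(usable_addresses("10.0.0.0/16"))  # 65534
```

At 254 slots, a slow leak of endpoints over many days is enough to exhaust the pool; the /16 raises the ceiling by a factor of roughly 258.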
Per operator: just work through recipes alphabetically keeping CAP (=
DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment
one finishes (rolling pool via run_in_background). Removes the wave-barrier and
the heavy/light classification entirely — simpler and no slot ever idles.
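The rolling pool can be illustrated with a thread pool standing in for the background subagents. This is a sketch under that substitution: run_one is a hypothetical callable that processes one recipe, and the real skill starts agents with run_in_background rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_rolling_pool(recipes, run_one, cap=2):
    """Process recipes alphabetically with at most `cap` running at once.

    The executor hands the next queued recipe to a worker the moment one
    finishes, so there is no wave barrier and no slot idles while work remains.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=cap) as pool:
        futures = {pool.submit(run_one, name): name for name in sorted(recipes)}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Compared with fixed waves, a slow recipe only occupies its own slot; the other slot keeps draining the queue instead of waiting for the wave to end.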
Per operator: always fill all CAP slots. Heavy/light alternation only spreads
heavies across waves while a light is available; once only heavies remain, run
two-per-wave (capacity is the tuned ceiling) instead of one-per-wave.
Host memory is the binding limit, so never schedule two HEAVY recipes in the
same capacity wave — pair each heavy (discourse/immich/matrix-synapse/
lasuite-drive/mattermost-lts/ghost) with a light one to bound peak memory while
keeping both slots busy. Heaviest-first could co-schedule two heavies and OOM/
wedge the box (the disc-50cc8a 'New'-state wedge). For CAP>2 cap heavies at
~CAP/2; if only heavies remain, run one-per-wave.
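The pairing rule described above can be sketched as a wave planner. plan_waves is a hypothetical helper, not code from the skill; the heavy set is the list named above, and the ~CAP/2 cap is implemented as integer division with a floor of one.

```python
HEAVY = {"discourse", "immich", "matrix-synapse", "lasuite-drive",
         "mattermost-lts", "ghost"}


def plan_waves(recipes, heavy=HEAVY, cap=2):
    """Group recipes into waves of at most `cap`, bounding heavies per wave."""
    heavies = [r for r in recipes if r in heavy]
    lights = [r for r in recipes if r not in heavy]
    max_heavy = max(1, cap // 2)  # ~CAP/2 heavies per wave
    waves = []
    while heavies or lights:
        if lights:
            wave = heavies[:max_heavy]
            heavies = heavies[max_heavy:]
            room = cap - len(wave)
            wave += lights[:room]  # fill the remaining slots with light recipes
            lights = lights[room:]
        else:
            wave = [heavies.pop(0)]  # only heavies remain: one per wave
        waves.append(wave)
    return waves
```

For example, with CAP=2 the input custom-html, discourse, gitea, immich, plausible yields the waves [discourse, custom-html], [immich, gitea], [plausible].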
Now that the 2026-06-10 concurrency restructure makes concurrent recipe runs
safe (per-run trees, app-domain locks, isolation), default /upgrade-all to run
up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe
subagents at a time instead of strictly sequential — using all available
concurrency without oversubscribing. Query the live capacity from
'systemctl show drone-runner-exec' (fallback 2); process recipes in waves of
CAP (emit CAP Agent calls per message, await, next wave). Flags: --capacity N,
--sequential (CAP=1, old default — use when the build loops share the box),
--parallel (unbounded). Applies to the NEXT run; the in-flight run is unaffected.
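Reading the live capacity with the fallback of 2 might look like this. The sketch parses text that has already been captured from the systemctl call; it assumes DRONE_RUNNER_CAPACITY=N appears somewhere in that output (for instance inside the unit's Environment= property), which is not confirmed above. parse_capacity is a hypothetical name.

```python
import re


def parse_capacity(show_output: str, fallback: int = 2) -> int:
    """Extract DRONE_RUNNER_CAPACITY from captured 'systemctl show' text.

    Returns `fallback` when the variable is missing or not a positive integer.
    """
    match = re.search(r"\bDRONE_RUNNER_CAPACITY=(\d+)", show_output)
    if match and int(match.group(1)) > 0:
        return int(match.group(1))
    return fallback
```

Keeping the parse separate from the subprocess call makes the fallback path easy to exercise: an empty string, as produced by a failed query, resolves to 2.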