abra recipe upgrade is tried first, but it silently yields no candidate for
tag+digest pins (FATA: tag and digest not supported), digest-only pins, and
non-semver tags, which is why immich kept getting skipped. Before
concluding SKIPPED — up-to-date, do a direct upstream tag check for every image
abra could not cleanly evaluate; only skip when BOTH agree nothing is newer.
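
A minimal sketch of the both-must-agree rule above. The function names, the
semver regex, and the upstream_has_newer callable are illustrative assumptions,
not the skill's actual code.

```python
import re

# Assumption: "cleanly evaluable" means a plain x.y.z tag (optional leading v).
SEMVER_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")

def abra_can_evaluate(image_ref):
    """False for tag+digest pins, digest-only pins, and non-semver tags."""
    name, _, digest = image_ref.partition("@")
    if digest:
        return False
    # Only look after the last '/', so a registry port is not mistaken for a tag.
    tag = name.rsplit("/", 1)[-1].partition(":")[2]
    return bool(SEMVER_TAG.match(tag))

def may_skip(image_refs, abra_has_candidate, upstream_has_newer):
    """Skip only when abra AND the direct upstream check both find nothing newer.

    upstream_has_newer is a hypothetical callable(image_ref) -> bool that
    queries the registry; it is only consulted for images abra cannot evaluate.
    """
    if abra_has_candidate:
        return False
    unevaluated = [ref for ref in image_refs if not abra_can_evaluate(ref)]
    return not any(upstream_has_newer(ref) for ref in unevaluated)
```

A recipe whose only image is a digest pin is therefore skipped only if the
direct upstream check also reports nothing newer.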
Operator 2026-06-17. The recipe-test runtime env is declared in 2-3 places today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
Operator 2026-06-17: (1) no-canonical upgrade base should prefer the most recent
release TAG < head (a real published predecessor, reusing samever's helper),
with raw main-tip only as a last resort, then skip — always-on, improves this
server too. (2) SKIP_CANONICALS_FOR_UPGRADE=true feeds that same release-tag-first
fallback (so it swaps canonical->latest-release base without losing upgrade
coverage). (3) model bumped sonnet->opus.
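
The release-tag-first fallback in (1) can be sketched as below. This is an
assumption-laden illustration: pick_upgrade_base and version_key are
hypothetical names, and labels are assumed to look like 3.0.1+v2.0.0 with a
numeric x.y.z part before the '+'.

```python
def version_key(label):
    """Sort key from the part before '+', e.g. '3.0.1+v2.0.0' -> (3, 0, 1)."""
    return tuple(int(part) for part in label.split("+", 1)[0].split("."))

def pick_upgrade_base(release_tags, head, main_tip=None):
    """Return (kind, base) for the no-canonical case.

    Preference order: newest release tag strictly older than head,
    then the raw main tip as a last resort, then skip.
    """
    older = [tag for tag in release_tags if version_key(tag) < version_key(head)]
    if older:
        return ("release-tag", max(older, key=version_key))
    if main_tip is not None:
        return ("main-tip", main_tip)
    return ("skip", None)
```

With SKIP_CANONICALS_FOR_UPGRADE=true the same function would simply be called
without consulting the canonical first.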
Operator 2026-06-17. Introduce a minimal, extensible server-level settings.toml
for the cc-ci server; first value SKIP_CANONICALS_FOR_UPGRADE (bool, default
false, false on this server). When true, resolve_upgrade_base bypasses the
canonical and uses the main-tip predecessor — codifying that canonicals are an
optional optimization for the upgrade base. Scoped to the upgrade base only
(promote/--quick separate). Runs after canon/dash (builds on the final resolver).
Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins
3.0.1+v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a
future re-enable). Once canon establishes plausible's canonical at 3.0.1, the
dynamic
base resolves correctly without the pin -> strip the key (meta/resolver/docs/
tests) + migrate plausible + update bluesky-pds note. GATED: keep it if
plausible genuinely still needs the escape-hatch (never drop upgrade coverage).
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
Operator 2026-06-17. (1) should_promote_canonical also requires the tested
version to have a release tag -> canonical is always a real published release,
never an untagged main commit. (2) sweep trigger is version/tag-keyed not
commit-keyed: skip a recipe unless its latest release tag is newer than its
canonical (skip new-untagged-commits-without-a-version). This makes the sweep
and samever ORTHOGONAL (samever never fires in the sweep; it's the PR-path
same-version guard). Updated gates/DoD accordingly.
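
The two gates can be sketched as follows. should_sweep and should_promote are
hypothetical stand-ins (the real function is should_promote_canonical), and
treating a missing canonical as "sweep" is an assumption of this sketch.

```python
def version_key(label):
    """Sort key from the part before '+', e.g. '3.0.1+v2.0.0' -> (3, 0, 1)."""
    return tuple(int(part) for part in label.split("+", 1)[0].split("."))

def should_sweep(latest_release_tag, canonical_version):
    """Version-keyed trigger: run only when a newer release tag exists."""
    if latest_release_tag is None:
        return False  # untagged commits alone never trigger the sweep
    if canonical_version is None:
        return True   # assumption: no canonical yet means there is work to do
    return version_key(latest_release_tag) > version_key(canonical_version)

def should_promote(run_green, tested_version, release_tags):
    """Promote only a green run whose tested version is a published release."""
    return run_green and tested_version in release_tags
```

Because the trigger needs a strictly newer tag, base and head versions always
differ inside the sweep, which is why samever never fires there.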
Operator 2026-06-17: all recipes WARM_CANONICAL (watch warm-volume disk),
weekly timer, and an explicit M2 requirement to prove the sweep<->samever
interaction — skip uses COMMIT equality, samever uses VERSION equality; M2
must demonstrate all 3 sweep paths (skip / version-bump upgrade /
same-version-label step-back) and the commit-vs-version boundary.
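
The commit-vs-version boundary M2 has to prove can be illustrated like this.
Ref and sweep_path are hypothetical names; the sketch only encodes the two
equality checks described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ref:
    commit: str   # git commit of the recipe
    version: str  # version label carried by that commit

def sweep_path(canonical, latest):
    """Classify a sweep run into one of the three paths."""
    if canonical.commit == latest.commit:
        return "skip"                    # skip uses COMMIT equality
    if canonical.version == latest.version:
        return "same-version step-back"  # samever uses VERSION equality
    return "version-bump upgrade"
```

The middle branch is the boundary case: a new commit that did not change the
version label.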
Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
Operator 2026-06-17. Closes the prevb resolver gap: when the last-green
warm-canonical base version equals the PR head version, step back to the
newest published version strictly older than head (design A) instead of a
same-version no-op or a skip. Design B (canonical history for a green-verified
older base) saved to IDEAS. Auto-runs after regall (watchdog advances + switches
to opus).
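
Design A, sketched under assumptions: step_back_base is a hypothetical name,
labels are assumed to be x.y.z+upstream, and returning None when nothing older
is published is this sketch's choice, not a documented behaviour.

```python
def version_key(label):
    """Sort key from the part before '+', e.g. '3.0.1+v2.0.0' -> (3, 0, 1)."""
    return tuple(int(part) for part in label.split("+", 1)[0].split("."))

def step_back_base(canonical, head, published):
    """Pick the upgrade base, stepping back when canonical equals head."""
    if version_key(canonical) != version_key(head):
        return canonical  # normal case: last-green canonical is a real predecessor
    # Same version: use the newest published version strictly older than head.
    older = [v for v in published if version_key(v) < version_key(head)]
    return max(older, key=version_key) if older else None
```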
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
Operator-requested 2026-06-15. gitea is currently only a dep provider for
drone; this phase builds the full recipe-under-test suite (install/upgrade/
backup/restore/custom + lint + screenshot), ports the upstream parity corpus
(health_check, git_push), and verifies recipe-maintainers/gitea PR #1
('feat: support Git LFS on plain gitea', branch lfs-plain-gitea) via an LFS
round-trip + JWT-stability capstone test — red/skipped on main, green on the
PR. Central constraint: do NOT break drone's gitea-dep path (shared
recipe_meta.py). builder+adversary on sonnet.
The agents.toml diff also records the already-live aoeng/aotest/porepo/poe2e
queue head (agent-orchestrator workstream; their plan files remain owned by
that effort).
§2 no longer bumps the coop-cloud version label in the recipe PR (no
--dry-run compute, no tag). It records the recommended 'abra recipe release
<recipe> -<x|y|z>' (no --dry-run) in the PR body (§3) as the operator's
final publish step — run after the upstream PR merges, it bumps the label +
tags + pushes in one go. Bumps recipe-maintainer submodule to 9daddac (same
change across its /recipe-upstream, -upgrade-apply, -upgrade-plan, -new-tag).
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
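
A sketch of the §1 candidate filter. The inventory is modelled as a plain
name -> tier mapping (how used-recipes.md is parsed is left out), and
upgrade_candidates is a hypothetical name.

```python
def upgrade_candidates(inventory, explicit=()):
    """Return the recipes the weekly upgrade should consider.

    inventory maps recipe name -> tier ("weekly" or "external").
    Explicit arguments override the tier filter entirely.
    """
    if explicit:
        return list(explicit)
    # sorted() is only for a deterministic order in this sketch.
    return sorted(name for name, tier in inventory.items() if tier != "external")
```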
Re-target the traefik health gate off ci.commoninternet.net (the dashboard,
which is After=deploy-proxy) onto a traefik-self endpoint, breaking the
fresh-boot deadlock while keeping health-gated rollback. M1 controlled repro by
the loops; M2 from-scratch cold-boot proof owned by the orchestrator.
The unified agents.py watchdog kept firing the hourly orchestrator supervision
ping even after SEQUENCE-COMPLETE (the old launch.py watchdog exited on
completion, which stopped them). Gate the wake loop on the SEQUENCE-COMPLETE
marker so a finished build stays fully at rest — no pings. Resumes
automatically when new work is queued (that clears the marker, line 631).
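
The gate amounts to one extra check before each ping. A minimal sketch,
assuming the marker is a file on disk; should_ping and its parameters are
hypothetical and not the actual agents.py code.

```python
from pathlib import Path

def should_ping(marker, seconds_since_last_ping, interval=3600.0):
    """Decide whether the watchdog should wake the orchestrator.

    marker is the Path of the SEQUENCE-COMPLETE marker; while it exists the
    build is finished and the watchdog stays silent.
    """
    if Path(marker).exists():
        return False
    return seconds_since_last_ping >= interval
```

Queuing new work removes the marker, so the next tick that is at least an
interval past the last ping resumes supervision without any extra state.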
The durable /16 proxy fix landed in phase pvfix (2026-06-13).
Update the guard description from "safety net until that lands"
to "belt-and-suspenders even after the /16 fix" — guard logic
unchanged, description now accurate.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The unification transcribed phases from .phases-spec before cf48 was added,
so the operator's just-requested opus 4.8 cfold review got dropped. Re-append
it after ghost (the system is already past cf55 and on pvfix, so inserting it
before pvfix would shift the live phase index). agents.py re-reads its config
each tick.
Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
Root-caused (empirically, dockerd logs) the discourse/ghost deploy wedges:
the shared proxy overlay (/24=254 VIPs) exhausts as concurrent stack rm leaks
endpoints over many days -> tasks stuck in Swarm 'New'. Add a per-run safety
net to Step 0 (network prune + docker restart when VIP-allocation failures are
logged). Plans + memory for the durable fix (enlarge proxy to /16 in swarm.nix,
maintenance window) and for debugging/fixing the ghost PR afterward.
Per operator: just work through recipes alphabetically keeping CAP (=
DRONE_RUNNER_CAPACITY=2) subagents running at once, starting the next the moment
one finishes (rolling pool via run_in_background). Removes the wave-barrier and
the heavy/light classification entirely — simpler and no slot ever idles.
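
The rolling pool can be illustrated with a thread pool standing in for the
background subagents. run_rolling and run_one are hypothetical names; the skill
itself uses run_in_background Agent calls, not threads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_rolling(recipes, run_one, cap=2):
    """Run recipes alphabetically with at most `cap` in flight.

    Work is queued in sorted order; whenever one run finishes, the freed
    worker immediately picks up the next recipe, so there is no wave barrier.
    """
    with ThreadPoolExecutor(max_workers=cap) as pool:
        futures = {name: pool.submit(run_one, name) for name in sorted(recipes)}
        return {name: future.result() for name, future in futures.items()}
```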
Per operator: always fill all CAP slots. Heavy/light alternation only spreads
heavies across waves while a light is available; once only heavies remain, run
two-per-wave (capacity is the tuned ceiling) instead of one-per-wave.
Host memory is the binding limit, so never schedule two HEAVY recipes in the
same capacity wave — pair each heavy (discourse/immich/matrix-synapse/
lasuite-drive/mattermost-lts/ghost) with a light one to bound peak memory while
keeping both slots busy. Heaviest-first could co-schedule two heavies and OOM/
wedge the box (the disc-50cc8a 'New'-state wedge). For CAP>2 cap heavies at
~CAP/2; if only heavies remain, run one-per-wave.
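
One way to plan such waves, as a sketch. plan_waves is a hypothetical name, and
reading "~CAP/2" as integer division with a minimum of one heavy per wave is an
assumption.

```python
def plan_waves(recipes, heavy, cap=2):
    """Group recipes into waves that bound the number of heavies per wave."""
    heavies = [r for r in recipes if r in heavy]
    lights = [r for r in recipes if r not in heavy]
    max_heavy = max(1, cap // 2)
    waves = []
    while heavies or lights:
        if lights:
            # Take the allowed heavies, then fill the remaining slots with lights.
            wave, heavies = heavies[:max_heavy], heavies[max_heavy:]
            room = cap - len(wave)
            wave, lights = wave + lights[:room], lights[room:]
        else:
            # Only heavies remain: run them one per wave.
            wave = [heavies.pop(0)]
        waves.append(wave)
    return waves
```

With CAP=2 this yields heavy+light pairs until the lights run out, then single
heavies.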
Now that the 2026-06-10 concurrency restructure makes concurrent recipe runs
safe (per-run trees, app-domain locks, isolation), default /upgrade-all to run
up to DRONE_RUNNER_CAPACITY (the drone runner's slots, currently 2) recipe
subagents at a time instead of strictly sequential — using all available
concurrency without oversubscribing. Query the live capacity from
'systemctl show drone-runner-exec' (fallback 2); process recipes in waves of
CAP (emit CAP Agent calls per message, await, next wave). Flags: --capacity N,
--sequential (CAP=1, old default — use when the build loops share the box),
--parallel (unbounded). Applies to the NEXT run; the in-flight run is unaffected.
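
A sketch of the capacity lookup and wave split. It assumes the variable shows
up in the Environment= property of the 'systemctl show' output (it would not if
it came from an EnvironmentFile); parse_capacity, effective_cap and waves are
hypothetical names, and the flag precedence shown is a guess.

```python
import re

def parse_capacity(systemctl_show_output, fallback=2):
    """Extract DRONE_RUNNER_CAPACITY from `systemctl show` text, else fallback."""
    match = re.search(r"\bDRONE_RUNNER_CAPACITY=(\d+)", systemctl_show_output)
    if match and int(match.group(1)) > 0:
        return int(match.group(1))
    return fallback

def effective_cap(live, n_recipes, capacity=None, sequential=False, parallel=False):
    """Apply the flags: --sequential -> 1, --parallel -> all, --capacity N -> N."""
    if sequential:
        return 1
    if parallel:
        return max(1, n_recipes)
    return capacity if capacity else live

def waves(recipes, cap):
    """Split the recipe list into consecutive waves of at most `cap`."""
    return [recipes[i:i + cap] for i in range(0, len(recipes), cap)]
```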