The opencode-go subscription's rolling usage-limit (429) ends the 'opencode run'
agent loop mid-run; it does NOT self-resume. Add:
- resume: continue the SAME session (context preserved) via 'opencode run -s <id>
--continue' — finds the session from the web server, kills the idle proc safely
(via /proc scan, never pkill -f self-match), relaunches in the tmux session.
- babysit: poll the session log; on a stall (>15min idle) wait out any 429
retry-after then auto-resume. Spawned automatically by an opencode 'start'.
So a usage-limit pause now self-heals instead of needing a manual nudge.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Report launcher now defaults backend=opencode, model=opencode-go/glm-5.2 (claude
override → opus). Replaced the broken opencode 'attach + send-keys' path with the
same 'opencode run -m … --share --attach --title' pattern as the upgrader (the
old path passed no model and injected the prompt via keystrokes).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Weekly upgrade run now defaults backend=opencode, model=opencode-go/glm-5.2 with
no env set. Model default tracks backend (claude override → sonnet). Override via
LOOP_BACKEND/LOOP_MODEL or /srv/cc-ci/upgrader.env.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cc-ci-upgrade-all now reads an optional EnvironmentFile so the weekly run can
switch backend/model (e.g. LOOP_BACKEND=opencode LOOP_MODEL=opencode-go/glm-5.2)
without a rebuild. Absent file → claude/sonnet (unchanged). Built+switched on
cc-ci-orchestrator-hetzner, host verified healthy.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The opencode backend emitted 'opencode --model X run ...' but -m/--model is a
flag on the run subcommand, so the model was being ignored. Move it after run.
Add OPENCODE_SHARE (default on): attach the session to the shared opencode web
server (oc.commoninternet.net) AND create a public --share link for monitoring.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
Operator 2026-06-17: (1) no-canonical upgrade base should prefer the most recent
release TAG < head (a real published predecessor, reusing samever's helper),
with raw main-tip only as a last resort, then skip — always-on, improves this
server too. (2) SKIP_CANONICALS_FOR_UPGRADE=true feeds that same release-tag-first
fallback (so it swaps canonical->latest-release base without losing upgrade
coverage). (3) model bumped sonnet->opus.
Operator 2026-06-17. Introduce a minimal, extensible server-level settings.toml
for the cc-ci server; first value SKIP_CANONICALS_FOR_UPGRADE (bool, default
false, false on this server). When true, resolve_upgrade_base bypasses the
canonical and uses the main-tip predecessor — codifying that canonicals are an
optional optimization for the upgrade base. Scoped to the upgrade base only
(promote/--quick separate). Runs after canon/dash (builds on the final resolver).
Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins 3.0.1+
v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a future
re-enable). Once canon establishes plausible's canonical at 3.0.1, the dynamic
base resolves correctly without the pin -> strip the key (meta/resolver/docs/
tests) + migrate plausible + update bluesky-pds note. GATED: keep it if
plausible genuinely still needs the escape-hatch (never drop upgrade coverage).
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
Operator 2026-06-17. (1) should_promote_canonical also requires the tested
version to have a release tag -> canonical is always a real published release,
never an untagged main commit. (2) sweep trigger is version/tag-keyed not
commit-keyed: skip a recipe unless its latest release tag is newer than its
canonical (skip new-untagged-commits-without-a-version). This makes the sweep
and samever ORTHOGONAL (samever never fires in the sweep; it's the PR-path
same-version guard). Updated gates/DoD accordingly.
Operator 2026-06-17: all recipes WARM_CANONICAL (watch warm-volume disk),
weekly timer, and an explicit M2 requirement to prove the sweep<->samever
interaction — skip uses COMMIT equality, samever uses VERSION equality; M2
must demonstrate all 3 sweep paths (skip / version-bump upgrade / same-version-
label step-back) and the commit-vs-version boundary.
Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
Operator 2026-06-17. Closes the prevb resolver gap: when the last-green
warm-canonical base version equals the PR head version, step back to the
newest published version strictly older than head (design A) instead of a
same-version no-op or a skip. Design B (canonical history for a green-verified
older base) saved to IDEAS. Auto-runs after regall (watchdog advances + switches
to opus).
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
Operator-requested 2026-06-15. gitea is currently only a dep provider for
drone; this phase builds the full recipe-under-test suite (install/upgrade/
backup/restore/custom + lint + screenshot), ports the upstream parity corpus
(health_check, git_push), and verifies recipe-maintainers/gitea PR #1
('feat: support Git LFS on plain gitea', branch lfs-plain-gitea) via an LFS
round-trip + JWT-stability capstone test — red/skipped on main, green on the
PR. Central constraint: do NOT break drone's gitea-dep path (shared
recipe_meta.py). builder+adversary on sonnet.
The agents.toml diff also records the already-live aoeng/aotest/porepo/poe2e
queue head (agent-orchestrator workstream; their plan files remain owned by
that effort).
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
Re-target the traefik health gate off ci.commoninternet.net (the dashboard,
which is After=deploy-proxy) onto a traefik-self endpoint, breaking the
fresh-boot deadlock while keeping health-gated rollback. M1 controlled repro by
the loops; M2 from-scratch cold-boot proof owned by the orchestrator.
The unified agents.py watchdog kept firing the hourly orchestrator supervision
ping even after SEQUENCE-COMPLETE (the old launch.py watchdog exited on
completion, which stopped them). Gate the wake loop on the SEQUENCE-COMPLETE
marker so a finished build stays fully at rest — no pings. Resumes
automatically when new work is queued (that clears the marker, line 631).
The unification transcribed phases from .phases-spec before cf48 was added,
so the operator's just-requested opus 4.8 cfold review got dropped. Re-append
it after ghost (system is past cf55/on pvfix, so can't insert before pvfix
without shifting the live phase index). agents.py re-reads config each tick.
Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.