Commit Graph

285 Commits

Author SHA1 Message Date
28ef7e44ab launch-upgrader: add stall-detect + auto-resume watchdog (opencode-go limit)
The opencode-go subscription's rolling usage-limit (429) ends the 'opencode run'
agent loop mid-run; it does NOT self-resume. Add:
- resume: continue the SAME session (context preserved) via 'opencode run -s <id>
  --continue' — finds the session from the web server, kills the idle proc safely
  (via /proc scan, never pkill -f self-match), relaunches in the tmux session.
- babysit: poll the session log; on a stall (>15min idle) wait out any 429
  retry-after then auto-resume. Spawned automatically by an opencode 'start'.

So a usage-limit pause now self-heals instead of needing a manual nudge.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-23 01:26:24 +00:00
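The babysit decision described in this commit can be sketched as a single pure function. This is a minimal illustration, not the launcher's actual code: the function name, its parameters, and the three return values are hypothetical, and only the 15-minute idle threshold and the "wait out the 429 retry-after, then resume" order come from the message.

```python
from typing import Optional

# ">15min idle" threshold taken from the commit message.
STALL_SECONDS = 15 * 60


def babysit_action(now: float, last_log_activity: float,
                   retry_after_until: Optional[float] = None) -> str:
    """Decide what one poll of the session log should do.

    Hypothetical helper: returns 'ok', 'wait', or 'resume'.
    All arguments are Unix timestamps in seconds.
    """
    idle = now - last_log_activity
    if idle <= STALL_SECONDS:
        return "ok"  # the log is still moving; nothing to do
    if retry_after_until is not None and now < retry_after_until:
        return "wait"  # stalled, but the 429 retry-after window is still open
    return "resume"  # stalled and any limit window has passed
```

Keeping the decision free of I/O means the polling loop, the /proc scan, and the tmux relaunch can be tested separately from the timing logic.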
3762efcce3 upstream(matrix-synapse): document bridge calver + telegram Go rewrite + PG13→15 2026-06-22 21:46:42 +00:00
438819a94a upstream(mattermost-lts): 2026-06-22 re-check — 11.7.5 still latest ESR 2026-06-22 21:44:13 +00:00
543b98d030 upstream(ghost): document MySQL 9.x is NOT supported by Ghost (do not bump past 8.x) 2026-06-22 21:19:31 +00:00
25ef098581 upstream(discourse): bitnamilegacy frozen at 3.5.0; 9.x are Helm chart OCI artifacts 2026-06-22 20:56:17 +00:00
8887c1089e upstream(custom-html-tiny): fix alpine/git source repo URL 2026-06-22 20:52:55 +00:00
2a682b1fe5 upstream(custom-html): fix alpine/git source repo URL + add 1.31.2 notes 2026-06-22 20:43:38 +00:00
9e91d47205 recipe-report: harden spec-in/page-out determinism contract
Make explicit that ALL formatting/HTML is owned by recipe-report.py render() and
the model's only artifact is the spec JSON — never hand-write/edit HTML. Matters
now that glm-5.2 drives the report. Also fix stale 'default opus' refs (report
now defaults to opencode-go/glm-5.2, overridable via REPORT_BACKEND/REPORT_MODEL).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:34:14 +00:00
165eb62070 launch-report: default to opencode-go/glm-5.2 + fix opencode run invocation
Report launcher now defaults backend=opencode, model=opencode-go/glm-5.2 (claude
override → opus). Replaced the broken opencode 'attach + send-keys' path with the
same 'opencode run -m … --share --attach --title' pattern as the upgrader (the
old path passed no model and injected the prompt via keystrokes).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:28:55 +00:00
5351ec2e40 launch-upgrader: default to opencode-go/glm-5.2 when unset
Weekly upgrade run now defaults backend=opencode, model=opencode-go/glm-5.2 with
no env set. Model default tracks backend (claude override → sonnet). Override via
LOOP_BACKEND/LOOP_MODEL or /srv/cc-ci/upgrader.env.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:23:26 +00:00
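The "model default tracks backend" rule above can be illustrated with a small resolver. This is a sketch under assumptions: the function and constant names are hypothetical, and the environment is passed in as a plain mapping rather than read from /srv/cc-ci/upgrader.env. Only the variable names LOOP_BACKEND and LOOP_MODEL and the default values come from the message.

```python
from typing import Mapping, Tuple

DEFAULT_BACKEND = "opencode"
# Per-backend model defaults as stated in the commit message.
DEFAULT_MODELS = {"opencode": "opencode-go/glm-5.2", "claude": "sonnet"}


def resolve_backend_model(env: Mapping[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: pick (backend, model) from env with defaults."""
    backend = env.get("LOOP_BACKEND") or DEFAULT_BACKEND
    model = env.get("LOOP_MODEL") or DEFAULT_MODELS.get(backend)
    if model is None:
        # Assumption: an unknown backend must come with an explicit model.
        raise ValueError(f"no default model for backend {backend!r}")
    return backend, model
```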
1443ccaea5 weekly upgrade: optional backend/model via /srv/cc-ci/upgrader.env
cc-ci-upgrade-all now reads an optional EnvironmentFile so the weekly run can
switch backend/model (e.g. LOOP_BACKEND=opencode LOOP_MODEL=opencode-go/glm-5.2)
without a rebuild. Absent file → claude/sonnet (unchanged). Built+switched on
cc-ci-orchestrator-hetzner, host verified healthy.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:21:16 +00:00
ec18c98af6 launch-upgrader: fix opencode --model placement + add web-attach/--share
The opencode backend emitted 'opencode --model X run ...' but -m/--model is a
flag on the run subcommand, so the model was being ignored. Move it after run.
Add OPENCODE_SHARE (default on): attach the session to the shared opencode web
server (oc.commoninternet.net) AND create a public --share link for monitoring.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 20:14:27 +00:00
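The placement fix above amounts to emitting the model flag after the subcommand. A minimal sketch of such an argv builder follows; the function name is hypothetical, and any further flags are left to the caller rather than guessed here.

```python
from typing import List, Sequence


def opencode_run_argv(model: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Hypothetical helper: build an argv with -m placed after 'run'.

    Per the commit message, -m/--model is a flag on the 'run' subcommand,
    so 'opencode --model X run ...' silently ignored the model.
    """
    return ["opencode", "run", "-m", model, *extra_args]
```

Building the command as a list rather than a shell string also avoids quoting problems when the arguments are later passed to a process launcher.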
f29aacc29a point recipe-maintainer submodule at public repo (recipe-maintainers/recipe-maintainer)
Use the public mirror from now on; ignore the private notplants repo.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 13:45:28 +00:00
388e5ec7d0 bump recipe-maintainer: opencode readme + skill mirror
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 01:08:36 +00:00
02f13ab35f recipe-upgrade: scope upstream cross-check to abra-unparseable tags only
Trust abra fully for any image whose tag it can read — a normal semver/calver
tag with no newer version is genuinely up-to-date. Only cross-check upstream for
images abra physically can't parse (tag+digest pins, digest-only pins), which is
the actual immich blind spot. Avoids redundant upstream checks on every recipe.
2026-06-19 11:54:21 +00:00
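The scoping rule above (cross-check upstream only for tag+digest and digest-only pins) can be sketched as a classifier over image references. The function names and the example references in the usage note are hypothetical. The sketch assumes the usual `name[:tag][@digest]` reference shape and is not how abra itself parses images.

```python
def pin_kind(image_ref: str) -> str:
    """Classify a reference as 'tag', 'tag+digest', 'digest-only', or 'untagged'."""
    name, _, digest = image_ref.partition("@")
    # Look for ':' only in the last path segment, so a registry port
    # such as 'localhost:5000/app' is not mistaken for a tag.
    last_segment = name.rsplit("/", 1)[-1]
    has_tag = ":" in last_segment
    if digest:
        return "tag+digest" if has_tag else "digest-only"
    return "tag" if has_tag else "untagged"


def needs_upstream_check(image_ref: str) -> bool:
    """True only for the pin styles the commit says abra cannot parse."""
    return pin_kind(image_ref) in {"tag+digest", "digest-only"}
```

For example, `needs_upstream_check("example/app:1.2.3@sha256:abc")` is true, while `needs_upstream_check("example/app:1.2.3")` is false, so abra's answer is trusted for the plain tag.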
ce5d2e22cf recipe-upgrade: cross-check upstream when abra is blind (immich tag+digest skip)
abra recipe upgrade is the first approach, but it silently contributes no
candidate for tag+digest pins (FATA: tag and digest not supported), digest-only
pins, and non-semver tags. immich kept getting skipped this way. Before
concluding SKIPPED — up-to-date, do a direct upstream tag check for every image
abra could not cleanly evaluate; only skip when BOTH agree nothing is newer.
2026-06-19 11:51:37 +00:00
68bbfc72f2 upstream(mattermost-lts): update to 11.7 ESR; note restore fix, schemeid gotcha
- Mattermost now on 11.7 ESR (EOL 2027-05-15); 10.11 ESR expires 2026-08-15
- Latest patch: 11.7.5 (2026-06-18)
- Note: avoid 11.7.0–11.7.2 (schemeid bug upgrading from 10.11.17+)
- Backup/restore now uses pg_backup.sh (proper restore hook; PR #2/PR #1 fix)
- Next ESR expected ~Feb 2027

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Mbq9p2eCzZH59qfw9WsgKZ
2026-06-19 03:53:48 +00:00
15e88eaca2 upstream(hedgedoc): add release-notes sources registry 2026-06-19 02:49:42 +00:00
f195dfc828 upstream(gitea): release-notes sources 2026-06-19 02:45:01 +00:00
0ccd98fa8e upstream(drone): release-notes sources 2026-06-19 02:21:11 +00:00
e844ae3909 upstream(bluesky-pds): update registry — 0.4.5001 is new versioning scheme, not a mis-tag 2026-06-19 02:12:41 +00:00
a22ae8deed journal: redfix DONE — all 6 canon-sweep failures fixed + verified (4 recipe PRs, 2 harness); SEQUENCE-COMPLETE 2026-06-18 14:54:28 +00:00
ff6c44a627 plan: queue redfix — investigate ALL canon-sweep failures + FIX each (recipe PR or harness improvement, opus)
Operator 2026-06-17: fix all six (discourse timeout, mattermost-lts restore,
mumble handshake, bluesky warm-routing, gitea 3.6.0 advance, keycloak
de-enroll) — none left as a standing exception. Isolation re-run first
(flake vs genuine; operator recalls mattermost-lts/mumble passing). keycloak
+ bluesky likely share a warm-domain-collision harness fix; gitea via recipe
PR or warm-advance fallback. Nothing merged.
2026-06-17 23:17:51 +00:00
359f3f0978 journal: SEQUENCE-COMPLETE — regall/samever/canon/dash/settings/nixenv all DONE, host healthy 2026-06-17 22:36:13 +00:00
bd16123865 plan: queue nixenv — single-source the harness runtime env (timer + Drone runner share deps; root-cause fix for DEFECT-3 drift, opus)
Operator 2026-06-17. The recipe-test runtime env is declared 2-3x today
(harness.nix cc-ci-run runtimeInputs; nightly-sweep.nix duplicate pyEnv + a
divergent list + DEFECT-3 host-PATH patch; host systemPackages git-lfs for the
Drone runner) -> they drift (DEFECT-3 = missing bash/git-lfs in the timer).
Factor ONE shared env referenced by cc-ci-run, the sweep timer, and host
systemPackages; sweep invokes cc-ci-run so env is identical by construction.
Queued last (after settings).
2026-06-17 15:38:45 +00:00
f7825f8494 plan(settings): add release-tag-first no-canonical fallback; bump to opus
Operator 2026-06-17: (1) no-canonical upgrade base should prefer the most recent
release TAG < head (a real published predecessor, reusing samever's helper),
with raw main-tip only as a last resort, then skip — always-on, improves this
server too. (2) SKIP_CANONICALS_FOR_UPGRADE=true feeds that same release-tag-first
fallback (so it swaps canonical->latest-release base without losing upgrade
coverage). (3) model bumped sonnet->opus.
2026-06-17 09:48:34 +00:00
aa1f897820 plan: queue settings — minimal CI-server settings.toml + SKIP_CANONICALS_FOR_UPGRADE (sonnet, after dash)
Operator 2026-06-17. Introduce a minimal, extensible server-level settings.toml
for the cc-ci server; first value SKIP_CANONICALS_FOR_UPGRADE (bool, default
false, false on this server). When true, resolve_upgrade_base bypasses the
canonical and uses the main-tip predecessor — codifying that canonicals are an
optional optimization for the upgrade base. Scoped to the upgrade base only
(promote/--quick separate). Runs after canon/dash (builds on the final resolver).
2026-06-17 09:44:22 +00:00
ee8d30b43e plan(canon): retire UPGRADE_BASE_VERSION (gated) — plausible's pin becomes redundant under the dynamic canonical base
Operator 2026-06-17: UPGRADE_BASE_VERSION is still used (plausible pins 3.0.1+
v2.0.0 to dodge the broken 3.0.0 base; bluesky-pds references it as a future
re-enable). Once canon establishes plausible's canonical at 3.0.1, the dynamic
base resolves correctly without the pin -> strip the key (meta/resolver/docs/
tests) + migrate plausible + update bluesky-pds note. GATED: keep it if
plausible genuinely still needs the escape-hatch (never drop upgrade coverage).
2026-06-17 04:43:23 +00:00
9e7d76ca1f plan: queue dash — fix incomplete per-recipe run history on the CI dashboard (opus, after canon)
Operator 2026-06-17. /recipe/<name> shows only the latest run for most recipes
because history is built from a single page of the latest 100 Drone builds,
while 362 runs exist on the host. Source per-recipe history from the local
/var/lib/cc-ci-runs artifacts (already bind-mounted read-only) instead — full,
durable history. Deploy + verify live on bluesky-pds.
2026-06-17 04:29:52 +00:00
4476cf1505 plan(canon): canonical promotes only to TAGGED releases; sweep triggers on a new release tag (not a new commit)
Operator 2026-06-17. (1) should_promote_canonical also requires the tested
version to have a release tag -> canonical is always a real published release,
never an untagged main commit. (2) sweep trigger is version/tag-keyed not
commit-keyed: skip a recipe unless its latest release tag is newer than its
canonical (skip new-untagged-commits-without-a-version). This makes the sweep
and samever ORTHOGONAL (samever never fires in the sweep; it's the PR-path
same-version guard). Updated gates/DoD accordingly.
2026-06-17 04:18:36 +00:00
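The version-keyed sweep trigger described above can be sketched as follows. This is illustrative only: the function names are hypothetical, versions are assumed to be plain dotted integers, and treating a recipe with no canonical yet as eligible is an assumption the message does not state.

```python
from typing import Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Assumption: versions are dotted integers such as '3.0.1'."""
    return tuple(int(part) for part in v.split("."))


def sweep_should_run(latest_release_tag: Optional[str],
                     canonical: Optional[str]) -> bool:
    """Hypothetical helper: run the sweep only for a newer release tag."""
    if latest_release_tag is None:
        return False  # only untagged commits exist; nothing to sweep
    if canonical is None:
        return True  # assumption: no canonical yet means the first run is due
    return parse_version(latest_release_tag) > parse_version(canonical)
```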
9a81fe88e8 plan(canon): enroll ALL recipes, weekly cadence, and REQUIRE proof it plays nice with samever
Operator 2026-06-17: all recipes WARM_CANONICAL (watch warm-volume disk),
weekly timer, and an explicit M2 requirement to prove the sweep<->samever
interaction — skip uses COMMIT equality, samever uses VERSION equality; M2
must demonstrate all 3 sweep paths (skip / version-bump upgrade / same-version-
label step-back) and the commit-vs-version boundary.
2026-06-17 04:09:56 +00:00
05e2635019 plan: queue canon — make the canonical sweep actually work (substitute for hollow nightly sweep)
Operator 2026-06-17. The nightly-sweep timer fires green but is a no-op: only
custom-html is WARM_CANONICAL and zero canonical.json records exist -> no
canonical has ever been promoted end-to-end. canon makes it real + proven:
fix/prove the promote path, broaden enrollment, add upstream mirror-sync +
skip-when-unchanged, verify end-to-end (incl. run-twice no-op). Schedule is
incidental; correctness is the deliverable. Replaces the hollow sweep. opus.
2026-06-17 04:06:59 +00:00
03292c6f57 plan(samever): frame the same-version gap as the nightly cold-sweep STEADY STATE (operator insight)
The nightly cold-on-latest run promotes canonical->LATEST, so every subsequent
nightly (until a new version ships) finds canonical==latest==version-under-test
-> base==head -> same-version no-op. This is the common path, not a rare edge.
M2 now proves the fix on that nightly scenario (canonical already==latest ->
step back to previous published).
2026-06-17 03:56:09 +00:00
b40dcb504c plan: queue samever (older-base fallback when last-green==head, opus); IDEAS: defer canonical-history (B)
Operator 2026-06-17. Closes the prevb resolver gap: when the last-green
warm-canonical base version equals the PR head version, step back to the
newest published version strictly older than head (design A) instead of a
same-version no-op or a skip. Design B (canonical history for a green-verified
older base) saved to IDEAS. Auto-runs after regall (watchdog advances + switches
to opus).
2026-06-17 03:50:49 +00:00
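Design A above (step back to the newest published version strictly older than head when last-green equals head) can be sketched as below. The names are hypothetical and versions are assumed to be plain dotted integers. The main-tip fallback and the skip path of the real resolver are out of scope, so a missing last-green base simply returns None here.

```python
from typing import Iterable, Optional, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Assumption: versions are dotted integers such as '3.0.1'."""
    return tuple(int(part) for part in v.split("."))


def pick_upgrade_base(last_green: Optional[str], head: str,
                      published: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: choose the upgrade base for a head version."""
    if last_green is None:
        return None  # other fallbacks are not modelled in this sketch
    if parse_version(last_green) != parse_version(head):
        return last_green  # normal case: last-green differs from head
    # Same-version case: step back to the newest strictly older release.
    older = [v for v in published if parse_version(v) < parse_version(head)]
    return max(older, key=parse_version) if older else None
```

Comparing parsed tuples rather than raw strings keeps 3.10.0 ordered after 3.9.0, which a plain string comparison would get wrong.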
65ee741869 plan: queue prevb (dynamic upgrade base + previous/ config, opus) + regall (all-recipe regression, sonnet)
Operator 2026-06-16. Replaces the static UPGRADE_BASE_VERSION + leaky single
compose.ccci.yml overlay model: dynamic base = last-green(warm canonical) ->
main fallback -> skip; optional minimal per-recipe previous/ folder for
base-only version repairs (ignored for head, version-guarded, removable when
stale). Validated on discourse PR #4 (official-image switch the current overlay
masks). regall then sweeps all recipes for regressions on sonnet.
2026-06-16 23:55:28 +00:00
3c1d48e0b8 journal: gtea phase DONE — gitea fully-tested + LFS PR #1 verified; SEQUENCE-COMPLETE 2026-06-15 22:58:30 +00:00
24b3a25ce6 plan: queue gtea — enroll gitea as a fully-tested recipe + verify LFS PR #1
Operator-requested 2026-06-15. gitea is currently only a dep provider for
drone; this phase builds the full recipe-under-test suite (install/upgrade/
backup/restore/custom + lint + screenshot), ports the upstream parity corpus
(health_check, git_push), and verifies recipe-maintainers/gitea PR #1
('feat: support Git LFS on plain gitea', branch lfs-plain-gitea) via an LFS
round-trip + JWT-stability capstone test — red/skipped on main, green on the
PR. Central constraint: do NOT break drone's gitea-dep path (shared
recipe_meta.py). builder+adversary on sonnet.

The agents.toml diff also records the already-live aoeng/aotest/porepo/poe2e
queue head (agent-orchestrator workstream; their plan files remain owned by
that effort).
2026-06-15 19:32:14 +00:00
2f2225e466 recipe-upgrade: defer version bump to operator's final 'abra recipe release'
§2 no longer bumps the coop-cloud version label in the recipe PR (no
--dry-run compute, no tag). It records the recommended 'abra recipe release
<recipe> -<x|y|z>' (no --dry-run) in the PR body (§3) as the operator's
final publish step — run after the upstream PR merges, it bumps the label +
tags + pushes in one go. Bumps recipe-maintainer submodule to 9daddac (same
change across its /recipe-upstream, -upgrade-apply, -upgrade-plan, -new-tag).
2026-06-15 18:38:27 +00:00
6acad7b35b recipe-upgrade: abra recipe release for version bump + upstream release-notes links in PR body
cc-ci recipe-upgrade skill now computes the version via 'abra recipe release --dry-run'
(not a hand-edit) and requires the PR body to link upstream release notes per service.
Bumps the recipe-maintainer submodule pointer to the matching change.
2026-06-15 18:09:41 +00:00
6a2464469f upgrade-all: skip 'external' recipes (uptime-kuma) + add used-recipes.md inventory
Operator: uptime-kuma is maintained elsewhere — drop it from the weekly upgrade
but keep it in the used-recipes inventory. New cc-ci-plan/used-recipes.md is the
canonical list of every recipe cc-ci deploys/tests, with a weekly|external tier;
upgrade-all §1 now excludes 'external' rows from the candidate list (explicit
--args still override). uptime-kuma = external; all others weekly.
2026-06-15 17:00:28 +00:00
489f6670da journal: pxgate cold-boot proof passed (real reboot, deploy-proxy active before dashboard) 2026-06-13 13:52:56 +00:00
6005a212d6 memory+journal: cc-ci host rebuild procedure; pxgate M2 deployed + verified on live host 2026-06-13 13:46:19 +00:00
1aee81b4f3 plan: queue pxgate — fix deploy-proxy/dashboard health-gate circular dependency (D8)
Re-target the traefik health gate off ci.commoninternet.net (the dashboard,
which is After=deploy-proxy) onto a traefik-self endpoint, breaking the
fresh-boot deadlock while keeping health-gated rollback. M1 controlled repro by
the loops; M2 from-scratch cold-boot proof owned by the orchestrator.
2026-06-13 12:38:40 +00:00
97303abc25 watchdog: suppress scheduled wakes once the build sequence is complete
The unified agents.py watchdog kept firing the hourly orchestrator supervision
ping even after SEQUENCE-COMPLETE (the old launch.py watchdog exited on
completion, which stopped them). Gate the wake loop on the SEQUENCE-COMPLETE
marker so a finished build stays fully at rest — no pings. Resumes
automatically when new work is queued (that clears the marker, line 631).
2026-06-13 12:04:49 +00:00
84e13a7f23 fix(pvcheck/A2): update upgrade-all SKILL.md guard description
The durable /16 proxy fix landed in phase pvfix (2026-06-13).
Update the guard description from "safety net until that lands"
to "belt-and-suspenders even after the /16 fix" — guard logic
unchanged, description now accurate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-13 05:58:27 +00:00
1e9337ce89 agents.toml: re-add cf48 (opus cfold review) dropped during the launch-system migration
The unification transcribed phases from .phases-spec before cf48 was added,
so the operator's just-requested opus 4.8 cfold review got dropped. Re-append
it after ghost (system is past cf55/on pvfix, so can't insert before pvfix
without shifting the live phase index). agents.py re-reads config each tick.
2026-06-13 05:32:47 +00:00
b4a6aaea7e plan: queue cf48 — Opus 4.8 post-cfold coverage-loss review (cross-check of cf55 GPT-5.5)
Second independent review of the cfold custom-folder collapse, by Opus 4.8
instead of GPT-5.5, inserted after cf55 (queue ...cfold;cf55;cf48;pvfix;...).
Per-phase overrides .loop-model[-adv]-cf48=claude-opus-4-8 on the claude backend.
2026-06-13 05:15:05 +00:00
2c64cd69f0 fix(watchdog): detect idle opencode turns 2026-06-12 21:47:06 +00:00
85498931d1 plan: add gpt55 cfold review phase 2026-06-12 16:07:48 +00:00
dea6359bcd plan: queue proxy and ghost follow-up phases 2026-06-12 15:56:03 +00:00