From c33b21fe8dfe9875f46fd685511ef262541ac1f3 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Thu, 11 Jun 2026 20:56:24 +0000
Subject: [PATCH] memory: commit session notes (drone P0, weekly-upgrade-queued, mailu/index updates)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Per AGENTS.md 'Agent memory lives in memory/ (in this repo)' — memory notes must be committed + pushed like any repo change, not left only in the local ~/.claude symlink target.
---
 memory/MEMORY.md                             |  4 +++-
 memory/drone-phase-p0-host-deploy.md         | 18 ++++++++++++++
 memory/plausible-upgrade-base-trap.md        | 16 +++++++------
 memory/weekly-upgrade-queued-after-phases.md | 25 ++++++++++++++++++++
 4 files changed, 55 insertions(+), 8 deletions(-)
 create mode 100644 memory/drone-phase-p0-host-deploy.md
 create mode 100644 memory/weekly-upgrade-queued-after-phases.md

diff --git a/memory/MEMORY.md b/memory/MEMORY.md
index afc92be..c5220da 100644
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -8,5 +8,7 @@
 - [Shared recipe-checkout race](shared-recipe-checkout-race.md) — never git-checkout ~/.abra/recipes/ on cc-ci while its CI build runs; harness deploys from that tree
 - [immich pgvecto.rs DROP DATABASE panic](immich-pgvectors-drop-database-panic.md) — DROP DATABASE crashes immich's postgres image; use pg_dump --clean --if-exists + search_path rewrite
 - [Drone sqlite log extraction](drone-sqlite-log-extraction.md) — copy /data/database.sqlite from drone container, query builds→stages→steps→logs for full step output
-- [plausible upgrade-base trap](plausible-upgrade-base-trap.md) — CI REDs from published 3.0.0 base (no x86_64 arch → 404 → silent exit 1), not the PR; needs UPGRADE_BASE_VERSION=3.0.1+v2.0.0 in cc-ci tests
+- [plausible upgrade-base trap](plausible-upgrade-base-trap.md) — RESOLVED: PR#3 GREEN L4; lessons: check harness base version pre-!testme; backupbot v2 label syntax; TinyLog not FREEZEable; BEAM exit-0 needs restart_policy any
 - [Swarm UpdateStatus convergence gotchas](swarm-updatestatus-convergence-gotchas.md) — N/N is not converged mid stop-first update; paused flag persists forever; only updating/rollback_started are active
+- [drone phase P0 host deploy](drone-phase-p0-host-deploy.md) — orchestrator must nixos-rebuild cc-ci host (/etc/timezone, commit 3bde76f) before phase `drone` can run gitea
+- [Weekly upgrade queued after phases](weekly-upgrade-queued-after-phases.md) — 06-12 cron skipped; auto-runs /upgrade-all when phase queue (drone) finishes; don't systemctl start the timer
diff --git a/memory/drone-phase-p0-host-deploy.md b/memory/drone-phase-p0-host-deploy.md
new file mode 100644
index 0000000..25a9d9a
--- /dev/null
+++ b/memory/drone-phase-p0-host-deploy.md
@@ -0,0 +1,18 @@
+---
+name: drone-phase-p0-host-deploy
+description: "Phase `drone` P0 — orchestrator must deploy /etc/timezone host fix (nixos-rebuild on cc-ci) before gitea/drone work can run"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+When the cc-ci phase queue reaches phase `drone` (or STATUS-drone.md says BLOCKED on P0),
+the ORCHESTRATOR must deploy the committed host fix: cc-ci repo commit `3bde76f` adds
+`environment.etc."timezone" = "UTC\n"` to `nix/hosts/cc-ci/configuration.nix`. Deploy =
+sync `/root/cc-ci` on the CI host (operator-synced non-git copy, STALE re this commit)
+with the current repo state, then `nixos-rebuild switch --flake /root/cc-ci#cc-ci` (ssh
+alias `cc-ci`, root). Verify `test -f /etc/timezone` (content `UTC`). Do it at a quiet
+CI moment (rebuild may restart services). Loops must NOT do host changes themselves —
+plan: cc-ci-plan/plan-phase-drone-enroll.md §0. Queued 2026-06-11; phases before it:
+bsky → dstamp → mailu → kuma.
diff --git a/memory/plausible-upgrade-base-trap.md b/memory/plausible-upgrade-base-trap.md
index 6843356..36a152f 100644
--- a/memory/plausible-upgrade-base-trap.md
+++ b/memory/plausible-upgrade-base-trap.md
@@ -18,10 +18,12 @@ immutable history. **Why:** the PR adds 3.1.0 above the newest published tag — the harness-documented case where `[-2]` is the wrong base and `[-1]` (3.0.1) is correct.
-**How to apply:** the fix is one line in the cc-ci repo (gated by --with-tests / operator):
-`tests/plausible/recipe_meta.py: UPGRADE_BASE_VERSION = "3.0.1+v2.0.0"`. The recipe-side
-hardening (verified cached binary on the persistent volume, Altinity URL, retries+timeout,
-loud hard-fail, depends_on fix) is on PR #3 (commit 9f8bcbc). Diagnosis + ask posted at
-https://git.autonomic.zone/recipe-maintainers/plausible/pulls/3#issuecomment-14261.
-Before burning a !testme on an upgrade-stage recipe, check what base version the harness
-will pick and whether that base can actually converge. See [[abra-chaos-deploy-checkout-gotcha]].
+**How to apply:** RESOLVED 2026-06-09 — `UPGRADE_BASE_VERSION = "3.0.1+v2.0.0"` is merged to
+cc-ci main, and PR #3 went GREEN level 4 (build 247). Full debrief:
+/srv/cc-ci/.cc-ci-logs/upgrades/plausible-upgrade-2026-06-09.md. Lasting lessons: before
+burning a !testme on an upgrade-stage recipe, check what base version the harness picks
+(`recipe_versions[-2]`) and whether that base can converge; backup-bot-two 2.4.0 ignores v1
+`backupbot.backup.path` labels (use `backupbot.backup.volumes..path`, dump INTO a
+volume); clickhouse-backup cannot FREEZE TinyLog tables (ecto's schema_migrations — TSV
+roundtrip needed); BEAM apps exit 0 on supervision escalation, so swarm `restart_policy`
+must be `any`, not `on-failure`. See [[abra-chaos-deploy-checkout-gotcha]].
diff --git a/memory/weekly-upgrade-queued-after-phases.md b/memory/weekly-upgrade-queued-after-phases.md
new file mode 100644
index 0000000..a0c2aa6
--- /dev/null
+++ b/memory/weekly-upgrade-queued-after-phases.md
@@ -0,0 +1,25 @@
+---
+name: weekly-upgrade-queued-after-phases
+description: Weekly /upgrade-all cron skipped for 2026-06-12; queued to auto-run when the current phase queue finishes (drone)
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+Operator (2026-06-11) cancelled tonight's weekly `/upgrade-all` cron run and queued it to
+run once after the current phase queue (…mailu→drone) completes instead.
+
+- `cc-ci-upgrade-all.timer` was STOPPED (couldn't `disable` — /etc/systemd is read-only).
+  Its persistent stamp was forwarded to `2026-06-12 03:00 UTC` so a reboot/nixos-rebuild
+  tonight schedules the NEXT run for 06-19, not a catch-up of tonight's slot.
+- **GOTCHA:** `systemctl start cc-ci-upgrade-all.timer` fires the service IMMEDIATELY
+  (Persistent=true). Do NOT `start` it to re-arm — let a host reboot/`nixos-rebuild`
+  reactivate it (the [[drone-phase-p0-host-deploy]] rebuild will); the forward stamp
+  prevents a catch-up fire.
+- Post-phase auto-run is wired in launch.py (commit 3fa3178): the watchdog launches
+  `launch-upgrader.py start` when the LAST phase reaches `## DONE`, gated by the one-shot
+  flag `/srv/cc-ci/.cc-ci-logs/.run-upgrade-on-complete` (set 2026-06-11, consumed on fire).
+  So when phase `drone` finishes, `/upgrade-all` starts automatically (upgrader on sonnet).
+
+Once this fires (or if the plan changes), this memory is stale — delete it.
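
Reviewer note: the one-shot gating described in the last bullet of the weekly-upgrade memory can be sketched roughly as below. This is an illustration under stated assumptions, not the code in launch.py (commit 3fa3178 is not part of this patch). The function names `last_phase_done` and `maybe_launch_upgrader`, the idea of reading a single status file, and the injected `launch` callable are all hypothetical; only the `## DONE` marker, the flag file, and the "consumed on fire" behaviour come from the note.

```python
from pathlib import Path
from typing import Callable


def last_phase_done(status_file: Path) -> bool:
    """Return True if the status file contains a '## DONE' heading line.

    Assumption: the last phase reports completion through a Markdown
    status file; the real watchdog may detect this differently.
    """
    if not status_file.exists():
        return False
    lines = status_file.read_text(encoding="utf-8").splitlines()
    return any(line.strip() == "## DONE" for line in lines)


def maybe_launch_upgrader(
    status_file: Path, flag: Path, launch: Callable[[], None]
) -> bool:
    """Fire `launch` at most once: only when the phase is done AND the flag exists.

    The flag is removed before launching, so a second watchdog tick (or a
    crash inside `launch`) cannot trigger a repeat run.
    """
    if not last_phase_done(status_file):
        return False
    try:
        flag.unlink()  # consume the one-shot flag
    except FileNotFoundError:
        return False  # never armed, or already consumed
    launch()
    return True


if __name__ == "__main__":
    import tempfile

    workdir = Path(tempfile.mkdtemp())
    status = workdir / "STATUS-drone.md"        # hypothetical status file
    flag = workdir / ".run-upgrade-on-complete"  # stand-in for the real flag path
    flag.touch()
    status.write_text("# drone\n\n## DONE\n", encoding="utf-8")

    fired = maybe_launch_upgrader(status, flag, lambda: print("upgrader started"))
    print("fired:", fired)
    print("fired again:", maybe_launch_upgrader(status, flag, lambda: None))
```

Consuming the flag before calling `launch` trades a possible missed run (if the launch itself fails) for a guarantee against double runs. Whether launch.py makes the same trade is not stated in the note.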