From 542ed0afe3d34be07f54c49202d006b46d7cfb97 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Tue, 9 Jun 2026 19:25:20 +0000
Subject: [PATCH] memory: move agent memory into repo (memory/), note in AGENTS.md

Persistent agent memories now live in memory/ in this repo; the Claude
auto-memory path is symlinked here so future memories land in the repo
and get committed like any other change.
---
 AGENTS.md                                   |  9 ++++++
 memory/MEMORY.md                            | 11 +++++++
 memory/abra-chaos-deploy-checkout-gotcha.md | 22 ++++++++++++++
 memory/drone-sqlite-log-extraction.md       | 14 +++++++++
 .../immich-pgvectors-drop-database-panic.md | 14 +++++++++
 memory/orchestrator-host-hetzner.md         | 26 +++++++++++++++++
 memory/plausible-upgrade-base-trap.md       | 27 +++++++++++++++++
 memory/push-commits-to-remote.md            | 14 +++++++++
 memory/recipe-mirrors-public-org-blocker.md | 29 +++++++++++++++++++
 memory/regression-canary-cadence.md         | 14 +++++++++
 memory/shared-recipe-checkout-race.md       | 14 +++++++++
 11 files changed, 194 insertions(+)
 create mode 100644 memory/MEMORY.md
 create mode 100644 memory/abra-chaos-deploy-checkout-gotcha.md
 create mode 100644 memory/drone-sqlite-log-extraction.md
 create mode 100644 memory/immich-pgvectors-drop-database-panic.md
 create mode 100644 memory/orchestrator-host-hetzner.md
 create mode 100644 memory/plausible-upgrade-base-trap.md
 create mode 100644 memory/push-commits-to-remote.md
 create mode 100644 memory/recipe-mirrors-public-org-blocker.md
 create mode 100644 memory/regression-canary-cadence.md
 create mode 100644 memory/shared-recipe-checkout-race.md

diff --git a/AGENTS.md b/AGENTS.md
index 2eaec26..4307835 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -85,6 +85,15 @@ cc-ci VM"). The orchestrator is the human's steering wheel; the loops are the en
 Never commit secret values. `.testenv`, `*.tfstate`, `*.key`/`*.pem`, and the loop
 runtime/clone dirs are gitignored. Reference secret *locations*, never their contents (`plan.md` §9).
 
+## Agent memory lives in `memory/` (in this repo)
+
+The orchestrator's persistent agent memory is the **`memory/`** directory of this repo — one file
+per fact with frontmatter, indexed by `memory/MEMORY.md`. The Claude auto-memory path
+(`~/.claude/projects/-srv-cc-ci-orch/memory`) is a **symlink** to it, so memories written the normal
+way land in the repo automatically. **Future memories must also go there**: after writing or
+updating a memory file (and its `MEMORY.md` index line), commit it here and push, like any other
+intentional repo change. Never put secret values in a memory file (see Hard rule).
+
 ## Commit discipline
 
 When the orchestrator, Builder, or assistant makes intentional repository changes here, commit them
diff --git a/memory/MEMORY.md b/memory/MEMORY.md
new file mode 100644
index 0000000..1c4cff3
--- /dev/null
+++ b/memory/MEMORY.md
@@ -0,0 +1,11 @@
+# Memory index
+
+- [Orchestrator host: Hetzner](orchestrator-host-hetzner.md) — runs on Hetzner cpx22; rebuild cmd, loops-service bounce, git-identity gotcha
+- [Push commits to remote](push-commits-to-remote.md) — push to git.autonomic.zone right after every commit in this repo
+- [Regression canary cadence](regression-canary-cadence.md) — server E2E canaries run on polish/review/release, not every commit
+- [Recipe-mirrors public / org blocker](recipe-mirrors-public-org-blocker.md) — mirrors public but recipe-maintainers ORG is private → live PR-STATUS column dark until operator flips org public
+- [abra chaos-deploy checkout gotcha](abra-chaos-deploy-checkout-gotcha.md) — `abra app new` moves recipe checkout to release tag; checkout PR branch after, or chaos deploys wrong tree
+- [Shared recipe-checkout race](shared-recipe-checkout-race.md) — never git-checkout ~/.abra/recipes/ on cc-ci while its CI build runs; harness deploys from that tree
+- [immich pgvecto.rs DROP DATABASE panic](immich-pgvectors-drop-database-panic.md) — DROP DATABASE crashes immich's postgres image; use pg_dump --clean --if-exists + search_path rewrite
+- [Drone sqlite log extraction](drone-sqlite-log-extraction.md) — copy /data/database.sqlite from drone container, query builds→stages→steps→logs for full step output
+- [plausible upgrade-base trap](plausible-upgrade-base-trap.md) — CI REDs from published 3.0.0 base (no x86_64 arch → 404 → silent exit 1), not the PR; needs UPGRADE_BASE_VERSION=3.0.1+v2.0.0 in cc-ci tests
diff --git a/memory/abra-chaos-deploy-checkout-gotcha.md b/memory/abra-chaos-deploy-checkout-gotcha.md
new file mode 100644
index 0000000..cbd293b
--- /dev/null
+++ b/memory/abra-chaos-deploy-checkout-gotcha.md
@@ -0,0 +1,22 @@
+---
+name: abra-chaos-deploy-checkout-gotcha
+description: "abra app new moves the recipe checkout to the release tag — checkout the PR branch AFTER app new, or chaos deploys the wrong tree"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: fc17c9c2-ab6e-4c11-856e-a6a6e160a0ec
+---
+
+On cc-ci, `abra app new ` checks out the latest *published release tag* in
+`~/.abra/recipes/`, silently discarding whatever commit you had checked out. A
+subsequent `abra app deploy --chaos` then deploys that tag's tree, not your WIP.
+
+**Why:** abra pins app creation to the recipe's released version and moves the recipe
+checkout to do it; `--chaos` only means "deploy the working tree as-is at deploy time".
+
+**How to apply:** in the step-2b direct-deploy loop, order matters: `abra app new` first,
+*then* `git checkout ` in the recipe dir, then `abra app deploy --chaos`.
+Verify with the deploy overview (config versions / images) that the intended tree went out.
+Also: plausible's `.env.sample` ships `DISABLE_AUTH/DISABLE_REGISTRATION=replace-me`, which
+crash-loops the app (`binary_to_existing_atom("replace-me")`) — set them to true/false in
+any dev env. See [[regression-canary-cadence]] for related CI cadence.
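The AGENTS.md rule in this patch gives each fact its own file under `memory/`, with frontmatter, a `MEMORY.md` index line, and `[[name]]` cross-links. That layout is easy to drift from as notes accumulate. Below is a minimal Python sketch of a pre-commit check. It assumes index entries are markdown links like `[Title](note.md)`, that `[[name]]` refers to `memory/name.md`, and that the frontmatter `name:` matches the filename stem. `check_memory_dir` is a hypothetical helper, not something this repo ships:

```python
import re
from pathlib import Path

# Index entries look like "- [Title](some-note.md) - summary".
INDEX_LINK = re.compile(r"\]\(([^)\s]+\.md)\)")
# Cross-links between notes look like "[[some-note]]".
WIKI_LINK = re.compile(r"\[\[([^\]]+)\]\]")
# The frontmatter "name:" field, matched line by line.
NAME_FIELD = re.compile(r"^name:\s*(\S+)\s*$", re.MULTILINE)


def frontmatter_name(text):
    """Return the `name:` value from a leading frontmatter block delimited by --- lines."""
    if not text.startswith("---\n"):
        return None
    front = text.split("---\n", 2)[1]
    match = NAME_FIELD.search(front)
    return match.group(1) if match else None


def check_memory_dir(memory_dir):
    """List problems: unindexed notes, dangling index entries, name mismatches, dangling [[links]]."""
    root = Path(memory_dir)
    notes = {p.stem for p in root.glob("*.md") if p.name != "MEMORY.md"}
    index_text = (root / "MEMORY.md").read_text(encoding="utf-8")
    indexed = {Path(target).stem for target in INDEX_LINK.findall(index_text)}

    problems = [f"{n}.md has no MEMORY.md index line" for n in sorted(notes - indexed)]
    problems += [f"MEMORY.md links to missing {n}.md" for n in sorted(indexed - notes)]
    for name in sorted(notes):
        text = (root / f"{name}.md").read_text(encoding="utf-8")
        if frontmatter_name(text) != name:
            problems.append(f"{name}.md frontmatter name does not match its filename")
        for target in WIKI_LINK.findall(text):
            if target not in notes:
                problems.append(f"{name}.md links to unknown [[{target}]]")
    return problems
```

Calling `check_memory_dir("memory")` from the repo root returns an empty list when the index, filenames and cross-links agree. Anything it reports is worth fixing before the commit-and-push step.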
diff --git a/memory/drone-sqlite-log-extraction.md b/memory/drone-sqlite-log-extraction.md
new file mode 100644
index 0000000..3cc6421
--- /dev/null
+++ b/memory/drone-sqlite-log-extraction.md
@@ -0,0 +1,14 @@
+---
+name: drone-sqlite-log-extraction
+description: How to read full drone CI step logs on cc-ci — copy /data/database.sqlite from the drone container and query it
+metadata:
+  node_type: memory
+  type: reference
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+Drone on cc-ci has no on-disk logs and no API token handy. To get full step logs:
+1. `ssh cc-ci 'docker cp $(docker ps -qf name=drone):/data/database.sqlite /tmp/drone.sqlite'` then scp to orchestrator (no python3 on cc-ci PATH).
+2. Query with python3 sqlite3: `builds` (build_number → build_id) → `stages` (stage_build_id) → `steps` (step_stage_id) → `logs` where log_id = step_id; `log_data` is a JSON array of `{pos,out,time}` lines.
+
+**Why:** this is how the real root cause of immich CI builds 229/230 ("bash: /pg_backup.sh: No such file or directory" in the backup hook) was found after results.json/junit gave only the assertion failure. Related: [[shared-recipe-checkout-race]]
diff --git a/memory/immich-pgvectors-drop-database-panic.md b/memory/immich-pgvectors-drop-database-panic.md
new file mode 100644
index 0000000..847b96c
--- /dev/null
+++ b/memory/immich-pgvectors-drop-database-panic.md
@@ -0,0 +1,14 @@
+---
+name: immich-pgvectors-drop-database-panic
+description: "Never DROP DATABASE on immich's postgres image — pgvecto.rs worker PANICs and crashes postgres; use pg_dump --clean --if-exists instead"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+On immich's DB image (ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0), `DROP DATABASE` destabilises the legacy pgvecto.rs (`vectors`) background worker: it loops on "IPC connection is closed unexpected" until `PANIC: ERRORDATA_STACK_SIZE exceeded` → postgres aborts (signal 6) → the app never reconverges. Per-table `DROP TABLE` is safe; only `DROP DATABASE` triggers it.
+
+**Why:** confirmed live in dev-immich and in CI build 225 DB-service logs during the immich backup/restore fix (PR #2, June 2026).
+
+**How to apply:** for a true point-in-time restore without dropping the DB, back up with `pg_dump --clean --if-exists` (per-object DROP+recreate) and on restore rewrite pg_dump's `set_config('search_path', '', false)` to `'public, pg_catalog', true` (VectorChord types unresolvable otherwise — same rewrite as docs.immich.app/administration/backup-and-restore). See the recipe's pg_backup.sh. Related: [[shared-recipe-checkout-race]], [[drone-sqlite-log-extraction]]
diff --git a/memory/orchestrator-host-hetzner.md b/memory/orchestrator-host-hetzner.md
new file mode 100644
index 0000000..4fedaae
--- /dev/null
+++ b/memory/orchestrator-host-hetzner.md
@@ -0,0 +1,26 @@
+---
+name: orchestrator-host-hetzner
+description: The cc-ci orchestrator runs on a Hetzner cpx22; key host facts + the git-identity gotcha
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: cd772f12-1978-47c3-894b-0ebbe0d2987f
+---
+
+The cc-ci orchestrator (loops + watchdog + this session) runs on a **Hetzner cpx22** as of
+2026-05-31, replacing the Incus VM (100.116.55.106).
+
+- Hetzner server **134487234**, public **168.119.126.100**, tailnet **cc-ci-orchestrator-1** @
+  **100.84.190.30**. Flake host **cc-ci-orchestrator-hetzner**.
+- Rebuild: `sudo nixos-rebuild switch --flake .#cc-ci-orchestrator-hetzner` from `/srv/cc-ci-orch`
+  (`/srv/cc-ci` is a symlink to it). The Bash tool runs as user **loops** (uid 1000, passwordless
+  sudo) — plain `nixos-rebuild switch` fails on the profile symlink; use `sudo`.
+- Reboot-resilience: `cc-ci-loops.service` is **enabled** (wantedBy multi-user.target); ExecStartPre
+  `reboot-log.sh` auto-logs reboots to REBOOTS.md. Its `script` runs `launch.sh start`, which
+  **stops+restarts the loops** — so any rebuild that (re)starts the unit bounces the loops (they
+  re-orient from git; harmless but noticeable).
+- **Git-identity gotcha:** the box had no git user.name/email configured; commits fail with "Author
+  identity unknown". Set per-repo to match prior commits: `autonomic-bot
+  `.
+
+Full record: `cc-ci-plan/plan-orchestrator-hetzner-migration.md`.
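Step 2 of the drone note can be scripted against the copied `/tmp/drone.sqlite`. Below is a minimal sketch using Python's standard `sqlite3` and `json` modules. The join path and the `log_data` format come from the note. The column names `stage_id`, `step_id`, `step_name` and `build_repo_id` are assumptions about Drone's schema, and `step_logs` is a hypothetical helper. The optional `repo_id` filter exists because Drone numbers builds per repository, so a bare build number can match several repos:

```python
import json
import sqlite3

# Join path from the note: builds -> stages -> steps -> logs (log_id = step_id).
QUERY = """
    SELECT steps.step_name, logs.log_data
    FROM builds
    JOIN stages ON stages.stage_build_id = builds.build_id
    JOIN steps ON steps.step_stage_id = stages.stage_id
    JOIN logs ON logs.log_id = steps.step_id
    WHERE builds.build_number = ?
"""


def step_logs(db_path, build_number, repo_id=None):
    """Return [(step_name, full_output), ...] for one build from a copied drone database."""
    query, params = QUERY, [build_number]
    if repo_id is not None:
        # Build numbers repeat across repos, so narrow to one repo when it is known.
        query += " AND builds.build_repo_id = ?"
        params.append(repo_id)
    query += " ORDER BY stages.stage_id, steps.step_id"

    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, params).fetchall()
    finally:
        conn.close()

    result = []
    for step_name, log_data in rows:
        # log_data is a JSON array of {"pos", "out", "time"} records; json.loads takes str or bytes.
        lines = sorted(json.loads(log_data), key=lambda line: line["pos"])
        result.append((step_name, "".join(line["out"] for line in lines)))
    return result
```

For example, `step_logs("/tmp/drone.sqlite", 229)` would return each step's full output for build 229. That is the kind of text that surfaced the `/pg_backup.sh` error.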
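The restore-side rewrite in the immich note is a plain text substitution on the dump file. Below is a minimal Python sketch. It assumes a plain-format SQL dump produced by `pg_dump --clean --if-exists` that contains pg_dump's usual `SELECT pg_catalog.set_config('search_path', '', false);` header line. `fix_search_path` and `backup_argv` are hypothetical helpers, and the recipe's actual `pg_backup.sh` may implement this differently (for instance with `sed`):

```python
# The line pg_dump writes to pin an empty search_path for the restore session.
DUMP_SEARCH_PATH = "SELECT pg_catalog.set_config('search_path', '', false);"
# The replacement from the note (same rewrite as immich's backup-and-restore docs).
RESTORE_SEARCH_PATH = "SELECT pg_catalog.set_config('search_path', 'public, pg_catalog', true);"


def backup_argv(dbname, username):
    """pg_dump with per-object DROP ... IF EXISTS + CREATE, so restore never needs DROP DATABASE."""
    return ["pg_dump", "--clean", "--if-exists", "--username", username, "--dbname", dbname]


def fix_search_path(dump_sql):
    """Rewrite pg_dump's empty search_path so VectorChord types resolve on restore."""
    if DUMP_SEARCH_PATH not in dump_sql:
        # Fail loudly rather than restore a dump that would break on unresolved types.
        raise ValueError("expected pg_dump search_path line not found; refusing to guess")
    return dump_sql.replace(DUMP_SEARCH_PATH, RESTORE_SEARCH_PATH)
```

Raising when the expected line is missing is deliberate. A silently unmodified dump would fail later, on the VectorChord types, with a far less obvious error.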
diff --git a/memory/plausible-upgrade-base-trap.md b/memory/plausible-upgrade-base-trap.md
new file mode 100644
index 0000000..6843356
--- /dev/null
+++ b/memory/plausible-upgrade-base-trap.md
@@ -0,0 +1,27 @@
+---
+name: plausible-upgrade-base-trap
+description: "plausible CI REDs come from the published 3.0.0 base deploy (no x86_64 arch → 404 → silent exit 1), not the PR tree; needs UPGRADE_BASE_VERSION=3.0.1+v2.0.0 in cc-ci tests"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: fc17c9c2-ab6e-4c11-856e-a6a6e160a0ec
+---
+
+cc-ci's upgrade tier deploys `recipe_versions[-2]` as the base before upgrading to the PR
+head (deploy-once design: the install tier asserts against that base too). For plausible,
+tags are `…, 3.0.0+v2.0.0, 3.0.1+v2.0.0` so the default base is **3.0.0+v2.0.0**, whose
+entrypoint lacks an x86_64 ARCH mapping → requests `clickhouse-backup-linux-x86_64.tar.gz`
+→ HTTP 404 always → `set -e` + silenced wget → container exits 1 with **empty service
+logs** → crash-loop → install timeout RED. Nothing in the PR can fix this: the base tag is
+immutable history.
+
+**Why:** the PR adds 3.1.0 above the newest published tag — the harness-documented case
+where `[-2]` is the wrong base and `[-1]` (3.0.1) is correct.
+
+**How to apply:** the fix is one line in the cc-ci repo (gated by --with-tests / operator):
+`tests/plausible/recipe_meta.py: UPGRADE_BASE_VERSION = "3.0.1+v2.0.0"`. The recipe-side
+hardening (verified cached binary on the persistent volume, Altinity URL, retries+timeout,
+loud hard-fail, depends_on fix) is on PR #3 (commit 9f8bcbc). Diagnosis + ask posted at
+https://git.autonomic.zone/recipe-maintainers/plausible/pulls/3#issuecomment-14261.
+Before burning a !testme on an upgrade-stage recipe, check what base version the harness
+will pick and whether that base can actually converge. See [[abra-chaos-deploy-checkout-gotcha]].
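The closing advice in the plausible note ("check what base version the harness will pick") can be made mechanical. Below is a minimal sketch. It assumes the harness behaves exactly as the note describes: the default base is `recipe_versions[-2]` (tags ordered oldest to newest), replaced by a per-recipe `UPGRADE_BASE_VERSION` when one is set. `pick_upgrade_base`, `expected_base` and `base_mismatch` are hypothetical helpers, not cc-ci's code:

```python
def pick_upgrade_base(recipe_versions, upgrade_base_version=None):
    """The base the note says the upgrade tier deploys (tags ordered oldest to newest)."""
    if upgrade_base_version is not None:
        # Per-recipe override, e.g. UPGRADE_BASE_VERSION = "3.0.1+v2.0.0".
        if upgrade_base_version not in recipe_versions:
            raise ValueError(f"override {upgrade_base_version!r} is not a published tag")
        return upgrade_base_version
    if len(recipe_versions) < 2:
        raise ValueError("need at least two published tags to use recipe_versions[-2]")
    return recipe_versions[-2]


def expected_base(recipe_versions, pr_version):
    """The published release directly below the PR's version."""
    if recipe_versions[-1] == pr_version:
        # The PR's version is already the newest tag, so the previous release is [-2].
        return recipe_versions[-2]
    # The PR adds a version above every published tag (the plausible case), so use [-1].
    return recipe_versions[-1]


def base_mismatch(recipe_versions, pr_version, upgrade_base_version=None):
    """Return (picked, expected) if the harness would deploy the wrong base, else None."""
    picked = pick_upgrade_base(recipe_versions, upgrade_base_version)
    expected = expected_base(recipe_versions, pr_version)
    return None if picked == expected else (picked, expected)
```

Take the two newest plausible tags from the note, `["3.0.0+v2.0.0", "3.0.1+v2.0.0"]`, and a PR adding 3.1.0. `base_mismatch` reports `("3.0.0+v2.0.0", "3.0.1+v2.0.0")`, and passing `"3.0.1+v2.0.0"` as the override clears it.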
diff --git a/memory/push-commits-to-remote.md b/memory/push-commits-to-remote.md
new file mode 100644
index 0000000..52eb623
--- /dev/null
+++ b/memory/push-commits-to-remote.md
@@ -0,0 +1,14 @@
+---
+name: push-commits-to-remote
+description: "Operator wants every commit pushed to git.autonomic.zone right after it's made"
+metadata:
+  node_type: memory
+  type: feedback
+  originSessionId: 7b5366a6-263c-421b-be7d-9f888067336b
+---
+
+In the cc-ci orchestrator repo (`/srv/cc-ci-orch`), push to `origin` (git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator) immediately after committing — don't leave commits sitting locally waiting to be asked.
+
+**Why:** the operator treats the remote as the source of truth / backup; local-only commits are a loss risk on this autonomous box.
+
+**How to apply:** after any `git commit` here, run `git push origin main` (or the current branch) in the same turn. The remote is already credentialed in the URL. Mind the [[orchestrator-host-hetzner]] git-identity gotcha (commit as `autonomic-bot`). This standing preference replaces the default "commit/push only when asked" for this repo.
diff --git a/memory/recipe-mirrors-public-org-blocker.md b/memory/recipe-mirrors-public-org-blocker.md
new file mode 100644
index 0000000..cb167a1
--- /dev/null
+++ b/memory/recipe-mirrors-public-org-blocker.md
@@ -0,0 +1,29 @@
+---
+name: recipe-mirrors-public-org-blocker
+description: "Recipe mirrors are public repos but the recipe-maintainers ORG is private-visibility, so anon reads 404; bot can't flip the org"
+metadata:
+  node_type: memory
+  type: project
+  originSessionId: f7960036-d990-4a21-a81e-f7c486d97fea
+---
+
+As of 2026-06-09 all 21 recipe mirrors under `recipe-maintainers` were flipped `private=false`
+(secret-scanned first), to power the Recipe Report's live PR-STATUS column via the tokenless
+same-origin proxy `report.ci.commoninternet.net/pr//` (shipped in cc-ci
+`nix/modules/reports.nix`). BUT the **org itself is `visibility: private`**, which makes Gitea 404
+all its repos for anonymous users — so the live STATUS column shows a muted "?" instead of open/✓.
+
+**Blocker:** `autonomic-bot` cannot flip the org (PATCH `/orgs/recipe-maintainers` → 403 "Must be an
+organization owner"; `is_admin=false`; the basic-auth credential lacks `write:organization` scope,
+even though the bot is in the Owners team). Confirmed model: `autonomic-cooperative` is a public org
+and its repos ARE anonymously visible; `recipe-maintainers` is private and they are not.
+
+**Why:** the whole live-status feature is dark until this is resolved. Private repos stay hidden even
+in a public org, so flipping the org public does NOT expose the four locked-private repos (`cc-ci`,
+`cc-ci-secrets`, `cc-ci-orchestrator`, `archived-cc-ci-orchestrator`).
+
+**How to apply:** operator (an org owner) must set `recipe-maintainers` org visibility to **public**
+in the Gitea UI (Settings → make org public), OR provision a token with `write:organization` scope.
+The instant that happens, the proxy returns 200 PR JSON and the column lights up — no redeploy needed.
+Verify: `curl https://report.ci.commoninternet.net/pr/cryptpad/5` should return PR JSON, not a 404.
+Related: [[push-commits-to-remote]].
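If the operator takes the token route from the recipe-mirrors note, the org flip is one Gitea API call. It targets the same `PATCH /orgs/recipe-maintainers` endpoint that returned 403 for the bot. Below is a minimal sketch using Python's standard `urllib`. It assumes Gitea's usual `/api/v1` prefix, a JSON `visibility` field, and an `Authorization: token …` header. The helper names are hypothetical. The token must come from the operator's environment, never from this repo or a memory file:

```python
import json
import urllib.request

GITEA_URL = "https://git.autonomic.zone"
ORG = "recipe-maintainers"


def org_visibility_request(token, visibility="public", base_url=GITEA_URL, org=ORG):
    """Build (but do not send) the PATCH that changes an org's visibility."""
    return urllib.request.Request(
        f"{base_url}/api/v1/orgs/{org}",
        data=json.dumps({"visibility": visibility}).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"token {token}",  # never hard-code or commit this value
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def flip_org_visibility(token, visibility="public"):
    """Send the request; without org-owner rights Gitea answers 403 and urllib raises HTTPError."""
    with urllib.request.urlopen(org_visibility_request(token, visibility)) as response:
        return json.load(response).get("visibility")
```

An operator might run `flip_org_visibility(os.environ["GITEA_TOKEN"])` (the variable name is an assumption), then repeat the note's `curl` check against the report proxy.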
diff --git a/memory/regression-canary-cadence.md b/memory/regression-canary-cadence.md
new file mode 100644
index 0000000..3c32f40
--- /dev/null
+++ b/memory/regression-canary-cadence.md
@@ -0,0 +1,14 @@
+---
+name: regression-canary-cadence
+description: "The cc-ci server regression canaries are expensive — run on polish/review/release, not every commit"
+metadata:
+  node_type: memory
+  type: feedback
+  originSessionId: 7b5366a6-263c-421b-be7d-9f888067336b
+---
+
+The cc-ci **server regression canaries** (the codified E2E pytest suite — full lifecycle on `custom-html-tiny` + `lasuite-docs` good canaries plus a known-bad false-green-guard fixture; plan: `cc-ci-plan/plan-server-regression-canaries.md`) must **NOT** run on every commit/PR.
+
+**Why:** they're slow and resource-heavy — full lifecycle on lasuite-docs is minutes and needs the live server/abra/Swarm. Running them per-commit would be wasteful and slow the loop.
+
+**How to apply:** run them **deliberately at milestones** — polishing passes, code reviews, and releases of the cc-ci server — before trusting a batch of changes, not per incremental commit. Keep them opt-in behind the `@pytest.mark.canary` marker; if ever wired to `!testme` on the cc-ci repo, gate behind a deliberate trigger (label / `--canary`), never an automatic run on every PR.
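Keeping the canaries opt-in behind `@pytest.mark.canary` fits pytest's documented pattern for skipping marked tests unless a command-line flag is given. Below is a minimal `conftest.py` sketch. It assumes the `--canary` trigger mentioned in the note becomes a pytest option. The cc-ci suite's real conftest may differ:

```python
import pytest


def pytest_addoption(parser):
    parser.addoption(
        "--canary",
        action="store_true",
        default=False,
        help="run the slow server regression canaries (milestones only)",
    )


def pytest_configure(config):
    # Register the marker so --strict-markers accepts it.
    config.addinivalue_line("markers", "canary: slow end-to-end server regression canary")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--canary"):
        return  # deliberate milestone run: leave canaries enabled
    skip_canary = pytest.mark.skip(reason="regression canary: pass --canary to run")
    for item in items:
        if "canary" in item.keywords:
            item.add_marker(skip_canary)
```

A plain `pytest` run then reports the canaries as skipped. A milestone run uses `pytest --canary -m canary` to execute only them.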
diff --git a/memory/shared-recipe-checkout-race.md b/memory/shared-recipe-checkout-race.md
new file mode 100644
index 0000000..a683c75
--- /dev/null
+++ b/memory/shared-recipe-checkout-race.md
@@ -0,0 +1,14 @@
+---
+name: shared-recipe-checkout-race
+description: Never run git checkout on ~/.abra/recipes/ on cc-ci while a CI build for that recipe is running — the harness chaos-deploys from that same working tree
+metadata:
+  node_type: memory
+  type: feedback
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+The cc-ci harness (run_recipe_ci.py) deploys the upgrade tier from the SHARED `~/.abra/recipes/` working tree on cc-ci via `abra app deploy --chaos`. Dev debugging that switches that checkout (`git checkout -f`, repro scripts) while a CI build runs makes CI deploy the wrong tree.
+
+**Why:** immich builds 229/230 went RED with "bash: /pg_backup.sh: No such file or directory" — the configs stanza wasn't in the tree CI deployed, because concurrent dev repro scripts were flipping the same checkout between base tag and PR head. A faithful manual repro with no concurrent churn mounted the config fine.
+
+**How to apply:** before triggering !testme, park the recipe checkout clean at the PR head and do zero abra/git activity on cc-ci for that recipe until the build verdicts. Also remember [[abra-chaos-deploy-checkout-gotcha]] (`abra app new` moves the checkout to the release tag). Related: [[drone-sqlite-log-extraction]], [[immich-pgvectors-drop-database-panic]]
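The "park the recipe checkout clean at the PR head" step can be checked from the orchestrator right before a `!testme`. Below is a minimal sketch using `git status --porcelain` and `git rev-parse HEAD` over `ssh cc-ci`, since the drone note records that cc-ci has no python3 on PATH. `parked_problems` and `checkout_problems` are hypothetical helpers. The check covers one moment only and cannot stop concurrent churn while the build runs, so the "zero abra/git activity" rule still applies:

```python
import subprocess


def parked_problems(porcelain_status, head_sha, pr_head_sha):
    """Return reasons a checkout is not parked clean at the PR head (empty list means parked)."""
    problems = []
    if porcelain_status.strip():
        problems.append("working tree has uncommitted or untracked changes")
    head, expected = head_sha.strip(), pr_head_sha.strip()
    if head != expected:
        problems.append(f"HEAD is {head[:12]}, expected PR head {expected[:12]}")
    return problems


def checkout_problems(recipe_dir, pr_head_sha, host="cc-ci"):
    """Inspect the recipe checkout on the CI host over ssh and apply parked_problems()."""
    def git(*args):
        # ssh hands these words to the remote shell, so a leading ~ in recipe_dir expands there.
        argv = ["ssh", host, "git", "-C", recipe_dir, *args]
        return subprocess.run(argv, check=True, capture_output=True, text=True).stdout

    return parked_problems(git("status", "--porcelain"), git("rev-parse", "HEAD"), pr_head_sha)
```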