memory: move agent memory into repo (memory/), note in AGENTS.md

Persistent agent memories now live in memory/ in this repo; the Claude auto-memory path is symlinked here so future memories land in the repo and get committed like any other change.
2026-06-09 19:25:20 +00:00
parent 330378d30d
commit 542ed0afe3
11 changed files with 194 additions and 0 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -85,6 +85,15 @@ cc-ci VM"). The orchestrator is the human's steering wheel; the loops are the en
 Never commit secret values. `.testenv`, `*.tfstate`, `*.key`/`*.pem`, and the loop runtime/clone
 dirs are gitignored. Reference secret *locations*, never their contents (`plan.md` §9).

+## Agent memory lives in `memory/` (in this repo)
+
+The orchestrator's persistent agent memory is the **`memory/`** directory of this repo — one file
+per fact with frontmatter, indexed by `memory/MEMORY.md`. The Claude auto-memory path
+(`~/.claude/projects/-srv-cc-ci-orch/memory`) is a **symlink** to it, so memories written the normal
+way land in the repo automatically. **Future memories must also go there**: after writing or
+updating a memory file (and its `MEMORY.md` index line), commit it here and push, like any other
+intentional repo change. Never put secret values in a memory file (see Hard rule).
+
 ## Commit discipline

 When the orchestrator, Builder, or assistant makes intentional repository changes here, commit them
--- a/memory/MEMORY.md
+++ b/memory/MEMORY.md
@@ -0,0 +1,11 @@
+# Memory index
+
+- [Orchestrator host: Hetzner](orchestrator-host-hetzner.md) — runs on Hetzner cpx22; rebuild cmd, loops-service bounce, git-identity gotcha
+- [Push commits to remote](push-commits-to-remote.md) — push to git.autonomic.zone right after every commit in this repo
+- [Regression canary cadence](regression-canary-cadence.md) — server E2E canaries run on polish/review/release, not every commit
+- [Recipe-mirrors public / org blocker](recipe-mirrors-public-org-blocker.md) — mirrors public but recipe-maintainers ORG is private → live PR-STATUS column dark until operator flips org public
+- [abra chaos-deploy checkout gotcha](abra-chaos-deploy-checkout-gotcha.md) — `abra app new` moves recipe checkout to release tag; checkout PR branch after, or chaos deploys wrong tree
+- [Shared recipe-checkout race](shared-recipe-checkout-race.md) — never git-checkout ~/.abra/recipes/<recipe> on cc-ci while its CI build runs; harness deploys from that tree
+- [immich pgvecto.rs DROP DATABASE panic](immich-pgvectors-drop-database-panic.md) — DROP DATABASE crashes immich's postgres image; use pg_dump --clean --if-exists + search_path rewrite
+- [Drone sqlite log extraction](drone-sqlite-log-extraction.md) — copy /data/database.sqlite from drone container, query builds→stages→steps→logs for full step output
+- [plausible upgrade-base trap](plausible-upgrade-base-trap.md) — CI REDs from published 3.0.0 base (no x86_64 arch → 404 → silent exit 1), not the PR; needs UPGRADE_BASE_VERSION=3.0.1+v2.0.0 in cc-ci tests
--- a/memory/abra-chaos-deploy-checkout-gotcha.md
+++ b/memory/abra-chaos-deploy-checkout-gotcha.md
@@ -0,0 +1,22 @@
+---
+name: abra-chaos-deploy-checkout-gotcha
+description: "abra app new moves the recipe checkout to the release tag — checkout the PR branch AFTER app new, or chaos deploys the wrong tree"
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: fc17c9c2-ab6e-4c11-856e-a6a6e160a0ec
+---
+
+On cc-ci, `abra app new <recipe>` checks out the latest *published release tag* in
+`~/.abra/recipes/<recipe>`, silently discarding whatever commit you had checked out. A
+subsequent `abra app deploy --chaos` then deploys that tag's tree, not your WIP.
+
+**Why:** abra pins app creation to the recipe's released version and moves the recipe
+checkout to do it; `--chaos` only means "deploy the working tree as-is at deploy time".
+
+**How to apply:** in the step-2b direct-deploy loop, order matters: `abra app new` first,
+*then* `git checkout <PR-branch>` in the recipe dir, then `abra app deploy --chaos`.
+Verify with the deploy overview (config versions / images) that the intended tree went out.
+Also: plausible's `.env.sample` ships `DISABLE_AUTH/DISABLE_REGISTRATION=replace-me`, which
+crash-loops the app (`binary_to_existing_atom("replace-me")`) — set them to true/false in
+any dev env. See [[regression-canary-cadence]] for related CI cadence.
--- a/memory/drone-sqlite-log-extraction.md
+++ b/memory/drone-sqlite-log-extraction.md
@@ -0,0 +1,14 @@
+---
+name: drone-sqlite-log-extraction
+description: How to read full drone CI step logs on cc-ci — copy /data/database.sqlite from the drone container and query it
+metadata: 
+  node_type: memory
+  type: reference
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+Drone on cc-ci has no on-disk logs and no API token handy. To get full step logs:
+1. `ssh cc-ci 'docker cp $(docker ps -qf name=drone):/data/database.sqlite /tmp/drone.sqlite'`, then `scp` the copy to the orchestrator (there is no python3 on cc-ci's PATH).
+2. Query with python3 sqlite3: `builds` (build_number → build_id) → `stages` (stage_build_id) → `steps` (step_stage_id) → `logs` where log_id = step_id; `log_data` is a JSON array of `{pos,out,time}` lines.
+
+**Why:** this is how the real root cause of immich CI builds 229/230 ("bash: /pg_backup.sh: No such file or directory" in the backup hook) was found after results.json/junit gave only the assertion failure. Related: [[shared-recipe-checkout-race]]
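The query chain in step 2 can be sketched as below. This is a minimal sketch, not the script that was actually used: the function name `step_logs` is hypothetical, and `stages.stage_id` and `steps.step_name` are assumed column names that the note does not spell out. Build numbers are per repository, so a database holding several repos would need an extra filter.

```python
import json
import sqlite3


def step_logs(db_path, build_number):
    """Yield (step_name, log_text) for every step of one build.

    Follows builds -> stages -> steps -> logs as described in the note.
    Assumed (not named in the note): stages.stage_id, steps.step_name.
    """
    query = """
        SELECT steps.step_name, logs.log_data
        FROM builds
        JOIN stages ON stages.stage_build_id = builds.build_id
        JOIN steps  ON steps.step_stage_id = stages.stage_id
        JOIN logs   ON logs.log_id = steps.step_id
        WHERE builds.build_number = ?
        ORDER BY stages.stage_id, steps.step_id
    """
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, (build_number,)).fetchall()
    finally:
        conn.close()
    for name, data in rows:
        # log_data may come back as bytes (BLOB) or as text.
        if isinstance(data, bytes):
            data = data.decode("utf-8", errors="replace")
        lines = json.loads(data)  # JSON array of {pos, out, time}
        lines.sort(key=lambda line: line["pos"])
        yield name, "".join(line["out"] for line in lines)
```

Called as `for name, text in step_logs("/tmp/drone.sqlite", 229): print(name, text)`, it would print each step's output in stage and step order.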
--- a/memory/immich-pgvectors-drop-database-panic.md
+++ b/memory/immich-pgvectors-drop-database-panic.md
@@ -0,0 +1,14 @@
+---
+name: immich-pgvectors-drop-database-panic
+description: "Never DROP DATABASE on immich's postgres image — pgvecto.rs worker PANICs and crashes postgres; use pg_dump --clean --if-exists instead"
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+On immich's DB image (ghcr.io/immich-app/postgres:14-vectorchord0.4.3-pgvectors0.2.0), `DROP DATABASE` destabilises the legacy pgvecto.rs (`vectors`) background worker: it loops on "IPC connection is closed unexpected" until `PANIC: ERRORDATA_STACK_SIZE exceeded` → postgres aborts (signal 6) → the app never reconverges. Per-table `DROP TABLE` is safe; only `DROP DATABASE` triggers it.
+
+**Why:** confirmed live in dev-immich and in CI build 225 DB-service logs during the immich backup/restore fix (PR #2, June 2026).
+
+**How to apply:** for a true point-in-time restore without dropping the DB, back up with `pg_dump --clean --if-exists` (per-object DROP+recreate) and on restore rewrite pg_dump's `set_config('search_path', '', false)` to `'public, pg_catalog', true` (VectorChord types unresolvable otherwise — same rewrite as docs.immich.app/administration/backup-and-restore). See the recipe's pg_backup.sh. Related: [[shared-recipe-checkout-race]], [[drone-sqlite-log-extraction]]
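The search_path rewrite described above can be sketched as a small text filter over the dump. This is an illustration only, assuming the dump contains the statement in pg_dump's usual `SELECT pg_catalog.set_config(...)` form; the function name `rewrite_search_path` is hypothetical, and the recipe's `pg_backup.sh` may do the same thing differently.

```python
import re

# The statement pg_dump emits to blank out search_path (assumed exact form).
_EMPTY_SEARCH_PATH = re.compile(
    r"SELECT pg_catalog\.set_config\('search_path', '', false\);"
)
_FIXED_SEARCH_PATH = (
    "SELECT pg_catalog.set_config('search_path', 'public, pg_catalog', true);"
)


def rewrite_search_path(dump_sql):
    """Return the dump text with the empty search_path statement replaced."""
    return _EMPTY_SEARCH_PATH.sub(_FIXED_SEARCH_PATH, dump_sql)
```

Dumps that do not contain the statement pass through unchanged, so the filter is safe to apply unconditionally on restore.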
--- a/memory/orchestrator-host-hetzner.md
+++ b/memory/orchestrator-host-hetzner.md
@@ -0,0 +1,26 @@
+---
+name: orchestrator-host-hetzner
+description: The cc-ci orchestrator runs on a Hetzner cpx22; key host facts + the git-identity gotcha
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: cd772f12-1978-47c3-894b-0ebbe0d2987f
+---
+
+The cc-ci orchestrator (loops + watchdog + this session) runs on a **Hetzner cpx22** as of
+2026-05-31, replacing the Incus VM (100.116.55.106).
+
+- Hetzner server **134487234**, public **168.119.126.100**, tailnet **cc-ci-orchestrator-1** @
+  **100.84.190.30**. Flake host **cc-ci-orchestrator-hetzner**.
+- Rebuild: `sudo nixos-rebuild switch --flake .#cc-ci-orchestrator-hetzner` from `/srv/cc-ci-orch`
+  (`/srv/cc-ci` is a symlink to it). The Bash tool runs as user **loops** (uid 1000, passwordless
+  sudo) — plain `nixos-rebuild switch` fails on the profile symlink; use `sudo`.
+- Reboot-resilience: `cc-ci-loops.service` is **enabled** (wantedBy multi-user.target); ExecStartPre
+  `reboot-log.sh` auto-logs reboots to REBOOTS.md. Its `script` runs `launch.sh start`, which
+  **stops+restarts the loops** — so any rebuild that (re)starts the unit bounces the loops (they
+  re-orient from git; harmless but noticeable).
+- **Git-identity gotcha:** the box had no git user.name/email configured; commits fail with "Author
+  identity unknown". Set per-repo to match prior commits: `autonomic-bot
+  <autonomic-bot@git.autonomic.zone>`.
+
+Full record: `cc-ci-plan/plan-orchestrator-hetzner-migration.md`.
--- a/memory/plausible-upgrade-base-trap.md
+++ b/memory/plausible-upgrade-base-trap.md
@@ -0,0 +1,27 @@
+---
+name: plausible-upgrade-base-trap
+description: "plausible CI REDs come from the published 3.0.0 base deploy (no x86_64 arch → 404 → silent exit 1), not the PR tree; needs UPGRADE_BASE_VERSION=3.0.1+v2.0.0 in cc-ci tests"
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: fc17c9c2-ab6e-4c11-856e-a6a6e160a0ec
+---
+
+cc-ci's upgrade tier deploys `recipe_versions[-2]` as the base before upgrading to the PR
+head (deploy-once design: the install tier asserts against that base too). For plausible,
+tags are `…, 3.0.0+v2.0.0, 3.0.1+v2.0.0` so the default base is **3.0.0+v2.0.0**, whose
+entrypoint lacks an x86_64 ARCH mapping → requests `clickhouse-backup-linux-x86_64.tar.gz`
+→ HTTP 404 always → `set -e` + silenced wget → container exits 1 with **empty service
+logs** → crash-loop → install timeout RED. Nothing in the PR can fix this: the base tag is
+immutable history.
+
+**Why:** the PR adds 3.1.0 above the newest published tag — the harness-documented case
+where `[-2]` is the wrong base and `[-1]` (3.0.1) is correct.
+
+**How to apply:** the fix is one line in the cc-ci repo (gated by --with-tests / operator):
+`tests/plausible/recipe_meta.py: UPGRADE_BASE_VERSION = "3.0.1+v2.0.0"`. The recipe-side
+hardening (verified cached binary on the persistent volume, Altinity URL, retries+timeout,
+loud hard-fail, depends_on fix) is on PR #3 (commit 9f8bcbc). Diagnosis + ask posted at
+https://git.autonomic.zone/recipe-maintainers/plausible/pulls/3#issuecomment-14261.
+Before burning a !testme on an upgrade-stage recipe, check what base version the harness
+will pick and whether that base can actually converge. See [[abra-chaos-deploy-checkout-gotcha]].
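The base-selection rule described in this note can be illustrated with a short sketch. It is not the harness's code: `pick_upgrade_base` and its `override` parameter are hypothetical stand-ins for the `recipe_versions[-2]` default and the `UPGRADE_BASE_VERSION` setting, and how the real harness reads that setting is an assumption.

```python
def pick_upgrade_base(recipe_versions, override=None):
    """Choose the version to deploy before upgrading to the PR head.

    Default mirrors the note: recipe_versions[-2]. `override` plays the
    role of UPGRADE_BASE_VERSION (hypothetical wiring).
    """
    if override is not None:
        if override not in recipe_versions:
            raise ValueError(f"unknown base version: {override}")
        return override
    if len(recipe_versions) < 2:
        raise ValueError("need at least two published versions")
    return recipe_versions[-2]


# Only the two newest tags are listed here; older tags are elided in the note.
published = ["3.0.0+v2.0.0", "3.0.1+v2.0.0"]
default_base = pick_upgrade_base(published)
pinned_base = pick_upgrade_base(published, override="3.0.1+v2.0.0")
print(default_base, pinned_base)
```

With these two tags the default lands on the base that cannot converge, while the pinned value selects the newer published tag, which is the check the note recommends doing before spending a `!testme`.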
--- a/memory/push-commits-to-remote.md
+++ b/memory/push-commits-to-remote.md
@@ -0,0 +1,14 @@
+---
+name: push-commits-to-remote
+description: "Operator wants every commit pushed to git.autonomic.zone right after it's made"
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: 7b5366a6-263c-421b-be7d-9f888067336b
+---
+
+In the cc-ci orchestrator repo (`/srv/cc-ci-orch`), push to `origin` (git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator) immediately after committing; don't leave commits sitting locally until someone asks for a push.
+
+**Why:** the operator treats the remote as the source of truth / backup; local-only commits are a loss risk on this autonomous box.
+
+**How to apply:** after any `git commit` here, run `git push origin main` (or the current branch) in the same turn. The remote is already credentialed in the URL. Mind the [[orchestrator-host-hetzner]] git-identity gotcha (commit as `autonomic-bot`). This standing preference replaces the default "commit/push only when asked" for this repo.
--- a/memory/recipe-mirrors-public-org-blocker.md
+++ b/memory/recipe-mirrors-public-org-blocker.md
@@ -0,0 +1,29 @@
+---
+name: recipe-mirrors-public-org-blocker
+description: "Recipe mirrors are public repos but the recipe-maintainers ORG is private-visibility, so anon reads 404; bot can't flip the org"
+metadata: 
+  node_type: memory
+  type: project
+  originSessionId: f7960036-d990-4a21-a81e-f7c486d97fea
+---
+
+As of 2026-06-09 all 21 recipe mirrors under `recipe-maintainers` were flipped to `private=false`
+(secret-scanned first), to power the Recipe Report's live PR-STATUS column via the tokenless
+same-origin proxy `report.ci.commoninternet.net/pr/<recipe>/<n>` (shipped in cc-ci
+`nix/modules/reports.nix`). BUT the **org itself is `visibility: private`**, which makes Gitea 404
+all its repos for anonymous users — so the live STATUS column shows a muted "?" instead of open/✓.
+
+**Blocker:** `autonomic-bot` cannot flip the org (PATCH `/orgs/recipe-maintainers` → 403 "Must be an
+organization owner"; `is_admin=false`; the basic-auth credential lacks `write:organization` scope,
+even though the bot is in the Owners team). Confirmed model: `autonomic-cooperative` is a public org
+and its repos ARE anonymously visible; `recipe-maintainers` is private and they are not.
+
+**Why:** the whole live-status feature is dark until this is resolved. Private repos stay hidden even
+in a public org, so flipping the org public does NOT expose the four locked-private repos (`cc-ci`,
+`cc-ci-secrets`, `cc-ci-orchestrator`, `archived-cc-ci-orchestrator`).
+
+**How to apply:** operator (an org owner) must set `recipe-maintainers` org visibility to **public**
+in the Gitea UI (Settings → make org public), OR provision a token with `write:organization` scope.
+The instant that happens, the proxy returns 200 PR JSON and the column lights up — no redeploy needed.
+Verify: `curl https://report.ci.commoninternet.net/pr/cryptpad/5` should return PR JSON, not a 404.
+Related: [[push-commits-to-remote]].
--- a/memory/regression-canary-cadence.md
+++ b/memory/regression-canary-cadence.md
@@ -0,0 +1,14 @@
+---
+name: regression-canary-cadence
+description: "The cc-ci server regression canaries are expensive — run on polish/review/release, not every commit"
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: 7b5366a6-263c-421b-be7d-9f888067336b
+---
+
+The cc-ci **server regression canaries** (the codified E2E pytest suite — full lifecycle on `custom-html-tiny` + `lasuite-docs` good canaries plus a known-bad false-green-guard fixture; plan: `cc-ci-plan/plan-server-regression-canaries.md`) must **NOT** run on every commit/PR.
+
+**Why:** they're slow and resource-heavy: a full lifecycle on lasuite-docs takes minutes and needs the live server/abra/Swarm. Running them per-commit would be wasteful and slow the loop.
+
+**How to apply:** run them **deliberately at milestones** — polishing passes, code reviews, and releases of the cc-ci server — before trusting a batch of changes, not per incremental commit. Keep them opt-in behind the `@pytest.mark.canary` marker; if ever wired to `!testme` on the cc-ci repo, gate behind a deliberate trigger (label / `--canary`), never an automatic run on every PR.
--- a/memory/shared-recipe-checkout-race.md
+++ b/memory/shared-recipe-checkout-race.md
@@ -0,0 +1,14 @@
+---
+name: shared-recipe-checkout-race
+description: Never run git checkout on ~/.abra/recipes/<recipe> on cc-ci while a CI build for that recipe is running — the harness chaos-deploys from that same working tree
+metadata: 
+  node_type: memory
+  type: feedback
+  originSessionId: 85355980-5e4f-4f90-b1ca-d0e4fe82f04b
+---
+
+The cc-ci harness (run_recipe_ci.py) deploys the upgrade tier from the SHARED `~/.abra/recipes/<recipe>` working tree on cc-ci via `abra app deploy --chaos`. Dev debugging that switches that checkout (`git checkout -f`, repro scripts) while a CI build runs makes CI deploy the wrong tree.
+
+**Why:** immich builds 229/230 went RED with "bash: /pg_backup.sh: No such file or directory" — the configs stanza wasn't in the tree CI deployed, because concurrent dev repro scripts were flipping the same checkout between base tag and PR head. A faithful manual repro with no concurrent churn mounted the config fine.
+
+**How to apply:** before triggering !testme, park the recipe checkout clean at the PR head and do zero abra/git activity on cc-ci for that recipe until the build reports its verdict. Also remember [[abra-chaos-deploy-checkout-gotcha]] (`abra app new` moves the checkout to the release tag). Related: [[drone-sqlite-log-extraction]], [[immich-pgvectors-drop-database-panic]]