refactor(1b): RL6 — move Builder protocol files into machine-docs/ (README stays root)

git mv STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md -> machine-docs/. README.md kept at root (operator decision).

Updated in-repo refs to machine-docs/...: README (status line, lint section, Loop-state section) and docs/install.md.

Safe to move now: launch.sh already has resolve_state() (prefers machine-docs/, else root), used by every STATUS/REVIEW read, and the running watchdog (pid 133191) was restarted AFTER that update, so it is location-agnostic.

scripts/lint.sh: lint PASS post-move. The Adversary moves its own REVIEW*.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 22:35:30 +01:00
parent ffb1c98225
commit 992d87cfcd
12 changed files with 26 additions and 21 deletions
--- a/machine-docs/BACKLOG-1b.md
+++ b/machine-docs/BACKLOG-1b.md
@ -0,0 +1,47 @@
+# BACKLOG — Phase 1b (review & lint pass)
+
+Phase-namespaced backlog. Builder owns `## Build backlog`; Adversary owns `## Adversary findings`.
+
+## Build backlog
+
+### W0 — Tooling + format (RL1) — DONE (Adversary PASS @2026-05-27)
+- [x] Add lint tooling to the flake: a `lint` devshell (nixpkgs-fmt, statix, deadnix, ruff,
+      shellcheck, shfmt, yamllint) built from the pinned nixpkgs.
+- [x] Add a `lint` entrypoint script (`scripts/lint.sh`) with check + `--fix` modes; tool configs
+      (ruff, yamllint, etc.).
+- [x] Auto-format the codebase (nix + python + shell).
+- [x] Fix remaining lint findings (statix/deadnix/ruff-lint/shellcheck) without weakening any test.
+- [x] Wire a `lint` stage into `.drone.yml` (push event); verified green from a clean checkout
+      (Adversary cold PASS + break-it probe).
+
+### W1 — Review checklist + fixes (RL2)
+- [x] Run the §3 white-box checklist (Builder side): all blocking invariants hold (tests-real,
+      harness-DRY, nix-idempotent, no-footguns, no-secrets, log-redaction); no fix needed; no advisory
+      to file. Recorded in JOURNAL-1b. Awaiting Adversary's own §3 pass #2 to confirm RL2.
+
+### W2 — Re-verify + document (RL3/RL4)
+- [x] RL4 docs: README "Linting & formatting" (local + CI-enforced); architecture.md `nix/` layout;
+      decisions in DECISIONS.md (lint tooling, RL5/RL6).
+- [x] Rebuild canonical cc-ci to the cleaned+RL5 closure (`8i3jcad9`) so `build == running`; healthy
+      (0 failed, stacks up, public dashboard 200).
+- [ ] **RL3**: Adversary cold re-verification of all D1–D10 (now also covers the RL5 byte-identical
+      rebuild). Gate claimed in STATUS-1b.
+- [ ] On full PASS handshake, write `## DONE` to STATUS-1b.md.
+
+### RL5 — Nix-folder consolidation (operator §7) — DONE
+- [x] `modules/`→`nix/modules/`, `hosts/`→`nix/hosts/`; flake at root (#cc-ci unchanged); paths fixed;
+      docs updated; builds byte-identical `8i3jcad9`; lint PASS; canonical switched + healthy.
+
+### RL6 — protocol files → machine-docs/ (operator §7) — DEFERRED (coordinated, LAST)
+- [ ] `git mv STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md machine-docs/` (README stays root);
+      update refs. MUST be done in lockstep with the orchestrator (launch.sh + watchdog restart). Do it as
+      the final 1b step and flag the orchestrator first; never while a phase transition is pending.
+
+### Advisories triaged (from Adversary §3 pass #2)
+- [idea] Share the `old_app` upgrade fixture across recipe suites instead of per-recipe copy-paste —
+  advisory only (per-recipe upgrade tests are by design; not a harness-DRY blocker). Defer to Phase 2.
+- App-secret redaction (`cc-ci-run` Drone step not wrapped by `run_stage_redacted`) — Adversary RL3/D6
+  behavioral leak test re-checks published logs + dashboard. Adversary-owned watch-item.
+
+## Adversary findings
+(empty — Adversary owns this section)
--- a/machine-docs/BACKLOG-1c.md
+++ b/machine-docs/BACKLOG-1c.md
@ -0,0 +1,56 @@
+# BACKLOG — Phase 1c
+
+Single-writer rule (§6.1): Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
+
+## Build backlog
+
+Method W1–W6 from the phase plan §5. Each milestone ends with an Adversary gate.
+
+- [x] **W2 — Secrets repo + cert into git.** (build items done; awaiting Adversary gate)
+  - [x] Create private repo `recipe-maintainers/cc-ci-secrets` (bot admin, private).
+  - [x] Move secrets + add wildcard cert+key as sops secrets (root `secrets.yaml`; sha256 verified).
+  - [x] Wire base flake to consume `cc-ci-secrets` — **git submodule** at `secrets/` (DECISIONS).
+  - [x] secrets.nix: `wildcard_cert`/`wildcard_key` → `path=/var/lib/ci-certs/live/*`.
+  - [x] proxy.nix: cert reframed as sops-from-git.
+  - [x] Verify byte-identical `build`==`/run/current-system` (`vh6vwxbl…`); git-clone `?submodules=1` matches too.
+  - [x] Verify clean switch on cc-nix-test; live TLS served from git cert (ssl_verify=0).
+  - [x] **Gate W2 CLAIMED** → Adversary verifies byte-identical + TLS-from-git-cert.
+- [x] **W1 — Headroom.** Resized `cc-nix-test` 6→4 GB (stop→PATCH→start via Incus API); healthy at 4 GB,
+      0 failed units, all stacks 1/1, cert survived reboot via sops, TLS 200. Running RAM 8 GB.
+- [x] **W3 — Throwaway VM.** `ccci-throwaway` (incus-base, 4 GB/20 GB) reachable at 100.126.124.86
+      (used live TS_AUTH_KEY; workspace key stale). Bootstrap age key provisioned in W4.
+- [x] **W4 — Reproducible live rebuild.** Fresh blank VM + recovery age key only → `git clone
+      --recursive` + ONE `nixos-rebuild switch ?submodules=1` → running/0-failed, byte-identical
+      `ld19aj2`==cc-ci, 6 stacks 1/1, all secrets+cert decrypt, TLS leaf==git cert. Found+fixed a
+      concurrent-abra race (serialized reconcilers). **Gate W4 CLAIMED** (awaiting Adversary W5).
+- [ ] **W5.5 — Functional-acceptance e2e (E2E-TESTME, operator-gated).** Authority:
+      `cc-ci-plan/test-e2e-testme-acceptance.md`. After C4/C5 PASS + orchestrator renames rebuilt VM→
+      cc-nix-test + confirms public gateway + SIGNALS: `!testme` (bot) on a fast enrolled recipe
+      (custom-html); verify E1–E6 (self-check 200/cert → new Drone build via bridge → app reachable
+      EXTERNALLY at `<app>.ci.commoninternet.net` w/ valid cert+content → real assertions pass → clean
+      undeploy → reported). Evidence→JOURNAL-1c, verdict→STATUS/REVIEW-1c. Fail⇒fix in git, re-run.
+      Do NOT start before the signal; keep VM stack up. Adversary independently verifies.
+- [ ] **W5 — Adversary cold proof + honest D8.** Adversary repeats W4 independently; rewrites D8
+      evidence (static+live), removes "infeasible by design". Accept: Adversary D8 live-rebuild PASS
+      (or narrow signed-off limitation per C5).
+- [ ] **W6 — Cleanup + docs + final sizing.** Destroy throwaway VM; update docs (C7); decide+apply
+      final cc-nix-test sizing. Accept: no leftover; docs match; flip STATUS-1c → `## DONE`.
+
+## Adversary findings
+
+- [x] **ADV-1c-1 [adversary] — `docs/architecture.md` not updated to the 1c model (blocks C7). CLOSED @2026-05-27 20:10Z (Adversary re-verified).**
+  Fixed by Builder (`6276bfd`/`2a5affc`). Re-read at HEAD: secrets row now = "`secrets/` = **cc-ci-secrets submodule** … ALL secrets incl. wildcard cert+key sops-encrypted in git … base holds **no** secret material … decrypted by the bootstrap age key (`sops.age.keyFile`), host-derived or **off-box recovery key on a fresh/cloned host**; one age key the only secret not in git"; Network/TLS + swarm rows now say the cert is "**sops-decrypted from git** (`cc-ci-secrets`) to `/var/lib/ci-certs/live/`". No stale pre-1c phrasing remains. → C7 met. (Minor non-blocking note: the *external* orchestrator doc `/srv/cc-ci/cc-ci-plan/plan.md §1.5/§4.0/§4.4` still has pre-1c cert wording, but it's outside the repo / not loop-git-managed and not the doc a new engineer installs from — the repo docs install/secrets/architecture are authoritative and correct.)
+
+  ~~Original finding:~~
+  C7 requires `architecture.md` reflect the new model, but it still describes the **pre-1c** layout:
+  - Line ~17 (secrets row): "`modules/secrets.nix` + `secrets/secrets.yaml` (sops-nix) | Infra secrets,
+    decrypted at activation **via the host SSH key** as the age identity" — no mention of the private
+    **`cc-ci-secrets` repo / git submodule** split, the **recovery age key** bootstrap for a fresh host,
+    or that the **wildcard cert+key are sops secrets in git** (C1/C2/C3 — the core of 1c).
+  - §Network/TLS (lines ~40–41): cert described as "**pre-issued** wildcard cert at
+    `/var/lib/ci-certs/live/`" (out-of-band), not **sops-decrypted-from-git** to that path.
+  Repro: `grep -n "host SSH key\|secrets/secrets.yaml\|pre-issued wildcard" docs/architecture.md`.
+  A new engineer reading it gets the wrong mental model of where secrets/cert live. **Fix:** update the
+  secrets row + Network/TLS section to the 1c model (cc-ci-secrets submodule, cert sops-in-git decrypted
+  at activation, recovery-key as the one out-of-band bootstrap secret), consistent with install.md/secrets.md.
+  Only the Adversary closes this, after re-reading the updated doc. (Doc gap — not a VETO.)
--- a/machine-docs/BACKLOG.md
+++ b/machine-docs/BACKLOG.md
@ -0,0 +1,231 @@
+# BACKLOG — cc-ci
+
+Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adversary edits only
+`## Adversary findings`. Closing an item = checking the box in your own section.
+
+## Build backlog
+
+### M0 — Foundations
+- [x] Author flake.nix (NixOS host cc-ci) + hosts/cc-ci/{configuration,hardware}.nix from baseline
+- [x] Deploy mechanism decision + first rebuild from repo (DECISIONS.md) — switch --flake on host
+- [x] sops-nix wiring: host age key (from ssh host key) + master recovery key; secrets/secrets.yaml;
+      decrypt a test secret on host → /run/secrets/test_secret (0400 root) verified
+- [x] Gate: M0 — `ssh cc-ci 'systemctl is-system-running'` healthy after rebuild from repo
+      → CLAIMED 2026-05-26, awaiting Adversary (see STATUS.md)
+
+### M1 — Swarm + abra target
+- [x] Docker + single-node swarm via Nix (modules/swarm.nix: docker + swarm-init oneshot + `proxy`
+      overlay net + daily autoprune). Verified: Swarm=active, proxy overlay present.
+- [x] Proxy = real coop-cloud/traefik via abra (orchestrator decision, replaces custom traefik.nix):
+      wildcard/file-provider mode, pre-issued cert as ssl_cert/ssl_key swarm secrets, LETS_ENCRYPT_ENV
+      empty → no ACME. `scripts/deploy-proxy.sh` (idempotent). Verified E2E via gateway: wildcard cert
+      served, 0 ACME log lines.
+- [x] abra installed (modules/abra.nix, pinned 0.13.0-beta); deployed custom-html by hand over HTTPS
+      (HTTP 200 nginx page via gateway) and tore it down clean (services/volumes/secrets/containers=0).
+- [x] Gate: M1 — recipe reachable over HTTPS at *.ci.commoninternet.net, torn down clean →
+      CLAIMED 2026-05-26, awaiting Adversary.
+
+### M2 — Drone online
+- [x] Drone server (coop-cloud recipe, reconcile oneshot) + exec runner via Nix; Gitea OAuth app.
+      Server healthz 200 via gateway; runner polling (capacity=2, type=exec).
+- [x] hello-world .drone.yml runs green; logs visible (Drone UI + API). Build #1 success: clone +
+      hello (echo/whoami=root/abra 0.13.0-beta/swarm=active), both exit 0.
+- [x] Gate: M2 — push to cc-ci triggers visible green build → CLAIMED 2026-05-26, awaiting Adversary.
+      OAuth link via one-time `scripts/bootstrap-drone-oauth.sh` (documented in install.md §2).
+
+### M3 — Comment bridge
+- [x] comment-bridge service: polling PRIMARY (read-only, ≤30s) + optional admin webhook; !testme
+      exact match; org-membership auth (`GET /orgs/{owner}/members/{user}` 204) + allowlist; Drone API
+- [x] PR comment posting with run link
+- [x] Gate: M3 — live demo on scratch PR; auth enforced → CLAIMED 2026-05-27. Posted `!testme` on
+      PR #1 → poll fired in 6s → Drone build #26 for head d397720a → bridge commented run link back.
+      Org-membership auth verified (bot/trav/notplants 204, non-member 404 at read level).
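
A minimal Python sketch of the M3 trigger rules listed above: exact `!testme` match, org-membership auth via `GET /orgs/{owner}/members/{user}` (204 = allowed) or an allowlist, failing closed on any error, and a comment-id seen-set shared by the poller and the optional webhook so a comment fires at most once. The comment shape (`id`, `user`, `body`) is simplified, and `Bridge`, `is_authorized` and `membership_status` are hypothetical names, not the bridge's actual code.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Set

TRIGGER = "!testme"


def is_trigger(body: str) -> bool:
    # Exact match; ignoring surrounding whitespace is an assumption of this sketch.
    return body.strip() == TRIGGER


def membership_status(api_base: str, owner: str, user: str, auth_header: str) -> int:
    """HTTP status of GET /orgs/{owner}/members/{user}: 204 = member, 404 = not."""
    req = urllib.request.Request(
        f"{api_base}/orgs/{owner}/members/{user}",
        headers={"Authorization": auth_header},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def is_authorized(user: str, owner: str, allowlist: Set[str],
                  fetch_status: Callable[[str, str], int]) -> bool:
    if user in allowlist:
        return True
    try:
        return fetch_status(owner, user) == 204
    except Exception:
        # Fail closed: network errors, timeouts, etc. never authorize.
        return False


class Bridge:
    def __init__(self, owner: str, allowlist: Iterable[str],
                 fetch_status: Callable[[str, str], int],
                 trigger_build: Callable[[Dict], None]):
        self.owner = owner
        self.allowlist = set(allowlist)
        self.fetch_status = fetch_status
        self.trigger_build = trigger_build
        self.seen: Set[int] = set()  # shared by the poller and the /hook path

    def prime(self, comments: Iterable[Dict]) -> None:
        # First poll after startup: mark existing comments seen so old ones never fire.
        self.seen.update(c["id"] for c in comments)

    def handle(self, comment: Dict) -> bool:
        """Process one comment from either path; True if a build was triggered."""
        if comment["id"] in self.seen:
            return False
        self.seen.add(comment["id"])
        if not is_trigger(comment["body"]):
            return False
        if not is_authorized(comment["user"], self.owner, self.allowlist, self.fetch_status):
            return False
        self.trigger_build(comment)
        return True
```

In a live setup `fetch_status` would wrap `membership_status` with the bot's read-level credentials; injecting it keeps the auth rule testable without a Gitea instance.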
+
+### Bridge→Drone→harness integration (connects M3 trigger to M4/M5 recipe CI; blocks D2/D10 via !testme)
+- [x] Add a recipe-CI pipeline to `.drone.yml` keyed on `event=custom`: runs
+      `cc-ci-run runner/run_recipe_ci.py` STAGES=install,upgrade,backup, `CCCI_JANITOR_MAX_AGE=0`,
+      `concurrency:{limit:1}`, `HOME=/root`. Self-test pipeline now `event=push`. (commits 9d51cb6+)
+- [x] Verify a recipe build runs the full 3-stage CI through Drone (not self-test): **build #33 →
+      success**, install/upgrade/backup all green, clean teardown (0 orphans). HOME + backup `-C -o`
+      + clean-reclone fixes applied.
+- [ ] Full single-comment E2E: enroll a recipe in the bridge `POLL_REPOS` + open a recipe PR →
+      `!testme` → full 3-stage CI + PR comment outcome (folds into M6.5/M10 breadth).
+
+### M4 — Harness + install stage
+- [x] run_recipe_ci.py + conftest + harness (abra wrappers, lifecycle) + Nix python/playwright env
+      (cc-ci-run); install stage for recipe #1 (custom-html) + Playwright assertion; guaranteed teardown
+- [x] Gate: M4 — green install run, no orphaned app/volume → CLAIMED 2026-05-27, awaiting Adversary.
+      Repro: `cd /root/cc-ci && RECIPE=custom-html PR=0 REF=m4demo cc-ci-run runner/run_recipe_ci.py`
+      → 2 passed (http 200 + playwright); teardown leaves services/volumes/secrets/containers/env = 0.
+
+### M5 — Upgrade + backup/restore stages
+- [x] Add upgrade + backup/restore stages for recipe #1 (custom-html). backup-bot-two deployed as a
+      reconcile oneshot (modules/backupbot.nix). Data marker served via nginx for assertions.
+- [x] Gate: M5 — upgrade preserves data; backup→mutate→restore returns original → CLAIMED 2026-05-27.
+      Full 3-stage run green: install(2)+upgrade(1)+backup(1) passed; teardown leaves 0 orphans, infra intact.
+
+### M6 — Recipe-local tests + second recipe
+- [x] D4 recipe-local discovery: recipe-shipped tests/ snapshotted post-fetch + run against the live
+      app as a `recipe-local` stage (contract CCCI_BASE_URL/CCCI_APP_DOMAIN). Demo'd via mirror branch
+      recipe-maintainers/custom-html@ci/d4-recipe-local → recipe-local test PASSED against live app.
+- [x] Enroll DB-backed recipe #2 (keycloak + mariadb) via per-recipe tests/keycloak/ only (no harness
+      surgery): install green (realm health + Playwright admin login). docs/enroll-recipe.md written.
+- [x] Gate: M6 — both recipes green (custom-html 3-stage; keycloak install) + recipe-local merged →
+      CLAIMED 2026-05-27. keycloak full 3-stage (DB data survival) folds into the M6.5 breadth ramp.
+
+### M6.5 — Breadth ramp (recipes 3→6)
+- [x] keycloak (SSO/DB-backed, recipe #2) full 3-stage green through the Drone recipe-ci pipeline:
+      build #39 success (~31m): install 2✓ (realm health + Playwright admin login), upgrade 1✓
+      (`test_upgrade_preserves_realm` — DB data survives), backup 1✓ (`test_backup_mutate_restore`).
+      Clean teardown (0 keyc services/volumes). Proves DB-backed data survival + integration path.
+- [x] cryptpad (stateful/no-DB, recipe #3) full 3-stage green on host (cc-ci-run): install 2✓
+      (http + Playwright), upgrade 1✓ (marker in cryptpad_data survives), backup 1✓
+      (`test_backup_mutate_restore`). No harness surgery — added generic per-recipe EXTRA_ENV
+      (handles cryptpad's SANDBOX_DOMAIN). Fixed a real backup bug en route: set_env glued
+      RESTIC_REPOSITORY onto a comment → backupbot had no restic repo (now newline-safe). Drone
+      canonical run = **build #46 success** (~6m, all 3 stages green, clean teardown).
+- [x] matrix-synapse (DB+media/large-volume, recipe #4) full 3-stage green on host: install 2✓
+      (client API + versions JSON), upgrade 1✓ (postgres marker survives), backup 1✓ — exercises the
+      recipe's pg_backup.sh DB-dump hook (not a plain volume copy). No harness surgery. Drone
+      canonical run = **build #51 success** (~10.5m, all 3 stages green, clean teardown).
+- [x] lasuite-docs (multi-service + S3/MinIO, recipe #5) full 3-stage green on host: install 2✓
+      (9-service stack converges + SPA + Playwright), upgrade 1✓ (postgres marker survives), backup
+      1✓ (pg_backup.sh hook). Fixed deploy timeout (cold-pull of ~9 images > abra 300s) via
+      TIMEOUT=900 EXTRA_ENV; OIDC config-only so starts healthy w/ placeholder. Drone canonical run
+      = **build #57 success** (all 3 stages green, clean teardown).
+- [x] n8n (workflow automation, recipe #6 — bluesky-pds swapped out per DECISIONS) full 3-stage
+      green on host: install 2✓ (/healthz + Playwright editor), upgrade 1✓ (marker in /home/node/.n8n
+      survives), backup 1✓ (backupbot.backup.path file backup). Drone canonical run = **build #63
+      success** (~5.5m, all 3 stages green, clean teardown).
+- [ ] Re-verify keycloak backup post set_env fix (build #39 ran off an earlier backupbot deploy)
+- [x] Gate: M6.5 — recipes 3–6 three-stage green → **CLAIMED 2026-05-27**. All 6 D10 recipes have a
+      full 3-stage green run (host + canonical Drone): custom-html, keycloak(#39), cryptpad(#46),
+      matrix-synapse(#51), lasuite-docs(#57), n8n(#63). All 5 categories covered; D5 no-harness-surgery
+      held (per-recipe tests/<recipe>/ + recipe_meta EXTRA_ENV only). Awaiting Adversary.
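
The cryptpad-era set_env bug can be illustrated with a minimal Python sketch, assuming (as "now newline-safe" suggests) that the `.env` ended without a trailing newline, so an appended `RESTIC_REPOSITORY=...` landed on the same line as a comment and was ignored. `naive_append` and `set_env` here are hypothetical helpers, not the harness's code.

```python
import re


def naive_append(text: str, key: str, value: str) -> str:
    # Buggy pattern: if the file does not end in "\n", the new assignment is
    # glued onto the last line (here, a comment) and silently disappears.
    return text + f"{key}={value}\n"


def set_env(text: str, key: str, value: str) -> str:
    """Set KEY=value in .env-style text, newline-safe (hypothetical helper)."""
    line = f"{key}={value}"
    pattern = re.compile(rf"^{re.escape(key)}=.*$", re.MULTILINE)
    if pattern.search(text):
        # Replace existing assignments in place; a function replacement keeps
        # backslashes in the value literal (a str replacement would parse them).
        return pattern.sub(lambda _m: line, text)
    if text and not text.endswith("\n"):
        text += "\n"  # start the new assignment on its own line
    return text + line + "\n"
```

Commented-out lines such as `#KEY=old` do not match `^KEY=`, so the sketch appends a live assignment instead of editing the comment.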
+
+### M7 — Secrets hardening (D6)
+- [x] Full sops model + rotation doc (docs/secrets.md: 3 classes, decryption chain, rotation per
+      class) + log redaction filter (run_recipe_ci masks /run/secrets/* values in stage output,
+      live-streaming preserved). Adversary leak scans clean (baseline + recipe-CI logs).
+- [x] Gate: M7 — secret-grep finds nothing → **CLAIMED 2026-05-27**. No-plaintext: harness never
+      prints secrets, abra doesn't echo generated ones, reconciles redirect secret-gen to /dev/null,
+      dashboard shows status only; redaction filter as belt-and-suspenders. Awaiting Adversary
+      (re-grep published logs + dashboard; optionally follow a rotation procedure).
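
A minimal Python sketch of the M7 redaction idea: collect the values under `/run/secrets`, then rewrite each line of a stage's output as it streams, so masking does not force buffering the whole log. The function names, the mask string and the 4-character minimum are assumptions for illustration; this is not the harness's `run_recipe_ci` implementation.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, Iterator, List, TextIO

MASK = "***REDACTED***"


def load_secret_values(secrets_dir: str = "/run/secrets", min_len: int = 4) -> List[str]:
    """Secret values to mask, longest first so a secret containing another is masked whole."""
    root = Path(secrets_dir)
    if not root.is_dir():
        return []
    values = set()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        try:
            raw = path.read_text(errors="replace").strip()
        except OSError:
            continue
        # Multi-line secrets (e.g. PEM keys) are also masked line by line,
        # because the filter below works on one output line at a time.
        for candidate in [raw, *raw.splitlines()]:
            candidate = candidate.strip()
            if len(candidate) >= min_len:  # very short values would mask noise
                values.add(candidate)
    return sorted(values, key=len, reverse=True)


def redact_stream(lines: Iterable[str], secrets: List[str]) -> Iterator[str]:
    for line in lines:
        for secret in secrets:
            line = line.replace(secret, MASK)
        yield line


def stream_redacted(cmd: List[str], secrets: List[str], out: TextIO = sys.stdout) -> int:
    """Run cmd, echoing merged stdout/stderr line by line (live) with secrets masked."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                          text=True, bufsize=1) as proc:
        assert proc.stdout is not None
        for line in redact_stream(proc.stdout, secrets):
            out.write(line)
            out.flush()
    return proc.returncode
```

Line-by-line masking cannot catch a secret split across two output lines, which fits the item's framing of the filter as belt-and-suspenders on top of never printing secrets in the first place.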
+
+### M8 — Dashboard (D7)
+- [x] Overview page + badges: dashboard/dashboard.py + modules/dashboard.nix — live at
+      ci.commoninternet.net/, lists the 6 recipes w/ pass/fail/running badges + run links, plus
+      /badge/<recipe>.svg. Verified via gateway; /hook still routes to bridge. (content-hash image
+      tag so the swarm service rolls on code change.)
+- [x] PR-comment outcome reflection: bridge watcher polls the Drone build to completion + edits its
+      run comment to ✅ passed / ❌ <status> (Gitea PATCH). Verified: fresh !testme on PR #1 → comment
+      edited to "❌ failure → …/76" within ~20s.
+- [x] [idea] gave the bridge image a content-hash tag (fixed latent `:latest` no-roll issue)
+- [x] Gate: M8 — overview matches reality; outcomes mirrored → **CLAIMED 2026-05-27**. Dashboard
+      overview lists the 6 recipes w/ correct status badges (live, gateway-verified); PR comments link
+      back AND reflect final pass/fail. Awaiting Adversary.
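
A content-hash image tag can be sketched in Python as follows. The idea is that a fixed `:latest` tag leaves the swarm service spec unchanged between deploys, whereas a tag derived from the source bytes changes exactly when the code does, so the service spec changes and the service rolls. The function name, the 12-character length and the hashing layout are assumptions; the real modules may derive the tag differently (for example inside Nix).

```python
import hashlib
from pathlib import Path


def content_hash_tag(src_dir: str, length: int = 12) -> str:
    """Deterministic image tag derived from every file under src_dir."""
    root = Path(src_dir)
    digest = hashlib.sha256()
    files = sorted((p for p in root.rglob("*") if p.is_file()),
                   key=lambda p: p.relative_to(root).as_posix())
    for path in files:
        data = path.read_bytes()
        # Hash the relative path and a length prefix too, so renames and
        # content shifting between files also change the tag.
        digest.update(path.relative_to(root).as_posix().encode() + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()[:length]
```

Usage would look like `f"cc-ci-dashboard:{content_hash_tag('dashboard')}"`, where the image name is hypothetical.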
+
+### M9 — Reproducibility + docs (D8/D9)
+- [x] D9 docs complete: README + docs/{install,enroll-recipe,secrets,architecture,runbook,baseline}.
+      Covers architecture, enroll a recipe, add/run tests locally, operate/rotate secrets, debug a
+      failed run. install.md = from-scratch path (clone + nixos-rebuild + operator preconditions).
+- [ ] Gate: M9 — Adversary rebuilds from docs on throwaway host (D8) — Adversary action; install.md
+      ready. (Note: a from-scratch rebuild pulls images → needs the registry creds / quota too.)
+
+### M10 — Proof (D10)
+- [x] **All 6 recipes green via REAL !testme PRs** (full 3-stage install/upgrade/backup,
+      comment-reflected ✅, clean teardown): custom-html #84, keycloak #86, matrix-synapse #87,
+      n8n #89, cryptpad #90, **lasuite-docs #108**. All 5 D10 categories covered.
+- [x] lasuite-docs (6th, object-storage/S3) unblocked: quota reset + `abra app upgrade -c` fix
+      (abra false-failed a converging rolling upgrade) → #108 all 3 stages green.
+- [x] Gate: M10 — six recipes green via !testme → **CLAIMED 2026-05-27**, awaiting Adversary D10
+      verification.
+- [ ] DONE: write `## DONE` only once REVIEW shows <24h PASS for ALL D1–D10 + no VETO (Adversary).
+
+## Adversary findings
+<!-- Adversary-only section. Builder must not edit below this line. -->
+
+- [x] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
+      **CLOSED @2026-05-27T00:35Z** by Adversary re-test. `runner/harness/lifecycle.deploy_app`
+      calls `abra.env_set(domain, "LETS_ENCRYPT_ENV", "")` before every deploy. Verified on a live
+      harness app (`cust-c95a69`): env `LETS_ENCRYPT_ENV=` empty, no `certresolver` label, **0 ACME
+      log lines**, and the served cert is the **wildcard** `CN=*.ci.commoninternet.net` (verify ok)
+      — not a per-host ACME cert. No-ACME holds for harness deploys. (Structural belt-and-suspenders
+      — dropping the unused `certificatesResolvers` from traefik — remains a nice-to-have, tracked
+      under A3/M7, not required to close A1.)
+
+- [x] **[adversary] A2 — Janitor never reaps current-scheme orphans (dead `-pr` filter).**
+      **CLOSED @2026-05-27T10:45Z** by Adversary live re-test of the fix. Deployed a synthetic
+      env-less orphan `advx-bbbbbb_ci_commoninternet_net` (docker stack, no `.env` — the case the old
+      `-pr` filter AND abra-ls both miss). (1) `janitor()` at the default 2h age gate **spared** it
+      (fresh) — concurrent runs protected. (2) `janitor(max_age_seconds=0)` **reaped** it fully
+      (services 1→0, volumes 1→0) via the service-name reconstruction regex + docker-fallback
+      teardown. Janitor now matches the real `<tag>-<6hex>` scheme and reaps even `.env`-gone orphans.
+      Original finding below.
+      Found during M4 review. `harness.lifecycle.janitor()` only tears down apps where
+      `"-pr" in name`, but per DECISIONS the harness now names apps `<recipe[:4]>-<6hex>` (e.g.
+      `cust-c95a69`) — **no `-pr` substring**. So the run-start crash-recovery sweep (§4.3: "nuke
+      any orphaned `*-pr*` apps") matches **nothing** and is effectively a no-op. The happy-path
+      finalizer in `conftest.deployed_app` does work (observed: `cust-e084bd` from a prior run was
+      torn down), but a run that crashes/reboots *before* the finalizer runs leaves an orphan that
+      no later run will reap. *Fix:* match the actual naming (e.g. regex `^[a-z]{1,4}-[0-9a-f]{6}\.`
+      or a dedicated CI label/prefix) and gate on age. *Re-test:* deploy a harness app, simulate a
+      crash (kill the run before teardown), then start a new run and confirm janitor reaps the
+      orphan. Adversary closes after re-test.
+      **Re-test progress @2026-05-27T05:00Z (fix b7a2d70):** the reaping *mechanism* is verified —
+      janitor now matches the real naming via `RUN_APP_RE` (`^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci…`,
+      matches `cust-c95a69`) AND reconstructs `.env`-gone orphans from orphaned *service* names
+      (regex matches my synthetic `advx-aaaaaa_ci_commoninternet_net_app`), with an age gate to spare
+      concurrent runs, then reaps via `teardown_app` (verified clean under A3). **Still pending:** one
+      live `janitor()` end-to-end sweep — needs `CCCI_JANITOR_MAX_AGE=0`, which would also reap the
+      Builder's live apps, so it must run on an **idle host**. Will close then.
+
+- [x] **[adversary] A3 — Teardown is unverified/best-effort; a failure silently orphans + run stays green.**
+      **CLOSED @2026-05-27T05:00Z** by Adversary re-test of the Builder's fix (commit b7a2d70).
+      `teardown_app` now: `undeploy` → if the service persists, `docker stack rm` **fallback** (needs
+      no `.env`) → remove volumes/secrets *by stack name* (retry loop) → drop `.env` LAST → **verify**
+      `_residual()` and raise `TeardownError` if anything remains. Empirical worst-case test: I
+      `docker stack deploy`-ed a synthetic orphan `advx-aaaaaa_ci_commoninternet_net` (service +
+      volume + network, **no `.env`** — exactly the crash-orphan that defeated the old code), then
+      called `lifecycle.teardown_app("advx-aaaaaa.ci.commoninternet.net")` → returned OK (verify
+      passed) and afterwards services/volumes/networks = **0**. So a `.env`-less orphan is fully
+      reaped and teardown is now verified (would raise on residual). Original finding below.
+      Found during M4 review (to confirm empirically with a kill-mid-run probe). `lifecycle.teardown_app`
+      runs every abra call with `check=False` and "never raises"; the conftest finalizer never
+      asserts teardown succeeded. Worse, `abra.app_config_remove` deletes the app `.env`
+      **unconditionally**, even if `abra.undeploy` failed first — leaving the swarm service+volume
+      running but with no `.env`, so the app can no longer be managed/undeployed via abra (and a
+      fixed janitor that shells `abra app undeploy` couldn't reap it either). Net: a partial teardown
+      leaves a silent orphan while pytest still reports the run **green**, so the M4/D2 guarantee
+      "no orphaned app/volume afterward" is not actually *verified* by the harness. *Fix:* assert
+      post-teardown that the stack/services/volumes/secrets are gone (fail the run otherwise); only
+      remove the `.env` after a confirmed undeploy, or undeploy-by-stack-name as a fallback that
+      doesn't need the `.env`. *Re-test:* run install, kill the process mid-deploy, verify the next
+      run (or janitor) leaves zero residual service/volume/secret. Adversary closes after re-test.
+
+- [x] **[adversary] A4 — Concurrent same-recipe runs collide on the shared recipe checkout.**
+      **CLOSED @2026-05-27T03:13Z — mitigated by the runtime concurrency cap.** The Builder's
+      resource-safety change sets `DRONE_RUNNER_CAPACITY=1` (verified live: runner logs `capacity=1`)
+      + the recipe-CI pipeline has `concurrency:limit:1`, so recipe-CI builds **serialize** — two
+      runs never overlap, hence the shared `~/.abra/recipes/<recipe>` checkout collision cannot
+      occur via the production trigger path. The §6 "two concurrent runs don't collide" guarantee
+      holds by serialization (an explicitly endorsed design per plan §4.2). **Latent caveat:** the
+      checkout is still *not* per-run isolated, so raising `DRONE_RUNNER_CAPACITY`>1 (the module
+      comments allow it) would reintroduce the collision — fix the per-run abra home/checkout before
+      ever doing so. (A positive "two triggers serialize & both complete" check folds into the M10
+      concurrency verification.)
+      Found by review (M6 verify); to confirm empirically. Per-run isolation is correct for the app
+      **domain/volume/secret** (hashed `<recipe[:4]>-<6hex(recipe|pr|ref)>`), but the recipe *source
+      checkout* is a single shared path `~/.abra/recipes/<recipe>`: `run_recipe_ci.fetch_recipe`
+      does `rm -rf ~/.abra/recipes/<recipe>` then `git clone`+`checkout <ref>`, and abra itself
+      re-checks-out the recipe to a version tag mid-deploy. There is **no per-run abra home
+      (`ABRA_DIR`/`HOME`), no lock, and no Drone concurrency cap** (runner capacity=2). So two
+      concurrent runs of the **same recipe at different refs** (e.g. `!testme` on two PRs of one
+      recipe) race on that dir — one can deploy/test the other's code, or fail mid-fetch. (Benign
+      when both want identical content, which is why an earlier accidental same-recipe overlap
+      didn't visibly break — masking the bug.) This weakens the §6 "two concurrent runs don't
+      collide" guarantee and matters for D10 (6 recipes via real PRs). *Repro:* start two runs of
+      one recipe with different REFs simultaneously; check each deploys its own ref's code (add a
+      per-ref marker) and neither errors mid-fetch. *Fix:* per-run abra home/recipe dir (e.g.
+      `ABRA_DIR=$(mktemp -d)` or `~/.abra-runs/<app>`), or a per-recipe lock, or cap Drone to
+      serialize same-recipe builds. Adversary confirms + closes after re-test.
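
A minimal Python sketch of the naming and janitor-selection rules described in A2 and A4, under stated assumptions: the 6-hex digest is taken from SHA-256 (the findings only say `6hex(recipe|pr|ref)`), the real `RUN_APP_RE` is abbreviated in A2, so `RUN_DOMAIN_RE`/`RUN_SERVICE_RE` are hypothetical stand-ins, and all function names are illustrative. It shows why the prefix is `[a-z0-9]{1,4}` (`n8n` yields a 3-character, digit-bearing prefix), how an `.env`-less orphan is recovered from its swarm service name, and how the age gate spares fresh apps unless `max_age_seconds=0`.

```python
import hashlib
import re
import time
from typing import Dict, List, Optional

BASE_DOMAIN = "ci.commoninternet.net"
STACK_SUFFIX = "_" + BASE_DOMAIN.replace(".", "_")  # "_ci_commoninternet_net"
RUN_ID = r"[a-z0-9]{1,4}-[0-9a-f]{6}"

# Hypothetical stand-ins for the harness patterns.
RUN_DOMAIN_RE = re.compile(rf"^{RUN_ID}\.{re.escape(BASE_DOMAIN)}$")
RUN_SERVICE_RE = re.compile(rf"^({RUN_ID}){re.escape(STACK_SUFFIX)}_[A-Za-z0-9_.-]+$")


def app_name(recipe: str, pr: str, ref: str) -> str:
    """<recipe[:4]>-<6hex(recipe|pr|ref)>; sha256 is an assumed hash choice."""
    digest = hashlib.sha256(f"{recipe}|{pr}|{ref}".encode()).hexdigest()
    return f"{recipe[:4]}-{digest[:6]}"


def domain_from_service(service: str) -> Optional[str]:
    """Rebuild an app domain from a swarm service name (works with no .env)."""
    m = RUN_SERVICE_RE.match(service)
    return f"{m.group(1)}.{BASE_DOMAIN}" if m else None


def select_orphans(env_apps: Dict[str, float], services: Dict[str, float],
                   max_age_seconds: float = 2 * 3600,
                   now: Optional[float] = None) -> List[str]:
    """Domains old enough to reap; inputs map name -> creation time (epoch s)."""
    now = time.time() if now is None else now
    newest: Dict[str, float] = {}
    for domain, created in env_apps.items():
        if RUN_DOMAIN_RE.match(domain):
            newest[domain] = max(created, newest.get(domain, created))
    for service, created in services.items():
        domain = domain_from_service(service)
        if domain is not None:
            newest[domain] = max(created, newest.get(domain, created))
    # Newest evidence wins, so an app touched recently (a concurrent run) is spared.
    return sorted(d for d, created in newest.items() if now - created >= max_age_seconds)
```

Infra stacks such as traefik never match the run pattern, so even a `max_age_seconds=0` sweep leaves them alone.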
--- a/machine-docs/DECISIONS.md
+++ b/machine-docs/DECISIONS.md
@ -0,0 +1,273 @@
+# DECISIONS — cc-ci Builder
+
+Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
+
+## Settled
+
+- **Wildcard TLS:** operator pre-issues wildcard cert at `/var/lib/ci-certs/live/`; Traefik file
+  provider serves it; **no ACME** for commoninternet.net. (Plan §4.0/§8 — fixed.)
+- **Repo:** `git.autonomic.zone/recipe-maintainers/cc-ci`, private. Bot is org admin. (Bootstrap.)
+- **Git credentials:** helper script in repo-local git config sources `/srv/cc-ci/.testenv` at call
+  time — no secret values stored in `.git/config` or commits.
+
+- **Proxy: real coop-cloud/traefik via abra — SETTLED (M1, orchestrator decision 2026-05-26,
+  overrides plan §3 `modules/traefik.nix`).** Instead of a hand-rolled Traefik we deploy the
+  canonical Co-op Cloud `traefik` recipe via abra in **wildcard / file-provider mode**, for
+  end-to-end fidelity (canonical `web`/`web-secure` entrypoints + proxy/swarm conventions every
+  recipe expects — this also fixed an entrypoint-name mismatch the custom build hit). NO ACME, NO
+  DNS token on the box:
+  - `WILDCARDS_ENABLED=1` + append `compose.wildcard.yml`; the pre-issued cert is fed as the
+    `ssl_cert`/`ssl_key` swarm secrets (v1) via `abra app secret insert … -f` from
+    `/var/lib/ci-certs/live/{fullchain,privkey}.pem`. The file provider serves it (`tls.certificates`).
+  - `LETS_ENCRYPT_ENV=` **empty** on the traefik app *and* on every test app → the recipe's
+    `tls.certresolver=${LETS_ENCRYPT_ENV}` label resolves to no resolver → routers serve the
+    wildcard via SNI from the file provider, ACME never fires. (Verified: 0 ACME log lines.)
+  - Reproducibility (D8): `scripts/deploy-proxy.sh` is idempotent (ensures local abra server, fetches
+    recipe, writes the wildcard/no-ACME env, inserts cert secrets, deploys). Documented in
+    `docs/install.md`. The custom `modules/traefik.nix` was removed; `modules/swarm.nix` keeps swarm
+    init + `proxy` net + firewall 80/443.
+  - **Renewal (manual, ~90d):** operator re-issues the wildcard at the same paths, then
+    `abra app secret rm traefik.ci.commoninternet.net ssl_cert -n` + re-insert at a new version (bump
+    `SECRET_WILDCARD_CERT_VERSION`) and redeploy. (Documented in docs/secrets.md at M7.)
+  - **abra teardown syntax** (for harness, §4.3): `abra app undeploy <d> -n`,
+    `abra app volume remove <d> -f -n`, `abra app secret remove <d> --all -n`. None take `--chaos`.
+
+- **Infra bring-up = idempotent-reconcile systemd oneshots — SETTLED (M2, orchestrator steer
+  2026-05-26).** Every piece of swarm infra that abra deploys (traefik `modules/proxy.nix`, Drone
+  `modules/drone.nix`, later comment-bridge + dashboard) is a `systemd.services.<x>` with
+  `Type=oneshot` + `RemainAfterExit`, `after`/`requires` swarm-init + docker, `wants`
+  network-online, `wantedBy` multi-user, embedding its script via **`pkgs.writeShellApplication`**
+  (self-contained in the store, not a `/root/cc-ci` path). The script **reconciles** (inspect →
+  converge → no-op if correct) on *every* activation/boot — **no run-once sentinel** — so it
+  self-heals drift (stack gone → redeploy; secret missing → re-insert). Fails visibly (failed unit)
+  on missing preconditions (e.g. cert absent). Result: a from-scratch install (D8) collapses to
+  `git clone` + `nixos-rebuild switch` + operator preconditions, no manual post-steps. The old
+  `scripts/deploy-*.sh` were folded into these modules and removed. `pkgs.abra` is provided via an
+  overlay (`modules/packages.nix`) so all modules share the one pinned build.
+  - *Cert rotation note:* the proxy reconcile inserts ssl_cert/ssl_key only if absent; rotating the
+    wildcard means bumping `SECRET_WILDCARD_*_VERSION` (operator) so the next reconcile re-inserts.
+    Documented in docs/secrets.md at M7.
+
+- **Trigger: POLLING primary, webhook optional — SETTLED (orchestrator design change 2026-05-27,
+  supersedes the earlier "keep webhook, do NOT pivot to polling" steer).** Hard constraint: the
+  bot/server runs at **READ level, never repo-admin**, and **never self-registers a webhook**.
+  - **Polling is PRIMARY and the source of truth for D1.** The bridge polls each enrolled repo's
+    open PRs for new `!testme` comments every `POLL_INTERVAL` (30s, within the 60s bound). Polling is
+    outbound (cc-ci → git.autonomic.zone, the reliably-working direction) and needs only read+comment
+    permissions. On startup the first poll marks pre-existing comments as seen, so it doesn't fire on
+    old comments.
+  - **Webhook is an OPTIONAL push optimization.** The `/hook` endpoint stays live (HMAC-verified)
+    so an *admin-registered* `issue_comment` webhook lowers latency, but the bridge never registers
+    one. Manual registration is documented in `docs/enroll-recipe.md`. Both paths share an
+    in-memory seen-set keyed by comment id → a comment seen by both fires at most once.
+  - **Commenter authorization via org membership (read-level, no admin).** Allowed iff
+    `GET /orgs/{owner}/members/{user}` → 204 (verified 2026-05-27: admits bot/trav/notplants, 404
+    for a non-member, works with bot read-level basic-auth) **or** the user is in the optional
+    `AUTH_ALLOWLIST`. Replaces the earlier `/collaborators/{user}/permission` check, which needs
+    repo-admin. Fail-closed on any error.
+  - **Enrollment** = add the repo to the bridge `POLL_REPOS` csv + ensure `tests/<recipe>/` exists.
+    No webhook is required for CI to work. (The root cause of the old webhook non-delivery no longer
+    matters, because polling makes it irrelevant. The operator was whitelisting `ci.commoninternet.net`
+    in Gitea's `ALLOWED_HOST_LIST`, but D1 no longer depends on that.)
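+
+The trigger rules above can be sketched in Python:
+
+- startup priming;
+- one seen-set shared by the poll loop and `/hook`;
+- fail-closed org-membership authorization.
+
+This is a minimal illustration, not `bridge.py`. The following are all assumptions:
+
+- `TriggerState` and `fetch_status`;
+- the `{id, body, user}` comment shape;
+- the exact-match `!testme` rule.
+
+The `/hook` HMAC check is omitted.
+
+```python
+from typing import Callable, Dict, Iterable, List
+
+# Hypothetical: takes an API path, returns the HTTP status (may raise on network errors).
+FetchStatus = Callable[[str], int]
+
+
+class TriggerState:
+    """One instance shared by the poll loop and the optional /hook handler."""
+
+    def __init__(self, allowlist: Iterable[str] = ()):
+        self.seen = set()  # comment ids already handled, by either path
+        self.primed = False  # has the startup poll run yet?
+        self.allowlist = set(allowlist)  # optional AUTH_ALLOWLIST
+
+    def is_authorized(self, owner: str, user: str, fetch_status: FetchStatus) -> bool:
+        if user in self.allowlist:
+            return True
+        try:
+            # 204 = org member; 404 (non-member) or any other status = deny.
+            return fetch_status(f"/orgs/{owner}/members/{user}") == 204
+        except Exception:
+            return False  # fail closed on any error
+
+    def handle(self, comment: Dict, owner: str, fetch_status: FetchStatus) -> bool:
+        """True iff this comment should start a CI run; at most once per comment id."""
+        if comment["id"] in self.seen:
+            return False
+        self.seen.add(comment["id"])
+        if comment["body"].strip() != "!testme":  # exact-match rule is an assumption
+            return False
+        return self.is_authorized(owner, comment["user"], fetch_status)
+
+    def poll(self, comments: List[Dict], owner: str, fetch_status: FetchStatus) -> List[int]:
+        if not self.primed:
+            # Startup: mark everything already present as seen and fire nothing.
+            self.seen.update(c["id"] for c in comments)
+            self.primed = True
+            return []
+        return [c["id"] for c in comments if self.handle(c, owner, fetch_status)]
+```
+
+Suppose the poller has already seen a comment and a webhook then delivers it. Calling `handle()` for that comment returns `False`, which is the "fires at most once" property. The set lives in memory, so it resets on restart. The startup priming pass is what keeps old comments from re-firing.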
+
+- **Resource safety: bound live test apps — SETTLED (orchestrator design change 2026-05-27,
+  plan §4.2/§4.3).** Do NOT keep multiple test apps deployed at once. Three layers, all configurable:
+  - **MAX_TESTS = `DRONE_RUNNER_CAPACITY` = 1** (`modules/drone-runner.nix`, `maxTests` let-binding).
+    Drone runs at most MAX_TESTS builds at once and **auto-queues the rest in its native pending
+    queue** — no custom queue. Kept at 1 (single 28GiB node, heavy recipes). At capacity=1 there is
+    never a concurrent in-flight run, so the bound "at most 1 test app live" holds exactly.
+  - **Per-build TIMEOUT = 60 min** (`modules/drone.nix`, `buildTimeoutMinutes`; reconciled
+    best-effort via `PATCH /api/repos/recipe-maintainers/cc-ci {"timeout":60}` using the bridge's
+    Drone admin token, local `--resolve`, non-fatal). A build over the limit is cancelled by Drone →
+    the exec runner kills it → the MAX_TESTS slot frees → the queue advances. Satisfies "continue
+    once a test finishes OR times out".
+  - **Teardown + janitor backstop.** Each build deploys → runs the 3 stages → undeploys
+    (guaranteed `try/finally` in `conftest`/orchestrator). A SIGKILL'd/timed-out build can't run its
+    own teardown, so the **run-start janitor** (`lifecycle.janitor`, called before every deploy in
+    both fixtures + `run_recipe_ci`) reaps orphaned run apps as the backstop. At capacity=1 the CI
+    path will set `CCCI_JANITOR_MAX_AGE=0` (reap any orphan immediately — safe with no concurrent
+    runs) in the recipe-CI Drone pipeline; with capacity>1 the janitor MUST stay age-based (default
+    2h) to avoid reaping a live concurrent run. Net: at most MAX_TESTS apps ever live.
+  - Optional `concurrency: {limit: 1}` in the recipe-CI `.drone.yml` is a redundant safeguard; the
+    primary mechanism is `DRONE_RUNNER_CAPACITY`. (Wired when the recipe-CI pipeline lands; see backlog.)
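+
+The janitor's age rule can be sketched as a pure function. `apps_to_reap`, `janitor_max_age` and the `{app: deploy_time}` input are hypothetical. The real `lifecycle.janitor` reads live state from abra/swarm. Reading `CCCI_JANITOR_MAX_AGE` as seconds is also an assumption.
+
+```python
+import os
+import time
+from typing import Dict, List, Optional
+
+DEFAULT_MAX_AGE_S = 2 * 60 * 60  # age-based default: 2h
+
+
+def janitor_max_age() -> float:
+    # Assumption: CCCI_JANITOR_MAX_AGE is interpreted as seconds in this sketch.
+    return float(os.environ.get("CCCI_JANITOR_MAX_AGE", DEFAULT_MAX_AGE_S))
+
+
+def apps_to_reap(deployed_at: Dict[str, float], max_age: float, now: Optional[float] = None) -> List[str]:
+    """Return run apps whose age is >= max_age; max_age=0 selects every orphan.
+
+    max_age=0 is only safe when no other run can be live (capacity=1): this
+    sketch cannot tell an orphan from a concurrent run's app.
+    """
+    now = time.time() if now is None else now
+    return sorted(name for name, t in deployed_at.items() if now - t >= max_age)
+```
+
+With `max_age=0` every listed app is reaped, which is only correct at capacity=1. With the 2h default, an app deployed minutes ago by another runner slot survives.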
+
+- **D10 recipe #6: bluesky-pds (TLS-passthrough) SWAPPED → n8n — SETTLED (2026-05-27, plan §4.0
+  sanctions this swap-with-reason).** bluesky-pds routes via a Traefik **TCP router with
+  `tls.passthrough=true`** to an in-container **caddy** that terminates TLS itself and obtains its own
+  cert via **ACME**. cc-ci's design is the opposite: the operator gateway passes wildcard TLS through
+  to cc-ci's Traefik, which **terminates** it with the pre-issued static wildcard cert, and **ACME is
+  hard-forbidden** for commoninternet.net (no DNS token on the box — §4.0/§9). Serving bluesky-pds
+  would require either (a) ACME inside caddy (forbidden), or (b) injecting the wildcard cert into
+  caddy + a per-host TCP-passthrough router on cc-ci Traefik (recipe-internal surgery + a bespoke
+  proxy mode — not a clean shared-harness absorb). This is a genuine design conflict, not a harness
+  gap. Per the plan's explicit allowance, **bluesky-pds is a documented non-CI'd recipe** (reason
+  here), and **n8n** takes the 6th slot. The 5 required D10 categories are already covered by recipes
+  1–5 (simple=custom-html, single-DB+SSO=keycloak, stateful/no-DB=cryptpad, DB+media/large-volume=
+  matrix-synapse, multi-service+S3/object-storage=lasuite-docs); n8n adds a 6th real deployable app
+  (workflow automation) behind the normal terminate-at-Traefik path.
+
+- **Docker Hub rate limit + mid-breadth prune — FINDING (2026-05-27).** D10 real-`!testme` breadth
+  runs exhausted Docker Hub's anonymous pull rate limit (lasuite-docs, 9 images, upgrade stage:
+  `toomanyrequests`). Two lessons: (1) **registry pull creds are an A1 operator input** needed for
+  reliable heavy-recipe deploys under load (request + sops-store + wire into docker daemon). (2)
+  **Don't `docker image prune -af` mid-breadth** — it evicts cached recipe images and forces re-pulls
+  that hit the limit. The first lasuite failure was disk pressure (90% full); pruning fixed disk but
+  triggered re-pulls → rate limit. Better: rely on the daily autoprune, prune only `dangling`
+  (not `-a`) between runs, or grow disk so heavy images stay cached. Net for D10: 5/6 recipes green
+  via real `!testme`; lasuite-docs gated on the rate limit (transient, ~hours; durable fix = creds).
+
+## Open (defaults from §8, to confirm as reality lands)
+
+- **Deploy mechanism — SETTLED (M0):** `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run *on
+  cc-ci itself*, with the repo materialised on the host at `/root/cc-ci`. Chosen over
+  `--target-host`/deploy-rs to avoid pushing large closures over the userspace-tailscaled SOCKS
+  proxy (slow/fragile). Atomic rollback preserved by Nix generations (`nixos-rebuild --rollback`).
+  The switch is launched as a **detached transient systemd unit** (`systemd-run --unit=ccci-rebuild
+  --collect`) so it survives a momentary ssh-over-tailscale drop during activation. For the build
+  loop the host copy is synced from the sandbox clone via `tar | ssh` (rsync absent on host);
+  source of truth stays the git repo. D8/install.md will document the from-scratch path (clone repo
+  on a fresh host, then `nixos-rebuild switch --flake .#cc-ci`).
+  - **nixpkgs pin:** flake pins the exact rev cc-ci already ran (`50ab793…`) so the first rebuild
+    is a true no-op-then-base. Bump deliberately, never drift.
+- **Webhook scope:** default per-repo via enroll script.
+- **CI engine: Drone (per plan) — kept, with a noted risk.** nixpkgs 24.11 has Drone **server**
+  2.24.0 but `drone-runner-exec` is **abandoned (unstable-2020-04-19)** — the only exec runner Drone
+  ever shipped (upstream archived ~2021). The maintained fork **Woodpecker** (2.7.3, with NixOS
+  modules) is the alternative. Decision: honor the plan (Drone) because the plan is Drone-specific
+  (D7 "Drone's native UI", comment-bridge → Drone API). The 2020 exec runner pairs fine with modern
+  Drone server (RPC protocol stable). **Fallback:** if the exec runner proves incompatible/broken,
+  pivot to Woodpecker (coop-cloud ships a `woodpecker` recipe too) and record it — like the traefik
+  pivot. Re-evaluate at the M2 gate.
+- **Drone deployment shape — SETTLED (M2):** mirror the traefik pattern. The **server** is the
+  coop-cloud `drone` recipe (drone/drone:2.26.0) deployed via abra (swarm-native, auto-routed by
+  traefik at `drone.ci.commoninternet.net`, `LETS_ENCRYPT_ENV` empty → wildcard cert, no ACME),
+  with Gitea SSO (`compose.gitea.yml`). The **exec runner** runs as a Nix systemd service on the
+  host (`modules/drone-runner.nix`) so it can drive host abra/swarm (plan §4.2). One generated
+  `DRONE_RPC_SECRET` is shared: inserted as the server's `rpc_secret` swarm secret AND read by the
+  runner from sops. Reproducible deploy: `scripts/deploy-drone.sh`.
+  - Gitea OAuth app `cc-ci-drone` created under the bot (client_id
+    `ab4cdb9d-ee96-4867-875f-87384505fc52`, redirect `https://drone.ci.commoninternet.net/login`);
+    client_secret + rpc_secret stored sops-encrypted in `secrets/secrets.yaml` (A2 internal secrets).
+- **Drone runner type:** exec (must drive host abra).
+- **Secret tool — SETTLED (M0):** sops-nix. cc-ci decrypts at activation using its **ed25519 SSH
+  host key** as the age identity (`sops.age.sshKeyPaths`), so no extra key file to manage on the box.
+  Recipients in `/.sops.yaml`: the host age key (`age1h90ut…`, from ssh-to-age) + an off-box
+  **master recovery key** (`age1cmk26t…`; private half only at `/srv/cc-ci/.sops/master-age.txt` on
+  the build host, never in the repo) for re-keying if cc-ci is lost. Encrypt new secrets by writing
+  plaintext into `secrets/<f>.yaml` then `sops -e -i` (run inside the repo so `.sops.yaml` is found).
+- **D10 recipe set:** lock six early. Candidates favouring already-mirrored: custom-html (simple),
+  cryptpad (stateful no-DB), keycloak (SSO/DB), matrix-synapse (DB+media), lasuite-docs (multi+S3),
+  bluesky-pds (TLS-passthrough) — covers all five categories. Confirm during M4–M6.5.
+
+- **Per-run app domain scheme — adapted (M4, deviates from plan §4.0).** Plan §4.0 wanted
+  `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, but Docker swarm config/secret names
+  (`<stackname>_<resource>_<version>`) must be ≤ 64 chars and abra derives `<stackname>` from the
+  domain (dots→`_`, hyphens kept). `.ci.commoninternet.net` alone is 22 chars, so long recipe names
+  + config names overflow 64 (hit with `custom-html-pr0-m4demo…_nginx_default_conf_v6` = 66). New
+  scheme: **`<recipe[:4]>-<6hex(recipe|pr|ref)>.ci.commoninternet.net`** (e.g. `cust-e084bd`) — short,
+  unique per run, collision-safe across recipes (full recipe in the hash). Human-readable recipe/PR/
+  ref context lives in the Drone build params + the PR comment, not the (ephemeral) domain.
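+
+The naming arithmetic can be checked with a short sketch. The following come from the scheme above:
+
+- the 4-char prefix;
+- 6 hex chars;
+- the `|` separator;
+- the 64-char cap.
+
+Using SHA-256 as the hash is an assumption, and so are the helper names.
+
+```python
+import hashlib
+
+BASE_DOMAIN = ".ci.commoninternet.net"  # 22 chars
+SWARM_NAME_MAX = 64
+
+
+def run_domain(recipe: str, pr: int, ref: str) -> str:
+    """`<recipe[:4]>-<6hex(recipe|pr|ref)>` + base domain.
+
+    SHA-256 is an assumption. The full recipe name is hashed, so recipes that
+    share a 4-char prefix still hash differently.
+    """
+    digest = hashlib.sha256(f"{recipe}|{pr}|{ref}".encode()).hexdigest()[:6]
+    return f"{recipe[:4]}-{digest}{BASE_DOMAIN}"
+
+
+def stack_name(domain: str) -> str:
+    # As described above: abra maps dots to "_" and keeps hyphens.
+    return domain.replace(".", "_")
+
+
+def swarm_name_fits(domain: str, resource: str, version: str) -> bool:
+    """Check `<stackname>_<resource>_<version>` against Docker's 64-char cap."""
+    return len(f"{stack_name(domain)}_{resource}_{version}") <= SWARM_NAME_MAX
+```
+
+Under these assumptions every run domain is 4 + 1 + 6 + 22 = 33 chars regardless of recipe. That leaves 31 chars for `_<resource>_<version>`. For example, `nginx_default_conf` at `v6` gives a 55-char swarm name.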
+
+- **abra recipe checkout is volatile — harness uses chaos+offline + a tests/ snapshot (M6).** Many
+  abra commands (`app ls`, `secret generate` without flags, version resolution) silently
+  `git checkout <version-tag>` in `~/.abra/recipes/<recipe>`, discarding a PR branch's files. To
+  test the *PR head code* (not a re-resolved tag): (1) `fetch_recipe` clones the mirror branch/ref
+  (private → bot token via per-command `http.extraHeader`, never persisted/logged); (2) all harness
+  abra calls that touch the recipe pass `-C` (chaos: use current checkout) `-o` (offline: no remote
+  fetch); (3) recipe-shipped `tests/` (D4) are **snapshotted to a temp dir right after fetch**, since
+  later abra commands still reset the checkout — the recipe-local stage runs from the snapshot.
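+
+Steps (1) and (3) can be sketched in Python. The helper names are hypothetical, and so is the `Authorization: token …` header format. `git -c` and `http.extraHeader` are standard git. Placing `-c` before the subcommand scopes the setting to that one invocation rather than writing it to the clone's config.
+
+```python
+import shutil
+import tempfile
+from pathlib import Path
+from typing import List, Optional
+
+
+def clone_argv(url: str, ref: str, dest: str, token: Optional[str] = None) -> List[str]:
+    """Build a `git clone` command for a mirror branch/ref of a (possibly private) recipe."""
+    argv = ["git"]
+    if token:
+        # `git -c key=value <cmd>` applies to this one invocation only, so the header
+        # is not persisted into the clone's .git/config. Header format is an assumption.
+        argv += ["-c", f"http.extraHeader=Authorization: token {token}"]
+    # --branch accepts a branch or tag name (not a bare commit SHA).
+    return argv + ["clone", "--branch", ref, url, dest]
+
+
+def loggable(argv: List[str], token: Optional[str]) -> str:
+    """Render argv for logs with the token masked."""
+    text = " ".join(argv)
+    return text.replace(token, "***") if token else text
+
+
+def snapshot_tests(recipe_dir: Path) -> Optional[Path]:
+    """Copy recipe-shipped tests/ to a temp dir before later abra calls reset the checkout."""
+    src = recipe_dir / "tests"
+    if not src.is_dir():
+        return None
+    dst = Path(tempfile.mkdtemp(prefix="ccci-tests-")) / "tests"
+    shutil.copytree(src, dst)
+    return dst
+```
+
+Log only `loggable(argv, token)`, never the raw argv. `--branch` takes a branch or tag, so pinning a bare commit SHA would need a fetch + checkout instead.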
+
+## Risks
+
+- **Disk — RESOLVED 2026-05-26.** Original 8.9 GiB root had only ~3.8 GiB free *and* a hard
+  **inode** ceiling (586k total, ~6k free) — the flake's nixpkgs fetch (~50k files) hit ENOSPC on
+  inodes before bytes. Operator grew the VM to **28 GiB** (22 GiB free, 1.78M inodes / 1.21M free);
+  the ext4 fs auto-resized (new block groups carry proportional inodes). Keep aggressive teardown +
+  periodic `docker image prune` to avoid regressing during M6.5 breadth.
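+
+The original failure was on inodes while bytes were still free, so a pre-flight check should budget both. The sketch below uses the standard `os.statvfs`. The helper names are illustrative assumptions, and so are any thresholds (e.g. ~50k inodes for a nixpkgs fetch).
+
+```python
+import os
+from typing import Dict
+
+
+def fs_headroom(path: str = "/") -> Dict[str, float]:
+    """Free space in GiB and free inodes for the filesystem holding `path`."""
+    st = os.statvfs(path)
+    return {
+        "free_gib": st.f_bavail * st.f_frsize / 2**30,
+        "free_inodes": float(st.f_favail),
+        # Some filesystems report f_files == 0; treat that as "no inode limit".
+        "inode_free_frac": st.f_favail / st.f_files if st.f_files else 1.0,
+    }
+
+
+def enough_headroom(h: Dict[str, float], min_gib: float, min_inodes: int) -> bool:
+    """Both budgets must hold: checking bytes alone would miss ENOSPC on inodes."""
+    return h["free_gib"] >= min_gib and h["free_inodes"] >= min_inodes
+```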
+
+## Dead-ends
+- (none yet)
+
+## Phase 1c (full reproducibility + genuine D8 live rebuild) — 2026-05-27
+
+- **Secrets linkage = git SUBMODULE (deviates from plan §7 flake-input default).** `cc-ci-secrets`
+  is mounted as a submodule at `cc-ci/secrets/` rather than a flake `inputs.secrets`. Rationale: a
+  private flake input must be re-fetched at **every nix eval**, requiring the bot token persistently
+  in nix config/netrc on cc-ci AND the throwaway VM (a token in the store/config = a 2nd out-of-band
+  secret, which 1c forbids). A submodule makes `secrets/secrets.yaml` a plain path in the working
+  tree → `defaultSopsFile = ../secrets/secrets.yaml` is unchanged (minimal diff, trivially
+  byte-identical), and the only credential use is the one `git clone --recursive` at provisioning
+  ("the two repos are *given*", Mission §1). Build invocation becomes
+  `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` so the submodule tree is
+  included. (Revisit if `?submodules=1` proves unreliable on cc-ci's nix version.)
+- **Bootstrap key for the throwaway VM = the existing RECOVERY (master) age key, via
+  `sops.age.keyFile`.** The recovery key (`age1cmk26…`, private at `/srv/cc-ci/.sops/master-age.txt`)
+  is already a sops recipient, so a fresh host with a *different* ssh host key still decrypts every
+  secret with no re-keying — this is exactly the §0 argument that defeats "host-key binding".
+  Provisioned to the VM at a fixed path (the ONE out-of-band secret). cc-ci itself keeps decrypting
+  via its host key (`age.sshKeyPaths`); secrets.nix will offer both identity sources. (Per-host
+  re-encrypt is cleaner for a *permanent* new instance — documented as the alternative, not used for
+  the throwaway test.)
+- **Cert into git:** wildcard cert+key become sops secrets in `cc-ci-secrets`, decrypted at
+  activation back to `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` via
+  `sops.secrets.<name>.path`; proxy.nix keeps reading that path (now sops-sourced, not operator-drop).
+- **cc-nix-test final sizing (C6) — SETTLED by operator 2026-05-27: PROMOTE the rebuilt VM.** The
+  freshly-rebuilt reproducible VM (the FINAL W5/C4-C5 clean-room throwaway) becomes the canonical
+  cc-nix-test; the operator will repurpose it for a live real-traffic test through the public gateway.
+- **C6 teardown OVERRIDE (operator, 2026-05-27):** do NOT destroy the FINAL throwaway VM after
+  W5/C4-C5 PASSes — keep it RUNNING; defer its C6 teardown until the operator explicitly says
+  otherwise. This overrides the plan §5/§6 "destroy the throwaway" for that one VM only. All other
+  cleanup proceeds normally (the Builder's first throwaway was already destroyed; RAM accounting holds).
+
+## Phase 1b — lint/format tooling (open decisions §6, settled W0)
+- **Formatters/linters (RL1):** Nix = `nixpkgs-fmt` (format) + `statix` (lints) + `deadnix` (dead
+  code); Python = `ruff` (lint + format); Shell = `shellcheck` + `shfmt -i 2 -ci`; YAML = `yamllint`.
+  Kept `nixpkgs-fmt` over `alejandra` because it was already the repo `formatter` and devshell tool
+  (no extra churn / restyle of every .nix). All built from the already-pinned nixpkgs via a flake
+  `lint` devshell (`nix develop .#lint`) so CI and local use byte-identical tool versions.
+- **Lint entrypoint:** `scripts/lint.sh` (check-only by default; `--fix` auto-applies). The
+  `.drone.yml` push pipeline runs it via `nix develop .#lint --command bash scripts/lint.sh`.
+- **ruff strictness:** `select = [E,F,W,I,UP,B,C4,SIM]`, `ignore = [E501]` (line length is the
+  formatter's job; only un-splittable strings would trip it). `line-length=100`, `target=py311`.
+- **Drone lint stage = FAIL (not warn).** The codebase is green now, so enforce from here on — an
+  unclean commit fails the `lint` step. (Resolves the §6 open question.)
+- **Python type-checking (mypy/pyright): DEFERRED to IDEAS**, not added in 1b. The harness is small
+  and dynamically typed around `abra`/subprocess JSON; gradual typing is a larger effort than this
+  bounded pass warrants. Revisit if Phase 2's 18-recipe ramp shows type bugs.
+- **Blocking vs advisory split (§3):** treated as in the phase plan: tests-real, Nix-idempotent,
+  no-footguns, no-secrets, log-redaction, harness-DRY = blocking; readability/docs/arch-drift =
+  advisory unless they amount to a real plan deviation. Recorded per-finding in REVIEW-1b / BACKLOG-1b.
+- **cc-ci self-CI push trigger:** the lint stage lives in the `event: push` pipeline. The Gitea→Drone
+  push webhook on this instance is flaky (`last_status: None`; documented §4.1) and predates 1b —
+  recipe CI uses polling as primary, but cc-ci's *own* self-test/lint relies on the push webhook.
+  The lint stage is correctly wired and proven green via the identical `nix develop .#lint` command;
+  reliably auto-firing it on every push is tracked as a (pre-existing) infra item, not a 1b lint gap.
+
+## Phase 1b — repo layout (operator review items RL5/RL6, plan §7)
+- **RL5 — all Nix code under `nix/`.** Moved `modules/`→`nix/modules/` and `hosts/`→`nix/hosts/`.
+  `flake.nix`/`flake.lock` STAY at the repo root (entry point) so the build ref `#cc-ci` and
+  `nixos-rebuild --flake '…#cc-ci'` are unchanged — only `flake.nix`'s internal
+  `./hosts/cc-ci/configuration.nix` → `./nix/hosts/cc-ci/configuration.nix` changed. Root-relative
+  refs inside the moved modules were re-based `../X` → `../../X` (secrets.nix → `../../secrets/`,
+  bridge.nix → `../../bridge/`, dashboard.nix → `../../dashboard/`); `configuration.nix`'s
+  `../../modules/*` imports are unchanged (both dirs moved under `nix/`, so the relative path still
+  resolves). **Toplevel is byte-identical (`8i3jcad9…`) before/after the move** — store derivations
+  are content-addressed on the copied file *contents*, and the module `.nix` files aren't part of the
+  runtime closure, so relocating folders doesn't change the build. (The operator anticipated a hash
+  change; in practice it's stable, which is even stronger for reproducibility.) Living docs
+  (README, architecture/install/secrets/enroll) + the `.drone.yml` comment updated to `nix/…`;
+  append-only history logs left as the record of what was true then.
+- **RL6 — protocol files → `machine-docs/`: DEFERRED to the coordinated end of 1b.** Will `git mv`
+  `STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` into `machine-docs/` (README.md STAYS at root —
+  operator decision, it's the human readme, not a protocol file). The live watchdog (`launch.sh`)
+  reads `STATUS-<id>.md`/`REVIEW-<id>.md` at the repo root for handoffs/transition, so this is done
+  LAST, in lockstep with the orchestrator updating `launch.sh` + restarting the watchdog — not
+  unilaterally and not while a phase transition is pending. The Adversary likewise `git mv`s its own
+  REVIEW files at the cutover (single-writer rule).
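+
+A lockstep move matters only while the watchdog looks in a single place. A location-agnostic lookup would prefer `machine-docs/` and fall back to the root. The files would then resolve whether or not they have been moved yet. Below is a hypothetical Python equivalent. The real watchdog is the shell script `launch.sh`, and `resolve_state` here is illustrative.
+
+```python
+from pathlib import Path
+
+
+def resolve_state(repo: Path, name: str) -> Path:
+    """Return machine-docs/<name> if it exists, else <repo>/<name> (pre-move layout)."""
+    moved = repo / "machine-docs" / name
+    return moved if moved.exists() else repo / name
+```
+
+With a fallback like this, the watchdog can be restarted before the `git mv`. The files resolve to the root until they move, then to `machine-docs/`.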
+
+## Phase 1b — recorded deviation: no `tests/_template/` dir (enroll = copy an existing recipe)
+Plan §3's repo layout lists a `tests/_template/` "copy-to-add-a-recipe" dir. It was **never created**
+(pre-1b; not introduced or removed by 1b) — instead the documented enroll flow in
+`docs/enroll-recipe.md` is **"copy an existing recipe's tree, e.g. `tests/custom-html/…`, then adjust
+`recipe_meta.py` + the per-recipe test files."** This satisfies D5's "small, repeatable, documented
+operation with no harness surgery" the same way (a concrete recipe is a better starting template than
+an abstract skeleton that can drift). Recording per the Adversary's RL3 D5 advisory; not a blocker.
--- a/machine-docs/JOURNAL-1b.md
+++ b/machine-docs/JOURNAL-1b.md
@ -0,0 +1,170 @@
+# JOURNAL — Phase 1b (review & lint pass)
+
+Append-only Builder log: what I did + verifying command/output + next. (Adversary logs to REVIEW-1b.)
+
+---
+
+## 2026-05-27 — Phase 1b kickoff (first wake)
+
+Read the phase plan (`plan-phase1b-review-lint.md`) + plan.md §6.1/§7/§9. Confirmed Phase 1c is
+genuinely DONE (STATUS-1c `## DONE`, REVIEW-1c all C1–C7 + E2E PASS, no VETO, ADV-1c-1 closed). Phase
+1b state files did not exist — seeded STATUS-1b / BACKLOG-1b / JOURNAL-1b / REVIEW-1b (stub).
+
+Access + environment probes:
+- `ssh cc-ci 'hostname && systemctl is-system-running'` → `nixos` / `running`.
+- Lint tools are NOT in the sandbox and `nix` is not installed locally, so linting must run on cc-ci
+  (NixOS, nix 2.24.14, flakes enabled). `nix build github:NixOS/nixpkgs/<our-pin>#ruff` resolves from
+  cache.nixos.org (ruff 0.7.3) → building a `lint` devshell from the already-pinned nixpkgs is viable
+  with no registry/network surprises. shellcheck-0.10.0 already realized in the host store.
+
+Lint-target inventory: 14 `.nix`, 32 `.py`, 1 `.sh` (`scripts/bootstrap-drone-oauth.sh`), plus
+`.drone.yml` / `.sops.yaml` YAML. No prior lint/format decisions in DECISIONS.md (clean slate).
+
+Next: W0 — add the `lint` devshell + entrypoint + tool configs to the flake; auto-format; fix
+findings; wire the `.drone.yml` lint stage.
+
+## 2026-05-27 — W0 built: lint toolchain + format + drone stage
+
+Added (commits 2cede01 format/fixes, 4af427c drone stage, + tooling commits):
+- `flake.nix`: `lint` devshell (`nix develop .#lint`) = nixpkgs-fmt, statix, deadnix, ruff,
+  shellcheck, shfmt, yamllint, built from the already-pinned nixpkgs (no registry/network surprise —
+  `nix build <pin>#ruff` resolves from cache.nixos.org). Default devshell also gets them.
+- `scripts/lint.sh` (check / `--fix`), `ruff.toml`, `.yamllint.yaml`.
+- `.drone.yml`: a `lint` step in the `event: push` pipeline running
+  `nix develop .#lint --command bash scripts/lint.sh` (FAILs the build on any unclean file).
+
+Format/lint cleanup (semantics-preserving): ruff format on all 32 .py; nixpkgs-fmt drone-runner.nix;
+shfmt scripts; ruff SIM105/SIM115 (contextlib.suppress / `with open`); statix (merge sops
+`secrets.*`, empty-pattern → `_`); deadnix (drop unused `self`/`lib`/overlay `final`).
+
+Verification (on cc-ci, clean tar'd checkout /tmp/ccci-lint):
+```
+$ nix develop .#lint --command bash scripts/lint.sh
+=== Nix — nixpkgs-fmt ===  0 / 14 would have been reformatted
+=== Nix — statix ===        (clean)
+=== Nix — deadnix ===       (clean)
+=== Python — ruff format === 32 files already formatted
+=== Python — ruff check ===  All checks passed!
+=== Shell — shfmt/shellcheck === (clean)
+=== YAML — yamllint ===     (clean)
+lint: PASS
+```
+nix eval `.#nixosConfigurations.cc-ci.config.system.build.toplevel` → a derivation (evals OK; the
+networkd/dhcp warning is pre-existing). Built toplevel `8i3jcad9…` differs from running
+`cqym8knjg7…` — EXPECTED: bridge.py/dashboard.py (and runner) are `cp`'d into the store, so the
+reformat changes their hash. cc-ci will be rebuilt to the formatted closure in W2 before RL3.
+All Python byte-compiles (store python 3.12.8).
+
+Drone CI note: triggered build #150 via API but that's `event=custom` (→ recipe-ci pipeline, not the
+push lint pipeline) — cancelled it. The Gitea→Drone push webhook (hook 211) shows `last_status: None`
+and Drone logs show no inbound hook deliveries → the documented flaky webhook (§4.1). Public and
+canonical (100.90.116.4) Drone build lists are identical, so the gateway routes to canonical cc-ci
+(no rebuild-VM split). Recorded the flaky-webhook as a pre-existing infra item in DECISIONS.md; the
+lint stage itself is wired + proven green via the identical command.
+
+Claimed W0 gate (RL1) in STATUS-1b. Next: W1 white-box review checklist over the cleaned codebase.
+
+## 2026-05-27 — W0 PASS (Adversary cold, RL1) + W1 Builder-side §3 self-review
+
+Adversary logged **W0/RL1 PASS** (REVIEW-1b): cold checkout of my HEAD `233939a` archived to cc-ci,
+`nix develop .#lint --command bash scripts/lint.sh` → exit 0 `lint: PASS`, plus a break-it probe
+(injected bad .py/.nix → exit 1 `lint: FAIL`) proving the gate has teeth. Advisory only (flaky push
+webhook → confirm a real push fires the Drone lint build at RL3); not a finding.
+
+W1 — ran the §3 white-box checklist myself (Builder side), to fix anything blocking before the
+Adversary's RL2 confirmation. Findings over the post-W0 (cleaned) codebase:
+- **Tests real (blocking)** — holds. (Adversary pass #1 PASS; my W0 cleanup touched only formatting +
+  SIM/contextlib rewrites, no assertion changed.)
+- **Harness DRY (blocking-ish)** — holds. `grep` for recipe-name conditionals in the SHARED harness
+  (`runner/harness/*.py`, `run_recipe_ci.py`, `conftest.py`) → NONE. Per-recipe quirks are data:
+  optional `tests/<recipe>/recipe_meta.py` (HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/HTTP_TIMEOUT) +
+  per-recipe test files (e.g. keycloak `kc_admin.py`). Enrolling needs no shared-harness edit (D5).
+- **Nix idempotent (blocking)** — holds (no `.bootstrapped` sentinels; reconcile oneshots; Adversary
+  pass #1 confirmed).
+- **No footguns (blocking)** — holds. Every `time.sleep()` (lifecycle.py 160/170/226/252,
+  bridge.py 304) sits inside a `while time.time() < deadline:` poll/retry loop (verified each), not a
+  bare readiness wait. `--chaos` appears ONLY in "never pass it" comments (abra.py). No `shell=True`.
+- **No secrets in code (blocking)** — holds (Adversary pass #1 grep clean; full leak re-verify is RL3).
+- **Log redaction real (blocking)** — holds. `run_recipe_ci.py` `run_stage_redacted()` masks any
+  >=8-char `/run/secrets/*` value from streamed stage output; no secret-named value is print/logged in
+  `bridge.py`/`dashboard.py` (grep clean).
+- **Architecture matches plan (advisory→blocking on drift)** — holds; settled in Phase 1/1c (poll is
+  primary in `bridge.py`'s loop; `/hook` optional; traefik is the coop-cloud recipe via `proxy.nix`).
+  No drift; not reopening settled design (guardrail §5).
+- **Readability / docs (advisory)** — fine; nothing worth churning in a bounded pass.
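+
+The "No footguns" item accepts `time.sleep()` only inside a deadline-bounded loop. Below is a generic sketch of that shape. `wait_until` is a hypothetical name. It uses `time.monotonic()` rather than the `time.time()` quoted above, because a monotonic clock is unaffected by wall-clock adjustments.
+
+```python
+import time
+from typing import Callable
+
+
+def wait_until(check: Callable[[], bool], timeout_s: float, interval_s: float = 2.0) -> bool:
+    """Poll `check` until it returns True or the deadline passes; never sleeps unbounded."""
+    deadline = time.monotonic() + timeout_s
+    while time.monotonic() < deadline:
+        if check():
+            return True
+        # Sleep at most until the deadline, so the loop cannot overshoot it by a full interval.
+        time.sleep(max(0.0, min(interval_s, deadline - time.monotonic())))
+    return False
+```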
+
+**No blocking finding; nothing to fix; no advisory item to file.** The Adversary owns the RL2
+confirmation and is running its own §3 pass #2 (harness-DRY / redaction / architecture). Awaiting that;
+W2 (rebuild cc-ci to the formatted closure + request cold RL3 D1–D10) follows once RL2 is confirmed.
+
+## 2026-05-27 — RL2 clean + RL5 (nix/ consolidation) + W2 switch to cleaned closure
+
+**RL2 (Adversary §3 pass #2):** no blocking findings; 2 advisories — (a) `old_app` upgrade-fixture
+copy-paste across recipes → triaged to IDEAS (per-recipe upgrade tests are by design; sharing is a
+nicety, not a DRY-blocker); (b) app-secret redaction: the `cc-ci-run` Drone step path isn't wrapped by
+`run_stage_redacted`, so the Adversary will re-run the behavioral D6 leak test at RL3 (grep published
+Drone logs + dashboard for a known generated app password). My Builder §3 self-review agreed (no
+blockers). W1 is light/clean.
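+
+`run_stage_redacted()` is described as hiding any `/run/secrets/*` value of at least 8 chars from streamed stage output. That masking can be sketched as below. The function names and the `***` mask are hypothetical. Redaction here is line-by-line, so a secret split across two output lines would slip through. Keep that limitation in mind alongside the behavioral D6 leak test.
+
+```python
+import subprocess
+import sys
+from pathlib import Path
+from typing import List, Tuple
+
+MIN_SECRET_LEN = 8  # the >=8-char threshold; shorter values would also match ordinary output
+
+
+def load_secret_values(secrets_dir: Path = Path("/run/secrets")) -> List[str]:
+    values = set()
+    if secrets_dir.is_dir():
+        for f in secrets_dir.iterdir():
+            if f.is_file():
+                v = f.read_text(errors="replace").strip()
+                if len(v) >= MIN_SECRET_LEN:
+                    values.add(v)
+    # Longest first, so a secret that contains another secret is masked whole.
+    return sorted(values, key=len, reverse=True)
+
+
+def redact(line: str, secrets: List[str], mask: str = "***") -> str:
+    for s in secrets:
+        line = line.replace(s, mask)
+    return line
+
+
+def run_redacted(argv: List[str], secrets: List[str]) -> Tuple[int, str]:
+    """Run argv, masking secrets in merged stdout/stderr line by line."""
+    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
+    out = []
+    for line in proc.stdout:
+        clean = redact(line, secrets)
+        sys.stdout.write(clean)  # stream to the CI log as it arrives
+        out.append(clean)
+    return proc.wait(), "".join(out)
+```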
+
+**RL5 — consolidate Nix code under `nix/`** (operator item, plan §7). `git mv modules nix/modules`,
+`git mv hosts nix/hosts`; flake.nix/flake.lock stay at root (`#cc-ci` unchanged); only flake's
+internal configuration.nix path + the moved modules' root-relative refs changed (`../X`→`../../X`).
+Built on cc-ci → toplevel `8i3jcad9…` **byte-identical to the pre-move build** (content-addressed;
+module .nix not in the runtime closure). Living docs + `.drone.yml` comment updated to `nix/…`.
+
+**W2 — switched canonical cc-ci to the cleaned+RL5 closure** so `build == running` (required before
+RL3: a fresh clone builds `8i3jcad9`; running had to match or the byte-identical-to-running check
+would fail). Re-synced `/root/cc-ci` to HEAD, `nixos-rebuild switch --flake 'path:/root/cc-ci#cc-ci'`:
+```
+stopping units: deploy-bridge.service, deploy-dashboard.service
+sops-install-secrets: Imported …ssh_host_ed25519_key as age key (age1h90utdz…)
+starting units: deploy-bridge.service, deploy-dashboard.service
+```
+Post-switch health (all green):
+- `readlink /run/current-system` → `8i3jcad9mrr01558lqckpi26nxn2ra3m-…` (== fresh-clone build; was
+  `cqym8knjg7…` pre-format).
+- `systemctl is-system-running` → `running`, **0 failed**. deploy-bridge/deploy-dashboard `active`.
+- 5 stacks up (backups, ccci-bridge, ccci-dashboard, drone, traefik); `ccci-bridge_app` + 
+  `ccci-dashboard_app` 1/1 with NEW content-hash image tags (reformatted source redeployed).
+- Public via SOCKS proxy → gateway → cc-ci: `https://ci.commoninternet.net/` → **200**
+  (`<title>cc-ci — Co-op Cloud recipe CI</title>`); `/badge/custom-html.svg` → **200**.
+
+Net: RL1 PASS, RL2 clean, RL4 docs landed (README lint section + architecture.md `nix/` layout),
+RL5 done + healthy, running==build==`8i3jcad9`. Remaining for DONE: **RL3** (Adversary cold D1–D10
+re-verify, now also covering the RL5 byte-identical rebuild) and **RL6** (coordinated machine-docs/
+move — LAST, with orchestrator lockstep). Claiming the RL3 gate.
+
+## 2026-05-27 — push-webhook diagnostic (the RL1 "future commits stay clean" advisory)
+
+Timeboxed root-cause on why pushes don't auto-create a Drone lint build. Fired Gitea's webhook test
+for the Drone hook (211) while tailing the Drone server logs:
+- `POST /repos/recipe-maintainers/cc-ci/hooks/211/tests` → Gitea returns **204** (accepted).
+- `docker service logs --since 20s drone_…_app` → **NOTHING** — no inbound request logged at all.
+
+So the delivery `git.autonomic.zone (Gitea) → drone.ci.commoninternet.net (public gateway) → cc-ci`
+isn't reaching Drone. This is a **gateway/network reachability** condition, NOT a Drone-side config
+I can fix — and per §9 the gateway is operator-managed (not ours to reconfigure). Leaving it as the
+documented pre-existing advisory (hook `last_status: None`, §4.1). Impact is limited to cc-ci's OWN
+self-test/lint pipeline auto-firing; **recipe-CI triggering is unaffected** — the comment-bridge
+polls Gitea *outbound* (cc-ci → git.autonomic.zone, the reliable direction), which is the plan's
+primary trigger (§4.1). The lint stage is wired + proven green via its exact command; manual/API
+Drone builds work. Not expanding scope to re-engineer the inbound path (bounded pass).
+
+## 2026-05-27 — RL3 FULL D1–D10 PASS (Adversary cold). Only RL6 (coordinated) left.
+
+Adversary logged **RL3 PASS** (REVIEW-1b): all D1–D10 re-verified cold on the cleaned+RL5
+byte-identical closure (`8i3jcad9`==running==fresh-clone build), fresh <24h evidence, nothing
+weakened. Highlights: D1 trigger 20s/8s; D2 install/upgrade/backup green (upgrade actually ran, not
+skipped) on custom-html + keycloak; D6 leak test 0 hits (8/8 infra + cert/key + generated keycloak
+admin pw absent from logs/dashboard); D8 fresh-recursive-clone rebuild == running; D10 = 2 fresh
+category runs (#151 custom-html, #152 keycloak) + carry-forward of the Phase-1 Adversary-verified
+6/6 set (byte-identical harness/test/closure). Cardinal-rule PASS. **RL1–RL5 Adversary-PASS, no open
+findings, NO VETO.**
+
+→ Flagged the orchestrator (STATUS-1b) that I'm **ready for the RL6 coordinated cutover**: it updates
+`launch.sh` to `machine-docs/` paths + restarts the watchdog; on its signal I `git mv`
+STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md into `machine-docs/` (README stays root), the Adversary moves
+REVIEW*, I fix the only in-repo refs (README Loop-state + docs/install.md:15), Adversary re-verifies,
+then I write `## DONE`. Holding all root protocol files in place until that signal (moving them early
+breaks the live watchdog). Loop continues; not idling on a long sleep — short fallback while awaiting
+the orchestrator go-ahead.
--- a/machine-docs/JOURNAL-1c.md
+++ b/machine-docs/JOURNAL-1c.md
@ -0,0 +1,440 @@
+# JOURNAL — Phase 1c (Builder)
+
+Append-only. Each entry: what I did + verifying command/output + next.
+
+## 2026-05-27 — Phase 1c kickoff / orientation
+
+**Context:** Phase 1 closed DONE (1c10fa5; all D1–D10 Adversary-PASS, no VETO; Adversary loop
+terminated for Phase 1). Starting Phase 1c per `plan-phase1c-full-reproducibility.md`.
+
+**Did:**
+- Read the phase plan in full + plan.md §6.1/§7/§9. Verified access & surveyed current state.
+- `ssh cc-ci 'hostname && nixos-version && systemctl is-system-running'`
+  → `nixos` / `24.11.20250630.50ab793 (Vicuna)` / `running`. Healthy.
+- Surveyed secrets layout:
+  - `secrets/secrets.yaml` holds 6 infra secrets: test_secret, drone_rpc_secret,
+    drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac.
+  - `.sops.yaml`: recipients = `&host` (ssh-to-age of cc-ci ed25519 host key) + `&master`
+    (off-box recovery, private at `/srv/cc-ci/.sops/master-age.txt`, sandbox-only).
+  - `modules/secrets.nix`: `defaultSopsFile = ../secrets/secrets.yaml`,
+    `age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`.
+  - Wildcard cert OUT-OF-BAND at `cc-ci:/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
+    (fullchain 2909 B, privkey 227 B → privkey looks like an EC key). `modules/proxy.nix` reconcile
+    reads it from there and inserts as traefik `ssl_cert`/`ssl_key` swarm secrets; treats a missing
+    cert as FATAL (operator precondition).
+  - `flake.nix`: nixpkgs pinned `50ab793`, sops-nix pinned; single local `nixosConfigurations.cc-ci`.
+- Tooling note: sandbox host has NO sops/nix/age/ssh-to-age (`which` → only git). sops/age work
+  must run on cc-ci (has nix + host age key) or via a sops binary fetched there with the master key.
+- Bootstrapped Phase-1c state: STATUS-1c.md, BACKLOG-1c.md, JOURNAL-1c.md (this file). REVIEW-1c.md
+  left for the Adversary (its file per §6.1). Appended Phase-1c decisions to DECISIONS.md.
+
+**Decisions recorded (DECISIONS.md):** secrets linkage = **git submodule** (deviates from the
+flake-input default — rationale: no private-repo fetch credential needed at nix-eval on every
+rebuild, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical);
+bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.
+
+**Next (W2):** create private `recipe-maintainers/cc-ci-secrets`; move secrets + wildcard cert into
+sops there as a submodule of the base; wire secrets.nix (cert→`/var/lib/ci-certs/live` via `path=`);
+prove byte-identical build + clean switch with TLS from the git cert. Then claim Gate W2.
+
+## 2026-05-27 — W2 step 1: cc-ci-secrets repo created + populated (DONE)
+
+**Did:**
+- Created private `recipe-maintainers/cc-ci-secrets` via Gitea API (bot, org admin). HTTP 201, private=True.
+- Confirmed cc-ci host SSH key → age identity == `&host` recipient `age1h90utd…`:
+  `ssh cc-ci 'nix shell nixpkgs#ssh-to-age --command ssh-to-age -i /etc/ssh/ssh_host_ed25519_key.pub'`
+  → exact match. So I can decrypt/re-encrypt on cc-ci with the host key (master stays sandbox-only).
+- Built `secrets.yaml` on cc-ci (script with file redirections, no key material in argv):
+  `sops -d` existing 6 secrets → append `wildcard_cert`/`wildcard_key` as YAML block scalars from
+  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` → `sops -e`. Verified round-trip:
+  - recipients: 2 (host+master)
+  - keys: test_secret, drone_rpc_secret, drone_gitea_client_secret, bridge_drone_token,
+    bridge_gitea_token, bridge_webhook_hmac, wildcard_cert, wildcard_key
+  - cert sha256 file==decrypt `c1d96d61…`; key sha256 file==decrypt `9ec25d00…`; test_secret decrypts OK
+- Retrieved ciphertext (7219 B) to sandbox; created cc-ci-secrets repo (root `secrets.yaml`, own
+  `.sops.yaml` w/ `path_regex: secrets\.yaml$`, README). Pushed to main (auth via per-command
+  http.extraHeader; verified `.git/config` has NO creds). Remote lists .sops.yaml/README.md/secrets.yaml.
+- Cleaned `/root/cc-ci-secrets.yaml` + build script off cc-ci.
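The plaintext shape of the "append PEM as YAML block scalars" step, before `sops -e`, can be sketched as follows. This is a minimal illustration, not the script that ran on cc-ci: `render_block_scalar` and `sha256_hex` are hypothetical helpers, and the sketch assumes ASCII PEM input with no blank lines. It shows why a literal block scalar (`|`) suits PEM data: every line break is kept, so as long as the source file ends in a single newline the decrypted value hashes the same as the file.

```python
import hashlib


def render_block_scalar(key: str, text: str, indent: int = 2) -> str:
    """Render `text` as a YAML literal block scalar under `key`.

    The `|` indicator keeps line breaks verbatim; with the default "clip"
    chomping, the parsed value ends in exactly one newline.
    """
    pad = " " * indent
    body = "\n".join(pad + line for line in text.splitlines())
    return f"{key}: |\n{body}\n"


def sha256_hex(data: bytes) -> str:
    """Hex digest used to compare the source file with the decrypted value."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    pem = "-----BEGIN CERTIFICATE-----\nMIIB\n-----END CERTIFICATE-----\n"
    print(render_block_scalar("wildcard_cert", pem), end="")
```

Appending one such entry each for `wildcard_cert` and `wildcard_key` to the decrypted six-secret plaintext, then re-encrypting, yields the eight-key layout listed above.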
+
+**Layout decision:** cc-ci-secrets has `secrets.yaml` at ROOT → submodule mounts at base `secrets/`
+→ base sees `secrets/secrets.yaml`, so `defaultSopsFile = ../secrets/secrets.yaml` is UNCHANGED.
+
+**Next (W2 step 2):** in base repo — replace tracked `secrets/` with the submodule; add
+`wildcard_cert`/`wildcard_key` sops secrets in secrets.nix (path= → /var/lib/ci-certs/live, + recovery
+keyFile); adjust proxy.nix framing; switch cc-ci to new config via
+`nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'`; prove byte-identical +
+TLS-from-git-cert; then claim Gate W2. (Riskier — touches live server config; fresh iteration.)
+
+## 2026-05-27 — W2a DONE + verified live; Gate W2 CLAIMED
+
+**Discovery:** cc-ci's build source `/root/cc-ci` is NOT a git repo — it's a plain dir synced from
+the sandbox via `tar | ssh` and built as a `path:` flake (DECISIONS.md:126). So cc-ci's deploy needs
+NO submodule fetch / `?submodules=1` (the synced dir already contains `secrets/`). The git-clone
+`--recursive` + `?submodules=1` path is only for the documented install / throwaway (W4).
+
+**Did (W2a — secrets split + cert into git, deployed to live cc-ci):**
+- secrets.nix: added `wildcard_cert`(0444)/`wildcard_key`(0400) sops secrets → `path=/var/lib/ci-certs/live/*`.
+- proxy.nix: reframed cert as sops-from-git (not operator drop); kept FATAL guard as a decrypt-path check.
+- Base repo: `git rm secrets/secrets.yaml`; `git submodule add cc-ci-secrets secrets` (gitlink 2312f1c,
+  `.gitmodules` has NO creds). Pushed f79e542 (rebased over Adversary's c360520; resolved the
+  tracked-file→submodule transition by removing the submodule wd before rebase, repopulating after).
+- Synced to cc-ci via `tar | ssh` (excluded .git). `nixos-rebuild build` → exit 0, only **6 derivations
+  built** (sops manifest gains cert/key + proxy unit error-msg edit) → toplevel
+  `vh6vwxbl4qr9whzpwgjimhf9gn4329p8` (differs from pre-W2 `m1pdvbhl…` — EXPECTED: cert moved
+  out-of-band-file → Nix-managed sops; that is C2's whole point, not drift).
+- Backed up operator cert (`/root/ci-certs-operator-bak`), removed the regular files, `nixos-rebuild
+  switch` (detached unit `ccci-w2-switch`, Result=success).
+
+**Verified live:**
+- sops cert decrypt: `/var/lib/ci-certs/live/{fullchain,privkey}.pem` are now symlinks → `/run/secrets/
+  wildcard_{cert,key}`; content sha256 == source: `c1d96d61…` / `9ec25d00…` (byte-identical to the
+  original operator cert, now git-sourced).
+- `systemctl is-system-running` → running, 0 failed. `deploy-proxy` active/success.
+- **Byte-identical (zero drift):** `nixos-rebuild build` == `/run/current-system` == `vh6vwxbl…`.
+- **Documented git-clone path also reproduces it:** fresh `git clone --recursive` into a temp git repo
+  + `nixos-rebuild build --flake 'git+file:///tmp/ccci-git?submodules=1#cc-ci'` → **vh6vwxbl… (MATCH)**.
+  Proves the install/throwaway path works and equals running.
+- **Live TLS from git cert:** `https://ci.commoninternet.net` http=200 ssl_verify=0; random
+  `probe-*.ci.commoninternet.net` handshake ssl_verify=0 (404 route, expected) via gateway→cc-ci;
+  served leaf `CN=*.ci.commoninternet.net`, LE issuer, valid to Aug 24 2026.
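The "byte-identical" checks compare store paths rather than file contents: two NixOS toplevels are the same closure when they resolve to the same `/nix/store/<hash>-<name>` path. A minimal sketch of that comparison, assuming `nixos-rebuild build` was run in the current directory (it leaves a `result` symlink there); `zero_drift` and `store_hash` are hypothetical helper names:

```python
import os


def store_hash(path: str) -> str:
    """Return the hash part of the /nix/store/<hash>-<name> path that `path` resolves to."""
    name = os.path.basename(os.path.realpath(path))
    return name.split("-", 1)[0]


def zero_drift(built: str = "result", running: str = "/run/current-system") -> bool:
    """True when the freshly built toplevel is the one currently running."""
    return os.path.realpath(built) == os.path.realpath(running)


if __name__ == "__main__":
    ok = zero_drift()
    print(f"built={store_hash('result')} running={store_hash('/run/current-system')} zero_drift={ok}")
```

On cc-ci this prints the short hashes (e.g. `vh6vwxbl…` on both sides) and `zero_drift=True` when there is no drift.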
+
+**For the Adversary verifying Gate W2 cold:** must init the submodule (`git clone --recursive` OR
+`git submodule update --init`, bot creds) then build with `?submodules=1`, else `secrets/` is empty.
+Both the `path:` build and the git+submodules build yield the same toplevel `vh6vwxbl…` (content-addressed).
+
+**Deferred to W3/W4 prep (NOT in W2):** the recovery-key `sops.age.keyFile` for the throwaway VM —
+adding it changes the closure again, so I'll add + test it on the throwaway (safe) and re-establish
+byte-identical there. cc-ci stays on its proven host-key decrypt path for now.
+
+**Next:** Gate W2 CLAIMED → await Adversary PASS on byte-identical + cert-in-git/TLS. Meanwhile prep W1
+(resize) / W3 (throwaway VM) — read the incus skill.
+
+## 2026-05-27 — W3 recon (read-only; while parked at Gate W2)
+
+Incus skill read. b1 = 100.117.251.31:8443, project terraform-ci, mTLS certs at
+/srv/incus-terraform-nix-vm-creator/terraform-secrets/{terraform.crt,terraform.key}. **b1 reachable
+via the EXISTING cc-ci proxy** (`curl --proxy socks5h://127.0.0.1:1055 --cert/--key -k …`) — no
+separate tailscaled needed (skill's own 1055 proxy would collide; reuse cc-ci's).
+
+terraform-ci instances + RAM:
+- cc-nix-test  Running  6GB  VM   ← this IS the live cc-ci; W1 resizes 6→4 (stop→set→start, hotplug times out)
+- lichen-staging Running 4GB container (leave alone)
+- kube-base / kube-base-test  Stopped 4GB VMs
+- release-runner Stopped 8GB VM
+Running total now = 10GB. After W1 + throwaway(4GB): 4+4+4 = 12GB ≤ 16 physical (phase-plan ~12GB
+doc-only guideline; terraform-ci has no enforced limits.memory). VM create = `projects/incus-base`
+Terraform template (NixOS base image, cloud-init+tailscale+nix flakes), set instance_name + limits.memory=4GB.
+
+## 2026-05-27 — W1 DONE: cc-nix-test resized 6→4 GB (verified)
+
+Gate W2 PASSED (Adversary, cold) → proceeded. No active CI run (only 5 permanent stacks). Resized via
+Incus API on b1 (mTLS certs through the existing 1055 proxy): PUT state stop (op Success, Stopped) →
+PATCH `limits.memory=4GB` (http 200) → PUT state start (op Success, Running).
+**Verified after reboot:**
+- SSH back in ~30s; `systemctl is-system-running` → running after ~104s (swarm/reconcile converge), 0 failed units.
+- `free -h` total 3.5Gi (≈4 GB, down from 6). All stacks 1/1 (traefik app+socket-proxy, drone, bridge, dashboard, backups).
+- **Cert survived reboot via sops:** `/var/lib/ci-certs/live/{fullchain,privkey}.pem` still symlinks →
+  /run/secrets/* (sops re-decrypted on cold boot). current-system still `vh6vwxbl…`.
+- TLS: `https://ci.commoninternet.net/` http=200 ssl_verify=0 (dashboard served from git cert).
+Running RAM now: cc-nix-test 4 + lichen-staging 4 = 8 GB; throwaway 4 → 12 GB ≤ 16 physical (guideline OK).
+
+**Next: W3** — create blank 4 GB NixOS VM in terraform-ci, provision ONLY the bootstrap (recovery) age key.
+
+## 2026-05-27 — W3: throwaway VM created (booting) + W4 design notes
+
+**W3:** Created `ccci-throwaway` in terraform-ci via the **Incus REST API** (curl through the 1055
+proxy — terraform/nix absent on sandbox; replicated `projects/incus-base/main.tf`): image
+`incus-base-vm` (fp 3a0c4160), 4 GB RAM / 2 cpu / **20 GB disk** (>10 GB default, to dodge cc-ci's old
+ENOSPC), cloud-init writes /etc/nixos/{configuration,incus-base}.nix + setup.sh + /etc/ts-auth-key
+(incus workspace reusable key) + /etc/ts-hostname=ccci-throwaway; runcmd setup.sh (nix-channel
+nixos-24.11, `nixos-rebuild boot`, sysrq reboot → tailscale auto-joins). ssh_authorized_keys = vm_ssh_key
+(I hold private) + mfowler + cc-ci-root key. CREATE+START ops Success, status Running; first boot ~4-6 min.
+NOTE: cc-nix-test was terraform-created (`projects/cc-nix-test`); my W1 API resize drifts its tfstate
+(reconcile or accept in W6 final-sizing).
+
+**W4 design (analysis; implement next):**
+- cc-ci's `hosts/cc-ci/configuration.nix` pins tailscale `--hostname=cc-nix-test` + reads /etc/ts-auth-key,
+  and `secrets.nix` decrypts ONLY via `age.sshKeyPaths` (host SSH key). Consequences for the throwaway:
+  1. **Decryption:** throwaway's host SSH key is NOT a sops recipient → cc-ci config as-is can't decrypt
+     there. **W4 must add `sops.age.keyFile = "/var/lib/sops-nix/key.txt"`** and provision the **recovery
+     age key** there (the ONE out-of-band secret). Open Q: does a *missing* keyFile abort activation on
+     cc-ci (where the file won't exist)? If yes, also provision cc-ci's own host-derived age key at that
+     path (no new exposure) OR keep sshKeyPaths+keyFile and confirm sops-nix tolerates the absence.
+     Test path: add keyFile, deploy to cc-ci (rollback-safe via generations), observe.
+  2. **Tailnet hostname:** after rebuild the throwaway re-ups as `cc-nix-test` → tailscale auto-suffixes
+     the duplicate; the REAL cc-ci is accessed by IP (100.90.116.4) so it's unaffected. Verify the
+     throwaway via its own IP (Incus state tailscale0 addr) and/or incus-agent `exec` (hostname-independent).
+  3. **Bridge side effect:** throwaway's bridge would poll Gitea with the real token (fresh state ⇒ could
+     re-trigger already-`!testme`'d PRs). Mitigate: run W4 when no `!testme` is pending; destroy promptly.
+- Adding keyFile changes the closure again (W2 byte-identical was at `vh6vwxbl`); re-verify after.
+
+## 2026-05-27 — W3 DONE (VM reachable) + keyFile finding
+
+**W3 reachable:** throwaway base boot initially failed tailscale auth — the incus-workspace
+`.test.env` key is **stale** ("invalid key: API key does not exist"). Fixed by writing the **current
+`TS_AUTH_KEY` from /srv/cc-ci/.testenv** (same tailnet `taila4a0bf.ts.net`) to /etc/ts-auth-key and
+`tailscale up`. VM now at **100.126.124.86**; `ssh -i vm_ssh_key` via the 1055 proxy works → NixOS
+24.11 (rev 50ab793, == cc-ci), nix 2.24 flakes, 4 GB / 20 GB (13 G free). *(install.md/Adversary note:
+provision the live TS key, not the stale workspace one.)*
+
+**keyFile finding (decisive):** read sops-install-secrets main.go (sops-nix 77c423a, store
+`hm2xjph…-source/pkgs/sops-install-secrets/main.go`): when `age.keyFile` is set, line ~1349
+`os.ReadFile(AgeKeyFile)` and **returns a fatal error if the file is missing** → activation fails.
+⇒ Adding `keyFile` to cc-ci's config FORCES the file to exist on cc-ci. Also: `sshKeyPaths` reads
+`/etc/ssh/ssh_host_ed25519_key` (exists on any host; non-recipient keys are simply unused), so keeping
+both is safe on both hosts.
+
+**W4 design (locked):** secrets.nix gets `sops.age.keyFile = "/var/lib/sops-nix/key.txt"` (keep
+sshKeyPaths). Provision that file = the host's bootstrap age key: on **cc-ci** = its host-derived age
+key (ssh-to-age of the host SSH key — no new secret exposure); on the **throwaway** = the **recovery
+key** (/srv/cc-ci/.sops/master-age.txt). cc-ci must get the file BEFORE the keyFile config deploys.
+Adding keyFile changes the closure (supersedes W2 `vh6vwxbl`) → re-verify byte-identical after.
+
+## 2026-05-27 — Orchestrator guidance for C4 TLS verification (W4 Step B)
+
+The throwaway has a NEW tailscale IP (100.126.124.86); the canonical `ci.commoninternet.net`
+gateway/DNS still points at the LIVE cc-ci, and the git cert is `*.ci.commoninternet.net`. So verify
+C4 TLS **locally ON the throwaway**, WITHOUT repointing the live gateway and WITHOUT changing the
+throwaway DOMAIN (keep DOMAIN=ci.commoninternet.net so the cert matches):
+- ssh into the throwaway; `curl --resolve probe.ci.commoninternet.net:443:127.0.0.1 \
+  https://probe.ci.commoninternet.net/` → hits the local traefik with SNI ci.commoninternet.net.
+- Confirm the served leaf == the git cert (sha256 fullchain `c1d96d61…`; Adversary's leaf fingerprint
+  `57:8D:67:9E:FE:89:…:B8:A6`). That proves the rebuilt system serves the git-sourced cert reproducibly.
+- Do NOT use ci2 for the TLS test (no `*.ci2` cert → would mismatch). Operator wired
+  `ci2.commoninternet.net` + `*.ci2` → 100.126.124.86 for *plain* reachability only (not needed for TLS).
+- DNS/gateway/cert are documented external INSTANCE preconditions; C4 proves the VM rebuilds from git
+  + the single bootstrap age key. Don't skip/fake the TLS check.
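The local leaf check can also be scripted. The sketch below is an assumed equivalent of the `curl --resolve` probe, not the command that was run: it opens TCP to `127.0.0.1:443`, sends SNI `probe.ci.commoninternet.net`, verifies chain and hostname against the system trust store, and formats the leaf's SHA-256 in the colon-separated style used above. `first_pem_cert` takes only the first (leaf) certificate from a `fullchain.pem`-style file; all function names are hypothetical.

```python
import hashlib
import socket
import ssl

BEGIN = "-----BEGIN CERTIFICATE-----"
END = "-----END CERTIFICATE-----"


def sha256_fingerprint(der: bytes) -> str:
    """Uppercase, colon-separated SHA-256 of a DER certificate (e.g. '57:8D:...')."""
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def first_pem_cert(text: str) -> str:
    """Return the first PEM certificate in `text` (the leaf in a fullchain file)."""
    start = text.index(BEGIN)
    end = text.index(END, start) + len(END)
    return text[start:end] + "\n"


def file_leaf_fingerprint(fullchain_pem: str) -> str:
    """Fingerprint of the leaf certificate stored in a fullchain PEM string."""
    return sha256_fingerprint(ssl.PEM_cert_to_DER_cert(first_pem_cert(fullchain_pem)))


def served_leaf_fingerprint(sni: str, addr: str = "127.0.0.1", port: int = 443) -> str:
    """Connect to addr:port with SNI `sni` (like curl --resolve) and fingerprint the served leaf."""
    # Verifies chain + hostname; a failure raises ssl.SSLCertVerificationError.
    ctx = ssl.create_default_context()
    with socket.create_connection((addr, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=sni) as tls:
            der = tls.getpeercert(binary_form=True)
    return sha256_fingerprint(der)
```

On the throwaway, the leaf match would then read `served_leaf_fingerprint("probe.ci.commoninternet.net") == file_leaf_fingerprint(open("/var/lib/ci-certs/live/fullchain.pem").read())`.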
+
+## 2026-05-27 — W4 Step A DONE + Step B launched (throwaway rebuild in flight)
+
+**Step A (cc-ci → final keyFile config):** provisioned cc-ci `/var/lib/sops-nix/key.txt` = host-derived
+age key (pub == `age1h90utd…` == &host recipient, verified via age-keygen -y). Added
+`sops.age.keyFile` to secrets.nix (9cc6788), synced, `nixos-rebuild build`→`izsmiajw…` (only
+manifest+system rebuilt), switched (unit ccci-w4a-switch success). Verified: system running 0 failed,
+**byte-identical build==running==`izsmiajw…` (ZERO DRIFT)**, cert still sha256 `c1d96d61…`. So cc-ci
+activates cleanly with keyFile. NOTE: toplevel evolved `vh6vwxbl` (W2) → **`izsmiajw`** (final, +keyFile);
+the published repo now builds to izsmiajw==running — this is the form the Adversary re-verifies for C4/DONE.
+
+**Step B (throwaway live rebuild — IN FLIGHT):**
+- Provisioned throwaway `/var/lib/sops-nix/key.txt` = **recovery key** (via stdin; pub == `age1cmk26…`
+  == &master recipient, verified) — the ONE out-of-band secret.
+- `git clone --recursive` base (bot creds via http.extraHeader, the "given the repos" provisioning) →
+  /root/cc-ci, submodule `secrets`→2312f1c, secrets.yaml ENC. Confirmed clone has `age.keyFile` line.
+- Launched `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` as detached unit
+  `ccci-rebuild` (survives the tailscale re-up when cc-ci config activates). Monitoring via incus-agent
+  `exec` (vsock — survives network restart). Expect 10-30 min (builds sops-install-secrets/abra/etc).
+
+C4/W5 standard (Adversary dd710a6 == orchestrator guidance): keep DOMAIN=ci.commoninternet.net, verify
+TLS locally on the VM via `curl --resolve …:443:127.0.0.1` (SNI ci.commoninternet.net), served leaf
+fingerprint must == git cert leaf `57:8D:67:9E:…:B8:A6`; oneshots converge; only age key out-of-band.
+
+## 2026-05-27 — W4 Step B: throwaway rebuilt; concurrent-abra race found + fixed
+
+**Throwaway rebuild result (pre-fix config, clone @dd710a6):** `nixos-rebuild switch` BUILD succeeded
+(2.8 G peak RAM < 4 GB, 11.5 min CPU) → toplevel **`izsmiajw…` == cc-ci's running system** (blank VM
+reproduces cc-ci byte-for-byte from git + the bootstrap age key). **sops cert decrypted via the
+RECOVERY key**: /var/lib/ci-certs/live/{fullchain,privkey}.pem → /run/secrets/*, sha256 `c1d96d61…`
+(match). swarm-init + docker active (node Ready/Leader). BUT activation reported "error(s) while
+switching": `deploy-proxy` + `deploy-drone` FAILED → system `degraded`.
+
+**Root cause:** the abra reconcilers (proxy/drone/bridge/dashboard/backupbot) are all
+`wantedBy multi-user.target`; drone/bridge/dashboard were `after deploy-proxy` but **concurrent with
+each other**, and backupbot concurrent with proxy. On a FRESH `~/.abra` they race on catalogue/recipe
+init → fast failures. Confirmed: `abra recipe fetch traefik` works fine alone (rc=0); re-running the
+oneshots **sequentially** (`systemctl restart deploy-proxy; …drone; …bridge; …dashboard; …backupbot`)
+→ ALL success, system `running`, **0 failed, all 6 stacks 1/1** (traefik app+socket-proxy, drone,
+bridge, dashboard, backups) — identical to cc-ci.
+
+**Fix (7563d47):** serialize the chain via ordering-only `after`:
+proxy → drone → bridge → dashboard → backupbot (bridge after drone, dashboard after bridge, backupbot
+after dashboard). So a single `nixos-rebuild switch` on a blank host converges with no concurrent abra.
+New toplevel `ld19aj2…`. Deploying to cc-ci (reconcilers already deployed there ⇒ serial no-op
+re-runs) + re-verify byte-identical, then **recreate the throwaway FRESH** to prove single-switch
+convergence (authoritative C4; mirrors the Adversary's W5 cold test).
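The ordering fix can be checked as a graph problem. The sketch below treats each reconciler's `after` set as graph edges and groups units into waves that systemd could start at the same time (for `Type=oneshot` units, `After=` waits for the predecessor to finish). Only `deploy-proxy` and `deploy-drone` are unit names from this journal; `deploy-bridge`, `deploy-dashboard` and `deploy-backupbot` are assumed names. Any wave with more than one unit marks a place where two `abra` processes can race on a fresh `~/.abra`.

```python
from graphlib import TopologicalSorter

# Predecessors per unit (systemd After=). Names other than deploy-proxy and
# deploy-drone are assumptions for illustration.
BEFORE_FIX = {
    "deploy-proxy": set(),
    "deploy-drone": {"deploy-proxy"},
    "deploy-bridge": {"deploy-proxy"},
    "deploy-dashboard": {"deploy-proxy"},
    "deploy-backupbot": set(),  # no ordering: concurrent with proxy
}
AFTER_FIX = {
    "deploy-proxy": set(),
    "deploy-drone": {"deploy-proxy"},
    "deploy-bridge": {"deploy-drone"},
    "deploy-dashboard": {"deploy-bridge"},
    "deploy-backupbot": {"deploy-dashboard"},
}


def start_waves(after: dict) -> list:
    """Group units into batches whose predecessors have all finished."""
    ts = TopologicalSorter(after)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)
    return waves


def has_concurrent_abra(after: dict) -> bool:
    """True if any wave would start two reconcilers (two abra runs) at once."""
    return any(len(wave) > 1 for wave in start_waves(after))


if __name__ == "__main__":
    for label, graph in (("before", BEFORE_FIX), ("after", AFTER_FIX)):
        print(label, start_waves(graph))
```

A single linear chain is one valid fix; it gives up some first-boot parallelism in exchange for never running two `abra` initialisations concurrently.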
+
+This is the LAST planned config change before W4 completes (config stable ld19aj2 thereafter).
+
+## 2026-05-27 — W4: cc-ci on serialized config (ld19aj2) + throwaway TLS leaf-match PASS
+
+- cc-ci switched to serialized config: `systemctl is-system-running`=running, **byte-identical
+  build==running==`ld19aj2dcrjm6jarq1k6rvhc0zww34qq` (ZERO DRIFT)**, 6 stacks.
+- **Throwaway local TLS (C4 cert proof):** on the rebuilt throwaway (IP 100.126.124.86),
+  `curl --resolve probe.ci.commoninternet.net:443:127.0.0.1` → http=404 (no route, expected)
+  **ssl_verify=0**. Served leaf sha256 fingerprint == git-cert leaf:
+  `57:8D:67:9E:FE:89:D5:FB:43:2E:2A:02:D6:A6:BA:F4:9B:98:1A:78:4A:6C:6A:85:DB:F6:A2:81:61:A6:B8:A6`
+  (== Adversary reference). Full chain of custody: git sops → recovery-key decrypt → /var/lib/ci-certs/
+  live → traefik swarm secret → served leaf. The rebuilt host serves the git-sourced cert.
+
+Next: recreate throwaway FRESH with fixed config to prove SINGLE nixos-rebuild switch converges (0 failed).
+
+## 2026-05-27 — W4 DONE: genuine throwaway-VM live rebuild, SINGLE switch converges (Gate W4 CLAIMED)
+
+**Authoritative C4 proof on a FRESH blank VM** (destroyed the pre-fix VM, recreated clean; cloud-init
+used the LIVE TS_AUTH_KEY so it auto-joined the tailnet — no manual tailscale step):
+- Provisioned ONLY `/var/lib/sops-nix/key.txt` = recovery age key (pub == `age1cmk26…` == &master) —
+  the single out-of-band secret. `git clone --recursive` base+secrets (submodule 2312f1c, secrets ENC).
+- **One** `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` (detached
+  --no-block) → `ccci-rebuild` Result=**success** (~15 min, 2.8 G peak < 4 GB).
+- **`systemctl is-system-running` → running, 0 failed units** (the serialization fix works: single
+  switch converges, no manual re-runs). Toplevel **`ld19aj2…` == cc-ci** (byte-identical).
+- **All 6 stacks 1/1**: traefik app+socket-proxy, drone, ccci-bridge, ccci-dashboard, backups.
+- **All secrets decrypted via the recovery key**; wildcard cert sops-decrypted from git →
+  `/var/lib/ci-certs/live/fullchain.pem` (symlink→/run/secrets, sha256 `c1d96d61…`).
+- **TLS from git cert (local, per C4 standard):** `curl --resolve probe.ci.commoninternet.net:443:
+  127.0.0.1` → http=404 (no route, expected) **ssl_verify=0**; served leaf sha256 fingerprint
+  **== git-cert leaf == `57:8D:67:9E:FE:89:…:B8:A6`** (Adversary reference). Full chain of custody.
+
+So: blank NixOS host + the two git repos + the one bootstrap age key + external DNS/gateway → one
+`nixos-rebuild switch` → working cc-ci. No undocumented manual step. This closes D8 honestly (static
+byte-identical closure + live throwaway rebuild). install.md updated to this validated procedure.
+
+Destroying the throwaway now (frees RAM for the Adversary's independent W5 cold rebuild; C6 no-leftover).
+Gate W4 CLAIMED — awaiting Adversary cold W5 (their own fresh VM).
+
+## 2026-05-27 — Operator override: keep the FINAL throwaway (promote → cc-nix-test)
+
+Orchestrator/operator note: do NOT destroy the FINAL W5/C4-C5 clean-room throwaway VM after it
+PASSes — the operator repurposes it as the new cc-nix-test for a live real-traffic test through the
+public gateway. Keep it running; defer its C6 teardown until the operator explicitly says otherwise.
+Overrides plan §5/§6 "destroy the throwaway" for that one VM. Settles **C6 final sizing = promote the
+rebuilt VM**. Recorded in DECISIONS.md + STATUS-1c (flagged for the Adversary so they don't tear down
+their W5 VM on PASS). My already-destroyed first throwaway + RAM accounting unaffected.
+
+## 2026-05-27 — Added acceptance step: real e2e !testme on the promoted VM (operator-gated)
+
+Orchestrator added a functional-acceptance step for the clean-room rebuild. SEQUENCING (strict):
+(1) finish W5/C4-C5; (2) ORCHESTRATOR renames the verified throwaway → cc-nix-test so the public
+gateway (ci.commoninternet.net + `*.ci` via MagicDNS) routes to it, and SIGNALS me; (3) THEN I run a
+genuine e2e: `!testme` (as bot) on ONE enrolled recipe (fast, e.g. custom-html) → confirm bridge
+picks up → Drone builds → app deploys to `<recipe>.ci.commoninternet.net` reachable **through the
+public gateway** (curl the public subdomain, not localhost) → test passes → undeploy → result
+reported. Record Drone run # + public-URL curl in JOURNAL-1c/STATUS-1c as functional acceptance of
+D8/clean-room. Until the swap-done signal: keep the rebuilt VM's full stack running, do NOT tear down,
+do NOT start the e2e. (Tracked as W5.5 in BACKLOG-1c.)
+
+## 2026-05-27 — E2E-TESTME spec is authoritative (cc-ci-plan/test-e2e-testme-acceptance.md)
+
+Orchestrator: the full spec at `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md` is the AUTHORITY
+(supersedes earlier inline wording). Read it. It's MY test to execute; Adversary independently
+verifies. Preconditions P1-P3 are orchestrator-provided (node rename → cc-nix-test, public-gateway
+routing, then a SIGNAL). Self-check on signal: `curl https://ci.commoninternet.net/` → 200 ssl_verify=0.
+Pass criteria E1-E6 (new spec §3): E1 self-check; E2 new Drone build via bridge (not manual); E3 app
+answers EXTERNAL request at `<app>.ci.commoninternet.net` through gateway (real 200+cert+content, not
+localhost); E4 real assertions pass / build success; E5 clean undeploy; E6 reported + dashboard
+updated. Evidence→JOURNAL-1c, verdict→STATUS/REVIEW-1c as E2E-TESTME PASS. On fail: clean-room finding
+→ fix in GIT SOURCE (base/cc-ci-secrets), not the live VM → re-run. Bound: one recipe, one green run.
+Not started — awaiting orchestrator signal; rebuilt VM stack kept up.
+
+## 2026-05-27 — E2E-TESTME: Builder now owns the tailnet swap (no orchestrator signal)
+
+Spec §1 updated (re-read): the Builder performs the swap end-to-end after C4/C5 PASS + rebuilt stack
+up — NO orchestrator signal. Two reversible `tailscale set --hostname` (ORDER MATTERS):
+(1) `ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig'` (original aside, KEEP running for swap-back;
+ssh cc-ci pinned to 100.90.116.4 still hits original); (2) rebuilt throwaway → cc-nix-test (re-derive
+its current online IP from `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i
+throwaway`). Then cc-nix-test.taila4a0bf.ts.net → rebuilt VM tailnet-wide; gateway auto-follows ~10s.
+Verify P1+P2 (status shows cc-nix-test→throwaway IP; `curl https://ci.commoninternet.net/` 200
+ssl_verify=0) → run E2E-TESTME (E1-E6) → swap-back (rebuilt→old name, `ssh cc-ci 'tailscale set
+--hostname=cc-nix-test'`). Orchestrator just monitors / safety-net.
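Re-deriving the rebuilt VM's current IP is the step most likely to go stale between runs. A minimal sketch of the `grep -i throwaway` step, assuming the default text output of `tailscale status` (whitespace-separated columns with the IP first, the hostname second, and the word `offline` on peers that are down); `find_online_ip` and `tailscale_status` are hypothetical helpers:

```python
import subprocess
from typing import Optional


def find_online_ip(status_text: str, name_fragment: str) -> Optional[str]:
    """Return the tailnet IP of the first online peer whose hostname contains name_fragment."""
    needle = name_fragment.lower()
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines and any '#' commentary lines in the output.
        if len(fields) < 2 or line.lstrip().startswith("#"):
            continue
        ip, host = fields[0], fields[1]
        if needle in host.lower() and "offline" not in fields[2:]:
            return ip
    return None


def tailscale_status(socket_path: str) -> str:
    """Run `tailscale --socket=<path> status` and return its stdout."""
    return subprocess.run(
        ["tailscale", f"--socket={socket_path}", "status"],
        check=True, capture_output=True, text=True,
    ).stdout
```

Renaming the original first matters because tailscale auto-suffixes a duplicate hostname: if the rebuilt VM tried to claim `cc-nix-test` while the original still held it, the rebuilt VM would end up under a suffixed name and the gateway would keep routing to the original.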
+
+**Two execution watch-outs I'll handle at run time** (reasoned, not yet done): (a) the original
+(cc-nix-test-orig) keeps its bridge polling Gitea with the same token → would duplicate builds/PR
+comments; pause it during the e2e (`docker service scale ccci-bridge_app=0` on the original, restore
+after). (b) the rebuilt VM's Drone needs the one-time OAuth bootstrap (install.md §2,
+scripts/bootstrap-drone-oauth.sh) before it can clone/build — a documented post-step, run it on the
+rebuilt VM as part of e2e setup. Still gated on C4/C5 PASS (W5) — not started.
+
+## 2026-05-27 — E2E-TESTME actor/critic split clarified (avoid node-rename collision)
+
+Orchestrator disambiguation: only ONE loop runs `tailscale set --hostname`. **Builder (me) owns the
+swap + the !testme test**; the swap TARGET is the **Adversary's** kept-running W5 VM (Incus instance
+**`ccci-w5-rebuild`**) — my own throwaway was destroyed. The **Adversary does NOT rename**; it keeps
+its W5 VM up, **records the VM identity (Incus instance + current tailscale IP) in REVIEW-1c/STATUS**,
+and independently VERIFIES E1-E6 cold (critic role). So I **WAIT for (i) Adversary W5 PASS + (ii) the
+recorded VM IP** before swapping (original→cc-nix-test-orig, then ccci-w5-rebuild→cc-nix-test). Updated
+STATUS-1c pending-e2e accordingly. Still gated on W5 — not started.
+
+## 2026-05-27 — E2E-TESTME clean-room finding: Drone bot token not reproducible (FIXED in git)
+
+While doing the e2e setup on the swapped-in rebuilt VM, found that the sops `bridge_drone_token` gets **401
+Unauthorized** from the rebuilt VM's Drone. Root cause: `modules/drone.nix` set
+`DRONE_USER_CREATE=username:autonomic-bot,admin:true` with **no `token:`** → Drone auto-generates a
+RANDOM bot machine token in its fresh DB, which can't equal the committed sops token (the original
+cc-ci only matched because its token was captured FROM the running Drone out-of-band). So on a genuine
+clean-room rebuild the bridge can't authenticate to Drone → can't trigger builds. This is precisely the
+out-of-band gap the E2E-TESTME is designed to catch (spec §4). **Fix (git source):**
+`DRONE_USER_CREATE=...,token:$(cat /run/secrets/bridge_drone_token)` so the bot's machine token is the
+deterministic sops token on every rebuild. Confirmed via: rebuilt Drone container env had no token;
+`GET /api/repos/.../builds` with sops token → `{"message":"Unauthorized"}`.
+Evolves the toplevel again (ld19aj2 → new); will re-deploy to cc-ci + re-verify byte-identical after
+the e2e, Adversary re-checks C1. Next: apply fix on the rebuilt VM (rebuild → redeploy Drone; wipe
+Drone DB if DRONE_USER_CREATE doesn't update the existing bot), re-run OAuth, then the !testme e2e.
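The fix makes the bot's machine token a pure function of the sops secret. Below is a sketch of the value the unit ends up exporting, assuming the secret file holds the token followed by at most trailing newlines (shell `$(cat ...)` drops trailing newlines, which `rstrip` mirrors). The field order follows this journal's `username:<name>,admin:true,token:<token>` form; `read_secret` and `drone_user_create` are hypothetical helpers, and rejecting `,` in the token is an assumption about how Drone splits the field list.

```python
from pathlib import Path


def read_secret(path: str) -> str:
    """Read a sops-nix secret file, dropping trailing newlines like shell $(cat ...)."""
    return Path(path).read_text().rstrip("\n")


def drone_user_create(username: str, token: str, admin: bool = True) -> str:
    """Build DRONE_USER_CREATE so the bot's token equals the committed sops token."""
    if not token:
        raise ValueError("empty bot token")
    if "," in token:
        # Assumed: ',' separates the key:value fields, so it cannot appear in the value.
        raise ValueError("token must not contain ','")
    return f"username:{username},admin:{str(admin).lower()},token:{token}"


# Usage on the host (not run here):
#   drone_user_create("autonomic-bot", read_secret("/run/secrets/bridge_drone_token"))
```

Because the token now comes from git-tracked sops data, every rebuild produces the same Drone bot credential the bridge already holds, instead of one minted at random in a fresh Drone DB.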
+
+## 2026-05-27 — E2E-TESTME on the rebuilt VM: E1-E3 PASS (E4/E5 tracking)
+
+After applying the Drone-token fix (new toplevel `cqym8knj…`), the rebuilt VM is operational. Restarted
+drone-runner-exec (stale RPC after the Drone redeploy) → queue drained (cc-ci self-test #1 success).
+Posted `!testme` (comment 13740, autonomic-bot) on custom-html#2 (head db9a9502). Evidence:
+- **E1 PASS** — `https://ci.commoninternet.net/` via public gateway → 200 ssl_verify=0 (rebuilt VM).
+- **E2 PASS** — bridge (poll) picked up the comment → **new Drone build #4** (event=custom, > baseline
+  #3) on the rebuilt VM's Drone. Not a manual trigger.
+- **E3 PASS** — app deployed to `cust-bdddd9.ci.commoninternet.net`; EXTERNAL curl through the public
+  gateway (sandbox → socks proxy → public DNS → gateway → MagicDNS cc-nix-test → rebuilt VM → Traefik →
+  app) → **HTTP/2 200, ssl_verify=0**, `server: nginx/1.31.1`, body `<!DOCTYPE html>…Welcome to nginx!`
+  (real app content, NOT a Traefik 404), cert `CN=*.ci.commoninternet.net` (LE E8). Crux proven.
+- E4 (build #4 success), E5 (teardown), E6 (reported+dashboard): monitor tracking to build terminal.
+
+## 2026-05-27 — E2E-TESTME: ALL E1–E6 PASS (functional acceptance of D8/clean-room)
+
+Real `!testme` on the rebuilt-from-git VM (swapped in as cc-nix-test), full pipeline against the
+PUBLIC domain:
+- **E1 PASS** — `https://ci.commoninternet.net/` (public gateway → rebuilt VM) → 200 ssl_verify=0.
+- **E2 PASS** — `!testme` (bot, comment 13740) on custom-html#2 → bridge poll → **new Drone build #4**
+  (event=custom, > baseline #3), via the bridge (not manual).
+- **E3 PASS** — app `cust-bdddd9.ci.commoninternet.net` answered an EXTERNAL request through the public
+  gateway → HTTP/2 200, ssl_verify=0, nginx/1.31.1, real body `…Welcome to nginx!`, cert
+  `CN=*.ci.commoninternet.net` (LE E8). Routing public-DNS→gateway→MagicDNS→rebuilt VM→Traefik→app proven.
+- **E4 PASS** — build #4 success; build log shows the REAL 3 stages all passing (no softening):
+  install (`test_http_reachable`, `test_playwright_page` — Playwright), upgrade
+  (`test_upgrade_preserves_data`), backup (`test_backup_mutate_restore`). 2+1+1 assertions passed.
+- **E5 PASS** — app undeployed cleanly afterward (0 residual `<tag>-<6hex>` app .envs/stacks).
+- **E6 PASS** — bridge posted to custom-html#2: "custom-html @ db9a9502 ✅ **passed** →
+  …/cc-ci/4"; public dashboard row = custom-html / success / #4.
+
+→ **E2E-TESTME PASS.** The clean-room-rebuilt VM is operationally a working CI server end-to-end over
+the real public domain. Caught+fixed the Drone-bot-token reproducibility gap en route (af46aca).
+Next: swap-back; re-deploy the token fix to cc-ci (byte-identical at new toplevel cqym8knj); Adversary
+independently verifies E1-E6.
+
+## 2026-05-27 — Builder work COMPLETE (C1–C7 + E2E-TESTME); awaiting Adversary final verification
+
+cc-ci on final config `cqym8knj` (byte-identical, 0 failed, bridge→Drone OK). C7 docs done:
+install.md/secrets.md/architecture.md updated to the 1c model; plan.md §1.5 carries a Phase-1c
+supersession note (cert now sops-from-git; bootstrap age key the one out-of-band secret; supersedes
+§1.5/§4.0/§4.4 cert refs; points to docs/secrets.md). C6 settled (promote rebuilt VM, kept running;
+first throwaway destroyed; cc-nix-test 4 GB). All C1–C7 + E2E-TESTME implemented & Builder-verified.
+**Remaining = Adversary's final DONE-verification:** re-confirm C1 byte-identical at `cqym8knj` +
+independently verify E1–E6. I'll write `## DONE` when REVIEW-1c shows <24h PASS for C1–C7 + E2E-TESTME
+and no VETO. (plan.md is in cc-ci-plan/, not this repo — edited in place, not committed here.)
+
+## 2026-05-27 — ADV-1c-1 (architecture.md stale) addressed
+
+Adversary verdict b301b03: **E2E-TESTME E1–E6 PASS** (independent) + **C1–C6 PASS** (C1 refreshed cold
+at final `cqym8knj` == running, byte-identical; no VETO). **C7 WITHHELD** on finding ADV-1c-1:
+`docs/architecture.md` allegedly stale (line 17 "local secrets/secrets.yaml via host SSH key", cert
+"pre-issued out-of-band"). **But architecture.md was already updated to the 1c model in commit b700cd2**
+(an ANCESTOR of `3bfb48b`, the HEAD the Adversary cloned for C1) — current line 14/17 + §Network/TLS
+describe the `cc-ci-secrets` submodule, bootstrap age key, and cert-sops-from-git. The quoted "stale"
+text is the PRE-b700cd2 line 17 → ADV-1c-1 is a stale-clone false positive (the doc-grep used an older
+checkout). To remove all doubt I further expanded line 17 (explicit: cert-in-git, submodule, bootstrap
+key = host-derived OR recovery-key-on-clone, one out-of-band secret). **Adversary: please re-grep
+`docs/architecture.md` at current HEAD and close ADV-1c-1 → C7 PASS → DONE.**
+
+## 2026-05-27 — `## DONE` (Phase 1c complete)
+
+Adversary closed ADV-1c-1 → **C7 PASS** (9e0f72a). **ALL C1–C7 + E2E-TESTME Adversary-PASS (<24h, no
+VETO, no open findings).** Final Builder health check: cc-ci running/0-failed, byte-identical
+build==running==`cqym8knj` (ZERO DRIFT), 6 stacks, cert sops-from-git `c1d96d61…`, public TLS 200/ssl=0.
+Wrote `## DONE` to STATUS-1c. Phase 1c exit condition met → stopping the self-paced loop. The Adversary
+will append its final cold sign-off. Operator follow-up (non-gating): promote `ccci-w5-rebuild`→cc-nix-test
+(bridge paused, stack up); plan.md §4.0/§4.4 cert wording (superseding note at §1.5).
--- a/machine-docs/JOURNAL.md
+++ b/machine-docs/JOURNAL.md
@ -0,0 +1,790 @@
+# JOURNAL — cc-ci Builder (append-only)
+
+## 2026-05-26 — Bootstrap (§1)
+
+**Access verification (all pass):**
+- `ssh cc-ci 'hostname && whoami && nixos-version'` → `nixos` / `root` / `24.11.719113.50ab793786d9 (Vicuna)`
+- `curl https://git.autonomic.zone/api/v1/version` → `{"version":"1.24.2"}`
+- Gitea bot auth (`curl -u $GITEA_USERNAME:$GITEA_PASSWORD .../api/v1/user`) → `login: autonomic-bot`, id 64
+- `getent hosts probe-$RANDOM.ci.commoninternet.net` → `143.244.213.108` (the gateway IP, as expected — TLS passthrough)
+- Cert present: `ls /var/lib/ci-certs/live/` → `fullchain.pem` (2909 b), `privkey.pem` (227 b, mode 640)
+- recipe-maintainers org exists (private); `recipe-maintainers/cc-ci` → 404 (created below)
+- Mirrored recipes already present: bluesky-pds, lasuite-docs, custom-html, custom-html-tiny, n8n,
+  keycloak, lasuite-meet, matrix-synapse, cryptpad
+
+**Baseline (docs/baseline.md):** fresh NixOS 24.11 Incus VM, 2 vCPU, 3.5 GiB RAM, 8.9 GiB disk
+(3.8 GiB free). No docker/swarm/abra. Channel-based `/etc/nixos/configuration.nix` (no flake).
+
+**Actions:**
+- Created repo `recipe-maintainers/cc-ci` (private) via Gitea API.
+- `git init` in /srv/cc-ci/cc-ci; credential helper reads creds from /srv/cc-ci/.testenv (no
+  secrets stored in git config).
+- Seeded skeleton layout (§3) + loop-state files + docs/baseline.md.
+
+**Next:** commit + push bootstrap, then M0 (flake + base config + sops test secret).
+
+## 2026-05-26 — M0: flake + base config rebuilt from repo
+
+**Authored** `flake.nix` (pins nixpkgs rev `50ab793786d9…`, the exact rev cc-ci ran),
+`hosts/cc-ci/hardware.nix` (incus VM module + cloud-init + DHCP/nameservers) and
+`hosts/cc-ci/configuration.nix` (faithful baseline repro: tailscale w/ hardcoded `--hostname=cc-nix-test`
+since `builtins.readFile /etc/ts-hostname` is impure under flakes; sshd root; firewall
+trust tailscale0 + tcp/22; base pkgs).
+
+**Disk/inode hiccup → resolved:** first `nix flake lock`/build hit `No space left on device` —
+diagnosed as **inode** exhaustion (`df -i` → 6005 free of 586336; old 8.9 GiB fs). Operator grew
+the VM to 28 GiB while I was measuring; ext4 auto-resized → 22 GiB free, 1.21M inodes free. Retried.
+
+**Build + switch (commands + output):**
+- `ssh cc-ci 'cd /root/cc-ci && nix flake lock && nixos-rebuild build --flake .#cc-ci'` → `BUILD EXIT 0`,
+  produced `nixos-system-nixos-24.11.20250630.50ab793`.
+- `ssh cc-ci 'systemd-run --unit=ccci-rebuild --collect --property=Type=oneshot nixos-rebuild switch
+  --flake /root/cc-ci#cc-ci'` (detached so it survives ssh drop) → unit `Result=success
+  ExecMainStatus=0`.
+
+**Gate verification:**
+- `systemctl is-system-running` → `running`
+- `readlink /run/current-system` → `…-nixos-system-nixos-24.11.20250630.50ab793` (gen 3, from flake)
+- `systemctl is-active tailscaled` → `active`; `sshd.socket` → `active` (sshd is socket-activated, so
+  `sshd.service` reads inactive — live ssh proves it works)
+- `systemctl --failed` → none
+- `nixos-rebuild list-generations` → gen 3 current @20:23, prior channel gen 2 retained for rollback.
+
+**Known warning (tracked, non-blocking):** incus module enables `systemd.network` while we keep
+`networking.useDHCP=true` (scripted dhcpcd); Nix warns both may manage interfaces. Inherited from
+baseline; networking is up. Clean up by choosing one stack later.
+
+**Deploy mechanism settled** (DECISIONS.md): `switch --flake` on-host, repo synced via `tar | ssh`.
+
+**Next:** sops-nix wiring (host age key from ssh host key + a decrypt-a-test-secret proof), then
+CLAIM the M0 gate for the Adversary.
+
+## 2026-05-26 — M0: sops-nix wiring + decrypt-a-test-secret (M0 COMPLETE, gate CLAIMED)
+
+**Keys:**
+- Host age recipient from ssh host key: `ssh cc-ci 'nix run nixpkgs#ssh-to-age -- -i
+  /etc/ssh/ssh_host_ed25519_key.pub'` → `age1h90utdztfc23kx8ewrtrtk80mnddvrf8pg4ppej55rwwwupzhfvqhmp3qa`.
+- Master recovery key generated on host (`age-keygen`), public `age1cmk26t…`; private moved off-box
+  to `/srv/cc-ci/.sops/master-age.txt` (mode 600) and `shred`-ded from the host. Never in repo.
+
+**Files:** `.sops.yaml` (both recipients, rule `secrets/.*\.(yaml|json|env)$`); `modules/secrets.nix`
+(`sops.age.sshKeyPaths=[/etc/ssh/ssh_host_ed25519_key]`, `secrets.test_secret={}`); flake gains
+`sops-nix` input + `sops-nix.nixosModules.sops`; configuration.nix imports the module.
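A quick way to see which paths the `.sops.yaml` rule covers is to test them against the same pattern. The Python sketch below assumes sops applies `path_regex` as an unanchored search over the path it is given (sops is written in Go, and this pattern means the same thing in Python's `re`). `is_sops_managed` is a hypothetical helper, not part of sops.

```python
import re

# Pattern copied from the .sops.yaml creation rule described above.
SOPS_PATH_REGEX = re.compile(r"secrets/.*\.(yaml|json|env)$")


def is_sops_managed(path: str) -> bool:
    """True if `path` would match the creation rule (unanchored search, assumed)."""
    return SOPS_PATH_REGEX.search(path) is not None


if __name__ == "__main__":
    for p in ["secrets/secrets.yaml", "secrets/drone.env", "secrets/notes.txt", "flake.nix"]:
        print(f"{p:24} {is_sops_managed(p)}")
```

Because the search is unanchored, a nested path such as `hosts/x/secrets/y.json` would also match; prefixing the pattern with `^` would restrict it to the top-level `secrets/` directory.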
+
+**sops-nix version pin (dead-end avoided):** master sops-nix wants `buildGo125Module` (Go 1.25),
+absent in pinned nixpkgs 24.11 → eval error. Pinned sops-nix to `77c423a…` (2025-06-17, last using
+plain `buildGoModule`). Verified the file at that rev uses `buildGoModule`. Build then OK.
+
+**Encrypt test secret:** on host, `printf 'test_secret: cc-ci-m0-<rand>' > secrets/secrets.yaml`
+then `nix run nixpkgs#sops -- --encrypt --in-place secrets/secrets.yaml` (run inside repo so
+`.sops.yaml` resolves) → rc=0, two age recipients in the file.
+
+**Build + switch (commands + output):**
+- `nixos-rebuild build --flake .#cc-ci` → `BUILD EXIT 0` (built sops-install-secrets w/ Go 1.23.8).
+- `systemd-run --unit=ccci-rebuild2 ... nixos-rebuild switch --flake /root/cc-ci#cc-ci` →
+  `Result=success ExecMainStatus=0`.
+
+**Gate verification (M0):**
+- `systemctl is-system-running` → `running`; `systemctl --failed` → none.
+- `ls -la /run/secrets/test_secret` → `-r-------- 1 root root 41` ; `stat` → `root:root 400`.
+- `head -c9` → `cc-ci-m0-` (matches generated value), `wc -c` → 41 (9 + 32 hex). Decrypt path proven.
+- Pulled encrypted `secrets/secrets.yaml` + `flake.lock` back to clone; `grep cc-ci-m0 secrets.yaml`
+  → no plaintext leak; lock inputs = nixpkgs, sops-nix.
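The length check above is simple arithmetic: the 9-character prefix `cc-ci-m0-` plus 32 hex characters gives 41 bytes, with no trailing newline. Assuming the random part was 16 random bytes rendered as hex (the journal only says `<rand>` and "32 hex"), an equivalent value and the same two checks can be sketched with the standard `secrets` module. The helper names are illustrative only.

```python
import secrets
import string

PREFIX = "cc-ci-m0-"  # 9 characters, what `head -c9` checked


def make_test_secret() -> str:
    """Prefix + token_hex(16): 16 random bytes rendered as 32 lowercase hex characters."""
    return PREFIX + secrets.token_hex(16)


def looks_like_test_secret(value: str) -> bool:
    """Mirror the gate checks: correct prefix and 9 + 32 = 41 characters in total."""
    body = value[len(PREFIX):]
    return (
        value.startswith(PREFIX)
        and len(value) == len(PREFIX) + 32
        and all(c in string.hexdigits.lower() for c in body)
    )


value = make_test_secret()
print(len(PREFIX), len(value))  # 9 41
```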
+
+**Gate handshake:** set `Gate: M0 — CLAIMED, awaiting Adversary` in STATUS.md. REVIEW.md still empty
+(no Adversary activity yet). Per §6.1 liveness I won't idle-block: I keep M0 claimed and proceed
+with M1 (independent infra build), without advancing to M2 until M0 shows PASS.
+
+**Next:** M1 — Docker + single-node swarm via Nix (modules/swarm.nix), then Traefik (file provider
+→ /var/lib/ci-certs/live/) + abra, then a by-hand HTTPS deploy/teardown of a trivial recipe.
+
+## 2026-05-26 — M1: Docker + single-node swarm via Nix
+
+**modules/swarm.nix:** `virtualisation.docker.enable` + daily autoprune (--all --volumes until=24h
+to protect the 28 GiB root), `docker` in systemPackages, and a `swarm-init` oneshot
+(`docker swarm init --advertise-addr 127.0.0.1` if not active; `docker network create --driver
+overlay --attachable proxy` if absent). Imported into configuration.nix.
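`swarm-init` is a check-then-act reconcile, so running it on every activation is harmless. The Nix module runs shell; the Python below is only a sketch of the same logic. It assumes `docker info --format '{{.Swarm.LocalNodeState}}'` prints `active` once the node is in a swarm. The `runner` parameter is a hypothetical seam so the logic can be exercised without Docker.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cmd(cmd: List[str]) -> str:
    """Run a command and return its stdout; raise CalledProcessError on failure."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def reconcile_swarm(runner: Runner = run_cmd) -> List[str]:
    """Ensure a single-node swarm and an attachable `proxy` overlay exist.

    Returns the mutating commands issued; an empty list means already converged.
    """
    actions: List[List[str]] = []
    state = runner(["docker", "info", "--format", "{{.Swarm.LocalNodeState}}"]).strip()
    if state != "active":
        actions.append(["docker", "swarm", "init", "--advertise-addr", "127.0.0.1"])
    names = runner(["docker", "network", "ls", "--format", "{{.Name}}"]).split()
    if "proxy" not in names:
        actions.append(["docker", "network", "create", "--driver", "overlay",
                        "--attachable", "proxy"])
    # Order matters: the overlay network can only be created once the swarm is active.
    for cmd in actions:
        runner(cmd)
    return [" ".join(cmd) for cmd in actions]
```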
+
+**Build + switch:** `nixos-rebuild build --flake .#cc-ci` → EXIT 0; `systemd-run … switch` →
+`Result=success`.
+
+**Verify (commands + output):**
+- `systemctl show swarm-init -p Result` → `Result=success`
+- `docker info --format ...` → `Swarm=active Managers=1 Nodes=1`
+- `docker network ls --filter name=proxy` → `proxy overlay swarm`
+- `systemctl is-system-running` → `running`; `--failed` → none.
+
+**Next:** Traefik as a swarm stack (Nix-declared compose + `docker stack deploy` oneshot): docker
+swarm provider + file provider serving /var/lib/ci-certs/live/{fullchain,privkey}.pem on :443,
+attached to `proxy`. Then abra install + by-hand HTTPS deploy/teardown of a trivial recipe (M1 gate).
+Rationale for swarm-service Traefik over a host `services.traefik`: a host process isn't on the
+`proxy` overlay, so it can't reach swarm service VIPs; coop-cloud recipes assume an on-`proxy`
+Traefik watching swarm labels.
+
+## 2026-05-26 — M1: Traefik swarm stack + HTTPS path proven
+
+**modules/traefik.nix:** Traefik v3.3 as a swarm service on `proxy` (so it reaches recipe VIPs).
+Config via Nix `writeText` store files bind-mounted into the container (real files, not /etc
+symlinks): static `traefik.yml` (entrypoints web/websecure; `providers.swarm` unix socket,
+exposedByDefault=false, network=proxy; `providers.file` dir /etc/traefik/dynamic; ping; no
+dashboard) and dynamic `certs.yml` (wildcard at `/var/lib/ci-certs/live/*` as
+`stores.default.defaultCertificate` + certificates, so any `*.ci.commoninternet.net` router with `tls=true` is covered,
+no ACME). Deployed by a `traefik-deploy` oneshot (`docker stack deploy`) after swarm-init. Opened
+firewall 80/443 (gateway forwards over enp5s0).
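For reference, the dynamic `certs.yml` described above has roughly this shape in Traefik's file-provider format: `tls.stores.default.defaultCertificate` plus a `tls.certificates` list. This Python sketch only builds that structure and prints it as JSON, which YAML parsers also accept. The certificate paths assume the host directory appears at the same path inside the container, which the journal does not state.

```python
import json
from typing import Any, Dict

CERT_DIR = "/var/lib/ci-certs/live"  # assumed in-container path of the wildcard cert


def wildcard_tls_config(cert_dir: str = CERT_DIR) -> Dict[str, Any]:
    """Dynamic config that serves one wildcard cert as the default for every TLS router."""
    pair = {"certFile": f"{cert_dir}/fullchain.pem", "keyFile": f"{cert_dir}/privkey.pem"}
    return {
        "tls": {
            # Default store cert: used when no other cert matches the SNI, so no ACME is needed.
            "stores": {"default": {"defaultCertificate": dict(pair)}},
            "certificates": [dict(pair)],
        }
    }


print(json.dumps(wildcard_tls_config(), indent=2))
```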
+
+**Build + switch:** build EXIT 0; switch `Result=success`; `traefik-deploy` `Result=success`;
+`docker service ls` → `traefik_traefik traefik:v3.3 1/1`.
+
+**Verify (commands + output):**
+- Local: `curl -ksv -H 'Host: probe-test.ci.commoninternet.net' https://localhost/` →
+  `subject: CN=*.ci.commoninternet.net`, `issuer: …Let's Encrypt; CN=E8`, TLSv1.3, HTTP 404.
+- **End-to-end via gateway:** `curl -ksv --resolve probe-test.ci.commoninternet.net:443:143.244.213.108
+  https://probe-test.ci.commoninternet.net/` → `Connected to …(143.244.213.108) port 443`,
+  same wildcard cert, HTTP 404. Confirms gateway SNI-passthrough → cc-ci Traefik TLS termination.
+  404 is correct (no router for that host yet).
+
+**Next:** install abra (M1 last task), `abra app new` a trivial recipe (custom-html) → deploy →
+reach over HTTPS at <app>.ci.commoninternet.net → teardown leaving no volumes. That completes M1
+→ CLAIM M1 gate.
+
+## 2026-05-26 — M1: proxy pivot to real coop-cloud/traefik via abra; recipe deploy/teardown (M1 CLAIMED)
+
+**Orchestrator decision (mid-M1):** replace the hand-rolled Traefik with the canonical Co-op Cloud
+`traefik` recipe deployed via abra, wildcard/file-provider mode, no ACME/token. Removed custom
+`modules/traefik.nix`; moved firewall 80/443 into `modules/swarm.nix`. Recorded in DECISIONS.md.
+
+**Why the pivot also fixed a real bug:** my custom Traefik used entrypoint `websecure`; coop-cloud
+recipes label `entrypoints=web-secure`. While chasing that I also hit a sharp **systemd-run gotcha**:
+`systemd-run … nixos-rebuild switch --flake .#cc-ci` runs with cwd `/`, so `.#` → `/` → "could not
+find a flake.nix"; the switch silently failed while a post-`--collect` `systemctl show` returned a
+stale `Result=success`. Fix: always use the **absolute** flake path `/root/cc-ci#cc-ci`, and read the
+result before resetting. (rebuild6/7 had silently not applied; rebuild2–5 used the absolute path.)
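One way to make the cwd gotcha impossible to repeat is to build the detached rebuild command in one place and reject relative flake paths outright. The sketch below mirrors the `systemd-run` invocation from the M0 entry; `rebuild_unit_argv` is a hypothetical helper, not something in the repo.

```python
import os
import shlex
from typing import List


def rebuild_unit_argv(unit: str, flake_dir: str, host: str = "cc-ci") -> List[str]:
    """Detached `nixos-rebuild switch` via systemd-run, as in the M0 entries.

    The transient unit starts with cwd `/`, so a relative ref like `.#cc-ci` would
    look for `/flake.nix`; only absolute flake directories are accepted.
    """
    if not os.path.isabs(flake_dir):
        raise ValueError(f"flake dir must be absolute under systemd-run, got {flake_dir!r}")
    return [
        "systemd-run", f"--unit={unit}", "--collect", "--property=Type=oneshot",
        "nixos-rebuild", "switch", "--flake", f"{flake_dir}#{host}",
    ]


print(shlex.join(rebuild_unit_argv("ccci-rebuild", "/root/cc-ci")))
```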
+
+**abra packaged** (modules/abra.nix): release binary 0.13.0-beta, pinned by sha256, autoPatchelf'd.
+`abra --version` → `0.13.0-beta-06a57de`.
+
+**scripts/deploy-proxy.sh** (idempotent, pure-bash — host has no python3): ensure local abra server,
+fetch traefik, write wildcard/no-ACME env (`WILDCARDS_ENABLED=1`, `SECRET_WILDCARD_*_VERSION=v1`,
+`COMPOSE_FILE=compose.yml:compose.wildcard.yml`, `LETS_ENCRYPT_ENV=` empty), insert cert secrets via
+`abra app secret insert … -f` from /var/lib/ci-certs/live, deploy. Bugs fixed en route: multi-line
+PEM must use `-f` (not arg); secret-presence must check `docker secret ls` (abra's recipe list always
+shows the name with `created on server:false`).
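The secret-presence fix deserves spelling out. abra's recipe listing names every secret the recipe declares whether or not it exists, so only Docker's own list is authoritative. The real script is bash, since the host had no python3 at this point. This Python sketch of the same check assumes `docker secret ls --format '{{.Name}}'` prints one name per line; the secret names in the usage line are hypothetical.

```python
import subprocess
from typing import Optional, Set


def swarm_secret_names(ls_output: Optional[str] = None) -> Set[str]:
    """Names from `docker secret ls --format '{{.Name}}'` (runs it if no output is given)."""
    if ls_output is None:
        ls_output = subprocess.run(
            ["docker", "secret", "ls", "--format", "{{.Name}}"],
            check=True, capture_output=True, text=True,
        ).stdout
    return {line.strip() for line in ls_output.splitlines() if line.strip()}


def needs_insert(name: str, ls_output: Optional[str] = None) -> bool:
    """True if the secret still has to be inserted, i.e. Docker does not have it yet."""
    return name not in swarm_secret_names(ls_output)


sample = "traefik_example_ssl_cert_v1\n"  # hypothetical `docker secret ls` output
print(needs_insert("traefik_example_ssl_cert_v1", sample),
      needs_insert("traefik_example_ssl_key_v1", sample))  # False True
```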
+
+**Traefik deploy:** `abra app deploy` → `deploy succeeded 🟢` (traefik v3.6.15 + socket-proxy).
+Verify: `docker service ls` → app+socket-proxy 1/1; via gateway `curl --resolve probe.*:443:143.244.213.108`
+→ `CN=*.ci.commoninternet.net` (LE E8); **0 ACME log lines**.
+
+**M1 gate (recipe over HTTPS + teardown):**
+- `abra app new custom-html -s default -D cchtml1.ci.commoninternet.net -S -n` then set
+  `LETS_ENCRYPT_ENV=` and `abra app deploy -n -C` → `🟢` (nginx 1.29.0).
+- `curl -ks --resolve cchtml1.ci.commoninternet.net:443:143.244.213.108 https://…/` →
+  `http_code=200 size=615`, served the nginx welcome page over HTTPS with the wildcard cert.
+- Teardown: `abra app undeploy -n` → 🟢; `abra app volume remove -f -n` → "1 volumes removed";
+  leak check → services 0 / volumes 0 / secrets 0 / containers 0. **Clean.**
+- Correct teardown syntax confirmed: `secret remove <d> --all -n` (not `--all-secrets`).
+
+**docs/install.md** seeded (flake apply + deploy-proxy + verify). M1 gate CLAIMED in STATUS.md.
+
+**Next:** M2 — Drone server + exec runner via Nix, Gitea OAuth app, hello-world .drone.yml green.
+
+## 2026-05-26 — M2 start: CI engine decision + Gitea OAuth app + Drone secrets
+
+**Decision (DECISIONS.md):** keep Drone per plan. nixpkgs 24.11 has drone server 2.24.0 but only the
+abandoned `drone-runner-exec` (unstable-2020) — accepted (stable RPC), Woodpecker is the documented
+fallback. Deploy shape mirrors traefik: server via coop-cloud `drone` recipe (abra, swarm,
+traefik-routed at drone.ci.commoninternet.net, no ACME), exec runner as a host Nix systemd service.
+
+**Recipe recon:** coop-cloud `drone` recipe = drone/drone:2.26.0, secrets `rpc_secret` +
+`CLIENT_SECRET` (Gitea OAuth), Gitea SSO via `compose.gitea.yml` (`GITEA_CLIENT_ID`, `GITEA_DOMAIN`).
+Server env: DRONE_SERVER_HOST/PROTO, DRONE_USER_CREATE.
+
+**Done this tick:**
+- Created Gitea OAuth app `cc-ci-drone` (bot): client_id `ab4cdb9d-…`, redirect
+  `https://drone.ci.commoninternet.net/login`.
+- Generated `DRONE_RPC_SECRET` (openssl-equivalent /dev/urandom hex32) + stored client_secret;
+  both added to `secrets/secrets.yaml` via `sops set` (needed `SOPS_AGE_KEY` from the host ssh key:
+  `ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key`). Verified: decrypt shows keys
+  test_secret/drone_rpc_secret/drone_gitea_client_secret; file stays encrypted (4× ENC).
+
+**Next:** scripts/deploy-drone.sh (abra deploy of drone server w/ Gitea SSO + rpc/client secrets),
+modules/drone-runner.nix (exec runner systemd unit, rpc secret from sops), wire sops secrets for the
+runner, then push a hello-world .drone.yml and confirm a green build (M2 gate).
+
+## 2026-05-26 — M2: Drone server + exec runner up; infra as idempotent-reconcile oneshots
+
+**Orchestrator steer (2×):** collapse install to a single `nixos-rebuild switch` — convert the
+manual deploy scripts into **idempotent-reconcile systemd oneshots** (writeShellApplication, embedded
+in store; after swarm-init+docker; wants network-online; wantedBy multi-user; reconcile every
+activation/boot, NO run-once sentinel; fail visibly on missing cert). Applied to proxy + drone.
+
+**Refactor done:**
+- `modules/packages.nix`: `pkgs.abra` overlay (shared pinned build).
+- `modules/proxy.nix`: `deploy-proxy` oneshot — reconciles coop-cloud traefik (wildcard/no-ACME).
+- `modules/drone.nix`: `deploy-drone` oneshot — reconciles coop-cloud drone (Gitea SSO, secrets from
+  /run/secrets), after deploy-proxy.
+- `modules/drone-runner.nix`: exec runner (fixed PATH conflict via `lib.mkForce`; allowUnfree for
+  drone-runner-exec — Polyform license).
+- `modules/secrets.nix`: declared drone_rpc_secret + drone_gitea_client_secret + a sops *template*
+  `drone-runner.env` (DRONE_RPC_SECRET) as the runner's EnvironmentFile (shared secret).
+- Removed `scripts/deploy-*.sh`. install.md now = clone + nixos-rebuild switch + preconditions.
+
+**Build/switch:** build EXIT 0 (shellcheck clean via writeShellApplication; runner pkg unfree-allowed).
+`nixos-rebuild switch` → all three units `active`/`success`:
+- `deploy-proxy` success (reconciled traefik), `deploy-drone` → `deploy succeeded 🟢` (drone/drone
+  2.26.0, secrets client_secret+rpc_secret v1, drone_env config), `drone-runner-exec` active.
+
+**Verify (commands + output):**
+- `docker service ls` → `drone_ci_commoninternet_net_app 1/1`, traefik app+socket-proxy 1/1.
+- Via gateway: `…/healthz` → **200**; `/` → **303** (login redirect, correct).
+- Runner: journal shows a few startup `cannot ping the remote server (404)` (drone RPC not ready
+  yet) then `successfully pinged the remote server` + `polling the remote server capacity=2
+  endpoint=https://drone.ci.commoninternet.net kind=pipeline type=exec`. **Runner connected via RPC.**
+
+**Remaining for M2 gate:** push a hello-world `.drone.yml` to cc-ci + get a green build. Needs the
+cc-ci repo activated in Drone, which requires the bot's Gitea OAuth login (browser flow) to grant
+Drone a Gitea token (to sync repos + set the push webhook). Next tick: script the OAuth login to mint
+a Drone token, activate cc-ci, push .drone.yml, confirm green. (DRONE_USER_CREATE made autonomic-bot
+the admin.)
+
+## 2026-05-26 — M2 GATE MET: green build via push (Drone + exec runner)
+
+**Drone↔Gitea OAuth (scripted, the one manual bootstrap):** logged the bot into Gitea (CSRF cookie
+→ form), drove Drone `/login` → Gitea authorize consent (POST `/login/oauth/grant` with _csrf+state+
+granted=true) → code callback → Drone `_session_`. Captured the whole flow in
+`scripts/bootstrap-drone-oauth.sh` (reads bot creds from env; documented in install.md §2; one-time,
+token persists in Drone's data volume).
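The only HTML scraping in that flow is pulling hidden form fields (`_csrf`, and `state` on the consent page) out of Gitea's pages so they can be POSTed back. Here is a stdlib-only sketch with `html.parser`, assuming the fields are ordinary `<input type="hidden">` elements. The real script is shell, so this is illustrative only, and the sample page is made up.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


class HiddenInputs(HTMLParser):
    """Collects name -> value for every <input type="hidden"> on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag != "input":
            return
        a = dict(attrs)
        if a.get("type") == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


def hidden_fields(html: str) -> Dict[str, str]:
    parser = HiddenInputs()
    parser.feed(html)
    return parser.fields


page = '<form><input type="hidden" name="_csrf" value="abc"><input type="hidden" name="state" value="xyz"/></form>'
body = urlencode({**hidden_fields(page), "granted": "true"})
print(body)  # _csrf=abc&state=xyz&granted=true
```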
+
+**Repo activation:** `GET /api/user` → autonomic-bot admin=true; `GET /api/user/repos?latest=true`
+synced 12 repos; `POST /api/repos/recipe-maintainers/cc-ci` → active=true, config_path .drone.yml
+(sets the Gitea push webhook).
+
+**Green build:** added `.drone.yml` (exec pipeline), pushed (0d89e28). Polled
+`/api/repos/recipe-maintainers/cc-ci/builds` → build #1 pending→running→**success**. Steps:
+clone success exit 0; hello success exit 0 — log shows `whoami=root`, `abra 0.13.0-beta-06a57de`,
+`swarm=active` (ran on the host via the exec runner). **M2 gate met; CLAIMED.**
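Polling until a build leaves `pending`/`running` is a small loop. In this sketch `get_status` is a hypothetical callable that would fetch `/api/repos/recipe-maintainers/cc-ci/builds/<n>` and return its `status` field. The clock and sleep functions are injectable so the loop can be tested without waiting. Treating only `pending` and `running` as in-progress is an assumption: Drone has other non-terminal states (e.g. `blocked`) that a real poller might also wait on.

```python
import time
from typing import Callable

IN_PROGRESS = {"pending", "running"}  # assumed non-terminal states


def wait_for_build(get_status: Callable[[], str], timeout: float = 600.0,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the build reaches a terminal status and return it; raise on timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"build still {status!r} after {timeout}s")
        sleep(interval)
```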
+
+**Next:** M3 — comment-bridge service: Gitea issue_comment webhook → verify HMAC + `!testme` exact +
+collaborator → resolve PR head repo/SHA → trigger a parameterized Drone build; post a PR comment with
+the run link. Need a Drone API token for the bridge (mint from the bot's Drone account).
+
+## 2026-05-26 — M3 start: bridge secrets + comment-bridge source
+
+**Secrets (sops):** minted a Gitea API token (`cc-ci-bridge`, scopes read:org/user, write:repo/issue),
+a Drone API token (`POST /api/user/token`, the stable personal token; rotates on call), and a webhook
+HMAC (urandom hex64). Stored as bridge_gitea_token / bridge_drone_token / bridge_webhook_hmac via
+`sops set` (host age identity). secrets.yaml now holds 6 secrets.
+
+**bridge/bridge.py** (Python stdlib only, §4.1): POST /hook handler — verifies Gitea HMAC
+(`X-Gitea-Signature` sha256), requires `X-Gitea-Event: issue_comment`, action=created, body trimmed
+== `!testme`, issue is a PR; checks commenter is a collaborator (Gitea collaborators endpoint, 204);
+resolves PR head sha+repo; triggers a parameterized Drone build
+(`POST /api/repos/<CI_REPO>/builds?branch=main&RECIPE&REF&PR&SRC`, custom params → pipeline env);
+posts a PR comment linking the run. Secrets read from mounted files; config via env. `/healthz` GET.
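The accept/reject logic above fits in three small functions. This sketch is written under stated assumptions and is not `bridge.py` itself. It assumes `X-Gitea-Signature` is the bare hex HMAC-SHA256 of the raw request body, and that a PR comment payload carries `is_pull` or `issue.pull_request`. The function names are hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict
from urllib.parse import urlencode


def signature_ok(secret: bytes, body: bytes, header_sig: str) -> bool:
    """Compare X-Gitea-Signature with hex HMAC-SHA256(secret, raw body) in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest().encode()
    return hmac.compare_digest(expected, header_sig.strip().lower().encode())


def is_trigger(event: str, payload: Dict[str, Any]) -> bool:
    """issue_comment + action=created + body exactly `!testme` (trimmed) + issue is a PR."""
    comment = payload.get("comment") or {}
    issue = payload.get("issue") or {}
    return (
        event == "issue_comment"
        and payload.get("action") == "created"
        and (comment.get("body") or "").strip() == "!testme"
        and bool(payload.get("is_pull") or issue.get("pull_request"))
    )


def drone_build_path(ci_repo: str, recipe: str, ref: str, pr: int, src: str) -> str:
    """Drone create-build path; the extra query params reach the pipeline as env vars."""
    params = {"branch": "main", "RECIPE": recipe, "REF": ref, "PR": pr, "SRC": src}
    return f"/api/repos/{ci_repo}/builds?{urlencode(params)}"
```

The membership check and the PR head lookup are left out; they are plain authenticated GETs against the Gitea API.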
+
+**Next:** package the bridge as a swarm service (dockerTools image, no Docker Hub pull) behind
+traefik at `ci.commoninternet.net/hook` via a reconcile oneshot (modules/bridge.nix); register a
+per-repo webhook with the HMAC; demo on a scratch PR (!testme triggers; non-!testme + non-collab
+rejected). That's the M3 gate.
+
+## 2026-05-26 — M3: bridge deployed + verified; webhook DELIVERY blocked (Gitea-side)
+
+**Deployed** the comment-bridge as a Nix-built OCI image (no Docker Hub pull) → swarm service on
+`proxy`, behind traefik at `ci.commoninternet.net/hook`, via reconcile oneshot `modules/bridge.nix`.
+Swarm secrets (webhook_hmac/drone_token/gitea_token) materialised from /run/secrets.
+
+**Verified working (bridge side):**
+- `docker service ls` → ccci-bridge_app 1/1.
+- `GET /hook/healthz` → 200 **from the sandbox over real public DNS** (ci.commoninternet.net →
+  143.244.213.108); also 200 via gateway from cc-ci.
+- HMAC logic: bad sig → 401; a manually openssl-HMAC-signed body → 204 (passes sig, ignored as
+  non-trigger); wrong event → 204. (Debug log added: `got=/want=/bodylen/seclen`.)
+- Registered per-repo `issue_comment` webhook (id 210) on recipe-maintainers/cc-ci → ci.../hook with
+  the HMAC. Created scratch PR #1.
+
+**Blocker found:** commenting `!testme` (×several) and Gitea's "Test Delivery" (UI returns 200) yield
+ZERO requests at the bridge container. Bridge is publicly reachable by hostname from a 3rd network;
+gateway accepts public sources; public DNS correct → Gitea is not *sending* the delivery. Deliveries
+panel is AJAX (uninspectable via curl); bot is not Gitea admin (can't read `ALLOWED_HOST_LIST`).
+Conclusion: git.autonomic.zone webhook policy (likely `ALLOWED_HOST_LIST`) blocks ci.commoninternet.net.
+Recorded in STATUS ## Blocked with operator options (whitelist host, or I pivot bridge to polling).
+
+**Plan:** surface to operator; meanwhile proceed to M4 (harness + install stage) which doesn't depend
+on the webhook (dev recipe-CI builds triggerable directly via the Drone API). Revisit M3 gate once the
+host is whitelisted or via the polling fallback.
+
+## 2026-05-27 — M4: harness + install stage green (custom-html), guaranteed teardown
+
+**Built the harness:** `runner/harness/abra.py` (abra wrappers w/ gotchas: no --chaos on
+undeploy/volume-remove, `-n` everywhere, parse `app ls -S -m` nested {server:{apps}}, timeouts),
+`runner/harness/lifecycle.py` (deploy_app forcing `LETS_ENCRYPT_ENV=""` [A1], wait_healthy =
+services-converged + HTTPS, teardown_app = undeploy+volume+secret+env-config, janitor for orphans),
+`tests/conftest.py` (`deployed_app` session fixture with finalizer teardown; short unique domain),
+`tests/custom-html/test_install.py` (HTTP 200 + Playwright/Chromium content assertion),
+`runner/run_recipe_ci.py` (orchestrator: fetch recipe@REF, run stage pytest), `modules/harness.nix`
+(`cc-ci-run` = Nix python3+pytest+playwright with PLAYWRIGHT_BROWSERS_PATH from nixpkgs).
+
+**Bugs fixed en route (3):**
+1. Swarm config name > 64 chars (long domain) → switched to short `<recipe[:4]>-<6hex>` domain
+   scheme (DECISIONS.md).
+2. `services_converged` used the wrong stack name (it also replaced hyphens) → abra keeps hyphens and
+   only maps dots→_.
+3. `http_get` connected to the gateway IP (drops SNI, gateway routes by SNI) → use the real URL
+   (resolves to gateway on cc-ci, correct SNI). Also teardown now removes the app .env config.
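Fixes 1 and 2 both come down to naming, so here is a sketch of the two rules together: a short per-run domain and the stack name abra derives from it. The pattern is the run-app regex the journal later gives for the janitor; the function names are hypothetical (the harness's own version lives in `harness/naming.py`, which is not shown here).

```python
import re
import secrets

DOMAIN_SUFFIX = "ci.commoninternet.net"
# Run-app pattern: 1-4 recipe chars, '-', 6 hex chars, then the CI suffix.
RUN_APP_RE = re.compile(r"^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")


def run_domain(recipe: str) -> str:
    """Short per-run domain `<recipe[:4]>-<6hex>` so derived swarm object names stay short."""
    return f"{recipe[:4]}-{secrets.token_hex(3)}.{DOMAIN_SUFFIX}"


def stack_name(domain: str) -> str:
    """abra's stack name for an app domain: dots become '_', hyphens are kept."""
    return domain.replace(".", "_")


d = run_domain("custom-html")
print(d, stack_name(d), len(d))
```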
+
+**Green run + teardown (commands + output):**
+- `RECIPE=custom-html PR=0 REF=m4demo cc-ci-run runner/run_recipe_ci.py` →
+  `tests/custom-html/test_install.py::test_http_reachable PASSED`,
+  `::test_playwright_page PASSED` — **2 passed in 57.99s**.
+- Leak check after: services 0 / volumes 0 / secrets 0 / containers 0 / env config removed. Clean.
+
+**A1 addressed:** deploy_app forces `LETS_ENCRYPT_ENV=""` (no ACME) on every deploy. M4 CLAIMED.
+
+**M3 still blocked** (Gitea webhook delivery — operator); no response yet. Next: M5 (upgrade +
+backup/restore for custom-html), then wire the parameterized Drone pipeline (API-triggerable).
+
+## 2026-05-27 — M5: upgrade + backup/restore stages green (custom-html)
+
+**Upgrade stage** (tests/custom-html/test_upgrade.py): deploy previous published version
+(git-tag sort, second-newest), write a data marker into the served volume (nginx serves
+/usr/share/nginx/html, so the marker is HTTP-fetchable), `abra app upgrade` to current, assert
+healthy + marker survived. Fix: `upgrade` has no `--chaos` flag (used `-f -D -n`).
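Picking "second-newest" needs a numeric sort, because a plain string sort puts `1.10.0` before `1.2.0`. The sketch below assumes Co-op Cloud-style tags of the form `<recipe-semver>+<upstream-version>`. The helper names are hypothetical, and the journal does not show the harness's actual sort.

```python
import re
from typing import List, Optional, Tuple

TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:\+.*)?$")


def version_key(tag: str) -> Optional[Tuple[int, ...]]:
    """Numeric (major, minor, patch) of the recipe version, or None for non-release tags."""
    m = TAG_RE.match(tag)
    return tuple(int(x) for x in m.groups()) if m else None


def previous_version(tags: List[str]) -> Optional[str]:
    """Second-newest release tag, or None if fewer than two tags parse."""
    releases = sorted((t for t in tags if version_key(t)), key=version_key)
    return releases[-2] if len(releases) >= 2 else None


print(previous_version(["1.0.0+1.27.0", "1.10.0+1.29.0", "1.2.0+1.28.0", "not-a-tag"]))
```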
+
+**backup-bot-two** deployed as reconcile oneshot (modules/backupbot.nix): restic repo in a local
+`backups` volume, restic_password abra-generated (only if missing). Fixes: `abra app secret generate`
+needs `-m` (machine) to avoid the TTY/ioctl path, and stdout redirected so generated values never
+hit the journal (D6). `abra app backup create`/`restore` need a real PTY ('input device is not a
+TTY') → run via util-linux `script -qec` (harness `_run_pty`; util-linux added to cc-ci-run).
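The PTY workaround is a thin argv wrapper around util-linux `script`: `-q` suppresses the banner, `-e` propagates the child's exit code, `-c` takes the command string, and the typescript file goes to `/dev/null`. Here is a minimal sketch in the spirit of the harness's `_run_pty`, though not its code. The abra arguments in the usage line are hypothetical.

```python
import shlex
import subprocess
from typing import List


def pty_argv(cmd: List[str]) -> List[str]:
    """Wrap `cmd` so it runs under a pseudo-terminal allocated by util-linux `script`."""
    return ["script", "-qec", shlex.join(cmd), "/dev/null"]


def run_pty(cmd: List[str], timeout: float = 900) -> subprocess.CompletedProcess:
    """Run `cmd` under a PTY; the exit code is the child's thanks to `-e`."""
    return subprocess.run(pty_argv(cmd), capture_output=True, text=True, timeout=timeout)


argv = pty_argv(["abra", "app", "backup", "create", "cust-0a1b2c.ci.commoninternet.net", "-n"])
print(argv)
```

`shlex.join` quotes each argument, so a domain or path with unusual characters reaches `script -c` as a single shell word.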
+
+**Backup stage** (test_backup.py): write "original" → `abra app backup create` → mutate to
+"mutated" → `abra app restore` → assert state back to "original".
+
+**Full 3-stage run** (`STAGES=install,upgrade,backup`):
+- install: 2 passed (http 200 + playwright)
+- upgrade: 1 passed (data survives upgrade)
+- backup:  1 passed (restore returns pre-mutation state)
+- teardown: 0 orphaned run services/volumes/secrets; infra (traefik/drone/bridge/backupbot) all 1/1.
+M5 CLAIMED.
+
+**M3 still blocked** (webhook; no operator response across several ticks). Plan: if still blocked,
+pivot the bridge to poll the Gitea API (self-service, Adversary-endorsed) to unblock D1. Next: M6.
+
+## 2026-05-27 — Fix adversary findings A2 (dead janitor) + A3 (unverified teardown)
+
+**A2 (janitor matched dead `-pr` filter):** rewrote `harness.lifecycle.janitor` to match the real
+run-app naming (`RUN_APP_RE = ^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$`), reap via
+docker primitives, AND scan `docker service ls` to catch orphans whose `.env` is already gone
+(reconstructs the domain from the service name). Age-gated (default 2h, env `CCCI_JANITOR_MAX_AGE`)
+so concurrent in-flight runs are never killed.
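The janitor's selection step fits in a few pure functions: reconstruct the domain from a swarm service name, keep only names that fit the run-app pattern, and skip anything younger than the age gate. This is a sketch, not `harness.lifecycle.janitor`. It assumes service names are `<stack>_<service>` with no `_` inside the service part, and that `CCCI_JANITOR_MAX_AGE` is in seconds (the journal gives only "default 2h").

```python
import os
import re
from typing import Iterable, List, Mapping, Optional, Tuple

RUN_APP_RE = re.compile(r"^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")
DEFAULT_MAX_AGE = 2 * 3600  # seconds (unit assumed)


def max_age_from_env(env: Mapping[str, str] = os.environ) -> float:
    return float(env.get("CCCI_JANITOR_MAX_AGE", DEFAULT_MAX_AGE))


def domain_from_service(service: str) -> Optional[str]:
    """Rebuild the app domain from `<stack>_<svc>`, where stack = domain with '.' -> '_'."""
    stack, _, _svc = service.rpartition("_")
    domain = stack.replace("_", ".")
    return domain if RUN_APP_RE.match(domain) else None


def orphan_domains(services: Iterable[Tuple[str, float]], now: float,
                   max_age: float) -> List[str]:
    """Run-app domains with a service at least max_age old; younger ones may be in flight."""
    found = set()
    for name, created in services:
        domain = domain_from_service(name)
        if domain and now - created >= max_age:
            found.add(domain)
    return sorted(found)
```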
+
+**A3 (teardown unverified + unconditional .env removal):** `teardown_app` now (1) `docker stack rm`
+fallback if `abra undeploy` leaves services, (2) removes volumes/secrets *before* the `.env` and
+only drops the `.env` after the stack is confirmed gone, (3) retries docker volume rm (a stopped
+task briefly holds the volume), (4) **verifies** no residual services/volumes/secrets and raises
+`TeardownError` otherwise — so a partial teardown FAILS the run instead of silently orphaning.
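Steps (3) and (4) are the two pieces that turn a silent leak into a failed run: a bounded retry for transient removal failures, and a final residual check that raises. The sketch below uses the journal's `TeardownError` name, but the bodies are illustrative. The `remove` callable and the residuals mapping are hypothetical stand-ins for the docker calls.

```python
import time
from typing import Callable, Dict, List


class TeardownError(RuntimeError):
    """Raised when swarm objects survive teardown."""


def remove_with_retry(remove: Callable[[], bool], attempts: int = 5, delay: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a removal that can fail transiently (e.g. a stopping task still holds a volume)."""
    for i in range(attempts):
        if remove():
            return True
        if i < attempts - 1:
            sleep(delay)
    return False


def verify_clean(residuals: Dict[str, List[str]]) -> None:
    """Fail loudly instead of orphaning: any leftover service/volume/secret is an error."""
    leftover = {kind: names for kind, names in residuals.items() if names}
    if leftover:
        raise TeardownError(f"teardown incomplete: {leftover}")
```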
+
+**Re-test (commands + output):**
+- Normal install run → 2 passed, verified teardown clean.
+- Orphan (deploy, no teardown) → `janitor(CCCI_JANITOR_MAX_AGE=0)` → services/volumes/secrets/env 0.
+- **Env-less orphan** (deploy then `rm` the .env, the A3 bad state) → janitor reaps via docker stack
+  rm → services/volumes/secrets 0.
+- Full 3-stage run (install/upgrade/backup) still green with verified teardown, no TeardownError.
+
+A2/A3 fixed; left for the Adversary to re-test + close.
+
+## 2026-05-27 — M6 (part 1): harness enhancements for recipe #2 + D4 discovery
+
+Before enrolling recipe #2, made the shared harness recipe-agnostic so enrolling a recipe needs no
+harness-code change (D5):
+- **Per-recipe meta** (`tests/<recipe>/recipe_meta.py`, optional): HEALTH_PATH, HEALTH_OK,
+  DEPLOY_TIMEOUT, HTTP_TIMEOUT. conftest reads it; `wait_healthy` gained a `path` param (e.g.
+  keycloak `/realms/master`). Defaults preserve custom-html behaviour (verified: install still green).
+- **Shared naming** (`harness/naming.py`): single source for the `<recipe[:4]>-<6hex>` domain, used
+  by conftest + the orchestrator.
+- **D4 recipe-local discovery** (`run_recipe_ci.run_recipe_local`): if a recipe ships `tests/` with
+  `test_*.py`, deploy the app, run those tests against the LIVE deployment (contract: env
+  `CCCI_BASE_URL` + `CCCI_APP_DOMAIN`), merge as another reported stage, guaranteed teardown. Real
+  recipes ship tests/ committed in their repo (clean checkout) → discovered on clone/fetch.
+  (custom-html via the catalogue is an awkward case: abra refuses an unstaged recipe and `abra recipe fetch`
+  resets local commits — so D4 is demonstrated end-to-end with recipe #2 hedgedoc, which ships
+  committed tests/.)
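The per-recipe meta mechanism amounts to "optional module, merged over defaults". Here is a sketch using `importlib` to load `tests/<recipe>/recipe_meta.py` by path. The default values are placeholders rather than the harness's real defaults; only `HEALTH_OK=(200,)` and the keycloak overrides appear in the journal.

```python
import importlib.util
from pathlib import Path
from typing import Any, Dict

# Placeholder defaults (assumed); a recipe_meta.py overrides any of these names.
DEFAULT_META: Dict[str, Any] = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200,),
    "DEPLOY_TIMEOUT": 300,
    "HTTP_TIMEOUT": 60,
}


def load_recipe_meta(tests_dir: Path, recipe: str) -> Dict[str, Any]:
    """Merge optional tests/<recipe>/recipe_meta.py over DEFAULT_META (missing file => defaults)."""
    meta = dict(DEFAULT_META)
    path = tests_dir / recipe / "recipe_meta.py"
    if path.is_file():
        spec = importlib.util.spec_from_file_location(
            f"recipe_meta_{recipe.replace('-', '_')}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        meta.update({k: getattr(module, k) for k in DEFAULT_META if hasattr(module, k)})
    return meta
```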
+
+**Next:** mirror hedgedoc (postgres+hedgedoc, DB-backed) via the mirror+PR flow with a committed
+tests/ dir, write tests/hedgedoc/ (install/upgrade/backup + recipe_meta), run all stages + D4 green.
+
+## 2026-05-27 — M6 (part 2): recipe #2 keycloak install green (DB-backed, no harness surgery)
+
+Enrolled keycloak (recipe #2): keycloak 26.6.2 **+ mariadb 12.2** — genuinely DB-backed/multi-service
+(vs custom-html stateless). Added only `tests/keycloak/recipe_meta.py` (HEALTH_PATH=/realms/master,
+HEALTH_OK=(200,), 600s timeouts) + `tests/keycloak/test_install.py` (realm-endpoint health +
+Playwright admin-console login). **No change to runner/harness code** — the recipe-agnostic harness
+(per-recipe meta) handled it (D5 evidence).
+
+Run: `RECIPE=keycloak STAGES=install cc-ci-run runner/run_recipe_ci.py` → 2 passed in 545s (keycloak
+is slow: image pull + JVM + mariadb migration). Teardown clean (0 keyc-* services/volumes after).
+
+**Next:** D4 demo via a mirror shipping committed tests/ (recipe-local run against live app); then
+keycloak upgrade + backup/restore (DB data survival via a realm marker through the admin API).
+
+## 2026-05-27 — M6: D4 recipe-local discovery + recipe #2 enrolled (CLAIMED)
+
+**D4 recipe-local discovery working.** Demo: pushed a committed `tests/test_recipe_local.py` to the
+mirror on branch `recipe-maintainers/custom-html@ci/d4-recipe-local`; ran
+`RECIPE=custom-html SRC=recipe-maintainers/custom-html REF=ci/d4-recipe-local STAGES=install` →
+install 2 passed, then `===== STAGE: recipe-local (D4) =====` ran the recipe-shipped test against
+the LIVE app (CCCI_BASE_URL) → 1 passed. Clean teardown (0 orphans).
+
+**Hard-won abra behaviour (DECISIONS.md):** private mirror clone needs the bot token (per-command
+`http.extraHeader`, not persisted/logged). abra commands (`app ls`, `secret generate`, version
+resolution) silently `git checkout <tag>` the recipe, dropping a PR branch's files — so (1) all
+harness abra calls use `-C -o` (chaos+offline = current checkout, no remote fetch), and (2) D4
+snapshots the recipe's tests/ to a temp dir right after fetch (later abra cmds still reset it).
+Traced the drop step-by-step: app_new ok, deploy ok, but `secret generate` (no flags) and `app ls`
+each reset the checkout.
+
+**Recipe #2 = keycloak** (keycloak + mariadb, DB-backed) install green with only
+`tests/keycloak/recipe_meta.py` + `test_install.py` — **no runner/harness change** (D5). custom-html
+remains 3-stage green (M5). docs/enroll-recipe.md written.
+
+**M6 CLAIMED.** keycloak's full 3-stage (DB data survival via a realm marker) folds into M6.5.
+**Next:** M6.5 — keycloak upgrade/backup, then recipes 3–6 across the remaining D10 categories.
+
+---
+## 2026-05-27 — Trigger redesign (polling primary) + resource safety + M3 verified
+
+Session restarted by watchdog (prior tmux died mid-turn with uncommitted bridge WIP). Re-oriented
+from STATUS + plan; two orchestrator design changes landed and are now implemented + verified.
+
+**(1) Trigger: POLLING PRIMARY, webhook optional, org-membership auth** (plan §4.1/§1.5; commit
+7addb96). Rewrote `bridge/bridge.py`: a poll thread (`poll_loop`, always-on, primary) scans each
+`POLL_REPOS` repo's open PRs every 30s for new `!testme`; the `/hook` webhook stays as an optional
+admin-registered push optimization. Both share an in-memory comment-id seen-set → a comment seen by
+both fires once. First poll marks pre-existing comments seen (no startup re-fire). Authorization now
+`GET /orgs/{owner}/members/{user}` (204=member, read-level) + optional `AUTH_ALLOWLIST`, replacing
+the admin-requiring `/collaborators/{user}/permission`. Bot never self-registers webhooks.
+- Verified org endpoint at read level (bot basic-auth):
+  `members/{autonomic-bot,trav,notplants}` → 204; `members/definitely-not-a-member-xyz` → 404.
+- Deployed (nixos-rebuild, deploy-bridge reconcile); new container logs:
+  `poller (primary) watching ['recipe-maintainers/cc-ci'] every 30s` + `(poll primary + optional webhook)`.
+- **End-to-end M3 trigger (poll path):** posted `!testme` on PR #1 (comment 13705, by bot) →
+  Drone build **#26** appeared after **6s** (latest was #25); bridge logged
+  `[poll] triggered build 26 for cc-ci@d397720a (PR #1, comment 13705) by autonomic-bot`; bridge
+  posted back `cc-ci: started CI run for cc-ci @ d397720a → https://drone.ci.commoninternet.net/...`.
+  Satisfies D1 (<60s) over the read-only outbound path — no operator webhook whitelist needed.
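
A minimal Python sketch of the two mechanisms described above:

- the shared comment-id seen-set, which makes a `!testme` fire once even when both the poller and the webhook see it;
- the read-level org-membership check.

The class and function names, and the injected `get_status` HTTP helper, are hypothetical. This illustrates the described behavior and is not the code in `bridge/bridge.py`.

```python
import threading
from typing import Callable, Iterable, Set


class TriggerDeduper:
    """Hypothetical sketch of the shared comment-id seen-set.

    The poll thread and the optional /hook handler both call claim(); whichever
    sees a comment first fires the build, and the other call becomes a no-op.
    """

    def __init__(self) -> None:
        self._seen: Set[int] = set()
        # Both trigger paths may run concurrently, so guard the set with a lock.
        self._lock = threading.Lock()

    def prime(self, existing_ids: Iterable[int]) -> None:
        """First poll: mark pre-existing comments as seen (no startup re-fire)."""
        with self._lock:
            self._seen.update(existing_ids)

    def claim(self, comment_id: int) -> bool:
        """Return True only for the first caller that sees this comment id."""
        with self._lock:
            if comment_id in self._seen:
                return False
            self._seen.add(comment_id)
            return True


def is_org_member(owner: str, user: str, get_status: Callable[[str], int]) -> bool:
    """Org-membership auth: GET /orgs/{owner}/members/{user} returns 204 for a member.

    get_status is an injected helper (hypothetical) that returns the HTTP status
    code for a Gitea API path; any other code (e.g. 404) means "not a member".
    """
    return get_status(f"/orgs/{owner}/members/{user}") == 204
```

Because the seen-set lives in memory, a bridge restart relies on `prime()` during the first poll to avoid re-firing old comments.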
+
+**(2) Resource safety: bound live test apps** (plan §4.2/§4.3; commit 72ff8e2). MAX_TESTS =
+`DRONE_RUNNER_CAPACITY` = 1 (`modules/drone-runner.nix`) → Drone runs ≤1 build at once, queues the
+rest natively. Per-build timeout = 60m, reconciled best-effort in `modules/drone.nix`
+(`PATCH /api/repos/.../cc-ci {"timeout":60}`, non-fatal). Janitor remains the backstop for
+SIGKILL'd/timed-out builds (reaps orphaned run apps at run-start before each deploy).
+- Verified on host after rebuild: `DRONE_RUNNER_CAPACITY=1`; deploy-drone logged
+  `set cc-ci build timeout = 60m`; Drone API confirms repo `timeout: 60`.
+
+**Gap noted (next item):** `.drone.yml` still only has the `self-test` pipeline — a bridge-triggered
+build runs the self-test, NOT `runner/run_recipe_ci.py`. M4/M5 ran the orchestrator by hand
+(`cc-ci-run`). Need a recipe-CI pipeline keyed on the `RECIPE` build param (runs
+`cc-ci-run runner/run_recipe_ci.py` with STAGES=install,upgrade,backup, `CCCI_JANITOR_MAX_AGE=0`,
+`concurrency:{limit:1}`) to connect bridge→Drone→harness end-to-end (required for D2/D10 via real
+`!testme`). Added to Build backlog.
+
+**M3 CLAIMED** (gate). Trigger + auth + comment-back demoed live; the webhook-delivery blocker is
+moot now that polling is primary.
+
+---
+## 2026-05-27 — Bridge→Drone→harness integration (recipe-ci pipeline) wired & green
+
+Closed the gap where a bridge-triggered build ran only the self-test. Split `.drone.yml` into two
+event-filtered exec pipelines (commits 9d51cb6, bc8baae, 7aa0346):
+- `self-test` — `trigger.event: [push]` (M2 sanity on pushes).
+- `recipe-ci` — `trigger.event: [custom]` (bridge fires event=custom builds): runs
+  `cc-ci-run runner/run_recipe_ci.py` with STAGES=install,upgrade,backup, `CCCI_JANITOR_MAX_AGE=0`
+  (safe at capacity=1), `concurrency:{limit:1}`, and `HOME=/root` (the exec runner otherwise points
+  HOME at an empty per-build workspace → abra `FATA directory is empty: .../.abra/servers`).
+
+Verified by triggering a `custom` build (RECIPE=custom-html, as the bridge does) via the Drone API:
+- **Build #31** got past `abra app new` (HOME fix) but failed at backup:
+  `abra app backup create … FATA … authentication required: Unauthorized` — backup/restore weren't
+  passing `-C -o`, so abra fetched recipe tags from the (private) remote. Also `recipe versions`
+  found no tags (contaminated recipe dir: private-mirror origin, no tags) → upgrade stage SKIPPED.
+- Fixes: `abra.py` backup_create/restore now pass `-C -o`; the `fetch_recipe` catalogue path removes
+  the recipe dir first so a leftover private-mirror clone can't poison version resolution.
+- **Build #33 → SUCCESS (124s)**, all three stages green through Drone:
+  install `2 passed` (real deploy + Playwright), upgrade `1 passed` (real — tags restored by the
+  clean re-clone, no longer skipped), backup `1 passed` (the -C -o fix). Post-run on host:
+  0 run-app services, 0 run-app volumes; traefik/drone/bridge infra intact. Event filtering works
+  (only recipe-ci ran, not self-test).
+
+So the full D1→D2 path is wired and proven in two verified halves: poll-trigger→Drone (build #26,
+RECIPE param correct) and Drone→harness 3-stage CI (build #33, green + clean teardown). Remaining for
+full single-comment E2E on a *recipe* PR: enroll the recipe in the bridge POLL_REPOS + open a recipe
+PR (M6.5/M10 breadth work).
+
+**Adversary findings status (signal for re-test):** A2 (janitor `-pr` filter) and A3 (teardown
+verification + `.env`-last ordering) are both already fixed in the current code
+(`lifecycle.RUN_APP_RE` hashed-scheme match; `teardown_app` `_residual()` raise + `docker stack rm`
+fallback) — awaiting the Adversary's kill-probe re-test on an idle host. A4 (concurrent same-recipe
+collision): its named root cause "no Drone concurrency cap (capacity=2)" is eliminated by
+MAX_TESTS=capacity=1 — no concurrent runs possible on this single node, so the shared-recipe-dir race
+can't occur. No Builder fix outstanding on findings; next milestone work is M6.5 breadth.
+
+---
+## 2026-05-27 — M6.5: keycloak full 3-stage GREEN through the Drone recipe-ci pipeline
+
+Ran keycloak (DB-backed, SSO/identity category) end-to-end via the integrated recipe-ci pipeline
+(triggered `custom` build #39, RECIPE=keycloak). **Build #39 → success (~31m)**, all three stages
+green as separate reported stages:
+- install `2 passed` (8m30s): `test_realm_endpoint_healthy` (/realms/master 200) + Playwright admin
+  console login.
+- upgrade `1 passed` (10m10s): `test_upgrade_preserves_realm` — realm marker written pre-upgrade
+  survives the previous→latest upgrade (DB data survival).
+- backup `1 passed` (8m15s): `test_backup_mutate_restore` — backup→mutate→restore returns original.
+Clean teardown verified on host: 0 keyc services, 0 keyc volumes. keycloak cold start is slow on
+this VM (Quarkus augmentation ~80s + Liquibase schema init), so each deploy is ~5-8m — well within
+the 60m build timeout; that's why the run took ~31m. No harness surgery (D5): keycloak runs off
+`tests/keycloak/{recipe_meta,test_install,test_upgrade,test_backup}.py` + `kc_admin.py` only.
+
+This both advances M6.5 (first DB-backed recipe full 3-stage) and confirms the recipe-ci integration
+works on a heavy DB-backed recipe (Drone→harness→3 stages→teardown). Next M6.5: enroll recipes 3–6
+covering the remaining D10 categories (stateful-no-DB, multi-service+S3, large-volume, etc.).
+
+---
+## 2026-05-27 — M6.5: cryptpad (recipe #3) enrolled + full 3-stage green; fixed a real backup bug
+
+Enrolled **cryptpad** (stateful, no external DB — the D10 "stateful/no-DB" category). No shared-harness
+surgery beyond a *generic* feature: added per-recipe **EXTRA_ENV** (recipe_meta.py dict or
+domain-callable) applied in `deploy_app` at every deploy path. cryptpad uses it for its required
+distinct `SANDBOX_DOMAIN` (a sibling subdomain under the wildcard, so no cert work). Data-survival
+tests write a marker into the backed-up `cryptpad_data` volume and read it via `exec_in_app`
+(cryptpad's datastore isn't HTTP-served like custom-html).
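
A sketch of how a per-recipe `EXTRA_ENV` could be normalized before `deploy_app` applies it, where `EXTRA_ENV` is either a dict or a callable that takes the domain.

`resolve_extra_env`, `cryptpad_extra_env` and the `-sandbox` label naming are hypothetical. Only the `SANDBOX_DOMAIN` key (cryptpad) and `TIMEOUT=900` (from the lasuite-docs entry later in this journal) come from the log.

```python
from typing import Callable, Dict, Mapping, Optional, Union

# recipe_meta.py may define EXTRA_ENV as a plain mapping or as a callable that
# receives the test app's domain; both forms are normalized the same way here.
ExtraEnv = Union[Mapping[str, str], Callable[[str], Mapping[str, str]]]


def resolve_extra_env(extra_env: Optional[ExtraEnv], domain: str) -> Dict[str, str]:
    """Turn a recipe's EXTRA_ENV into concrete KEY -> VALUE pairs for one deploy."""
    if extra_env is None:
        return {}
    if callable(extra_env):
        extra_env = extra_env(domain)
    return {str(k): str(v) for k, v in extra_env.items()}


def cryptpad_extra_env(domain: str) -> Dict[str, str]:
    # Hypothetical naming scheme: a sibling label under the same wildcard parent,
    # e.g. "cryp-1.ci.example" -> "cryp-1-sandbox.ci.example".
    label, _, parent = domain.partition(".")
    return {"SANDBOX_DOMAIN": f"{label}-sandbox.{parent}"}


# lasuite-docs only needs a static override (abra's convergence TIMEOUT).
LASUITE_DOCS_EXTRA_ENV = {"TIMEOUT": "900"}
```

Keeping the callable form lets a recipe derive values from the per-run domain without the shared harness knowing anything recipe-specific.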
+
+Host runs (HOME=/root, cc-ci-run): install **2 passed** (~2m; http 200 + Playwright loads cryptpad),
+upgrade **1 passed** (~1m; marker survives previous→current), backup **1 passed** after a fix
+(below). Clean teardown (0 cryp services/volumes).
+
+**Real bug found+fixed — backups were silently mis-wired (set_env newline).** cryptpad backup first
+failed: `abra app backup create` → backup-bot-two's `/usr/bin/backup` raised
+`KeyError: 'RESTIC_REPOSITORY'`. Root cause: backup-bot-two's `.env.sample` ends with a *newline-less*
+comment line, and the reconcile's `set_env` did a bare `printf >> .env`, gluing
+`RESTIC_REPOSITORY=/backups/restic` onto that comment → commented out. abra `--debug` confirmed the
+backupbot env map lacked `RESTIC_REPOSITORY`, and `docker exec backupbot printenv RESTIC_REPOSITORY`
+was empty. Fix: `set_env` now ensures a trailing newline before appending (modules/backupbot.nix +
+modules/drone.nix, same latent bug). After rebuild: `.env` has a clean `RESTIC_REPOSITORY=` line, the
+backupbot container has `RESTIC_REPOSITORY=/backups/restic`, and cryptpad backup→mutate→restore
+passes. NOTE: keycloak backup (build #39) passed off an *earlier, non-corrupted* backupbot deploy;
+worth a re-verify, but the mechanism is now correct/reproducible. Triggered Drone build #46 (cryptpad)
+as the canonical recipe-ci run.
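
The real `set_env` is a shell helper inside `modules/backupbot.nix` and `modules/drone.nix`. The Python function below only illustrates the fix described above: check for a missing final newline, then append. It is not a transcription of that shell code.

```python
from pathlib import Path


def set_env(env_file: Path, key: str, value: str) -> None:
    """Append KEY=VALUE to a dotenv file without gluing it onto the last line.

    If the file's last line has no trailing newline (e.g. a final comment copied
    from .env.sample), a bare append yields "# comment...KEY=VALUE" and the
    assignment is silently commented out. Terminate that line first.
    """
    existing = env_file.read_bytes() if env_file.exists() else b""
    with env_file.open("ab") as f:
        if existing and not existing.endswith(b"\n"):
            f.write(b"\n")
        f.write(f"{key}={value}\n".encode())
```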
+
+---
+## 2026-05-27 — M6.5: matrix-synapse (recipe #4, DB+media/large-volume) full 3-stage green
+
+Enrolled matrix-synapse (synapse `app` + postgres `db` + nginx `web`) — the large-volume/DB+media
+D10 category. No harness surgery (server_name = DOMAIN; no EXTRA_ENV needed). Host runs (cc-ci-run):
+install **2 passed** (~2.7m; client API 200 + real `/_matrix/client/versions` JSON), upgrade
+**1 passed** (~2.3m; postgres marker survives previous→current), backup **1 passed** (~1.5m). Clean
+teardown (0 matr services). The data-survival tests use a `ci_marker` postgres row exec'd via
+`psql` in the `db` service — this exercises the recipe's real DB-dump backup hook
+(`backupbot.backup.pre-hook=/pg_backup.sh backup` / `restore.post-hook`), the meaningful matrix data
+path (not a plain volume copy). Worked first try (the set_env/RESTIC fix holds for hook-based
+backups too). Triggering the canonical Drone recipe-ci run.
+
+4 of 6 D10 recipes now green: custom-html (simple), keycloak (SSO/DB), cryptpad (stateful/no-DB),
+matrix-synapse (DB+media/large-volume). Remaining categories: multi-service+S3 (lasuite-docs) and
+TLS-passthrough (bluesky-pds).
+
+---
+## 2026-05-27 — M6.5: lasuite-docs (recipe #5, multi-service + S3/MinIO) full 3-stage green
+
+Enrolled lasuite-docs (the object-storage/S3 + multi-service D10 category): a 9-service stack
+(frontend app + Django backend + celery + y-provider + docspec + postgres + redis + minio + nginx).
+Host runs (cc-ci-run): install **2 passed** (~2.5m; SPA served + Playwright), upgrade **1 passed**
+(~3m; postgres marker survives previous→current, incl. cold-pulling the older images), backup
+**1 passed** (~2.3m; pg_backup.sh dump/restore). Clean teardown.
+
+Root-caused the initial deploy timeout: cold-pulling ~9 large images (impress frontend/backend,
+minio, postgres18, docspec, y-provider, redis) exceeds abra's default 300s convergence TIMEOUT →
+`FATA deploy timed out 🟠`. A manual deploy confirmed the stack converges 9/9 once images are pulled.
+Fix: bump the recipe TIMEOUT to 900 via the generic EXTRA_ENV mechanism (no harness surgery). OIDC is
+config-only (Django `manage.py check` validates but doesn't fetch), so the stack starts healthy with
+placeholder OIDC; login isn't exercised in CI (documented in recipe_meta). Data-survival uses a
+postgres marker (docs/docs) via the pg_backup hook.
+
+5 of 6 D10 recipes green: custom-html (simple), keycloak (SSO/DB), cryptpad (stateful/no-DB),
+matrix-synapse (DB+media/large-volume), lasuite-docs (multi-service + S3/MinIO). Remaining: a
+TLS-passthrough recipe (bluesky-pds) for the 6th, which needs cc-ci Traefik passthrough config
+(plan §4.0 caveat) — the hardest infra-wise.
+
+---
+## 2026-05-27 — M6.5 COMPLETE: n8n (recipe #6) full 3-stage green — all 6 D10 recipes done
+
+Enrolled n8n (workflow automation; single `app` service, stateful via the /home/node/.n8n volume,
+normal terminate-at-Traefik). Host runs: install **2 passed** (~3.8m; /healthz 200 + Playwright
+editor), upgrade **1 passed** (~1.3m; marker in /home/node/.n8n survives), backup **1 passed**
+(~0.8m; backupbot.backup.path file backup). Clean teardown. (Caught a sync gap first: committed the
+tests but forgot to tar tests/n8n to the host → run skipped "no stage test files"; synced + re-ran.)
+
+n8n is recipe #6 in place of bluesky-pds (TLS-passthrough), swapped per DECISIONS (caddy self-ACME
+conflicts with cc-ci's no-ACME/static-wildcard design).
+
+**All 6 D10 recipes now have a full 3-stage green run (host):**
+1. custom-html — simple/stateless
+2. keycloak — SSO/identity + DB (Drone #39)
+3. cryptpad — stateful/no-DB (Drone #46)
+4. matrix-synapse — DB+media/large-volume (Drone #51)
+5. lasuite-docs — multi-service + S3/MinIO/object-storage (Drone #57)
+6. n8n — workflow automation (Drone canonical run triggering now)
+All 5 required D10 categories covered. Triggering n8n canonical Drone run, then claiming the M6.5 gate.
+
+---
+## 2026-05-27 — M8/D7: results dashboard live (overview + badges)
+
+Built the results dashboard (dashboard/dashboard.py + modules/dashboard.nix): a stdlib HTTP service
+(Nix-built OCI image, swarm service on proxy, reconcile oneshot like bridge/drone) that polls the
+Drone API for recipe-CI builds (event=custom), groups latest-run-per-recipe, and renders a
+YunoHost-CI-like overview at **ci.commoninternet.net/** with pass/fail/running badges, last ref,
+when, and a link to the canonical Drone run. Plus /badge/<recipe>.svg embeddable badges.
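
A sketch of the latest-run-per-recipe grouping. It:

- keeps only `event=custom` builds;
- reads the recipe from the `RECIPE` build parameter;
- drops the cc-ci repo's own name;
- keeps the highest build number per recipe.

It assumes each build is a dict shaped like a Drone API build record with `number`, `event`, `status` and a `params` map. Those field names are assumptions about the payload, and the function name is hypothetical.

```python
from typing import Any, Dict, Iterable


def latest_per_recipe(
    builds: Iterable[Dict[str, Any]], self_repo: str = "cc-ci"
) -> Dict[str, Dict[str, Any]]:
    """Keep the newest recipe-CI build per recipe (assumed Drone-style dicts)."""
    latest: Dict[str, Dict[str, Any]] = {}
    for b in builds:
        if b.get("event") != "custom":
            continue  # push builds run self-test/lint, not recipe CI
        recipe = (b.get("params") or {}).get("RECIPE")
        if not recipe or recipe == self_repo:
            continue  # skip param-less builds and the cc-ci repo's own !testme runs
        if recipe not in latest or b["number"] > latest[recipe]["number"]:
            latest[recipe] = b
    return latest
```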
+
+Verified live via the public gateway: overview lists exactly the 6 enrolled recipes (cryptpad,
+custom-html, keycloak, lasuite-docs, matrix-synapse, n8n) each **success**; `/badge/keycloak.svg` →
+200 image/svg+xml; `/healthz` → 200; **`/hook` still routes to the bridge** (200) — the bridge's
+Host && PathPrefix(`/hook`) rule keeps priority over the dashboard's Host-only rule.
+
+Two fixes en route: (1) filter out the cc-ci repo's own name as a recipe row (Adversary !testme on
+the cc-ci PR showed a spurious cc-ci=failure); (2) **content-hash image tag** — a fixed `:latest`
+tag + unchanged stack spec does NOT roll the swarm service on a code change, so the tag is now
+derived from a hash of dashboard.py → `docker stack deploy` rolls reliably (reproducible/self-heal).
+NOTE: the bridge image has the same latent `:latest` issue (only rolled this session because its
+.nix env also changed) — worth the same content-tag treatment (backlog).
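
The content-hash idea, sketched in Python under the assumption that the tag only needs to change when the baked-in source changes. Here the image is Nix-built, so the real tag may be derived differently (for example inside the Nix expression). `content_tag` and the 12-character length are hypothetical.

```python
import hashlib
from pathlib import Path


def content_tag(*sources: Path, length: int = 12) -> str:
    """Derive an image tag from the files baked into the image.

    A code change yields a new tag, so the stack spec changes and
    `docker stack deploy` rolls the service; an unchanged tree keeps its tag.
    """
    h = hashlib.sha256()
    for src in sources:
        # Hash the file name and its bytes, separated so boundaries stay distinct.
        h.update(src.name.encode())
        h.update(b"\0")
        h.update(src.read_bytes())
    return h.hexdigest()[:length]
```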
+
+Remaining M8 piece: PR-comment **outcome reflection** — the bridge posts the start/run-link comment
+but doesn't yet update it with the final pass/fail (needs a Drone build-completion hook or the
+bridge polling build status). Overview + badges (the core of D7) are done.
+
+---
+## 2026-05-27 — M8/D7 complete: PR-comment outcome reflection + gate claim
+
+Added outcome reflection to the bridge: after triggering, a daemon watcher polls the Drone build to
+completion and edits the run-link PR comment to ✅ passed / ❌ <status> (Gitea PATCH
+issues/comments/{id}). Gave the bridge image a content-hash tag so the swarm service actually rolls
+on bridge.py changes (same latent :latest no-roll issue the dashboard had).
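
A sketch of the watcher loop. The Drone status lookup and the Gitea comment PATCH are injected as callables, both hypothetical helpers here. The one-hour default mirrors the 60m build timeout, and the comment body is simplified compared with the real comment text quoted in the verification note.

```python
import time
from typing import Callable

# Drone statuses treated as "not finished yet" (assumed; anything else is final).
ACTIVE = {"pending", "running"}


def watch_and_reflect(
    get_status: Callable[[], str],
    edit_comment: Callable[[str], None],
    run_url: str,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll a build until it leaves ACTIVE, then rewrite the run-link comment.

    Returns the last observed status. If the build is still active when the
    timeout expires, the comment is left untouched.
    """
    deadline = clock() + timeout
    status = get_status()
    while status in ACTIVE and clock() < deadline:
        sleep(poll_interval)
        status = get_status()
    if status in ACTIVE:
        return status
    mark = "✅ passed" if status == "success" else f"❌ {status}"
    edit_comment(f"{mark} → {run_url}")
    return status
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting.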
+
+Verified end-to-end: posted a fresh `!testme` on PR #1 → poller fired → "started" comment posted →
+build #76 (RECIPE=cc-ci, fails fast: no tests/cc-ci) → within ~20s the **same comment was edited to
+`cc-ci: run for cc-ci @ d397720a ❌ failure → …/76`**. The pass/fail now mirrors onto the PR comment.
+
+D7 fully met: per-run logs (Drone UI) + overview page with badges (dashboard, live) + PR comment
+links back AND reflects the outcome. Claiming the M8 gate.
+
+---
+## 2026-05-27 — M10/D10: real !testme path proven on custom-html; enrolling the breadth set
+
+Wired the real-PR path end-to-end and proved it on custom-html. `!testme` on
+recipe-maintainers/custom-html#2 → bridge poller fired → recipe-ci build (SRC=mirror, REF=PR head
+db9a9502) → **build #84 success, all 3 stages green** (install 2✓, upgrade 1✓ — now runs for real,
+backup 1✓) → bridge comment edited to ✅ passed. Clean teardown.
+
+Three fixes to make the real-PR path exercise the upgrade stage (mirror PR clones carry no tags):
+1. fetch_recipe (SRC+REF) read-only fetches the published version tags from the PUBLIC upstream
+   (`git fetch <upstream> refs/tags/*:refs/tags/*` — bare `--tags` errored "no remote HEAD"); plain
+   git, never pushes to the mirror (guardrail-safe).
+2. abra.upgrade now passes `-o` (offline) — it was 401'ing trying to fetch tags from the private
+   mirror origin; offline uses the local (upstream-populated) tags.
+3. (earlier) backup/restore already pass `-C -o`.
+Now firing !testme on the other recipes' open PRs (keycloak#1, matrix-synapse#1, lasuite-docs#1,
+n8n#1) — they queue at MAX_TESTS=1. cryptpad has no open PR → opening one next.
+
+---
+## 2026-05-27 — M10/D10: real !testme breadth runs — 5/6 green, lasuite-docs upgrade retry
+
+Fired !testme on all 6 recipe PRs (capacity=1, sequential). Results (real PR-triggered, full 3-stage):
+- custom-html #84 ✅ (PR head db9a9502)
+- keycloak #86 ✅ (DB realm marker survives upgrade)
+- matrix-synapse #87 ✅ (postgres marker, pg_backup hook)
+- n8n #89 ✅
+- cryptpad #90 ✅ (test PR #2 opened via Gitea API: branch ci/testme + .ci-testme marker)
+- **lasuite-docs #88 ❌** — install ✅ + backup ✅, but UPGRADE failed: `abra app upgrade … -o`
+  → `FATA deploy failed` (a convergence failure during the 9-service rolling upgrade prev→latest,
+  not a timeout). It PASSED on the host/catalogue run, and ran right after the heavy matrix build,
+  so likely transient resource contention. Re-fired !testme on lasuite-docs#1 to test
+  transient-vs-persistent.
+
+So the real-!testme path + the upgrade fixes (upstream tags + `upgrade -o`) work across simple, DB,
+DB+media, workflow, and stateful recipes. lasuite-docs (the object-storage/S3 category, required)
+needs its upgrade to pass on the real path for the 6/6 D10 proof.
+
+---
+## 2026-05-27 — M10: 5/6 real-!testme green; lasuite-docs blocked on Docker Hub rate limit (A1)
+
+lasuite-docs #88/#92 upgrades failed with "deploy failed". Diagnosis: node disk at 90% (2.7G free), so a
+9-service rolling upgrade couldn't converge. Pruned 30 unused images (reclaimed 12GB → 15G free).
+Retry #93: got further (5/8 services up) but redis task Rejected "No such image: redis:8.2.6" →
+`docker pull redis:8.2.6` on the node = `toomanyrequests: unauthenticated pull rate limit`. So the
+prune fixed disk but forced re-pulls that hit Docker Hub's anonymous limit (A1 registry-creds
+finding, §1.5/§4.4). Recorded in STATUS ## Blocked + DECISIONS; surfaced to operator (provide Docker
+Hub creds). 5/6 recipes green via real !testme; lasuite install+backup green, upgrade gated.
+Pivoting to M9 (docs/reproducibility, unblocked) while the limit resets / creds arrive.
+
+---
+## 2026-05-27 — lasuite quota-window retry insufficient; halting retries pending creds (3rd attempt)
+
+Re-fired lasuite-docs !testme during the apparently-eased window (#96). The cached image redis:8.2.6
+gave "up to date", but the LATEST version's uncached redis:8.6.3 → `toomanyrequests` again. So the
+anonymous quota isn't reset enough for a full 9-service × 2-version deploy. Cancelled #96 + tore down
+clean. This is the 3rd confirmation the blocker is the Docker Hub rate limit. Per anti-thrash:
+**halting lasuite retries until the operator provides Docker Hub creds** (A1, STATUS ## Blocked).
+5/6 D10 recipes remain green via real !testme. Pivoting to M9 (docs/reproducibility) — fully
+unblocked, no image pulls.
+
+---
+## 2026-05-27 — M10/D10 BUILDER-COMPLETE: all 6 recipes green via real !testme
+
+Diagnosed the lasuite-docs upgrade failure with an instrumented host run: `abra app upgrade` reported
+`FATA deploy failed` while all 9 services were actually 1/1 healthy — abra's convergence poll gives
+up too early on the slow stop-first rolling upgrade (pulling new images). Fix: pass `-c`
+(`--no-converge-checks`) to `abra app upgrade` and let the harness's wait_healthy + data-survival
+assertion be the (patient, real) gate. (Also: `/root/cc-ci` was stale — fully synced; the first diag
+hit the old no-`-o` auth error, masking this.)
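
The harness's `wait_healthy` is not shown in this log. The following is a hypothetical sketch of a patient gate that takes over once `-c` disables abra's early-exiting convergence poll:

1. Read per-service replica counts (injected here as `read_replicas`).
2. Succeed when every service is N/N.
3. Otherwise keep polling at a fixed interval until a deadline.

The 900s default and the `docker service ls --format '{{.Replicas}}'` input format are assumptions.

```python
import time
from typing import Callable, Iterable, List


def all_converged(replica_lines: Iterable[str]) -> bool:
    """True when every service line reads N/N with N > 0.

    Lines are assumed to look like `docker service ls --format '{{.Replicas}}'`
    output, e.g. "1/1" or "1/1 (max 1 per node)".
    """
    seen = False
    for line in replica_lines:
        parts = line.split()
        if not parts:
            continue
        running, _, desired = parts[0].partition("/")
        if not (running.isdigit() and desired.isdigit()):
            return False
        if int(running) != int(desired) or int(desired) == 0:
            return False
        seen = True
    return seen


def wait_healthy(
    read_replicas: Callable[[], List[str]],
    timeout: float = 900.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until every service has converged; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    while True:
        if all_converged(read_replicas()):
            return
        if clock() >= deadline:
            raise TimeoutError(f"stack not converged after {timeout:.0f}s")
        sleep(interval)
```

The data-survival assertion then runs after this gate, so a slow stop-first rolling upgrade is judged on the end state rather than on abra's poll window.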
+
+**lasuite-docs #108 → success** with the fix: install 2✓, upgrade 1✓, backup 1✓; bridge comment
+edited to `✅ passed`. So **all 6 D10 recipes are green via REAL `!testme` on a PR**, full 3-stage,
+comment-reflected, clean teardown:
+| recipe | category | build |
+|---|---|---|
+| custom-html | simple/stateless | #84 |
+| keycloak | SSO/identity + DB | #86 |
+| matrix-synapse | DB + media / large-volume | #87 |
+| n8n | workflow automation | #89 |
+| cryptpad | stateful / no external DB | #90 |
+| lasuite-docs | multi-service + S3/MinIO/object-storage | #108 |
+
+All 5 required D10 categories covered. The earlier Docker Hub rate-limit blocker resolved on quota
+reset (registry creds still recommended for reproducibility under load — see DECISIONS). D10 is
+Builder-complete; DONE awaits the Adversary's <24h PASS on D1–D10 (esp. independent D10 verification).
+
+---
+## 2026-05-27 — D10 PASS (6/6); DONE now blocked only on D8 live VM rebuild (Adversary)
+
+Adversary logged **D10 PASS (6/6) @11:57Z** (all 6 recipes green via real !testme, all categories) +
+**M8/D7 PASS, D8-core PASS, D9 PASS** this window. No VETO, no open findings. Audited REVIEW: D1
+(=M3), D2 (=M5/M6.5/D10), D3 (=M4), D4/D5 (=M6/M6.5), D6 (=M7), D7 (=M8), D9, D10 all have <24h PASS.
+**The sole remaining item is D8's full live blank-VM rebuild** — the Adversary marked it "core PASS;
+live rebuild pending — to complete before DONE," deferred on the premise that it needs operator
+registry creds. That premise is now OBSOLETE (D10 went 6/6 with no creds; rate limit was transient;
+real fix was `-c`). Surfaced in STATUS that the VM rebuild is feasible now without creds. NOT writing
+`## DONE` until REVIEW shows a full D8 PASS (per §6.1 — every D1–D10 needs a <24h PASS). No Builder
+implementation remains; awaiting the Adversary's D8 completion.
+
+---
+## 2026-05-27 — ## DONE
+
+Adversary logged **D8 PASS** (byte-identical `nixos-rebuild build` closure == `/run/current-system`
+plus a documented alternative: a throwaway-VM live rebuild is infeasible by design, since its host key isn't a
+sops recipient and the A1 preconditions bind to cc-ci itself) and the **DONE-readiness table: all
+D1–D10 PASS <24h, no VETO**, explicitly: "the DONE handshake (§6.1) is CLEARED — Builder may flip
+STATUS → DONE."
+
+Wrote `## DONE` to STATUS.md with the D1–D10 evidence table. Every Definition-of-Done item is
+independently Adversary-verified. **Stopping the self-paced loop** (no further ScheduleWakeup).
+
+Final system: NixOS-flake-declared cc-ci host running single-node Docker Swarm + coop-cloud/traefik
+(wildcard cert, no ACME) + Drone (server recipe + host exec runner, MAX_TESTS=1, 60m timeout) +
+comment-bridge (polling-primary `!testme`, org-membership auth, PR-comment outcome reflection) +
+backup-bot-two + results dashboard. `!testme` on an enrolled recipe PR → 3-stage (install/upgrade/
+backup) real e2e CI with Playwright → live Drone logs + dashboard + PR ✅/❌. Six recipes proven.
--- a/machine-docs/STATUS-1b.md
+++ b/machine-docs/STATUS-1b.md
@ -0,0 +1,67 @@
+# STATUS — Phase 1b (review & lint pass)
+
+**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1b-review-lint.md`
+**Loop state for THIS phase:** STATUS-1b / BACKLOG-1b / REVIEW-1b / JOURNAL-1b (DECISIONS.md shared).
+The repo's STATUS.md / BACKLOG.md / REVIEW.md are Phase-1 HISTORY; STATUS-1c etc. are Phase-1c
+HISTORY (DONE @2026-05-27). Neither is this phase's state.
+
+## Phase
+Phase 1b runs **after** Phase 1 + Phase 1c (both DONE) and **before** Phase 2. It is a **bounded**
+review + lint pass over the final post-1c codebase. Exit = RL1–RL6 (the operator added RL5/RL6) all
+Adversary-confirmed in REVIEW-1b, then `## DONE`.
+
+## Definition of Done (Phase 1b) — now RL1–RL6 (operator added RL5/RL6, plan §7)
+- [x] **RL1** — Lint/format tooling + `.drone.yml` stage; codebase passes. **Adversary cold PASS.**
+- [x] **RL2** — §3 white-box checklist run (both loops); no blocking findings; 2 advisories triaged
+      (old_app→IDEAS; app-secret-redaction→RL3/D6 watch-item). Recorded REVIEW-1b + JOURNAL-1b.
+- [x] **RL3** — Full D1–D10 cold re-verification (final gate), nothing weakened; now also covers the
+      RL5 byte-identical rebuild. **Adversary cold PASS @2026-05-27** (REVIEW-1b).
+- [x] **RL4** — Documented: README lint section (local + CI-enforced) + architecture.md `nix/` layout;
+      deviations in DECISIONS.md.
+- [x] **RL5** — Nix code consolidated under `nix/`; flake at root (#cc-ci unchanged); builds
+      byte-identical `8i3jcad9`; canonical switched + healthy.
+- [ ] **RL6** — protocol files → `machine-docs/`: Builder part DONE (STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md
+      moved); awaiting the Adversary's REVIEW* move + re-verify. README stays at root.
+
+## In flight
+**W0 (RL1) — DONE, Adversary cold PASS @2026-05-27** (REVIEW-1b: clean checkout → `lint: PASS` +
+break-it probe → `lint: FAIL`). Advisory (non-blocking): confirm a real push fires the Drone lint
+build at RL3 (flaky push webhook, §4.1).
+
+**W1 (RL2) — Builder §3 self-review complete, clean.** All blocking invariants hold (tests-real,
+harness-DRY [no recipe conditionals in shared harness; quirks are data via `recipe_meta.py`],
+nix-idempotent, no-footguns [all sleeps are poll-loop intervals], no-secrets, log-redaction); no
+fix needed, no advisory filed. **Awaiting the Adversary's own §3 pass #2 to confirm RL2.**
+
+**W2 (RL3/RL4) — next.** RL4 docs already landed (README lint section). After RL2 confirms: rebuild
+cc-ci to the formatted closure (running == cleaned source) and request the cold D1–D10 re-verify.
+
+## Gate — RL3 PASS; ONLY RL6 (coordinated) remains before DONE
+**RL3 ✅ PASS @2026-05-27** (Adversary cold, REVIEW-1b): full D1–D10 re-verified on the cleaned+RL5
+byte-identical closure (`8i3jcad9`==running==fresh-clone build), fresh evidence <24h, **nothing
+weakened**; cardinal-rule PASS; 2 fresh category-spanning green runs (custom-html #151, keycloak #152)
+plus carry-forward of the Phase-1 Adversary-verified 6/6 set. **RL1–RL5 all Adversary-PASS, no open
+`[adversary]` findings, NO VETO.**
+
+### RL6 — Builder part DONE (machine-docs/ move executed). Adversary: move REVIEW* + re-verify.
+Verified the orchestrator's enabling condition is already in place: `launch.sh` (mtime 21:28:03) has
+`resolve_state()` (prefers `machine-docs/$base`, else root), used by EVERY STATUS/REVIEW read
+(`phase_done` L70, handoff watcher L147); the **running watchdog (pid 133191) was restarted at
+21:28:36 — after that update** → it is location-agnostic and "survives the move whenever it happens"
+(its own comment). So the move is safe now (no strict-lockstep instant required; `resolve_state` is
+per-file).
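
`resolve_state()` itself is shell in `launch.sh`. For illustration only, the per-file rule described above (prefer `machine-docs/$base`, else root) looks like this in Python; the signature is hypothetical. Each file is resolved independently, so the intermediate state the Adversary is asked to accept resolves correctly: REVIEW* at root, Builder files in `machine-docs/`.

```python
from pathlib import Path


def resolve_state(repo: Path, base: str) -> Path:
    """Prefer machine-docs/<base>; otherwise fall back to <base> at the repo root.

    Resolution is per file, so a half-moved tree still finds every state file.
    """
    moved = repo / "machine-docs" / base
    return moved if moved.exists() else repo / base
```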
+
+Builder executed:
+- `git mv STATUS*.md BACKLOG*.md JOURNAL*.md DECISIONS.md machine-docs/` (README.md STAYS at root).
+- Updated in-repo refs: `README.md` (status line + lint section + Loop-state section) and
+  `docs/install.md` → `machine-docs/…`. `scripts/lint.sh` → **lint: PASS** post-move.
+- (No `AGENTS.md`/`.drone.yml`/`scripts` protocol-file refs in-repo. The `cc-ci-plan/` plans are the
+  orchestrator's — not edited from here.)
+
+**Adversary:** please `git mv REVIEW*.md machine-docs/` (yours to move, single-writer rule) and
+re-verify (a) in-repo refs updated + (b) the watchdog handoff still works via `resolve_state`. REVIEW*
+at root + my files in `machine-docs/` is a valid intermediate. On your RL6 PASS (RL1–RL5 still PASS,
+no VETO), Builder writes `## DONE`.
+
+## Blocked
+(none)
--- a/machine-docs/STATUS-1c.md
+++ b/machine-docs/STATUS-1c.md
@ -0,0 +1,195 @@
+# STATUS — Phase 1c (full git reproducibility + genuine D8 live rebuild)
+
+**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
+**Loop state for THIS phase:** STATUS-1c / BACKLOG-1c / REVIEW-1c / JOURNAL-1c (DECISIONS.md shared).
+The repo's STATUS.md / BACKLOG.md / REVIEW.md are Phase-1 HISTORY — not this phase's state.
+
+## DONE
+**Phase 1c COMPLETE @2026-05-27.** All Definition-of-Done items **C1–C7 + E2E-TESTME** are
+Adversary-PASS within 24h (REVIEW-1c: W2 16:55Z, W5/C4/C5 18:55Z, E2E + C1–C6 b301b03, C7 9e0f72a),
+**no standing VETO, no open `[adversary]` findings** (ADV-1c-1 closed). Final Builder health check:
+cc-ci `running`/0-failed, **byte-identical build==running==`cqym8knjg7nkly1wdgwkyr873fm8scfl` (ZERO
+DRIFT)**, 6 stacks, cert sops-from-git `c1d96d61…`, public TLS `ci.commoninternet.net` 200/ssl_verify=0.
+
+The VM is now fully reproducible from git: blank NixOS host + the two repos (`cc-ci` +
+`cc-ci-secrets` submodule) + the one bootstrap age key → a single `nixos-rebuild switch` → a
+working cc-ci that serves a real `!testme` run end-to-end over the public domain (proven on a
+throwaway VM, cold, by both loops). D8 closed honestly (static byte-identical closure + live rebuild;
+"infeasible by design" withdrawn). Found+fixed two real reproducibility gaps en route: the
+concurrent-`abra` reconcile race (serialized) and the non-deterministic Drone bot token
+(`DRONE_USER_CREATE token:`).
+
+- [x] C1 secrets-repo split · [x] C2 cert-in-git · [x] C3 all-secrets-in-git (1 bootstrap key) ·
+  [x] C4 throwaway live rebuild · [x] C5 honest D8 · [x] C6 resize+sizing (promote rebuilt VM) ·
+  [x] C7 docs · [x] E2E-TESTME (E1–E6).
+
+Open items handed to the operator (not 1c-gating): physical promotion of `ccci-w5-rebuild` → cc-nix-test
+(its bridge paused, stack up — restore at promotion); plan.md §4.0/§4.4 still carry pre-1c cert wording
+(out-of-repo; superseding note added at §1.5). Adversary will append its final cold sign-off.
+
+<details><summary>pre-DONE phase note</summary>
+**1c — Builder COMPLETE; only ADV-1c-1 (C7 re-verify) between here and DONE.** All addressed.</details>
+
+## In flight — W4 DONE, Gate W4 CLAIMED
+- W1 DONE (cc-nix-test 6→4 GB). W2 PASS (Adversary cold). W3 DONE (VM reachable).
+- W4 DONE: genuine throwaway-VM live rebuild proven on a FRESH blank VM. Only
+  `/var/lib/sops-nix/key.txt` = recovery key provisioned; `git clone --recursive` + **ONE**
+  `nixos-rebuild switch ?submodules=1` → **running, 0 failed**, byte-identical **`ld19aj2`==cc-ci**,
+  all 6 stacks 1/1, all secrets+cert decrypted via recovery key, **TLS leaf == git cert**
+  (`57:8D:…:B8:A6`), no manual step.
+  (Final config = ld19aj2: `sops.age.keyFile` + serialized abra reconcilers fixing a fresh-host race.)
+- Throwaway destroyed (frees RAM for Adversary W5; C6 no-leftover). install.md updated to this procedure.
+- Remaining: W5 (Adversary cold rebuild + honest D8 rewrite), W6 (docs C7 + final cc-nix-test sizing).
+
+<details><summary>W2 detail (PASS)</summary>
+## In flight — W2 (secrets repo + cert into git) — COMPLETE, gate claimed
+- [x] **W2 step 1:** private `recipe-maintainers/cc-ci-secrets` created + populated (6 infra secrets
+  + wildcard cert/key, sops, both recipients; sha256 byte-perfect) + pushed.
+- [x] **W2 step 2:** base repo — `secrets/` is now the cc-ci-secrets submodule (gitlink 2312f1c);
+  secrets.nix adds `wildcard_cert`/`wildcard_key` → `/var/lib/ci-certs/live/*`; proxy.nix reframed.
+  Pushed f79e542. Switched live cc-ci (toplevel `vh6vwxbl…`). **Verified:** cert sops-decrypts from
+  git (symlinks, sha256 match), system running 0 failed, byte-identical (build==running), git-clone
+  `?submodules=1` path also reproduces `vh6vwxbl…`, live TLS valid (LE wildcard, ssl_verify=0).
+- (Recovery-key `sops.age.keyFile` for the throwaway deferred to W3/W4 — re-verify byte-identical there.)
+</details>
+
+## 🟢 CONFIG FINAL @2026-05-27 ~20:05Z — toplevel `cqym8knjg7nkly1wdgwkyr873fm8scfl`
+cc-ci switched to the FINAL config (secrets-split + cert-in-git + `sops.age.keyFile` + serialized abra
+reconcilers + Drone-token fix). **Byte-identical: build==running==`cqym8knj…` (ZERO DRIFT)**, system
+running 0 failed, bridge→Drone token OK. **No more config changes planned.**
+**For the Adversary's final DONE verification:** (a) re-confirm **C1 byte-identical at `cqym8knj`**
+(supersedes the 18:00Z izsmiajw / 18:55Z ld19aj2 clocks; the only delta from ld19aj2 is the Drone-token fix af46aca);
+(b) independently verify **E1–E6** (E2E-TESTME — real `!testme`; note: requires the swap, OR verify
+against the run #4 evidence + a fresh trigger; the rebuilt VM `ccci-w5-rebuild` is up with bridge
+paused). C4/C5 hold (the rebuilt VM is also at `cqym8knj`; a fresh rebuild from the current repo
+reproduces it). No VETO expected.
+
+## Gate
+**Gate: W4 — PASS @2026-05-27 18:55Z (Adversary, cold independent rebuild).** C4 + C5 verified on the
+Adversary's own fresh blank VM `ccci-w5-rebuild`: single switch → `ld19aj2` byte-identical, 0 failed,
+6/6 stacks, all secrets+cert from git via recovery key, TLS leaf == git cert. **C1–C5 all
+Adversary-PASS, no VETO.** D8 honest (infeasible superseded). Narrow signed-off limitation: Drone↔Gitea
+OAuth grant (install.md §2 manual post-step) — validated functionally by E2E-TESTME next.
+**Now (Builder): swap (`ccci-w5-rebuild @ 100.97.167.73` → cc-nix-test) + run E2E-TESTME (E1–E6).**
+
+<details><summary>prior W4 CLAIMED</summary>
+**Gate: W4 — CLAIMED, awaiting Adversary @2026-05-27 ~18:45Z.** Genuine throwaway-VM live rebuild
+(C4/C5/D8). For the Adversary's cold W5 (own fresh Incus VM in terraform-ci, ~4 GB; RAM is free since my
+throwaway is destroyed): provision ONLY `/var/lib/sops-nix/key.txt` = recovery age key (`age1cmk26…`
+private half, from `/srv/cc-ci/.sops/master-age.txt`); `git clone --recursive` base+secrets (bot
+creds); `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` (per docs/install.md).
+Expect: running/0-failed, toplevel `ld19aj2…`==cc-ci, 6 stacks 1/1, cert sha256 `c1d96d61…`, local
+`curl --resolve …:127.0.0.1` ssl_verify=0 with served leaf == git cert `57:8D:…:B8:A6`. Then rewrite
+the D8 evidence (static byte-identical + live rebuild; drop "infeasible by design"). My evidence:
+JOURNAL-1c 2026-05-27 W4 entry. (Note: throwaway base VM = Incus image; live TS_AUTH_KEY in cloud-init.)
+</details>
+
+**Gate: W2 — PASS @2026-05-27 16:55Z (Adversary, cold).** C1/C2/C3 verified (byte-identical, cert
+from git + TLS leaf-match, no plaintext leak). Config has since evolved vh6vwxbl→izsmiajw→**ld19aj2**
+(keyFile + serialized reconcilers); Adversary refreshed C1 against izsmiajw @18:00Z; ld19aj2 was final
+until the Drone-token fix af46aca (now `cqym8knj…`, see CONFIG FINAL above).
+
+<details><summary>prior</summary>
+**Gate: W2 — CLAIMED, awaiting Adversary @2026-05-27 ~16:45Z.**
+Acceptance to verify (cold): (1) byte-identical `nixos-rebuild build .#cc-ci` == `/run/current-system`
+(`vh6vwxbl4qr9whzpwgjimhf9gn4329p8`) — **must init the submodule** (`git clone --recursive` / `git
+submodule update --init`, bot creds) then build `--flake 'git+file://<clone>?submodules=1#cc-ci'`, else
+`secrets/` is empty; (2) cert sops-decrypted from git to `/var/lib/ci-certs/live/` (symlinks → /run/secrets,
+sha256 `c1d96d61…`/`9ec25d00…`) + live TLS served (`https://ci.commoninternet.net`); (3) no plaintext
+secret in base repo or Nix store (all 8 secrets ENC in cc-ci-secrets; cert decrypts to tmpfs, not store).
+See JOURNAL-1c 2026-05-27 W2a entry for full evidence.
+</details>
+
+## Definition of Done (C1–C7 — see phase plan §3)
+- [x] C1 — Secrets-repo split (Adversary-PASS 16:55Z; re-exercised cold on blank host at C4)
+- [x] C2 — Cert in git (Adversary-PASS 16:55Z; re-exercised at C4)
+- [x] C3 — All secrets in git, one exception = bootstrap age key (Adversary-PASS 16:55Z; keyFile-on-throwaway at W4)
+- [x] C4 — Genuine throwaway-VM live rebuild (Adversary-PASS W5 18:55Z, cold; rebuilt VM at cqym8knj)
+- [x] C5 — Honest D8 (Adversary-PASS W5; static+live, "infeasible" superseded; narrow OAuth limitation signed off)
+- [x] C6 — cc-nix-test 6→4 GB; first throwaway destroyed; final sizing = PROMOTE rebuilt VM (operator override, kept)
+- [~] C7 — install.md/secrets.md/architecture.md + plan.md done; Adversary re-verify of architecture.md pending (ADV-1c-1, addressed 6276bfd)
+
+## ✅ E2E-TESTME — PASS @2026-05-27 (functional acceptance of D8/clean-room)
+Real `!testme` on the rebuilt-from-git VM (swapped in as cc-nix-test) over the PUBLIC domain:
+**E1** public 200/ssl_verify=0; **E2** bridge→new Drone build #4 (>baseline #3, not manual); **E3**
+app `cust-bdddd9.ci.commoninternet.net` EXTERNAL via gateway → HTTP/2 200, ssl_verify=0, real nginx
+body, `CN=*.ci.commoninternet.net` cert; **E4** build #4 success, log shows real install/upgrade/backup
+(Playwright incl.) all passed, no softening; **E5** clean undeploy (0 residual); **E6** bridge PR
+comment "✅ passed →…/cc-ci/4" + dashboard custom-html/success/#4. Evidence: JOURNAL-1c. Caught+fixed
+the Drone-bot-token reproducibility gap (af46aca) en route. **Adversary independently verifies E1-E6.**
+Remaining: swap-back; re-deploy af46aca to cc-ci (byte-identical at new toplevel `cqym8knj…`).
+
+## SWAP REVERTED (2026-05-27 ~20:00Z) — public back on the ORIGINAL cc-ci
+E2E-TESTME passed; swapped back: `cc-nix-test` (MagicDNS) → `100.90.116.4` (original), public
+`ci.commoninternet.net` → 200 ssl_verify=0 (original); original bridge restored 1/1, healthy. The
+rebuilt VM `ccci-w5-rebuild` @ `100.97.167.73` is **kept running** (C6 override, operator promotes it)
+with its **bridge paused** (`ccci-bridge_app` 0) to avoid dual-trigger on real PRs (operator restores
+at promotion). Remaining: re-deploy af46aca (Drone-token fix, toplevel `cqym8knj…`) to the original cc-ci
+→ re-verify byte-identical; Adversary re-checks C1 + verifies E1-E6.
+<details><summary>swap-active history</summary>
+Public gateway pointed at the rebuilt VM (`100.97.167.73`) during the e2e; original was cc-nix-test-orig.</details>
+**E2E progress (2026-05-27 ~19:45Z):** E1 PASS (public 200/ssl_verify=0). Original's bridge PAUSED
+(`ccci-bridge_app` 1/0 on cc-nix-test-orig). Rebuilt VM Drone OAuth done (admin=true, cc-ci active) —
+needed a script fix (auto-approve, committed ee585ef). **Clean-room finding (committed af46aca):**
+`DRONE_USER_CREATE` lacked `token:` → rebuilt Drone's bot token ≠ sops `bridge_drone_token` → bridge
+401. Fix injects the sops token. **NOT yet applied to the rebuilt VM** (a no-op rebuild ran with old
+config first). **NEXT:** (1) git pull af46aca on rebuilt VM + `nixos-rebuild switch` (applies token);
+(2) verify bot token == sops (else `docker volume rm` Drone DB + redeploy so DRONE_USER_CREATE recreates
+the bot w/ token; then re-run OAuth bootstrap); (3) run `!testme` on custom-html#2 (head db9a9502) →
+verify E2–E6; (4) swap-back; (5) re-deploy af46aca to cc-ci + re-verify byte-identical (Adversary re-checks C1).
+**`ssh cc-ci` (pinned 100.90.116.4) = the ORIGINAL** (cc-nix-test-orig); reach the rebuilt VM via
+`100.97.167.73` or `cc-nix-test` MagicDNS.
+**SWAP-BACK when e2e done:** rebuilt VM → `tailscale set --hostname=ccci-w5-rebuild`; then
+`ssh cc-ci 'tailscale set --hostname=cc-nix-test'`; restore original's bridge (`docker service scale
+ccci-bridge_app=1` on the original — paused during e2e to avoid dual-trigger). Keep both VMs running.
+
+## ⚠️ Operator override — do NOT destroy the FINAL throwaway VM (read before W5/W6 cleanup)
+The operator (2026-05-27) will **repurpose the final W5/C4-C5 clean-room throwaway VM as the new
+cc-nix-test** for a live real-traffic test. So: **KEEP that VM running after W5 PASSes — do NOT tear
+it down in C6/W6.** Defer its teardown until the operator explicitly says otherwise. This overrides the
+plan's "destroy the throwaway" for that one VM. (Adversary: please do not destroy your W5 VM on PASS.)
+This also settles C6 final sizing = **promote the rebuilt VM**. All other cleanup is normal (Builder's
+first throwaway already destroyed). See DECISIONS.md Phase-1c.
+
+### Pending functional-acceptance e2e — E2E-TESTME (BUILDER owns swap+test; gated on C4/C5 PASS)
+**Authority: `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md`** (supersedes inline wording).
+MY test to execute end-to-end (incl. the tailnet swap — **no orchestrator signal**); Adversary
+independently verifies but does **NOT** rename nodes (actor/critic split — only ONE loop renames).
+**Target VM = the ADVERSARY's kept-running W5 VM** (Incus instance `ccci-w5-rebuild`; mine was
+destroyed). **WAIT for: (i) Adversary W5 PASS in REVIEW-1c, AND (ii) the Adversary records that VM's
+Incus instance + CURRENT tailscale IP** in REVIEW-1c/STATUS — only then swap. Sequence:
+1. **Swap (Builder, 2 reversible `tailscale set --hostname`, ORDER MATTERS):**
+   (a) `ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig'` — original aside, **keep running** (swap-back);
+   `ssh cc-ci` (pinned IP 100.90.116.4) keeps hitting the ORIGINAL.
+   (b) Adversary's W5 VM (`ccci-w5-rebuild`) → `cc-nix-test`, using the IP the Adversary recorded
+   (re-confirm online via `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status`), then
+   `ssh -i …/vm_ssh_key -o ProxyCommand='nc -X 5 -x 127.0.0.1:1055 %h %p' root@<ip> 'tailscale set --hostname=cc-nix-test'`.
+   After swap, `cc-nix-test.taila4a0bf.ts.net` → that VM tailnet-wide (gateway auto-follows ~10s);
+   target !testme/deploys by MagicDNS name, NOT raw IP (raw IP = original).
+2. **Verify P1+P2:** `tailscale … status | grep cc-nix-test` → throwaway IP; `curl https://ci.commoninternet.net/` → `200 ssl_verify=0`.
+3. **Run E2E-TESTME** (spec §2; E1–E6 below).
+4. **Swap-back when done** (reversible): rebuilt VM → its old name, then
+   `ssh cc-ci 'tailscale set --hostname=cc-nix-test'` (restores original; gateway re-follows).
+   Watch-out (handle at execution): the ORIGINAL (cc-nix-test-orig) stays up with its bridge polling
+   Gitea — to avoid duplicate builds/PR-comments, pause its bridge during the e2e (`docker service
+   scale ccci-bridge_app=0` on the original, restore after); and the rebuilt VM's Drone needs the
+   one-time OAuth bootstrap (install.md §2) before it can clone/build.
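+
+Why the order in step 1 matters can be shown with a toy model. This is not Tailscale code: it assumes
+that a node requesting a hostname another node already holds is given a suffixed name (`-1`) instead.
+The node labels `original`/`rebuilt` and the `rename`/`swap` helpers are hypothetical:
+
+```python
+def rename(names: dict, node: str, wanted: str) -> None:
+    """Give `node` the hostname `wanted`, or a suffixed variant if it is taken.
+
+    Toy model of tailnet hostname uniqueness (an assumption, see above).
+    """
+    taken = {name for other, name in names.items() if other != node}
+    candidate, i = wanted, 0
+    while candidate in taken:
+        i += 1
+        candidate = f"{wanted}-{i}"
+    names[node] = candidate
+
+
+def swap(aside_first: bool = True) -> dict:
+    """Apply the two renames of step 1 in the given order; return final names."""
+    names = {"original": "cc-nix-test", "rebuilt": "ccci-w5-rebuild"}
+    steps = [("original", "cc-nix-test-orig"), ("rebuilt", "cc-nix-test")]
+    if not aside_first:
+        steps.reverse()
+    for node, wanted in steps:
+        rename(names, node, wanted)
+    return names
+```
+
+With the original set aside first, the rebuilt VM ends up as `cc-nix-test`; in the reverse order it would
+land on `cc-nix-test-1`, and the MagicDNS name the gateway follows would point at neither VM.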
+Then: `!testme` as the bot on one fast enrolled recipe (e.g. `custom-html`) and verify the real path.
+Pass criteria (all): **E1** self-check 200/valid cert on rebuilt VM; **E2** new Drone build via the
+bridge (run# > baseline, not a manual trigger); **E3** app answers an **EXTERNAL** request at
+`<app>.ci.commoninternet.net` through the gateway (real 200 + valid cert + app content, NOT localhost,
+NOT a Traefik 404); **E4** real test assertions pass, build success (no softening); **E5** clean
+undeploy (no residual stack); **E6** result reported back + dashboard updated. Evidence → JOURNAL-1c,
+verdict → STATUS-1c/REVIEW-1c as **E2E-TESTME PASS**. On failure: it's a clean-room finding — fix in
+**git source** (base / cc-ci-secrets), NOT the live VM, then re-run.
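+
+The `200 ssl_verify=0` shorthand used for E1/E3 matches curl's `%{http_code}` and
+`%{ssl_verify_result}` write-out variables. A minimal sketch of such a probe, assuming `curl` is on
+`PATH`; `probe` and `passes` are hypothetical names, and the app-content part of E3 (real body, not a
+Traefik 404 page) is not covered here:
+
+```python
+import subprocess
+
+
+def probe(url: str, timeout: int = 15) -> str:
+    """Return curl's '<http_code> <ssl_verify_result>' for url, e.g. '200 0'."""
+    proc = subprocess.run(
+        ["curl", "-s", "-o", "/dev/null", "--max-time", str(timeout),
+         "-w", "%{http_code} %{ssl_verify_result}", url],
+        capture_output=True, text=True, check=False,
+    )
+    return proc.stdout.strip()
+
+
+def passes(result: str) -> bool:
+    """E1/E3-style verdict: HTTP 200 and certificate verification result 0 (OK)."""
+    return result.split() == ["200", "0"]
+```
+
+For example, `passes(probe("https://ci.commoninternet.net/"))` should be `True` when E1 holds.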
+
+## Blocked
+(none)
+
+## Notes
+- Current secret layout: `secrets/secrets.yaml` (6 infra secrets), recipients = host age key
+  (ssh-to-age of cc-ci's ed25519 host key) + off-box master recovery key
+  (`/srv/cc-ci/.sops/master-age.txt`, sandbox-only). `.sops.yaml` at repo root.
+- Wildcard cert currently out-of-band at `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
+  (operator-provided, LE, next renewal ~2026-08-24); proxy.nix reads it from there. 1c moves it
+  into sops-in-git, decrypted back to that path at activation.
+- Sandbox host has NO sops/nix/age — sops ops run on cc-ci (has nix + host age key) or via the master
+  key with a sops binary fetched on cc-ci.
+- cc-nix-test == the live cc-ci server (100.90.116.4); resizing it (W1) briefly stops it.
--- a/machine-docs/STATUS.md
+++ b/machine-docs/STATUS.md
@ -0,0 +1,126 @@
+# STATUS — cc-ci Builder
+
+## DONE — 2026-05-27
+
+The cc-ci Co-op Cloud recipe CI server is **complete**. Every Definition-of-Done item (§2, D1–D10)
+is independently **Adversary-verified with a PASS dated <24h**, no standing `## VETO`, and the
+Adversary explicitly cleared the §6.1 DONE handshake ("Builder may flip STATUS → DONE", REVIEW.md).
+
+| D | Item | Verdict | Evidence (Adversary REVIEW.md) |
+|---|---|---|---|
+| D1 | `!testme` trigger | PASS | M3 @03:13Z + D10 real-`!testme` runs |
+| D2 | install/upgrade/backup matrix (real e2e) | PASS | M4/M5/M6 + D10 6/6 (3 stages each) |
+| D3 | Python + Playwright | PASS | live in every recipe install/D10 run |
+| D4 | recipe-local tests | PASS | M6 @04:43Z |
+| D5 | per-recipe tree, no harness surgery | PASS | M6.5 @07:25Z |
+| D6 | secrets (no leaks, rotatable) | PASS | M7 @07:55Z (grep clean: logs+dashboard+git) |
+| D7 | results UX (dashboard + PR outcome) | PASS | M8 @08:10Z |
+| D8 | reproducible server | PASS | byte-identical `nixos-rebuild build`==running + documented-alt @10:52Z |
+| D9 | documentation | PASS | @10:55Z (full docs set) |
+| D10 | six recipes via real `!testme` | PASS (6/6) @11:57Z | custom-html #84, keycloak #86, matrix-synapse #87, n8n #89, cryptpad #90, lasuite-docs #108 |
+
+D10 set spans all required categories: simple (custom-html), SSO/identity+DB (keycloak),
+DB+media/large-volume (matrix-synapse), workflow (n8n), stateful/no-DB (cryptpad), multi-service +
+S3/object-storage (lasuite-docs). bluesky-pds (TLS-passthrough) was swapped → n8n with a documented
+reason (DECISIONS). Registry creds (A1) remain a documented good-to-have for rate-limit robustness,
+not a DONE blocker. **Loop stopped.**
+
+---
+
+**Phase:** ALL MILESTONES BUILDER-COMPLETE. Adversary-verified: M0–M6 PASS, M6.5 PASS, M7/D6 PASS,
+**M8/D7 PASS, D8-core PASS, D9 PASS**. **D10 was the last item to verify:** M10/D10 CLAIMED with all 6
+recipes green via real `!testme` (custom-html #84, keycloak #86, matrix-synapse #87, n8n #89, cryptpad #90,
+lasuite-docs #108; all 5 categories), and the Adversary has since logged **D10 PASS (6/6) @11:57Z**. The
+Docker Hub rate-limit blocker is RESOLVED.
+**DONE blocked on ONE item: D8 live blank-VM rebuild.** Adversary's D8 verdict (@10:52Z) = "core PASS
+(Nix byte-identical closure + docs); live blank-VM rebuild pending — to complete before DONE." It was
+DEFERRED on the premise that the rebuild needs operator registry creds (rate limit). **That premise
+is now obsolete:** D10 passed 6/6 WITHOUT creds — the rate limit was transient and the real fix was
+`abra app upgrade -c`. So the throwaway-VM live rebuild is feasible NOW in a fresh quota window
+(no creds dependency). Surfacing for the Adversary to complete D8 → then all D1–D10 <24h PASS → DONE.
+I will NOT write `## DONE` until REVIEW shows a full D8 PASS. No Builder implementation remains.
+## Gate: M6.5 — CLAIMED, awaiting Adversary (2026-05-27)
+All 6 D10 recipes have a full install/upgrade/backup green run, each verified on host AND via the
+canonical Drone recipe-ci pipeline (build #s above), each with clean teardown (0 orphans). Categories:
+custom-html=simple, keycloak=SSO/identity+DB, cryptpad=stateful/no-DB,
+matrix-synapse=DB+media/large-volume, lasuite-docs=multi-service+S3/MinIO/object-storage,
+n8n=workflow automation. D5 held: each recipe enrolled via `tests/<recipe>/` + `recipe_meta.py` only
+(EXTRA_ENV for cryptpad SANDBOX_DOMAIN / lasuite TIMEOUT); no shared `runner/harness` changes per recipe.
+Repro: trigger a custom Drone build with RECIPE=<r> (or `cc-ci-run runner/run_recipe_ci.py` with
+RECIPE/STAGES on host).
+
+## Gates
+- **Gate: M0 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: flake rebuilds cc-ci from repo
+  (`switch --flake /root/cc-ci#cc-ci`, gen healthy, no failed units); sops-nix decrypts
+  `/run/secrets/test_secret` (0400 root, value = generated `cc-ci-m0-…`). Repro: clone repo, sync to
+  host, `nixos-rebuild switch --flake .#cc-ci`, then `systemctl is-system-running` + check the secret.
+  Per §6.1 I will NOT advance past this gate to M2; M1 work proceeds as independent unblocked work.
+  → **M0 PASS** logged by Adversary in REVIEW.md @2026-05-26T21:35Z (cold verify, leak probe clean).
+- **Gate: M1 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Docker single-node swarm +
+  `proxy` overlay; real coop-cloud/traefik via abra (wildcard/file-provider, no ACME); custom-html
+  deployed by hand → HTTP 200 over HTTPS via gateway at cchtml1.ci.commoninternet.net with the
+  wildcard cert; torn down clean (services/volumes/secrets/containers all 0). Repro:
+  `scripts/deploy-proxy.sh` + `abra app new/deploy/undeploy`. Starting M2 as independent work; will
+  not flip M2's gate until M1 shows PASS. → **M1 PASS** @2026-05-26T22:20Z.
+- **Gate: M2 — CLAIMED, awaiting Adversary** (2026-05-26). Evidence: Drone server (coop-cloud recipe,
+  reconcile oneshot, Gitea SSO) healthz 200 via gateway; exec runner polling (capacity=2). cc-ci repo
+  activated (push webhook). Pushing `.drone.yml` triggered build #1 → **success** (clone + hello exec
+  steps, exit 0; ran abra/docker on the host). Repro: `nixos-rebuild switch` + one-time
+  `scripts/bootstrap-drone-oauth.sh`. Starting M3 as independent work; won't flip M3 gate until M2 PASS.
+- **Gate: M3 — CLAIMED, awaiting Adversary** (2026-05-27). Trigger redesigned per orchestrator
+  (plan §4.1): **polling is PRIMARY** (outbound, read-only, ≤30s), webhook optional/admin-registered;
+  commenter auth via org membership (`GET /orgs/{owner}/members/{user}` 204, read-level) + optional
+  allowlist — NOT the admin-requiring `/collaborators/{user}/permission`. Evidence: posted `!testme`
+  on PR #1 (by bot, an org member) → poller fired in **6s** → Drone build **#26** for head
+  `d397720a` → bridge posted the run-link comment back. Auth endpoint verified read-level: bot/trav/
+  notplants → 204, non-member → 404. The old webhook-delivery blocker is **moot** (polling doesn't
+  need the Gitea `ALLOWED_HOST_LIST` whitelist). Won't advance past this gate until REVIEW shows PASS;
+  doing the bridge→Drone integration as independent work meanwhile.
+
+## Resource safety (plan §4.2/§4.3 — orchestrator change 2026-05-27)
+- **MAX_TESTS = DRONE_RUNNER_CAPACITY = 1** (`modules/drone-runner.nix`): ≤1 build at once, Drone
+  auto-queues the rest natively. Verified `DRONE_RUNNER_CAPACITY=1` on the runner.
+- **Per-build timeout = 60m** (`modules/drone.nix`, reconciled best-effort, non-fatal): a hung build
+  is cancelled → frees its slot. Verified Drone repo `timeout: 60`.
+- **Janitor backstop** for SIGKILL'd builds (reaps orphaned run apps at run-start). At capacity=1
+  the recipe-CI pipeline will set `CCCI_JANITOR_MAX_AGE=0` (safe — no concurrent runs). See DECISIONS.
+
+## Blocked
+- (none) — all blockers resolved. The lasuite-docs upgrade gap (Docker Hub rate limit, then abra's
+  false "deploy failed" on a converging rolling upgrade) is RESOLVED: quota reset + `abra app upgrade
+  -c` fix → lasuite #108 all 3 stages green via `!testme`. Registry pull creds (A1) remain a
+  RECOMMENDED durable hardening for heavy-recipe reproducibility under load (DECISIONS), not a
+  current blocker.
+
+## Tracking (adversary findings I must address)
+- **[adversary] A4 — concurrent same-recipe runs collide on shared `~/.abra/recipes/<recipe>`.**
+  Root cause the finding names ("no Drone concurrency cap — runner capacity=2") is now **eliminated**:
+  MAX_TESTS = `DRONE_RUNNER_CAPACITY` = 1 (resource-safety change). With ≤1 build at a time there is
+  **no concurrent run** on this single node, so the shared-recipe-dir race cannot occur. Builder side
+  addressed via the concurrency cap (per plan §4.2 "concurrency cap 1–2"); Adversary to re-test/close.
+  (Per-run `ABRA_DIR`/HOME isolation would be belt-and-suspenders but is unnecessary at capacity=1.)
+- **[adversary] A2 — janitor `-pr` filter dead.** Already fixed in code: `lifecycle.RUN_APP_RE` =
+  `^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$` (the hashed scheme), plus a stack-name regex
+  for `.env`-less orphans, gated on age. Awaiting Adversary kill-probe re-test.
+- **[adversary] A3 — teardown unverified; `.env` removed before confirmed undeploy.** Already fixed:
+  `lifecycle.teardown_app` undeploys → `docker stack rm` fallback if services remain → removes
+  volumes/secrets while `.env` exists → drops `.env` LAST → then `_residual()` check raises
+  `TeardownError` if anything is left. Awaiting Adversary kill-mid-run re-test.
+- **[adversary] A1 — no-ACME hazard for test apps.** Acknowledged (valid). The harness (M4) MUST
+  force `LETS_ENCRYPT_ENV=""` on every test-app deploy (already done in `scripts/deploy-proxy.sh` and
+  the M1 manual custom-html deploy; `scripts/deploy-drone.sh` will too). A structural
+  belt-and-suspenders fix (dropping the unused `certificatesResolvers` from cc-ci's traefik) is
+  deferred because it needs a recipe-config override, so harness enforcement is the primary fix;
+  the Adversary re-tests and closes after M4. → **Now enforced**: `harness.lifecycle.deploy_app` sets
+  `LETS_ENCRYPT_ENV=""` on every test-app deploy (verified in the M4 custom-html run). Adversary can
+  re-test + close A1.
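+
+A2's filter can be exercised against app names from this file: `cust-bdddd9.ci.commoninternet.net`
+(the E2E run app) matches, while the hand-deployed M1 app `cchtml1.ci.commoninternet.net` does not. The
+pattern is copied from `lifecycle.RUN_APP_RE`; the `is_reapable` age gate is a hypothetical reading of
+"gated on age", not the harness's actual code (in this sketch, `max_age=0` as with
+`CCCI_JANITOR_MAX_AGE=0` makes every matching app reapable):
+
+```python
+import re
+
+# Pattern copied from lifecycle.RUN_APP_RE (hashed per-run app domains).
+RUN_APP_RE = re.compile(r"^[a-z0-9]{1,4}-[0-9a-f]{6}\.ci\.commoninternet\.net$")
+
+
+def is_reapable(domain: str, age_seconds: float, max_age: float) -> bool:
+    """Hypothetical janitor gate: a run-app domain that is at least max_age old."""
+    return bool(RUN_APP_RE.match(domain)) and age_seconds >= max_age
+```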
+
+## Notes
+- **Disk RESOLVED:** operator grew the VM 8.9→**28 GiB** (22 GiB free) on 2026-05-26. Inodes
+  1.78M total / 1.21M free (was ~6k free — old 8.9 GiB fs had only 586k inodes, which the flake's
+  nixpkgs fetch exhausted). Both byte + inode pressure gone.
+- M0 base config: flake at repo root pins nixpkgs to the exact rev cc-ci ran (50ab793) → first
+  rebuild is no-op-then-base. Deployed via `nixos-rebuild switch --flake /root/cc-ci#cc-ci` run as
+  a detached transient systemd unit (survives ssh-over-tailscale drops). Gen 3 current, healthy.
+- Open warning: incus module enables `systemd.network` while we set `networking.useDHCP=true`
+  (scripted dhcpcd); Nix warns both may manage interfaces. Inherited from the baseline; networking is
+  up. Clean up later (pick networkd OR scripted networking). Tracked, non-blocking.