
# cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)

**Status:** ACTIVE — autonomous loop
**Owner agent:** Builder (primary) + Adversary (reviewer)
**Source brief:** `brief.md` (do not edit; this file supersedes it)
**This file's canonical path:** `/srv/cc-ci/cc-ci-plan/plan.md`
**Target server:** `cc-ci` (NixOS)
**Code/config home:** `git.autonomic.zone/recipe-maintainers/cc-ci` (the CI project repo — distinct from this
`/srv/cc-ci/cc-ci-plan/` planning+launch folder)
**Last updated:** keep current via `STATUS.md` (see §7)

---

## 0. How to read this document

This plan is written to be handed to an **autonomous Claude agent running in a sandbox over
several days**, driving itself in a loop until the CI server is "done" per §2. A second agent
(the **Adversary**) independently tries to disprove every "done" claim. Neither agent is
trusted to mark its own work complete.

If you are an agent waking up into this loop for the first time, go straight to **§1 Bootstrap**.
On every subsequent wake, go to **§7 The Loop Protocol** and continue from `STATUS.md`.

The rest of the document (§3–§6) is the technical design. Treat it as the default architecture,
but you are allowed to revise it when reality disagrees — record any deviation in `DECISIONS.md`
with a one-line rationale.

---

## 1. Bootstrap (first wake only)

Do these in order. Each step is idempotent; re-running is safe.

1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.)
   - `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no
     separate sudo step. `ssh cc-ci` reaches cc-ci directly (the orchestrator VM is a direct tailnet
     peer — no proxy; key `~/.ssh/cc-ci-root-ed25519`). If it fails, check `tailscale status`
     before declaring blocked.
   - `ssh cc-ci 'nixos-version'` — confirm NixOS.
   - Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5):
     `curl -s https://$GITEA_URL/api/v1/version`. The bot authenticates with
     `GITEA_USERNAME`/`GITEA_PASSWORD` (basic auth) or a token you mint from them via
     `POST /api/v1/users/<user>/tokens` — do **not** expect a ready-made `$GITEA_TOKEN`.
   - Confirm the **preconfigured** test-app DNS (§4.0/§4.4): a random subdomain under the wildcard
     resolves, e.g. `getent hosts probe-$RANDOM.ci.commoninternet.net` returns the **gateway's** IP
     (not cc-ci's — the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address; and use
     `getent`, not `dig`, since this host's resolver is Tailscale-only — see §1.5).
     Traefik is *not* up yet — you deploy it at M1 (the real `coop-cloud/traefik` recipe via abra,
     wildcard/file-provider mode → the pre-issued cert at `/var/lib/ci-certs/live/`, **no ACME**);
     the DNS record + gateway passthrough + cert are the preconditions, and full end-to-end HTTPS
     reachability is proven at M1, not now.
     If the wildcard does not resolve at all, that's a `## Blocked` item (operator fixes DNS/gateway).
   - If any check fails, write the failure to `STATUS.md` under `## Blocked` and stop — a human must fix access. Do **not** try to work around missing access.

2. **Create the `cc-ci` repo** on git.autonomic.zone if it does not exist. Push an initial
   skeleton (see §3 layout). The Builder clones to `/srv/cc-ci/cc-ci`; the Adversary loop keeps
   its **own independent clone** at `/srv/cc-ci/cc-ci-adv`. The repo is the only channel between
   the two loops (§6.1) — loop state lives inside it (`STATUS.md`, `BACKLOG.md`, etc.).

3. **Snapshot the starting environment** into `cc-ci/docs/baseline.md`: current NixOS config on
   the server (`/etc/nixos` or existing flake), installed packages, whether Docker/Swarm/abra
   already exist, DNS that already points at the box. This is the rollback reference.

4. **Seed the loop state files** (§7) if absent: `STATUS.md`, `BACKLOG.md`, `REVIEW.md`,
   `JOURNAL.md`, `DECISIONS.md`. Give `BACKLOG.md` two H2 sections — `## Build backlog`
   (populated from §5 milestones) and `## Adversary findings` (empty) — per the single-writer
   rule in §6.1.

5. Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.

---

## 1.5 Credentials & access — where everything lives and how to use it

The loops run **on the cc-ci-orchestrator VM** (`100.116.55.106`, NixOS, `loops` user) and reach
cc-ci directly over Tailscale (direct tailnet peer — no proxy). This section is the authoritative
map of what credentials exist, where, and how to use them. **Never copy any secret value into the
repo, a commit, a log, or the dashboard** (§9) — reference locations only.

### Provided credentials (already in place)

| What | Where | How to use |
|---|---|---|
| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; `Host cc-ci` in `~/.ssh/config` (HostName `100.90.116.4`, no ProxyCommand) | Just run `ssh cc-ci` (logs in as **root**). The orchestrator VM is a direct tailnet peer — direct route, no proxy. Pubkey already in cc-ci's `/root/.ssh/authorized_keys`. |
| **Gitea bot account** | `/srv/cc-ci/.testenv` → `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to push the `cc-ci` project repo, read recipe repos, comment on PRs, and poll for `!testme` (read-level; the bot does not register webhooks). |

Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values).
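For tooling that is not a shell, the same load-then-mint flow can be sketched in Python. This is a minimal sketch under assumptions: it assumes `.testenv` holds plain `KEY=VALUE` lines, the helper names `load_env_file` and `build_token_request` are hypothetical, and the scope string passed by the caller depends on the Gitea version in use. The request is only built, not sent, so no secret is printed or transmitted by this snippet.

```python
import base64
import json
import urllib.request
from pathlib import Path


def load_env_file(path: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines. Values are returned, never printed."""
    env: dict[str, str] = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        key = key.strip().removeprefix("export ").strip()
        env[key] = value.strip().strip("'\"")
    return env


def build_token_request(
    env: dict[str, str], token_name: str, scopes: list[str]
) -> urllib.request.Request:
    """Build (but do not send) the basic-auth POST that mints a Gitea token."""
    user = env["GITEA_USERNAME"]
    basic = base64.b64encode(f"{user}:{env['GITEA_PASSWORD']}".encode()).decode()
    url = f"https://{env['GITEA_URL']}/api/v1/users/{user}/tokens"
    body = json.dumps({"name": token_name, "scopes": scopes}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/json",
        },
    )
```

A caller would pass the result to `urllib.request.urlopen` and store the returned token sops-encrypted, never in the repo or logs (§9).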

### The Tailscale connection (how `ssh cc-ci` works)

cc-ci (`cc-nix-test`, `100.90.116.4`) is on the same tailnet as the orchestrator VM
(`taila4a0bf.ts.net`), so it is reached **directly** — no SOCKS proxy, no userspace tailscaled.
The VM's system tailscaled is on that tailnet; `ssh cc-ci` routes straight to cc-ci.

- `ssh cc-ci` works out of the box (`~/.ssh/config` has `Host cc-ci` pinned to `100.90.116.4`,
  no ProxyCommand; key `~/.ssh/cc-ci-root-ed25519`; logs in as root).
- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the VM, use plain `curl` — no proxy flag
  needed. The VM uses public DNS resolvers (`1.1.1.1`/`8.8.8.8`) so `*.ci.commoninternet.net`
  resolves normally.
- **If `ssh cc-ci` fails:** run `tailscale status` (as loops or root) to confirm the VM is still
  on the tailnet and cc-ci is listed; check `systemctl status tailscaled`. A connectivity failure
  is recoverable, not an immediate `## Blocked`-and-stop, unless the VM has lost tailnet membership
  entirely (then that IS a class-A1 blocker).

### Credentials the loop GENERATES itself (do not wait on a human for these)

- **Drone RPC secret** and **webhook HMAC secret** — generate (`openssl rand -hex 32`), store
  sops-encrypted in `secrets/`, and wire both ends. Internal shared secrets, not human inputs.
- **Gitea OAuth app for Drone** — create it under the bot account via the API
  (`POST /api/v1/user/applications/oauth2`); capture client id/secret into `secrets/`.
- **cc-ci host age/GPG key for sops** — generate on the host (or derive from its SSH host key);
  add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
- **Per-recipe app secrets** (class-B, §4.4) — the harness generates these per run.

### Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)

- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
  (`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0).
  > **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer
  > an out-of-band operator file-drop — it is now **sops-encrypted in the private `cc-ci-secrets` repo**
  > (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays
  > operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts
  > the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key
  > at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`.

  The agent feeds these into the
  `coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key` swarm secrets (wildcard/file-provider
  mode) and runs **no ACME** for this domain. **Do not request or expect a `commoninternet.net` DNS
  token** — issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal
  ~2026-08-24). A missing/expired cert is a finding for the operator, not an agent re-issue.
- **Registry pull credentials** (e.g. Docker Hub) — *recommended* to avoid anonymous pull-rate
  limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then
  request creds. Store sops-encrypted in `secrets/`.
- **Gitea bot permissions** (a grant, not a secret) — **least privilege: read, not admin.** The bot
  needs: write on its own `recipe-maintainers/cc-ci` project repo; **read** + **comment** on the
  recipe repos under test; and **org membership** in `recipe-maintainers` (read-level — used both to
  authorize commenters via the members endpoint and to read members). It does **not** need repo-admin
  and does **not** register webhooks (that's an optional manual admin task, §4.1). If a needed grant
  is missing, that's a `## Blocked` item for the operator.

---

## 2. Definition of Done (the loop's exit condition)

The loop terminates **only** when every item below is true *and the Adversary has independently
re-verified each one within the last 24h* (logged in `REVIEW.md` with timestamps and command
output). Partial credit does not count.

- [ ] **D1 — Trigger.** Commenting `!testme` on any open PR in any enrolled recipe repo on
      git.autonomic.zone starts a CI run for the code *at that PR's head commit* within 60s.
      Other comments do not. Re-commenting re-runs.
- [ ] **D2 — Test matrix.** For a recipe under test, the run executes, as separate reported
      stages: **new install**, **upgrade** (previous published version → PR version), and
      **backup + restore**. All are genuine end-to-end against a really-deployed recipe (real
      containers, real Traefik routing, real volumes) — no mocks, no stubs.
- [ ] **D3 — Python + Playwright.** Tests are Python. Functional assertions that require a
      browser use Playwright against the live deployed app.
- [ ] **D4 — Recipe-local tests.** If the recipe repo contains its own `tests/` folder, those
      tests are also discovered and run as part of the same CI run, with results merged in.
- [ ] **D5 — Per-recipe test tree.** The cc-ci repo holds `tests/<recipe>/` with the
      install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is
      a documented, small, repeatable operation.
- [ ] **D6 — Secrets.** App + infra secrets are handled reproducibly (committed encrypted,
      decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or
      the results UI.
- [ ] **D7 — Results UX.** Each run has a stable URL with live, tail-able logs per stage and a
      final pass/fail; there is an overview page listing recipes with their latest status —
      look-and-feel comparable to the YunoHost app CI (`ci-apps.yunohost.org`). A PR comment links
      back to its run and reflects the outcome.
- [ ] **D8 — Reproducible server.** The entire server (Drone, runner, comment bridge, swarm,
      Traefik, dashboard, secrets wiring) is declared in the `cc-ci` repo's NixOS flake and can be
      rebuilt from scratch onto a blank NixOS host following `docs/install.md`, verified by the
      Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch
      rebuild was infeasible and what was tested instead).
- [ ] **D9 — Documentation.** `README.md` + `docs/` explain architecture, how to enroll a recipe,
      how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run.
      A new engineer can enroll a recipe and get a green run using only the docs.
- [ ] **D10 — Proof (breadth).** At least **six real recipes** spanning the meaningful
      categories have a full green run triggered by `!testme` on a real PR, with all three stages
      (install / upgrade / backup+restore) actually exercised. The set must cover:
      a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an
      object-storage/large-volume app. **Target set (all previously verified deployable):**
      `hedgedoc` (simple), `cryptpad` (stateful, no external DB), `keycloak` + `authentik`
      (SSO/identity, DB-backed), `lasuite-docs` and/or `lasuite-drive` (multi-service + S3/MinIO),
      `matrix-synapse` (DB + media store), `immich` (large volumes + Postgres), `bluesky-pds`
      (TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen
      set and per-recipe green-run evidence in `REVIEW.md`. Any recipe that genuinely cannot be CI'd
      is a documented finding (in `DECISIONS.md`) with the reason, not a silent omission.
      *Recipe availability:* the testable repos live on the **private mirror**
      `git.autonomic.zone/recipe-maintainers/<recipe>` (already mirrored as of bootstrap:
      `bluesky-pds`, `cryptpad`, `keycloak`, `lasuite-docs`, `lasuite-meet`, `matrix-synapse`, `n8n`,
      `custom-html`, `custom-html-tiny`). Any recipe **not** yet mirrored (e.g. `hedgedoc`,
      `authentik`, `immich`, `lasuite-drive`) is pulled from upstream **git.coopcloud.tech** and
      created on the mirror via the **recipe mirror+PR flow** (§4.1) — so the target set is not capped
      by what currently exists. If the chosen simple/stateless app isn't mirrored, `custom-html` /
      `custom-html-tiny` already are.

When all of D1–D10 hold and are Adversary-verified, write `## DONE` to `STATUS.md` with the
evidence links and stop scheduling new iterations.
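The exit condition above (every item true and Adversary-verified within the last 24h) can be expressed as a small gate. This is a sketch, not the loop's actual implementation: the `verified_at` mapping is a hypothetical structure that some parser would fill from the timestamps in `REVIEW.md`, and the function name is illustrative.

```python
from datetime import datetime, timedelta, timezone

DOD_ITEMS = [f"D{i}" for i in range(1, 11)]  # D1..D10 from this section
MAX_AGE = timedelta(hours=24)  # Adversary re-verification freshness window


def done_gate(verified_at: dict[str, datetime], now: datetime) -> list[str]:
    """Return the DoD items that are unverified or stale.

    An empty list means every item was Adversary-verified within MAX_AGE,
    so `## DONE` may be written. Anything else means keep looping.
    """
    failing = []
    for item in DOD_ITEMS:
        ts = verified_at.get(item)
        if ts is None or now - ts > MAX_AGE:
            failing.append(item)
    return failing
```

Returning the list of failing items (rather than a bare boolean) gives the loop something concrete to write into `STATUS.md` when the gate is not yet met.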

---

## 3. Repository layout (`git.autonomic.zone/recipe-maintainers/cc-ci`)

```
cc-ci/
├── README.md
├── flake.nix                 # NixOS host(s) + devshell
├── flake.lock
├── hosts/
│   └── cc-ci/
│       ├── configuration.nix # the cc-ci machine
│       └── hardware.nix
├── modules/
│   ├── drone.nix             # Drone server + runner (exec/docker)
│   ├── comment-bridge.nix    # !testme poller + optional webhook listener
│   ├── swarm.nix             # Docker + single-node swarm + `proxy` net; deploys the
│   │                         #   coop-cloud/traefik recipe via abra (wildcard/file-provider, §4.2)
│   ├── dashboard.nix         # results overview site
│   └── secrets.nix           # sops-nix / agenix wiring
├── secrets/                  # sops-encrypted (*.enc / *.age); see §4.4
│   └── secrets.yaml
├── bridge/                   # comment-bridge source (small Go/Python service)
├── runner/                   # CI orchestration entrypoint invoked by Drone
│   ├── run_recipe_ci.py      # top-level: deploy→test→teardown for a recipe@ref
│   └── harness/              # shared pytest fixtures (abra wrappers, app lifecycle)
├── dashboard/                # results UI generator (reads Drone API → static site)
├── tests/
│   ├── conftest.py           # shared fixtures, recipe selection, teardown guarantees
│   ├── <recipe>/
│   │   ├── test_install.py
│   │   ├── test_upgrade.py
│   │   ├── test_backup.py
│   │   └── playwright/       # e2e flows for this recipe
│   └── _template/            # copy-to-add-a-recipe template
├── docs/
│   ├── install.md            # from-scratch server build (D8)
│   ├── enroll-recipe.md      # how to add a recipe (D5)
│   ├── secrets.md            # secret model + rotation (D6)
│   ├── architecture.md
│   ├── runbook.md            # debugging failed runs
│   └── baseline.md           # bootstrap snapshot
├── STATUS.md  BACKLOG.md  REVIEW.md  JOURNAL.md  DECISIONS.md   # loop state (§7)
└── .drone.yml                # pipeline for cc-ci's own repo (lint/self-test)
```

---

## 4. Technical design (default architecture)

### 4.0 Domain model (where things live)

Two DNS zones, deliberately separated — do **not** conflate them:

- **`git.autonomic.zone` — source of truth for code (unchanged, not ours to reconfigure).**
  The Gitea host: the enrolled recipe repos and the `cc-ci` config repo live here. The loop reads
  and comments here (any webhook is registered manually by an admin, §4.1) but deploys **nothing** here. Per §9 this zone
  is read/comment-only — never push recipe code, never point app DNS at it.
- **`commoninternet.net` — the CI server's own zone; everything CI-facing.** A wildcard
  `*.ci.commoninternet.net` resolves to a **gateway** (not cc-ci directly — see Network path below).
  Under it:
  - **Apps under test:** each run deploys to a unique subdomain
    `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, so concurrent runs never collide on a
    hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3).
  - **Results dashboard:** `ci.commoninternet.net` — overview page + per-recipe status badges (§4.5).
  - **Webhook bridge:** `ci.commoninternet.net/hook` — the Gitea `issue_comment` receiver (§4.1).
- **Network path (gateway → TLS passthrough → cc-ci).** The wildcard record does **not** point at
  cc-ci's IP. It points at a gateway that **passes TLS through** to cc-ci: the gateway routes by SNI
  and forwards the raw encrypted stream without decrypting it, so TLS still **terminates on cc-ci's
  Traefik**. Consequences the agent must respect:
  - `dig <sub>.ci.commoninternet.net` returns the **gateway's** IP, not cc-ci's — do not assert the
    record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci),
    not by comparing A records.
  - The gateway is assumed to passthrough the **whole wildcard**, so a fresh per-run subdomain needs
    **no gateway change** and **no cert work** (the pre-issued wildcard already covers it) — the
    agent only adds the Traefik **router** on cc-ci. (If the gateway
    instead needs per-host config, that's an operator/gateway concern and a `## Blocked` item, not
    something the agent reconfigures — the gateway is not ours, only cc-ci is, per §9.)
  - The gateway is operator-managed and out of scope; the agent configures only cc-ci.
  - **Caveat for TLS-passthrough recipes** (e.g. `bluesky-pds`, §2 D10): the default path terminates
    TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's
    Traefik configured to passthrough that host too (the outer gateway already passes the whole
    wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough
    recipe for that D10 category and record the swap in `DECISIONS.md` — not a silent omission.
- **Wildcard TLS — operator pre-issues, agent serves it statically (no token in the agent).**
  Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is
  still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. **The cert
  is pre-provisioned out-of-band** so the DNS-editing token never enters the agent/repo. A wildcard
  SAN cert covering **`*.ci.commoninternet.net` + `ci.commoninternet.net`** (issued via Let's
  Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci:
  - `/var/lib/ci-certs/live/fullchain.pem` (leaf+intermediate) and `…/privkey.pem`.
  - **Traefik is the real `coop-cloud/traefik` recipe, deployed via abra** (for e2e fidelity — see
    §4.2), run in its **wildcard / file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`).
    The pre-issued cert is supplied as the recipe's `ssl_cert`/`ssl_key` **swarm secrets** (sourced
    from the files above); the recipe's file provider then serves it under `tls.certificates`. **No
    ACME resolver / no DNS provider** is enabled — only the cert+key reach cc-ci, never the DNS token.
    One cert covers every per-run subdomain (matched by SNI), so a new app domain needs no cert work.
  - **Renewal is a manual operator task** (LE 90-day cert): the operator re-issues out-of-band, then
    updates the `ssl_cert`/`ssl_key` secret (bump its version) and redeploys traefik. The agent must
    **not** attempt ACME/DNS-01 for `commoninternet.net` and must **not** expect a DNS token — a
    missing/expired cert is an operator action surfaced as a finding, not something the agent re-issues.
  (Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning
  run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard.
  We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.)
- Record the live facts in `docs/install.md`: the zone + DNS provider (Gandi), that the wildcard
  `*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) point at the **gateway**, that the
  gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the
  wildcard cert is pre-issued/operator-renewed at `/var/lib/ci-certs/live/` (no DNS token on cc-ci).
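The per-run hostname scheme `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net` could be built and validated as follows. This is a sketch: the function name and the 8-character short-SHA length are assumptions (the plan does not fix a length), and the 63-character check reflects the DNS limit on a single label.

```python
import re

BASE_DOMAIN = "ci.commoninternet.net"
MAX_LABEL = 63  # DNS limit for one label
LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def run_domain(recipe: str, pr: int, sha: str, short_len: int = 8) -> str:
    """Build the unique app domain for one CI run.

    short_len=8 is an assumed default; any length works as long as the
    resulting label stays unique per run and within MAX_LABEL.
    """
    label = f"{recipe.lower()}-pr{pr}-{sha[:short_len].lower()}"
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"not a valid DNS label: {label!r}")
    if len(label) > MAX_LABEL:
        raise ValueError(f"label exceeds {MAX_LABEL} chars: {label!r}")
    return f"{label}.{BASE_DOMAIN}"
```

Because the label sits directly under the wildcard, the pre-issued cert and the gateway passthrough already cover it; only the Traefik router on cc-ci is new per run.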

### 4.1 The `!testme` trigger path

Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on
push/PR-open, not on a magic comment. So:

```
PR comment "!testme"
   │  Gitea webhook (issue_comment event)  ──►  comment-bridge (modules/comment-bridge.nix)
   │                                              • verifies webhook HMAC secret
   │                                              • checks comment body == "!testme" (exact, trimmed)
   │                                              • checks commenter is allowed (org member / collaborator)
   │                                              • resolves PR head repo + SHA via Gitea API
   │                                              • calls Drone API: build for cc-ci pipeline,
   │                                                params RECIPE=<repo> REF=<sha> PR=<n> SRC=<headrepo>
   ▼
Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py
   ▼
Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail.
```
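The first two bridge checks in the diagram (HMAC verification and the exact, trimmed body match) are sketched below. It assumes the webhook signature is the hex-encoded HMAC-SHA256 of the raw request body, as Gitea sends in its `X-Gitea-Signature` header; confirm that against the installed Gitea version. The function names are illustrative.

```python
import hashlib
import hmac

TRIGGER = "!testme"


def signature_ok(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    return hmac.compare_digest(expected.encode(), received.encode("utf-8"))


def is_trigger(comment_body: str) -> bool:
    """True only when the whole comment, after trimming, is the trigger."""
    return comment_body.strip() == TRIGGER
```

The signature must be computed over the raw bytes as received, before any JSON parsing, since re-serializing the payload can change its byte representation.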

- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS
  systemd service behind Traefik at e.g. `ci.commoninternet.net/hook` (§4.0).
- **Trigger: POLLING is primary; webhook is an optional, admin-registered push optimization
  (SETTLED).** Hard constraint: **the CI server/bot must run on READ-level access — never repo-admin.**
  - **Polling (primary, default):** the bridge polls the Gitea API for new `!testme` comments on
    enrolled repos at ≤60s (satisfies D1). This is **outbound** (cc-ci → git.autonomic.zone, the
    reliably-working direction) and needs only **read**. It is the source of truth for triggering.
  - **Webhook (optional):** the bridge keeps its `/hook` endpoint so a Gitea `issue_comment` webhook,
    **if present**, gives lower latency. But the **server does NOT self-register webhooks** (that
    needs repo-admin, which we refuse to require). Registration is a **manual admin task, documented**
    in `docs/enroll-recipe.md` (URL `https://ci.commoninternet.net/hook`, event `issue_comment`,
    content-type `json`, the shared HMAC secret, and the note that the Gitea instance must allow the
    host). A comment seen by both paths must trigger exactly one run: deduplicate (e.g. by comment ID) rather than double-firing.
  - (Webhook delivery on this instance was flaky early on — `last_status: None` — so polling being
    primary is also the robust choice, not just the low-privilege one.)
- **Commenter auth via org membership (read-level — no admin).** The repo's explicit collaborator
  list is empty: the bot *and* the maintainers (`trav`/`notplants`) all reach the repo as
  `recipe-maintainers` **org members/owners**, so `GET /collaborators/{user}` 404s for everyone, and
  `GET /collaborators/{user}/permission` would authorize correctly but **requires repo-admin** — which
  we refuse. Instead authorize with **`GET /orgs/recipe-maintainers/members/{user}`** (204 = member =
  authorized; 404 = rejected) — readable by any **org member** (read-level), verified to admit
  `trav`/`notplants`/the bot and reject non-members. Note `public_members` is hidden here, so use the
  authenticated `members` endpoint (bot must be an org member, still read-level). Fail-closed on
  error. Zero-privilege fallback: a configured allowlist of usernames. (Still satisfies §6's
  non-collaborator-rejection check.)
- Enrollment = adding the recipe to the bridge's **poll list** + ensuring a `tests/<recipe>/` dir
  exists. The bot needs only **read** on the recipe repo (+ comment-back to post status). Registering
  a webhook is **optional and operator/admin-side** (documented in `enroll-recipe.md`), never required
  for CI to work.
- **Recipe mirror+PR flow (how a recipe gets a testable PR).** Recipe repos under test live on the
  **private mirror** `git.autonomic.zone/recipe-maintainers/<recipe>`, mirrored from the **official
  upstream `git.coopcloud.tech`**. To bring a recipe under CI: `abra recipe fetch <recipe>` (pulls
  from upstream into `~/.abra/recipes/<recipe>`), then mirror it to the org + open a PR via the
  **recipe mirror+PR procedure** — reference implementation:
  `/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` (creates `recipe-maintainers/<recipe>`
  if absent, force-syncs `main` from upstream so the PR diff is clean, pushes a branch, opens the PR).
  `!testme` on that PR is what kicks off a run. So a recipe missing from the mirror is **not** a
  blocker — mirror it first.
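- If the optional webhook is used, decide and record in `DECISIONS.md`: one shared Gitea org-level
  webhook vs per-repo webhooks. Org-level is fewer moving parts; per-repo is more explicit. Default:
  per-repo, registered manually by an admin following `docs/enroll-recipe.md` (never by the bot).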
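Two of the bridge rules above lend themselves to a short sketch: fail-closed authorization from the org-membership status code, and ensuring that a comment seen by both polling and the webhook starts only one run. The class and function names are hypothetical, and the in-memory set is for illustration only; a real bridge would need to persist seen IDs across restarts.

```python
from typing import Optional


def authorize(status_code: Optional[int]) -> bool:
    """Map the members-endpoint result to a decision, failing closed.

    204 means member (authorized). Everything else, including 404,
    5xx, or None for a network error, is rejected.
    """
    return status_code == 204


class TriggerDeduper:
    """Remembers comment IDs so each `!testme` comment fires at most once."""

    def __init__(self) -> None:
        self._seen: set[int] = set()

    def first_sighting(self, comment_id: int) -> bool:
        """Return True the first time an ID is seen, False afterwards."""
        if comment_id in self._seen:
            return False
        self._seen.add(comment_id)
        return True
```

Keying on the comment ID keeps the D1 behaviour intact: a fresh `!testme` comment gets a new ID and therefore re-runs, while the same comment arriving via both paths does not.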

### 4.2 Drone + the test target

- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the
  **exec runner** (or a privileged docker runner) running **on cc-ci itself**, because tests must
  drive `abra` to deploy real recipes onto a real swarm.
- cc-ci doubles as the **deploy target**: single-node Docker Swarm + abra, with the reverse proxy
  provided by the **real `coop-cloud/traefik` recipe deployed via abra** (not a hand-rolled Traefik
  — chosen for **end-to-end fidelity**: test apps route through the exact proxy a real Co-op Cloud
  host uses — `web`/`web-secure` entrypoints, the `proxy` overlay, the swarm provider). TLS
  terminates on it using the **pre-issued static wildcard cert** (§4.0): run the recipe in
  **wildcard/file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`) and supply the
  cert as the recipe's `ssl_cert`/`ssl_key` swarm secrets from `/var/lib/ci-certs/live/`. The
  operator preconfigures the wildcard DNS (→ gateway), the gateway's TLS-passthrough, and the cert
  itself (§4.4); the agent deploys the traefik recipe + swarm on top — **no ACME, no DNS token on
  cc-ci**. Make the `abra app new/deploy traefik` steps reproducible (scripted/Nix-invoked) for D8.
- Each CI run gets an isolated app domain `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`
  (§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
- **Concurrency cap + queue — use Drone natively (SETTLED).** Don't let the server fill with
  simultaneously-deployed apps. Expose a configurable **`MAX_TESTS`** mapped to the exec runner's
  **`DRONE_RUNNER_CAPACITY`** (Nix-set on the runner; default low — **1–2** given a single 28 GiB
  node and heavy recipes like matrix/immich). Drone runs at most `MAX_TESTS` builds at once and
  **automatically queues** excess builds (its native pending-build queue), starting them as slots
  free. **Per-build timeout** (repo/runner timeout) guarantees a hung test is killed and frees its
  slot — so "continue once a current test finishes *or times out*" is built in. No custom queue
  needed. Optionally also set `concurrency: { limit: <N> }` in `.drone.yml` as a per-pipeline cap.
- **One app at a time per run, torn down at run end.** A build deploys its recipe, runs the three
  stages, then **undeploys** — the server should not accumulate live test apps. Guaranteed teardown
  + the run-start janitor (§4.3) enforce this even when a build is timed-out/killed (in-process
  cleanup can't run, so the janitor reaps it).

### 4.3 The test harness & recipe test contract

`runner/run_recipe_ci.py` orchestrates per run:
1. Fetch recipe at `$REF` (the PR head) via abra/git.
2. **Install stage** → `tests/<recipe>/test_install.py`: `abra app new`, generate secrets,
   `abra app deploy`, wait healthy, run Playwright smoke + assertions.
3. **Upgrade stage** → deploy previous published version first, then upgrade to `$REF`; assert
   data survives and app still healthy.
4. **Backup/restore stage** → `abra app backup`, mutate state, `abra app restore`, assert restored
   state matches pre-mutation.
5. **Recipe-local tests (D4)** → if `<recipe-repo>/tests/` exists, discover & run it in the same
   live environment; merge results.
6. **Teardown (always, even on failure)** → `abra app undeploy`, `abra app volume remove`,
   `abra app secret remove`, DNS/route cleanup.
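The shape of that orchestration, with teardown guaranteed even when a stage fails, might look like the sketch below. The stage callables stand in for the real abra-driven steps and are hypothetical; continuing to later stages after a failure is also an assumption here, not a settled policy.

```python
from typing import Callable

Stage = tuple[str, Callable[[], None]]


def run_stages(stages: list[Stage], teardown: Callable[[], None]) -> dict[str, str]:
    """Run each stage in order, record pass/fail, and always tear down."""
    results: dict[str, str] = {}
    try:
        for name, stage in stages:
            try:
                stage()
                results[name] = "pass"
            except Exception as exc:  # a failed stage is reported, not fatal
                results[name] = f"fail: {exc}"
    finally:
        # Runs on success, on stage failure, and on KeyboardInterrupt/SystemExit.
        try:
            teardown()
            results["teardown"] = "pass"
        except Exception as exc:
            results["teardown"] = f"fail: {exc}"
    return results
```

A `finally` block cannot help when the process is SIGKILLed, which is why the run-start janitor described below remains necessary.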

Shared fixtures (`tests/conftest.py` + `runner/harness/`) wrap abra. **Known abra gotchas to bake
in from day one** (carried over from prior work, re-verify on the installed abra version):
- `abra app undeploy` and `abra app volume remove` do **not** accept `--chaos` → never pass it.
- Plumb a `timeout` kwarg through secret-generate/insert/remove-all calls.
- `abra app ls -S -m` returns nested `{server: {apps: [...]}}` — parse the inner structure.
- Pick robust health checks per app (e.g. Keycloak: `/realms/master`, not `/`).
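For the nested `abra app ls -S -m` output, a small flattener avoids scattering the `{server: {apps: [...]}}` traversal across the harness. This is a sketch: only the outer nesting comes from the note above, the function name is illustrative, and any per-app field names (such as `appName` in the usage note) are hypothetical and should be checked against the installed abra version.

```python
import json


def flatten_apps(raw_json: str) -> list[tuple[str, dict]]:
    """Turn {server: {"apps": [...]}} into a flat list of (server, app) pairs."""
    data = json.loads(raw_json)
    flat: list[tuple[str, dict]] = []
    for server, info in data.items():
        # Tolerate a server entry with no apps key or a null apps list.
        for app in (info or {}).get("apps") or []:
            flat.append((server, app))
    return flat
```

With an illustrative payload such as `{"cc-ci": {"apps": [{"appName": "x"}]}}`, the result is a single `("cc-ci", {"appName": "x"})` pair that fixtures can filter without caring which server key it came from.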

The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and
add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours.
**Crucially, the janitor is the backstop for timed-out/killed builds:** when Drone hits the
per-build timeout (or a build is cancelled) it may SIGKILL the runner process, so the `try/finally`
teardown can't run — those orphaned apps/volumes are reaped by the next build's run-start janitor
(and the janitor should run regardless of how the previous build ended). Net effect with the
`MAX_TESTS`/`DRONE_RUNNER_CAPACITY` cap (§4.2): at most `MAX_TESTS` apps are ever live at once, and
each is torn down (or janitor-reaped) so the single node never accumulates deployments.
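The janitor's selection rule (orphaned `*-pr*` apps older than N hours) can be sketched as a pure function. How the creation time of each app is obtained is left open here and is an assumption of the sketch; the function takes a mapping of app domain to creation time as input, and the names are illustrative.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches per-run app names like "<recipe>-pr<n>-<short-sha>[.domain]".
RUN_APP = re.compile(r"^[a-z0-9-]+-pr\d+-[0-9a-f]+(\.|$)")


def orphans(apps: dict[str, datetime], now: datetime, max_age_hours: float) -> list[str]:
    """Return run-scoped apps older than the cutoff, sorted by name.

    Apps that do not match the per-run naming scheme (Traefik, the
    dashboard) are never selected, whatever their age.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return sorted(
        name
        for name, created in apps.items()
        if RUN_APP.match(name) and created < cutoff
    )
```

The age threshold matters when `MAX_TESTS` is above 1: it keeps the janitor of one build from reaping the app of another build that is still running.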

### 4.4 Secrets (D6)

There are **two distinct classes of secret** and they are handled in opposite ways. Do not
conflate them.

**(A) Infra secrets.** All of these end up `sops-nix`-encrypted in `secrets/`, are decrypted at
activation to a restricted-permission runtime path outside the world-readable Nix store (sops-nix
defaults to `/run/secrets`), and are never world-readable. But they split into two sub-classes (see §1.5
for the concrete locations/usage), and only the first sub-class blocks:

- **(A1) External inputs — provided by the operator, the loop cannot create them.** The Tailscale
  auth key + Gitea bot creds (`/srv/cc-ci/.testenv`, already provided), the **pre-issued wildcard
  TLS cert** at `/var/lib/ci-certs/live/` (§4.0 — *not* a DNS token; the agent serves it, never
  issues it), and **registry pull creds** (if needed). If one of these is **missing or invalid, the
  loop is blocked** — write it to `STATUS.md ## Blocked` and stop (§9). The agent must not invent or
  work around an external input it wasn't given, and must **not** attempt ACME/DNS-01 for
  `commoninternet.net`.
- **(A2) Internal secrets — the loop generates and manages these itself; never block on them.**
  Drone RPC secret + webhook HMAC (`openssl rand`), the Gitea OAuth app for Drone (created via the
  bot API), and the cc-ci host age/GPG key for sops. These are *not* human inputs; generate, store
  in `secrets/`, and wire both ends.
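
As an illustration of "generate, store, and wire both ends" for the webhook HMAC, here is a minimal sketch. It assumes the sender signs the raw request body with HMAC-SHA256 and transmits the hex digest; how the signature is delivered (header name, encoding) is not specified in this plan and is left out. All function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def generate_shared_secret(num_bytes: int = 32) -> str:
    """Random hex secret, comparable in spirit to `openssl rand -hex 32`."""
    return secrets.token_hex(num_bytes)


def sign(secret: str, body: bytes) -> str:
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, signature_hex: str) -> bool:
    """Constant-time comparison, so timing does not leak the expected digest."""
    expected = sign(secret, body).encode()
    return hmac.compare_digest(expected, signature_hex.encode())
```

The receiving end (the comment bridge) would call `verify` on the unparsed body before acting on any payload field, and reject the request when it returns `False`.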

Alongside these, three **preconfigured network/cert facts** are operator-provided inputs the loop
also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise"
sense): (1) the wildcard `*.ci.commoninternet.net` record (and bare `ci.commoninternet.net`) already
points at the **gateway**, (2) the gateway **TLS-passthroughs** that wildcard to cc-ci (SNI-routed,
no decryption — see §4.0 Network path), and (3) the **pre-issued wildcard cert** is in place at
`/var/lib/ci-certs/live/`. The operator owns the DNS record, the gateway, and cert issuance/renewal;
**everything else on cc-ci is the agent's job** — Traefik (pointed at the static cert), swarm,
per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach
cc-ci, or the cert is missing/expired, that is a `## Blocked` condition (operator action), not
something to work around (the gateway and DNS are not ours to reconfigure, per §9).

**(B) Recipe app secrets — generated by the test, persisted within the run.** These are NOT a
blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under
test and is responsible for persisting them across the run so the multi-stage lifecycle works:

- **Generate at install:** the harness runs `abra app secret generate` (+ inserts any deterministic
  test fixtures like an admin password / test user it chooses) when it deploys the app.
- **Persist for the run's duration:** the *same* generated secrets must survive across stages —
  install → upgrade and especially **backup → restore** — because an app cannot be upgraded or
  restored against rotated credentials. Persist them in a per-run secret store keyed by the run's
  unique app name (e.g. `<recipe>-pr<n>-<sha>`): the live abra/swarm secrets plus a sidecar record
  the harness writes (e.g. the app's `.env` + the generated values) to a run-scoped, non-public
  location on the runner, so any stage can re-read them. They are ephemeral by design.
- **Destroy at teardown:** the same teardown that removes the app/volumes also runs
  `abra app secret remove` (with `timeout` plumbed) and deletes the per-run sidecar. Nothing
  generated for a run outlives that run.
- **How the harness should "figure out" persistence (acceptance for D6):** decide and document one
  concrete mechanism — recommended default is "abra/swarm holds the live secrets; the harness keeps
  a run-scoped sidecar file under a `runs/<app-name>/` dir on the runner (mode 600), and reloads
  from it between stages." Whatever is chosen, it must (1) keep the same values stable across all
  three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind.
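
The recommended default (a run-scoped sidecar under `runs/<app-name>/`, mode 600) could be sketched as follows. This is an assumption-laden illustration, not the chosen mechanism: the `secrets.json` filename, the JSON format and all function names are hypothetical, and the live abra/swarm secrets are not touched here.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path


def run_dir(root: Path, app_name: str) -> Path:
    """Per-run directory, keyed by the run's unique app name (isolates concurrent runs)."""
    return root / "runs" / app_name


def save_secrets(root: Path, app_name: str, values: dict) -> Path:
    """Write the generated values to runs/<app-name>/secrets.json with mode 600."""
    directory = run_dir(root, app_name)
    directory.mkdir(parents=True, exist_ok=True)
    os.chmod(directory, 0o700)
    path = directory / "secrets.json"
    # Create with 0o600 up front so the file is never briefly readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(values, handle)
    os.chmod(path, 0o600)  # also covers a pre-existing file with looser permissions
    return path


def load_secrets(root: Path, app_name: str) -> dict:
    """Reload the same values in a later stage (upgrade, backup, restore)."""
    with open(run_dir(root, app_name) / "secrets.json") as handle:
        return json.load(handle)


def destroy_run(root: Path, app_name: str) -> None:
    """Teardown: remove the whole per-run directory; safe to call twice."""
    shutil.rmtree(run_dir(root, app_name), ignore_errors=True)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        save_secrets(root, "hedgedoc-pr12-abc123", {"admin_password": "example-only"})
        print(load_secrets(root, "hedgedoc-pr12-abc123"))
        destroy_run(root, "hedgedoc-pr12-abc123")
```

The three functions map onto the three acceptance points: `load_secrets` keeps values stable across stages, the per-app directory isolates runs, and `destroy_run` leaves nothing behind.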

**(C) Drone CI tokens:** store as Drone org/repo secrets, referenced by the pipeline. Where a value
is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is
generated — see the (A) split above.

Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results
UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that
greps published logs and the overview site for known secret patterns and any generated app
password.
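
A redaction filter and the matching leak check might look like the sketch below. It assumes the harness can hand the filter the exact values it generated for the run; the generic `key=value` pattern is illustrative only and would need tuning to the real token formats. `redact` and `find_leaks` are hypothetical names.

```python
import re

REDACTED = "[REDACTED]"

# Illustrative pattern only: "password=...", "token: ..." and similar assignments.
GENERIC_PATTERNS = [
    re.compile(r"(?i)\b(password|passwd|secret|token|api[_-]?key)\b(\s*[=:]\s*)(\S+)"),
]


def redact(text: str, known_values=()) -> str:
    """Replace known secret values verbatim, then mask generic key=value assignments."""
    # Longest first, so a secret that contains another secret is replaced whole.
    for value in sorted((v for v in known_values if v), key=len, reverse=True):
        text = text.replace(value, REDACTED)
    for pattern in GENERIC_PATTERNS:
        text = pattern.sub(lambda m: f"{m.group(1)}{m.group(2)}{REDACTED}", text)
    return text


def find_leaks(text: str, known_values) -> list:
    """Adversary-side check: which known secret values still appear verbatim?"""
    return [v for v in known_values if v and v in text]
```

Replacing known values first is the stronger guarantee, since it does not depend on the secret appearing next to a recognisable key; the pattern pass is a second net for values the harness did not generate itself.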

### 4.5 Results UX (D7) — YunoHost-CI-like

- **Per-run logs:** Drone's native UI already gives live, per-stage, tail-able logs and a final
  status — use it as the canonical run view; the PR comment links to it.
- **Overview page:** a small generator (`dashboard/`) polls the Drone API and renders a static
  page at `ci.commoninternet.net` (§4.0): a table of enrolled recipes, latest run status badge
  (pass/fail/running), last-tested version, link to history — mirroring the YunoHost app-list
  feel. Served by Traefik; regenerated on build-completion webhook or a short timer.
- Provide a status badge endpoint per recipe for embedding in recipe READMEs.

---

## 5. Milestones / initial BACKLOG

Work top-down; each milestone ends with an **Adversary gate** (Adversary must independently
verify the acceptance check before the next milestone starts). Seed `BACKLOG.md` from this.

- **M0 — Foundations.** Repo created; flake builds; `nixos-rebuild` (or deploy-rs) applies a
  no-op-then-base config to cc-ci; sops decrypts a test secret on the host.
  *Accept:* `ssh cc-ci 'systemctl is-system-running'` healthy after a rebuild from the repo.
- **M1 — Swarm + abra target.** Docker + single-node swarm + `proxy` network; the **`coop-cloud/traefik`
  recipe deployed via abra** (wildcard/file-provider mode, serving the pre-issued cert — §4.0/§4.2,
  not a custom Traefik); abra can deploy and tear down a trivial recipe by hand.
  *Accept:* a recipe deployed via abra is reachable over HTTPS (valid wildcard cert) on the
  `web-secure` entrypoint at `*.ci.commoninternet.net`, then fully torn down leaving no volumes; the
  proxy is verifiably the traefik recipe and **no DNS/ACME token is present on cc-ci**.
- **M2 — Drone online.** Drone server+runner via Nix, OAuth to Gitea; a hello-world `.drone.yml`
  in cc-ci runs green; logs visible in Drone UI.
  *Accept:* push to cc-ci triggers a visible green Drone build.
- **M3 — Comment bridge.** `!testme` on a PR triggers a parameterized Drone build; bridge posts a
  PR comment with the run link; non-`!testme` comments and non-collaborators are ignored.
  *Accept:* live demo on a scratch PR — comment in, build out, link back, auth enforced.
- **M4 — Harness + install stage.** `run_recipe_ci.py` + conftest; install stage green for one
  simple recipe end-to-end with a Playwright assertion; guaranteed teardown.
  *Accept:* full green install run for recipe #1, no orphaned app/volume afterward.
- **M5 — Upgrade + backup/restore stages.** Add the other two stages for recipe #1.
  *Accept:* upgrade preserves data; backup→mutate→restore returns original state.
- **M6 — Recipe-local tests (D4) + second recipe.** Discover/run recipe-repo `tests/`; enroll a
  second, DB-backed recipe via the documented flow.
  *Accept:* both recipes green; recipe-local tests demonstrably executed and merged.
- **M6.5 — Breadth ramp.** Enroll recipes 3→6 covering the remaining D10 categories, one at a
  time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N
  should be template-copy + recipe-specific tests/fixtures, with **no harness surgery**). Expect
  per-recipe quirks — multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough,
  large-volume backups — and absorb them into the *shared* harness, not one-off per-recipe hacks.
  When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling
  `sleep`s. Run benchmarks/long deploys **sequentially**, never in parallel (network contention).
  *Accept:* recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to
  shared harness code.
- **M7 — Secrets hardening (D6).** Full sops model, rotation doc, log redaction + leak test.
  *Accept:* Adversary's secret-grep over published logs finds nothing; rotation doc followed.
- **M8 — Dashboard (D7).** Overview page + badges + PR-comment outcome reflection.
  *Accept:* overview matches reality across several runs; outcomes mirrored to PR comments.
- **M9 — Reproducibility + docs (D8/D9).** `docs/install.md` rebuilds the server from scratch on a
  blank VM; all docs complete.
  *Accept:* Adversary rebuilds from docs onto a throwaway host (or records the tested subset).
- **M10 — Proof (D10).** All six chosen recipes green via real `!testme` PRs (the breadth set from
  M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore
  exercised and Adversary-verified; flip `STATUS.md` to DONE.

---

## 6. The two agents

### Builder (primary)
Implements the backlog top-down. Discipline:
- One backlog item in flight at a time. Small, committed, reversible steps.
- Every change verified against the *real* system (server, Drone, Gitea) before claiming done —
  never "should work". Paste the verifying command + output into `JOURNAL.md`.
- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated
  production servers; never reuse production domains. Idempotent server changes only (via Nix).
- If blocked on access/secrets/external state, write it to `STATUS.md ## Blocked` and pick up an
  unblocked item rather than hacking around it.

### Adversary (reviewer)
Runs as a **separate, independent loop in its own process/sandbox** (see §6.1 for how the two
loops coordinate). Its job is to **disbelieve**. It:
- Re-verifies each `Definition of Done` and milestone-acceptance claim independently, from a cold
  start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in `REVIEW.md`.
- Actively tries to break things: comment `!testmexyz` (should NOT trigger), comment as a
  non-collaborator (should be rejected), push a PR that fails tests (must report red, not green),
  kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets,
  run two `!testme`s concurrently (no domain/volume/secret collision), confirm the same generated
  app secrets persist across install→upgrade→backup/restore.
- Files every defect as a `BACKLOG.md` item tagged `[adversary]` with repro steps. The Builder
  may not close an adversary item; only the Adversary closes it after re-test.
- Has veto power over `STATUS.md → DONE`.
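
Two of the break-it probes above (`!testmexyz` must not trigger, non-collaborators are rejected) pin down the bridge's trigger rule. A minimal sketch, assuming the trigger counts only as a standalone whitespace-separated token and that collaborator status has already been looked up; `should_trigger` is a hypothetical name.

```python
TRIGGER = "!testme"


def should_trigger(comment_body: str, author_is_collaborator: bool) -> bool:
    """Decide whether a PR comment should start a build."""
    if not author_is_collaborator:
        return False
    # Whole-token match: "!testmexyz" or "x!testme" must not fire.
    return TRIGGER in comment_body.split()
```

Checking authorisation before looking at the body keeps the rejection path independent of whatever the comment contains.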

### 6.1 Coordination protocol (two independent loops, one shared repo)

The two loops never talk directly; the **git repo is the only coordination medium**. Each agent
has its own clone (e.g. Builder in `/srv/cc-ci/cc-ci`, Adversary in `/srv/cc-ci/cc-ci-adv`) and
its own pacing. To make concurrent writes conflict-free:

- **File ownership (one writer each — the other only reads):**
  - Builder owns: all source code/config, `STATUS.md`, `JOURNAL.md`, `DECISIONS.md`.
  - Adversary owns: `REVIEW.md`.
  - `BACKLOG.md` is split into two H2 sections — `## Build backlog` (Builder-only) and
    `## Adversary findings` (Adversary-only). Each agent edits **only its own section**, so git
    merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
    fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
    closed after re-test.
  - `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops
    have deliberately decided not to do autonomously and that need operator input to move on.**
    Append-only; either agent may file. Each entry should clearly say *what's needed from the
    operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call,
    plain "go ahead"). The list is **open-ended**: items can sit indefinitely, there is **no
    obligation to close every item**, and closure is operator-driven. A re-entry trigger / IDEA cross-link is
    **optional** (include when there's a natural mechanism, e.g. an opt-in flag in
    `cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" / buried JOURNAL note
    — file them here so the operator can review the whole list. The Phase-4 cleanup pass should
    **surface** DEFERRED.md to the operator at least once but does **not** force closure.
    Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED
    is for considered-and-parked work the loops won't tackle without operator input.
- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
  conflict. Prefer appending over rewriting.
- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous
  adversarial verification requires the Adversary NOT to consume the Builder's rationalisations
  before forming its verdict. The split:
  - `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding
    verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW**
    to verify (the exact command/check the Adversary can re-run from its own clone), the
    **EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and
    **WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify,
    it goes in STATUS.
  - `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative /
    dead-ends explored. Those go in `JOURNAL.md` (Builder-private to write).
  - The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code /
    git history; it forms its verdict from those + its own **cold** acceptance run, and does **not**
    read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine
    (e.g. to contextualise a finding) — note in REVIEW that you did.

  In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.**
- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
  smallest change, commit, `git push`. A rebase conflict means an ownership rule was broken (an edit
  landed in the *other* agent's file/section); re-pull and keep to your own files. Never `--force`.
- **Gate handshake via STATUS.md + commit-prefix signalling.** When the Builder believes a milestone
  gate is met, it sets in `STATUS.md`: `Gate: <Mn> — CLAIMED, awaiting Adversary`, **commits it with a
  `claim(...)` prefix**, and stops advancing past it. The Adversary runs the acceptance check cold and
  writes the verdict to `REVIEW.md` (`<Mn>: PASS @<ts>` with evidence, or `FAIL` + an `[adversary]`
  item), **committed with a `review(...)` prefix**. The Builder only proceeds past the gate after
  seeing `PASS` in `REVIEW.md`.
  - **The watchdog signals the handoff off these commit prefixes** (not by parsing prose): a new
    `claim(...)` commit on origin/main pings the Adversary; a new `review(...)` commit pings the
    Builder. So the prefixes are **load-bearing** — a gate claim MUST be a `claim(...)` commit and a
    verdict MUST be a `review(...)` commit, or the counterpart isn't promptly woken (it falls back to
    its slower self-poll). STATUS/REVIEW remain the durable source of truth; the prefix is the signal.
  - **Clean tree before claim.** The Builder runs `git status` before claiming — the working tree
    must be clean (everything committed AND pushed). The Adversary cold-verifies from a fresh clone,
    so an uncommitted/un-pushed change that only exists on the Builder's host (e.g. a locally-built
    fix) is a guaranteed cold-verify mismatch. Commit + push first, then claim.
- **DONE handshake.** Builder may write `## DONE` to `STATUS.md` **only** when `REVIEW.md` shows a
  PASS dated within 24h for every D1–D10. The Adversary can write `## VETO <reason>` to
  `REVIEW.md` at any time, which forbids DONE until cleared.
- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
  the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
  doing independent work — neither loop blocks idle waiting on the other beyond its gate.
- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to
  the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for an
  early look, an "I refactored X, please re-verify Y", an observation outside the normal flow). For
  those, use the inbox files in `machine-docs/`:
  - **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its
    own clone, commits, pushes.
  - **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its
    own clone, commits, pushes.
  - The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings
    the receiver. The receiver, on receipt, reads + processes the message, then **deletes the
    inbox file** (commits + pushes) — deletion is the "message consumed" signal. Single-writer
    discipline: only the sender writes their counterpart's inbox; only the receiver deletes it.
  - **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this
    while I keep going," "I observed Y outside our normal flow," "I'm taking a long e2e now."
    **Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md`
    still owns those). The inbox is a side-channel, not a replacement.
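
The commit-prefix signalling from the gate handshake above reduces to a small classification step. The sketch below assumes the watchdog already has the subjects of new commits on origin/main; the text after the prefix (e.g. `claim(M3): ...`) is a made-up example format, and the function names are hypothetical.

```python
def handoff_target(commit_subject: str):
    """Map a commit subject to the loop that should be woken, or None."""
    subject = commit_subject.lstrip()
    if subject.startswith("claim("):
        return "adversary"
    if subject.startswith("review("):
        return "builder"
    return None


def targets_for_new_commits(subjects):
    """Loops to ping for a batch of new commits, each at most once, in first-seen order."""
    seen = []
    for subject in subjects:
        target = handoff_target(subject)
        if target and target not in seen:
            seen.append(target)
    return seen
```

Matching only at the start of the subject is deliberate: a commit that merely mentions `claim(...)` in passing should not wake the counterpart.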

(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
the intended design.)

---

## 7. The Loop Protocol

Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On
every wake, `git pull --rebase` first, then:

1. **Orient.** Read `STATUS.md` (phase, in-flight item, gate claims, blockers), `BACKLOG.md`, and
   the tail of `REVIEW.md`. Reconcile with reality via cheap probes (Drone health, last build,
   `git log`) — never trust the ledger blindly; if it disagrees with the system, fix the ledger
   first (your own files only — see §6.1).
2. **Select.**
   - *Builder:* highest-priority open item in `## Build backlog`: unresolved `[adversary]`
     findings > current milestone's next task > next milestone. Never advance past a milestone gate
     until `REVIEW.md` shows its PASS.
   - *Adversary:* any standing `Gate: <Mn> CLAIMED` in `STATUS.md` to verify > re-verify a D1–D10
     gate whose last PASS is stale (>24h) > a fresh break-it probe from §6.
3. **Act.** Smallest change that advances the item. Builder verifies against the real system;
   Adversary verifies from a cold start. Commit with a clear message (author per repo convention).
4. **Record (your own files only).** *Builder:* append to `JOURNAL.md` (what you did + verifying
   command/output + next), update `STATUS.md`, tick `## Build backlog`. *Adversary:* append PASS/
   FAIL + evidence to `REVIEW.md`, add/close items in `## Adversary findings`. Then `git push`.
5. **Gate handshake (§6.1).** Builder, on reaching a milestone, sets `Gate: <Mn> CLAIMED, awaiting
   Adversary` in `STATUS.md` and works on other unblocked items meanwhile. Adversary clears it with
   a `REVIEW.md` verdict. No gate is "passed" without a logged PASS.
6. **Decide continuation.** Builder writes `## DONE` only when `REVIEW.md` shows a <24h PASS for
   every D1–D10 and no standing `## VETO`. Otherwise schedule the next wake.
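
The DONE condition in step 6 can be written as a single predicate. This sketch assumes the latest PASS timestamp per deliverable and the VETO state have already been extracted from `REVIEW.md` (the parsing is not shown); `may_write_done` and its argument shapes are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DELIVERABLES = [f"D{i}" for i in range(1, 11)]  # D1..D10


def may_write_done(last_pass, veto_standing, now=None, max_age=timedelta(hours=24)):
    """True only if every D1..D10 has a PASS no older than max_age and no VETO stands.

    last_pass maps a deliverable id to the datetime of its most recent PASS;
    a missing key means the deliverable has never passed.
    """
    now = now or datetime.now(timezone.utc)
    if veto_standing:
        return False
    for deliverable in DELIVERABLES:
        passed_at = last_pass.get(deliverable)
        if passed_at is None or now - passed_at > max_age:
            return False
    return True
```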

**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
the *specific* thing. Three cases:
1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to
   stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do
   **NOT** `ScheduleWakeup` for the expected total runtime of the task in a single big sleep — a 25
   min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes
   you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy
   that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle.
   - **Recommended pattern for long deploys/convergence (builder, 2026-05-30):** **arm a `Monitor`**
     that polls the node every ~30s and **wakes you on convergence OR failure**, with a **longer
     fallback heartbeat** (`ScheduleWakeup`) as a backstop if the Monitor never fires. You then
     proceed the *instant* the deploy converges (no over-waiting if it finishes early) and see a
     failure promptly, while the heartbeat bounds the wait if the condition is never met. Size the convergence
     timeout sanely — longer than a few minutes if a recipe genuinely needs it, but **never absurd**
     (e.g. the ~40-min ghost timeout was excessive). Beats both a single big blind sleep and a fixed
     coarse poll.
2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
   Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
   here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,
   the watchdog pings the Adversary to verify *now*; the moment the Adversary updates `REVIEW.md`
   (verdict/finding), it pings the Builder to proceed (`launch.sh`, ~30 s detection). So you may sleep
   while blocked and trust the ping — but keep a **fallback self-poll on a modest cadence (~2–4 min)**
   in case a ping is missed (a dead session is restarted by the watchdog and re-orients from the repo
   anyway). The goal: a pending handoff resolves in well under a minute, not a whole idle interval.
3. **Genuinely idle, nothing pending from either loop** → sleep in chunks of **at most 10 min**, then
   re-wake and re-orient; if still nothing, sleep another ≤10 min. **Never a single wait > 10 min**
   (600 s) — see the liveness rule below.
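
The "poll the specific thing, with a bounded fallback" shape from case 1 also applies inside the harness. The sketch below is a plain in-process analogue, not the agent's `Monitor`/`ScheduleWakeup` tooling: `probe` is a hypothetical callable returning `"converged"`, `"failed"` or `"pending"`, and the default interval and timeout are illustrative.

```python
import time


def poll_until(probe, interval_s=30.0, timeout_s=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call probe() until it reports "converged" or "failed", or the budget runs out.

    Returns "converged", "failed" or "timeout". The sleep and clock arguments are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = probe()
        if state in ("converged", "failed"):
            return state  # proceed (or fail) the moment the outcome is known
        if clock() >= deadline:
            return "timeout"
        # Never sleep past the deadline.
        sleep(min(interval_s, max(0.0, deadline - clock())))
```

Returning `"failed"` as soon as the probe sees it is the point of the pattern: a deploy that dies early is reported at the next short poll rather than at the end of the full budget.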

Notes: **The Adversary may idle freely when nothing is pending — it should NOT pointlessly re-verify
or busy-poll to look busy.** It gets woken by the watchdog the instant the Builder claims a gate, so
"start verifying very soon after the Builder waits" is handled by the signal, not by the Adversary
spinning. **The Builder** should prefer keeping an unblocked backlog item in hand so it's rarely
*fully* blocked on a gate; only hit case 2 when everything is genuinely gated behind the pending
verification — and then rely on the watchdog ping (+ fallback poll) rather than a long idle.

**Liveness marker & max-wait (the watchdog ENFORCES this).** Every wait is capped at **10 minutes**;
to wait longer, wake at 10 min, re-check, and wait again. **Immediately before going idle for any
wait, your FINAL output line MUST be exactly:**

    WAITING-UNTIL: <ISO-8601 UTC>

— the moment you intend to resume (≤10 min out, matching your `ScheduleWakeup`). Compute it from the
clock, e.g. `date -u -d '+10 min' +%FT%TZ`. The watchdog uses this to tell a healthy wait from a
wedge: if it sees a loop **idle ≥5 min with no current `WAITING-UNTIL` marker as its last message, OR
idle past the time the marker named, it kills + reboots that loop** (which then re-orients from git +
its STATUS/REVIEW files). So always leave a fresh marker before sleeping, and never overrun it.
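
Both sides of the liveness rule can be sketched in a few lines: producing the marker (clamped to the 10-minute cap) and the watchdog-side reading of it. This is one interpretation of the rule as stated above, not the watchdog's actual code; the function names and the decision to treat a malformed marker as "no marker" are assumptions.

```python
from datetime import datetime, timedelta, timezone

MAX_WAIT = timedelta(minutes=10)
IDLE_GRACE = timedelta(minutes=5)
MARKER_PREFIX = "WAITING-UNTIL: "
TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def waiting_until_line(now, wait):
    """Marker line for a wait starting at `now`, clamped to the 10-minute cap."""
    resume = now + min(wait, MAX_WAIT)
    return MARKER_PREFIX + resume.astimezone(timezone.utc).strftime(TS_FORMAT)


def is_wedged(last_line, idle_for, now):
    """True if the marker has been overrun, or the loop is idle >= 5 min without a marker."""
    if last_line.startswith(MARKER_PREFIX):
        try:
            stamp = last_line[len(MARKER_PREFIX):].strip()
            resume = datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)
            return now > resume
        except ValueError:
            pass  # assumption: a malformed marker counts as no marker
    return idle_for >= IDLE_GRACE
```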

**Proactive compaction.** If your context usage climbs high (≳80%), run `/compact` *before*
continuing — your state lives in git + the phase STATUS/REVIEW files, so compaction is lossless for
the loop and prevents wedging (garbled output, failed tool calls) near the context limit.

**Anti-drift guards.**
- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`,
  and try a different approach or mark blocked. No thrashing.
- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file
  a finding — do not delete the assertion. (This is the single most important rule; the Adversary
  watches specifically for tests being softened or skipped.)
- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild
  reproduces it.
- Don't expand scope beyond §2. New ideas → `BACKLOG.md` (tagged `[idea]`), not into this run.

---

## 8. Open decisions to settle early (log in DECISIONS.md)

- Deploy mechanism: `nixos-rebuild --target-host` vs `deploy-rs`/`colmena`. (Default: deploy-rs
  for atomic rollbacks; nixos-rebuild fine if simpler.)
- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.)
- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.)
- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.)
- Reverse proxy / Wildcard TLS: **SETTLED — deploy the real `coop-cloud/traefik` recipe via abra
  (for e2e fidelity), in wildcard/file-provider mode, serving the operator's pre-issued wildcard
  cert; no ACME, no token** (§4.0/§4.2). Supersedes the original plan's hand-rolled
  `modules/traefik.nix`. The operator issued the wildcard SAN cert (`*.ci.commoninternet.net` +
  `ci.commoninternet.net`) via LE DNS-01/Gandi out-of-band into `/var/lib/ci-certs/live/`; the agent
  feeds it as the recipe's `ssl_cert`/`ssl_key` swarm secrets so the DNS-editing token never reaches
  cc-ci. **Manual renewal** ~90 days (next ~2026-08-24): re-issue → update the secret → redeploy.
- Proof recipe set (D10 — six, category-spanning). Default candidates, all previously verified
  deployable: `hedgedoc`, `cryptpad`, `keycloak`, `authentik`, `lasuite-docs`/`lasuite-drive`,
  `matrix-synapse`, `immich`, `bluesky-pds`. Lock the final six early so M4–M6.5 build toward them.
  Sequence easy→hard: prove the pipeline on `hedgedoc`/`cryptpad` before tackling SSO, S3, media
  stores, and TLS-passthrough recipes.

Each default stands until the Adversary or reality forces a change; record the change and why.

---

## 9. Guardrails / hard rules

- **Access boundary:** only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when
  enrolling) add a webhook — nothing else. Never push to a recipe repo's code.
- **No secrets in git/logs/UI.** Ever. Verified by the Adversary's leak test.
- **No mocks for the e2e stages.** D2 means real deploys. If something can't be tested for real,
  it's a finding, not a pass.
- **Idempotent + reversible.** Anything done to the server must be re-derivable from the repo.
  Infra bring-up is **declarative idempotent reconciliation in Nix** — not manual post-steps and not
  run-once scripts. Each piece (swarm + `proxy` net, the traefik recipe deploy, Drone, the
  comment-bridge, the dashboard) is a systemd **oneshot that re-runs on every activation/boot** and
  *converges* to the desired state (inspect → act only if needed → no-op if already correct), like
  `swarm-init`. **No `/var/lib/.bootstrapped`-style sentinels** (they don't self-heal drift). The
  goal: a from-scratch install is `git clone` + `nixos-rebuild switch` + the operator preconditions
  — `docs/install.md` must never accumulate manual post-rebuild steps.
- **Stop on missing *external* infra inputs** (class-A1 in §4.4: cc-ci SSH/root access, the
  Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at `/var/lib/ci-certs/live/`,
  registry creds — and the preconfigured DNS/gateway facts) rather than improvising around them;
  surface in `STATUS.md ## Blocked`. **Never** attempt ACME/DNS-01 for `commoninternet.net` — the
  cert is pre-provided and renewed out-of-band by the operator. **This does NOT apply to** internal
  infra secrets (class-A2: Drone RPC, webhook HMAC, Gitea OAuth app, host age key; the agent
  generates these) or to recipe app secrets (class-B): those the test harness generates itself
  (`abra app secret generate` + chosen fixtures), persists for the run, and destroys at teardown.
  A missing app secret is never a blocker; it is something the harness creates. See §4.4.
- **Real abra deploys; abra convergence by default; custom readiness only if it's a real test.**
  Deploys/upgrades use the **real abra commands** (`abra app deploy`/`upgrade`) — never bypass abra
  with `docker service update`/`scale`. **Prefer abra's own convergence checks.** Only skip abra's
  post-deploy convergence monitor (`-c`/`--no-converge-checks`) and substitute a **harness READY_PROBE**
  when abra's monitor genuinely doesn't fit (e.g. its window is too short for a heavy app and it FATAs
  on a deploy that *does* converge). When you do: the deploy is still real abra (only abra's *waiting*
  is replaced), and the probe MUST be a **genuinely strict** readiness test — all services N/N **plus**
  a real app-level check — that **RAISES on actual non-readiness**, never a no-op that masks a failed
  deploy. **Prove it has teeth** (a negative test that fails on stuck convergence, e.g. F2-12's
  P7-negative). The Adversary treats a custom probe as a potential test-weakening until cold-verified.
- **Custom cc-ci compose overlays — avoid where possible, justify each, prefer upstream.** A
  cc-ci-authored compose overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) risks **drift**
  from the recipe users actually run, so **avoid it where possible and justify each use**. In most
  cases the cleaner fix is an **upstream recipe PR** — either a genuine robustness fix, or exposing a
  knob the recipe should expose. **But a single, uniform, optional `compose.ccci.yml` overlay file per
  recipe is an acceptable fallback** — especially for a value abra/compose can't take from an env var.
  (One fixed filename per recipe — `compose.ccci.yml` — holding all cc-ci-side deploy tweaks; not
  per-purpose suffixes.)
  **Known limitation (builder, 2026-05-30): abra does NOT support an env value for a healthcheck
  `start_period`.** So the ghost/discourse `start_period` bumps legitimately **need** the overlay (an
  env-var PR is not possible for that field) — these overlays **stay**, justified. When you do use an
  overlay: keep it **minimal + single-purpose**, **document WHY in the file header** (the exact abra/
  upstream limitation that forces it), have the **Adversary confirm it doesn't weaken a test or mask a
  recipe defect**, and **file the upstream PR where the fix genuinely belongs** (e.g. if a recipe's
  `start_period` is too tight for any slow host, propose raising it upstream too).
- **Upgrade tier: always test the upgrade to the LATEST version.** Don't drop the upgrade test just
  because the *from* (older) version is awkward. If an older from-version can't be fully deployed/tested
  (its image tag was pulled from the registry, or it predates an overlay/feature), you do **NOT** need
  that older version's **custom tests** to run — deploy it minimally (a justified overlay is fine) or
  pick the nearest deployable prior, then **upgrade to latest and run the full assertions on the
  latest**. Skipping a from-version's custom tests is an honest, recorded outcome; skipping the
  upgrade-to-latest is not. (See `plan-ccci-compose-overlay-policy.md` for the per-recipe disposition.)
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
  with the output. The loop's value depends entirely on the ledgers being true.
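
The "genuinely strict" readiness probe required by the abra-convergence guardrail above can be sketched as a function that raises unless every service is at N/N and an app-level check passes. How replica counts and the app check are obtained is left to the harness; `NotReady`, `assert_ready` and the argument shapes are hypothetical.

```python
class NotReady(Exception):
    """Raised when the deployment is not genuinely ready."""


def assert_ready(replicas, app_check):
    """Raise NotReady unless all services are N/N (N >= 1) and the app-level check passes.

    replicas maps a service name to a (running, desired) pair.
    app_check is a zero-argument callable returning True when the app answers correctly.
    """
    if not replicas:
        raise NotReady("no services found")
    short = {
        name: counts
        for name, counts in replicas.items()
        if counts[1] < 1 or counts[0] != counts[1]
    }
    if short:
        raise NotReady(f"services not at N/N: {short}")
    if not app_check():
        raise NotReady("app-level check failed")
```

The negative cases matter as much as the positive one: a probe that cannot be made to raise on a stuck deploy (empty service list, 0/1 replicas, failing app check) would be exactly the no-op the guardrail forbids.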