cc-ci-orchestrator/cc-ci-plan/plan.md
autonomic-bot 01874821f2 decommission Pi: update all docs for VM-only setup
The orchestrator Pi is retired (2026-05-31). All agents now run on the
cc-ci-orchestrator VM (NixOS, loops user, /srv/cc-ci). The VM is a
direct tailnet peer to cc-ci — no SOCKS proxy, no userspace tailscaled,
no ProxyCommand. Updated across all affected files:

AGENTS.md
  - Remove Pi from reboot description; migration complete (not "parked")
  - cc-ci access: direct ssh, not via proxy

kickoff.md
  - Prerequisites: direct tailnet peer, not proxy
  - Host deps: NixOS (not apt)
  - Fallback/Incus: b1 reachable directly, no --proxy curl flag

plan.md §1 + §1.5
  - §1 bootstrap: direct SSH, check tailscale status (not restart proxy)
  - §1.5 intro: "VM" not "sandbox host"; no proxy
  - Credentials table: remove TS_AUTH_KEY row; update cc-ci SSH row
  - Replace "Tailscale connection (proxy)" subsection with direct-peer description

plan-orchestrator-migration.md
  - Mark COMPLETE (2026-05-31); historical record only

plan-phase1c-full-reproducibility.md
  - Incus access: direct, not via SOCKS proxy

prompts/builder.md + prompts/adversary.md
  - cc-ci access language only: direct ssh, no proxy restart instructions
  - adversary: *.ci.commoninternet.net via plain curl, no proxy flag

REBOOTS.md
  - Retitle for VM; note Pi retired; Pi entries marked historical

systemd/cc-ci-loops.service
  - User/Group/HOME/PATH: notplants → loops
  - Remove cc-ci-tailscaled.service dependency (no proxy on VM)
  - Add note about nix/configuration.nix as the authoritative VM declaration

test-e2e-testme-acceptance.md
  - tailscale status: no --socket flag
  - ssh to throwaway: no ProxyCommand

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 00:16:37 +00:00

# cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)
**Status:** ACTIVE — autonomous loop
**Owner agent:** Builder (primary) + Adversary (reviewer)
**Source brief:** `brief.md` (do not edit; this file supersedes it)
**This file's canonical path:** `/srv/cc-ci/cc-ci-plan/plan.md`
**Target server:** `cc-ci` (NixOS)
**Code/config home:** `git.autonomic.zone/recipe-maintainers/cc-ci` (the CI project repo — distinct from this
`/srv/cc-ci/cc-ci-plan/` planning+launch folder)
**Last updated:** keep current via `STATUS.md` (see §7)
---
## 0. How to read this document
This plan is written to be handed to an **autonomous Claude agent running in a sandbox over
several days**, driving itself in a loop until the CI server is "done" per §2. A second agent
(the **Adversary**) independently tries to disprove every "done" claim. Neither agent is
trusted to mark its own work complete.
If you are an agent waking up into this loop for the first time, go straight to **§1 Bootstrap**.
On every subsequent wake, go to **§7 The Loop Protocol** and continue from `STATUS.md`.
The rest of the document (§3 through §6) is the technical design. Treat it as the default architecture,
but you are allowed to revise it when reality disagrees — record any deviation in `DECISIONS.md`
with a one-line rationale.
---
## 1. Bootstrap (first wake only)
Do these in order. Each step is idempotent; re-running is safe.
1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.)
- `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no
separate sudo step. `ssh cc-ci` reaches cc-ci directly (the orchestrator VM is a direct tailnet
peer — no proxy; key `~/.ssh/cc-ci-root-ed25519`). If it fails, check `tailscale status`
before declaring blocked.
- `ssh cc-ci 'nixos-version'` — confirm NixOS.
- Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5):
`curl -s https://$GITEA_URL/api/v1/version`. The bot authenticates with
`GITEA_USERNAME`/`GITEA_PASSWORD` (basic auth) or a token you mint from them via
`POST /api/v1/users/<user>/tokens` — do **not** expect a ready-made `$GITEA_TOKEN`.
- Confirm the **preconfigured** test-app DNS (§4.0/§4.4): a random subdomain under the wildcard
resolves, e.g. `getent hosts probe-$RANDOM.ci.commoninternet.net` returns the **gateway's** IP
(not cc-ci's: the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address. `getent`
goes through the host's own configured resolvers, which on the VM are public ones; see §1.5).
Traefik is *not* up yet — you deploy it at M1 (the real `coop-cloud/traefik` recipe via abra,
wildcard/file-provider mode → the pre-issued cert at `/var/lib/ci-certs/live/`, **no ACME**);
the DNS record + gateway passthrough + cert are the preconditions, and full end-to-end HTTPS
reachability is proven at M1, not now.
If the wildcard does not resolve at all, that's a `## Blocked` item (operator fixes DNS/gateway).
- If any check fails, write the failure to `STATUS.md` under `## Blocked` and stop — a human must fix access. Do **not** try to work around missing access.
2. **Create the `cc-ci` repo** on git.autonomic.zone if it does not exist. Push an initial
skeleton (see §3 layout). The Builder clones to `/srv/cc-ci/cc-ci`; the Adversary loop keeps
its **own independent clone** at `/srv/cc-ci/cc-ci-adv`. The repo is the only channel between
the two loops (§6.1) — loop state lives inside it (`STATUS.md`, `BACKLOG.md`, etc.).
3. **Snapshot the starting environment** into `cc-ci/docs/baseline.md`: current NixOS config on
the server (`/etc/nixos` or existing flake), installed packages, whether Docker/Swarm/abra
already exist, DNS that already points at the box. This is the rollback reference.
4. **Seed the loop state files** (§7) if absent: `STATUS.md`, `BACKLOG.md`, `REVIEW.md`,
`JOURNAL.md`, `DECISIONS.md`. Give `BACKLOG.md` two H2 sections — `## Build backlog`
(populated from §5 milestones) and `## Adversary findings` (empty) — per the single-writer
rule in §6.1.
5. Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.
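Step 4 is easiest to keep idempotent if seeding only ever creates missing files. A minimal sketch,
assuming the five file names and the two `BACKLOG.md` H2 sections from step 4; the helper name
`seed_state_files` and the placeholder file contents are illustrative, not part of the plan:

```python
from pathlib import Path

# File names come from step 4; the initial contents are placeholder assumptions.
STATE_FILES = {
    "STATUS.md": "# STATUS\n",
    "BACKLOG.md": "# BACKLOG\n\n## Build backlog\n\n## Adversary findings\n",
    "REVIEW.md": "# REVIEW\n",
    "JOURNAL.md": "# JOURNAL\n",
    "DECISIONS.md": "# DECISIONS\n",
}


def seed_state_files(repo_root: Path) -> list[str]:
    """Create any missing loop state file; never overwrite an existing one.

    Returns the names of the files created by this call, so a re-run
    on an already-seeded clone returns an empty list.
    """
    created = []
    for name, initial in STATE_FILES.items():
        path = repo_root / name
        if not path.exists():
            path.write_text(initial, encoding="utf-8")
            created.append(name)
    return created
```

Calling `seed_state_files(Path("/srv/cc-ci/cc-ci"))` twice in a row should change nothing on the
second call, which matches the "re-running is safe" requirement at the top of this section.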
---
## 1.5 Credentials & access — where everything lives and how to use it
The loops run **on the cc-ci-orchestrator VM** (`100.116.55.106`, NixOS, `loops` user) and reach
cc-ci directly over Tailscale (direct tailnet peer — no proxy). This section is the authoritative
map of what credentials exist, where, and how to use them. **Never copy any secret value into the
repo, a commit, a log, or the dashboard** (§9) — reference locations only.
### Provided credentials (already in place)
| What | Where | How to use |
|---|---|---|
| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; `Host cc-ci` in `~/.ssh/config` (HostName `100.90.116.4`, no ProxyCommand) | Just run `ssh cc-ci` (logs in as **root**). The orchestrator VM is a direct tailnet peer — direct route, no proxy. Pubkey already in cc-ci's `/root/.ssh/authorized_keys`. |
| **Gitea bot account** | `/srv/cc-ci/.testenv`: `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to push the `cc-ci` project repo, read recipe repos, comment on PRs, and poll for `!testme` (read-level; the bot does not register webhooks). |
Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values).
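For Python code (the bridge or the harness) the same file can be read without a shell. A hedged
sketch, assuming `.testenv` holds plain `KEY=VALUE` lines as the `. /srv/cc-ci/.testenv` usage
implies; `parse_env`, `basic_auth_header` and `token_endpoint` are hypothetical helper names, and
nothing here prints or logs a secret value:

```python
import base64


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (optional `export`, optional quotes)."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching single or double quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        env[key.strip()] = value
    return env


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the HTTP Basic header used for basic-auth API calls."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def token_endpoint(env: dict[str, str]) -> str:
    """URL of the token-minting endpoint named in the table above."""
    return f"https://{env['GITEA_URL']}/api/v1/users/{env['GITEA_USERNAME']}/tokens"
```

A caller would read the file once, build the header from `GITEA_USERNAME`/`GITEA_PASSWORD`, and
POST to `token_endpoint(env)`; the request body (token name, scopes) is left out because it depends
on the installed Gitea version.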
### The Tailscale connection (how `ssh cc-ci` works)
cc-ci (`cc-nix-test`, **`100.90.116.4`**) is on the same tailnet as the orchestrator VM
(`taila4a0bf.ts.net`), so it is reached **directly** — no SOCKS proxy, no userspace tailscaled.
The VM's system tailscaled is on that tailnet; `ssh cc-ci` routes straight to cc-ci.
- `ssh cc-ci` works out of the box (`~/.ssh/config` has `Host cc-ci` pinned to `100.90.116.4`,
no ProxyCommand; key `~/.ssh/cc-ci-root-ed25519`; logs in as root).
- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the VM, use plain `curl` — no proxy flag
needed. The VM uses public DNS resolvers (`1.1.1.1`/`8.8.8.8`) so `*.ci.commoninternet.net`
resolves normally.
- **If `ssh cc-ci` fails:** run `tailscale status` (as loops or root) to confirm the VM is still
on the tailnet and cc-ci is listed; check `systemctl status tailscaled`. A connectivity failure
is recoverable, not an immediate `## Blocked`-and-stop, unless the VM has lost tailnet membership
entirely (then that IS a class-A1 blocker).
### Credentials the loop GENERATES itself (do not wait on a human for these)
- **Drone RPC secret** and **webhook HMAC secret** — generate (`openssl rand -hex 32`), store
sops-encrypted in `secrets/`, and wire both ends. Internal shared secrets, not human inputs.
- **Gitea OAuth app for Drone** — create it under the bot account via the API
(`POST /api/v1/user/applications/oauth2`); capture client id/secret into `secrets/`.
- **cc-ci host age/GPG key for sops** — generate on the host (or derive from its SSH host key);
add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
- **Per-recipe app secrets** (class-B, §4.4) — the harness generates these per run.
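Where a shared secret is generated from Python rather than the shell, the standard library gives
the same result as `openssl rand -hex 32`. A small sketch; the function name `new_shared_secret`
is illustrative:

```python
import secrets


def new_shared_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of CSPRNG output as lowercase hex (64 chars for 32 bytes)."""
    return secrets.token_hex(num_bytes)
```

The value should go straight into the sops-encrypted file in `secrets/` and never be echoed or
logged (§9).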
### Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)
- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
(`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
`/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0).
The agent feeds these into the
`coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key` swarm secrets (wildcard/file-provider
mode) and runs **no ACME** for this domain. **Do not request or expect a `commoninternet.net` DNS
token**: issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal
~2026-08-24). A missing/expired cert is a finding for the operator, not an agent re-issue.
> **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer
> an out-of-band operator file-drop. It is now **sops-encrypted in the private `cc-ci-secrets` repo**
> (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays
> operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts
> the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key
> at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`.
- **Registry pull credentials** (e.g. Docker Hub) — *recommended* to avoid anonymous pull-rate
limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then
request creds. Store sops-encrypted in `secrets/`.
- **Gitea bot permissions** (a grant, not a secret) — **least privilege: read, not admin.** The bot
needs: write on its own `recipe-maintainers/cc-ci` project repo; **read** + **comment** on the
recipe repos under test; and **org membership** in `recipe-maintainers` (read-level — used both to
authorize commenters via the members endpoint and to read members). It does **not** need repo-admin
and does **not** register webhooks (that's an optional manual admin task, §4.1). If a needed grant
is missing, that's a `## Blocked` item for the operator.
---
## 2. Definition of Done (the loop's exit condition)
The loop terminates **only** when every item below is true *and the Adversary has independently
re-verified each one within the last 24h* (logged in `REVIEW.md` with timestamps and command
output). Partial credit does not count.
- [ ] **D1 — Trigger.** Commenting `!testme` on any open PR in any enrolled recipe repo on
git.autonomic.zone starts a CI run for the code *at that PR's head commit* within 60s.
Other comments do not. Re-commenting re-runs.
- [ ] **D2 — Test matrix.** For a recipe under test, the run executes, as separate reported
stages: **new install**, **upgrade** (previous published version → PR version), and
**backup + restore**. All are genuine end-to-end against a really-deployed recipe (real
containers, real Traefik routing, real volumes) — no mocks, no stubs.
- [ ] **D3 — Python + Playwright.** Tests are Python. Functional assertions that require a
browser use Playwright against the live deployed app.
- [ ] **D4 — Recipe-local tests.** If the recipe repo contains its own `tests/` folder, those
tests are also discovered and run as part of the same CI run, with results merged in.
- [ ] **D5 — Per-recipe test tree.** The cc-ci repo holds `tests/<recipe>/` with the
install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is
a documented, small, repeatable operation.
- [ ] **D6 — Secrets.** App + infra secrets are handled reproducibly (committed encrypted,
decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or
the results UI.
- [ ] **D7 — Results UX.** Each run has a stable URL with live, tail-able logs per stage and a
final pass/fail; there is an overview page listing recipes with their latest status —
look-and-feel comparable to the YunoHost app CI (`ci-apps.yunohost.org`). A PR comment links
back to its run and reflects the outcome.
- [ ] **D8 — Reproducible server.** The entire server (Drone, runner, comment bridge, swarm,
Traefik, dashboard, secrets wiring) is declared in the `cc-ci` repo's NixOS flake and can be
rebuilt from scratch onto a blank NixOS host following `docs/install.md`, verified by the
Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch
rebuild was infeasible and what was tested instead).
- [ ] **D9 — Documentation.** `README.md` + `docs/` explain architecture, how to enroll a recipe,
how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run.
A new engineer can enroll a recipe and get a green run using only the docs.
- [ ] **D10 — Proof (breadth).** At least **six real recipes** spanning the meaningful
categories have a full green run triggered by `!testme` on a real PR, with all three stages
(install / upgrade / backup+restore) actually exercised. The set must cover:
a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an
object-storage/large-volume app. **Target set (all previously verified deployable):**
`hedgedoc` (simple), `cryptpad` (stateful, no external DB), `keycloak` + `authentik`
(SSO/identity, DB-backed), `lasuite-docs` and/or `lasuite-drive` (multi-service + S3/MinIO),
`matrix-synapse` (DB + media store), `immich` (large volumes + Postgres), `bluesky-pds`
(TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen
set and per-recipe green-run evidence in `REVIEW.md`. Any recipe that genuinely cannot be CI'd
is a documented finding (in `DECISIONS.md`) with the reason, not a silent omission.
*Recipe availability:* the testable repos live on the **private mirror**
`git.autonomic.zone/recipe-maintainers/<recipe>` (already mirrored as of bootstrap:
`bluesky-pds`, `cryptpad`, `keycloak`, `lasuite-docs`, `lasuite-meet`, `matrix-synapse`, `n8n`,
`custom-html`, `custom-html-tiny`). Any recipe **not** yet mirrored (e.g. `hedgedoc`,
`authentik`, `immich`, `lasuite-drive`) is pulled from upstream **git.coopcloud.tech** and
created on the mirror via the **recipe mirror+PR flow** (§4.1) — so the target set is not capped
by what currently exists. If the chosen simple/stateless app isn't mirrored, `custom-html` /
`custom-html-tiny` already are.
When all of D1 through D10 hold and are Adversary-verified, write `## DONE` to `STATUS.md` with the
evidence links and stop scheduling new iterations.
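The D10 selection rule (six or more recipes that together cover every category) can be checked
mechanically before the set is recorded in `REVIEW.md`. A sketch only: the category labels
paraphrase D10, and the recipe-to-category mapping below is an assumption read off the
parenthetical notes in the target set, to be corrected as recipes are actually enrolled:

```python
# Category labels paraphrase D10; they are not identifiers defined by the plan.
REQUIRED_CATEGORIES = frozenset(
    {"simple", "single-db", "multi-service", "sso", "object-storage"}
)

# ASSUMED mapping, derived from the notes in the D10 target set.
RECIPE_CATEGORIES = {
    "hedgedoc": {"simple"},
    "cryptpad": set(),  # stateful, no external DB: no D10 category assigned here
    "keycloak": {"sso", "single-db"},
    "authentik": {"sso", "single-db"},
    "lasuite-docs": {"multi-service", "object-storage"},
    "lasuite-drive": {"multi-service", "object-storage"},
    "matrix-synapse": {"single-db"},
    "immich": {"object-storage"},
    "bluesky-pds": set(),  # TLS-passthrough case, see §4.0
}


def missing_categories(chosen: list[str]) -> set[str]:
    """Return the D10 categories not covered by the chosen recipes."""
    covered = set()
    for recipe in chosen:
        covered |= RECIPE_CATEGORIES.get(recipe, set())
    return set(REQUIRED_CATEGORIES) - covered


def satisfies_d10(chosen: list[str]) -> bool:
    """True when at least six distinct recipes cover every category."""
    return len(set(chosen)) >= 6 and not missing_categories(chosen)
```

This only validates the breadth of the selection; the green-run evidence per recipe still has to
come from real `!testme` runs.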
---
## 3. Repository layout (`git.autonomic.zone/recipe-maintainers/cc-ci`)
```
cc-ci/
├── README.md
├── flake.nix # NixOS host(s) + devshell
├── flake.lock
├── hosts/
│ └── cc-ci/
│ ├── configuration.nix # the cc-ci machine
│ └── hardware.nix
├── modules/
│ ├── drone.nix # Drone server + runner (exec/docker)
│ ├── comment-bridge.nix # !testme webhook listener service
│ ├── swarm.nix # Docker + single-node swarm + `proxy` net; deploys the
│ │ # coop-cloud/traefik recipe via abra (wildcard/file-provider, §4.2)
│ ├── dashboard.nix # results overview site
│ └── secrets.nix # sops-nix / agenix wiring
├── secrets/ # sops-encrypted (*.enc / *.age); see §4.4
│ └── secrets.yaml
├── bridge/ # comment-bridge source (small Go/Python service)
├── runner/ # CI orchestration entrypoint invoked by Drone
│ ├── run_recipe_ci.py # top-level: deploy→test→teardown for a recipe@ref
│ └── harness/ # shared pytest fixtures (abra wrappers, app lifecycle)
├── dashboard/ # results UI generator (reads Drone API → static site)
├── tests/
│ ├── conftest.py # shared fixtures, recipe selection, teardown guarantees
│ ├── <recipe>/
│ │ ├── test_install.py
│ │ ├── test_upgrade.py
│ │ ├── test_backup.py
│ │ └── playwright/ # e2e flows for this recipe
│ └── _template/ # copy-to-add-a-recipe template
├── docs/
│ ├── install.md # from-scratch server build (D8)
│ ├── enroll-recipe.md # how to add a recipe (D5)
│ ├── secrets.md # secret model + rotation (D6)
│ ├── architecture.md
│ ├── runbook.md # debugging failed runs
│ └── baseline.md # bootstrap snapshot
├── STATUS.md BACKLOG.md REVIEW.md JOURNAL.md DECISIONS.md # loop state (§7)
└── .drone.yml # pipeline for cc-ci's own repo (lint/self-test)
```
---
## 4. Technical design (default architecture)
### 4.0 Domain model (where things live)
Two DNS zones, deliberately separated — do **not** conflate them:
- **`git.autonomic.zone` — source of truth for code (unchanged, not ours to reconfigure).**
The Gitea host: the enrolled recipe repos and the `cc-ci` config repo live here. The loop reads
and comments here (a webhook is optional and admin-registered, never added by the loop; see §4.1),
but deploys **nothing** here. Per §9 this zone is read/comment-only: never push recipe code, never
point app DNS at it.
- **`commoninternet.net` — the CI server's own zone; everything CI-facing.** A wildcard
`*.ci.commoninternet.net` resolves to a **gateway** (not cc-ci directly — see Network path below).
Under it:
- **Apps under test:** each run deploys to a unique subdomain
`<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, so concurrent runs never collide on a
hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3).
- **Results dashboard:** `ci.commoninternet.net` — overview page + per-recipe status badges (§4.5).
- **Webhook bridge:** `ci.commoninternet.net/hook` — the Gitea `issue_comment` receiver (§4.1).
- **Network path (gateway → TLS passthrough → cc-ci).** The wildcard record does **not** point at
cc-ci's IP. It points at a gateway that **passes TLS through** to cc-ci: the gateway routes by SNI
and forwards the raw encrypted stream without decrypting it, so TLS still **terminates on cc-ci's
Traefik**. Consequences the agent must respect:
- `dig <sub>.ci.commoninternet.net` returns the **gateway's** IP, not cc-ci's — do not assert the
record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci),
not by comparing A records.
- The gateway is assumed to passthrough the **whole wildcard**, so a fresh per-run subdomain needs
**no gateway change** and **no cert work** (the pre-issued wildcard already covers it) — the
agent only adds the Traefik **router** on cc-ci. (If the gateway
instead needs per-host config, that's an operator/gateway concern and a `## Blocked` item, not
something the agent reconfigures — the gateway is not ours, only cc-ci is, per §9.)
- The gateway is operator-managed and out of scope; the agent configures only cc-ci.
- **Caveat for TLS-passthrough recipes** (e.g. `bluesky-pds`, §2 D10): the default path terminates
TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's
Traefik configured to passthrough that host too (the outer gateway already passes the whole
wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough
recipe for that D10 category and record the swap in `DECISIONS.md` — not a silent omission.
- **Wildcard TLS — operator pre-issues, agent serves it statically (no token in the agent).**
Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is
still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. **The cert
is pre-provisioned out-of-band** so the DNS-editing token never enters the agent/repo. A wildcard
SAN cert covering **`*.ci.commoninternet.net` + `ci.commoninternet.net`** (issued via Let's
Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci:
- `/var/lib/ci-certs/live/fullchain.pem` (leaf+intermediate) and `…/privkey.pem`.
- **Traefik is the real `coop-cloud/traefik` recipe, deployed via abra** (for e2e fidelity — see
§4.2), run in its **wildcard / file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`).
The pre-issued cert is supplied as the recipe's `ssl_cert`/`ssl_key` **swarm secrets** (sourced
from the files above); the recipe's file provider then serves it under `tls.certificates`. **No
ACME resolver / no DNS provider** is enabled — only the cert+key reach cc-ci, never the DNS token.
One cert covers every per-run subdomain (matched by SNI), so a new app domain needs no cert work.
- **Renewal is a manual operator task** (LE 90-day cert): the operator re-issues out-of-band, then
updates the `ssl_cert`/`ssl_key` secret (bump its version) and redeploys traefik. The agent must
**not** attempt ACME/DNS-01 for `commoninternet.net` and must **not** expect a DNS token — a
missing/expired cert is an operator action surfaced as a finding, not something the agent re-issues.
(Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning
run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard.
We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.)
- Record the live facts in `docs/install.md`: the zone + DNS provider (Gandi), that the wildcard
`*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) point at the **gateway**, that the
gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the
wildcard cert is pre-issued/operator-renewed at `/var/lib/ci-certs/live/` (no DNS token on cc-ci).
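The per-run hostname scheme `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net` can be built and
validated in one place. A minimal sketch, assuming an 8-character short SHA (the plan does not fix
a length) and a hypothetical helper name `run_domain`; the 63-character check is the standard DNS
label limit:

```python
import re

CI_ZONE = "ci.commoninternet.net"
SHORT_SHA_LEN = 8  # assumption: "<short-sha>" length is not specified in the plan

_LABEL_RE = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def run_domain(recipe: str, pr: int, sha: str) -> str:
    """Build the unique app domain for one CI run."""
    sha = sha.lower()
    if len(sha) < SHORT_SHA_LEN or not re.fullmatch(r"[0-9a-f]+", sha):
        raise ValueError(f"not a usable commit SHA: {sha!r}")
    label = f"{recipe.lower()}-pr{pr}-{sha[:SHORT_SHA_LEN]}"
    # A single DNS label is limited to 63 characters of [a-z0-9-].
    if not _LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid DNS label: {label!r}")
    return f"{label}.{CI_ZONE}"
```

Because the label sits directly under `ci.commoninternet.net`, the pre-issued wildcard cert covers
it and only the Traefik router on cc-ci needs to be added.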
### 4.1 The `!testme` trigger path
Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on
push/PR-open, not on a magic comment. So:
```
PR comment "!testme"
│ Gitea webhook (issue_comment event) ──► comment-bridge (modules/comment-bridge.nix)
│ • verifies webhook HMAC secret
│ • checks comment body == "!testme" (exact, trimmed)
│ • checks commenter is allowed (org member / collaborator)
│ • resolves PR head repo + SHA via Gitea API
│ • calls Drone API: build for cc-ci pipeline,
│ params RECIPE=<repo> REF=<sha> PR=<n> SRC=<headrepo>
Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py
Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail.
```
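The first two bridge checks in the diagram can be sketched as below. This assumes the webhook
signature arrives as a hex HMAC-SHA256 of the raw request body (Gitea's `X-Gitea-Signature` header;
re-verify against the installed Gitea version). The function names are illustrative, not the
bridge's actual API:

```python
import hashlib
import hmac

TRIGGER = "!testme"


def signature_ok(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of a hex HMAC-SHA256 signature over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    return hmac.compare_digest(expected.encode(), received.encode())


def is_trigger(comment_body: str) -> bool:
    """Exact match after trimming: '!testme please' does not trigger."""
    return comment_body.strip() == TRIGGER
```

The signature must be computed over the bytes exactly as received, before any JSON parsing, and
a failed check should reject the request without further processing.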
- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS
systemd service behind Traefik at e.g. `ci.commoninternet.net/hook` (§4.0).
- **Trigger: POLLING is primary; webhook is an optional, admin-registered push optimization
(SETTLED).** Hard constraint: **the CI server/bot must run on READ-level access — never repo-admin.**
- **Polling (primary, default):** the bridge polls the Gitea API for new `!testme` comments on
enrolled repos at ≤60s (satisfies D1). This is **outbound** (cc-ci → git.autonomic.zone, the
reliably-working direction) and needs only **read**. It is the source of truth for triggering.
- **Webhook (optional):** the bridge keeps its `/hook` endpoint so a Gitea `issue_comment` webhook,
**if present**, gives lower latency. But the **server does NOT self-register webhooks** (that
needs repo-admin, which we refuse to require). Registration is a **manual admin task, documented**
in `docs/enroll-recipe.md` (URL `https://ci.commoninternet.net/hook`, event `issue_comment`,
content-type `json`, the shared HMAC secret, and the note that the Gitea instance must allow the
host). The two paths must not compound: a comment seen by both polling and the webhook triggers
exactly one run.
- (Webhook delivery on this instance was flaky early on — `last_status: None` — so polling being
primary is also the robust choice, not just the low-privilege one.)
- **Commenter auth via org membership (read-level — no admin).** The repo's explicit collaborator
list is empty: the bot *and* the maintainers (`trav`/`notplants`) all reach the repo as
`recipe-maintainers` **org members/owners**, so `GET /collaborators/{user}` 404s for everyone, and
`GET /collaborators/{user}/permission` would authorize correctly but **requires repo-admin** — which
we refuse. Instead authorize with **`GET /orgs/recipe-maintainers/members/{user}`** (204 = member =
authorized; 404 = rejected) — readable by any **org member** (read-level), verified to admit
`trav`/`notplants`/the bot and reject non-members. Note `public_members` is hidden here, so use the
authenticated `members` endpoint (bot must be an org member, still read-level). Fail-closed on
error. Zero-privilege fallback: a configured allowlist of usernames. (Still satisfies §6's
non-collaborator-rejection check.)
- Enrollment = adding the recipe to the bridge's **poll list** + ensuring a `tests/<recipe>/` dir
exists. The bot needs only **read** on the recipe repo (+ comment-back to post status). Registering
a webhook is **optional and operator/admin-side** (documented in `enroll-recipe.md`), never required
for CI to work.
- **Recipe mirror+PR flow (how a recipe gets a testable PR).** Recipe repos under test live on the
**private mirror** `git.autonomic.zone/recipe-maintainers/<recipe>`, mirrored from the **official
upstream `git.coopcloud.tech`**. To bring a recipe under CI: `abra recipe fetch <recipe>` (pulls
from upstream into `~/.abra/recipes/<recipe>`), then mirror it to the org + open a PR via the
**recipe mirror+PR procedure** — reference implementation:
`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` (creates `recipe-maintainers/<recipe>`
if absent, force-syncs `main` from upstream so the PR diff is clean, pushes a branch, opens the PR).
`!testme` on that PR is what kicks off a run. So a recipe missing from the mirror is **not** a
blocker — mirror it first.
- If an admin registers the optional webhook, decide and record in `DECISIONS.md`: one shared Gitea
org-level webhook vs per-repo webhooks. Org-level is fewer moving parts; per-repo is more explicit.
Default: per-repo, registered manually per `docs/enroll-recipe.md`.
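Two of the rules above are small enough to sketch directly: the fail-closed mapping of the
members-endpoint status to an authorization decision, and the "fire once" rule for a comment seen
by both polling and the webhook. This assumes both paths expose the same Gitea comment id; the
class and function names are hypothetical, and a real bridge would persist the seen ids across
restarts:

```python
def commenter_authorized(status_code: int) -> bool:
    """204 from the org members endpoint admits; 404 and every error reject."""
    return status_code == 204


class TriggerDeduper:
    """Remembers comment ids that already started a run (in memory only)."""

    def __init__(self) -> None:
        self._seen: set[int] = set()

    def should_fire(self, comment_id: int) -> bool:
        """Return True the first time a comment id is seen, False afterwards."""
        if comment_id in self._seen:
            return False
        self._seen.add(comment_id)
        return True
```

Re-commenting `!testme` creates a new comment with a new id, so D1's "re-commenting re-runs" still
holds under this scheme.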
### 4.2 Drone + the test target
- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the
**exec runner** (or a privileged docker runner) running **on cc-ci itself**, because tests must
drive `abra` to deploy real recipes onto a real swarm.
- cc-ci doubles as the **deploy target**: single-node Docker Swarm + abra, with the reverse proxy
provided by the **real `coop-cloud/traefik` recipe deployed via abra** (not a hand-rolled Traefik
— chosen for **end-to-end fidelity**: test apps route through the exact proxy a real Co-op Cloud
host uses — `web`/`web-secure` entrypoints, the `proxy` overlay, the swarm provider). TLS
terminates on it using the **pre-issued static wildcard cert** (§4.0): run the recipe in
**wildcard/file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`) and supply the
cert as the recipe's `ssl_cert`/`ssl_key` swarm secrets from `/var/lib/ci-certs/live/`. The
operator preconfigures the wildcard DNS (→ gateway), the gateway's TLS-passthrough, and the cert
itself (§4.4); the agent deploys the traefik recipe + swarm on top — **no ACME, no DNS token on
cc-ci**. Make the `abra app new/deploy traefik` steps reproducible (scripted/Nix-invoked) for D8.
- Each CI run gets an isolated app domain `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`
(§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
- **Concurrency cap + queue — use Drone natively (SETTLED).** Don't let the server fill with
simultaneously-deployed apps. Expose a configurable **`MAX_TESTS`** mapped to the exec runner's
**`DRONE_RUNNER_CAPACITY`** (Nix-set on the runner; default low, **1 to 2**, given a single 28 GiB
node and heavy recipes like matrix/immich). Drone runs at most `MAX_TESTS` builds at once and
**automatically queues** excess builds (its native pending-build queue), starting them as slots
free. **Per-build timeout** (repo/runner timeout) guarantees a hung test is killed and frees its
slot — so "continue once a current test finishes *or times out*" is built in. No custom queue
needed. Optionally also set `concurrency: { limit: <N> }` in `.drone.yml` as a per-pipeline cap.
- **One app at a time per run, torn down at run end.** A build deploys its recipe, runs the three
stages, then **undeploys** — the server should not accumulate live test apps. Guaranteed teardown
+ the run-start janitor (§4.3) enforce this even when a build is timed-out/killed (in-process
cleanup can't run, so the janitor reaps it).
### 4.3 The test harness & recipe test contract
`runner/run_recipe_ci.py` orchestrates per run:
1. Fetch recipe at `$REF` (the PR head) via abra/git.
2. **Install stage** → `tests/<recipe>/test_install.py`: `abra app new`, generate secrets,
`abra app deploy`, wait healthy, run Playwright smoke + assertions.
3. **Upgrade stage** → deploy previous published version first, then upgrade to `$REF`; assert
data survives and app still healthy.
4. **Backup/restore stage** → `abra app backup`, mutate state, `abra app restore`, assert restored
state matches pre-mutation.
5. **Recipe-local tests (D4)** → if `<recipe-repo>/tests/` exists, discover & run it in the same
live environment; merge results.
6. **Teardown (always, even on failure)** → `abra app undeploy`, `abra app volume remove`,
`abra app secret remove`, DNS/route cleanup.
Shared fixtures (`tests/conftest.py` + `runner/harness/`) wrap abra. **Known abra gotchas to bake
in from day one** (carried over from prior work, re-verify on the installed abra version):
- `abra app undeploy` and `abra app volume remove` do **not** accept `--chaos` → never pass it.
- Plumb a `timeout` kwarg through secret-generate/insert/remove-all calls.
- `abra app ls -S -m` returns nested `{server: {apps: [...]}}` — parse the inner structure.
- Pick robust health checks per app (e.g. Keycloak: `/realms/master`, not `/`).
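The nested `abra app ls -S -m` shape noted above can be flattened once in the harness. A sketch
assuming only the `{server: {apps: [...]}}` nesting stated here; the per-app field name `domain`
in the usage below is hypothetical and should be checked against the installed abra version:

```python
import json


def flatten_apps(abra_ls_json: str) -> list[dict]:
    """Turn {server: {"apps": [...]}} into a flat list of app dicts.

    Each returned dict gains a "server" key unless the app already has one.
    Servers with a missing or null "apps" entry contribute nothing.
    """
    data = json.loads(abra_ls_json)
    apps = []
    for server, entry in data.items():
        for app in (entry or {}).get("apps") or []:
            apps.append({"server": server, **app})
    return apps
```

For example, `[a["domain"] for a in flatten_apps(output)]` would list every deployed domain across
servers, if the apps carry a `domain` field.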
The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and
add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours.
**Crucially, the janitor is the backstop for timed-out/killed builds:** when Drone hits the
per-build timeout (or a build is cancelled) it may SIGKILL the runner process, so the `try/finally`
teardown can't run — those orphaned apps/volumes are reaped by the next build's run-start janitor
(and the janitor should run regardless of how the previous build ended). Net effect with the
`MAX_TESTS`/`DRONE_RUNNER_CAPACITY` cap (§4.2): at most `MAX_TESTS` apps are ever live at once, and
each is torn down (or janitor-reaped) so the single node never accumulates deployments.
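The two layers of the teardown guarantee can be sketched as follows: a `try/finally` around the
stages for the normal path, and a janitor filter for apps orphaned by a killed build. This is an
illustration under assumptions: the helper names are hypothetical, the source of each app's
creation time is not specified by the plan, and recording every stage independently (rather than
stopping at the first failure) is a choice made here, not a requirement stated above:

```python
import re
from datetime import datetime, timedelta, timezone

# Matches per-run domains of the form <recipe>-pr<n>-<short-sha>.<zone>
PR_APP_RE = re.compile(r"-pr\d+-[0-9a-f]+\.")


def run_stages(stages, teardown):
    """Run (name, callable) stages in order; teardown always runs afterwards."""
    results = {}
    try:
        for name, stage in stages:
            try:
                stage()
                results[name] = "pass"
            except Exception as exc:  # a failed stage is recorded, not fatal
                results[name] = f"fail: {exc}"
    finally:
        teardown()  # runs on failure too, but not if the process is SIGKILLed
    return results


def orphaned_apps(apps, now, max_age_hours):
    """Select per-run app domains older than the cutoff.

    `apps` is an iterable of (domain, created_at) pairs with timezone-aware
    datetimes; how created_at is obtained is left to the harness.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return [d for d, created in apps if PR_APP_RE.search(d) and created < cutoff]
```

The janitor would call `orphaned_apps(...)` at run start and undeploy each result, which covers
the SIGKILL case that `finally` cannot.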
### 4.4 Secrets (D6)
There are **two distinct classes of secret** and they are handled in opposite ways. Do not
conflate them.
**(A) Infra secrets.** All of these end up `sops-nix`-encrypted in `secrets/`, are decrypted at
activation to restricted paths outside the Nix store (the store itself is world-readable; sops-nix
defaults to `/run/secrets`), and are never world-readable. They split into two sub-classes (see
§1.5 for the concrete locations/usage), and only the first sub-class blocks:
- **(A1) External inputs — provided by the operator, the loop cannot create them.** The Tailscale
auth key + Gitea bot creds (`/srv/cc-ci/.testenv`, already provided), the **pre-issued wildcard
TLS cert** at `/var/lib/ci-certs/live/` (§4.0 — *not* a DNS token; the agent serves it, never
issues it), and **registry pull creds** (if needed). If one of these is **missing or invalid, the
loop is blocked** — write it to `STATUS.md ## Blocked` and stop (§9). The agent must not invent or
work around an external input it wasn't given, and must **not** attempt ACME/DNS-01 for
`commoninternet.net`.
- **(A2) Internal secrets — the loop generates and manages these itself; never block on them.**
Drone RPC secret + webhook HMAC (`openssl rand`), the Gitea OAuth app for Drone (created via the
bot API), and the cc-ci host age/GPG key for sops. These are *not* human inputs; generate, store
in `secrets/`, and wire both ends.
Alongside these, three **preconfigured network/cert facts** are operator-provided inputs the loop
also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise"
sense): (1) the wildcard `*.ci.commoninternet.net` record (and bare `ci.commoninternet.net`) already
points at the **gateway**, (2) the gateway **TLS-passthroughs** that wildcard to cc-ci (SNI-routed,
no decryption — see §4.0 Network path), and (3) the **pre-issued wildcard cert** is in place at
`/var/lib/ci-certs/live/`. The operator owns the DNS record, the gateway, and cert issuance/renewal;
**everything else on cc-ci is the agent's job** — Traefik (pointed at the static cert), swarm,
per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach
cc-ci, or the cert is missing/expired, that is a `## Blocked` condition (operator action), not
something to work around (the gateway and DNS are not ours to reconfigure, per §9).
**(B) Recipe app secrets — generated by the test, persisted within the run.** These are NOT a
blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under
test and is responsible for persisting them across the run so the multi-stage lifecycle works:
- **Generate at install:** the harness runs `abra app secret generate` (+ inserts any deterministic
test fixtures like an admin password / test user it chooses) when it deploys the app.
- **Persist for the run's duration:** the *same* generated secrets must survive across stages —
install → upgrade and especially **backup → restore** — because an app cannot be upgraded or
restored against rotated credentials. Persist them in a per-run secret store keyed by the run's
unique app name (e.g. `<recipe>-pr<n>-<sha>`): the live abra/swarm secrets plus a sidecar record
the harness writes (e.g. the app's `.env` + the generated values) to a run-scoped, non-public
location on the runner, so any stage can re-read them. They are ephemeral by design.
- **Destroy at teardown:** the same teardown that removes the app/volumes also runs
`abra app secret remove` (with `timeout` plumbed) and deletes the per-run sidecar. Nothing
generated for a run outlives that run.
- **How the harness should "figure out" persistence (acceptance for D6):** decide and document one
concrete mechanism — recommended default is "abra/swarm holds the live secrets; the harness keeps
a run-scoped sidecar file under a `runs/<app-name>/` dir on the runner (mode 600), and reloads
from it between stages." Whatever is chosen, it must (1) keep the same values stable across all
three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind.
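
One way the recommended run-scoped sidecar could look is sketched below. It is a sketch under stated assumptions, not the chosen mechanism: the `secrets.json` filename, the JSON encoding, and the function names are hypothetical; only the `runs/<app-name>/` layout and mode 600 come from the text above.

```python
import json
import os
import shutil
from pathlib import Path


def sidecar_path(runs_root: Path, app_name: str) -> Path:
    # One directory per unique app name keeps concurrent runs isolated.
    return runs_root / app_name / "secrets.json"  # hypothetical filename


def save_secrets(runs_root: Path, app_name: str, secrets: dict) -> Path:
    """Write the generated values once, at install, with owner-only access."""
    path = sidecar_path(runs_root, app_name)
    path.parent.mkdir(mode=0o700, parents=True, exist_ok=True)
    # Open with 0600 from the start so the values are never briefly readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)  # also tighten a pre-existing file
    with os.fdopen(fd, "w") as fh:
        json.dump(secrets, fh)
    return path


def load_secrets(runs_root: Path, app_name: str) -> dict:
    """Reload the same values between stages (upgrade, backup, restore)."""
    return json.loads(sidecar_path(runs_root, app_name).read_text())


def destroy_secrets(runs_root: Path, app_name: str) -> None:
    """Called from teardown: nothing generated for a run outlives it."""
    shutil.rmtree(runs_root / app_name, ignore_errors=True)
```

Keying everything on the unique app name addresses requirement (2), reloading rather than regenerating addresses (1), and removing the whole directory at teardown addresses (3).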
**(C) Drone CI tokens:** store as Drone org/repo secrets, referenced by the pipeline. Where a value
is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is
generated — see the (A) split above.
Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results
UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that
greps published logs and the overview site for known secret patterns and any generated app
password.
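
A redaction filter and the matching leak check might be sketched like this. The pattern list and the `[REDACTED]` placeholder are illustrative assumptions; a real filter would need to cover every secret format actually in use.

```python
import re

MASK = "[REDACTED]"

# Illustrative generic pattern: KEY=value pairs whose key mentions a secret.
KEY_VALUE = re.compile(r"(?i)(\w*(?:password|secret|token)\w*=)\S+")


def redact(text: str, known_values=()) -> str:
    """Mask known generated values first, then generic secret-looking pairs."""
    # Longest first, so a value that contains another value is masked whole.
    for value in sorted(known_values, key=len, reverse=True):
        if value:
            text = text.replace(value, MASK)
    return KEY_VALUE.sub(lambda m: m.group(1) + MASK, text)


def find_leaks(text: str, known_values) -> list:
    """Adversary-side check: which known secret values still appear verbatim?"""
    return [v for v in known_values if v and v in text]
```

The Adversary test would run `find_leaks` over the published logs and the overview site with the run's generated values and expect an empty list.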
### 4.5 Results UX (D7) — YunoHost-CI-like
- **Per-run logs:** Drone's native UI already gives live, per-stage, tail-able logs and a final
status — use it as the canonical run view; the PR comment links to it.
- **Overview page:** a small generator (`dashboard/`) polls the Drone API and renders a static
page at `ci.commoninternet.net` (§4.0): a table of enrolled recipes, latest run status badge
(pass/fail/running), last-tested version, link to history — mirroring the YunoHost app-list
feel. Served by Traefik; regenerated on build-completion webhook or a short timer.
- Provide a status badge endpoint per recipe for embedding in recipe READMEs.
---
## 5. Milestones / initial BACKLOG
Work top-down; each milestone ends with an **Adversary gate** (Adversary must independently
verify the acceptance check before the next milestone starts). Seed `BACKLOG.md` from this.
- **M0 — Foundations.** Repo created; flake builds; `nixos-rebuild` (or deploy-rs) applies a
no-op-then-base config to cc-ci; sops decrypts a test secret on the host.
*Accept:* `ssh cc-ci 'systemctl is-system-running'` healthy after a rebuild from the repo.
- **M1 — Swarm + abra target.** Docker + single-node swarm + `proxy` network; the **`coop-cloud/traefik`
recipe deployed via abra** (wildcard/file-provider mode, serving the pre-issued cert — §4.0/§4.2,
not a custom Traefik); abra can deploy and tear down a trivial recipe by hand.
*Accept:* a recipe deployed via abra is reachable over HTTPS (valid wildcard cert) on the
`web-secure` entrypoint at `*.ci.commoninternet.net`, then fully torn down leaving no volumes; the
proxy is verifiably the traefik recipe and **no DNS/ACME token is present on cc-ci**.
- **M2 — Drone online.** Drone server+runner via Nix, OAuth to Gitea; a hello-world `.drone.yml`
in cc-ci runs green; logs visible in Drone UI.
*Accept:* push to cc-ci triggers a visible green Drone build.
- **M3 — Comment bridge.** `!testme` on a PR triggers a parameterized Drone build; bridge posts a
PR comment with the run link; non-`!testme` comments and non-collaborators are ignored.
*Accept:* live demo on a scratch PR — comment in, build out, link back, auth enforced.
- **M4 — Harness + install stage.** `run_recipe_ci.py` + conftest; install stage green for one
simple recipe end-to-end with a Playwright assertion; guaranteed teardown.
*Accept:* full green install run for recipe #1, no orphaned app/volume afterward.
- **M5 — Upgrade + backup/restore stages.** Add the other two stages for recipe #1.
*Accept:* upgrade preserves data; backup→mutate→restore returns original state.
- **M6 — Recipe-local tests (D4) + second recipe.** Discover/run recipe-repo `tests/`; enroll a
second, DB-backed recipe via the documented flow.
*Accept:* both recipes green; recipe-local tests demonstrably executed and merged.
- **M6.5 — Breadth ramp.** Enroll recipes 3→6 covering the remaining D10 categories, one at a
time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N
should be template-copy + recipe-specific tests/fixtures, with **no harness surgery**). Expect
per-recipe quirks — multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough,
large-volume backups — and absorb them into the *shared* harness, not one-off per-recipe hacks.
When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling
`sleep`s. Run benchmarks/long deploys **sequentially**, never in parallel (network contention).
*Accept:* recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to
shared harness code.
- **M7 — Secrets hardening (D6).** Full sops model, rotation doc, log redaction + leak test.
*Accept:* Adversary's secret-grep over published logs finds nothing; rotation doc followed.
- **M8 — Dashboard (D7).** Overview page + badges + PR-comment outcome reflection.
*Accept:* overview matches reality across several runs; outcomes mirrored to PR comments.
- **M9 — Reproducibility + docs (D8/D9).** `docs/install.md` rebuilds the server from scratch on a
blank VM; all docs complete.
*Accept:* Adversary rebuilds from docs onto a throwaway host (or records the tested subset).
- **M10 — Proof (D10).** All six chosen recipes green via real `!testme` PRs (the breadth set from
M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore
exercised and Adversary-verified; flip `STATUS.md` to DONE.
---
## 6. The two agents
### Builder (primary)
Implements the backlog top-down. Discipline:
- One backlog item in flight at a time. Small, committed, reversible steps.
- Every change verified against the *real* system (server, Drone, Gitea) before claiming done —
never "should work". Paste the verifying command + output into `JOURNAL.md`.
- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated
production servers; never reuse production domains. Idempotent server changes only (via Nix).
- If blocked on access/secrets/external state, write it to `STATUS.md ## Blocked` and pick up an
unblocked item rather than hacking around it.
### Adversary (reviewer)
Runs as a **separate, independent loop in its own process/sandbox** (see §6.1 for how the two
loops coordinate). Its job is to **disbelieve**. It:
- Re-verifies each `Definition of Done` and milestone-acceptance claim independently, from a cold
start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in `REVIEW.md`.
- Actively tries to break things: comment `!testmexyz` (should NOT trigger), comment as a
non-collaborator (should be rejected), push a PR that fails tests (must report red, not green),
kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets,
run two `!testme`s concurrently (no domain/volume/secret collision), confirm the same generated
app secrets persist across install→upgrade→backup/restore.
- Files every defect as a `BACKLOG.md` item tagged `[adversary]` with repro steps. The Builder
may not close an adversary item; only the Adversary closes it after re-test.
- Has veto power over `STATUS.md → DONE`.
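
The first two break-it probes above imply a trigger predicate along these lines. This is a hedged sketch: matching `!testme` as a whole line of the comment, and the `collaborators` set being supplied by the caller, are assumptions about how the bridge decides, not a description of its actual code.

```python
def should_trigger(comment_body: str, commenter: str, collaborators: set) -> bool:
    """True only for an exact `!testme` line written by a collaborator."""
    is_command = any(line.strip() == "!testme" for line in comment_body.splitlines())
    return is_command and commenter in collaborators
```

An exact comparison rather than a prefix or substring test is what makes `!testmexyz` a non-trigger.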
### 6.1 Coordination protocol (two independent loops, one shared repo)
The two loops never talk directly; the **git repo is the only coordination medium**. Each agent
has its own clone (e.g. Builder in `/srv/cc-ci/cc-ci`, Adversary in `/srv/cc-ci/cc-ci-adv`) and
its own pacing. To make concurrent writes conflict-free:
- **File ownership (one writer each — the other only reads):**
- Builder owns: all source code/config, `STATUS.md`, `JOURNAL.md`, `DECISIONS.md`.
- Adversary owns: `REVIEW.md`.
- `BACKLOG.md` is split into two H2 sections — `## Build backlog` (Builder-only) and
`## Adversary findings` (Adversary-only). Each agent edits **only its own section**, so git
merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
closed after re-test.
- `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops
have deliberately decided not to do autonomously and that need operator input to move on.**
Append-only; either agent may file. Each entry should clearly say *what's needed from the
operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call,
plain "go ahead"). The list is **open-ended** — items can sit indefinitely, **no obligation
to close every item**, closure is operator-driven. A re-entry trigger / IDEA cross-link is
**optional** (include when there's a natural mechanism, e.g. an opt-in flag in
`cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" / buried JOURNAL note
— file them here so the operator can review the whole list. The Phase-4 cleanup pass should
**surface** DEFERRED.md to the operator at least once but does **not** force closure.
Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED
is for considered-and-parked work the loops won't tackle without operator input.
- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
conflict. Prefer appending over rewriting.
- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous
adversarial verification requires the Adversary NOT to consume the Builder's rationalisations
before forming its verdict. The split:
- `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding
verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW**
to verify (the exact command/check the Adversary can re-run from its own clone), the
**EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and
**WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify,
it goes in STATUS.
- `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative /
dead-ends explored. Those go in `JOURNAL.md` (Builder-private to write).
- The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code /
git history; it forms its verdict from those + its own **cold** acceptance run, and does **not**
read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine
(e.g. to contextualise a finding) — note in REVIEW that you did.
In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.**
- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
smallest change, commit, `git push`. On a rebase conflict, it will be inside the *other* agent's
file/section only if a rule was broken — re-pull and keep to your own files. Never `--force`.
- **Gate handshake via STATUS.md + commit-prefix signalling.** When the Builder believes a milestone
gate is met, it sets in `STATUS.md`: `Gate: <Mn> — CLAIMED, awaiting Adversary`, **commits it with a
`claim(...)` prefix**, and stops advancing past it. The Adversary runs the acceptance check cold and
writes the verdict to `REVIEW.md` (`<Mn>: PASS @<ts>` with evidence, or `FAIL` + an `[adversary]`
item), **committed with a `review(...)` prefix**. The Builder only proceeds past the gate after
seeing `PASS` in `REVIEW.md`.
- **The watchdog signals the handoff off these commit prefixes** (not by parsing prose): a new
`claim(...)` commit on origin/main pings the Adversary; a new `review(...)` commit pings the
Builder. So the prefixes are **load-bearing** — a gate claim MUST be a `claim(...)` commit and a
verdict MUST be a `review(...)` commit, or the counterpart isn't promptly woken (it falls back to
its slower self-poll). STATUS/REVIEW remain the durable source of truth; the prefix is the signal.
- **Clean tree before claim.** The Builder runs `git status` before claiming — the working tree
must be clean (everything committed AND pushed). The Adversary cold-verifies from a fresh clone,
so an uncommitted/un-pushed change that only exists on the Builder's host (e.g. a locally-built
fix) is a guaranteed cold-verify mismatch. Commit + push first, then claim.
- **DONE handshake.** Builder may write `## DONE` to `STATUS.md` **only** when `REVIEW.md` shows a
PASS dated within 24h for every D1–D10. The Adversary can write `## VETO <reason>` to
`REVIEW.md` at any time, which forbids DONE until cleared.
- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
doing independent work — neither loop blocks idle waiting on the other beyond its gate.
- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to
the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for
early-look, a "I refactored X, please re-verify Y", an observation outside the normal flow). For
those, use the inbox files in `machine-docs/`:
- **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its
own clone, commits, pushes.
- **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its
own clone, commits, pushes.
- The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings
the receiver. The receiver, on receipt, reads + processes the message, then **deletes the
inbox file** (commits + pushes) — deletion is the "message consumed" signal. Single-writer
discipline: only the sender writes their counterpart's inbox; only the receiver deletes it.
- **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this
while I keep going," "I observed Y outside our normal flow," "I'm taking a long e2e now."
**Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md`
still owns those). The inbox is a side-channel, not a replacement.
(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
the intended design.)
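
The commit-prefix signalling described above can be illustrated with a small classifier. This is a sketch only: the function name and return values are hypothetical, and it assumes nothing about the commit subject beyond the `claim(...)` / `review(...)` prefix.

```python
import re

# Only the prefix is load-bearing; whatever follows it is free-form.
PREFIX = re.compile(r"^(claim|review)\([^)]*\)")


def handoff_target(subject: str):
    """Return which loop a new commit on origin/main should wake, or None."""
    match = PREFIX.match(subject)
    if match is None:
        return None
    return "adversary" if match.group(1) == "claim" else "builder"
```

Anchoring at the start of the subject means an ordinary commit that merely mentions a claim in its text does not wake the counterpart.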
---
## 7. The Loop Protocol
Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On
every wake, `git pull --rebase` first, then:
1. **Orient.** Read `STATUS.md` (phase, in-flight item, gate claims, blockers), `BACKLOG.md`, and
the tail of `REVIEW.md`. Reconcile with reality via cheap probes (Drone health, last build,
`git log`) — never trust the ledger blindly; if it disagrees with the system, fix the ledger
first (your own files only — see §6.1).
2. **Select.**
- *Builder:* highest-priority open item in `## Build backlog`: unresolved `[adversary]`
findings > current milestone's next task > next milestone. Never advance past a milestone gate
until `REVIEW.md` shows its PASS.
- *Adversary:* any standing `Gate: <Mn> CLAIMED` in `STATUS.md` to verify > re-verify a D1–D10
gate whose last PASS is stale (>24h) > a fresh break-it probe from §6.
3. **Act.** Smallest change that advances the item. Builder verifies against the real system;
Adversary verifies from a cold start. Commit with a clear message (author per repo convention).
4. **Record (your own files only).** *Builder:* append to `JOURNAL.md` (what you did + verifying
command/output + next), update `STATUS.md`, tick `## Build backlog`. *Adversary:* append PASS/
FAIL + evidence to `REVIEW.md`, add/close items in `## Adversary findings`. Then `git push`.
5. **Gate handshake (§6.1).** Builder, on reaching a milestone, sets `Gate: <Mn> CLAIMED, awaiting
Adversary` in `STATUS.md` and works on other unblocked items meanwhile. Adversary clears it with
a `REVIEW.md` verdict. No gate is "passed" without a logged PASS.
6. **Decide continuation.** Builder writes `## DONE` only when `REVIEW.md` shows a <24h PASS for
every D1–D10 and no standing `## VETO`. Otherwise schedule the next wake.
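
The DONE condition in step 6 reduces to a small predicate. A minimal sketch, assuming the PASS timestamps have already been parsed out of `REVIEW.md` into a mapping (the parsing itself and the names used here are hypothetical):

```python
from datetime import datetime, timedelta

REQUIRED = [f"D{i}" for i in range(1, 11)]  # D1 through D10
FRESHNESS = timedelta(hours=24)


def may_write_done(last_pass: dict, veto_standing: bool, now: datetime) -> bool:
    """True only if every D1..D10 has a PASS newer than 24h and no VETO stands."""
    if veto_standing:
        return False
    return all(
        gate in last_pass and now - last_pass[gate] < FRESHNESS
        for gate in REQUIRED
    )
```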
**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
the *specific* thing. Three cases:
1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to
stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do
**NOT** `ScheduleWakeup` for the expected total runtime of the task in a single big sleep — a 25
min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes
you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy
that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle.
- **Recommended pattern for long deploys/convergence (builder, 2026-05-30):** **arm a `Monitor`**
that polls the node every ~30s and **wakes you on convergence OR failure**, with a **longer
fallback heartbeat** (`ScheduleWakeup`) as a backstop if the Monitor never fires. This proceeds
the *instant* the deploy converges (no over-waiting if it finishes early) and surfaces a failure
promptly, while the heartbeat bounds the wait if the condition is never met. Size the convergence
timeout sanely — longer than a few minutes if a recipe genuinely needs it, but **never absurd**
(e.g. the ~40-min ghost timeout was excessive). Beats both a single big blind sleep and a fixed
coarse poll.
2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,
the watchdog pings the Adversary to verify *now*; the moment the Adversary updates `REVIEW.md`
(verdict/finding), it pings the Builder to proceed (`launch.sh`, ~30 s detection). So you may sleep
while blocked and trust the ping — but keep a **fallback self-poll on a modest cadence (~2–4 min)**
in case a ping is missed (a dead session is restarted by the watchdog and re-orients from the repo
anyway). The goal: a pending handoff resolves in well under a minute, not a whole idle interval.
3. **Genuinely idle, nothing pending from either loop** → sleep in chunks of **at most 10 min**, then
re-wake and re-orient; if still nothing, sleep another ≤10 min. **Never a single wait > 10 min**
(600 s) — see the liveness rule below.
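
The "wake on convergence OR failure, with a bounded backstop" logic from case 1 can be sketched as a plain polling helper. This illustrates the control flow only; it is not the `Monitor` or `ScheduleWakeup` tooling itself, and the function name, state strings, and default values are assumptions.

```python
import time


def poll_until(check, interval_s=30.0, timeout_s=600.0,
               sleep=time.sleep, clock=time.monotonic) -> str:
    """Poll check() until it reports an outcome or the timeout expires.

    check() must return "converged", "failed", or "pending". The sleep and
    clock arguments are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        state = check()
        if state in ("converged", "failed"):
            return state  # proceed (or surface the failure) immediately
        if clock() >= deadline:
            return "timeout"  # the backstop bounds the wait
        sleep(interval_s)
```

Returning as soon as `check()` reports an outcome is what avoids both over-waiting on an early finish and discovering a dead deploy only at the end of a long sleep.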
Notes: **The Adversary may idle freely when nothing is pending — it should NOT pointlessly re-verify
or busy-poll to look busy.** It gets woken by the watchdog the instant the Builder claims a gate, so
"start verifying very soon after the Builder waits" is handled by the signal, not by the Adversary
spinning. **The Builder** should prefer keeping an unblocked backlog item in hand so it's rarely
*fully* blocked on a gate; only hit case 2 when everything is genuinely gated behind the pending
verification — and then rely on the watchdog ping (+ fallback poll) rather than a long idle.
**Liveness marker & max-wait (the watchdog ENFORCES this).** Every wait is capped at **10 minutes**;
to wait longer, wake at 10 min, re-check, and wait again. **Immediately before going idle for any
wait, your FINAL output line MUST be exactly:**
WAITING-UNTIL: <ISO-8601 UTC>
— the moment you intend to resume (≤10 min out, matching your `ScheduleWakeup`). Compute it from the
clock, e.g. `date -u -d '+10 min' +%FT%TZ`. The watchdog uses this to tell a healthy wait from a
wedge: if it sees a loop **idle ≥5 min with no current `WAITING-UNTIL` marker as its last message, OR
idle past the time the marker named, it kills + reboots that loop** (which then re-orients from git +
its STATUS/REVIEW files). So always leave a fresh marker before sleeping, and never overrun it.
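
The marker rule and the watchdog's wedge test described above can be sketched together. This is an illustrative model, not the watchdog's code: the function names are hypothetical, and the timestamp format is assumed to be the `%FT%TZ` form shown in the `date` example.

```python
from datetime import datetime, timedelta, timezone

MAX_WAIT = timedelta(minutes=10)
IDLE_GRACE = timedelta(minutes=5)
PREFIX = "WAITING-UNTIL: "
FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def waiting_until_line(now: datetime, wait: timedelta) -> str:
    """Build the final output line; now must be UTC. Waits are capped at 10 min."""
    return PREFIX + (now + min(wait, MAX_WAIT)).strftime(FORMAT)


def parse_marker(line: str):
    """Return the resume time named by a marker line, or None if it is not one."""
    if not line.startswith(PREFIX):
        return None
    try:
        stamp = datetime.strptime(line[len(PREFIX):], FORMAT)
    except ValueError:
        return None
    return stamp.replace(tzinfo=timezone.utc)


def is_wedged(last_line: str, idle_for: timedelta, now: datetime) -> bool:
    """Wedged = idle >= 5 min without a marker, or idle past the marker's time."""
    resume = parse_marker(last_line)
    if resume is None:
        return idle_for >= IDLE_GRACE
    return now > resume
```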
**Proactive compaction.** If your context usage climbs high (≳80%), run `/compact` *before*
continuing — your state lives in git + the phase STATUS/REVIEW files, so compaction is lossless for
the loop and prevents wedging (garbled output, failed tool calls) near the context limit.
**Anti-drift guards.**
- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`,
and try a different approach or mark blocked. No thrashing.
- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file
a finding — do not delete the assertion. (This is the single most important rule; the Adversary
watches specifically for tests being softened or skipped.)
- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild
reproduces it.
- Don't expand scope beyond §2. New ideas → `BACKLOG.md` (tagged `[idea]`), not into this run.
---
## 8. Open decisions to settle early (log in DECISIONS.md)
- Deploy mechanism: `nixos-rebuild --target-host` vs `deploy-rs`/`colmena`. (Default: deploy-rs
for atomic rollbacks; nixos-rebuild fine if simpler.)
- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.)
- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.)
- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.)
- Reverse proxy / Wildcard TLS: **SETTLED — deploy the real `coop-cloud/traefik` recipe via abra
(for e2e fidelity), in wildcard/file-provider mode, serving the operator's pre-issued wildcard
cert; no ACME, no token** (§4.0/§4.2). Supersedes the original plan's hand-rolled
`modules/traefik.nix`. The operator issued the wildcard SAN cert (`*.ci.commoninternet.net` +
`ci.commoninternet.net`) via LE DNS-01/Gandi out-of-band into `/var/lib/ci-certs/live/`; the agent
feeds it as the recipe's `ssl_cert`/`ssl_key` swarm secrets so the DNS-editing token never reaches
cc-ci. **Manual renewal** ~90 days (next ~2026-08-24): re-issue → update the secret → redeploy.
- Proof recipe set (D10 — six, category-spanning). Default candidates, all previously verified
deployable: `hedgedoc`, `cryptpad`, `keycloak`, `authentik`, `lasuite-docs`/`lasuite-drive`,
`matrix-synapse`, `immich`, `bluesky-pds`. Lock the final six early so M4–M6.5 build toward them.
Sequence easy→hard: prove the pipeline on `hedgedoc`/`cryptpad` before tackling SSO, S3, media
stores, and TLS-passthrough recipes.
Each default stands until the Adversary or reality forces a change; record the change and why.
---
## 9. Guardrails / hard rules
- **Access boundary:** only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when
enrolling) add a webhook — nothing else. Never push to a recipe repo's code.
- **No secrets in git/logs/UI.** Ever. Verified by the Adversary's leak test.
- **No mocks for the e2e stages.** D2 means real deploys. If something can't be tested for real,
it's a finding, not a pass.
- **Idempotent + reversible.** Anything done to the server must be re-derivable from the repo.
Infra bring-up is **declarative idempotent reconciliation in Nix** — not manual post-steps and not
run-once scripts. Each piece (swarm + `proxy` net, the traefik recipe deploy, Drone, the
comment-bridge, the dashboard) is a systemd **oneshot that re-runs on every activation/boot** and
*converges* to the desired state (inspect → act only if needed → no-op if already correct), like
`swarm-init`. **No `/var/lib/.bootstrapped`-style sentinels** (they don't self-heal drift). The
goal: a from-scratch install is `git clone` + `nixos-rebuild switch` + the operator preconditions
— `docs/install.md` must never accumulate manual post-rebuild steps.
- **Stop on missing *external* infra inputs** (class-A1 in §4.4: cc-ci SSH/root access, the
Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at `/var/lib/ci-certs/live/`,
registry creds — and the preconfigured DNS/gateway facts) rather than improvising around them;
surface in `STATUS.md ## Blocked`. **Never** attempt ACME/DNS-01 for `commoninternet.net` — the
cert is pre-provided and renewed out-of-band by the operator. **This does NOT apply to** internal
infra secrets (class-A2: Drone RPC, webhook HMAC, Gitea OAuth app, host age key — the agent
generates these) or to recipe app secrets (class-B): those the test harness generates itself
(`abra app secret generate` + chosen fixtures), persists for the run, and destroys at teardown — a
missing app secret is never a blocker, it is something the harness creates. See §4.4.
- **Real abra deploys; abra convergence by default; custom readiness only if it's a real test.**
Deploys/upgrades use the **real abra commands** (`abra app deploy`/`upgrade`) — never bypass abra
with `docker service update`/`scale`. **Prefer abra's own convergence checks.** Only skip abra's
post-deploy convergence monitor (`-c`/`--no-converge-checks`) and substitute a **harness READY_PROBE**
when abra's monitor genuinely doesn't fit (e.g. its window is too short for a heavy app and it FATAs
on a deploy that *does* converge). When you do: the deploy is still real abra (only abra's *waiting*
is replaced), and the probe MUST be a **genuinely strict** readiness test — all services N/N **plus**
a real app-level check — that **RAISES on actual non-readiness**, never a no-op that masks a failed
deploy. **Prove it has teeth** (a negative test that fails on stuck convergence, e.g. F2-12's
P7-negative). The Adversary treats a custom probe as a potential test-weakening until cold-verified.
- **Custom cc-ci compose overlays — avoid where possible, justify each, prefer upstream.** A
cc-ci-authored compose overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) risks **drift**
from the recipe users actually run, so **avoid it where possible and justify each use**. In most
cases the cleaner fix is an **upstream recipe PR** — either a genuine robustness fix, or exposing a
knob the recipe should expose. **But a single, uniform, optional `compose.ccci.yml` overlay file per
recipe is an acceptable fallback** — especially for a value abra/compose can't take from an env var.
(One fixed filename per recipe — `compose.ccci.yml` — holding all cc-ci-side deploy tweaks; not
per-purpose suffixes.)
**Known limitation (builder, 2026-05-30): abra does NOT support an env value for a healthcheck
`start_period`.** So the ghost/discourse `start_period` bumps legitimately **need** the overlay (an
env-var PR is not possible for that field) — these overlays **stay**, justified. When you do use an
overlay: keep it **minimal + single-purpose**, **document WHY in the file header** (the exact abra/
upstream limitation that forces it), have the **Adversary confirm it doesn't weaken a test or mask a
recipe defect**, and **file the upstream PR where the fix genuinely belongs** (e.g. if a recipe's
`start_period` is too tight for any slow host, propose raising it upstream too).
- **Upgrade tier: always test the upgrade to the LATEST version.** Don't drop the upgrade test just
because the *from* (older) version is awkward. If an older from-version can't be fully deployed/tested
(its image tag was pulled from the registry, or it predates an overlay/feature), you do **NOT** need
that older version's **custom tests** to run — deploy it minimally (a justified overlay is fine) or
pick the nearest deployable prior, then **upgrade to latest and run the full assertions on the
latest**. Skipping a from-version's custom tests is an honest, recorded outcome; skipping the
upgrade-to-latest is not. (See `plan-ccci-compose-overlay-policy.md` for the per-recipe disposition.)
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
with the output. The loop's value depends entirely on the ledgers being true.
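
The "genuinely strict" READY_PROBE rule above could take a shape like the following. This is a sketch under stated assumptions: the `replicas` mapping and the `app_check` callable are hypothetical inputs the harness would gather itself (for example from the swarm service listing and an HTTP health check), and `NotReady` is an illustrative exception name.

```python
class NotReady(Exception):
    """Raised when the deployed app is not actually ready."""


def ready_probe(replicas: dict, app_check) -> None:
    """Raise NotReady unless all services are N/N AND the app-level check passes.

    replicas maps service name -> (running, desired).
    app_check is a zero-argument callable returning True when the app answers.
    """
    if not replicas:
        raise NotReady("no services found for this app")
    short = {
        name: counts for name, counts in replicas.items()
        if counts[1] == 0 or counts[0] != counts[1]
    }
    if short:
        raise NotReady(f"services not at N/N: {short}")
    if not app_check():
        raise NotReady("services are up but the app-level check failed")
```

A negative test that feeds the probe a stuck service (for example `(0, 1)`) and expects `NotReady` is the kind of evidence that shows the probe has teeth.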