The orchestrator Pi is retired (2026-05-31). All agents now run on the cc-ci-orchestrator VM (NixOS, loops user, /srv/cc-ci). The VM is a direct tailnet peer to cc-ci — no SOCKS proxy, no userspace tailscaled, no ProxyCommand.

Updated across all affected files:

AGENTS.md
- Remove Pi from reboot description; migration complete (not "parked")
- cc-ci access: direct ssh, not via proxy

kickoff.md
- Prerequisites: direct tailnet peer, not proxy
- Host deps: NixOS (not apt)
- Fallback/Incus: b1 reachable directly, no --proxy curl flag

plan.md §1 + §1.5
- §1 bootstrap: direct SSH, check tailscale status (not restart proxy)
- §1.5 intro: "VM" not "sandbox host"; no proxy
- Credentials table: remove TS_AUTH_KEY row; update cc-ci SSH row
- Replace "Tailscale connection (proxy)" subsection with direct-peer description

plan-orchestrator-migration.md
- Mark COMPLETE (2026-05-31); historical record only

plan-phase1c-full-reproducibility.md
- Incus access: direct, not via SOCKS proxy

prompts/builder.md + prompts/adversary.md
- cc-ci access language only: direct ssh, no proxy restart instructions
- adversary: *.ci.commoninternet.net via plain curl, no proxy flag

REBOOTS.md
- Retitle for VM; note Pi retired; Pi entries marked historical

systemd/cc-ci-loops.service
- User/Group/HOME/PATH: notplants → loops
- Remove cc-ci-tailscaled.service dependency (no proxy on VM)
- Add note about nix/configuration.nix as the authoritative VM declaration

test-e2e-testme-acceptance.md
- tailscale status: no --socket flag
- ssh to throwaway: no ProxyCommand

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# cc-ci — Co-op Cloud Recipe CI Server (Autonomous Build Plan)

**Status:** ACTIVE — autonomous loop
**Owner agent:** Builder (primary) + Adversary (reviewer)
**Source brief:** `brief.md` (do not edit; this file supersedes it)
**This file's canonical path:** `/srv/cc-ci/cc-ci-plan/plan.md`
**Target server:** `cc-ci` (NixOS)
**Code/config home:** `git.autonomic.zone/recipe-maintainers/cc-ci` (the CI project repo — distinct from this
`/srv/cc-ci/cc-ci-plan/` planning+launch folder)
**Last updated:** keep current via `STATUS.md` (see §7)

---

## 0. How to read this document

This plan is written to be handed to an **autonomous Claude agent running in a sandbox over
several days**, driving itself in a loop until the CI server is "done" per §2. A second agent
(the **Adversary**) independently tries to disprove every "done" claim. Neither agent is
trusted to mark its own work complete.

If you are an agent waking up into this loop for the first time, go straight to **§1 Bootstrap**.
On every subsequent wake, go to **§7 The Loop Protocol** and continue from `STATUS.md`.

The rest of the document (§3–§6) is the technical design. Treat it as the default architecture,
but you are allowed to revise it when reality disagrees — record any deviation in `DECISIONS.md`
with a one-line rationale.

---

## 1. Bootstrap (first wake only)

Do these in order. Each step is idempotent; re-running is safe.

1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.)
   - `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no
     separate sudo step. `ssh cc-ci` reaches cc-ci directly (the orchestrator VM is a direct tailnet
     peer — no proxy; key `~/.ssh/cc-ci-root-ed25519`). If it fails, check `tailscale status`
     before declaring blocked.
   - `ssh cc-ci 'nixos-version'` — confirm NixOS.
   - Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5):
     `curl -s https://$GITEA_URL/api/v1/version`. The bot authenticates with
     `GITEA_USERNAME`/`GITEA_PASSWORD` (basic auth) or a token you mint from them via
     `POST /api/v1/users/<user>/tokens` — do **not** expect a ready-made `$GITEA_TOKEN`.
   - Confirm the **preconfigured** test-app DNS (§4.0/§4.4): a random subdomain under the wildcard
     resolves, e.g. `getent hosts probe-$RANDOM.ci.commoninternet.net` returns the **gateway's** IP
     (not cc-ci's — the gateway TLS-passthroughs to cc-ci, so do not expect cc-ci's address; and use
     `getent`, not `dig`, since this host's resolver is Tailscale-only — see §1.5).
     Traefik is *not* up yet — you deploy it at M1 (the real `coop-cloud/traefik` recipe via abra,
     wildcard/file-provider mode → the pre-issued cert at `/var/lib/ci-certs/live/`, **no ACME**);
     the DNS record + gateway passthrough + cert are the preconditions, and full end-to-end HTTPS
     reachability is proven at M1, not now.
     If the wildcard does not resolve at all, that's a `## Blocked` item (operator fixes DNS/gateway).
   - If any check fails, write the failure to `STATUS.md` under `## Blocked` and stop — a human must fix access. Do **not** try to work around missing access.

2. **Create the `cc-ci` repo** on git.autonomic.zone if it does not exist. Push an initial
   skeleton (see §3 layout). The Builder clones to `/srv/cc-ci/cc-ci`; the Adversary loop keeps
   its **own independent clone** at `/srv/cc-ci/cc-ci-adv`. The repo is the only channel between
   the two loops (§6.1) — loop state lives inside it (`STATUS.md`, `BACKLOG.md`, etc.).

3. **Snapshot the starting environment** into `cc-ci/docs/baseline.md`: current NixOS config on
   the server (`/etc/nixos` or existing flake), installed packages, whether Docker/Swarm/abra
   already exist, DNS that already points at the box. This is the rollback reference.

4. **Seed the loop state files** (§7) if absent: `STATUS.md`, `BACKLOG.md`, `REVIEW.md`,
   `JOURNAL.md`, `DECISIONS.md`. Give `BACKLOG.md` two H2 sections — `## Build backlog`
   (populated from §5 milestones) and `## Adversary findings` (empty) — per the single-writer
   rule in §6.1.

5. Commit ("chore: bootstrap cc-ci loop state") and begin the loop at §7.
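
The token-minting call mentioned in step 1 can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name, the token name, and the scope string are hypothetical, and the scope vocabulary accepted by `POST /api/v1/users/<user>/tokens` depends on the Gitea version installed. The request is built but not sent, so no secret leaves the process.

```python
import base64
import json
import urllib.request


def build_token_request(gitea_url, username, password, token_name, scopes):
    """Build (but do not send) a basic-auth request that mints an API token.

    `token_name` and `scopes` are illustrative; check the scope names against
    the Gitea version actually running on git.autonomic.zone.
    """
    credentials = f"{username}:{password}".encode()
    body = json.dumps({"name": token_name, "scopes": list(scopes)}).encode()
    return urllib.request.Request(
        url=f"https://{gitea_url}/api/v1/users/{username}/tokens",
        data=body,
        method="POST",
        headers={
            # Basic auth from GITEA_USERNAME / GITEA_PASSWORD (never log this header).
            "Authorization": "Basic " + base64.b64encode(credentials).decode(),
            "Content-Type": "application/json",
        },
    )
```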

---

## 1.5 Credentials & access — where everything lives and how to use it

The loops run **on the cc-ci-orchestrator VM** (`100.116.55.106`, NixOS, `loops` user) and reach
cc-ci directly over Tailscale (direct tailnet peer — no proxy). This section is the authoritative
map of what credentials exist, where, and how to use them. **Never copy any secret value into the
repo, a commit, a log, or the dashboard** (§9) — reference locations only.

### Provided credentials (already in place)

| What | Where | How to use |
|---|---|---|
| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; `Host cc-ci` in `~/.ssh/config` (HostName `100.90.116.4`, no ProxyCommand) | Just run `ssh cc-ci` (logs in as **root**). The orchestrator VM is a direct tailnet peer — direct route, no proxy. Pubkey already in cc-ci's `/root/.ssh/authorized_keys`. |
| **Gitea bot account** | `/srv/cc-ci/.testenv` → `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to push the `cc-ci` project repo, read recipe repos, comment on PRs, and poll for `!testme` (read-level; the bot does not register webhooks). |

Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values).

### The Tailscale connection (how `ssh cc-ci` works)

cc-ci (`cc-nix-test`, `100.90.116.4`) is on the same tailnet as the orchestrator VM
(`taila4a0bf.ts.net`), so it is reached **directly** — no SOCKS proxy, no userspace tailscaled.
The VM's system tailscaled is on that tailnet; `ssh cc-ci` routes straight to cc-ci.

- `ssh cc-ci` works out of the box (`~/.ssh/config` has `Host cc-ci` pinned to `100.90.116.4`,
  no ProxyCommand; key `~/.ssh/cc-ci-root-ed25519`; logs in as root).
- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the VM, use plain `curl` — no proxy flag
  needed. The VM uses public DNS resolvers (`1.1.1.1`/`8.8.8.8`) so `*.ci.commoninternet.net`
  resolves normally.
- **If `ssh cc-ci` fails:** run `tailscale status` (as loops or root) to confirm the VM is still
  on the tailnet and cc-ci is listed; check `systemctl status tailscaled`. A connectivity failure
  is recoverable, not an immediate `## Blocked`-and-stop, unless the VM has lost tailnet membership
  entirely (then that IS a class-A1 blocker).
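
The triage rule in the last bullet could be automated roughly as below. This is a sketch under assumptions: it expects the parsed output of `tailscale status --json`, and the field names used (`BackendState`, `Peer`, `TailscaleIPs`, `Online`) reflect my understanding of that JSON and should be re-checked against the installed Tailscale version. The return labels are hypothetical.

```python
def classify_tailnet(status, peer_ip):
    """Classify an `ssh cc-ci` failure from parsed `tailscale status --json`.

    Returns one of three illustrative labels:
      "blocked-a1": the VM itself is no longer running on the tailnet
      "ok":         cc-ci is listed and online (look elsewhere for the fault)
      "retry":      cc-ci is missing or offline; recoverable, so retry later
    """
    if status.get("BackendState") != "Running":
        return "blocked-a1"
    for peer in (status.get("Peer") or {}).values():
        if peer_ip in (peer.get("TailscaleIPs") or []):
            return "ok" if peer.get("Online") else "retry"
    return "retry"
```

Only `"blocked-a1"` would justify writing a `## Blocked` entry; the other two outcomes leave the loop running.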

### Credentials the loop GENERATES itself (do not wait on a human for these)

- **Drone RPC secret** and **webhook HMAC secret** — generate (`openssl rand -hex 32`), store
  sops-encrypted in `secrets/`, and wire both ends. Internal shared secrets, not human inputs.
- **Gitea OAuth app for Drone** — create it under the bot account via the API
  (`POST /api/v1/user/applications/oauth2`); capture client id/secret into `secrets/`.
- **cc-ci host age/GPG key for sops** — generate on the host (or derive from its SSH host key);
  add as a sops recipient. Keep a recovery copy of the master age identity off-box if desired.
- **Per-recipe app secrets** (class-B, §4.4) — the harness generates these per run.
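
For reference, the Python standard-library equivalent of `openssl rand -hex 32` is `secrets.token_hex(32)`: 32 random bytes rendered as 64 hex characters. The sketch below generates the two shared secrets from the first bullet; the dictionary key names are hypothetical, and the values must still be sops-encrypted before anything is written under `secrets/`.

```python
import secrets


def generate_shared_secrets():
    """Return fresh values for the two internal shared secrets.

    Key names are illustrative only. The plaintext returned here must never be
    committed or logged; encrypt it with sops first.
    """
    return {
        "drone_rpc_secret": secrets.token_hex(32),     # 32 bytes -> 64 hex chars
        "webhook_hmac_secret": secrets.token_hex(32),
    }
```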

### Credentials STILL NEEDED from the operator (class-A — block if missing, per §9)

- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
  (`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0).
  > **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer
  > an out-of-band operator file-drop — it is now **sops-encrypted in the private `cc-ci-secrets` repo**
  > (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays
  > operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts
  > the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key
  > at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`.

  The agent feeds the cert and key into the `coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key`
  swarm secrets (wildcard/file-provider mode) and runs **no ACME** for this domain. **Do not request
  or expect a `commoninternet.net` DNS token** — issuance/renewal is handled out-of-band by the
  operator (LE 90-day cert; next renewal ~2026-08-24). A missing/expired cert is a finding for the
  operator, not an agent re-issue.
- **Registry pull credentials** (e.g. Docker Hub) — *recommended* to avoid anonymous pull-rate
  limits breaking deploys under load. Treat a rate-limit failure traced to this as a finding, then
  request creds. Store sops-encrypted in `secrets/`.
- **Gitea bot permissions** (a grant, not a secret) — **least privilege: read, not admin.** The bot
  needs: write on its own `recipe-maintainers/cc-ci` project repo; **read** + **comment** on the
  recipe repos under test; and **org membership** in `recipe-maintainers` (read-level — used both to
  authorize commenters via the members endpoint and to read members). It does **not** need repo-admin
  and does **not** register webhooks (that's an optional manual admin task, §4.1). If a needed grant
  is missing, that's a `## Blocked` item for the operator.

---

## 2. Definition of Done (the loop's exit condition)

The loop terminates **only** when every item below is true *and the Adversary has independently
re-verified each one within the last 24h* (logged in `REVIEW.md` with timestamps and command
output). Partial credit does not count.

- [ ] **D1 — Trigger.** Commenting `!testme` on any open PR in any enrolled recipe repo on
  git.autonomic.zone starts a CI run for the code *at that PR's head commit* within 60s.
  Other comments do not. Re-commenting re-runs.
- [ ] **D2 — Test matrix.** For a recipe under test, the run executes, as separate reported
  stages: **new install**, **upgrade** (previous published version → PR version), and
  **backup + restore**. All are genuine end-to-end against a really-deployed recipe (real
  containers, real Traefik routing, real volumes) — no mocks, no stubs.
- [ ] **D3 — Python + Playwright.** Tests are Python. Functional assertions that require a
  browser use Playwright against the live deployed app.
- [ ] **D4 — Recipe-local tests.** If the recipe repo contains its own `tests/` folder, those
  tests are also discovered and run as part of the same CI run, with results merged in.
- [ ] **D5 — Per-recipe test tree.** The cc-ci repo holds `tests/<recipe>/` with the
  install/upgrade/backup tests as Python files, plus a shared harness. Adding a new recipe is
  a documented, small, repeatable operation.
- [ ] **D6 — Secrets.** App + infra secrets are handled reproducibly (committed encrypted,
  decrypted on the server), documented, and rotatable. No plaintext secrets in git, logs, or
  the results UI.
- [ ] **D7 — Results UX.** Each run has a stable URL with live, tail-able logs per stage and a
  final pass/fail; there is an overview page listing recipes with their latest status —
  look-and-feel comparable to the YunoHost app CI (`ci-apps.yunohost.org`). A PR comment links
  back to its run and reflects the outcome.
- [ ] **D8 — Reproducible server.** The entire server (Drone, runner, comment bridge, swarm,
  Traefik, dashboard, secrets wiring) is declared in the `cc-ci` repo's NixOS flake and can be
  rebuilt from scratch onto a blank NixOS host following `docs/install.md`, verified by the
  Adversary doing exactly that on a throwaway VM (or documenting why a full from-scratch
  rebuild was infeasible and what was tested instead).
- [ ] **D9 — Documentation.** `README.md` + `docs/` explain architecture, how to enroll a recipe,
  how to add/run tests locally, how to operate/rotate secrets, and how to debug a failed run.
  A new engineer can enroll a recipe and get a green run using only the docs.
- [ ] **D10 — Proof (breadth).** At least **six real recipes** spanning the meaningful
  categories have a full green run triggered by `!testme` on a real PR, with all three stages
  (install / upgrade / backup+restore) actually exercised. The set must cover:
  a stateless/simple app, a single-DB app, a multi-service app, an SSO/identity app, and an
  object-storage/large-volume app. **Target set (all previously verified deployable):**
  `hedgedoc` (simple), `cryptpad` (stateful, no external DB), `keycloak` + `authentik`
  (SSO/identity, DB-backed), `lasuite-docs` and/or `lasuite-drive` (multi-service + S3/MinIO),
  `matrix-synapse` (DB + media store), `immich` (large volumes + Postgres), `bluesky-pds`
  (TLS-passthrough/atproto). Pick six that together satisfy the categories; record the chosen
  set and per-recipe green-run evidence in `REVIEW.md`. Any recipe that genuinely cannot be CI'd
  is a documented finding (in `DECISIONS.md`) with the reason, not a silent omission.
  *Recipe availability:* the testable repos live on the **private mirror**
  `git.autonomic.zone/recipe-maintainers/<recipe>` (already mirrored as of bootstrap:
  `bluesky-pds`, `cryptpad`, `keycloak`, `lasuite-docs`, `lasuite-meet`, `matrix-synapse`, `n8n`,
  `custom-html`, `custom-html-tiny`). Any recipe **not** yet mirrored (e.g. `hedgedoc`,
  `authentik`, `immich`, `lasuite-drive`) is pulled from upstream **git.coopcloud.tech** and
  created on the mirror via the **recipe mirror+PR flow** (§4.1) — so the target set is not capped
  by what currently exists. If the chosen simple/stateless app isn't mirrored, `custom-html` /
  `custom-html-tiny` already are.

When all of D1–D10 hold and are Adversary-verified, write `## DONE` to `STATUS.md` with the
evidence links and stop scheduling new iterations.
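
The exit condition above (every item true and re-verified within the last 24h) amounts to a freshness gate. A minimal sketch follows, assuming the Adversary's verification timestamps have already been parsed out of `REVIEW.md` into a mapping; that parsing step, and the function and constant names, are hypothetical.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_ITEMS = [f"D{i}" for i in range(1, 11)]  # D1 .. D10
MAX_AGE = timedelta(hours=24)


def stale_items(verified_at, now):
    """Return the items that block `## DONE`.

    `verified_at` maps an item id (e.g. "D8") to the timezone-aware datetime of
    its latest Adversary verification. An item blocks if it was never verified
    or if its verification is older than 24 hours.
    """
    stale = []
    for item in REQUIRED_ITEMS:
        stamp = verified_at.get(item)
        if stamp is None or now - stamp > MAX_AGE:
            stale.append(item)
    return stale
```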

---

## 3. Repository layout (`git.autonomic.zone/recipe-maintainers/cc-ci`)

```
cc-ci/
├── README.md
├── flake.nix                    # NixOS host(s) + devshell
├── flake.lock
├── hosts/
│   └── cc-ci/
│       ├── configuration.nix    # the cc-ci machine
│       └── hardware.nix
├── modules/
│   ├── drone.nix                # Drone server + runner (exec/docker)
│   ├── comment-bridge.nix       # !testme webhook listener service
│   ├── swarm.nix                # Docker + single-node swarm + `proxy` net; deploys the
│   │                            # coop-cloud/traefik recipe via abra (wildcard/file-provider, §4.2)
│   ├── dashboard.nix            # results overview site
│   └── secrets.nix              # sops-nix / agenix wiring
├── secrets/                     # sops-encrypted (*.enc / *.age); see §4.4
│   └── secrets.yaml
├── bridge/                      # comment-bridge source (small Go/Python service)
├── runner/                      # CI orchestration entrypoint invoked by Drone
│   ├── run_recipe_ci.py         # top-level: deploy→test→teardown for a recipe@ref
│   └── harness/                 # shared pytest fixtures (abra wrappers, app lifecycle)
├── dashboard/                   # results UI generator (reads Drone API → static site)
├── tests/
│   ├── conftest.py              # shared fixtures, recipe selection, teardown guarantees
│   ├── <recipe>/
│   │   ├── test_install.py
│   │   ├── test_upgrade.py
│   │   ├── test_backup.py
│   │   └── playwright/          # e2e flows for this recipe
│   └── _template/               # copy-to-add-a-recipe template
├── docs/
│   ├── install.md               # from-scratch server build (D8)
│   ├── enroll-recipe.md         # how to add a recipe (D5)
│   ├── secrets.md               # secret model + rotation (D6)
│   ├── architecture.md
│   ├── runbook.md               # debugging failed runs
│   └── baseline.md              # bootstrap snapshot
├── STATUS.md BACKLOG.md REVIEW.md JOURNAL.md DECISIONS.md   # loop state (§7)
└── .drone.yml                   # pipeline for cc-ci's own repo (lint/self-test)
```

---

## 4. Technical design (default architecture)

### 4.0 Domain model (where things live)

Two DNS zones, deliberately separated — do **not** conflate them:

- **`git.autonomic.zone` — source of truth for code (unchanged, not ours to reconfigure).**
  The Gitea host: the enrolled recipe repos and the `cc-ci` config repo live here. The loop reads,
  comments, and (when enrolling) adds a webhook here, but deploys **nothing** here. Per §9 this zone
  is read/comment-only — never push recipe code, never point app DNS at it.
- **`commoninternet.net` — the CI server's own zone; everything CI-facing.** A wildcard
  `*.ci.commoninternet.net` resolves to a **gateway** (not cc-ci directly — see Network path below).
  Under it:
  - **Apps under test:** each run deploys to a unique subdomain
    `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`, so concurrent runs never collide on a
    hostname. The subdomain (app, volumes, secrets, Traefik route) is torn down at run end (§4.3).
  - **Results dashboard:** `ci.commoninternet.net` — overview page + per-recipe status badges (§4.5).
  - **Webhook bridge:** `ci.commoninternet.net/hook` — the Gitea `issue_comment` receiver (§4.1).
- **Network path (gateway → TLS passthrough → cc-ci).** The wildcard record does **not** point at
  cc-ci's IP. It points at a gateway that **passes TLS through** to cc-ci: the gateway routes by SNI
  and forwards the raw encrypted stream without decrypting it, so TLS still **terminates on cc-ci's
  Traefik**. Consequences the agent must respect:
  - `dig <sub>.ci.commoninternet.net` returns the **gateway's** IP, not cc-ci's — do not assert the
    record points at cc-ci. Reachability is proven end-to-end (an HTTPS request lands on cc-ci),
    not by comparing A records.
  - The gateway is assumed to passthrough the **whole wildcard**, so a fresh per-run subdomain needs
    **no gateway change** and **no cert work** (the pre-issued wildcard already covers it) — the
    agent only adds the Traefik **router** on cc-ci. (If the gateway
    instead needs per-host config, that's an operator/gateway concern and a `## Blocked` item, not
    something the agent reconfigures — the gateway is not ours, only cc-ci is, per §9.)
  - The gateway is operator-managed and out of scope; the agent configures only cc-ci.
  - **Caveat for TLS-passthrough recipes** (e.g. `bluesky-pds`, §2 D10): the default path terminates
    TLS at cc-ci's Traefik. A recipe that expects to terminate TLS in its own container needs cc-ci's
    Traefik configured to passthrough that host too (the outer gateway already passes the whole
    wildcard). Treat this as a per-recipe harness quirk to absorb (§5 M6.5), or pick a non-passthrough
    recipe for that D10 category and record the swap in `DECISIONS.md` — not a silent omission.
- **Wildcard TLS — operator pre-issues, agent serves it statically (no token in the agent).**
  Routing and certs are separate: the preconfigured wildcard DNS solves routing only; a cert is
  still needed because the gateway passes TLS through and cc-ci's Traefik terminates it. **The cert
  is pre-provisioned out-of-band** so the DNS-editing token never enters the agent/repo. A wildcard
  SAN cert covering **`*.ci.commoninternet.net` + `ci.commoninternet.net`** (issued via Let's
  Encrypt DNS-01 against Gandi, by the operator, using a token the agent never sees) lives on cc-ci:
  - `/var/lib/ci-certs/live/fullchain.pem` (leaf+intermediate) and `…/privkey.pem`.
  - **Traefik is the real `coop-cloud/traefik` recipe, deployed via abra** (for e2e fidelity — see
    §4.2), run in its **wildcard / file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`).
    The pre-issued cert is supplied as the recipe's `ssl_cert`/`ssl_key` **swarm secrets** (sourced
    from the files above); the recipe's file provider then serves it under `tls.certificates`. **No
    ACME resolver / no DNS provider** is enabled — only the cert+key reach cc-ci, never the DNS token.
    One cert covers every per-run subdomain (matched by SNI), so a new app domain needs no cert work.
  - **Renewal is a manual operator task** (LE 90-day cert): the operator re-issues out-of-band, then
    updates the `ssl_cert`/`ssl_key` secret (bump its version) and redeploys traefik. The agent must
    **not** attempt ACME/DNS-01 for `commoninternet.net` and must **not** expect a DNS token — a
    missing/expired cert is an operator action surfaced as a finding, not something the agent re-issues.
    (Rationale for choosing a wildcard cert over per-subdomain: a wildcard is reused for every churning
    run subdomain and sidesteps LE's 50-certs/week-per-domain limit; only DNS-01 can mint a wildcard.
    We keep that DNS-01 issuance with the operator rather than handing the agent the zone token.)
- Record the live facts in `docs/install.md`: the zone + DNS provider (Gandi), that the wildcard
  `*.ci.commoninternet.net` (and bare `ci.commoninternet.net`) point at the **gateway**, that the
  gateway TLS-passthroughs the wildcard to cc-ci, the gateway's address, the TTL, and that the
  wildcard cert is pre-issued/operator-renewed at `/var/lib/ci-certs/live/` (no DNS token on cc-ci).
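
The per-run hostname scheme `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net` described under "Apps under test" can be sketched as a small helper. The short-SHA length of 7 is an assumption (the plan does not fix it), as are the function and constant names; the 63-character check reflects the DNS limit on a single label.

```python
BASE_DOMAIN = "ci.commoninternet.net"
SHORT_SHA_LEN = 7  # assumption: the plan says "short sha" without giving a length


def run_domain(recipe, pr_number, head_sha):
    """Build the unique per-run app domain for one recipe PR at one commit."""
    label = f"{recipe}-pr{pr_number}-{head_sha[:SHORT_SHA_LEN].lower()}"
    if len(label) > 63:
        # A single DNS label may not exceed 63 characters.
        raise ValueError(f"DNS label too long ({len(label)} chars): {label}")
    return f"{label}.{BASE_DOMAIN}"
```

Because the PR number and commit are both part of the label, two concurrent runs only share a hostname if they test the same commit of the same PR.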

### 4.1 The `!testme` trigger path

Gitea does not natively forward PR-comment events to Drone, and Drone's built-in triggers fire on
push/PR-open, not on a magic comment. So:

```
PR comment "!testme"
   │  Gitea webhook (issue_comment event) ──► comment-bridge (modules/comment-bridge.nix)
   │                                             • verifies webhook HMAC secret
   │                                             • checks comment body == "!testme" (exact, trimmed)
   │                                             • checks commenter is allowed (org member / collaborator)
   │                                             • resolves PR head repo + SHA via Gitea API
   │                                             • calls Drone API: build for cc-ci pipeline,
   │                                               params RECIPE=<repo> REF=<sha> PR=<n> SRC=<headrepo>
   ▼
Drone build (cc-ci repo pipeline, parameterized) ──► runner/run_recipe_ci.py
   ▼
Bridge posts/updates a Gitea PR comment with the run URL and (on completion) pass/fail.
```
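
The first two bridge checks in the diagram (HMAC verification and the exact, trimmed body match) could look like the sketch below. It assumes the signature arrives as a hex-encoded HMAC-SHA256 of the raw request body, which is my understanding of Gitea's `X-Gitea-Signature` header; confirm that against the Gitea version in use. The function names are hypothetical.

```python
import hashlib
import hmac


def signature_is_valid(secret, payload, signature_hex):
    """Check a webhook signature in constant time.

    `secret` and `payload` are bytes; `signature_hex` is the header value.
    """
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())


def is_testme(comment_body):
    """True only when the whole comment, after trimming whitespace, is `!testme`."""
    return comment_body.strip() == "!testme"
```

The same `is_testme` check applies to comments discovered by polling, so both trigger paths agree on what counts as a request.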

- The bridge is a tiny service (Go or Python+FastAPI). Keep it dependency-light; it's a NixOS
  systemd service behind Traefik at e.g. `ci.commoninternet.net/hook` (§4.0).
- **Trigger: POLLING is primary; webhook is an optional, admin-registered push optimization
  (SETTLED).** Hard constraint: **the CI server/bot must run on READ-level access — never repo-admin.**
  - **Polling (primary, default):** the bridge polls the Gitea API for new `!testme` comments on
    enrolled repos at ≤60s (satisfies D1). This is **outbound** (cc-ci → git.autonomic.zone, the
    reliably-working direction) and needs only **read**. It is the source of truth for triggering.
  - **Webhook (optional):** the bridge keeps its `/hook` endpoint so a Gitea `issue_comment` webhook,
    **if present**, gives lower latency. But the **server does NOT self-register webhooks** (that
    needs repo-admin, which we refuse to require). Registration is a **manual admin task, documented**
    in `docs/enroll-recipe.md` (URL `https://ci.commoninternet.net/hook`, event `issue_comment`,
    content-type `json`, the shared HMAC secret, and the note that the Gitea instance must allow the
    host). The two paths must not double-fire: a comment seen by both polling and the webhook
    triggers exactly one run.
  - (Webhook delivery on this instance was flaky early on — `last_status: None` — so polling being
    primary is also the robust choice, not just the low-privilege one.)
- **Commenter auth via org membership (read-level — no admin).** The repo's explicit collaborator
  list is empty: the bot *and* the maintainers (`trav`/`notplants`) all reach the repo as
  `recipe-maintainers` **org members/owners**, so `GET /collaborators/{user}` 404s for everyone, and
  `GET /collaborators/{user}/permission` would authorize correctly but **requires repo-admin** — which
  we refuse. Instead authorize with **`GET /orgs/recipe-maintainers/members/{user}`** (204 = member =
  authorized; 404 = rejected) — readable by any **org member** (read-level), verified to admit
  `trav`/`notplants`/the bot and reject non-members. Note `public_members` is hidden here, so use the
  authenticated `members` endpoint (bot must be an org member, still read-level). Fail-closed on
  error. Zero-privilege fallback: a configured allowlist of usernames. (Still satisfies §6's
  non-collaborator-rejection check.)
- Enrollment = adding the recipe to the bridge's **poll list** + ensuring a `tests/<recipe>/` dir
  exists. The bot needs only **read** on the recipe repo (+ comment-back to post status). Registering
  a webhook is **optional and operator/admin-side** (documented in `enroll-recipe.md`), never required
  for CI to work.
- **Recipe mirror+PR flow (how a recipe gets a testable PR).** Recipe repos under test live on the
  **private mirror** `git.autonomic.zone/recipe-maintainers/<recipe>`, mirrored from the **official
  upstream `git.coopcloud.tech`**. To bring a recipe under CI: `abra recipe fetch <recipe>` (pulls
  from upstream into `~/.abra/recipes/<recipe>`), then mirror it to the org + open a PR via the
  **recipe mirror+PR procedure** — reference implementation:
  `/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` (creates `recipe-maintainers/<recipe>`
  if absent, force-syncs `main` from upstream so the PR diff is clean, pushes a branch, opens the PR).
  `!testme` on that PR is what kicks off a run. So a recipe missing from the mirror is **not** a
  blocker — mirror it first.
- Decide and record in DECISIONS.md: one shared Gitea org-level webhook vs per-repo webhooks.
  Org-level is fewer moving parts; per-repo is more explicit. Default: per-repo via enroll script.
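
Two rules from the bullets above lend themselves to a short sketch: the fail-closed mapping of the members-endpoint status code (204 authorizes, everything else rejects) and the requirement that a comment seen by both polling and the webhook triggers only once. The names are hypothetical, and the in-memory set is an assumption; a real bridge would need to persist seen comment ids across restarts.

```python
def commenter_is_authorized(status_code):
    """Map the `GET /orgs/recipe-maintainers/members/{user}` status to a decision.

    Only 204 (member) authorizes. 404 (not a member) and any other outcome,
    including `None` for a failed request, reject: this is the fail-closed rule.
    """
    return status_code == 204


class TriggerDeduper:
    """Remember which comment ids already started a run."""

    def __init__(self):
        self._seen = set()

    def should_trigger(self, comment_id):
        """Return True the first time an id is seen, False on every repeat."""
        if comment_id in self._seen:
            return False
        self._seen.add(comment_id)
        return True
```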

### 4.2 Drone + the test target

- Drone server connects to Gitea via OAuth app (Gitea → Settings → Applications). Runner is the
  **exec runner** (or a privileged docker runner) running **on cc-ci itself**, because tests must
  drive `abra` to deploy real recipes onto a real swarm.
- cc-ci doubles as the **deploy target**: single-node Docker Swarm + abra, with the reverse proxy
  provided by the **real `coop-cloud/traefik` recipe deployed via abra** (not a hand-rolled Traefik
  — chosen for **end-to-end fidelity**: test apps route through the exact proxy a real Co-op Cloud
  host uses — `web`/`web-secure` entrypoints, the `proxy` overlay, the swarm provider). TLS
  terminates on it using the **pre-issued static wildcard cert** (§4.0): run the recipe in
  **wildcard/file-provider mode** (`WILDCARDS_ENABLED=1` + `compose.wildcard.yml`) and supply the
  cert as the recipe's `ssl_cert`/`ssl_key` swarm secrets from `/var/lib/ci-certs/live/`. The
  operator preconfigures the wildcard DNS (→ gateway), the gateway's TLS-passthrough, and the cert
  itself (§4.4); the agent deploys the traefik recipe + swarm on top — **no ACME, no DNS token on
  cc-ci**. Make the `abra app new/deploy traefik` steps reproducible (scripted/Nix-invoked) for D8.
- Each CI run gets an isolated app domain `<recipe>-pr<n>-<short-sha>.ci.commoninternet.net`
  (§4.0) so concurrent runs don't collide. Teardown removes app, secrets, and volumes.
- **Concurrency cap + queue — use Drone natively (SETTLED).** Don't let the server fill with
  simultaneously-deployed apps. Expose a configurable **`MAX_TESTS`** mapped to the exec runner's
  **`DRONE_RUNNER_CAPACITY`** (Nix-set on the runner; default low — **1–2** given a single 28 GiB
  node and heavy recipes like matrix/immich). Drone runs at most `MAX_TESTS` builds at once and
  **automatically queues** excess builds (its native pending-build queue), starting them as slots
  free. **Per-build timeout** (repo/runner timeout) guarantees a hung test is killed and frees its
  slot — so "continue once a current test finishes *or times out*" is built in. No custom queue
  needed. Optionally also set `concurrency: { limit: <N> }` in `.drone.yml` as a per-pipeline cap.
- **One app at a time per run, torn down at run end.** A build deploys its recipe, runs the three
  stages, then **undeploys** — the server should not accumulate live test apps. Guaranteed teardown
  + the run-start janitor (§4.3) enforce this even when a build is timed-out/killed (in-process
  cleanup can't run, so the janitor reaps it).

### 4.3 The test harness & recipe test contract

`runner/run_recipe_ci.py` orchestrates per run:
1. Fetch recipe at `$REF` (the PR head) via abra/git.
2. **Install stage** → `tests/<recipe>/test_install.py`: `abra app new`, generate secrets,
   `abra app deploy`, wait healthy, run Playwright smoke + assertions.
3. **Upgrade stage** → deploy previous published version first, then upgrade to `$REF`; assert
   data survives and app still healthy.
4. **Backup/restore stage** → `abra app backup`, mutate state, `abra app restore`, assert restored
   state matches pre-mutation.
5. **Recipe-local tests (D4)** → if `<recipe-repo>/tests/` exists, discover & run it in the same
   live environment; merge results.
6. **Teardown (always, even on failure)** → `abra app undeploy`, `abra app volume remove`,
   `abra app secret remove`, DNS/route cleanup.
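
The "teardown always, even on failure" contract in step 6 is the familiar `try/finally` shape. The sketch below is a skeleton only: the stage callables stand in for the real abra-driving steps, the names are hypothetical, and continuing to later stages after one fails is an assumption, since the plan only requires that stages be reported separately.

```python
def run_stages(stages, teardown):
    """Run named stages in order and always call `teardown` afterwards.

    `stages` is a list of (name, callable) pairs. A stage that raises is
    recorded as failed and the remaining stages still run (an assumption).
    `teardown` runs exactly once, whether the stages pass, fail, or the loop
    itself is interrupted by an exception.
    """
    results = {}
    try:
        for name, stage in stages:
            try:
                stage()
                results[name] = "pass"
            except Exception as exc:  # a failed stage must not skip teardown
                results[name] = f"fail: {exc}"
    finally:
        teardown()
    return results
```

A `finally` block cannot run if the process is killed outright, which is why the plan pairs it with a run-start janitor.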
|
||

Shared fixtures (`tests/conftest.py` + `runner/harness/`) wrap abra. **Known abra gotchas to bake
in from day one** (carried over from prior work, re-verify on the installed abra version):
- `abra app undeploy` and `abra app volume remove` do **not** accept `--chaos` → never pass it.
- Plumb a `timeout` kwarg through secret-generate/insert/remove-all calls.
- `abra app ls -S -m` returns nested `{server: {apps: [...]}}` — parse the inner structure.
- Pick robust health checks per app (e.g. Keycloak: `/realms/master`, not `/`).
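As an illustration of the nested-output gotcha, a flattening helper might look like this. It is a sketch under assumptions: the only structure taken from above is `{server: {apps: [...]}}`; `flatten_apps`, the `_server` tag, and the `name` field in the sample are hypothetical, and the real field names should be checked against the installed abra version.

```python
import json
from typing import Any, Dict, List


def flatten_apps(raw: str) -> List[Dict[str, Any]]:
    """Flatten `{server: {"apps": [...]}}` into one list of app records.

    Each record is copied and tagged with the server it came from. The
    per-app fields are passed through untouched because their names are
    not specified here.
    """
    data = json.loads(raw)
    flat: List[Dict[str, Any]] = []
    for server, body in data.items():
        # Tolerate a missing or null "apps" key for a server with no apps.
        for app in body.get("apps") or []:
            flat.append({**app, "_server": server})
    return flat


# Made-up sample in the documented shape; "name" is a placeholder field.
sample = '{"cc-ci": {"apps": [{"name": "a"}, {"name": "b"}]}, "other": {"apps": null}}'
apps = flatten_apps(sample)
```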

The teardown guarantee is sacred: a failed test must never leak a deployed app or volume into the
next run. Implement teardown as a pytest fixture finalizer / `try/finally` in the orchestrator and
add a janitor pass at run start that nukes any orphaned `*-pr*` apps older than N hours.
**Crucially, the janitor is the backstop for timed-out/killed builds:** when Drone hits the
per-build timeout (or a build is cancelled) it may SIGKILL the runner process, so the `try/finally`
teardown can't run — those orphaned apps/volumes are reaped by the next build's run-start janitor
(and the janitor should run regardless of how the previous build ended). Net effect with the
`MAX_TESTS`/`DRONE_RUNNER_CAPACITY` cap (§4.2): at most `MAX_TESTS` apps are ever live at once, and
each is torn down (or janitor-reaped) so the single node never accumulates deployments.
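One way to express the janitor's selection rule is as a pure function, so it can be unit-tested without a live swarm. This is a hedged sketch: `select_orphans` is a hypothetical name, and where the deploy timestamps come from (abra, docker, or the per-run sidecar) is left open. Presumably N should exceed the per-build timeout, so that the age threshold never reaps an app belonging to a build that is still running in another slot.

```python
import fnmatch
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def select_orphans(
    deployed_at: Dict[str, datetime],
    now: datetime,
    max_age: timedelta,
    pattern: str = "*-pr*",
) -> List[str]:
    """Return app names matching `pattern` that were deployed more than `max_age` ago."""
    return sorted(
        name
        for name, started in deployed_at.items()
        if fnmatch.fnmatch(name, pattern) and now - started > max_age
    )


# Illustrative data only: app names and ages are made up.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
live = {
    "hedgedoc-pr12-abc": now - timedelta(hours=5),     # orphan from a killed build
    "hedgedoc-pr13-def": now - timedelta(minutes=10),  # still inside a running build
    "traefik": now - timedelta(days=30),               # infra app, never matches
}
stale = select_orphans(live, now, timedelta(hours=2))
```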

### 4.4 Secrets (D6)

There are **two distinct classes of secret** and they are handled in opposite ways. Do not
conflate them.

**(A) Infra secrets.** All of these end up `sops-nix`-encrypted in `secrets/`, are decrypted at
activation (sops-nix writes the plaintext under `/run/secrets`, not into the world-readable Nix
store), and are never world-readable. But they split into two sub-classes — see §1.5
for the concrete locations/usage — and only the first sub-class blocks:

- **(A1) External inputs — provided by the operator, the loop cannot create them.** The Tailscale
  auth key + Gitea bot creds (`/srv/cc-ci/.testenv`, already provided), the **pre-issued wildcard
  TLS cert** at `/var/lib/ci-certs/live/` (§4.0 — *not* a DNS token; the agent serves it, never
  issues it), and **registry pull creds** (if needed). If one of these is **missing or invalid, the
  loop is blocked** — write it to `STATUS.md ## Blocked` and stop (§9). The agent must not invent or
  work around an external input it wasn't given, and must **not** attempt ACME/DNS-01 for
  `commoninternet.net`.
- **(A2) Internal secrets — the loop generates and manages these itself; never block on them.**
  Drone RPC secret + webhook HMAC (`openssl rand`), the Gitea OAuth app for Drone (created via the
  bot API), and the cc-ci host age/GPG key for sops. These are *not* human inputs; generate, store
  in `secrets/`, and wire both ends.

Alongside these, three **preconfigured network/cert facts** are operator-provided inputs the loop
also depends on (not secrets the agent makes, but class-A in the same "provided, don't improvise"
sense): (1) the wildcard `*.ci.commoninternet.net` record (and bare `ci.commoninternet.net`) already
points at the **gateway**, (2) the gateway **TLS-passthroughs** that wildcard to cc-ci (SNI-routed,
no decryption — see §4.0 Network path), and (3) the **pre-issued wildcard cert** is in place at
`/var/lib/ci-certs/live/`. The operator owns the DNS record, the gateway, and cert issuance/renewal;
**everything else on cc-ci is the agent's job** — Traefik (pointed at the static cert), swarm,
per-run subdomain routing, and teardown. If the wildcard does not resolve, the gateway doesn't reach
cc-ci, or the cert is missing/expired, that is a `## Blocked` condition (operator action), not
something to work around (the gateway and DNS are not ours to reconfigure, per §9).

**(B) Recipe app secrets — generated by the test, persisted within the run.** These are NOT a
blocker and are NOT pre-provisioned by a human. The harness creates them itself for each app under
test and is responsible for persisting them across the run so the multi-stage lifecycle works:

- **Generate at install:** the harness runs `abra app secret generate` (+ inserts any deterministic
  test fixtures like an admin password / test user it chooses) when it deploys the app.
- **Persist for the run's duration:** the *same* generated secrets must survive across stages —
  install → upgrade and especially **backup → restore** — because an app cannot be upgraded or
  restored against rotated credentials. Persist them in a per-run secret store keyed by the run's
  unique app name (e.g. `<recipe>-pr<n>-<sha>`): the live abra/swarm secrets plus a sidecar record
  the harness writes (e.g. the app's `.env` + the generated values) to a run-scoped, non-public
  location on the runner, so any stage can re-read them. They are ephemeral by design.
- **Destroy at teardown:** the same teardown that removes the app/volumes also runs
  `abra app secret remove` (with `timeout` plumbed) and deletes the per-run sidecar. Nothing
  generated for a run outlives that run.
- **How the harness should "figure out" persistence (acceptance for D6):** decide and document one
  concrete mechanism — recommended default is "abra/swarm holds the live secrets; the harness keeps
  a run-scoped sidecar file under a `runs/<app-name>/` dir on the runner (mode 600), and reloads
  from it between stages." Whatever is chosen, it must (1) keep the same values stable across all
  three stages, (2) isolate concurrent runs from each other, and (3) leave nothing behind.
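The recommended default could be sketched like this. It assumes a JSON sidecar named `secrets.json` and helper names (`write_sidecar`, `read_sidecar`, `destroy_sidecar`) that are purely illustrative; only the `runs/<app-name>/` layout and mode 600 come from the text above. Keying the directory by the unique app name is what isolates concurrent runs from each other.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path
from typing import Dict


def write_sidecar(runs_root: Path, app_name: str, secrets: Dict[str, str]) -> Path:
    """Write the run's generated secrets to runs/<app-name>/secrets.json, mode 600."""
    run_dir = runs_root / app_name
    run_dir.mkdir(mode=0o700, parents=True, exist_ok=True)
    path = run_dir / "secrets.json"  # file name is an assumption
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        os.fchmod(fh.fileno(), 0o600)  # enforce the mode even if the file pre-existed
        json.dump(secrets, fh)
    return path


def read_sidecar(runs_root: Path, app_name: str) -> Dict[str, str]:
    """Reload the same values in a later stage (upgrade, backup/restore)."""
    with open(runs_root / app_name / "secrets.json") as fh:
        return json.load(fh)


def destroy_sidecar(runs_root: Path, app_name: str) -> None:
    """Teardown: remove the whole run directory so nothing outlives the run."""
    shutil.rmtree(runs_root / app_name, ignore_errors=True)


# Round-trip in a throwaway directory; the values are placeholders.
demo_root = Path(tempfile.mkdtemp())
demo_path = write_sidecar(demo_root, "example-pr1-abc1234", {"admin_password": "placeholder"})
```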

**(C) Drone CI tokens:** store as Drone org/repo secrets, referenced by the pipeline. Where a value
is an external input (A1, e.g. registry creds) it is provided; where it is internal (A2) it is
generated — see the (A) split above.

Hard rule across all classes: scrub secrets from logs before they reach the dashboard; the results
UI shows sanitized logs only. Add a redaction filter in the log pipeline and an Adversary test that
greps published logs and the overview site for known secret patterns and any generated app
password.
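A redaction filter for exact known values could be as small as the sketch below. `build_redactor` and the `[REDACTED]` placeholder are assumptions, and this only covers values the harness already knows (generated app passwords, sidecar contents); pattern-based detection of unknown token formats would need to sit alongside it. Longer values are matched first so a secret that contains another secret as a prefix is replaced whole.

```python
import re
from typing import Callable, Iterable


def build_redactor(
    known_secrets: Iterable[str], placeholder: str = "[REDACTED]"
) -> Callable[[str], str]:
    """Return a function that replaces every known secret value in a log line."""
    # Drop empty strings (they would match everywhere) and try longest values first.
    values = sorted({s for s in known_secrets if s}, key=len, reverse=True)
    if not values:
        return lambda line: line
    pattern = re.compile("|".join(re.escape(v) for v in values))
    return lambda line: pattern.sub(placeholder, line)


# Placeholder values, not real credentials.
redact = build_redactor(["hunter2", "hunter2-long", ""])
clean = redact("login ok pw=hunter2-long token=hunter2")
```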

### 4.5 Results UX (D7) — YunoHost-CI-like

- **Per-run logs:** Drone's native UI already gives live, per-stage, tail-able logs and a final
  status — use it as the canonical run view; the PR comment links to it.
- **Overview page:** a small generator (`dashboard/`) polls the Drone API and renders a static
  page at `ci.commoninternet.net` (§4.0): a table of enrolled recipes, latest run status badge
  (pass/fail/running), last-tested version, link to history — mirroring the YunoHost app-list
  feel. Served by Traefik; regenerated on build-completion webhook or a short timer.
- Provide a status badge endpoint per recipe for embedding in recipe READMEs.
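The rendering half of the overview generator might look like this. The sketch deliberately leaves out the Drone API polling, and `RecipeStatus`, `render_overview`, and the column set are illustrative assumptions rather than the actual `dashboard/` code. Values are HTML-escaped because recipe names and versions originate outside cc-ci.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class RecipeStatus:
    recipe: str
    status: str       # "pass", "fail" or "running"
    version: str      # last-tested version
    history_url: str  # link to the run history


def render_overview(rows: List[RecipeStatus]) -> str:
    """Render a static HTML table, one row per enrolled recipe, sorted by name."""
    body = "\n".join(
        "<tr>"
        f"<td>{escape(r.recipe)}</td>"
        f'<td class="{escape(r.status)}">{escape(r.status)}</td>'
        f"<td>{escape(r.version)}</td>"
        f'<td><a href="{escape(r.history_url)}">history</a></td>'
        "</tr>"
        for r in sorted(rows, key=lambda r: r.recipe)
    )
    header = (
        "<tr><th>Recipe</th><th>Latest run</th>"
        "<th>Last tested version</th><th>History</th></tr>"
    )
    return f"<table>\n{header}\n{body}\n</table>\n"


# Made-up rows; the URLs are placeholders.
page = render_overview([
    RecipeStatus("hedgedoc", "pass", "1.0.0", "https://example.invalid/hedgedoc"),
    RecipeStatus("a<b", "fail", "0.1", "https://example.invalid/x"),
])
```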

---

## 5. Milestones / initial BACKLOG

Work top-down; each milestone ends with an **Adversary gate** (Adversary must independently
verify the acceptance check before the next milestone starts). Seed `BACKLOG.md` from this.

- **M0 — Foundations.** Repo created; flake builds; `nixos-rebuild` (or deploy-rs) applies a
  no-op-then-base config to cc-ci; sops decrypts a test secret on the host.
  *Accept:* `ssh cc-ci 'systemctl is-system-running'` healthy after a rebuild from the repo.
- **M1 — Swarm + abra target.** Docker + single-node swarm + `proxy` network; the **`coop-cloud/traefik`
  recipe deployed via abra** (wildcard/file-provider mode, serving the pre-issued cert — §4.0/§4.2,
  not a custom Traefik); abra can deploy and tear down a trivial recipe by hand.
  *Accept:* a recipe deployed via abra is reachable over HTTPS (valid wildcard cert) on the
  `web-secure` entrypoint at `*.ci.commoninternet.net`, then fully torn down leaving no volumes; the
  proxy is verifiably the traefik recipe and **no DNS/ACME token is present on cc-ci**.
- **M2 — Drone online.** Drone server+runner via Nix, OAuth to Gitea; a hello-world `.drone.yml`
  in cc-ci runs green; logs visible in Drone UI.
  *Accept:* push to cc-ci triggers a visible green Drone build.
- **M3 — Comment bridge.** `!testme` on a PR triggers a parameterized Drone build; bridge posts a
  PR comment with the run link; non-`!testme` comments and non-collaborators are ignored.
  *Accept:* live demo on a scratch PR — comment in, build out, link back, auth enforced.
- **M4 — Harness + install stage.** `run_recipe_ci.py` + conftest; install stage green for one
  simple recipe end-to-end with a Playwright assertion; guaranteed teardown.
  *Accept:* full green install run for recipe #1, no orphaned app/volume afterward.
- **M5 — Upgrade + backup/restore stages.** Add the other two stages for recipe #1.
  *Accept:* upgrade preserves data; backup→mutate→restore returns original state.
- **M6 — Recipe-local tests (D4) + second recipe.** Discover/run recipe-repo `tests/`; enroll a
  second, DB-backed recipe via the documented flow.
  *Accept:* both recipes green; recipe-local tests demonstrably executed and merged.
- **M6.5 — Breadth ramp.** Enroll recipes 3→6 covering the remaining D10 categories, one at a
  time, each via the documented enroll flow (this is the real test of D5: enrolling recipe N
  should be template-copy + recipe-specific tests/fixtures, with **no harness surgery**). Expect
  per-recipe quirks — multi-service deps, S3/MinIO config, SSO client setup, TLS passthrough,
  large-volume backups — and absorb them into the *shared* harness, not one-off per-recipe hacks.
  When flakiness appears, add real readiness/wait robustness to the harness rather than sprinkling
  `sleep`s. Run benchmarks/long deploys **sequentially**, never in parallel (network contention).
  *Accept:* recipes 3–6 each have a full three-stage green run; enrolling N≥3 needed no changes to
  shared harness code.
- **M7 — Secrets hardening (D6).** Full sops model, rotation doc, log redaction + leak test.
  *Accept:* Adversary's secret-grep over published logs finds nothing; rotation doc followed.
- **M8 — Dashboard (D7).** Overview page + badges + PR-comment outcome reflection.
  *Accept:* overview matches reality across several runs; outcomes mirrored to PR comments.
- **M9 — Reproducibility + docs (D8/D9).** `docs/install.md` rebuilds the server from scratch on a
  blank VM; all docs complete.
  *Accept:* Adversary rebuilds from docs onto a throwaway host (or records the tested subset).
- **M10 — Proof (D10).** All six chosen recipes green via real `!testme` PRs (the breadth set from
  M6/M6.5 carried through the hardened pipeline), each with install/upgrade/backup-restore
  exercised and Adversary-verified; flip `STATUS.md` to DONE.
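For the readiness robustness asked for in M6.5, a polling helper along these lines is one option instead of fixed `sleep`s. It is a sketch: `wait_until`, `NotReady`, and the injectable `clock`/`sleep` parameters are hypothetical names, chosen so the helper can be tested without real waiting. It raises on timeout rather than returning quietly, so a deploy that never becomes ready stays red.

```python
import time
from typing import Callable


class NotReady(Exception):
    """Raised when the probe never succeeded within the timeout."""


def wait_until(
    probe: Callable[[], bool],
    timeout: float,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `probe` until it returns True; return the elapsed seconds.

    Raises NotReady once `timeout` seconds have passed without success.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            raise NotReady(f"not ready after {timeout:.0f}s")
        sleep(interval)
```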

---

## 6. The two agents

### Builder (primary)
Implements the backlog top-down. Discipline:
- One backlog item in flight at a time. Small, committed, reversible steps.
- Every change verified against the *real* system (server, Drone, Gitea) before claiming done —
  never "should work". Paste the verifying command + output into `JOURNAL.md`.
- Touch production carefully: cc-ci is the only target; never deploy test apps onto unrelated
  production servers; never reuse production domains. Idempotent server changes only (via Nix).
- If blocked on access/secrets/external state, write it to `STATUS.md ## Blocked` and pick up an
  unblocked item rather than hacking around it.

### Adversary (reviewer)
Runs as a **separate, independent loop in its own process/sandbox** (see §6.1 for how the two
loops coordinate). Its job is to **disbelieve**. It:
- Re-verifies each `Definition of Done` and milestone-acceptance claim independently, from a cold
  start (fresh shell, own clone, no cached state), and logs PASS/FAIL + evidence in `REVIEW.md`.
- Actively tries to break things: comment `!testmexyz` (should NOT trigger), comment as a
  non-collaborator (should be rejected), push a PR that fails tests (must report red, not green),
  kill an app mid-run (teardown must still clean up), grep published logs/dashboard for secrets,
  run two `!testme`s concurrently (no domain/volume/secret collision), confirm the same generated
  app secrets persist across install→upgrade→backup/restore.
- Files every defect as a `BACKLOG.md` item tagged `[adversary]` with repro steps. The Builder
  may not close an adversary item; only the Adversary closes it after re-test.
- Has veto power over `STATUS.md → DONE`.
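The trigger rule the Adversary probes here can be stated as a small predicate. This is a sketch under an assumed rule: a comment triggers only if one of its lines is exactly `!testme` (ignoring surrounding whitespace) and the author is a collaborator. `should_trigger` is a hypothetical name, and how the bridge actually learns collaborator status from Gitea is not shown.

```python
import re

# One whole line must be exactly "!testme"; trailing \r covers CRLF comment bodies.
TRIGGER = re.compile(r"^[ \t]*!testme[ \t\r]*$", re.MULTILINE)


def should_trigger(comment_body: str, is_collaborator: bool) -> bool:
    """Return True only for a collaborator comment containing an exact `!testme` line."""
    return is_collaborator and TRIGGER.search(comment_body) is not None
```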

### 6.1 Coordination protocol (two independent loops, one shared repo)

The two loops never talk directly; the **git repo is the only coordination medium**. Each agent
has its own clone (e.g. Builder in `/srv/cc-ci/cc-ci`, Adversary in `/srv/cc-ci/cc-ci-adv`) and
its own pacing. To make concurrent writes conflict-free:

- **File ownership (one writer each — the other only reads):**
  - Builder owns: all source code/config, `STATUS.md`, `JOURNAL.md`, `DECISIONS.md`.
  - Adversary owns: `REVIEW.md`.
  - `BACKLOG.md` is split into two H2 sections — `## Build backlog` (Builder-only) and
    `## Adversary findings` (Adversary-only). Each agent edits **only its own section**, so git
    merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
    fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
    closed after re-test.
  - `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops
    have deliberately decided not to do autonomously and that need operator input to move on.**
    Append-only; either agent may file. Each entry should clearly say *what's needed from the
    operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call,
    plain "go ahead"). The list is **open-ended** — items can sit indefinitely, **no obligation
    to close every item**, closure is operator-driven. A re-entry trigger / IDEA cross-link is
    **optional** (include when there's a natural mechanism, e.g. an opt-in flag in
    `cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" / buried JOURNAL note
    — file them here so the operator can review the whole list. The Phase-4 cleanup pass should
    **surface** DEFERRED.md to the operator at least once but does **not** force closure.
    Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED
    is for considered-and-parked work the loops won't tackle without operator input.
- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
  conflict. Prefer appending over rewriting.
- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous
  adversarial verification requires the Adversary NOT to consume the Builder's rationalisations
  before forming its verdict. The split:
  - `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding
    verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW**
    to verify (the exact command/check the Adversary can re-run from its own clone), the
    **EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and
    **WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify,
    it goes in STATUS.
  - `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative /
    dead-ends explored. Those go in `JOURNAL.md` (Builder-private to write).
  - The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code /
    git history; it forms its verdict from those + its own **cold** acceptance run, and does **not**
    read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine
    (e.g. to contextualise a finding) — note in REVIEW that you did.

  In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.**
- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
  smallest change, commit, `git push`. On a rebase conflict, it will be inside the *other* agent's
  file/section only if a rule was broken — re-pull and keep to your own files. Never `--force`.
- **Gate handshake via STATUS.md + commit-prefix signalling.** When the Builder believes a milestone
  gate is met, it sets in `STATUS.md`: `Gate: <Mn> — CLAIMED, awaiting Adversary`, **commits it with a
  `claim(...)` prefix**, and stops advancing past it. The Adversary runs the acceptance check cold and
  writes the verdict to `REVIEW.md` (`<Mn>: PASS @<ts>` with evidence, or `FAIL` + an `[adversary]`
  item), **committed with a `review(...)` prefix**. The Builder only proceeds past the gate after
  seeing `PASS` in `REVIEW.md`.
- **The watchdog signals the handoff off these commit prefixes** (not by parsing prose): a new
  `claim(...)` commit on origin/main pings the Adversary; a new `review(...)` commit pings the
  Builder. So the prefixes are **load-bearing** — a gate claim MUST be a `claim(...)` commit and a
  verdict MUST be a `review(...)` commit, or the counterpart isn't promptly woken (it falls back to
  its slower self-poll). STATUS/REVIEW remain the durable source of truth; the prefix is the signal.
- **Clean tree before claim.** The Builder runs `git status` before claiming — the working tree
  must be clean (everything committed AND pushed). The Adversary cold-verifies from a fresh clone,
  so an uncommitted/un-pushed change that only exists on the Builder's host (e.g. a locally-built
  fix) is a guaranteed cold-verify mismatch. Commit + push first, then claim.
- **DONE handshake.** Builder may write `## DONE` to `STATUS.md` **only** when `REVIEW.md` shows a
  PASS dated within 24h for every D1–D10. The Adversary can write `## VETO <reason>` to
  `REVIEW.md` at any time, which forbids DONE until cleared.
- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
  the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
  doing independent work — neither loop blocks idle waiting on the other beyond its gate.
- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to
  the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for
  early-look, an "I refactored X, please re-verify Y", an observation outside the normal flow). For
  those, use the inbox files in `machine-docs/`:
  - **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its
    own clone, commits, pushes.
  - **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its
    own clone, commits, pushes.
  - The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings
    the receiver. The receiver, on receipt, reads + processes the message, then **deletes the
    inbox file** (commits + pushes) — deletion is the "message consumed" signal. Single-writer
    discipline: only the sender writes their counterpart's inbox; only the receiver deletes it.
  - **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this
    while I keep going," "I observed Y outside our normal flow," "I'm taking a long e2e now."
    **Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md`
    still owns those). The inbox is a side-channel, not a replacement.
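Since the handoff is keyed on commit prefixes, the routing decision reduces to a match on the commit subject. A minimal sketch, assuming subjects of the form `claim(<something>)...` and `review(<something>)...`; `wake_target` is a hypothetical name and the real watchdog may implement this differently.

```python
import re
from typing import Optional

# The prefix must open the subject line; anything may follow the closing parenthesis.
PREFIX = re.compile(r"^(claim|review)\(([^)]*)\)")


def wake_target(subject: str) -> Optional[str]:
    """Map a commit subject to the loop that should be pinged, or None."""
    match = PREFIX.match(subject)
    if match is None:
        return None  # ordinary commit: nobody is woken by it
    return "adversary" if match.group(1) == "claim" else "builder"
```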

(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
the intended design.)

---

## 7. The Loop Protocol

Both loops run this same shape; state lives in the repo so it survives restarts/compaction. On
every wake, `git pull --rebase` first, then:

1. **Orient.** Read `STATUS.md` (phase, in-flight item, gate claims, blockers), `BACKLOG.md`, and
   the tail of `REVIEW.md`. Reconcile with reality via cheap probes (Drone health, last build,
   `git log`) — never trust the ledger blindly; if it disagrees with the system, fix the ledger
   first (your own files only — see §6.1).
2. **Select.**
   - *Builder:* highest-priority open item in `## Build backlog`: unresolved `[adversary]`
     findings > current milestone's next task > next milestone. Never advance past a milestone gate
     until `REVIEW.md` shows its PASS.
   - *Adversary:* any standing `Gate: <Mn> CLAIMED` in `STATUS.md` to verify > re-verify a D1–D10
     gate whose last PASS is stale (>24h) > a fresh break-it probe from §6.
3. **Act.** Smallest change that advances the item. Builder verifies against the real system;
   Adversary verifies from a cold start. Commit with a clear message (author per repo convention).
4. **Record (your own files only).** *Builder:* append to `JOURNAL.md` (what you did + verifying
   command/output + next), update `STATUS.md`, tick `## Build backlog`. *Adversary:* append
   PASS/FAIL + evidence to `REVIEW.md`, add/close items in `## Adversary findings`. Then `git push`.
5. **Gate handshake (§6.1).** Builder, on reaching a milestone, sets `Gate: <Mn> CLAIMED, awaiting
   Adversary` in `STATUS.md` and works on other unblocked items meanwhile. Adversary clears it with
   a `REVIEW.md` verdict. No gate is "passed" without a logged PASS.
6. **Decide continuation.** Builder writes `## DONE` only when `REVIEW.md` shows a <24h PASS for
   every D1–D10 and no standing `## VETO`. Otherwise schedule the next wake.
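The continuation decision in step 6 can be written as a check that returns the reasons DONE is not yet allowed. A sketch only: `done_blockers` is a hypothetical name, and parsing `REVIEW.md` into the `last_pass` mapping and the standing veto is assumed to happen elsewhere.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

REQUIRED = [f"D{i}" for i in range(1, 11)]  # D1..D10


def done_blockers(
    last_pass: Dict[str, datetime],
    veto: Optional[str],
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> List[str]:
    """List everything that still forbids writing `## DONE`; empty means allowed."""
    blockers: List[str] = []
    if veto:
        blockers.append(f"standing VETO: {veto}")
    for item in REQUIRED:
        passed_at = last_pass.get(item)
        if passed_at is None:
            blockers.append(f"{item}: no PASS recorded")
        elif now - passed_at > max_age:
            blockers.append(f"{item}: PASS is stale")
    return blockers


# Illustrative timestamps only.
now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
fresh = {d: now - timedelta(hours=1) for d in REQUIRED}
```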

**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
the *specific* thing. Three cases:
1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to
   stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do
   **NOT** `ScheduleWakeup` for the expected total runtime of the task in a single big sleep — a 25
   min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes
   you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy
   that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle.
   - **Recommended pattern for long deploys/convergence (builder, 2026-05-30):** **arm a `Monitor`**
     that polls the node every ~30s and **wakes you on convergence OR failure**, with a **longer
     fallback heartbeat** (`ScheduleWakeup`) as a backstop if the Monitor never fires. This proceeds
     the *instant* the deploy converges (no over-waiting if it finishes early) and surfaces a failure
     promptly, while the heartbeat bounds the wait if the condition is never met. Size the convergence
     timeout sanely — longer than a few minutes if a recipe genuinely needs it, but **never absurd**
     (e.g. the ~40-min ghost timeout was excessive). Beats both a single big blind sleep and a fixed
     coarse poll.
2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
   Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
   here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,
   the watchdog pings the Adversary to verify *now*; the moment the Adversary updates `REVIEW.md`
   (verdict/finding), it pings the Builder to proceed (`launch.sh`, ~30 s detection). So you may sleep
   while blocked and trust the ping — but keep a **fallback self-poll on a modest cadence (~2–4 min)**
   in case a ping is missed (a dead session is restarted by the watchdog and re-orients from the repo
   anyway). The goal: a pending handoff resolves in well under a minute, not a whole idle interval.
3. **Genuinely idle, nothing pending from either loop** → sleep in chunks of **at most 10 min**, then
   re-wake and re-orient; if still nothing, sleep another ≤10 min. **Never a single wait > 10 min**
   (600 s) — see the liveness rule below.

Notes: **The Adversary may idle freely when nothing is pending — it should NOT pointlessly re-verify
or busy-poll to look busy.** It gets woken by the watchdog the instant the Builder claims a gate, so
"start verifying very soon after the Builder waits" is handled by the signal, not by the Adversary
spinning. **The Builder** should prefer keeping an unblocked backlog item in hand so it's rarely
*fully* blocked on a gate; only hit case 2 when everything is genuinely gated behind the pending
verification — and then rely on the watchdog ping (+ fallback poll) rather than a long idle.

**Liveness marker & max-wait (the watchdog ENFORCES this).** Every wait is capped at **10 minutes**;
to wait longer, wake at 10 min, re-check, and wait again. **Immediately before going idle for any
wait, your FINAL output line MUST be exactly:**

    WAITING-UNTIL: <ISO-8601 UTC>

— the moment you intend to resume (≤10 min out, matching your `ScheduleWakeup`). Compute it from the
clock, e.g. `date -u -d '+10 min' +%FT%TZ`. The watchdog uses this to tell a healthy wait from a
wedge: if it sees a loop **idle ≥5 min with no current `WAITING-UNTIL` marker as its last message, OR
idle past the time the marker named, it kills + reboots that loop** (which then re-orients from git +
its STATUS/REVIEW files). So always leave a fresh marker before sleeping, and never overrun it.
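The marker and the watchdog's rule can be sketched in Python as below. `waiting_until_line` and `is_wedged` are hypothetical names and the actual watchdog logic may differ; the thresholds (10-minute cap, 5-minute idle grace) are the ones stated above. The timestamp format matches `date -u +%FT%TZ`.

```python
from datetime import datetime, timedelta, timezone

MAX_WAIT = timedelta(minutes=10)   # hard cap on any single wait
IDLE_GRACE = timedelta(minutes=5)  # idle time tolerated without a marker
MARKER = "WAITING-UNTIL: "
STAMP = "%Y-%m-%dT%H:%M:%SZ"


def waiting_until_line(now: datetime, wait: timedelta) -> str:
    """Build the final output line; `now` must be timezone-aware."""
    resume = (now + min(wait, MAX_WAIT)).astimezone(timezone.utc)
    return MARKER + resume.strftime(STAMP)


def is_wedged(last_line: str, idle_for: timedelta, now: datetime) -> bool:
    """Apply the kill rule: no marker and idle >= 5 min, or past the marker's time."""
    if not last_line.startswith(MARKER):
        return idle_for >= IDLE_GRACE
    try:
        resume = datetime.strptime(last_line[len(MARKER):].strip(), STAMP)
    except ValueError:
        # An unparseable marker is treated like a missing one.
        return idle_for >= IDLE_GRACE
    return now > resume.replace(tzinfo=timezone.utc)
```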

**Proactive compaction.** If your context usage climbs high (≳80%), run `/compact` *before*
continuing — your state lives in git + the phase STATUS/REVIEW files, so compaction is lossless for
the loop and prevents wedging (garbled output, failed tool calls) near the context limit.

**Anti-drift guards.**
- Cap retries: if an approach fails 3× the same way, stop, write the dead-end in `DECISIONS.md`,
  and try a different approach or mark blocked. No thrashing.
- Never weaken a test to make it pass. A red test is information; "fix" the recipe/harness or file
  a finding — do not delete the assertion. (This is the single most important rule; the Adversary
  watches specifically for tests being softened or skipped.)
- Keep changes reversible; prefer Nix-declared state over imperative server edits so any rebuild
  reproduces it.
- Don't expand scope beyond §2. New ideas → `BACKLOG.md` (tagged `[idea]`), not into this run.

---

## 8. Open decisions to settle early (log in DECISIONS.md)

- Deploy mechanism: `nixos-rebuild --target-host` vs `deploy-rs`/`colmena`. (Default: deploy-rs
  for atomic rollbacks; nixos-rebuild fine if simpler.)
- Webhook scope: per-repo vs org-level Gitea webhook. (Default: per-repo via enroll script.)
- Drone runner type: exec vs privileged docker. (Default: exec, since it must drive host abra.)
- Secret tool: sops-nix vs agenix. (Default: sops-nix for multi-recipient + yaml ergonomics.)
- Reverse proxy / Wildcard TLS: **SETTLED — deploy the real `coop-cloud/traefik` recipe via abra
  (for e2e fidelity), in wildcard/file-provider mode, serving the operator's pre-issued wildcard
  cert; no ACME, no token** (§4.0/§4.2). Supersedes the original plan's hand-rolled
  `modules/traefik.nix`. The operator issued the wildcard SAN cert (`*.ci.commoninternet.net` +
  `ci.commoninternet.net`) via LE DNS-01/Gandi out-of-band into `/var/lib/ci-certs/live/`; the agent
  feeds it as the recipe's `ssl_cert`/`ssl_key` swarm secrets so the DNS-editing token never reaches
  cc-ci. **Manual renewal** ~90 days (next ~2026-08-24): re-issue → update the secret → redeploy.
- Proof recipe set (D10 — six, category-spanning). Default candidates, all previously verified
  deployable: `hedgedoc`, `cryptpad`, `keycloak`, `authentik`, `lasuite-docs`/`lasuite-drive`,
  `matrix-synapse`, `immich`, `bluesky-pds`. Lock the final six early so M4–M6.5 build toward them.
  Sequence easy→hard: prove the pipeline on `hedgedoc`/`cryptpad` before tackling SSO, S3, media
  stores, and TLS-passthrough recipes.

Each default stands until the Adversary or reality forces a change; record the change and why.

---

## 9. Guardrails / hard rules

- **Access boundary:** only cc-ci is yours to reconfigure. Recipe repos: read + comment + (when
  enrolling) add a webhook — nothing else. Never push to a recipe repo's code.
- **No secrets in git/logs/UI.** Ever. Verified by the Adversary's leak test.
- **No mocks for the e2e stages.** D2 means real deploys. If something can't be tested for real,
  it's a finding, not a pass.
- **Idempotent + reversible.** Anything done to the server must be re-derivable from the repo.
  Infra bring-up is **declarative idempotent reconciliation in Nix** — not manual post-steps and not
  run-once scripts. Each piece (swarm + `proxy` net, the traefik recipe deploy, Drone, the
  comment-bridge, the dashboard) is a systemd **oneshot that re-runs on every activation/boot** and
  *converges* to the desired state (inspect → act only if needed → no-op if already correct), like
  `swarm-init`. **No `/var/lib/.bootstrapped`-style sentinels** (they don't self-heal drift). The
  goal: a from-scratch install is `git clone` + `nixos-rebuild switch` + the operator preconditions
  — `docs/install.md` must never accumulate manual post-rebuild steps.
- **Stop on missing *external* infra inputs** (class-A1 in §4.4: cc-ci SSH/root access, the
  Tailscale auth key, Gitea bot creds, the pre-issued wildcard cert at `/var/lib/ci-certs/live/`,
  registry creds — and the preconfigured DNS/gateway facts) rather than improvising around them;
  surface in `STATUS.md ## Blocked`. **Never** attempt ACME/DNS-01 for `commoninternet.net` — the
  cert is pre-provided and renewed out-of-band by the operator. **This does NOT apply to** internal
  infra secrets (class-A2: Drone RPC, webhook HMAC, Gitea OAuth app, host age key — the agent
  generates these) or to recipe app secrets (class-B): those the test harness generates itself
  (`abra app secret generate` + chosen fixtures), persists for the run, and destroys at teardown —
  a missing app secret is never a blocker, it is something the harness creates. See §4.4.
- **Real abra deploys; abra convergence by default; custom readiness only if it's a real test.**
|
||
Deploys/upgrades use the **real abra commands** (`abra app deploy`/`upgrade`) — never bypass abra
|
||
with `docker service update`/`scale`. **Prefer abra's own convergence checks.** Only skip abra's
|
||
post-deploy convergence monitor (`-c`/`--no-converge-checks`) and substitute a **harness READY_PROBE**
|
||
when abra's monitor genuinely doesn't fit (e.g. its window is too short for a heavy app and it FATAs
|
||
on a deploy that *does* converge). When you do: the deploy is still real abra (only abra's *waiting*
|
||
is replaced), and the probe MUST be a **genuinely strict** readiness test — all services N/N **plus**
|
||
a real app-level check — that **RAISES on actual non-readiness**, never a no-op that masks a failed
|
||
deploy. **Prove it has teeth** (a negative test that fails on stuck convergence, e.g. F2-12's
P7-negative). The Adversary treats a custom probe as a potential test-weakening until cold-verified.
- **Custom cc-ci compose overlays — avoid where possible, justify each, prefer upstream.** A
cc-ci-authored compose overlay (an extra `compose.*.yml` layered via `COMPOSE_FILE`) risks **drift**
from the recipe users actually run, so **avoid it where possible and justify each use**. In most
cases the cleaner fix is an **upstream recipe PR** — either a genuine robustness fix, or exposing a
knob the recipe should expose. **But a single, uniform, optional `compose.ccci.yml` overlay file per
recipe is an acceptable fallback** — especially for a value abra/compose can't take from an env var.
(One fixed filename per recipe — `compose.ccci.yml` — holding all cc-ci-side deploy tweaks; not
per-purpose suffixes.)
**Known limitation (builder, 2026-05-30): abra does NOT support an env value for a healthcheck
`start_period`.** So the ghost/discourse `start_period` bumps legitimately **need** the overlay (an
env-var PR is not possible for that field) — these overlays **stay**, justified. When you do use an
overlay: keep it **minimal + single-purpose**, **document WHY in the file header** (the exact abra/
upstream limitation that forces it), have the **Adversary confirm it doesn't weaken a test or mask a
recipe defect**, and **file the upstream PR where the fix genuinely belongs** (e.g. if a recipe's
`start_period` is too tight for any slow host, propose raising it upstream too).
- **Upgrade tier: always test the upgrade to the LATEST version.** Don't drop the upgrade test just
because the *from* (older) version is awkward. If an older from-version can't be fully deployed/tested
(its image tag was removed from the registry, or it predates an overlay/feature), you do **NOT** need
that older version's **custom tests** to run — deploy it minimally (a justified overlay is fine) or
pick the nearest deployable prior version, then **upgrade to latest and run the full assertions on the
latest**. Skipping a from-version's custom tests is an honest, recorded outcome; skipping the
upgrade-to-latest is not. (See `plan-ccci-compose-overlay-policy.md` for the per-recipe disposition.)
- **Honest reporting.** If a stage is skipped or a check failed, say so in `STATUS.md`/`JOURNAL.md`
with the output. The loop's value depends entirely on the ledgers being true.
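
The strict-readiness rule for a harness READY_PROBE (all services N/N **plus** a real app-level check, raising on non-readiness) could be sketched roughly as below. This is an illustration only, not the harness's actual code: `ready_probe`, `services_converged`, `NotReadyError`, and the injected `get_replicas`/`app_check` callables are all hypothetical names, and treating a 0/0 service as not ready is an assumption.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical names throughout: nothing here is taken from the real cc-ci harness.
Replicas = Dict[str, Tuple[int, int]]  # service name -> (running, desired)


class NotReadyError(RuntimeError):
    """Raised when the stack has not become ready before the deadline."""


def services_converged(replicas: Replicas) -> bool:
    """True only if at least one service exists and every service is at N/N.

    Assumption: a service with 0 desired replicas counts as NOT ready, so an
    empty or scaled-to-zero stack can never pass by accident.
    """
    return bool(replicas) and all(
        desired > 0 and running == desired
        for running, desired in replicas.values()
    )


def ready_probe(
    get_replicas: Callable[[], Replicas],
    app_check: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until all services are N/N AND the app-level check passes.

    Returns None on success. Raises NotReadyError once the deadline passes,
    so a stuck deploy fails the run instead of being silently accepted.
    """
    deadline = clock() + timeout_s
    while True:
        last = get_replicas()
        # Both conditions are required; replica counts alone are not enough.
        if services_converged(last) and app_check():
            return
        if clock() >= deadline:
            raise NotReadyError(
                f"not ready after {timeout_s}s; last replica state: {last}"
            )
        sleep(interval_s)
```

In a real harness, `get_replicas` might parse read-only output such as `docker service ls --format '{{.Name}} {{.Replicas}}'`, and `app_check` might issue an HTTP request against the deployed app; both are assumptions here. Injecting the two callables keeps the negative test cheap: pass a `get_replicas` that stays at `(0, 1)` with a short timeout and assert that `NotReadyError` is raised.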