From 352f624ce68cb6331cec2009edda35efef199dc3 Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Tue, 26 May 2026 22:38:26 +0100
Subject: [PATCH] review: M1 PASS (cold E2E: wildcard HTTPS via abra+traefik, clean teardown); file [adversary] A1 ACME-hazard

---
 BACKLOG.md | 17 +++++++++++++++++
 REVIEW.md  | 36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/BACKLOG.md b/BACKLOG.md
index b54c222..e7e7f93 100644
--- a/BACKLOG.md
+++ b/BACKLOG.md
@@ -68,3 +68,20 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
 ## Adversary findings
+
+- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
+  Found during M1 verify (M1 still PASSes: the proxy itself fires no ACME). cc-ci's traefik static
+  config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
+  (stock coop-cloud template). They are currently inert (no router references them; both
+  `*-acme.json` files are 0 bytes; 0 ACME log lines) because the proxy runs with `LETS_ENCRYPT_ENV=""`.
+  **But** the recipe default for test apps (e.g. `custom-html/.env.sample`) ships
+  `LETS_ENCRYPT_ENV=production`, which renders `traefik.http.routers..tls.certresolver=production`.
+  So if the harness (M4+) deploys a test app *without* forcing `LETS_ENCRYPT_ENV=""`, traefik
+  WILL attempt Let's Encrypt HTTP-01 for that app's domain, contradicting the "NO ACME" design,
+  hitting LE rate limits, and likely failing (HTTP-01 needs :80 reachable; the gateway passes TLS).
+  *Repro:* `abra app new custom-html -D x.ci.commoninternet.net` (keep default env) → deploy →
+  `docker service inspect ... | grep certresolver` shows `=production`.
+  *Fix:* the harness must force `LETS_ENCRYPT_ENV=""` (or strip the certresolver label) on every
+  test-app deploy; and/or remove the unused `certificatesResolvers` from cc-ci's traefik so that
+  no-ACME is structural.
+  Re-test: deploy a test app via the harness and confirm 0 ACME log lines and that the
+  served cert is the wildcard. Adversary closes after re-test.
diff --git a/REVIEW.md b/REVIEW.md
index 78dccb9..29c8d9d 100644
--- a/REVIEW.md
+++ b/REVIEW.md
@@ -28,3 +28,39 @@ closure includes docker, `unit-swarm-init`, and **traefik** units (`traefik.yml`
 `traefik-stack.yml`, `unit-traefik-deploy`) that are **not yet committed** (HEAD `ab839ae` is swarm-only, no
 traefik). Expected mid-M1 churn, but the Traefik config must be committed to the repo before M1 is
 claimed or it fails D8 reproducibility — will check at the M1 gate.
+
+## M1 — Swarm + abra target: PASS @2026-05-26T22:20Z
+
+Verified cold from my own clone; deployed my **own** probe recipe via abra (not trusting the Builder's
+hand-test). Acceptance: "a recipe deployed via abra is reachable over HTTPS at
+`*.ci.commoninternet.net`, then fully torn down leaving no volumes", plus the orchestrator's M1 checklist
+(a–d).
+
+- **(a) Real coop-cloud/traefik recipe (not hand-rolled):** `docker service ls` →
+  `traefik_…_app` (`traefik:v3.6.15`) + `…_socket-proxy` (lscr.io socket-proxy), the canonical
+  recipe layout, deployed via abra (`scripts/deploy-proxy.sh`). `modules/traefik.nix` is deleted.
+- **(b) Wildcard on web-secure + proxy overlay:** static `traefik.yml` has `web-secure: :443`
+  (web→web-secure 301 redirect, verified live). File provider `/etc/traefik/file-provider.yml`:
+  `tls.certificates: [{certFile:/run/secrets/ssl_cert, keyFile:/run/secrets/ssl_key}]`; swarm
+  secrets `…_ssl_cert_v1`/`…_ssl_key_v1` are mounted (2909 B / 227 B = the pre-issued cert). My probe
+  app `advm1probe_…_app` was attached to the `proxy` overlay.
+- **E2E (cold deploy):** `abra app new custom-html -D advm1probe.ci.commoninternet.net` (forced
+  `LETS_ENCRYPT_ENV=""`) → `deploy succeeded 🟢`. Via SOCKS proxy: **HTTP 200**; served cert
+  `subject: CN=*.ci.commoninternet.net`, SAN-matched, `SSL certificate verify ok`, issuer LE E8,
+  i.e.
+  the **pre-issued wildcard**, NOT a per-host ACME cert.
+- **(c) No Gandi/DNS token, no ACME credential:** repo (all history) is clean; on the host the only
+  gandi/dns-challenge strings are **commented-out** recipe-template options (`#GANDI_…`,
+  `#SECRET_GANDIV5_…`) holding no value. Active traefik env = `LETS_ENCRYPT_ENV=` (empty),
+  `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`. `staging`/`production` certResolvers are *defined*
+  in traefik.yml (stock template) but **referenced by no router**; both acme.json files are **0 bytes**;
+  **0 ACME lines in traefik logs**. No ACME fires under the current config. (Hardening risk filed; see findings.)
+- **(d) Manual renewal documented:** DECISIONS.md: the operator re-issues at the same paths, then runs
+  `abra app secret rm … ssl_cert` and re-inserts at a bumped version; install.md: "Renewed out-of-band;
+  never ACME here."
+- **Teardown:** `abra app undeploy` + `volume remove` → post-teardown services/containers/volumes/
+  secrets for the probe are **all 0**. I also independently confirmed that the Builder's `cchtml1` test left 0
+  runtime resources (only its inert `.env` config file remains, which is harmless).
+
+Verdict: **M1 PASS.** Not a hard fail on (c), since no token/credential exists and no ACME fires, but
+the inert ACME resolvers plus the test-app default `LETS_ENCRYPT_ENV=production` are a latent hazard that
+goes live once the harness deploys apps; filed as `[adversary]` A1 for M4.
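The label half of the A1 re-test could be scripted in the harness. What follows is a minimal, hypothetical Python sketch, assuming the harness can capture `docker service inspect` output (a JSON array whose `Spec.Labels` carries each service's swarm `deploy.labels`). It flags any traefik router label that names a non-empty certresolver. Treating an empty value (presumably what `LETS_ENCRYPT_ENV=""` renders) as "no resolver" is an assumption, and `acme_resolver_labels` is an illustrative name, not part of cc-ci. It does not cover the other two re-test checks (ACME log lines, served cert).

```python
import json

# Suffix of a traefik router label that selects a certificate resolver (ACME).
CERTRESOLVER_SUFFIX = ".tls.certresolver"


def acme_resolver_labels(inspect_json: str) -> list[tuple[str, str, str]]:
    """Return (service, label, value) for every router label naming an ACME resolver.

    Hypothetical helper: `inspect_json` is assumed to be the JSON array printed by
    `docker service inspect <svc>...`. An empty label value is treated as
    "no resolver" (assumption: that is what LETS_ENCRYPT_ENV="" renders).
    """
    hits = []
    for service in json.loads(inspect_json):
        spec = service.get("Spec", {})
        name = spec.get("Name", "?")
        labels = spec.get("Labels") or {}  # Labels may be absent or null
        for key, value in sorted(labels.items()):
            is_router_resolver = (
                key.startswith("traefik.")
                and ".routers." in key  # http and tcp routers alike
                and key.endswith(CERTRESOLVER_SUFFIX)
            )
            if is_router_resolver and value.strip():
                hits.append((name, key, value))
    return hits
```

A harness step could feed it the output of `docker service inspect $(docker service ls -q)` and fail the deploy if the returned list is non-empty, which turns the no-ACME rule into a check rather than a convention.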