review: M1 PASS (cold E2E: wildcard HTTPS via abra+traefik, clean teardown); file [adversary] A1 ACME-hazard
17
BACKLOG.md
@@ -68,3 +68,20 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver
## Adversary findings
<!-- Adversary-only section. Builder must not edit below this line. -->

- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
  Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
  config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
  (stock coop-cloud template). They're currently inert (no router references them; both
  `*-acme.json` are 0 bytes; 0 ACME log lines) because the proxy runs `LETS_ENCRYPT_ENV=""`.
  **But** the recipe default for test apps (e.g. `custom-html/.env.sample`) ships
  `LETS_ENCRYPT_ENV=production`, which renders `traefik.http.routers.<app>.tls.certresolver=production`.
  So if the harness (M4+) deploys a test app *without* forcing `LETS_ENCRYPT_ENV=""`, traefik
  WILL attempt Let's Encrypt HTTP-01 for that app's domain — contradicting the "NO ACME" design,
  hitting LE rate limits, and likely failing (HTTP-01 needs :80 reachable; gateway passes TLS).
  *Repro:* `abra app new custom-html -D x.ci.commoninternet.net` (keep default env) → deploy →
  `docker service inspect <app> ... | grep certresolver` shows `=production`.
  *Fix:* harness must force `LETS_ENCRYPT_ENV=""` (or strip the certresolver label) on every
  test-app deploy; and/or remove the unused `certificatesResolvers` from cc-ci's traefik so
  no-ACME is structural. Re-test: deploy a test app via the harness and confirm 0 ACME log lines
  and that the served cert is the wildcard. Adversary closes after re-test.
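
The *Fix* for A1 could be sketched in the harness roughly as follows. This is only an illustration under stated assumptions: the function names `force_no_acme` and `certresolver_labels` are hypothetical, the app `.env` is assumed to be plain `KEY=VALUE` lines, and an empty `certresolver` label value is assumed to mean "no resolver referenced". The real M4 harness may be structured quite differently.

```python
import re
from typing import Dict

# Matches an active (uncommented) LETS_ENCRYPT_ENV assignment in an app .env file.
_LE_LINE = re.compile(r"^\s*LETS_ENCRYPT_ENV\s*=")


def force_no_acme(env_text: str) -> str:
    """Return env_text with every active LETS_ENCRYPT_ENV line replaced by one empty assignment.

    Hypothetical helper: assumes the .env is plain KEY=VALUE lines.
    Commented-out lines are left untouched.
    """
    kept = [line for line in env_text.splitlines() if not _LE_LINE.match(line)]
    kept.append('LETS_ENCRYPT_ENV=""')
    return "\n".join(kept) + "\n"


def certresolver_labels(labels: Dict[str, str]) -> Dict[str, str]:
    """Return the service labels that name a non-empty traefik certresolver.

    Assumption: an empty value means no resolver is referenced.
    """
    return {
        key: value
        for key, value in labels.items()
        if key.lower().endswith(".tls.certresolver") and value.strip()
    }
```

A harness could apply `force_no_acme` to the app's env before deploying, then feed the deployed service's labels to `certresolver_labels` and refuse to proceed if the result is non-empty.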
36
REVIEW.md
@@ -28,3 +28,39 @@ closure includes docker, `unit-swarm-init`, and **traefik** units (`traefik.yml`
`traefik-stack.yml`, `unit-traefik-deploy`) that are **not yet committed** (HEAD `ab839ae` is
swarm-only, no traefik). Expected mid-M1 churn, but the Traefik config must be committed to the
repo before M1 is claimed or it fails D8 reproducibility — will check at the M1 gate.

## M1 — Swarm + abra target: PASS @2026-05-26T22:20Z

Verified cold from own clone; deployed my **own** probe recipe via abra (not trusting the Builder's
hand-test). Acceptance "a recipe deployed via abra is reachable over HTTPS at
`*.ci.commoninternet.net`, then fully torn down leaving no volumes" + orchestrator's M1 checklist
(a–d).

- **(a) Real coop-cloud/traefik recipe (not hand-rolled):** `docker service ls` →
  `traefik_…_app` (`traefik:v3.6.15`) + `…_socket-proxy` (lscr.io socket-proxy) — the canonical
  recipe layout, deployed via abra (`scripts/deploy-proxy.sh`). `modules/traefik.nix` is deleted.
- **(b) Wildcard on web-secure + proxy overlay:** static `traefik.yml` has `web-secure: :443`
  (web→web-secure 301 redirect, verified live). File provider `/etc/traefik/file-provider.yml`:
  `tls.certificates: [{certFile:/run/secrets/ssl_cert, keyFile:/run/secrets/ssl_key}]`; swarm
  secrets `…_ssl_cert_v1`/`…_ssl_key_v1` mounted (2909 B / 227 B = the pre-issued cert). My probe
  app `advm1probe_…_app` was attached to the `proxy` overlay.
- **E2E (cold deploy):** `abra app new custom-html -D advm1probe.ci.commoninternet.net` (forced
  `LETS_ENCRYPT_ENV=""`) → `deploy succeeded 🟢`. Via SOCKS proxy: **HTTP 200**; served cert
  `subject: CN=*.ci.commoninternet.net`, SAN-matched, `SSL certificate verify ok`, issuer LE E8 —
  i.e. the **pre-issued wildcard**, NOT a per-host ACME cert.
- **(c) No Gandi/DNS token, no ACME credential:** repo (all history) clean; on host the only
  gandi/dns-challenge strings are **commented-out** recipe-template options (`#GANDI_…`,
  `#SECRET_GANDIV5_…`) holding no value. Active traefik env = `LETS_ENCRYPT_ENV=` (empty),
  `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`. `staging`/`production` certResolvers are *defined*
  in traefik.yml (stock template) but **referenced by no router**; both acme.json are **0 bytes**;
  **0 ACME lines in traefik logs**. No ACME ever fires. (Hardening risk filed — see findings.)
- **(d) Manual renewal documented:** DECISIONS.md — operator re-issues at same paths, then
  `abra app secret rm … ssl_cert` + re-insert at bumped version; install.md "Renewed out-of-band;
  never ACME here."
- **Teardown:** `abra app undeploy` + `volume remove` → post-teardown services/containers/volumes/
  secrets for the probe **all 0**. Also independently confirmed the Builder's `cchtml1` test left 0
  runtime resources (only its inert `.env` config file remains, harmless).

Verdict: **M1 PASS.** Not a hard fail on (c) — no token/credential exists and no ACME fires — but
the inert ACME resolvers + test-app default `LETS_ENCRYPT_ENV=production` are a latent hazard that
goes live when the harness deploys apps; filed as `[adversary]` for M4.
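
The two pieces of evidence used above (the served cert is the wildcard, and the traefik logs contain no ACME lines) could be re-checked mechanically at the A1 re-test. The sketch below is illustrative only: `wildcard_covers` and `acme_log_lines` are hypothetical names, the cert name and log text are assumed to have been collected already by some other means, and "ACME line" is assumed to mean any log line containing the string `acme`, case-insensitively.

```python
from typing import List


def wildcard_covers(cert_name: str, host: str) -> bool:
    """True if cert_name (e.g. '*.ci.commoninternet.net') covers host.

    A wildcard is treated as matching exactly one leftmost DNS label,
    so it covers neither the bare parent domain nor deeper subdomains.
    """
    cert_name, host = cert_name.lower(), host.lower()
    if not cert_name.startswith("*."):
        return cert_name == host
    suffix = cert_name[1:]  # '.ci.commoninternet.net'
    if not host.endswith(suffix):
        return False
    leftmost = host[: -len(suffix)]
    return bool(leftmost) and "." not in leftmost


def acme_log_lines(log_text: str) -> List[str]:
    """Return log lines mentioning ACME (assumption: plain substring match)."""
    return [line for line in log_text.splitlines() if "acme" in line.lower()]
```

For the probe above, `wildcard_covers("*.ci.commoninternet.net", "advm1probe.ci.commoninternet.net")` holds, and a passing re-test would expect `acme_log_lines(...)` to return an empty list for the traefik logs captured after a harness deploy.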