review: M1 PASS (cold E2E: wildcard HTTPS via abra+traefik, clean teardown); file [adversary] A1 ACME-hazard

BACKLOG.md (+17)
@@ -68,3 +68,20 @@ Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adver

## Adversary findings
<!-- Adversary-only section. Builder must not edit below this line. -->

- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
  Found during M1 verify (M1 still PASSes: the proxy itself fires no ACME). cc-ci's traefik static
  config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
  (stock coop-cloud template). They're currently inert (no router references them; both
  `*-acme.json` are 0 bytes; 0 ACME log lines) because the proxy runs `LETS_ENCRYPT_ENV=""`.
  **But** the recipe default for test apps (e.g. `custom-html/.env.sample`) ships
  `LETS_ENCRYPT_ENV=production`, which renders `traefik.http.routers.<app>.tls.certresolver=production`.
  So if the harness (M4+) deploys a test app *without* forcing `LETS_ENCRYPT_ENV=""`, traefik
  WILL attempt Let's Encrypt HTTP-01 for that app's domain, contradicting the "NO ACME" design,
  hitting LE rate limits, and likely failing (HTTP-01 needs :80 reachable; gateway passes TLS).
  *Repro:* `abra app new custom-html -D x.ci.commoninternet.net` (keep default env) → deploy →
  `docker service inspect <app> ... | grep certresolver` shows `=production`.
  *Fix:* harness must force `LETS_ENCRYPT_ENV=""` (or strip the certresolver label) on every
  test-app deploy; and/or remove the unused `certificatesResolvers` from cc-ci's traefik so
  no-ACME is structural. Re-test: deploy a test app via the harness and confirm 0 ACME log lines
  and that the served cert is the wildcard. Adversary closes after re-test.
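
The harness-side *Fix* could be enforced with a minimal shell sketch like the one below. It assumes the harness knows the path to each test app's abra `.env` file before running `abra app deploy`. Where abra keeps that file is not stated here, so the path is passed in. It also assumes the traefik router labels are readable from the swarm service spec. `force_no_acme`, `service_labels` and `labels_have_certresolver` are hypothetical helper names and are not part of abra or cc-ci.

```shell
#!/bin/sh
# Hypothetical harness helpers for finding A1 (sketch, not cc-ci's actual harness).

# force_no_acme ENV_FILE
# Remove any active LETS_ENCRYPT_ENV line and append LETS_ENCRYPT_ENV="" so the
# recipe renders an empty tls.certresolver label. Commented-out lines are kept.
force_no_acme() {
  env_file=$1
  [ -f "$env_file" ] || { echo "force_no_acme: no such file: $env_file" >&2; return 1; }
  tmp=$(mktemp) || return 1
  # awk re-terminates every line, so a missing final newline cannot glue lines together.
  if ! awk '!/^[ \t]*LETS_ENCRYPT_ENV=/' "$env_file" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  printf 'LETS_ENCRYPT_ENV=""\n' >> "$tmp"
  cat "$tmp" > "$env_file"   # overwrite in place, keeping the file's permissions
  rm -f "$tmp"
}

# service_labels SERVICE
# Print a swarm service's labels as key=value lines (recipes that set traefik
# labels under deploy.labels end up in .Spec.Labels).
service_labels() {
  docker service inspect --format '{{range $k, $v := .Spec.Labels}}{{$k}}={{$v}}{{println}}{{end}}' "$1"
}

# labels_have_certresolver < key=value lines
# Succeed (exit 0) if any *.tls.certresolver label has a non-empty value,
# i.e. traefik would try ACME for that router.
labels_have_certresolver() {
  awk -F= '$1 ~ /\.tls\.certresolver$/ && $2 != "" { found = 1 } END { exit !found }'
}

# Example harness flow (commented out; APP_* variables are placeholders):
# force_no_acme "$APP_ENV_FILE"
# abra app deploy "$APP_DOMAIN"
# if service_labels "$APP_SERVICE" | labels_have_certresolver; then
#   echo "A1 regression: ACME certresolver still set" >&2; exit 1
# fi
```

Rewriting the env file is one option. The finding's other option, removing the unused `certificatesResolvers` from traefik's static config, would make the label check a backstop rather than the only guard.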

REVIEW.md (+36)
@@ -28,3 +28,39 @@ closure includes docker, `unit-swarm-init`, and **traefik** units (`traefik.yml`
`traefik-stack.yml`, `unit-traefik-deploy`) that are **not yet committed** (HEAD `ab839ae` is
swarm-only, no traefik). Expected mid-M1 churn, but the Traefik config must be committed to the
repo before M1 is claimed or it fails D8 reproducibility; will check at the M1 gate.

## M1 — Swarm + abra target: PASS @2026-05-26T22:20Z

Verified cold from my own clone; deployed my **own** probe recipe via abra (not trusting the Builder's
hand-test). Checked against the acceptance criterion "a recipe deployed via abra is reachable over
HTTPS at `*.ci.commoninternet.net`, then fully torn down leaving no volumes" plus the orchestrator's
M1 checklist (a–d).

- **(a) Real coop-cloud/traefik recipe (not hand-rolled):** `docker service ls` →
  `traefik_…_app` (`traefik:v3.6.15`) + `…_socket-proxy` (lscr.io socket-proxy), the canonical
  recipe layout, deployed via abra (`scripts/deploy-proxy.sh`). `modules/traefik.nix` is deleted.
- **(b) Wildcard on web-secure + proxy overlay:** static `traefik.yml` has `web-secure: :443`
  (web→web-secure 301 redirect, verified live). File provider `/etc/traefik/file-provider.yml`:
  `tls.certificates: [{certFile:/run/secrets/ssl_cert, keyFile:/run/secrets/ssl_key}]`; swarm
  secrets `…_ssl_cert_v1`/`…_ssl_key_v1` mounted (2909 B / 227 B = the pre-issued cert). My probe
  app `advm1probe_…_app` was attached to the `proxy` overlay.
- **E2E (cold deploy):** `abra app new custom-html -D advm1probe.ci.commoninternet.net` (forced
  `LETS_ENCRYPT_ENV=""`) → `deploy succeeded 🟢`. Via SOCKS proxy: **HTTP 200**; served cert
  `subject: CN=*.ci.commoninternet.net`, SAN-matched, `SSL certificate verify ok`, issuer LE E8,
  i.e. the **pre-issued wildcard**, NOT a per-host ACME cert.
- **(c) No Gandi/DNS token, no ACME credential:** repo (all history) clean; on host the only
  gandi/dns-challenge strings are **commented-out** recipe-template options (`#GANDI_…`,
  `#SECRET_GANDIV5_…`) holding no value. Active traefik env = `LETS_ENCRYPT_ENV=` (empty),
  `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`. `staging`/`production` certResolvers are *defined*
  in traefik.yml (stock template) but **referenced by no router**; both acme.json are **0 bytes**;
  **0 ACME lines in traefik logs**. No ACME ever fires. (Hardening risk filed; see findings.)
- **(d) Manual renewal documented:** per DECISIONS.md, the operator re-issues at the same paths,
  then runs `abra app secret rm … ssl_cert` and re-inserts at a bumped version; install.md says
  "Renewed out-of-band; never ACME here."
- **Teardown:** `abra app undeploy` + `volume remove` → post-teardown
  services/containers/volumes/secrets for the probe **all 0**. Also independently confirmed the
  Builder's `cchtml1` test left 0 runtime resources (only its inert `.env` config file remains,
  harmless).
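
The served-cert, ACME-log and teardown checks above can be re-run by hand with a minimal shell sketch like this one, and the same helpers could serve A1's re-test. It assumes the `openssl` and `docker` CLIs are available on a swarm manager and that the probe host is reachable directly. The review itself went through a SOCKS proxy. The helper names, the case-insensitive `acme` grep as the definition of an "ACME line", and the use of docker's `name` filter for leftovers are assumptions, not the exact commands used for this review. Service names are parameters because they are elided above (`traefik_…_app`).

```shell
#!/bin/sh
# Hypothetical re-check helpers mirroring the M1 probe checks (sketch only).

# served_subject HOST
# Print the subject of the certificate traefik serves for HOST on :443 (SNI = HOST).
# Assumes HOST is reachable directly, without the SOCKS proxy used in the review.
served_subject() {
  openssl s_client -connect "$1:443" -servername "$1" </dev/null 2>/dev/null |
    openssl x509 -noout -subject
}

# is_ci_wildcard SUBJECT_LINE
# Succeed if the subject's CN is exactly *.ci.commoninternet.net, ignoring the
# "CN = x" vs "CN=x" spacing difference between OpenSSL versions.
is_ci_wildcard() {
  norm=$(printf '%s' "$1" | tr -d ' ')
  case $norm in
    *'CN=*.ci.commoninternet.net' | *'CN=*.ci.commoninternet.net,'*) return 0 ;;
    *) return 1 ;;
  esac
}

# acme_line_count SERVICE
# Print how many log lines mention "acme" (case-insensitive); M1 expects 0.
# grep exits 1 when the count is 0, so "|| true" keeps the status clean.
acme_line_count() {
  docker service logs "$1" 2>&1 | grep -ci 'acme' || true
}

# leftovers NAME
# Count services, containers, volumes and secrets matched by docker's name
# filter; a clean teardown prints 0.
leftovers() {
  {
    docker service ls -q --filter "name=$1"
    docker ps -aq --filter "name=$1"
    docker volume ls -q --filter "name=$1"
    docker secret ls -q --filter "name=$1"
  } | wc -l
}
```

For example, `is_ci_wildcard "$(served_subject advm1probe.ci.commoninternet.net)"` should succeed while the probe is deployed, and `leftovers advm1probe` should print 0 after teardown.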

Verdict: **M1 PASS.** Not a hard fail on (c), since no token/credential exists and no ACME fires, but
the inert ACME resolvers plus the test-app default `LETS_ENCRYPT_ENV=production` are a latent hazard
that goes live once the harness deploys apps; filed as `[adversary]` A1 for M4.