# BACKLOG — cc-ci

Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adversary edits only
`## Adversary findings`. Closing an item = checking the box in your own section.

## Build backlog

### M0 — Foundations
- [x] Author flake.nix (NixOS host cc-ci) + hosts/cc-ci/{configuration,hardware}.nix from baseline
- [x] Deploy mechanism decision + first rebuild from repo (DECISIONS.md) — switch --flake on host
- [x] sops-nix wiring: host age key (from ssh host key) + master recovery key; secrets/secrets.yaml;
  decrypt a test secret on host → /run/secrets/test_secret (0400 root) verified
- [x] Gate: M0 — `ssh cc-ci 'systemctl is-system-running'` healthy after rebuild from repo
  → CLAIMED 2026-05-26, awaiting Adversary (see STATUS.md)
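The two M0 checks (a healthy `systemctl is-system-running` after a rebuild, and `/run/secrets/test_secret` decrypted as `0400` root) can be scripted. Below is a minimal Python sketch. It assumes `ssh cc-ci` works non-interactively and that only the `running` state counts as healthy. `system_state`, `is_healthy`, `mode_ok` and `secret_file_ok` are hypothetical helpers, not files in this repo.

```python
import os
import stat
import subprocess

# Assumption: only "running" counts as healthy. systemd also reports states
# such as "starting", "degraded" (some unit failed) or "maintenance".
HEALTHY_STATE = "running"


def system_state(host: str = "cc-ci") -> str:
    """Return the state printed by `systemctl is-system-running` on `host` via ssh."""
    # The command exits non-zero for every state except "running", so the exit
    # code is ignored here and the printed state is classified instead.
    result = subprocess.run(
        ["ssh", host, "systemctl is-system-running"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def is_healthy(state: str) -> bool:
    return state.strip() == HEALTHY_STATE


def mode_ok(st_mode: int, st_uid: int, want_mode: int = 0o400, want_uid: int = 0) -> bool:
    """Regular file, permission bits exactly `want_mode`, owned by `want_uid`."""
    return (
        stat.S_ISREG(st_mode)
        and stat.S_IMODE(st_mode) == want_mode
        and st_uid == want_uid
    )


def secret_file_ok(path: str = "/run/secrets/test_secret") -> bool:
    """Check a decrypted secret on the host itself (os.stat follows symlinks)."""
    st = os.stat(path)
    return mode_ok(st.st_mode, st.st_uid)
```

Hypothetical usage: run `is_healthy(system_state())` from a workstation, and run `secret_file_ok()` on the host as root. Treating `degraded` as unhealthy is deliberate, because that state means at least one unit failed.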
### M1 — Swarm + abra target
- [x] Docker + single-node swarm via Nix (modules/swarm.nix: docker + swarm-init oneshot + `proxy`
  overlay net + daily autoprune). Verified: Swarm=active, proxy overlay present.
- [x] Proxy = real coop-cloud/traefik via abra (orchestrator decision, replaces custom traefik.nix):
  wildcard/file-provider mode, pre-issued cert as ssl_cert/ssl_key swarm secrets, LETS_ENCRYPT_ENV
  empty → no ACME. `scripts/deploy-proxy.sh` (idempotent). Verified E2E via gateway: wildcard cert
  served, 0 ACME log lines.
- [x] abra installed (modules/abra.nix, pinned 0.13.0-beta); deployed custom-html by hand over HTTPS
  (HTTP 200 nginx page via gateway) and tore it down clean (services/volumes/secrets/containers=0).
- [x] Gate: M1 — recipe reachable over HTTPS at `*.ci.commoninternet.net`, torn down clean →
  CLAIMED 2026-05-26, awaiting Adversary.
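It is easy to misjudge "torn down clean (services/volumes/secrets/containers=0)" by eye, so the check is worth scripting and reusing for the M4 "no orphaned app/volume" gate. The sketch below assumes that every object belonging to the app carries a `com.docker.stack.namespace=<stack>` label, and that the caller knows which stack name abra used. `residue_commands`, `residue` and `torn_down_clean` are hypothetical names.

```python
import subprocess
from typing import Callable, Dict, List

# Assumption: all of the app's swarm objects carry this stack label.
STACK_LABEL = "com.docker.stack.namespace"

Runner = Callable[[List[str]], str]


def residue_commands(stack: str) -> Dict[str, List[str]]:
    """One ID-only (-q) listing per resource kind, filtered to the stack label."""
    flt = f"label={STACK_LABEL}={stack}"
    return {
        "services": ["docker", "service", "ls", "-q", "--filter", flt],
        "volumes": ["docker", "volume", "ls", "-q", "--filter", flt],
        "secrets": ["docker", "secret", "ls", "-q", "--filter", flt],
        # -a: stopped containers count as residue too.
        "containers": ["docker", "ps", "-aq", "--filter", flt],
    }


def run_cmd(argv: List[str]) -> str:
    """Run a docker CLI command and return stdout; raises if docker fails."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def residue(stack: str, run: Runner = run_cmd) -> Dict[str, int]:
    """Count leftover objects of each kind (one non-empty output line per ID)."""
    return {
        kind: sum(1 for line in run(argv).splitlines() if line.strip())
        for kind, argv in residue_commands(stack).items()
    }


def torn_down_clean(stack: str, run: Runner = run_cmd) -> bool:
    return all(count == 0 for count in residue(stack, run).values())
```

A unit test can pass a fake `run` callable and never touch the daemon. On the host, the default shells out to the `docker` CLI and fails loudly if a listing command errors.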
### M2 — Drone online
- [x] Drone server (coop-cloud recipe, reconcile oneshot) + exec runner via Nix; Gitea OAuth app.
  Server healthz 200 via gateway; runner polling (capacity=2, type=exec).
- [x] hello-world .drone.yml runs green; logs visible (Drone UI + API). Build #1 success: clone +
  hello (echo/whoami=root/abra 0.13.0-beta/swarm=active), both exit 0.
- [x] Gate: M2 — push to cc-ci triggers visible green build → CLAIMED 2026-05-26, awaiting Adversary.
  OAuth link via one-time `scripts/bootstrap-drone-oauth.sh` (documented in install.md §2).
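"Build #1 success: clone + hello, both exit 0" can be checked against the Drone API instead of the UI. The sketch below makes several assumptions:

- Drone exposes `GET /api/repos/{owner}/{name}/builds/{number}`.
- The endpoint accepts bearer-token auth.
- The returned build JSON has a top-level `status` and `stages[].steps[]` entries carrying `name` and `exit_code`.

The server URL, token and helper names are placeholders.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple


def fetch_build(server: str, token: str, repo: str, number: int) -> Dict[str, Any]:
    """GET one build from the Drone API; `repo` is "owner/name"."""
    req = urllib.request.Request(
        f"{server}/api/repos/{repo}/builds/{number}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def failed_steps(build: Dict[str, Any]) -> List[Tuple[str, int]]:
    """(step name, exit code) for every step that did not exit 0.

    A step with no exit_code is treated as failed (-1) rather than ignored.
    """
    return [
        (step.get("name", "?"), step.get("exit_code", -1))
        for stage in build.get("stages", [])
        for step in stage.get("steps", [])
        if step.get("exit_code", -1) != 0
    ]


def is_green(build: Dict[str, Any]) -> bool:
    return build.get("status") == "success" and not failed_steps(build)
```

Hypothetical usage: `is_green(fetch_build("https://drone.example", token, "owner/repo", 1))`.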
### M3 — Comment bridge
- [ ] comment-bridge service: HMAC verify, !testme exact match, collaborator check, Drone API call
- [ ] PR comment posting with run link
- [ ] Gate: M3 — live demo on scratch PR; auth enforced
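The first two comment-bridge checks (HMAC verify and `!testme` exact match) are pure functions, so they can be sketched without a server. The sketch makes these assumptions:

- Gitea's webhook sends an `X-Gitea-Signature` header holding the hex HMAC-SHA256 of the raw request body.
- The `issue_comment` payload carries the comment text at `comment.body`.
- "Exact match" means the whole comment, trimmed, equals `!testme`.

The collaborator check and the Drone API call are left out because they need live API calls. All names here are hypothetical.

```python
import hashlib
import hmac
import json

TRIGGER = "!testme"


def signature_ok(secret: bytes, body: bytes, header_sig: str) -> bool:
    """Compare the X-Gitea-Signature header with the hex HMAC-SHA256 of the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest is designed to resist timing attacks on the comparison.
    return hmac.compare_digest(expected.encode(), header_sig.strip().lower().encode())


def is_trigger(comment_body: str) -> bool:
    """Exact match: the whole comment, trimmed of surrounding whitespace, is the trigger."""
    return comment_body.strip() == TRIGGER


def should_trigger(secret: bytes, body: bytes, header_sig: str) -> bool:
    """Verify the signature first; only then parse and inspect the payload."""
    if not signature_ok(secret, body, header_sig):
        return False
    payload = json.loads(body)
    return is_trigger(payload.get("comment", {}).get("body", ""))
```

Verifying before `json.loads` keeps unauthenticated input out of the parser. The exact-match rule also rejects `!testme please` and `!TESTME`.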
### M4 — Harness + install stage
- [ ] run_recipe_ci.py + conftest; install stage for recipe #1 + Playwright assertion; teardown
- [ ] Gate: M4 — green install run, no orphaned app/volume
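For the M4 gate ("no orphaned app/volume"), the key property is that teardown runs even when the install stage or the Playwright assertion fails. Below is a minimal sketch using `contextlib`. It assumes `deploy` and `teardown` are hypothetical callables wrapping the harness's abra invocations, and that `teardown` is safe to call after a partial deploy.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def deployed_app(
    domain: str,
    deploy: Callable[[str], None],
    teardown: Callable[[str], None],
) -> Iterator[str]:
    """Deploy a test app for the duration of a `with` block; always tear it down."""
    try:
        # deploy sits inside try: a deploy that fails halfway is still cleaned up.
        deploy(domain)
        yield domain
    finally:
        # Runs on success, on a failed assertion, and on any other exception.
        teardown(domain)
```

The same try/finally shape fits a pytest `yield` fixture in `conftest.py`. Because a half-finished deploy also reaches `teardown`, the teardown must tolerate objects that were never created.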
### M5 — Upgrade + backup/restore stages
- [ ] Add upgrade + backup/restore stages for recipe #1
- [ ] Gate: M5 — upgrade preserves data; backup→mutate→restore returns original
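Both M5 gate conditions reduce to comparing an observable snapshot of app data before and after an operation. The sketch assumes a hypothetical `snapshot` callable, such as a query of known rows or a checksum of a file the test wrote. It also assumes hypothetical `backup`, `mutate`, `restore` and `upgrade` callables that wrap the real stages.

```python
from typing import Any, Callable

Snapshot = Callable[[], Any]
Action = Callable[[], None]


def upgrade_preserves(snapshot: Snapshot, upgrade: Action) -> bool:
    """Data observed before the upgrade is still there, unchanged, afterwards."""
    before = snapshot()
    upgrade()
    return snapshot() == before


def backup_restore_roundtrip(
    snapshot: Snapshot, backup: Action, mutate: Action, restore: Action
) -> bool:
    """backup -> mutate -> restore must return the app to its pre-backup state."""
    original = snapshot()
    backup()
    mutate()
    # Without this guard, a no-op mutate plus a no-op restore would pass.
    if snapshot() == original:
        raise AssertionError("mutate() did not change the observed state")
    restore()
    return snapshot() == original
```

The no-op-mutation guard matters: without it, a `restore` that does nothing could still pass the round trip.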
### M6 — Recipe-local tests + second recipe
- [ ] Discover/run recipe-repo tests/; enroll DB-backed recipe #2
- [ ] Gate: M6 — both green; recipe-local tests merged
### M6.5 — Breadth ramp (recipes 3→6)
- [ ] Enroll recipes 3–6 covering remaining D10 categories, no harness surgery
- [ ] Gate: M6.5 — recipes 3–6 three-stage green

### M7 — Secrets hardening (D6)
- [ ] Full sops model, rotation doc, log redaction + leak test
- [ ] Gate: M7 — secret-grep finds nothing
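The M7 "log redaction + leak test" pair can share one list of known secret values. The sketch assumes the harness can load the plaintext values it injected (for example, from the sops-decrypted files) into memory. `redact` and `leaked` are hypothetical helpers, and they catch only verbatim occurrences.

```python
from typing import Iterable, List

MASK = "[REDACTED]"


def redact(text: str, secrets: Iterable[str]) -> str:
    """Replace every occurrence of each known secret value with a fixed mask."""
    # Longest first, so a secret that contains another secret is masked whole.
    for value in sorted({s for s in secrets if s}, key=len, reverse=True):
        text = text.replace(value, MASK)
    return text


def leaked(text: str, secrets: List[str]) -> List[int]:
    """Indices of secret values still present verbatim in `text` (the secret-grep).

    Indices, not values, so a failing leak test does not print the secret itself.
    """
    return [i for i, s in enumerate(secrets) if s and s in text]
```

Encoded forms of a secret, such as base64 or URL-escaped versions, would need their own entries in the list.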
### M8 — Dashboard (D7)
- [ ] Overview page + badges + PR-comment outcome reflection
- [ ] Gate: M8 — overview matches reality; outcomes mirrored

### M9 — Reproducibility + docs (D8/D9)
- [ ] docs/install.md from-scratch rebuild; all docs complete
- [ ] Gate: M9 — Adversary rebuilds from docs on throwaway host

### M10 — Proof (D10)
- [ ] All six recipes green via real !testme PRs; flip STATUS to DONE

## Adversary findings
<!-- Adversary-only section. Builder must not edit below this line. -->

- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
  Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
  config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
  (stock coop-cloud template). They're currently inert (no router references them; both
  `*-acme.json` are 0 bytes; 0 ACME log lines) because the proxy runs `LETS_ENCRYPT_ENV=""`.
  **But** the recipe default for test apps (e.g. `custom-html/.env.sample`) ships
  `LETS_ENCRYPT_ENV=production`, which renders `traefik.http.routers.<app>.tls.certresolver=production`.
  So if the harness (M4+) deploys a test app *without* forcing `LETS_ENCRYPT_ENV=""`, traefik
  WILL attempt Let's Encrypt HTTP-01 for that app's domain — contradicting the "NO ACME" design,
  hitting LE rate limits, and likely failing (HTTP-01 needs :80 reachable; gateway passes TLS).
  *Repro:* `abra app new custom-html -D x.ci.commoninternet.net` (keep default env) → deploy →
  `docker service inspect <app> ... | grep certresolver` shows `=production`.
  *Fix:* harness must force `LETS_ENCRYPT_ENV=""` (or strip the certresolver label) on every
  test-app deploy; and/or remove the unused `certificatesResolvers` from cc-ci's traefik so
  no-ACME is structural. Re-test: deploy a test app via the harness and confirm 0 ACME log lines
  and that the served cert is the wildcard. Adversary closes after re-test.
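A1's fix and re-test can be sketched as two harness-side helpers:

- `force_no_acme` forces `LETS_ENCRYPT_ENV=""` in a test app's env text before deploy.
- `certresolver_labels` reports any `tls.certresolver` label in `docker service inspect` JSON output after deploy.

The sketch assumes the traefik router labels are service (`deploy`) labels and therefore appear under `Spec.Labels`, as the repro's `docker service inspect` suggests. Both helper names are hypothetical, and the sketch does not assume where the app's env file lives.

```python
import json
import re
from typing import Dict

# An existing LETS_ENCRYPT_ENV line, commented out or not. [ \t] rather than \s
# so the match never swallows neighbouring blank lines.
_LE_LINE = re.compile(r"^[ \t]*#?[ \t]*LETS_ENCRYPT_ENV[ \t]*=.*$", re.MULTILINE)
_FORCED = 'LETS_ENCRYPT_ENV=""'


def force_no_acme(env_text: str) -> str:
    """Return env-file text with LETS_ENCRYPT_ENV forced to the empty string."""
    if _LE_LINE.search(env_text):
        # Rewrites every matching line, so no later line can re-enable ACME.
        return _LE_LINE.sub(_FORCED, env_text)
    sep = "" if env_text.endswith("\n") or not env_text else "\n"
    return f"{env_text}{sep}{_FORCED}\n"


def certresolver_labels(inspect_json: str) -> Dict[str, str]:
    """Traefik certresolver labels found in `docker service inspect` JSON output."""
    found: Dict[str, str] = {}
    for service in json.loads(inspect_json):
        labels = service.get("Spec", {}).get("Labels") or {}
        for key, value in labels.items():
            if key.startswith("traefik.http.routers.") and key.endswith(".tls.certresolver"):
                found[key] = value
    return found
```

The re-test criteria would be met by an empty dict from `certresolver_labels`, together with 0 ACME log lines and the wildcard cert being served.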