Files
cc-ci/BACKLOG.md
autonomic-bot e251a1177c
All checks were successful
continuous-integration/drone/push Build is passing
M2 GATE: green build via push (Drone + exec runner); OAuth bootstrap script + docs
Build #1 success (clone+hello on exec runner). Drone<->Gitea OAuth scripted as
one-time bootstrap-drone-oauth.sh. M2 claimed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-26 23:08:38 +01:00

91 lines
5.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# BACKLOG — cc-ci
Two single-writer sections (§6.1): Builder edits only `## Build backlog`; Adversary edits only
`## Adversary findings`. Closing an item = checking the box in your own section.
## Build backlog
### M0 — Foundations
- [x] Author flake.nix (NixOS host cc-ci) + hosts/cc-ci/{configuration,hardware}.nix from baseline
- [x] Deploy mechanism decision + first rebuild from repo (DECISIONS.md) — switch --flake on host
- [x] sops-nix wiring: host age key (from ssh host key) + master recovery key; secrets/secrets.yaml;
decrypt a test secret on host → /run/secrets/test_secret (0400 root) verified
- [x] Gate: M0 — `ssh cc-ci 'systemctl is-system-running'` healthy after rebuild from repo
→ CLAIMED 2026-05-26, awaiting Adversary (see STATUS.md)
### M1 — Swarm + abra target
- [x] Docker + single-node swarm via Nix (modules/swarm.nix: docker + swarm-init oneshot + `proxy`
overlay net + daily autoprune). Verified: Swarm=active, proxy overlay present.
- [x] Proxy = real coop-cloud/traefik via abra (orchestrator decision, replaces custom traefik.nix):
wildcard/file-provider mode, pre-issued cert as ssl_cert/ssl_key swarm secrets, LETS_ENCRYPT_ENV
empty → no ACME. `scripts/deploy-proxy.sh` (idempotent). Verified E2E via gateway: wildcard cert
served, 0 ACME log lines.
- [x] abra installed (modules/abra.nix, pinned 0.13.0-beta); deployed custom-html by hand over HTTPS
(HTTP 200 nginx page via gateway) and tore it down clean (services/volumes/secrets/containers=0).
- [x] Gate: M1 — recipe reachable over HTTPS at *.ci.commoninternet.net, torn down clean →
CLAIMED 2026-05-26, awaiting Adversary.
### M2 — Drone online
- [x] Drone server (coop-cloud recipe, reconcile oneshot) + exec runner via Nix; Gitea OAuth app.
Server healthz 200 via gateway; runner polling (capacity=2, type=exec).
- [x] hello-world .drone.yml runs green; logs visible (Drone UI + API). Build #1 success: clone +
hello (echo/whoami=root/abra 0.13.0-beta/swarm=active), both exit 0.
- [x] Gate: M2 — push to cc-ci triggers visible green build → CLAIMED 2026-05-26, awaiting Adversary.
OAuth link via one-time `scripts/bootstrap-drone-oauth.sh` (documented in install.md §2).
### M3 — Comment bridge
- [ ] comment-bridge service: HMAC verify, !testme exact match, collaborator check, Drone API call
- [ ] PR comment posting with run link
- [ ] Gate: M3 — live demo on scratch PR; auth enforced
### M4 — Harness + install stage
- [ ] run_recipe_ci.py + conftest; install stage for recipe #1 + Playwright assertion; teardown
- [ ] Gate: M4 — green install run, no orphaned app/volume
### M5 — Upgrade + backup/restore stages
- [ ] Add upgrade + backup/restore stages for recipe #1
- [ ] Gate: M5 — upgrade preserves data; backup→mutate→restore returns original
### M6 — Recipe-local tests + second recipe
- [ ] Discover/run recipe-repo tests/; enroll DB-backed recipe #2
- [ ] Gate: M6 — both green; recipe-local tests merged
### M6.5 — Breadth ramp (recipes 3→6)
- [ ] Enroll recipes 36 covering remaining D10 categories, no harness surgery
- [ ] Gate: M6.5 — recipes 36 three-stage green
### M7 — Secrets hardening (D6)
- [ ] Full sops model, rotation doc, log redaction + leak test
- [ ] Gate: M7 — secret-grep finds nothing
### M8 — Dashboard (D7)
- [ ] Overview page + badges + PR-comment outcome reflection
- [ ] Gate: M8 — overview matches reality; outcomes mirrored
### M9 — Reproducibility + docs (D8/D9)
- [ ] docs/install.md from-scratch rebuild; all docs complete
- [ ] Gate: M9 — Adversary rebuilds from docs on throwaway host
### M10 — Proof (D10)
- [ ] All six recipes green via real !testme PRs; flip STATUS to DONE
## Adversary findings
<!-- Adversary-only section. Builder must not edit below this line. -->
- [ ] **[adversary] A1 — Test-app deploys can silently trigger ACME (no-ACME design hazard).**
Found during M1 verify (M1 still PASSes — proxy itself fires no ACME). cc-ci's traefik static
config (`/etc/traefik/traefik.yml`) defines `staging` + `production` HTTP-01 `certificatesResolvers`
(stock coop-cloud template). They're currently inert (no router references them; both
`*-acme.json` are 0 bytes; 0 ACME log lines) because the proxy runs `LETS_ENCRYPT_ENV=""`.
**But** the recipe default for test apps (e.g. `custom-html/.env.sample`) ships
`LETS_ENCRYPT_ENV=production`, which renders `traefik.http.routers.<app>.tls.certresolver=production`.
So if the harness (M4+) deploys a test app *without* forcing `LETS_ENCRYPT_ENV=""`, traefik
WILL attempt Let's Encrypt HTTP-01 for that app's domain — contradicting the "NO ACME" design,
hitting LE rate limits, and likely failing (HTTP-01 needs :80 reachable; gateway passes TLS).
*Repro:* `abra app new custom-html -D x.ci.commoninternet.net` (keep default env) → deploy →
`docker service inspect <app> ... | grep certresolver` shows `=production`.
*Fix:* harness must force `LETS_ENCRYPT_ENV=""` (or strip the certresolver label) on every
test-app deploy; and/or remove the unused `certificatesResolvers` from cc-ci's traefik so
no-ACME is structural. Re-test: deploy a test app via the harness and confirm 0 ACME log lines
+ served cert is the wildcard. Adversary closes after re-test.