Files
cc-ci/STATUS-1c.md
autonomic-bot ffd4565e73
All checks were successful
continuous-integration/drone/push Build is passing
1c: add operator-gated functional-acceptance e2e (W5.5) — real !testme via public gateway after VM promotion
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 18:46:50 +01:00

101 lines
7.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# STATUS — Phase 1c (full git reproducibility + genuine D8 live rebuild)
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
**Loop state for THIS phase:** STATUS-1c / BACKLOG-1c / REVIEW-1c / JOURNAL-1c (DECISIONS.md shared).
The repo's STATUS.md / BACKLOG.md / REVIEW.md are Phase-1 HISTORY — not this phase's state.
## Phase
**1c kickoff** — Phase 1 is DONE & Adversary-signed-off (1c10fa5; all D1D10 PASS, no VETO).
Now: make the VM fully reproducible from git (secrets+cert in a private `cc-ci-secrets` repo) and
perform a genuine throwaway-VM live rebuild to close D8 honestly.
## In flight — W4 DONE, Gate W4 CLAIMED
- W1 DONE (cc-nix-test 6→4 GB). W2 PASS (Adversary cold). W3 DONE (VM reachable).
- W4 DONE — genuine throwaway-VM live rebuild proven on a FRESH blank VM: only `/var/lib/sops-nix/
key.txt`=recovery key provisioned; `git clone --recursive` + **ONE** `nixos-rebuild switch
?submodules=1` → **running, 0 failed**, byte-identical **`ld19aj2`==cc-ci**, all 6 stacks 1/1, all
secrets+cert decrypted via recovery key, **TLS leaf == git cert** (`57:8D:…:B8:A6`), no manual step.
(Final config = ld19aj2: `sops.age.keyFile` + serialized abra reconcilers fixing a fresh-host race.)
- Throwaway destroyed (frees RAM for Adversary W5; C6 no-leftover). install.md updated to this procedure.
- Remaining: W5 (Adversary cold rebuild + honest D8 rewrite), W6 (docs C7 + final cc-nix-test sizing).
<details><summary>W2 detail (PASS)</summary>
## In flight — W2 (secrets repo + cert into git) — COMPLETE, gate claimed
- [x] **W2 step 1:** private `recipe-maintainers/cc-ci-secrets` created + populated (6 infra secrets
+ wildcard cert/key, sops, both recipients; sha256 byte-perfect) + pushed.
- [x] **W2 step 2:** base repo — `secrets/` is now the cc-ci-secrets submodule (gitlink 2312f1c);
secrets.nix adds `wildcard_cert`/`wildcard_key` → `/var/lib/ci-certs/live/*`; proxy.nix reframed.
Pushed f79e542. Switched live cc-ci (toplevel `vh6vwxbl…`). **Verified:** cert sops-decrypts from
git (symlinks, sha256 match), system running 0 failed, byte-identical (build==running), git-clone
`?submodules=1` path also reproduces `vh6vwxbl…`, live TLS valid (LE wildcard, ssl_verify=0).
- (Recovery-key `sops.age.keyFile` for the throwaway deferred to W3/W4 — re-verify byte-identical there.)
</details>
## Gate
**Gate: W4 — CLAIMED, awaiting Adversary @2026-05-27 ~18:45Z.** Genuine throwaway-VM live rebuild
(C4/C5/D8). For the Adversary's cold W5 (own fresh Incus VM in terraform-ci, ~4 GB; RAM is free — my
throwaway destroyed): provision ONLY `/var/lib/sops-nix/key.txt` = recovery age key (`age1cmk26…`
private half, from `/srv/cc-ci/.sops/master-age.txt`); `git clone --recursive` base+secrets (bot
creds); `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` (per docs/install.md).
Expect: running/0-failed, toplevel `ld19aj2…`==cc-ci, 6 stacks 1/1, cert sha256 `c1d96d61…`, local
`curl --resolve …:127.0.0.1` ssl_verify=0 with served leaf == git cert `57:8D:…:B8:A6`. Then rewrite
the D8 evidence (static byte-identical + live rebuild; drop "infeasible by design"). My evidence:
JOURNAL-1c 2026-05-27 W4 entry. (Note: throwaway base VM = Incus image; live TS_AUTH_KEY in cloud-init.)
**Gate: W2 — PASS @2026-05-27 16:55Z (Adversary, cold).** C1/C2/C3 verified (byte-identical, cert
from git + TLS leaf-match, no plaintext leak). Config has since evolved vh6vwxbl→izsmiajw→**ld19aj2**
(keyFile + serialized reconcilers); Adversary refreshed C1 against izsmiajw @18:00Z; ld19aj2 is final.
<details><summary>prior</summary>
**Gate: W2 — CLAIMED, awaiting Adversary @2026-05-27 ~16:45Z.**
Acceptance to verify (cold): (1) byte-identical `nixos-rebuild build .#cc-ci` == `/run/current-system`
(`vh6vwxbl4qr9whzpwgjimhf9gn4329p8`) — **must init the submodule** (`git clone --recursive` / `git
submodule update --init`, bot creds) then build `--flake 'git+file://<clone>?submodules=1#cc-ci'`, else
`secrets/` is empty; (2) cert sops-decrypted from git to `/var/lib/ci-certs/live/` (symlinks → /run/secrets,
sha256 `c1d96d61…`/`9ec25d00…`) + live TLS served (`https://ci.commoninternet.net`); (3) no plaintext
secret in base repo or Nix store (all 8 secrets ENC in cc-ci-secrets; cert decrypts to tmpfs, not store).
See JOURNAL-1c 2026-05-27 W2a entry for full evidence.
</details>
## Definition of Done (C1C7 — see phase plan §3)
- [x] C1 — Secrets-repo split (Adversary-PASS 16:55Z; re-exercised cold on blank host at C4)
- [x] C2 — Cert in git (Adversary-PASS 16:55Z; re-exercised at C4)
- [x] C3 — All secrets in git, one exception = bootstrap age key (Adversary-PASS 16:55Z; keyFile-on-throwaway at W4)
- [ ] C4 — Genuine throwaway-VM live rebuild (Incus terraform-ci, only age key provisioned)
- [ ] C5 — Honest D8 (static byte-identical + live rebuild; "infeasible by design" removed)
- [ ] C6 — Resource fit + cleanup (cc-nix-test 6→4 GB, throwaway 4 GB, destroyed after; final sizing decided)
- [ ] C7 — Docs (install.md/secrets.md/architecture.md + main plan refs updated to new model)
## ⚠️ Operator override — do NOT destroy the FINAL throwaway VM (read before W5/W6 cleanup)
The operator (2026-05-27) will **repurpose the final W5/C4-C5 clean-room throwaway VM as the new
cc-nix-test** for a live real-traffic test. So: **KEEP that VM running after W5 PASSes — do NOT tear
it down in C6/W6.** Defer its teardown until the operator explicitly says otherwise. This overrides the
plan's "destroy the throwaway" for that one VM. (Adversary: please do not destroy your W5 VM on PASS.)
This also settles C6 final sizing = **promote the rebuilt VM**. All other cleanup is normal (Builder's
first throwaway already destroyed). See DECISIONS.md Phase-1c.
### Pending functional-acceptance e2e (operator-gated — do NOT start early)
After W5/C4-C5 PASS, sequencing is: (1) W5 done → (2) **ORCHESTRATOR renames the verified throwaway →
cc-nix-test** so the public gateway (ci.commoninternet.net + `*.ci` via MagicDNS) routes to it, and
**SIGNALS** me → (3) THEN I run a genuine e2e: post `!testme` (as the bot) on ONE enrolled recipe
(fast, e.g. `custom-html`) and confirm the FULL pipeline against the **live PUBLIC domain**: bridge
picks up the comment → Drone builds → app deploys to `<recipe>.ci.commoninternet.net` **reachable
THROUGH the public gateway** (curl the public subdomain via the proxy, NOT just localhost) → test
passes → app undeploys → result reported. Record Drone run # + public-URL curl in JOURNAL-1c/STATUS-1c
as functional acceptance of D8/clean-room. **Keep the rebuilt VM's full stack (traefik+bridge+drone+
dashboard) running; do NOT run the e2e until the orchestrator signals the swap is done.**
## Blocked
(none)
## Notes
- Current secret layout: `secrets/secrets.yaml` (6 infra secrets), recipients = host age key
(ssh-to-age of cc-ci's ed25519 host key) + off-box master recovery key
(`/srv/cc-ci/.sops/master-age.txt`, sandbox-only). `.sops.yaml` at repo root.
- Wildcard cert currently out-of-band at `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}`
(operator-provided, LE, next renewal ~2026-08-24); proxy.nix reads it from there. 1c moves it
into sops-in-git, decrypted back to that path at activation.
- Sandbox host has NO sops/nix/age — sops ops run on cc-ci (has nix + host age key) or via the master
key with a sops binary fetched on cc-ci.
- cc-nix-test == the live cc-ci server (100.90.116.4); resizing it (W1) briefly stops it.