Files
cc-ci/REVIEW-1c.md
2026-05-27 17:57:18 +01:00

84 lines
12 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW-1c.md — Adversary ledger for Phase 1c (Full reproducibility + genuine D8 live rebuild)
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase1c-full-reproducibility.md`
Definition of Done: **C1C7** (each must be Adversary-verified cold within 24h before DONE).
- **C1** — Secrets-repo split (`cc-ci-secrets` private repo, secrets-only, consumed via flake input; base stays one well-parameterized repo; `nixosConfigurations.cc-ci` still byte-identical to running).
- **C2** — Cert in git (wildcard cert+key are sops secrets in `cc-ci-secrets`, decrypted at activation; "operator drops a cert file" step gone; rebuild serves valid TLS from git-sourced cert).
- **C3** — All secrets in git, one exception (only out-of-band secret = bootstrap age key; everything else sops-encrypted in git).
- **C4** — Genuine throwaway-VM live rebuild (blank NixOS VM in `terraform-ci`, only bootstrap age key provisioned; clone base+secrets, `nixos-rebuild switch`, oneshots converge, cert+secrets decrypt, no manual step outside `docs/install.md`; Adversary performs cold).
- **C5** — Honest D8 (evidence rewritten: static byte-identical closure + live throwaway rebuild; "infeasible by design" removed; any limitation narrow + Adversary-signed-off).
- **C6** — Resource fit + cleanup (`cc-nix-test` 6→4 GB; throwaway VM at 4 GB; ≤~12 GB running guideline; throwaway destroyed after test; final sizing recorded in DECISIONS.md).
- **C7** — Docs (install.md/secrets.md/architecture.md + plan refs updated to new model; fresh engineer can stand up an instance).
Mapping to method milestones: W1→C6(headroom), W2→C1/C2/C3, W3→C4(VM), W4→C4(rebuild), W5→C4/C5(cold proof+honest D8), W6→C6/C7(cleanup+docs).
Standing rules: verify every claim from a COLD START (fresh shell, own clone, no cached state). Re-run the acceptance check myself. Veto power: `## VETO <reason>` forbids DONE until cleared.
---
## Cold-start baseline @2026-05-27 (Phase 1c kickoff)
Adversary loop entered. Observations from cold start:
- `git pull --rebase` → up to date @ `492fa23` (Phase-1 DONE sign-off). **No Phase-1c state files yet** (STATUS-1c.md / BACKLOG-1c.md / JOURNAL-1c.md absent) — Builder has not begun 1c bootstrap. Nothing CLAIMED.
- `ssh cc-ci 'hostname && systemctl is-system-running'``nixos` / `running` (healthy, pre-refactor baseline).
- SOCKS proxy `127.0.0.1:1055` and `ssh cc-ci` working. Incus skill present at `/srv/incus-terraform-nix-vm-creator/skills/incus-terraform/SKILL.md`.
No gates to verify yet. Idling until the Builder seeds 1c state and claims the first gate (watchdog will ping on CLAIM). Will keep break-it probes ready (greps for plaintext secrets in base + store; cert-in-git decrypt path; byte-identical drift; throwaway-VM rebuild cold-repro).
## Pre-W2 cold baselines @2026-05-27 16:10Z (reference values for verifying C1/C2/C3 after W2)
Builder has bootstrapped 1c state; **W2 in flight, not yet CLAIMED**. Decisions recorded by Builder (DECISIONS.md): secrets linkage = **git submodule** (deviates from flake-input default — rationale: no private-repo fetch cred at nix-eval, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical); bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.
Reference values to compare against after W2:
- **C1 byte-identical** — running system toplevel: `/nix/store/m1pdvbhlmlj3x3gn0x83rgwcgssks7qs-nixos-system-nixos-24.11.20250630.50ab793` (booted: `09ia5qd0jw0nghx83b4fijcg2jak9cp4-…`). nixos-version `24.11.20250630.50ab793 (Vicuna)`. After the refactor, `nixos-rebuild build .#cc-ci` must produce the **same** toplevel (pure structural move ⇒ identical closure).
- **C2 cert content** — out-of-band cert at `cc-ci:/var/lib/ci-certs/live/`: `fullchain.pem` 2909 B sha256 `c1d96d61a43bfec10716e18d13832bd325ef173e9af01f197a48490481300080`; `privkey.pem` 227 B sha256 `9ec25d00910677718762713717b8c763da46fa7489e292b057e916a252d0ca42` (EC key). After W2 these must be **sops-decrypted from git** to the same path with the **same hashes**, and the operator-cert-drop precondition framing in proxy.nix must be gone.
- **C3 no-plaintext** — base repo clean: `secrets/secrets.yaml` is sops `ENC[AES256_GCM,…]`; `git grep` for `BEGIN … PRIVATE KEY|BEGIN CERTIFICATE` outside `secrets/` = 0 matches; no `*.pem/*.key/*.crt/*.p12/*.pfx` tracked. After W2: cert+key must be `ENC[…]` in `cc-ci-secrets`, never plaintext; base must stay clean; also grep the **Nix store** for decrypted secret material at activation.
Things to scrutinize hard when W2 is CLAIMED:
1. Submodule actually points at a **private** `recipe-maintainers/cc-ci-secrets` holding only encrypted secrets (no code/config logic).
2. Byte-identical: same toplevel store path (or differences are only expected & explained — zero functional drift).
3. Cert genuinely served from the git-sourced cert after switch (live TLS handshake on a `*.ci.commoninternet.net` host), not the stale out-of-band file.
4. All D1D10 still hold after the refactor (no regression) — spot-check the live system health + a `!testme`-path sanity check before DONE.
## Interim probe @2026-05-27 16:22Z — cc-ci-secrets repo (pre-W2-gate; not a gate verdict)
Independent cold check of the new secrets repo (Builder W2 step 1, commit `f972bc1`), via Gitea API with bot creds:
- `recipe-maintainers/cc-ci-secrets` exists, **`private: True`**, non-empty. Top-level: `.sops.yaml`, `README.md`, `secrets.yaml` (no code / no config logic — matches §2's "encrypted secrets only"; README is doc-only and leak-clean).
- `secrets.yaml`: **all 8 keys `ENC[...]`** — 6 infra (test_secret, drone_rpc_secret, drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac) **+ `wildcard_cert` + `wildcard_key`**. **0 plaintext PEM/cert markers**; sops `mac` metadata present. → cert+key genuinely moved into sops-in-git (C2/C3 secrets-side looks good).
- Layout nuance: secrets file is at repo **root** `secrets.yaml`; Builder will mount the submodule at base `secrets/` so it resolves to `secrets/secrets.yaml`. OK for the submodule linkage.
**Not yet verifiable (needs W2 base-switch + activation):** byte-identical build==running (C1), cert sops-**decrypts to the same hashes** at `/var/lib/ci-certs/live/` (C2 — must match fullchain `c1d96d61…`, privkey `9ec25d00…`), no plaintext leak into the **Nix store**, live TLS from git-cert, and no D1D10 regression. Will run these when **Gate W2** is CLAIMED.
## W2: PASS @2026-05-27 16:55Z — secrets-split + cert-in-git (verifies C1, C2, C3) — COLD
Gate W2 CLAIMED by Builder (commits `f972bc1`/`f79e542`/`faa3709`; running toplevel `vh6vwxbl…`). Verified independently from a cold start (fresh clone on cc-ci, own checks, no reliance on the Builder's `/root/cc-ci`):
**(1) Byte-identical build==running (C1) — PASS.** Fresh recursive clone of `origin/main` (HEAD `0633aa7`) on cc-ci into `/tmp/advverify`, submodule `secrets``2312f1c` initialized with bot creds (via `http.extraheader`, not URL/args), `secrets/secrets.yaml` present + `ENC[…]`. `nixos-rebuild build --flake 'git+file:///tmp/advverify?submodules=1#cc-ci'``/nix/store/vh6vwxbl4qr9whzpwgjimhf9gn4329p8-nixos-system-…` == `/run/current-system` (`readlink -f` identical). **Zero drift** — the *currently published* repo+submodule reproduces the *currently running* system byte-for-byte. Base stays one parameterized repo; only `secrets/` is the external private submodule.
**(2) Cert in git + live TLS (C2) — PASS.** `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` are now **symlinks → `/run/secrets/wildcard_cert`,`wildcard_key`** (sops-decrypted at activation), not out-of-band files. File sha256 `c1d96d61…`/`9ec25d00…` == my pre-W2 operator-cert baseline (byte-identical cert, now git-sourced). `secrets.nix` adds `wildcard_cert`(0444)/`wildcard_key`(0400) with a comment that this "Replaces the prior operator-drops-a-cert-file step." Live HTTPS `https://ci.commoninternet.net` via proxy → `http_code=200`, `ssl_verify_result=0`, served leaf = LE `*.ci.commoninternet.net` (SAN `*.ci`+bare), valid 2026-05-26→08-24. **Served leaf fingerprint `57:8D:67:9E:FE:89:…:B8:A6` == the git-sourced cert's leaf fingerprint** (computed locally from the decrypted file) → live TLS provably served from the git cert, full chain of custody intact.
**(3) No plaintext leak (C3) — PASS.** Base repo: `secrets/` is a gitlink (`.gitmodules`→ private `cc-ci-secrets`); no `*.pem/*.key` tracked; `git grep BEGIN…PRIVATE KEY|CERTIFICATE` outside REVIEW text = 0. `cc-ci-secrets`: all 8 secrets `ENC[…]` (6 infra + cert + key), 0 plaintext PEM, valid sops MAC, private repo. On the host: secrets decrypt to **`/run/secrets.d` (ramfs, in-memory)**, not the world-readable store; no private key found in the system-closure store dirs.
**Non-regression:** `systemctl is-system-running`=running, **0 failed units**; swarm stack all 1/1 (`traefik` v3.6.15, `drone` 2.26.0, `ccci-bridge`, `ccci-dashboard`, `backups`), `drone-runner-exec` running; reconcile oneshots converged. No D1D10 regression observed.
**C1, C2, C3 Adversary-PASS** (24h freshness clock starts now; will be re-exercised on the blank host at C4). Remaining for DONE: C4 (genuine throwaway-VM live rebuild), C5 (honest D8), C6 (resize+cleanup), C7 (docs). No VETO.
## Corroboration @2026-05-27 17:23Z — sops cert re-decrypts at BOOT (after W1 resize-reboot)
W1 (Builder, `6c03a27`) resized cc-nix-test 6→4 GB and rebooted the live server. Cold spot-check post-reboot: system `running`, 0 failed, mem 3575 MB (≈4 GB applied), live TLS `http_code=200 ssl_verify=0`. Cert symlink target moved `/run/secrets.d/8/``/1/` (ramfs wiped on reboot) but `fullchain.pem` sha256 still `c1d96d61…`. → the git-sourced sops cert **re-decrypts byte-identically at boot**, not only at `switch` — strengthens C2 (reproducible from git across a cold boot). No formal gate (W1 has no Adversary gate); W4 = next gate. Builder W3 DONE: throwaway VM reachable `100.126.124.86`.
## C4/W5 verification standard (set @2026-05-27 17:30Z — read before claiming W4)
My cold proof of the throwaway-VM live rebuild (C4) will require, and I will REJECT a skipped/faked TLS check:
- Rebuilt VM **keeps `DOMAIN = ci.commoninternet.net`** (same instance ⇒ proves the SAME system reproduces). The git cert only covers `*.ci.commoninternet.net` + bare — **do NOT use a `ci2.commoninternet.net` domain** (no `*.ci2` cert ⇒ TLS unverifiable / would be a fake pass).
- Fresh VM has a NEW tailnet IP; public DNS for `*.ci.commoninternet.net` → gateway → the *real* cc-ci, not the fresh VM. So verify TLS **on the fresh VM itself**, forcing resolution to the VM: `curl --resolve <host>.ci.commoninternet.net:443:127.0.0.1` (or to the VM's tailnet IP), SNI `ci.commoninternet.net`.
- **Served leaf fingerprint must == the git cert leaf** `57:8D:67:9E:FE:89:…:B8:A6` (sha256), proving Traefik on the rebuilt host serves the sops-from-git cert. Cert-from-git serving is an integral part of the C4/D8 proof.
- Plus: oneshots converge (swarm/proxy/drone/bridge/dashboard), all secrets decrypt, **no manual step outside `docs/install.md`**, only the bootstrap age key provisioned out-of-band.
## C1 refresh @2026-05-27 18:00Z — byte-identical against NEW keyFile config (izsmiajw)
Builder W4 Step A (`9cc6788`/`24fe11a`) added `sops.age.keyFile` (recovery key on clones, host-derived on cc-ci) and switched cc-ci → new toplevel `izsmiajwjwa12356mm35fw08jdy5f0zs` (supersedes the `vh6vwxbl` from my 16:55 W2 PASS). Re-verified cold: fresh recursive clone (HEAD `24fe11a`, submodule `2312f1c`) → `nixos-rebuild build` = `izsmiajw` == `/run/current-system`. **BYTE-IDENTICAL: YES, zero drift.** Live host healthy (running, 0 failed), cert sha `c1d96d61…`, TLS `200/ssl_verify=0`. → **C1 stays Adversary-PASS** against the current running config; clock refreshed 18:00Z. (W4 Step B throwaway rebuild still in flight — not yet CLAIMED.)
<!-- Append PASS/FAIL verdicts below with timestamps + evidence. -->