# JOURNAL — Phase 1c (Builder)

Append-only. Each entry: what I did + verifying command/output + next.

## 2026-05-27 — Phase 1c kickoff / orientation

**Context:** Phase 1 closed DONE (1c10fa5; all D1–D10 Adversary-PASS, no VETO; Adversary loop terminated for Phase 1). Starting Phase 1c per `plan-phase1c-full-reproducibility.md`.

**Did:**
- Read the phase plan in full + plan.md §6.1/§7/§9. Verified access & surveyed current state.
- `ssh cc-ci 'hostname && nixos-version && systemctl is-system-running'` → `nixos` / `24.11.20250630.50ab793 (Vicuna)` / `running`. Healthy.
- Surveyed secrets layout:
  - `secrets/secrets.yaml` holds 6 infra secrets: test_secret, drone_rpc_secret, drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac.
  - `.sops.yaml`: recipients = `&host` (ssh-to-age of cc-ci ed25519 host key) + `&master` (off-box recovery, private at `/srv/cc-ci/.sops/master-age.txt`, sandbox-only).
  - `modules/secrets.nix`: `defaultSopsFile = ../secrets/secrets.yaml`, `age.sshKeyPaths = ["/etc/ssh/ssh_host_ed25519_key"]`.
  - Wildcard cert OUT-OF-BAND at `cc-ci:/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (fullchain 2909 B, privkey 227 B → privkey looks like an EC key). `modules/proxy.nix` reconcile reads it from there and inserts as traefik `ssl_cert`/`ssl_key` swarm secrets; treats a missing cert as FATAL (operator precondition).
  - `flake.nix`: nixpkgs pinned `50ab793`, sops-nix pinned; single local `nixosConfigurations.cc-ci`.
- Tooling note: sandbox host has NO sops/nix/age/ssh-to-age (`which` → only git). sops/age work must run on cc-ci (has nix + host age key) or via a sops binary fetched there with the master key.
- Bootstrapped Phase-1c state: STATUS-1c.md, BACKLOG-1c.md, JOURNAL-1c.md (this file). REVIEW-1c.md left for the Adversary (its file per §6.1). Appended Phase-1c decisions to DECISIONS.md.
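
A quick preflight makes the tooling note repeatable on any host before a sops/age step is attempted there. The following is a minimal bash sketch that relies only on the `command -v` builtin. `missing_tools` is a hypothetical helper name, not a script in the repo.

```bash
#!/usr/bin/env bash
# Sketch: report which required commands are absent on the current host.
# Prints each missing name on its own line; returns 1 if any are missing.
missing_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `missing_tools sops nix age ssh-to-age || echo "run sops/age steps on cc-ci"`. Given the `which` result above, this should list all four names on the sandbox.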

**Decisions recorded (DECISIONS.md):** secrets linkage = **git submodule** (deviates from the flake-input default — rationale: no private-repo fetch credential needed at nix-eval on every rebuild, keeps `defaultSopsFile` a local path = minimal change + trivially byte-identical); bootstrap key for throwaway = **recovery age key via `sops.age.keyFile`**.

**Next (W2):** create private `recipe-maintainers/cc-ci-secrets`; move secrets + wildcard cert into sops there as a submodule of the base; wire secrets.nix (cert→`/var/lib/ci-certs/live` via `path=`); prove byte-identical build + clean switch with TLS from the git cert. Then claim Gate W2.

## 2026-05-27 — W2 step 1: cc-ci-secrets repo created + populated (DONE)

**Did:**
- Created private `recipe-maintainers/cc-ci-secrets` via Gitea API (bot, org admin). HTTP 201, private=True.
- Confirmed cc-ci host SSH key → age recipient == `&host` recipient `age1h90utd…`: `ssh cc-ci 'nix shell nixpkgs#ssh-to-age --command ssh-to-age -i /etc/ssh/ssh_host_ed25519_key.pub'` → exact match. So I can decrypt/re-encrypt on cc-ci with the host key (master stays sandbox-only).
- Built `secrets.yaml` on cc-ci (script with file redirections, no key material in argv): `sops -d` existing 6 secrets → append `wildcard_cert`/`wildcard_key` as YAML block scalars from `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` → `sops -e`. Verified round-trip:
  - recipients: 2 (host+master)
  - keys: test_secret, drone_rpc_secret, drone_gitea_client_secret, bridge_drone_token, bridge_gitea_token, bridge_webhook_hmac, wildcard_cert, wildcard_key
  - cert sha256 file==decrypt `c1d96d61…`; key sha256 file==decrypt `9ec25d00…`; test_secret decrypts OK
- Retrieved ciphertext (7219 B) to sandbox; created cc-ci-secrets repo (root `secrets.yaml`, own `.sops.yaml` w/ `path_regex: secrets\.yaml$`, README). Pushed to main (auth via per-command http.extraHeader; verified `.git/config` has NO creds). Remote lists .sops.yaml/README.md/secrets.yaml.
- Cleaned `/root/cc-ci-secrets.yaml` + build script off cc-ci.

**Layout decision:** cc-ci-secrets has `secrets.yaml` at ROOT → submodule mounts at base `secrets/` → base sees `secrets/secrets.yaml`, so `defaultSopsFile = ../secrets/secrets.yaml` is UNCHANGED.

**Next (W2 step 2):** in base repo — replace tracked `secrets/` with the submodule; add `wildcard_cert`/`wildcard_key` sops secrets in secrets.nix (path= → /var/lib/ci-certs/live, + recovery keyFile); adjust proxy.nix framing; switch cc-ci to new config via `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'`; prove byte-identical + TLS-from-git-cert; then claim Gate W2. (Riskier — touches live server config; fresh iteration.)

## 2026-05-27 — W2a DONE + verified live; Gate W2 CLAIMED

**Discovery:** cc-ci's build source `/root/cc-ci` is NOT a git repo — it's a plain dir synced from the sandbox via `tar | ssh` and built as a `path:` flake (DECISIONS.md:126). So cc-ci's deploy needs NO submodule fetch / `?submodules=1` (the synced dir already contains `secrets/`). The git-clone `--recursive` + `?submodules=1` path is only for the documented install / throwaway (W4).

**Did (W2a — secrets split + cert into git, deployed to live cc-ci):**
- secrets.nix: added `wildcard_cert`(0444)/`wildcard_key`(0400) sops secrets → `path=/var/lib/ci-certs/live/*`.
- proxy.nix: reframed cert as sops-from-git (not operator drop); kept FATAL guard as a decrypt-path check.
- Base repo: `git rm secrets/secrets.yaml`; `git submodule add cc-ci-secrets secrets` (gitlink 2312f1c, `.gitmodules` has NO creds). Pushed f79e542 (rebased over Adversary's c360520; resolved the tracked-file→submodule transition by removing the submodule working dir before rebase, repopulating after).
- Synced to cc-ci via `tar | ssh` (excluded .git).
- `nixos-rebuild build` → exit 0, only **6 derivations built** (sops manifest gains cert/key + proxy unit error-msg edit) → toplevel `vh6vwxbl4qr9whzpwgjimhf9gn4329p8` (differs from pre-W2 `m1pdvbhl…` — EXPECTED: the cert moved from an out-of-band file to Nix-managed sops; that is C2's whole point, not drift).
- Backed up operator cert (`/root/ci-certs-operator-bak`), removed the regular files, `nixos-rebuild switch` (detached unit `ccci-w2-switch`, Result=success).

**Verified live:**
- sops cert decrypt: `/var/lib/ci-certs/live/{fullchain,privkey}.pem` are now symlinks → `/run/secrets/wildcard_{cert,key}`; content sha256 == source: `c1d96d61…` / `9ec25d00…` (byte-identical to the original operator cert, now git-sourced).
- `systemctl is-system-running` → running, 0 failed. `deploy-proxy` active/success.
- **Byte-identical (zero drift):** `nixos-rebuild build` == `/run/current-system` == `vh6vwxbl…`.
- **Documented git-clone path also reproduces it:** fresh `git clone --recursive` into a temp git repo + `nixos-rebuild build --flake 'git+file:///tmp/ccci-git?submodules=1#cc-ci'` → **vh6vwxbl… (MATCH)**. Proves the install/throwaway path works and equals running.
- **Live TLS from git cert:** `https://ci.commoninternet.net` http=200 ssl_verify=0; random `probe-*.ci.commoninternet.net` handshake ssl_verify=0 (404 route, expected) via gateway→cc-ci; served leaf `CN=*.ci.commoninternet.net`, LE issuer, valid to Aug 24 2026.

**For the Adversary verifying Gate W2 cold:** must init the submodule (`git clone --recursive` OR `git submodule update --init`, bot creds) then build with `?submodules=1`, else `secrets/` is empty. Both the `path:` and the git+submodules builds yield the same toplevel `vh6vwxbl…` (the source tree is content-addressed).

**Deferred to W3/W4 prep (NOT in W2):** the recovery-key `sops.age.keyFile` for the throwaway VM — adding it changes the closure again, so I'll add + test it on the throwaway (safe) and re-establish byte-identical there. cc-ci stays on its proven host-key decrypt path for now.
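
The two invariants checked under **Verified live** reduce to two comparisons. The first is that the decrypted cert bytes equal the source. The second is that the built toplevel equals the running system. The following is a bash sketch, assuming GNU coreutils (`sha256sum`, `readlink -e`). Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: the two "Verified live" invariants as reusable checks.

# Succeeds if both files are readable and have identical SHA-256 digests.
same_sha256() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  local a b
  a=$(sha256sum < "$1" | cut -d' ' -f1)
  b=$(sha256sum < "$2" | cut -d' ' -f1)
  [ "$a" = "$b" ]
}

# Succeeds if both paths (e.g. ./result and /run/current-system) exist and
# resolve to the same target, i.e. the same /nix/store/...-nixos-system-* path.
same_toplevel() {
  local built running
  built=$(readlink -e "$1") || return 2
  running=$(readlink -e "$2") || return 2
  [ "$built" = "$running" ]
}
```

Usage on cc-ci, after `nixos-rebuild build` has left `./result`: `same_toplevel ./result /run/current-system && echo "ZERO DRIFT"`. The cert check is analogous, for example `same_sha256 /var/lib/ci-certs/live/fullchain.pem /root/ci-certs-operator-bak/fullchain.pem`. That example assumes the backup kept the original file names.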

**Next:** Gate W2 CLAIMED → await Adversary PASS on byte-identical + cert-in-git/TLS. Meanwhile prep W1 (resize) / W3 (throwaway VM) — read the incus skill.

## 2026-05-27 — W3 recon (read-only; while parked at Gate W2)

Incus skill read. b1 = 100.117.251.31:8443, project terraform-ci, mTLS certs at /srv/incus-terraform-nix-vm-creator/terraform-secrets/{terraform.crt,terraform.key}. **b1 reachable via the EXISTING cc-ci proxy** (`curl --proxy socks5h://127.0.0.1:1055 --cert/--key -k …`) — no separate tailscaled needed (skill's own 1055 proxy would collide; reuse cc-ci's).

terraform-ci instances + RAM:
- cc-nix-test Running 6GB VM ← this IS the live cc-ci; W1 resizes 6→4 (stop→set→start, hotplug times out)
- lichen-staging Running 4GB container (leave alone)
- kube-base / kube-base-test Stopped 4GB VMs
- release-runner Stopped 8GB VM

Running total now = 10GB. After W1 + throwaway(4GB): 4+4+4 = 12GB ≤ 16 physical (phase-plan ~12GB doc-only guideline; terraform-ci has no enforced limits.memory). VM create = `projects/incus-base` Terraform template (NixOS base image, cloud-init+tailscale+nix flakes), set instance_name + limits.memory=4GB.

## 2026-05-27 — W1 DONE: cc-nix-test resized 6→4 GB (verified)

Gate W2 PASSED (Adversary, cold) → proceeded. No active CI run (only 5 permanent stacks). Resized via Incus API on b1 (mTLS certs through the existing 1055 proxy): PUT state stop (op Success, Stopped) → PATCH `limits.memory=4GB` (http 200) → PUT state start (op Success, Running).

**Verified after reboot:**
- SSH back in ~30s; `systemctl is-system-running` → running after ~104s (swarm/reconcile converge), 0 failed units.
- `free -h` total 3.5Gi (≈4 GB, down from 6). All stacks 1/1 (traefik app+socket-proxy, drone, bridge, dashboard, backups).
- **Cert survived reboot via sops:** `/var/lib/ci-certs/live/{fullchain,privkey}.pem` still symlinks → /run/secrets/* (sops re-decrypted on cold boot). current-system still `vh6vwxbl…`.
- TLS: `https://ci.commoninternet.net/` http=200 ssl_verify=0 (dashboard served from git cert).

Running RAM now: cc-nix-test 4 + lichen-staging 4 = 8 GB; throwaway 4 → 12 GB ≤ 16 physical (guideline OK).

**Next: W3** — create blank 4 GB NixOS VM in terraform-ci, provision ONLY the bootstrap (recovery) age key.

## 2026-05-27 — W3: throwaway VM created (booting) + W4 design notes

**W3:** Created `ccci-throwaway` in terraform-ci via the **Incus REST API** (curl through the 1055 proxy — terraform/nix absent on sandbox; replicated `projects/incus-base/main.tf`): image `incus-base-vm` (fp 3a0c4160), 4 GB RAM / 2 cpu / **20 GB disk** (>10 GB default, to dodge cc-ci's old ENOSPC), cloud-init writes /etc/nixos/{configuration,incus-base}.nix + setup.sh + /etc/ts-auth-key (incus workspace reusable key) + /etc/ts-hostname=ccci-throwaway; runcmd setup.sh (nix-channel nixos-24.11, `nixos-rebuild boot`, sysrq reboot → tailscale auto-joins). ssh_authorized_keys = vm_ssh_key (I hold the private key) + mfowler + cc-ci-root key. CREATE+START ops Success, status Running; first boot ~4-6 min.

NOTE: cc-nix-test was terraform-created (`projects/cc-nix-test`); my W1 API resize drifts its tfstate (reconcile or accept in W6 final-sizing).

**W4 design (analysis; implement next):**
- cc-ci's `hosts/cc-ci/configuration.nix` pins tailscale `--hostname=cc-nix-test` + reads /etc/ts-auth-key, and `secrets.nix` decrypts ONLY via `age.sshKeyPaths` (host SSH key). Consequences for the throwaway:
  1. **Decryption:** throwaway's host SSH key is NOT a sops recipient → cc-ci config as-is can't decrypt there. **W4 must add `sops.age.keyFile = "/var/lib/sops-nix/key.txt"`** and provision the **recovery age key** there (the ONE out-of-band secret). Open Q: does a *missing* keyFile abort activation on cc-ci (where the file won't exist)? If yes, also provision cc-ci's own host-derived age key at that path (no new exposure) OR keep sshKeyPaths+keyFile and confirm sops-nix tolerates the absence.
     Test path: add keyFile, deploy to cc-ci (rollback-safe via generations), observe.
  2. **Tailnet hostname:** after rebuild the throwaway re-ups as `cc-nix-test` → tailscale auto-suffixes the duplicate; the REAL cc-ci is accessed by IP (100.90.116.4) so it's unaffected. Verify the throwaway via its own IP (Incus state tailscale0 addr) and/or incus-agent `exec` (hostname-independent).
  3. **Bridge side effect:** throwaway's bridge would poll Gitea with the real token (fresh state ⇒ could re-trigger already-`!testme`'d PRs). Mitigate: run W4 when no `!testme` is pending; destroy promptly.
- Adding keyFile changes the closure again (W2 byte-identical was at `vh6vwxbl`); re-verify after.

## 2026-05-27 — W3 DONE (VM reachable) + keyFile finding

**W3 reachable:** throwaway base boot initially failed tailscale auth — the incus-workspace `.test.env` key is **stale** ("invalid key: API key does not exist"). Fixed by writing the **current `TS_AUTH_KEY` from /srv/cc-ci/.testenv** (same tailnet `taila4a0bf.ts.net`) to /etc/ts-auth-key and `tailscale up`. VM now at **100.126.124.86**; `ssh -i vm_ssh_key` via the 1055 proxy works → NixOS 24.11 (rev 50ab793, == cc-ci), nix 2.24 flakes, 4 GB / 20 GB (13 G free). *(install.md/Adversary note: provision the live TS key, not the stale workspace one.)*

**keyFile finding (decisive):** read sops-install-secrets main.go (sops-nix 77c423a, store `hm2xjph…-source/pkgs/sops-install-secrets/main.go`): when `age.keyFile` is set, line ~1349 calls `os.ReadFile(AgeKeyFile)` and **returns a fatal error if the file is missing** → activation fails. ⇒ Adding `keyFile` to cc-ci's config FORCES the file to exist on cc-ci. Also: `sshKeyPaths` reads `/etc/ssh/ssh_host_ed25519_key` (exists on any host; non-recipient keys are simply unused), so keeping both is safe on both hosts.

**W4 design (locked):** secrets.nix gets `sops.age.keyFile = "/var/lib/sops-nix/key.txt"` (keep sshKeyPaths).
Provision that file = the host's bootstrap age key: on **cc-ci** = its host-derived age key (ssh-to-age of the host SSH key — no new secret exposure); on the **throwaway** = the **recovery key** (/srv/cc-ci/.sops/master-age.txt). cc-ci must get the file BEFORE the keyFile config deploys. Adding keyFile changes the closure (supersedes W2 `vh6vwxbl`) → re-verify byte-identical after.

## 2026-05-27 — Orchestrator guidance for C4 TLS verification (W4 Step B)

The throwaway has a NEW tailscale IP (100.126.124.86); the canonical `ci.commoninternet.net` gateway/DNS still points at the LIVE cc-ci, and the git cert is `*.ci.commoninternet.net`. So verify C4 TLS **locally ON the throwaway**, WITHOUT repointing the live gateway and WITHOUT changing the throwaway DOMAIN (keep DOMAIN=ci.commoninternet.net so the cert matches):
- ssh into the throwaway; `curl --resolve probe.ci.commoninternet.net:443:127.0.0.1 https://probe.ci.commoninternet.net/` → hits the local traefik with SNI `probe.ci.commoninternet.net` (covered by the `*.ci.commoninternet.net` wildcard).
- Confirm the served leaf == the git cert (sha256 fullchain `c1d96d61…`; Adversary's leaf fingerprint `57:8D:67:9E:FE:89:…:B8:A6`). That proves the rebuilt system serves the git-sourced cert reproducibly.
- Do NOT use ci2 for the TLS test (no `*.ci2` cert → would mismatch). Operator wired `ci2.commoninternet.net` + `*.ci2` → 100.126.124.86 for *plain* reachability only (not needed for TLS).
- DNS/gateway/cert are documented external INSTANCE preconditions; C4 proves the VM rebuilds from git + the single bootstrap age key. Don't skip/fake the TLS check.

## 2026-05-27 — W4 Step A DONE + Step B launched (throwaway rebuild in flight)

**Step A (cc-ci → final keyFile config):** provisioned cc-ci `/var/lib/sops-nix/key.txt` = host-derived age key (pub == `age1h90utd…` == &host recipient, verified via age-keygen -y). Added `sops.age.keyFile` to secrets.nix (9cc6788), synced, `nixos-rebuild build`→`izsmiajw…` (only manifest+system rebuilt), switched (unit ccci-w4a-switch success).
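
Step A uses a fixed order: the key file is created first and the keyFile config is switched in second. That order follows from the finding that sops-install-secrets aborts when `age.keyFile` is missing. The following bash sketch covers both steps. It assumes `ssh-to-age` with its `-private-key` flag (as used in the sops-nix README) and `age-keygen -y`. `KEYFILE`, `AGE_KEYGEN`, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: provision and preflight the sops-nix age key file before deploying a
# config that sets sops.age.keyFile (sops-install-secrets fails if it is missing).

KEYFILE="${KEYFILE:-/var/lib/sops-nix/key.txt}"

# For a host whose SSH host key is already a sops recipient (cc-ci): derive the
# age identity from that key. Writes private key material, hence umask 077.
provision_from_host_key() {
  (
    umask 077
    mkdir -p "$(dirname "$KEYFILE")"
    ssh-to-age -private-key -i /etc/ssh/ssh_host_ed25519_key > "$KEYFILE"
  )
}

# Succeeds if KEYFILE is non-empty and its public key equals the expected
# recipient. AGE_KEYGEN may name a stub command (used in tests).
keyfile_ready() {
  local expected="$1" pub
  [ -s "$KEYFILE" ] || { echo "missing or empty: $KEYFILE" >&2; return 1; }
  pub=$("${AGE_KEYGEN:-age-keygen}" -y "$KEYFILE") || return 1
  [ "$pub" = "$expected" ] || { echo "recipient mismatch: $pub" >&2; return 1; }
}
```

On cc-ci the sequence would be `provision_from_host_key && keyfile_ready <&host recipient from .sops.yaml>`, run before `nixos-rebuild switch`. On a throwaway, the file would instead hold the recovery key and be checked against the `&master` recipient.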

Verified: system running 0 failed, **byte-identical build==running==`izsmiajw…` (ZERO DRIFT)**, cert still sha256 `c1d96d61…`. So cc-ci activates cleanly with keyFile. NOTE: toplevel evolved `vh6vwxbl` (W2) → **`izsmiajw`** (final, +keyFile); the published repo now builds to izsmiajw==running — this is the form the Adversary re-verifies for C4/DONE.

**Step B (throwaway live rebuild — IN FLIGHT):**
- Provisioned throwaway `/var/lib/sops-nix/key.txt` = **recovery key** (via stdin; pub == `age1cmk26…` == &master recipient, verified) — the ONE out-of-band secret.
- `git clone --recursive` base (bot creds via http.extraHeader, the "given the repos" provisioning) → /root/cc-ci, submodule `secrets`→2312f1c, secrets.yaml ENC. Confirmed clone has `age.keyFile` line.
- Launched `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` as detached unit `ccci-rebuild` (survives the tailscale re-up when cc-ci config activates). Monitoring via incus-agent `exec` (vsock — survives network restart). Expect 10-30 min (builds sops-install-secrets/abra/etc).

C4/W5 standard (Adversary dd710a6 == orchestrator guidance): keep DOMAIN=ci.commoninternet.net, verify TLS locally on the VM via `curl --resolve …:443:127.0.0.1` (SNI ci.commoninternet.net), served leaf fingerprint must == git cert leaf `57:8D:67:9E:…:B8:A6`; oneshots converge; only age key out-of-band.

## 2026-05-27 — W4 Step B: throwaway rebuilt; concurrent-abra race found + fixed

**Throwaway rebuild result (pre-fix config, clone @dd710a6):** `nixos-rebuild switch` BUILD succeeded (2.8 G peak RAM < 4 GB, 11.5 min CPU) → toplevel **`izsmiajw…` == cc-ci's running system** (blank VM reproduces cc-ci byte-for-byte from git + the bootstrap age key). **sops cert decrypted via the RECOVERY key**: /var/lib/ci-certs/live/{fullchain,privkey}.pem → /run/secrets/*, sha256 `c1d96d61…` (match). swarm-init + docker active (node Ready/Leader).
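
The C4/W5 criterion requires the served leaf fingerprint to equal the git cert leaf. It can also be checked with `openssl` instead of `curl`. That substitution is an illustration, not the procedure the journal used. The bash sketch below compares normalised hex. This avoids false mismatches from the `Fingerprint=` prefix and from letter case, both of which can differ between tools and OpenSSL versions. `served_leaf_fp`, `normalize_fp`, and `fp_equal` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: compare the leaf a local TLS endpoint serves for an SNI name with a
# reference SHA-256 fingerprint.

# Connects to 127.0.0.1:443 with SNI set (the openssl analogue of
# curl --resolve NAME:443:127.0.0.1) and prints the leaf's SHA-256 fingerprint line.
served_leaf_fp() {
  openssl s_client -connect 127.0.0.1:443 -servername "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256
}

# "sha256 Fingerprint=57:8d:..." or "57:8D:..." -> bare uppercase hex "578D...".
normalize_fp() {
  printf '%s\n' "${1#*=}" | tr -d ':[:space:]' | tr '[:lower:]' '[:upper:]'
}

# Succeeds only if the first fingerprint is non-empty and both normalise equal.
fp_equal() {
  [ -n "$1" ] && [ "$(normalize_fp "$1")" = "$(normalize_fp "$2")" ]
}
```

Usage on the rebuilt VM: `fp_equal "$(served_leaf_fp probe.ci.commoninternet.net)" 57:8D:67:9E:FE:89:D5:FB:43:2E:2A:02:D6:A6:BA:F4:9B:98:1A:78:4A:6C:6A:85:DB:F6:A2:81:61:A6:B8:A6`. If the handshake fails, `served_leaf_fp` prints nothing and the comparison fails. It does not pass vacuously.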

BUT activation reported "error(s) while switching": `deploy-proxy` + `deploy-drone` FAILED → system `degraded`.

**Root cause:** the abra reconcilers (proxy/drone/bridge/dashboard/backupbot) are all `wantedBy multi-user.target`; drone/bridge/dashboard were `after deploy-proxy` but **concurrent with each other**, and backupbot concurrent with proxy. On a FRESH `~/.abra` they race on catalogue/recipe init → fast failures. Confirmed: `abra recipe fetch traefik` works fine alone (rc=0); re-running the oneshots **sequentially** (`systemctl restart deploy-proxy; …drone; …bridge; …dashboard; …backupbot`) → ALL success, system `running`, **0 failed, all 6 stacks 1/1** (traefik app+socket-proxy, drone, bridge, dashboard, backups) — identical to cc-ci.

**Fix (7563d47):** serialize the chain via ordering-only `after`: proxy → drone → bridge → dashboard → backupbot (bridge after drone, dashboard after bridge, backupbot after dashboard). So a single `nixos-rebuild switch` on a blank host converges with no concurrent abra. New toplevel `ld19aj2…`. Deploying to cc-ci (reconcilers already deployed there ⇒ serial no-op re-runs) + re-verify byte-identical, then **recreate the throwaway FRESH** to prove single-switch convergence (authoritative C4; mirrors the Adversary's W5 cold test). This is the LAST planned config change before W4 completes (config stable ld19aj2 thereafter).

## 2026-05-27 — W4: cc-ci on serialized config (ld19aj2) + throwaway TLS leaf-match PASS

- cc-ci switched to serialized config: `systemctl is-system-running`=running, **byte-identical build==running==`ld19aj2dcrjm6jarq1k6rvhc0zww34qq` (ZERO DRIFT)**, 6 stacks.
- **Throwaway local TLS (C4 cert proof):** on the rebuilt throwaway (IP 100.126.124.86), `curl --resolve probe.ci.commoninternet.net:443:127.0.0.1` → http=404 (no route, expected) **ssl_verify=0**.
  Served leaf sha256 fingerprint == git-cert leaf: `57:8D:67:9E:FE:89:D5:FB:43:2E:2A:02:D6:A6:BA:F4:9B:98:1A:78:4A:6C:6A:85:DB:F6:A2:81:61:A6:B8:A6` (== Adversary reference). Full chain of custody: git sops → recovery-key decrypt → /var/lib/ci-certs/live → traefik swarm secret → served leaf. The rebuilt host serves the git-sourced cert.

Next: recreate throwaway FRESH with fixed config to prove SINGLE nixos-rebuild switch converges (0 failed).

## 2026-05-27 — W4 DONE: genuine throwaway-VM live rebuild, SINGLE switch converges (Gate W4 CLAIMED)

**Authoritative C4 proof on a FRESH blank VM** (destroyed the pre-fix VM, recreated clean; cloud-init used the LIVE TS_AUTH_KEY so it auto-joined the tailnet — no manual tailscale step):
- Provisioned ONLY `/var/lib/sops-nix/key.txt` = recovery age key (pub == `age1cmk26…` == &master) — the single out-of-band secret. `git clone --recursive` base+secrets (submodule 2312f1c, secrets ENC).
- **One** `nixos-rebuild switch --flake 'git+file:///root/cc-ci?submodules=1#cc-ci'` (detached --no-block) → `ccci-rebuild` Result=**success** (~15 min, 2.8 G peak < 4 GB).
- **`systemctl is-system-running` → running, 0 failed units** (the serialization fix works: single switch converges, no manual re-runs). Toplevel **`ld19aj2…` == cc-ci** (byte-identical).
- **All 6 stacks 1/1**: traefik app+socket-proxy, drone, ccci-bridge, ccci-dashboard, backups.
- **All secrets decrypted via the recovery key**; wildcard cert sops-decrypted from git → `/var/lib/ci-certs/live/fullchain.pem` (symlink→/run/secrets, sha256 `c1d96d61…`).
- **TLS from git cert (local, per C4 standard):** `curl --resolve probe.ci.commoninternet.net:443:127.0.0.1` → http=404 (no route, expected) **ssl_verify=0**; served leaf sha256 fingerprint **== git-cert leaf == `57:8D:67:9E:FE:89:…:B8:A6`** (Adversary reference). Full chain of custody.

So: blank NixOS host + the two git repos + the one bootstrap age key + external DNS/gateway → one `nixos-rebuild switch` → working cc-ci.
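
The convergence criteria in this entry (0 failed units, every stack at 1/1) can be checked mechanically instead of by eye. The following is a bash sketch. The `NAME REPLICAS` input format is an assumption based on `docker service ls --format '{{.Name}} {{.Replicas}}'`. `SYSTEMCTL` and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: convergence checks for a freshly switched host.

# Succeeds if systemd reports the whole system as "running" (not "degraded").
# SYSTEMCTL may name a stub command (used in tests).
system_running() {
  [ "$("${SYSTEMCTL:-systemctl}" is-system-running 2>/dev/null)" = running ]
}

# Reads "NAME REPLICAS" lines on stdin; succeeds only if at least one service
# is listed and every service has running == desired (and desired > 0).
services_converged() {
  awk '
    NF < 2 { next }
    {
      total++
      split($2, r, "/")                 # "1/1" -> r[1] running, r[2] desired
      if (r[1] != r[2] || r[2] + 0 == 0) {
        bad++
        print "not converged: " $1 " " $2 > "/dev/stderr"
      }
    }
    END { if (total == 0 || bad > 0) exit 1 }
  '
}
```

Usage: `system_running && docker service ls --format '{{.Name}} {{.Replicas}}' | services_converged`. An empty service list counts as a failure. This avoids reporting success on a host where the stacks never deployed.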

No undocumented manual step. This closes D8 honestly (static byte-identical closure + live throwaway rebuild). install.md updated to this validated procedure. Destroying the throwaway now (frees RAM for the Adversary's independent W5 cold rebuild; C6 no-leftover). Gate W4 CLAIMED — awaiting Adversary cold W5 (their own fresh VM).

## 2026-05-27 — Operator override: keep the FINAL throwaway (promote → cc-nix-test)

Orchestrator/operator note: do NOT destroy the FINAL W5/C4-C5 clean-room throwaway VM after it PASSes — the operator repurposes it as the new cc-nix-test for a live real-traffic test through the public gateway. Keep it running; defer its C6 teardown until the operator explicitly says otherwise. Overrides plan §5/§6 "destroy the throwaway" for that one VM. Settles **C6 final sizing = promote the rebuilt VM**. Recorded in DECISIONS.md + STATUS-1c (flagged for the Adversary so they don't tear down their W5 VM on PASS). My already-destroyed first throwaway + RAM accounting unaffected.

## 2026-05-27 — Added acceptance step: real e2e !testme on the promoted VM (operator-gated)

Orchestrator added a functional-acceptance step for the clean-room rebuild. SEQUENCING (strict):
1. finish W5/C4-C5;
2. ORCHESTRATOR renames the verified throwaway → cc-nix-test so the public gateway (ci.commoninternet.net + `*.ci` via MagicDNS) routes to it, and SIGNALS me;
3. THEN I run a genuine e2e: `!testme` (as bot) on ONE enrolled recipe (fast, e.g. custom-html) → confirm bridge picks up → Drone builds → app deploys to `.ci.commoninternet.net` reachable **through the public gateway** (curl the public subdomain, not localhost) → test passes → undeploy → result reported.

Record Drone run # + public-URL curl in JOURNAL-1c/STATUS-1c as functional acceptance of D8/clean-room. Until the swap-done signal: keep the rebuilt VM's full stack running, do NOT tear down, do NOT start the e2e. (Tracked as W5.5 in BACKLOG-1c.)
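
The acceptance step insists on curling the public subdomain rather than localhost, and on reading both the HTTP status and the TLS verdict. The following bash sketch of such a probe uses curl's `--write-out` variables `http_code` and `ssl_verify_result`. Both function names are hypothetical. From the sandbox, the journal's SOCKS hop (`--proxy socks5h://127.0.0.1:1055`) would be passed as an extra argument.

```bash
#!/usr/bin/env bash
# Sketch: external HTTP+TLS probe of a public URL.

# Prints "<http_code> <ssl_verify_result>"; extra curl options follow the URL.
probe_public() {
  local url="$1"; shift
  curl -s -o /dev/null --max-time 20 \
    -w '%{http_code} %{ssl_verify_result}' "$@" "$url"
}

# Succeeds if a "<code> <verify>" pair has the wanted HTTP code (default 200)
# and a TLS verify result of 0 (certificate chain validated).
probe_ok() {
  local want="${2:-200}" code verify
  read -r code verify <<< "$1"
  [ "$code" = "$want" ] && [ "$verify" = 0 ]
}
```

Usage: `probe_ok "$(probe_public https://ci.commoninternet.net/ --proxy socks5h://127.0.0.1:1055)"`. Splitting the probe from the verdict keeps the pass/fail rule testable without network access.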

## 2026-05-27 — E2E-TESTME spec is authoritative (cc-ci-plan/test-e2e-testme-acceptance.md)

Orchestrator: the full spec at `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md` is the AUTHORITY (supersedes earlier inline wording). Read it. It's MY test to execute; the Adversary independently verifies. Preconditions P1-P3 are orchestrator-provided (node rename → cc-nix-test, public-gateway routing, then a SIGNAL). Self-check on signal: `curl https://ci.commoninternet.net/` → 200 ssl_verify=0.

Pass criteria E1-E6 (new spec §3):
- E1 self-check;
- E2 new Drone build via bridge (not manual);
- E3 app answers EXTERNAL request at `.ci.commoninternet.net` through gateway (real 200+cert+content, not localhost);
- E4 real assertions pass / build success;
- E5 clean undeploy;
- E6 reported + dashboard updated.

Evidence→JOURNAL-1c, verdict→STATUS/REVIEW-1c as E2E-TESTME PASS. On fail: clean-room finding → fix in GIT SOURCE (base/cc-ci-secrets), not the live VM → re-run. Bound: one recipe, one green run. Not started — awaiting orchestrator signal; rebuilt VM stack kept up.

## 2026-05-27 — E2E-TESTME: Builder now owns the tailnet swap (no orchestrator signal)

Spec §1 updated (re-read): the Builder performs the swap end-to-end after C4/C5 PASS + rebuilt stack up — NO orchestrator signal. Two reversible `tailscale set --hostname` calls (ORDER MATTERS):
1. `ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig'` (original aside, KEEP running for swap-back; `ssh cc-ci` pinned to 100.90.116.4 still hits original);
2. rebuilt throwaway → cc-nix-test (re-derive its current online IP from `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i throwaway`).

Then cc-nix-test.taila4a0bf.ts.net → rebuilt VM tailnet-wide; gateway auto-follows ~10s. Verify P1+P2 (status shows cc-nix-test→throwaway IP; `curl https://ci.commoninternet.net/` 200 ssl_verify=0) → run E2E-TESTME (E1-E6) → swap-back (rebuilt→old name, `ssh cc-ci 'tailscale set --hostname=cc-nix-test'`).
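
The swap's second step re-derives the throwaway's current online IP from `tailscale status`. A plain `grep` can also match a stale offline peer with a similar name. The bash sketch below only returns an online match. It assumes the default `tailscale status` table (IP in column 1, hostname in column 2) and assumes that offline peers carry the word `offline` on their line. `online_ip_for` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: print the IP of the first online peer whose hostname matches a
# (case-insensitive) pattern, reading `tailscale status` output on stdin.
# Exits 1 if no online peer matches.
online_ip_for() {
  local pat
  pat=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  awk -v pat="$pat" '
    tolower($2) ~ pat && $0 !~ /offline/ { print $1; found = 1; exit }
    END { if (!found) exit 1 }
  '
}
```

Usage: `tailscale --socket="$HOME/.cc-ci-ts/tailscaled.sock" status | online_ip_for throwaway`.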

The orchestrator just monitors and acts as a safety net.

**Two execution watch-outs I'll handle at run time** (reasoned, not yet done):
- (a) the original (cc-nix-test-orig) keeps its bridge polling Gitea with the same token → would duplicate builds/PR comments; pause it during the e2e (`docker service scale ccci-bridge_app=0` on the original, restore after).
- (b) the rebuilt VM's Drone needs the one-time OAuth bootstrap (install.md §2, scripts/bootstrap-drone-oauth.sh) before it can clone/build — a documented post-step, run it on the rebuilt VM as part of e2e setup.

Still gated on C4/C5 PASS (W5) — not started.

## 2026-05-27 — E2E-TESTME actor/critic split clarified (avoid node-rename collision)

Orchestrator disambiguation: only ONE loop runs `tailscale set --hostname`. **Builder (me) owns the swap + the !testme test**; the swap TARGET is the **Adversary's** kept-running W5 VM (Incus instance **`ccci-w5-rebuild`**) — my own throwaway was destroyed. The **Adversary does NOT rename**; it keeps its W5 VM up, **records the VM identity (Incus instance + current tailscale IP) in REVIEW-1c/STATUS**, and independently VERIFIES E1-E6 cold (critic role). So I **WAIT for (i) Adversary W5 PASS + (ii) the recorded VM IP** before swapping (original→cc-nix-test-orig, then ccci-w5-rebuild→cc-nix-test). Updated STATUS-1c pending-e2e accordingly. Still gated on W5 — not started.

## 2026-05-27 — E2E-TESTME clean-room finding: Drone bot token not reproducible (FIXED in git)

Doing the e2e setup on the swapped-in rebuilt VM, found the sops `bridge_drone_token` gets **401 Unauthorized** from the rebuilt VM's Drone. Root cause: `modules/drone.nix` set `DRONE_USER_CREATE=username:autonomic-bot,admin:true` with **no `token:`** → Drone auto-generates a RANDOM bot machine token in its fresh DB, which can't equal the committed sops token (the original cc-ci only matched because its token was captured FROM the running Drone out-of-band).
So on a genuine clean-room rebuild the bridge can't authenticate to Drone → can't trigger builds. This is precisely the out-of-band gap the E2E-TESTME is designed to catch (spec §4).

**Fix (git source):** `DRONE_USER_CREATE=...,token:$(cat /run/secrets/bridge_drone_token)` so the bot's machine token is the deterministic sops token on every rebuild. Diagnosis confirmed via: rebuilt Drone container env had no token; `GET /api/repos/.../builds` with sops token → `{"message":"Unauthorized"}`. Evolves the toplevel again (ld19aj2 → new); will re-deploy to cc-ci + re-verify byte-identical after the e2e, Adversary re-checks C1.

Next: apply fix on the rebuilt VM (rebuild → redeploy Drone; wipe Drone DB if DRONE_USER_CREATE doesn't update the existing bot), re-run OAuth, then the !testme e2e.

## 2026-05-27 — E2E-TESTME on the rebuilt VM: E1-E3 PASS (E4/E5 tracking)

After applying the Drone-token fix (new toplevel `cqym8knj…`), the rebuilt VM is operational. Restarted drone-runner-exec (stale RPC after the Drone redeploy) → queue drained (cc-ci self-test #1 success). Posted `!testme` (comment 13740, autonomic-bot) on custom-html#2 (head db9a9502). Evidence:
- **E1 PASS** — `https://ci.commoninternet.net/` via public gateway → 200 ssl_verify=0 (rebuilt VM).
- **E2 PASS** — bridge (poll) picked up the comment → **new Drone build #4** (event=custom, > baseline #3) on the rebuilt VM's Drone. Not a manual trigger.
- **E3 PASS** — app deployed to `cust-bdddd9.ci.commoninternet.net`; EXTERNAL curl through the public gateway (sandbox → socks proxy → public DNS → gateway → MagicDNS cc-nix-test → rebuilt VM → Traefik → app) → **HTTP/2 200, ssl_verify=0**, `server: nginx/1.31.1`, body `…Welcome to nginx!` (real app content, NOT a Traefik 404), cert `CN=*.ci.commoninternet.net` (LE E8). Crux proven.
- E4 (build #4 success), E5 (teardown), E6 (reported+dashboard): monitoring until the build reaches a terminal state.
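
The Drone-token fix recorded above comes down to one change. Drone's `DRONE_USER_CREATE` value gets a `token:` field read from `/run/secrets/bridge_drone_token` instead of letting a fresh DB mint a random one. The following bash sketch composes that string and refuses an empty or unreadable token. `drone_user_create` is a hypothetical name. The real fix lives in `modules/drone.nix`.

```bash
#!/usr/bin/env bash
# Sketch: build the DRONE_USER_CREATE value with a deterministic bot token
# taken from the sops-decrypted secret file.
drone_user_create() {
  local token_file="${1:-/run/secrets/bridge_drone_token}" token
  [ -r "$token_file" ] || { echo "unreadable: $token_file" >&2; return 1; }
  token=$(< "$token_file")          # $(< ...) drops trailing newlines
  [ -n "$token" ] || { echo "empty token: $token_file" >&2; return 1; }
  printf 'username:autonomic-bot,admin:true,token:%s\n' "$token"
}
```

A post-deploy check could call Drone's API with the same token and expect success instead of `{"message":"Unauthorized"}`. One example is `curl -sH "Authorization: Bearer $(cat /run/secrets/bridge_drone_token)" https://<drone-host>/api/user`, where the host is a placeholder.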

## 2026-05-27 — E2E-TESTME: ALL E1–E6 PASS (functional acceptance of D8/clean-room)

Real `!testme` on the rebuilt-from-git VM (swapped in as cc-nix-test), full pipeline against the PUBLIC domain:
- **E1 PASS** — `https://ci.commoninternet.net/` (public gateway → rebuilt VM) → 200 ssl_verify=0.
- **E2 PASS** — `!testme` (bot, comment 13740) on custom-html#2 → bridge poll → **new Drone build #4** (event=custom, > baseline #3), via the bridge (not manual).
- **E3 PASS** — app `cust-bdddd9.ci.commoninternet.net` answered an EXTERNAL request through the public gateway → HTTP/2 200, ssl_verify=0, nginx/1.31.1, real body `…Welcome to nginx!`, cert `CN=*.ci.commoninternet.net` (LE E8). Routing public-DNS→gateway→MagicDNS→rebuilt VM→Traefik→app proven.
- **E4 PASS** — build #4 success; build log shows the REAL 3 stages all passing (no softening): install (`test_http_reachable`, `test_playwright_page` — Playwright), upgrade (`test_upgrade_preserves_data`), backup (`test_backup_mutate_restore`). 2+1+1 assertions passed.
- **E5 PASS** — app undeployed cleanly afterward (0 residual `-<6hex>` app .envs/stacks).
- **E6 PASS** — bridge posted to custom-html#2: "custom-html @ db9a9502 ✅ **passed** → …/cc-ci/4"; public dashboard row = custom-html / success / #4.

→ **E2E-TESTME PASS.** The clean-room-rebuilt VM is operationally a working CI server end-to-end over the real public domain. Caught+fixed the Drone-bot-token reproducibility gap en route (af46aca). Next: swap-back; re-deploy the token fix to cc-ci (byte-identical at new toplevel cqym8knj); Adversary independently verifies E1-E6.

## 2026-05-27 — Builder work COMPLETE (C1–C7 + E2E-TESTME); awaiting Adversary final verification

cc-ci on final config `cqym8knj` (byte-identical, 0 failed, bridge→Drone OK).
C7 docs done: install.md/secrets.md/architecture.md updated to the 1c model; plan.md §1.5 carries a Phase-1c supersession note (cert now sops-from-git; bootstrap age key the one out-of-band secret; supersedes §1.5/§4.0/§4.4 cert refs; points to docs/secrets.md). C6 settled (promote rebuilt VM, kept running; first throwaway destroyed; cc-nix-test 4 GB). All C1–C7 + E2E-TESTME implemented & Builder-verified.

**Remaining = Adversary's final DONE-verification:** re-confirm C1 byte-identical at `cqym8knj` + independently verify E1–E6. I'll write `## DONE` when REVIEW-1c shows <24h PASS for C1–C7 + E2E-TESTME and no VETO. (plan.md is in cc-ci-plan/, not this repo — edited in place, not committed here.)

## 2026-05-27 — ADV-1c-1 (architecture.md stale) addressed

Adversary verdict b301b03: **E2E-TESTME E1–E6 PASS** (independent) + **C1–C6 PASS** (C1 refreshed cold at final `cqym8knj` == running, byte-identical; no VETO). **C7 WITHHELD** on finding ADV-1c-1: `docs/architecture.md` allegedly stale (line 17 "local secrets/secrets.yaml via host SSH key", cert "pre-issued out-of-band").

**But architecture.md was already updated to the 1c model in commit b700cd2** (an ANCESTOR of `3bfb48b`, the HEAD the Adversary cloned for C1) — current line 14/17 + §Network/TLS describe the `cc-ci-secrets` submodule, bootstrap age key, and cert-sops-from-git. The quoted "stale" text is the PRE-b700cd2 line 17 → ADV-1c-1 is a stale-clone false positive (the doc-grep used an older checkout). To remove all doubt I further expanded line 17 (explicit: cert-in-git, submodule, bootstrap key = host-derived OR recovery-key-on-clone, one out-of-band secret). **Adversary: please re-grep `docs/architecture.md` at current HEAD and close ADV-1c-1 → C7 PASS → DONE.**

## 2026-05-27 — `## DONE` (Phase 1c complete)

Adversary closed ADV-1c-1 → **C7 PASS** (9e0f72a).
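
The closure of ADV-1c-1 rested on one fact: commit b700cd2 is contained in the reviewed HEAD 3bfb48b. It also rested on what line 17 of `docs/architecture.md` said at that revision. Git can check both directly. The following is a bash sketch using `git merge-base --is-ancestor` and `git show REV:PATH`. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: settle a "stale clone?" question with git instead of by eye.

# Succeeds if commit $2 is an ancestor of (or equal to) commit $3 in repo $1.
fix_in_checkout() {
  git -C "$1" merge-base --is-ancestor "$2" "$3"
}

# Prints line $4 of file $3 as it exists at revision $2 in repo $1.
doc_line_at() {
  git -C "$1" show "$2:$3" | sed -n "${4}p"
}
```

Usage in a base-repo clone: `fix_in_checkout . b700cd2 3bfb48b && doc_line_at . 3bfb48b docs/architecture.md 17`.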

**ALL C1–C7 + E2E-TESTME Adversary-PASS (<24h, no VETO, no open findings).** Final Builder health check: cc-ci running/0-failed, byte-identical build==running==`cqym8knj` (ZERO DRIFT), 6 stacks, cert sops-from-git `c1d96d61…`, public TLS 200/ssl=0. Wrote `## DONE` to STATUS-1c. Phase 1c exit condition met → stopping the self-paced loop. The Adversary will append its final cold sign-off.

Operator follow-up (non-gating): promote `ccci-w5-rebuild`→cc-nix-test (bridge paused, stack up); plan.md §4.0/§4.4 cert wording (superseding note at §1.5).
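
E5's criterion was 0 residual `-<6hex>` app stacks. It can be scripted alongside the final health check's stack count. The following is a bash sketch. It assumes per-run app stacks carry a six-hex-digit suffix, as `cust-bdddd9` does, and takes stack names from `docker stack ls --format '{{.Name}}'`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: fail if any per-run app stack is left behind. Reads stack names on
# stdin and treats names ending in "-" plus six lowercase hex digits as per-run.
no_residual_stacks() {
  local left
  left=$(grep -E -- '-[0-9a-f]{6}$')
  if [ -n "$left" ]; then
    printf 'residual app stacks:\n%s\n' "$left" >&2
    return 1
  fi
}
```

Usage: `docker stack ls --format '{{.Name}}' | no_residual_stacks`. Permanent stacks such as `ccci-bridge` or `ccci-dashboard` do not match, because their suffixes are not six hex digits.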