From 01874821f2cbd05b1d7b32208e1ad4492b37f5da Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Sun, 31 May 2026 00:16:37 +0000 Subject: [PATCH] decommission Pi: update all docs for VM-only setup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The orchestrator Pi is retired (2026-05-31). All agents now run on the cc-ci-orchestrator VM (NixOS, loops user, /srv/cc-ci). The VM is a direct tailnet peer to cc-ci — no SOCKS proxy, no userspace tailscaled, no ProxyCommand. Updated across all affected files: AGENTS.md - Remove Pi from reboot description; migration complete (not "parked") - cc-ci access: direct ssh, not via proxy kickoff.md - Prerequisites: direct tailnet peer, not proxy - Host deps: NixOS (not apt) - Fallback/Incus: b1 reachable directly, no --proxy curl flag plan.md §1 + §1.5 - §1 bootstrap: direct SSH, check tailscale status (not restart proxy) - §1.5 intro: "VM" not "sandbox host"; no proxy - Credentials table: remove TS_AUTH_KEY row; update cc-ci SSH row - Replace "Tailscale connection (proxy)" subsection with direct-peer description plan-orchestrator-migration.md - Mark COMPLETE (2026-05-31); historical record only plan-phase1c-full-reproducibility.md - Incus access: direct, not via SOCKS proxy prompts/builder.md + prompts/adversary.md - cc-ci access language only: direct ssh, no proxy restart instructions - adversary: *.ci.commoninternet.net via plain curl, no proxy flag REBOOTS.md - Retitle for VM; note Pi retired; Pi entries marked historical systemd/cc-ci-loops.service - User/Group/HOME/PATH: notplants → loops - Remove cc-ci-tailscaled.service dependency (no proxy on VM) - Add note about nix/configuration.nix as the authoritative VM declaration test-e2e-testme-acceptance.md - tailscale status: no --socket flag - ssh to throwaway: no ProxyCommand Co-Authored-By: Claude Sonnet 4.6 --- AGENTS.md | 10 ++-- cc-ci-plan/REBOOTS.md | 13 +++-- cc-ci-plan/kickoff.md | 14 +++-- 
cc-ci-plan/plan-orchestrator-migration.md | 2 +- .../plan-phase1c-full-reproducibility.md | 4 +- cc-ci-plan/plan.md | 52 ++++++++----------- cc-ci-plan/prompts/adversary.md | 2 +- cc-ci-plan/prompts/builder.md | 2 +- cc-ci-plan/systemd/cc-ci-loops.service | 17 +++--- cc-ci-plan/test-e2e-testme-acceptance.md | 7 +-- 10 files changed, 58 insertions(+), 65 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index e40256c..04fa8ea 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -20,7 +20,7 @@ watches from outside. **Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online — the operator wants to know the supervising session is back (especially after a reboot, which kills -this session along with the Pi). Include the current phase and the reboot count. Steps on startup: +this session). Include the current phase and the reboot count. Steps on startup: 1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and `cc-ci-plan/launch.sh status` (current phase + whether the loops/watchdog are running). 2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog @@ -32,8 +32,8 @@ this session along with the Pi). Include the current phase and the reboot count. Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops + watchdog auto-resume the saved phase. The orchestrator session itself is NOT auto-started — the -operator reconnects to it (that's why the startup notification matters). The fuller "move the -orchestrator onto its own VM" plan is parked at `cc-ci-plan/plan-orchestrator-migration.md`. +operator reconnects to it (that's why the startup notification matters). The VM migration is +complete; see `cc-ci-plan/plan-orchestrator-migration.md` (historical record). ## Keep the orchestrator open, under remote-control @@ -71,8 +71,8 @@ cc-ci VM"). 
The orchestrator is the human's steering wheel; the loops are the en - `.testenv` (**NOT committed**): Tailscale auth key + Gitea bot creds. Load with `set -a; . .testenv; set +a` (never echo the values). -- **cc-ci:** `ssh cc-ci` (root) tunnels through the persistent userspace-tailscaled SOCKS proxy on - `127.0.0.1:1055` (`cc-ci-tailscaled.service`). If down: `sudo systemctl restart cc-ci-tailscaled`. +- **cc-ci:** `ssh cc-ci` (root, `100.90.116.4`) directly — the orchestrator VM is a direct tailnet peer. + No proxy. Key: `~/.ssh/cc-ci-root-ed25519`. If unreachable, check `tailscale status`. - **Incus/VM fallback:** mTLS certs at `/srv/incus-terraform-nix-vm-creator/terraform-secrets/`; b1 is on the same tailnet (reach via the same proxy). See kickoff "Fallback". - **Full credential map + how to use each:** `plan.md` §1.5. diff --git a/cc-ci-plan/REBOOTS.md b/cc-ci-plan/REBOOTS.md index 33c4b67..102d9e4 100644 --- a/cc-ci-plan/REBOOTS.md +++ b/cc-ci-plan/REBOOTS.md @@ -1,10 +1,13 @@ -# Reboot log — cc-ci orchestrator Pi +# Reboot log — cc-ci orchestrator VM -One line per genuine reboot of the orchestrator Pi (`raspberrypi`), appended automatically by +**Note:** the orchestrator Pi (`raspberrypi`) was decommissioned 2026-05-31. All agents now run on +the `cc-ci-orchestrator` NixOS VM (tailnet `100.116.55.106`). The three Pi reboot entries below are +historical. Entries from 2026-05-31 onward are VM reboots. + +One line per genuine reboot of the orchestrator host, appended automatically by `reboot-log.sh` (ExecStartPre of `cc-ci-loops.service`, boot_id-gated so manual service restarts are -NOT counted). The Pi hosts the Builder + Adversary loops + watchdog; a reboot drops the tmux sessions -(and this orchestrator session), and `cc-ci-loops.service` restarts the loops on boot. Count the -lines below to see how often it's happening. +NOT counted). A reboot drops the tmux sessions (and the orchestrator session); `cc-ci-loops.service` +restarts the loops on boot.
Count the lines below to see how often it's happening. ## Reboots diff --git a/cc-ci-plan/kickoff.md b/cc-ci-plan/kickoff.md index 21f16bb..1ca8891 100644 --- a/cc-ci-plan/kickoff.md +++ b/cc-ci-plan/kickoff.md @@ -70,8 +70,8 @@ account (`claude auth status`); set `REMOTE_CONTROL=0` to skip the remote surfac `claude` process, after which the watchdog brings it back (a fresh remote-control session) — and when `STATUS.md` shows `## DONE`, it kills the loops and exits. -Prerequisites the sessions inherit from your shell: SSH (root) to `cc-ci` via the Tailscale proxy -(§1.5), Gitea bot creds, and `git.autonomic.zone` access. Plus **preconfigured** operator inputs the +Prerequisites the sessions inherit from your shell: SSH (root) to `cc-ci` directly (the orchestrator +VM is a direct tailnet peer — no proxy; §1.5), Gitea bot creds, and `git.autonomic.zone` access. Plus **preconfigured** operator inputs the loop depends on (plan §4.0/§4.4): the wildcard `*.ci.commoninternet.net` DNS record pointing at a gateway that TLS-passthroughs to cc-ci, and the **pre-issued wildcard cert** at `/var/lib/ci-certs/live/` on cc-ci. The operator owns the DNS record + gateway + cert @@ -79,8 +79,7 @@ issuance/renewal; the agent builds Traefik (file provider → that cert) + routi **no ACME**. If any prerequisite is absent, the Builder parks at `STATUS.md ## Blocked` (plan §1/§9) rather than improvise. -> Host deps: `launch.sh` needs **tmux** (and `claude`) — tmux is installed on this sandbox host -> (3.5a). On a fresh host: `sudo apt-get install -y tmux`. The script's `*_DIR` +> Host deps: `launch.sh` needs **tmux** (and `claude`) — both are installed on the VM (NixOS). The script's `*_DIR` > defaults now point at `/srv/cc-ci/...` (Builder clone `/srv/cc-ci/cc-ci`, Adversary > `/srv/cc-ci/cc-ci-adv`); override the `*_DIR` env vars only if your layout differs. @@ -112,13 +111,12 @@ re-bootstrap. (`100.117.251.31:8443`, Incus project `terraform-ci`). 
Skill + Terraform live at `/srv/incus-terraform-nix-vm-creator/` (`skills/incus-terraform/SKILL.md`); read that for full usage. -- **Access:** b1 is on the *same* cc-ci tailnet, so reach the Incus API through the existing - `cc-ci-tailscaled` SOCKS proxy (`127.0.0.1:1055`) with the mTLS certs in that repo's - `terraform-secrets/` — no second tailscaled needed. Quick check: +- **Access:** b1 (`100.117.251.31`) is reachable directly from the orchestrator VM (same tailnet). + Use the mTLS certs at `terraform-secrets/` — no proxy. Quick check: ```bash CRT=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.crt KEY=/srv/incus-terraform-nix-vm-creator/terraform-secrets/terraform.key - curl --proxy socks5h://localhost:1055 --cert "$CRT" --key "$KEY" -k -s \ + curl --cert "$CRT" --key "$KEY" -k -s \ https://100.117.251.31:8443/1.0/instances/cc-nix-test/state?project=terraform-ci ``` - **Soft restart (keeps the disk — preferred):** `POST .../1.0/instances/cc-nix-test/state?project=terraform-ci` diff --git a/cc-ci-plan/plan-orchestrator-migration.md b/cc-ci-plan/plan-orchestrator-migration.md index e68a204..2b6ecc7 100644 --- a/cc-ci-plan/plan-orchestrator-migration.md +++ b/cc-ci-plan/plan-orchestrator-migration.md @@ -10,7 +10,7 @@ relocating this orchestrator session there too. (claude CLI, proxy, loop supervisor) as systemd services that come back on boot — turning a reboot into a non-event. It also consolidates the orchestrator next to the infra it manages. -**Status:** IN PROGRESS (operator go-ahead 2026-05-30 — the Pi is OOM-thrashing/slow). +**Status:** COMPLETE (2026-05-31). All agents run on the VM; Pi fully decommissioned. Kept as a historical record. 
**Phase A ✅ COMPLETE (2026-05-30):** VM `cc-ci-orchestrator` (**2 GB / 2 vCPU / 30 GB**, `incus-base-vm`, NixOS 24.11) created via the Incus API + booted; **on the tailnet at diff --git a/cc-ci-plan/plan-phase1c-full-reproducibility.md b/cc-ci-plan/plan-phase1c-full-reproducibility.md index 305c047..f338455 100644 --- a/cc-ci-plan/plan-phase1c-full-reproducibility.md +++ b/cc-ci-plan/plan-phase1c-full-reproducibility.md @@ -124,8 +124,8 @@ When C1–C7 hold and are Adversary-verified, write `## DONE` to Phase-1c `STATU The loops normally only `ssh cc-ci`. For 1c they MAY drive Incus on **b1** (resize `cc-nix-test`; create/destroy ONE throwaway VM in `terraform-ci`), using the mTLS certs at -`/srv/incus-terraform-nix-vm-creator/terraform-secrets/` through the existing SOCKS proxy -(`127.0.0.1:1055`) — see the incus skill (`/srv/incus-terraform-nix-vm-creator/skills/incus-terraform/SKILL.md`) +`/srv/incus-terraform-nix-vm-creator/terraform-secrets/` (b1 is reachable directly from the VM — +direct tailnet peer, no proxy) — see the incus skill (`/srv/incus-terraform-nix-vm-creator/skills/incus-terraform/SKILL.md`) and [[cc-ci-vm-incus]]. Guardrails: only `terraform-ci`; keep total running RAM within the **~12 GB guideline** (doc-only — terraform-ci has no enforced `limits.memory`; b1 is 16 GB physical) — hence `cc-nix-test`→4 GB + throwaway 4 GB + lichen-staging 4 GB = 12 GB; **destroy the throwaway VM when diff --git a/cc-ci-plan/plan.md b/cc-ci-plan/plan.md index 89ed17d..2acfe2b 100644 --- a/cc-ci-plan/plan.md +++ b/cc-ci-plan/plan.md @@ -33,9 +33,9 @@ Do these in order. Each step is idempotent; re-running is safe. 1. **Verify access.** (Full credential map + how each is used is in **§1.5** — read it first.) - `ssh cc-ci 'hostname && whoami'` — you log in as **root** on cc-ci (NixOS), so there is no - separate sudo step. 
`ssh cc-ci` is preconfigured to tunnel through the userspace-tailscaled - SOCKS proxy (§1.5); if it fails, the proxy/daemon is probably down — restart it (§1.5) before - declaring blocked. + separate sudo step. `ssh cc-ci` reaches cc-ci directly (the orchestrator VM is a direct tailnet + peer — no proxy; key `~/.ssh/cc-ci-root-ed25519`). If it fails, check `tailscale status` + before declaring blocked. - `ssh cc-ci 'nixos-version'` — confirm NixOS. - Confirm you can reach the Gitea API with the bot creds from `.testenv` (§1.5): `curl -s https://$GITEA_URL/api/v1/version`. The bot authenticates with @@ -72,45 +72,35 @@ Do these in order. Each step is idempotent; re-running is safe. ## 1.5 Credentials & access — where everything lives and how to use it -The loops run **on the sandbox host** (not on cc-ci) and reach cc-ci over Tailscale. This section -is the authoritative map of what credentials exist, where, and how to use them. **Never copy any -secret value into the repo, a commit, a log, or the dashboard** (§9) — reference locations only. +The loops run **on the cc-ci-orchestrator VM** (`100.116.55.106`, NixOS, `loops` user) and reach +cc-ci directly over Tailscale (direct tailnet peer — no proxy). This section is the authoritative +map of what credentials exist, where, and how to use them. **Never copy any secret value into the +repo, a commit, a log, or the dashboard** (§9) — reference locations only. ### Provided credentials (already in place) | What | Where | How to use | |---|---|---| -| **Tailscale auth key** (joins cc-ci's tailnet `taila4a0bf.ts.net`) | `/srv/cc-ci/.testenv` → `TS_AUTH_KEY` (Tailscale SaaS key, keyID ends `CNTRL`) | Used to bring up the userspace tailscaled (below). It's reusable; re-run `tailscale up` with it if the node drops. | -| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; config `Host cc-ci` in `~/.ssh/config` | Just run `ssh cc-ci` (logs in as **root**). 
The pubkey is already in cc-ci's `/root/.ssh/authorized_keys`. | +| **cc-ci SSH (root)** | private key `~/.ssh/cc-ci-root-ed25519`; `Host cc-ci` in `~/.ssh/config` (HostName `100.90.116.4`, no ProxyCommand) | Just run `ssh cc-ci` (logs in as **root**). The orchestrator VM is a direct tailnet peer — direct route, no proxy. Pubkey already in cc-ci's `/root/.ssh/authorized_keys`. | | **Gitea bot account** | `/srv/cc-ci/.testenv` → `GITEA_USERNAME` (`autonomic-bot`), `GITEA_PASSWORD`, `GITEA_URL` (`git.autonomic.zone`) | Basic-auth to the Gitea API, or mint a scoped token: `POST https://$GITEA_URL/api/v1/users/$GITEA_USERNAME/tokens`. Used to push the `cc-ci` project repo, read recipe repos, comment on PRs, and poll for `!testme` (read-level; the bot does not register webhooks). | Load them in a shell with: `set -a; . /srv/cc-ci/.testenv; set +a` (don't echo the values). -### The Tailscale connection (how `ssh cc-ci` and the proxy work) +### The Tailscale connection (how `ssh cc-ci` works) -cc-ci (`cc-nix-test`, **100.90.116.4**) is on a *different* tailnet than the sandbox host's default -one, so it is reached via a **second, userspace tailscaled** — this keeps the host's own tailnet -untouched. State lives in `~/.cc-ci-ts/`; it exposes a **SOCKS5/HTTP proxy on `127.0.0.1:1055`**, -which is the only route to that tailnet (userspace networking ⇒ the host OS can't route the tailnet -IPs directly). +cc-ci (`cc-nix-test`, **100.90.116.4**) is on the same tailnet as the orchestrator VM +(`taila4a0bf.ts.net`), so it is reached **directly** — no SOCKS proxy, no userspace tailscaled. +The VM's system tailscaled is on that tailnet; `ssh cc-ci` routes straight to cc-ci. -It runs as a **persistent systemd service** (`cc-ci-tailscaled.service`, enabled, `Restart=always`, -starts on boot; unit at `/etc/systemd/system/cc-ci-tailscaled.service`, runs as user `notplants`).
-It reuses the already-authenticated state in `~/.cc-ci-ts/`, so it reconnects across reboots/crashes -without the auth key. - -- `ssh cc-ci` works out of the box (its `ProxyCommand` uses the proxy; logs in as root). -- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the sandbox, go through the proxy, e.g. - `curl --proxy socks5h://localhost:1055 https://.ci.commoninternet.net`. -- **If connectivity is down:** `sudo systemctl restart cc-ci-tailscaled` (diagnose with - `systemctl status cc-ci-tailscaled` / `journalctl -u cc-ci-tailscaled`). A dead proxy is an access - failure to recover, not a `## Blocked`-and-stop condition — *unless* the auth key itself is - rejected (then re-auth with `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock up - --auth-key="$TS_AUTH_KEY" --hostname=cc-ci-claude-sandbox --accept-routes --accept-dns=false`, and - if that fails the key is a class-A1 blocker). -- **DNS gotcha:** this host's `/etc/resolv.conf` lists only Tailscale resolvers, so direct - `dig @1.1.1.1 …` queries get no answer and look falsely empty. Use `getent hosts ` to - resolve from the sandbox. `commoninternet.net` itself is a normal public zone hosted at **Gandi**. +- `ssh cc-ci` works out of the box (`~/.ssh/config` has `Host cc-ci` pinned to `100.90.116.4`, + no ProxyCommand; key `~/.ssh/cc-ci-root-ed25519`; logs in as root). +- For HTTP(S) to cc-ci / `*.ci.commoninternet.net` from the VM, use plain `curl` — no proxy flag + needed. The VM uses public DNS resolvers (`1.1.1.1`/`8.8.8.8`) so `*.ci.commoninternet.net` + resolves normally. +- **If `ssh cc-ci` fails:** run `tailscale status` (as loops or root) to confirm the VM is still + on the tailnet and cc-ci is listed; check `systemctl status tailscaled`. A connectivity failure + is recoverable, not an immediate `## Blocked`-and-stop, unless the VM has lost tailnet membership + entirely (then that IS a class-A1 blocker). 
### Credentials the loop GENERATES itself (do not wait on a human for these) diff --git a/cc-ci-plan/prompts/adversary.md b/cc-ci-plan/prompts/adversary.md index 3e78df4..ce878a6 100644 --- a/cc-ci-plan/prompts/adversary.md +++ b/cc-ci-plan/prompts/adversary.md @@ -7,7 +7,7 @@ LIVENESS PROTOCOL (the watchdog ENFORCES this — see plan.md §7): - **Declare every wait.** Immediately before going idle, your FINAL output line MUST be exactly `WAITING-UNTIL: ` — the time you will resume (≤10 min out, matching your ScheduleWakeup). Compute it from the clock (`date -u -d '+10 min' +%FT%TZ`). If the watchdog sees you idle ≥5 min with no current marker as your last line, OR idle past the time it names, it kills + reboots you (you resume cleanly from git + your REVIEW/STATUS files). - **Compact proactively.** If context usage climbs high (≳80%), run `/compact` before continuing — your loop state is in git + REVIEW/STATUS, so compaction is lossless and prevents wedging at the context limit. -Credentials/access: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv and ~/.ssh; reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055), and hit the dashboard / *.ci.commoninternet.net through that proxy (`curl --proxy socks5h://localhost:1055 ...`). If the proxy is down, restart it per §1.5. Verify from a COLD START but you may rely on this shared access path. +Credentials/access: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv and ~/.ssh; reach cc-ci with `ssh cc-ci` (root, direct tailnet peer — no proxy). Hit the dashboard / *.ci.commoninternet.net via regular HTTP(S) from the VM (uses public DNS, no proxy needed). Verify from a COLD START but you may rely on this shared access path. You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1: - Keep your OWN clone at /srv/cc-ci/cc-ci-adv. 
If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap. diff --git a/cc-ci-plan/prompts/builder.md b/cc-ci-plan/prompts/builder.md index c31a515..8ca7417 100644 --- a/cc-ci-plan/prompts/builder.md +++ b/cc-ci-plan/prompts/builder.md @@ -27,7 +27,7 @@ Overriding rules: - Never weaken, skip, or delete a test to make a run pass. A red test is information. - Only cc-ci is yours to reconfigure. Never push code to recipe repos; never touch production servers/domains. Keep server state Nix-declared and reversible. - 3rd identical failure → stop, record dead-end in DECISIONS.md, change approach or mark blocked. -- Credentials: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv (TS_AUTH_KEY, GITEA_USERNAME/PASSWORD/URL) and ~/.ssh (cc-ci-root-ed25519). Reach cc-ci with `ssh cc-ci` (root, via the userspace-tailscaled SOCKS proxy on 127.0.0.1:1055); if it fails, restart the proxy per §1.5 before declaring blocked. There is NO ready-made $GITEA_TOKEN — mint one from the bot creds if you want a token. +- Credentials: §1.5 is the authoritative map. Provided creds are in /srv/cc-ci/.testenv (GITEA_USERNAME/PASSWORD/URL) and ~/.ssh (cc-ci-root-ed25519). Reach cc-ci with `ssh cc-ci` (root, direct tailnet peer — no proxy). There is NO ready-made $GITEA_TOKEN — mint one from the bot creds if you want a token. - Secret classes (§4.4), handled differently: • Class A1 EXTERNAL infra inputs (cc-ci SSH/root access, TS auth key, Gitea bot creds, the pre-issued wildcard TLS cert at /var/lib/ci-certs/live/, registry creds; plus the preconfigured DNS/gateway facts): if missing/invalid → STATUS.md ## Blocked and stop. Do NOT improvise/invent. NEVER attempt ACME/DNS-01 for commoninternet.net — the cert is pre-provided and renewed out-of-band; point Traefik's file provider at /var/lib/ci-certs/live/{fullchain.pem,privkey.pem}. 
• Class A2 INTERNAL infra secrets (Drone RPC, webhook HMAC, Gitea OAuth app, host age key): you GENERATE these yourself — never block on them. diff --git a/cc-ci-plan/systemd/cc-ci-loops.service b/cc-ci-plan/systemd/cc-ci-loops.service index 3270274..005e3ff 100644 --- a/cc-ci-plan/systemd/cc-ci-loops.service +++ b/cc-ci-plan/systemd/cc-ci-loops.service @@ -1,22 +1,23 @@ [Unit] -# Canonical, version-controlled copy of the unit installed at /etc/systemd/system/cc-ci-loops.service. +# Canonical, version-controlled copy of the unit for the cc-ci-orchestrator VM. # Install: sudo install -m0644 cc-ci-plan/systemd/cc-ci-loops.service /etc/systemd/system/ \ # && sudo systemctl daemon-reload && sudo systemctl enable cc-ci-loops.service -# Brings the WHOLE rig back after a reboot of the orchestrator Pi: loops + watchdog (launch.sh) AND +# NOTE: the VM's actual reboot-resilience service is declared in nix/configuration.nix (systemd.services.cc-ci-loops). +# This file is the repo reference copy — keep both in sync when making changes. +# Brings the WHOLE rig back after a reboot of the cc-ci-orchestrator VM: loops + watchdog (launch.sh) AND # the orchestrator supervisory session (launch-orchestrator.sh), plus a reboot record (reboot-log.sh). Description=cc-ci autonomous loops + watchdog + orchestrator (reboot-resilient) Documentation=file:///srv/cc-ci/cc-ci-plan/plan.md -After=network-online.target cc-ci-tailscaled.service +After=network-online.target tailscaled.service Wants=network-online.target -Requires=cc-ci-tailscaled.service [Service] Type=oneshot RemainAfterExit=yes -User=notplants -Group=notplants -Environment=HOME=/home/notplants -Environment=PATH=/home/notplants/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin +User=loops +Group=loops +Environment=HOME=/home/loops +Environment=PATH=/home/loops/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin # RESUME_PHASE=1 so a reboot resumes the SAVED phase (e.g. phase 2), never restarts from phase 0/1c. 
Environment=RESUME_PHASE=1 # 1) record the reboot (boot_id-gated); 2) start loops + watchdog; 3) resume the orchestrator session. diff --git a/cc-ci-plan/test-e2e-testme-acceptance.md b/cc-ci-plan/test-e2e-testme-acceptance.md index d84f6cf..21df7a5 100644 --- a/cc-ci-plan/test-e2e-testme-acceptance.md +++ b/cc-ci-plan/test-e2e-testme-acceptance.md @@ -37,12 +37,13 @@ it. **Do this only after C4/C5 PASS** and after the rebuilt VM's full stack regardless of the name change.) 2. **Rename the rebuilt throwaway → `cc-nix-test`.** Re-derive its current tailscale IP (throwaways get a fresh IP each rebuild): pick the ONLINE throwaway node from - `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i throwaway`, then: + `tailscale status | grep -i throwaway`, then: ``` ssh -i /srv/incus-terraform-nix-vm-creator/terraform-secrets/vm_ssh_key \ - -o ProxyCommand='nc -X 5 -x 127.0.0.1:1055 %h %p' root@ \ + root@ \ 'tailscale set --hostname=cc-nix-test' ``` + (The orchestrator VM is a direct tailnet peer — no ProxyCommand needed.) **Heads-up — tailnet-wide effect:** after the swap, `cc-nix-test.taila4a0bf.ts.net` resolves to the rebuilt VM for *everyone* on the tailnet, so any of your own tooling that targets cc-nix-test **by @@ -51,7 +52,7 @@ the original). Account for that when you point `!testme`/deploys. **Verify the swap took (P1+P2) before starting the e2e** — must pass: ``` -tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep cc-nix-test # → the throwaway's IP +tailscale status | grep cc-nix-test # → the throwaway's IP curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/ # expect: 200 ssl_verify=0 (real public path now served by the rebuilt VM, valid cert) ```