diff --git a/.gitignore b/.gitignore index 36fcfc2..9b81468 100644 --- a/.gitignore +++ b/.gitignore @@ -10,3 +10,9 @@ /cc-ci-adv/ /.cc-ci-watch/ /.cc-ci-logs/ + +# More secrets / local state — NEVER commit +/.sops/ # master recovery age key +/cc-ci-secrets/ # separate sops-secrets repo, cloned in +/.claude/ # local claude session/project state +*.tmp.* # editor temp files diff --git a/AGENTS.md b/AGENTS.md index cc1c546..b9ad177 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -16,6 +16,25 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator watches from outside. +## On startup: announce yourself + report reboots + +**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online — +the operator wants to know the supervising session is back (especially after a reboot, which kills +this session along with the Pi). Include the current phase and the reboot count. Steps on startup: +1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and `cc-ci-plan/launch.sh status` + (current phase + whether the loops/watchdog are running). +2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog + running; N reboots logged (last )."* +3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the + loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with + `RESUME_PHASE=1 cc-ci-plan/launch.sh start`. + +Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot +to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops + +watchdog auto-resume the saved phase. The orchestrator session itself is NOT auto-started — the +operator reconnects to it (that's why the startup notification matters). 
The fuller "move the +orchestrator onto its own VM" plan is parked at `cc-ci-plan/plan-orchestrator-migration.md`. + ## Keep the orchestrator open, under remote-control Run this session as a long-lived **interactive** session with `--remote-control` so the operator can diff --git a/cc-ci-plan/IDEAS.md b/cc-ci-plan/IDEAS.md index 6c472c1..9c0a2d5 100644 --- a/cc-ci-plan/IDEAS.md +++ b/cc-ci-plan/IDEAS.md @@ -4,6 +4,22 @@ Post-DONE or "revisit later" ideas that are intentionally **out of scope** for t (§2 Definition of Done). Not active work — parked here so they aren't lost. The loops may pull an item into the project `BACKLOG.md` as `[idea]` if/when it becomes relevant. +- [ ] **Optional `--extra-tests` flag for heavy / operational tests (opt-in heavy suite).** + Some recipe tests are "more than needed" for the default CI signal — state-management / + long-running-instance / load / helper-script operational tests that don't fit the ephemeral + per-run-deploy model cheaply but are useful occasionally. Today they're deferred to + `cc-ci/machine-docs/DEFERRED.md` (e.g. matrix-synapse `compress_state.sh`, + `test_complexity_limit.sh`, `test_purge.sh`) and don't run. + *Idea:* add an **opt-in `--extra-tests` flag** (e.g. `!testme --extra-tests` on a PR comment, or + a `STAGES=extra` / `EXTRA_TESTS=1` Drone build parameter) that the orchestrator passes through; + recipes declare an `extra/` test dir or mark tests with `@pytest.mark.extra`; on opt-in the + orchestrator runs them **alongside** the default tiers (still one deploy, still teardown). Default + off so default CI stays fast; the operator can ask for the heavy suite when reviewing a PR that + touches an extra-covered area (e.g. matrix-synapse's abra helpers). When implemented, each + matching DEFERRED entry can be CLOSED by porting its test into the recipe's `extra/` and noting + the commit in DEFERRED.md. 
*Why deferred for now:* default coverage is sufficient; this is a + later breadth/depth knob, not a critical-path feature. *Added:* 2026-05-28. + - [ ] **Optional webhook self-registration (admin-access environments).** We deliberately made **polling the primary trigger** and require the CI server/bot to run on **read-level** access only — so the server does **not** auto-register Gitea webhooks (that needs diff --git a/cc-ci-plan/README.md b/cc-ci-plan/README.md index 527357d..591af2d 100644 --- a/cc-ci-plan/README.md +++ b/cc-ci-plan/README.md @@ -17,7 +17,9 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day | `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. | | `plan-phase1c-full-reproducibility.md` | **Phase 1c** (runs first): make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). | | `plan-phase1b-review-lint.md` | **Phase 1b** (after 1c): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets), ending in a full cold re-verification of all D1–D10 — now covering 1c's refactor. | -| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1b): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. | +| `plan-phase1d-generic-test-suite.md` | **Phase 1d** (after 1b, before 2): a **generic install/upgrade/backup/restore** suite that runs on *any* recipe with zero config, with a recipe's own `test_.py` **overriding or extending** the generic (Builder's call) and **reusing the generic's deployment — no redeploy**, plus optional custom install-steps; recipes needing special setup fail the generic form gracefully. 
The test-architecture foundation Phase 2 builds on. | +| `plan-phase1e-harness-corrections.md` | **Phase 1e** (after 1d, before 2): three operator-review corrections to the shared generic harness — (HC1) upgrade goes previous-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved recipes** (default = cc-ci overlays + generic only); (HC3) the **generic runs by default** alongside an overlay, skipped only via explicit opt-out. | +| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1e): build on the corrected generic suite — author the recipe overlays (port recipe-maintainer tests as `test_*.py`) + define custom install steps where a recipe fails generically. | | `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. | | `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. | | `IDEAS.md` | Deferred/future ideas, parked out of current scope. | diff --git a/cc-ci-plan/REBOOTS.md b/cc-ci-plan/REBOOTS.md new file mode 100644 index 0000000..6e3381f --- /dev/null +++ b/cc-ci-plan/REBOOTS.md @@ -0,0 +1,15 @@ +# Reboot log — cc-ci orchestrator Pi + +One line per genuine reboot of the orchestrator Pi (`raspberrypi`), appended automatically by +`reboot-log.sh` (ExecStartPre of `cc-ci-loops.service`, boot_id-gated so manual service restarts are +NOT counted). The Pi hosts the Builder + Adversary loops + watchdog; a reboot drops the tmux sessions +(and this orchestrator session), and `cc-ci-loops.service` restarts the loops on boot. Count the +lines below to see how often it's happening. + +## Reboots + +- 2026-05-28 (~19:?? BST) — reboot (backfilled from memory; mid-Phase-2). 
Orchestrator + loops were + down until manually relaunched. This pre-dates the systemd auto-restart service. +- 2026-05-28 (~20:02 BST) — reboot (backfilled from memory; uptime showed 5 min at 20:07). Loops + manually relaunched at phase 2; this is what prompted adding `cc-ci-loops.service` + + auto-logging. Auto-logging is live from the next reboot onward. diff --git a/cc-ci-plan/launch-orchestrator.sh b/cc-ci-plan/launch-orchestrator.sh new file mode 100755 index 0000000..7f927e2 --- /dev/null +++ b/cc-ci-plan/launch-orchestrator.sh @@ -0,0 +1,117 @@ +#!/usr/bin/env bash +# +# launch-orchestrator.sh — start/resume the cc-ci ORCHESTRATOR session in tmux under remote-control. +# +# The orchestrator (see /srv/cc-ci/AGENTS.md) is the long-lived SUPERVISORY session: it watches the +# Builder/Adversary loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and +# owns the VM-level fallback. It is SEPARATE from the loops that launch.sh manages — this script only +# brings the orchestrator back (e.g. after a reboot, which kills the tmux server and every session in +# it). The conversation itself survives on disk across exits/reboots; remote-control only stays +# connected while the process is alive, so recovery = relaunch the process and re-attach by --resume. +# +# Naming: tmux session AND remote-control name are both "cc-ci-orchestrator", matching the loop +# sessions cc-ci-builder / cc-ci-adv / cc-ci-watchdog. +# +# Usage: +# ./launch-orchestrator.sh start # resume the persistent orchestrator session (DEFAULT) +# ./launch-orchestrator.sh fresh # start a NEW orchestrator session (no --resume) +# ./launch-orchestrator.sh status # show tmux + remote-control state +# ./launch-orchestrator.sh attach # tmux attach to the session (Ctrl-b d to detach) +# ./launch-orchestrator.sh stop # kill the tmux session (conversation persists on disk) +# +# The persistent session id is read from $ID_FILE (seeded on first run with DEFAULT_ID). 
A Claude +# session keeps the SAME id across --resume, so this stays valid across reboots. To point the script +# at a different session, edit that file or export ORCH_SESSION_ID. + +set -euo pipefail + +# ----- config ------------------------------------------------------------- +SESSION="${ORCH_SESSION:-cc-ci-orchestrator}" # tmux session name == remote-control name +WORKDIR="${ORCH_DIR:-/srv/cc-ci}" # orchestrator cwd (its claude project dir) +CLAUDE_BIN="${CLAUDE_BIN:-claude}" +CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}" +# REMOTE_CONTROL=1 → --remote-control session, viewable/steerable at claude.ai/code. Needs the box +# logged into the claude.ai account. =0 for a plain local interactive session. +REMOTE_CONTROL="${REMOTE_CONTROL:-1}" +LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}" +ID_FILE="${ORCH_ID_FILE:-$LOG_DIR/.orchestrator-session-id}" +DEFAULT_ID="34a80a99-b37e-4809-b8da-ccc9fafe785e" # the orchestrator session as of 2026-05-28 +# Startup nudge injected as the resumed session's first turn, so an AUTO-launched orchestrator (e.g. +# cc-ci-loops.service ExecStartPost after a reboot) actually RUNS its AGENTS.md startup routine — +# announce itself + report reboots — instead of resuming silently and waiting. Set empty to disable. +# Must contain NO single quotes (it is single-quoted into the tmux command). +STARTUP_PROMPT="${ORCH_STARTUP_PROMPT-STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run cc-ci-plan/launch.sh status, then send a proactive PushNotification that you are online with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops + watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start if not). 
Then resume supervising.}" +# -------------------------------------------------------------------------- + +log() { printf '[orchestrator %(%H:%M:%S)T] %s\n' -1 "$*"; } +die() { log "ERROR: $*"; exit 1; } +session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; } + +preflight() { + command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux" + command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)" + [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR" + mkdir -p "$LOG_DIR" + [[ -f "$ID_FILE" ]] || echo "$DEFAULT_ID" > "$ID_FILE" +} + +resume_id() { echo "${ORCH_SESSION_ID:-$(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")}"; } + +# Launch claude in a detached tmux session. $1=resume ("resume"|"fresh"). +start() { + local mode="${1:-resume}" + preflight + if session_alive; then + log "$SESSION already running — leaving it (use '$0 stop' first to relaunch)" + return 0 + fi + local rc="" resume="" id="" + [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'" + if [[ "$mode" == "resume" ]]; then + id="$(resume_id)" + [[ -n "$id" ]] && resume="--resume '$id'" + log "starting $SESSION (resume=$id, cwd=$WORKDIR, rc=$REMOTE_CONTROL)" + else + log "starting $SESSION FRESH (no resume, cwd=$WORKDIR, rc=$REMOTE_CONTROL)" + fi + # Startup nudge as a POSITIONAL prompt (not stdin — stdin would force print mode and break + # remote-control). On --resume this appends as the session's next turn, triggering the AGENTS.md + # startup routine (announce + report reboots). Empty STARTUP_PROMPT => clean resume, no nudge. + local prompt_arg="" + [[ -n "$STARTUP_PROMPT" ]] && prompt_arg="'$STARTUP_PROMPT'" + tmux new-session -d -s "$SESSION" -c "$WORKDIR" \ + "$CLAUDE_BIN $resume $rc $CLAUDE_FLAGS $prompt_arg" + tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'" + log "started. 
status: $0 status | attach: tmux attach -t $SESSION" +} + +case "${1:-start}" in + start) start resume ;; + fresh) start fresh ;; + stop) + if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi + ;; + status) + if session_alive; then + log "$SESSION: RUNNING" + ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true + else + log "$SESSION: stopped" + fi + log "resume id: $(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID") (file: $ID_FILE)" + ;; + attach) exec tmux attach -t "$SESSION" ;; + *) + cat <.md etc.). When a phase's +# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c → 1b → 1d → 1e → 2 → 2b → 3 → 4). +# Each phase has its own plan + phase-namespaced loop-state files (STATUS-.md etc.). When a phase's # STATUS-.md shows "## DONE", the watchdog AUTO-TRANSITIONS to the next phase; after the LAST -# phase it STOPS the loops and exits (a manual gate — e.g. check in before Phase 2). +# phase (4, final review/polish/cleanup) it STOPS the loops and exits (end of the whole build). # # Three jobs: ITERATION (each agent's /loop), RESILIENCE (restart a dead loop), HANDOFF SIGNALLING # (ping the waiting loop the moment its counterpart hands off), PHASE SEQUENCING (this file). @@ -49,7 +49,7 @@ WATCHDOG_SESSION="cc-ci-watchdog" # Ordered phase sequence: each entry "id|planfile|statusbasename". The watchdog runs them in order, # auto-transitions on the phase's "## DONE" (in BUILDER_DIR/), and STOPS after the # last one (manual gate). Override PHASES_SPEC (semicolon-separated) to change the sequence. 
-PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md}" +PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md;1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md;1e|plan-phase1e-harness-corrections.md|STATUS-1e.md;2|plan-phase2-recipe-tests.md|STATUS-2.md;2b|plan-phase2b-test-performance.md|STATUS-2b.md;3|plan-phase3-results-ux.md|STATUS-3.md;4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md}" IFS=';' read -r -a PHASES <<< "$PHASES_SPEC" PHASE_IDX_FILE="${PHASE_IDX_FILE:-$LOG_DIR/.phase-idx}" # -------------------------------------------------------------------------- @@ -64,7 +64,10 @@ phase_id() { echo "${PHASES[$1]}" | cut -d'|' -f1; } phase_plan() { echo "${PHASES[$1]}" | cut -d'|' -f2; } phase_status() { echo "${PHASES[$1]}" | cut -d'|' -f3; } phase_review() { echo "REVIEW-$(phase_id "$1").md"; } -phase_done() { grep -qE '^##[[:space:]]+DONE' "$BUILDER_DIR/$1" 2>/dev/null; } # $1 = status basename (read locally) +# Loop-state files may sit at the repo root OR under machine-docs/ (the 1b RL6 move). Prefer +# machine-docs/ if present, else root — so the watchdog survives the move whenever it happens. 
+resolve_state() { local dir="$1" base="$2"; if [[ -f "$dir/machine-docs/$base" ]]; then echo "$dir/machine-docs/$base"; else echo "$dir/$base"; fi; } +phase_done() { grep -qE '^##[[:space:]]+DONE' "$(resolve_state "$BUILDER_DIR" "$1")" 2>/dev/null; } # $1 = status basename (read locally) all_ids() { local p; for p in "${PHASES[@]}"; do printf '%s ' "$(echo "$p" | cut -d'|' -f1)"; done; } preflight() { @@ -133,15 +136,32 @@ ping_session() { tmux send-keys -t "$s" -l -- "$msg" 2>/dev/null && { sleep 0.3; tmux send-keys -t "$s" Enter 2>/dev/null; } } +# A loop can stall ALIVE on a usage/spend-limit notice: the claude process stays up (so the +# dead-session restart never fires) but makes no progress, and the /loop self-pacing is dead because +# the limit interrupted the turn that would have scheduled the next tick. Detect that signature +# (limit text present + no active-turn marker) and re-nudge it each heavy tick — once the limit resets +# the next nudge lands and the loop resumes. Gated on the limit text so we NEVER nudge a loop that is +# just legitimately idle-waiting on a handoff. +LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)' +nudge_if_limit_stalled() { + local s="$1" pane + pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -25 || true)" + if printf '%s\n' "$pane" | grep -q 'esc to interrupt'; then return 0; fi # actively working + if ! printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then return 0; fi # not a limit stall + log "limit-stall detected on $s — re-nudging to resume" + ping_session "$s" "watchdog: the usage/spend limit appears lifted — RESUME your loop now. Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you stopped; re-arm your loop pacing." +} + # Edge-triggered handoff signalling for the CURRENT phase. Reads the loops' local clones. 
# Ping the Adversary only when a gate id NEWLY appears on a "CLAIMED … awaiting" line (never on # the baseline / restart / a passed-but-kept line). Ping the Builder when the phase REVIEW changes. _wd_awaiting=""; _wd_baselined=""; _wd_last_review="" -handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; } # call on phase transition +_wd_adv_inbox_seen=""; _wd_builder_inbox_seen="" +handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""; } # call on phase transition handoff_check() { local idx sf rf cur now added idx="$(cur_idx)" - sf="$BUILDER_DIR/$(phase_status "$idx")"; rf="$ADV_DIR/$(phase_review "$idx")" + sf="$(resolve_state "$BUILDER_DIR" "$(phase_status "$idx")")"; rf="$(resolve_state "$ADV_DIR" "$(phase_review "$idx")")" if [[ -f "$sf" ]]; then now="$(grep -iE 'CLAIMED.*awaiting' "$sf" 2>/dev/null | grep -oiE 'M[0-9]+(\.[0-9]+)?|[A-Z][0-9]+' | tr '[:lower:]' '[:upper:]' | sort -u || true)" if [[ -n "$_wd_baselined" ]]; then @@ -163,6 +183,34 @@ handoff_check() { _wd_last_review="$cur" fi fi + + # INBOX side-channel (§6.1). The sender writes the receiver's inbox in their OWN clone, so we + # detect from the sender side. Edge-trigger on content hash so a fresh message (sender re-wrote + # before receiver consumed) re-pings. Receiver deletes after processing => hash empty => next + # write re-triggers. + local adv_inbox builder_inbox h + adv_inbox="$(resolve_state "$BUILDER_DIR" "ADVERSARY-INBOX.md")" + if [[ -f "$adv_inbox" ]]; then + h="$(md5sum "$adv_inbox" 2>/dev/null | awk '{print $1}' || true)" + if [[ -n "$h" && "$h" != "$_wd_adv_inbox_seen" ]]; then + log "handoff: ADVERSARY-INBOX.md new/changed -> pinging Adversary" + ping_session "$ADV_SESSION" "watchdog ping: the Builder wrote machine-docs/ADVERSARY-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed." 
+ _wd_adv_inbox_seen="$h" + fi + else + _wd_adv_inbox_seen="" # consumed; ready for the next write + fi + builder_inbox="$(resolve_state "$ADV_DIR" "BUILDER-INBOX.md")" + if [[ -f "$builder_inbox" ]]; then + h="$(md5sum "$builder_inbox" 2>/dev/null | awk '{print $1}' || true)" + if [[ -n "$h" && "$h" != "$_wd_builder_inbox_seen" ]]; then + log "handoff: BUILDER-INBOX.md new/changed -> pinging Builder" + ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary wrote machine-docs/BUILDER-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed." + _wd_builder_inbox_seen="$h" + fi + else + _wd_builder_inbox_seen="" + fi } watchdog_loop() { @@ -184,15 +232,15 @@ watchdog_loop() { handoff_reset start_loops else - log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — MANUAL CHECK-IN required before Phase 2." + log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — entire build (1c→4) finished." stop_loops - printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; manual check-in required before Phase 2.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE" + printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; entire build finished.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE" log "watchdog exiting." 
exit 0 fi else - session_alive "$BUILDER_SESSION" || { log "builder gone — restarting (phase $pid)"; start_agent builder "$BUILDER_SESSION" "$BUILDER_DIR"; } - session_alive "$ADV_SESSION" || { log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION" "$ADV_DIR"; } + if session_alive "$BUILDER_SESSION"; then nudge_if_limit_stalled "$BUILDER_SESSION"; else log "builder gone — restarting (phase $pid)"; start_agent builder "$BUILDER_SESSION" "$BUILDER_DIR"; fi + if session_alive "$ADV_SESSION"; then nudge_if_limit_stalled "$ADV_SESSION"; else log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION" "$ADV_DIR"; fi fi fi sleep "$SIGNAL_INTERVAL" diff --git a/cc-ci-plan/plan-orchestrator-migration.md b/cc-ci-plan/plan-orchestrator-migration.md new file mode 100644 index 0000000..9ad15d9 --- /dev/null +++ b/cc-ci-plan/plan-orchestrator-migration.md @@ -0,0 +1,135 @@ +# Plan — migrate the orchestrator off the Pi onto a dedicated NixOS Incus VM + +**Goal:** move everything that drives the cc-ci loops (the Builder/Adversary loops, the watchdog, +the SOCKS proxy, the orchestrator session itself) off the Raspberry Pi and onto a new, dedicated, +**reboot-resilient NixOS VM** on b1 — declared in a new git repo **`cc-ci-orchestrator`**. Finish by +relocating this orchestrator session there too. + +**Why:** the Pi has rebooted twice today, each time silently killing the tmux loops + watchdog +(they don't survive reboot, nothing auto-restarts them). A NixOS VM lets us declare the whole rig +(claude CLI, proxy, loop supervisor) as systemd services that come back on boot — turning a reboot +into a non-event. It also consolidates the orchestrator next to the infra it manages. + +**Status:** DRAFT — awaiting operator go-ahead before any infra creation / cutover. + +--- + +## 0. 
Current footprint (what has to move) + +On the Pi (`raspberrypi`, aarch64), workspace `/srv/cc-ci` (itself the +`cc-ci-autonomous-orchestrator` git repo): + +| Item | What | Move strategy | +|---|---|---| +| `cc-ci-plan/` | loop code: `launch.sh`, `plan*.md`, `prompts/`, `kickoff.md` | in git (this repo) → clone on VM | +| `cc-ci/`, `cc-ci-adv/` | Builder + Adversary working clones (~13M each) | **re-clone from git.autonomic.zone** on the VM (cleaner than copying) | +| `.cc-ci-logs/` | watchdog/loop logs + `.phase-idx` | copy `.phase-idx` (the resume point); logs start fresh | +| `cc-ci-secrets/` | sops-encrypted secrets repo | in git → clone | +| `references/` | recipe-maintainer corpus (read-only parity source) | clone/rsync from `/srv/recipe-maintainer` | +| **`.testenv`** | TS auth key, Gitea bot creds | **out-of-band copy** (gitignored, never in git) | +| **`~/.ssh/cc-ci-root-ed25519`** | root SSH key to cc-ci | **out-of-band copy** | +| **`.sops/master-age.txt`** | master recovery age key | **out-of-band copy** | +| **Incus mTLS certs** (`/srv/incus-terraform-nix-vm-creator/terraform-secrets/`) | `terraform.{crt,key}`, `vm_ssh_key` | **out-of-band copy** — so the VM can itself manage VMs | +| `cc-ci-tailscaled.service` | userspace SOCKS proxy :1055 | **re-declare as NixOS** (see §3) | +| **claude CLI + auth** | `~/.local/bin/claude` v2.1.154 + `~/.claude.json` | install on VM + **operator `claude auth login`** (§4) | +| this orchestrator session | the supervising claude conversation | **operator-assisted cutover** (§6) | + +Two hard human-in-the-loop steps, called out explicitly: **claude auth on the new VM** (device-code +login, can't be scripted) and the **final session cutover** (the operator connects to the new +orchestrator session). Everything else I can do. + +## 1. Target VM spec + +- **Host/API:** b1 Incus, `https://100.117.251.31:8443`, project `terraform-ci`, mTLS certs (have). +- **Name:** `cc-ci-orchestrator` (tailnet hostname too). 
+- **Resources:** **2 GB RAM, 2 vCPU, 30 GB disk** (dir backend → resize needs a reboot; size at + create time so no later grow). b1 has ample headroom (only cc-nix-test @8GB running). +- **Image:** the existing imported NixOS base VM image (`incus-base-vm`) — already ships tailscale, + openssh, git/jq/curl, flakes, cloud-init. +- **Tailnet:** joins via a fresh `TS_AUTH_KEY` (operator provides, or reuse the keyed approach in + `terraform-secrets/.test.env`). MagicDNS name `cc-ci-orchestrator.taila4a0bf.ts.net`. +- **Bootstrap:** cloud-init writes the `cc-ci-orchestrator` flake config + `nixos-rebuild switch`. + +## 2. The new `cc-ci-orchestrator` git repo (NixOS config) + +A new **private** repo on `git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator` (bot is org +admin). It is the NixOS config for this VM — the orchestrator's equivalent of what `cc-ci` is for the +test server. Contents: + +- `flake.nix` + `hosts/cc-ci-orchestrator/configuration.nix` — the VM's NixOS config. +- **Packages:** `claude-code` (CLI), `git`, `tmux`, `python3`, `jq`, `openssh`, `nodejs` (claude + runtime), `coreutils`, `nettools` (`nc` for the proxy ProxyCommand). +- **`services.cc-ci-tailscaled`** — the userspace tailscaled SOCKS proxy on :1055, as a NixOS + systemd service (port to NixOS from the Pi's `cc-ci-tailscaled.service`). This is the path to b1 + + cc-ci. +- **`services.cc-ci-orchestrator`** — a systemd service that runs `launch.sh start` with + `RESUME_PHASE=1` **on boot** (after the proxy + network are up), as the workspace user. **This is + the reboot-resilience fix** — the loops + watchdog come back automatically after any reboot. +- **Secrets via sops-nix** (like cc-ci): the out-of-band secrets (`.testenv`, ssh key, incus certs) + are sops-encrypted into the repo, decrypted at activation to their runtime paths. The **master age + key** is the one irreducible out-of-band bootstrap secret placed on the VM once. 
+- `~/.ssh/config` for `cc-ci` (root, ProxyCommand via :1055) declared. +- **Excluded from git:** claude's own auth (`~/.claude.json`) — that's per-user login state, set up + once interactively (§4), not committed. + +## 3. Execution phases + +### Phase A — provision the VM (reversible; safe to do while Pi loops keep running) +1. Create `cc-ci-orchestrator` VM via the Incus API (2 GB / 2 vCPU / 30 GB, NixOS base image, TS auth + key in cloud-init). Wait for tailnet join + ssh. +2. Verify: `ssh` in, `tailscale status`, `nixos-rebuild` available, can reach b1 API + cc-ci through + its own proxy once configured. + +### Phase B — author + apply the `cc-ci-orchestrator` repo +3. Create the private git repo; author the flake/config (§2); commit/push. +4. Place the master age key on the VM; sops-encrypt the out-of-band secrets into the repo. +5. `nixos-rebuild switch` on the VM → proxy service up, packages present, services defined (loop + supervisor **not yet started** — or started in a dry mode). + +### Phase C — stage the workspace (no cutover yet) +6. On the VM: clone `cc-ci-autonomous-orchestrator` (the loop code), clone the Builder/Adversary + working repos fresh from git.autonomic.zone, clone `cc-ci-secrets`, rsync `references/`. +7. Copy `.phase-idx` (resume point = phase 2) so the VM watchdog resumes the right phase. +8. **Operator step:** `claude auth login` on the VM (device code) so the loops can run + `--remote-control --dangerously-skip-permissions`. Verify with a throwaway interactive claude. + +### Phase D — cutover (the only disruptive moment; pick a clean point) +9. **Quiesce the Pi:** stop the Pi loops + watchdog (`launch.sh stop`); confirm both loops are at a + safe point (no half-written commit; `git status` clean in both clones, last work pushed). +10. 
**Start on the VM:** enable + start the `cc-ci-orchestrator` systemd service → `launch.sh start` + (RESUME_PHASE=1) brings up Builder + Adversary + watchdog on the VM, resuming phase 2 from the + repo state. Verify all three sessions + a handoff + public health. +11. **Decommission the Pi loops:** disable the Pi's `cc-ci-tailscaled` + leave the workspace in place + (read-only fallback) but not running loops. (Keep the Pi as a cold standby for a few days before + deleting anything.) + +### Phase E — move the orchestrator session (operator-assisted) +12. On the VM, start the orchestrator session: `claude --remote-control 'autonomous-orchestrator' + --dangerously-skip-permissions` in a tmux session, seeded with AGENTS.md + this plan so it picks + up the supervising role. The **operator connects** to it (claude.ai/code) — this is the + "move myself" step; a session can't transplant itself across machines, so it's a fresh + orchestrator session on the VM with full context from the repo. +13. This Pi-side orchestrator session hands off (writes a short state note) and goes idle/ends. + +## 4. Risks & mitigations +- **claude auth (human step):** unavoidable device-code login on the VM. Mitigation: do it in Phase + C, well before cutover; verify before quiescing the Pi. +- **Loops mid-work at cutover:** pick a quiet point (between gate claims / after a push); the loops + re-orient from git on restart anyway, so worst case is a re-run of an in-flight iteration. +- **Secrets sprawl:** out-of-band secrets are copied once, then sops-managed in the new repo; never + committed in plaintext (same discipline as cc-ci). The master age key is the sole bootstrap secret. +- **Self-move gap:** between Pi-session-ends and VM-session-connected, there's no live orchestrator. + The watchdog (now a boot service) keeps the loops alive independently, so this gap is safe. +- **Rollback:** until the Pi workspace is deleted, reverting = stop VM service, `launch.sh start` on + the Pi again. 
Keep the Pi intact until the VM has run clean through at least one reboot + one gate + handshake. +- **Reboot-resilience proof:** before trusting the VM, reboot it once and confirm the loops + + watchdog + proxy all come back via systemd (the whole point of the move). + +## 5. Operator-assisted steps (the only things I can't fully do) +1. Provide a fresh `TS_AUTH_KEY` for the VM (or confirm reuse of the one in `terraform-secrets`). +2. `claude auth login` on the VM (device code). +3. Connect to the new orchestrator session on the VM at cutover (Phase E). + +Everything else (VM create, repo author, NixOS config, secret migration, workspace staging, the +loop cutover) I can drive. diff --git a/cc-ci-plan/plan-phase1b-review-lint.md b/cc-ci-plan/plan-phase1b-review-lint.md index c582fba..014a502 100644 --- a/cc-ci-plan/plan-phase1b-review-lint.md +++ b/cc-ci-plan/plan-phase1b-review-lint.md @@ -135,3 +135,40 @@ Blocking unless noted; these are *plan-relevant invariants visible only by readi - Whether to add Python **type-checking** (mypy/pyright) now or defer to `IDEAS.md`. - The precise **blocking vs advisory** split for the checklist. - Whether the `.drone.yml` lint stage should **fail** the build or just warn initially. + +--- + +## 7. Operator review items (added 2026-05-27) — repo layout (do in this 1b pass) + +Two structural-review items from the operator. Both are **blocking** for 1b. Apply them as part of +this pass, then re-verify (RL3 covers the re-verification). **Mind the coordination caveats — these +touch the live flake build and the running multi-agent machinery.** + +### RL5 — Consolidate all Nix-code folders under a root `nix/` +- Move the folders that contain `.nix` code — **`modules/` and `hosts/`** — to **`nix/modules/` and + `nix/hosts/`**. (Add future Nix dirs under `nix/` too.) 
+- **Keep `flake.nix` / `flake.lock` at the repo root** (entry point) so the build ref is unchanged + (`docs/install.md`'s `nixos-rebuild switch --flake 'git+file://…?submodules=1#cc-ci'` stays valid). + Just update the flake's internal paths (`./modules` → `./nix/modules`, `./hosts` → `./nix/hosts`) + and any `imports`/`scripts`/`.drone.yml` references. +- **Re-verify after the move:** the byte-identical clean-room result is the bar. The toplevel store + hash *will* change (paths differ) — that's fine; what must hold is that a fresh recursive clone + still rebuilds **byte-identical to the running system** and the Adversary re-confirms it cold + (folds into RL3). Update `docs/architecture.md` to describe the `nix/` layout. + +### RL6 — Move uppercase multi-agent-protocol files into `machine-docs/` +- Move the uppercase protocol files — **`STATUS*.md`, `REVIEW*.md`, `JOURNAL*.md`, `BACKLOG*.md`, + `DECISIONS.md`** — into a root **`machine-docs/`** folder. **`README.md` stays in the repo root** + (operator decision, 2026-05-27) — it is the human-facing repo readme, not a protocol file; do + **not** move it into `machine-docs/`. +- **Update every reference** to the new paths: the `cc-ci-plan/` plans (this file, `plan.md`, + `plan-phase1c-*`, `README.md`, `kickoff.md`, `test-e2e-testme-acceptance.md`), `AGENTS.md`, + `.drone.yml`, `scripts/`, and any in-repo doc that points at `STATUS.md`/`REVIEW.md`/etc. +- **⚠ COORDINATION CAVEAT (do not move these unilaterally mid-run):** the live **watchdog** + (`cc-ci-plan/launch.sh`, the orchestrator's file) reads `STATUS-.md` and `REVIEW-.md` at + the **repo root** to drive handoff pings + the 1c→1b auto-transition. Moving them breaks the + running watchdog until `launch.sh` is updated to the `machine-docs/` paths and the watchdog is + restarted. **So sequence it with the orchestrator:** the orchestrator updates `launch.sh`'s + `PHASES_SPEC`/path logic and restarts the watchdog **in lockstep** with the loops' `git mv`. 
+ Safest to do this **near the end of 1b** (or as its final step), not while a phase transition is + pending. Flag the orchestrator when ready and it will handle `launch.sh` + the watchdog restart. diff --git a/cc-ci-plan/plan-phase1d-generic-test-suite.md b/cc-ci-plan/plan-phase1d-generic-test-suite.md new file mode 100644 index 0000000..99af082 --- /dev/null +++ b/cc-ci-plan/plan-phase1d-generic-test-suite.md @@ -0,0 +1,243 @@ +# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan) + +**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2** +(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it +must precede it. +**Transition:** **manual** (operator kicks it off at the post-1b check-in). +**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof +recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6). +**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies. +**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md` +**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3. + +--- + +## 0. Why this phase + +Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+ +recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe +gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than +being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns +Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and +gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass). 
+ +Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_.py` +may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's +design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the +generic install assertions — is a perfectly good option; additive is fine too; could even be +per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an +op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a +**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3) +custom (non-lifecycle) tests are opt-in. + +> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override +> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**, +> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2), +> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior. + +--- + +## 1. Definition of Done (Phase 1d exit condition) + +Terminates when every item holds **and the Adversary has independently cold-verified** (logged in +`machine-docs/REVIEW-1d.md`): + +- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a + recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto + secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) → + asserts the app is **actually serving** (real HTTP(S) response on its + `.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not + health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no** + cc-ci or repo-local tests. 
+- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release + *before* the code under test), then **upgrade to the code under test (PR head) via + `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head, + not previous → newest-published-tag. Assert services reconverge and the app still serves. + **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag* + and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a + recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev, + **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to + it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still + applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an + image/config change, or assert the running config now matches the PR head). For a non-PR + `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current. (Data- + continuity assertions remain recipe overlays — see §2.1.) +- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot / + `abra app backup` support): run backup → assert a snapshot artifact is produced; then restore → + assert restore completes and the app is healthy after. For recipes that declare **no** backup + config, backup/restore are cleanly **N/A (skipped)** — *not* a failure. 
+- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a + recipe's run from the generic default per op, **overridden or extended** by the recipe's + `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in + cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override + mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant: + if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run + **only if defined**. Discovery + cc-ci-vs-repo-local precedence is + implemented and settled in `machine-docs/DECISIONS.md`. +- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run + against the **same live deployment** the generic tier brought up — **one deploy per run, one + teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe + causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via + deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and + then explicitly. +- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports + **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the + generic install. A recipe with **no** customization still **attempts the generic suite**. + Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected** + when the step is absent; (b) the **same** recipe **passes** once the custom step is added — + demonstrating the hook + the graceful-generic-failure are both real. 
+- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no + cc-ci/repo-local customization runs the full generic suite through the real pipeline + (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation** + pass/fail/skip (install/upgrade/backup/restore). +- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic + lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys** + (teardown in `finally`), respects `MAX_TESTS`. +- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay + convention (file names + locations + precedence), and the custom-install-steps hook + how to + add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h. + +When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`. + +--- + +## 2. The layered test model (the core design) + +For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`: + +(`gen` = generic default; `→` = "overridden/extended by, if the recipe defines it" — mechanism is +the Builder's call, §2.2): +``` +INSTALL = gen_install → test_install.py (else gen) ← always runs +UPGRADE = gen_upgrade → test_upgrade.py (else gen) ← always runs +BACKUP = gen_backup → test_backup.py (else gen) ← if backup-capable, else N/A +RESTORE = gen_restore → test_restore.py (else gen) ← if backup-capable, else N/A +CUSTOM = recipe test_*.py (anything beyond the four) ← ONLY if defined +``` + +### 2.1 Generic baseline suite (recipe-agnostic) +- **install** — `abra app new --domain .ci.commoninternet.net` with non-interactive + defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S) + response from the app over its domain (status + that it's the app, not Traefik's fallback). 
+- **upgrade** — deploy a prior/pinned version, `abra app upgrade` to target, assert reconverge + + still serving. +- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup + produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore + can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay + (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs. + +### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment** +A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or +**extends** them — the Builder picks the mechanism (a present `test_.py` replacing the generic is +a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe +defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero +config, while letting a recipe with a poor generic fit supply its own. + +**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay +assertions run against the **same live deployment** the generic tier already brought up — no extra +`abra app new`/`deploy`/`undeploy` per overlay. 
The target shape: **one deploy per run**, then the +lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays + +custom) run against it, then **one teardown** at the end: + +``` +deploy ONCE + → INSTALL assertions: generic_install + test_install.py (same live app) + → UPGRADE in place (abra app upgrade) + assertions: generic_upgrade + test_upgrade.py (same app, upgraded) + → BACKUP (if capable) → generic_backup + test_backup.py + → RESTORE (if capable) → generic_restore + test_restore.py + → CUSTOM test_*.py (same live app) +teardown ONCE (in finally) +``` + +So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's +extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally +change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier +re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant) +— and that must be explicit, not the default. This is also the main Phase-2b speed win. + +### 2.3 Custom tests +`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe — +no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`). + +### 2.4 Custom install steps (and the graceful-generic rule) +Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a +one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or +repo-local). Rules: +- If a recipe **declares** custom install steps → run them as part of the install tier. +- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.** + Recipes that genuinely need special steps will **fail the generic install — and that's acceptable + and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2 + work), not to special-case the harness. 
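A minimal sketch of the deploy-once shape from §2.2 together with the graceful-generic rule from §2.4, offered as an illustration rather than the harness's real code. Every name here (`assertions_for`, `run_recipe`, the `Check` alias, the `additive` switch) is hypothetical. The checks stand in for assertions only; the actual `abra` operations are omitted. Continuing to later tiers after a failed one is an assumption, since fail-fast vs continue-and-report is still an open decision.

```python
from typing import Callable, Dict, List, Optional

Check = Callable[[], None]  # hypothetical: a check raises AssertionError on failure
LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


def assertions_for(op: str, generic: Dict[str, Check], overlays: Dict[str, Check],
                   additive: bool = False) -> List[Check]:
    """Pick the checks for one lifecycle op.

    Invariant from the plan: no recipe test for an op => the generic runs.
    Whether an overlay replaces or extends the generic is modelled as a switch.
    """
    overlay = overlays.get(op)
    if overlay is None:
        return [generic[op]]
    return [generic[op], overlay] if additive else [overlay]


def run_recipe(deploy: Check, teardown: Check, generic: Dict[str, Check],
               overlays: Dict[str, Check], backup_capable: bool,
               install_steps: Optional[Check] = None,
               additive: bool = False) -> Dict[str, str]:
    """Deploy once, run every tier against that deployment, tear down once."""
    results: Dict[str, str] = {}
    deploy()  # the single shared deployment
    try:
        for op in LIFECYCLE_OPS:
            if op in ("backup", "restore") and not backup_capable:
                results[op] = "skip"  # N/A, not a failure
                continue
            try:
                if op == "install" and install_steps is not None:
                    install_steps()  # declared custom steps, part of the install tier
                for check in assertions_for(op, generic, overlays, additive):
                    check()
                results[op] = "pass"
            except AssertionError:
                results[op] = "fail"  # reported per op; later tiers still run (assumption)
        # Custom (non-lifecycle) tests run only when the recipe defines them.
        for name, check in overlays.items():
            if name in LIFECYCLE_OPS:
                continue
            try:
                check()
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"
    finally:
        teardown()  # always runs, even if something unexpected was raised
    return results
```

In this sketch a recipe with no overlays and no `install_steps` still gets all generic checks, and a generic install failure shows up as `"fail"` for that op instead of being special-cased in the runner.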
+ +### 2.5 Discovery + precedence +- **Locations:** cc-ci's test dir (e.g. `tests//`) and the recipe repo's `tests/` + (repo-local). The harness discovers overlays + custom-install-steps from both. +- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the + upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend + for the CI env). Define the rule for same-named collisions explicitly. + +--- + +## 3. Milestones (bounded) + +- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html` + with no recipe config. *Accept:* DG1. +- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated + on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3. +- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local + discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4. +- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the + fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5. +- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8; + then flip `machine-docs/STATUS-1d.md` to `## DONE`. + +--- + +## 4. Guardrails +- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the + generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with + *no* assertion for an op it should be tested on. +- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a + correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic. +- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no + per-test/per-overlay redeploy. 
One deploy + one teardown per run; re-provision only when an op + truly needs it, explicitly. (Correctness *and* the main perf lever.) +- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin. +- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`. +- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay + authoring is Phase 2, not here. + +--- + +## 5. Impact on later phases (reshapes the plan set) +- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to: + every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays** + (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a + recipe fails generically. Update Phase 2 to reference 1d as its foundation. +- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the + tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is + derived from which tiers pass. +- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the + image-cache / readiness / dedup optimizations. + +--- + +## 6. Open decisions (log in machine-docs/DECISIONS.md) +- **Override vs extend (Builder's call).** Does a present `test_.py` **replace** the generic + assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override + is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an + op ⇒ generic runs" and the single-shared-deployment rule. +- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule. +- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout + (`tests//` in cc-ci; `tests/` in the recipe repo). 
+- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or + a declarative field — pick the simplest that the harness can run uniformly. +- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot + labels present / `abra app backup` exit) to choose run-vs-N/A for DG3. +- Whether generic **upgrade** should always go previous→latest, or test the specific + version-bump under `!testme` (PR-driven). +- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level. +- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→ + backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier + tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report). diff --git a/cc-ci-plan/plan-phase1e-harness-corrections.md b/cc-ci-plan/plan-phase1e-harness-corrections.md new file mode 100644 index 0000000..8ec9258 --- /dev/null +++ b/cc-ci-plan/plan-phase1e-harness-corrections.md @@ -0,0 +1,139 @@ +# cc-ci Phase 1e — Generic-harness corrections (Autonomous Build Plan) + +**Status:** QUEUED — runs **after Phase 1d** and **before Phase 2** (`plan-phase2-recipe-tests.md`). +It corrects the **shared generic-test harness** from 1d, so it must land before Phase 2 authors +overlays on top of it. +**Transition:** **manual** (operator kicks it off). +**Builds on:** the Phase-1d generic suite (`runner/run_recipe_ci.py`, `runner/harness/*`, +`tests/_generic/*`, `tests/conftest.py`) — see `plan-phase1d-generic-test-suite.md`. +**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies. +**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md` +**Phase order:** 1c → 1b → 1d → **1e** → 2 → 2b → 3. + +--- + +## 0. 
Why this phase + +An operator review of the 1d generic suite (2026-05-28) found three corrections to the **shared +harness** — the foundation every recipe overlay (Phase 2) builds on. Fixing them now, once, is far +cheaper than after overlays exist. All three are small in code but change behavior, so each needs a +fresh Adversary cold-verification and must not weaken any existing test. + +--- + +## 1. Definition of Done (Phase 1e exit condition) + +Terminates when every item holds **and the Adversary has independently cold-verified** (logged in +`machine-docs/REVIEW-1e.md`): + +- [ ] **HC1 — Upgrade tier upgrades to the code under test (PR head), not a published tag.** The + upgrade tier deploys the **previous published version** (last release before the PR) and then + **upgrades to the PR head via `abra app deploy --chaos`** (chaos = the current checkout). The + PR's actual changes are exercised by the upgrade path. (§2.1) +- [ ] **HC2 — Repo-local (PR-authored) code is not executed unless the recipe is approved.** By + default the harness runs **only cc-ci-authored** overlays/install-steps (`tests//…`) + + the generic; PR-authored repo-local `test_*.py` and `install_steps.sh` are **not run**. + Repo-local code is honored **only for recipes on an explicit cc-ci-maintained approval + allowlist** (default-deny). (§2.2) +- [ ] **HC3 — Generic runs by default (additive); skipping it is explicit.** When a recipe ships an + overlay for an op, the **generic still runs** alongside it by default; the generic is skipped + **only** when an explicit env/flag opts out. The baseline floor is never lost silently. 
(§2.3) +- [ ] **HC4 — No regression, cold-verified.** The Adversary re-runs the relevant D1–D10 / DG1–DG8 + acceptance from a cold start: nothing weakened, deploy-once (DG4.1) still holds, teardown still + sacred, and the three new behaviors are demonstrated (HC1: a PR-head upgrade proven to deploy + PR-head; HC2: a repo-local test is *ignored* for a non-approved recipe and *run* for an approved + one; HC3: generic runs with an overlay present, and is skipped only with the opt-out set). + +When HC1–HC4 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1e.md`. + +--- + +## 2. The three corrections + +### 2.1 HC1 — Upgrade to the PR head (not a published tag) +Current 1d behavior: deploy previous published version, then `abra app upgrade` to the **newest +published tag** — and because deploying the prev tag re-checks-out the recipe, the **PR-head code is +never deployed**, so a recipe PR's changes aren't exercised by upgrade. + +Corrected: +1. Deploy the **previous published version** (the last release before the code under test) as the + "before" state. +2. **Restore the PR-head checkout** (re-checkout the PR ref / re-use the post-fetch snapshot — the + prev-tag deploy will have reset `~/.abra/recipes/`). +3. **Upgrade to it via `abra app deploy --chaos`** (chaos = current checkout = PR head) in place on + the shared deployment. +4. Assert reconverge + still serving (as today). +- **Adapt the "deployment moved" assertion** (`generic.do_upgrade`): prev→PR-head may *not* bump the + coop-cloud version label (a PR can change a recipe without a version bump), so also accept an + image/config change, or assert the running config now matches the PR head — keep it non-vacuous + without false-failing a legit unbumped PR. +- **Non-PR `!testme`** (no PR head): "current checkout" = the catalogue current, so upgrade tests + prev→current — still valid. 
+- Preserve **deploy-once** spirit: this is still one app deployment mutated in place (prev → chaos + redeploy of PR head is the upgrade op, not a fresh second app). Reconcile with the DG4.1 + deploy-count guard — define whether a chaos redeploy counts as a "deploy" and adjust the guard so + the legitimate upgrade isn't flagged (e.g. count `abra app new` installs, not in-place redeploys). + +### 2.2 HC2 — Repo-local trust gate (default-deny; cc-ci overlays only) +`install_steps.sh` and repo-local `test_*.py` are PR-author-controlled code that runs on the CI host +with `/run/secrets/*` present — an untrusted-code risk. Operator decision (2026-05-28): + +- **Default:** the harness runs **only cc-ci-authored** overlays + install-steps + (`tests//…`) and the generic. Repo-local (`/tests/`) `test_*.py` and + `install_steps.sh` are **discovered-but-not-executed**. +- **Approved recipes only:** repo-local code is honored **only** when the recipe is on an explicit, + **cc-ci-maintained approval allowlist** (default-empty ⇒ default-deny). Adding a recipe to the + allowlist is a deliberate cc-ci-maintainer act after reviewing that recipe's tests. +- Update `discovery.resolve_op` / `custom_tests` / `install_steps` so the **repo-local source is + only consulted for allowlisted recipes**; otherwise precedence is **cc-ci > generic** only. +- **Open (settle in DECISIONS):** the allowlist's form + location (a checked-in file like + `tests/repo-local-approved.txt`, or a field in a cc-ci config), and the approval workflow. Keep it + simple + auditable + in git. +- (Future hardening, → IDEAS, not this phase: sandbox/network-restrict even cc-ci overlays.) + +### 2.3 HC3 — Generic by default (additive), explicit opt-out +Supersedes 1d's pure-override default. New rule: when a recipe ships an overlay for an op, **both the +generic and the overlay run** for that op by default; the generic is skipped **only** when an +explicit opt-out is set. 
+ +- **Opt-out mechanism (propose; settle in DECISIONS):** an env flag `CCCI_SKIP_GENERIC` (all ops) and + per-op `CCCI_SKIP_GENERIC_` (e.g. `..._UPGRADE`), settable via the recipe's `recipe_meta.py` + (a `SKIP_GENERIC` list) so it's declarative per recipe, not a hidden global. +- **Op-vs-assertion split (required by additive + deploy-once):** a mutating op (upgrade/backup/ + restore) must run **once**, then **both** the generic assertions and the overlay assertions + evaluate the post-op state — never upgrade/backup twice. So refactor the tiers: the **orchestrator + performs the op once** (the harness owns the op), then runs generic assertions (unless opted out) + + overlay assertions against the shared post-op deployment. For `install` (no op) both assertion sets + just run. This keeps deploy-once and one-op-per-tier intact. +- Net effect: the generic "is it actually serving / did the upgrade move / snapshot produced" floor + is **always** exercised unless a recipe explicitly declares it skips generics — overlays add, they + don't silently subtract. + +--- + +## 3. Method / milestones (bounded) +- **E0 — HC2 trust gate.** Gate repo-local behind the approval allowlist (default-deny); cc-ci+generic + only otherwise. *Accept:* repo-local ignored for a non-approved recipe, run for an approved one. +- **E1 — HC3 additive + op/assertion split.** Generic runs alongside overlays by default; op runs + once; opt-out env skips the generic assertions. *Accept:* overlay + generic both run on one + deployment; opt-out skips generic; deploy-count still 1. +- **E2 — HC1 upgrade-to-PR-head.** prev-release → PR-head via `deploy --chaos`; moved-assertion + adapted; deploy-count guard reconciled. *Accept:* upgrade demonstrably deploys PR-head. +- **E3 — HC4 cold re-verification + docs.** Adversary cold-verifies no regression + the three new + behaviors; update `docs/` + `machine-docs/DECISIONS.md`; flip `STATUS-1e.md` to `## DONE`. + +--- + +## 4. 
Guardrails +- **Never weaken a test** — these are correctness/security fixes; the cardinal rule still wins. +- **Default-secure** — repo-local PR code is off unless the recipe is explicitly approved; the + allowlist lives in git and is auditable. +- **Floor-by-default** — the generic baseline always runs unless a recipe explicitly opts out. +- **Deploy-once preserved** — one app deployment, one teardown; ops run once; reconcile the DG4.1 + guard with the chaos-upgrade redeploy. +- **Bounded** — three fixes + verification, then stop; bigger hardening (sandboxing) → IDEAS. + +## 5. Open decisions (log in machine-docs/DECISIONS.md) +- HC2: approval-allowlist form/location + the approval workflow. +- HC3: opt-out flag name/granularity + declaring it via `recipe_meta.py`. +- HC1: how the DG4.1 deploy-count guard treats an in-place chaos upgrade (don't flag the legit op). diff --git a/cc-ci-plan/plan-phase2-recipe-tests.md b/cc-ci-plan/plan-phase2-recipe-tests.md index 8c5f9df..871f451 100644 --- a/cc-ci-plan/plan-phase2-recipe-tests.md +++ b/cc-ci-plan/plan-phase2-recipe-tests.md @@ -1,8 +1,13 @@ # cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan) -**Status:** QUEUED — starts after Phase 1 (`plan.md`) and the Phase-1b review/lint pass -(`plan-phase1b-review-lint.md`) reach `## DONE`. -**Builds on:** the Phase-1 cc-ci CI server (`plan.md`). This phase adds **test content**, not infra. +**Status:** QUEUED — starts after Phase 1b (`plan-phase1b-review-lint.md`) **and Phase 1d** +(`plan-phase1d-generic-test-suite.md`) reach `## DONE`. +**Builds on:** the Phase-1 cc-ci CI server (`plan.md`) **and Phase 1d's generic test suite** — every +recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment. 
+So this phase is **not** "write every test from scratch": it's **authoring the recipe overlays** +(`test_.py` that override/extend the generic, per 1d's model) + **defining custom install steps** +for recipes that fail the generic form, porting the recipe-maintainer corpus as overlays. This phase +adds **test content**, not infra. **Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing, human-maintained recipe tests — the canonical source to port from). **Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7). @@ -10,6 +15,44 @@ human-maintained recipe tests — the canonical source to port from). --- +## 0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending) + +Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, future) +do **not** need operator input to wire OIDC. The canonical pattern lives at +**`plan-sso-dep-testing.md`** — declared `DEPS = ["keycloak"]` in `recipe_meta.py`, orchestrator +co-deploys the dep, recipe `install_steps.sh` reads `$CCCI_DEPS_FILE` and writes the OIDC env vars ++ injects the client secret via abra. Authenticated tests use `harness.sso.oidc_password_grant` or +Playwright on the dep's login page. **`DEFERRED.md` entries that cite "operator input needed for +OIDC" are mis-categorised** — re-open and execute them per this plan. + +## 0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work) + +A recipe not yet being on `git.autonomic.zone/recipe-maintainers/` is **not** an operator-pending +blocker. The bot is admin on `recipe-maintainers` (see memory `cc-ci-gitea-recipes`) and can create +private mirror repos.
**For any recipe you want to enroll/test that isn't mirrored yet, mirror it +yourself** before enrolling, based on the **`recipe-create-pr` skill** — +`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (which references +`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` for the full procedure). + +The flow (adapt the skill's command for the new-mirror case): +1. Create the **private** mirror repo on `git.autonomic.zone/recipe-maintainers/` (Gitea API + POST `/orgs/recipe-maintainers/repos`, bot creds from `.testenv`/§1.5). +2. Mirror the upstream `git.coopcloud.tech/coop-cloud/` (clone --mirror → push, including + tags) so the mirror's `main` is upstream-synced and tags carry over. +3. Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this + phase). + +Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes. + +## 0a. Prerequisite: Phase 1e harness corrections (must be DONE first) + +The 1d/1b operator review produced three shared-harness corrections, now their own phase +**`plan-phase1e-harness-corrections.md`** (runs **before** this phase). Do **not** author overlays +until 1e is `## DONE`: it changes the foundation every overlay sits on — (HC1) upgrade goes +prev-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved +recipes** (default cc-ci-overlays + generic only); (HC3) the **generic runs by default alongside an +overlay**, skipped only via an explicit opt-out. See that plan for detail. + ## 0. 
Relationship to Phase 1 (read first) Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the diff --git a/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md b/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md new file mode 100644 index 0000000..fea783c --- /dev/null +++ b/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md @@ -0,0 +1,113 @@ +# cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan) + +**Status:** QUEUED — the **LAST** phase, runs after Phase 3 (`plan-phase3-results-ux.md`). A bounded +final review/lint/cleanup pass over the **entire** codebase as it stands after all phases, ending in a +**full cold re-verification that nothing regressed**. +**Transition:** auto (last in the launcher sequence); after it, the whole build is done. +**Builds on:** everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/`nix/` modules, the +runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results +UX, docs, `machine-docs/`). +**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies. +**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md` +**Phase order:** 1c → 1b → 1d → 1e → 2 → 2b → 3 → **4 (final)**. + +--- + +## 0. Why this phase (and why it's bounded) + +This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot — +recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on +the foundation. Before calling the build done, do one **bounded** pass to clean and harden it, and — +critically — **re-verify from a cold start that none of the growth/cleanup regressed any earlier +guarantee.** Same discipline as 1b: **good-enough + enforceable**, style→tooling, judgment→checklist, +don't reopen settled design, and **never weaken a test** to satisfy a nit. + +--- + +## 1. 
Definition of Done (Phase 4 exit condition) + +Terminates when every item holds **and the Adversary has independently cold-verified** (logged in +`machine-docs/REVIEW-4.md`): + +- [ ] **F1 — Lint/format green across the whole codebase.** Re-run the 1b toolchain (alejandra/ + statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2 + overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g. + dashboard front-end) so it's covered going forward. The `.drone.yml` lint stage still passes + from a clean checkout; prove with a break-it probe. +- [ ] **F2 — White-box review checklist over all post-1b code.** Run the §3 checklist; fix every + **blocking** finding, triage advisories to `BACKLOG`/`IDEAS`. Findings + resolutions in + `machine-docs/REVIEW-4.md`. +- [ ] **F3 — Cleanup.** Remove dead code/scaffolding and stale TODOs; consistent naming/structure; + reconcile `machine-docs/` (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the + final state. No behavior change beyond what F2 mandates. +- [ ] **F4 — FULL cold re-verification (the final gate).** *After* F1–F3 land, the Adversary + **independently re-verifies every prior Definition-of-Done from a cold start**, to the same bar + each phase used — fresh PASS + evidence + timestamps in `machine-docs/REVIEW-4.md` within 24h, + **nothing weakened/skipped/softened** by the cleanup: + - **Phase 1 D1–D10** (incl. the genuine **D8** byte-identical fresh-clone rebuild + a + category-spanning live `!testme` e2e through the public gateway). + - **Phase 1c C1–C7** (secrets-in-git, cert-in-sops, honest reproducibility). + - **Phase 1d DG1–DG8** (generic install/upgrade/backup/restore, deploy-once `DG4.1`, override + floor) **as amended by 1e**. + - **Phase 1e HC1–HC3** (upgrade→PR-head via `deploy --chaos`; repo-local gated to approved + recipes; generic-by-default + explicit opt-out). 
+ - **Phase 2** recipe-coverage criteria (every enrolled recipe's overlays/ported tests real, + DRY, green). + - **Phase 2b** performance claims (the measured improvements still hold; no test weakened to + get them). + - **Phase 3** results/level/UX criteria (per-run level honest, PR comment + dashboard correct). +- [ ] **F5 — Documented + cold-verified.** Final `docs/` accurate (install reproduces from scratch; + enroll-recipe + overlay/approval flow correct); accepted deviations in `DECISIONS.md`; the + Adversary confirms F1–F4 with no standing VETO and no open `[adversary]` finding. + +When F1–F5 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-4.md` — the build is +complete. + +--- + +## 2. Method +1. **Lint/format first** (F1) — re-run + extend; auto-fix style, don't deliberate. +2. **Review checklist** (F2, §3) — classify blocking vs advisory; fix blocking, triage rest. +3. **Cleanup** (F3) — dead code, naming, docs, `machine-docs/` reconciliation. +4. **Full cold re-verification LAST** (F4) — once everything has landed, the Adversary re-runs the + entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → *then* + full re-verify. Cleanup must regress nothing. +5. **Bound it** — a pass, not a rewrite; record dead-ends/deviations and stop. + +## 3. White-box review checklist (teeth, not taste) — whole codebase +Blocking unless noted (plan-relevant invariants visible only by reading code): +- **Tests are real** (blocking) — every generic/overlay/custom test asserts actual app state; no + `skip`/`xfail`/can't-fail; per-op `pass/fail/skip` honest; the 1d/1e anti-vacuous guards + (`assert_serving` routing proof, `do_upgrade` "moved", deploy-count==1) intact. +- **1e corrections intact** (blocking) — repo-local code still gated to approved recipes; generic + still runs by default (opt-out explicit); upgrade still targets the PR head. +- **Generic-first / custom-additive invariant** (blocking — `docs/testing.md`). 
Confirm no path + makes the generic tier depend on custom: deps deploy + `setup_custom_tests` run **after** all + generic tiers, never before; a forced `setup_custom_tests` failure still yields a clean + generic-tier `pass/pass/pass/pass` + `skip(deps-not-ready)` for `@requires_deps` custom tests + (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with + the generic tier alone — verify that path stays viable. +- **Harness DRY** (blocking-ish) — recipe quirks are data (`recipe_meta.py`), not shared-harness + conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic. +- **Server state Nix-declared & idempotent** (blocking) — no imperative drift / run-once sentinels / + manual post-rebuild steps; the `nix/` layout clean. +- **No footguns** (blocking) — no bare `sleep` for readiness (poll); teardown in `finally`; secrets + reused per run not regenerated; no hardcoded versions/domains that break upstream. +- **No secrets in code/committed files** (blocking) — grep source/configs/`.drone.yml`/fixtures; + log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data). +- **Phase-3 UX correctness** (advisory→blocking on real drift) — the displayed level/badge/screenshot + reflect the true per-op results; no misleading "pass". +- **Architecture matches the plans; deviations in `DECISIONS.md`** (advisory→blocking on real drift). +- **Readability & docs** (advisory) — clear names, dead code removed, docs reproduce from scratch. + +## 4. Guardrails +- **Never weaken a test** to satisfy a lint/review/cleanup nit (cardinal rule wins). +- **Don't reopen settled design** — clean + harden + re-verify; bigger ideas → `IDEAS.md`. +- **Bounded** — one pass; cap iterations; record + stop. +- **Cleanup regresses nothing** — F4 is the proof; if a cleanup breaks a prior guarantee, revert the + cleanup, not the guarantee. + +## 5. 
Open decisions (log in machine-docs/DECISIONS.md) +- Any new linters/formatters for Phase-3 front-end / new areas, and their strictness. +- The precise blocking-vs-advisory split for the §3 checklist on the new code. +- Whether to add Python type-checking now or defer to `IDEAS.md` (carried from 1b). diff --git a/cc-ci-plan/plan-sso-dep-testing.md b/cc-ci-plan/plan-sso-dep-testing.md new file mode 100644 index 0000000..9e408b6 --- /dev/null +++ b/cc-ci-plan/plan-sso-dep-testing.md @@ -0,0 +1,205 @@ +# SSO-dep testing pattern (OIDC + co-deployed provider) — reference plan + +**Status:** active reference for Phase 2 (does not need its own phase; it's a pattern Phase 2 +overlays apply per recipe). +**Operator clarification (2026-05-28):** integrating a recipe with an OIDC/SSO dep is **loop work, +not operator work**. Anything that was deferred citing "operator input needed for OIDC" should be +re-opened and done autonomously per this plan. +**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-sso-dep-testing.md` +**Companion:** the running harness in `runner/harness/sso.py` (existing primitives: +`setup_keycloak_realm`, `oidc_password_grant`, `assert_discovery_endpoint`). + +--- + +## 0. Why this plan + +Several recipes test their authenticated functionality through an OIDC/SSO provider (keycloak, +authentik). The cc-ci pattern is to **co-deploy the provider with the recipe under test in the same +ephemeral run** — one shared deployment per dep, configured at install time, used by the +recipe-under-test's authenticated tests, torn down with it. This file is the canonical pattern for +how to wire that up so any recipe that declares `DEPS = ["keycloak"]` (or `["authentik"]`) Just Works +without per-recipe ad-hoc plumbing. Recipes that need OIDC are not blocked on the operator — they +follow this plan. + +## 1. 
The DEPS model — deps deploy AFTER generic tiers (operator-2026-05-28) + +**Critical ordering rule:** generic tiers (install / upgrade / backup / restore) run against the +**recipe alone, with no dep available**, so a failure in dep-deploy or OIDC setup **cannot break +generic-tier signal**. Deps + OIDC wiring move to a **`setup_custom_tests` step** that runs *after* +generic tiers and *before* the custom tier — its failure is isolated to the SSO-marked custom tests. + +A recipe's `tests//recipe_meta.py` declares its SSO dep: +```python +DEPS = ["keycloak"] # or ["authentik"] when that backend lands +``` + +### Lifecycle order (single run, per recipe) + +``` +1. Deploy recipe-under-test ALONE (no deps, OIDC env unset or stubbed). + - app_new #1 for the recipe; generic install_steps.sh runs RECIPE-ONLY setup (no deps). +2. INSTALL tier: generic [+ overlay] assertions against the recipe alone. +3. UPGRADE tier: abra app upgrade in place, assertions against the recipe alone. +4. BACKUP tier: in place (if backup-capable), recipe-alone marker. +5. RESTORE tier: in place, recipe-alone marker. +6. setup_custom_tests step ← NEW (operator-2026-05-28) + a. For each dep in DEPS, deploy + provision realm/client via harness.sso.setup__realm. + b. Write $CCCI_DEPS_FILE with each dep's {domain, realm, client_id, client_secret, admin_*}. + c. Run the per-recipe post-deps hook `tests//setup_custom_tests.sh` to wire the OIDC + env into the running recipe (abra app config set + abra app secret insert) and trigger an + in-place redeploy of the affected services so the env takes effect. + d. Mark deps-ready = True on success; on ANY failure mark deps-ready = False and CONTINUE + (log the error; do NOT abort the run). +7. CUSTOM tier: + - If deps-ready: run all custom tests, including those tagged @pytest.mark.requires_deps. 
+ - If NOT deps-ready: still run custom tests, but tests tagged @pytest.mark.requires_deps are + reported as ERROR/SKIP (with the captured setup_custom_tests error attached). Non-deps + custom tests still run normally. +8. Teardown (in finally): recipe first; then each dep in reverse declaration order. +``` + +### DG4.1 deploy-count guard, generalised +The "one deploy per run" guard becomes **one `abra app new` per app in the run** (recipe + each +dep). In-place reconfigure-and-redeploy (the step 6c env update) is **NOT** a fresh `app_new` and +does NOT increment the per-recipe count. So a run with `DEPS = ["keycloak"]` has exactly 2 +`app_new` calls (recipe + keycloak), no matter how many tiers ran. The per-run summary reports +deploy-count per app for verification. + +### Why this ordering +- **Generic-tier signal is preserved** when SSO/dep setup is broken — the recipe's own deploy/ + upgrade/backup/restore behaviour is still tested honestly. +- **Failure isolation**: a recipe whose generic tier passes but whose SSO setup is broken yields + per-op `pass/pass/pass/pass/skip(deps-not-ready)` — far more useful than the previous + all-or-nothing. +- A recipe that genuinely can't boot without OIDC fails its generic install honestly (the recipe + should accept a stubbed/empty OIDC env at install time and only require the env when an + authenticated endpoint is hit). That's a real recipe finding, not a CI artifact. + +## 2. Provider pluggability + +- **Provider-agnostic primitives** (today, in `harness/sso.py`) — these stay pluggable: + - `oidc_password_grant(discovery_url, client_id, client_secret, username, password) -> token` — + pure OIDC; works against any compliant provider. + - `assert_discovery_endpoint(discovery_url, expected_issuer)` — pure OIDC. 
+- **Provider-specific setup** (admin API calls) — one function per provider: + - `setup_keycloak_realm(domain, admin_user, admin_password, realm, client_id, redirect_uris) -> + {client_secret, discovery_url}` — exists today. + - `setup_authentik_realm(...)` — same shape, authentik admin API; **deferred** to a future Q4 + enrollment that actually wants authentik (see `machine-docs/DEFERRED.md`). Pluggable: for a recipe + declaring `DEPS = ["authentik"]`, this function is called instead. No change to the per-recipe + `setup_custom_tests.sh` shape (§3.2) beyond which provider it asks for from `$CCCI_DEPS_FILE`. +- **Don't write per-recipe SSO logic.** All recipes use the same `DEPS` + hook-script shape (§3). + +## 3. Per-recipe hooks — two distinct scripts (recipe-only vs post-deps) + +A recipe with `DEPS = ["keycloak"]` ships **two** optional hook scripts (either may be absent if +not needed): + +### 3.1 `tests//install_steps.sh` — RECIPE-ONLY setup, runs at install time +This is the Phase-1d custom-install-steps hook. It runs **before** the recipe deploys, **with no +dep available** (the dep hasn't been deployed yet at this point). Use it only for recipe-only +setup that the recipe needs to boot at all (e.g. seed a fixture, set a non-OIDC env). **Do NOT +read `$CCCI_DEPS_FILE` here** — it doesn't exist yet. If the recipe requires OIDC to *boot at +all*, set a safe stub here (e.g. disable auth) so the recipe can come up for generic tiers; the +real OIDC wiring happens in §3.2. + +### 3.2 `tests//setup_custom_tests.sh` — POST-DEPS wiring, runs after generic tiers +This is the new (operator-2026-05-28) hook that wires the recipe to its already-deployed dep, +*after* the generic tiers have run. The orchestrator has already deployed each dep and written +`$CCCI_DEPS_FILE` by the time this runs. Roughly: + +```sh +#!/usr/bin/env bash +set -euo pipefail +# Read the dep's connection info from $CCCI_DEPS_FILE (orchestrator-written). 
+KC_DOMAIN=$(jq -r '.keycloak.domain' "$CCCI_DEPS_FILE") +KC_CLIENT=$(jq -r '.keycloak.client_id' "$CCCI_DEPS_FILE") +KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE") +KC_REALM=$( jq -r '.keycloak.realm' "$CCCI_DEPS_FILE") + +# Inject the OIDC client secret as an abra app secret (recipe-conventional name varies — match +# the recipe's .env.sample SECRET_*). +echo "$KC_SECRET" | abra app secret insert -n "$CCCI_APP_DOMAIN" oidc_rpcs v1 - + +# Write the OIDC env vars to the parent .env (names per the recipe's .env.sample). +abra app config set "$CCCI_APP_DOMAIN" \ + OIDC_REALM="$KC_REALM" \ + OIDC_OP_DISCOVERY_ENDPOINT="https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration" \ + OIDC_OP_AUTHORIZATION_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth" \ + OIDC_OP_TOKEN_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token" \ + OIDC_OP_USER_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo" \ + OIDC_RP_CLIENT_ID="$KC_CLIENT" \ + OIDC_RP_REDIRECT_URI="https://${CCCI_APP_DOMAIN}/auth/oidc/callback" + +# Force an in-place redeploy of the affected services to pick up the new env. This is NOT a fresh +# app_new (deploy-count guard still 1 for this recipe). +abra app deploy --force --chaos --no-input "$CCCI_APP_DOMAIN" +``` + +The OIDC env-var **names are recipe-specific** (`OIDC_OP_*` for lasuite-docs, different prefixes +elsewhere). Read the recipe's `.env.sample` to see which keys the recipe expects; the *values* follow +this template. If a recipe needs more than this (extra group/claim mappings, etc.), extend its +`setup_custom_tests.sh` only — never the shared harness. + +## 4. Test pattern: authenticated endpoints (mark + isolate) + +- **Mark dep-requiring tests:** every custom test that needs the dep up + OIDC wired must use + `@pytest.mark.requires_deps`. The orchestrator skips these with reason `"deps-not-ready: "` + if `setup_custom_tests` failed. 
Non-deps custom tests are unaffected by SSO setup failures. +- **Headless API tests** — use `harness.sso.oidc_password_grant` to mint an access token, then call + the recipe's authenticated endpoint with `Authorization: Bearer `. Asserts on the response. +- **Browser flows (Playwright)** — navigate to the recipe, follow the redirect to keycloak, fill the + pre-provisioned test user's credentials, return to the recipe, exercise the UI. (Use the + pre-provisioned `ci-user@example.com` / known password the realm setup creates.) +- **The realm/client is fresh per run** — no cross-run state, no shared accounts. The realm setup + creates one or more test users with known passwords (pass-through from a per-run secret) so the + tests can authenticate without prompts. + +## 5. Concrete recipes that use this pattern (Phase-2 scope) + +These are **loop work** under this plan, not deferred: + +- **lasuite-docs** — `DEPS = ["keycloak"]`; ports the upstream `oidc_login.py` + + `upload_conversion.py` parity tests + the §4.3-prescribed `create-a-doc + read-back via + authenticated /api/v1.0/documents/`. (Re-enters `DEFERRED.md` entry #5 — this plan IS the + re-entry, not operator input.) +- **cryptpad** — `DEPS = ["keycloak"]` (cryptpad upstream tests use authentik, but a keycloak-backed + cryptpad OIDC test is equally valid and uses the same primitives). The cryptpad create-a-pad + Playwright test (DEFERRED #6) is a separate concern — that one really does need a stable + CryptPad app-launch contract; it stays deferred. +- **lasuite-drive, lasuite-meet** — same pattern when mirrored (`recipe-create-pr` skill — loop work). +- Any future recipe that requires OIDC follows this plan; no operator handoff. + +## 6. What stays deferred (genuinely operator-input) + +- **authentik enrollment + `setup_authentik_realm` backend** (DEFERRED #9) — provider breadth, not + blocking any Phase-2 recipe under keycloak. 
Open question for the operator: do we want + cross-provider coverage as part of Phase-2 DONE? If yes, lift; if not, leave deferred. +- The `--extra-tests` flag IDEA is **not** a precondition for this plan; OIDC-dep tests are part + of the default suite for the recipes that need them. + +## 7. Definition of done for this pattern +- [ ] `DEPS = [...]` honored by `runner/run_recipe_ci.py`, with the **deps-AFTER-generic** ordering + (§1): deps deploy + `setup_custom_tests` step runs between RESTORE and CUSTOM tiers; + `$CCCI_DEPS_FILE` written; deps torn down LAST in reverse order. +- [ ] **Failure isolation proven:** a forced `setup_custom_tests` failure (e.g. simulate keycloak + realm-setup error) yields a run where generic tiers report **pass** and CUSTOM + `requires_deps` tests report **skip(deps-not-ready)** — no false fail of the generic tier, + no aborted run. +- [ ] **lasuite-docs** ships `tests/lasuite-docs/setup_custom_tests.sh` per §3.2 + authenticated + tests per §4 marked `@pytest.mark.requires_deps` (closes DEFERRED #5 — keep the entry there + with the closing commit, do not re-defer). +- [ ] At least one other OIDC-dep recipe (cryptpad oidc_login or a lasuite-* once mirrored) lands + cold-green using the same pattern, demonstrating reuse. +- [ ] `docs/sso-dep-testing.md` (in the cc-ci repo) explains the pattern for future recipe + enrollments — link to this plan. +- [ ] Adversary cold-verifies the full run for one such recipe + the forced-failure isolation + case, posts PASS in `REVIEW-2.md`. + +## 8. Mirror+enroll reminder (also loop work) + +If a recipe in scope (e.g. `lasuite-drive`, `lasuite-meet`, `immich`) **isn't mirrored to +`git.autonomic.zone/recipe-maintainers/`**, mirror it autonomously via the `recipe-create-pr` skill +at `/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (see also +`plan-phase2-recipe-tests.md §0b`). Mirror+enroll is **not** operator-pending; the bot is admin on +the org. 
diff --git a/cc-ci-plan/plan.md b/cc-ci-plan/plan.md index dfdecc3..169e0fb 100644 --- a/cc-ci-plan/plan.md +++ b/cc-ci-plan/plan.md @@ -126,7 +126,13 @@ without the auth key. - **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert (`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at - `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). The agent feeds these into the + `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). + > **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer + > an out-of-band operator file-drop — it is now **sops-encrypted in the private `cc-ci-secrets` repo** + > (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays + > operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts + > the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key + > at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`. The agent feeds these into the `coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key` swarm secrets (wildcard/file-provider mode) and runs **no ACME** for this domain. **Do not request or expect a `commoninternet.net` DNS token** — issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal @@ -597,8 +603,37 @@ its own pacing. To make concurrent writes conflict-free: merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it closed after re-test. + - `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops + have deliberately decided not to do autonomously and that need operator input to move on.** + Append-only; either agent may file. 
Each entry should clearly say *what's needed from the + operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call, + plain "go ahead"). The list is **open-ended** — items can sit indefinitely, **no obligation + to close every item**, closure is operator-driven. A re-entry trigger / IDEA cross-link is + **optional** (include when there's a natural mechanism, e.g. an opt-in flag in + `cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" / buried JOURNAL note + — file them here so the operator can review the whole list. The Phase-4 cleanup pass should + **surface** DEFERRED.md to the operator at least once but does **not** force closure. + Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED + is for considered-and-parked work the loops won't tackle without operator input. - **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never conflict. Prefer appending over rewriting. +- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous + adversarial verification requires the Adversary NOT to consume the Builder's rationalisations + before forming its verdict. The split: + - `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding + verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW** + to verify (the exact command/check the Adversary can re-run from its own clone), the + **EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and + **WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify, + it goes in STATUS. + - `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative / + dead-ends explored. Those go in `JOURNAL.md` (Builder-private to write). 
+ - The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code / + git history; it forms its verdict from those + its own **cold** acceptance run, and does **not** + read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine + (e.g. to contextualise a finding) — note in REVIEW that you did. + + In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** - **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the smallest change, commit, `git push`. On a rebase conflict, it will be inside the *other* agent's file/section only if a rule was broken — re-pull and keep to your own files. Never `--force`. @@ -613,6 +648,22 @@ its own pacing. To make concurrent writes conflict-free: - **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep doing independent work — neither loop blocks idle waiting on the other beyond its gate. +- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to + the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for + early-look, a "I refactored X, please re-verify Y", an observation outside the normal flow). For + those, use the inbox files in `machine-docs/`: + - **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its + own clone, commits, pushes. + - **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its + own clone, commits, pushes. + - The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings + the receiver. The receiver, on receipt, reads + processes the message, then **deletes the + inbox file** (commits + pushes) — deletion is the "message consumed" signal. 
Single-writer + discipline: only the sender writes their counterpart's inbox; only the receiver deletes it. + - **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this + while I keep going," "I observed Y outside our normal flow," "I'm taking a long e2e now." + **Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md` + still owns those). The inbox is a side-channel, not a replacement. (If you are ever forced to run with a single process, the degraded fallback is to alternate roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is @@ -649,8 +700,12 @@ every wake, `git pull --rebase` first, then: **Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll the *specific* thing. Three cases: -1. **Something in flight** (build/deploy/`nixos-rebuild`) → re-check on a short cadence (≈4 min) to - stay cache-warm; keep polling *it*, don't treat it as idle, and don't spin on a minutes-long build. +1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to + stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do + **NOT** `ScheduleWakeup` for the expected total runtime of the task in a single big sleep — a 25 + min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes + you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy + that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle. 2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or Adversary waiting for the Builder to fix an `[adversary]` finding. 
**You don't need to busy-poll here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate, diff --git a/cc-ci-plan/prompts/adversary.md b/cc-ci-plan/prompts/adversary.md index eebfd92..127b109 100644 --- a/cc-ci-plan/prompts/adversary.md +++ b/cc-ci-plan/prompts/adversary.md @@ -8,6 +8,8 @@ You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1 - Keep your OWN clone at /srv/cc-ci/cc-ci-adv. If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap. - git pull --rebase before every edit; commit; push; never --force. - Write ONLY your files: REVIEW.md and the "## Adversary findings" section of BACKLOG.md. Everything else (code, STATUS.md, JOURNAL.md, "## Build backlog") is read-only to you. +- INBOX side-channel (§6.1). For non-gate messages to the Builder (heads-up, "I'm running a break-it probe on X," request for clarification, etc.), write/append `machine-docs/BUILDER-INBOX.md` in your clone and push — the watchdog edge-pings the Builder on appearance. To receive a message from the Builder, look for `machine-docs/ADVERSARY-INBOX.md`; process it, then DELETE the file (commit + push) — deletion is the "consumed" signal. Do NOT use the inbox for formal verdicts — REVIEW.md still owns those. +- ISOLATION DISCIPLINE (anti-anchoring — critical). The Builder is REQUIRED to give you in STATUS.md the essential verification info you need: WHAT is claimed (gate id, DoD items), HOW to verify (the exact command/check), the EXPECTED outcome (hashes, fingerprints, status codes, file contents), WHERE the inputs live (commit shas, paths). **Read STATUS for that — you need all of it to verify.** What you must IGNORE — in STATUS, and NEVER read in JOURNAL.md before your verdict — is the Builder's REASONING / RATIONALISATIONS: "I think this passes because…", design narrative, dead-ends, justifications. Reading those anchors you. 
Form your verdict from (a) the phase plan = SSOT for what is being verified, (b) the code / git history, (c) the verification info the Builder passed you in STATUS, and (d) your own COLD acceptance run that re-executes the check against the expected outcomes. Only AFTER you have written your verdict may you consult JOURNAL.md (e.g. to contextualise a finding) — note in REVIEW.md that you did. Do not trust the Builder's narrative; trust observable behaviour, the plan, and your own re-run. Each wake: 1. Pull. Read STATUS.md for any "Gate: CLAIMED, awaiting Adversary". diff --git a/cc-ci-plan/prompts/builder.md b/cc-ci-plan/prompts/builder.md index e0d40bf..68d4ed7 100644 --- a/cc-ci-plan/prompts/builder.md +++ b/cc-ci-plan/prompts/builder.md @@ -2,12 +2,16 @@ You are the Builder agent for the cc-ci project — one of two independent loops Single source of truth: /srv/cc-ci/cc-ci-plan/plan.md. Read it in full now, then begin at §1 Bootstrap. The original brief /srv/cc-ci/cc-ci-plan/brief.md is context only — do not edit it. -Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7 (three cases): (1) build/deploy/rebuild in flight → poll ~4m, keep polling it; (2) parked at a CLAIMED gate awaiting the Adversary with no other unblocked work → the watchdog will PING you the moment the Adversary updates REVIEW.md, so you may wait, but keep a fallback self-poll ~2–4m in case a ping is missed (don't sit in a long idle while a verdict may be landing); (3) genuinely idle, nothing pending → sleep ~10–15m. Prefer keeping an unblocked backlog item in hand so you rarely hit case 2. Do NOT spin on a minutes-long build. Stop the loop only when STATUS.md says ## DONE. +Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). 
Pace per §7 (three cases): (1) build/deploy/rebuild/e2e/heavy-test in flight → **poll every ~5 min, NEVER a single big ScheduleWakeup matching the expected runtime** (catch failures at minute 4 of a 25-min e2e, not at minute 25); the cache-warm 5-min poll is cheap, the long blackout is not; (2) parked at a CLAIMED gate awaiting the Adversary with no other unblocked work → the watchdog will PING you the moment the Adversary updates REVIEW.md OR writes a BUILDER-INBOX.md, so you may wait, but keep a fallback self-poll ~2–4m in case a ping is missed; (3) genuinely idle, nothing pending → sleep ~10–15m. Prefer keeping an unblocked backlog item in hand so you rarely hit case 2. Stop the loop only when STATUS.md says ## DONE. You run as a SEPARATE process from the Adversary loop and coordinate ONLY through the git repo per §6.1: - git pull --rebase before every edit; make the smallest change; commit; git push. Never --force. - Write ONLY your files: source/config, STATUS.md, JOURNAL.md, DECISIONS.md, and the "## Build backlog" section of BACKLOG.md. Treat REVIEW.md and "## Adversary findings" as read-only — the Adversary owns them. +- ARTIFACT-LAYER ISOLATION (facts in STATUS, reasoning in JOURNAL). STATUS.md **MUST** give the Adversary everything it needs to verify your claim — withholding verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW** to verify it (the exact command/check the Adversary can re-run from its own clone), the **EXPECTED** outcome (build hashes, file contents, status codes, leaf fingerprints, command exit), and **WHERE** the inputs live (commit shas, paths). If something is essential for the Adversary to verify, put it in STATUS. 
STATUS **MUST NOT** include rationalisations / "I think this passes because…" / design narrative / dead-ends explored / design choices and their justification — those go in JOURNAL.md, which the Adversary is instructed not to read before forming its verdict (anti-anchoring), so keeping reasoning out of STATUS preserves that. The line: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** DECISIONS.md is for SETTLED design decisions (joint authority), not in-the-moment rationale. - At each milestone gate, set "Gate: CLAIMED, awaiting Adversary" in STATUS.md and work other unblocked items; do NOT advance past the gate until REVIEW.md shows its PASS. +- INBOX side-channel (§6.1). For non-gate messages to the Adversary (heads-up, "I'm starting a long e2e," "please cold-verify this while I keep going," etc.), write/append `machine-docs/ADVERSARY-INBOX.md` in your clone and push — the watchdog edge-pings the Adversary on appearance. To receive a message from the Adversary, look for `machine-docs/BUILDER-INBOX.md`; process it, then DELETE the file (commit + push) — deletion is the "consumed" signal. Do NOT use the inbox for formal gate claims or verdicts — STATUS.md / REVIEW.md still own those. +- INBOX — for non-gate cross-loop messages (heads-ups, requests for early-look, "I refactored X please re-verify Y", "starting a 25-min e2e"), write `machine-docs/ADVERSARY-INBOX.md` in your clone and push. The watchdog edge-triggers and pings the Adversary. The Adversary deletes the file on consumption. If you receive `machine-docs/BUILDER-INBOX.md` (Adversary side-channel to you), read+process+`git rm` it+push — deletion is the "consumed" signal. Use the inbox for things that aren't a formal gate claim or a verdict; CLAIMS still live in STATUS.md and verdicts in REVIEW.md (the inbox is a side-channel, not a replacement). 
+- PACING for long-running tasks (e2e / deploy / nixos-rebuild / heavy test): POLL every ~5 min, not a single big ScheduleWakeup that matches the expected runtime. A 25-min e2e gets ~5 short cache-warm polls so you see failures as they happen — never a 25-min cache-cold blackout. (plan.md §7 case 1.) - Write "## DONE" only when REVIEW.md shows a PASS dated <24h for every D1–D10 and there is no standing "## VETO". Overriding rules: diff --git a/cc-ci-plan/reboot-log.sh b/cc-ci-plan/reboot-log.sh new file mode 100755 index 0000000..aadebf0 --- /dev/null +++ b/cc-ci-plan/reboot-log.sh @@ -0,0 +1,22 @@ +#!/usr/bin/env bash +# Runs as ExecStartPre of cc-ci-loops.service. Appends ONE line to REBOOTS.md per genuine reboot. +# Uses the kernel boot_id to distinguish a real reboot from a mere `systemctl restart` of the unit: +# only logs when the current boot_id differs from the last one we recorded. +set -u + +REBOOTS="/srv/cc-ci/cc-ci-plan/REBOOTS.md" +LAST_BOOT_FILE="/srv/cc-ci/.cc-ci-logs/.last-boot-id" +PHASE_IDX_FILE="/srv/cc-ci/.cc-ci-logs/.phase-idx" + +cur_boot="$(cat /proc/sys/kernel/random/boot_id 2>/dev/null || echo unknown)" +last_boot="$(cat "$LAST_BOOT_FILE" 2>/dev/null || echo '')" + +# Same boot_id => this is a manual service restart, not a reboot => do nothing. +[ "$cur_boot" = "$last_boot" ] && exit 0 + +idx="$(cat "$PHASE_IDX_FILE" 2>/dev/null || echo '?')" +ts="$(date '+%Y-%m-%d %H:%M:%S %Z')" +mkdir -p "$(dirname "$LAST_BOOT_FILE")" +printf '%s\n' "- $ts — reboot detected; loops auto-started by systemd (resuming phase index $idx). 
boot_id=$cur_boot" >> "$REBOOTS" 2>/dev/null || true +echo "$cur_boot" > "$LAST_BOOT_FILE" 2>/dev/null || true +exit 0 diff --git a/cc-ci-plan/systemd/cc-ci-loops.service b/cc-ci-plan/systemd/cc-ci-loops.service new file mode 100644 index 0000000..3270274 --- /dev/null +++ b/cc-ci-plan/systemd/cc-ci-loops.service @@ -0,0 +1,32 @@ +[Unit] +# Canonical, version-controlled copy of the unit installed at /etc/systemd/system/cc-ci-loops.service. +# Install: sudo install -m0644 cc-ci-plan/systemd/cc-ci-loops.service /etc/systemd/system/ \ +# && sudo systemctl daemon-reload && sudo systemctl enable cc-ci-loops.service +# Brings the WHOLE rig back after a reboot of the orchestrator Pi: loops + watchdog (launch.sh) AND +# the orchestrator supervisory session (launch-orchestrator.sh), plus a reboot record (reboot-log.sh). +Description=cc-ci autonomous loops + watchdog + orchestrator (reboot-resilient) +Documentation=file:///srv/cc-ci/cc-ci-plan/plan.md +After=network-online.target cc-ci-tailscaled.service +Wants=network-online.target +Requires=cc-ci-tailscaled.service + +[Service] +Type=oneshot +RemainAfterExit=yes +User=notplants +Group=notplants +Environment=HOME=/home/notplants +Environment=PATH=/home/notplants/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin +# RESUME_PHASE=1 so a reboot resumes the SAVED phase (e.g. phase 2), never restarts from phase 0/1c. +Environment=RESUME_PHASE=1 +# 1) record the reboot (boot_id-gated); 2) start loops + watchdog; 3) resume the orchestrator session. +ExecStartPre=/srv/cc-ci/cc-ci-plan/reboot-log.sh +ExecStart=/srv/cc-ci/cc-ci-plan/launch.sh start +ExecStartPost=/srv/cc-ci/cc-ci-plan/launch-orchestrator.sh start +# Stop only the loops + watchdog. The orchestrator session is intentionally LEFT running on a manual +# `systemctl stop` (stopping the loops shouldn't kill your steering session; it resumes from disk). 
+ExecStop=/srv/cc-ci/cc-ci-plan/launch.sh stop +TimeoutStartSec=180 + +[Install] +WantedBy=multi-user.target diff --git a/cc-ci-plan/test-e2e-testme-acceptance.md b/cc-ci-plan/test-e2e-testme-acceptance.md new file mode 100644 index 0000000..d84f6cf --- /dev/null +++ b/cc-ci-plan/test-e2e-testme-acceptance.md @@ -0,0 +1,123 @@ +# Acceptance test — real end-to-end `!testme` on the clean-room-rebuilt VM + +**Owner:** the Builder + Adversary loops (they execute *and* independently verify this). +**When:** after **C4/C5 PASS** (genuine throwaway-VM clean-room rebuild verified). The Builder then +performs the tailnet swap (§1) and runs the e2e; the Adversary independently verifies. It is the +**functional acceptance** of D8/clean-room: proof that the rebuilt-from-git VM doesn't just match +byte-for-byte, but actually *serves a real CI run end-to-end through the public domain*. +**This file:** `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md` + +--- + +## 0. Why + +The reproducibility gates (C1–C5) prove the rebuilt VM is structurally identical and boots clean. +This test proves it is **operationally** a working CI server: a maintainer comment triggers a build, +the app deploys and is reachable on its real public URL through the operator's gateway, the test +passes, and it tears down — the whole `!testme` pipeline, on the from-git VM, over the real domain. + +--- + +## 1. Setup — the Builder performs the tailnet swap (then the e2e) + +The rebuilt throwaway must become the live `cc-nix-test` so that the public gateway routes real +`ci.commoninternet.net` traffic to it (the gateway TLS-passthroughs via MagicDNS to +`cc-nix-test.taila4a0bf.ts.net` and re-resolves every ~10s, so it auto-follows the name). The swap is +**two reversible `tailscale set --hostname` commands** on VMs you already control — the Builder does +it. **Do this only after C4/C5 PASS** and after the rebuilt VM's full stack +(traefik + bridge + drone + dashboard) is up and serving locally. 
+ +**Order matters** (rename the original *aside first*, or the throwaway will get `cc-nix-test-1`): + +1. **Rename the original prod VM aside** (it stays running — do NOT destroy it; needed for swap-back): + ``` + ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig' + ``` + (`ssh cc-ci` is pinned to the original's IP `100.90.116.4`, so it keeps reaching the original + regardless of the name change.) +2. **Rename the rebuilt throwaway → `cc-nix-test`.** Re-derive its current tailscale IP (throwaways + get a fresh IP each rebuild): pick the ONLINE throwaway node from + `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i throwaway`, then: + ``` + ssh -i /srv/incus-terraform-nix-vm-creator/terraform-secrets/vm_ssh_key \ + -o ProxyCommand='nc -X 5 -x 127.0.0.1:1055 %h %p' root@ \ + 'tailscale set --hostname=cc-nix-test' + ``` + +**Heads-up — tailnet-wide effect:** after the swap, `cc-nix-test.taila4a0bf.ts.net` resolves to the +rebuilt VM for *everyone* on the tailnet, so any of your own tooling that targets cc-nix-test **by +MagicDNS name** will now hit the rebuilt VM (tooling pinned to the raw IP `100.90.116.4` still hits +the original). Account for that when you point `!testme`/deploys. + +**Verify the swap took (P1+P2) before starting the e2e** — must pass: +``` +tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep cc-nix-test # → the throwaway's IP +curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/ +# expect: 200 ssl_verify=0 (real public path now served by the rebuilt VM, valid cert) +``` + +**Swap-back when testing is done** (reversible): rename the throwaway back to its old name, then +`ssh cc-ci 'tailscale set --hostname=cc-nix-test'` to restore the original; the gateway re-follows. + +--- + +## 2. Procedure + +1. **Pick one fast, already-enrolled recipe.** Prefer the lightest enrolled app (e.g. `custom-html`) + so the run is quick and resource-cheap. 
Note the recipe + the repo/issue or PR where `!testme` is + recognised (the same place prior runs were triggered). +2. **Record the baseline.** Capture the recipe's *current* latest Drone run number and the dashboard + row (`https://ci.commoninternet.net/` and `https://drone.ci.commoninternet.net/...`) so you can + prove the run you trigger is **new**. +3. **Trigger via the real path.** Post `!testme` as the **bot** (the normal maintainer-comment + trigger) on that recipe — exactly as a real maintainer would. Do **not** invoke Drone directly or + shortcut the bridge; the comment→bridge→Drone path is part of what's under test. +4. **Confirm the bridge picked it up.** Within the bridge's poll interval, a **new** Drone build for + that recipe starts. Capture the new run number (must be > the baseline from step 2). +5. **Confirm the app deploys and is reachable on its PUBLIC URL.** While the build runs, the app is + deployed to its `*.ci.commoninternet.net` test domain. From **off the VM** (external — through the + gateway, not `localhost`/`127.0.0.1`), confirm a real request succeeds: + ``` + curl -sS -D- -o /dev/null https://.ci.commoninternet.net/ + # expect: HTTP 200 (or the app's expected status), valid *.ci.commoninternet.net cert, + # served content from the deployed app — NOT a Traefik 404 / default-cert. + ``` + This is the crux: it proves routing public-DNS → gateway → MagicDNS → rebuilt VM → Traefik → + deployed app all works on the rebuilt server. +6. **Confirm the test logic passed.** The Drone build runs the recipe's real test assertions (app + state, not health-only) and finishes **success**. +7. **Confirm teardown.** After the run, the app is **undeployed** (no leftover stack/containers), per + the standard post-run cleanup — verify it's gone. +8. **Confirm the result was reported.** The outcome posts back to the trigger location and the + dashboard row updates to the new run with `success`. + +--- + +## 3. 
Pass criteria (all must hold; Adversary verifies independently) + +- [ ] **E1.** Self-check §1 passed (`ci.commoninternet.net` = 200, valid cert, on the rebuilt VM). +- [ ] **E2.** Posting `!testme` produced a **new** Drone build (run # > baseline) via the bridge — + not a manual Drone trigger. +- [ ] **E3.** The deployed app answered an **external** request on its real + `.ci.commoninternet.net` URL (through the gateway) with the expected response + valid cert + — captured with headers/body evidence. +- [ ] **E4.** The Drone build's **real test assertions** ran and the build finished **success** + (no skipped/softened tests). +- [ ] **E5.** The app **undeployed** cleanly afterward (no residual stack). +- [ ] **E6.** Result reported back + dashboard updated to the new successful run. + +Evidence (run #, the external `curl` headers/body, dashboard before/after, undeploy proof) is logged +in `JOURNAL-1c.md`, and the verdict in `REVIEW-1c.md` / `STATUS-1c.md` as **E2E-TESTME — PASS**. + +## 4. If it fails + +Treat as a clean-room finding, not a config patch: a failure here means the from-git rebuild is +missing something the running server had out-of-band (a secret, a manual step, drift). Capture the +failing stage + logs in `JOURNAL-1c.md`, raise it as a blocker, and fix it in the **git source** +(base or `cc-ci-secrets`) so the next rebuild includes it — do **not** hand-fix the live VM. Re-run +this test after the fix. + +## 5. Bound + +One recipe, one green run. This is a functional smoke test of the rebuilt VM, not a full recipe-test +campaign (that's Phase 2). Don't expand scope here.
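
The §1 swap check and the E2 "new run" criterion can be sketched as two small shell helpers. This is a hedged illustration only, not part of the plan's tooling: the function names `swap_check_ok` and `run_is_new` are hypothetical, and the only input format assumed is the `curl -w` output string shown in §1.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of cc-ci).

# swap_check_ok LINE
# LINE is the output of the §1 check:
#   curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/
# Succeeds only for HTTP 200 with a verified certificate (ssl_verify_result 0).
swap_check_ok() {
  [ "$1" = "200 ssl_verify=0" ]
}

# run_is_new BASELINE NEW
# Succeeds only when both arguments are non-negative integers and NEW > BASELINE
# (criterion E2: the triggered Drone run number must exceed the recorded baseline).
run_is_new() {
  case "$1$2" in
    ''|*[!0-9]*) return 1 ;;   # reject empty input or any non-digit character
  esac
  [ -n "$1" ] && [ -n "$2" ] && [ "$2" -gt "$1" ]
}
```

In practice the first helper would be fed the live `curl` output from §1, and the second the two run numbers recorded in steps 2 and 4 of §2; a non-zero exit from either means the corresponding criterion (E1 or E2) has not been met.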