orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot records the reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed orchestrator announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review), sso-dep-testing, orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution, limit-stall re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-28 20:28:10 +01:00
parent 5681438b0f
commit 36a6c9872a
20 changed files with 1395 additions and 19 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -10,3 +10,9 @@
 /cc-ci-adv/
 /.cc-ci-watch/
 /.cc-ci-logs/
+
+# More secrets / local state, NEVER commit: age recovery key, sops-secrets repo clone, claude session state, editor temp files
+/.sops/
+/cc-ci-secrets/
+/.claude/
+*.tmp.*
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -16,6 +16,25 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t
 The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
 watches from outside.

+## On startup: announce yourself + report reboots
+
+**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
+the operator wants to know the supervising session is back (especially after a reboot, which kills
+this session along with the Pi). Include the current phase and the reboot count. Steps on startup:
+1. Read `cc-ci-plan/REBOOTS.md` (count the entries under `## Reboots`) and run
+   `cc-ci-plan/launch.sh status` (current phase + whether the loops/watchdog are running).
+2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
+   running; N reboots logged (last <date>)."*
+3. If a reboot happened while you were away (a new line in REBOOTS.md since you last looked, or the
+   loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
+   `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+
+Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
+to `REBOOTS.md` (boot_id-gated) and runs `launch.sh start` with `RESUME_PHASE=1`, so the loops +
+watchdog auto-resume the saved phase. The orchestrator session itself is NOT auto-started — the
+operator reconnects to it (that's why the startup notification matters). The fuller "move the
+orchestrator onto its own VM" plan is parked at `cc-ci-plan/plan-orchestrator-migration.md`.
+
 ## Keep the orchestrator open, under remote-control

 Run this session as a long-lived **interactive** session with `--remote-control` so the operator can
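The startup routine added to AGENTS.md asks the orchestrator to count the entries under `## Reboots` in `REBOOTS.md`. A minimal sketch of that count, assuming each entry starts with `- ` in column 0 and continuation lines are indented (as in the backfilled entries); `count_reboots` is a hypothetical helper name, not something in the repo:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): count top-level "- " entries
# that appear after the "## Reboots" heading of the file given as $1.
count_reboots() {
  awk '
    /^## Reboots/ { in_section = 1; next }   # start counting after the heading
    /^## /        { in_section = 0 }         # any later H2 heading ends the section
    in_section && /^- / { n++ }              # one entry per top-level bullet
    END { print n + 0 }                      # prints 0 when there are no entries
  ' "$1"
}
```

Usage would be along the lines of `count_reboots cc-ci-plan/REBOOTS.md`, with the result substituted for `N` in the notification text.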
--- a/cc-ci-plan/IDEAS.md
+++ b/cc-ci-plan/IDEAS.md
@@ -4,6 +4,22 @@ Post-DONE or "revisit later" ideas that are intentionally **out of scope** for t
 (§2 Definition of Done). Not active work — parked here so they aren't lost. The loops may pull an
 item into the project `BACKLOG.md` as `[idea]` if/when it becomes relevant.

+- [ ] **Optional `--extra-tests` flag for heavy / operational tests (opt-in heavy suite).**
+  Some recipe tests are "more than needed" for the default CI signal — state-management /
+  long-running-instance / load / helper-script operational tests that don't fit the ephemeral
+  per-run-deploy model cheaply but are useful occasionally. Today they're deferred to
+  `cc-ci/machine-docs/DEFERRED.md` (e.g. matrix-synapse `compress_state.sh`,
+  `test_complexity_limit.sh`, `test_purge.sh`) and don't run.
+  *Idea:* add an **opt-in `--extra-tests` flag** (e.g. `!testme --extra-tests` on a PR comment, or
+  a `STAGES=extra` / `EXTRA_TESTS=1` Drone build parameter) that the orchestrator passes through;
+  recipes declare an `extra/` test dir or mark tests with `@pytest.mark.extra`; on opt-in the
+  orchestrator runs them **alongside** the default tiers (still one deploy, still teardown). Default
+  off so default CI stays fast; the operator can ask for the heavy suite when reviewing a PR that
+  touches an extra-covered area (e.g. matrix-synapse's abra helpers). When implemented, each
+  matching DEFERRED entry can be CLOSED by porting its test into the recipe's `extra/` and noting
+  the commit in DEFERRED.md. *Why deferred for now:* default coverage is sufficient; this is a
+  later breadth/depth knob, not a critical-path feature. *Added:* 2026-05-28.
+
 - [ ] **Optional webhook self-registration (admin-access environments).**
  We deliberately made **polling the primary trigger** and require the CI server/bot to run on
  **read-level** access only — so the server does **not** auto-register Gitea webhooks (that needs
--- a/cc-ci-plan/README.md
+++ b/cc-ci-plan/README.md
@@ -17,7 +17,9 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day
 | `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
 | `plan-phase1c-full-reproducibility.md` | **Phase 1c** (runs first): make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). |
 | `plan-phase1b-review-lint.md` | **Phase 1b** (after 1c): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets), ending in a full cold re-verification of all D1–D10 — now covering 1c's refactor. |
-| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1b): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. |
+| `plan-phase1d-generic-test-suite.md` | **Phase 1d** (after 1b, before 2): a **generic install/upgrade/backup/restore** suite that runs on *any* recipe with zero config, with a recipe's own `test_<op>.py` **overriding or extending** the generic (Builder's call) and **reusing the generic's deployment — no redeploy**, plus optional custom install-steps; recipes needing special setup fail the generic form gracefully. The test-architecture foundation Phase 2 builds on. |
+| `plan-phase1e-harness-corrections.md` | **Phase 1e** (after 1d, before 2): three operator-review corrections to the shared generic harness — (HC1) upgrade goes previous-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved recipes** (default = cc-ci overlays + generic only); (HC3) the **generic runs by default** alongside an overlay, skipped only via explicit opt-out. |
+| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1e): build on the corrected generic suite — author the recipe overlays (port recipe-maintainer tests as `test_*.py`) + define custom install steps where a recipe fails generically. |
 | `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. |
 | `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
 | `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
--- a/cc-ci-plan/REBOOTS.md
+++ b/cc-ci-plan/REBOOTS.md
@@ -0,0 +1,15 @@
+# Reboot log — cc-ci orchestrator Pi
+
+One line per genuine reboot of the orchestrator Pi (`raspberrypi`), appended automatically by
+`reboot-log.sh` (ExecStartPre of `cc-ci-loops.service`, boot_id-gated so manual service restarts are
+NOT counted). The Pi hosts the Builder + Adversary loops + watchdog; a reboot drops the tmux sessions
+(and this orchestrator session), and `cc-ci-loops.service` restarts the loops on boot. Count the
+lines below to see how often it's happening.
+
+## Reboots
+
+- 2026-05-28 (~19:?? BST) — reboot (backfilled from memory; mid-Phase-2). Orchestrator + loops were
+  down until manually relaunched. This pre-dates the systemd auto-restart service.
+- 2026-05-28 (~20:02 BST) — reboot (backfilled from memory; uptime showed 5 min at 20:07). Loops
+  manually relaunched at phase 2; this is what prompted adding `cc-ci-loops.service` +
+  auto-logging. Auto-logging is live from the next reboot onward.
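`reboot-log.sh` itself is not shown in this part of the diff. The boot_id gate described in REBOOTS.md could look roughly like the sketch below. It assumes the Linux kernel's per-boot identifier at `/proc/sys/kernel/random/boot_id`; the state-file location, the entry format, and the function name `log_reboot_once` are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a boot_id gate; NOT the actual reboot-log.sh.
#   $1 = file holding the current boot id (on Linux: /proc/sys/kernel/random/boot_id)
#   $2 = state file remembering the last boot id that was logged (assumed path)
#   $3 = log file to append to (e.g. REBOOTS.md)
log_reboot_once() {
  local boot_id_file="$1" state_file="$2" log_file="$3" current last=""
  current="$(cat "$boot_id_file")" || return 1
  if [ -f "$state_file" ]; then last="$(cat "$state_file")"; fi
  if [ "$current" = "$last" ]; then
    return 0   # same boot: this is a manual service restart, record nothing
  fi
  # New boot id: append one entry, then remember the id so restarts are ignored.
  printf '%s\n' "- $(date '+%F %T %Z') reboot (auto-logged)" >> "$log_file"
  printf '%s\n' "$current" > "$state_file"
}
```

In this sketch the very first run (no state file yet) also appends an entry; a real script might prefer to seed the state file at install time.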
--- a/cc-ci-plan/launch-orchestrator.sh
+++ b/cc-ci-plan/launch-orchestrator.sh
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+#
+# launch-orchestrator.sh — start/resume the cc-ci ORCHESTRATOR session in tmux under remote-control.
+#
+# The orchestrator (see /srv/cc-ci/AGENTS.md) is the long-lived SUPERVISORY session: it watches the
+# Builder/Adversary loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and
+# owns the VM-level fallback. It is SEPARATE from the loops that launch.sh manages — this script only
+# brings the orchestrator back (e.g. after a reboot, which kills the tmux server and every session in
+# it). The conversation itself survives on disk across exits/reboots; remote-control only stays
+# connected while the process is alive, so recovery = relaunch the process and re-attach by --resume.
+#
+# Naming: tmux session AND remote-control name are both "cc-ci-orchestrator", matching the loop
+# sessions cc-ci-builder / cc-ci-adv / cc-ci-watchdog.
+#
+# Usage:
+#   ./launch-orchestrator.sh start     # resume the persistent orchestrator session (DEFAULT)
+#   ./launch-orchestrator.sh fresh     # start a NEW orchestrator session (no --resume)
+#   ./launch-orchestrator.sh status    # show tmux + remote-control state
+#   ./launch-orchestrator.sh attach    # tmux attach to the session (Ctrl-b d to detach)
+#   ./launch-orchestrator.sh stop      # kill the tmux session (conversation persists on disk)
+#
+# The persistent session id is read from $ID_FILE (seeded on first run with DEFAULT_ID). A Claude
+# session keeps the SAME id across --resume, so this stays valid across reboots. To point the script
+# at a different session, edit that file or export ORCH_SESSION_ID.
+
+set -euo pipefail
+
+# ----- config -------------------------------------------------------------
+SESSION="${ORCH_SESSION:-cc-ci-orchestrator}"        # tmux session name == remote-control name
+WORKDIR="${ORCH_DIR:-/srv/cc-ci}"                    # orchestrator cwd (its claude project dir)
+CLAUDE_BIN="${CLAUDE_BIN:-claude}"
+CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
+# REMOTE_CONTROL=1 → --remote-control session, viewable/steerable at claude.ai/code. Needs the box
+# logged into the claude.ai account. =0 for a plain local interactive session.
+REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
+LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
+ID_FILE="${ORCH_ID_FILE:-$LOG_DIR/.orchestrator-session-id}"
+DEFAULT_ID="34a80a99-b37e-4809-b8da-ccc9fafe785e"    # the orchestrator session as of 2026-05-28
+# Startup nudge injected as the resumed session's first turn, so an AUTO-launched orchestrator (e.g.
+# cc-ci-loops.service ExecStartPost after a reboot) actually RUNS its AGENTS.md startup routine —
+# announce itself + report reboots — instead of resuming silently and waiting. Set empty to disable.
+# Must contain NO single quotes (it is single-quoted into the tmux command).
+STARTUP_PROMPT="${ORCH_STARTUP_PROMPT-STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run cc-ci-plan/launch.sh status, then send a proactive PushNotification that you are online with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops + watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start if not). Then resume supervising.}"
+# --------------------------------------------------------------------------
+
+log()  { printf '[orchestrator %(%H:%M:%S)T] %s\n' -1 "$*"; }
+die()  { log "ERROR: $*"; exit 1; }
+session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
+
+preflight() {
+  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
+  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
+  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
+  mkdir -p "$LOG_DIR"
+  [[ -f "$ID_FILE" ]] || echo "$DEFAULT_ID" > "$ID_FILE"
+}
+
+resume_id() { echo "${ORCH_SESSION_ID:-$(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")}"; }
+
+# Launch claude in a detached tmux session. $1 = mode ("resume" | "fresh"; default "resume").
+start() {
+  local mode="${1:-resume}"
+  preflight
+  if session_alive; then
+    log "$SESSION already running — leaving it (use '$0 stop' first to relaunch)"
+    return 0
+  fi
+  local rc="" resume="" id=""
+  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
+  if [[ "$mode" == "resume" ]]; then
+    id="$(resume_id)"
+    [[ -n "$id" ]] && resume="--resume '$id'"
+    log "starting $SESSION (resume=$id, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
+  else
+    log "starting $SESSION FRESH (no resume, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
+  fi
+  # Startup nudge as a POSITIONAL prompt (not stdin — stdin would force print mode and break
+  # remote-control). On --resume this appends as the session's next turn, triggering the AGENTS.md
+  # startup routine (announce + report reboots). Empty STARTUP_PROMPT => clean resume, no nudge.
+  local prompt_arg=""
+  [[ -n "$STARTUP_PROMPT" ]] && prompt_arg="'$STARTUP_PROMPT'"
+  tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
+    "$CLAUDE_BIN $resume $rc $CLAUDE_FLAGS $prompt_arg"
+  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
+  log "started. status: $0 status | attach: tmux attach -t $SESSION"
+}
+
+case "${1:-start}" in
+  start)  start resume ;;
+  fresh)  start fresh ;;
+  stop)
+    if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi
+    ;;
+  status)
+    if session_alive; then
+      log "$SESSION: RUNNING"
+      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
+    else
+      log "$SESSION: stopped"
+    fi
+    log "resume id: $(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")  (file: $ID_FILE)"
+    ;;
+  attach) exec tmux attach -t "$SESSION" ;;
+  *)
+    cat <<EOF
+cc-ci orchestrator launcher
+
+  $0 start    resume the persistent orchestrator session in tmux + remote-control (default)
+  $0 fresh    start a NEW orchestrator session (no --resume)
+  $0 status   show tmux + remote-control state and the resume id
+  $0 attach   tmux attach to the session
+  $0 stop     kill the tmux session (conversation persists on disk)
+
+Env: SESSION=$SESSION  WORKDIR=$WORKDIR  REMOTE_CONTROL=$REMOTE_CONTROL  CLAUDE_BIN=$CLAUDE_BIN
+EOF
+    ;;
+esac
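`launch-orchestrator.sh` sets `STARTUP_PROMPT` with `${ORCH_STARTUP_PROMPT-...}` (no colon), which is what makes "set empty to disable" work. A small illustration of the difference between the two expansion forms; `DEMO_PROMPT` and both function names are hypothetical and exist only for this example:

```shell
#!/usr/bin/env bash
# ${VAR-default}  : use the default only when VAR is UNSET (an empty value is kept)
# ${VAR:-default} : use the default when VAR is unset OR empty
pick_without_colon() { echo "${DEMO_PROMPT-default nudge}"; }
pick_with_colon()    { echo "${DEMO_PROMPT:-default nudge}"; }

unset DEMO_PROMPT
pick_without_colon    # prints: default nudge
DEMO_PROMPT=""
pick_without_colon    # prints an empty line: the nudge is disabled
pick_with_colon       # prints: default nudge (empty is treated like unset)
```

With the colon form, exporting an empty `ORCH_STARTUP_PROMPT` would silently fall back to the default nudge rather than giving a clean resume.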
--- a/cc-ci-plan/launch.sh
+++ b/cc-ci-plan/launch.sh
@@ -7,10 +7,10 @@
 #   • Adversary (tmux session: cc-ci-adv)        working clone /srv/cc-ci/cc-ci-adv
 # coordinating only through the git repo on git.autonomic.zone.
 #
-# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c then 1b). Each phase
-# has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
+# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c → 1b → 1d → 1e → 2 → 2b → 3 → 4).
+# Each phase has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
 # STATUS-<id>.md shows "## DONE", the watchdog AUTO-TRANSITIONS to the next phase; after the LAST
-# phase it STOPS the loops and exits (a manual gate — e.g. check in before Phase 2).
+# phase (4, final review/polish/cleanup) it STOPS the loops and exits (end of the whole build).
 #
 # Four jobs: ITERATION (each agent's /loop), RESILIENCE (restart a dead loop), HANDOFF SIGNALLING
 # (ping the waiting loop the moment its counterpart hands off), PHASE SEQUENCING (this file).
@@ -49,7 +49,7 @@ WATCHDOG_SESSION="cc-ci-watchdog"
 # Ordered phase sequence: each entry "id|planfile|statusbasename". The watchdog runs them in order,
 # auto-transitions on the phase's "## DONE" (in BUILDER_DIR/<statusbasename>), and STOPS after the
 # last one (manual gate). Override PHASES_SPEC (semicolon-separated) to change the sequence.
-PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md}"
+PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md;1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md;1e|plan-phase1e-harness-corrections.md|STATUS-1e.md;2|plan-phase2-recipe-tests.md|STATUS-2.md;2b|plan-phase2b-test-performance.md|STATUS-2b.md;3|plan-phase3-results-ux.md|STATUS-3.md;4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md}"
 IFS=';' read -r -a PHASES <<< "$PHASES_SPEC"
 PHASE_IDX_FILE="${PHASE_IDX_FILE:-$LOG_DIR/.phase-idx}"
 # --------------------------------------------------------------------------
@@ -64,7 +64,10 @@ phase_id()     { echo "${PHASES[$1]}" | cut -d'|' -f1; }
 phase_plan()   { echo "${PHASES[$1]}" | cut -d'|' -f2; }
 phase_status() { echo "${PHASES[$1]}" | cut -d'|' -f3; }
 phase_review() { echo "REVIEW-$(phase_id "$1").md"; }
-phase_done()   { grep -qE '^##[[:space:]]+DONE' "$BUILDER_DIR/$1" 2>/dev/null; }   # $1 = status basename (read locally)
+# Loop-state files may sit at the repo root OR under machine-docs/ (the 1b RL6 move). Prefer
+# machine-docs/ if present, else root — so the watchdog survives the move whenever it happens.
+resolve_state() { local dir="$1" base="$2"; if [[ -f "$dir/machine-docs/$base" ]]; then echo "$dir/machine-docs/$base"; else echo "$dir/$base"; fi; }
+phase_done()   { grep -qE '^##[[:space:]]+DONE' "$(resolve_state "$BUILDER_DIR" "$1")" 2>/dev/null; }   # $1 = status basename (read locally)
 all_ids()      { local p; for p in "${PHASES[@]}"; do printf '%s ' "$(echo "$p" | cut -d'|' -f1)"; done; }

 preflight() {
@@ -133,15 +136,32 @@ ping_session() {
  tmux send-keys -t "$s" -l -- "$msg" 2>/dev/null && { sleep 0.3; tmux send-keys -t "$s" Enter 2>/dev/null; }
 }

+# A loop can stall ALIVE on a usage/spend-limit notice: the claude process stays up (so the
+# dead-session restart never fires) but makes no progress, and the /loop self-pacing is dead because
+# the limit interrupted the turn that would have scheduled the next tick. Detect that signature
+# (limit text present + no active-turn marker) and re-nudge it each heavy tick — once the limit resets
+# the next nudge lands and the loop resumes. Gated on the limit text so we NEVER nudge a loop that is
+# just legitimately idle-waiting on a handoff.
+LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'
+nudge_if_limit_stalled() {
+  local s="$1" pane
+  pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -25 || true)"
+  if printf '%s\n' "$pane" | grep -q 'esc to interrupt'; then return 0; fi    # actively working
+  if ! printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then return 0; fi        # not a limit stall
+  log "limit-stall detected on $s — re-nudging to resume"
+  ping_session "$s" "watchdog: the usage/spend limit appears lifted — RESUME your loop now. Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you stopped; re-arm your loop pacing."
+}
+
 # Edge-triggered handoff signalling for the CURRENT phase. Reads the loops' local clones.
 # Ping the Adversary only when a gate id NEWLY appears on a "CLAIMED … awaiting" line (never on
 # the baseline / restart / a passed-but-kept line). Ping the Builder when the phase REVIEW changes.
 _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""
-handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; }   # call on phase transition
+_wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""
+handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""; }   # call on phase transition
 handoff_check() {
  local idx sf rf cur now added
  idx="$(cur_idx)"
-  sf="$BUILDER_DIR/$(phase_status "$idx")"; rf="$ADV_DIR/$(phase_review "$idx")"
+  sf="$(resolve_state "$BUILDER_DIR" "$(phase_status "$idx")")"; rf="$(resolve_state "$ADV_DIR" "$(phase_review "$idx")")"
  if [[ -f "$sf" ]]; then
    now="$(grep -iE 'CLAIMED.*awaiting' "$sf" 2>/dev/null | grep -oiE 'M[0-9]+(\.[0-9]+)?|[A-Z][0-9]+' | tr '[:lower:]' '[:upper:]' | sort -u || true)"
    if [[ -n "$_wd_baselined" ]]; then
@@ -163,6 +183,34 @@ handoff_check() {
      _wd_last_review="$cur"
    fi
  fi
+
+  # INBOX side-channel (§6.1). The sender writes the receiver's inbox in their OWN clone, so we
+  # detect from the sender side. Edge-trigger on content hash so a fresh message (sender re-wrote
+  # before receiver consumed) re-pings. Receiver deletes after processing => hash empty => next
+  # write re-triggers.
+  local adv_inbox builder_inbox h
+  adv_inbox="$(resolve_state "$BUILDER_DIR" "ADVERSARY-INBOX.md")"
+  if [[ -f "$adv_inbox" ]]; then
+    h="$(md5sum "$adv_inbox" 2>/dev/null | awk '{print $1}' || true)"
+    if [[ -n "$h" && "$h" != "$_wd_adv_inbox_seen" ]]; then
+      log "handoff: ADVERSARY-INBOX.md new/changed -> pinging Adversary"
+      ping_session "$ADV_SESSION" "watchdog ping: the Builder wrote machine-docs/ADVERSARY-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed."
+      _wd_adv_inbox_seen="$h"
+    fi
+  else
+    _wd_adv_inbox_seen=""   # consumed; ready for the next write
+  fi
+  builder_inbox="$(resolve_state "$ADV_DIR" "BUILDER-INBOX.md")"
+  if [[ -f "$builder_inbox" ]]; then
+    h="$(md5sum "$builder_inbox" 2>/dev/null | awk '{print $1}' || true)"
+    if [[ -n "$h" && "$h" != "$_wd_builder_inbox_seen" ]]; then
+      log "handoff: BUILDER-INBOX.md new/changed -> pinging Builder"
+      ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary wrote machine-docs/BUILDER-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed."
+      _wd_builder_inbox_seen="$h"
+    fi
+  else
+    _wd_builder_inbox_seen=""
+  fi
 }

 watchdog_loop() {
@@ -184,15 +232,15 @@ watchdog_loop() {
          handoff_reset
          start_loops
        else
-          log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — MANUAL CHECK-IN required before Phase 2."
+          log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — entire build (1c→4) finished."
          stop_loops
-          printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; manual check-in required before Phase 2.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
+          printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; entire build finished.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
          log "watchdog exiting."
          exit 0
        fi
      else
-        session_alive "$BUILDER_SESSION" || { log "builder gone — restarting (phase $pid)"; start_agent builder   "$BUILDER_SESSION" "$BUILDER_DIR"; }
-        session_alive "$ADV_SESSION"     || { log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION"     "$ADV_DIR"; }
+        if session_alive "$BUILDER_SESSION"; then nudge_if_limit_stalled "$BUILDER_SESSION"; else log "builder gone — restarting (phase $pid)"; start_agent builder   "$BUILDER_SESSION" "$BUILDER_DIR"; fi
+        if session_alive "$ADV_SESSION";     then nudge_if_limit_stalled "$ADV_SESSION";     else log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION"     "$ADV_DIR"; fi
      fi
    fi
    sleep "$SIGNAL_INTERVAL"
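The limit-stall check in `nudge_if_limit_stalled` decides purely from captured pane text, so the decision can be exercised offline without tmux. A sketch of that classification as a standalone function: the pattern is copied from `LIMIT_RE` in the diff, while `classify_pane` and its three output labels are hypothetical names introduced here.

```shell
#!/usr/bin/env bash
# Same pattern as LIMIT_RE in launch.sh.
LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'

# Hypothetical helper: reads pane text on stdin and prints one of
#   working | limit-stalled | idle
classify_pane() {
  local pane
  pane="$(cat)"
  if printf '%s\n' "$pane" | grep -q 'esc to interrupt'; then
    echo working          # an active turn is in progress: never nudge
  elif printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then
    echo limit-stalled    # limit notice and no active turn: nudge
  else
    echo idle             # e.g. waiting on a handoff: leave it alone
  fi
}
```

For example, `tmux capture-pane -pt cc-ci-builder | tail -25 | classify_pane` would mirror what the watchdog inspects. The order of the checks matters: an active-turn marker wins even if stale limit text is still visible in the pane.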
--- a/cc-ci-plan/plan-orchestrator-migration.md
+++ b/cc-ci-plan/plan-orchestrator-migration.md
@@ -0,0 +1,135 @@
+# Plan — migrate the orchestrator off the Pi onto a dedicated NixOS Incus VM
+
+**Goal:** move everything that drives the cc-ci loops (the Builder/Adversary loops, the watchdog,
+the SOCKS proxy, the orchestrator session itself) off the Raspberry Pi and onto a new, dedicated,
+**reboot-resilient NixOS VM** on b1 — declared in a new git repo **`cc-ci-orchestrator`**. Finish by
+relocating this orchestrator session there too.
+
+**Why:** the Pi has rebooted twice today, each time silently killing the tmux loops + watchdog
+(they don't survive reboot, nothing auto-restarts them). A NixOS VM lets us declare the whole rig
+(claude CLI, proxy, loop supervisor) as systemd services that come back on boot — turning a reboot
+into a non-event. It also consolidates the orchestrator next to the infra it manages.
+
+**Status:** DRAFT — awaiting operator go-ahead before any infra creation / cutover.
+
+---
+
+## 0. Current footprint (what has to move)
+
+On the Pi (`raspberrypi`, aarch64), workspace `/srv/cc-ci` (itself the
+`cc-ci-autonomous-orchestrator` git repo):
+
+| Item | What | Move strategy |
+|---|---|---|
+| `cc-ci-plan/` | loop code: `launch.sh`, `plan*.md`, `prompts/`, `kickoff.md` | in git (this repo) → clone on VM |
+| `cc-ci/`, `cc-ci-adv/` | Builder + Adversary working clones (~13M each) | **re-clone from git.autonomic.zone** on the VM (cleaner than copying) |
+| `.cc-ci-logs/` | watchdog/loop logs + `.phase-idx` | copy `.phase-idx` (the resume point); logs start fresh |
+| `cc-ci-secrets/` | sops-encrypted secrets repo | in git → clone |
+| `references/` | recipe-maintainer corpus (read-only parity source) | clone/rsync from `/srv/recipe-maintainer` |
+| **`.testenv`** | TS auth key, Gitea bot creds | **out-of-band copy** (gitignored, never in git) |
+| **`~/.ssh/cc-ci-root-ed25519`** | root SSH key to cc-ci | **out-of-band copy** |
+| **`.sops/master-age.txt`** | master recovery age key | **out-of-band copy** |
+| **Incus mTLS certs** (`/srv/incus-terraform-nix-vm-creator/terraform-secrets/`) | `terraform.{crt,key}`, `vm_ssh_key` | **out-of-band copy** — so the VM can itself manage VMs |
+| `cc-ci-tailscaled.service` | userspace SOCKS proxy :1055 | **re-declare as NixOS** (see §2) |
+| **claude CLI + auth** | `~/.local/bin/claude` v2.1.154 + `~/.claude.json` | install on VM + **operator `claude auth login`** (§4) |
+| this orchestrator session | the supervising claude conversation | **operator-assisted cutover** (§3, Phase E) |
+
+Two hard human-in-the-loop steps, called out explicitly: **claude auth on the new VM** (device-code
+login, can't be scripted) and the **final session cutover** (the operator connects to the new
+orchestrator session). Everything else I can do.
+
+## 1. Target VM spec
+
+- **Host/API:** b1 Incus, `https://100.117.251.31:8443`, project `terraform-ci`, mTLS certs (have).
+- **Name:** `cc-ci-orchestrator` (tailnet hostname too).
+- **Resources:** **2 GB RAM, 2 vCPU, 30 GB disk** (dir backend → resize needs a reboot, so size it
+  at create time to avoid growing it later). b1 has ample headroom (only cc-nix-test @8GB running).
+- **Image:** the existing imported NixOS base VM image (`incus-base-vm`) — already ships tailscale,
+  openssh, git/jq/curl, flakes, cloud-init.
+- **Tailnet:** joins via a fresh `TS_AUTH_KEY` (operator provides, or reuse the keyed approach in
+  `terraform-secrets/.test.env`). MagicDNS name `cc-ci-orchestrator.taila4a0bf.ts.net`.
+- **Bootstrap:** cloud-init writes the `cc-ci-orchestrator` flake config + `nixos-rebuild switch`.
+
+## 2. The new `cc-ci-orchestrator` git repo (NixOS config)
+
+A new **private** repo on `git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator` (bot is org
+admin). It is the NixOS config for this VM — the orchestrator's equivalent of what `cc-ci` is for the
+test server. Contents:
+
+- `flake.nix` + `hosts/cc-ci-orchestrator/configuration.nix` — the VM's NixOS config.
+- **Packages:** `claude-code` (CLI), `git`, `tmux`, `python3`, `jq`, `openssh`, `nodejs` (claude
+  runtime), `coreutils`, `nettools` (`nc` for the proxy ProxyCommand).
+- **`services.cc-ci-tailscaled`** — the userspace tailscaled SOCKS proxy on :1055, as a NixOS
+  systemd service (port to NixOS from the Pi's `cc-ci-tailscaled.service`). This is the path to b1 +
+  cc-ci.
+- **`services.cc-ci-orchestrator`** — a systemd service that runs `launch.sh start` with
+  `RESUME_PHASE=1` **on boot** (after the proxy + network are up), as the workspace user. **This is
+  the reboot-resilience fix** — the loops + watchdog come back automatically after any reboot.
+- **Secrets via sops-nix** (like cc-ci): the out-of-band secrets (`.testenv`, ssh key, incus certs)
+  are sops-encrypted into the repo, decrypted at activation to their runtime paths. The **master age
+  key** is the one irreducible out-of-band bootstrap secret placed on the VM once.
+- `~/.ssh/config` for `cc-ci` (root, ProxyCommand via :1055) declared.
+- **Excluded from git:** claude's own auth (`~/.claude.json`) — that's per-user login state, set up
+  once interactively (§4), not committed.
+
+## 3. Execution phases
+
+### Phase A — provision the VM (reversible; safe to do while Pi loops keep running)
+1. Create `cc-ci-orchestrator` VM via the Incus API (2 GB / 2 vCPU / 30 GB, NixOS base image, TS auth
+   key in cloud-init). Wait for tailnet join + ssh.
+2. Verify: `ssh` in, `tailscale status`, `nixos-rebuild` available, can reach b1 API + cc-ci through
+   its own proxy once configured.
+
+### Phase B — author + apply the `cc-ci-orchestrator` repo
+3. Create the private git repo; author the flake/config (§2); commit/push.
+4. Place the master age key on the VM; sops-encrypt the out-of-band secrets into the repo.
+5. `nixos-rebuild switch` on the VM → proxy service up, packages present, services defined (loop
+   supervisor **not yet started** — or started in a dry mode).
+
+### Phase C — stage the workspace (no cutover yet)
+6. On the VM: clone `cc-ci-autonomous-orchestrator` (the loop code), clone the Builder/Adversary
+   working repos fresh from git.autonomic.zone, clone `cc-ci-secrets`, rsync `references/`.
+7. Copy `.phase-idx` (resume point = phase 2) so the VM watchdog resumes the right phase.
+8. **Operator step:** `claude auth login` on the VM (device code) so the loops can run
+   `--remote-control --dangerously-skip-permissions`. Verify with a throwaway interactive claude.
+
+### Phase D — cutover (the only disruptive moment; pick a clean point)
+9. **Quiesce the Pi:** stop the Pi loops + watchdog (`launch.sh stop`); confirm both loops are at a
+   safe point (no half-written commit; `git status` clean in both clones, last work pushed).
+10. **Start on the VM:** enable + start the `cc-ci-orchestrator` systemd service → `launch.sh start`
+    (RESUME_PHASE=1) brings up Builder + Adversary + watchdog on the VM, resuming phase 2 from the
+    repo state. Verify all three sessions + a handoff + public health.
+11. **Decommission the Pi loops:** disable the Pi's `cc-ci-tailscaled` + leave the workspace in place
+    (read-only fallback) but not running loops. (Keep the Pi as a cold standby for a few days before
+    deleting anything.)
+
+### Phase E — move the orchestrator session (operator-assisted)
+12. On the VM, start the orchestrator session: `claude --remote-control 'autonomous-orchestrator'
+    --dangerously-skip-permissions` in a tmux session, seeded with AGENTS.md + this plan so it picks
+    up the supervising role. The **operator connects** to it (claude.ai/code) — this is the
+    "move myself" step; a session can't transplant itself across machines, so it's a fresh
+    orchestrator session on the VM with full context from the repo.
+13. This Pi-side orchestrator session hands off (writes a short state note) and goes idle/ends.
+
+## 4. Risks & mitigations
+- **claude auth (human step):** unavoidable device-code login on the VM. Mitigation: do it in Phase
+  C, well before cutover; verify before quiescing the Pi.
+- **Loops mid-work at cutover:** pick a quiet point (between gate claims / after a push); the loops
+  re-orient from git on restart anyway, so worst case is a re-run of an in-flight iteration.
+- **Secrets sprawl:** out-of-band secrets are copied once, then sops-managed in the new repo; never
+  committed in plaintext (same discipline as cc-ci). The master age key is the sole bootstrap secret.
+- **Self-move gap:** between Pi-session-ends and VM-session-connected, there's no live orchestrator.
+  The watchdog (now a boot service) keeps the loops alive independently, so this gap is safe.
+- **Rollback:** until the Pi workspace is deleted, reverting = stop VM service, `launch.sh start` on
+  the Pi again. Keep the Pi intact until the VM has run clean through at least one reboot + one gate
+  handshake.
+- **Reboot-resilience proof:** before trusting the VM, reboot it once and confirm the loops +
+  watchdog + proxy all come back via systemd (the whole point of the move).
+
+## 5. Operator-assisted steps (the only things I can't fully do)
+1. Provide a fresh `TS_AUTH_KEY` for the VM (or confirm reuse of the one in `terraform-secrets`).
+2. `claude auth login` on the VM (device code).
+3. Connect to the new orchestrator session on the VM at cutover (Phase E).
+
+Everything else (VM create, repo author, NixOS config, secret migration, workspace staging, the
+loop cutover) I can drive.
--- a/cc-ci-plan/plan-phase1b-review-lint.md
+++ b/cc-ci-plan/plan-phase1b-review-lint.md
@ -135,3 +135,40 @@ Blocking unless noted; these are *plan-relevant invariants visible only by readi
 - Whether to add Python **type-checking** (mypy/pyright) now or defer to `IDEAS.md`.
 - The precise **blocking vs advisory** split for the checklist.
 - Whether the `.drone.yml` lint stage should **fail** the build or just warn initially.
+
+---
+
+## 7. Operator review items (added 2026-05-27) — repo layout (do in this 1b pass)
+
+Two structural-review items from the operator. Both are **blocking** for 1b. Apply them as part of
+this pass, then re-verify (RL3 covers the re-verification). **Mind the coordination caveats — these
+touch the live flake build and the running multi-agent machinery.**
+
+### RL5 — Consolidate all Nix-code folders under a root `nix/`
+- Move the folders that contain `.nix` code — **`modules/` and `hosts/`** — to **`nix/modules/` and
+  `nix/hosts/`**. (Add future Nix dirs under `nix/` too.)
+- **Keep `flake.nix` / `flake.lock` at the repo root** (entry point) so the build ref is unchanged
+  (`docs/install.md`'s `nixos-rebuild switch --flake 'git+file://…?submodules=1#cc-ci'` stays valid).
+  Just update the flake's internal paths (`./modules` → `./nix/modules`, `./hosts` → `./nix/hosts`)
+  and any `imports`/`scripts`/`.drone.yml` references.
+- **Re-verify after the move:** the byte-identical clean-room result is the bar. The toplevel store
+  hash *will* change (paths differ) — that's fine; what must hold is that a fresh recursive clone
+  still rebuilds **byte-identical to the running system** and the Adversary re-confirms it cold
+  (folds into RL3). Update `docs/architecture.md` to describe the `nix/` layout.
+
+### RL6 — Move uppercase multi-agent-protocol files into `machine-docs/`
+- Move the uppercase protocol files — **`STATUS*.md`, `REVIEW*.md`, `JOURNAL*.md`, `BACKLOG*.md`,
+  `DECISIONS.md`** — into a root **`machine-docs/`** folder. **`README.md` stays in the repo root**
+  (operator decision, 2026-05-27) — it is the human-facing repo readme, not a protocol file; do
+  **not** move it into `machine-docs/`.
+- **Update every reference** to the new paths: the `cc-ci-plan/` plans (this file, `plan.md`,
+  `plan-phase1c-*`, `README.md`, `kickoff.md`, `test-e2e-testme-acceptance.md`), `AGENTS.md`,
+  `.drone.yml`, `scripts/`, and any in-repo doc that points at `STATUS.md`/`REVIEW.md`/etc.
+- **⚠ COORDINATION CAVEAT (do not move these unilaterally mid-run):** the live **watchdog**
+  (`cc-ci-plan/launch.sh`, the orchestrator's file) reads `STATUS-<id>.md` and `REVIEW-<id>.md` at
+  the **repo root** to drive handoff pings + the 1c→1b auto-transition. Moving them breaks the
+  running watchdog until `launch.sh` is updated to the `machine-docs/` paths and the watchdog is
+  restarted. **So sequence it with the orchestrator:** the orchestrator updates `launch.sh`'s
+  `PHASES_SPEC`/path logic and restarts the watchdog **in lockstep** with the loops' `git mv`.
+  Safest to do this **near the end of 1b** (or as its final step), not while a phase transition is
+  pending. Flag the orchestrator when ready and it will handle `launch.sh` + the watchdog restart.
--- a/cc-ci-plan/plan-phase1d-generic-test-suite.md
+++ b/cc-ci-plan/plan-phase1d-generic-test-suite.md
@ -0,0 +1,243 @@
+# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)
+
+**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2**
+(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it
+must precede it.
+**Transition:** **manual** (operator kicks it off at the post-1b check-in).
+**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof
+recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).
+**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
+**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`
+**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.
+
+---
+
+## 0. Why this phase
+
+Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+
+recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe
+gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than
+being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns
+Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and
+gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).
+
+Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_<op>.py`
+may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's
+design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the
+generic install assertions — is a perfectly good option; additive is fine too; could even be
+per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an
+op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a
+**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3)
+custom (non-lifecycle) tests are opt-in.
+
+> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
+> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
+> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
+> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.
+
+---
+
+## 1. Definition of Done (Phase 1d exit condition)
+
+Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
+`machine-docs/REVIEW-1d.md`):
+
+- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a
+      recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto
+      secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) →
+      asserts the app is **actually serving** (real HTTP(S) response on its
+      `<run>.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not
+      health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no**
+      cc-ci or repo-local tests.
+- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release
+      *before* the code under test), then **upgrade to the code under test (PR head) via
+      `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head,
+      not previous → newest-published-tag. Assert services reconverge and the app still serves.
+      **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag*
+      and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a
+      recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev,
+      **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to
+      it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still
+      applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an
+      image/config change, or assert the running config now matches the PR head). For a non-PR
+      `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current. (Data-
+      continuity assertions remain recipe overlays — see §2.1.)
+- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot /
+      `abra app backup` support): run backup → assert a snapshot artifact is produced; then restore →
+      assert restore completes and the app is healthy after. For recipes that declare **no** backup
+      config, backup/restore are cleanly **N/A (skipped)** — *not* a failure.
+- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a
+      recipe's run from the generic default per op, **overridden or extended** by the recipe's
+      `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in
+      cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override
+      mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant:
+      if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run
+      **only if defined**. Discovery + cc-ci-vs-repo-local precedence is implemented and settled in
+      `machine-docs/DECISIONS.md`.
+- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run
+      against the **same live deployment** the generic tier brought up — **one deploy per run, one
+      teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe
+      causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via
+      deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and
+      then explicitly.
+- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports
+      **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the
+      generic install. A recipe with **no** customization still **attempts the generic suite**.
+      Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected**
+      when the step is absent; (b) the **same** recipe **passes** once the custom step is added —
+      demonstrating the hook + the graceful-generic-failure are both real.
+- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no
+      cc-ci/repo-local customization runs the full generic suite through the real pipeline
+      (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation**
+      pass/fail/skip (install/upgrade/backup/restore).
+- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic
+      lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys**
+      (teardown in `finally`), respects `MAX_TESTS`.
+- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay
+      convention (file names + locations + precedence), and the custom-install-steps hook + how to
+      add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.
+
+When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.
+
+---
+
+## 2. The layered test model (the core design)
+
+For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`:
+
+(`gen` = generic default; `→` = "overridden/extended by, if the recipe defines it" — mechanism is
+the Builder's call, §2.2):
+```
+INSTALL   = gen_install    → test_install.py    (else gen)    ← always runs
+UPGRADE   = gen_upgrade    → test_upgrade.py    (else gen)    ← always runs
+BACKUP    = gen_backup     → test_backup.py     (else gen)    ← if backup-capable, else N/A
+RESTORE   = gen_restore    → test_restore.py    (else gen)    ← if backup-capable, else N/A
+CUSTOM    = recipe test_*.py   (anything beyond the four)     ← ONLY if defined
+```
+
+### 2.1 Generic baseline suite (recipe-agnostic)
+- **install** — `abra app new <recipe> --domain <run>.ci.commoninternet.net` with non-interactive
+  defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S)
+  response from the app over its domain (status + that it's the app, not Traefik's fallback).
+- **upgrade** — deploy a prior/pinned version, `abra app upgrade` to target, assert reconverge +
+  still serving.
+- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup
+  produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore
+  can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay
+  (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.
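+
+The "poll to converged, no bare `sleep`" requirement can be sketched as a deadline-bounded poll
+loop. This is a minimal illustration, not the harness's actual code: `poll_until`, its parameters,
+and the injectable `clock`/`sleep` hooks are hypothetical names chosen for the example.
+
+```python
+import time
+from typing import Callable
+
+
+def poll_until(
+    check: Callable[[], bool],
+    timeout_s: float = 300.0,
+    interval_s: float = 5.0,
+    clock: Callable[[], float] = time.monotonic,
+    sleep: Callable[[float], None] = time.sleep,
+) -> bool:
+    """Return True as soon as check() passes, False once the deadline expires.
+
+    Hypothetical helper: a bounded poll, so a run never waits on a fixed sleep.
+    """
+    deadline = clock() + timeout_s
+    while True:
+        if check():
+            return True
+        remaining = deadline - clock()
+        if remaining <= 0:
+            return False
+        # Sleep no longer than the time left, so the deadline is respected.
+        sleep(min(interval_s, remaining))
+```
+
+A generic install would pass a `check` that inspects service state and the HTTP response on the
+run's domain; that part depends on the environment, so it is left out of the sketch.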
+
+### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**
+A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or
+**extends** them — the Builder picks the mechanism (a present `test_<op>.py` replacing the generic is
+a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe
+defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero
+config, while letting a recipe with a poor generic fit supply its own.
+
+**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay
+assertions run against the **same live deployment** the generic tier already brought up — no extra
+`abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the
+lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays +
+custom) run against it, then **one teardown** at the end:
+
+```
+deploy ONCE
+  → INSTALL  assertions: generic_install  + test_install.py        (same live app)
+  → UPGRADE  in place (abra app upgrade)
+       assertions: generic_upgrade + test_upgrade.py               (same app, upgraded)
+  → BACKUP   (if capable) → generic_backup + test_backup.py
+  → RESTORE  (if capable) → generic_restore + test_restore.py
+  → CUSTOM   test_*.py                                             (same live app)
+teardown ONCE (in finally)
+```
+
+So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's
+extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally
+change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier
+re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant)
+— and that must be explicit, not the default. This is also the main Phase-2b speed win.
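+
+The deploy-once shape described above can be sketched as follows. This is an illustration under
+assumptions, not the harness implementation: `run_lifecycle` and the tier callables are
+hypothetical, and continue-and-report after a failed tier is picked only for the example (the
+fail-fast vs continue choice is still an open decision in §6).
+
+```python
+from typing import Callable, Dict, List, Tuple
+
+Step = Tuple[str, Callable[[], None]]  # (tier name, callable that raises on failure)
+
+
+def run_lifecycle(
+    deploy: Callable[[], None],
+    tiers: List[Step],
+    teardown: Callable[[], None],
+) -> Dict[str, str]:
+    """Deploy once, run every tier against that deployment, tear down once.
+
+    Hypothetical sketch: a tier that raises is recorded as "fail" and later
+    tiers still run against the same deployment.
+    """
+    results: Dict[str, str] = {}
+    try:
+        deploy()  # the single deploy for the whole run
+        for name, run_tier in tiers:
+            try:
+                run_tier()
+                results[name] = "pass"
+            except Exception:
+                results[name] = "fail"
+    finally:
+        teardown()  # always runs, even if deploy() itself raised
+    return results
+```
+
+Because `teardown` sits in `finally`, a deploy that raises still triggers cleanup before the
+exception propagates to the caller.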
+
+### 2.3 Custom tests
+`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe —
+no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).
+
+### 2.4 Custom install steps (and the graceful-generic rule)
+Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a
+one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or
+repo-local). Rules:
+- If a recipe **declares** custom install steps → run them as part of the install tier.
+- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.**
+  Recipes that genuinely need special steps will **fail the generic install — and that's acceptable
+  and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2
+  work), not to special-case the harness.
+
+### 2.5 Discovery + precedence
+- **Locations:** cc-ci's test dir (e.g. `tests/<recipe>/`) and the recipe repo's `tests/`
+  (repo-local). The harness discovers overlays + custom-install-steps from both.
+- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the
+  upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend
+  for the CI env). Define the rule for same-named collisions explicitly.
+
+---
+
+## 3. Milestones (bounded)
+
+- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html`
+  with no recipe config. *Accept:* DG1.
+- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated
+  on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
+- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local
+  discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
+- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the
+  fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
+- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8;
+  then flip `machine-docs/STATUS-1d.md` to `## DONE`.
+
+---
+
+## 4. Guardrails
+- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the
+  generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with
+  *no* assertion for an op it should be tested on.
+- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a
+  correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
+- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no
+  per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op
+  truly needs it, explicitly. (Correctness *and* the main perf lever.)
+- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
+- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
+- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay
+  authoring is Phase 2, not here.
+
+---
+
+## 5. Impact on later phases (reshapes the plan set)
+- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to:
+  every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays**
+  (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a
+  recipe fails generically. Update Phase 2 to reference 1d as its foundation.
+- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the
+  tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is
+  derived from which tiers pass.
+- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the
+  image-cache / readiness / dedup optimizations.
+
+---
+
+## 6. Open decisions (log in machine-docs/DECISIONS.md)
+- **Override vs extend (Builder's call).** Does a present `test_<op>.py` **replace** the generic
+  assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override
+  is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an
+  op ⇒ generic runs" and the single-shared-deployment rule.
+- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
+- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout
+  (`tests/<recipe>/` in cc-ci; `tests/` in the recipe repo).
+- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or
+  a declarative field — pick the simplest that the harness can run uniformly.
+- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot
+  labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
+- Whether generic **upgrade** should always go previous→latest, or test the specific
+  version-bump under `!testme` (PR-driven).
+- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
+- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→
+  backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier
+  tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report).
--- a/cc-ci-plan/plan-phase1e-harness-corrections.md
+++ b/cc-ci-plan/plan-phase1e-harness-corrections.md
@ -0,0 +1,139 @@
+# cc-ci Phase 1e — Generic-harness corrections (Autonomous Build Plan)
+
+**Status:** QUEUED — runs **after Phase 1d** and **before Phase 2** (`plan-phase2-recipe-tests.md`).
+It corrects the **shared generic-test harness** from 1d, so it must land before Phase 2 authors
+overlays on top of it.
+**Transition:** **manual** (operator kicks it off).
+**Builds on:** the Phase-1d generic suite (`runner/run_recipe_ci.py`, `runner/harness/*`,
+`tests/_generic/*`, `tests/conftest.py`) — see `plan-phase1d-generic-test-suite.md`.
+**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
+**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md`
+**Phase order:** 1c → 1b → 1d → **1e** → 2 → 2b → 3.
+
+---
+
+## 0. Why this phase
+
+An operator review of the 1d generic suite (2026-05-28) found three corrections to the **shared
+harness** — the foundation every recipe overlay (Phase 2) builds on. Fixing them now, once, is far
+cheaper than after overlays exist. All three are small in code but change behavior, so each needs a
+fresh Adversary cold-verification and must not weaken any existing test.
+
+---
+
+## 1. Definition of Done (Phase 1e exit condition)
+
+Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
+`machine-docs/REVIEW-1e.md`):
+
+- [ ] **HC1 — Upgrade tier upgrades to the code under test (PR head), not a published tag.** The
+      upgrade tier deploys the **previous published version** (last release before the PR) and then
+      **upgrades to the PR head via `abra app deploy --chaos`** (chaos = the current checkout). The
+      PR's actual changes are exercised by the upgrade path. (§2.1)
+- [ ] **HC2 — Repo-local (PR-authored) code is not executed unless the recipe is approved.** By
+      default the harness runs **only cc-ci-authored** overlays/install-steps (`tests/<recipe>/…`) +
+      the generic; PR-authored repo-local `test_*.py` and `install_steps.sh` are **not run**.
+      Repo-local code is honored **only for recipes on an explicit cc-ci-maintained approval
+      allowlist** (default-deny). (§2.2)
+- [ ] **HC3 — Generic runs by default (additive); skipping it is explicit.** When a recipe ships an
+      overlay for an op, the **generic still runs** alongside it by default; the generic is skipped
+      **only** when an explicit env/flag opts out. The baseline floor is never lost silently. (§2.3)
+- [ ] **HC4 — No regression, cold-verified.** The Adversary re-runs the relevant D1–D10 / DG1–DG8
+      acceptance from a cold start: nothing weakened, deploy-once (DG4.1) still holds, teardown still
+      sacred, and the three new behaviors are demonstrated (HC1: a PR-head upgrade proven to deploy
+      PR-head; HC2: a repo-local test is *ignored* for a non-approved recipe and *run* for an approved
+      one; HC3: generic runs with an overlay present, and is skipped only with the opt-out set).
+
+When HC1–HC4 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1e.md`.
+
+---
+
+## 2. The three corrections
+
+### 2.1 HC1 — Upgrade to the PR head (not a published tag)
+Current 1d behavior: deploy previous published version, then `abra app upgrade` to the **newest
+published tag** — and because deploying the prev tag re-checks-out the recipe, the **PR-head code is
+never deployed**, so a recipe PR's changes aren't exercised by upgrade.
+
+Corrected:
+1. Deploy the **previous published version** (the last release before the code under test) as the
+   "before" state.
+2. **Restore the PR-head checkout** (re-checkout the PR ref / re-use the post-fetch snapshot — the
+   prev-tag deploy will have reset `~/.abra/recipes/<recipe>`).
+3. **Upgrade to it via `abra app deploy --chaos`** (chaos = current checkout = PR head) in place on
+   the shared deployment.
+4. Assert reconverge + still serving (as today).
+- **Adapt the "deployment moved" assertion** (`generic.do_upgrade`): prev→PR-head may *not* bump the
+  coop-cloud version label (a PR can change a recipe without a version bump), so also accept an
+  image/config change, or assert the running config now matches the PR head — keep it non-vacuous
+  without false-failing a legit unbumped PR.
+- **Non-PR `!testme`** (no PR head): "current checkout" = the catalogue current, so upgrade tests
+  prev→current — still valid.
+- Preserve **deploy-once** spirit: this is still one app deployment mutated in place (prev → chaos
+  redeploy of PR head is the upgrade op, not a fresh second app). Reconcile with the DG4.1
+  deploy-count guard — define whether a chaos redeploy counts as a "deploy" and adjust the guard so
+  the legitimate upgrade isn't flagged (e.g. count `abra app new` installs, not in-place redeploys).
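+
+One possible shape for the adapted "deployment moved" check is sketched below. It is not the
+existing `generic.do_upgrade` logic: `DeployState`, its fields, and `upgrade_took_effect` are
+hypothetical, and how the harness would actually capture a version label, image, or config digest
+is an assumption left open here.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class DeployState:
+    """Hypothetical snapshot of what is running; field names are illustrative."""
+    version_label: str   # recipe version label on the deployed app
+    image: str           # image reference of the main service
+    config_digest: str   # digest of the rendered config
+
+
+def upgrade_took_effect(before: DeployState, after: DeployState, pr_head_digest: str) -> bool:
+    """Accept a version bump, an image/config change, or a match with the PR head."""
+    moved = (
+        before.version_label != after.version_label
+        or before.image != after.image
+        or before.config_digest != after.config_digest
+    )
+    # An unbumped PR may leave the label alone, so also accept "now matches PR head".
+    return moved or after.config_digest == pr_head_digest
+```
+
+The last clause keeps a legitimate unbumped PR from false-failing; whether it is strict enough to
+stay non-vacuous is the judgment call the bullet above leaves to the Builder.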
+
+### 2.2 HC2 — Repo-local trust gate (default-deny; cc-ci overlays only)
+`install_steps.sh` and repo-local `test_*.py` are PR-author-controlled code that runs on the CI host
+with `/run/secrets/*` present — an untrusted-code risk. Operator decision (2026-05-28):
+
+- **Default:** the harness runs **only cc-ci-authored** overlays + install-steps
+  (`tests/<recipe>/…`) and the generic. Repo-local (`<recipe-repo>/tests/`) `test_*.py` and
+  `install_steps.sh` are **discovered-but-not-executed**.
+- **Approved recipes only:** repo-local code is honored **only** when the recipe is on an explicit,
+  **cc-ci-maintained approval allowlist** (default-empty ⇒ default-deny). Adding a recipe to the
+  allowlist is a deliberate cc-ci-maintainer act after reviewing that recipe's tests.
+- Update `discovery.resolve_op` / `custom_tests` / `install_steps` so the **repo-local source is
+  only consulted for allowlisted recipes**; otherwise precedence is **cc-ci > generic** only.
+- **Open (settle in DECISIONS):** the allowlist's form + location (a checked-in file like
+  `tests/repo-local-approved.txt`, or a field in a cc-ci config), and the approval workflow. Keep it
+  simple + auditable + in git.
+- (Future hardening, → IDEAS, not this phase: sandbox/network-restrict even cc-ci overlays.)
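+
+A default-deny gate of this kind might look like the sketch below. The allowlist format (one recipe
+name per line, `#` comments) is an assumption, since its form is still an open decision, and
+`load_allowlist` / `overlay_sources` are hypothetical names rather than the real
+`discovery.resolve_op` code. The returned labels are not paths and their order implies no
+precedence.
+
+```python
+from typing import FrozenSet, List
+
+
+def load_allowlist(text: str) -> FrozenSet[str]:
+    """Parse a one-recipe-per-line allowlist; blank lines and # comments are ignored."""
+    names = set()
+    for line in text.splitlines():
+        entry = line.split("#", 1)[0].strip()
+        if entry:
+            names.add(entry)
+    return frozenset(names)
+
+
+def overlay_sources(recipe: str, approved: FrozenSet[str]) -> List[str]:
+    """Return the sources the harness may execute for a recipe.
+
+    The generic and cc-ci sources are always eligible; repo-local code is
+    eligible only when the recipe is on the allowlist.
+    """
+    sources = ["generic", "cc-ci"]
+    if recipe in approved:  # default-deny: an empty allowlist approves nothing
+        sources.append("repo-local")
+    return sources
+```
+
+With an empty allowlist every recipe resolves to `["generic", "cc-ci"]`, which is the default-deny
+behavior HC2 asks for.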
+
+### 2.3 HC3 — Generic by default (additive), explicit opt-out
+Supersedes 1d's pure-override default. New rule: when a recipe ships an overlay for an op, **both the
+generic and the overlay run** for that op by default; the generic is skipped **only** when an
+explicit opt-out is set.
+
+- **Opt-out mechanism (propose; settle in DECISIONS):** an env flag `CCCI_SKIP_GENERIC` (all ops) and
+  per-op `CCCI_SKIP_GENERIC_<OP>` (e.g. `..._UPGRADE`), settable via the recipe's `recipe_meta.py`
+  (a `SKIP_GENERIC` list) so it's declarative per recipe, not a hidden global.
+- **Op-vs-assertion split (required by additive + deploy-once):** a mutating op (upgrade/backup/
+  restore) must run **once**, then **both** the generic assertions and the overlay assertions
+  evaluate the post-op state — never upgrade/backup twice. So refactor the tiers: the **orchestrator
+  performs the op once** (the harness owns the op), then runs generic assertions (unless opted out) +
+  overlay assertions against the shared post-op deployment. For `install` (no op) both assertion sets
+  just run. This keeps deploy-once and one-op-per-tier intact.
+- Net effect: the generic "is it actually serving / did the upgrade move / snapshot produced" floor
+  is **always** exercised unless a recipe explicitly declares it skips generics — overlays add, they
+  don't silently subtract.
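+
+The opt-out rule and the op-vs-assertion split can be sketched together. This assumes the proposed
+flag names (`CCCI_SKIP_GENERIC`, `CCCI_SKIP_GENERIC_<OP>`) and a per-recipe `SKIP_GENERIC` list;
+the accepted truthy values and the helper names `generic_skipped` and `run_tier` are hypothetical.
+
+```python
+from typing import Callable, Iterable, List, Mapping, Optional
+
+
+def generic_skipped(op: str, env: Mapping[str, str], skip_generic: Iterable[str] = ()) -> bool:
+    """True only when an explicit opt-out covers this op."""
+    def is_set(name: str) -> bool:
+        # Assumed convention: "1", "true" or "yes" switch a flag on.
+        return env.get(name, "").strip().lower() in {"1", "true", "yes"}
+
+    return (
+        is_set("CCCI_SKIP_GENERIC")
+        or is_set(f"CCCI_SKIP_GENERIC_{op.upper()}")
+        or op in set(skip_generic)
+    )
+
+
+def run_tier(
+    op: str,
+    perform_op: Optional[Callable[[], None]],
+    generic_assert: Callable[[], None],
+    overlay_assert: Optional[Callable[[], None]],
+    env: Mapping[str, str],
+    skip_generic: Iterable[str] = (),
+) -> List[str]:
+    """Perform the mutating op once, then evaluate both assertion sets."""
+    ran: List[str] = []
+    if perform_op is not None:  # install has no separate op
+        perform_op()
+        ran.append("op")
+    if not generic_skipped(op, env, skip_generic):
+        generic_assert()
+        ran.append("generic")
+    if overlay_assert is not None:
+        overlay_assert()
+        ran.append("overlay")
+    return ran
+```
+
+The op runs before either assertion set and is never repeated, so the generic and overlay
+assertions both evaluate the same post-op state.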
+
+---
+
+## 3. Method / milestones (bounded)
+- **E0 — HC2 trust gate.** Gate repo-local behind the approval allowlist (default-deny); cc-ci+generic
+  only otherwise. *Accept:* repo-local ignored for a non-approved recipe, run for an approved one.
+- **E1 — HC3 additive + op/assertion split.** Generic runs alongside overlays by default; op runs
+  once; opt-out env skips the generic assertions. *Accept:* overlay + generic both run on one
+  deployment; opt-out skips generic; deploy-count still 1.
+- **E2 — HC1 upgrade-to-PR-head.** prev-release → PR-head via `deploy --chaos`; moved-assertion
+  adapted; deploy-count guard reconciled. *Accept:* upgrade demonstrably deploys PR-head.
+- **E3 — HC4 cold re-verification + docs.** Adversary cold-verifies no regression + the three new
+  behaviors; update `docs/` + `machine-docs/DECISIONS.md`; flip `STATUS-1e.md` to `## DONE`.
+
+---
+
+## 4. Guardrails
+- **Never weaken a test** — these are correctness/security fixes; the cardinal rule still wins.
+- **Default-secure** — repo-local PR code is off unless the recipe is explicitly approved; the
+  allowlist lives in git and is auditable.
+- **Floor-by-default** — the generic baseline always runs unless a recipe explicitly opts out.
+- **Deploy-once preserved** — one app deployment, one teardown; ops run once; reconcile the DG4.1
+  guard with the chaos-upgrade redeploy.
+- **Bounded** — three fixes + verification, then stop; bigger hardening (sandboxing) → IDEAS.
+
+## 5. Open decisions (log in machine-docs/DECISIONS.md)
+- HC2: approval-allowlist form/location + the approval workflow.
+- HC3: opt-out flag name/granularity + declaring it via `recipe_meta.py`.
+- HC1: how the DG4.1 deploy-count guard treats an in-place chaos upgrade (don't flag the legit op).
--- a/cc-ci-plan/plan-phase2-recipe-tests.md
+++ b/cc-ci-plan/plan-phase2-recipe-tests.md
@ -1,8 +1,13 @@
 # cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)

-**Status:** QUEUED — starts after Phase 1 (`plan.md`) and the Phase-1b review/lint pass
-(`plan-phase1b-review-lint.md`) reach `## DONE`.
-**Builds on:** the Phase-1 cc-ci CI server (`plan.md`). This phase adds **test content**, not infra.
+**Status:** QUEUED — starts after Phase 1b (`plan-phase1b-review-lint.md`) **and Phase 1d**
+(`plan-phase1d-generic-test-suite.md`) reach `## DONE`.
+**Builds on:** the Phase-1 cc-ci CI server (`plan.md`) **and Phase 1d's generic test suite** — every
+recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment.
+So this phase is **not** "write every test from scratch": it's **authoring the recipe overlays**
+(`test_<op>.py` that override/extend the generic, per 1d's model) + **defining custom install steps**
+for recipes that fail the generic form, porting the recipe-maintainer corpus as overlays. This phase
+adds **test content**, not infra.
 **Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing,
 human-maintained recipe tests — the canonical source to port from).
 **Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7).
@ -10,6 +15,44 @@ human-maintained recipe tests — the canonical source to port from).

 ---

+## 0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending)
+
+Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, future)
+do **not** need operator input to wire OIDC. The canonical pattern lives at
+**`plan-sso-dep-testing.md`**: the recipe declares `DEPS = ["keycloak"]` in `recipe_meta.py`, the
+orchestrator co-deploys the dep, and the recipe's `setup_custom_tests.sh` reads `$CCCI_DEPS_FILE`,
+writes the OIDC env vars, and injects the client secret via abra. Authenticated tests use `harness.sso.oidc_password_grant` or
+Playwright on the dep's login page. **`DEFERRED.md` entries that cite "operator input needed for
+OIDC" are mis-categorised** — re-open and execute them per this plan.
+
+## 0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work)
+
+A recipe that is not yet mirrored at `git.autonomic.zone/recipe-maintainers/<recipe>` is **not** an
+operator-pending blocker. The bot is admin on `recipe-maintainers` (see memory
+`cc-ci-gitea-recipes`) and can create
+private mirror repos. **For any recipe you want to enroll/test that isn't mirrored yet, mirror it
+yourself** before enrolling, based on the **`recipe-create-pr` skill** —
+`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (which references
+`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` for the full procedure).
+
+The flow (adapt the skill's command for the new-mirror case):
+1. Create the **private** mirror repo on `git.autonomic.zone/recipe-maintainers/<recipe>` (Gitea API
+   POST `/orgs/recipe-maintainers/repos`, bot creds from `.testenv`/§1.5).
+2. Mirror the upstream `git.coopcloud.tech/coop-cloud/<recipe>` (clone --mirror → push, including
+   tags) so the mirror's `main` is upstream-synced and tags carry over.
+3. Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this
+   phase).
+
+Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes.
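
Step 1 of the flow above can be sketched as a single Gitea API request. This is a minimal sketch under assumptions: the helper name `build_create_mirror_request` is hypothetical, the token would come from the bot creds mentioned above, and the request is only built here, not sent. The tag-preserving mirror push of step 2 is a separate `git` operation and is not shown.

```python
import json
import urllib.request


def build_create_mirror_request(base_url: str, org: str, recipe: str,
                                token: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates a private repo in an org."""
    payload = json.dumps({"name": recipe, "private": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/orgs/{org}/repos",
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Gitea accepts personal access tokens in this header form.
            "Authorization": f"token {token}",
        },
    )
```

Sending it would be a `urllib.request.urlopen(request)` call. Building the request separately keeps the payload inspectable before anything touches the org.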
+
+## 0a. Prerequisite: Phase 1e harness corrections (must be DONE first)
+
+The 1d/1b operator review produced three shared-harness corrections, now their own phase
+**`plan-phase1e-harness-corrections.md`** (runs **before** this phase). Do **not** author overlays
+until 1e is `## DONE`: it changes the foundation every overlay sits on — (HC1) upgrade goes
+prev-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved
+recipes** (default cc-ci-overlays + generic only); (HC3) the **generic runs by default alongside an
+overlay**, skipped only via an explicit opt-out. See that plan for detail.
+
 ## 0. Relationship to Phase 1 (read first)

 Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the
--- a/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md
+++ b/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md
@ -0,0 +1,113 @@
+# cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan)
+
+**Status:** QUEUED — the **LAST** phase, runs after Phase 3 (`plan-phase3-results-ux.md`). A bounded
+final review/lint/cleanup pass over the **entire** codebase as it stands after all phases, ending in a
+**full cold re-verification that nothing regressed**.
+**Transition:** auto (last in the launcher sequence); after it, the whole build is done.
+**Builds on:** everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/`nix/` modules, the
+runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results
+UX, docs, `machine-docs/`).
+**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
+**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md`
+**Phase order:** 1c → 1b → 1d → 1e → 2 → 2b → 3 → **4 (final)**.
+
+---
+
+## 0. Why this phase (and why it's bounded)
+
+This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot —
+recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on
+the foundation. Before calling the build done, do one **bounded** pass to clean and harden it, and —
+critically — **re-verify from a cold start that none of the growth/cleanup regressed any earlier
+guarantee.** Same discipline as 1b: **good-enough + enforceable**, style→tooling, judgment→checklist,
+don't reopen settled design, and **never weaken a test** to satisfy a nit.
+
+---
+
+## 1. Definition of Done (Phase 4 exit condition)
+
+Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
+`machine-docs/REVIEW-4.md`):
+
+- [ ] **F1 — Lint/format green across the whole codebase.** Re-run the 1b toolchain (alejandra/
+      statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2
+      overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g.
+      dashboard front-end) so it's covered going forward. The `.drone.yml` lint stage still passes
+      from a clean checkout; prove with a break-it probe.
+- [ ] **F2 — White-box review checklist over all post-1b code.** Run the §3 checklist; fix every
+      **blocking** finding, triage advisories to `BACKLOG`/`IDEAS`. Findings + resolutions in
+      `machine-docs/REVIEW-4.md`.
+- [ ] **F3 — Cleanup.** Remove dead code/scaffolding and stale TODOs; consistent naming/structure;
+      reconcile `machine-docs/` (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the
+      final state. No behavior change beyond what F2 mandates.
+- [ ] **F4 — FULL cold re-verification (the final gate).** *After* F1–F3 land, the Adversary
+      **independently re-verifies every prior Definition-of-Done from a cold start**, to the same bar
+      each phase used — fresh PASS + evidence + timestamps in `machine-docs/REVIEW-4.md` within 24h,
+      **nothing weakened/skipped/softened** by the cleanup:
+      - **Phase 1 D1–D10** (incl. the genuine **D8** byte-identical fresh-clone rebuild + a
+        category-spanning live `!testme` e2e through the public gateway).
+      - **Phase 1c C1–C7** (secrets-in-git, cert-in-sops, honest reproducibility).
+      - **Phase 1d DG1–DG8** (generic install/upgrade/backup/restore, deploy-once `DG4.1`, override
+        floor) **as amended by 1e**.
+      - **Phase 1e HC1–HC3** (upgrade→PR-head via `deploy --chaos`; repo-local gated to approved
+        recipes; generic-by-default + explicit opt-out).
+      - **Phase 2** recipe-coverage criteria (every enrolled recipe's overlays/ported tests real,
+        DRY, green).
+      - **Phase 2b** performance claims (the measured improvements still hold; no test weakened to
+        get them).
+      - **Phase 3** results/level/UX criteria (per-run level honest, PR comment + dashboard correct).
+- [ ] **F5 — Documented + cold-verified.** Final `docs/` accurate (install reproduces from scratch;
+      enroll-recipe + overlay/approval flow correct); accepted deviations in `DECISIONS.md`; the
+      Adversary confirms F1–F4 with no standing VETO and no open `[adversary]` finding.
+
+When F1–F5 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-4.md` — the build is
+complete.
+
+---
+
+## 2. Method
+1. **Lint/format first** (F1) — re-run + extend; auto-fix style, don't deliberate.
+2. **Review checklist** (F2, §3) — classify blocking vs advisory; fix blocking, triage rest.
+3. **Cleanup** (F3) — dead code, naming, docs, `machine-docs/` reconciliation.
+4. **Full cold re-verification LAST** (F4) — once everything has landed, the Adversary re-runs the
+   entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → *then*
+   full re-verify. Cleanup must regress nothing.
+5. **Bound it** — a pass, not a rewrite; record dead-ends/deviations and stop.
+
+## 3. White-box review checklist (teeth, not taste) — whole codebase
+Blocking unless noted (plan-relevant invariants visible only by reading code):
+- **Tests are real** (blocking) — every generic/overlay/custom test asserts actual app state; no
+  `skip`/`xfail`/can't-fail; per-op `pass/fail/skip` honest; the 1d/1e anti-vacuous guards
+  (`assert_serving` routing proof, `do_upgrade` "moved", deploy-count==1) intact.
+- **1e corrections intact** (blocking) — repo-local code still gated to approved recipes; generic
+  still runs by default (opt-out explicit); upgrade still targets the PR head.
+- **Generic-first / custom-additive invariant** (blocking — `docs/testing.md`). Confirm no path
+  makes the generic tier depend on custom: deps deploy + `setup_custom_tests` run **after** all
+  generic tiers, never before; a forced `setup_custom_tests` failure still yields a clean
+  generic-tier `pass/pass/pass/pass` + `skip(deps-not-ready)` for `@requires_deps` custom tests
+  (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with
+  the generic tier alone — verify that path stays viable.
+- **Harness DRY** (blocking-ish) — recipe quirks are data (`recipe_meta.py`), not shared-harness
+  conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic.
+- **Server state Nix-declared & idempotent** (blocking) — no imperative drift / run-once sentinels /
+  manual post-rebuild steps; the `nix/` layout clean.
+- **No footguns** (blocking) — no bare `sleep` for readiness (poll); teardown in `finally`; secrets
+  reused per run not regenerated; no hardcoded versions/domains that break upstream.
+- **No secrets in code/committed files** (blocking) — grep source/configs/`.drone.yml`/fixtures;
+  log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data).
+- **Phase-3 UX correctness** (advisory→blocking on real drift) — the displayed level/badge/screenshot
+  reflect the true per-op results; no misleading "pass".
+- **Architecture matches the plans; deviations in `DECISIONS.md`** (advisory→blocking on real drift).
+- **Readability & docs** (advisory) — clear names, dead code removed, docs reproduce from scratch.
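
The "no footguns" item (poll for readiness instead of a bare `sleep`, teardown in `finally`) can be illustrated as follows. This is a generic sketch, not the harness's code: `wait_until` and `run_with_teardown` are hypothetical helper names, and the callables stand in for whatever the runner actually uses.

```python
import time


def wait_until(check, timeout_s=300.0, interval_s=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns truthy or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def run_with_teardown(deploy, is_ready, run_tests, teardown):
    """Deploy, wait for readiness by polling, run tests, and always tear down."""
    try:
        deploy()
        if not wait_until(is_ready):
            raise TimeoutError("app never became ready")
        return run_tests()
    finally:
        # Runs on success, test failure, and readiness timeout alike.
        teardown()
```

A reviewer applying the checklist would look for this shape: no fixed-duration sleep standing in for readiness, and no code path that skips teardown.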
+
+## 4. Guardrails
+- **Never weaken a test** to satisfy a lint/review/cleanup nit (cardinal rule wins).
+- **Don't reopen settled design** — clean + harden + re-verify; bigger ideas → `IDEAS.md`.
+- **Bounded** — one pass; cap iterations; record + stop.
+- **Cleanup regresses nothing** — F4 is the proof; if a cleanup breaks a prior guarantee, revert the
+  cleanup, not the guarantee.
+
+## 5. Open decisions (log in machine-docs/DECISIONS.md)
+- Any new linters/formatters for Phase-3 front-end / new areas, and their strictness.
+- The precise blocking-vs-advisory split for the §3 checklist on the new code.
+- Whether to add Python type-checking now or defer to `IDEAS.md` (carried from 1b).
--- a/cc-ci-plan/plan-sso-dep-testing.md
+++ b/cc-ci-plan/plan-sso-dep-testing.md
@ -0,0 +1,205 @@
+# SSO-dep testing pattern (OIDC + co-deployed provider) — reference plan
+
+**Status:** active reference for Phase 2 (does not need its own phase; it's a pattern Phase 2
+overlays apply per recipe).
+**Operator clarification (2026-05-28):** integrating a recipe with an OIDC/SSO dep is **loop work,
+not operator work**. Anything that was deferred citing "operator input needed for OIDC" should be
+re-opened and done autonomously per this plan.
+**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-sso-dep-testing.md`
+**Companion:** the running harness in `runner/harness/sso.py` (existing primitives:
+`setup_keycloak_realm`, `oidc_password_grant`, `assert_discovery_endpoint`).
+
+---
+
+## 0. Why this plan
+
+Several recipes test their authenticated functionality through an OIDC/SSO provider (keycloak,
+authentik). The cc-ci pattern is to **co-deploy the provider with the recipe under test in the same
+ephemeral run** — one shared deployment per dep, configured at install time, used by the
+recipe-under-test's authenticated tests, torn down with it. This file is the canonical pattern for
+how to wire that up so any recipe that declares `DEPS = ["keycloak"]` (or `["authentik"]`) Just Works
+without per-recipe ad-hoc plumbing. Recipes that need OIDC are not blocked on the operator — they
+follow this plan.
+
+## 1. The DEPS model — deps deploy AFTER generic tiers (operator-2026-05-28)
+
+**Critical ordering rule:** generic tiers (install / upgrade / backup / restore) run against the
+**recipe alone, with no dep available**, so a failure in dep-deploy or OIDC setup **cannot break
+generic-tier signal**. Deps + OIDC wiring move to a **`setup_custom_tests` step** that runs *after*
+generic tiers and *before* the custom tier — its failure is isolated to the SSO-marked custom tests.
+
+A recipe's `tests/<recipe>/recipe_meta.py` declares its SSO dep:
+```python
+DEPS = ["keycloak"]   # or ["authentik"] when that backend lands
+```
+
+### Lifecycle order (single run, per recipe)
+
+```
+1. Deploy recipe-under-test ALONE (no deps, OIDC env unset or stubbed).
+   - app_new #1 for the recipe; generic install_steps.sh runs RECIPE-ONLY setup (no deps).
+2. INSTALL tier:   generic [+ overlay] assertions against the recipe alone.
+3. UPGRADE tier:   abra app upgrade in place, assertions against the recipe alone.
+4. BACKUP tier:    in place (if backup-capable), recipe-alone marker.
+5. RESTORE tier:   in place, recipe-alone marker.
+6. setup_custom_tests step  ← NEW (operator-2026-05-28)
+   a. For each dep in DEPS, deploy + provision realm/client via harness.sso.setup_<provider>_realm.
+   b. Write $CCCI_DEPS_FILE with each dep's {domain, realm, client_id, client_secret, admin_*}.
+   c. Run the per-recipe post-deps hook `tests/<recipe>/setup_custom_tests.sh` to wire the OIDC
+      env into the running recipe (abra app config set + abra app secret insert) and trigger an
+      in-place redeploy of the affected services so the env takes effect.
+   d. Mark deps-ready = True on success; on ANY failure mark deps-ready = False and CONTINUE
+      (log the error; do NOT abort the run).
+7. CUSTOM tier:
+   - If deps-ready: run all custom tests, including those tagged @pytest.mark.requires_deps.
+   - If NOT deps-ready: still run custom tests, but tests tagged @pytest.mark.requires_deps are
+     reported as SKIP (reason "deps-not-ready", with the captured setup_custom_tests error
+     attached). Non-deps custom tests still run normally.
+8. Teardown (in finally): recipe first; then each dep in reverse declaration order.
+```
+
+### DG4.1 deploy-count guard, generalised
+The "one deploy per run" guard becomes **one `abra app new` per app in the run** (recipe + each
+dep). In-place reconfigure-and-redeploy (the step 6c env update) is **NOT** a fresh `app_new` and
+does NOT increment the per-recipe count. So a run with `DEPS = ["keycloak"]` has exactly 2
+`app_new` calls (recipe + keycloak), no matter how many tiers ran. The per-run summary reports
+deploy-count per app for verification.
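
The generalised guard can be sketched as a per-app counter. The class and method names below are hypothetical, not the real DG4.1 implementation. The point is that only `app_new` increments the count, while an in-place redeploy is recorded as a no-op.

```python
from collections import Counter
from typing import Dict, Iterable


class DeployCountGuard:
    """Tracks `app_new` calls per app; in-place redeploys are not counted."""

    def __init__(self) -> None:
        self._app_new: Counter = Counter()

    def record_app_new(self, app: str) -> None:
        self._app_new[app] += 1

    def record_redeploy(self, app: str) -> None:
        """In-place reconfigure-and-redeploy: intentionally not counted."""

    def violations(self, expected_apps: Iterable[str]) -> Dict[str, int]:
        """Return apps whose app_new count is not exactly 1, plus unexpected apps."""
        expected = set(expected_apps)
        seen = expected | set(self._app_new)
        counts = {app: self._app_new.get(app, 0) for app in seen}
        return {app: n for app, n in counts.items()
                if app not in expected or n != 1}
```

For a run with `DEPS = ["keycloak"]`, the expected apps are the recipe and keycloak. An empty `violations(...)` result corresponds to the "exactly 2 `app_new` calls" condition above.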
+
+### Why this ordering
+- **Generic-tier signal is preserved** when SSO/dep setup is broken — the recipe's own deploy/
+  upgrade/backup/restore behaviour is still tested honestly.
+- **Failure isolation**: a recipe whose generic tier passes but whose SSO setup is broken yields
+  per-op `pass/pass/pass/pass/skip(deps-not-ready)` — far more useful than the previous
+  all-or-nothing.
+- A recipe that genuinely can't boot without OIDC fails its generic install honestly (the recipe
+  should accept a stubbed/empty OIDC env at install time and only require the env when an
+  authenticated endpoint is hit). That's a real recipe finding, not a CI artifact.
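
The ordering and failure isolation described in this section can be sketched as a small driver. This is a hedged sketch only: `run_lifecycle` and its callable parameters are hypothetical and do not mirror `runner/run_recipe_ci.py`. Teardown is omitted to keep the example short.

```python
from typing import Callable, Dict, Tuple

GENERIC_TIERS = ("install", "upgrade", "backup", "restore")


def run_lifecycle(
    run_tier: Callable[[str], bool],
    setup_custom_tests: Callable[[], None],
    custom_tests: Dict[str, Tuple[bool, Callable[[], bool]]],
) -> Dict[str, str]:
    """Run generic tiers first, then dep setup, then custom tests."""
    results: Dict[str, str] = {}

    # Generic tiers run against the recipe alone, before any dep exists.
    for tier in GENERIC_TIERS:
        results[tier] = "pass" if run_tier(tier) else "fail"

    # Dep setup failure is captured, never raised: the run continues.
    deps_error = None
    try:
        setup_custom_tests()
    except Exception as exc:
        deps_error = str(exc)

    # custom_tests maps name -> (requires_deps, test callable).
    for name, (requires_deps, test) in custom_tests.items():
        if requires_deps and deps_error is not None:
            results[name] = f"skip(deps-not-ready: {deps_error})"
        else:
            results[name] = "pass" if test() else "fail"
    return results
```

Because the generic results are written before `setup_custom_tests` is attempted, a dep failure cannot change them retroactively. That is the isolation property this section argues for.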
+
+## 2. Provider pluggability
+
+- **Provider-agnostic primitives** (today, in `harness/sso.py`) — these stay pluggable:
+  - `oidc_password_grant(discovery_url, client_id, client_secret, username, password) -> token` —
+    pure OIDC; works against any compliant provider.
+  - `assert_discovery_endpoint(discovery_url, expected_issuer)` — pure OIDC.
+- **Provider-specific setup** (admin API calls) — one function per provider:
+  - `setup_keycloak_realm(domain, admin_user, admin_password, realm, client_id, redirect_uris) ->
+    {client_secret, discovery_url}` — exists today.
+  - `setup_authentik_realm(...)` — same shape, authentik admin API; **deferred** to a future Q4
+    enrollment that actually wants authentik (see `machine-docs/DEFERRED.md`). Pluggable: a recipe
+    declaring `DEPS = ["authentik"]` would just call this instead. No change to the per-recipe
+    `setup_custom_tests.sh` shape (§3.2) beyond which provider it reads from `$CCCI_DEPS_FILE`.
+- **Don't write per-recipe SSO logic.** All recipes use the same `DEPS` + hook-script shape (§3).
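
One way to keep provider-specific setup pluggable is a registry keyed by the `DEPS` entry. The sketch below is illustrative only: `PROVIDER_SETUP` and `setup_dep` are hypothetical names, and the placeholder function merely stands in for `harness.sso.setup_keycloak_realm`, whose real body calls the Keycloak admin API.

```python
from typing import Callable, Dict


def _setup_keycloak_placeholder(**kwargs) -> dict:
    """Stand-in for harness.sso.setup_keycloak_realm (no network calls here)."""
    return {"provider": "keycloak", "realm": kwargs["realm"]}


# One entry per provider; adding authentik later means adding one key.
PROVIDER_SETUP: Dict[str, Callable[..., dict]] = {
    "keycloak": _setup_keycloak_placeholder,
}


def setup_dep(provider: str, **kwargs) -> dict:
    """Dispatch to the provider-specific realm setup named in DEPS."""
    try:
        setup = PROVIDER_SETUP[provider]
    except KeyError:
        raise NotImplementedError(
            f"no realm setup registered for {provider!r}"
        ) from None
    return setup(**kwargs)
```

With this shape, a recipe declaring an unsupported provider fails loudly at setup time instead of silently skipping its SSO wiring.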
+
+## 3. Per-recipe hooks — two distinct scripts (recipe-only vs post-deps)
+
+A recipe with `DEPS = ["keycloak"]` ships **two** optional hook scripts (either may be absent if
+not needed):
+
+### 3.1 `tests/<recipe>/install_steps.sh` — RECIPE-ONLY setup, runs at install time
+This is the Phase-1d custom-install-steps hook. It runs **before** the recipe deploys, **with no
+dep available** (the dep hasn't been deployed yet at this point). Use it only for recipe-only
+setup that the recipe needs to boot at all (e.g. seed a fixture, set a non-OIDC env). **Do NOT
+read `$CCCI_DEPS_FILE` here** — it doesn't exist yet. If the recipe requires OIDC to *boot at
+all*, set a safe stub here (e.g. disable auth) so the recipe can come up for generic tiers; the
+real OIDC wiring happens in §3.2.
+
+### 3.2 `tests/<recipe>/setup_custom_tests.sh` — POST-DEPS wiring, runs after generic tiers
+This is the new (operator-2026-05-28) hook that wires the recipe to its already-deployed dep,
+*after* the generic tiers have run. The orchestrator has already deployed each dep and written
+`$CCCI_DEPS_FILE` by the time this runs. Roughly:
+
+```sh
+#!/usr/bin/env bash
+set -euo pipefail
+# Read the dep's connection info from $CCCI_DEPS_FILE (orchestrator-written).
+KC_DOMAIN=$(jq -r '.keycloak.domain'        "$CCCI_DEPS_FILE")
+KC_CLIENT=$(jq -r '.keycloak.client_id'     "$CCCI_DEPS_FILE")
+KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE")
+KC_REALM=$( jq -r '.keycloak.realm'         "$CCCI_DEPS_FILE")
+
+# Inject the OIDC client secret as an abra app secret (recipe-conventional name varies — match
+# the recipe's .env.sample SECRET_*).
+echo "$KC_SECRET" | abra app secret insert -n "$CCCI_APP_DOMAIN" oidc_rpcs v1 -
+
+# Write the OIDC env vars to the parent .env (names per the recipe's .env.sample).
+abra app config set "$CCCI_APP_DOMAIN" \
+  OIDC_REALM="$KC_REALM" \
+  OIDC_OP_DISCOVERY_ENDPOINT="https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration" \
+  OIDC_OP_AUTHORIZATION_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth" \
+  OIDC_OP_TOKEN_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token" \
+  OIDC_OP_USER_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo" \
+  OIDC_RP_CLIENT_ID="$KC_CLIENT" \
+  OIDC_RP_REDIRECT_URI="https://${CCCI_APP_DOMAIN}/auth/oidc/callback"
+
+# Force an in-place redeploy of the affected services to pick up the new env. This is NOT a fresh
+# app_new (deploy-count guard still 1 for this recipe).
+abra app deploy --force --chaos --no-input "$CCCI_APP_DOMAIN"
+```
+
+The OIDC env-var **names are recipe-specific** (`OIDC_OP_*` for lasuite-docs, different prefixes
+elsewhere). Read the recipe's `.env.sample` to see which keys the recipe expects; the *values* follow
+this template. If a recipe needs more than this (extra group/claim mappings, etc.), extend its
+`setup_custom_tests.sh` only — never the shared harness.
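
The same values can be derived on the Python side, for example in a test that needs the token URL. This is a minimal sketch assuming `$CCCI_DEPS_FILE` is JSON keyed by dep name, as the `jq` expressions above imply. `load_dep` and `keycloak_oidc_urls` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_dep(deps_file: Path, name: str) -> dict:
    """Return one dep's connection info from the orchestrator-written JSON file."""
    deps = json.loads(deps_file.read_text(encoding="utf-8"))
    return deps[name]


def keycloak_oidc_urls(domain: str, realm: str) -> dict:
    """Build the Keycloak OIDC endpoint URLs using the template from the script above."""
    base = f"https://{domain}/realms/{realm}"
    oidc = f"{base}/protocol/openid-connect"
    return {
        "discovery": f"{base}/.well-known/openid-configuration",
        "authorization": f"{oidc}/auth",
        "token": f"{oidc}/token",
        "userinfo": f"{oidc}/userinfo",
    }
```

Only the URL values are shared across recipes. The env-var names they are assigned to remain recipe-specific, as noted above.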
+
+## 4. Test pattern: authenticated endpoints (mark + isolate)
+
+- **Mark dep-requiring tests:** every custom test that needs the dep up + OIDC wired must use
+  `@pytest.mark.requires_deps`. The orchestrator skips these with reason `"deps-not-ready: <err>"`
+  if `setup_custom_tests` failed. Non-deps custom tests are unaffected by SSO setup failures.
+- **Headless API tests** — use `harness.sso.oidc_password_grant` to mint an access token, then call
+  the recipe's authenticated endpoint with `Authorization: Bearer <token>`. Asserts on the response.
+- **Browser flows (Playwright)** — navigate to the recipe, follow the redirect to keycloak, fill the
+  pre-provisioned test user's credentials, return to the recipe, exercise the UI. (Use the
+  pre-provisioned `ci-user@example.com` / known password the realm setup creates.)
+- **The realm/client is fresh per run** — no cross-run state, no shared accounts. The realm setup
+  creates one or more test users with known passwords (pass-through from a per-run secret) so the
+  tests can authenticate without prompts.
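
The headless pattern boils down to a standard OAuth 2.0 password-grant request followed by a bearer-authenticated call. The sketch below only builds the request and header. It is not the body of `harness.sso.oidc_password_grant`, which takes a discovery URL, and the function names here are hypothetical.

```python
import urllib.parse
import urllib.request


def build_password_grant_request(token_url: str, client_id: str, client_secret: str,
                                 username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a resource-owner password grant token request."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def bearer_header(access_token: str) -> dict:
    """Header to attach when calling the recipe's authenticated endpoint."""
    return {"Authorization": f"Bearer {access_token}"}
```

In a real test, the JSON response to the token request carries an `access_token` field, which is then passed to `bearer_header` for the recipe call.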
+
+## 5. Concrete recipes that use this pattern (Phase-2 scope)
+
+These are **loop work** under this plan, not deferred:
+
+- **lasuite-docs** — `DEPS = ["keycloak"]`; ports the upstream `oidc_login.py` +
+  `upload_conversion.py` parity tests + the §4.3-prescribed `create-a-doc + read-back via
+  authenticated /api/v1.0/documents/`. (Re-enters `DEFERRED.md` entry #5 — this plan IS the
+  re-entry, not operator input.)
+- **cryptpad** — `DEPS = ["keycloak"]` (cryptpad upstream tests use authentik, but a keycloak-backed
+  cryptpad OIDC test is equally valid and uses the same primitives). The cryptpad create-a-pad
+  Playwright test (DEFERRED #6) is a separate concern — that one really does need a stable
+  CryptPad app-launch contract; it stays deferred.
+- **lasuite-drive, lasuite-meet** — same pattern when mirrored (`recipe-create-pr` skill — loop work).
+- Any future recipe that requires OIDC follows this plan; no operator handoff.
+
+## 6. What stays deferred (genuinely operator-input)
+
+- **authentik enrollment + `setup_authentik_realm` backend** (DEFERRED #9) — provider breadth, not
+  blocking any Phase-2 recipe under keycloak. Open question for the operator: do we want
+  cross-provider coverage as part of Phase-2 DONE? If yes, lift; if not, leave deferred.
+- The `--extra-tests` flag IDEA is **not** a precondition for this plan; OIDC-dep tests are part
+  of the default suite for the recipes that need them.
+
+## 7. Definition of done for this pattern
+- [ ] `DEPS = [...]` honored by `runner/run_recipe_ci.py`, with the **deps-AFTER-generic** ordering
+      (§1): deps deploy + `setup_custom_tests` step runs between RESTORE and CUSTOM tiers;
+      `$CCCI_DEPS_FILE` written; deps torn down LAST in reverse order.
+- [ ] **Failure isolation proven:** a forced `setup_custom_tests` failure (e.g. simulate keycloak
+      realm-setup error) yields a run where generic tiers report **pass** and CUSTOM
+      `requires_deps` tests report **skip(deps-not-ready)** — no false fail of the generic tier,
+      no aborted run.
+- [ ] **lasuite-docs** ships `tests/lasuite-docs/setup_custom_tests.sh` per §3.2 + authenticated
+      tests per §4 marked `@pytest.mark.requires_deps` (closes DEFERRED #5 — keep the entry there
+      with the closing commit, do not re-defer).
+- [ ] At least one other OIDC-dep recipe (cryptpad oidc_login or a lasuite-* once mirrored) lands
+      cold-green using the same pattern, demonstrating reuse.
+- [ ] `docs/sso-dep-testing.md` (in the cc-ci repo) explains the pattern for future recipe
+      enrollments — link to this plan.
+- [ ] Adversary cold-verifies the full run for one such recipe + the forced-failure isolation
+      case, posts PASS in `REVIEW-2.md`.
+
+## 8. Mirror+enroll reminder (also loop work)
+
+If a recipe in scope (e.g. `lasuite-drive`, `lasuite-meet`, `immich`) **isn't mirrored to
+`git.autonomic.zone/recipe-maintainers/`**, mirror it autonomously via the `recipe-create-pr` skill
+at `/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (see also
+`plan-phase2-recipe-tests.md §0b`). Mirror+enroll is **not** operator-pending; the bot is admin on
+the org.
--- a/cc-ci-plan/plan.md
+++ b/cc-ci-plan/plan.md
@ -126,7 +126,13 @@ without the auth key.

 - **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
  (`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
-  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). The agent feeds these into the
+  `/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0).
+  > **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer
+  > an out-of-band operator file-drop — it is now **sops-encrypted in the private `cc-ci-secrets` repo**
+  > (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays
+  > operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts
+  > the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key
+  > at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`.
+
+  The agent feeds the cert and key into the
  `coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key` swarm secrets (wildcard/file-provider
  mode) and runs **no ACME** for this domain. **Do not request or expect a `commoninternet.net` DNS
  token** — issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal
@ -597,8 +603,37 @@ its own pacing. To make concurrent writes conflict-free:
    merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
    fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
    closed after re-test.
+  - `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops
+    have deliberately decided not to do autonomously and that need operator input to move on.**
+    Append-only; either agent may file. Each entry should clearly say *what's needed from the
+    operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call,
+    plain "go ahead"). The list is **open-ended** — items can sit indefinitely, **no obligation
+    to close every item**, closure is operator-driven. A re-entry trigger / IDEA cross-link is
+    **optional** (include when there's a natural mechanism, e.g. an opt-in flag in
+    `cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" / buried JOURNAL note
+    — file them here so the operator can review the whole list. The Phase-4 cleanup pass should
+    **surface** DEFERRED.md to the operator at least once but does **not** force closure.
+    Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED
+    is for considered-and-parked work the loops won't tackle without operator input.
 - **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
  conflict. Prefer appending over rewriting.
+- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous
+  adversarial verification requires the Adversary NOT to consume the Builder's rationalisations
+  before forming its verdict. The split:
+  - `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding
+    verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW**
+    to verify (the exact command/check the Adversary can re-run from its own clone), the
+    **EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and
+    **WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify,
+    it goes in STATUS.
+  - `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative /
+    dead-ends explored. Those go in `JOURNAL.md` (written only by the Builder).
+  - The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code /
+    git history; it forms its verdict from those + its own **cold** acceptance run, and does **not**
+    read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine
+    (e.g. to contextualise a finding) — note in REVIEW that you did.
+
+  In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.**
 - **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
  smallest change, commit, `git push`. On a rebase conflict, it will be inside the *other* agent's
  file/section only if a rule was broken — re-pull and keep to your own files. Never `--force`.
@ -613,6 +648,22 @@ its own pacing. To make concurrent writes conflict-free:
 - **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
  the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
  doing independent work — neither loop blocks idle waiting on the other beyond its gate.
+- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to
+  the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for
+  early-look, a "I refactored X, please re-verify Y", an observation outside the normal flow). For
+  those, use the inbox files in `machine-docs/`:
+  - **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its
+    own clone, commits, pushes.
+  - **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its
+    own clone, commits, pushes.
+  - The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings
+    the receiver. The receiver, on receipt, reads + processes the message, then **deletes the
+    inbox file** (commits + pushes) — deletion is the "message consumed" signal. Single-writer
+    discipline: only the sender writes their counterpart's inbox; only the receiver deletes it.
+  - **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this
+    while I keep going," "I observed Y outside our normal flow," "I'm taking a long e2e now."
+    **Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md`
+    still owns those). The inbox is a side-channel, not a replacement.
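
The inbox protocol has two small mechanics: the watchdog's edge trigger and the receiver's read-then-delete. The sketch below shows both under assumptions. `newly_present` and `consume_inbox` are hypothetical names, and the `git` commit and push that must follow the deletion are not shown.

```python
from pathlib import Path
from typing import Callable


def newly_present(previous: bool, current: bool) -> bool:
    """Edge trigger: fire only when the inbox file appears, not while it persists."""
    return current and not previous


def consume_inbox(inbox: Path, handle: Callable[[str], None]) -> bool:
    """Receiver side: process the message, then delete the file as the 'consumed' signal."""
    if not inbox.exists():
        return False
    handle(inbox.read_text(encoding="utf-8"))
    # Deletion happens only after handling, so a crash mid-handle leaves the message in place.
    inbox.unlink()
    return True
```

Deleting only after `handle` returns means an interrupted receiver sees the message again on its next wake.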

 (If you are ever forced to run with a single process, the degraded fallback is to alternate
 roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
@ -649,8 +700,12 @@ every wake, `git pull --rebase` first, then:
 **Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
 harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
 the *specific* thing. Three cases:
-1. **Something in flight** (build/deploy/`nixos-rebuild`) → re-check on a short cadence (≈4 min) to
-   stay cache-warm; keep polling *it*, don't treat it as idle, and don't spin on a minutes-long build.
+1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to
+   stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do
+   **NOT** `ScheduleWakeup` for the expected total runtime of the task in a single big sleep — a 25
+   min e2e gets 5 short cache-warm polls, not one 25-min cache-cold blackout. The wakeup that wakes
+   you mid-task is *cheap* (one cache hit, one quick status check); the value of catching a deploy
+   that died at minute 4 of a 25-min budget is large. Keep polling *it*, don't treat it as idle.
 2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
   Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
   here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,
--- a/cc-ci-plan/prompts/adversary.md
+++ b/cc-ci-plan/prompts/adversary.md
@ -8,6 +8,8 @@ You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1
 - Keep your OWN clone at /srv/cc-ci/cc-ci-adv. If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap.
 - git pull --rebase before every edit; commit; push; never --force.
 - Write ONLY your files: REVIEW.md and the "## Adversary findings" section of BACKLOG.md. Everything else (code, STATUS.md, JOURNAL.md, "## Build backlog") is read-only to you.
+- INBOX side-channel (§6.1). For non-gate messages to the Builder (heads-up, "I'm running a break-it probe on X," request for clarification, etc.), write/append `machine-docs/BUILDER-INBOX.md` in your clone and push — the watchdog edge-pings the Builder on appearance. To receive a message from the Builder, look for `machine-docs/ADVERSARY-INBOX.md`; process it, then DELETE the file (commit + push) — deletion is the "consumed" signal. Do NOT use the inbox for formal verdicts — REVIEW.md still owns those.
+- ISOLATION DISCIPLINE (anti-anchoring — critical). The Builder is REQUIRED to give you in STATUS.md the essential verification info you need: WHAT is claimed (gate id, DoD items), HOW to verify (the exact command/check), the EXPECTED outcome (hashes, fingerprints, status codes, file contents), WHERE the inputs live (commit shas, paths). **Read STATUS for that — you need all of it to verify.** What you must IGNORE — in STATUS, and NEVER read in JOURNAL.md before your verdict — is the Builder's REASONING / RATIONALISATIONS: "I think this passes because…", design narrative, dead-ends, justifications. Reading those anchors you. Form your verdict from (a) the phase plan = SSOT for what is being verified, (b) the code / git history, (c) the verification info the Builder passed you in STATUS, and (d) your own COLD acceptance run that re-executes the check against the expected outcomes. Only AFTER you have written your verdict may you consult JOURNAL.md (e.g. to contextualise a finding) — note in REVIEW.md that you did. Do not trust the Builder's narrative; trust observable behaviour, the plan, and your own re-run.

 Each wake:
 1. Pull. Read STATUS.md for any "Gate: <Mn> CLAIMED, awaiting Adversary".
--- a/cc-ci-plan/prompts/builder.md
+++ b/cc-ci-plan/prompts/builder.md
@ -2,12 +2,16 @@ You are the Builder agent for the cc-ci project — one of two independent loops

 Single source of truth: /srv/cc-ci/cc-ci-plan/plan.md. Read it in full now, then begin at §1 Bootstrap. The original brief /srv/cc-ci/cc-ci-plan/brief.md is context only — do not edit it.

-Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7 (three cases): (1) build/deploy/rebuild in flight → poll ~4m, keep polling it; (2) parked at a CLAIMED gate awaiting the Adversary with no other unblocked work → the watchdog will PING you the moment the Adversary updates REVIEW.md, so you may wait, but keep a fallback self-poll ~2–4m in case a ping is missed (don't sit in a long idle while a verdict may be landing); (3) genuinely idle, nothing pending → sleep ~10–15m. Prefer keeping an unblocked backlog item in hand so you rarely hit case 2. Do NOT spin on a minutes-long build. Stop the loop only when STATUS.md says ## DONE.
+Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7 (three cases): (1) build/deploy/rebuild/e2e/heavy-test in flight → **poll every ~5 min, NEVER a single big ScheduleWakeup matching the expected runtime** (catch failures at minute 4 of a 25-min e2e, not at minute 25); the cache-warm 5-min poll is cheap, the long blackout is not; (2) parked at a CLAIMED gate awaiting the Adversary with no other unblocked work → the watchdog will PING you the moment the Adversary updates REVIEW.md OR writes a BUILDER-INBOX.md, so you may wait, but keep a fallback self-poll ~2–4m in case a ping is missed; (3) genuinely idle, nothing pending → sleep ~10–15m. Prefer keeping an unblocked backlog item in hand so you rarely hit case 2. Stop the loop only when STATUS.md says ## DONE.

 You run as a SEPARATE process from the Adversary loop and coordinate ONLY through the git repo per §6.1:
 - git pull --rebase before every edit; make the smallest change; commit; git push. Never --force.
 - Write ONLY your files: source/config, STATUS.md, JOURNAL.md, DECISIONS.md, and the "## Build backlog" section of BACKLOG.md. Treat REVIEW.md and "## Adversary findings" as read-only — the Adversary owns them.
+- ARTIFACT-LAYER ISOLATION (facts in STATUS, reasoning in JOURNAL). STATUS.md **MUST** give the Adversary everything it needs to verify your claim — withholding verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW** to verify it (the exact command/check the Adversary can re-run from its own clone), the **EXPECTED** outcome (build hashes, file contents, status codes, leaf fingerprints, command exit), and **WHERE** the inputs live (commit shas, paths). If something is essential for the Adversary to verify, put it in STATUS. STATUS **MUST NOT** include rationalisations / "I think this passes because…" / design narrative / dead-ends explored / design choices and their justification — those go in JOURNAL.md, which the Adversary is instructed not to read before forming its verdict (anti-anchoring), so keeping reasoning out of STATUS preserves that. The line: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** DECISIONS.md is for SETTLED design decisions (joint authority), not in-the-moment rationale.
 - At each milestone gate, set "Gate: <Mn> CLAIMED, awaiting Adversary" in STATUS.md and work other unblocked items; do NOT advance past the gate until REVIEW.md shows its PASS.
+- INBOX side-channel (§6.1). For non-gate messages to the Adversary (heads-up, request for an early look, "I refactored X, please re-verify Y," "I'm starting a 25-min e2e," "please cold-verify this while I keep going," etc.), write/append `machine-docs/ADVERSARY-INBOX.md` in your clone and push; the watchdog edge-pings the Adversary on appearance, and the Adversary deletes the file once it has consumed the message. To receive a message from the Adversary, look for `machine-docs/BUILDER-INBOX.md`; read and process it, then `git rm` the file (commit + push). Deletion is the "consumed" signal. Do NOT use the inbox for formal gate claims or verdicts: claims still live in STATUS.md and verdicts in REVIEW.md (the inbox is a side-channel, not a replacement).
+- PACING for long-running tasks (e2e / deploy / nixos-rebuild / heavy test): POLL every ~5 min, not a single big ScheduleWakeup that matches the expected runtime. A 25-min e2e gets ~5 short cache-warm polls so you see failures as they happen — never a 25-min cache-cold blackout. (plan.md §7 case 1.)
 - Write "## DONE" only when REVIEW.md shows a PASS dated <24h for every D1–D10 and there is no standing "## VETO".

 Overriding rules:
--- a/cc-ci-plan/reboot-log.sh
+++ b/cc-ci-plan/reboot-log.sh
@ -0,0 +1,22 @@
+#!/usr/bin/env bash
+# Runs as ExecStartPre of cc-ci-loops.service. Appends ONE line to REBOOTS.md per genuine reboot.
+# Uses the kernel boot_id to distinguish a real reboot from a mere `systemctl restart` of the unit:
+# only logs when the current boot_id differs from the last one we recorded.
+set -u
+
+REBOOTS="/srv/cc-ci/cc-ci-plan/REBOOTS.md"
+LAST_BOOT_FILE="/srv/cc-ci/.cc-ci-logs/.last-boot-id"
+PHASE_IDX_FILE="/srv/cc-ci/.cc-ci-logs/.phase-idx"
+
+cur_boot="$(cat /proc/sys/kernel/random/boot_id 2>/dev/null || echo unknown)"
+last_boot="$(cat "$LAST_BOOT_FILE" 2>/dev/null || echo '')"
+
+# Same boot_id => this is a manual service restart, not a reboot => do nothing.
+[ "$cur_boot" = "$last_boot" ] && exit 0
+
+idx="$(cat "$PHASE_IDX_FILE" 2>/dev/null || echo '?')"
+ts="$(date '+%Y-%m-%d %H:%M:%S %Z')"
+mkdir -p "$(dirname "$LAST_BOOT_FILE")"
+# stderr is redirected FIRST so that a failure to open the target file is silenced too.
+printf '%s\n' "- $ts — reboot detected; loops auto-started by systemd (resuming phase index $idx). boot_id=$cur_boot" 2>/dev/null >> "$REBOOTS" || true
+echo "$cur_boot" 2>/dev/null > "$LAST_BOOT_FILE" || true
+exit 0
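
To exercise the boot_id gating without touching the real paths, the same logic can be wrapped in a function that takes its inputs as arguments. This is a sketch only: `record_reboot` and its parameters are hypothetical names introduced for illustration, and the log line is simplified (no timestamp or phase index).

```shell
#!/usr/bin/env bash
# Hypothetical, parameterised variant of the gating logic in reboot-log.sh.
# record_reboot CUR_BOOT_ID LAST_BOOT_FILE REBOOTS_FILE
#   Appends one line to REBOOTS_FILE only when CUR_BOOT_ID differs from the
#   id stored in LAST_BOOT_FILE, then stores the new id.
record_reboot() {
  local cur="$1" last_file="$2" log="$3" last
  last="$(cat "$last_file" 2>/dev/null || echo '')"
  # Same boot_id => service restart, not a reboot => nothing to record.
  [ "$cur" = "$last" ] && return 0
  mkdir -p "$(dirname "$last_file")"
  printf '%s\n' "- reboot detected; boot_id=$cur" >> "$log"
  printf '%s\n' "$cur" > "$last_file"
}
```

Calling it twice with the same id leaves a single line in the log; calling it with a new id adds a second, which mirrors the difference between `systemctl restart` and a genuine reboot.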
--- a/cc-ci-plan/systemd/cc-ci-loops.service
+++ b/cc-ci-plan/systemd/cc-ci-loops.service
@ -0,0 +1,32 @@
+[Unit]
+# Canonical, version-controlled copy of the unit installed at /etc/systemd/system/cc-ci-loops.service.
+# Install:  sudo install -m0644 cc-ci-plan/systemd/cc-ci-loops.service /etc/systemd/system/ \
+#           && sudo systemctl daemon-reload && sudo systemctl enable cc-ci-loops.service
+# Brings the WHOLE rig back after a reboot of the orchestrator Pi: loops + watchdog (launch.sh) AND
+# the orchestrator supervisory session (launch-orchestrator.sh), plus a reboot record (reboot-log.sh).
+Description=cc-ci autonomous loops + watchdog + orchestrator (reboot-resilient)
+Documentation=file:///srv/cc-ci/cc-ci-plan/plan.md
+After=network-online.target cc-ci-tailscaled.service
+Wants=network-online.target
+Requires=cc-ci-tailscaled.service
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+User=notplants
+Group=notplants
+Environment=HOME=/home/notplants
+Environment=PATH=/home/notplants/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
+# RESUME_PHASE=1 so a reboot resumes the SAVED phase (e.g. phase 2), never restarts from phase 0/1c.
+Environment=RESUME_PHASE=1
+# 1) record the reboot (boot_id-gated); 2) start loops + watchdog; 3) resume the orchestrator session.
+ExecStartPre=/srv/cc-ci/cc-ci-plan/reboot-log.sh
+ExecStart=/srv/cc-ci/cc-ci-plan/launch.sh start
+ExecStartPost=/srv/cc-ci/cc-ci-plan/launch-orchestrator.sh start
+# Stop only the loops + watchdog. The orchestrator session is intentionally LEFT running on a manual
+# `systemctl stop` (stopping the loops shouldn't kill your steering session; it resumes from disk).
+ExecStop=/srv/cc-ci/cc-ci-plan/launch.sh stop
+TimeoutStartSec=180
+
+[Install]
+WantedBy=multi-user.target
--- a/cc-ci-plan/test-e2e-testme-acceptance.md
+++ b/cc-ci-plan/test-e2e-testme-acceptance.md
@ -0,0 +1,123 @@
+# Acceptance test — real end-to-end `!testme` on the clean-room-rebuilt VM
+
+**Owner:** the Builder + Adversary loops (they execute *and* independently verify this).
+**When:** after **C4/C5 PASS** (genuine throwaway-VM clean-room rebuild verified). The Builder then
+performs the tailnet swap (§1) and runs the e2e; the Adversary independently verifies. It is the
+**functional acceptance** of D8/clean-room: proof that the rebuilt-from-git VM doesn't just match
+byte-for-byte, but actually *serves a real CI run end-to-end through the public domain*.
+**This file:** `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md`
+
+---
+
+## 0. Why
+
+The reproducibility gates (C1–C5) prove the rebuilt VM is structurally identical and boots clean.
+This test proves it is **operationally** a working CI server: a maintainer comment triggers a build,
+the app deploys and is reachable on its real public URL through the operator's gateway, the test
+passes, and it tears down — the whole `!testme` pipeline, on the from-git VM, over the real domain.
+
+---
+
+## 1. Setup — the Builder performs the tailnet swap (then the e2e)
+
+The rebuilt throwaway must become the live `cc-nix-test` so that the public gateway routes real
+`ci.commoninternet.net` traffic to it (the gateway TLS-passthroughs via MagicDNS to
+`cc-nix-test.taila4a0bf.ts.net` and re-resolves every ~10s, so it auto-follows the name). The swap is
+**two reversible `tailscale set --hostname` commands** on VMs you already control — the Builder does
+it. **Do this only after C4/C5 PASS** and after the rebuilt VM's full stack
+(traefik + bridge + drone + dashboard) is up and serving locally.
+
+**Order matters** (rename the original *aside first*, or the throwaway will get `cc-nix-test-1`):
+
+1. **Rename the original prod VM aside** (it stays running — do NOT destroy it; needed for swap-back):
+   ```
+   ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig'
+   ```
+   (`ssh cc-ci` is pinned to the original's IP `100.90.116.4`, so it keeps reaching the original
+   regardless of the name change.)
+2. **Rename the rebuilt throwaway → `cc-nix-test`.** Re-derive its current tailscale IP (throwaways
+   get a fresh IP each rebuild): pick the ONLINE throwaway node from
+   `tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i throwaway`, then:
+   ```
+   ssh -i /srv/incus-terraform-nix-vm-creator/terraform-secrets/vm_ssh_key \
+       -o ProxyCommand='nc -X 5 -x 127.0.0.1:1055 %h %p' root@<throwaway-ip> \
+       'tailscale set --hostname=cc-nix-test'
+   ```
+
+**Heads-up — tailnet-wide effect:** after the swap, `cc-nix-test.taila4a0bf.ts.net` resolves to the
+rebuilt VM for *everyone* on the tailnet, so any of your own tooling that targets cc-nix-test **by
+MagicDNS name** will now hit the rebuilt VM (tooling pinned to the raw IP `100.90.116.4` still hits
+the original). Account for that when you point `!testme`/deploys.
+
+**Verify the swap took (P1+P2) before starting the e2e** — must pass:
+```
+tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep cc-nix-test   # → the throwaway's IP
+curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/
+# expect: 200 ssl_verify=0   (real public path now served by the rebuilt VM, valid cert)
+```
+
+**Swap-back when testing is done** (reversible): rename the throwaway back to its old name, then
+`ssh cc-ci 'tailscale set --hostname=cc-nix-test'` to restore the original; the gateway re-follows.
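
The "must pass" check on the public endpoint can be made mechanical so the e2e does not start on a half-completed swap. A minimal sketch, assuming the `curl -w` format shown above; `swap_check` is a hypothetical helper name, not part of the existing tooling.

```shell
#!/usr/bin/env bash
# Hypothetical gate for the swap verification: accepts only the exact
# "200 ssl_verify=0" line produced by the curl -w format used above.
swap_check() {
  case "$1" in
    "200 ssl_verify=0") echo ok ;;
    *) echo "FAIL: got '$1'"; return 1 ;;
  esac
}

# Usage (performs a real request, so it is left commented out here):
# swap_check "$(curl -sS -o /dev/null \
#   -w '%{http_code} ssl_verify=%{ssl_verify_result}' https://ci.commoninternet.net/)"
```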
+
+---
+
+## 2. Procedure
+
+1. **Pick one fast, already-enrolled recipe.** Prefer the lightest enrolled app (e.g. `custom-html`)
+   so the run is quick and resource-cheap. Note the recipe + the repo/issue or PR where `!testme` is
+   recognised (the same place prior runs were triggered).
+2. **Record the baseline.** Capture the recipe's *current* latest Drone run number and the dashboard
+   row (`https://ci.commoninternet.net/` and `https://drone.ci.commoninternet.net/...`) so you can
+   prove the run you trigger is **new**.
+3. **Trigger via the real path.** Post `!testme` as the **bot** (the normal maintainer-comment
+   trigger) on that recipe — exactly as a real maintainer would. Do **not** invoke Drone directly or
+   shortcut the bridge; the comment→bridge→Drone path is part of what's under test.
+4. **Confirm the bridge picked it up.** Within the bridge's poll interval, a **new** Drone build for
+   that recipe starts. Capture the new run number (must be > the baseline from step 2).
+5. **Confirm the app deploys and is reachable on its PUBLIC URL.** While the build runs, the app is
+   deployed to its `*.ci.commoninternet.net` test domain. From **off the VM** (external — through the
+   gateway, not `localhost`/`127.0.0.1`), confirm a real request succeeds:
+   ```
+   curl -sS -D- -o /dev/null https://<app-test-subdomain>.ci.commoninternet.net/
+   # expect: HTTP 200 (or the app's expected status), valid *.ci.commoninternet.net cert,
+   #         served content from the deployed app — NOT a Traefik 404 / default-cert.
+   ```
+   This is the crux: it proves routing public-DNS → gateway → MagicDNS → rebuilt VM → Traefik →
+   deployed app all works on the rebuilt server.
+6. **Confirm the test logic passed.** The Drone build runs the recipe's real test assertions (app
+   state, not health-only) and finishes **success**.
+7. **Confirm teardown.** After the run, the app is **undeployed** (no leftover stack/containers), per
+   the standard post-run cleanup — verify it's gone.
+8. **Confirm the result was reported.** The outcome posts back to the trigger location and the
+   dashboard row updates to the new run with `success`.
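
Two of the checks in this procedure reduce to small comparisons: the triggered run number must exceed the baseline (steps 2 and 4), and the external request must return the expected status (step 5). The helpers below are an illustrative sketch; `is_new_run` and `http_status` are hypothetical names, and how the run numbers are obtained from Drone is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for recording evidence; not part of the existing tooling.

# is_new_run BASELINE NEW: succeeds only if NEW is strictly greater than BASELINE.
is_new_run() {
  [ "$2" -gt "$1" ]
}

# http_status: reads response headers (as printed by `curl -D-`) on stdin and
# prints the status code from the first line, e.g. "HTTP/2 200" -> 200.
http_status() {
  awk 'NR == 1 { print $2; exit }'
}

# Usage (real request, so commented out; the subdomain placeholder is as in step 5):
# curl -sS -D- -o /dev/null https://<app-test-subdomain>.ci.commoninternet.net/ | http_status
```

A status of 200 alone does not satisfy step 5: the certificate and the served content still need to be checked so that a Traefik default response is not mistaken for the app.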
+
+---
+
+## 3. Pass criteria (all must hold; Adversary verifies independently)
+
+- [ ] **E1.** Self-check §1 passed (`ci.commoninternet.net` = 200, valid cert, on the rebuilt VM).
+- [ ] **E2.** Posting `!testme` produced a **new** Drone build (run # > baseline) via the bridge —
+      not a manual Drone trigger.
+- [ ] **E3.** The deployed app answered an **external** request on its real
+      `<app>.ci.commoninternet.net` URL (through the gateway) with the expected response + valid cert
+      — captured with headers/body evidence.
+- [ ] **E4.** The Drone build's **real test assertions** ran and the build finished **success**
+      (no skipped/softened tests).
+- [ ] **E5.** The app **undeployed** cleanly afterward (no residual stack).
+- [ ] **E6.** Result reported back + dashboard updated to the new successful run.
+
+Evidence (run #, the external `curl` headers/body, dashboard before/after, undeploy proof) is logged
+in `JOURNAL-1c.md`, and the verdict in `REVIEW-1c.md` / `STATUS-1c.md` as **E2E-TESTME — PASS**.
+
+## 4. If it fails
+
+Treat as a clean-room finding, not a config patch: a failure here means the from-git rebuild is
+missing something the running server had out-of-band (a secret, a manual step, drift). Capture the
+failing stage + logs in `JOURNAL-1c.md`, raise it as a blocker, and fix it in the **git source**
+(base or `cc-ci-secrets`) so the next rebuild includes it — do **not** hand-fix the live VM. Re-run
+this test after the fix.
+
+## 5. Bound
+
+One recipe, one green run. This is a functional smoke test of the rebuilt VM, not a full recipe-test
+campaign (that's Phase 2). Don't expand scope here.