orchestrator: reboot-resilience + session auto-resume + full session plan/tooling

Reboot survival for the Pi orchestrator host:
- systemd unit cc-ci-plan/systemd/cc-ci-loops.service (installed + enabled): on boot records the
  reboot, starts loops+watchdog (RESUME_PHASE=1), and resumes the orchestrator session.
- reboot-log.sh: boot_id-gated reboot record -> REBOOTS.md (manual restarts don't count).
- launch-orchestrator.sh: injects an AGENTS.md startup nudge so an auto-resumed orchestrator
  announces itself (PushNotification) + reports reboots.
- AGENTS.md: on-startup notify routine documented.

Plans/tooling accumulated this session:
- plan-phase1d (generic suite), 1e (harness corrections), phase4 (final review), sso-dep-testing,
  orchestrator-migration (parked), test-e2e-testme-acceptance.
- launch.sh: 1d/1e/2/2b/3/4 phase sequence, machine-docs-aware state resolution, limit-stall
  re-nudge, INBOX side-channel detection.
- plan.md §6.1/§7: artifact-layer isolation, INBOX, 5-min long-run polling, DEFERRED.
- prompts: isolation discipline + INBOX + pacing.
- .gitignore: harden (.sops/, cc-ci-secrets/, .claude/, *.tmp.*).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
File: .gitignore (10 changed lines)
@@ -10,3 +10,13 @@
 /cc-ci-adv/
 /.cc-ci-watch/
 /.cc-ci-logs/
+
+# More secrets / local state — NEVER commit
+# master recovery age key
+/.sops/
+# separate sops-secrets repo, cloned in
+/cc-ci-secrets/
+# local claude session/project state
+/.claude/
+# editor temp files
+*.tmp.*
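Git only treats `#` as a comment when it starts a line; a note written after a pattern on the same line becomes part of the pattern, and the rule quietly stops matching. A minimal way to confirm the hardened rules really cover the secrets, assuming it is run from the repo root (`check_ignored` is a hypothetical helper, not part of cc-ci):

```bash
#!/usr/bin/env bash
# check_ignored is a hypothetical helper, not part of cc-ci. It asks git whether each path is
# excluded by the repo's ignore rules: `git check-ignore -q` exits 0 for an ignored path and
# non-zero otherwise. Tracked files are never reported as ignored, so a secret that was
# committed before the rule existed still needs `git rm --cached`.
check_ignored() {
  local path status=0
  for path in "$@"; do
    if git check-ignore -q "$path"; then
      printf 'ignored      %s\n' "$path"
    else
      printf 'NOT ignored  %s\n' "$path"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_ignored .sops/master-age.txt` should report `ignored` once the rules are in place, and the function's exit status makes it usable as a pre-commit guard.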
File: AGENTS.md (19 changed lines)
@@ -16,6 +16,25 @@ project (NixOS config, test runner, recipe tests) lives in a **separate** repo t
 The two loops coordinate **only** through the cc-ci git repo (see `plan.md` §6.1). The orchestrator
 watches from outside.
 
+## On startup: announce yourself + report reboots
+
+**Every time you (the orchestrator) start or resume, send a `PushNotification`** that you are online —
+the operator wants to know the supervising session is back (especially after a reboot, which kills
+this session along with the Pi). Include the current phase and the reboot count. Steps on startup:
+1. Read `cc-ci-plan/REBOOTS.md` (count the `## Reboots` entries) and run `cc-ci-plan/launch.sh status`
+   (current phase + whether the loops/watchdog are running).
+2. `PushNotification` (proactive), e.g.: *"cc-ci orchestrator online — phase 2, loops+watchdog
+   running; N reboots logged (last <date>)."*
+3. If a reboot happened while you were away (a new entry in REBOOTS.md since you last looked, or the
+   loops are down), check that `cc-ci-loops.service` brought the loops back; if not, relaunch with
+   `RESUME_PHASE=1 cc-ci-plan/launch.sh start`.
+
+Reboot resilience is handled by **`cc-ci-loops.service`** (system unit): on boot it logs the reboot
+to `REBOOTS.md` (boot_id-gated), runs `launch.sh start` with `RESUME_PHASE=1` so the loops + watchdog
+auto-resume the saved phase, and (ExecStartPost) relaunches this session via `launch-orchestrator.sh`.
+The operator still has to reconnect over remote-control, hence the startup notification. The fuller
+"move the orchestrator onto its own VM" plan is parked at `cc-ci-plan/plan-orchestrator-migration.md`.
+
 ## Keep the orchestrator open, under remote-control
 
 Run this session as a long-lived **interactive** session with `--remote-control` so the operator can
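Step 1 of the startup routine reduces to counting the bullets under `## Reboots`. A small sketch, assuming the REBOOTS.md layout shown in this commit (`count_reboots`, `last_reboot_date` and the default path are illustrative names, not existing tooling):

```bash
#!/usr/bin/env bash
# Hypothetical helpers for step 1 of the startup routine (illustrative, not cc-ci tooling).
# Reboot entries are "- <date> ..." bullets under "## Reboots"; wrapped continuation lines
# don't start with "- ", so they are not double-counted.

_reboot_entries() {  # print the "- " bullet lines that sit under the "## Reboots" heading
  awk '/^## / { in_sec = ($0 ~ /^## Reboots *$/); next }
       in_sec && /^- / { print }' "$1"
}

count_reboots() {
  local file="${1:-cc-ci-plan/REBOOTS.md}"
  if [[ ! -f "$file" ]]; then echo 0; return 0; fi
  _reboot_entries "$file" | wc -l | tr -d ' '
}

last_reboot_date() {  # second field of the newest (last) entry, e.g. 2026-05-28
  local file="${1:-cc-ci-plan/REBOOTS.md}"
  [[ -f "$file" ]] || return 0
  _reboot_entries "$file" | tail -n 1 | awk '{ print $2 }'
}
```

The two values slot straight into the step 2 message, e.g. `"cc-ci orchestrator online: $(count_reboots) reboots logged (last $(last_reboot_date))"`.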
@@ -4,6 +4,22 @@ Post-DONE or "revisit later" ideas that are intentionally **out of scope** for t
 (§2 Definition of Done). Not active work — parked here so they aren't lost. The loops may pull an
 item into the project `BACKLOG.md` as `[idea]` if/when it becomes relevant.
 
+- [ ] **Optional `--extra-tests` flag for heavy / operational tests (opt-in heavy suite).**
+  Some recipe tests are "more than needed" for the default CI signal — state-management /
+  long-running-instance / load / helper-script operational tests that don't fit the ephemeral
+  per-run-deploy model cheaply but are useful occasionally. Today they're deferred to
+  `cc-ci/machine-docs/DEFERRED.md` (e.g. matrix-synapse `compress_state.sh`,
+  `test_complexity_limit.sh`, `test_purge.sh`) and don't run.
+  *Idea:* add an **opt-in `--extra-tests` flag** (e.g. `!testme --extra-tests` on a PR comment, or
+  a `STAGES=extra` / `EXTRA_TESTS=1` Drone build parameter) that the orchestrator passes through;
+  recipes declare an `extra/` test dir or mark tests with `@pytest.mark.extra`; on opt-in the
+  orchestrator runs them **alongside** the default tiers (still one deploy, still teardown). Default
+  off so default CI stays fast; the operator can ask for the heavy suite when reviewing a PR that
+  touches an extra-covered area (e.g. matrix-synapse's abra helpers). When implemented, each
+  matching DEFERRED entry can be CLOSED by porting its test into the recipe's `extra/` and noting
+  the commit in DEFERRED.md. *Why deferred for now:* default coverage is sufficient; this is a
+  later breadth/depth knob, not a critical-path feature. *Added:* 2026-05-28.
+
 - [ ] **Optional webhook self-registration (admin-access environments).**
   We deliberately made **polling the primary trigger** and require the CI server/bot to run on
   **read-level** access only — so the server does **not** auto-register Gitea webhooks (that needs
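Nothing in this idea is implemented yet. As a rough sketch of the opt-in switch it describes, the orchestrator could decide whether to run the heavy suite from the PR comment or an `EXTRA_TESTS=1` build parameter; the function name and the exact matching rules below are assumptions:

```bash
#!/usr/bin/env bash
# Hypothetical opt-in switch for the heavy suite (nothing here exists in cc-ci yet). True when the
# build sets EXTRA_TESTS=1, or when the PR comment is a "!testme" command carrying --extra-tests.
wants_extra_tests() {
  local comment="${1:-}"
  # "!testme", then optionally other words, then a whitespace-delimited "--extra-tests".
  local re='(^|[[:space:]])!testme([[:space:]].*)?[[:space:]]--extra-tests([[:space:]]|$)'
  if [[ "${EXTRA_TESTS:-0}" == "1" ]]; then
    return 0
  fi
  [[ "$comment" =~ $re ]]
}
```

When the predicate is false, a pytest-based runner could keep heavy tests out with pytest's marker expression `-m "not extra"`, assuming the `@pytest.mark.extra` convention from the idea.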
@@ -17,7 +17,9 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day
 | `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. |
 | `plan-phase1c-full-reproducibility.md` | **Phase 1c** (runs first): make the VM fully reproducible from git (all secrets incl. the wildcard cert in sops, in a separate private `cc-ci-secrets` repo as a flake input; base stays well-parameterized) and do the **genuine throwaway-VM live rebuild** to close D8 honestly (the "infeasible by design" was overstated). |
 | `plan-phase1b-review-lint.md` | **Phase 1b** (after 1c): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets), ending in a full cold re-verification of all D1–D10 — now covering 1c's refactor. |
-| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1b): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. |
+| `plan-phase1d-generic-test-suite.md` | **Phase 1d** (after 1b, before 2): a **generic install/upgrade/backup/restore** suite that runs on *any* recipe with zero config, with a recipe's own `test_<op>.py` **overriding or extending** the generic (Builder's call) and **reusing the generic's deployment — no redeploy**, plus optional custom install-steps; recipes needing special setup fail the generic form gracefully. The test-architecture foundation Phase 2 builds on. |
+| `plan-phase1e-harness-corrections.md` | **Phase 1e** (after 1d, before 2): three operator-review corrections to the shared generic harness — (HC1) upgrade goes previous-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved recipes** (default = cc-ci overlays + generic only); (HC3) the **generic runs by default** alongside an overlay, skipped only via explicit opt-out. |
+| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1e): build on the corrected generic suite — author the recipe overlays (port recipe-maintainer tests as `test_*.py`) + define custom install steps where a recipe fails generically. |
 | `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. |
 | `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. |
 | `IDEAS.md` | Deferred/future ideas, parked out of current scope. |
File: cc-ci-plan/REBOOTS.md (new file, 15 lines)
@@ -0,0 +1,15 @@
+# Reboot log — cc-ci orchestrator Pi
+
+One entry per genuine reboot of the orchestrator Pi (`raspberrypi`), appended automatically by
+`reboot-log.sh` (ExecStartPre of `cc-ci-loops.service`, boot_id-gated so manual service restarts are
+NOT counted). The Pi hosts the Builder + Adversary loops + watchdog; a reboot drops the tmux sessions
+(and this orchestrator session), and `cc-ci-loops.service` restarts the loops on boot. Count the
+entries below to see how often it's happening.
+
+## Reboots
+
+- 2026-05-28 (~19:?? BST) — reboot (backfilled from memory; mid-Phase-2). Orchestrator + loops were
+  down until manually relaunched. This pre-dates the systemd auto-restart service.
+- 2026-05-28 (~20:02 BST) — reboot (backfilled from memory; uptime showed 5 min at 20:07). Loops
+  manually relaunched at phase 2; this is what prompted adding `cc-ci-loops.service` +
+  auto-logging. Auto-logging is live from the next reboot onward.
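`reboot-log.sh` itself is not part of this view. The boot_id gating it describes can be sketched as follows, assuming the kernel's per-boot random id at `/proc/sys/kernel/random/boot_id` and a persistent state file; the function name, the state path and the log-line format are all hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of boot_id gating; the real reboot-log.sh is not shown in this commit.
# The kernel generates a new random boot_id on every boot, so comparing it with the last
# recorded value separates a genuine reboot from a manual `systemctl restart`.
log_reboot_once() {
  local src="${BOOT_ID_SRC:-/proc/sys/kernel/random/boot_id}"
  local state="${BOOT_STATE_FILE:-/var/lib/cc-ci/last-boot-id}"     # assumed location
  local reboots="${REBOOTS_FILE:-/srv/cc-ci/cc-ci-plan/REBOOTS.md}" # assumed location
  local cur last=""
  cur="$(<"$src")" || return 1
  if [[ -f "$state" ]]; then last="$(<"$state")"; fi
  mkdir -p "$(dirname "$state")"
  if [[ -z "$last" ]]; then
    printf '%s\n' "$cur" > "$state"   # first run: seed the state, nothing to compare with
    return 0
  fi
  if [[ "$cur" == "$last" ]]; then
    return 0                          # same boot: a service restart, not a reboot
  fi
  # Append one bullet; assumes "## Reboots" is the file's last section. Needs bash >= 4.2.
  printf -- '- %(%F %H:%M %Z)T: reboot (auto-logged, boot_id %s)\n' -1 "${cur:0:8}" >> "$reboots"
  printf '%s\n' "$cur" > "$state"
}
```

Seeding on the first run instead of logging matches "auto-logging is live from the next reboot onward"; a `systemctl restart` keeps the same boot_id and therefore adds nothing.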
File: cc-ci-plan/launch-orchestrator.sh (new executable file, 117 lines)
@@ -0,0 +1,117 @@
+#!/usr/bin/env bash
+#
+# launch-orchestrator.sh — start/resume the cc-ci ORCHESTRATOR session in tmux under remote-control.
+#
+# The orchestrator (see /srv/cc-ci/AGENTS.md) is the long-lived SUPERVISORY session: it watches the
+# Builder/Adversary loops, reads their logs/STATUS, edits the plan/prompts, restarts stuck loops, and
+# owns the VM-level fallback. It is SEPARATE from the loops that launch.sh manages — this script only
+# brings the orchestrator back (e.g. after a reboot, which kills the tmux server and every session in
+# it). The conversation itself survives on disk across exits/reboots; remote-control only stays
+# connected while the process is alive, so recovery = relaunch the process and re-attach by --resume.
+#
+# Naming: tmux session AND remote-control name are both "cc-ci-orchestrator", matching the loop
+# sessions cc-ci-builder / cc-ci-adv / cc-ci-watchdog.
+#
+# Usage:
+#   ./launch-orchestrator.sh start    # resume the persistent orchestrator session (DEFAULT)
+#   ./launch-orchestrator.sh fresh    # start a NEW orchestrator session (no --resume)
+#   ./launch-orchestrator.sh status   # show tmux + remote-control state
+#   ./launch-orchestrator.sh attach   # tmux attach to the session (Ctrl-b d to detach)
+#   ./launch-orchestrator.sh stop     # kill the tmux session (conversation persists on disk)
+#
+# The persistent session id is read from $ID_FILE (seeded on first run with DEFAULT_ID). A Claude
+# session keeps the SAME id across --resume, so this stays valid across reboots. To point the script
+# at a different session, edit that file or export ORCH_SESSION_ID.
+
+set -euo pipefail
+
+# ----- config -------------------------------------------------------------
+SESSION="${ORCH_SESSION:-cc-ci-orchestrator}"   # tmux session name == remote-control name
+WORKDIR="${ORCH_DIR:-/srv/cc-ci}"               # orchestrator cwd (its claude project dir)
+CLAUDE_BIN="${CLAUDE_BIN:-claude}"
+CLAUDE_FLAGS="${CLAUDE_FLAGS:---dangerously-skip-permissions}"
+# REMOTE_CONTROL=1 → --remote-control session, viewable/steerable at claude.ai/code. Needs the box
+# logged into the claude.ai account. =0 for a plain local interactive session.
+REMOTE_CONTROL="${REMOTE_CONTROL:-1}"
+LOG_DIR="${LOG_DIR:-/srv/cc-ci/.cc-ci-logs}"
+ID_FILE="${ORCH_ID_FILE:-$LOG_DIR/.orchestrator-session-id}"
+DEFAULT_ID="34a80a99-b37e-4809-b8da-ccc9fafe785e"   # the orchestrator session as of 2026-05-28
+# Startup nudge injected as the resumed session's first turn, so an AUTO-launched orchestrator (e.g.
+# cc-ci-loops.service ExecStartPost after a reboot) actually RUNS its AGENTS.md startup routine —
+# announce itself + report reboots — instead of resuming silently and waiting. Set empty to disable.
+# Must contain NO single quotes (it is single-quoted into the tmux command).
+STARTUP_PROMPT="${ORCH_STARTUP_PROMPT-STARTUP (auto-launch): you are the cc-ci orchestrator, just (re)launched, likely after a reboot. Do your AGENTS.md On-startup routine NOW: read cc-ci-plan/REBOOTS.md and run cc-ci-plan/launch.sh status, then send a proactive PushNotification that you are online with the current phase and reboot count, and confirm cc-ci-loops.service brought the loops + watchdog back (relaunch with RESUME_PHASE=1 cc-ci-plan/launch.sh start if not). Then resume supervising.}"
+# --------------------------------------------------------------------------
+
+log() { printf '[orchestrator %(%H:%M:%S)T] %s\n' -1 "$*"; }
+die() { log "ERROR: $*"; exit 1; }
+session_alive() { tmux has-session -t "$SESSION" 2>/dev/null; }
+
+preflight() {
+  command -v tmux >/dev/null 2>&1 || die "missing dependency: tmux"
+  command -v "$CLAUDE_BIN" >/dev/null 2>&1 || die "claude CLI not found (set CLAUDE_BIN)"
+  [[ -d "$WORKDIR" ]] || die "workdir not found: $WORKDIR"
+  mkdir -p "$LOG_DIR"
+  [[ -f "$ID_FILE" ]] || echo "$DEFAULT_ID" > "$ID_FILE"
+}
+
+resume_id() { echo "${ORCH_SESSION_ID:-$(cat "$ID_FILE" 2>/dev/null || echo "$DEFAULT_ID")}"; }
+
+# Launch claude in a detached tmux session. $1=resume ("resume"|"fresh").
+start() {
+  local mode="${1:-resume}"
+  preflight
+  if session_alive; then
+    log "$SESSION already running — leaving it (use '$0 stop' first to relaunch)"
+    return 0
+  fi
+  local rc="" resume="" id=""
+  [[ "$REMOTE_CONTROL" == "1" ]] && rc="--remote-control '$SESSION'"
+  if [[ "$mode" == "resume" ]]; then
+    id="$(resume_id)"
+    [[ -n "$id" ]] && resume="--resume '$id'"
+    log "starting $SESSION (resume=$id, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
+  else
+    log "starting $SESSION FRESH (no resume, cwd=$WORKDIR, rc=$REMOTE_CONTROL)"
+  fi
+  # Startup nudge as a POSITIONAL prompt (not stdin — stdin would force print mode and break
+  # remote-control). On --resume this appends as the session's next turn, triggering the AGENTS.md
+  # startup routine (announce + report reboots). Empty STARTUP_PROMPT => clean resume, no nudge.
+  local prompt_arg=""
+  [[ -n "$STARTUP_PROMPT" ]] && prompt_arg="'$STARTUP_PROMPT'"
+  tmux new-session -d -s "$SESSION" -c "$WORKDIR" \
+    "$CLAUDE_BIN $resume $rc $CLAUDE_FLAGS $prompt_arg"
+  tmux pipe-pane -o -t "$SESSION" "cat >> '$LOG_DIR/$SESSION.log'"
+  log "started. status: $0 status | attach: tmux attach -t $SESSION"
+}
+
+case "${1:-start}" in
+  start) start resume ;;
+  fresh) start fresh ;;
+  stop)
+    if session_alive; then log "killing $SESSION"; tmux kill-session -t "$SESSION" || true; else log "$SESSION not running"; fi
+    ;;
+  status)
+    if session_alive; then
+      log "$SESSION: RUNNING"
+      ps -eo pid,etime,args | grep "[r]emote-control $SESSION" || true
+    else
+      log "$SESSION: stopped"
+    fi
+    log "resume id: $(resume_id) (file: $ID_FILE)"
+    ;;
+  attach) exec tmux attach -t "$SESSION" ;;
+  *)
+    cat <<EOF
+cc-ci orchestrator launcher
+
+  $0 start    resume the persistent orchestrator session in tmux + remote-control (default)
+  $0 fresh    start a NEW orchestrator session (no --resume)
+  $0 status   show tmux + remote-control state and the resume id
+  $0 attach   tmux attach to the session
+  $0 stop     kill the tmux session (conversation persists on disk)
+
+Env: SESSION=$SESSION WORKDIR=$WORKDIR REMOTE_CONTROL=$REMOTE_CONTROL CLAUDE_BIN=$CLAUDE_BIN
+EOF
+    ;;
+esac
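The hand-built `prompt_arg="'$STARTUP_PROMPT'"` is why the prompt may not contain `'`. One way to lift that restriction, sketched here as a hypothetical helper rather than what the script does, is to build the command from an array and quote every word with bash's `printf %q`:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in launch-orchestrator.sh): build the shell-command string tmux runs
# from an argv array, quoting each word with printf %q so the prompt may contain any quotes.
# Usage: build_claude_cmd BIN RESUME_ID RC_NAME PROMPT [EXTRA_FLAG...]   (first 4 args required)
build_claude_cmd() {
  local bin="$1" resume_id="$2" rc_name="$3" prompt="$4"
  shift 4
  local -a argv=("$bin")
  if [[ -n "$resume_id" ]]; then argv+=(--resume "$resume_id"); fi
  if [[ -n "$rc_name" ]]; then argv+=(--remote-control "$rc_name"); fi
  argv+=("$@")                                        # extra flags, e.g. --dangerously-skip-permissions
  if [[ -n "$prompt" ]]; then argv+=("$prompt"); fi   # positional prompt goes last
  printf '%q ' "${argv[@]}"
}
```

In `start()` it would replace the hand-built string, e.g. `tmux new-session -d -s "$SESSION" -c "$WORKDIR" "$(build_claude_cmd "$CLAUDE_BIN" "$id" "$rc_name" "$STARTUP_PROMPT" $CLAUDE_FLAGS)"`, with `$CLAUDE_FLAGS` left unquoted so it still word-splits and `rc_name` empty when `REMOTE_CONTROL` is not 1. Caveat: tmux runs the string with its `default-shell`, and `%q` can emit bash-only `$'...'` quoting for control characters, so this assumes that shell is bash or the prompt is plain text.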
@@ -7,10 +7,10 @@
 # • Adversary (tmux session: cc-ci-adv) working clone /srv/cc-ci/cc-ci-adv
 # coordinating only through the git repo on git.autonomic.zone.
 #
-# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c then 1b). Each phase
-# has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
+# PHASES: the watchdog runs an ordered sequence of sub-phases (default: 1c → 1b → 1d → 1e → 2 → 2b → 3 → 4).
+# Each phase has its own plan + phase-namespaced loop-state files (STATUS-<id>.md etc.). When a phase's
 # STATUS-<id>.md shows "## DONE", the watchdog AUTO-TRANSITIONS to the next phase; after the LAST
-# phase it STOPS the loops and exits (a manual gate — e.g. check in before Phase 2).
+# phase (4, final review/polish/cleanup) it STOPS the loops and exits (end of the whole build).
 #
 # Three jobs: ITERATION (each agent's /loop), RESILIENCE (restart a dead loop), HANDOFF SIGNALLING
 # (ping the waiting loop the moment its counterpart hands off), PHASE SEQUENCING (this file).
@@ -49,7 +49,7 @@ WATCHDOG_SESSION="cc-ci-watchdog"
 # Ordered phase sequence: each entry "id|planfile|statusbasename". The watchdog runs them in order,
 # auto-transitions on the phase's "## DONE" (in BUILDER_DIR/<statusbasename>), and STOPS after the
 # last one (manual gate). Override PHASES_SPEC (semicolon-separated) to change the sequence.
-PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md}"
+PHASES_SPEC="${PHASES_SPEC:-1c|plan-phase1c-full-reproducibility.md|STATUS-1c.md;1b|plan-phase1b-review-lint.md|STATUS-1b.md;1d|plan-phase1d-generic-test-suite.md|STATUS-1d.md;1e|plan-phase1e-harness-corrections.md|STATUS-1e.md;2|plan-phase2-recipe-tests.md|STATUS-2.md;2b|plan-phase2b-test-performance.md|STATUS-2b.md;3|plan-phase3-results-ux.md|STATUS-3.md;4|plan-phase4-final-review-polish-cleanup.md|STATUS-4.md}"
 IFS=';' read -r -a PHASES <<< "$PHASES_SPEC"
 PHASE_IDX_FILE="${PHASE_IDX_FILE:-$LOG_DIR/.phase-idx}"
 # --------------------------------------------------------------------------
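Because `PHASES_SPEC` can be overridden, a malformed entry would only surface later as an empty plan or status filename. A hypothetical pre-flight check (not in `launch.sh`) that splits the spec the same way the watchdog does:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a PHASES_SPEC override (not part of launch.sh). Each
# ';'-separated entry must be exactly "id|planfile|statusbasename". Prints the phase count on
# stdout, reports malformed entries on stderr, and returns 1 if any entry is malformed.
validate_phases_spec() {
  local spec="$1" entry id plan status rest n=0 bad=0
  local -a phases
  IFS=';' read -r -a phases <<< "$spec"          # same split as launch.sh
  for entry in "${phases[@]}"; do
    n=$((n + 1))
    # A fourth field lands in "rest", so "a|b|c|d" is rejected as well as "a|b".
    IFS='|' read -r id plan status rest <<< "$entry"
    if [[ -z "$id" || -z "$plan" || -z "$status" || -n "$rest" ]]; then
      printf 'phase #%d malformed: %s\n' "$n" "$entry" >&2
      bad=1
    fi
  done
  echo "$n"
  return "$bad"
}
```

For example, `validate_phases_spec "$PHASES_SPEC" >/dev/null || { echo "bad PHASES_SPEC" >&2; exit 1; }` could run just before the `IFS=';' read` line.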
@@ -64,7 +64,10 @@ phase_id() { echo "${PHASES[$1]}" | cut -d'|' -f1; }
 phase_plan() { echo "${PHASES[$1]}" | cut -d'|' -f2; }
 phase_status() { echo "${PHASES[$1]}" | cut -d'|' -f3; }
 phase_review() { echo "REVIEW-$(phase_id "$1").md"; }
-phase_done() { grep -qE '^##[[:space:]]+DONE' "$BUILDER_DIR/$1" 2>/dev/null; }   # $1 = status basename (read locally)
+# Loop-state files may sit at the repo root OR under machine-docs/ (the 1b RL6 move). Prefer
+# machine-docs/ if present, else root — so the watchdog survives the move whenever it happens.
+resolve_state() { local dir="$1" base="$2"; if [[ -f "$dir/machine-docs/$base" ]]; then echo "$dir/machine-docs/$base"; else echo "$dir/$base"; fi; }
+phase_done() { grep -qE '^##[[:space:]]+DONE' "$(resolve_state "$BUILDER_DIR" "$1")" 2>/dev/null; }   # $1 = status basename (read locally)
 all_ids() { local p; for p in "${PHASES[@]}"; do printf '%s ' "$(echo "$p" | cut -d'|' -f1)"; done; }
 
 preflight() {
@@ -133,15 +136,32 @@ ping_session() {
   tmux send-keys -t "$s" -l -- "$msg" 2>/dev/null && { sleep 0.3; tmux send-keys -t "$s" Enter 2>/dev/null; }
 }
 
+# A loop can stall ALIVE on a usage/spend-limit notice: the claude process stays up (so the
+# dead-session restart never fires) but makes no progress, and the /loop self-pacing is dead because
+# the limit interrupted the turn that would have scheduled the next tick. Detect that signature
+# (limit text present + no active-turn marker) and re-nudge it each heavy tick — once the limit resets
+# the next nudge lands and the loop resumes. Gated on the limit text so we NEVER nudge a loop that is
+# just legitimately idle-waiting on a handoff.
+LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'
+nudge_if_limit_stalled() {
+  local s="$1" pane
+  pane="$(tmux capture-pane -pt "$s" 2>/dev/null | tail -25 || true)"
+  if printf '%s\n' "$pane" | grep -q 'esc to interrupt'; then return 0; fi   # actively working
+  if ! printf '%s\n' "$pane" | grep -qiE "$LIMIT_RE"; then return 0; fi    # not a limit stall
+  log "limit-stall detected on $s — re-nudging to resume"
+  ping_session "$s" "watchdog: the usage/spend limit appears lifted — RESUME your loop now. Pull latest, re-read your phase STATUS/REVIEW files, and continue from where you stopped; re-arm your loop pacing."
+}
+
 # Edge-triggered handoff signalling for the CURRENT phase. Reads the loops' local clones.
 # Ping the Adversary only when a gate id NEWLY appears on a "CLAIMED … awaiting" line (never on
 # the baseline / restart / a passed-but-kept line). Ping the Builder when the phase REVIEW changes.
 _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""
-handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; }   # call on phase transition
+_wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""
+handoff_reset() { _wd_awaiting=""; _wd_baselined=""; _wd_last_review=""; _wd_adv_inbox_seen=""; _wd_builder_inbox_seen=""; }   # call on phase transition
 handoff_check() {
   local idx sf rf cur now added
   idx="$(cur_idx)"
-  sf="$BUILDER_DIR/$(phase_status "$idx")"; rf="$ADV_DIR/$(phase_review "$idx")"
+  sf="$(resolve_state "$BUILDER_DIR" "$(phase_status "$idx")")"; rf="$(resolve_state "$ADV_DIR" "$(phase_review "$idx")")"
   if [[ -f "$sf" ]]; then
     now="$(grep -iE 'CLAIMED.*awaiting' "$sf" 2>/dev/null | grep -oiE 'M[0-9]+(\.[0-9]+)?|[A-Z][0-9]+' | tr '[:lower:]' '[:upper:]' | sort -u || true)"
     if [[ -n "$_wd_baselined" ]]; then
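The stall heuristic in `nudge_if_limit_stalled` is easier to reason about as a pure predicate over captured pane text. A sketch of that refactor (`is_limit_stalled` is a hypothetical name; the regex and the `esc to interrupt` marker are copied from the function above):

```bash
#!/usr/bin/env bash
# Hypothetical refactor: the detection half of nudge_if_limit_stalled as a predicate that reads
# captured pane text on stdin, so the heuristic can be exercised without tmux.
LIMIT_RE='spend limit|usage limit|limit reached|reached your .*limit|out of (credits|tokens)'

# Exit 0 = stalled on a limit notice (re-nudge); 1 = mid-turn, idle, or no limit text.
is_limit_stalled() {
  local pane
  pane="$(tail -n 25)"                                          # same 25-line window as the watchdog
  if grep -q 'esc to interrupt' <<< "$pane"; then return 1; fi  # a turn is in progress
  grep -qiE "$LIMIT_RE" <<< "$pane"
}
```

The watchdog side would then read `tmux capture-pane -pt "$s" | is_limit_stalled && ping_session "$s" "..."`, and the idle-on-handoff case stays safe because it has no limit text.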
@@ -163,6 +183,34 @@ handoff_check() {
       _wd_last_review="$cur"
     fi
   fi
+
+  # INBOX side-channel (§6.1). The sender writes the receiver's inbox in their OWN clone, so we
+  # detect from the sender side. Edge-trigger on content hash so a fresh message (sender re-wrote
+  # before receiver consumed) re-pings. Receiver deletes after processing => hash empty => next
+  # write re-triggers.
+  local adv_inbox builder_inbox h
+  adv_inbox="$(resolve_state "$BUILDER_DIR" "ADVERSARY-INBOX.md")"
+  if [[ -f "$adv_inbox" ]]; then
+    h="$(md5sum "$adv_inbox" 2>/dev/null | awk '{print $1}' || true)"
+    if [[ -n "$h" && "$h" != "$_wd_adv_inbox_seen" ]]; then
+      log "handoff: ADVERSARY-INBOX.md new/changed -> pinging Adversary"
+      ping_session "$ADV_SESSION" "watchdog ping: the Builder wrote machine-docs/ADVERSARY-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed."
+      _wd_adv_inbox_seen="$h"
+    fi
+  else
+    _wd_adv_inbox_seen=""   # consumed; ready for the next write
+  fi
+  builder_inbox="$(resolve_state "$ADV_DIR" "BUILDER-INBOX.md")"
+  if [[ -f "$builder_inbox" ]]; then
+    h="$(md5sum "$builder_inbox" 2>/dev/null | awk '{print $1}' || true)"
+    if [[ -n "$h" && "$h" != "$_wd_builder_inbox_seen" ]]; then
+      log "handoff: BUILDER-INBOX.md new/changed -> pinging Builder"
+      ping_session "$BUILDER_SESSION" "watchdog ping: the Adversary wrote machine-docs/BUILDER-INBOX.md — pull, read the message, act on it, then delete the file (commit + push) to mark it consumed."
+      _wd_builder_inbox_seen="$h"
+    fi
+  else
+    _wd_builder_inbox_seen=""
+  fi
 }
 
 watchdog_loop() {
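Both halves of `handoff_check` are edge-triggered: a gate id pings the Adversary only when it newly appears, and an INBOX pings only when its content hash changes. A hypothetical, tmux-free sketch of those two triggers (helper names are illustrative; the grep pipeline and the md5 hashing mirror the code above):

```bash
#!/usr/bin/env bash
# Hypothetical, tmux-free versions of the two edge triggers in handoff_check.

# stdin: STATUS file text. stdout: sorted, unique, upper-cased gate ids on "CLAIMED ... awaiting"
# lines (same grep pipeline as handoff_check).
awaiting_ids() {
  grep -iE 'CLAIMED.*awaiting' | grep -oiE 'M[0-9]+(\.[0-9]+)?|[A-Z][0-9]+' \
    | tr '[:lower:]' '[:upper:]' | sort -u
}

# $1 = previously seen ids, $2 = current ids (newline-separated, both from awaiting_ids).
# stdout: ids present now but not before, i.e. the ones that should ping the Adversary.
new_ids() {
  comm -13 <(printf '%s\n' "$1") <(printf '%s\n' "$2") | sed '/^$/d'
}

# $1 = inbox file, $2 = last-seen hash. stdout: the hash to remember (empty once the receiver has
# deleted the file). Exit 0 = new or rewritten message, so ping the receiver.
inbox_edge() {
  local file="$1" seen="$2" h
  if [[ ! -f "$file" ]]; then
    echo ""
    return 1
  fi
  h="$(md5sum "$file" | awk '{ print $1 }')"
  echo "$h"
  [[ -n "$h" && "$h" != "$seen" ]]
}
```

Because the remembered hash resets to empty when the receiver deletes the file, a later message with identical text still re-triggers.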
@@ -184,15 +232,15 @@ watchdog_loop() {
           handoff_reset
           start_loops
         else
-          log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — MANUAL CHECK-IN required before Phase 2."
+          log "PHASE SEQUENCE COMPLETE (last phase $pid DONE). Stopping loops — entire build (1c→4) finished."
           stop_loops
-          printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; manual check-in required before Phase 2.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
+          printf 'cc-ci phase sequence complete %(%F %T)T. Phases: %s. Loops stopped; entire build finished.\n' -1 "$(all_ids)" > "$LOG_DIR/SEQUENCE-COMPLETE"
           log "watchdog exiting."
           exit 0
         fi
       else
-        session_alive "$BUILDER_SESSION" || { log "builder gone — restarting (phase $pid)"; start_agent builder "$BUILDER_SESSION" "$BUILDER_DIR"; }
-        session_alive "$ADV_SESSION" || { log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION" "$ADV_DIR"; }
+        if session_alive "$BUILDER_SESSION"; then nudge_if_limit_stalled "$BUILDER_SESSION"; else log "builder gone — restarting (phase $pid)"; start_agent builder "$BUILDER_SESSION" "$BUILDER_DIR"; fi
+        if session_alive "$ADV_SESSION"; then nudge_if_limit_stalled "$ADV_SESSION"; else log "adversary gone — restarting (phase $pid)"; start_agent adversary "$ADV_SESSION" "$ADV_DIR"; fi
       fi
     fi
     sleep "$SIGNAL_INTERVAL"
135
cc-ci-plan/plan-orchestrator-migration.md
Normal file
135
cc-ci-plan/plan-orchestrator-migration.md
Normal file
@ -0,0 +1,135 @@
|
||||
# Plan — migrate the orchestrator off the Pi onto a dedicated NixOS Incus VM
|
||||
|
||||
**Goal:** move everything that drives the cc-ci loops (the Builder/Adversary loops, the watchdog,
|
||||
the SOCKS proxy, the orchestrator session itself) off the Raspberry Pi and onto a new, dedicated,
|
||||
**reboot-resilient NixOS VM** on b1 — declared in a new git repo **`cc-ci-orchestrator`**. Finish by
|
||||
relocating this orchestrator session there too.
|
||||
|
||||
**Why:** the Pi has rebooted twice today, each time silently killing the tmux loops + watchdog
|
||||
(they don't survive reboot, nothing auto-restarts them). A NixOS VM lets us declare the whole rig
|
||||
(claude CLI, proxy, loop supervisor) as systemd services that come back on boot — turning a reboot
|
||||
into a non-event. It also consolidates the orchestrator next to the infra it manages.
|
||||
|
||||
**Status:** DRAFT — awaiting operator go-ahead before any infra creation / cutover.
|
||||
|
||||
---
|
||||
|
||||
## 0. Current footprint (what has to move)
|
||||
|
||||
On the Pi (`raspberrypi`, aarch64), workspace `/srv/cc-ci` (itself the
|
||||
`cc-ci-autonomous-orchestrator` git repo):
|
||||
|
||||
| Item | What | Move strategy |
|
||||
|---|---|---|
|
||||
| `cc-ci-plan/` | loop code: `launch.sh`, `plan*.md`, `prompts/`, `kickoff.md` | in git (this repo) → clone on VM |
|
||||
| `cc-ci/`, `cc-ci-adv/` | Builder + Adversary working clones (~13M each) | **re-clone from git.autonomic.zone** on the VM (cleaner than copying) |
|
||||
| `.cc-ci-logs/` | watchdog/loop logs + `.phase-idx` | copy `.phase-idx` (the resume point); logs start fresh |
|
||||
| `cc-ci-secrets/` | sops-encrypted secrets repo | in git → clone |
|
||||
| `references/` | recipe-maintainer corpus (read-only parity source) | clone/rsync from `/srv/recipe-maintainer` |
|
||||
| **`.testenv`** | TS auth key, Gitea bot creds | **out-of-band copy** (gitignored, never in git) |
|
||||
| **`~/.ssh/cc-ci-root-ed25519`** | root SSH key to cc-ci | **out-of-band copy** |
|
||||
| **`.sops/master-age.txt`** | master recovery age key | **out-of-band copy** |
|
||||
| **Incus mTLS certs** (`/srv/incus-terraform-nix-vm-creator/terraform-secrets/`) | `terraform.{crt,key}`, `vm_ssh_key` | **out-of-band copy** — so the VM can itself manage VMs |
|
||||
| `cc-ci-tailscaled.service` | userspace SOCKS proxy :1055 | **re-declare as NixOS** (see §3) |
|
||||
| **claude CLI + auth** | `~/.local/bin/claude` v2.1.154 + `~/.claude.json` | install on VM + **operator `claude auth login`** (§4) |
|
||||
| this orchestrator session | the supervising claude conversation | **operator-assisted cutover** (§6) |
|
||||
|
||||
Two hard human-in-the-loop steps, called out explicitly: **claude auth on the new VM** (device-code
|
||||
login, can't be scripted) and the **final session cutover** (the operator connects to the new
|
||||
orchestrator session). Everything else I can do.
|
||||
|
||||
## 1. Target VM spec
|
||||
|
||||
- **Host/API:** b1 Incus, `https://100.117.251.31:8443`, project `terraform-ci`, mTLS certs (have).
|
||||
- **Name:** `cc-ci-orchestrator` (tailnet hostname too).
|
||||
- **Resources:** **2 GB RAM, 2 vCPU, 30 GB disk** (dir backend → resize needs a reboot; size at
|
||||
create time so no later grow). b1 has ample headroom (only cc-nix-test @8GB running).
|
||||
- **Image:** the existing imported NixOS base VM image (`incus-base-vm`) — already ships tailscale,
|
||||
openssh, git/jq/curl, flakes, cloud-init.
|
||||
- **Tailnet:** joins via a fresh `TS_AUTH_KEY` (operator provides, or reuse the keyed approach in
|
||||
`terraform-secrets/.test.env`). MagicDNS name `cc-ci-orchestrator.taila4a0bf.ts.net`.
|
||||
- **Bootstrap:** cloud-init writes the `cc-ci-orchestrator` flake config + `nixos-rebuild switch`.
|
||||
|
||||
## 2. The new `cc-ci-orchestrator` git repo (NixOS config)
|
||||
|
||||
A new **private** repo on `git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator` (bot is org
|
||||
admin). It is the NixOS config for this VM — the orchestrator's equivalent of what `cc-ci` is for the
|
||||
test server. Contents:
|
||||
|
||||
- `flake.nix` + `hosts/cc-ci-orchestrator/configuration.nix` — the VM's NixOS config.
|
||||
- **Packages:** `claude-code` (CLI), `git`, `tmux`, `python3`, `jq`, `openssh`, `nodejs` (claude
|
||||
runtime), `coreutils`, `nettools` (`nc` for the proxy ProxyCommand).
|
||||
- **`services.cc-ci-tailscaled`** — the userspace tailscaled SOCKS proxy on :1055, as a NixOS
|
||||
systemd service (port to NixOS from the Pi's `cc-ci-tailscaled.service`). This is the path to b1 +
|
||||
cc-ci.
|
||||
- **`services.cc-ci-orchestrator`** — a systemd service that runs `launch.sh start` with
|
||||
`RESUME_PHASE=1` **on boot** (after the proxy + network are up), as the workspace user. **This is
|
||||
the reboot-resilience fix** — the loops + watchdog come back automatically after any reboot.
|
||||
- **Secrets via sops-nix** (like cc-ci): the out-of-band secrets (`.testenv`, ssh key, incus certs)
|
||||
are sops-encrypted into the repo, decrypted at activation to their runtime paths. The **master age
|
||||
key** is the one irreducible out-of-band bootstrap secret placed on the VM once.
|
||||
- `~/.ssh/config` for `cc-ci` (root, ProxyCommand via :1055) declared.
|
||||
- **Excluded from git:** claude's own auth (`~/.claude.json`) — that's per-user login state, set up
|
||||
once interactively (§4), not committed.

## 3. Execution phases

### Phase A — provision the VM (reversible; safe to do while Pi loops keep running)
1. Create the `cc-ci-orchestrator` VM via the Incus API (2 GB / 2 vCPU / 30 GB, NixOS base image, TS
   auth key in cloud-init). Wait for tailnet join + ssh.
2. Verify: `ssh` in, check `tailscale status`, confirm `nixos-rebuild` is available, and confirm the
   VM can reach the b1 API + cc-ci through its own proxy once configured.

### Phase B — author + apply the `cc-ci-orchestrator` repo
3. Create the private git repo; author the flake/config (§2); commit/push.
4. Place the master age key on the VM; sops-encrypt the out-of-band secrets into the repo.
5. `nixos-rebuild switch` on the VM → proxy service up, packages present, services defined (loop
   supervisor **not yet started** — or started in a dry mode).

### Phase C — stage the workspace (no cutover yet)
6. On the VM: clone `cc-ci-autonomous-orchestrator` (the loop code), clone the Builder/Adversary
   working repos fresh from git.autonomic.zone, clone `cc-ci-secrets`, rsync `references/`.
7. Copy `.phase-idx` (resume point = phase 2) so the VM watchdog resumes the right phase.
8. **Operator step:** `claude auth login` on the VM (device code) so the loops can run
   `--remote-control --dangerously-skip-permissions`. Verify with a throwaway interactive `claude`
   session.

### Phase D — cutover (the only disruptive moment; pick a clean point)
9. **Quiesce the Pi:** stop the Pi loops + watchdog (`launch.sh stop`); confirm both loops are at a
   safe point (no half-written commit; `git status` clean in both clones, last work pushed).
10. **Start on the VM:** enable + start the `cc-ci-orchestrator` systemd service → `launch.sh start`
    (RESUME_PHASE=1) brings up Builder + Adversary + watchdog on the VM, resuming phase 2 from the
    repo state. Verify all three sessions + a handoff + public health.
11. **Decommission the Pi loops:** disable the Pi's `cc-ci-tailscaled` and leave the workspace in
    place as a read-only fallback, with no loops running. (Keep the Pi as a cold standby for a few
    days before deleting anything.)
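The safe-point check in step 9 (`git status` clean, last work pushed) is easy to script for each clone. The following is a minimal Python sketch. It assumes Python 3.9+ and a configured upstream branch in each clone, and the function names are hypothetical. It reads `git status --porcelain=v2 --branch`, whose `# branch.ab +<ahead> -<behind>` header line appears only when an upstream is set. A clone with no upstream is therefore treated as not safe.

```python
import subprocess
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneState:
    dirty: bool            # tracked changes or untracked files present
    ahead: Optional[int]   # local commits not yet pushed; None if no upstream is set


def parse_porcelain_v2(output: str) -> CloneState:
    """Parse `git status --porcelain=v2 --branch` output."""
    dirty = False
    ahead = None
    for line in output.splitlines():
        if line.startswith("# branch.ab "):
            # Header format: "# branch.ab +<ahead> -<behind>" (only present with an upstream)
            ahead = int(line.split()[2].lstrip("+"))
        elif line and not line.startswith("#"):
            # Entries for changed ("1"/"2"), unmerged ("u") or untracked ("?") paths
            dirty = True
    return CloneState(dirty=dirty, ahead=ahead)


def is_safe_point(state: CloneState) -> bool:
    """Safe to quiesce: clean tree and nothing left unpushed."""
    return not state.dirty and state.ahead == 0


def clone_is_safe(path: str) -> bool:
    """Run git in one clone and apply the safe-point rule."""
    result = subprocess.run(
        ["git", "-C", path, "status", "--porcelain=v2", "--branch"],
        check=True, capture_output=True, text=True,
    )
    return is_safe_point(parse_porcelain_v2(result.stdout))
```

After `launch.sh stop`, run something like `all(clone_is_safe(p) for p in (builder_clone, adversary_clone))`, where both path variables are placeholders for the two clones.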

### Phase E — move the orchestrator session (operator-assisted)
12. On the VM, start the orchestrator session: `claude --remote-control 'autonomous-orchestrator'
    --dangerously-skip-permissions` in a tmux session, seeded with AGENTS.md + this plan so it picks
    up the supervising role. The **operator connects** to it (claude.ai/code) — this is the
    "move myself" step; a session can't transplant itself across machines, so it's a fresh
    orchestrator session on the VM with full context from the repo.
13. This Pi-side orchestrator session hands off (writes a short state note) and goes idle/ends.

## 4. Risks & mitigations
- **claude auth (human step):** unavoidable device-code login on the VM. Mitigation: do it in Phase
  C, well before cutover; verify before quiescing the Pi.
- **Loops mid-work at cutover:** pick a quiet point (between gate claims / after a push); the loops
  re-orient from git on restart anyway, so the worst case is a re-run of an in-flight iteration.
- **Secrets sprawl:** out-of-band secrets are copied once, then sops-managed in the new repo; never
  committed in plaintext (same discipline as cc-ci). The master age key is the sole bootstrap secret.
- **Self-move gap:** between the Pi session ending and the operator connecting to the VM session,
  there is no live orchestrator. The watchdog (now a boot service) keeps the loops alive
  independently, so this gap is safe.
- **Rollback:** until the Pi workspace is deleted, reverting means stopping the VM service and
  running `launch.sh start` on the Pi again. Keep the Pi intact until the VM has run clean through
  at least one reboot + one gate handshake.
- **Reboot-resilience proof:** before trusting the VM, reboot it once and confirm the loops +
  watchdog + proxy all come back via systemd (the whole point of the move).

## 5. Operator-assisted steps (the only things I can't fully do)
1. Provide a fresh `TS_AUTH_KEY` for the VM (or confirm reuse of the one in `terraform-secrets`).
2. `claude auth login` on the VM (device code).
3. Connect to the new orchestrator session on the VM at cutover (Phase E).

Everything else (VM create, repo author, NixOS config, secret migration, workspace staging, the
loop cutover) I can drive.
@ -135,3 +135,40 @@ Blocking unless noted; these are *plan-relevant invariants visible only by readi
- Whether to add Python **type-checking** (mypy/pyright) now or defer to `IDEAS.md`.
- The precise **blocking vs advisory** split for the checklist.
- Whether the `.drone.yml` lint stage should **fail** the build or just warn initially.

---

## 7. Operator review items (added 2026-05-27) — repo layout (do in this 1b pass)

Two structural-review items from the operator. Both are **blocking** for 1b. Apply them as part of
this pass, then re-verify (RL3 covers the re-verification). **Mind the coordination caveats — these
touch the live flake build and the running multi-agent machinery.**

### RL5 — Consolidate all Nix-code folders under a root `nix/`
- Move the folders that contain `.nix` code — **`modules/` and `hosts/`** — to **`nix/modules/` and
  `nix/hosts/`**. (Add future Nix dirs under `nix/` too.)
- **Keep `flake.nix` / `flake.lock` at the repo root** (entry point) so the build ref is unchanged
  (`docs/install.md`'s `nixos-rebuild switch --flake 'git+file://…?submodules=1#cc-ci'` stays valid).
  Just update the flake's internal paths (`./modules` → `./nix/modules`, `./hosts` → `./nix/hosts`)
  and any `imports`/`scripts`/`.drone.yml` references.
- **Re-verify after the move:** the byte-identical clean-room result is the bar. The toplevel store
  hash *will* change (paths differ) — that's fine; what must hold is that a fresh recursive clone
  still rebuilds **byte-identical to the running system** and the Adversary re-confirms it cold
  (folds into RL3). Update `docs/architecture.md` to describe the `nix/` layout.
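Stale references to the old locations are the most likely breakage after the `git mv`. The following is a rough Python sketch for finding them. It assumes references are written `./modules/...` / `./hosts/...`, matching the flake's internal paths. The suffix list and the function names are illustrative. A clean scan does not replace the byte-identical rebuild check; it only catches obvious leftovers early.

```python
import re
from pathlib import Path

# "./modules/..." or "./hosts/..." as written in flake/imports/scripts before RL5.
# The lookbehind skips "../modules"; references written without "./" are not caught.
STALE_REF = re.compile(r"(?<![.\w])\./(modules|hosts)\b")
SCAN_SUFFIXES = {".nix", ".yml", ".yaml", ".sh", ".py", ".md"}


def find_stale_refs(text: str) -> list[tuple[int, str]]:
    """Return (line number, stripped line) pairs still pointing at ./modules or ./hosts."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), start=1)
        if STALE_REF.search(line)
    ]


def scan_repo(root: str) -> dict[str, list[tuple[int, str]]]:
    """Map file path -> stale references, skipping .git/ and unlisted suffixes."""
    hits = {}
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file() or path.suffix not in SCAN_SUFFIXES:
            continue
        found = find_stale_refs(path.read_text(errors="replace"))
        if found:
            hits[str(path)] = found
    return hits
```

`.drone.yml` is covered because its suffix is `.yml`.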

### RL6 — Move uppercase multi-agent-protocol files into `machine-docs/`
- Move the uppercase protocol files — **`STATUS*.md`, `REVIEW*.md`, `JOURNAL*.md`, `BACKLOG*.md`,
  `DECISIONS.md`** — into a root **`machine-docs/`** folder. **`README.md` stays in the repo root**
  (operator decision, 2026-05-27) — it is the human-facing repo readme, not a protocol file; do
  **not** move it into `machine-docs/`.
- **Update every reference** to the new paths: the `cc-ci-plan/` plans (this file, `plan.md`,
  `plan-phase1c-*`, `README.md`, `kickoff.md`, `test-e2e-testme-acceptance.md`), `AGENTS.md`,
  `.drone.yml`, `scripts/`, and any in-repo doc that points at `STATUS.md`/`REVIEW.md`/etc.
- **⚠ COORDINATION CAVEAT (do not move these unilaterally mid-run):** the live **watchdog**
  (`cc-ci-plan/launch.sh`, the orchestrator's file) reads `STATUS-<id>.md` and `REVIEW-<id>.md` at
  the **repo root** to drive handoff pings + the 1c→1b auto-transition. Moving them breaks the
  running watchdog until `launch.sh` is updated to the `machine-docs/` paths and the watchdog is
  restarted. **So sequence it with the orchestrator:** the orchestrator updates `launch.sh`'s
  `PHASES_SPEC`/path logic and restarts the watchdog **in lockstep** with the loops' `git mv`.
  It is safest to do this **near the end of 1b** (or as its final step), not while a phase
  transition is pending. Flag the orchestrator when ready and it will handle `launch.sh` + the
  watchdog restart.
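The coordination caveat comes down to where the watchdog looks for protocol files. The following is a minimal Python sketch of a path-tolerant lookup. It assumes the watchdog may prefer `machine-docs/` and fall back to the repo root during the `git mv` window. `launch.sh` itself is shell, so this only illustrates the resolution order, and the function names are hypothetical. The `## DONE` check follows the STATUS convention the phase plans use for their exit condition.

```python
from pathlib import Path
from typing import Optional

PROTOCOL_DIR = "machine-docs"


def resolve_protocol_file(repo_root: str, name: str) -> Optional[Path]:
    """Locate a protocol file (e.g. STATUS-1b.md), preferring machine-docs/.

    Falls back to the repo root so a clone that has not yet picked up the
    RL6 move still resolves; returns None if the file exists in neither place.
    """
    root = Path(repo_root)
    for candidate in (root / PROTOCOL_DIR / name, root / name):
        if candidate.is_file():
            return candidate
    return None


def phase_done(repo_root: str, phase_id: str) -> bool:
    """True once STATUS-<id>.md contains a `## DONE` heading line."""
    path = resolve_protocol_file(repo_root, f"STATUS-{phase_id}.md")
    if path is None:
        return False
    return any(line.strip() == "## DONE" for line in path.read_text().splitlines())
```

Preferring the new location means that, once both copies briefly exist, the moved file wins and the root copy is ignored.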

243
cc-ci-plan/plan-phase1d-generic-test-suite.md
Normal file
@ -0,0 +1,243 @@
# cc-ci Phase 1d — Generic test suite + layered recipe overlays (Autonomous Build Plan)

**Status:** QUEUED — runs **after Phase 1b** (`plan-phase1b-review-lint.md`) and **before Phase 2**
(`plan-phase2-recipe-tests.md`). It is the **test-architecture foundation** Phase 2 builds on, so it
must precede it.
**Transition:** **manual** (operator kicks it off at the post-1b check-in).
**Builds on:** the post-1b codebase (the runner/harness, `.drone.yml`, the comment-bridge, the proof
recipes, the `nix/` + `machine-docs/` layout from 1b RL5/RL6).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`
**Phase order:** 1c → 1b → **1d** → 2 → 2b → 3.

---

## 0. Why this phase

Today a recipe only gets tested if someone has authored tests for it. That doesn't scale to ~18+
recipes and gives nothing for a brand-new recipe. The operator's model (2026-05-27): **every recipe
gets a generic lifecycle test suite for free**, and recipe-specific tests *layer on top* rather than
being the only thing that runs. This makes `!testme` meaningful on **any** recipe immediately, turns
Phase 2 into "add overlays where they add value" instead of "write everything from scratch," and
gives Phase 3 a natural basis for a YunoHost-style **level** (how many tiers pass).

Core principle: **the generic is the default for each lifecycle op; a recipe's own `test_<op>.py`
may override *or* extend that default.** The exact additive-vs-override mechanism is the **Builder's
design call** (operator, 2026-05-27: override — e.g. a present `test_install.py` *replaces* the
generic install assertions — is a perfectly good option; additive is fine too; could even be
per-recipe opt-in). What's **fixed** and non-negotiable: (1) when a recipe defines **no** test for an
op, **the generic runs** (so any recipe is testable with zero config); (2) the harness owns a
**single shared deployment** that generic + recipe assertions reuse — **no redeploy** (§2.2); (3)
custom (non-lifecycle) tests are opt-in.

> **SUPERSEDED (2026-05-28) by Phase 1e HC3** (`plan-phase1e-harness-corrections.md`): the override
> default is replaced — the **generic now runs by default *alongside* an overlay (additive)**,
> skipped only via an explicit opt-out. Also: repo-local PR code is gated to approved recipes (HC2),
> and the upgrade tier targets the PR head (HC1). Read 1e for the current behavior.

---

## 1. Definition of Done (Phase 1d exit condition)

Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
`machine-docs/REVIEW-1d.md`):

- [ ] **DG1 — Generic INSTALL test.** A recipe-agnostic install test exists that, given only a
  recipe name and **zero** recipe-specific config, does `abra app new` (sane defaults / auto
  secrets) → `deploy` → polls to converged (all services running/healthy, no bare `sleep`) →
  asserts the app is **actually serving** (real HTTP(S) response on its
  `<run>.ci.commoninternet.net` domain through Traefik — **not** a Traefik 404/default cert, not
  health-only). Demonstrated **green** on ≥1 simple recipe (e.g. `custom-html`) that has **no**
  cc-ci or repo-local tests.
- [ ] **DG2 — Generic UPGRADE test.** Deploy the **previous published version** (the last release
  *before* the code under test), then **upgrade to the code under test (PR head) via
  `abra app deploy --chaos`** (chaos = the current checkout) — i.e. previous-release → PR-head,
  not previous → newest-published-tag. Assert services reconverge and the app still serves.
  **OPERATOR CORRECTION (2026-05-28):** the current 1d impl upgrades to the newest *published tag*
  and (because deploying the prev tag re-checks-out the recipe) never deploys the PR head — so a
  recipe PR's actual changes aren't exercised by the upgrade path. Fix: after deploying prev,
  **restore the PR-head checkout** (re-checkout the PR ref / re-snapshot) and `deploy --chaos` to
  it as the upgrade target. The "deployment actually moved" assertion (`do_upgrade`) still
  applies, but adapt it for prev→PR-head (a PR may not bump the version label — also accept an
  image/config change, or assert the running config now matches the PR head). For a non-PR
  `!testme`, "current checkout" = the catalogue current, so upgrade tests prev→current.
  (Data-continuity assertions remain recipe overlays — see §2.1.)
- [ ] **DG3 — Generic BACKUP + RESTORE tests.** For backup-capable recipes (declare backupbot /
  `abra app backup` support): run backup → assert a snapshot artifact is produced; then restore →
  assert restore completes and the app is healthy after. For recipes that declare **no** backup
  config, backup/restore are cleanly **N/A (skipped)** — *not* a failure.
- [ ] **DG4 — Layering (override-or-extend; generic is the default).** The harness composes a
  recipe's run from the generic default per op, **overridden or extended** by the recipe's
  `test_install.py` / `test_upgrade.py` / `test_backup.py` / `test_restore.py` when present (in
  cc-ci's tests dir **or** repo-local in the recipe's `tests/`) — the additive-vs-override
  mechanism is the Builder's design choice, recorded in `machine-docs/DECISIONS.md`. **Invariant:
  if a recipe defines no test for an op, the generic runs.** Arbitrary **custom** `test_*.py` run
  **only if defined**. Discovery + cc-ci-vs-repo-local precedence is
  implemented and settled in `machine-docs/DECISIONS.md`.
- [ ] **DG4.1 — Overlays reuse the deployment (no redeploy).** Overlay + custom assertions run
  against the **same live deployment** the generic tier brought up — **one deploy per run, one
  teardown** (§2.2), lifecycle ops mutating it in place. Verified: adding an overlay to a recipe
  causes **no** extra `abra app new/deploy/undeploy` beyond the shared run (assert via
  deploy-count / harness logs). Re-provisioning only where an op semantically demands it, and
  then explicitly.
- [ ] **DG5 — Custom install-steps hook (with the "generic-anyway" rule).** The harness supports
  **defined** per-recipe extra install steps (cc-ci or repo-local) that run before/around the
  generic install. A recipe with **no** customization still **attempts the generic suite**.
  Proven both ways: (a) a recipe needing a custom step **fails the generic install as expected**
  when the step is absent; (b) the **same** recipe **passes** once the custom step is added —
  demonstrating the hook + the graceful-generic-failure are both real.
- [ ] **DG6 — `!testme` end-to-end on an unconfigured recipe.** A `!testme` on a recipe with no
  cc-ci/repo-local customization runs the full generic suite through the real pipeline
  (bridge → Drone → deploy → assert → undeploy → report) and reports **per-operation**
  pass/fail/skip (install/upgrade/backup/restore).
- [ ] **DG7 — Real, DRY, clean.** No softened/`skip`/`xfail`/can't-fail assertions; generic logic
  lives in the **shared harness** (M6.5 — no per-recipe copy-paste); every run **undeploys**
  (teardown in `finally`) and respects `MAX_TESTS`.
- [ ] **DG8 — Documented + cold-verified.** `docs/` explains the generic suite, the overlay
  convention (file names + locations + precedence), and the custom-install-steps hook + how to
  add a recipe overlay. The Adversary re-runs the acceptance checks from a cold start within 24h.
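DG1's "actually serving" bar means ruling out Traefik's own fallbacks. The following is a minimal Python sketch of that decision. It assumes the probe result (status, body, and the CN of the presented certificate) has already been collected, for example with an HTTP client plus a TLS inspection step not shown here. It relies on two Traefik defaults: the plain-text `404 page not found` body for unmatched routes, and the self-signed `TRAEFIK DEFAULT CERT` certificate. `Probe` and `serving_verdict` are hypothetical names. Treating every status >= 400 as "not serving" is a simplification; an app that answers with a login-wall 401 would need an overlay.

```python
from dataclasses import dataclass

# Traefik's built-in fallbacks: the body of its 404 for unmatched routes, and the
# CN of the self-signed certificate it serves when no certificate matches the host.
TRAEFIK_404_BODY = "404 page not found"
TRAEFIK_DEFAULT_CERT_CN = "TRAEFIK DEFAULT CERT"


@dataclass
class Probe:
    status: int
    body: str
    cert_cn: str  # common name of the leaf certificate presented for the domain


def serving_verdict(probe: Probe) -> tuple[bool, str]:
    """Return (ok, reason); ok only if the response looks like the app, not Traefik."""
    if probe.cert_cn == TRAEFIK_DEFAULT_CERT_CN:
        return False, "Traefik default certificate: no cert/router for this domain"
    if probe.status == 404 and probe.body.strip() == TRAEFIK_404_BODY:
        return False, "Traefik 404: no router matched the domain"
    if probe.status >= 400:
        return False, f"HTTP {probe.status} from the app"
    return True, f"HTTP {probe.status} from the app"
```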

When DG1–DG8 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1d.md`.

---

## 2. The layered test model (the core design)

For a given recipe, the harness assembles and runs **tiers**, each = `generic [+ overlays]`:

(`gen` = generic default; `→` = "overridden/extended by, if the recipe defines it" — mechanism is
the Builder's call, §2.2):
```
INSTALL = gen_install → test_install.py  (else gen)   ← always runs
UPGRADE = gen_upgrade → test_upgrade.py  (else gen)   ← always runs
BACKUP  = gen_backup  → test_backup.py   (else gen)   ← if backup-capable, else N/A
RESTORE = gen_restore → test_restore.py  (else gen)   ← if backup-capable, else N/A
CUSTOM  = recipe test_*.py (anything beyond the four) ← ONLY if defined
```
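The per-op composition in the table can be sketched as a pure planning step. The following is a minimal Python sketch of the 1d override model, where a present `test_<op>.py` replaces the generic; 1e HC3 later makes the generic additive. `plan_tiers` and `Tier` are hypothetical names, and backup capability is passed in rather than detected.

```python
from dataclasses import dataclass

LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


@dataclass
class Tier:
    op: str
    source: str  # "overlay", "generic" or "n/a"


def plan_tiers(recipe_tests: set[str], backup_capable: bool) -> tuple[list[Tier], list[str]]:
    """Compose one recipe's run per the table (1d override model).

    recipe_tests: test file names the recipe defines, e.g. {"test_install.py"}.
    Returns the four lifecycle tiers plus the custom tests to run.
    """
    tiers = []
    for op in LIFECYCLE_OPS:
        if op in ("backup", "restore") and not backup_capable:
            tiers.append(Tier(op, "n/a"))          # clean skip, not a failure
        elif f"test_{op}.py" in recipe_tests:
            tiers.append(Tier(op, "overlay"))      # recipe test replaces the generic
        else:
            tiers.append(Tier(op, "generic"))      # nothing defined: the generic runs
    lifecycle = {f"test_{op}.py" for op in LIFECYCLE_OPS}
    custom = sorted(
        name for name in recipe_tests
        if name.startswith("test_") and name.endswith(".py") and name not in lifecycle
    )
    return tiers, custom
```

A recipe with no tests at all still gets install and upgrade from the generic suite. This is the "zero config" invariant.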

### 2.1 Generic baseline suite (recipe-agnostic)
- **install** — `abra app new <recipe> --domain <run>.ci.commoninternet.net` with non-interactive
  defaults + auto-generated secrets; `abra app deploy`; poll to converged; assert a real HTTP(S)
  response from the app over its domain (status + that it's the app, not Traefik's fallback).
- **upgrade** — deploy a prior/pinned version, `abra app upgrade` to target, assert reconverge +
  still serving.
- **backup / restore** — only for recipes declaring backup config; verify the **mechanism** (backup
  produces an artifact; restore completes + app healthy). **Honest limit:** generic backup/restore
  can't assert app-specific *data integrity* without recipe knowledge — that's a recipe overlay
  (`test_backup.py`/`test_restore.py` seed a marker + assert it survives). State this in docs.
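"Poll to converged" with no bare `sleep` can be a deadline-bounded loop. The following is a minimal Python sketch. It assumes convergence is judged from `docker service ls --format '{{.Name}} {{.Replicas}}'` output already filtered to the app's stack. Replica counts are only one signal, and health status may need a separate check. The clock and sleep are injectable so the loop can be tested without waiting. All names are hypothetical.

```python
import time
from typing import Callable


class ConvergeTimeout(Exception):
    """Raised when the deployment never reaches the converged state."""


def poll_until(check: Callable[[], bool], timeout: float, interval: float = 2.0,
               max_interval: float = 15.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Call check() until it is True and return the seconds waited.

    Unlike a bare `sleep N`, this proceeds as soon as the condition holds and
    fails loudly if it never does. The delay doubles up to max_interval.
    """
    start = clock()
    delay = interval
    while True:
        if check():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise ConvergeTimeout(f"not converged after {elapsed:.0f}s")
        sleep(min(delay, timeout - elapsed))
        delay = min(delay * 2, max_interval)


def all_replicas_running(service_ls: str) -> bool:
    """Parse `docker service ls --format '{{.Name}} {{.Replicas}}'` lines.

    Converged = at least one service, and every service shows running == desired > 0.
    """
    rows = [line.split() for line in service_ls.splitlines() if line.strip()]
    if not rows:
        return False
    for row in rows:
        running, _, desired = row[1].partition("/")
        if not desired.isdigit() or int(desired) == 0 or int(running) != int(desired):
            return False
    return True
```

Combined, this would read something like `poll_until(lambda: all_replicas_running(list_services()), timeout=600)`, where `list_services` is a placeholder for however the harness runs `docker service ls`.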

### 2.2 Layering — override or extend (Builder's call), always **reuse the deployment**
A recipe-defined `test_install.py` (etc.) either **overrides** the generic assertions for that op or
**extends** them — the Builder picks the mechanism (a present `test_<op>.py` replacing the generic is
a fine, simple model; or additive; or per-recipe opt-in). The invariant either way: **if a recipe
defines nothing for an op, the generic runs.** This guarantees every recipe is testable with zero
config, while letting a recipe with a poor generic fit supply its own.

**Reuse the deployment — do NOT redeploy per test (operator requirement, 2026-05-27).** Overlay
assertions run against the **same live deployment** the generic tier already brought up — no extra
`abra app new`/`deploy`/`undeploy` per overlay. The target shape: **one deploy per run**, then the
lifecycle ops mutate that single deployment in sequence and *all* assertions (generic + overlays +
custom) run against it, then **one teardown** at the end:

```
deploy ONCE
  → INSTALL   assertions: generic_install + test_install.py   (same live app)
  → UPGRADE   in place (abra app upgrade)
              assertions: generic_upgrade + test_upgrade.py   (same app, upgraded)
  → BACKUP    (if capable) → generic_backup + test_backup.py
  → RESTORE   (if capable) → generic_restore + test_restore.py
  → CUSTOM    test_*.py   (same live app)
teardown ONCE (in finally)
```

So a seed/marker written by `test_backup.py` is the same data `test_restore.py` checks; an overlay's
extra HTTP assertion hits the app the generic install already deployed. Tiers that intentionally
change state (UPGRADE, RESTORE) do so on the shared deployment in order. The only time a tier
re-provisions is when an op semantically requires it (e.g. a from-scratch restore-into-blank variant)
— and that must be explicit, not the default. This is also the main Phase-2b speed win.
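The deploy-once shape can be sketched as one driver loop. The following is a minimal Python sketch. It assumes the harness exposes deploy/teardown callables and that each tier is a tuple of (name, optional mutating op, assertions). All names are hypothetical. The sketch counts deploys so that DG4.1's "no extra deploy" can be asserted. It keeps going after a failed tier (continue-and-report), which is only one answer to the fail-fast question still open in §6.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Check = Callable[[], None]
# (tier name, mutating op or None, assertions to run against the shared app)
TierSpec = tuple[str, Optional[Check], list[Check]]


@dataclass
class Harness:
    """Hypothetical hooks; the real runner's names will differ."""
    deploy: Check      # `abra app new` + `abra app deploy`, exactly once per run
    teardown: Check    # undeploy + cleanup
    deploy_count: int = 0
    results: dict[str, str] = field(default_factory=dict)


def run_recipe(h: Harness, tiers: list[TierSpec]) -> dict[str, str]:
    """Deploy once, run every tier in order on that deployment, tear down once."""
    try:
        h.deploy_count += 1
        h.deploy()
        for name, op, assertions in tiers:
            try:
                if op is not None:
                    op()                 # mutates the shared deployment in place
                for check in assertions:
                    check()              # generic + overlay + custom all hit the same app
                h.results[name] = "pass"
            except AssertionError:
                h.results[name] = "fail"
            except Exception:
                h.results[name] = "error"   # op/harness breakage, not an assertion
    finally:
        h.teardown()                     # teardown always runs, even if deploy fails
    return h.results
```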

### 2.3 Custom tests
`test_*.py` that aren't one of the four lifecycle names run **only when present** for that recipe —
no generic equivalent, purely opt-in (e.g. `test_sso.py`, `test_federation.py`).

### 2.4 Custom install steps (and the graceful-generic rule)
Some recipes need extra setup the generic flow won't do (pre-seed a DB, set an env/secret, run a
one-off command). The harness exposes a **defined** per-recipe install-steps hook (cc-ci or
repo-local). Rules:
- If a recipe **declares** custom install steps → run them as part of the install tier.
- If a recipe has **no** customization defined anywhere → **still attempt the generic suite.**
  Recipes that genuinely need special steps will **fail the generic install — and that's acceptable
  and expected**; the failure is reported (per-op), and the fix is to add the custom step (Phase 2
  work), not to special-case the harness.

### 2.5 Discovery + precedence
- **Locations:** cc-ci's test dir (e.g. `tests/<recipe>/`) and the recipe repo's `tests/`
  (repo-local). The harness discovers overlays + custom-install-steps from both.
- **Precedence (OPEN — settle in DECISIONS):** proposed default — **both layer**; repo-local is the
  upstream-authoritative source and always runs, cc-ci's overlay runs in addition (and may pin/extend
  for the CI env). Define the rule for same-named collisions explicitly.
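The proposed "both layer" default can be sketched as a discovery pass over the two locations. The following is a minimal Python sketch. It assumes overlays and custom tests are `test_*.py` files and the install hook is a file named `install_steps.sh` (the shell-hook option listed in §6). The collision rule here (both run, repo-local first) is one possible choice, not a settled one. Phase 1e HC2 later gates the repo-local side behind an allowlist.

```python
from pathlib import Path


def discover(ccci_dir: Path, repo_dir: Path) -> dict[str, list[Path]]:
    """Map each test file name to the files to run, in run order.

    Assumed collision rule: a name defined in both places runs both, repo-local
    first (it is the upstream-authoritative source), then cc-ci's overlay.
    """
    plan: dict[str, list[Path]] = {}
    for base in (repo_dir, ccci_dir):
        if not base.is_dir():
            continue                       # a recipe may have no tests in this location
        for path in sorted(base.glob("test_*.py")):
            plan.setdefault(path.name, []).append(path)
    return plan


def install_hooks(ccci_dir: Path, repo_dir: Path) -> list[Path]:
    """install_steps.sh hooks to run around the generic install, same order."""
    return [d / "install_steps.sh" for d in (repo_dir, ccci_dir)
            if (d / "install_steps.sh").is_file()]
```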

---

## 3. Milestones (bounded)

- **G0 — Generic install.** Implement generic_install in the shared harness; green on `custom-html`
  with no recipe config. *Accept:* DG1.
- **G1 — Generic upgrade + backup/restore.** Add generic_upgrade; add generic_backup/restore gated
  on backup-capability (clean N/A otherwise). *Accept:* DG2, DG3.
- **G2 — Layering + discovery.** Implement the generic+overlay composition and cc-ci/repo-local
  discovery + precedence; prove an overlay runs on top of generic. *Accept:* DG4.
- **G3 — Custom install-steps hook + graceful-generic.** Implement the hook; demonstrate the
  fail-without / pass-with proof on one recipe needing a step. *Accept:* DG5.
- **G4 — `!testme` integration + per-op reporting + docs + cold verify.** *Accept:* DG6, DG7, DG8;
  then flip `machine-docs/STATUS-1d.md` to `## DONE`.

---

## 4. Guardrails
- **Generic is the default; recipe tests override or extend it** (Builder's mechanism) — but the
  generic **always** runs when a recipe defines no test for that op. Never let a recipe end up with
  *no* assertion for an op it should be tested on.
- **Generic failure ≠ harness bug** — a recipe needing custom steps failing the generic install is a
  correct, reported outcome; fix by adding the step (Phase 2), don't weaken/special-case the generic.
- **Deploy once, reuse it** — overlays run against the generic tier's live deployment; no
  per-test/per-overlay redeploy. One deploy + one teardown per run; re-provision only when an op
  truly needs it, explicitly. (Correctness *and* the main perf lever.)
- **DRY** — generic logic in the shared harness, not per-recipe (M6.5); overlays are thin.
- **No weakened tests** — real assertions on real app state; teardown always; honor `MAX_TESTS`.
- **Bounded** — build the architecture + prove it on a couple of recipes; the full per-recipe overlay
  authoring is Phase 2, not here.

---

## 5. Impact on later phases (reshapes the plan set)
- **Phase 2 (`plan-phase2-recipe-tests.md`)** changes from "author every test from scratch" to:
  every enrolled recipe gets the generic suite for free; Phase 2 = **author the additive overlays**
  (port recipe-maintainer tests as `test_*.py` overlays) **+ define custom install steps** where a
  recipe fails generically. Update Phase 2 to reference 1d as its foundation.
- **Phase 3 (`plan-phase3-results-ux.md`)** — the YunoHost-style **level** maps cleanly onto the
  tiers: e.g. installs (generic) → +upgrade → +backup/restore → +custom assertions. The level is
  derived from which tiers pass.
- **Phase 2b (perf)** — the generic suite is the common hot path, so it's the prime target for the
  image-cache / readiness / dedup optimizations.
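Deriving the level from tier results takes only a few lines. The following is a minimal Python sketch. It assumes the per-op vocabulary from §6 (`pass`/`fail`/`skip`/`error`) and the rung order given in the Phase 3 bullet. The handling of N/A tiers is an assumption: they neither block climbing nor earn a rung by themselves. The names are hypothetical.

```python
# Rungs in the order given for Phase 3: installs, +upgrade, +backup/restore, +custom.
LADDER = [("install",), ("upgrade",), ("backup", "restore"), ("custom",)]
BLOCKING = {"fail", "error"}


def level(results: dict[str, str]) -> int:
    """Highest rung reached, climbing until a rung contains a fail/error.

    results maps tier -> "pass" | "fail" | "skip" | "error"; a missing tier is
    treated as "skip". Assumption: skip (N/A) neither blocks nor earns a rung.
    """
    reached = 0
    for rung_number, rung in enumerate(LADDER, start=1):
        outcomes = [results.get(tier, "skip") for tier in rung]
        if any(o in BLOCKING for o in outcomes):
            break
        if "pass" in outcomes:
            reached = rung_number
    return reached
```

Under this rule, a recipe without backup support can still reach the custom rung. A stricter rule could stop at the first all-N/A rung instead.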

---

## 6. Open decisions (log in `machine-docs/DECISIONS.md`)
- **Override vs extend (Builder's call).** Does a present `test_<op>.py` **replace** the generic
  assertions for that op, **add** to them, or is it **per-recipe opt-in**? Operator leans: override
  is a good, simple model. Builder decides + documents — keeping the invariant "no recipe test for an
  op ⇒ generic runs" and the single-shared-deployment rule.
- **cc-ci vs repo-local precedence** for overlays + install-steps (§2.5) and same-name collision rule.
- Exact **overlay file convention**: fixed names (`test_install.py`…) + discovery dir layout
  (`tests/<recipe>/` in cc-ci; `tests/` in the recipe repo).
- How **custom install steps** are declared: a shell hook (`install_steps.sh`), a pytest fixture, or
  a declarative field — pick the simplest that the harness can run uniformly.
- **Backup-capability detection**: how the harness decides a recipe is backup-capable (backupbot
  labels present / `abra app backup` exit) to choose run-vs-N/A for DG3.
- Whether generic **upgrade** should always go previous→latest, or test the specific
  version-bump under `!testme` (PR-driven).
- Per-op result vocabulary (`pass`/`fail`/`skip(N/A)`/`error`) feeding the Phase-3 level.
- **Deployment-sharing scope**: confirm one-deploy-per-run for the whole lifecycle (install→upgrade→
  backup→restore→custom on a single deployment) vs per-tier deployments; and how a failed earlier
  tier (e.g. install) affects later tiers sharing that deployment (fail-fast vs continue-and-report).
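One option for backup-capability detection is to read backupbot labels from the recipe's compose file. The following is a minimal Python sketch. It assumes the compose file is already parsed into a dict (for example by a YAML library). It also assumes recipes opt in with a `backupbot.backup=true` label, which is the convention backupbot-enabled coop-cloud recipes typically use; verify this against the recipes in scope. Labels may be written either as a list of `key=value` strings or as a mapping, so both forms are normalized. The function names are hypothetical.

```python
def service_labels(service: dict) -> dict[str, str]:
    """Merge a service's top-level and deploy labels; accepts list or mapping form."""
    merged: dict[str, str] = {}
    for raw in (service.get("labels"), (service.get("deploy") or {}).get("labels")):
        if isinstance(raw, dict):
            merged.update({str(k): str(v) for k, v in raw.items()})
        elif isinstance(raw, list):
            for item in raw:
                key, _, value = str(item).partition("=")
                merged[key.strip()] = value.strip()
    return merged


def backup_capable(compose: dict) -> bool:
    """True if any service carries backupbot.backup=true (assumed opt-in label)."""
    for service in (compose.get("services") or {}).values():
        if service_labels(service or {}).get("backupbot.backup", "").lower() == "true":
            return True
    return False
```

A YAML boolean `true` in mapping form becomes `"True"` after `str()`, which the case-insensitive comparison still accepts.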
139
cc-ci-plan/plan-phase1e-harness-corrections.md
Normal file
@ -0,0 +1,139 @@
# cc-ci Phase 1e — Generic-harness corrections (Autonomous Build Plan)

**Status:** QUEUED — runs **after Phase 1d** and **before Phase 2** (`plan-phase2-recipe-tests.md`).
It corrects the **shared generic-test harness** from 1d, so it must land before Phase 2 authors
overlays on top of it.
**Transition:** **manual** (operator kicks it off).
**Builds on:** the Phase-1d generic suite (`runner/run_recipe_ci.py`, `runner/harness/*`,
`tests/_generic/*`, `tests/conftest.py`) — see `plan-phase1d-generic-test-suite.md`.
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md`
**Phase order:** 1c → 1b → 1d → **1e** → 2 → 2b → 3.

---

## 0. Why this phase

An operator review of the 1d generic suite (2026-05-28) found three corrections to the **shared
harness** — the foundation every recipe overlay (Phase 2) builds on. Fixing them now, once, is far
cheaper than after overlays exist. All three are small in code but change behavior, so each needs a
fresh Adversary cold-verification and must not weaken any existing test.

---

## 1. Definition of Done (Phase 1e exit condition)

Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
`machine-docs/REVIEW-1e.md`):

- [ ] **HC1 — Upgrade tier upgrades to the code under test (PR head), not a published tag.** The
  upgrade tier deploys the **previous published version** (last release before the PR) and then
  **upgrades to the PR head via `abra app deploy --chaos`** (chaos = the current checkout). The
  PR's actual changes are exercised by the upgrade path. (§2.1)
- [ ] **HC2 — Repo-local (PR-authored) code is not executed unless the recipe is approved.** By
  default the harness runs **only cc-ci-authored** overlays/install-steps (`tests/<recipe>/…`) +
  the generic; PR-authored repo-local `test_*.py` and `install_steps.sh` are **not run**.
  Repo-local code is honored **only for recipes on an explicit cc-ci-maintained approval
  allowlist** (default-deny). (§2.2)
- [ ] **HC3 — Generic runs by default (additive); skipping it is explicit.** When a recipe ships an
  overlay for an op, the **generic still runs** alongside it by default; the generic is skipped
  **only** when an explicit env/flag opts out. The baseline floor is never lost silently. (§2.3)
- [ ] **HC4 — No regression, cold-verified.** The Adversary re-runs the relevant D1–D10 / DG1–DG8
  acceptance from a cold start: nothing weakened, deploy-once (DG4.1) still holds, teardown still
  sacred, and the three new behaviors are demonstrated (HC1: a PR-head upgrade proven to deploy
  PR-head; HC2: a repo-local test is *ignored* for a non-approved recipe and *run* for an approved
  one; HC3: generic runs with an overlay present, and is skipped only with the opt-out set).

When HC1–HC4 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1e.md`.

---

## 2. The three corrections

### 2.1 HC1 — Upgrade to the PR head (not a published tag)
Current 1d behavior: deploy the previous published version, then `abra app upgrade` to the **newest
published tag** — and because deploying the prev tag re-checks-out the recipe, the **PR-head code is
never deployed**, so a recipe PR's changes aren't exercised by upgrade.

Corrected:
1. Deploy the **previous published version** (the last release before the code under test) as the
   "before" state.
2. **Restore the PR-head checkout** (re-checkout the PR ref / re-use the post-fetch snapshot — the
   prev-tag deploy will have reset `~/.abra/recipes/<recipe>`).
3. **Upgrade to it via `abra app deploy --chaos`** (chaos = current checkout = PR head) in place on
   the shared deployment.
4. Assert that services reconverge and the app still serves (as today).
- **Adapt the "deployment moved" assertion** (`generic.do_upgrade`): prev→PR-head may *not* bump the
  coop-cloud version label (a PR can change a recipe without a version bump), so also accept an
  image/config change, or assert the running config now matches the PR head — keep it non-vacuous
  without false-failing a legit unbumped PR.
- **Non-PR `!testme`** (no PR head): "current checkout" = the catalogue current, so upgrade tests
  prev→current — still valid.
- Preserve the **deploy-once** spirit: this is still one app deployment mutated in place (prev → chaos
  redeploy of PR head is the upgrade op, not a fresh second app). Reconcile with the DG4.1
  deploy-count guard — define whether a chaos redeploy counts as a "deploy" and adjust the guard so
  the legitimate upgrade isn't flagged (e.g. count `abra app new` installs, not in-place redeploys).
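The adapted "deployment moved" assertion and the reconciled deploy-count guard can be sketched as two small checks. The following is a minimal Python sketch. It assumes the harness can snapshot the running stack (version label, per-service images, and a digest of the rendered config) and can compute the expected PR-head snapshot. `DeploySnapshot`, `upgrade_landed` and `install_count` are hypothetical names. Requiring a match with the PR head keeps the check non-vacuous, and the version label is just one of the fields that may change.

```python
from dataclasses import dataclass


@dataclass
class DeploySnapshot:
    version_label: str        # coop-cloud version label on the running stack
    images: dict[str, str]    # service name -> image reference
    config_digest: str        # hash of the rendered compose + env (assumed computable)


def upgrade_landed(before: DeploySnapshot, after: DeploySnapshot,
                   pr_head: DeploySnapshot) -> tuple[bool, str]:
    """Non-vacuous prev -> PR-head check that tolerates an unbumped version label."""
    if after != pr_head:
        return False, "running deployment does not match the PR head"
    changed = [name for name in ("version_label", "images", "config_digest")
               if getattr(before, name) != getattr(after, name)]
    if not changed:
        # The PR changes nothing deployable; report it rather than inventing a diff.
        return True, "no-op: PR head deploys identically to the previous release"
    return True, "moved: " + ", ".join(changed)


def install_count(abra_calls: list[list[str]]) -> int:
    """Count fresh installs (`abra app new`); in-place `deploy --chaos` redeploys don't count."""
    return sum(1 for argv in abra_calls if argv[:3] == ["abra", "app", "new"])
```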

### 2.2 HC2 — Repo-local trust gate (default-deny; cc-ci overlays only)
`install_steps.sh` and repo-local `test_*.py` are PR-author-controlled code that runs on the CI host
with `/run/secrets/*` present — an untrusted-code risk. Operator decision (2026-05-28):

- **Default:** the harness runs **only cc-ci-authored** overlays + install-steps
  (`tests/<recipe>/…`) and the generic. Repo-local (`<recipe-repo>/tests/`) `test_*.py` and
  `install_steps.sh` are **discovered-but-not-executed**.
- **Approved recipes only:** repo-local code is honored **only** when the recipe is on an explicit,
  **cc-ci-maintained approval allowlist** (default-empty ⇒ default-deny). Adding a recipe to the
  allowlist is a deliberate cc-ci-maintainer act after reviewing that recipe's tests.
- Update `discovery.resolve_op` / `custom_tests` / `install_steps` so the **repo-local source is
  only consulted for allowlisted recipes**; otherwise precedence is **cc-ci > generic** only.
- **Open (settle in DECISIONS):** the allowlist's form + location (a checked-in file like
  `tests/repo-local-approved.txt`, or a field in a cc-ci config), and the approval workflow. Keep it
  simple + auditable + in git.
- (Future hardening, → IDEAS, not this phase: sandbox/network-restrict even cc-ci overlays.)
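A default-deny gate can stay simple, auditable and in git. The following is a minimal Python sketch. It assumes the allowlist is the checked-in `tests/repo-local-approved.txt` file suggested above, with one recipe name per line and `#` comments allowed. The allowlist's form is still open and the names are hypothetical. The source ordering for approved recipes is also an assumption.

```python
from pathlib import Path


def load_allowlist(path: Path) -> frozenset[str]:
    """Approved recipe names, one per line; '#' starts a comment.

    A missing or empty file yields an empty set, i.e. default-deny.
    """
    if not path.is_file():
        return frozenset()
    names = set()
    for line in path.read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return frozenset(names)


def allowed_sources(recipe: str, allowlist: frozenset[str]) -> tuple[str, ...]:
    """Sources the harness may execute for a recipe, highest precedence first."""
    if recipe in allowlist:
        # Assumed ordering for approved recipes; the real rule belongs in DECISIONS.md.
        return ("cc-ci", "repo-local", "generic")
    # Default: repo-local files may be discovered and listed, but are never executed.
    return ("cc-ci", "generic")
```

Because a missing file means "nobody approved", deleting or never creating the allowlist can only make the gate stricter.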

### 2.3 HC3 — Generic by default (additive), explicit opt-out
Supersedes 1d's pure-override default. New rule: when a recipe ships an overlay for an op, **both the
generic and the overlay run** for that op by default; the generic is skipped **only** when an
explicit opt-out is set.

- **Opt-out mechanism (proposed; settle in DECISIONS):** an env flag `CCCI_SKIP_GENERIC` (all ops) and
  per-op `CCCI_SKIP_GENERIC_<OP>` (e.g. `..._UPGRADE`), settable via the recipe's `recipe_meta.py`
  (a `SKIP_GENERIC` list) so it's declarative per recipe, not a hidden global.
- **Op-vs-assertion split (required by additive + deploy-once):** a mutating op (upgrade/backup/
  restore) must run **once**, then **both** the generic assertions and the overlay assertions
  evaluate the post-op state — never upgrade/backup twice. So refactor the tiers: the **orchestrator
  performs the op once** (the harness owns the op), then runs generic assertions (unless opted out) +
  overlay assertions against the shared post-op deployment. For `install` (no op) both assertion sets
  just run. This keeps deploy-once and one-op-per-tier intact.
- Net effect: the generic "is it actually serving / did the upgrade move / snapshot produced" floor
  is **always** exercised unless a recipe explicitly declares it skips generics — overlays add, they
  don't silently subtract.
|
||||
|
||||
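The opt-out resolution and the op-vs-assertion split fit together in one small sketch: decide whether the generic assertions are skipped, perform the mutating op exactly once, then run the generic (unless skipped) and overlay assertions against the same post-op deployment. The flag names follow the plan's proposal; the accepted truthy values and the `skip_generic` / `run_tier` names are assumptions for illustration.

```python
from typing import Callable, Iterable, Mapping, Optional

TRUTHY = {"1", "true", "yes"}  # assumption: how an env flag counts as "set"


def skip_generic(op: str, env: Mapping[str, str], meta_skip: Iterable[str] = ()) -> bool:
    """Global flag, per-op flag, or the recipe's declarative SKIP_GENERIC list."""
    if env.get("CCCI_SKIP_GENERIC", "").lower() in TRUTHY:
        return True
    if env.get(f"CCCI_SKIP_GENERIC_{op.upper()}", "").lower() in TRUTHY:
        return True
    return op.lower() in {m.lower() for m in meta_skip}


def run_tier(
    op: str,
    perform_op: Optional[Callable[[], None]],
    generic_asserts: list[Callable[[], None]],
    overlay_asserts: list[Callable[[], None]],
    skip: bool,
) -> list[str]:
    """Run the op at most once, then every applicable assertion on the same post-op state."""
    ran: list[str] = []
    if perform_op is not None:  # `install` has no separate mutating op
        perform_op()
        ran.append(f"op:{op}")
    checks = ([] if skip else generic_asserts) + overlay_asserts
    for check in checks:
        check()  # an AssertionError propagates and fails the tier
        ran.append(check.__name__)
    return ran
```

Because the op lives outside both assertion lists, adding an overlay can never cause a second upgrade or backup; it only adds checks.
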
---

## 3. Method / milestones (bounded)
- **E0 — HC2 trust gate.** Gate repo-local behind the approval allowlist (default-deny); cc-ci+generic
  only otherwise. *Accept:* repo-local ignored for a non-approved recipe, run for an approved one.
- **E1 — HC3 additive + op/assertion split.** Generic runs alongside overlays by default; op runs
  once; opt-out env skips the generic assertions. *Accept:* overlay + generic both run on one
  deployment; opt-out skips generic; deploy-count still 1.
- **E2 — HC1 upgrade-to-PR-head.** prev-release → PR-head via `deploy --chaos`; moved-assertion
  adapted; deploy-count guard reconciled. *Accept:* upgrade demonstrably deploys PR-head.
- **E3 — HC4 cold re-verification + docs.** Adversary cold-verifies no regression + the three new
  behaviors; update `docs/` + `machine-docs/DECISIONS.md`; flip `STATUS-1e.md` to `## DONE`.

---

## 4. Guardrails
- **Never weaken a test** — these are correctness/security fixes; the cardinal rule still wins.
- **Default-secure** — repo-local PR code is off unless the recipe is explicitly approved; the
  allowlist lives in git and is auditable.
- **Floor-by-default** — the generic baseline always runs unless a recipe explicitly opts out.
- **Deploy-once preserved** — one app deployment, one teardown; ops run once; reconcile the DG4.1
  guard with the chaos-upgrade redeploy.
- **Bounded** — three fixes + verification, then stop; bigger hardening (sandboxing) → IDEAS.

## 5. Open decisions (log in machine-docs/DECISIONS.md)
- HC2: approval-allowlist form/location + the approval workflow.
- HC3: opt-out flag name/granularity + declaring it via `recipe_meta.py`.
- HC1: how the DG4.1 deploy-count guard treats an in-place chaos upgrade (don't flag the legit op).
@@ -1,8 +1,13 @@
# cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan)

**Status:** QUEUED — starts after Phase 1 (`plan.md`) and the Phase-1b review/lint pass
(`plan-phase1b-review-lint.md`) reach `## DONE`.
**Builds on:** the Phase-1 cc-ci CI server (`plan.md`). This phase adds **test content**, not infra.
**Status:** QUEUED — starts after Phase 1b (`plan-phase1b-review-lint.md`) **and Phase 1d**
(`plan-phase1d-generic-test-suite.md`) reach `## DONE`.
**Builds on:** the Phase-1 cc-ci CI server (`plan.md`) **and Phase 1d's generic test suite** — every
recipe already gets generic install/upgrade/backup/restore for free, reusing one shared deployment.
So this phase is **not** "write every test from scratch": it's **authoring the recipe overlays**
(`test_<op>.py` that override/extend the generic, per 1d's model) + **defining custom install steps**
for recipes that fail the generic form, porting the recipe-maintainer corpus as overlays. This phase
adds **test content**, not infra.
**Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing,
human-maintained recipe tests — the canonical source to port from).
**Owner agents:** same Builder + Adversary loops + coordination protocol as Phase 1 (`plan.md` §6/§7).
@@ -10,6 +15,44 @@ human-maintained recipe tests — the canonical source to port from).

---

## 0c. OIDC / SSO-dep recipes — follow the SSO-dep plan (NOT operator-pending)

Recipes that authenticate via OIDC (lasuite-docs, cryptpad, lasuite-drive, lasuite-meet, future)
do **not** need operator input to wire OIDC. The canonical pattern lives at
**`plan-sso-dep-testing.md`**: declare `DEPS = ["keycloak"]` in `recipe_meta.py`; the orchestrator
co-deploys the dep after the generic tiers; the recipe's `setup_custom_tests.sh` reads
`$CCCI_DEPS_FILE`, writes the OIDC env vars and injects the client secret via abra. Authenticated
tests use `harness.sso.oidc_password_grant` or Playwright on the dep's login page. **`DEFERRED.md`
entries that cite "operator input needed for OIDC" are mis-categorised** — re-open and execute them
per this plan.

## 0b. Auto-mirror missing recipes (NOT a blocker — autonomous loop work)

A recipe that is not yet mirrored on `git.autonomic.zone/recipe-maintainers/<recipe>` is **not** an
operator-pending blocker. The bot is admin on `recipe-maintainers` (see memory `cc-ci-gitea-recipes`)
and can create private mirror repos. **For any recipe you want to enroll/test that isn't mirrored yet,
mirror it yourself** before enrolling, based on the **`recipe-create-pr` skill** —
`/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (which references
`/srv/recipe-maintainer/.claude/commands/recipe-create-pr.md` for the full procedure).

The flow (adapt the skill's command for the new-mirror case):
1. Create the **private** mirror repo on `git.autonomic.zone/recipe-maintainers/<recipe>` (Gitea API
   POST `/orgs/recipe-maintainers/repos`, bot creds from `.testenv`/§1.5).
2. Mirror the upstream `git.coopcloud.tech/coop-cloud/<recipe>` (clone --mirror → push, including
   tags) so the mirror's `main` is upstream-synced and tags carry over.
3. Then proceed with normal enrollment + the lifecycle suite (1d generic + your overlays from this
   phase).

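The mirror flow can be planned programmatically before anything is executed. This is a hedged sketch, not the `recipe-create-pr` procedure: it assumes the Gitea API base `https://git.autonomic.zone/api/v1` and an upstream clone URL of the form `https://git.coopcloud.tech/coop-cloud/<recipe>.git`, and it only builds the API request and git commands (the bot credentials from `.testenv` would be attached when running them). `plan_mirror` and `MirrorPlan` are hypothetical names.

```python
import json
from dataclasses import dataclass

GITEA_API = "https://git.autonomic.zone/api/v1"  # assumed API base
UPSTREAM = "https://git.coopcloud.tech/coop-cloud"  # assumed upstream clone prefix
ORG = "recipe-maintainers"


@dataclass
class MirrorPlan:
    api_url: str  # POST target for Gitea's create-repo-in-org endpoint
    api_body: str  # JSON request body
    git_commands: list[list[str]]


def plan_mirror(recipe: str, workdir: str = "/tmp") -> MirrorPlan:
    bare = f"{workdir}/{recipe}.git"
    return MirrorPlan(
        api_url=f"{GITEA_API}/orgs/{ORG}/repos",
        api_body=json.dumps({"name": recipe, "private": True}),
        git_commands=[
            # A bare mirror clone carries every branch and tag ref from upstream.
            ["git", "clone", "--mirror", f"{UPSTREAM}/{recipe}.git", bare],
            # Push all refs (branches + tags) into the freshly created private repo.
            ["git", "-C", bare, "push", "--mirror", f"https://git.autonomic.zone/{ORG}/{recipe}.git"],
        ],
    )
```

`git push --mirror` pushes every branch and tag, and would also delete remote refs that are missing locally; that is harmless against a freshly created empty repo but worth knowing before re-running it against an existing mirror.
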
Treat this as standard loop work — don't sit idle waiting on the operator for missing recipes.

## 0a. Prerequisite: Phase 1e harness corrections (must be DONE first)

The 1d/1b operator review produced three shared-harness corrections, now split out into their own
phase, **`plan-phase1e-harness-corrections.md`** (runs **before** this phase). Do **not** author
overlays until 1e is `## DONE`: it changes the foundation every overlay sits on — (HC1) upgrade goes
prev-release → **PR head** via `deploy --chaos`; (HC2) **repo-local PR code runs only for approved
recipes** (by default only cc-ci overlays + the generic run); (HC3) the **generic runs by default
alongside an overlay**, skipped only via an explicit opt-out. See that plan for detail.

## 0. Relationship to Phase 1 (read first)

Phase 1 built the **machine**: the Drone pipeline, the `!testme` trigger (polling-primary), the

113
cc-ci-plan/plan-phase4-final-review-polish-cleanup.md
Normal file
@@ -0,0 +1,113 @@
# cc-ci Phase 4 — Final review, polish & cleanup (capstone) (Autonomous Build Plan)

**Status:** QUEUED — the **LAST** phase, runs after Phase 3 (`plan-phase3-results-ux.md`). A bounded
final review/lint/cleanup pass over the **entire** codebase as it stands after all phases, ending in a
**full cold re-verification that nothing regressed**.
**Transition:** auto (last in the launcher sequence); after it, the whole build is done.
**Builds on:** everything — Phase 1 + 1c + 1b + 1d + 1e + 2 + 2b + 3 (flake/`nix/` modules, the
runner/harness + generic suite + recipe overlays, the comment-bridge, Drone, the dashboard/results
UX, docs, `machine-docs/`).
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase4-final-review-polish-cleanup.md`
**Phase order:** 1c → 1b → 1d → 1e → 2 → 2b → 3 → **4 (final)**.

---

## 0. Why this phase (and why it's bounded)

This is the analogue of Phase 1b, but final and whole-codebase. By now the tree has grown a lot —
recipe overlays (Phase 2), performance changes (2b), and results/dashboard UX (3) — all layered on
the foundation. Before calling the build done, do one **bounded** pass to clean and harden it, and —
critically — **re-verify from a cold start that none of the growth/cleanup regressed any earlier
guarantee.** Same discipline as 1b: **good-enough + enforceable**, style→tooling, judgment→checklist,
don't reopen settled design, and **never weaken a test** to satisfy a nit.

---

## 1. Definition of Done (Phase 4 exit condition)

Terminates when every item holds **and the Adversary has independently cold-verified** (logged in
`machine-docs/REVIEW-4.md`):

- [ ] **F1 — Lint/format green across the whole codebase.** Re-run the 1b toolchain (alejandra/
  statix/deadnix, ruff, shellcheck/shfmt, yamllint) over everything added since 1b (Phase-2
  overlays, 2b changes, 3 UX/dashboard); extend the lint config to any new languages/areas (e.g.
  dashboard front-end) so it's covered going forward. The `.drone.yml` lint stage still passes
  from a clean checkout; prove with a break-it probe.
- [ ] **F2 — White-box review checklist over all post-1b code.** Run the §3 checklist; fix every
  **blocking** finding, triage advisories to `BACKLOG`/`IDEAS`. Findings + resolutions in
  `machine-docs/REVIEW-4.md`.
- [ ] **F3 — Cleanup.** Remove dead code/scaffolding and stale TODOs; consistent naming/structure;
  reconcile `machine-docs/` (BACKLOG/IDEAS/DECISIONS current, no contradictions); docs match the
  final state. No behavior change beyond what F2 mandates.
- [ ] **F4 — FULL cold re-verification (the final gate).** *After* F1–F3 land, the Adversary
  **independently re-verifies every prior Definition-of-Done from a cold start**, to the same bar
  each phase used — fresh PASS + evidence + timestamps in `machine-docs/REVIEW-4.md` within 24h,
  **nothing weakened/skipped/softened** by the cleanup:
  - **Phase 1 D1–D10** (incl. the genuine **D8** byte-identical fresh-clone rebuild + a
    category-spanning live `!testme` e2e through the public gateway).
  - **Phase 1c C1–C7** (secrets-in-git, cert-in-sops, honest reproducibility).
  - **Phase 1d DG1–DG8** (generic install/upgrade/backup/restore, deploy-once `DG4.1`, override
    floor) **as amended by 1e**.
  - **Phase 1e HC1–HC3** (upgrade→PR-head via `deploy --chaos`; repo-local gated to approved
    recipes; generic-by-default + explicit opt-out).
  - **Phase 2** recipe-coverage criteria (every enrolled recipe's overlays/ported tests real,
    DRY, green).
  - **Phase 2b** performance claims (the measured improvements still hold; no test weakened to
    get them).
  - **Phase 3** results/level/UX criteria (per-run level honest, PR comment + dashboard correct).
- [ ] **F5 — Documented + cold-verified.** Final `docs/` accurate (install reproduces from scratch;
  enroll-recipe + overlay/approval flow correct); accepted deviations in `DECISIONS.md`; the
  Adversary confirms F1–F4 with no standing VETO and no open `[adversary]` finding.

When F1–F5 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-4.md` — the build is
complete.

---

## 2. Method
1. **Lint/format first** (F1) — re-run + extend; auto-fix style, don't deliberate.
2. **Review checklist** (F2, §3) — classify blocking vs advisory; fix blocking, triage rest.
3. **Cleanup** (F3) — dead code, naming, docs, `machine-docs/` reconciliation.
4. **Full cold re-verification LAST** (F4) — once everything has landed, the Adversary re-runs the
   entire cross-phase acceptance from cold. Order matters: tooling → review/fixes → cleanup → *then*
   full re-verify. Cleanup must regress nothing.
5. **Bound it** — a pass, not a rewrite; record dead-ends/deviations and stop.

## 3. White-box review checklist (teeth, not taste) — whole codebase
Blocking unless noted (plan-relevant invariants visible only by reading code):
- **Tests are real** (blocking) — every generic/overlay/custom test asserts actual app state; no
  `skip`/`xfail`/can't-fail; per-op `pass/fail/skip` honest; the 1d/1e anti-vacuous guards
  (`assert_serving` routing proof, `do_upgrade` "moved", deploy-count==1) intact.
- **1e corrections intact** (blocking) — repo-local code still gated to approved recipes; generic
  still runs by default (opt-out explicit); upgrade still targets the PR head.
- **Generic-first / custom-additive invariant** (blocking — `docs/testing.md`). Confirm no path
  makes the generic tier depend on custom: deps deploy + `setup_custom_tests` run **after** all
  generic tiers, never before; a forced `setup_custom_tests` failure still yields a clean
  generic-tier `pass/pass/pass/pass` + `skip(deps-not-ready)` for `@requires_deps` custom tests
  (re-exercise the forced-failure case). Future maintainers must be able to operate cc-ci with
  the generic tier alone — verify that path stays viable.
- **Harness DRY** (blocking-ish) — recipe quirks are data (`recipe_meta.py`), not shared-harness
  conditionals; overlays are thin; no per-recipe copy-paste of lifecycle logic.
- **Server state Nix-declared & idempotent** (blocking) — no imperative drift / run-once sentinels /
  manual post-rebuild steps; the `nix/` layout is clean.
- **No footguns** (blocking) — no bare `sleep` for readiness (poll instead); teardown in `finally`;
  secrets reused per run, not regenerated; no hardcoded versions/domains that break upstream.
- **No secrets in code/committed files** (blocking) — grep source/configs/`.drone.yml`/fixtures;
  log/dashboard redaction real (incl. any new Phase-3 UX surface that echoes run data).
- **Phase-3 UX correctness** (advisory→blocking on real drift) — the displayed level/badge/screenshot
  reflect the true per-op results; no misleading "pass".
- **Architecture matches the plans; deviations in `DECISIONS.md`** (advisory→blocking on real drift).
- **Readability & docs** (advisory) — clear names, dead code removed, docs reproduce from scratch.

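For the "no bare `sleep` for readiness" item, a reviewer is looking for the poll-until-ready shape below rather than a fixed delay. This is a generic sketch; `wait_until` is a hypothetical helper, and the injectable `clock` and `sleep` parameters exist only so it can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_until(
    ready: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `ready()` until it returns True; return seconds waited, or raise TimeoutError."""
    start = clock()
    while True:
        if ready():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"not ready after {elapsed:.1f}s (timeout {timeout}s)")
        # Never oversleep past the deadline.
        sleep(min(interval, timeout - elapsed))
```

A bare `sleep(60)` either wastes time or is too short on a slow run; polling returns as soon as the app is up and fails with an explicit timeout otherwise.
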
## 4. Guardrails
- **Never weaken a test** to satisfy a lint/review/cleanup nit (cardinal rule wins).
- **Don't reopen settled design** — clean + harden + re-verify; bigger ideas → `IDEAS.md`.
- **Bounded** — one pass; cap iterations; record + stop.
- **Cleanup regresses nothing** — F4 is the proof; if a cleanup breaks a prior guarantee, revert the
  cleanup, not the guarantee.

## 5. Open decisions (log in machine-docs/DECISIONS.md)
- Any new linters/formatters for Phase-3 front-end / new areas, and their strictness.
- The precise blocking-vs-advisory split for the §3 checklist on the new code.
- Whether to add Python type-checking now or defer to `IDEAS.md` (carried from 1b).
205
cc-ci-plan/plan-sso-dep-testing.md
Normal file
@@ -0,0 +1,205 @@
# SSO-dep testing pattern (OIDC + co-deployed provider) — reference plan

**Status:** active reference for Phase 2 (does not need its own phase; it's a pattern Phase 2
overlays apply per recipe).
**Operator clarification (2026-05-28):** integrating a recipe with an OIDC/SSO dep is **loop work,
not operator work**. Anything that was deferred citing "operator input needed for OIDC" should be
re-opened and done autonomously per this plan.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-sso-dep-testing.md`
**Companion:** the running harness in `runner/harness/sso.py` (existing primitives:
`setup_keycloak_realm`, `oidc_password_grant`, `assert_discovery_endpoint`).

---

## 0. Why this plan

Several recipes test their authenticated functionality through an OIDC/SSO provider (keycloak,
authentik). The cc-ci pattern is to **co-deploy the provider with the recipe under test in the same
ephemeral run**: one shared deployment per dep, deployed and configured in the `setup_custom_tests`
step after the generic tiers (§1), used by the recipe-under-test's authenticated tests, and torn down
with it. This file is the canonical pattern for how to wire that up so any recipe that declares
`DEPS = ["keycloak"]` (or `["authentik"]`) Just Works without per-recipe ad-hoc plumbing. Recipes
that need OIDC are not blocked on the operator — they follow this plan.

## 1. The DEPS model — deps deploy AFTER generic tiers (operator-2026-05-28)

**Critical ordering rule:** generic tiers (install / upgrade / backup / restore) run against the
**recipe alone, with no dep available**, so a failure in dep-deploy or OIDC setup **cannot break
generic-tier signal**. Deps + OIDC wiring move to a **`setup_custom_tests` step** that runs *after*
generic tiers and *before* the custom tier — its failure is isolated to the SSO-marked custom tests.

A recipe's `tests/<recipe>/recipe_meta.py` declares its SSO dep:
```python
DEPS = ["keycloak"]  # or ["authentik"] when that backend lands
```

### Lifecycle order (single run, per recipe)

```
1. Deploy recipe-under-test ALONE (no deps, OIDC env unset or stubbed).
   - app_new #1 for the recipe; generic install_steps.sh runs RECIPE-ONLY setup (no deps).
2. INSTALL tier: generic [+ overlay] assertions against the recipe alone.
3. UPGRADE tier: abra app upgrade in place, assertions against the recipe alone.
4. BACKUP tier: in place (if backup-capable), recipe-alone marker.
5. RESTORE tier: in place, recipe-alone marker.
6. setup_custom_tests step ← NEW (operator-2026-05-28)
   a. For each dep in DEPS, deploy + provision realm/client via harness.sso.setup_<provider>_realm.
   b. Write $CCCI_DEPS_FILE with each dep's {domain, realm, client_id, client_secret, admin_*}.
   c. Run the per-recipe post-deps hook `tests/<recipe>/setup_custom_tests.sh` to wire the OIDC
      env into the running recipe (abra app config set + abra app secret insert) and trigger an
      in-place redeploy of the affected services so the env takes effect.
   d. Mark deps-ready = True on success; on ANY failure mark deps-ready = False and CONTINUE
      (log the error; do NOT abort the run).
7. CUSTOM tier:
   - If deps-ready: run all custom tests, including those tagged @pytest.mark.requires_deps.
   - If NOT deps-ready: still run custom tests, but tests tagged @pytest.mark.requires_deps are
     reported as ERROR/SKIP (with the captured setup_custom_tests error attached). Non-deps
     custom tests still run normally.
8. Teardown (in finally): recipe first; then each dep in reverse declaration order.
```

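Steps 1 to 8 can be condensed into a control-flow sketch. Every callable passed in is an injected stand-in (none of them are real harness APIs); what the sketch illustrates is that the generic tiers finish before any dep deploys, a `setup_custom_tests` failure downgrades only `requires_deps` tests to a skip instead of aborting, and teardown runs in `finally`, recipe first and then deps in reverse declaration order.

```python
from typing import Callable, Optional

GENERIC_TIERS = ("install", "upgrade", "backup", "restore")


def run_recipe(
    recipe: str,
    deps: list[str],
    deploy: Callable[[str], None],
    teardown: Callable[[str], None],
    run_tier: Callable[[str], str],
    setup_custom_tests: Callable[[list[str]], None],
    custom_tests: list[tuple[str, bool, Callable[[], str]]],
) -> dict[str, str]:
    """Return per-step results; custom_tests items are (name, requires_deps, test_fn)."""
    results: dict[str, str] = {}
    deployed: list[str] = []
    try:
        deploy(recipe)  # step 1: the recipe ALONE, no deps
        deployed.append(recipe)
        for tier in GENERIC_TIERS:  # steps 2-5
            results[tier] = run_tier(tier)
        deps_error: Optional[str] = None
        try:  # step 6: deploy deps + wire OIDC; never fatal to the run
            for dep in deps:
                deploy(dep)
                deployed.append(dep)
            setup_custom_tests(deps)
        except Exception as exc:  # broad on purpose: isolate ANY setup failure
            deps_error = f"{type(exc).__name__}: {exc}"
        for name, requires_deps, test_fn in custom_tests:  # step 7
            if requires_deps and deps_error is not None:
                results[name] = f"skip(deps-not-ready: {deps_error})"
            else:
                results[name] = test_fn()
    finally:  # step 8: recipe first, then deps in reverse declaration order
        if recipe in deployed:
            teardown(recipe)
        for dep in reversed(deps):
            if dep in deployed:
                teardown(dep)
    return results
```

Tracking `deployed` separately means a dep that failed to deploy is not torn down, while any deps that did come up before the failure still are.
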
### DG4.1 deploy-count guard, generalised
The "one deploy per run" guard becomes **one `abra app new` per app in the run** (recipe + each
dep). In-place reconfigure-and-redeploy (the step 6c env update) is **NOT** a fresh `app_new` and
does NOT increment the per-recipe count. So a run with `DEPS = ["keycloak"]` has exactly 2
`app_new` calls (recipe + keycloak), no matter how many tiers ran. The per-run summary reports
deploy-count per app for verification.

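The generalised guard reduces to a per-app count check. A minimal sketch, assuming the run records the app name of every `abra app new` it issues and records nothing for in-place redeploys; `deploy_count_violations` is a hypothetical name.

```python
from collections import Counter


def deploy_count_violations(recipe: str, deps: list[str], app_new_calls: list[str]) -> dict[str, int]:
    """Map each app whose fresh-install count != 1 to its actual count (empty dict = pass).

    `app_new_calls` holds the app name of every `abra app new` the run issued; in-place
    redeploys (e.g. the step-6c env update) are assumed never to be recorded here.
    """
    counts = Counter(app_new_calls)
    expected = [recipe, *deps]
    # Counter returns 0 for absent keys, so a missing install shows up as count 0.
    bad = {app: counts[app] for app in expected if counts[app] != 1}
    # An app installed but never declared is also a violation.
    bad.update({app: n for app, n in counts.items() if app not in expected})
    return bad
```

Returning the offending counts rather than a bare boolean gives the per-run summary something concrete to report.
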
### Why this ordering
- **Generic-tier signal is preserved** when SSO/dep setup is broken — the recipe's own deploy/
  upgrade/backup/restore behaviour is still tested honestly.
- **Failure isolation**: a recipe whose generic tier passes but whose SSO setup is broken yields
  per-op `pass/pass/pass/pass/skip(deps-not-ready)`, far more useful than the previous
  all-or-nothing outcome.
- A recipe that genuinely can't boot without OIDC fails its generic install honestly (the recipe
  should accept a stubbed/empty OIDC env at install time and only require the env when an
  authenticated endpoint is hit). That's a real recipe finding, not a CI artifact.

## 2. Provider pluggability

- **Provider-agnostic primitives** (today, in `harness/sso.py`) — these stay pluggable:
  - `oidc_password_grant(discovery_url, client_id, client_secret, username, password) -> token` —
    pure OIDC; works against any compliant provider.
  - `assert_discovery_endpoint(discovery_url, expected_issuer)` — pure OIDC.
- **Provider-specific setup** (admin API calls) — one function per provider:
  - `setup_keycloak_realm(domain, admin_user, admin_password, realm, client_id, redirect_uris) ->
    {client_secret, discovery_url}` — exists today.
  - `setup_authentik_realm(...)` — same shape, authentik admin API; **deferred** to a future Q4
    enrollment that actually wants authentik (see `machine-docs/DEFERRED.md`). Pluggable: a recipe
    declaring `DEPS = ["authentik"]` would just call this instead. No change to the per-recipe
    `setup_custom_tests.sh` shape beyond which provider it reads from `$CCCI_DEPS_FILE`.
- **Don't write per-recipe SSO logic.** All recipes use the same `DEPS` + hook-script shape (§3).

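Pluggability can be sketched as a lookup table from the provider name in `DEPS` to its setup function. The stubs below are hypothetical stand-ins (the real `setup_keycloak_realm` talks to the keycloak admin API and also returns the client secret); the discovery-URL layout mirrors the keycloak template used in §3.2.

```python
from typing import Callable, Dict


def keycloak_discovery_url(domain: str, realm: str) -> str:
    return f"https://{domain}/realms/{realm}/.well-known/openid-configuration"


def setup_keycloak_stub(domain: str, realm: str, client_id: str) -> dict:
    # Stand-in: the real setup calls the keycloak admin API and mints a client secret.
    return {
        "domain": domain,
        "realm": realm,
        "client_id": client_id,
        "discovery_url": keycloak_discovery_url(domain, realm),
    }


def setup_authentik_stub(domain: str, realm: str, client_id: str) -> dict:
    raise NotImplementedError("authentik backend is deferred (see machine-docs/DEFERRED.md)")


# One entry per provider; a recipe's DEPS value selects the entry, nothing else changes.
PROVIDERS: Dict[str, Callable[[str, str, str], dict]] = {
    "keycloak": setup_keycloak_stub,
    "authentik": setup_authentik_stub,
}


def setup_dep(provider: str, domain: str, realm: str, client_id: str) -> dict:
    try:
        setup = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown SSO provider in DEPS: {provider!r}") from None
    return setup(domain, realm, client_id)
```

Adding authentik later means replacing one table entry; the per-recipe hooks keep reading the same `$CCCI_DEPS_FILE` shape.
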
## 3. Per-recipe hooks — two distinct scripts (recipe-only vs post-deps)

A recipe with `DEPS = ["keycloak"]` ships **two** optional hook scripts (either may be absent if
not needed):

### 3.1 `tests/<recipe>/install_steps.sh` — RECIPE-ONLY setup, runs at install time
This is the Phase-1d custom-install-steps hook. It runs **before** the recipe deploys, **with no
dep available** (the dep hasn't been deployed yet at this point). Use it only for recipe-only
setup that the recipe needs to boot at all (e.g. seed a fixture, set a non-OIDC env). **Do NOT
read `$CCCI_DEPS_FILE` here** — it doesn't exist yet. If the recipe requires OIDC to *boot at
all*, set a safe stub here (e.g. disable auth) so the recipe can come up for generic tiers; the
real OIDC wiring happens in §3.2.

### 3.2 `tests/<recipe>/setup_custom_tests.sh` — POST-DEPS wiring, runs after generic tiers
This is the new (operator-2026-05-28) hook that wires the recipe to its already-deployed dep,
*after* the generic tiers have run. The orchestrator has already deployed each dep and written
`$CCCI_DEPS_FILE` by the time this runs. Roughly:

```sh
#!/usr/bin/env bash
set -euo pipefail

# Read the dep's connection info from $CCCI_DEPS_FILE (orchestrator-written).
# `jq -e` exits non-zero on a missing/null key, so set -e aborts instead of wiring "null".
KC_DOMAIN=$(jq -er '.keycloak.domain' "$CCCI_DEPS_FILE")
KC_CLIENT=$(jq -er '.keycloak.client_id' "$CCCI_DEPS_FILE")
KC_SECRET=$(jq -er '.keycloak.client_secret' "$CCCI_DEPS_FILE")
KC_REALM=$(jq -er '.keycloak.realm' "$CCCI_DEPS_FILE")

# Inject the OIDC client secret as an abra app secret (recipe-conventional name varies; match
# the recipe's .env.sample SECRET_*). printf avoids piping a trailing newline into the secret.
printf '%s' "$KC_SECRET" | abra app secret insert -n "$CCCI_APP_DOMAIN" oidc_rpcs v1 -

# Write the OIDC env vars to the parent .env (names per the recipe's .env.sample).
abra app config set "$CCCI_APP_DOMAIN" \
  OIDC_REALM="$KC_REALM" \
  OIDC_OP_DISCOVERY_ENDPOINT="https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration" \
  OIDC_OP_AUTHORIZATION_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth" \
  OIDC_OP_TOKEN_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token" \
  OIDC_OP_USER_URL="https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo" \
  OIDC_RP_CLIENT_ID="$KC_CLIENT" \
  OIDC_RP_REDIRECT_URI="https://${CCCI_APP_DOMAIN}/auth/oidc/callback"

# Force an in-place redeploy of the affected services to pick up the new env. This is NOT a fresh
# app_new (deploy-count guard still 1 for this recipe).
abra app deploy --force --chaos --no-input "$CCCI_APP_DOMAIN"
```

The OIDC env-var **names are recipe-specific** (`OIDC_OP_*` for lasuite-docs, different prefixes
elsewhere). Read the recipe's `.env.sample` to see which keys the recipe expects; the *values* follow
this template. If a recipe needs more than this (extra group/claim mappings, etc.), extend its
`setup_custom_tests.sh` only — never the shared harness.

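To see the value template in isolation, here is the same derivation as a small Python function. It assumes the `$CCCI_DEPS_FILE` JSON layout implied by the `jq` paths in the script above and uses the lasuite-docs-style key names; other recipes would map the same values onto their own `.env.sample` keys. `oidc_env` is hypothetical, and the client secret is deliberately left out because it travels as an abra secret, not an env var.

```python
import json


def oidc_env(deps_json: str, app_domain: str, provider: str = "keycloak") -> dict[str, str]:
    """Build the OIDC env *values* for one recipe from the orchestrator-written deps JSON."""
    dep = json.loads(deps_json)[provider]
    base = f"https://{dep['domain']}/realms/{dep['realm']}"
    endpoints = f"{base}/protocol/openid-connect"
    return {
        "OIDC_REALM": dep["realm"],
        "OIDC_OP_DISCOVERY_ENDPOINT": f"{base}/.well-known/openid-configuration",
        "OIDC_OP_AUTHORIZATION_URL": f"{endpoints}/auth",
        "OIDC_OP_TOKEN_URL": f"{endpoints}/token",
        "OIDC_OP_USER_URL": f"{endpoints}/userinfo",
        "OIDC_RP_CLIENT_ID": dep["client_id"],
        "OIDC_RP_REDIRECT_URI": f"https://{app_domain}/auth/oidc/callback",
    }
```
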
## 4. Test pattern: authenticated endpoints (mark + isolate)

- **Mark dep-requiring tests:** every custom test that needs the dep up + OIDC wired must use
  `@pytest.mark.requires_deps`. The orchestrator skips these with reason `"deps-not-ready: <err>"`
  if `setup_custom_tests` failed. Non-deps custom tests are unaffected by SSO setup failures.
- **Headless API tests** — use `harness.sso.oidc_password_grant` to mint an access token, call the
  recipe's authenticated endpoint with `Authorization: Bearer <token>`, and assert on the response.
- **Browser flows (Playwright)** — navigate to the recipe, follow the redirect to keycloak, fill in
  the pre-provisioned test user's credentials, return to the recipe, exercise the UI. (Use the
  pre-provisioned `ci-user@example.com` account and the known password the realm setup creates.)
- **The realm/client is fresh per run** — no cross-run state, no shared accounts. The realm setup
  creates one or more test users with known passwords (pass-through from a per-run secret) so the
  tests can authenticate without prompts.

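A headless test's token request and authenticated call can be sketched with the standard library. This builds the requests without sending them and is not the `harness.sso.oidc_password_grant` implementation; it assumes the per-run client allows the OAuth 2.0 resource-owner password grant and that `token_url` was read from the discovery document. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def token_request(token_url: str, client_id: str, client_secret: str,
                  username: str, password: str) -> urllib.request.Request:
    """Build (not send) a form-encoded OAuth 2.0 password-grant token request."""
    body = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "client_secret": client_secret,
        "username": username,
        "password": password,
        "scope": "openid",
    }).encode()
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def access_token(token_response_body: bytes) -> str:
    """Extract the access token from the provider's JSON token response."""
    return json.loads(token_response_body)["access_token"]


def authed_get(url: str, token: str) -> urllib.request.Request:
    """Build a GET against the recipe's authenticated endpoint with a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

Sending is then `urllib.request.urlopen(req, timeout=30)`; in a real test the token response body comes from that call, and the bearer request's response is what the test asserts on.
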
## 5. Concrete recipes that use this pattern (Phase-2 scope)

These are **loop work** under this plan, not deferred:

- **lasuite-docs** — `DEPS = ["keycloak"]`; ports the upstream `oidc_login.py` +
  `upload_conversion.py` parity tests + the §4.3-prescribed `create-a-doc + read-back via
  authenticated /api/v1.0/documents/`. (Re-enters `DEFERRED.md` entry #5 — this plan IS the
  re-entry, not operator input.)
- **cryptpad** — `DEPS = ["keycloak"]` (cryptpad upstream tests use authentik, but a keycloak-backed
  cryptpad OIDC test is equally valid and uses the same primitives). The cryptpad create-a-pad
  Playwright test (DEFERRED #6) is a separate concern — that one really does need a stable
  CryptPad app-launch contract; it stays deferred.
- **lasuite-drive, lasuite-meet** — same pattern when mirrored (`recipe-create-pr` skill — loop work).
- Any future recipe that requires OIDC follows this plan; no operator handoff.

## 6. What stays deferred (genuinely operator-input)

- **authentik enrollment + `setup_authentik_realm` backend** (DEFERRED #9) — provider breadth, not
  blocking any Phase-2 recipe under keycloak. Open question for the operator: do we want
  cross-provider coverage as part of Phase-2 DONE? If yes, lift; if not, leave deferred.
- The `--extra-tests` flag IDEA is **not** a precondition for this plan; OIDC-dep tests are part
  of the default suite for the recipes that need them.

## 7. Definition of done for this pattern
- [ ] `DEPS = [...]` honored by `runner/run_recipe_ci.py`, with the **deps-AFTER-generic** ordering
  (§1): deps deploy + `setup_custom_tests` step runs between RESTORE and CUSTOM tiers;
  `$CCCI_DEPS_FILE` written; deps torn down LAST in reverse order.
- [ ] **Failure isolation proven:** a forced `setup_custom_tests` failure (e.g. simulate keycloak
  realm-setup error) yields a run where generic tiers report **pass** and CUSTOM
  `requires_deps` tests report **skip(deps-not-ready)** — no false fail of the generic tier,
  no aborted run.
- [ ] **lasuite-docs** ships `tests/lasuite-docs/setup_custom_tests.sh` per §3.2 + authenticated
  tests per §4 marked `@pytest.mark.requires_deps` (closes DEFERRED #5 — keep the entry there
  with the closing commit, do not re-defer).
- [ ] At least one other OIDC-dep recipe (cryptpad oidc_login or a lasuite-* once mirrored) lands
  cold-green using the same pattern, demonstrating reuse.
- [ ] `docs/sso-dep-testing.md` (in the cc-ci repo) explains the pattern for future recipe
  enrollments — link to this plan.
- [ ] Adversary cold-verifies the full run for one such recipe + the forced-failure isolation
  case, posts PASS in `REVIEW-2.md`.

## 8. Mirror+enroll reminder (also loop work)

If a recipe in scope (e.g. `lasuite-drive`, `lasuite-meet`, `immich`) **isn't mirrored to
`git.autonomic.zone/recipe-maintainers/`**, mirror it autonomously via the `recipe-create-pr` skill
at `/srv/recipe-maintainer/.opencode/skills/recipe-create-pr/SKILL.md` (see also
`plan-phase2-recipe-tests.md §0b`). Mirror+enroll is **not** operator-pending; the bot is admin on
the org.
@@ -126,7 +126,13 @@ without the auth key.

- **Wildcard TLS cert — PROVIDED, not a token.** The operator has pre-issued the wildcard SAN cert
(`*.ci.commoninternet.net` + `ci.commoninternet.net`) and placed it on cc-ci at
`/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0). The agent feeds these into the
`/var/lib/ci-certs/live/{fullchain.pem,privkey.pem}` (§4.0).
> **Phase-1c update (supersedes the cert references in §1.5/§4.0/§4.4 below):** the cert is no longer
> an out-of-band operator file-drop — it is now **sops-encrypted in the private `cc-ci-secrets` repo**
> (a git submodule) and **decrypted at activation to that same path** by sops-nix. Issuance stays
> operator-only (LE/Gandi, no token on the box); to rotate, the operator re-issues then re-encrypts
> the cert into `cc-ci-secrets` and rebuilds. The ONE out-of-band secret is now the bootstrap age key
> at `/var/lib/sops-nix/key.txt`. Authoritative model: `cc-ci/docs/secrets.md` + `docs/install.md`. The agent feeds these into the
`coop-cloud/traefik` recipe as its `ssl_cert`/`ssl_key` swarm secrets (wildcard/file-provider
mode) and runs **no ACME** for this domain. **Do not request or expect a `commoninternet.net` DNS
token** — issuance/renewal is handled out-of-band by the operator (LE 90-day cert; next renewal
@@ -597,8 +603,37 @@ its own pacing. To make concurrent writes conflict-free:
merges the two cleanly. Closing an item = checking the box *in your own section*; the Builder
fixes an `[adversary]` finding and notes the fix in JOURNAL, but only the Adversary ticks it
closed after re-test.
- `DEFERRED.md` (in `machine-docs/`) is the **single canonical registry for things the loops
have deliberately decided not to do autonomously and that need operator input to move on.**
Append-only; either agent may file. Each entry should clearly say *what's needed from the
operator* to lift the deferral (an opt-in flag, a resource decision, an architectural call, or a
plain "go ahead"). The list is **open-ended**: items can sit indefinitely, there is **no
obligation to close every item**, and closure is operator-driven. A re-entry trigger / IDEA
cross-link is **optional** (include one when there's a natural mechanism, e.g. an opt-in flag in
`cc-ci-plan/IDEAS.md`). Don't park deferrals as a vague "Q4 follow-up" or a buried JOURNAL note;
file them here so the operator can review the whole list. The Phase-4 cleanup pass should
**surface** DEFERRED.md to the operator at least once but does **not** force closure.
Future-aspirational ideas (out of current scope) still go to `cc-ci-plan/IDEAS.md`; DEFERRED
is for considered-and-parked work the loops won't tackle without operator input.
- **Append-only where possible.** `JOURNAL.md` and `REVIEW.md` are append-only logs → they never
conflict. Prefer appending over rewriting.
- **Artifact-layer isolation — facts in STATUS, reasoning in JOURNAL (anti-anchoring).** Rigorous
adversarial verification requires the Adversary NOT to consume the Builder's rationalisations
before forming its verdict. The split:
- `STATUS.md` MUST carry **everything the Adversary needs to verify the claim** — withholding
verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW**
to verify (the exact command/check the Adversary can re-run from its own clone), the
**EXPECTED** outcome (build hashes, file contents, leaf fingerprints, status codes), and
**WHERE** the inputs live (commit shas, paths). If it's essential for the Adversary to verify,
it goes in STATUS.
- `STATUS.md` MUST NOT carry rationalisations / "why I think this passes" / design narrative /
dead-ends explored. Those go in `JOURNAL.md`, which only the Builder writes.
- The Adversary reads STATUS for the claim + verification info, the plan as SSOT, and the code /
git history; it forms its verdict from those + its own **cold** acceptance run, and does **not**
read `JOURNAL.md` before the verdict. After an independent verdict, consulting JOURNAL is fine
(e.g. to contextualise a finding), provided the Adversary notes in REVIEW that it did so.

In short: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.**
- **Git discipline (both loops, every write):** `git pull --rebase` before editing, make the
smallest change, commit, `git push`. A rebase conflict can only land inside the *other* agent's
file/section if someone broke the ownership rule; re-pull and keep to your own files. Never `--force`.
@@ -613,6 +648,22 @@ its own pacing. To make concurrent writes conflict-free:
- **Liveness.** If the Adversary sees a gate `CLAIMED` for too long with no Builder progress, or
the Builder sees no Adversary verdict on a standing claim, note it in your own ledger and keep
doing independent work — neither loop blocks idle waiting on the other beyond its gate.
- **INBOX — explicit cross-loop messaging beyond CLAIMS.** Sometimes you have something to say to
the other loop that isn't a gate claim or a REVIEW verdict (a heads-up, a request for an
early look, an "I refactored X, please re-verify Y", an observation outside the normal flow). For
those, use the inbox files in `machine-docs/`:
- **Builder → Adversary:** the Builder writes/appends `machine-docs/ADVERSARY-INBOX.md` in its
own clone, commits, pushes.
- **Adversary → Builder:** the Adversary writes/appends `machine-docs/BUILDER-INBOX.md` in its
own clone, commits, pushes.
- The watchdog edge-triggers on **newly-present** inbox files in the relevant clone and pings
the receiver. The receiver, on receipt, reads + processes the message, then **deletes the
inbox file** (commits + pushes) — deletion is the "message consumed" signal. Single-writer
discipline: only the sender writes the receiver's inbox; only the receiver deletes it.
- **Use for:** non-gate signals — "heads-up I'm about to refactor X," "please cold-verify this
while I keep going," "I observed Y outside our normal flow," "I'm starting a long e2e now."
**Do NOT use for:** formal gate claims (`STATUS.md` still owns those) or verdicts (`REVIEW.md`
still owns those). The inbox is a side-channel, not a replacement.

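The watchdog's edge-trigger fits in a few lines of bash. This is an illustration, not the watchdog in `launch.sh`. It assumes a hypothetical per-clone state directory that records which inbox files were present at the previous check, and it prints a `PING` line where the real watchdog would nudge the receiving loop.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the inbox edge-trigger: ping only on an absent -> present transition.
set -u

# check_inbox CLONE_DIR STATE_DIR
# Prints "PING <receiver> <file>" once per newly-present inbox file in CLONE_DIR/machine-docs.
check_inbox() {
  local clone="$1" state="$2" f receiver
  mkdir -p "$state"
  for f in ADVERSARY-INBOX.md BUILDER-INBOX.md; do
    case "$f" in
      ADVERSARY-*) receiver=adversary ;;
      BUILDER-*)   receiver=builder ;;
    esac
    if [ -e "$clone/machine-docs/$f" ]; then
      # Present now: ping only if it was absent at the previous check (rising edge).
      if [ ! -e "$state/$f.seen" ]; then
        echo "PING $receiver $f"
        touch "$state/$f.seen"
      fi
    else
      # The receiver deleted it ("consumed"), so re-arm for the next message.
      rm -f "$state/$f.seen"
    fi
  done
}
```

If this runs once per watchdog tick for each clone, a message that sits unconsumed pings exactly once instead of on every tick. The receiver's deletion re-arms the trigger for the next message.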
(If you are ever forced to run with a single process, the degraded fallback is to alternate
roles per iteration and keep `JOURNAL.md` and `REVIEW.md` strictly separate — but two loops is
@@ -649,8 +700,12 @@ every wake, `git pull --rebase` first, then:
**Pacing.** Use `/loop` (self-paced) or `ScheduleWakeup`. Most waits here are for things the
harness can't notify you about — a Drone build, a `nixos-rebuild`, a deploy converging — so poll
the *specific* thing. Three cases:
1. **Something in flight** (build/deploy/`nixos-rebuild`/e2e/heavy test) → **poll every ~5 min** to
stay cache-warm and to **see failures as they happen**, not at the end of a 25-minute sleep. Do
**NOT** `ScheduleWakeup` for the task's expected total runtime in a single big sleep: a 25-min e2e
gets 5 short cache-warm polls, not one 25-min cache-cold blackout. A mid-task wakeup is *cheap*
(one cache hit, one quick status check), while catching a deploy that died at minute 4 of a 25-min
budget is worth a lot. Keep polling *it*; don't treat it as idle.
2. **Blocked on the *other* loop** — Builder parked at a `CLAIMED` gate awaiting the Adversary, or
Adversary waiting for the Builder to fix an `[adversary]` finding. **You don't need to busy-poll
here: the watchdog signals across the handoff.** The moment the Builder writes a `CLAIMED` gate,

@@ -8,6 +8,8 @@ You run as a SEPARATE process and coordinate ONLY through the git repo per §6.1
- Keep your OWN clone at /srv/cc-ci/cc-ci-adv. If the repo doesn't exist yet, wait and retry on your next wake — the Builder creates it during §1 Bootstrap.
- git pull --rebase before every edit; commit; push; never --force.
- Write ONLY your files: REVIEW.md and the "## Adversary findings" section of BACKLOG.md. Everything else (code, STATUS.md, JOURNAL.md, "## Build backlog") is read-only to you.
- INBOX side-channel (§6.1). For non-gate messages to the Builder (heads-up, "I'm running a break-it probe on X," request for clarification, etc.), write/append `machine-docs/BUILDER-INBOX.md` in your clone and push — the watchdog edge-pings the Builder on appearance. To receive a message from the Builder, look for `machine-docs/ADVERSARY-INBOX.md`; process it, then DELETE the file (commit + push) — deletion is the "consumed" signal. Do NOT use the inbox for formal verdicts — REVIEW.md still owns those.
- ISOLATION DISCIPLINE (anti-anchoring — critical). The Builder is REQUIRED to give you in STATUS.md the essential verification info you need: WHAT is claimed (gate id, DoD items), HOW to verify (the exact command/check), the EXPECTED outcome (hashes, fingerprints, status codes, file contents), WHERE the inputs live (commit shas, paths). **Read STATUS for that — you need all of it to verify.** What you must IGNORE — in STATUS, and NEVER read in JOURNAL.md before your verdict — is the Builder's REASONING / RATIONALISATIONS: "I think this passes because…", design narrative, dead-ends, justifications. Reading those anchors you. Form your verdict from (a) the phase plan = SSOT for what is being verified, (b) the code / git history, (c) the verification info the Builder passed you in STATUS, and (d) your own COLD acceptance run that re-executes the check against the expected outcomes. Only AFTER you have written your verdict may you consult JOURNAL.md (e.g. to contextualise a finding) — note in REVIEW.md that you did. Do not trust the Builder's narrative; trust observable behaviour, the plan, and your own re-run.

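Before a cold run, the Adversary (or the Builder, before pushing a claim) could check mechanically that a claim carries all four verification facts. The sketch below assumes a hypothetical STATUS.md convention in which each claim has lines beginning `WHAT:`, `HOW:`, `EXPECTED:` and `WHERE:`. The plan does not prescribe these labels, and the rationale phrases the sketch flags are illustrative only.

```shell
#!/usr/bin/env bash
set -u

# lint_claim FILE
# Hypothetical STATUS.md claim lint. Prints one problem per line and returns 1 if any were found.
lint_claim() {
  local file="$1" field rc=0
  for field in WHAT HOW EXPECTED WHERE; do
    if ! grep -q "^${field}:" "$file"; then
      echo "missing ${field}:"
      rc=1
    fi
  done
  # Reasoning belongs in JOURNAL.md. These phrases are examples, not an exhaustive filter.
  if grep -qiE 'I think|because|should pass|dead[- ]end' "$file"; then
    echo "rationale found (move it to JOURNAL.md)"
    rc=1
  fi
  return "$rc"
}
```

A missing field is a reason to ask for it (for example via the INBOX) rather than guess. Flagged rationale is text the Adversary should skip when reading STATUS.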
Each wake:
1. Pull. Read STATUS.md for any "Gate: <Mn> CLAIMED, awaiting Adversary".

@@ -2,12 +2,16 @@ You are the Builder agent for the cc-ci project — one of two independent loops

Single source of truth: /srv/cc-ci/cc-ci-plan/plan.md. Read it in full now, then begin at §1 Bootstrap. The original brief /srv/cc-ci/cc-ci-plan/brief.md is context only — do not edit it.

Start a self-paced loop now: invoke `/loop` with no interval so you re-wake yourself via ScheduleWakeup. Each iteration = one unit of work (see §7). Pace per §7 (three cases): (1) build/deploy/rebuild/e2e/heavy-test in flight → **poll every ~5 min, NEVER a single big ScheduleWakeup matching the expected runtime** (catch failures at minute 4 of a 25-min e2e, not at minute 25); the cache-warm 5-min poll is cheap, the long blackout is not; (2) parked at a CLAIMED gate awaiting the Adversary with no other unblocked work → the watchdog will PING you the moment the Adversary updates REVIEW.md OR writes a BUILDER-INBOX.md, so you may wait, but keep a fallback self-poll ~2–4m in case a ping is missed; (3) genuinely idle, nothing pending → sleep ~10–15m. Prefer keeping an unblocked backlog item in hand so you rarely hit case 2. Stop the loop only when STATUS.md says ## DONE.

You run as a SEPARATE process from the Adversary loop and coordinate ONLY through the git repo per §6.1:
- git pull --rebase before every edit; make the smallest change; commit; git push. Never --force.
- Write ONLY your files: source/config, STATUS.md, JOURNAL.md, DECISIONS.md, and the "## Build backlog" section of BACKLOG.md. Treat REVIEW.md and "## Adversary findings" as read-only — the Adversary owns them.
- ARTIFACT-LAYER ISOLATION (facts in STATUS, reasoning in JOURNAL). STATUS.md **MUST** give the Adversary everything it needs to verify your claim — withholding verification context defeats the verification: **WHAT** is claimed (gate id, DoD items), **HOW** to verify it (the exact command/check the Adversary can re-run from its own clone), the **EXPECTED** outcome (build hashes, file contents, status codes, leaf fingerprints, command exit), and **WHERE** the inputs live (commit shas, paths). If something is essential for the Adversary to verify, put it in STATUS. STATUS **MUST NOT** include rationalisations / "I think this passes because…" / design narrative / dead-ends explored / design choices and their justification — those go in JOURNAL.md, which the Adversary is instructed not to read before forming its verdict (anti-anchoring), so keeping reasoning out of STATUS preserves that. The line: **WHAT + HOW + EXPECTED + WHERE = STATUS; WHY = JOURNAL.** DECISIONS.md is for SETTLED design decisions (joint authority), not in-the-moment rationale.
- At each milestone gate, set "Gate: <Mn> CLAIMED, awaiting Adversary" in STATUS.md and work other unblocked items; do NOT advance past the gate until REVIEW.md shows its PASS.
- INBOX side-channel (§6.1). For non-gate messages to the Adversary (heads-up, "I'm starting a 25-min e2e," "please cold-verify this while I keep going," "I refactored X, please re-verify Y," requests for an early look), write/append `machine-docs/ADVERSARY-INBOX.md` in your clone and push; the watchdog edge-pings the Adversary on appearance, and the Adversary deletes the file once it has consumed it. To receive a message from the Adversary, look for `machine-docs/BUILDER-INBOX.md`: read and process it, then `git rm` it, commit, and push (deletion is the "consumed" signal). Do NOT use the inbox for formal gate claims or verdicts: STATUS.md and REVIEW.md still own those; the inbox is a side-channel, not a replacement.
- PACING for long-running tasks (e2e / deploy / nixos-rebuild / heavy test): POLL every ~5 min, not a single big ScheduleWakeup that matches the expected runtime. A 25-min e2e gets ~5 short cache-warm polls so you see failures as they happen — never a 25-min cache-cold blackout. (plan.md §7 case 1.)
- Write "## DONE" only when REVIEW.md shows a PASS dated <24h for every D1–D10 and there is no standing "## VETO".

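The agents pace themselves with `ScheduleWakeup`, not a shell loop, but the case (1) polling policy can be made concrete as one. `poll_until` and the exit-code contract of the check command (0 = succeeded, 1 = still running, anything else = failed) are hypothetical. What the sketch shows is that each short poll returns the moment a failure appears, instead of sleeping through it.

```shell
#!/usr/bin/env bash
set -u

# poll_until INTERVAL_S MAX_POLLS CMD...
# Re-runs CMD every INTERVAL_S seconds until it reports a terminal state or MAX_POLLS is reached.
# Hypothetical CMD contract: exit 0 = succeeded, 1 = still running, anything else = failed.
poll_until() {
  local interval="$1" max="$2" i rc
  shift 2
  for ((i = 1; i <= max; i++)); do
    "$@"; rc=$?
    case "$rc" in
      0) echo "done after poll $i"; return 0 ;;
      1) ;;                                          # still in flight: keep polling it
      *) echo "failed at poll $i"; return "$rc" ;;   # surface the failure immediately
    esac
    [ "$i" -lt "$max" ] && sleep "$interval"
  done
  echo "still running after $max polls"
  return 1
}
```

For a 25-min e2e, `poll_until 300 6 check_e2e` would poll every 5 minutes within a 30-minute budget, where `check_e2e` stands for whatever status check fits the job.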
Overriding rules:

22
cc-ci-plan/reboot-log.sh
Executable file
@@ -0,0 +1,22 @@
#!/usr/bin/env bash
# Runs as ExecStartPre of cc-ci-loops.service. Appends ONE line to REBOOTS.md per genuine reboot.
# Uses the kernel boot_id to distinguish a real reboot from a mere `systemctl restart` of the unit:
# only logs when the current boot_id differs from the last one we recorded.
set -u

REBOOTS="/srv/cc-ci/cc-ci-plan/REBOOTS.md"
LAST_BOOT_FILE="/srv/cc-ci/.cc-ci-logs/.last-boot-id"
PHASE_IDX_FILE="/srv/cc-ci/.cc-ci-logs/.phase-idx"

cur_boot="$(cat /proc/sys/kernel/random/boot_id 2>/dev/null || echo unknown)"
last_boot="$(cat "$LAST_BOOT_FILE" 2>/dev/null || echo '')"

# Same boot_id => this is a manual service restart, not a reboot => do nothing.
[ "$cur_boot" = "$last_boot" ] && exit 0

idx="$(cat "$PHASE_IDX_FILE" 2>/dev/null || echo '?')"
ts="$(date '+%Y-%m-%d %H:%M:%S %Z')"
mkdir -p "$(dirname "$LAST_BOOT_FILE")"
printf '%s\n' "- $ts — reboot detected; loops auto-started by systemd (resuming phase index $idx). boot_id=$cur_boot" >> "$REBOOTS" 2>/dev/null || true
echo "$cur_boot" > "$LAST_BOOT_FILE" 2>/dev/null || true
exit 0
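You can exercise the boot_id gate without rebooting the Pi by moving the same comparison into a function that takes its paths as arguments. This is a hypothetical test-harness sketch, not a replacement for `reboot-log.sh`. It drops the timestamp so the output is deterministic.

```shell
#!/usr/bin/env bash
set -u

# record_reboot CUR_BOOT_ID LAST_BOOT_FILE REBOOTS_FILE PHASE_IDX
# Mirrors reboot-log.sh: appends one line only when the boot_id changed since the last record.
record_reboot() {
  local cur="$1" last_file="$2" log="$3" idx="$4" last
  last="$(cat "$last_file" 2>/dev/null || echo '')"
  [ "$cur" = "$last" ] && return 0   # same boot: a manual restart, not a reboot
  mkdir -p "$(dirname "$last_file")"
  printf '%s\n' "- reboot detected (resuming phase index $idx). boot_id=$cur" >> "$log"
  echo "$cur" > "$last_file"
}
```

Two calls with the same boot_id append a single line, as a `systemctl restart` would. A new boot_id appends a second line.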
32
cc-ci-plan/systemd/cc-ci-loops.service
Normal file
@@ -0,0 +1,32 @@
[Unit]
# Canonical, version-controlled copy of the unit installed at /etc/systemd/system/cc-ci-loops.service.
# Install: sudo install -m0644 cc-ci-plan/systemd/cc-ci-loops.service /etc/systemd/system/ \
# && sudo systemctl daemon-reload && sudo systemctl enable cc-ci-loops.service
# Brings the WHOLE rig back after a reboot of the orchestrator Pi: loops + watchdog (launch.sh) AND
# the orchestrator supervisory session (launch-orchestrator.sh), plus a reboot record (reboot-log.sh).
Description=cc-ci autonomous loops + watchdog + orchestrator (reboot-resilient)
Documentation=file:///srv/cc-ci/cc-ci-plan/plan.md
After=network-online.target cc-ci-tailscaled.service
Wants=network-online.target
Requires=cc-ci-tailscaled.service

[Service]
Type=oneshot
RemainAfterExit=yes
User=notplants
Group=notplants
Environment=HOME=/home/notplants
Environment=PATH=/home/notplants/.local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
# RESUME_PHASE=1 so a reboot resumes the SAVED phase (e.g. phase 2), never restarts from phase 0/1c.
Environment=RESUME_PHASE=1
# 1) record the reboot (boot_id-gated); 2) start loops + watchdog; 3) resume the orchestrator session.
ExecStartPre=/srv/cc-ci/cc-ci-plan/reboot-log.sh
ExecStart=/srv/cc-ci/cc-ci-plan/launch.sh start
ExecStartPost=/srv/cc-ci/cc-ci-plan/launch-orchestrator.sh start
# Stop only the loops + watchdog. The orchestrator session is intentionally LEFT running on a manual
# `systemctl stop` (stopping the loops shouldn't kill your steering session; it resumes from disk).
ExecStop=/srv/cc-ci/cc-ci-plan/launch.sh stop
TimeoutStartSec=180

[Install]
WantedBy=multi-user.target
123
cc-ci-plan/test-e2e-testme-acceptance.md
Normal file
@@ -0,0 +1,123 @@
# Acceptance test — real end-to-end `!testme` on the clean-room-rebuilt VM

**Owner:** the Builder + Adversary loops (they execute *and* independently verify this).
**When:** after **C4/C5 PASS** (genuine throwaway-VM clean-room rebuild verified). The Builder then
performs the tailnet swap (§1) and runs the e2e; the Adversary independently verifies. This test is
the **functional acceptance** of D8/clean-room: proof that the rebuilt-from-git VM doesn't just match
byte-for-byte, but actually *serves a real CI run end-to-end through the public domain*.
**This file:** `/srv/cc-ci/cc-ci-plan/test-e2e-testme-acceptance.md`

---

## 0. Why

The reproducibility gates (C1–C5) prove the rebuilt VM is structurally identical and boots clean.
This test proves it is **operationally** a working CI server: a maintainer comment triggers a build,
the app deploys and is reachable on its real public URL through the operator's gateway, the test
passes, and it tears down — the whole `!testme` pipeline, on the from-git VM, over the real domain.

---

## 1. Setup — the Builder performs the tailnet swap (then the e2e)

The rebuilt throwaway must become the live `cc-nix-test` so that the public gateway routes real
`ci.commoninternet.net` traffic to it (the gateway passes TLS through via MagicDNS to
`cc-nix-test.taila4a0bf.ts.net` and re-resolves every ~10s, so it auto-follows the name). The swap is
**two reversible `tailscale set --hostname` commands** on VMs you already control — the Builder does
it. **Do this only after C4/C5 PASS** and after the rebuilt VM's full stack
(traefik + bridge + drone + dashboard) is up and serving locally.

**Order matters** (rename the original *aside first*, or the throwaway will get `cc-nix-test-1`):

1. **Rename the original prod VM aside** (it stays running — do NOT destroy it; needed for swap-back):
```
ssh cc-ci 'tailscale set --hostname=cc-nix-test-orig'
```
(`ssh cc-ci` is pinned to the original's IP `100.90.116.4`, so it keeps reaching the original
regardless of the name change.)
2. **Rename the rebuilt throwaway → `cc-nix-test`.** Re-derive its current tailscale IP (throwaways
get a fresh IP each rebuild): pick the ONLINE throwaway node from
`tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep -i throwaway`, then:
```
ssh -i /srv/incus-terraform-nix-vm-creator/terraform-secrets/vm_ssh_key \
-o ProxyCommand='nc -X 5 -x 127.0.0.1:1055 %h %p' root@<throwaway-ip> \
'tailscale set --hostname=cc-nix-test'
```
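Step 2's "pick the ONLINE throwaway node" can be scripted so that an offline leftover from an earlier rebuild is never chosen by accident. The sketch assumes the usual `tailscale status` layout (IP in the first column, hostname in the second, the word `offline` on lines for offline peers) and that throwaway hostnames contain `throwaway`. The function name is hypothetical, and the function refuses to guess when zero or several online throwaways match.

```shell
#!/usr/bin/env bash
set -u

# pick_throwaway_ip < tailscale-status-output
# Prints the IP of the single ONLINE throwaway node; fails if there are zero or several.
pick_throwaway_ip() {
  local ips n
  ips="$(grep -i throwaway | grep -vi offline | awk '{print $1}')"
  n="$(printf '%s' "$ips" | grep -c . || true)"
  if [ "$n" -ne 1 ]; then
    echo "expected exactly 1 online throwaway, found $n" >&2
    return 1
  fi
  printf '%s\n' "$ips"
}
```

A typical call is `ip="$(tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | pick_throwaway_ip)" || exit 1`, run before the `ssh ... root@<throwaway-ip>` rename.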

**Heads-up — tailnet-wide effect:** after the swap, `cc-nix-test.taila4a0bf.ts.net` resolves to the
rebuilt VM for *everyone* on the tailnet, so any of your own tooling that targets cc-nix-test **by
MagicDNS name** will now hit the rebuilt VM (tooling pinned to the raw IP `100.90.116.4` still hits
the original). Account for that when pointing `!testme` runs or deploys at a target.

**Verify the swap took (P1+P2) before starting the e2e** — must pass:
```
tailscale --socket=$HOME/.cc-ci-ts/tailscaled.sock status | grep cc-nix-test # → cc-nix-test = the throwaway's IP (cc-nix-test-orig also matches)
curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/
# expect: 200 ssl_verify=0 (real public path now served by the rebuilt VM, valid cert)
```

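`grep cc-nix-test` also matches the renamed `cc-nix-test-orig`, so after the swap it prints two lines. A stricter check matches the hostname column exactly and accepts only the exact `curl -w` line expected above. The sketch assumes the usual `tailscale status` layout (IP first, hostname second), and both function names are hypothetical.

```shell
#!/usr/bin/env bash
set -u

# tailnet_name_ok EXPECTED_IP < tailscale-status-output
# Succeeds only if the exact hostname cc-nix-test (not cc-nix-test-orig) maps to EXPECTED_IP.
tailnet_name_ok() {
  local want="$1" got
  got="$(awk '$2 == "cc-nix-test" {print $1}')"
  [ "$got" = "$want" ]
}

# public_ok CURL_OUTPUT
# Accepts only what curl -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' prints on success.
public_ok() {
  [ "$1" = "200 ssl_verify=0" ]
}
```

Used together: `tailscale ... status | tailnet_name_ok <throwaway-ip>`, then `public_ok "$(curl -sS -o /dev/null -w '%{http_code} ssl_verify=%{ssl_verify_result}\n' https://ci.commoninternet.net/)"`. Command substitution strips the trailing newline, so the exact string comparison works.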
**Swap-back when testing is done** (reversible): first rename the throwaway back to its old name so
`cc-nix-test` is free, then `ssh cc-ci 'tailscale set --hostname=cc-nix-test'` to restore the original;
the gateway re-follows.

---

## 2. Procedure

1. **Pick one fast, already-enrolled recipe.** Prefer the lightest enrolled app (e.g. `custom-html`)
so the run is quick and resource-cheap. Note the recipe + the repo/issue or PR where `!testme` is
recognised (the same place prior runs were triggered).
2. **Record the baseline.** Capture the recipe's *current* latest Drone run number and the dashboard
row (`https://ci.commoninternet.net/` and `https://drone.ci.commoninternet.net/...`) so you can
prove the run you trigger is **new**.
3. **Trigger via the real path.** Post `!testme` as the **bot** (the normal maintainer-comment
trigger) on that recipe — exactly as a real maintainer would. Do **not** invoke Drone directly or
shortcut the bridge; the comment→bridge→Drone path is part of what's under test.
4. **Confirm the bridge picked it up.** Within the bridge's poll interval, a **new** Drone build for
that recipe starts. Capture the new run number (must be > the baseline from step 2).
5. **Confirm the app deploys and is reachable on its PUBLIC URL.** While the build runs, the app is
deployed to its `*.ci.commoninternet.net` test domain. From **off the VM** (external — through the
gateway, not `localhost`/`127.0.0.1`), confirm a real request succeeds:
```
curl -sS -D- https://<app-test-subdomain>.ci.commoninternet.net/   # prints headers, then body
# expect: HTTP 200 (or the app's expected status), valid *.ci.commoninternet.net cert,
# served content from the deployed app — NOT a Traefik 404 / default-cert.
```
This is the crux: it proves the whole route (public DNS → gateway → MagicDNS → rebuilt VM →
Traefik → deployed app) works on the rebuilt server.
6. **Confirm the test logic passed.** The Drone build runs the recipe's real test assertions (app
state, not health-only) and finishes **success**.
7. **Confirm teardown.** After the run, the app is **undeployed** (no leftover stack/containers), per
the standard post-run cleanup — verify it's gone.
8. **Confirm the result was reported.** The outcome posts back to the trigger location and the
dashboard row updates to the new run with `success`.

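Steps 2, 4, 5 and 7 each reduce to a small yes/no check that can be scripted for the evidence log. The helpers below are hypothetical and rest on stated assumptions: Drone's build-list endpoint (`/api/repos/<owner>/<repo>/builds`) returns builds newest-first with a `"number"` field; Traefik answers a host with no matching router with its default `404 page not found` body; and the test deploy shows up in `docker stack ls` under a stack name known from the run.

```shell
#!/usr/bin/env bash
set -u

# latest_build_number < drone-build-list-json
# Prints the first "number" field, i.e. the newest build if the list is newest-first (assumption).
latest_build_number() {
  grep -o '"number": *[0-9]*' | head -n 1 | tr -cd '0-9'
}

# is_new_run BASELINE NEW: succeeds only if NEW is strictly greater (criterion E2).
is_new_run() {
  [ "$2" -gt "$1" ]
}

# looks_like_traefik_default HTTP_CODE BODY
# True for Traefik's unrouted-host response, which means routing failed (step 5).
looks_like_traefik_default() {
  [ "$1" = "404" ] && printf '%s' "$2" | grep -q '404 page not found'
}

# stack_gone STACK_NAME < output of: docker stack ls --format '{{.Name}}'
# Succeeds when no stack with exactly that name remains (criterion E5).
stack_gone() {
  ! grep -qxF -- "$1"
}
```

For example, pipe `curl -sS -H "Authorization: Bearer $DRONE_TOKEN" https://drone.ci.commoninternet.net/api/repos/<owner>/<repo>/builds` into `latest_build_number` before and after posting `!testme`, then compare the two values with `is_new_run`. `DRONE_TOKEN` is a placeholder for whatever Drone token the bot holds.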
---

## 3. Pass criteria (all must hold; Adversary verifies independently)

- [ ] **E1.** Self-check §1 passed (`ci.commoninternet.net` = 200, valid cert, on the rebuilt VM).
- [ ] **E2.** Posting `!testme` produced a **new** Drone build (run # > baseline) via the bridge —
not a manual Drone trigger.
- [ ] **E3.** The deployed app answered an **external** request on its real
`<app>.ci.commoninternet.net` URL (through the gateway) with the expected response + valid cert
— captured with headers/body evidence.
- [ ] **E4.** The Drone build's **real test assertions** ran and the build finished **success**
(no skipped/softened tests).
- [ ] **E5.** The app **undeployed** cleanly afterward (no residual stack).
- [ ] **E6.** Result reported back + dashboard updated to the new successful run.

Evidence (run #, the external `curl` headers/body, dashboard before/after, undeploy proof) is logged
in `JOURNAL-1c.md`; the outcome is recorded as **E2E-TESTME — PASS**, with the claim in
`STATUS-1c.md` and the Adversary's verdict in `REVIEW-1c.md`.

## 4. If it fails

Treat as a clean-room finding, not a config patch: a failure here means the from-git rebuild is
missing something the running server had out-of-band (a secret, a manual step, drift). Capture the
failing stage + logs in `JOURNAL-1c.md`, raise it as a blocker, and fix it in the **git source**
(base or `cc-ci-secrets`) so the next rebuild includes it — do **not** hand-fix the live VM. Re-run
this test after the fix.

## 5. Bound

One recipe, one green run. This is a functional smoke test of the rebuilt VM, not a full recipe-test
campaign (that's Phase 2). Don't expand scope here.