diff --git a/.claude/skills/ci-dev-workflow/SKILL.md b/.claude/skills/ci-dev-workflow/SKILL.md
new file mode 100644
index 0000000..0fb3c18
--- /dev/null
+++ b/.claude/skills/ci-dev-workflow/SKILL.md
@@ -0,0 +1,120 @@
+---
+name: ci-dev-workflow
+description: End-to-end workflow for developing a feature or fix in the cc-ci CI server (the harness, the level/results/card system, or a recipe's tests) from the orchestrator. Covers exploring the harness, branching a local clone, implementing with unit + live verification, opening a PR, getting an independent adversary verdict, merging, and deploying to the running CI server. Use whenever you are changing cc-ci server code (runner/harness/**, tests/**) rather than just running recipe upgrades. Invoke as /ci-dev-workflow.
+---
+
+# ci-dev-workflow
+
+How to safely build and ship a change to the **cc-ci CI server** from the orchestrator. This is the
+flow used to land the "intentional skips + 4-rung level ladder" feature. It never pushes `main` and
+never merges without an independent adversary PASS.
+
+## The two repos / two hosts
+- **Orchestrator** (where you are): `/srv/cc-ci-orch` and `/srv/cc-ci` are both checkouts of the
+  **cc-ci-orchestrator** repo (plans, skills, launchers). Edit `/srv/cc-ci-orch`; push to
+  `git.autonomic.zone/recipe-maintainers/cc-ci-orchestrator`.
+- **cc-ci CI server**: reached via `ssh cc-ci`. The CI server's own source is the **cc-ci** repo
+  (`https://autonomic-bot:@git.autonomic.zone/recipe-maintainers/cc-ci.git` — the token is in
+  the `/root/builder-clone` remote and in `/srv/cc-ci/.testenv` as `GITEA_*`). The harness lives in
+  `runner/harness/**`, the orchestrator entry is `runner/run_recipe_ci.py`, tests in `tests/**`.
+
+> **Key constraint:** there is NO `python3`/`pytest` on the orchestrator host. Run all Python — unit
+> tests, card/badge rendering, live harness runs — ON the cc-ci host via `ssh cc-ci`, using the
+> `cc-ci-run` wrapper (it provides python + pytest + playwright/chromium + the harness env).
+
+## Harness orientation (read before editing)
+- `runner/harness/level.py` — PURE level ladder. **Cardinal invariant: presentation must never look
+  greener than the tests.** The ladder is the four ESSENTIAL rungs `install · upgrade · backup_restore
+  · functional` (top = L4); a gap (FAIL *or* N/A) caps the climb. integration/recipe-local are
+  optional and NOT leveled. Keep this module pure + unit-tested.
+- `runner/harness/results.py` — builds `results.json` (`derive_rungs`, `skips`, `build_results`).
+  `skips` splits N/A rungs into intentional (declared in `recipe_meta.EXPECTED_NA={rung:reason}`) vs
+  unintentional. Results assembly is best-effort (R7): a failure here NEVER changes the run verdict.
+- `runner/harness/card.py` — summary card (HTML→PNG via headless chromium) + the level badge SVG.
+- `runner/run_recipe_ci.py` — the orchestrator: deploys once, runs the tiers
+  (`install,upgrade,backup,restore,custom`), computes the verdict. **The run VERDICT (SSO/F2-11,
+  teardown, leak scan) is separate from the cosmetic level/card — never weaken verdict enforcement.**
+- A recipe's tests live in `tests//`: `recipe_meta.py` (timeouts, `EXPECTED_NA`,
+  `BACKUP_CAPABLE`, `DEPS`, …), `functional/` + `playwright/` (custom tier), `install_steps.sh`
+  (pre-deploy hook), `ops.py` (pre-op seed hooks). The generic tiers are `tests/_generic/`.
+
+## Steps
+
+### 1. Explore first
+Read the modules you'll touch + their unit tests (`tests/unit/test_*.py`) and any recipe under
+`tests//`. Understand the verdict-vs-cosmetic split and the cardinal "never inflate" rule.
+For a recipe-behaviour question, also look at the recipe on the host:
+`ssh cc-ci 'cat ~/.abra/recipes//compose*.yml'`.
+
+### 2. Branch a local clone of cc-ci (on the orchestrator)
+So you can edit with real tooling (Read/Edit/Write) instead of ssh+heredocs:
+```
+cd /tmp && git clone "https://autonomic-bot:@git.autonomic.zone/recipe-maintainers/cc-ci.git" cc-ci-feature
+cd cc-ci-feature && git checkout -b feat/
+git config user.name autonomic-bot && git config user.email autonomic-bot@git.autonomic.zone
+```
+
+### 3. Implement
+Edit the harness + tests in the local clone. Match surrounding style. For pure logic (level/results),
+add focused unit tests in `tests/unit/`. For recipe behaviour, add a
+`tests//functional/test_*.py` (custom tier; gets the `live_app` fixture = the per-run domain).
+Remember shell-less images (e.g. `static-web-server`) can't be `docker exec`-ed — seed via the
+volume mountpoint like `install_steps.sh` does.
+
+### 4. Verify cold on the cc-ci host
+Commit + push the branch, then run the FULL unit suite cold against a fresh checkout on cc-ci:
+```
+git -C /tmp/cc-ci-feature add -A && git -C /tmp/cc-ci-feature commit -m "…" && git -C /tmp/cc-ci-feature push -u origin feat/
+ssh cc-ci 'rm -rf /tmp/v && git clone -q "https://…@…/cc-ci.git" /tmp/v && cd /tmp/v && git checkout -q feat/ && cc-ci-run -m pytest tests/unit/ -q'
+```
+Iterate until **all** unit tests pass. For UI changes, render the card/badge on cc-ci with `cc-ci-run`
+(use `card.render_card_html` / `render_card_png` / `level_badge_svg`) and eyeball the PNG.
+
+### 5. Live end-to-end check (for anything touching run behaviour)
+Run the harness on a representative recipe with FULL stages and inspect `results.json`:
+```
+ssh cc-ci 'cd /tmp/v && set -a; . /srv/cc-ci/.testenv; set +a; \
+  CCCI_RUNS_DIR=/tmp/r RECIPE= STAGES=install,upgrade,backup,restore,custom cc-ci-run runner/run_recipe_ci.py'
+```
+Confirm the tiers ran, the level/skips/badge are what you expect, **teardown is clean** (deploy-count
+== 1, no orphan stack/volume), and no secret leaked. Use a temp `CCCI_RUNS_DIR` so you don't pollute
+the dashboard. (`custom-html-tiny` is the lightest recipe for a smoke test.)
+
+### 6. Open a PR (never merge directly)
+Push the branch and open a PR via the Gitea API (`POST /repos/recipe-maintainers/cc-ci/pulls`, basic
+auth with `GITEA_USERNAME:GITEA_PASSWORD`). Tag `@notplants` for human awareness. **Never push `main`.**
+
+### 7. Independent adversary verification
+Write a verification plan (see `cc-ci-plan/adversary-verify-pr6.md` for the template) and dispatch an
+**independent** verifier (Agent tool, adversarial stance — disbelieve and verify; it must NOT reuse
+your working dirs). It does a fresh clone, the full unit suite cold, a **diff regression review**
+(especially: confirm no verdict/SSO enforcement was weakened, no test was weakened), a live full-stage
+run, non-vacuity checks, and teardown hygiene — returning `VERDICT: PASS|REJECT` with evidence.
+
+### 8. Merge on PASS only
+Only on an explicit adversary PASS, merge via the Gitea API
+(`POST /repos/recipe-maintainers/cc-ci/pulls//merge`, `{"Do":"squash", …}`) — squash keeps `main`
+clean. If REJECT, fix and re-verify.
+
+### 9. Deploy to the running CI server
+The merge lands on the cc-ci repo `main`, but the **running harness** is the deploy clone
+`/root/builder-clone` (NOTE: `/root/cc-ci` is gone — known deploy-path gap). Pull + rebuild:
+```
+ssh cc-ci 'cd /root/builder-clone && git pull --ff-only origin main && git submodule update --init --recursive && \
+  nixos-rebuild switch --flake "/root/builder-clone?submodules=1#cc-ci"'
+```
+Verify the new code is present (`grep` for it in `/root/builder-clone/runner/harness/…`).
+
+### 10. Clean up
+Remove temp clones / run dirs on BOTH hosts (`/tmp/cc-ci-feature` locally, `/tmp/v`, `/tmp/r` on cc-ci).
+
+## Guardrails
+- **Never push `main`; never merge without an adversary PASS.** The operator (or the adversary gate)
+  owns merges.
+- **Cosmetics never change the verdict (R7).** Level/card/badge/results.json are best-effort; the
+  run's pass/fail (`overall`) is sovereign. Don't let a presentation change touch verdict logic.
+- **Presentation never inflates** — an N/A (even an intentional, declared skip) caps the level; it is
+  only *labelled* intentional, never promoted to a pass.
+- **Self-match pitfall:** never `pkill -f`/`pgrep -f` a pattern that also matches your own ssh command
+  line (it kills your session / loops forever). Target explicit PIDs instead.
+- **Push every cc-ci-orchestrator commit** to git.autonomic.zone immediately (standing rule).
diff --git a/cc-ci-plan/adversary-verify-pr6.md b/cc-ci-plan/adversary-verify-pr6.md
new file mode 100644
index 0000000..4687598
--- /dev/null
+++ b/cc-ci-plan/adversary-verify-pr6.md
@@ -0,0 +1,51 @@
+# Adversary verification plan — cc-ci PR #6
+
+**PR:** https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/6
+**Branch:** `feat/expected-na-and-tiny-functional` (cc-ci repo)
+**Stance:** disbelieve and verify. Try to make it fail. Default to REJECT unless every check below is
+green with evidence. You did NOT author this code — verify it independently, cold.
+
+## What the PR claims
+1. A `custom-html-tiny` functional test (`tests/custom-html-tiny/functional/test_serves_content.py`):
+   exact-byte round-trip from the served `content` volume + a real 404.
+2. `recipe_meta.EXPECTED_NA = {rung: reason}` lists rungs a recipe intentionally skips; any essential
+   rung skipped and NOT listed is *unintentional*. Skips still cap the level (never inflate).
+3. The level ladder is the FOUR essential rungs only: `install · upgrade · backup_restore · functional`
+   (top = L4). `integration` and `recipe_local` are OPTIONAL — not rungs, never cap, not shown as skips.
+   **SSO must still be enforced for the run VERDICT** (the `sso_dep_unverified` / F2-11 path in
+   `run_recipe_ci.py` must be intact).
+4. results.json carries `skips:{intentional:{rung:reason}, unintentional:[rung]}` + `level_cap_rung`;
+   the card shows INTENTIONAL/UNINTENTIONAL SKIP rows; the badge shows an `expected`/`gap?` 3rd segment.
+
+## Verification steps (run on the cc-ci host; creds in `/srv/cc-ci/.testenv`)
+0. Fresh independent checkout of the PR head (do NOT reuse my working dirs):
+   `git clone … cc-ci` → `git checkout feat/expected-na-and-tiny-functional`. Record the HEAD sha.
+1. **Full unit suite cold** (not just the touched files): `cc-ci-run -m pytest tests/unit/ -q`.
+   ALL must pass. A single failure → REJECT. Capture the count + any failure.
+2. **Diff regression review:** `git diff origin/main...HEAD`. Confirm: (a) `level.py` RUNGS is the 4
+   essential rungs; (b) `derive_rungs` no longer emits integration/recipe_local; (c) the SSO VERDICT
+   logic in `run_recipe_ci.py` (`sso_dep_unverified`, the F2-11 fail-the-run block) is UNCHANGED —
+   the PR must not have weakened SSO enforcement; (d) no test was weakened/skipped; (e) no secrets.
+3. **Live end-to-end harness run, FULL stages** on the real CI server:
+   `RECIPE=custom-html-tiny STAGES=install,upgrade,backup,restore,custom cc-ci-run runner/run_recipe_ci.py`
+   (set `CCCI_RUNS_DIR` to a temp dir; source `.testenv`). Then read its `results.json` and assert:
+   - `install` PASS and `upgrade` PASS — the upgrade tier MUST actually run and pass (essential rung;
+     prove it's not silently skipped). The custom tier (functional serve test) PASS.
+   - `level == 2`, `level_cap_rung == "backup_restore"`, `level_cap_reason` mentions L3 backup/restore.
+   - `rungs` has exactly install/upgrade/backup_restore/functional — **no** integration/recipe_local.
+   - `skips.intentional == {"backup_restore": }`, `skips.unintentional == []`.
+   - `backup_restore` is N/A because there is no `backupbot.backup` label (a real intentional skip,
+     NOT a masked failure) — confirm by inspecting the recipe compose.
+   - `badge.svg` contains the muted `expected` third segment (not `gap?`).
+   - The summary card renders and shows a green `INTENTIONAL SKIP` row for backup/restore with the reason.
+4. **Non-vacuity of the functional test:** confirm it asserts the *exact random bytes* round-trip and a
+   404 on a random path (so a 200-everything stub would fail it) — read the test; reason about whether
+   it could pass against a broken server. Bonus: confirm it writes via the volume mountpoint (the SWS
+   image is shell-less).
+5. **Teardown + hygiene:** after the run, confirm the harness left NO orphan stack/volume/container for
+   custom-html-tiny (deploy-count == 1; clean teardown). Confirm no secret value appears in results.json.
+
+## Verdict
+Return a clear **PASS** or **REJECT** with the evidence for each numbered step (the HEAD sha, the unit
+count, the key results.json fields, the upgrade-tier verdict, the teardown state). REJECT if ANYTHING is
+unproven. Do not merge — the orchestrator merges on your PASS.
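
Reviewer note (not part of the patch): the `results.json` assertions in verification step 3 could be scripted roughly as below. This is a minimal sketch, not the harness's own checker. It assumes `rungs` is a mapping keyed by rung name and that `skips.intentional` maps each rung to a non-empty reason string; the helper name `check_results` and the `SAMPLE` payload (including its reason text) are hypothetical. The wording of `level_cap_reason` and the per-rung PASS verdicts are left to manual inspection because the plan does not pin down their exact shape.

```python
import json

# The four essential rungs named in the plan; anything else must be absent.
ESSENTIAL_RUNGS = {"install", "upgrade", "backup_restore", "functional"}


def check_results(results: dict) -> list[str]:
    """Return human-readable failures; an empty list means every check held."""
    failures = []

    if results.get("level") != 2:
        failures.append(f"level is {results.get('level')!r}, expected 2")

    if results.get("level_cap_rung") != "backup_restore":
        failures.append(
            f"level_cap_rung is {results.get('level_cap_rung')!r}, "
            "expected 'backup_restore'"
        )

    # ASSUMPTION: `rungs` is a mapping keyed by rung name.
    rung_names = set(results.get("rungs", {}))
    if rung_names != ESSENTIAL_RUNGS:
        failures.append(f"unexpected rung set: {sorted(rung_names)}")

    skips = results.get("skips", {})
    intentional = skips.get("intentional", {})
    # Only backup_restore may be an intentional skip, and it needs a reason.
    if set(intentional) != {"backup_restore"} or not intentional.get("backup_restore"):
        failures.append(f"unexpected intentional skips: {intentional!r}")

    if skips.get("unintentional") != []:
        failures.append(f"unintentional skips present: {skips.get('unintentional')!r}")

    return failures


# Hypothetical payload for illustration only; real values come from the run.
SAMPLE = {
    "level": 2,
    "level_cap_rung": "backup_restore",
    "rungs": {"install": "PASS", "upgrade": "PASS",
              "backup_restore": "N/A", "functional": "PASS"},
    "skips": {"intentional": {"backup_restore": "placeholder reason"},
              "unintentional": []},
}

if __name__ == "__main__":
    problems = check_results(SAMPLE)
    print(json.dumps({"ok": not problems, "failures": problems}, indent=2))
```

In a real verification the dict would be loaded with `json.load` from the `results.json` under the temp `CCCI_RUNS_DIR`, and any non-empty failure list would count toward REJECT.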