diff --git a/machine-docs/BACKLOG-regression.md b/machine-docs/BACKLOG-regression.md
index 86da043..cffa37f 100644
--- a/machine-docs/BACKLOG-regression.md
+++ b/machine-docs/BACKLOG-regression.md
@@ -2,7 +2,12 @@
 
 ## Build backlog
 
-*(Builder-owned section — read-only for Adversary)*
+- [x] Create `tests/regression/` suite (conftest + test_canaries + README)
+- [ ] Run `good-simple` canary (custom-html-tiny main) → confirm GREEN + test_serving passes
+- [ ] Run `bad-false-green` canary (custom-html v5-stale-docroot) → confirm RED + test_content_type fails
+- [ ] Run `good-significant` canary (lasuite-docs main) → confirm GREEN + test_serving_and_frontend passes
+- [ ] Open PR for operator review (DoD item 5: NOT merged)
+- [ ] Claim gate once all canary runs are GREEN/RED as expected + PR is open
 
 ## Adversary findings
 
diff --git a/machine-docs/JOURNAL-regression.md b/machine-docs/JOURNAL-regression.md
new file mode 100644
index 0000000..eff3164
--- /dev/null
+++ b/machine-docs/JOURNAL-regression.md
@@ -0,0 +1,56 @@
+# JOURNAL — server regression canaries phase (Builder)
+
+**Phase:** server regression canaries
+**Started:** 2026-06-02
+
+---
+
+## Step 0 — phase kickoff and design (2026-06-02)
+
+**Context:** Mirror phase (plan-mirror-enroll-all-recipes.md) completed DONE at 2026-06-02T01:16Z.
+Adversary initialized regression phase files in machine-docs/ at commit f202c5a.
+
+**Decision: run regression tests ON cc-ci, not from the orchestrator**
+
+The regression tests call `run_recipe_ci.py` which uses abra/docker/swarm — these only exist on
+cc-ci. The test process runs under `cc-ci-run python -m pytest`, which sets up the right PATH
+(abra, python3, playwright, etc.). The test then invokes `run_recipe_ci.py` as a subprocess using
+`sys.executable` (inherits the same python3 from cc-ci-run).
+
+The README.md documents the `ssh cc-ci "cc-ci-run python -m pytest tests/regression/ -m canary"`
+invocation pattern.
+
+**Canary selection:**
+
+| ID | Recipe | SHA | Rationale |
+|----|--------|-----|-----------|
+| good-simple | custom-html-tiny | 435df8fc (main) | Fast, few deps, quick signal |
+| good-significant | lasuite-docs | 290a8ad7 (main) | Multi-service, exercises real breadth |
+| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | Already produced RED build #75; pinned fixture |
+
+SHAs confirmed from Gitea API on 2026-06-02.
+
+**Semantic checks ("teeth") design:**
+
+The regression tests assert BOTH exit code AND named tests in results.json stages. This guards
+against two failure modes:
+1. Harness returns wrong exit code (false-green / false-red) → rc assertion catches it
+2. A specific assertion is silently removed/vacated → named test disappears from stages → semantic check catches it
+
+For custom-html-tiny: `test_serving` (generic install) must appear passing
+For lasuite-docs: `test_serving_and_frontend` (install overlay) must appear passing
+For bad canary: `test_content_type` (custom functional) must appear failing
+
+**File layout:**
+- `tests/regression/conftest.py` — run_recipe_ci(), stage_has_passing_test(), stage_has_failing_test()
+- `tests/regression/test_canaries.py` — parametrized @pytest.mark.canary test
+- `tests/regression/README.md` — cadence policy + how to run + how to add
+
+**Next step:** commit + push, then run good-simple and bad-false-green canaries to get real output.
+lasuite-docs is slow (10-20 min) so will run it last.
+
+---
+
+## Step 1 — initial canary runs (in progress, 2026-06-02)
+
+Committed suite, will record canary outputs here as they complete.
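The two failure modes named in the journal's "teeth" design can be illustrated without running the harness. The sketch below is a hypothetical reduction, not code from this change: `named_test_status`, `classify_run`, and the sample dicts are invented for illustration, and the `stages` / `tests` / `status` shape is assumed from the helpers in `conftest.py` rather than from a documented schema.

```python
from __future__ import annotations


def named_test_status(results: dict, stage: str, name_substr: str) -> str | None:
    """Return the status of the first test in `stage` whose name contains name_substr."""
    for s in results.get("stages", []):
        if s.get("name") != stage:
            continue
        for t in s.get("tests", []):
            if name_substr in t.get("name", ""):
                return t.get("status")
    return None  # the named test never ran (removed or vacated)


def classify_run(rc: int, results: dict, stage: str, name_substr: str) -> str:
    """Classify a good-canary run using both checks (hypothetical helper)."""
    if rc != 0:
        return "rc-mismatch"  # failure mode 1: wrong exit code
    if named_test_status(results, stage, name_substr) != "pass":
        return "teeth-missing"  # failure mode 2: assertion silently gone
    return "ok"


# Assumed results.json shape, trimmed to the fields the checks read.
healthy = {"stages": [{"name": "install", "tests": [{"name": "test_serving", "status": "pass"}]}]}
vacated = {"stages": [{"name": "install", "tests": []}]}

print(classify_run(0, healthy, "install", "test_serving"))  # ok
print(classify_run(0, vacated, "install", "test_serving"))  # teeth-missing
print(classify_run(1, healthy, "install", "test_serving"))  # rc-mismatch
```

An exit-code check alone would accept the `vacated` run, which is why the design pairs it with a named-test lookup.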
diff --git a/machine-docs/STATUS-regression.md b/machine-docs/STATUS-regression.md
new file mode 100644
index 0000000..8b69389
--- /dev/null
+++ b/machine-docs/STATUS-regression.md
@@ -0,0 +1,85 @@
+# STATUS — server regression canaries phase
+
+**Phase:** server regression canaries (codified E2E self-tests)
+**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md`
+**Builder loop started:** 2026-06-02
+**Repo:** git.autonomic.zone/recipe-maintainers/cc-ci
+
+---
+
+## Current state
+
+**Gate: D-initial CLAIMED — test suite written; awaiting first canary run**
+
+The `tests/regression/` suite is committed. Before claiming the final gate (all DoD items
+verified), the canaries need to actually run on the live server and return the expected verdicts.
+Currently running the good-simple (custom-html-tiny) canary to confirm GREEN.
+
+---
+
+## What was built
+
+`tests/regression/` committed in the cc-ci repo:
+- `conftest.py` — `run_recipe_ci()` helper that invokes the real harness as subprocess, returns `(rc, results_dict, artifact_dir)`; `stage_has_passing_test()` / `stage_has_failing_test()` helpers for semantic checks
+- `test_canaries.py` — parametrized `@pytest.mark.canary` test with three canaries (see below)
+- `README.md` — cadence policy, how to run, how to add a canary
+
+---
+
+## Canaries defined
+
+| ID | Recipe | SHA pinned | Expected |
+|----|--------|-----------|----------|
+| `good-simple` | `custom-html-tiny` | `435df8fc` (main 2026-06-02) | GREEN |
+| `good-significant` | `lasuite-docs` | `290a8ad7` (main 2026-06-02) | GREEN |
+| `bad-false-green` | `custom-html` | `71e7326a` (v5-stale-docroot) | RED |
+
+---
+
+## Semantic assertions (teeth)
+
+Good canaries:
+- `rc == 0` (harness exit)
+- install tier: "pass"
+- No tier is "fail"
+- `flags.clean_teardown == True`
+- `flags.no_secret_leak == True`
+- Named test `test_serving` present + passing in install stage (custom-html-tiny)
+- Named test `test_serving_and_frontend` present + passing in install stage (lasuite-docs)
+
+Bad canary:
+- `rc != 0` (PRIMARY — a false-green is caught here)
+- Named test `test_content_type` present + FAILING in custom stage (proves the guard was not vacated)
+
+---
+
+## How to verify (Adversary commands)
+
+From cc-ci server root (requires the repo checked out at `/root/cc-ci` or similar):
+
+```bash
+# Good simple (fast ~2-5 min):
+cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
+
+# Bad canary (fast ~2-5 min, same recipe lifecycle):
+cc-ci-run python -m pytest tests/regression/ -m canary -k bad-false-green -v
+
+# Full suite (slow — lasuite-docs is 10-20 min):
+cc-ci-run python -m pytest tests/regression/ -m canary -v
+```
+
+Expected outcomes:
+- `good-simple`: test PASSES (harness returns GREEN, test_serving passes)
+- `bad-false-green`: test PASSES (harness returns RED, test_content_type fails in custom stage)
+- `good-significant`: test PASSES (harness returns GREEN, test_serving_and_frontend passes)
+
+Verify teeth: tamper with an outcome to confirm the regression test fails:
+- For good canary: unset `test_serving` (remove it) → `stage_has_passing_test` returns False → test fails
+- For bad canary: change the assert to `rc == 0` → would fail if harness returns non-zero (teeth work)
+
+---
+
+## In-flight
+
+- Canary `good-simple` running on cc-ci (started ~now)
+- Status will be updated once run completes and we have actual output to paste
diff --git a/tests/regression/README.md b/tests/regression/README.md
new file mode 100644
index 0000000..0b02acd
--- /dev/null
+++ b/tests/regression/README.md
@@ -0,0 +1,136 @@
+# Regression canaries — E2E self-tests for the cc-ci server
+
+A standing pytest suite that drives the **real** cc-ci lifecycle harness against pinned canary
+recipes and verifies both halves of the server's job:
+
+1. **Good canaries** — healthy apps are reported GREEN (install + upgrade + backup/restore pass).
+2. **Bad canary** — broken apps are caught RED; a false-green makes the regression test itself fail.
+
+These tests run the full cold lifecycle on the live cc-ci server. They are **slow** (minutes per
+canary) and **opt-in** — kept out of the per-commit fast path by the `canary` marker.
+
+---
+
+## How to run
+
+Run on the cc-ci server (abra + Docker + Swarm required):
+
+```bash
+ssh cc-ci
+cd /root/cc-ci  # or wherever the repo is checked out
+cc-ci-run python -m pytest tests/regression/ -m canary -v
+```
+
+Or a single canary:
+
+```bash
+cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
+```
+
+From the orchestrator:
+
+```bash
+ssh cc-ci "cd /root/cc-ci && cc-ci-run python -m pytest tests/regression/ -m canary -v"
+```
+
+---
+
+## Canaries
+
+| ID | Recipe | Purpose | Expected verdict |
+|----|--------|---------|-----------------|
+| `good-simple` | `custom-html-tiny` | Minimal static server — fast signal | GREEN |
+| `good-significant` | `lasuite-docs` | Multi-service (backend + Postgres + Collabora + OIDC) | GREEN |
+| `bad-false-green` | `custom-html` @ `v5-stale-docroot` | App is UP but serves wrong Content-Type — catches false-green | RED |
+
+### Why the bad canary exists
+
+The scariest regression is a **false-green**: the server reports PASS while the app is broken.
+We already saw a fabricated full-PASS during the build. The `bad-false-green` canary pins a
+known-broken fixture (`v5-stale-docroot`: nginx serves `.txt` as `application/octet-stream`). The
+harness's `test_content_type_html_and_txt` catches this and returns RED (build #75 was RED for
+exactly this fixture).
+
+The regression test asserts `rc != 0`. If the harness ever wrongly returns green for this fixture,
+that assert fires — false-green is caught before any merge.
+
+---
+
+## What each canary verifies
+
+### Per-tier semantic assertions (the "teeth")
+
+The tests assert MORE than the harness exit code: they check that **specific named assertions**
+ran and got the expected result. This guards against a different failure mode — a tier that
+nominally "passes" because the assertion was silently removed or made vacuous.
+
+| Stage | Test name | What it proves |
+|-------|-----------|---------------|
+| install | `test_serving` | Generic HTTP readiness check actually ran |
+| install | `test_serving_and_frontend` | Lasuite-docs frontend (SPA shell) actually loaded |
+| custom | `test_content_type` | Content-type assertion actually ran (bad canary only) |
+
+If a tier assertion is removed: the named test disappears from `results.json` → the semantic
+check fires → the regression suite catches the removal.
+
+### Additional structural assertions (good canaries)
+
+- `install` tier: "pass" (not fail, not skip)
+- No tier is "fail" (skips acceptable for recipes without backup/custom tests)
+- `flags.clean_teardown = True` (no leftover containers/volumes/secrets)
+- `flags.no_secret_leak = True` (no secret value in the results artifact)
+
+---
+
+## Cadence policy
+
+**Do NOT run on every commit or PR.** These are slow and resource-heavy. Run them:
+
+- Before a **release** of the cc-ci server (after a batch of server changes).
+- As a **polishing pass** or pre-merge check for significant server refactors.
+- On-demand when you suspect a regression: `pytest -m canary`.
+
+They are NOT wired to the per-commit Drone pipeline. If adding a `!testme`-style trigger for the
+cc-ci repo, gate it behind a deliberate label (e.g. `run-canaries`) — not an automatic run on
+every push.
+
+---
+
+## How to add a canary
+
+1. Identify a recipe that is already deployable and has pinned version tags.
+2. Decide the expected verdict (GREEN or RED) and which tier assertions have teeth.
+3. Add an entry to `CANARIES` in `test_canaries.py`:
+
+```python
+{
+    "id": "good-myrecipe",
+    "recipe": "my-recipe",
+    "src": "recipe-maintainers/my-recipe",
+    "ref": "",  # pin to a specific commit for stability
+    "expected_green": True,
+    "stage_pass_checks": [
+        ("install", "test_serving"),  # verify this named test ran and passed
+    ],
+    "stage_fail_checks": [],
+}
+```
+
+4. Run the canary once to confirm it passes:
+   `cc-ci-run python -m pytest tests/regression/ -m canary -k good-myrecipe -v`
+
+5. Update the pin comment with the date and the recipe version it was pinned at.
+
+---
+
+## Pin maintenance
+
+Canary refs are pinned to specific SHAs for stability. When a recipe publishes a new release:
+
+1. Update the `"ref"` SHA in the canary definition (use the new main-branch HEAD).
+2. Update the pin comment with the new date/version.
+3. Re-run the canary to confirm GREEN before committing the pin update.
+
+The bad canary (`v5-stale-docroot`) is a stable fixture branch — update only if the branch is
+deleted. If deleted, recreate the pattern: an app that is up + passes lifecycle tiers but fails
+one functional assertion.
diff --git a/tests/regression/conftest.py b/tests/regression/conftest.py
new file mode 100644
index 0000000..07b86a7
--- /dev/null
+++ b/tests/regression/conftest.py
@@ -0,0 +1,102 @@
+"""Shared fixtures and helpers for E2E canary regression tests.
+
+The regression tests call the real cc-ci harness (run_recipe_ci.py) as a subprocess and assert on
+its outputs (exit code, results.json). They run ON the cc-ci server, not the orchestrator — abra,
+Docker, and Swarm must be present.
+
+Invoke: cc-ci-run python -m pytest tests/regression/ -m canary -v
+"""
+
+from __future__ import annotations
+
+import json
+import os
+import subprocess
+import sys
+import time
+
+ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+
+
+def pytest_configure(config):
+    config.addinivalue_line(
+        "markers",
+        "canary: slow E2E canary test — drives the full cold CI lifecycle; run on-demand only.",
+    )
+
+
+def run_recipe_ci(
+    recipe: str,
+    src: str,
+    ref: str,
+    pr: str = "0",
+    stages: str = "install,upgrade,backup,restore,custom",
+    runs_dir: str | None = None,
+    run_id_prefix: str = "regression",
+    timeout: int = 3600,
+) -> tuple[int, dict | None, str]:
+    """Invoke run_recipe_ci.py with the given canary params.
+
+    Returns (rc, results_dict_or_None, run_artifact_dir).
+    Stdout/stderr stream live so a human can follow progress.
+    """
+    ts = int(time.time())
+    run_id = f"{run_id_prefix}-{recipe}-{ref[:12]}-{ts}"
+    if runs_dir is None:
+        runs_dir = "/var/lib/cc-ci-runs"
+
+    env = dict(os.environ)
+    env.update(
+        {
+            "RECIPE": recipe,
+            "REF": ref,
+            "SRC": src,
+            "PR": pr,
+            "STAGES": stages,
+            "CCCI_RUN_ID": run_id,
+            "CCCI_RUNS_DIR": runs_dir,
+            "HOME": "/root",
+        }
+    )
+    # Keep PLAYWRIGHT env from the outer cc-ci-run wrapper (already in os.environ if running under it)
+
+    script = os.path.join(ROOT, "runner", "run_recipe_ci.py")
+    result = subprocess.run(
+        [sys.executable, script],
+        env=env,
+        timeout=timeout,
+    )
+    rc = result.returncode
+
+    artifact_dir = os.path.join(runs_dir, run_id)
+    results_path = os.path.join(artifact_dir, "results.json")
+    results_data: dict | None = None
+    if os.path.exists(results_path):
+        with open(results_path) as f:
+            results_data = json.load(f)
+
+    return rc, results_data, artifact_dir
+
+
+def find_stage_tests(results: dict, stage_name: str) -> list[dict]:
+    """Return the per-test list for a named stage from results.json, or []."""
+    for stage in results.get("stages", []):
+        if stage.get("name") == stage_name:
+            return stage.get("tests", [])
+    return []
+
+
+def stage_has_passing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
+    """True if the named stage contains a passing test whose name includes test_name_substr."""
+    for t in find_stage_tests(results, stage_name):
+        if test_name_substr in t.get("name", "") and t.get("status") == "pass":
+            return True
+    return False
+
+
+def stage_has_failing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
+    """True if the named stage contains a failing test whose name includes test_name_substr."""
+    for t in find_stage_tests(results, stage_name):
+        if test_name_substr in t.get("name", "") and t.get("status") in ("fail", "error"):
+            return True
+    return False
diff --git a/tests/regression/test_canaries.py b/tests/regression/test_canaries.py
new file mode 100644
index 0000000..d77ff1c
--- /dev/null
+++ b/tests/regression/test_canaries.py
@@ -0,0 +1,181 @@
+"""E2E canary regression tests — the server's standing self-test suite.
+
+Three canaries prove both halves of the server's job:
+  1. GREEN canaries — good apps are reported healthy (install+upgrade+backup/restore pass).
+  2. RED canary — broken apps are caught; a false-green makes THIS test fail.
+
+Run: cc-ci-run python -m pytest tests/regression/ -m canary -v
+Slow: each canary drives the full cold lifecycle on the live server (minutes per run).
+
+Pin policy: canary refs are pinned to specific SHAs for stability. Update them when the recipe
+publishes a new release and the pin is stale (re-run to confirm GREEN before updating).
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from .conftest import run_recipe_ci, stage_has_failing_test, stage_has_passing_test
+
+# ---------------------------------------------------------------------------
+# Canary definitions
+# ---------------------------------------------------------------------------
+
+# Good canary 1: minimal static-file server — fast signal, few deps.
+_SIMPLE = {
+    "id": "good-simple",
+    "recipe": "custom-html-tiny",
+    "src": "recipe-maintainers/custom-html-tiny",
+    # Pin: main @ 2026-06-02 — update if the recipe publishes a new release and pin goes stale.
+    "ref": "435df8fc98ef7598084fcffcd6225470eca80053",
+    "expected_green": True,
+    # Named tests that MUST appear with "pass" in the result — these are the semantic teeth.
+    # If the generic install assertion is removed/vacated, test_serving disappears → this fails.
+    "stage_pass_checks": [
+        ("install", "test_serving"),
+    ],
+    "stage_fail_checks": [],
+}
+
+# Good canary 2: multi-service stack — backend + Postgres + Collabora WOPI + OIDC.
+# Exercises real breadth. Slowest canary (~10-20 min full lifecycle).
+_SIGNIFICANT = {
+    "id": "good-significant",
+    "recipe": "lasuite-docs",
+    "src": "recipe-maintainers/lasuite-docs",
+    # Pin: main @ 2026-06-02
+    "ref": "290a8ad72d06232f0b3f302d976af14bef0f3c53",
+    "expected_green": True,
+    "stage_pass_checks": [
+        ("install", "test_serving_and_frontend"),
+    ],
+    "stage_fail_checks": [],
+}
+
+# Bad canary: app is UP + passes all lifecycle tiers but the custom functional assertion detects a
+# semantic defect (wrong Content-Type for .txt files). The harness MUST report RED.
+# If the harness wrongly returns green for this fixture, assert rc != 0 fails → false-green caught.
+_BAD = {
+    "id": "bad-false-green",
+    "recipe": "custom-html",
+    "src": "recipe-maintainers/custom-html",
+    # Pin: v5-stale-docroot @ 71e7326 — serves .txt as application/octet-stream; build #75 was RED.
+    # Recreate pattern if branch disappears: app up + passes lifecycle, fails one content assertion.
+    "ref": "71e7326a99bbb69035a046fba8fa51859ca66115",
+    "expected_green": False,
+    # The specific test that must have FAILED, proving the content-type assertion has teeth.
+    # If the assertion is vacated and the test disappears, stage_has_failing_test() returns False
+    # → the assert below fails → we detect that the guard was removed.
+    "stage_pass_checks": [],
+    "stage_fail_checks": [
+        ("custom", "test_content_type"),
+    ],
+}
+
+CANARIES = [_SIMPLE, _SIGNIFICANT, _BAD]
+
+
+# ---------------------------------------------------------------------------
+# Test
+# ---------------------------------------------------------------------------
+
+
+@pytest.mark.canary
+@pytest.mark.parametrize("canary", CANARIES, ids=[c["id"] for c in CANARIES])
+def test_canary(canary, tmp_path):
+    """Drive the full cold CI lifecycle for a canary recipe and verify the outcome.
+
+    For GREEN canaries: proves the harness correctly reports a healthy app as healthy, and that
+    the per-tier semantic assertions actually ran (not vacuous).
+
+    For the RED canary: proves the harness catches a broken app — if the harness wrongly returned
+    green, `assert rc != 0` fails, catching the false-green.
+    """
+    rc, results, artifact_dir = run_recipe_ci(
+        recipe=canary["recipe"],
+        src=canary["src"],
+        ref=canary["ref"],
+        runs_dir=str(tmp_path),
+    )
+
+    _note = f"artifact_dir={artifact_dir}"  # visible in -v output via assert messages
+
+    if canary["expected_green"]:
+        _assert_green(rc, results, canary, _note)
+    else:
+        _assert_red(rc, results, canary, _note)
+
+
+def _assert_green(rc: int, results: dict | None, canary: dict, note: str) -> None:
+    """Assert a good-canary run is GREEN with real semantic assertions."""
+
+    # 1. Harness exit code must be 0 (GREEN).
+    assert rc == 0, f"[{canary['id']}] harness returned non-zero rc={rc} — expected GREEN. {note}"
+
+    assert (
+        results is not None
+    ), f"[{canary['id']}] results.json not written — harness may have crashed. {note}"
+
+    # 2. Install tier must have passed.
+    assert results.get("results", {}).get("install") == "pass", (
+        f"[{canary['id']}] install tier did not pass: results={results.get('results')}. {note}"
+    )
+
+    # 3. No tier may have FAILED (skips are acceptable for recipes without backup or custom tests).
+    failed_tiers = [t for t, s in results.get("results", {}).items() if s == "fail"]
+    assert not failed_tiers, f"[{canary['id']}] tiers failed: {failed_tiers}. {note}"
+
+    # 4. Teardown must be clean (no leftover containers/volumes/secrets).
+    assert (
+        results.get("flags", {}).get("clean_teardown") is True
+    ), f"[{canary['id']}] clean_teardown=False — residual state left on server. {note}"
+
+    # 5. No secret values leaked into the results artifact.
+    assert (
+        results.get("flags", {}).get("no_secret_leak") is True
+    ), f"[{canary['id']}] no_secret_leak=False — a secret value appeared in results.json. {note}"
+
+    # 6. Semantic stage assertions — TEETH CHECK.
+    # These verify that specific named tests actually ran and passed in the expected stage.
+    # If a tier assertion is removed or made vacuous, the named test disappears from results.json
+    # and this assert fires — proving the regression suite guards against silent test removal.
+    for stage_name, test_name_substr in canary.get("stage_pass_checks", []):
+        assert stage_has_passing_test(results, stage_name, test_name_substr), (
+            f"[{canary['id']}] expected a passing test containing {test_name_substr!r} in "
+            f"stage={stage_name!r}, but none found. "
+            f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
+        )
+
+
+def _assert_red(rc: int, results: dict | None, canary: dict, note: str) -> None:
+    """Assert a bad-canary run is RED (false-green guard).
+
+    The PRIMARY assertion is rc != 0. If the harness wrongly returns 0 (green) for this fixture,
+    this assert fails → the regression suite catches the false-green. This is the core guard.
+    """
+
+    # PRIMARY: harness must return non-zero (RED).
+    # If the harness returns 0 for a broken app, the regression suite fails here — false-green caught.
+    assert rc != 0, (
+        f"[{canary['id']}] harness returned rc=0 (GREEN) for a KNOWN-BAD fixture — "
+        f"FALSE-GREEN detected. The harness failed to catch the broken app. {note}"
+    )
+
+    # SECONDARY: verify the specific failing test is present in results.json.
+    # If the content-type assertion is removed/vacated, stage_has_failing_test() returns False here
+    # → this assert fires → we detect that the guard itself was removed (a meta-failure).
+    if results is not None:
+        for stage_name, test_name_substr in canary.get("stage_fail_checks", []):
+            assert stage_has_failing_test(results, stage_name, test_name_substr), (
+                f"[{canary['id']}] expected a failing test containing {test_name_substr!r} in "
+                f"stage={stage_name!r}, but none found. "
+                f"The guard may have been removed or vacated. "
+                f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
+            )
+
+
+def _stage_tests(results: dict, stage_name: str) -> list[dict]:
+    for stage in results.get("stages", []):
+        if stage.get("name") == stage_name:
+            return stage.get("tests", [])
+    return []
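`run_recipe_ci()` in `conftest.py` hands its parameters to the harness through environment variables, runs it under `sys.executable`, and reads the exit code back. A minimal sketch of that pattern follows, assuming a stand-in child process: the inline `CHILD` script, the `DEMO_EXPECT` variable, and `run_child` are hypothetical and only mimic a harness that exits non-zero on failure.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the harness: exits 1 when DEMO_EXPECT is "red", else 0.
CHILD = "import os, sys; sys.exit(1 if os.environ.get('DEMO_EXPECT') == 'red' else 0)"


def run_child(expect: str, timeout: int = 30) -> int:
    """Run the stand-in under the same interpreter, passing parameters via env."""
    env = dict(os.environ)  # inherit PATH and friends from the calling process
    env["DEMO_EXPECT"] = expect  # the parameter travels as an environment variable
    result = subprocess.run([sys.executable, "-c", CHILD], env=env, timeout=timeout)
    return result.returncode


print(run_child("green"))  # 0
print(run_child("red"))  # 1
```

Copying `os.environ` before updating it is what lets the child keep whatever the outer wrapper already exported, which matches the note in `conftest.py` about preserving the Playwright environment.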