feat(regression): add tests/regression/ E2E canary suite
Some checks failed
continuous-integration/drone/push Build is failing
Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle:
- good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN
- good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN
- bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED
Semantic teeth: beyond exit-code, each test asserts that specific named tests
ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type).
If an assertion is removed, the named test disappears → regression test fails.
Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test),
README (cadence policy, how to run, how to add), and phase state files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ -2,7 +2,12 @@

## Build backlog

*(Builder-owned section; read-only for Adversary)*
- [x] Create `tests/regression/` suite (conftest + test_canaries + README)
- [ ] Run `good-simple` canary (custom-html-tiny main) → confirm GREEN + test_serving passes
- [ ] Run `bad-false-green` canary (custom-html v5-stale-docroot) → confirm RED + test_content_type fails
- [ ] Run `good-significant` canary (lasuite-docs main) → confirm GREEN + test_serving_and_frontend passes
- [ ] Open PR for operator review (DoD item 5: NOT merged)
- [ ] Claim gate once all canary runs are GREEN/RED as expected + PR is open

## Adversary findings

56
machine-docs/JOURNAL-regression.md
Normal file
@ -0,0 +1,56 @@
# JOURNAL: server regression canaries phase (Builder)

**Phase:** server regression canaries
**Started:** 2026-06-02

---

## Step 0: phase kickoff and design (2026-06-02)

**Context:** Mirror phase (plan-mirror-enroll-all-recipes.md) completed (DONE) at 2026-06-02T01:16Z.
Adversary initialized regression phase files in machine-docs/ at commit f202c5a.

**Decision: run regression tests ON cc-ci, not from the orchestrator**

The regression tests call `run_recipe_ci.py`, which uses abra/docker/swarm; these exist only on
cc-ci. The test process runs under `cc-ci-run python -m pytest`, which sets up the right PATH
(abra, python3, playwright, etc.). The test then invokes `run_recipe_ci.py` as a subprocess using
`sys.executable` (inherits the same python3 from cc-ci-run).

The README.md documents the `ssh cc-ci "cc-ci-run python -m pytest tests/regression/ -m canary"`
invocation pattern.

**Canary selection:**

| ID | Recipe | SHA | Rationale |
|----|--------|-----|-----------|
| good-simple | custom-html-tiny | 435df8fc (main) | Fast, few deps, quick signal |
| good-significant | lasuite-docs | 290a8ad7 (main) | Multi-service, exercises real breadth |
| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | Already produced RED build #75; pinned fixture |

SHAs confirmed from Gitea API on 2026-06-02.

**Semantic checks ("teeth") design:**

The regression tests assert BOTH exit code AND named tests in results.json stages. This guards
against two failure modes:
1. Harness returns wrong exit code (false-green / false-red) → rc assertion catches it
2. A specific assertion is silently removed or made vacuous → named test disappears from stages → semantic check catches it

- For custom-html-tiny: `test_serving` (generic install) must appear passing
- For lasuite-docs: `test_serving_and_frontend` (install overlay) must appear passing
- For bad canary: `test_content_type` (custom functional) must appear failing
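
As a rough illustration of the second failure mode, the sketch below shows the `results.json` layout that the helpers in `conftest.py` read (a `stages` list whose entries carry `name` and `tests`) and how a substring match on the test name behaves. The sample data is invented for illustration, and `has_test` is a hypothetical stand-in for `stage_has_passing_test` / `stage_has_failing_test`, not the committed code.

```python
from __future__ import annotations

# Illustrative only: the sample data is invented; the layout mirrors what conftest.py reads.
SAMPLE_RESULTS = {
    "stages": [
        {"name": "install", "tests": [{"name": "test_serving", "status": "pass"}]},
        {
            "name": "custom",
            "tests": [{"name": "test_content_type_html_and_txt", "status": "fail"}],
        },
    ]
}


def has_test(results: dict, stage: str, name_substr: str, statuses: tuple[str, ...]) -> bool:
    """True if `stage` holds a test whose name contains `name_substr` with one of `statuses`."""
    for entry in results.get("stages", []):
        if entry.get("name") != stage:
            continue
        for test in entry.get("tests", []):
            if name_substr in test.get("name", "") and test.get("status") in statuses:
                return True
    return False


# The substring "test_content_type" matches the longer test name in the custom stage.
bad_canary_has_teeth = has_test(SAMPLE_RESULTS, "custom", "test_content_type", ("fail", "error"))

# If the named test is removed from its stage, the same kind of check turns False.
stripped = {"stages": [{"name": "install", "tests": []}]}
removed_is_detected = not has_test(stripped, "install", "test_serving", ("pass",))
```

Matching on a substring keeps the canary definitions short, at the cost that an unrelated test with a similar name would also satisfy the check.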

**File layout:**
- `tests/regression/conftest.py`: run_recipe_ci(), stage_has_passing_test(), stage_has_failing_test()
- `tests/regression/test_canaries.py`: parametrized @pytest.mark.canary test
- `tests/regression/README.md`: cadence policy + how to run + how to add

**Next step:** commit + push, then run good-simple and bad-false-green canaries to get real output.
lasuite-docs is slow (10-20 min) so will run it last.

---

## Step 1: initial canary runs (in progress, 2026-06-02)

Committed suite, will record canary outputs here as they complete.
85
machine-docs/STATUS-regression.md
Normal file
@ -0,0 +1,85 @@
# STATUS: server regression canaries phase

**Phase:** server regression canaries (codified E2E self-tests)
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md`
**Builder loop started:** 2026-06-02
**Repo:** git.autonomic.zone/recipe-maintainers/cc-ci

---

## Current state

**Gate: D-initial CLAIMED (test suite written; awaiting first canary run)**

The `tests/regression/` suite is committed. Before claiming the final gate (all DoD items
verified), the canaries need to actually run on the live server and return the expected verdicts.
Currently running the good-simple (custom-html-tiny) canary to confirm GREEN.

---

## What was built

`tests/regression/` committed in the cc-ci repo:
- `conftest.py`: `run_recipe_ci()` helper that invokes the real harness as a subprocess and returns `(rc, results_dict, artifact_dir)`; `stage_has_passing_test()` / `stage_has_failing_test()` helpers for semantic checks
- `test_canaries.py`: parametrized `@pytest.mark.canary` test with three canaries (see below)
- `README.md`: cadence policy, how to run, how to add a canary

---

## Canaries defined

| ID | Recipe | SHA pinned | Expected |
|----|--------|-----------|----------|
| `good-simple` | `custom-html-tiny` | `435df8fc` (main 2026-06-02) | GREEN |
| `good-significant` | `lasuite-docs` | `290a8ad7` (main 2026-06-02) | GREEN |
| `bad-false-green` | `custom-html` | `71e7326a` (v5-stale-docroot) | RED |

---

## Semantic assertions (teeth)

Good canaries:
- `rc == 0` (harness exit)
- install tier: "pass"
- No tier is "fail"
- `flags.clean_teardown == True`
- `flags.no_secret_leak == True`
- Named test `test_serving` present + passing in install stage (custom-html-tiny)
- Named test `test_serving_and_frontend` present + passing in install stage (lasuite-docs)

Bad canary:
- `rc != 0` (PRIMARY: a false-green is caught here)
- Named test `test_content_type` present + FAILING in custom stage (proves the guard was not made vacuous)
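
The good-canary half of this checklist could be collapsed into one function that reports every violated expectation rather than stopping at the first. The following is a hypothetical sketch (`good_canary_problems` is not part of the committed suite); it assumes the `results` and `flags` keys that `test_canaries.py` reads from `results.json`, and the example data is made up.

```python
from __future__ import annotations


def good_canary_problems(rc: int, results: dict | None) -> list[str]:
    """Return the violated good-canary expectations (an empty list means GREEN as expected)."""
    problems = []
    if rc != 0:
        problems.append(f"rc={rc}, expected 0")
    if results is None:
        problems.append("results.json missing")
        return problems
    tiers = results.get("results", {})
    if tiers.get("install") != "pass":
        problems.append("install tier did not pass")
    failed = sorted(name for name, status in tiers.items() if status == "fail")
    if failed:
        problems.append(f"failed tiers: {failed}")
    flags = results.get("flags", {})
    for flag in ("clean_teardown", "no_secret_leak"):
        if flags.get(flag) is not True:
            problems.append(f"flags.{flag} is not True")
    return problems


# Made-up example: install passed, but backup failed and teardown left residue.
example = {
    "results": {"install": "pass", "upgrade": "pass", "backup": "fail", "custom": "skip"},
    "flags": {"clean_teardown": False, "no_secret_leak": True},
}
found = good_canary_problems(0, example)
```

Collecting all problems in one pass is a design choice for readability of a failed run; the committed test uses separate `assert` statements, which stop at the first failure.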

---

## How to verify (Adversary commands)

From cc-ci server root (requires the repo checked out at `/root/cc-ci` or similar):

```bash
# Good simple (fast ~2-5 min):
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v

# Bad canary (fast ~2-5 min, same recipe lifecycle):
cc-ci-run python -m pytest tests/regression/ -m canary -k bad-false-green -v

# Full suite (slow: lasuite-docs is 10-20 min):
cc-ci-run python -m pytest tests/regression/ -m canary -v
```

Expected outcomes:
- `good-simple`: test PASSES (harness returns GREEN, test_serving passes)
- `bad-false-green`: test PASSES (harness returns RED, test_content_type fails in custom stage)
- `good-significant`: test PASSES (harness returns GREEN, test_serving_and_frontend passes)

Verify teeth: tamper with an outcome to confirm the regression test fails:
- For good canary: remove `test_serving` from the install stage → `stage_has_passing_test` returns False → test fails
- For bad canary: flip the assert to `rc == 0` → the test fails because the harness returns non-zero (teeth work)

---

## In-flight

- Canary `good-simple` running on cc-ci (started ~now)
- Status will be updated once run completes and we have actual output to paste
136
tests/regression/README.md
Normal file
@ -0,0 +1,136 @@
# Regression canaries: E2E self-tests for the cc-ci server

A standing pytest suite that drives the **real** cc-ci lifecycle harness against pinned canary
recipes and verifies both halves of the server's job:

1. **Good canaries**: healthy apps are reported GREEN (install + upgrade + backup/restore pass).
2. **Bad canary**: broken apps are caught RED; a false-green makes the regression test itself fail.

These tests run the full cold lifecycle on the live cc-ci server. They are **slow** (minutes per
canary) and **opt-in**: kept out of the per-commit fast path by the `canary` marker.

---

## How to run

Run on the cc-ci server (abra + Docker + Swarm required):

```bash
ssh cc-ci
cd /root/cc-ci   # or wherever the repo is checked out
cc-ci-run python -m pytest tests/regression/ -m canary -v
```

Or a single canary:

```bash
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
```

From the orchestrator:

```bash
ssh cc-ci "cd /root/cc-ci && cc-ci-run python -m pytest tests/regression/ -m canary -v"
```

---

## Canaries

| ID | Recipe | Purpose | Expected verdict |
|----|--------|---------|-----------------|
| `good-simple` | `custom-html-tiny` | Minimal static server, fast signal | GREEN |
| `good-significant` | `lasuite-docs` | Multi-service (backend + Postgres + Collabora + OIDC) | GREEN |
| `bad-false-green` | `custom-html` @ `v5-stale-docroot` | App is UP but serves wrong Content-Type; catches false-green | RED |

### Why the bad canary exists

The scariest regression is a **false-green**: the server reports PASS while the app is broken.
We already saw a fabricated full-PASS during the build. The `bad-false-green` canary pins a
known-broken fixture (`v5-stale-docroot`: nginx serves `.txt` as `application/octet-stream`). The
harness's `test_content_type_html_and_txt` catches this and returns RED (build #75 was RED for
exactly this fixture).

The regression test asserts `rc != 0`. If the harness ever wrongly returns green for this fixture,
that assert fires, so the false-green is caught before any merge.

---

## What each canary verifies

### Per-tier semantic assertions (the "teeth")

The tests assert MORE than the harness exit code: they check that **specific named assertions**
ran and got the expected result. This guards against a different failure mode: a tier that
nominally "passes" because the assertion was silently removed or made vacuous.

| Stage | Test name | What it proves |
|-------|-----------|---------------|
| install | `test_serving` | Generic HTTP readiness check actually ran |
| install | `test_serving_and_frontend` | Lasuite-docs frontend (SPA shell) actually loaded |
| custom | `test_content_type` | Content-type assertion actually ran (bad canary only) |

If a tier assertion is removed: the named test disappears from `results.json` → the semantic
check fires → the regression suite catches the removal.

### Additional structural assertions (good canaries)

- `install` tier: "pass" (not fail, not skip)
- No tier is "fail" (skips acceptable for recipes without backup/custom tests)
- `flags.clean_teardown = True` (no leftover containers/volumes/secrets)
- `flags.no_secret_leak = True` (no secret value in the results artifact)

---

## Cadence policy

**Do NOT run on every commit or PR.** These are slow and resource-heavy. Run them:

- Before a **release** of the cc-ci server (after a batch of server changes).
- As a **polishing pass** or pre-merge check for significant server refactors.
- On-demand when you suspect a regression: `pytest -m canary`.

They are NOT wired to the per-commit Drone pipeline. If adding a `!testme`-style trigger for the
cc-ci repo, gate it behind a deliberate label (e.g. `run-canaries`), not an automatic run on
every push.

---

## How to add a canary

1. Identify a recipe that is already deployable and has pinned version tags.
2. Decide the expected verdict (GREEN or RED) and which tier assertions have teeth.
3. Add an entry to `CANARIES` in `test_canaries.py`:

```python
{
    "id": "good-myrecipe",
    "recipe": "my-recipe",
    "src": "recipe-maintainers/my-recipe",
    "ref": "<pinned-sha>",  # pin to a specific commit for stability
    "expected_green": True,
    "stage_pass_checks": [
        ("install", "test_serving"),  # verify this named test ran and passed
    ],
    "stage_fail_checks": [],
}
```

4. Run the canary once to confirm it passes:
   `cc-ci-run python -m pytest tests/regression/ -m canary -k good-myrecipe -v`

5. Update the pin comment with the date and the recipe version it was pinned at.
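
Before running a new entry for real, it may help to sanity-check its shape. The sketch below is hypothetical (`canary_entry_problems` is not part of the suite). It assumes the key names used in `test_canaries.py`, and it additionally assumes that `ref` should be a full 40-character SHA, as the three existing pins are.

```python
from __future__ import annotations

import re

REQUIRED_KEYS = {
    "id", "recipe", "src", "ref", "expected_green", "stage_pass_checks", "stage_fail_checks",
}
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def canary_entry_problems(entry: dict) -> list[str]:
    """Return problems found in a CANARIES entry (an empty list means it looks well-formed)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(entry))
    if missing:
        problems.append(f"missing keys: {missing}")
        return problems
    if not FULL_SHA.fullmatch(entry["ref"]):
        problems.append("ref is not a full 40-character commit SHA")
    # A canary with no named-test check has no teeth: only the exit code would be verified.
    checks = entry["stage_pass_checks"] if entry["expected_green"] else entry["stage_fail_checks"]
    if not checks:
        problems.append("no named-test check for the expected verdict")
    return problems


template = {
    "id": "good-myrecipe",
    "recipe": "my-recipe",
    "src": "recipe-maintainers/my-recipe",
    "ref": "<pinned-sha>",  # placeholder from the template, deliberately not a real SHA
    "expected_green": True,
    "stage_pass_checks": [("install", "test_serving")],
    "stage_fail_checks": [],
}
template_problems = canary_entry_problems(template)
```

Run against the unedited template, the check flags only the placeholder `ref`, which is a cheap reminder to fill in the pin before the slow lifecycle run.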

---

## Pin maintenance

Canary refs are pinned to specific SHAs for stability. When a recipe publishes a new release:

1. Update the `"ref"` SHA in the canary definition (use the new main-branch HEAD).
2. Update the pin comment with the new date/version.
3. Re-run the canary to confirm GREEN before committing the pin update.

The bad canary (`v5-stale-docroot`) is a stable fixture branch; update only if the branch is
deleted. If deleted, recreate the pattern: an app that is up + passes lifecycle tiers but fails
one functional assertion.
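
For step 1 of the list above, one way to look up the new main-branch HEAD is `git ls-remote`, which prints one `<sha><TAB><ref>` line per ref. The sketch below only parses that output. `head_sha` is a hypothetical helper, and the sample text reuses the `custom-html-tiny` pin from this suite rather than captured command output.

```python
from __future__ import annotations


def head_sha(ls_remote_output: str, ref: str = "refs/heads/main") -> str | None:
    """Return the SHA listed for `ref` in `git ls-remote` output, or None if it is absent."""
    for line in ls_remote_output.splitlines():
        sha, _, name = line.partition("\t")
        if name.strip() == ref:
            return sha.strip()
    return None


# Sample text in ls-remote format (not captured output); the SHA is the current good-simple pin.
sample = "435df8fc98ef7598084fcffcd6225470eca80053\trefs/heads/main\n"
new_pin = head_sha(sample)
```

The input would come from running `git ls-remote <repo-url> refs/heads/main` on a machine that can reach the Gitea instance; the repository URL is left as a placeholder here.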
102
tests/regression/conftest.py
Normal file
@ -0,0 +1,102 @@
"""Shared fixtures and helpers for E2E canary regression tests.

The regression tests call the real cc-ci harness (run_recipe_ci.py) as a subprocess and assert on
its outputs (exit code, results.json). They run ON the cc-ci server, not the orchestrator: abra,
Docker, and Swarm must be present.

Invoke: cc-ci-run python -m pytest tests/regression/ -m canary -v
"""

from __future__ import annotations

import json
import os
import subprocess
import sys
import time

ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "canary: slow E2E canary test, drives the full cold CI lifecycle; run on-demand only.",
    )


def run_recipe_ci(
    recipe: str,
    src: str,
    ref: str,
    pr: str = "0",
    stages: str = "install,upgrade,backup,restore,custom",
    runs_dir: str | None = None,
    run_id_prefix: str = "regression",
    timeout: int = 3600,
) -> tuple[int, dict | None, str]:
    """Invoke run_recipe_ci.py with the given canary params.

    Returns (rc, results_dict_or_None, run_artifact_dir).
    Stdout/stderr stream live so a human can follow progress.
    """
    ts = int(time.time())
    run_id = f"{run_id_prefix}-{recipe}-{ref[:12]}-{ts}"
    if runs_dir is None:
        runs_dir = "/var/lib/cc-ci-runs"

    env = dict(os.environ)
    env.update(
        {
            "RECIPE": recipe,
            "REF": ref,
            "SRC": src,
            "PR": pr,
            "STAGES": stages,
            "CCCI_RUN_ID": run_id,
            "CCCI_RUNS_DIR": runs_dir,
            "HOME": "/root",
        }
    )
    # Keep PLAYWRIGHT env from the outer cc-ci-run wrapper (already in os.environ if running under it)

    script = os.path.join(ROOT, "runner", "run_recipe_ci.py")
    result = subprocess.run(
        [sys.executable, script],
        env=env,
        timeout=timeout,
    )
    rc = result.returncode

    artifact_dir = os.path.join(runs_dir, run_id)
    results_path = os.path.join(artifact_dir, "results.json")
    results_data: dict | None = None
    if os.path.exists(results_path):
        with open(results_path) as f:
            results_data = json.load(f)

    return rc, results_data, artifact_dir


def find_stage_tests(results: dict, stage_name: str) -> list[dict]:
    """Return the per-test list for a named stage from results.json, or []."""
    for stage in results.get("stages", []):
        if stage.get("name") == stage_name:
            return stage.get("tests", [])
    return []


def stage_has_passing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
    """True if the named stage contains a passing test whose name includes test_name_substr."""
    for t in find_stage_tests(results, stage_name):
        if test_name_substr in t.get("name", "") and t.get("status") == "pass":
            return True
    return False


def stage_has_failing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
    """True if the named stage contains a failing test whose name includes test_name_substr."""
    for t in find_stage_tests(results, stage_name):
        if test_name_substr in t.get("name", "") and t.get("status") in ("fail", "error"):
            return True
    return False
181
tests/regression/test_canaries.py
Normal file
@ -0,0 +1,181 @@
"""E2E canary regression tests: the server's standing self-test suite.

Three canaries prove both halves of the server's job:
1. GREEN canaries: good apps are reported healthy (install+upgrade+backup/restore pass).
2. RED canary: broken apps are caught; a false-green makes THIS test fail.

Run: cc-ci-run python -m pytest tests/regression/ -m canary -v
Slow: each canary drives the full cold lifecycle on the live server (minutes per run).

Pin policy: canary refs are pinned to specific SHAs for stability. Update them when the recipe
publishes a new release and the pin is stale (re-run to confirm GREEN before updating).
"""

from __future__ import annotations

import pytest

# NOTE: this relative import assumes tests/regression/ is importable as a package
# (i.e. an __init__.py exists alongside this file).
from .conftest import run_recipe_ci, stage_has_failing_test, stage_has_passing_test

# ---------------------------------------------------------------------------
# Canary definitions
# ---------------------------------------------------------------------------

# Good canary 1: minimal static-file server, fast signal, few deps.
_SIMPLE = {
    "id": "good-simple",
    "recipe": "custom-html-tiny",
    "src": "recipe-maintainers/custom-html-tiny",
    # Pin: main @ 2026-06-02; update if the recipe publishes a new release and pin goes stale.
    "ref": "435df8fc98ef7598084fcffcd6225470eca80053",
    "expected_green": True,
    # Named tests that MUST appear with "pass" in the result. These are the semantic teeth.
    # If the generic install assertion is removed/vacated, test_serving disappears → this fails.
    "stage_pass_checks": [
        ("install", "test_serving"),
    ],
    "stage_fail_checks": [],
}

# Good canary 2: multi-service stack (backend + Postgres + Collabora WOPI + OIDC).
# Exercises real breadth. Slowest canary (~10-20 min full lifecycle).
_SIGNIFICANT = {
    "id": "good-significant",
    "recipe": "lasuite-docs",
    "src": "recipe-maintainers/lasuite-docs",
    # Pin: main @ 2026-06-02
    "ref": "290a8ad72d06232f0b3f302d976af14bef0f3c53",
    "expected_green": True,
    "stage_pass_checks": [
        ("install", "test_serving_and_frontend"),
    ],
    "stage_fail_checks": [],
}

# Bad canary: app is UP + passes all lifecycle tiers but the custom functional assertion detects a
# semantic defect (wrong Content-Type for .txt files). The harness MUST report RED.
# If the harness wrongly returns green for this fixture, assert rc != 0 fails → false-green caught.
_BAD = {
    "id": "bad-false-green",
    "recipe": "custom-html",
    "src": "recipe-maintainers/custom-html",
    # Pin: v5-stale-docroot @ 71e7326; serves .txt as application/octet-stream; build #75 was RED.
    # Recreate pattern if branch disappears: app up + passes lifecycle, fails one content assertion.
    "ref": "71e7326a99bbb69035a046fba8fa51859ca66115",
    "expected_green": False,
    # The specific test that must have FAILED, proving the content-type assertion has teeth.
    # If the assertion is vacated and the test disappears, stage_has_failing_test() returns False
    # → the assert below fails → we detect that the guard was removed.
    "stage_pass_checks": [],
    "stage_fail_checks": [
        ("custom", "test_content_type"),
    ],
}

CANARIES = [_SIMPLE, _SIGNIFICANT, _BAD]


# ---------------------------------------------------------------------------
# Test
# ---------------------------------------------------------------------------


@pytest.mark.canary
@pytest.mark.parametrize("canary", CANARIES, ids=[c["id"] for c in CANARIES])
def test_canary(canary, tmp_path):
    """Drive the full cold CI lifecycle for a canary recipe and verify the outcome.

    For GREEN canaries: proves the harness correctly reports a healthy app as healthy, and that
    the per-tier semantic assertions actually ran (not vacuous).

    For the RED canary: proves the harness catches a broken app: if the harness wrongly returned
    green, `assert rc != 0` fails, catching the false-green.
    """
    rc, results, artifact_dir = run_recipe_ci(
        recipe=canary["recipe"],
        src=canary["src"],
        ref=canary["ref"],
        runs_dir=str(tmp_path),
    )

    _note = f"artifact_dir={artifact_dir}"  # visible in -v output via assert messages

    if canary["expected_green"]:
        _assert_green(rc, results, canary, _note)
    else:
        _assert_red(rc, results, canary, _note)


def _assert_green(rc: int, results: dict | None, canary: dict, note: str) -> None:
    """Assert a good-canary run is GREEN with real semantic assertions."""

    # 1. Harness exit code must be 0 (GREEN).
    assert rc == 0, f"[{canary['id']}] harness returned non-zero rc={rc}; expected GREEN. {note}"

    assert (
        results is not None
    ), f"[{canary['id']}] results.json not written; harness may have crashed. {note}"

    # 2. Install tier must have passed.
    assert results.get("results", {}).get("install") == "pass", (
        f"[{canary['id']}] install tier did not pass: results={results.get('results')}. {note}"
    )

    # 3. No tier may have FAILED (skips are acceptable for recipes without backup or custom tests).
    failed_tiers = [t for t, s in results.get("results", {}).items() if s == "fail"]
    assert not failed_tiers, f"[{canary['id']}] tiers failed: {failed_tiers}. {note}"

    # 4. Teardown must be clean (no leftover containers/volumes/secrets).
    assert (
        results.get("flags", {}).get("clean_teardown") is True
    ), f"[{canary['id']}] clean_teardown=False; residual state left on server. {note}"

    # 5. No secret values leaked into the results artifact.
    assert (
        results.get("flags", {}).get("no_secret_leak") is True
    ), f"[{canary['id']}] no_secret_leak=False; a secret value appeared in results.json. {note}"

    # 6. Semantic stage assertions: TEETH CHECK.
    # These verify that specific named tests actually ran and passed in the expected stage.
    # If a tier assertion is removed or made vacuous, the named test disappears from results.json
    # and this assert fires, proving the regression suite guards against silent test removal.
    for stage_name, test_name_substr in canary.get("stage_pass_checks", []):
        assert stage_has_passing_test(results, stage_name, test_name_substr), (
            f"[{canary['id']}] expected a passing test containing {test_name_substr!r} in "
            f"stage={stage_name!r}, but none found. "
            f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
        )


def _assert_red(rc: int, results: dict | None, canary: dict, note: str) -> None:
    """Assert a bad-canary run is RED (false-green guard).

    The PRIMARY assertion is rc != 0. If the harness wrongly returns 0 (green) for this fixture,
    this assert fails → the regression suite catches the false-green. This is the core guard.
    """

    # PRIMARY: harness must return non-zero (RED).
    # If the harness returns 0 for a broken app, the regression suite fails here; false-green caught.
    assert rc != 0, (
        f"[{canary['id']}] harness returned rc=0 (GREEN) for a KNOWN-BAD fixture: "
        f"FALSE-GREEN detected. The harness failed to catch the broken app. {note}"
    )

    # SECONDARY: verify the specific failing test is present in results.json.
    # A missing results.json means the harness crashed rather than judged the app, so the teeth
    # check cannot run; treat that as a failure instead of silently skipping it.
    assert (
        results is not None
    ), f"[{canary['id']}] results.json not written; cannot verify the failing test. {note}"

    # If the content-type assertion is removed or made vacuous, stage_has_failing_test() returns
    # False here → this assert fires → we detect that the guard itself was removed (a meta-failure).
    for stage_name, test_name_substr in canary.get("stage_fail_checks", []):
        assert stage_has_failing_test(results, stage_name, test_name_substr), (
            f"[{canary['id']}] expected a failing test containing {test_name_substr!r} in "
            f"stage={stage_name!r}, but none found. "
            f"The guard may have been removed or made vacuous. "
            f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
        )


def _stage_tests(results: dict, stage_name: str) -> list[dict]:
    """Return the per-test list for a named stage (used only to build assert messages)."""
    for stage in results.get("stages", []):
        if stage.get("name") == stage_name:
            return stage.get("tests", [])
    return []