feat(regression): add tests/regression/ E2E canary suite
Some checks failed
continuous-integration/drone/push Build is failing

Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle:
- good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN
- good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN
- bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED

Semantic teeth: beyond exit-code, each test asserts that specific named tests
ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type).
If an assertion is removed, the named test disappears → regression test fails.

Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test),
README (cadence policy, how to run, how to add), and phase state files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
autonomic-bot
2026-06-02 01:25:55 +00:00
parent 91a7088f56
commit fd3db37c49
6 changed files with 566 additions and 1 deletions


@@ -2,7 +2,12 @@
## Build backlog
*(Builder-owned section — read-only for Adversary)*
- [x] Create `tests/regression/` suite (conftest + test_canaries + README)
- [ ] Run `good-simple` canary (custom-html-tiny main) → confirm GREEN + test_serving passes
- [ ] Run `bad-false-green` canary (custom-html v5-stale-docroot) → confirm RED + test_content_type fails
- [ ] Run `good-significant` canary (lasuite-docs main) → confirm GREEN + test_serving_and_frontend passes
- [ ] Open PR for operator review (DoD item 5: NOT merged)
- [ ] Claim gate once all canary runs are GREEN/RED as expected + PR is open
## Adversary findings


@@ -0,0 +1,56 @@
# JOURNAL — server regression canaries phase (Builder)
**Phase:** server regression canaries
**Started:** 2026-06-02
---
## Step 0 — phase kickoff and design (2026-06-02)
**Context:** Mirror phase (plan-mirror-enroll-all-recipes.md) completed DONE at 2026-06-02T01:16Z.
Adversary initialized regression phase files in machine-docs/ at commit f202c5a.
**Decision: run regression tests ON cc-ci, not from the orchestrator**
The regression tests call `run_recipe_ci.py` which uses abra/docker/swarm — these only exist on
cc-ci. The test process runs under `cc-ci-run python -m pytest`, which sets up the right PATH
(abra, python3, playwright, etc.). The test then invokes `run_recipe_ci.py` as a subprocess using
`sys.executable` (inherits the same python3 from cc-ci-run).
The README.md documents the `ssh cc-ci "cc-ci-run python -m pytest tests/regression/ -m canary"`
invocation pattern.
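A minimal sketch of that invocation pattern, using only the standard library: parameters travel through environment variables and the child runs under the same interpreter via `sys.executable`. The inline `-c` child and the `run_child` helper are hypothetical stand-ins for illustration; the real suite launches `run_recipe_ci.py`.

```python
from __future__ import annotations

import os
import subprocess
import sys

# Stand-in child process (assumption): echoes the parameters it received via env.
CHILD = "import os; print(os.environ['RECIPE'], os.environ['REF'])"


def run_child(recipe: str, ref: str) -> tuple[int, str]:
    """Run the stand-in child with canary parameters passed via the environment."""
    env = dict(os.environ)  # inherit PATH etc. from the outer wrapper
    env.update({"RECIPE": recipe, "REF": ref})
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],  # same interpreter as the parent process
        env=env,
        capture_output=True,
        text=True,
        timeout=60,
    )
    return proc.returncode, proc.stdout.strip()


rc, out = run_child("custom-html-tiny", "435df8fc")
```

Copying `os.environ` before updating it is what lets the child keep whatever the wrapper already set up.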
**Canary selection:**
| ID | Recipe | SHA | Rationale |
|----|--------|-----|-----------|
| good-simple | custom-html-tiny | 435df8fc (main) | Fast, few deps, quick signal |
| good-significant | lasuite-docs | 290a8ad7 (main) | Multi-service, exercises real breadth |
| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | Already produced RED build #75; pinned fixture |
SHAs confirmed from Gitea API on 2026-06-02.
**Semantic checks ("teeth") design:**
The regression tests assert BOTH exit code AND named tests in results.json stages. This guards
against two failure modes:
1. Harness returns wrong exit code (false-green / false-red) → rc assertion catches it
2. A specific assertion is silently removed or made vacuous → named test disappears from stages → semantic check catches it
For custom-html-tiny: `test_serving` (generic install) must appear passing
For lasuite-docs: `test_serving_and_frontend` (install overlay) must appear passing
For bad canary: `test_content_type` (custom functional) must appear failing
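The two failure modes can be sketched against synthetic data. The `results` shape used here (a `stages` list whose entries carry `name` and `tests`, each test with `name` and `status`) is inferred from the conftest helpers in this commit; `has_test` and `guards_fired` are hypothetical names used only for illustration.

```python
from __future__ import annotations


def has_test(results: dict, stage: str, name_substr: str, statuses: tuple[str, ...]) -> bool:
    """True if `stage` holds a test whose name contains name_substr with one of `statuses`."""
    for s in results.get("stages", []):
        if s.get("name") == stage:
            return any(
                name_substr in t.get("name", "") and t.get("status") in statuses
                for t in s.get("tests", [])
            )
    return False


def guards_fired(rc: int, results: dict, stage: str, name_substr: str) -> list[str]:
    """List which guard trips for a canary that is expected to be GREEN."""
    fired = []
    if rc != 0:  # failure mode 1: wrong exit code
        fired.append("rc")
    if not has_test(results, stage, name_substr, ("pass",)):  # failure mode 2: test gone
        fired.append("semantic")
    return fired


healthy = {"stages": [{"name": "install", "tests": [{"name": "test_serving", "status": "pass"}]}]}
hollowed = {"stages": [{"name": "install", "tests": []}]}  # assertion silently removed
```

With these inputs, a wrong exit code on `healthy` trips only the `rc` guard, while `hollowed` with `rc == 0` trips only the semantic guard, which is why both checks are needed.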
**File layout:**
- `tests/regression/conftest.py` — run_recipe_ci(), stage_has_passing_test(), stage_has_failing_test()
- `tests/regression/test_canaries.py` — parametrized @pytest.mark.canary test
- `tests/regression/README.md` — cadence policy + how to run + how to add
**Next step:** commit + push, then run good-simple and bad-false-green canaries to get real output.
lasuite-docs is slow (10-20 min) so will run it last.
---
## Step 1 — initial canary runs (in progress, 2026-06-02)
Committed suite, will record canary outputs here as they complete.


@@ -0,0 +1,85 @@
# STATUS — server regression canaries phase
**Phase:** server regression canaries (codified E2E self-tests)
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md`
**Builder loop started:** 2026-06-02
**Repo:** git.autonomic.zone/recipe-maintainers/cc-ci
---
## Current state
**Gate: D-initial CLAIMED — test suite written; awaiting first canary run**
The `tests/regression/` suite is committed. Before claiming the final gate (all DoD items
verified), the canaries need to actually run on the live server and return the expected verdicts.
Currently running the good-simple (custom-html-tiny) canary to confirm GREEN.
---
## What was built
`tests/regression/` committed in the cc-ci repo:
- `conftest.py`: `run_recipe_ci()` helper that invokes the real harness as a subprocess and returns `(rc, results_dict, artifact_dir)`; `stage_has_passing_test()` / `stage_has_failing_test()` helpers for semantic checks
- `test_canaries.py` — parametrized `@pytest.mark.canary` test with three canaries (see below)
- `README.md` — cadence policy, how to run, how to add a canary
---
## Canaries defined
| ID | Recipe | SHA pinned | Expected |
|----|--------|-----------|----------|
| `good-simple` | `custom-html-tiny` | `435df8fc` (main 2026-06-02) | GREEN |
| `good-significant` | `lasuite-docs` | `290a8ad7` (main 2026-06-02) | GREEN |
| `bad-false-green` | `custom-html` | `71e7326a` (v5-stale-docroot) | RED |
---
## Semantic assertions (teeth)
Good canaries:
- `rc == 0` (harness exit)
- install tier: "pass"
- No tier is "fail"
- `flags.clean_teardown == True`
- `flags.no_secret_leak == True`
- Named test `test_serving` present + passing in install stage (custom-html-tiny)
- Named test `test_serving_and_frontend` present + passing in install stage (lasuite-docs)
Bad canary:
- `rc != 0` (PRIMARY: a false-green is caught here)
- Named test `test_content_type` present + FAILING in custom stage (proves the guard was not removed or made vacuous)
---
## How to verify (Adversary commands)
From cc-ci server root (requires the repo checked out at `/root/cc-ci` or similar):
```bash
# Good simple (fast ~2-5 min):
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
# Bad canary (fast ~2-5 min, same recipe lifecycle):
cc-ci-run python -m pytest tests/regression/ -m canary -k bad-false-green -v
# Full suite (slow — lasuite-docs is 10-20 min):
cc-ci-run python -m pytest tests/regression/ -m canary -v
```
Expected outcomes:
- `good-simple`: test PASSES (harness returns GREEN, test_serving passes)
- `bad-false-green`: test PASSES (harness returns RED, test_content_type fails in custom stage)
- `good-significant`: test PASSES (harness returns GREEN, test_serving_and_frontend passes)
Verify teeth: tamper with an outcome to confirm the regression test fails:
- For good canary: remove `test_serving` from the recipe's install tests → `stage_has_passing_test` returns False → test fails
- For bad canary: temporarily flip the assert to `rc == 0` → the test fails because the harness returns non-zero, confirming the rc guard is live
---
## In-flight
- Canary `good-simple` running on cc-ci (started ~now)
- Status will be updated once run completes and we have actual output to paste

tests/regression/README.md Normal file

@@ -0,0 +1,136 @@
# Regression canaries — E2E self-tests for the cc-ci server
A standing pytest suite that drives the **real** cc-ci lifecycle harness against pinned canary
recipes and verifies both halves of the server's job:
1. **Good canaries** — healthy apps are reported GREEN (install + upgrade + backup/restore pass).
2. **Bad canary** — broken apps are caught RED; a false-green makes the regression test itself fail.
These tests run the full cold lifecycle on the live cc-ci server. They are **slow** (minutes per
canary) and **opt-in**: the `canary` marker lets the per-commit fast path deselect them.
---
## How to run
Run on the cc-ci server (abra + Docker + Swarm required):
```bash
ssh cc-ci
cd /root/cc-ci # or wherever the repo is checked out
cc-ci-run python -m pytest tests/regression/ -m canary -v
```
Or a single canary:
```bash
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
```
From the orchestrator:
```bash
ssh cc-ci "cd /root/cc-ci && cc-ci-run python -m pytest tests/regression/ -m canary -v"
```
---
## Canaries
| ID | Recipe | Purpose | Expected verdict |
|----|--------|---------|-----------------|
| `good-simple` | `custom-html-tiny` | Minimal static server — fast signal | GREEN |
| `good-significant` | `lasuite-docs` | Multi-service (backend + Postgres + Collabora + OIDC) | GREEN |
| `bad-false-green` | `custom-html` @ `v5-stale-docroot` | App is UP but serves wrong Content-Type — catches false-green | RED |
### Why the bad canary exists
The scariest regression is a **false-green**: the server reports PASS while the app is broken.
We already saw a fabricated full-PASS during the build. The `bad-false-green` canary pins a
known-broken fixture (`v5-stale-docroot`: nginx serves `.txt` as `application/octet-stream`). The
harness's `test_content_type_html_and_txt` catches this and returns RED (build #75 was RED for
exactly this fixture).
The regression test asserts `rc != 0`. If the harness ever wrongly returns green for this fixture,
that assert fires — false-green is caught before any merge.
---
## What each canary verifies
### Per-tier semantic assertions (the "teeth")
The tests assert MORE than the harness exit code: they check that **specific named assertions**
ran and got the expected result. This guards against a different failure mode — a tier that
nominally "passes" because the assertion was silently removed or made vacuous.
| Stage | Test name | What it proves |
|-------|-----------|---------------|
| install | `test_serving` | Generic HTTP readiness check actually ran |
| install | `test_serving_and_frontend` | Lasuite-docs frontend (SPA shell) actually loaded |
| custom | `test_content_type` | Content-type assertion actually ran (bad canary only) |
If a tier assertion is removed: the named test disappears from `results.json` → the semantic
check fires → the regression suite catches the removal.
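One subtlety worth keeping in mind: the conftest helpers match test names by substring, so a check for `test_serving` is also satisfied by `test_serving_and_frontend`. The sketch below contrasts substring and exact matching on synthetic data; `has_passing_exact` is a hypothetical stricter variant, and it assumes `results.json` stores bare test-function names, which this commit does not confirm.

```python
from __future__ import annotations


def passing_names(results: dict, stage: str) -> list[str]:
    """Names of all passing tests in the given stage (empty if the stage is absent)."""
    for s in results.get("stages", []):
        if s.get("name") == stage:
            return [t.get("name", "") for t in s.get("tests", []) if t.get("status") == "pass"]
    return []


def has_passing_substr(results: dict, stage: str, needle: str) -> bool:
    """Substring match, mirroring the behaviour of stage_has_passing_test()."""
    return any(needle in name for name in passing_names(results, stage))


def has_passing_exact(results: dict, stage: str, name: str) -> bool:
    """Hypothetical stricter variant: the full test name must match."""
    return name in passing_names(results, stage)


# Only the longer-named test is present; plain test_serving is not.
results = {
    "stages": [
        {"name": "install", "tests": [{"name": "test_serving_and_frontend", "status": "pass"}]}
    ]
}
```

Substring matching is what allows `test_content_type` to match the longer `test_content_type_html_and_txt`, so the exact variant only fits checks where the full name is known and stable.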
### Additional structural assertions (good canaries)
- `install` tier: "pass" (not fail, not skip)
- No tier is "fail" (skips acceptable for recipes without backup/custom tests)
- `flags.clean_teardown == True` (no leftover containers/volumes/secrets)
- `flags.no_secret_leak == True` (no secret value in the results artifact)
---
## Cadence policy
**Do NOT run on every commit or PR.** These are slow and resource-heavy. Run them:
- Before a **release** of the cc-ci server (after a batch of server changes).
- As a **polishing pass** or pre-merge check for significant server refactors.
- On-demand when you suspect a regression: `pytest -m canary`.
They are NOT wired to the per-commit Drone pipeline. If adding a `!testme`-style trigger for the
cc-ci repo, gate it behind a deliberate label (e.g. `run-canaries`) — not an automatic run on
every push.
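A pytest marker on its own does not deselect anything: a plain `pytest` run still collects marked tests unless the caller passes `-m "not canary"` or skips them in a hook. One way to make the opt-in explicit is sketched below. This is a hypothetical addition to a `conftest.py`, not part of this commit; `canary_requested` is an illustrative helper name.

```python
def canary_requested(markexpr: str) -> bool:
    """True when the -m expression selects canaries rather than excluding them."""
    tokens = markexpr.replace("(", " ").replace(")", " ").split()
    return "canary" in tokens and "not canary" not in " ".join(tokens)


def pytest_collection_modifyitems(config, items):
    """Skip canary-marked tests unless the run explicitly asked for them."""
    import pytest  # imported lazily so the helper above is usable without pytest

    if canary_requested(config.getoption("markexpr", default="") or ""):
        return
    skip = pytest.mark.skip(reason="canary tests are opt-in: pass -m canary")
    for item in items:
        if "canary" in item.keywords:
            item.add_marker(skip)
```

With such a hook in place, `pytest tests/` would report the canaries as skipped, while `pytest -m canary` would run them as before.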
---
## How to add a canary
1. Identify a recipe that is already deployable and has pinned version tags.
2. Decide the expected verdict (GREEN or RED) and which tier assertions have teeth.
3. Add an entry to `CANARIES` in `test_canaries.py`:
```python
{
    "id": "good-myrecipe",
    "recipe": "my-recipe",
    "src": "recipe-maintainers/my-recipe",
    "ref": "<pinned-sha>",  # pin to a specific commit for stability
    "expected_green": True,
    "stage_pass_checks": [
        ("install", "test_serving"),  # verify this named test ran and passed
    ],
    "stage_fail_checks": [],
}
```
4. Run the canary once to confirm it passes:
`cc-ci-run python -m pytest tests/regression/ -m canary -k good-myrecipe -v`
5. Update the pin comment with the date and the recipe version it was pinned at.
---
## Pin maintenance
Canary refs are pinned to specific SHAs for stability. When a recipe publishes a new release:
1. Update the `"ref"` SHA in the canary definition (use the new main-branch HEAD).
2. Update the pin comment with the new date/version.
3. Re-run the canary to confirm GREEN before committing the pin update.
The bad canary (`v5-stale-docroot`) is a stable fixture branch — update only if the branch is
deleted. If deleted, recreate the pattern: an app that is up + passes lifecycle tiers but fails
one functional assertion.
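A small guard can flag refs that are not full commit SHAs, for example a branch name left in by mistake. This is a sketch under stated assumptions: `unpinned` is a hypothetical helper, and the `floating` entry is invented purely to show a failing case.

```python
import re

# A full Git SHA-1 object name is 40 lowercase hex characters.
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned(canaries: list) -> list:
    """Return the ids of canaries whose ref is not a full 40-character SHA."""
    return [c["id"] for c in canaries if not FULL_SHA.fullmatch(c.get("ref", ""))]


canaries = [
    {"id": "good-simple", "ref": "435df8fc98ef7598084fcffcd6225470eca80053"},
    {"id": "floating", "ref": "main"},  # hypothetical unpinned entry
]
```

Here `unpinned(canaries)` reports only the `floating` entry; run against the real `CANARIES` list, an empty result would mean every ref is pinned.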


@@ -0,0 +1,102 @@
"""Shared fixtures and helpers for E2E canary regression tests.
The regression tests call the real cc-ci harness (run_recipe_ci.py) as a subprocess and assert on
its outputs (exit code, results.json). They run ON the cc-ci server, not the orchestrator — abra,
Docker, and Swarm must be present.
Invoke: cc-ci-run python -m pytest tests/regression/ -m canary -v
"""
from __future__ import annotations

import json
import os
import subprocess
import sys
import time

ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "canary: slow E2E canary test — drives the full cold CI lifecycle; run on-demand only.",
    )


def run_recipe_ci(
    recipe: str,
    src: str,
    ref: str,
    pr: str = "0",
    stages: str = "install,upgrade,backup,restore,custom",
    runs_dir: str | None = None,
    run_id_prefix: str = "regression",
    timeout: int = 3600,
) -> tuple[int, dict | None, str]:
    """Invoke run_recipe_ci.py with the given canary params.

    Returns (rc, results_dict_or_None, run_artifact_dir).
    Stdout/stderr stream live so a human can follow progress.
    """
    ts = int(time.time())
    run_id = f"{run_id_prefix}-{recipe}-{ref[:12]}-{ts}"

    if runs_dir is None:
        runs_dir = "/var/lib/cc-ci-runs"

    env = dict(os.environ)
    env.update(
        {
            "RECIPE": recipe,
            "REF": ref,
            "SRC": src,
            "PR": pr,
            "STAGES": stages,
            "CCCI_RUN_ID": run_id,
            "CCCI_RUNS_DIR": runs_dir,
            "HOME": "/root",
        }
    )
    # Keep PLAYWRIGHT env from the outer cc-ci-run wrapper (already in os.environ if running under it)

    script = os.path.join(ROOT, "runner", "run_recipe_ci.py")
    result = subprocess.run(
        [sys.executable, script],
        env=env,
        timeout=timeout,
    )
    rc = result.returncode

    artifact_dir = os.path.join(runs_dir, run_id)
    results_path = os.path.join(artifact_dir, "results.json")
    results_data: dict | None = None
    if os.path.exists(results_path):
        with open(results_path) as f:
            results_data = json.load(f)

    return rc, results_data, artifact_dir


def find_stage_tests(results: dict, stage_name: str) -> list[dict]:
    """Return the per-test list for a named stage from results.json, or []."""
    for stage in results.get("stages", []):
        if stage.get("name") == stage_name:
            return stage.get("tests", [])
    return []


def stage_has_passing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
    """True if the named stage contains a passing test whose name includes test_name_substr."""
    for t in find_stage_tests(results, stage_name):
        if test_name_substr in t.get("name", "") and t.get("status") == "pass":
            return True
    return False


def stage_has_failing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
    """True if the named stage contains a failing test whose name includes test_name_substr."""
    for t in find_stage_tests(results, stage_name):
        if test_name_substr in t.get("name", "") and t.get("status") in ("fail", "error"):
            return True
    return False


@@ -0,0 +1,181 @@
"""E2E canary regression tests — the server's standing self-test suite.
Three canaries prove both halves of the server's job:
1. GREEN canaries — good apps are reported healthy (install+upgrade+backup/restore pass).
2. RED canary — broken apps are caught; a false-green makes THIS test fail.
Run: cc-ci-run python -m pytest tests/regression/ -m canary -v
Slow: each canary drives the full cold lifecycle on the live server (minutes per run).
Pin policy: canary refs are pinned to specific SHAs for stability. Update them when the recipe
publishes a new release and the pin is stale (re-run to confirm GREEN before updating).
"""
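from __future__ import annotations

import pytest

# NOTE: this relative import only resolves if tests/regression/ is importable as a package
# (i.e. an __init__.py sits alongside this file).
from .conftest import run_recipe_ci, stage_has_failing_test, stage_has_passing_test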

# ---------------------------------------------------------------------------
# Canary definitions
# ---------------------------------------------------------------------------

# Good canary 1: minimal static-file server — fast signal, few deps.
_SIMPLE = {
    "id": "good-simple",
    "recipe": "custom-html-tiny",
    "src": "recipe-maintainers/custom-html-tiny",
    # Pin: main @ 2026-06-02 — update if the recipe publishes a new release and pin goes stale.
    "ref": "435df8fc98ef7598084fcffcd6225470eca80053",
    "expected_green": True,
    # Named tests that MUST appear with "pass" in the result — these are the semantic teeth.
    # If the generic install assertion is removed or made vacuous, test_serving disappears
    # → this fails.
    "stage_pass_checks": [
        ("install", "test_serving"),
    ],
    "stage_fail_checks": [],
}

# Good canary 2: multi-service stack — backend + Postgres + Collabora WOPI + OIDC.
# Exercises real breadth. Slowest canary (~10-20 min full lifecycle).
_SIGNIFICANT = {
    "id": "good-significant",
    "recipe": "lasuite-docs",
    "src": "recipe-maintainers/lasuite-docs",
    # Pin: main @ 2026-06-02
    "ref": "290a8ad72d06232f0b3f302d976af14bef0f3c53",
    "expected_green": True,
    "stage_pass_checks": [
        ("install", "test_serving_and_frontend"),
    ],
    "stage_fail_checks": [],
}

# Bad canary: app is UP + passes all lifecycle tiers but the custom functional assertion detects a
# semantic defect (wrong Content-Type for .txt files). The harness MUST report RED.
# If the harness wrongly returns green for this fixture, assert rc != 0 fails → false-green caught.
_BAD = {
    "id": "bad-false-green",
    "recipe": "custom-html",
    "src": "recipe-maintainers/custom-html",
    # Pin: v5-stale-docroot @ 71e7326 — serves .txt as application/octet-stream; build #75 was RED.
    # Recreate pattern if branch disappears: app up + passes lifecycle, fails one content assertion.
    "ref": "71e7326a99bbb69035a046fba8fa51859ca66115",
    "expected_green": False,
    # The specific test that must have FAILED, proving the content-type assertion has teeth.
    # If the assertion is made vacuous and the test disappears, stage_has_failing_test() returns
    # False → the assert below fails → we detect that the guard was removed.
    "stage_pass_checks": [],
    "stage_fail_checks": [
        ("custom", "test_content_type"),
    ],
}

CANARIES = [_SIMPLE, _SIGNIFICANT, _BAD]

# ---------------------------------------------------------------------------
# Test
# ---------------------------------------------------------------------------


@pytest.mark.canary
@pytest.mark.parametrize("canary", CANARIES, ids=[c["id"] for c in CANARIES])
def test_canary(canary, tmp_path):
    """Drive the full cold CI lifecycle for a canary recipe and verify the outcome.

    For GREEN canaries: proves the harness correctly reports a healthy app as healthy, and that
    the per-tier semantic assertions actually ran (not vacuous).

    For the RED canary: proves the harness catches a broken app — if the harness wrongly returned
    green, `assert rc != 0` fails, catching the false-green.
    """
    rc, results, artifact_dir = run_recipe_ci(
        recipe=canary["recipe"],
        src=canary["src"],
        ref=canary["ref"],
        runs_dir=str(tmp_path),
    )

    _note = f"artifact_dir={artifact_dir}"  # visible in -v output via assert messages

    if canary["expected_green"]:
        _assert_green(rc, results, canary, _note)
    else:
        _assert_red(rc, results, canary, _note)


def _assert_green(rc: int, results: dict | None, canary: dict, note: str) -> None:
    """Assert a good-canary run is GREEN with real semantic assertions."""
    # 1. Harness exit code must be 0 (GREEN).
    assert rc == 0, f"[{canary['id']}] harness returned non-zero rc={rc} — expected GREEN. {note}"
    assert (
        results is not None
    ), f"[{canary['id']}] results.json not written — harness may have crashed. {note}"

    # 2. Install tier must have passed.
    assert results.get("results", {}).get("install") == "pass", (
        f"[{canary['id']}] install tier did not pass: results={results.get('results')}. {note}"
    )

    # 3. No tier may have FAILED (skips are acceptable for recipes without backup or custom tests).
    failed_tiers = [t for t, s in results.get("results", {}).items() if s == "fail"]
    assert not failed_tiers, f"[{canary['id']}] tiers failed: {failed_tiers}. {note}"

    # 4. Teardown must be clean (no leftover containers/volumes/secrets).
    assert (
        results.get("flags", {}).get("clean_teardown") is True
    ), f"[{canary['id']}] clean_teardown=False — residual state left on server. {note}"

    # 5. No secret values leaked into the results artifact.
    assert (
        results.get("flags", {}).get("no_secret_leak") is True
    ), f"[{canary['id']}] no_secret_leak=False — a secret value appeared in results.json. {note}"

    # 6. Semantic stage assertions — TEETH CHECK.
    # These verify that specific named tests actually ran and passed in the expected stage.
    # If a tier assertion is removed or made vacuous, the named test disappears from results.json
    # and this assert fires — proving the regression suite guards against silent test removal.
    for stage_name, test_name_substr in canary.get("stage_pass_checks", []):
        assert stage_has_passing_test(results, stage_name, test_name_substr), (
            f"[{canary['id']}] expected a passing test containing {test_name_substr!r} in "
            f"stage={stage_name!r}, but none found. "
            # .get() keeps the failure message from raising KeyError on a nameless entry.
            f"Stage tests: {[t.get('name') for t in _stage_tests(results, stage_name)]}. {note}"
        )


def _assert_red(rc: int, results: dict | None, canary: dict, note: str) -> None:
    """Assert a bad-canary run is RED (false-green guard).

    The PRIMARY assertion is rc != 0. If the harness wrongly returns 0 (green) for this fixture,
    this assert fails → the regression suite catches the false-green. This is the core guard.
    """
    # PRIMARY: harness must return non-zero (RED).
    # If the harness returns 0 for a broken app, the regression suite fails here — false-green caught.
    assert rc != 0, (
        f"[{canary['id']}] harness returned rc=0 (GREEN) for a KNOWN-BAD fixture — "
        f"FALSE-GREEN detected. The harness failed to catch the broken app. {note}"
    )

    # SECONDARY: verify the specific failing test is present in results.json.
    # If the content-type assertion is removed or made vacuous, stage_has_failing_test() returns
    # False here → this assert fires → we detect that the guard itself was removed (a meta-failure).
    if results is not None:
        for stage_name, test_name_substr in canary.get("stage_fail_checks", []):
            assert stage_has_failing_test(results, stage_name, test_name_substr), (
                f"[{canary['id']}] expected a failing test containing {test_name_substr!r} in "
                f"stage={stage_name!r}, but none found. "
                f"The guard may have been removed or made vacuous. "
                # .get() keeps the failure message from raising KeyError on a nameless entry.
                f"Stage tests: {[t.get('name') for t in _stage_tests(results, stage_name)]}. {note}"
            )


def _stage_tests(results: dict, stage_name: str) -> list[dict]:
    """Return the per-test list for a named stage, or [] (used in assertion messages)."""
    for stage in results.get("stages", []):
        if stage.get("name") == stage_name:
            return stage.get("tests", [])
    return []