Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle:
- good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN
- good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN
- bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED
Semantic teeth: beyond exit-code, each test asserts that specific named tests
ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type).
If an assertion is removed, the named test disappears → regression test fails.
Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test),
README (cadence policy, how to run, how to add), and phase state files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Regression canaries — E2E self-tests for the cc-ci server
A standing pytest suite that drives the real cc-ci lifecycle harness against pinned canary recipes and verifies both halves of the server's job:
- Good canaries — healthy apps are reported GREEN (install + upgrade + backup/restore pass).
- Bad canary — broken apps are caught RED; a false-green makes the regression test itself fail.
These tests run the full cold lifecycle on the live cc-ci server. They are slow (minutes per
canary) and opt-in — kept out of the per-commit fast path by the canary marker.
How to run
Run on the cc-ci server (abra + Docker + Swarm required):
ssh cc-ci
cd /root/cc-ci # or wherever the repo is checked out
cc-ci-run python -m pytest tests/regression/ -m canary -v
Or a single canary:
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
From the orchestrator:
ssh cc-ci "cd /root/cc-ci && cc-ci-run python -m pytest tests/regression/ -m canary -v"
Canaries
| ID | Recipe | Purpose | Expected verdict |
|---|---|---|---|
| good-simple | custom-html-tiny | Minimal static server (fast signal) | GREEN |
| good-significant | lasuite-docs | Multi-service (backend + Postgres + Collabora + OIDC) | GREEN |
| bad-false-green | custom-html @ v5-stale-docroot | App is UP but serves the wrong Content-Type; catches a false-green | RED |
Why the bad canary exists
The scariest regression is a false-green: the server reports PASS while the app is broken.
We already saw a fabricated full-PASS during the build. The bad-false-green canary pins a
known-broken fixture (v5-stale-docroot: nginx serves .txt as application/octet-stream). The
harness's test_content_type_html_and_txt catches this and returns RED (build #75 was RED for
exactly this fixture).
The regression test asserts rc != 0. If the harness ever wrongly returns green for this fixture,
that assert fires — false-green is caught before any merge.
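The verdict rule above can be sketched as a small check. This is an illustration, not the suite's actual code: `check_verdict` is a hypothetical name, and treating `rc == 0` as GREEN for the good canaries is an assumption. Only the `rc != 0` rule for the bad canary comes from the text above.

```python
def check_verdict(canary_id: str, rc: int, expected_green: bool) -> None:
    """Raise AssertionError when the harness exit code contradicts the expected verdict."""
    if expected_green:
        # Assumption: a GREEN run is reported as exit code 0.
        assert rc == 0, f"{canary_id}: expected GREEN but harness exited with rc={rc}"
    else:
        # A zero exit code here is exactly the false-green this canary exists to catch.
        assert rc != 0, f"{canary_id}: FALSE GREEN, harness exited 0 for a known-broken fixture"
```

For example, `check_verdict("bad-false-green", 0, expected_green=False)` raises, while the same call with `rc=1` returns silently.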
What each canary verifies
Per-tier semantic assertions (the "teeth")
The tests assert MORE than the harness exit code: they check that specific named assertions ran and got the expected result. This guards against a different failure mode — a tier that nominally "passes" because the assertion was silently removed or made vacuous.
| Stage | Test name | What it proves |
|---|---|---|
| install | test_serving | Generic HTTP readiness check actually ran |
| install | test_serving_and_frontend | Lasuite-docs frontend (SPA shell) actually loaded |
| custom | test_content_type | Content-type assertion actually ran (bad canary only) |
If a tier assertion is removed: the named test disappears from results.json → the semantic
check fires → the regression suite catches the removal.
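A semantic check of this kind might look like the sketch below. The helper names follow the `stage_has_{passing,failing}_test` pattern, but the bodies are hypothetical. The layout of results.json (`results["stages"][stage]["tests"]` holding `name`/`outcome` entries) and the outcome strings `"passed"`/`"failed"` are assumptions, so adapt them to the real schema. Substring matching on the test name is also an assumption, chosen so that `test_content_type` matches a longer name such as `test_content_type_html_and_txt`.

```python
from typing import Any


def _stage_tests(results: dict[str, Any], stage: str) -> list[dict[str, Any]]:
    # ASSUMED layout: results["stages"][stage]["tests"] is a list of
    # {"name": ..., "outcome": ...} dicts. The real results.json may differ.
    return results.get("stages", {}).get(stage, {}).get("tests", [])


def _has_test(results: dict[str, Any], stage: str, name: str, outcome: str) -> bool:
    # True when some test in the stage contains `name` and has the given outcome.
    return any(
        name in test.get("name", "") and test.get("outcome") == outcome
        for test in _stage_tests(results, stage)
    )


def stage_has_passing_test(results: dict[str, Any], stage: str, name: str) -> bool:
    return _has_test(results, stage, name, "passed")


def stage_has_failing_test(results: dict[str, Any], stage: str, name: str) -> bool:
    return _has_test(results, stage, name, "failed")
```

If the named assertion is deleted from a recipe's tests, no entry matches and both helpers return False, so a canary that requires it fails even when the harness exit code still looks right.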
Additional structural assertions (good canaries)
- install tier: "pass" (not fail, not skip)
- No tier is "fail" (skips acceptable for recipes without backup/custom tests)
- flags.clean_teardown = True (no leftover containers/volumes/secrets)
- flags.no_secret_leak = True (no secret value in the results artifact)
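The structural checks for good canaries could be collected into one function, as in this sketch. `structural_problems` is a hypothetical name, and the placement of tier verdicts under `results["tiers"]` is an assumption. The flag names and the "pass"/"fail"/"skip" values are taken from the list above.

```python
def structural_problems(results: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the run is clean."""
    problems = []

    # ASSUMED layout: results["tiers"] maps tier name -> "pass" | "fail" | "skip".
    tiers = results.get("tiers", {})
    if tiers.get("install") != "pass":
        problems.append(f"install tier is {tiers.get('install')!r}, expected 'pass'")

    # Skips are tolerated; any explicit failure is not.
    failed = sorted(name for name, verdict in tiers.items() if verdict == "fail")
    if failed:
        problems.append(f"failed tiers: {', '.join(failed)}")

    # A missing flag counts as a problem, the same as an explicit False.
    flags = results.get("flags", {})
    for flag in ("clean_teardown", "no_secret_leak"):
        if flags.get(flag) is not True:
            problems.append(f"flags.{flag} is not True")

    return problems
```

Returning a list rather than asserting on the first problem means a single slow canary run reports every structural issue at once.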
Cadence policy
Do NOT run on every commit or PR. These are slow and resource-heavy. Run them:
- Before a release of the cc-ci server (after a batch of server changes).
- As a polishing pass or pre-merge check for significant server refactors.
- On-demand when you suspect a regression: pytest -m canary.
They are NOT wired to the per-commit Drone pipeline. If adding a !testme-style trigger for the
cc-ci repo, gate it behind a deliberate label (e.g. run-canaries) — not an automatic run on
every push.
How to add a canary
- Identify a recipe that is already deployable and has pinned version tags.
- Decide the expected verdict (GREEN or RED) and which tier assertions have teeth.
- Add an entry to CANARIES in test_canaries.py:
{
"id": "good-myrecipe",
"recipe": "my-recipe",
"src": "recipe-maintainers/my-recipe",
"ref": "<pinned-sha>", # pin to a specific commit for stability
"expected_green": True,
"stage_pass_checks": [
("install", "test_serving"), # verify this named test ran and passed
],
"stage_fail_checks": [],
}
- Run the canary once to confirm it passes:
  cc-ci-run python -m pytest tests/regression/ -m canary -k good-myrecipe -v
- Update the pin comment with the date and the recipe version it was pinned at.
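Before spending minutes on a cold run, a new entry can be checked for obvious mistakes. The sketch below is one hypothetical way to do that and is not part of the suite: `validate_canary` and `REQUIRED_KEYS` are illustrative names, the key set is copied from the example entry above, and the rule that an entry needs at least one stage check is an assumption based on the "teeth" requirement.

```python
REQUIRED_KEYS = {
    "id", "recipe", "src", "ref",
    "expected_green", "stage_pass_checks", "stage_fail_checks",
}


def validate_canary(entry: dict) -> list[str]:
    """Return a list of problems with a canary entry; empty means it looks well-formed."""
    missing = sorted(REQUIRED_KEYS - set(entry))
    if missing:
        return [f"missing keys: {', '.join(missing)}"]

    errors = []
    if not isinstance(entry["expected_green"], bool):
        errors.append("expected_green must be True or False")

    checks = list(entry["stage_pass_checks"]) + list(entry["stage_fail_checks"])
    if not checks:
        # Assumption: a canary without named checks relies on the exit code alone.
        errors.append("no stage checks: the canary has no semantic teeth")
    for check in checks:
        # Each check is expected to be a (stage, test_name) pair.
        if not (isinstance(check, (tuple, list)) and len(check) == 2):
            errors.append(f"malformed check: {check!r}")
    return errors
```

Calling `validate_canary` on the example entry above returns an empty list; dropping `"ref"` returns `["missing keys: ref"]`.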
Pin maintenance
Canary refs are pinned to specific SHAs for stability. When a recipe publishes a new release:
- Update the "ref" SHA in the canary definition (use the new main-branch HEAD).
- Update the pin comment with the new date/version.
- Re-run the canary to confirm GREEN before committing the pin update.
The bad canary (v5-stale-docroot) is a stable fixture branch — update only if the branch is
deleted. If deleted, recreate the pattern: an app that is up + passes lifecycle tiers but fails
one functional assertion.
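The "pinned to a SHA" rule can be guarded with a small check so that a branch name does not slip into a "ref" field during a pin update. This is a sketch under stated assumptions: `looks_like_sha` and `unpinned` are hypothetical helpers, and the accepted length of 7 to 40 lowercase hex characters is an assumed range that covers abbreviated SHAs such as 435df8fc.

```python
import re

# Assumed shape of a pinned ref: 7 to 40 lowercase hexadecimal characters.
_SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def looks_like_sha(ref: str) -> bool:
    """True when the whole ref is hex of plausible SHA length (not a branch or tag name)."""
    return _SHA_RE.fullmatch(ref) is not None


def unpinned(canaries: list[dict]) -> list[str]:
    """Return the ids of canary entries whose "ref" does not look like a commit SHA."""
    return [c["id"] for c in canaries if not looks_like_sha(c["ref"])]
```

The check is purely syntactic: it cannot tell whether the commit exists, and a branch named with hex characters only would pass it, so re-running the canary after a pin update is still required.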