plan(regression): add per-tier RED canaries (install/upgrade/backup/restore)
One deliberately-broken custom-html-tiny fixture per lifecycle tier, so the suite proves the server reports RED at EVERY tier (not just one). Each asserts RED at the intended tier with prior tiers PASS, so it demonstrates 'catches a failure at this tier', not 'fails somewhere'. Fast (simplest recipe): these form the fast subset of the suite, in contrast to the slow good canaries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -26,14 +26,36 @@ The suite proves the server can do **both** halves of its job — and the second
 |---|---|---|---|
 | **Simple (good)** | `custom-html-tiny` | Minimal, fast, few deps — quick signal | GREEN |
 | **Significant (good)** | `lasuite-docs` | Multi-service: backend + Postgres + Collabora WOPI + keycloak OIDC — exercises real breadth | GREEN |
-| **Known-BAD (false-green guard)** | a seeded fixture (see below) | App comes up healthy but a semantic tier assertion is violated | **RED** |
+| **Known-BAD: custom-assertion** | a seeded fixture (see below) | App comes up healthy but a functional/custom assertion is violated | **RED** |
+| **Known-BAD: per-tier ×4** | `custom-html-tiny` broken at one tier each (see below) | install / upgrade / backup / restore each fail in turn | **RED** at the intended tier |
 
-**Known-bad fixture:** reuse/recreate the phase-5 seeded case — `custom-html` branch `v5-stale-docroot`
-(serves `.txt` as `application/octet-stream` while the app is externally healthy), which already
-produced a RED build (#75) with only the content-type custom assertion failing. The regression test
-asserts the harness returns **RED** for this fixture. (If that branch is gone, recreate the pattern:
-an app that is up + passes lifecycle tiers but fails one functional assertion.) Pin the fixture by
-commit SHA so it's stable.
+**Known-bad fixture (custom-assertion):** reuse/recreate the phase-5 seeded case — `custom-html` branch
+`v5-stale-docroot` (serves `.txt` as `application/octet-stream` while the app is externally healthy),
+which already produced a RED build (#75) with only the content-type custom assertion failing. The
+regression test asserts the harness returns **RED** for this fixture. (If that branch is gone, recreate
+the pattern: an app that is up + passes lifecycle tiers but fails one functional assertion.) Pin by SHA.
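
As a rough illustration of the kind of functional check this fixture is meant to trip, the sketch below compares the media type of a `Content-Type` header value against an expected one. The function names (`media_type`, `txt_content_type_ok`) and the expectation that `.txt` should be served as `text/plain` are assumptions made for this example; the harness's actual custom-assertion code is not shown in this plan.

```python
def media_type(content_type_header: str) -> str:
    """Return the bare, lowercased media type, dropping parameters such as charset."""
    return content_type_header.split(";", 1)[0].strip().lower()


def txt_content_type_ok(content_type_header: str, expected: str = "text/plain") -> bool:
    """True when the header's media type matches `expected` (an assumed expectation)."""
    return media_type(content_type_header) == expected


# A healthy app serving .txt with a charset parameter passes the check.
print(txt_content_type_ok("text/plain; charset=utf-8"))
# The seeded fixture's behaviour (octet-stream) fails it, even though the app is up.
print(txt_content_type_ok("application/octet-stream"))
```

The point of the fixture is that this check fails while every lifecycle tier still passes, so a harness that only looks at health would wrongly report green.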
+
+### Per-tier RED canaries — prove the server catches failure at EVERY tier (fast)
+
+The single fixture above only proves the server catches a *custom-assertion* failure. Add **one RED
+canary per lifecycle tier** so we prove the server reports RED at each of install / upgrade / backup /
+restore — false-green is the scariest regression, and it can hide at any tier (e.g. restore silently
+restoring nothing, the ghost/mattermost class of bug). Use the **simplest recipe — `custom-html-tiny`**
+(static content, deploys in seconds) so all four run **fast**; each is a fixture broken at exactly one
+tier, pinned by commit SHA.
+
+| RED canary | How it's broken (custom-html-tiny fixture) | Expected harness result |
+|---|---|---|
+| **install** | image tag that never becomes healthy / a healthcheck that can't pass | **install tier RED** |
+| **upgrade** | installs clean; the upgrade target breaks the container so post-upgrade health fails | install PASS, **upgrade tier RED** |
+| **backup** | install+upgrade clean; backup misconfigured (backupbot label/target wrong → backup errors or yields no artifact) | **backup tier RED** |
+| **restore** | backup succeeds; restore is a no-op (hook does nothing) so the pre-seeded marker is ABSENT after restore | **restore tier RED** (the scariest false-green) |
+
+Each pytest asserts **precisely**: overall verdict RED, the failing tier is the *intended* one, AND the
+tiers *before* it PASSED (e.g. upgrade-RED requires install to have passed) — so it proves "catches a
+failure **at this tier**", not merely "fails somewhere". These four form the **fast subset** of the
+suite; consider a sub-marker (`@pytest.mark.canary_fast`) so they can optionally run as a quicker
+pre-check while the slow good canaries (esp. lasuite-docs) stay on the milestone cadence below.
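
The three-part assertion described above (overall RED, RED at the intended tier, PASS for every earlier tier) could be sketched as a small helper. This is a minimal sketch only: the `result` dictionary shape, the `"PASS"`/`"RED"` status strings, and the name `assert_red_at` are hypothetical, since the plan does not specify how the harness reports per-tier outcomes.

```python
# Lifecycle tiers in execution order, as listed in the plan.
TIER_ORDER = ("install", "upgrade", "backup", "restore")


def assert_red_at(result: dict, intended: str) -> None:
    """Assert RED overall, RED at `intended`, and PASS for every earlier tier.

    `result` uses a HYPOTHETICAL shape:
    {"verdict": "RED", "tiers": {"install": "PASS", "upgrade": "RED"}}
    """
    assert result["verdict"] == "RED", "overall verdict must be RED"
    tiers = result["tiers"]
    # Every tier before the intended one must have passed.
    for prior in TIER_ORDER[: TIER_ORDER.index(intended)]:
        assert tiers.get(prior) == "PASS", f"{prior} must PASS before {intended}"
    # The intended tier itself must be the one reported RED.
    assert tiers.get(intended) == "RED", f"{intended} tier must be the one that fails"


# Example: an upgrade canary that installed cleanly and then failed on upgrade.
upgrade_canary = {"verdict": "RED", "tiers": {"install": "PASS", "upgrade": "RED"}}
assert_red_at(upgrade_canary, "upgrade")
```

Later tiers are deliberately left unchecked here, on the assumption that the harness may skip them once a tier fails. Each of the four tests could call such a helper and carry the suggested `@pytest.mark.canary_fast` marker; custom markers are normally registered in the pytest configuration so they do not trigger unknown-marker warnings.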
 
 ## What "works as expected" means per tier (real assertions, not exit codes)
 
@@ -78,10 +100,14 @@ lasuite-docs is minutes, needs the live server/abra/Swarm). Run them **deliberat
 2. Run GREEN on both good canaries (`custom-html-tiny`, `lasuite-docs`) with the per-tier semantic
 assertions actually executing (Adversary confirms the assertions FAIL if you tamper with an outcome —
 i.e. the assertions have teeth, they're not vacuous).
-3. The known-bad canary makes the suite assert **RED** — and the Adversary confirms that if the server
-*wrongly* returned green for it, the regression test would FAIL (false-green is caught).
-4. A short `tests/regression/README.md`: how to run it, what each canary guards, how to add a canary.
-5. NOT merged — recipe/test PR opened for operator review (loops never merge).
+3. The custom-assertion known-bad canary makes the suite assert **RED** — and the Adversary confirms
+that if the server *wrongly* returned green for it, the regression test would FAIL (false-green caught).
+4. **The four per-tier RED canaries** (install/upgrade/backup/restore, on `custom-html-tiny`) each make
+the suite assert RED **at the intended tier**, with the prior tiers asserted PASS. Adversary confirms
+each has teeth: if the server wrongly green-lit that tier, the corresponding test would FAIL. They run
+fast.
+5. A short `tests/regression/README.md`: how to run it, what each canary guards, how to add a canary.
+6. NOT merged — recipe/test PR opened for operator review (loops never merge).
 
 ## Risks / notes
 - **Slow + resource-heavy:** full lifecycle on lasuite-docs is minutes and needs the live server/abra/