cc-ci/machine-docs/STATUS-regression.md
autonomic-bot b268a14cad
status(regression): good-significant upgrade flaky (convergence race); next: 4 RED canaries
2026-06-02 01:38:52 +00:00

STATUS — server regression canaries phase

Phase: server regression canaries (codified E2E self-tests)
SSOT: /srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md
Builder loop started: 2026-06-02
Repo: git.autonomic.zone/recipe-maintainers/cc-ci


Current state

Gate: D-initial CLAIMED — test suite written; awaiting first canary run

The tests/regression/ suite is committed. Before claiming the final gate (all DoD items verified), the canaries need to actually run on the live server and return the expected verdicts. Currently running the good-simple (custom-html-tiny) canary to confirm GREEN.


What was built

tests/regression/ committed in the cc-ci repo:

  • conftest.py: run_recipe_ci() helper that invokes the real harness as a subprocess and returns (rc, results_dict, artifact_dir); stage_has_passing_test() / stage_has_failing_test() helpers for semantic checks
  • test_canaries.py: parametrized @pytest.mark.canary test with three canaries (see below)
  • README.md: cadence policy, how to run, how to add a canary
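
A minimal sketch of what the conftest.py helpers could look like, for orientation only. The harness command line (`cmd`), the `results.json` file name, and the `stages -> <stage> -> tests -> [{name, outcome}]` layout are assumptions made for illustration; the committed helpers may differ.

```python
import json
import subprocess
from pathlib import Path


def run_recipe_ci(cmd, artifact_dir):
    """Run the harness as a subprocess and load the results it wrote.

    `cmd` and the `results.json` file name are hypothetical; the real
    invocation is whatever conftest.py in the repo uses.
    """
    artifact_dir = Path(artifact_dir)
    proc = subprocess.run(cmd, capture_output=True, text=True)
    results_path = artifact_dir / "results.json"
    # A missing results file yields an empty dict, so the semantic checks
    # below fail closed instead of raising.
    results = json.loads(results_path.read_text()) if results_path.exists() else {}
    return proc.returncode, results, artifact_dir


def _test_outcomes(results, stage):
    """Map test name -> outcome for one stage (assumed results layout)."""
    tests = results.get("stages", {}).get(stage, {}).get("tests", [])
    return {t["name"]: t["outcome"] for t in tests}


def stage_has_passing_test(results, stage, name):
    """True only if the named test is present in the stage AND passed."""
    return _test_outcomes(results, stage).get(name) == "pass"


def stage_has_failing_test(results, stage, name):
    """True only if the named test is present in the stage AND failed."""
    return _test_outcomes(results, stage).get(name) == "fail"
```

Both stage helpers return False for a test that is absent, which is what lets them catch a canary whose named test silently disappeared.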

Canaries defined

| ID | Recipe | SHA pinned | Expected |
|---|---|---|---|
| good-simple | custom-html-tiny | 435df8fc (main 2026-06-02) | GREEN |
| good-significant | lasuite-docs | 290a8ad7 (main 2026-06-02) | GREEN |
| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | RED |
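
As a sketch, the canary table can be kept as data so that each row becomes one parametrized test case. The `Canary` dataclass and `canary_ids` helper are hypothetical names introduced here; only the IDs, recipes, SHAs and expected verdicts come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Canary:
    id: str        # used as the test ID, so `-k good-simple` selects it
    recipe: str    # recipe the harness runs
    sha: str       # pinned commit, so the verdict cannot drift with main
    expected: str  # "GREEN" or "RED"


CANARIES = [
    Canary("good-simple", "custom-html-tiny", "435df8fc", "GREEN"),
    Canary("good-significant", "lasuite-docs", "290a8ad7", "GREEN"),
    Canary("bad-false-green", "custom-html", "71e7326a", "RED"),
]


def canary_ids(canaries):
    """Readable test IDs, one per canary, in table order."""
    return [c.id for c in canaries]
```

With pytest, a list like this would typically be passed as `@pytest.mark.parametrize("canary", CANARIES, ids=canary_ids(CANARIES))`.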

Semantic assertions (teeth)

Good canaries:

  • rc == 0 (harness exit)
  • install tier: "pass"
  • No tier is "fail"
  • flags.clean_teardown == True
  • flags.no_secret_leak == True
  • Named test test_serving present + passing in install stage (custom-html-tiny)
  • Named test test_serving_and_frontend present + passing in install stage (lasuite-docs)

Bad canary:

  • rc != 0 (PRIMARY — false-green catches here)
  • Named test test_content_type present + FAILING in custom stage (proves the guard is not vacuous)
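
The assertions listed above could be expressed roughly as follows. This is a sketch under assumptions: the function names are hypothetical, and the results dict is assumed to have `results` (tier verdicts), `flags`, and `stages` keys. Each function returns the list of violated expectations, so an empty list means the canary behaved as expected.

```python
def _outcome(results, stage, name):
    """Outcome of a named test in a stage, or None if the test is absent."""
    for t in results.get("stages", {}).get(stage, {}).get("tests", []):
        if t["name"] == name:
            return t["outcome"]
    return None


def good_canary_violations(rc, results, named_test):
    """Expectations for a GREEN canary; returns the ones that do not hold."""
    tiers = results.get("results", {})
    flags = results.get("flags", {})
    problems = []
    if rc != 0:
        problems.append("harness rc is non-zero")
    if tiers.get("install") != "pass":
        problems.append("install tier is not 'pass'")
    if "fail" in tiers.values():
        problems.append("at least one tier is 'fail'")
    if flags.get("clean_teardown") is not True:
        problems.append("clean_teardown flag not set")
    if flags.get("no_secret_leak") is not True:
        problems.append("no_secret_leak flag not set")
    if _outcome(results, "install", named_test) != "pass":
        problems.append(f"{named_test} not present and passing in install")
    return problems


def bad_canary_violations(rc, results, named_test):
    """Expectations for a RED canary; returns the ones that do not hold."""
    problems = []
    if rc == 0:
        problems.append("harness rc is 0 (false green)")
    if _outcome(results, "custom", named_test) != "fail":
        problems.append(f"{named_test} not present and failing in custom")
    return problems
```

Checking the named test in addition to rc matters for the bad canary: a non-zero rc alone could come from an unrelated failure.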

How to verify (Adversary commands)

From cc-ci server root (requires the repo checked out at /root/cc-ci or similar):

# Good simple (fast ~2-5 min):
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v

# Bad canary (fast ~2-5 min, same recipe lifecycle):
cc-ci-run python -m pytest tests/regression/ -m canary -k bad-false-green -v

# Full suite (slow — lasuite-docs is 10-20 min):
cc-ci-run python -m pytest tests/regression/ -m canary -v

Expected outcomes:

  • good-simple: test PASSES (harness returns GREEN, test_serving passes)
  • bad-false-green: test PASSES (harness returns RED, test_content_type fails in custom stage)
  • good-significant: test PASSES (harness returns GREEN, test_serving_and_frontend passes)

Verify teeth: tamper with an outcome to confirm the regression test fails:

  • For good canary: remove test_serving → stage_has_passing_test returns False → the regression test fails
  • For bad canary: change the assertion to rc == 0 → the regression test fails, since the harness returns non-zero for this canary (teeth work)
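
The good-canary tamper check can be rehearsed offline against a captured results dict before touching a recipe. This is a sketch only: `has_passing_test` stands in for the real stage_has_passing_test, `drop_test` is a hypothetical helper, and the results layout is assumed.

```python
import copy


def has_passing_test(results, stage, name):
    """Stand-in for stage_has_passing_test (assumed results layout)."""
    tests = results.get("stages", {}).get(stage, {}).get("tests", [])
    return any(t["name"] == name and t["outcome"] == "pass" for t in tests)


def drop_test(results, stage, name):
    """Return a tampered copy of `results` with one named test removed."""
    tampered = copy.deepcopy(results)  # leave the captured results untouched
    tests = tampered["stages"][stage]["tests"]
    tampered["stages"][stage]["tests"] = [t for t in tests if t["name"] != name]
    return tampered


observed = {
    "stages": {"install": {"tests": [{"name": "test_serving", "outcome": "pass"}]}}
}
tampered = drop_test(observed, "install", "test_serving")

# The check holds on the real data and flips on the tampered copy,
# which is the property "teeth" refers to.
print(has_passing_test(observed, "install", "test_serving"))
print(has_passing_test(tampered, "install", "test_serving"))
```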

Canary run results (2026-06-02 ~01:28-01:35Z)

bad-false-green ✓ (RED confirmed)

Run ID: regression-bad-canary-1, artifact: /var/lib/cc-ci-runs/regression-bad-canary-1/

results: install=pass, upgrade=pass, backup=pass, restore=pass, custom=FAIL
level: 3 (L4 functional FAILED)
flags: clean_teardown=True, no_secret_leak=True
stages.custom tests: [test_content_roundtrip, test_content_type_html_and_txt(FAIL), test_custom_html_returns_200, test_browser_renders_html]
rc: 1 (any(fail in results))

Confirms: test_content_type_html_and_txt fails with Content-Type='application/octet-stream', expected text/plain. The regression test assert rc != 0 PASSES.

good-simple ✓ (GREEN confirmed)

Run ID: regression-good-simple-1, artifact: /var/lib/cc-ci-runs/regression-good-simple-1/

results: install=pass, upgrade=pass, backup=skip, restore=skip, custom=skip
level: 2 (L3 backup/restore N/A — no backupbot label)
flags: clean_teardown=True, no_secret_leak=True
stages.install tests: [test_serving (PASS)]
rc: 0

Confirms: test_serving present + passing in install stage. All assertions will pass.
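
The `level` and `rc` values in the two runs above are consistent with a simple rule: the level is the number of consecutive tiers that passed, starting from install, and rc is non-zero if any tier failed. The sketch below encodes that reading; the tier grouping and function names are assumptions inferred from the reported output, not taken from the harness source.

```python
# Assumed tier ladder: L1 install, L2 upgrade, L3 backup+restore, L4 custom.
LEVELS = [("install",), ("upgrade",), ("backup", "restore"), ("custom",)]


def level(tiers):
    """Count consecutive levels whose stages all passed; stop at the first gap."""
    reached = 0
    for stages in LEVELS:
        if all(tiers.get(stage) == "pass" for stage in stages):
            reached += 1
        else:
            break  # a skip or a fail ends the climb
    return reached


def exit_code(tiers):
    """Non-zero if any tier failed; skipped tiers do not count as failures."""
    return 1 if any(v == "fail" for v in tiers.values()) else 0


bad = {"install": "pass", "upgrade": "pass", "backup": "pass",
       "restore": "pass", "custom": "fail"}
good_simple = {"install": "pass", "upgrade": "pass", "backup": "skip",
               "restore": "skip", "custom": "skip"}

print(level(bad), exit_code(bad))                  # 3 1
print(level(good_simple), exit_code(good_simple))  # 2 0
```

Under this reading a skipped tier caps the level without turning the run RED, which matches good-simple reporting level 2 with rc 0.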

good-significant (FAILED upgrade — transient convergence race suspected)

Run ID: regression-good-significant-1, artifact: /var/lib/cc-ci-runs/regression-good-significant-1/

results: install=PASS, upgrade=FAIL, backup=pass, restore=pass, custom=pass
level: 1 (L2 upgrade FAILED)

Failure: assert_serving failed in test_upgrade_reconverges; the 9-service stack didn't converge within the assert window after the chaos redeploy. This is the known WOPI convergence race. TODO: re-run to confirm it is transient; adjust the good-significant test if it is flaky.
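
If the re-run confirms the race, one possible adjustment is to poll for convergence up to a deadline instead of checking once. The helper below is a generic sketch of that pattern; `wait_until` is a hypothetical name, and how assert_serving currently waits is not shown in this document.

```python
import time


def wait_until(predicate, timeout_s, interval_s=1.0):
    """Poll `predicate` until it returns True or `timeout_s` elapses.

    Returns True on success and False on timeout. The predicate is always
    evaluated at least once, even with a zero timeout.
    """
    deadline = time.monotonic() + timeout_s  # monotonic: immune to clock jumps
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A test would pass a predicate that checks whether all services in the stack report as serving, and fail only when `wait_until` returns False. Raising the deadline trades run time for stability, so it should stay well below the point where a genuinely broken upgrade would go unnoticed.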

NEXT STEPS

  1. Re-run good-significant (lasuite-docs) — confirm transient upgrade race
  2. Create 4 per-tier RED canary branches on Gitea mirror (custom-html-tiny for install/upgrade, custom-html for backup/restore)
  3. Add 4 RED canary tests to test_canaries.py
  4. Commit + open PR