
autonomic-bot 5b34496557 fix(2): F2-11 — SSO-dep deps-not-ready SKIP no longer yields GREEN !testme

When a DEPS-declaring recipe's setup_custom_tests fails, its @requires_deps (SSO/OIDC)
tests skip; a skip-only pytest file exits 0 so the run previously reported overall=0
(GREEN) while the only SSO test never ran (violates P7). Fix preserves generic-tier
failure-isolation but corrects the green SIGNAL:
- conftest.pytest_collection_modifyitems counts skipped requires_deps tests and appends
  to $CCCI_DEPS_SKIP_REPORT.
- run_recipe_ci: sums the count, surfaces it in RUN SUMMARY, and new pure predicate
  sso_dep_unverified(declared, deps_ready, skipped) flips overall=1.
- 7 new unit tests (tests/unit/test_f211_sso_skip.py).

Verified deploy-free (rate-limit-independent): 35/35 unit PASS; cold real-test proof on
lasuite-docs test_oidc_with_keycloak.py -> 1 skipped + skip-report==1 -> orchestrator
would set overall=1. Full e2e deferred until Docker Hub rate limit lifts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-05-28 21:25:27 +01:00


STATUS — Phase 2 (per-recipe test authoring)

Phase plan (SSOT): /srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md
Loop state for THIS phase: STATUS-2 / BACKLOG-2 / REVIEW-2 / JOURNAL-2 (DECISIONS.md shared).
Phase 1/1b/1c/1d/1e STATUS/BACKLOG/REVIEW files are HISTORY (all DONE), not this phase's state.

Phase

Phase 2 authors per-recipe test content on top of the corrected Phase 1/1d/1e shared harness. Per the plan, for every maintained Co-op Cloud recipe (§5 target set), the cc-ci tests/<recipe>/ tree must carry:

Phase-1d/1e lifecycle overlays (assertion-only, additive) — test_install.py, test_upgrade.py, test_backup.py, test_restore.py + ops.py pre-op seeds.
Parity-ported tests from references/recipe-maintainer/recipe-info/<recipe>/tests/*.py, one-to-one (P2), with a PARITY.md mapping table.
≥2 NEW recipe-specific functional tests (P3) — characteristic behavior, not just status==200.
Real backup data-integrity (P4): seed → backup → mutate → restore → assert seeded data survived.
Dependency resolution (P5): recipes that need other apps (SSO providers, DBs) deploy them in-run.
Playwright (P6) where the app's core UX is a UI flow.
Docs (P8): docs/enroll-recipe.md updated with the per-recipe test contract + worked example.

Definition of Done (Phase 2) — P1–P8, each Adversary cold-verified in REVIEW-2

P1 — Coverage. Every recipe in §5 target set has a tests/<recipe>/ suite enrolled and a full green !testme run (install + upgrade + backup-restore).
P2 — Parity port. Every recipe-info/<recipe>/tests/*.py has a comparable cc-ci test; tests/<recipe>/PARITY.md records the mapping; non-ports documented in DECISIONS.md.
P3 — Recipe-specific depth. Each recipe has ≥2 new functional tests beyond parity (characteristic behavior, real assertions on app state/responses).
P4 — Backup data-integrity is real. Seed → backup → mutate → restore → assert seeded data survived (recipe-aware, not health-only). Pattern already proven in Phase 1e on custom-html.
P5 — Dependencies handled. Recipes with deps declare them; harness deploys deps within the run (respecting MAX_TESTS); SSO setup runs automatically.
P6 — Browser flows where they matter (D3). UI-centric recipes have a Playwright test of the core flow (login, create-an-object, etc.).
P7 — No weakened tests, no corners cut. Every assertion is real; nothing skip/xfail'd, mocked, or health-only stand-in. Any "untestable" claim is a true env-level blocker with Adversary sign-off.
P8 — Docs. docs/enroll-recipe.md updated with the per-recipe test contract (§4.1) and a worked example; a new engineer can add a recipe's full suite from the docs.
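The P4 sequence (seed → backup → mutate → restore → assert) can be sketched against an in-memory stand-in. This is a minimal illustration, not the harness code: FakeApp and check_backup_integrity are hypothetical names, and the real tests drive abra and the ops.py pre-op hooks instead of a dict.

```python
"""Sketch of the P4 seed -> backup -> mutate -> restore -> assert pattern.

FakeApp is a hypothetical in-memory stand-in; it is NOT the cc-ci harness.
"""
import copy


class FakeApp:
    def __init__(self):
        self.data = {}
        self._snapshots = []

    def backup(self):
        # Deep-copy so later mutations cannot leak into the stored snapshot.
        self._snapshots.append(copy.deepcopy(self.data))
        return len(self._snapshots) - 1

    def restore(self, snapshot_id):
        self.data = copy.deepcopy(self._snapshots[snapshot_id])


def check_backup_integrity(app, key="marker"):
    app.data[key] = "original"         # seed
    snapshot_id = app.backup()         # backup
    app.data[key] = "mutated"          # mutate
    # Prove the mutation really happened, otherwise the restore check is vacuous.
    assert app.data[key] == "mutated"
    app.restore(snapshot_id)           # restore
    return app.data[key]               # caller asserts this equals "original"
```

The intermediate assertion on the mutated value is what separates a data-integrity test from a health-only one: without it, a restore that does nothing would still pass.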

Milestones (plan §6)

Q0 — Harness additions (HTTP/convergence, OIDC-flow, dep resolver, backup data-integrity, TTY abra). Reference recipe (custom-html) uses them for full parity+specific suite, green via !testme.
Q1 — Pattern proof (custom-html + n8n): full parity + ≥2 specific + real backup data-integrity.
Q2 — SSO providers (keycloak + authentik); reusable SSO-setup/OIDC-flow harness e2e.
Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich); deps auto-deployed, SSO setup automated, parity + specific.
Q4 — Remaining recipes (matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse, plausible, uptime-kuma, mailu, drone).
Q5 — Completeness + docs; flip ## DONE.

In flight

Q3 + Q4 — recipe enrollment sprint. After capacity unblock + Adversary checkpoint, landed:

Q3.1 lasuite-docs partial (parity + 2 specific + Q2.4 test_oidc_with_keycloak); deeper OIDC ports deferred in DEFERRED.md.
Q3.4 cryptpad partial (parity + 2 specific); create-pad deferred F2-9 conditional (must lift before Phase-2 DONE).
Q4.1 matrix-synapse FULL (parity-aligned + 3 specific incl. §4.3 register-and-message).
Q4.3 bluesky-pds FULL (4 functional incl. §4.3 account+post round-trip via goat CLI; F2-8 closed).
Q4.4 ghost FULL (parity + 3 specific; create-post deferred in DEFERRED.md).
Q4.8 uptime-kuma FULL (parity + 2 specific; create-monitor deferred in DEFERRED.md).

Harness change: lifecycle.deploy_app + run_recipe_ci.py + deps.py now thread recipe_meta.DEPLOY_TIMEOUT into abra.deploy(timeout=...) so heavy-recipe Python subprocess timeout matches the recipe's internal TIMEOUT.
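A minimal sketch of the timeout threading described above, assuming recipe_meta is a module-like object that may or may not define DEPLOY_TIMEOUT. The helper names and the 600-second fallback are illustrative assumptions, not values from the harness.

```python
import subprocess
from types import SimpleNamespace

DEFAULT_DEPLOY_TIMEOUT = 600  # ASSUMED fallback; the real default is not stated in this file


def resolve_deploy_timeout(recipe_meta, default=DEFAULT_DEPLOY_TIMEOUT):
    """Return the recipe's DEPLOY_TIMEOUT if it declares one, else the default."""
    return int(getattr(recipe_meta, "DEPLOY_TIMEOUT", default))


def run_with_timeout(cmd, timeout):
    """Run cmd; subprocess.run raises subprocess.TimeoutExpired after `timeout` seconds."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)


# Stand-ins for two recipe_meta modules (hypothetical objects for illustration).
heavy = SimpleNamespace(DEPLOY_TIMEOUT=900)  # a recipe that declares a longer deploy window
light = SimpleNamespace()                    # a recipe that declares nothing
```

Resolving the value once and passing the same number to every layer keeps the Python subprocess timeout from firing before the recipe's own internal timeout does.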

DEFERRED.md (machine-docs/) — new orchestrator-canonical deferral registry; 9 entries open.

Remaining substantial: Q3.2 lasuite-drive (needs mirror), Q3.3 lasuite-meet (mirrored), Q3.5 immich (needs mirror), Q4.2/Q4.5-7/Q4.9-10 (mostly need mirror). The mirror-and-enroll path is established (recipe-create-pr skill); pausing this sprint for Adversary cold-verify.

Adversary findings — Builder response

F2-11 — FIXED, awaiting Adversary re-verify (commit: git log --oneline | grep 'F2-11'). SSO-dep "deps-not-ready" SKIP no longer yields a GREEN !testme.

WHAT: when a recipe declares DEPS and setup_custom_tests fails (deps not ready) so its @requires_deps (SSO/OIDC) tests SKIP, the run now reports FAIL (overall=1), not green — while generic-tier failure-isolation is preserved (install/upgrade/backup/restore results stand).
WHERE (code):
- tests/conftest.py::pytest_collection_modifyitems — now counts the requires_deps tests it skips and appends the count to $CCCI_DEPS_SKIP_REPORT.
- runner/run_recipe_ci.py — sets CCCI_DEPS_SKIP_REPORT (run-scoped temp, near depsfile); after teardown sums the count into requires_deps_skipped; RUN SUMMARY annotates the custom tier (custom: pass (N requires_deps SKIPPED ... SSO UNVERIFIED)); new pure predicate sso_dep_unverified(declared, deps_ready, requires_deps_skipped) flips overall=1.
- tests/unit/test_f211_sso_skip.py — 7 new unit tests.
HOW to verify (both deploy-free, rate-limit-independent):
1. ssh cc-ci 'cd /root/cc-ci && cc-ci-run -m pytest tests/unit -q' → EXPECTED: 35 passed (28 prior + 7 F2-11).
2. Cold real-test signal proof: ssh cc-ci 'cd /root/cc-ci && rm -f /tmp/f211-skip.txt && CCCI_DEPS_READY=0 CCCI_DEPS_NOT_READY_REASON=boom CCCI_DEPS_SKIP_REPORT=/tmp/f211-skip.txt cc-ci-run -m pytest tests/lasuite-docs/functional/test_oidc_with_keycloak.py -rs; cat /tmp/f211-skip.txt' → EXPECTED: 1 skipped, pytest exit 0 (the hazard), and /tmp/f211-skip.txt == 1. Since lasuite-docs declares DEPS=["keycloak"], the orchestrator computes sso_dep_unverified(["keycloak"], False, 1)=True → overall=1.
NOT verified by a live run yet: full e2e (real deploy with forced setup_custom_tests failure → observe overall=1) is deferred until the Docker Hub rate limit (## Blocked) lifts. The two proofs above cover the predicate, the conftest signal on real files, and the count flow; only the straight-line read→sum→predicate→overall wiring is unexercised by a live deploy.
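The read → sum → predicate → overall flow can be sketched as follows. Only the predicate's name, its argument order, and the result sso_dep_unverified(["keycloak"], False, 1)=True come from this file; the report format (one integer per line), the helper names, and the exact boolean logic are assumptions made for illustration.

```python
import os


def append_skip_count(report_path, count):
    """Append one integer line per pytest invocation (ASSUMED report format)."""
    with open(report_path, "a", encoding="utf-8") as fh:
        fh.write(f"{count}\n")


def sum_skip_report(report_path):
    """Sum every integer line; a missing report means nothing was skipped."""
    if not os.path.exists(report_path):
        return 0
    with open(report_path, encoding="utf-8") as fh:
        return sum(int(line) for line in fh if line.strip())


def sso_dep_unverified(declared, deps_ready, requires_deps_skipped):
    """ASSUMED semantics: deps were declared, never became ready, and at
    least one requires_deps test was skipped as a result."""
    return bool(declared) and not deps_ready and requires_deps_skipped > 0


def overall_exit(any_tier_failed, declared, deps_ready, skipped):
    """Exit 1 on a real tier failure or on an unverified SSO dependency."""
    unverified = sso_dep_unverified(declared, deps_ready, skipped)
    return 1 if any_tier_failed or unverified else 0
```

Keeping the predicate pure (no I/O, no globals) is what lets the unit tests cover it without a deploy.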

Gate

Gate: Q2 — Adversary PASS @2026-05-28 (REVIEW-2 ## Q2 — PASS @2026-05-28 (re-verify after F2-5 fix + F2-6 collateral resolution); cold e2e on /root/adv-verify HEAD 874bfbb: deploy-count=2, all 5 assertions PASS, DEPS teardown clean, post-run docker stack/volume/secret with 'keyc|lasuite' filter all empty; NO VETO). F2-5 + F2-6 CLOSED; F2-7 stands as open scope (authentik backend in harness.sso when Q2.2 enrolls). Builder may advance to Q3 — already in flight (Q3.1 partial @ 874bfbb, Q5.1 docs @ b2151af).

Acceptance per plan §6 Q2: "a dependent recipe deploys its provider + runs an OIDC login test in one run." Proven cold:

Objective evidence pointers (Q2):

Q2.1 keycloak parity + 2 NEW specific tests — commit d5f5e86:
- tests/keycloak/functional/test_health_check.py — parity port.
- tests/keycloak/functional/test_password_grant_token.py — password grant, JWT decoded, claims (iss/azp/typ/exp/iat) validated.
- tests/keycloak/functional/test_create_client_and_use.py — admin-API client CRUD + client_credentials grant + JWT azp/iss validation + idempotent cleanup.
- oidc_integration.py parity deferred to Q3 (cross-recipe; see PARITY.md note).
- Bumped DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s.
- Cold e2e (log /root/ccci-q2-keycloak-r3.log): all 5 stages PASS, deploy-count=1, head_ref=666649a6 == chaos-version=666649a6, version 10.7.0+26.6.1 → 10.7.1+26.6.2.
Q2.3 dep resolver + SSO-setup harness primitives — commit 4d6b040:
- runner/harness/deps.py — declared_deps + dep_domain + deploy_deps + teardown_deps + JSON run state. Subsumes Q0.4 (dep resolver).
- runner/harness/sso.py — setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint. Reusable by every SSO-dependent recipe (Q3 will exercise).
- runner/run_recipe_ci.py — wired in dep deploy BEFORE recipe-under-test, dep teardown AFTER in finally (reverse order). DG4.1 expected count = 1 + len(deps).
- tests/conftest.py — deps_apps fixture exposes dep domains to dependent tests.
- 7 new unit tests in tests/unit/test_deps.py; 28/28 unit tests PASS cold.
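The ordering described for run_recipe_ci.py (deps first, recipe second, dep teardown in reverse inside finally) and the DG4.1 count can be sketched like this. It is a simplified illustration with hypothetical function names and injected callables; the recipe's own teardown and the JSON run state are left out.

```python
def expected_deploy_count(deps):
    """DG4.1 as stated above: one deploy for the recipe plus one per dep."""
    return 1 + len(deps)


def run_with_deps(recipe, deps, deploy, teardown, run_tests):
    """Deploy deps, then the recipe, run tests, and always tear deps down.

    deploy/teardown/run_tests are injected callables (hypothetical seams)
    so the ordering can be checked without touching a real server.
    """
    deployed = []
    try:
        for dep in deps:
            deploy(dep)
            deployed.append(dep)      # remember only what actually came up
        deploy(recipe)
        return run_tests()
    finally:
        # Reverse order, and only for deps that were actually deployed.
        for dep in reversed(deployed):
            teardown(dep)
```

Tracking the deployed list separately from the declared list means a failure halfway through dep deployment still tears down exactly what exists.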
F2-5 fix — dep teardown verify=True — commit c6e94af, log /root/ccci-f25-verify.log:
- runner/harness/deps.py::teardown_deps now uses lifecycle.teardown_app(..., verify=True) so residuals raise TeardownError. Errors are logged per-dep but we continue to other deps; a combined TeardownError is raised after all attempts.
- runner/run_recipe_ci.py catches the dep TeardownError in finally, surfaces via dep_teardown_error in the run summary + non-zero exit code.
- Cold-verified: lasuite-docs+keycloak dep e2e PASSED clean (3 custom + 2 lifecycle install = 5 PASS); post-run cc-ci state has NO leftover keycloak (docker stack ls | grep keyc → empty; docker volume ls | grep keyc → empty; docker secret ls | grep keyc → empty).
- deploy-count=2, expected 2.
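The continue-then-raise behaviour of the F2-5 fix can be sketched as below. TeardownError here is a local stand-in for the harness exception of the same name, and teardown_one is a hypothetical callable that plays the role of lifecycle.teardown_app(..., verify=True).

```python
class TeardownError(RuntimeError):
    """Local stand-in for the harness exception of the same name."""


def teardown_deps(deps, teardown_one, log=print):
    """Attempt every dep teardown; raise one combined error afterwards."""
    errors = []
    for dep in reversed(deps):
        try:
            teardown_one(dep)
        except TeardownError as exc:
            # Log and keep going so one stuck dep cannot strand the others.
            log(f"dep teardown failed for {dep}: {exc}")
            errors.append(f"{dep}: {exc}")
    if errors:
        raise TeardownError("; ".join(errors))
```

Raising only after the loop is the design point: residue from one dep is still reported as a failure, but it does not leave the remaining deps deployed.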
Q2.4 acceptance (the gate) — commit 9e88741, log /root/ccci-q24-lasuite-keycloak.log:
- tests/lasuite-docs/recipe_meta.py declares DEPS = ["keycloak"].
- tests/lasuite-docs/functional/test_oidc_with_keycloak.py:
  - Asserts deps_apps["keycloak"] is the per-run dep domain.
  - Calls harness.sso.setup_keycloak_realm → realm/client/user.
  - GETs OIDC discovery; asserts issuer == https://<kc>/realms/lasuite-docs.
  - Performs password grant → JWT; asserts iss/azp/typ/exp claims.
- Cold-run output:
```
===== DEPS: ['keycloak'] =====
  dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
  dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install =====  2 PASS (generic + cc-ci overlay)
===== TIER: custom =====   1 PASS (test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```
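The claim assertions listed for test_oidc_with_keycloak.py can be illustrated with the standard library alone. This sketch decodes the JWT payload without verifying the signature, which is enough for claim checks in a test but not for authentication. The function names, the expected typ value of "Bearer", and the example issuer and client id are assumptions for illustration.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(claims, issuer, client_id, now=None):
    """Return the names of claims that fail; an empty list means all pass."""
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != issuer:
        problems.append("iss")
    if claims.get("azp") != client_id:
        problems.append("azp")
    if claims.get("typ") != "Bearer":   # ASSUMED expected value
        problems.append("typ")
    if claims.get("exp", 0) <= now:
        problems.append("exp")
    return problems


def make_unsigned_token(claims):
    """Build a throwaway token for the example (hypothetical helper)."""
    def enc(obj):
        raw = json.dumps(obj).encode("utf-8")
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")
    return f"{enc({'alg': 'none'})}.{enc(claims)}.sig"
```

Returning the list of failing claim names, rather than asserting inside the helper, gives the test a single assertion whose failure message names every bad claim at once.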
F2-3 systemic fix — commit 47f7cb4: runner/harness/browser.py::goto_with_retry centralizes the F2-3 try/except PlaywrightError pattern; applied to all install overlays using page.goto (custom-html, n8n, keycloak, cryptpad, lasuite-docs) + the custom-html playwright/test_browser_smoke. Cold e2e (custom-html, log /root/ccci-q2-customhtml-r2.log): all 5 stages PASS, deploy-count=1, HC1 non-vacuous.
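The centralized retry can be sketched without Playwright installed. In this illustration NavigationError stands in for Playwright's error class and FakePage for a page object; the signature, attempt count, and delay are assumptions and may differ from runner/harness/browser.py.

```python
import time


class NavigationError(Exception):
    """Hypothetical stand-in for Playwright's error class."""


def goto_with_retry(page, url, attempts=5, delay=1.0,
                    retry_on=(NavigationError,), sleep=time.sleep):
    """Call page.goto(url), retrying on transient navigation errors."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return page.goto(url)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)          # no sleep after the final failure
    raise last_error


class FakePage:
    """Fails the first `fail_times` navigations, then succeeds."""

    def __init__(self, fail_times):
        self.fail_times = fail_times
        self.calls = 0

    def goto(self, url):
        self.calls += 1
        if self.calls <= self.fail_times:
            raise NavigationError(f"connection refused: {url}")
        return "ok"
```

Putting the try/except in one helper means each install overlay calls a single function instead of repeating the same error handling around page.goto.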

Reference command for Adversary (cold, on cc-ci):

ssh cc-ci 'cd /root/<your-clone> && \
  cc-ci-run -m pytest tests/unit -v && \
  RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py && \
  RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'

Gate: Q1 — Adversary PASS @2026-05-28 (REVIEW-2 ## Q1 — PASS @2026-05-28 (re-verify after F2-3 + F2-4 fixes); cold e2e on /root/adv-verify HEAD fc89552 → all 5 stages PASS, deploy-count=1, HC1 non-vacuous; F2-3 + F2-4 CLOSED; NO VETO). Builder may advance to Q2.

Objective evidence pointers (Q1):

custom-html (Q1.1) — already cold-verified in Q0 PASS. Same evidence stands: full e2e green, HC1 non-vacuous, deploy-count=1; PARITY.md + functional/ + playwright/ in place.
n8n (Q1.2) — full e2e on cc-ci (log /root/ccci-q1-n8n-r3.log):
- HC1 PR-head proof: head_ref=63dd3e0f == chaos-version=63dd3e0f, version 3.1.0+2.9.4 → 3.2.0+2.20.6.
- Deploy-count = 1 (DG4.1 holds).
- Lifecycle tier results (generic + cc-ci overlay both PASS at each stage):
  - install: generic test_serving PASS + cc-ci test_serving_and_editor PASS (the robust Playwright poll handles n8n's /healthz-200-before-/-route-registered window).
  - upgrade: generic test_upgrade_reconverges PASS + cc-ci test_upgrade_preserves_data PASS (marker upgrade-survives written into /home/node/.n8n by ops.pre_upgrade survived the chaos redeploy of PR-head).
  - backup: generic test_backup_artifact PASS + cc-ci test_backup_captures_state PASS (marker value original, written by ops.pre_backup, captured by abra app backup create).
  - restore: generic test_restore_healthy PASS + cc-ci test_restore_returns_state PASS (ops.pre_restore changes the marker to mutated; the restore returns it to original, which is real backup data-integrity).
- Custom tier results (4 PASS — log /root/ccci-q1-n8n-r4.log post-F2-4/F2-3 fix):
  - tests/n8n/functional/test_health_check.py::test_n8n_returns_200 — parity port (HTTP 200 from /), with SOURCE: recipe-info/n8n/tests/health_check.py comment.
  - tests/n8n/functional/test_workflow_roundtrip.py::test_workflow_create_and_read_back — plan §4.3 prescribed create+read-back: owner setup → POST /rest/workflows → GET /rest/workflows/; assert id/name/nodes round-trip. (F2-4 fix.)
  - tests/n8n/functional/test_rest_settings.py::test_rest_settings_returns_json_with_known_keys — polls /rest/settings until content-type is application/json (rejecting the "n8n is starting up" placeholder HTML), then asserts known public-settings keys (userManagement / defaultLocale / authCookie) in the data envelope.
  - tests/n8n/functional/test_login_state.py::test_login_endpoint_returns_json — polls /rest/login until content-type is application/json, proves auth subsystem initialized.
- PARITY.md complete: tests/n8n/PARITY.md — parity row for health_check.py, rationale for the 2 recipe-specific tests, data-integrity + playwright sections.
Q1 has no open Adversary findings (F2-3 and F2-4 are CLOSED per the gate above). No tests skipped/weakened; the rejecting-the-placeholder pattern in the new functional tests is non-vacuous (a stuck-booting n8n that only serves the placeholder fails the test).
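The placeholder-rejecting poll can be sketched as follows. The fetch callable, its (status, content_type, body) return shape, and the function name are assumptions for illustration; the real tests use the harness HTTP helpers.

```python
import json
import time


def poll_until_json(fetch, timeout=60.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until fetch() returns HTTP 200 with an application/json body.

    fetch() -> (status, content_type, body_text). A 200 response carrying
    HTML (e.g. a startup placeholder page) is rejected, not accepted.
    """
    deadline = clock() + timeout
    while True:
        status, content_type, body = fetch()
        media_type = content_type.split(";")[0].strip().lower()
        if status == 200 and media_type == "application/json":
            return json.loads(body)
        if clock() >= deadline:
            raise AssertionError(
                f"never served JSON; last response was {status} {content_type!r}"
            )
        sleep(interval)
```

Checking the media type rather than only the status code is what makes the test non-vacuous: an app stuck on its placeholder page answers 200 forever and still times out here.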

Reference command for Adversary (cold, on cc-ci):

ssh cc-ci 'cd /root/<your-clone> && RECIPE=n8n cc-ci-run runner/run_recipe_ci.py'

Gate: Q0 — Adversary PASS @2026-05-28 (REVIEW-2 ## Q0 — PASS @2026-05-28; cold re-verify on /root/adv-verify HEAD 0b834e9 → 21 unit PASS + e2e PASS; NO VETO). F2-1 closed; F2-2 (scope observation) acknowledged.

Prior Q0 claim detail (commit 5741e88; the F2-1 fix landed on top of the original Q0 changeset). Acceptance evidence (per plan §6 Q0): a reference recipe (custom-html) uses the new harness additions for a full parity + specific suite, green via the existing run path. F2-1 (test_custom_tests_repo_local_gated stale assertion) closed by Builder; cold re-run on cc-ci → 21/21 PASS including the previously-failing test. F2-2 (scope observation: OIDC-flow + dep resolver not in Q0) acknowledged: those primitives are implemented when Q2/Q3 consume them; BACKLOG-2 Q0.4 remains open and explicitly deferred.

Objective evidence pointers (Q0):

Harness additions landed
- runner/harness/http.py — canonical Phase-2 recipe-test HTTP API (vendored from references/recipe-maintainer/utils/tests/helpers.py): http_get, http_post, http_request, retry_http_get, retry_http_post, wait_for_http, assert_converges. JSON + form bodies, transport-failure → status=0.
- runner/harness/discovery.custom_tests recurses into tests/<recipe>/functional/ and tests/<recipe>/playwright/ (Phase 2 §4.1 layout) while excluding lifecycle test_<op>.py names; HC2 repo-local gate continues to apply.
- TTY abra wrapper already present in runner/harness/abra.py::_run_pty (Phase 1d) — reused.
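Two of the behaviours listed for runner/harness/http.py, transport-failure → status=0 and convergence polling, can be sketched with urllib. This is an independent illustration that borrows the names http_get and assert_converges; the Response type, signatures, and defaults are assumptions and not the vendored implementation.

```python
import json
import time
import urllib.error
import urllib.request
from dataclasses import dataclass


@dataclass
class Response:
    status: int
    body: str

    def json(self):
        return json.loads(self.body)


def http_get(url, timeout=10.0):
    """GET url; never raise on network trouble, report status 0 instead."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return Response(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as exc:
        # 4xx/5xx still carry a status and a body worth asserting on.
        return Response(exc.code, exc.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout: no HTTP status exists.
        return Response(0, str(exc))


def assert_converges(probe, timeout=30.0, interval=0.5, sleep=time.sleep):
    """Call probe() until it returns a truthy value, and return that value."""
    deadline = time.monotonic() + timeout
    while True:
        value = probe()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise AssertionError("condition did not converge before timeout")
        sleep(interval)
```

Mapping transport failures to status 0 lets a retry loop treat "not reachable yet" and "reachable but wrong status" uniformly, without wrapping every call in try/except.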
Unit-test proof (deterministic, cc-ci; post-F2-1 fix commit 5741e88)
- cc-ci-run -m pytest tests/unit -v → 21 passed in 5.38s (the previously-failing test_custom_tests_repo_local_gated now passes; synthetic-recipe + monkeypatch fixture):
  - 8× pre-existing tests/unit/test_discovery.py (overlay + HC2 gate; re-run as a regression check).
  - 2× new tests/unit/test_discovery_phase2.py (functional/+playwright/ recursion + HC2 gate still applies to subdirs).
  - 11× new tests/unit/test_http.py (in-process http.server fixture — JSON parsing, 4xx-with-body, non-JSON body, transport-failure=0, headers, JSON+form POST, retry convergence, retry timeout, wait_for_http, assert_converges return value).
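The recursion rule that the discovery unit tests exercise can be sketched with pathlib. The lifecycle file names are the four listed in the Phase section; the function body, the decision to exclude those names inside subdirectories too, and the omission of the HC2 repo-local gate are simplifying assumptions.

```python
from pathlib import Path

# Lifecycle overlay names from the Phase section; they belong to their own tiers.
LIFECYCLE_NAMES = {
    "test_install.py",
    "test_upgrade.py",
    "test_backup.py",
    "test_restore.py",
}
CUSTOM_SUBDIRS = ("functional", "playwright")


def custom_tests(recipe_dir):
    """Return custom-tier test files under tests/<recipe>/, sorted per directory."""
    recipe_dir = Path(recipe_dir)
    found = [
        path
        for path in sorted(recipe_dir.glob("test_*.py"))
        if path.name not in LIFECYCLE_NAMES
    ]
    for sub in CUSTOM_SUBDIRS:
        subdir = recipe_dir / sub
        if subdir.is_dir():
            found.extend(
                path
                for path in sorted(subdir.rglob("test_*.py"))
                if path.name not in LIFECYCLE_NAMES  # ASSUMED to apply in subdirs too
            )
    return found
```

Non-test files such as ops.py never match the test_*.py pattern, so they stay out of the custom tier without a separate exclusion.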
End-to-end proof (custom-html on cc-ci, the reference recipe)
- RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py (log /root/ccci-q0-customhtml-full.log):
  - install/upgrade/backup/restore/custom all PASS, deploy-count=1.
  - HC1 PR-head proof: head_ref=8a026066 == chaos-version=8a026066, version 1.10.0→1.11.0.
  - 5 lifecycle assertions (generic + cc-ci overlay across 4 ops) + 4 custom-stage assertions (3 functional + 1 playwright). Reference command for Adversary cold re-run: RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py.
Per-recipe contract artifact landed
- tests/custom-html/PARITY.md — parity row for health_check.py, rationale for the 2 recipe-specific tests + the data-integrity + playwright sections.
- tests/custom-html/functional/{test_health_check.py,test_content_roundtrip.py,test_content_type_header.py} — parity port + 2 NEW recipe-specific tests; each parity file carries the SOURCE: recipe-info/custom-html/tests/<file> comment for audit.
- tests/custom-html/playwright/test_browser_smoke.py — Phase-2 P6 home.

Reference command for Adversary (cold, on cc-ci):

ssh cc-ci 'cd /root/cc-ci && cc-ci-run -m pytest tests/unit -v && RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py'

Blocked

@2026-05-28 ~21:10Z — ONE standing EXTERNAL (Class A1) block: Docker Hub pull rate limit. (The earlier Gitea outage is RESOLVED — see below — and git state is reconciled/pushed.)

Docker Hub anonymous pull rate limit (registry-creds finding, plan §1.5). docker.io pulls from cc-ci's IP fail with toomanyrequests: You have reached your unauthenticated pull rate limit. Verify: ssh cc-ci 'docker pull redis:8.6.3' → rate-limit error. After the Gitea outage I re-tested: exactly 1 pull (minio) trickled through as the rolling 6h window aged, then the next 3 (redis/nginx/mailcatcher) hit the limit again, so the quota is still effectively exhausted and admits only ~1 pull at a time. Traced to: today's many recipe deploys plus a docker image prune -af (run to clear a disk-full condition that broke the drive deploy), which forced a full cold re-pull. Blocks every new recipe deploy. Per §1.5 this is a finding → request registry pull credentials (authenticated/Team Docker Hub, or a pull-through cache). Recurs for all remaining Q3.5/Q4 enrollments. Operator notified @~19:45Z.

Impact on Q3.2 lasuite-drive: base deploy got 8/12 services up (incl. heavy onlyoffice+collabora; big image LAYERS now cached on cc-ci so a re-run is light) but the last 3 small images can't pull. Will re-run the moment pulls flow (creds or window reset). cc-ci is CLEAN (teardown verified: 0 stack, 0 residue volumes/secrets; 6.8 GB disk + 6.5 GB RAM free).

Gitea outage (RESOLVED @~21:08Z). git.autonomic.zone returned blanket 404 page not found for ~1.5h (backend down; same from my sandbox AND cc-ci). Orchestrator confirmed it back online; I re-ran git pull --rebase (up to date) and pushed the 2 queued local commits — origin/main is now 4a118ea. The 3 watchdog pings during the outage were phantoms (Adversary's failed push retries); the remote has NO pending BUILDER-INBOX and NO new REVIEW-2 verdict, so nothing was lost on my side.

Prior bootstrap state: access re-verified @2026-05-28: ssh cc-ci ok (root, NixOS 24.11), Gitea API HTTP 200, wildcard DNS resolves to gateway 143.244.213.108.

Carryover from Phase 1e (not blockers for Phase 2)

F1e-2 [adversary] — concurrent same-recipe abra recipe fetch race in runner/run_recipe_ci.py::fetch_recipe. Pre-existing in Phase 1d; not a 1e regression. Drone caps MAX_TESTS=1 today, so practical impact bounded. Tracked for Phase-2 breadth-ramp if concurrent recipe runs become routine.
