# BACKLOG — Phase 2 (per-recipe test authoring)
Phase-namespaced backlog. Builder edits `## Build backlog`; Adversary edits `## Adversary findings`.
Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
## Build backlog
### Q0 — Harness additions
- [x] **Q0.1** — `runner/harness/http.py` landed (canonical Phase-2 recipe-test HTTP API:
`http_get`/`http_post`/`http_request`/`retry_http_get`/`retry_http_post`/`wait_for_http`/
`assert_converges`). TTY abra wrapper already present (`runner/harness/abra.py::_run_pty`)
from Phase 1d. 11 unit tests landed.
- [x] **Q0.2** — `discovery.custom_tests` recurses into `tests/<recipe>/{functional,playwright}/`
(Phase 2 §4.1 layout); 2 unit tests landed.
- [x] **Q0.3** — `tests/custom-html/PARITY.md` landed (parity row for health_check + rationale for
2 new recipe-specific tests + data-integrity + playwright sections). Parity port:
`tests/custom-html/functional/test_health_check.py` (SOURCE comment present).
- [ ] **Q0.4** — Dependency resolver harness primitive (read `tests/<recipe>/recipe.toml`
`requires`/`test_requires`, deploy deps before the recipe under test, tear down with it). Mind
`MAX_TESTS`/node budget; sequence heavy ones. **Deferred to Q2** (needed once SSO providers come
online; no Phase-2 recipe in Q1 needs deps). Tracked in BACKLOG.
- [x] **Q0.5** — **RE-CLAIMED @2026-05-28** (commit `5741e88` adds F2-1 fix to original Q0).
Custom-html reference recipe runs the full parity + ≥2 specific + playwright suite green on
cc-ci; deploy-count=1; DECISIONS.md Phase-2 section in place. F2-1 closed by Builder; 21/21
unit tests PASS cold. Awaiting Adversary cold re-verify.
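
The Q0.1 helpers are listed above by name only, so their real signatures are not visible here. As a rough illustration of the polling idea behind a `wait_for_http`-style helper, here is a minimal stdlib-only sketch. The names `fetch_status` and `wait_for_http_sketch`, and every parameter (`expect`, `timeout`, `interval`, `fetch`, `sleep`), are assumptions made for this example and are not taken from `runner/harness/http.py`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, just not with a 2xx
    except (urllib.error.URLError, OSError):
        return None  # not listening yet, or a transient network error


def wait_for_http_sketch(
    url: str,
    expect: int = 200,
    timeout: float = 60.0,
    interval: float = 1.0,
    fetch: Callable[[str], Optional[int]] = fetch_status,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll url until it returns `expect`; raise TimeoutError otherwise.

    Returns the number of attempts made (hypothetical return value).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        last = fetch(url)
        if last == expect:
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{url}: last status {last!r}, wanted {expect}")
        sleep(interval)
```

Injecting `fetch` and `sleep` keeps the loop unit-testable without a network, which is one plausible way a helper like this could be covered by fast unit tests.
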
### Q1 — Pattern proof (custom-html + n8n)
- [x] **Q1.1** — custom-html: 2 NEW recipe-specific functional tests landed
(`test_content_roundtrip.py` + `test_content_type_header.py`); already cold-verified in Q0 PASS.
- [x] **Q1.2** — n8n enrolled under cc-ci. Parity port `tests/n8n/functional/test_health_check.py`
+ **3 recipe-specific functional tests**: `test_workflow_roundtrip.py` (the plan §4.3
prescribed create-and-read-back via owner setup → POST /rest/workflows → GET round-trip;
F2-4 fix), `test_rest_settings.py` (REST bootstrap surface), `test_login_state.py` (auth
subsystem). Install overlay's Playwright now wraps page.goto in try/except PlaywrightError
so transient net::ERR_* triggers retry, not failure (F2-3 fix).
- [x] **Q1.3** — n8n real backup data-integrity already covered by the Phase-1d/1e lifecycle overlay
pattern (`ops.pre_backup` seeds "original" in /home/node/.n8n; `pre_restore` mutates; restore
must return "original" — passed in the Q1.2 e2e run).
- [x] **Q1.4** — **RE-CLAIMED @2026-05-28** (commit `fc89552` F2-3+F2-4 on top of `2f3d5aa`). Both
recipes green via the run path; both PARITY.md complete; Adversary findings F2-3 + F2-4 closed
by Builder. Awaiting Adversary cold re-verify.
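
The data-integrity pattern in Q1.3 (seed "original", back up, mutate, restore, expect "original") can be sketched in isolation. In the block below, `backup` and `restore` are plain directory copies standing in for the real backup and restore steps, and the function names and the temporary-directory layout are assumptions made for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def backup(data_dir: Path, snapshot_dir: Path) -> None:
    """Stand-in for the real backup step: copy the data directory."""
    shutil.copytree(data_dir, snapshot_dir)


def restore(snapshot_dir: Path, data_dir: Path) -> None:
    """Stand-in for the real restore step: replace data with the snapshot."""
    shutil.rmtree(data_dir)
    shutil.copytree(snapshot_dir, data_dir)


def check_backup_roundtrip() -> str:
    """Seed -> backup -> mutate -> restore; return the marker content afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "data"
        snap = Path(tmp) / "snapshot"
        data.mkdir()
        marker = data / "ci_marker"

        marker.write_text("original")           # pre_backup: seed known state
        backup(data, snap)
        marker.write_text("mutated")            # pre_restore: diverge from the backup
        assert marker.read_text() == "mutated"  # prove the mutation really happened
        restore(snap, data)
        return marker.read_text()               # a working restore yields "original"
```

The mutation step is what makes the check non-vacuous: without it, a restore that does nothing would still leave "original" in place.
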
### Q2 — SSO providers (keycloak + authentik)
- [x] **Q2.1** — keycloak: parity-port `test_health_check.py` + 2 NEW recipe-specific functional
tests. Bumped timeouts to 900s. Full e2e green (commit `d5f5e86`).
- [ ] **Q2.2** — authentik: **deferred (lower priority).** The SSO harness primitive is
provider-pluggable (the `setup_keycloak_realm` shape can be mirrored to `setup_authentik_provider` when needed); Q2.4 acceptance is already proven via keycloak. Will land when Q3
lights up an authentik-dependent recipe, or as Q4/Q5 sweep.
- [x] **Q2.3** — Dep resolver (`runner/harness/deps.py` — declared_deps + per-(parent,dep) domain
+ deploy_deps/teardown_deps + run state) + SSO-setup harness (`runner/harness/sso.py`
setup_keycloak_realm + oidc_password_grant + assert_discovery_endpoint) + orchestrator
wiring. 7 new unit tests; 28/28 PASS. **Subsumes Q0.4.** Commit `4d6b040`.
- [x] **Q2.4** — **RE-CLAIMED @2026-05-28** (commit `c6e94af` F2-5 fix on top of `9e88741`).
`tests/lasuite-docs/recipe_meta.py DEPS = ["keycloak"]`; `test_oidc_with_keycloak.py`
proves the full SSO flow. F2-5 verified: dep teardown now uses verify=True, raises +
surfaces leak failures; cold re-verify on cc-ci → no leftover keycloak after teardown.
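
The deploy-deps-first, tear-down-with-verification behaviour described in Q2.3 and Q2.4 could look roughly like the context manager below. This is a sketch under stated assumptions, not the code in `runner/harness/deps.py`: `deployed_deps`, `DepLeakError`, and the injected `deploy`/`undeploy`/`is_running` callables are all hypothetical names chosen for this example.

```python
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, List


class DepLeakError(RuntimeError):
    """Raised when a dependency is still present after teardown (hypothetical)."""


@contextmanager
def deployed_deps(
    deps: Iterable[str],
    deploy: Callable[[str], None],
    undeploy: Callable[[str], None],
    is_running: Callable[[str], bool],
) -> Iterator[List[str]]:
    """Deploy deps in order, yield, then tear down in reverse and verify."""
    started: List[str] = []
    try:
        for dep in deps:
            deploy(dep)
            started.append(dep)
        yield started
    finally:
        # Tear down only what was actually started, newest first.
        for dep in reversed(started):
            undeploy(dep)
        # Verification step: a silent leak becomes a loud failure.
        leaked = [dep for dep in started if is_running(dep)]
        if leaked:
            raise DepLeakError(f"deps still running after teardown: {leaked}")
```

A caller would wrap the recipe-under-test in `with deployed_deps(["keycloak"], ...)`, so the dependency's lifetime is tied to the run and a leftover dependency fails the run instead of passing unnoticed.
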
### Q3 — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich)
- [~] **Q3.1** — lasuite-docs: parity port (health_check) ✓ + 2 NEW recipe-specific tests
(test_oidc_with_keycloak.py — Q2.4 acceptance test exercising real OIDC flow against
dep keycloak; test_auth_required.py — protected backend API requires auth). Open
follow-up: oidc_login.py + upload_conversion.py full ports + create-a-doc require
lasuite-docs OIDC env wiring (install_steps.sh wires dep keycloak's client_secret +
OIDC env into lasuite-docs's .env at install time). Documented in tests/lasuite-docs/
PARITY.md.
- [x] **Q3.2** — lasuite-drive: **FULL LIFECYCLE 3× GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q3.2),
awaiting Adversary.** install+upgrade+backup+restore+custom all pass; OIDC password-grant PASSED
(not skip); deploy-count=1; clean teardown; data-integrity (ci_marker) survives upgrade +
backup/restore. Fixed via install-time OIDC (commit `a151489`) + collabora-ready upgrade gate +
DEPLOY_TIMEOUT plumbing (commit `4b38b66`). Logs r2/r3/r4. Original [~] detail retained below.
- [~] **Q3.2 (original)** — lasuite-drive: enrolled (mirrored). Maximal testable subset GREEN @2026-05-29
(`/root/ccci-drive-subset.log`): install (generic+cc-ci test_serving_and_frontend) + backup
(P4 test_backup_captures_state) + restore (P4 test_restore_returns_state) + custom — all 3
functional PASS: test_health_check (parity), test_minio_storage (real S3 upload→list→download→
assert-bytes round-trip), test_oidc_with_keycloak (password-grant JWT vs warm keycloak,
per-run realm, clean teardown). deploy-count=1, deps=['keycloak'] (warm-reused). **Upgrade
tier: disk-blocker RESOLVED @2026-05-29 (cc-ci grew to 64G/44G-free) — the upgrade tier is now
REQUIRED green (no longer deferrable, per Adversary + operator) and runs as part of the Q3.2a
rework. It stays a veto-eligible OPEN obligation until run green (incl. real prev→PR-head office
crossover) + Adversary cold-verified.** Bug fixed en route: `fix(2)`
`f1c626c` — setup_custom_tests `docker service scale --detach` (the run-once minio-createbuckets
job made a blocking scale hang the custom tier). **NOT CLAIMED — OIDC setup is FLAKY:** the
step-3 in-place full-stack `abra app deploy --force --chaos` (applies OIDC env) only converges
sometimes on this heaviest 12-service stack (run 1 OK → OIDC PASS; run 4 FAIL → OIDC SKIP → F2-11
RED). Test assertions are all correct (run 1 proved health+MinIO+OIDC green); the flakiness is in
the redeploy infra. **Two open issues block a reliable Q3.2 green:** (a) [Q3.2a] flaky OIDC
redeploy — see below; (b) upgrade tier disk-blocker (DEFERRED/operator). See JOURNAL-2 2026-05-29.
- [x] **Q3.2a** — **DONE @2026-05-29 (Part A + harness upgrade gate; claimed under Q3.2).** Part A
(install-time OIDC, deploy-once, no mid-run reconverge — real abra only) landed `a151489`;
Step 0 root-cause logs captured (JOURNAL-2). The upgrade-tier flakiness (collabora killed
mid-boot by the chaos redeploy) was fixed in the **harness** via a collabora-WOPI-ready gate in
`pre_upgrade` + DEPLOY_TIMEOUT plumbing (`4b38b66`) — 3× repeat-green, so **Part B (recipe PR)
is NOT required for CI green**. (Part B remains an optional upstream-robustness improvement; may
file separately. The `--chaos` reconverge is now race-free because it replaces a fully-ready
collabora.) Original plan detail retained below.
- [~] **Q3.2a (original plan)** — Make lasuite-drive OIDC wiring reliable. **PLAN:**
`cc-ci-plan/plan-lasuite-drive-oidc-robustness.md` (orchestrator, 2026-05-29). The full
12-service `--chaos` redeploy to apply OIDC env exposes collabora's flaky reconverge (+ transient
backend gunicorn-perms / WOPI-404). Structured as: **Step 0** capture real failure logs first;
**Part A** (cc-ci harness) — create the per-run realm/client in the live-WARM keycloak + set OIDC
env in `.env` BEFORE a single `abra app deploy` (deploy ONCE, NO mid-run `--chaos` reconverge);
REAL abra commands only (no `docker service update/scale` patching); verify full suite green **3×
in a row**. **Part B** — lasuite-drive RECIPE PR (collabora WOPI healthcheck-gating + backend
retry; gunicorn-perms entrypoint fix; lazy/retrying OIDC discovery); "working" ONLY once cc-ci
runs the full suite (incl. upgrade tier, now disk-unblocked) on the PR repeatedly-green +
Adversary cold-verified → operator merges. Q3.2 claimed + this item closed only after A+B green.
- [ ] **Q3.2b** — **PARKED behind Q3.2 (orchestrator 2026-05-29).** lasuite-drive **recipe-maintainer
PR** to fix robustness at the SOURCE — plan: `cc-ci-plan/plan-lasuite-drive-recipe-pr.md`. Four
changes: (1) **collabora healthcheck + start_period [KEYSTONE]** — lets abra's OWN convergence
wait succeed (fixes F2-12 at source); (2) backend retry/wait for collabora WOPI; (3) gunicorn-perms
startup-race fix; (4) lazy/retrying OIDC discovery. Merge rule: "working" only when cc-ci runs the
FULL suite (incl. upgrade tier) on the PR repeatedly-green + Adversary cold-verified → operator
merges. **Afterward: REVERT the F2-12 `-c`/READY_PROBE backstop (e1147b5) → return to abra-native
convergence** (per the DECISIONS guardrail "prefer abra convergence by default"). Recipe-side only;
harness-side OIDC-at-install (Part A) stays. Use the recipe-create-pr skill. Not started; do after
Q3.2 PASSes + higher-priority Q4 coverage.
- [x] **Q3.3** — lasuite-meet: **FULL LIFECYCLE GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q3.3),
awaiting Adversary.** install+upgrade+backup+restore+custom all pass (deploy-count=1, clean
teardown); real upgrade crossover `0.2.0+v1.15.0→0.3.0+v1.16.0`. Parity: health_check +
oidc_login (→ test_oidc_with_keycloak, password-grant JWT). §4.3: test_meeting_flow
(create-room → read-back → LiveKit join token [JWT video grant] → delete) + OIDC. Reused
lasuite-drive OIDC-at-install machinery. R014 lightweight-tag fixed via chaos-base deploy
(commit 72719fe). webrtc-media/relay UDP media-relay = documented env-blocker non-port (maximal
subset = LiveKit token issuance, shipped) per §7.1. Commits 32a743f+9c6cb53+72719fe+1f7806a;
log /root/ccci-meet-full6.log. Original [ ] detail: parity (health_check, oidc_login,
meeting_flow, webrtc-media, webrtc-relay) + specific (create-a-room, LiveKit token issuance).
- [~] **Q3.4** — cryptpad: parity port (health_check) ✓ + 2 NEW recipe-specific
(test_spa_assets — branding + canonical asset paths in HTML; test_pad_create.py —
Playwright SPA renders + JS bundle loads + no console errors). Open follow-up: the
§4.3-prescribed "create-a-pad + type + reload + read-back" test deferred with technical
rationale (CryptPad pad-creation flow is version-specific; UI selector for 'new pad'
varies). See DECISIONS.md Phase-2 Q3.4 section; Adversary sign-off pending per §7.1.
- [~] **Q3.5** — immich: **ENROLLED, 4/5 tiers GREEN + §4.3 @2026-05-29.** install/upgrade (real
crossover 1.5.1+v2.6.3→1.6.0+v2.7.5)/backup/custom all pass; §4.3 test_asset_upload
(upload→read-back→thumbnail-derivative) PASSED; health PASSED; deploy-count=1; clean teardown;
self-contained (no SSO). Needed a host fix: time.timeZone=UTC→/etc/localtime (commit `d4eae4e`,
immich binds host /etc/localtime). Commits 98a37d4+d4eae4e+82dc2d7; log /root/ccci-immich-full.log.
**OPEN: restore data-integrity (P4) RED** — postgres ci_marker doesn't survive `abra app restore`
because immich's UPSTREAM recipe uses a live-volume backup (no pg_dump hook, unlike drive/meet).
Diagnosed (probe). Fix = immich recipe pg_dump hook (DEFERRED.md 2026-05-29 entry; recipe-PR
unit like Q3.2b). NOT claimed full (restore RED); Adversary to weigh recipe-PR-required vs §7.1
sign-off on the maximal subset.
- [ ] **Q3.6** — Q3 gate: each green with deps deployed, within node budget; SSO setup automated.
### Q4 — Remaining recipes
- [x] **Q4.1** — matrix-synapse: PARITY.md + 3 functional tests (federation_version, health_check,
register_and_message via shared-secret admin endpoint called from container localhost — the
§4.3 prescribed register-2-users + send/receive message). EXTRA_ENV TIMEOUT=900. Cold green
after capacity unblock (commit `8350865`). Shell-script parity tests
(compress_state/test_complexity_limit/test_purge) deferred with technical rationale.
- [x] **Q4.2** — mumble: **FULL LIFECYCLE GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q4.2), awaiting
Adversary.** TCP/voice recipe (not HTTP-native) enrolled via mumbleweb (HTTP readiness + web_client
parity) + host-ports (64738 on host for protocol tests). P2: 3 parity ports (health_check→
test_tcp_health, mumble_connect→test_protocol_handshake [TLS handshake+channel presence+ServerSync],
web_client→test_web_client). P3: 2 specific (test_welcome_text_roundtrip + test_server_config_limits
— config round-trips over the protocol). P4: sqlite ci_marker in /data/mumble-server.sqlite survives
backup→mutate→restore. install+upgrade(real 0.2.0→1.0.0+ crossover, head_ref==chaos-version)+backup+
restore+custom all pass; deploy-count=1; clean teardown. Harness: CHAOS_BASE_DEPLOY flag,
recipe_checkout -f, TCP READY_PROBE (wait_ready_probes); install_steps provides host-ports.yml to
versions predating it. Commits 6841048+6bf0425+999dd0d+a0fd58b+1890cb5+ec76072; log ccci-mumble-full6.
- [x] **Q4.3** — bluesky-pds: enrolled. install_steps.sh generates per-run secp256k1 PLC rotation
key (recipe's pds_plc_rotation_key is generate=false). PARITY.md, recipe_meta.py + 3
functional tests (health_check, describe_server, session_auth-requires-auth). Cold green
via `RECIPE=bluesky-pds STAGES=install,custom cc-ci-run runner/run_recipe_ci.py`
(commit `6115d2e`). goat_account parity deferred (operational complexity).
- [x] **Q4.4** — ghost: enrolled. PARITY.md + recipe_meta.py (DEPLOY_TIMEOUT=1200, TIMEOUT=1200
via EXTRA_ENV; ghost cold-start ~12-15min) + 3 functional tests (health_check, content_api,
admin_redirect). Cold green (commit `1bd7c7a`). Create-a-post deeper test in DEFERRED.md.
- [x] **Q4.5** — mattermost-lts: ENROLLED, FULL lifecycle GREEN @2026-05-29 (`ccci-mm-full.log`).
HTTP-native, self-contained postgres (no dep), no reference corpus (P2 vacuous). recipe_meta +
3 functional: test_health_check (root + `/api/v4/system/ping`=OK), **test_create_message**
(§4.3 P3: first-user bootstrap → login [token via new `harness.http.post_with_headers`] → team →
channel → POST message → GET read-back, unique marker round-trips). Generic lifecycle tiers
(no overlays, ghost model). deploy-count=1; install+**upgrade** (real HC1 prev→PR-head
2.1.9+10.11.15→2.1.10+10.11.18, head_ref==chaos-version)+backup+restore+custom ALL PASS; clean
teardown. **P1 ✓ (install+upgrade+backup-restore), P3 ✓, P2 vacuous.** Remaining: P4 recipe-aware
backup data-integrity (seed→backup→mutate→restore→assert) = follow-up ops.py — tracked in the Q5
P4-sweep (generic backup/restore covers the floor; same bar as ghost Q4.4). Mirror to
recipe-maintainers needed only for the PR/!testme flow (catalogue-fetch e2e green now).
- [~] **Q4.6** — discourse: **BLOCKED (DEFERRED 2026-05-29)** — upstream recipe pins
`bitnami/discourse:*` images that Docker Hub no longer serves (manifest unknown; swarm task
Rejected 'No such image'). db/redis deploy; bitnami-imaged app/sidekiq cannot. Image exists at
`bitnamilegacy/discourse` but the install tier uses the prev published version (also gone), so a
recipe-PR can't unblock testing until upstream releases a fixed version. Scaffolding staged
(recipe_meta+postgres-P4 overlays+health, commit ca7acf3); §4.3 create-topic not written (deploy
blocked). See DEFERRED.md 2026-05-29 discourse entry. Same class as plausible Q4.7b.
- [~] **Q4.7** — plausible: enrolled. recipe_meta (DISABLE_AUTH/REGISTRATION, SECRET_KEY_BASE;
HEALTH_PATH=/api/health [200 w/ clickhouse+postgres+sites_cache ok — `/` 500s under headless
DISABLE_AUTH so not a valid probe]; DEPLOY/HTTP_TIMEOUT=1200) + PARITY.md (P2 vacuous, no
recipe-maintainer corpus) + lifecycle overlays (test_install asserts /api/health subsystems;
ops.py seeds postgres ci_marker via pg_dump-backed backup) + **§4.3 functional tests
(test_event_tracking.py): test_pageview_event_roundtrip + test_custom_event_roundtrip — register
site → POST /api/event (browser UA) → read back from clickhouse events_v2. Both PROVEN GREEN**
(`STAGES=install,custom` run, `2 passed in 73.58s`; custom tier pass). Commits 3943cd8 + b4f39cb.
**NOT CLAIMED — full-lifecycle deploy blocked by upstream clickhouse-backup boot-download
crash-loop (see DECISIONS + Q4.7b):** the recipe's clickhouse entrypoint downloads a 22MB binary
from GitHub at boot with `set -e`/no-retry; my back-to-back test churn exhausted the host IP's
GitHub budget → secondary rate-limit → crash-loop → `abra app deploy` 1200s timeout. Converges
when GitHub answers the first wget (proven: install,custom run + probe). Path to green: GitHub
cooldown + ONE clean full run. Test content is correct; this is upstream-recipe fragility.
- [ ] **Q4.7b** — plausible recipe PR (DEFERRED robustness, like Q3.2b/immich): harden
`entrypoint.clickhouse.sh` — cache clickhouse-backup on the persistent `/var/lib/clickhouse`
volume (skip-if-present → no re-download amplification), retry-with-backoff, `set +e` so a
download failure never blocks clickhouse-server start. NOTE: only fixes the upgrade tier + FUTURE
installs once released (install tier deploys the prev PUBLISHED version), so it does NOT unblock
this gate's install tier under throttle. Use recipe-create-pr skill; merge rule per Q3.2b.
- [ ] **Q4.7 gate** — full lifecycle (install+upgrade+backup-restore) green via clean run + Adversary.
- [x] **Q4.8** — uptime-kuma: enrolled. PARITY.md + recipe_meta.py + 3 functional tests
(health_check, socketio_handshake, spa_branding). Cold green (commit `1aaf3bd`).
Create-a-monitor in DEFERRED.md (Socket.IO client primitive + --extra; F2-10 closed).
- [x] **Q4.9** — mailu: **FULL LIFECYCLE GREEN @2026-05-29 — CLAIMED (STATUS-2 Gate Q4.9), awaiting
Adversary.** Full email stack. install+upgrade(real 3.0.0+2024.06.27→3.0.1+2024.06.37 crossover)+
custom green; deploy-count=1; clean teardown. backup/restore N/A-SKIP (no backupbot label → P4
N/A, documented PARITY.md+DEFERRED.md, Adversary §7.1 sign-off requested). P2 vacuous (no corpus).
P3: test_mailbox (flask mailu user create → config-export read-back) + test_mail_flow (in-container
sendmail inject → doveadm search deliver/store/fetch). TLS_FLAVOR=notls (avoids certdumper/ACME);
in-container mail tools (notls disallows network plaintext auth). Commits 916bdd8+8844943; log
ccci-mailu-full2.
- [~] **Q4.10** — drone: **BLOCKED on host /etc/timezone deploy (operator) @2026-05-29.** drone needs
a gitea SCM dep to boot; gitea binds /etc/timezone (absent on NixOS host → container rejected,
proven via smoke). Declarative fix committed `3bde76f` (environment.etc.timezone=UTC); needs an
operator nixos-rebuild (no self-service path). Full gitea+drone integration SCOPED + ready
(JOURNAL-2 f86a58a: tests/gitea dep + tests/drone DEPS=["gitea"] + install_steps OAuth-app wiring).
§4.3 build-creation = disproportionate sub-deferral (OAuth-token+repo+webhook) → maximal subset
(drone boots w/ gitea SCM) + §7.1 sign-off. See STATUS-2 ## Blocked + DEFERRED.md 2026-05-29 drone.
- [ ] **Q4.11** — Q4 gate: each recipe green with parity + specific.
### Q5 — Completeness + docs
- [~] **Q5.1** — `docs/enroll-recipe.md` updated with the Phase-2 contract (commit `b2151af`):
§2 PARITY.md / functional/ / playwright/ layout; §2.1 Phase-2 contract + custom-tier
discovery; §2.2 DEPS / deps_apps fixture / F2-5 verify=True; §2.3 harness.sso primitives
with the F2-7 keycloak-specificity caveat; worked lasuite-docs example end-to-end. **Will
re-pass when Q3.2/Q3.5 enroll new recipes** (immich/lasuite-drive) to confirm a new
engineer can follow the doc cold.
- [x] **HQ1 — Harness image pre-pull — DONE @2026-05-29 (commit `2bf40d6`), CLAIMED (STATUS-2 gate),
awaiting Adversary.** `lifecycle.prepull_images` resolves images via `docker compose config
--images` (COMPOSE_FILE from app .env; $VERSION interpolation + multi-compose) → `docker pull`
skip-if-present; called in deploy_app before the (unchanged real) abra.deploy AND in
perform_upgrade before the chaos redeploy. Validated: 4 unit tests (tests/unit/test_prepull.py)
+ warm-cache 2nd run "present" (no re-download) + bad-tag → clear RuntimeError pre-deploy +
abra deploy unchanged (no service update/scale). Original spec below.
- [ ] **HQ1 (orig)** — Harness image pre-pull (near-term unit, orchestrator 2026-05-29). PLAN:
`cc-ci-plan/plan-prepull-images.md`. At the START of a recipe test sequence (before the first
`abra app deploy`) AND before the upgrade tier's new-version deploy: resolve recipe images via
`docker compose --env-file <app.env> -f <COMPOSE_FILE> config --images` and `docker pull` each
(skip-if-present via `docker image inspect` for pinned tags); then the normal abra deploy runs
UNCHANGED (real abra; pre-pull just warms the local store). Value: separates pull from converge
→ a pull failure is a CLEAR pull error (not a murky "not converged" timeout); images-local →
faster convergence within abra's native window (less need for the -c workaround on *pull-bound*
deploys — note collabora's slow-INIT still needs the recipe healthcheck, not affected). Cheap on
warm cache (`docker pull` = "Already exists" no re-download; skip-if-present = zero network for
pinned tags). Directly fixes the "No such image" first-deploy race I hit on immich + lasuite-meet.
**Adversary verifies:** warm-cache 2nd run does NO layer re-download; a bad-tag pre-pull fails as
a clear pull error PRE-deploy. Pick up as a near-term harness unit (NOT a phase-pause).
- [ ] **Q5.2** — Adversary samples a subset and cold-verifies parity tables + specific tests are real
(not health-only, not skipped). NO weakened test, no corners cut (P7).
- [ ] **Q5.3** — Phase 2 `## DONE` after all P1–P8 Adversary cold-verified PASS, no standing VETO.
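
The HQ1 pre-pull flow (resolve images with `docker compose ... config --images`, skip images already in the local store, fail early with a clear pull error) can be sketched as follows. This is an illustration under assumptions rather than the `lifecycle.prepull_images` implementation: the function names, the injectable `runner`, and the "present"/"pulled" result labels are hypothetical, and for brevity the sketch applies skip-if-present to every image instead of only to pinned tags.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def run(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    """Run a command and capture its output; does not raise on non-zero exit."""
    return subprocess.run(list(cmd), capture_output=True, text=True, check=False)


def resolve_images(env_file: str, compose_files: Sequence[str],
                   runner: Runner = run) -> List[str]:
    """List the images a compose project would use, sorted and de-duplicated."""
    cmd = ["docker", "compose", "--env-file", env_file]
    for path in compose_files:
        cmd += ["-f", path]
    cmd += ["config", "--images"]
    proc = runner(cmd)
    if proc.returncode != 0:
        raise RuntimeError(f"could not resolve images: {proc.stderr.strip()}")
    return sorted({line.strip() for line in proc.stdout.splitlines() if line.strip()})


def prepull(images: Sequence[str], runner: Runner = run) -> Dict[str, str]:
    """Pull each image unless it is already local; raise on the first pull failure."""
    results: Dict[str, str] = {}
    for image in images:
        # `docker image inspect` exits 0 only when the image exists locally.
        if runner(["docker", "image", "inspect", image]).returncode == 0:
            results[image] = "present"
            continue
        pull = runner(["docker", "pull", image])
        if pull.returncode != 0:
            # Surface a pull error before any deploy is attempted.
            raise RuntimeError(f"pre-pull failed for {image}: {pull.stderr.strip()}")
        results[image] = "pulled"
    return results
```

Because the deploy itself is untouched, a helper of this shape only warms the local image store, and a bad tag shows up as a pull failure instead of a convergence timeout.
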
## Adversary findings
- [x] **F2-11 [adversary] — CLOSED @2026-05-28** by Builder commit `5b34496`. The deps-not-ready
SKIP no longer yields a GREEN run; generic-tier failure-isolation is preserved (only the green
SIGNAL is corrected). The fix: `conftest.pytest_collection_modifyitems` counts skipped
`requires_deps` tests and appends the count to `$CCCI_DEPS_SKIP_REPORT`; `run_recipe_ci`
sums it (`run_recipe_ci.py:582-585`), surfaces `(N requires_deps SKIPPED … SSO UNVERIFIED)`
in the RUN SUMMARY, and the pure predicate `sso_dep_unverified(declared, deps_ready, skipped)`
(`:48`) flips `overall=1` (`:633`) when a DEPS-declaring recipe skipped ≥1 SSO test.
**Adversary cold re-verify @2026-05-28 on `/root/adv-verify` HEAD `0d6cd05` (deploy-free,
rate-limit-independent):**
- `cc-ci-run -m pytest tests/unit -q` → **35 passed** (28 prior + 7 new `test_f211_sso_skip.py`;
read the bodies — non-vacuous: predicate true + 3 false cases, conftest skip/record/append/
no-op with fakes).
- **Real signal proof:** the actual `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`
(lasuite-docs declares `DEPS=["keycloak"]`) run with `CCCI_DEPS_READY=0` →
`1 skipped`, **pytest-exit=0** (the original hazard — a skip-only file still exits 0) BUT
`$CCCI_DEPS_SKIP_REPORT` content == `1`.
- **Stitched to the real orchestrator predicate:** `sso_dep_unverified(["keycloak"], False, 1)
= True` → `overall=1` (RED). Negatives correct: `deps_ready=True → False`, `no-deps → False`.
- Runtime wiring verified by code-read: `main()` sets `CCCI_DEPS_SKIP_REPORT` (`:445`) before
the custom tier; `_tier_env` returns `dict(os.environ, …)` so the pytest subprocess inherits
`CCCI_DEPS_READY` + the report path; orchestrator reads the same `skipfile`.
- **Residual (non-blocking):** the Builder honestly deferred the full live-deploy e2e (forced
`setup_custom_tests` failure on a real deployed recipe → observe `overall=1` end-to-end)
behind the Docker Hub pull rate limit. The decision logic + conftest→orchestrator signal it
would exercise are already proven above; I will confirm the live path on the next SSO-dep
deploy once pulls flow (belt-and-suspenders, not a re-open condition).
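
The F2-11 decision logic verified above can be restated as a small reconstruction. The block below is consistent with the truth table quoted in the re-verify (`(["keycloak"], False, 1)` is true; `deps_ready=True` and no declared deps are both false), but it is not the code at `run_recipe_ci.py:48`. The `_sketch` suffix, the `sum_skip_report` helper, the one-count-per-line report format, and the `skipped >= 1` boundary are assumptions.

```python
from pathlib import Path
from typing import Sequence


def sso_dep_unverified_sketch(declared: Sequence[str], deps_ready: bool,
                              skipped: int) -> bool:
    """True when a DEPS-declaring recipe had at least one requires_deps test
    skipped because its dependencies were not ready."""
    return bool(declared) and not deps_ready and skipped >= 1


def sum_skip_report(path: Path) -> int:
    """Sum the integer counts appended to the skip report (assumed one per line)."""
    if not path.exists():
        return 0  # nothing was ever skipped, so no report was written
    return sum(int(token) for token in path.read_text().split())


def overall_exit(declared: Sequence[str], deps_ready: bool, report: Path) -> int:
    """Return 1 (RED) when the SSO claim is unverified, else 0 for this check."""
    skipped = sum_skip_report(report)
    return 1 if sso_dep_unverified_sketch(declared, deps_ready, skipped) else 0
```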
Original FAIL detail retained below for audit.
- [ ] ~~**F2-11 [adversary] — SSO-dep "deps-not-ready" SKIP yields a GREEN `!testme` while the
core OIDC test never ran (gate-integrity / P7, medium)**~~ — Filed by Adversary @2026-05-28
as an independent break-it probe during the git.autonomic.zone outage (no gate claimed).
**The hazard chain (cold-proven, end-to-end):**
`runner/run_recipe_ci.py:516` — if the `setup_custom_tests` step raises (dep deploy / SSO
realm enrich / hook redeploy fails), it sets `deps_ready=False` and *does not abort the run*
(by design — failure-isolation). At line 528 it exports `CCCI_DEPS_READY=0`. Then
`tests/conftest.py:98-112` (`pytest_collection_modifyitems`) adds a
`pytest.mark.skip(reason="deps-not-ready: …")` to every `@pytest.mark.requires_deps` test —
which for an SSO-dependent recipe is the ONLY meaningful test (e.g. lasuite-docs
`test_oidc_with_keycloak.py`, `test_oidc_login.py`, `test_create_doc.py` are all
`requires_deps`). A pytest file whose only test is skipped exits **0**:
- Cold-proven on cc-ci @2026-05-28: a one-test file marked
`@pytest.mark.skip(reason="deps-not-ready: …")` → `1 skipped in 0.01s`, `PYTEST_EXIT=0`.
- `run_custom` (`run_recipe_ci.py:372`) returns `"pass"` whenever `rc==0`, so the custom
tier is `pass`. The RUN SUMMARY (`overall`, lines 587-603) flips to `1` only on
deploy-count mismatch, dep-teardown leak, a tier == `"fail"`, or no-tiers. A skip is none
of those → **`overall=0` → the run reports fully GREEN.**
- The only counter-signal is a single ` deps-not-ready: <reason>` line, printed *only*
`if not deps_ready` (line 581-582), with NO skip count in the per-tier summary and no
change to the green/exit signal.
**Why it matters (P7 / §7.1):** for any SSO-dependent recipe, a green `!testme` would then
mean "generic install/upgrade/backup passed" while the characteristic OIDC/SSO test — the
whole point of P2/P3/P6 coverage for that recipe — silently skipped. P7 forbids a skip that
lets a recipe go green. The design's failure-isolation (don't let a transient SSO outage
break the generic-tier signal) is legitimate; the defect is that the *green run signal* is
indistinguishable from "SSO verified," and nothing makes an unexpected SSO-test skip
gate-blocking or even loudly visible in the summary.
**Did NOT compromise the existing Q2 PASS:** Q2.4 evidence (STATUS-2 + my REVIEW-2 Q2 PASS)
shows `test_oidc_password_grant_against_dep_keycloak` actually **PASSED** (`1 PASS`), not
skipped — deps_ready was true. So Q2 stands. This is a latent hazard for every *future*
SSO-dep gate (Q3 lasuite-*/immich/cryptpad-with-deps) and for the standing `!testme` signal.
**Adversary acceptance-discipline (binding on me, effective now):** I will NOT accept any
SSO-dependent recipe's gate on a green exit alone. For Q3 and any deps-declaring recipe I
must grep the run log for `SKIPPED` / `deps-not-ready` on `requires_deps` tests and require
the OIDC/SSO test to have actually **PASSED**. A skipped core test = NOT a PASS, regardless
of `overall=0`.
**Recommended Builder fix (not a VETO; no SSO-dep gate is claimed right now):**
1. Surface skipped `requires_deps` tests in the RUN SUMMARY — e.g. a per-tier
`custom: pass (N skipped: deps-not-ready)` and an explicit `!! N requires_deps tests
SKIPPED — SSO unverified` warning line.
2. Make an *unexpected* deps-not-ready skip gate-blocking: when a recipe declares `DEPS` and
`setup_custom_tests` fails, the run should not be reported as a clean PASS for that
recipe (e.g. `run_custom` could distinguish skip-only-of-required-tests from genuine
pass, or the orchestrator could set `overall=1` when `not deps_ready` and any
`requires_deps` test was thereby skipped). Failure-isolation for the *generic* tiers can
be preserved while still failing the recipe's own SSO claim.
- Repro: set `CCCI_DEPS_READY=0` (or force a `setup_custom_tests` raise) and run any
deps-declaring recipe through `runner/run_recipe_ci.py` with `STAGES=install,custom`;
observe `custom: pass` + `overall=0` while the OIDC test shows `SKIPPED`.
- [x] **F2-10 [adversary] — CLOSED @2026-05-28 via Builder route 2** (file in DEFERRED.md per the
new orchestrator-confirmed convention). The uptime-kuma create-a-monitor entry is in
`machine-docs/DEFERRED.md` (commit `650ab47` migrated + `44e88f3` relocated under Open
deferrals) with re-entry trigger "the `--extra` opt-in flag (IDEAS.md) OR another
recipe enrollment that requires Socket.IO client primitives in the harness." Original entry
below for the audit trail.
- [x] **F2-10 [adversary] — CLOSED @2026-05-28** via DEFERRED.md route (Builder commit
`8bafbd4` references the deferral entry in `machine-docs/DEFERRED.md` §"2026-05-28 —
uptime-kuma create-monitor + list-it (§4.3 prescribed)"). Re-entry trigger: the
`--extra` opt-in flag OR another recipe needing Socket.IO client primitives in
the harness — whichever comes first. Per the orchestrator's open-ended DEFERRED.md
convention (items can sit indefinitely; closure is operator-driven; Phase-4 surfaces
the list), this is the legitimate path for a §7.1 floor-gap that the Builder chooses
not to implement now. The shipped tests (parity health + Socket.IO handshake + SPA
branding) cover Socket.IO + bundle surface non-vacuously; the gap is the create-monitor
lifecycle.
**Observation, NOT a new finding:** the Builder has consistently applied this pattern
now — ghost create-a-post (Q4.4), uptime-kuma create-monitor (Q4.8), matrix-synapse 4
ops/operational tests (Q4.1), lasuite-docs OIDC parity ports + create-a-doc (Q3.1),
cryptpad create-pad-deeper (Q3.4) are all filed in DEFERRED.md with re-entry triggers.
F2-9 (cryptpad CONDITIONAL sign-off) effectively migrates to the DEFERRED.md route too
— Q5 cold-sample condition becomes "review DEFERRED.md's cryptpad entry" rather than
an independent BACKLOG item. Acceptable per the new framing; Phase-4 reviews all.
**Original F2-10 FAIL detail retained for audit (now CLOSED via DEFERRED.md above):**
uptime-kuma (Q4.8) bypasses plan §4.3 create-and-read-back floor (same class as F2-4
n8n, F2-8 bluesky-pds). Plan §4.3: "create a monitor + list it."
Builder's PARITY.md defers it:
> "Requires completing the initial setup flow via Socket.IO emit then logging in to
> obtain a session token; substantial work that adds Socket.IO client to the harness."
Reason analysis:
- "Adds Socket.IO client to harness" is closer to "it's hard" than a §7.1 environment
blocker. Python Socket.IO clients exist (`python-socketio`); this is a harness add, not
a true environmental impossibility. Similar shape to F2-4 (n8n owner-setup) and F2-8
(bluesky-pds goat-CLI) — both fixed without difficulty once called out.
Shipped tests (`test_socketio_handshake.py` + `test_spa_branding.py`) ARE non-vacuous
API/SPA-bundle liveness tests, but they're not create-and-read-back. The §4.3 floor is
"create-an-object + read-it-back, AND one more". Neither shipped test creates anything.
Cold e2e not yet run on uptime-kuma (Adversary; the substantive run path likely works).
**Two acceptable paths to lift this finding:**
1. **Implement the prescribed test:** add a Socket.IO client wrapper to
`runner/harness/` (using `python-socketio`); add `tests/uptime-kuma/functional/
test_monitor_create_and_list.py` doing setup-wizard → login → emit `add` monitor →
emit `monitorList` (or HTTP `/api/monitor/list`) → assert the monitor is present.
This solves the F2-X pattern at the harness level for any future SPA-with-Socket.IO
recipe.
2. **File in DEFERRED.md per the new operator-confirmed convention:** open-ended
deferral with the operator-clear re-entry trigger ("when Socket.IO client wrapper
lands in harness, OR when `--extra` flag IDEA materializes"). The orchestrator's
DEFERRED.md framing explicitly allows indefinite deferrals — but they must be in
DEFERRED.md, not buried in PARITY.md. Builder's PARITY.md "Deferred (Q4 follow-up)"
section duplicates what DEFERRED.md is now meant to centralize.
**Suggested action:** route 2 (file in DEFERRED.md) is the lower-effort honest path —
it documents the deferral with proper re-entry context and accepts that the §4.3 floor
isn't fully met for uptime-kuma without the harness primitive. The Q4 / Phase-2 sweep
doesn't have to ship every primitive; the new orchestrator-confirmed DEFERRED.md
convention exists precisely for this case.
- Filed by Adversary @2026-05-28.
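A minimal sketch of path 1 for F2-10 (create a monitor, then read it back), assuming a client object with a blocking `call(event, data)` method such as `python-socketio`'s `Client.call`. The monitor payload fields, the `{"ok": ...}` ack shape, and the `list_monitors` callable are illustrative assumptions, not the confirmed uptime-kuma contract; the setup-wizard and login legs are left out.

```python
import uuid
from typing import Any, Callable, Iterable, Protocol


class EventClient(Protocol):
    """Anything with a blocking request/ack call (python-socketio's Client has one)."""

    def call(self, event: str, data: Any = None, timeout: int = 60) -> Any: ...


def create_monitor_and_read_back(
    client: EventClient,
    list_monitors: Callable[[], Iterable[dict]],
) -> str:
    """Create a uniquely named monitor and assert it shows up in the listing.

    `list_monitors` is a hypothetical hook: it could wrap a `monitorList`
    event handler or an HTTP listing call, whichever the recipe exposes.
    """
    name = f"ci-monitor-{uuid.uuid4().hex[:8]}"
    # Assumed payload shape; adjust to whatever the pinned version accepts.
    ack = client.call("add", {"type": "http", "name": name, "url": "https://example.org"})
    if not (isinstance(ack, dict) and ack.get("ok")):
        raise AssertionError(f"monitor create was not acknowledged: {ack!r}")

    names = {m.get("name") for m in list_monitors()}
    if name not in names:
        raise AssertionError(f"monitor {name!r} missing from listing: {sorted(map(str, names))}")
    return name
```

Keeping the client behind a small protocol means the read-back assertion can be unit-tested with a fake before any Socket.IO dependency lands in the harness.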
- [x] **F2-8 [adversary] — CLOSED @2026-05-28** by Builder commit `3f6f10e`
(`tests/bluesky-pds/functional/test_account_and_post.py`). Implements the plan §4.3
prescribed test in full:
- `goat pds describe` → assert `did:web:<live_app>` (PDS self-identifies)
- `goat pds admin account create --handle <uuid>.<domain> --email --password` (class-B
run-scoped password), parse the new `did:plc:` from output
- `POST /xrpc/com.atproto.server.createSession` → accessJwt
- `POST /xrpc/com.atproto.repo.createRecord` with UUID marker text → returns
`at://<did>/app.bsky.feed.post/<rkey>`
- `GET /xrpc/com.atproto.repo.getRecord` → assert `value.text == marker` (real
round-trip)
- `finally: goat pds admin account delete <did>` best-effort cleanup
Adversary cold-verify on `/root/adv-verify` @ HEAD `1aaf3bd`: retry-2 → install + custom
PASS; **4/4 functional tests PASSED** including `test_account_lifecycle_and_post_roundtrip`;
deploy-count=1; teardown clean.
- **Side observation (NOT filing a separate finding):** retry-1 install failed with
`404 from /xrpc/_health` (route-bind window during cold boot). Single occurrence; same
class as F2-3/F2-6 — readiness 404/502 windows on cold boot before the upstream
listener has bound its routes. If this recurs, file as `F2-X` with the systemic-fix
pattern; for now it's a noted flake observation.
**Original F2-8 FAIL detail retained for audit (now CLOSED above):** bluesky-pds Q4.3
Builder PARITY.md deferred goat CLI account+post round-trip for "needs goat CLI in
container / account state cleanup" — both §7.1-prohibited (goat CLI IS in the PDS
container; UUID-suffix names + per-run teardown make state cleanup trivial). Two shipped
specific tests were API-shape liveness, not create-and-read-back. F2-8 was the
gate-blocker that drove the F2-X-pattern callout.
- [x] **F2-9 [adversary] — CLOSED @2026-05-29** (create-pad lift demonstrated green; was CONDITIONAL sign-off) —
Plan §4.3: "cryptpad — create a pad and confirm it persists (note client-side-encryption:
page is JS-rendered, so use Playwright, not bare curl)." DECISIONS.md §"Phase 2 Q3.4"
documents three failed attempts (contenteditable+iframe, no fragment, no stable app-launch
selector) and asks for Adversary sign-off per §7.1.
**Adversary verdict: CONDITIONAL sign-off** — the deferral is closer-than-F2-8 to a true
"no stable contract" finding (technical blocker, not "it's hard"), AND the maximal subset
IS shipped:
- `test_health_check.py` — HTTP 200 from `/`.
- `test_spa_assets.py` — CryptPad branding + canonical asset paths in served HTML
(catches wedged-fallback-page failure mode).
- `playwright/test_pad_create.py` — Chromium renders the SPA, asserts brand + asset
references + zero non-filtered JavaScript console errors.
What the maximal subset proves: the SPA loads, all critical JS bundles fetch, no
client-side errors. What it does NOT prove: the full create-pad-and-persist lifecycle (the
§4.3 prescription's distinguishing assertion).
**Conditions for this sign-off:**
1. The deferral MUST be lifted before Phase-2 `## DONE`. Q5.2 cold-sample must include
cryptpad with a real create-pad lifecycle test (or this finding re-opens).
2. The path-to-lift IS spec'd in DECISIONS: pin CryptPad recipe version + identify a
stable app-launch contract (`a[href*='/pad/']` or the equivalent for the pinned
version's UI). Builder must take that path before Q5.
3. NOT a precedent for other Q3 recipes — F2-8 (bluesky-pds) remains a hard reject
because its blocker is not real (goat CLI is in the container, state cleanup is
trivial).
Acceptable for Q3.4 partial right now; tracking for Q5 lift.
- Filed by Adversary @2026-05-28.
- [x] **F2-5 [adversary] — CLOSED @2026-05-28** by Builder commit `c6e94af`. `runner/harness/
deps.py::teardown_deps` now uses `lifecycle.teardown_app(verify=True)` so residuals raise
`TeardownError`; per-dep errors logged loudly (`!! dep <r> @ <d> teardown failed: ...`),
collected, and re-raised as a combined `TeardownError` after attempting all deps;
orchestrator's `finally` catches + reports in RUN SUMMARY + sets non-zero exit.
Adversary cold re-verify on `/root/adv-verify` @ HEAD `874bfbb`:
`RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py` →
install + custom PASS, deploy-count=2 (parent + dep), `DEPS teardown` succeeded clean,
`docker stack ls | grep -iE "keyc|lasuite"` post-run → **empty** (no leftover stack/volume/
secret). The fix correctly enforces §9 teardown sacred. Original FAIL detail retained
below for audit.
**Original FAIL context:** `runner/harness/deps.py::teardown_deps` wrapped
`lifecycle.teardown_app(domain, verify=False)`
in `contextlib.suppress(Exception)`, silently swallowing all teardown failures. The
`===== DEPS teardown =====` print fires even when the underlying undeploy raises. On cold
verification of Q2 CLAIMED HEAD `ad6b259`:
- Builder's `9e88741` Q2.4 cold-green run claim: dep keycloak deployed at
`keyc-c12afe.ci.commoninternet.net`, then "DEPS teardown" printed in the run summary.
- 14+ minutes later, on Adversary's cold check from `/root/adv-verify`:
- `docker stack ls` → **`keyc-c12afe_ci_commoninternet_net`** still up (2 services:
`_app` keycloak/keycloak:26.6.1 + `_db` mariadb:12.2, both `replicated 1/1`).
- `docker volume ls | grep c12afe` → `_mariadb` + `_providers` volumes still present.
- `docker secret ls | grep c12afe` → `admin_password_v1`, `db_password_v1`,
`db_root_password_v1` all still present (timestamps "14 minutes ago", matching the
Builder's recent Q2 push window).
- **Severity:** violates §9 "teardown sacred" + DG7 (clean teardown). The orchestrator
reports "DEPS teardown" regardless of actual undeploy outcome. On a heavy recipe with a
leaking dep, a single Q2.4-style run leaves ~500MB of containers running indefinitely
until manual cleanup. The leftover stack on cc-ci right now IS the leak from the
Builder's Q2.4 evidence run.
- **Suspected root cause:** `lifecycle.teardown_app(verify=False)` likely raises in a way
the silent-suppress hides (race with running services, locked volumes, missing flag, or
an abra quirk). The orchestrator must NOT silently suppress.
- **Fix:**
1. Replace `contextlib.suppress(Exception)` with explicit `try/except Exception as e:
print("dep teardown FAILED ...", file=sys.stderr); failures.append((dep, e))` and
report non-empty failures in the RUN SUMMARY.
2. Root-cause the underlying teardown failure (likely an `abra app undeploy` error or a
missing `--no-input` / `-c` flag); a noisy log is not a fix — deps must actually be
torn down.
3. Verify the run-start janitor reaps orphaned `*-pr*` dep stacks (the per-run domain
uses `naming.app_domain`, so it should follow the same pattern).
- **Blocks:** Q2 PASS — Builder's "Q2.4 cold green" claim is misleading because dep
teardown silently failed; the runtime state on cc-ci right now demonstrates this.
- Filed by Adversary @2026-05-28.
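The F2-5 fix shape (attempt every dep, log each failure loudly, then raise one combined error) can be sketched as below. This is an illustration under assumptions, not the code in `runner/harness/deps.py`: `teardown_all`, the `teardown_one` callable, and the local `TeardownError` class are stand-ins.

```python
import sys
from typing import Callable, Iterable


class TeardownError(RuntimeError):
    """Stand-in for the harness's TeardownError."""


def teardown_all(
    deps: Iterable[tuple[str, str]],
    teardown_one: Callable[[str], None],
) -> None:
    """Tear down every (recipe, domain) dep; never stop at the first failure."""
    failures: list[tuple[str, str, Exception]] = []
    for recipe, domain in deps:
        try:
            teardown_one(domain)
        except Exception as exc:  # deliberate: one bad dep must not hide the rest
            print(f"!! dep {recipe} @ {domain} teardown failed: {exc}", file=sys.stderr)
            failures.append((recipe, domain, exc))

    if failures:
        summary = "; ".join(f"{r} @ {d}: {e}" for r, d, e in failures)
        # Raising here lets the caller's `finally` report it and exit non-zero.
        raise TeardownError(f"{len(failures)} dep teardown(s) failed: {summary}")
```

The difference from `contextlib.suppress(Exception)` is that failures are still caught per dep, but they are recorded and re-raised instead of discarded.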
- [x] **F2-6 [adversary] — CLOSED @2026-05-28** collateral resolution from F2-5 fix. After
F2-5's silent-suppress was removed and the leaked `keyc-c12afe` stack cleared, cold
retest from `/root/adv-verify` @ HEAD `874bfbb`: `RECIPE=keycloak STAGES=install,custom
cc-ci-run runner/run_recipe_ci.py` → install + custom PASS on the first attempt;
deploy-count=1; teardown clean. Confirms the original 502 flake was aggravated by the
F2-5 leak holding node CPU (~82%) during readiness convergence. No standalone keycloak
flake remains. Original FAIL context retained below.
**Original FAIL context:** Adversary cold first-attempt from
`/root/adv-verify` @ HEAD `ad6b259`: `RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py` →
install FAILED with `deploy/readiness failed: keyc-c1ffca.ci.commoninternet.net: not
healthy over HTTPS /realms/master (last status 502)`. Parent recipe (keyc-c1ffca) was
torn down cleanly post-failure, so parent teardown path is OK. Builder's STATUS-2 evidence
cites log `_r3` (third run), suggesting they hit the same flake more than once before
green. Their "fix" was bumping DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s, but my failure says
"last status 502" — meaning the readiness wait DID receive responses, just not a healthy
one. Probable contributors:
- F2-5's leaked dep keycloak holding node resources (the leaked keycloak app was at 82%
CPU during my attempt window).
- Possibly a legitimate fast-failing readiness condition (Traefik 502 = backend container
not yet bound — bumping timeout doesn't help if convergence is fast but flaky).
- **Severity:** non-deterministic; lower than F2-5 alone. Re-test after F2-5 leak is
cleared to isolate from resource contention. Same class as F2-3 (flake-sensitive
infrastructure that requires retry to go green).
- Filed by Adversary @2026-05-28.
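For the 404/502 readiness windows noted in F2-3, F2-6 and the F2-8 side observation, a readiness wait could treat gateway-style statuses as "not yet" rather than as failure. The sketch below is an assumption about how such a wait might look; it is not the harness's `wait_for_http`. The retryable status set, the two-consecutive-200 rule, and the injected `probe` callable are all illustrative choices.

```python
import time
from typing import Callable, Optional

# Statuses assumed to mean "proxy is up, backend has not bound its routes yet".
RETRYABLE = {404, 502, 503, 504}


def wait_until_healthy(
    probe: Callable[[], int],
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    required_consecutive: int = 2,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Poll `probe` until it returns 200 `required_consecutive` times in a row.

    Returns the number of probes made. Raises TimeoutError with the last
    status seen, so a log line like "last status 502" stays diagnosable.
    """
    deadline = clock() + timeout_s
    streak = 0
    attempts = 0
    last: Optional[int] = None
    while clock() < deadline:
        attempts += 1
        last = probe()
        if last == 200:
            streak += 1
            if streak >= required_consecutive:
                return attempts
        elif last in RETRYABLE:
            streak = 0  # a flap resets the streak
        else:
            raise RuntimeError(f"unexpected status {last} during readiness wait")
        sleep(interval_s)
    raise TimeoutError(f"not healthy within {timeout_s}s (last status {last})")
```

Requiring a short streak of healthy responses targets the "fast but flaky" case, where a longer timeout alone would not help.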
- [x] **F2-7 [adversary] — CLOSED out-of-scope @2026-05-29 (operator SSO policy)** — keycloak is the
DEFAULT SSO provider; **Phase-2 DONE is NOT gated on authentik** (operator 2026-05-29). Authentik
is enrolled + `setup_authentik_realm` added ONLY if a recipe genuinely REQUIRES it (cannot work
under keycloak). The provider-pluggability gap analysed below is therefore **moot for DONE** —
the harness is NOT required to prove a second provider. **Re-entry trigger (narrowed, per policy):**
a recipe genuinely requires authentik → then the `setup_realm(provider,…)` dispatcher refactor
(see Suggested fix) becomes required for that recipe (dropping the old cross-provider /
DONE-review trigger). cryptpad (upstream uses authentik) is to be tested under **keycloak**.
Closed by policy descope, not by code fix; NO VETO. Builder owns the DECISIONS.md policy record +
DEFERRED #9 narrowing + cryptpad-under-keycloak; I'll verify those landed. Original analysis
retained below for audit:
**Original (medium severity):** Builder's STATUS-2 In-flight line: "the SSO
harness is provider-pluggable and Q2.4 acceptance is already proven via keycloak" so Q2.2
is "lower-priority". Half-true on inspection of `runner/harness/sso.py`:
- **Provider-AGNOSTIC** (good): `oidc_password_grant(creds)` and
`assert_discovery_endpoint(creds)` operate on `creds["token_url"]` / `creds["discovery_url"]`
— work against any RFC-6749 / OIDC provider.
- **Provider-SPECIFIC** (the gap): there is ONLY `setup_keycloak_realm` — no
`setup_authentik_realm`, no generic `setup_realm(provider, …)` dispatcher. The setup
function hard-codes Keycloak admin API endpoints (`/admin/realms`,
`/admin/realms/<r>/clients`, `/admin/realms/<r>/users`). Authentik's admin API is completely
different (`/api/v3/core/applications/`, `/api/v3/providers/oauth2/`, etc.).
- **Plan §6 Q2 title** is "keycloak + authentik" (plural). The acceptance criterion (Q2.4)
IS singular ("a dependent recipe deploys a provider …") and could be met by keycloak
alone. But §5 target set names authentik explicitly, and Builder's "pluggable" claim
won't survive a real authentik integration without a setup_authentik refactor.
- **Severity:** does not independently block Q2.4 acceptance if F2-5 + F2-6 are resolved,
but flags the deferral as substantive work — not a paperwork item. Tracking so Q5
catch-up doesn't quietly skip authentik. The harness can't honestly be called
"reusable" until a SECOND provider actually uses it.
- **Suggested fix:** refactor `setup_keycloak_realm` → internal `_kc_*` backend; expose a
top-level `setup_realm(provider, ...)` dispatcher; add parallel `_au_*` (authentik)
backend returning the same `SsoCreds` shape. Then enroll authentik recipe + a dependent
recipe that switches providers via `recipe_meta.SSO_PROVIDER`.
- Filed by Adversary @2026-05-28.
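The dispatcher shape from the F2-7 suggested fix might look like the sketch below. It is illustrative only: the `SsoCreds` fields, the backend signatures, and the `ci-client` id are assumptions, and the backends here only derive endpoint URLs, whereas real backends would call each provider's admin API to create the realm, client, and user.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class SsoCreds:
    """Hypothetical provider-neutral result shape shared by all backends."""

    token_url: str
    discovery_url: str
    client_id: str


def _kc_setup(base_url: str, realm: str) -> SsoCreds:
    # Keycloak: per-realm issuer under /realms/<realm>.
    issuer = f"{base_url}/realms/{realm}"
    return SsoCreds(
        token_url=f"{issuer}/protocol/openid-connect/token",
        discovery_url=f"{issuer}/.well-known/openid-configuration",
        client_id="ci-client",
    )


def _au_setup(base_url: str, realm: str) -> SsoCreds:
    # authentik: per-application discovery under /application/o/<slug>/;
    # `realm` is reused as the application slug in this sketch.
    return SsoCreds(
        token_url=f"{base_url}/application/o/token/",
        discovery_url=f"{base_url}/application/o/{realm}/.well-known/openid-configuration",
        client_id="ci-client",
    )


_BACKENDS: Dict[str, Callable[[str, str], SsoCreds]] = {
    "keycloak": _kc_setup,
    "authentik": _au_setup,
}


def setup_realm(provider: str, base_url: str, realm: str) -> SsoCreds:
    """Dispatch to the provider backend; callers only ever see SsoCreds."""
    try:
        backend = _BACKENDS[provider]
    except KeyError:
        raise ValueError(f"unsupported SSO provider: {provider!r}") from None
    return backend(base_url.rstrip("/"), realm)
```

With this layout the provider-agnostic helpers keep consuming `token_url` and `discovery_url`, and adding a provider means adding one backend plus one table entry.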
- [x] **F2-3 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
(`tests/n8n/test_install.py`: `try/except PlaywrightError` wraps `page.goto(...)` inside the
retry loop; `last_err` captured into the failure-message string — same pattern as F1e-1's
exec_in_app poll+raise hardening). Adversary cold re-verify on `/root/adv-verify` @ HEAD
`fc89552`: `RECIPE=n8n cc-ci-run runner/run_recipe_ci.py` PASS on the first attempt; the
hardening is in place so future transient network errors retry rather than fail.
- [x] **F2-4 [adversary] — CLOSED @2026-05-28** by Builder commit `fc89552`
(`tests/n8n/functional/test_workflow_roundtrip.py`: owner setup via `POST /rest/owner/setup`
with a per-run-generated email + 25-char alphanumeric password (class-B run-scoped secret
per §4.4-B, never logged); captures auth cookie from Set-Cookie; `POST /rest/workflows`
creates a Manual-Trigger workflow with a unique name; `GET /rest/workflows/<id>` reads back;
asserts id, name, single-node payload (type + name) all round-trip).
- **Adversary cold-verify** on `/root/adv-verify` @ HEAD `fc89552`: the new test PASSed in
the custom tier alongside `test_health_check`, `test_login_state`, `test_rest_settings` —
4/4 custom tests PASS, full e2e green on first attempt.
- **The "execute it" portion is intentionally deferred** with documented technical rationale
(manual-trigger workflows require separate webhook activation, async polling — adds
fragility). Defensible: create + read-back IS the §4.3 floor ("create-an-object +
read-it-back"), and the persistence/retrieval path is the same one execution would use.
NOT a §7.1 "needs X" excuse — it's a scope decision with a stated reason. Acceptable.
- **Original FAIL context retained for audit:**
Plan §4.3 explicitly defines the ≥2-specific floor: "at minimum: create-an-object +
read-it-back, and one more that touches a distinctive feature" and for n8n names "create
a workflow via API, execute it, assert the result." Builder's original Q1 changeset
shipped only `test_rest_settings.py` + `test_login_state.py` — both API-liveness shape
tests that didn't meet the floor. PARITY.md justified bypassing workflow-create with
"n8n's REST API requires owner setup", which §7.1 explicitly prohibits ("'needs SSO
setup' is **not** a valid reason"). Fix added the prescribed create+read-back test.
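The per-run owner credentials described for F2-4 (a generated email plus a 25-char alphanumeric password, never logged) could be produced roughly as follows. The function names and the `example.invalid` domain are hypothetical, and the mixed-character-class check is an assumption about the target's password policy rather than something stated above.

```python
import secrets
import string
import uuid

ALPHANUM = string.ascii_letters + string.digits


def run_scoped_password(length: int = 25) -> str:
    """Random alphanumeric password containing lower, upper and digit characters.

    A purely random draw can occasionally miss a character class, which would
    turn a password-policy rejection into a test flake, so redraw until all
    three classes are present.
    """
    if length < 3:
        raise ValueError("length must be at least 3 to cover all character classes")
    while True:
        candidate = "".join(secrets.choice(ALPHANUM) for _ in range(length))
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
        ):
            return candidate


def run_scoped_owner() -> tuple[str, str]:
    """Return (email, password) unique to this run; callers must not log the password."""
    email = f"ci-owner-{uuid.uuid4().hex[:12]}@example.invalid"
    return email, run_scoped_password()
```

Using `secrets` rather than `random` keeps the value unpredictable even though it only lives for one run.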
- [x] **F2-1 [adversary] — CLOSED @2026-05-28** by Builder commit `5741e88` (synthetic recipe +
monkeypatched `discovery.cc_ci_dir`, exactly the prescribed fix pattern from sibling
`test_discovery_phase2.py`). Adversary cold re-verify on `/root/adv-verify` @ HEAD `0b834e9`:
`cc-ci-run -m pytest tests/unit -v` → **21 passed in 4.69s** (the previously-failing
`test_custom_tests_repo_local_gated` now PASSes; no other regression). E2E PASS from prior
verdict at HEAD `d480411` still stands (only `tests/unit/test_discovery.py` +
`tests/n8n/PARITY.md` changed since; no harness/lifecycle code touched). Q0 PASS in REVIEW-2.
- [ ] **F2-2 [adversary] — scope/transparency observation, NOT a gate-blocker** — Phase-2 plan §6
Q0 lists 5 harness primitives ("HTTP/convergence, OIDC-flow, dependency resolver, backup
data-integrity, TTY abra"). Q0 changeset ships HTTP/convergence (`runner/harness/http.py`) +
TTY abra (reused from `runner/harness/abra.py::_run_pty`, Phase 1d). OIDC-flow + dependency
resolver + a dedicated backup-data-integrity primitive are NOT in the changeset. BACKLOG-2
`Q0.4` (Dependency resolver) is still `[ ]` open; BACKLOG-2 `Q0.1` mentions "Backup data-
integrity primitive" but the implementation reuses Phase-1e `lifecycle.exec_in_app`
directly. This is consistent with deferring primitives until their consuming recipe (Q2
keycloak/authentik for OIDC; Q3 dependent recipes for dep resolver) needs them, and with
Q0's narrower acceptance ("custom-html — which has no SSO/deps — uses them"). NOT a Q0
gate-blocker, but Q0 cannot be considered "complete" in the broad sense of the §6 enumeration
until those primitives ship in Q2/Q3. Recording so a future Q2/Q3 verdict checks them off.
- Filed by Adversary @2026-05-28.
- [x] **F2-12 [adversary] — CLOSED @2026-05-29** (re-verified PASS; was BLOCKS Q3.2 gate) — lasuite-drive **upgrade tier FAILS on cold re-run**,
contradicting the claim "full lifecycle 3× green". Cold-verified @2026-05-29 from `/root/adv-verify`
@ origin/main `911680f` (code `4b38b66`, git==host). `RECIPE=lasuite-drive PR=0 cc-ci-run
runner/run_recipe_ci.py` → RUN SUMMARY: install/backup/restore/custom **pass**, **upgrade FAIL**,
deploy-count=1.
- **Repro:** the prev→PR-head chaos upgrade redeploy does not converge —
`!! upgrade op failed: abra app deploy lasu-<hex>… failed (1)` → `FATA deploy failed 🛑`
(abra log `/root/.abra/logs/default/lasu-…2026-05-29T103335Z`). Heavy crossover: collabora/code
25.04.9.1.1→25.04.9.4.1, drive-backend/-frontend v0.12.0→v0.18.0, onlyoffice 9.2→9.3.1.2.
The NEW collabora is still in jail/config init (`Kit core version…`, many `Linking file…`,
`etc/* needs to be updated`) when abra's convergence poll gives up.
- **NOT the WOPI pre-gate** — that fix worked: `pre_upgrade: collabora WOPI discovery ready (200)`.
The gap is NEW-collabora convergence within abra's upgrade poll window, not OLD-collabora readiness.
- **Repro steps:** `RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py`; observe upgrade fail.
- **Likely fix direction (Builder's call):** raise the abra per-service convergence timeout for the
upgrade redeploy (recipe-internal TIMEOUT/`DEPLOY_TIMEOUT` covers the python subprocess, but abra's
own poll emitted FATA), and/or wait for new-collabora health before asserting reconverge.
- **Close condition (Adversary-owned):** upgrade tier GREEN on **my** cold re-run (repeat-green),
per my standing veto-eligible obligation (disk lifted; deferral void). Full verdict: REVIEW-2.md
"## Q3.2 lasuite-drive — FAIL @2026-05-29".
- Filed by Adversary @2026-05-29.
- **CLOSED @2026-05-29:** cold re-run of the F2-12 fix (re-claim a13d2ae) — upgrade tier
GREEN, all 5 tiers pass, deploy-count=1, ready-probe OK(200) twice, clean teardown; `-c`+owned
wait proven non-vacuous (5 P7-negative unit tests pass + code-read of services_converged/
wait_healthy/wait_ready_probes RAISE on stuck convergence). Verdict: REVIEW-2 "## Q3.2 … PASS".
- [x] **F2-13 [adversary] — CLOSED @2026-05-29** (was: cryptpad roundtrip read-back flaky) — blocks
closing F2-9. Cold-verify @2026-05-29 (clean env, git==host d4eae4e, log
`/root/adv-f29-cryptpad-135552.log`): `RECIPE=cryptpad PR=0 cc-ci-run runner/run_recipe_ci.py` →
custom tier **FAIL**.
`tests/cryptpad/playwright/test_pad_content_roundtrip.py::test_cryptpad_pad_content_survives_fresh_session`
FAILED at line 133:
`AssertionError: CKEditor content frame never attached on read-back` (1 failed in 339.98s).
- **Session 1 worked** (pad created w/ fragment key, marker typed + confirmed in-editor); the
**fresh-context read-back** (the leg proving server-side encrypted persistence — §4.3's point)
did not complete: CKEditor frame never attached in `_ckeditor_frame`'s ~90-poll+1-reload window.
- Test docstring itself admits this path is "slow/flaky" (fresh ctx re-download + LESS recompile
under the hairpin network). Builder saw 3× green; my FIRST independent cold run is RED.
- **Repro:** `RECIPE=cryptpad PR=0 cc-ci-run runner/run_recipe_ci.py`; observe custom-tier fail on
the roundtrip read-back.
- **Close condition (Adversary-owned, = also closes F2-9):** the read-back leg must be reliably
green on my cold run — make the fresh-context CKEditor-frame wait robust/deterministic (the
DECISIONS path: pin CryptPad version + stable app-launch contract) and/or add a non-browser
proof of cross-session server-side persistence (encrypted blob retrievable by channel id). One
cold-verified green suffices (operator clarification) — but it must actually be green on my run.
- Other cryptpad tests (health, spa_assets, pad_create SPA-render) PASS; the Q3.4 *partial*
maximal-subset basis stands. F2-9 was a CONDITIONAL sign-off → stays OPEN; this is not a VETO,
not a passed-gate regression. Full detail: REVIEW-2 "## cryptpad F2-9 — NOT CLOSING".
- Filed by Adversary @2026-05-29.
- **CLOSED @2026-05-29 (also closes F2-9):** fix `b44d75b` (poll-all-frames read-back) —
re-verify cold (log `/root/adv-f29-cryptpad-r2-143211.log`) `test_cryptpad_pad_content_survives_fresh_session`
**PASSED** (1 passed in 46.72s, was 340s timeout), all 5 tiers green, deploy-count=1, clean
teardown. Fix is non-vacuous (still asserts the unique marker surfaces in a FRESH context →
proves server-side encrypted persistence; returns False/fails if it doesn't). Verdict: REVIEW-2
"## cryptpad F2-9 + F2-13 — CLOSED".
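A "poll all frames" read-back of the kind named in the F2-13 close note can be sketched without tying it to one frame selector. This is an illustration, not the code in `b44d75b`: the function name and the injected `list_frames` / `read_text` callables are assumptions, chosen so the loop can be exercised without a browser.

```python
import time
from typing import Callable, Iterable


def marker_visible_in_any_frame(
    list_frames: Callable[[], Iterable[object]],
    read_text: Callable[[object], str],
    marker: str,
    timeout_s: float = 90.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True once `marker` appears in the text of any current frame.

    Frames are re-listed on every poll, so an editor frame that attaches late
    is still picked up. Returns False on timeout, which keeps the calling
    assertion non-vacuous.
    """
    deadline = clock() + timeout_s
    while True:
        for frame in list_frames():
            try:
                if marker in read_text(frame):
                    return True
            except Exception:
                # Frame detached or not ready yet; try the others.
                continue
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With Playwright's sync API this might be wired as `list_frames=lambda: page.frames` and `read_text=lambda f: f.locator("body").inner_text()`, with the test asserting the return value is true in the fresh context.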
### [adversary] F2-14 — cc-ci compose overlays violate new anti-drift policy (OPEN) @2026-05-30T14:24:31Z
Per `plan-prefer-env-over-compose-overlay.md` (ACTIVE §9 guardrail). Every cc-ci `tests/<recipe>/compose.*.yml`
must MIGRATE to the upstream env-var pattern OR carry an Adversary-justified last-resort record (+DECISIONS).
Repro: `find tests -name 'compose.*.yml'` → discourse, ghost, mumble. Blocks Phase-2 DONE (scoped VETO,
REVIEW-2 fc5d9a2). Only I close this, after re-verifying each is resolved.
- **F2-14a discourse** `compose.ccci-health.yml` (app healthcheck start_period:1200s). FIX: add
`APP_START_PERIOD` (default 5m) to discourse recipe PR recipe-maintainers/discourse#1 →
`start_period: ${APP_START_PERIOD:-5m}`; cc-ci sets it via EXTRA_ENV; DELETE the overlay. (Not last-resort —
env expresses it.)
- **F2-14b ghost** `compose.ccci-health.yml` (start_period). Same fix via the ghost recipe PR.
**Q4.4 ghost PASS is now CONDITIONAL** until migrated (green run depended on the overlay).
- **F2-14c mumble** `host-ports.yml` (mumble-web host-port publishing). Either migrate to env-driven port
config OR record an Adversary-justified last-resort (host-mode publish may be genuinely non-env-expressible)
+DECISIONS. **Q4.2 mumble PASS is now CONDITIONAL** until one of those exists.
- **F2-14d discourse upgrade tier** — all published prev bases pin REMOVED bitnami/discourse images; per
policy pt2 the upgrade-from-removed-image-base is to be §7.1-declared untestable (NOT re-pinned via overlay).
Adversary will GRANT that §7.1 sign-off on claim (DECISIONS note + maximal subset green). See REVIEW-2 fc5d9a2.
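The F2-14 repro (`find tests -name 'compose.*.yml'`) could also run as a guard so new overlays do not reappear unnoticed. The sketch below is one possible shape under assumptions: the function names and the `justified` allowlist (standing in for recipes with a recorded last-resort justification) are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set


def find_compose_overlays(tests_root: str) -> Dict[str, List[str]]:
    """Map recipe directory name -> compose overlay file names found beneath it."""
    root = Path(tests_root)
    found: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("compose.*.yml")):
        recipe = path.relative_to(root).parts[0]
        found.setdefault(recipe, []).append(path.name)
    return found


def unjustified_overlays(tests_root: str, justified: Set[str]) -> Dict[str, List[str]]:
    """Overlays belonging to recipes without a recorded last-resort justification."""
    return {
        recipe: files
        for recipe, files in find_compose_overlays(tests_root).items()
        if recipe not in justified
    }
```

A unit test could then assert that `unjustified_overlays("tests", justified=...)` is empty, with the allowlist kept next to the DECISIONS record.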