# STATUS — Phase 2 (per-recipe test authoring)
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2-recipe-tests.md`
**Loop state for THIS phase:** STATUS-2 / BACKLOG-2 / REVIEW-2 / JOURNAL-2 (DECISIONS.md shared).
Phase 1/1b/1c/1d/1e STATUS/BACKLOG/REVIEW files are HISTORY (all DONE) — not this phase's state.
## Phase
Phase 2 authors **per-recipe test content** on top of the corrected Phase 1/1d/1e shared harness.
Per the plan, for every maintained Co-op Cloud recipe (§5 target set), the cc-ci `tests/<recipe>/`
tree must carry:
- Phase-1d/1e **lifecycle overlays** (assertion-only, additive) — `test_install.py`, `test_upgrade.py`,
`test_backup.py`, `test_restore.py` + `ops.py` pre-op seeds.
- **Parity-ported** tests from `references/recipe-maintainer/recipe-info/<recipe>/tests/*.py`,
one-to-one (P2), with a `PARITY.md` mapping table.
- **≥2 NEW recipe-specific functional tests** (P3) — characteristic behavior, not just `status==200`.
- **Real backup data-integrity** (P4): seed → backup → mutate → restore → assert seeded data survived.
- **Dependency resolution** (P5): recipes that need other apps (SSO providers, DBs) deploy them in-run.
- **Playwright** (P6) where the app's core UX is a UI flow.
- **Docs** (P8): `docs/enroll-recipe.md` updated with the per-recipe test contract + worked example.
## Definition of Done (Phase 2) — P1–P8, each Adversary cold-verified in REVIEW-2
- [ ] **P1 — Coverage.** Every recipe in §5 target set has a `tests/<recipe>/` suite enrolled and a
full green `!testme` run (install + upgrade + backup-restore).
- [ ] **P2 — Parity port.** Every `recipe-info/<recipe>/tests/*.py` has a comparable cc-ci test;
`tests/<recipe>/PARITY.md` records the mapping; non-ports documented in DECISIONS.md.
- [ ] **P3 — Recipe-specific depth.** Each recipe has **≥2 new functional tests** beyond parity
(characteristic behavior, real assertions on app state/responses).
- [ ] **P4 — Backup data-integrity is real.** Seed → backup → mutate → restore → assert seeded data
survived (recipe-aware, not health-only). Pattern already proven in Phase 1e on custom-html.
- [ ] **P5 — Dependencies handled.** Recipes with deps declare them; harness deploys deps within the
run (respecting `MAX_TESTS`); SSO setup runs automatically.
- [ ] **P6 — Browser flows where they matter (D3).** UI-centric recipes have a Playwright test of the
core flow (login, create-an-object, etc.).
- [ ] **P7 — No weakened tests, no corners cut.** Every assertion is real; nothing skip/xfail'd,
mocked, or health-only stand-in. Any "untestable" claim is a true env-level blocker with
Adversary sign-off.
- [ ] **P8 — Docs.** `docs/enroll-recipe.md` updated with the per-recipe test contract (§4.1) and a
worked example; a new engineer can add a recipe's full suite from the docs.
## Milestones (plan §6)
- **Q0** — Harness additions (HTTP/convergence, OIDC-flow, dep resolver, backup data-integrity, TTY
abra). Reference recipe (custom-html) uses them for full parity+specific suite, green via `!testme`.
- **Q1** — Pattern proof (custom-html + n8n): full parity + ≥2 specific + real backup data-integrity.
- **Q2** — SSO providers (keycloak + authentik); reusable SSO-setup/OIDC-flow harness e2e.
- **Q3** — SSO-dependent suite (lasuite-docs, lasuite-drive, lasuite-meet, cryptpad, immich); deps
auto-deployed, SSO setup automated, parity + specific.
- **Q4** — Remaining recipes (matrix-synapse, mumble, bluesky-pds, ghost, mattermost-lts, discourse,
plausible, uptime-kuma, mailu, drone).
- **Q5** — Completeness + docs; flip `## DONE`.
## In flight
**Q4.6 discourse — BLOCKED/DEFERRED @2026-05-29.** Upstream recipe pins `bitnami/discourse:*` images
that Docker Hub no longer serves (manifest unknown; swarm task Rejected "No such image"). Image exists
at `bitnamilegacy/discourse` but the install tier deploys the prev published version (also gone), so a
recipe-PR can't unblock testing until upstream releases a fixed version (same class as plausible Q4.7b).
Scaffolding staged (recipe_meta + postgres-P4 overlays + health, commit ca7acf3); §4.3 create-topic not
written (deploy blocked). DEFERRED.md 2026-05-29 discourse entry. Node fully torn down/clean.
**Q4.9 mailu — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `2958eb6`); P4-N/A §7.1 sign-off GRANTED. DONE.**
install+upgrade(3.0.0→3.0.1)+custom green; backup/restore N/A-skip (no backupbot → P4 N/A, §7.1
sign-off requested); 2 functional (create-mailbox + send→deliver→fetch mail-flow). TLS_FLAVOR=notls;
in-container sendmail/doveadm. Commits 916bdd8+8844943; log ccci-mailu-full2. **NEXT: drone Q4.10**
(last §5 gap; HTTP single-service; no backupbot [P4 N/A]; functional depth needs gitea OAuth dep).
**Q4.10 drone — BLOCKED on a host /etc/timezone deploy (operator) @2026-05-29.** drone (last §5
recipe) is a CI server that REQUIRES a git-provider SCM to boot; its only dep is **gitea**, which
binds `/etc/timezone:ro` — absent on the NixOS host (`time.timeZone` makes only /etc/localtime). gitea
container REJECTED (proven via the drone+gitea smoke). **Declarative fix committed `3bde76f`**
(`environment.etc.timezone=UTC`); needs the operator host-deploy (`nixos-rebuild`, same mechanism as
the immich time.timeZone fix — no self-service path; `/root/cc-ci` is operator-synced + stale). The
full gitea+drone integration is SCOPED + ready (JOURNAL-2 `f86a58a`); §4.3 build-creation is a
disproportionate sub-deferral (maximal-subset + §7.1 sign-off). See ## Blocked + DEFERRED.md.
**Q4.7 plausible — Adversary finding ACK @2026-05-29 (REVIEW-2 `0efcc36`).** Test content + deferral
verified sound; only gap: my "§4.3 proven green" claim lacks a surviving evidence log on cc-ci.
Builder action: after mumble, run `RECIPE=plausible PR=0` (or functional subset) when the GitHub/
clickhouse-backup rate-limit cooldown allows ClickHouse to boot, and PRESERVE the log
(`/root/ccci-plausible-green.log`) so the two `*_event_roundtrip` tests are independently shown green.
Queued behind the mumble re-run (single-node MAX_TESTS=1; heavy deploys sequential).
**Q4.2 mumble — ENROLLING (not claimed) @2026-05-29.** Next P1-coverage gap (mumble is in §5 target
set; was unenrolled). mumble has a recipe-maintainer corpus (P2 NON-vacuous): `health_check.py` (TCP
64738), `mumble_connect.py` (full TLS protocol handshake → ServerSync + channel presence), `web_client.py`
(mumble-web HTTP UI). Plan: enroll with `COMPOSE_FILE=compose.yml:compose.mumbleweb.yml:compose.host-ports.yml`
— mumbleweb gives an HTTP readiness endpoint (generic `wait_healthy`) + the web_client parity test;
host-ports publishes 64738 on the cc-ci host so the TLS-protocol tests (run on-host by cc-ci-run) connect
to 127.0.0.1:64738. P4 via sqlite (recipe backupbot dumps `/data/mumble-server.sqlite`). 3 version tags
(0.1.0/0.2.0/1.0.0) → real upgrade tier. De-risking reachability with a smoke deploy first.
**CODE COMPLETE (commits 6841048+6bf0425+999dd0d):** recipe_meta + 3 parity ports
(test_tcp_health/test_protocol_handshake/test_web_client) + 2 specific (welcome-text + max-users
config round-trips over the protocol) + P4 (ops/test_backup/test_restore via sqlite ci_marker) +
test_install overlay + PARITY.md + install_steps.sh (provides host-ports overlay to versions
predating it) + `CHAOS_BASE_DEPLOY` harness flag (untracked overlay trips abra's pinned clean-tree
check → chaos base deploy). Smoke proved web 200 + `Mumble`/`config.js`, TCP 64738 host-published,
sqlite3 in container. **FULL LIFECYCLE GREEN @2026-05-29 — CLAIMED (see ## Gate Q4.2), awaiting
Adversary.** All 5 tiers pass, deploy-count=1, HC1 upgrade crossover 0.2.0→1.0.0+, P4 ci_marker
survives, clean teardown; log `/root/ccci-mumble-full6.log`. Needed 4 fixes en route (host-ports for
old versions, CHAOS_BASE_DEPLOY, recipe_checkout -f, TCP voice-server READY_PROBE) — all in DECISIONS.
**Q3.3 lasuite-meet — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `a46f7d4`).** Cold-verify all 5 tiers
GREEN, real upgrade crossover, meeting_flow + OIDC PASSED, ci_marker survives, clean teardown; WebRTC
media-relay non-port got explicit Adversary sign-off. Q3.2+Q3.3 PASS; Q3.4 cryptpad green (F2-9
resolved, awaiting close); Q3.1 lasuite-docs partial; **Q3.5 immich — enrolling now.**
**Q3.5 immich — VALIDATING (not claimed).** recipe_meta + ops + lifecycle overlays + health parity
landed (`98a37d4`); install,custom validation deploy in flight (`/root/ccci-immich-v1.log`). §4.3
upload-asset→list→thumbnail test next (after live-API discovery). Self-contained (no SSO dep).
**Q3.2 lasuite-drive — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `3f5d58a`); F2-12 CLOSED.** Cold
re-run all 5 tiers GREEN, upgrade tier passes, deploy-count=1, ready-probe OK(200)×2, OIDC+minio PASS,
data-integrity survives, clean teardown; `-c`+owned-wait/READY_PROBE proven non-vacuous. The standing
veto-eligible obligation (lasuite-drive upgrade-tier green) is CLEARED. Q3.1/Q3.3/Q3.5 remain for Q3.
**cryptpad F2-9 + F2-13 — ✅ Adversary CLOSED @2026-05-29 (REVIEW-2 `f7ed2d9`).** The poll-all-frames
read-back fix (`b44d75b`) cold-verified: `test_cryptpad_pad_content_survives_fresh_session` PASSED
(46s, was a 340s timeout), all 5 tiers green, non-vacuous (still proves server-side E2E-encrypted
persistence), clean teardown. The §4.3 create-pad FLOOR is demonstrated → cryptpad's conditional
sign-off satisfied. **One of the two original Phase-2-DONE blockers is cleared.** (Q3.4 cryptpad
fully green.)
**cryptpad F2-9 — (prior) RESOLVED note (superseded by the F2-13 fix above).** `test_pad_content_roundtrip.py` (§4.3
create-pad → type → fresh-context read-back; commits `05d0dc1`+`656b68b`) is **green in the full
harness custom tier on a fresh cold deploy** — `/root/ccci-cryptpad-full3.log`: 5 tiers pass,
`test_cryptpad_pad_content_survives_fresh_session` PASSED, deploy-count=1, clean teardown. F2-9 is
Adversary-owned — left for the Adversary to close on cold-verify (HOW: `ssh cc-ci 'cd /root/<clone> &&
git pull && RECIPE=cryptpad PR=0 cc-ci-run runner/run_recipe_ci.py'` → custom tier roundtrip PASSED).
DEFERRED.md cryptpad create-pad entry marked resolved.
Working next unblocked items (next Q4 recipe) meanwhile.
**cryptpad F2-9 — RESOLVED (test landed `05d0dc1`, 3/3 green; Adversary to close).** New
`tests/cryptpad/playwright/test_pad_content_roundtrip.py` does the §4.3 create-pad → type → FRESH
browser context → read-back (proves E2E-encrypted server persistence). Full harness suite run pending.
Working next unblocked items meanwhile.
---
**Q3 + Q4 — recipe enrollment sprint.** After capacity unblock + Adversary checkpoint, landed:
- Q3.1 lasuite-docs partial (parity + 2 specific + Q2.4 test_oidc_with_keycloak); deeper OIDC
ports deferred in DEFERRED.md.
- Q3.4 cryptpad partial (parity + 2 specific); create-pad deferred F2-9 conditional (must lift
before Phase-2 DONE).
- Q4.1 matrix-synapse FULL (parity-aligned + 3 specific incl. §4.3 register-and-message).
- Q4.3 bluesky-pds FULL (4 functional incl. §4.3 account+post round-trip via goat CLI; F2-8 closed).
- Q4.4 ghost FULL (parity + 3 specific; create-post deferred in DEFERRED.md).
- Q4.8 uptime-kuma FULL (parity + 2 specific; create-monitor deferred in DEFERRED.md).
Harness change: `lifecycle.deploy_app` + `run_recipe_ci.py` + `deps.py` now thread
`recipe_meta.DEPLOY_TIMEOUT` into `abra.deploy(timeout=...)` so heavy-recipe Python subprocess
timeout matches the recipe's internal TIMEOUT.
DEFERRED.md (machine-docs/) — new orchestrator-canonical deferral registry; 9 entries open.
Remaining substantial: Q3.2 lasuite-drive (needs mirror), Q3.3 lasuite-meet (mirrored), Q3.5
immich (needs mirror), Q4.2/Q4.5-7/Q4.9-10 (mostly need mirror). The mirror-and-enroll path is
established (recipe-create-pr skill); pausing this sprint for Adversary cold-verify.
## Adversary findings — Builder response
**F2-11 — FIXED, awaiting Adversary re-verify** (commit: `git log --oneline | grep 'F2-11'`).
SSO-dep "deps-not-ready" SKIP no longer yields a GREEN `!testme`.
- **WHAT:** when a recipe declares `DEPS` and `setup_custom_tests` fails (deps not ready) so its
`@requires_deps` (SSO/OIDC) tests SKIP, the run now reports **FAIL** (`overall=1`), not green —
while generic-tier failure-isolation is preserved (install/upgrade/backup/restore results stand).
- **WHERE (code):**
- `tests/conftest.py::pytest_collection_modifyitems` — now counts the requires_deps tests it skips
and appends the count to `$CCCI_DEPS_SKIP_REPORT`.
- `runner/run_recipe_ci.py` — sets `CCCI_DEPS_SKIP_REPORT` (run-scoped temp, near `depsfile`);
after teardown sums the count into `requires_deps_skipped`; RUN SUMMARY annotates the custom tier
(`custom: pass (N requires_deps SKIPPED ... SSO UNVERIFIED)`); new pure predicate
`sso_dep_unverified(declared, deps_ready, requires_deps_skipped)` flips `overall=1`.
- `tests/unit/test_f211_sso_skip.py` — 7 new unit tests.
- **HOW to verify (both deploy-free, rate-limit-independent):**
1. `ssh cc-ci 'cd /root/cc-ci && cc-ci-run -m pytest tests/unit -q'` → **EXPECTED: 35 passed**
(28 prior + 7 F2-11).
2. Cold real-test signal proof:
`ssh cc-ci 'cd /root/cc-ci && rm -f /tmp/f211-skip.txt && CCCI_DEPS_READY=0 \
CCCI_DEPS_NOT_READY_REASON=boom CCCI_DEPS_SKIP_REPORT=/tmp/f211-skip.txt \
cc-ci-run -m pytest tests/lasuite-docs/functional/test_oidc_with_keycloak.py -rs; \
cat /tmp/f211-skip.txt'`
**EXPECTED:** `1 skipped`, pytest exit 0 (the hazard), and `/tmp/f211-skip.txt` == `1`. Since
lasuite-docs declares `DEPS=["keycloak"]`, the orchestrator computes
`sso_dep_unverified(["keycloak"], False, 1)=True` → `overall=1`.
- **NOT verified by a live run yet:** full e2e (real deploy with forced setup_custom_tests failure →
observe `overall=1`) is deferred until the Docker Hub rate limit (## Blocked) lifts. The two proofs
above cover the predicate, the conftest signal on real files, and the count flow; only the
straight-line read→sum→predicate→overall wiring is unexercised by a live deploy.
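
A minimal sketch of the read→sum→predicate→overall flow described above. It is not the code in `runner/run_recipe_ci.py`: the exact predicate rule, the one-count-per-line report format, and the helper names `sum_skip_report` and `overall_exit` are assumptions made for illustration. Only the `sso_dep_unverified(["keycloak"], False, 1) = True` outcome comes from this document.

```python
from pathlib import Path


def sum_skip_report(path):
    """Sum the integer counts appended to the report file.

    Assumption: each pytest session appends one integer per line.
    """
    path = Path(path)
    if not path.exists():
        return 0
    return sum(int(tok) for tok in path.read_text().split())


def sso_dep_unverified(declared, deps_ready, requires_deps_skipped):
    """True when declared deps were never actually exercised.

    Assumed rule: deps are declared, setup reported them not ready, and at
    least one requires_deps test was skipped as a result.
    """
    return bool(declared) and not deps_ready and requires_deps_skipped > 0


def overall_exit(any_tier_failed, declared, deps_ready, skipped):
    """Hypothetical roll-up: a skipped SSO check turns the run red."""
    if any_tier_failed or sso_dep_unverified(declared, deps_ready, skipped):
        return 1
    return 0
```

Keeping the predicate pure (no file or environment access) is what allows it to be unit-tested without a deploy, which matches the deploy-free verification steps listed above.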
## Gate
**Gate: Q4.9 mailu — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `2958eb6`).** Cold first-hand full
lifecycle GREEN ×2: deploy-count=1, real upgrade crossover 3.0.0→3.0.1 (head_ref==chaos-version),
2 non-vacuous P3 (unique-mailbox create→read-back + unique-marker postfix→dovecot delivery), clean
teardown; **P4-N/A §7.1 sign-off GRANTED** (no backupbot label, independently confirmed); P5/P6 N/A
justified. No VETO. mailu enrolled (P1 coverage advanced). (Claim detail retained below.)
**WHAT.** mailu (full email stack: nginx front `app` + admin + postfix `smtp` + dovecot `imap` +
rspamd `antispam` + webmail + redis `db` + certdumper) runs **install + upgrade + custom GREEN**;
`deploy-count=1`; clean teardown. backup/restore **SKIP (N/A)** — the upstream recipe ships **no
`backupbot.backup` label** on any service (`backup_capable=False`), so there is no recipe backup
mechanism to exercise → **P4 is genuinely N/A for mailu as published** (documented in
`tests/mailu/PARITY.md` + `machine-docs/DEFERRED.md` 2026-05-29 mailu entry). **Requesting Adversary
§7.1 sign-off on P4-N/A** (alternative: a cc-ci-authored backupbot recipe-PR, mirroring immich Q3.5).
- **P2 — VACUOUS:** no `recipe-info/mailu/tests/` corpus exists in the recipe-maintainer workspace,
so there are no tests to port (documented in PARITY.md).
- **P3 — 2 recipe-specific functional tests (both green):** `functional/test_mailbox.py` (create a
mailbox via the admin container's `flask mailu user` CLI → read it back from `flask mailu
config-export --json` → assert present: admin-DB provisioning round-trip) +
`functional/test_mail_flow.py` (the characteristic mail flow: inject a uniquely-marked message via
the postfix container's local `sendmail` → poll dovecot's `doveadm search` in the imap container →
assert delivered/stored: a real postfix→rspamd→dovecot deliver/store/fetch).
- **cc-ci integration:** `recipe_meta.EXTRA_ENV(domain)` sets `MAIL_DOMAIN`/`HOSTNAMES`=run domain,
`TRAEFIK_STACK_NAME=traefik_ci_commoninternet_net` (resolves certdumper's external `*_letsencrypt`
volume), and **`TLS_FLAVOR=notls`** (mailu's mail-port TLS comes from certdumper dumping traefik's
ACME acme.json, which cc-ci has none of — file-provider wildcard cert; notls removes the dep;
certdumper still converges idle). The mail tests use the **in-container** sendmail/doveadm because
notls makes dovecot refuse plaintext auth over the network (port 143) — the in-container path
exercises the same delivery/storage stack. `HEALTH_PATH=/` (front nginx → 301).
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=mailu PR=0 cc-ci-run runner/run_recipe_ci.py'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/custom` **pass**; `backup/restore`
**skip** (N/A, no backupbot — EXPECTED, not a failure).
- Upgrade: `upgrade→PR-head: head_ref=23309a1a chaos-version=23309a1a version=3.0.0+2024.06.27→
3.0.1+2024.06.37` (real crossover; head_ref==chaos-version = HC1).
- Custom — **3 PASS**: `test_mailu_front_serves`, `test_create_mailbox_and_read_back`,
`test_send_and_receive_mail`.
- Clean teardown: post-run `docker stack ls | grep mail` → empty.
**WHERE.** Commits `916bdd8` (mailu tests) + `8844943` (in-container mail-flow rewrite, drop network
IMAP-auth test). Files: `tests/mailu/{recipe_meta.py,PARITY.md,functional/{_mailu.py,test_health_check.py,
test_mailbox.py,test_mail_flow.py}}`. Log `/root/ccci-mailu-full2.log`. Smoke-discovery logs:
`/root/mailu-smoke.log` (convergence/health/ports/flask CLI) + `/root/mailu-smoke2.log` (proved
sendmail-inject → doveadm-search delivery). DEFERRED.md mailu P4-N/A entry.
---
**Gate: Q4.2 mumble — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `1daa1ea`).** Cold first-hand full
lifecycle GREEN on the Adversary's clone: all 5 tiers, deploy-count=1, tcp ready-probe ×2, real
upgrade crossover 0.2.0→1.0.0+ (head_ref==chaos-version), P3 config round-trips non-vacuous
(max_users=42 + welcome marker decoded from wire bytes), P4 sqlite ci_marker survives, clean teardown;
P5/P6 N/A justified. No VETO. First non-HTTP/TCP-voice recipe enrolled. (Claim detail retained below.)
**WHAT.** mumble (the §5 TCP/voice recipe — first non-HTTP-native recipe) runs its **full lifecycle
GREEN**: install + upgrade (real prev→PR-head crossover) + backup + restore + custom. `deploy-count=1`,
clean teardown. Enrolled by deploying two upstream overlays via
`COMPOSE_FILE=compose.yml:compose.mumbleweb.yml:compose.host-ports.yml`:
- **mumbleweb** → an HTTP web-client sidecar (HEALTH_PATH `/` → 200) giving the generic harness its
serving/readiness signal + the `web_client.py` parity surface.
- **host-ports** → publishes 64738 (tcp+udp, mode:host) on the cc-ci host so the on-host (cc-ci-run)
protocol tests connect to 127.0.0.1:64738.
- **P2 (3 parity ports, all green):** `health_check.py`→`functional/test_tcp_health.py` (TCP 64738);
`mumble_connect.py`→`functional/test_protocol_handshake.py` (TLS handshake → server Version → auth
accepted → ≥1 channel = channel presence → ServerSync, via vendored `functional/_mumble_proto.py`);
`web_client.py`→`functional/test_web_client.py` (HTTPS 200 + `Mumble`/`config.js`/`<!DOCTYPE html>`).
No recipe-maintainer mumble test omitted. `tests/mumble/PARITY.md` has the mapping.
- **P3 (2 specific, beyond parity, version-independent config round-trips):**
`functional/test_welcome_text_roundtrip.py` (deploy-set `WELCOME_TEXT` marker
`cc-ci-mumble-welcome-7f3a9c` surfaces in the ServerSync welcome_text) +
`functional/test_server_config_limits.py` (deploy-set non-default `USERS=42` surfaces as
ServerConfig.max_users==42). Both prove deploy-time config propagated into the running murmur server.
- **P4 (real backup data-integrity):** `ops.py` seeds a `ci_marker` row into
`/data/mumble-server.sqlite` (the exact file the recipe's backupbot `.backup`/restore hooks dump),
`test_backup.py` asserts it at backup time, `pre_restore` drops it, `test_restore.py` asserts it
returns as `original`. sqlite busy timeout set via the silent `.timeout` dot-command.
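
The seed → backup → mutate → restore → assert shape of the P4 check can be sketched against a local sqlite file. This is an illustration only: a plain file copy stands in for the recipe's backupbot dump and restore hooks, the `ci_marker` table layout is assumed, and the helper names are hypothetical rather than the contents of `ops.py`. The `timeout=10` connect argument plays the role of the busy timeout mentioned above.

```python
import shutil
import sqlite3
from contextlib import closing
from pathlib import Path


def _exec(db, sql, params=()):
    """Run one statement with a busy timeout, commit, and close the handle."""
    with closing(sqlite3.connect(str(db), timeout=10)) as conn:
        with conn:  # commits on success, rolls back on error
            return conn.execute(sql, params).fetchall()


def seed(db, value="original"):
    _exec(db, "CREATE TABLE IF NOT EXISTS ci_marker (value TEXT)")
    _exec(db, "DELETE FROM ci_marker")
    _exec(db, "INSERT INTO ci_marker (value) VALUES (?)", (value,))


def read_marker(db):
    rows = _exec(db, "SELECT value FROM ci_marker LIMIT 1")
    return rows[0][0] if rows else None


def drop_marker(db):
    _exec(db, "DELETE FROM ci_marker")


def roundtrip(workdir):
    """Seed, back up, mutate, restore, and return what survived."""
    db = Path(workdir) / "mumble-server.sqlite"
    backup = Path(workdir) / "backup.sqlite"
    seed(db)
    shutil.copyfile(db, backup)          # stand-in for the backup hook
    assert read_marker(backup) == "original"
    drop_marker(db)                      # the pre_restore mutation
    assert read_marker(db) is None       # proves the restore is not a no-op
    shutil.copyfile(backup, db)          # stand-in for the restore hook
    return read_marker(db)
```

The intermediate `is None` assertion is the part that makes the check non-vacuous: without the mutation, a restore that did nothing would still pass.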
Harness/enrollment additions (DECISIONS.md "mumble" entries): `recipe_meta.CHAOS_BASE_DEPLOY` flag +
`lifecycle._recipe_meta_flag` + a `deploy_app` branch (the cc-ci-provided untracked host-ports overlay
trips abra's pinned clean-tree check → chaos base deploy of the checked-out pinned version, not
LATEST); `abra.recipe_checkout` now `git checkout -f` (the version-pinning checkout must yield the
exact ref tree, robust to the cc-ci overlay colliding with head_ref's tracked copy); `wait_ready_probes`
now supports a TCP probe (`{tcp_host,tcp_port,stable=N}`) so readiness gates on the **voice server**
(64738) being stably listening — HEALTH_PATH only proves the mumble-web sidecar, and after the chaos
upgrade redeploy the host-mode 64738 port churns (old task releases → new binds), which otherwise let
backup-bot exec into a not-running app container (409). `tests/mumble/install_steps.sh` provides an
identical `compose.host-ports.yml` to versions predating it (the upgrade base 0.2.0+; upstream added it
in 1.0.0). No browser flow (P6 N/A — mumble's core UX is the native client over the voice protocol,
covered by the handshake test; the web UI is asserted via test_web_client).
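
A rough sketch of what a "stably listening" TCP probe could look like, assuming `stable=N` means N consecutive successful connects with the streak reset on any failure. The function names and the injectable `probe` parameter are hypothetical and not the signature of `wait_ready_probes`.

```python
import socket
import time


def tcp_open(host, port, timeout=2.0):
    """Return True if a TCP connect to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_tcp_stable(host, port, stable=3, deadline_s=60.0, interval_s=1.0,
                    probe=tcp_open):
    """Return once `stable` consecutive probes succeed, else raise TimeoutError.

    A single failed probe resets the streak, so a port that flaps while the
    old task releases it and the new task binds it does not count as ready.
    """
    streak = 0
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if probe(host, port):
            streak += 1
            if streak >= stable:
                return streak
        else:
            streak = 0
        time.sleep(interval_s)
    raise TimeoutError(
        f"{host}:{port} not stably listening ({stable}x) within {deadline_s}s"
    )
```

For example, `wait_tcp_stable("127.0.0.1", 64738, stable=3)` would gate on the voice port rather than on the mumble-web sidecar.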
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=mumble PR=0 cc-ci-run runner/run_recipe_ci.py'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all `pass`**.
- Base deploy log: `deploy_app(mumble@0.2.0+v1.6.870-0): CHAOS_BASE_DEPLOY set → chaos base deploy …`
and `mumble install_steps: provided compose.host-ports.yml to recipe checkout (mumble)`.
- `ready-probe OK (tcp 3x): 127.0.0.1:64738` appears **TWICE** (post-install + post-upgrade).
- Upgrade: `upgrade→PR-head: head_ref=9fa5e949 chaos-version=9fa5e949 version=0.2.0+v1.6.870-0→
1.0.0+v1.6.870-0` (real crossover; head_ref==chaos-version = HC1).
- Custom tier — **5 PASS**: `test_tcp_health`, `test_protocol_handshake`
(`test_handshake_completes_with_channel_presence`), `test_web_client`
(`test_web_client_serves_mumble_web_ui`), `test_welcome_text_roundtrip`
(`test_configured_welcome_text_surfaces_in_serversync`), `test_server_config_limits`
(`test_configured_max_users_surfaces_in_serverconfig`).
- P4: `test_backup_captures_state` + `test_restore_returns_state` **PASSED** (ci_marker survives).
- Clean teardown: post-run `docker stack ls | grep mumb` → empty.
**WHERE.** Commits `6841048` (test content) + `6bf0425` (install_steps host-ports) + `999dd0d`
(CHAOS_BASE_DEPLOY) + `a0fd58b` (sqlite .timeout) + `1890cb5` (recipe_checkout -f) + `ec76072`
(TCP READY_PROBE). Files: `tests/mumble/{recipe_meta.py,PARITY.md,ops.py,install_steps.sh,
compose.host-ports.yml,test_install.py,test_backup.py,test_restore.py,functional/*.py}`,
`runner/harness/{lifecycle.py,abra.py}`. Log `/root/ccci-mumble-full6.log`. Isolation diagnostic
(backup/restore green on a stable deploy, no upgrade): `/root/ccci-mumble-diag.log`.
---
**Gate: HQ1 image pre-pull — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `0215bd2`).**
**WHAT.** `runner/harness/lifecycle.prepull_images(recipe, domain)` warms the local image store BEFORE
the deploy: resolves the recipe's images via `docker compose --env-file <app.env> -f <COMPOSE_FILE>
config --images` (handles `$VERSION` interpolation + multi-compose, reading abra's COMPOSE_FILE from
the app .env), then `docker pull` each with **skip-if-present** (`docker image inspect` → zero network
for already-cached pinned tags). Called in `deploy_app` BEFORE the (UNCHANGED, real) `abra.deploy`, and
in `generic.perform_upgrade` BEFORE the chaos redeploy (warms the new-version images). A pull failure
RAISES a clear pull error pre-deploy (not a murky converge timeout). The deploy path is unchanged —
prepull only warms the store (no `docker service update/scale`). Honest scope: removes PULL time, NOT
app-INIT time (slow-init apps still need their healthcheck/READY_PROBE).
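
The skip-if-present loop can be sketched as below. This is a simplified illustration, not `prepull_images` itself: it takes an already-resolved image list (the `docker compose ... config --images` step is omitted), and the `run` and `log` parameters are hypothetical injection points so the logic can be exercised without a Docker daemon.

```python
import subprocess


def _run(cmd):
    """Run a command and return (exit_code, combined output)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def prepull(images, run=_run, log=print):
    """Pull each image unless it is already in the local store.

    Returns the list of images that were actually pulled and raises
    RuntimeError on the first pull failure, before any deploy is attempted.
    """
    pulled = []
    for image in images:
        code, _ = run(["docker", "image", "inspect", image])
        if code == 0:
            log(f"prepull: present {image}")  # cached: no network access
            continue
        code, out = run(["docker", "pull", image])
        if code != 0:
            raise RuntimeError(
                f"prepull: pull error BEFORE deploy for {image}: {out.strip()}"
            )
        pulled.append(image)
    return pulled
```

Raising here, rather than letting the deploy start, is what turns a missing tag into an immediate pull error instead of a converge timeout.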
**HOW / EXPECTED (Adversary, on cc-ci):**
- `cc-ci-run -m pytest tests/unit/test_prepull.py -q` → **4 passed** (present→skip, missing→pull,
pull-fail→RAISE, no-images→skip).
- Warm-cache no-redownload: run a cached recipe (e.g. n8n) → log shows `prepull: present <img>` (skip,
no pull). Proven: `/root/ccci-n8n-prepull2.log` (`prepull: present n8nio/n8n:2.20.6`), install+custom pass.
- Bad-tag clear error: pointing a recipe at a bogus image tag → prepull RAISES
`RuntimeError: … clear pull error BEFORE deploy: manifest unknown` (proven, deploy-free).
- Real-run-green + abra unchanged: `/root/ccci-n8n-prepull.log` (cold image pulled by prepull, then
install+custom GREEN); `grep` of `prepull_images` shows only compose-config / image-inspect / pull.
**WHERE.** Commit `2bf40d6`. Files: `runner/harness/lifecycle.py` (`prepull_images` + deploy_app call),
`runner/harness/generic.py` (perform_upgrade call), `tests/unit/test_prepull.py`. Plan:
`cc-ci-plan/plan-prepull-images.md`.
---
**Gate: Q3.3 lasuite-meet — ✅ Adversary PASS @2026-05-29 (REVIEW-2 `a46f7d4`).**
**WHAT.** lasuite-meet (La Suite real-time meetings via LiveKit; OIDC-required; sibling of
lasuite-docs/drive) runs its **full lifecycle GREEN** — install + upgrade (real prev→PR-head
crossover) + backup + restore + custom (health + OIDC + meeting_flow). Enrolled by reusing the
lasuite-drive OIDC-at-install machinery (DEPS=["keycloak"], OIDC_AT_INSTALL, install_steps.sh wiring
OIDC env before the single deploy). Two infra fixes were needed:
- **R014 lightweight-tag → chaos-base deploy** (commit `72719fe`): upstream coop-cloud lasuite-meet
ships a stray LIGHTWEIGHT tag `0.3.0+v1.16.0`, which FATAs `abra recipe lint` (R014) on the pinned
prev-version base deploy. Fix: `abra.has_lightweight_version_tags` detects it; deploy_app then
deploys the EXPLICITLY-checked-out prev version with chaos (chaos skips lint + deploys the current
checkout — NOT latest; F1d-2's hazard was a *missing* checkout). Verified by the real upgrade
crossover below. (An origin-repoint approach was tried + abandoned: go-git 'reference not found'.)
- **meeting_flow tolerant delete** (commit `1f7806a`): meet 0.3.0 soft/async-deletes rooms, so the
post-delete 404 check is best-effort; the §4.3 create+read-back+LiveKit-token asserts stay HARD.
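
One way to detect a lightweight tag is to ask git for each tag's object type: an annotated tag ref points at a `tag` object, while a lightweight tag ref points directly at a `commit`. The sketch below assumes that approach. It is not the implementation of `abra.has_lightweight_version_tags`, and both function names are hypothetical.

```python
import subprocess


def lightweight_tags(for_each_ref_output):
    """Parse `git for-each-ref` output lines of the form '<objecttype> <name>'.

    Returns the names of tags whose ref points straight at a commit.
    """
    names = []
    for line in for_each_ref_output.splitlines():
        objtype, _, name = line.strip().partition(" ")
        if objtype == "commit" and name:
            names.append(name)
    return names


def has_lightweight_tags(repo_dir):
    """Return True if the repository at repo_dir has any lightweight tag."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "for-each-ref",
         "--format=%(objecttype) %(refname:short)", "refs/tags"],
        check=True, capture_output=True, text=True,
    ).stdout
    return bool(lightweight_tags(out))
```

Splitting the parser from the git call keeps the decision logic testable on captured output alone.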
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=lasuite-meet PR=0 cc-ci-run runner/run_recipe_ci.py'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1`; `install/upgrade/backup/restore/custom` **all `pass`**.
- `tests/lasuite-meet/functional/test_meeting_flow.py::test_create_room_get_livekit_token_and_read_back`
**PASSED** — creates a room (201), reads it back (200, same LiveKit room), the LiveKit token is a JWT
granting that room, deletes it.
- `test_oidc_password_grant_against_dep_keycloak` **PASSED** (not skipped) — password-grant JWT vs the
per-run keycloak realm `lasuite-meet-<6hex>`.
- Log shows `lightweight upstream tag present → chaos base deploy` and
`upgrade→PR-head: … version=0.2.0+v1.15.0→0.3.0+v1.16.0` (real crossover, NOT latest-as-base).
- Data-integrity: postgres ci_marker survives upgrade + backup→wipe→restore.
- Clean teardown: post-run no `lasu` stacks/volumes.
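
A hedged sketch of the "token is a JWT granting that room" check. It decodes the payload without verifying the signature, which is enough for inspecting claims in a test but not for authentication. The `video.room` claim layout is an assumption about how LiveKit access tokens carry the room grant, and the helper names are hypothetical rather than taken from `test_meeting_flow.py`.

```python
import base64
import json


def jwt_claims(token):
    """Decode a compact JWT payload WITHOUT verifying the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT (expected 3 dot-separated parts)")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def grants_room(token, room):
    """True if the token's video grant names the given room (assumed layout)."""
    video = jwt_claims(token).get("video", {})
    return video.get("room") == room
```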
**WHERE.** Commits `32a743f` (recipe_meta) + `9c6cb53` (meeting_flow + PARITY) + `72719fe` (R014
chaos-base) + `1f7806a` (tolerant delete). Files: `tests/lasuite-meet/{recipe_meta.py,install_steps.sh,
ops.py,test_*.py,functional/*.py,PARITY.md}`, `runner/harness/abra.py` (`has_lightweight_version_tags`),
`runner/harness/lifecycle.py` (chaos-base branch). Log `/root/ccci-meet-full6.log`. webrtc-media/relay
UDP media-relay = documented env-blocker non-port (maximal subset = LiveKit token issuance, shipped).
---
**Gate: Q3.2 lasuite-drive — RE-CLAIMED @2026-05-29 (after F2-12 fix), awaiting Adversary.**
(First claim `911680f` FAILed cold-verify — F2-12: the upgrade chaos redeploy's abra converge monitor
FATA'd while the NEW collabora 25.04.9.4.1 was still in its healthcheck `start_period`. Fixed by
`e1147b5`; re-validated 3× green. F2-12 is Adversary-owned — left for the Adversary to close.)
**WHAT.** lasuite-drive (the heaviest Phase-2 stack: 12 services incl. collabora + onlyoffice +
minio/S3 + postgres, OIDC-dependent) now runs its **full lifecycle GREEN, repeatably** — install +
upgrade (prev→PR-head chaos crossover) + backup + restore + custom (health + MinIO round-trip + OIDC
password-grant), via **three fixes**:
1. **Install-time OIDC wiring** (commit `a151489`) — the orchestrator provisions the per-run realm on
the live-warm keycloak BEFORE the single `abra app deploy`, and `tests/lasuite-drive/install_steps.sh`
writes the OIDC env + client secret into that one deploy. This **eliminates the flaky post-deploy
`--force --chaos` 12-service reconverge** the old `setup_custom_tests.sh` did (collabora WOPI-discovery
race; JOURNAL Step 0). New per-recipe `OIDC_AT_INSTALL` meta flag + reusable `_provision_deps()`
helper; legacy post-deploy path unchanged for all other dep recipes (gated on `not oidc_at_install`).
2. **collabora-ready upgrade gate + DEPLOY_TIMEOUT plumbing** (commit `4b38b66`) — `ops.py::pre_upgrade`
waits for collabora WOPI discovery → 200 BEFORE the chaos redeploy, so it no longer SIGTERMs a
still-booting OLD collabora; `DEPLOY_TIMEOUT` threads to the upgrade `chaos_redeploy`.
3. **F2-12 fix — own the upgrade convergence verification** (commit `e1147b5`). The upgrade chaos
redeploy now runs `abra … -c` (`--no-converge-checks`): abra's own post-deploy monitor — which
FATA'd while the NEW collabora 25.04.9.4.1's healthcheck was still in `start_period` (jail/config
init) — is dropped. `docker stack deploy` still applies the spec; `generic.perform_upgrade` then
OWNS a **stricter** verification with a generous (DEPLOY_TIMEOUT) deadline: `lifecycle.wait_healthy`
(every swarm service N/N + app HEALTH_PATH 200) **then** `lifecycle.wait_ready_probes`
(recipe `READY_PROBE` → collabora WOPI `/hosting/discovery` 200). The new collabora converges
through swarm's healthcheck retries; HC1 (chaos-version == PR-head) + deploy-count=1 preserved.
**Non-vacuous (P7-negative) PROVEN** by `tests/unit/test_f212_upgrade_convergence.py` (commit
`6506c4a`, 5 tests): `wait_ready_probes`/`wait_healthy` RAISE `TimeoutError` on a stuck/never-serving
convergence — so a genuinely broken upgrade stays RED; this is not green-washing abra's skipped check.
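
The "own the verification" idea reduces to two sequential deadline-bounded waits that raise instead of returning quietly. The following is a sketch under that reading: `wait_until` and `verify_upgrade` are hypothetical names, and the three callables stand in for the swarm-service, HEALTH_PATH, and READY_PROBE checks that the real `wait_healthy` and `wait_ready_probes` perform.

```python
import time


def wait_until(check, what, deadline_s, interval_s=5.0):
    """Poll check() until it returns True; raise TimeoutError at the deadline."""
    end = time.monotonic() + deadline_s
    while True:
        if check():
            return
        if time.monotonic() >= end:
            raise TimeoutError(f"{what} did not converge within {deadline_s}s")
        time.sleep(interval_s)


def verify_upgrade(services_converged, app_healthy, ready_probe,
                   deadline_s, interval_s=5.0):
    """Gate on service health first, then on the recipe's ready probe."""
    wait_until(lambda: services_converged() and app_healthy(),
               "wait_healthy", deadline_s, interval_s)
    wait_until(ready_probe, "wait_ready_probes", deadline_s, interval_s)
```

Because both waits raise on expiry, a stuck or never-serving upgrade still fails the run even though abra's own converge check is skipped.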
**HOW (Adversary, cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && git pull && RECIPE=lasuite-drive PR=0 cc-ci-run runner/run_recipe_ci.py'
ssh cc-ci 'cd /root/<your-clone> && cc-ci-run -m pytest tests/unit/test_f212_upgrade_convergence.py -q'
```
**EXPECTED:**
- RUN SUMMARY: `deploy-count = 1 (expect 1)`; `install/upgrade/backup/restore/custom` **all `pass`**.
- `test_oidc_password_grant_against_dep_keycloak` **PASSED** (NOT skipped) — real password-grant JWT.
- `test_minio_storage` PASSED (real S3 upload→list→cat readback inside the minio container).
- Data-integrity: `test_upgrade_preserves_data` (ci_marker survives prev→PR-head chaos crossover) +
backup/restore ci_marker survive.
- Log shows `install-time OIDC: deps provisioned` + `install_steps: OIDC env wired` (no post-deploy
reconverge) and **`ready-probe OK (200)` TWICE** (post-install + post-upgrade, collabora WOPI).
- Clean teardown: post-run `docker stack ls | grep lasu` and `docker volume ls | grep lasu` both empty.
- Unit: **5 passed** in `tests/unit/test_f212_upgrade_convergence.py` (the P7-negative proof).
**WHERE.** Commits `a151489` (Part A) + `4b38b66` (upgrade gate) + `e1147b5` (F2-12 own-convergence) +
`6506c4a` (P7-negative unit tests). Files: `runner/run_recipe_ci.py` (`_provision_deps`,
`OIDC_AT_INSTALL` branch, `_perform_op` meta+timeout, post-install `wait_ready_probes`),
`runner/harness/abra.py` (`deploy(no_converge_checks)`), `runner/harness/lifecycle.py`
(`chaos_redeploy(no_converge_checks)`, `wait_ready_probes`), `runner/harness/generic.py`
(`perform_upgrade` own-wait), `tests/lasuite-drive/{install_steps.sh,setup_custom_tests.sh,ops.py,recipe_meta.py}`
(`READY_PROBE`), `tests/unit/test_f212_upgrade_convergence.py`.
**3× repeat-green of the F2-12 fix** (flakiness gone, not absent-once): `/root/ccci-drive-f212-v1.log`,
`…-v2.log`, `…-v3.log` — each full-suite green, deploy-count=1, OIDC PASSED, ready-probe OK twice,
clean teardown. Step-0 root-cause logs in JOURNAL-2. DEFERRED.md disk-blocker CLOSED (host 64G).
---
**Gate: Q2 — Adversary PASS @2026-05-28** (REVIEW-2 `## Q2 — PASS @2026-05-28 (re-verify after
F2-5 fix + F2-6 collateral resolution)`; cold e2e on `/root/adv-verify` HEAD `874bfbb`:
deploy-count=2, all 5 assertions PASS, DEPS teardown clean, post-run docker stack/volume/secret
with 'keyc|lasuite' filter all empty; NO VETO). F2-5 + F2-6 CLOSED; F2-7 stands as open scope
(authentik backend in harness.sso when Q2.2 enrolls). Builder may advance to Q3 — already in
flight (Q3.1 partial @ `874bfbb`, Q5.1 docs @ `b2151af`).
Acceptance per plan §6 Q2: "a dependent recipe deploys its provider + runs an OIDC login test
in one run." Proven cold (evidence below).
**Objective evidence pointers (Q2):**
- **Q2.1 keycloak parity + 2 NEW specific tests** — commit `d5f5e86`:
- `tests/keycloak/functional/test_health_check.py` — parity port.
- `tests/keycloak/functional/test_password_grant_token.py` — password grant, JWT decoded, claims
(iss/azp/typ/exp/iat) validated.
- `tests/keycloak/functional/test_create_client_and_use.py` — admin-API client CRUD +
client_credentials grant + JWT azp/iss validation + idempotent cleanup.
- `oidc_integration.py` parity deferred to Q3 (cross-recipe; see PARITY.md note).
- Bumped DEPLOY_TIMEOUT + HTTP_TIMEOUT to 900s.
- Cold e2e (log `/root/ccci-q2-keycloak-r3.log`): all 5 stages PASS, deploy-count=1,
`head_ref=666649a6 == chaos-version=666649a6`, version `10.7.0+26.6.1 → 10.7.1+26.6.2`.
- **Q2.3 dep resolver + SSO-setup harness primitives** — commit `4d6b040`:
- `runner/harness/deps.py` — declared_deps + dep_domain + deploy_deps + teardown_deps + JSON
run state. Subsumes Q0.4 (dep resolver).
- `runner/harness/sso.py` — setup_keycloak_realm + oidc_password_grant +
assert_discovery_endpoint. Reusable by every SSO-dependent recipe (Q3 will exercise).
- `runner/run_recipe_ci.py` — wired in dep deploy BEFORE recipe-under-test, dep teardown
AFTER in finally (reverse order). DG4.1 expected count = 1 + len(deps).
- `tests/conftest.py` — `deps_apps` fixture exposes dep domains to dependent tests.
- 7 new unit tests in `tests/unit/test_deps.py`; **28/28 unit tests PASS** cold.
- **F2-5 fix — dep teardown verify=True** — commit `c6e94af`, log `/root/ccci-f25-verify.log`:
- `runner/harness/deps.py::teardown_deps` now uses `lifecycle.teardown_app(..., verify=True)`
so residuals raise `TeardownError`. Errors are logged per-dep but we continue to other deps;
a combined `TeardownError` is raised after all attempts.
- `runner/run_recipe_ci.py` catches the dep `TeardownError` in finally, surfaces via
`dep_teardown_error` in the run summary + non-zero exit code.
- Cold-verified: lasuite-docs+keycloak dep e2e PASSED clean (3 custom + 2 lifecycle install =
5 PASS); post-run cc-ci state has NO leftover keycloak (`docker stack ls | grep keyc` →
empty; `docker volume ls | grep keyc` → empty; `docker secret ls | grep keyc` → empty).
- deploy-count=2, expected 2.
- **Q2.4 acceptance (the gate)** — commit `9e88741`, log `/root/ccci-q24-lasuite-keycloak.log`:
- `tests/lasuite-docs/recipe_meta.py` declares `DEPS = ["keycloak"]`.
- `tests/lasuite-docs/functional/test_oidc_with_keycloak.py`:
- Asserts `deps_apps["keycloak"]` is the per-run dep domain.
- Calls `harness.sso.setup_keycloak_realm` → realm/client/user.
- GETs OIDC discovery; asserts `issuer == https://<kc>/realms/lasuite-docs`.
- Performs password grant → JWT; asserts iss/azp/typ/exp claims.
- Cold-run output:
```
===== DEPS: ['keycloak'] =====
dep: deploying keycloak -> keyc-c12afe.ci.commoninternet.net
dep: keycloak ready @ keyc-c12afe.ci.commoninternet.net
===== TIER: install ===== 2 PASS (generic + cc-ci overlay)
===== TIER: custom ===== 1 PASS (test_oidc_password_grant_against_dep_keycloak)
===== DEPS teardown =====
===== RUN SUMMARY =====
deploy-count = 2 (expect 2)
```
- **F2-3 systemic fix** — commit `47f7cb4`: `runner/harness/browser.py::goto_with_retry`
centralizes the F2-3 try/except PlaywrightError pattern; applied to **all** install overlays
using page.goto (custom-html, n8n, keycloak, cryptpad, lasuite-docs) + the custom-html
playwright/test_browser_smoke. Cold e2e (custom-html, log `/root/ccci-q2-customhtml-r2.log`):
all 5 stages PASS, deploy-count=1, HC1 non-vacuous.
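The F2-5 behaviour listed above (attempt every dep, log each failure, raise one combined error at the end) can be sketched as follows. This is an illustration under assumptions, not the code in `runner/harness/deps.py`: `teardown_all` and the `teardown_one` callable are hypothetical, and `TeardownError` is redefined locally only to keep the block self-contained.

```python
from typing import Callable, Iterable


class TeardownError(RuntimeError):
    """Local stand-in for the harness exception of the same name."""


def teardown_all(deps: Iterable[str], teardown_one: Callable[[str], None]) -> None:
    """Tear down every dep in reverse deploy order; raise one combined error at the end."""
    failures: list[str] = []
    for dep in reversed(list(deps)):
        try:
            teardown_one(dep)
        except TeardownError as exc:
            # Record and keep going so one bad dep cannot strand the others.
            print(f"dep teardown failed: {dep}: {exc}")
            failures.append(f"{dep}: {exc}")
    if failures:
        raise TeardownError("; ".join(failures))
```

Collecting failures instead of raising on the first one keeps the host clean for the next run even when a single dep leaves residuals, while the combined error still turns the run red.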
**Reference command for Adversary (cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && \
cc-ci-run -m pytest tests/unit -v && \
RECIPE=keycloak cc-ci-run runner/run_recipe_ci.py && \
RECIPE=lasuite-docs STAGES=install,custom cc-ci-run runner/run_recipe_ci.py'
```
---
**Gate: Q1 — Adversary PASS @2026-05-28** (REVIEW-2 `## Q1 — PASS @2026-05-28 (re-verify after
F2-3 + F2-4 fixes)`; cold e2e on `/root/adv-verify` HEAD `fc89552` → all 5 stages PASS,
deploy-count=1, HC1 non-vacuous; F2-3 + F2-4 CLOSED; NO VETO). Builder may advance to Q2.
**Objective evidence pointers (Q1):**
- **custom-html (Q1.1)** — already cold-verified in Q0 PASS. Same evidence stands: full e2e green,
HC1 non-vacuous, deploy-count=1; PARITY.md + functional/ + playwright/ in place.
- **n8n (Q1.2)** — full e2e on cc-ci (log `/root/ccci-q1-n8n-r3.log`):
- **HC1 PR-head proof:** `head_ref=63dd3e0f == chaos-version=63dd3e0f`, version
`3.1.0+2.9.4 → 3.2.0+2.20.6`.
- **Deploy-count = 1** (DG4.1 holds).
- **Lifecycle tier results (generic + cc-ci overlay both PASS at each stage):**
- install: generic `test_serving` PASS + cc-ci `test_serving_and_editor` PASS (the robust
Playwright poll handles n8n's /healthz-200-before-/-route-registered window).
- upgrade: generic `test_upgrade_reconverges` PASS + cc-ci `test_upgrade_preserves_data`
PASS (marker `upgrade-survives` written into /home/node/.n8n by `ops.pre_upgrade` survived
the chaos redeploy of PR-head).
- backup: generic `test_backup_artifact` PASS + cc-ci `test_backup_captures_state` PASS
(marker `original` from `ops.pre_backup` captured by `abra app backup create`).
- restore: generic `test_restore_healthy` PASS + cc-ci `test_restore_returns_state` PASS
(marker mutated to `mutated` by `ops.pre_restore`, restored to `original` — real backup
data-integrity).
- **Custom tier results (4 PASS — log `/root/ccci-q1-n8n-r4.log` post-F2-4/F2-3 fix):**
- `tests/n8n/functional/test_health_check.py::test_n8n_returns_200` — parity port (HTTP 200
from `/`), with `SOURCE: recipe-info/n8n/tests/health_check.py` comment.
- `tests/n8n/functional/test_workflow_roundtrip.py::test_workflow_create_and_read_back` —
**plan §4.3 prescribed create+read-back**: owner setup → POST /rest/workflows → GET
/rest/workflows/<id>; assert id/name/nodes round-trip. (F2-4 fix.)
- `tests/n8n/functional/test_rest_settings.py::test_rest_settings_returns_json_with_known_keys`
— polls `/rest/settings` until content-type is `application/json` (rejecting the
"n8n is starting up" placeholder HTML), then asserts known public-settings keys
(`userManagement` / `defaultLocale` / `authCookie`) in the `data` envelope.
- `tests/n8n/functional/test_login_state.py::test_login_endpoint_returns_json` — polls
`/rest/login` until content-type is `application/json`, proves auth subsystem initialized.
- **PARITY.md complete:** `tests/n8n/PARITY.md` — parity row for `health_check.py`, rationale
for the 2 recipe-specific tests, data-integrity + playwright sections.
- **Q1 has no open Adversary findings** (F2-3 + F2-4 are CLOSED). No tests skipped/weakened; the rejecting-the-placeholder
pattern in the new functional tests is non-vacuous (a stuck-booting n8n that only serves the
placeholder fails the test).
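The rejecting-the-placeholder pattern used by the `/rest/settings` and `/rest/login` tests can be sketched like this. It is a simplified illustration, not the test code itself: `wait_for_json` and the injected `fetch` callable are hypothetical, chosen so the example runs without a live n8n.

```python
import json
import time
from typing import Callable, Tuple

Response = Tuple[int, str, str]  # (status, content-type header, body text)


def wait_for_json(fetch: Callable[[], Response], timeout_s: float, interval_s: float = 0.01) -> dict:
    """Poll until the response is 200 with a JSON content type, then return the parsed body.

    An HTML placeholder (for example a "starting up" page served with 200) is rejected
    and retried, so an app that never finishes booting ends in TimeoutError.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status, content_type, body = fetch()
        if status == 200 and content_type.split(";")[0].strip().lower() == "application/json":
            return json.loads(body)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no JSON response (last status={status}, type={content_type!r})")
        time.sleep(interval_s)
```

Checking the content type rather than only the status code is what makes the assertion non-vacuous: a 200 from the placeholder page never satisfies it.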
**Reference command for Adversary (cold, on cc-ci):**
```
ssh cc-ci 'cd /root/<your-clone> && RECIPE=n8n cc-ci-run runner/run_recipe_ci.py'
```
---
**Gate: Q0 — Adversary PASS @2026-05-28** (REVIEW-2 `## Q0 — PASS @2026-05-28`; cold re-verify on
`/root/adv-verify` HEAD `0b834e9` → 21 unit PASS + e2e PASS; NO VETO). F2-1 closed; F2-2 (scope
observation) acknowledged.
**Prior Q0 claim detail (commit `5741e88` — F2-1 fix landed on top of the original Q0 changeset).** Acceptance evidence (per plan §6 Q0): a reference recipe
(custom-html) uses the new harness additions for a full parity + specific suite, green via the
existing run path. F2-1 (test_custom_tests_repo_local_gated stale assertion) closed by Builder; cold
re-run on cc-ci → **21/21 PASS** including the previously-failing test. F2-2 (scope observation:
OIDC-flow + dep resolver not in Q0) acknowledged; those primitives will be implemented when Q2/Q3 consume
them. BACKLOG-2 Q0.4 remains open and explicitly deferred.
**Objective evidence pointers (Q0):**
- **Harness additions landed**
- `runner/harness/http.py` — canonical Phase-2 recipe-test HTTP API (vendored from
`references/recipe-maintainer/utils/tests/helpers.py`): `http_get`, `http_post`, `http_request`,
`retry_http_get`, `retry_http_post`, `wait_for_http`, `assert_converges`. JSON + form bodies,
transport-failure → status=0.
- `runner/harness/discovery.custom_tests` recurses into `tests/<recipe>/functional/` and
`tests/<recipe>/playwright/` (Phase 2 §4.1 layout) while excluding lifecycle `test_<op>.py`
names; HC2 repo-local gate continues to apply.
- TTY abra wrapper already present in `runner/harness/abra.py::_run_pty` (Phase 1d) — reused.
- **Unit-test proof (deterministic, cc-ci; post-F2-1 fix commit `5741e88`)**
- `cc-ci-run -m pytest tests/unit -v` → **21 passed in 5.38s** (the previously-failing
`test_custom_tests_repo_local_gated` now passes; synthetic-recipe + monkeypatch fixture):
- 8× pre-existing `tests/unit/test_discovery.py` (overlay + HC2 gate; re-run as a regression check).
- 2× new `tests/unit/test_discovery_phase2.py` (functional/+playwright/ recursion + HC2
gate still applies to subdirs).
- 11× new `tests/unit/test_http.py` (in-process http.server fixture — JSON parsing,
4xx-with-body, non-JSON body, transport-failure=0, headers, JSON+form POST, retry
convergence, retry timeout, wait_for_http, assert_converges return value).
- **End-to-end proof (custom-html on cc-ci, the reference recipe)**
- `RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py` (log `/root/ccci-q0-customhtml-full.log`):
- install/upgrade/backup/restore/custom **all PASS**, deploy-count=1.
- HC1 PR-head proof: `head_ref=8a026066 == chaos-version=8a026066`, version `1.10.0→1.11.0`.
- 5 lifecycle assertions (generic + cc-ci overlay across 4 ops) + 4 custom-stage assertions
(3 functional + 1 playwright). Reference command for Adversary cold re-run:
`RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py`.
- **Per-recipe contract artifact landed**
- `tests/custom-html/PARITY.md` — parity row for `health_check.py`, rationale for the 2
recipe-specific tests + the data-integrity + playwright sections.
- `tests/custom-html/functional/{test_health_check.py,test_content_roundtrip.py,test_content_type_header.py}` — parity port + 2 NEW recipe-specific tests; each parity file carries the
`SOURCE: recipe-info/custom-html/tests/<file>` comment for audit.
- `tests/custom-html/playwright/test_browser_smoke.py` — Phase-2 P6 home.
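The "transport-failure → status=0" convention noted for `runner/harness/http.py` can be illustrated with a small standard-library sketch. The signature below is an assumption made for illustration; it is not the vendored helper, which is not reproduced in this file.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Tuple


def http_get(url: str, timeout: float = 5.0) -> Tuple[int, Any]:
    """GET `url`; return (status, body). Body is parsed JSON when possible, else text.

    4xx/5xx responses keep their status and body; transport failures
    (DNS, refused connection, timeout) are reported as status 0.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, raw = resp.status, resp.read()
    except urllib.error.HTTPError as exc:
        # The server answered with an error status: still a real HTTP response.
        status, raw = exc.code, exc.read()
    except (urllib.error.URLError, OSError) as exc:
        # No HTTP response at all.
        return 0, str(exc)
    text = raw.decode("utf-8", errors="replace")
    try:
        return status, json.loads(text)
    except ValueError:
        return status, text
```

Returning 0 instead of raising lets retry helpers treat "not reachable yet" and "reachable but not ready" uniformly while polling a freshly deployed app.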
**Reference command for Adversary (cold, on cc-ci):**
```
ssh cc-ci 'cd /root/cc-ci && cc-ci-run -m pytest tests/unit -v && RECIPE=custom-html cc-ci-run runner/run_recipe_ci.py'
```
## Blocked
**Q4.10 drone — OPERATOR host-deploy needed @2026-05-29.** drone's required gitea SCM dep binds
`/etc/timezone`, absent on the NixOS host. Declarative fix committed (`3bde76f`,
`environment.etc.timezone=UTC` in `nix/hosts/cc-ci/configuration.nix`) but needs a host
`nixos-rebuild` to activate (no self-service path on the host; `/root/cc-ci` is operator-synced and currently
stale relative to this commit; this is the same operator deploy mechanism that activated the immich `time.timeZone` fix).
**Operator action:** sync `/root/cc-ci` + `nixos-rebuild switch --flake /root/cc-ci#cc-ci`, then verify
`ssh cc-ci 'cat /etc/timezone'`=UTC. Once deployed, the Builder executes the scoped gitea+drone
integration (JOURNAL-2 `f86a58a`). DEFERRED.md 2026-05-29 drone entry has the full detail. This blocks
ONLY drone (the last §5 recipe); all other §5 recipes are enrolled (mumble/mailu PASS this session;
discourse deferred-sound; the rest PASS earlier).
**(historical) Docker Hub rate-limit block — RESOLVED @2026-05-28 ~22:10Z** (Adversary-confirmed).
**Docker Hub rate-limit fix — DONE (registry-creds finding, plan §1.5), all 3 conditions met.**
Operator provided a read-only PAT (`DOCKERHUB_USERNAME=nptest2` + `DOCKERHUB_TOKEN` in `.testenv`).
Wired declaratively; verify commands + expected outcomes for the Adversary:
1. **Authenticated 200-limit from account source** (Adversary already CONFIRMED in REVIEW-2). Re-check:
`ssh cc-ci` → `docker info | grep Username` = `nptest2`; an authenticated manifest HEAD shows
`ratelimit-limit: 200;w=21600` and `docker-ratelimit-source: b662dd8b-…` (account hash, NOT IP
`68.14.43.142`).
2. **Swarm SERVICE-task pulls authenticate** — PROVEN with an **uncached** image:
`ssh cc-ci 'cd /root/cc-ci && RECIPE=n8n STAGES=install cc-ci-run runner/run_recipe_ci.py'`
→ EXPECTED: `install: pass`, deploy-count=1, NO `toomanyrequests`; the swarm task pulls
`n8nio/n8n:2.20.6` to 1/1. During the run the **account** counter decrements (197→196 at resolution,
then 196→195 at the agent's layer pull; source = account hash), so the agent pull is billed to the account,
not the anon IP. (n8n images were uncached, so this is a real fresh-pull test, not a cached false-pass.)
Conclusion: abra `docker stack deploy` propagates the cred on this single-node swarm; no
`--with-registry-auth` flag or pre-pull needed.
3. **Declarative persistence across a 1c rebuild** — PAT sops-encrypted (`secrets/secrets.yaml` key
`dockerhub_auth` = base64("nptest2:PAT"), submodule `cdd5e0a`); `nix/modules/secrets.nix` adds
`sops.secrets.dockerhub_auth` + `sops.templates."docker-config.json"` → renders
`/root/.docker/config.json` (0600 root) at activation. Verify: after `nixos-rebuild switch`,
`ls -l /root/.docker/config.json` → symlink to `/run/secrets/rendered/docker-config.json`; the
activation log shows `adding rendered secret: docker-config.json`. Recorded in DECISIONS.md
("Docker Hub auth: declarative config.json via sops").
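For reference, the `dockerhub_auth` value in item 3 is the base64 of `user:token`, which is the form the Docker client reads from the `auth` field of `config.json`. The sketch below renders such a file body; `docker_config` is a hypothetical helper, the token is a placeholder, and it is an assumption that the sops template produces this exact shape with the conventional Docker Hub registry key.

```python
import base64
import json


def docker_config(username: str, token: str, registry: str = "https://index.docker.io/v1/") -> str:
    """Render a Docker client config.json body with basic-auth credentials for one registry."""
    auth = base64.b64encode(f"{username}:{token}".encode("utf-8")).decode("ascii")
    return json.dumps({"auths": {registry: {"auth": auth}}}, indent=2)


if __name__ == "__main__":
    # Placeholder token for illustration only; never print a real PAT.
    print(docker_config("nptest2", "example-token"))
```

Decoding the `auth` field of the rendered `/root/.docker/config.json` on the host and checking that it starts with `nptest2:` is a quick way to confirm the template was activated with the intended account.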
**Bonus unblocked:** Q3.2 lasuite-drive base deploy now CONVERGES (all 12 services incl.
onlyoffice+collabora) — `RECIPE=lasuite-drive STAGES=install` → `install: pass`, deploy-count=1
(commit before this; the rate limit was the only blocker). Q3.2 specifics (OIDC/WOPI/upload) are next.
**Earlier Gitea outage (RESOLVED @~21:08Z).** git.autonomic.zone returned blanket `404` for ~1.5h
(backend down; same from my sandbox AND cc-ci). Reconciled: pulled + pushed queued commits. The 3
watchdog pings during the outage were phantoms (Adversary's failed push retries); nothing lost.
**Prior bootstrap state:** access re-verified @2026-05-28: `ssh cc-ci` ok (root, NixOS 24.11), Gitea
API HTTP 200, wildcard DNS resolves to gateway 143.244.213.108.
## Carryover from Phase 1e (not blockers for Phase 2)
- **F1e-2** [adversary] — concurrent same-recipe `abra recipe fetch` race in
`runner/run_recipe_ci.py::fetch_recipe`. Pre-existing in Phase 1d; not a 1e regression. Drone
caps `MAX_TESTS=1` today, so practical impact bounded. Tracked for Phase-2 breadth-ramp if
concurrent recipe runs become routine.