cc-ci/docs/recipe-customization.md
autonomic-bot 8cd72fd78d
feat(harness): P2 — delete legacy customization keys & paths (rcust)
a) compose.ccci.yml is FIRST-CLASS: the harness auto-copies tests/<recipe>/
   compose.ccci.yml into the run's recipe checkout (ABRA_DIR-aware, lifecycle.
   provide_ccci_overlay) and auto-chaoses the pinned base deploy on its presence
   (kills the R7 implicit coupling). ghost/discourse install_steps.sh (copy-only
   boilerplate) deleted; CHAOS_BASE_DEPLOY removed from both metas + the registry.

b) install-time deps wiring is the ONLY mode: deps with DEPS provision BEFORE the
   single deploy; legacy post-deploy provisioning + the setup_custom_tests.sh
   invocation machinery deleted. lasuite-docs migrated to install_steps.sh OIDC
   wiring (same env names/values as the old hook — only the timing moved);
   lasuite-drive's remaining post-deploy MinIO bucket one-shot moved to ops.py
   pre_install; both setup_custom_tests.sh files deleted; OIDC_AT_INSTALL removed
   from drive/meet metas + the registry.

c) SKIP_GENERIC meta key deleted (zero users). Env form CCCI_SKIP_GENERIC* stays
   as the documented dev-only escape hatch; when active in a drone CI run the
   orchestrator prints a loud !! warning (manifest embedding lands in P5).

d) conftest cleanup: dead pre-deploy-once fixtures deployed/deployed_app deleted
   (zero users), app_domain + _short + _wait_healthy dropped (only users were the
   deleted fixtures); deps_apps+deps_creds consolidated into ONE deps fixture
   (entries expose .domain etc. as attributes; dict access intact); the 6 lasuite
   test files renamed deps_creds->deps (fixture name only — assertions and flows
   byte-identical). requires_deps marker + F2-11 skip-report plumbing unchanged.

Registry is now exactly the 14 final keys; docs §4 table regenerated. Stale
setup_custom_tests/OIDC_AT_INSTALL prose in docstrings/comments/assert MESSAGES
updated (no assert logic or expected value touched).

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 175 passed; scripts/lint.sh -> PASS.
2026-06-10 17:01:33 +00:00


# Recipe customization — review spec
Status: REVIEW SPEC — describes the customization surface as it exists today (main), written so
the structure can be reviewed and potentially restructured. §8 lists known limitations and
restructuring candidates; everything before it is purely descriptive.
Companion docs: `docs/testing.md` (test architecture / tier semantics), `docs/enroll-recipe.md`
(step-by-step enrollment). This doc is the **complete reference** for the two questions those docs
answer only partially:
1. How are custom tests written for a particular recipe?
2. What are ALL the per-recipe CI settings, where do they live, and who reads them?
---
## 1. The three customization surfaces
A recipe customizes its CI through **three distinct mechanisms** (worth noticing for the
restructure review — they are three different config languages):
| Surface | Form | Examples |
|---|---|---|
| **Declarative settings** | Python assignments in `tests/<recipe>/recipe_meta.py` | `DEPLOY_TIMEOUT = 1500`, `UPGRADE_BASE_VERSION = "2.3.1+..."` |
| **Code hooks** | Callables in `recipe_meta.py`, `ops.py` functions, shell hooks | `def READY_PROBE(domain): ...`, `pre_upgrade()`, `install_steps.sh` |
| **File presence** | A file existing at a discovered path changes behavior | `test_upgrade.py` overlay, `functional/test_*.py`, `compose.ccci.yml` |
There is additionally a fourth, operator-facing surface: **environment variables**
(`CCCI_SKIP_GENERIC*`) that override declarative settings at run time (§4.4).
## 2. Zero-config baseline
A recipe with **no `tests/<recipe>/` directory at all** still gets the full generic floor:
- deploy base version → INSTALL (generic `assert_serving`: HTTP on `/`, expect 200/301/302)
- chaos-upgrade to PR head → UPGRADE (generic `assert_upgraded`: version label matches head, converged, serving)
- BACKUP (generic `assert_backup_artifact`) — iff the recipe's compose files carry
`backupbot.backup` labels (auto-detected), else N/A
- RESTORE (generic `assert_restore_healthy`)
- CUSTOM tier: empty (no custom tests discovered)
- teardown
Defaults: `HEALTH_PATH="/"`, `HEALTH_OK=(200,301,302)`, `DEPLOY_TIMEOUT=600`, `HTTP_TIMEOUT=300`.
Everything in this doc is opt-in deviation from that floor. The cardinal invariant
(docs/testing.md §1): the generic floor is **always on** and never depends on custom code;
custom is **additive** by default.
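As a rough illustration of "opt-in deviation from the floor", the sketch below layers a recipe's overrides on top of the four documented defaults. `DEFAULTS` and `resolve_settings` are hypothetical names used only for this example; they are not the harness API.

```python
# Hypothetical sketch: names are illustrative, not the harness API.
DEFAULTS = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200, 301, 302),
    "DEPLOY_TIMEOUT": 600,
    "HTTP_TIMEOUT": 300,
}


def resolve_settings(overrides=None):
    """Return the documented defaults with any recipe overrides layered on top."""
    settings = dict(DEFAULTS)
    settings.update(overrides or {})
    return settings


# A recipe with no tests/<recipe>/ directory gets the floor unchanged.
zero_config = resolve_settings()

# A slow-starting recipe overrides only what it needs (values from the immich example in §4.1).
immich_like = resolve_settings({"DEPLOY_TIMEOUT": 1500, "HTTP_TIMEOUT": 600})
```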
## 3. The per-recipe tree — every file that can exist
Two locations, with precedence and a security gate between them:
- **cc-ci-owned**: `tests/<recipe>/` in this repo (trusted, maintainer-reviewed)
- **repo-local**: the recipe repo's own `tests/` dir (PR-author-controlled → **default-deny**,
consulted only when the recipe is listed in `tests/repo-local-approved.txt` — gate HC2,
centralized in `runner/harness/discovery.py`)
```
tests/<recipe>/ # cc-ci side (repo-local mirrors the same shape)
├── recipe_meta.py # ALL declarative settings + meta callables (§4)
├── test_<op>.py # lifecycle overlay assertions, op ∈ install|upgrade|backup|restore (§5.1)
├── ops.py # pre_<op>(domain, meta) seed hooks (§5.2)
├── test_*.py # custom-tier tests (top-level, cross-cutting)(§5.3)
├── functional/test_*.py # custom tier: parity ports + recipe-specific (§5.3)
├── playwright/test_*.py # custom tier: UI flows (§5.3)
├── install_steps.sh # pre-deploy shell hook (§5.4)
├── setup_custom_tests.sh # deps/OIDC credential wiring hook (§5.5)
├── compose.ccci.yml # CI-only compose overlay (via install_steps) (§5.6)
└── PARITY.md # enrollment contract doc (human-read only)
```
Precedence (machine-docs/DECISIONS.md, implemented in `discovery.py`):
- lifecycle overlay `test_<op>.py`: repo-local **wins** over cc-ci (same-name collision); the
generic floor still runs additively alongside.
- custom tier `test_*.py`: **ALL** run, from both locations (no collision concept).
- `install_steps.sh`: repo-local > cc-ci, or none.
- `ops.py` pre-op hook: cc-ci wins; repo-local consulted only if approved.
- `recipe_meta.py`: cc-ci only — repo-local recipes cannot set CI settings (by design; the
settings surface stays maintainer-controlled).
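The precedence rules above can be summarized as three small selection functions. This is a minimal sketch under the assumption that each location yields either a path or `None`; the function names are hypothetical and do not mirror `discovery.py`.

```python
# Hypothetical sketch of the documented precedence rules; not the discovery.py API.

def pick_overlay(cc_ci_path, repo_local_path, approved):
    """Lifecycle overlay: repo-local wins on collision, but only when HC2-approved."""
    if approved and repo_local_path is not None:
        return repo_local_path
    return cc_ci_path


def pick_ops_hook(cc_ci_path, repo_local_path, approved):
    """ops.py pre-op hook: cc-ci wins; repo-local is a fallback only when approved."""
    if cc_ci_path is not None:
        return cc_ci_path
    return repo_local_path if approved else None


def custom_tests(cc_ci_files, repo_local_files, approved):
    """Custom tier: every discovered file runs, from both locations."""
    return list(cc_ci_files) + (list(repo_local_files) if approved else [])
```

In every case the repo-local side is ignored unless the recipe is approved, which is the default-deny behavior of gate HC2.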
## 4. `recipe_meta.py` — complete settings reference
The single settings file. Plain Python, `exec()`d by the harness (trusted, in-repo). A key is "set"
by a top-level assignment or `def`. Unknown names are ignored silently (a recipe may keep private
constants here, e.g. mumble's `WELCOME_TEXT_MARKER` — but see §8 R6: typos in real key names are
also silently ignored).
**Loader column legend** — this is the structural finding for the review (§8 R1). There is no
single loader; six independent code paths each `exec()` the file and pick out their own keys:
| # | Loader | Keys it sees |
|---|---|---|
| L1 | `runner/run_recipe_ci.py:_load_meta` (orchestrator) | 4 base + explicit 8-key allowlist |
| L2 | `tests/conftest.py:_recipe_meta` (pytest `meta` fixture) | 4 base keys ONLY |
| L3 | `runner/harness/lifecycle.py:_recipe_extra_env` | `EXTRA_ENV` only |
| L4 | `runner/harness/lifecycle.py:_recipe_meta_flag` | boolean flags by name (`CHAOS_BASE_DEPLOY`) |
| L5 | `runner/harness/deps.py:declared_deps` | `DEPS` only |
| L6 | `runner/harness/canonical.py:is_canonical_enrolled` | `WARM_CANONICAL` only |
> **Restructure status (rcust P1):** the six loaders above are HISTORY — they have been replaced by
> the single registry-backed loader `runner/harness/meta.py::load(recipe) -> RecipeMeta` (the only
> `exec()` of `recipe_meta.py`). Unknown ALL-CAPS keys / type mismatches are now hard errors;
> underscore-prefixed names are recipe-private. The authoritative key reference is the generated
> table below; the per-loader subsections §4.1–§4.8 are retained for context until the P6 doc
> rewrite.
<!-- META-TABLE-START -->
_This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scripts/gen-meta-docs.py` — do not edit by hand (a unit test pins the sync)._
| Key | Type | Default | Meaning |
|---|---|---|---|
| `HEALTH_PATH` | `str` | `'/'` | Path probed for serving/health checks (deploy wait + generic `assert_serving`). |
| `HEALTH_OK` | `tuple[int]` | `(200, 301, 302)` | Acceptable HTTP status codes for health. |
| `DEPLOY_TIMEOUT` | `int` | `600` | Max seconds to wait for swarm convergence per deploy. |
| `HTTP_TIMEOUT` | `int` | `300` | Max seconds to wait for HTTP health after convergence. |
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect. |
| `EXPECTED_NA` | `dict` | `None` | Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes. |
| `READY_PROBE` | `hook` | `None` | Callable returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`. |
| `UPGRADE_BASE_VERSION` | `str` | `None` | Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`). |
| `BACKUP_VERIFY` | `hook` | `None` | Callable `-> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts. |
| `UPGRADE_EXTRA_ENV` | `dict_or_hook` | `None` | Extra `.env` keys applied after the PR-head checkout, before the chaos redeploy (env that exists only at head). |
| `EXTRA_ENV` | `dict_or_hook` | `{}` | Extra `.env` keys applied at EVERY deploy (base install AND upgrade old-app). Callable form derives values from the per-run domain. |
| `DEPS` | `list[str]` | `[]` | Dep recipes deployed/provisioned alongside (e.g. `["keycloak"]`); creds land in `$CCCI_DEPS_FILE`. |
| `WARM_CANONICAL` | `bool` | `False` | Enroll the recipe in the warm/canonical app system (docs/warm.md): green cold runs on LATEST advance the canonical snapshot. |
| `SCREENSHOT` | `hook` | `None` | Callable driving Playwright to a safe, credential-free post-login view for the results-card screenshot (default: landing page). |
<!-- META-TABLE-END -->
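To make the table concrete, here is what a `recipe_meta.py` using several of these keys might look like. Every value is a placeholder: the health path, the `SANDBOX_DOMAIN` derivation, the port number, and the boolean form of `stable` are assumptions for illustration, not taken from any enrolled recipe.

```python
# tests/<recipe>/recipe_meta.py (illustrative only; all values are hypothetical)

# Slow-starting app: raise both timeouts above the 600 s / 300 s defaults.
DEPLOY_TIMEOUT = 1500
HTTP_TIMEOUT = 600

# Health endpoint that should answer 200 only (placeholder path).
HEALTH_PATH = "/healthz"
HEALTH_OK = (200,)

# Declare the backup rung intentionally N/A; this changes report wording only.
EXPECTED_NA = {"backup": "stateless, nothing to back up"}


def EXTRA_ENV(domain):
    """Callable form: derive .env values from the per-run domain."""
    return {"SANDBOX_DOMAIN": f"sandbox.{domain}"}


def READY_PROBE(domain):
    """Extra TCP readiness probe; `stable` is assumed here to be a boolean flag."""
    return [{"tcp_host": domain, "tcp_port": 64738, "stable": True}]


# Underscore-prefixed names are recipe-private constants.
_WELCOME_TEXT_MARKER = "ci-welcome"
```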
### 4.1 HTTP / health / timing (base 4 — seen by L1 AND L2)
| Key | Type / default | Meaning | Used by |
|---|---|---|---|
| `HEALTH_PATH` | str, `"/"` | Path probed for serving/health checks | deploy wait (`lifecycle.py`), generic `assert_serving` |
| `HEALTH_OK` | tuple, `(200, 301, 302)` | Acceptable HTTP status codes for health | same |
| `DEPLOY_TIMEOUT` | int s, `600` | Max wait for swarm convergence per deploy | `lifecycle.py`, generic ops |
| `HTTP_TIMEOUT` | int s, `300` | Max wait for HTTP health after converged | same |
Example: immich sets `DEPLOY_TIMEOUT = 1500`, `HTTP_TIMEOUT = 600` (ML containers are slow).
### 4.2 Upgrade tier (loader L1)
| Key | Type / default | Meaning |
|---|---|---|
| `UPGRADE_BASE_VERSION` | str (exact published tag), default `None` | **The "base pin"** — overrides the harness default base for the upgrade tier. Default base = `recipe_versions[-2]` (the previous published version); pin when that is not the PR's true predecessor (e.g. the PR is the first release on a new major, or the previous tag is known-broken). Must be an exact published tag — typos fail the base deploy. Consumed at `run_recipe_ci.py` (`prev = meta.get("UPGRADE_BASE_VERSION") or lifecycle.previous_version(recipe)`). Users: discourse, plausible. |
| `UPGRADE_EXTRA_ENV` | dict **or** callable `(domain) -> dict`, default `None` | Extra `.env` keys applied **after** the PR-head checkout, **before** the chaos redeploy (F2-14c) — for env vars that exist only at head (a new required setting introduced by the PR). Consumed in `generic.py:256`. User: mumble. |
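The base-version selection described for `UPGRADE_BASE_VERSION` reduces to "pin if set, else the previous published version". The sketch below shows that rule in isolation; `upgrade_base` is a hypothetical helper, and raising `ValueError` when fewer than two versions exist is an assumption, not documented harness behavior.

```python
# Hypothetical sketch of the base-pin rule; not the run_recipe_ci.py code.

def upgrade_base(meta, recipe_versions):
    """Pick the base version for the upgrade tier.

    An explicit UPGRADE_BASE_VERSION pin wins; otherwise fall back to the
    previous published version, recipe_versions[-2].
    """
    pinned = meta.get("UPGRADE_BASE_VERSION")
    if pinned:
        return pinned
    if len(recipe_versions) < 2:
        # Assumed behavior: with no predecessor there is nothing to upgrade from.
        raise ValueError("need at least two published versions for a default base")
    return recipe_versions[-2]
```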
### 4.3 Every-deploy shaping (loaders L3/L4 — NOT in the L1 allowlist)
| Key | Type / default | Meaning |
|---|---|---|
| `EXTRA_ENV` | dict **or** callable `(domain) -> dict`, default `{}` | Extra `.env` keys applied at **every** deploy (base install AND upgrade old-app). Callable form derives values from the per-run domain (e.g. cryptpad's `SANDBOX_DOMAIN`). Loaded by `lifecycle.py:_recipe_extra_env` (its own `exec()`). Users: cryptpad, discourse, ghost, matrix-synapse, mattermost-lts, mumble, plausible. |
| `CHAOS_BASE_DEPLOY` | bool, default `False` | Base deploy uses `--chaos` so it survives untracked files in the recipe checkout (required when `install_steps.sh` copies in a `compose.ccci.yml` overlay — §5.6; implicit coupling, see §8 R7). Loaded by `lifecycle.py:_recipe_meta_flag`. Users: discourse, ghost. |
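Both `EXTRA_ENV` and `UPGRADE_EXTRA_ENV` accept either a dict or a callable taking the per-run domain. A minimal sketch of normalizing the two forms follows; `resolve_env` is a hypothetical name, and coercing keys and values to strings is an assumption made because `.env` entries are text.

```python
# Hypothetical sketch of dict-or-callable normalization; not the lifecycle.py code.

def resolve_env(value, domain):
    """Turn a dict-or-callable setting into a plain dict of .env keys."""
    if value is None:
        return {}
    env = value(domain) if callable(value) else value
    # Assumption: stringify so every entry can be written as KEY=value.
    return {str(k): str(v) for k, v in env.items()}
```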
### 4.4 Skips and intentional N/A (loader L1)
| Key | Type / default | Meaning |
|---|---|---|
| `SKIP_GENERIC` | list of op names or `"all"`/`"*"`, default `[]` | Suppress the generic floor for the listed ops (overlay becomes override instead of additive). Two env equivalents at run time: `CCCI_SKIP_GENERIC=1` (all ops), `CCCI_SKIP_GENERIC_<OP>=1` (one op). Currently set by **no enrolled recipe** (env form is the one used, ad hoc). |
| `EXPECTED_NA` | dict `{rung: reason}`, default `None` | Declares an N/A rung **intentional** (e.g. `{"backup": "stateless, nothing to back up"}`). Undeclared N/A is reported as an *unintentional coverage gap*. Both cap the achievable level — declaring does not un-cap, it only changes the report wording (`results.py`). User: custom-html-tiny. |
| `BACKUP_CAPABLE` | bool, default auto-detect | Overrides the backup-tier capability detection (scan of recipe compose files for `backupbot.backup` labels, `generic.py:34`). `False` forces N/A; `True` forces the tier on. Users: custom-html-bkp-bad/rst-bad (harness self-test recipes). |
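The two run-time env forms of the skip switch can be read as a single predicate per op. The sketch below assumes the `<OP>` suffix is the upper-cased op name (for example `CCCI_SKIP_GENERIC_BACKUP`); `generic_skipped` is a hypothetical helper.

```python
# Hypothetical sketch of the env-based skip check; the upper-cased suffix is an assumption.

def generic_skipped(op, env):
    """Return True when the generic floor is suppressed for `op` via env overrides."""
    if env.get("CCCI_SKIP_GENERIC") == "1":
        return True  # all ops
    return env.get(f"CCCI_SKIP_GENERIC_{op.upper()}") == "1"  # one op
```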
### 4.5 Readiness & data-verification hooks (loader L1, callable values)
| Key | Type / default | Meaning |
|---|---|---|
| `READY_PROBE` | callable `(domain) -> [probe, ...]`, default `None` | Extra readiness probes run after install AND after upgrade, before that tier's assertions. Probe dicts: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}` (`stable`: must stay connectable across 3 checks — for UDP-adjacent voice ports etc.). Consumed at `lifecycle.py:516`. Users: lasuite-drive, mumble (TCP voice port). |
| `BACKUP_VERIFY` | callable `(domain) -> bool`, default `None` | Post-backup data-capture check, retried — guards the truncated-dump race (backup snapshot taken before the seeded marker row hit disk). Return `False` → retry the backup, then fail. Users: discourse, ghost. |
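The `BACKUP_VERIFY` contract (re-run the backup while the check returns `False`, then fail) can be sketched as a bounded retry loop. The limit of 3 attempts comes from the generated key table; `backup_with_verify` and its `run_backup` parameter are hypothetical stand-ins for the orchestrator's own code.

```python
# Hypothetical sketch of the verify-and-retry loop; not the orchestrator's implementation.

def backup_with_verify(run_backup, verify, domain, max_attempts=3):
    """Run the backup, re-running it while verify(domain) returns False.

    Returns the attempt number that passed; raises after max_attempts failures.
    """
    for attempt in range(1, max_attempts + 1):
        run_backup()
        if verify is None or verify(domain):
            return attempt
    raise RuntimeError(f"backup not verified after {max_attempts} attempts")
```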
### 4.6 Dependencies / SSO (loaders L5 + L1)
| Key | Type / default | Meaning |
|---|---|---|
| `DEPS` | list of recipe names, default `[]` | Dep recipes deployed alongside (e.g. `["keycloak"]`). Dep domain is `<dep[:4]>-<6hex>`, hashed from (parent, pr, ref, dep), so it is collision-free per run. Creds land in `$CCCI_DEPS_FILE` (JSON); tests read them through the consolidated `deps` fixture (formerly `deps_apps`/`deps_creds`); deps are torn down LAST. Deploy-count guard becomes `1 + len(DEPS)`. Loaded by `deps.py:declared_deps`. Users: lasuite-docs/-drive/-meet. |
| `OIDC_AT_INSTALL` | bool, default `False` | Provision deps **before** the single base deploy so `install_steps.sh` can wire OIDC env into that one deploy (reads `$CCCI_DEPS_FILE`). Default (legacy) is post-deploy provisioning + a `setup_custom_tests.sh` redeploy. Consumed at `run_recipe_ci.py:514`. Users: lasuite-drive, lasuite-meet. |
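The `<dep[:4]>-<6hex>` naming scheme can be sketched as a deterministic hash over the four inputs. The choice of SHA-256 and the NUL separator are assumptions for illustration; the source only states that the label is hashed from (parent, pr, ref, dep).

```python
import hashlib

# Hypothetical sketch of the dep-domain label; hash algorithm and separator are assumed.

def dep_domain_label(parent, pr, ref, dep):
    """Build '<dep[:4]>-<6hex>' from a hash of (parent, pr, ref, dep)."""
    key = "\0".join([parent, str(pr), ref, dep]).encode("utf-8")
    return f"{dep[:4]}-{hashlib.sha256(key).hexdigest()[:6]}"
```

Because the label depends only on its inputs, re-running the same (parent, pr, ref) yields the same dep domain, while a different PR or ref yields a different one with high probability.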
### 4.7 Warm-canonical enrollment (loader L6)
| Key | Type / default | Meaning |
|---|---|---|
| `WARM_CANONICAL` | bool, default `False` | Enrolls the recipe in the warm/canonical app system (`docs/warm.md`): green COLD runs on LATEST advance the canonical snapshot; the nightly sweep iterates enrolled recipes. Loaded by `canonical.py:is_canonical_enrolled`. User: custom-html. |
### 4.8 Cosmetic (BROKEN — see §8 R2)
| Key | Type / default | Meaning |
|---|---|---|
| `SCREENSHOT` | callable `(page, domain, meta) -> None` | Drives Playwright to a safe post-login view for the results-card screenshot (default: landing page). **Currently unreachable from the CI path**: `screenshot.py:41` reads it from the meta dict the orchestrator passes (`run_recipe_ci.py:1056`), but the L1 allowlist never loads `SCREENSHOT`, so the hook is always `None`. No recipe sets it (consistent with it never having worked). |
## 5. Writing custom tests & hooks
### 5.1 Lifecycle overlay assertions — `test_<op>.py`
One pytest file per lifecycle op (`install` / `upgrade` / `backup` / `restore`). The
**orchestrator performs the op exactly once**; the overlay only *asserts* on the resulting state
(HC3 op/assertion split — overlays never deploy, never restore, never mutate). The generic floor
test runs additively against the same state.
Conventions (see `tests/immich/test_backup.py` etc.):
- use the `live_app` fixture (asserts `CCCI_APP_DOMAIN` is set, yields the domain)
- use the `meta` fixture for HEALTH_*/timeouts (note: only the 4 base keys — §8 R3)
- read op context from `$CCCI_OP_STATE_FILE` (JSON written by the orchestrator after the op:
versions, artifact paths)
- execute in-container checks via `harness.lifecycle.exec_in_app(domain, service, cmd)`
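An overlay test typically starts by loading the op context written by the orchestrator. The helper below is a minimal sketch of that step; `load_op_state` is a hypothetical name, and the keys inside the JSON are not specified here, so the example does not assume any.

```python
import json
import os

# Hypothetical helper for overlay tests; the JSON schema is not assumed.

def load_op_state(environ=os.environ):
    """Return the orchestrator's op context, or {} when no state file is set."""
    path = environ.get("CCCI_OP_STATE_FILE")
    if not path:
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

An overlay would call this once and then assert on whatever version or artifact fields the orchestrator recorded for that op.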
### 5.2 Pre-op seed hooks — `ops.py`
`def pre_<op>(domain, meta)` callables, imported and called by the orchestrator **before**
performing the op. This is where data gets seeded so the post-op overlay can assert on it:
```python
# tests/immich/ops.py (pattern)
def pre_upgrade(domain, meta): _psql(domain, "INSERT ... 'upgrade-survives'")
def pre_backup(domain, meta): _psql(domain, "INSERT ... 'original'")
def pre_restore(domain, meta): _psql(domain, "DROP TABLE ci_marker") # damage, restore must undo
```
Seed → op → assert is the whole pattern: `pre_backup` writes a marker, the orchestrator backs up,
`pre_restore` destroys it, the orchestrator restores, `test_restore.py` asserts the marker is back.
### 5.3 Custom tier — `functional/`, `playwright/`, top-level `test_*.py`
All non-lifecycle `test_*.py` (discovery: `discovery.py:custom_tests`, recursive over the
top-level dir + `functional/` + `playwright/`; files named `test_<op>.py` excluded). Run in the
CUSTOM tier, after restore, against the post-upgrade (PR-head) app. ALL discovered files run —
cc-ci's and (if HC2-approved) repo-local's, additively.
Enrollment contract (`docs/enroll-recipe.md`): ≥2 NEW functional tests beyond ports of existing
upstream checks; ported tests carry `SOURCE:` comments. Playwright tests get the shared
browser/harness helpers (`harness.browser`); SSO recipes get `harness.sso`
(`setup_keycloak_realm` — idempotent, `oidc_password_grant` — provider-pluggable).
Tests gate on deps via `CCCI_DEPS_READY` (skip-with-reason when `0`; the skip is counted and
fails the run if deps were declared but unprovisionable — `run_recipe_ci.py:816`).
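The deps gate can be expressed as a small function over the environment. This is a sketch, not the harness's fixture: `deps_gate` is a hypothetical name, and the fallback reason string is an assumption.

```python
# Hypothetical sketch of the CCCI_DEPS_READY gate; not the harness fixture.

def deps_gate(environ):
    """Return (ready, reason) from the CCCI_DEPS_* variables."""
    if environ.get("CCCI_DEPS_READY") == "0":
        # Assumed fallback text when no explicit reason was exported.
        reason = environ.get("CCCI_DEPS_NOT_READY_REASON", "deps not ready")
        return False, reason
    return True, ""
```

A test would call `pytest.skip(reason)` when `ready` is false, which produces the skip-with-reason that the orchestrator then counts.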
### 5.4 Pre-deploy shell hook — `install_steps.sh`
Runs after `abra app new` + `EXTRA_ENV` application + secret generation, **before** the base
deploy. For setup that must precede the first deploy: writing extra config files into the recipe
checkout, copying in a `compose.ccci.yml` overlay (§5.6), editing `.env` beyond simple key=val.
Env contract: `CCCI_APP_DOMAIN`, `CCCI_RECIPE`, `CCCI_APP_ENV` (path to the app's `.env`), and —
when `OIDC_AT_INSTALL` deps exist — `CCCI_DEPS_FILE`. Must locate the recipe checkout
ABRA_DIR-aware: `RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"` (per-run
`ABRA_DIR` since the concurrency restructure — a hardcoded `~/.abra` writes to the wrong tree).
Graceful-generic rule: a recipe needing a hook but not shipping one simply fails the generic
install — a correct reported outcome, not a harness error.
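Should Python-side code ever need the same checkout location, the shell expression above translates as follows. `recipe_dir` is a hypothetical helper; the only behavior it encodes is the documented `${ABRA_DIR:-${HOME}/.abra}` fallback.

```python
import os
from pathlib import Path

# Hypothetical Python equivalent of the documented shell expression.

def recipe_dir(environ=os.environ):
    """Resolve ${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}."""
    # `or` mirrors the shell `:-` form: an unset OR empty ABRA_DIR falls back.
    abra_dir = environ.get("ABRA_DIR") or str(Path(environ["HOME"]) / ".abra")
    return Path(abra_dir) / "recipes" / environ["CCCI_RECIPE"]
```

Resolving the path from the environment on every call, rather than hardcoding `~/.abra`, is what keeps concurrent runs from writing into each other's trees.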
### 5.5 Deps credential wiring — `setup_custom_tests.sh`
For legacy (post-deploy) deps provisioning: runs after deps are up, reads `$CCCI_DEPS_FILE`
(jq-readable JSON of dep creds/URLs), wires OIDC config via `abra app config set` + secrets, and
redeploys. With `OIDC_AT_INSTALL = True` this hook is unnecessary (wiring happens in
`install_steps.sh` before the only deploy) — preferred for new enrollments (one deploy, no
deploy-count exception).
### 5.6 CI-only compose overlay — `compose.ccci.yml`
Not auto-discovered: `install_steps.sh` copies it into the recipe checkout, and the recipe must
set `CHAOS_BASE_DEPLOY = True` so the base deploy (`--chaos`) tolerates the untracked file.
Policy: minimal, justified fallback only (ghost's is a 15m `start_period` grace — a literal,
because abra validates `start_period` before env substitution). The overlay is cc-ci-owned even
though it rides in the recipe checkout.
### 5.7 Environment contract summary (what custom code can read)
| Var | Set for | Meaning |
|---|---|---|
| `CCCI_APP_DOMAIN` | all tests + hooks | the app's per-run domain |
| `CCCI_BASE_URL` | approved repo-local code | `https://<domain>` |
| `CCCI_RECIPE`, `CCCI_APP_ENV` | `install_steps.sh` | recipe name, app `.env` path |
| `CCCI_OP_STATE_FILE` | overlay tests | JSON op context (versions, artifacts) |
| `CCCI_DEPS_FILE` | deps hooks + tests | JSON dep creds dict |
| `CCCI_DEPS_READY` / `CCCI_DEPS_NOT_READY_REASON` | custom tier | gate SSO tests, skip-with-reason |
## 6. Run-model context (what the settings plug into)
One deploy chain per run (full detail: `docs/testing.md` §2):
```
deploy BASE (UPGRADE_BASE_VERSION or recipe_versions[-2]; EXTRA_ENV; install_steps.sh;
CHAOS_BASE_DEPLOY?; OIDC_AT_INSTALL deps first?)
→ INSTALL tier (READY_PROBE; generic + overlay asserts)
→ pre_upgrade → chaos-deploy PR HEAD (UPGRADE_EXTRA_ENV)
→ UPGRADE tier (READY_PROBE; version-label == head_ref)
→ pre_backup → backup (BACKUP_CAPABLE; BACKUP_VERIFY)
→ BACKUP tier
→ pre_restore → restore
→ RESTORE tier
→ CUSTOM tier (functional/ + playwright/; deps via CCCI_DEPS_*)
→ teardown (deps LAST)
```
Deploy-count guard (DG4.1): exactly `1 + len(DEPS)` deploys per run (chaos redeploys don't
count); the per-run counter file is keyed by run since the concurrency restructure.
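The deploy-count guard is simple arithmetic, sketched below. `check_deploy_count` is a hypothetical name, and `observed` is assumed to be the number of counted (non-chaos) deploys read from the per-run counter file.

```python
# Hypothetical sketch of guard DG4.1; not the harness implementation.

def check_deploy_count(observed, deps):
    """Enforce exactly 1 + len(DEPS) counted deploys per run."""
    expected = 1 + len(deps)
    if observed != expected:
        raise AssertionError(f"expected {expected} deploys, counted {observed}")
    return expected
```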
## 7. Local iteration
```sh
RECIPE=<recipe> PR=<n> REF=<sha> SRC=recipe-maintainers/<recipe> \
STAGES=install,upgrade,backup,restore,custom \
cc-ci-run runner/run_recipe_ci.py
```
(`docs/enroll-recipe.md` §5 for the full loop, including dep teardown caveats.)
## 8. Known limitations & restructuring candidates
The review section. Ordered by how much they'd shape a restructure.
**R1 — Six divergent meta loaders (the core drift hazard).** §4's L1–L6: every loader re-`exec()`s
`recipe_meta.py` and cherry-picks its own keys. Adding a key means knowing *which* loader to touch
(or that you must extend the L1 allowlist — `SCREENSHOT` proves people don't, R2). Two conventions
coexist: L1's explicit allowlist vs L3–L6's ad-hoc `ns.get(...)` which silently bypasses it.
*Candidate:* one `harness.meta.load(recipe) -> RecipeMeta` with a declarative key registry
(name, type, default, validator, consumer) as the single source of truth; L1–L6 become lookups
into the one loaded object; the registry also generates §4 of this doc (kills doc drift, R5).
**R2 — `SCREENSHOT` is a dead knob.** Fully implemented consumer (`screenshot.py`), documented
hook contract, never reachable: the orchestrator's allowlist omits it, so the dict passed at
`run_recipe_ci.py:1056` can never contain it. Direct evidence of R1. *Candidate:* fix trivially by
adding to the allowlist — or delete the hook path if post-login screenshots aren't wanted; decide
during the restructure.
**R3 — The pytest `meta` fixture sees 4 keys.** `tests/conftest.py:_recipe_meta` loads only
HEALTH_*/timeouts. An overlay test wanting e.g. `EXPECTED_NA` or a recipe constant must re-exec
the file itself. Probably intended minimalism, but it's a third key-set to keep in sync.
*Folds into R1.*
**R4 — Settings split across three config languages** (§1): recipe_meta keys, file-presence
(`install_steps.sh` existing changes deploy behavior), and run-time env (`CCCI_SKIP_GENERIC*`).
A reviewer asking "what does this recipe customize?" must check all three. *Candidate:* keep the
three surfaces (they serve different actors) but make the run header log a single resolved
"customization manifest" per run: every non-default key + every discovered hook file + every
CCCI_* override, in one block.
**R5 — Reference-doc drift already happened.** `docs/testing.md` documents 6 meta keys,
`docs/enroll-recipe.md` shows others by example; neither is complete (18 keys exist). This doc is
now complete but handwritten — it will drift too. *Candidate:* generate the key table from the R1
registry (test asserts doc ⊆ registry).
**R6 — No schema validation / silent typos.** Unknown top-level names in `recipe_meta.py` are
ignored, which is load-bearing (recipes keep private constants there: mumble's
`WELCOME_TEXT_MARKER`, `MAX_USERS`). Consequence: misspelling `READY_PROBE` as `READINESS_PROBE`
silently disables the probe — the run goes green with less coverage, the worst failure mode for a
CI harness. *Candidate:* with the R1 registry, warn (not fail) on ALL-CAPS top-level names that
are not registered and not referenced by the recipe's own tests; or namespace private constants
(`_WELCOME_TEXT_MARKER`).
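A check of this kind could look like the sketch below, which lists ALL-CAPS top-level names that are missing from a registry while leaving underscore-prefixed names alone. `REGISTERED` is a deliberately partial, hypothetical stand-in for the real key registry, and `unregistered_keys` is not a harness function.

```python
import re

# Hypothetical sketch of typo detection; REGISTERED is a partial stand-in registry.
REGISTERED = {"HEALTH_PATH", "HEALTH_OK", "DEPLOY_TIMEOUT", "HTTP_TIMEOUT", "READY_PROBE"}

# Must start with a capital letter, so `_PRIVATE` names never match.
ALL_CAPS = re.compile(r"^[A-Z][A-Z0-9_]*$")


def unregistered_keys(source):
    """Exec recipe_meta-style source and list ALL-CAPS names not in the registry."""
    namespace = {}
    exec(source, namespace)  # trusted, in-repo input only
    return sorted(n for n in namespace if ALL_CAPS.match(n) and n not in REGISTERED)


# The misspelled probe name is reported; the private constant is not.
suspects = unregistered_keys(
    "READINESS_PROBE = lambda domain: []\n"
    "_WELCOME_TEXT_MARKER = 'hi'\n"
    "DEPLOY_TIMEOUT = 900\n"
)
```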
**R7 — `compose.ccci.yml` ⇄ `CHAOS_BASE_DEPLOY` implicit coupling.** The overlay only works if
the recipe *also* sets the flag; forgetting it fails the base deploy with an abra
untracked-files error far from the cause. *Candidate:* if `install_steps.sh` exists alongside a
`compose.ccci.yml`, the harness could auto-enable chaos for the base deploy (or at least assert
the flag and fail with a pointed message).
**R8 — `SKIP_GENERIC` (meta form) has zero users.** Only the env-var form is used, ad hoc. Either
the meta key earns its place (first real user) or it's surface to delete in the restructure.
**R9 — `recipe_meta.py` is code, not config.** Five keys take callables (`EXTRA_ENV`,
`UPGRADE_EXTRA_ENV`, `READY_PROBE`, `BACKUP_VERIFY`, `SCREENSHOT`), so the file must stay an
`exec()`d Python module — it can't be validated as data, serialized into results, or diffed
declaratively. This is a real expressiveness need (cryptpad derives `SANDBOX_DOMAIN` from the
per-run domain), not an accident. *Candidate if restructuring:* split data keys (TOML-able,
schema-validated) from a `hooks.py` (callables only) — but weigh against the cost of two files
per recipe; the R1 registry gets most of the value without the split.
## 9. File / symbol index
| Concern | Where |
|---|---|
| Orchestrator meta loader (L1, allowlist) | `runner/run_recipe_ci.py:250` `_load_meta` |
| Pytest meta fixture (L2) | `tests/conftest.py` `_recipe_meta` |
| `EXTRA_ENV` loader (L3) | `runner/harness/lifecycle.py:114` `_recipe_extra_env` |
| Boolean-flag loader (L4) | `runner/harness/lifecycle.py:132` `_recipe_meta_flag` |
| `DEPS` loader (L5) | `runner/harness/deps.py:37` `declared_deps` |
| `WARM_CANONICAL` loader (L6) | `runner/harness/canonical.py:36` `is_canonical_enrolled` |
| Overlay/custom/hook discovery + HC2 gate | `runner/harness/discovery.py` |
| HC2 allowlist | `tests/repo-local-approved.txt` |
| Generic assertions + `BACKUP_CAPABLE` detect | `runner/harness/generic.py` |
| `READY_PROBE` / `CHAOS_BASE_DEPLOY` consumption | `runner/harness/lifecycle.py:516` / `:283` |
| `EXPECTED_NA` reporting | `runner/harness/results.py` |
| Dead `SCREENSHOT` consumer | `runner/harness/screenshot.py:36`, called `run_recipe_ci.py:1056` |
| Skip-generic logic (meta + env) | `runner/run_recipe_ci.py:285` |
| Worked examples | `tests/ghost/` (overlay+chaos), `tests/mumble/` (TCP probe, UPGRADE_EXTRA_ENV), `tests/lasuite-drive/` (DEPS+OIDC_AT_INSTALL), `tests/immich/` (ops.py seed pattern) |