cc-ci/JOURNAL-rcust.md
2026-06-10 19:13:36 +00:00


JOURNAL — sub-phase rcust (Builder)

2026-06-10 bootstrap

Read phase plan (recipe-custom-restructure-full-plan.md), plan.md §6.1/§7/§9, and the reference spec docs/recipe-customization.md @ 76a4b6b in full. Created phase state files. Work branch will be restructure/recipe-custom off main @ 76a4b6b. Starting P1: reading the six current loaders (run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env, lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled) before writing harness/meta.py.

2026-06-10 P1 — single loader + registry (branch 472a68b)

Wrote runner/harness/meta.py: KEYS registry (14 keys + CHAOS_BASE_DEPLOY/OIDC_AT_INSTALL/SKIP_GENERIC kept registered as deprecated=True so P1 lands green before P2 deletes them), RecipeMeta generated from KEYS via dataclasses.make_dataclass (frozen; the field set cannot drift from the registry), load() = the only exec() of recipe_meta.py, MetaError on unknown ALL-CAPS key / type mismatch / callable on a data key, and a difflib suggestion in the unknown-key message. BACKUP_CAPABLE keeps its tri-state via default None (None = auto-detect, preserving the semantics of the old "BACKUP_CAPABLE" in meta check in generic.backup_capable).
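A minimal sketch of the registry-driven loader described above, for readers who haven't seen meta.py. It assumes a hypothetical Key record (kind/default/hook/deprecated) and an illustrative four-key subset; the key names come from this journal, but their types and defaults are guesses, and the real KEYS carries more metadata (e.g. hook_params, added in P3).

```python
import dataclasses
import difflib


@dataclasses.dataclass(frozen=True)
class Key:
    """One registry entry (hypothetical shape)."""

    name: str
    kind: type  # expected type for plain data values
    default: object = None
    hook: bool = False  # hook keys may hold a callable instead of data
    deprecated: bool = False  # still registered, scheduled for deletion


# Illustrative subset only; names are from the journal, types/defaults are guesses.
KEYS = {
    k.name: k
    for k in (
        Key("DEPLOY_TIMEOUT", int, 900),
        Key("EXTRA_ENV", dict, None, hook=True),
        Key("BACKUP_CAPABLE", bool, None),  # None = auto-detect (tri-state)
        Key("SKIP_GENERIC", str, None, deprecated=True),
    )
}

# Generated from KEYS, so the field set cannot drift from the registry.
RecipeMeta = dataclasses.make_dataclass(
    "RecipeMeta",
    [(k.name, object, dataclasses.field(default=k.default)) for k in KEYS.values()],
    frozen=True,
)


class MetaError(Exception):
    """Raised for any invalid recipe_meta.py."""


def validate(namespace: dict):
    values = {}
    for name, value in namespace.items():
        if name.startswith("_") or not name.isupper():
            continue  # imports, helpers and lowercase names are not meta keys
        key = KEYS.get(name)
        if key is None:
            close = difflib.get_close_matches(name, list(KEYS), n=1)
            hint = f"; did you mean {close[0]}?" if close else ""
            raise MetaError(f"unknown key {name}{hint}")
        if callable(value):
            if not key.hook:
                raise MetaError(f"{name} is a data key but holds a callable")
        elif not isinstance(value, key.kind):
            raise MetaError(
                f"{name}: expected {key.kind.__name__}, got {type(value).__name__}"
            )
        values[name] = value
    return RecipeMeta(**values)


def load(source: str):
    namespace: dict = {}
    exec(source, namespace)  # the single exec() of recipe_meta.py source
    return validate(namespace)
```

Generating the dataclass from the registry (rather than declaring both) is what makes "field set cannot drift" a structural guarantee instead of a test.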

Migrations: orchestrator loads once + passes meta down (deploy_app/perform_upgrade/_perform_op/run_lifecycle_tier all take the object); conftest meta fixture returns full RecipeMeta (R3 closed); lifecycle._recipe_extra_env/_recipe_meta_flag and deps.declared_deps deleted; canonical.is_enrolled and enrolled_recipes go through meta.load (tests monkeypatch meta.TESTS_DIR now instead of canonical.file); screenshot._load_screenshot_hook reads the attribute (R2 fixed; a unit test proves SCREENSHOT survives the real orchestrator load path). deploy_app keeps an optional meta=None fallback (loads via the single loader) for fixture/manual callers, so exec still happens in exactly one function.

Effective-value safety check before committing: dumped non_default() for all 21 recipe dirs through the new loader — every recipe's customized key set matches its recipe_meta.py source (e.g. mumble: DEPLOY_TIMEOUT/EXTRA_ENV/HEALTH_OK/READY_PROBE/UPGRADE_EXTRA_ENV). One intentional delta class: deps.deploy_deps' fallback timeouts for a MISSING dep meta change from literal 900/600 to loading the dep's real meta (orchestrator path always supplied metas, so CI behavior is identical).
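The non_default() dump is essentially a diff of each field against its registry default. A minimal sketch, assuming a hand-written stand-in RecipeMeta whose defaults (including HEALTH_OK's) are hypothetical; the usage also shows the dataclasses.replace pattern the f212 tests below were switched to.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class RecipeMeta:  # hypothetical stand-in for the registry-generated class
    DEPLOY_TIMEOUT: int = 900
    HEALTH_OK: tuple = (200,)
    EXTRA_ENV: object = None
    READY_PROBE: object = None


def non_default(meta) -> dict:
    """Return only the fields whose value differs from the declared default.

    Assumes plain defaults; a field using default_factory has f.default ==
    dataclasses.MISSING and would always be reported.
    """
    out = {}
    for f in dataclasses.fields(meta):
        value = getattr(meta, f.name)
        if value != f.default:
            out[f.name] = value
    return out
```

For example, dataclasses.replace(RecipeMeta(), DEPLOY_TIMEOUT=1800) reports only DEPLOY_TIMEOUT, which is the per-recipe key set compared against each recipe_meta.py.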

Verified on cc-ci (rsynced working tree before committing): cc-ci-run -m pytest tests/unit -q -> 175 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS. Three pre-existing f212 unit tests passed dicts to wait_ready_probes; they were updated mechanically to construct RecipeMeta via dataclasses.replace (assertions untouched).

Next: P2a compose.ccci.yml first-class + auto-chaos.

2026-06-10 P2 — legacy keys & paths deleted (branch 8cd72fd)

P2a: lifecycle.provide_ccci_overlay copies tests//compose.ccci.yml into the per-run checkout (after install_steps hook, before prepull/deploy); pinned base deploys auto-chaos on overlay presence (has_ccci_overlay replaces the meta.CHAOS_BASE_DEPLOY elif). ghost/discourse install_steps.sh were copy-only -> deleted whole; their metas keep COMPOSE_FILE in EXTRA_ENV (unchanged wiring, the harness now owns the copy).
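A minimal sketch of the overlay copy plus the overlay-driven chaos decision. It assumes the harness knows the recipe's test dir and the per-run checkout as two Path values; the parameter names and the base_deploy_is_chaos helper are hypothetical, and how the chaos decision is passed on to the deploy is not shown.

```python
import shutil
from pathlib import Path

OVERLAY = "compose.ccci.yml"


def has_ccci_overlay(recipe_tests_dir: Path) -> bool:
    """True when the recipe's test dir ships a compose.ccci.yml overlay."""
    return (recipe_tests_dir / OVERLAY).is_file()


def provide_ccci_overlay(recipe_tests_dir: Path, checkout: Path) -> bool:
    """Copy the overlay into the per-run checkout; return whether one was copied.

    Meant to run after the install_steps hook and before prepull/deploy.
    """
    if not has_ccci_overlay(recipe_tests_dir):
        return False
    shutil.copy2(recipe_tests_dir / OVERLAY, checkout / OVERLAY)
    return True


def base_deploy_is_chaos(recipe_tests_dir: Path, pinned_base: bool) -> bool:
    """Pinned base deploys go chaos iff an overlay exists (replaces CHAOS_BASE_DEPLOY)."""
    return pinned_base and has_ccci_overlay(recipe_tests_dir)
```

Keying the decision on file presence removes the need for a separate meta flag that could disagree with whether an overlay is actually there.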

P2b: oidc_at_install condition removed; declared deps now always provision before the single deploy, and the legacy post-deploy block + _run_setup_custom_tests_hook are deleted. lasuite-docs install_steps.sh is the meet/drive hook with docs' exact env names (diffed against the deleted setup_custom_tests.sh: same keys incl. OIDC_OP_DISCOVERY_ENDPOINT + scopes 'openid email profile'; secret-insert bump identical; only the abra-redeploy step is gone, since the single deploy reads the env instead). lasuite-drive's MinIO bucket one-shot -> ops.py pre_install (runs at install-tier start, post-deploy; the bucket lives in the minio volume so it survives upgrade/restore; same scale --detach + 30x3s poll as the shell version). run_quick: deps still provision (realm/creds) but the hook call is gone; no quick-enrolled recipe declares DEPS today (noted inline).
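The "30x3s poll" carried over from the shell version is a bounded retry loop. A minimal sketch, where check is a hypothetical zero-argument callable (the journal doesn't say exactly what pre_install polls after the scale --detach):

```python
import time
from typing import Callable


def poll(check: Callable[[], bool], attempts: int = 30, interval: float = 3.0) -> bool:
    """Call check() up to `attempts` times, sleeping `interval` seconds between tries.

    Returns True as soon as check() succeeds, False once attempts are exhausted.
    No sleep happens after the final failed attempt.
    """
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            time.sleep(interval)
    return False
```

With the defaults the worst case is about 87 seconds of sleeping (29 intervals of 3 s) before giving up.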

P2c: SKIP_GENERIC out of the registry; _skip_generic(op) env-only; skip_generic_env_overrides() prints a !! warning when active under DRONE (P5 will embed in the manifest).

P2d: conftest deps fixture = dict of _DepEntry (dict subclass w/ attribute sugar) — the 6 lasuite files only ever used deps_creds, renamed param to deps, zero assertion changes. NOTE for Adversary: some assert MESSAGE strings ('setup_custom_tests should have populated this.' -> 'dep provisioning...') and docstrings updated — message text only, no assert logic/expected values.
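A minimal sketch of a dict subclass with attribute sugar in the spirit of _DepEntry; the entry keys (client_id, realm) and the wrap_deps helper are hypothetical.

```python
class _DepEntry(dict):
    """dict with attribute access: entry.client_id is entry["client_id"].

    __getattr__ only runs when normal lookup fails, so dict methods such as
    .items still win; a key named "items" must be read with entry["items"].
    """

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            # AttributeError (not KeyError) keeps hasattr()/getattr(default) working
            raise AttributeError(name) from None


def wrap_deps(raw: dict) -> dict:
    """Hypothetical helper: recipe -> _DepEntry, as the deps fixture might build it."""
    return {recipe: _DepEntry(entry) for recipe, entry in raw.items()}
```

Because each entry is still a real dict, existing subscript-style assertions in the lasuite tests keep working unchanged.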

Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 175 passed; nix develop .#lint --command scripts/lint.sh -> PASS. Doc table regenerated to the 14-key registry (doc-sync unit test pins it).

Next: P3 — HookCtx + ctx-hook signatures everywhere.

2026-06-10 P3 — uniform ctx hook convention (branch fd02d9f)

HookCtx frozen dataclass + hook_ctx() constructor in harness/meta.py; ctx.deps read straight from $CCCI_DEPS_FILE (json, both shapes) — meta.py stays import-cycle-free (deps.py imports lifecycle which imports meta). Registry keys carry hook_params; meta.load() enforces the expected positional names per hook key (READY_PROBE/BACKUP_VERIFY/EXTRA_ENV/UPGRADE_EXTRA_ENV=(ctx,), SCREENSHOT=(page, ctx)); run_pre_hook applies meta.check_hook_signature(fn, ("ctx",)) to ops.py hooks before calling. Conversion of 17 ops.py + 8 recipe_meta hooks was scripted (def-line regex + bare domain -> ctx.domain inside the pre*/hook function bodies only) and diff-reviewed; the only manual fixes: keycloak pre_restore passed meta -> ctx.meta, and two comment lines in lasuite-drive/-meet metas that the regex over-replaced were restored. wait_ready_probes gained op= (install/upgrade call sites pass it) so probes can know the phase.
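A minimal sketch of the ctx convention: a frozen HookCtx, a positional-name check built on inspect.signature, and a deps reader for $CCCI_DEPS_FILE. The HookCtx field set is limited to the attributes this journal mentions (domain, meta, deps), the hook_ctx signature is hypothetical, and the "both shapes" normalization of the deps file is left out because the shapes aren't described here.

```python
import inspect
import json
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Mapping, Tuple


@dataclass(frozen=True)
class HookCtx:
    """Context passed to every hook (hypothetical field set)."""

    domain: str
    meta: Any = None
    deps: dict = field(default_factory=dict)


class MetaError(Exception):
    pass


def check_hook_signature(fn: Callable, expected: Tuple[str, ...]) -> None:
    """Raise MetaError unless fn's positional parameter names equal `expected`."""
    sig = inspect.signature(fn)
    positional = tuple(
        p.name
        for p in sig.parameters.values()
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    )
    if positional != tuple(expected):
        name = getattr(fn, "__name__", repr(fn))
        raise MetaError(f"{name}{sig}: expected positional params {tuple(expected)}")


def read_deps(env: Mapping[str, str] = os.environ) -> dict:
    """Load the deps JSON named by $CCCI_DEPS_FILE; empty dict if unset or missing."""
    path = env.get("CCCI_DEPS_FILE")
    if not path or not os.path.isfile(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def hook_ctx(domain: str, meta: Any = None) -> HookCtx:
    """Hypothetical constructor: reads deps from the file, importing nothing from deps.py."""
    return HookCtx(domain=domain, meta=meta, deps=read_deps())
```

Reading the deps file directly (instead of importing deps.py) is what keeps meta.py free of the deps -> lifecycle -> meta import cycle.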

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; lint PASS.

Next: P4 — discovery placement rule + op_state/deps fixtures + migrate hand-parsers.

2026-06-10 P4 — custom-test ergonomics (branch 29a28e2)

Pre-change sweeps confirmed the plan's zero-users claims: no top-level non-lifecycle test_*.py in any recipe dir; no recipe test file reads os.environ / CCCI_OP_STATE_FILE directly (the only op-state consumers are the generic assertions via harness.generic.op_state — harness-side, fine). So P4 = discovery glob removal + new op_state fixture + pinning tests; no test migrations needed. test_discovery.py's HC2 gate test moved its repo-local custom fixture under functional/ (the rule); test_discovery_phase2.py now asserts top-level custom is NOT discovered. op_state fixture skips (clear reason) when env unset / file missing / unparseable; tested via request.getfixturevalue.
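A minimal sketch of the op_state logic, kept stdlib-only so it can be exercised without pytest: read_op_state (a hypothetical name) returns either the parsed state or a skip reason, and the fixture shape in the comment is an assumption about how conftest might wrap it.

```python
import json
import os
from typing import Mapping, Optional, Tuple

# Assumed fixture wrapper in conftest.py:
#   @pytest.fixture
#   def op_state():
#       state, reason = read_op_state()
#       if state is None:
#           pytest.skip(reason)
#       return state


def read_op_state(env: Mapping[str, str] = os.environ) -> Tuple[Optional[dict], Optional[str]]:
    """Return (state, None) on success, or (None, skip_reason) otherwise."""
    path = env.get("CCCI_OP_STATE_FILE")
    if not path:
        return None, "CCCI_OP_STATE_FILE not set (not running under the harness)"
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh), None
    except FileNotFoundError:
        return None, f"op-state file {path} does not exist"
    except (OSError, ValueError) as exc:  # json.JSONDecodeError is a ValueError
        return None, f"op-state file {path} unreadable: {exc}"
```

Returning a reason string for each failure mode is what lets the fixture skip with a clear message instead of erroring.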

Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; lint PASS.

Next: P5 — customization manifest (print block + results.json key).

2026-06-10 P5 — customization manifest (branch 68954be)

(Resumed after a usage-limit pause mid-P5; working tree carried the in-flight manifest.py.) New runner/harness/manifest.py: build() collects {meta_non_default, hooks, overlays, custom_tests, env_overrides} via the SAME discovery/meta functions the run uses (so the manifest can never disagree with what actually executes — incl. the HC2 _gated() repo-local gate), render() prints the block. Orchestrator builds+prints right after meta load / repo-local snapshot, BEFORE the quick-lane branch (both lanes get the block); the dict rides into build_results(customization=...) verbatim. run_quick writes no results.json, so the single build_results call site covers all. Hooks render as "", tuples as lists (JSON-clean); ops.py pre-ops listed by cheap source scan (same approach as discovery._module_defines — no import at manifest time).
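One way to list ops.py pre-ops without importing the module, plus the tuple-to-list conversion that keeps the manifest JSON-clean. This sketch parses with ast; the journal doesn't say whether discovery._module_defines uses ast or a regex, and the pre_ naming pattern is an assumption based on names like pre_install/pre_restore.

```python
import ast
import re

PRE_OP = re.compile(r"^pre_[a-z_]+$")  # assumed naming pattern for pre-op hooks


def module_defines(source: str) -> list:
    """Top-level function names in source, found by parsing (never importing) it.

    Raises SyntaxError if the source does not parse.
    """
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]


def listed_pre_ops(source: str) -> list:
    return sorted(n for n in module_defines(source) if PRE_OP.match(n))


def jsonable(value):
    """Tuples become lists (recursively) so the manifest round-trips through JSON."""
    if isinstance(value, dict):
        return {k: jsonable(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [jsonable(v) for v in value]
    return value
```

Parsing instead of importing means module-level side effects in ops.py never run at manifest time.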

Lint flagged: C408 dict() literal, import-block order (manifest after deps), ruff-format on the new test file — all fixed. Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 191 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.

Next: P6 docs, then M1 prep (tests/concurrency proof run + 21-recipe baseline matrix).

2026-06-10 P6 — docs (branch da558ca) + inbox response (858e0f5)

Rewrote the three docs to the restructured end state; kept the generated §4 table byte-identical (doc-sync test pins it). recipe-customization.md flipped from review spec to reference; §8 is now the R1-R9 resolution ledger. Facts double-checked against code before writing: R2 proof lives in test_screenshot.py::test_screenshot_reachable_through_real_load_path (not test_meta.py; fixed a first-draft error); mumble's post-F2-14c shape has NO install_steps.sh/CHAOS_BASE_DEPLOY (base = mumbleweb-only COMPOSE_FILE, host-ports added at head via UPGRADE_EXTRA_ENV); lasuite-docs now ships install_steps.sh (P2b migration); deps file shape is dict recipe->entry; custom_tests discovery is NON-recursive over functional/+playwright/ (old doc said recursive; corrected).
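The non-recursive placement rule can be sketched with Path.glob, which only descends into subdirectories when the pattern contains "**". The custom_tests name and its argument are hypothetical, and the HC2 repo-local gate is left out.

```python
from pathlib import Path

CUSTOM_DIRS = ("functional", "playwright")


def custom_tests(recipe_tests_dir: Path) -> list:
    """test_*.py directly inside functional/ and playwright/ only.

    Top-level test files and anything in nested subdirectories are ignored,
    because glob("test_*.py") matches a single directory level.
    """
    found = []
    for sub in CUSTOM_DIRS:
        d = recipe_tests_dir / sub
        if d.is_dir():
            found.extend(sorted(p for p in d.glob("test_*.py") if p.is_file()))
    return found
```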

Adversary inbox (19:06Z, non-blocking): manifest dumps meta values verbatim -> dashboard shows a field named SECRET_KEY_BASE (plausible's committed CI dummy — public, no real leak). Took the redaction option: _jsonable masks values whose key NAME matches SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment-KEY, recursing into dict values (the plausible case is a NESTED key under EXTRA_ENV); names stay visible. KEYCLOAK_URL deliberately not matched (word-segment KEY). Unit test pins redacted+passthrough both.
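A minimal sketch of the name-based redaction rule: substring match for SECRET/PASSWORD/TOKEN/CREDENTIAL, KEY only as a whole underscore-delimited segment, recursing into nested dicts. The mask string and the case-insensitive matching are choices made for this sketch, not taken from _jsonable.

```python
import re

# SECRET/PASSWORD/TOKEN/CREDENTIAL anywhere in the name; KEY only as a whole
# underscore-delimited segment, so API_KEY is masked but KEYCLOAK_URL is not.
SENSITIVE = re.compile(
    r"SECRET|PASSWORD|TOKEN|CREDENTIAL|(?:^|_)KEY(?:_|$)", re.IGNORECASE
)
MASK = "***"


def redact(value, key=None):
    """Mask values under sensitive key names, recursing into nested dicts.

    Key names are always kept; only the values are replaced.
    """
    if isinstance(key, str) and SENSITIVE.search(key):
        return MASK
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    return value
```

The recursion matters because plausible's SECRET_KEY_BASE sits under EXTRA_ENV rather than at the top level of the meta dump.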

Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 192 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.

Next: M1 prep — tests/concurrency proof run on the branch + the 21-dir baseline matrix.

2026-06-10 M1 prep + claim

Concurrency proof run on branch head 858e0f5 (rsynced tree on cc-ci): cc-ci-run -m pytest tests/concurrency -q -> 23 passed in 11.46s (suite untouched by the restructure, as planned).

Baseline matrix: pulled every /var/lib/cc-ci-runs/*/results.json (141 files) and took the most recent per recipe. 19/21 dirs covered by results.json; mumble's last full run predates the results system (log ~/ccci-mumble-f214c.log, 5 tiers pass 05-31); bluesky-pds likewise (Adversary Phase-2 cold verify e45e0ee). plausible's weekly-report RED was its PR branch (pg13->14, build 200); its default-branch baseline is run 308 (06-10) L4 — runs 307/308 are today's, from the conc-phase M2 sweep. Bad canaries recorded at their designed-fail tier.
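The "most recent per recipe" reduction over the runs directory can be sketched as below. It assumes each results.json carries a recipe name and an ISO-8601 UTC finished_at timestamp; both field names are hypothetical, since the results.json schema isn't shown in this journal.

```python
import json
from pathlib import Path


def latest_per_recipe(runs_root: Path) -> dict:
    """Map recipe -> newest results.json payload under runs_root/*/results.json.

    Assumes hypothetical "recipe" and "finished_at" fields; ISO-8601 UTC
    strings of the same format sort chronologically as plain text.
    """
    latest = {}
    for path in sorted(runs_root.glob("*/results.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or truncated file: skip rather than abort
        if not isinstance(data, dict):
            continue
        recipe, stamp = data.get("recipe"), data.get("finished_at")
        if not recipe or not stamp:
            continue
        if recipe not in latest or stamp > latest[recipe]["finished_at"]:
            latest[recipe] = data
    return latest
```

Coverage is then a set difference between the recipe dirs and the returned keys; recipes with no results.json (mumble and bluesky-pds above) fall out of that difference and need a manual baseline.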

Claimed M1. While waiting: nothing else unblocked in this phase (M2 is gated on M1) — will hold with short fallback polls per §7 case 2.