Compare commits
37 Commits
restructur
...
restructur
| Author | SHA1 | Date | |
|---|---|---|---|
| | 29a28e2028 | | |
| | fd02d9f4b8 | | |
| | 8cd72fd78d | | |
| | 472a68b32c | | |
| | 49fb818c60 | | |
| | 12318582aa | | |
| | 76a4b6b3fa | | |
| | 6060086c01 | | |
| | 9987fba4b6 | | |
| | 74ed24053d | | |
| | 2894778810 | | |
| | 536a3595b9 | | |
| | 0684576d74 | | |
| | fa9a89bcf8 | | |
| | 374371966f | | |
| | b1bca1a745 | | |
| | 4f6c9554b7 | | |
| | 96ba67a63f | | |
| | 139e319d7e | | |
| | 2173894f07 | | |
| | e392c73cbc | | |
| | 3180ae1355 | | |
| | 9d82a02026 | | |
| | bbc2bafbcb | | |
| | b7a009c1fc | | |
| | 56723ae0ec | | |
| | dfa5c8b9ee | | |
| | bb5eb3d3aa | | |
| | 83a6c6e157 | | |
| | 8b9033f3d6 | | |
| | e8e52cf4c6 | | |
| | c51692b57e | | |
| | ffcf441364 | | |
| | 2080d734d3 | | |
| | f98b444559 | | |
| | 08b629f52a | | |
| | e350c94c3f | | |
@@ -2,21 +2,67 @@

## Build backlog

- [ ] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
- [x] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
  teardown funnel + signal.alarm(3600) hard deadline; .drone.yml setsid/trap wrap;
  PEP 446 comment on lock open()
- [ ] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
- [x] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
  janitor probes per-domain lockfiles (acquired→reap under probe lock, held→leave,
  >120min mtime→warn); delete registry symbols
- [ ] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs/<build>/abra with servers+catalogue symlinks,
- [x] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs/<build>/abra with servers+catalogue symlinks,
  fresh recipes/; fetch_recipe = plain clone; delete acquire_recipe_lock; route harness
  recipe paths through ABRA_DIR
- [ ] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
- [ ] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
- [ ] P5 docs/concurrency.md rewrite to the new model
- [x] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
- [x] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
- [x] P5 docs/concurrency.md rewrite to the new model
- [ ] M1 claim (branch complete, both suites + lint green)
- [ ] M2: merge to main after M1 PASS, push build green, live verification a–d
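
The three P2 probe outcomes in the backlog (acquired, held, held for more than 120 minutes) can be sketched roughly as below. This is a minimal illustration that assumes a non-blocking `flock` probe on each per-domain lockfile. The name `probe_lock` and the returned labels are hypothetical and not taken from the harness, and the real janitor would run its reap while still holding the probe lock rather than just returning a label.

```python
import fcntl
import os
import time

# Assumption: mirrors the ">120min mtime -> warn" threshold from the backlog item.
STALE_AFTER_S = 120 * 60


def probe_lock(path):
    """Classify one lockfile as 'orphaned', 'held' or 'long-held' (hypothetical labels)."""
    fd = os.open(path, os.O_RDWR)
    try:
        try:
            # Non-blocking exclusive attempt: succeeds only if no live run holds the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Someone holds it: leave the app alone, only flag very old lockfiles.
            age = time.time() - os.stat(path).st_mtime
            return "long-held" if age > STALE_AFTER_S else "held"
        # Lock acquired: no owner exists, so a janitor could reap here,
        # before the lock is dropped by the close below.
        return "orphaned"
    finally:
        os.close(fd)
```

Probing with the kernel lock itself, instead of consulting a registry file, means a crashed or killed run is detected without any bookkeeping, because the kernel releases a `flock` when its holder exits.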

## Adversary findings

(adversary-owned)
### [adversary] CONC-A1 — double-!testme same domain corrupts the shared deploy-count file (M2(c) FAIL)

**Severity:** blocks M2(c). Both runs of a same-domain double-!testme go RED.

**Root cause (two coupled defects, one shared root):**
1. The DG4.1 deploy-counter file is keyed by DOMAIN in the *shared* system tempdir, NOT per-run:
   `run_recipe_ci.py:930 countfile = /tmp/ccci-deploys-<domain>`. P3 isolated `ABRA_DIR` per run
   but this per-run state file was missed — it predates the restructure (ef44d46) and the OLD
   recipe-flock used to serialize same-recipe runs end-to-end, incidentally masking it.
2. `lifecycle.deploy_app()` calls `_record_deploy()` (lifecycle.py:250) BEFORE
   `acquire_app_lock(domain)` (lifecycle.py:254, introduced by P2 b302f3a). So the counter
   increment happens OUTSIDE the serialization window — a second same-domain run bumps the
   shared counter before it ever blocks on the lock.

**Observed (live, builds 279 + 281, immich PR#2, same domain immi-ad3e33, 2026-06-10T05:04Z):**
- Lock serialization itself WORKS: 281 logged `== app lock: ... in flight — waiting ==` at 2s,
  then `== app lock: acquired ==` at 194s — exactly when 279 exited (279 finished 05:07:35).
- 279 RED: `!! deploy-count 2 != 1 (DG4.1 violation)`. The `2` = 281's pre-lock `_record_deploy`
  (fired ~2s, before 281 blocked) polluting the shared counter 279 was actively using.
- 281 RED: `FileNotFoundError: /tmp/ccci-deploys-immi-ad3e33...` at run_recipe_ci.py:1213 —
  279's end-of-run `os.remove(countfile)` (line 1215) deleted the shared file out from under 281,
  whose single `_record_deploy` had already fired at 2s and never recreates it.
- Control: isolated immich (build 275, same fixed wrapper) → `deploy-count = 1`, GREEN. So this
  is concurrency-specific, not a pre-existing immich/wrapper issue.

**Repro:** two `!testme` comments on the same recipe PR (same domain) in quick succession on the
deployed main harness → both builds RED (one DG4.1 false-violation, one FileNotFoundError).

**Fix direction (Builder owns):** key the deploy-counter per RUN, not per domain — e.g. put it in
`/var/lib/cc-ci-runs/<build>/` (alongside the per-run artifacts) or include the build/run id in the
filename, and export that path via `CCCI_DEPLOY_COUNT_FILE`. Per-run keying fixes BOTH defects at
once (no cross-run pollution; no shared remove). Moving `_record_deploy()` after `acquire_app_lock`
alone is INSUFFICIENT — the shared `os.remove`/`FileNotFoundError` collision survives. Add a
tests/concurrency case: two same-domain runs serialized on the app lock → each sees its own
deploy-count, neither removes the other's file (this is the gap vs the 19 planned cases — case 4
serialises acquire but never asserts deploy-count isolation across the two).

**Closure:** adversary-owned. Re-test the (c) double-!testme live (both GREEN, visible block line,
zero leakage) + the new unit case before this clears. Only I close it.

**CLOSED @2026-06-10T09:0xZ** — fix b6e12ef (run-keyed state files via `_run_state_path`) merged
139e319. Verified by me: (a) code cold-verified + mutation-proven (reverting to domain-keying fails
all 3 test_run_state cases); (b) suites green cold (unit 138, concurrency 23); (c) LIVE re-run
builds 290+291 (same immich domain immi-ad3e33) BOTH SUCCESS — 291 logged the block line
(`in flight — waiting` → `acquired`), both read `deploy-count = 1` (290 no longer false-2; 291 no
longer FileNotFoundError), zero leakage after (0 procs / 0 apps / 0 services / 0 volumes / 0 secrets
/ no held locks). Full evidence in REVIEW-conc M2(c) PASS.
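
The effect of run-keying described in this finding can be illustrated with a small sketch. It is not the harness's `_run_state_path`: the signature, the filename layout and the helpers `record_deploy` / `deploy_count` are assumptions made for the example, chosen only so that two same-domain runs never share a counter file.

```python
import os
import tempfile


def run_state_path(kind, run_id, pid=None, base=None):
    """Hypothetical run-keyed path: unique per run id and harness pid, never per domain."""
    pid = os.getpid() if pid is None else pid
    base = tempfile.gettempdir() if base is None else base
    return os.path.join(base, f"ccci-{kind}-{run_id}-{pid}")


def record_deploy(countfile):
    """Append one marker line per deploy (assumed counter format)."""
    with open(countfile, "a") as fh:
        fh.write("deploy\n")


def deploy_count(countfile):
    """Count recorded deploys; raises FileNotFoundError if the file was removed."""
    with open(countfile) as fh:
        return sum(1 for _ in fh)
```

With paths built this way, a second run's pre-lock `record_deploy` lands in its own file, and the first run's end-of-run `os.remove` cannot delete the second run's counter, which covers both failure modes observed in builds 279 and 281.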
23
BACKLOG-rcust.md
Normal file
@@ -0,0 +1,23 @@
# BACKLOG — sub-phase rcust

## Build backlog

- [ ] P1.1 `runner/harness/meta.py`: KEYS registry (14 keys + 3 deprecated) + `load(recipe) -> RecipeMeta`
- [ ] P1.2 migrate readers L1–L6 to `meta.load()` (orchestrator loads once, passes down)
- [ ] P1.3 mumble private constants → underscore-prefixed (`_WELCOME_TEXT_MARKER`, `_MAX_USERS`) + fix importers
- [ ] P1.4 `tests/unit/test_meta.py` (all-recipes-load-clean, MetaError cases, defaults, R2 proof)
- [ ] P1.5 `scripts/gen-meta-docs.py` + doc-sync unit test
- [ ] P2a compose.ccci.yml first-class (auto-copy + auto-chaos); strip ghost/discourse boilerplate
- [ ] P2b install-time deps only; migrate lasuite-docs; delete setup_custom_tests.sh machinery
- [ ] P2c SKIP_GENERIC meta key deleted; env form documented dev-only + loud warning in CI runs
- [ ] P2d conftest cleanup: delete deployed/deployed_app (+app_domain if unused); consolidate deps fixture; migrate 6 lasuite test files
- [ ] P3 HookCtx + convert all hook call sites + migrate in-repo users + unit tests
- [ ] P4 discovery placement rule + op_state/deps fixtures + migrate hand-parsers
- [ ] P5 customization manifest (print block + results.json key) + unit tests
- [ ] P6 docs rewrite (recipe-customization.md §8, testing.md, enroll-recipe.md)
- [ ] M1 pre-claim: run `pytest tests/concurrency -q` once to prove untouched
- [ ] M2 prep: build baseline matrix (21 recipe dirs, expected outcomes) BEFORE merging — commit to STATUS-rcust.md

## Adversary findings

(Adversary-owned section)
141
JOURNAL-conc.md
@@ -22,3 +22,144 @@ Read concurrency-restructure-full-plan.md (SSOT) + plan.md §6.1/§7/§9. Orient
Working setup: state files on main in this clone; code on branch `restructure/concurrency`
via a git worktree at ../cc-ci-conc; test runs on the cc-ci host via /root/builder-clone
(`cc-ci-run -m pytest ...`, `nix develop .#lint`).

## 2026-06-10 — P1–P4 landed on restructure/concurrency

- P1 b492f99: harness/lifetime.py (PDEATHSIG+ppid recheck, SIGTERM/SIGALRM→SystemExit funnel
  with re-entrancy guard, alarm(3600)); main() installs first; both finally blocks mark
  begin_teardown(); .drone.yml setsid+trap wrap. Live smoke on cc-ci (cc-ci-run /tmp/p1-smoke.py):
  TERM→rc=143+finally; ALRM→rc=142+finally+deadline log; parent-kill→child TERM'd, teardown ran.
- P2 b302f3a: acquire_app_lock + _probe_and_reap + janitor rewrite; registry deleted. Live smoke
  (/tmp/p2-smoke*.py): held lock → "live concurrent run, leaving it", reaped=[]; killed holder →
  reap exactly once + lockfile unlinked; waiter blocked during probe-held reap, then re-acquired
  on the FRESH inode (probe confirmed held by waiter). Note: a select()-on-fd readline artifact
  in my smoke script initially looked like a failure — kernel state was verified directly.
  Unlink/recreate race guarded on BOTH sides via fstat/stat st_ino identity checks.
- P3 17ebdf3: per-run ABRA_DIR. Verified abra CLI honors $ABRA_DIR on-host (skeleton probe:
  FATAs only on empty servers/; with servers+catalogue symlinks + recipes/ it works and even
  auto-clones recipes for `app ls` resolution into the per-run dir). p3-smoke: setup + fetch of
  custom-html-tiny landed in /tmp/p3runs/9999/abra/recipes, head commit + versions readable via
  abra.recipe_dir(). install_steps.sh path fix justified in DECISIONS.md (conc P3 entry).
  Pre-existing observation (NOT mine, unchanged): `abra app ls -S -m -n` currently FATAs
  "unable to resolve '0cc57a5a'" under the DEFAULT abra dir too → janitor's abra discovery
  yields [] and the docker-service sweep carries discovery. Out of this phase's scope.
- P4 91d3cc7: concurrency.limit removed; maxTests comment states single-knob + new model.
  One stale comment line (.drone.yml l.39 "concurrency.limit=2 below") folds into P5.

All four commits: tests/unit 138 passed + lint PASS before each. Next: tests/concurrency suite.

## 2026-06-10 — tests/concurrency (84d90fb) + P5 (d3fe9e2) + M1 claim (e8e52cf)

- Suite: 20 tests / 19 plan cases, all real-kernel (helpers.py subprocesses hold real flocks,
  install real prctl/alarm guards; CCCI_APP_LOCK_DIR sandboxes /run/lock; HelperPool reaps every
  helper + recorded grandchildren). First full run on cc-ci: 20 passed in 9.96s, zero flakes in
  3 repeat runs during the P5 verification re-runs.
- Design notes for the Adversary's blind-spot hunt (my own known limits):
  - case 8 (two janitors) uses threads in one process — valid because flock conflicts are
    per-open-file-description, and overlap is forced via a Barrier + 2s slow teardown stub.
  - case 14 relies on reparent-to-pid-1 (true on the cc-ci host; would need adjustment in a
    subreaper environment — marked NEVER_REPARENTED visibly if so).
  - cases 5-12 stub teardown_app (recording) — janitor probe/reap ordering is what's under
    test, not teardown internals (covered by Phase-1 e2e + M2 live checks).
- M1 claimed at e8e52cf; full verification recipe in STATUS-conc.md (WHAT/WHERE/HOW/EXPECTED).

## 2026-06-10 — M2: merge + live verification (a)

- Merge: bb5eb3d (--no-ff) pushed; push build 266 (self-test lint+hello) SUCCESS.
- (a) cancel-mid-run: !testme on immich#2 → build 267 (custom) running on the NEW harness —
  log shows the setsid/trap wrap + "== per-run ABRA_DIR: /var/lib/cc-ci-runs/267/abra ==";
  lock /run/lock/cc-ci-app-immi-ad3e33...lock held by pid 636902; 4 immich services up.
  Canceled via drone API 04:42:07Z (HTTP 200, build status "killed"). Result: harness pid
  GONE (no leaked python — the old §8.1 gap is closed), immich services 0, volumes 0,
  secrets 0, .env 0 — the SIGTERM funnel ran the run's own teardown (better than the plan's
  minimum, which allowed the janitor to do the reaping). Lock RELEASED (lockfile present but
  unheld — tidy-swept by the next janitor, to be observed during (b)).
- (b) triggered 04:46:53Z: !testme immich#2 (comment 14287) + plausible#3 (14288) in parallel.

## 2026-06-10 — M2(b) round 1: green runs, poisoned exit code → wrapper fix

- Builds 268 (immich#2) + 269 (plausible#3) ran in PARALLEL on the new harness: both logs end
  with all-tiers-pass RUN SUMMARY (level=4, deploy-count 1/1) and the host shows ZERO leakage
  after (no harness processes, no immi/plau services/volumes/secrets, only unheld lockfiles).
  Both steps nevertheless exited 1: the P1 EXIT trap's kill of the already-gone process group
  returns ESRCH under the runner's `set -e` shell — a GREEN run reported failure.
- Reproduced minimally on-host (`sh -e` and `bash -e`: rc=1 on a clean exit with the old trap).
  Fix e1c4198 (capture rc; `trap - TERM EXIT`; `|| true` on the trap kill) verified on-host:
  green rc=0, red rc=7 propagated, TERM→wrapper forwards to child, exits 143. Merged to main
  b7a009c; push builds 272-274 green. Adversary notified via inbox.
- (b) re-triggered on the fixed wrapper 04:56:10Z (immich#2 + plausible#3).
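
The three properties of the wrapper fix (capture the child's status first, forward TERM to the whole process group, treat an already-gone group as harmless) can be restated in Python for illustration. This is only an analog: the actual wrapper is shell in .drone.yml, and the name `run_in_own_group` is hypothetical.

```python
import os
import signal
import subprocess
import threading


def run_in_own_group(cmd):
    """Run cmd as leader of a new session and return a shell-style exit code."""
    # start_new_session=True is a rough analog of launching the child via setsid.
    child = subprocess.Popen(cmd, start_new_session=True)

    def kill_group():
        try:
            os.killpg(child.pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # ESRCH: group already gone, the normal case after a clean exit

    # Signal handlers can only be installed from the main thread.
    in_main = threading.current_thread() is threading.main_thread()
    previous = None
    if in_main:
        previous = signal.signal(signal.SIGTERM, lambda signum, frame: kill_group())
    try:
        rc = child.wait()  # capture the status before any cleanup can disturb it
    finally:
        if in_main and previous is not None:
            signal.signal(signal.SIGTERM, previous)
        kill_group()  # sweep stragglers; a failure here must not change rc
    # Popen reports death-by-signal as a negative number; map it to 128+signum.
    return rc if rc >= 0 else 128 - rc
```

The design point carried over from the fix is that the cleanup kill's own outcome never feeds into the reported exit code, so a green run stays green even when there is nothing left to kill.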

## 2026-06-10 — M2(b) PASS + (c) triggered

- (b) round 2 on fixed wrapper: builds 275 (immich#2) + 276 (plausible#3) ran in PARALLEL,
  BOTH status=success (drone API). Host after: 0 python harness processes, 0 immi/plau
  services/volumes/secrets/.envs — zero leakage. (d) satisfied by 275 (full green immich e2e).
  Leftover unheld lockfiles present by design (tidy-swept at next janitor).
- (c) double-!testme on immich#2: two comments at 05:03:58Z → two custom builds, same run
  domain immi-ad3e33 → exactly one must block on the app lock with the visible log line.

## 2026-06-10 — CONC-A1: (c) failure root-caused + fixed (run-keyed state files)

- (c) round 1 = builds 279+281, both RED. Root cause (independently also found+filed by the
  Adversary as CONC-A1 while I was mid-diagnosis — same conclusion from both loops): the four
  run-scoped state files (deploys/opstate/deps/depskip) were DOMAIN-keyed in shared /tmp;
  281's main()-preamble + pre-lock _record_deploy fired before it blocked on the app lock →
  279 read deploy-count 2 (false DG4.1 RED); 279's end-of-run os.remove deleted the shared
  countfile → 281 crashed FileNotFoundError at its own read. Lock serialization itself worked
  (281: waiting @+2s, acquired @+194s = 279's exit). Masked pre-restructure by the
  end-to-end recipe flock.
- Fix b6e12ef on branch, merged to main 139e319: _run_state_path() keys all four by
  run id + harness pid; consumers were always env-fed (CCCI_*_FILE), so domain keying was
  never load-bearing. Both cleanup sites already remove all four on normal exit.
- New tests/concurrency/test_run_state.py (suite now 23): path invariants + real-process
  CONC-A1 interleaving via helpers.py `deploy-count-run` (countfile init → pre-lock
  _record_deploy → acquire → gated read). Teeth verified: under simulated shared keying the
  regression test FAILS (host run: 3 failed); with the fix: 23 passed + 138 unit + lint PASS.
- Next: push build green → re-run (b)+(d), then (c), then (a) per the VETO's conditions.

## 2026-06-10 — M2 re-verification on CONC-A1-fixed main (139e319)

- Push builds 283/284/285 (branch fix, merge, inbox) all green.
- (b)+(d) round 3 (comments 14299/14300, 08:17:35Z): builds 287 (immich#2) + 288 (plausible#3)
  BOTH success, started simultaneously 08:17:40Z (parallel), finished 08:21:06/08:21:13.
  Both logs: deploy-count = 1 (expect 1), level=4. Host after: pgrep -f 'run_recipe_c[i]' → no
  match (earlier "2" was pgrep self-match of the ssh cmdline); immi/plau services/volumes/
  secrets/server-envs all 0. Zero leakage. (d) satisfied by 287 (full green immich e2e on the
  final harness code).
- (c) round 2 triggered 08:22:13Z: comments 14303+14304 on immich#2 (same domain immi-ad3e33).

## 2026-06-10 — M2(c) PASS round 2 (builds 290+291) + (a) re-run triggered

- (c) round 2: builds 290 (08:22:30→08:46:05) + 291 (08:22:33→08:49:23) BOTH success.
  291 log: "== app lock: another run of immi-ad3e33... in flight — waiting ==" at +1s,
  "acquired" at +1411s = exactly 290's exit. Both: deploy-count = 1 (expect 1), level=4.
  Slowness was an immich-ML healthcheck flake (Adversary cross-confirmed live via lslocks:
  one holder pid 739163, one waiter pid 739341 on the same lock inode — serialization observed
  in the kernel lock table); ML converged inside the 1500s window, both runs green anyway —
  no clean re-run needed.
- After both: no harness procs (pgrep run_recipe_c[i] empty), 0 immi/plau services/volumes/
  secrets/server-envs. Unheld lockfile remains by design (tidy-swept at next janitor probe).
- (a) re-run on fixed harness: !testme immich#2 comment 14307 @08:50:02Z; will cancel mid-run
  via drone API once the deploy is in flight, then check pid/lock/leakage + janitor reap.

## 2026-06-10 — M2(a) re-run PASS (build 295) + M2 claim

- (a) on fixed harness: build 295 (comment 14307 @08:50:02Z) canceled @08:51:05Z (HTTP 200)
  while mid-deploy (lock held by pid 763099, 4 immich services converging). Harness pid GONE
  @08:51:15Z — the SIGTERM funnel ran the run's own teardown inside 10s; build status=killed;
  lock released (lslocks empty); services/volumes/secrets/envs all 0. Zero leakage, no janitor
  required.
- Adversary lifted the CONC-A1 VETO @09:05Z with its own M2(c) PASS (290/291 cold-verified,
  kernel-lock-table serialization observation). Remaining for DONE: formal M2 claim (this
  commit) + Adversary cold re-check of (a)/push-builds.
- M2 claimed in STATUS-conc.md with consolidated (a)-(d) evidence + cold re-check recipe.

## 2026-06-10 — M2 PASS → ## DONE

- Adversary M2 PASS @08:55Z (review 9987fba): all 7 claim items cold-confirmed, both M2-found
  fixes verified, guardrails honored, no open veto. Parent-sha typo in my claim noted by the
  Adversary (139e319^1 = 2173894, not 4ad55ed) — corrected in STATUS.
- ## DONE written to STATUS-conc.md. Phase conc complete: one mechanism (per-app-domain flock),
  per-run ABRA_DIR isolation, flock-probe janitor, lifetime guards + 60-min deadline, single
  concurrency knob, spec rewritten, 23-test real-kernel suite. Two live-found fixes along the
  way: wrapper exit-code under set -e, CONC-A1 run-keyed state files.
10
JOURNAL-rcust.md
Normal file
@@ -0,0 +1,10 @@
# JOURNAL — sub-phase rcust (Builder)

## 2026-06-10 bootstrap

Read phase plan (recipe-custom-restructure-full-plan.md), plan.md §6.1/§7/§9, and the reference
spec docs/recipe-customization.md @ 76a4b6b in full. Created phase state files. Work branch will
be `restructure/recipe-custom` off main @ 76a4b6b. Starting P1: reading the six current loaders
(run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env,
lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled)
before writing harness/meta.py.
410
REVIEW-conc.md
@@ -30,3 +30,413 @@ Current-system facts I will hold the restructure against:
same PR blocks visibly, one full green run. NEVER merge/push recipe mirror repos.

No verdict yet — waiting for Builder bootstrap/claim.

## 2026-06-10T04:05Z — cold-verify environment established (prep, no gate)

Builder seeded STATUS/BACKLOG/JOURNAL-conc; STATUS says P1 in flight, no gate claimed. Mapped the
test-execution environment I'll use for the M1 cold run so a time-sensitive gate isn't spent
debugging tooling:
- Local VM devshell (`nix develop`) has only lintTools (no pytest). So pytest does NOT run here.
- pytest 8.3.3 + playwright live in the host `pyEnv` (nix/modules/harness.nix) exposed as
  `cc-ci-run` on cc-ci. `cc-ci-run -m pytest <path> -q` works as the real harness interpreter
  (verified: `cc-ci-run -c "import pytest" -> 8.3.3`).
- `.drone.yml` lint stage runs `nix develop .#lint --command bash scripts/lint.sh`.
- COLD M1 PLAN: fresh `git clone`/checkout of `restructure/concurrency` into a throwaway dir ON
  cc-ci → `cc-ci-run -m pytest tests/unit -q` + `cc-ci-run -m pytest tests/concurrency -q` +
  `nix develop .#lint --command bash scripts/lint.sh`, all from that clean checkout (not the
  Builder's working tree). Then adversarial diff review per my baseline hit-list.
- Baseline `.drone.yml` on main is still the pre-restructure version (concurrency.limit=2,
  acquire_recipe_lock / /run/cc-ci-active registry referenced) — confirms P1/P4 edits are
  branch-only so far. Good.

## 2026-06-10T04:23Z — early pre-review of P1+P2 (branch @b302f3a, NO gate claimed — NOT a verdict)

Builder has pushed P1 (b492f99) + P2 (b302f3a) to restructure/concurrency; P3/P4/P5/tests still
pending, so M1 is not claimable and this is NOT a PASS — it's pre-review to front-load the M1 diff
audit and avoid re-doing it under gate time pressure. Read code/diff + git only; did NOT read
JOURNAL (anti-anchoring intact). I actively tried to break the following and each concern was
REFUTED:

1. **Green-on-red via the .drone.yml EXIT trap** (my lead hypothesis). The wrapper is
   `setsid cc-ci-run … & PID=$!; trap 'kill -TERM -- -$PID' TERM EXIT; wait $PID`. I worried the
   EXIT trap's final `kill` status would override the harness exit code and mask a failing run.
   EMPIRICALLY TESTED (4 bash repros incl. failing harness with a lingering group member that
   makes kill succeed=0): bash PRESERVES the pre-trap exit status when the EXIT trap doesn't call
   `exit`. Exit code propagates correctly in all cases (RED stays RED, GREEN stays GREEN). Refuted.
2. **P2 unlink/reacquire inode race** (janitor unlinks a reaped orphan's lockfile while a new run
   blocks on the old inode). Handled: both acquire_app_lock and _probe_and_reap recheck
   `fstat(fd).st_ino == stat(path).st_ino` after acquiring and retry/bail on mismatch — a lock on
   an unlinked (anonymous) inode is never treated as authoritative, and the path's lockfile is
   never unlinked out from under a newer run. Refuted.
3. **Half-reaped/new-app coexistence.** Reap runs WHILE HOLDING the probe lock; a new same-domain
   run blocks in acquire_app_lock until reap completes. The pre-deploy window (lock held, app not
   yet created) is covered: the stale-lockfile sweep sees the held lock (BlockingIOError) and
   leaves it. Refuted.
4. **Signal mid-normal-teardown aborting cleanup.** begin_teardown() is the FIRST line of BOTH
   finally blocks (run_recipe_ci.py:663 run_quick, :1134 main); the _funnel_handler swallows
   (logs+returns) any SIGTERM/SIGALRM once tearing_down is set, so a second signal can't abort the
   cleanup the first asked for. install_lifetime_guards() is the FIRST statement of main() (:829),
   before any abra/lock call, with prctl→ppid==1 recheck in the correct order. Refuted.
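
The inode identity recheck from item 2 can be sketched as follows. This is a minimal illustration assuming a blocking `flock` followed by an `fstat`/`stat` comparison; the name `acquire_lock` is hypothetical and the real `acquire_app_lock` may differ in logging, retries and path layout.

```python
import fcntl
import os


def acquire_lock(path):
    """Block until an exclusive flock is held on the file currently at `path`."""
    while True:
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(fd, fcntl.LOCK_EX)  # may block while another holder is alive
        try:
            # While we waited, the file may have been unlinked (and recreated):
            # our lock would then sit on an anonymous inode nobody else can see.
            same_file = os.fstat(fd).st_ino == os.stat(path).st_ino
        except FileNotFoundError:
            same_file = False
        if same_file:
            return fd  # lock is on the inode the path names: authoritative
        os.close(fd)   # stale inode: drop it and retry against the fresh file
```

A waiter that wakes up on an unlinked inode simply loops, so it ends up holding the lock on whatever file the path names at that point, never on a file that later acquirers cannot reach.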

Open items to confirm AT M1 (cold, full suite) — NOT defects, just unverified-until-then:
- `datetime` import removed from lifecycle.py along with _stack_age_seconds — grep for any
  remaining datetime use (ruff would catch an undefined name; confirm import truly orphaned).
- `_stack_name` / age-fallback dead code after the janitor rewrite — confirm no dangling refs.
- Registry-symbol deletion is only PARTIAL on this commit: acquire_recipe_lock still present
  (P3 deletes it); register/unregister/_run_owner_state/ACTIVE_RUN_DIR/CCCI_JANITOR_MAX_AGE are
  gone — full dangling-ref grep belongs at M1 once P3 lands.
- setsid-fork edge: if `setsid` ever forks (only when it's a pgrp leader; not the case for a
  backgrounded job in a non-job-control drone shell), $PID would be the intermediate and the
  harness would reparent to ppid==1 and self-abort. Live-verify the trap+cancel path at M2(a).
- begin_teardown is process-global module state (lifetime._state) — fine for one harness process;
  the tests/concurrency suite must not import-share it across in-process cases (verify at M1).

## 2026-06-10T04:32Z — pre-review P3+P4 (branch @91d3cc7, NO gate claimed — NOT a verdict)

Builder pushed P3 (17ebdf3 per-run ABRA_DIR) + P4 (91d3cc7 config cleanup). tests/concurrency +
P5 docs still pending, so M1 still not claimable. Continued the front-loaded diff audit (code/git
only; JOURNAL still unread). Findings — all CLEAN:

- **Dangling-ref grep across runner/bridge/dashboard/nix = ZERO hits** for all 9 deleted symbols:
  acquire_recipe_lock, register_run_app, unregister_run_app, _run_owner_state, ACTIVE_RUN_DIR,
  CCCI_JANITOR_MAX_AGE, RECIPE_LOCK_DIR, _stack_age_seconds, _registry_path. The orphaned
  `datetime` import is also gone from lifecycle.py. Clean deletion.
- **Path centralization**: all `~/.abra/recipes/<recipe>` literals replaced by `abra.recipe_dir()`
  (resolves `$ABRA_DIR else ~/.abra`) across abra.py (recipe_checkout, has_lightweight_version_tags,
  recipe_head_commit, recipe_versions), generic._recipe_dir, lifecycle.prepull_images,
  snapshot_recipe_tests, fetch_recipe. prepull's env_path stays canonical `~/.abra/servers/...`
  which is correct (servers/ is the shared symlink target).
- **Ordering verified** (main(), the only structural risk): install_lifetime_guards() is the FIRST
  stmt (873); between it and setup_run_abra_dir() (891) there are ONLY env reads + a print — no
  abra call; ABRA_DIR is exported at 891 BEFORE fetch_recipe (892) and before the first path-helper
  recipe_head_commit (895). The `--quick` dispatch (run_quick, ~908) is AFTER 891, so the quick lane
  inherits the per-run ABRA_DIR too. No tree is touched before ABRA_DIR is set.
- **Manual-run isolation**: rid=="manual" → "manual-<pid>" so two hand-runs don't share a tree.

Open items to confirm AT M1 (cold) — not defects:
- setup_run_abra_dir symlink idempotency: `if not os.path.islink(link): os.symlink(...)` — if a
  NON-symlink file pre-exists at servers/catalogue (reused run dir from a crashed partial), symlink
  raises FileExistsError. Low risk (fresh run-id per Drone build) but worth a glance.
- CCCI_SKIP_FETCH=1 now `rm -rf dest` + copytree(canonical, dest, symlinks=True) — confirm the
  --quick rollback-proof staging tests still pass (they set CCCI_SKIP_FETCH).
- tests/{ghost,discourse}/install_steps.sh RECIPE_DIR=${ABRA_DIR:-$HOME/.abra} mechanical path fix
  — confirm it changed NO assertion/gate (guardrail: never weaken recipe-test gates). Diff-check.
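
The per-run tree and the symlink idempotency question raised above can be made concrete with a short sketch. The signatures here are assumptions for illustration: the real `setup_run_abra_dir` and `abra.recipe_dir()` are only described in prose in this review, and passing `env` explicitly is a choice made to keep the example free of global state.

```python
import os


def setup_run_abra_dir(run_root, canonical):
    """Create <run_root>/abra with a fresh recipes/ and symlinks to the shared dirs."""
    abra_dir = os.path.join(run_root, "abra")
    os.makedirs(os.path.join(abra_dir, "recipes"), exist_ok=True)
    for name in ("servers", "catalogue"):
        link = os.path.join(abra_dir, name)
        target = os.path.join(canonical, name)
        if os.path.islink(link) and os.readlink(link) == target:
            continue  # already set up: calling twice is a no-op
        if os.path.lexists(link):
            # A non-symlink leftover (e.g. from a crashed partial run) is reported
            # explicitly instead of surfacing as a bare os.symlink failure.
            raise FileExistsError(f"{link} exists and is not the expected symlink")
        os.symlink(target, link)
    return abra_dir


def recipe_dir(recipe, env):
    """Resolve a recipe path under $ABRA_DIR, falling back to ~/.abra."""
    base = env.get("ABRA_DIR") or os.path.expanduser("~/.abra")
    return os.path.join(base, "recipes", recipe)
```

A caller would export the returned directory as `ABRA_DIR` before any recipe path is resolved, which matches the ordering the review checks in `main()`.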
|
||||
|
||||
Net: the entire P1–P4 diff has been pre-audited and is clean against my break-it hit-list. M1 cold
|
||||
run, once claimed (after tests/concurrency + P5 land), reduces to: fresh checkout on cc-ci →
|
||||
`cc-ci-run -m pytest tests/unit -q` + `cc-ci-run -m pytest tests/concurrency -q` + lint, plus a
|
||||
focused review of only the tests/concurrency suite (vs the 19 planned cases) and the P5 doc delta.
|
||||
|
||||
## M1: PASS @2026-06-10T04:38Z — implementation verified (branch restructure/concurrency @d3fe9e2)
|
||||
|
||||
Verdict formed from the plan (SSOT), the code/git, the STATUS claim's verify recipe, and my own
|
||||
COLD acceptance run — WITHOUT reading JOURNAL first (anti-anchoring honored; noting here that I had
|
||||
NOT consulted JOURNAL-conc at verdict time).
|
||||
|
||||
COLD ENVIRONMENT: fresh `git clone --branch restructure/concurrency` into /tmp/adv-m1 on cc-ci
|
||||
(NOT the Builder's tree); `git rev-parse HEAD == d3fe9e26bb0fbaedb37383539ba3973bc1c80aff` (matches
|
||||
claim), `git status` clean. Ran via the host `cc-ci-run` pyEnv (pytest 8.3.3 + playwright) and the
|
||||
pinned `.#lint` devshell.
|
||||
|
||||
ACCEPTANCE RESULTS (expected → observed):
|
||||
- `cc-ci-run -m pytest tests/unit -q` → 138 passed in 4.72s ✓ (claim: 138 passed)
|
||||
- `cc-ci-run -m pytest tests/concurrency -q` → 20 passed in 9.91s ✓ (claim: 20 passed)
|
||||
- `nix develop .#lint --command bash scripts/lint.sh` → `lint: PASS` ✓
|
||||
- `pytest tests/unit --collect-only` concurrency items → 0 ✓ (suite NOT in default gate)
|
||||
- dangling-ref grep (register_run_app, unregister_run_app, _run_owner_state, ACTIVE_RUN_DIR,
|
||||
CCCI_JANITOR_MAX_AGE, acquire_recipe_lock, RECIPE_LOCK_DIR, _stack_age_seconds) over
|
||||
*.py/*.nix/*.yml/*.sh → ZERO hits outside docs/ ✓
|
||||
|
||||
GATE-INTEGRITY (guardrails honored):
|
||||
- `RUN_APP_RE` regex unchanged (lifecycle.py:26, identical pattern); warm/canonical apps still
|
||||
never become probe candidates (test_11 asserts no lockfiles even created for warm names).
|
||||
- `services_converged()` / paused-is-settled / `backup_app()` waits: NOT in the code diff — all
|
||||
RUN_APP_RE/services_converged/paused diff hits are docs/concurrency.md prose (P5 rewrite).
|
||||
- `teardown_app` ordering untouched; only its trailing unregister call removed (registry gone).
|
||||
- Only `tests/<recipe>/` change is the mechanical `RECIPE_DIR=${ABRA_DIR:-$HOME/.abra}/...` line
|
||||
in ghost+discourse install_steps.sh — NO assertion/gate touched (diff-confirmed). Guardrail
|
||||
"never weaken recipe-test gates / touch tests/<recipe>/ content" honored.
|
||||
- P4: `concurrency.limit` block removed from .drone.yml; drone-runner.nix comment makes
|
||||
DRONE_RUNNER_CAPACITY the single knob.
|
||||
|
||||
ADVERSARIAL DIFF REVIEW (P1–P4 pre-audited in the two notes above; refuted: green-on-red exit-code
|
||||
masking [empirically tested], unlink/reacquire inode race [fstat==stat identity recheck],
|
||||
half-reaped coexistence [reap-under-probe-lock], signal-mid-teardown reentrancy [begin_teardown
|
||||
first line of both finally blocks], guard/ABRA_DIR/fetch ordering [no abra call pre-export]).

TEST-SUITE AUDIT vs the 19 plan cases: real kernel flocks, NEVER mocked (only teardown_app +
abra-discovery stubbed, both disclosed). Coverage complete: cases 1–4 test_locks, 5–12
test_janitor, 13–16 test_lifetime, 17–19 test_abra_dir, +test_18b (manual-pid isolation) = 20.
Assertions are substantive, not tautological: exact funnel exit codes 142/143 (test_15/16),
reap-vs-new-run timestamp ordering + fresh-inode `lock_state=="held"` (test_7), two-janitor
arbitration via separate open()s (test_8 — valid: flock binds the open file description, so
threads-with-distinct-fds model processes), long-held mtime-backdate flag-not-steal (test_10),
PEP 446 fd non-inheritance with a surviving child (test_3), divergent per-run trees + canonical
untouched (test_18).
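
The reasoning behind test_8 (flock binds the open file description, not the process) can be checked with a small standalone sketch. The names `try_lock` and `demo` are illustrative and not taken from the suite; the sketch only assumes a Linux or BSD kernel with `flock(2)` semantics.

```python
import fcntl
import os
import tempfile


def try_lock(fd):
    """Non-blocking exclusive flock; True if acquired, False if held elsewhere."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def demo():
    """Two separate open()s of one file contend, even inside a single process."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "probe.lock")
        a = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        b = os.open(path, os.O_RDWR)  # separate open() = separate file description
        first = try_lock(a)   # acquires
        second = try_lock(b)  # refused: the lock belongs to a's description
        os.close(a)           # last fd on that description closed: lock released
        third = try_lock(b)   # now acquires
        os.close(b)
        return first, second, third
```

Because the second attempt is refused within one process, two threads holding distinct fds behave like two processes for arbitration purposes, which is the property the audit relies on.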

INDEPENDENT PROBE (my own driver, NOT the Builder's helpers.py): drove the real
`lifecycle.acquire_app_lock` from a standalone script with a sandbox CCCI_APP_LOCK_DIR on cc-ci →
state `held` after acquire; a second acquirer BLOCKED while the first held (no ack2 after 1.5s);
after `SIGKILL` of the holder the second acquired within 10s (kernel auto-release). Core invariant
confirmed against the real code, not just the Builder's tests.
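
A probe of this shape can be sketched in a few lines. This is not the driver used above: it replaces `lifecycle.acquire_app_lock` with a bare `flock` in a child interpreter, and the names `HOLDER`, `try_lock` and `probe` are hypothetical. It shows only the kernel-level property (a lock held by a SIGKILLed process is released without any userspace cleanup).

```python
import fcntl
import os
import signal
import subprocess
import sys
import tempfile
import time

# Child program: take the lock, report, then idle until killed.
HOLDER = """
import fcntl, os, sys, time
fd = os.open(sys.argv[1], os.O_RDWR | os.O_CREAT, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)
print("held", flush=True)
time.sleep(60)
"""


def try_lock(fd):
    """Non-blocking exclusive flock; True if acquired."""
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def probe(lock_path, deadline_s=10.0):
    """Return (blocked_while_held, acquired_after_kill)."""
    holder = subprocess.Popen(
        [sys.executable, "-c", HOLDER, lock_path],
        stdout=subprocess.PIPE, text=True,
    )
    try:
        assert holder.stdout.readline().strip() == "held"
        fd = os.open(lock_path, os.O_RDWR)
        blocked = not try_lock(fd)
        holder.send_signal(signal.SIGKILL)  # no handler runs: no userspace cleanup
        holder.wait()
        acquired = False
        end = time.monotonic() + deadline_s
        while not acquired and time.monotonic() < end:
            acquired = try_lock(fd)
            if not acquired:
                time.sleep(0.05)
        os.close(fd)
        return blocked, acquired
    finally:
        if holder.poll() is None:
            holder.kill()
            holder.wait()
        holder.stdout.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(probe(os.path.join(d, "app.lock")))
```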

NON-BLOCKING NOTES (carry to M2 live-verify; none gate M1):
- setsid-fork edge in the .drone.yml trap wrapper: if `setsid` ever forks (only when it's a pgrp
  leader — not the case for a backgrounded job in a non-job-control drone shell), $PID would be the
  intermediate and the harness could reparent (ppid==1) and self-abort. MUST be live-verified by
  the actual drone-cancel path at M2(a) — the plan already flags this ("verify drone exec runner
  signal delivery; the trap must fire on drone cancel"). Not unit-testable here.
- End-of-janitor stale-lockfile tidy sweep (appless leftover lockfile unlink) is not directly
  covered by a named test (not one of the 19); low risk (tidiness only). Noted, not a defect.
- test_14 (ppid race) depends on the helper reparenting to pid 1; under a subreaper it marks
  NEVER_REPARENTED and FAILS VISIBLY (never false-passes). Passed in this env.

CONCLUSION: M1 — implementation verified — PASS. M2 (merge to main + live verification a–d) is
unblocked. Reminder for both loops: recipe-mirror PRs are !testme targets only — never merge/push
them. (After this verdict I may consult JOURNAL-conc to contextualize, per §6.1.)

## 2026-06-10T04:49Z — M2 merge integrity pre-check (M2 NOT yet claimed — not a verdict)

Builder merged the branch to main (merge commit `bb5eb3d`, 2 parents 83a6c6e∘d3fe9e2, no force)
after my M1 PASS, and is mid-M2 live verification (journal: M2(a) cancel-mid-run evidence, (b)
parallel runs triggered). No `claim(conc): M2` commit yet; STATUS-conc still shows the stale M1
line (Builder's file — will update at the M2 claim). Independent merge check:
- `git diff bb5eb3d d3fe9e2 -- runner/ .drone.yml docs/concurrency.md tests/ nix/` = EMPTY → the
  merge preserved EXACTLY the code I cold-verified at M1. No conflict-resolution drift introduced.
- `git merge-base --is-ancestor d3fe9e2 bb5eb3d` = true.

So deployed main == M1-verified tree. At the M2 claim I therefore re-verify only LIVE behavior +
the push build, not the code again:
push build green; (a) cancel mid-run → no leaked python/lock, next janitor reaps the app, zero
leakage; (b) two parallel !testme (immich#2 + plausible#3) → both green, zero leakage; (c)
double-!testme same PR → 2nd blocks on the app lock (visible in its drone log) then runs; (d) one
full green end-to-end run. Evidence to come from Drone build logs + cc-ci state (abra app ls /
lslocks / docker), cold from my own access path.

## 2026-06-10T05:00Z — wrapper exit-code fix verified + CORRECTION to my P1 pre-review (inbox consumed)

Consumed ADVERSARY-INBOX.md (deleted) — Builder reported an M2 live-verify finding + fix. Folded in:

**The defect (real, Builder-found, build 269 plausible#3):** the drone exec step shell is `set -e`.
On a NORMAL (green) harness exit the P1 EXIT trap still fired and its `kill -TERM -- -$PID` of the
already-exited process group returned ESRCH (exit 1), which under `set -e` poisoned the step's exit
status to 1 — a fully GREEN run (all tiers pass, level=4) reported RED.

**CORRECTION — my P1 pre-review was wrong on this point.** In my 04:23Z pre-review I claimed to have
"empirically tested" green-on-red exit-code masking and REFUTED it. That test was run with plain
`bash -c` WITHOUT `set -e` — the wrong shell mode. The real drone step runs `set -e`, where the bug
manifests. I re-ran the matrix correctly now (bash -e), reproducing the bug (old wrapper + green +
set -e → exit 1) and confirming I had the shell mode wrong. Lesson: model the EXACT runtime
(set -e) for shell-trap behavior. The Builder caught this live; I did not. Owning it.
NB the failure direction was false-RED (green reported red) — fail-safe-ish, not a green-on-red
(no failing run was ever reported green); still a real defect.

**The fix (e1c4198 on branch, merged to main b7a009c) — independently verified by me, cold under
`set -e` (the correct mode this time):**
```
setsid cc-ci-run runner/run_recipe_ci.py & PID=$!
trap 'kill -TERM -- "-$PID" 2>/dev/null || true' TERM EXIT
rc=0; wait "$PID" || rc=$?
trap - TERM EXIT
exit "$rc"
```
My 4-path matrix (all under `bash -e`, exact-shape repros):
- A green harness → step exit 0 ✓ (poisoning gone: `|| true` on the trap kill + `trap - EXIT` before exit)
- B **red harness (exit 7) → step exit 7 ✓ — NOT masked to green.** Critical false-GREEN check
  PASSES: `wait || rc=$?` captures the real rc and `exit "$rc"` propagates it. The
  "failing PR must report RED" gate is preserved by the fix.
- C old wrapper + green + set -e → exit 1 ✓ (bug reproduced — root-cause confirmed)
- D cancel (TERM to wrapper mid-wait) → wrapper exits 143 AND the child received TERM
  (CHILD_GOT_TERM logged) ✓ — cancel-forwarding semantics unchanged; the `trap - TERM EXIT` runs
  only AFTER `wait` returns (post-forward), so it can't disarm the forward during a real cancel.

Verdict on the fix: CORRECT and SAFE — resolves the false-RED poisoning without introducing
false-GREEN, and preserves cancel forwarding. Folds cleanly into the pending M2 review.
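
Paths A and B of the matrix can be re-driven from a short script. This is a sketch under stated assumptions, not the repro used for the verdict: `sh -c "exit N"` stands in for `setsid cc-ci-run runner/run_recipe_ci.py`, and `OLD_ASSUMED` is a reconstruction of the pre-fix wrapper from the defect description (unguarded kill in the EXIT trap, bare `wait`), not the literal old `.drone.yml` text. `step_exit` is a hypothetical helper name.

```python
import subprocess

# Fixed wrapper as quoted above, with a stand-in child: `sh -c "exit N"`
# plays a harness that exits with status N ($1 is passed in by step_exit).
FIXED = r"""
sh -c "exit $1" & PID=$!
trap 'kill -TERM -- "-$PID" 2>/dev/null || true' TERM EXIT
rc=0; wait "$PID" || rc=$?
trap - TERM EXIT
exit "$rc"
"""

# ASSUMPTION: shape of the pre-fix wrapper, reconstructed from the defect
# description only. Provided so path C can be tried by hand.
OLD_ASSUMED = r"""
sh -c "exit $1" & PID=$!
trap 'kill -TERM -- "-$PID"' TERM EXIT
wait "$PID"
"""


def step_exit(script, harness_rc):
    """Run `script` the way the step shell does (bash -e); return its exit status."""
    proc = subprocess.run(
        ["bash", "-e", "-c", script, "wrapper", str(harness_rc)],
        capture_output=True, text=True,
    )
    return proc.returncode


if __name__ == "__main__":
    for label, script in (("fixed", FIXED), ("old (assumed)", OLD_ASSUMED)):
        for harness_rc in (0, 7):
            print(label, "harness", harness_rc, "-> step", step_exit(script, harness_rc))
```

The important design point is that the shell mode is part of the test input: the same script text behaves differently with and without `-e`, which is exactly what the original pre-review missed.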

**M1 status unaffected:** M1 PASS was for the code/suites/lint/diff of d3fe9e2; this wrapper
exit-code-under-set-e is a LIVE behavior M1's checks could not exercise (the trap only runs in the
real drone exec shell). main now = d3fe9e2 + this .drone.yml wrapper fix; the fix is verified above.
Open for the formal M2 verdict: re-confirm lint green on the new .drone.yml (yamllint), the push
build green, and live (a) cancel-no-leak / (b) parallel both-green / (c) double-!testme blocks /
(d) one full green run — cold, once the Builder posts the M2 claim with evidence.

## M2(c): FAIL @2026-06-10T08:10Z — double-!testme same domain corrupts shared deploy-count → both runs RED + VETO

Proactive cold break-it probe of the live M2 evidence (M2 not yet formally `claim(conc)`'d — the
Builder's JOURNAL shows (c) "triggered" but NOT evidenced as PASS; I went straight to the Drone API
to verify the in-flight (c) runs independently, not to the JOURNAL narrative). I found a REAL defect
that breaks M2(c). Filed as BACKLOG-conc CONC-A1.

EVIDENCE (Drone API, recipe-maintainers/cc-ci, cold via /run/secrets/bridge_drone_token — my own
access path, not the Builder's word):
- (c) = builds **279 + 281**, both `event=custom PR=2 RECIPE=immich REF=a92b28d…` → SAME domain
  `immi-ad3e33.ci.commoninternet.net`. Both `status=failure` (step `ci` exit_code=1).
- 281 (the blocked run): log `== app lock: ... in flight — waiting ==` @2s → `== acquired ==` @194s,
  which is exactly when 279's process exited (279 finished 05:07:35Z). **Lock serialisation + the
  visible block line WORK** — that half of (c) is fine.
- 279 RED: `!! deploy-count 2 != 1 (DG4.1 violation)`.
- 281 RED: `FileNotFoundError: /tmp/ccci-deploys-immi-ad3e33….ci.commoninternet.net` at
  run_recipe_ci.py:1213.
- Control build 275 (isolated immich, same fixed wrapper) → `deploy-count = 1`, GREEN. Confirms the
  failure is concurrency-specific, NOT a pre-existing immich/wrapper regression.

ROOT CAUSE (code, confirmed):
- DG4.1 counter file is DOMAIN-keyed in shared /tmp, not per-run: `run_recipe_ci.py:930
  /tmp/ccci-deploys-<domain>`. P3 isolated ABRA_DIR per run but this per-run state file was missed
  (predates the restructure, ef44d46; the old recipe-flock serialised same-recipe runs end-to-end,
  masking it).
- `deploy_app()` calls `_record_deploy()` (lifecycle.py:250) BEFORE `acquire_app_lock()` (:254,
  introduced by P2 b302f3a) → the increment races OUTSIDE the lock. 281's single pre-lock
  `_record_deploy` (@2s) bumps the shared counter 279 is using (→2, false violation), and 279's
  end-of-run `os.remove(countfile)` (:1215) deletes the file under 281 → FileNotFoundError.
- Interleaving is fully reconstructed and self-consistent with the build timestamps (see CONC-A1).
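
The interleaving can be replayed deterministically in a single process. The sketch below is illustrative only: `record_deploy`, `deploy_count` and `replay` are hypothetical stand-ins (the real counter format and the real `_record_deploy` are not shown in this ledger), and the one-line-per-deploy file is an assumption. Passing the same path for both runs models the domain-keyed file; passing two paths models any per-run keying.

```python
import os


def record_deploy(countfile):
    """Append one line per deploy (stand-in for the harness's _record_deploy)."""
    with open(countfile, "a") as fh:
        fh.write("deploy\n")


def deploy_count(countfile):
    """Count recorded deploys; raises FileNotFoundError if the file is gone."""
    with open(countfile) as fh:
        return sum(1 for _ in fh)


def replay(countfile_a, countfile_b):
    """Replay the 279/281 ordering; return (A's count, B's count or None)."""
    record_deploy(countfile_a)           # run A deploys while holding the app lock
    record_deploy(countfile_b)           # run B records BEFORE it blocks on the lock
    a_count = deploy_count(countfile_a)  # A's end-of-run DG4.1 check
    os.remove(countfile_a)               # A's end-of-run cleanup
    try:
        b_count = deploy_count(countfile_b)  # B's end-of-run check, after A exited
    except FileNotFoundError:
        b_count = None                   # B's state was deleted underneath it
    return a_count, b_count
```

With a shared path the replay yields a count of 2 for A and a missing file for B, matching the two RED symptoms; with distinct paths both runs see 1.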

This is squarely in M2(c) scope: the plan's DoD (c) requires the second run to "block … then RUN"
(implicitly green), and the phase's whole premise is "two concurrent !testme don't collide on
domain/volume/secrets." This is a domain-keyed-state collision — the restructure's narrower domain
lock no longer covers the deploy-count file. M1 (code/suites/lint/diff of d3fe9e2) is unaffected —
this is a live concurrency behavior M1's checks could not exercise; the tests/concurrency suite has
the matching blind spot (case 4 serialises acquire but never asserts deploy-count isolation across
two same-domain runs).

## VETO — M2 may NOT be marked DONE until CONC-A1 is fixed and I log a fresh (c) PASS
Forbidding `## DONE` in STATUS-conc until: (1) deploy-counter keyed per-run; (2) a tests/concurrency
case asserts same-domain deploy-count isolation; (3) live (c) re-run shows BOTH builds GREEN with
the visible block line and zero leakage; (4) (a),(b),(d) re-confirmed unaffected. Only I clear this.
(After this verdict I may consult JOURNAL-conc to contextualise — noting I had NOT read the (c)
journal reasoning before forming this FAIL; I verified from the Drone API + code directly.)

## 2026-06-10T08:20Z — CONC-A1 fix CODE-verified (veto conditions 1+2 met; 3+4 still pending — NOT cleared)

Builder fixed CONC-A1 (b6e12ef, merged main 139e319) and is re-running M2 live (a)–(d). I
cold-verified the FIX CODE from my own clone + a fresh checkout on cc-ci (not the Builder's word):

- **Condition (1) per-run keying — MET.** `run_recipe_ci._run_state_path(name)` keys all four
  run-scoped state files (`deploys`, `opstate`, `deps`, `depskip`) by `run_id()` + `os.getpid()`,
  never domain. Grep: ZERO residual `ccci-<state>-{domain}` literals in prod code (only the
  app-LOCK path stays domain-keyed, which is correct). All consumers env-read `CCCI_*_FILE`
  (lifecycle:148, deps:72/155, generic:134) — no path re-derivation. Uniqueness holds even in the
  manual fallback (`run_id()`→domain) because the `+pid` suffix separates two processes.
- **Condition (2) same-domain isolation test — MET, and proven non-tautological.**
  tests/concurrency/test_run_state.py adds test_20/20b/20c. test_20c drives REAL processes + the
  REAL lock + real `_run_state_path`/`_record_deploy`, reproducing the 279/281 interleaving: run A
  reads `COUNT 1` (NOT polluted to 2 by B's pre-lock increment) and B's file survives A's remove
  (no FileNotFoundError). **Mutation check (my own):** reverting `_run_state_path` to domain-keying
  in a throwaway cc-ci clone → all 3 test_run_state cases FAIL (incl. test_20c). So the test
  genuinely guards the fix.
- **Suites cold (fresh clone @4f6c955 on cc-ci):** unit 138 passed, concurrency 23 passed (was 20),
  concurrency still NOT collected by the default `pytest tests/unit` run (0). lint not re-run here
  (no .drone.yml/nix change in the fix; will confirm at the M2 claim).
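
The keying described under condition (1) can be pictured with a small sketch. It is an illustration under assumptions, not the code at b6e12ef: `run_state_path` and `export_state_files` are hypothetical names, the file-name layout and the `/tmp`-style base directory are guesses, and the exact `CCCI_*_FILE` variable names are derived here from the four state names rather than read from the source.

```python
import os
import tempfile

# The four run-scoped state files named in the review entry.
STATE_NAMES = ("deploys", "opstate", "deps", "depskip")


def run_state_path(name, run_id, pid=None, base=None):
    """Per-run path: keyed by run id + pid, never by domain (hypothetical layout)."""
    pid = os.getpid() if pid is None else pid
    base = tempfile.gettempdir() if base is None else base
    return os.path.join(base, f"ccci-{name}-{run_id}-{pid}")


def export_state_files(run_id, env=None):
    """Publish each path once via the environment so consumers never re-derive it.

    ASSUMPTION: variable names of the form CCCI_<NAME>_FILE.
    """
    env = os.environ if env is None else env
    for name in STATE_NAMES:
        env[f"CCCI_{name.upper()}_FILE"] = run_state_path(name, run_id)
    return env
```

Even when the run id falls back to the domain (the manual case noted above), two processes still get distinct paths because the pid differs.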

**VETO NOT cleared.** Conditions (3) live (c) re-run BOTH builds GREEN + visible block line + zero
leakage, and (4) (a)/(b)/(d) re-confirmed on the fixed harness, still require the Builder's live
evidence (in flight). The code fix strongly predicts a (c) pass but M2 is a LIVE gate — I will
re-verify the (c) double-!testme cold from the Drone API once the Builder posts the M2 claim, and
only then clear the veto.

## 2026-06-10T08:43Z — live (c) round-2 (builds 290+291): serialization CONFIRMED via lslocks; delay is an immich-ML flake, NOT the restructure (not a verdict)

(b)+(d) re-passed on the fixed harness (builds 287 immich#2 + 288 plausible#3, parallel, both
success — I'll re-confirm at the M2 claim). (c) round 2 = builds 290+291 (both custom PR=2 immich,
same domain immi-ad3e33), started 08:22:30Z. I inspected the LIVE host state cold (my own ssh):

- **CORE INVARIANT DIRECTLY OBSERVED in the kernel lock table** — strongest possible proof of the
  double-!testme serialization:
  `lslocks`: pid 739163 (build 290) holds `WRITE` on cc-ci-app-immi-ad3e33….lock; pid 739341
  (build 291) is blocked `WRITE*` on the SAME lock. Exactly one holder, one waiter, one inode.
- 290 (holder) is sleeping in `services_converged()` poll (hrtimer_nanosleep, no abra child) because
  `immich-machine-learning` is stuck 0/1: its container repeatedly fails the healthcheck
  (`non-zero exit (143): dockerexec: unhealthy container`, swarm restarting every 1–6 min). Current
  attempt (08:43) has gunicorn up, health `starting` — slow/flaky ML readiness, not a deploy break.
- NOT caused by the restructure / teardown: 290's immich volumes (model-cache/postgres/uploads) +
  .env are all from 290's OWN fresh deploy (08:23), not inherited from the earlier same-domain run
  287. ML image present (1.36GB, no pull), host healthy (5.2Gi mem free, 65G disk). So this is an
  immich-ML healthcheck flake, orthogonal to concurrency.

Bearing on M2(c): the SERIALIZATION mechanism under test is verified working live. The "both GREEN"
half of condition (3) is not yet demonstrated only because 290 is flake-blocked on immich-ML; if 290
REDs on deploy-timeout, (c) needs a clean re-run (flake, not a code fault). VETO unchanged — I still
require one clean (c) where both same-domain builds go GREEN with the block line + zero leakage.
Continuing to watch 290/291 to terminal.

## M2(c): PASS @2026-06-10T09:05Z — double-!testme same domain, CONC-A1 fixed; VETO LIFTED

(c) round-2 builds 290+291 (both `custom PR=2 immich`, same domain immi-ad3e33, on CONC-A1-fixed
main) both reached terminal **status=success**. Cold-verified from the Drone API + live host (my own
access path), not the Builder's word:

- **Both GREEN:** 290 success, 291 success (Drone API).
- **Visible block line (the (c) requirement):** 291 log —
  `== app lock: another run of immi-ad3e33….ci.commoninternet.net is in flight — waiting ==`
  then `== app lock: acquired … ==`. I ALSO observed the serialization directly in the kernel lock
  table mid-run (lslocks: 290 held WRITE, 291 blocked WRITE* on the same inode; after 290 exited,
  291 held it). Strongest possible proof of the double-!testme serialization invariant.
- **CONC-A1 regression GONE — the two exact round-1 failure points are now clean:**
  - 290 (round-1 build 279 got false `deploy-count 2 != 1`) → now `deploy-count = 1 (expect 1)`,
    all 5 tiers pass, level=4. Its run-keyed counter was NOT polluted by 291's concurrent pre-lock
    `_record_deploy`.
  - 291 (round-1 build 281 crashed `FileNotFoundError` at run_recipe_ci.py:1213) → now
    `deploy-count = 1 (expect 1)`, all tiers pass, level=4, no traceback. Its own run-keyed countfile
    survived 290's end-of-run remove.
- **Zero leakage after both:** 0 harness procs, 0 immich apps / services / volumes / secrets, no held
  cc-ci locks. One unheld 0-byte leftover lockfile (mtime 08:46, 291's acquisition touch) — reaped
  on sight by the next janitor probe, harmless by design.
- The ~20-min runtime each was an immich-machine-learning healthcheck slowness/flake (ML eventually
  converged), NOT the restructure — already diagnosed in the 08:43Z note; serialization + isolation
  both verified correct regardless.

**VETO LIFTED.** The CONC-A1 veto ("no DONE until CONC-A1 fixed + a fresh (c) PASS") is cleared:
conditions (1) per-run keying [code + mutation-proven], (2) same-domain isolation test
[non-tautological], and (3) live (c) both-GREEN + block line + zero leakage are ALL met. CONC-A1
closed in BACKLOG-conc.

**Still required before DONE (full M2 gate, not the CONC-A1 veto):** the Builder must post the formal
M2 claim in STATUS-conc with consolidated evidence, and I re-confirm condition (4) — specifically
**M2(a) cancel-mid-run re-run on the CONC-A1-fixed harness** (b+d already re-confirmed: builds
287+288 parallel both success on fixed main; a's only prior evidence (build 267) was on the
pre-CONC-A1, pre-wrapper-fix harness) — plus the push build green on current main. (a) re-run had
not yet appeared in Drone as of this verdict (Builder sequenced it after (c)). I will verify it cold
when it lands.

## M2: PASS @2026-06-10T08:55Z — merged + live-verified (a)–(d) on final main 139e319/74ed240

Formal M2 gate verdict against the Builder's M2 claim (STATUS-conc, commit 74ed240). Formed from
the plan (SSOT), the code/git, the claim's verify recipe, and my OWN cold re-runs from my own clone +
fresh checkouts/Drone-API on cc-ci — not the Builder's narrative. All seven claim items confirmed:

1. **Merge integrity** — `git diff 139e319 b6e12ef -- runner/ tests/ docs/ .drone.yml nix/` = 0 lines;
   `b6e12ef ⊆ 139e319`; merge parents `2173894 ∘ b6e12ef`. So deployed main code == the CONC-A1 tree
   I code-verified + mutation-proofed. No force-push (history linear). NB the claim mis-states the
   first parent as `4ad55ed` (actual `2173894`, my M2(c)-FAIL commit) — immaterial: that's a state-
   file commit, and the code-diff-empty check is authoritative.
2. **Push build green** — Drone push builds 283–298 on main all `status=success`; no red push since
   the merge.
3. **Suites + lint (cold, fresh clone on cc-ci)** — unit 138 passed, concurrency 23 passed
   (concurrency NOT in the default unit gate), `lint: PASS` on final main 74ed240. test_run_state
   mutation-proofed (reverting to domain-keying fails all 3 cases).
4. **(a) cancel-mid-run on fixed harness** — build 295 (custom immich#2): lockfile mtime 08:50:17
   proves it acquired the app lock 7s in → canceled @08:51:05 MID-DEPLOY. After cancel (verified cold
   ~1 min later): 0 harness procs (no leaked python — old §8.1 gap stays closed), no held locks (lock
   released), no immich app/.env/containers(even stopped)/services/volumes/secrets → ZERO leakage,
   full teardown. Killed-step logs not API-retrievable (Drone truncates), but the end-state is the
   actual test and it is clean.
5. **(b) parallel runs** — builds 287 (immich#2) + 288 (plausible#3), parallel, both
   `status=success`, both `deploy-count = 1 (expect 1)`, level=4; host after = zero leakage.
6. **(c) double-!testme same PR** — builds 290 + 291 (same immich domain): both success, 291 logged
   the block line then `acquired`, both `deploy-count = 1`, zero leakage. Serialization also observed
   directly in the kernel lock table mid-run (lslocks). Covered in detail by my M2(c) PASS @09:05Z.
7. **(d) full green e2e** — build 287 (and 290): complete immich run, all 5 tiers pass, level=4.

Both M2-found fixes are folded in and independently verified: wrapper exit-code-under-set-e
(e1c4198/b7a009c, my 05:00Z note — red still propagates) and CONC-A1 run-keyed state files
(b6e12ef/139e319, my 09:05Z M2(c) PASS + mutation proof). The ~20-min (c) runtimes were an
immich-ML healthcheck flake (converged within DEPLOY_TIMEOUT=1500s), orthogonal to the restructure
(diagnosed 08:43Z). Unheld 0-byte leftover lockfiles are by-design (next-janitor tidy-sweep).

GUARDRAILS honored end-to-end: recipe-mirror PRs (immich#2, plausible#3) used as !testme targets
only, never merged/pushed; cc-ci main touched only by the gated merges (no force-push); no secrets in
any commit. RUN_APP_RE / services_converged / warm-canonical flows untouched (M1 diff review).

CONCLUSION: **M2 — merged + live-verified — PASS.** M1 PASS (04:38Z) + M2 PASS (here) are both fresh
in REVIEW-conc; no open VETO (CONC-A1 lifted). Per the phase DoD the Builder may now write `## DONE`
to STATUS-conc. (Post-verdict I may consult JOURNAL-conc to contextualize; I had NOT read its M2
reasoning before forming this verdict — verified from plan + code/git + Drone API + my own cold runs.)

34
REVIEW-rcust.md
Normal file
@ -0,0 +1,34 @@
# REVIEW-rcust.md — Adversary ledger for the recipe-customization restructure phase

SSOT for this phase: `/srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md`.
Gates: **M1** (implementation verified — branch `restructure/recipe-custom`, unit+concurrency+lint
green on cold clone, resolved-customization diff clean for all 21 recipes, adversarial diff review)
and **M2** (merged + real-CI regression sweep matching baseline matrix). DONE requires fresh PASS
for both with no open VETO.

I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only.

---

## Standing watch items (what I will hunt at M1/M2)

- **Coverage loss** (cardinal risk): for every migrated recipe, old loaders' effective customization
  values must equal new `meta.load()` values. Throwaway diff script over all 21 recipe dirs; any
  delta = finding.
- **Assertion weakening** in `tests/<recipe>/` diffs — migrations must be mechanical only (signatures,
  fixture/key renames, underscore prefixes). Any changed assert/expected value = VETO.
- **Deleted-code fallout** — dangling refs to `_recipe_meta`, `_load_meta`, `_recipe_extra_env`,
  `_recipe_meta_flag`, `declared_deps`, `is_canonical_enrolled`, `OIDC_AT_INSTALL`,
  `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`, `setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`.
- **Validation gaps** — typo'd key / wrong type / callable-on-data-key must raise MetaError, not pass.
- **R2 fixed end-to-end** — orchestrator load path delivers SCREENSHOT to screenshot.py.
- **HC2 / F2-11 integrity** — repo-local default-deny, requires_deps skip-report, generic floor
  semantics all unchanged.
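
The "validation gaps" watch item can be made concrete with a sketch of the three rejections it names. Everything here is an assumption about a loader that does not exist yet at this point in the phase: the `REGISTRY` contents, the `validate` function and the error messages are hypothetical; only the `MetaError` name and the setting names come from this repository's docs.

```python
class MetaError(Exception):
    """Raised for any invalid recipe_meta setting (name taken from the watch item)."""


# Hypothetical key registry: setting name -> expected type, or "callable" for hooks.
REGISTRY = {
    "DEPLOY_TIMEOUT": int,
    "HTTP_TIMEOUT": int,
    "HEALTH_PATH": str,
    "UPGRADE_BASE_VERSION": str,
    "READY_PROBE": "callable",
}


def validate(settings):
    """Reject typo'd keys, wrong types, and callables assigned to data keys."""
    for key, value in settings.items():
        if key not in REGISTRY:
            raise MetaError(f"unknown key {key!r}")  # typo'd key
        expected = REGISTRY[key]
        if expected == "callable":
            if not callable(value):
                raise MetaError(f"{key} must be callable")
        elif callable(value):
            raise MetaError(f"{key} is a data key, got a callable")
        elif type(value) is not expected:  # strict: also rejects bool for int
            raise MetaError(f"{key} must be {expected.__name__}")
    return settings
```

A probe at M1 would feed each of the three bad shapes through the real loader and expect a raise, never a silent pass.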

---

## Verdicts

_(none yet — phase just started; Builder has not yet created STATUS-rcust.md or branch
`restructure/recipe-custom`. Only the reference spec doc `76a4b6b` has landed. Awaiting first
`claim(rcust): M1` from the Builder.)_
@ -2,17 +2,60 @@

Plan: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md (SSOT for this phase)

## DONE

Both gates Adversary-verified fresh in REVIEW-conc.md, no open VETO:
- M1 — implementation verified: PASS @2026-06-10T04:38Z (branch @d3fe9e2)
- M2 — merged + live-verified (a)–(d): PASS @2026-06-10T08:55Z (final main 139e319/74ed240)
- CONC-A1 (M2(c) live finding): fixed b6e12ef, veto LIFTED + closed @09:05Z

## Phase state

- Phase: conc — concurrency restructure (P1–P5 + tests/concurrency)
- Builder branch: `restructure/concurrency` (code lands there; main untouched until M2 merge)
- In flight: P1 (lock-lifetime hardening)
- Gate: none claimed yet
- Phase: conc — concurrency restructure (P1–P5 + tests/concurrency) — COMPLETE
- Merged to main: bb5eb3d (restructure) + b7a009c (wrapper exit-code fix) + 139e319 (CONC-A1 fix)
- Correction per M2 verdict: 139e319's first parent is 2173894 (not 4ad55ed as the claim said);
  immaterial — the code-diff-empty check (139e319 vs b6e12ef) is authoritative.

## Gates
## Gate claim: M2 — merged + live-verified

- M1 (implementation verified): NOT CLAIMED
- M2 (merged + live-verified): NOT CLAIMED — blocked on M1 PASS
**WHAT**: branch merged to main after M1 PASS; live verification (a)–(d) all green on the final
main code (which includes two M2-found fixes, both already Adversary-verified: wrapper exit-code
e1c4198/b7a009c, CONC-A1 run-keyed state files b6e12ef/139e319).

**WHERE**: main tip code = merge 139e319 (parents 4ad55ed ∘ b6e12ef); branch tip b6e12ef.
All evidence builds ran post-139e319. Drone repo recipe-maintainers/cc-ci; host cc-ci.

**HOW + EXPECTED (cold re-check from your own access path):**

1. Merge integrity: `git diff 139e319 b6e12ef -- runner/ tests/ docs/ .drone.yml nix/` → EMPTY;
   no force-push anywhere (reflog linear).
2. Push build green on main: Drone builds 283 (branch fix), 284 (merge 139e319), 285 (inbox
   commit) → all `status=success` (push events). No main push since has a red build.
3. Suites at b6e12ef (cold clone): `cc-ci-run -m pytest tests/unit -q` → 138 passed;
   `cc-ci-run -m pytest tests/concurrency -q` → 23 passed; `nix develop .#lint --command bash
   scripts/lint.sh` → lint: PASS. (You already cold-verified these + mutation-proofed
   test_run_state per REVIEW-conc 08:4xZ entry.)
4. **(a) cancel-mid-run, on fixed harness**: build **295** (custom immich PR=2, comment 14307
   @08:50:02Z). Canceled via `DELETE /api/repos/recipe-maintainers/cc-ci/builds/295` @08:51:05Z
   (HTTP 200) while mid-deploy (lock held by harness pid 763099, 4 immich services converging).
   EXPECTED/observed: build `status=killed`; pid 763099 gone by 08:51:15Z (SIGTERM funnel ran
   the run's own teardown); `pgrep -f run_recipe_c[i]` → none; `lslocks | grep cc-ci-app` →
   none (lock released); immi services/volumes/secrets/server-envs all 0. Zero leakage, no
   janitor needed (better than plan minimum).
5. **(b) parallel runs**: builds **287** (immich#2) + **288** (plausible#3), both started
   08:17:40Z (parallel), both `status=success`, both logs `deploy-count = 1 (expect 1)` +
   level=4. Host after: zero harness procs / services / volumes / secrets / envs.
6. **(c) double-!testme same PR**: builds **290** + **291** (both immich#2, domain immi-ad3e33).
   291 log line 1: `== app lock: another run of immi-ad3e33... is in flight — waiting ==`,
   `acquired` @+1411s = exactly 290's exit (08:46:05Z). BOTH `status=success`, both
   `deploy-count = 1`, level=4. Zero leakage after. (Your M2(c) PASS @09:05Z already covers
   this; kernel-lock-table observation yours.)
7. **(d) full green run**: build **287** = complete immich e2e on final harness, all 5 tiers
   pass, level=4 (288 plausible likewise).

**Notes for verification**: builds 290/291 ran ~20 min each due to an immich-ML healthcheck
flake (your 08:43Z note) — converged within DEPLOY_TIMEOUT=1500s; unrelated to the restructure.
Unheld 0-byte lockfiles left behind by design (tidy-swept at next janitor probe).

## Blockers

22
STATUS-rcust.md
Normal file
@ -0,0 +1,22 @@
# STATUS — sub-phase rcust (recipe-customization restructure)

Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: `restructure/recipe-custom` (one commit per phase P1–P6; merged to main only after M1 PASS).

## Phase progress

- [ ] P1 — harness/meta.py single loader + key registry + migrate L1–L6 + unit tests + doc gen
- [ ] P2 — delete legacy keys/paths (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL, SKIP_GENERIC meta, conftest cleanup)
- [ ] P3 — uniform ctx hook convention
- [ ] P4 — custom-test ergonomics (placement rule, op_state/deps fixtures)
- [ ] P5 — customization manifest
- [ ] P6 — docs

## Gate

(none claimed yet — phase bootstrap)

## Current

Bootstrapping phase; starting P1.
383
docs/recipe-customization.md
Normal file
@ -0,0 +1,383 @@
# Recipe customization — review spec

Status: REVIEW SPEC — describes the customization surface as it exists today (main), written so
the structure can be reviewed and potentially restructured. §8 lists known limitations and
restructuring candidates; everything before it is purely descriptive.

Companion docs: `docs/testing.md` (test architecture / tier semantics), `docs/enroll-recipe.md`
(step-by-step enrollment). This doc is the **complete reference** for the two questions those docs
answer only partially:

1. How are custom tests written for a particular recipe?
2. What are ALL the per-recipe CI settings, where do they live, and who reads them?

---

## 1. The three customization surfaces

A recipe customizes its CI through **three distinct mechanisms** (worth noticing for the
restructure review — they are three different config languages):

| Surface | Form | Examples |
|---|---|---|
| **Declarative settings** | Python assignments in `tests/<recipe>/recipe_meta.py` | `DEPLOY_TIMEOUT = 1500`, `UPGRADE_BASE_VERSION = "2.3.1+..."` |
| **Code hooks** | Callables in `recipe_meta.py`, `ops.py` functions, shell hooks | `def READY_PROBE(domain): ...`, `pre_upgrade()`, `install_steps.sh` |
| **File presence** | A file existing at a discovered path changes behavior | `test_upgrade.py` overlay, `functional/test_*.py`, `compose.ccci.yml` |

There is additionally a fourth, operator-facing surface: **environment variables**
(`CCCI_SKIP_GENERIC*`) that override declarative settings at run time (§4.4).

## 2. Zero-config baseline

A recipe with **no `tests/<recipe>/` directory at all** still gets the full generic floor:

- deploy base version → INSTALL (generic `assert_serving`: HTTP on `/`, expect 200/301/302)
- chaos-upgrade to PR head → UPGRADE (generic `assert_upgraded`: version label matches head, converged, serving)
- BACKUP (generic `assert_backup_artifact`) — iff the recipe's compose files carry
  `backupbot.backup` labels (auto-detected), else N/A
- RESTORE (generic `assert_restore_healthy`)
- CUSTOM tier: empty (no custom tests discovered)
- teardown

Defaults: `HEALTH_PATH="/"`, `HEALTH_OK=(200,301,302)`, `DEPLOY_TIMEOUT=600`, `HTTP_TIMEOUT=300`.
Everything in this doc is opt-in deviation from that floor. The cardinal invariant
(docs/testing.md §1): the generic floor is **always on** and never depends on custom code;
custom is **additive** by default.
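
How the defaults and a recipe's overrides combine can be sketched as a plain dictionary merge. This is an illustration, not the harness's loader: `DEFAULTS` mirrors the four values listed above, while `effective_settings` and `is_serving` are hypothetical helper names.

```python
# Floor defaults as listed in this section.
DEFAULTS = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200, 301, 302),
    "DEPLOY_TIMEOUT": 600,
    "HTTP_TIMEOUT": 300,
}


def effective_settings(recipe_meta=None):
    """Defaults first, then the recipe's own assignments on top (opt-in deviation)."""
    merged = dict(DEFAULTS)
    merged.update(recipe_meta or {})
    return merged


def is_serving(status, settings):
    """Generic INSTALL check: the HTTP status on HEALTH_PATH must be in HEALTH_OK."""
    return status in settings["HEALTH_OK"]
```

For example, a recipe whose `recipe_meta.py` sets only `DEPLOY_TIMEOUT = 1500` keeps the default health path and accepted status codes.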
|
||||
|
||||
## 3. The per-recipe tree — every file that can exist
|
||||
|
||||
Two locations, with precedence and a security gate between them:
|
||||
|
||||
- **cc-ci-owned**: `tests/<recipe>/` in this repo (trusted, maintainer-reviewed)
|
||||
- **repo-local**: the recipe repo's own `tests/` dir (PR-author-controlled → **default-deny**,
|
||||
consulted only when the recipe is listed in `tests/repo-local-approved.txt` — gate HC2,
|
||||
centralized in `runner/harness/discovery.py`)
|
||||
|
||||
```
tests/<recipe>/                  # cc-ci side (repo-local mirrors the same shape)
├── recipe_meta.py               # ALL declarative settings + meta callables (§4)
├── test_<op>.py                 # lifecycle overlay assertions, op ∈ install|upgrade|backup|restore (§5.1)
├── ops.py                       # pre_<op>(domain, meta) seed hooks (§5.2)
├── test_*.py                    # custom-tier tests (top-level, cross-cutting) (§5.3)
├── functional/test_*.py         # custom tier: parity ports + recipe-specific (§5.3)
├── playwright/test_*.py         # custom tier: UI flows (§5.3)
├── install_steps.sh             # pre-deploy shell hook (§5.4)
├── setup_custom_tests.sh        # deps/OIDC credential wiring hook (§5.5)
├── compose.ccci.yml             # CI-only compose overlay (via install_steps) (§5.6)
└── PARITY.md                    # enrollment contract doc (human-read only)
```
|
||||
|
||||
Precedence (machine-docs/DECISIONS.md, implemented in `discovery.py`):
|
||||
|
||||
- lifecycle overlay `test_<op>.py`: repo-local **wins** over cc-ci (same-name collision); the
|
||||
generic floor still runs additively alongside.
|
||||
- custom tier `test_*.py`: **ALL** run, from both locations (no collision concept).
|
||||
- `install_steps.sh`: repo-local > cc-ci, or none.
|
||||
- `ops.py` pre-op hook: cc-ci wins; repo-local consulted only if approved.
|
||||
- `recipe_meta.py`: cc-ci only — repo-local recipes cannot set CI settings (by design; the
|
||||
settings surface stays maintainer-controlled).
|
||||
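As a rough illustration of the rules above, the lifecycle-overlay lookup could be sketched like this. This is not the code in `discovery.py`: the function name `resolve_overlay`, its parameters, and the injectable `exists` check are hypothetical, chosen so the sketch runs on its own.

```python
import os


def resolve_overlay(recipe, op, cc_ci_tests, repo_local_dir, approved, exists=os.path.isfile):
    """Return the test_<op>.py overlay to run, or None if the recipe ships none.

    Repo-local wins on a same-name collision, but it is only consulted when the
    recipe passes the HC2 gate (listed in the approved set); the default is deny.
    """
    name = f"test_{op}.py"
    candidates = []
    if repo_local_dir and recipe in approved:
        candidates.append(os.path.join(repo_local_dir, name))
    candidates.append(os.path.join(cc_ci_tests, recipe, name))
    for path in candidates:
        if exists(path):
            return path
    return None  # no overlay; the generic floor still runs on its own
```

The gate is checked before the repo-local path is even built, so an unapproved recipe can never shadow the cc-ci overlay.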
|
||||
## 4. `recipe_meta.py` — complete settings reference
|
||||
|
||||
The single settings file. Plain Python, `exec()`d by the harness (trusted, in-repo). A key is "set"
|
||||
by a top-level assignment or `def`. Unknown names are ignored silently (a recipe may keep private
|
||||
constants here, e.g. mumble's `WELCOME_TEXT_MARKER` — but see §8 R6: typos in real key names are
|
||||
also silently ignored).
|
||||
|
||||
**Loader column legend** — this is the structural finding for the review (§8 R1). There is no
|
||||
single loader; six independent code paths each `exec()` the file and pick out their own keys:
|
||||
|
||||
| # | Loader | Keys it sees |
|---|---|---|
| L1 | `runner/run_recipe_ci.py:_load_meta` (orchestrator) | 4 base + explicit 8-key allowlist |
| L2 | `tests/conftest.py:_recipe_meta` (pytest `meta` fixture) | 4 base keys ONLY |
| L3 | `runner/harness/lifecycle.py:_recipe_extra_env` | `EXTRA_ENV` only |
| L4 | `runner/harness/lifecycle.py:_recipe_meta_flag` | boolean flags by name (`CHAOS_BASE_DEPLOY`) |
| L5 | `runner/harness/deps.py:declared_deps` | `DEPS` only |
| L6 | `runner/harness/canonical.py:is_canonical_enrolled` | `WARM_CANONICAL` only |
|
||||
|
||||
> **Restructure status (rcust P1):** the six loaders above are HISTORY — they have been replaced by
|
||||
> the single registry-backed loader `runner/harness/meta.py::load(recipe) -> RecipeMeta` (the only
|
||||
> `exec()` of `recipe_meta.py`). Unknown ALL-CAPS keys / type mismatches are now hard errors;
|
||||
> underscore-prefixed names are recipe-private. The authoritative key reference is the generated
|
||||
> table below; the per-loader subsections §4.1–§4.8 are retained for context until the P6 doc
|
||||
> rewrite.
|
||||
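To make the registry idea concrete, here is a minimal sketch of a registry-backed loader with the behaviour described in the status note (one `exec()`, unknown ALL-CAPS names and type mismatches rejected, underscore-prefixed names left alone). It is an assumption-laden illustration, not the contents of `runner/harness/meta.py`: the `Key` dataclass, the three-key `KEYS` excerpt, and `load_source` are hypothetical names.

```python
import copy
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class Key:
    kind: type            # expected Python type (ignored for hooks)
    default: object
    hook: bool = False    # True: the value must be callable


# Hypothetical three-key excerpt; the real registry is KEYS in runner/harness/meta.py.
KEYS = {
    "HEALTH_PATH": Key(str, "/"),
    "DEPLOY_TIMEOUT": Key(int, 600),
    "READY_PROBE": Key(object, None, hook=True),
}


def load_source(source, filename="recipe_meta.py"):
    """exec() the settings source once and validate every public ALL-CAPS name."""
    ns = {}
    exec(compile(source, filename, "exec"), ns)  # trusted, in-repo input only
    values = {name: copy.deepcopy(key.default) for name, key in KEYS.items()}
    for name, value in ns.items():
        if name.startswith("_") or not name.isupper():
            continue  # recipe-private names, imports, lowercase helpers
        key = KEYS.get(name)
        if key is None:
            raise ValueError(f"{filename}: unknown key {name!r}")
        if key.hook and not callable(value):
            raise TypeError(f"{filename}: {name} must be callable")
        if not key.hook and not isinstance(value, key.kind):
            raise TypeError(f"{filename}: {name} must be {key.kind.__name__}")
        values[name] = value
    return SimpleNamespace(**values)
```

With this shape a misspelled `READINESS_PROBE` fails the load instead of silently disabling the probe, and consumers read attributes (`meta.DEPLOY_TIMEOUT`) from one validated object.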
|
||||
<!-- META-TABLE-START -->
|
||||
|
||||
_This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scripts/gen-meta-docs.py` — do not edit by hand (a unit test pins the sync)._
|
||||
|
||||
| Key | Type | Default | Meaning |
|
||||
|---|---|---|---|
|
||||
| `HEALTH_PATH` | `str` | `'/'` | Path probed for serving/health checks (deploy wait + generic `assert_serving`). |
|
||||
| `HEALTH_OK` | `tuple[int]` | `(200, 301, 302)` | Acceptable HTTP status codes for health. |
|
||||
| `DEPLOY_TIMEOUT` | `int` | `600` | Max seconds to wait for swarm convergence per deploy. |
|
||||
| `HTTP_TIMEOUT` | `int` | `300` | Max seconds to wait for HTTP health after convergence. |
|
||||
| `BACKUP_CAPABLE` | `bool` | `None` | Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect. |
|
||||
| `EXPECTED_NA` | `dict` | `None` | Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes. |
|
||||
| `READY_PROBE` | `hook` | `None` | Callable `(ctx) -> [probe, ...]` returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`. |
|
||||
| `UPGRADE_BASE_VERSION` | `str` | `None` | Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`). |
|
||||
| `BACKUP_VERIFY` | `hook` | `None` | Callable `(ctx) -> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts. |
|
||||
| `UPGRADE_EXTRA_ENV` | `dict_or_hook` | `None` | Extra `.env` keys applied after the PR-head checkout, before the chaos redeploy (env that exists only at head). Dict, or callable `(ctx) -> dict`. |
|
||||
| `EXTRA_ENV` | `dict_or_hook` | `{}` | Extra `.env` keys applied at EVERY deploy (base install AND upgrade old-app). Dict, or callable `(ctx) -> dict` deriving values from the per-run domain (`ctx.domain`). |
|
||||
| `DEPS` | `list[str]` | `[]` | Dep recipes deployed/provisioned alongside (e.g. `["keycloak"]`); creds land in `$CCCI_DEPS_FILE`. |
|
||||
| `WARM_CANONICAL` | `bool` | `False` | Enroll the recipe in the warm/canonical app system (docs/warm.md): green cold runs on LATEST advance the canonical snapshot. |
|
||||
| `SCREENSHOT` | `hook` | `None` | Callable `(page, ctx)` driving Playwright to a safe, credential-free post-login view for the results-card screenshot (default: landing page). |
|
||||
|
||||
<!-- META-TABLE-END -->
|
||||
|
||||
### 4.1 HTTP / health / timing (base 4 — seen by L1 AND L2)
|
||||
|
||||
| Key | Type / default | Meaning | Used by |
|
||||
|---|---|---|---|
|
||||
| `HEALTH_PATH` | str, `"/"` | Path probed for serving/health checks | deploy wait (`lifecycle.py`), generic `assert_serving` |
|
||||
| `HEALTH_OK` | tuple, `(200, 301, 302)` | Acceptable HTTP status codes for health | same |
|
||||
| `DEPLOY_TIMEOUT` | int s, `600` | Max wait for swarm convergence per deploy | `lifecycle.py`, generic ops |
|
||||
| `HTTP_TIMEOUT` | int s, `300` | Max wait for HTTP health after converged | same |
|
||||
|
||||
Example: immich sets `DEPLOY_TIMEOUT = 1500`, `HTTP_TIMEOUT = 600` (ML containers are slow).
|
||||
|
||||
### 4.2 Upgrade tier (loader L1)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `UPGRADE_BASE_VERSION` | str (exact published tag), default `None` | **The "base pin"** — overrides the harness default base for the upgrade tier. Default base = `recipe_versions[-2]` (the previous published version); pin when that is not the PR's true predecessor (e.g. the PR is the first release on a new major, or the previous tag is known-broken). Must be an exact published tag — typos fail the base deploy. Consumed at `run_recipe_ci.py` (`prev = meta.get("UPGRADE_BASE_VERSION") or lifecycle.previous_version(recipe)`). Users: discourse, plausible. |
|
||||
| `UPGRADE_EXTRA_ENV` | dict **or** callable `(domain) -> dict`, default `None` | Extra `.env` keys applied **after** the PR-head checkout, **before** the chaos redeploy (F2-14c) — for env vars that exist only at head (a new required setting introduced by the PR). Consumed in `generic.py:256`. User: mumble. |
|
||||
|
||||
### 4.3 Every-deploy shaping (loaders L3/L4 — NOT in the L1 allowlist)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `EXTRA_ENV` | dict **or** callable `(domain) -> dict`, default `{}` | Extra `.env` keys applied at **every** deploy (base install AND upgrade old-app). Callable form derives values from the per-run domain (e.g. cryptpad's `SANDBOX_DOMAIN`). Loaded by `lifecycle.py:_recipe_extra_env` (its own `exec()`). Users: cryptpad, discourse, ghost, matrix-synapse, mattermost-lts, mumble, plausible. |
|
||||
| `CHAOS_BASE_DEPLOY` | bool, default `False` | Base deploy uses `--chaos` so it survives untracked files in the recipe checkout (required when `install_steps.sh` copies in a `compose.ccci.yml` overlay — §5.6; implicit coupling, see §8 R7). Loaded by `lifecycle.py:_recipe_meta_flag`. Users: discourse, ghost. |
|
||||
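The dict-or-callable contract for `EXTRA_ENV` can be sketched as below. The helper `resolve_extra_env` is hypothetical (it is not `_recipe_extra_env`), the callable follows the `(domain) -> dict` form from the table above, and the `sandbox-<domain>` value is an invented placeholder, not cryptpad's real setting.

```python
def resolve_extra_env(value, domain):
    """Return extra .env keys for one deploy; accepts None, a dict, or callable(domain) -> dict."""
    if value is None:
        return {}
    env = value(domain) if callable(value) else value
    # Assumption: .env values are written as strings, so coerce here.
    return {str(k): str(v) for k, v in dict(env).items()}


# Callable form as a recipe_meta.py might define it (value format is illustrative only).
def EXTRA_ENV(domain):
    return {"SANDBOX_DOMAIN": f"sandbox-{domain}"}
```

The callable form exists because the per-run domain is not known when `recipe_meta.py` is written; a plain dict is enough whenever the values are static.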
|
||||
### 4.4 Skips and intentional N/A (loader L1)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `SKIP_GENERIC` | list of op names or `"all"`/`"*"`, default `[]` | Suppress the generic floor for the listed ops (overlay becomes override instead of additive). Two env equivalents at run time: `CCCI_SKIP_GENERIC=1` (all ops), `CCCI_SKIP_GENERIC_<OP>=1` (one op). Currently set by **no enrolled recipe** (env form is the one used, ad hoc). |
|
||||
| `EXPECTED_NA` | dict `{rung: reason}`, default `None` | Declares an N/A rung **intentional** (e.g. `{"backup": "stateless, nothing to back up"}`). Undeclared N/A is reported as an *unintentional coverage gap*. Both cap the achievable level — declaring does not un-cap, it only changes the report wording (`results.py`). User: custom-html-tiny. |
|
||||
| `BACKUP_CAPABLE` | bool, default auto-detect | Overrides the backup-tier capability detection (scan of recipe compose files for `backupbot.backup` labels, `generic.py:34`). `False` forces N/A; `True` forces the tier on. Users: custom-html-bkp-bad/rst-bad (harness self-test recipes). |
|
||||
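How the `SKIP_GENERIC` meta key and its two env equivalents might combine is sketched below. The function name `generic_skipped` is hypothetical, and two details are assumptions rather than documented behaviour: that `<OP>` in the env name is the upper-cased op, and that either source alone is enough to skip.

```python
def generic_skipped(op, meta_skip=(), env=None):
    """True if the generic floor for `op` is suppressed by env or by SKIP_GENERIC."""
    env = {} if env is None else env
    if env.get("CCCI_SKIP_GENERIC") == "1":
        return True  # all ops
    if env.get(f"CCCI_SKIP_GENERIC_{op.upper()}") == "1":
        return True  # this op only
    if isinstance(meta_skip, str):
        meta_skip = [meta_skip]  # allow SKIP_GENERIC = "all"
    return any(item in ("all", "*") or item == op for item in meta_skip)
```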
|
||||
### 4.5 Readiness & data-verification hooks (loader L1, callable values)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `READY_PROBE` | callable `(domain) -> [probe, ...]`, default `None` | Extra readiness probes run after install AND after upgrade, before that tier's assertions. Probe dicts: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}` (`stable`: must stay connectable across 3 checks — for UDP-adjacent voice ports etc.). Consumed at `lifecycle.py:516`. Users: lasuite-drive, mumble (TCP voice port). |
|
||||
| `BACKUP_VERIFY` | callable `(domain) -> bool`, default `None` | Post-backup data-capture check, retried — guards the truncated-dump race (backup snapshot taken before the seeded marker row hit disk). Return `False` → retry the backup, then fail. Users: discourse, ghost. |
|
||||
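For orientation, a `READY_PROBE` hook returning one probe of each shape might look like the sketch below. It follows the `(domain)` signature from the table above; the `/healthz` path, the port number, and the form of `ok` (a tuple of status codes, by analogy with `HEALTH_OK`) are illustrative assumptions, and `probe_kind` is a hypothetical helper, not harness code.

```python
def READY_PROBE(domain):
    """Extra readiness probes for one app (both shapes shown; values are illustrative)."""
    return [
        {"host": domain, "path": "/healthz", "ok": (200,)},
        {"tcp_host": domain, "tcp_port": 64738, "stable": True},
    ]


def probe_kind(probe):
    """Classify a probe dict as 'http' or 'tcp' by its keys; reject anything else."""
    if probe.keys() >= {"tcp_host", "tcp_port"}:
        return "tcp"
    if probe.keys() >= {"host", "path", "ok"}:
        return "http"
    raise ValueError(f"unrecognised probe shape: {sorted(probe)}")
```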
|
||||
### 4.6 Dependencies / SSO (loaders L5 + L1)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `DEPS` | list of recipe names, default `[]` | Dep recipes deployed alongside (e.g. `["keycloak"]`). Dep domain is `<dep[:4]>-<6hex>`, hashed from (parent, pr, ref, dep) — collision-free per run. Creds land in `$CCCI_DEPS_FILE` (JSON); tests use the `deps_apps` fixture; teardown deps LAST. Deploy-count guard becomes `1 + len(DEPS)`. Loaded by `deps.py:declared_deps`. Users: lasuite-docs/-drive/-meet. |
|
||||
| `OIDC_AT_INSTALL` | bool, default `False` | Provision deps **before** the single base deploy so `install_steps.sh` can wire OIDC env into that one deploy (reads `$CCCI_DEPS_FILE`). Default (legacy) is post-deploy provisioning + a `setup_custom_tests.sh` redeploy. Consumed at `run_recipe_ci.py:514`. Users: lasuite-drive, lasuite-meet. |
|
||||
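The `<dep[:4]>-<6hex>` naming rule can be illustrated with a short sketch. The table states only that the suffix is hashed from (parent, pr, ref, dep); the choice of SHA-256, the `|` separator, and the name `sketch_dep_domain` are assumptions made for this example and may differ from `deps.py`.

```python
import hashlib


def sketch_dep_domain(parent, pr, ref, dep):
    """Derive a deterministic per-run label for a dep app: <dep[:4]>-<6 hex chars>."""
    material = "|".join(str(part) for part in (parent, pr, ref, dep))
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()
    return f"{dep[:4]}-{digest[:6]}"
```

Because every input is part of the hash, re-running the same (parent, pr, ref) lands on the same dep domain, while a different PR or ref gets a different one.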
|
||||
### 4.7 Warm-canonical enrollment (loader L6)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `WARM_CANONICAL` | bool, default `False` | Enrolls the recipe in the warm/canonical app system (`docs/warm.md`): green COLD runs on LATEST advance the canonical snapshot; the nightly sweep iterates enrolled recipes. Loaded by `canonical.py:is_canonical_enrolled`. User: custom-html. |
|
||||
|
||||
### 4.8 Cosmetic (BROKEN — see §8 R2)
|
||||
|
||||
| Key | Type / default | Meaning |
|
||||
|---|---|---|
|
||||
| `SCREENSHOT` | callable `(page, domain, meta) -> None` | Drives Playwright to a safe post-login view for the results-card screenshot (default: landing page). **Currently unreachable from the CI path**: `screenshot.py:41` reads it from the meta dict the orchestrator passes (`run_recipe_ci.py:1056`), but the L1 allowlist never loads `SCREENSHOT`, so the hook is always `None`. No recipe sets it (consistent with it never having worked). |
|
||||
|
||||
## 5. Writing custom tests & hooks
|
||||
|
||||
### 5.1 Lifecycle overlay assertions — `test_<op>.py`
|
||||
|
||||
One pytest file per lifecycle op (`install` / `upgrade` / `backup` / `restore`). The
|
||||
**orchestrator performs the op exactly once**; the overlay only *asserts* on the resulting state
|
||||
(HC3 op/assertion split — overlays never deploy, never restore, never mutate). The generic floor
|
||||
test runs additively against the same state.
|
||||
|
||||
Conventions (see `tests/immich/test_backup.py` etc.):
|
||||
- use the `live_app` fixture (asserts `CCCI_APP_DOMAIN` is set, yields the domain)
|
||||
- use the `meta` fixture for HEALTH_*/timeouts (note: only the 4 base keys — §8 R3)
|
||||
- read op context from `$CCCI_OP_STATE_FILE` (JSON written by the orchestrator after the op:
|
||||
versions, artifact paths)
|
||||
- execute in-container checks via `harness.lifecycle.exec_in_app(domain, service, cmd)`
|
||||
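Reading the op context from an overlay test could be sketched as follows. The helper name `load_op_state` is hypothetical, and the JSON field names are not specified in this doc, so the usage note treats them as placeholders.

```python
import json
import os


def load_op_state(env=None):
    """Load the JSON op context the orchestrator wrote, or {} if none is advertised."""
    env = os.environ if env is None else env
    path = env.get("CCCI_OP_STATE_FILE")
    if not path:
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

An overlay such as `test_upgrade.py` would call `load_op_state()` and assert on whatever version or artifact fields the orchestrator records; it only reads state and never performs the op itself (HC3).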
|
||||
### 5.2 Pre-op seed hooks — `ops.py`
|
||||
|
||||
`def pre_<op>(domain, meta)` callables, imported and called by the orchestrator **before**
|
||||
performing the op. This is where data gets seeded so the post-op overlay can assert on it:
|
||||
|
||||
```python
# tests/immich/ops.py (pattern)
def pre_upgrade(domain, meta): _psql(domain, "INSERT ... 'upgrade-survives'")
def pre_backup(domain, meta):  _psql(domain, "INSERT ... 'original'")
def pre_restore(domain, meta): _psql(domain, "DROP TABLE ci_marker")  # damage, restore must undo
```
|
||||
|
||||
Seed → op → assert is the whole pattern: `pre_backup` writes a marker, the orchestrator backs up,
|
||||
`pre_restore` destroys it, the orchestrator restores, `test_restore.py` asserts the marker is back.
|
||||
|
||||
### 5.3 Custom tier — `functional/`, `playwright/`, top-level `test_*.py`
|
||||
|
||||
The custom tier consists of all non-lifecycle `test_*.py` files (discovery: `discovery.py:custom_tests`,
recursive over the top-level dir + `functional/` + `playwright/`; files named `test_<op>.py` are
excluded). They run in the CUSTOM tier, after restore, against the post-upgrade (PR-head) app. ALL
discovered files run: cc-ci's and (if HC2-approved) repo-local's, additively.
|
||||
|
||||
Enrollment contract (`docs/enroll-recipe.md`): ≥2 NEW functional tests beyond ports of existing
|
||||
upstream checks; ported tests carry `SOURCE:` comments. Playwright tests get the shared
|
||||
browser/harness helpers (`harness.browser`); SSO recipes get `harness.sso`
|
||||
(`setup_keycloak_realm` — idempotent, `oidc_password_grant` — provider-pluggable).
|
||||
|
||||
Tests gate on deps via `CCCI_DEPS_READY` (skip-with-reason when `0`; the skip is counted and
|
||||
fails the run if deps were declared but unprovisionable — `run_recipe_ci.py:816`).
|
||||
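A test-side gate on the deps variables might be sketched like this. `deps_gate` is a hypothetical helper, and treating an unset `CCCI_DEPS_READY` as "not ready" is an assumption.

```python
import os


def deps_gate(env=None):
    """Return (ready, reason): ready only when CCCI_DEPS_READY is exactly "1"."""
    env = os.environ if env is None else env
    if env.get("CCCI_DEPS_READY", "0") == "1":
        return True, ""
    return False, env.get("CCCI_DEPS_NOT_READY_REASON", "deps not ready")
```

In an SSO test this would typically feed a skip: `ready, reason = deps_gate()`, then `pytest.skip(reason)` when `ready` is false, so the skip carries the orchestrator's reason into the report.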
|
||||
### 5.4 Pre-deploy shell hook — `install_steps.sh`
|
||||
|
||||
Runs after `abra app new` + `EXTRA_ENV` application + secret generation, **before** the base
|
||||
deploy. For setup that must precede the first deploy: writing extra config files into the recipe
|
||||
checkout, copying in a `compose.ccci.yml` overlay (§5.6), editing `.env` beyond simple key=val.
|
||||
|
||||
Env contract: `CCCI_APP_DOMAIN`, `CCCI_RECIPE`, `CCCI_APP_ENV` (path to the app's `.env`), and,
when `OIDC_AT_INSTALL` deps exist, `CCCI_DEPS_FILE`. The hook must resolve the recipe checkout
through `ABRA_DIR`: `RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"`. `ABRA_DIR`
is per-run since the concurrency restructure, so a hardcoded `~/.abra` writes to the wrong tree.
|
||||
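For harness-side code the same resolution rule can be written in Python. This is a sketch of the rule only, under the assumption that an empty `ABRA_DIR` falls back the same way the shell `:-` expansion does; `resolve_recipe_dir` is a hypothetical name, not the harness's own helper.

```python
import os
import posixpath


def resolve_recipe_dir(recipe, env=None):
    """Mirror ${ABRA_DIR:-${HOME}/.abra}/recipes/<recipe>: unset or empty ABRA_DIR falls back."""
    env = os.environ if env is None else env
    abra_dir = env.get("ABRA_DIR") or posixpath.join(env.get("HOME", ""), ".abra")
    return posixpath.join(abra_dir, "recipes", recipe)
```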
|
||||
Graceful-generic rule: a recipe needing a hook but not shipping one simply fails the generic
|
||||
install — a correct reported outcome, not a harness error.
|
||||
|
||||
### 5.5 Deps credential wiring — `setup_custom_tests.sh`
|
||||
|
||||
For legacy (post-deploy) deps provisioning: runs after deps are up, reads `$CCCI_DEPS_FILE`
|
||||
(jq-readable JSON of dep creds/URLs), wires OIDC config via `abra app config set` + secrets, and
|
||||
redeploys. With `OIDC_AT_INSTALL = True` this hook is unnecessary (wiring happens in
|
||||
`install_steps.sh` before the only deploy) — preferred for new enrollments (one deploy, no
|
||||
deploy-count exception).
|
||||
|
||||
### 5.6 CI-only compose overlay — `compose.ccci.yml`
|
||||
|
||||
Not auto-discovered: `install_steps.sh` copies it into the recipe checkout, and the recipe must
|
||||
set `CHAOS_BASE_DEPLOY = True` so the base deploy (`--chaos`) tolerates the untracked file.
|
||||
Policy: minimal, justified fallback only (ghost's is a 15m `start_period` grace — a literal,
|
||||
because abra validates `start_period` before env substitution). The overlay is cc-ci-owned even
|
||||
though it rides in the recipe checkout.
|
||||
|
||||
### 5.7 Environment contract summary (what custom code can read)
|
||||
|
||||
| Var | Set for | Meaning |
|---|---|---|
| `CCCI_APP_DOMAIN` | all tests + hooks | the app's per-run domain |
| `CCCI_BASE_URL` | approved repo-local code | `https://<domain>` |
| `CCCI_RECIPE`, `CCCI_APP_ENV` | `install_steps.sh` | recipe name, app `.env` path |
| `CCCI_OP_STATE_FILE` | overlay tests | JSON op context (versions, artifacts) |
| `CCCI_DEPS_FILE` | deps hooks + tests | JSON dep creds dict |
| `CCCI_DEPS_READY` / `CCCI_DEPS_NOT_READY_REASON` | custom tier | gate SSO tests, skip-with-reason |
|
||||
|
||||
## 6. Run-model context (what the settings plug into)
|
||||
|
||||
One deploy chain per run (full detail: `docs/testing.md` §2):
|
||||
|
||||
```
deploy BASE (UPGRADE_BASE_VERSION or recipe_versions[-2]; EXTRA_ENV; install_steps.sh;
             CHAOS_BASE_DEPLOY?; OIDC_AT_INSTALL deps first?)
  → INSTALL tier (READY_PROBE; generic + overlay asserts)
  → pre_upgrade → chaos-deploy PR HEAD (UPGRADE_EXTRA_ENV)
  → UPGRADE tier (READY_PROBE; version-label == head_ref)
  → pre_backup → backup (BACKUP_CAPABLE; BACKUP_VERIFY)
  → BACKUP tier
  → pre_restore → restore
  → RESTORE tier
  → CUSTOM tier (functional/ + playwright/; deps via CCCI_DEPS_*)
  → teardown (deps LAST)
```
|
||||
|
||||
Deploy-count guard (DG4.1): exactly `1 + len(DEPS)` deploys per run (chaos redeploys don't
|
||||
count); the per-run counter file is keyed by run since the concurrency restructure.
|
||||
|
||||
## 7. Local iteration
|
||||
|
||||
```
RECIPE=<recipe> PR=<n> REF=<sha> SRC=recipe-maintainers/<recipe> \
  STAGES=install,upgrade,backup,restore,custom \
  cc-ci-run runner/run_recipe_ci.py
```
|
||||
|
||||
(`docs/enroll-recipe.md` §5 for the full loop, including dep teardown caveats.)
|
||||
|
||||
## 8. Known limitations & restructuring candidates
|
||||
|
||||
This is the review section. Items are ordered by how much they would shape a restructure.
|
||||
|
||||
**R1 — Six divergent meta loaders (the core drift hazard).** §4's L1–L6: every loader re-`exec()`s
|
||||
`recipe_meta.py` and cherry-picks its own keys. Adding a key means knowing *which* loader to touch
|
||||
(or that you must extend the L1 allowlist — `SCREENSHOT` proves people don't, R2). Two conventions
|
||||
coexist: L1's explicit allowlist vs L3–L6's ad-hoc `ns.get(...)` which silently bypasses it.
|
||||
*Candidate:* one `harness.meta.load(recipe) -> RecipeMeta` with a declarative key registry
|
||||
(name, type, default, validator, consumer) as the single source of truth; L1–L6 become lookups
|
||||
into the one loaded object; the registry also generates §4 of this doc (kills doc drift, R5).
|
||||
|
||||
**R2 — `SCREENSHOT` is a dead knob.** Fully implemented consumer (`screenshot.py`), documented
|
||||
hook contract, never reachable: the orchestrator's allowlist omits it, so the dict passed at
|
||||
`run_recipe_ci.py:1056` can never contain it. Direct evidence of R1. *Candidate:* fix trivially by
|
||||
adding to the allowlist — or delete the hook path if post-login screenshots aren't wanted; decide
|
||||
during the restructure.
|
||||
|
||||
**R3 — The pytest `meta` fixture sees 4 keys.** `tests/conftest.py:_recipe_meta` loads only
|
||||
HEALTH_*/timeouts. An overlay test wanting e.g. `EXPECTED_NA` or a recipe constant must re-exec
|
||||
the file itself. Probably intended minimalism, but it's a third key-set to keep in sync.
|
||||
*Folds into R1.*
|
||||
|
||||
**R4 — Settings split across three config languages** (§1): recipe_meta keys, file-presence
|
||||
(`install_steps.sh` existing changes deploy behavior), and run-time env (`CCCI_SKIP_GENERIC*`).
|
||||
A reviewer asking "what does this recipe customize?" must check all three. *Candidate:* keep the
|
||||
three surfaces (they serve different actors) but make the run header log a single resolved
|
||||
"customization manifest" per run: every non-default key + every discovered hook file + every
|
||||
CCCI_* override, in one block.
|
||||
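The "customization manifest" candidate could be as small as the sketch below. Everything here is hypothetical (function name, argument shapes, output layout), and it narrows "every CCCI_* override" to the `CCCI_SKIP_GENERIC*` variables, which are the only overrides this doc names.

```python
def customization_manifest(meta, defaults, hook_files, env):
    """Collect what one run customizes: non-default keys, hook files, env overrides."""
    return {
        "meta": {k: v for k, v in sorted(meta.items()) if defaults.get(k) != v},
        "hooks": sorted(hook_files),
        "env": {k: v for k, v in sorted(env.items()) if k.startswith("CCCI_SKIP_GENERIC")},
    }
```

Logging this one dict in the run header would answer "what does this recipe customize?" without the reviewer checking each surface separately.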
|
||||
**R5 — Reference-doc drift already happened.** `docs/testing.md` documents 6 meta keys,
|
||||
`docs/enroll-recipe.md` shows others by example; neither is complete (18 keys exist). This doc is
|
||||
now complete but handwritten — it will drift too. *Candidate:* generate the key table from the R1
|
||||
registry (test asserts doc ⊆ registry).
|
||||
|
||||
**R6 — No schema validation / silent typos.** Unknown top-level names in `recipe_meta.py` are
|
||||
ignored, which is load-bearing (recipes keep private constants there: mumble's
|
||||
`WELCOME_TEXT_MARKER`, `MAX_USERS`). Consequence: misspelling `READY_PROBE` as `READINESS_PROBE`
|
||||
silently disables the probe — the run goes green with less coverage, the worst failure mode for a
|
||||
CI harness. *Candidate:* with the R1 registry, warn (not fail) on ALL-CAPS top-level names that
|
||||
are not registered and not referenced by the recipe's own tests; or namespace private constants
|
||||
(`_WELCOME_TEXT_MARKER`).
|
||||
|
||||
**R7 — `compose.ccci.yml` ⇄ `CHAOS_BASE_DEPLOY` implicit coupling.** The overlay only works if
|
||||
the recipe *also* sets the flag; forgetting it fails the base deploy with an abra
|
||||
untracked-files error far from the cause. *Candidate:* if `install_steps.sh` exists alongside a
|
||||
`compose.ccci.yml`, the harness could auto-enable chaos for the base deploy (or at least assert
|
||||
the flag and fail with a pointed message).
|
||||
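The "assert the flag and fail with a pointed message" variant of this candidate might look like the following sketch. `check_overlay_coupling` is a hypothetical name, and detecting the overlay by file name in the recipe's test directory is an assumption about where the check would live.

```python
def check_overlay_coupling(files, chaos_base_deploy):
    """Fail early when a compose.ccci.yml overlay is shipped without CHAOS_BASE_DEPLOY."""
    has_overlay = "compose.ccci.yml" in files and "install_steps.sh" in files
    if has_overlay and not chaos_base_deploy:
        raise ValueError(
            "compose.ccci.yml is copied in by install_steps.sh, but CHAOS_BASE_DEPLOY "
            "is not set: the base deploy would fail on untracked files"
        )
    return has_overlay
```

Running a check like this before the base deploy turns a distant abra error into a message that names both halves of the coupling.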
|
||||
**R8 — `SKIP_GENERIC` (meta form) has zero users.** Only the env-var form is used, ad hoc. Either
|
||||
the meta key earns its place (first real user) or it's surface to delete in the restructure.
|
||||
|
||||
**R9 — `recipe_meta.py` is code, not config.** Five keys take callables (`EXTRA_ENV`,
|
||||
`UPGRADE_EXTRA_ENV`, `READY_PROBE`, `BACKUP_VERIFY`, `SCREENSHOT`), so the file must stay an
|
||||
`exec()`d Python module — it can't be validated as data, serialized into results, or diffed
|
||||
declaratively. This is a real expressiveness need (cryptpad derives `SANDBOX_DOMAIN` from the
|
||||
per-run domain), not an accident. *Candidate if restructuring:* split data keys (TOML-able,
|
||||
schema-validated) from a `hooks.py` (callables only) — but weigh against the cost of two files
|
||||
per recipe; the R1 registry gets most of the value without the split.
|
||||
|
||||
## 9. File / symbol index
|
||||
|
||||
| Concern | Where |
|---|---|
| Orchestrator meta loader (L1, allowlist) | `runner/run_recipe_ci.py:250` `_load_meta` |
| Pytest meta fixture (L2) | `tests/conftest.py` `_recipe_meta` |
| `EXTRA_ENV` loader (L3) | `runner/harness/lifecycle.py:114` `_recipe_extra_env` |
| Boolean-flag loader (L4) | `runner/harness/lifecycle.py:132` `_recipe_meta_flag` |
| `DEPS` loader (L5) | `runner/harness/deps.py:37` `declared_deps` |
| `WARM_CANONICAL` loader (L6) | `runner/harness/canonical.py:36` `is_canonical_enrolled` |
| Overlay/custom/hook discovery + HC2 gate | `runner/harness/discovery.py` |
| HC2 allowlist | `tests/repo-local-approved.txt` |
| Generic assertions + `BACKUP_CAPABLE` detect | `runner/harness/generic.py` |
| `READY_PROBE` / `CHAOS_BASE_DEPLOY` consumption | `runner/harness/lifecycle.py:516` / `:283` |
| `EXPECTED_NA` reporting | `runner/harness/results.py` |
| Dead `SCREENSHOT` consumer | `runner/harness/screenshot.py:36`, called `run_recipe_ci.py:1056` |
| Skip-generic logic (meta + env) | `runner/run_recipe_ci.py:285` |
| Worked examples | `tests/ghost/` (overlay+chaos), `tests/mumble/` (TCP probe, UPGRADE_EXTRA_ENV), `tests/lasuite-drive/` (DEPS+OIDC_AT_INSTALL), `tests/immich/` (ops.py seed pattern) |
|
||||
@ -1283,3 +1283,15 @@ the commit), which is the correct SCM integration.
|
||||
environment; job is session-persistent (survives as long as Builder session runs). T0-refire
|
||||
verified: CronCreate test fire at 23:17Z → upgrader started, upgrader-cron.log created, status
|
||||
RUNNING. (2026-06-01)
|
||||
|
||||
## conc P3 (2026-06-10, Builder): install_steps.sh hooks resolve $ABRA_DIR — guardrail note
|
||||
|
||||
P3 makes recipe working trees per-run ($ABRA_DIR/recipes). tests/{ghost,discourse}/install_steps.sh
|
||||
hard-coded `${HOME}/.abra/recipes/...` to copy their compose.ccci.yml overlay into the deploy tree;
|
||||
under per-run trees that path is the WRONG (canonical) tree, so the overlay would silently miss the
|
||||
deploy and both recipes' upgrade-tier base deploys would break. Fixed with ONE mechanical line per
|
||||
hook: `RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"` (identical resolution rule to
|
||||
the abra CLI and abra.recipe_dir()). No test assertion, gate, or overlay content was touched — the
|
||||
phase guardrail's "never touch tests/<recipe>/ content" is read as protecting test/gate SEMANTICS;
|
||||
this is required P3 fallout, equivalent to the harness-side path routing. Flagged here for the
|
||||
Adversary's gate-integrity review.
|
||||
|
||||
@ -30,17 +30,13 @@ import subprocess
|
||||
import time
|
||||
|
||||
from . import abra, warm, warmsnap
|
||||
from . import meta as meta_mod
|
||||
|
||||
|
||||
def is_enrolled(recipe: str) -> bool:
|
||||
"""True if `tests/<recipe>/recipe_meta.py` sets `WARM_CANONICAL = True`. Missing meta → False."""
|
||||
path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
|
||||
if not os.path.exists(path):
|
||||
return False
|
||||
ns: dict = {}
|
||||
with open(path) as fh:
|
||||
exec(compile(fh.read(), path, "exec"), ns) # noqa: S102 (trusted, in-repo)
|
||||
return bool(ns.get("WARM_CANONICAL"))
|
||||
"""True if `tests/<recipe>/recipe_meta.py` sets `WARM_CANONICAL = True`. Missing meta → False.
|
||||
Reads through the single meta loader (rcust P1 — no per-module exec)."""
|
||||
return bool(meta_mod.load(recipe).WARM_CANONICAL)
|
||||
|
||||
|
||||
def canonical_domain(recipe: str) -> str:
|
||||
@ -51,7 +47,7 @@ def canonical_domain(recipe: str) -> str:
|
||||
def enrolled_recipes() -> list[str]:
|
||||
"""All recipes enrolled as data-warm canonicals (recipe_meta.WARM_CANONICAL=True), sorted. Used
|
||||
by the WC6 nightly sweep to know which canonicals to refresh via a green cold run on latest."""
|
||||
tests_dir = os.path.join(os.path.dirname(__file__), "..", "..", "tests")
|
||||
tests_dir = meta_mod.TESTS_DIR
|
||||
out = []
|
||||
try:
|
||||
for name in sorted(os.listdir(tests_dir)):
|
||||
|
||||
@ -20,7 +20,7 @@ Per Phase-2 DECISIONS:
|
||||
Run state:
|
||||
- `$CCCI_DEPS_FILE` — JSON file written by the orchestrator after each dep deploys; each entry is
|
||||
`{"recipe": "<dep-recipe>", "domain": "<dep-domain>", "version": null}`. Tests access via the
|
||||
`deps_apps` pytest fixture defined in `tests/conftest.py`.
|
||||
`deps` pytest fixture defined in `tests/conftest.py`.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@ -31,19 +31,7 @@ import os
|
||||
from collections.abc import Iterable
|
||||
|
||||
from . import lifecycle, naming
|
||||
|
||||
|
||||
def declared_deps(recipe: str) -> list[str]:
|
||||
"""Read `DEPS` from `tests/<recipe>/recipe_meta.py` — a list of recipe names this recipe needs
|
||||
deployed alongside it. Returns [] if none."""
|
||||
path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
|
||||
if not os.path.exists(path):
|
||||
return []
|
||||
ns: dict = {}
|
||||
with open(path) as fh:
|
||||
exec(compile(fh.read(), path, "exec"), ns) # noqa: S102 (trusted, in-repo)
|
||||
deps = ns.get("DEPS") or []
|
||||
return [str(d) for d in deps if d]
|
||||
from . import meta as meta_mod
|
||||
|
||||
|
||||
def dep_domain(parent_recipe: str, pr: str, ref: str | None, dep_recipe: str) -> str:
|
||||
@ -62,11 +50,11 @@ def write_run_state(deps_state) -> None:
|
||||
"""Write the deps state file ($CCCI_DEPS_FILE). Two shapes supported (canonical=keyed dict):
|
||||
|
||||
1. **Legacy list-of-entries:** `[{"recipe": "<dep>", "domain": "<d>"}, ...]` (Q2.3 original).
|
||||
Still accepted by `load_run_state` for backwards compat — `deps_apps` fixture flattens.
|
||||
Still accepted by `load_run_state` for backwards compat — the `deps` fixture flattens.
|
||||
2. **NEW per-spec dict (operator-2026-05-28 SSO-dep plan §3.2):**
|
||||
`{"<dep_recipe>": {"recipe": "<dep>", "domain": "<d>", "realm": "...",
|
||||
"client_id": "...", "client_secret": "...", "admin_user": "...", "admin_password": "..."}}`.
|
||||
The `setup_custom_tests.sh` per-recipe hook reads this via `jq` to wire OIDC env.
|
||||
The per-recipe `install_steps.sh` hook reads this via `jq` to wire OIDC env.
|
||||
|
||||
No-op if `$CCCI_DEPS_FILE` isn't set."""
|
||||
path = os.environ.get("CCCI_DEPS_FILE")
|
||||
@ -81,11 +69,12 @@ def deploy_deps(
|
||||
pr: str,
|
||||
ref: str | None,
|
||||
deps: Iterable[str],
|
||||
meta_for: dict[str, dict] | None = None,
|
||||
meta_for: dict | None = None,
|
||||
) -> list[dict]:
|
||||
"""Deploy each declared dep, sequentially, at its per-run domain. Returns the list of state
|
||||
dicts (one per dep). `meta_for` maps dep_recipe -> meta (HEALTH_PATH/HEALTH_OK/timeouts) so the
|
||||
readiness wait uses per-dep config; missing dep meta falls back to (/, 200/301/302, 600s)."""
|
||||
dicts (one per dep). `meta_for` maps dep_recipe -> RecipeMeta (HEALTH_PATH/HEALTH_OK/timeouts)
|
||||
so the readiness wait uses per-dep config; a missing dep meta is loaded via meta.load()
|
||||
(defaults: /, 200/301/302, 600s)."""
|
||||
meta_for = meta_for or {}
|
||||
state: list[dict] = []
|
||||
for dep in deps:
|
||||
@@ -94,20 +83,21 @@ def deploy_deps(
         # NB: each dep_app gets a fresh deploy_count entry only on `_record_deploy` which fires
         # inside `lifecycle.deploy_app`. For Phase 2 the deploy-count guard (DG4.1) counts the
         # parent + its deps as distinct install events — by design, since each is a separate app.
-        dm = meta_for.get(dep, {})
+        dm = meta_for.get(dep) or meta_mod.load(dep)
         lifecycle.deploy_app(
             dep,
             domain,
             secrets=True,
-            deploy_timeout=int(dm.get("DEPLOY_TIMEOUT", 900)),
+            deploy_timeout=int(dm.DEPLOY_TIMEOUT),
+            meta=dm,
         )
         try:
             lifecycle.wait_healthy(
                 domain,
-                ok_codes=tuple(dm.get("HEALTH_OK", (200, 301, 302))),
-                path=dm.get("HEALTH_PATH", "/"),
-                deploy_timeout=int(dm.get("DEPLOY_TIMEOUT", 600)),
-                http_timeout=int(dm.get("HTTP_TIMEOUT", 600)),
+                ok_codes=tuple(dm.HEALTH_OK),
+                path=dm.HEALTH_PATH,
+                deploy_timeout=int(dm.DEPLOY_TIMEOUT),
+                http_timeout=int(dm.HTTP_TIMEOUT),
             )
         except Exception:
             # If a dep fails to converge, abort the whole resolve — let the caller teardown
@@ -163,7 +153,7 @@ def load_run_state():
 
 
 def deps_as_dict(state) -> dict[str, dict]:
-    """Coerce either shape (legacy list or new dict) into a recipe→entry dict for the deps_apps
+    """Coerce either shape (legacy list or new dict) into a recipe→entry dict for the `deps`
     fixture + dependent-tests consumption."""
     if isinstance(state, dict):
         return state
@@ -11,7 +11,8 @@ hook; the orchestrator decides additive-vs-skip. Sources, in precedence order
 > cc-ci tests/<recipe>/test_<op>.py
 (the generic tests/_generic/test_<op>.py is the always-present floor, run separately by default)
 
-custom (non-lifecycle) test_*.py — ALL run, additively, from BOTH locations (opt-in).
+custom test_*.py (functional/ + playwright/ ONLY, rcust P4 placement rule) — ALL run,
+additively, from BOTH locations (opt-in).
 
 install-steps hook — install_steps.sh: repo-local > cc-ci, or none.
 
@@ -100,29 +101,22 @@ def resolve_op(recipe: str, op: str, repo_local_dir: str | None) -> tuple[str, s
 
 
 def custom_tests(recipe: str, repo_local_dir: str | None) -> list[tuple[str, str]]:
-    """All non-lifecycle test_*.py from cc-ci's tests/<recipe>/ and (if approved) the recipe's
-    repo-local tests/. Discovered locations (Phase 2 §4.1):
-      - the top-level dir tests/<recipe>/test_*.py (legacy + cross-cutting)
-      - functional/ tests/<recipe>/functional/test_*.py (parity ports + recipe-specific)
-      - playwright/ tests/<recipe>/playwright/test_*.py (UI flows P6)
-    Files named `test_<op>.py` (lifecycle ops) are excluded from this list — the orchestrator runs
-    those in their lifecycle tier, not the custom one. Repo-local is consulted only for
-    allowlist-approved recipes (HC2)."""
+    """All custom-tier test_*.py from cc-ci's tests/<recipe>/ and (if approved) the recipe's
+    repo-local tests/. PLACEMENT RULE (rcust P4): custom tests live ONLY under
+      - functional/ tests/<recipe>/functional/test_*.py (parity ports + recipe-specific)
+      - playwright/ tests/<recipe>/playwright/test_*.py (UI flows)
+    A top-level test_*.py is a LIFECYCLE OVERLAY (test_<op>.py) and nothing else — top-level
+    non-lifecycle files are NOT discovered (zero users at the time of the change; the lifecycle-
+    name exclusion below stays as a safety net so a misfiled test_<op>.py can never double-run).
+    Repo-local is consulted only for allowlist-approved recipes (HC2)."""
     lifecycle_names = {f"test_{op}.py" for op in LIFECYCLE_OPS}
     subdirs = ("functional", "playwright")
     found: list[tuple[str, str]] = []
     for source, d in (("cc-ci", cc_ci_dir(recipe)), ("repo-local", _gated(recipe, repo_local_dir))):
         if not d or not os.path.isdir(d):
             continue
-        # top-level (legacy / cross-cutting tests not under functional/playwright)
-        for p in sorted(glob.glob(os.path.join(d, "test_*.py"))):
-            if os.path.basename(p) not in lifecycle_names:
-                found.append((source, p))
-        # functional/ and playwright/ subdirs (Phase 2 §4.1)
         for sub in subdirs:
             for p in sorted(glob.glob(os.path.join(d, sub, "test_*.py"))):
-                # Phase-2 layout: lifecycle ops never live under functional/playwright, but be
-                # explicit so a misfiled file doesn't silently get double-run.
                 if os.path.basename(p) not in lifecycle_names:
                     found.append((source, p))
     return found
@@ -144,7 +138,7 @@ def install_steps(recipe: str, repo_local_dir: str | None) -> tuple[str, str] |
 
 def pre_op_hook(recipe: str, op: str, repo_local_dir: str | None) -> tuple[str, str] | None:
     """The pre-op seed hook for `op`: the path to a recipe `ops.py` module that defines a
-    `pre_<op>(domain, meta)` callable, or None. cc-ci's tests/<recipe>/ops.py wins; the repo-local
+    `pre_<op>(ctx)` callable, or None. cc-ci's tests/<recipe>/ops.py wins; the repo-local
     ops.py is consulted only for allowlist-approved recipes (HC2). The orchestrator imports the
     module and calls pre_<op> BEFORE performing the op (HC3 op/assertion split — overlays seed
     pre-op state here, then assert post-op in test_<op>.py)."""
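A minimal sketch of the placement rule stated in the `custom_tests` docstring, under loud assumptions: `discover_custom` is a hypothetical stand-in (not the harness function), the `LIFECYCLE_OPS` value is assumed because this diff does not show it, and the allowlist gating of repo-local tests is left out. It only illustrates that top-level non-lifecycle files are no longer picked up, while `functional/` and `playwright/` are globbed with the lifecycle-name exclusion kept as a safety net.

```python
import glob
import os
import tempfile

# Assumed value for illustration; the real LIFECYCLE_OPS is not shown in this diff.
LIFECYCLE_OPS = ("install", "upgrade", "backup", "restore")


def discover_custom(tests_dir: str) -> list[str]:
    """Hypothetical stand-in: custom tests are discovered ONLY under functional/ and playwright/."""
    lifecycle_names = {f"test_{op}.py" for op in LIFECYCLE_OPS}
    found = []
    for sub in ("functional", "playwright"):
        for p in sorted(glob.glob(os.path.join(tests_dir, sub, "test_*.py"))):
            # Safety net: a misfiled lifecycle overlay is never treated as a custom test.
            if os.path.basename(p) not in lifecycle_names:
                found.append(p)
    return found


with tempfile.TemporaryDirectory() as d:
    for rel in (
        "test_smoke.py",               # top-level, non-lifecycle: not discovered
        "functional/test_login.py",    # discovered
        "functional/test_install.py",  # misfiled lifecycle name: excluded
        "playwright/test_ui.py",       # discovered
    ):
        full = os.path.join(d, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    found_rel = [os.path.relpath(p, d).replace(os.sep, "/") for p in discover_custom(d)]

print(found_rel)
```

Under these assumptions, a recipe that still keeps a cross-cutting test at the top level would need to move it under `functional/` for it to run.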
@@ -19,6 +19,7 @@ import ssl
 import time
 
 from . import abra, lifecycle
+from . import meta as meta_mod
 
 # A recipe is backup-capable iff a compose file carries a truthy backupbot.backup label.
 _BACKUPBOT_RE = re.compile(r"backupbot\.backup\b[^\n]*\btrue\b", re.IGNORECASE)
@@ -28,13 +29,14 @@ def _recipe_dir(recipe: str) -> str:
     return abra.recipe_dir(recipe)  # the per-run tree inside a CI run ($ABRA_DIR)
 
 
-def backup_capable(recipe: str, meta: dict | None = None) -> bool:
+def backup_capable(recipe: str, meta=None) -> bool:
     """Whether the harness should run the backup/restore tiers (else they are a clean N/A skip, DG3).
 
-    `recipe_meta.BACKUP_CAPABLE` (bool) overrides; otherwise auto-detect by scanning the recipe's
-    compose*.yml for a truthy `backupbot.backup` label (the Co-op Cloud backup convention)."""
-    if meta and "BACKUP_CAPABLE" in meta:
-        return bool(meta["BACKUP_CAPABLE"])
+    `recipe_meta.BACKUP_CAPABLE` (bool) overrides when explicitly set (RecipeMeta default is None =
+    unset); otherwise auto-detect by scanning the recipe's compose*.yml for a truthy
+    `backupbot.backup` label (the Co-op Cloud backup convention)."""
+    if meta is not None and meta.BACKUP_CAPABLE is not None:
+        return bool(meta.BACKUP_CAPABLE)
     for path in glob.glob(os.path.join(_recipe_dir(recipe), "compose*.yml")):
         try:
             with open(path) as fh:
@@ -75,7 +77,7 @@ def served_cert(domain: str, port: int = 443) -> tuple[bool, str]:
     return (True, f"CN={cn} SAN={sans}")
 
 
-def assert_serving(domain: str, meta: dict) -> None:
+def assert_serving(domain: str, meta) -> None:
     """The single generic "is the app really serving?" assertion (DG1).
 
     The app-vs-Traefik-fallback proof is steps 1+2 (both load-bearing, verified by the Adversary):
@@ -90,14 +92,14 @@ def assert_serving(domain: str, meta: dict) -> None:
 
     Steps 1–2 are BOUNDED POLLS (no bare sleep), so a state-mutating op (upgrade/restore) that leaves
     the app briefly reconverging settles, while a persistent failure still fails within the timeout."""
-    deadline = time.time() + meta["DEPLOY_TIMEOUT"]
+    deadline = time.time() + meta.DEPLOY_TIMEOUT
     while time.time() < deadline and not lifecycle.services_converged(domain):
         time.sleep(5)
     assert lifecycle.services_converged(domain), f"{domain}: services did not converge"
 
-    path = meta["HEALTH_PATH"]
-    ok = tuple(meta["HEALTH_OK"])
-    deadline = time.time() + meta["HTTP_TIMEOUT"]
+    path = meta.HEALTH_PATH
+    ok = tuple(meta.HEALTH_OK)
+    deadline = time.time() + meta.HTTP_TIMEOUT
     served = False
     status, body = 0, ""
     while time.time() < deadline:
@@ -141,7 +143,7 @@ def op_state() -> dict:
     return {}
 
 
-def assert_upgraded(domain: str, meta: dict) -> None:
+def assert_upgraded(domain: str, meta) -> None:
     """Generic UPGRADE assertion (post-op): the orchestrator already performed the upgrade once via
     `abra app deploy --chaos` of the PR-head checkout. Assert it reconverged + still serves AND that
     the deployment is genuinely the PR-head code under test (HC1) — non-vacuously (guarding F1d-2).
@@ -212,7 +214,7 @@ def assert_backup_artifact(domain: str) -> str:
     return snap_id
 
 
-def assert_restore_healthy(domain: str, meta: dict) -> None:
+def assert_restore_healthy(domain: str, meta) -> None:
     """Generic RESTORE assertion (post-op): the orchestrator already restored. Assert the app is
     healthy + serving again (assert_serving polls, so the post-restore reconverge settles)."""
     assert_serving(domain, meta)
@@ -226,7 +228,7 @@ def perform_upgrade(
     recipe: str,
     head_ref: str | None,
     deploy_timeout: int = 900,
-    meta: dict | None = None,
+    meta=None,
 ) -> dict[str, str | None]:
     """Perform the UPGRADE op once, in place, to the PR-HEAD code under test (HC1): re-checkout the
     PR head (the prev-tag base deploy reset the recipe working tree), then `abra app deploy --chaos`
@@ -244,7 +246,8 @@ def perform_upgrade(
     STRICTER convergence+health wait here: services N/N (wait_healthy) + app HEALTH_PATH healthy +
     any recipe READY_PROBE (collabora WOPI discovery 200). This bounds readiness by OUR generous
     deadline, not abra's impatient one — and is stronger evidence than abra's monitor."""
-    meta = meta or {}
+    if meta is None:
+        meta = meta_mod.load(recipe)
     before = lifecycle.deployed_identity(domain)
     if head_ref:
         lifecycle.recipe_checkout_ref(recipe, head_ref)
@@ -253,9 +256,7 @@ def perform_upgrade(
     # (target) version, so the base deploys minimally WITHOUT it and the upgrade adds it to COMPOSE_FILE
     # here, after the PR-head checkout (which ships the overlay) and before the chaos redeploy that
     # picks up the new .env. Dict or callable(domain)->dict. No-op for recipes without it.
-    upgrade_env = meta.get("UPGRADE_EXTRA_ENV") or {}
-    if callable(upgrade_env):
-        upgrade_env = upgrade_env(domain) or {}
+    upgrade_env = meta_mod.upgrade_extra_env(meta, meta_mod.hook_ctx(domain, meta, op="upgrade"))
     for k, v in upgrade_env.items():
         print(f" upgrade-env: {k}={v}", flush=True)
         abra.env_set(domain, k, v)
@@ -266,14 +267,12 @@ def perform_upgrade(
     # Own the convergence verification (abra's monitor was skipped via -c).
     lifecycle.wait_healthy(
         domain,
-        ok_codes=tuple(meta.get("HEALTH_OK", (200, 301, 302))),
-        path=meta.get("HEALTH_PATH", "/"),
-        deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)),
-        http_timeout=int(meta.get("HTTP_TIMEOUT", 300)),
-    )
-    lifecycle.wait_ready_probes(
-        meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout))
+        ok_codes=tuple(meta.HEALTH_OK),
+        path=meta.HEALTH_PATH,
+        deploy_timeout=int(meta.DEPLOY_TIMEOUT),
+        http_timeout=int(meta.HTTP_TIMEOUT),
     )
+    lifecycle.wait_ready_probes(meta, domain, timeout=int(meta.DEPLOY_TIMEOUT), op="upgrade")
     after = lifecycle.deployed_identity(domain)
     # Evidence (HC1): the chaos-version label = the deployed recipe commit; it should match the
     # PR-head we checked out — proving the upgrade deployed the code under test, not a published tag.
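The `backup_capable` change in this file turns the override into a tri-state (`None` = unset, so auto-detect runs). A small sketch of that decision, assuming a hypothetical helper `backup_capable_sketch` that takes the override value and already-read compose file contents instead of a `RecipeMeta` and a recipe directory; only the regex is copied from the diff.

```python
import re

# Regex copied from the diff above; everything else here is an illustrative stand-in.
_BACKUPBOT_RE = re.compile(r"backupbot\.backup\b[^\n]*\btrue\b", re.IGNORECASE)


def backup_capable_sketch(override, compose_texts):
    """Hypothetical helper: an explicit bool override wins; None falls back to label auto-detect."""
    if override is not None:
        return bool(override)
    return any(_BACKUPBOT_RE.search(text) for text in compose_texts)


labelled = 'deploy:\n  labels:\n    - "backupbot.backup=true"\n'
unlabelled = "services:\n  app:\n    image: nginx\n"

results = (
    backup_capable_sketch(None, [labelled]),    # unset: auto-detect finds the label
    backup_capable_sketch(None, [unlabelled]),  # unset: no label, tier is N/A
    backup_capable_sketch(False, [labelled]),   # explicit False forces N/A despite the label
)
print(results)
```

The point of comparing against `None` rather than truthiness is that an explicit `False` must still count as "set"; the old `if meta and "BACKUP_CAPABLE" in meta` form relied on key presence for the same distinction.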
@@ -12,6 +12,7 @@ import glob
 import json
 import os
 import re
+import shutil
 import socket
 import ssl
 import subprocess
@@ -19,6 +20,7 @@ import time
 import urllib.request
 
 from . import abra, lifetime
+from . import meta as meta_mod
 
 GATEWAY_IP = "143.244.213.108"  # *.ci.commoninternet.net -> gateway (TLS passthrough to cc-ci)
 # A run app domain is "<recipe[:4]>-<6hex>.ci.commoninternet.net" (see DECISIONS.md). Used by the
@@ -111,37 +113,6 @@ def _residual(domain: str) -> dict:
     }
 
 
-def _recipe_extra_env(recipe: str, domain: str) -> dict[str, str]:
-    """Per-recipe extra .env keys, applied at every deploy (install + upgrade's old_app) so a recipe
-    with multi-domain / config needs is enrolled with NO shared-harness change (D5/M6.5). A recipe
-    declares `EXTRA_ENV` in tests/<recipe>/recipe_meta.py as either a dict or a callable
-    `EXTRA_ENV(domain) -> dict` (callable form lets it derive values from the per-run domain, e.g.
-    cryptpad's SANDBOX_DOMAIN). Returns {} if none."""
-    path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
-    if not os.path.exists(path):
-        return {}
-    ns: dict = {}
-    with open(path) as fh:
-        exec(compile(fh.read(), path, "exec"), ns)  # noqa: S102 (trusted, in-repo)
-    ee = ns.get("EXTRA_ENV")
-    if callable(ee):
-        ee = ee(domain)
-    return {str(k): str(v) for k, v in (ee or {}).items()}
-
-
-def _recipe_meta_flag(recipe: str, key: str) -> bool:
-    """Read a boolean flag from tests/<recipe>/recipe_meta.py (e.g. CHAOS_BASE_DEPLOY). Returns
-    False if the recipe ships no meta or the flag is absent/falsey. Trusted in-repo exec, same as
-    _recipe_extra_env."""
-    path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
-    if not os.path.exists(path):
-        return False
-    ns: dict = {}
-    with open(path) as fh:
-        exec(compile(fh.read(), path, "exec"), ns)  # noqa: S102 (trusted, in-repo)
-    return bool(ns.get(key))
-
-
 def _record_deploy() -> None:
     """Increment the per-run deploy counter (DG4.1: one deploy per run). No-op unless the
     orchestrator set CCCI_DEPLOY_COUNT_FILE — so it never affects standalone/manual use."""
@@ -155,6 +126,34 @@ def _record_deploy() -> None:
         f.write(str(n + 1))
 
 
+def ccci_overlay_path(recipe: str) -> str:
+    """The cc-ci-owned compose overlay for a recipe (rcust P2a: first-class, auto-discovered)."""
+    return os.path.join(meta_mod.TESTS_DIR, recipe, "compose.ccci.yml")
+
+
+def has_ccci_overlay(recipe: str) -> bool:
+    return os.path.isfile(ccci_overlay_path(recipe))
+
+
+def provide_ccci_overlay(recipe: str) -> None:
+    """Copy tests/<recipe>/compose.ccci.yml into THIS run's recipe checkout (ABRA_DIR-aware), so
+    the recipe's COMPOSE_FILE reference resolves (rcust P2a — the harness owns the copy; recipes
+    no longer ship install_steps.sh boilerplate for it). No-op for recipes without an overlay."""
+    src = ccci_overlay_path(recipe)
+    if not os.path.isfile(src):
+        return
+    dest_dir = abra.recipe_dir(recipe)
+    if not os.path.isdir(dest_dir):
+        print(f" ccci-overlay: recipe dir {dest_dir} missing — cannot provide overlay", flush=True)
+        raise RuntimeError(f"recipe checkout missing for {recipe}: {dest_dir}")
+    shutil.copy(src, os.path.join(dest_dir, "compose.ccci.yml"))
+    print(
+        f" ccci-overlay: provided compose.ccci.yml to the {recipe} checkout "
+        "(first-class overlay; base deploy auto-chaos)",
+        flush=True,
+    )
+
+
 def _run_install_steps(hook: tuple[str, str], recipe: str, domain: str) -> None:
     """Run a recipe's custom install-steps hook (install_steps.sh) during the install tier — after
     `abra app new` + env defaults + secret generate, before deploy (Phase 1d DG5). The hook gets the
@@ -238,15 +237,23 @@ def deploy_app(
     secrets: bool = True,
     install_steps_hook: tuple[str, str] | None = None,
     deploy_timeout: int = 900,
+    meta=None,
 ) -> None:
     """Create + configure + deploy an app. Forces LETS_ENCRYPT_ENV='' so traefik serves the
     wildcard cert via the file provider and NEVER attempts ACME (adversary finding A1). Applies any
-    per-recipe EXTRA_ENV (recipe_meta.py) and the custom install-steps hook (Phase 1d) before deploy.
+    per-recipe EXTRA_ENV (recipe_meta.py), the custom install-steps hook (Phase 1d), and the
+    first-class `tests/<recipe>/compose.ccci.yml` overlay (rcust P2a) before deploy.
 
+    `meta` is the recipe's loaded RecipeMeta (EXTRA_ENV); the orchestrator loads once and passes
+    it down. Callers without one in hand (fixtures, warm reconcile) may omit it — it is then
+    loaded here via the single meta.load() path.
+
     `deploy_timeout` is the subprocess timeout for `abra app deploy`. Caller (orchestrator) passes
     `recipe_meta.DEPLOY_TIMEOUT` so heavy recipes (ghost, matrix-synapse, lasuite-meet) can extend
     past the 900s default. abra's INTERNAL TIMEOUT (recipe's TIMEOUT env, default 300s) is set via
     EXTRA_ENV; this is the Python subprocess wrapper's timeout so abra doesn't get SIGKILLed mid-deploy."""
+    if meta is None:
+        meta = meta_mod.load(recipe)
     _record_deploy()
     # Lock BEFORE the app exists: a concurrent run's janitor must never see this app without a
     # held app lock (it would probe it as an orphan and reap an in-flight deploy). Also the
@@ -274,16 +281,18 @@ def deploy_app(
             flush=True,
         )
         chaos = True
-    # A recipe may force a chaos base deploy via recipe_meta CHAOS_BASE_DEPLOY=True when an
-    # install_steps hook adds an untracked compose overlay to the recipe checkout (e.g. discourse's
-    # compose.ccci.yml, provided by install_steps for the pinned base). The untracked file makes
-    # abra's pinned-deploy clean-tree check FATA ('has locally unstaged changes'); chaos skips lint +
-    # the clean-tree gate and deploys the EXPLICITLY-checked-out pinned version (we already ran
-    # recipe_checkout(version) above) — NOT latest. Same mechanism as the lightweight-tag branch.
-    elif _recipe_meta_flag(recipe, "CHAOS_BASE_DEPLOY"):
+    # A first-class cc-ci compose overlay (tests/<recipe>/compose.ccci.yml, copied into the
+    # checkout below — rcust P2a) is an UNTRACKED file in the recipe checkout, which makes
+    # abra's pinned-deploy clean-tree check FATA ('has locally unstaged changes'). Auto-chaos:
+    # chaos skips lint + the clean-tree gate and deploys the EXPLICITLY-checked-out pinned
+    # version (we already ran recipe_checkout(version) above) — NOT latest. Same mechanism as
+    # the lightweight-tag branch. (Replaces the deleted CHAOS_BASE_DEPLOY meta flag — the
+    # overlay's presence IS the signal, killing the R7 implicit coupling.)
+    elif has_ccci_overlay(recipe):
         print(
-            f" deploy_app({recipe}@{version}): CHAOS_BASE_DEPLOY set → chaos base deploy of the "
-            "checked-out pinned version (skips clean-tree/lint; deploys version, not LATEST)",
+            f" deploy_app({recipe}@{version}): compose.ccci.yml overlay present → chaos base "
+            "deploy of the checked-out pinned version (skips clean-tree/lint; deploys version, "
+            "not LATEST)",
             flush=True,
         )
         chaos = True
@@ -293,12 +302,18 @@ def deploy_app(
     # it ourselves is recipe-agnostic and canonical (the run domain IS the app's domain).
     abra.env_set(domain, "DOMAIN", domain)
     abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
-    for k, v in _recipe_extra_env(recipe, domain).items():
+    for k, v in meta_mod.extra_env(meta, meta_mod.hook_ctx(domain, meta)).items():
         abra.env_set(domain, k, v)
     if secrets:
         abra.secret_generate(domain)
     if install_steps_hook:
         _run_install_steps(install_steps_hook, recipe, domain)
+    # First-class cc-ci compose overlay (rcust P2a): if the recipe ships
+    # tests/<recipe>/compose.ccci.yml, copy it into THIS run's recipe checkout (ABRA_DIR-aware)
+    # so the COMPOSE_FILE reference in the recipe's EXTRA_ENV resolves. Untracked, so it persists
+    # across the later PR-head checkout (idempotent when the head ships the same fix). Replaces
+    # the per-recipe install_steps.sh copy boilerplate + CHAOS_BASE_DEPLOY flag (auto-chaos above).
+    provide_ccci_overlay(recipe)
     # HQ1: warm the local image store before the (real, unchanged) abra deploy.
     prepull_images(recipe, domain)
     abra.deploy(domain, chaos=chaos, timeout=deploy_timeout)
@@ -510,7 +525,7 @@ def chaos_redeploy(
     abra.deploy(domain, chaos=True, timeout=deploy_timeout, no_converge_checks=no_converge_checks)
 
 
-def wait_ready_probes(meta: dict, domain: str, timeout: int = 600) -> None:
+def wait_ready_probes(meta, domain: str, timeout: int = 600, op: str | None = None) -> None:
     """Poll a recipe's optional READY_PROBE endpoints until each returns an accepted status, or raise.
 
     A recipe_meta may define `READY_PROBE(domain) -> [{"host":..., "path":..., "ok":(200,)}, ...]`
@@ -527,10 +542,10 @@ def wait_ready_probes(meta: dict, domain: str, timeout: int = 600) -> None:
     must be released by the old task + rebound by the new) the voice server can be down while
     HTTP-200 still passes — and backup-bot then execs into a not-running app container (409). Requiring
     the voice port to be stably listening before proceeding closes that window."""
-    probe_fn = meta.get("READY_PROBE")
+    probe_fn = meta.READY_PROBE
     if not callable(probe_fn):
         return
-    probes = probe_fn(domain) or []
+    probes = probe_fn(meta_mod.hook_ctx(domain, meta, op=op)) or []
     for probe in probes:
         if "tcp_port" in probe:
             host = probe.get("tcp_host", "127.0.0.1")
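For recipe authors, the `wait_ready_probes` change means a `READY_PROBE` hook now receives one context object instead of the bare domain. The sketch below shows what a migrated hook might look like, with several assumptions: `HookCtx` here is a local stand-in carrying the field names the diff mentions (`domain`, `base_url`, `meta`, `deps`, `op`) with guessed types, the probe path and TCP port are purely illustrative, and the domain is a made-up example. The signature check mirrors the positional-parameter comparison that `check_hook_signature` performs at load time.

```python
from __future__ import annotations

import dataclasses
import inspect


@dataclasses.dataclass(frozen=True)
class HookCtx:
    """Local stand-in for the harness HookCtx; field names from the diff, types assumed."""

    domain: str
    base_url: str
    meta: object = None
    deps: dict = dataclasses.field(default_factory=dict)
    op: str | None = None


def READY_PROBE(ctx):
    """Illustrative hook: one HTTP probe always, plus a TCP probe after an upgrade."""
    probes = [{"host": ctx.domain, "path": "/hosting/discovery", "ok": (200,)}]
    if ctx.op == "upgrade":
        # Hypothetical port; tcp_host falls back to 127.0.0.1 in the harness when omitted.
        probes.append({"tcp_host": "127.0.0.1", "tcp_port": 64738})
    return probes


# Same comparison the loader makes: positional parameter names must be exactly ("ctx",).
positional = tuple(
    p.name
    for p in inspect.signature(READY_PROBE).parameters.values()
    if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
)

ctx = HookCtx(
    domain="demo-1a2b3c.ci.example.test",
    base_url="https://demo-1a2b3c.ci.example.test",
    op="upgrade",
)
probes = READY_PROBE(ctx)
print(positional, len(probes))
```

A hook still written as `def READY_PROBE(domain)` would yield `("domain",)` for `positional`, which is the mismatch the loader reports as a `MetaError` rather than letting it fail as a `TypeError` mid-run.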
320
runner/harness/meta.py
Normal file
@@ -0,0 +1,320 @@
+"""Single recipe-meta loader + declarative key registry (recipe-custom restructure P1; spec
+docs/recipe-customization.md §8 R1).
+
+THE one place `tests/<recipe>/recipe_meta.py` is `exec()`d. Every consumer (orchestrator, pytest
+`meta` fixture, deploy env shaping, deps, warm-canonical enrollment, screenshot) reads the ONE
+loaded `RecipeMeta` object instead of re-exec'ing the file and cherry-picking keys — that drift
+(six divergent loaders, spec §4 L1–L6) is what made `SCREENSHOT` an unreachable knob (R2) and let
+key typos silently disable coverage (R6).
+
+Validation (locked decision, recipe-custom-restructure-full-plan.md):
+  - unknown ALL-CAPS top-level name → MetaError (hard error, fails fast at load; the all-recipes
+    unit test catches it at PR time). Underscore-prefixed names (`_FOO`) are recipe-private and
+    exempt; lowercase names (helper functions/imports) are ignored.
+  - type mismatch → MetaError. Callables are accepted ONLY for hook-typed keys.
+
+The KEYS registry is the single source of truth for the key set: it drives validation, the
+RecipeMeta dataclass fields, and the generated reference table in docs/recipe-customization.md §4
+(scripts/gen-meta-docs.py; a unit test asserts the committed table matches).
+"""
+
+from __future__ import annotations
+
+import copy
+import dataclasses
+import difflib
+import inspect
+import json
+import os
+from collections.abc import Callable
+
+ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
+TESTS_DIR = os.path.join(ROOT, "tests")
+
+
+class MetaError(Exception):
+    """A recipe_meta.py failed registry validation (unknown key / type mismatch / callable on a
+    data key). Hard error by design: a typo'd key must fail the run at load, not silently reduce
+    coverage (spec §8 R6 — the worst failure mode for a CI harness)."""
+
+
+@dataclasses.dataclass(frozen=True)
+class Key:
+    """One registered recipe_meta key: name, type tag, default, one-line doc (rendered into the
+    generated reference table), optional extra validator, and a deprecation marker (deprecated
+    keys still load+validate but are scheduled for deletion)."""
+
+    name: str
+    type: str  # "int"|"str"|"tuple[int]"|"bool"|"dict_or_hook"|"hook"|"list[str]"|"dict"
+    default: object
+    doc: str
+    validate: Callable[[object], None] | None = None
+    deprecated: bool = False
+    # Expected positional-parameter names for a callable value (rcust P3 uniform ctx convention).
+    # Enforced at load so a legacy-signature hook (e.g. `def READY_PROBE(domain)`) fails with a
+    # CLEAR MetaError naming the migration — never a silent TypeError mid-run.
+    hook_params: tuple[str, ...] | None = None
+
+
+KEYS: tuple[Key, ...] = (
+    Key(
+        "HEALTH_PATH",
+        "str",
+        "/",
+        "Path probed for serving/health checks (deploy wait + generic `assert_serving`).",
+    ),
+    Key("HEALTH_OK", "tuple[int]", (200, 301, 302), "Acceptable HTTP status codes for health."),
+    Key("DEPLOY_TIMEOUT", "int", 600, "Max seconds to wait for swarm convergence per deploy."),
+    Key("HTTP_TIMEOUT", "int", 300, "Max seconds to wait for HTTP health after convergence."),
+    Key(
+        "BACKUP_CAPABLE",
+        "bool",
+        None,
+        "Override the backup-tier capability auto-detect (compose `backupbot.backup` labels). `False` forces N/A; `True` forces the tier on; unset = auto-detect.",
+    ),
+    Key(
+        "EXPECTED_NA",
+        "dict",
+        None,
+        "Declare an N/A rung intentional: `{rung: reason}`. The cap stands either way; only the report wording changes.",
+    ),
+    Key(
+        "READY_PROBE",
+        "hook",
+        None,
+        "Callable `(ctx) -> [probe, ...]` returning extra readiness probes, run after install AND after upgrade: HTTP `{host, path, ok}` or TCP `{tcp_host, tcp_port, stable}`.",
+        hook_params=("ctx",),
+    ),
+    Key(
+        "UPGRADE_BASE_VERSION",
+        "str",
+        None,
+        "Exact published tag overriding the upgrade tier's base (default: `recipe_versions[-2]`).",
+    ),
+    Key(
+        "BACKUP_VERIFY",
+        "hook",
+        None,
+        "Callable `(ctx) -> bool` post-backup data-capture check; `False` re-runs the backup (truncated-dump race guard), retried up to 3 attempts.",
+        hook_params=("ctx",),
+    ),
+    Key(
+        "UPGRADE_EXTRA_ENV",
+        "dict_or_hook",
+        None,
+        "Extra `.env` keys applied after the PR-head checkout, before the chaos redeploy (env that exists only at head). Dict, or callable `(ctx) -> dict`.",
+        hook_params=("ctx",),
+    ),
+    Key(
+        "EXTRA_ENV",
+        "dict_or_hook",
+        {},
+        "Extra `.env` keys applied at EVERY deploy (base install AND upgrade old-app). Dict, or callable `(ctx) -> dict` deriving values from the per-run domain (`ctx.domain`).",
+        hook_params=("ctx",),
+    ),
+    Key(
+        "DEPS",
+        "list[str]",
+        [],
+        'Dep recipes deployed/provisioned alongside (e.g. `["keycloak"]`); creds land in `$CCCI_DEPS_FILE`.',
+    ),
+    Key(
+        "WARM_CANONICAL",
+        "bool",
+        False,
+        "Enroll the recipe in the warm/canonical app system (docs/warm.md): green cold runs on LATEST advance the canonical snapshot.",
+    ),
+    Key(
+        "SCREENSHOT",
+        "hook",
+        None,
+        "Callable `(page, ctx)` driving Playwright to a safe, credential-free post-login view for the results-card screenshot (default: landing page).",
+        hook_params=("page", "ctx"),
+    ),
+    # (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL and SKIP_GENERIC were deleted in restructure P2:
+    # compose.ccci.yml is first-class + auto-chaos; install-time deps wiring is the only mode;
+    # the generic floor is suppressible only via the dev-only CCCI_SKIP_GENERIC* env form.)
+)
+
+_REGISTRY: dict[str, Key] = {k.name: k for k in KEYS}
+
+# The one validated, attribute-access view of a recipe's customization. Generated from KEYS so the
+# field set can never drift from the registry (frozen: consumers share one immutable object).
+RecipeMeta = dataclasses.make_dataclass(
+    "RecipeMeta",
+    [(k.name, object, dataclasses.field(default=None)) for k in KEYS],
+    frozen=True,
+)
+RecipeMeta.__doc__ = (
+    "Validated per-recipe customization (one field per registered key; attribute access). "
+    "Built ONLY by meta.load()."
+)
+
+
+def meta_path(recipe: str, tests_dir: str | None = None) -> str:
+    """Canonical path of a recipe's meta file (pure)."""
+    return os.path.join(tests_dir or TESTS_DIR, recipe, "recipe_meta.py")
+
+
+def check_hook_signature(fn, expected: tuple[str, ...], where: str) -> None:
+    """Enforce the uniform ctx hook convention (rcust P3): a hook callable's positional parameters
+    must be exactly `expected` (e.g. ("ctx",) or ("page", "ctx")). A legacy-signature hook (the
+    pre-restructure `(domain)` / `(domain, meta)` / `(page, domain, meta)` forms) raises a CLEAR
+    MetaError naming the migration — never a silent TypeError mid-run."""
+    try:
+        params = [
+            p.name
+            for p in inspect.signature(fn).parameters.values()
+            if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
+        ]
+    except (TypeError, ValueError):  # builtins/odd callables — let the call site surface it
+        return
+    if tuple(params) != expected:
+        raise MetaError(
+            f"{where}: hook signature is ({', '.join(params)}) — the recipe-customization "
+            f"restructure (P3) changed ALL recipe hook signatures to ({', '.join(expected)}); "
+            f"read fields off the HookCtx (ctx.domain, ctx.base_url, ctx.meta, ctx.deps, ctx.op). "
+            f"See docs/recipe-customization.md §5."
+        )
+
+
+def _coerce(key: Key, value: object, path: str) -> object:
+    """Validate `value` against `key`'s declared type; normalize containers (tuple[int]/list[str]).
+    Raises MetaError on mismatch — including a callable supplied for a data-typed key."""
+    t = key.type
+    if callable(value) and t not in ("hook", "dict_or_hook"):
+        raise MetaError(
+            f"{path}: {key.name} is a data key (type {t}) — callables are accepted only for "
+            f"hook-typed keys"
+        )
+    if t == "int":
+        if isinstance(value, int) and not isinstance(value, bool):
+            return value
+    elif t == "str":
+        if isinstance(value, str):
+            return value
+    elif t == "bool":
+        if isinstance(value, bool):
+            return value
+    elif t == "tuple[int]":
+        if isinstance(value, tuple | list) and all(
+            isinstance(x, int) and not isinstance(x, bool) for x in value
+        ):
+            return tuple(value)
+    elif t == "list[str]":
+        if isinstance(value, tuple | list) and all(isinstance(x, str) for x in value):
+            return list(value)
+    elif t == "dict":
+        if isinstance(value, dict):
+            return value
+    elif (
+        t == "hook"
+        and callable(value)
+        or t == "dict_or_hook"
+        and (isinstance(value, dict) or callable(value))
+    ):
+        return value
+    raise MetaError(f"{path}: {key.name} must be {t}, got {type(value).__name__} ({value!r})")
+
+
def load(recipe: str, tests_dir: str | None = None):
    """Load + validate a recipe's customization -> RecipeMeta. THE only exec() of recipe_meta.py.

    Missing file -> all registry defaults (the zero-config baseline, spec §2). Unknown
    non-underscore ALL-CAPS top-level name or type mismatch -> MetaError (hard error).
    `tests_dir` overrides the recipe-meta root (unit tests / fixtures)."""
    path = meta_path(recipe, tests_dir)
    values = {k.name: copy.copy(k.default) for k in KEYS}
    if os.path.exists(path):
        ns: dict = {}
        with open(path) as fh:
            exec(compile(fh.read(), path, "exec"), ns)  # noqa: S102 (trusted, in-repo)
        for name in sorted(ns):
            if name.startswith("_") or not name.isupper():
                continue  # _FOO = recipe-private (exempt); lowercase = helpers/imports (ignored)
            key = _REGISTRY.get(name)
            if key is None:
                near = difflib.get_close_matches(name, _REGISTRY, n=1)
                hint = f" — did you mean {near[0]!r}?" if near else ""
                raise MetaError(
                    f"{path}: unknown recipe_meta key {name!r}{hint}. Registered keys: "
                    f"{', '.join(sorted(_REGISTRY))}. Recipe-private constants must be "
                    f"underscore-prefixed (e.g. _{name})."
                )
            values[name] = _coerce(key, ns[name], path)
            if key.hook_params and callable(values[name]):
                check_hook_signature(values[name], key.hook_params, f"{path}: {name}")
            if key.validate:
                key.validate(values[name])
    return RecipeMeta(**values)


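The key-discovery rule in `load` (skip underscore-prefixed and non-uppercase names, then suggest the nearest registered key for an unknown one) can be tried in isolation. The sketch below is illustrative only: `REGISTERED` is a hypothetical three-key subset of the registry, and `public_keys` / `unknown_key_hint` are made-up helper names. Only `difflib.get_close_matches` is the same standard-library call the loader uses.

```python
import difflib

# Hypothetical subset of the registry, for illustration only.
REGISTERED = ["DEPLOY_TIMEOUT", "HEALTH_OK", "HEALTH_PATH"]


def public_keys(ns: dict) -> list:
    """Names the loader would validate: ALL-CAPS and not underscore-prefixed."""
    return sorted(n for n in ns if not n.startswith("_") and n.isupper())


def unknown_key_hint(name: str) -> str:
    """A 'did you mean' suffix for an unregistered key, or '' if nothing is close."""
    near = difflib.get_close_matches(name, REGISTERED, n=1)
    return f" (did you mean {near[0]!r}?)" if near else ""
```

With a namespace such as `{"HEALTH_PATH": "/", "_PRIVATE": 1, "helper": 2}`, only `HEALTH_PATH` would be checked against the registry. A misspelling such as `HEALTH_PAHT` would get a suggestion, and a name with no close match gets none.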
def as_dict(meta) -> dict:
    """RecipeMeta -> {key: value} (every registered key, defaults included)."""
    return dataclasses.asdict(meta)


def non_default(meta) -> dict:
    """The keys a recipe explicitly customized: {key: value} where value differs from the registry
    default. Hooks compare by identity-vs-None (a set hook is always non-default). Feeds the run's
    customization manifest (P5)."""
    out = {}
    for k in KEYS:
        v = getattr(meta, k.name)
        if v != k.default:
            out[k.name] = v
    return out


@dataclasses.dataclass(frozen=True)
class HookCtx:
    """The single argument every recipe hook receives (rcust P3 uniform ctx convention):
    `EXTRA_ENV(ctx)`, `UPGRADE_EXTRA_ENV(ctx)`, `READY_PROBE(ctx)`, `BACKUP_VERIFY(ctx)`,
    `SCREENSHOT(page, ctx)`, ops.py `pre_<op>(ctx)`."""

    domain: str  # the app's per-run domain
    base_url: str  # https://<domain>
    meta: object  # the recipe's full RecipeMeta
    deps: dict | None  # provisioned dep creds ({dep_recipe: entry}) or None if absent/empty
    op: str | None  # current lifecycle op (install|upgrade|backup|restore) or None


def _run_deps() -> dict | None:
    """The current run's provisioned dep creds from $CCCI_DEPS_FILE (either shape), or None.
    Read directly (not via harness.deps) to keep meta.py import-cycle-free."""
    path = os.environ.get("CCCI_DEPS_FILE")
    if not path or not os.path.exists(path):
        return None
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, ValueError):
        return None
    if isinstance(data, dict):
        return data or None
    if isinstance(data, list):
        out = {e["recipe"]: e for e in data if isinstance(e, dict) and e.get("recipe")}
        return out or None
    return None


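`_run_deps` accepts "either shape" of the deps file: a recipe-to-entry dict, or a list of entries that each carry a `recipe` field. The sketch below isolates that normalization step so it can be run on sample data; `normalize_deps` is a hypothetical name, and the sample entries in the note are invented for illustration.

```python
from __future__ import annotations


def normalize_deps(data: object) -> dict | None:
    """Collapse both deps-file shapes to {recipe: entry}, or None when empty/invalid."""
    if isinstance(data, dict):
        return data or None
    if isinstance(data, list):
        # Entries that are not dicts, or that lack a recipe name, are dropped.
        out = {e["recipe"]: e for e in data if isinstance(e, dict) and e.get("recipe")}
        return out or None
    return None
```

For example, `[{"recipe": "keycloak", "domain": "kc.example"}, "junk"]` would normalize to a one-entry dict keyed by `keycloak`, while `{}`, `[]`, and a bare string all yield `None`.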
def hook_ctx(domain: str, meta, *, op: str | None = None) -> HookCtx:
    """Build the HookCtx for a hook call site. Dep creds are picked up from the run's
    $CCCI_DEPS_FILE when present (None otherwise)."""
    return HookCtx(domain=domain, base_url=f"https://{domain}", meta=meta, deps=_run_deps(), op=op)


def _env_map(value, ctx: HookCtx) -> dict[str, str]:
    if callable(value):
        value = value(ctx)
    return {str(k): str(v) for k, v in (value or {}).items()}


def extra_env(meta, ctx: HookCtx) -> dict[str, str]:
    """Resolve EXTRA_ENV (dict or callable(ctx)->dict) to the concrete per-run env map."""
    return _env_map(meta.EXTRA_ENV, ctx)


def upgrade_extra_env(meta, ctx: HookCtx) -> dict[str, str]:
    """Resolve UPGRADE_EXTRA_ENV (dict or callable(ctx)->dict) to the concrete env map."""
    return _env_map(meta.UPGRADE_EXTRA_ENV, ctx)
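The `dict_or_hook` resolution in `_env_map` is easiest to see with both input forms side by side. This is a minimal sketch under stated assumptions: `Ctx` is a hypothetical one-field stand-in for `HookCtx`, and `env_map` mirrors the logic of `_env_map` without depending on the harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ctx:
    """Hypothetical stand-in for HookCtx; only the domain field is modelled."""

    domain: str


def env_map(value, ctx: Ctx) -> dict:
    """Resolve a static dict or a callable(ctx) -> dict to a str -> str map."""
    if callable(value):
        value = value(ctx)
    # None (key unset) resolves to an empty map; keys and values are stringified.
    return {str(k): str(v) for k, v in (value or {}).items()}
```

A static value such as `{"PORT": 8080}` comes back with the port as a string, and a hook such as `lambda ctx: {"HOST": ctx.domain}` is called once with the context before the same stringification is applied.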
@ -8,7 +8,7 @@ Secret-safety (R7, the cardinal screenshot guardrail): the screenshot step must
that displays generated credentials (an install wizard showing the initial admin password, a secrets
page, etc.). The DEFAULT capture is the app's **landing page** (a login form shows fields, not the
password) — safe for every recipe. A recipe that needs a post-login view opts in via a recipe-meta
`SCREENSHOT` hook: a callable `screenshot(page, domain, meta) -> None` that drives Playwright to a
`SCREENSHOT` hook: a callable `SCREENSHOT(page, ctx) -> None` that drives Playwright to a
safe, credential-free view and is responsible for not landing on a secrets page. The harness never
auto-fills a wizard.

@ -21,6 +21,7 @@ from __future__ import annotations
import os

from . import browser as harness_browser
from . import meta as meta_mod

# Default viewport for the captured screenshot — a desktop-ish frame that crops well into the card.
VIEWPORT = {"width": 1280, "height": 800}
@ -33,12 +34,19 @@ def screenshot_path(run_artifact_dir: str) -> str:
    return os.path.join(run_artifact_dir, "screenshot.png")


def _load_screenshot_hook(recipe_meta: dict | None):
def _load_screenshot_hook(recipe_meta):
    """Return the recipe's optional SCREENSHOT hook (a callable) if it declared one, else None.
    The hook drives Playwright to a safe post-login view; default is the landing page."""
    if not recipe_meta:
    The hook drives Playwright to a safe post-login view; default is the landing page.

    `recipe_meta` is the loaded RecipeMeta (rcust P1 — the single loader actually delivers
    SCREENSHOT now; under the old L1 allowlist the key never arrived, spec §8 R2). A plain dict
    is still accepted for direct/manual callers."""
    if recipe_meta is None:
        return None
    hook = recipe_meta.get("SCREENSHOT")
    if isinstance(recipe_meta, dict):
        hook = recipe_meta.get("SCREENSHOT")
    else:
        hook = getattr(recipe_meta, "SCREENSHOT", None)
    return hook if callable(hook) else None


@ -67,8 +75,9 @@ def capture(domain: str, out_path: str, *, recipe_meta: dict | None = None) -> s
        if hook is not None:
            # Recipe-specific safe view (post-login etc.). The hook owns navigation +
            # the no-secret-page guarantee; it should call page.screenshot itself, but if
            # it doesn't, we still snap the resulting page below.
            hook(page, domain, recipe_meta)
            # it doesn't, we still snap the resulting page below. SCREENSHOT(page, ctx) —
            # the uniform ctx convention (rcust P3).
            hook(page, meta_mod.hook_ctx(domain, recipe_meta))
            if not os.path.exists(out_path):
                page.screenshot(path=out_path, full_page=False)
        else:

@ -58,6 +58,9 @@ from harness import ( # noqa: E402
from harness import (  # noqa: E402
    deps as deps_mod,
)
from harness import (  # noqa: E402
    meta as meta_mod,
)
from harness import (  # noqa: E402
    results as results_mod,
)
@ -70,7 +73,7 @@ ALL_STAGES = ("install", "upgrade", "backup", "restore", "custom")

def sso_dep_unverified(declared, deps_ready: bool, requires_deps_skipped: int) -> bool:
    """F2-11 gate predicate (pure, unit-tested). True when a recipe declares DEPS but its
    setup_custom_tests failed (deps not ready) AND that caused ≥1 `requires_deps` (SSO/OIDC) test
    dep provisioning failed (deps not ready) AND that caused ≥1 `requires_deps` (SSO/OIDC) test
    to SKIP. In that case the recipe's characteristic SSO claim was NOT verified, so the run must
    NOT report GREEN — even though a skip-only pytest file exits 0 and leaves every tier 'pass'.
    Generic-tier failure-isolation is preserved (those results stand); only the green SIGNAL is
@ -247,52 +250,29 @@ def snapshot_recipe_tests(recipe: str) -> str | None:
    return dst


def _load_meta(recipe: str) -> dict:
    """Mirror tests/conftest._recipe_meta so the orchestrator's deploy/wait uses the same per-recipe
    config the tiers see (timeouts, health path/codes)."""
    meta = {
        "HEALTH_PATH": "/",
        "HEALTH_OK": (200, 301, 302),
        "DEPLOY_TIMEOUT": 600,
        "HTTP_TIMEOUT": 300,
    }
    path = os.path.join(ROOT, "tests", recipe, "recipe_meta.py")
    if os.path.exists(path):
        ns: dict = {}
        with open(path) as fh:
            exec(compile(fh.read(), path, "exec"), ns)  # noqa: S102 (trusted, in-repo)
        for k in list(meta) + [
            "BACKUP_CAPABLE",
            "SKIP_GENERIC",
            "EXPECTED_NA",
            "OIDC_AT_INSTALL",
            "READY_PROBE",
            "UPGRADE_BASE_VERSION",
            "BACKUP_VERIFY",
            "UPGRADE_EXTRA_ENV",
        ]:
            if k in ns:
                meta[k] = ns[k]
    return meta


def _tier_env(domain: str) -> dict:
    return dict(os.environ, CCCI_APP_DOMAIN=domain, CCCI_BASE_URL=f"https://{domain}")


def _skip_generic(op: str, meta: dict) -> bool:
def skip_generic_env_overrides() -> list[str]:
    """Active CCCI_SKIP_GENERIC* env overrides (rcust P2c: the meta key is deleted; the env form
    is a documented LOCAL-DEV-ONLY escape hatch). Surfaced loudly when set in a CI (drone) run —
    it reduces generic-floor coverage and must never silently ride a CI verdict."""
    return sorted(
        k for k in os.environ if k.startswith("CCCI_SKIP_GENERIC") and _truthy(os.environ.get(k))
    )


def _skip_generic(op: str) -> bool:
    """Whether the generic assertion for `op` is opted out (Phase 1e HC3). Default: run (additive).
    Opt-out, any of: env CCCI_SKIP_GENERIC (all ops), env CCCI_SKIP_GENERIC_<OP>, or the recipe's
    declarative recipe_meta.SKIP_GENERIC list (op name, or "all"/"*")."""
    Opt-out via env only (dev-only escape hatch, P2c): CCCI_SKIP_GENERIC (all ops) or
    CCCI_SKIP_GENERIC_<OP>. The recipe_meta SKIP_GENERIC key is deleted (zero users)."""
    if _truthy(os.environ.get("CCCI_SKIP_GENERIC")):
        return True
    if _truthy(os.environ.get(f"CCCI_SKIP_GENERIC_{op.upper()}")):
        return True
    sg = [str(s).lower() for s in (meta.get("SKIP_GENERIC") or [])]
    return "all" in sg or "*" in sg or op in sg
    return _truthy(os.environ.get(f"CCCI_SKIP_GENERIC_{op.upper()}"))


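After this change the generic-floor opt-out is env-only: either the global `CCCI_SKIP_GENERIC` or the per-op `CCCI_SKIP_GENERIC_<OP>`. The sketch below shows that lookup against an explicit mapping so it can be exercised without touching the real environment. The definition of `_truthy` is not shown in this diff, so the `truthy` helper here is an assumption about which values count as "set", and `skip_generic` is a hypothetical name.

```python
import os


def truthy(value) -> bool:
    """ASSUMPTION: the accepted spellings are a guess; the harness's _truthy is not shown."""
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def skip_generic(op: str, env=None) -> bool:
    """True when the global or the per-op override is set in `env`."""
    env = os.environ if env is None else env
    if truthy(env.get("CCCI_SKIP_GENERIC")):
        return True
    return truthy(env.get(f"CCCI_SKIP_GENERIC_{op.upper()}"))
```

Passing a plain dict as `env` keeps the check testable: `{"CCCI_SKIP_GENERIC_BACKUP": "1"}` would opt out `backup` only, while the bare `CCCI_SKIP_GENERIC` opts out every op.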
def _run_pre_hook(recipe: str, op: str, repo_local: str | None, domain: str, meta: dict) -> None:
def _run_pre_hook(recipe: str, op: str, repo_local: str | None, domain: str, meta) -> None:
    """Run the optional pre-op seed hook (recipe ops.py `pre_<op>`) BEFORE the harness performs the
    op (HC3 op/assertion split): overlays seed data-continuity markers / the backup→restore mutation
    here, then assert post-op in test_<op>.py. cc-ci's ops.py is trusted; a repo-local ops.py is
@ -309,7 +289,11 @@ def _run_pre_hook(recipe: str, op: str, repo_local: str | None, domain: str, met
        mod = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(mod)
        print(f" pre-op seed ({source}): {os.path.relpath(path, ROOT)}::pre_{op}", flush=True)
        getattr(mod, f"pre_{op}")(domain, meta)
        fn = getattr(mod, f"pre_{op}")
        # Uniform ctx convention (rcust P3): pre_<op>(ctx). A legacy (domain, meta) hook fails
        # HERE with a clear migration message, not a TypeError mid-call.
        meta_mod.check_hook_signature(fn, ("ctx",), f"{os.path.relpath(path, ROOT)}::pre_{op}")
        fn(meta_mod.hook_ctx(domain, meta, op=op))
    finally:
        if d in sys.path:
            sys.path.remove(d)
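The comment above says a legacy `(domain, meta)` hook should fail at the signature check with a migration message rather than with a `TypeError` mid-call. The body of `check_hook_signature` is not part of this diff, so the following is only a sketch of how such a pre-flight check could be written with `inspect.signature`; the function name `check_positional_params`, the error type, and the message wording are all assumptions.

```python
import inspect


def check_positional_params(fn, expected: tuple, where: str) -> None:
    """Raise early if `fn` does not take exactly the expected positional parameters."""
    kinds = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
    got = tuple(
        p.name for p in inspect.signature(fn).parameters.values() if p.kind in kinds
    )
    if len(got) != len(expected):
        # Hypothetical message; the real harness wording is not shown in the diff.
        raise TypeError(
            f"{where}: expected parameters {expected}, got {got}; "
            f"migrate the hook to the uniform ({', '.join(expected)}) form"
        )
```

Under these assumptions a hook declared as `def pre_backup(ctx)` passes silently, while `def pre_backup(domain, meta)` is rejected before the harness calls it.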
@ -322,7 +306,7 @@ def _perform_op(
    head_ref: str | None,
    op_state: dict,
    deploy_timeout: int = 900,
    meta: dict | None = None,
    meta=None,
) -> None:
    """Perform the single mutating op ONCE (the harness owns the op, HC3). install has no op. Records
    what the assertions need (pre-upgrade identity, backup snapshot_id) into op_state. None of these
@ -345,9 +329,10 @@ def _perform_op(
    # verify fails we re-run the WHOLE backup (fresh restic snapshot) with a re-stabilised DB, up to
    # 3 attempts. Recipes without BACKUP_VERIFY are unaffected (single backup, as before).
    snap = generic.perform_backup(domain)
    verify = meta.get("BACKUP_VERIFY") if meta else None
    verify = meta.BACKUP_VERIFY if meta else None
    verify_ctx = meta_mod.hook_ctx(domain, meta, op="backup") if meta else None
    attempt = 1
    while callable(verify) and not verify(domain) and attempt < 3:
    while callable(verify) and not verify(verify_ctx) and attempt < 3:
        attempt += 1
        print(
            f" backup-verify FAILED (attempt {attempt - 1}/3) — backup did not capture the "
@ -355,7 +340,7 @@ def _perform_op(
            flush=True,
        )
        snap = generic.perform_backup(domain)
    if callable(verify) and not verify(domain):
    if callable(verify) and not verify(verify_ctx):
        print(
            f" !! backup-verify still FAILED after {attempt} attempts — backup is incomplete",
            flush=True,
@ -371,7 +356,7 @@ def run_lifecycle_tier(
    op: str,
    repo_local: str | None,
    domain: str,
    meta: dict,
    meta,
    head_ref: str | None,
    op_state: dict,
    records: list[dict] | None = None,
@ -386,7 +371,7 @@ def run_lifecycle_tier(
    a {tier,source,file,rc,junit} record appended, so the run can assemble per-stage/per-test
    results.json + the level afterwards. Purely additive — does not change the verdict."""
    overlay = discovery.resolve_overlay_op(recipe, op, repo_local)
    skip_gen = _skip_generic(op, meta)
    skip_gen = _skip_generic(op)
    files: list[tuple[str, str]] = []
    if not skip_gen:
        files.append(discovery.generic_op(op))
@ -411,7 +396,7 @@ def run_lifecycle_tier(
            recipe,
            head_ref,
            op_state,
            deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", 900)),
            deploy_timeout=int(meta.DEPLOY_TIMEOUT),
            meta=meta,
        )
        with open(os.environ["CCCI_OP_STATE_FILE"], "w") as f:
@ -449,7 +434,7 @@ def run_lifecycle_tier(
def _enrich_deps_with_sso(parent_recipe: str, parent_domain: str, deps_list) -> dict[str, dict]:
    """For each dep, set up a fresh realm/client + test user via the harness's provider-specific
    setup function, then return a recipe→entry dict carrying domain + admin + realm/client/user
    info — the shape the `setup_custom_tests.sh` hook (and dependent tests) read.
    info — the shape the `install_steps.sh` hook (and dependent tests) read.

    Provider routing: today only `keycloak` is supported. authentik will need a parallel
    `setup_authentik_realm` when an authentik-dep recipe enrolls (DEFERRED.md #9).
@ -463,7 +448,7 @@ def _enrich_deps_with_sso(parent_recipe: str, parent_domain: str, deps_list) ->
        if not dep_recipe or not dep_domain:
            continue
        if dep_recipe != "keycloak":
            # Provider not yet supported — record bare entry; setup_custom_tests.sh / tests will
            # Provider not yet supported — record bare entry; install_steps.sh / tests will
            # raise if they need realm/client info they don't see.
            out[dep_recipe] = entry
            continue
@ -507,12 +492,10 @@ def _provision_deps(

    Splits deps into live-warm (shared provider at a stable domain + a per-run realm) vs cold
    (co-deployed per run), provisions each dep's SSO realm/client/user, and persists the enriched
    dict the `setup_custom_tests.sh`/`install_steps.sh` hooks + dependent tests read. Raises on any
    failure (the caller marks deps-not-ready). Used by BOTH wiring paths:
      - post-deploy (legacy): provision AFTER generic tiers, then `setup_custom_tests.sh` does an
        in-place OIDC redeploy.
      - install-time (`OIDC_AT_INSTALL`, Q3.2a): provision BEFORE the single deploy so the
        install-tier `install_steps.sh` hook wires OIDC env into that one deploy — no reconverge.
    dict the `install_steps.sh` hooks + dependent tests read. Raises on any failure (the caller
    marks deps-not-ready). Install-time wiring is the ONLY mode (rcust P2b): provision BEFORE the
    single deploy so the install-tier `install_steps.sh` hook wires OIDC env into that one deploy —
    no reconverge, no post-deploy `setup_custom_tests.sh` machinery.
    """
    warm_deps, cold_deps = [], []
    for d in declared:
@ -523,7 +506,7 @@ def _provision_deps(
            if wd:
                print(f" dep: {d} warm provider {wd} not up — cold fallback", flush=True)
            cold_deps.append(d)
    dep_metas = {d: _load_meta(d) for d in cold_deps}
    dep_metas = {d: meta_mod.load(d) for d in cold_deps}
    deps_list = (
        deps_mod.deploy_deps(recipe, os.environ.get("PR", "0"), ref, cold_deps, meta_for=dep_metas)
        if cold_deps
@ -541,32 +524,6 @@ def _provision_deps(
    return deps_state


def _run_setup_custom_tests_hook(recipe: str, domain: str, deps_file: str) -> None:
    """Run `tests/<recipe>/setup_custom_tests.sh` if present (operator-2026-05-28 SSO-dep plan
    §3.2). The hook reads `$CCCI_DEPS_FILE`, sets OIDC env via `abra app config set` + secret
    insert, and triggers an in-place `abra app deploy --force --chaos`. Failure here propagates
    to mark deps-not-ready (caught in main())."""
    path = os.path.join(ROOT, "tests", recipe, "setup_custom_tests.sh")
    if not os.path.isfile(path):
        # No hook = recipe doesn't need post-deps wiring; deps are deployed + creds available
        # via deps_apps fixture as-is.
        print(
            f" setup_custom_tests: no hook at {os.path.relpath(path, ROOT)} (deps creds ready in $CCCI_DEPS_FILE)",
            flush=True,
        )
        return
    print(f" setup_custom_tests hook: {os.path.relpath(path, ROOT)}", flush=True)
    rc = subprocess.run(
        ["bash", path],
        check=False,
        env=dict(os.environ, CCCI_APP_DOMAIN=domain, CCCI_RECIPE=recipe, CCCI_DEPS_FILE=deps_file),
    )
    if rc.returncode != 0:
        raise RuntimeError(
            f"setup_custom_tests.sh exited {rc.returncode} (deps env not wired into parent)"
        )


def run_custom(
    recipe: str,
    repo_local: str | None,
@ -609,7 +566,7 @@ def _wait_undeployed(domain: str, timeout: int = 120) -> None:


def run_quick(
    recipe: str, ref: str | None, head_ref: str | None, repo_local: str | None, meta: dict
    recipe: str, ref: str | None, head_ref: str | None, repo_local: str | None, meta
) -> int:
    """WC4 `--quick` opt-in fast lane (plan §2). Reattach the data-warm canonical (known-good volume)
    → upgrade IN PLACE to the PR head (chaos) → assert generic UPGRADE (reconverge+moved+serving) +
@ -645,7 +602,7 @@ def run_quick(

    op_state: dict = {}
    results: dict[str, str] = {}
    declared = deps_mod.declared_deps(recipe)
    declared = list(meta.DEPS)
    deps_state: dict = {}
    deps_ready = True
    deps_not_ready_reason = ""
@ -657,28 +614,32 @@ def run_quick(
    try:
        # 1) reattach the canonical (warm boot at the known-good version + retained volume)
        try:
            canonical.deploy_canonical(recipe, timeout=int(meta.get("DEPLOY_TIMEOUT", 900)))
            canonical.deploy_canonical(recipe, timeout=int(meta.DEPLOY_TIMEOUT))
            lifecycle.wait_healthy(
                domain,
                ok_codes=tuple(meta["HEALTH_OK"]),
                path=meta["HEALTH_PATH"],
                deploy_timeout=meta["DEPLOY_TIMEOUT"],
                http_timeout=meta["HTTP_TIMEOUT"],
                ok_codes=tuple(meta.HEALTH_OK),
                path=meta.HEALTH_PATH,
                deploy_timeout=meta.DEPLOY_TIMEOUT,
                http_timeout=meta.HTTP_TIMEOUT,
            )
            warm_ok = True
        except Exception as e:  # noqa: BLE001
            print(f"!! canonical reattach/readiness failed: {_scrub(str(e))}", flush=True)

        if warm_ok:
            # 2) deps (warm keycloak + per-run realm) — mirrors main()'s warm/cold split
            # 2) deps (warm keycloak + per-run realm) — mirrors main()'s warm/cold split. NB
            # (rcust P2b): deps are provisioned (realm/creds in $CCCI_DEPS_FILE) but quick mode
            # cannot do install-time OIDC env wiring — the canonical app pre-exists its per-run
            # realm. No quick-enrolled recipe declares DEPS today; if one ever does, its
            # requires_deps tests will exercise creds-only flows or skip (F2-11 keeps the signal).
            if declared:
                print(f"\n===== setup_custom_tests (quick): deps {declared} =====", flush=True)
                print(f"\n===== deps (quick): {declared} =====", flush=True)
                try:
                    warm_deps, cold_deps = [], []
                    for d in declared:
                        wd = warm.warm_domain(d)
                        (warm_deps if (wd and warm.is_warm_up(d, wd)) else cold_deps).append(d)
                    dep_metas = {d: _load_meta(d) for d in cold_deps}
                    dep_metas = {d: meta_mod.load(d) for d in cold_deps}
                    deps_list = (
                        deps_mod.deploy_deps(
                            recipe, os.environ.get("PR", "0"), ref, cold_deps, meta_for=dep_metas
@ -693,12 +654,11 @@ def run_quick(
                        print(f" dep: using live-warm {d} @ {wd} (per-run realm)", flush=True)
                    deps_state = _enrich_deps_with_sso(recipe, domain, deps_list)
                    deps_mod.write_run_state(deps_state)
                    _run_setup_custom_tests_hook(recipe, domain, depsfile)
                except Exception as e:  # noqa: BLE001
                    deps_ready = False
                    deps_not_ready_reason = _scrub(str(e))[:300]
                    print(
                        f"!! setup_custom_tests failed (deps-not-ready): {deps_not_ready_reason}",
                        f"!! dep provisioning failed (deps-not-ready): {deps_not_ready_reason}",
                        flush=True,
                    )

@ -813,7 +773,7 @@ def run_quick(
            overall = 1
        if sso_unverified:
            print(
                f"!! DEPS={declared} but setup_custom_tests failed and {requires_deps_skipped} "
                f"!! DEPS={declared} but dep provisioning failed and {requires_deps_skipped} "
                "requires_deps SKIPPED — SSO NOT verified (F2-11)",
                file=sys.stderr,
            )
@ -848,7 +808,7 @@ def promote_canonical(recipe: str, head_ref: str | None) -> None:
    if not latest:
        print(f"WC5 promote: no version tags for {recipe} — skip", flush=True)
        return
    meta = _load_meta(recipe)
    meta = meta_mod.load(recipe)
    # The cold run's deploy-count was already asserted + the countfile removed; don't perturb it.
    os.environ.pop("CCCI_DEPLOY_COUNT_FILE", None)
    print(
@ -860,14 +820,15 @@ def promote_canonical(recipe: str, head_ref: str | None) -> None:
            domain,
            version=latest,
            secrets=True,
            deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", 900)),
            deploy_timeout=int(meta.DEPLOY_TIMEOUT),
            meta=meta,
        )
        lifecycle.wait_healthy(
            domain,
            ok_codes=tuple(meta["HEALTH_OK"]),
            path=meta["HEALTH_PATH"],
            deploy_timeout=meta["DEPLOY_TIMEOUT"],
            http_timeout=meta["HTTP_TIMEOUT"],
            ok_codes=tuple(meta.HEALTH_OK),
            path=meta.HEALTH_PATH,
            deploy_timeout=meta.DEPLOY_TIMEOUT,
            http_timeout=meta.HTTP_TIMEOUT,
        )
        abra.undeploy(domain)
        _wait_undeployed(domain)
@ -896,6 +857,17 @@ def main() -> int:
    print(
        f"== cc-ci run: recipe={recipe} ref={ref} pr={os.environ.get('PR', '0')} stages={sorted(stages)}"
    )
    # P2c: the CCCI_SKIP_GENERIC* env escape hatch is LOCAL-DEV-ONLY. If it rides a CI (drone)
    # run, shout — generic-floor coverage is reduced and the verdict must not look routine.
    for ov in skip_generic_env_overrides():
        if os.environ.get("DRONE"):
            print(
                f"!! {ov}=1 — dev-only generic-floor override ACTIVE IN A CI RUN; generic "
                "assertions are suppressed for the affected op(s). This must never gate a merge.",
                flush=True,
            )
        else:
            print(f"== {ov}=1 (dev-only generic-floor override active)", flush=True)
    # Concurrent-run safety is structural: this run's recipe trees live in its own ABRA_DIR
    # (exported here, before ANY abra call), so no recipe-tree lock exists; same-DOMAIN runs
    # serialise on the app-domain flock taken in deploy_app (see docs/concurrency.md).
@ -906,7 +878,7 @@ def main() -> int:
    # HEAD (the catalogue current) for a non-PR `!testme`. Captured before any version-tag checkout.
    head_ref = ref or lifecycle.recipe_head_commit(recipe)
    repo_local = snapshot_recipe_tests(recipe)
    meta = _load_meta(recipe)
    meta = meta_mod.load(recipe)

    # WC4/WC7: opt-in `--quick` fast lane. Requires an existing data-warm canonical; if none, fall
    # back cleanly to the full COLD run below so the PR is still tested (DECISIONS Phase-2w).
@ -929,9 +901,7 @@ def main() -> int:
    # override must be an exact published version tag (deployed as a pinned base). (Adversary §7.1.)
    want_upgrade = "upgrade" in stages
    prev = (
        (meta.get("UPGRADE_BASE_VERSION") or lifecycle.previous_version(recipe))
        if want_upgrade
        else None
        (meta.UPGRADE_BASE_VERSION or lifecycle.previous_version(recipe)) if want_upgrade else None
    )
    base = prev or target
    backup_cap = generic.backup_capable(recipe, meta)
@ -960,10 +930,8 @@ def main() -> int:
    os.environ["CCCI_OP_STATE_FILE"] = statefile
    op_state: dict = {}

    # Run-scoped dep state (Phase 2 Q2.3, refined per operator-2026-05-28 SSO-dep plan §1):
    # deps now deploy AFTER generic tiers (between RESTORE and CUSTOM) so a failed dep deploy
    # cannot break the generic-tier signal. The `setup_custom_tests` step deploys each dep + runs
    # `tests/<recipe>/setup_custom_tests.sh` to wire OIDC env via in-place redeploy.
    # Run-scoped dep state (Phase 2 Q2.3; install-time-only since rcust P2b): deps are provisioned
    # BEFORE the single deploy so install_steps.sh wires OIDC env into that one deploy.
    # `$CCCI_DEPS_FILE` is written with the full creds dict the hook script needs (jq-readable).
    depsfile = _run_state_path("deps") + ".json"
    with open(depsfile, "w") as f:
@ -974,15 +942,9 @@ def main() -> int:
    with contextlib.suppress(OSError):
        os.remove(skipfile)
    os.environ["CCCI_DEPS_SKIP_REPORT"] = skipfile
    declared = deps_mod.declared_deps(recipe)
    # Q3.2a: a recipe that tolerates OIDC env at first boot AND whose deps are live-warm wires OIDC
    # at INSTALL time (provision the realm BEFORE the single deploy; install_steps.sh writes the env
    # into it) instead of the post-deploy in-place `--chaos` redeploy — which is flaky on the heavy
    # 12-service lasuite-drive stack (collabora WOPI race; see JOURNAL Step 0). Opt-in per recipe.
    oidc_at_install = bool(meta.get("OIDC_AT_INSTALL")) and bool(declared)
    declared = list(meta.DEPS)
    if declared:
        when = "BEFORE deploy (install-time OIDC)" if oidc_at_install else "AFTER generic tiers"
        print(f"\n===== DEPS declared (provision {when}): {declared} =====", flush=True)
        print(f"\n===== DEPS declared (provision BEFORE deploy): {declared} =====", flush=True)
    deps_state: dict[str, dict] = {}  # new shape: recipe→entry dict (sso-dep plan §1)
    deps_ready = True
    deps_not_ready_reason: str = ""
@ -996,7 +958,7 @@ def main() -> int:
    # install_steps.sh can read $CCCI_DEPS_FILE and wire the OIDC env into that one deploy. On
    # failure we mark deps-not-ready but STILL deploy the recipe alone (install_steps.sh no-ops
    # on an empty deps file) so the generic tiers run; the OIDC custom test then skips → F2-11. ----
    if oidc_at_install:
    if declared:
        print(
            f"\n===== install-time OIDC: provisioning deps {declared} BEFORE deploy =====",
            flush=True,
@ -1023,18 +985,21 @@ def main() -> int:
            version=base,
            secrets=True,
            install_steps_hook=hook,
            deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", 900)),
            deploy_timeout=int(meta.DEPLOY_TIMEOUT),
            meta=meta,
        )
        lifecycle.wait_healthy(
            domain,
            ok_codes=tuple(meta["HEALTH_OK"]),
            path=meta["HEALTH_PATH"],
            deploy_timeout=meta["DEPLOY_TIMEOUT"],
            http_timeout=meta["HTTP_TIMEOUT"],
            ok_codes=tuple(meta.HEALTH_OK),
            path=meta.HEALTH_PATH,
            deploy_timeout=meta.DEPLOY_TIMEOUT,
            http_timeout=meta.HTTP_TIMEOUT,
        )
        # Recipe READY_PROBE (e.g. lasuite-drive collabora WOPI discovery) — readiness beyond
        # replica convergence + app HEALTH_PATH; no-op for recipes without one.
        lifecycle.wait_ready_probes(meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", 900)))
        lifecycle.wait_ready_probes(
            meta, domain, timeout=int(meta.DEPLOY_TIMEOUT), op="install"
        )
        deploy_ok = True
    except Exception as e:  # noqa: BLE001 — a failed deploy is a reported INSTALL failure
        print(f"!! deploy/readiness failed: {e}", flush=True)
@ -1131,41 +1096,11 @@ def main() -> int:
                if backup_cap
                else "skip"
            )
        # ---- setup_custom_tests step (NEW, operator-2026-05-28 SSO-dep plan §3.2) ----
        # Deploy each declared dep + wire OIDC env into the parent app via the per-recipe
        # setup_custom_tests.sh hook + in-place redeploy. Failure here marks deps-not-ready
        # but does NOT abort the run — @pytest.mark.requires_deps tests skip with reason;
        # non-deps custom tests still run normally.
        if declared and not oidc_at_install:
            # LEGACY post-deploy path: provision deps AFTER generic tiers, then wire OIDC env
            # into the parent via the setup_custom_tests.sh hook + an in-place `--chaos` redeploy.
            print("\n===== setup_custom_tests: deps + OIDC wiring =====", flush=True)
            try:
                deps_state = _provision_deps(recipe, domain, ref, declared)
                # Run the per-recipe post-deps hook (jq-driven OIDC wiring + in-place redeploy)
                _run_setup_custom_tests_hook(recipe, domain, depsfile)
            except Exception as e:  # noqa: BLE001 — setup failure is ISOLATED to dep-marked tests
                deps_ready = False
                deps_not_ready_reason = _scrub(str(e))[:300]
                print(
                    f"!! setup_custom_tests failed (deps-not-ready): {deps_not_ready_reason}",
                    flush=True,
                )
        elif declared and oidc_at_install and deps_ready:
            # INSTALL-TIME path (Q3.2a): deps were provisioned BEFORE the single deploy and the
            # install-tier install_steps.sh hook already wired OIDC env into that one deploy —
            # so NO re-provision, NO reconverge here. Run only the post-deploy setup hook
            # (e.g. lasuite-drive's minio-createbuckets one-shot), which needs the live stack.
            print("\n===== post-deploy setup (OIDC already wired at install) =====", flush=True)
            try:
                _run_setup_custom_tests_hook(recipe, domain, depsfile)
            except Exception as e:  # noqa: BLE001 — isolated to dep-marked / state-dependent tests
                deps_ready = False
                deps_not_ready_reason = _scrub(str(e))[:300]
                print(
                    f"!! post-deploy setup failed: {deps_not_ready_reason}",
                    flush=True,
                )
        # (rcust P2b: install-time deps wiring is the ONLY mode — deps were provisioned BEFORE
        # the single deploy and install_steps.sh wired the OIDC env into it. The legacy
        # post-deploy provisioning + setup_custom_tests.sh redeploy machinery is deleted; a
        # recipe's post-deploy seeding belongs in ops.py pre_install, e.g. lasuite-drive's
        # MinIO bucket one-shot.)

        # ---- CUSTOM tier ----
        if "custom" in stages:
@@ -1240,8 +1175,7 @@ def main() -> int:
 
     # ---- per-op summary (DG6 feed) ----
     # SSO-dep plan §1: DG4.1 generalised — one `abra app new` per app in the run (recipe + each
-    # COLD dep). In-place reconfigure-and-redeploy (the setup_custom_tests step's
-    # `abra app deploy --force --chaos`) is NOT a fresh `app_new` and does NOT increment the count.
+    # COLD dep). Chaos redeploys are NOT a fresh `app_new` and do NOT increment the count.
     # WC1: a live-warm dep (keycloak) is NOT deployed by the run — it only gets a per-run realm — so
     # warm deps contribute 0. So expected = 1 + (number of COLD deps that actually got deployed).
     _dep_entries = deps_state.values() if isinstance(deps_state, dict) else (deps_state or [])
@@ -1282,12 +1216,12 @@ def main() -> int:
         overall = 1
     if any(v == "fail" for v in results.values()):
         overall = 1
-    # F2-11: a deps-declaring recipe whose setup_custom_tests failed has NOT verified its SSO/OIDC
+    # F2-11: a deps-declaring recipe whose dep provisioning failed has NOT verified its SSO/OIDC
     # claim — its requires_deps tests SKIPPED (a skip-only file exits 0, so without this the run
     # would report GREEN). Fail the run for that recipe; generic-tier results above are untouched.
     if sso_dep_unverified(declared, deps_ready, requires_deps_skipped):
         print(
-            f"!! recipe declares DEPS={declared} but setup_custom_tests failed and "
+            f"!! recipe declares DEPS={declared} but dep provisioning failed and "
             f"{requires_deps_skipped} requires_deps (SSO) test(s) were SKIPPED — SSO claim NOT "
             f"verified; failing run (F2-11). deps-not-ready: {deps_not_ready_reason}",
             file=sys.stderr,
@@ -1314,7 +1248,7 @@ def main() -> int:
         no_secret_leak=True,  # narrowed below by an actual scan of the serialised artifact
         screenshot=screenshot_rel,  # Phase 3 U1 (R4): relative PNG name iff capture succeeded
         finished_ts=time.time(),
-        expected_na=meta.get("EXPECTED_NA"),  # declared intentional-skip map (recipe_meta)
+        expected_na=meta.EXPECTED_NA,  # declared intentional-skip map (recipe_meta)
     )
     # Real (if narrow) leak check: no known infra-secret value may appear in the artifact (R7).
     blob = json.dumps(data)
71
scripts/gen-meta-docs.py
Normal file
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+"""Render the harness.meta KEYS registry to the markdown key-reference table in
+docs/recipe-customization.md §4 (rcust P1.5; kills the R5 doc-drift class).
+
+Usage:
+  python3 scripts/gen-meta-docs.py          # rewrite the table in-place between the markers
+  python3 scripts/gen-meta-docs.py --print  # print the rendered table to stdout (used by the
+                                            # doc-sync unit test, tests/unit/test_meta.py)
+
+The table lives between `<!-- META-TABLE-START -->` / `<!-- META-TABLE-END -->` markers; a unit
+test asserts the committed table equals this rendering, so editing it by hand fails CI.
+"""
+
+from __future__ import annotations
+
+import os
+import sys
+
+ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+sys.path.insert(0, os.path.join(ROOT, "runner"))
+from harness.meta import KEYS  # noqa: E402
+
+DOC = os.path.join(ROOT, "docs", "recipe-customization.md")
+START = "<!-- META-TABLE-START -->"
+END = "<!-- META-TABLE-END -->"
+
+
+def _default_repr(v) -> str:
+    if v is None:
+        return "`None`"
+    return f"`{v!r}`"
+
+
+def render() -> str:
+    lines = [
+        START,
+        "",
+        "_This table is GENERATED from the `runner/harness/meta.py` KEYS registry by"
+        " `scripts/gen-meta-docs.py` — do not edit by hand (a unit test pins the sync)._",
+        "",
+        "| Key | Type | Default | Meaning |",
+        "|---|---|---|---|",
+    ]
+    for k in KEYS:
+        doc = k.doc.replace("|", "\\|")
+        name = f"`{k.name}`" + (" **(deprecated)**" if k.deprecated else "")
+        lines.append(f"| {name} | `{k.type}` | {_default_repr(k.default)} | {doc} |")
+    lines += ["", END]
+    return "\n".join(lines)
+
+
+def main() -> int:
+    table = render()
+    if "--print" in sys.argv:
+        print(table)
+        return 0
+    with open(DOC) as f:
+        text = f.read()
+    if START not in text or END not in text:
+        print(f"{DOC}: missing {START}/{END} markers", file=sys.stderr)
+        return 1
+    head, _, rest = text.partition(START)
+    _, _, tail = rest.partition(END)
+    with open(DOC, "w") as f:
+        f.write(head + table + tail)
+    print(f"{DOC}: key table rewritten from the registry ({len(KEYS)} keys)")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
@@ -9,14 +9,14 @@ sys.path.insert(0, os.path.dirname(__file__))
 import _p4  # noqa: E402
 
 
-def pre_upgrade(domain, meta):
-    _p4.create_account(domain)
+def pre_upgrade(ctx):
+    _p4.create_account(ctx.domain)
 
 
-def pre_backup(domain, meta):
-    _p4.create_account(domain)
+def pre_backup(ctx):
+    _p4.create_account(ctx.domain)
 
 
-def pre_restore(domain, meta):
-    _p4.delete_account(domain)
-    assert not _p4.account_exists(domain), "marker account delete did not take (pre_restore)"
+def pre_restore(ctx):
+    _p4.delete_account(ctx.domain)
+    assert not _p4.account_exists(ctx.domain), "marker account delete did not take (pre_restore)"
@@ -14,32 +14,7 @@ import pytest
 
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "runner"))
 from harness import deps as deps_mod  # noqa: E402
-from harness import lifecycle, naming
-
-
-def _short(s: str, n: int = 8) -> str:
-    return "".join(c for c in s if c.isalnum())[:n] or "local"
-
-
-def _recipe_meta(recipe: str) -> dict:
-    """Optional per-recipe config so enrolling a recipe needs NO shared-harness change (D5).
-    A recipe may ship tests/<recipe>/recipe_meta.py with any of: HEALTH_PATH (str),
-    HEALTH_OK (tuple of status codes), DEPLOY_TIMEOUT (int), HTTP_TIMEOUT (int)."""
-    path = os.path.join(os.path.dirname(__file__), recipe, "recipe_meta.py")
-    meta = {
-        "HEALTH_PATH": "/",
-        "HEALTH_OK": (200, 301, 302),
-        "DEPLOY_TIMEOUT": 600,
-        "HTTP_TIMEOUT": 300,
-    }
-    if os.path.exists(path):
-        ns: dict = {}
-        with open(path) as fh:
-            exec(compile(fh.read(), path, "exec"), ns)  # noqa: S102 (trusted, in-repo)
-        for k in meta:
-            if k in ns:
-                meta[k] = ns[k]
-    return meta
+from harness import meta as meta_mod  # noqa: E402
 
 
 @pytest.fixture(scope="session")
@@ -48,18 +23,10 @@ def recipe() -> str:
 
 
-@pytest.fixture(scope="session")
-def app_domain(recipe) -> str:
-    # Docker swarm config/secret names = <stackname>_<res>_<ver> must be <= 64 chars, and
-    # stackname is the sanitized domain. ".ci.commoninternet.net" alone is 22 chars, so the
-    # subdomain label must stay short. Use <recipe[:4]>-<6hex(recipe|pr|ref)> — unique per run,
-    # collision-safe across recipes (full recipe in the hash), readable context lives in the
-    # Drone build params + PR comment. (Deviation from plan §4.0 long name; see DECISIONS.md.)
-    return naming.app_domain(recipe, os.environ.get("PR", "0"), os.environ.get("REF"))
-
-
 @pytest.fixture(scope="session")
-def meta(recipe) -> dict:
-    return _recipe_meta(recipe)
+def meta(recipe):
+    """The recipe's FULL validated customization (RecipeMeta, attribute access) via the single
+    loader (rcust P1 — previously this fixture saw only the 4 base keys, spec §8 R3)."""
+    return meta_mod.load(recipe)
 
 
 @pytest.fixture(scope="session")
@@ -73,32 +40,55 @@ def live_app() -> str:
     return domain
 
 
-@pytest.fixture(scope="session")
-def deps_apps() -> dict[str, str]:
-    """Phase 2 Q2.3 dependency-resolver contract (refined operator-2026-05-28 SSO-dep plan §1):
-    when a recipe declares `DEPS = [...]` in its `recipe_meta.py`, the orchestrator deploys each
-    dep AFTER the generic tiers (between RESTORE and CUSTOM) and persists their per-run identity
-    + SSO creds to `$CCCI_DEPS_FILE`. Tests access the dep's per-run domain via this fixture.
-    For full SSO creds (realm/client/secret/admin) use the `deps_creds` fixture instead.
+@pytest.fixture
+def op_state() -> dict:
+    """The orchestrator's run-scoped op context (rcust P4): versions, artifact paths — written to
+    `$CCCI_OP_STATE_FILE` after each lifecycle op (e.g. `{"upgrade": {"before": {...},
+    "head_ref": ...}, "backup": {"snapshot_id": ...}}`). Overlay tests read op facts from here
+    instead of hand-parsing env/JSON. Skips with a clear reason outside an orchestrator run."""
+    import json
 
-    Returns `{dep_recipe: domain}` (str→str). Empty when no deps declared OR deps-not-ready."""
+    path = os.environ.get("CCCI_OP_STATE_FILE")
+    if not path:
+        pytest.skip(
+            "CCCI_OP_STATE_FILE not set — op_state is only available under the orchestrator"
+        )
+    if not os.path.exists(path):
+        pytest.skip(f"op-state file missing ({path}) — orchestrator has not performed an op yet")
+    try:
+        with open(path) as f:
+            return json.load(f)
+    except ValueError:
+        pytest.skip(f"op-state file unreadable/not JSON ({path})")
+
+
+class _DepEntry(dict):
+    """One provisioned dep (full creds dict) with attribute sugar: `entry.domain`, `entry.realm`,
+    `entry.client_secret`, ... — dict-style access works too (rcust P2d)."""
+
+    def __getattr__(self, name):
+        try:
+            return self[name]
+        except KeyError as e:
+            raise AttributeError(name) from e
+
+
+@pytest.fixture(scope="session")
+def deps() -> dict[str, _DepEntry]:
+    """The recipe's provisioned deps (rcust P2d — consolidates the old `deps_apps`+`deps_creds`
+    pair). When a recipe declares `DEPS = [...]` in its `recipe_meta.py`, the orchestrator
+    provisions each dep BEFORE the single deploy and persists per-run identity + SSO creds to
+    `$CCCI_DEPS_FILE`. `deps["keycloak"]` carries domain/realm/client_id/client_secret/user/
+    password/email/admin_user/admin_password/discovery_url/token_url/... (`.domain` etc. work as
+    attributes). Empty when no deps declared OR deps-not-ready — pair with
+    `@pytest.mark.requires_deps` so the F2-11 skip-report keeps the green signal honest."""
     state = deps_mod.deps_as_dict(deps_mod.load_run_state())
-    return {r: e["domain"] for r, e in state.items() if e.get("domain")}
-
-
-@pytest.fixture(scope="session")
-def deps_creds() -> dict[str, dict]:
-    """Full SSO-creds dict for each declared dep (operator-2026-05-28 SSO-dep plan §1).
-    `deps_creds["keycloak"]` returns the entry written by setup_custom_tests with keys
-    domain/realm/client_id/client_secret/user/password/email/admin_user/admin_password/
-    discovery_url/token_url/.... Use this in `@pytest.mark.requires_deps` tests that need to
-    authenticate via OIDC."""
-    return deps_mod.deps_as_dict(deps_mod.load_run_state())
+    return {r: _DepEntry(e) for r, e in state.items()}
 
 
 def pytest_collection_modifyitems(config, items):
     """SSO-dep plan §4: tests marked `@pytest.mark.requires_deps` are skipped with reason
-    `deps-not-ready: <captured-err>` when the orchestrator's setup_custom_tests step failed
+    `deps-not-ready: <captured-err>` when the orchestrator's dep provisioning failed
     (orchestrator sets CCCI_DEPS_READY=0 in env). Non-deps custom tests are unaffected.
 
     This is failure-isolation per plan §1 — generic tiers cannot break the SSO-marked tests'
@@ -131,40 +121,5 @@ def pytest_configure(config):
     """Register the `requires_deps` marker so pytest doesn't warn about it."""
     config.addinivalue_line(
         "markers",
-        "requires_deps: test requires DEPS-declared services + setup_custom_tests success.",
+        "requires_deps: test requires DEPS-declared services + dep provisioning success.",
     )
-
-
-def _wait_healthy(domain, meta):
-    lifecycle.wait_healthy(
-        domain,
-        ok_codes=tuple(meta["HEALTH_OK"]),
-        path=meta["HEALTH_PATH"],
-        deploy_timeout=meta["DEPLOY_TIMEOUT"],
-        http_timeout=meta["HTTP_TIMEOUT"],
-    )
-
-
-@pytest.fixture
-def deployed(recipe, app_domain, meta, request):
-    """Function-scoped: deploy the current/$REF version healthy, guaranteed teardown after.
-    Used by stages that start from current (install/backup)."""
-    version = os.environ.get("VERSION") or None
-    lifecycle.janitor()
-    request.addfinalizer(lambda: lifecycle.teardown_app(app_domain))
-    lifecycle.deploy_app(recipe, app_domain, version=version)
-    _wait_healthy(app_domain, meta)
-    return app_domain
-
-
-@pytest.fixture(scope="session")
-def deployed_app(recipe, app_domain, meta):
-    """Install stage: deploy the recipe and wait until healthy; tear down at session end."""
-    version = os.environ.get("VERSION") or None
-    lifecycle.janitor()  # sweep orphans from crashed runs first
-    try:
-        lifecycle.deploy_app(recipe, app_domain, version=version, secrets=True)
-        _wait_healthy(app_domain, meta)
-        yield app_domain
-    finally:
-        lifecycle.teardown_app(app_domain)
@@ -15,13 +15,13 @@ def _write(domain, val):
     lifecycle.exec_in_app(domain, ["sh", "-c", f"echo {val} > {MARKER}"])
 
 
-def pre_upgrade(domain, meta):
-    _write(domain, "upgrade-survives")
+def pre_upgrade(ctx):
+    _write(ctx.domain, "upgrade-survives")
 
 
-def pre_backup(domain, meta):
-    _write(domain, "original")
+def pre_backup(ctx):
+    _write(ctx.domain, "original")
 
 
-def pre_restore(domain, meta):
-    _write(domain, "mutated")  # diverge so a successful restore is observable
+def pre_restore(ctx):
+    _write(ctx.domain, "mutated")  # diverge so a successful restore is observable
@@ -7,9 +7,9 @@ DEPLOY_TIMEOUT = 600
 HTTP_TIMEOUT = 600
 
 
-def EXTRA_ENV(domain):
+def EXTRA_ENV(ctx):
     """cryptpad needs a SANDBOX_DOMAIN distinct from the main DOMAIN (it serves user content from a
     separate origin; the web router routes both). Derive a sibling subdomain under the same wildcard
     (covered by the wildcard cert, so no cert work)."""
-    label, _, rest = domain.partition(".")
+    label, _, rest = ctx.domain.partition(".")
     return {"SANDBOX_DOMAIN": f"{label}-sb.{rest}"}
@@ -12,8 +12,8 @@ from harness import lifecycle
 MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
 
 
-def pre_restore(domain: str, meta: dict) -> None:
+def pre_restore(ctx) -> None:
     """Write 'mutated' to the marker before restore runs. If restore brings back the
     snapshot (which has no marker — never seeded by pre_backup), the marker ends up
     MISSING or 'mutated' after restore → test_restore_returns_state FAILS → restore=RED."""
-    lifecycle.exec_in_app(domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])
+    lifecycle.exec_in_app(ctx.domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])
@@ -11,5 +11,5 @@ from harness import lifecycle
 MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
 
 
-def pre_restore(domain: str, meta: dict) -> None:
-    lifecycle.exec_in_app(domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])
+def pre_restore(ctx) -> None:
+    lifecycle.exec_in_app(ctx.domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])
@@ -1,4 +1,4 @@
-"""custom-html — pre-op seed hooks (Phase 1e HC3). The orchestrator runs `pre_<op>(domain, meta)`
+"""custom-html — pre-op seed hooks (Phase 1e HC3). The orchestrator runs `pre_<op>(ctx)`
 BEFORE it performs the op; the matching test_<op>.py asserts the post-op state (assertion-only).
 
 nginx serves the volume at /usr/share/nginx/html, so the marker file survives an upgrade / a
@@ -17,16 +17,16 @@ def _write(domain: str, val: str) -> None:
     lifecycle.exec_in_app(domain, ["sh", "-c", f"echo {val} > {MARKER_PATH}"])
 
 
-def pre_upgrade(domain, meta):
+def pre_upgrade(ctx):
     # seed a marker before the upgrade so the overlay can prove the data survives it
-    _write(domain, "upgrade-survives")
+    _write(ctx.domain, "upgrade-survives")
 
 
-def pre_backup(domain, meta):
+def pre_backup(ctx):
     # establish a known original state before the backup op captures it
-    _write(domain, "original")
+    _write(ctx.domain, "original")
 
 
-def pre_restore(domain, meta):
+def pre_restore(ctx):
     # diverge from the backed-up state so a successful restore (back to "original") is observable
-    _write(domain, "mutated")
+    _write(ctx.domain, "mutated")
@@ -1,28 +0,0 @@
-#!/usr/bin/env bash
-# discourse — INSTALL-TIME hook (Phase 2 Q4.6). Runs during the install tier AFTER `abra app new` +
-# EXTRA_ENV + `abra app secret generate` and BEFORE the single `abra app deploy`
-# (lifecycle.py::_run_install_steps), with CCCI_RECIPE / CCCI_APP_DOMAIN in env.
-#
-# Purpose: provide the cc-ci re-pin+grace overlay (compose.ccci.yml) to the recipe checkout so the
-# UPGRADE-tier BASE deploy (published 0.7.0+3.3.1, whose compose pins the Docker-Hub-removed
-# `bitnami/discourse:3.3.1` and ships a too-tight 5m start_period) is deployable and can survive the
-# 15-25min Rails cold boot — so upgrade-to-latest can run. See compose.ccci.yml's header for the full
-# rationale. The overlay is referenced by recipe_meta COMPOSE_FILE; it is a cc-ci file (not part of the
-# recipe), so copying it here makes it resolvable. It persists across the later `git checkout <head>`
-# (untracked) so the head deploy also merges it (idempotent — the PR head already re-pins + ships 20m).
-# CHAOS_BASE_DEPLOY=True is set so abra's pinned-deploy clean-tree check doesn't FATA on the overlay.
-set -euo pipefail
-
-: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-# Resolve the recipe tree the way abra does: $ABRA_DIR (the per-run tree inside a CI run) else
-# the canonical ~/.abra — the overlay must land in the tree this run actually deploys from.
-RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"
-
-if [ ! -d "$RECIPE_DIR" ]; then
-  echo " discourse install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
-  exit 1
-fi
-
-cp "$SCRIPT_DIR/compose.ccci.yml" "$RECIPE_DIR/compose.ccci.yml"
-echo " discourse install_steps: provided compose.ccci.yml (bitnamilegacy re-pin + 20m start_period grace) to recipe checkout (${CCCI_RECIPE})"
@@ -30,18 +30,18 @@ def _seed(domain, value):
     assert got == value, f"seed did not commit (read back {got!r}, expected {value!r})"
 
 
-def pre_upgrade(domain, meta):
-    _seed(domain, "upgrade-survives")
+def pre_upgrade(ctx):
+    _seed(ctx.domain, "upgrade-survives")
 
 
-def pre_backup(domain, meta):
-    _seed(domain, "original")
+def pre_backup(ctx):
+    _seed(ctx.domain, "original")
 
 
-def pre_restore(domain, meta):
+def pre_restore(ctx):
     # diverge from the backup so a successful restore is observable
-    _psql(domain, "DROP TABLE IF EXISTS ci_marker;")
-    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
+    _psql(ctx.domain, "DROP TABLE IF EXISTS ci_marker;")
+    assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
         "",
         "NULL",
     ), "drop did not take"
@@ -29,11 +29,11 @@ HTTP_TIMEOUT = 1200
 # (1) it pins the Docker-Hub-removed `bitnami/discourse:3.3.1` (404) → overlay re-pins app+sidekiq to
 # `bitnamilegacy/discourse:3.3.1` (namespace-only, identical image), the same re-pin the PR makes;
 # (2) its 5m start_period is too tight for the 15-25min Rails boot → overlay widens it to 20m (grace).
-# install_steps.sh provides the overlay; CHAOS_BASE_DEPLOY skips the clean-tree gate on the untracked
-# overlay; it persists across the head checkout (idempotent — the PR head already re-pins + ships 20m).
+# The harness auto-provides the overlay to the checkout and auto-chaoses the base deploy
+# (first-class compose.ccci.yml, rcust P2a); it persists across the head checkout (idempotent — the
+# PR head already re-pins + ships 20m).
 # Upgrade crossover: 0.7.0 (re-pinned base) → PR head; full assertions run on the HEAD. The 0.7.0
 # *custom* tests are not separately run (custom tier runs once, on the head — policy §1 allows skip+record).
-CHAOS_BASE_DEPLOY = True
 UPGRADE_BASE_VERSION = "0.7.0+3.3.1"
 EXTRA_ENV = {
     "TIMEOUT": "3600",  # abra's internal convergence wait; matches DEPLOY_TIMEOUT (slow Rails boot headroom)
@@ -41,7 +41,7 @@ EXTRA_ENV = {
 }
 
 
-def BACKUP_VERIFY(domain):
+def BACKUP_VERIFY(ctx):
     """Post-backup integrity check (Q4.6, same race ghost F2-14b hit). The recipe's backupbot db
     pre-hook (`/pg_backup.sh backup`) dumps the discourse postgres DB to `/var/lib/postgresql/data/
     backup.sql` (gzip), then restic captures that path. On the loaded single CI node the db container
@@ -60,7 +60,7 @@ def BACKUP_VERIFY(domain):
 
     try:
         out = lifecycle.exec_in_app(
-            domain,
+            ctx.domain,
             [
                 "sh",
                 "-c",
@@ -1,28 +0,0 @@
-#!/usr/bin/env bash
-# ghost — INSTALL-TIME hook (Phase 2 F2-14b). Runs during the install tier AFTER `abra app new` +
-# EXTRA_ENV + `abra app secret generate` and BEFORE the single `abra app deploy`
-# (lifecycle.py::_run_install_steps), with CCCI_RECIPE / CCCI_APP_DOMAIN in env.
-#
-# Purpose: provide the cc-ci start_period-grace overlay (compose.ccci.yml) to the recipe checkout so
-# the UPGRADE-tier BASE deploy (a previous published version whose app healthcheck still ships the
-# too-tight 1m start_period) can survive ghost's ~6-9min fresh-DB migration and converge. See
-# compose.ccci.yml's header for the full rationale. The overlay is referenced by recipe_meta
-# COMPOSE_FILE; copying it here (it is a cc-ci file, not part of the recipe) makes it resolvable.
-# It persists across the later `git checkout <head>` (untracked) so the head deploy also merges it
-# (idempotent — the PR head already ships 15m). CHAOS_BASE_DEPLOY=True is set so abra's pinned-deploy
-# clean-tree check doesn't FATA on the untracked overlay.
-set -euo pipefail
-
-: "${CCCI_RECIPE:?missing CCCI_RECIPE}"
-SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
-# Resolve the recipe tree the way abra does: $ABRA_DIR (the per-run tree inside a CI run) else
-# the canonical ~/.abra — the overlay must land in the tree this run actually deploys from.
-RECIPE_DIR="${ABRA_DIR:-${HOME}/.abra}/recipes/${CCCI_RECIPE}"
-
-if [ ! -d "$RECIPE_DIR" ]; then
-  echo " ghost install_steps: recipe dir $RECIPE_DIR missing — cannot provide compose.ccci.yml" >&2
-  exit 1
-fi
-
-cp "$SCRIPT_DIR/compose.ccci.yml" "$RECIPE_DIR/compose.ccci.yml"
-echo " ghost install_steps: provided compose.ccci.yml (app start_period grace) to recipe checkout (${CCCI_RECIPE})"
@@ -36,19 +36,19 @@ def _seed(domain, value):
     assert got == value, f"seed did not commit (read back {got!r}, expected {value!r})"
 
 
-def pre_upgrade(domain, meta):
-    _seed(domain, "upgrade-survives")
+def pre_upgrade(ctx):
+    _seed(ctx.domain, "upgrade-survives")
 
 
-def pre_backup(domain, meta):
-    _seed(domain, "original")
+def pre_backup(ctx):
+    _seed(ctx.domain, "original")
 
 
-def pre_restore(domain, meta):
+def pre_restore(ctx):
     # diverge from the backup so a successful restore is observable: drop the marker table.
-    _mysql(domain, "DROP TABLE IF EXISTS ci_marker;")
+    _mysql(ctx.domain, "DROP TABLE IF EXISTS ci_marker;")
     got = _mysql(
-        domain,
+        ctx.domain,
         "SELECT COUNT(*) FROM information_schema.tables "
         "WHERE table_schema='ghost' AND table_name='ci_marker';",
     )
@@ -31,23 +31,22 @@ HTTP_TIMEOUT = 900
 # (plan-ccci-compose-overlay-policy.md §1), so the harness base-deploys the previous PUBLISHED version
 # (1.1.1+6-alpine) — which predates the PR and still ships the too-tight 1m start_period → it would
 # deadlock on the same migration kill. compose.ccci.yml re-applies the 15m grace to the BASE so the
-# from-version is deployable; install_steps.sh provides it to the checkout; CHAOS_BASE_DEPLOY skips the
-# clean-tree gate on that untracked overlay. It persists across the head checkout (idempotent — the PR
-# head already ships 15m). This is the policy-blessed "minimal overlay on the from-version so
+# from-version is deployable; the harness auto-provides it to the checkout and auto-chaoses the base
+# deploy (first-class compose.ccci.yml, rcust P2a). It persists across the head checkout (idempotent —
+# the PR head already ships 15m). This is the policy-blessed "minimal overlay on the from-version so
 # upgrade-to-latest can run" — grace-only, masks no defect, weakens no test.
 # TIMEOUT/DEPLOY_TIMEOUT 2400s: the BASE cold boot's wall-time is mysql fresh-dir init (~6min, during
 # which the app crash-loops harmlessly on `ECONNREFUSED 3306` until mysql accepts connections — no
 # migration progress lost, it hasn't started) PLUS the ~9-15min schema migration (round-trip-bound,
 # slower under host load). 1200s was too tight (full4 killed at the near-final `email_recipients`
 # tables while still 0/1); 2400s gives headroom while still bounding a genuine hang (matches discourse).
-CHAOS_BASE_DEPLOY = True
 EXTRA_ENV = {
     "TIMEOUT": "2400",
     "COMPOSE_FILE": "compose.yml:compose.ccci.yml",
 }
 
 
-def BACKUP_VERIFY(domain):
+def BACKUP_VERIFY(ctx):
     """Post-backup integrity check (F2-14b). The recipe's backupbot db pre-hook dumps the ghost MySQL
     DB to `/var/lib/mysql/backup.sql.gz` (then restic captures that path). On the loaded single CI node
     the db container intermittently CYCLES mid-dump (observed: full5/6/7 RED, full8 green — pure race;
@@ -62,7 +61,7 @@ def BACKUP_VERIFY(domain):
 
     try:
         out = lifecycle.exec_in_app(
-            domain,
+            ctx.domain,
             [
                 "sh",
                 "-c",
@@ -25,17 +25,17 @@ def _seed(domain, value):
     assert _psql(domain, "SELECT v FROM ci_marker;") == value
 
 
-def pre_upgrade(domain, meta):
-    _seed(domain, "upgrade-survives")
+def pre_upgrade(ctx):
+    _seed(ctx.domain, "upgrade-survives")
 
 
-def pre_backup(domain, meta):
-    _seed(domain, "original")
+def pre_backup(ctx):
+    _seed(ctx.domain, "original")
 
 
-def pre_restore(domain, meta):
-    _psql(domain, "DROP TABLE ci_marker;")
-    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
+def pre_restore(ctx):
+    _psql(ctx.domain, "DROP TABLE ci_marker;")
+    assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
         "",
         "NULL",
     ), "drop did not take"
@@ -14,20 +14,20 @@ def _token(domain):
     return kc_admin.admin_token(domain, kc_admin.admin_password(domain))
 
 
-def pre_upgrade(domain, meta):
+def pre_upgrade(ctx):
     # create the marker realm (DB data) before the upgrade so the overlay can prove it survives
-    assert kc_admin.create_marker_realm(domain, _token(domain)) in (201, 409)
+    assert kc_admin.create_marker_realm(ctx.domain, _token(ctx.domain)) in (201, 409)
 
 
-def pre_backup(domain, meta):
+def pre_backup(ctx):
     # establish the marker realm before the backup op captures mariadb
-    assert kc_admin.create_marker_realm(domain, _token(domain)) in (201, 409)
+    assert kc_admin.create_marker_realm(ctx.domain, _token(ctx.domain)) in (201, 409)
 
 
-def pre_restore(domain, meta):
+def pre_restore(ctx):
     # backup-bot-two cycles the keycloak container during backup → wait for serving, re-auth, then
     # delete the realm (diverge from the backup) so a successful restore is observable
-    generic.assert_serving(domain, meta)
-    tok = _token(domain)
-    assert kc_admin.delete_marker_realm(domain, tok) in (204, 200)
-    assert not kc_admin.marker_realm_exists(domain, tok), "delete did not take"
+    generic.assert_serving(ctx.domain, ctx.meta)
+    tok = _token(ctx.domain)
+    assert kc_admin.delete_marker_realm(ctx.domain, tok) in (204, 200)
+    assert not kc_admin.marker_realm_exists(ctx.domain, tok), "delete did not take"
@@ -5,7 +5,7 @@ persistence". This is the canonical create-an-object + read-it-back for lasuite-
 
 Flow (uses an OIDC token from the dep keycloak):
 1. Obtain a JWT via OIDC password grant against the dep keycloak (the test user is provisioned
-by the orchestrator's setup_custom_tests step).
+by the orchestrator's dep-provisioning step).
 2. POST `/api/v1.0/documents/` with `Authorization: Bearer <jwt>` to create a new doc with a
 unique title; capture the returned `id`.
 3. GET `/api/v1.0/documents/<id>/` with the same Bearer token; assert the returned title and
@@ -15,7 +15,7 @@ Non-vacuous: a misconfigured OIDC, broken backend, or missing endpoint fails at
 broken. The marker-in-the-title + id round-trip proves the doc actually persisted in lasuite-
 docs's database after going through the recipe's nginx → backend → postgres path.
 
-Marked @pytest.mark.requires_deps — skips with `deps-not-ready` if setup_custom_tests failed.
+Marked @pytest.mark.requires_deps — skips with `deps-not-ready` if dep provisioning failed.
 """
 
 from __future__ import annotations
@@ -32,9 +32,9 @@ from harness import sso
 
 
 @pytest.mark.requires_deps
-def test_create_doc_and_read_back(live_app, deps_creds):
+def test_create_doc_and_read_back(live_app, deps):
     """Create a doc via the authenticated API; fetch it back; assert round-trip."""
-    kc = deps_creds["keycloak"]
+    kc = deps["keycloak"]
 
     # Obtain a JWT via OIDC password grant
     access_token = sso.oidc_password_grant(
@@ -5,13 +5,13 @@ SOURCE: references/recipe-maintainer/recipe-info/lasuite-docs/tests/oidc_login.p
 End-to-end flow:
 1. GET `/api/v1.0/users/me/` without auth → asserts the response REDIRECTS to the dep
 keycloak's realm auth endpoint (the recipe is correctly configured to challenge
-unauthenticated callers — wired via setup_custom_tests.sh).
+unauthenticated callers — wired via install_steps.sh).
 2. Obtain an OIDC token from the dep keycloak via password grant
 (the test user provisioned by the orchestrator's realm setup).
 3. Call `/api/v1.0/users/me/` with `Authorization: Bearer <jwt>` → asserts 200 and the
 returned user's email matches the provisioned test user.
 
-Marked @pytest.mark.requires_deps — skips with `deps-not-ready` if setup_custom_tests failed.
+Marked @pytest.mark.requires_deps — skips with `deps-not-ready` if dep provisioning failed.
 """
 
 from __future__ import annotations
@ -51,9 +51,9 @@ def _get_no_redirect(url: str) -> tuple[int, str]:
|
||||
|
||||
|
||||
@pytest.mark.requires_deps
|
||||
def test_oidc_login_via_keycloak(live_app, deps_creds):
|
||||
def test_oidc_login_via_keycloak(live_app, deps):
|
||||
"""Anonymous → redirect to keycloak; password-grant token → 200 from /api/v1.0/users/me/."""
|
||||
kc = deps_creds["keycloak"]
|
||||
kc = deps["keycloak"]
|
||||
|
||||
# Step 1: unauthenticated GET → 302 to keycloak realm's auth endpoint
|
||||
status, redirect = _get_no_redirect(f"https://{live_app}/api/v1.0/users/me/")
|
||||
|
||||
@ -3,10 +3,10 @@
|
||||
Refactored to the refined SSO-dep model:
|
||||
- The orchestrator deploys a per-run keycloak dep AFTER generic tiers and provisions a fresh
|
||||
realm/client/user via `harness.sso.setup_keycloak_realm`. The creds are written to
|
||||
`$CCCI_DEPS_FILE` (read here via the `deps_creds` fixture).
|
||||
`$CCCI_DEPS_FILE` (read here via the `deps` fixture).
|
||||
- This test no longer calls `setup_keycloak_realm` itself — that's the orchestrator's job in
|
||||
the setup_custom_tests step. We just consume the credentials and exercise the OIDC flow.
|
||||
- Marked `@pytest.mark.requires_deps` so if setup_custom_tests failed, this test SKIPs with a
|
||||
the dep-provisioning step. We just consume the credentials and exercise the OIDC flow.
|
||||
- Marked `@pytest.mark.requires_deps` so if dep provisioning failed, this test SKIPs with a
|
||||
clear `deps-not-ready` reason rather than red-flagging a non-recipe failure.
|
||||
"""
|
||||
|
||||
@ -31,13 +31,13 @@ def _b64url_decode(seg: str) -> bytes:
|
||||
|
||||
|
||||
@pytest.mark.requires_deps
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps):
|
||||
"""The dep keycloak issues a JWT for the pre-provisioned test user via OIDC password grant."""
|
||||
assert "keycloak" in deps_creds, (
|
||||
f"keycloak creds not in deps_creds; got {list(deps_creds.keys())}. "
|
||||
"setup_custom_tests should have populated this."
|
||||
assert "keycloak" in deps, (
|
||||
f"keycloak creds not in deps; got {list(deps.keys())}. "
|
||||
"dep provisioning should have populated this."
|
||||
)
|
||||
kc = deps_creds["keycloak"]
|
||||
kc = deps["keycloak"]
|
||||
|
||||
# Sanity-check the creds shape — orchestrator-written
|
||||
assert kc["domain"]
|
||||
|
||||
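The tests in this hunk receive the orchestrator-written credentials through the renamed `deps` fixture, which reads `$CCCI_DEPS_FILE`. The fixture's implementation is not part of this diff, so the following is only a minimal sketch of how such a loader might behave, assuming a missing or empty file should yield an empty dict. The name `load_deps` is hypothetical.

```python
import json
import os


def load_deps(path):
    """Return the recipe->creds dict from a deps file, or {} if unusable.

    Hypothetical helper: the real fixture is not shown in this diff.
    """
    # A missing path, a missing file, or a zero-byte file all mean
    # "dep provisioning did not produce credentials".
    if not path or not os.path.isfile(path) or os.path.getsize(path) == 0:
        return {}
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # The file is documented as a JSON object keyed by dep recipe name.
    return data if isinstance(data, dict) else {}
```

With a loader of this shape, `"keycloak" in deps` is false whenever provisioning was skipped, which matches the assertion message in the test above.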
74
tests/lasuite-docs/install_steps.sh
Executable file
@ -0,0 +1,74 @@
#!/usr/bin/env bash
# lasuite-docs — INSTALL-TIME OIDC wiring hook (rcust P2b; migrated from the deleted
# setup_custom_tests.sh post-deploy path — sibling of lasuite-drive/-meet's hooks).
#
# Runs during the install tier AFTER `abra app new` + EXTRA_ENV + `abra app secret generate`, and
# BEFORE the single `abra app deploy` (lifecycle.py::_run_install_steps). Writing OIDC env + the
# real client secret HERE means the recipe deploys ONCE with OIDC already wired — no post-deploy
# reconverge. The orchestrator provisions the per-run realm/client on the (live-warm) keycloak
# BEFORE this hook and writes $CCCI_DEPS_FILE (the recipe→creds dict). docs' OIDC settings are
# config-only (validated by `manage.py check`, not fetched at boot), so the stack boots healthy
# with the env set. Env names per lasuite-docs's .env.sample (same values the old post-deploy
# hook wrote — byte-identical wiring, only the timing moved).
#
# Env supplied by the harness:
#   CCCI_APP_DOMAIN — the per-run lasuite-docs app domain
#   CCCI_APP_ENV — path to the app's .env (the one `abra app deploy` reads)
#   CCCI_DEPS_FILE — JSON {keycloak: {domain, realm, client_id, client_secret, ...}} (may be empty)
set -euo pipefail

: "${CCCI_APP_DOMAIN:?missing}"
ENV_PATH="${CCCI_APP_ENV:?missing}"

# No deps file / no keycloak entry → install-time provisioning failed or was skipped. NO-OP so the
# recipe still boots; the @requires_deps OIDC custom test then SKIPs and F2-11 flips the run RED.
if [ -z "${CCCI_DEPS_FILE:-}" ] || [ ! -s "${CCCI_DEPS_FILE}" ]; then
  echo " install_steps: no deps file — skipping OIDC wiring (recipe boots without OIDC)"
  exit 0
fi
KC_DOMAIN=$(jq -r '.keycloak.domain // empty' "$CCCI_DEPS_FILE")
KC_REALM=$(jq -r '.keycloak.realm // empty' "$CCCI_DEPS_FILE")
KC_CLIENT=$(jq -r '.keycloak.client_id // empty' "$CCCI_DEPS_FILE")
KC_SECRET=$(jq -r '.keycloak.client_secret // empty' "$CCCI_DEPS_FILE")
if [ -z "$KC_DOMAIN" ] || [ -z "$KC_SECRET" ]; then
  echo " install_steps: deps file has no keycloak domain/secret — skipping OIDC wiring"
  exit 0
fi

echo " lasuite-docs install_steps: wiring OIDC at install against keycloak ${KC_DOMAIN}"

# 1) Insert the OIDC client secret at a bumped version (abra already generated oidc_rpcs:v1; swarm
#    forbids overwriting a secret at the same version). The app is not deployed yet — a swarm secret
#    can be created independently — so the single deploy below picks up v2.
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
NEW_NUM=$((${CUR_VER#v} + 1))
NEW_VER="v${NEW_NUM}"
INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
  INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
  {
    echo " install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
    exit 1
  }
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
echo " install_steps: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"

# 2) Write OIDC env vars to the app's .env (names per lasuite-docs's .env.sample). Ensure a
#    trailing newline first so appends never concatenate onto the last line.
write_env() {
  local key="$1" val="$2"
  sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
  printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
}
write_env OIDC_REALM "$KC_REALM"
write_env OIDC_OP_DISCOVERY_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
write_env OIDC_RP_SIGN_ALGO "RS256"
write_env OIDC_RP_SCOPES "openid email profile"

echo " lasuite-docs install_steps: OIDC env wired into .env (deploy will pick it up, no reconverge)"
|
||||
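The two text transformations in `install_steps.sh` above (bumping `SECRET_OIDC_RPCS_VERSION` and the `write_env` delete-then-append) can be restated as pure functions, which makes the intended behavior easier to check in isolation. This is a sketch for illustration only, not part of the change; `bump_version` and `upsert_env` are hypothetical names, and the functions operate on strings rather than editing the `.env` file in place.

```python
import re


def bump_version(cur: str) -> str:
    """Return the next secret version label, e.g. 'v1' -> 'v2'."""
    return f"v{int(cur.lstrip('v')) + 1}"


def upsert_env(text: str, key: str, val: str) -> str:
    """Drop any live or commented `key=` line, then append `key=val`.

    Every kept line is re-terminated with a newline, so the appended
    entry can never be glued onto an unterminated last line.
    """
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    return "".join(f"{line}\n" for line in kept) + f"{key}={val}\n"
```

Applying `upsert_env` to a file body that lacks a trailing newline still produces one entry per line, which is the property the trailing-newline guard in `write_env` is there to preserve.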
@ -24,18 +24,18 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# drop the marker table (diverge from the backup) so a successful restore is observable
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
@ -15,7 +15,7 @@ HTTP_TIMEOUT = 600
|
||||
DEPS = ["keycloak"]
|
||||
|
||||
|
||||
def EXTRA_ENV(domain):
|
||||
def EXTRA_ENV(ctx):
|
||||
# abra's internal per-deploy convergence timeout (the recipe's TIMEOUT env, default 300s) is too
|
||||
# short for this 9-service stack on a COLD image cache (~9 large images: impress frontend/backend,
|
||||
# minio, postgres18, redis, docspec, y-provider). Cold pulls exceed 300s -> "deploy timed out 🟠".
|
||||
|
||||
@ -1,91 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# lasuite-docs — post-deps setup hook (operator-2026-05-28 SSO-dep plan §3.2).
|
||||
#
|
||||
# Runs AFTER the generic tiers (install/upgrade/backup/restore) and AFTER each declared dep is
|
||||
# deployed + provisioned with realm/client via the harness. The orchestrator has written
|
||||
# $CCCI_DEPS_FILE with the keycloak dep's domain + realm + client_secret + admin creds.
|
||||
#
|
||||
# This hook:
|
||||
# 1. Reads the dep's connection info from $CCCI_DEPS_FILE.
|
||||
# 2. Inserts the OIDC client secret as an abra app secret (recipe-conventional name oidc_rpcs).
|
||||
# 3. Writes the OIDC env vars to the running app's .env via `abra app config set`.
|
||||
# 4. Triggers an in-place `abra app deploy --force --chaos` so the new env takes effect.
|
||||
# THIS IS NOT a fresh `abra app new` — the deploy-count guard (DG4.1, generalised) still
|
||||
# sees one app_new per app.
|
||||
#
|
||||
# Env supplied by the orchestrator:
|
||||
# CCCI_APP_DOMAIN — the running per-run lasuite-docs app domain
|
||||
# CCCI_RECIPE — "lasuite-docs"
|
||||
# CCCI_DEPS_FILE — JSON file (dict shape: {dep_recipe: {domain, realm, client_id, ...}, ...})
|
||||
set -euo pipefail
|
||||
|
||||
: "${CCCI_APP_DOMAIN:?missing}"
|
||||
: "${CCCI_DEPS_FILE:?missing}"
|
||||
test -s "$CCCI_DEPS_FILE" || {
|
||||
echo " setup_custom_tests: deps file empty"
|
||||
exit 1
|
||||
}
|
||||
|
||||
# Read keycloak dep info via jq
|
||||
KC_DOMAIN=$(jq -r '.keycloak.domain' "$CCCI_DEPS_FILE")
|
||||
KC_REALM=$(jq -r '.keycloak.realm' "$CCCI_DEPS_FILE")
|
||||
KC_CLIENT=$(jq -r '.keycloak.client_id' "$CCCI_DEPS_FILE")
|
||||
KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE")
|
||||
if [ -z "$KC_DOMAIN" ] || [ "$KC_DOMAIN" = "null" ]; then
|
||||
echo " setup_custom_tests: no keycloak.domain in deps"
|
||||
exit 1
|
||||
fi
|
||||
if [ -z "$KC_SECRET" ] || [ "$KC_SECRET" = "null" ]; then
|
||||
echo " setup_custom_tests: no keycloak.client_secret"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo " lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_DOMAIN}"
|
||||
|
||||
# 1) Insert the OIDC client secret AT A BUMPED VERSION (the recipe-maintainer pattern).
|
||||
# `abra app new -S` already generated `oidc_rpcs:v1` (random) — Docker Swarm forbids overwriting
|
||||
# a secret at the same version, so we bump the version (v2), insert our value there, then
|
||||
# update SECRET_OIDC_RPCS_VERSION in the .env to point at the new one.
|
||||
ENV_PATH="$HOME/.abra/servers/default/${CCCI_APP_DOMAIN}.env"
|
||||
CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
|
||||
NEW_NUM=$((${CUR_VER#v} + 1))
|
||||
NEW_VER="v${NEW_NUM}"
|
||||
|
||||
INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
|
||||
INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
|
||||
{
|
||||
echo " setup_custom_tests: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
|
||||
exit 1
|
||||
}
|
||||
# Repoint the env var to the new version
|
||||
sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
|
||||
echo " setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
|
||||
|
||||
# 2) Write OIDC env vars to the app's .env (names per lasuite-docs's .env.sample).
|
||||
# Ensure the file ends with a newline FIRST so our appends don't concatenate onto the last line
|
||||
# (we saw `TIMEOUT=900OIDC_REALM=...` malformed by a missing-trailing-newline file).
|
||||
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
|
||||
write_env() {
|
||||
local key="$1" val="$2"
|
||||
# remove any existing key (commented or live) then append the live key=val
|
||||
sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
|
||||
# Re-ensure trailing newline after each delete (sed may leave the file without one)
|
||||
[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
|
||||
printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
|
||||
}
|
||||
write_env OIDC_REALM "$KC_REALM"
|
||||
write_env OIDC_OP_DISCOVERY_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
|
||||
write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
|
||||
write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
|
||||
write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
|
||||
write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
|
||||
write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
|
||||
write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
|
||||
write_env OIDC_RP_SIGN_ALGO "RS256"
|
||||
write_env OIDC_RP_SCOPES "openid email profile"
|
||||
|
||||
# 3) Trigger an in-place redeploy so the env update takes effect. --force re-deploys even when
|
||||
# the recipe hasn't changed; --chaos avoids the chaos prompt; --no-input non-interactive.
|
||||
abra app deploy "$CCCI_APP_DOMAIN" --force --chaos --no-input 2>&1 | tail -10
|
||||
|
||||
echo " lasuite-docs setup_custom_tests: OIDC wired + redeployed"
|
||||
@ -3,12 +3,12 @@
|
||||
Drive (La Suite Drive) is OIDC-required: login is gated by an external OpenID Connect provider.
|
||||
Mirrors the proven lasuite-docs SSO model:
|
||||
- The orchestrator deploys a per-run keycloak dep AFTER the generic tiers and provisions a fresh
|
||||
realm/client/user via `harness.sso.setup_keycloak_realm`; `setup_custom_tests.sh` then wires the
|
||||
realm/client/user via `harness.sso.setup_keycloak_realm`; `install_steps.sh` then wires the
|
||||
OIDC env + client secret into the running drive app and redeploys. Creds land in `$CCCI_DEPS_FILE`
|
||||
(read here via the `deps_creds` fixture).
|
||||
(read here via the `deps` fixture).
|
||||
- This test consumes those creds and exercises the real OIDC flow against the dep keycloak: discovery
|
||||
endpoint advertises the realm, and a password grant yields a valid JWT with the expected claims.
|
||||
- Marked `@pytest.mark.requires_deps` so if setup_custom_tests failed the test SKIPs with a clear
|
||||
- Marked `@pytest.mark.requires_deps` so if dep provisioning failed the test SKIPs with a clear
|
||||
`deps-not-ready` reason — and (per F2-11) the orchestrator then fails the run rather than going
|
||||
green on a skipped SSO test.
|
||||
|
||||
@ -36,13 +36,13 @@ def _b64url_decode(seg: str) -> bytes:
|
||||
|
||||
|
||||
@pytest.mark.requires_deps
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps):
|
||||
"""The dep keycloak issues a JWT for the pre-provisioned test user via OIDC password grant."""
|
||||
assert "keycloak" in deps_creds, (
|
||||
f"keycloak creds not in deps_creds; got {list(deps_creds.keys())}. "
|
||||
"setup_custom_tests should have populated this."
|
||||
assert "keycloak" in deps, (
|
||||
f"keycloak creds not in deps; got {list(deps.keys())}. "
|
||||
"dep provisioning should have populated this."
|
||||
)
|
||||
kc = deps_creds["keycloak"]
|
||||
kc = deps["keycloak"]
|
||||
|
||||
# Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
|
||||
assert kc["domain"]
|
||||
|
||||
@ -6,7 +6,7 @@
|
||||
# BEFORE the single `abra app deploy` (runner/harness/lifecycle.py::_run_install_steps). By writing
|
||||
# the OIDC env + the real client secret into the app's `.env` HERE, the recipe deploys ONCE with
|
||||
# OIDC already wired — eliminating the flaky post-deploy in-place `--force --chaos` 12-service
|
||||
# reconverge that the old setup_custom_tests.sh did (collabora WOPI-discovery race; see JOURNAL
|
||||
# post-deploy reconverge (collabora WOPI-discovery race; see JOURNAL
|
||||
# Step 0). The orchestrator provisions the per-run realm/client on the live-warm keycloak BEFORE
|
||||
# this hook and writes $CCCI_DEPS_FILE (the recipe→creds dict).
|
||||
#
|
||||
|
||||
@ -5,6 +5,7 @@ in the `db` service. The backup path exercises the recipe's pg_backup.sh DB-dump
|
||||
backupbot-labelled)."""
|
||||
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
|
||||
@ -12,6 +13,47 @@ sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner")
|
||||
from harness import lifecycle # noqa: E402
|
||||
|
||||
|
||||
def pre_install(ctx):
    """Post-deploy seed for the custom tier (the former setup_custom_tests.sh, moved here in rcust
    P2b — install_steps.sh runs PRE-deploy and cannot touch the live stack). The deploy alone does
    NOT create the MinIO bucket: `minio-createbuckets` is a `replicas:0` one-shot (restart_policy:
    none) that must be triggered. The MinIO storage test asserts the bucket exists, so trigger it
    here and poll. `--detach` is REQUIRED: the job creates the bucket then EXITS 0, so it never
    holds a steady 1/1 replica — a blocking scale would wait forever."""
    stack = ctx.domain.replace(".", "_")
    print(" pre_install: creating MinIO bucket via the minio-createbuckets one-shot", flush=True)
    subprocess.run(
        ["docker", "service", "scale", "--detach", f"{stack}_minio-createbuckets=1"],
        capture_output=True,
        check=False,
    )
    check = (
        'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" '
        '"$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && '
        "mc ls _c/drive-media-storage >/dev/null 2>&1"
    )
    for i in range(30):
        cid = subprocess.run(
            ["docker", "ps", "-q", "-f", f"name={stack}_minio.1"],
            capture_output=True,
            text=True,
            check=False,
        ).stdout.split()
        if cid and (
            subprocess.run(
                ["docker", "exec", cid[0], "sh", "-c", check], capture_output=True, check=False
            ).returncode
            == 0
        ):
            print(
                f" pre_install: bucket drive-media-storage present after {i + 1} poll(s)",
                flush=True,
            )
            return
        time.sleep(3)
    raise AssertionError("minio-createbuckets one-shot did not create drive-media-storage in 90s")
|
||||
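The bounded poll in `pre_install` above (30 attempts, 3 s apart, then fail loudly) is a pattern that can be exercised without Docker by separating the loop from the check. The sketch below is illustrative only and is not part of this change; `poll_until` is a hypothetical helper, and the injectable `sleep` parameter is an assumption added so the loop can be tested without real delays.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `check` up to `attempts` times; return the 1-based attempt that passed.

    Raises AssertionError when every attempt fails, mirroring how the
    hook above turns a missing bucket into a hard failure.
    """
    for i in range(attempts):
        if check():
            return i + 1
        sleep(delay)
    raise AssertionError(f"condition not met after {attempts} attempts")
```

Unlike the deleted shell loop, which fell through silently after the last poll, both this sketch and the new `pre_install` raise once the budget is exhausted.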
|
||||
|
||||
def _wait_collabora_ready(domain, timeout=420):
|
||||
"""Gate the upgrade op on collabora being FULLY ready (WOPI discovery endpoint → 200), not just
|
||||
container 1/1 'running'. coolwsd takes ~2min to boot (pre-reads 1300+ l10n files + RSA keygen);
|
||||
@ -49,21 +91,21 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
def pre_upgrade(ctx):
|
||||
# Gate the chaos redeploy on a fully-ready collabora (else it kills a still-booting coolwsd and
|
||||
# abra aborts the upgrade deploy — Q3.2a run 1). Then seed the data-integrity marker.
|
||||
_wait_collabora_ready(domain)
|
||||
_seed(domain, "upgrade-survives")
|
||||
_wait_collabora_ready(ctx.domain)
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# drop the marker table (diverge from the backup) so a successful restore is observable
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
@ -18,34 +18,31 @@ DEPLOY_TIMEOUT = 1800
|
||||
HTTP_TIMEOUT = 900
|
||||
|
||||
# Base deploy/lifecycle proven cold-green @2026-05-28 (install: pass; 12 services incl.
|
||||
# onlyoffice+collabora) once the Docker Hub rate limit was fixed. The keycloak SSO dep is now
|
||||
# enabled: declaring DEPS triggers the orchestrator's setup_custom_tests step (deploy keycloak +
|
||||
# provision realm/client/user + run tests/lasuite-drive/setup_custom_tests.sh to wire OIDC env +
|
||||
# in-place redeploy). functional/test_oidc_with_keycloak.py then exercises the SSO flow.
|
||||
# onlyoffice+collabora) once the Docker Hub rate limit was fixed. Declaring DEPS makes the
|
||||
# orchestrator provision keycloak (realm/client/user) BEFORE the single deploy;
|
||||
# functional/test_oidc_with_keycloak.py then exercises the SSO flow.
|
||||
DEPS = ["keycloak"]
|
||||
|
||||
# Q3.2a (plan-lasuite-drive-oidc-robustness.md Part A): wire OIDC at INSTALL time, not via a
|
||||
# post-deploy in-place `--chaos` redeploy. The orchestrator provisions the per-run realm on the
|
||||
# live-warm keycloak BEFORE the single `abra app deploy`, and tests/lasuite-drive/install_steps.sh
|
||||
# writes the OIDC env + client secret into the .env that one deploy reads. This eliminates the flaky
|
||||
# 12-service reconverge (collabora WOPI-discovery race; JOURNAL Step 0). Drive boots fine with OIDC
|
||||
# env set because keycloak is live-warm (discovery reachable at boot). setup_custom_tests.sh now
|
||||
# only triggers the post-deploy MinIO bucket one-shot.
|
||||
OIDC_AT_INSTALL = True
|
||||
# OIDC is wired at INSTALL time (the only deps mode since rcust P2b; Q3.2a pioneered it here):
|
||||
# the orchestrator provisions the per-run realm on the live-warm keycloak BEFORE the single
|
||||
# `abra app deploy`, and tests/lasuite-drive/install_steps.sh writes the OIDC env + client secret
|
||||
# into the .env that one deploy reads. No post-deploy reconverge (the flaky 12-service collabora
|
||||
# WOPI race is structurally gone). The post-deploy MinIO bucket one-shot lives in ops.py
|
||||
# pre_install (the former setup_custom_tests.sh, deleted in P2b).
|
||||
|
||||
|
||||
def READY_PROBE(domain):
|
||||
def READY_PROBE(ctx):
|
||||
"""Readiness signals beyond replica-convergence + the app HEALTH_PATH (Q3.2/F2-12). collabora's
|
||||
coolwsd reports its container 1/1 'running' while still doing jail/config init, and its WOPI
|
||||
discovery endpoint 404s until ready — so the harness waits for `/hosting/discovery` → 200 on the
|
||||
collabora sibling host after the install deploy AND after the upgrade chaos redeploy. This is what
|
||||
makes the heavy prev→PR-head crossover reliably green (the new collabora 25.04.9.x finishes init
|
||||
within swarm's healthcheck retries; abra's own converge monitor was too impatient — F2-12)."""
|
||||
label, _, rest = domain.partition(".")
|
||||
return [{"host": f"collabora-{domain}", "path": "/hosting/discovery", "ok": (200,)}]
|
||||
label, _, rest = ctx.domain.partition(".")
|
||||
return [{"host": f"collabora-{ctx.domain}", "path": "/hosting/discovery", "ok": (200,)}]
|
||||
|
||||
|
||||
def EXTRA_ENV(domain):
|
||||
def EXTRA_ENV(ctx):
|
||||
# Two of lasuite-drive's services route on DOMAIN-DERIVED **nested** subdomains —
|
||||
# `MINIO_DOMAIN="minio.${DOMAIN}"` and `COLLABORA_DOMAIN="collabora.${DOMAIN}"`. The cc-ci
|
||||
# wildcard TLS cert is `*.ci.commoninternet.net` (single label only), so a 2-label name like
|
||||
@ -55,8 +52,8 @@ def EXTRA_ENV(domain):
|
||||
# no cert/gateway change. See DECISIONS.md "Phase 2 — nested DOMAIN-derived subdomains".
|
||||
# `AWS_S3_DOMAIN_REPLACE` derives from MINIO_DOMAIN in-compose, so setting MINIO_DOMAIN is enough.
|
||||
return {
|
||||
"MINIO_DOMAIN": f"minio-{domain}",
|
||||
"COLLABORA_DOMAIN": f"collabora-{domain}",
|
||||
"MINIO_DOMAIN": f"minio-{ctx.domain}",
|
||||
"COLLABORA_DOMAIN": f"collabora-{ctx.domain}",
|
||||
# abra's internal per-deploy convergence timeout (recipe TIMEOUT env, default 300s) is too
|
||||
# short for this 12-service stack on a cold image cache (impress frontend/backend, minio,
|
||||
# postgres, redis, collabora ~1GB, onlyoffice ~2GB). Bump so abra waits long enough for
|
||||
|
||||
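The `EXTRA_ENV` comment above explains why the sibling hosts are flattened to `minio-<domain>` and `collabora-<domain>`: the wildcard certificate `*.ci.commoninternet.net` covers exactly one extra label. A minimal sketch of that matching rule follows, to make the reasoning concrete. `covered_by_wildcard` is a hypothetical helper and the app domain in the usage note is invented for illustration; neither appears in the harness.

```python
def covered_by_wildcard(host: str, wildcard: str) -> bool:
    """True if `host` matches a single-label wildcard such as '*.example.net'."""
    if not wildcard.startswith("*."):
        # Not a wildcard pattern: only an exact match counts.
        return host == wildcard
    suffix = wildcard[1:]  # e.g. '.example.net'
    if not host.endswith(suffix):
        return False
    label = host[: -len(suffix)]
    # The wildcard stands for exactly one non-empty label (no dots).
    return bool(label) and "." not in label
```

For a hypothetical app domain `app-1a2b3c.ci.commoninternet.net`, the nested name `minio.app-1a2b3c.ci.commoninternet.net` is not covered, while the flattened `minio-app-1a2b3c.ci.commoninternet.net` is.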
@ -1,39 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
# lasuite-drive — POST-DEPLOY setup hook (Phase 2 Q3.2a).
|
||||
#
|
||||
# As of Q3.2a (plan-lasuite-drive-oidc-robustness.md Part A) OIDC is wired at INSTALL time by
|
||||
# tests/lasuite-drive/install_steps.sh (before the single `abra app deploy`), so this hook NO LONGER
|
||||
# does any OIDC env wiring or in-place redeploy — that eliminated the flaky 12-service reconverge
|
||||
# (collabora WOPI race; see JOURNAL Step 0). What remains here is the ONE post-deploy step that
|
||||
# genuinely needs the live stack: triggering the MinIO bucket-creation one-shot. The orchestrator
|
||||
# runs this only on the install-time path AFTER the deploy is healthy (deps already provisioned).
|
||||
#
|
||||
# Env supplied by the orchestrator:
|
||||
# CCCI_APP_DOMAIN — the running per-run lasuite-drive app domain
|
||||
# CCCI_DEPS_FILE — JSON deps creds dict (unused here now; OIDC handled at install)
|
||||
set -euo pipefail
|
||||
|
||||
: "${CCCI_APP_DOMAIN:?missing}"
|
||||
|
||||
# The deploy alone does NOT create the MinIO bucket — `minio-createbuckets` is a `replicas:0`
|
||||
# one-shot (restart_policy: none) that must be triggered. The MinIO storage test asserts the bucket
|
||||
# exists, so create it here. `--detach` is REQUIRED: the job creates the bucket then EXITS 0, so it
|
||||
# never holds a steady 1/1 replica; a blocking `docker service scale ...=1` would wait forever and
|
||||
# hang the run. With `--detach` the scale just submits the one-run and returns; the poll loop below
|
||||
# confirms the bucket was actually created.
|
||||
STACK=$(printf '%s' "$CCCI_APP_DOMAIN" | tr '.' '_')
|
||||
echo " setup: creating MinIO bucket via the minio-createbuckets one-shot (scale 0->1)"
|
||||
docker service scale --detach "${STACK}_minio-createbuckets=1" >/dev/null 2>&1 || true
|
||||
# Wait up to 90s for the one-shot to create the bucket (mc mb drive/drive-media-storage; exit 0).
|
||||
# Poll by checking the bucket directly from the running minio replica container.
|
||||
for i in $(seq 1 30); do
|
||||
MC_CID=$(docker ps -q -f "name=${STACK}_minio.1" | head -1)
|
||||
if [ -n "$MC_CID" ] && docker exec "$MC_CID" sh -c \
|
||||
'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" "$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && mc ls _c/drive-media-storage >/dev/null 2>&1'; then
|
||||
echo " setup: bucket drive-media-storage present after ${i} poll(s)"
|
||||
break
|
||||
fi
|
||||
sleep 3
|
||||
done
|
||||
|
||||
echo " lasuite-drive setup_custom_tests: post-deploy MinIO bucket step complete (OIDC wired at install)"
|
||||
@ -36,8 +36,8 @@ def _b64url(seg: str) -> bytes:
|
||||
return base64.urlsafe_b64decode(seg + "=" * ((4 - len(seg) % 4) % 4))
|
||||
|
||||
|
||||
def _creds(deps_creds: dict) -> dict:
|
||||
kc = deps_creds["keycloak"]
|
||||
def _creds(deps: dict) -> dict:
|
||||
kc = deps["keycloak"]
|
||||
return {
|
||||
"provider": "keycloak",
|
||||
"provider_domain": kc["domain"],
|
||||
@ -55,10 +55,10 @@ def _creds(deps_creds: dict) -> dict:
|
||||
|
||||
|
||||
@pytest.mark.requires_deps
|
||||
def test_create_room_get_livekit_token_and_read_back(live_app, deps_creds):
|
||||
assert "keycloak" in deps_creds, f"keycloak creds missing; got {list(deps_creds.keys())}"
|
||||
def test_create_room_get_livekit_token_and_read_back(live_app, deps):
|
||||
assert "keycloak" in deps, f"keycloak creds missing; got {list(deps.keys())}"
|
||||
base = f"https://{live_app}"
|
||||
token = sso.oidc_password_grant(_creds(deps_creds))
|
||||
token = sso.oidc_password_grant(_creds(deps))
|
||||
assert isinstance(token, str) and token.count(".") == 2, "OIDC access token is not a JWT"
|
||||
auth = {"Authorization": f"Bearer {token}"}
|
||||
|
||||
|
||||
@ -3,12 +3,12 @@
|
||||
Meet (La Suite Meet) is OIDC-required: login is gated by an external OpenID Connect provider.
|
||||
Mirrors the proven lasuite-docs SSO model:
|
||||
- The orchestrator deploys a per-run keycloak dep AFTER the generic tiers and provisions a fresh
|
||||
realm/client/user via `harness.sso.setup_keycloak_realm`; `setup_custom_tests.sh` then wires the
|
||||
realm/client/user via `harness.sso.setup_keycloak_realm`; `install_steps.sh` then wires the
|
||||
OIDC env + client secret into the running drive app and redeploys. Creds land in `$CCCI_DEPS_FILE`
|
||||
(read here via the `deps_creds` fixture).
|
||||
(read here via the `deps` fixture).
|
||||
- This test consumes those creds and exercises the real OIDC flow against the dep keycloak: discovery
|
||||
endpoint advertises the realm, and a password grant yields a valid JWT with the expected claims.
|
||||
- Marked `@pytest.mark.requires_deps` so if setup_custom_tests failed the test SKIPs with a clear
|
||||
- Marked `@pytest.mark.requires_deps` so if dep provisioning failed the test SKIPs with a clear
|
||||
`deps-not-ready` reason — and (per F2-11) the orchestrator then fails the run rather than going
|
||||
green on a skipped SSO test.
|
||||
|
||||
@ -36,13 +36,13 @@ def _b64url_decode(seg: str) -> bytes:
|
||||
|
||||
|
||||
@pytest.mark.requires_deps
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
|
||||
def test_oidc_password_grant_against_dep_keycloak(live_app, deps):
|
||||
"""The dep keycloak issues a JWT for the pre-provisioned test user via OIDC password grant."""
|
||||
assert "keycloak" in deps_creds, (
|
||||
f"keycloak creds not in deps_creds; got {list(deps_creds.keys())}. "
|
||||
"setup_custom_tests should have populated this."
|
||||
assert "keycloak" in deps, (
|
||||
f"keycloak creds not in deps; got {list(deps.keys())}. "
|
||||
"dep provisioning should have populated this."
|
||||
)
|
||||
kc = deps_creds["keycloak"]
|
||||
kc = deps["keycloak"]
|
||||
|
||||
# Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
|
||||
assert kc["domain"]
|
||||
|
||||
@ -4,7 +4,8 @@
|
||||
# Runs during the install tier AFTER `abra app new` + EXTRA_ENV + `abra app secret generate`, and
|
||||
# BEFORE the single `abra app deploy` (lifecycle.py::_run_install_steps). Writing OIDC env + the real
|
||||
# client secret HERE means the recipe deploys ONCE with OIDC already wired — no post-deploy reconverge
|
||||
# (OIDC_AT_INSTALL). The orchestrator provisions the per-run realm/client on the live-warm keycloak
|
||||
# (install-time deps wiring — the only mode since rcust P2b). The orchestrator provisions the
|
||||
# per-run realm/client on the live-warm keycloak
|
||||
# BEFORE this hook and writes $CCCI_DEPS_FILE (the recipe→creds dict).
|
||||
#
|
||||
# Meet's OIDC is REQUIRED (recipe README). Same La Suite/impress env contract as drive, with meet's
|
||||
|
||||
@ -27,18 +27,18 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# drop the marker table (diverge from the backup) so a successful restore is observable
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
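The hunk above moves the lifecycle hooks from a `(domain, meta)` signature to a single `ctx` argument. A rough sketch of that calling convention is given here, assuming the context is a small immutable object. Only `ctx.domain` is confirmed by the diff; the `meta` field, the `run_hook` dispatcher and the recording `_seed` stand-in are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class HookCtx:
    """Assumed shape: only `.domain` is confirmed by the hooks in this diff."""

    domain: str
    meta: Any = None  # hypothetical extra field


calls: list[tuple[str, str]] = []


def _seed(domain: str, value: str) -> None:
    # Stand-in for the recipe's real _seed(), which writes to the app database.
    calls.append((domain, value))


def pre_upgrade(ctx: HookCtx) -> None:
    _seed(ctx.domain, "upgrade-survives")


def run_hook(hook: Callable[[HookCtx], None], domain: str) -> None:
    """Hypothetical dispatcher: build one context object and hand it to the hook."""
    hook(HookCtx(domain=domain))


run_hook(pre_upgrade, "app.example")
```

Passing one object means later additions to the context do not require touching every hook signature again.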
@ -13,16 +13,15 @@ HEALTH_OK = (200, 301, 302)
|
||||
DEPLOY_TIMEOUT = 1200
|
||||
HTTP_TIMEOUT = 600
|
||||
|
||||
# SSO-dependent (recipe.toml requires=["keycloak"], [sso] provider=keycloak). Wire OIDC at INSTALL
|
||||
# time against the live-warm keycloak — same machinery as lasuite-drive (Q3.2a): the orchestrator
|
||||
# provisions the per-run realm BEFORE the single `abra app deploy`, and tests/lasuite-meet/
|
||||
# install_steps.sh writes the OIDC env + client secret into that one deploy (no post-deploy
|
||||
# reconverge). Meet boots fine with OIDC env set because keycloak is live-warm.
|
||||
# SSO-dependent (recipe.toml requires=["keycloak"], [sso] provider=keycloak). OIDC is wired at
|
||||
# INSTALL time (the only deps mode since rcust P2b) against the live-warm keycloak: the
|
||||
# orchestrator provisions the per-run realm BEFORE the single `abra app deploy`, and
|
||||
# tests/lasuite-meet/install_steps.sh writes the OIDC env + client secret into that one deploy
|
||||
# (no post-deploy reconverge). Meet boots fine with OIDC env set because keycloak is live-warm.
|
||||
DEPS = ["keycloak"]
|
||||
OIDC_AT_INSTALL = True
|
||||
|
||||
|
||||
def EXTRA_ENV(domain):
|
||||
def EXTRA_ENV(ctx):
|
||||
# lasuite-meet routes LiveKit's WebSocket signaling on a DOMAIN-derived **nested** subdomain
|
||||
# `LIVEKIT_DOMAIN="livekit.${DOMAIN}"`. The cc-ci wildcard TLS cert is `*.ci.commoninternet.net`
|
||||
# (single label only), so a 2-label name like `livekit.lasuite-meet-pr0-abc.ci.commoninternet.net`
|
||||
@ -31,7 +30,7 @@ def EXTRA_ENV(domain):
|
||||
# no cert/gateway change. Same fix as lasuite-drive's minio/collabora siblings (DECISIONS.md
|
||||
# "Phase 2 — nested DOMAIN-derived subdomains").
|
||||
return {
|
||||
"LIVEKIT_DOMAIN": f"livekit-{domain}",
|
||||
"LIVEKIT_DOMAIN": f"livekit-{ctx.domain}",
|
||||
# abra's internal per-deploy convergence TIMEOUT (default 300s) is too short for this stack on
|
||||
# a cold image cache; bump it (kept under DEPLOY_TIMEOUT so Python never kills abra mid-wait).
|
||||
"TIMEOUT": "1000",
|
||||
|
||||
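The `EXTRA_ENV` comment above explains why `LIVEKIT_DOMAIN` is flattened to `livekit-<domain>`: a wildcard certificate name covers exactly one leftmost label. A small sketch of that matching rule, using the hostnames quoted in the comment; the function name `wildcard_covers` is illustrative only.

```python
def wildcard_covers(cert_name: str, host: str) -> bool:
    """True if `cert_name` covers `host`; a leading `*` spans exactly one DNS label."""
    cert_name, host = cert_name.lower(), host.lower()
    if not cert_name.startswith("*."):
        return cert_name == host
    suffix = cert_name[1:]  # e.g. ".ci.commoninternet.net"
    if not host.endswith(suffix):
        return False
    leftmost = host[: -len(suffix)]
    # The part replaced by `*` must be non-empty and must not contain a dot.
    return bool(leftmost) and "." not in leftmost


CERT = "*.ci.commoninternet.net"
app = "lasuite-meet-pr0-abc.ci.commoninternet.net"
nested = f"livekit.{app}"  # two labels in front of the wildcard suffix
flat = f"livekit-{app}"    # still a single label
```

With these values the app domain and the flattened name are covered, while the nested name is not, which is the reason for the override.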
@ -21,10 +21,10 @@ DEPLOY_TIMEOUT = 900
|
||||
HTTP_TIMEOUT = 600
|
||||
|
||||
|
||||
def EXTRA_ENV(domain):
|
||||
def EXTRA_ENV(ctx):
|
||||
return {
|
||||
"MAIL_DOMAIN": domain,
|
||||
"HOSTNAMES": domain,
|
||||
"MAIL_DOMAIN": ctx.domain,
|
||||
"HOSTNAMES": ctx.domain,
|
||||
"TRAEFIK_STACK_NAME": "traefik_ci_commoninternet_net",
|
||||
"TLS_FLAVOR": "notls",
|
||||
"SITENAME": "ccci-mail",
|
||||
|
||||
@ -24,18 +24,18 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# drop the marker table (diverge from the backup) so a successful restore is observable
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
@ -29,18 +29,18 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# drop the marker table (diverge from the backup) so a successful restore is observable
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
@ -26,9 +26,9 @@ def test_configured_max_users_surfaces_in_serverconfig(live_app):
|
||||
assert r["server_sync"], f"ServerSync handshake did not complete — {r.get('error')}"
|
||||
cfg = r["server_config"]
|
||||
assert cfg, f"server did not send a ServerConfig message — {r!r}"
|
||||
assert cfg.get("max_users") == recipe_meta.MAX_USERS, (
|
||||
assert cfg.get("max_users") == recipe_meta._MAX_USERS, (
|
||||
f"ServerConfig.max_users={cfg.get('max_users')!r} does not match the configured "
|
||||
f"USERS={recipe_meta.MAX_USERS} — deploy-time server-limit config did not propagate"
|
||||
f"USERS={recipe_meta._MAX_USERS} — deploy-time server-limit config did not propagate"
|
||||
)
|
||||
# allow_html defaults true in the recipe; assert it is present/boolean to prove the field set
|
||||
# is the real ServerConfig (not an empty/garbled decode).
|
||||
|
||||
@ -20,7 +20,7 @@ import recipe_meta # noqa: E402
|
||||
|
||||
|
||||
def test_configured_welcome_text_surfaces_in_serversync(live_app):
|
||||
marker = recipe_meta.WELCOME_TEXT_MARKER
|
||||
marker = recipe_meta._WELCOME_TEXT_MARKER
|
||||
r = _mumble_proto.retry_handshake(attempts=12, interval=5.0)
|
||||
|
||||
assert r["server_sync"], f"ServerSync handshake did not complete — {r.get('error')}"
|
||||
|
||||
@ -38,16 +38,18 @@ def _seed(domain, value):
|
||||
assert got == value, f"seed did not commit (read back {got!r}, expected {value!r})"
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
def pre_restore(ctx):
|
||||
# diverge from the backup so a successful restore is observable: drop the marker table.
|
||||
_sqlite(domain, "DROP TABLE IF EXISTS ci_marker;")
|
||||
got = _sqlite(domain, "SELECT name FROM sqlite_master WHERE type='table' AND name='ci_marker';")
|
||||
_sqlite(ctx.domain, "DROP TABLE IF EXISTS ci_marker;")
|
||||
got = _sqlite(
|
||||
ctx.domain, "SELECT name FROM sqlite_master WHERE type='table' AND name='ci_marker';"
|
||||
)
|
||||
assert got == "", f"drop did not take (sqlite_master still lists ci_marker: {got!r})"
|
||||
|
||||
@ -31,18 +31,19 @@ HEALTH_OK = (200,)
|
||||
DEPLOY_TIMEOUT = 900 # two images to pull (mumble-server + mumble-web) on a cold node
|
||||
HTTP_TIMEOUT = 300
|
||||
|
||||
# A unique, stable welcome-text marker the round-trip test asserts surfaces over the protocol.
|
||||
WELCOME_TEXT_MARKER = "cc-ci-mumble-welcome-7f3a9c"
|
||||
# A unique, stable welcome-text marker the round-trip test asserts surfaces over the protocol
|
||||
# (underscore prefix = recipe-private constant, exempt from registry validation — rcust P1).
|
||||
_WELCOME_TEXT_MARKER = "cc-ci-mumble-welcome-7f3a9c"
|
||||
# A distinctive max-users value (not the recipe default 100) the server_config test asserts.
|
||||
MAX_USERS = 42
|
||||
_MAX_USERS = 42
|
||||
|
||||
# BASE deploy (0.2.0): mumble-web only — NO host-ports (0.2.0 predates it). The voice-config env is
|
||||
# set here and persists across the upgrade so it takes effect on the latest (where the custom config
|
||||
# round-trip tests assert it).
|
||||
EXTRA_ENV = {
|
||||
"COMPOSE_FILE": "compose.yml:compose.mumbleweb.yml",
|
||||
"WELCOME_TEXT": WELCOME_TEXT_MARKER,
|
||||
"USERS": str(MAX_USERS),
|
||||
"WELCOME_TEXT": _WELCOME_TEXT_MARKER,
|
||||
"USERS": str(_MAX_USERS),
|
||||
}
|
||||
|
||||
# UPGRADE-target deploy (latest 1.0.0+): add the NATIVE compose.host-ports.yml so 64738 is
|
||||
@ -52,7 +53,7 @@ UPGRADE_EXTRA_ENV = {
|
||||
}
|
||||
|
||||
|
||||
def READY_PROBE(domain):
|
||||
def READY_PROBE(ctx):
|
||||
# The voice server on 64738 is testable on-host ONLY when compose.host-ports.yml is active — i.e.
|
||||
# the post-upgrade LATEST, not the minimal 0.2.0 base. Read the live COMPOSE_FILE to decide, so the
|
||||
# SAME probe fn is correct in both phases: the post-install probe (base, no host-ports) returns []
|
||||
@ -63,7 +64,7 @@ def READY_PROBE(domain):
|
||||
# backup-bot would then exec into a not-running app container -> 409).
|
||||
from harness import abra # lazy: recipe_meta is exec'd with `harness` importable at call time
|
||||
|
||||
cf = abra.env_get(domain, "COMPOSE_FILE") or ""
|
||||
cf = abra.env_get(ctx.domain, "COMPOSE_FILE") or ""
|
||||
if "compose.host-ports.yml" in cf:
|
||||
return [{"tcp_host": "127.0.0.1", "tcp_port": 64738, "stable": 3}]
|
||||
return []
|
||||
|
||||
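The mumble `recipe_meta.py` hunks above rename `WELCOME_TEXT_MARKER` and `MAX_USERS` with an underscore prefix so that they are exempt from registry validation. One way such a check could work is sketched below, assuming validation only looks at public ALL-CAPS module-level names. The `REGISTERED` subset, `check_names` and this local `MetaError` are hypothetical; the real logic lives in `harness.meta` and may differ.

```python
import difflib

# Hypothetical subset of the registry; the real key list lives in harness.meta.
REGISTERED = {
    "HEALTH_PATH", "HEALTH_OK", "DEPLOY_TIMEOUT", "HTTP_TIMEOUT",
    "EXTRA_ENV", "READY_PROBE", "DEPS",
}


class MetaError(ValueError):
    """Raised when a recipe meta defines an unregistered public key."""


def check_names(namespace: dict) -> None:
    """Reject unregistered public ALL-CAPS names; skip `_private` and lowercase names."""
    for name in namespace:
        if name.startswith("_") or not name.isupper():
            continue
        if name not in REGISTERED:
            near = difflib.get_close_matches(name, sorted(REGISTERED), n=1)
            hint = f"; did you mean {near[0]}?" if near else ""
            raise MetaError(f"unknown recipe_meta key {name}{hint}")
```

Under this sketch `_MAX_USERS = 42` passes untouched, while a bare `MAX_USERS = 42` would be rejected as an unregistered key.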
@ -15,13 +15,13 @@ def _write(domain, val):
|
||||
lifecycle.exec_in_app(domain, ["sh", "-c", f"echo {val} > {MARKER}"])
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_write(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_write(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_write(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_write(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
_write(domain, "mutated") # diverge so a successful restore is observable
|
||||
def pre_restore(ctx):
|
||||
_write(ctx.domain, "mutated") # diverge so a successful restore is observable
|
||||
|
||||
@ -24,17 +24,17 @@ def _seed(domain, value):
|
||||
assert _psql(domain, "SELECT v FROM ci_marker;") == value
|
||||
|
||||
|
||||
def pre_upgrade(domain, meta):
|
||||
_seed(domain, "upgrade-survives")
|
||||
def pre_upgrade(ctx):
|
||||
_seed(ctx.domain, "upgrade-survives")
|
||||
|
||||
|
||||
def pre_backup(domain, meta):
|
||||
_seed(domain, "original")
|
||||
def pre_backup(ctx):
|
||||
_seed(ctx.domain, "original")
|
||||
|
||||
|
||||
def pre_restore(domain, meta):
|
||||
_psql(domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
def pre_restore(ctx):
|
||||
_psql(ctx.domain, "DROP TABLE ci_marker;")
|
||||
assert _psql(ctx.domain, "SELECT to_regclass('public.ci_marker');") in (
|
||||
"",
|
||||
"NULL",
|
||||
), "drop did not take"
|
||||
|
||||
@ -13,6 +13,7 @@ import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import canonical, warm # noqa: E402
|
||||
from harness import meta as harness_meta # noqa: E402
|
||||
|
||||
|
||||
def test_canonical_domain():
|
||||
@ -33,11 +34,9 @@ def test_is_enrolled_reads_flag(tmp_path, monkeypatch):
|
||||
tests_dir = tmp_path / "tests" / recipe
|
||||
tests_dir.mkdir(parents=True)
|
||||
(tests_dir / "recipe_meta.py").write_text("WARM_CANONICAL = True\n")
|
||||
# canonical.is_enrolled builds the path from canonical.__file__/../../tests/<recipe>; emulate by
|
||||
# creating the layout under a fake harness dir and pointing __file__ there.
|
||||
fake_harness = tmp_path / "runner" / "harness"
|
||||
fake_harness.mkdir(parents=True)
|
||||
monkeypatch.setattr(canonical, "__file__", str(fake_harness / "canonical.py"))
|
||||
# is_enrolled reads through the single meta loader (rcust P1); point its tests/ root at the
|
||||
# temp layout.
|
||||
monkeypatch.setattr(harness_meta, "TESTS_DIR", str(tmp_path / "tests"))
|
||||
assert canonical.is_enrolled(recipe) is True
|
||||
(tests_dir / "recipe_meta.py").write_text("WARM_CANONICAL = False\n")
|
||||
assert canonical.is_enrolled(recipe) is False
|
||||
@ -65,9 +64,7 @@ def test_registry_roundtrip(tmp_path, monkeypatch):
|
||||
|
||||
def test_enrolled_recipes_scans_meta(tmp_path, monkeypatch):
|
||||
# enrolled_recipes() lists recipes whose tests/<r>/recipe_meta.py sets WARM_CANONICAL=True.
|
||||
fake_harness = tmp_path / "runner" / "harness"
|
||||
fake_harness.mkdir(parents=True)
|
||||
monkeypatch.setattr(canonical, "__file__", str(fake_harness / "canonical.py"))
|
||||
monkeypatch.setattr(harness_meta, "TESTS_DIR", str(tmp_path / "tests"))
|
||||
for name, body in (
|
||||
("aaa", "WARM_CANONICAL = True\n"),
|
||||
("bbb", "DEPS=['x']\n"),
|
||||
|
||||
48
tests/unit/test_conftest_fixtures.py
Normal file
@ -0,0 +1,48 @@
|
||||
"""Unit tests for the shared conftest fixtures added/reshaped by the rcust restructure (P2d/P4):
|
||||
`op_state` (run-scoped op context from $CCCI_OP_STATE_FILE) and `deps` (consolidated dep creds
|
||||
with attribute sugar). Pure — exercised via request.getfixturevalue with env monkeypatched."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
|
||||
import pytest
|
||||
|
||||
|
||||
def test_op_state_fixture_reads_file(tmp_path, monkeypatch, request):
    f = tmp_path / "op.json"
    f.write_text(json.dumps({"backup": {"snapshot_id": "abc123"}, "upgrade": {"head_ref": "h"}}))
    monkeypatch.setenv("CCCI_OP_STATE_FILE", str(f))
    st = request.getfixturevalue("op_state")
    assert st["backup"]["snapshot_id"] == "abc123"
    assert st["upgrade"]["head_ref"] == "h"


def test_op_state_fixture_skips_without_env(monkeypatch, request):
    monkeypatch.delenv("CCCI_OP_STATE_FILE", raising=False)
    with pytest.raises(pytest.skip.Exception, match="orchestrator"):
        request.getfixturevalue("op_state")


def test_op_state_fixture_skips_on_missing_file(tmp_path, monkeypatch, request):
    monkeypatch.setenv("CCCI_OP_STATE_FILE", str(tmp_path / "nope.json"))
    with pytest.raises(pytest.skip.Exception, match="missing"):
        request.getfixturevalue("op_state")


def test_deps_fixture_entries_expose_attributes(tmp_path, monkeypatch, request):
    """`deps` (session-scoped) coerces the run deps file into entries with .domain/.realm/...
    attribute sugar while keeping dict-style access (rcust P2d). Single test for the session-
    cached fixture (one instantiation)."""
    f = tmp_path / "deps.json"
    f.write_text(
        json.dumps(
            {"keycloak": {"recipe": "keycloak", "domain": "kc.x", "client_secret": "s3cret"}}
        )
    )
    monkeypatch.setenv("CCCI_DEPS_FILE", str(f))
    deps = request.getfixturevalue("deps")
    assert deps["keycloak"].domain == "kc.x"
    assert deps["keycloak"]["client_secret"] == "s3cret"
    with pytest.raises(AttributeError):
        _ = deps["keycloak"].not_a_field
|
||||
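The last test above expects `deps` entries to answer both `entry["key"]` and `entry.key`, and to raise `AttributeError` for unknown fields. A minimal sketch of one way to get that behaviour, assuming a plain `dict` subclass is acceptable; `DepEntry` and `coerce_deps` are hypothetical names and not the fixture's actual implementation.

```python
class DepEntry(dict):
    """Dict with read-only attribute access to its keys (hypothetical name)."""

    def __getattr__(self, name: str):
        # Only reached when normal attribute lookup fails, so dict methods keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def coerce_deps(raw: dict) -> dict:
    """Wrap each per-dep creds dict so that entry["k"] and entry.k both work."""
    return {dep: DepEntry(creds) for dep, creds in raw.items()}


deps = coerce_deps(
    {"keycloak": {"recipe": "keycloak", "domain": "kc.x", "client_secret": "s3cret"}}
)
```

Raising `AttributeError` rather than letting `KeyError` escape keeps `getattr(entry, "x", default)` and `hasattr` working as usual.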
@ -1,9 +1,9 @@
|
||||
"""Unit tests for runner/harness/deps.py (Phase 2 §4.2 / Q2.3).
|
||||
|
||||
Pure-Python: no real deploys. Tests the declarative parts of the dep resolver — declared_deps
|
||||
reading from `tests/<recipe>/recipe_meta.py`, the per-dep domain derivation, and write/load of the
|
||||
run state file. The deploy_deps + teardown_deps integration is exercised by real e2e against cc-ci
|
||||
(Q2.4 acceptance).
|
||||
Pure-Python: no real deploys. Tests the declarative parts of the dep resolver — DEPS declaration
|
||||
(read through the single meta loader since rcust P1), the per-dep domain derivation, and write/load
|
||||
of the run state file. The deploy_deps + teardown_deps integration is exercised by real e2e against
|
||||
cc-ci (Q2.4 acceptance).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@ -13,42 +13,23 @@ import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import deps # noqa: E402
|
||||
from harness import meta as meta_mod # noqa: E402
|
||||
|
||||
|
||||
def test_declared_deps_returns_empty_for_no_meta(monkeypatch, tmp_path):
|
||||
"""A recipe with no recipe_meta.py returns []."""
|
||||
fake_recipe = "ccci-no-meta"
|
||||
# No file at tests/<fake_recipe>/recipe_meta.py -> declared_deps reads nothing -> []
|
||||
monkeypatch.chdir(tmp_path)
|
||||
assert deps.declared_deps(fake_recipe) == []
|
||||
def test_declared_deps_empty_for_no_meta(monkeypatch, tmp_path):
|
||||
"""A recipe with no recipe_meta.py declares no deps (rcust P1: DEPS via meta.load)."""
|
||||
monkeypatch.setattr(meta_mod, "TESTS_DIR", str(tmp_path / "tests"))
|
||||
assert meta_mod.load("ccci-no-meta").DEPS == []
|
||||
|
||||
|
||||
def test_declared_deps_reads_DEPS_list(tmp_path, monkeypatch):
|
||||
"""A recipe_meta.py with `DEPS = [...]` returns the list."""
|
||||
fake_recipe = "ccci-with-deps"
|
||||
# Build a fake repo layout under tmp_path
|
||||
recipe_dir = tmp_path / "tests" / fake_recipe
|
||||
"""A recipe_meta.py with `DEPS = [...]` surfaces the list on the loaded meta (the orchestrator
|
||||
reads meta.DEPS — the successor of the deleted deps.declared_deps loader)."""
|
||||
recipe_dir = tmp_path / "tests" / "ccci-with-deps"
|
||||
recipe_dir.mkdir(parents=True)
|
||||
(recipe_dir / "recipe_meta.py").write_text('HEALTH_PATH = "/"\nDEPS = ["keycloak", "redis"]\n')
|
||||
# Patch the deps module's idea of "where the repo is" by monkey-patching __file__ for the
|
||||
# function indirectly: declared_deps uses `os.path.dirname(__file__), "..", "..", "tests"` —
|
||||
# which resolves to the real repo's `tests/`. So instead, override that with a symlink/dir
|
||||
# under tmp_path: deps.__file__ points at the runner module. We can't easily relocate that.
|
||||
# Instead, mock the path by writing the fake recipe under the REAL tests/ dir.
|
||||
real_tests = os.path.join(os.path.dirname(deps.__file__), "..", "..", "tests")
|
||||
target_dir = os.path.join(real_tests, fake_recipe)
|
||||
os.makedirs(target_dir, exist_ok=True)
|
||||
target_meta = os.path.join(target_dir, "recipe_meta.py")
|
||||
try:
|
||||
with open(target_meta, "w") as f:
|
||||
f.write('DEPS = ["keycloak", "redis"]\n')
|
||||
result = deps.declared_deps(fake_recipe)
|
||||
assert result == ["keycloak", "redis"]
|
||||
finally:
|
||||
if os.path.exists(target_meta):
|
||||
os.remove(target_meta)
|
||||
if os.path.isdir(target_dir):
|
||||
os.rmdir(target_dir)
|
||||
monkeypatch.setattr(meta_mod, "TESTS_DIR", str(tmp_path / "tests"))
|
||||
assert meta_mod.load("ccci-with-deps").DEPS == ["keycloak", "redis"]
|
||||
|
||||
|
||||
def test_dep_domain_distinct_per_dep():
|
||||
|
||||
@ -71,17 +71,18 @@ def test_repo_local_wins_when_approved(tmp_path):
|
||||
|
||||
|
||||
def test_custom_tests_repo_local_gated(tmp_path, monkeypatch):
|
||||
# non-lifecycle test_*.py from repo-local only count for approved recipes; lifecycle names excluded
|
||||
# custom test_*.py from repo-local only count for approved recipes (HC2); placement rule
|
||||
# (rcust P4): custom tests live under functional/ (or playwright/) — top-level files are
|
||||
# lifecycle overlays only, so the repo-local custom here sits in functional/.
|
||||
# Use a synthetic recipe name + monkeypatched cc_ci_dir so this is independent of what
|
||||
# tests/<real-recipe>/ ships (Phase-2 custom-html now also ships functional/ + playwright/,
|
||||
# which would legitimately appear in custom_tests for "custom-html" — F2-1).
|
||||
# tests/<real-recipe>/ ships (F2-1).
|
||||
fake_recipe = "ccci-hc2-fixture"
|
||||
monkeypatch.setattr(discovery, "cc_ci_dir", lambda r: str(tmp_path / "cc-ci" / r))
|
||||
(tmp_path / "cc-ci" / fake_recipe).mkdir(parents=True)
|
||||
rl = tmp_path / "repo"
|
||||
rl.mkdir()
|
||||
(rl / "test_sso.py").write_text("# repo-local custom\n")
|
||||
(rl / "test_install.py").write_text("# lifecycle name -> excluded from custom\n")
|
||||
(rl / "functional").mkdir(parents=True)
|
||||
(rl / "functional" / "test_sso.py").write_text("# repo-local custom\n")
|
||||
(rl / "functional" / "test_install.py").write_text("# lifecycle name -> excluded from custom\n")
|
||||
|
||||
_approve(tmp_path) # not approved -> repo-local custom ignored
|
||||
assert discovery.custom_tests(fake_recipe, str(rl)) == []
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
"""Unit tests for Phase-2 discovery additions (plan §4.1).
|
||||
|
||||
Proves the `custom_tests` discovery recurses into the per-recipe `functional/` + `playwright/`
|
||||
Proves the `custom_tests` discovery covers exactly the per-recipe `functional/` + `playwright/`
|
||||
subdirs as well as the top-level dir, while still excluding lifecycle `test_<op>.py` names and
|
||||
honouring the HC2 repo-local approval gate.
|
||||
|
||||
@ -27,16 +27,16 @@ def teardown_function():
|
||||
os.environ.pop("CCCI_REPO_LOCAL_APPROVED_FILE", None)
|
||||
|
||||
|
||||
def test_custom_tests_recurses_functional_and_playwright(tmp_path, monkeypatch):
|
||||
"""A Phase-2 cc-ci recipe layout: functional/test_*.py + playwright/test_*.py + top-level
|
||||
test_*.py — all are discovered as custom tests; the lifecycle names are excluded."""
|
||||
def test_custom_tests_placement_rule_functional_playwright_only(tmp_path, monkeypatch):
|
||||
"""Placement rule (rcust P4): custom tests are discovered ONLY under functional/ +
|
||||
playwright/. A top-level non-lifecycle test_*.py is NOT discovered (top level is reserved
|
||||
for lifecycle overlays); lifecycle names inside the subdirs stay excluded (defensive)."""
|
||||
# Point cc-ci's per-recipe dir at a fake recipe in tmp_path
|
||||
fake_recipe = "ccci-phase2-fixture"
|
||||
fake_dir = tmp_path / "tests" / fake_recipe
|
||||
(fake_dir / "functional").mkdir(parents=True)
|
||||
(fake_dir / "playwright").mkdir()
|
||||
# legitimate custom tests at multiple levels
|
||||
(fake_dir / "test_sso_smoke.py").write_text("# top-level cross-cutting\n")
|
||||
(fake_dir / "test_sso_smoke.py").write_text("# top-level — NOT discovered since P4\n")
|
||||
(fake_dir / "functional" / "test_health_check.py").write_text("# parity port\n")
|
||||
(fake_dir / "functional" / "test_content_roundtrip.py").write_text("# recipe-specific\n")
|
||||
(fake_dir / "playwright" / "test_login_flow.py").write_text("# UI flow\n")
|
||||
@ -49,11 +49,11 @@ def test_custom_tests_recurses_functional_and_playwright(tmp_path, monkeypatch):
|
||||
customs = discovery.custom_tests(fake_recipe, None)
|
||||
names = sorted((src, os.path.basename(p)) for src, p in customs)
|
||||
|
||||
# Top-level + functional/ + playwright/ all discovered; lifecycle name excluded
|
||||
assert ("cc-ci", "test_sso_smoke.py") in names
|
||||
# functional/ + playwright/ discovered; top-level custom + lifecycle name are NOT
|
||||
assert ("cc-ci", "test_health_check.py") in names
|
||||
assert ("cc-ci", "test_content_roundtrip.py") in names
|
||||
assert ("cc-ci", "test_login_flow.py") in names
|
||||
assert ("cc-ci", "test_sso_smoke.py") not in names
|
||||
assert ("cc-ci", "test_install.py") not in names
|
||||
|
||||
|
||||
|
||||
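The updated test above pins the placement rule: custom tests are picked up only under `functional/` and `playwright/`, top-level files are reserved for lifecycle overlays, and lifecycle names are excluded even inside the subdirs. A simplified sketch of such a scan follows. It is not the real `discovery.custom_tests` (which also takes a repo-local dir and returns source/path pairs); `discover_custom` and the `LIFECYCLE` set are assumptions made for illustration.

```python
import os
import tempfile

# Assumed lifecycle file names; the diff only confirms the test_<op>.py pattern.
LIFECYCLE = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}
CUSTOM_SUBDIRS = ("functional", "playwright")


def discover_custom(recipe_dir: str) -> list[str]:
    """Return test_*.py paths under functional/ and playwright/ only, minus lifecycle names."""
    found = []
    for sub in CUSTOM_SUBDIRS:
        d = os.path.join(recipe_dir, sub)
        if not os.path.isdir(d):
            continue
        for name in sorted(os.listdir(d)):
            if name.startswith("test_") and name.endswith(".py") and name not in LIFECYCLE:
                found.append(os.path.join(d, name))
    return found


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "functional"))
    os.makedirs(os.path.join(root, "playwright"))
    for rel in (
        "test_sso_smoke.py",                # top level: ignored
        "functional/test_health_check.py",  # discovered
        "functional/test_install.py",       # lifecycle name: excluded
        "playwright/test_login_flow.py",    # discovered
    ):
        open(os.path.join(root, rel), "w").close()
    discovered = [os.path.basename(p) for p in discover_custom(root)]
```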
@ -30,7 +30,7 @@ def test_sso_dep_unverified_true_when_declared_notready_and_skipped():
|
||||
|
||||
|
||||
def test_sso_dep_unverified_false_when_deps_ready():
|
||||
"""deps ready (setup_custom_tests succeeded) → SSO tests actually ran → not a failure."""
|
||||
"""deps ready (dep provisioning succeeded) → SSO tests actually ran → not a failure."""
|
||||
assert not run_recipe_ci.sso_dep_unverified(
|
||||
["keycloak"], deps_ready=True, requires_deps_skipped=0
|
||||
)
|
||||
|
||||
@ -14,6 +14,7 @@ So `-c` + owned-wait is non-vacuous: a genuinely-broken upgrade stays RED.
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import dataclasses
|
||||
import os
|
||||
import sys
|
||||
|
||||
@ -21,6 +22,7 @@ import pytest
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import lifecycle as lc # noqa: E402
|
||||
from harness import meta as harness_meta # noqa: E402
|
||||
|
||||
|
||||
def _fake_clock(monkeypatch):
|
||||
@ -31,11 +33,15 @@ def _fake_clock(monkeypatch):
|
||||
return state
|
||||
|
||||
|
||||
_DRIVE_META = {
|
||||
"READY_PROBE": lambda d: [
|
||||
{"host": f"collabora-{d}", "path": "/hosting/discovery", "ok": (200,)}
|
||||
]
|
||||
}
|
||||
# RecipeMeta (rcust P1: wait_ready_probes reads meta.READY_PROBE off the loaded object); defaults
|
||||
# + the drive-style probe hook (P3 ctx signature: the probe receives a HookCtx).
|
||||
_DRIVE_META = dataclasses.replace(
|
||||
harness_meta.load("ccci-no-such-recipe"),
|
||||
READY_PROBE=lambda ctx: [
|
||||
{"host": f"collabora-{ctx.domain}", "path": "/hosting/discovery", "ok": (200,)}
|
||||
],
|
||||
)
|
||||
_NO_PROBE_META = harness_meta.load("ccci-no-such-recipe")
|
||||
|
||||
|
||||
def test_wait_ready_probes_raises_when_never_ready(monkeypatch):
|
||||
@ -57,7 +63,7 @@ def test_wait_ready_probes_returns_when_ready(monkeypatch):
|
||||
def test_wait_ready_probes_noop_without_probe(monkeypatch):
|
||||
"""A recipe with no READY_PROBE is a clean no-op (default behavior preserved for all recipes)."""
|
||||
monkeypatch.setattr(lc, "http_get", lambda *a, **k: 599) # would fail if it were consulted
|
||||
lc.wait_ready_probes({}, "x.ci.commoninternet.net", timeout=1) # no raise, no call
|
||||
lc.wait_ready_probes(_NO_PROBE_META, "x.ci.commoninternet.net", timeout=1) # no raise, no call
|
||||
|
||||
|
||||
def test_wait_healthy_raises_when_services_never_converge(monkeypatch):
|
||||
|
||||
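The lifecycle test hunk above builds `_DRIVE_META` by loading the defaults for a nonexistent recipe and overriding one field with `dataclasses.replace`. A self-contained sketch of that pattern, assuming the loaded meta is a dataclass (frozen here, which the diff does not confirm); this two-field `RecipeMeta` and the `Ctx` class are stand-ins, not the real `harness.meta` types.

```python
import dataclasses
from typing import Callable, Optional


@dataclasses.dataclass(frozen=True)
class RecipeMeta:
    """Two-field stand-in for the loaded meta object (the real one has more keys)."""

    HEALTH_PATH: str = "/"
    READY_PROBE: Optional[Callable] = None


class Ctx:
    """Minimal stand-in for the hook context; only `.domain` is used."""

    def __init__(self, domain: str):
        self.domain = domain


baseline = RecipeMeta()  # plays the role of load("<missing recipe>") defaults
with_probe = dataclasses.replace(
    baseline,
    READY_PROBE=lambda ctx: [{"host": f"collabora-{ctx.domain}", "path": "/hosting/discovery"}],
)
probes = with_probe.READY_PROBE(Ctx("x.example"))
```

`dataclasses.replace` returns a new object, so the shared baseline stays untouched for the no-probe case.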
276
tests/unit/test_meta.py
Normal file
@ -0,0 +1,276 @@
|
||||
"""Unit tests for the single recipe-meta loader + key registry (rcust P1; spec §8 R1/R6).
|
||||
|
||||
Covers: every in-repo recipe_meta.py loads clean through the registry (THE typo gate), validation
|
||||
hard-errors (unknown key, wrong type, callable on a data key), the zero-config baseline defaults
|
||||
(spec §2), the underscore exemption for recipe-private constants, and the registry↔generated-doc
|
||||
sync (P1.5; drift fails CI). Run: cc-ci-run -m pytest tests/unit/test_meta.py -q
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
import pytest
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import meta as meta_mod # noqa: E402
|
||||
from harness.meta import KEYS, MetaError, RecipeMeta # noqa: E402
|
||||
|
||||
ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
||||
|
||||
|
||||
def _recipes_with_meta() -> list[str]:
|
||||
tests_dir = os.path.join(ROOT, "tests")
|
||||
return sorted(
|
||||
n
|
||||
for n in os.listdir(tests_dir)
|
||||
if os.path.isfile(os.path.join(tests_dir, n, "recipe_meta.py"))
|
||||
)
|
||||
|
||||
|
||||
# ---- the typo gate: every in-repo recipe meta must validate against the registry --------------
|
||||
|
||||
|
||||
@pytest.mark.parametrize("recipe", _recipes_with_meta())
|
||||
def test_every_recipe_meta_loads_clean(recipe):
|
||||
"""All tests/*/recipe_meta.py in the repo load + validate through the registry. A typo'd or
|
||||
unregistered ALL-CAPS key in any recipe meta fails HERE, at PR time — not silently at run
|
||||
time (the R6 failure mode this restructure kills)."""
|
||||
meta = meta_mod.load(recipe)
|
||||
assert isinstance(meta, RecipeMeta)
|
||||
# sanity: the 4 base keys always materialize with usable types
|
||||
assert isinstance(meta.HEALTH_PATH, str)
|
||||
assert isinstance(meta.HEALTH_OK, tuple) and meta.HEALTH_OK
|
||||
assert isinstance(meta.DEPLOY_TIMEOUT, int) and isinstance(meta.HTTP_TIMEOUT, int)
|
||||
|
||||
|
||||
# ---- zero-config baseline (spec §2) ------------------------------------------------------------
|
||||
|
||||
|
||||
def test_missing_meta_yields_spec_baseline(tmp_path):
|
||||
meta = meta_mod.load("no-such-recipe", tests_dir=str(tmp_path))
|
||||
assert meta.HEALTH_PATH == "/"
|
||||
assert meta.HEALTH_OK == (200, 301, 302)
|
||||
assert meta.DEPLOY_TIMEOUT == 600
|
||||
assert meta.HTTP_TIMEOUT == 300
|
||||
assert meta.BACKUP_CAPABLE is None # None = auto-detect (tri-state, not False)
|
||||
assert meta.EXPECTED_NA is None
|
||||
assert meta.READY_PROBE is None
|
||||
assert meta.UPGRADE_BASE_VERSION is None
|
||||
assert meta.BACKUP_VERIFY is None
|
||||
assert meta.UPGRADE_EXTRA_ENV is None
|
||||
assert meta.EXTRA_ENV == {}
|
||||
assert meta.DEPS == []
|
||||
assert meta.WARM_CANONICAL is False
|
||||
assert meta.SCREENSHOT is None
|
||||
assert meta_mod.non_default(meta) == {}
|
||||
|
||||
|
||||
def test_registry_field_set_matches_dataclass():
|
||||
"""The RecipeMeta field set is generated from KEYS — no drift possible, pinned anyway."""
|
||||
import dataclasses
|
||||
|
||||
assert [f.name for f in dataclasses.fields(RecipeMeta)] == [k.name for k in KEYS]
|
||||
# the 14 final keys, no more (the 3 P2-deleted legacy keys are gone from the registry,
|
||||
# so any recipe_meta still setting them hard-fails the typo gate)
|
||||
assert len(KEYS) == 14
|
||||
assert not [k for k in KEYS if k.deprecated]
|
||||
for gone in ("CHAOS_BASE_DEPLOY", "OIDC_AT_INSTALL", "SKIP_GENERIC"):
|
||||
assert gone not in {k.name for k in KEYS}
|
||||
|
||||
|
||||
# ---- validation hard errors (locked decision: fail fast at load) -------------------------------


def _write_meta(tmp_path, body: str, recipe: str = "r") -> str:
    d = tmp_path / recipe
    d.mkdir(exist_ok=True)
    (d / "recipe_meta.py").write_text(body)
    return recipe


def test_unknown_key_raises_with_suggestion(tmp_path):
    r = _write_meta(tmp_path, "READINESS_PROBE = None\n")  # the R6 typo example
    with pytest.raises(MetaError) as ei:
        meta_mod.load(r, tests_dir=str(tmp_path))
    msg = str(ei.value)
    assert "READINESS_PROBE" in msg and "READY_PROBE" in msg  # names the typo + nearest key


def test_unknown_key_without_near_match_lists_registry(tmp_path):
    r = _write_meta(tmp_path, "TOTALLY_BOGUS_KNOB = 1\n")
    with pytest.raises(MetaError) as ei:
        meta_mod.load(r, tests_dir=str(tmp_path))
    assert "HEALTH_PATH" in str(ei.value)  # registered keys listed for the reader


def test_wrong_type_raises(tmp_path):
    r = _write_meta(tmp_path, 'DEPLOY_TIMEOUT = "900"\n')
    with pytest.raises(MetaError, match="DEPLOY_TIMEOUT"):
        meta_mod.load(r, tests_dir=str(tmp_path))


def test_bool_not_accepted_as_int(tmp_path):
    r = _write_meta(tmp_path, "DEPLOY_TIMEOUT = True\n")
    with pytest.raises(MetaError, match="DEPLOY_TIMEOUT"):
        meta_mod.load(r, tests_dir=str(tmp_path))


def test_callable_on_data_key_rejected(tmp_path):
    r = _write_meta(tmp_path, "def HEALTH_PATH():\n return '/'\n")
    with pytest.raises(MetaError, match="hook-typed"):
        meta_mod.load(r, tests_dir=str(tmp_path))


def test_non_callable_on_hook_key_rejected(tmp_path):
    r = _write_meta(tmp_path, "READY_PROBE = ['not', 'a', 'callable']\n")
    with pytest.raises(MetaError, match="READY_PROBE"):
        meta_mod.load(r, tests_dir=str(tmp_path))


def test_underscore_names_are_private_and_exempt(tmp_path):
    r = _write_meta(
        tmp_path,
        "_WELCOME_TEXT_MARKER = 'marker-xyz'\n_MAX_USERS = 42\n"
        "EXTRA_ENV = {'WELCOME_TEXT': _WELCOME_TEXT_MARKER, 'USERS': str(_MAX_USERS)}\n",
    )
    meta = meta_mod.load(r, tests_dir=str(tmp_path))
    assert meta.EXTRA_ENV == {"WELCOME_TEXT": "marker-xyz", "USERS": "42"}


def test_lowercase_helpers_ignored(tmp_path):
    r = _write_meta(
        tmp_path,
        "def _helper(d):\n return {'K': d}\n\ndef EXTRA_ENV(ctx):\n return _helper(ctx.domain)\n",
    )
    meta = meta_mod.load(r, tests_dir=str(tmp_path))
    ctx = meta_mod.hook_ctx("x.example", meta)
    assert meta_mod.extra_env(meta, ctx) == {"K": "x.example"}
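The validation tests above pin two behaviours: an unknown key is reported with its nearest registered name (or the full registry when nothing is close), and `True` is rejected where an `int` is expected. A minimal sketch of both checks, assuming a hypothetical `REGISTERED` tuple and helper names; the real loader's messages and internals may differ.

```python
import difflib

# Hypothetical subset of the registry, for illustration only.
REGISTERED = ("READY_PROBE", "HEALTH_PATH", "HEALTH_OK", "DEPLOY_TIMEOUT", "EXTRA_ENV")


def unknown_key_message(name, registered=REGISTERED):
    """Name the typo and suggest the closest key, else list every registered key."""
    near = difflib.get_close_matches(name, registered, n=1)
    if near:
        return f"unknown key {name!r}; did you mean {near[0]!r}?"
    return f"unknown key {name!r}; registered keys: {', '.join(sorted(registered))}"


def is_strict_int(value):
    """bool is a subclass of int in Python, so it has to be excluded explicitly."""
    return isinstance(value, int) and not isinstance(value, bool)
```

The explicit `bool` exclusion matters because `isinstance(True, int)` is true, so a plain `isinstance` check would let `DEPLOY_TIMEOUT = True` through.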
# ---- normalization + helpers --------------------------------------------------------------------


def test_health_ok_list_normalized_to_tuple(tmp_path):
    r = _write_meta(tmp_path, "HEALTH_OK = [200, 302]\n")
    assert meta_mod.load(r, tests_dir=str(tmp_path)).HEALTH_OK == (200, 302)


def test_extra_env_dict_and_callable_forms(tmp_path):
    r = _write_meta(tmp_path, "EXTRA_ENV = {'A': 1}\n")
    meta = meta_mod.load(r, tests_dir=str(tmp_path))
    assert meta_mod.extra_env(meta, meta_mod.hook_ctx("d", meta)) == {"A": "1"}  # stringified
    r2 = _write_meta(
        tmp_path, "UPGRADE_EXTRA_ENV = lambda ctx: {'COMPOSE_FILE': ctx.domain}\n", recipe="r2"
    )
    meta2 = meta_mod.load(r2, tests_dir=str(tmp_path))
    ctx2 = meta_mod.hook_ctx("dom.x", meta2, op="upgrade")
    assert meta_mod.upgrade_extra_env(meta2, ctx2) == {"COMPOSE_FILE": "dom.x"}
    assert meta_mod.extra_env(meta2, ctx2) == {}  # unset EXTRA_ENV resolves to {}
# ---- P3: uniform ctx hook convention -------------------------------------------------------------


def test_hook_ctx_fields(tmp_path):
    meta = meta_mod.load("no-such", tests_dir=str(tmp_path))
    ctx = meta_mod.hook_ctx("app.ci.example", meta, op="backup")
    assert ctx.domain == "app.ci.example"
    assert ctx.base_url == "https://app.ci.example"
    assert ctx.meta is meta
    assert ctx.op == "backup"
    assert meta_mod.hook_ctx("d", meta).op is None


def test_hook_ctx_deps_from_run_file(tmp_path, monkeypatch):
    import json

    meta = meta_mod.load("no-such", tests_dir=str(tmp_path))
    monkeypatch.delenv("CCCI_DEPS_FILE", raising=False)
    assert meta_mod.hook_ctx("d", meta).deps is None
    f = tmp_path / "deps.json"
    f.write_text(json.dumps({"keycloak": {"recipe": "keycloak", "domain": "kc.x"}}))
    monkeypatch.setenv("CCCI_DEPS_FILE", str(f))
    deps = meta_mod.hook_ctx("d", meta).deps
    assert deps["keycloak"]["domain"] == "kc.x"
    f.write_text("{}")  # empty dict -> None (deps declared but not provisioned)
    assert meta_mod.hook_ctx("d", meta).deps is None
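The two `hook_ctx` tests above describe a context object with `domain`, `base_url`, `meta`, `op`, and `deps`, where `deps` comes from the JSON file named by `CCCI_DEPS_FILE` and an empty dict collapses to `None`. A minimal sketch of a builder with that observable behaviour; the `HookCtx` class name and the file-handling details are assumptions, not the harness implementation.

```python
import json
import os
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class HookCtx:
    """Hypothetical stand-in for the context passed to every hook."""
    domain: str
    base_url: str
    meta: Any
    op: Optional[str] = None
    deps: Optional[dict] = None


def hook_ctx(domain, meta, op=None):
    deps = None
    path = os.environ.get("CCCI_DEPS_FILE")
    if path and os.path.exists(path):
        with open(path) as f:
            # An empty mapping means "declared but not provisioned": report None.
            deps = json.load(f) or None
    return HookCtx(domain, f"https://{domain}", meta, op, deps)
```

Reading the environment on every call, rather than once at import, is what lets the test rewrite the file and observe the change without reloading anything.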
def test_legacy_hook_signature_raises_clear_meta_error(tmp_path):
    """A pre-restructure hook signature must fail AT LOAD with a migration message — never a
    silent TypeError mid-run (P3.4)."""
    r = _write_meta(tmp_path, "def READY_PROBE(domain):\n return []\n")
    with pytest.raises(MetaError, match="ctx"):
        meta_mod.load(r, tests_dir=str(tmp_path))
    r2 = _write_meta(tmp_path, "EXTRA_ENV = lambda domain: {}\n", recipe="r2")
    with pytest.raises(MetaError, match="restructure"):
        meta_mod.load(r2, tests_dir=str(tmp_path))
    r3 = _write_meta(
        tmp_path, "def SCREENSHOT(page, domain, meta):\n return None\n", recipe="r3"
    )
    with pytest.raises(MetaError, match="page, ctx"):
        meta_mod.load(r3, tests_dir=str(tmp_path))


def test_ctx_hook_signatures_accepted(tmp_path):
    r = _write_meta(
        tmp_path,
        "def READY_PROBE(ctx):\n return []\n"
        "def BACKUP_VERIFY(ctx):\n return True\n"
        "def SCREENSHOT(page, ctx):\n return None\n"
        "def EXTRA_ENV(ctx):\n return {}\n",
    )
    meta = meta_mod.load(r, tests_dir=str(tmp_path))
    assert callable(meta.READY_PROBE) and callable(meta.SCREENSHOT)


def test_check_hook_signature_for_pre_op_hooks():
    """The orchestrator validates ops.py pre_<op> hooks with the same checker (legacy
    (domain, meta) form names the migration)."""

    def legacy(domain, meta):
        pass

    def new(ctx):
        pass

    with pytest.raises(MetaError, match="ctx"):
        meta_mod.check_hook_signature(legacy, ("ctx",), "tests/x/ops.py::pre_upgrade")
    meta_mod.check_hook_signature(new, ("ctx",), "tests/x/ops.py::pre_upgrade")  # no raise
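The signature tests above require that a legacy hook fails at load with a message naming the expected parameters, instead of surfacing as a `TypeError` mid-run. A minimal sketch of such a check using `inspect.signature`; the `MetaError` class and the message wording here are stand-ins, and only the `(fn, expected, where)` call shape is taken from the tests.

```python
import inspect


class MetaError(Exception):
    """Stand-in for the harness's load-time validation error."""


def check_hook_signature(fn, expected, where):
    """Raise MetaError unless fn's parameter names equal `expected` exactly."""
    params = tuple(inspect.signature(fn).parameters)
    if params != tuple(expected):
        raise MetaError(
            f"{where}: hook takes ({', '.join(params)}) but must take "
            f"({', '.join(expected)}); update it for the restructure (hooks now receive ctx)"
        )
```

Comparing parameter names rather than just the count is a design choice in this sketch: it lets a one-argument legacy hook such as `lambda domain: {}` be caught as well.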
def test_non_default_reports_only_customized_keys(tmp_path):
    r = _write_meta(tmp_path, "DEPLOY_TIMEOUT = 1500\nDEPS = ['keycloak']\n")
    nd = meta_mod.non_default(meta_mod.load(r, tests_dir=str(tmp_path)))
    assert nd == {"DEPLOY_TIMEOUT": 1500, "DEPS": ["keycloak"]}


def test_meta_is_frozen():
    import dataclasses

    meta = meta_mod.load("custom-html")
    with pytest.raises(dataclasses.FrozenInstanceError):
        meta.DEPLOY_TIMEOUT = 1
# ---- doc generation sync (P1.5: the committed §4 table == the registry rendering) ---------------


def test_generated_doc_table_in_sync():
    """docs/recipe-customization.md's key reference table is GENERATED from the registry
    (scripts/gen-meta-docs.py). If this fails: re-run `python3 scripts/gen-meta-docs.py` and
    commit the result — the table must never drift from the registry (R5)."""
    gen = os.path.join(ROOT, "scripts", "gen-meta-docs.py")
    doc = os.path.join(ROOT, "docs", "recipe-customization.md")
    rendered = subprocess.run(
        [sys.executable, gen, "--print"], capture_output=True, text=True, check=True
    ).stdout
    with open(doc) as f:
        committed = f.read()
    assert rendered.strip() in committed, (
        "docs/recipe-customization.md key table is out of sync with the harness.meta registry — "
        "run `python3 scripts/gen-meta-docs.py` and commit"
    )
@ -11,6 +11,7 @@ import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import meta as meta_mod  # noqa: E402
from harness import screenshot as S  # noqa: E402


@ -29,3 +30,19 @@ def test_hook_returned_when_callable():
        pass

    assert S._load_screenshot_hook({"SCREENSHOT": hook}) is hook


def test_screenshot_reachable_through_real_load_path(tmp_path):
    """R2 proof (rcust P1): a recipe SCREENSHOT hook declared in recipe_meta.py arrives at
    screenshot._load_screenshot_hook through the REAL orchestrator load path (meta.load — the
    object run_recipe_ci passes to capture()). Under the old six-loader world the orchestrator's
    L1 allowlist dropped SCREENSHOT, so the hook was unreachable (spec §8 R2)."""
    d = tmp_path / "shotrecipe"
    d.mkdir()
    (d / "recipe_meta.py").write_text(
        "def SCREENSHOT(page, ctx):\n return None\n",
    )
    meta = meta_mod.load("shotrecipe", tests_dir=str(tmp_path))
    hook = S._load_screenshot_hook(meta)
    assert callable(hook), "SCREENSHOT hook did not survive the orchestrator load path (R2)"
    assert S._load_screenshot_hook(meta_mod.load("no-such", tests_dir=str(tmp_path))) is None