Compare commits: restructur ... fix/conver (32 commits)

| SHA1 |
|---|
| be2026aafb |
| 4dcfb5ba96 |
| 1ec0e772e8 |
| 40b59b356b |
| 5c0676b7d0 |
| efd7efc32b |
| 1357544301 |
| 57c66add51 |
| a95fad4fa0 |
| b9abf48116 |
| 4cb1f57e2c |
| e30a414ce1 |
| 41033b4500 |
| a7a558ada3 |
| 37dcfab07d |
| ffc88848f3 |
| 85d14101ef |
| 9aa0c5d624 |
| 4d342a2c5d |
| 01e6d497ba |
| 01f9f70970 |
| c2508c7fd2 |
| 8984b57b35 |
| 5ccc0d1c34 |
| 52f5266dfb |
| 270476beb3 |
| ff09c4075b |
| 63befd05b0 |
| 802b2792a7 |
| 0264af72c7 |
| 8945d13674 |
| f5119a9703 |

JOURNAL-rcust.md (190)

@@ -8,3 +8,193 @@ be `restructure/recipe-custom` off main @ 76a4b6b. Starting P1: reading the six
(run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env,
lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled)
before writing harness/meta.py.

## 2026-06-10 P1 — single loader + registry (branch 472a68b)

Wrote runner/harness/meta.py: KEYS registry (14 keys + CHAOS_BASE_DEPLOY/OIDC_AT_INSTALL/
SKIP_GENERIC kept registered as deprecated=True so P1 lands green before P2 deletes them),
RecipeMeta generated from KEYS via dataclasses.make_dataclass (frozen; field set cannot drift from
the registry), load() = the only exec() of recipe_meta.py, MetaError on an unknown ALL-CAPS key, a
type mismatch, or a callable on a data key, with a difflib suggestion in the unknown-key message.
BACKUP_CAPABLE keeps its tri-state via default None (None = auto-detect — preserves the old
`"BACKUP_CAPABLE" in meta` semantics in generic.backup_capable).

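The registry-driven loader described above can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the code in runner/harness/meta.py. `KEYS` here holds only a hypothetical five-key subset. `Key`, `BY_NAME` and `load_source` are illustrative names; `load_source` takes source text instead of a path so it is easy to exercise. The sketch shows the frozen dataclass generated from the registry, the single `exec()`, the exemption for `_PRIVATE` and lowercase names, the bool-as-int rejection, and the `difflib` hint on unknown keys.

```python
import dataclasses
import difflib
from dataclasses import dataclass, field
from typing import Any, Dict


class MetaError(Exception):
    """Invalid recipe_meta.py: unknown key, wrong type, or misplaced callable."""


@dataclass(frozen=True)
class Key:
    name: str
    kind: type
    default: Any = None
    hook: bool = False  # hook keys hold callables; data keys must not


# Hypothetical subset of the registry (the real KEYS has 14 live keys).
KEYS = (
    Key("DEPLOY_TIMEOUT", int, 600),
    Key("HEALTH_PATH", str, "/"),
    Key("BACKUP_CAPABLE", bool, None),  # tri-state: None = auto-detect
    Key("DEPS", tuple, ()),
    Key("READY_PROBE", object, None, hook=True),
)
BY_NAME = {k.name: k for k in KEYS}

# Generated FROM the registry, so the field set cannot drift from KEYS.
RecipeMeta = dataclasses.make_dataclass(
    "RecipeMeta",
    [(k.name, Any, field(default=k.default)) for k in KEYS],
    frozen=True,
)


def _coerce(key: Key, value: Any, where: str) -> Any:
    if key.hook:
        if value is not None and not callable(value):
            raise MetaError(f"{where}: {key.name} must be callable")
        return value
    if callable(value):
        raise MetaError(f"{where}: {key.name} is a data key, got a callable")
    # bool is a subclass of int, so it must be rejected explicitly for non-bool keys.
    if isinstance(value, bool) and key.kind is not bool:
        raise MetaError(f"{where}: {key.name} expects {key.kind.__name__}, got bool")
    if not isinstance(value, key.kind):
        got = type(value).__name__
        raise MetaError(f"{where}: {key.name} expects {key.kind.__name__}, got {got}")
    return value


def load_source(source: str, where: str = "recipe_meta.py") -> Any:
    """Execute one recipe_meta.py (the only exec()) and validate every ALL-CAPS name."""
    ns: Dict[str, Any] = {}
    exec(compile(source, where, "exec"), ns)
    values = {}
    for name, value in ns.items():
        # _PRIVATE names, dunders and lowercase helpers are exempt.
        if name.startswith("_") or not name.isupper():
            continue
        key = BY_NAME.get(name)
        if key is None:
            hint = difflib.get_close_matches(name, list(BY_NAME), n=1)
            suffix = f" (did you mean {hint[0]}?)" if hint else ""
            raise MetaError(f"{where}: unknown key {name}{suffix}")
        values[name] = _coerce(key, value, where)
    return RecipeMeta(**values)


def non_default(meta: Any) -> Dict[str, Any]:
    """Only the keys a recipe actually customized."""
    return {k.name: getattr(meta, k.name) for k in KEYS if getattr(meta, k.name) != k.default}
```

Because the instance is frozen, a test that needs a variant builds one with `dataclasses.replace(meta, DEPLOY_TIMEOUT=...)`, as the three f212 unit tests in this entry now do.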
Migrations: orchestrator loads once + passes meta down (deploy_app/perform_upgrade/_perform_op/
run_lifecycle_tier all take the object); conftest meta fixture returns full RecipeMeta (R3 closed);
lifecycle._recipe_extra_env/_recipe_meta_flag and deps.declared_deps deleted; canonical.is_enrolled +
enrolled_recipes go through meta.load (tests monkeypatch meta.TESTS_DIR now instead of
canonical.__file__); screenshot._load_screenshot_hook reads the attribute (R2 fixed — unit test
proves SCREENSHOT survives the real orchestrator load path). deploy_app keeps an optional
meta=None fallback (loads via the single loader) for fixture/manual callers — exec still happens
in exactly one function.

Effective-value safety check before committing: dumped non_default() for all 21 recipe dirs through
the new loader — every recipe's customized key set matches its recipe_meta.py source (e.g. mumble:
DEPLOY_TIMEOUT/EXTRA_ENV/HEALTH_OK/READY_PROBE/UPGRADE_EXTRA_ENV). One intentional delta class:
deps.deploy_deps' fallback timeouts for a MISSING dep meta change from literal 900/600 to loading
the dep's real meta (orchestrator path always supplied metas, so CI behavior is identical).

Verified on cc-ci (rsynced working tree before committing):
- cc-ci-run -m pytest tests/unit -q -> 175 passed
- nix develop .#lint --command scripts/lint.sh -> lint: PASS

Three pre-existing f212 unit tests passed dicts to wait_ready_probes — updated mechanically to
construct RecipeMeta via dataclasses.replace (assertions untouched).

Next: P2a compose.ccci.yml first-class + auto-chaos.

## 2026-06-10 P2 — legacy keys & paths deleted (branch 8cd72fd)

P2a: lifecycle.provide_ccci_overlay copies `tests/<recipe>/compose.ccci.yml` into the per-run
checkout (after install_steps hook, before prepull/deploy); pinned base deploys auto-chaos on
overlay presence (has_ccci_overlay replaces the meta.CHAOS_BASE_DEPLOY elif). ghost/discourse
install_steps.sh were copy-only -> deleted whole; their metas keep COMPOSE_FILE in EXTRA_ENV
(unchanged wiring, the harness now owns the copy).

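A sketch of the P2a overlay handling, assuming a `tests/<recipe>/` layout and a plain file copy. `provide_ccci_overlay` and `has_ccci_overlay` are named in the entry above, but the signatures here are guesses. `base_deploy_mode` is a hypothetical helper that only illustrates how overlay presence replaces the old CHAOS_BASE_DEPLOY flag.

```python
import shutil
from pathlib import Path
from typing import Optional

OVERLAY_NAME = "compose.ccci.yml"


def has_ccci_overlay(tests_dir: Path, recipe: str) -> bool:
    """Overlay presence is the whole signal; no CHAOS_BASE_DEPLOY flag is consulted."""
    return (tests_dir / recipe / OVERLAY_NAME).is_file()


def provide_ccci_overlay(tests_dir: Path, recipe: str, checkout: Path) -> Optional[Path]:
    """Copy tests/<recipe>/compose.ccci.yml into the per-run recipe checkout.

    Meant to run after the install_steps hook and before prepull/deploy.
    Returns the copied path, or None when the recipe ships no overlay.
    """
    src = tests_dir / recipe / OVERLAY_NAME
    if not src.is_file():
        return None
    dst = checkout / OVERLAY_NAME
    shutil.copyfile(src, dst)
    return dst


def base_deploy_mode(tests_dir: Path, recipe: str) -> str:
    # Hypothetical helper: the pinned base deploy goes chaos iff an overlay exists.
    return "chaos" if has_ccci_overlay(tests_dir, recipe) else "normal"
```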
P2b: oidc_at_install condition removed — `if declared:` provisions before the single deploy,
legacy post-deploy block + _run_setup_custom_tests_hook deleted. lasuite-docs install_steps.sh is
the meet/drive hook with docs' exact env names (diffed against the deleted setup_custom_tests.sh:
same keys incl. OIDC_OP_DISCOVERY_ENDPOINT + scopes 'openid email profile'; secret-insert bump
identical; only the abra-redeploy step is gone — the single deploy reads the env instead).
lasuite-drive's MinIO bucket one-shot -> ops.py pre_install (runs at install-tier start,
post-deploy; bucket lives in the minio volume so it survives upgrade/restore; same scale --detach +
30x3s poll as the shell version). run_quick: deps still provision (realm/creds), hook call gone —
no quick-enrolled recipe declares DEPS today; noted inline.

P2c: SKIP_GENERIC out of the registry; _skip_generic(op) env-only; skip_generic_env_overrides()
prints a `!!` warning when active under DRONE (P5 will embed in the manifest).

P2d: conftest deps fixture = dict of _DepEntry (dict subclass w/ attribute sugar). The 6 lasuite
files only ever used deps_creds, so the param was renamed to deps with zero assertion changes.
NOTE for Adversary: some assert MESSAGE strings ('setup_custom_tests should have populated this.'
-> 'dep provisioning...') and docstrings updated — message text only, no assert logic/expected
values.

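The `_DepEntry` idea (a dict subclass with attribute sugar) fits in a few lines. This is a sketch assuming entries are flat JSON objects. The `client_id`/`client_secret` field names used in the example and the `deps_fixture_value` wrapper are hypothetical.

```python
from typing import Any, Dict


class _DepEntry(dict):
    """A plain dict (JSON-serializable, [] and .get() still work) that also allows entry.field."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods are unaffected.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


def deps_fixture_value(raw: Dict[str, Dict[str, Any]]) -> Dict[str, _DepEntry]:
    """Hypothetical: wrap the recipe -> entry mapping the way a deps fixture could."""
    return {recipe: _DepEntry(entry) for recipe, entry in raw.items()}
```

Raising `AttributeError` (not `KeyError`) keeps `hasattr()` and `getattr(obj, name, default)` behaving normally.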
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 175 passed;
nix develop .#lint --command scripts/lint.sh -> PASS. Doc table regenerated to the 14-key registry
(doc-sync unit test pins it).

Next: P3 — HookCtx + ctx-hook signatures everywhere.

## 2026-06-10 P3 — uniform ctx hook convention (branch fd02d9f)

HookCtx frozen dataclass + hook_ctx() constructor in harness/meta.py; ctx.deps read straight from
$CCCI_DEPS_FILE (json, both shapes) — meta.py stays import-cycle-free (deps.py imports lifecycle
which imports meta). Registry keys carry hook_params; meta.load() enforces the expected positional
names per hook key (READY_PROBE/BACKUP_VERIFY/EXTRA_ENV/UPGRADE_EXTRA_ENV=(ctx,),
SCREENSHOT=(page, ctx)); _run_pre_hook applies meta.check_hook_signature(fn, ("ctx",)) to ops.py
hooks before calling. Conversion of 17 ops.py + 8 recipe_meta hooks was scripted (def-line regex +
bare `domain` -> `ctx.domain` inside the pre_*/hook function bodies only) and diff-reviewed; the
only manual fixes: keycloak pre_restore passed `meta` -> `ctx.meta`, and two comment lines in
lasuite-drive/-meet metas that the regex over-replaced were restored. wait_ready_probes gained
op= (install/upgrade call sites pass it) so probes can know the phase.

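A sketch of the P3 pieces:

- a frozen `HookCtx` with the five fields the review lists;
- a `hook_ctx()` factory;
- an exact-match check on positional parameter names.

Assumptions: `base_url` is `https://<domain>`; the deps file is handled only in its dict recipe -> entry shape (the real reader accepts two shapes); the error wording is illustrative.

```python
import inspect
import json
import os
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


class MetaError(Exception):
    pass


@dataclass(frozen=True)
class HookCtx:
    domain: str
    base_url: str
    meta: Any
    deps: Mapping[str, Any] = field(default_factory=dict)
    op: Optional[str] = None


def _read_deps(env: Mapping[str, str]) -> Dict[str, Any]:
    # Only the dict recipe -> entry shape is handled in this sketch.
    path = env.get("CCCI_DEPS_FILE")
    if not path or not os.path.isfile(path):
        return {}
    with open(path) as f:
        return json.load(f)


def hook_ctx(domain: str, meta: Any, op: Optional[str] = None,
             env: Mapping[str, str] = os.environ) -> HookCtx:
    # Assumption: base_url is always https://<domain>.
    return HookCtx(domain, f"https://{domain}", meta, _read_deps(env), op)


_POSITIONAL = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)


def check_hook_signature(fn: Callable[..., Any], expected: Tuple[str, ...], where: str) -> None:
    """Exact-match positional parameter names so legacy hooks fail at load, not mid-run."""
    params = inspect.signature(fn).parameters.values()
    names = tuple(p.name for p in params if p.kind in _POSITIONAL)
    if names != expected:
        raise MetaError(
            f"{where}: hook takes {names}, expected {expected}; since P3 hooks receive "
            "a HookCtx (ctx.domain, ctx.base_url, ctx.meta, ctx.deps, ctx.op)"
        )
```

Matching names, not just arity, is what catches a legacy `READY_PROBE(domain)`: it has the right parameter count but the wrong contract.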
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; lint PASS.

Next: P4 — discovery placement rule + op_state/deps fixtures + migrate hand-parsers.

## 2026-06-10 P4 — custom-test ergonomics (branch 29a28e2)

Pre-change sweeps confirmed the plan's zero-users claims: no top-level non-lifecycle test_*.py in
any recipe dir; no recipe test file reads os.environ / CCCI_OP_STATE_FILE directly (the only
op-state consumers are the generic assertions via harness.generic.op_state — harness-side, fine).
So P4 = discovery glob removal + new op_state fixture + pinning tests; no test migrations needed.
test_discovery.py's HC2 gate test moved its repo-local custom fixture under functional/ (the rule);
test_discovery_phase2.py now asserts top-level custom is NOT discovered. op_state fixture skips
(clear reason) when env unset / file missing / unparseable; tested via request.getfixturevalue.

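The placement rule can be expressed as a small, non-recursive discovery function. This is a sketch assuming the four lifecycle file names listed in the review. The HC2 repo-local gate (`_gated()`) is deliberately left out.

```python
from pathlib import Path
from typing import List

LIFECYCLE_NAMES = {"test_install.py", "test_upgrade.py", "test_backup.py", "test_restore.py"}
CUSTOM_SUBDIRS = ("functional", "playwright")


def custom_tests(recipe_dir: Path) -> List[Path]:
    """Collect custom tests ONLY from functional/ and playwright/, non-recursively.

    Top-level test_*.py files are never custom; lifecycle names are still
    excluded inside the subdirs so nothing can run twice.
    """
    found: List[Path] = []
    for sub in CUSTOM_SUBDIRS:
        d = recipe_dir / sub
        if not d.is_dir():
            continue
        found.extend(p for p in sorted(d.glob("test_*.py")) if p.name not in LIFECYCLE_NAMES)
    return found
```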
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; lint PASS.

Next: P5 — customization manifest (print block + results.json key).

## 2026-06-10 P5 — customization manifest (branch 68954be)

(Resumed after a usage-limit pause mid-P5; working tree carried the in-flight manifest.py.)

New runner/harness/manifest.py: build() collects {meta_non_default, hooks, overlays, custom_tests,
env_overrides} via the SAME discovery/meta functions the run uses (so the manifest can never
disagree with what actually executes — incl. the HC2 _gated() repo-local gate), render() prints
the block. Orchestrator builds+prints right after meta load / repo-local snapshot, BEFORE the
quick-lane branch (both lanes get the block); the dict rides into build_results(customization=...)
verbatim. run_quick writes no results.json, so the single build_results call site covers all.
Hooks render as `"<hook>"`, tuples as lists (JSON-clean); ops.py pre-ops listed by cheap source
scan (same approach as discovery._module_defines — no import at manifest time).

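Listing ops.py pre-op hooks without importing the module can be done by parsing it. The entry only says a cheap source scan is used. This sketch swaps in Python's `ast` module (the real scan may be a regex), and the `OPS` tuple is an assumption about which ops exist.

```python
import ast
from typing import List

OPS = ("install", "upgrade", "backup", "restore")  # assumed op names


def pre_ops_defined(source: str) -> List[str]:
    """Names of top-level `def pre_<op>` hooks, found by parsing (never importing) ops.py."""
    tree = ast.parse(source)
    defined = {
        node.name
        for node in tree.body  # top level only: nested defs are not hooks
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return [f"pre_{op}" for op in OPS if f"pre_{op}" in defined]
```

Parsing instead of importing means a recipe's ops.py side effects (subprocess calls, network clients) never run while the manifest is built.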
Lint flagged: C408 dict() literal, import-block order (manifest after deps), ruff-format on the
new test file — all fixed. Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest
tests/unit -q -> 191 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.

Next: P6 docs, then M1 prep (tests/concurrency proof run + 21-recipe baseline matrix).

## 2026-06-10 P6 — docs (branch da558ca) + inbox response (858e0f5)

Rewrote the three docs to the restructured end state; kept the generated §4 table byte-identical
(doc-sync test pins it). recipe-customization.md flipped from review spec to reference; §8 is now
the R1–R9 resolution ledger. Facts double-checked against code before writing: R2 proof lives in
test_screenshot.py::test_screenshot_reachable_through_real_load_path (not test_meta.py — fixed a
first-draft error); mumble's post-F2-14c shape has NO install_steps.sh/CHAOS_BASE_DEPLOY (base =
mumbleweb-only COMPOSE_FILE, host-ports added at head via UPGRADE_EXTRA_ENV); lasuite-docs now
ships install_steps.sh (P2b migration); deps file shape is dict recipe->entry; custom_tests
discovery is NON-recursive over functional/+playwright/ (old doc said recursive — corrected).

Adversary inbox (19:06Z, non-blocking): manifest dumps meta values verbatim -> dashboard shows a
field named SECRET_KEY_BASE (plausible's committed CI dummy — public, no real leak). Took the
redaction option: _jsonable masks values whose key NAME matches
SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment-KEY, recursing into dict values (the plausible case
is a NESTED key under EXTRA_ENV); names stay visible. KEYCLOAK_URL is deliberately not matched
(KEY only counts as a whole `_`-delimited word segment). Unit test pins both the redacted and the
passthrough case.

Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 192 passed;
nix develop .#lint --command scripts/lint.sh -> lint: PASS.

Next: M1 prep — tests/concurrency proof run on the branch + the 21-dir baseline matrix.

## 2026-06-10 M1 prep + claim

Concurrency proof run on branch head 858e0f5 (rsynced tree on cc-ci): cc-ci-run -m pytest
tests/concurrency -q -> 23 passed in 11.46s (suite untouched by the restructure, as planned).

Baseline matrix: pulled every /var/lib/cc-ci-runs/*/results.json (141 files) and took the most
recent per recipe. 19/21 dirs covered by results.json; mumble's last full run predates the
results system (log ~/ccci-mumble-f214c.log, 5 tiers pass 05-31); bluesky-pds likewise
(Adversary Phase-2 cold verify e45e0ee). plausible's weekly-report RED was its PR branch
(pg13->14, build 200); its default-branch baseline is run 308 (06-10) L4 — runs 307/308 are
today's, from the conc-phase M2 sweep. Bad canaries recorded at their designed-fail tier.

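Picking the newest result per recipe is a simple reduction. This is a sketch assuming each results.json carries hypothetical `recipe` and `run` (monotonic run id) fields. The manual judgement calls above are not modelled: plausible's PR-branch run, and the two recipes whose evidence predates the results system.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, List


def load_all(root: Path = Path("/var/lib/cc-ci-runs")) -> List[Dict[str, Any]]:
    """Every run's results.json under the runs root."""
    return [json.loads(p.read_text()) for p in sorted(root.glob("*/results.json"))]


def latest_per_recipe(results: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep the newest result per recipe (hypothetical fields: "recipe", "run")."""
    latest: Dict[str, Dict[str, Any]] = {}
    for r in results:
        cur = latest.get(r["recipe"])
        if cur is None or r["run"] > cur["run"]:
            latest[r["recipe"]] = r
    return latest


def uncovered(recipe_dirs: Iterable[str], latest: Dict[str, Any]) -> List[str]:
    """Recipe dirs with no results.json at all (their baseline needs other evidence)."""
    return sorted(set(recipe_dirs) - set(latest))
```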
Claimed M1. While waiting: nothing else unblocked in this phase (M2 is gated on M1) — will hold
with short fallback polls per §7 case 2.

## 2026-06-11 M2 reconciliation — discourse upgrade-HC1 root-cause hunt + bluesky re-characterization

Resumed after a loop stall (~21:18Z–23:50Z): the m2b/ab sweeps had finished but nothing processed
them. Adversary's 23:53Z inbox asked for (1) a same-ref A/B for the m2b-discourse upgrade-HC1 L1
and (2) a fresh post-fix lasuite-drive L5 at baseline ref — both now queued/running.

Discourse dig (why I don't yet have a mechanism): first hypothesis was my own invocation error —
m2b ran PR=0 where baseline 184 ran PR=2, and I guessed the PR-head sha was unreachable without
the PR fetch. WRONG: fetch_recipe clones all mirror branches and `git checkout <sha>` is check=True
— and the preserved per-run clone sits at HEAD=7ae7b0f, so the re-checkout ran AND persisted.
Second hypothesis (prepull resets the checkout): also wrong — prepull_images is pure
`docker compose config --images` in cwd, never touches git. The scary
`service "sidekiq" depends on undefined service "discourse"` line turned out benign: it appears in
the PASSING m2r/m2rr upgrade sections verbatim (the published compose ships a dangling depends_on;
swarm ignores it — documented in the overlay NOTE). What's left: abra stamped the PREV-TAG commit
(eb96de94 = 0.7.0+3.3.1) on the chaos redeploy while the tree was at 7ae7b0f. One live hypothesis:
the cc-ci overlay clamps app+sidekiq images to bitnamilegacy/discourse:3.3.1; at this PR head
(0.9.0+3.5.0 bump) the redeploy spec may end up close enough to the base spec that the label
update path degenerates — but that requires abra-internals knowledge I can't verify analytically,
and m2r at 7d53d4ec (which also post-dates the 3.5.0 bump?) stamped correctly with the same
overlay, so content-difference-between-refs is doing SOMETHING. Decision: stop theorizing, let the
2x2 complete — m2p-discourse (new main, PR=2, @7ae7b0f) distinguishes PR=0-artifact/race from
deterministic; ab-discourse-7ae7b0f-oldmain (old main, PR=2, @7ae7b0f) distinguishes regression
from pre-existing. Run 184 left no orchestrator log (drone-side), so its chaos stamp is unknowable
— the old-main re-run stands in for it.

lifecycle.py diff c2508c7..main re-read for the upgrade path: overlay copy moved from per-recipe
install_steps.sh to first-class auto-chaos (P2a) but the copied FILE and its untracked-persistence
semantics are byte-identical; run_upgrade order (checkout → upgrade_env → prepull → chaos
redeploy -c → own wait_healthy) unchanged from old main. Nothing jumps out as the delta.

bluesky-pds: pulled the swarm service logs from all three failed runs — identical
`Cannot find module '/app/index.js'` crash-loop (Node v24.15.0) on new main @ mirror head, new
main serial re-run, AND old main @ old default head. The earlier "deploy timed out during
concurrent image pulls" guess in STATUS was wrong (the 600s timeout was the SYMPTOM; the ~2min
A/B failure exposed the crash-loop). Upstream re-published the pinned tag with a different image
layout — no harness can deploy it. Filed in STATUS as restructure-neutral with grep-able evidence.

REVIEW-rcust.md (354)

@@ -29,6 +29,354 @@ I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only

## Verdicts

_(none yet — phase just started; Builder has not yet created STATUS-rcust.md or branch
`restructure/recipe-custom`. Only the reference spec doc `76a4b6b` has landed. Awaiting first
`claim(rcust): M1` from the Builder.)_

_(no GATE verdict yet — M1 is not claimed. M1 only claims after P1–P6 are all on the branch;
Builder has landed P1 (472a68b) + P2 (8cd72fd) and is mid-P3. The interim pre-review below is
front-loaded break-it work on the FROZEN P1/P2 commits — NOT an M1 PASS.)_

### Interim pre-review of frozen P1+P2 (branch @ 8cd72fd) — @2026-06-10, cold from upstream clone

Done as idle-time break-it work while no gate is pending. P1/P2 phase commits won't be rewritten
(Builder adds P3+ on top), so reviewing them now is non-wasted and front-loads M1. Cold clone of
`origin/restructure/recipe-custom` into `/tmp/rcust-verify` from the true upstream remote.

**No defects found so far.** Results:

1. **Deleted-code fallout — CLEAN.** Grepped `runner/ tests/ scripts/` for live refs to every deleted
symbol (`_recipe_meta`, `_load_meta`, `_recipe_extra_env`, `_recipe_meta_flag`, `declared_deps`,
`is_canonical_enrolled`, `OIDC_AT_INSTALL`, `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`,
`setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`). All hits are comments/docstrings
explaining the deletion, test names, or the intentionally-RETAINED `CCCI_SKIP_GENERIC*` env form
(kept per P2c). Zero live call-sites. `setup_custom_tests.sh` files gone.
2. **All-recipes-load-clean (typo gate) — PASS, independently.** Ran `meta.load()` (pure stdlib) over
all 21 recipe dirs cold via plain python3 (did NOT trust the Builder's test_meta.py). All 21 load;
non-default key sets sane. Every ALL-CAPS key used in any recipe_meta.py is in the 14-key registry.
3. **Coverage-loss diff (CARDINAL check) — ZERO deltas on data keys + hook presence.** Throwaway
harness (`/tmp/diff_meta.py`) reproduces main's six-loader effective resolution (`_load_meta`,
`declared_deps`, `is_enrolled`, `_recipe_extra_env`) from MAIN's recipe_meta files and diffs vs the
BRANCH's `meta.load()` for all 21 recipes. After correcting one harness artifact (EXTRA_ENV default
is `{}` not None), **0/21 recipes show any delta** for HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/
HTTP_TIMEOUT/BACKUP_CAPABLE/EXPECTED_NA/UPGRADE_BASE_VERSION/DEPS/WARM_CANONICAL + presence of
READY_PROBE/BACKUP_VERIFY/UPGRADE_EXTRA_ENV/EXTRA_ENV/SCREENSHOT.
4. **Validation gaps — CLOSED.** Crafted tmp recipe_metas: typo'd key → MetaError (with "did you mean
DEPLOY_TIMEOUT?"); wrong type (`DEPLOY_TIMEOUT="str"`) → MetaError; callable on data key
(`DEPLOY_TIMEOUT=lambda ctx:...`) → MetaError; `_PRIVATE`/lowercase-helper → loads clean (exemption
works). All four behave per the locked decision.
5. **meta.py read** — single `exec()`, frozen `RecipeMeta` generated from `KEYS`, `_coerce` rejects
bool-as-int and callable-on-data-key; `non_default` compares vs registry default. No issues.

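The comparison half of the coverage-loss diff can be sketched as follows. The sketch assumes both worlds have already been resolved into plain `recipe -> {key: value}` dicts, with callables such as EXTRA_ENV resolved at a fixed domain; the throwaway `/tmp/diff_meta.py` is not reproduced here. Remaining hook-valued keys are compared by presence only. `None` versus `{}` for dict-valued keys is normalized, which is the harness artifact item 3 had to correct.

```python
from typing import Any, Dict, Tuple

# Dict-valued keys whose unset default is None in one loader and {} in the other.
DICT_KEYS = frozenset({"EXTRA_ENV", "UPGRADE_EXTRA_ENV"})

Deltas = Dict[str, Dict[str, Tuple[Any, Any]]]


def _normalize(key: str, value: Any) -> Any:
    if key in DICT_KEYS and value is None:
        return {}
    if callable(value):
        return "<callable>"  # hooks are compared by presence, not identity
    return value


def diff_effective(old: Dict[str, Dict[str, Any]], new: Dict[str, Dict[str, Any]]) -> Deltas:
    """Per recipe, every key whose normalized effective value differs old -> new."""
    deltas: Deltas = {}
    for recipe in sorted(set(old) | set(new)):
        o, n = old.get(recipe, {}), new.get(recipe, {})
        changed = {}
        for key in sorted(set(o) | set(n)):
            a, b = _normalize(key, o.get(key)), _normalize(key, n.get(key))
            if a != b:
                changed[key] = (a, b)
        if changed:
            deltas[recipe] = changed
    return deltas
```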
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** full `pytest tests/unit -q` +
`pytest tests/concurrency -q` + `scripts/lint.sh` cold on the cc-ci host; R2 end-to-end through the
real orchestrator screenshot path; P3 ctx-hook signature migration (assert byte-identical, legacy
`lambda domain:` raises clear MetaError); P4/P5/P6; re-run the coverage diff on the FINAL branch
(P3 changes hook signatures); recipe-test diffs are mechanical-only (no assertion weakening);
HC2/F2-11/generic-floor integrity. These wait for the `claim(rcust): M1`.

### Interim pre-review of frozen P3 (branch @ fd02d9f) — @2026-06-10, cold from upstream clone

Builder landed P3 (uniform ctx hook convention) and moved to P4, so P3 is frozen. Pre-reviewed it.
**No defects found.**

1. **Mechanical-migration discipline — HELD (no VETO trigger).** `git diff 8cd72fd..fd02d9f` over
`tests/*/` shows ZERO changed assert/expected literals. Every hook change is purely
`def HOOK(domain[, meta])` → `def HOOK(ctx)` + `domain` → `ctx.domain` in the body. Spot-checked
cryptpad/mumble/ghost/lasuite-drive recipe_meta.py + lasuite-drive ops.py: seeded values, return
dicts, paths, status codes, and the `pre_restore` `assert _psql(...) in (...)` are byte-identical
apart from the `ctx.` deref.
2. **HookCtx — present + complete.** `meta.HookCtx` frozen dataclass has all 5 documented fields
(`.domain`, `.base_url`, `.meta`, `.deps`, `.op`); `meta.hook_ctx(domain, meta, op=…)` factory
builds it and pulls `deps` from `$CCCI_DEPS_FILE`. All call sites migrated: run_recipe_ci
`pre_<op>`, BACKUP_VERIFY; lifecycle `extra_env` + READY_PROBE; screenshot `SCREENSHOT(page, ctx)`.
(NB my first pass falsely flagged "no HookCtx" — that was a STALE WORKTREE at P2; corrected by
checking out fd02d9f. Logged here for honesty.)
3. **Legacy-signature guard (P3.4) — PRESENT + works, live-probed.** `meta.check_hook_signature`
exact-matches positional params and raises a CLEAR MetaError naming the P3 migration + HookCtx
fields. Wired into both `load()` (recipe_meta hooks; SCREENSHOT expects `(page, ctx)`, rest
`(ctx)`) and the orchestrator (ops.py `pre_<op>`). Crafted tmp metas: legacy `READY_PROBE(domain)`,
`SCREENSHOT(page, domain, meta)`, `EXTRA_ENV(domain)` all → MetaError at load; `READY_PROBE(ctx)`
loads clean. No silent mid-run TypeError path.
4. **Coverage diff re-run at P3 head — still 0/21 deltas** (hook presence + all data keys unchanged).

Net: P1+P2+P3 all clean under cold adversarial probing. M1 still gated on full unit+concurrency+lint
on the cc-ci host, P4–P6, R2 end-to-end via the real screenshot orchestrator path, and a final
coverage re-diff. No findings filed; no VETO.

### Interim pre-review of frozen P4 (branch @ 29a28e2) — @2026-06-10T18:55Z, cold from fresh host clone

Builder landed P4 (custom-test ergonomics) and moved to P5, so P4 is frozen. Pre-reviewed it cold.
**No defects found.** NOT an M1 verdict — M1 stays gated (see "Still UNVERIFIED" below).

Cold acceptance (fresh `git clone` on cc-ci host at 29a28e2, my own checkout — not the Builder's):
- `cc-ci-run -m pytest tests/unit -q` → **184 passed** (exact match to claim; full suite, no
cross-fixture pollution from the session-scoped `deps` fixture).
- `cc-ci-run -m pytest tests/unit/test_discovery.py test_discovery_phase2.py
test_conftest_fixtures.py -q` → 14 passed.
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS** (ruff format/check, deadnix,
shfmt, shellcheck, yamllint all clean).

Correctness probes:
1. **Placement-rule claim ("zero in-repo users of top-level custom tests") — HOLDS.** Filesystem
sweep of every `tests/<recipe>/test_*.py`: ALL are lifecycle names
(test_{install,upgrade,backup,restore}.py). No top-level non-lifecycle custom exists in-repo, so
dropping the top-level glob in `discovery.custom_tests` loses ZERO coverage. The lifecycle-name
exclusion is retained inside functional/playwright as the double-run safety net.
2. **Discovery diff — clean.** Top-level `glob(test_*.py)` branch removed; functional/ + playwright/
subdir globs retained with `basename not in lifecycle_names` guard. Docstring + module header
updated to state the placement RULE.
3. **Test changes are adaptation + strengthening, NOT weakening (no VETO trigger).**
   - `test_discovery_phase2`: renamed to `..._placement_rule_...`; now ASSERTS the top-level
     `test_sso_smoke.py` is `not in names` (new negative assertion proving the behavior change),
     while functional/playwright customs are still `in names` and lifecycle name excluded.
   - `test_discovery::test_custom_tests_repo_local_gated`: repo-local custom moved from top-level
     into `functional/`; HC2 default-deny (`== []` when unapproved) and approved-case
     (`functional/test_sso.py in names`, `test_install.py` excluded) both INTACT. HC2 integrity
     preserved.
4. **op_state fixture — correct.** Skips with clear reason on unset env / missing file / non-JSON
(`except ValueError` catches JSONDecodeError); reads & returns parsed dict otherwise. Tests
cover 3 of 4 paths (the non-JSON skip path is untested — minor coverage gap, not a defect; the
branch is trivially correct by inspection).

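A sketch of the four op_state paths as a pure function, so each one (including the untested non-JSON path) can be exercised without pytest. `read_op_state` is a hypothetical name; only the `CCCI_OP_STATE_FILE` variable and the skip-on-failure behaviour come from the text. The fixture wrapper is shown as a comment because it needs pytest.

```python
import json
import os
from typing import Any, Mapping, Optional, Tuple

OpState = Tuple[Optional[Any], Optional[str]]


def read_op_state(env: Mapping[str, str] = os.environ) -> OpState:
    """Return (state, None) on success or (None, skip_reason) on each failure path."""
    path = env.get("CCCI_OP_STATE_FILE")
    if not path:
        return None, "CCCI_OP_STATE_FILE not set"
    try:
        with open(path) as f:
            return json.load(f), None
    except FileNotFoundError:
        return None, f"op state file missing: {path}"
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return None, f"op state file unparseable: {exc}"


# The conftest fixture would then be a thin wrapper (needs pytest):
#
# @pytest.fixture
# def op_state():
#     state, reason = read_op_state()
#     if reason:
#         pytest.skip(reason)
#     return state
```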
Net: P1+P2+P3+P4 all clean under cold adversarial probing; both halves of every phase claim
(unit count + lint) reproduced cold on a fresh clone. No findings filed; no VETO.

**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** P5 (manifest) + P6 (docs);
`pytest tests/concurrency -q` cold; R2 end-to-end through the real orchestrator screenshot path;
final coverage re-diff on the COMPLETE branch (P1–P6, all 21 recipes, effective customization set
unchanged); recipe-test diffs mechanical-only across the whole branch; HC2/F2-11/generic-floor
integrity at the final head. These wait for `claim(rcust): M1`.

### Interim pre-review of frozen P5 (branch @ 68954be) — @2026-06-10T19:06Z, cold from fresh host clone

Builder landed P5 (customization manifest) and moved to P6, so P5 is frozen. Pre-reviewed it cold.
**No blocking defect; one secret-SURFACE observation raised (heads-up to Builder, NOT a VETO, NOT
an M1 secret-leak failure).** NOT an M1 verdict.

Cold acceptance (fresh `git clone` on cc-ci host at 68954be, my own checkout):
- `cc-ci-run -m pytest tests/unit -q` → **191 passed** (exact match to claim).
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.

Primary adversarial target — SECRET LEAKAGE via the new manifest surface (D-gate: published logs +
dashboard contain NO secrets, incl. generated app passwords):
1. **Generated/runtime secrets — NOT exposed (gate holds).** `manifest.build` collects only:
`meta_non_default` (static recipe_meta), hook NAMES (pre-ops/install_steps.sh/compose.ccci.yml),
overlay FILENAMES, custom-test COUNTS, and env-override KEY names (printed `KEY=1`, value never
rendered). It never touches `deps` (client_secret), `op_state`, abra-generated app passwords, or
any env VALUE. The cardinal concern — generated app passwords on the dashboard — is structurally
absent from this surface.
2. **Cold all-recipes sweep.** Built+rendered the manifest for all 21 recipes on the host; grepped
the rendered blocks AND the results.json `customization` payload for secret/password/token/key/
credential and for any 32+ char high-entropy string. The ONLY hit, across every recipe, is
plausible's `EXTRA_ENV.SECRET_KEY_BASE` =
`"ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"`.
3. **OBSERVATION (not a leak):** that value is a HARDCODED, committed, PUBLIC dummy CI constant
(tests/plausible/recipe_meta.py, in the open-source repo) — not a generated or real secret.
`meta_non_default` dumps EXTRA_ENV literal dicts verbatim into the log AND results.json (→
dashboard), so a field literally named `SECRET_KEY_BASE` with a value now appears on the
dashboard. No real secret is exposed (it's public), so this is NOT a D-gate failure and does NOT
block P5. BUT it's a standing surface: (a) a dashboard secret-scan gets a true-positive-shaped
hit on a public dummy (noise that could mask a real leak), and (b) if any recipe ever set a real
secret-ish literal in a meta dict, the manifest would surface it unredacted. Flagged to Builder
via BUILDER-INBOX as a heads-up to consider redacting values of sensitive-named meta keys before
M1. Will re-examine on the real dashboard at the M1 cold-verify.
4. **HC2-honoring — confirmed.** Manifest routes ALL repo-local reads through `discovery._gated`
(ops.py loop direct; `install_steps`/`resolve_overlay_op`/`custom_tests` each call `_gated`
internally). An unapproved repo-local recipe contributes nothing to the manifest.
5. **Pure presentation — holds.** `build()` only reads files/env and returns a dict; `render()`
formats a string. Called at run_recipe_ci.py:889-890 (print) + embedded at :1261 into results;
no state mutation, no verdict influence. `_jsonable` renders callables as `'<hook>'` (so a
callable EXTRA_ENV/READY_PROBE never leaks closure internals) and tuples→lists for JSON.

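The sweep in item 2 combines a keyword grep with a long-token check. This sketch assumes a 32-character minimum, which comes from the review. It also assumes a Shannon-entropy threshold of 3.5 bits per character; that threshold is an arbitrary choice and not from the review. The keyword match is intentionally broad, so names like KEYCLOAK also hit.

```python
import math
import re
from collections import Counter
from typing import Dict, List

_KEYWORDS = re.compile(r"secret|password|token|key|credential", re.IGNORECASE)
_LONG_TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{32,}")


def shannon_entropy(s: str) -> float:
    """Bits per character of s, using s's own character frequencies."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def sweep(text: str, min_entropy: float = 3.5) -> Dict[str, List[str]]:
    """Deliberately broad scan: keyword hits plus long, high-entropy tokens."""
    return {
        "keywords": sorted({m.group(0).lower() for m in _KEYWORDS.finditer(text)}),
        "tokens": [t for t in _LONG_TOKEN.findall(text) if shannon_entropy(t) >= min_entropy],
    }
```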
Net: P1–P5 all clean under cold adversarial probing; every phase claim (unit count + lint)
reproduced cold. No findings filed; no VETO. One non-blocking secret-surface heads-up sent.

**Still UNVERIFIED for M1:** P6 (docs); `pytest tests/concurrency -q` cold; R2 end-to-end via the
real orchestrator screenshot path; final coverage re-diff on the COMPLETE branch (all 21 recipes,
effective customization unchanged); recipe-test diffs mechanical-only across the whole branch;
HC2/F2-11/generic-floor integrity at final head; AND — at the M1 dashboard check — confirm the
SECRET_KEY_BASE-named field on the real dashboard is the accepted public dummy (or redacted).
These wait for `claim(rcust): M1`.

## M1 — implementation verified: **PASS** @2026-06-10T19:27Z (branch `restructure/recipe-custom` @ 858e0f5)

Cold-verified from TWO fresh clones on the cc-ci host (NEW=858e0f5, OLD=main pre-restructure;
merge-base 49fb818 confirmed → `main..858e0f5` is exactly P1–P6). Verdict formed from the phase plan
(SSOT), the code/git history, the STATUS verification facts, and my own cold re-runs — NOT from
JOURNAL rationale (isolation discipline; I did not need to consult JOURNAL).

**All M1 Definition-of-Done items PASS:**
|
||||
|
||||
1. **Cold test suites — match claim exactly.** Fresh clone @858e0f5:
|
||||
`cc-ci-run -m pytest tests/unit -q` → **192 passed**; `tests/concurrency -q` → **23 passed**
|
||||
(untouched by this plan, proven); `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
|
||||
|
||||
2. **Coverage diff (cardinal risk) — 0 REAL deltas / 21 recipes.** Wrote throwaway extractors that
|
||||
resolve EVERY recipe's effective customization in BOTH worlds — OLD via the legacy loaders
|
||||
(`_load_meta` + `lifecycle._recipe_extra_env` + `deps.declared_deps` + `_recipe_meta_flag`),
|
||||
NEW via `meta.load()` + `meta.extra_env/upgrade_extra_env` — for the common keys (HEALTH_*,
|
||||
timeouts, DEPS, EXTRA_ENV resolved at a fixed domain, UPGRADE_EXTRA_ENV, BACKUP_CAPABLE,
|
||||
EXPECTED_NA, UPGRADE_BASE_VERSION, READY_PROBE/BACKUP_VERIFY presence). Diff = **0 behavioral
|
||||
deltas**; the only raw diffs were 20× `UPGRADE_EXTRA_ENV: None→{}` (unset default representation,
|
||||
behaviorally identical) and mumble (most-customized: callable EXTRA_ENV→dict, UPGRADE_EXTRA_ENV,
|
||||
READY_PROBE) is **byte-identical** old↔new.
|
||||
Deleted keys accounted for (no silent loss): `SKIP_GENERIC` (0 recipe users); `CHAOS_BASE_DEPLOY`
|
||||
→ overlay-presence (discourse+ghost, exactly the two shipping compose.ccci.yml — perfect 1:1, no
|
||||
change either direction); `OIDC_AT_INSTALL` → install-time made universal (drive+meet were
|
||||
already install-time). **lasuite-docs** declared DEPS but NOT OIDC_AT_INSTALL → OLD post-install,
|
||||
NEW install-time: an INTENTIONAL P2b consolidation, not a drop — flagged below for M2 validation.
|
||||
|
||||
3. **Assertion weakening (VETO-class): NONE.** Full branch diff over all recipe test files
   (excluding harness unit/concurrency/regression): 18 asserts removed, 18 added. After mechanical
   normalization (`domain`→`ctx.domain`, `deps_creds`→`deps`, `MAX_USERS`→`_MAX_USERS`, whitespace)
   the removed and added assert sets are **IDENTICAL**, with zero unmatched in either direction. Every
   change is a pure signature/fixture/constant rename; no expected value altered, no assert deleted.
   Spot-confirmed that discourse/ghost `_psql(domain,…ci_marker…) in (…)` only changes to `ctx.domain` (expected
   tuple + SQL byte-identical). **No VETO.**

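The normalize-then-compare step in item 3 can be sketched as a multiset difference. This is an illustrative reconstruction, assuming removed and added lines have already been extracted from the branch diff; the regexes are one plausible encoding of the listed renames. Like the scan described, it only looks at lines starting with `assert `, so a new `raise AssertionError` passes through it unseen, which is the M1-method gap recorded in the M2-prep audit.

```python
import re
from collections import Counter

# The mechanical renames listed above, applied to BOTH sides so that a pure
# rename normalizes away. Illustrative patterns only.
RENAMES = [
    (re.compile(r"(?<![\w.])domain\b"), "ctx.domain"),  # bare `domain`, not `ctx.domain`
    (re.compile(r"\bdeps_creds\b"), "deps"),
    (re.compile(r"\bMAX_USERS\b"), "_MAX_USERS"),  # \b never matches inside `_MAX_USERS`
]


def normalize(line: str) -> str:
    text = " ".join(line.split())  # collapse whitespace runs
    for pattern, replacement in RENAMES:
        text = pattern.sub(replacement, text)
    return text


def unmatched_asserts(removed: list[str], added: list[str]) -> tuple[Counter, Counter]:
    """Multiset difference of normalized `assert` lines, in both directions."""
    def asserts(lines: list[str]) -> Counter:
        return Counter(normalize(ln) for ln in lines if ln.lstrip().startswith("assert "))

    old, new = asserts(removed), asserts(added)
    return old - new, new - old


removed = ['    assert _psql(domain, q) in ("1", "t")', "assert len(users) <= MAX_USERS"]
added = ['assert _psql(ctx.domain, q) in ("1", "t")', "assert len(users) <= _MAX_USERS"]
gone, new_only = unmatched_asserts(removed, added)
print(dict(gone), dict(new_only))
```

Comparing multisets (`Counter`) rather than sets also catches an assert that was duplicated before and now appears only once.
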
4. **Deleted-code fallout: clean.** No dangling LIVE refs to any of the 13 deleted symbols
   (`_recipe_meta`/`_load_meta`/`_recipe_extra_env`/`_recipe_meta_flag`/`declared_deps`/
   `is_canonical_enrolled`/`OIDC_AT_INSTALL`/`CHAOS_BASE_DEPLOY`/`SKIP_GENERIC`/`setup_custom_tests`/
   `deps_apps`/`deps_creds`/`deployed_app`). The only residue is stale DOC/comment mentions of
   `OIDC_AT_INSTALL` + `setup_custom_tests.sh` in PARITY.md files (a non-blocking P6 cosmetic nit).

5. **Validation gaps: closed.** Cold-probed `meta.load()` with synthetic bad metas: a typo'd key,
   str-on-int, bool-as-int, callable-on-data-key, the legacy hook signature `READY_PROBE(domain)`, and an
   unknown key ALL raise `MetaError` (clear, naming the offending file/key). Clean metas and metas with
   underscore-prefixed private helpers load fine (no false positives). No silent pass.

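The checks probed in item 5 can be sketched as a registry-driven validator. This is not `meta.load()` itself: it assumes the recipe file has already been `exec()`'d into a namespace dict, and the `KEYS` subset and its type/signature specs are hypothetical. The sketch shows why bool-as-int needs an explicit check (`bool` subclasses `int` in Python) and how a `difflib` suggestion can be attached to the unknown-key error.

```python
import difflib
import inspect
from typing import Any


class MetaError(Exception):
    """Bad recipe_meta.py (same exception name as in the text; sketch only)."""


# Hypothetical registry subset: data keys map to a type, hook keys map to
# their required parameter names.
KEYS: dict[str, Any] = {
    "HEALTH_TIMEOUT": int,
    "DEPS": list,
    "BACKUP_CAPABLE": bool,
    "READY_PROBE": ("ctx",),
    "SCREENSHOT": ("page", "ctx"),
}


def validate(path: str, namespace: dict[str, Any]) -> dict[str, Any]:
    """Check the ALL-CAPS names of an exec()'d recipe_meta.py namespace."""
    meta: dict[str, Any] = {}
    for key, value in namespace.items():
        if key.startswith("_") or not key.isupper():
            continue  # _private helpers, lowercase names, __builtins__
        if key not in KEYS:
            close = difflib.get_close_matches(key, list(KEYS), n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            raise MetaError(f"{path}: unknown key {key}{hint}")
        spec = KEYS[key]
        if isinstance(spec, tuple):  # hook key: must be a function with this signature
            if not callable(value):
                raise MetaError(f"{path}: {key} must be a function")
            params = tuple(inspect.signature(value).parameters)
            if params != spec:
                raise MetaError(f"{path}: {key}{params} must take {spec}")
        elif callable(value):
            raise MetaError(f"{path}: {key} is a data key but got a callable")
        elif isinstance(value, bool) and spec is not bool:
            # bool is an int subclass, so isinstance(True, int) alone would pass
            raise MetaError(f"{path}: {key} expects {spec.__name__}, got bool")
        elif not isinstance(value, spec):
            raise MetaError(f"{path}: {key} expects {spec.__name__}, got {type(value).__name__}")
        meta[key] = value
    return meta


namespace: dict[str, Any] = {}
exec("HEALTH_TIMOUT = 300\n", namespace)  # note the typo
try:
    validate("tests/example/recipe_meta.py", namespace)
except MetaError as err:
    print(err)
```

Skipping names that start with `_` is what keeps underscore-private helpers (and `__builtins__`, which `exec()` injects) from producing false positives.
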
6. **R2 fixed end-to-end.** Cold proof through the REAL load path: a recipe declaring
   `def SCREENSHOT(page, ctx)` is surfaced by `meta.load()` and resolved as a callable by
   `screenshot._load_screenshot_hook` (the old L1 allowlist dropped it; now it arrives); the orchestrator wires
   it via `run_recipe_ci.py:1029 capture(…, recipe_meta=meta)` → `hook(page, hook_ctx(domain, meta))`.
   A recipe without the hook → None (default landing-page path). The legacy `SCREENSHOT(page, domain, meta)`
   signature is rejected at load.

7. **HC2 / F2-11 / generic-floor integrity: preserved.** Cold-probed `discovery.custom_tests` +
   `install_steps`: UNAPPROVED repo-local → `[]` / `None` (default-deny holds); APPROVED → surfaced.
   `sso_dep_unverified` (F2-11) logic UNCHANGED (only a comment edited): a deps-not-ready run that
   skips ≥1 `requires_deps` test still suppresses the green signal. The generic floor `_skip_generic`
   defaults to run (additive); opt-out is now env-only (same env vars as before; the 0-user meta key
   removed) and surfaced LOUDLY in CI + flagged `!!` in the manifest: strictly stronger, never
   silent.

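The F2-11 rule in item 7 reduces to a small predicate. The sketch below is hypothetical: `RunSummary` and its field names are invented for illustration, and the real harness records these facts its own way. It shows how a deps-not-ready run that skipped at least one `requires_deps` test is kept from reading as green even when every executed tier passed.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    # Hypothetical field names; not the harness's actual data model.
    all_tiers_passed: bool
    deps_ready: bool
    skipped_requires_deps: int


def sso_dep_unverified(run: RunSummary) -> bool:
    """Deps were not ready AND at least one @requires_deps test was skipped."""
    return not run.deps_ready and run.skipped_requires_deps >= 1


def is_green(run: RunSummary) -> bool:
    # A skip caused by missing deps must never be reported as a pass.
    return run.all_tiers_passed and not sso_dep_unverified(run)


print(is_green(RunSummary(all_tiers_passed=True, deps_ready=False, skipped_requires_deps=1)))
```
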
8. **(Bonus) P5 secret-surface heads-up RESOLVED + verified.** The Builder landed `858e0f5`,
   redacting secret-named meta values in the manifest (my P5 BUILDER-INBOX ask). Cold-verified:
   `plausible.EXTRA_ENV.SECRET_KEY_BASE` → `<redacted>` in BOTH the log block and results.json;
   redaction recurses into nested dict keys; a word-segment `(^|_)KEY(_|$)` regex avoids over-matching
   (KEYCLOAK_* passes). All-21-recipe sweep: exactly 1 redaction, ZERO over-redaction, ZERO
   under-redaction (no secret-shaped value remains). Regression test
   `test_manifest_redacts_sensitive_named_values` is present.

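The redaction behaviour verified in item 8 can be sketched as a recursive walk keyed on names. This is a minimal illustration rather than the code landed in `858e0f5`: only `KEY` appears in the text, so the other sensitive words in the regex are assumptions. It shows why the word-segment anchors matter: `SECRET_KEY_BASE` contains a whole `KEY` segment, while `KEYCLOAK_URL` only has `KEY` as a prefix of a longer segment.

```python
import re
from typing import Any

# Word-segment match: a sensitive word counts only as a whole "_"-delimited
# segment. The word list beyond KEY is an assumption.
SENSITIVE = re.compile(r"(^|_)(KEY|SECRET|PASSWORD|TOKEN)(_|$)", re.IGNORECASE)


def redact(value: Any, name: str = "") -> Any:
    """Replace values whose key NAME looks secret; recurse into nested dicts.

    Key names themselves stay visible; only the values are replaced.
    """
    if name and SENSITIVE.search(name):
        return "<redacted>"
    if isinstance(value, dict):
        return {k: redact(v, str(k)) for k, v in value.items()}
    return value


meta = {"EXTRA_ENV": {"SECRET_KEY_BASE": "s3cr3t", "KEYCLOAK_URL": "https://kc.example"}}
print(redact(meta))
```
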
**Verdict: M1 PASS.** No findings filed, no VETO.

**This does NOT clear `## DONE`.** Per the phase DoD, DONE requires a fresh Adversary PASS for BOTH
M1 *and* M2. M2 (merged-main real-CI regression sweep vs the committed baseline matrix) is still
unverified. M2 watch-items I will specifically re-check from run logs:

- **lasuite-docs OIDC is now install-time** (the post→install change above): it must pass a real run with
  OIDC wired at install (skip-count 0 on its `requires_deps` tests).
- the customization spot-checks enumerated in plan §M2.4 (mumble READY_PROBE tcp lines, cryptpad
  SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + auto-chaos base deploy, lasuite-*
  deps provisioning + OIDC tests ran, immich ops.py seeds, manifest block present in every log,
  screenshot.png where capture succeeded).
- canary suite (RED canaries still caught at their intended tier) + per-recipe level == baseline matrix.
- zero leaked apps after teardown.

### M2-prep: independent hook-port audit (shell→python / best-effort↔fatal drift) @2026-06-10T20:55Z

Triggered by the lasuite-drive regression (below), which my M1 PASS MISSED: my M1 coverage diff
compared recipe_meta KEYS (resolved values), not ops.py hook BODIES, and my assertion scan matched
`assert ` but not `raise AssertionError`. So a hook that flipped best-effort→fatal was invisible to my
M1 method. M2 (the real-CI sweep) caught it: the safety net working as designed. I then audited ALL
hook ports cold (`git diff c2508c7..origin/main` per recipe ops.py + the 2 setup_custom_tests.sh
ports), filtering for non-mechanical error-handling changes (raise/assert/except/exit/timeout/poll):

- **lasuite-drive `pre_install`**: GENUINE rcust regression (Builder-disclosed, I confirmed).
  The OLD setup_custom_tests.sh bucket poll fell through on its 90s timeout (best-effort, no failure; the
  custom-tier `test_minio_storage.py` upload→list→download is the real gate); the NEW port added a
  terminal `raise AssertionError`, giving a deterministic install RED whenever the bucket appears just after
  90s. Fix-forward APPROVED (restore best-effort print+return, scoped to line 54 only; conditioned
  on an L5 re-run + my diff re-verify). See the approval entry in BUILDER-INBOX history (commit 57c66ad).
- **lasuite-docs `install_steps.sh`**: INTENTIONAL P2b change, NOT a defect. The OLD setup_custom_tests
  did `exit 1` on missing deps/null KC creds; the NEW script does `exit 0` (no-op) for missing deps (now
  gated by F2-11: the `@requires_deps` OIDC test skips, so `sso_dep_unverified` suppresses green) BUT
  preserves `exit 1` on secret-insert failure. Consistent with the install-time-deps redesign.
  WATCH-ITEM (residual): the missing-deps path now relies entirely on F2-11; the sweep didn't
  exercise it (deps were ready, skip-count 0). Mechanism verified present at M1; not blocking.
- **All other ops.py** (cryptpad, discourse, ghost, immich, keycloak, lasuite-meet, matrix-synapse,
  mattermost-lts, mumble, n8n, plausible, custom-html): pure mechanical ctx migration
  (`domain`→`ctx.domain`, `meta`→`ctx.meta`); expected tuples/strings byte-identical (spot-checked
  keycloak 201/409 + 204/200, discourse/ghost _psql ci_marker). No error-handling drift.

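The best-effort contract restored for lasuite-drive `pre_install` can be sketched generically. This is an illustration of the pattern, not the recipe's ops.py. The `check` callable stands in for whatever bucket-exists probe the hook uses (hypothetical here), and the 90s/3s defaults mirror the numbers in the text. The difference from the regressed port is the last step: on timeout it warns loudly and returns `False` instead of raising `AssertionError`, leaving the custom-tier MinIO test as the real gate.

```python
import time
from typing import Callable


def poll_best_effort(check: Callable[[], bool], what: str, timeout: float = 90.0,
                     interval: float = 3.0) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Best-effort: on timeout, print a loud warning and return False instead of
    raising, so a later, stronger test stays the real gate.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    print(f"  !! {what} not ready after {timeout:.0f}s; continuing (best-effort)", flush=True)
    return False


# Usage with a stand-in probe that never succeeds (short timeout for the demo).
poll_best_effort(lambda: False, "drive-media-storage bucket", timeout=0.05, interval=0.01)
```
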
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under an approved fix. No other
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.

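The audit's filtering step (keep only diff lines that touch error handling) can be sketched as a scan over `git diff` output. This is an assumption-laden illustration, not the Adversary's actual filter: the token list simply mirrors "raise/assert/except/exit/timeout/poll" from the text, and the sample diff is invented. Listing `AssertionError` as its own alternative is what closes the `assert `-only blind spot described above.

```python
import re

# Tokens that mark an error-handling change. `AssertionError` is listed on its
# own so a new `raise AssertionError(...)` is caught, not only `assert ` lines.
ERROR_HANDLING = re.compile(r"\b(raise|assert|except|exit|timeout|poll\w*)\b|AssertionError")


def error_handling_changes(diff_text: str) -> list[str]:
    """Return the added/removed lines of a unified diff that touch error handling."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content
        if line[:1] in ("+", "-") and ERROR_HANDLING.search(line[1:]):
            hits.append(line)
    return hits


sample = '''--- a/ops.py
+++ b/ops.py
-    print("bucket not ready; continuing")
+    raise AssertionError("bucket not ready")
     stack = ctx.domain.replace(".", "_")
'''
print(error_handling_changes(sample))
```

The filter is deliberately noisy (any `timeout=` argument matches). The point is to shortlist lines for a human best-effort↔fatal judgement, not to decide it.
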
---

### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z

M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting the Builder's
narrative). The 6 first-sweep mismatches break down as follows.

**Confirmed root fact: the REF MISMATCH is real (I verified it, not taken on faith).** Every baseline
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head*, a
different commit. Independently confirmed via `results.json.ref`:

| recipe | baseline run / ref / level | sweep ref / level |
|---|---|---|
| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |

So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
(a) a re-run at the baseline ref on new main == baseline level, or (b) an A/B at the same ref, old vs
new main, == same level. Status per recipe:

- **immich**: m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
- **mattermost-lts**: m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
- **plausible**: m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
  → these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
- **bluesky-pds**: ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
  the same ref → restructure-NEUTRAL at the sweep ref. (The baseline is "L4-equiv, pre-results-era", with
  no run id: a softer baseline, so A/B neutrality is the available evidence.)
- **discourse: NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
  wrong ref to close the gap:
  - baseline 184 (OLD main, 7ae7b0f): all pass → L4.
  - m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired:
    "upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
    to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
  - ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
    → L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
    DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
  - Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
    The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle, which the meta-param
    threading DID touch, so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
    discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
    and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
    FIRING (rather than silently passing the wrong commit) is the safety net working, which is good, but
    it means the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
- **lasuite-drive**: fix-forward 1357544 (restore best-effort bucket poll) landed; it needs a fresh
  L5 run at the baseline ref ffa7d585afa2 on merged main to confirm the baseline. m2rr/earlier runs
  predate the fix or used the default head, so this is NOT yet a clean baseline match. OPEN.

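The two reconciliation criteria (a) and (b) can be encoded as a small classifier. This is a hedged sketch: `Run` is a hypothetical stand-in for the `ref` and `level` fields read from each run's results.json, plus which harness produced the run. It captures the subtlety argued for discourse: when the baseline has a ref, only an A/B *at that ref* counts, so old/new agreement at the sweep ref cannot close a baseline-ref discrepancy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    harness: str          # "old" or "new" main
    ref: Optional[str]    # results.json "ref"; None when no ref was recorded
    level: str            # results.json "level", e.g. "L4"


def reconcile(baseline: Run, runs: list[Run]) -> str:
    """Classify one recipe as CLEAN (criterion a), NEUTRAL (criterion b) or OPEN."""
    # (a) a new-main re-run AT the baseline ref reproduces the baseline level.
    if baseline.ref is not None and any(
        r.harness == "new" and r.ref == baseline.ref and r.level == baseline.level for r in runs
    ):
        return "CLEAN"
    # (b) old and new main agree at the same ref. When the baseline has a ref,
    # only an A/B AT that ref counts: agreement elsewhere says nothing about it.
    olds = [r for r in runs if r.harness == "old"]
    if baseline.ref is not None:
        olds = [baseline] + [r for r in olds if r.ref == baseline.ref]
    for old in olds:
        news = [r for r in runs if r.harness == "new" and r.ref == old.ref]
        if news and all(r.level == old.level for r in news):
            return "NEUTRAL"
    return "OPEN"


discourse = reconcile(
    Run("old", "7ae7b0f", "L4"),
    [Run("new", "7ae7b0f", "L1"), Run("new", "7d53d4ec", "L2"), Run("old", "7d53d4ec", "L2")],
)
print(discourse)
```

Fed the refs and levels recorded above, the sketch reproduces the per-recipe verdicts: immich CLEAN, bluesky-pds NEUTRAL (no baseline ref), discourse and lasuite-drive OPEN.
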
**M2 disposition: still OPEN, no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
bluesky neutral at the sweep ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
claim: (1) a discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at the
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
discourse-HC1 gap so it is addressed in the claim, not glossed over as "the restore flake".

### M2 proof-run progress + self-correction @2026-06-11T00:05Z

The Builder is running (independently, matching my inbox ask) the decisive A/B serially on the box:
`m2-proof.sh` → lasuite-drive @ffa7d585afa2 PR=1 (post-fix-forward 1357544) on merged main 5c0676b,
then discourse @7ae7b0f76efb **PR=2** on merged main (m2p-discourse); `m2-proof2.sh` (queued) →
discourse @7ae7b0f76efb **PR=2** on OLD main (/root/m2-oldmain, ab-discourse-7ae7b0f-oldmain).

**Self-correction to my 23:53Z discourse analysis:** the m2b-discourse run in my analysis used **PR=0**,
but the upgrade HC1 guard resolves the *PR head* for the re-checkout. The L1 failure message ("deployed
chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout failed") is plausibly a **PR=0
artifact** (no real PR to resolve the head from), NOT a restructure regression. The Builder's proof
runs correctly use PR=2 (matching baseline run 184's pr=2). So the apples-to-apples comparison I
need is m2p-discourse (PR=2, new main) vs ab-discourse-7ae7b0f-oldmain (PR=2, old main) vs baseline
184 (PR=2, old main, L4). I will cold-verify those three when they land; my L4→L1 concern is on
hold pending the PR=2 result, not yet a confirmed regression. The live lasu-f68b63 stack is the active
lasuite-drive proof run (expected, not a leak).

159
STATUS-rcust.md
@@ -6,17 +6,160 @@ Work branch: `restructure/recipe-custom` (one commit per phase P1–P6; merged t

## Phase progress

- [ ] P1: harness/meta.py single loader + key registry + migrate L1–L6 + unit tests + doc gen
- [ ] P2: delete legacy keys/paths (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL, SKIP_GENERIC meta, conftest cleanup)
- [ ] P3: uniform ctx hook convention
- [ ] P4: custom-test ergonomics (placement rule, op_state/deps fixtures)
- [ ] P5: customization manifest
- [ ] P6: docs
- [x] P1: single loader + key registry + migrate L1–L6 + unit tests + doc gen
  (branch commit 472a68b)
- [x] P2: delete legacy keys/paths. compose.ccci.yml first-class + auto-chaos; install-time deps only
  (lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only +
  loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture)
  (branch commit 8cd72fd)
- [x] P3: uniform ctx hook convention. HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks
  take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
- [x] P4: custom-test ergonomics. Placement rule (custom under functional/+playwright/ only),
  op_state fixture, deps fixture tests (branch 29a28e2)
- [x] P5: customization manifest. One block at run start (non-default meta keys, hooks, overlays,
  custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed +
  embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring
  (branch commit 68954be: new runner/harness/manifest.py + tests/unit/test_manifest.py)
- [x] P6: docs rewritten to the end state. recipe-customization.md is now the REFERENCE (was
  review spec): §8 records R1–R9 resolutions, §4 keeps the generated table + HookCtx, §5 the
  end-state shapes; testing.md invariant updated to install-time-deps isolation, generic
  opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time
  OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
- [x] Adversary inbox 19:06Z (P5 manifest dashboard hygiene): addressed. Secret-NAMED meta
  values (top-level + nested dict keys) render as '<redacted>' in manifest + results.json;
  key names stay visible; unit-test pinned (branch commit 858e0f5)

## P1–P6 verification facts (for the eventual M1 cold-verify)

- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2,
  P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh`
  from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in
  `tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned
  by unit test).

## M2 baseline matrix (built BEFORE merge, per plan M2.1)

Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good
evidence. Levels are results.json `level`; evidence = run id under /var/lib/cc-ci-runs/<id>/
(on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.

| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A: declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A: no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |

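Re-parsing sweep levels against this matrix can be sketched as a read of each run's results.json. This is a sketch under stated assumptions: the `<runs_root>/<prefix><recipe>/results.json` layout follows the m2r-<r>/m2b-<r> naming used in this file, `level` is assumed to be stored as a string like `"L4"`, and `EXPECTED` is only a three-row excerpt of the table. Pre-results-era rows (bluesky-pds, mumble) have no level to compare, so they need manual evidence instead.

```python
import json
from pathlib import Path

# Hypothetical excerpt of the matrix above (rows with a results.json-era level).
EXPECTED = {"cryptpad": "L4", "lasuite-drive": "L5", "mailu": "L2"}


def level_mismatches(runs_root: Path, prefix: str,
                     expected: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Compare <runs_root>/<prefix><recipe>/results.json "level" against the matrix.

    Returns {recipe: (expected, actual)} for every mismatch. A missing or
    unreadable results.json counts as a mismatch with actual "MISSING".
    """
    out: dict[str, tuple[str, str]] = {}
    for recipe, want in expected.items():
        path = runs_root / f"{prefix}{recipe}" / "results.json"
        try:
            got = json.loads(path.read_text()).get("level", "MISSING")
        except (OSError, json.JSONDecodeError):
            got = "MISSING"
        if got != want:
            out[recipe] = (want, got)
    return out


if __name__ == "__main__":
    for recipe, (want, got) in level_mismatches(Path("/var/lib/cc-ci-runs"), "m2r-", EXPECTED).items():
        print(f"{recipe}: expected {want}, got {got}")
```

Treating a missing results.json as a mismatch (rather than skipping it) keeps a run that never produced output from silently counting as matched.
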
Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps
provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply to the
sweep runs, not retroactively here.

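The M2.4 spot-greps can be sketched as a per-recipe table of regexes checked against each run log. The patterns below are assumptions built from the log lines quoted later in this file's Gate section ("ready-probe OK (tcp 3x)", "ccci-overlay: provided compose.ccci.yml ... auto-chaos", the lasuite-docs install-time OIDC and `test_oidc_login_via_keycloak PASSED` lines). The exact log format may differ, and the sketch does not cover every item in the list above.

```python
import re

# Per-recipe evidence patterns, assumed from log lines quoted in this file.
SPOT_CHECKS = {
    "mumble": [r"ready-probe OK \(tcp \d+x\)"],
    "ghost": [r"ccci-overlay: provided compose\.ccci\.yml.*auto-chaos"],
    "discourse": [r"ccci-overlay: provided compose\.ccci\.yml.*auto-chaos"],
    "lasuite-docs": [
        r"install-time OIDC: provisioning deps \['keycloak'\] BEFORE deploy",
        r"test_oidc_login_via_keycloak PASSED",
    ],
}


def missing_evidence(recipe: str, log_text: str) -> list[str]:
    """Return the spot-check patterns that do NOT appear anywhere in the log."""
    return [p for p in SPOT_CHECKS.get(recipe, []) if not re.search(p, log_text)]


log = "ready-probe OK (tcp 3x): 127.0.0.1:64738\n"
print(missing_evidence("mumble", log))
```

An empty list means every expected customization left its trace in the log; any returned pattern names exactly which customization did not visibly execute.
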
## Gate

(none claimed yet; phase bootstrap)
**Gate: M2 IN PROGRESS** (M1 PASS in REVIEW-rcust.md, 01f9f70, 2026-06-10).

- M2.0 merge: `restructure/recipe-custom` merged to main as 01e6d49 (merge commit, no force);
  push build green: drone build **326 success** on 01e6d49 (API-verified).
- M2.2 canary suite: **7/7 PASSED** in 286s (fresh clone of merged main at /root/m2-sweep on
  cc-ci, log /root/m2-canary.log). Green canaries pass, and all four RED canaries are still caught at
  their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs
  /root/m2-logs/<r>.log; results /var/lib/cc-ci-runs/m2r-<r>/): first pass **15/21 matched
  baseline**.
  hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/
  matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed
  at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
  6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
  discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing
  truncated-dump capture race: discourse BACKUP_VERIFY honestly failed 3/3 attempts, and its
  docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified
  committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
  bluesky-pds abra `FATA deploy timed out` at the default 600s during concurrent image pulls;
  lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later; every
  subsequent tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/,
  results m2rr-<r>/) completed 20:44Z, but ran default heads, not baseline refs (superseded by
  the targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
  - **Baseline-ref re-runs on merged main** (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/,
    results m2b-<r>/): **plausible L4, mattermost-lts L4, immich L4** at their exact baseline refs.
    Baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
    m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): **L1, NEW mode**, upgrade HC1
    `deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'`. Investigated facts (cold-checkable
    in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag commit `0.7.0+3.3.1`
    (`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the preserved per-run clone HEAD =
    7ae7b0f (the upgrade re-checkout DID run and persist); the
    `service "sidekiq" depends on undefined service "discourse"` log line is benign noise (it appears
    verbatim in the PASSING m2r/m2rr upgrade sections too; the published compose ships a dangling
    depends_on, see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy itself left the
    base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs
    queued (below).
  - **Old-main A/B at the m2r ref** (/root/m2-ab.sh, /root/m2-ab-logs/, results ab-<r>-oldmain/):
    discourse @7d53d4ec on OLD main = **L2 restore fail** == new-main m2r L2 at the same ref →
    restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = **L0 install fail**.
  - **bluesky-pds re-characterized (not a pull timeout)**: the app container crash-loops with
    `Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE
    failures: m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old
    default head b2d86ef). Same pinned tag, both harnesses, both refs → the upstream image content moved
    under the tag; the recipe cannot deploy on ANY harness. Evidence:
    `grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
    Restructure-neutral (old == new, L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs
  /root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
  1. **lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b** (post-fix-forward
     1357544) → run id m2p-lasuite-drive; EXPECTED L5 (the Adversary's approval condition).
  2. **discourse @7ae7b0f PR=2 on merged main** (exact baseline-184 invocation) → m2p-discourse;
     discriminates a PR=0 artifact/race from a deterministic-at-ref failure.
  3. **discourse @7ae7b0f PR=2 on OLD main** (/root/m2-oldmain) → ab-discourse-7ae7b0f-oldmain;
     completes the same-ref A/B that the upgrade-HC1 mode is missing.
- M2.4 spot-greps (customizations actually executed; log evidence in /root/m2-logs/):
  manifest block present 21/21; mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`; ghost+discourse
  `ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class path live);
  discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs `install-time OIDC:
  provisioning deps ['keycloak'] BEFORE deploy` + `test_oidc_login_via_keycloak PASSED`
  (requires_deps skip-count 0); immich ops.py pre_upgrade/pre_backup/pre_restore seed lines;
  cryptpad EXTRA_ENV='<hook>' in manifest + its 4 overlays + playwright green (hook applied);
  19 screenshot.png across m2r-* dirs.
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak only,
  **zero leaked apps**.
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.

**Gate history: M1 CLAIMED 2026-06-10 → PASS** (branch head 858e0f5)

- WHAT: P1–P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd, P3=fd02d9f,
  P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
- HOW (cold, from a fresh clone of the branch):
  - `cc-ci-run -m pytest tests/unit -q` → EXPECTED: **192 passed**
  - `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: **23 passed** (untouched by this plan;
    Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
  - `nix develop .#lint --command scripts/lint.sh` → EXPECTED: **lint: PASS**
  - resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) →
    EXPECTED: 0 deltas
  - adversarial review of the full diff `main..restructure/recipe-custom`
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep,
  committed pre-merge per plan).

## Current

Bootstrapping phase; starting P1.
M2 in progress: merge done (01e6d49, build 326 green); canary suite running on cc-ci; 21-recipe
sweep queued behind it. Evidence lands here as steps complete.

22
machine-docs/ADVERSARY-INBOX.md
Normal file
@@ -0,0 +1,22 @@
# Adversary inbox: from Builder @2026-06-11T00:20Z (re: your 23:53Z asks; both in flight + new facts)

Both asks are queued serially on cc-ci (driver log /root/m2-proof-logs/driver.log):

1. **lasuite-drive @ffa7d585afa2 PR=1 on merged main @5c0676b** (post-1357544): RUNNING now,
   run id m2p-lasuite-drive, log /root/m2-proof-logs/lasuite-drive.log. Expected L5.
2. **discourse @7ae7b0f76efb PR=2 on merged main** (exact baseline-184 invocation, vs m2b's PR=0):
   m2p-discourse, queued behind 1.
3. **discourse @7ae7b0f76efb PR=2 on OLD main** (/root/m2-oldmain): ab-discourse-7ae7b0f-oldmain,
   queued behind 2. This is your same-ref A/B.

New facts you'll want for your cold re-verify (details + paths in STATUS-rcust.md):

- m2b-discourse: the per-run clone is PRESERVED at /var/lib/cc-ci-runs/m2b-discourse/abra/recipes/discourse
  with HEAD=7ae7b0f, so the upgrade re-checkout executed and persisted; `eb96de94` (the
  stamped chaos commit) is the prev-base tag commit 0.7.0+3.3.1. So the failure is "chaos redeploy
  left the base stamp", not "re-checkout failed" (the HC1 message's wording is its generic guess).
- The `service "sidekiq" depends on undefined service "discourse"` line in the m2b log is NOT the
  failure: it appears verbatim in the PASSING m2r/m2rr upgrade sections (the dangling depends_on ships
  in the published compose; see tests/discourse/compose.ccci.yml NOTE).
- bluesky-pds re-characterized: all three failures (m2r, m2rr, ab-oldmain) are the SAME app
  crash-loop `Cannot find module '/app/index.js'`. The upstream image moved under the pinned tag, so
  no harness can deploy it. Not a pull timeout (my earlier STATUS wording was wrong, now fixed).
  grep MODULE_NOT_FOUND in the runs' abra/logs/default/.

@@ -348,8 +348,27 @@ def services_converged(domain: str) -> bool:
# `want == "0"` rejection wrongly treated those as never-converged, hanging the deploy
# forever. `cur == want` (with `want` present) is the correct convergence test; a service
# still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
if not want or cur != want:
if not want:
return False
if cur != want:
# A TRIGGERED one-shot (restart_policy none, scaled 0→1, runs once, exits 0) reports
# "0/1" FOREVER after its task completes — swarm never restarts it, so a bare
# `cur != want` rejection would block convergence for the rest of the run (lasuite-drive
# minio-createbuckets, rcust M2: install assert burned the full DEPLOY_TIMEOUT after the
# P2b port moved the bucket trigger BEFORE the install assert; pre-restructure the
# trigger ran after it, so converge never saw the 0/1). A replica deficit explained
# entirely by COMPLETE tasks IS converged: the one-shot did its job and will never run
# again. Anything else in the deficit (Running/Starting/Pending = still spinning up;
# Failed/Rejected = genuinely broken) stays not-converged, and a desired>0 service with
# no tasks yet is still scheduling.
tasks = subprocess.run(
["docker", "service", "ps", name, "--format", "{{.CurrentState}}"],
capture_output=True,
text=True,
)
states = [ln.split()[0] for ln in tasks.stdout.split("\n") if ln.strip()]
if not (states and all(s == "Complete" for s in states)):
return False
# N/N alone is NOT convergence during a stop-first rolling update: a chaos redeploy that changes
# a non-app service image (e.g. immich's db pin) registers the update immediately, but swarm may
# not have cycled that service's task yet — the OLD task still shows 1/1, then dies seconds later

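The completed-one-shot rule in the `services_converged` hunk above can be restated as a pure function. This is a simplified sketch, not the harness code: it takes the `cur/want` replicas string and the first word of each task's `{{.CurrentState}}` as plain inputs instead of shelling out to docker, and it deliberately omits the stop-first rolling-update check that the real N/N path performs.

```python
def service_converged(replicas: str, task_states: list[str]) -> bool:
    """Convergence rule for one swarm service (simplified sketch).

    replicas:    the "cur/want" column of `docker stack services`, e.g. "0/1".
    task_states: first word of each `docker service ps --format {{.CurrentState}}` row.
    """
    cur, _, want = replicas.partition("/")
    if not want:
        return False  # no desired count reported
    if cur == want:
        return True  # plain N/N (the rolling-update caveat is not modelled here)
    # Replica deficit: converged only if EVERY task is Complete (a finished
    # one-shot). Running/Starting/Pending = still spinning up, Failed/Rejected =
    # broken, and no tasks at all = still scheduling; all stay not-converged.
    return bool(task_states) and all(s == "Complete" for s in task_states)


print(service_converged("0/1", ["Complete"]))
```

These inputs map one-to-one onto the non-vacuity pins in `tests/unit/test_converged_oneshot.py`: Complete-only deficit converges; Failed, mixed, Running, or no tasks does not.
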
@@ -19,7 +19,12 @@ def pre_install(ctx):
NOT create the MinIO bucket: `minio-createbuckets` is a `replicas:0` one-shot (restart_policy:
none) that must be triggered. The MinIO storage test asserts the bucket exists, so trigger it
here and poll. `--detach` is REQUIRED: the job creates the bucket then EXITS 0, so it never
holds a steady 1/1 replica — a blocking scale would wait forever."""
holds a steady 1/1 replica — a blocking scale would wait forever.

BEST-EFFORT, like the setup_custom_tests.sh it replaced: on poll timeout we WARN and continue
(the one-shot often lands just after the window). The custom-tier MinIO storage test is the
real gate for a genuinely missing bucket — failing the install op here was an rcust M2
regression (the original hook fell through on timeout by design)."""
stack = ctx.domain.replace(".", "_")
print(" pre_install: creating MinIO bucket via the minio-createbuckets one-shot", flush=True)
subprocess.run(
@@ -51,7 +56,12 @@ def pre_install(ctx):
)
return
time.sleep(3)
raise AssertionError("minio-createbuckets one-shot did not create drive-media-storage in 90s")
print(
" !! pre_install: minio-createbuckets one-shot did not create drive-media-storage in 90s "
"— continuing (best-effort, as the pre-restructure hook did); the custom-tier MinIO test "
"gates a genuinely missing bucket",
flush=True,
)


def _wait_collabora_ready(domain, timeout=420):

96
tests/unit/test_converged_oneshot.py
Normal file
@@ -0,0 +1,96 @@
"""Unit tests for lifecycle.services_converged's completed-one-shot rule (rcust M2 fix-forward).
|
||||
|
||||
A TRIGGERED one-shot service (restart_policy none, scaled 0→1, runs once, exits 0) reports "0/1"
|
||||
forever after its task completes — swarm never restarts it. A bare `cur != want` rejection then
|
||||
blocks convergence for the REST OF THE RUN (lasuite-drive minio-createbuckets: the P2b port moved
|
||||
the bucket trigger BEFORE the install assert, so the assert burned the full DEPLOY_TIMEOUT —
|
||||
pre-restructure the trigger ran after the assert and converge never saw the 0/1).
|
||||
|
||||
Pins (the Adversary's non-vacuity criteria):
|
||||
- deficit explained ENTIRELY by Complete tasks → converged (the one-shot did its job).
|
||||
- deficit with a Failed task → NOT converged (a broken one-shot must not pass).
|
||||
- deficit with a Running/Preparing task → NOT converged (still spinning up; no early green).
|
||||
- deficit with NO tasks yet → NOT converged (still scheduling).
|
||||
- plain N/N services still converge; plain 0/1-spinning-up still doesn't (regression guards).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import os
|
||||
import sys
|
||||
|
||||
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
|
||||
from harness import lifecycle as lc # noqa: E402
|
||||
|
||||
|
||||
class _R:
|
||||
def __init__(self, stdout="", stderr="", returncode=0):
|
||||
self.stdout, self.stderr, self.returncode = stdout, stderr, returncode
|
||||
|
||||
|
||||
def _patch_docker(monkeypatch, replicas_rows, task_states_by_service=None, update_state=""):
|
||||
"""Fake subprocess.run for the three docker calls services_converged makes."""
|
||||
task_states_by_service = task_states_by_service or {}
|
||||
|
||||
def fake_run(args, **kw):
|
||||
if args[:3] == ["docker", "stack", "services"]:
|
||||
return _R(stdout="\n".join(replicas_rows) + "\n")
|
||||
if args[:3] == ["docker", "service", "ps"]:
|
||||
name = args[3]
|
||||
return _R(stdout="\n".join(task_states_by_service.get(name, [])) + "\n")
|
||||
if args[:3] == ["docker", "service", "inspect"]:
|
||||
return _R(stdout=update_state + "\n")
|
||||
raise AssertionError(f"unexpected docker call: {args}")
|
||||
|
||||
monkeypatch.setattr(lc.subprocess, "run", fake_run)
|
||||
|
||||
|
||||
def test_completed_oneshot_deficit_is_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
|
||||
{"stack_minio-createbuckets": ["Complete 28 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is True
|
||||
|
||||
|
||||
def test_failed_oneshot_deficit_is_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 1/1", "stack_minio-createbuckets 0/1"],
|
||||
{"stack_minio-createbuckets": ["Failed 2 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_mixed_complete_and_failed_tasks_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_oneshot 0/1"],
|
||||
{"stack_oneshot": ["Complete 5 minutes ago", "Failed 6 minutes ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_still_spinning_up_not_converged(monkeypatch):
|
||||
_patch_docker(
|
||||
monkeypatch,
|
||||
["stack_app 0/1"],
|
||||
{"stack_app": ["Preparing 10 seconds ago"]},
|
||||
)
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_deficit_with_no_tasks_yet_not_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 0/1"], {"stack_app": []})
|
||||
assert lc.services_converged("app.example.com") is False
|
||||
|
||||
|
||||
def test_all_full_replicas_still_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_db 1/1"])
|
||||
assert lc.services_converged("app.example.com") is True
|
||||
|
||||
|
||||
def test_on_demand_zero_zero_oneshot_still_converged(monkeypatch):
|
||||
_patch_docker(monkeypatch, ["stack_app 1/1", "stack_minio-createbuckets 0/0"])
|
||||
assert lc.services_converged("app.example.com") is True
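The rule these tests pin can be restated as a pure function over each service's replica counts and task states, which makes the cases easy to check without patching `subprocess.run`. The sketch below is a hypothetical restatement, not `lifecycle.services_converged` itself: `parse_replicas`, `service_converged` and `stack_converged` are illustrative names; the "name cur/want" row and "State age" task-line formats are taken from the fake docker output in `_patch_docker`; treating a surplus (`cur > want`) as not converged is an assumption carried over from the bare `cur != want` rule; and the update-state check that `_patch_docker` also fakes is left out.

```python
from typing import Dict, List, Sequence, Tuple


def parse_replicas(row: str) -> Tuple[str, int, int]:
    """Split an assumed 'name cur/want' row such as 'stack_app 1/1'."""
    name, replicas = row.split()
    cur, want = (int(n) for n in replicas.split("/"))
    return name, cur, want


def service_converged(cur: int, want: int, task_states: Sequence[str]) -> bool:
    """Hypothetical restatement of the completed-one-shot rule for one service."""
    if cur == want:
        return True  # plain N/N, including an on-demand 0/0 one-shot
    if cur > want:
        return False  # assumed: a surplus still fails the bare cur != want rule
    # Deficit: take the first word of each task line, e.g. "Complete 28 minutes ago".
    states = [line.split()[0] for line in task_states if line.strip()]
    if not states:
        return False  # no tasks yet: still scheduling
    if any(state != "Complete" for state in states):
        return False  # a Failed, Running or Preparing task blocks convergence
    # Every missing replica must be accounted for by a finished one-shot task.
    return len(states) >= want - cur


def stack_converged(rows: List[str], tasks_by_service: Dict[str, List[str]]) -> bool:
    """True only if every service row passes service_converged."""
    for row in rows:
        name, cur, want = parse_replicas(row)
        if not service_converged(cur, want, tasks_by_service.get(name, [])):
            return False
    return True
```

Checking every task state, rather than only counting Complete tasks, is what keeps the mixed Complete/Failed case red: one finished run does not hide a failed one.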