refactor(level): four essential rungs only — integration & recipe-local are optional

Per operator: the level ladder is now the FOUR essential rungs every recipe is held to: install, upgrade (essential), backup/restore, functional (top = L4). Integration (SSO/OIDC) and recipe-local are OPTIONAL capabilities: they no longer appear as level rungs or skip rows and never cap the level. SSO is still enforced for the run VERDICT (unchanged in run_recipe_ci.py); it just doesn't affect the level. derive_rungs simplified accordingly (drops declared/deps/sso/repo-local inputs). custom-html-tiny's EXPECTED_NA is back to just backup_restore.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
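As a rough illustration of the rule described in this message, the sketch below counts essential rungs in ladder order and ignores everything else. It is not the harness's `derive_rungs`: the function name `sketch_level`, the result-dict shape, and the "stop at the first missed rung" behaviour are all assumptions made for the example.

```python
# Hypothetical sketch: none of these names are taken from the harness itself.
ESSENTIAL_RUNGS = ("install", "upgrade", "backup_restore", "functional")


def sketch_level(results: dict[str, bool]) -> int:
    """Count essential rungs passed in ladder order, stopping at the first miss.

    Keys outside ESSENTIAL_RUNGS (e.g. "integration", "recipe_local") are
    ignored, so an optional capability can neither raise nor cap the level.
    """
    level = 0
    for rung in ESSENTIAL_RUNGS:
        if not results.get(rung, False):
            break  # assumed: a missed rung caps the level at the rungs below it
        level += 1
    return level


full = {"install": True, "upgrade": True, "backup_restore": True, "functional": True}
print(sketch_level(full))                               # 4
print(sketch_level({**full, "integration": False}))     # 4 (optional, ignored)
print(sketch_level({**full, "backup_restore": False}))  # 2
```

Under these assumptions a failing optional capability leaves the level at 4, while the run verdict would be decided elsewhere.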
test(card): cover _skip_rows (intentional green / unintentional amber)
2026-06-09 02:55:47 +00:00 · 2026-06-09 02:42:57 +00:00 · 2026-06-09 02:42:05 +00:00 · 2026-06-09 02:36:53 +00:00 · 2026-06-09 02:26:44 +00:00 · 2026-06-09 02:00:16 +00:00
121 changed files with 694 additions and 1186 deletions
--- a/.drone.yml
+++ b/.drone.yml
@@ -35,12 +35,10 @@ steps:
 # the comment-bridge). Deploys the recipe at the PR head, runs install/upgrade/backup + any
 # recipe-local tests via the shared harness, then guarantees teardown (plan §4.2/§4.3).
 #
-# Resource safety (plan §4.2/§4.3): DRONE_RUNNER_CAPACITY=2 (nix/modules/drone-runner.nix) +
-# concurrency.limit=2 below allow two recipe runs in parallel. Concurrent-run safety is enforced by
-# the harness, not by serialisation: same-recipe runs serialise on a per-recipe flock
-# (lifecycle.acquire_recipe_lock — the shared ~/.abra/recipes/<recipe> checkout is the conflict),
-# and every run registers its app domain + pid in /run/cc-ci-active so the run-start janitor only
-# reaps orphans whose owning run is DEAD (alive → never touched; unknown → age fallback, default 2h).
+# Resource safety (plan §4.2/§4.3): MAX_TESTS=DRONE_RUNNER_CAPACITY=1 (nix/modules/drone-runner.nix) is
+# the primary concurrency cap; concurrency.limit below is a redundant belt. CCCI_JANITOR_MAX_AGE=0
+# makes the run-start janitor reap ANY orphaned run app before deploying — safe because capacity=1
+# means no concurrent run exists (a SIGKILL'd/timed-out build leaves an orphan with no teardown).
 kind: pipeline
 type: exec
 name: recipe-ci
@@ -54,16 +52,16 @@ trigger:
    - custom
 concurrency:
-  limit: 2
+  limit: 1
 steps:
  - name: ci
    environment:
      STAGES: install,upgrade,backup,restore,custom
      CCCI_JANITOR_MAX_AGE: "0"
      # The exec runner points HOME at a per-build workspace; force it to /root so abra finds its
-      # server config + recipes under /root/.abra (as the manual M4/M5 runs did). Safe with
-      # capacity=2: app names are unique per (recipe,pr,ref) and same-recipe runs serialise on the
-      # per-recipe flock, so concurrent builds never touch the same recipe checkout or app.
+      # server config + recipes under /root/.abra (as the manual M4/M5 runs did). Safe: capacity=1
+      # means no concurrent build shares /root/.abra.
      HOME: /root
    commands:
      # RECIPE/REF/PR/SRC (+ CCCI_QUICK for `!testme --quick`) are injected as env vars from the
--- a/bridge/bridge.py
+++ b/bridge/bridge.py
@@ -64,8 +64,6 @@ def parse_trigger(body):
    if s == f"{TRIGGER} --quick":
        return True, True
    return False, False
 ALLOWLIST = {u.strip() for u in os.environ.get("AUTH_ALLOWLIST", "").split(",") if u.strip()}
@@ -169,12 +167,8 @@ def post_commit_status(owner, repo, sha, state, target_url, description=""):
        f"{GITEA_API}/repos/{owner}/{repo}/statuses/{sha}",
        GITEA_TOKEN,
        method="POST",
-        data={
-            "state": state,
-            "target_url": target_url,
-            "description": description,
-            "context": "cc-ci/testme",
-        },
+        data={"state": state, "target_url": target_url,
+              "description": description, "context": "cc-ci/testme"},
    )
@@ -223,9 +217,7 @@ def result_comment_body(recipe, sha, num, run_url, status):
        if artifact_available(badge_url):
            body += f"\n\n[![level]({badge_url})]({run_url})"
        return f"{body}\n\n{links}"
-    return (
-        f"{header} → {run_url}\n\n_(summary card unavailable — see the run for details.)_ {links}"
-    )
+    return f"{header} → {run_url}\n\n_(summary card unavailable — see the run for details.)_ {links}"
 def watch_and_reflect(owner, name, number, num, recipe, sha, comment_id, run_url):
--- a/dashboard/dashboard.py
+++ b/dashboard/dashboard.py
@@ -66,13 +66,8 @@ _COLORS = {
 # Level → colour ramp, kept in sync with runner/harness/card.py LEVEL_COLOR (the dashboard is a
 # standalone stdlib service that doesn't import the runner harness, so the small map is duplicated).
 _LEVEL_COLOR = {
-    0: "#e5534b",
-    1: "#e0823d",
-    2: "#e0823d",
-    3: "#d9b343",
-    4: "#a0b93f",
-    5: "#57ab5a",
-    6: "#3fb950",
+    0: "#e5534b", 1: "#e0823d", 2: "#e0823d", 3: "#d9b343",
+    4: "#a0b93f", 5: "#57ab5a", 6: "#3fb950",
 }
@@ -274,11 +269,7 @@ def _card(r):
            f'<a class="shot" href="{run_url}" title="open run">'
            f'<span class="ph">no screenshot</span>{_level_pill(r["level"])}</a>'
        )
-    cap = (
-        f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>'
-        if r["level_cap_reason"]
-        else ""
-    )
+    cap = f'<div class="cap">{html.escape(r["level_cap_reason"])}</div>' if r["level_cap_reason"] else ""
    return (
        f'<div class="card">{shot}<div class="body">'
        f'<div class="name">{html.escape(r["recipe"])}</div>'
@@ -316,11 +307,7 @@ def render_history(recipe, rows):
    trs = []
    for r in rows:
        color = _COLORS.get(r["status"], "#8b949e")
-        lvl = (
-            "—"
-            if r["level"] is None
-            else f'<b style="color:{level_color(r["level"])}">L{int(r["level"])}</b>'
-        )
+        lvl = "—" if r["level"] is None else f'<b style="color:{level_color(r["level"])}">L{int(r["level"])}</b>'
        shot = f'<a href="/runs/{r["number"]}/summary.png">card</a>' if r["has_screenshot"] else "—"
        trs.append(
            f'<tr><td><a href="{html.escape(r["url"])}">#{r["number"]}</a></td>'
@@ -330,7 +317,7 @@ def render_history(recipe, rows):
        )
    body = "\n".join(trs) or '<tr><td colspan="6">no runs for this recipe yet</td></tr>'
    inner = (
-        f"<h1>{_FLOWER} {html.escape(recipe)} — run history</h1>"
+        f'<h1>{_FLOWER} {html.escape(recipe)} — run history</h1>'
        '<p class="sub"><a href="/">← all recipes</a> · every <code>!testme</code> run, newest first.</p>'
        "<table><thead><tr><th>Run</th><th>Status</th><th>Level</th><th>Version</th>"
        "<th>When</th><th>Card</th></tr></thead><tbody>"
--- a/flake.nix
+++ b/flake.nix
@@ -31,36 +31,34 @@
      ];
    in
    {
-      nixosConfigurations = {
-        # Canonical live host target: the Hetzner cc-ci server.
-        # Use `.#cc-ci` for the current production host.
-        cc-ci = nixpkgs.lib.nixosSystem {
-          inherit system;
-          modules = [
-            sops-nix.nixosModules.sops
-            ./nix/hosts/cc-ci-hetzner/configuration.nix
-          ];
-        };
+      # Canonical live host target: the Hetzner cc-ci server.
+      # Use `.#cc-ci` for the current production host.
+      nixosConfigurations.cc-ci = nixpkgs.lib.nixosSystem {
+        inherit system;
+        modules = [
+          sops-nix.nixosModules.sops
+          ./nix/hosts/cc-ci-hetzner/configuration.nix
+        ];
+      };
-        # Legacy Incus VM host definition retained only for historical comparison and fallback.
-        # Do NOT use this target on the live Hetzner server.
-        cc-ci-incus = nixpkgs.lib.nixosSystem {
-          inherit system;
-          modules = [
-            sops-nix.nixosModules.sops
-            ./nix/hosts/cc-ci/configuration.nix
-          ];
-        };
+      # Legacy Incus VM host definition retained only for historical comparison and fallback.
+      # Do NOT use this target on the live Hetzner server.
+      nixosConfigurations.cc-ci-incus = nixpkgs.lib.nixosSystem {
+        inherit system;
+        modules = [
+          sops-nix.nixosModules.sops
+          ./nix/hosts/cc-ci/configuration.nix
+        ];
+      };
-        # Explicit alias for the live Hetzner host. Kept alongside `cc-ci` so the intended host
-        # target remains obvious in recovery/migration workflows.
-        cc-ci-hetzner = nixpkgs.lib.nixosSystem {
-          inherit system;
-          modules = [
-            sops-nix.nixosModules.sops
-            ./nix/hosts/cc-ci-hetzner/configuration.nix
-          ];
-        };
+      # Explicit alias for the live Hetzner host. Kept alongside `cc-ci` so the intended host target
+      # remains obvious in recovery/migration workflows.
+      nixosConfigurations.cc-ci-hetzner = nixpkgs.lib.nixosSystem {
+        inherit system;
+        modules = [
+          sops-nix.nixosModules.sops
+          ./nix/hosts/cc-ci-hetzner/configuration.nix
+        ];
      };
      devShells.${system} = {
--- a/nix/hosts/cc-ci-hetzner/configuration.nix
+++ b/nix/hosts/cc-ci-hetzner/configuration.nix
@@ -7,7 +7,7 @@
 #   git clone --recursive https://git.autonomic.zone/recipe-maintainers/cc-ci.git /etc/cc-ci
 #   install -m600 <age-private-key> /var/lib/sops-nix/key.txt
 #   nixos-rebuild switch --flake /etc/cc-ci#cc-ci-hetzner
-{ pkgs, ... }:
+{ pkgs, lib, ... }:
 {
  imports = [
    ./hardware.nix
--- a/nix/hosts/cc-ci-hetzner/hardware.nix
+++ b/nix/hosts/cc-ci-hetzner/hardware.nix
@@ -11,17 +11,13 @@
 {
  imports = [ (modulesPath + "/profiles/qemu-guest.nix") ];
-  boot = {
-    loader = {
-      efi.efiSysMountPoint = "/boot/efi";
-      grub = {
-        efiSupport = true;
-        efiInstallAsRemovable = true;
-        device = "nodev";
-      };
+  boot.loader = {
+    efi.efiSysMountPoint = "/boot/efi";
+    grub = {
+      efiSupport = true;
+      efiInstallAsRemovable = true;
+      device = "nodev";
    };
-    initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
-    initrd.kernelModules = [ "nvme" ];
  };
  fileSystems."/boot/efi" = {
@@ -29,6 +25,9 @@
    fsType = "vfat";
  };
+  boot.initrd.availableKernelModules = [ "ata_piix" "uhci_hcd" "xen_blkfront" "vmw_pvscsi" ];
+  boot.initrd.kernelModules = [ "nvme" ];
  fileSystems."/" = {
    device = "/dev/sda1";
    fsType = "ext4";
--- a/nix/modules/drone-runner.nix
+++ b/nix/modules/drone-runner.nix
@@ -9,18 +9,13 @@
 let
  # MAX_TESTS (plan §4.2/§4.3 resource safety): max CI builds the exec runner runs at once. Drone
  # queues the rest in its native pending-build queue (no custom queue). THE concurrency cap that
-  # bounds how many test apps can be live at once.
-  #
-  # Raised to 2 (operator request 2026-06-09) so two recipes can be tested in parallel (e.g. immich
-  # and plausible under active development at once). Verified safe on the current node (Hetzner cpx22,
-  # ~7.6 GiB / 4 vCPU — NOTE: smaller than the original 28 GiB this was written for): a full immich CI
-  # stack measured ~1 GiB (server+ML+pg+redis) with multiple GiB free, so two concurrent recipes fit.
-  # The concurrency PRECONDITION holds: the run-start janitor is age-based (default 2h) + run-app-name
-  # scoped, so it never reaps a concurrent in-flight run (harness.lifecycle.janitor). TRADE-OFF: with
-  # capacity>1 a SIGKILL'd build (no teardown) leaves an orphan the run-start sweep can't reap
-  # immediately (it might be a live run) — bounded instead by the 2h janitor + the /upgrade-all
-  # start/end reap + sweep-orphans. Revert to "1" if OOM / disk-I/O contention is observed under load.
-  maxTests = "2";
+  # bounds how many test apps can be live at once — kept LOW (1) on this single 28GiB node since
+  # recipes are heavy (immich/matrix large volumes). With capacity=1 there is never a concurrent
+  # in-flight run, so the run-start janitor can safely reap *any* orphan (a SIGKILL'd build runs no
+  # teardown) and the "at most MAX_TESTS apps live" bound holds exactly. Raise to 2 only if the node
+  # is shown to handle two light recipes at once (then the janitor MUST stay age-based to avoid
+  # reaping a concurrent run — see DECISIONS.md "Resource safety").
+  maxTests = "1";
 in
 {
  # Drone ships under the Polyform Small Business license (nixpkgs marks it unfree);
--- a/nix/modules/nightly-sweep.nix
+++ b/nix/modules/nightly-sweep.nix
@@ -29,7 +29,7 @@ in
    serviceConfig = {
      Type = "oneshot";
      # A full sweep across several recipes (each a cold deploy/test/teardown) is long; bound it.
-      TimeoutStartSec = "21600"; # 6h ceiling
+      TimeoutStartSec = "21600";  # 6h ceiling
      ExecStart = "${sweep}/bin/cc-ci-nightly-sweep";
    };
  };
@@ -39,7 +39,7 @@ in
    wantedBy = [ "timers.target" ];
    timerConfig = {
      OnCalendar = "*-*-* 03:00:00";
-      Persistent = true; # catch up a missed nightly after downtime
+      Persistent = true;   # catch up a missed nightly after downtime
      RandomizedDelaySec = "600";
    };
  };
--- a/nix/modules/reports.nix
+++ b/nix/modules/reports.nix
@@ -3,49 +3,10 @@
 # no secrets — just static files behind traefik + the wildcard TLS (same pattern as dashboard.nix,
 # but a plain nginx:alpine since there's nothing to render server-side). Content is updated by writing
 # files into /var/lib/cc-ci-reports; nginx serves them live (no redeploy needed).
-#
-# It ALSO serves a same-origin realtime PR-status proxy at /pr/<recipe>/<n>: the report's STATUS
-# column fetches it client-side to show each PR's live state (open vs. ✓). Same-origin means no
-# dependency on the Gitea CORS allow-list; the recipe mirrors are public so no token is needed. The
-# proxy is pinned to recipe-maintainers + a safe recipe-name charset and is read-only (GET/HEAD).
 { pkgs, ... }:
 let
  reportsDir = "/var/lib/cc-ci-reports";
-  # Custom nginx server: static report files + the /pr/<recipe>/<n> → Gitea-API proxy. Replaces the
-  # stock /etc/nginx/conf.d/default.conf (which the image's nginx.conf includes inside http{}).
-  nginxConf = pkgs.writeText "cc-ci-reports-default.conf" ''
-    server {
-        listen 80;
-        server_name _;
-        root /usr/share/nginx/html;
-        index index.html;
-        # Realtime PR-status proxy for the Recipe Report STATUS column.
-        # GET /pr/<recipe>/<n> -> the PUBLIC Gitea PR JSON ({state, merged, ...}). Same-origin from
-        # the browser's view, so no CORS dependency; unauthenticated, since the recipe mirrors are
-        # public. The repo owner is hard-pinned to recipe-maintainers and the recipe name to a
-        # slashless charset, so the proxied path can only ever address recipe-maintainers/<name>/pulls
-        # (it cannot be coerced to another org or path). Only safe read methods are allowed.
-        location ~ ^/pr/([a-z0-9._-]+)/([0-9]+)$ {
-            limit_except GET HEAD { deny all; }
-            resolver 127.0.0.11 ipv6=off valid=30s;   # docker embedded DNS (forwards external names)
-            proxy_ssl_server_name on;
-            proxy_set_header Host git.autonomic.zone;
-            proxy_set_header Accept "application/json";
-            proxy_pass https://git.autonomic.zone/api/v1/repos/recipe-maintainers/$1/pulls/$2;
-            proxy_intercept_errors off;
-            proxy_connect_timeout 5s;
-            proxy_read_timeout 10s;
-            add_header Cache-Control "no-store" always;  # always fetch live state, never cache in the browser
-        }
-        location / {
-            try_files $uri $uri/ =404;
-        }
-    }
-  '';
  stack = pkgs.writeText "cc-ci-reports-stack.yml" ''
    version: "3.8"
    services:
@@ -56,10 +17,6 @@ let
            source: ${reportsDir}
            target: /usr/share/nginx/html
            read_only: true
-          - type: bind
-            source: ${nginxConf}
-            target: /etc/nginx/conf.d/default.conf
-            read_only: true
        networks:
          - proxy
        deploy:
--- a/runner/harness/abra.py
+++ b/runner/harness/abra.py
@@ -168,9 +168,7 @@ def secret_generate(domain: str, timeout: int = 300) -> None:
    )
-def deploy(
-    domain: str, chaos: bool = True, timeout: int = 900, no_converge_checks: bool = False
-) -> None:
+def deploy(domain: str, chaos: bool = True, timeout: int = 900, no_converge_checks: bool = False) -> None:
    args = ["app", "deploy", domain, "-o", "-n"]
    if chaos:
        args.append("-C")
@@ -205,10 +203,7 @@ def backup_create(domain: str, timeout: int = 900) -> str:
    # remote and fails "authentication required: Unauthorized". Returns the captured output, whose
    # restic JSON summary line carries the produced "snapshot_id" (the backup artifact, DG3) — note
    # `abra app backup snapshots` needs a TTY and is awkward to script, so we read the create output.
-    out = (
-        _run_pty(["app", "backup", "create", domain, "-n", "-C", "-o"], timeout=timeout).stdout
-        or ""
-    )
+    out = _run_pty(["app", "backup", "create", domain, "-n", "-C", "-o"], timeout=timeout).stdout or ""
    # Echo the backup output (incl. backupbot's pre-hook run / any "Failed to run command" or
    # "Container ... not running" ERROR) into the run log. Backup is otherwise opaque: a pre-hook that
    # fails to register/run leaves the DB dump out of the snapshot, surfacing only as a downstream
--- a/runner/harness/browser.py
+++ b/runner/harness/browser.py
@@ -13,15 +13,8 @@ from __future__ import annotations
 import time
-def goto_with_retry(
-    page,
-    url,
-    *,
-    deadline_seconds: int = 120,
-    accept_statuses=(200, 304),
-    goto_timeout_ms: int = 30_000,
-    wait_until: str = "domcontentloaded",
-):
+def goto_with_retry(page, url, *, deadline_seconds: int = 120, accept_statuses=(200, 304),
+                    goto_timeout_ms: int = 30_000, wait_until: str = "domcontentloaded"):
    """Poll `page.goto(url)` until status is in `accept_statuses` OR the deadline expires.
    Returns the final Playwright response. Raises AssertionError if the deadline expires without
--- a/runner/harness/canonical.py
+++ b/runner/harness/canonical.py
@@ -55,9 +55,7 @@ def enrolled_recipes() -> list[str]:
    out = []
    try:
        for name in sorted(os.listdir(tests_dir)):
-            if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(
-                name
-            ):
+            if os.path.isfile(os.path.join(tests_dir, name, "recipe_meta.py")) and is_enrolled(name):
                out.append(name)
    except OSError:
        pass
@@ -124,15 +122,11 @@ def deploy_canonical(recipe: str, timeout: int = 900) -> None:
    abra.recipe_checkout(recipe, version)
    r = subprocess.run(
        ["abra", "app", "deploy", domain, version, "-o", "-n", "-f"],
-        capture_output=True,
-        text=True,
-        timeout=timeout,
+        capture_output=True, text=True, timeout=timeout,
    )
    if r.returncode != 0:
-        raise RuntimeError(
-            f"deploy canonical {domain} {version} failed: "
-            f"{(r.stderr + ' ' + r.stdout).strip()[:300]}"
-        )
+        raise RuntimeError(f"deploy canonical {domain} {version} failed: "
+                           f"{(r.stderr + ' ' + r.stdout).strip()[:300]}")
    _set_status(recipe, "warm")
--- a/runner/harness/card.py
+++ b/runner/harness/card.py
@@ -148,9 +148,7 @@ RUNG_LABEL = {
    "backup_restore": "backup/restore",
    "functional": "functional",
 }
-SKIP_GREEN = (
-    "#57ab5a"  # muted green — an intentional skip reads like a pass (but labelled, never inflating)
-)
+SKIP_GREEN = "#57ab5a"  # muted green — an intentional skip reads like a pass (but labelled, never inflating)
 def _skip_rows(skips: dict) -> str:
@@ -161,16 +159,14 @@ def _skip_rows(skips: dict) -> str:
    for rung, reason in (skips.get("intentional") or {}).items():
        rows.append(
            f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{SKIP_GREEN}">⊘</span>'
-            f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
+            f'<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>'
            f'<td class="st" style="color:{SKIP_GREEN}">intentional skip</td></tr>'
        )
-        rows.append(
-            f'<tr class="skipreason"><td></td><td colspan="2">{html.escape(reason)}</td></tr>'
-        )
+        rows.append(f'<tr class="skipreason"><td></td><td colspan="2">{html.escape(reason)}</td></tr>')
    for rung in skips.get("unintentional") or []:
        rows.append(
            f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{GAP_COLOR}">⊘</span>'
-            f"<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>"
+            f'<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>'
            f'<td class="st" style="color:{GAP_COLOR}">unintentional skip</td></tr>'
        )
        rows.append(
--- a/runner/harness/deps.py
+++ b/runner/harness/deps.py
@@ -28,7 +28,7 @@ from __future__ import annotations
 import contextlib
 import json
 import os
-from collections.abc import Iterable
+from typing import Iterable
 from . import lifecycle, naming
@@ -36,7 +36,9 @@ from . import lifecycle, naming
 def declared_deps(recipe: str) -> list[str]:
    """Read `DEPS` from `tests/<recipe>/recipe_meta.py` — a list of recipe names this recipe needs
    deployed alongside it. Returns [] if none."""
-    path = os.path.join(os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py")
+    path = os.path.join(
+        os.path.dirname(__file__), "..", "..", "tests", recipe, "recipe_meta.py"
+    )
    if not os.path.exists(path):
        return []
    ns: dict = {}
--- a/runner/harness/generic.py
+++ b/runner/harness/generic.py
@@ -222,11 +222,7 @@ def assert_restore_healthy(domain: str, meta: dict) -> None:
 def perform_upgrade(
-    domain: str,
-    recipe: str,
-    head_ref: str | None,
-    deploy_timeout: int = 900,
-    meta: dict | None = None,
+    domain: str, recipe: str, head_ref: str | None, deploy_timeout: int = 900, meta: dict | None = None
 ) -> dict[str, str | None]:
    """Perform the UPGRADE op once, in place, to the PR-HEAD code under test (HC1): re-checkout the
    PR head (the prev-tag base deploy reset the recipe working tree), then `abra app deploy --chaos`
@@ -271,9 +267,7 @@ def perform_upgrade(
        deploy_timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)),
        http_timeout=int(meta.get("HTTP_TIMEOUT", 300)),
    )
-    lifecycle.wait_ready_probes(
-        meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout))
-    )
+    lifecycle.wait_ready_probes(meta, domain, timeout=int(meta.get("DEPLOY_TIMEOUT", deploy_timeout)))
    after = lifecycle.deployed_identity(domain)
    # Evidence (HC1): the chaos-version label = the deployed recipe commit; it should match the
    # PR-head we checked out — proving the upgrade deployed the code under test, not a published tag.
--- a/runner/harness/http.py
+++ b/runner/harness/http.py
@@ -73,7 +73,7 @@ def http_post(
    `data` is JSON-encoded if content_type='application/json',
    form-encoded if 'application/x-www-form-urlencoded' (the OIDC token endpoint form),
    or sent raw bytes if data is already bytes."""
-    if isinstance(data, bytes | bytearray):
+    if isinstance(data, (bytes, bytearray)):
        body: bytes | None = bytes(data)
    elif content_type == "application/json" and data is not None:
        body = json.dumps(data).encode()
@@ -107,7 +107,7 @@ def http_request(
 ) -> tuple[int, object | None]:
    """Arbitrary-method HTTP (PUT/DELETE/PATCH) for parity tests that mutate. Same shape as
    http_post (returns (status, json_or_None))."""
-    if isinstance(data, bytes | bytearray):
+    if isinstance(data, (bytes, bytearray)):
        body: bytes | None = bytes(data)
    elif content_type == "application/json" and data is not None:
        body = json.dumps(data).encode()
@@ -142,7 +142,7 @@ def post_with_headers(
    """Like http_post but ALSO returns the response headers as a dict — for APIs that hand back an
    auth token in a response header rather than the body (e.g. mattermost login → `Token` header).
    Returns (status, parsed_json_or_None, response_headers). status=0 + {} on transport failure."""
-    if isinstance(data, bytes | bytearray):
+    if isinstance(data, (bytes, bytearray)):
        body: bytes | None = bytes(data)
    elif content_type == "application/json" and data is not None:
        body = json.dumps(data).encode()
@@ -252,16 +252,13 @@ def retry_http_post(
 ) -> tuple[int, object | None]:
    """POST with retry until expect_fn(status, json) is truthy. Defaults to any 2xx."""
    if expect_fn is None:
        def expect_fn(s, _j):  # noqa: ARG001
            return 200 <= s < 300
    result: list[tuple[int, object | None]] = [(0, None)]
    def _check():
-        s, j = http_post(
-            url, data=data, headers=headers, content_type=content_type, timeout=timeout
-        )
+        s, j = http_post(url, data=data, headers=headers, content_type=content_type, timeout=timeout)
        result[0] = (s, j)
        return expect_fn(s, j)
--- a/runner/harness/lifecycle.py
+++ b/runner/harness/lifecycle.py
@@ -8,7 +8,6 @@ from __future__ import annotations
 import contextlib
 import datetime
-import fcntl
 import json
 import os
 import re
@@ -30,73 +29,6 @@ class TeardownError(RuntimeError):
    pass
-# --- Concurrent-run safety (capacity=2) -------------------------------------------------------
-# Two cooperating mechanisms, both process-lifetime-scoped so SIGKILL can't leak a stale lock:
-#  1. Per-recipe flock: ~/.abra/recipes/<recipe> is ONE shared working tree that fetch_recipe
-#     rm-rf's/reclones and the upgrade tier git-checkouts mid-run. Concurrent runs of the SAME
-#     recipe would corrupt each other's deploy tree (observed: immich builds 229/230 deployed a
-#     tree missing its config), so they serialise on an exclusive flock; different recipes run in
-#     parallel. The kernel drops a flock when the holder dies, however it dies.
-#  2. Active-run registry: each run registers its app domain + pid before creating the app, so the
-#     janitor can tell a live concurrent run from a crashed run's orphan (see janitor()).
-RECIPE_LOCK_DIR = "/run/lock"
-ACTIVE_RUN_DIR = "/run/cc-ci-active"
-def acquire_recipe_lock(recipe: str):
-    """Take the per-recipe exclusive lock; blocks (with a log line) if another run of the same
-    recipe is in flight. Returns the open lock file — the CALLER must keep a reference for the
-    whole run; the lock is released only when the process exits and the fd closes."""
-    path = os.path.join(RECIPE_LOCK_DIR, f"cc-ci-recipe-{recipe}.lock")
-    f = open(path, "w")  # noqa: SIM115 — deliberately held for the lifetime of the run
-    try:
-        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
-    except BlockingIOError:
-        print(
-            f"== recipe lock: another {recipe} run is in flight — waiting for {path} "
-            "(shared ~/.abra/recipes checkout) ==",
-            flush=True,
-        )
-        fcntl.flock(f, fcntl.LOCK_EX)
-        print(f"== recipe lock: acquired {path} ==", flush=True)
-    return f
-def _registry_path(domain: str) -> str:
-    return os.path.join(ACTIVE_RUN_DIR, domain)
-def register_run_app(domain: str) -> None:
-    """Record this process as the live owner of a run app (called BEFORE the app is created, so a
-    concurrent run's janitor can never observe the app without its registration)."""
-    with contextlib.suppress(OSError):
-        os.makedirs(ACTIVE_RUN_DIR, exist_ok=True)
-        with open(_registry_path(domain), "w") as f:
-            f.write(str(os.getpid()))
-def unregister_run_app(domain: str) -> None:
-    with contextlib.suppress(OSError):
-        os.remove(_registry_path(domain))
-def _run_owner_state(domain: str) -> str:
-    """'alive' if the registered owner is a live run_recipe_ci process, 'dead' if registered but
-    gone (definite orphan), 'unknown' if never registered (pre-registry code or post-reboot)."""
-    try:
-        with open(_registry_path(domain)) as f:
-            pid = int(f.read().strip())
-    except (OSError, ValueError):
-        return "unknown"
-    try:
-        with open(f"/proc/{pid}/cmdline", "rb") as f:
-            cmdline = f.read().decode(errors="replace").replace("\0", " ")
-    except OSError:
-        return "dead"
-    # Guard against pid reuse: the owner must still look like a harness run.
-    return "alive" if "run_recipe_ci" in cmdline else "dead"
 def _docker_names(kind: str, stack: str) -> list[str]:
    """docker <kind> ls names filtered to a stack (kind: service|volume|secret)."""
    proc = subprocess.run(
@@ -229,8 +161,7 @@ def prepull_images(recipe: str, domain: str) -> None:
    # --env-file supplies $VERSION-style interpolation so pinned tags resolve correctly.
    cf = subprocess.run(
        ["bash", "-c", f'set -a; . "{env_path}"; printf "%s" "${{COMPOSE_FILE:-compose.yml}}"'],
-        capture_output=True,
-        text=True,
+        capture_output=True, text=True,
    ).stdout.strip()
    files = [f for f in cf.split(":") if f] or ["compose.yml"]
    args = ["docker", "compose", "--env-file", env_path]
@@ -278,9 +209,6 @@ def deploy_app(
    past the 900s default. abra's INTERNAL TIMEOUT (recipe's TIMEOUT env, default 300s) is set via
    EXTRA_ENV; this is the Python subprocess wrapper's timeout so abra doesn't get SIGKILLed mid-deploy."""
    _record_deploy()
-    # Register BEFORE the app exists: a concurrent run's janitor must never see this app without
-    # its live-owner registration (it would reap an in-flight deploy).
-    register_run_app(domain)
    abra.app_config_remove(domain)  # clear any stale .env from a prior crashed run
    abra.app_new(recipe, domain, version=version, secrets=secrets)
    # A pinned version must actually deploy that version: check the recipe out to the tag so the
@@ -340,22 +268,18 @@ def _stack_name(domain: str) -> str:
 def services_converged(domain: str) -> bool:
-    """True when every service in the stack reports replicas N/N (N>0) AND no service is
-    mid-rolling-update (swarm UpdateStatus settled)."""
+    """True when every service in the stack reports replicas N/N (N>0)."""
    stack = _stack_name(domain)
    proc = subprocess.run(
-        ["docker", "stack", "services", stack, "--format", "{{.Name}} {{.Replicas}}"],
+        ["docker", "stack", "services", stack, "--format", "{{.Replicas}}"],
        capture_output=True,
        text=True,
    )
    rows = [r for r in proc.stdout.split("\n") if r.strip()]
    if not rows:
        return False
-    names = []
    for r in rows:
-        name, _, replicas = r.partition(" ")
-        names.append(name)
-        cur, _, want = replicas.partition("/")
+        cur, _, want = r.partition("/")
        # A service at its DESIRED replica count is converged — including a `replicas: 0`
        # on-demand one-shot (e.g. lasuite-drive's `minio-createbuckets`, which is scaled up
        # manually only when buckets need (re)creating), which reports "0/0". The earlier
@@ -364,28 +288,6 @@ def services_converged(domain: str) -> bool:
        # still spinning up shows e.g. "0/1" (cur != want) and is correctly not-yet-converged.
        if not want or cur != want:
            return False
-    # N/N alone is NOT convergence during a stop-first rolling update: a chaos redeploy that changes
-    # a non-app service image (e.g. immich's db pin) registers the update immediately, but swarm may
-    # not have cycled that service's task yet — the OLD task still shows 1/1, then dies seconds later
-    # (immich CI 238: backupbot exec'd the db pre-hook into the just-killed container → 409). Require
-    # every service's UpdateStatus to be settled too, so the wait spans the whole rolling update.
-    proc = subprocess.run(
-        [
-            "docker",
-            "service",
-            "inspect",
-            *names,
-            "--format",
-            "{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}",
-        ],
-        capture_output=True,
-        text=True,
-    )
-    if proc.returncode != 0:
-        return False  # a service vanished mid-check — not settled
-    for state in proc.stdout.split("\n"):
-        if state.strip() not in ("", "completed", "rollback_completed"):
-            return False
    return True
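The N/N check that survives this hunk can be read in isolation. The following is a minimal sketch, assuming rows in the `cur/want` shape that `--format "{{.Replicas}}"` emits; `replicas_converged` is a hypothetical helper name used only for illustration (the harness does this inline in `services_converged`).

```python
def replicas_converged(rows: list[str]) -> bool:
    """Return True when every non-empty "cur/want" row has cur == want.

    Hypothetical helper for illustration; not part of the harness.
    """
    rows = [r.strip() for r in rows if r.strip()]
    if not rows:
        return False  # no services listed: treat as not converged
    for r in rows:
        cur, _, want = r.partition("/")
        # "0/0" (a `replicas: 0` one-shot) counts as converged; "0/1" does not.
        # A row without "/" leaves `want` empty and is rejected.
        if not want or cur != want:
            return False
    return True
```

With the UpdateStatus check removed, this comparison is the whole convergence test, so a service mid-rolling-update that still shows 1/1 would pass it.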
@ -513,9 +415,7 @@ def recipe_checkout_ref(recipe: str, ref: str) -> None:
    abra.recipe_checkout(recipe, ref)
-def chaos_redeploy(
-    domain: str, deploy_timeout: int = 900, no_converge_checks: bool = False
-) -> None:
+def chaos_redeploy(domain: str, deploy_timeout: int = 900, no_converge_checks: bool = False) -> None:
    """In-place `abra app deploy --chaos`: redeploy the running app at the CURRENT recipe checkout
    (HC1: the PR-head code under test). This is the upgrade op, not a fresh install — it does NOT go
    through deploy_app, so the deploy-count guard (DG4.1) is not incremented.
@ -598,16 +498,6 @@ def wait_ready_probes(meta: dict, domain: str, timeout: int = 600) -> None:
 def backup_app(domain: str) -> str:
    """Create a backup; return the abra/restic output (carries the produced snapshot_id)."""
-    # Never back up a stack that is still converging/rolling-updating: backupbot resolves each
-    # service's hook container ONCE up front, so a task that cycles between that lookup and the
-    # pre-hook exec crashes the whole backup with a 409 (immich CI 238). Bounded wait — on timeout
-    # we still attempt the backup and let the tier's assertion deliver the verdict.
-    deadline = time.time() + 300
-    while time.time() < deadline and not services_converged(domain):
-        print(
-            f"  backup: {domain} stack not settled yet — waiting before backup create", flush=True
-        )
-        time.sleep(5)
    return abra.backup_create(domain)
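The bounded wait that this hunk drops from backup_app follows a common poll-until-deadline shape. A minimal sketch of that shape, assuming a caller-supplied predicate; `wait_until` and its `sleep`/`clock` parameters are hypothetical names for illustration, not harness API.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `predicate` until it is true or `timeout` seconds have elapsed.

    Returns the last observed result, so the caller can proceed either way
    (as the removed loop did) and let a later assertion give the verdict.
    """
    deadline = clock() + timeout
    while clock() < deadline:
        if predicate():
            return True
        sleep(interval)
    # One final check after the deadline so a late success is not missed.
    return predicate()
```

A caller in the style of the removed code would run `wait_until(lambda: services_converged(domain), timeout=300)` and then create the backup regardless of the result.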
@ -713,19 +603,13 @@ def teardown_app(domain: str, verify: bool = True) -> None:
        residual = _residual(domain)
        if any(residual.values()):
            raise TeardownError(f"teardown left residual for {domain}: {residual}")
-    # The app is gone — drop its active-run registration (janitor() also clears it when reaping).
-    unregister_run_app(domain)
 def janitor(max_age_seconds: int | None = None) -> None:
-    """Reap orphaned run apps from crashed/rebooted runs. Matches the real naming scheme. Safe under
-    CONCURRENT runs (capacity=2): every harness run registers its app in the active-run registry
-    (register_run_app), so the janitor distinguishes the three cases instead of using age alone:
-      - registered + owner run_recipe_ci process ALIVE  -> in-flight concurrent run: never reap
-      - registered + owner DEAD (crashed/SIGKILLed run) -> definite orphan: reap immediately
-      - no registry entry (pre-registry code, reboot)   -> fall back to the age threshold
-    Reaps via docker primitives so it works even when the .env is gone (A2/A3). Age fallback default
-    2h, env-overridable via CCCI_JANITOR_MAX_AGE."""
+    """Reap orphaned run apps from crashed/rebooted runs. Matches the real naming scheme and only
+    reaps apps older than max_age_seconds (so concurrent in-flight runs are never killed). Reaps via
+    docker primitives so it works even when the .env is gone (A2/A3). Default 2h, env-overridable
+    via CCCI_JANITOR_MAX_AGE (e.g. 0 to reap all matching orphans immediately)."""
    import os
    if max_age_seconds is None:
@ -743,18 +627,9 @@ def janitor(max_age_seconds: int | None = None) -> None:
            seen.add(f"{m.group(1)}.ci.commoninternet.net")
    for name in seen:
-        owner = _run_owner_state(name)
-        if owner == "alive":
-            print(f"  janitor: {name} is a live concurrent run — leaving it", flush=True)
-            continue
-        if owner == "unknown":
-            # No registry entry (manual run on pre-registry code, or post-reboot): only the age
-            # threshold protects it, as before.
-            stack = _stack_name(name)
-            age = _stack_age_seconds(stack)
-            if age is not None and age < max_age_seconds:
-                continue  # young and of unknown provenance; leave it
-        # owner == "dead" (a crashed/killed run's definite orphan) or old enough -> reap
+        stack = _stack_name(name)
+        age = _stack_age_seconds(stack)
+        if age is not None and age < max_age_seconds:
+            continue  # likely a concurrent in-flight run; leave it
        with contextlib.suppress(Exception):
            teardown_app(name, verify=False)
-        unregister_run_app(name)
--- a/runner/harness/warmsnap.py
+++ b/runner/harness/warmsnap.py
@ -113,9 +113,7 @@ def _assert_undeployed(domain: str) -> None:
        )
-def snapshot(
-    recipe: str, domain: str, commit: str | None = None, version: str | None = None
-) -> dict:
+def snapshot(recipe: str, domain: str, commit: str | None = None, version: str | None = None) -> dict:
    """Take a last-known-good snapshot of every data volume of <domain>'s stack. The app MUST be
    undeployed. Atomically replaces the prior last-good. Returns the written meta dict."""
    _assert_undeployed(domain)
@ -171,9 +169,7 @@ def restore(recipe: str, domain: str) -> dict:
    for vol in meta.get("volumes", []):
        tar_path = os.path.join(volumes_dir(recipe), f"{vol}.tar")
        if vol not in current:
-            raise SnapshotError(
-                f"snapshot volume {vol} absent from current stack {sorted(current)}"
-            )
+            raise SnapshotError(f"snapshot volume {vol} absent from current stack {sorted(current)}")
        mp = _volume_mountpoint(vol)
        # Clear the volume contents (incl. dotfiles) without removing the mountpoint itself.
        r = _run(["sh", "-c", f'rm -rf -- "{mp}"/* "{mp}"/.[!.]* "{mp}"/..?* 2>/dev/null; true'])
--- a/runner/nightly_sweep.py
+++ b/runner/nightly_sweep.py
@ -60,17 +60,14 @@ def sweep() -> int:
    for r in recipes:
        print(f"\n===== nightly: full-cold {r} (latest) =====", flush=True)
        env = dict(os.environ, RECIPE=r)
-        env.pop("REF", None)  # latest, not a PR head
+        env.pop("REF", None)      # latest, not a PR head
        env.pop("CCCI_QUICK", None)
        env.pop("MODE", None)
        rc = subprocess.run(
            [sys.executable, os.path.join(_here(), "run_recipe_ci.py")], env=env
        ).returncode
        results[r] = rc
-        print(
-            f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})",
-            flush=True,
-        )
+        print(f"nightly: {r} rc={rc} ({'green→canonical refreshed' if rc == 0 else 'red'})", flush=True)
    # WC8 disk hygiene: drop warm data for de-enrolled canonicals; log the disk budget.
    pruned = canonical.prune_stale()
    if pruned:
--- a/runner/run_recipe_ci.py
+++ b/runner/run_recipe_ci.py
@ -44,25 +44,17 @@ sys.path.insert(0, os.path.join(ROOT, "runner"))
 from harness import (  # noqa: E402
    abra,
    canonical,
    card as card_mod,
    deps as deps_mod,
    discovery,
    generic,
    lifecycle,
    naming,
    results as results_mod,
    screenshot as screenshot_mod,
    warm,
    warmsnap,
 )
 from harness import (  # noqa: E402
    card as card_mod,
 )
 from harness import (  # noqa: E402
    deps as deps_mod,
 )
 from harness import (  # noqa: E402
    results as results_mod,
 )
 from harness import (  # noqa: E402
    screenshot as screenshot_mod,
 )
 ALL_STAGES = ("install", "upgrade", "backup", "restore", "custom")
@ -835,12 +827,6 @@ def main() -> int:
    print(
        f"== cc-ci run: recipe={recipe} ref={ref} pr={os.environ.get('PR', '0')} stages={sorted(stages)}"
    )
-    # Concurrent-run safety: runs of the SAME recipe serialise on a per-recipe flock — they share
-    # ONE ~/.abra/recipes/<recipe> working tree which fetch_recipe (below) rm-rf's/reclones and the
-    # upgrade tier git-checkouts mid-run. Must be taken BEFORE fetch_recipe. Different recipes run
-    # in parallel (capacity=2). The reference must stay alive for the whole run: the kernel drops
-    # the flock when the fd closes (including on any crash/SIGKILL — no stale-lock failure mode).
-    _recipe_lock = lifecycle.acquire_recipe_lock(recipe)  # noqa: F841
    fetch_recipe(recipe, ref, src)
    # The PR-head commit the upgrade tier re-checks out for the chaos redeploy to the code under test
    # (HC1). Prefer the explicit PR head sha ($REF) — robust + exact; fall back to the recipe checkout
@ -1299,10 +1285,8 @@ def main() -> int:
            capped = data.get("level_cap_rung")
            sk = data.get("skips", {})
            cap_skip = (
-                "intentional"
-                if capped in (sk.get("intentional") or {})
-                else "unintentional"
-                if capped in (sk.get("unintentional") or [])
+                "intentional" if capped in (sk.get("intentional") or {})
+                else "unintentional" if capped in (sk.get("unintentional") or [])
                else ""
            )
            with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
--- a/runner/warm_reconcile.py
+++ b/runner/warm_reconcile.py
@ -43,16 +43,11 @@ def _traefik_setup(recipe: str, domain: str, version: str) -> None:
    ssl_cert/ssl_key swarm secrets; NO ACME). Uses the proven abra.env_set (newline-safe, unlike the
    bash set_env that bit keycloak)."""
    cert_dir = "/var/lib/ci-certs/live"
-    if not (
-        os.path.isfile(f"{cert_dir}/fullchain.pem") and os.path.isfile(f"{cert_dir}/privkey.pem")
-    ):
+    if not (os.path.isfile(f"{cert_dir}/fullchain.pem") and os.path.isfile(f"{cert_dir}/privkey.pem")):
        raise RuntimeError(f"FATAL: wildcard cert missing at {cert_dir} (sops decrypt broken?)")
    if not os.path.isfile(env_file(domain)):
-        _run(
-            ["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
-            timeout=120,
-            check=True,
-        )
+        _run(["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
+             timeout=120, check=True)
    abra.env_set(domain, "DOMAIN", domain)
    abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
    abra.env_set(domain, "WILDCARDS_ENABLED", "1")
@ -66,39 +61,11 @@ def _traefik_setup(recipe: str, domain: str, version: str) -> None:
        return any(s.endswith(f"_{name}_v1") for s in have)
    if not _has("ssl_cert"):
-        _run(
-            [
-                "abra",
-                "app",
-                "secret",
-                "insert",
-                domain,
-                "ssl_cert",
-                "v1",
-                f"{cert_dir}/fullchain.pem",
-                "-f",
-                "-n",
-            ],
-            timeout=120,
-            check=True,
-        )
+        _run(["abra", "app", "secret", "insert", domain, "ssl_cert", "v1",
+              f"{cert_dir}/fullchain.pem", "-f", "-n"], timeout=120, check=True)
    if not _has("ssl_key"):
-        _run(
-            [
-                "abra",
-                "app",
-                "secret",
-                "insert",
-                domain,
-                "ssl_key",
-                "v1",
-                f"{cert_dir}/privkey.pem",
-                "-f",
-                "-n",
-            ],
-            timeout=120,
-            check=True,
-        )
+        _run(["abra", "app", "secret", "insert", domain, "ssl_key", "v1",
+              f"{cert_dir}/privkey.pem", "-f", "-n"], timeout=120, check=True)
 SPECS: dict[str, dict] = {
@ -251,17 +218,8 @@ def health_code(spec: dict) -> int:
    domain = spec.get("health_domain", spec["domain"])
    r = _run(
        [
-            "curl",
-            "-sk",
-            "-o",
-            "/dev/null",
-            "-w",
-            "%{http_code}",
-            "--max-time",
-            "10",
-            "--resolve",
-            f"{domain}:443:127.0.0.1",
-            f"https://{domain}{spec['health_path']}",
+            "curl", "-sk", "-o", "/dev/null", "-w", "%{http_code}", "--max-time", "10",
+            "--resolve", f"{domain}:443:127.0.0.1", f"https://{domain}{spec['health_path']}",
        ],
        timeout=20,
    )
@ -272,6 +230,7 @@ def health_code(spec: dict) -> int:
 def wait_healthy(spec: dict, timeout: int | None = None) -> bool:
    domain = spec["domain"]
    deadline = time.time() + (timeout or spec["health_timeout"])
    while time.time() < deadline:
        if health_code(spec) in tuple(spec["health_ok"]):
@ -366,18 +325,15 @@ def ensure_server() -> None:
 def ensure_app_config(recipe: str, domain: str, version: str) -> None:
    if not os.path.isfile(env_file(domain)):
-        _run(
-            ["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
-            timeout=120,
-            check=True,
-        )
+        _run(["abra", "app", "new", recipe, "-s", "default", "-D", domain, version, "-o", "-n"],
+             timeout=120, check=True)
    abra.env_set(domain, "DOMAIN", domain)
    abra.env_set(domain, "LETS_ENCRYPT_ENV", "")
 def ensure_secrets(domain: str) -> None:
    stack = lifecycle._stack_name(domain)  # noqa: SLF001
-    have = set(lifecycle._docker_names("secret", stack))  # noqa: SLF001
+    have = {n for n in lifecycle._docker_names("secret", stack)}  # noqa: SLF001
    if not any(n.endswith("_admin_password_v1") for n in have):
        abra.secret_generate(domain)
@ -437,9 +393,8 @@ def reconcile(app: str) -> str:
        write_alert(app, "held-major", current=current, latest=latest, release_notes=notes[:4000])
        return f"held-major:{current}->{latest}"
    if notes_flag_manual_migration(notes):
-        write_alert(
-            app, "held-manual-migration", current=current, latest=latest, release_notes=notes[:4000]
-        )
+        write_alert(app, "held-manual-migration", current=current, latest=latest,
+                    release_notes=notes[:4000])
        return f"held-manual-migration:{current}->{latest}"
    # WC1.1 health-gated upgrade with rollback.
@ -473,14 +428,8 @@ def reconcile(app: str) -> str:
        warmsnap.restore(recipe, domain)
    deploy_version(recipe, domain, last_good, dt)
    recovered = wait_healthy(spec)
-    write_alert(
-        app,
-        "rollback",
-        last_good=last_good,
-        attempted=latest,
-        recovered=recovered,
-        release_notes=notes[:2000],
-    )
+    write_alert(app, "rollback", last_good=last_good, attempted=latest, recovered=recovered,
+                release_notes=notes[:2000])
    if not recovered:
        raise RuntimeError(f"{app} rollback to {last_good} did not become healthy")
    return f"rolled-back:{latest}->{last_good}"
--- a/tests/bluesky-pds/_p4.py
+++ b/tests/bluesky-pds/_p4.py
@ -15,8 +15,7 @@ import shlex
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
-from harness import lifecycle
+from harness import http as harness_http, lifecycle  # noqa: E402
 PDS_HOST_LOCAL = "http://localhost:3000"
 _PW = "ccci-P4-marker-pw-2026"
--- a/tests/bluesky-pds/functional/test_account_and_post.py
+++ b/tests/bluesky-pds/functional/test_account_and_post.py
@ -27,7 +27,6 @@ CRUD). A wedged PDS subsystem fails AT its layer.
 from __future__ import annotations
-import contextlib
 import os
 import re
 import secrets
@ -36,8 +35,7 @@ import sys
 import uuid
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
-from harness import lifecycle
+from harness import http as harness_http, lifecycle  # noqa: E402
 PDS_HOST_LOCAL = "http://localhost:3000"
@ -60,18 +58,14 @@ def _goat_admin(domain: str, args: str) -> str:
    return _in_container(domain, cmd)
-def _xrpc_post(
-    domain: str, nsid: str, data: dict, token: str | None = None
-) -> tuple[int, dict | None]:
+def _xrpc_post(domain: str, nsid: str, data: dict, token: str | None = None) -> tuple[int, dict | None]:
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return harness_http.http_post(f"https://{domain}/xrpc/{nsid}", data=data, headers=headers)
-def _xrpc_get(
-    domain: str, nsid: str, query: str, token: str | None = None
-) -> tuple[int, dict | None]:
+def _xrpc_get(domain: str, nsid: str, query: str, token: str | None = None) -> tuple[int, dict | None]:
    headers = {}
    if token:
        headers["Authorization"] = f"Bearer {token}"
@ -88,9 +82,9 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
    # Step 1: PDS describe via goat — recipe self-identifies as did:web:<domain>
    out = _in_container(domain, f"goat pds describe {PDS_HOST_LOCAL} 2>&1")
-    assert (
+    assert f"did:web:{domain}" in out, (
-        f"did:web:{domain}" in out
+        f"goat pds describe did not contain expected DID 'did:web:{domain}'. Output:\n{out[:500]!r}"
-    ), f"goat pds describe did not contain expected DID 'did:web:{domain}'. Output:\n{out[:500]!r}"
+    )
    # Step 2: Create account (UUID-suffixed handle = no run-to-run collision)
    out = _goat_admin(
@ -133,9 +127,9 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
        assert s == 200, f"createRecord HTTP {s}: {body!r}"
        record_uri = (body or {}).get("uri", "")
        # URI format: at://<did>/app.bsky.feed.post/<rkey>
-        assert record_uri.startswith(
+        assert record_uri.startswith(f"at://{new_did}/app.bsky.feed.post/"), (
-            f"at://{new_did}/app.bsky.feed.post/"
+            f"unexpected record uri: {record_uri!r}"
-        ), f"unexpected record uri: {record_uri!r}"
+        )
        rkey = record_uri.rsplit("/", 1)[-1]
        assert rkey, f"no rkey in uri: {record_uri!r}"
@ -148,13 +142,15 @@ def test_account_lifecycle_and_post_roundtrip(live_app):
        )
        assert s == 200, f"getRecord HTTP {s}: {body!r}"
        record_value = (body or {}).get("value", {})
-        assert (
+        assert record_value.get("text") == marker, (
-            record_value.get("text") == marker
+            f"post text did not round-trip: created={marker!r}, fetched={record_value.get('text')!r}"
-        ), f"post text did not round-trip: created={marker!r}, fetched={record_value.get('text')!r}"
+        )
        assert record_value.get("$type") == "app.bsky.feed.post"
    finally:
        # Step 6: Best-effort cleanup. (The per-run domain teardown will discard the volume
        # too, but we exercise the delete-account path because it's part of §4.3.)
        if cleanup_did:
-            with contextlib.suppress(Exception):
+            try:
                _goat_admin(domain, f"account delete {cleanup_did}")
+            except Exception:  # noqa: BLE001
+                pass
--- a/tests/bluesky-pds/functional/test_describe_server.py
+++ b/tests/bluesky-pds/functional/test_describe_server.py
@ -26,6 +26,6 @@ def test_describe_server_returns_atproto_envelope(live_app):
    # At least one of these atproto-spec fields must be present
    expected_any = ("availableUserDomains", "inviteCodeRequired", "links", "did")
    present = [k for k in expected_any if k in body]
-    assert (
+    assert present, (
-        present
+        f"describe-server missing all of {expected_any}; got keys: {sorted(body.keys())[:20]}"
-    ), f"describe-server missing all of {expected_any}; got keys: {sorted(body.keys())[:20]}"
+    )
--- a/tests/bluesky-pds/functional/test_health_check.py
+++ b/tests/bluesky-pds/functional/test_health_check.py
@ -17,6 +17,6 @@ def test_pds_health_returns_version(live_app):
    url = f"https://{live_app}/xrpc/_health"
    status, body = harness_http.retry_http_get(url, expect_status=200, max_wait=60, interval=3)
    assert status == 200, f"GET {url} HTTP {status} (expected 200)"
-    assert (
+    assert isinstance(body, dict) and isinstance(body.get("version"), str) and body["version"], (
-        isinstance(body, dict) and isinstance(body.get("version"), str) and body["version"]
+        f"GET {url} response is not the expected health envelope: {body!r}"
-    ), f"GET {url} response is not the expected health envelope: {body!r}"
+    )
--- a/tests/bluesky-pds/functional/test_session_auth.py
+++ b/tests/bluesky-pds/functional/test_session_auth.py
@ -30,6 +30,6 @@ def test_get_session_requires_auth(live_app):
        f"body: {body!r}"
    )
    # The XRPC error envelope is JSON with an `error` field per the atproto spec.
-    assert isinstance(body, dict) and body.get(
+    assert isinstance(body, dict) and body.get("error"), (
-        "error"
+        f"expected XRPC JSON error envelope; got: {body!r}"
-    ), f"expected XRPC JSON error envelope; got: {body!r}"
+    )
--- a/tests/bluesky-pds/install_steps.sh
+++ b/tests/bluesky-pds/install_steps.sh
@ -22,12 +22,12 @@ echo "  bluesky-pds install_steps: generating secp256k1 PLC rotation key..."
 # same shape the PDS expects (32-byte hex). Equivalent for atproto PDS bootstrap.
 KEY_HEX=$(cc-ci-run -c 'import secrets; print(secrets.token_bytes(32).hex())')
 if [ -z "${KEY_HEX}" ] || [ "${#KEY_HEX}" != "64" ]; then
-  echo "  install_steps: failed to generate PLC rotation key (KEY_HEX length=${#KEY_HEX})" >&2
+    echo "  install_steps: failed to generate PLC rotation key (KEY_HEX length=${#KEY_HEX})" >&2
-  exit 1
+    exit 1
 fi
 # Insert via abra under TTY-wrap (`abra app secret insert` requires a TTY on this version).
 # We DON'T log the key value — abra also doesn't print it.
 script -qec "abra app secret insert ${CCCI_APP_DOMAIN} pds_plc_rotation_key v1 ${KEY_HEX} --no-input" /dev/null \
-  >/dev/null 2>&1
+    >/dev/null 2>&1
 echo "  bluesky-pds install_steps: PLC rotation key inserted (v1)."
--- a/tests/bluesky-pds/test_restore.py
+++ b/tests/bluesky-pds/test_restore.py
@ -11,6 +11,6 @@ import _p4  # noqa: E402
 def test_restore_returns_state(live_app):
-    assert _p4.account_exists(
+    assert _p4.account_exists(live_app), (
-        live_app
+        "restore did not bring back the seeded marker account (PDS data did not survive restore)"
-    ), "restore did not bring back the seeded marker account (PDS data did not survive restore)"
+    )
--- a/tests/conftest.py
+++ b/tests/conftest.py
@ -13,8 +13,7 @@ import sys
 import pytest
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "runner"))
-from harness import deps as deps_mod  # noqa: E402
-from harness import lifecycle, naming
+from harness import deps as deps_mod, lifecycle, naming  # noqa: E402
 def _short(s: str, n: int = 8) -> str:
--- a/tests/cryptpad/playwright/test_pad_content_roundtrip.py
+++ b/tests/cryptpad/playwright/test_pad_content_roundtrip.py
@ -26,7 +26,6 @@ Transient `net::ERR_NETWORK_CHANGED` is handled by the shared `goto_with_retry`
 from __future__ import annotations
-import contextlib
 import os
 import sys
 import uuid
@ -40,11 +39,7 @@ def _open_pad(ctx, url):
    bar once CryptPad has created/loaded the fragment-keyed pad (`#/2/pad/edit/<key>/`)."""
    page = ctx.new_page()
    harness_browser.goto_with_retry(
-        page,
-        url,
-        accept_statuses=(200,),
-        goto_timeout_ms=60_000,
-        wait_until="load",
+        page, url, accept_statuses=(200,), goto_timeout_ms=60_000, wait_until="load",
        deadline_seconds=150,
    )
    pad_url = url
@ -58,15 +53,13 @@ def _open_pad(ctx, url):
            pad_url = page.url
            break
        if i == 40:
-            with contextlib.suppress(Exception):  # best-effort unstick
+            try:
                harness_browser.goto_with_retry(
-                    page,
-                    url,
-                    accept_statuses=(200,),
-                    goto_timeout_ms=60_000,
-                    wait_until="load",
-                    deadline_seconds=120,
+                    page, url, accept_statuses=(200,), goto_timeout_ms=60_000,
+                    wait_until="load", deadline_seconds=120,
                )
+            except Exception:  # noqa: BLE001 — best-effort unstick
+                pass
    return page, pad_url
@ -81,22 +74,18 @@ def _ckeditor_frame(page, deadline_polls=90, reload_at=22, reload_url=None):
            if "ckeditor-inner" in f.url:
                return f
        if i == reload_at and reload_url is not None:
-            with contextlib.suppress(Exception):  # reload is a best-effort unstick
+            try:
                harness_browser.goto_with_retry(
-                    page,
-                    reload_url,
-                    accept_statuses=(200,),
-                    goto_timeout_ms=60_000,
-                    wait_until="load",
-                    deadline_seconds=120,
+                    page, reload_url, accept_statuses=(200,), goto_timeout_ms=60_000,
+                    wait_until="load", deadline_seconds=120,
                )
+            except Exception:  # noqa: BLE001 — reload is a best-effort unstick
+                pass
        page.wait_for_timeout(2000)
    return None
-def _poll_any_frame_for_text(
-    page, needle, deadline_polls=120, reload_at=(20, 45, 75, 100), reload_url=None
-):
+def _poll_any_frame_for_text(page, needle, deadline_polls=120, reload_at=(20, 45, 75, 100), reload_url=None):
    """Robust read-back (F2-13): poll EVERY frame's body text for `needle`, returning True as soon as
    it appears. The fresh cold-cache read-back context's deeply-nested CKEditor frame is slow/flaky to
    *attach* by URL (the prior `_ckeditor_frame` wait timed out on the Adversary's cold run), but the
@ -112,15 +101,13 @@ def _poll_any_frame_for_text(
            except Exception:  # noqa: BLE001 — frame not ready / detached; keep polling
                pass
        if reload_url and i in reload_at:
-            with contextlib.suppress(Exception):  # best-effort unstick
+            try:
                harness_browser.goto_with_retry(
-                    page,
-                    reload_url,
-                    accept_statuses=(200,),
-                    goto_timeout_ms=60_000,
-                    wait_until="load",
-                    deadline_seconds=120,
+                    page, reload_url, accept_statuses=(200,), goto_timeout_ms=60_000,
+                    wait_until="load", deadline_seconds=120,
                )
+            except Exception:  # noqa: BLE001 — best-effort unstick
+                pass
        page.wait_for_timeout(2000)
    return False
@ -150,9 +137,9 @@ def test_cryptpad_pad_content_survives_fresh_session(live_app):
            # --- session 1: create the pad + write the marker ---
            ctx1 = browser.new_context(ignore_https_errors=True)
            page, pad_url = _open_pad(ctx1, f"https://{live_app}/pad/")
-            assert (
+            assert "#/2/pad/edit/" in pad_url, (
-                "#/2/pad/edit/" in pad_url
+                f"CryptPad did not create a fragment-keyed pad URL; got {pad_url!r}"
-            ), f"CryptPad did not create a fragment-keyed pad URL; got {pad_url!r}"
+            )
            ck = _ckeditor_frame(page, reload_url=pad_url)
            assert ck is not None, "CKEditor content frame never attached (pad editor not ready)"
            _dismiss_store_modal(page)
@ -161,9 +148,9 @@ def test_cryptpad_pad_content_survives_fresh_session(live_app):
            page.wait_for_timeout(1000)
            body.type(marker, delay=40)
            page.wait_for_timeout(12000)  # let CryptPad encrypt + sync the update to the server
-            assert (
+            assert marker in ck.locator("body").inner_text(), (
-                marker in ck.locator("body").inner_text()
+                "marker not present in the editor after typing — type did not land"
-            ), "marker not present in the editor after typing — type did not land"
+            )
            ctx1.close()
            # --- session 2: FRESH context (no shared storage/localStorage) reads the pad back by URL.
--- a/tests/cryptpad/playwright/test_pad_create.py
+++ b/tests/cryptpad/playwright/test_pad_create.py
@ -51,9 +51,9 @@ def test_cryptpad_spa_renders_with_no_console_errors(live_app):
            title = (page.title() or "").lower()
            body = page.content()
            blower = body.lower()
-            assert (
+            assert "cryptpad" in title or "cryptpad" in blower, (
-                "cryptpad" in title or "cryptpad" in blower
+                f"CryptPad SPA does not carry brand. title={title!r}, body excerpt: {body[:200]!r}"
-            ), f"CryptPad SPA does not carry brand. title={title!r}, body excerpt: {body[:200]!r}"
+            )
            # Canonical CryptPad asset references in the rendered DOM
            canonical = ("/customize/", "/components/", "main.js", "/api/broadcast")
--- a/tests/cryptpad/test_install.py
+++ b/tests/cryptpad/test_install.py
@ -8,8 +8,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
-from harness import generic, lifecycle
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
 def test_serving_and_content(live_app, meta):
--- a/tests/custom-html-bkp-bad/test_backup.py
+++ b/tests/custom-html-bkp-bad/test_backup.py
@ -20,9 +20,7 @@ def test_backup_captures_state(live_app):
    Since custom-html-bkp-bad has no ops.py::pre_backup to seed the marker, this file does NOT
    exist at backup time — exec_in_app returns empty or raises → assertion fails → backup tier RED.
    This models a recipe that declares backup capability but omits the data-seeding hook."""
-    result = lifecycle.exec_in_app(
-        live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]
-    ).strip()
+    result = lifecycle.exec_in_app(live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]).strip()
    assert result == "original", (
        f"backup did not capture the expected marker at {MARKER_PATH}: got {result!r}. "
        "Expected 'original' (seeded by pre_backup). If the marker is 'MISSING', the pre_backup "
--- a/tests/custom-html-tiny/functional/test_serves_content.py
+++ b/tests/custom-html-tiny/functional/test_serves_content.py
@ -79,9 +79,9 @@ def test_static_file_roundtrip_and_404(live_app):
        # A random non-existent path must 404 — proves real static-file semantics, distinguishing a
        # working server from a 200-everything stub or a mis-routed Traefik fallback.
        miss_status, _ = _get(f"https://{live_app}/ccci-missing-{uuid.uuid4().hex}.txt")
-        assert (
+        assert miss_status == 404, (
-            miss_status == 404
+            f"missing path returned {miss_status} (expected 404 — generic 200-returner / mis-route?)"
-        ), f"missing path returned {miss_status} (expected 404 — generic 200-returner / mis-route?)"
+        )
    finally:
        with contextlib.suppress(OSError):
            os.remove(path)
--- a/tests/custom-html/functional/test_content_roundtrip.py
+++ b/tests/custom-html/functional/test_content_roundtrip.py
@ -15,8 +15,7 @@ import sys
 import uuid
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
-from harness import lifecycle
+from harness import http as harness_http, lifecycle  # noqa: E402
 def test_content_roundtrip(live_app):
--- a/tests/custom-html/functional/test_content_type_header.py
+++ b/tests/custom-html/functional/test_content_type_header.py
@ -53,9 +53,9 @@ def test_content_type_html_and_txt(live_app):
    ct_txt = h_txt.get("content-type", "")
    # nginx default: "text/html" for .html and "text/plain" for .txt (may include "; charset=utf-8")
-    assert ct_html.startswith(
+    assert ct_html.startswith("text/html"), (
-        "text/html"
+        f"{html_name} Content-Type={ct_html!r}, expected text/html (nginx MIME config broken?)"
-    ), f"{html_name} Content-Type={ct_html!r}, expected text/html (nginx MIME config broken?)"
+    )
-    assert ct_txt.startswith(
+    assert ct_txt.startswith("text/plain"), (
-        "text/plain"
+        f"{txt_name} Content-Type={ct_txt!r}, expected text/plain (nginx MIME config broken?)"
-    ), f"{txt_name} Content-Type={ct_txt!r}, expected text/plain (nginx MIME config broken?)"
+    )
--- a/tests/custom-html/test_install.py
+++ b/tests/custom-html/test_install.py
@ -9,8 +9,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
+from harness import browser as harness_browser, generic  # noqa: E402
-from harness import generic
 def test_serving_and_content(live_app, meta):
--- a/tests/discourse/functional/_discourse.py
+++ b/tests/discourse/functional/_discourse.py
@ -53,7 +53,7 @@ def mint_admin(domain: str) -> tuple[str, str]:
    cmd = (
        "cd /opt/bitnami/discourse && "
        "RUBY=$(command -v ruby || echo /opt/bitnami/ruby/bin/ruby) && "
-        f'RAILS_ENV=production "$RUBY" bin/rails runner "{_BOOTSTRAP_RB}"'
+        f"RAILS_ENV=production \"$RUBY\" bin/rails runner \"{_BOOTSTRAP_RB}\""
    )
    out = lifecycle.exec_in_app(domain, ["bash", "-c", cmd], service="app", timeout=240)
    key = user = None
@ -63,9 +63,9 @@ def mint_admin(domain: str) -> tuple[str, str]:
            key = line.split("=", 1)[1].strip()
        elif line.startswith("CCCI_API_USER="):
            user = line.split("=", 1)[1].strip()
-    assert (
+    assert key and user, (
-        key and user
+        f"could not bootstrap discourse admin/API key; rails output tail:\n{out[-1000:]}"
-    ), f"could not bootstrap discourse admin/API key; rails output tail:\n{out[-1000:]}"
+    )
    return key, user
--- a/tests/discourse/functional/test_create_topic.py
+++ b/tests/discourse/functional/test_create_topic.py
@ -48,23 +48,21 @@ def test_create_topic_roundtrip(live_app):
        headers=hdrs,
        timeout=60,
    )
-    assert status in (200, 201) and isinstance(
+    assert status in (200, 201) and isinstance(body, dict), (
-        body, dict
+        f"create topic failed: HTTP {status}, body={body!r}"
-    ), f"create topic failed: HTTP {status}, body={body!r}"
+    )
    topic_id = body.get("topic_id")
    assert topic_id, f"create topic returned no topic_id: {body!r}"
    # 4) Read the topic back and assert title + first-post body round-trip.
    status, got = harness_http.http_get(f"{base}/t/{topic_id}.json", headers=hdrs, timeout=30)
-    assert status == 200 and isinstance(
+    assert status == 200 and isinstance(got, dict), f"read topic failed: HTTP {status}, body={got!r}"
-        got, dict
+    assert got.get("title") == title, (
-    ), f"read topic failed: HTTP {status}, body={got!r}"
+        f"topic title did not round-trip: sent {title!r}, got {got.get('title')!r}"
-    assert (
+    )
-        got.get("title") == title
-    ), f"topic title did not round-trip: sent {title!r}, got {got.get('title')!r}"
    posts = (got.get("post_stream") or {}).get("posts") or []
    assert posts, f"topic has no posts on read-back: {got!r}"
    first_cooked = posts[0].get("cooked", "")
-    assert (
+    assert marker in first_cooked, (
-        marker in first_cooked
+        f"topic body did not round-trip: marker {marker!r} not in first post {first_cooked!r}"
-    ), f"topic body did not round-trip: marker {marker!r} not in first post {first_cooked!r}"
+    )
--- a/tests/discourse/functional/test_site_basic.py
+++ b/tests/discourse/functional/test_site_basic.py
@ -20,12 +20,12 @@ def test_site_json_has_discourse_config(live_app):
    status, body = harness_http.retry_http_get(
        f"https://{live_app}/site.json", expect_status=200, max_wait=120, interval=5
    )
-    assert status == 200 and isinstance(
+    assert status == 200 and isinstance(body, dict), (
-        body, dict
+        f"GET /site.json failed: HTTP {status}, body type={type(body).__name__}"
-    ), f"GET /site.json failed: HTTP {status}, body type={type(body).__name__}"
+    )
    # /site.json carries Discourse-specific structure — `categories` (a list) and `groups` are always
    # present in a booted Discourse. A non-Discourse 200 (placeholder page) would not parse to this.
    assert "categories" in body, f"/site.json missing 'categories' key: keys={list(body)[:20]}"
-    assert isinstance(
+    assert isinstance(body["categories"], list), (
-        body["categories"], list
+        f"/site.json 'categories' not a list: {type(body['categories']).__name__}"
-    ), f"/site.json 'categories' not a list: {type(body['categories']).__name__}"
+    )
--- a/tests/discourse/ops.py
+++ b/tests/discourse/ops.py
@ -15,7 +15,8 @@ from harness import lifecycle  # noqa: E402
 def _psql(domain, sql):
    cmd = (
-        "PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
+        'PGPASSWORD=$(cat /run/secrets/db_password) '
+        f'psql -U discourse -d discourse -tAc "{sql}"'
    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
@ -41,7 +42,6 @@ def pre_backup(domain, meta):
 def pre_restore(domain, meta):
    # diverge from the backup so a successful restore is observable
    _psql(domain, "DROP TABLE IF EXISTS ci_marker;")
-    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
+    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in ("", "NULL"), (
-        "",
+        "drop did not take"
-        "NULL",
+    )
-    ), "drop did not take"
--- a/tests/discourse/recipe_meta.py
+++ b/tests/discourse/recipe_meta.py
@ -6,9 +6,7 @@
 # app is actually serving (the canonical "is discourse up" signal — NOT "/", which may redirect to setup).
 HEALTH_PATH = "/srv/status"
 HEALTH_OK = (200,)
-DEPLOY_TIMEOUT = (
+DEPLOY_TIMEOUT = 3600  # slow Rails cold boot (15-25min) on the 7-GiB single node; bumped 2400→3600 for
-    3600  # slow Rails cold boot (15-25min) on the 7-GiB single node; bumped 2400→3600 for
-)
 # headroom after full4's base deploy timed out at 2400s (RAM/CPU-constrained boot + image re-pull).
 HTTP_TIMEOUT = 1200
@ -61,11 +59,7 @@ def BACKUP_VERIFY(domain):
    try:
        out = lifecycle.exec_in_app(
            domain,
-            [
+            ["sh", "-c", "gzip -t /var/lib/postgresql/data/backup.sql && wc -c < /var/lib/postgresql/data/backup.sql"],
-                "sh",
-                "-c",
-                "gzip -t /var/lib/postgresql/data/backup.sql && wc -c < /var/lib/postgresql/data/backup.sql",
-            ],
            service="db",
            timeout=60,
        ).strip()
--- a/tests/discourse/test_backup.py
+++ b/tests/discourse/test_backup.py
@ -14,12 +14,13 @@ from harness import lifecycle  # noqa: E402
 def _psql(domain, sql):
    cmd = (
-        "PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
+        'PGPASSWORD=$(cat /run/secrets/db_password) '
+        f'psql -U discourse -d discourse -tAc "{sql}"'
    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
 def test_backup_captures_state(live_app):
-    assert (
+    assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", (
-        _psql(live_app, "SELECT v FROM ci_marker;") == "original"
+        "the seeded discourse postgres state was not present at backup time"
-    ), "the seeded discourse postgres state was not present at backup time"
+    )
--- a/tests/discourse/test_restore.py
+++ b/tests/discourse/test_restore.py
@ -14,12 +14,13 @@ from harness import lifecycle  # noqa: E402
 def _psql(domain, sql):
    cmd = (
-        "PGPASSWORD=$(cat /run/secrets/db_password) " f'psql -U discourse -d discourse -tAc "{sql}"'
+        'PGPASSWORD=$(cat /run/secrets/db_password) '
+        f'psql -U discourse -d discourse -tAc "{sql}"'
    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
 def test_restore_returns_state(live_app):
-    assert (
+    assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", (
-        _psql(live_app, "SELECT v FROM ci_marker;") == "original"
+        "restore did not return the pre-mutation discourse postgres state (data-integrity failure)"
-    ), "restore did not return the pre-mutation discourse postgres state (data-integrity failure)"
+    )
--- a/tests/ghost/functional/_ghost.py
+++ b/tests/ghost/functional/_ghost.py
@ -93,10 +93,9 @@ class GhostAdmin:
        status, body = self.req(
            "POST", "/session/", {"username": ADMIN_EMAIL, "password": ADMIN_PW}
        )
-        assert status in (
+        assert status in (200, 201), (
-            200,
+            f"ghost admin session login failed: HTTP {status}, body={body!r}"
-            201,
+        )
-        ), f"ghost admin session login failed: HTTP {status}, body={body!r}"
    def create_post(self, title: str, html: str) -> dict:
        status, body = self.req(
--- a/tests/ghost/functional/test_admin_redirect.py
+++ b/tests/ghost/functional/test_admin_redirect.py
@ -53,15 +53,13 @@ def test_ghost_admin_route_is_wired(live_app):
        return None
    status_body = harness_http.assert_converges(
-        _ready,
+        _ready, f"GET {url} returns Ghost admin (200) or setup redirect (302)",
-        f"GET {url} returns Ghost admin (200) or setup redirect (302)",
+        max_wait=60, interval=3,
-        max_wait=60,
-        interval=3,
    )
    status, body = status_body
    assert status in (200, 302), f"unexpected status: {status}"
    if status == 200:
        # The admin SPA references /ghost-assets/ or contains "ghost" in title/body
-        assert (
+        assert "ghost" in body.lower(), (
-            "ghost" in body.lower()
+            f"GET {url} 200 but body has no Ghost markers: {body[:200]!r}"
-        ), f"GET {url} 200 but body has no Ghost markers: {body[:200]!r}"
+        )
--- a/tests/ghost/functional/test_content_api.py
+++ b/tests/ghost/functional/test_content_api.py
@ -35,10 +35,10 @@ def test_content_api_settings_endpoint(live_app):
    assert body is not None, f"GET {url} returned non-JSON body"
    # On success: {"settings": {...}}. On error: {"errors": [...]}. Either shape is valid.
    if status == 200:
-        assert (
+        assert isinstance(body, dict) and "settings" in body, (
-            isinstance(body, dict) and "settings" in body
+            f"200 response missing 'settings' envelope: {body!r}"
-        ), f"200 response missing 'settings' envelope: {body!r}"
+        )
    else:
-        assert isinstance(body, dict) and (
+        assert isinstance(body, dict) and ("errors" in body or "message" in body or body), (
-            "errors" in body or "message" in body or body
+            f"error response not a proper Ghost error envelope: {body!r}"
-        ), f"error response not a proper Ghost error envelope: {body!r}"
+        )
--- a/tests/ghost/functional/test_post_roundtrip.py
+++ b/tests/ghost/functional/test_post_roundtrip.py
@ -43,17 +43,17 @@ def test_create_post_roundtrip(live_app):
    title = f"ccci-marker-{uniq}"
    marker = f"ccci-body-marker-{uniq}-roundtrip"
    created = admin.create_post(title, f"<p>{marker}</p>")
-    assert (
+    assert created.get("title") == title, (
-        created.get("title") == title
+        f"created post title mismatch: sent {title!r}, got {created.get('title')!r}"
-    ), f"created post title mismatch: sent {title!r}, got {created.get('title')!r}"
+    )
    # 4) Read it back by id and assert the post survived the round-trip (title always returned;
    #    html returned because we requested ?formats=html).
    got = admin.get_post(created["id"])
-    assert (
+    assert got.get("title") == title, (
-        got.get("title") == title
+        f"post title did not round-trip: sent {title!r}, got {got.get('title')!r}"
-    ), f"post title did not round-trip: sent {title!r}, got {got.get('title')!r}"
+    )
    html = got.get("html") or ""
-    assert (
+    assert marker in html, (
-        marker in html
+        f"post body did not round-trip: marker {marker!r} not in read-back html {html!r}"
-    ), f"post body did not round-trip: marker {marker!r} not in read-back html {html!r}"
+    )
--- a/tests/ghost/ops.py
+++ b/tests/ghost/ops.py
@ -22,7 +22,10 @@ from harness import lifecycle  # noqa: E402
 def _mysql(domain, sql):
-    cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
+    cmd = (
+        'MYSQL_PWD="$(cat /run/secrets/db_password)" '
+        f'mysql -u root -N -s ghost -e "{sql}"'
+    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
--- a/tests/ghost/recipe_meta.py
+++ b/tests/ghost/recipe_meta.py
@ -63,11 +63,7 @@ def BACKUP_VERIFY(domain):
    try:
        out = lifecycle.exec_in_app(
            domain,
-            [
+            ["sh", "-c", "gzip -t /var/lib/mysql/backup.sql.gz && wc -c < /var/lib/mysql/backup.sql.gz"],
-                "sh",
-                "-c",
-                "gzip -t /var/lib/mysql/backup.sql.gz && wc -c < /var/lib/mysql/backup.sql.gz",
-            ],
            service="db",
            timeout=60,
        ).strip()
--- a/tests/ghost/test_backup.py
+++ b/tests/ghost/test_backup.py
@ -15,11 +15,14 @@ from harness import lifecycle  # noqa: E402
 def _mysql(domain, sql):
-    cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
+    cmd = (
+        'MYSQL_PWD="$(cat /run/secrets/db_password)" '
+        f'mysql -u root -N -s ghost -e "{sql}"'
+    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
 def test_backup_captures_state(live_app):
-    assert (
+    assert _mysql(live_app, "SELECT v FROM ci_marker;") == "original", (
-        _mysql(live_app, "SELECT v FROM ci_marker;") == "original"
+        "the seeded ghost MySQL marker was not present at backup time"
-    ), "the seeded ghost MySQL marker was not present at backup time"
+    )
--- a/tests/ghost/test_restore.py
+++ b/tests/ghost/test_restore.py
@ -22,7 +22,10 @@ from harness import lifecycle  # noqa: E402
 def _mysql(domain, sql):
-    cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
+    cmd = (
+        'MYSQL_PWD="$(cat /run/secrets/db_password)" '
+        f'mysql -u root -N -s ghost -e "{sql}"'
+    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
--- a/tests/ghost/test_upgrade.py
+++ b/tests/ghost/test_upgrade.py
@ -14,11 +14,14 @@ from harness import lifecycle  # noqa: E402
 def _mysql(domain, sql):
-    cmd = 'MYSQL_PWD="$(cat /run/secrets/db_password)" ' f'mysql -u root -N -s ghost -e "{sql}"'
+    cmd = (
+        'MYSQL_PWD="$(cat /run/secrets/db_password)" '
+        f'mysql -u root -N -s ghost -e "{sql}"'
+    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", cmd], service="db").strip()
 def test_upgrade_preserves_state(live_app):
-    assert (
+    assert _mysql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives", (
-        _mysql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives"
+        "the seeded ghost MySQL marker did not survive the upgrade redeploy (data loss on upgrade)"
-    ), "the seeded ghost MySQL marker did not survive the upgrade redeploy (data loss on upgrade)"
+    )
--- a/tests/hedgedoc/functional/test_branding.py
+++ b/tests/hedgedoc/functional/test_branding.py
@ -14,6 +14,7 @@ import urllib.request
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
 from harness import http as harness_http  # noqa: E402
 _CTX = ssl.create_default_context()
 _CTX.check_hostname = False
 _CTX.verify_mode = ssl.CERT_NONE
--- a/tests/hedgedoc/functional/test_health_check.py
+++ b/tests/hedgedoc/functional/test_health_check.py
@ -15,5 +15,7 @@ from harness import http as harness_http  # noqa: E402
 def test_hedgedoc_root_serves(live_app):
    """GET / → 200 or 302 (login/new redirect)."""
    url = f"https://{live_app}/"
-    status, _ = harness_http.retry_http_get(url, expect_status=(200, 302), max_wait=90, interval=5)
+    status, _ = harness_http.retry_http_get(
+        url, expect_status=(200, 302), max_wait=90, interval=5
+    )
    assert status in (200, 302), f"GET {url} HTTP {status} (expected 200 or 302)"
--- a/tests/immich/functional/test_asset_processing.py
+++ b/tests/immich/functional/test_asset_processing.py
@ -111,13 +111,13 @@ def test_immich_processes_uploaded_asset_metadata_and_statistics(live_app):
        if exif and exif.get("exifImageWidth"):
            break
        time.sleep(5)
-    assert (
+    assert exif and exif.get("exifImageWidth") == 1 and exif.get("exifImageHeight") == 1, (
-        exif and exif.get("exifImageWidth") == 1 and exif.get("exifImageHeight") == 1
+        f"immich metadata-extraction did not populate the 1x1 PNG dimensions in exifInfo: {exif!r}"
-    ), f"immich metadata-extraction did not populate the 1x1 PNG dimensions in exifInfo: {exif!r}"
+    )
    # the asset is catalogued into the owner's library statistics (list-back in aggregate)
    sst, stats = harness_http.http_request("GET", f"{base}/api/assets/statistics", headers=auth)
    assert sst == 200 and isinstance(stats, dict), f"statistics HTTP {sst}: {stats!r}"
-    assert (
+    assert stats.get("images", 0) >= 1 and stats.get("total", 0) >= 1, (
-        stats.get("images", 0) >= 1 and stats.get("total", 0) >= 1
+        f"uploaded asset not reflected in library statistics: {stats!r}"
-    ), f"uploaded asset not reflected in library statistics: {stats!r}"
+    )
--- a/tests/immich/functional/test_asset_upload.py
+++ b/tests/immich/functional/test_asset_upload.py
@ -121,6 +121,6 @@ def test_immich_upload_asset_readback_and_thumbnail(live_app):
        if thumb == 200:
            break
        time.sleep(5)
-    assert (
+    assert thumb == 200, (
-        thumb == 200
+        f"immich did not generate a thumbnail/derivative for the uploaded asset (last HTTP {thumb})"
-    ), f"immich did not generate a thumbnail/derivative for the uploaded asset (last HTTP {thumb})"
+    )
--- a/tests/immich/functional/test_health_check.py
+++ b/tests/immich/functional/test_health_check.py
@ -16,11 +16,5 @@ from harness import http as harness_http  # noqa: E402
 def test_immich_returns_200(live_app):
    url = f"https://{live_app}/"
-    status, _ = harness_http.retry_http_get(
+    status, _ = harness_http.retry_http_get(url, expect_status=(200, 301, 302), max_wait=60, interval=3)
-        url, expect_status=(200, 301, 302), max_wait=60, interval=3
+    assert status in (200, 301, 302), f"immich at {url} returned HTTP {status} (expected 200/301/302)"
-    )
-    assert status in (
-        200,
-        301,
-        302,
-    ), f"immich at {url} returned HTTP {status} (expected 200/301/302)"
--- a/tests/immich/ops.py
+++ b/tests/immich/ops.py
@ -35,7 +35,4 @@ def pre_backup(domain, meta):
 def pre_restore(domain, meta):
    _psql(domain, "DROP TABLE ci_marker;")
-    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in (
+    assert _psql(domain, "SELECT to_regclass('public.ci_marker');") in ("", "NULL"), "drop did not take"
-        "",
-        "NULL",
-    ), "drop did not take"
--- a/tests/immich/test_backup.py
+++ b/tests/immich/test_backup.py
@ -14,6 +14,4 @@ def _psql(domain, sql):
 def test_backup_captures_state(live_app):
-    assert (
+    assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", "seeded postgres state not present at backup time"
-        _psql(live_app, "SELECT v FROM ci_marker;") == "original"
-    ), "seeded postgres state not present at backup time"
--- a/tests/immich/test_install.py
+++ b/tests/immich/test_install.py
@ -7,8 +7,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
-from harness import generic, lifecycle
 def test_serving_and_frontend(live_app, meta):
@ -26,11 +25,7 @@ def test_serving_and_frontend(live_app, meta):
            resp = harness_browser.goto_with_retry(
                page, url, accept_statuses=(200, 301, 302), goto_timeout_ms=60_000
            )
-            assert resp is not None and resp.status in (
+            assert resp is not None and resp.status in (200, 301, 302), f"page status {resp and resp.status}"
-                200,
-                301,
-                302,
-            ), f"page status {resp and resp.status}"
            assert "<html" in page.content().lower(), "no HTML served by the immich frontend"
        finally:
            browser.close()
--- a/tests/immich/test_restore.py
+++ b/tests/immich/test_restore.py
@ -14,6 +14,4 @@ def _psql(domain, sql):
 def test_restore_returns_state(live_app):
-    assert (
+    assert _psql(live_app, "SELECT v FROM ci_marker;") == "original", "restore did not return the pre-mutation postgres state"
-        _psql(live_app, "SELECT v FROM ci_marker;") == "original"
-    ), "restore did not return the pre-mutation postgres state"
--- a/tests/immich/test_upgrade.py
+++ b/tests/immich/test_upgrade.py
@ -14,6 +14,4 @@ def _psql(domain, sql):
 def test_upgrade_preserves_data(live_app):
-    assert (
+    assert _psql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives", "postgres data did not survive the upgrade"
-        _psql(live_app, "SELECT v FROM ci_marker;") == "upgrade-survives"
-    ), "postgres data did not survive the upgrade"
--- a/tests/keycloak/functional/test_create_client_and_use.py
+++ b/tests/keycloak/functional/test_create_client_and_use.py
@ -120,9 +120,9 @@ def test_create_confidential_client_and_obtain_token(live_app):
        "clientId": client_id,
        "enabled": True,
        "secret": client_secret,
-        "publicClient": False,  # confidential client
+        "publicClient": False,            # confidential client
-        "serviceAccountsEnabled": True,  # required for client_credentials grant
+        "serviceAccountsEnabled": True,    # required for client_credentials grant
-        "standardFlowEnabled": False,  # not needed for service-account-only client
+        "standardFlowEnabled": False,      # not needed for service-account-only client
        "directAccessGrantsEnabled": False,
        "protocol": "openid-connect",
    }
@ -144,25 +144,25 @@ def test_create_confidential_client_and_obtain_token(live_app):
        # Use the client to obtain its own token (client_credentials grant)
        tok_status, tok_resp = _client_credentials_token(live_app, client_id, client_secret)
-        assert (
+        assert tok_status == 200, (
-            tok_status == 200
+            f"client_credentials token returned HTTP {tok_status}: {tok_resp!r}"
-        ), f"client_credentials token returned HTTP {tok_status}: {tok_resp!r}"
+        )
        access_token = tok_resp.get("access_token") if isinstance(tok_resp, dict) else None
-        assert (
+        assert isinstance(access_token, str) and access_token.count(".") == 2, (
-            isinstance(access_token, str) and access_token.count(".") == 2
+            f"client_credentials access_token not a JWT: {access_token!r}"
-        ), f"client_credentials access_token not a JWT: {access_token!r}"
+        )
        # Decode the JWT payload; assert azp matches the new client
        payload = json.loads(_b64url_decode(access_token.split(".")[1]))
-        assert (
+        assert payload.get("azp") == client_id, (
-            payload.get("azp") == client_id
+            f"client_credentials JWT azp={payload.get('azp')!r} != client_id={client_id!r}"
-        ), f"client_credentials JWT azp={payload.get('azp')!r} != client_id={client_id!r}"
+        )
        # Service-account token does NOT carry a session-scoped user (azp + clientId differ from
        # admin-cli token). The presence of azp + iss == per-run-domain proves the issuance flow.
        expected_iss = f"https://{live_app}/realms/master"
-        assert (
+        assert payload.get("iss") == expected_iss, (
-            payload.get("iss") == expected_iss
+            f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
-        ), f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
+        )
    finally:
        # Idempotent cleanup
        if cleanup_id:
--- a/tests/keycloak/functional/test_password_grant_token.py
+++ b/tests/keycloak/functional/test_password_grant_token.py
@ -43,20 +43,22 @@ def test_password_grant_issues_valid_jwt(live_app):
    token = kc_admin.admin_token(live_app, password)
    # Shape: a JWT is exactly 3 base64url segments
-    assert (
+    assert isinstance(token, str) and token.count(".") == 2, (
-        isinstance(token, str) and token.count(".") == 2
+        f"access_token does not look like a JWT (no 3 segments): len={len(token) if token else 0}"
-    ), f"access_token does not look like a JWT (no 3 segments): len={len(token) if token else 0}"
+    )
    payload = _decode_jwt_payload(token)
    # iss = the issuer URL, must be the per-run domain's /realms/master endpoint
    expected_iss = f"https://{live_app}/realms/master"
-    assert (
+    assert payload.get("iss") == expected_iss, (
-        payload.get("iss") == expected_iss
+        f"JWT iss claim {payload.get('iss')!r} != {expected_iss!r}"
-    ), f"JWT iss claim {payload.get('iss')!r} != {expected_iss!r}"
+    )
    # azp = authorized party (which client requested this token)
-    assert payload.get("azp") == "admin-cli", f"JWT azp claim {payload.get('azp')!r} != 'admin-cli'"
+    assert payload.get("azp") == "admin-cli", (
+        f"JWT azp claim {payload.get('azp')!r} != 'admin-cli'"
+    )
    # typ = token type
    assert payload.get("typ") == "Bearer", f"JWT typ claim {payload.get('typ')!r} != 'Bearer'"
@ -68,6 +70,6 @@ def test_password_grant_issues_valid_jwt(live_app):
    # iat (issued at) is also a standard claim
    iat = payload.get("iat")
-    assert (
+    assert isinstance(iat, int) and iat <= time.time() + 60, (
-        isinstance(iat, int) and iat <= time.time() + 60
+        f"JWT iat {iat!r} not a reasonable past timestamp"
-    ), f"JWT iat {iat!r} not a reasonable past timestamp"
+    )
--- a/tests/keycloak/recipe_meta.py
+++ b/tests/keycloak/recipe_meta.py
@ -2,7 +2,5 @@
 # conftest — enrolling this recipe needs NO change to runner/harness code (D5).
 HEALTH_PATH = "/realms/master"  # 200 JSON once keycloak is up (not "/", which redirects)
 HEALTH_OK = (200,)
-DEPLOY_TIMEOUT = (
+DEPLOY_TIMEOUT = 900  # JVM + DB migration are slow on a 2-vCPU VM; observed 502 fallback up to ~10min
-    900  # JVM + DB migration are slow on a 2-vCPU VM; observed 502 fallback up to ~10min
-)
 HTTP_TIMEOUT = 900
--- a/tests/keycloak/test_install.py
+++ b/tests/keycloak/test_install.py
@ -8,8 +8,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
-from harness import generic, lifecycle
 def test_serving_and_admin_console(live_app, meta):
--- a/tests/lasuite-docs/functional/test_auth_required.py
+++ b/tests/lasuite-docs/functional/test_auth_required.py
@ -28,7 +28,9 @@ def test_users_me_requires_auth(live_app):
    url = f"https://{live_app}/api/v1.0/users/me/"
    # Retry with broad acceptance: any 4xx (or specific 401) indicates the route exists + auth is
    # required. Reject 200 (anonymous access) and 5xx (broken backend).
-    status, _ = harness_http.retry_http_get(url, expect_status=(401, 403), max_wait=60, interval=3)
+    status, _ = harness_http.retry_http_get(
+        url, expect_status=(401, 403), max_wait=60, interval=3
+    )
    assert status in (401, 403), (
        f"GET {url} returned {status}, expected 401 (auth required). "
        f"200 = anonymous access leaked; 404 = route missing; 5xx = backend broken."
--- a/tests/lasuite-docs/functional/test_create_doc.py
+++ b/tests/lasuite-docs/functional/test_create_doc.py
@ -27,8 +27,7 @@ import uuid
 import pytest
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
+from harness import http as harness_http, sso  # noqa: E402
-from harness import sso
@pytest.mark.requires_deps
@ -37,15 +36,13 @@ def test_create_doc_and_read_back(live_app, deps_creds):
    kc = deps_creds["keycloak"]
    # Obtain a JWT via OIDC password grant
-    access_token = sso.oidc_password_grant(
+    access_token = sso.oidc_password_grant({
-        {
+        "client_id": kc["client_id"],
-            "client_id": kc["client_id"],
+        "client_secret": kc["client_secret"],
-            "client_secret": kc["client_secret"],
+        "user": kc["user"],
-            "user": kc["user"],
+        "password": kc["password"],
-            "password": kc["password"],
+        "token_url": kc["token_url"],
-            "token_url": kc["token_url"],
+    })
-        }
-    )
    auth = {"Authorization": f"Bearer {access_token}"}
    # Create a doc with a unique title
@ -59,9 +56,9 @@ def test_create_doc_and_read_back(live_app, deps_creds):
    assert isinstance(body, dict), f"unexpected response shape: {body!r}"
    doc_id = body.get("id")
    assert doc_id, f"created doc has no id: {body!r}"
-    assert (
+    assert body.get("title") == title, (
-        body.get("title") == title
+        f"created doc title mismatch: created={title!r}, response={body.get('title')!r}"
-    ), f"created doc title mismatch: created={title!r}, response={body.get('title')!r}"
+    )
    # Fetch it back via the dedicated GET endpoint
    s, fetched = harness_http.http_get(
@ -69,10 +66,9 @@ def test_create_doc_and_read_back(live_app, deps_creds):
    )
    assert s == 200, f"GET /api/v1.0/documents/{doc_id}/ HTTP {s}: {fetched!r}"
    assert isinstance(fetched, dict), f"unexpected GET response: {fetched!r}"
-    assert fetched.get("id") in (
+    assert fetched.get("id") in (doc_id, str(doc_id)), (
-        doc_id,
+        f"fetched id mismatch: created={doc_id!r}, fetched={fetched.get('id')!r}"
-        str(doc_id),
+    )
-    ), f"fetched id mismatch: created={doc_id!r}, fetched={fetched.get('id')!r}"
+    assert fetched.get("title") == title, (
-    assert (
+        f"fetched title mismatch: created={title!r}, fetched={fetched.get('title')!r}"
-        fetched.get("title") == title
+    )
-    ), f"fetched title mismatch: created={title!r}, fetched={fetched.get('title')!r}"
--- a/tests/lasuite-docs/functional/test_health_check.py
+++ b/tests/lasuite-docs/functional/test_health_check.py
@ -22,11 +22,7 @@ def test_lasuite_docs_returns_200(live_app):
    url = f"https://{live_app}/"
    # accept 200 (frontend SPA shell) — lasuite-docs serves the SPA at root unauthenticated;
    # the SPA itself bootstraps via /api/v1.0/users/me/ which requires OIDC (separate test).
-    status, _ = harness_http.retry_http_get(
+    status, _ = harness_http.retry_http_get(url, expect_status=(200, 301, 302), max_wait=60, interval=3)
-        url, expect_status=(200, 301, 302), max_wait=60, interval=3
+    assert status in (200, 301, 302), (
-    )
+        f"lasuite-docs at {url} returned HTTP {status} (expected 200/301/302)"
+    )
-    assert status in (
-        200,
-        301,
-        302,
-    ), f"lasuite-docs at {url} returned HTTP {status} (expected 200/301/302)"
--- a/tests/lasuite-docs/functional/test_oidc_login.py
+++ b/tests/lasuite-docs/functional/test_oidc_login.py
@ -25,8 +25,7 @@ import urllib.request
 import pytest
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
+from harness import http as harness_http, sso  # noqa: E402
-from harness import sso
 _CTX = ssl.create_default_context()
 _CTX.check_hostname = False
@ -62,9 +61,9 @@ def test_oidc_login_via_keycloak(live_app, deps_creds):
    # 302 redirect. Both are valid "auth-required" indicators — accept either, but if a
    # redirect is returned it must point at the dep keycloak realm.
    if status in (301, 302, 303, 307, 308):
-        assert expected_prefix in (
+        assert expected_prefix in (redirect or ""), (
-            redirect or ""
+            f"Docs redirected to {redirect!r}, expected to start with {expected_prefix!r}"
-        ), f"Docs redirected to {redirect!r}, expected to start with {expected_prefix!r}"
+        )
    else:
        assert status in (401, 403), (
            f"GET /api/v1.0/users/me/ unauth: HTTP {status}; expected redirect to keycloak "
@ -89,6 +88,6 @@ def test_oidc_login_via_keycloak(live_app, deps_creds):
    )
    assert status == 200, f"GET /api/v1.0/users/me/ with token HTTP {status}: {body!r}"
    assert isinstance(body, dict), f"unexpected response: {body!r}"
-    assert (
+    assert body.get("email") == kc["email"], (
-        body.get("email") == kc["email"]
+        f"unexpected user email: got {body.get('email')!r}, expected {kc['email']!r}"
-    ), f"unexpected user email: got {body.get('email')!r}, expected {kc['email']!r}"
+    )
--- a/tests/lasuite-docs/functional/test_oidc_with_keycloak.py
+++ b/tests/lasuite-docs/functional/test_oidc_with_keycloak.py
@ -42,9 +42,9 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Sanity-check the creds shape — orchestrator-written
    assert kc["domain"]
    # WC1: realm is per-run namespaced "<parent>-<6hex>" so concurrent dependents never collide.
-    assert re.fullmatch(
+    assert re.fullmatch(r"lasuite-docs-[0-9a-f]{6}", kc["realm"]), (
-        r"lasuite-docs-[0-9a-f]{6}", kc["realm"]
+        f"realm {kc['realm']!r} not the per-run namespaced form lasuite-docs-<6hex>"
-    ), f"realm {kc['realm']!r} not the per-run namespaced form lasuite-docs-<6hex>"
+    )
    assert kc["client_id"] == "lasuite-docs"
    assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
    assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
@ -74,14 +74,16 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Password grant → real JWT
    token = sso.oidc_password_grant(creds)
-    assert isinstance(token, str) and token.count(".") == 2, f"access_token is not a JWT: {token!r}"
+    assert isinstance(token, str) and token.count(".") == 2, (
+        f"access_token is not a JWT: {token!r}"
+    )
    payload = json.loads(_b64url_decode(token.split(".")[1]))
    assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
-    assert (
+    assert payload.get("azp") == kc["client_id"], (
-        payload.get("azp") == kc["client_id"]
+        f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
-    ), f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
+    )
    assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
    exp = payload.get("exp")
-    assert (
+    assert isinstance(exp, int) and exp > time.time(), (
-        isinstance(exp, int) and exp > time.time()
+        f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
-    ), f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
+    )
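Reviewer note: the claim checks in this test can be read as one small routine. The sketch below is a minimal, hypothetical restatement, assuming `_b64url_decode` simply restores the stripped base64url padding (its body is not shown in this diff). `check_claims` and `make_unsigned_token` are illustrative names, not harness functions, and no signature verification is attempted.

```python
import base64
import json
import time


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def check_claims(token: str, expected_iss: str, client_id: str) -> dict:
    """Return the decoded payload after checking iss, azp, typ and exp.

    Assumption: claims are inspected only; the signature is NOT verified.
    """
    assert isinstance(token, str) and token.count(".") == 2, f"not a JWT: {token!r}"
    payload = json.loads(b64url_decode(token.split(".")[1]))
    assert payload.get("iss") == expected_iss, f"iss={payload.get('iss')!r}"
    assert payload.get("azp") == client_id, f"azp={payload.get('azp')!r}"
    assert payload.get("typ") == "Bearer", f"typ={payload.get('typ')!r}"
    exp = payload.get("exp")
    assert isinstance(exp, int) and exp > time.time(), f"exp={exp!r} is not in the future"
    return payload


def make_unsigned_token(payload: dict) -> str:
    """Build a three-segment token with a dummy signature (local checks only)."""

    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

    return f"{enc({'alg': 'none'})}.{enc(payload)}.sig"
```

A token built with `make_unsigned_token` lets the checks be exercised without a running keycloak; the real test obtains its token from `sso.oidc_password_grant(creds)`.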
--- a/tests/lasuite-docs/setup_custom_tests.sh
+++ b/tests/lasuite-docs/setup_custom_tests.sh
@ -21,24 +21,15 @@ set -euo pipefail
 : "${CCCI_APP_DOMAIN:?missing}"
 : "${CCCI_DEPS_FILE:?missing}"
-test -s "$CCCI_DEPS_FILE" || {
-  echo "  setup_custom_tests: deps file empty"
-  exit 1
-}
+test -s "$CCCI_DEPS_FILE" || { echo "  setup_custom_tests: deps file empty"; exit 1; }
 # Read keycloak dep info via jq
-KC_DOMAIN=$(jq -r '.keycloak.domain' "$CCCI_DEPS_FILE")
+KC_DOMAIN=$(jq -r '.keycloak.domain'         "$CCCI_DEPS_FILE")
-KC_REALM=$(jq -r '.keycloak.realm' "$CCCI_DEPS_FILE")
+KC_REALM=$( jq -r '.keycloak.realm'          "$CCCI_DEPS_FILE")
-KC_CLIENT=$(jq -r '.keycloak.client_id' "$CCCI_DEPS_FILE")
+KC_CLIENT=$(jq -r '.keycloak.client_id'      "$CCCI_DEPS_FILE")
-KC_SECRET=$(jq -r '.keycloak.client_secret' "$CCCI_DEPS_FILE")
+KC_SECRET=$(jq -r '.keycloak.client_secret'  "$CCCI_DEPS_FILE")
-if [ -z "$KC_DOMAIN" ] || [ "$KC_DOMAIN" = "null" ]; then
-  echo "  setup_custom_tests: no keycloak.domain in deps"
-  exit 1
-fi
-if [ -z "$KC_SECRET" ] || [ "$KC_SECRET" = "null" ]; then
-  echo "  setup_custom_tests: no keycloak.client_secret"
-  exit 1
-fi
+[ -n "$KC_DOMAIN" ] && [ "$KC_DOMAIN" != "null" ] || { echo "  setup_custom_tests: no keycloak.domain in deps"; exit 1; }
+[ -n "$KC_SECRET" ] && [ "$KC_SECRET" != "null" ] || { echo "  setup_custom_tests: no keycloak.client_secret"; exit 1; }
 echo "  lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_DOMAIN}"
@ -48,15 +39,12 @@ echo "  lasuite-docs setup_custom_tests: wiring OIDC against keycloak dep ${KC_D
 # update SECRET_OIDC_RPCS_VERSION in the .env to point at the new one.
 ENV_PATH="$HOME/.abra/servers/default/${CCCI_APP_DOMAIN}.env"
 CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
-NEW_NUM=$((${CUR_VER#v} + 1))
+NEW_NUM=$(( ${CUR_VER#v} + 1 ))
 NEW_VER="v${NEW_NUM}"
-INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
-  INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
-  {
-    echo "  setup_custom_tests: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
-    exit 1
-  }
+INSERT_LOG=$(abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o 2>&1) \
+  || INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) \
+  || { echo "  setup_custom_tests: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"; exit 1; }
 # Repoint the env var to the new version
 sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
 echo "  setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
@ -64,25 +52,25 @@ echo "  setup_custom_tests: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)
 # 2) Write OIDC env vars to the app's .env (names per lasuite-docs's .env.sample).
 # Ensure the file ends with a newline FIRST so our appends don't concatenate onto the last line
 # (we saw `TIMEOUT=900OIDC_REALM=...` malformed by a missing-trailing-newline file).
-[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
+[ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
-write_env() {
+write_env () {
  local key="$1" val="$2"
  # remove any existing key (commented or live) then append the live key=val
  sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
  # Re-ensure trailing newline after each delete (sed may leave the file without one)
-  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
+  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
-  printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
+  printf '%s=%s\n' "$key" "$val" >> "$ENV_PATH"
 }
-write_env OIDC_REALM "$KC_REALM"
+write_env OIDC_REALM                       "$KC_REALM"
-write_env OIDC_OP_DISCOVERY_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
+write_env OIDC_OP_DISCOVERY_ENDPOINT       "https://${KC_DOMAIN}/realms/${KC_REALM}/.well-known/openid-configuration"
-write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
+write_env OIDC_OP_AUTHORIZATION_ENDPOINT   "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
-write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
+write_env OIDC_OP_TOKEN_ENDPOINT           "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
-write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
+write_env OIDC_OP_USER_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
-write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
+write_env OIDC_OP_LOGOUT_ENDPOINT          "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
-write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
+write_env OIDC_OP_JWKS_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
-write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
+write_env OIDC_RP_CLIENT_ID                "$KC_CLIENT"
-write_env OIDC_RP_SIGN_ALGO "RS256"
+write_env OIDC_RP_SIGN_ALGO                "RS256"
-write_env OIDC_RP_SCOPES "openid email profile"
+write_env OIDC_RP_SCOPES                   "openid email profile"
 # 3) Trigger an in-place redeploy so the env update takes effect. --force re-deploys even when
 # the recipe hasn't changed; --chaos avoids the chaos prompt; --no-input non-interactive.
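Reviewer note: the `write_env` helper in this script drops any existing line for the key (commented or live), makes sure the file ends with a newline, then appends `key=val`. The Python sketch below restates that behaviour as a rough model for reasoning about it. It is an illustration under the assumption that the `.env` file is small enough to rewrite whole; it is not a proposed replacement for the shell function.

```python
import re
from pathlib import Path


def write_env(env_path: Path, key: str, val: str) -> None:
    """Replace any commented or live `key=` line with a single live `key=val`."""
    text = env_path.read_text() if env_path.exists() else ""
    # Same shape as the sed address /^\s*#\?\s*KEY=/ used by the shell helper.
    pattern = re.compile(rf"^\s*#?\s*{re.escape(key)}=")
    kept = [line for line in text.splitlines() if not pattern.match(line)]
    kept.append(f"{key}={val}")
    # Joining with "\n" and adding a final newline avoids the
    # `TIMEOUT=900OIDC_REALM=...` concatenation noted in the script comments.
    env_path.write_text("\n".join(kept) + "\n")
```

Rewriting the whole file sidesteps the missing-trailing-newline case by construction, whereas the shell version has to re-check the last byte after each `sed -i`.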
--- a/tests/lasuite-docs/test_install.py
+++ b/tests/lasuite-docs/test_install.py
@ -10,8 +10,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
-from harness import generic, lifecycle
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
 def test_serving_and_frontend(live_app, meta):
--- a/tests/lasuite-drive/functional/test_health_check.py
+++ b/tests/lasuite-drive/functional/test_health_check.py
@ -25,8 +25,6 @@ def test_lasuite_drive_returns_200(live_app):
    status, _ = harness_http.retry_http_get(
        url, expect_status=(200, 301, 302), max_wait=60, interval=3
    )
-    assert status in (
-        200,
-        301,
-        302,
-    ), f"lasuite-drive at {url} returned HTTP {status} (expected 200/301/302)"
+    assert status in (200, 301, 302), (
+        f"lasuite-drive at {url} returned HTTP {status} (expected 200/301/302)"
+    )
--- a/tests/lasuite-drive/functional/test_minio_storage.py
+++ b/tests/lasuite-drive/functional/test_minio_storage.py
@ -29,8 +29,8 @@ BUCKET = "drive-media-storage"
 def _mc(domain: str, script: str) -> str:
    """Run an `mc` shell script inside the minio container (root creds from /run/secrets)."""
    prelude = (
-        "set -e; "
+        'set -e; '
-        "U=$(cat /run/secrets/minio_ru); P=$(cat /run/secrets/minio_rp); "
+        'U=$(cat /run/secrets/minio_ru); P=$(cat /run/secrets/minio_rp); '
        'mc alias set ccci http://localhost:9000 "$U" "$P" >/dev/null 2>&1; '
    )
    return lifecycle.exec_in_app(domain, ["sh", "-c", prelude + script], service="minio")
@ -49,13 +49,13 @@ def test_minio_bucket_present_and_object_roundtrip(live_app):
        domain,
        # upload via stdin; list the object; read it back (tagged); then delete.
        f'printf %s "{marker}" | mc pipe ccci/{BUCKET}/{key} >/dev/null 2>&1; '
-        f"mc ls ccci/{BUCKET}/{key}; "
+        f'mc ls ccci/{BUCKET}/{key}; '
        f'echo "READBACK:$(mc cat ccci/{BUCKET}/{key})"; '
-        f"mc rm ccci/{BUCKET}/{key} >/dev/null 2>&1",
+        f'mc rm ccci/{BUCKET}/{key} >/dev/null 2>&1',
    )
    # The object was listed (its key appears) and its content round-tripped intact.
    assert f"{marker}.txt" in out, f"uploaded object not listed in bucket: {out!r}"
-    assert (
+    assert f"READBACK:{marker}" in out, (
-        f"READBACK:{marker}" in out
+        f"object content did not round-trip through MinIO; got: {out!r}"
-    ), f"object content did not round-trip through MinIO; got: {out!r}"
+    )
--- a/tests/lasuite-drive/functional/test_oidc_with_keycloak.py
+++ b/tests/lasuite-drive/functional/test_oidc_with_keycloak.py
@ -46,9 +46,9 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
    assert kc["domain"]
-    assert re.fullmatch(
+    assert re.fullmatch(r"lasuite-drive-[0-9a-f]{6}", kc["realm"]), (
-        r"lasuite-drive-[0-9a-f]{6}", kc["realm"]
+        f"realm {kc['realm']!r} not the per-run namespaced form lasuite-drive-<6hex>"
-    ), f"realm {kc['realm']!r} not the per-run namespaced form lasuite-drive-<6hex>"
+    )
    assert kc["client_id"] == "lasuite-drive"
    assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
    assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
@ -77,14 +77,16 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Password grant → real JWT
    token = sso.oidc_password_grant(creds)
-    assert isinstance(token, str) and token.count(".") == 2, f"access_token is not a JWT: {token!r}"
+    assert isinstance(token, str) and token.count(".") == 2, (
+        f"access_token is not a JWT: {token!r}"
+    )
    payload = json.loads(_b64url_decode(token.split(".")[1]))
    assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
-    assert (
+    assert payload.get("azp") == kc["client_id"], (
-        payload.get("azp") == kc["client_id"]
+        f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
-    ), f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
+    )
    assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
    exp = payload.get("exp")
-    assert (
+    assert isinstance(exp, int) and exp > time.time(), (
-        isinstance(exp, int) and exp > time.time()
+        f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
-    ), f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
+    )
--- a/tests/lasuite-drive/install_steps.sh
+++ b/tests/lasuite-drive/install_steps.sh
@ -28,7 +28,7 @@ if [ -z "${CCCI_DEPS_FILE:-}" ] || [ ! -s "${CCCI_DEPS_FILE}" ]; then
  exit 0
 fi
 KC_DOMAIN=$(jq -r '.keycloak.domain        // empty' "$CCCI_DEPS_FILE")
-KC_REALM=$(jq -r '.keycloak.realm         // empty' "$CCCI_DEPS_FILE")
+KC_REALM=$( jq -r '.keycloak.realm         // empty' "$CCCI_DEPS_FILE")
 KC_CLIENT=$(jq -r '.keycloak.client_id     // empty' "$CCCI_DEPS_FILE")
 KC_SECRET=$(jq -r '.keycloak.client_secret // empty' "$CCCI_DEPS_FILE")
 if [ -z "$KC_DOMAIN" ] || [ -z "$KC_SECRET" ]; then
@ -43,38 +43,35 @@ echo "  lasuite-drive install_steps: wiring OIDC at install against keycloak ${K
 # point SECRET_OIDC_RPCS_VERSION at it. (The app is not deployed yet — a swarm secret can be created
 # independently of a running stack — so the single deploy below picks up v2.)
 CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
-NEW_NUM=$((${CUR_VER#v} + 1))
+NEW_NUM=$(( ${CUR_VER#v} + 1 ))
 NEW_VER="v${NEW_NUM}"
-INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
-  INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
-  {
-    echo "  install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
-    exit 1
-  }
+INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) \
+  || INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) \
+  || { echo "  install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"; exit 1; }
 sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
 echo "  install_steps: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
 # 2) Write the OIDC env vars (explicit endpoints — deterministic, no reliance on ${AUTH_DOMAIN}
 # expansion). Mirrors the recipe-maintainer impress/La Suite OIDC env contract.
-write_env() {
+write_env () {
  local key="$1" val="$2"
  sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
-  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
+  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
-  printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
+  printf '%s=%s\n' "$key" "$val" >> "$ENV_PATH"
 }
-write_env AUTH_DOMAIN "$KC_DOMAIN"
+write_env AUTH_DOMAIN                      "$KC_DOMAIN"
-write_env OIDC_REALM "$KC_REALM"
+write_env OIDC_REALM                       "$KC_REALM"
-write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
+write_env OIDC_OP_JWKS_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
-write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
+write_env OIDC_OP_AUTHORIZATION_ENDPOINT   "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
-write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
+write_env OIDC_OP_TOKEN_ENDPOINT           "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
-write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
+write_env OIDC_OP_USER_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
-write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
+write_env OIDC_OP_LOGOUT_ENDPOINT          "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
-write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
+write_env OIDC_RP_CLIENT_ID                "$KC_CLIENT"
-write_env OIDC_RP_SIGN_ALGO "RS256"
+write_env OIDC_RP_SIGN_ALGO                "RS256"
-write_env OIDC_RP_SCOPES "openid email profile"
+write_env OIDC_RP_SCOPES                   "openid email profile"
-write_env OIDC_REDIRECT_ALLOWED_HOSTS "[\"https://${KC_DOMAIN}\", \"https://${CCCI_APP_DOMAIN}\"]"
+write_env OIDC_REDIRECT_ALLOWED_HOSTS      "[\"https://${KC_DOMAIN}\", \"https://${CCCI_APP_DOMAIN}\"]"
 # The recipe default acr_values=eidas1 is FranceConnect-specific; keycloak can't satisfy it and it
 # would break the interactive auth flow. Clear it so the keycloak OIDC client works.
-write_env OIDC_AUTH_REQUEST_EXTRA_PARAMS "{}"
+write_env OIDC_AUTH_REQUEST_EXTRA_PARAMS   "{}"
 echo "  lasuite-drive install_steps: OIDC env wired into .env (deploy will pick it up, no reconverge)"
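Reviewer note: the secret rotation in this script hinges on the `CUR_VER` to `NEW_VER` bump (`v1` becomes `v2`, with `v1` assumed when the `.env` has no `SECRET_OIDC_RPCS_VERSION` line). A small Python sketch of that arithmetic follows. `next_secret_version` is a hypothetical name, and the strict `v<digits>` validation is an added assumption; the shell arithmetic expansion does not report malformed values this way.

```python
import re
from typing import Optional


def next_secret_version(current: Optional[str]) -> str:
    """Return the version after `current`, treating a missing value as "v1"."""
    # Mirror `tr -d '"\r'`: drop surrounding whitespace/CR and double quotes.
    cleaned = (current or "v1").strip().strip('"')
    match = re.fullmatch(r"v(\d+)", cleaned)
    if match is None:
        raise ValueError(f"unexpected secret version: {current!r}")
    return f"v{int(match.group(1)) + 1}"
```

Failing loudly on a malformed version is a design choice of the sketch, intended to surface a bad `.env` value before a secret is inserted under an unintended version.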
--- a/tests/lasuite-drive/setup_custom_tests.sh
+++ b/tests/lasuite-drive/setup_custom_tests.sh
@ -29,7 +29,7 @@ docker service scale --detach "${STACK}_minio-createbuckets=1" >/dev/null 2>&1 |
 for i in $(seq 1 30); do
  MC_CID=$(docker ps -q -f "name=${STACK}_minio.1" | head -1)
  if [ -n "$MC_CID" ] && docker exec "$MC_CID" sh -c \
-    'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" "$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && mc ls _c/drive-media-storage >/dev/null 2>&1'; then
+       'mc alias set _c http://localhost:9000 "$(cat /run/secrets/minio_ru)" "$(cat /run/secrets/minio_rp)" >/dev/null 2>&1 && mc ls _c/drive-media-storage >/dev/null 2>&1'; then
    echo "  setup: bucket drive-media-storage present after ${i} poll(s)"
    break
  fi
--- a/tests/lasuite-drive/test_install.py
+++ b/tests/lasuite-drive/test_install.py
@ -10,8 +10,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
-from harness import generic, lifecycle
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
 def test_serving_and_frontend(live_app, meta):
--- a/tests/lasuite-meet/functional/test_health_check.py
+++ b/tests/lasuite-meet/functional/test_health_check.py
@ -21,8 +21,6 @@ def test_lasuite_meet_returns_200(live_app):
    status, _ = harness_http.retry_http_get(
        url, expect_status=(200, 301, 302), max_wait=60, interval=3
    )
-    assert status in (
-        200,
-        301,
-        302,
-    ), f"lasuite-meet at {url} returned HTTP {status} (expected 200/301/302)"
+    assert status in (200, 301, 302), (
+        f"lasuite-meet at {url} returned HTTP {status} (expected 200/301/302)"
+    )
--- a/tests/lasuite-meet/functional/test_meeting_flow.py
+++ b/tests/lasuite-meet/functional/test_meeting_flow.py
@ -28,8 +28,7 @@ import sys
 import pytest
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
-from harness import sso
+from harness import http as harness_http, sso  # noqa: E402
 def _b64url(seg: str) -> bytes:
@ -75,40 +74,33 @@ def test_create_room_get_livekit_token_and_read_back(live_app, deps_creds):
    lk_room = livekit.get("room")
    lk_token = livekit.get("token")
    assert room_id, f"room created but no id: {body!r}"
-    assert (
+    assert lk_token and isinstance(lk_token, str) and lk_token.count(".") == 2, (
-        lk_token and isinstance(lk_token, str) and lk_token.count(".") == 2
+        f"room created but no LiveKit JWT token: {livekit!r}"
-    ), f"room created but no LiveKit JWT token: {livekit!r}"
+    )
    try:
        # --- read it back (a fresh authenticated GET of the created room) ---
-        status, got = harness_http.http_request(
-            "GET", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth
-        )
+        status, got = harness_http.http_request("GET", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth)
        assert status == 200, f"room read-back returned HTTP {status} (expected 200); body={got!r}"
-        assert (
+        assert isinstance(got, dict) and got.get("id") == room_id, (
-            isinstance(got, dict) and got.get("id") == room_id
+            f"read-back room id mismatch: {got!r}"
-        ), f"read-back room id mismatch: {got!r}"
+        )
-        got_lk = got.get("livekit") or {}
+        got_lk = (got.get("livekit") or {})
        assert got_lk.get("token"), f"read-back room missing LiveKit token: {got!r}"
-        assert (
+        assert got_lk.get("room") == lk_room, (
-            got_lk.get("room") == lk_room
+            f"read-back LiveKit room {got_lk.get('room')!r} != create-time {lk_room!r}"
-        ), f"read-back LiveKit room {got_lk.get('room')!r} != create-time {lk_room!r}"
+        )
        # --- the LiveKit token is a real signaling grant for this room (WebRTC subset) ---
        payload = json.loads(_b64url(lk_token.split(".")[1]))
        video = payload.get("video") or {}
-        assert (
+        assert video.get("room") == lk_room or payload.get("room") == lk_room, (
-            video.get("room") == lk_room or payload.get("room") == lk_room
+            f"LiveKit JWT does not grant the created room {lk_room!r}: {payload!r}"
-        ), f"LiveKit JWT does not grant the created room {lk_room!r}: {payload!r}"
+        )
    finally:
        # --- delete the room (cleanup + a real DELETE mutation) ---
-        del_status, _ = harness_http.http_request(
-            "DELETE", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth
-        )
-        assert del_status in (
-            204,
-            200,
-        ), f"room delete returned HTTP {del_status} (expected 204/200)"
+        del_status, _ = harness_http.http_request("DELETE", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth)
+        assert del_status in (204, 200), f"room delete returned HTTP {del_status} (expected 204/200)"
    # --- best-effort: confirm the delete took (404 on re-GET). The §4.3 floor (create-an-object +
    # read-it-back + LiveKit-token issuance) is already proven by the hard assertions above; this
@ -120,9 +112,7 @@ def test_create_room_get_livekit_token_and_read_back(live_app, deps_creds):
    gone = False
    for _ in range(5):
-        status, _ = harness_http.http_request(
-            "GET", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth
-        )
+        status, _ = harness_http.http_request("GET", f"{base}/api/v1.0/rooms/{room_id}/", headers=auth)
        if status == 404:
            gone = True
            break
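Reviewer note: the best-effort delete confirmation above is a bounded poll that stops at the first 404. The sketch below isolates that pattern with the HTTP call injected as a callable, so it can be exercised without a live app. `wait_until_gone`, `fetch_status` and the `delay` pause are hypothetical; whether the test sleeps between attempts is not visible in this hunk.

```python
import time
from typing import Callable


def wait_until_gone(
    fetch_status: Callable[[], int],
    attempts: int = 5,
    delay: float = 1.0,
) -> bool:
    """Poll up to `attempts` times; return True as soon as a 404 is seen."""
    for attempt in range(attempts):
        if fetch_status() == 404:
            return True
        # Assumption: pause between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

In the test this would correspond to passing a closure that performs the authenticated `GET` on the room URL and returns only the status code.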
--- a/tests/lasuite-meet/functional/test_oidc_with_keycloak.py
+++ b/tests/lasuite-meet/functional/test_oidc_with_keycloak.py
@ -46,9 +46,9 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Creds shape. WC1: realm is per-run namespaced "<parent>-<6hex>"; client_id stays the parent.
    assert kc["domain"]
-    assert re.fullmatch(
+    assert re.fullmatch(r"lasuite-meet-[0-9a-f]{6}", kc["realm"]), (
-        r"lasuite-meet-[0-9a-f]{6}", kc["realm"]
+        f"realm {kc['realm']!r} not the per-run namespaced form lasuite-meet-<6hex>"
-    ), f"realm {kc['realm']!r} not the per-run namespaced form lasuite-meet-<6hex>"
+    )
    assert kc["client_id"] == "lasuite-meet"
    assert isinstance(kc["client_secret"], str) and len(kc["client_secret"]) >= 16
    assert isinstance(kc["password"], str) and len(kc["password"]) >= 16
@ -77,14 +77,16 @@ def test_oidc_password_grant_against_dep_keycloak(live_app, deps_creds):
    # Password grant → real JWT
    token = sso.oidc_password_grant(creds)
-    assert isinstance(token, str) and token.count(".") == 2, f"access_token is not a JWT: {token!r}"
+    assert isinstance(token, str) and token.count(".") == 2, (
+        f"access_token is not a JWT: {token!r}"
+    )
    payload = json.loads(_b64url_decode(token.split(".")[1]))
    assert payload.get("iss") == expected_iss, f"JWT iss={payload.get('iss')!r} != {expected_iss!r}"
-    assert (
+    assert payload.get("azp") == kc["client_id"], (
-        payload.get("azp") == kc["client_id"]
+        f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
-    ), f"JWT azp={payload.get('azp')!r} != {kc['client_id']!r}"
+    )
    assert payload.get("typ") == "Bearer", f"JWT typ={payload.get('typ')!r} != 'Bearer'"
    exp = payload.get("exp")
-    assert (
+    assert isinstance(exp, int) and exp > time.time(), (
-        isinstance(exp, int) and exp > time.time()
+        f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
-    ), f"JWT exp={exp!r} not a future timestamp (now={time.time():.0f})"
+    )
--- a/tests/lasuite-meet/install_steps.sh
+++ b/tests/lasuite-meet/install_steps.sh
@ -26,7 +26,7 @@ if [ -z "${CCCI_DEPS_FILE:-}" ] || [ ! -s "${CCCI_DEPS_FILE}" ]; then
  exit 0
 fi
 KC_DOMAIN=$(jq -r '.keycloak.domain        // empty' "$CCCI_DEPS_FILE")
-KC_REALM=$(jq -r '.keycloak.realm         // empty' "$CCCI_DEPS_FILE")
+KC_REALM=$( jq -r '.keycloak.realm         // empty' "$CCCI_DEPS_FILE")
 KC_CLIENT=$(jq -r '.keycloak.client_id     // empty' "$CCCI_DEPS_FILE")
 KC_SECRET=$(jq -r '.keycloak.client_secret // empty' "$CCCI_DEPS_FILE")
 if [ -z "$KC_DOMAIN" ] || [ -z "$KC_SECRET" ]; then
@ -40,34 +40,31 @@ echo "  lasuite-meet install_steps: wiring OIDC at install against keycloak ${KC
 # forbids overwriting a secret at the same version). The app is not deployed yet — a swarm secret can
 # be created independently — so the single deploy below picks up v2.
 CUR_VER=$(grep -E '^\s*SECRET_OIDC_RPCS_VERSION=' "$ENV_PATH" | tail -1 | cut -d= -f2 | tr -d '"\r' || echo "v1")
-NEW_NUM=$((${CUR_VER#v} + 1))
+NEW_NUM=$(( ${CUR_VER#v} + 1 ))
 NEW_VER="v${NEW_NUM}"
-INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) ||
-  INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) ||
-  {
-    echo "  install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"
-    exit 1
-  }
+INSERT_LOG=$(abra app secret insert "$CCCI_APP_DOMAIN" oidc_rpcs "$NEW_VER" "$KC_SECRET" --no-input -C -o 2>&1) \
+  || INSERT_LOG=$(script -qec "abra app secret insert $CCCI_APP_DOMAIN oidc_rpcs $NEW_VER $KC_SECRET --no-input -C -o" /dev/null 2>&1) \
+  || { echo "  install_steps: abra app secret insert oidc_rpcs@$NEW_VER failed: $INSERT_LOG"; exit 1; }
 sed -i "s|^\s*SECRET_OIDC_RPCS_VERSION=.*|SECRET_OIDC_RPCS_VERSION=$NEW_VER|" "$ENV_PATH"
 echo "  install_steps: oidc_rpcs secret inserted at $NEW_VER (was $CUR_VER)"
 # 2) Write the OIDC env vars (explicit endpoints — deterministic). Meet's .env.sample templates the
 # endpoints off ${AUTH_DOMAIN}; set AUTH_DOMAIN + override each endpoint with the concrete realm URL.
-write_env() {
+write_env () {
  local key="$1" val="$2"
  sed -i "/^\s*#\?\s*${key}=/d" "$ENV_PATH"
-  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >>"$ENV_PATH"
+  [ -z "$(tail -c1 "$ENV_PATH" 2>/dev/null)" ] || printf '\n' >> "$ENV_PATH"
-  printf '%s=%s\n' "$key" "$val" >>"$ENV_PATH"
+  printf '%s=%s\n' "$key" "$val" >> "$ENV_PATH"
 }
-write_env AUTH_DOMAIN "$KC_DOMAIN"
+write_env AUTH_DOMAIN                      "$KC_DOMAIN"
-write_env OIDC_REALM "$KC_REALM"
+write_env OIDC_REALM                       "$KC_REALM"
-write_env OIDC_OP_JWKS_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
+write_env OIDC_OP_JWKS_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/certs"
-write_env OIDC_OP_AUTHORIZATION_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
+write_env OIDC_OP_AUTHORIZATION_ENDPOINT   "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/auth"
-write_env OIDC_OP_TOKEN_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
+write_env OIDC_OP_TOKEN_ENDPOINT           "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/token"
-write_env OIDC_OP_USER_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
+write_env OIDC_OP_USER_ENDPOINT            "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/userinfo"
-write_env OIDC_OP_LOGOUT_ENDPOINT "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
+write_env OIDC_OP_LOGOUT_ENDPOINT          "https://${KC_DOMAIN}/realms/${KC_REALM}/protocol/openid-connect/logout"
-write_env OIDC_RP_CLIENT_ID "$KC_CLIENT"
+write_env OIDC_RP_CLIENT_ID                "$KC_CLIENT"
-write_env OIDC_RP_SIGN_ALGO "RS256"
+write_env OIDC_RP_SIGN_ALGO                "RS256"
-write_env OIDC_RP_SCOPES "openid email"
+write_env OIDC_RP_SCOPES                   "openid email"
 echo "  lasuite-meet install_steps: OIDC env wired into .env (deploy will pick it up, no reconverge)"
--- a/tests/lasuite-meet/test_install.py
+++ b/tests/lasuite-meet/test_install.py
@ -10,8 +10,7 @@ import os
 import sys
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
-from harness import browser as harness_browser  # noqa: E402
-from harness import generic, lifecycle
+from harness import browser as harness_browser, generic, lifecycle  # noqa: E402
 def test_serving_and_frontend(live_app, meta):
@ -34,11 +33,9 @@ def test_serving_and_frontend(live_app, meta):
            resp = harness_browser.goto_with_retry(
                page, url, accept_statuses=(200, 301, 302), goto_timeout_ms=60_000
            )
-            assert resp is not None and resp.status in (
-                200,
-                301,
-                302,
-            ), f"page status {resp and resp.status}"
+            assert resp is not None and resp.status in (200, 301, 302), (
+                f"page status {resp and resp.status}"
+            )
            assert "<html" in page.content().lower(), "no HTML served by the frontend"
        finally:
            browser.close()
--- a/tests/mailu/functional/test_mail_flow.py
+++ b/tests/mailu/functional/test_mail_flow.py
@ -43,7 +43,10 @@ def test_send_and_receive_mail(live_app):
    deadline = time.time() + 150
    while time.time() < deadline:
        for box in ("INBOX", "Junk"):
-            query = f"doveadm search -u '{email_addr}' mailbox {box} " f"header subject '{marker}'"
+            query = (
+                f"doveadm search -u '{email_addr}' mailbox {box} "
+                f"header subject '{marker}'"
+            )
            out = lifecycle.exec_in_app(live_app, ["sh", "-c", query], service="imap")
            if out.strip():  # a non-empty result = "<mailbox-guid> <uid>" → message stored
                return
--- a/tests/mailu/functional/test_mailbox.py
+++ b/tests/mailu/functional/test_mailbox.py
@ -24,6 +24,6 @@ def test_create_mailbox_and_read_back(live_app):
    cfg = _mailu.config_export(live_app)
    emails = _mailu.user_emails(cfg)
-    assert (
+    assert email in emails, (
-        email in emails
+        f"created mailbox {email} not present in mailu config-export users {emails}"
-    ), f"created mailbox {email} not present in mailu config-export users {emails}"
+    )
--- a/tests/matrix-synapse/functional/test_federation_version.py
+++ b/tests/matrix-synapse/functional/test_federation_version.py
@ -34,12 +34,12 @@ def test_federation_version_endpoint(live_app):
    assert status == 200, f"GET {url} HTTP {status} (expected 200)"
    assert isinstance(body, dict), f"federation version returned non-dict: {type(body).__name__}"
    server = body.get("server")
-    assert isinstance(
+    assert isinstance(server, dict), (
-        server, dict
+        f"federation version response missing 'server' envelope: {body!r}"
-    ), f"federation version response missing 'server' envelope: {body!r}"
+    )
    name = server.get("name")
    assert name == "Synapse", f"server.name={name!r}, expected 'Synapse'"
    version = server.get("version")
-    assert (
+    assert isinstance(version, str) and len(version) > 0, (
-        isinstance(version, str) and len(version) > 0
+        f"server.version is not a non-empty string: {version!r}"
-    ), f"server.version is not a non-empty string: {version!r}"
+    )
--- a/tests/matrix-synapse/functional/test_health_check.py
+++ b/tests/matrix-synapse/functional/test_health_check.py
@ -11,6 +11,7 @@ Runs in the custom tier against the shared post-install live deployment.
 from __future__ import annotations
 import json
 import os
 import sys
@ -23,6 +24,6 @@ def test_synapse_client_versions_returns_json(live_app):
    url = f"https://{live_app}/_matrix/client/versions"
    status, body = harness_http.retry_http_get(url, expect_status=200, max_wait=60, interval=3)
    assert status == 200, f"GET {url} HTTP {status} (expected 200)"
-    assert (
+    assert isinstance(body, dict) and isinstance(body.get("versions"), list) and body["versions"], (
-        isinstance(body, dict) and isinstance(body.get("versions"), list) and body["versions"]
+        f"GET {url} did not return Matrix client-versions document: {body!r}"
-    ), f"GET {url} did not return Matrix client-versions document: {body!r}"
+    )
--- a/tests/matrix-synapse/functional/test_register_and_message.py
+++ b/tests/matrix-synapse/functional/test_register_and_message.py
@ -42,8 +42,7 @@ import sys
 import uuid
 sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
-from harness import http as harness_http  # noqa: E402
-from harness import lifecycle
+from harness import http as harness_http, lifecycle  # noqa: E402
 def _registration_secret(domain: str) -> str:
@ -102,11 +101,8 @@ def _admin_register(domain: str, secret: str, username: str, password: str, admi
        r = _container_curl(domain, "GET", "/_synapse/admin/v1/register")
        if r["status"] in (500, 502, 503, 504, 0):
            last = r
-            print(
-                f"  [register] {username}: nonce GET transient {r['status']} "
-                f"(attempt {attempt}, synapse recovering) — retrying",
-                flush=True,
-            )
+            print(f"  [register] {username}: nonce GET transient {r['status']} "
+                  f"(attempt {attempt}, synapse recovering) — retrying", flush=True)
            time.sleep(5)
            continue
        assert r["status"] == 200, f"nonce GET failed: status={r['status']} raw={r['raw'][:200]!r}"
@ -126,19 +122,13 @@ def _admin_register(domain: str, secret: str, username: str, password: str, admi
        r = _container_curl(domain, "POST", "/_synapse/admin/v1/register", body=payload)
        if r["status"] == 200:
            if attempt > 1:
-                print(
-                    f"  [register] {username}: succeeded on attempt {attempt} "
-                    f"(synapse recovered)",
-                    flush=True,
-                )
+                print(f"  [register] {username}: succeeded on attempt {attempt} "
+                      f"(synapse recovered)", flush=True)
            return r["body"] or {}
        if r["status"] in (500, 502, 503, 504, 0):
            last = r
-            print(
-                f"  [register] {username}: POST transient {r['status']} "
-                f"(attempt {attempt}, synapse recovering) — retrying",
-                flush=True,
-            )
+            print(f"  [register] {username}: POST transient {r['status']} "
+                  f"(attempt {attempt}, synapse recovering) — retrying", flush=True)
            time.sleep(5)
            continue
        # a 4xx is a real rejection — fail fast, do not retry
@ -177,9 +167,9 @@ def test_register_two_users_send_receive_message(live_app):
    create + invite + join a room; send and read a message."""
    domain = live_app
    secret = _registration_secret(domain)
-    assert (
+    assert secret and len(secret) >= 16, (
-        secret and len(secret) >= 16
+        f"registration shared secret missing/short: len={len(secret) if secret else 0}"
-    ), f"registration shared secret missing/short: len={len(secret) if secret else 0}"
+    )
    suffix = uuid.uuid4().hex[:8]
    user_a = f"alice{suffix}"
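The retry loop in `_admin_register` above treats statuses 500, 502, 503, 504 and 0 as transient and retries after a pause, while a 4xx is a real rejection that fails fast. A minimal sketch of that pattern follows. It assumes a hypothetical `request` callable that returns a dict with a `status` key; `call_with_retry`, its parameters, and the injectable `sleep` are illustrative names and not part of the harness.

```python
import time

# Statuses treated as transient, mirroring the tuple used in the diff above.
TRANSIENT_STATUSES = (500, 502, 503, 504, 0)


def call_with_retry(request, attempts=5, delay=5.0, sleep=time.sleep):
    """Hypothetical helper: retry `request()` while it returns a transient status."""
    last = None
    for _attempt in range(1, attempts + 1):
        r = request()
        if r["status"] == 200:
            return r
        if r["status"] in TRANSIENT_STATUSES:
            last = r
            sleep(delay)  # give the server time to recover before the next attempt
            continue
        # Anything else (for example a 4xx) is a real rejection: fail fast.
        raise AssertionError(f"request rejected: status={r['status']}")
    raise AssertionError(f"still failing after {attempts} attempts: last={last!r}")
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; the diff itself calls `time.sleep(5)` directly.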
--- a/tests/mattermost-lts/functional/test_create_message.py
+++ b/tests/mattermost-lts/functional/test_create_message.py
@ -41,9 +41,9 @@ def test_create_message_roundtrip(live_app):
        headers=auth,
        timeout=30,
    )
-    assert (
+    assert status in (200, 201) and isinstance(team, dict) and team.get("id"), (
-        status in (200, 201) and isinstance(team, dict) and team.get("id")
+        f"team creation failed: HTTP {status}, body={team!r}"
-    ), f"team creation failed: HTTP {status}, body={team!r}"
+    )
    status, chan = harness_http.http_post(
        f"{base}/channels",
        data={
@ -55,9 +55,9 @@ def test_create_message_roundtrip(live_app):
        headers=auth,
        timeout=30,
    )
-    assert (
+    assert status in (200, 201) and isinstance(chan, dict) and chan.get("id"), (
-        status in (200, 201) and isinstance(chan, dict) and chan.get("id")
+        f"channel creation failed: HTTP {status}, body={chan!r}"
-    ), f"channel creation failed: HTTP {status}, body={chan!r}"
+    )
    # 4) POST a unique marker message.
    marker = f"ccci-marker-{uniq}-roundtrip"
@ -67,13 +67,13 @@ def test_create_message_roundtrip(live_app):
        headers=auth,
        timeout=30,
    )
-    assert (
+    assert status in (200, 201) and isinstance(post, dict) and post.get("id"), (
-        status in (200, 201) and isinstance(post, dict) and post.get("id")
+        f"post creation failed: HTTP {status}, body={post!r}"
-    ), f"post creation failed: HTTP {status}, body={post!r}"
+    )
    # 5) Read it back by id and assert the message survived the round-trip.
    status, got = harness_http.http_get(f"{base}/posts/{post['id']}", headers=auth, timeout=30)
    assert status == 200 and isinstance(got, dict), f"read-back failed: HTTP {status}, body={got!r}"
-    assert (
+    assert got.get("message") == marker, (
-        got.get("message") == marker
+        f"message did not round-trip: sent {marker!r}, got {got.get('message')!r}"
-    ), f"message did not round-trip: sent {marker!r}, got {got.get('message')!r}"
+    )
--- a/tests/mattermost-lts/functional/test_health_check.py
+++ b/tests/mattermost-lts/functional/test_health_check.py
@ -18,7 +18,9 @@ from harness import http as harness_http  # noqa: E402
 def test_root_serves(live_app):
    """GET / → 200 or 302 (mattermost web app shell / login redirect)."""
    url = f"https://{live_app}/"
-    status, _ = harness_http.retry_http_get(url, expect_status=(200, 302), max_wait=60, interval=3)
+    status, _ = harness_http.retry_http_get(
+        url, expect_status=(200, 302), max_wait=60, interval=3
+    )
    assert status in (200, 302), f"GET {url} HTTP {status} (expected 200/302)"
@ -26,8 +28,10 @@ def test_system_ping_ok(live_app):
    """GET /api/v4/system/ping → 200 with JSON {"status":"OK"} — the mattermost server's own
    liveness endpoint (distinguishes a live mattermost API from a Traefik fallback / dead backend)."""
    url = f"https://{live_app}/api/v4/system/ping"
-    status, body = harness_http.retry_http_get(url, expect_status=200, max_wait=120, interval=3)
+    status, body = harness_http.retry_http_get(
+        url, expect_status=200, max_wait=120, interval=3
+    )
    assert status == 200, f"GET {url} HTTP {status} (expected 200)"
-    assert (
+    assert isinstance(body, dict) and body.get("status") == "OK", (
-        isinstance(body, dict) and body.get("status") == "OK"
+        f"/api/v4/system/ping did not report status=OK; got {body!r}"
-    ), f"/api/v4/system/ping did not report status=OK; got {body!r}"
+    )
--- a/tests/mattermost-lts/functional/test_multiuser_message.py
+++ b/tests/mattermost-lts/functional/test_multiuser_message.py
@ -51,12 +51,7 @@ def test_second_user_reads_first_users_message(live_app):
    assert status in (200, 201) and team.get("id"), f"team create HTTP {status}: {team!r}"
    status, chan = harness_http.http_post(
        f"{base}/channels",
-        data={
-            "team_id": team["id"],
-            "name": f"c{uniq}",
-            "display_name": f"chan {uniq}",
-            "type": "O",
-        },
+        data={"team_id": team["id"], "name": f"c{uniq}", "display_name": f"chan {uniq}", "type": "O"},
        headers=auth_a,
        timeout=30,
    )
@ -65,10 +60,7 @@ def test_second_user_reads_first_users_message(live_app):
    # 2) user_a posts a unique marker
    marker = f"ccci-multiuser-{uniq}"
    status, post = harness_http.http_post(
-        f"{base}/posts",
-        data={"channel_id": chan["id"], "message": marker},
-        headers=auth_a,
-        timeout=30,
+        f"{base}/posts", data={"channel_id": chan["id"], "message": marker}, headers=auth_a, timeout=30
    )
    assert status in (200, 201) and post.get("id"), f"post create HTTP {status}: {post!r}"
@ -105,6 +97,6 @@ def test_second_user_reads_first_users_message(live_app):
    # 5) user_b sees user_a's marker (cross-user delivery, not a self read-back)
    messages = [p.get("message") for p in (posts.get("posts") or {}).values()]
-    assert (
+    assert marker in messages, (
-        marker in messages
+        f"user_b did not see user_a's message {marker!r} in the channel; saw {messages!r}"
-    ), f"user_b did not see user_a's message {marker!r} in the channel; saw {messages!r}"
+    )
--- a/tests/mattermost-lts/test_install.py
+++ b/tests/mattermost-lts/test_install.py
@ -15,4 +15,6 @@ def test_serving_and_api(live_app, meta):
    generic.assert_serving(live_app, meta)
    # ... then the recipe-specific assertion: the mattermost REST liveness endpoint answers 200.
    status = lifecycle.http_get(live_app, "/api/v4/system/ping")
-    assert status == 200, f"expected 200 from {live_app}/api/v4/system/ping, got {status}"
+    assert status == 200, (
+        f"expected 200 from {live_app}/api/v4/system/ping, got {status}"
+    )
--- a/tests/mumble/functional/_mumble_proto.py
+++ b/tests/mumble/functional/_mumble_proto.py
@ -12,7 +12,6 @@ cc-ci host (`mode: host`); tests run on-host via cc-ci-run, so they connect to 1
 from __future__ import annotations
 import contextlib
 import socket
 import ssl
 import struct
@ -30,14 +29,8 @@ MSG_USERSTATE = 9
 MSG_SERVERCONFIG = 24
 REJECT_TYPES = {
-    0: "None",
-    1: "WrongVersion",
-    2: "InvalidUsername",
-    3: "WrongUserPW",
-    4: "WrongServerPW",
-    5: "UsernameInUse",
-    6: "ServerFull",
-    7: "NoCertificate",
+    0: "None", 1: "WrongVersion", 2: "InvalidUsername", 3: "WrongUserPW",
+    4: "WrongServerPW", 5: "UsernameInUse", 6: "ServerFull", 7: "NoCertificate",
    8: "AuthenticatorFail",
 }
@ -88,7 +81,7 @@ def _dec_fields(data: bytes) -> dict:
            off += 8
        elif wire == 2:
            length, off = _dec_varint(data, off)
-            raw = data[off : off + length]
+            raw = data[off:off + length]
            off += length
            try:
                value = raw.decode("utf-8")
@ -127,11 +120,9 @@ def _recv(sock, timeout: float) -> tuple[int, bytes]:
 def _build_version() -> bytes:
    v = (1 << 16) | (5 << 8) | 0  # pretend client 1.5.0
-    return (
-        _enc_field_varint(1, v)
-        + _enc_field_string(2, "cc-ci mumble probe 1.0")
-        + _enc_field_string(3, "Linux")
-    )
+    return (_enc_field_varint(1, v)
+            + _enc_field_string(2, "cc-ci mumble probe 1.0")
+            + _enc_field_string(3, "Linux"))
 def _build_authenticate(username: str, password: str = "") -> bytes:
@ -142,29 +133,18 @@ def _build_authenticate(username: str, password: str = "") -> bytes:
    return payload
-def handshake(
-    host: str = "127.0.0.1",
-    port: int = PORT,
-    username: str = "cc-ci-probe",
-    password: str = "",
-    timeout: float = 20.0,
-) -> dict:
+def handshake(host: str = "127.0.0.1", port: int = PORT, username: str = "cc-ci-probe",
+              password: str = "", timeout: float = 20.0) -> dict:
    """Full Mumble control-channel handshake. Returns a result dict:
-    tls_connect (bool), server_version (dict|None), auth_accepted (bool), channels (list[str]),
+      tls_connect (bool), server_version (dict|None), auth_accepted (bool), channels (list[str]),
-    users (list[str]), server_sync (bool), welcome_text (str|None), server_config (dict),
+      users (list[str]), server_sync (bool), welcome_text (str|None), server_config (dict),
-    error (str|None).
+      error (str|None).
    """
    result = {
-        "tls_connect": False,
-        "server_version": None,
-        "auth_accepted": False,
-        "channels": [],
-        "users": [],
-        "server_sync": False,
-        "welcome_text": None,
-        "server_config": {},
-        "error": None,
+        "tls_connect": False, "server_version": None, "auth_accepted": False,
+        "channels": [], "users": [], "server_sync": False, "welcome_text": None,
+        "server_config": {}, "error": None,
    }
    raw = tls = None
    try:
@ -201,21 +181,19 @@ def handshake(
                break
            try:
                msg_type, payload = _recv(tls, timeout=remaining)
-            except (TimeoutError, ConnectionError):
+            except (socket.timeout, ConnectionError):
                break
            if msg_type == MSG_VERSION:
                f = _dec_fields(payload)
                v1 = f.get(1, 0)
                result["server_version"] = {
                    "string": f"{(v1 >> 16) & 0xFF}.{(v1 >> 8) & 0xFF}.{v1 & 0xFF}",
-                    "release": f.get(2, ""),
-                    "os": f.get(3, ""),
+                    "release": f.get(2, ""), "os": f.get(3, ""),
                }
            elif msg_type == MSG_REJECT:
                f = _dec_fields(payload)
-                result["error"] = (
-                    f"Rejected: {REJECT_TYPES.get(f.get(1, 0), 'Unknown')} " f"— {f.get(2, '')}"
-                )
+                result["error"] = (f"Rejected: {REJECT_TYPES.get(f.get(1, 0), 'Unknown')} "
+                                   f"— {f.get(2, '')}")
                return result
            elif msg_type == MSG_CHANNELSTATE:
                f = _dec_fields(payload)
@ -231,12 +209,9 @@ def handshake(
                # ServerConfig fields: 1 max_bandwidth, 2 welcome_text, 3 allow_html,
                # 4 message_length, 5 image_message_length, 6 max_users
                result["server_config"] = {
-                    "max_bandwidth": f.get(1),
-                    "welcome_text": f.get(2),
-                    "allow_html": f.get(3),
-                    "message_length": f.get(4),
-                    "image_message_length": f.get(5),
-                    "max_users": f.get(6),
+                    "max_bandwidth": f.get(1), "welcome_text": f.get(2),
+                    "allow_html": f.get(3), "message_length": f.get(4),
+                    "image_message_length": f.get(5), "max_users": f.get(6),
                }
            elif msg_type == MSG_SERVERSYNC:
                f = _dec_fields(payload)
@ -255,8 +230,10 @@ def handshake(
        result["error"] = f"{type(e).__name__}: {e}"
    finally:
        if tls is not None:
-            with contextlib.suppress(OSError):
+            try:
                tls.shutdown(socket.SHUT_RDWR)
+            except OSError:
+                pass
            tls.close()
        elif raw is not None:
            raw.close()
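The version handling in `_mumble_proto.py` packs major, minor and patch into one integer in `_build_version` and unpacks it again when a `MSG_VERSION` reply arrives. A small sketch of that round trip, using the same shifts and masks as the diff; `pack_version` and `format_version` are hypothetical names introduced here for illustration and do not exist in the module.

```python
def pack_version(major: int, minor: int, patch: int) -> int:
    """Pack a version the way _build_version does: major<<16 | minor<<8 | patch."""
    return (major << 16) | (minor << 8) | patch


def format_version(v1: int) -> str:
    """Unpack with the same masks the handshake uses for the server's reply."""
    return f"{(v1 >> 16) & 0xFF}.{(v1 >> 8) & 0xFF}.{v1 & 0xFF}"


# The probe announces itself as client 1.5.0.
v = pack_version(1, 5, 0)
print(v, format_version(v))
```

Each component is masked to 8 bits on decode, so this sketch only round-trips values below 256.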
--- a/tests/mumble/functional/test_protocol_handshake.py
+++ b/tests/mumble/functional/test_protocol_handshake.py
@ -25,7 +25,7 @@ def test_handshake_completes_with_channel_presence(live_app):
    assert r["server_version"] is not None, "server did not send a Version message"
    assert r["auth_accepted"], f"authentication not accepted — {r.get('error')}"
    # Channel presence: the server must expose at least the root channel (beyond a bare TCP open).
-    assert (
+    assert len(r["channels"]) >= 1, (
-        len(r["channels"]) >= 1
+        f"server reported no channels (expected >=1 root channel) — {r!r}"
-    ), f"server reported no channels (expected >=1 root channel) — {r!r}"
+    )
    assert r["server_sync"], f"ServerSync handshake did not complete — {r.get('error')}"
--- a/tests/mumble/functional/test_server_config_limits.py
+++ b/tests/mumble/functional/test_server_config_limits.py
@ -32,7 +32,6 @@ def test_configured_max_users_surfaces_in_serverconfig(live_app):
    )
    # allow_html defaults true in the recipe; assert it is present/boolean to prove the field set
    # is the real ServerConfig (not an empty/garbled decode).
-    assert cfg.get("allow_html") in (
-        0,
-        1,
-    ), f"ServerConfig.allow_html unexpected: {cfg.get('allow_html')!r}"
+    assert cfg.get("allow_html") in (0, 1), (
+        f"ServerConfig.allow_html unexpected: {cfg.get('allow_html')!r}"
+    )
--- a/tests/mumble/recipe_meta.py
+++ b/tests/mumble/recipe_meta.py
@ -25,10 +25,10 @@
 #   WELCOME_TEXT -> MUMBLE_CONFIG_WELCOMETEXT, surfaced in the ServerSync welcome_text.
 #   USERS        -> MUMBLE_CONFIG_USERS (max users), surfaced in the ServerConfig.max_users.
-HEALTH_PATH = "/"  # mumble-web client UI (present on both 0.2.0 base and 1.0.0 latest)
+HEALTH_PATH = "/"          # mumble-web client UI (present on both 0.2.0 base and 1.0.0 latest)
 HEALTH_OK = (200,)
-DEPLOY_TIMEOUT = 900  # two images to pull (mumble-server + mumble-web) on a cold node
+DEPLOY_TIMEOUT = 900       # two images to pull (mumble-server + mumble-web) on a cold node
 HTTP_TIMEOUT = 300
 # A unique, stable welcome-text marker the round-trip test asserts surfaces over the protocol.
--- a/tests/mumble/test_backup.py
+++ b/tests/mumble/test_backup.py
@ -23,6 +23,6 @@ def _sqlite(domain, sql):
 def test_backup_captures_state(live_app):
-    assert (
+    assert _sqlite(live_app, "SELECT v FROM ci_marker;") == "original", (
-        _sqlite(live_app, "SELECT v FROM ci_marker;") == "original"
+        "the seeded mumble sqlite marker was not present at backup time"
-    ), "the seeded mumble sqlite marker was not present at backup time"
+    )
--- a/tests/mumble/test_restore.py
+++ b/tests/mumble/test_restore.py
@ -25,6 +25,6 @@ def _sqlite(domain, sql):
 def test_restore_returns_state(live_app):
-    assert (
+    assert _sqlite(live_app, "SELECT v FROM ci_marker;") == "original", (
-        _sqlite(live_app, "SELECT v FROM ci_marker;") == "original"
+        "restore did not return the pre-mutation mumble sqlite marker (data-integrity failure)"
-    ), "restore did not return the pre-mutation mumble sqlite marker (data-integrity failure)"
+    )
Author	SHA1	Message	Date
autonomic-bot	46e2cdb93e	refactor(level): four essential rungs only — integration & recipe-local are optional Per operator: the level ladder is now the FOUR essential rungs every recipe is held to — install, upgrade (essential), backup/restore, functional (top = L4). Integration (SSO/OIDC) and recipe-local are OPTIONAL capabilities: they no longer appear as level rungs or skip rows and never cap the level. SSO is still enforced for the run VERDICT (unchanged in run_recipe_ci.py); it just doesn't affect the level. derive_rungs simplified accordingly (drops declared/deps/sso/repo-local inputs). custom-html-tiny's EXPECTED_NA is back to just backup_restore. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:55:47 +00:00
autonomic-bot	3980340727	test(card): cover _skip_rows (intentional green / unintentional amber) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:42:57 +00:00
autonomic-bot	d20ad1e989	feat(card): show skipped rungs as rows — INTENTIONAL SKIP (green) with reason below Per operator: intentional skips now render like a pass row but labelled 'INTENTIONAL SKIP' (muted green) with the declared reason on the line beneath; unintentional skips render amber 'UNINTENTIONAL SKIP' with a prompt to add a test or declare them. The cap line is back to just the level-cap reason (the per-rung reason now lives in the rows). Labelled, so it never reads as a PASS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:42:05 +00:00
autonomic-bot	b3ab68a9dd	refactor: simplify to a list of intentionally-skipped rungs Per operator: drop the gap-sensitivity / cap-intent-clause / stale-detection machinery. Model is now dead simple — recipe_meta.EXPECTED_NA = {rung: reason} lists the rungs a recipe intentionally skips; ANY rung skipped (N/A) and not in that list is unintentional. results.json: replace the 'na' block + level_cap_intent with skips: { intentional: {rung: reason}, unintentional: [rung] } plus level_cap_rung (which rung capped). Badge/card derive intentional-vs-unintentional from whether the capping rung is in the intentional list. Skips still cap the level (never inflate). custom-html-tiny lists all three rungs it intentionally skips (backup_restore, integration, recipe_local). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:36:53 +00:00
autonomic-bot	d733e2c4ca	feat(card): badge differentiates expected vs unexpected skip The level badge gains a third segment derived from level_cap_intent: - amber 'gap?' when the climb was capped by an UNDECLARED gap-sensitive N/A (backup_restore / functional) — a likely-missing test (unexpected skip) - muted 'expected' when capped by a DECLARED intentional N/A (reviewed, nothing to fix) - nothing extra for a clean cap, a full climb, or a real failure. Font-safe text labels (no emoji) so the SVG renders headless/anywhere. Badge never inflates — it only annotates the cap the level already reflects. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:26:44 +00:00
autonomic-bot	f3a1ad5388	test: representative expected_na scenario (functional covered, backup declared-N/A) Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 02:00:16 +00:00
autonomic-bot	3b0a3d14ea	feat(harness): declare intentional N/A tiers + custom-html-tiny functional test Two changes the operator asked for after noticing custom-html-tiny PR #6 has no backup/restore or functional coverage: 1) Intentional-vs-accidental N/A. A recipe can now declare recipe_meta.EXPECTED_NA = {rung: reason} to mark a tier as deliberately not applicable (e.g. a stateless static server has no backup surface). N/A still caps the level — the harness never claims a rung it did not verify — but the run is now annotated 'intentional · <reason>' instead of being indistinguishable from a forgotten test. An undeclared N/A on a gap-sensitive rung (backup_restore, functional) is surfaced as a 'possible coverage gap', and a stale EXPECTED_NA (declared N/A but actually exercised) is surfaced too. All non-blocking (R7): results.json gains level_cap_intent + an 'na' block, the summary card shows the clause, and the CI log prints the gap/stale warnings. (results.classify_na/cap_intent are pure + unit-tested; level.py untouched.) custom-html-tiny declares backup_restore intentionally N/A. 2) custom-html-tiny functional test: writes a random file into the served content volume (via the volume mountpoint, like install_steps.sh, since the SWS image is shell-less), asserts exact-byte round-trip + a real 404 on a missing path — proving the static-web-server actually serves the volume, not a 200-everything fallback. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-09 01:59:28 +00:00
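Commit b3ab68a9dd describes the simplified skip model: `recipe_meta.EXPECTED_NA = {rung: reason}` lists the rungs a recipe intentionally skips, and any skipped rung not in that list is unintentional. The sketch below shows one way that split could be computed into the `skips` shape the commit message gives for results.json. The function name `classify_skips` and its inputs are assumptions made for illustration; the harness's actual implementation is not shown in this diff.

```python
def classify_skips(skipped_rungs, expected_na):
    """Hypothetical sketch: split skipped (N/A) rungs by whether they were declared.

    skipped_rungs: iterable of rung names that were skipped in this run.
    expected_na:   dict {rung: reason}, as in recipe_meta.EXPECTED_NA.
    """
    intentional = {r: expected_na[r] for r in skipped_rungs if r in expected_na}
    unintentional = [r for r in skipped_rungs if r not in expected_na]
    return {"intentional": intentional, "unintentional": unintentional}


# Example: a stateless recipe declares backup_restore but not functional.
skips = classify_skips(
    ["backup_restore", "functional"],
    {"backup_restore": "stateless static server, no backup surface"},
)
print(skips)
```

Under this model a declared rung that was actually exercised simply does not appear in either bucket, which matches the commit's note that stale detection was dropped.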