fix(2): F2-5 — dep teardown verify=True, errors propagate to run-fail (Adversary cold)

Per REVIEW-2 ## Q2 FAIL: runner/harness/deps.py::teardown_deps suppressed ALL exceptions via contextlib.suppress(Exception), silently swallowing teardown failures. The 'DEPS teardown' print fired even when undeploy actually raised, leaving leftover swarm services/volumes/secrets that broke the NEXT run targeting the same deterministic dep domain (this is what caused the Q3.1 dep flake I saw immediately after the Q2.4 acceptance run).

Fix:
- runner/harness/deps.py: teardown_deps now uses lifecycle.teardown_app(..., verify=True) so residuals raise TeardownError. Errors are LOGGED LOUDLY per-dep but we continue to other deps so one failure doesn't strand the rest. After all attempts: raise a combined TeardownError if any dep failed.
- runner/run_recipe_ci.py: orchestrator catches the dep TeardownError in finally, prints it, captures into dep_teardown_error; the run summary surfaces it and the exit code is non-zero. The run STILL prints the diagnosable summary so a leak doesn't hide other failures.

Per §9 teardown sacred / DG7: a green run that leaks state is not 'green'. F2-5 now correctly fails the run instead of silently passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
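
The orchestrator behaviour described in the message (catch the dep TeardownError in `finally`, record it in dep_teardown_error, still print the summary, exit non-zero) might look roughly like the sketch below. This is an illustration only, not the code in runner/run_recipe_ci.py: `TeardownError` and `teardown_deps` here are local stand-ins, and `run`, the `leaks` key, and the summary format are hypothetical names invented for the example.

```python
class TeardownError(RuntimeError):
    """Local stand-in for lifecycle.TeardownError (assumed to subclass Exception)."""


def teardown_deps(state):
    # Hypothetical stub: a dep entry with "leaks": True simulates a residual
    # that verify=True would detect. All failures are collected, then raised once.
    errors = [f"dep @ {e['domain']} teardown failed" for e in reversed(state) if e.get("leaks")]
    if errors:
        raise TeardownError("dep teardown failures: " + " ; ".join(errors))


def run(recipe, dep_state):
    """Return an exit code: 0 only if the recipe passed AND teardown was clean."""
    recipe_error = None
    dep_teardown_error = None
    try:
        recipe()
    except Exception as e:  # keep going so teardown and the summary still happen
        recipe_error = str(e)
    finally:
        try:
            teardown_deps(dep_state)
        except TeardownError as e:
            print(f"!! {e}", flush=True)
            dep_teardown_error = str(e)

    # The summary is printed unconditionally so a leak does not hide other failures.
    print(f"summary: recipe_error={recipe_error!r} dep_teardown_error={dep_teardown_error!r}")
    return 0 if recipe_error is None and dep_teardown_error is None else 1
```

With this shape, a recipe that passes but leaks a dep yields exit code 1, which is the "a green run that leaks state is not 'green'" rule expressed in code.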
2026-05-28 09:00:37 +01:00
parent 9a857d9ef4
commit c6e94af766
2 changed files with 35 additions and 6 deletions
--- a/runner/harness/deps.py
+++ b/runner/harness/deps.py
@@ -112,15 +112,31 @@ def deploy_deps(


 def teardown_deps(state: list[dict]) -> None:
-    """Undeploy each dep in reverse order. Suppresses exceptions per-dep so one teardown failure
-    doesn't strand the others. Mirrors the orchestrator's teardown_app(verify=False) pattern."""
+    """Undeploy each dep in reverse order. **VERIFY=True (F2-5 fix)**: per plan §9 teardown is
+    sacred — a dep that leaks containers/volumes/secrets corrupts the next run that uses the same
+    deterministic dep domain.
+
+    Failures are LOGGED LOUDLY (not silently suppressed) so a leak is visible in the run output;
+    we continue to tear down other deps so one failure doesn't strand the rest; after all attempts
+    we **raise** if any dep failed to tear down fully. The orchestrator's outer `finally` then
+    decides whether the leak is a run-failure (it should be, mirroring lifecycle.teardown_app's
+    own raise-on-residual behaviour at `verify=True`).
+    """
+    errors: list[str] = []
    for entry in reversed(state):
        domain = entry.get("domain")
        if not domain:
            continue
-        with contextlib.suppress(Exception):
-            print(f"  dep: tearing down {entry.get('recipe')} @ {domain}", flush=True)
-            lifecycle.teardown_app(domain, verify=False)
+        recipe = entry.get("recipe", "?")
+        print(f"  dep: tearing down {recipe} @ {domain}", flush=True)
+        try:
+            lifecycle.teardown_app(domain, verify=True)
+        except Exception as e:  # noqa: BLE001 — every failure must be visible, but we want to try the rest first
+            msg = f"dep {recipe} @ {domain} teardown failed: {e}"
+            print(f"  !! {msg}", flush=True)
+            errors.append(msg)
+    if errors:
+        raise lifecycle.TeardownError("dep teardown failures: " + " ; ".join(errors))


 def load_run_state() -> list[dict]: