HC1 ✓ HC2 ✓ HC3 ✓ all Adversary cold-verified. F1e-2 (pre-existing 1d concurrent fetch race) not a 1e regression; tracked separately. Awaiting Adversary HC4 verdict → ## DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# BACKLOG: Phase 1e (generic-harness corrections)
Phase-namespaced backlog. Builder edits ## Build backlog; Adversary edits ## Adversary findings.
## Build backlog
- E0 / HC2: repo-local approval allowlist (`tests/repo-local-approved.txt`, default-deny); gate `discovery.resolve_op`/`custom_tests`/`install_steps` behind `repo_local_approved(recipe)`; update unit tests (`tests/unit/test_discovery.py`) for approved vs non-approved.
- E1 / HC3: generic-by-default (additive); op/assertion split. Orchestrator performs each mutating op once; runs generic test_.py (unless opt-out) + overlay test_.py. Opt-out: `CCCI_SKIP_GENERIC`/`CCCI_SKIP_GENERIC_<OP>`/`recipe_meta.SKIP_GENERIC`. Pre-op seed via optional `tests/<recipe>/ops.py`. Migrate generic + overlays to assertion-only. Keep count==1.
- E2 / HC1: upgrade to PR head via `abra app deploy --chaos`: deploy prev, re-checkout PR head, chaos redeploy in place; adapt moved-assertion (chaos label proof); reconcile deploy-count.
- E3 / HC4: docs (`docs/testing.md`, `enroll-recipe.md`) + DECISIONS; claim gates; await Adversary cold-verify of HC1–HC4; flip STATUS-1e → `## DONE` on full PASS.
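As an illustration of E0 and E1, the default-deny gate and the generic opt-out check could look roughly like the sketch below. It is not the repo's implementation. Only `repo_local_approved`, the allowlist path and the `CCCI_SKIP_GENERIC*` names come from the backlog. The helper names `load_allowlist` and `skip_generic`, the file format (one recipe name per line, `#` comments) and the rule that an env var counts as set when it is non-empty and not `"0"` are assumptions made for the example.

```python
import os
from pathlib import Path


def load_allowlist(path):
    """Return the set of approved recipe names; a missing file approves nothing."""
    try:
        lines = Path(path).read_text(encoding="utf-8").splitlines()
    except FileNotFoundError:
        return set()  # default-deny: no file means no approvals
    # Assumed format: one recipe name per line, '#' starts a comment.
    names = (line.split("#", 1)[0].strip() for line in lines)
    return {name for name in names if name}


def repo_local_approved(recipe, allowlist_path="tests/repo-local-approved.txt"):
    """True only if the recipe is explicitly listed in the allowlist."""
    return recipe in load_allowlist(allowlist_path)


def skip_generic(op, env=None, recipe_skip=False):
    """Decide whether the generic test for `op` is skipped.

    Assumed semantics: any one of the three opt-outs is sufficient.
    `recipe_skip` stands in for recipe_meta.SKIP_GENERIC.
    """
    env = os.environ if env is None else env

    def is_set(name):
        # Assumption: empty and "0" both mean "not set".
        return env.get(name, "") not in ("", "0")

    return (
        recipe_skip
        or is_set("CCCI_SKIP_GENERIC")
        or is_set(f"CCCI_SKIP_GENERIC_{op.upper()}")
    )
```

A caller would consult `repo_local_approved(recipe)` before loading any repo-local ops, custom tests or install steps, and `skip_generic(op)` before running the generic test for that op. The overlay test runs either way.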
## Adversary findings
- F1e-1 [adversary] (CLOSED @2026-05-28, fix-verified cold on commit `6eabfdc`): `lifecycle.exec_in_app` silently swallows a failed `docker exec` (returns empty stdout, returncode ignored) → backup/restore data-continuity overlays go RED on a healthy recipe when the post-op container cycle is slow.
  - Found: while cold-verifying E1/HC3 (commit `b7e6cbd`) on custom-html. One opt-out run had backup=FAIL with `AssertionError: '' == 'original'` from `tests/custom-html/test_backup.py::test_backup_captures_state`; the marker `cat` returned empty.
  - CORRECTION (2026-05-28): isolated, no-concurrency repro (3× opt-out + 1× default, `install,backup,restore`): 4/4 PASS, deploy-count=1 each. So the opt-out flag is NOT the trigger (my earlier "removes the ~1s generic-pytest timing buffer" theory is withdrawn); the original symptom coincided with parallel Builder e2e runs loading the node. Real trigger: load / concurrency slowing the post-backup container cycle into a window where `exec_in_app`'s `docker exec` fails. The static defect is the same regardless of trigger.
  - Root cause (static): `exec_in_app` runs `docker exec <cid> …` and returns `proc.stdout` without checking `returncode`; when backup-bot cycles the app container post-op, `docker exec` can fail → empty stdout silently passed back as data. The backup/restore overlays read via `exec_in_app` immediately after the cycling op with no readiness retry, despite docstrings claiming immunity. (Secondary risk: a failed exec masquerading as `""` could also make a real failure spuriously pass in a different assertion.)
  - Repro (orig symptom): under any concurrent same-recipe load, an opt-out `STAGES=install,backup,restore` custom-html run can show the `test_backup_captures_state` empty-string AssertionError.
  - Status: Builder pushed fix at commit `6eabfdc`: `exec_in_app` now polls (re-resolve container + re-exec) until `rc==0` or 90s, then raises (never masks failed exec as empty). No assertion weakened. Adversary fix-verification in flight on `/tmp/adv-fix`.
  - Closes when: cold-verified PASS under opt-out (and a reasonable concurrency probe), per Adversary close-rule.
- F1e-2 [adversary]: Two concurrent same-recipe runs collide on `~/.abra/recipes/<recipe>` (rm-rf + abra-fetch race).
  - Found: during a controlled 2-concurrent custom-html test (PR=8001, PR=8002). run-a died at `subprocess.CalledProcessError: 'abra recipe fetch custom-html -n' rc=1`; run-b completed all-green.
  - Cause: `runner/run_recipe_ci.py::fetch_recipe` does `rm -rf ~/.abra/recipes/<recipe>` then `abra recipe fetch <recipe> -n`, so concurrent execution on the same recipe races on the same directory. Domain/volume/secret isolation hold (different PRs ⇒ different domains), but the shared recipe checkout is a serialisation point.
  - Why it matters: §6/D-gate requires "two concurrent !testme runs don't collide." Drone caps `MAX_TESTS=1-2` today so practical impact is bounded, but as breadth scales (D10) this surfaces. Pre-existing in 1d; orthogonal to E1/HC3; not blocking E1.
  - Fix direction: per-run recipe snapshot dir (`~/.abra/recipes/<recipe>` may need to be run-scoped, or a flock around fetch+checkout, or move PR-head clones out of the shared abra dir).
  - Status: Filed for HC4 / no-regression scope.
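For F1e-1, the polling behaviour described in the Status note (re-resolve the container, re-exec, stop at `rc==0` or 90s, then raise) can be sketched as follows. This is a minimal illustration, not the code at `6eabfdc`. The names `exec_with_retry` and `docker_exec`, the `resolve_cid` callable and the injectable `run`/`clock`/`sleep` parameters are hypothetical and exist only so the loop can be exercised without Docker.

```python
import subprocess
import time


def docker_exec(cid, cmd):
    """Run `docker exec <cid> <cmd...>` and return (returncode, stdout)."""
    proc = subprocess.run(
        ["docker", "exec", cid, *cmd], capture_output=True, text=True
    )
    return proc.returncode, proc.stdout


def exec_with_retry(resolve_cid, cmd, run=docker_exec, timeout=90.0,
                    interval=1.0, clock=time.monotonic, sleep=time.sleep):
    """Re-resolve the container and re-exec until rc == 0 or the deadline passes.

    Raises instead of returning "" so a failed exec is never mistaken for data.
    """
    deadline = clock() + timeout
    last_rc = None
    while True:
        cid = resolve_cid()  # the container may have been cycled since last try
        if cid:
            last_rc, stdout = run(cid, cmd)
            if last_rc == 0:
                return stdout
        if clock() >= deadline:
            raise RuntimeError(
                f"exec {cmd!r} did not succeed within {timeout}s (last rc={last_rc})"
            )
        sleep(interval)
```

The property that matters for the overlays is that the return value is only ever the stdout of a successful exec. An assertion such as `'' == 'original'` can then fail only because the marker really is empty.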
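For F1e-2, the "flock around fetch+checkout" fix direction could be sketched like this. It is one possible shape under stated assumptions, not a decided design. `recipe_lock` and `fetch_recipe_serialised` are hypothetical names, and the `fetch` callable stands in for the existing `rm -rf` + `abra recipe fetch` step. The lock file is assumed to live outside `~/.abra/recipes/<recipe>`, since that directory is deleted on every fetch. `fcntl.flock` is POSIX-only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def recipe_lock(recipe, lock_dir):
    """Hold an exclusive advisory lock for `recipe` while the body runs."""
    os.makedirs(lock_dir, exist_ok=True)
    lock_path = os.path.join(lock_dir, f"{recipe}.lock")
    with open(lock_path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until other holders release
        try:
            yield lock_path
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def fetch_recipe_serialised(recipe, fetch, lock_dir):
    """Run `fetch(recipe)` under the per-recipe lock and return its result."""
    with recipe_lock(recipe, lock_dir):
        return fetch(recipe)
```

A lock serialises only the fetch+checkout window. If a run keeps reading the shared checkout after the lock is released, a later run can still replace it underneath, which is the case the run-scoped snapshot directory addresses.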