cc-ci/machine-docs/BACKLOG-1e.md
autonomic-bot 49dc00a504 status(1e): E2/HC1 CLAIMED — chaos-version==head_ref proven on hedgedoc
upgrade→PR-head: head_ref=09bf4d54 chaos-version=09bf4d54 version=3.0.9+1.10.7→3.0.10+1.10.8
  deploy-count = 1; install/upgrade=pass; clean teardown.

E1/HC3 + E0/HC2 both Adversary PASS. Awaiting Adversary cold-verify HC1 + HC4 for ## DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 04:05:42 +01:00


BACKLOG — Phase 1e (generic-harness corrections)

Phase-namespaced backlog. Builder edits ## Build backlog; Adversary edits ## Adversary findings.

Build backlog

  • E0 / HC2 — repo-local approval allowlist (tests/repo-local-approved.txt, default-deny); gate discovery.resolve_op/custom_tests/install_steps behind repo_local_approved(recipe); update unit tests (tests/unit/test_discovery.py) for approved vs non-approved.
  • E1 / HC3 — generic-by-default (additive); op/assertion split. Orchestrator performs each mutating op once; runs generic test_<op>.py (unless opt-out) + overlay test_<op>.py. Opt-out: CCCI_SKIP_GENERIC / CCCI_SKIP_GENERIC_<OP> / recipe_meta.SKIP_GENERIC. Pre-op seed via optional tests/<recipe>/ops.py. Migrate generic + overlays to assertion-only. Keep count==1.
  • E2 / HC1 — upgrade to PR head via abra app deploy --chaos: deploy prev, re-checkout PR head, chaos redeploy in place; adapt moved-assertion (chaos label proof); reconcile deploy-count.
  • E3 / HC4 — docs (docs/testing.md, enroll-recipe.md) + DECISIONS; claim gates; await Adversary cold-verify of HC1–HC4; flip STATUS-1e → ## DONE on full PASS.
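The E0 gate and the E1 opt-out rules above can be sketched as follows. This is a minimal illustration, not the harness code: the allowlist file format (one recipe name per line, `#` comments), the set of truthy environment values, and the shape of `recipe_meta.SKIP_GENERIC` (taken here as a collection of op names passed in as `recipe_skip`) are all assumptions.

```python
import os
from pathlib import Path


def load_approved(allowlist_path):
    """Return the set of approved recipe names.

    Default-deny: a missing or empty allowlist approves nothing.
    Assumed format: one recipe name per line, '#' starts a comment.
    """
    path = Path(allowlist_path)
    if not path.is_file():
        return set()
    approved = set()
    for line in path.read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            approved.add(name)
    return approved


def repo_local_approved(recipe, allowlist_path="tests/repo-local-approved.txt"):
    """True only if the recipe is explicitly listed in the allowlist."""
    return recipe in load_approved(allowlist_path)


def skip_generic(op, recipe_skip=(), env=None):
    """Decide whether the generic assertions for `op` are skipped.

    Precedence (assumed): global env flag, then per-op env flag,
    then the recipe's own SKIP_GENERIC collection.
    """
    env = os.environ if env is None else env

    def truthy(key):
        return env.get(key, "").strip().lower() in {"1", "true", "yes"}

    if truthy("CCCI_SKIP_GENERIC"):
        return True
    if truthy(f"CCCI_SKIP_GENERIC_{op.upper()}"):
        return True
    return op in recipe_skip
```

Under this sketch, discovery hooks such as `resolve_op` would consult `repo_local_approved(recipe)` before loading any repo-local code, and the orchestrator would call `skip_generic(op, ...)` only to decide which assertions run; the mutating op itself still happens exactly once either way.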

Adversary findings

  • F1e-1 [adversary] (CLOSED @2026-05-28, fix-verified cold on commit 6eabfdc): lifecycle.exec_in_app silently swallows a failed docker exec (returns empty stdout, returncode ignored) → backup/restore data-continuity overlays go RED on a healthy recipe when the post-op container cycle is slow.
    Found cold-verifying E1/HC3 (commit b7e6cbd) on custom-html: one opt-out run had backup=FAIL with AssertionError: '' == 'original' from tests/custom-html/test_backup.py::test_backup_captures_state; the marker cat returned empty.
    CORRECTION (2026-05-28): isolated, no-concurrency repro (3× opt-out + 1× default, install,backup,restore): 4/4 PASS, deploy-count=1 each. So the opt-out flag is NOT the trigger (my earlier "removes the ~1s generic-pytest timing buffer" theory is withdrawn); the original symptom coincided with parallel Builder e2e runs loading the node. Real trigger: load / concurrency slowing the post-backup container cycle into a window where exec_in_app's docker exec fails. The static defect is the same regardless of trigger.
    Root cause (static): exec_in_app runs docker exec <cid> … and returns proc.stdout without checking returncode; when backup-bot cycles the app container post-op, docker exec can fail → empty stdout silently passed back as data. The backup/restore overlays read via exec_in_app immediately after the cycling op with no readiness retry, despite docstrings claiming immunity. (Secondary risk: a failed exec masquerading as "" could also make a real failure spuriously pass in a different assertion.)
    Repro (orig symptom): under any concurrent same-recipe load, an opt-out STAGES=install,backup,restore custom-html run can show test_backup_captures_state empty-string AssertionError.
    Status: Builder pushed fix at commit 6eabfdc: exec_in_app now polls (re-resolve container + re-exec) until rc==0 or 90s, then raises (never masks failed exec as empty). No assertion weakened. Adversary fix-verification in flight on /tmp/adv-fix.
    Closes when: cold-verified PASS under opt-out (and a reasonable concurrency probe), per Adversary close-rule.
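The polling behaviour described in the F1e-1 status (re-resolve the container, re-exec, succeed on rc==0, raise after a deadline) could look roughly like the sketch below. It is not the code from commit 6eabfdc: the function name `exec_with_retry`, the `ExecFailed` exception, the `resolve_cid` callback, and the 2-second poll interval are hypothetical, and the `run`/`sleep`/`clock` parameters exist only so the loop can be exercised without Docker.

```python
import subprocess
import time


class ExecFailed(RuntimeError):
    """Raised when docker exec never succeeds before the deadline."""


def exec_with_retry(resolve_cid, cmd, timeout=90.0, interval=2.0,
                    run=subprocess.run, sleep=time.sleep,
                    clock=time.monotonic):
    """Run `cmd` inside the app container, retrying while it cycles.

    resolve_cid: callable returning the current container ID (or None);
    it is called on every attempt because the ID changes after a cycle.
    Returns stdout only when the exec exits 0; a failed exec is never
    reported as empty output.
    """
    deadline = clock() + timeout
    last = "no attempt made"
    while True:
        cid = resolve_cid()
        if cid:
            proc = run(["docker", "exec", cid, *cmd],
                       capture_output=True, text=True)
            if proc.returncode == 0:
                return proc.stdout
            last = f"rc={proc.returncode} stderr={proc.stderr.strip()!r}"
        else:
            last = "no running container found"
        if clock() >= deadline:
            raise ExecFailed(
                f"docker exec {cmd!r} did not succeed within {timeout}s: {last}"
            )
        sleep(interval)
```

The point of raising instead of returning `""` is that an overlay assertion such as `'' == 'original'` then fails with the real cause (the exec error) rather than looking like lost data.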

  • F1e-2 [adversary]: Two concurrent same-recipe runs collide on ~/.abra/recipes/<recipe> (rm-rf + abra-fetch race).
    Found during a controlled 2-concurrent custom-html test (PR=8001, PR=8002): run-a died at subprocess.CalledProcessError: 'abra recipe fetch custom-html -n' rc=1; run-b completed all-green.
    Cause: runner/run_recipe_ci.py::fetch_recipe does rm -rf ~/.abra/recipes/<recipe> then abra recipe fetch <recipe> -n; concurrent execution on the same recipe races on the same directory. Domain/volume/secret isolation holds (different PRs ⇒ different domains), but the shared recipe checkout is a serialisation point.
    Why it matters: §6/D-gate requires "two concurrent !testme runs don't collide." Drone caps MAX_TESTS=1-2 today so practical impact is bounded, but as breadth scales (D10) this surfaces. Pre-existing in 1d; orthogonal to E1/HC3; not blocking E1.
    Fix direction: per-run recipe snapshot dir (~/.abra/recipes/<recipe> may need to be run-scoped), or a flock around fetch+checkout, or move PR-head clones out of the shared abra dir.
    Status: Filed for HC4 / no-regression scope.
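Of the fix directions listed for F1e-2, the flock option is the smallest change and can be sketched as below. This is an assumption-laden illustration: `recipe_lock`, `fetch_recipe_serialised`, the `lock_dir` location, and the `fetch` callback (standing in for the existing rm -rf + abra recipe fetch step) are hypothetical names, and `fcntl.flock` is Unix-only.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def recipe_lock(recipe, lock_dir):
    """Hold an exclusive advisory lock for one recipe name.

    A second run on the same recipe blocks here until the first run
    leaves the `with` block; runs on different recipes use different
    lock files and do not wait on each other.
    """
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, f"{recipe}.lock")
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield path
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def fetch_recipe_serialised(recipe, fetch, lock_dir):
    """Run `fetch(recipe)` while holding that recipe's lock."""
    with recipe_lock(recipe, lock_dir):
        return fetch(recipe)
```

A lock held only around the fetch would remove the rm-rf race but not later interference, since both runs would still share one checkout; that is presumably why the finding also lists a run-scoped snapshot directory, which avoids the shared state instead of serialising access to it.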