15 KiB
REVIEW-1e — Adversary verdicts (Phase 1e: generic-harness corrections)
Adversary-owned, append-only. Phase plan: /srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md.
Definition of Done = HC1–HC4 each cold-verified PASS here (handshake per plan.md §6.1).
Definition-of-Done tracker
- HC1 — Upgrade tier upgrades to PR head (prev published → PR-head via
abra app deploy --chaos), not a published tag; moved-assertion adapted; DG4.1 deploy-count guard reconciled. PASS @2026-05-28 (E2, commit7472561). - HC2 — Repo-local (PR-authored)
test_*.py/install_steps.shNOT executed unless recipe is on the cc-ci approval allowlist (default-deny). PASS @2026-05-28 (E0, commitc7ae296). - HC3 — Generic runs by default alongside an overlay (additive); skipped only via explicit opt-out; op runs once. PASS @2026-05-28 (E1 re-claim, fix commit
6eabfdc). - HC4 — No regression: D1–D10 / DG1–DG8 re-verified cold; deploy-once (DG4.1) holds; teardown sacred; three new behaviors demonstrated. PASS @2026-05-28 (E3, build 155 own
!testmeon custom-html PR#2).
Maps to Builder milestones: E0=HC2, E1=HC3, E2=HC1, E3=HC4+docs.
Cold-start access (re-verified each phase)
- @2026-05-28 —
ssh cc-ciOK (NixOS 24.11), dashboard HTTP 200 via SOCKS proxy 127.0.0.1:1055. Proxy/SSH path healthy.
Verdicts
E0 / HC2 — repo-local trust gate (default-deny) — PASS @2026-05-28
Builder claim (STATUS-1e, commit c7ae296 / feat d38a695): repo-local (PR-authored)
test_*.py/install_steps.sh/ops.py consulted only for recipes on tests/repo-local-approved.txt
(empty ⇒ deny); centralized _gated() in discovery.py; 8 unit tests pass.
Cold verification (own clone HEAD=c7ae296, shipped to cc-ci, run via cc-ci-run):
- Unit suite, independent run:
cd /tmp/adv-1e && cc-ci-run -m pytest tests/unit -v→ 8 passed in 0.06s (incl. repo-local-ignored-when-unapproved / wins-when-approved for overlay+custom+install_steps+pre_op, and default-allowlist-is-empty). - My own break-it probe (
hc2_probe.py, planted a HOSTILE repo-localinstall_steps.shrm -rf /+ops.pyos.system('id')+test_install.py):- real checked-in allowlist →
approved_recipes() == set()(default-deny). real-default→approved=False, overlay falls back to cc-ci,install_steps=None,pre_op=None(hostile repo-local code NOT selected).- lone
*→ DENY (not a wildcard, as the file header promises). - only-comment / whitespace lines → DENY.
- approving a different recipe (hedgedoc) → custom-html still DENY (no leak).
custom-htmllisted →approved=True, overlay/install_steps/pre_op all flip to repo-local.
- real checked-in allowlist →
- No bypass: every execution path in
runner/run_recipe_ci.pyroutes through gateddiscovery.*(resolve_op→resolve_overlay_op,custom_tests,install_steps→lifecycle hook).snapshot_recipe_testsreads the repo-local dir ungated but only copies it (discover), never executes — matches the plan's "discovered-but-NOT-executed".pre_op_hooknot yet wired into the orchestrator (E1/HC3 work); its discovery fn is already gated.
Verdict: PASS — default-secure, centralized gate, flips only on explicit per-recipe approval;
hostile repo-local code provably not executed under the shipped default. No finding.
Note (not a defect): orchestrator still uses single-file override resolve_op (1d semantics);
the additive generic floor (HC3) is E1 in-flight — will re-check the gate survives the HC3 refactor.
E1 / HC3 — additive generic + op/assertion split — FAIL (PASS WITHHELD) @2026-05-28
Builder claim (STATUS-1e gate, commit b7e6cbd): generic runs additively alongside overlays;
orchestrator owns each op (once); opt-out via CCCI_SKIP_GENERIC[_<OP>]/recipe_meta.SKIP_GENERIC;
deploy-count stays 1; two e2e (default + opt-out) "clean."
Cold verification (own clone HEAD=b7e6cbd shipped to cc-ci /tmp/adv-1e, run via cc-ci-run):
- Structure (PASS): read the refactor —
run_lifecycle_tierperforms the op ONCE (_perform_op→generic.perform_{upgrade,backup,restore}, none calldeploy_app), then runs generic (unless_skip_generic) + overlay as separate pytests vs the shared post-op state. Generic+overlay test files are assertion-only; seeding moved toops.py pre_<op>.assert_upgradedkeeps the non-vacuous move check (F1d-2)._record_deploy()lives only indeploy_app. - Default e2e (custom-html, all stages): EVERY tier ran BOTH
assert (generic)ANDassert (cc-ci); pre_upgrade/backup/restore seeds fired; deploy-count=1; install/upgrade/backup/ restore all PASS; custom=skip; clean teardown (no leftover stack/volume). ✓ additive confirmed. - Opt-out e2e (
CCCI_SKIP_GENERIC=1): generic skipped on every tier (0_generic/files ran), overlay-only, deploy-count=1 ✓ — but backup=FAIL:test_backup_captures_state→AssertionError: '' == 'original'. Same code/recipe; only diff is the opt-out flag.
Interim verdict (commit 4334e19): FAIL — opt-out flipped backup RED, theorised cause was the
opt-out path removing an accidental ~1s generic-pytest timing buffer. Filed F1e-1.
CORRECTION @2026-05-28 (isolated repro disproved the opt-out theory)
Isolated, no-concurrency repro of STAGES=install,backup,restore on custom-html:
- opt-out × 3 (
CCCI_SKIP_GENERIC=1): backup PASS, restore PASS, deploy-count=1. 3/3. - default × 1: backup PASS, restore PASS, deploy-count=1.
So opting out of the generic is NOT what flips the backup RED — the original symptom occurred while
the Builder was running concurrent custom-html e2e on the same node. The real trigger is load /
concurrency putting the post-backup container cycle into a window where exec_in_app's docker exec
fails. The static defect stays the same (and the fix direction in F1e-1 is still correct):
exec_in_app silently returns empty stdout on a failed exec (returncode ignored) + no readiness retry.
F1e-1 reframed in BACKLOG-1e; my earlier "opt-out is not behavior-neutral" framing is withdrawn.
Builder's fix (commit 6eabfdc) — verification pending
exec_in_app now polls (re-resolves container + re-execs) until rc==0 or 90s, then raises —
never masks a failed exec as empty data. No assertion weakened. Same commit also lands HC1 plumbing
(chaos_redeploy, recipe_head_commit, .chaos-version parsing in deployed_identity, head_ref
match in assert_upgraded) — out-of-scope for this re-verification, will check at E2 claim.
Fix verified cold @2026-05-28 (own clone HEAD=6eabfdc shipped to /tmp/adv-fix):
CCCI_SKIP_GENERIC=1 RECIPE=custom-html STAGES=install,backup,restore cc-ci-run runner/run_recipe_ci.py
→ install/backup/restore all PASS, deploy-count=1, generic skipped on every tier (overlay-only),
clean teardown (no leftover stack/volume). The exec_in_app poll+raise is structurally watertight:
re-resolves the container each try, raises on persistent failure — no silent-empty data path remains;
a real exec failure becomes a real test failure rather than an '' == 'original' false-RED.
F1e-1 closed by Adversary @2026-05-28 (BACKLOG-1e).
Final E1/HC3 verdict — PASS @2026-05-28 (re-claim commit e75ec1b; fix commit 6eabfdc)
Cold-verified: (1) additive — every lifecycle tier runs both assert (generic) and assert (cc-ci) on
the shared post-op deployment (default run, all stages PASS); (2) opt-out — CCCI_SKIP_GENERIC=1
skips the generic on every tier with 0 _generic/ files run and overlay-only, deploy-count=1;
(3) op-once — op primitives perform_{upgrade,backup,restore} never call deploy_app, deploy-count
stays 1 in both modes; (4) assertion-only overlays — no double-op risk; (5) no assertion weakened —
assert_upgraded keeps the non-vacuous move check (F1d-2 honored). HC2 gate survives the refactor.
Open robustness item: F1e-2 (recipe-fetch concurrency race) — pre-existing, orthogonal, tracked
for HC4.
E2 / HC1 — upgrade to PR head via chaos redeploy — PASS @2026-05-28 (commit 7472561)
Builder claim (STATUS-1e gate, commit 7472561 fixing 6eabfdc multi-line-edit-miss): upgrade tier now
re-checks-out the PR-head ref (head_ref = $REF or recipe_head_commit(recipe), captured pre-tag-checkout)
and chaos-redeploys (abra.deploy(chaos=True) direct, not via deploy_app — count not incremented).
assert_upgraded (when head_ref known) requires the deployed coop-cloud.<stack>.chaos-version label
to MATCH head_ref (prefix-tolerant for short ↔ full commit); falls back to the version/image/chaos
moved-check when head_ref is unknown.
Cold verification (own clone HEAD=7472561 shipped to /tmp/adv-hc1):
- e2e custom-html install,upgrade (
cc-ci-run runner/run_recipe_ci.py):===== TIER: upgrade (generic=run, overlay=cc-ci:tests/custom-html/test_upgrade.py) ===== upgrade→PR-head: head_ref=8a026066 chaos-version=8a026066 version=1.10.0+1.28.0→1.11.0+1.29.0 deploy-count = 1 (expect 1) install : pass upgrade : passhead_ref == chaos-version(deterministic prefix match), real version move 1.10.0→1.11.0, deploy-count=1, additive generic+overlay both ran post-op, clean teardown (no leftover stack/volume). ✓ PR-head code under test demonstrably deployed. - Adversarial probe — non-vacuousness: monkey-patched
deployed_identityto returnchaos='09bf4d54'against a fakehead_ref='deadbeefcafe0001'in op_state, calledgeneric.assert_upgradeddirectly →AssertionError: upgrade deployed chaos commit '09bf4d54', not the intended PR-head 'deadbeefcafe' — the re-checkout to the code under test failed. ✓ A wrong PR-head fails loudly; the assertion is strictly non-vacuous (guards F1d-2 and the prev- checkout-vacuous-pass bug that7472561itself just fixed).
Verdict: PASS — HC1 acceptance met. deploy-count guard correctly reconciled (chaos path direct;
_record_deploy lives only in deploy_app). No assertion weakened (the move-check fallback for the
no-head_ref path is unchanged; production !testme always sets $REF). HC3 additive still holds
(generic+overlay both ran post-chaos-deploy). No new finding.
Phase-1e D-o-D tracker: HC1 ✓ HC2 ✓ HC3 ✓ — three corrections all Adversary-verified cold. Pending: HC4 (no-regression D1–D10/DG1–DG8) — re-verify when Builder claims E3.
E3 / HC4 — no regression, three new behaviors live — PASS @2026-05-28 (Builder claim 6397cd5)
Gold-standard cold verification = my own !testme end-to-end. Posted three comments by the bot on
recipe-maintainers/custom-html PR#2 (head db9a9502, "upgrade to 1.13.0+1.31.1"):
- id 13755:
!testmexyz adversary-1e-HC4 ...— negative control (D1 reject) → no trigger ✓ - id 13756:
!testme adversary-1e-HC4 ...— negative control (extra text after !testme; exact-match filter) → no trigger ✓ - id 13757:
!testme(exact) at03:19:25— positive trigger.
Bridge → Drone → runner production chain (Drone build #155):
- D1 latency: triggered build 155 at
03:19:34— 9 s after comment (well under 60 s). - D1 dedup/auth: only id 13757 triggered; 13755+13756 cleanly ignored; PR-comment reflection (id
13758):
cc-ci: run for custom-html @ db9a9502 ✅ passed → …/cc-ci/155. - HC1 live: build log shows
upgrade→PR-head: head_ref=db9a9502 chaos-version=db9a9502 version=1.10.0+1.28.0→1.13.0+1.31.1. Full-sha matchdb9a9502 == db9a9502—$REFflowed bridge→Drone→runner→re-checkout→chaos deploy correctly. PR-head code under test demonstrably deployed in production. - HC3 additive in production: every lifecycle tier ran BOTH
assert (generic): tests/_generic/ test_<op>.pyANDassert (cc-ci): tests/custom-html/test_<op>.py, all PASSED (8 assertions across install/upgrade/backup/restore). - HC2 in production: custom-html not on the allowlist → no repo-local consulted; cc-ci + generic only (matches HC2 default-deny behavior under load).
- DG4.1:
deploy-count = 1 (expect 1)✓ - F1e-1 fix under real load:
test_backup_captures_state PASSED(the previously failing assertion). The poll+raise hardening ofexec_in_appsurvives a production-pipeline run. - D6 secret-leak grep: 58 infra-secret values (tokens, HMAC, RPC, OAuth, cert/key) checked against the full published build #155 log — zero matches; sensitive-pattern sweep clean.
- Teardown sacred: post-build,
docker stack ls | grep cust→ none;docker volume ls | grep cust→ none. ✓
No regression on the D-gate / DG-gate surface I can attribute to 1e changes:
- DG1 serving (assert_serving in every tier), DG2 upgrade non-vacuous (head_ref match
- monkey-patched mismatch raise), DG3 backup-capable detect (custom-html backup-cap = true; flowed
through), DG4 overlay precedence (gated by HC2), DG4.1 deploy-once, DG5 install-steps hook
resolution (HC2 verified hook still resolves; not e2e-re-exercised here because custom-html ships no
hook), DG6 full integration (build #155 above), DG7 DRY/teardown-always, DG8 docs (
docs/testing.md docs/enroll-recipe.mdboth updated for HC1/HC2/HC3 and accurately describe the new behavior).
- monkey-patched mismatch raise), DG3 backup-capable detect (custom-html backup-cap = true; flowed
through), DG4 overlay precedence (gated by HC2), DG4.1 deploy-once, DG5 install-steps hook
resolution (HC2 verified hook still resolves; not e2e-re-exercised here because custom-html ships no
hook), DG6 full integration (build #155 above), DG7 DRY/teardown-always, DG8 docs (
- D1 trigger / dedup / outcome reflection all live in build #155.
- D6 secrets verified clean as above.
F1e-2 (pre-existing concurrent abra recipe fetch race) — confirmed not a 1e regression by the
Builder's status; tracked in BACKLOG-1e for HC4 visibility, not blocking DONE (Drone caps MAX_TESTS=1
in current config, so practical impact bounded; surface again at breadth-ramp).
Verdict: PASS. NO VETO. All four HC items Adversary cold-verified within the last 24 h
(HC1/HC2/HC3/HC4 ✓). Builder may write ## DONE to STATUS-1e.md.
Final summary — Phase 1e cold verification
HC1 ✓ (E2, commit 7472561 + build #155 head_ref==chaos-version)
HC2 ✓ (E0, commit c7ae296 + hostile-code probe)
HC3 ✓ (E1, commit e75ec1b + F1e-1 fix 6eabfdc verified cold)
HC4 ✓ (E3, commit 6397cd5 + own !testme build #155 production-chain cold)
Findings: F1e-1 CLOSED (fixed + re-verified). F1e-2 OPEN (pre-existing, not a 1e regression).
Separate observation while testing (NOT F1e-1)
A controlled 2-concurrent same-recipe test (PR=8001/PR=8002, both custom-html) on the OLD code
showed run-a die in abra recipe fetch custom-html -n (rc=1) — concurrent rm-rf + abra-fetch on the
same ~/.abra/recipes/custom-html collide. Pre-existing (in 1d too), orthogonal to E1/HC3, not the
F1e-1 trigger. Filing separately as F1e-2 [adversary] for HC4 visibility (§6 D-gate requires
concurrent runs to be safe). Drone caps MAX_TESTS=1-2 today, so practical impact is bounded.