198 lines
15 KiB
Markdown
198 lines
15 KiB
Markdown
# REVIEW-1e — Adversary verdicts (Phase 1e: generic-harness corrections)
|
||
|
||
Adversary-owned, append-only. Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md`.
|
||
Definition of Done = HC1–HC4 each cold-verified PASS here (handshake per plan.md §6.1).
|
||
|
||
## Definition-of-Done tracker
|
||
- [x] **HC1** — Upgrade tier upgrades to PR head (prev published → PR-head via `abra app deploy --chaos`), not a published tag; moved-assertion adapted; DG4.1 deploy-count guard reconciled. **PASS @2026-05-28 (E2, commit 7472561).**
|
||
- [x] **HC2** — Repo-local (PR-authored) `test_*.py` / `install_steps.sh` NOT executed unless recipe is on the cc-ci approval allowlist (default-deny). **PASS @2026-05-28 (E0, commit c7ae296).**
|
||
- [x] **HC3** — Generic runs by default alongside an overlay (additive); skipped only via explicit opt-out; op runs once. **PASS @2026-05-28 (E1 re-claim, fix commit 6eabfdc).**
|
||
- [x] **HC4** — No regression: D1–D10 / DG1–DG8 re-verified cold; deploy-once (DG4.1) holds; teardown sacred; three new behaviors demonstrated. **PASS @2026-05-28 (E3, build 155 own `!testme` on custom-html PR#2).**
|
||
|
||
Maps to Builder milestones: E0=HC2, E1=HC3, E2=HC1, E3=HC4+docs.
|
||
|
||
## Cold-start access (re-verified each phase)
|
||
- @2026-05-28 — `ssh cc-ci` OK (NixOS 24.11), dashboard HTTP 200 via SOCKS proxy 127.0.0.1:1055. Proxy/SSH path healthy.
|
||
|
||
## Verdicts
|
||
|
||
### E0 / HC2 — repo-local trust gate (default-deny) — PASS @2026-05-28
|
||
Builder claim (STATUS-1e, commit c7ae296 / feat d38a695): repo-local (PR-authored)
|
||
`test_*.py`/`install_steps.sh`/`ops.py` consulted only for recipes on `tests/repo-local-approved.txt`
|
||
(empty ⇒ deny); centralized `_gated()` in `discovery.py`; 8 unit tests pass.
|
||
|
||
**Cold verification (own clone HEAD=c7ae296, shipped to cc-ci, run via `cc-ci-run`):**
|
||
1. **Unit suite, independent run:** `cd /tmp/adv-1e && cc-ci-run -m pytest tests/unit -v` →
|
||
**8 passed in 0.06s** (incl. repo-local-ignored-when-unapproved / wins-when-approved for
|
||
overlay+custom+install_steps+pre_op, and default-allowlist-is-empty).
|
||
2. **My own break-it probe** (`hc2_probe.py`, planted a HOSTILE repo-local `install_steps.sh`
|
||
`rm -rf /` + `ops.py` `os.system('id')` + `test_install.py`):
|
||
- real checked-in allowlist → `approved_recipes() == set()` (default-deny).
|
||
- `real-default` → `approved=False`, overlay falls back to **cc-ci**, `install_steps=None`,
|
||
`pre_op=None` (hostile repo-local code NOT selected).
|
||
- lone `*` → **DENY** (not a wildcard, as the file header promises).
|
||
- only-comment / whitespace lines → **DENY**.
|
||
- approving a *different* recipe (hedgedoc) → custom-html still **DENY** (no leak).
|
||
- `custom-html` listed → `approved=True`, overlay/install_steps/pre_op all flip to **repo-local**.
|
||
3. **No bypass:** every execution path in `runner/run_recipe_ci.py` routes through gated
|
||
`discovery.*` (`resolve_op`→`resolve_overlay_op`, `custom_tests`, `install_steps`→lifecycle hook).
|
||
`snapshot_recipe_tests` reads the repo-local dir ungated but only **copies** it (discover), never
|
||
executes — matches the plan's "discovered-but-NOT-executed". `pre_op_hook` not yet wired into the
|
||
orchestrator (E1/HC3 work); its discovery fn is already gated.
|
||
|
||
Verdict: **PASS** — default-secure, centralized gate, flips only on explicit per-recipe approval;
|
||
hostile repo-local code provably not executed under the shipped default. No finding.
|
||
**Note (not a defect):** orchestrator still uses single-file override `resolve_op` (1d semantics);
|
||
the additive generic floor (HC3) is E1 in-flight — will re-check the gate survives the HC3 refactor.
|
||
|
||
### E1 / HC3 — additive generic + op/assertion split — FAIL (PASS WITHHELD) @2026-05-28
|
||
Builder claim (STATUS-1e gate, commit b7e6cbd): generic runs additively alongside overlays;
|
||
orchestrator owns each op (once); opt-out via `CCCI_SKIP_GENERIC[_<OP>]`/`recipe_meta.SKIP_GENERIC`;
|
||
deploy-count stays 1; two e2e (default + opt-out) "clean."
|
||
|
||
**Cold verification (own clone HEAD=b7e6cbd shipped to cc-ci `/tmp/adv-1e`, run via `cc-ci-run`):**
|
||
- **Structure (PASS):** read the refactor — `run_lifecycle_tier` performs the op ONCE
|
||
(`_perform_op`→`generic.perform_{upgrade,backup,restore}`, none call `deploy_app`), then runs generic
|
||
(unless `_skip_generic`) + overlay as separate pytests vs the shared post-op state. Generic+overlay
|
||
test files are assertion-only; seeding moved to `ops.py pre_<op>`. `assert_upgraded` keeps the
|
||
non-vacuous move check (F1d-2). `_record_deploy()` lives only in `deploy_app`.
|
||
- **Default e2e** (custom-html, all stages): EVERY tier ran BOTH `assert (generic)` AND
|
||
`assert (cc-ci)`; pre_upgrade/backup/restore seeds fired; **deploy-count=1**; install/upgrade/backup/
|
||
restore all PASS; custom=skip; clean teardown (no leftover stack/volume). ✓ additive confirmed.
|
||
- **Opt-out e2e** (`CCCI_SKIP_GENERIC=1`): generic skipped on every tier (**0** `_generic/` files ran),
|
||
overlay-only, **deploy-count=1** ✓ — **but backup=FAIL**: `test_backup_captures_state` →
|
||
`AssertionError: '' == 'original'`. Same code/recipe; only diff is the opt-out flag.
|
||
|
||
**Interim verdict (commit 4334e19): FAIL — opt-out flipped backup RED**, theorised cause was the
|
||
opt-out path removing an accidental ~1s generic-pytest timing buffer. **Filed F1e-1.**
|
||
|
||
### CORRECTION @2026-05-28 (isolated repro disproved the opt-out theory)
|
||
Isolated, no-concurrency repro of `STAGES=install,backup,restore` on custom-html:
|
||
- **opt-out × 3** (`CCCI_SKIP_GENERIC=1`): backup PASS, restore PASS, deploy-count=1. **3/3.**
|
||
- **default × 1**: backup PASS, restore PASS, deploy-count=1.
|
||
|
||
So opting out of the generic is **NOT** what flips the backup RED — the original symptom occurred while
|
||
the Builder was running concurrent custom-html e2e on the same node. The real trigger is **load /
|
||
concurrency** putting the post-backup container cycle into a window where `exec_in_app`'s `docker exec`
|
||
fails. The **static defect stays the same** (and the fix direction in F1e-1 is still correct):
|
||
`exec_in_app` silently returns empty stdout on a failed exec (returncode ignored) + no readiness retry.
|
||
F1e-1 reframed in BACKLOG-1e; my earlier "opt-out is not behavior-neutral" framing is **withdrawn**.
|
||
|
||
### Builder's fix (commit 6eabfdc) — verification pending
|
||
`exec_in_app` now polls (re-resolves container + re-execs) until `rc==0` or 90s, then **raises** —
|
||
never masks a failed exec as empty data. No assertion weakened. Same commit also lands HC1 plumbing
|
||
(`chaos_redeploy`, `recipe_head_commit`, `.chaos-version` parsing in `deployed_identity`, head_ref
|
||
match in `assert_upgraded`) — out-of-scope for this re-verification, will check at E2 claim.
|
||
|
||
**Fix verified cold @2026-05-28 (own clone HEAD=6eabfdc shipped to `/tmp/adv-fix`):**
|
||
`CCCI_SKIP_GENERIC=1 RECIPE=custom-html STAGES=install,backup,restore cc-ci-run runner/run_recipe_ci.py`
|
||
→ install/backup/restore **all PASS**, deploy-count=1, generic skipped on every tier (overlay-only),
|
||
clean teardown (no leftover stack/volume). The `exec_in_app` poll+raise is structurally watertight:
|
||
re-resolves the container each try, raises on persistent failure — no silent-empty data path remains;
|
||
a real exec failure becomes a real test failure rather than an `'' == 'original'` false-RED.
|
||
**F1e-1 closed by Adversary @2026-05-28** (BACKLOG-1e).
|
||
|
||
### Final E1/HC3 verdict — PASS @2026-05-28 (re-claim commit e75ec1b; fix commit 6eabfdc)
|
||
Cold-verified: (1) additive — every lifecycle tier runs both `assert (generic)` and `assert (cc-ci)` on
|
||
the shared post-op deployment (default run, all stages PASS); (2) opt-out — `CCCI_SKIP_GENERIC=1`
|
||
skips the generic on every tier with **0** `_generic/` files run and overlay-only, deploy-count=1;
|
||
(3) op-once — op primitives `perform_{upgrade,backup,restore}` never call `deploy_app`, deploy-count
|
||
stays 1 in both modes; (4) assertion-only overlays — no double-op risk; (5) no assertion weakened —
|
||
`assert_upgraded` keeps the non-vacuous move check (F1d-2 honored). HC2 gate survives the refactor.
|
||
**Open robustness item:** F1e-2 (recipe-fetch concurrency race) — pre-existing, orthogonal, tracked
|
||
for HC4.
|
||
|
||
### E2 / HC1 — upgrade to PR head via chaos redeploy — PASS @2026-05-28 (commit 7472561)
|
||
Builder claim (STATUS-1e gate, commit 7472561 fixing 6eabfdc multi-line-edit-miss): upgrade tier now
|
||
re-checks-out the PR-head ref (`head_ref = $REF or recipe_head_commit(recipe)`, captured pre-tag-checkout)
|
||
and chaos-redeploys (`abra.deploy(chaos=True)` direct, not via `deploy_app` — count not incremented).
|
||
`assert_upgraded` (when head_ref known) requires the deployed `coop-cloud.<stack>.chaos-version` label
|
||
to MATCH head_ref (prefix-tolerant for short ↔ full commit); falls back to the version/image/chaos
|
||
moved-check when head_ref is unknown.
|
||
|
||
**Cold verification (own clone HEAD=7472561 shipped to `/tmp/adv-hc1`):**
|
||
1. **e2e custom-html install,upgrade** (`cc-ci-run runner/run_recipe_ci.py`):
|
||
```
|
||
===== TIER: upgrade (generic=run, overlay=cc-ci:tests/custom-html/test_upgrade.py) =====
|
||
upgrade→PR-head: head_ref=8a026066 chaos-version=8a026066 version=1.10.0+1.28.0→1.11.0+1.29.0
|
||
deploy-count = 1 (expect 1)
|
||
install : pass upgrade : pass
|
||
```
|
||
`head_ref == chaos-version` (deterministic prefix match), real version move 1.10.0→1.11.0,
|
||
**deploy-count=1**, additive generic+overlay both ran post-op, clean teardown (no leftover
|
||
stack/volume). ✓ PR-head code under test demonstrably deployed.
|
||
2. **Adversarial probe — non-vacuousness:** monkey-patched `deployed_identity` to return
|
||
`chaos='09bf4d54'` against a fake `head_ref='deadbeefcafe0001'` in op_state, called
|
||
`generic.assert_upgraded` directly → `AssertionError: upgrade deployed chaos commit '09bf4d54',
|
||
not the intended PR-head 'deadbeefcafe' — the re-checkout to the code under test failed`.
|
||
✓ A wrong PR-head fails loudly; the assertion is strictly non-vacuous (guards F1d-2 and the prev-
|
||
checkout-vacuous-pass bug that 7472561 itself just fixed).
|
||
|
||
Verdict: **PASS** — HC1 acceptance met. deploy-count guard correctly reconciled (chaos path direct;
|
||
`_record_deploy` lives only in `deploy_app`). No assertion weakened (the move-check fallback for the
|
||
no-head_ref path is unchanged; production `!testme` always sets `$REF`). HC3 additive still holds
|
||
(generic+overlay both ran post-chaos-deploy). No new finding.
|
||
|
||
**Phase-1e D-o-D tracker:** HC1 ✓ HC2 ✓ HC3 ✓ — three corrections all Adversary-verified cold.
|
||
**Pending:** HC4 (no-regression D1–D10/DG1–DG8) — re-verify when Builder claims E3.
|
||
|
||
### E3 / HC4 — no regression, three new behaviors live — PASS @2026-05-28 (Builder claim 6397cd5)
|
||
**Gold-standard cold verification = my own `!testme` end-to-end.** Posted three comments by the bot on
|
||
`recipe-maintainers/custom-html` PR#2 (head `db9a9502`, "upgrade to 1.13.0+1.31.1"):
|
||
- id 13755: `!testmexyz adversary-1e-HC4 ...` — **negative control** (D1 reject) → no trigger ✓
|
||
- id 13756: `!testme adversary-1e-HC4 ...` — **negative control** (extra text after !testme; exact-match
|
||
filter) → no trigger ✓
|
||
- id 13757: `!testme` (exact) at `03:19:25` — **positive trigger**.
|
||
|
||
**Bridge → Drone → runner production chain (Drone build #155):**
|
||
- **D1 latency:** triggered build 155 at `03:19:34` — **9 s** after comment (well under 60 s).
|
||
- **D1 dedup/auth:** only id 13757 triggered; 13755+13756 cleanly ignored; PR-comment reflection (id
|
||
13758): `cc-ci: run for custom-html @ db9a9502 ✅ passed → …/cc-ci/155`.
|
||
- **HC1 live:** build log shows `upgrade→PR-head: head_ref=db9a9502 chaos-version=db9a9502
|
||
version=1.10.0+1.28.0→1.13.0+1.31.1`. **Full-sha match `db9a9502 == db9a9502`** — `$REF` flowed
|
||
bridge→Drone→runner→re-checkout→chaos deploy correctly. PR-head code under test demonstrably
|
||
deployed in production.
|
||
- **HC3 additive in production:** every lifecycle tier ran BOTH `assert (generic): tests/_generic/
|
||
test_<op>.py` AND `assert (cc-ci): tests/custom-html/test_<op>.py`, all **PASSED** (8 assertions
|
||
across install/upgrade/backup/restore).
|
||
- **HC2 in production:** custom-html not on the allowlist → no repo-local consulted; cc-ci + generic
|
||
only (matches HC2 default-deny behavior under load).
|
||
- **DG4.1:** `deploy-count = 1 (expect 1)` ✓
|
||
- **F1e-1 fix under real load:** `test_backup_captures_state PASSED` (the previously failing
|
||
assertion). The poll+raise hardening of `exec_in_app` survives a production-pipeline run.
|
||
- **D6 secret-leak grep:** 58 infra-secret values (tokens, HMAC, RPC, OAuth, cert/key) checked
|
||
against the full published build #155 log — **zero matches**; sensitive-pattern sweep clean.
|
||
- **Teardown sacred:** post-build, `docker stack ls | grep cust` → none; `docker volume ls | grep
|
||
cust` → none. ✓
|
||
|
||
**No regression on the D-gate / DG-gate surface I can attribute to 1e changes:**
|
||
- DG1 serving (assert_serving in every tier), DG2 upgrade non-vacuous (head_ref match
|
||
+ monkey-patched mismatch raise), DG3 backup-capable detect (custom-html backup-cap = true; flowed
|
||
through), DG4 overlay precedence (gated by HC2), DG4.1 deploy-once, DG5 install-steps hook
|
||
resolution (HC2 verified hook still resolves; not e2e-re-exercised here because custom-html ships no
|
||
hook), DG6 full integration (build #155 above), DG7 DRY/teardown-always, DG8 docs (`docs/testing.md`
|
||
+ `docs/enroll-recipe.md` both updated for HC1/HC2/HC3 and accurately describe the new behavior).
|
||
- D1 trigger / dedup / outcome reflection all live in build #155.
|
||
- D6 secrets verified clean as above.
|
||
|
||
**F1e-2** (pre-existing concurrent `abra recipe fetch` race) — confirmed not a 1e regression by the
|
||
Builder's status; tracked in BACKLOG-1e for HC4 visibility, not blocking DONE (Drone caps `MAX_TESTS=1`
|
||
in current config, so practical impact bounded; surface again at breadth-ramp).
|
||
|
||
**Verdict: PASS. NO VETO.** All four HC items Adversary cold-verified within the last 24 h
|
||
(HC1/HC2/HC3/HC4 ✓). Builder may write `## DONE` to `STATUS-1e.md`.
|
||
|
||
## Final summary — Phase 1e cold verification
|
||
HC1 ✓ (E2, commit 7472561 + build #155 head_ref==chaos-version)
|
||
HC2 ✓ (E0, commit c7ae296 + hostile-code probe)
|
||
HC3 ✓ (E1, commit e75ec1b + F1e-1 fix 6eabfdc verified cold)
|
||
HC4 ✓ (E3, commit 6397cd5 + own !testme build #155 production-chain cold)
|
||
Findings: F1e-1 CLOSED (fixed + re-verified). F1e-2 OPEN (pre-existing, not a 1e regression).
|
||
|
||
### Separate observation while testing (NOT F1e-1)
|
||
A controlled 2-concurrent same-recipe test (PR=8001/PR=8002, both custom-html) on the **OLD** code
|
||
showed run-a die in `abra recipe fetch custom-html -n` (rc=1) — concurrent rm-rf + abra-fetch on the
|
||
same `~/.abra/recipes/custom-html` collide. Pre-existing (in 1d too), orthogonal to E1/HC3, not the
|
||
F1e-1 trigger. Filing separately as **F1e-2 [adversary]** for HC4 visibility (§6 D-gate requires
|
||
concurrent runs to be safe). Drone caps `MAX_TESTS=1-2` today, so practical impact is bounded.
|