Files
cc-ci/machine-docs/REVIEW-1e.md

198 lines
15 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW-1e — Adversary verdicts (Phase 1e: generic-harness corrections)
Adversary-owned, append-only. Phase plan: `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md`.
Definition of Done = HC1HC4 each cold-verified PASS here (handshake per plan.md §6.1).
## Definition-of-Done tracker
- [x] **HC1** — Upgrade tier upgrades to PR head (prev published → PR-head via `abra app deploy --chaos`), not a published tag; moved-assertion adapted; DG4.1 deploy-count guard reconciled. **PASS @2026-05-28 (E2, commit 7472561).**
- [x] **HC2** — Repo-local (PR-authored) `test_*.py` / `install_steps.sh` NOT executed unless recipe is on the cc-ci approval allowlist (default-deny). **PASS @2026-05-28 (E0, commit c7ae296).**
- [x] **HC3** — Generic runs by default alongside an overlay (additive); skipped only via explicit opt-out; op runs once. **PASS @2026-05-28 (E1 re-claim, fix commit 6eabfdc).**
- [x] **HC4** — No regression: D1D10 / DG1DG8 re-verified cold; deploy-once (DG4.1) holds; teardown sacred; three new behaviors demonstrated. **PASS @2026-05-28 (E3, build 155 own `!testme` on custom-html PR#2).**
Maps to Builder milestones: E0=HC2, E1=HC3, E2=HC1, E3=HC4+docs.
## Cold-start access (re-verified each phase)
- @2026-05-28 — `ssh cc-ci` OK (NixOS 24.11), dashboard HTTP 200 via SOCKS proxy 127.0.0.1:1055. Proxy/SSH path healthy.
## Verdicts
### E0 / HC2 — repo-local trust gate (default-deny) — PASS @2026-05-28
Builder claim (STATUS-1e, commit c7ae296 / feat d38a695): repo-local (PR-authored)
`test_*.py`/`install_steps.sh`/`ops.py` consulted only for recipes on `tests/repo-local-approved.txt`
(empty ⇒ deny); centralized `_gated()` in `discovery.py`; 8 unit tests pass.
**Cold verification (own clone HEAD=c7ae296, shipped to cc-ci, run via `cc-ci-run`):**
1. **Unit suite, independent run:** `cd /tmp/adv-1e && cc-ci-run -m pytest tests/unit -v`
**8 passed in 0.06s** (incl. repo-local-ignored-when-unapproved / wins-when-approved for
overlay+custom+install_steps+pre_op, and default-allowlist-is-empty).
2. **My own break-it probe** (`hc2_probe.py`, planted a HOSTILE repo-local `install_steps.sh`
`rm -rf /` + `ops.py` `os.system('id')` + `test_install.py`):
- real checked-in allowlist → `approved_recipes() == set()` (default-deny).
- `real-default``approved=False`, overlay falls back to **cc-ci**, `install_steps=None`,
`pre_op=None` (hostile repo-local code NOT selected).
- lone `*`**DENY** (not a wildcard, as the file header promises).
- only-comment / whitespace lines → **DENY**.
- approving a *different* recipe (hedgedoc) → custom-html still **DENY** (no leak).
- `custom-html` listed → `approved=True`, overlay/install_steps/pre_op all flip to **repo-local**.
3. **No bypass:** every execution path in `runner/run_recipe_ci.py` routes through gated
`discovery.*` (`resolve_op``resolve_overlay_op`, `custom_tests`, `install_steps`→lifecycle hook).
`snapshot_recipe_tests` reads the repo-local dir ungated but only **copies** it (discover), never
executes — matches the plan's "discovered-but-NOT-executed". `pre_op_hook` not yet wired into the
orchestrator (E1/HC3 work); its discovery fn is already gated.
Verdict: **PASS** — default-secure, centralized gate, flips only on explicit per-recipe approval;
hostile repo-local code provably not executed under the shipped default. No finding.
**Note (not a defect):** orchestrator still uses single-file override `resolve_op` (1d semantics);
the additive generic floor (HC3) is E1 in-flight — will re-check the gate survives the HC3 refactor.
### E1 / HC3 — additive generic + op/assertion split — FAIL (PASS WITHHELD) @2026-05-28
Builder claim (STATUS-1e gate, commit b7e6cbd): generic runs additively alongside overlays;
orchestrator owns each op (once); opt-out via `CCCI_SKIP_GENERIC[_<OP>]`/`recipe_meta.SKIP_GENERIC`;
deploy-count stays 1; two e2e (default + opt-out) "clean."
**Cold verification (own clone HEAD=b7e6cbd shipped to cc-ci `/tmp/adv-1e`, run via `cc-ci-run`):**
- **Structure (PASS):** read the refactor — `run_lifecycle_tier` performs the op ONCE
(`_perform_op``generic.perform_{upgrade,backup,restore}`, none call `deploy_app`), then runs generic
(unless `_skip_generic`) + overlay as separate pytests vs the shared post-op state. Generic+overlay
test files are assertion-only; seeding moved to `ops.py pre_<op>`. `assert_upgraded` keeps the
non-vacuous move check (F1d-2). `_record_deploy()` lives only in `deploy_app`.
- **Default e2e** (custom-html, all stages): EVERY tier ran BOTH `assert (generic)` AND
`assert (cc-ci)`; pre_upgrade/backup/restore seeds fired; **deploy-count=1**; install/upgrade/backup/
restore all PASS; custom=skip; clean teardown (no leftover stack/volume). ✓ additive confirmed.
- **Opt-out e2e** (`CCCI_SKIP_GENERIC=1`): generic skipped on every tier (**0** `_generic/` files ran),
overlay-only, **deploy-count=1** ✓ — **but backup=FAIL**: `test_backup_captures_state`
`AssertionError: '' == 'original'`. Same code/recipe; only diff is the opt-out flag.
**Interim verdict (commit 4334e19): FAIL — opt-out flipped backup RED**, theorised cause was the
opt-out path removing an accidental ~1s generic-pytest timing buffer. **Filed F1e-1.**
### CORRECTION @2026-05-28 (isolated repro disproved the opt-out theory)
Isolated, no-concurrency repro of `STAGES=install,backup,restore` on custom-html:
- **opt-out × 3** (`CCCI_SKIP_GENERIC=1`): backup PASS, restore PASS, deploy-count=1. **3/3.**
- **default × 1**: backup PASS, restore PASS, deploy-count=1.
So opting out of the generic is **NOT** what flips the backup RED — the original symptom occurred while
the Builder was running concurrent custom-html e2e on the same node. The real trigger is **load /
concurrency** putting the post-backup container cycle into a window where `exec_in_app`'s `docker exec`
fails. The **static defect stays the same** (and the fix direction in F1e-1 is still correct):
`exec_in_app` silently returns empty stdout on a failed exec (returncode ignored) + no readiness retry.
F1e-1 reframed in BACKLOG-1e; my earlier "opt-out is not behavior-neutral" framing is **withdrawn**.
### Builder's fix (commit 6eabfdc) — verification pending
`exec_in_app` now polls (re-resolves container + re-execs) until `rc==0` or 90s, then **raises**
never masks a failed exec as empty data. No assertion weakened. Same commit also lands HC1 plumbing
(`chaos_redeploy`, `recipe_head_commit`, `.chaos-version` parsing in `deployed_identity`, head_ref
match in `assert_upgraded`) — out-of-scope for this re-verification, will check at E2 claim.
**Fix verified cold @2026-05-28 (own clone HEAD=6eabfdc shipped to `/tmp/adv-fix`):**
`CCCI_SKIP_GENERIC=1 RECIPE=custom-html STAGES=install,backup,restore cc-ci-run runner/run_recipe_ci.py`
→ install/backup/restore **all PASS**, deploy-count=1, generic skipped on every tier (overlay-only),
clean teardown (no leftover stack/volume). The `exec_in_app` poll+raise is structurally watertight:
re-resolves the container each try, raises on persistent failure — no silent-empty data path remains;
a real exec failure becomes a real test failure rather than an `'' == 'original'` false-RED.
**F1e-1 closed by Adversary @2026-05-28** (BACKLOG-1e).
### Final E1/HC3 verdict — PASS @2026-05-28 (re-claim commit e75ec1b; fix commit 6eabfdc)
Cold-verified: (1) additive — every lifecycle tier runs both `assert (generic)` and `assert (cc-ci)` on
the shared post-op deployment (default run, all stages PASS); (2) opt-out — `CCCI_SKIP_GENERIC=1`
skips the generic on every tier with **0** `_generic/` files run and overlay-only, deploy-count=1;
(3) op-once — op primitives `perform_{upgrade,backup,restore}` never call `deploy_app`, deploy-count
stays 1 in both modes; (4) assertion-only overlays — no double-op risk; (5) no assertion weakened —
`assert_upgraded` keeps the non-vacuous move check (F1d-2 honored). HC2 gate survives the refactor.
**Open robustness item:** F1e-2 (recipe-fetch concurrency race) — pre-existing, orthogonal, tracked
for HC4.
### E2 / HC1 — upgrade to PR head via chaos redeploy — PASS @2026-05-28 (commit 7472561)
Builder claim (STATUS-1e gate, commit 7472561 fixing 6eabfdc multi-line-edit-miss): upgrade tier now
re-checks-out the PR-head ref (`head_ref = $REF or recipe_head_commit(recipe)`, captured pre-tag-checkout)
and chaos-redeploys (`abra.deploy(chaos=True)` direct, not via `deploy_app` — count not incremented).
`assert_upgraded` (when head_ref known) requires the deployed `coop-cloud.<stack>.chaos-version` label
to MATCH head_ref (prefix-tolerant for short ↔ full commit); falls back to the version/image/chaos
moved-check when head_ref is unknown.
**Cold verification (own clone HEAD=7472561 shipped to `/tmp/adv-hc1`):**
1. **e2e custom-html install,upgrade** (`cc-ci-run runner/run_recipe_ci.py`):
```
===== TIER: upgrade (generic=run, overlay=cc-ci:tests/custom-html/test_upgrade.py) =====
upgrade→PR-head: head_ref=8a026066 chaos-version=8a026066 version=1.10.0+1.28.0→1.11.0+1.29.0
deploy-count = 1 (expect 1)
install : pass upgrade : pass
```
`head_ref == chaos-version` (deterministic prefix match), real version move 1.10.0→1.11.0,
**deploy-count=1**, additive generic+overlay both ran post-op, clean teardown (no leftover
stack/volume). ✓ PR-head code under test demonstrably deployed.
2. **Adversarial probe — non-vacuousness:** monkey-patched `deployed_identity` to return
`chaos='09bf4d54'` against a fake `head_ref='deadbeefcafe0001'` in op_state, called
`generic.assert_upgraded` directly → `AssertionError: upgrade deployed chaos commit '09bf4d54',
not the intended PR-head 'deadbeefcafe' — the re-checkout to the code under test failed`.
✓ A wrong PR-head fails loudly; the assertion is strictly non-vacuous (guards F1d-2 and the prev-
checkout-vacuous-pass bug that 7472561 itself just fixed).
Verdict: **PASS** — HC1 acceptance met. deploy-count guard correctly reconciled (chaos path direct;
`_record_deploy` lives only in `deploy_app`). No assertion weakened (the move-check fallback for the
no-head_ref path is unchanged; production `!testme` always sets `$REF`). HC3 additive still holds
(generic+overlay both ran post-chaos-deploy). No new finding.
**Phase-1e D-o-D tracker:** HC1 ✓ HC2 ✓ HC3 ✓ — three corrections all Adversary-verified cold.
**Pending:** HC4 (no-regression D1D10/DG1DG8) — re-verify when Builder claims E3.
### E3 / HC4 — no regression, three new behaviors live — PASS @2026-05-28 (Builder claim 6397cd5)
**Gold-standard cold verification = my own `!testme` end-to-end.** Posted three comments by the bot on
`recipe-maintainers/custom-html` PR#2 (head `db9a9502`, "upgrade to 1.13.0+1.31.1"):
- id 13755: `!testmexyz adversary-1e-HC4 ...` — **negative control** (D1 reject) → no trigger ✓
- id 13756: `!testme adversary-1e-HC4 ...` — **negative control** (extra text after !testme; exact-match
filter) → no trigger ✓
- id 13757: `!testme` (exact) at `03:19:25` — **positive trigger**.
**Bridge → Drone → runner production chain (Drone build #155):**
- **D1 latency:** triggered build 155 at `03:19:34` — **9 s** after comment (well under 60 s).
- **D1 dedup/auth:** only id 13757 triggered; 13755+13756 cleanly ignored; PR-comment reflection (id
13758): `cc-ci: run for custom-html @ db9a9502 ✅ passed → …/cc-ci/155`.
- **HC1 live:** build log shows `upgrade→PR-head: head_ref=db9a9502 chaos-version=db9a9502
version=1.10.0+1.28.0→1.13.0+1.31.1`. **Full-sha match `db9a9502 == db9a9502`** — `$REF` flowed
bridge→Drone→runner→re-checkout→chaos deploy correctly. PR-head code under test demonstrably
deployed in production.
- **HC3 additive in production:** every lifecycle tier ran BOTH `assert (generic): tests/_generic/
test_<op>.py` AND `assert (cc-ci): tests/custom-html/test_<op>.py`, all **PASSED** (8 assertions
across install/upgrade/backup/restore).
- **HC2 in production:** custom-html not on the allowlist → no repo-local consulted; cc-ci + generic
only (matches HC2 default-deny behavior under load).
- **DG4.1:** `deploy-count = 1 (expect 1)` ✓
- **F1e-1 fix under real load:** `test_backup_captures_state PASSED` (the previously failing
assertion). The poll+raise hardening of `exec_in_app` survives a production-pipeline run.
- **D6 secret-leak grep:** 58 infra-secret values (tokens, HMAC, RPC, OAuth, cert/key) checked
against the full published build #155 log — **zero matches**; sensitive-pattern sweep clean.
- **Teardown sacred:** post-build, `docker stack ls | grep cust` → none; `docker volume ls | grep
cust` → none. ✓
**No regression on the D-gate / DG-gate surface I can attribute to 1e changes:**
- DG1 serving (assert_serving in every tier), DG2 upgrade non-vacuous (head_ref match
+ monkey-patched mismatch raise), DG3 backup-capable detect (custom-html backup-cap = true; flowed
through), DG4 overlay precedence (gated by HC2), DG4.1 deploy-once, DG5 install-steps hook
resolution (HC2 verified hook still resolves; not e2e-re-exercised here because custom-html ships no
hook), DG6 full integration (build #155 above), DG7 DRY/teardown-always, DG8 docs (`docs/testing.md`
+ `docs/enroll-recipe.md` both updated for HC1/HC2/HC3 and accurately describe the new behavior).
- D1 trigger / dedup / outcome reflection all live in build #155.
- D6 secrets verified clean as above.
**F1e-2** (pre-existing concurrent `abra recipe fetch` race) — confirmed not a 1e regression by the
Builder's status; tracked in BACKLOG-1e for HC4 visibility, not blocking DONE (Drone caps `MAX_TESTS=1`
in current config, so practical impact bounded; surface again at breadth-ramp).
**Verdict: PASS. NO VETO.** All four HC items Adversary cold-verified within the last 24 h
(HC1/HC2/HC3/HC4 ✓). Builder may write `## DONE` to `STATUS-1e.md`.
## Final summary — Phase 1e cold verification
HC1 ✓ (E2, commit 7472561 + build #155 head_ref==chaos-version)
HC2 ✓ (E0, commit c7ae296 + hostile-code probe)
HC3 ✓ (E1, commit e75ec1b + F1e-1 fix 6eabfdc verified cold)
HC4 ✓ (E3, commit 6397cd5 + own !testme build #155 production-chain cold)
Findings: F1e-1 CLOSED (fixed + re-verified). F1e-2 OPEN (pre-existing, not a 1e regression).
### Separate observation while testing (NOT F1e-1)
A controlled 2-concurrent same-recipe test (PR=8001/PR=8002, both custom-html) on the **OLD** code
showed run-a die in `abra recipe fetch custom-html -n` (rc=1) — concurrent rm-rf + abra-fetch on the
same `~/.abra/recipes/custom-html` collide. Pre-existing (in 1d too), orthogonal to E1/HC3, not the
F1e-1 trigger. Filing separately as **F1e-2 [adversary]** for HC4 visibility (§6 D-gate requires
concurrent runs to be safe). Drone caps `MAX_TESTS=1-2` today, so practical impact is bounded.