Documents the end-to-end workflow used to land the intentional-skips/4-rung-ladder feature: explore harness → branch a local cc-ci clone → implement + unit-verify cold on cc-ci → live full-stage check → open PR (never push main) → independent adversary verdict → squash-merge on PASS → deploy via /root/builder-clone rebuild. Includes the adversary-verify-pr6.md plan as a reusable template.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Adversary verification plan — cc-ci PR #6
PR: recipe-maintainers/cc-ci#6
Branch: feat/expected-na-and-tiny-functional (cc-ci repo)
Stance: disbelieve and verify. Try to make it fail. Default to REJECT unless every check below is
green with evidence. You did NOT author this code — verify it independently, cold.
What the PR claims
- A `custom-html-tiny` functional test (`tests/custom-html-tiny/functional/test_serves_content.py`): exact-byte round-trip from the served `content` volume + a real 404.
- `recipe_meta.EXPECTED_NA = {rung: reason}` lists rungs a recipe intentionally skips; any essential rung skipped and NOT listed is unintentional. Skips still cap the level (never inflate).
- The level ladder is the FOUR essential rungs only: `install · upgrade · backup_restore · functional` (top = L4). `integration` and `recipe_local` are OPTIONAL: not rungs, never cap, not shown as skips. SSO must still be enforced for the run VERDICT (the `sso_dep_unverified` / F2-11 path in `run_recipe_ci.py` must be intact).
- `results.json` carries `skips:{intentional:{rung:reason}, unintentional:[rung]}` + `level_cap_rung`; the card shows INTENTIONAL/UNINTENTIONAL SKIP rows; the badge shows an `expected`/`gap?` 3rd segment.
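The ladder and skip rules claimed above can be sketched as follows. This is a minimal illustration, not the PR's `level.py`: the function names `compute_level` and `classify_skips`, the `pass`/`fail`/`skip` status strings, and the example reason text are assumptions made for the sketch. Only the four rung names and the shape of `skips` come from the claims.

```python
# Hypothetical sketch of the 4-rung ladder; not the real level.py.
RUNGS = ("install", "upgrade", "backup_restore", "functional")


def compute_level(statuses):
    """Return (level, cap_rung): count rungs passed in order, stop at the first non-pass."""
    level = 0
    for rung in RUNGS:
        # A rung with no recorded status is treated as skipped (assumption).
        if statuses.get(rung, "skip") != "pass":
            return level, rung  # a skip or failure caps the level, never inflates it
        level += 1
    return level, None  # all four rungs passed: top of the ladder (L4)


def classify_skips(statuses, expected_na):
    """Split skipped rungs into intentional (listed in expected_na) and unintentional."""
    skipped = [r for r in RUNGS if statuses.get(r, "skip") == "skip"]
    return {
        "intentional": {r: expected_na[r] for r in skipped if r in expected_na},
        "unintentional": [r for r in skipped if r not in expected_na],
    }


statuses = {"install": "pass", "upgrade": "pass",
            "backup_restore": "skip", "functional": "pass"}
expected_na = {"backup_restore": "recipe has no backup label"}  # illustrative reason

level, cap_rung = compute_level(statuses)
skips = classify_skips(statuses, expected_na)
```

With these inputs the sketch gives level 2 capped at `backup_restore`, and the skip is classified as intentional even though `functional` passed above it.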
Verification steps (run on the cc-ci host; creds in /srv/cc-ci/.testenv)
1. Fresh independent checkout of the PR head (do NOT reuse my working dirs): `git clone … cc-ci` → `git checkout feat/expected-na-and-tiny-functional`. Record the HEAD sha.
2. Full unit suite cold (not just the touched files): `cc-ci-run -m pytest tests/unit/ -q`. ALL must pass. A single failure → REJECT. Capture the count + any failure.
3. Diff regression review: `git diff origin/main...HEAD`. Confirm: (a) `level.py` RUNGS is the 4 essential rungs; (b) `derive_rungs` no longer emits integration/recipe_local; (c) the SSO VERDICT logic in `run_recipe_ci.py` (`sso_dep_unverified`, the F2-11 fail-the-run block) is UNCHANGED, i.e. the PR must not have weakened SSO enforcement; (d) no test was weakened/skipped; (e) no secrets.
4. Live end-to-end harness run, FULL stages on the real CI server: `RECIPE=custom-html-tiny STAGES=install,upgrade,backup,restore,custom cc-ci-run runner/run_recipe_ci.py` (set `CCCI_RUNS_DIR` to a temp dir; source `.testenv`). Then read its `results.json` and assert:
   - `install` PASS and `upgrade` PASS: the upgrade tier MUST actually run and pass (essential rung; prove it's not silently skipped). The custom tier (functional serve test) PASS.
   - `level == 2`, `level_cap_rung == "backup_restore"`, `level_cap_reason` mentions L3 backup/restore.
   - `rungs` has exactly install/upgrade/backup_restore/functional, with no integration/recipe_local.
   - `skips.intentional == {"backup_restore": <reason>}`, `skips.unintentional == []`.
   - `backup_restore` is N/A because there is no `backupbot.backup` label (a real intentional skip, NOT a masked failure); confirm by inspecting the recipe compose.
   - `badge.svg` contains the muted `expected` third segment (not `gap?`).
   - The summary card renders and shows a green `INTENTIONAL SKIP` row for backup/restore with the reason.
5. Non-vacuity of the functional test: confirm it asserts the exact random bytes round-trip and a 404 on a random path (so a 200-everything stub would fail it). Read the test and reason about whether it could pass against a broken server. Bonus: confirm it writes via the volume mountpoint (the SWS image is shell-less).
6. Teardown + hygiene: after the run, confirm the harness left NO orphan stack/volume/container for custom-html-tiny (deploy-count == 1; clean teardown). Confirm no secret value appears in results.json.
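The `results.json` assertions from the live run can be scripted so the evidence is reproducible. The sketch below is one possible shape, not part of the harness: `check_results` and `check_results_file` are hypothetical helper names, `rungs` is assumed to be either a dict keyed by rung name or a list of rung names, and the `level_cap_reason` check is a loose substring match because the exact wording is not specified here.

```python
import json
from pathlib import Path

EXPECTED_RUNGS = {"install", "upgrade", "backup_restore", "functional"}


def check_results(results):
    """Return a list of human-readable failures; an empty list means every check held."""
    failures = []
    if results.get("level") != 2:
        failures.append(f"level is {results.get('level')!r}, expected 2")
    if results.get("level_cap_rung") != "backup_restore":
        failures.append(f"level_cap_rung is {results.get('level_cap_rung')!r}")
    # Loose match (assumption): the reason only needs to mention backup/restore.
    if "backup" not in str(results.get("level_cap_reason", "")).lower():
        failures.append("level_cap_reason does not mention backup/restore")
    # Iterating a dict yields its keys, so this covers both assumed shapes of rungs.
    if set(results.get("rungs", ())) != EXPECTED_RUNGS:
        failures.append("rungs is not exactly the four essential rungs")
    skips = results.get("skips", {})
    intentional = skips.get("intentional", {})
    if set(intentional) != {"backup_restore"} or not intentional.get("backup_restore"):
        failures.append("skips.intentional must be exactly backup_restore with a reason")
    if skips.get("unintentional") != []:
        failures.append("skips.unintentional must be an empty list")
    return failures


def check_results_file(path):
    """Load results.json from the run directory and apply the checks above."""
    return check_results(json.loads(Path(path).read_text()))
```

Any non-empty return value is grounds for REJECT. The stage verdicts, the badge, the card, and the teardown state still need to be inspected separately.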
Verdict
Return a clear PASS or REJECT with the evidence for each numbered step (the HEAD sha, the unit count, the key results.json fields, the upgrade-tier verdict, the teardown state). REJECT if ANYTHING is unproven. Do not merge — the orchestrator merges on your PASS.
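The default-to-REJECT rule can be made mechanical. The following is a minimal sketch under stated assumptions: the step keys in `REQUIRED_STEPS`, the `StepResult` type, and the `verdict` function are hypothetical names chosen to mirror the six verification steps, not anything defined in cc-ci.

```python
from dataclasses import dataclass

# Hypothetical keys, one per verification step in the plan.
REQUIRED_STEPS = ("checkout", "unit_suite", "diff_review",
                  "live_run", "non_vacuity", "teardown")


@dataclass
class StepResult:
    passed: bool
    evidence: str  # e.g. the HEAD sha, the unit count, key results.json fields


def verdict(results):
    """Default to REJECT; PASS only if every required step passed with evidence."""
    for step in REQUIRED_STEPS:
        result = results.get(step)
        # A missing step, a failed step, or a pass without evidence is unproven.
        if result is None or not result.passed or not result.evidence.strip():
            return "REJECT"
    return "PASS"
```

A step that passed but recorded no evidence is treated as a failure, matching the rule that anything unproven means REJECT.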