REVIEW — Phase 2b (Adversary) — confirm minimal deploy budget

Phase plan (SSOT): /srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md

Loop state for THIS phase: STATUS-2b / BACKLOG-2b / REVIEW-2b / JOURNAL-2b (DECISIONS.md shared). Phase 1*/2 STATUS/BACKLOG/REVIEW files are other phases' state, not this phase's.

Standing state

  • No Phase-2b gate CLAIMED yet. As of @2026-05-31T05:33Z there is no STATUS-2b.md, no docs/perf/deploys.md or DECISIONS Phase-2b note, and no B1-B4 claim. The Builder is still finishing Phase 2 (plausible Q4.7b + drone Q4.10 + Q5; Phase-2 STATUS not yet ## DONE).
  • Queue dependency (plan §0 / status line): Phase 2b is documented as starting after Phase 2 reaches ## DONE. Operator kicked off the Phase-2b Adversary loop now (manual transition). Phase-2b DoD (B1-B4) is independent of Phase-2 completion (it is a property of the already-existing harness), so the cold analysis below can be done now; the formal verdict awaits the Builder's claim.
  • No VETO from this phase. (The standing Phase-2 DONE VETO lives in REVIEW-2.md and is unaffected.)

Pre-claim independent cold analysis (anti-anchoring baseline) @2026-05-31T05:33Z

Done from a cold read of the harness ONLY (code + git), with NO Builder narrative consulted — this is my own minimal-budget expectation, to be compared against whatever the Builder later claims.

Deploy call sites (every lifecycle.deploy_app = one abra app new = one counted deploy)

_record_deploy() (lifecycle.py:107) is invoked ONLY from inside deploy_app (lifecycle.py:211), so the run's deploy-count == number of deploy_app calls during the run. Call sites:

  1. run_recipe_ci.py:819: the single base deploy of the recipe under test. version=base where base = UPGRADE_BASE_VERSION-or-previous if "upgrade" in stages else target. Shared by ALL tiers.
  2. runner/harness/deps.py:100: one deploy per COLD declared dependency (warm/live deps deploy 0; they only get a per-run realm).
  3. run_recipe_ci.py:699: WC5 promote-on-green-cold reseed. NOT part of the test sequence and NOT counted: at line 697 the run pops CCCI_DEPLOY_COUNT_FILE (countfile already asserted+removed at 958-961) before this deploy. It is a post-run, green-cold-only canonical warm-cache reseed.
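
The counting rule behind these three call sites can be sketched as follows. This is a minimal illustration, not the harness code: only the env var name CCCI_DEPLOY_COUNT_FILE and the read-then-write-n+1 behaviour come from this review; the function bodies, the simulate_run helper, and the app names are hypothetical.

```python
import os
import tempfile
from pathlib import Path

COUNT_ENV = "CCCI_DEPLOY_COUNT_FILE"  # env var name as quoted in this review


def record_deploy() -> None:
    """Bump the on-disk counter by one; do nothing if counting is off."""
    path = os.environ.get(COUNT_ENV)
    if not path:
        return
    count_file = Path(path)
    current = int(count_file.read_text().strip() or 0) if count_file.exists() else 0
    count_file.write_text(f"{current + 1}\n")


def deploy_app(name: str) -> None:
    """Hypothetical stand-in for lifecycle.deploy_app."""
    record_deploy()  # counted before any deploy work happens
    # A real harness would create and deploy the app here.


def simulate_run() -> tuple[int, int]:
    """Return (count after the test sequence, count after the reseed)."""
    saved = os.environ.get(COUNT_ENV)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            count_file = Path(tmp) / "deploy-count"
            os.environ[COUNT_ENV] = str(count_file)
            deploy_app("recipe-base")   # call site 1: base deploy
            deploy_app("cold-dep")      # call site 2: one cold dependency
            counted = int(count_file.read_text())
            os.environ.pop(COUNT_ENV)   # counting switched off before the reseed
            deploy_app("warm-reseed")   # call site 3: deploys, but is not counted
            after_reseed = int(count_file.read_text())
            return counted, after_reseed
    finally:
        # Restore whatever the caller had set, so the sketch has no side effects.
        if saved is not None:
            os.environ[COUNT_ENV] = saved
        else:
            os.environ.pop(COUNT_ENV, None)
```

In this sketch the reseed leaves the counter untouched only because the env var was popped first; the real run additionally removes the countfile before that point, which the sketch does not model.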

Tiers that do NOT add a deploy (deploy-sharing — the heart of the budget)

_perform_op (run_recipe_ci.py:242, docstring 246-251 explicit): "None of these call deploy_app, so the deploy-count guard (DG4.1) stays 1."

  • upgrade → generic.perform_upgrade = in-place abra app deploy --force --chaos to PR-head (HC1 reconciliation, real old→new crossover); it reuses the base deploy, no new app new.
  • backup / restore → operate on the same live deployment.
  • install → has no op (assertion-only on the base deploy).
  • custom / OIDC wiring → in-place --chaos redeploy (_run_setup_custom_tests_hook), not counted.

Enforcement (B2)

run_recipe_ci.py:958-1010: reads countfile → deploy_count; computes expected_deploy_count = 1 + deps_deployed_count (deps_deployed = cold deps only; warm excluded, 984/982). Prints RUN SUMMARY → deploy-count = N (expect M). If deploy_count != expected → overall = 1, plus a stderr line: !! deploy-count N != M (DG4.1 violation). So a redundant deploy_app ANYWHERE in the sequence fails the run. This is a genuine, non-vacuous guard.
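
The shape of that guard, as described above, is roughly the following. It is a hedged sketch rather than the code at run_recipe_ci.py:958-1010: the function name check_deploy_budget and its parameters are hypothetical, and only the expected-count formula and the two message formats are taken from this review.

```python
import sys


def check_deploy_budget(deploy_count: int, cold_deps_deployed: int) -> int:
    """Return 0 if the run stayed on budget, 1 otherwise.

    The expected count is one base deploy plus one per cold dependency;
    warm dependencies are assumed to be excluded by the caller.
    """
    expected = 1 + cold_deps_deployed
    print(f"deploy-count = {deploy_count} (expect {expected})")
    if deploy_count != expected:
        print(
            f"!! deploy-count {deploy_count} != {expected} (DG4.1 violation)",
            file=sys.stderr,
        )
        return 1
    return 0
```

Because the comparison is an exact equality, the sketch fails both a redundant deploy (count too high) and a missing one (count too low), which is what makes the guard non-vacuous.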

My independent minimal-budget conclusion

Per-recipe test sequence: deploys == 1 (base, shared by install+upgrade+backup+restore+custom) + N_cold_deps, enforced by DG4.1. This is MINIMAL — and tighter than B1's stated expectation of 1 (base) + 1 (upgrade tier) + N_deps: the upgrade tier needs NO separate deploy because the base deploy IS the prior version and the upgrade is an in-place chaos reconcile. So B1's stated minimum is conservative; the implementation already beats it. Nothing to remove — already minimal.
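
The gap between the two budgets is plain arithmetic and can be written out as below. Both function names are hypothetical, and the sketch assumes that the plan's N_deps counts every declared dependency while the implemented budget counts only cold ones.

```python
def implemented_budget(dep_states: list[str]) -> int:
    """One shared base deploy plus one deploy per cold dependency."""
    return 1 + sum(1 for state in dep_states if state == "cold")


def nominal_plan_budget(dep_states: list[str]) -> int:
    """B1's stated expectation: base + separate upgrade deploy + every dep.

    Assumption: N_deps here means all declared deps, warm or cold.
    """
    return 1 + 1 + len(dep_states)


# Compare the two budgets for a few dependency layouts.
for deps in ([], ["cold"], ["warm", "cold"]):
    print(deps, implemented_budget(deps), nominal_plan_budget(deps))
```

Under these assumptions the implemented budget is always at least one deploy lower than the nominal one, since the upgrade tier never needs its own deploy.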

Open item for the Builder's B1/B4 doc (must be addressed honestly, not a defect yet)

The B1 doc must NOT claim "exactly 1+N_deps deploys per run, full stop" without noting the WC5 green-cold reseed (call site 3): on a green COLD run there is one additional uncounted abra app new for canonical warm-cache maintenance. It is outside the test-sequence budget and is not redundant, but B1 asks for "exactly how many deploy cycles happen and why each is necessary" — the doc must mention it or it is materially incomplete. I will check the doc for this when claimed.

Verdicts

Gate 2b (B1-B4): PASS @2026-05-31T05:38Z (COLD-verified, claim commit edf34e3)

Verified from a fresh clone against the plan + code + my own pre-claim independent trace above (which I formed BEFORE reading the claim — the claim then matched it, incl. the WC5 caveat I'd flagged). I did NOT read JOURNAL-2b before this verdict (anti-anchoring); not needed.

B1 — budget documented & minimal: PASS. docs/perf/deploys.md documents the per-recipe budget as deploys == 1 (base) + N_cold_deps, mapping each deploy to its justification: one base deploy shared by install→upgrade→backup→restore→custom; +1 per COLD dep (warm=0); upgrade/backup/restore add none. This matches my independent cold trace exactly. It is minimal — and correctly noted as tighter than the plan's nominal 1+1(upgrade)+N because the base deploy IS the prior-version deploy and upgrade is an in-place chaos reconcile. The doc also honestly documents the out-of-budget WC5 green-cold reseed (the completeness item I flagged in BUILDER-INBOX) and the --quick lane. No redundant deploy exists.

B2 — enforced, not just claimed: PASS. DG4.1 guard verified live in code: _record_deploy (lifecycle.py:107-117) genuinely reads+writes n+1 and is called once at the top of every deploy_app (lifecycle.py:211) — non-vacuous (if a recipe deployed twice, count=2≠expected → red). expected = 1 + deps_deployed_count with warm deps excluded (run_recipe_ci.py:982-984); RUN SUMMARY prints deploy-count = N (expect M) (:986); mismatch → overall=1 non-zero exit (:1005-1010). Confirmed upgrade (chaos_redeploy, lifecycle.py:418), backup/restore (perform_backup/perform_restore, generic.py:282/287) do NOT call deploy_app → not counted.

B3: no test weakened to save a deploy: PASS. The entire Phase-2b claim is doc-only: git show --stat edf34e3 touches only docs/, machine-docs/; zero runner/ or tests/ changes. So the harness is byte-identical to the Phase-2-verified state; nothing could have been softened to share a deploy. Confirmed positively in a real run (below): all five tiers ran their real generic+overlay assertions against the single shared deployment.

B4 — recorded: PASS. docs/perf/deploys.md (90 lines) + DECISIONS.md:1137 "Phase 2b — Per-recipe deploy budget (SETTLED 2026-05-31)" pointer. States explicitly it was already minimal (no removal).

Dynamic corroboration (observed behavior, not the Builder's word):

  • No-dep, FRESH real run — cc-ci:/root/ccci-mumble-f214c.log RUN SUMMARY: deploy-count = 1 (expect 1); install/upgrade/backup/restore/custom all pass; upgrade tier ran (TIER: upgrade generic=run), backup/restore operated on the same app. One deploy, five tiers.
  • Cold-dep — my OWN prior cold verdict REVIEW-2:114,152: deploy-count = 2 (expect 2: parent + 1 dep), DEPS teardown clean (lasuite-docs + cold keycloak).
  • I deliberately did NOT launch a fresh 40-min full run: this is a doc-only, no-behavior-change confirmation gate; the "check" is "budget == 1+N_deps and is enforced," which I re-executed via an independent static re-trace + reading a genuine recent run's own RUN SUMMARY output (mumble) + my own prior observed cold verdict (lasuite-docs). That is cold acceptance against observable behavior, not trust. A fresh run would only re-print deploy-count = 1 which the mumble log already shows.

No VETO from Phase 2b. All four DoD items hold. The Builder may write ## DONE to STATUS-2b.

Sequencing note (not a blocker for this phase's DONE): Phase 2b is documented as queued behind Phase 2 ## DONE, and Phase 2 is NOT yet done (plausible Q4.7b / drone Q4.10 / Q5 remain; standing DONE VETO in REVIEW-2.md). Phase-2b DoD is independent of that and verified now. Whether to flip Phase-2b DONE before Phase-2 DONE is an operator sequencing call, not a verification gap.

Post-verdict: did not need JOURNAL-2b.