# REVIEW — Phase 2b (Adversary) — confirm minimal deploy budget

**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`
**Loop state for THIS phase:** STATUS-2b / BACKLOG-2b / REVIEW-2b / JOURNAL-2b (DECISIONS.md shared).
Phase 1*/2 STATUS/BACKLOG/REVIEW files are other phases' state, not this phase's.

## Standing state
- **No Phase-2b gate CLAIMED yet.** As of @2026-05-31T05:33Z there is no STATUS-2b.md, no
  `docs/perf/deploys.md`/DECISIONS Phase-2b note, and no B1–B4 claim. The Builder is still finishing
  Phase 2 (plausible Q4.7b + drone Q4.10 + Q5; Phase-2 STATUS not yet `## DONE`).
- **Queue dependency (plan §0 / status line):** Phase 2b is documented as starting *after* Phase 2
  reaches `## DONE`. The Operator kicked off the Phase-2b Adversary loop now (manual transition). Phase-2b
  DoD (B1–B4) is independent of Phase-2 completion (it is a property of the already-existing harness),
  so the cold analysis below can be done now; the formal verdict awaits the Builder's claim.
- No VETO from this phase. (The standing Phase-2 DONE VETO lives in REVIEW-2.md and is unaffected.)

## Pre-claim independent cold analysis (anti-anchoring baseline) @2026-05-31T05:33Z
Done from a cold read of the harness ONLY (code + git), with NO Builder narrative consulted. This is
my own minimal-budget expectation, to be compared against whatever the Builder later claims.

### Deploy call sites (every `lifecycle.deploy_app` = one `abra app new` = one counted deploy)
`_record_deploy()` (lifecycle.py:107) is invoked ONLY from inside `deploy_app` (lifecycle.py:211), so
the run's deploy-count == number of `deploy_app` calls made while `CCCI_DEPLOY_COUNT_FILE` is set. Call sites:
1. `run_recipe_ci.py:819`: **the single base deploy** of the recipe under test. `version=base` where
   `base = UPGRADE_BASE_VERSION-or-previous if "upgrade" in stages else target`. Shared by ALL tiers.
2. `runner/harness/deps.py:100`: **one deploy per COLD declared dependency** (warm/live deps deploy 0;
   they only get a per-run realm).
3. `run_recipe_ci.py:699`: **WC5 promote-on-green-cold reseed**, NOT part of the test sequence and
   NOT counted: at line 697 the run pops `CCCI_DEPLOY_COUNT_FILE` (countfile already asserted+removed
   at 958–961) before this deploy. It is a post-run, green-cold-only canonical warm-cache reseed.

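The counting rule behind these call sites can be sketched as follows. This is a minimal illustration, not the harness's code: `record_deploy`, `deploy_app`, and `demo` are hypothetical stand-ins, the countfile format (a bare integer) is assumed, and treating an unset `CCCI_DEPLOY_COUNT_FILE` as "do not count" is inferred from the observation that popping the variable leaves the reseed uncounted.

```python
import os
import tempfile
from pathlib import Path

COUNT_ENV = "CCCI_DEPLOY_COUNT_FILE"  # env var name as quoted in this review


def record_deploy() -> None:
    """Increment the per-run deploy counter if a countfile is configured."""
    path = os.environ.get(COUNT_ENV)
    if not path:
        return  # assumption: no countfile configured means the deploy is uncounted
    countfile = Path(path)
    # Assumed format: a single integer; a missing or empty file counts as 0.
    text = countfile.read_text().strip() if countfile.exists() else ""
    current = int(text) if text else 0
    countfile.write_text(f"{current + 1}\n")


def deploy_app(name: str) -> None:
    """Hypothetical stand-in for lifecycle.deploy_app: count first, then deploy."""
    record_deploy()
    # The real harness would create the app here; omitted in this sketch.


def demo():
    """Replay call sites 1-3 and return (count before reseed, count after reseed)."""
    previous = os.environ.get(COUNT_ENV)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            countfile = Path(tmp) / "deploy-count"
            os.environ[COUNT_ENV] = str(countfile)
            deploy_app("recipe-under-test")   # call site 1: base deploy
            deploy_app("cold-dependency")     # call site 2: one cold dep
            counted = int(countfile.read_text())
            os.environ.pop(COUNT_ENV)         # mirrors the pop before the reseed
            deploy_app("warm-cache-reseed")   # call site 3: not counted
            after_reseed = int(countfile.read_text())
            return counted, after_reseed
    finally:
        # Restore whatever the caller had configured.
        os.environ.pop(COUNT_ENV, None)
        if previous is not None:
            os.environ[COUNT_ENV] = previous
```

Under these assumptions `demo()` leaves the counter at 2 both before and after the reseed, which is the behaviour the call-site list describes for a run with one cold dependency.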
### Tiers that do NOT add a deploy (deploy-sharing — the heart of the budget)
`_perform_op` (run_recipe_ci.py:242, docstring 246–251 explicit): "None of these call deploy_app, so
the deploy-count guard (DG4.1) stays 1."
- **upgrade** → `generic.perform_upgrade` = in-place `abra app deploy --force --chaos` to PR-head
  (HC1 reconciliation, real old→new crossover); reuses the base deploy, no new `app new`.
- **backup / restore** → operate on the same live deployment.
- **install** → has no op (assertion-only on the base deploy).
- **custom / OIDC wiring** → in-place `--chaos` redeploy (`_run_setup_custom_tests_hook`), not counted.

### Enforcement (B2)
`run_recipe_ci.py:958–1010`: reads countfile → `deploy_count`; computes
`expected_deploy_count = 1 + deps_deployed_count` (deps_deployed = cold deps only; warm excluded,
lines 982–984). Prints `RUN SUMMARY → deploy-count = N (expect M)`. If `deploy_count != expected` →
`overall = 1` + stderr `!! deploy-count N != M (DG4.1 violation)`. So a redundant `deploy_app` ANYWHERE
in the sequence fails the run. This is a genuine, non-vacuous guard.

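The guard logic described above reduces to a comparison plus two messages. The sketch below is an illustration under assumptions: `check_deploy_budget` is a hypothetical name, and returning the exit contribution stands in for the harness setting `overall = 1`. Only the formula and the message texts are taken from this review.

```python
import sys


def check_deploy_budget(deploy_count: int, cold_deps_deployed: int) -> int:
    """Return 0 if the observed deploy count matches the budget, else 1."""
    # Budget: one shared base deploy plus one deploy per cold dependency.
    expected = 1 + cold_deps_deployed
    print(f"RUN SUMMARY → deploy-count = {deploy_count} (expect {expected})")
    if deploy_count != expected:
        print(
            f"!! deploy-count {deploy_count} != {expected} (DG4.1 violation)",
            file=sys.stderr,
        )
        return 1
    return 0
```

Because the check is an equality and not an upper bound, a missing deploy fails the run just as a redundant one does.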
### My independent minimal-budget conclusion
Per-recipe test sequence: **`deploys == 1 (base, shared by install+upgrade+backup+restore+custom) + N_cold_deps`**,
enforced by DG4.1. This is **MINIMAL, and tighter than B1's stated expectation** of
`1 (base) + 1 (upgrade tier) + N_deps`: the upgrade tier needs NO separate deploy because the base
deploy IS the prior version and the upgrade is an in-place chaos reconcile. So B1's stated minimum is
conservative; the implementation already beats it. Nothing to remove; already minimal.

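As a worked comparison of the two formulas, the sketch below computes both budgets for a recipe's declared dependencies. The function names and the `"cold"`/`"warm"` state strings are hypothetical. It also assumes that `N_deps` in B1's nominal formula counts cold dependencies only, which the plan wording quoted here does not settle.

```python
from __future__ import annotations


def implemented_budget(deps: dict[str, str]) -> int:
    """One shared base deploy plus one deploy per dependency that starts cold."""
    cold = sum(1 for state in deps.values() if state == "cold")
    return 1 + cold


def plan_nominal_budget(deps: dict[str, str]) -> int:
    """B1's stated expectation: the same budget plus a separate upgrade-tier deploy."""
    return implemented_budget(deps) + 1


if __name__ == "__main__":
    scenarios = {
        "no deps": {},
        "one cold dep": {"keycloak": "cold"},
        "one warm dep": {"keycloak": "warm"},
    }
    for label, deps in scenarios.items():
        print(label, implemented_budget(deps), plan_nominal_budget(deps))
```

In every scenario the implemented budget is exactly one deploy below the nominal one, which is the shared upgrade-tier deploy.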
### Open item for the Builder's B1/B4 doc (must be addressed honestly, not a defect yet)
The B1 doc must NOT claim "exactly 1+N_deps deploys per run, full stop" without noting the **WC5
green-cold reseed** (call site 3): on a green COLD run there is one additional uncounted `abra app new`
for canonical warm-cache maintenance. It is outside the test-sequence budget and is not redundant, but
B1 asks for "exactly how many deploy cycles happen and why each is necessary", so the doc must mention it
or it is materially incomplete. I will check the doc for this when claimed.

## Verdicts
### Gate 2b (B1–B4): **PASS** @2026-05-31T05:38Z (COLD-verified, claim commit `edf34e3`)
Verified from a fresh clone against the plan + code + my own pre-claim independent trace above (which
I formed BEFORE reading the claim; the claim then matched it, incl. the WC5 caveat I'd flagged). I did
NOT read JOURNAL-2b before this verdict (anti-anchoring); not needed.

**B1 (budget documented & minimal): PASS.** `docs/perf/deploys.md` documents the per-recipe budget as
`deploys == 1 (base) + N_cold_deps`, mapping each deploy to its justification: one base deploy shared by
install→upgrade→backup→restore→custom; +1 per COLD dep (warm=0); upgrade/backup/restore add none. This
matches my independent cold trace exactly. It is minimal, and correctly noted as *tighter* than the
plan's nominal `1+1(upgrade)+N` because the base deploy IS the prior-version deploy and upgrade is an
in-place chaos reconcile. The doc also honestly documents the out-of-budget **WC5 green-cold reseed**
(the completeness item I flagged in BUILDER-INBOX) and the `--quick` lane. No redundant deploy exists.

**B2 (enforced, not just claimed): PASS.** DG4.1 guard verified live in code: `_record_deploy`
(lifecycle.py:107-117) genuinely reads+writes `n+1` and is called once at the top of every `deploy_app`
(lifecycle.py:211), so the guard is **non-vacuous** (if a recipe deployed twice, count=2≠expected → red).
`expected = 1 + deps_deployed_count` with warm deps excluded (run_recipe_ci.py:982-984); RUN SUMMARY prints
`deploy-count = N (expect M)` (:986); mismatch → `overall=1` non-zero exit (:1005-1010). Confirmed
upgrade (`chaos_redeploy`, lifecycle.py:418), backup/restore (`perform_backup`/`perform_restore`,
generic.py:282/287) do NOT call `deploy_app` → not counted.

**B3 (no test weakened to save a deploy): PASS.** The entire Phase-2b claim is **doc-only**:
`git show --stat edf34e3` touches only `docs/` and `machine-docs/`; **zero `runner/` or `tests/` changes**.
So the harness is byte-identical to the Phase-2-verified state; nothing could have been softened to
share a deploy. Confirmed positively in a real run (below): all five tiers ran their real
generic+overlay assertions against the single shared deployment.

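The doc-only check in B3 can be made mechanical. The sketch below is one possible way to do it, not what was run for this verdict: it swaps `git show --stat` for `git show --name-only --format=` because bare paths are easier to parse, and `changed_paths` and `harness_paths_touched` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess
from pathlib import PurePosixPath

HARNESS_DIRS = ("runner", "tests")  # the directories B3 cares about


def changed_paths(commit: str, repo: str = ".") -> list[str]:
    """List the files a commit touches, one repo-relative path per entry."""
    out = subprocess.run(
        ["git", "-C", repo, "show", "--name-only", "--format=", commit],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def harness_paths_touched(paths: list[str]) -> list[str]:
    """Return the paths whose top-level directory is a harness directory."""
    touched = []
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if parts and parts[0] in HARNESS_DIRS:
            touched.append(raw)
    return touched
```

A doc-only commit is then one for which `harness_paths_touched(changed_paths("edf34e3"))` returns an empty list. Matching on the first path component keeps a directory such as `runner-notes/` from being mistaken for `runner/`.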
**B4 (recorded): PASS.** `docs/perf/deploys.md` (90 lines) + DECISIONS.md:1137 "Phase 2b — Per-recipe
deploy budget (SETTLED 2026-05-31)" pointer. States explicitly it was already minimal (no removal).

**Dynamic corroboration (observed behavior, not the Builder's word):**
- No-dep, FRESH real run: `cc-ci:/root/ccci-mumble-f214c.log` RUN SUMMARY shows
  `deploy-count = 1 (expect 1)`; install/upgrade/backup/restore/custom **all pass**; upgrade tier
  ran (TIER: upgrade generic=run), backup/restore operated on the same app. One deploy, five tiers. ✅
- Cold-dep: my OWN prior cold verdict (REVIEW-2:114,152) recorded `deploy-count = 2 (expect 2: parent + 1 dep)`,
  DEPS teardown clean (lasuite-docs + cold keycloak). ✅
- I deliberately did NOT launch a fresh 40-min full run: this is a doc-only, no-behavior-change
  confirmation gate; the "check" is "budget == 1+N_deps and is enforced," which I re-executed via an
  independent static re-trace + reading a genuine recent run's own RUN SUMMARY output (mumble) + my own
  prior observed cold verdict (lasuite-docs). That is cold acceptance against observable behavior, not
  trust. A fresh run would only re-print `deploy-count = 1`, which the mumble log already shows.

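Reading the counts back out of a run log, as done for the mumble and lasuite-docs runs, can be sketched with a small parser. It assumes the log contains the fragment `deploy-count = N (expect M` exactly as quoted in this review; the rest of the RUN SUMMARY layout is not assumed, and `parse_deploy_summary` is a hypothetical name.

```python
from __future__ import annotations

import re

# Matches both "(expect 1)" and "(expect 2: parent + 1 dep)".
SUMMARY_RE = re.compile(r"deploy-count = (\d+) \(expect (\d+)")


def parse_deploy_summary(log_text: str) -> tuple[int, int] | None:
    """Return (observed, expected) from the first summary line, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A run corroborates the budget when the two numbers are equal and the expected value matches `1 + N_cold_deps` for that recipe.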
**No VETO from Phase 2b.** All four DoD items hold. The Builder may write `## DONE` to STATUS-2b.

**Sequencing note (not a blocker for this phase's DONE):** Phase 2b is documented as queued behind
Phase 2 `## DONE`, and Phase 2 is NOT yet done (plausible Q4.7b / drone Q4.10 / Q5 remain; standing
DONE VETO in REVIEW-2.md). Phase-2b DoD is independent of that and verified now. Whether to flip
Phase-2b DONE before Phase-2 DONE is an operator sequencing call, not a verification gap.

_Post-verdict: did not need JOURNAL-2b._