Files
cc-ci/machine-docs/REVIEW-2b.md

114 lines
8.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# REVIEW — Phase 2b (Adversary) — confirm minimal deploy budget
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`
**Loop state for THIS phase:** STATUS-2b / BACKLOG-2b / REVIEW-2b / JOURNAL-2b (DECISIONS.md shared).
Phase 1*/2 STATUS/BACKLOG/REVIEW files are other phases' state — not this phase's.
## Standing state
- **No Phase-2b gate CLAIMED yet.** As of @2026-05-31T05:33Z there is no STATUS-2b.md, no
`docs/perf/deploys.md`/DECISIONS Phase-2b note, and no B1B4 claim. The Builder is still finishing
Phase 2 (plausible Q4.7b + drone Q4.10 + Q5; Phase-2 STATUS not yet `## DONE`).
- **Queue dependency (plan §0 / status line):** Phase 2b is documented as starting *after* Phase 2
reaches `## DONE`. Operator kicked off the Phase-2b Adversary loop now (manual transition). Phase-2b
DoD (B1B4) is independent of Phase-2 completion — it is a property of the already-existing harness —
so the cold analysis below can be done now; the formal verdict awaits the Builder's claim.
- No VETO from this phase. (The standing Phase-2 DONE VETO lives in REVIEW-2.md and is unaffected.)
## Pre-claim independent cold analysis (anti-anchoring baseline) @2026-05-31T05:33Z
Done from a cold read of the harness ONLY (code + git), with NO Builder narrative consulted — this is
my own minimal-budget expectation, to be compared against whatever the Builder later claims.
### Deploy call sites (every `lifecycle.deploy_app` = one `abra app new` = one counted deploy)
`_record_deploy()` (lifecycle.py:107) is invoked ONLY from inside `deploy_app` (lifecycle.py:211), so
the run's deploy-count == number of `deploy_app` calls during the run. Call sites:
1. `run_recipe_ci.py:819`**the single base deploy** of the recipe under test. `version=base` where
`base = UPGRADE_BASE_VERSION-or-previous if "upgrade" in stages else target`. Shared by ALL tiers.
2. `runner/harness/deps.py:100`**one deploy per COLD declared dependency** (warm/live deps deploy 0;
they only get a per-run realm).
3. `run_recipe_ci.py:699`**WC5 promote-on-green-cold reseed** — NOT part of the test sequence and
NOT counted: at line 697 the run pops `CCCI_DEPLOY_COUNT_FILE` (countfile already asserted+removed
at 958961) before this deploy. It is a post-run, green-cold-only canonical warm-cache reseed.
### Tiers that do NOT add a deploy (deploy-sharing — the heart of the budget)
`_perform_op` (run_recipe_ci.py:242, docstring 246251 explicit): "None of these call deploy_app, so
the deploy-count guard (DG4.1) stays 1."
- **upgrade** → `generic.perform_upgrade` = in-place `abra app deploy --force --chaos` to PR-head
(HC1 reconciliation, real old→new crossover) — reuses the base deploy, no new `app new`.
- **backup / restore** → operate on the same live deployment.
- **install** → has no op (assertion-only on the base deploy).
- **custom / OIDC wiring** → in-place `--chaos` redeploy (`_run_setup_custom_tests_hook`), not counted.
### Enforcement (B2)
`run_recipe_ci.py:9581010`: reads countfile → `deploy_count`; computes
`expected_deploy_count = 1 + deps_deployed_count` (deps_deployed = cold deps only; warm excluded,
984/982). Prints `RUN SUMMARY → deploy-count = N (expect M)`. If `deploy_count != expected`
`overall = 1` + stderr `!! deploy-count N != M (DG4.1 violation)`. So a redundant `deploy_app` ANYWHERE
in the sequence fails the run. This is a genuine, non-vacuous guard.
### My independent minimal-budget conclusion
Per-recipe test sequence: **`deploys == 1 (base, shared by install+upgrade+backup+restore+custom) +
N_cold_deps`**, enforced by DG4.1. This is **MINIMAL — and tighter than B1's stated expectation** of
`1 (base) + 1 (upgrade tier) + N_deps`: the upgrade tier needs NO separate deploy because the base
deploy IS the prior version and the upgrade is an in-place chaos reconcile. So B1's stated minimum is
conservative; the implementation already beats it. Nothing to remove — already minimal.
### Open item for the Builder's B1/B4 doc (must be addressed honestly, not a defect yet)
The B1 doc must NOT claim "exactly 1+N_deps deploys per run, full stop" without noting the **WC5
green-cold reseed** (call site 3): on a green COLD run there is one additional uncounted `abra app new`
for canonical warm-cache maintenance. It is outside the test-sequence budget and is not redundant, but
B1 asks for "exactly how many deploy cycles happen and why each is necessary" — the doc must mention it
or it is materially incomplete. I will check the doc for this when claimed.
## Verdicts
### Gate 2b (B1B4): **PASS** @2026-05-31T05:38Z (COLD-verified, claim commit `edf34e3`)
Verified from a fresh clone against the plan + code + my own pre-claim independent trace above (which
I formed BEFORE reading the claim — the claim then matched it, incl. the WC5 caveat I'd flagged). I did
NOT read JOURNAL-2b before this verdict (anti-anchoring); not needed.
**B1 — budget documented & minimal: PASS.** `docs/perf/deploys.md` documents the per-recipe budget as
`deploys == 1 (base) + N_cold_deps`, mapping each deploy to its justification: one base deploy shared by
install→upgrade→backup→restore→custom; +1 per COLD dep (warm=0); upgrade/backup/restore add none. This
matches my independent cold trace exactly. It is minimal — and correctly noted as *tighter* than the
plan's nominal `1+1(upgrade)+N` because the base deploy IS the prior-version deploy and upgrade is an
in-place chaos reconcile. The doc also honestly documents the out-of-budget **WC5 green-cold reseed**
(the completeness item I flagged in BUILDER-INBOX) and the `--quick` lane. No redundant deploy exists.
**B2 — enforced, not just claimed: PASS.** DG4.1 guard verified live in code: `_record_deploy`
(lifecycle.py:107-117) genuinely reads+writes `n+1` and is called once at the top of every `deploy_app`
(lifecycle.py:211) — **non-vacuous** (if a recipe deployed twice, count=2≠expected → red). `expected =
1 + deps_deployed_count` with warm deps excluded (run_recipe_ci.py:982-984); RUN SUMMARY prints
`deploy-count = N (expect M)` (:986); mismatch → `overall=1` non-zero exit (:1005-1010). Confirmed
upgrade (`chaos_redeploy`, lifecycle.py:418), backup/restore (`perform_backup`/`perform_restore`,
generic.py:282/287) do NOT call `deploy_app` → not counted.
**B3 — no test weakened to save a deploy: PASS.** The entire Phase-2b claim is **doc-only**
`git show --stat edf34e3` touches only `docs/`, `machine-docs/`; **zero `runner/` or `tests/` changes**.
So the harness is byte-identical to the Phase-2-verified state; nothing could have been softened to
share a deploy. Confirmed positively in a real run (below): all five tiers ran their real
generic+overlay assertions against the single shared deployment.
**B4 — recorded: PASS.** `docs/perf/deploys.md` (90 lines) + DECISIONS.md:1137 "Phase 2b — Per-recipe
deploy budget (SETTLED 2026-05-31)" pointer. States explicitly it was already minimal (no removal).
**Dynamic corroboration (observed behavior, not the Builder's word):**
- No-dep, FRESH real run — `cc-ci:/root/ccci-mumble-f214c.log` RUN SUMMARY:
`deploy-count = 1 (expect 1)`; install/upgrade/backup/restore/custom **all pass**; upgrade tier
ran (TIER: upgrade generic=run), backup/restore operated on the same app. One deploy, five tiers. ✅
- Cold-dep — my OWN prior cold verdict REVIEW-2:114,152: `deploy-count = 2 (expect 2: parent + 1 dep)`,
DEPS teardown clean (lasuite-docs + cold keycloak). ✅
- I deliberately did NOT launch a fresh 40-min full run: this is a doc-only, no-behavior-change
confirmation gate; the "check" is "budget == 1+N_deps and is enforced," which I re-executed via an
independent static re-trace + reading a genuine recent run's own RUN SUMMARY output (mumble) + my own
prior observed cold verdict (lasuite-docs). That is cold acceptance against observable behavior, not
trust. A fresh run would only re-print `deploy-count = 1` which the mumble log already shows.
**No VETO from Phase 2b.** All four DoD items hold. The Builder may write `## DONE` to STATUS-2b.
**Sequencing note (not a blocker for this phase's DONE):** Phase 2b is documented as queued behind
Phase 2 `## DONE`, and Phase 2 is NOT yet done (plausible Q4.7b / drone Q4.10 / Q5 remain; standing
DONE VETO in REVIEW-2.md). Phase-2b DoD is independent of that and verified now. Whether to flip
Phase-2b DONE before Phase-2 DONE is an operator sequencing call, not a verification gap.
_Post-verdict: did not need JOURNAL-2b._