claim(2b): deploy budget confirmed minimal+enforced (1+N_cold_deps); B1-B4 claimed

Phase 2b confirm-and-document outcome: per-recipe test-sequence deploy budget is already
minimal, `deploys == 1 (base, shared by all 5 tiers) + N_cold_deps`, and tighter than plan B1's
nominal `1+1(upgrade)+N` because the upgrade is an in-place chaos redeploy of the prev-version
base, not a separate deploy. Enforced as a hard failure by DG4.1
(expected = 1 + deps_deployed_count, run_recipe_ci.py:1005-1010). No redundant deploy found;
none removed (none existed).

- docs/perf/deploys.md: the budget record (B4), names the out-of-budget WC5 reseed
- STATUS-2b.md: B1-B4 claim with WHAT/HOW/EXPECTED/WHERE for cold verify
- JOURNAL-2b.md / BACKLOG-2b.md / DECISIONS.md: reasoning + settled note
- consume machine-docs/BUILDER-INBOX.md (Adversary heads-up processed)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:35:46 +00:00
parent 5f37de69e3
commit edf34e3e53
6 changed files with 267 additions and 25 deletions
--- a/docs/perf/deploys.md
+++ b/docs/perf/deploys.md
@@ -0,0 +1,90 @@
+# Per-recipe deploy budget (Phase 2b)
+
+**Question:** does a recipe's full CI test sequence redeploy more than necessary?
+**Answer:** No. The budget is already minimal — and in fact tighter than the nominal
+`1 base + 1 upgrade + N_deps` — because the upgrade tier shares the base deployment.
+
+## The budget
+
+For one cold `!testme`/`run_recipe_ci.py` run of a recipe:
+
+```
+deploys == 1 (base) + N_cold_deps
+```
+
+- **1 base deploy**, shared by **install → upgrade → backup → restore → custom/functional**.
+  All five tiers run against this single deployment. (`run_recipe_ci.py:819`,
+  `lifecycle.deploy_app` → `_record_deploy`.)
+- **+ 1 per COLD declared dependency** (e.g. an SSO provider deployed in-run), each deployed
+  **once** and reused (`deps.py:81-120`, one `deploy_app` per dep). A **live-warm** dep
+  (e.g. a resident keycloak that only gets a per-run realm, not a fresh deploy) contributes **0**.
+- The **upgrade tier adds NO deploy.** When the upgrade tier runs, the *base* deploy is done at
+  the **previous published version** (`run_recipe_ci.py:746-754`: `base = prev or target`), and the
+  upgrade is an **in-place `abra app deploy --chaos`** redeploy of the PR-head code onto that same
+  running app (`generic.perform_upgrade` → `lifecycle.chaos_redeploy`). `chaos_redeploy` does **not**
+  call `deploy_app`, so it is **not counted**. It is nonetheless the *real* upgrade that exercises
+  the PR's changes (HC1), verified by `assert_upgraded` on the chaos-version label.
+- **backup and restore add NO deploy.** They operate on the same running app
+  (`perform_backup`/`perform_restore` → `backup_app`/`restore_app`); neither calls `deploy_app`.
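+
+The tally above can be written out as a small model. This is an illustrative sketch only, not
+harness code: `Dep`, `TIER_DEPLOYS`, and `budget` are hypothetical names introduced here, and the
+`warm` flag stands in for however the harness distinguishes live-warm from cold deps.
+
+```python
+from dataclasses import dataclass
+
+
+# Hypothetical model for illustration; the real logic lives in run_recipe_ci.py.
+@dataclass(frozen=True)
+class Dep:
+    name: str
+    warm: bool  # True: live-warm resident dep, no fresh deploy in this run
+
+
+# Deploys each tier adds on top of the shared base deploy: none of them add any.
+TIER_DEPLOYS = {"install": 0, "upgrade": 0, "backup": 0, "restore": 0, "custom": 0}
+
+
+def budget(stages: list[str], deps: list[Dep]) -> int:
+    """Return 1 (base) + number of cold deps; the tiers themselves add nothing."""
+    base = 1
+    cold = sum(1 for dep in deps if not dep.warm)
+    tiers = sum(TIER_DEPLOYS[stage] for stage in stages)
+    return base + cold + tiers
+```
+
+For example, `budget(["install", "custom"], [Dep("keycloak", warm=False)])` gives 2, matching the
+cold-keycloak case, while the same call with `warm=True` gives 1.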
+
+### Reconciliation with the plan's nominal budget
+Plan B1 states the nominal minimum as `1 (base) + 1 (upgrade tier) + N_deps`, assuming the upgrade
+tier needs its own prior-version deploy. The cc-ci design is **stricter**: the base deploy *is* the
+prior-version deploy (when upgrade runs), and the upgrade is performed **in place**. So the
+prior-version deploy and the base deploy are the **same** deploy — there is no separate upgrade
+deploy. Net actual budget: `1 + N_cold_deps`. This is the deploy-sharing the operator expected.
+
+## Enforcement (not just claimed)
+
+The harness counts every `deploy_app()` (the only caller of `_record_deploy`, `lifecycle.py:107-211`)
+into a per-run countfile and **hard-fails** on a mismatch:
+
+- `expected_deploy_count = 1 + deps_deployed_count` — `run_recipe_ci.py:984`
+  (`deps_deployed_count` excludes warm deps, `:982-983`).
+- RUN SUMMARY prints `deploy-count = N (expect M)` — `run_recipe_ci.py:986`.
+- `if deploy_count != expected_deploy_count: … overall = 1` (DG4.1 violation, non-zero exit) —
+  `run_recipe_ci.py:1005-1010`.
+
+So every green run is a *proof* that the recipe stayed within budget: a redundant redeploy would
+push `deploy_count` above `expected` and turn the run red. No recipe can silently exceed the budget.
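+
+A minimal sketch of the count-and-guard mechanism described above, assuming the countfile holds
+one line per counted deploy. The file format is an assumption; only the `CCCI_DEPLOY_COUNT_FILE`
+variable name and the `1 + deps_deployed_count` formula come from this record. `record_deploy`,
+`read_deploy_count`, and `check_budget` are hypothetical stand-ins, not the harness's functions.
+
+```python
+import os
+
+COUNT_ENV = "CCCI_DEPLOY_COUNT_FILE"  # env var name as cited in this document
+
+
+def record_deploy() -> None:
+    """Append one marker line per counted deploy; do nothing if the env var is unset."""
+    path = os.environ.get(COUNT_ENV)
+    if path:
+        with open(path, "a", encoding="utf-8") as fh:
+            fh.write("deploy\n")
+
+
+def read_deploy_count(path: str) -> int:
+    """Count the marker lines written so far (assumed format: one line per deploy)."""
+    with open(path, encoding="utf-8") as fh:
+        return sum(1 for _ in fh)
+
+
+def check_budget(deploy_count: int, cold_deps_deployed: int) -> int:
+    """Return 0 when the count matches 1 + cold deps, else 1 (a non-zero exit status)."""
+    expected = 1 + cold_deps_deployed
+    print(f"deploy-count = {deploy_count} (expect {expected})")
+    return 0 if deploy_count == expected else 1
+```
+
+In this model, removing the variable from the environment before a deploy keeps that deploy out
+of the count, which mirrors how the WC5 promote is described as popping the countfile first.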
+
+### Verify from a cold clone
+```
+RECIPE=ghost        STAGES=install,upgrade,backup,restore,custom  cc-ci-run runner/run_recipe_ci.py
+RECIPE=lasuite-docs STAGES=install,custom                          cc-ci-run runner/run_recipe_ci.py
+```
+Expected RUN SUMMARY lines:
+- no-dep recipe (ghost): `deploy-count = 1 (expect 1)`, all tiers `pass`.
+- cold-dep recipe (lasuite-docs + cold keycloak): `deploy-count = 2 (expect 2)` —
+  `deps deployed: ['keycloak']` — all tiers `pass`, `DEPS teardown` clean.
+- warm-dep recipe (lasuite-meet, live-warm keycloak; not one of the two commands above):
+  `deploy-count = 1 (expect 1)`, `deps deployed: ['keycloak']`.
+
+Observed across all Phase 2 recipe runs: every recipe ran at `deploy-count = 1` (no/warm deps)
+or `deploy-count = 2 (expect 2)` (one cold dep). No run exceeded `1 + N_cold_deps`.
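+
+To check saved logs for the lines above, a small parser is enough. This is a hypothetical helper,
+not part of cc-ci; it assumes the summary line has exactly the `deploy-count = N (expect M)` shape
+quoted in this document.
+
+```python
+import re
+from typing import Optional, Tuple
+
+# Matches the RUN SUMMARY line shape quoted above, e.g. "deploy-count = 2 (expect 2)".
+SUMMARY_RE = re.compile(r"deploy-count = (\d+) \(expect (\d+)\)")
+
+
+def parse_deploy_count(log_text: str) -> Optional[Tuple[int, int]]:
+    """Return (actual, expected) from the first summary line found, or None if absent."""
+    match = SUMMARY_RE.search(log_text)
+    if match is None:
+        return None
+    return int(match.group(1)), int(match.group(2))
+
+
+def within_budget(log_text: str) -> bool:
+    """True only if a summary line exists and the actual count equals the expected one."""
+    parsed = parse_deploy_count(log_text)
+    return parsed is not None and parsed[0] == parsed[1]
+```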
+
+## No test weakened to share the deploy
+Sharing one deployment does **not** skip or soften any check:
+- install, upgrade, backup, restore, custom each still run their **real generic + overlay
+  assertions** against the shared app (`run_lifecycle_tier`, `ALL_STAGES`).
+- the upgrade is a **real** prev→PR-head crossover (`assert_upgraded` on the chaos-version label),
+  not a no-op.
+- backup→restore is **real data-integrity** (P4: seed → backup → mutate → restore → assert the
+  seeded data survived), not health-only.
+- per-run isolation/teardown is unchanged (`DEPS teardown`, app undeploy, volume/secret cleanup).
+
+Only the **deploy count** is constrained; coverage is untouched.
+
+## Out of scope of the budget (intentionally)
+- **WC5 canonical promote** (`promote_canonical`, `run_recipe_ci.py:682-707`) deploys a separate
+  `warm-<recipe>` app to (re)seed the warm-cache canonical. It runs **only** on a green cold run on
+  LATEST, **after** the deploy-count assertion, and explicitly **pops** `CCCI_DEPLOY_COUNT_FILE`
+  (`:697`) so it does not perturb the per-run test budget. It is warm-cache maintenance, not a test
+  deploy.
+- **`--quick` fast lane** (`run_quick`) reuses an existing data-warm canonical and is a separate
+  optimization path; the cold full run above is the budget of record.
+
+## Conclusion
+The per-recipe deploy budget is **already minimal** and **enforced**: `1 + N_cold_deps`, with the
+upgrade tier sharing the base deploy in place. No redundant deploy was found; none was removed
+because none existed. (Phase 2b, 2026-05-31.)
--- a/machine-docs/BACKLOG-2b.md
+++ b/machine-docs/BACKLOG-2b.md
@@ -4,6 +4,13 @@ The "## Build backlog" section is the Builder's. The "## Adversary findings" sec
 (only the Adversary closes items there, after re-test). Phase plan SSOT:
 `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`.

+## Build backlog
+- [x] **B1/B2/B3** — trace + confirm the per-recipe deploy budget is minimal and enforced
+      (`1 + N_cold_deps`; upgrade shares the base deploy in place). Done — claimed in STATUS-2b.md.
+- [x] **B4** — record the budget in `docs/perf/deploys.md` (+ DECISIONS.md pointer). Done.
+- No redundant deploy found → nothing to remove. Confirm-and-document outcome (no harness change).
+- Awaiting Adversary cold-verify of B1–B4 in REVIEW-2b.md.
+
 ## Adversary findings
 _(none open — Phase 2b not yet claimed. Pre-claim deploy-budget trace recorded in REVIEW-2b.md;
 the WC5 green-cold reseed is flagged there as a B1-doc-completeness item to check at claim time, not a
--- a/machine-docs/BUILDER-INBOX.md
+++ b/machine-docs/BUILDER-INBOX.md
@@ -1,25 +0,0 @@
-# BUILDER-INBOX (from Adversary)
-
-## @2026-05-31T05:33Z — Phase 2b Adversary loop is LIVE (non-urgent; Phase 2 still in flight)
-Heads-up, not a gate. Operator kicked off the Phase-2b Adversary loop. I created REVIEW-2b.md and
-BACKLOG-2b.md (my files). No verdict yet — nothing claimed. This is non-urgent: Phase 2 isn't `## DONE`
-(plausible Q4.7b / drone Q4.10 / Q5 remain), and Phase 2b is queued behind that per the plan.
-
-I did my own COLD trace of the deploy budget (REVIEW-2b.md) so I'm ready to verify B1–B4 fast when you
-claim. Two things to save you a round-trip:
-
-1. The budget is already minimal — and **tighter than B1's stated `1 + 1(upgrade) + N_deps`**: the
-   upgrade tier reuses the base deploy via the in-place `--force --chaos` reconcile (`_perform_op`
-   never calls `deploy_app`), so the real budget is `1 (base, shared by install+upgrade+backup+restore
-   +custom) + N_cold_deps`, enforced by DG4.1 (`expected = 1 + deps_deployed_count`). Likely outcome:
-   B1 = "already minimal," no redundant deploy to remove. Your B4 doc should state this and that B1's
-   plan-text minimum is conservative.
-
-2. **One completeness item I WILL check** in your B1/B4 doc: the WC5 promote-on-green-cold path
-   (`run_recipe_ci.py:699`) does one *additional, uncounted* `abra app new` on a green COLD run for
-   canonical warm-cache reseed (countfile is popped at :697 first). It's outside the test-sequence
-   budget and not redundant — but B1 asks for "exactly how many deploy cycles happen and why each is
-   necessary," so the doc must mention it or I'll mark it materially incomplete. Just name it.
-
-When you write `docs/perf/deploys.md` (or the DECISIONS Phase-2b note) + claim B1–B4 in STATUS-2b.md
-with WHAT/HOW/EXPECTED, I'll cold-verify (re-trace + confirm a real run's RUN SUMMARY deploy-count).
--- a/machine-docs/DECISIONS.md
+++ b/machine-docs/DECISIONS.md
@@ -1131,3 +1131,28 @@ recipe whose upgrade TARGET needs different app .env than the base (e.g. an over
 newer version) can switch it without a cc-ci fork. Added `abra.env_get` (symmetric reader). mumble's
 `READY_PROBE` + install-overlay now read the live COMPOSE_FILE and self-gate the tcp 64738 probe to the
 host-ports (latest) phase. No cc-ci fork of any upstream file remains for mumble.
+
+---
+
+## Phase 2b — Per-recipe deploy budget (SETTLED 2026-05-31)
+
+The per-recipe CI test sequence deploy budget is **minimal and enforced**:
+
+```
+deploys == 1 (base) + N_cold_deps
+```
+
+- **1 base deploy** shared by ALL five tiers (install → upgrade → backup → restore → custom).
+- **+1 per COLD declared dep** (deployed once, reused); a **live-warm** dep contributes **0**.
+- The **upgrade tier adds NO deploy**: the base is deployed at the previous published version
+  (`base = prev or target`, `run_recipe_ci.py:746-754`) and the upgrade is an in-place chaos redeploy
+  to PR-head (`chaos_redeploy`, not counted). backup/restore reuse the same app.
+- This is **tighter** than plan B1's nominal `1 + 1(upgrade) + N` — the base deploy IS the
+  prior-version deploy. Nothing redundant; nothing removed because nothing existed to remove.
+- **Enforced** by DG4.1: `expected_deploy_count = 1 + deps_deployed_count` (`run_recipe_ci.py:984`),
+  hard-fails on mismatch (`:1005-1010`). Every green run proves it stayed within budget.
+- **Out of budget by design:** WC5 `promote_canonical` (`:682-707`) does one additional *uncounted*
+  `abra app new` on a green-cold run for warm-cache reseed (pops the countfile at `:697` first); it is
+  not a test-sequence deploy.
+
+Full record: `docs/perf/deploys.md`.
--- a/machine-docs/JOURNAL-2b.md
+++ b/machine-docs/JOURNAL-2b.md
@@ -0,0 +1,46 @@
+# JOURNAL — Phase 2b (reasoning; WHY) — confirm minimal deploy budget
+
+## 2026-05-31 — Bootstrap + analysis (Builder)
+
+Operator manually kicked off Phase 2b (narrowed scope, plan §0): the ONLY task is to confirm the
+per-recipe test sequence uses the minimum number of deploys, and fix it if not, without weakening any
+test. Broad empirical-perf work is parked in IDEAS. Phase 2 is not yet `## DONE` (plausible/drone/Q5
+remain), but B1–B4 are a property of the already-existing harness, so the analysis is independent of
+Phase-2 completion.
+
+### Method
+Traced every `abra app deploy`/`upgrade`/`new` path through the harness. Key realization: the only
+thing that increments the DG4.1 deploy counter is `lifecycle._record_deploy()`, and it is called from
+exactly one place: inside `lifecycle.deploy_app` (`lifecycle.py:211`). So "deploy count" == number of
+`deploy_app` calls in a run. Enumerated all `deploy_app` callers: base deploy (`run_recipe_ci.py:819`),
+per-dep (`deps.py:100`), and WC5 promote (`run_recipe_ci.py:699`, which pops the countfile first so it's outside the budget).
+
+### Why the budget is minimal (and tighter than plan B1's nominal text)
+Plan B1 frames the minimum as `1 base + 1 upgrade + N_deps`, assuming the upgrade tier needs its own
+prior-version deploy. The cc-ci design avoids that: when the upgrade tier runs, the *base* deploy is
+done at the **previous published version** (`base = prev or target`, `run_recipe_ci.py:746-754`), and
+the upgrade is an **in-place chaos redeploy** of PR-head onto that same app (`perform_upgrade` →
+`chaos_redeploy`, which does NOT call `deploy_app`). So the prior-version deploy and the base deploy
+are the SAME deploy: the upgrade tier adds zero deploys, and backup/restore reuse the same app. Net: `1 + N_cold_deps`.
+This is the deploy-sharing the operator expected; nothing to remove because nothing is redundant.
+
+### Why I trust the enforcement (B2 is real, not vacuous)
+`run_recipe_ci.py:1005-1010` turns `deploy_count != expected_deploy_count` into a non-zero exit. So
+every GREEN run is itself a proof the recipe stayed within `1 + N_cold_deps` — a redundant redeploy
+would push the count over and fail the run red. The historical Phase-2 runs (recorded in
+STATUS-2/REVIEW-2) corroborate: every recipe ran at `deploy-count = 1`, or `2 (expect 2)` for the one
+cold-dep recipe (lasuite-docs + cold keycloak). Warm keycloak (lasuite-meet) → 0 dep deploys → expect 1.
+
+### Why B3 holds
+Sharing one deploy does not skip assertions: all five tiers still run their generic+overlay assertions
+against the shared app; upgrade is a real prev→PR-head crossover verified by `assert_upgraded`; P4
+backup→restore is real data-integrity; per-run isolation/teardown is unchanged. Only the deploy COUNT
+is constrained, never the coverage.
+
+### Cross-loop note
+The Adversary's independent pre-claim cold trace (REVIEW-2b @05:33Z) reached the identical conclusion
+and flagged exactly one completeness item: the B1/B4 doc must NAME the WC5 green-cold reseed
+(`run_recipe_ci.py:699`) — one additional uncounted `abra app new` for canonical warm-cache
+maintenance, outside the test-sequence budget. `docs/perf/deploys.md` addresses this in its
+"Out of scope of the budget (intentionally)" section, and STATUS-2b names it in verify-step (a).
+Claimed B1–B4 accordingly.
--- a/machine-docs/STATUS-2b.md
+++ b/machine-docs/STATUS-2b.md
@@ -0,0 +1,99 @@
+# STATUS — Phase 2b (confirm the test sequence minimizes deploys)
+
+**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`
+**Loop state for THIS phase:** STATUS-2b / BACKLOG-2b / REVIEW-2b / JOURNAL-2b (DECISIONS.md shared).
+Phase 1/1*/2/2* STATUS/BACKLOG/REVIEW files are HISTORY — not this phase's state.
+
+## Phase
+NARROWED scope (operator 2026-05-30): the only task is to **confirm the per-recipe test sequence
+already uses the minimum number of deploys** (and fix it if not) **without weakening any test**.
+The broad empirical-perf program is parked in IDEAS. Likely outcome (operator's expectation):
+already minimal via the deploy-once / deploy-sharing design.
+
+## Definition of Done (Phase 2b) — B1–B4, each Adversary cold-verified in REVIEW-2b
+- [ ] **B1 — Deploy budget documented and minimal.**
+- [ ] **B2 — Enforced, not just claimed** (deploy-count guard + RUN SUMMARY, expected reflects budget).
+- [ ] **B3 — No test weakened to save a deploy** (coverage/isolation/teardown unchanged).
+- [ ] **B4 — Recorded** (`docs/perf/deploys.md`).
+
+---
+
+## Gate: 2b CLAIMED, awaiting Adversary (@2026-05-31, commit on origin/main)
+
+**Outcome: the per-recipe deploy budget is ALREADY MINIMAL and ENFORCED. No redundant deploy found;
+none removed because none existed.** This is a confirm-and-document result (no harness behavior
+change). Deliverable: `docs/perf/deploys.md`.
+
+### WHAT is claimed (the budget)
+Per cold `run_recipe_ci.py` run of a recipe:
+```
+deploys == 1 (base) + N_cold_deps          # enforced as a hard failure
+```
+- **1 base deploy** shared by ALL five tiers: install → upgrade → backup → restore → custom.
+- **+1 per COLD declared dep**, deployed once and reused; a **live-warm** dep contributes **0**.
+- The **upgrade tier adds NO deploy**: the base is deployed at the **previous published version**
+  when upgrade runs (`base = prev or target`), and the upgrade is an **in-place chaos redeploy** of
+  PR-head onto that same app — NOT counted, and the real HC1 upgrade under test.
+- **backup/restore add NO deploy** (operate on the same running app).
+- This is **tighter** than plan B1's nominal `1 + 1(upgrade) + N` because the base deploy *is* the
+  prior-version deploy — the prior-version and base deploy are the same deploy.
+
+### HOW the Adversary can verify (from a fresh clone)
+
+**(a) Static — only `deploy_app` increments the count, and it's called in exactly 3 sites:**
+```
+grep -n "_record_deploy" runner/harness/lifecycle.py          # called ONLY inside deploy_app (:107, :211)
+grep -rn "deploy_app(" runner/ | grep -v "def deploy_app"     # 3 callers: :699 :819 (+ deps.py:100)
+```
+- `lifecycle.py:211` — `deploy_app` is the sole caller of `_record_deploy`.
+- `run_recipe_ci.py:819` — the single base deploy (cold main path).
+- `runner/harness/deps.py:100` — one per declared dep.
+- `run_recipe_ci.py:699` — `promote_canonical` (WC5), which **pops** `CCCI_DEPLOY_COUNT_FILE` first
+  (`:697`) so it is OUTSIDE the per-run budget (post-green warm-cache maintenance, not a test deploy).
+- `lifecycle.chaos_redeploy` (the upgrade, `lifecycle.py:418-435`) does **NOT** call `deploy_app`
+  → not counted (docstring states this explicitly).
+- `generic.perform_backup`/`perform_restore` → `backup_app`/`restore_app`: no `deploy_app` → not counted.
+- Base-version selection that makes upgrade share the base deploy: `run_recipe_ci.py:746-754`
+  (`want_upgrade`; `prev = UPGRADE_BASE_VERSION or previous_version`; `base = prev or target`).
+
+**(b) Enforcement — DG4.1 guard hard-fails on mismatch:**
+```
+sed -n '958,1010p' runner/run_recipe_ci.py
+```
+- `expected_deploy_count = 1 + deps_deployed_count` (`:984`); warm deps excluded (`:982-983`).
+- RUN SUMMARY prints `deploy-count = N (expect M)` (`:986`).
+- `if deploy_count != expected_deploy_count: … overall = 1` → non-zero exit (`:1005-1010`).
+  ⇒ every GREEN run proves the recipe stayed within budget; a redundant redeploy turns it RED.
+
+**(c) Dynamic (optional, cold) — re-run a no-dep and a cold-dep recipe:**
+```
+RECIPE=ghost        STAGES=install,upgrade,backup,restore,custom  cc-ci-run runner/run_recipe_ci.py
+RECIPE=lasuite-docs STAGES=install,custom                          cc-ci-run runner/run_recipe_ci.py
+```
+
+**(d) B3 — coverage unchanged:** confirm all five tiers still run their real generic+overlay
+assertions against the shared app (`run_lifecycle_tier`, `ALL_STAGES` `run_recipe_ci.py:56`), the
+upgrade is a real prev→PR-head crossover (`assert_upgraded`), and P4 backup→restore is real
+data-integrity (seed→backup→mutate→restore→assert). Nothing is skipped/softened to share the deploy.
+
+**(e) B4 — the record:** `docs/perf/deploys.md` (this deliverable).
+
+### EXPECTED outcomes
+- (a) `_record_deploy` appears only inside `deploy_app`; exactly the 3 `deploy_app` callers above.
+- (b) guard present and hard-failing as quoted; `expected = 1 + cold_deps`.
+- (c) ghost: `deploy-count = 1 (expect 1)`, all tiers `pass`.
+      lasuite-docs + cold keycloak: `deploy-count = 2 (expect 2)`, `deps deployed: ['keycloak']`,
+      all tiers `pass`, `DEPS teardown` clean.
+- Historical corroboration (Phase 2 runs, recorded in STATUS-2/REVIEW-2): every recipe ran at
+  `deploy-count = 1` (no/warm dep) or `deploy-count = 2 (expect 2)` (one cold dep, lasuite-docs
+  Q2.4 — REVIEW-2 `:114`). No run ever exceeded `1 + N_cold_deps`.
+
+### WHERE the inputs live
+- Deliverable doc: `docs/perf/deploys.md`.
+- Code: `runner/run_recipe_ci.py` (`:56`, `:746-754`, `:819`, `:958-1010`),
+  `runner/harness/lifecycle.py` (`:107-211`, `:418-435`), `runner/harness/deps.py` (`:81-120`),
+  `runner/harness/generic.py` (`perform_upgrade`/`perform_backup`/`perform_restore`).
+- Commit: see `git log origin/main` for the `claim(2b)` commit.
+
+## Gates
+- Gate 2b — CLAIMED, awaiting Adversary PASS in REVIEW-2b.