cc-ci-orchestrator/cc-ci-plan/plan-phase2b-test-performance.md

# cc-ci Phase 2b — Confirm the test sequence minimizes deploys (no redundant deploys)

**Status:** QUEUED — starts after Phase 2 (`plan-phase2-recipe-tests.md`) reaches `## DONE`, before
Phase 3. **Transition:** manual (operator kicks it off). **Owner:** Builder + Adversary loops.
**This file:** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`

---

## 0. Scope (NARROWED — operator, 2026-05-30)

The original Phase 2b was a broad empirical performance program (instrument → baseline → attribute →
optimize). **That has been removed and parked in `IDEAS.md`** ("Phase-2b empirical performance work").

**Why:** the real deploy-speed bottleneck was **hardware**, not software. The cc-ci VM had **2 vCPU
on a 4-core host** and was **disk-I/O-bound** (load ~8, I/O pressure ~65%), with warm-keycloak (JVM) and
all infra resident; RAM was never the constraint. That was fixed **directly**: cc-nix-test was bumped to
**4 vCPU** and made the **only running VM** on b1, so it gets the full host CPU. The software
micro-optimizations are judged unlikely to be worth the effort and are deferred to `IDEAS.md`, to be
revisited only if measurement later proves a specific software bottleneck.

**So Phase 2b is reduced to ONE thing:** confirm the per-recipe test sequence already uses the
**minimum number of deploys** — and fix it if it doesn't — **without weakening any test**. (Operator's
expectation: we have probably already done this via the deploy-once / deploy-sharing design.)

## 1. Mission

Verify that a recipe's full test sequence does **not** redeploy more than necessary, and document the
deploy budget. Reuse a single deployment across the stages that can safely share one; only deploy
again where a stage genuinely requires a distinct deployment.

## 2. Definition of Done (Adversary cold-verifies → REVIEW.md)

- [ ] **B1 — Deploy budget is documented and minimal.** Write down, per recipe run, exactly how many
      `abra app deploy` / `abra app upgrade` cycles happen and why each is necessary. Expected minimum:
      - **one** base deploy shared by **install + functional/custom + backup→restore** (restore
        redeploys onto the same app only as the restore mechanism itself requires);
      - **one** additional prior-version deploy **only** for the **upgrade** tier (old→new is the
        whole point of that tier);
      - **one** deploy per declared **dependency** (e.g. an SSO provider), deployed once and reused.
      That is, `deploys == 1 (base) + 1 (upgrade tier) + N_deps`, with no redundant redeploys.
- [ ] **B2 — Enforced, not just claimed.** The harness already emits a deploy count and fails on a
      mismatch (the DG4.1 `deploy-count != expected` check + the `RUN SUMMARY` `deploy-count` line) —
      point to that as the enforcement and confirm `expected_deploy_count` reflects the minimal budget
      in B1. If any recipe exceeds it, **remove the redundant deploy** (e.g. collapse a needless
      re-deploy between install and functional) and re-verify.
- [ ] **B3 — No test weakened to save a deploy.** Every stage still runs its real assertions and real
      isolation/teardown; sharing a deployment must not skip or soften any check. Adversary confirms
      from a cold start that suite coverage is unchanged — only the deploy count is reduced/confirmed.
- [ ] **B4 — Recorded.** A short note (`docs/perf/deploys.md` or DECISIONS.md) states the confirmed
      per-recipe deploy budget and that it is minimal. If it was already minimal, say so explicitly
      (the likely outcome); if a redundant deploy was removed, record before/after counts.
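
The B1 arithmetic and the B2 comparison can be sketched as below. This is a minimal illustration, not the harness's actual code: `RecipePlan`, `minimal_deploy_budget`, and `check_deploy_count` are hypothetical names, and the real `expected_deploy_count` logic may be structured differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipePlan:
    """Hypothetical summary of one recipe's test plan (not a real harness type)."""

    name: str
    n_deps: int = 0  # declared dependencies, e.g. an SSO provider


def minimal_deploy_budget(plan: RecipePlan) -> dict[str, int]:
    """Split the B1 budget into its individually justified parts."""
    if plan.n_deps < 0:
        raise ValueError("n_deps must be non-negative")
    return {
        "base": 1,  # shared by install + functional/custom + backup->restore
        "upgrade_tier": 1,  # prior-version deploy for the old->new upgrade tier
        "dependencies": plan.n_deps,  # one per dependency, deployed once and reused
    }


def check_deploy_count(plan: RecipePlan, observed: int) -> int:
    """Return the expected count, or raise if the observed count differs."""
    expected = sum(minimal_deploy_budget(plan).values())
    if observed != expected:
        raise AssertionError(
            f"{plan.name}: deploy-count {observed} != expected {expected}"
        )
    return expected
```

Keeping the budget as named parts rather than a single integer means each deploy carries its justification, which is what B1 asks to be written down.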

When B1–B4 hold and are Adversary-verified, write `## DONE` to Phase-2b `STATUS.md`.

## 3. Method

1. Read `run_recipe_ci.py` and the rest of the harness: trace every `abra app deploy` / `abra app upgrade`
   call across the stage sequence; count them; map each to a stage and a justification.
2. Compare to the minimal budget (B1). The existing `deploy-count`/`expected_deploy_count` logic is the
   reference — verify it equals the minimum and that runs pass it.
3. If over budget on any recipe, eliminate the redundant deploy **without** changing what's tested;
   re-run the full suite (Adversary cold-verifies green + isolation intact).
4. If already minimal, document the confirmation and finish — do NOT add speculative perf changes
   (those live in IDEAS).
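
Step 1 can be cross-checked mechanically if the commands each stage runs are recorded. The sketch below assumes a hypothetical list of `(stage, command)` records; the harness's real log format is not specified here, and how an `abra app upgrade` call maps onto the B1 budget is left to the harness's own counting rules.

```python
import shlex
from collections import Counter

# abra sub-commands that Method step 1 asks to be traced.
DEPLOY_SUBCOMMANDS = {"deploy", "upgrade"}


def is_deploy_command(command: str) -> bool:
    """True for `abra app deploy ...` or `abra app upgrade ...` command lines."""
    argv = shlex.split(command)
    return (
        len(argv) >= 3
        and argv[0] == "abra"
        and argv[1] == "app"
        and argv[2] in DEPLOY_SUBCOMMANDS
    )


def deploys_per_stage(records: list[tuple[str, str]]) -> Counter:
    """Count traced deploy/upgrade calls per stage from (stage, command) records."""
    counts: Counter = Counter()
    for stage, command in records:
        if is_deploy_command(command):
            counts[stage] += 1
    return counts


# Hypothetical records; the stage names and domains are illustrative only.
records = [
    ("deps", "abra app deploy sso.example.test"),
    ("install", "abra app deploy app.example.test"),
    ("functional", "abra app ps app.example.test"),
]
for stage, count in deploys_per_stage(records).items():
    print(f"{stage}: {count}")
```

A stage that shows a non-zero count without a matching entry in the B1 budget is the candidate redundant deploy to investigate in step 3.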

## 4. Guardrails

- **Correctness first:** never weaken/skip/soften a test or break isolation/teardown to cut a deploy.
- **Bounded:** this phase ONLY confirms/fixes the deploy count. All other perf ideas → `IDEAS.md`
  ("Phase-2b empirical performance work"); do not re-import them here.
- **Real abra path** throughout (no docker-level shortcuts).