autonomic-bot e85e16318c Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS
Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM
was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly
(bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software
micro-optimizations are judged unlikely to help, so:

- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline,
  attribution) + the optimization menu (image cache/prepull, readiness tuning,
  warm-SSO start/stop, runner caching, concurrency sizing, resources, secret
  overhead) under "Phase-2b empirical performance work", revisit only if
  measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal — confirm (and fix if needed) that the
  per-recipe test sequence already uses the minimum deploys (1 base shared by
  install+functional+backup/restore, +1 for the upgrade tier, +1 per dep),
  enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 05:07:49 +01:00


# cc-ci Phase 2b — Confirm the test sequence minimizes deploys (no redundant deploys)
**Status:** QUEUED. Starts after Phase 2 (`plan-phase2-recipe-tests.md`) reaches `## DONE`, before
Phase 3. **Transition:** manual (operator kicks it off). **Owner:** Builder + Adversary loops.

**This file:** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`

---
## 0. Scope (NARROWED — operator, 2026-05-30)
The original Phase 2b was a broad empirical performance program (instrument → baseline → attribute →
optimize). **That has been removed and parked in `IDEAS.md`** ("Phase-2b empirical performance work").

**Why:** the real deploy-speed bottleneck was **hardware**, not software. The cc-ci VM had **2 vCPU
on a 4-core host** and was **disk-I/O-bound** (load ~8, I/O pressure ~65%), with warm-keycloak (JVM) and
all infra resident; RAM was never the constraint. That was fixed **directly**: the cc-ci VM (cc-nix-test)
was bumped to **4 vCPU** and made the **only running VM** on b1, giving it the full host CPU. The software
micro-optimizations are judged unlikely to be worth the effort and are deferred to IDEAS, to be revisited
only if measurement later proves a specific software bottleneck.

**So Phase 2b is reduced to ONE thing:** confirm that the per-recipe test sequence already uses the
**minimum number of deploys**, and fix it if it doesn't, **without weakening any test**. (The operator
expects that the deploy-once / deploy-sharing design has probably already achieved this.)
## 1. Mission
Verify that a recipe's full test sequence does **not** redeploy more than necessary, and document the
deploy budget. Reuse a single deployment across the stages that can safely share one; only deploy
again where a stage genuinely requires a distinct deployment.
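
A minimal Python sketch of the deploy-sharing pattern described above. It assumes a hypothetical `CountingDeployer` that records deploys instead of calling abra, plus hypothetical `run_stage`/`run_recipe_sequence` helpers; none of these are the real `run_recipe_ci.py` API. Dependencies are deployed once, the single base deployment is handed to every stage that can share it, and only the upgrade tier deploys again. A restore that must redeploy as part of its own mechanism is not modeled here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CountingDeployer:
    """Hypothetical stand-in for the harness's abra wrapper: it records each
    deploy instead of running `abra app deploy`."""
    calls: list[tuple[str, str]] = field(default_factory=list)

    def deploy(self, app: str, version: str) -> str:
        self.calls.append((app, version))
        return app


def run_stage(stage: str, app: str, log: list[str]) -> None:
    # Placeholder: a real stage would run its full assertions against `app`.
    log.append(f"{stage}:{app}")


def run_recipe_sequence(
    deployer: CountingDeployer,
    recipe: str,
    deps: list[str],
    prior_version: str | None = None,
) -> tuple[int, list[str]]:
    """Run the stage sequence and return (number of deploys, stages run)."""
    log: list[str] = []
    for dep in deps:
        deployer.deploy(dep, "current")       # one deploy per dependency, then reused
    app = deployer.deploy(recipe, "current")  # the single shared base deploy
    for stage in ("install", "functional", "backup_restore"):
        run_stage(stage, app, log)            # shared stages reuse `app`; no redeploy
    if prior_version is not None:
        # The upgrade tier genuinely needs a prior-version deploy (old -> new).
        old = deployer.deploy(recipe, prior_version)
        run_stage("upgrade", old, log)
    return len(deployer.calls), log
```

With one dependency and an upgrade tier the sketch records three deploys, matching the B1 budget below; with no dependencies and no upgrade tier it records one.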
## 2. Definition of Done (Adversary cold-verifies → REVIEW.md)
- [ ] **B1 — Deploy budget is documented and minimal.** Write down, per recipe run, exactly how many
  `abra app deploy`/`upgrade` cycles happen and why each is necessary. Expected minimum:
  - **one** base deploy shared by **install + functional/custom + backup→restore** (restore
    redeploys onto the same app only as the restore mechanism itself requires);
  - **one** additional prior-version deploy **only** for the **upgrade** tier (old→new is the
    whole point of that tier);
  - **one** deploy per declared **dependency** (e.g. an SSO provider), deployed once and reused.

  That is, `deploys == 1 (base) + 1 (upgrade tier) + N_deps`, with no extra or redundant redeploys.
- [ ] **B2 — Enforced, not just claimed.** The harness already emits a deploy count and fails on a
  mismatch (the DG4.1 `deploy-count != expected` check plus the `deploy-count` line in `RUN SUMMARY`).
  Cite that as the enforcement and confirm that `expected_deploy_count` reflects the minimal budget
  in B1. If any recipe exceeds it, **remove the redundant deploy** (e.g. collapse a needless
  re-deploy between install and functional) and re-verify.
- [ ] **B3 — No test weakened to save a deploy.** Every stage still runs its real assertions and real
  isolation/teardown; sharing a deployment must not skip or soften any check. The Adversary confirms
  from a cold start that suite coverage is unchanged and that only the deploy count is reduced or confirmed.
- [ ] **B4 — Recorded.** A short note (`docs/perf/deploys.md` or DECISIONS.md) states the confirmed
  per-recipe deploy budget and that it is minimal. If it was already minimal, say so explicitly
  (the likely outcome); if a redundant deploy was removed, record before/after counts.

When B1–B4 hold and are Adversary-verified, write `## DONE` to Phase-2b `STATUS.md`.
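
The B1 budget and the B2 enforcement reduce to a few lines. This is a hedged sketch: `expected_deploy_count` borrows the name mentioned in B2, but its signature, the `has_upgrade_tier` flag (an assumption covering recipes without an upgrade tier), `check_deploy_count`, and the summary-line format are illustrative, not the harness's actual code.

```python
def expected_deploy_count(n_deps: int, has_upgrade_tier: bool = True) -> int:
    """B1 minimum: 1 base deploy + 1 prior-version deploy for the upgrade tier
    + 1 deploy per declared dependency."""
    if n_deps < 0:
        raise ValueError("n_deps must be non-negative")
    return 1 + (1 if has_upgrade_tier else 0) + n_deps


class DeployCountMismatch(AssertionError):
    """Raised when a run's observed deploy count differs from the budget."""


def check_deploy_count(observed: int, n_deps: int, has_upgrade_tier: bool = True) -> str:
    """Fail on any mismatch, in the spirit of the DG4.1 `deploy-count != expected`
    check; return a summary line (format assumed) when the count matches."""
    expected = expected_deploy_count(n_deps, has_upgrade_tier)
    if observed != expected:
        raise DeployCountMismatch(f"deploy-count {observed} != expected {expected}")
    return f"deploy-count: {observed} (expected {expected})"
```

For a recipe with one dependency, `check_deploy_count(3, n_deps=1)` returns `"deploy-count: 3 (expected 3)"`, while an observed count of 4 raises. Failing on a shortfall as well as an excess is a deliberate choice in this sketch: a count below budget may mean a tier such as the upgrade tier was skipped, which B3 forbids.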
## 3. Method
1. Read `run_recipe_ci.py` and the harness: trace every `abra app deploy`/`abra app upgrade` call across
   the stage sequence; count them; map each to a stage and a justification.
2. Compare to the minimal budget (B1). The existing `deploy-count`/`expected_deploy_count` logic is the
   reference: verify that it equals the minimum and that runs pass it.
3. If any recipe is over budget, eliminate the redundant deploy **without** changing what is tested,
   then re-run the full suite (Adversary cold-verifies green + isolation intact).
4. If the count is already minimal, document the confirmation and finish. Do NOT add speculative perf
   changes (those live in IDEAS).
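
Steps 1 and 2 amount to attributing each deploy-type call to a stage and comparing per-stage totals with an allowance. A sketch, assuming the harness's abra invocations have already been captured as hypothetical `(stage, argv)` pairs (for example by logging each command the harness runs); the stage names, the `allowance` mapping, and the example domains are illustrative only.

```python
from __future__ import annotations

from collections import Counter

# abra subcommands treated as deploy cycles (step 1 traces both).
DEPLOY_SUBCOMMANDS = {("app", "deploy"), ("app", "upgrade")}


def is_deploy_call(argv: list[str]) -> bool:
    """True for `abra app deploy ...` or `abra app upgrade ...`."""
    return (
        len(argv) >= 3
        and argv[0] == "abra"
        and (argv[1], argv[2]) in DEPLOY_SUBCOMMANDS
    )


def deploys_per_stage(trace: list[tuple[str, list[str]]]) -> Counter:
    """Count deploy/upgrade calls per stage (stages with none are absent)."""
    return Counter(stage for stage, argv in trace if is_deploy_call(argv))


def over_budget(trace: list[tuple[str, list[str]]],
                allowance: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {stage: (observed, allowed)} for stages exceeding their allowance.

    Stages missing from `allowance` are allowed zero deploys, so a stage that
    should share the base deployment but quietly redeploys is reported.
    """
    counts = deploys_per_stage(trace)
    return {
        stage: (n, allowance.get(stage, 0))
        for stage, n in counts.items()
        if n > allowance.get(stage, 0)
    }


if __name__ == "__main__":
    trace = [
        ("deps", ["abra", "app", "deploy", "sso.example.org"]),
        ("base", ["abra", "app", "deploy", "app.example.org"]),
        ("install", ["abra", "app", "ps", "app.example.org"]),
        ("functional", ["abra", "app", "deploy", "app.example.org"]),  # redundant
        ("upgrade", ["abra", "app", "deploy", "old.example.org"]),
        ("upgrade", ["abra", "app", "upgrade", "old.example.org"]),
    ]
    print(over_budget(trace, {"deps": 1, "base": 1, "upgrade": 2}))
    # prints {'functional': (1, 0)}
```

Counting `abra app upgrade` alongside `abra app deploy` follows step 1; how the upgrade call maps onto the B1 budget is left to whatever allowance the upgrade stage is given.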
## 4. Guardrails
- **Correctness first:** never weaken, skip, or soften a test, or break isolation/teardown, to cut a deploy.
- **Bounded:** this phase ONLY confirms/fixes the deploy count. Any other perf idea goes to `IDEAS.md`
  ("Phase-2b empirical performance work"); do not re-import such ideas here.
- **Real abra path** throughout (no docker-level shortcuts).