Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly (bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software micro-optimizations are judged unlikely to help, so:

- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline, attribution) + the optimization menu (image cache/prepull, readiness tuning, warm-SSO start/stop, runner caching, concurrency sizing, resources, secret overhead) under "Phase-2b empirical performance work", revisit only if measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal: confirm (and fix if needed) that the per-recipe test sequence already uses the minimum deploys (1 base shared by install+functional+backup/restore, +1 for the upgrade tier, +1 per dep), enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

# cc-ci Phase 2b: Confirm the test sequence minimizes deploys (no redundant deploys)

**Status:** QUEUED. Starts after Phase 2 (`plan-phase2-recipe-tests.md`) reaches `## DONE`, before
Phase 3. **Transition:** manual (operator kicks it off). **Owner:** Builder + Adversary loops.
**This file:** `/srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md`

---

## 0. Scope (NARROWED by operator, 2026-05-30)

The original Phase 2b was a broad empirical performance program (instrument → baseline → attribute →
optimize). **That has been removed and parked in `IDEAS.md`** ("Phase-2b empirical performance work").

**Why:** the real deploy-speed bottleneck was **hardware**, not software: the cc-ci VM was **2 vCPU
on a 4-core host** and **disk-I/O-bound** (load ~8, io pressure ~65%), with warm-keycloak (JVM) + all
infra resident; RAM was never the constraint. That was fixed **directly**: cc-nix-test was bumped to
**4 vCPU** and made the **only running VM** on b1 (full host CPU). The software micro-optimizations are
judged unlikely to be worth the effort and are deferred to IDEAS, to be revisited only if measurement
later proves a specific software bottleneck.

**So Phase 2b is reduced to ONE thing:** confirm the per-recipe test sequence already uses the
**minimum number of deploys**, and fix it if it doesn't, **without weakening any test**. (Operator's
expectation: we have probably already done this via the deploy-once / deploy-sharing design.)

## 1. Mission

Verify that a recipe's full test sequence does **not** redeploy more than necessary, and document the
deploy budget. Reuse a single deployment across the stages that can safely share one; only deploy
again where a stage genuinely requires a distinct deployment.

## 2. Definition of Done (Adversary cold-verifies → REVIEW.md)

- [ ] **B1: Deploy budget is documented and minimal.** Write down, per recipe run, exactly how many
  `abra app deploy` / `abra app upgrade` cycles happen and why each is necessary. Expected minimum:
  - **one** base deploy shared by **install + functional/custom + backup→restore** (restore
    redeploys onto the same app only as the restore mechanism itself requires);
  - **one** additional prior-version deploy **only** for the **upgrade** tier (old→new is the
    whole point of that tier);
  - **one** deploy per declared **dependency** (e.g. an SSO provider), deployed once and reused.

  i.e. `deploys == 1 (base) + 1 (upgrade tier) + N_deps`, with no extra/redundant redeploys.
- [ ] **B2: Enforced, not just claimed.** The harness already emits a deploy count and fails on a
  mismatch (the DG4.1 `deploy-count != expected` check + the `RUN SUMMARY` `deploy-count` line).
  Point to that as the enforcement and confirm `expected_deploy_count` reflects the minimal budget
  in B1. If any recipe exceeds it, **remove the redundant deploy** (e.g. collapse a needless
  re-deploy between install and functional) and re-verify.
- [ ] **B3: No test weakened to save a deploy.** Every stage still runs its real assertions and real
  isolation/teardown; sharing a deployment must not skip or soften any check. Adversary confirms
  from a cold start that suite coverage is unchanged; only the deploy count is reduced/confirmed.
- [ ] **B4: Recorded.** A short note (`docs/perf/deploys.md` or `DECISIONS.md`) states the confirmed
  per-recipe deploy budget and that it is minimal. If it was already minimal, say so explicitly
  (the likely outcome); if a redundant deploy was removed, record before/after counts.

When B1–B4 hold and are Adversary-verified, write `## DONE` to Phase-2b `STATUS.md`.

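
The B1 budget and the B2 mismatch check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RecipePlan`, `minimal_deploy_budget`, and `check_deploy_count` are hypothetical names, not the harness's actual `expected_deploy_count` logic, and the `has_upgrade_tier` flag assumes a recipe may run without the upgrade tier (in which case the `+1` drops out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipePlan:
    """Hypothetical description of one recipe run (not the harness's real type)."""
    name: str
    has_upgrade_tier: bool
    dependencies: tuple[str, ...] = ()


def minimal_deploy_budget(plan: RecipePlan) -> int:
    """Return 1 (base) + 1 (upgrade tier, if run) + one per declared dependency."""
    base = 1  # shared by install + functional/custom + backup->restore
    upgrade = 1 if plan.has_upgrade_tier else 0  # prior-version deploy
    return base + upgrade + len(plan.dependencies)


def check_deploy_count(plan: RecipePlan, observed: int) -> None:
    """Raise when the observed deploy count differs from the minimal budget."""
    expected = minimal_deploy_budget(plan)
    if observed != expected:
        raise AssertionError(
            f"{plan.name}: deploy-count {observed} != expected {expected}"
        )


if __name__ == "__main__":
    # Hypothetical recipe with an upgrade tier and one dependency.
    plan = RecipePlan(
        name="example-recipe",
        has_upgrade_tier=True,
        dependencies=("sso-provider",),
    )
    check_deploy_count(plan, observed=3)  # 1 base + 1 upgrade + 1 dependency
```

Comparing with `!=` rather than `>` mirrors the `deploy-count != expected` wording above: a run that deploys fewer times than budgeted is also flagged, which helps catch a stage that was silently skipped.
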
## 3. Method

1. Read `run_recipe_ci.py`/harness: trace every `abra app deploy`/`abra app upgrade` call across the
   stage sequence; count them; map each to a stage and a justification.
2. Compare to the minimal budget (B1). The existing `deploy-count`/`expected_deploy_count` logic is the
   reference: verify it equals the minimum and that runs pass it.
3. If over budget on any recipe, eliminate the redundant deploy **without** changing what's tested;
   re-run the full suite (Adversary cold-verifies green + isolation intact).
4. If already minimal, document the confirmation and finish. Do NOT add speculative perf changes
   (those live in IDEAS).

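
Step 1 (trace, count, map to a stage) could look roughly like the following. It assumes the harness can produce a list of `(stage, argv)` pairs for every command it runs; that trace format, the stage names, and the domain `app.example.test` are hypothetical. Whether the harness counts an `abra app upgrade` call as its own cycle is not stated here, so the sketch simply tallies both commands per stage.

```python
from collections import Counter

# Subcommand pairs that this sketch treats as a deploy cycle.
DEPLOY_SUBCOMMANDS = {("app", "deploy"), ("app", "upgrade")}


def is_deploy_call(argv: list[str]) -> bool:
    """True for `abra app deploy ...` or `abra app upgrade ...`."""
    return (
        len(argv) >= 3
        and argv[0] == "abra"
        and (argv[1], argv[2]) in DEPLOY_SUBCOMMANDS
    )


def deploys_per_stage(trace: list[tuple[str, list[str]]]) -> Counter:
    """Tally deploy/upgrade invocations by the stage that issued them."""
    counts: Counter = Counter()
    for stage, argv in trace:
        if is_deploy_call(argv):
            counts[stage] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical trace: one deploy in install, none in functional.
    trace = [
        ("install", ["abra", "app", "deploy", "app.example.test"]),
        ("functional", ["curl", "-fsS", "https://app.example.test"]),
    ]
    for stage, count in deploys_per_stage(trace).items():
        print(f"{stage}: {count}")
```

A per-stage tally makes the justification column of the budget easy to fill in: any stage with a non-zero count needs a stated reason, and a stage that is expected to share the base deploy should not appear at all.
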
## 4. Guardrails

- **Correctness first:** never weaken/skip/soften a test or break isolation/teardown to cut a deploy.
- **Bounded:** this phase ONLY confirms/fixes deploy count. Any other perf idea → `IDEAS.md`
  ("Phase-2b empirical performance work"); do not re-import them here.
- **Real abra path** throughout (no docker-level shortcuts).
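
The "correctness first" guardrail (and B3) can be made mechanically checkable by diffing test outcomes before and after a deploy is removed. The sketch below assumes results can be exported as a mapping from test ID to an outcome string such as `"passed"` or `"skipped"`; that format and the function name `coverage_regressions` are hypothetical, not something the harness is known to provide.

```python
def coverage_regressions(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List tests that disappeared or stopped passing after a change."""
    problems = []
    # Tests that ran before but are absent afterwards.
    for test_id in sorted(before.keys() - after.keys()):
        problems.append(f"missing after change: {test_id}")
    # Tests present in both runs that passed before but no longer pass.
    for test_id in sorted(before.keys() & after.keys()):
        if before[test_id] == "passed" and after[test_id] != "passed":
            problems.append(f"{test_id}: {before[test_id]} -> {after[test_id]}")
    return problems
```

An empty list means no test was dropped, skipped, or softened into a non-pass; it does not prove the assertions inside each test are unchanged, so it complements the Adversary's cold verification rather than replacing it.
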