Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly (bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software micro-optimizations are judged unlikely to help, so:
- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline, attribution) + the optimization menu (image cache/prepull, readiness tuning, warm-SSO start/stop, runner caching, concurrency sizing, resources, secret overhead) under "Phase-2b empirical performance work", revisit only if measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal: confirm (and fix if needed) that the per-recipe test sequence already uses the minimum deploys (1 base shared by install+functional+backup/restore, +1 for the upgrade tier, +1 per dep), enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cc-ci Phase 2b — Confirm the test sequence minimizes deploys (no redundant deploys)
Status: QUEUED — starts after Phase 2 (plan-phase2-recipe-tests.md) reaches ## DONE, before
Phase 3. Transition: manual (operator kicks it off). Owner: Builder + Adversary loops.
This file: /srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md
0. Scope (NARROWED — operator, 2026-05-30)
The original Phase 2b was a broad empirical performance program (instrument → baseline → attribute →
optimize). That has been removed and parked in IDEAS.md ("Phase-2b empirical performance work").
Why: the real deploy-speed bottleneck was hardware, not software — the cc-ci VM was 2 vCPU on a 4-core host and disk-I/O-bound (load ~8, io pressure ~65%), with warm-keycloak (JVM) + all infra resident; RAM was never the constraint. That was fixed directly: cc-nix-test bumped to 4 vCPU and made the only running VM on b1 (full host CPU). The software micro-optimizations are judged unlikely to be worth the effort and are deferred to IDEAS, to be revisited only if measurement later proves a specific software bottleneck.
So Phase 2b is reduced to ONE thing: confirm the per-recipe test sequence already uses the minimum number of deploys — and fix it if it doesn't — without weakening any test. (Operator's expectation: we have probably already done this via the deploy-once / deploy-sharing design.)
1. Mission
Verify that a recipe's full test sequence does not redeploy more than necessary, and document the deploy budget. Reuse a single deployment across the stages that can safely share one; only deploy again where a stage genuinely requires a distinct deployment.
2. Definition of Done (Adversary cold-verifies → REVIEW.md)
- B1: Deploy budget is documented and minimal. Write down, per recipe run, exactly how many `abra app deploy`/`upgrade` cycles happen and why each is necessary. Expected minimum:
  - one base deploy shared by install + functional/custom + backup→restore (restore redeploys onto the same app only as the restore mechanism itself requires);
  - one additional prior-version deploy only for the upgrade tier (old→new is the whole point of that tier);
  - one deploy per declared dependency (e.g. an SSO provider), deployed once and reused.

  i.e. `deploys == 1 (base) + 1 (upgrade tier) + N_deps`, with no extra/redundant redeploys.
- B2: Enforced, not just claimed. The harness already emits a deploy count and fails on a mismatch (the DG4.1 `deploy-count != expected` check + the `RUN SUMMARY` `deploy-count` line). Point to that as the enforcement and confirm `expected_deploy_count` reflects the minimal budget in B1. If any recipe exceeds it, remove the redundant deploy (e.g. collapse a needless re-deploy between install and functional) and re-verify.
- B3: No test weakened to save a deploy. Every stage still runs its real assertions and real isolation/teardown; sharing a deployment must not skip or soften any check. Adversary confirms from a cold start that suite coverage is unchanged; only the deploy count is reduced/confirmed.
- B4: Recorded. A short note (`docs/perf/deploys.md` or DECISIONS.md) states the confirmed per-recipe deploy budget and that it is minimal. If it was already minimal, say so explicitly (the likely outcome); if a redundant deploy was removed, record before/after counts.
When B1–B4 hold and are Adversary-verified, write ## DONE to Phase-2b STATUS.md.
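The B1 arithmetic and the B2-style mismatch check can be sketched in a few lines. This is an illustration only, not the harness's code: `RecipePlan`, `minimal_deploy_budget`, and `check_deploy_count` are hypothetical names, and the `has_upgrade_tier` flag is an assumption (the plan itself always counts one upgrade-tier deploy).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecipePlan:
    """Hypothetical description of one recipe run (not the harness's real type)."""
    name: str
    has_upgrade_tier: bool = True          # assumption: a recipe might skip this tier
    dependencies: tuple[str, ...] = ()     # e.g. ("sso-provider",)


def minimal_deploy_budget(plan: RecipePlan) -> dict[str, int]:
    """Itemise the B1 budget: 1 base + 1 upgrade tier + one per dependency."""
    return {
        "base": 1,  # shared by install + functional/custom + backup/restore
        "upgrade_tier": 1 if plan.has_upgrade_tier else 0,
        "dependencies": len(plan.dependencies),
    }


def check_deploy_count(plan: RecipePlan, observed: int) -> None:
    """Fail on any mismatch, in the spirit of the DG4.1 deploy-count check."""
    expected = sum(minimal_deploy_budget(plan).values())
    if observed != expected:
        raise AssertionError(
            f"{plan.name}: deploy-count {observed} != expected {expected}"
        )
```

Itemising the budget, rather than returning a single integer, keeps the "why each is necessary" justification next to the number, which is what B1 and B4 ask to be recorded.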
3. Method
- Read `run_recipe_ci.py`/harness: trace every `abra app deploy`/`abra app upgrade` call across the stage sequence; count them; map each to a stage and a justification.
- Compare to the minimal budget (B1). The existing `deploy-count`/`expected_deploy_count` logic is the reference: verify it equals the minimum and that runs pass it.
- If over budget on any recipe, eliminate the redundant deploy without changing what's tested; re-run the full suite (Adversary cold-verifies green + isolation intact).
- If already minimal, document the confirmation and finish. Do NOT add speculative perf changes (those live in IDEAS).
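As a cross-check on the manual trace in the first step, the invocations can also be counted mechanically from a run log. The sketch below assumes the harness echoes each command it runs as a line of text; that log format and the `count_deploy_calls` helper are hypothetical. Only the two subcommands named above are matched, and nothing is assumed about their flags.

```python
import re
from collections import Counter

# Matches `abra app deploy` and `abra app upgrade` as whole words.
DEPLOY_CALL = re.compile(r"\babra\s+app\s+(deploy|upgrade)\b")


def count_deploy_calls(lines):
    """Count deploy/upgrade invocations per subcommand in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for match in DEPLOY_CALL.finditer(line):
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    sample = [
        "[install] $ abra app deploy <app>",
        "[functional] running assertions",
        "[upgrade] $ abra app deploy <app>",
        "[upgrade] $ abra app upgrade <app>",
    ]
    print(count_deploy_calls(sample))
```

A count like this only says how many calls happened; mapping each one to a stage and a justification still needs the source-level read.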
4. Guardrails
- Correctness first: never weaken/skip/soften a test or break isolation/teardown to cut a deploy.
- Bounded: this phase ONLY confirms/fixes deploy count. Any other perf idea → `IDEAS.md` ("Phase-2b empirical performance work"); do not re-import them here.
- Real abra path throughout (no docker-level shortcuts).
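One way to make the correctness-first guardrail checkable is to diff the set of tests that actually ran before and after a deploy is removed. The sketch below assumes each run can be summarised as a mapping from test ID to an outcome string such as `"passed"`, `"failed"`, or `"skipped"`; that shape and the `weakened_tests` helper are hypothetical, not something the harness is known to emit.

```python
def weakened_tests(before, after):
    """Return test IDs that ran before the change but are missing or skipped after it."""
    ran_before = {test for test, outcome in before.items() if outcome != "skipped"}
    ran_after = {test for test, outcome in after.items() if outcome != "skipped"}
    return sorted(ran_before - ran_after)
```

An empty result is necessary but not sufficient: it shows no test disappeared or was skipped, while confirming that assertions and teardown were not softened still needs review.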