cc-ci-orchestrator/cc-ci-plan/plan-phase2b-test-performance.md
autonomic-bot e85e16318c Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS
Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM
was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly
(bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software
micro-optimizations are judged unlikely to help, so:

- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline,
  attribution) + the optimization menu (image cache/prepull, readiness tuning,
  warm-SSO start/stop, runner caching, concurrency sizing, resources, secret
  overhead) under "Phase-2b empirical performance work", revisit only if
  measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal — confirm (and fix if needed) that the
  per-recipe test sequence already uses the minimum deploys (1 base shared by
  install+functional+backup/restore, +1 for the upgrade tier, +1 per dep),
  enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 05:07:49 +01:00


cc-ci Phase 2b — Confirm the test sequence minimizes deploys (no redundant deploys)

Status: QUEUED. Starts after Phase 2 (plan-phase2-recipe-tests.md) reaches ## DONE, before Phase 3.
Transition: manual (operator kicks it off).
Owner: Builder + Adversary loops.
This file: /srv/cc-ci/cc-ci-plan/plan-phase2b-test-performance.md


0. Scope (NARROWED — operator, 2026-05-30)

The original Phase 2b was a broad empirical performance program (instrument → baseline → attribute → optimize). That has been removed and parked in IDEAS.md ("Phase-2b empirical performance work").

Why: the real deploy-speed bottleneck was hardware, not software — the cc-ci VM was 2 vCPU on a 4-core host and disk-I/O-bound (load ~8, io pressure ~65%), with warm-keycloak (JVM) + all infra resident; RAM was never the constraint. That was fixed directly: cc-nix-test bumped to 4 vCPU and made the only running VM on b1 (full host CPU). The software micro-optimizations are judged unlikely to be worth the effort and are deferred to IDEAS, to be revisited only if measurement later proves a specific software bottleneck.

So Phase 2b is reduced to ONE thing: confirm the per-recipe test sequence already uses the minimum number of deploys — and fix it if it doesn't — without weakening any test. (Operator's expectation: we have probably already done this via the deploy-once / deploy-sharing design.)

1. Mission

Verify that a recipe's full test sequence does not redeploy more than necessary, and document the deploy budget. Reuse a single deployment across the stages that can safely share one; only deploy again where a stage genuinely requires a distinct deployment.

2. Definition of Done (Adversary cold-verifies → REVIEW.md)

  • B1: Deploy budget is documented and minimal. Write down, per recipe run, exactly how many abra app deploy/upgrade cycles happen and why each is necessary. Expected minimum:
      - one base deploy shared by install + functional/custom + backup→restore (restore redeploys onto the same app only as the restore mechanism itself requires);
      - one additional prior-version deploy, only for the upgrade tier (old→new is the whole point of that tier);
      - one deploy per declared dependency (e.g. an SSO provider), deployed once and reused.
    That is, deploys == 1 (base) + 1 (upgrade tier) + N_deps, with no extra or redundant redeploys.
  • B2: Enforced, not just claimed. The harness already emits a deploy count and fails on a mismatch (the DG4.1 deploy-count != expected check plus the RUN SUMMARY deploy-count line). Point to that as the enforcement and confirm that expected_deploy_count reflects the minimal budget in B1. If any recipe exceeds the budget, remove the redundant deploy (e.g. collapse a needless redeploy between install and functional) and re-verify.
  • B3: No test weakened to save a deploy. Every stage still runs its real assertions and real isolation/teardown; sharing a deployment must not skip or soften any check. Adversary confirms from a cold start that suite coverage is unchanged; only the deploy count is reduced or confirmed.
  • B4: Recorded. A short note (docs/perf/deploys.md or DECISIONS.md) states the confirmed per-recipe deploy budget and that it is minimal. If it was already minimal, say so explicitly (the likely outcome); if a redundant deploy was removed, record before/after counts.

When B1 through B4 hold and are Adversary-verified, write ## DONE to Phase-2b STATUS.md.
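
The B1 budget and the B2 mismatch check can be sketched in a few lines of Python. This is an illustration only: DeployBudget and check_deploy_count are hypothetical names, not the harness's actual API, and the existing expected_deploy_count logic in the harness remains the reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployBudget:
    """Minimal deploy budget for one recipe run (hypothetical helper)."""

    n_deps: int  # number of declared dependencies, e.g. an SSO provider

    def breakdown(self) -> dict[str, int]:
        """Return each budgeted deploy together with its justification."""
        if self.n_deps < 0:
            raise ValueError("n_deps must be non-negative")
        return {
            "base (install + functional/custom + backup/restore)": 1,
            "upgrade tier (prior-version deploy)": 1,
            "dependencies (one each, deployed once and reused)": self.n_deps,
        }

    def expected(self) -> int:
        """deploys == 1 (base) + 1 (upgrade tier) + N_deps."""
        return sum(self.breakdown().values())


def check_deploy_count(observed: int, budget: DeployBudget) -> None:
    """Fail loudly on any mismatch, mirroring the intent of the DG4.1 check."""
    expected = budget.expected()
    if observed != expected:
        raise AssertionError(f"deploy-count {observed} != expected {expected}")
```

For a recipe with one declared dependency, DeployBudget(n_deps=1).expected() evaluates to 3: the base deploy, the upgrade-tier deploy, and the dependency. The check uses strict equality rather than an upper bound, so a run that deploys fewer times than budgeted (for example because a stage was silently skipped) also fails.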

3. Method

  1. Read run_recipe_ci.py/harness: trace every abra app deploy/abra app upgrade call across the stage sequence; count them; map each to a stage and a justification.
  2. Compare to the minimal budget (B1). The existing deploy-count/expected_deploy_count logic is the reference — verify it equals the minimum and that runs pass it.
  3. If over budget on any recipe, eliminate the redundant deploy without changing what's tested; re-run the full suite (Adversary cold-verifies green + isolation intact).
  4. If already minimal, document the confirmation and finish — do NOT add speculative perf changes (those live in IDEAS).
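
Step 1 of the method can be approximated with a small tally. The sketch below assumes the commands issued by the harness are available as (stage, command line) pairs; that log shape and the name tally_deploy_calls are hypothetical. Whether an abra app upgrade call counts against the budget is left to the harness's own definition, so the two verbs are tallied separately rather than summed into one number.

```python
import re
from collections import Counter

# Matches `abra app deploy` and `abra app upgrade` invocations.
DEPLOY_CMD = re.compile(r"\babra\s+app\s+(deploy|upgrade)\b")


def tally_deploy_calls(events):
    """Count deploy/upgrade calls per (stage, verb).

    `events` is an iterable of (stage, command_line) pairs. This shape is an
    assumption made for illustration, not the harness's real log format.
    """
    counts = Counter()
    for stage, command in events:
        match = DEPLOY_CMD.search(command)
        if match:
            counts[(stage, match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical command trace for one recipe run.
    example = [
        ("install", "abra app deploy app.example.test"),
        ("functional", "curl https://app.example.test/"),
        ("upgrade", "abra app deploy app.example.test"),
        ("upgrade", "abra app upgrade app.example.test"),
    ]
    for (stage, verb), n in sorted(tally_deploy_calls(example).items()):
        print(f"{stage}: abra app {verb} x{n}")
```

Each (stage, verb) entry in the result then needs a written justification under B1; an entry with no justification is a candidate redundant deploy.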

4. Guardrails

  • Correctness first: never weaken/skip/soften a test or break isolation/teardown to cut a deploy.
  • Bounded: this phase ONLY confirms/fixes deploy count. Any other perf idea → IDEAS.md ("Phase-2b empirical performance work"); do not re-import them here.
  • Real abra path throughout (no docker-level shortcuts).