Phase 2b narrowed to "confirm minimal deploys"; perf ideas moved to IDEAS
Operator (2026-05-30): the real deploy-speed bottleneck was hardware (cc-ci VM was 2 vCPU on a 4-core host + disk-I/O-bound; RAM fine), now fixed directly (bumped to 4 vCPU, made cc-nix-test the only running VM on b1). The 2b software micro-optimizations are judged unlikely to help, so:

- IDEAS.md: parked the whole empirical-perf program (instrumentation, baseline, attribution) + the optimization menu (image cache/prepull, readiness tuning, warm-SSO start/stop, runner caching, concurrency sizing, resources, secret overhead) under "Phase-2b empirical performance work", revisit only if measurement later proves a specific software bottleneck.
- plan-phase2b: reduced to ONE goal: confirm (and fix if needed) that the per-recipe test sequence already uses the minimum deploys (1 base shared by install+functional+backup/restore, +1 for the upgrade tier, +1 per dep), enforced by the existing DG4.1 deploy-count check, WITHOUT weakening any test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
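The minimum-deploy rule in the message above is simple arithmetic, and a small sketch may make it easier to check by hand. This is an illustration only, not the DG4.1 check itself: the function names `expected_min_deploys` and `exceeds_minimum` and their parameters are hypothetical and do not come from the repository.

```python
def expected_min_deploys(has_upgrade_tier: bool, dep_count: int) -> int:
    """Minimum deploys for one recipe's test sequence under the stated rule.

    Hypothetical helper: 1 base deploy (shared by install + functional +
    backup/restore), +1 if the recipe has an upgrade tier, +1 per dependency.
    """
    if dep_count < 0:
        raise ValueError("dep_count must be non-negative")
    base = 1  # one deploy shared by install, functional and backup/restore
    upgrade = 1 if has_upgrade_tier else 0
    return base + upgrade + dep_count


def exceeds_minimum(observed_deploys: int, has_upgrade_tier: bool, dep_count: int) -> bool:
    """True if a run deployed more often than the rule allows (hypothetical check)."""
    return observed_deploys > expected_min_deploys(has_upgrade_tier, dep_count)
```

For example, under these assumptions a recipe with an upgrade tier and two deps would be expected to need 1 + 1 + 2 = 4 deploys, and a run that observed 5 would be flagged.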
@@ -82,3 +82,32 @@ item into the project `BACKLOG.md` as `[idea]` if/when it becomes relevant.
bottleneck** (e.g. D8 throwaway-rebuild / fresh-canonical seeding) **AND** the cache lives on
**recreate-surviving storage** (an Incus volume / a path on host b1, not the VM's ephemeral disk).
Otherwise it's complexity without payoff. See DECISIONS.md "Phase 2pc". *Added:* 2026-05-29.
- **Phase-2b empirical performance work (moved out of the 2b phase).** The original Phase 2b was a full
empirical perf program: per-phase timing instrumentation in `results.json`, a cold/warm baseline
across representative recipes, a Pareto attribution, and a menu of software optimizations. **Deferred
(operator, 2026-05-30):** the real deploy-speed bottleneck turned out to be **hardware**, not
software — the cc-ci VM was **2 vCPU on a 4-core host** and **disk-I/O-bound** (load ~8, io pressure
~65%) while running warm-keycloak (JVM) + all infra; RAM was never the constraint. Fixed **directly**:
bumped to **4 vCPU** and made cc-nix-test the **only running VM** on b1. The software micro-opts below
are judged unlikely to move the needle enough to justify the work; revisit ONLY if measurement later
shows a specific software bottleneck. (Phase 2b is narrowed to just confirming the test sequence
already minimizes deploys — see plan-phase2b.) Parked ideas:
- **Per-phase timing instrumentation** + cold/warm **baseline** + **attribution** — do this first if
perf is ever revisited; numbers should drive any change.
- **Image pulls:** local registry pull-through cache (see the item above) and/or pre-pull/warm the
enrolled recipes' image set so the first run doesn't pay the cold pull.
- **Readiness/convergence:** replace fixed sleeps with tight health-endpoint polling; per-recipe
readiness probes; parallelize independent readiness checks within a run.
- **Warm shared SSO provider** (already partly live as warm-keycloak): saves per-run SSO deploy time
but is a steady JVM CPU tax that slows non-SSO recipes — only worth it with proven per-run
isolation; consider start-when-needed / stop-when-idle rather than always-on.
- **Runner/build caching:** persistent nix store + warm flake eval; cache pip/uv wheels + Playwright
browsers in a persistent volume.
- **Concurrency sizing:** tune `MAX_TESTS`/runner capacity + per-recipe weights so light recipes run
concurrently while heavy ones serialize, without overcommitting the node.
- **Resources:** further vCPU/RAM/disk-I/O sizing (the 4-vCPU bump is done; storage I/O on b1 is the
harder co-bottleneck — a faster storage pool if it ever matters).
- **abra/secret overhead:** avoid regenerating/re-inserting secrets redundantly across stages.
*Why deferred:* hardware was the real lever and is fixed; these are speculative software gains best
validated by measurement, not assumed. *Added:* 2026-05-30.
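
The "Readiness/convergence" idea above (replace fixed sleeps with tight health-endpoint polling) could look roughly like the sketch below. It is a minimal illustration under assumptions, not code from this project: `wait_until_ready`, its parameters, and the idea of passing the health check in as a `probe` callable are all hypothetical. It returns the elapsed time so the result could feed the per-phase timing that the first parked idea asks for.

```python
import time
from typing import Callable


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll `probe` until it returns True; return the seconds waited.

    Hypothetical helper. `probe` stands in for a health-endpoint check;
    connection-level failures (OSError) are treated as "not ready yet".
    Raises TimeoutError if the service is not ready within `timeout_s`.
    """
    start = clock()
    deadline = start + timeout_s
    while True:
        try:
            if probe():
                return clock() - start
        except OSError:
            pass  # service not accepting connections yet; keep polling
        if clock() >= deadline:
            raise TimeoutError(f"not ready after {timeout_s:.1f}s")
        sleep(interval_s)
```

Compared with a fixed sleep, a loop like this returns as soon as the probe succeeds and fails loudly at the deadline. Whether that saves meaningful time here is exactly what the deferred measurement would have to show.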