Records the exact tier+deps/SSO -> rung translation derive_rungs uses (the layer the level depends
on), gap-caps semantics (N/A caps like fail, conservative/never-inflate), the results.json schema,
flags (clean_teardown/no_secret_leak), and artifact dir ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/
(dashboard serves /runs/<id>/ in U2/U4). So the Adversary can verify the level against a documented contract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per-recipe deploy budget confirmed minimal (1 base + N_cold_deps, upgrade shares
the base in place) and enforced (DG4.1); no redundant deploy existed. All four DoD
items PASS in REVIEW-2b (edf34e3). Phase 2b complete.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase 2b confirm-and-document outcome: per-recipe test-sequence deploy budget is
already minimal — `deploys == 1 (base, shared by all 5 tiers) + N_cold_deps` — and
tighter than plan B1's nominal `1+1(upgrade)+N` because the upgrade is an in-place
chaos redeploy of the prev-version base, not a separate deploy. Enforced as a hard
failure by DG4.1 (expected = 1 + deps_deployed_count, run_recipe_ci.py:1005-1010).
No redundant deploy found; none removed (none existed).
- docs/perf/deploys.md: the budget record (B4), names the out-of-budget WC5 reseed
- STATUS-2b.md: B1-B4 claim with WHAT/HOW/EXPECTED/WHERE for cold verify
- JOURNAL-2b.md / BACKLOG-2b.md / DECISIONS.md: reasoning + settled note
- consume machine-docs/BUILDER-INBOX.md (Adversary heads-up processed)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Explain the cc-ci server -> Hetzner migration (ssh cc-ci now 91.98.47.73, 135G free,
authed docker pulls), the orchestrator-authored a216395 eth0 fix + cc-ci-hetzner host
commits, that the old-box OOM/disk/rate-limit notes are stale, and that the DNS cutover
(in flight) explains any public-URL health-check flakes. Loops delete on consume.
nixos-infect emitted defaultGateway6.address="" and ipv6.routes=[{address="";
prefixLength=128}] for this v4-only Hetzner instance, so network-addresses-eth0.service
failed at boot ("ip route add /128 ... any valid prefix is expected rather than /128").
The box has no real IPv6 (link-local only, kernel-managed), so remove the empty IPv6
gateway, address, and route. IPv4 unchanged.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port from terraform-hetzner branch. Adds the Hetzner cc-ci flake host with
all 3 root authorized keys so nixos-rebuild doesn't lock out SSH access.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The depends_on:[app] override in 04cc44c does NOT make compose valid: docker normalizes short-form
depends_on to a map and merges additively, so {discourse}+{app}={discourse,app} keeps the invalid
'discourse' key (config --images still rc=15). Reverted to keep the overlay minimal (re-pin + grace
only). Prepull-skip is harmless because bitnamilegacy/discourse:3.3.1 is warm in the node image cache
→ inline pull is a no-op. Timeout headroom (3600s) retained in recipe_meta.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
full4 base deploy timed out at 2400s on the 7-GiB single node. Root causes:
(1) sidekiq.depends_on referenced undefined service 'discourse' (main svc is 'app') → abra config
--images rc=15 → prepull SKIPPED → 2.4GB image pulled inline during deploy, eating convergence
budget. Overlay now overrides sidekiq.depends_on:[app] (swarm ignores depends_on → no-op at
runtime, masks nothing) so prepull resolves+pre-pulls images on both base+head deploys.
(2) bumped DEPLOY_TIMEOUT/TIMEOUT 2400→3600 for headroom on the RAM/CPU-constrained Rails cold boot.
Also pre-cached bitnamilegacy/discourse:3.3.1 by tag on cc-ci (was dangling <none>).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
full9: backup tier FAILed with NameError('__file__' not defined) — recipe_meta.py is exec()'d into a
bare namespace so __file__ is undefined. The harness already has runner/ on sys.path + harness imported,
so import lifecycle directly. (restore PASSED on full9 — the data-integrity fix works; this just fixes
the verify probe crashing the backup tier.)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Root cause (instrumented, DECISIONS 2026-05-30): a DB recipe dumps its data in a backupbot pre-hook,
but if the DB container cycles mid-dump (intermittent on the loaded CI node — full5/6/7 RED, full8
green; NOT OOM/NOT healthcheck) the dump is truncated/absent and restic snapshots an empty path —
abra app backup 'succeeds' yet a later restore silently loses the data (ghost ci_marker).
Fix (additive, recipe-scoped via meta like READY_PROBE): recipe_meta may define BACKUP_VERIFY(domain)
-> bool, a READ-ONLY post-backup integrity probe. When it returns False the harness re-runs the whole
backup (fresh snapshot, re-stabilised db) up to 3x. Recipes without the hook are unaffected. ghost's
BACKUP_VERIFY confirms /var/lib/mysql/backup.sql.gz is a valid non-empty gzip. Weakens no assertion —
it only retries a flaky CAPTURE so P4 restore is RELIABLY exercised, not luck-dependent.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>