Root cause of the 2 failing custom tests: TLS_FLAVOR=notls → dovecot refuses plaintext auth over
network 143, so host-side IMAP login/auth isn't a meaningful signal. Smoke2 PROVED the in-container
path: sendmail (postfix container) local-injects a marker mail → doveadm search (imap container) finds
it in INBOX. test_mail_flow now exercises the real postfix→rspamd→dovecot deliver/store/fetch via
exec_in_app(service=smtp/imap). Dropped test_imap_login (network plaintext-auth disallowed under notls).
test_mailbox (create+config-export read-back) unchanged. PARITY.md updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diagnostic (RECIPE=mumble STAGES=install,backup,restore,custom, no upgrade) PROVED backup+restore green
on a stable 1.0.0 deploy incl. ci_marker survival (P4). The full-run backup 409 ('container not
running') was the chaos UPGRADE redeploy: host-mode 64738 must be released by the old task + rebound by
the new, and HEALTH_PATH '/' only proves the mumble-web sidecar (not the voice server), so wait_healthy
passed while the app churned → backup-bot execed a not-running container. Fix: extend
lifecycle.wait_ready_probes to support a TCP probe ({tcp_host,tcp_port,stable=N consecutive connects});
mumble recipe_meta READY_PROBE returns 64738 (stable=3) so the harness waits for the voice server up
after install AND upgrade before backup.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
git checkout <head_ref> aborted on the untracked install_steps-provided compose.host-ports.yml (which
head_ref tracks). Force-checkout yields the exact ref tree. Also fixes the mumble restore tier: backup
labels exist only in 1.0.0+, so backup/restore are meaningful only after the (now-working) upgrade moves
the app to head_ref. DECISIONS.md updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PRAGMA busy_timeout=N emits its own result row, polluting the read-back parse (seed read back
'20000\nupgrade-survives' → AssertionError 'seed did not commit', failing upgrade/backup/restore ops
— though the INSERT actually committed). Switch _sqlite to 'sqlite3 -cmd ".timeout 20000"' which sets
the busy timeout silently. install+custom already green (handshake/welcome/web/tcp PASS); this fixes
the P4 lifecycle ops.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mumble's pinned base deploy (prev version 0.2.0) FATAs 'has locally unstaged changes' because
install_steps provides an untracked compose.host-ports.yml. New recipe_meta CHAOS_BASE_DEPLOY=True +
lifecycle._recipe_meta_flag + deploy_app branch -> base uses chaos (skips clean-tree/lint, deploys the
checked-out pinned version, not LATEST), mirroring the lightweight-tag chaos-base path. DECISIONS.md
records the full mumble enrollment design.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The upstream compose.host-ports.yml exists only from v1.0.0+, but the upgrade-tier base deploy is
the previous published version (0.2.0+), which predates it — so EXTRA_ENV's COMPOSE_FILE failed to
resolve on the base deploy (config --images rc=14, deploy FATA). install_steps.sh now copies a
cc-ci-owned identical overlay into the recipe checkout when absent, so 64738 is host-published for
every version (base + upgrade) and on-host protocol tests reach 127.0.0.1:64738.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
v1 failed wait_healthy 'not healthy / (last status 500)': plausible's app starts before clickhouse
(plausible_events_db) is ready (recipe depends_on names events_db, mismatched → no swarm ordering) and
returns 500 until DB migrations finish (several min on cold deploy). It serves 302 once ready; widen
the health window.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lifecycle.prepull_images(recipe, domain): resolve images via docker compose config --images (COMPOSE_FILE
from the app .env — handles $VERSION interpolation + multi-compose) → docker pull each, skip-if-present
(zero network for cached pinned tags). Called in deploy_app before the (unchanged, real) abra.deploy AND
in generic.perform_upgrade before the chaos redeploy (warms new-version images). A pull failure RAISES a
clear pre-deploy error (not a converge timeout); deploy path unchanged (no docker service update/scale).
Removes PULL time not app-INIT time. 4 unit tests (tests/unit/test_prepull.py): present→skip, missing→
pull, pull-fail→raise, no-images→skip. NOT claimed yet — validating cold-verify criteria next.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversary cold-verify of F2-9 FAILED: the read-back's CKEditor-frame-attach wait timed out on a fresh
cold context (flaky, not 3x-reliable). Fix: read-back now polls EVERY frame's body text for the marker
(don't require the specific ckeditor-inner frame to attach — that's the flaky part) with a generous
~240s deadline + periodic reloads to unstick cold loads. The marker appearing in a fresh context still
proves server-side E2E-encrypted persistence (only URL+fragment key carried over). Also bumped the
session-1 post-type sync wait 9s→12s. F2-13 Adversary-owned; will validate cold before it closes F2-9.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>