cc-ci/JOURNAL-dstamp.md

JOURNAL — phase dstamp (Builder, reasoning/private)

2026-06-11 — Bootstrap + investigation

Read the phase plan, plan.md §6.1/§7/§9, the Adversary's REVIEW-dstamp prep notes, and the stamp-relevant harness code (abra.py, lifecycle.py:deployed_identity/recipe_checkout_ref/chaos_redeploy/prepull_images, generic.py:perform_upgrade/assert_upgraded, run_recipe_ci upgrade op + fetch_recipe).

Mechanism (from abra source @06a57de = the pinned binary)

chaos-version label is set in cli/app/deploy.go: for a -C deploy, getDeployVersion (l.365) returns Recipe.ChaosVersion() (l.367-373) and SetChaosVersionLabel(compose, stack, toDeployVersion) (l.168) applies it. ChaosVersion (pkg/recipe/git.go:300) = formatter.SmallSHA(Head().String()), with +U appended if the worktree is dirty. Head (l.483) = go-git repo.Head(). Crucially, app.Recipe.Ensure(ctx) (deploy.go:86) calls into git.go:38, which early-returns on ctx.Chaos (l.41-43), so a chaos deploy does NOT re-checkout the .env version. GetEnsureContext (cli/internal/ensure.go) wires EnsureContext{Chaos, Offline, IgnoreEnvVersion=DeployLatest} from the CLI flags. So -C ⇒ Ensure no-op ⇒ chaos version = whatever git HEAD the harness left checked out.
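
For cross-checking stamps by hand, the rule above can be modelled in a few lines. This is only a sketch of my reading of the abra source, not abra's code: the function name is made up, and the 8-character short-SHA length is an assumption inferred from the logged values (7ae7b0f7, eb96de94), not read from formatter.SmallSHA.

```python
def chaos_version(head_sha: str, dirty: bool, short_len: int = 8) -> str:
    """Model the chaos-version label: short SHA of HEAD, plus '+U' if dirty.

    short_len=8 is an assumption taken from the stamps seen in the logs.
    """
    if len(head_sha) < short_len:
        raise ValueError(f"need at least {short_len} hex chars, got {head_sha!r}")
    stamp = head_sha[:short_len]
    return (stamp + "+U") if dirty else stamp


# A clean checkout of the PR head would be stamped with the bare short SHA:
# chaos_version("7ae7b0f76efb", dirty=False) -> "7ae7b0f7"
```

Under this model the observed eb96de94+U label needs both HEAD at the base tag and a dirty worktree at the moment of the chaos deploy.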

The contradiction that drove the dig

The m2p failure message is chaos commit 'eb96de94+U', not the intended PR-head '7ae7b0f76efb'. eb96de9 = tag 0.7.0+3.3.1 (the upgrade base); 7ae7b0f = PR head (9 commits past that tag, and there is NO 0.8/0.9 tag despite HEAD's "upgrade to 0.9.0+3.5.0" message). The harness perform_upgrade does recipe_checkout_ref(head_ref=7ae7b0f) then chaos_redeploy, with only env_set + prepull_images (pure docker compose, no git) in between — and the run's recipe snapshot HEAD = 7ae7b0f. So at deploy time HEAD should be 7ae7b0f ⇒ stamp 7ae7b0f. Yet it stamped eb96de9. abra's source says chaos = Head(); so for eb96de9 to be stamped, HEAD had to be eb96de9 at the chaos deploy — which the isolated flow never produces.

Reproductions (all on cc-ci, scratch ABRA_DIR; deploys bail at "secret not generated", which is deploy.go:140, AFTER the chaos version is computed+logged at deploy.go:372)

  1. cp -a canonical recipe, checkout head→base(tag)→head, abra app deploy -C → taking chaos version: 7ae7b0f7. HEAD stays 7ae7b0f. NO drift.
  2. real non-chaos base deploy (exercises go-git EnsureVersion which checks out tag via Branch: refs/tags/0.7.0+3.3.1, leaving HEAD=eb96de9), then CLI git checkout -f head, then -C deploy → taking chaos version: 7ae7b0f7. NO drift.
  3. mirror-faithful: git clone <recipe-maintainers/discourse> + git checkout 7ae7b0f + git fetch <coop-cloud/discourse> refs/tags/*:refs/tags/* (exact fetch_recipe), then base deploy → re-checkout head → -C deploy → taking chaos version: 7ae7b0f7. NO drift.
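
Each reproduction ends with the same check: compare the logged chaos version against the intended head. A minimal sketch of that check follows, assuming only the "taking chaos version: ..." text quoted above. The rest of the log line format is not assumed, the "+U" suffix appearing in that log line is an assumption, and StampCheck / check_stamp are hypothetical names, not harness functions.

```python
import re
from typing import NamedTuple, Optional

# Matches the log text quoted in the reproductions. Anything before it on the
# line (timestamp, level) is ignored because that format is not assumed here.
CHAOS_LINE = re.compile(r"taking chaos version: ([0-9a-f]+)(\+U)?")


class StampCheck(NamedTuple):
    stamp: Optional[str]  # short SHA from the log, None if no line was found
    dirty: bool           # True if the stamp carried the +U suffix
    drifted: bool         # True if the stamp is not a prefix of the expected head


def check_stamp(log_text: str, expected_head: str) -> StampCheck:
    """Report whether the logged chaos version matches the expected HEAD."""
    m = CHAOS_LINE.search(log_text)
    if m is None:
        # No stamp line at all: cannot confirm, so treat it as drift.
        return StampCheck(None, False, True)
    sha = m.group(1)
    dirty = m.group(2) is not None
    return StampCheck(sha, dirty, not expected_head.startswith(sha))
```

Fed the three reproduction logs with expected_head="7ae7b0f76efb", this should report drifted=False each time; fed the m2p stamp it should report drifted=True and dirty=True.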

Conclusion: the isolated git/abra version-resolution path is correct in the current host state. The drift is not in that path.

Timeline / differentiator

  • abra binary: constant since 2026-06-01 (system-4). Not abra.
  • Same ref 7ae7b0f: run 184 (06-05 02:17, solo) was L4 upgrade-PASS. The drift runs (m2b 06-10 20:54, m2p 06-11 00:44, ab 06-11 00:48) are clustered (m2p & ab 4 min apart → overlapping for a multi-tier discourse run that takes ≫4 min).
  • app_domain hashes (recipe|pr|ref) ⇒ all three drift runs, same ref, collide on one swarm stack. The upgrade chaos_redeploy does NOT take deploy_app's app-domain flock, so two concurrent runs can interleave deploys on the shared stack and the <stack>_app service label read by deployed_identity reflects whichever deploy last wrote it.
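
To make the collision argument concrete: any deterministic hash over (recipe|pr|ref) gives identical runs the same stack, so only a lock keyed on that same value can serialise them. The sketch below is illustrative only. app_domain_key is a hypothetical stand-in (the harness's real hash is not reproduced here), and the lock-file name and directory are assumptions, not what deploy_app actually uses.

```python
import fcntl
import hashlib
import os
import tempfile
from contextlib import contextmanager


def app_domain_key(recipe: str, pr: str, ref: str) -> str:
    """Hypothetical stand-in for the harness hash over (recipe|pr|ref)."""
    return hashlib.sha256(f"{recipe}|{pr}|{ref}".encode()).hexdigest()[:12]


@contextmanager
def app_domain_lock(key: str, lock_dir: str = ""):
    """Hold an exclusive flock for one app domain (lock path is an assumption)."""
    path = os.path.join(lock_dir or tempfile.gettempdir(), f"app-domain-{key}.lock")
    with open(path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while a peer run holds the lock
        try:
            yield path
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
```

Two runs with the same recipe, PR, and ref get the same key and would therefore queue on the same lock file. A chaos_redeploy that runs outside such a lock gets no ordering guarantee against a peer's deploy on the shared stack.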

Leading hypothesis: the "harness-neutral env drift" is actually a concurrency artifact of the rcust-phase M2 A/B discourse experiments running near-simultaneously on the shared stack — not an abra/recipe/environment regression. Run 184 solo = green; clustered 06-11 = drift; isolated re-reproduction now = green. Testing with one clean isolated real run (install,upgrade) before committing to this attribution — direct evidence required by the plan, not inference alone.

Open: must still explain exactly how a concurrent peer produces an eb96de9+U (dirty CHAOS) label on the shared stack — a base deploy is pinned/non-chaos (no chaos label), so the +U chaos label must come from some chaos deploy with HEAD=eb96de9. The isolated real run + (if needed) a deliberate 2-run concurrency repro will nail the mechanism. Will NOT claim M1 on inference.