# JOURNAL — phase `dstamp` (Builder, reasoning/private)
## 2026-06-11 — Bootstrap + investigation

Read the phase plan, plan.md §6.1/§7/§9, the Adversary's REVIEW-dstamp prep notes, and the
stamp-relevant harness code: `abra.py`, `lifecycle.py` (`deployed_identity`, `recipe_checkout_ref`,
`chaos_redeploy`, `prepull_images`), `generic.py` (`perform_upgrade`, `assert_upgraded`), and the
run_recipe_ci upgrade op + `fetch_recipe`.

### Mechanism (from abra source @06a57de = the pinned binary)

The chaos-version label is set in `cli/app/deploy.go`: for a `-C` deploy, `getDeployVersion` (l.365)
returns `Recipe.ChaosVersion()` (l.367-373), and `SetChaosVersionLabel(compose, stack, toDeployVersion)`
(l.168) applies it. `ChaosVersion` (`pkg/recipe/git.go:300`) = `formatter.SmallSHA(Head().String())`,
plus `+U` if dirty. `Head` (l.483) = go-git `repo.Head()`. Crucially, `app.Recipe.Ensure(ctx)`
(deploy.go:86) calls into git.go:38, which **early-returns on `ctx.Chaos`** (l.41-43), so a chaos
deploy does NOT re-checkout the .env version. `GetEnsureContext` (cli/internal/ensure.go) wires
`EnsureContext{Chaos, Offline, IgnoreEnvVersion=DeployLatest}` from the CLI flags. So `-C` ⇒ Ensure
no-op ⇒ chaos version = whatever git HEAD the harness left checked out.

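As a reading aid, the resolution rule above can be modelled in a few lines of Python. This is a sketch of the behaviour described in these notes, not abra's Go code: `EnsureContext` mirrors only the three fields named above, `Repo`, `ensure` and `chaos_version` are illustrative stand-ins, and the 8-character short SHA is an assumption inferred from the stamps seen in the logs (`7ae7b0f7`, `eb96de94+U`).

```python
from dataclasses import dataclass


@dataclass
class EnsureContext:
    """Stand-in for the three fields named in the notes (not abra's type)."""
    chaos: bool = False
    offline: bool = False
    ignore_env_version: bool = False


@dataclass
class Repo:
    """Toy recipe checkout: just a HEAD SHA and a dirty flag."""
    head: str
    dirty: bool = False


def ensure(repo: Repo, ctx: EnsureContext, env_version_sha: str) -> None:
    """Model of Ensure: a chaos deploy returns before any checkout."""
    if ctx.chaos:
        return  # early return: HEAD stays wherever the harness left it
    # Simplification: the non-chaos path just checks out the pinned version.
    repo.head = env_version_sha


def chaos_version(repo: Repo, short_len: int = 8) -> str:
    """Short SHA of HEAD, with '+U' appended when the tree is dirty.

    short_len=8 is an assumption taken from the logged stamps.
    """
    stamp = repo.head[:short_len]
    return stamp + "+U" if repo.dirty else stamp


# SHA prefixes as quoted in the notes; full SHAs are not recorded there.
repo = Repo(head="7ae7b0f76efb")
ensure(repo, EnsureContext(chaos=True), env_version_sha="eb96de94")
print(chaos_version(repo))  # 7ae7b0f7
```

With `chaos=True` the pinned version is never consulted, which is the whole point: the stamp depends only on where HEAD already was.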
### The contradiction that drove the dig

The m2p failure message is `chaos commit 'eb96de94+U', not the intended PR-head '7ae7b0f76efb'`.
`eb96de9` = tag `0.7.0+3.3.1` (the upgrade base); `7ae7b0f` = PR head (9 commits past that tag;
there is NO 0.8/0.9 tag despite HEAD's "upgrade to 0.9.0+3.5.0" message). The harness
`perform_upgrade` does `recipe_checkout_ref(head_ref=7ae7b0f)` then `chaos_redeploy`, with only
`env_set` + `prepull_images` (pure docker compose, no git) in between, and the run's recipe
**snapshot HEAD = 7ae7b0f**. So at deploy time HEAD *should* be 7ae7b0f ⇒ stamp 7ae7b0f. Yet it
stamped eb96de9. abra's source says chaos = Head(), so for eb96de9 to be stamped, HEAD had to be
eb96de9 at the chaos deploy, which the isolated flow never produces.

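The check behind that failure message amounts to a prefix comparison between the stamped label and the intended head. A minimal sketch, assuming the label is a short SHA optionally followed by `+U`; `parse_stamp` and `stamp_matches_head` are hypothetical helper names, not the harness's `assert_upgraded`.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    sha: str
    dirty: bool


def parse_stamp(label: str) -> Stamp:
    """Split a chaos label such as 'eb96de94+U' into (short SHA, dirty flag)."""
    if label.endswith("+U"):
        return Stamp(label[:-2], True)
    return Stamp(label, False)


def stamp_matches_head(label: str, intended_head: str) -> bool:
    """True when the stamped short SHA is a prefix of the intended head SHA."""
    sha = parse_stamp(label).sha
    return bool(sha) and intended_head.startswith(sha)


# Values quoted from the m2p failure message.
print(stamp_matches_head("eb96de94+U", "7ae7b0f76efb"))  # False
print(stamp_matches_head("7ae7b0f7", "7ae7b0f76efb"))    # True
```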
### Reproductions

All on cc-ci with a scratch ABRA_DIR. Deploys bail at `secret not generated` (deploy.go:140), which
is AFTER the chaos version is computed+logged at deploy.go:372.

1. `cp -a` the canonical recipe, checkout head→base(tag)→head, `abra app deploy -C` → `taking chaos
   version: 7ae7b0f7`. HEAD stays 7ae7b0f. NO drift.
2. Real non-chaos base deploy (exercises go-git `EnsureVersion`, which checks out the tag via
   `Branch: refs/tags/0.7.0+3.3.1`, leaving HEAD=eb96de9), then CLI `git checkout -f head`, then
   `-C` deploy → `taking chaos version: 7ae7b0f7`. NO drift.
3. Mirror-faithful: `git clone <recipe-maintainers/discourse>` + `git checkout 7ae7b0f` +
   `git fetch <coop-cloud/discourse> refs/tags/*:refs/tags/*` (exact `fetch_recipe`), then base
   deploy → re-checkout head → `-C` deploy → `taking chaos version: 7ae7b0f7`. NO drift.

Conclusion: the isolated git/abra version-resolution path is **correct** in the current host
state. The drift is not in that path.

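Each reproduction was judged by the `taking chaos version:` line in the deploy output. A small sketch of that check, assuming the line has exactly the form quoted above; the helper names and the sample output string are illustrative, not captured logs.

```python
import re
from typing import Optional

# Matches the fragment quoted in the notes, e.g. "taking chaos version: 7ae7b0f7".
CHAOS_LINE = re.compile(r"taking chaos version: ([0-9a-f]+(?:\+U)?)")


def logged_chaos_version(output: str) -> Optional[str]:
    """Return the chaos version abra logged, or None if the line is absent."""
    match = CHAOS_LINE.search(output)
    return match.group(1) if match else None


def drifted(output: str, expected_head: str) -> bool:
    """True when no version was logged or it is not a clean prefix of expected_head.

    A dirty '+U' stamp is treated as a mismatch in this sketch.
    """
    version = logged_chaos_version(output)
    if version is None:
        return True
    return not expected_head.startswith(version)


# Illustrative output only: the two quoted fragments joined by a newline.
sample = "taking chaos version: 7ae7b0f7\nsecret not generated\n"
print(logged_chaos_version(sample))     # 7ae7b0f7
print(drifted(sample, "7ae7b0f76efb"))  # False
```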
### Timeline / differentiator

- abra binary: constant since 2026-06-01 (system-4). Not abra.
- Same ref 7ae7b0f: run 184 (06-05 02:17, **solo**) was L4 upgrade-PASS. The drift runs
  (m2b 06-10 20:54, m2p 06-11 00:44, ab 06-11 00:48) are **clustered** (m2p & ab 4 min apart →
  overlapping, since a multi-tier discourse run takes ≫4 min).
- `app_domain` hashes (recipe|pr|ref) ⇒ all three drift runs, same ref, **collide on one swarm
  stack**. The upgrade `chaos_redeploy` does NOT take `deploy_app`'s app-domain flock, so two
  concurrent runs can interleave deploys on the shared stack, and the `<stack>_app` service label
  read by `deployed_identity` reflects whichever deploy last wrote it.

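One way to close the gap described in the last bullet would be to hold the same per-domain lock around the upgrade redeploy. The sketch below uses `fcntl.flock` from the standard library (Unix only); the lock directory, the file naming, and `app_domain_lock` itself are assumptions for illustration, since these notes do not show how `deploy_app` takes its flock.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def app_domain_lock(app_domain: str, lock_dir: str, blocking: bool = True):
    """Hold an exclusive flock on <lock_dir>/<app_domain>.lock for the with-body.

    Hypothetical helper: path layout and name are not taken from the harness.
    """
    os.makedirs(lock_dir, exist_ok=True)
    path = os.path.join(lock_dir, f"{app_domain}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # In non-blocking mode this raises BlockingIOError if a peer holds the lock.
        fcntl.flock(fd, flags)
        try:
            yield path
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Usage: the placeholder domain stands in for whatever app_domain computes.
lock_dir = tempfile.mkdtemp()
with app_domain_lock("example.test", lock_dir):
    # chaos_redeploy(...) and the deployed_identity read would go here.
    try:
        with app_domain_lock("example.test", lock_dir, blocking=False):
            print("unexpected: lock acquired twice")
    except BlockingIOError:
        print("peer run would have to wait")
```

Holding the lock across both the redeploy and the label read matters: releasing it between the two would still let a peer overwrite the label before it is checked.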
**Leading hypothesis:** the "harness-neutral env drift" is actually a **concurrency artifact** of
the rcust-phase M2 A/B discourse experiments running near-simultaneously on the shared stack, not
an abra/recipe/environment regression. Run 184 solo = green; clustered 06-10/06-11 = drift; isolated
re-reproduction now = green. Testing with one clean isolated real run (install,upgrade) before
committing to this attribution: the plan requires direct evidence, not inference alone.

Open: must still explain *exactly* how a concurrent peer produces an `eb96de9+U` (dirty CHAOS)
label on the shared stack. A base deploy is pinned/non-chaos (no chaos label), so the +U chaos
label must come from some chaos deploy with HEAD=eb96de9. The isolated real run + (if needed) a
deliberate 2-run concurrency repro will nail the mechanism. Will NOT claim M1 on inference.

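Until the real repro is run, the hazard can at least be stated precisely with a toy model. The sketch below is not abra or docker, and it does not explain where a HEAD=eb96de9 chaos deploy would come from; it only shows that when two runs write one shared label, a reader sees the last writer. The run names, the event ordering, and the `SharedStack` class are made up for illustration.

```python
class SharedStack:
    """Toy stand-in for one swarm stack carrying a single chaos-version label."""

    def __init__(self) -> None:
        self.label = None

    def chaos_deploy(self, stamp: str) -> None:
        self.label = stamp  # last writer wins

    def deployed_identity(self):
        return self.label


def run_events(stack, events):
    """Apply (run, action, stamp) events in order; return what each 'read' saw."""
    seen = {}
    for run, action, stamp in events:
        if action == "deploy":
            stack.chaos_deploy(stamp)
        elif action == "read":
            seen[run] = stack.deployed_identity()
    return seen


# Hypothetical interleaving: run A deploys the PR head, a peer B then stamps
# something else before A reads the label back.
events = [
    ("A", "deploy", "7ae7b0f7"),
    ("B", "deploy", "eb96de94+U"),
    ("A", "read", None),
]
print(run_events(SharedStack(), events))  # {'A': 'eb96de94+U'}
```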