BACKLOG — phase dstamp
Build backlog (Builder-owned)
- Read phase plan + plan.md §6.1/§7/§9 + Adversary prep notes + stamp-relevant harness code.
- Establish abra's chaos-version mechanism from abra source @06a57de (= pinned binary).
- Rule out abra-version drift (constant store path since nixos system-4, 2026-06-01).
- Minimal reproductions of the git/abra chaos-version path (cp-a; go-git base; mirror-faithful) — all stamp the CORRECT head 7ae7b0f7, NO drift in current host state.
- Timeline: run 184 (06-05, solo) green @7ae7b0f; clustered 06-10/06-11 runs drift @ same ref.
- Identify the shared-stack collision vector (app_domain = hash(recipe|pr|ref); the upgrade chaos_redeploy bypasses the app-domain flock).
- IN FLIGHT: single clean ISOLATED real run (install, upgrade @7ae7b0f, console-captured) → decide concurrency artifact vs. real drift.
- If concurrency artifact: pin the exact mechanism producing the eb96de9+U chaos label on the shared stack (deliberate 2-run concurrency repro if needed); decide the fix (app-lock the upgrade chaos_redeploy / serialize same-stack runs) WITHOUT weakening HC1.
- If real env drift: read the isolated-run console and attribute the exact 06-05→06-10 change.
- Blast-radius sweep: every enrolled recipe's latest upgrade-tier evidence for the same signature (prev-base tag commit stamped where a version was expected).
- Restore discourse to its true level in real CI via the drone !testme path (M2).
- Prove HC1 still has teeth (a deliberately wrong stamp still FAILs).
- Close the DEFERRED.md dstamp re-entry with pointers.
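A minimal Python sketch of the collision vector and the app-lock fix, for illustration only. app_domain, app_lock, try_app_lock and LOCK_DIR are hypothetical names, and the digest/prefix format is an assumption picked to resemble the disc-ae10f0 stack name, not the harness's actual code. The point it shows: two runs on the same recipe|pr|ref always get the same domain, so only a per-domain flock held around the upgrade chaos_redeploy keeps them off each other's stack.

```python
import fcntl
import hashlib
import os
import tempfile
from contextlib import contextmanager

# Assumption: per-domain lock files live in the system temp dir.
LOCK_DIR = tempfile.gettempdir()


def app_domain(recipe: str, pr: str, ref: str) -> str:
    """Hypothetical app_domain = hash(recipe|pr|ref).

    Same recipe, PR and ref always yield the same domain, which is why
    two concurrent runs on one ref end up sharing a stack.
    """
    digest = hashlib.sha256(f"{recipe}|{pr}|{ref}".encode()).hexdigest()
    return f"{recipe[:4]}-{digest[:6]}"


def _lock_path(domain: str, lock_dir: str) -> str:
    return os.path.join(lock_dir, f"dstamp-{domain}.lock")


@contextmanager
def app_lock(domain: str, lock_dir: str = LOCK_DIR):
    """Hold an exclusive flock on the domain for the whole with-block.

    Wrapping the upgrade chaos_redeploy in this (hypothetical) lock
    serializes same-stack runs instead of letting one redeploy race another.
    """
    with open(_lock_path(domain, lock_dir), "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks while another holder exists
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def try_app_lock(domain: str, lock_dir: str = LOCK_DIR) -> bool:
    """Non-blocking probe: True if the domain lock is free right now."""
    with open(_lock_path(domain, lock_dir), "a") as fh:
        try:
            fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False
        fcntl.flock(fh, fcntl.LOCK_UN)
        return True
```

A blocking LOCK_EX serializes same-stack runs; swapping in LOCK_NB would instead make the second run fail fast, which trades throughput for an explicit "stack busy" error.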
Adversary findings
Root cause independently confirmed @2026-06-11T17:3x (JOURNAL not read, anti-anchoring preserved):
Docker Swarm failure_action: rollback + order: start-first on the app service in discourse's
compose.yml (in BOTH the eb96de94 base AND the 7ae7b0f PR head). On the upgrade chaos redeploy,
start-first runs the OLD and NEW tasks side by side (~2× memory); under host memory pressure the
heavy Rails/precompile app fails swarm's 5s update monitor → rollback fires → the app service spec
reverts to PreviousSpec (chaos-version=eb96de94+U). Because start-first kept the OLD task serving,
wait_healthy passed; deployed_identity read the rolled-back spec, and HC1 misreported it as
"stamp mismatch" (the real failure was "new task failed the update monitor").
services_converged blind spot: "rollback_completed" is not among its blocking states, so it returned True.
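Why the rollback surfaced as "stamp mismatch" can be shown with a hypothetical HC1-style comparison in Python (hc1_check, stamp_commit and MIN_PREFIX are illustrative names, not the harness's). It drops the +U marker and prefix-compares the stamped commit with the expected head, so the rolled-back PreviousSpec label fails exactly as observed, and a deliberately wrong stamp fails the same way.

```python
from typing import Optional

MIN_PREFIX = 7  # assumption: refs are compared on at least 7 hex chars


def stamp_commit(chaos_version: str) -> str:
    """Drop the '+U' marker: 'eb96de94+U' -> 'eb96de94'."""
    return chaos_version.split("+", 1)[0]


def hc1_check(deployed_stamp: str, expected_head: str) -> Optional[str]:
    """Return None if the deployed stamp names the expected commit,
    otherwise a failure message.

    Prefix matching lets an 8-char stamp match a 7-char ref
    (7ae7b0f7 vs 7ae7b0f); an empty or too-short stamp always fails.
    """
    got = stamp_commit(deployed_stamp).lower()
    want = expected_head.lower()
    n = min(len(got), len(want))
    if n >= MIN_PREFIX and got[:n] == want[:n]:
        return None
    return f"stamp mismatch: expected {want}, deployed {got}"
```

On its own such a check cannot tell a drifted stamp from a rolled-back update; that distinction has to come from the service's update state.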
Evidence: docker service inspect disc-ae10f0_..._app confirmed UpdateConfig: {On failure: rollback, Order: start-first, Monitoring Period: 5s}. repro1 (isolated, no concurrency) ALSO
showed drift → the pure-concurrency hypothesis was REFUTED independently, before reading the Builder's evidence.
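The same evidence can be checked mechanically. A Python sketch, assuming the docker CLI is on PATH: docker service inspect --format '{{json .Spec.UpdateConfig}}' prints the Engine API's UpdateConfig object (FailureAction, Order, and Monitor in nanoseconds). inspect_update_config and update_config_findings are hypothetical helper names.

```python
import json
import subprocess


def inspect_update_config(service: str) -> dict:
    """Read Spec.UpdateConfig for a swarm service via the docker CLI."""
    out = subprocess.run(
        ["docker", "service", "inspect", "--format",
         "{{json .Spec.UpdateConfig}}", service],
        check=True, capture_output=True, text=True,
    ).stdout
    # A service without an update_config prints "null"; treat it as empty.
    return json.loads(out) or {}


def update_config_findings(cfg: dict) -> list:
    """Summarize the UpdateConfig fields implicated in the rollback."""
    findings = []
    if cfg.get("FailureAction") == "rollback":
        findings.append("failure_action=rollback")
    if cfg.get("Order") == "start-first":
        findings.append("order=start-first")
    if "Monitor" in cfg:
        # The Engine API reports the monitor window in nanoseconds.
        findings.append(f"monitor={cfg['Monitor'] / 1e9:g}s")
    return findings
```

Usage: update_config_findings(inspect_update_config(service_name)) on the discourse app service should list both failure_action=rollback and order=start-first before the fix, and no order=start-first entry once the stop-first overlay is applied.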
abra exonerated: abra reads git HEAD = 7ae7b0f and stamps 7ae7b0f7+U CORRECTLY. Three
bail-at-secrets repros plus the repro2 debug line confirm this. The +U comes from compose.ccci.yml, an
untracked file in the per-run recipe dir (an rcust-era overlay, absent from run 184's pre-rcust path).
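A rough Python model of the stamp as described here: 8-char HEAD, plus +U when the recipe checkout is not clean, untracked files included. This is an assumption for illustration, not a transcription of abra's Go source at 06a57de; chaos_stamp and stamp_for_checkout are hypothetical names, while the git commands are standard.

```python
import subprocess


def chaos_stamp(head_sha: str, porcelain_status: str) -> str:
    """Model: short HEAD, plus '+U' if `git status --porcelain` lists
    anything (a single '?? compose.ccci.yml' line is enough)."""
    stamp = head_sha[:8]
    if any(line.strip() for line in porcelain_status.splitlines()):
        stamp += "+U"
    return stamp


def stamp_for_checkout(recipe_dir: str) -> str:
    """Apply the model to a real recipe checkout on disk."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", recipe_dir, *args],
            check=True, capture_output=True, text=True,
        ).stdout

    head = git("rev-parse", "HEAD").strip()
    return chaos_stamp(head, git("status", "--porcelain"))
```

Under this model, every rcust-era run stamps +U because of the overlay, so the marker says nothing about which commit was deployed.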
Fix 0cc31a5 assessed CORRECT: the overlay sets order: stop-first (removing the 2×-memory OOM
trigger); lifecycle.assert_upgrade_converged closes the wait_healthy blind spot by catching
"rollback_completed" | "rollback_paused" | "paused" and failing HONESTLY. HC1 is unchanged.
A minor race window in assert_upgrade_converged (its first poll could see "none" before Docker
starts the roll) is covered: with stop-first, a rollback that lands after the race also fails wait_healthy.
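A minimal Python sketch of the convergence check described in the fix, assuming the harness can read the service's UpdateStatus.State as a string (None when Docker has recorded no update). The function, exception and constant names are hypothetical; this is not the 0cc31a5 code. Unlike the old services_converged, it treats rollback_completed, rollback_paused and paused as failures instead of letting them fall through as converged.

```python
import time
from typing import Callable, Optional

# UpdateStatus.State values meaning the new spec did NOT take.
FAILED_STATES = frozenset({"rollback_completed", "rollback_paused", "paused"})


class UpgradeNotConverged(Exception):
    """Raised with the real reason instead of a downstream stamp mismatch."""


def assert_upgrade_converged(
    read_state: Callable[[], Optional[str]],
    timeout: float = 300.0,
    interval: float = 2.0,
) -> None:
    deadline = time.monotonic() + timeout
    while True:
        state = read_state()
        if state in FAILED_STATES:
            raise UpgradeNotConverged(f"swarm update ended in {state!r}")
        # None (no recorded update) counts as converged: this is the race
        # window noted in the entry, covered by stop-first + wait_healthy.
        if state is None or state == "completed":
            return
        # "updating" / "rollback_started": keep polling until the deadline.
        if time.monotonic() >= deadline:
            raise UpgradeNotConverged(f"update still {state!r} after {timeout}s")
        time.sleep(interval)
```

Injecting read_state keeps the polling logic testable without a swarm; in the harness it would wrap whatever call already fetches UpdateStatus.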
No blocker. Formal verdict awaits Builder's claim(dstamp) commit.