cc-ci-orchestrator/cc-ci-plan/plan-phase-dstamp-discourse-drift.md
autonomic-bot 327b9f4efe plan: phases dstamp, mailu, kuma, drone (queued after bsky) + journal
- dstamp: attribute + fix the discourse abra-stamp drift (env change 06-05→
  06-10, harness-neutral, currently pinning discourse at L1); blast-radius
  sweep; HC1 keeps its teeth
- mailu: backupbot v2 labels recipe PR, restore proven on real seeded mail,
  backup rung earned instead of skipped (operator approved re-entry)
- kuma: uptime-kuma first-run wizard + create-a-monitor functional test
  (Socket.IO or Playwright, real probe evidence, flake-checked)
- drone: gitea-dep enrollment, maximal subset per Phase-2 scoping;
  P0 /etc/timezone host deploy is orchestrator-owned (3bde76f committed)
2026-06-11 11:43:03 +00:00


# Phase `dstamp` — investigate & solve the discourse abra-stamp drift
**Mission (operator-specified):** since ~2026-06-10, discourse's upgrade tier fails its
HC1 version-stamp check on EVERY run — on both old and new harness at the same ref, so it
is proven harness-neutral env drift, and its mechanism is UNATTRIBUTED. Find the root
cause, fix it properly, restore discourse to its true level in real CI, and determine
whether the same drift silently affects any other recipe's upgrade tier.
State files: `STATUS-dstamp.md`, `BACKLOG-dstamp.md`, `REVIEW-dstamp.md`,
`JOURNAL-dstamp.md`. `DECISIONS.md` is shared.
## 1. Known evidence (rcust M2, 2026-06-11 — start here)
- Baseline: run 184 (2026-06-05) had discourse at L4 with upgrade green.
- Since ~06-10: upgrade-HC1 at ref `7ae7b0f` stamps the **prev-base tag commit
(eb96de94+U)** instead of the expected version. The result is IDENTICAL on the old
pre-rcust harness and on new main (A/B at the same ref + invocation), which exonerates
rcust. Branch-tip, tag, and abra-pin drift were eliminated as causes during M2; what
CHANGED in the env between 06-05 and 06-10 was never attributed.
- Evidence artifacts: `/var/lib/cc-ci-runs/m2p-discourse/`,
`/var/lib/cc-ci-runs/ab-discourse-7ae7b0f-oldmain/`, JOURNAL-rcust 2026-06-11 entries,
machine-docs/DEFERRED.md note.
- Under the new de-capped semantics a failed upgrade rung blocks at L1 — this drift is
actively misrepresenting discourse, which makes it a live quality regression, not
cosmetic.
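
The signature described above (a commit-style stamp such as `eb96de94+U` where a version
was expected) can be recognised mechanically. The following is a minimal sketch, not the
harness's actual check: the function name `classify_stamp` is hypothetical, and the regex
assumes a commit stamp is 7 to 40 lowercase hex characters with an optional `+U` suffix,
which is inferred only from the single example in the evidence.

```sh
#!/bin/sh
# classify_stamp: label a stamp string as "commit", "version", or "empty".
# Hypothetical helper; the pattern is an assumption based on the one observed
# stamp (eb96de94+U), not on a documented abra format.
classify_stamp() {
    stamp="$1"
    if [ -z "$stamp" ]; then
        echo "empty"
    elif printf '%s\n' "$stamp" | grep -Eq '^[0-9a-f]{7,40}(\+U)?$'; then
        echo "commit"      # looks like an abbreviated or full commit hash
    else
        echo "version"     # anything else is treated as a version string
    fi
}

# Example: classify the stamp observed in the failing runs.
classify_stamp 'eb96de94+U'
```

A purely numeric version with seven or more digits and no dots would be misclassified as a
commit, so any hit should be confirmed against the run's evidence by hand.
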
## 2. Investigation requirements
1. **Attribute, don't patch.** Build a timeline of everything stamp-relevant that changed
on the CI host between 06-05 and 06-10:
   - the abra binary version/mtime (`abra --version`; `~/.local/bin` or wherever it lives);
   - the recipe catalogue state;
   - the `~/.abra/recipes/discourse` git state (tags, fetch times, what `git describe`/abra
     version resolution sees);
   - the discourse mirror's tags/branches (was a tag re-pointed upstream or in the mirror?);
   - the harness's stash/revert dance around `abra recipe lint`/pinned deploy
     (`runner/harness/abra.py:109-114`);
   - how upgrade-HC1 derives its EXPECTED stamp.
2. **Reproduce minimally** outside a full run if possible (the abra version-resolution
command against the same checkout) so the Adversary can re-run the attribution cheaply.
3. **Classify the fix target honestly.** It is one of: env state (fix the host state +
document how it drifted), harness assumption (fix run_recipe_ci/lifecycle WITHOUT
weakening HC1; the check itself must keep its teeth), or recipe/mirror tag problem (fix
via a recipe-mirror PR only, never merged by this phase). If the expected-stamp
derivation is what is wrong, the correction must be justified against abra's documented
behavior, not against "what makes the test pass".
4. **Blast-radius sweep:** once attributed, check every enrolled recipe's most recent
upgrade-tier evidence for the same signature (prev-base tag commit stamped where a
version was expected). Any other affected recipe gets fixed by the same root-cause fix
and re-proven.
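
The timeline in item 1 depends on comparable, read-only snapshots of host state. The
sketch below shows one way to capture such a snapshot using only `abra --version` and
standard git commands. The function name `stamp_snapshot` and the output layout are
hypothetical; the default path is the checkout named in this plan. It deliberately never
checks anything out, and uses `--no-optional-locks` so that `git status` does not touch
the index of the shared checkout.

```sh
#!/bin/sh
# stamp_snapshot: print stamp-relevant state for later diffing. Read-only.
# Hypothetical helper; sections and ordering are illustrative assumptions.
stamp_snapshot() {
    recipe_dir="${1:-$HOME/.abra/recipes/discourse}"

    echo "== abra binary =="
    abra_path="$(command -v abra 2>/dev/null || true)"
    if [ -n "$abra_path" ]; then
        ls -l "$abra_path"              # mtime shows when the binary last changed
        abra --version 2>&1 || true
    else
        echo "abra: not on PATH"
    fi

    echo "== recipe checkout: $recipe_dir =="
    if git -C "$recipe_dir" rev-parse --git-dir >/dev/null 2>&1; then
        git_dir="$(git -C "$recipe_dir" rev-parse --absolute-git-dir)"
        git -C "$recipe_dir" rev-parse HEAD
        # Uncommitted changes, listed without taking optional index locks.
        git --no-optional-locks -C "$recipe_dir" status --porcelain
        # What a tag-based version lookup would see from HEAD.
        git -C "$recipe_dir" describe --tags --always 2>&1 || true
        # Every tag with its creation date and target, newest first.
        git -C "$recipe_dir" for-each-ref --sort=-creatordate \
            --format='%(creatordate:iso8601) %(objectname:short) %(refname:short)' \
            refs/tags
        # FETCH_HEAD mtime approximates the last fetch time.
        ls -l "$git_dir/FETCH_HEAD" 2>/dev/null || echo "no FETCH_HEAD"
    else
        echo "not a git checkout (or git unavailable)"
    fi
}
```

Redirecting the output to a dated file (for example `stamp_snapshot > snapshot-$(date -u +%F).txt`)
lets two snapshots be compared with `diff`, which also gives the Adversary a cheap way to
re-derive the same picture.
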
## 3. Gates
**M1 — Attribution.** Root cause documented with a reproducible minimal demonstration +
the 06-05→06-10 change identified by direct evidence (not inference alone); fix
implemented (env/harness/recipe per §2.3); blast-radius sweep complete. Adversary
independently reproduces the minimal demonstration and re-derives the attribution.
**M2 — Proven in real CI.** Discourse full lifecycle green with upgrade-HC1 stamping the
CORRECT value at its true level (expected L4+ / L5 if lint passes); ≥1 run via the drone
`!testme` path; any other affected recipes re-proven; HC1 demonstrably NOT weakened (the
Adversary must show a wrong stamp still fails — synthesize one if needed). DEFERRED
entry closed with pointers. Fresh Adversary PASS → `## DONE`.
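
The "wrong stamp still fails" requirement in M2 amounts to a negative control. The sketch
below illustrates the shape of such a control against a stand-in comparison. Both
`hc1_check` and `hc1_negative_control` are hypothetical names, and the strict string
equality is an assumption; the real HC1 lives in the harness and may derive and compare
stamps differently.

```sh
#!/bin/sh
# hc1_check: stand-in for the real HC1 comparison (assumed: exact string match).
hc1_check() {
    want="$1"
    got="$2"
    if [ "$got" = "$want" ]; then
        echo "HC1 PASS: stamp $got"
        return 0
    fi
    echo "HC1 FAIL: expected $want, got $got" >&2
    return 1
}

# hc1_negative_control: feed a synthesized wrong stamp and require a failure.
hc1_negative_control() {
    real_expected="$1"
    wrong_stamp="deadbeef+U"   # synthesized; must never equal a real version
    if hc1_check "$real_expected" "$wrong_stamp" >/dev/null 2>&1; then
        echo "NEGATIVE CONTROL FAILED: wrong stamp was accepted" >&2
        return 1
    fi
    echo "negative control ok: wrong stamp rejected"
}
```

Running the control alongside the green run gives direct evidence that the check can
still fail, rather than relying on the absence of failures.
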
## 4. Guardrails (binding)
- **HC1 keeps its teeth** — any change that would let a genuinely wrong version stamp
pass is an automatic FAIL.
- Recipe mirrors: PR only, never push main, never merge. Shared checkout race: never
git-checkout `~/.abra/recipes/discourse` while its build runs. Real-CI etiquette:
≤2-3 concurrent deploys, teardown on every exit path, no secrets in logs/commits.
- Host-state changes (abra binary, catalogue) beyond reading require a DECISIONS.md
entry; if the fix needs an abra version pin/upgrade host-wide, propose it in
STATUS-dstamp.md for the orchestrator/operator instead of doing it unilaterally.
- Commit author `autonomic-bot <autonomic-bot@noreply.git.autonomic.zone>`; push every
commit. CI host has no python3 on default PATH — use shell or `cc-ci-run`.
## 5. Definition of Done
Drift mechanism attributed with reproducible evidence; fixed at the true root; discourse
back at its real level in real CI (drone path included); no other recipe silently
affected; HC1 unweakened and adversarially re-proven; DEFERRED closed; M1+M2 fresh PASSes.