cc-ci/machine-docs/JOURNAL-redfix.md
autonomic-bot 3e61473365
chore(redfix): bootstrap phase state files (STATUS/BACKLOG/JOURNAL); M1 investigation tracker seeded
2026-06-17 23:20:55 +00:00
# JOURNAL — phase `redfix`
## 2026-06-17T23:20Z — Bootstrap
Read phase plan + plan.md §6.1/§7/§9 + canon DECISIONS exceptions (lines ~1494-1552). Six
canon-sweep failures to investigate. Confirmed cc-ci access, no run in flight, sweep timer next
fires 2026-06-21 (3-day window), disk 38G free.
Isolation mechanism understood: `runner/nightly_sweep.run_on_tag` = `abra.recipe_checkout(r, tag)` +
`run_recipe_ci.py RECIPE=<r> CCCI_SKIP_FETCH=1` cold/full. I will reproduce each failure by running ONE
recipe at a time with no concurrent load.
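A minimal sketch of that isolation loop, assuming `run_recipe_ci.py` reads `RECIPE` and `CCCI_SKIP_FETCH` from the environment and that the checkout step can be passed in as a callable standing in for `abra.recipe_checkout(r, tag)` (its real signature is not recorded here). `isolation_env` and `run_isolated` are hypothetical helper names, not part of the runner.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, Mapping, Optional, Tuple


def isolation_env(recipe: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Environment for one cold/full run of a single recipe (assumed env-var interface)."""
    env = dict(os.environ if base is None else base)
    env["RECIPE"] = recipe
    env["CCCI_SKIP_FETCH"] = "1"  # the tag checkout already happened; do not fetch again
    return env


def run_isolated(
    targets: Iterable[Tuple[str, str]],
    checkout: Callable[[str, str], None],
    run: Callable[..., "subprocess.CompletedProcess"] = subprocess.run,
) -> Dict[str, int]:
    """Run each (recipe, tag) strictly one after another so no two runs share the host."""
    results: Dict[str, int] = {}
    for recipe, tag in targets:
        checkout(recipe, tag)  # hypothetical stand-in for abra.recipe_checkout(r, tag)
        # subprocess.run blocks until the child exits, so the next recipe never overlaps
        proc = run(["python3", "run_recipe_ci.py"], env=isolation_env(recipe), check=False)
        results[recipe] = proc.returncode  # 0 is treated as green
    return results
```

Keeping the loop sequential (rather than using a pool) is the whole point: any red that persists under this loop cannot be blamed on concurrent load.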
Notable starting canonical state: **mumble canonical IS present** (`1.0.0+v1.6.870-0`, written
20260617T180501Z, during today's nixenv sweep). The canon DECISIONS recorded mumble RED
(`test_handshake_completes_with_channel_presence`). A canonical is only written on a GREEN cold run
on latest, so mumble must have gone green in a recent run. That is strong early evidence for the
operator's hypothesis that mumble "passed before" and the red was a load flake. Must confirm with a
clean isolation re-run and check whether the canon-sweep red happened under concurrent load.
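Checking whether the red happened under concurrent load reduces to an interval-overlap test over per-recipe run windows. A minimal sketch, assuming start/end times can be pulled from the sweep logs in the same compact UTC format as the canonical's `20260617T180501Z` stamp; the `Run` record and the helper names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List

STAMP_FMT = "%Y%m%dT%H%M%SZ"  # e.g. the canonical's 20260617T180501Z


def parse_stamp(stamp: str) -> datetime:
    """Parse a compact UTC stamp such as 20260617T180501Z."""
    return datetime.strptime(stamp, STAMP_FMT).replace(tzinfo=timezone.utc)


@dataclass
class Run:
    recipe: str
    start: datetime
    end: datetime


def concurrent_with(target: Run, runs: List[Run]) -> List[str]:
    """Recipes whose [start, end) window overlaps the target run's window."""
    return [
        r.recipe
        for r in runs
        if r is not target and r.start < target.end and target.start < r.end
    ]
```

A non-empty result for the red mumble run would support the load-flake hypothesis; an empty one would point away from it and toward a real defect.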
Next: start M1 investigation. Plan order (cheap/informative first): triage the existing sweep logs on
cc-ci to pin the EXACT assertion/error for each (mumble, mattermost-lts restore, gitea app.ini,
bluesky routing, discourse compose), then run isolation re-runs. discourse's recorded cause is an
UPSTREAM compose defect (`sidekiq.depends_on: discourse` while the service is named `app`) that fails
with `FATA` before any deploy. That is deterministic, not a load timeout, so it may not need a long
isolation run to confirm; verify the compose at the latest tag directly first.
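Verifying the discourse defect at the latest tag amounts to checking that every `depends_on` entry names a service that actually exists. A minimal sketch over an already-parsed compose mapping (for example the result of PyYAML's `yaml.safe_load` on the recipe's compose file, kept outside the block so it needs only the standard library); `dangling_depends_on` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def dangling_depends_on(compose: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service, missing_dependency) pairs for depends_on entries with no matching service."""
    services = compose.get("services") or {}
    problems: List[Tuple[str, str]] = []
    for name, svc in services.items():
        deps = (svc or {}).get("depends_on") or []
        # depends_on is either a list of names (short syntax) or a mapping name -> options (long syntax)
        dep_names = list(deps.keys()) if isinstance(deps, dict) else list(deps)
        for dep in dep_names:
            if dep not in services:
                problems.append((name, dep))
    return problems
```

Because the check depends only on the file contents, a non-empty result at the latest tag would confirm the deterministic failure without spending an isolation run on it.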