cc-ci/machine-docs/JOURNAL-redfix.md
autonomic-bot 3e61473365
chore(redfix): bootstrap phase state files (STATUS/BACKLOG/JOURNAL); M1 investigation tracker seeded
2026-06-17 23:20:55 +00:00


JOURNAL — phase redfix

2026-06-17T23:20Z — Bootstrap

Read phase plan + plan.md §6.1/§7/§9 + canon DECISIONS exceptions (lines ~1494-1552). Six canon-sweep failures to investigate. Confirmed cc-ci access, no run in flight, sweep timer next fires 2026-06-21 (3-day window), disk 38G free.

Isolation mechanism understood: runner/nightly_sweep.run_on_tag = abra.recipe_checkout(r, tag) + run_recipe_ci.py RECIPE=<r> CCCI_SKIP_FETCH=1 cold/full. I reproduce each failure by running ONE recipe at a time with no concurrent load.
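The one-recipe-at-a-time reproduction could be sketched roughly as below. This is a minimal sketch, not the runner's actual code: the `RECIPE` and `CCCI_SKIP_FETCH=1` variables and the `run_recipe_ci.py` script name come from the note above, while `build_invocation`, `run_serially`, and launching the script via `python3` are hypothetical. The tag checkout step is left out because its interface is not shown here.

```python
import os
import subprocess
from typing import Callable, Dict, Iterable, List, Tuple

Invocation = Tuple[List[str], Dict[str, str]]
Runner = Callable[[List[str], Dict[str, str]], int]


def build_invocation(recipe: str, script: str = "run_recipe_ci.py") -> Invocation:
    """Return (argv, extra_env) for one isolated recipe run.

    RECIPE and CCCI_SKIP_FETCH are the variables named in the journal;
    launching the script via `python3` is an assumption.
    """
    argv = ["python3", script]
    extra_env = {"RECIPE": recipe, "CCCI_SKIP_FETCH": "1"}
    return argv, extra_env


def _default_runner(argv: List[str], extra_env: Dict[str, str]) -> int:
    # subprocess.run blocks until the child exits, so runs never overlap.
    env = {**os.environ, **extra_env}
    return subprocess.run(argv, env=env).returncode


def run_serially(recipes: Iterable[str], runner: Runner = _default_runner) -> Dict[str, int]:
    """Run each recipe strictly one after another and collect exit codes."""
    results: Dict[str, int] = {}
    for recipe in recipes:
        argv, extra_env = build_invocation(recipe)
        results[recipe] = runner(argv, extra_env)
    return results
```

Passing the runner in as a parameter keeps the sequencing logic checkable without touching cc-ci; a dry run can substitute a function that only records what would have been launched.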

Starting canonical state notable: mumble canonical IS present (1.0.0+v1.6.870-0, written 20260617T180501Z — during today's nixenv sweep). The canon DECISIONS recorded mumble RED (test_handshake_completes_with_channel_presence). A canonical only gets written on a GREEN cold run on latest → mumble flipped green in a recent run. Strong early evidence for the operator's "mumble passed before" → load flake hypothesis. Must confirm with a clean isolation re-run + check whether the canon-sweep red was under concurrent load.
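To check whether a canonical's write time falls inside a given sweep run, the compact timestamp can be parsed and compared against the run's start and end. A minimal sketch, assuming the stamp format seen above (`20260617T180501Z`); the function names are hypothetical and the actual sweep window has to be read from the sweep logs.

```python
from datetime import datetime, timezone


def parse_compact_utc(stamp: str) -> datetime:
    """Parse a compact UTC stamp such as 20260617T180501Z."""
    parsed = datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def written_within(stamp: str, start: datetime, end: datetime) -> bool:
    """True if the canonical's write time falls inside [start, end]."""
    return start <= parse_compact_utc(stamp) <= end
```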

Next: start M1 investigation. Plan order (cheap/informative first): triage the existing sweep logs on cc-ci to pin the EXACT assertion/error for each (mumble, mattermost-lts restore, gitea app.ini, bluesky routing, discourse compose), then run isolation re-runs. discourse's recorded cause is an UPSTREAM compose defect (sidekiq.depends_on: discourse while service is app) that FATAs before any deploy — that's deterministic, not a load timeout, so it may not even need a long isolation run to confirm; verify the compose at the latest tag directly first.
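The discourse check needs no deploy: a `depends_on` entry that names a service missing from `services` can be detected statically. A minimal sketch, assuming the compose file has already been parsed into a plain dict (for example with a YAML loader); `dangling_depends_on` is a hypothetical helper, and the dict in the usage note only mirrors the recorded defect, not the actual recipe file.

```python
from typing import Any, Dict, List, Tuple


def dangling_depends_on(compose: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (service, missing_dependency) pairs for a parsed compose dict."""
    services = compose.get("services") or {}
    problems: List[Tuple[str, str]] = []
    for name, spec in services.items():
        deps = (spec or {}).get("depends_on") or []
        # Short form is a list of names, long form is a mapping keyed by name;
        # iterating either one yields the dependency names.
        for dep in deps:
            if dep not in services:
                problems.append((name, dep))
    return problems
```

For a dict shaped like `{"services": {"app": {}, "sidekiq": {"depends_on": ["discourse"]}}}` the helper reports `sidekiq` depending on the undefined `discourse`, which matches the recorded cause if the latest tag still has that shape.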