7.6 KiB
JOURNAL — phase bsky
2026-06-11T11:31Z–11:55Z — bootstrap + root-cause diagnosis (B1, B2)
Phase start. Read plan-phase-bsky-fix.md + plan.md §6.1/§7/§9. Adversary seeded
REVIEW-bsky.md (8d5bf30) with cold baseline recon — same suspects I confirmed below.
Diagnosis chain (commands + outputs):
-
Mirror clone (b2d86ef):
compose.ymlpinsimage: ghcr.io/bluesky-social/pds:0.4, overrides entrypoint (dumb-init --+ config-mounted/entrypoint.sh);entrypoint.sh.tmplendsexec node --enable-source-maps index.js— relative path, resolved against image WORKDIR. -
Live image inspection on cc-ci:
docker image inspect ghcr.io/bluesky-social/pds:0.4 --format "{{.Id}} created={{.Created}} workdir={{.Config.WorkingDir}} ... cmd={{.Config.Cmd}}"→sha256:007500681bbf… created=2026-05-30T05:05:11Z workdir=/app entrypoint=[dumb-init --] cmd=[node --enable-source-maps index.ts]docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4 -c 'node --version; ls /app'→v24.15.0/index.ts node_modules package.json pnpm-lock.yaml— no index.js.grep @atproto/pds /app/package.json→"@atproto/pds": "0.5.1"; /usr/local/bin/goat present. So:0.4is now a main-branch 0.5.1 build → recipe'sindex.jsexec = MODULE_NOT_FOUND. This precisely explains the rcust-era crash-loop evidence (Node v24.15.0 in traceback). -
Upstream research:
- ghcr tags/list (paginated): exact tags …0.4.158, 0.4.169, 0.4.182, 0.4.188, 0.4.193,
0.4.204, 0.4.208, 0.4.219, plus anomalous 0.4.5001.
:0.4digest871194d2…==latest, ≠0.4.219(e0b756701c92…) → :0.4 republished past the release line. - Dockerfile@v0.4.219: node:20.20-alpine3.23, WORKDIR /app, CMD index.js, dumb-init.
- Dockerfile@main: node:24.15-alpine3.23, CMD index.ts, + goat binary — matches what
:0.4now contains. GitHubreleases/latest404s (they only push git tags). - service/package.json@v0.4.219:
"@atproto/pds": "0.4.219".
- ghcr tags/list (paginated): exact tags …0.4.158, 0.4.169, 0.4.182, 0.4.188, 0.4.193,
0.4.204, 0.4.208, 0.4.219, plus anomalous 0.4.5001.
-
Candidate-fix image verified on cc-ci:
docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4.219 -c 'node --version; ls /app; grep @atproto/pds /app/package.json; which dumb-init'→v20.20.2/ index.js present /"@atproto/pds": "0.4.219"//usr/bin/dumb-init. Image CMD[node --enable-source-maps index.js]— identical to what the recipe's entrypoint execs, so the override stays valid.
Why pin 0.4.219 and not chase 0.5.1 (rationale, summarized in DECISIONS.md): 0.5.1
exists only as the moving :0.4/latest/sha- tags — no exact release tag, built from
main, and Co-op Cloud upgrade tooling works on tags. Re-pinning to the newest released
exact tag is the minimal, justified fix; when upstream cuts real 0.5.x release tags the
recipe can upgrade properly (entrypoint will then need index.ts + Node 24 — noted in
upstream registry).
Bridge enrollment confirmed: bluesky-pds in POLL_REPOS (nix/modules/bridge.nix:43) →
!testme works. Mirror has only closed PR#1 (skill smoke test); my fix → PR#2.
Next: DECISIONS entry (B3), mirror branch + PR (B4), !testme (B5).
2026-06-11T11:40Z–11:55Z — run 423 red: the upgrade-BASE trap (B5 first attempt)
PR #2 opened (branch upgrade-0.3.0+v0.4.219, head f7b6c8df, 2-line diff) and !testme'd
(comment 14340) → drone build/run 423. RESULT: install=fail, level 0 — but NOT the PR:
the run never deployed the PR head. The harness deploys ONCE at the upgrade BASE
(previous_version = vers[-2] = 0.1.1+v0.4 — confirmed: run-423's recipe checkout sat at
tag 0.1.1+v0.4) and only the upgrade tier chaos-redeploys the PR head. Both published tags
(0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving :0.4 → the base crash-loops the SAME
MODULE_NOT_FOUND (run-423 app log: Node v24.15.0, /app/index.js missing) → install fails
before my fix is ever exercised. No published version can EVER deploy again (upstream
republished the tag) — so the upgrade path is structurally unverifiable until a fixed
version is published post-merge.
Fix (harness, evidence-backed, not a weakening): EXPECTED_NA["upgrade"] (the EXISTING
declared-intentional-skip mechanism, de-capped levels phase lvl5) now also suppresses the
base deploy — extracted upgrade_base() pure helper in run_recipe_ci.py; single deploy
becomes the PR head; upgrade tier records "skip"; derive_rungs classifies it intentional
with the declared reason (visible in results.json skips.intentional — never reported as a
pass). tests/bluesky-pds/recipe_meta.py declares it with the full reason + the re-enable
path (UPGRADE_BASE_VERSION="0.3.0+v0.4.219" once published). 6 new unit tests
(tests/unit/test_upgrade_base.py) lock the decision matrix; meta-key doc regenerated.
Verified: 253 unit tests pass on cc-ci (was 247), repo lint PASS. Pushed e9745c8.
Re-triggered !testme (comment 14342) → build/run 427. Monitor armed.
2026-06-11T12:05Z — run 427 GREEN: level 5 at PR head; M1 claimed (B5, B6, B7)
Run 427 (drone build 427, comment 14342): level 5 — install/backup_restore/functional/ lint PASS, upgrade = declared intentional skip (reason verbatim in skips.intentional), clean_teardown + no_secret_leak true, ref f7b6c8dfb81c. Per-run recipe checkout at PR head f7b6c8d with image 0.4.219 (the fix WAS what deployed). Bridge reflected success → PR comment 14343 ✅. Screenshot Read and verified: genuine PDS landing page (ASCII butterfly, "This is an AT Protocol Personal Data Server", /xrpc/ pointer) — exactly the default capture the phase plan predicted would work once deploy works; no hook needed. Card (summary.png): 5/5, upgrade shown INTENTIONAL SKIP with reason; badge "level 5" green. M1 claimed in STATUS-bsky.md.
2026-06-11T12:15Z — records closed (B8) + operator summary drafted (B9)
DEFERRED bluesky entry marked RESOLVED with pointers (f150012) — covers BOTH the re-pin
follow-up and the rcust M2 baseline-exclusion note.
Shot-phase N/A disposition update (supersedes the deploy-gated classification): the shot phase classified bluesky-pds's screenshot "deploy-gated N/A — never capturable because the app never comes up". With the PR#2 fix deployed (run 427, PR head), the DEFAULT landing-page capture works exactly as the phase plan predicted: a real, representative, credential-free PDS landing page (ASCII butterfly + "This is an AT Protocol Personal Data Server" + /xrpc/ pointer). No SCREENSHOT hook was needed. The N/A stands for HISTORICAL runs only; post-merge, bluesky-pds screenshots like any other recipe.
Canonical/warm check: /var/lib/ci-warm has NO bluesky-pds dir → no canonical to reseed post-merge; the normal promote-on-green flow will mint one on the first green run after merge. Operator summary written to STATUS-bsky.md (B9).
2026-06-11T15:50Z — M1 PASS received; M2 claimed (B10)
M1 PASS @12:30Z (REVIEW-bsky 369f4f4), no findings, no VETO — every item reproduced cold
incl. negative-control teeth and the per-recipe scoping of the EXPECTED_NA change. (Gap
12:30→15:45 was a quota window, not work.) All M2 builder-side items were already in
place (DEFERRED f150012, operator summary cba53b6); claimed M2 with re-trigger
instructions for the fresh cold pass. Phase DoD after M2 PASS → ## DONE with PR open.
2026-06-11T15:55Z — M2 PASS → ## DONE
M2 PASS @15:48Z (42eabba): Adversary independently re-triggered !testme (comment 14344 →
build 435, level 5 at f7b6c8df, identical rung profile + screenshot sha to 427) and
corroborated every handoff item — including that 0.5.x has NO release tag, fully settling
the §2.2 upgrade-preference question. ## DONE written. Phase ends with PR #2 open for the
operator; loop stopped.