Files
cc-ci/JOURNAL-bsky.md

7.6 KiB
Raw Blame History

JOURNAL — phase bsky

2026-06-11T11:31Z11:55Z — bootstrap + root-cause diagnosis (B1, B2)

Phase start. Read plan-phase-bsky-fix.md + plan.md §6.1/§7/§9. Adversary seeded REVIEW-bsky.md (8d5bf30) with cold baseline recon — same suspects I confirmed below.

Diagnosis chain (commands + outputs):

  1. Mirror clone (b2d86ef): compose.yml pins image: ghcr.io/bluesky-social/pds:0.4, overrides entrypoint (dumb-init -- + config-mounted /entrypoint.sh); entrypoint.sh.tmpl ends exec node --enable-source-maps index.js — relative path, resolved against image WORKDIR.

  2. Live image inspection on cc-ci: docker image inspect ghcr.io/bluesky-social/pds:0.4 --format "{{.Id}} created={{.Created}} workdir={{.Config.WorkingDir}} ... cmd={{.Config.Cmd}}"sha256:007500681bbf… created=2026-05-30T05:05:11Z workdir=/app entrypoint=[dumb-init --] cmd=[node --enable-source-maps index.ts] docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4 -c 'node --version; ls /app'v24.15.0 / index.ts node_modules package.json pnpm-lock.yamlno index.js. grep @atproto/pds /app/package.json"@atproto/pds": "0.5.1"; /usr/local/bin/goat present. So :0.4 is now a main-branch 0.5.1 build → recipe's index.js exec = MODULE_NOT_FOUND. This precisely explains the rcust-era crash-loop evidence (Node v24.15.0 in traceback).

  3. Upstream research:

    • ghcr tags/list (paginated): exact tags …0.4.158, 0.4.169, 0.4.182, 0.4.188, 0.4.193, 0.4.204, 0.4.208, 0.4.219, plus anomalous 0.4.5001. :0.4 digest 871194d2… == latest, ≠ 0.4.219 (e0b756701c92…) → :0.4 republished past the release line.
    • Dockerfile@v0.4.219: node:20.20-alpine3.23, WORKDIR /app, CMD index.js, dumb-init.
    • Dockerfile@main: node:24.15-alpine3.23, CMD index.ts, + goat binary — matches what :0.4 now contains. GitHub releases/latest 404s (they only push git tags).
    • service/package.json@v0.4.219: "@atproto/pds": "0.4.219".
  4. Candidate-fix image verified on cc-ci: docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4.219 -c 'node --version; ls /app; grep @atproto/pds /app/package.json; which dumb-init'v20.20.2 / index.js present / "@atproto/pds": "0.4.219" / /usr/bin/dumb-init. Image CMD [node --enable-source-maps index.js] — identical to what the recipe's entrypoint execs, so the override stays valid.

Why pin 0.4.219 and not chase 0.5.1 (rationale, summarized in DECISIONS.md): 0.5.1 exists only as the moving :0.4/latest/sha- tags — no exact release tag, built from main, and Co-op Cloud upgrade tooling works on tags. Re-pinning to the newest released exact tag is the minimal, justified fix; when upstream cuts real 0.5.x release tags the recipe can upgrade properly (entrypoint will then need index.ts + Node 24 — noted in upstream registry).

Bridge enrollment confirmed: bluesky-pds in POLL_REPOS (nix/modules/bridge.nix:43) → !testme works. Mirror has only closed PR#1 (skill smoke test); my fix → PR#2.

Next: DECISIONS entry (B3), mirror branch + PR (B4), !testme (B5).

2026-06-11T11:40Z11:55Z — run 423 red: the upgrade-BASE trap (B5 first attempt)

PR #2 opened (branch upgrade-0.3.0+v0.4.219, head f7b6c8df, 2-line diff) and !testme'd (comment 14340) → drone build/run 423. RESULT: install=fail, level 0 — but NOT the PR: the run never deployed the PR head. The harness deploys ONCE at the upgrade BASE (previous_version = vers[-2] = 0.1.1+v0.4 — confirmed: run-423's recipe checkout sat at tag 0.1.1+v0.4) and only the upgrade tier chaos-redeploys the PR head. Both published tags (0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving :0.4 → the base crash-loops the SAME MODULE_NOT_FOUND (run-423 app log: Node v24.15.0, /app/index.js missing) → install fails before my fix is ever exercised. No published version can EVER deploy again (upstream republished the tag) — so the upgrade path is structurally unverifiable until a fixed version is published post-merge.

Fix (harness, evidence-backed, not a weakening): EXPECTED_NA["upgrade"] (the EXISTING declared-intentional-skip mechanism, de-capped levels phase lvl5) now also suppresses the base deploy — extracted upgrade_base() pure helper in run_recipe_ci.py; single deploy becomes the PR head; upgrade tier records "skip"; derive_rungs classifies it intentional with the declared reason (visible in results.json skips.intentional — never reported as a pass). tests/bluesky-pds/recipe_meta.py declares it with the full reason + the re-enable path (UPGRADE_BASE_VERSION="0.3.0+v0.4.219" once published). 6 new unit tests (tests/unit/test_upgrade_base.py) lock the decision matrix; meta-key doc regenerated. Verified: 253 unit tests pass on cc-ci (was 247), repo lint PASS. Pushed e9745c8.

Re-triggered !testme (comment 14342) → build/run 427. Monitor armed.

2026-06-11T12:05Z — run 427 GREEN: level 5 at PR head; M1 claimed (B5, B6, B7)

Run 427 (drone build 427, comment 14342): level 5 — install/backup_restore/functional/ lint PASS, upgrade = declared intentional skip (reason verbatim in skips.intentional), clean_teardown + no_secret_leak true, ref f7b6c8dfb81c. Per-run recipe checkout at PR head f7b6c8d with image 0.4.219 (the fix WAS what deployed). Bridge reflected success → PR comment 14343 . Screenshot Read and verified: genuine PDS landing page (ASCII butterfly, "This is an AT Protocol Personal Data Server", /xrpc/ pointer) — exactly the default capture the phase plan predicted would work once deploy works; no hook needed. Card (summary.png): 5/5, upgrade shown INTENTIONAL SKIP with reason; badge "level 5" green. M1 claimed in STATUS-bsky.md.

2026-06-11T12:15Z — records closed (B8) + operator summary drafted (B9)

DEFERRED bluesky entry marked RESOLVED with pointers (f150012) — covers BOTH the re-pin follow-up and the rcust M2 baseline-exclusion note.

Shot-phase N/A disposition update (supersedes the deploy-gated classification): the shot phase classified bluesky-pds's screenshot "deploy-gated N/A — never capturable because the app never comes up". With the PR#2 fix deployed (run 427, PR head), the DEFAULT landing-page capture works exactly as the phase plan predicted: a real, representative, credential-free PDS landing page (ASCII butterfly + "This is an AT Protocol Personal Data Server" + /xrpc/ pointer). No SCREENSHOT hook was needed. The N/A stands for HISTORICAL runs only; post-merge, bluesky-pds screenshots like any other recipe.

Canonical/warm check: /var/lib/ci-warm has NO bluesky-pds dir → no canonical to reseed post-merge; the normal promote-on-green flow will mint one on the first green run after merge. Operator summary written to STATUS-bsky.md (B9).

2026-06-11T15:50Z — M1 PASS received; M2 claimed (B10)

M1 PASS @12:30Z (REVIEW-bsky 369f4f4), no findings, no VETO — every item reproduced cold incl. negative-control teeth and the per-recipe scoping of the EXPECTED_NA change. (Gap 12:30→15:45 was a quota window, not work.) All M2 builder-side items were already in place (DEFERRED f150012, operator summary cba53b6); claimed M2 with re-trigger instructions for the fresh cold pass. Phase DoD after M2 PASS → ## DONE with PR open.

2026-06-11T15:55Z — M2 PASS → ## DONE

M2 PASS @15:48Z (42eabba): Adversary independently re-triggered !testme (comment 14344 → build 435, level 5 at f7b6c8df, identical rung profile + screenshot sha to 427) and corroborated every handoff item — including that 0.5.x has NO release tag, fully settling the §2.2 upgrade-preference question. ## DONE written. Phase ends with PR #2 open for the operator; loop stopped.