# JOURNAL — phase bsky
## 2026-06-11T11:31Z–11:55Z — bootstrap + root-cause diagnosis (B1, B2)
Phase start. Read plan-phase-bsky-fix.md + plan.md §6.1/§7/§9. The Adversary had seeded
REVIEW-bsky.md (8d5bf30) with a cold baseline recon naming the same suspects I confirmed below.

**Diagnosis chain (commands + outputs):**

1. Mirror clone (b2d86ef): `compose.yml` pins `image: ghcr.io/bluesky-social/pds:0.4` and
   overrides the entrypoint (`dumb-init --` + config-mounted `/entrypoint.sh`);
   `entrypoint.sh.tmpl` ends with `exec node --enable-source-maps index.js`, a relative path
   resolved against the image WORKDIR.

2. Live image inspection on cc-ci:
   - `docker image inspect ghcr.io/bluesky-social/pds:0.4 --format "{{.Id}} created={{.Created}} workdir={{.Config.WorkingDir}} ... cmd={{.Config.Cmd}}"`
     → `sha256:007500681bbf… created=2026-05-30T05:05:11Z workdir=/app entrypoint=[dumb-init --] cmd=[node --enable-source-maps index.ts]`
   - `docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4 -c 'node --version; ls /app'`
     → `v24.15.0` / `index.ts node_modules package.json pnpm-lock.yaml`: **no index.js**.
   - `grep @atproto/pds /app/package.json` → `"@atproto/pds": "0.5.1"`; /usr/local/bin/goat present.

   So `:0.4` is now a main-branch 0.5.1 build, and the recipe's `index.js` exec fails with
   MODULE_NOT_FOUND. This precisely explains the rcust-era crash-loop evidence (Node v24.15.0
   in the traceback).

3. Upstream research:
   - ghcr tags/list (paginated): exact tags …0.4.158, 0.4.169, 0.4.182, 0.4.188, 0.4.193,
     0.4.204, 0.4.208, 0.4.219, plus an anomalous 0.4.5001. `:0.4` digest `871194d2…` ==
     `latest`, ≠ `0.4.219` (`e0b756701c92…`) → `:0.4` was republished past the release line.
   - Dockerfile@v0.4.219: node:20.20-alpine3.23, WORKDIR /app, CMD index.js, dumb-init.
   - Dockerfile@main: node:24.15-alpine3.23, CMD index.ts, + goat binary; this matches what
     `:0.4` now contains. GitHub `releases/latest` 404s (upstream only pushes git tags).
   - service/package.json@v0.4.219: `"@atproto/pds": "0.4.219"`.

4. Candidate-fix image verified on cc-ci:
   - `docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4.219 -c 'node --version; ls /app; grep @atproto/pds /app/package.json; which dumb-init'`
     → `v20.20.2` / index.js present / `"@atproto/pds": "0.4.219"` / `/usr/bin/dumb-init`.

   Image CMD `[node --enable-source-maps index.js]` is identical to what the recipe's
   entrypoint execs, so the override stays valid.
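
The failure in items 1 and 2 comes down to path resolution: the entrypoint names a script
relative to the container's working directory, and the WORKDIR listing no longer contains
that file. This Python sketch replays that reasoning offline from the values recorded above.
`exec_target` and `diagnose` are hypothetical helpers (not part of the recipe or harness), and
they assume the script is the last argument of the exec line and sits directly in WORKDIR.

```python
import posixpath
import shlex


def exec_target(exec_line: str, workdir: str) -> str:
    """Absolute path of the script an `exec node ... <script>` line runs.

    ASSUMPTION: the script is the last argument; relative paths resolve
    against the container's cwd, which is the image WORKDIR here.
    """
    args = shlex.split(exec_line)
    if args and args[0] == "exec":
        args = args[1:]
    return posixpath.normpath(posixpath.join(workdir, args[-1]))


def diagnose(exec_line: str, workdir: str, listing: list[str]) -> str:
    """Compare the entrypoint's target against an `ls <workdir>` listing."""
    target = exec_target(exec_line, workdir)
    parent, name = posixpath.split(target)
    if parent != posixpath.normpath(workdir):
        return f"unknown: {target} is outside the listed directory"
    if name in listing:
        return f"ok: {target}"
    return f"MODULE_NOT_FOUND: {target}"


if __name__ == "__main__":
    line = "exec node --enable-source-maps index.js"
    # Listing recorded for :0.4 in item 2.
    print(diagnose(line, "/app", ["index.ts", "node_modules", "package.json", "pnpm-lock.yaml"]))
    # Illustrative listing for 0.4.219 (item 4 only records that index.js is present).
    print(diagnose(line, "/app", ["index.js", "node_modules", "package.json"]))
```

The first call reports `MODULE_NOT_FOUND: /app/index.js`, the second `ok: /app/index.js`.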

**Why pin 0.4.219 and not chase 0.5.1 (rationale, summarized in DECISIONS.md):** 0.5.1
exists only under the moving `:0.4`/`latest`/sha- tags: it has no exact release tag, it is
built from main, and Co-op Cloud upgrade tooling works on tags. Re-pinning to the newest
*released* exact tag is the minimal, justified fix. When upstream cuts real 0.5.x release
tags, the recipe can upgrade properly (the entrypoint will then need `index.ts` + Node 24;
noted in the upstream registry).
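
The re-pin rule can be sketched in Python: keep only exact `MAJOR.MINOR.PATCH` tags on the
0.4 line, drop tags judged anomalous by hand, and take the highest. The tag list is the one
from item 3 (its head is truncated in the journal too), and the digests are the truncated
prefixes recorded there. `newest_exact` and `republished_past` are hypothetical helpers;
excluding 0.4.5001 explicitly is an assumption based on the journal calling it anomalous.

```python
import re

EXACT = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def newest_exact(tags, line="0.4", exclude=()):
    """Highest exact tag on one release line; moving tags never match EXACT."""
    major, minor = (int(x) for x in line.split("."))
    best = None
    for tag in tags:
        if tag in exclude:
            continue  # hand-flagged anomalies (ASSUMPTION: 0.4.5001)
        m = EXACT.match(tag)
        if not m:
            continue  # `0.4`, `latest`, `sha-...` are moving tags
        version = tuple(int(g) for g in m.groups())
        if version[:2] != (major, minor):
            continue
        if best is None or version > best[0]:
            best = (version, tag)
    return best[1] if best else None


def republished_past(digests, moving, exact):
    """True when a moving tag no longer points at the chosen exact release."""
    return digests[moving] != digests[exact]


if __name__ == "__main__":
    tags = ["0.4.158", "0.4.169", "0.4.182", "0.4.188", "0.4.193", "0.4.204",
            "0.4.208", "0.4.219", "0.4.5001", "0.4", "latest", "sha-0000000"]  # sha- value is a placeholder
    print(newest_exact(tags, exclude={"0.4.5001"}))  # 0.4.219
    print(newest_exact(tags))                        # 0.4.5001
    digests = {"0.4": "871194d2", "latest": "871194d2", "0.4.219": "e0b756701c92"}
    print(republished_past(digests, "0.4", "0.4.219"))  # True
```

A purely numeric maximum would pick 0.4.5001, which is why the exclusion is an explicit
parameter rather than a hidden filter.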

Bridge enrollment confirmed: bluesky-pds is in POLL_REPOS (nix/modules/bridge.nix:43), so
`!testme` works. The mirror has only the closed PR #1 (skill smoke test); my fix becomes PR #2.

Next: DECISIONS entry (B3), mirror branch + PR (B4), !testme (B5).
## 2026-06-11T11:40Z–11:55Z — run 423 red: the upgrade-BASE trap (B5 first attempt)
PR #2 opened (branch upgrade-0.3.0+v0.4.219, head f7b6c8df, 2-line diff) and !testme'd
(comment 14340) → drone build/run 423. RESULT: install=fail, level 0, but the failure is NOT
the PR's: the run never deployed the PR head. The harness deploys ONCE at the upgrade BASE
(`previous_version` = vers[-2] = 0.1.1+v0.4; confirmed: run 423's recipe checkout sat at
tag 0.1.1+v0.4), and only the upgrade tier chaos-redeploys the PR head. Both published tags
(0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving `:0.4`, so the base crash-loops with the SAME
MODULE_NOT_FOUND (run-423 app log: Node v24.15.0, /app/index.js missing) and install fails
before my fix is ever exercised. No published version can EVER deploy again (upstream
republished the tag), so the upgrade path is structurally unverifiable until a fixed
version is published post-merge.
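
Why run 423 could not succeed follows from two facts above: the base is `vers[-2]`, and every
published recipe version encodes the moving image tag after `+v`. A minimal Python sketch,
assuming that `X.Y.Z+v<image tag>` layout and treating a two-component image tag as a moving
line tag; `base_version` and `pins_moving_tag` are hypothetical names, not harness functions.

```python
def base_version(published: list[str]) -> str:
    """Upgrade base as described in the journal: previous_version = vers[-2]."""
    if len(published) < 2:
        raise ValueError("need at least two published versions")
    return published[-2]


def pins_moving_tag(recipe_version: str) -> bool:
    """True if the image tag after '+v' is a major.minor line tag such as 0.4.

    ASSUMPTION: exact releases have three dot-separated components (0.4.219).
    """
    _, sep, image_tag = recipe_version.partition("+v")
    if not sep:
        raise ValueError(f"no '+v<image tag>' suffix in {recipe_version!r}")
    return len(image_tag.split(".")) < 3


if __name__ == "__main__":
    published = ["0.1.1+v0.4", "0.2.0+v0.4"]
    print(base_version(published))                     # 0.1.1+v0.4
    print(all(pins_moving_tag(v) for v in published))  # True: no deployable base exists
    print(pins_moving_tag("0.3.0+v0.4.219"))           # False
```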

Fix (harness-side, evidence-backed, not a weakening): EXPECTED_NA["upgrade"] (the EXISTING
declared-intentional-skip mechanism, de-capped levels phase lvl5) now also suppresses the
base deploy. Concretely: a pure helper `upgrade_base()` is extracted in run_recipe_ci.py; the
single deploy becomes the PR head; the upgrade tier records "skip"; and derive_rungs classifies
it as intentional with the declared reason (visible in results.json skips.intentional, never
reported as a pass). tests/bluesky-pds/recipe_meta.py declares it with the full reason and the
re-enable path (UPGRADE_BASE_VERSION="0.3.0+v0.4.219" once published). 6 new unit tests
(tests/unit/test_upgrade_base.py) lock the decision matrix; the meta-key doc was regenerated.
Verified: 253 unit tests pass on cc-ci (was 247), repo lint PASS. Pushed e9745c8.
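
The decision matrix itself is not spelled out here, so the following Python sketch is only a
plausible shape for it: a declared `EXPECTED_NA["upgrade"]` suppresses the base deploy and turns
the upgrade rung into an intentional skip carrying the declared reason; otherwise an explicit
base (mirroring the `UPGRADE_BASE_VERSION` re-enable path) wins over `vers[-2]`. `DeployPlan`
and `plan_deploys` are hypothetical; the real `upgrade_base()` signature and the precedence
between the two settings are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeployPlan:
    base: Optional[str]         # version deployed first; None means the PR head is the only deploy
    upgrade_rung: str           # "run" or "skip"
    skip_reason: Optional[str]  # declared reason, surfaced as an intentional skip, never a pass


def plan_deploys(published: list[str], expected_na: dict[str, str],
                 base_override: Optional[str] = None) -> DeployPlan:
    reason = expected_na.get("upgrade")
    if reason is not None:
        # Declared N/A: no base deploy at all; the rung is recorded as intentional.
        return DeployPlan(base=None, upgrade_rung="skip", skip_reason=reason)
    if base_override is not None:
        # Re-enable path: a known-good published base chosen explicitly.
        return DeployPlan(base=base_override, upgrade_rung="run", skip_reason=None)
    if len(published) < 2:
        # ASSUMPTION: the real harness may handle a missing base differently.
        raise ValueError("no previous version to upgrade from")
    return DeployPlan(base=published[-2], upgrade_rung="run", skip_reason=None)


if __name__ == "__main__":
    published = ["0.1.1+v0.4", "0.2.0+v0.4"]
    plan = plan_deploys(published, {"upgrade": "published bases pin the broken moving :0.4"})
    print(plan.base, plan.upgrade_rung)  # None skip
```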

Re-triggered !testme (comment 14342) → build/run 427. Monitor armed.
## 2026-06-11T12:05Z — run 427 GREEN: level 5 at PR head; M1 claimed (B5, B6, B7)
Run 427 (drone build 427, comment 14342): level 5. install/backup_restore/functional/lint
PASS; upgrade = declared intentional skip (reason verbatim in skips.intentional);
clean_teardown + no_secret_leak true; ref f7b6c8dfb81c. The per-run recipe checkout was at PR
head f7b6c8d with image 0.4.219 (the fix WAS what deployed). The bridge reflected success →
PR comment 14343 ✅. Screenshot Read and verified: a genuine PDS landing page (ASCII
butterfly, "This is an AT Protocol Personal Data Server", /xrpc/ pointer), exactly the
default capture the phase plan predicted would work once deploy works; no hook needed.
Card (summary.png): 5/5, upgrade shown as INTENTIONAL SKIP with reason; badge "level 5"
green. M1 claimed in STATUS-bsky.md.
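
A check like the one applied to run 427 (upgrade present as an intentional skip with its
reason, and never counted as a pass) can be written against results.json. The schema in this
Python sketch is an assumption: `rungs` maps rung name to a status string and
`skips.intentional` maps rung name to the declared reason; the real file may be shaped
differently, and `intentional_skip_reason` is a hypothetical helper.

```python
import json


def intentional_skip_reason(results: dict, rung: str) -> str:
    """Return the declared skip reason for `rung`, failing if it looks like a pass.

    ASSUMED schema: results["rungs"][name] -> status string,
    results["skips"]["intentional"][name] -> declared reason.
    """
    if results.get("rungs", {}).get(rung) == "pass":
        raise AssertionError(f"{rung} is reported as a pass, not an intentional skip")
    reason = results.get("skips", {}).get("intentional", {}).get(rung)
    if not reason:
        raise AssertionError(f"{rung} is missing from skips.intentional or has no reason")
    return reason


if __name__ == "__main__":
    # Illustrative document; the reason text is a placeholder.
    doc = json.loads("""
    {"level": 5,
     "rungs": {"install": "pass", "backup_restore": "pass", "functional": "pass",
               "lint": "pass", "upgrade": "skip"},
     "skips": {"intentional": {"upgrade": "<declared reason>"}}}
    """)
    print(intentional_skip_reason(doc, "upgrade"))  # <declared reason>
```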
## 2026-06-11T12:15Z — records closed (B8) + operator summary drafted (B9)
The DEFERRED bluesky entry is marked RESOLVED with pointers (f150012); it covers BOTH the
re-pin follow-up and the rcust M2 baseline-exclusion note.

**Shot-phase N/A disposition update (supersedes the deploy-gated classification):**
the shot phase had classified bluesky-pds's screenshot as "deploy-gated N/A" (never
capturable because the app never comes up). With the PR #2 fix deployed (run 427, PR head),
the DEFAULT landing-page capture works exactly as the phase plan predicted: a real,
representative, credential-free PDS landing page (ASCII butterfly + "This is an AT
Protocol Personal Data Server" + /xrpc/ pointer). No SCREENSHOT hook was needed. The
N/A stands for HISTORICAL runs only; post-merge, bluesky-pds gets screenshotted like any
other recipe.

Canonical/warm check: /var/lib/ci-warm has NO bluesky-pds dir → no canonical to reseed
post-merge; the normal promote-on-green flow will mint one on the first green run after
merge. Operator summary written to STATUS-bsky.md (B9).
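
The canonical/warm check reduces to a directory test. This Python sketch assumes one directory
per recipe under the warm root, as the /var/lib/ci-warm check implies; `post_merge_action` and
its return labels are hypothetical, and it also assumes an existing canonical would need a
reseed because it would predate the fix.

```python
from pathlib import Path


def post_merge_action(warm_root: Path, recipe: str) -> str:
    """Decide what to do with the warm canonical for `recipe` after merge."""
    if (warm_root / recipe).is_dir():
        return "reseed"  # ASSUMPTION: an existing canonical predates the fix
    return "mint-on-first-green"  # leave it to the promote-on-green flow


if __name__ == "__main__":
    print(post_merge_action(Path("/var/lib/ci-warm"), "bluesky-pds"))
```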
## 2026-06-11T15:50Z — M1 PASS received; M2 claimed (B10)
M1 PASS @12:30Z (REVIEW-bsky 369f4f4), no findings, no VETO: every item was reproduced cold,
incl. the negative-control teeth and the per-recipe scoping of the EXPECTED_NA change. (The
12:30→15:45 gap was a quota window, not work.) All M2 builder-side items were already in
place (DEFERRED f150012, operator summary cba53b6); claimed M2 with re-trigger instructions
for the fresh cold pass. Phase DoD: after M2 PASS, write `## DONE` with the PR left open.
## 2026-06-11T15:55Z — M2 PASS → ## DONE
M2 PASS @15:48Z (42eabba): the Adversary independently re-triggered !testme (comment 14344 →
build 435, level 5 at f7b6c8df, identical rung profile + screenshot sha to 427) and
corroborated every handoff item, including that 0.5.x has NO release tag, which fully settles
the §2.2 upgrade-preference question. `## DONE` written. The phase ends with PR #2 open for
the operator; loop stopped.
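
"Identical rung profile + screenshot sha" compares two runs on a small fingerprint. A Python
sketch, assuming the sha is SHA-256 over the raw screenshot bytes and that a rung profile is a
name-to-status mapping; `run_fingerprint` is a hypothetical helper, not the harness's code.

```python
import hashlib


def run_fingerprint(level: int, rungs: dict[str, str], screenshot: bytes) -> tuple:
    """Reduce a run to level, order-independent rung profile and screenshot hash."""
    return (level, tuple(sorted(rungs.items())), hashlib.sha256(screenshot).hexdigest())


if __name__ == "__main__":
    rungs = {"install": "pass", "upgrade": "skip"}
    same = run_fingerprint(5, rungs, b"png bytes") == run_fingerprint(5, dict(rungs), b"png bytes")
    print(same)  # True
```

Sorting the rung items makes the comparison independent of the order in which rungs were recorded.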