REVIEW-bsky.md — Adversary verdicts for the bsky sub-phase

Phase SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-bsky-fix.md. Gates: M1 (root cause + green fix PR), M2 (operator handoff complete → ## DONE). This file is append-only; the Builder reads it, never writes it.


Baseline recon @2026-06-11 (cold, pre-claim — NOT a verdict)

Established independently from the live recipe checkout on cc-ci (~/.abra/recipes/bluesky-pds, HEAD b2d86ef, tag 0.2.0+v0.4-4-gb2d86ef) so I am ready to verify the Builder's root-cause claim without anchoring:

  • compose.yml: app image: ghcr.io/bluesky-social/pds:0.4 — a moving minor tag. Version label coop-cloud.${STACK_NAME}.version=0.2.0+v0.4.
  • Recipe overrides the image entrypoint via entrypoint.sh.tmpl (mounted as a config at /entrypoint.sh, entrypoint: dumb-init --, command: /entrypoint.sh). That script ends with exec node --enable-source-maps index.js — a relative index.js, resolved against the image's WORKDIR.
  • Known symptom (rcust/shot evidence, DEFERRED.md): app crash-loops Cannot find module '/app/index.js' (MODULE_NOT_FOUND) under Node v24.15.0. Consistent with: image WORKDIR /app, but index.js no longer present there → upstream restructured/rebuilt whatever :0.4 now resolves to.
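The failure mode can be modelled in a few lines. This is a minimal sketch that assumes two things: Node resolves the relative script argument against the process working directory, and that directory is the image WORKDIR because compose.yml does not override working_dir. The file sets are illustrative. resolve_entry and entry_present are hypothetical helpers. The sketch also simplifies Node's resolver by ignoring its extension and directory fallbacks.

```python
import posixpath


def resolve_entry(workdir: str, script: str) -> str:
    """Absolute path tried for `node <script>` when the process cwd is `workdir`."""
    if posixpath.isabs(script):
        return posixpath.normpath(script)
    # Relative script arguments resolve against the cwd, assumed here to be the WORKDIR.
    return posixpath.normpath(posixpath.join(workdir, script))


def entry_present(workdir: str, script: str, image_files: set) -> bool:
    # Simplified: exact-path lookup only (no .js/.json/.node or directory fallbacks).
    return resolve_entry(workdir, script) in image_files


# Illustrative file sets for the two tags discussed in this review.
MOVING_0_4 = {"/app/index.ts"}
PIN_0_4_219 = {"/app/index.js"}

for label, files in (("pds:0.4", MOVING_0_4), ("pds:0.4.219", PIN_0_4_219)):
    print(label, entry_present("/app", "index.js", files))
```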

Verification angles I will hold the Builder's M1/M2 to (per phase plan §3 gates):

  1. Root-cause evidence reproduces — I independently inspect the live image (docker run --entrypoint sh ... -c 'ls; node --version' / crane/skopeo) and confirm index.js is absent from the assumed WORKDIR at the OLD pin, and present/working at the NEW pin.
  2. The fix is in the recipe mirror PR, not the harness; diff minimal + each line justified against upstream bluesky-social/pds changelog; version label bumped per recipe convention; no test/gate weakening anywhere in cc-ci.
  3. The green run is genuinely the PR head via the drone !testme path (not a local hand-run) — full lifecycle incl. lint, level recorded under de-capped semantics.
  4. Screenshot real + credential-free (I Read the PNG myself); never shows generated creds.
  5. DEFERRED entries closed with pointers; operator handoff in STATUS-bsky.md.

No gate CLAIMED yet — awaiting Builder's first claim(...) on a bsky gate.

Pre-claim recon update @2026-06-11T11:45Z (cold image probe — NOT a verdict)

Independently reproduced BOTH halves of the root cause via docker run on cc-ci:

  • ghcr.io/bluesky-social/pds:0.4 (current moving tag, digest …2324702f): Node v24.15.0, WORKDIR /app, ships index.ts only — no index.js. The recipe's entrypoint exec node --enable-source-maps index.js therefore fails with exactly Cannot find module '/app/index.js'. Symptom reproduced. ✔
  • ghcr.io/bluesky-social/pds:0.4.219 (Builder's proposed pin): Node v20.20.2, WORKDIR /app, ships index.js (package.json main: index.js). The recipe's existing entrypoint resolves the file → addresses the crash at the image level. ✔

Open scrutiny points I will hold the M1 claim to (NOT yet judged — no gate CLAIMED):

  • §2.2 upgrade-preference: 0.4.219 is the latest patch of the previous 0.4 line, not an upgrade to current stable (:0.4 now = 0.5.1). The plan prefers upgrading unless research justifies otherwise. Need: a genuine DECISIONS.md justification (e.g. 0.5.x moved to a TS entrypoint requiring an entrypoint rewrite / larger blast radius) — I'll read it only AFTER my own verdict, and check it against upstream changelog.
  • Pin should be exact/immutable (0.4.219 looks like a full patch tag — verify it's not itself moving; digest-pin would be strongest).
  • Fix must land on the recipe MIRROR PR and be proven green via the drone !testme path at PR head — not a local hand-run; no cc-ci harness/gate weakening.
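Pin strength can be checked mechanically. This heuristic sketch ranks a digest reference strongest. It treats a full MAJOR.MINOR.PATCH tag as "exact" by convention only, because a registry tag can still be re-pushed. pin_strength is a hypothetical helper.

```python
import re

DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")
EXACT_TAG = re.compile(r":v?\d+\.\d+\.\d+$")


def pin_strength(ref: str) -> str:
    """Classify an image reference: 'digest' > 'exact-tag' > 'moving'."""
    if DIGEST.search(ref):
        return "digest"      # content-addressed: cannot change underneath the recipe
    if EXACT_TAG.search(ref):
        return "exact-tag"   # full patch version: immutable by convention, not enforcement
    return "moving"          # e.g. :0.4, :latest, or no tag at all
```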

Still no gate CLAIMED (STATUS-bsky: "none claimed yet — working M1"). Idling for the claim.

Pre-claim recon @2026-06-11T11:55Z — EXPECTED_NA['upgrade'] premise (cold, NOT a verdict)

Builder added a harness change: EXPECTED_NA['upgrade'] suppresses the upgrade-tier base deploy for bluesky-pds ("no deployable base"). I independently checked the premise on the live recipe checkout:

  • Published recipe tags: ONLY 0.1.1+v0.4 and 0.2.0+v0.4. Both pin ghcr.io/bluesky-social/pds:0.4 (the moving tag that now resolves to the broken 0.5.1/index.ts image). So every published base would crash identically → there is no deployable previous published version. Premise holds. ✔
  • Logic: the PR fix (pin 0.4.219) is the FIRST deployable published version; before it, NO published version deploys, so a "previous published → PR" upgrade path cannot exist. Genuinely N/A, not a dodge. (Post-merge, future PRs WILL have a deployable base → tier re-activates; operator handoff should note this.)
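The premise can be restated as a search for the newest published recipe version that is older than the PR and actually deploys. This sketch works on assumed inputs: a tag → image map and a set of images known to crash. latest_deployable_base is hypothetical, and the real harness may judge deployability differently. When both published tags point at the broken moving tag, the search returns nothing.

```python
from typing import Dict, Optional, Set, Tuple


def recipe_semver(tag: str) -> Tuple[int, ...]:
    # "0.2.0+v0.4" -> (0, 2, 0); the "+v..." suffix names the upstream version.
    return tuple(int(p) for p in tag.split("+", 1)[0].split("."))


def latest_deployable_base(published: Dict[str, str], pr_tag: str,
                           broken_images: Set[str]) -> Optional[str]:
    pr_v = recipe_semver(pr_tag)
    candidates = [t for t, img in published.items()
                  if recipe_semver(t) < pr_v and img not in broken_images]
    return max(candidates, key=recipe_semver, default=None)


MOVING = "ghcr.io/bluesky-social/pds:0.4"
published = {"0.1.1+v0.4": MOVING, "0.2.0+v0.4": MOVING}
print(latest_deployable_base(published, "0.3.0+v0.4.219", {MOVING}))  # None
```

Once 0.3.0+v0.4.219 is published, a later PR's search would find it, which matches the note that the upgrade tier re-activates after merge.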

STILL must hard-verify when M1 is CLAIMED (do NOT pre-judge):

  • The NA is scoped to bluesky-pds only (per-recipe EXPECTED_NA declaration, not a global loosening of the upgrade tier for all recipes) — read the diff.
  • install / backup-restore / functional / lint tiers are NOT suppressed.
  • N/A recorded honestly with reason and handled correctly under de-capped level semantics (doesn't silently inflate the level nor falsely block); the 6 new upgrade_base() unit tests actually have teeth.
  • §9 alternative ("deploy base minimally via overlay, then upgrade to latest") is correctly rejected here: latest-deployable == PR head == 0.4.219, so there's no version delta to test and an overlay base would be synthetic — N/A is the honest call, not the overlay.
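Per-recipe scoping can be illustrated as follows. This is a sketch only. It assumes EXPECTED_NA is a rung → reason mapping read from that recipe's own recipe_meta.py. The function name matches the one under review, but the body is hypothetical and is not the harness code. The usage lines exercise the property the review wants checked: an NA declared for a different rung leaves the upgrade base in place.

```python
from typing import Mapping, Optional


def upgrade_base(expected_na: Mapping[str, str], latest_published: Optional[str]) -> Optional[str]:
    """Return the base version to deploy before upgrading, or None to skip the base."""
    # Only an "upgrade" entry in THIS recipe's EXPECTED_NA suppresses the base;
    # entries for other rungs (or other recipes' declarations) have no effect.
    if "upgrade" in expected_na:
        return None
    return latest_published


bsky_na = {"upgrade": "no deployable base"}   # per-recipe declaration
other_na = {"lint": "example reason"}          # a different rung declared N/A

print(upgrade_base(bsky_na, "0.2.0+v0.4"))     # None -> rung recorded as skip
print(upgrade_base(other_na, "0.2.0+v0.4"))    # 0.2.0+v0.4 -> base still deployed
```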

M1 — PASS @2026-06-11T12:30Z (root cause + green fix PR + screenshot)

Verdict formed COLD from my own clone + live cc-ci probes, BEFORE reading JOURNAL.md (anti-anchoring respected). Sources: phase plan §3 (SSOT), the code/git history, the verification info in STATUS-bsky.md, and my own re-runs below. Every M1 acceptance item independently reproduced.

1. Root cause reproduces ✔

Cold docker run on cc-ci of both images:

  • ghcr.io/bluesky-social/pds:0.4 (current, digest …2324702f/871194d2): @atproto/pds 0.5.1, Node v24.15.0, /app/index.ts only, NO index.js. The recipe's entrypoint exec node --enable-source-maps index.js → Cannot find module '/app/index.js'. Symptom reproduced exactly.
  • :0.4.219 (the fix pin): @atproto/pds 0.4.219, Node v20.20.2, /app/index.js present (package.json main:index.js) ⇒ entrypoint resolves. Fix sound at image level.
  • Upstream registry cc-ci-plan/upstream/bluesky-pds.md matches my probes (moving :0.4 tracks main; 0.4.x keeps classic layout; env interface stable across 0.4.x → no migration). :0.4 is demonstrably a MOVING tag upstream republished.

2. PR #2 minimal + justified, unmerged ✔

Gitea API: PR #2 open, merged=false, mergeable=true; base main b2d86ef, head f7b6c8df (branch upgrade-0.3.0+v0.4.219). Diff = 1 file, +2 −2, on compose.yml only: image :0.4 → :0.4.219, version label 0.2.0+v0.4 → 0.3.0+v0.4.219. No test/harness/recipe-test weakening in the PR. :0.4.219 is an exact (non-moving) version tag, the newest 0.4.x exact tag that preserves the recipe's index.js layout, so §2.2's "exact-version tag … unless research justifies otherwise" is met: 0.5.x restructured to a TS entrypoint that would require a recipe entrypoint rewrite, so the same-series re-pin is the minimal correct fix. NOTE (not a finding): pursuing the 0.5.x upgrade later is a reasonable operator follow-up; the re-pin is the right minimal fix now.
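The version label in the diff has the shape RECIPE+vUPSTREAM (0.2.0+v0.4 → 0.3.0+v0.4.219). The hypothetical helper below reproduces that bump. It assumes the recipe part is plain MAJOR.MINOR.PATCH and leaves the choice of which part to bump for a given change to the caller.

```python
def bump_version_label(current: str, new_upstream: str, part: str = "minor") -> str:
    """'0.2.0+v0.4' -> '0.3.0+v0.4.219' for part='minor', new_upstream='0.4.219'."""
    recipe, sep, _old_upstream = current.partition("+v")
    if not sep:
        raise ValueError(f"expected RECIPE+vUPSTREAM, got {current!r}")
    major, minor, patch = (int(x) for x in recipe.split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part {part!r}")
    return f"{major}.{minor}.{patch}+v{new_upstream}"
```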

3. Green run 427 via the GENUINE drone !testme path, at PR head ✔

  • PR #2 comment 14342 !testme → bridge swarm log (ccci-bridge_app): [poll] triggered build 427 for bluesky-pds@f7b6c8df (PR #2, comment 14342) by autonomic-bot → reflected outcome build 427 (bluesky-pds PR #2): success → PR comment 14343 "passed @ f7b6c8df". Real poll→drone→reflect, not a hand-run.
  • run-427 recipe checkout = PR head f7b6c8d "chore: upgrade to 0.3.0+v0.4.219", compose.yml line 6 image=:0.4.219, version label 0.3.0+v0.4.219.
  • results.json: level=5, ref=f7b6c8dfb81c, pr=2; rungs install/backup_restore/functional/lint=pass, upgrade=skip; skips.intentional.upgrade=declared reason, skips.unintentional=[]; flags clean_teardown+no_secret_leak=true; schema=2.
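The results.json items above amount to a short checklist, sketched here as code. The JSON layout is an assumption inferred from the field names quoted in this review (level, ref, pr, rungs, skips.intentional, skips.unintentional, flags, schema). check_results is hypothetical, and the real schema-2 document may nest these fields differently.

```python
from typing import Any, Dict, List


def check_results(doc: Dict[str, Any], head_sha: str, pr: int) -> List[str]:
    problems = []
    if doc.get("schema") != 2:
        problems.append("unexpected schema")
    ref = str(doc.get("ref", ""))
    # Short and longer SHAs compare by prefix in either direction.
    if not ref or not (ref.startswith(head_sha) or head_sha.startswith(ref)):
        problems.append(f"ref {ref!r} is not PR head {head_sha!r}")
    if doc.get("pr") != pr:
        problems.append("wrong PR")
    skips = doc.get("skips", {})
    for rung, status in doc.get("rungs", {}).items():
        # Every skip must carry a declared reason.
        if status == "skip" and rung not in skips.get("intentional", {}):
            problems.append(f"{rung} skipped without a declared reason")
    if skips.get("unintentional"):
        problems.append("unintentional skips present")
    for flag in ("clean_teardown", "no_secret_leak"):
        if doc.get("flags", {}).get(flag) is not True:
            problems.append(f"flag {flag} not true")
    return problems
```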

4. No gate weakening (the EXPECTED_NA['upgrade'] harness change) ✔

  • Premise true (cold): BOTH published recipe tags (0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving :0.4 ⇒ no deployable upgrade base. Genuine structural N/A, not a dodge.
  • upgrade_base() (e9745c8) returns None only when upgrade ∈ EXPECTED_NA, declared per-recipe in tests/bluesky-pds/recipe_meta.py. NOT a global loosening — unit test test_expected_na_other_rung_does_not_suppress proves a DIFFERENT-rung EXPECTED_NA does not suppress the upgrade base. The tier records "skip", never "pass".
  • Negative control run 423 (same PR head, pre-EXPECTED_NA): base 0.1.1+v0.4 deploy → install=fail → level 0. Proves the harness has TEETH: it goes red when a base IS attempted against the broken tag; 427's level 5 is solely the legitimate base-suppression, not a masked failure. A synthetic overlay base (0.4.219→0.4.219, zero delta) would be a meaningless green — N/A-skip is the honest call.
  • Level math (compute_level, pure): install=pass(1) · upgrade=skip(climbs) · backup_restore=pass(3) · functional=pass(4) · lint=pass(5) ⇒ 5. Consistent with the lvl5 de-cap semantics (skip climbs; only fail/unver block).
  • Unit tests COLD on cc-ci (fresh clone HEAD cba53b6): 253 passed (6 new in test_upgrade_base.py, with teeth). Repo lint COLD: lint: PASS (exit 0).
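The level arithmetic above fits in a small pure function. This sketch assumes the rung order install → upgrade → backup_restore → functional → lint maps to levels 1 to 5. It encodes only the rule stated here (skip climbs; fail or unver blocks). The real compute_level may apply further guards that this sketch does not model.

```python
from typing import Mapping

RUNGS = ("install", "upgrade", "backup_restore", "functional", "lint")
CLIMBS = {"pass", "skip"}


def compute_level(rungs: Mapping[str, str]) -> int:
    level = 0
    for name in RUNGS:
        if rungs.get(name) not in CLIMBS:
            break  # fail / unver (or a missing rung) blocks every higher rung
        level += 1
    return level


run_427 = {"install": "pass", "upgrade": "skip", "backup_restore": "pass",
           "functional": "pass", "lint": "pass"}
print(compute_level(run_427))             # 5
print(compute_level({"install": "fail"}))  # 0, as in negative-control run 423
```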

5. Screenshot — real + credential-free ✔

Published …/runs/427/screenshot.png (HTTP 200, 29274 B) is sha256-identical to the on-disk capture. I Read the PNG: the genuine PDS landing page — Bluesky ASCII butterfly, "This is an AT Protocol Personal Data Server (aka, an atproto PDS)", "/xrpc/" pointer, Code/Self-Host/Protocol links. No credentials (no admin password / invite / secret). Default capture suffices — no SCREENSHOT hook needed.
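The byte-identity check between the published PNG and the on-disk capture can be done with hashlib. This minimal sketch assumes the published bytes were already fetched (for example over HTTP). same_artifact is a hypothetical helper.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 16) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large artifacts need not fit in memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def same_artifact(published: bytes, local_path: str) -> bool:
    return hashlib.sha256(published).hexdigest() == sha256_file(local_path)
```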

6. No secret leak ✔

Independent scan of published artifacts (results.json, summary.html, lint.txt, junit) for the PDS-generated secrets (admin password / jwt / plc rotation key) and high-entropy strings: the ONLY matches are recipe SOURCE secret-NAME references (- pds_jwt_secret etc.) and one abra lint WARN naming pds_admin_password (length policy) — no secret VALUE exposed. Only high-entropy token = the 40-char commit SHA. clean_teardown confirmed (no swarm secret/stack residue for the run).
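A scan of this kind can be approximated in two passes. The first looks for exact matches of known secret values. The second applies a Shannon-entropy filter over long tokens, with an allowlist for 40-hex commit SHAs. This is a sketch only: the token pattern, the 20-character minimum and the 3.5 bits/char threshold are assumptions, not the settings used in this review. Both helper names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, List

TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")   # assumed: long base64/url-safe-ish runs
COMMIT_SHA = re.compile(r"[0-9a-f]{40}")       # allowlisted: full git commit SHA


def shannon_entropy(s: str) -> float:
    """Bits per character of the empirical character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 3.5) -> List[str]:
    return [tok for tok in TOKEN.findall(text)
            if not COMMIT_SHA.fullmatch(tok) and shannon_entropy(tok) >= threshold]


def secret_value_hits(text: str, secret_values: Iterable[str]) -> List[str]:
    # Exact matches of generated secret VALUES; secret NAMES alone are not a leak.
    return [v for v in secret_values if v and v in text]
```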

M1 PASS. No VETO. Builder cleared to proceed to M2 (operator handoff). M2 will get a fresh cold pass: independent re-trigger/confirm green at PR head, PNG re-Read, level/baseline reconciliation, DEFERRED entries closed with pointers, and the operator summary checked — plus I will then consult JOURNAL/DECISIONS to contextualise (noting it there).