cc-ci/machine-docs/REVIEW-bsky.md
autonomic-bot 85a781368a
machine-docs: move all per-phase coordination files out of repo root
STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot
(32 files) were at the repo root; move them into machine-docs/ to match the
mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already
live there). AGENTS.md gains an explicit File-location rule. No content change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:57:03 +00:00

REVIEW-bsky.md — Adversary verdicts for the bsky sub-phase

Phase SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-bsky-fix.md. Gates: M1 (root cause + green fix PR), M2 (operator handoff complete → ## DONE). This file is append-only; the Builder reads it, never writes it.


Baseline recon @2026-06-11 (cold, pre-claim — NOT a verdict)

Established independently from the live recipe checkout on cc-ci (~/.abra/recipes/bluesky-pds, HEAD b2d86ef, tag 0.2.0+v0.4-4-gb2d86ef) so I am ready to verify the Builder's root-cause claim without anchoring:

  • compose.yml: app image: ghcr.io/bluesky-social/pds:0.4 — a moving minor tag. Version label coop-cloud.${STACK_NAME}.version=0.2.0+v0.4.
  • Recipe overrides the image entrypoint via entrypoint.sh.tmpl (mounted as a config at /entrypoint.sh, entrypoint: dumb-init --, command: /entrypoint.sh). That script ends with exec node --enable-source-maps index.js — a relative index.js, resolved against the image's WORKDIR.
  • Known symptom (rcust/shot evidence, DEFERRED.md): app crash-loops Cannot find module '/app/index.js' (MODULE_NOT_FOUND) under Node v24.15.0. Consistent with: image WORKDIR /app, but index.js no longer present there → upstream restructured/rebuilt whatever :0.4 now resolves to.

Verification angles I will hold the Builder's M1/M2 to (per phase plan §3 gates):

  1. Root-cause evidence reproduces — I independently inspect the live image (docker run --entrypoint sh ... -c 'ls; node --version' / crane/skopeo) and confirm index.js is absent from the assumed WORKDIR at the OLD pin, and present/working at the NEW pin.
  2. The fix is in the recipe mirror PR, not the harness; diff minimal + each line justified against upstream bluesky-social/pds changelog; version label bumped per recipe convention; no test/gate weakening anywhere in cc-ci.
  3. The green run is genuinely the PR head via the drone !testme path (not a local hand-run) — full lifecycle incl. lint, level recorded under de-capped semantics.
  4. Screenshot real + credential-free (I Read the PNG myself); never shows generated creds.
  5. DEFERRED entries closed with pointers; operator handoff in STATUS-bsky.md.

No gate CLAIMED yet — awaiting Builder's first claim(...) on a bsky gate.

Pre-claim recon update @2026-06-11T11:45Z (cold image probe — NOT a verdict)

Independently reproduced BOTH halves of the root cause via docker run on cc-ci:

  • ghcr.io/bluesky-social/pds:0.4 (current moving tag, digest …2324702f): Node v24.15.0, WORKDIR /app, ships index.ts only — no index.js. The recipe's entrypoint exec node --enable-source-maps index.js therefore fails with exactly Cannot find module '/app/index.js'. Symptom reproduced. ✔
  • ghcr.io/bluesky-social/pds:0.4.219 (Builder's proposed pin): Node v20.20.2, WORKDIR /app, ships index.js (package.json main: index.js). The recipe's existing entrypoint resolves the file → addresses the crash at the image level. ✔
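The two probes above can be sketched as a small helper. This is a minimal sketch, assuming a working docker CLI on the host and that the image ships `sh` (as the probes in this file imply); the function names `probe_argv`, `probe` and `has_entrypoint_target` are hypothetical and not part of the cc-ci harness.

```python
import subprocess
from typing import List


def probe_argv(image: str) -> List[str]:
    """Build a docker command that prints the WORKDIR, its listing and the Node version."""
    script = "pwd; ls; node --version"
    # --entrypoint sh bypasses the image's own entrypoint so the probe cannot crash-loop.
    return ["docker", "run", "--rm", "--entrypoint", "sh", image, "-c", script]


def probe(image: str) -> str:
    """Run the probe. Needs docker and pull access; not exercised offline."""
    done = subprocess.run(probe_argv(image), check=True, capture_output=True, text=True)
    return done.stdout


def has_entrypoint_target(listing: str, target: str = "index.js") -> bool:
    """True when the file the recipe's entrypoint executes appears in the probe output."""
    return target in listing.split()
```

Comparing `has_entrypoint_target(probe(old_pin))` with `has_entrypoint_target(probe(new_pin))` gives the absent/present pair that the root-cause claim rests on.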

Open scrutiny points I will hold the M1 claim to (NOT yet judged — no gate CLAIMED):

  • §2.2 upgrade-preference: 0.4.219 is the latest patch of the previous 0.4 line, not an upgrade to current stable (:0.4 now = 0.5.1). The plan prefers upgrading unless research justifies otherwise. Need: a genuine DECISIONS.md justification (e.g. 0.5.x moved to a TS entrypoint requiring an entrypoint rewrite / larger blast radius) — I'll read it only AFTER my own verdict, and check it against upstream changelog.
  • Pin should be exact/immutable (0.4.219 looks like a full patch tag — verify it's not itself moving; digest-pin would be strongest).
  • Fix must land on the recipe MIRROR PR and be proven green via the drone !testme path at PR head — not a local hand-run; no cc-ci harness/gate weakening.
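The "exact/immutable pin" point can be pre-screened by tag shape. A minimal sketch, assuming a three-part numeric tag is what the plan means by "exact-version tag"; `pin_kind` is a hypothetical helper. Shape is necessary but not sufficient: a registry can still republish a three-part tag, so only a digest pin is immutable by construction.

```python
import re

_DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")
_EXACT = re.compile(r"^v?\d+\.\d+\.\d+$")


def pin_kind(image_ref: str) -> str:
    """Classify an image reference as 'digest', 'exact' or 'moving' by its shape alone."""
    if _DIGEST.search(image_ref):
        return "digest"
    _, sep, tag = image_ref.rpartition(":")
    # No colon, or a colon that belongs to a registry port, means no tag (implicit latest).
    if not sep or "/" in tag:
        return "moving"
    return "exact" if _EXACT.match(tag) else "moving"
```

Under this rule `ghcr.io/bluesky-social/pds:0.4` classifies as moving and `ghcr.io/bluesky-social/pds:0.4.219` as exact, which matches what the probes found.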

Still no gate CLAIMED (STATUS-bsky: "none claimed yet — working M1"). Idling for the claim.

Pre-claim recon @2026-06-11T11:55Z — EXPECTED_NA['upgrade'] premise (cold, NOT a verdict)

Builder added a harness change: EXPECTED_NA['upgrade'] suppresses the upgrade-tier base deploy for bluesky-pds ("no deployable base"). I independently checked the premise on the live recipe checkout:

  • Published recipe tags: ONLY 0.1.1+v0.4 and 0.2.0+v0.4. Both pin ghcr.io/bluesky-social/pds:0.4 (the moving tag that now resolves to the broken 0.5.1/index.ts image). So every published base would crash identically → there is no deployable previous published version. Premise holds. ✔
  • Logic: the PR fix (pin 0.4.219) is the FIRST deployable published version; before it, NO published version deploys, so a "previous published → PR" upgrade path cannot exist. Genuinely N/A, not a dodge. (Post-merge, future PRs WILL have a deployable base → tier re-activates; operator handoff should note this.)

STILL must hard-verify when M1 is CLAIMED (do NOT pre-judge):

  • The NA is scoped to bluesky-pds only (per-recipe EXPECTED_NA declaration, not a global loosening of the upgrade tier for all recipes) — read the diff.
  • install / backup-restore / functional / lint tiers are NOT suppressed.
  • N/A recorded honestly with reason and handled correctly under de-capped level semantics (doesn't silently inflate the level nor falsely block); the 6 new upgrade_base() unit tests actually have teeth.
  • §9 alternative ("deploy base minimally via overlay, then upgrade to latest") is correctly rejected here: latest-deployable == PR head == 0.4.219, so there's no version delta to test and an overlay base would be synthetic — N/A is the honest call, not the overlay.
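The scoping requirement in the first point can be pictured as follows. This is a sketch of the intended behaviour only, not the harness code: the signature, the `default_base` parameter and the dict shape of `EXPECTED_NA` (rung name mapped to a reason) are assumptions.

```python
from typing import Mapping, Optional


def upgrade_base(default_base: Optional[str], expected_na: Mapping[str, str]) -> Optional[str]:
    """Return the version to deploy as upgrade base, or None when the tier is declared N/A.

    Only an entry for the 'upgrade' rung suppresses the base; an N/A declared
    for any other rung leaves the upgrade tier fully active.
    """
    if "upgrade" in expected_na:
        return None
    return default_base
```

Because `expected_na` is passed in per recipe, a declaration for bluesky-pds cannot loosen the tier for any other recipe, which is the property the diff has to show.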

M1 — PASS @2026-06-11T12:30Z (root cause + green fix PR + screenshot)

Verdict formed COLD from my own clone + live cc-ci probes, BEFORE reading JOURNAL.md (anti-anchoring respected). Sources: phase plan §3 (SSOT), the code/git history, the verification info in STATUS-bsky.md, and my own re-runs below. Every M1 acceptance item independently reproduced.

1. Root cause reproduces ✔

Cold docker run on cc-ci of both images:

  • ghcr.io/bluesky-social/pds:0.4 (current, digest …2324702f/871194d2): @atproto/pds 0.5.1, Node v24.15.0, /app/index.ts, NO index.js. The recipe's entrypoint exec node --enable-source-maps index.js → Cannot find module '/app/index.js'. Symptom reproduced exactly.
  • :0.4.219 (the fix pin): @atproto/pds 0.4.219, Node v20.20.2, /app/index.js present (package.json main:index.js) ⇒ entrypoint resolves. Fix sound at image level.
  • Upstream registry cc-ci-plan/upstream/bluesky-pds.md matches my probes (moving :0.4 tracks main; 0.4.x keeps classic layout; env interface stable across 0.4.x → no migration). :0.4 is demonstrably a MOVING tag upstream republished.

2. PR #2 minimal + justified, unmerged ✔

Gitea API: PR #2 open, merged=false, mergeable=true; base main b2d86ef, head f7b6c8df (branch upgrade-0.3.0+v0.4.219). Diff = 1 file, +2 −2 on compose.yml only: image :0.4 → :0.4.219, version label 0.2.0+v0.4 → 0.3.0+v0.4.219. No test/harness/recipe-test weakening in the PR. :0.4.219 is an exact (non-moving) version tag, the newest 0.4.x exact tag preserving the recipe's index.js layout, so §2.2's "exact-version tag … unless research justifies otherwise" is met (0.5.x restructured to a TS entrypoint requiring a recipe entrypoint rewrite, so the same-series re-pin is the minimal correct fix). NOTE (not a finding): pursuing the 0.5.x upgrade later is a reasonable operator follow-up; the re-pin is the right minimal fix now.

3. Green run 427 via the GENUINE drone !testme path, at PR head ✔

  • PR #2 comment 14342 !testme → bridge swarm log (ccci-bridge_app): [poll] triggered build 427 for bluesky-pds@f7b6c8df (PR #2, comment 14342) by autonomic-bot → reflected outcome build 427 (bluesky-pds PR #2): success → PR comment 14343 " passed @ f7b6c8df". Real poll→drone→reflect, not a hand-run.
  • run-427 recipe checkout = PR head f7b6c8d "chore: upgrade to 0.3.0+v0.4.219", compose.yml line 6 image=:0.4.219, version label 0.3.0+v0.4.219.
  • results.json: level=5, ref=f7b6c8dfb81c, pr=2; rungs install/backup_restore/functional/lint=pass, upgrade=skip; skips.intentional.upgrade=declared reason, skips.unintentional=[]; flags clean_teardown+no_secret_leak=true; schema=2.
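The results.json checks above can be expressed as a single predicate. A minimal sketch: the nested layout (`rungs`, `skips.intentional`, `skips.unintentional`, `flags`) is inferred from the field names quoted in this file and is an assumption about the real schema; `same_commit` and `run_problems` are hypothetical names.

```python
from typing import Any, Dict, List


def same_commit(a: str, b: str) -> bool:
    """Abbreviated SHAs match when one is a non-empty prefix of the other."""
    return bool(a) and bool(b) and (a.startswith(b) or b.startswith(a))


def run_problems(results: Dict[str, Any], pr_head: str) -> List[str]:
    """Return every reason the run does not count as green at PR head (empty = green)."""
    found = []
    if not same_commit(results.get("ref", ""), pr_head):
        found.append("ref is not the PR head")
    rungs = results.get("rungs", {})
    skips = results.get("skips", {})
    for name, status in rungs.items():
        if status not in ("pass", "skip"):
            found.append(f"rung {name} is {status}")
        elif status == "skip" and name not in skips.get("intentional", {}):
            found.append(f"skip of {name} has no declared reason")
    if skips.get("unintentional"):
        found.append("unintentional skips present")
    for flag in ("clean_teardown", "no_secret_leak"):
        if results.get("flags", {}).get(flag) is not True:
            found.append(f"flag {flag} is not true")
    return found
```

Returning the list of problems rather than a bare boolean keeps the verdict auditable: a red result says which acceptance item failed.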

4. No gate weakening (the EXPECTED_NA['upgrade'] harness change) ✔

  • Premise true (cold): BOTH published recipe tags (0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving :0.4 ⇒ no deployable upgrade base. Genuine structural N/A, not a dodge.
  • upgrade_base() (e9745c8) returns None only when upgrade ∈ EXPECTED_NA, declared per-recipe in tests/bluesky-pds/recipe_meta.py. NOT a global loosening — unit test test_expected_na_other_rung_does_not_suppress proves a DIFFERENT-rung EXPECTED_NA does not suppress the upgrade base. The tier records "skip", never "pass".
  • Negative control run 423 (same PR head, pre-EXPECTED_NA): base 0.1.1+v0.4 deploy → install=fail → level 0. Proves the harness has TEETH: it goes red when a base IS attempted against the broken tag; 427's level 5 is solely the legitimate base-suppression, not a masked failure. A synthetic overlay base (0.4.219→0.4.219, zero delta) would be a meaningless green — N/A-skip is the honest call.
  • Level math (compute_level, pure): install=pass(1) · upgrade=skip(climbs) · backup_restore=pass(3) · functional=pass(4) · lint=pass(5) ⇒ 5. Consistent with the lvl5 de-cap semantics (skip climbs; only fail/unver block).
  • Unit tests COLD on cc-ci (fresh clone HEAD cba53b6): 253 passed (6 new in test_upgrade_base.py, with teeth). Repo lint COLD: lint: PASS (exit 0).
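The level math above, written out. A minimal sketch of the de-capped semantics as described here (pass and skip climb; fail and unver block); it is not the harness's `compute_level`, and treating a missing rung as unver is an assumption.

```python
from typing import Mapping

# Ladder order as listed in the level math: install=1 ... lint=5.
LADDER = ("install", "upgrade", "backup_restore", "functional", "lint")


def compute_level(rungs: Mapping[str, str]) -> int:
    """Climb the ladder until the first rung that is neither pass nor skip."""
    level = 0
    for position, name in enumerate(LADDER, start=1):
        status = rungs.get(name, "unver")  # assumption: a missing rung counts as unverified
        if status not in ("pass", "skip"):
            break
        level = position
    return level


run_427 = {"install": "pass", "upgrade": "skip", "backup_restore": "pass",
           "functional": "pass", "lint": "pass"}
run_423 = {"install": "fail"}
print(compute_level(run_427))  # 5
print(compute_level(run_423))  # 0
```

The same function yields 5 for run 427 and 0 for negative control 423, so the difference between the two runs comes from the rung statuses, not from the arithmetic.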

5. Screenshot — real + credential-free ✔

Published …/runs/427/screenshot.png (HTTP 200, 29274 B) is sha256-identical to the on-disk capture. I Read the PNG: the genuine PDS landing page — Bluesky ASCII butterfly, "This is an AT Protocol Personal Data Server (aka, an atproto PDS)", "/xrpc/" pointer, Code/Self-Host/Protocol links. No credentials (no admin password / invite / secret). Default capture suffices — no SCREENSHOT hook needed.

6. No secret leak ✔

Independent scan of published artifacts (results.json, summary.html, lint.txt, junit) for the PDS-generated secrets (admin password / jwt / plc rotation key) and high-entropy strings: the ONLY matches are recipe SOURCE secret-NAME references (- pds_jwt_secret etc.) and one abra lint WARN naming pds_admin_password (length policy) — no secret VALUE exposed. Only high-entropy token = the 40-char commit SHA. clean_teardown confirmed (no swarm secret/stack residue for the run).
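A scan of this kind separates two questions: does a known secret VALUE appear verbatim, and are there unexplained high-entropy tokens. A minimal sketch, assuming a token pattern of 20+ base64-like characters and a 3.5 bits/char threshold; both are illustrative choices, not the values used in the actual scan.

```python
import math
import re
from collections import Counter
from typing import Iterable, List

TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token's character distribution, in bits per character."""
    total = len(token)
    return -sum(n / total * math.log2(n / total) for n in Counter(token).values())


def high_entropy_tokens(text: str, threshold: float = 3.5) -> List[str]:
    """Long tokens whose entropy reaches the threshold; each needs a manual explanation."""
    return [t for t in TOKEN.findall(text) if shannon_entropy(t) >= threshold]


def leaked_values(text: str, secret_values: Iterable[str]) -> List[str]:
    """Secret values found verbatim in the text; secret names are never passed in here."""
    return [value for value in secret_values if value and value in text]
```

Secret-name references such as `pds_jwt_secret` fall below the length cutoff and are not flagged, while a 40-character commit SHA is, which is why a SHA still has to be explained by hand.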

M1 PASS. No VETO. Builder cleared to proceed to M2 (operator handoff). M2 will get a fresh cold pass: independent re-trigger/confirm green at PR head, PNG re-Read, level/baseline reconciliation, DEFERRED entries closed with pointers, and the operator summary checked — plus I will then consult JOURNAL/DECISIONS to contextualise (noting it there).


M2 — PASS @2026-06-11T15:48Z (operator handoff complete)

Fresh Adversary cold pass. Verdict formed from the plan (§3 M2 SSOT), the code/deliverables, the STATUS-bsky verification info, and my OWN independent re-trigger — BEFORE reading JOURNAL.md (anti-anchoring respected; I may consult it after, noting so).

1. Green at PR head — independently RE-TRIGGERED ✔ (the decisive proof)

I posted !testme on PR #2 myself (comment 14344, 15:46:21Z). Bridge: [poll] triggered build 435 for bluesky-pds@f7b6c8df (PR #2, comment 14344) by autonomic-bot. Fresh build 435 results.json: level=5, ref=f7b6c8dfb81c (PR head), pr=2; rungs install/backup_restore/functional/lint=pass, upgrade=skip (skips.intentional.upgrade=declared reason, skips.unintentional=[]); clean_teardown + no_secret_leak=true. Recipe checkout = PR head f7b6c8d, image :0.4.219. Identical rung profile to run 427 → reproducibly green, not a one-off.

  • Real stages, not a no-op: junit shows install / backup (generic+cc-ci) / restore (generic+cc-ci) and FOUR live functional tests: test_health_check, test_describe_server, test_session_auth, test_account_and_post. A no-op could not pass account-creation/post/session-auth against a live PDS. (Wall-clock ~70s is plausible: lightweight 2-service recipe, image cached on host.)
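The "real stages" check can be mechanised against the junit report. A minimal sketch, assuming the standard JUnit XML shape (`testcase` elements with a `name` attribute and optional `failure`, `error` or `skipped` children); the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Set

REQUIRED = {"test_health_check", "test_describe_server",
            "test_session_auth", "test_account_and_post"}


def passed_tests(junit_xml: str) -> Set[str]:
    """Names of test cases that ran without a failure, error or skip marker."""
    passed = set()
    for case in ET.fromstring(junit_xml).iter("testcase"):
        blocked = any(case.find(tag) is not None for tag in ("failure", "error", "skipped"))
        if not blocked and case.get("name"):
            passed.add(case.get("name"))
    return passed


def functional_tier_is_live(junit_xml: str) -> bool:
    """True only when all four functional tests are present and passed."""
    return REQUIRED <= passed_tests(junit_xml)
```

A skipped or missing test makes the check fail, so a run that short-circuited the functional tier could not pass it.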

2. PNG independently Read ✔

Fresh build 435 screenshot.png sha256 == run 427's (bdb71d3e…) == the image I Read at M1: genuine PDS landing page (Bluesky ASCII butterfly, "AT Protocol Personal Data Server", /xrpc/ pointer, upstream links), no credentials. Deterministic, real.
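The three-way equality above is a plain digest comparison. A minimal sketch using the standard library; the file paths passed in are whatever local copies of the captures are at hand, and `same_capture` is a hypothetical name.

```python
import hashlib


def sha256_of(path: str) -> str:
    """Hex SHA-256 of a file, read in chunks so large captures are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_capture(*paths: str) -> bool:
    """True when every given file has the same SHA-256 digest."""
    return len({sha256_of(path) for path in paths}) == 1
```

Equal digests show the files are byte-identical; they say nothing about what the image depicts, so the PNG still has to be Read.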

3. Level under new semantics + baseline reconciled ✔

level=5 under the de-capped ladder (upgrade=skip climbs; only fail/unver block). Old Phase-2 baseline ("full lifecycle green", e45e0ee, pre-results era) is genuinely unreproducible: the moving-tag republish broke ALL published recipe versions; the PR restores deployability. Reconciliation recorded in the DEFERRED closure + the M2 claim. Independently corroborated: 0.5.x has NO release tag (upstream git: zero 0.5.x tags, highest v0.4.219 + anomalous v0.4.5001; ghcr 0.5.0/0.5.1/v0.5.1 all absent), so an exact-version pin REQUIRES 0.4.x. This fully resolves the §2.2 "prefer upgrade" scrutiny: re-pinning to 0.4.219 (newest exact) is not "old over new", because there is no exact 0.5.x tag to upgrade to; 0.5.x lives only on the moving tag the recipe must never pin. Justified.

4. DEFERRED entries closed with pointers ✔

machine-docs/DEFERRED.md: RESOLVED @2026-06-11 (phase bsky). Explicitly closes BOTH the re-pin follow-up AND the rcust M2 baseline-exclusion note, with pointers to PR #2 / run 427 / negative control 423 / upstream registry / DECISIONS. Original entry preserved (append-only).

5. Operator summary ✔

STATUS-bsky "Operator summary": crisp + complete — what was wrong (moving tag → index.ts vs recipe's index.js; broke both published versions), what the PR changes (2-line re-pin 0.4.219 + label bump; why not 0.5.1 = no release tag + entrypoint migration), and a 5-step post-merge runbook (merge → publish version → drop EXPECTED_NA + set UPGRADE_BASE_VERSION="0.3.0+v0.4.219" → no canonical to reseed → never re-pin :0.4). Corroborated: ci-warm has NO bluesky entry (only custom-html/keycloak/traefik) → "nothing to reseed" is true.

6. PR left OPEN ✔

PR #2 head f7b6c8df, state=open, merged=false (re-confirmed at re-trigger). The phase is done WITH the PR open: merging is the operator's call, and post-merge reseeding is documented, not done.

M2 PASS. No VETO. Both M1 (@369f4f4) and M2 are fresh Adversary PASSes; no gate weakening, no secret leak, screenshot real, PR unmerged. The Builder is cleared to write ## DONE to STATUS-bsky.md. (Post-verdict I will consult JOURNAL/DECISIONS only to contextualise — it does not change this verdict.)

Post-verdict consult (does NOT change the verdict)

Read DECISIONS.md bsky entries after writing M2 PASS. Fully consistent: pin-choice entry REJECTS 0.5.1 (no release tag + index.ts migration) AND digest-suffix pinning (abra survey/upgrade tooling chokes on tag@digest) → exact-version tag 0.4.219 chosen (satisfies plan §2.2 "digest-pinned OR exact-version tag"). EXPECTED_NA entry matches the harness behaviour I verified. No contradiction, no new finding.