cc-ci/machine-docs/REVIEW-bsky.md
autonomic-bot 85a781368a
Some checks failed
continuous-integration/drone/push Build is failing
machine-docs: move all per-phase coordination files out of repo root
STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot
(32 files) were at the repo root; move them into machine-docs/ to match the
mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already
live there). AGENTS.md gains an explicit File-location rule. No content change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-11 20:57:03 +00:00


# REVIEW-bsky.md — Adversary verdicts for the `bsky` sub-phase
Phase SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-bsky-fix.md`.
Gates: **M1** (root cause + green fix PR), **M2** (operator handoff complete → `## DONE`).
This file is append-only; the Builder reads it, never writes it.

---
## Baseline recon @2026-06-11 (cold, pre-claim — NOT a verdict)
Established independently from the live recipe checkout on cc-ci
(`~/.abra/recipes/bluesky-pds`, HEAD `b2d86ef`, tag `0.2.0+v0.4-4-gb2d86ef`) so I am
ready to verify the Builder's root-cause claim without anchoring:
- `compose.yml`: app `image: ghcr.io/bluesky-social/pds:0.4` — a **moving minor tag**.
Version label `coop-cloud.${STACK_NAME}.version=0.2.0+v0.4`.
- Recipe **overrides the image entrypoint** via `entrypoint.sh.tmpl` (mounted as a config
at `/entrypoint.sh`, `entrypoint: dumb-init --`, `command: /entrypoint.sh`). That script
ends with `exec node --enable-source-maps index.js` — a **relative** `index.js`, resolved
against the image's WORKDIR.
- Known symptom (rcust/shot evidence, DEFERRED.md): app crash-loops
`Cannot find module '/app/index.js'` (MODULE_NOT_FOUND) under Node v24.15.0. Consistent
with: image WORKDIR `/app`, but `index.js` no longer present there → upstream
restructured/rebuilt whatever `:0.4` now resolves to.
Verification angles I will hold the Builder's M1/M2 to (per phase plan §3 gates):
1. Root-cause evidence reproduces — I independently inspect the live image
(`docker run --entrypoint sh ... -c 'ls; node --version'` / crane/skopeo) and confirm
`index.js` is absent from the assumed WORKDIR at the OLD pin, and present/working at the
NEW pin.
2. The fix is in the **recipe mirror PR**, not the harness; diff minimal + each line
justified against upstream bluesky-social/pds changelog; version label bumped per recipe
convention; **no test/gate weakening** anywhere in cc-ci.
3. The green run is genuinely the **PR head via the drone `!testme` path** (not a local
hand-run) — full lifecycle incl. lint, level recorded under de-capped semantics.
4. Screenshot real + credential-free (I Read the PNG myself); never shows generated creds.
5. DEFERRED entries closed with pointers; operator handoff in STATUS-bsky.md.
No gate CLAIMED yet — awaiting Builder's first `claim(...)` on a bsky gate.
## Pre-claim recon update @2026-06-11T11:45Z (cold image probe — NOT a verdict)
Independently reproduced BOTH halves of the root cause via `docker run` on cc-ci:
- `ghcr.io/bluesky-social/pds:0.4` (current moving tag, digest …2324702f): **Node v24.15.0**,
WORKDIR `/app`, ships **`index.ts`** only — no `index.js`. The recipe's entrypoint
`exec node --enable-source-maps index.js` therefore fails with exactly
`Cannot find module '/app/index.js'`. Symptom reproduced. ✔
- `ghcr.io/bluesky-social/pds:0.4.219` (Builder's proposed pin): **Node v20.20.2**,
WORKDIR `/app`, ships **`index.js`** (`package.json` `main: index.js`). The recipe's
existing entrypoint resolves the file → addresses the crash at the image level. ✔
Open scrutiny points I will hold the M1 claim to (NOT yet judged — no gate CLAIMED):
- **§2.2 upgrade-preference:** `0.4.219` is the latest patch of the *previous* 0.4 line,
not an upgrade to current stable (`:0.4` now = 0.5.1). The plan prefers upgrading unless
research justifies otherwise. Need: a genuine DECISIONS.md justification (e.g. 0.5.x
moved to a TS entrypoint requiring an entrypoint rewrite / larger blast radius) — I'll
read it only AFTER my own verdict, and check it against upstream changelog.
- Pin should be exact/immutable (0.4.219 looks like a full patch tag — verify it's not
itself moving; digest-pin would be strongest).
- Fix must land on the recipe MIRROR PR and be proven green via the drone `!testme` path
at PR head — not a local hand-run; no cc-ci harness/gate weakening.
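
The moving-versus-exact distinction in the pin scrutiny point above can be sketched as a purely syntactic check. This is a minimal illustration, not part of the cc-ci harness: the name `classify_pin` and the rule that "exact" means a full MAJOR.MINOR.PATCH tag are assumptions. A syntactic check cannot prove that upstream never republishes a tag; only a digest is immutable.

```python
import re

# Assumption: an "exact" tag is a full MAJOR.MINOR.PATCH version, optionally "v"-prefixed.
EXACT_TAG = re.compile(r"^v?\d+\.\d+\.\d+$")
DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def classify_pin(image_ref: str) -> str:
    """Classify an image reference as 'digest', 'exact' or 'moving' (hypothetical helper)."""
    if DIGEST_SUFFIX.search(image_ref):
        return "digest"
    # Look for the tag only in the last path component, so a registry
    # port such as "localhost:5000/pds" is not mistaken for a tag.
    last_component = image_ref.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return "moving"  # no tag means the implicit, moving "latest"
    tag = last_component.split(":", 1)[1]
    return "exact" if EXACT_TAG.match(tag) else "moving"


old_pin = classify_pin("ghcr.io/bluesky-social/pds:0.4")      # minor-only tag
new_pin = classify_pin("ghcr.io/bluesky-social/pds:0.4.219")  # full patch tag
```
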
Still no gate CLAIMED (STATUS-bsky: "none claimed yet — working M1"). Idling for the claim.
## Pre-claim recon @2026-06-11T11:55Z — EXPECTED_NA['upgrade'] premise (cold, NOT a verdict)
Builder added a harness change: `EXPECTED_NA['upgrade']` suppresses the upgrade-tier base
deploy for bluesky-pds ("no deployable base"). I independently checked the premise on the
live recipe checkout:
- Published recipe tags: ONLY `0.1.1+v0.4` and `0.2.0+v0.4`. **Both** pin
`ghcr.io/bluesky-social/pds:0.4` (the moving tag that now resolves to the broken
0.5.1/index.ts image). So every published base would crash identically → there is no
deployable previous published version. Premise holds. ✔
- Logic: the PR fix (pin 0.4.219) is the FIRST deployable published version; before it,
NO published version deploys, so a "previous published → PR" upgrade path cannot exist.
Genuinely N/A, not a dodge. (Post-merge, future PRs WILL have a deployable base → tier
re-activates; operator handoff should note this.)
STILL must hard-verify when M1 is CLAIMED (do NOT pre-judge):
- The NA is **scoped to bluesky-pds only** (per-recipe EXPECTED_NA declaration, not a
global loosening of the upgrade tier for all recipes) — read the diff.
- install / backup-restore / functional / lint tiers are NOT suppressed.
- N/A recorded honestly with reason and handled correctly under de-capped level semantics
(doesn't silently inflate the level nor falsely block); the 6 new upgrade_base() unit
tests actually have teeth.
- §9 alternative ("deploy base minimally via overlay, then upgrade to latest") is correctly
rejected here: latest-deployable == PR head == 0.4.219, so there's no version delta to
test and an overlay base would be synthetic — N/A is the honest call, not the overlay.
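
The per-recipe scoping to be verified above can be sketched as follows. This is a minimal illustration under stated assumptions, not the harness code: only the names `EXPECTED_NA` and `upgrade_base()` come from this review, while the `RecipeMeta` container, its fields and the function signature are hypothetical. How the real harness chooses a base when none is configured is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecipeMeta:
    """Hypothetical stand-in for a per-recipe recipe_meta.py module."""
    # Version to deploy first, before upgrading to the PR head.
    upgrade_base_version: Optional[str] = None
    # Maps a rung name to the declared reason it is not applicable.
    expected_na: dict[str, str] = field(default_factory=dict)


def upgrade_base(meta: RecipeMeta) -> Optional[str]:
    """Return the base version for the upgrade tier, or None when it is declared N/A."""
    # Only an N/A declared for the upgrade rung itself suppresses the base;
    # an N/A on any other rung must leave the upgrade tier active.
    if "upgrade" in meta.expected_na:
        return None
    return meta.upgrade_base_version


# The declaration lives on one recipe's metadata, so other recipes are unaffected.
bluesky = RecipeMeta("0.1.1+v0.4", {"upgrade": "no deployable base"})
other_recipe = RecipeMeta("1.0.0", {"backup_restore": "hypothetical reason"})
```
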

---
## M1 — PASS @2026-06-11T12:30Z (root cause + green fix PR + screenshot)
Verdict formed COLD from my own clone + live cc-ci probes, BEFORE reading JOURNAL.md
(anti-anchoring respected). Sources: phase plan §3 (SSOT), the code/git history, the
verification info in STATUS-bsky.md, and my own re-runs below. Every M1 acceptance item
independently reproduced.
### 1. Root cause reproduces ✔
Cold `docker run` on cc-ci of both images:
- `ghcr.io/bluesky-social/pds:0.4` (current, digest …2324702f/871194d2): `@atproto/pds`
**0.5.1**, **Node v24.15.0**, `/app/index.ts` only, **NO index.js**. The recipe's
entrypoint `exec node --enable-source-maps index.js` → `Cannot find module
'/app/index.js'`. Symptom reproduced exactly.
- `:0.4.219` (the fix pin): `@atproto/pds` **0.4.219**, **Node v20.20.2**, `/app/index.js`
present (`package.json main:index.js`) ⇒ entrypoint resolves. Fix sound at image level.
- Upstream registry `cc-ci-plan/upstream/bluesky-pds.md` matches my probes (moving `:0.4`
tracks main; 0.4.x keeps classic layout; env interface stable across 0.4.x → no
migration). `:0.4` is demonstrably a MOVING tag upstream republished.
### 2. PR #2 minimal + justified, unmerged ✔
Gitea API: PR #2 **open, merged=false, mergeable=true**; base main b2d86ef, head
**f7b6c8df** (branch upgrade-0.3.0+v0.4.219). Diff = **1 file, +2 −2** on compose.yml only:
image `:0.4` → `:0.4.219`, version label `0.2.0+v0.4` → `0.3.0+v0.4.219`. No
test/harness/recipe-test weakening in the PR. `:0.4.219` is an **exact** (non-moving)
version tag — newest 0.4.x exact tag preserving the recipe's `index.js` layout, so §2.2's
"exact-version tag … unless research justifies otherwise" is met (0.5.x restructured to a TS
entrypoint requiring a recipe entrypoint rewrite — the same-series re-pin is the minimal
correct fix). NOTE (not a finding): pursuing the 0.5.x upgrade later is a reasonable
operator follow-up; the re-pin is the right minimal fix now.
### 3. Green run 427 via the GENUINE drone !testme path, at PR head ✔
- PR #2 comment **14342** `!testme` → bridge swarm log (ccci-bridge_app):
`[poll] triggered build 427 for bluesky-pds@f7b6c8df (PR #2, comment 14342) by
autonomic-bot` → `reflected outcome build 427 (bluesky-pds PR #2): success` → PR comment
**14343** "✅ passed @ f7b6c8df". Real poll→drone→reflect, not a hand-run.
- run-427 recipe checkout = PR head `f7b6c8d "chore: upgrade to 0.3.0+v0.4.219"`,
compose.yml line 6 image=`:0.4.219`, version label `0.3.0+v0.4.219`.
- `results.json`: **level=5**, ref=f7b6c8dfb81c, pr=2; rungs
install/backup_restore/functional/lint=**pass**, upgrade=**skip**;
`skips.intentional.upgrade`=declared reason, `skips.unintentional`=[];
flags clean_teardown+no_secret_leak=true; schema=2.
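
The `results.json` checks listed above can be expressed as a small validator. This is a sketch under assumptions: the field names (`ref`, `pr`, `rungs`, `skips`, `flags`) are taken from this review, but the exact nesting of the JSON and the helper names `same_commit` and `verify_results` are hypothetical.

```python
ACCEPTABLE = {"pass", "skip"}


def same_commit(a: str, b: str) -> bool:
    """True if one (possibly abbreviated) SHA is a prefix of the other."""
    short, long_ = sorted((a.lower(), b.lower()), key=len)
    return len(short) >= 7 and long_.startswith(short)


def verify_results(results: dict, pr_head: str, pr_number: int) -> list[str]:
    """Return a list of problems; an empty list means the run checks out."""
    problems = []
    if not same_commit(str(results.get("ref", "")), pr_head):
        problems.append("ref does not match the PR head")
    if results.get("pr") != pr_number:
        problems.append("pr number mismatch")
    rungs = results.get("rungs", {})
    if not rungs:
        problems.append("no rungs recorded")
    skips = results.get("skips", {})
    intentional = skips.get("intentional", {})
    for rung, status in rungs.items():
        if status not in ACCEPTABLE:
            problems.append(f"rung {rung} is {status}")
        elif status == "skip" and not intentional.get(rung):
            problems.append(f"rung {rung} skipped without a declared reason")
    if skips.get("unintentional"):
        problems.append("unintentional skips present")
    for flag in ("clean_teardown", "no_secret_leak"):
        if results.get("flags", {}).get(flag) is not True:
            problems.append(f"flag {flag} is not true")
    return problems
```

Comparing SHAs by prefix lets the abbreviated PR head (`f7b6c8df`) match the longer `ref` recorded in the results file, while still rejecting an empty or very short `ref`.
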
### 4. No gate weakening (the EXPECTED_NA['upgrade'] harness change) ✔
- Premise true (cold): BOTH published recipe tags (0.1.1+v0.4, 0.2.0+v0.4) pin the broken
moving `:0.4` ⇒ no deployable upgrade base. Genuine structural N/A, not a dodge.
- `upgrade_base()` (e9745c8) returns None only when `upgrade ∈ EXPECTED_NA`, declared
**per-recipe** in `tests/bluesky-pds/recipe_meta.py`. NOT a global loosening — unit test
`test_expected_na_other_rung_does_not_suppress` proves a DIFFERENT-rung EXPECTED_NA does
not suppress the upgrade base. The tier records `"skip"`, never `"pass"`.
- **Negative control run 423** (same PR head, pre-EXPECTED_NA): base 0.1.1+v0.4 deploy →
**install=fail** → level **0**. Proves the harness has TEETH: it goes red when a base IS
attempted against the broken tag; 427's level 5 is solely the legitimate base-suppression,
not a masked failure. A synthetic overlay base (0.4.219→0.4.219, zero delta) would be a
meaningless green — N/A-skip is the honest call.
- Level math (`compute_level`, pure): install=pass(1) · upgrade=skip(climbs) ·
backup_restore=pass(3) · functional=pass(4) · lint=pass(5) ⇒ **5**. Consistent with the
lvl5 de-cap semantics (skip climbs; only fail/unver block).
- Unit tests COLD on cc-ci (fresh clone HEAD cba53b6): **253 passed** (6 new in
test_upgrade_base.py, with teeth). Repo lint COLD: `lint: PASS` (exit 0).
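
The level arithmetic above can be sketched as a pure function. This is a minimal illustration of the de-capped semantics as described in this review (skip climbs; only fail/unver block), not the actual `compute_level`: the ladder order is inferred from the rung numbering above, and treating a missing rung as `unver` is an assumption.

```python
# Rung order inferred from the numbering in this review: install=1 ... lint=5.
LADDER = ("install", "upgrade", "backup_restore", "functional", "lint")
# Statuses that let the level keep climbing; anything else blocks.
CLIMBING = {"pass", "skip"}


def compute_level(rungs: dict[str, str]) -> int:
    """Return the highest rung reached before the first blocking status."""
    level = 0
    for position, rung in enumerate(LADDER, start=1):
        # Assumption: a rung with no recorded status counts as "unver".
        if rungs.get(rung, "unver") not in CLIMBING:
            break
        level = position
    return level


run_427 = {"install": "pass", "upgrade": "skip", "backup_restore": "pass",
           "functional": "pass", "lint": "pass"}
run_423 = {"install": "fail"}
levels = (compute_level(run_427), compute_level(run_423))  # (5, 0)
```

Under these assumptions the function reproduces both recorded outcomes: run 427 reaches level 5 with the upgrade rung skipped, and the negative control run 423 stays at level 0 because the first rung fails.
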
### 5. Screenshot — real + credential-free ✔
Published `…/runs/427/screenshot.png` (HTTP 200, 29274 B) is **sha256-identical** to the
on-disk capture. I Read the PNG: the genuine PDS landing page — Bluesky ASCII butterfly,
"This is an AT Protocol Personal Data Server (aka, an atproto PDS)", "/xrpc/" pointer,
Code/Self-Host/Protocol links. **No credentials** (no admin password / invite / secret).
Default capture suffices — no SCREENSHOT hook needed.
### 6. No secret leak ✔
Independent scan of published artifacts (results.json, summary.html, lint.txt, junit) for
the PDS-generated secrets (admin password / jwt / plc rotation key) and high-entropy
strings: the ONLY matches are recipe SOURCE secret-NAME references (`- pds_jwt_secret`
etc.) and one abra lint WARN naming `pds_admin_password` (length policy) — no secret VALUE
exposed. Only high-entropy token = the 40-char commit SHA. clean_teardown confirmed (no
swarm secret/stack residue for the run).
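
The distinction between a secret NAME reference and an exposed secret VALUE, plus the high-entropy sweep, can be sketched as below. This is an illustrative sketch, not the scan actually run: the function names, the token pattern, the 20-character minimum and the 3.5 bits/char threshold are all assumptions, and it presumes the generated secret values are available to the scanner.

```python
import math
import re
from collections import Counter

# Assumption: candidate tokens are runs of 20+ base64/hex/URL-safe characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the token in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def leaked_secret_names(text: str, secrets: dict[str, str]) -> list[str]:
    """Names of secrets whose VALUE appears in the text; name-only mentions are ignored."""
    return sorted(name for name, value in secrets.items() if value and value in text)


def high_entropy_tokens(text: str, threshold: float = 3.5) -> list[str]:
    """Long tokens whose entropy meets the threshold (commit SHAs will match too)."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) >= threshold]
```
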
**M1 PASS. No VETO.** Builder cleared to proceed to M2 (operator handoff). M2 will get a
fresh cold pass: independent re-trigger/confirm green at PR head, PNG re-Read, level/baseline
reconciliation, DEFERRED entries closed with pointers, and the operator summary checked —
plus I will then consult JOURNAL/DECISIONS to contextualise (noting it there).

---
## M2 — PASS @2026-06-11T15:48Z (operator handoff complete)
Fresh Adversary cold pass. Verdict formed from the plan (§3 M2 SSOT), the code/deliverables,
the STATUS-bsky verification info, and my OWN independent re-trigger — BEFORE reading
JOURNAL.md (anti-anchoring respected; I may consult it after, noting so).
### 1. Green at PR head — independently RE-TRIGGERED ✔ (the decisive proof)
I posted `!testme` on PR #2 myself (comment **14344**, 15:46:21Z). Bridge:
`[poll] triggered build 435 for bluesky-pds@f7b6c8df (PR #2, comment 14344) by
autonomic-bot`. Fresh **build 435** results.json: **level=5**, ref=f7b6c8dfb81c (PR head),
pr=2; rungs install/backup_restore/functional/lint=**pass**, upgrade=**skip**
(skips.intentional.upgrade=declared reason, skips.unintentional=[]); clean_teardown +
no_secret_leak=true. Recipe checkout = PR head `f7b6c8d`, image `:0.4.219`. Identical rung
profile to run 427 → reproducibly green, not a one-off.
- **Real stages, not a no-op:** junit shows install/backup(generic+cc-ci)/restore
(generic+cc-ci) and FOUR live functional tests — `test_health_check`,
`test_describe_server`, `test_session_auth`, `test_account_and_post`. A no-op could not
pass account-creation/post/session-auth against a live PDS. (Wall-clock ~70s is plausible:
lightweight 2-service recipe, image cached on host.)
### 2. PNG independently Read ✔
Fresh build 435 screenshot.png sha256 == run 427's (bdb71d3e…) == the image I Read at M1:
genuine PDS landing page (Bluesky ASCII butterfly, "AT Protocol Personal Data Server",
/xrpc/ pointer, upstream links), **no credentials**. Deterministic, real.
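
The sha256 equality check above can be sketched with the standard library. This is a minimal illustration; the helper names `sha256_hex` and `captures_identical` are hypothetical, and reading each file fully into memory is only reasonable because the capture is small (29274 B).

```python
import hashlib
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def captures_identical(*paths: Path) -> bool:
    """True if at least two files are given and all are byte-identical by SHA-256."""
    digests = {sha256_hex(Path(p).read_bytes()) for p in paths}
    return len(paths) >= 2 and len(digests) == 1
```

Matching digests show only that the captures are byte-for-byte the same file; that the image is credential-free still rests on having Read the PNG once.
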
### 3. Level under new semantics + baseline reconciled ✔
level=5 under the de-capped ladder (upgrade=skip climbs; only fail/unver block). Old Phase-2
baseline ("full lifecycle green", e45e0ee, pre-results era) is genuinely unreproducible —
the moving-tag republish broke ALL published recipe versions; the PR restores deployability.
Reconciliation recorded in the DEFERRED closure + the M2 claim. Independently corroborated:
**0.5.x has NO release tag** (upstream git: 0 `0.5.x` tags, highest v0.4.219 + anomalous
v0.4.5001; ghcr `0.5.0/0.5.1/v0.5.1` all absent) — so an exact-version pin REQUIRES 0.4.x.
This fully resolves the §2.2 "prefer upgrade" scrutiny: re-pinning to 0.4.219 (newest exact)
is not "old over new" — there is no exact 0.5.x tag to upgrade to; 0.5.x lives only on the
moving tag the recipe must never pin. Justified.
### 4. DEFERRED entries closed with pointers ✔
machine-docs/DEFERRED.md: ✅ RESOLVED @2026-06-11 (phase bsky). Explicitly closes BOTH the
re-pin follow-up AND the rcust M2 baseline-exclusion note, with pointers to PR #2 / run 427 /
negative control 423 / upstream registry / DECISIONS. Original entry preserved (append-only).
### 5. Operator summary ✔
STATUS-bsky "Operator summary": crisp + complete — what was wrong (moving tag → index.ts vs
recipe's index.js; broke both published versions), what the PR changes (2-line re-pin
0.4.219 + label bump; why not 0.5.1 = no release tag + entrypoint migration), and a 5-step
post-merge runbook (merge → publish version → drop EXPECTED_NA + set
UPGRADE_BASE_VERSION="0.3.0+v0.4.219" → no canonical to reseed → never re-pin :0.4).
Corroborated: ci-warm has NO bluesky entry (only custom-html/keycloak/traefik) → "nothing to
reseed" is true.
### 6. PR left OPEN ✔
PR #2 head f7b6c8df, state=open, merged=**false** (re-confirmed at re-trigger). The phase is
done WITH the PR open: merging is the operator's call, and post-merge reseeding is documented, not performed.
**M2 PASS. No VETO.** Both M1 (@369f4f4) and M2 are fresh Adversary PASSes; no gate
weakening, no secret leak, screenshot real, PR unmerged. The Builder is cleared to write
`## DONE` to STATUS-bsky.md. (Post-verdict I will consult JOURNAL/DECISIONS only to
contextualise — it does not change this verdict.)
### Post-verdict consult (does NOT change the verdict)
Read DECISIONS.md bsky entries after writing M2 PASS. Fully consistent: pin-choice entry
REJECTS 0.5.1 (no release tag + index.ts migration) AND digest-suffix pinning (abra
survey/upgrade tooling chokes on `tag@digest`) → exact-version tag 0.4.219 chosen (satisfies
plan §2.2 "digest-pinned OR exact-version tag"). EXPECTED_NA entry matches the harness
behaviour I verified. No contradiction, no new finding.