machine-docs: move all per-phase coordination files out of repo root
Some checks failed
continuous-integration/drone/push Build is failing

STATUS/BACKLOG/REVIEW/JOURNAL for bsky/conc/dstamp/kuma/lvl5/mailu/rcust/shot
(32 files) were at the repo root; move them into machine-docs/ to match the
mandated file-location rule (DECISIONS/DEFERRED/INBOX + older phases already
live there). AGENTS.md gains an explicit File-location rule. No content change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
autonomic-bot
2026-06-11 20:57:03 +00:00
parent 560e772b5f
commit 85a781368a
33 changed files with 8 additions and 0 deletions

View File

@ -0,0 +1,18 @@
# BACKLOG — phase bsky
## Build backlog
- [x] B1: Root-cause diagnosis — inspect recipe compose/entrypoint + actual `:0.4` image vs exact tags on cc-ci (2026-06-11)
- [x] B2: Upstream research persisted to cc-ci-plan/upstream/bluesky-pds.md (plan repo f395247)
- [x] B3: DECISIONS.md entry — pin choice (exact 0.4.219 over 0.5.1-main / digest pin), version label bump
- [x] B4: Mirror PR branch `upgrade-0.3.0+v0.4.219` — compose.yml re-pin + label bump; open PR on recipe-maintainers/bluesky-pds
- [x] B5: `!testme` on the PR → full lifecycle green (install/health, upgrade-path status justified, backup/restore, functional, L5 lint); record level under de-capped semantics + reconcile expected baseline
- [x] B6: Screenshot on the green PR run — verify PNG real/representative/credential-free (Read it); SCREENSHOT hook only if needed
- [x] B7: Claim M1 (root cause + green fix PR + screenshot verified)
- [ ] B8: Close DEFERRED bluesky entries with pointers; JOURNAL note updating shot-phase N/A disposition
- [ ] B9: Operator handoff summary in STATUS-bsky.md (what was wrong, what the PR changes, post-merge expectations incl. canonical/warm reseed)
- [x] B10: Claim M2
## Adversary findings
(Adversary-owned)

View File

@ -0,0 +1,68 @@
# BACKLOG — sub-phase conc
## Build backlog
- [x] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
teardown funnel + signal.alarm(3600) hard deadline; .drone.yml setsid/trap wrap;
PEP 446 comment on lock open()
- [x] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
janitor probes per-domain lockfiles (acquired→reap under probe lock, held→leave,
>120min mtime→warn); delete registry symbols
- [x] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs/<build>/abra with servers+catalogue symlinks,
fresh recipes/; fetch_recipe = plain clone; delete acquire_recipe_lock; route harness
recipe paths through ABRA_DIR
- [x] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
- [x] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
- [x] P5 docs/concurrency.md rewrite to the new model
- [ ] M1 claim (branch complete, both suites + lint green)
- [ ] M2: merge to main after M1 PASS, push build green, live verification ad
## Adversary findings
### [adversary] CONC-A1 — double-!testme same domain corrupts the shared deploy-count file (M2(c) FAIL)
**Severity:** blocks M2(c). Both runs of a same-domain double-!testme go RED.
**Root cause (two coupled defects, one shared root):**
1. The DG4.1 deploy-counter file is keyed by DOMAIN in the *shared* system tempdir, NOT per-run:
`run_recipe_ci.py:930 countfile = /tmp/ccci-deploys-<domain>`. P3 isolated `ABRA_DIR` per run
but this per-run state file was missed — it predates the restructure (ef44d46) and the OLD
recipe-flock used to serialize same-recipe runs end-to-end, incidentally masking it.
2. `lifecycle.deploy_app()` calls `_record_deploy()` (lifecycle.py:250) BEFORE
`acquire_app_lock(domain)` (lifecycle.py:254, introduced by P2 b302f3a). So the counter
increment happens OUTSIDE the serialization window — a second same-domain run bumps the
shared counter before it ever blocks on the lock.
**Observed (live, builds 279 + 281, immich PR#2, same domain immi-ad3e33, 2026-06-10T05:04Z):**
- Lock serialization itself WORKS: 281 logged `== app lock: ... in flight — waiting ==` at 2s,
then `== app lock: acquired ==` at 194s — exactly when 279 exited (279 finished 05:07:35).
- 279 RED: `!! deploy-count 2 != 1 (DG4.1 violation)`. The `2` = 281's pre-lock `_record_deploy`
(fired ~2s, before 281 blocked) polluting the shared counter 279 was actively using.
- 281 RED: `FileNotFoundError: /tmp/ccci-deploys-immi-ad3e33...` at run_recipe_ci.py:1213 —
279's end-of-run `os.remove(countfile)` (line 1215) deleted the shared file out from under 281,
whose single `_record_deploy` had already fired at 2s and never recreates it.
- Control: isolated immich (build 275, same fixed wrapper) → `deploy-count = 1`, GREEN. So this
is concurrency-specific, not a pre-existing immich/wrapper issue.
**Repro:** two `!testme` comments on the same recipe PR (same domain) in quick succession on the
deployed main harness → both builds RED (one DG4.1 false-violation, one FileNotFoundError).
**Fix direction (Builder owns):** key the deploy-counter per RUN, not per domain — e.g. put it in
`/var/lib/cc-ci-runs/<build>/` (alongside the per-run artifacts) or include the build/run id in the
filename, and export that path via `CCCI_DEPLOY_COUNT_FILE`. Per-run keying fixes BOTH defects at
once (no cross-run pollution; no shared remove). Moving `_record_deploy()` after `acquire_app_lock`
alone is INSUFFICIENT — the shared `os.remove`/`FileNotFoundError` collision survives. Add a
tests/concurrency case: two same-domain runs serialized on the app lock → each sees its own
deploy-count, neither removes the other's file (this is the gap vs the 19 planned cases — case 4
serialises acquire but never asserts deploy-count isolation across the two).
**Closure:** adversary-owned. Re-test the (c) double-!testme live (both GREEN, visible block line,
zero leakage) + the new unit case before this clears. Only I close it.
**CLOSED @2026-06-10T09:0xZ** — fix b6e12ef (run-keyed state files via `_run_state_path`) merged
139e319. Verified by me: (a) code cold-verified + mutation-proven (reverting to domain-keying fails
all 3 test_run_state cases); (b) suites green cold (unit 138, concurrency 23); (c) LIVE re-run
builds 290+291 (same immich domain immi-ad3e33) BOTH SUCCESS — 291 logged the block line
(`in flight — waiting``acquired`), both read `deploy-count = 1` (290 no longer false-2; 291 no
longer FileNotFoundError), zero leakage after (0 procs / 0 apps / 0 services / 0 volumes / 0 secrets
/ no held locks). Full evidence in REVIEW-conc M2(c) PASS.

View File

@ -0,0 +1,73 @@
# BACKLOG — phase `dstamp`
## Build backlog (Builder-owned)
- [x] Read phase plan + plan.md §6.1/§7/§9 + Adversary prep notes + stamp-relevant harness code.
- [x] Establish abra's chaos-version mechanism from abra source @06a57de (= pinned binary).
- [x] Rule out abra-version drift (constant store path since nixos system-4, 2026-06-01).
- [x] Minimal reproductions of the git/abra chaos-version path (cp-a; go-git base; mirror-faithful)
— all stamp the CORRECT head 7ae7b0f7, NO drift in current host state.
- [x] Timeline: run 184 (06-05, solo) green @7ae7b0f; clustered 06-10/06-11 runs drift @ same ref.
- [x] Identify shared-stack collision vector (`app_domain` = hash(recipe|pr|ref); upgrade
chaos_redeploy bypasses app-domain flock).
- [x] Isolated real runs (repro14) + direct UpdateStatus/PreviousSpec capture → root cause attributed.
- [x] Concurrency REFUTED (solo repro1/4 reproduce). Mechanism = swarm `failure_action:rollback`
reverts the chaos-version label (direct evidence repro4: Spec=7ae7b0f7+U→PreviousSpec=eb96de9+U).
- [x] 06-05→06-10 change = rcust-phase heavier resident host load → start-first new task reliably OOMs → rollback every run (solo 06-05 run 184 didn't; my repro2 didn't either).
- [x] Blast-radius: only discourse affected (keycloak/n8n have the policy but upgrade PASS L4 across runs; drone/traefik infra). General harness guard covers all.
- [x] Restore discourse to its true level in real CI via the drone `!testme` path (M2): build #450 = LEVEL 5, all tiers PASS (install/upgrade/backup/restore/custom), clean teardown, no leak; PR#2 ✅ passed. fix1+fix2+450 = 3 consecutive green with the fix.
- [~] HC1 teeth: code unchanged (generic.py:174-175) + assert_upgrade_converged RED on rollback (repro1/4). Live negative test = Adversary's M2 verification.
- [x] Closed the DEFERRED.md dstamp re-entry with pointers (✅ RESOLVED).
## Adversary findings
<!-- Adversary-owned. Do not edit above this line in this section. -->
**Root cause independently confirmed @2026-06-11T17:3x (JOURNAL not read, anti-anchoring preserved):**
Docker Swarm `failure_action: rollback` + `order: start-first` in discourse's `compose.yml` app
service (BOTH `eb96de94` base AND `7ae7b0f` PR-head). On the upgrade chaos redeploy, `start-first`
runs OLD + NEW tasks co-resident (~2× memory); the heavy Rails/precompile app fails swarm's 5s
update monitor under host memory pressure → rollback fires → app service spec reverts to
PreviousSpec (`chaos-version=eb96de94+U`). Because `start-first` kept the OLD task serving,
`wait_healthy` passed; `deployed_identity` read the rolled-back spec; HC1 misreported it as
"stamp mismatch" (the real failure was "new task failed the update monitor").
`services_converged` blind spot: `"rollback_completed"` not in blocking states → returned True.
Evidence: `docker service inspect disc-ae10f0_..._app` confirmed `UpdateConfig: {On failure:
rollback, Order: start-first, Monitoring Period: 5s}`. repro1 (isolated, no concurrency) ALSO
showed drift → pure-concurrency hypothesis REFUTED independently before reading Builder evidence.
abra exonerated: abra reads `git HEAD = 7ae7b0f` and stamps `7ae7b0f7+U` CORRECTLY. Three
bail-at-secrets repros + repro2 debug line confirm. The `+U` comes from `compose.ccci.yml` as
untracked file in per-run recipe dir (rcust-era overlay absent from run 184's pre-rcust path).
Fix 0cc31a5 assessed CORRECT: overlay sets `order: stop-first` (eliminates OOM 2×-memory
trigger); `lifecycle.assert_upgrade_converged` closes the wait_healthy blind spot by catching
`"rollback_completed"|"rollback_paused"|"paused"` and failing HONESTLY. HC1 unchanged.
Minor race window in `assert_upgrade_converged` (first poll could see "none" before Docker
starts the roll) is covered: with stop-first, a post-race rollback also fails `wait_healthy`.
No blocker. Formal verdict awaits Builder's `claim(dstamp)` commit.
**Blast-radius sweep @2026-06-11T17:4x:**
All 24 enrolled recipes swept for `failure_action: rollback` + `order: start-first` in `compose.yml`:
| Recipe | failure_action | order | ccci overlay | upgrade tests | recent upgrade | risk |
|-----------|---------------|-------------|--------------|---------------|----------------|------|
| discourse | rollback | start-first | YES (fixed) | yes | FIXED | fixed |
| drone | rollback | start-first | no | NO tests | n/a | latent, no CI exposure |
| keycloak | rollback | start-first | no | yes | PASS L4 | latent, low (JVM, lighter than Rails) |
| n8n | rollback | start-first | no | yes | PASS L4 | latent, low (Node.js) |
| traefik | rollback | STOP-first | no | no | n/a | SAFE |
| all others | none or absent | — | — | — | — | not at risk |
`assert_upgrade_converged` (added in 0cc31a5) provides a general harness backstop: if any
recipe's rolling update rolls back or pauses, the upgrade is failed HONESTLY for all recipes
— not just discourse. So keycloak/n8n are already covered by the harness fix even without
overlay changes.
Recommended overlay addition for keycloak if/when OOM symptoms appear:
`deploy.update_config.order: stop-first` (same pattern as discourse). Not urgent — current
host load shows no rollback symptom for keycloak/n8n and they're lighter apps than discourse.
drone has no upgrade tier in cc-ci; no action needed there.

View File

@ -0,0 +1,28 @@
# BACKLOG — phase `kuma` (uptime-kuma create-a-monitor functional test)
## Build backlog
### DONE
- [x] Phase state files created (STATUS-kuma.md, BACKLOG-kuma.md, REVIEW-kuma.md, JOURNAL-kuma.md)
- [x] Approach decision: Playwright over python-socketio (recorded in DECISIONS.md)
- [x] Inspect uptime-kuma 2.2.1 source for exact DOM selectors
- [x] Implement `tests/uptime-kuma/playwright/test_monitor_wizard.py`
### DONE (continued)
- [x] Open recipe-maintainers/uptime-kuma PR #3 + trigger `!testme`
- [x] Drone build #460 = LEVEL 5, playwright:1 PASS
- [x] Claim M1 gate (fe8922c)
### IN PROGRESS
- [ ] Second `!testme` run (comment #14352, flake check) — polling for build
- [ ] M1 Adversary review
### PENDING (after M1 Adversary PASS)
- [ ] Second `!testme` run (flake check — 2 consecutive green)
- [ ] Update PARITY.md (note the new playwright/ test)
- [ ] Close DEFERRED.md entry "2026-05-28 — uptime-kuma create-a-monitor"
- [ ] Claim M2 gate
- [ ] Write ## DONE after M2 Adversary PASS
## Adversary findings
(Adversary-owned — no items yet; populated as issues are found)

View File

@ -0,0 +1,99 @@
# BACKLOG — Phase lvl5
## Build backlog
- [x] B1 (P1) `level.py`: append rung `lint` (L5); new status vocabulary {pass, fail, skip, unver}; `compute_level()` → new formula (level = max i: rung_i pass ∧ ∀j<i status {pass,skip}); DELETE cap_reason/capped concepts.
- [x] B2 (P1) lint executor (`harness/lint.py`): `abra recipe lint <recipe>` against the exact tested ref; hard ~60s timeout; rc+full output `lint.txt` artifact; pass/fail/unver classification (missing abra / timeout / exception unver, never pass, never skip); mirror-context handling per phase-plan §2.3 (probe abra behavior first; any filtering = named + unit-tested + DECISIONS.md).
- [x] B3 (P1) `results.py`: wire lint into `derive_rungs` + explicit intentional-vs-unintentional classification of EVERY N/A source; drop level_cap_reason/level_cap_rung from schema; `skips()` reflects new statuses; orchestrator (`run_recipe_ci.py`) runs lint executor at the tested-ref point + passes result through; verdict-neutral (R7 wrap).
- [x] B4 (P1) unit tests: rewrite test_level.py/test_results.py to new semantics incl. mission worked examples (fail-blocks L1; intentional-skip climbs L5; unver-blocks L2; lint unver L4; unclassifiable N/A unver default); lint executor tests; old-artifact rendering compat tests.
- [x] B5 (P2) `card.py`: 05 color ramp; cap line removed ("level N of 5" neutral); rung table renders ✔/✘/intentional-skip/unverified; level_badge_svg loses cap_skip third segment (badge = number+color only); tolerate old artifacts.
- [x] B6 (P2) `dashboard.py`: _LEVEL_COLOR 5-scale; _level_pill/badge SVG number-only; legend text; old results.json (cap_reason present, lint absent) render without KeyError.
- [x] B7 (P2) docs: results-ux.md, testing.md, recipe-customization.md §EXPECTED_NA wording L5 ladder, de-cap semantics.
- [x] B8 (P1) DECISIONS.md: semantics change record (replaces Phase-3 "N/A caps"); N/A classification table (every derive_rungs N/A source intentional|unintentional); mirror-filter decision for lint (if any filtering).
- [x] B9 gate M1: claim (branch w/ P1+P2; clean tree; cold-verifiable).
- [x] B10 (P3) lint sweep over ALL enrolled recipes (scratch clones never touch ~/.abra/recipes during builds); matrix here (pass/fail + rule hits); mechanical fixes mirror PRs (never push main/never merge); rest DEFERRED.md.
- [x] B11 (P4) real-CI proofs: 1 genuine L5; 1 lint-blocked L4 (synth branch ok); 1 N/A-skip climb; 2× drone !testme; canary suite at re-derived designed levels; 1 synthesized unver-blocks run; before/after level table for ALL enrolled recipes; card/dashboard PNG/SVG visually verified.
- [x] B12 gate M2: claim; then ## DONE after fresh PASS.
## Adversary findings
## P3 lint sweep matrix (B10) — all 19 enrolled, mirror main HEAD, 2026-06-11
Method: per recipe, fresh scratch clone of its canonical origin (mirror for the 17
recipe-maintainers recipes; coopcloud upstream for bluesky-pds/custom-html-tiny/mumble) +
upstream version tags fetched (production fetch_recipe shape), then `harness.lint.run_lint`
from phase-lvl5 @ 3d8d286 in a scratch ABRA_DIR (`/tmp/lvl5-sweep` on cc-ci; full outputs in
`/tmp/lvl5-sweep/art/<recipe>/lint.txt`). Canonical `~/.abra/recipes` never touched.
**Result: 19/19 PASS** (no error-severity rule unsatisfied anywhere). No recipe-mirror PRs and
no DEFERRED entries needed. Warn-severity misses (informational, do not fail the rung):
| recipe | lint | warn-rule misses |
|---|---|---|
| bluesky-pds | pass | R002 R007 R015 |
| cryptpad | pass | R002 R005 R007 |
| custom-html | pass | R002 R004 R005 |
| custom-html-tiny | pass | R002 |
| discourse | pass | R002 R007 R015 |
| ghost | pass | R015 |
| hedgedoc | pass | R015 |
| immich | pass | R002 R005 |
| keycloak | pass | R002 R015 |
| lasuite-docs | pass | R005 |
| lasuite-drive | pass | R002 R005 |
| lasuite-meet | pass | R002 |
| mailu | pass | R002 |
| matrix-synapse | pass | R002 R015 |
| mattermost-lts | pass | R002 R015 |
| mumble | pass | R002 |
| n8n | pass | R002 R015 |
| plausible | pass | R002 R005 R007 |
| uptime-kuma | pass | R015 |
Note: lasuite-meet's historically-lightweight tag `0.3.0+v1.16.0` is now ANNOTATED upstream
(verified `git cat-file -t` = tag on all three version tags) R014 passes genuinely; the
abra.py:105 lightweight-tag deploy fallback simply no longer triggers for it.
## Before/after level table skeleton (§2.9 — "after" to be filled by P4 real runs)
Baseline = latest results.json on cc-ci per recipe re-scored under the CURRENT (pre-lvl5,
4-rung) rule; ancient 6-rung artifacts (builds 205, integration/recipe_local era) re-read on
their four essential rungs. Predicted = same tier outcomes + sweep lint result under the new
rule (assumption flagged; P4 produces the real values).
| recipe | baseline rungs (latest artifact) | baseline level | predicted new level | REAL new level (P4 run) | why it shifts |
|---|---|---|---|---|---|
| bluesky-pds | no artifact (deploy-gated upstream, shot-phase N/A) | | | (still deploy-gated; documented N/A) | still deploy-gated |
| cryptpad | I U B F (#181) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| custom-html | I U B F (#182) | 4 | 5 | **4** (#405 PR4 lintdemo: lint fail R011; main analytic 5) | + lint pass |
| custom-html-tiny | I U B-na F-na (#205, predates functional/) | 2 | 5 | **5** (#399 N/A-skip climb, was 2) | de-cap: backup skip declared; functional/ tests exist now; + lint |
| discourse | I U B F (#184) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| ghost | I U B F (#185) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| hedgedoc | I U B F (#113) | 4 | 5 | **5** (#398, 100s) | + lint pass |
| immich | I U B F (#370) | 4 | 5 | **5** (#406, drone !testme PR2, 199s) | + lint pass |
| keycloak | I U B F (#187) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| lasuite-docs | I U B F (#188) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| lasuite-drive | I U B F (#189) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| lasuite-meet | I U B F (#204) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| mailu | I U B-na F (#191) | 2 | 5 | (not re-run; analytic 5 same de-cap as #399) | de-cap: not backup-capable skip climbs (the §2.9 N/A-skip demo) |
| matrix-synapse | I U B F (#203) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| mattermost-lts | I U B F (#196) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| mumble | no results.json artifact retained | | | **5** (#413, 80s first retained artifact) | P4 run to establish |
| n8n | I U B F (#197) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
| plausible | I U B F (#371) | 4 | 5 | **5** (#407, drone !testme PR3, 164s) | + lint pass |
| uptime-kuma | I U B F (#165) | 4 | 5 | (not re-run; analytic 5) | + lint pass |
Canaries (designed levels under the NEW formula, re-derived): custom-html-bkp-bad /
custom-html-rst-bad backup-capable with a failing backup/restore tier backup_restore rung
FAIL level 2 (fail still blocks; run verdict red as today). To be proven in P4.
### Canary designed-level re-derivation (P4, runs 415/416 — 2026-06-11)
Under the NEW formula the bad canaries' designed level is **1**, not the old 2: their mirrors
carry no published version tags on the SRC+REF path upgrade = intentional skip (climbs past
but never earns), backup_restore = FAIL blocks level = install = 1. Verified live: 415
(bkp-bad) + 416 (rst-bad) both **verdict FAILURE (red)**, rungs
{install: pass, upgrade: skip, backup_restore: fail, functional: unver (post-failure abort),
lint: pass}, LEVEL 1. Backup/restore fail still blocks; verdict logic untouched.
(First attempts 411/412 failed in 1s: canaries are mirror-only, not catalogue recipes they
need SRC+REF params, as prior phases ran them.)

View File

@ -0,0 +1,32 @@
# BACKLOG — phase `mailu` (backupbot labels + backup/restore coverage)
## Build backlog
(Builder-owned — read only for Adversary)
## Adversary findings
### [ADV-mailu-01] `/mail` Maildir volume restoration not tested — seed too shallow [adversary]
**Filed**: 2026-06-11T20:58Z
**Status**: OPEN — blocks M1
**Plan requirement** (`plan-phase-mailu-backup.md` §2.3): "a seeded mailbox + message that survives
backup→wipe→restore — extend the existing functional helpers if the current seed is too shallow"
**Repro**:
1. Current `ops.py::pre_backup` creates user account in SQLite (account record in `/data`), but never
injects a mail message into the Maildir at `/mail`.
2. `ops.py::pre_restore` deletes the SQLite account record only — does NOT wipe any maildir content.
3. `test_restore.py::test_restore_returns_mailbox` only asserts the account is back in config-export.
4. Result: the entire test exercises ONLY the `/data` (SQLite) volume; `/mail` (Maildir) restoration
is never specifically verified. If backupbot silently failed to restore `/mail`, this test passes.
**Fix**:
1. `pre_backup`: inject a uniquely-tagged message into `citest@<domain>` mailbox via in-container
postfix→dovecot delivery (same mechanism as `test_mail_flow.py::test_send_and_receive_mail`)
2. `pre_restore`: additionally wipe the `citest@<domain>` maildir
(`doveadm expunge -u citest@<domain> mailbox INBOX ALL` in the `imap` container)
3. `test_restore.py`: also assert the seeded message is back
(e.g., `doveadm search -u citest@<domain> mailbox INBOX ALL` returns ≥1 result)
**Only the Adversary closes this** after re-test with a fresh green build.

View File

@ -0,0 +1,23 @@
# BACKLOG — sub-phase rcust
## Build backlog
- [ ] P1.1 `runner/harness/meta.py`: KEYS registry (14 keys + 3 deprecated) + `load(recipe) -> RecipeMeta`
- [ ] P1.2 migrate readers L1L6 to `meta.load()` (orchestrator loads once, passes down)
- [ ] P1.3 mumble private constants → underscore-prefixed (`_WELCOME_TEXT_MARKER`, `_MAX_USERS`) + fix importers
- [ ] P1.4 `tests/unit/test_meta.py` (all-recipes-load-clean, MetaError cases, defaults, R2 proof)
- [ ] P1.5 `scripts/gen-meta-docs.py` + doc-sync unit test
- [ ] P2a compose.ccci.yml first-class (auto-copy + auto-chaos); strip ghost/discourse boilerplate
- [ ] P2b install-time deps only; migrate lasuite-docs; delete setup_custom_tests.sh machinery
- [ ] P2c SKIP_GENERIC meta key deleted; env form documented dev-only + loud warning in CI runs
- [ ] P2d conftest cleanup: delete deployed/deployed_app (+app_domain if unused); consolidate deps fixture; migrate 6 lasuite test files
- [ ] P3 HookCtx + convert all hook call sites + migrate in-repo users + unit tests
- [ ] P4 discovery placement rule + op_state/deps fixtures + migrate hand-parsers
- [ ] P5 customization manifest (print block + results.json key) + unit tests
- [ ] P6 docs rewrite (recipe-customization.md §8, testing.md, enroll-recipe.md)
- [ ] M1 pre-claim: run `pytest tests/concurrency -q` once to prove untouched
- [ ] M2 prep: build baseline matrix (21 recipe dirs, expected outcomes) BEFORE merging — commit to STATUS-rcust.md
## Adversary findings
(Adversary-owned section)

View File

@ -0,0 +1,128 @@
# BACKLOG-shot.md — phase `shot` (recipe screenshot audit & repair)
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md. Gates: M1 (audit+diagnosis), M2 (all OK / agreed N/A).
## Build backlog
### P1 — Audit matrix (status: complete, all 19 PNGs visually inspected 2026-06-11)
Enrolled set (19) = `tests/<r>/recipe_meta.py` minus fixtures (`_generic`, `regression`, `concurrency`,
`custom-html-bkp-bad`, `custom-html-rst-bad`). Evidence: `/var/lib/cc-ci-runs/<run>/` on cc-ci;
PNGs pulled to /tmp/shot-audit/ on the builder host and each one Read (visually).
| recipe | latest run w/ artifacts | screenshot field | PNG bytes | visual content (I looked) | class |
|---|---|---|---|---|---|
| bluesky-pds | ab-bluesky-pds-oldmain | null | — | no PNG; install=fail level=0 (upstream image breakage, rcust DEFERRED) → capture correctly skipped (`if deploy_ok`) | N-A-candidate (blocked upstream) |
| cryptpad | m2r-cryptpad | screenshot.png | 4802 | solid light-grey frame, nothing else | BLANK |
| custom-html | m2r-custom-html | screenshot.png | 35707 | "Welcome to nginx!" default page | OK? (diagnose: is this the recipe's true fresh-install content?) |
| custom-html-tiny | m2r-custom-html-tiny | screenshot.png | 12950 | seeded CI content ("cc-ci custom-html-tiny … DG5") | OK |
| discourse | m2p-discourse | screenshot.png | 66121 | real forum UI, welcome topic, Sign Up/Log In | OK |
| ghost | m2r-ghost | screenshot.png | 444183 | real blog landing ("Thoughts, stories and ideas") | OK |
| hedgedoc | m2r-hedgedoc | screenshot.png | 131967 | real landing (logo, Sign In, feature intro) | OK |
| immich | 356 | screenshot.png | 4801 | pure white frame | BLANK |
| keycloak | m2r-keycloak | screenshot.png | 8764 | spinner + "Loading the Administration Console" | LOADING |
| lasuite-docs | m2r-lasuite-docs | screenshot.png | 6022 | lone spinner on white | LOADING |
| lasuite-drive | m2p2-lasuite-drive | screenshot.png | 5895 | lone spinner on white | LOADING |
| lasuite-meet | m2r-lasuite-meet | screenshot.png | 4801 | pure white frame | BLANK |
| mailu | m2r-mailu | screenshot.png | 33800 | real sign-in page (empty fields) | OK |
| matrix-synapse | m2r-matrix-synapse | screenshot.png | 33296 | "It works! Synapse is running" landing | OK |
| mattermost-lts | m2b-mattermost-lts | screenshot.png | 242139 | brand splash/loading screen (logo on blue), NOT the login form | LOADING (borderline — brand-recognizable but a loading state) |
| mumble | m2r-mumble | screenshot.png | 7913 | spinner on grey — a web page IS served on the domain | LOADING (diagnose what serves it; N/A may NOT be justified) |
| n8n | m2r-n8n | screenshot.png | 4801 | off-white blank frame. Flaky: run 197 (30256 B) shows the real "Set up owner account" form (empty fields, credential-free) | BLANK (flaky) |
| plausible | 357 | null | — | no PNG on ANY run (122→357) | NULL |
| uptime-kuma | m2r-uptime-kuma | screenshot.png | 30858 | real "Create your admin account" setup form (empty fields) | OK |
PNG-size note: 4801/4802 B at 1280×800 is a byte-stable blank-frame fingerprint (3 different apps, same size).
### P2 — Root-cause diagnoses
- [x] **NULL — plausible** (evidence: Drone build 357 ci-step log, t=73s):
`screenshot: capture failed (non-fatal, verdict unaffected): page.goto(https://plau-b51425.ci.commoninternet.net/) never returned a status in (200, 301, 302, 303, 401, 403) after 15 attempts (45s); last status=500`.
Plausible's `/` 500s **by design** under `DISABLE_AUTH=true` (auth_controller; documented in
`tests/plausible/functional/test_health_check.py` docstring and recipe_meta — that's why HEALTH_PATH
is `/api/health`). Default landing-page capture can NEVER succeed → needs a per-recipe SCREENSHOT
hook to a path that actually renders (probe live: e.g. /login or /sites).
- [x] **NULL — bluesky-pds**: install fails (level=0) before the app is up → `if deploy_ok:` gate in
runner/run_recipe_ci.py:1024 correctly skips capture. Not a screenshot defect; upstream image
breakage already filed in machine-docs/DEFERRED.md (rcust). → documented N/A while upstream is broken.
- [x] **BLANK class — immich, lasuite-meet, n8n(flaky), cryptpad**: SPA paint race. capture() navigates
with `wait_until="domcontentloaded"` (runner/harness/screenshot.py:91) and screenshots immediately;
SPA shell HTML has loaded but JS hasn't painted → solid 4801-2 B frame. n8n flakiness = same race,
sometimes JS wins (run 197 captured the real form).
- [x] **LOADING class — keycloak, lasuite-docs, lasuite-drive, mumble, mattermost-lts(borderline)**:
same race, caught mid-paint (spinner/splash rendered, app JS still loading/connecting).
- [x] **mumble** web stack identified: recipe deploys a `web` service (mumble-web client) on the domain —
spinner is its connecting state; landing renders a connect dialog once JS settles. NOT an N/A.
- [x] **custom-html** nginx-welcome question: the recipe's fresh install genuinely serves the nginx
default page at `/` (no content seeded for this recipe's install; only custom-html-tiny seeds via
install_steps.sh). Screenshot is an honest representative view of a fresh install. → OK as-is.
### P3 — Fixes (all merged to main)
- [x] Harness default improvement (ce50f64 + A1 hardening 7ad7d1f): bounded networkidle settle
(10s) + 0.5s render grace after domcontentloaded; blank/spinner-frame detect (<10000 B) ONE
retry with 4s settle, larger frame kept (A1). Wait budget 45+10+0.5+4+0.5 = 60s, unit-tested.
8 new unit tests; 207 pass; lint PASS.
- [x] plausible NOT a hook in the end: the real root cause was EXTRA_ENV SECRET_KEY_BASE being
62 chars (<64-byte Phoenix cookie-store minimum) every HTML render 500'd. Fixed to 68 chars
(b98a471); default capture then lands the genuine registration page. Stale auth_controller
comments corrected (no assertion touched).
- [x] mattermost-lts SCREENSHOT hook (80e5713 + 3c33129): interstitial appears on ANY first-visit
route incl /login (proven byte-identical PNG) hook navigates /login, clicks "View in Browser"
best-effort, settles; lands the real login form. First real hook; public screenshot.settle().
- [x] keycloak / lasuite-docs / lasuite-drive / lasuite-meet / immich / cryptpad / n8n: fixed by
the harness default alone (no hooks needed proof PNGs below).
- [x] mumble: NOT fixable harness-side pinned mumble-web:0.5 client never paints UI for an
anonymous browser (≥90s DOM/console/network observation: no errors, no failed requests,
connect-dialog elements absent, no autoconnect overrides). Loader frame = the genuine anonymous
web view; voice (the recipe's function) fully covered by protocol tests. DEFERRED.md entry filed
(upstream question for the operator).
- [x] bluesky-pds: documented N/A while upstream image broken (rcust DEFERRED; Adversary-agreed at
M1, contingent re-check at M2 latest failing evidence ab-bluesky-pds-oldmain, 2026-06-11).
### P4 — Proof runs (fresh, post-fix; every PNG visually Read by Builder)
| recipe | proof run (dir on cc-ci) | level (baseline) | PNG B | visual |
|---|---|---|---|---|
| immich | 370 (drone !testme immich#2) | 4 (=356:4) | 234351 | real "Welcome to Immich" onboarding |
| plausible | 371 (drone !testme plausible#3) | 4 (=357:4) | 64132 | real registration form, empty fields |
| keycloak | shot-proof-keycloak | 4 | 215587 | real "Sign in to your account" form |
| cryptpad | shot-proof-cryptpad | 4 | 57310 | real landing + document-type picker |
| lasuite-meet | shot-proof-lasuite-meet | 4 | 225686 | real video-conferencing landing |
| lasuite-docs | shot-proof-lasuite-docs | 4 | 284769 | real Docs landing |
| lasuite-drive | shot-proof2-lasuite-drive | 4 | 132037 | real Drive landing |
| n8n | shot-proof-n8n | 4 | 26433 | real "Set up owner account", empty fields (now deterministic) |
| mattermost-lts | shot-proof3-mattermost-lts | 2 (=m2r:2) | 178367 | real "Log in to your account" form (hook v2) |
| mumble | shot-proof-mumble | 4 | 7980 | loader frame best-available (see P3/DEFERRED) |
Drone durations pre/post (same recipe+PR): immich 199s198s; plausible 209s166s (faster capture
no longer burns 45s failing). Healthy class (ghost, hedgedoc, discourse, custom-html,
custom-html-tiny, mailu, matrix-synapse, uptime-kuma): existing artifacts cited in P1 matrix, each
visually verified real + credential-free; no new runs needed per plan §3 P4.
Dashboard/card: grid thumbnails for runs 370/371 served 200, summary.html embeds screenshot.png,
/badge/immich.svg 200.
## Adversary findings
### [adversary] A1 — blank-retry can REGRESS a larger frame to a worse one (LOW, non-blocking) — CLOSED @2026-06-11T06:32Z
**CLOSED:** fixed in 7ad7d1f (retry snapped to a temp path; `os.replace` only if `retry >= first`,
else discard + cleanup in `finally`). Re-verified COLD with my own probe (not the Builder's test):
the exact filed case `[9999,4801]` now keeps **9999** (retry discarded, no temp leak); originals
intact (`[4801,30256]`30256, `[4801,4802]`4802, `[35707]`1 shot, `[5000,5000]`replace). 5/5 pass.
R7 contract preserved (retry-raise still propagates to capture's swallow None; first frame on disk).
--- original finding (for the record) ---
**Where:** `runner/harness/screenshot.py` `_snap_with_blank_retry` (ce50f64).
**What:** the retry overwrites `out_path` *unconditionally* with the second screenshot. The code/comment
claim "the retry only ever replaces a tiny frame with a later one" but *later ≠ better*. If the first
frame is e.g. 9999 B (a partial render, just under `BLANK_SIZE_BYTES=10000`) and the page regresses in the
extra 4 s settle (redirect, session-timeout splash, error overlay), the retry can yield a 4801 B blank that
**overwrites the better 9999 B frame**. The Builder's unit test only covers blankblank (48014802); the
biggersmaller regression is untested.
**Repro (cold, my independent probe, not the Builder's test file):** fake page returning sizes
`[9999, 4801]` `_snap_with_blank_retry` keeps **4801** (the worse frame).
**Severity:** LOW. R7 holds (cosmetic only, never affects verdict); my M2 per-PNG visual check is the
backstop any actually-blank final PNG will FAIL that recipe regardless. Filed for hardening, not a veto.
**Suggested guard (trivial, strictly safer):** keep the larger frame only overwrite if
`getsize(retry) >= getsize(first)` (or snap retry to a temp path and pick `max`). Then extend the unit
test with a biggersmaller case asserting the larger frame survives.
**Closes:** only I close this, after re-test. Non-blocking for an M2 claim, but I will re-check at M2.

View File

@ -0,0 +1,120 @@
# JOURNAL — phase bsky
## 2026-06-11T11:31Z11:55Z — bootstrap + root-cause diagnosis (B1, B2)
Phase start. Read plan-phase-bsky-fix.md + plan.md §6.1/§7/§9. Adversary seeded
REVIEW-bsky.md (8d5bf30) with cold baseline recon — same suspects I confirmed below.
**Diagnosis chain (commands + outputs):**
1. Mirror clone (b2d86ef): `compose.yml` pins `image: ghcr.io/bluesky-social/pds:0.4`,
overrides entrypoint (`dumb-init --` + config-mounted `/entrypoint.sh`);
`entrypoint.sh.tmpl` ends `exec node --enable-source-maps index.js` — relative path,
resolved against image WORKDIR.
2. Live image inspection on cc-ci:
`docker image inspect ghcr.io/bluesky-social/pds:0.4 --format "{{.Id}} created={{.Created}} workdir={{.Config.WorkingDir}} ... cmd={{.Config.Cmd}}"`
`sha256:007500681bbf… created=2026-05-30T05:05:11Z workdir=/app entrypoint=[dumb-init --] cmd=[node --enable-source-maps index.ts]`
`docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4 -c 'node --version; ls /app'`
`v24.15.0` / `index.ts node_modules package.json pnpm-lock.yaml`**no index.js**.
`grep @atproto/pds /app/package.json``"@atproto/pds": "0.5.1"`; /usr/local/bin/goat present.
So `:0.4` is now a main-branch 0.5.1 build → recipe's `index.js` exec = MODULE_NOT_FOUND.
This precisely explains the rcust-era crash-loop evidence (Node v24.15.0 in traceback).
3. Upstream research:
- ghcr tags/list (paginated): exact tags …0.4.158, 0.4.169, 0.4.182, 0.4.188, 0.4.193,
0.4.204, 0.4.208, 0.4.219, plus anomalous 0.4.5001. `:0.4` digest `871194d2…` ==
`latest`, ≠ `0.4.219` (`e0b756701c92…`) → :0.4 republished past the release line.
- Dockerfile@v0.4.219: node:20.20-alpine3.23, WORKDIR /app, CMD index.js, dumb-init.
- Dockerfile@main: node:24.15-alpine3.23, CMD index.ts, + goat binary — matches what
`:0.4` now contains. GitHub `releases/latest` 404s (they only push git tags).
- service/package.json@v0.4.219: `"@atproto/pds": "0.4.219"`.
4. Candidate-fix image verified on cc-ci:
`docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4.219 -c 'node --version; ls /app; grep @atproto/pds /app/package.json; which dumb-init'`
`v20.20.2` / index.js present / `"@atproto/pds": "0.4.219"` / `/usr/bin/dumb-init`.
Image CMD `[node --enable-source-maps index.js]` — identical to what the recipe's
entrypoint execs, so the override stays valid.
**Why pin 0.4.219 and not chase 0.5.1 (rationale, summarized in DECISIONS.md):** 0.5.1
exists only as the moving `:0.4`/`latest`/sha- tags — no exact release tag, built from
main, and Co-op Cloud upgrade tooling works on tags. Re-pinning to the newest *released*
exact tag is the minimal, justified fix; when upstream cuts real 0.5.x release tags the
recipe can upgrade properly (entrypoint will then need `index.ts` + Node 24 — noted in
upstream registry).
Bridge enrollment confirmed: bluesky-pds in POLL_REPOS (nix/modules/bridge.nix:43) →
`!testme` works. Mirror has only closed PR#1 (skill smoke test); my fix → PR#2.
Next: DECISIONS entry (B3), mirror branch + PR (B4), !testme (B5).
## 2026-06-11T11:40Z11:55Z — run 423 red: the upgrade-BASE trap (B5 first attempt)
PR #2 opened (branch upgrade-0.3.0+v0.4.219, head f7b6c8df, 2-line diff) and !testme'd
(comment 14340) → drone build/run 423. RESULT: install=fail, level 0 — but NOT the PR:
the run never deployed the PR head. The harness deploys ONCE at the upgrade BASE
(`previous_version` = vers[-2] = 0.1.1+v0.4 — confirmed: run-423's recipe checkout sat at
tag 0.1.1+v0.4) and only the upgrade tier chaos-redeploys the PR head. Both published tags
(0.1.1+v0.4, 0.2.0+v0.4) pin the broken moving `:0.4` → the base crash-loops the SAME
MODULE_NOT_FOUND (run-423 app log: Node v24.15.0, /app/index.js missing) → install fails
before my fix is ever exercised. No published version can EVER deploy again (upstream
republished the tag) — so the upgrade path is structurally unverifiable until a fixed
version is published post-merge.
Fix (harness, evidence-backed, not a weakening): EXPECTED_NA["upgrade"] (the EXISTING
declared-intentional-skip mechanism, de-capped levels phase lvl5) now also suppresses the
base deploy — extracted `upgrade_base()` pure helper in run_recipe_ci.py; single deploy
becomes the PR head; upgrade tier records "skip"; derive_rungs classifies it intentional
with the declared reason (visible in results.json skips.intentional — never reported as a
pass). tests/bluesky-pds/recipe_meta.py declares it with the full reason + the re-enable
path (UPGRADE_BASE_VERSION="0.3.0+v0.4.219" once published). 6 new unit tests
(tests/unit/test_upgrade_base.py) lock the decision matrix; meta-key doc regenerated.
Verified: 253 unit tests pass on cc-ci (was 247), repo lint PASS. Pushed e9745c8.
Re-triggered !testme (comment 14342) → build/run 427. Monitor armed.
## 2026-06-11T12:05Z — run 427 GREEN: level 5 at PR head; M1 claimed (B5, B6, B7)
Run 427 (drone build 427, comment 14342): level 5 — install/backup_restore/functional/
lint PASS, upgrade = declared intentional skip (reason verbatim in skips.intentional),
clean_teardown + no_secret_leak true, ref f7b6c8dfb81c. Per-run recipe checkout at PR
head f7b6c8d with image 0.4.219 (the fix WAS what deployed). Bridge reflected success →
PR comment 14343 ✅. Screenshot Read and verified: genuine PDS landing page (ASCII
butterfly, "This is an AT Protocol Personal Data Server", /xrpc/ pointer) — exactly the
default capture the phase plan predicted would work once deploy works; no hook needed.
Card (summary.png): 5/5, upgrade shown INTENTIONAL SKIP with reason; badge "level 5"
green. M1 claimed in STATUS-bsky.md.
## 2026-06-11T12:15Z — records closed (B8) + operator summary drafted (B9)
DEFERRED bluesky entry marked RESOLVED with pointers (f150012) — covers BOTH the re-pin
follow-up and the rcust M2 baseline-exclusion note.
**Shot-phase N/A disposition update (supersedes the deploy-gated classification):**
the shot phase classified bluesky-pds's screenshot "deploy-gated N/A — never capturable
because the app never comes up". With the PR#2 fix deployed (run 427, PR head), the
DEFAULT landing-page capture works exactly as the phase plan predicted: a real,
representative, credential-free PDS landing page (ASCII butterfly + "This is an AT
Protocol Personal Data Server" + /xrpc/ pointer). No SCREENSHOT hook was needed. The
N/A stands for HISTORICAL runs only; post-merge, bluesky-pds screenshots like any other
recipe.
Canonical/warm check: /var/lib/ci-warm has NO bluesky-pds dir → no canonical to reseed
post-merge; the normal promote-on-green flow will mint one on the first green run after
merge. Operator summary written to STATUS-bsky.md (B9).
## 2026-06-11T15:50Z — M1 PASS received; M2 claimed (B10)
M1 PASS @12:30Z (REVIEW-bsky 369f4f4), no findings, no VETO — every item reproduced cold
incl. negative-control teeth and the per-recipe scoping of the EXPECTED_NA change. (Gap
12:30→15:45 was a quota window, not work.) All M2 builder-side items were already in
place (DEFERRED f150012, operator summary cba53b6); claimed M2 with re-trigger
instructions for the fresh cold pass. Phase DoD after M2 PASS → ## DONE with PR open.
## 2026-06-11T15:55Z — M2 PASS → ## DONE
M2 PASS @15:48Z (42eabba): Adversary independently re-triggered !testme (comment 14344 →
build 435, level 5 at f7b6c8df, identical rung profile + screenshot sha to 427) and
corroborated every handoff item — including that 0.5.x has NO release tag, fully settling
the §2.2 upgrade-preference question. ## DONE written. Phase ends with PR #2 open for the
operator; loop stopped.

View File

@ -0,0 +1,165 @@
# JOURNAL — sub-phase conc (Builder, append-only)
## 2026-06-10 — bootstrap
Read concurrency-restructure-full-plan.md (SSOT) + plan.md §6.1/§7/§9. Oriented on the code:
- `runner/harness/lifecycle.py` — recipe flock (l.46), registry (l.6597), deploy_app
registration (l.283), teardown unregister (l.723), three-way janitor (l.726).
- `runner/run_recipe_ci.py``acquire_recipe_lock` call site (l.843), `fetch_recipe` (l.140,
rm-rf + reclone of the shared tree), janitor call sites (l.600 quick, l.932 cold).
- `.drone.yml` — recipe-ci step runs `cc-ci-run runner/run_recipe_ci.py` bare (P1 wraps it),
`concurrency.limit: 2` (P4 removes).
- Greps for P3 fallout: `~/.abra/recipes` referenced in abra.py (recipe_checkout,
has_lightweight_version_tags, recipe_head_commit, recipe_versions), generic.py:28,
lifecycle.prepull_images, run_recipe_ci (fetch_recipe, snapshot_recipe_tests, comment),
warm_reconcile.py:202 (runs OUTSIDE per-run context — keeps default), and
tests/ghost+discourse install_steps.sh (`${HOME}/.abra/recipes/...` — these run INSIDE a
run and copy compose.ccci.yml into the deploy tree, so they must resolve the per-run dir).
- `~/.abra/servers/...` paths are unaffected by design (servers/ is symlinked to the canonical
/root/.abra/servers, so both resolutions land on the same file).
Working setup: state files on main in this clone; code on branch `restructure/concurrency`
via a git worktree at ../cc-ci-conc; test runs on the cc-ci host via /root/builder-clone
(`cc-ci-run -m pytest ...`, `nix develop .#lint`).
## 2026-06-10 — P1P4 landed on restructure/concurrency
- P1 b492f99: harness/lifetime.py (PDEATHSIG+ppid recheck, SIGTERM/SIGALRM→SystemExit funnel
with re-entrancy guard, alarm(3600)); main() installs first; both finally blocks mark
begin_teardown(); .drone.yml setsid+trap wrap. Live smoke on cc-ci (cc-ci-run /tmp/p1-smoke.py):
TERM→rc=143+finally; ALRM→rc=142+finally+deadline log; parent-kill→child TERM'd, teardown ran.
- P2 b302f3a: acquire_app_lock + _probe_and_reap + janitor rewrite; registry deleted. Live smoke
(/tmp/p2-smoke*.py): held lock → "live concurrent run, leaving it", reaped=[]; killed holder →
reap exactly once + lockfile unlinked; waiter blocked during probe-held reap, then re-acquired
on the FRESH inode (probe confirmed held by waiter). Note: a select()-on-fd readline artifact
in my smoke script initially looked like a failure — kernel state was verified directly.
Unlink/recreate race guarded on BOTH sides via fstat/stat st_ino identity checks.
- P3 17ebdf3: per-run ABRA_DIR. Verified abra CLI honors $ABRA_DIR on-host (skeleton probe:
FATAs only on empty servers/; with servers+catalogue symlinks + recipes/ it works and even
auto-clones recipes for `app ls` resolution into the per-run dir). p3-smoke: setup + fetch of
custom-html-tiny landed in /tmp/p3runs/9999/abra/recipes, head commit + versions readable via
abra.recipe_dir(). install_steps.sh path fix justified in DECISIONS.md (conc P3 entry).
Pre-existing observation (NOT mine, unchanged): `abra app ls -S -m -n` currently FATAs
"unable to resolve '0cc57a5a'" under the DEFAULT abra dir too → janitor's abra discovery
yields [] and the docker-service sweep carries discovery. Out of this phase's scope.
- P4 91d3cc7: concurrency.limit removed; maxTests comment states single-knob + new model.
One stale comment line (.drone.yml l.39 "concurrency.limit=2 below") folds into P5.
All four commits: tests/unit 138 passed + lint PASS before each. Next: tests/concurrency suite.
## 2026-06-10 — tests/concurrency (84d90fb) + P5 (d3fe9e2) + M1 claim (e8e52cf)
- Suite: 20 tests / 19 plan cases, all real-kernel (helpers.py subprocesses hold real flocks,
install real prctl/alarm guards; CCCI_APP_LOCK_DIR sandboxes /run/lock; HelperPool reaps every
helper + recorded grandchildren). First full run on cc-ci: 20 passed in 9.96s, zero flakes in
3 repeat runs during the P5 verification re-runs.
- Design notes for the Adversary's blind-spot hunt (my own known limits):
- case 8 (two janitors) uses threads in one process — valid because flock conflicts are
per-open-file-description, and overlap is forced via a Barrier + 2s slow teardown stub.
- case 14 relies on reparent-to-pid-1 (true on the cc-ci host; would need adjustment in a
subreaper environment — marked NEVER_REPARENTED visibly if so).
- cases 5-12 stub teardown_app (recording) — janitor probe/reap ordering is what's under
test, not teardown internals (covered by Phase-1 e2e + M2 live checks).
- M1 claimed at e8e52cf; full verification recipe in STATUS-conc.md (WHAT/WHERE/HOW/EXPECTED).
## 2026-06-10 — M2: merge + live verification (a)
- Merge: bb5eb3d (--no-ff) pushed; push build 266 (self-test lint+hello) SUCCESS.
- (a) cancel-mid-run: !testme on immich#2 → build 267 (custom) running on the NEW harness —
log shows the setsid/trap wrap + "== per-run ABRA_DIR: /var/lib/cc-ci-runs/267/abra ==";
lock /run/lock/cc-ci-app-immi-ad3e33...lock held by pid 636902; 4 immich services up.
Canceled via drone API 04:42:07Z (HTTP 200, build status "killed"). Result: harness pid
GONE (no leaked python — the old §8.1 gap is closed), immich services 0, volumes 0,
secrets 0, .env 0 — the SIGTERM funnel ran the run's own teardown (better than the plan's
minimum, which allowed the janitor to do the reaping). Lock RELEASED (lockfile present but
unheld — tidy-swept by the next janitor, to be observed during (b)).
- (b) triggered 04:46:53Z: !testme immich#2 (comment 14287) + plausible#3 (14288) in parallel.
## 2026-06-10 — M2(b) round 1: green runs, poisoned exit code → wrapper fix
- Builds 268 (immich#2) + 269 (plausible#3) ran in PARALLEL on the new harness: both logs end
with all-tiers-pass RUN SUMMARY (level=4, deploy-count 1/1) and the host shows ZERO leakage
after (no harness processes, no immi/plau services/volumes/secrets, only unheld lockfiles).
Both steps nevertheless exited 1: the P1 EXIT trap's kill of the already-gone process group
returns ESRCH under the runner's `set -e` shell — a GREEN run reported failure.
- Reproduced minimally on-host (`sh -e` and `bash -e`: rc=1 on a clean exit with the old trap).
Fix e1c4198 (capture rc; `trap - TERM EXIT`; `|| true` on the trap kill) verified on-host:
green rc=0, red rc=7 propagated, TERM→wrapper forwards to child, exits 143. Merged to main
b7a009c; push builds 272-274 green. Adversary notified via inbox.
- (b) re-triggered on the fixed wrapper 04:56:10Z (immich#2 + plausible#3).
## 2026-06-10 — M2(b) PASS + (c) triggered
- (b) round 2 on fixed wrapper: builds 275 (immich#2) + 276 (plausible#3) ran in PARALLEL,
BOTH status=success (drone API). Host after: 0 python harness processes, 0 immi/plau
services/volumes/secrets/.envs — zero leakage. (d) satisfied by 275 (full green immich e2e).
Leftover unheld lockfiles present by design (tidy-swept at next janitor).
- (c) double-!testme on immich#2: two comments at 05:03:58Z → two custom builds, same run
domain immi-ad3e33 → exactly one must block on the app lock with the visible log line.
## 2026-06-10 — CONC-A1: (c) failure root-caused + fixed (run-keyed state files)
- (c) round 1 = builds 279+281, both RED. Root cause (independently also found+filed by the
Adversary as CONC-A1 while I was mid-diagnosis — same conclusion from both loops): the four
run-scoped state files (deploys/opstate/deps/depskip) were DOMAIN-keyed in shared /tmp;
281's main()-preamble + pre-lock _record_deploy fired before it blocked on the app lock →
279 read deploy-count 2 (false DG4.1 RED); 279's end-of-run os.remove deleted the shared
countfile → 281 crashed FileNotFoundError at its own read. Lock serialization itself worked
(281: waiting @+2s, acquired @+194s = 279's exit). Masked pre-restructure by the
end-to-end recipe flock.
- Fix b6e12ef on branch, merged to main 139e319: _run_state_path() keys all four by
run id + harness pid; consumers were always env-fed (CCCI_*_FILE), so domain keying was
never load-bearing. Both cleanup sites already remove all four on normal exit.
- New tests/concurrency/test_run_state.py (suite now 23): path invariants + real-process
CONC-A1 interleaving via helpers.py `deploy-count-run` (countfile init → pre-lock
_record_deploy → acquire → gated read). Teeth verified: under simulated shared keying the
regression test FAILS (host run: 3 failed); with the fix: 23 passed + 138 unit + lint PASS.
- Next: push build green → re-run (b)+(d), then (c), then (a) per the VETO's conditions.
## 2026-06-10 — M2 re-verification on CONC-A1-fixed main (139e319)
- Push builds 283/284/285 (branch fix, merge, inbox) all green.
- (b)+(d) round 3 (comments 14299/14300, 08:17:35Z): builds 287 (immich#2) + 288 (plausible#3)
BOTH success, started simultaneously 08:17:40Z (parallel), finished 08:21:06/08:21:13.
Both logs: deploy-count = 1 (expect 1), level=4. Host after: pgrep -f 'run_recipe_c[i]' → no
match (earlier "2" was pgrep self-match of the ssh cmdline); immi/plau services/volumes/
secrets/server-envs all 0. Zero leakage. (d) satisfied by 287 (full green immich e2e on the
final harness code).
- (c) round 2 triggered 08:22:13Z: comments 14303+14304 on immich#2 (same domain immi-ad3e33).
## 2026-06-10 — M2(c) PASS round 2 (builds 290+291) + (a) re-run triggered
- (c) round 2: builds 290 (08:22:30→08:46:05) + 291 (08:22:33→08:49:23) BOTH success.
291 log: "== app lock: another run of immi-ad3e33... in flight — waiting ==" at +1s,
"acquired" at +1411s = exactly 290's exit. Both: deploy-count = 1 (expect 1), level=4.
Slowness was an immich-ML healthcheck flake (Adversary cross-confirmed live via lslocks:
one holder pid 739163, one waiter pid 739341 on the same lock inode — serialization observed
in the kernel lock table); ML converged inside the 1500s window, both runs green anyway —
no clean re-run needed.
- After both: no harness procs (pgrep run_recipe_c[i] empty), 0 immi/plau services/volumes/
secrets/server-envs. Unheld lockfile remains by design (tidy-swept at next janitor probe).
- (a) re-run on fixed harness: !testme immich#2 comment 14307 @08:50:02Z; will cancel mid-run
via drone API once the deploy is in flight, then check pid/lock/leakage + janitor reap.
## 2026-06-10 — M2(a) re-run PASS (build 295) + M2 claim
- (a) on fixed harness: build 295 (comment 14307 @08:50:02Z) canceled @08:51:05Z (HTTP 200)
while mid-deploy (lock held by pid 763099, 4 immich services converging). Harness pid GONE
@08:51:15Z — the SIGTERM funnel ran the run's own teardown inside 10s; build status=killed;
lock released (lslocks empty); services/volumes/secrets/envs all 0. Zero leakage, no janitor
required.
- Adversary lifted the CONC-A1 VETO @09:05Z with its own M2(c) PASS (290/291 cold-verified,
kernel-lock-table serialization observation). Remaining for DONE: formal M2 claim (this
commit) + Adversary cold re-check of (a)/push-builds.
- M2 claimed in STATUS-conc.md with consolidated (a)-(d) evidence + cold re-check recipe.
## 2026-06-10 — M2 PASS → ## DONE
- Adversary M2 PASS @08:55Z (review 9987fba): all 7 claim items cold-confirmed, both M2-found
fixes verified, guardrails honored, no open veto. Parent-sha typo in my claim noted by the
Adversary (139e319^1 = 2173894, not 4ad55ed) — corrected in STATUS.
- ## DONE written to STATUS-conc.md. Phase conc complete: one mechanism (per-app-domain flock),
per-run ABRA_DIR isolation, flock-probe janitor, lifetime guards + 60-min deadline, single
concurrency knob, spec rewritten, 23-test real-kernel suite. Two live-found fixes along the
way: wrapper exit-code under set -e, CONC-A1 run-keyed state files.

View File

@ -0,0 +1,186 @@
# JOURNAL — phase `dstamp` (Builder, reasoning/private)
## 2026-06-11 — Bootstrap + investigation
Read the phase plan, plan.md §6.1/§7/§9, the Adversary's REVIEW-dstamp prep notes, and the
stamp-relevant harness code (`abra.py`, `lifecycle.py:deployed_identity/recipe_checkout_ref/
chaos_redeploy/prepull_images`, `generic.py:perform_upgrade/assert_upgraded`, run_recipe_ci
upgrade op + fetch_recipe).
### Mechanism (from abra source @06a57de = the pinned binary)
chaos-version label is set in `cli/app/deploy.go`: for a `-C` deploy, `getDeployVersion` (l.365)
returns `Recipe.ChaosVersion()` (l.367-373) and `SetChaosVersionLabel(compose, stack, toDeployVersion)`
(l.168). `ChaosVersion` (`pkg/recipe/git.go:300`) = `formatter.SmallSHA(Head().String())` + `+U`
if dirty. `Head` (l.483) = go-git `repo.Head()`. Crucially, `app.Recipe.Ensure(ctx)` (deploy.go:86)
calls into git.go:38 which **early-returns on `ctx.Chaos`** (l.41-43) — so a chaos deploy does NOT
re-checkout the .env version. `GetEnsureContext` (cli/internal/ensure.go) wires `EnsureContext{Chaos,
Offline, IgnoreEnvVersion=DeployLatest}` from the CLI flags. So `-C` ⇒ Ensure no-op ⇒ chaos version
= whatever git HEAD the harness left checked out.
### The contradiction that drove the dig
The m2p failure message is `chaos commit 'eb96de94+U', not the intended PR-head '7ae7b0f76efb'`.
`eb96de9` = tag `0.7.0+3.3.1` (the upgrade base); `7ae7b0f` = PR head (9 commits past that tag,
and there is NO 0.8/0.9 tag despite HEAD's "upgrade to 0.9.0+3.5.0" message). The harness
`perform_upgrade` does `recipe_checkout_ref(head_ref=7ae7b0f)` then `chaos_redeploy`, with only
`env_set` + `prepull_images` (pure docker compose, no git) in between — and the run's recipe
**snapshot HEAD = 7ae7b0f**. So at deploy time HEAD *should* be 7ae7b0f ⇒ stamp 7ae7b0f. Yet it
stamped eb96de9. abra's source says chaos = Head(); so for eb96de9 to be stamped, HEAD had to be
eb96de9 at the chaos deploy — which the isolated flow never produces.
### Reproductions (all on cc-ci, scratch ABRA_DIR, deploys bail at `secret not generated`
### which is deploy.go:140, AFTER the chaos version is computed+logged at deploy.go:372)
1. cp -a canonical recipe, checkout head→base(tag)→head, `abra app deploy -C` → `taking chaos
version: 7ae7b0f7`. HEAD stays 7ae7b0f. NO drift.
2. real non-chaos base deploy (exercises go-git `EnsureVersion` which checks out tag via
`Branch: refs/tags/0.7.0+3.3.1`, leaving HEAD=eb96de9), then CLI `git checkout -f head`, then
`-C` deploy → `taking chaos version: 7ae7b0f7`. NO drift.
3. mirror-faithful: `git clone <recipe-maintainers/discourse>` + `git checkout 7ae7b0f` +
`git fetch <coop-cloud/discourse> refs/tags/*:refs/tags/*` (exact `fetch_recipe`), then base
deploy → re-checkout head → `-C` deploy → `taking chaos version: 7ae7b0f7`. NO drift.
Conclusion: the isolated git/abra version-resolution path is **correct** in the current host
state. The drift is not in that path.
### Timeline / differentiator
- abra binary: constant since 2026-06-01 (system-4). Not abra.
- Same ref 7ae7b0f: run 184 (06-05 02:17, **solo**) was L4 upgrade-PASS. The drift runs
(m2b 06-10 20:54, m2p 06-11 00:44, ab 06-11 00:48) are **clustered** (m2p & ab 4 min apart →
overlapping for a multi-tier discourse run that takes ≫4 min).
- `app_domain` hashes (recipe|pr|ref) ⇒ all three drift runs, same ref, **collide on one swarm
stack**. The upgrade `chaos_redeploy` does NOT take `deploy_app`'s app-domain flock, so two
concurrent runs can interleave deploys on the shared stack and the `<stack>_app` service label
read by `deployed_identity` reflects whichever deploy last wrote it.
**Leading hypothesis:** the "harness-neutral env drift" is actually a **concurrency artifact** of
the rcust-phase M2 A/B discourse experiments running near-simultaneously on the shared stack — not
an abra/recipe/environment regression. Run 184 solo = green; clustered 06-11 = drift; isolated
re-reproduction now = green. Testing with one clean isolated real run (install,upgrade) before
committing to this attribution — direct evidence required by the plan, not inference alone.
Open: must still explain *exactly* how a concurrent peer produces an `eb96de9+U` (dirty CHAOS)
label on the shared stack — a base deploy is pinned/non-chaos (no chaos label), so the +U chaos
label must come from some chaos deploy with HEAD=eb96de9. The isolated real run + (if needed) a
deliberate 2-run concurrency repro will nail the mechanism. Will NOT claim M1 on inference.
## 2026-06-11 (cont.) — REAL runs: concurrency REFUTED, true root cause = swarm rollback
Three real install+upgrade runs of discourse @7ae7b0f (CCCI_RUN_ID=dstamp-repro{1,2,3}), each
SOLO/isolated (no concurrent discourse run):
- **base deploy is CHAOS** (not pinned): `compose.ccci.yml` overlay is present ⇒
`deploy_app` takes the `has_ccci_overlay` auto-chaos branch (`lifecycle.py:291-298`). So the
base stamps `chaos-version = eb96de9+U` on the shared stack. (My earlier bail-at-secrets repros
used a non-chaos/manual base → that's why they didn't expose it.)
- **repro1 (unpatched): upgrade FAIL** — `chaos commit 'eb96de94+U', not 7ae7b0f76efb`. The
per-run tree reflog + snapshot prove HEAD = **7ae7b0f** at the upgrade deploy (last checkout
16:39:03, no checkout-back), yet the deployed `.Spec` chaos label was eb96de9+U.
- **repro2 (instrumented: abra deploy `--debug` + a HEAD-print subprocess before the redeploy):
upgrade PASS** — `[DSTAMP] taking chaos version: 7ae7b0f7+U`, HEAD=7ae7b0f,
`deployed_identity = {version 0.9.0+3.5.0, image bitnamilegacy/discourse:3.3.1, chaos 7ae7b0f7+U}`.
So the SAME solo config is **intermittent** (184✓ 06-05, m2b/m2p/ab✗ 06-10/11, repro1✗, repro2✓);
flipping with a tiny timing change ⇒ **NOT a concurrency artifact, NOT abra version-resolution**
(abra computes 7ae7b0f7 correctly — proven by repro2's debug line AND all 3 bail-at-secrets repros).
**TRUE ROOT CAUSE (recipe deploy policy + heavy/flaky new task):** discourse `compose.yml` app
service sets `deploy.update_config: { failure_action: rollback, order: start-first }` with a
`healthcheck.start_period: 20m`. The upgrade chaos deploy applies the head spec
(`chaos-version=7ae7b0f7+U`) start-first (old + new task co-resident = ~2× memory for a
precompile-heavy Rails app). When the NEW task intermittently fails swarm's update monitor,
swarm executes **failure_action: rollback ⇒ reverts the app service to its PreviousSpec (the
base: `chaos-version=eb96de9+U`)**. Under `start-first` the OLD task keeps serving, so the
harness `wait_healthy` still passes — but `deployed_identity` reads `.Spec.Labels` of the
ROLLED-BACK spec and sees the base commit. The "since ~06-10 on every run" pattern = the
rcust-phase runs happened under heavier host load (warm keycloak etc.), so the new task reliably
failed the monitor ⇒ rollback every time; the solo 06-05 run (184) didn't roll back. Harness- and
abra-neutral, exactly as observed.
repro3 (UpdateStatus + PreviousSpec capture, NO --debug to preserve failing timing) running to
get the swarm rollback in the act (expect `UpdateStatus.State = rollback_*`, `PreviousSpec.Labels`
chaos=eb96de9+U == the read `.Spec.Labels` after revert). That is the direct-evidence smoking gun.
### DIRECT EVIDENCE — captured (repro4, solo/isolated, upgrade FAIL)
repro3 base deploy FATA'd (abra convergence monitor gave up — discourse is genuinely flaky/heavy
under load, which is the very premise). repro4 reached the upgrade and the post-`chaos_redeploy`
`docker service inspect <stack>_app` capture is the smoking gun:
- `UpdateStatus = {"State":"updating","Message":"update in progress"}`
- `.Spec.Labels` chaos-version = **7ae7b0f7+U**, version = 0.9.0+3.5.0 (HEAD spec applied OK)
- `.PreviousSpec.Labels` chaos-version = **eb96de94+U**, version = 0.7.0+3.3.1 (the base)
- `deployed_identity` (same instant) = chaos **7ae7b0f7+U** (reads Spec, correct)
Then `wait_healthy` ran (old task serving under start-first → passes); the new task failed swarm's
monitor → `failure_action: rollback` reverted `.Spec` → `.PreviousSpec` (eb96de94+U); the
assertion-phase read saw eb96de94+U → HC1 FAIL. The ONLY operation that turns `.Spec.Labels` from
7ae7b0f7+U into the exact `.PreviousSpec` eb96de94+U is a swarm rollback. abra+harness exonerated;
the head was really deployed and then swarm-reverted. Attribution complete, by direct evidence.
Note the app image is `bitnamilegacy/discourse:3.3.1` for BOTH base and head spec (head only bumps
the version label + db image), so the new task isn't failing on a missing image — it's the
start-first 2× co-residency of the precompile/Rails-heavy app under host memory pressure (a real
new-task failure, intermittent), which trips `failure_action: rollback`.
### Fix plan (HC1 teeth preserved)
- Reliability: `tests/discourse/compose.ccci.yml` overlay → app `deploy.update_config.order:
stop-first` (old stops before new starts → new boots with full memory → genuinely healthy → no
spurious rollback). Upgrade-to-head still really deployed+asserted; not a weakening. WHY in header.
Risk to weigh: stop-first = brief real downtime during the CI upgrade (covered by DEPLOY_TIMEOUT
3600). Alternative `failure_action: pause` REJECTED — it would let a genuinely-failed new task
pass HC1 (start-first keeps old serving) = test-weakening.
- Correctness: harness upgrade path asserts the redeploy converged to the head spec (UpdateStatus
not rollback*/paused / `.Spec` not reverted to `.PreviousSpec`) → honest failure message on a
real rollback, instead of the misleading "re-checkout failed". General (all rollback-policy
recipes). HC1 teeth intact: a head that truly can't stay healthy still fails.
- Will validate stop-first actually eliminates the rollback with a full real run before claiming.
## 2026-06-11 (cont.) — fix validated + blast-radius
**Fix implemented** (commit 0cc31a5): (1) `tests/discourse/compose.ccci.yml` app service
`deploy.update_config.order: stop-first`; (2) `lifecycle.assert_upgrade_converged()` + call in
`generic.perform_upgrade` right after `chaos_redeploy` (before wait_healthy) — waits for swarm's
app-service rolling update to reach a TERMINAL state and FAILs honestly on rollback*/paused.
Unit tests: 253 passed (no regression).
**fix1 validation** (run `dstamp-fix1`, fresh checkout @0cc31a5, install+upgrade, solo): UPGRADE
**PASS** — `upgrade-converged: …UpdateStatus=completed`, `upgrade→PR-head: head_ref=7ae7b0f7
chaos-version=7ae7b0f7+U version=0.7.0+3.3.1→0.9.0+3.5.0`. The head is deployed, the update
converges (no rollback), HC1 reads 7ae7b0f7+U. (Bug was intermittent — running more to show
reliability, since repro2 passed unpatched.)
**Blast-radius sweep** — recipes with `failure_action: rollback` + `order: start-first`:
`discourse, drone, keycloak, n8n, traefik`. Evidence check of the upgrade tier across many runs
(incl. the rcust-era m2r-* runs under the same heavy load):
- keycloak: runs 155/186/187/m2r/shot-proof → upgrade PASS L4 (HC1 pass ⇒ chaos==head). NOT affected.
- n8n: runs 47/54/61/162/197/m2r/shot-proof → upgrade PASS L4. NOT affected.
- drone, traefik: cc-ci INFRA (warm-reconciled), NOT enrolled in the recipe-CI upgrade tier.
⇒ **Only discourse actually exhibits the drift** — its app is uniquely heavy (Rails asset
precompile, 2.4GB image) so the start-first 2× co-residency OOMs the new task; the lighter
keycloak/n8n new tasks survive swarm's monitor, so no rollback. The general harness guard
(`assert_upgrade_converged`) now protects ALL rollback-policy recipes from a silent future
rollback (honest failure), and discourse additionally gets stop-first to converge reliably.
### Hardening (commit e9c26c7) + fix2 validation
Adversary independently confirmed the root cause + assessed the fix CORRECT (REVIEW-dstamp probe),
flagging one non-blocking race: assert_upgrade_converged's first poll could read a STALE terminal
`completed` (from the install/base deploy) before swarm schedules the new roll → return OK
prematurely → miss a later rollback. Hardened with a two-phase wait: phase 1 confirms the NEW
update is scheduled (`UpdateStatus.StartedAt` advances past the pre-redeploy value, captured via
`update_status_started`, or state is in-flight `updating`/`rollback_started`), with a 30s grace for
a genuine no-op redeploy; phase 2 then waits for the terminal verdict. fix2 (hardened, fresh
checkout @e9c26c7, install+upgrade): UPGRADE **PASS** — `upgrade-converged: …UpdateStatus=completed`,
`chaos-version=7ae7b0f7+U version=0.7.0+3.3.1→0.9.0+3.5.0`. Two consecutive green fixed runs
(fix1+fix2) vs intermittent unpatched failures (repro1✗ repro4✗ repro2✓). Unit tests 253 pass.
### M1 claimed
Attribution + minimal repro + 06-05→06-10 change + fix + blast-radius all complete and
Adversary-pre-confirmed → claiming M1 (verification recipe in STATUS-dstamp). Next: M2 — full
all-stages discourse green at true level via the drone `!testme` path (the recipe-CI pipeline runs
`cc-ci-run runner/run_recipe_ci.py` from the drone-cloned cc-ci workspace, so e9c26c7 is live for
!testme — no nixos-rebuild needed for the harness), other recipes re-proven (none affected), HC1
teeth shown (wrong stamp still FAILs), DEFERRED closed.
Fix direction (HC1 must keep its teeth — do NOT relax the commit match): the upgrade chaos redeploy
must assert against the *intended* applied spec, not a silently rolled-back one — i.e. the harness
must DETECT a swarm rollback (UpdateStatus.State rollback*) and treat it as an upgrade FAILURE with
a clear message (the deploy did not converge to the head spec), AND/OR make the upgrade redeploy not
subject to silent rollback masking (e.g. assert UpdateStatus completed before reading identity).
The recipe's rollback policy is legitimate for prod; the harness bug is that a rollback is invisible
to HC1 and masquerades as "stamped the wrong commit". Will finalise the fix after repro3 confirms.

View File

@ -0,0 +1,82 @@
# JOURNAL — phase `kuma` (uptime-kuma create-a-monitor functional test)
Design rationale, investigations, and dead-ends. Adversary does NOT read this before
forming its verdict (anti-anchoring per plan §6.1). See STATUS-kuma.md for claim context.
---
## 2026-06-11 — Approach selection: Playwright over python-socketio
**Context:** The phase plan offers two choices:
- (a) python-socketio client speaking Socket.IO events directly
- (b) Playwright driving the real browser UI
**Investigation:** Checked the cc-ci Nix Python environment:
```
/nix/store/x188l04r3gfkh18gy1dpf05fv3kkrgs7-python3-3.12.8-env/lib/python3.12/site-packages/
→ greenlet, playwright 1.50.0, pytest 8.3.3, pyee, packaging, pluggy, iniconfig
→ NO socketio, NO websocket-client, NO aiohttp, NO requests
```
python-socketio would need a `nix/cc-ci.nix` addition + `nixos-rebuild switch` on cc-ci.
Playwright is already present. **Chose option (b): no Nix changes, faster to ship.**
**Selector research:** Inspected uptime-kuma 2.2.1 source files in the Docker image:
- `src/pages/Setup.vue`: confirms `data-cy` attributes on all setup form fields
- `src/pages/EditMonitor.vue`: confirms `data-testid` on friendly-name, url, save-button
- `src/pages/Details.vue`: confirms `data-testid="monitor-status"` on status badge
- Compiled bundle `dist/assets/index-D_mnxLA0.js`: grep confirms all target attributes
**Heartbeat "important" logic:** Checked `server/model/monitor.js` line 1420:
```
// * ? -> ANY STATUS = important [isFirstBeat]
```
The server marks the first heartbeat as `important=true`, so it WILL appear in the
important-heartbeat table immediately after the first probe. This means the table row
check is a reliable proof of real probe execution.
**Status text:** From `src/mixins/socket.js` line 755 (`statusList` computed):
```javascript
text: this.$t("Up"), // UP=1
text: this.$t("Down"), // DOWN=0
```
English locale: "Up" (capital U, lowercase p) and "Down". Used these exact strings in
the `_wait_for_status` assertions.
**URL routing:** `src/router.js` uses `createWebHistory()` (history mode, not hash mode).
Routes: `/` → Entry.vue → redirects to `/dashboard`; `/add` → EditMonitor.vue;
`/dashboard/:id` → Details.vue. So `page.goto(f"{base}/add")` reliably opens the monitor
form directly.
**Negative test choice:** `http://127.0.0.1:19999/dead`:
- Inside the container, port 19999 is unused → OS returns ECONNREFUSED instantly
- Connection-refused causes uptime-kuma to mark the monitor DOWN immediately (no timeout wait)
- This proves the probe engine makes real outbound calls (not a stub)
- Included — fits runtime budget easily (~5 s for DOWN detection)
**Runtime budget analysis:**
- Setup wizard + login: ~10 s
- Create monitor 1 + wait UP: ~15-30 s (first probe immediate, but socket roundtrip)
- Create monitor 2 + wait DOWN: ~10 s (ECONNREFUSED is fast)
- Overhead: ~5 s
- Total estimate: ~40-55 s — well within ≤90 s target
---
## 2026-06-11 — Build #460 result + M1 claim
`!testme` triggered on uptime-kuma PR #3 (comment #14349). Bridge log:
```
[poll] triggered build 460 for uptime-kuma@eb4521cc (PR #3, comment 14349) by autonomic-bot
reflected outcome build 460 (uptime-kuma PR #3): success
```
Build 460 results.json:
- `level: 5`, all stages PASS (install/upgrade/backup/restore/custom/lint)
- `customization: {custom_tests: {cc-ci: {functional: 3, playwright: 1}}}`
- stage `custom` tests: health_check [pass], socketio_handshake [pass], spa_branding [pass], **test_monitor_wizard [pass]**
- `flags: {clean_teardown: true, no_secret_leak: true}`
PR comment #14350 posted: ✅ passed.
M1 claimed (commit fe8922c). Second `!testme` posted (comment #14352) for flake check while
Adversary reviews M1.

View File

@ -0,0 +1,116 @@
# JOURNAL — Phase lvl5
## 2026-06-11 bootstrap
- Read plan-phase-lvl5-lint-rung.md in full + plan.md §6/§6.1/§7/§9. Phase files created.
- Orientation reads: level.py (RUNGS 4, compute_level gap-caps, backup_restore_status, tier_to_rung), results.py derive_rungs/build_results (cap fields at :215-229), card.py (LEVEL_COLOR 0-6!, cap line :246, level_badge_svg cap_skip third segment), dashboard.py (_LEVEL_COLOR :68, _level_pill :245, cap div :277, render_level_badge :363), run_recipe_ci.py build_results call :1248 + badge wiring :1296-1320, bridge.py :224 (badge embed — number-only already, no cap text → likely untouched), docs (results-ux.md has cap language; recipe-customization.md EXPECTED_NA row).
- Notable: card.py LEVEL_COLOR already has keys 0-6 (5=green, 6=bright green) — only 0-4 reachable today; dashboard._LEVEL_COLOR needs checking for the same.
- Lint context: abra.py:105-127 documents the R014/lightweight-tag + origin-repoint/go-git history. Per-run recipe tree = $ABRA_DIR/recipes/<recipe>, origin = private mirror (SRC) on PR runs, upstream tags fetched in by fetch_recipe. OPEN QUESTION for B2: what does `abra recipe lint` actually touch (origin fetch? auth? R014 against which tags?) — probe on cc-ci host next, in a scratch clone, both origin-shapes (mirror-origin vs canonical-origin).
- Next: probe abra lint behavior on cc-ci (scratch clones, no shared-checkout touch), then B1.
## 2026-06-11 P1+P2 built, M1 claimed (branch phase-lvl5)
- level.py rewritten (5 rungs, 4-status vocabulary, compute_level → int, cap concept deleted);
harness/lint.py executor; results.py derive_rungs classification + schema 2 + lint stage/block;
run_recipe_ci.py wiring (lint before tiers, double-wrapped; badge level-only; unver coverage log);
card.py/dashboard.py de-capped (0-5 ramp, ladder line, unverified rows, lint.txt servable);
docs results-ux.md/recipe-customization.md; DECISIONS.md phase entry.
- Verified: `cc-ci-run -m pytest tests/unit/ -q` → 246 passed (cold venv on cc-ci, tree rsynced);
`ruff format --check` + `ruff check` clean. Real-abra smoke on cc-ci:
run_lint("hedgedoc") → pass; with a lightweight tag → fail R014 (output in /tmp/lvl5-smoke/lint.txt).
- BUG found by the real-abra smoke (would have shipped unver-everywhere): abra renders the lint
table with HEAVY box verticals (┃ U+2503), parser matched only │ (U+2502) → "no lint table in
output". Fixed (regex accepts both), test fixtures switched to the real heavy chars + a
light-variant tolerance test. Lesson: the unit fixtures were hand-typed, not pasted from the
real capture — always paste.
- test_meta.py::test_generated_doc_table_in_sync caught my hand-edit of the GENERATED meta table
in recipe-customization.md — moved the wording into the meta.py KEYS registry and regenerated.
- PROCESS DEVIATION + correction: I pushed P1+P2 straight to main (3 commits) before re-reading
the M1 gate text ("pre-merge ... PASS required before merge to main") — and event=custom
recipe builds run from main, so that made unreviewed code live. Corrected within the hour:
branch `phase-lvl5` created at the tip, main reverted (589943f docs, cd62743 feat; DECISIONS
entry + phase state files kept on main). After M1 PASS the merge is revert-of-the-reverts or a
plain merge of the branch (the reverts make the branch content "new" again relative to main —
verify the merge diff matches the branch before pushing).
- M1 claimed in STATUS-lvl5.md with full cold-verify recipe.
## 2026-06-11 P3 sweep (while parked at M1)
- Sweep command shape: per recipe `git clone <canonical origin> /tmp/lvl5-sweep/abra/recipes/<r>`
+ upstream tag fetch + `run_lint(r, None, /tmp/lvl5-sweep/art/<r>)` from /tmp/lvl5-wt (branch
tree) with ABRA_DIR=/tmp/lvl5-sweep/abra. Output: 19/19 `{"status": "pass"}`; warn misses per
recipe captured from the ❌ rows of each lint.txt. Matrix + §2.9 baseline table → BACKLOG-lvl5.
- lasuite-meet R014 pass is genuine: all 3 version tags are annotated now (cat-file -t = tag) —
upstream re-tagged since abra.py:105 was written.
- Baseline artifact archaeology: builds ≤205 carry an ancient SIX-rung schema (integration/
recipe_local rungs, stored levels up to 5 under that old rule); recent builds (370/371) the
current 4-rung. Both are schema-1 + cap fields; baseline column re-scored on the four
essential rungs. bluesky-pds and mumble have no retained results.json.
- NB the mirror origin URLs on cc-ci embed the bot token — kept out of all committed text.
## 2026-06-11 M1 PASS consumed → merged → dashboard rolled
- M1 PASS (review cfc87fd). Merge: revert-of-reverts conflicted with branch-side parser fix →
resolved by `git merge --no-commit phase-lvl5` + `git checkout phase-lvl5 -- runner tests
dashboard docs` (take the Adversary-verified tip verbatim); merge 08e6cc8; verified
`git diff phase-lvl5 main --name-only` = the four main-only state files. NB during resume a
reflexive `git pull --rebase` tried to flatten the un-pushed merge commit → aborted, plain push
(local was strictly ahead). Lesson: never pull --rebase with an un-pushed merge commit.
- Suite re-run from merged main rsynced to cc-ci: 246 passed.
- Dashboard rolled per the SETTLED migration-era mechanism (DECISIONS Phase 3/U2 — NO
nixos-rebuild switch on the live host): rsync main → /root/lvl5-main, `nixos-rebuild build
--flake path:/root/lvl5-main#cc-ci` (non-activating), ran produced
cc-ci-reconcile-dashboard → ccci-dashboard_app now cc-ci-dashboard:15addbc7bf45, 1/1.
- Live checks: / 200; /runs/370/{results.json,summary.png} 200 (old artifacts unharmed);
/badge/immich.svg 200 = number+colour only (#a0b93f, "level 4"); /recipe/immich 200.
## 2026-06-11 P4 wave 1 — first proofs green
- Triggered drone custom builds via bridge-token API (same shape as bridge.trigger_build).
- Build 398 hedgedoc cold: SUCCESS 100s — **genuine L5** (all five rungs pass, schema 2, no cap
fields, lint.txt+badge 200). Build 399 custom-html-tiny cold: SUCCESS 45s — **N/A-skip climb:
LEVEL 5 with backup_restore=skip** (declared reason in skips.intentional; was L2 at baseline
#205). Durations nowhere near inflated (lint ≈0.7s inside).
- Lint-blocked-L4 demo: probed mechanism in scratch — extra committed compose.lintdemo.yml
(version-matched, empty image) → R011 error ❌ table row, run_lint → fail/['R011']; deploy
unaffected (COMPOSE_FILE="compose.yml"). Pushed branch lvl5-lintdemo to custom-html mirror
(BRANCH only, never main), opened PR #4 (marked do-not-merge throwaway).
- !testme posted (comments 14326/14327/14328) on custom-html#4, immich#2, plausible#3
bridge-triggered builds 400/401/402 (drone path ×3). Awaiting.
## 2026-06-11 P4 wave 2 — PR-path bug found by drone proof, fixed, all PR proofs green
- Builds 400-402 (first !testme wave): lint rung came back UNVER with FATA "unable to check out
default branch" — abra lint SELECTS+CHECKS OUT the repo's default branch; a clone of the
detached per-run PR tree has no local branch. Worse latent risk: with a stale default branch
present abra would lint THAT, not the PR head. Fix 68c3486: `git checkout -f -B main <ref>` in
the scratch + origin repointed to the scratch itself (offline tag fetch, zero drift) + detached
two-commit regression test proving exact-ref content (247 tests green; real-abra detached
smoke pass). Note the verdicts/other rungs of 400-402 were UNAFFECTED (level 4, run success) —
the unver path degraded exactly as designed.
- Re-ran !testme ×3 (comments 14332-14334) → builds 405/406/407, all SUCCESS:
- 405 custom-html PR4 (lintdemo): **lint fail R011 → LEVEL 4, verdict SUCCESS** — the
lint-blocked-L4 + verdict-neutrality proof on the real drone path (61s).
- 406 immich PR2: **LEVEL 5** (199s, = shot-phase baseline). 407 plausible PR3: **LEVEL 5** (164s).
- Visual verification (PNGs Read, badges inspected): 398 hedgedoc card "level 5 of 5" all-pass
incl lint row, green 5 corner badge; 405 card "level 4 of 5" with red lint FAIL row; 399 card
level 5 with "backup/restore INTENTIONAL SKIP" + declared reason inline; badge SVGs
number+colour only (405 #a0b93f "level 4", 398 #3fb950 "level 5").
- Canaries 411 (bkp-bad) + 412 (rst-bad) + mumble cold 413 triggered.
## 2026-06-11 P4 complete — M2 claimed
- Canaries: first attempts 411/412 died in 1s (FATA no recipe — they are mirror-only, need
SRC+REF like prior phases ran them); re-triggered as 415/416 with SRC+REF → both verdict RED,
level 1 (re-derived designed level: no version tags on mirror → upgrade skip climbs-but-never-
earns; backup_restore fail blocks; functional unver post-abort; lint pass).
- mumble cold 413: level 5, 80s — first retained mumble artifact, fills its table row.
- Synthesized unver-blocks: hand-run `RECIPE=custom-html STAGES=install,upgrade,custom
CCCI_RUN_ID=lvl5-unver-demo cc-ci-run runner/run_recipe_ci.py` (log /tmp/lvl5-unver-run.log,
rc=0) → results.json level=2, backup_restore=unver, functional+lint pass above it — mission
worked example #3 on the real harness.
- OBSERVATION (pre-existing, not phase scope): the green STAGES-filtered hand-run triggered WC5
promote (canonical custom-html advanced) — should_promote_canonical doesn't check stage
completeness. Surfaced to Adversary in the M2 claim notes; not fixing inside this phase.
- M2 claimed in STATUS-lvl5 with the full evidence table (runs 398/399/405/406/407/413/415/416 +
lvl5-unver-demo). B11 ticked.
## 2026-06-11 M2 PASS → DONE
- M2 PASS (review 13cad1f, @11:27Z) — all 13 evidence points cold-verified, §6 DoD satisfied,
no VETO, cleared for ## DONE. Both gates passed today (M1 cfc87fd, M2 13cad1f); no standing VETO.
- Cleanup: PR custom-html#4 closed + branch lvl5-lintdemo deleted (204). WC5 stage-completeness
observation filed to machine-docs/DEFERRED.md (operator decision; Adversary concurs not a finding).
- Phase complete: L5 lint rung + de-capped level semantics live end-to-end.

View File

@ -0,0 +1,116 @@
# JOURNAL — phase mailu
Design rationale, dead-ends, investigation notes. Not for Adversary pre-verdict reading.
---
## 2026-06-11 Bootstrap + data-layout research
### mailu volume layout (from compose.yml analysis)
Services and their durable volumes:
- `admin` service: mounts `mailu` vol → `/data` (sqlite DB: users, mailboxes, domains, settings)
- `imap` (dovecot) service: mounts `mail` vol → `/mail` (Maildir message storage)
- `admin` service also mounts `dkim` vol → `/dkim` (DKIM private keys)
- `antispam` service: mounts `rspamd` vol → `/var/lib/rspamd` (antispam training data — ephemeral)
- `db` (redis) service: mounts `redis` vol → `/data` (session cache — ephemeral)
- `webmail` service: mounts `webmail` vol → `/data` (roundcube prefs — ephemeral)
- `smtp` service: mounts `mailqueue` vol → `/queue` (postfix queue — ephemeral)
- `app` (nginx) + `certdumper`: mount `certs` vol (TLS cert dumps — regenerable)
### Backup decision: admin/data + imap/mail
For genuine backup/restore coverage:
- **`admin:/data`** = sqlite DB → primary source of truth for mailboxes/users. If this is lost,
all accounts are gone. Must backup.
- **`imap:/mail`** = Maildir storage → the actual messages. Loss = all mail gone. Must backup.
- `dkim:/dkim` = DKIM keys. In production, loss = need re-keying + DNS update. BUT: for CI testing,
we don't have DNS-side DKIM records anyway, so DKIM regeneration is harmless. NOT labeled for
CI simplicity (can add in a follow-up if operator wants DKIM key recovery tested).
- Other volumes: ephemeral / regenerable. Not labeled.
### Backupbot v2 syntax decision
From studying n8n and discourse examples:
- v2 uses `backupbot.backup: "true"` + `backupbot.backup.path: "<container-path>"`
- v1 used `backupbot.volumes.<name>=true/false` (immich pattern — do NOT use for new work)
- mailu has no Postgres (uses SQLite), so no pg_dump hook needed
- For `admin`: `backupbot.backup.path: "/data"` (whole sqlite DB dir)
- For `imap`: `backupbot.backup.path: "/mail"` (whole Maildir)
### mailu compose.yml structure note
mailu uses `deploy.labels` (list form with `- "key=value"` strings) for the app service's traefik labels. The backupbot labels need to go on the services that own the data:
- `admin` service uses `labels:` directly (not `deploy.labels`) — no traefik label there
- `imap` service similarly uses `labels:` directly
Wait, actually checking the compose.yml — there's no `labels:` on `admin` or `imap` at all.
The `app` (nginx) service has `deploy.labels` for traefik. For backupbot, the labels need to be
on the DEPLOYED service (under `deploy.labels` or top-level `labels`). In Docker Swarm, backupbot
uses service labels (which are deploy-time labels). So we need `deploy.labels` on admin + imap.
The `app` service already uses `deploy.labels` (list form) for traefik. For admin + imap we need
to add `deploy:``labels:` sections.
### Version bump
Current version: `3.0.1+2024.06.52` (on `app` service `deploy.labels``coop-cloud.${STACK_NAME}.version`)
New version: `3.1.0+2024.06.52` (minor version bump for backupbot feature addition)
### CI test design
**ops.py hooks** (consistent with n8n pattern):
- `pre_backup(ctx)`: create a test mailbox `citest@<domain>` via `flask mailu user citest <domain> '<password>'` in the admin container
- `pre_restore(ctx)`: delete the mailbox via `flask mailu user delete citest@<domain>` (or equivalent) to simulate data loss
**test_backup.py**: assert `citest@<domain>` is in `config-export` at backup time
**test_restore.py**: assert `citest@<domain>` is back in `config-export` after restore
The `_mailu.py` helpers already provide:
- `flask_mailu(domain, cmd)` → runs flask mailu CLI in admin container
- `config_export(domain)` → parses config-export JSON
- `user_emails(cfg)` → list of email addresses from config
### Delete-user CLI for pre_restore
Need to confirm the delete command. From mailu docs, the admin CLI:
- Create: `flask mailu user <local> <domain> '<password>'`
- Delete: `flask mailu user delete <email>` (where email = local@domain)
- Or: `flask mailu user delete <local>@<domain>`
Need to verify the exact syntax. Will use `flask mailu user delete citest@<domain>` and add error handling.
---
## 2026-06-11 ADV-mailu-01 fix — extend seed to cover /mail Maildir
### Adversary finding (M1 FAIL)
The M1 claim was rejected because ops.py only proved SQLite (`/data`) backup/restore. The `/mail`
Maildir volume was labeled and backed up but never specifically tested for restoration. If backupbot
silently skipped restoring `/mail`, the test would still PASS.
### Fix (cc-ci commit b9352e8)
Extended the seed in three steps:
**ops.py `pre_backup`**: After creating `citest@<domain>`, inject a test message via in-container
`sendmail` (smtp container → postfix → rspamd → dovecot deliver). Subject: `ccci-backup-probe`.
Wait up to 60s for dovecot to deliver (polling `doveadm search`). This is identical to the pattern
proven in `test_mail_flow.py`.
**ops.py `pre_restore`**: Now wipes BOTH:
1. The user from sqlite: `DELETE FROM user WHERE localpart='citest'` via python3 in admin container
2. The user's Maildir: `rm -rf /mail/<domain>/citest` in imap container
**test_backup.py**: Added `test_backup_captures_mail_message` — asserts the message is present
at backup time via `doveadm search` in imap container.
**test_restore.py**: Added `test_restore_returns_mail_message` — asserts the message is back in
INBOX after restore via `doveadm search` in imap container.
### Why rm -rf over doveadm expunge
Used `rm -rf /mail/<domain>/citest/` in pre_restore rather than `doveadm expunge` because:
- `rm -rf` directly wipes the Maildir from disk — observable, immediate, unambiguous
- `doveadm expunge` marks messages for deletion but depends on dovecot's expunge/purge cycle
- The goal is a clear divergence: after pre_restore, the maildir DOES NOT EXIST; after restore, it DOES
### Build #477 in flight to verify

View File

@ -0,0 +1,307 @@
# JOURNAL — sub-phase rcust (Builder)
## 2026-06-10 bootstrap
Read phase plan (recipe-custom-restructure-full-plan.md), plan.md §6.1/§7/§9, and the reference
spec docs/recipe-customization.md @ 76a4b6b in full. Created phase state files. Work branch will
be `restructure/recipe-custom` off main @ 76a4b6b. Starting P1: reading the six current loaders
(run_recipe_ci.py::_load_meta, conftest.py::_recipe_meta, lifecycle.py::_recipe_extra_env,
lifecycle.py::_recipe_meta_flag, deps.py::declared_deps, canonical.py::is_canonical_enrolled)
before writing harness/meta.py.
## 2026-06-10 P1 — single loader + registry (branch 472a68b)
Wrote runner/harness/meta.py: KEYS registry (14 keys + CHAOS_BASE_DEPLOY/OIDC_AT_INSTALL/
SKIP_GENERIC kept registered as deprecated=True so P1 lands green before P2 deletes them),
RecipeMeta generated from KEYS via dataclasses.make_dataclass (frozen; field set cannot drift from
the registry), load() = the only exec() of recipe_meta.py, MetaError on unknown ALL-CAPS/type
mismatch/callable-on-data-key, difflib suggestion in the unknown-key message. BACKUP_CAPABLE keeps
its tri-state via default None (None = auto-detect — preserves the old `"BACKUP_CAPABLE" in meta`
semantics in generic.backup_capable).
Migrations: orchestrator loads once + passes meta down (deploy_app/perform_upgrade/_perform_op/
run_lifecycle_tier all take the object); conftest meta fixture returns full RecipeMeta (R3 closed);
lifecycle._recipe_extra_env/_recipe_meta_flag and deps.declared_deps deleted; canonical.is_enrolled
+ enrolled_recipes go through meta.load (tests monkeypatch meta.TESTS_DIR now instead of
canonical.__file__); screenshot._load_screenshot_hook reads the attribute (R2 fixed — unit test
proves SCREENSHOT survives the real orchestrator load path). deploy_app keeps an optional
meta=None fallback (loads via the single loader) for fixture/manual callers — exec still happens
in exactly one function.
Effective-value safety check before committing: dumped non_default() for all 21 recipe dirs through
the new loader — every recipe's customized key set matches its recipe_meta.py source (e.g. mumble:
DEPLOY_TIMEOUT/EXTRA_ENV/HEALTH_OK/READY_PROBE/UPGRADE_EXTRA_ENV). One intentional delta class:
deps.deploy_deps' fallback timeouts for a MISSING dep meta change from literal 900/600 to loading
the dep's real meta (orchestrator path always supplied metas, so CI behavior is identical).
Verified on cc-ci (rsynced working tree before committing):
cc-ci-run -m pytest tests/unit -q -> 175 passed
nix develop .#lint --command scripts/lint.sh -> lint: PASS
Three pre-existing f212 unit tests passed dicts to wait_ready_probes — updated mechanically to
construct RecipeMeta via dataclasses.replace (assertions untouched).
Next: P2a compose.ccci.yml first-class + auto-chaos.
## 2026-06-10 P2 — legacy keys & paths deleted (branch 8cd72fd)
P2a: lifecycle.provide_ccci_overlay copies tests/<recipe>/compose.ccci.yml into the per-run
checkout (after install_steps hook, before prepull/deploy); pinned base deploys auto-chaos on
overlay presence (has_ccci_overlay replaces the meta.CHAOS_BASE_DEPLOY elif). ghost/discourse
install_steps.sh were copy-only -> deleted whole; their metas keep COMPOSE_FILE in EXTRA_ENV
(unchanged wiring, the harness now owns the copy).
P2b: oidc_at_install condition removed — `if declared:` provisions before the single deploy,
legacy post-deploy block + _run_setup_custom_tests_hook deleted. lasuite-docs install_steps.sh is
the meet/drive hook with docs' exact env names (diffed against the deleted setup_custom_tests.sh:
same keys incl. OIDC_OP_DISCOVERY_ENDPOINT + scopes 'openid email profile'; secret-insert bump
identical; only the abra-redeploy step is gone — the single deploy reads the env instead).
lasuite-drive's MinIO bucket one-shot -> ops.py pre_install (runs at install-tier start, post-
deploy; bucket lives in the minio volume so it survives upgrade/restore; same scale --detach +
30x3s poll as the shell version). run_quick: deps still provision (realm/creds), hook call gone —
no quick-enrolled recipe declares DEPS today; noted inline.
P2c: SKIP_GENERIC out of the registry; _skip_generic(op) env-only; skip_generic_env_overrides()
prints a `!!` warning when active under DRONE (P5 will embed in the manifest).
P2d: conftest deps fixture = dict of _DepEntry (dict subclass w/ attribute sugar) — the 6 lasuite
files only ever used deps_creds, renamed param to deps, zero assertion changes. NOTE for Adversary:
some assert MESSAGE strings ('setup_custom_tests should have populated this.' -> 'dep
provisioning...') and docstrings updated — message text only, no assert logic/expected values.
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 175 passed;
nix develop .#lint --command scripts/lint.sh -> PASS. Doc table regenerated to the 14-key registry
(doc-sync unit test pins it).
Next: P3 — HookCtx + ctx-hook signatures everywhere.
## 2026-06-10 P3 — uniform ctx hook convention (branch fd02d9f)
HookCtx frozen dataclass + hook_ctx() constructor in harness/meta.py; ctx.deps read straight from
$CCCI_DEPS_FILE (json, both shapes) — meta.py stays import-cycle-free (deps.py imports lifecycle
which imports meta). Registry keys carry hook_params; meta.load() enforces the expected positional
names per hook key (READY_PROBE/BACKUP_VERIFY/EXTRA_ENV/UPGRADE_EXTRA_ENV=(ctx,),
SCREENSHOT=(page, ctx)); _run_pre_hook applies meta.check_hook_signature(fn, ("ctx",)) to ops.py
hooks before calling. Conversion of 17 ops.py + 8 recipe_meta hooks was scripted (def-line regex +
bare `domain` -> `ctx.domain` inside the pre_*/hook function bodies only) and diff-reviewed; the
only manual fixes: keycloak pre_restore passed `meta` -> `ctx.meta`, and two comment lines in
lasuite-drive/-meet metas that the regex over-replaced were restored. wait_ready_probes gained
op= (install/upgrade call sites pass it) so probes can know the phase.
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 180 passed; lint PASS.
Next: P4 — discovery placement rule + op_state/deps fixtures + migrate hand-parsers.
## 2026-06-10 P4 — custom-test ergonomics (branch 29a28e2)
Pre-change sweeps confirmed the plan's zero-users claims: no top-level non-lifecycle test_*.py in
any recipe dir; no recipe test file reads os.environ / CCCI_OP_STATE_FILE directly (the only
op-state consumers are the generic assertions via harness.generic.op_state — harness-side, fine).
So P4 = discovery glob removal + new op_state fixture + pinning tests; no test migrations needed.
test_discovery.py's HC2 gate test moved its repo-local custom fixture under functional/ (the rule);
test_discovery_phase2.py now asserts top-level custom is NOT discovered. op_state fixture skips
(clear reason) when env unset / file missing / unparseable; tested via request.getfixturevalue.
Verified on cc-ci: cc-ci-run -m pytest tests/unit -q -> 184 passed; lint PASS.
Next: P5 — customization manifest (print block + results.json key).
## 2026-06-10 P5 — customization manifest (branch 68954be)
(Resumed after a usage-limit pause mid-P5; working tree carried the in-flight manifest.py.)
New runner/harness/manifest.py: build() collects {meta_non_default, hooks, overlays, custom_tests,
env_overrides} via the SAME discovery/meta functions the run uses (so the manifest can never
disagree with what actually executes — incl. the HC2 _gated() repo-local gate), render() prints
the block. Orchestrator builds+prints right after meta load / repo-local snapshot, BEFORE the
quick-lane branch (both lanes get the block); the dict rides into build_results(customization=...)
verbatim. run_quick writes no results.json, so the single build_results call site covers all.
Hooks render as "<hook>", tuples as lists (JSON-clean); ops.py pre-ops listed by cheap source
scan (same approach as discovery._module_defines — no import at manifest time).
Lint flagged: C408 dict() literal, import-block order (manifest after deps), ruff-format on the
new test file — all fixed. Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest
tests/unit -q -> 191 passed; nix develop .#lint --command scripts/lint.sh -> lint: PASS.
Next: P6 docs, then M1 prep (tests/concurrency proof run + 21-recipe baseline matrix).
## 2026-06-10 P6 — docs (branch da558ca) + inbox response (858e0f5)
Rewrote the three docs to the restructured end state; kept the generated §4 table byte-identical
(doc-sync test pins it). recipe-customization.md flipped from review spec to reference; §8 is now
the R1R9 resolution ledger. Facts double-checked against code before writing: R2 proof lives in
test_screenshot.py::test_screenshot_reachable_through_real_load_path (not test_meta.py — fixed a
first-draft error); mumble's post-F2-14c shape has NO install_steps.sh/CHAOS_BASE_DEPLOY (base =
mumbleweb-only COMPOSE_FILE, host-ports added at head via UPGRADE_EXTRA_ENV); lasuite-docs now
ships install_steps.sh (P2b migration); deps file shape is dict recipe->entry; custom_tests
discovery is NON-recursive over functional/+playwright/ (old doc said recursive — corrected).
Adversary inbox (19:06Z, non-blocking): manifest dumps meta values verbatim -> dashboard shows a
field named SECRET_KEY_BASE (plausible's committed CI dummy — public, no real leak). Took the
redaction option: _jsonable masks values whose key NAME matches
SECRET|PASSWORD|TOKEN|CREDENTIAL|word-segment-KEY, recursing into dict values (the plausible case
is a NESTED key under EXTRA_ENV); names stay visible. KEYCLOAK_URL deliberately not matched
(word-segment KEY). Unit test pins redacted+passthrough both.
Verified on cc-ci (rsync of working tree): cc-ci-run -m pytest tests/unit -q -> 192 passed;
nix develop .#lint --command scripts/lint.sh -> lint: PASS.
Next: M1 prep — tests/concurrency proof run on the branch + the 21-dir baseline matrix.
## 2026-06-10 M1 prep + claim
Concurrency proof run on branch head 858e0f5 (rsynced tree on cc-ci): cc-ci-run -m pytest
tests/concurrency -q -> 23 passed in 11.46s (suite untouched by the restructure, as planned).
Baseline matrix: pulled every /var/lib/cc-ci-runs/*/results.json (141 files) and took the most
recent per recipe. 19/21 dirs covered by results.json; mumble's last full run predates the
results system (log ~/ccci-mumble-f214c.log, 5 tiers pass 05-31); bluesky-pds likewise
(Adversary Phase-2 cold verify e45e0ee). plausible's weekly-report RED was its PR branch
(pg13->14, build 200); its default-branch baseline is run 308 (06-10) L4 — runs 307/308 are
today's, from the conc-phase M2 sweep. Bad canaries recorded at their designed-fail tier.
Claimed M1. While waiting: nothing else unblocked in this phase (M2 is gated on M1) — will hold
with short fallback polls per §7 case 2.
## 2026-06-11 M2 reconciliation — discourse upgrade-HC1 root-cause hunt + bluesky re-characterization
Resumed after a loop stall (~21:18Z23:50Z): the m2b/ab sweeps had finished but nothing processed
them. Adversary's 23:53Z inbox asked for (1) a same-ref A/B for the m2b-discourse upgrade-HC1 L1
and (2) a fresh post-fix lasuite-drive L5 at baseline ref — both now queued/running.
Discourse dig (why I don't yet have a mechanism): first hypothesis was my own invocation error —
m2b ran PR=0 where baseline 184 ran PR=2, and I guessed the PR-head sha was unreachable without
the PR fetch. WRONG: fetch_recipe clones all mirror branches and `git checkout <sha>` is check=True
— and the preserved per-run clone sits at HEAD=7ae7b0f, so the re-checkout ran AND persisted.
Second hypothesis (prepull resets the checkout): also wrong — prepull_images is pure
`docker compose config --images` in cwd, never touches git. The scary
`service "sidekiq" depends on undefined service "discourse"` line turned out benign: it appears in
the PASSING m2r/m2rr upgrade sections verbatim (the published compose ships a dangling depends_on;
swarm ignores it — documented in the overlay NOTE). What's left: abra stamped the PREV-TAG commit
(eb96de94 = 0.7.0+3.3.1) on the chaos redeploy while the tree was at 7ae7b0f. One live hypothesis:
the cc-ci overlay clamps app+sidekiq images to bitnamilegacy/discourse:3.3.1; at this PR head
(0.9.0+3.5.0 bump) the redeploy spec may end up close enough to the base spec that the label
update path degenerates — but that requires abra-internals knowledge I can't verify analytically,
and m2r at 7d53d4ec (which also post-dates the 3.5.0 bump?) stamped correctly with the same
overlay, so content-difference-between-refs is doing SOMETHING. Decision: stop theorizing, let the
2x2 complete — m2p-discourse (new main, PR=2, @7ae7b0f) distinguishes PR=0-artifact/race from
deterministic; ab-discourse-7ae7b0f-oldmain (old main, PR=2, @7ae7b0f) distinguishes regression
from pre-existing. Run 184 left no orchestrator log (drone-side), so its chaos stamp is unknowable
— the old-main re-run stands in for it.
lifecycle.py diff c2508c7..main re-read for the upgrade path: overlay copy moved from per-recipe
install_steps.sh to first-class auto-chaos (P2a) but the copied FILE and its untracked-persistence
semantics are byte-identical; run_upgrade order (checkout → upgrade_env → prepull → chaos
redeploy -c → own wait_healthy) unchanged from old main. Nothing jumps out as the delta.
bluesky-pds: pulled the swarm service logs from all three failed runs — identical
`Cannot find module '/app/index.js'` crash-loop (Node v24.15.0) on new main @ mirror head, new
main serial re-run, AND old main @ old default head. The earlier "deploy timed out during
concurrent image pulls" guess in STATUS was wrong (the 600s timeout was the SYMPTOM; the ~2min
A/B failure exposed the crash-loop). Upstream re-published the pinned tag with a different image
layout — no harness can deploy it. Filed in STATUS as restructure-neutral with grep-able evidence.
## 2026-06-11 lasuite-drive root cause #2 — completed one-shot poisons convergence (caught live)
Watching the m2p proof run instead of just waiting paid off: the fix-forward's best-effort line
printed (so #1 is fixed), but the install assert then sat in pytest for 25+ minutes. Live state:
app serving 200, every service 1/1 EXCEPT minio-createbuckets 0/1 with its task **Complete 28
minutes ago**. services_converged demands cur==want for every service; a completed
restart_policy-none one-shot never returns to 1/1, so the bounded converge poll (DEPLOY_TIMEOUT
1800s for this recipe) was always going to burn to the deadline and fail install.
Why nobody ever saw this before P2b: the old setup_custom_tests.sh ran AFTER the install asserts
(post-deploy hook path), so converge never observed desired=1 on the one-shot, and the upgrade
tier's chaos redeploy reapplied the compose spec (replicas: 0) before its own converge checks.
P2b folded the trigger into ops.py pre_install — which the orchestrator runs BEFORE the generic
install assert. Also explains m2rr's odd "install fail but upgrade/backup/restore/custom all pass"
shape exactly (redeploy resets the spec).
Fix options weighed: (a) hook scales the one-shot back to 0 after the poll — rejected: on the
timeout path the task is typically still Preparing (image pull) and scale-to-0 CANCELS it, so the
observed "bucket lands just after the window" runs would become custom-tier RED, i.e. strictly
worse than baseline; (b) move the trigger to a post-assert hook point — no such hook exists in the
new convention and inventing one mid-M2 is scope creep; (c) teach services_converged that a
replica deficit consisting entirely of Complete tasks IS converged — chosen: semantically correct
(the one-shot did its job), restores baseline behavior for any triggered one-shot, and the
converge window doubles as the late-landing grace. Disclosed delta: a genuinely FAILING one-shot
now reds at install (converge timeout) instead of at the custom bucket test — both red, no false
green. Guard: Failed/mixed/spinning-up/no-tasks-yet still block (unit-pinned, 7 cases).
Branch fix/converged-oneshot @ be2026a, proposal in ADVERSARY-INBOX, awaiting approval per the M2
fix-forward protocol. Unit suite 199 passed + lint PASS from the cc-ci working-tree rsync.
## 2026-06-11 ~01:00Z — merge landed, queue shortened
be2026a approved (REVIEW a531746, cold-verified independently) and merged as 6cabbe7; drone build
350 green on the push head 914c166. Merged diff verified == branch diff (empty git diff be2026a..
main for the two files). Post-fix proof m2p2-lasuite-drive queued from a FRESH clone
/root/m2-postfix @6cabbe7 rather than git-updating /root/m2-sweep, because the serial queue's
discourse runs exec from m2-sweep and swapping code under an active/imminent run is how you get
unexplainable results. The discourse A/B therefore runs at 5c0676b (pre-converge-fix) — irrelevant
to discourse (no one-shots), and the Adversary's approval explicitly noted that.
Shortened the doomed m2p run: the generic install assert had already burned its 1800s converge
deadline and failed; the overlay install test then started an IDENTICAL second 1800s burn (same
assert_serving). SIGINT'd the overlay pytest child only — KeyboardInterrupt surfaced at
generic.py:97, the exact diagnosed converge-poll line (a nice live confirmation), and the
orchestrator advanced to the upgrade tier on its normal path. Teardown semantics untouched.
Disclosed in STATUS so the log's KeyboardInterrupt is pre-explained.
Drone API note for future me: no token on disk; fastest read-only check is docker cp the drone
sqlite out and query builds (documented in STATUS). The Gitea statuses API returned empty for
these shas (drone evidently doesn't post commit statuses here).
## 2026-06-11 ~00:55Z — discourse A/B closed (harness-neutral), mechanism still unattributed
m2p-discourse (new main, PR=2, @7ae7b0f) and ab-discourse-7ae7b0f-oldmain (old main, PR=2, same
ref) failed the upgrade IDENTICALLY: HC1, chaos-version=eb96de94+U, all other tiers pass, L2.
Same invocation as baseline 184 which was L4 five days ago. So: deterministic, harness-neutral,
and something outside both harnesses drifted since 06-05. Eliminated: branch-tip existence (7ae7b0f
still tips upgrade-0.8.0+3.5.0 + pr/2), upstream tag set (0.7.0+3.3.1 still latest), abra pin
(flake.lock untouched by the restructure). Not eliminated: abra-internal interaction with repo/app
state (the chaos stamp lands on the prev-base TAG commit despite the tree being at the PR head —
my best guess remains something in how abra resolves the version/commit for the chaos label when
COMPOSE_FILE includes the overlay and the project normalizes invalid, but m2r at 7d53d4ec stamping
correctly with the same dangling depends_on kills the simple version of that theory). The
`service "sidekiq" depends on...` line appears in passing AND failing upgrades, position-identical,
so it discriminates nothing. M2-wise the question is settled — the restructure is exonerated by
byte-identical old==new failure; chasing abra's stamp resolution further is post-phase work, filed
as a DEFERRED note rather than burning more M2 wall-clock on a non-rcust mechanism.
m2p2-lasuite-drive (the binding post-fix proof) auto-started at 00:48:58Z from /root/m2-postfix
@6cabbe7. Watching for: no 1800s converge burn after the one-shot completes, then L5.
## 2026-06-11 ~01:10Z — m2p2 green; "L5" turned out to be a moved goalpost (mainline, not ours)
m2p2-lasuite-drive: rc=0, 3m19s, all stages pass, OIDC + MinIO custom tests green, and the
fix-forward pair demonstrably exercised (one-shot overshot 90s again → best-effort line → late
Complete → converge fix admitted it). But results.json said level=4 where the binding condition
said L5 — heart-stopper until the git archaeology: run 189's level-5 + "L6 recipe-local N/A" cap
didn't match ANY derive_rungs I could find in either world, because the 6-rung ladder was removed
on MAIN by 46e2cdb+c51cd84 (PR #6) on 06-09, between the baseline runs and the merge — by the
mirror/report phase, not rcust. The merge didn't touch level.py (checked 01e6d49^1..01e6d49), and
run 204 on 06-09 (hours pre-deploy of the refactor) still shows 6 rungs — clean timeline. So the
baseline matrix's "L5" rows need a schema-equivalence reading, declared in STATUS BEFORE the claim
rather than negotiated after the Adversary trips on it. Lesson re-learned: a baseline matrix
should pin the SCHEMA VERSION of its evidence, not just the level number.
## 2026-06-11 ~01:30Z — M2 claim assembled
Drone-path runs landed green (356 immich#2 L4, 357 plausible#3 L4, both with embedded
customization manifests + clean flags, triggered by real !testme comments). Zero-leak verified
after everything. Plausible's missing screenshot.png checked against its other runs — it never
produces one (no screenshot surface), so not a capture regression. Claimed M2 with the full
21-recipe reconciliation table against the corrected baseline; the three lasuite rows ride the
Adversary-accepted L5≡L4+OIDC equivalence, bluesky-pds is the one justified exclusion, discourse
is reconciled as env-drift with byte-identical old==new evidence. Nothing else unblocked in this
phase while the verdict is out — holding per §7 case 2.
## 2026-06-11 ~01:20Z — M2 PASS → ## DONE
Adversary cold-verified the whole claim independently (re-ran the canaries themselves, jq'd all 21
run dirs, re-checked the drone DB and the zero-leak state) and passed M2 with no findings and no
VETO. M1 + M2 both stand; ## DONE written. Phase summary: 6 plan phases landed on one branch,
merged after M1; the real-CI sweep then caught exactly TWO genuine regressions (both in the same
lasuite-drive P2b hook port: raise-on-timeout, and one-shot-vs-converge ordering), both root-caused
live, fixed forward under approval, and proven end-to-end — plus it surfaced two pre-existing
environment drifts (discourse upgrade-HC1, bluesky-pds upstream image) that the A/B discipline
kept from being misattributed to the restructure. The sweep-as-safety-net worked as designed.

View File

@ -0,0 +1,105 @@
# JOURNAL-shot.md — Builder journal, phase `shot`
## 2026-06-11 ~01:1701:35Z — phase open, P1+P2 in one sweep
Read the phase plan + plan.md §6.1/§7/§9. Enumerated enrolled recipes (19). Pulled per-recipe
latest-run data off cc-ci (`results.json` screenshot field + PNG size for all ~190 run dirs),
scp'd 18 PNGs to /tmp/shot-audit/ and Read every one of them.
Findings vs the orchestrator pre-audit: all four 4801-2B suspects are indeed blank frames
(immich pure white, lasuite-meet white, n8n off-white, cryptpad grey). keycloak 8.7KB is a
"Loading the Administration Console" spinner — NOT a sparse login page as §2 guessed.
lasuite-docs/drive ~5.9KB are lone spinners. Two surprises: (1) mattermost-lts 242KB, classed
healthy by size, is actually the brand splash/loading screen, not the login form — size
heuristics lie in both directions; (2) mumble serves a real web page (mumble-web client per
compose.mumbleweb.yml, deployed since Phase 2 for HTTP health) showing its connecting spinner —
so mumble is fixable, not an N/A.
plausible root cause: traced via Drone sqlite (no python3 on host; ran alpine+sqlite3 against
the drone data volume). Build 357 log t=73s: capture failed, last status=500 after 45s. Cross-ref
tests/plausible/functional/test_health_check.py: `/` 500s via auth_controller under
DISABLE_AUTH=true — permanent, not an init race. So the default landing capture can never work;
plausible needs a SCREENSHOT hook to a path that renders (will probe /login, /sites on a live
deploy during P3).
bluesky-pds: null because install fails at level 0 (upstream image breakage, already in
DEFERRED.md from rcust) — capture gated on deploy_ok, correctly skipped. N/A while upstream broken.
custom-html nginx-welcome: verified no install-time seeding exists for this recipe (custom-html-tiny
has install_steps.sh; custom-html only seeds in pre_backup/pre_upgrade ops, after capture). The
nginx default page IS the honest fresh-install view. Leaving OK; flagged in matrix for Adversary.
Adversary opened REVIEW-shot.md with its own cold pre-audit (4f3a747) before my first push —
good: my visual reads agree with theirs on every overlapping row.
Design thinking for P3 (next iteration): default-path improvement = after goto(domcontentloaded),
try a bounded `wait_for_load_state("networkidle")` (~10-15s cap) and/or wait for a non-trivial
painted body, then screenshot; then a blank-detect (PNG < ~6KB or near-uniform) → one retry with
a longer settle. Keep total ≤ ~60s worst case, all inside the existing capture() try/except so R7
(cosmetics never block) is preserved. Unit tests: blank-detector pure function + retry logic with
a fake page. Per-recipe hooks only for plausible (500 root) + whatever the re-audit still shows.
## 2026-06-11 ~05:45-06:00Z — plausible root cause was a 62-char SECRET_KEY_BASE; M1 PASSed meanwhile
M1 PASS (ae10b55) with a watch-list. P3 done in two commits: ce50f64 (harness settle+blank-retry,
6 unit tests, 205 pass, lint PASS) and b98a471 (plausible fix). The plausible story changed under
probing: three live probes (shot-probe{,2,3}-plausible) showed / and every HTML route 302→/register
which 500s; app logs gave the smoking gun: `(ArgumentError) cookie store expects conn.secret_key_base
to be at least 64 bytes`. Our EXTRA_ENV value — comment claimed "64-char" — measures 62. So every
page render 500'd while /api/* (no cookie store) passed all tiers. NOT auth_controller/DISABLE_AUTH
as the old comments claimed; corrected both stale comments. Fix = 68-char value; verified
shot-fix-plausible run: install pass, screenshot.png 64132B = real registration page (empty fields,
placeholders only — same safe shape the Adversary blessed for n8n/uptime-kuma). No hook needed.
P4 started: !testme posted 05:56:32Z on immich#2 + plausible#3 (drone builds 370+371 running,
concurrent). Manual full proof run keycloak launched (shot-proof-keycloak). Remaining queue:
mattermost-lts, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n, mumble.
## 2026-06-11 ~06:05-06:30Z — proof sweep underway; A1 fixed; mumble is the holdout
Proofs verified visually so far (each level matches its baseline): drone 370 immich L4 234KB real
onboarding card (was 4801B); drone 371 plausible L4 64KB registration page (was null); keycloak L4
real sign-in form (was loading spinner); cryptpad L4 real landing w/ document picker (was grey blank);
lasuite-meet L4 real product landing (was white blank); mattermost-lts L2(=m2r baseline L2) — real
page but it's the desktop-or-browser interstitial, so per the watch-list I added the first
SCREENSHOT hook (80e5713, → /login + public settle()); re-run pending.
A1 (blank-retry could regress a larger frame): fixed in 7ad7d1f — retry goes to a temp path and
only replaces via os.replace when >= first; regression test [9999,4801]→9999. 207 unit, lint PASS.
mumble: proof run still spinner after settle+retry (7980B). Probing live what mumble-web does over
90s (it printed real mumble-web HTML while up; suspect autoconnect overlay that never resolves
because the websocket voice path may not be browser-reachable). Orchestrated probe2 running.
Also in flight: n8n + lasuite-docs proofs from the A1-fixed tree. Queue: lasuite-drive, mattermost
re-run; then ghost/hedgedoc/etc. healthy-class citations + dashboard/card check + runtime compare.
## 2026-06-11 ~06:40-07:15Z — mattermost solved via click-through; mumble settled as best-available; M2 assembled
mattermost: hook v1 (/login) produced a byte-identical interstitial PNG — mattermost shows the
desktop-or-browser chooser on ANY first-visit route. Hook v2 clicks "View in Browser" (best-effort,
suppress) → shot-proof3 PNG is the genuine "Log in to your account" form at L2=baseline. That's
watch-list item 3 satisfied the hard way.
mumble: three live probes. probe4 (90s DOM+console watch): localization loads, NO errors, NO failed
requests, connect-dialog selectors match nothing, page stays at loading-container forever. orch5:
websockify serves everything (its own 404s on /ws,/websocket; config.local.js = untouched sample, no
autoconnect). Conclusion: the pinned mumble-web:0.5 client never paints for an anonymous visitor —
not a capture bug, not fixable harness-side without changing the deploy (guardrail says upstream).
Filed DEFERRED (6104a99); claiming the loader frame as documented best-available. Voice = the
recipe's function and is protocol-tested; the Adversary may still want a different disposition —
their call at the gate.
Ops lessons this stretch: 3 simultaneous run launches race on abra catalogue fetch (lasuite-drive
died "unable to update catalogue"; reran solo green) — stagger launches. Backgrounded one-shot ssh
launchers with `cd X && nohup A & nohup B &` only cd for the first — give each its own cd.
M2 evidence: 10 fixed-class proof runs (table in BACKLOG-shot P4, every PNG Read by me), 2 of them
real !testme drone builds (370/371, durations 198s/166s vs 199s/209s baselines — plausible FASTER
since capture stops burning its 45s fail window), healthy-class cited from P1, dashboard grid/card/
badge all 200. Claiming M2.
## 2026-06-11 ~07:20Z — phase complete
M2 PASS (2b54adb): 18/18 PNGs independently Read, both !testme proofs confirmed genuine via bridge
logs, durations/levels/R7 all verified, mumble N/A-variant agreed (Adversary reversed its M1 stance
on the new DOM evidence), bluesky-pds N/A re-confirmed. Wrote ## DONE. Loop ends.

238
machine-docs/REVIEW-bsky.md Normal file
View File

@ -0,0 +1,238 @@
# REVIEW-bsky.md — Adversary verdicts for the `bsky` sub-phase
Phase SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-bsky-fix.md`.
Gates: **M1** (root cause + green fix PR), **M2** (operator handoff complete → `## DONE`).
This file is append-only; the Builder reads it, never writes it.
---
## Baseline recon @2026-06-11 (cold, pre-claim — NOT a verdict)
Established independently from the live recipe checkout on cc-ci
(`~/.abra/recipes/bluesky-pds`, HEAD `b2d86ef`, tag `0.2.0+v0.4-4-gb2d86ef`) so I am
ready to verify the Builder's root-cause claim without anchoring:
- `compose.yml`: app `image: ghcr.io/bluesky-social/pds:0.4` — a **moving minor tag**.
Version label `coop-cloud.${STACK_NAME}.version=0.2.0+v0.4`.
- Recipe **overrides the image entrypoint** via `entrypoint.sh.tmpl` (mounted as a config
at `/entrypoint.sh`, `entrypoint: dumb-init --`, `command: /entrypoint.sh`). That script
ends with `exec node --enable-source-maps index.js` — a **relative** `index.js`, resolved
against the image's WORKDIR.
- Known symptom (rcust/shot evidence, DEFERRED.md): app crash-loops
`Cannot find module '/app/index.js'` (MODULE_NOT_FOUND) under Node v24.15.0. Consistent
with: image WORKDIR `/app`, but `index.js` no longer present there → upstream
restructured/rebuilt whatever `:0.4` now resolves to.
Verification angles I will hold the Builder's M1/M2 to (per phase plan §3 gates):
1. Root-cause evidence reproduces — I independently inspect the live image
(`docker run --entrypoint sh ... -c 'ls; node --version'` / crane/skopeo) and confirm
`index.js` is absent from the assumed WORKDIR at the OLD pin, and present/working at the
NEW pin.
2. The fix is in the **recipe mirror PR**, not the harness; diff minimal + each line
justified against upstream bluesky-social/pds changelog; version label bumped per recipe
convention; **no test/gate weakening** anywhere in cc-ci.
3. The green run is genuinely the **PR head via the drone `!testme` path** (not a local
hand-run) — full lifecycle incl. lint, level recorded under de-capped semantics.
4. Screenshot real + credential-free (I Read the PNG myself); never shows generated creds.
5. DEFERRED entries closed with pointers; operator handoff in STATUS-bsky.md.
No gate CLAIMED yet — awaiting Builder's first `claim(...)` on a bsky gate.
## Pre-claim recon update @2026-06-11T11:45Z (cold image probe — NOT a verdict)
Independently reproduced BOTH halves of the root cause via `docker run` on cc-ci:
- `ghcr.io/bluesky-social/pds:0.4` (current moving tag, digest …2324702f): **Node v24.15.0**,
WORKDIR `/app`, ships **`index.ts`** only — no `index.js`. The recipe's entrypoint
`exec node --enable-source-maps index.js` therefore fails with exactly
`Cannot find module '/app/index.js'`. Symptom reproduced. ✔
- `ghcr.io/bluesky-social/pds:0.4.219` (Builder's proposed pin): **Node v20.20.2**,
WORKDIR `/app`, ships **`index.js`** (`package.json` `main: index.js`). The recipe's
existing entrypoint resolves the file → addresses the crash at the image level. ✔
Open scrutiny points I will hold the M1 claim to (NOT yet judged — no gate CLAIMED):
- **§2.2 upgrade-preference:** `0.4.219` is the latest patch of the *previous* 0.4 line,
not an upgrade to current stable (`:0.4` now = 0.5.1). The plan prefers upgrading unless
research justifies otherwise. Need: a genuine DECISIONS.md justification (e.g. 0.5.x
moved to a TS entrypoint requiring an entrypoint rewrite / larger blast radius) — I'll
read it only AFTER my own verdict, and check it against upstream changelog.
- Pin should be exact/immutable (0.4.219 looks like a full patch tag — verify it's not
itself moving; digest-pin would be strongest).
- Fix must land on the recipe MIRROR PR and be proven green via the drone `!testme` path
at PR head — not a local hand-run; no cc-ci harness/gate weakening.
Still no gate CLAIMED (STATUS-bsky: "none claimed yet — working M1"). Idling for the claim.
## Pre-claim recon @2026-06-11T11:55Z — EXPECTED_NA['upgrade'] premise (cold, NOT a verdict)
Builder added a harness change: `EXPECTED_NA['upgrade']` suppresses the upgrade-tier base
deploy for bluesky-pds ("no deployable base"). I independently checked the premise on the
live recipe checkout:
- Published recipe tags: ONLY `0.1.1+v0.4` and `0.2.0+v0.4`. **Both** pin
`ghcr.io/bluesky-social/pds:0.4` (the moving tag that now resolves to the broken
0.5.1/index.ts image). So every published base would crash identically → there is no
deployable previous published version. Premise holds. ✔
- Logic: the PR fix (pin 0.4.219) is the FIRST deployable published version; before it,
NO published version deploys, so a "previous published → PR" upgrade path cannot exist.
Genuinely N/A, not a dodge. (Post-merge, future PRs WILL have a deployable base → tier
re-activates; operator handoff should note this.)
STILL must hard-verify when M1 is CLAIMED (do NOT pre-judge):
- The NA is **scoped to bluesky-pds only** (per-recipe EXPECTED_NA declaration, not a
global loosening of the upgrade tier for all recipes) — read the diff.
- install / backup-restore / functional / lint tiers are NOT suppressed.
- N/A recorded honestly with reason and handled correctly under de-capped level semantics
(doesn't silently inflate the level nor falsely block); the 6 new upgrade_base() unit
tests actually have teeth.
- §9 alternative ("deploy base minimally via overlay, then upgrade to latest") is correctly
rejected here: latest-deployable == PR head == 0.4.219, so there's no version delta to
test and an overlay base would be synthetic — N/A is the honest call, not the overlay.
---
## M1 — PASS @2026-06-11T12:30Z (root cause + green fix PR + screenshot)
Verdict formed COLD from my own clone + live cc-ci probes, BEFORE reading JOURNAL.md
(anti-anchoring respected). Sources: phase plan §3 (SSOT), the code/git history, the
verification info in STATUS-bsky.md, and my own re-runs below. Every M1 acceptance item
independently reproduced.
### 1. Root cause reproduces ✔
Cold `docker run` on cc-ci of both images:
- `ghcr.io/bluesky-social/pds:0.4` (current, digest …2324702f/871194d2): `@atproto/pds`
**0.5.1**, **Node v24.15.0**, `/app/index.ts`**NO index.js**. The recipe's
entrypoint `exec node --enable-source-maps index.js``Cannot find module
'/app/index.js'`. Symptom reproduced exactly.
- `:0.4.219` (the fix pin): `@atproto/pds` **0.4.219**, **Node v20.20.2**, `/app/index.js`
present (`package.json main:index.js`) ⇒ entrypoint resolves. Fix sound at image level.
- Upstream registry `cc-ci-plan/upstream/bluesky-pds.md` matches my probes (moving `:0.4`
tracks main; 0.4.x keeps classic layout; env interface stable across 0.4.x → no
migration). `:0.4` is demonstrably a MOVING tag upstream republished.
### 2. PR #2 minimal + justified, unmerged ✔
Gitea API: PR #2 **open, merged=false, mergeable=true**; base main b2d86ef, head
**f7b6c8df** (branch upgrade-0.3.0+v0.4.219). Diff = **1 file, +2 2** on compose.yml only:
image `:0.4``:0.4.219`, version label `0.2.0+v0.4``0.3.0+v0.4.219`. No
test/harness/recipe-test weakening in the PR. `:0.4.219` is an **exact** (non-moving)
version tag — newest 0.4.x exact tag preserving the recipe's `index.js` layout, so §2.2's
"exact-version tag … unless research justifies otherwise" is met (0.5.x restructured to a TS
entrypoint requiring a recipe entrypoint rewrite — the same-series re-pin is the minimal
correct fix). NOTE (not a finding): pursuing the 0.5.x upgrade later is a reasonable
operator follow-up; the re-pin is the right minimal fix now.
### 3. Green run 427 via the GENUINE drone !testme path, at PR head ✔
- PR #2 comment **14342** `!testme` → bridge swarm log (ccci-bridge_app):
`[poll] triggered build 427 for bluesky-pds@f7b6c8df (PR #2, comment 14342) by
autonomic-bot``reflected outcome build 427 (bluesky-pds PR #2): success` → PR comment
**14343** "✅ passed @ f7b6c8df". Real poll→drone→reflect, not a hand-run.
- run-427 recipe checkout = PR head `f7b6c8d "chore: upgrade to 0.3.0+v0.4.219"`,
compose.yml line 6 image=`:0.4.219`, version label `0.3.0+v0.4.219`.
- `results.json`: **level=5**, ref=f7b6c8dfb81c, pr=2; rungs
install/backup_restore/functional/lint=**pass**, upgrade=**skip**;
`skips.intentional.upgrade`=declared reason, `skips.unintentional`=[];
flags clean_teardown+no_secret_leak=true; schema=2.
### 4. No gate weakening (the EXPECTED_NA['upgrade'] harness change) ✔
- Premise true (cold): BOTH published recipe tags (0.1.1+v0.4, 0.2.0+v0.4) pin the broken
moving `:0.4` ⇒ no deployable upgrade base. Genuine structural N/A, not a dodge.
- `upgrade_base()` (e9745c8) returns None only when `upgrade ∈ EXPECTED_NA`, declared
**per-recipe** in `tests/bluesky-pds/recipe_meta.py`. NOT a global loosening — unit test
`test_expected_na_other_rung_does_not_suppress` proves a DIFFERENT-rung EXPECTED_NA does
not suppress the upgrade base. The tier records `"skip"`, never `"pass"`.
- **Negative control run 423** (same PR head, pre-EXPECTED_NA): base 0.1.1+v0.4 deploy →
**install=fail** → level **0**. Proves the harness has TEETH: it goes red when a base IS
attempted against the broken tag; 427's level 5 is solely the legitimate base-suppression,
not a masked failure. A synthetic overlay base (0.4.219→0.4.219, zero delta) would be a
meaningless green — N/A-skip is the honest call.
- Level math (`compute_level`, pure): install=pass(1) · upgrade=skip(climbs) ·
backup_restore=pass(3) · functional=pass(4) · lint=pass(5) ⇒ **5**. Consistent with the
lvl5 de-cap semantics (skip climbs; only fail/unver block).
- Unit tests COLD on cc-ci (fresh clone HEAD cba53b6): **253 passed** (6 new in
test_upgrade_base.py, with teeth). Repo lint COLD: `lint: PASS` (exit 0).
### 5. Screenshot — real + credential-free ✔
Published `…/runs/427/screenshot.png` (HTTP 200, 29274 B) is **sha256-identical** to the
on-disk capture. I Read the PNG: the genuine PDS landing page — Bluesky ASCII butterfly,
"This is an AT Protocol Personal Data Server (aka, an atproto PDS)", "/xrpc/" pointer,
Code/Self-Host/Protocol links. **No credentials** (no admin password / invite / secret).
Default capture suffices — no SCREENSHOT hook needed.
### 6. No secret leak ✔
Independent scan of published artifacts (results.json, summary.html, lint.txt, junit) for
the PDS-generated secrets (admin password / jwt / plc rotation key) and high-entropy
strings: the ONLY matches are recipe SOURCE secret-NAME references (`- pds_jwt_secret`
etc.) and one abra lint WARN naming `pds_admin_password` (length policy) — no secret VALUE
exposed. Only high-entropy token = the 40-char commit SHA. clean_teardown confirmed (no
swarm secret/stack residue for the run).
**M1 PASS. No VETO.** Builder cleared to proceed to M2 (operator handoff). M2 will get a
fresh cold pass: independent re-trigger/confirm green at PR head, PNG re-Read, level/baseline
reconciliation, DEFERRED entries closed with pointers, and the operator summary checked —
plus I will then consult JOURNAL/DECISIONS to contextualise (noting it there).
---
## M2 — PASS @2026-06-11T15:48Z (operator handoff complete)
Fresh Adversary cold pass. Verdict formed from the plan (§3 M2 SSOT), the code/deliverables,
the STATUS-bsky verification info, and my OWN independent re-trigger — BEFORE reading
JOURNAL.md (anti-anchoring respected; I may consult it after, noting so).
### 1. Green at PR head — independently RE-TRIGGERED ✔ (the decisive proof)
I posted `!testme` on PR #2 myself (comment **14344**, 15:46:21Z). Bridge:
`[poll] triggered build 435 for bluesky-pds@f7b6c8df (PR #2, comment 14344) by
autonomic-bot`. Fresh **build 435** results.json: **level=5**, ref=f7b6c8dfb81c (PR head),
pr=2; rungs install/backup_restore/functional/lint=**pass**, upgrade=**skip**
(skips.intentional.upgrade=declared reason, skips.unintentional=[]); clean_teardown +
no_secret_leak=true. Recipe checkout = PR head `f7b6c8d`, image `:0.4.219`. Identical rung
profile to run 427 → reproducibly green, not a one-off.
- **Real stages, not a no-op:** junit shows install/backup(generic+cc-ci)/restore
(generic+cc-ci) and FOUR live functional tests — `test_health_check`,
`test_describe_server`, `test_session_auth`, `test_account_and_post`. A no-op could not
pass account-creation/post/session-auth against a live PDS. (Wall-clock ~70s is plausible:
lightweight 2-service recipe, image cached on host.)
### 2. PNG independently Read ✔
Fresh build 435 screenshot.png sha256 == run 427's (bdb71d3e…) == the image I Read at M1:
genuine PDS landing page (Bluesky ASCII butterfly, "AT Protocol Personal Data Server",
/xrpc/ pointer, upstream links), **no credentials**. Deterministic, real.
### 3. Level under new semantics + baseline reconciled ✔
level=5 under the de-capped ladder (upgrade=skip climbs; only fail/unver block). Old Phase-2
baseline ("full lifecycle green", e45e0ee, pre-results era) is genuinely unreproducible —
the moving-tag republish broke ALL published recipe versions; the PR restores deployability.
Reconciliation recorded in the DEFERRED closure + the M2 claim. Independently corroborated:
**0.5.x has NO release tag** (upstream git: 0 `0.5.x` tags, highest v0.4.219 + anomalous
v0.4.5001; ghcr `0.5.0/0.5.1/v0.5.1` all absent) — so an exact-version pin REQUIRES 0.4.x.
This fully resolves the §2.2 "prefer upgrade" scrutiny: re-pinning to 0.4.219 (newest exact)
is not "old over new" — there is no exact 0.5.x tag to upgrade to; 0.5.x lives only on the
moving tag the recipe must never pin. Justified.
### 4. DEFERRED entries closed with pointers ✔
machine-docs/DEFERRED.md: ✅ RESOLVED @2026-06-11 (phase bsky). Explicitly closes BOTH the
re-pin follow-up AND the rcust M2 baseline-exclusion note, with pointers to PR #2 / run 427 /
negative control 423 / upstream registry / DECISIONS. Original entry preserved (append-only).
### 5. Operator summary ✔
STATUS-bsky "Operator summary": crisp + complete — what was wrong (moving tag → index.ts vs
recipe's index.js; broke both published versions), what the PR changes (2-line re-pin
0.4.219 + label bump; why not 0.5.1 = no release tag + entrypoint migration), and a 5-step
post-merge runbook (merge → publish version → drop EXPECTED_NA + set
UPGRADE_BASE_VERSION="0.3.0+v0.4.219" → no canonical to reseed → never re-pin :0.4).
Corroborated: ci-warm has NO bluesky entry (only custom-html/keycloak/traefik) → "nothing to
reseed" is true.
### 6. PR left OPEN ✔
PR #2 head f7b6c8df, state=open, merged=**false** (re-confirmed at re-trigger). The phase is
done WITH the PR open — merging is the operator's, post-merge reseeding documented not done.
**M2 PASS. No VETO.** Both M1 (@369f4f4) and M2 are fresh Adversary PASSes; no gate
weakening, no secret leak, screenshot real, PR unmerged. The Builder is cleared to write
`## DONE` to STATUS-bsky.md. (Post-verdict I will consult JOURNAL/DECISIONS only to
contextualise — it does not change this verdict.)
### Post-verdict consult (does NOT change the verdict)
Read DECISIONS.md bsky entries after writing M2 PASS. Fully consistent: pin-choice entry
REJECTS 0.5.1 (no release tag + index.ts migration) AND digest-suffix pinning (abra
survey/upgrade tooling chokes on `tag@digest`) → exact-version tag 0.4.219 chosen (satisfies
plan §2.2 "digest-pinned OR exact-version tag"). EXPECTED_NA entry matches the harness
behaviour I verified. No contradiction, no new finding.

442
machine-docs/REVIEW-conc.md Normal file
View File

@ -0,0 +1,442 @@
# REVIEW-conc.md — Adversary ledger, concurrency-restructure phase
Append-only. Verdicts: `<gate>: PASS @<ts>` + evidence, or `FAIL` + [adversary] finding in
BACKLOG-conc.md. SSOT for what is verified: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md.
## 2026-06-10T04:00Z — Adversary online; baseline pre-read (no gate pending)
Pulled main @5b65c6c. No STATUS-conc.md, no `restructure/concurrency` branch — nothing claimed yet.
Pre-read the CURRENT system (docs/concurrency.md @5b65c6c + lifecycle.py/run_recipe_ci.py) to
anchor my later diff review in the as-is code, not the Builder's narrative.
Current-system facts I will hold the restructure against:
- Registry symbols slated for deletion (will grep for dangling refs at M1):
`register_run_app` (lifecycle.py:69, call site :283), `unregister_run_app` (:78, call sites :723, :766),
`_run_owner_state` (:83), `ACTIVE_RUN_DIR` (:43), `CCCI_JANITOR_MAX_AGE` (janitor :738),
`acquire_recipe_lock` (:46, call site run_recipe_ci.py:843), `RECIPE_LOCK_DIR` (:42).
- Must survive untouched: `RUN_APP_RE` (lifecycle.py:26) allowlist semantics (warm/canonical apps
never probed), `services_converged()` paused-is-settled logic, docker-service sweep discovery,
`teardown_app(verify=False)` idempotence.
- M1 verification plan (cold, my clone): checkout branch; `pytest tests/unit -q`,
`pytest tests/concurrency -q`, `scripts/lint.sh`; full diff review hunting: probe-vs-acquire
ordering races, signal-handler reentrancy (SIGTERM during teardown / SIGALRM during SIGTERM),
teardown-during-teardown, lock-fd lifetime (object dropped → GC closes fd → lock silently
released), symlinked servers/ write conflicts, janitor unlink-vs-reacquire race (unlink while a
waiter blocks on the old inode → two "held" locks on different inodes for one domain),
PDEATHSIG-after-fork ordering (prctl before ppid check), alarm(0) vs teardown duration,
setsid wrapper trap semantics under drone cancel, test-suite blind spots vs the 19 planned cases.
- Tests/concurrency must NOT be wired into the default `pytest tests/unit` gate (plan decision).
- M2 (post-merge, live): cancel-mid-run leak check, parallel immich#2+plausible#3, double-!testme
same PR blocks visibly, one full green run. NEVER merge/push recipe mirror repos.
No verdict yet — waiting for Builder bootstrap/claim.
## 2026-06-10T04:05Z — cold-verify environment established (prep, no gate)
Builder seeded STATUS/BACKLOG/JOURNAL-conc; STATUS says P1 in flight, no gate claimed. Mapped the
test-execution environment I'll use for the M1 cold run so a time-sensitive gate isn't spent
debugging tooling:
- Local VM devshell (`nix develop`) has only lintTools (no pytest). So pytest does NOT run here.
- pytest 8.3.3 + playwright live in the host `pyEnv` (nix/modules/harness.nix) exposed as
`cc-ci-run` on cc-ci. `cc-ci-run -m pytest <path> -q` works as the real harness interpreter
(verified: `cc-ci-run -c "import pytest" -> 8.3.3`).
- `.drone.yml` lint stage runs `nix develop .#lint --command bash scripts/lint.sh`.
- COLD M1 PLAN: fresh `git clone`/checkout of `restructure/concurrency` into a throwaway dir ON
cc-ci → `cc-ci-run -m pytest tests/unit -q` + `cc-ci-run -m pytest tests/concurrency -q` +
`nix develop .#lint --command bash scripts/lint.sh`, all from that clean checkout (not the
Builder's working tree). Then adversarial diff review per my baseline hit-list.
- Baseline `.drone.yml` on main is still the pre-restructure version (concurrency.limit=2,
acquire_recipe_lock / /run/cc-ci-active registry referenced) — confirms P1/P4 edits are
branch-only so far. Good.
## 2026-06-10T04:23Z — early pre-review of P1+P2 (branch @b302f3a, NO gate claimed — NOT a verdict)
Builder has pushed P1 (b492f99) + P2 (b302f3a) to restructure/concurrency; P3/P4/P5/tests still
pending, so M1 is not claimable and this is NOT a PASS — it's pre-review to front-load the M1 diff
audit and avoid re-doing it under gate time pressure. Read code/diff + git only; did NOT read
JOURNAL (anti-anchoring intact). I actively tried to break the following and each concern was
REFUTED:
1. **Green-on-red via the .drone.yml EXIT trap** (my lead hypothesis). The wrapper is
`setsid cc-ci-run … & PID=$!; trap 'kill -TERM -- -$PID' TERM EXIT; wait $PID`. I worried the
EXIT trap's final `kill` status would override the harness exit code and mask a failing run.
EMPIRICALLY TESTED (4 bash repros incl. failing harness with a lingering group member that
makes kill succeed=0): bash PRESERVES the pre-trap exit status when the EXIT trap doesn't call
`exit`. Exit code propagates correctly in all cases (RED stays RED, GREEN stays GREEN). Refuted.
2. **P2 unlink/reacquire inode race** (janitor unlinks a reaped orphan's lockfile while a new run
blocks on the old inode). Handled: both acquire_app_lock and _probe_and_reap recheck
`fstat(fd).st_ino == stat(path).st_ino` after acquiring and retry/bail on mismatch — a lock on
an unlinked (anonymous) inode is never treated as authoritative, and the path's lockfile is
never unlinked out from under a newer run. Refuted.
3. **Half-reaped/new-app coexistence.** Reap runs WHILE HOLDING the probe lock; a new same-domain
run blocks in acquire_app_lock until reap completes. The pre-deploy window (lock held, app not
yet created) is covered: the stale-lockfile sweep sees the held lock (BlockingIOError) and
leaves it. Refuted.
4. **Signal mid-normal-teardown aborting cleanup.** begin_teardown() is the FIRST line of BOTH
finally blocks (run_recipe_ci.py:663 run_quick, :1134 main); the _funnel_handler swallows
(logs+returns) any SIGTERM/SIGALRM once tearing_down is set, so a second signal can't abort the
cleanup the first asked for. install_lifetime_guards() is the FIRST statement of main() (:829),
before any abra/lock call, with prctl→ppid==1 recheck in the correct order. Refuted.
Open items to confirm AT M1 (cold, full suite) — NOT defects, just unverified-until-then:
- `datetime` import removed from lifecycle.py along with _stack_age_seconds — grep for any
remaining datetime use (ruff would catch an undefined name; confirm import truly orphaned).
- `_stack_name` / age-fallback deadcode after the janitor rewrite — confirm no dangling refs.
- Registry-symbol deletion is only PARTIAL on this commit: acquire_recipe_lock still present
(P3 deletes it); register/unregister/_run_owner_state/ACTIVE_RUN_DIR/CCCI_JANITOR_MAX_AGE are
gone — full dangling-ref grep belongs at M1 once P3 lands.
- setsid-fork edge: if `setsid` ever forks (only when it's a pgrp leader; not the case for a
backgrounded job in a non-job-control drone shell), $PID would be the intermediate and the
harness would reparent to ppid==1 and self-abort. Live-verify the trap+cancel path at M2(a).
- begin_teardown is process-global module state (lifetime._state) — fine for one harness process;
the tests/concurrency suite must not import-share it across in-process cases (verify at M1).
## 2026-06-10T04:32Z — pre-review P3+P4 (branch @91d3cc7, NO gate claimed — NOT a verdict)
Builder pushed P3 (17ebdf3 per-run ABRA_DIR) + P4 (91d3cc7 config cleanup). tests/concurrency +
P5 docs still pending, so M1 still not claimable. Continued the front-loaded diff audit (code/git
only; JOURNAL still unread). Findings — all CLEAN:
- **Dangling-ref grep across runner/bridge/dashboard/nix = ZERO hits** for all 9 deleted symbols:
acquire_recipe_lock, register_run_app, unregister_run_app, _run_owner_state, ACTIVE_RUN_DIR,
CCCI_JANITOR_MAX_AGE, RECIPE_LOCK_DIR, _stack_age_seconds, _registry_path. The orphaned
`datetime` import is also gone from lifecycle.py. Clean deletion.
- **Path centralization**: all `~/.abra/recipes/<recipe>` literals replaced by `abra.recipe_dir()`
(resolves `$ABRA_DIR else ~/.abra`) across abra.py (recipe_checkout, has_lightweight_version_tags,
recipe_head_commit, recipe_versions), generic._recipe_dir, lifecycle.prepull_images,
snapshot_recipe_tests, fetch_recipe. prepull's env_path stays canonical `~/.abra/servers/...`
which is correct (servers/ is the shared symlink target).
- **Ordering verified** (main(), the only structural risk): install_lifetime_guards() is the FIRST
stmt (873); between it and setup_run_abra_dir() (891) there are ONLY env reads + a print — no
abra call; ABRA_DIR is exported at 891 BEFORE fetch_recipe (892) and before the first path-helper
recipe_head_commit (895). The `--quick` dispatch (run_quick, ~908) is AFTER 891, so the quick lane
inherits the per-run ABRA_DIR too. No tree is touched before ABRA_DIR is set.
- **Manual-run isolation**: rid=="manual" → "manual-<pid>" so two hand-runs don't share a tree.
Open items to confirm AT M1 (cold) — not defects:
- setup_run_abra_dir symlink idempotency: `if not os.path.islink(link): os.symlink(...)` — if a
NON-symlink file pre-exists at servers/catalogue (reused run dir from a crashed partial), symlink
raises FileExistsError. Low risk (fresh run-id per Drone build) but worth a glance.
- CCCI_SKIP_FETCH=1 now `rm -rf dest` + copytree(canonical, dest, symlinks=True) — confirm the
--quick rollback-proof staging tests still pass (they set CCCI_SKIP_FETCH).
- tests/{ghost,discourse}/install_steps.sh RECIPE_DIR=${ABRA_DIR:-$HOME/.abra} mechanical path fix
— confirm it changed NO assertion/gate (guardrail: never weaken recipe-test gates). Diff-check.
Net: the entire P1P4 diff has been pre-audited and is clean against my break-it hit-list. M1 cold
run, once claimed (after tests/concurrency + P5 land), reduces to: fresh checkout on cc-ci →
`cc-ci-run -m pytest tests/unit -q` + `cc-ci-run -m pytest tests/concurrency -q` + lint, plus a
focused review of only the tests/concurrency suite (vs the 19 planned cases) and the P5 doc delta.
## M1: PASS @2026-06-10T04:38Z — implementation verified (branch restructure/concurrency @d3fe9e2)
Verdict formed from the plan (SSOT), the code/git, the STATUS claim's verify recipe, and my own
COLD acceptance run — WITHOUT reading JOURNAL first (anti-anchoring honored; noting here that I had
NOT consulted JOURNAL-conc at verdict time).
COLD ENVIRONMENT: fresh `git clone --branch restructure/concurrency` into /tmp/adv-m1 on cc-ci
(NOT the Builder's tree); `git rev-parse HEAD == d3fe9e26bb0fbaedb37383539ba3973bc1c80aff` (matches
claim), `git status` clean. Ran via the host `cc-ci-run` pyEnv (pytest 8.3.3 + playwright) and the
pinned `.#lint` devshell.
ACCEPTANCE RESULTS (expected → observed):
- `cc-ci-run -m pytest tests/unit -q` → 138 passed in 4.72s ✓ (claim: 138 passed)
- `cc-ci-run -m pytest tests/concurrency -q` → 20 passed in 9.91s ✓ (claim: 20 passed)
- `nix develop .#lint --command bash scripts/lint.sh``lint: PASS`
- `pytest tests/unit --collect-only` concurrency items → 0 ✓ (suite NOT in default gate)
- dangling-ref grep (register_run_app, unregister_run_app, _run_owner_state, ACTIVE_RUN_DIR,
CCCI_JANITOR_MAX_AGE, acquire_recipe_lock, RECIPE_LOCK_DIR, _stack_age_seconds) over
*.py/*.nix/*.yml/*.sh → ZERO hits outside docs/ ✓
GATE-INTEGRITY (guardrails honored):
- `RUN_APP_RE` regex unchanged (lifecycle.py:26, identical pattern); warm/canonical apps still
never become probe candidates (test_11 asserts no lockfiles even created for warm names).
- `services_converged()` / paused-is-settled / `backup_app()` waits: NOT in the code diff — all
RUN_APP_RE/services_converged/paused diff hits are docs/concurrency.md prose (P5 rewrite).
- `teardown_app` ordering untouched; only its trailing unregister call removed (registry gone).
- Only `tests/<recipe>/` change is the mechanical `RECIPE_DIR=${ABRA_DIR:-$HOME/.abra}/...` line
in ghost+discourse install_steps.sh — NO assertion/gate touched (diff-confirmed). Guardrail
"never weaken recipe-test gates / touch tests/<recipe>/ content" honored.
- P4: `concurrency.limit` block removed from .drone.yml; drone-runner.nix comment makes
DRONE_RUNNER_CAPACITY the single knob.
ADVERSARIAL DIFF REVIEW (P1P4 pre-audited in the two notes above; refuted: green-on-red exit-code
masking [empirically tested], unlink/reacquire inode race [fstat==stat identity recheck],
half-reaped coexistence [reap-under-probe-lock], signal-mid-teardown reentrancy [begin_teardown
first line of both finally blocks], guard/ABRA_DIR/fetch ordering [no abra call pre-export]).
TEST-SUITE AUDIT vs the 19 plan cases: real kernel flocks, NEVER mocked (only teardown_app +
abra-discovery stubbed, both disclosed). Coverage complete: cases 14 test_locks, 512
test_janitor, 1316 test_lifetime, 1719 test_abra_dir, +test_18b (manual-pid isolation) = 20.
Assertions are substantive, not tautological: exact funnel exit codes 142/143 (test_15/16),
reap-vs-new-run timestamp ordering + fresh-inode `lock_state=="held"` (test_7), two-janitor
arbitration via separate open()s (test_8 — valid: flock binds the open file description, so
threads-with-distinct-fds model processes), long-held mtime-backdate flag-not-steal (test_10),
PEP 446 fd non-inheritance with a surviving child (test_3), divergent per-run trees + canonical
untouched (test_18).
INDEPENDENT PROBE (my own driver, NOT the Builder's helpers.py): drove the real
`lifecycle.acquire_app_lock` from a standalone script with a sandbox CCCI_APP_LOCK_DIR on cc-ci →
state `held` after acquire; a second acquirer BLOCKED while the first held (no ack2 after 1.5s);
after `SIGKILL` of the holder the second acquired within 10s (kernel auto-release). Core invariant
confirmed against the real code, not just the Builder's tests.
NON-BLOCKING NOTES (carry to M2 live-verify; none gate M1):
- setsid-fork edge in the .drone.yml trap wrapper: if `setsid` ever forks (only when it's a pgrp
leader — not the case for a backgrounded job in a non-job-control drone shell), $PID would be the
intermediate and the harness could reparent (ppid==1) and self-abort. MUST be live-verified by
the actual drone-cancel path at M2(a) — the plan already flags this ("verify drone exec runner
signal delivery; the trap must fire on drone cancel"). Not unit-testable here.
- End-of-janitor stale-lockfile tidy sweep (appless leftover lockfile unlink) is not directly
covered by a named test (not one of the 19); low risk (tidiness only). Noted, not a defect.
- test_14 (ppid race) depends on the helper reparenting to pid 1; under a subreaper it marks
NEVER_REPARENTED and FAILS VISIBLY (never false-passes). Passed in this env.
CONCLUSION: M1 — implementation verified — PASS. M2 (merge to main + live verification ad) is
unblocked. Reminder for both loops: recipe-mirror PRs are !testme targets only — never merge/push
them. (After this verdict I may consult JOURNAL-conc to contextualize, per §6.1.)
## 2026-06-10T04:49Z — M2 merge integrity pre-check (M2 NOT yet claimed — not a verdict)
Builder merged the branch to main (merge commit `bb5eb3d`, 2 parents 83a6c6e∘d3fe9e2, no force)
after my M1 PASS, and is mid-M2 live verification (journal: M2(a) cancel-mid-run evidence, (b)
parallel runs triggered). No `claim(conc): M2` commit yet; STATUS-conc still shows the stale M1
line (Builder's file — will update at the M2 claim). Independent merge check:
- `git diff bb5eb3d d3fe9e2 -- runner/ .drone.yml docs/concurrency.md tests/ nix/` = EMPTY → the
merge preserved EXACTLY the code I cold-verified at M1. No conflict-resolution drift introduced.
- `git merge-base --is-ancestor d3fe9e2 bb5eb3d` = true.
So deployed main == M1-verified tree. At the M2 claim I therefore re-verify only LIVE behavior +
the push build, not the code again:
push build green; (a) cancel mid-run → no leaked python/lock, next janitor reaps the app, zero
leakage; (b) two parallel !testme (immich#2 + plausible#3) → both green, zero leakage; (c)
double-!testme same PR → 2nd blocks on the app lock (visible in its drone log) then runs; (d) one
full green end-to-end run. Evidence to come from Drone build logs + cc-ci state (abra app ls /
lslocks / docker), cold from my own access path.
## 2026-06-10T05:00Z — wrapper exit-code fix verified + CORRECTION to my P1 pre-review (inbox consumed)
Consumed ADVERSARY-INBOX.md (deleted) — Builder reported an M2 live-verify finding + fix. Folded in:
**The defect (real, Builder-found, build 269 plausible#3):** the drone exec step shell is `set -e`.
On a NORMAL (green) harness exit the P1 EXIT trap still fired and its `kill -TERM -- -$PID` of the
already-exited process group returned ESRCH (exit 1), which under `set -e` poisoned the step's exit
status to 1 — a fully GREEN run (all tiers pass, level=4) reported RED.
**CORRECTION — my P1 pre-review was wrong on this point.** In my 04:23Z pre-review I claimed to have
"empirically tested" green-on-red exit-code masking and REFUTED it. That test was run with plain
`bash -c` WITHOUT `set -e` — the wrong shell mode. The real drone step runs `set -e`, where the bug
manifests. I re-ran the matrix correctly now (bash -e), reproducing the bug (old wrapper + green +
set -e → exit 1) and confirming I had the shell mode wrong. Lesson: model the EXACT runtime
(set -e) for shell-trap behavior. The Builder caught this live; I did not. Owning it.
NB the failure direction was false-RED (green reported red) — fail-safe-ish, not a green-on-red
(no failing run was ever reported green); still a real defect.
**The fix (e1c4198 on branch, merged to main b7a009c) — independently verified by me, cold under
`set -e` (the correct mode this time):**
```
setsid cc-ci-run runner/run_recipe_ci.py & PID=$!
trap 'kill -TERM -- "-$PID" 2>/dev/null || true' TERM EXIT
rc=0; wait "$PID" || rc=$?
trap - TERM EXIT
exit "$rc"
```
My 4-path matrix (all under `bash -e`, exact-shape repros):
- A green harness → step exit 0 ✓ (poisoning gone: `|| true` on the trap kill + `trap - EXIT` before exit)
- B **red harness (exit 7) → step exit 7 ✓ — NOT masked to green.** Critical false-GREEN check
PASSES: `wait || rc=$?` captures the real rc and `exit "$rc"` propagates it. The
"failing PR must report RED" gate is preserved by the fix.
- C old wrapper + green + set -e → exit 1 ✓ (bug reproduced — root-cause confirmed)
- D cancel (TERM to wrapper mid-wait) → wrapper exits 143 AND the child received TERM
(CHILD_GOT_TERM logged) ✓ — cancel-forwarding semantics unchanged; the `trap - TERM EXIT` runs
only AFTER `wait` returns (post-forward), so it can't disarm the forward during a real cancel.
Verdict on the fix: CORRECT and SAFE — resolves the false-RED poisoning without introducing
false-GREEN, and preserves cancel forwarding. Folds cleanly into the pending M2 review.
**M1 status unaffected:** M1 PASS was for the code/suites/lint/diff of d3fe9e2; this wrapper
exit-code-under-set-e is a LIVE behavior M1's checks could not exercise (the trap only runs in the
real drone exec shell). main now = d3fe9e2 + this .drone.yml wrapper fix; the fix is verified above.
Open for the formal M2 verdict: re-confirm lint green on the new .drone.yml (yamllint), the push
build green, and live (a) cancel-no-leak / (b) parallel both-green / (c) double-!testme blocks /
(d) one full green run — cold, once the Builder posts the M2 claim with evidence.
## M2(c): FAIL @2026-06-10T08:10Z — double-!testme same domain corrupts shared deploy-count → both runs RED + VETO
Proactive cold break-it probe of the live M2 evidence (M2 not yet formally `claim(conc)`'d — the
Builder's JOURNAL shows (c) "triggered" but NOT evidenced as PASS; I went straight to the Drone API
to verify the in-flight (c) runs independently, not to the JOURNAL narrative). I found a REAL defect
that breaks M2(c). Filed as BACKLOG-conc CONC-A1.
EVIDENCE (Drone API, recipe-maintainers/cc-ci, cold via /run/secrets/bridge_drone_token — my own
access path, not the Builder's word):
- (c) = builds **279 + 281**, both `event=custom PR=2 RECIPE=immich REF=a92b28d…` → SAME domain
`immi-ad3e33.ci.commoninternet.net`. Both `status=failure` (step `ci` exit_code=1).
- 281 (the blocked run): log `== app lock: ... in flight — waiting ==` @2s`== acquired ==` @194s,
which is exactly when 279's process exited (279 finished 05:07:35Z). **Lock serialisation + the
visible block line WORK** — that half of (c) is fine.
- 279 RED: `!! deploy-count 2 != 1 (DG4.1 violation)`.
- 281 RED: `FileNotFoundError: /tmp/ccci-deploys-immi-ad3e33….ci.commoninternet.net` at
run_recipe_ci.py:1213.
- Control build 275 (isolated immich, same fixed wrapper) → `deploy-count = 1`, GREEN. Confirms the
failure is concurrency-specific, NOT a pre-existing immich/wrapper regression.
ROOT CAUSE (code, confirmed):
- DG4.1 counter file is DOMAIN-keyed in shared /tmp, not per-run: `run_recipe_ci.py:930
/tmp/ccci-deploys-<domain>`. P3 isolated ABRA_DIR per run but this per-run state file was missed
(predates the restructure, ef44d46; the old recipe-flock serialised same-recipe runs end-to-end,
masking it).
- `deploy_app()` calls `_record_deploy()` (lifecycle.py:250) BEFORE `acquire_app_lock()` (:254,
introduced by P2 b302f3a) → the increment races OUTSIDE the lock. 281's single pre-lock
`_record_deploy` (@2s) bumps the shared counter 279 is using (→2, false violation), and 279's
end-of-run `os.remove(countfile)` (:1215) deletes the file under 281 → FileNotFoundError.
- Interleaving is fully reconstructed and self-consistent with the build timestamps (see CONC-A1).
This is squarely in M2(c) scope: the plan's DoD (c) requires the second run to "block … then RUN"
(implicitly green), and the phase's whole premise is "two concurrent !testme don't collide on
domain/volume/secrets." This is a domain-keyed-state collision — the restructure's narrower domain
lock no longer covers the deploy-count file. M1 (code/suites/lint/diff of d3fe9e2) is unaffected —
this is a live concurrency behavior M1's checks could not exercise; the tests/concurrency suite has
the matching blind spot (case 4 serialises acquire but never asserts deploy-count isolation across
two same-domain runs).
## VETO — M2 may NOT be marked DONE until CONC-A1 is fixed and I log a fresh (c) PASS
Forbidding `## DONE` in STATUS-conc until: (1) deploy-counter keyed per-run; (2) a tests/concurrency
case asserts same-domain deploy-count isolation; (3) live (c) re-run shows BOTH builds GREEN with
the visible block line and zero leakage; (4) (a),(b),(d) re-confirmed unaffected. Only I clear this.
(After this verdict I may consult JOURNAL-conc to contextualise — noting I had NOT read the (c)
journal reasoning before forming this FAIL; I verified from the Drone API + code directly.)
## 2026-06-10T08:20Z — CONC-A1 fix CODE-verified (veto conditions 1+2 met; 3+4 still pending — NOT cleared)
Builder fixed CONC-A1 (b6e12ef, merged main 139e319) and is re-running M2 live (a)(d). I
cold-verified the FIX CODE from my own clone + a fresh checkout on cc-ci (not the Builder's word):
- **Condition (1) per-run keying — MET.** `run_recipe_ci._run_state_path(name)` keys all four
run-scoped state files (`deploys`, `opstate`, `deps`, `depskip`) by `run_id()` + `os.getpid()`,
never domain. Grep: ZERO residual `ccci-<state>-{domain}` literals in prod code (only the
app-LOCK path stays domain-keyed, which is correct). All consumers env-read `CCCI_*_FILE`
(lifecycle:148, deps:72/155, generic:134) — no path re-derivation. Uniqueness holds even in the
manual fallback (`run_id()`→domain) because the `+pid` suffix separates two processes.
- **Condition (2) same-domain isolation test — MET, and proven non-tautological.**
tests/concurrency/test_run_state.py adds test_20/20b/20c. test_20c drives REAL processes + the
REAL lock + real `_run_state_path`/`_record_deploy`, reproducing the 279/281 interleaving: run A
reads `COUNT 1` (NOT polluted to 2 by B's pre-lock increment) and B's file survives A's remove
(no FileNotFoundError). **Mutation check (my own):** reverting `_run_state_path` to domain-keying
in a throwaway cc-ci clone → all 3 test_run_state cases FAIL (incl. test_20c). So the test
genuinely guards the fix.
- **Suites cold (fresh clone @4f6c955 on cc-ci):** unit 138 passed, concurrency 23 passed (was 20),
concurrency still NOT collected by the default `pytest tests/unit` run (0). lint not re-run here
(no .drone.yml/nix change in the fix; will confirm at the M2 claim).
**VETO NOT cleared.** Conditions (3) live (c) re-run BOTH builds GREEN + visible block line + zero
leakage, and (4) (a)/(b)/(d) re-confirmed on the fixed harness, still require the Builder's live
evidence (in flight). The code fix strongly predicts a (c) pass but M2 is a LIVE gate — I will
re-verify the (c) double-!testme cold from the Drone API once the Builder posts the M2 claim, and
only then clear the veto.
## 2026-06-10T08:43Z — live (c) round-2 (builds 290+291): serialization CONFIRMED via lslocks; delay is an immich-ML flake, NOT the restructure (not a verdict)
(b)+(d) re-passed on the fixed harness (builds 287 immich#2 + 288 plausible#3, parallel, both
success — I'll re-confirm at the M2 claim). (c) round 2 = builds 290+291 (both custom PR=2 immich,
same domain immi-ad3e33), started 08:22:30Z. I inspected the LIVE host state cold (my own ssh):
- **CORE INVARIANT DIRECTLY OBSERVED in the kernel lock table** — strongest possible proof of the
double-!testme serialization:
`lslocks`: pid 739163 (build 290) holds `WRITE` on cc-ci-app-immi-ad3e33….lock; pid 739341
(build 291) is blocked `WRITE*` on the SAME lock. Exactly one holder, one waiter, one inode.
- 290 (holder) is sleeping in `services_converged()` poll (hrtimer_nanosleep, no abra child) because
`immich-machine-learning` is stuck 0/1: its container repeatedly fails the healthcheck
(`non-zero exit (143): dockerexec: unhealthy container`, swarm restarting every 16 min). Current
attempt (08:43) has gunicorn up, health `starting` — slow/flaky ML readiness, not a deploy break.
- NOT caused by the restructure / teardown: 290's immich volumes (model-cache/postgres/uploads) +
.env are all from 290's OWN fresh deploy (08:23), not inherited from the earlier same-domain run
287. ML image present (1.36GB, no pull), host healthy (5.2Gi mem free, 65G disk). So this is an
immich-ML healthcheck flake, orthogonal to concurrency.
Bearing on M2(c): the SERIALIZATION mechanism under test is verified working live. The "both GREEN"
half of condition (3) is not yet demonstrated only because 290 is flake-blocked on immich-ML; if 290
REDs on deploy-timeout, (c) needs a clean re-run (flake, not a code fault). VETO unchanged — I still
require one clean (c) where both same-domain builds go GREEN with the block line + zero leakage.
Continuing to watch 290/291 to terminal.
## M2(c): PASS @2026-06-10T09:05Z — double-!testme same domain, CONC-A1 fixed; VETO LIFTED
(c) round-2 builds 290+291 (both `custom PR=2 immich`, same domain immi-ad3e33, on CONC-A1-fixed
main) both reached terminal **status=success**. Cold-verified from the Drone API + live host (my own
access path), not the Builder's word:
- **Both GREEN:** 290 success, 291 success (Drone API).
- **Visible block line (the (c) requirement):** 291 log —
`== app lock: another run of immi-ad3e33….ci.commoninternet.net is in flight — waiting ==`
then `== app lock: acquired … ==`. I ALSO observed the serialization directly in the kernel lock
table mid-run (lslocks: 290 held WRITE, 291 blocked WRITE* on the same inode; after 290 exited,
291 held it). Strongest possible proof of the double-!testme serialization invariant.
- **CONC-A1 regression GONE — the two exact round-1 failure points are now clean:**
- 290 (round-1 build 279 got false `deploy-count 2 != 1`) → now `deploy-count = 1 (expect 1)`,
all 5 tiers pass, level=4. Its run-keyed counter was NOT polluted by 291's concurrent pre-lock
`_record_deploy`.
- 291 (round-1 build 281 crashed `FileNotFoundError` at run_recipe_ci.py:1213) → now
`deploy-count = 1 (expect 1)`, all tiers pass, level=4, no traceback. Its own run-keyed countfile
survived 290's end-of-run remove.
- **Zero leakage after both:** 0 harness procs, 0 immich apps / services / volumes / secrets, no held
cc-ci locks. One unheld 0-byte leftover lockfile (mtime 08:46, 291's acquisition touch) — reaped
on sight by the next janitor probe, harmless by design.
- The ~20-min runtime each was an immich-machine-learning healthcheck slowness/flake (ML eventually
converged), NOT the restructure — already diagnosed in the 08:43Z note; serialization + isolation
both verified correct regardless.
**VETO LIFTED.** The CONC-A1 veto ("no DONE until CONC-A1 fixed + a fresh (c) PASS") is cleared:
conditions (1) per-run keying [code + mutation-proven], (2) same-domain isolation test
[non-tautological], and (3) live (c) both-GREEN + block line + zero leakage are ALL met. CONC-A1
closed in BACKLOG-conc.
**Still required before DONE (full M2 gate, not the CONC-A1 veto):** the Builder must post the formal
M2 claim in STATUS-conc with consolidated evidence, and I re-confirm condition (4) — specifically
**M2(a) cancel-mid-run re-run on the CONC-A1-fixed harness** (b+d already re-confirmed: builds
287+288 parallel both success on fixed main; a's only prior evidence (build 267) was on the
pre-CONC-A1, pre-wrapper-fix harness) — plus the push build green on current main. (a) re-run had
not yet appeared in Drone as of this verdict (Builder sequenced it after (c)). I will verify it cold
when it lands.
## M2: PASS @2026-06-10T08:55Z — merged + live-verified (a)(d) on final main 139e319/74ed240
Formal M2 gate verdict against the Builder's M2 claim (STATUS-conc, commit 74ed240). Formed from
the plan (SSOT), the code/git, the claim's verify recipe, and my OWN cold re-runs from my own clone
+ fresh checkouts/Drone-API on cc-ci — not the Builder's narrative. All seven claim items confirmed:
1. **Merge integrity** — `git diff 139e319 b6e12ef -- runner/ tests/ docs/ .drone.yml nix/` = 0 lines;
`b6e12ef ⊆ 139e319`; merge parents `2173894 ∘ b6e12ef`. So deployed main code == the CONC-A1 tree
I code-verified + mutation-proofed. No force-push (history linear). NB the claim mis-states the
first parent as `4ad55ed` (actual `2173894`, my M2(c)-FAIL commit) — immaterial: that's a state-
file commit, and the code-diff-empty check is authoritative.
2. **Push build green** — Drone push builds 283298 on main all `status=success`; no red push since
the merge.
3. **Suites + lint (cold, fresh clone on cc-ci)** — unit 138 passed, concurrency 23 passed
(concurrency NOT in the default unit gate), `lint: PASS` on final main 74ed240. test_run_state
mutation-proofed (reverting to domain-keying fails all 3 cases).
4. **(a) cancel-mid-run on fixed harness** — build 295 (custom immich#2): lockfile mtime 08:50:17
proves it acquired the app lock 7s in → canceled @08:51:05 MID-DEPLOY. After cancel (verified cold
~1 min later): 0 harness procs (no leaked python — old §8.1 gap stays closed), no held locks (lock
released), no immich app/.env/containers(even stopped)/services/volumes/secrets → ZERO leakage,
full teardown. Killed-step logs not API-retrievable (Drone truncates), but the end-state is the
actual test and it is clean.
5. **(b) parallel runs** — builds 287 (immich#2) + 288 (plausible#3), parallel, both
`status=success`, both `deploy-count = 1 (expect 1)`, level=4; host after = zero leakage.
6. **(c) double-!testme same PR** — builds 290 + 291 (same immich domain): both success, 291 logged
the block line then `acquired`, both `deploy-count = 1`, zero leakage. Serialization also observed
directly in the kernel lock table mid-run (lslocks). Covered in detail by my M2(c) PASS @09:05Z.
7. **(d) full green e2e** — build 287 (and 290): complete immich run, all 5 tiers pass, level=4.
Both M2-found fixes are folded in and independently verified: wrapper exit-code-under-set-e
(e1c4198/b7a009c, my 05:00Z note — red still propagates) and CONC-A1 run-keyed state files
(b6e12ef/139e319, my 09:05Z M2(c) PASS + mutation proof). The ~20-min (c) runtimes were an
immich-ML healthcheck flake (converged within DEPLOY_TIMEOUT=1500s), orthogonal to the restructure
(diagnosed 08:43Z). Unheld 0-byte leftover lockfiles are by-design (next-janitor tidy-sweep).
GUARDRAILS honored end-to-end: recipe-mirror PRs (immich#2, plausible#3) used as !testme targets
only, never merged/pushed; cc-ci main touched only by the gated merges (no force-push); no secrets in
any commit. RUN_APP_RE / services_converged / warm-canonical flows untouched (M1 diff review).
CONCLUSION: **M2 — merged + live-verified — PASS.** M1 PASS (04:38Z) + M2 PASS (here) are both fresh
in REVIEW-conc; no open VETO (CONC-A1 lifted). Per the phase DoD the Builder may now write `## DONE`
to STATUS-conc. (Post-verdict I may consult JOURNAL-conc to contextualize; I had NOT read its M2
reasoning before forming this verdict — verified from plan + code/git + Drone API + my own cold runs.)

View File

@ -0,0 +1,284 @@
# REVIEW-dstamp.md — Adversary verdicts for phase `dstamp`
Phase: investigate & solve the discourse abra-stamp drift (upgrade-HC1 stamps the
prev-base tag commit instead of the PR-head version, harness-neutral, since ~06-10).
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-dstamp-discourse-drift.md`. Gates M1, M2.
Verdict log is append-only. `review(...)`-prefixed commits carry verdicts (load-bearing
watchdog signal). Findings filed under `## Adversary findings` in BACKLOG-dstamp.md.
---
## Prep notes (NOT a verdict — no gate claimed yet) @2026-06-11T15:5x
Recon done cold before any Builder claim, to make M1/M2 verification fast and independent.
Anti-anchoring: formed only from the plan (SSOT), the harness code, and direct host evidence
— no dstamp JOURNAL exists yet; none read.
**Stamp mechanism (from code):** HC1's "stamp" = the `coop-cloud.<stack>.chaos-version`
docker service label abra writes on a `--chaos` deploy = the deployed recipe git commit
(`runner/harness/lifecycle.py:468 deployed_identity`, `runner/harness/generic.py:146
assert_upgraded`). Upgrade flow (`generic.py:226 perform_upgrade`): deploy prev-published
base → `recipe_checkout_ref(recipe, head_ref)` (git checkout -f head) → `chaos_redeploy`
(`abra app deploy --chaos`). HC1 asserts `chaos_commit == head_ref` (after stripping the
`+U` untracked-overlay marker). PASS requires the chaos-version to equal the PR head.
**Cold observable facts (from `/var/lib/cc-ci-runs/m2p-discourse/abra/recipes/discourse`
snapshot + live `~/.abra/recipes/discourse` on cc-ci, 2026-06-11):**
- Recipe HEAD `7ae7b0f` = "chore: upgrade to 0.9.0+3.5.0"; `git describe --tags` =
`0.7.0+3.3.1-9-g7ae7b0f` → HEAD is **9 commits past the newest annotated tag**
`0.7.0+3.3.1` (commit `eb96de9`). No `0.8.x`/`0.9.x` tag exists.
- The drift symptom (per plan): chaos-version stamped `eb96de94+U` = the **prev-base tag
commit** (= the upgrade base `0.7.0+3.3.1`), NOT the PR-head `7ae7b0f`.
- abra is **nix-pinned**: `abra version 0.13.0-beta-06a57de`, store path under
`/run/current-system` → binary drift requires a flake.lock/nixos-generation bump between
06-05 and 06-10 (verify against generations, don't assume).
**Open question I'll independently re-derive when M1 is claimed:** why the `--chaos`
redeploy after checkout-to-HEAD stamps the BASE commit (eb96de9), not HEAD (7ae7b0f).
Candidates to test cold: (a) re-checkout to head silently reverted (abra fetch/reset during
deploy); (b) abra chaos resolves the version from the app's recorded `.env` RECIPE/version
(= the base) rather than the working-tree HEAD; (c) the "env drift" since 06-10 = recipe/
mirror git state moved (unreleased commits pushed past last tag) or a tag re-pointed.
**Guardrail teeth I will enforce at M2:** HC1 must still FAIL on a genuinely wrong stamp
(synthesize a wrong-version deploy and show RED). Any "fix" that derives EXPECTED from
"what makes the test pass" rather than abra's documented behavior = automatic FAIL.
Status: idle, awaiting Builder to seed STATUS-dstamp.md and claim M1. Watchdog will ping
on the `claim(...)` commit.
---
## Independent probe findings @2026-06-11T17:3x (NOT a verdict — no M1 claim yet)
Anti-anchoring preserved: JOURNAL-dstamp NOT read. Root cause derived independently from
harness code, per-run artifacts (repro1/repro2 console logs), and direct docker service
inspect on cc-ci. Independently arrived at the same attribution as the Builder.
**Causal chain derived from code + direct evidence:**
1. `provide_ccci_overlay` (rcust-era addition) copies `compose.ccci.yml` into the per-run
recipe dir as an UNTRACKED file. Absent in run 184 (2026-06-05, which used the old
`install_steps.sh` path writing to canonical `~/.abra`) — consistent with run 184 having
no `+U` suffix and passing. The `+U` itself is stripped by HC1's `chaos_commit.split("+",1)[0]`
and is NOT the cause of drift.
2. abra reads `git HEAD = 7ae7b0f` and computes `chaos-version = 7ae7b0f7+U` CORRECTLY.
Confirmed via three bail-at-secrets manual repros + repro2 debug line
`taking chaos version: 7ae7b0f7+U`. abra and the per-run git checkout are EXONERATED.
3. `chaos_redeploy` passes `-c` (no_converge_checks) → `docker stack deploy` returns
immediately; Swarm rolling update runs asynchronously.
4. Discourse `compose.yml` (BOTH base `eb96de94` AND PR-head `7ae7b0f`) sets
`deploy.update_config: { failure_action: rollback, order: start-first, monitor: 5s }`
on the `app` service. Confirmed by direct `docker service inspect disc-ae10f0_..._app`.
5. With `order: start-first`, OLD + NEW task co-reside (~2× memory). Discourse's
Rails/Sidekiq precompile is memory-heavy; under the heavier host load since ~06-10
(warm keycloak and other rcust-phase stacks), the NEW task intermittently fails swarm's
5s update monitor → `failure_action: rollback` fires → Swarm REVERTS the app service
spec to PreviousSpec (base deploy, `chaos-version=eb96de94+U`).
6. `services_converged` blind spot: after rollback `UpdateStatus.State = "rollback_completed"`,
NOT in the blocking set `("updating", "rollback_started")` → returns True as if converged.
Under start-first the OLD task kept serving → `wait_healthy` also passes on the
rolled-back spec.
7. `deployed_identity` reads `.Spec.Labels` → rolled-back spec → `chaos-version=eb96de94+U`.
HC1 asserts head_ref `7ae7b0f76efb``eb96de94` → FAIL with misleading "re-checkout failed".
**Key disproving evidence (independent route):** repro1 was isolated (no concurrent discourse
run, domain `disc-ae10f0` used for the first time) and STILL showed the drift. This refuted
the pure-concurrency hypothesis BEFORE reading the Builder's evidence or JOURNAL.
**Intermittency explained (run 184 ✓ solo 06-05; clustered/repro1/repro4 ✗; repro2 ✓):**
Whether the new start-first task survives the 5s monitor depends on momentary memory pressure.
Run 184: solo + lighter host load + pre-rcust overlay path → new task survived. repro2: warm
volumes/containers from repro1 → faster Rails precompile → task survived. The "since ~06-10
on every run" pattern = heavier baseline load from warm rcust-phase stacks after run 184.
**Fix analysis (Builder commit 0cc31a5 — read before JOURNAL):**
*Part 1 — overlay `order: stop-first`*: Old task stops before new starts → new boots with full
host memory → no OOM under the 5s monitor → no spurious rollback. `failure_action: rollback`
intentionally preserved so a genuinely broken head still rolls back and is caught.
ASSESSMENT: **CORRECT AND SUFFICIENT** for eliminating the spurious-rollback trigger.
*Part 2 — `lifecycle.assert_upgrade_converged`*: Called in `perform_upgrade` immediately after
`chaos_redeploy`, before `wait_healthy`. Polls `docker service inspect
--format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{else}}none{{end}}'` until terminal.
Returns on `""|"none"|"completed"`; raises on `"rollback_completed"|"rollback_paused"|"paused"`;
polls on `"updating"|"rollback_started"`; times out at `meta.DEPLOY_TIMEOUT`.
ASSESSMENT: **CORRECT** — closes the wait_healthy-masking blind spot. Makes a swarm rollback
an HONEST upgrade failure ("head did not stay healthy") rather than a misreported stamp mismatch.
HC1 commit-match logic is unchanged; this only makes the rollback visible before HC1 runs.
**One concern flagged (not a blocker — defense-in-depth covers it):**
`assert_upgrade_converged` has a theoretical race window: on the very first poll, Docker may
not yet have transitioned from a prior `"completed"` state to `"updating"` (tiny gap between
`docker stack deploy` returning and the Swarm manager scheduling the roll). If the race fires,
the function returns OK on `"none"`, then the rollback happens silently afterward.
Mitigation: with `stop-first` (fix part 1), a post-assert-converged rollback leaves NO serving
task during the rollback → `wait_healthy` also FAILS → the test result is still FAIL, just
with a less specific error ("wait_healthy timeout" rather than "swarm rolled back"). HC1 is
NOT weakened even if the race fires. No action required unless a recipe uses `start-first`
where a post-race rollback could masquerade as a clean upgrade.
**UPDATE — race concern CLOSED by Builder (commit e9c26c7 `harden(dstamp)`):**
Builder addressed the race with a 2-phase protocol:
- **Pre-redeploy**: `update_status_started(domain)` snapshots `UpdateStatus.StartedAt`.
- **Phase 1**: polls until `StartedAt` advances past the snapshot (new update scheduled) OR
state is `"updating"/"rollback_started"`. 30s grace: if no new update appears → no-op
redeploy, nothing to converge.
- **Phase 2**: now that the NEW update is confirmed in flight, waits for terminal state
(same logic as before, but with confidence it's the right update).
Assessment: **CORRECT AND COMPLETE**. Phase 1 deterministically distinguishes the new update
from stale base-deploy terminal state. No new failure modes introduced. The grace period (30s)
is generous relative to Docker's near-immediate scheduling. Race concern fully closed.
**Status:** no `claim(dstamp)` commit yet. Awaiting M1 claim to issue formal verdict.
---
## M1: PASS @2026-06-11T17:36Z
Cold verification from `/srv/cc-ci/cc-ci-adv`. JOURNAL-dstamp not read before verdict (anti-anchoring).
**Check 1 — Recipe policy at 7ae7b0f76efb:** PASS
`cd ~/.abra/recipes/discourse && git checkout -q 7ae7b0f76efb && grep -nA3 update_config compose.yml`
`failure_action: rollback`, `order: start-first` confirmed present at lines 33-35. Direct evidence the
discourse app service is configured to rollback+start-first at the PR-head.
**Check 2 — abra CONSTANT (no binary change 06-05→06-10):** PASS
`for g in $(ls -d /nix/var/nix/profiles/system-*-link); do ...readlink -f $g/sw/bin/abra; done`
→ Gens 2-11 all `/nix/store/bf6azhpi8bi5491n8i4bhjm1z7fva7pb-abra-0.13.0-beta/bin/abra`.
Gen1 differs (pre-bootstrap), gens 4-11 (2026-06-01 onward) identical. abra version change as
cause of drift definitively ruled out by direct evidence.
**Check 3 — Direct rollback evidence (repro4):** PASS
`grep -E 'DSTAMP|UpdateStatus|PreviousSpec|chaos-version' /var/lib/cc-ci-runs/dstamp-repro4.console.log`
→ Line immediately after chaos_redeploy:
- `UpdateStatus.State="updating"` (in flight)
- `Spec.Labels chaos-version="7ae7b0f7+U"` (abra correctly applied HEAD)
- `PreviousSpec.Labels chaos-version="eb96de94+U"` (the base, what swarm reverts to)
→ HC1 line: `chaos-version=eb96de94+U` (AFTER rollback completed) → mismatch → FAIL
Causal chain proven in a single artifact: abra stamped correctly, swarm rolled back, label reverted.
Mechanism confirmed: start-first co-residency → OOM under monitor → failure_action:rollback → PreviousSpec.
**Check 4 — Fix present:** PASS
- `runner/harness/lifecycle.py`: `update_status_started` (line 511) + `assert_upgrade_converged` (line 526).
Phase-1 polls until StartedAt advances past prev_started (or in-flight state seen) → closes race.
Phase-2 terminal: `completed`=OK; `rollback_completed`/`rollback_paused`/`paused`=FAIL with honest message.
- `runner/harness/generic.py:268-278`: `prev_started = update_status_started(domain)` called BEFORE
`chaos_redeploy`, then `assert_upgrade_converged(domain, timeout=DEPLOY_TIMEOUT, prev_started=prev_started)`
called immediately after — BEFORE `wait_healthy`. Correct call order.
- `tests/discourse/compose.ccci.yml:54-55`: `deploy.update_config.order: stop-first` with full WHY
comment citing direct evidence (dstamp-repro1/4) and stating `failure_action: rollback` is LEFT INTACT.
Both commits 0cc31a5 + e9c26c7 verified present (git log --oneline).
**Check 5 — Fix works (dstamp-fix1 and dstamp-fix2):** PASS
- `dstamp-fix1`: `upgrade-converged: disc-ae10f0_ci_commoninternet_net_app swarm UpdateStatus=completed`
+ `upgrade→PR-head: head_ref=7ae7b0f7 chaos-version=7ae7b0f7+U version=0.7.0+3.3.1→0.9.0+3.5.0`
+ `test_upgrade_reconverges PASSED`. Level=2 (install+upgrade only, backup/functional not in STAGES).
- `dstamp-fix2`: same params, same domain, same result — second reliability run confirms.
Both runs: chaos-version=7ae7b0f7+U (head), NOT eb96de94+U (base). Fix is deterministic.
**Check 6 — Blast-radius:** PASS
- n8n: runs 162 (level=4, upgrade=pass) and 47 (level=4, upgrade=pass). Run 162 dated post-06-10
(when discourse was failing) → n8n not affected despite same rollback+start-first policy.
- keycloak: runs 155 (level=4, upgrade=pass) and 187 (level=4, upgrade=pass). Same conclusion.
- `assert_upgrade_converged` now provides a general harness backstop for all rollback-policy recipes.
No overlay change needed for keycloak/n8n (lighter apps, no OOM symptom in evidence).
- drone/traefik: infra, no recipe-CI upgrade tier. No action needed.
**HC1 teeth preserved (code inspection):** `generic.py:174-175``assert_upgraded` logic is UNCHANGED:
`chaos_commit = chaos.split("+",1)[0]`; assertion `head_ref.startswith(chaos_commit) or
chaos_commit.startswith(head_ref)`. `assert_upgrade_converged` runs BEFORE `assert_upgraded`; if a
rollback occurs it raises FIRST with the honest "head did not stay healthy" message; if no rollback occurs,
HC1 commit-match assertion still runs unmodified. A deliberately wrong stamp (e.g. deploying eb96de94
as the chaos version) would still fail HC1 exactly as before. M2 will demonstrate this with a live negative test.
**One nuance (not a blocker):** The "06-05→06-10 change" being specifically "heavier resident load from
rcust-phase stacks" is circumstantially supported by the timeline, but repro1 (isolated, no concurrent apps)
also showed drift — the mechanism fires under general memory pressure during discourse's precompile, not
only when other apps are warm. The exact delta between run 184 (06-05, passed) and subsequent runs is
intermittency of memory pressure, proven by repro2 (warm volumes → faster precompile → task survived) vs
repro4 (fresh boot → slower precompile → task failed). The ROOT CAUSE mechanism is proven by direct
evidence; the specific "what changed between 06-05 and 06-10" reduces to: heavier/more-variable memory
pressure, the mechanism was always latent. This doesn't weaken M1 — the fix eliminates the exposure.
**Verdict: M1 PASS.** Root cause attributed by direct evidence; minimal reproducible demonstration
confirmed; fix (stop-first overlay + assert_upgrade_converged) implemented and working; HC1 unweakened;
blast-radius sweep complete. Builder cleared to proceed to M2.
---
## M2: PASS @2026-06-11T17:58Z
Cold verification from `/srv/cc-ci/cc-ci-adv`. JOURNAL-dstamp not read before verdict (anti-anchoring).
**Check 1 — Build 450 results (level, tiers, flags):** PASS
`cat /var/lib/cc-ci-runs/450/results.json`:
- `"level": 5`
- `"recipe": "discourse"`, `"ref": "7ae7b0f76efb"`, `"pr": "2"`
- All tiers: `"install": "pass"`, `"upgrade": "pass"`, `"backup": "pass"`, `"restore": "pass"`, `"custom": "pass"`
- All rungs: `"install": "pass"`, `"upgrade": "pass"`, `"backup_restore": "pass"`, `"functional": "pass"`, `"lint": "pass"`
- `"clean_teardown": true`, `"no_secret_leak": true`
- Timestamp: `"finished": 1781199631.4...` (2026-06-11 ~17:40 UTC) ✓
- `screenshot.png` present (discourse functional screenshot)
**Check 2 — JUnit XML: test_upgrade_reconverges PASS (HC1 satisfied):** PASS
`grep -c '<failure\|<error' upgrade__generic__test_upgrade.xml` → 0
Full XML: `<testcase classname="tests._generic.test_upgrade" name="test_upgrade_reconverges" time="0.260"/>`
(no `<failure>` child). `test_upgrade_reconverges` directly calls `generic.assert_upgraded(live_app, meta)`.
`assert_upgraded` at `generic.py:174-175` does the HC1 commit-match: `chaos_commit == head_ref`.
Test PASSED → `chaos_commit = 7ae7b0f7` matched `head_ref = 7ae7b0f7`
**Check 3 — PR comment 14347 (!testme path):** PASS
Comment 14346 body = `!testme` (the trigger).
Comment 14347 body (bot response):
`<!-- cc-ci:testme -->\n🌻 **cc-ci** — \`discourse\` @ \`7ae7b0f7\` ✅ **passed**\n[...links to run 450 summary.png + badge + drone build 450...]`
Confirmed via Gitea API. Run directory `/var/lib/cc-ci-runs/450/` exists with full contents.
!testme → bridge ack → drone build 450 → run 450 results → PR comment ✅ passed. Path verified.
**Check 4 — DEFERRED entry closed:** PASS
`machine-docs/DEFERRED.md` lines 346-366: ✅ RESOLVED @2026-06-11 (phase dstamp, Builder) with:
- Root cause narrative (rollback mechanism)
- Direct evidence pointer (dstamp-repro4.console.log)
- Fix commits (0cc31a5 + e9c26c7)
- Real CI proof (drone build #450, LEVEL 5)
- Blast-radius note (only discourse; harness guard covers all rollback-policy recipes)
- Cross-references (STATUS/JOURNAL/REVIEW-dstamp)
**Check 5 — HC1 teeth (wrong stamp still FAILs):** PASS
*Negative control (pre-fix, existing run):* `m2p-discourse/results.json` shows HC1 caught wrong stamp:
`AssertionError: upgrade deployed chaos commit 'eb96de94+U', not the intended PR-head '7ae7b0f76efb'
— the re-checkout to the code under test failed, so the upgrade is not exercising the PR's changes (HC1)`
This is HC1 raising on `eb96de94 ≠ 7ae7b0f7`. HC1 commit-match assertion WORKS.
*Code unchanged (from M1):* `generic.py:174-175` commit-match assertion unmodified. The fix adds
`assert_upgrade_converged` BEFORE `assert_upgraded` — it catches rollback EARLIER with an honest message
but does NOT bypass HC1. If a non-rollback wrong stamp were deployed (e.g. abra bug stamping wrong commit),
`assert_upgrade_converged` would see `completed` and pass, then HC1 would FAIL on the commit mismatch.
*Post-fix rollback path:* `assert_upgrade_converged` raises `RuntimeError` on `rollback_completed` →
upgrade FAILS with honest "head did not stay healthy" → HC1 doesn't even run but test is RED.
Both paths (rollback → caught by assert_upgrade_converged; wrong stamp without rollback → caught by HC1)
still FAIL. The pre-fix negative controls (m2p-discourse, repro1, repro4) demonstrate the wrong-stamp
path is always caught; the fix only changes HOW it's reported and at which point.
**Blast-radius (confirmed at M1, still valid):** Only discourse affected. keycloak/n8n PASS L4
in 06-10/06-11 era. General `assert_upgrade_converged` guard now covers all rollback-policy recipes.
**Phase DoD summary:**
- ✅ Drift mechanism attributed with reproducible evidence (repro4 direct evidence)
- ✅ Fixed at the true root (stop-first overlay + assert_upgrade_converged)
- ✅ Discourse back at real level in real CI via drone !testme (build 450, LEVEL 5)
- ✅ No other recipe silently affected (blast-radius sweep, keycloak/n8n PASS)
- ✅ HC1 unweakened and adversarially re-proven (m2p-discourse negative control + code inspection)
- ✅ DEFERRED closed with pointers
**Verdict: M2 PASS. All phase dstamp DoD items satisfied. Builder cleared for ## DONE.**

184
machine-docs/REVIEW-kuma.md Normal file
View File

@ -0,0 +1,184 @@
# REVIEW — phase `kuma` (uptime-kuma create-a-monitor functional test)
Adversary verdict log. Append-only. SSOT: `cc-ci-plan/plan-phase-kuma-monitor.md`.
## Phase orientation (2026-06-11T18:03Z)
Builder clone: `/srv/cc-ci/cc-ci`; Adversary clone: `/srv/cc-ci/cc-ci-adv`.
Phase goal: add functional test that completes uptime-kuma's first-run setup wizard and exercises
its core function — create a monitor, see it probe a target, assert UP + real probe timestamp.
Negative test (monitor → dead target → DOWN) required if it fits the runtime budget.
Two gates:
- **M1** — test implemented + green locally; approach justified; bounded waits; real assertions
- **M2** — drone-path green (≥2 consecutive runs); flake check; DEFERRED closed
Pre-phase independent research notes:
- uptime-kuma uses Socket.IO for ALL management operations (setup wizard, login, monitor CRUD)
- Existing tests: Socket.IO handshake (EIO v4), SPA branding, health check — NONE exercise wizard/monitor
- Two viable approaches per plan: (a) python-socketio client speaking events; (b) Playwright UI
- Key verification concerns for M1:
- Probe reality: must confirm a *real* HTTP check occurred (timestamp advance + status from
uptime-kuma's state, not echo of config)
- Secret safety: generated admin creds must not appear in logs or test output
- Budget: target ≤90s added to functional tier; must use bounded poll not sleep
- Negative teeth: dead-target monitor must go DOWN (proves probe isn't stub) — required unless
runtime budget forces explicit justification
- Existing `tests/uptime-kuma/functional/` dir has 3 files: health_check, socketio_handshake,
spa_branding — all pass in CI (build #91 was green for uptime-kuma level 5)
- Phase plan says new test goes in `tests/uptime-kuma/functional/` (or `playwright/` if option b)
## Adversary pre-flight checks (2026-06-11T18:03Z)
uptime-kuma Socket.IO event map (from source / prior investigation):
- Setup wizard: `setup` event with `{username, password}` → response `{ok: true}`
- Login: `login` event with `{username, password, token: ""}` → response `{ok: true, token: "..."}`
- Add monitor: `add` event with monitor config → response `{ok: true, monitorID: N}`
- Heartbeat list: `heartbeatList` event or `uptime` event to check recent probe status
- Monitor status: `getMonitorList` or heartbeat events contain `{status: 1}` (UP) or `{status: 0}` (DOWN)
Adversary independent acceptance criteria (what I will cold-verify for M1):
1. Test file in correct location per plan (tests/uptime-kuma/functional/ or playwright/)
2. Setup wizard completed and login token obtained (not hardcoded)
3. Monitor created pointing at a harness-controlled URL (not a stub/no-op)
4. Wait loop is BOUNDED (deadline/max_wait, not open-ended sleep)
5. Assertion is on ACTUAL probe data: at minimum one heartbeat with status=1 + timestamp > deploy time
6. Admin credentials NOT printed/logged in test output
7. Negative test included OR explicit runtime-budget justification in DECISIONS.md
8. Runtime ≤ ~90s added (measure from CI timing)
## Independent pre-flight findings (2026-06-11T18:05Z)
**Critical: python-socketio NOT available on cc-ci.**
```
cc-ci-run -c 'import socketio' # → ModuleNotFoundError: No module named 'socketio'
cc-ci-run -c 'from playwright.sync_api import sync_playwright; print("ok")' # → ok
```
Implication: option (a) python-socketio requires a harness.nix + nixos-rebuild change; option (b)
Playwright works immediately from existing infrastructure. Builder must justify their choice in
DECISIONS.md regardless.
**uptime-kuma recipe pinned at 2.2.1** (image `louislam/uptime-kuma:2.2.1`).
Socket.IO port 3001, routed through Traefik `web-secure` entrypoint.
**uptime-kuma Gitea mirror exists** (recipe-maintainers/uptime-kuma), no open PRs yet. Builder
will need to create a test PR.
**Real probe evidence requirements I will enforce at M1 cold-verify:**
- heartbeat data must contain entries with `status` field (1=UP, 0=DOWN)
- heartbeat timestamps must be AFTER test start (not from config echo)
- For uptime-kuma 2.x: `heartbeatList` socket event OR API poll at `/api/status-page/heartbeat/...`
carries real probe results; event `uptime` also carries historical data
- The monitor's first heartbeat entry is sufficient if it has: `status: 1`, `time` > deploy timestamp
Builder has not yet started (no STATUS-kuma.md, no kuma commits). Waiting for M1 claim.
---
## M1: PASS @2026-06-11T18:26Z
**Claim commit:** `fe8922c claim(kuma): M1 PASS — test_monitor_wizard green at LEVEL 5 via drone build #460`
**Test commit:** `8da59cf feat(kuma): implement wizard+monitor Playwright test`
### Cold-verify evidence (Adversary-independent, from own clone + ssh cc-ci)
**1. Test file location and content**
- File: `tests/uptime-kuma/playwright/test_monitor_wizard.py` (167 lines)
- Correct placement per plan §2 "option b" + discovery.py `playwright/` subdir
- Discovery confirmed: `runner/harness/discovery.custom_tests` recurses into `playwright/`
- `live_app` fixture from root `tests/conftest.py` works (session-scoped, reads `CCCI_APP_DOMAIN`)
**2. Drone build #460 results (read from /var/lib/cc-ci-runs/460/results.json on cc-ci)**
```
level: 5
recipe: uptime-kuma ref: eb4521cc5d77
functional.test_uptime_kuma_root_serves [pass] 20ms
functional.test_socketio_polling_handshake [pass] 26ms
functional.test_uptime_kuma_spa_has_branding [pass] 27ms
playwright.test_monitor_wizard_and_probe [pass] 2817ms
clean_teardown: True
no_secret_leak: True
playwright count: 1
```
All tiers PASS: install/upgrade/backup/restore/custom/lint = Level 5.
**3. Probe reality**
- `test_monitor_wizard_and_probe` PASSED with both positive and negative assertions:
- Self-probe monitor → status "Up" (requires real Socket.IO heartbeat from uptime-kuma server)
- Dead-port monitor (`127.0.0.1:19999`) → status "Down" (proves probe engine not a stub)
- Heartbeat datetime row present (regex `\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}`) — real timestamp
- 2.817s runtime proves fast connection-refused (dead-port negative check confirmed real)
**4. Secret safety**
- `_pw` (64-char UUID hex) used only in `.fill()` calls — never printed, never in assertion messages
- `no_secret_leak: True` confirmed by independent results.json read
**5. Approach justification**
- `machine-docs/DECISIONS.md` entry "2026-06-11 — uptime-kuma: Playwright (option b)" present
- Confirms python-socketio absent, Playwright handles Socket.IO transparently, selectors confirmed
in 2.2.1 compiled bundle `dist/assets/index-D_mnxLA0.js`
**6. Runtime budget**
- 2.817s actual ≪ 90s target
**7. Nothing weakened**
- All 3 existing custom tests still PASS (health_check, socketio_handshake, spa_branding)
- No existing assertions removed or softened
**8. PR comment**
- git.autonomic.zone/recipe-maintainers/uptime-kuma/pulls/3 shows:
`🌻 cc-ci — uptime-kuma @ eb4521cc ✅ passed`
### M1 verdict: **PASS** — Builder cleared to proceed to M2.
Note: build #462 (flake-check second run for M2) was already in progress at time of this verdict.
DEFERRED close + PARITY.md update are M2 pre-conditions per BACKLOG.
---
## M2: PASS @2026-06-11T18:32Z
**Claim commit:** `9afdf3d claim(kuma): M2 — build #462 LEVEL 5 PASS (flake #2); DEFERRED closed; PARITY updated`
### Cold-verify evidence (Adversary-independent)
**1. Build #462 results (read from /var/lib/cc-ci-runs/462/results.json on cc-ci)**
```
level: 5 recipe: uptime-kuma ref: eb4521cc5d77
functional.test_uptime_kuma_root_serves [pass] 16ms
functional.test_socketio_polling_handshake [pass] 26ms
functional.test_uptime_kuma_spa_has_branding [pass] 27ms
playwright.test_monitor_wizard_and_probe [pass] 2746ms
clean_teardown: True no_secret_leak: True playwright count: 1
```
**2. 2 consecutive green runs**
- Build #460: Level 5, `test_monitor_wizard_and_probe` PASS 2817ms
- Build #462: Level 5, `test_monitor_wizard_and_probe` PASS 2746ms
- Both same ref (eb4521cc), same recipe, same PR #3
**3. DEFERRED.md closed**
```
[x] CLOSED @2026-06-11 (Builder, phase kuma): tests/uptime-kuma/playwright/test_monitor_wizard.py
implemented and proven in real CI … Drone builds #460 + #462 both LEVEL 5 …
```
**4. PARITY.md updated**
- New row for `tests/uptime-kuma/playwright/test_monitor_wizard.py` with full rationale
- Documents Up/Down probe, heartbeat datetime, Socket.IO-driven status
**5. PR comment build #462**
- `🌻 cc-ci — uptime-kuma @ eb4521cc ✅ passed`
### Phase DoD check
Per `plan-phase-kuma-monitor.md` §5:
- ✅ uptime-kuma proves actual function (wizard + real probe — Up AND Down confirmed)
- ✅ Flake-checked (2 consecutive Level 5 green runs #460 + #462)
- ✅ Budget held (2.752.82s actual ≪ 90s target)
- ✅ DEFERRED checked off (entry `[x] CLOSED @2026-06-11`)
- ✅ M1 fresh PASS (filed 2026-06-11T18:26Z)
- ✅ M2 fresh PASS (this entry)
- No VETO standing
### M2 verdict: **PASS** — all DoD satisfied. Builder may write `## DONE`.

148
machine-docs/REVIEW-lvl5.md Normal file
View File

@ -0,0 +1,148 @@
# REVIEW — Phase lvl5 (L5 lint rung + de-cap) — Adversary verdicts
Cold-verification ledger (append-only). Each verdict formed from the plan (SSOT), the code/git
history, the verification info in STATUS-lvl5.md, and my own cold re-run — NOT from JOURNAL
(anti-anchoring, §6.1). JOURNAL not consulted before this verdict.
---
## M1 — Implementation complete (pre-merge): **PASS** @ 2026-06-11T07:54Z
Branch `phase-lvl5` @ `3d8d286cf3f2df7d164bf458f07bbb916cc18f2b` (claim 24baac5). Implementation
deliberately NOT on main (reverts 589943f/cd62743 hold it pre-merge) — confirmed; only the
DECISIONS entry (392f7df) is on main. Verified from a **fresh cold clone** on the cc-ci host
(`/tmp/adv-lvl5`, cloned from origin, checked out phase-lvl5; HEAD matched 3d8d286).
**Acceptance per plan §4 M1 — all satisfied:**
1. **Cold clone + HEAD**`git rev-parse HEAD` = 3d8d286 ✓ (matches claim).
2. **Unit suite (CI host venv)**`cc-ci-run -m pytest tests/unit/ -q`**246 passed** in 5.32s
✓ (matches claimed count).
3. **Repo lint**`nix develop .#lint --command bash scripts/lint.sh`**lint: PASS** ✓.
4. **De-capped `compute_level` correct on ALL 4 mission worked examples** (hand-traced against
`level.py` + verified by the rewritten test_level.py):
- install✔ upgrade✘ backup✔ functional✔ lint✔ → **L1** (fail blocks) ✓
- install✔ upgrade✔ backup skip functional✔ lint✔ → **L5** (intentional skip climbs — the
de-cap; was L2 under old rule) ✓
- install✔ upgrade✔ backup **unver** functional✔ lint✔ → **L2** (unver blocks) ✓
- all four ✔, lint unver → **L4** (unverified top rung not earned) ✓
Formula `level = max i: rung_i==pass ∧ all j<i ∈ {pass,skip}` implemented exactly
(pass→advance, skip→continue, fail/unver→break). 0 if none.
5. **N/A classification table matches code.** `derive_rungs` (results.py) implements the
DECISIONS table verbatim, incl. the subtle upgrade split: `skip ∧ ¬has_upgrade_target`
`skip` (structural, climbs); a prior-stage abort (`skip`/None WITH a target, undeclared) →
`unver` (blocks). install never skips; backup_restore skip iff not-capable or EXPECTED_NA;
functional skip iff EXPECTED_NA else unver; **lint pass/fail-or-unver, NEVER skip** (no N/A
escape hatch, §2 item 5; EXPECTED_NA["lint"] ignored). Default-unclassifiable = unver. ✓
6. **§2.3 mirror-context decision reviewed — NO rule filtered.** Executor (`lint.py`) lints a
pristine scratch clone of the per-run tree at the tested sha; origin→local path makes abra's
tag force-fetch work offline (no auth, no go-git "reference not found"), and the run's real
tags ride along so R014 evaluates real content. The plumbing pollution is solved by context,
not exemptions. Confirmed by **real-abra behavioral probe** (not just synthetic fixtures):
- `run_lint("hedgedoc", …)` clean → `{'status':'pass',...}` ✓ (proves scratch-clone makes
abra lint actually run — no FATA).
- inject lightweight tag → `{'status':'fail','detail':'error rule(s) unsatisfied: R014',
'rules_failed':['R014']}` ✓ (proves the classifier has teeth; R014 is NOT suppressed).
Classifier correctly recognizes `rc=0`-with-critical-errors (parses table + "critical errors
present" sentinel, fails closed on disagreement); only content-FATA ("unable to validate
recipe") → fail, all other non-zero → unver.
7. **Verdict-neutrality — code inspection + targeted tests.** `run_lint` invoked once
(run_recipe_ci.py:942), defaults to `unver`, double-wrapped in try/except (crash → stays
unver, non-fatal print), runs BEFORE the tiers at `head_ref` (the exact tested ref). Its
result is consumed ONLY at build_results (line 1278, "non-fatal, verdict unaffected"); NO
verdict computation reads it. 60s hard budget, never raises. Targeted tests pass:
`test_run_lint_missing_recipe_is_unver_not_raise`,
`test_build_results_no_lint_given_is_unverified_never_pass`. ✓
8. **cap/cap_reason/capped fully removed** from active code/schema/card/dashboard/docs. grep over
runner/dashboard/docs/tests finds the words only in (a) the unrelated screenshot timeout-cap,
(b) "capable"/max-users, (c) explicit test/doc assertions that the fields are ABSENT in
schema 2 and that old schema-1 artifacts (which carry level_cap_reason) still render with no
relabeling — history-compat covered by test_card/test_dashboard (green). ✓
No verdict regression, no run-verdict coupling, no rule suppression, no silent pass. **M1 PASS.**
Builder cleared to merge phase-lvl5 → main and proceed to P3/P4 (M2). No VETO.
**Scope note (carried to M2):** M1 verified the lint executor + classifier + level math on real
abra output and the unit surface. M2 must still prove, on real CI end-to-end: ≥1 genuine L5,
≥1 lint-blocked L4, ≥1 N/A-skip climb, drone `!testme` ×2, canaries at designed levels under the
NEW formula, old artifacts rendering live, durations not inflated (lint ≤~60s; observed ~0.7s),
the before/after level table for ALL enrolled recipes, and card/dashboard/badge visually (PNG/SVG).
---
## M2 — Proven in real CI: **PASS** @ 2026-06-11T11:27Z
Main @ `a521d43` (impl merged 08e6cc8 + PR-path fix 68c3486). Cold-verified from a **fresh clone
of main** on the cc-ci host (`/tmp/adv-m2`), drone API (token from /run/secrets), live HTTPS
artifacts, and Read PNGs. JOURNAL not consulted before this verdict.
**Acceptance per plan §4 M2 + §6 DoD — all satisfied:**
1. **Unit suite + lint (fresh clone main).** `cc-ci-run -m pytest tests/unit/ -q` → **247 passed**;
`scripts/lint.sh` → PASS. The new PR-path regression test
`test_run_lint_detached_pr_tree_lints_exact_ref` passes (covers fix 68c3486: abra lint checks
out the repo DEFAULT BRANCH, so a detached scratch clone would FATA or silently lint a stale
branch; fix forces local main AT the tested ref + repoints origin to scratch → lints the PR
head content). My M1 smoke only exercised the HEAD path; this closes that gap.
2. **Genuine L5 (full clean climb).** Runs 398 hedgedoc / 406 immich / 407 plausible / 413 mumble:
results.json schema=2, level=5, all 5 rungs pass, no cap keys, drone build status=success.
3. **Lint-blocked L4, verdict-neutral — the central claim.** Run 405 custom-html PR4:
results.json level=4, lint=fail rules_failed=[R011], all five TIERS pass
(install/upgrade/backup/restore/custom), **drone build 405 status=SUCCESS**, and the bridge
`reflected outcome build 405 (custom-html PR #4): success` to the PR. A lint failure caps the
level at 4 but does NOT flip the run verdict. Card PNG shows lint ✗ FAIL red, "level 4 of 5",
badge #a0b93f. Neutrality proven BOTH directions (415/416 red with lint=pass — see #6).
4. **N/A-skip climb (the de-cap).** Run 399 custom-html-tiny: backup_restore=skip with declared
reason in skips.intentional ("stateless static file server … no backupbot.backup label"),
other rungs pass, **level=5** (was L2 @ #205). Card PNG shows backup/restore "⊘ INTENTIONAL
SKIP" + reason, level 5 of 5. A formerly-capped non-backup-capable recipe now climbs.
5. **Drone !testme path ×3, GENUINE (not manual API).** ccci-bridge poll logs:
`[poll] triggered build 405 for custom-html@36b362aa (PR #4, comment 14332)`,
`406 immich@107d7220 (PR #2, comment 14333)`, `407 plausible@13458fac (PR #3, comment 14334)`,
each followed by `reflected outcome … success`. Build params confirm RECIPE/PR/REF match the
real PR heads. ≥2 required; 3 delivered, all on real PRs showing the lint rung.
6. **Canaries at re-derived designed level + backup-fail still blocks.** 415 (bkp-bad) / 416
(rst-bad): drone build status=**failure** (red), results.json level=1, rungs {install pass,
upgrade skip(structural — no version tags on SRC+REF mirror), backup_restore FAIL, functional
unver, lint pass}. New-formula trace: install(1) → upgrade skip(climb) → backup_restore
fail(BLOCK) → L1. RED is caused by the failing backup/restore TIER (verdict logic untouched),
NOT by lint (lint=pass). Re-derivation is sound; matches OLD-rule level too (old: upgrade N/A
caps at L1) — no regression, same designed level, red either way.
7. **Unverified-blocks (mission example #3), synthesized.** host run
`/var/lib/cc-ci-runs/lvl5-unver-demo/results.json`: schema=2, level=2, rungs {install pass,
upgrade pass, backup_restore UNVER, functional pass, lint pass}, skips.unintentional=
[backup_restore]. backup unver blocks at L2 even though functional+lint pass above it. ✓
8. **Durations not inflated.** drone build wall-times: 398=100s, 399=45s, 405=61s, 406 immich=199s
(shot baseline 198-199s), 407 plausible=164s (shot baseline 166s), 413=80s. lint adds ~0.7s;
the two cross-phase baselines are flat (407 slightly faster). No duration regression.
9. **Old artifacts render, no relabel.** /runs/370 (schema=1, level=4, level_cap_reason present)
serves 200 (results.json + summary.png); dashboard `/` + `/recipe/immich` 200 with mixed
schema-1/schema-2 rows; unit history-compat tests green.
10. **lint.txt served.** /runs/398/lint.txt 200 — full real abra table (HEAVY-box), cmd + rc=0 +
status=pass header, ref=09bf4d54 (hedgedoc's EXACT tested ref).
11. **Badges number+colour only.** hedgedoc badge ">level 5<" #3fb950; custom-html ">level 4<"
#a0b93f; grep finds NO cap/skip/na/reason language in badge SVGs. Matches operator spec.
12. **P3 matrix 19/19 lint PASS** (BACKLOG-lvl5.md) via documented scratch-clone method; no mirror
PRs / DEFERRED needed; warn-severity misses only (don't fail the rung). lasuite-meet R014 now
passes genuinely (tag annotated upstream — not suppressed). **Before/after table: every level
shift is explained by the rule change** — L4→L5 (+lint, baseline from real artifacts + P3
sweep), de-cap L2→L5 (custom-html-tiny proven #399; mailu same mechanism), L4 lintdemo (#405),
canary L1, bluesky N/A consistent. **No unexplained shift / no downward regression.** "Analytic
5" cells are derivation-checkable from two evidenced inputs (real baseline tiers + proven lint).
13. **No secret leak.** Independent sweep: no /run/secrets infra-secret VALUES and no generated
app-credential patterns appear in any published run artifact (the new lint.txt surface incl.).
results.json flags no_secret_leak=true + clean_teardown=true across runs.
**§6 Definition of Done satisfied:** new level system live on main and visible end-to-end
(results.json→card→dashboard→badge); L5 = abra recipe lint on the tested ref; capping fully
removed (no cap/cap_reason/capped); all 19 enrolled recipes linted + dispositioned with an
adversary-checked before/after table; ≥1 real L5 + ≥1 lint-blocked L4 + ≥1 N/A-skip climb through
real CI incl. the drone path ×3; old artifacts unharmed; M1 (cfc87fd) + M2 fresh Adversary
PASSes; no verdict or duration regressions.
**No VETO. Builder is cleared to write `## DONE` to STATUS-lvl5.md.**
Out-of-scope note (Builder's STATUS query): the WC5 promote-on-green-cold observation (a
STAGES-filtered hand-run promoted custom-html's canonical) is pre-existing and orthogonal to the
level system — NOT a lvl5 finding/regression and not a DONE blocker. If the Builder wants it
tracked, DEFERRED.md/IDEAS.md is the right home; I'm not filing it as an [adversary] finding.

View File

@ -0,0 +1,91 @@
# REVIEW — phase `mailu` (backupbot labels + backup/restore coverage)
Adversary verdict log. Append-only. SSOT: `cc-ci-plan/plan-phase-mailu-backup.md`.
## Phase orientation (2026-06-11T17:59Z)
Builder clone: `/srv/cc-ci/cc-ci`; Adversary clone: `/srv/cc-ci/cc-ci-adv`.
Phase goal: mirror PR adding backupbot v2 labels to mailu recipe + proof backup→wipe→restore on real
seeded mail data passes CI.
Pre-phase independent research notes:
- Mailu compose.yml analyzed. Critical durable volumes:
- `mailu:/data` on `admin` svc — SQLite DB (accounts, domains, aliases, DKIM config)
- `dkim:/dkim` on `admin` svc — DKIM signing keys
- `mail:/mail` on `imap` svc — mail store (Maildir, all user messages)
- `redis:/data` on `db` svc — Redis (transient: rate-limits, sessions) — likely NOT needed for restore
- Other volumes (rspamd, webmail, certs, mailqueue) — transient/cache, NOT durable
- Correct backupbot v2 label placement: `admin` service (for DB + DKIM) and `imap` service (for mail store)
- Backupbot v2 map syntax confirmed from keycloak/immich/mattermost-lts recipes
- SQLite `/data` — pre-hook may be needed to dump consistently; or copy is safe if admin is quiesced
- Mail store backup: Maildir is file-based, safe to copy live
- Recipe mirror has open PR#2 (upgrade-3.1.0+2024.06.52) — backupbot PR must be separate
Awaiting M1 claim from Builder.
---
## M1 FAIL @2026-06-11T20:58Z
**Claim**: build #473 LEVEL 5 PASS, backup→wipe→restore on real seeded mail data proven.
**Verdict: FAIL** — the backup/restore test exercises only the SQLite `/data` volume; the Maildir
`/mail` volume is labeled and backed up but is NOT specifically tested for restoration.
### What I verified (cold)
1. **PR#3 labels correct** (`add-backupbot-labels`, head `edc0201a79d3`):
- `admin` service: `backupbot.backup: "true"` + `backupbot.backup.path: "/data"`
- `imap` service: `backupbot.backup: "true"` + `backupbot.backup.path: "/mail"`
- Version bump: `3.0.1``3.0.2+2024.06.52`
- DKIM exclusion intentional and documented in PR desc ✓
2. **Build #473 evidence** (drone API + results.json):
- status: success, level: 5, all 5 rungs PASS ✓
- `clean_teardown: true`, `no_secret_leak: true`
- `test_backup_captures_mailbox` PASS — `citest@<domain>` in config-export at backup time ✓
- `test_restore_returns_mailbox` PASS — `citest@<domain>` back in config-export after restore ✓
- Backup snapshot `13eee64e`: 139 files, 85MB ✓
- Cold teardown: `abra app ls --server cc-ci` shows no mailu apps ✓
- No plaintext secrets in compose.yml (secrets section uses swarm `external: true` refs) ✓
- PARITY.md updated: P4 COVERED ✓
3. **Backupbot v2 syntax verified** against keycloak/mattermost-lts/n8n patterns — `backupbot.backup.path`
is valid v2 syntax for specifying the backup path ✓
### Failing item: `/mail` volume restoration not tested
**Plan requirement** (`plan-phase-mailu-backup.md` §2.3):
> "ensure the restore tier's data-integrity seed/verify actually exercises MAIL data (a seeded
> mailbox + message that survives backup→wipe→restore — extend the existing functional helpers if
> the current seed is too shallow; never weaken anything)"
**What the test does** (`ops.py`):
- `pre_backup`: creates user account `citest@<domain>` in SQLite via `flask mailu user` — this
is an account record in `/data` (SQLite), NOT a mail message in `/mail` (Maildir)
- `pre_restore`: deletes `citest@<domain>` from SQLite via sqlite3 — only wipes the DB record;
the Maildir at `/mail` is untouched throughout
- `test_restore.py`: asserts `citest@<domain>` is back in `config-export` — this proves the SQLite
(`/data`) backup/restore worked, but says nothing about the Maildir (`/mail`)
**What is missing**: the test never (a) seeds an actual email message into the maildir, (b) wipes
maildir content before restore, or (c) verifies a message survived the restore cycle. If backupbot
silently failed to restore the `/mail` volume, this test would still PASS.
**Fix required** (using existing infra from `test_mail_flow.py`):
1. `pre_backup`: after creating `citest@<domain>`, inject a uniquely-tagged message into the mailbox
(e.g., via in-container `sendmail` → postfix → dovecot deliver, the same path as `test_mail_flow.py`)
2. `pre_restore`: also wipe the maildir for `citest@<domain>` (e.g.,
`doveadm expunge -u citest@<domain> mailbox INBOX ALL` in the `imap` container)
3. `test_restore.py`: after asserting the account is back, also assert the seeded message is present
(e.g., `doveadm search -u citest@<domain> mailbox INBOX ALL` returns ≥1 message)
Note: the Maildir delivery flow is already proven in `test_mail_flow.py` — the tooling exists,
the fix is an extension of the existing seed, not a new mechanism.
### Adversary finding filed
See BACKLOG-mailu.md `## Adversary findings` — item [ADV-mailu-01].
Builder: fix the seed shallow enough to exercise `/mail` and re-trigger. PARITY.md and the labels
are correct; only the seed depth needs extending.

View File

@ -0,0 +1,541 @@
# REVIEW-rcust.md — Adversary ledger for the recipe-customization restructure phase
SSOT for this phase: `/srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md`.
Gates: **M1** (implementation verified — branch `restructure/recipe-custom`, unit+concurrency+lint
green on cold clone, resolved-customization diff clean for all 21 recipes, adversarial diff review)
and **M2** (merged + real-CI regression sweep matching baseline matrix). DONE requires fresh PASS
for both with no open VETO.
I own this file and the `## Adversary findings` section of BACKLOG-rcust.md only.
---
## Standing watch items (what I will hunt at M1/M2)
- **Coverage loss** (cardinal risk): for every migrated recipe, old loaders' effective customization
values must equal new `meta.load()` values. Throwaway diff script over all 21 recipe dirs; any
delta = finding.
- **Assertion weakening** in `tests/<recipe>/` diffs — migrations must be mechanical only (signatures,
fixture/key renames, underscore prefixes). Any changed assert/expected value = VETO.
- **Deleted-code fallout** — dangling refs to `_recipe_meta`, `_load_meta`, `_recipe_extra_env`,
`_recipe_meta_flag`, `declared_deps`, `is_canonical_enrolled`, `OIDC_AT_INSTALL`,
`CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`, `setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`.
- **Validation gaps** — typo'd key / wrong type / callable-on-data-key must raise MetaError, not pass.
- **R2 fixed end-to-end** — orchestrator load path delivers SCREENSHOT to screenshot.py.
- **HC2 / F2-11 integrity** — repo-local default-deny, requires_deps skip-report, generic floor
semantics all unchanged.
---
## Verdicts
_(no GATE verdict yet — M1 is not claimed. M1 only claims after P1P6 are all on the branch;
Builder has landed P1 (472a68b) + P2 (8cd72fd) and is mid-P3. The interim pre-review below is
front-loaded break-it work on the FROZEN P1/P2 commits — NOT an M1 PASS.)_
### Interim pre-review of frozen P1+P2 (branch @ 8cd72fd) — @2026-06-10, cold from upstream clone
Done as idle-time break-it work while no gate is pending. P1/P2 phase commits won't be rewritten
(Builder adds P3+ on top), so reviewing them now is non-wasted and front-loads M1. Cold clone of
`origin/restructure/recipe-custom` into `/tmp/rcust-verify` from the true upstream remote.
**No defects found so far.** Results:
1. **Deleted-code fallout — CLEAN.** Grepped `runner/ tests/ scripts/` for live refs to every deleted
symbol (`_recipe_meta`, `_load_meta`, `_recipe_extra_env`, `_recipe_meta_flag`, `declared_deps`,
`is_canonical_enrolled`, `OIDC_AT_INSTALL`, `CHAOS_BASE_DEPLOY`, `SKIP_GENERIC`,
`setup_custom_tests`, `deps_apps`, `deps_creds`, `deployed_app`). All hits are comments/docstrings
explaining the deletion, test names, or the intentionally-RETAINED `CCCI_SKIP_GENERIC*` env form
(kept per P2c). Zero live call-sites. `setup_custom_tests.sh` files gone.
2. **All-recipes-load-clean (typo gate) — PASS, independently.** Ran `meta.load()` (pure stdlib) over
all 21 recipe dirs cold via plain python3 (did NOT trust the Builder's test_meta.py). All 21 load;
non-default key sets sane. Every ALL-CAPS key used in any recipe_meta.py is in the 14-key registry.
3. **Coverage-loss diff (CARDINAL check) — ZERO deltas on data keys + hook presence.** Throwaway
harness (`/tmp/diff_meta.py`) reproduces main's six-loader effective resolution (`_load_meta`,
`declared_deps`, `is_enrolled`, `_recipe_extra_env`) from MAIN's recipe_meta files and diffs vs the
BRANCH's `meta.load()` for all 21 recipes. After correcting one harness artifact (EXTRA_ENV default
is `{}` not None), **0/21 recipes show any delta** for HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/
HTTP_TIMEOUT/BACKUP_CAPABLE/EXPECTED_NA/UPGRADE_BASE_VERSION/DEPS/WARM_CANONICAL + presence of
READY_PROBE/BACKUP_VERIFY/UPGRADE_EXTRA_ENV/EXTRA_ENV/SCREENSHOT.
4. **Validation gaps — CLOSED.** Crafted tmp recipe_metas: typo'd key → MetaError (with "did you mean
DEPLOY_TIMEOUT?"); wrong type (`DEPLOY_TIMEOUT="str"`) → MetaError; callable on data key
(`DEPLOY_TIMEOUT=lambda ctx:...`) → MetaError; `_PRIVATE`/lowercase-helper → loads clean (exemption
works). All four behave per the locked decision.
5. **meta.py read** — single `exec()`, frozen `RecipeMeta` generated from `KEYS`, `_coerce` rejects
bool-as-int and callable-on-data-key; `non_default` compares vs registry default. No issues.
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** full `pytest tests/unit -q` +
`pytest tests/concurrency -q` + `scripts/lint.sh` cold on the cc-ci host; R2 end-to-end through the
real orchestrator screenshot path; P3 ctx-hook signature migration (assert byte-identical, legacy
`lambda domain:` raises clear MetaError); P4/P5/P6; re-run the coverage diff on the FINAL branch
(P3 changes hook signatures); recipe-test diffs are mechanical-only (no assertion weakening);
HC2/F2-11/generic-floor integrity. These wait for the `claim(rcust): M1`.
### Interim pre-review of frozen P3 (branch @ fd02d9f) — @2026-06-10, cold from upstream clone
Builder landed P3 (uniform ctx hook convention) and moved to P4, so P3 is frozen. Pre-reviewed it.
**No defects found.**
1. **Mechanical-migration discipline — HELD (no VETO trigger).** `git diff 8cd72fd..fd02d9f` over
`tests/*/` shows ZERO changed assert/expected literals. Every hook change is purely
`def HOOK(domain[, meta])``def HOOK(ctx)` + `domain``ctx.domain` in the body. Spot-checked
cryptpad/mumble/ghost/lasuite-drive recipe_meta.py + lasuite-drive ops.py: seeded values, return
dicts, paths, status codes, and the `pre_restore` `assert _psql(...) in (...)` are byte-identical
apart from the `ctx.` deref.
2. **HookCtx — present + complete.** `meta.HookCtx` frozen dataclass has all 5 documented fields
(`.domain`, `.base_url`, `.meta`, `.deps`, `.op`); `meta.hook_ctx(domain, meta, op=…)` factory
builds it and pulls `deps` from `$CCCI_DEPS_FILE`. All call sites migrated: run_recipe_ci
`pre_<op>`, BACKUP_VERIFY; lifecycle `extra_env` + READY_PROBE; screenshot `SCREENSHOT(page, ctx)`.
(NB my first pass falsely flagged "no HookCtx" — that was a STALE WORKTREE at P2; corrected by
checking out fd02d9f. Logged here for honesty.)
3. **Legacy-signature guard (P3.4) — PRESENT + works, live-probed.** `meta.check_hook_signature`
exact-matches positional params and raises a CLEAR MetaError naming the P3 migration + HookCtx
fields. Wired into both `load()` (recipe_meta hooks; SCREENSHOT expects `(page, ctx)`, rest
`(ctx)`) and the orchestrator (ops.py `pre_<op>`). Crafted tmp metas: legacy `READY_PROBE(domain)`,
`SCREENSHOT(page, domain, meta)`, `EXTRA_ENV(domain)` all → MetaError at load; `READY_PROBE(ctx)`
loads clean. No silent mid-run TypeError path.
4. **Coverage diff re-run at P3 head — still 0/21 deltas** (hook presence + all data keys unchanged).
Net: P1+P2+P3 all clean under cold adversarial probing. M1 still gated on full unit+concurrency+lint
on the cc-ci host, P4P6, R2 end-to-end via the real screenshot orchestrator path, and a final
coverage re-diff. No findings filed; no VETO.
### Interim pre-review of frozen P4 (branch @ 29a28e2) — @2026-06-10T18:55Z, cold from fresh host clone
Builder landed P4 (custom-test ergonomics) and moved to P5, so P4 is frozen. Pre-reviewed it cold.
**No defects found.** NOT an M1 verdict — M1 stays gated (see "Still UNVERIFIED" below).
Cold acceptance (fresh `git clone` on cc-ci host at 29a28e2, my own checkout — not the Builder's):
- `cc-ci-run -m pytest tests/unit -q`**184 passed** (exact match to claim; full suite, no
cross-fixture pollution from the session-scoped `deps` fixture).
- `cc-ci-run -m pytest tests/unit/test_discovery.py test_discovery_phase2.py
test_conftest_fixtures.py -q` → 14 passed.
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS** (ruff format/check, deadnix,
shfmt, shellcheck, yamllint all clean).
Correctness probes:
1. **Placement-rule claim ("zero in-repo users of top-level custom tests") — HOLDS.** Filesystem
sweep of every `tests/<recipe>/test_*.py`: ALL are lifecycle names (test_{install,upgrade,
backup,restore}.py). No top-level non-lifecycle custom exists in-repo, so dropping the top-level
glob in `discovery.custom_tests` loses ZERO coverage. The lifecycle-name exclusion is retained
inside functional/playwright as the double-run safety net.
2. **Discovery diff — clean.** Top-level `glob(test_*.py)` branch removed; functional/ + playwright/
subdir globs retained with `basename not in lifecycle_names` guard. Docstring + module header
updated to state the placement RULE.
3. **Test changes are adaptation + strengthening, NOT weakening (no VETO trigger).**
- `test_discovery_phase2`: renamed to `..._placement_rule_...`; now ASSERTS the top-level
`test_sso_smoke.py` is `not in names` (new negative assertion proving the behavior change),
while functional/playwright customs are still `in names` and lifecycle name excluded.
- `test_discovery::test_custom_tests_repo_local_gated`: repo-local custom moved from top-level
into `functional/`; HC2 default-deny (`== []` when unapproved) and approved-case
(`functional/test_sso.py in names`, `test_install.py` excluded) both INTACT. HC2 integrity
preserved.
4. **op_state fixture — correct.** Skips with clear reason on unset env / missing file / non-JSON
(`except ValueError` catches JSONDecodeError); reads & returns parsed dict otherwise. Tests
cover 3 of 4 paths (the non-JSON skip path is untested — minor coverage gap, not a defect; the
branch is trivially correct by inspection).
Net: P1+P2+P3+P4 all clean under cold adversarial probing; both halves of every phase claim
(unit count + lint) reproduced cold on a fresh clone. No findings filed; no VETO.
**Still UNVERIFIED for M1 (do NOT treat above as M1 PASS):** P5 (manifest) + P6 (docs);
`pytest tests/concurrency -q` cold; R2 end-to-end through the real orchestrator screenshot path;
final coverage re-diff on the COMPLETE branch (P1P6, all 21 recipes, effective customization set
unchanged); recipe-test diffs mechanical-only across the whole branch; HC2/F2-11/generic-floor
integrity at the final head. These wait for `claim(rcust): M1`.
### Interim pre-review of frozen P5 (branch @ 68954be) — @2026-06-10T19:06Z, cold from fresh host clone
Builder landed P5 (customization manifest) and moved to P6, so P5 is frozen. Pre-reviewed it cold.
**No blocking defect; one secret-SURFACE observation raised (heads-up to Builder, NOT a VETO, NOT
an M1 secret-leak failure).** NOT an M1 verdict.
Cold acceptance (fresh `git clone` on cc-ci host at 68954be, my own checkout):
- `cc-ci-run -m pytest tests/unit -q` → **191 passed** (exact match to claim).
- `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
Primary adversarial target — SECRET LEAKAGE via the new manifest surface (D-gate: published logs +
dashboard contain NO secrets, incl. generated app passwords):
1. **Generated/runtime secrets — NOT exposed (gate holds).** `manifest.build` collects only:
`meta_non_default` (static recipe_meta), hook NAMES (pre-ops/install_steps.sh/compose.ccci.yml),
overlay FILENAMES, custom-test COUNTS, and env-override KEY names (printed `KEY=1`, value never
rendered). It never touches `deps` (client_secret), `op_state`, abra-generated app passwords, or
any env VALUE. The cardinal concern — generated app passwords on the dashboard — is structurally
absent from this surface.
2. **Cold all-recipes sweep.** Built+rendered the manifest for all 21 recipes on the host; grepped
the rendered blocks AND the results.json `customization` payload for secret/password/token/key/
credential and for any 32+ char high-entropy string. The ONLY hit, across every recipe, is
plausible's `EXTRA_ENV.SECRET_KEY_BASE` =
`"ccciplausibletestkeybase64charsexactlyforCIephemeral4567890123"`.
3. **OBSERVATION (not a leak):** that value is a HARDCODED, committed, PUBLIC dummy CI constant
(tests/plausible/recipe_meta.py, in the open-source repo) — not a generated or real secret.
`meta_non_default` dumps EXTRA_ENV literal dicts verbatim into the log AND results.json (→
dashboard), so a field literally named `SECRET_KEY_BASE` with a value now appears on the
dashboard. No real secret is exposed (it's public), so this is NOT a D-gate failure and does NOT
block P5. BUT it's a standing surface: (a) a dashboard secret-scan gets a true-positive-shaped
hit on a public dummy (noise that could mask a real leak), and (b) if any recipe ever set a real
secret-ish literal in a meta dict, the manifest would surface it unredacted. Flagged to Builder
via BUILDER-INBOX as a heads-up to consider redacting values of sensitive-named meta keys before
M1. Will re-examine on the real dashboard at the M1 cold-verify.
4. **HC2-honoring — confirmed.** Manifest routes ALL repo-local reads through `discovery._gated`
(ops.py loop direct; `install_steps`/`resolve_overlay_op`/`custom_tests` each call `_gated`
internally). An unapproved repo-local recipe contributes nothing to the manifest.
5. **Pure presentation — holds.** `build()` only reads files/env and returns a dict; `render()`
formats a string. Called at run_recipe_ci.py:889-890 (print) + embedded at :1261 into results;
no state mutation, no verdict influence. `_jsonable` renders callables as `'<hook>'` (so a
callable EXTRA_ENV/READY_PROBE never leaks closure internals) and tuples→lists for JSON.
Net: P1P5 all clean under cold adversarial probing; every phase claim (unit count + lint)
reproduced cold. No findings filed; no VETO. One non-blocking secret-surface heads-up sent.
**Still UNVERIFIED for M1:** P6 (docs); `pytest tests/concurrency -q` cold; R2 end-to-end via the
real orchestrator screenshot path; final coverage re-diff on the COMPLETE branch (all 21 recipes,
effective customization unchanged); recipe-test diffs mechanical-only across the whole branch;
HC2/F2-11/generic-floor integrity at final head; AND — at the M1 dashboard check — confirm the
SECRET_KEY_BASE-named field on the real dashboard is the accepted public dummy (or redacted).
These wait for `claim(rcust): M1`.
## M1 — implementation verified: **PASS** @2026-06-10T19:27Z (branch `restructure/recipe-custom` @ 858e0f5)
Cold-verified from TWO fresh clones on the cc-ci host (NEW=858e0f5, OLD=main pre-restructure;
merge-base 49fb818 confirmed → `main..858e0f5` is exactly P1P6). Verdict formed from the phase plan
(SSOT), the code/git history, the STATUS verification facts, and my own cold re-runs — NOT from
JOURNAL rationale (isolation discipline; I did not need to consult JOURNAL).
**All M1 Definition-of-Done items PASS:**
1. **Cold test suites — match claim exactly.** Fresh clone @858e0f5:
`cc-ci-run -m pytest tests/unit -q` → **192 passed**; `tests/concurrency -q` → **23 passed**
(untouched by this plan, proven); `nix develop .#lint --command scripts/lint.sh` → **lint: PASS**.
2. **Coverage diff (cardinal risk) — 0 REAL deltas / 21 recipes.** Wrote throwaway extractors that
resolve EVERY recipe's effective customization in BOTH worlds — OLD via the legacy loaders
(`_load_meta` + `lifecycle._recipe_extra_env` + `deps.declared_deps` + `_recipe_meta_flag`),
NEW via `meta.load()` + `meta.extra_env/upgrade_extra_env` — for the common keys (HEALTH_*,
timeouts, DEPS, EXTRA_ENV resolved at a fixed domain, UPGRADE_EXTRA_ENV, BACKUP_CAPABLE,
EXPECTED_NA, UPGRADE_BASE_VERSION, READY_PROBE/BACKUP_VERIFY presence). Diff = **0 behavioral
deltas**; the only raw diffs were 20× `UPGRADE_EXTRA_ENV: None→{}` (unset default representation,
behaviorally identical) and mumble (most-customized: callable EXTRA_ENV→dict, UPGRADE_EXTRA_ENV,
READY_PROBE) is **byte-identical** old↔new.
Deleted keys accounted for (no silent loss): `SKIP_GENERIC` (0 recipe users); `CHAOS_BASE_DEPLOY`
→ overlay-presence (discourse+ghost, exactly the two shipping compose.ccci.yml — perfect 1:1, no
change either direction); `OIDC_AT_INSTALL` → install-time made universal (drive+meet were
already install-time). **lasuite-docs** declared DEPS but NOT OIDC_AT_INSTALL → OLD post-install,
NEW install-time: an INTENTIONAL P2b consolidation, not a drop — flagged below for M2 validation.
3. **Assertion weakening (VETO-class) — NONE.** Full branch diff over all recipe test files
(excl. harness unit/concurrency/regression): 18 removed asserts, 18 added. After mechanical
normalization (`domain`→`ctx.domain`, `deps_creds`→`deps`, `MAX_USERS`→`_MAX_USERS`, whitespace)
the removed and added assert sets are **IDENTICAL** — zero unmatched in either direction. Every
change is a pure signature/fixture/constant rename; no expected value altered, no assert deleted.
Spot-confirmed discourse/ghost `_psql(domain,…ci_marker…) in (…)` → `ctx.domain` only (expected
tuple + SQL byte-identical). **No VETO.**
4. **Deleted-code fallout — clean.** No dangling LIVE refs to any of the 13 deleted symbols
(`_recipe_meta`/`_load_meta`/`_recipe_extra_env`/`_recipe_meta_flag`/`declared_deps`/
`is_canonical_enrolled`/`OIDC_AT_INSTALL`/`CHAOS_BASE_DEPLOY`/`SKIP_GENERIC`/`setup_custom_tests`/
`deps_apps`/`deps_creds`/`deployed_app`). Only residue: stale DOC/comment mentions of
`OIDC_AT_INSTALL` + `setup_custom_tests.sh` in PARITY.md files (non-blocking P6 cosmetic nit).
5. **Validation gaps — closed.** Cold-probed `meta.load()` with synthetic bad metas: typo'd key,
str-on-int, bool-as-int, callable-on-data-key, legacy hook sig `READY_PROBE(domain)`, and unknown
key ALL → `MetaError` (clear, names the offending file/key). Clean + underscore-private-helper
metas load fine (no false positives). No silent pass.
6. **R2 fixed end-to-end.** Cold proof through the REAL load path: a recipe declaring
`def SCREENSHOT(page, ctx)` is surfaced by `meta.load()` and resolved callable by
`screenshot._load_screenshot_hook` (old L1 allowlist dropped it — now arrives); orchestrator wires
it `run_recipe_ci.py:1029 capture(…, recipe_meta=meta)` → `hook(page, hook_ctx(domain, meta))`.
Absent recipe → None (default landing-page path). Legacy `SCREENSHOT(page, domain, meta)` sig
rejected at load.
7. **HC2 / F2-11 / generic-floor integrity — preserved.** Cold-probed `discovery.custom_tests` +
`install_steps`: UNAPPROVED repo-local → `[]` / `None` (default-deny holds); APPROVED → surfaced.
`sso_dep_unverified` (F2-11) logic UNCHANGED (only a comment edited) — a deps-not-ready run that
skips ≥1 `requires_deps` test still suppresses the green signal. Generic floor `_skip_generic`
default = run (additive); opt-out now env-only (same env vars as before; the 0-user meta key
removed) and surfaced LOUDLY in CI + flagged `!!` in the manifest — strictly stronger, never
silent.
8. **(Bonus) P5 secret-surface heads-up RESOLVED + verified.** The Builder landed `858e0f5`
redacting secret-named meta values in the manifest (my P5 BUILDER-INBOX ask). Cold-verified:
`plausible.EXTRA_ENV.SECRET_KEY_BASE` → `<redacted>` in BOTH the log block and results.json;
recursive into nested dict keys; word-segment `(^|_)KEY(_|$)` regex avoids over-match
(KEYCLOAK_* passes). All-21-recipe sweep: exactly 1 redaction, ZERO over-redaction, ZERO
under-redaction (no secret-shaped value remains). Regression test
`test_manifest_redacts_sensitive_named_values` present.
**Verdict: M1 PASS.** No findings filed, no VETO.
**This does NOT clear `## DONE`.** Per the phase DoD, DONE requires a fresh Adversary PASS for BOTH
M1 *and* M2. M2 (merged-main real-CI regression sweep vs the committed baseline matrix) is still
unverified. M2 watch-items I will specifically re-check from run logs:
- **lasuite-docs OIDC is now install-time** (post→install change above) — must pass a real run with
OIDC wired at install (skip-count 0 on its `requires_deps` tests).
- the customization spot-checks the plan §M2.4 enumerates (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + auto-chaos base deploy, lasuite-*
deps provisioning + OIDC tests ran, immich ops.py seeds, manifest block present in every log,
screenshot.png where capture succeeded).
- canary suite (RED canaries still caught at intended tier) + per-recipe level == baseline matrix.
- zero leaked apps after teardown.
### M2-prep — independent hook-port audit (shell→python / best-effort↔fatal drift) @2026-06-10T20:55Z
Triggered by the lasuite-drive regression (below), which my M1 PASS MISSED: my M1 coverage diff
compared recipe_meta KEYS (resolved values), not ops.py hook BODIES, and my assertion scan matched
`assert ` not `raise AssertionError`. So a hook that flipped best-effort→fatal was invisible to my
M1 method. M2 (real-CI sweep) caught it — the safety net working as designed. I then audited ALL
hook ports cold (`git diff c2508c7..origin/main` per recipe ops.py + the 2 setup_custom_tests.sh
ports), filtering for non-mechanical error-handling (raise/assert/except/exit/timeout/poll changes):
- **lasuite-drive `pre_install`** — GENUINE rcust regression (Builder-disclosed, I confirmed):
OLD setup_custom_tests.sh bucket poll fell through on 90s timeout (best-effort, no failure; the
custom-tier `test_minio_storage.py` upload→list→download is the real gate); NEW port added a
terminal `raise AssertionError` → deterministic install RED when the bucket appears just after
90s. Fix-forward APPROVED (restore best-effort print+return, scoped to line-54 only; conditioned
on an L5 re-run + my diff re-verify). See approval entry in BUILDER-INBOX history (commit 57c66ad).
- **lasuite-docs `install_steps.sh`** — INTENTIONAL P2b change, NOT a defect: OLD setup_custom_tests
did `exit 1` on missing deps/null KC creds; NEW does `exit 0` (no-op) for missing-deps (gated now
by F2-11: the `@requires_deps` OIDC test skips → `sso_dep_unverified` suppresses green) BUT
preserves `exit 1` on secret-insert failure. Consistent with the install-time-deps redesign.
WATCH-ITEM (residual): the missing-deps path now relies entirely on F2-11; the sweep didn't
exercise it (deps were ready, skip-count 0). Mechanism verified present at M1; not blocking.
- **All other ops.py** (cryptpad, discourse, ghost, immich, keycloak, lasuite-meet, matrix-synapse,
mattermost-lts, mumble, n8n, plausible, custom-html) — pure mechanical ctx migration
(`domain`→`ctx.domain`, `meta`→`ctx.meta`); expected tuples/strings byte-identical (spot-checked
keycloak 201/409 + 204/200, discourse/ghost _psql ci_marker). No error-handling drift.
Net: exactly ONE accidental hook-port regression (lasuite-drive), now under approved fix. No other
best-effort↔fatal flips. This audit closes the M1-method gap for the hook bodies.
---
### M2 proof-run independent analysis (cold, Adversary) @2026-06-10T23:53Z
M2 is NOT yet claimed by the Builder; this is my independent read of the proof runs sitting on
cc-ci (`/var/lib/cc-ci-runs/{m2b-*,ab-*-oldmain}`), parsed myself via jq (NOT trusting Builder
narrative). The 6 first-sweep mismatches break down as follows.
**Confirmed root fact — REF MISMATCH is real (I verified, not taken on faith).** Every baseline
matrix run used a *PR-head* ref; the first M2.3 sweep used each mirror's *default-branch head* — a
different commit. Independently confirmed via `results.json.ref`:
| recipe | baseline run/ref/level | sweep ref/level |
|---|---|---|
| discourse | 184 / 7ae7b0f76efb / L4 | 7d53d4ec390f / L2 |
| plausible | 308 / 13458fac56a1 / L4 | da159375d89a / L2 |
| mattermost-lts | 196 / a333e31a6002 / L4 | 41c9eb8e5f34 / L2 |
| immich | 307 / 107d7220adce / L4 | 7eb3937a82d0 / L2 |
| lasuite-drive | 189 / ffa7d585afa2 / L5 | f4135d78201e / L0 |
So the sweep was NOT apples-to-apples vs the baseline matrix. Reconciliation requires either
(a) re-run at the baseline ref on new main == baseline level, or (b) A/B same-ref old-vs-new main
== same level. Status per recipe:
- **immich** — m2b-immich (new main, baseline ref 107d7220adce) = **L4 == baseline L4. CLEAN.**
- **mattermost-lts** — m2b (new main, a333e31a6002) = **L4 == baseline L4. CLEAN.**
- **plausible** — m2b (new main, 13458fac56a1) = **L4 == baseline L4. CLEAN.**
→ these three: restructure proven INNOCENT (baseline ref reproduces baseline level on merged main).
- **bluesky-pds** — ab-bluesky-pds-oldmain (OLD main, b2d86efba3f1) = L0 == new-main sweep L0 at
same ref → restructure-NEUTRAL at the sweep ref. (Baseline is "L4-equiv, pre-results-era", no run
id — softer baseline; A/B neutrality is the available evidence.)
- **discourse — NOT yet clean. OPEN.** Two *distinct* flake modes seen, and the A/B was run at the
wrong ref to close the gap:
- baseline 184 (OLD main, 7ae7b0f): all pass → L4.
- m2b-discourse (NEW main, SAME ref 7ae7b0f): **upgrade FAILED**, HC1 guard fired —
"upgrade deployed chaos commit 'eb96de94+U', not intended PR-head '7ae7b0f76efb' — re-checkout
to code-under-test failed (HC1)" → L1. ← same-ref old=L4 vs new=L1 discrepancy, UNexplained.
- ab-discourse-oldmain (OLD main, 7d53d4ec): **restore FAILED** (ci_marker truncated-dump race)
→ L2 == new-main sweep L2 at that ref → neutrality proven, but for the RESTORE mode at the
DEFAULT-head ref, NOT for the L1/upgrade-HC1 mode at the baseline ref.
- Net: the clean A/B (ref 7ae7b0f on OLD main vs NEW main) that would explain L4→L1 was NOT run.
The upgrade re-checkout/HC1 path lives in run_recipe_ci.py/lifecycle which the meta-param
threading DID touch — so "pre-existing flake" is plausible but UNPROVEN here. To clear: run
discourse @7ae7b0f on OLD main (does it deterministically reproduce L4, or also flake to L1?),
and/or repeat @7ae7b0f on new main to characterise the HC1 re-checkout as a race. The HC1 guard
FIRING (not silently passing the wrong commit) is the safety net working — good — but it means
the upgrade did not exercise the PR code, so the run is inconclusive, not a clean baseline match.
- **lasuite-drive** — fix-forward 1357544 (restore best-effort bucket poll) landed; needs a fresh
L5 run at the baseline ref ffa7d585afa2 on merged main to confirm baseline. m2rr/earlier runs
predate or used the default head — NOT yet a clean baseline match. OPEN.
**M2 disposition: still OPEN — no PASS.** 3/6 cleanly reconciled (immich/mattermost/plausible);
bluesky neutral-at-sweep-ref; discourse + lasuite-drive NOT yet closed. I will require, at the M2
claim: (1) discourse same-ref A/B (or repeat) explaining L4→L1; (2) a clean lasuite-drive L5 at
baseline ref; (3) my own cold re-parse of every per-recipe level vs baseline; (4) the M2.4
customization-executed spot-greps; (5) zero leaked apps. Recorded a BUILDER-INBOX heads-up on the
discourse-HC1 gap so it is addressed in the claim, not glossed as "the restore flake".
### M2 proof-run progress + self-correction @2026-06-11T00:05Z
Builder is running (independently, matching my inbox ask) the decisive A/B serially on the box:
`m2-proof.sh` → lasuite-drive @ffa7d585afa2 PR=1 (post-fix-forward 1357544) on merged main 5c0676b,
then discourse @7ae7b0f76efb **PR=2** on merged main (m2p-discourse); `m2-proof2.sh` (queued) →
discourse @7ae7b0f76efb **PR=2** on OLD main (/root/m2-oldmain, ab-discourse-7ae7b0f-oldmain).
**Self-correction to my 23:53Z discourse analysis:** my m2b-discourse run used **PR=0**, but the
upgrade HC1 guard resolves the *PR head* for the re-checkout. The L1 failure message ("deployed
chaos commit 'eb96de94+U', not PR-head 7ae7b0f — re-checkout failed") is plausibly a **PR=0
artifact** (no real PR to resolve the head from), NOT a restructure regression. The Builder's proof
runs correctly use PR=2 (matching baseline run 184's pr=2). So the apples-to-apples comparison I
need is m2p-discourse (PR=2, new main) vs ab-discourse-7ae7b0f-oldmain (PR=2, old main) vs baseline
184 (PR=2, old main, L4). I will cold-verify those three when they land; my L4→L1 concern is on
hold pending the PR=2 result, not yet a confirmed regression. Live lasu-f68b63 stack = active
lasuite-drive proof run (expected, not a leak).
### M2 fix-forward APPROVE: be2026a (services_converged completed-one-shot rule) @2026-06-11T00:31Z
Builder proposed a 2nd lasuite-drive P2b fix on branch `fix/converged-oneshot @ be2026a` and asked
approval before merging to main (M2 "trivial fix-forward w/ Adversary approval" path). Cold-verified
independently (fresh clone of be2026a at /root/adv-be2026a on cc-ci, NOT the Builder's working tree):
- **Diff** (`git diff origin/main..be2026a runner/harness/lifecycle.py`, read myself): in
`services_converged`, a `cur != want` deficit now passes ONLY if `docker service ps <svc>` shows
ALL task states == `Complete`. Conservative: any Running/Preparing/Pending (spinning up) or
Failed/Rejected (broken) in the deficit still returns False; no-tasks-yet still False; plain N/N
and 0/0 unchanged. Targeted addition, not a rewrite.
- **False-green analysis (my own):** only `restart_policy:none` one-shots ever show `Complete`; a
normal crashed service shows Failed/Running(restarting), never Complete. Even if converge passed
on a completed-but-ineffective one-shot, two INDEPENDENT gates still catch it — the generic
`test_serving` HTTP floor and the custom-tier functional test (lasuite-drive
`test_minio_storage.py` upload→list→download is the real bucket gate). Defense-in-depth holds; I
could not construct a false-green path.
- **Tests** `tests/unit/test_converged_oneshot.py` (read + cold-ran): 7 cases pin exactly the
non-vacuity criteria — completed→converged, Failed→NOT, mixed Complete+Failed→NOT (covers the
`docker service ps` history concern), Preparing→NOT, no-tasks→NOT, N/N→converged, 0/0→converged.
- **Cold suite+lint from fresh be2026a checkout:** `cc-ci-run -m pytest tests/unit -q` → **199
passed**; the 7 new tests pass alone; `nix develop .#lint --command scripts/lint.sh` → **lint:
PASS**. Matches Builder's claim.
- **Root cause judged genuine P2b regression** (hook moved into ops.py pre_install runs BEFORE the
install assert; the completed one-shot's 0/1 then burns DEPLOY_TIMEOUT in the converge poll). The
fix accepts a genuinely-healthy deploy (HTTP 200, all other services 1/1) the old `cur!=want`
wrongly rejected — correction, not masking.
- **Not on main** — confirmed `all(s == "Complete")` absent from origin/main; Builder held the gate.
- **Disclosed semantic delta** (a failing one-shot now blocks install convergence earlier vs later
at custom-tier): ACCEPTED — both paths RED, no false-green, no enrolled recipe has a
baseline-failing one-shot.
**VERDICT: fix-forward be2026a APPROVED, conditional on:**
1. Post-merge lasuite-drive proof re-run @ffa7d585afa2 PR=1 lands **L5** (binding end-to-end proof
the fix resolves the converge hang — if it doesn't, the diagnosis was wrong and approval voids).
2. I re-verify the MERGED diff == be2026a diff (no extra change sneaks in at merge).
3. discourse PR=2 A/B pair (m2p-discourse / ab-discourse-7ae7b0f-oldmain — no one-shots, unaffected
by this fix) completes and I cold-verify those levels too.
This APPROVE does NOT clear M2; M2 still needs all per-recipe levels reconciled + my independent
sample re-check + zero-leak teardown.
### be2026a merge cold-verify — condition #2 SATISFIED @2026-06-11T00:42Z
Builder merged be2026a as 6cabbe7 (build 350 green, origin/main now b4505ac). Independently checked:
`diff origin/main:runner/harness/lifecycle.py be2026a:...` → **IDENTICAL**; the merged
`tests/unit/test_converged_oneshot.py` → **IDENTICAL** to be2026a. Clean merge, no extra change
slipped in — approval condition #2 met. m2p-lasuite-drive (pre-fix) landed L0 (install/converge
timeout) = the diagnosed symptom (Builder disclosed b4505ac it SIGINT-shortcut the doomed burn;
binding proof is the post-fix m2p2 re-run). REMAINING be2026a conditions: #1 post-fix lasuite-drive
L5, #3 discourse PR=2 A/B cold-check — both pending (m2p-discourse running, then ab-oldmain, then
m2p2-lasuite-drive).
### be2026a conditions CLEARED + SSO-baseline staleness finding (independent) @2026-06-11T01:12Z
Reached the conclusions below COLD (own git archaeology + run-dir jq) BEFORE reading the Builder's
01:10Z inbox — which then concurred. Anti-anchoring preserved (no JOURNAL read; inbox read after my
own derivation).
**be2026a fix-forward — ALL 3 CONDITIONS SATISFIED → fix-forward FULLY CLEARED:**
1. **Post-fix lasuite-drive (m2p2, merged main 6cabbe7, ffa7d585afa2, PR=1): L4, rc=0, 3m19s.**
Independently verified: flags clean_teardown=true + no_secret_leak=true; all 4 essential rungs
pass; `test_minio_storage::...object_roundtrip` PASSED; `test_oidc_..._keycloak` PASSED. The
install converge no longer hangs — both fix-forwards (1357544 best-effort poll + 6cabbe7
completed-one-shot converge) exercised in one run. The literal "L5" in my condition is
**unmeetable on current code and NOT an rcust effect** — see staleness finding below; I accept
the L4-equivalence. Fix works end-to-end.
2. **Merged diff == branch diff** — verified earlier (4428e76): lifecycle.py + test file
byte-identical to be2026a.
3. **discourse A/B — restructure-NEUTRAL.** m2p-discourse (NEW main, 7ae7b0f, PR=2) = L1 and
ab-discourse-7ae7b0f-oldmain (OLD main, SAME ref, SAME PR=2) = L1, SAME stage (upgrade), SAME
message (`eb96de94+U` HC1 re-checkout). old==new byte-identical → rcust did NOT regress discourse.
The L4(184)→L1 vs baseline is pre-existing env drift since 06-05 (filed below), not rcust.
**FINDING [adversary] — M2 baseline matrix has 3 STALE L5 entries (lasuite-docs/drive/meet).**
Independently established: the level ladder dropped 6-rung(L5)→4-rung(max L4, integration &
recipe-local now OPTIONAL/non-laddered) in mainline PR#6 (c51cd84 "4-rung ladder", + 46e2cdb),
which `git merge-base --is-ancestor c51cd84 01e6d49^` confirms is an ANCESTOR OF PRE-RCUST MAIN.
The rcust merge touches level.py NOT AT ALL and results.py by +4 cosmetic P5 lines; compute_level
+ derive_rungs are byte-identical old-main↔merged-main. So NO current-code run (rcust or pre-rcust)
can produce L5; baselines 188/189/204 (L5, integration:pass) were recorded under the OLD schema
(run 204 ran 06-09 hours before the refactor deployed). **rcust is INNOCENT of L4≠L5.** Integration
coverage is NOT lost: the requires_deps OIDC tests EXECUTE and PASS (skip-count 0) on current code —
verified in m2p2 AND the sweep's m2r-lasuite-docs (`test_oidc_login_via_keycloak` +
`test_oidc_password_grant_...` PASSED) and m2r-lasuite-meet (`...password_grant...` PASSED).
ACCEPTED equivalence for the M2 matrix: **old L5 ≡ new L4 (all 4 essential rungs pass) + requires_deps
OIDC test PASSED (skip-count 0)**. Under this, lasuite-docs (m2r L4) / lasuite-meet (m2r L4) /
lasuite-drive (m2p2 L4) all MATCH. (Note: this validates — but corrects the basis of — the Builder's
first-sweep "lasuite-docs/meet matched baseline"; they are L4+OIDC, not numeric L5.) This is a
matrix-staleness correction, NOT a rcust regression; no VETO.
**Still OPEN for the M2 verdict (my side):** (a) per-recipe levels reconciled vs the CORRECTED
baseline for all 21; (b) bluesky-pds is L0 on BOTH old & new main (upstream image
`Cannot find module index.js`) — restructure-neutral but also cannot match its L4-equiv baseline on
ANY current run → needs a DECISIONS/DEFERRED note as non-rcust upstream breakage, not a silent
mismatch; (c) the 2 drone-path !testme runs (immich#2/plausible#3); (d) zero-leak teardown sweep;
(e) my own independent re-check of ≥5 recipes' logs + ALL mismatches before any M2 PASS.
---
## M2 — merged-main real-CI regression sweep: **PASS** @2026-06-11T01:15Z
Cold-verified the M2 claim (STATUS gate "M2 CLAIMED ~01:30Z") from my own clone + direct on cc-ci,
re-running/ re-parsing rather than trusting Builder logs. Every M2.0M2.4 item holds.
**M2.2 canaries — cold RE-RAN myself** from a fresh `origin/main` checkout (/root/adv-be2026a @
origin/main): `cc-ci-run -m pytest tests/regression/ -m canary -v` → **7/7 passed (301s)**, incl.
`bad-false-green` (the false-green detector) + all four RED canaries (bad-install/upgrade/backup/
restore) caught at their designed tier. The level system is NOT inflating. (log /root/adv-canary.log)
**M2.3 per-recipe — all 21 reconciled (cold jq on each run dir):**
- 13 clean: cryptpad/custom-html/ghost/hedgedoc/keycloak/matrix-synapse/n8n/uptime-kuma = L4;
mailu/custom-html-tiny = L2 (backup_restore N/A); mumble = L4 (deploy-count=1) — all == baseline,
clean_teardown=true.
- 2 designed-bad canaries genuinely exercised: bkp-bad rungs backup_restore=**fail** (backup=fail);
rst-bad backup_restore=**fail** (backup=pass→restore=fail). The L1 cap is upgrade-N/A ladder
semantics; the designed failure is recorded in the rung (verified — NOT a coincidental
level-match).
- immich/mattermost-lts/plausible: **L4 @ exact baseline refs** (m2b-*) — baseline REPRODUCED on the
restructured harness (cold-verified earlier this session).
- discourse: m2p-discourse (NEW main) == ab-discourse-7ae7b0f-oldmain (OLD main) — SAME ref/PR=2,
SAME stage, SAME upgrade-HC1 message (`eb96de94+U`), SAME L1. **old==new ⇒ rcust-neutral**; the
L4(184)→L1 is pre-existing env drift since 06-05 (DEFERRED.md), NOT caused by the restructure.
- lasuite-docs/-meet/-drive: L4 all-rungs-pass + requires_deps OIDC test PASSED (skip-count 0)
[lasuite-drive m2p2 also MinIO PASSED, post-both-fixes, rc=0]. Their "L5" baselines are STALE:
the 6→4-rung ladder landed in mainline c51cd84 (PR#6), which `git merge-base --is-ancestor
c51cd84 01e6d49^` confirms PREDATES the rcust merge; level.py untouched by the merge, derive_rungs
byte-identical old↔new. **rcust-innocent; integration coverage preserved** (OIDC tests execute &
pass). Accepted equivalence old L5 ≡ new L4-all-pass + OIDC-pass.
- bluesky-pds: EXCLUDED — `Cannot find module /app/index.js` crash-loop on BOTH old & new main at
every ref → upstream image breakage, rcust-neutral. DEFERRED.md note present.
**M2.3 drone→harness path:** drone builds **356 (immich) + 357 (plausible)** = `build_event=custom`
(bridge-triggered; distinct from push builds 358-361), trigger=autonomic-bot, both **success**
(verified in drone sqlite DB); run dirs 356/357 = immich L4 pr=2 / plausible L4 pr=3, customization
manifest present, clean_teardown=true.
**M2.4 customizations actually executed (cold-grep):** manifest block **21/21** logs; mumble
`ready-probe OK (tcp 3x) 127.0.0.1:64738`; ghost `ccci-overlay: provided compose.ccci.yml ...
base deploy auto-chaos` (P2a first-class path live); cryptpad `EXTRA_ENV='<hook>'`; immich
`ops.py[pre_backup,pre_restore,pre_upgrade]` + `pre-op seed` lines (migrated ctx hooks run).
**Teardown:** `docker stack ls` = infra (backups/bridge/dashboard/reports/drone/traefik) +
warm-keycloak ONLY, **zero leaked app stacks** (checked after ALL runs incl. drone-path).
**Fix-forwards (both Adversary-approved, additive):** 1357544 (lasuite-drive best-effort poll, appr
57c66ad) + be2026a/6cabbe7 (services_converged completed-one-shot, appr a531746) — merged diff ==
branch diff, all 3 be2026a conditions cleared (24a203a). Cold unit suite on post-fix main = 199
passed, lint PASS.
**VERDICT: M2 PASS.** No regression CAUSED BY the restructure: every deviation from the baseline
matrix is proven rcust-neutral by same-ref old-vs-new A/B (discourse, bluesky) or is a pre-rcust
stale-schema artifact with coverage preserved (3 lasuite), all documented in DEFERRED.md — not a
silent mismatch. The false-green detector is green on my own cold canary run. No findings filed,
no VETO.
**M1 PASS (01f9f70) + M2 PASS (this entry) both stand** → the phase DoD handshake is satisfied; the
Builder may write `## DONE` to STATUS-rcust.md. (M1's unit+lint acceptance still holds on post-fix
main: 199 passed / lint PASS, the fix-forwards being additive + separately approved.)

184
machine-docs/REVIEW-shot.md Normal file
View File

@ -0,0 +1,184 @@
# REVIEW-shot.md — Adversary verdicts, phase `shot` (recipe screenshot audit & repair)
Owner: Adversary loop. Append-only verdict log. Gates: M1 (audit+diagnosis), M2 (all working).
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md`.
No gate CLAIMED yet (phase just opened; Builder has not bootstrapped STATUS-shot.md). Doing
independent cold ground-truth prep below so M1/M2 cold-verify is fast and un-anchored.
---
## Independent cold pre-audit (Adversary, @2026-06-11T01:20Z)
Method: ssh cc-ci, scanned `/var/lib/cc-ci-runs/*/results.json` for recipe + `screenshot` field +
on-disk `screenshot.png` size; scp'd suspect PNGs locally and **looked at them** (Read tool).
This is MY ground truth, formed before any Builder claim — to compare against the Builder's matrix.
PNG sizes from latest representative runs (m2r-* sweep + numbered drone runs):
| recipe | PNG bytes | my visual read | class |
|---|---|---|---|
| immich | 4801 | pure blank white frame | **BLANK** |
| n8n | 4801 | blank near-white frame | **BLANK** |
| lasuite-meet | 4801 | (size-identical to immich/n8n 4801B — blank tell) | BLANK (to confirm visually) |
| cryptpad | 4802 | blank light-grey frame | **BLANK** |
| keycloak | 8764 | spinner + "Loading the Administration Console" — paint-race loading state, NOT a real login form | **BLANK/LOADING** (not the "genuine sparse login" §2 guessed) |
| lasuite-docs | 6022 | bare spinner on white | **BLANK/LOADING** |
| lasuite-drive | ~5.9K | (size sibling of lasuite-docs — likely same spinner) | BLANK (to confirm) |
| plausible | null / NO PNG | every run null (122→357 incl. 357); run dir has no screenshot.png; capture stdout not in run dir (goes to Drone build log) — root cause still to trace | **NULL** |
| ghost | 444183 | (reference healthy, §2) | OK (visual-confirm at M2) |
| mattermost-lts | 242139 | reference healthy | OK |
| hedgedoc | 131967 | reference healthy | OK |
| discourse | 66-67K | reference healthy | OK |
| custom-html | 35707 | reference healthy | OK |
| mailu | 33800 | reference healthy | OK |
| matrix-synapse | 33296 | reference healthy | OK |
| uptime-kuma | 30858 | reference healthy | OK |
| custom-html-tiny | 12950 | reference healthy | OK |
| mumble | 7913 | voice server — web-UI N/A candidate (confirm) | N/A? |
Confirmed defect classes match the orchestrator pre-audit (§2): SPA paint-race (domcontentloaded
fires before JS paints) → immich/n8n/cryptpad fully blank, keycloak/lasuite-docs/-drive caught at
loading spinner; plausible never captures (null on every run). **The 4801B byte-identical size is a
reliable blank-frame fingerprint.**
Open items I must still resolve when verifying:
- plausible NULL root cause — need the Drone build log for a plausible run (capture stdout: "capture
failed" vs "produced no file" vs step never reached). Run dir alone doesn't have it.
- lasuite-meet / lasuite-drive / mumble — visual confirm.
- Authoritative enrolled-recipe set: every `tests/<recipe>/recipe_meta.py` minus fixtures
(`_generic`, `regression`, `concurrency`, `custom-html-bkp-bad`, `custom-html-rst-bad`).
No verdict yet. Awaiting `claim(shot): M1`.
---
## M1: PASS @2026-06-11T01:38Z (audit + diagnosis complete)
Claim: `claim(shot): M1` commit e005897; matrix+diagnoses at 8978fa6. STATUS-shot.md "M1 claim".
Verified COLD from my own clone + ssh cc-ci, **without reading JOURNAL-shot.md** (anti-anchoring).
My independent pre-audit (commit 4f3a747, formed BEFORE reading the Builder's matrix) already
agreed on every BLANK/LOADING/NULL read I had pre-formed — no anchoring.
**Enrolled set — complete, no omissions.** `ls tests/*/recipe_meta.py` = 21. Minus the two harness
canaries `custom-html-bkp-bad`, `custom-html-rst-bad` (plan §2 explicitly excludes both) = **19**.
The 19 matrix rows are *exactly* that set (diffed by hand) and exactly the plan §2 expected set.
`_generic`/`regression`/`concurrency`/`unit` have no recipe_meta.py → correctly absent. ✓
**Every non-OK row has evidence-backed root cause (independently re-derived):**
- plausible NULL — ran the Builder's drone-log command myself: build 357 step log shows
`capture failed … page.goto(https://plau-…/) never returned a status in (200,301,302,303,401,403)
after 15 attempts (45s); last status=500`. `/` 500s by design (DISABLE_AUTH) → default landing
capture can never succeed; needs a SCREENSHOT hook to a rendering path. Confirmed. ✓
- bluesky-pds NULL — capture is `if deploy_ok:`-gated, OUTSIDE the deploy try/except
(runner/run_recipe_ci.py:1024, read it). install=fail level=0 → capture correctly skipped. Not a
screenshot defect; upstream image breakage already in DEFERRED.md (rcust). ✓
- BLANK/LOADING — screenshot.py:84-93 navigates `wait_until="domcontentloaded"` then screenshots
immediately, no paint wait; accept_statuses excludes 500 (plausible mechanism). Read the code. ✓
- mumble NOT N/A — tests/mumble/recipe_meta.py header: deploys `compose.mumbleweb.yml`, a mumble-web
HTTP client routed through Traefik, HEALTH_PATH "/". A real web surface IS served → correctly the
HARDER (non-N/A) call. ✓
**Independent visual spot-checks (Read tool) — 11 artifacts, matrix matched reality on every one:**
immich 4801B = pure white; n8n 4801B = blank; cryptpad 4802B = blank grey; lasuite-meet 4801B =
pure white; keycloak 8764B = "Loading the Administration Console" spinner (NOT a real login — the
§2 "might be a genuine login" guess was wrong, Builder classed it LOADING correctly); lasuite-docs
6022B = bare spinner; mumble 7913B = spinner ring on grey; mattermost-lts 242139B = blue brand
splash + logo, NO login form (correctly LOADING despite large size — size alone is NOT a sufficient
signal, good catch); n8n run 197 30256B = real "Set up owner account" form, empty fields,
credential-free (flaky-pass + secret-safe, confirmed); custom-html 35707B = genuine "Welcome to
nginx!" (honest fresh-install view for a bare static host — OK); plausible = NULL via drone log.
Includes plausible ✓ and multiple 4801B cases ✓ (M1 minimum was ≥5 incl. those — exceeded).
**N/A arguments — agreed:**
- bluesky-pds → justified N/A (deploy-gated: can't screenshot what can't deploy; upstream breakage
is pre-existing/DEFERRED, not a screenshot defect). Agreed, contingent on the upstream image still
being broken at M2 — if it becomes deployable, it re-enters as a real recipe.
- mumble → NOT N/A. Agreed (real mumble-web surface, evidence above).
No omissions, no fabricated visual reads, diagnoses are causal not symptomatic. **M1 PASS.**
Watch-list for M2 (so the Builder has it early — NOT blocking M1):
1. Harness default-wait fix must stay within NAV_DEADLINE_S=45 / step worst-case ≤~60s and must
NEVER affect a verdict on screenshot failure (R7) — I will test the failure path has teeth but
no verdict impact, and compare pre/post run durations.
2. plausible SCREENSHOT hook must land on a credential-free *rendering* path (not /login showing a
generated secret; not a 500 page).
3. mattermost-lts proof: a bigger PNG is NOT acceptance — I will visually confirm the real login,
not a brand splash.
4. Secret-safety: every final PNG must show no generated credentials (install wizards, secrets
pages). n8n's "Set up owner account" with EMPTY fields is the safe shape; a pre-filled one is not.
5. M2 requires ≥2 proof runs via the drone `!testme` path + me Reading *every* final PNG.
Did not read JOURNAL-shot.md before this verdict. No finding filed (audit is accurate). No VETO.
---
## M2: PASS @2026-06-11T07:17:53Z — all screenshots working (cold-verified from scratch)
Verified independently from a cold start (my own clone, my own scp/Read/re-runs; did NOT read
JOURNAL before this verdict). Claim commit 196156e. Every M2 DoD item checked:
**1. Every final PNG Read (18/18) — real, representative, credential-free.** Pulled each PNG by scp,
Read it with the image tool, byte-size matched the claim on all 18:
- Fixed-class (10): immich 234351B "Welcome to Immich" onboarding; plausible 64132B real
registration form (EMPTY fields); keycloak 215587B real "Sign in to your account" (EMPTY) — was
the 8764B "Loading Admin Console" spinner at M1, settle fix resolved it; cryptpad 57310B real
landing + doc-type picker; lasuite-meet 225686B real video-conf landing; lasuite-docs 284769B real
Docs landing; lasuite-drive 132037B real "Fichiers" landing; n8n 26433B "Set up owner account"
(ALL fields EMPTY — secret-safe, now deterministic); mattermost-lts 178367B **real "Log in to your
account" form (EMPTY) — NOT the byte-identical interstitial** (hook v2 click-through works — my
sharpest watch-item, resolved); mumble 7980B loader spinner (see §N/A).
- Healthy-class (8): ghost 444183B blog landing; hedgedoc 131967B landing; discourse 66121B forum +
welcome topic; custom-html 35707B "Welcome to nginx!" (honest fresh-install); custom-html-tiny
12950B seeded content; mailu 33800B sign-in (EMPTY); matrix-synapse 33296B "It works!"; uptime-kuma
30858B "Create your admin account" (EMPTY).
Every login/setup form has EMPTY fields — NO generated credential is shown anywhere. Secret-safety
cardinal guardrail holds across all 18.
**2. No verdict/level regression.** All 10 proof runs status=pass at their baseline level (immich
/plausible/keycloak/cryptpad/lasuite-*/n8n/mumble=4, mattermost-lts=2). screenshot field populated
on every one. no_secret_leak=true on every proof run I sampled (370/371/keycloak/n8n/mattermost
/mumble).
**3. ≥2 genuine drone `!testme` proofs — confirmed end-to-end, NOT manual.** ccci-bridge_app logs:
`[poll] triggered build 370 for immich@107d7220 (PR #2, comment 14321) by autonomic-bot` and
`...build 371 for plausible@13458fac (PR #3, comment 14322)...`, both `reflected outcome ...:
success`. The bridge polled Gitea, found real !testme comments, triggered the builds, reflected
verdicts back — the full comment→build path. Drone params {RECIPE,PR,REF,SRC}, event=custom,
trigger/sender=autonomic-bot — matches the Phase-1c bridge-!testme fingerprint (REVIEW-1c:110).
**4. Durations unaffected (no balloon).** Drone same-recipe pre/post: immich 199s→198s, plausible
209s→166s (faster — capture no longer burns 45s failing on the 500). Screenshot step wait budget =
60000ms exactly (unit test_wait_budget_within_step_cap + my own cold probe). ≤~60s holds.
**5. R7 (cosmetics never block) — intact.** Call site run_recipe_ci.py ~1024-1037 is OUTSIDE the
deploy try/except AND double-wrapped in its own try/except (`_scrub`-bed log) — and git log proves
NO shot-phase commit touched run_recipe_ci.py (call site unchanged). capture() swallows everything →
None → placeholder. I cold-probed the new helpers independently: _settle swallows all exceptions,
_snap keeps the larger frame (A1 fix, 5/5), 60s budget — 9/9+5/5 pass. Screenshot unit suite 12/12
+ card suite 10/10 ran GREEN cold on the real harness (cc-ci-run) from my scp'd clone.
**6. Dashboard/card/badge render — live 200.** GET dashboard / → 200; runs/370+371/screenshot.png →
200 image/png; badge/immich.svg + badge/plausible.svg → 200 image/svg+xml.
**7. N/A set (19/19 enrolled, no omissions) — AGREED.**
- bluesky-pds → N/A, re-confirmed at M2 (ab-bluesky-pds-oldmain: install=fail, level=0,
screenshot=null → placeholder correct; upstream MODULE_NOT_FOUND still broken, DEFERRED).
- mumble → N/A-variant, AGREED — **this reverses my M1 "NOT N/A" stance, on NEW evidence not
available at M1.** rankenstein/mumble-web:0.5 renders no usable UI for an anonymous browser:
connect-dialog DOM genuinely absent (probe4 console: `#connect-dialog_input_address ... did not
match any element`), perpetual loading-container spinner at 5/15/30/60/90s (probe2) — corroborated
by my own Read of the 7980B spinner PNG. The loader frame is the literal web-surface reality every
visitor gets; mumble's actual function (voice) is fully protocol-tested; fix needs a recipe/overlay
change (out of scope, guardrail prefers upstream). Documented in DEFERRED with an upstream
question. NOTE (not a defect, not a veto): the dashboard shows the honest loader frame rather than
the "no screenshot" placeholder — acceptable as a documented, agreed limitation, NOT a healthy-app
screenshot.
Finding A1 (blank-retry regression) was filed, fixed (7ad7d1f), and CLOSED after my cold re-test.
No open findings. No fabricated reads — every matrix/claim value matched what I independently
observed. **M2 PASS. No VETO.** With M1 PASS (ae10b55) + M2 PASS both fresh and A1 closed, the DoD
handshake (§6.1) is satisfied — the Builder may write `## DONE` to STATUS-shot.md.
(Consulted no JOURNAL-shot.md before forming this verdict.)

157
machine-docs/STATUS-bsky.md Normal file
View File

@ -0,0 +1,157 @@
# STATUS — phase bsky (fix bluesky-pds recipe + screenshot)
Phase SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-bsky-fix.md
## DONE
Phase bsky complete @2026-06-11T15:55Z: M1 PASS (REVIEW-bsky 369f4f4 @12:30Z) + M2 PASS
(42eabba @15:48Z, incl. the Adversary's own independent !testme re-trigger → build 435
level 5 at PR head), no VETO. bluesky-pds root cause proven, fix PR #2 OPEN+UNMERGED for
the operator (re-pin 0.4.219), green through the full lifecycle incl. lint on real drone
CI, screenshot real and verified, DEFERRED entries closed, operator runbook below.
## M2 claim — operator handoff complete (2026-06-11T15:50Z)
WHAT (phase plan §3 M2, all builder-side items in place; the fresh cold pass is yours):
1. **Green at PR head, re-triggerable:** PR #2 head f7b6c8df unchanged since run 427
(level 5). HOW to re-run independently: post `!testme` on PR #2 — the bridge polls
~1 min, triggers a drone build, run dir /var/lib/cc-ci-runs/<n>. EXPECTED: level=5,
rungs install/backup_restore/functional/lint=pass, upgrade=skip with
skips.intentional.upgrade = the declared reason, clean_teardown+no_secret_leak=true,
screenshot.png = the PDS landing page. (cc-ci main also unchanged functionally since
e9745c8; HEAD at claim time: see this commit.)
2. **PNG to independently Read:** https://ci.commoninternet.net/runs/427/screenshot.png
(+ the fresh run's, if you re-trigger). EXPECTED: ASCII Bluesky butterfly landing
page, no credentials.
3. **Level under new semantics + baseline reconciled:** achieved level 5 (de-capped:
skip climbs), upgrade = declared intentional skip with re-enable path. Old baseline
"full lifecycle green" (Phase-2 e45e0ee, pre-results-era) reconciled: unreproducible
for upstream reasons (moving-tag republish broke ALL published versions); the PR
restores deployability; recorded in DEFERRED closure + JOURNAL-bsky 12:15Z entry.
4. **DEFERRED entries closed with pointers:** machine-docs/DEFERRED.md bluesky entry
marked RESOLVED @2026-06-11 (commit f150012) — explicitly closes BOTH the re-pin
follow-up and the rcust M2 baseline-exclusion note, with PR/run/registry pointers.
5. **Operator summary:** below in this file (what was wrong / what the PR changes /
post-merge steps 1-5 incl. version publish, EXPECTED_NA→UPGRADE_BASE_VERSION swap,
no canonical to reseed, never re-pin :0.4).
6. **PR left OPEN** for the operator (merged=false; immich PR#2/plausible PR#3 precedent).
WHERE: cc-ci main (STATUS/JOURNAL/BACKLOG-bsky, DEFERRED f150012, DECISIONS 2026-06-11
×2, harness e9745c8); mirror PR #2 head f7b6c8df; runs 427 (green) / 423 (negative
control); upstream registry cc-ci-plan/upstream/bluesky-pds.md @ f395247.
## M1 claim — root cause + green fix PR + screenshot (2026-06-11T12:05Z)
### WHAT
1. Root cause proven with evidence (below).
2. Fix PR open on the recipe mirror: **recipe-maintainers/bluesky-pds PR #2**, branch
`upgrade-0.3.0+v0.4.219`, head `f7b6c8df` — 2-line compose.yml diff (image
`ghcr.io/bluesky-social/pds:0.4``0.4.219`; version label `0.2.0+v0.4`
`0.3.0+v0.4.219`). UNMERGED (operator merges).
3. `!testme` on the PR green through the full lifecycle via the real drone path:
**run 427 = level 5** — install/backup_restore/functional/lint all PASS, upgrade =
DECLARED intentional skip (justification below), clean_teardown, no_secret_leak.
4. Screenshot captured on that PR run and visually verified by me: the genuine PDS
HTTP landing page (ASCII Bluesky logo, "This is an AT Protocol Personal Data
Server", /xrpc/ pointer, upstream links) — real, representative, credential-free.
No SCREENSHOT hook needed.
### Root cause
The recipe pins MOVING tag `ghcr.io/bluesky-social/pds:0.4` and overrides the entrypoint
with a script ending `exec node --enable-source-maps index.js` (relative to WORKDIR /app).
Upstream now publishes main-branch builds to `:0.4` (== `latest`, manifest
`sha256:871194d2…`, created 2026-05-30): `@atproto/pds` **0.5.1**, Node v24.15.0, service
restructured to `/app/index.ts` (CMD `node --enable-source-maps index.ts`; **no
index.js**) → crash-loop `Cannot find module '/app/index.js'`. Exact tag `0.4.219`
(newest released; ghcr digest `sha256:e0b756701c92…`) keeps the expected layout: Node
v20.20.2, `/app/index.js`, dumb-init, CMD identical to the recipe's exec line.
HOW to verify root cause (any host with ssh cc-ci):
- `ssh cc-ci 'docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4 -c "node --version; ls /app; grep @atproto/pds /app/package.json"'`
→ EXPECTED v24.15.0; index.ts, NO index.js; `"@atproto/pds": "0.5.1"`
- `ssh cc-ci 'docker run --rm --entrypoint sh ghcr.io/bluesky-social/pds:0.4.219 -c "node --version; ls /app; grep @atproto/pds /app/package.json"'`
→ EXPECTED v20.20.2; index.js present; `"@atproto/pds": "0.4.219"`
- Upstream: Dockerfile@main = node:24.15-alpine3.23 + CMD index.ts;
Dockerfile@v0.4.219 = node:20.20-alpine3.23 + CMD index.js. Registry doc:
cc-ci-plan/upstream/bluesky-pds.md (plan repo f395247).
### Upgrade-rung justification (the "justify status either way" item)
Published versions exist (0.1.1+v0.4, 0.2.0+v0.4) but BOTH pin the republished `:0.4`
no published version can deploy as the upgrade base anymore (negative control: run 423,
pre-harness-change, deployed base 0.1.1+v0.4 → identical MODULE_NOT_FOUND crash-loop,
install=fail, PR head never reached; run-423 recipe checkout sat at tag 0.1.1+v0.4).
Harness change e9745c8 (main): declaring the upgrade rung in recipe_meta EXPECTED_NA now
also suppresses the base deploy — single deploy = the PR head; the upgrade tier records
"skip"; derive_rungs classifies it the DECLARED intentional skip; reason fully visible in
results.json `skips.intentional` and on the card. NOT a weakening: the rung is never
reported pass; decision + re-enable path in machine-docs/DECISIONS.md (re-enable =
UPGRADE_BASE_VERSION="0.3.0+v0.4.219" once merged+published).
HOW: `cc-ci-run -m pytest tests/unit/ -q` from a cold clone of main on cc-ci →
EXPECTED 253 passed (6 new in tests/unit/test_upgrade_base.py);
`nix develop .#lint -c bash scripts/lint.sh` → EXPECTED `lint: PASS`.
### Green-run evidence (run 427, drone path)
- Trigger: PR #2 comment 14342 (`!testme`) → bridge log line
`[poll] triggered build 427 for bluesky-pds@f7b6c8df (PR #2, comment 14342)`;
outcome line `reflected outcome build 427 (bluesky-pds PR #2): success`; PR result
comment 14343 "✅ passed @ f7b6c8df".
- HOW: `ssh cc-ci 'cat /var/lib/cc-ci-runs/427/results.json'` → EXPECTED level=5,
ref=f7b6c8dfb81c, rungs install/backup_restore/functional/lint=pass + upgrade=skip,
skips.intentional.upgrade=<declared reason>, flags clean_teardown+no_secret_leak true.
- PR-head proof: run-427 per-run recipe checkout
(`/var/lib/cc-ci-runs/427/abra/recipes/bluesky-pds`) at `f7b6c8d chore: upgrade to
0.3.0+v0.4.219`, compose.yml line 6 image=…:0.4.219.
- Visuals: https://ci.commoninternet.net/runs/427/summary.png (card: level 5 of 5, all
tiers PASS, upgrade INTENTIONAL SKIP + reason, screenshot thumb, clean-teardown +
no-secret-leak chips), …/badge.svg ("cc-ci: level 5", green),
…/screenshot.png (the PDS landing page described above).
### WHERE
- cc-ci main @ 72b3d6c (harness change e9745c8; journal/decisions 72b3d6c).
- Mirror PR #2: https://git.autonomic.zone/recipe-maintainers/bluesky-pds/pulls/2
(head f7b6c8df; base main b2d86ef).
- Runs: /var/lib/cc-ci-runs/427 (green, PR head), /var/lib/cc-ci-runs/423 (negative
control, pre-change base trap).
- Upstream registry: cc-ci-plan/upstream/bluesky-pds.md @ plan-repo f395247.
## Operator summary
**What was wrong.** bluesky-pds could not deploy at all: the app crash-looped
`Cannot find module '/app/index.js'`. The recipe pins the MOVING image tag
`ghcr.io/bluesky-social/pds:0.4`, and upstream now republishes that tag with main-branch
builds (currently @atproto/pds 0.5.1 on Node 24, where the service entrypoint moved to
`/app/index.ts``index.js` no longer exists). The recipe's entrypoint override
(`exec node --enable-source-maps index.js`) can no longer resolve. This also silently
broke BOTH previously published recipe versions (0.1.1+v0.4, 0.2.0+v0.4 — same moving
pin), so no historical version can deploy anymore either.
**What the PR changes.** https://git.autonomic.zone/recipe-maintainers/bluesky-pds/pulls/2
(branch `upgrade-0.3.0+v0.4.219`, head f7b6c8df), a 2-line compose.yml diff: pin the exact
released tag `0.4.219` (newest released; classic Node 20 / index.js layout the recipe's
entrypoint expects) and bump the version label to `0.3.0+v0.4.219`. Why not 0.5.1: it has
no release tag (only the moving :0.4/latest + sha- tags from main) and needs an entrypoint
migration; do that as a proper upgrade when upstream cuts a 0.5.x release tag (notes in
cc-ci-plan/upstream/bluesky-pds.md). Proven at PR head via real drone CI: run 427 =
**level 5** (install, backup/restore, functional, lint PASS; screenshot = real PDS landing
page). The upgrade rung is a DECLARED intentional skip — there is no deployable published
base to upgrade FROM (see above); declaration + reason in tests/bluesky-pds/recipe_meta.py.
**What to do post-merge.**
1. Merge PR #2 (your call, as with immich PR#2 / plausible PR#3 — all left open).
2. Publish the version per recipe convention (annotated tag `0.3.0+v0.4.219` /
`abra recipe release`) so `abra recipe versions` lists a deployable version again.
3. After the tag is published: in cc-ci `tests/bluesky-pds/recipe_meta.py`, DROP the
`EXPECTED_NA["upgrade"]` declaration and set
`UPGRADE_BASE_VERSION = "0.3.0+v0.4.219"` — the upgrade rung then re-activates from
the first deployable base (the older broken tags must never be auto-picked as base).
4. Canonical/warm: nothing to reseed — bluesky-pds has no canonical
(/var/lib/ci-warm has no entry); the normal promote-on-green flow mints one on the
first green run post-merge.
5. Never re-pin this recipe to `:0.4`/`latest` — upstream demonstrably republishes the
minor tag (registry notes: cc-ci-plan/upstream/bluesky-pds.md).

View File

@ -0,0 +1,62 @@
# STATUS — sub-phase conc (concurrency restructure)
Plan: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md (SSOT for this phase)
## DONE
Both gates Adversary-verified fresh in REVIEW-conc.md, no open VETO:
- M1 — implementation verified: PASS @2026-06-10T04:38Z (branch @d3fe9e2)
- M2 — merged + live-verified (a)(d): PASS @2026-06-10T08:55Z (final main 139e319/74ed240)
- CONC-A1 (M2(c) live finding): fixed b6e12ef, veto LIFTED + closed @09:05Z
## Phase state
- Phase: conc — concurrency restructure (P1P5 + tests/concurrency) — COMPLETE
- Merged to main: bb5eb3d (restructure) + b7a009c (wrapper exit-code fix) + 139e319 (CONC-A1 fix)
- Correction per M2 verdict: 139e319's first parent is 2173894 (not 4ad55ed as the claim said);
immaterial — the code-diff-empty check (139e319 vs b6e12ef) is authoritative.
## Gate claim: M2 — merged + live-verified
**WHAT**: branch merged to main after M1 PASS; live verification (a)(d) all green on the final
main code (which includes two M2-found fixes, both already Adversary-verified: wrapper exit-code
e1c4198/b7a009c, CONC-A1 run-keyed state files b6e12ef/139e319).
**WHERE**: main tip code = merge 139e319 (parents 4ad55ed ∘ b6e12ef); branch tip b6e12ef.
All evidence builds ran post-139e319. Drone repo recipe-maintainers/cc-ci; host cc-ci.
**HOW + EXPECTED (cold re-check from your own access path):**
1. Merge integrity: `git diff 139e319 b6e12ef -- runner/ tests/ docs/ .drone.yml nix/` → EMPTY;
no force-push anywhere (reflog linear).
2. Push build green on main: Drone builds 283 (branch fix), 284 (merge 139e319), 285 (inbox
commit) → all `status=success` (push events). No main push since has a red build.
3. Suites at b6e12ef (cold clone): `cc-ci-run -m pytest tests/unit -q` → 138 passed;
`cc-ci-run -m pytest tests/concurrency -q` → 23 passed; `nix develop .#lint --command bash
scripts/lint.sh` → lint: PASS. (You already cold-verified these + mutation-proofed
test_run_state per REVIEW-conc 08:4xZ entry.)
4. **(a) cancel-mid-run, on fixed harness**: build **295** (custom immich PR=2, comment 14307
@08:50:02Z). Canceled via `DELETE /api/repos/recipe-maintainers/cc-ci/builds/295` @08:51:05Z
(HTTP 200) while mid-deploy (lock held by harness pid 763099, 4 immich services converging).
EXPECTED/observed: build `status=killed`; pid 763099 gone by 08:51:15Z (SIGTERM funnel ran
the run's own teardown); `pgrep -f run_recipe_c[i]` → none; `lslocks | grep cc-ci-app`
none (lock released); immi services/volumes/secrets/server-envs all 0. Zero leakage, no
janitor needed (better than plan minimum).
5. **(b) parallel runs**: builds **287** (immich#2) + **288** (plausible#3), both started
08:17:40Z (parallel), both `status=success`, both logs `deploy-count = 1 (expect 1)` +
level=4. Host after: zero harness procs / services / volumes / secrets / envs.
6. **(c) double-!testme same PR**: builds **290** + **291** (both immich#2, domain immi-ad3e33).
291 log line 1: `== app lock: another run of immi-ad3e33... is in flight — waiting ==`,
`acquired` @+1411s = exactly 290's exit (08:46:05Z). BOTH `status=success`, both
`deploy-count = 1`, level=4. Zero leakage after. (Your M2(c) PASS @09:05Z already covers
this; kernel-lock-table observation yours.)
7. **(d) full green run**: build **287** = complete immich e2e on final harness, all 5 tiers
pass, level=4 (288 plausible likewise).
**Notes for verification**: builds 290/291 ran ~20 min each due to an immich-ML healthcheck
flake (your 08:43Z note) — converged within DEPLOY_TIMEOUT=1500s; unrelated to the restructure.
Unheld 0-byte lockfiles left behind by design (tidy-swept at next janitor probe).
## Blockers
(none)

View File

@ -0,0 +1,219 @@
# STATUS — phase `dstamp` (discourse abra-stamp drift)
Builder. SSOT: `cc-ci-plan/plan-phase-dstamp-discourse-drift.md`. Gates M1, M2.
## DONE
M1 PASS (REVIEW-dstamp `fb411b2` @17:36Z) + M2 PASS (`71358da` @17:58Z), both fresh, no VETO.
All Definition-of-Done items Adversary-verified.
**Operator summary.** The discourse upgrade-tier "abra stamp drift" (upgrade-HC1 stamping the
prev-base tag commit `eb96de94+U` instead of the PR head `7ae7b0f7+U`, since ~06-10) was **NOT an
abra or harness git bug** — abra stamps the head correctly. **Root cause:** discourse's
`compose.yml` app service uses `deploy.update_config: { failure_action: rollback, order:
start-first, monitor: 5s }`. On the upgrade chaos redeploy, start-first co-resides the OLD+NEW
precompile/Rails-heavy task (~2× memory); under host memory pressure the NEW task fails swarm's 5s
update monitor → swarm **rolls back** to the base spec, reverting the `chaos-version` label
(head→base). start-first kept the old task serving, so `wait_healthy` passed and HC1 read the
reverted base commit — misreported as "re-checkout failed". Intermittent (memory-pressure
dependent): solo run 184 on 06-05 passed; the heavier 06-10/06-11 runs rolled back every time.
**Direct evidence:** `dstamp-repro4` captured `.Spec chaos-version=7ae7b0f7+U` (head applied) →
`.PreviousSpec=eb96de94+U` (base) with `UpdateStatus=updating`, then the post-rollback read = base.
**Fix (commits `0cc31a5` + `e9c26c7`, HC1 unweakened):** (1) `tests/discourse/compose.ccci.yml`
app `update_config.order: stop-first` — the new task boots with full host memory, no OOM, no
spurious rollback (`failure_action: rollback` left intact for genuine failures); (2) a general
harness guard `lifecycle.assert_upgrade_converged` (2-phase StartedAt protocol) that detects a
swarm rollback/pause after the upgrade redeploy and fails the upgrade HONESTLY — the HC1
commit-match assertion is unchanged.
**Proven in real CI:** drone `!testme` build **#450** (discourse @7ae7b0f) = **LEVEL 5** (was L1
under the drift), all tiers green, clean teardown, no secret leak; PR recipe-maintainers/discourse#2
shows ✅ passed. **Blast-radius:** only discourse was affected (keycloak/n8n share the policy but
upgrade-PASS L4; drone/traefik are infra) — the new harness guard now protects all rollback-policy
recipes. DEFERRED entry closed with pointers. **No operator action required.**
---
## Gate: M1 — PASS (REVIEW-dstamp fb411b2 @2026-06-11T17:36Z). Now on M2.
## Gate: M2 — CLAIMED, awaiting Adversary
**WHAT (M2 = Proven in real CI):** discourse full lifecycle GREEN at its true level via the drone
`!testme` path, upgrade-HC1 stamping the CORRECT head value; no other affected recipe; HC1
unweakened (a wrong stamp still FAILs); DEFERRED closed.
- **Real-CI proof — drone `!testme` build #450:** discourse @ `7ae7b0f76efb` (PR#2), STAGES full
(install,upgrade,backup,restore,custom), drone workspace at cc-ci main `2da1f01` (fix present) →
**LEVEL 5** (max), ALL tiers PASS, `clean_teardown=true`, `no_secret_leak=true`. Upgrade tier
`test_upgrade_reconverges` PASSED (HC1's `assert_upgraded` only passes when the deployed
chaos-version commit == head_ref `7ae7b0f`, after `assert_upgrade_converged` confirmed
`UpdateStatus=completed`). Was L1 (drift) before the fix → L5 now.
- **Triggered via the !testme path:** comment `14346` (`!testme`) on recipe-maintainers/discourse#2
→ bridge ack `14347`, updated to "🌻 cc-ci — discourse @ 7ae7b0f7 ✅ **passed**" with the L5
result card/badge linking drone build 450.
**HOW to verify (Adversary, cold):**
1. `grep -oE '"level": [0-9]+|"(install|upgrade|backup|restore|custom)": "[a-z]+"|"clean_teardown":
(true|false)|"no_secret_leak": (true|false)' /var/lib/cc-ci-runs/450/results.json` → level 5,
all `pass`, both flags `true`.
2. `/var/lib/cc-ci-runs/450/junit/upgrade__generic__test_upgrade.xml` → `test_upgrade_reconverges`
testcase with NO `<failure>` child (passed).
3. PR comment 14347 on recipe-maintainers/discourse#2 = ✅ passed, run 450.
4. *Fresh independent re-trigger (recommended):* post `!testme` on discourse#2 → new drone build on
cc-ci main → expect L5 again (reliability: manual fix1+fix2 + build 450 = 3 consecutive green
with the fix vs intermittent unpatched failures).
5. **HC1 teeth (negative test — Adversary leads):** synthesize a wrong stamp and show RED. Two live
teeth: (a) the unchanged commit-match `generic.py:174-175` — a deployed chaos commit ≠ head_ref
still FAILs (e.g. force the recheckout to the base, or deploy base-as-head); (b) the new
`assert_upgrade_converged` raises on a swarm `rollback_completed`/`paused` (the ORIGINAL drift
path — repro1/repro4 are exactly this RED, now with an honest message). Neither relaxes HC1.
6. DEFERRED closed: `machine-docs/DEFERRED.md` dstamp entry → ✅ RESOLVED with pointers.
**EXPECTED:** build 450 level 5, all tiers pass, both flags true; PR#2 ✅ passed; DEFERRED resolved.
**WHERE:** `/var/lib/cc-ci-runs/450/`; commits `0cc31a5`,`e9c26c7`; PR#2 comments 14346/14347;
`machine-docs/DEFERRED.md`. **No other recipe affected** (blast-radius: keycloak/n8n upgrade-PASS L4
across runs incl. rcust era; drone/traefik infra). Fresh Adversary M2 PASS → `## DONE`.
---
## (M1 — verified PASS; detail retained below)
**WHAT (M1 = Attribution):** root cause attributed by direct evidence; minimal reproducible
demonstration; 06-05→06-10 change identified; fix implemented (recipe overlay + harness, HC1
unweakened); blast-radius sweep complete.
Root cause: discourse `compose.yml` app service sets `deploy.update_config: { failure_action:
rollback, order: start-first, monitor: 5s }`. On the upgrade chaos redeploy, start-first co-resides
OLD+NEW (~2× memory) for the precompile/Rails-heavy app; under host memory pressure the NEW task
fails swarm's 5s update monitor → `failure_action: rollback` reverts the app service to its
PreviousSpec — INCLUDING the `coop-cloud.<stack>.chaos-version` label (head→base). Under start-first
the OLD task keeps serving, so `wait_healthy` passes; `deployed_identity` then reads the rolled-back
`.Spec` (base commit `eb96de94+U`) and HC1 misreports it as "re-checkout failed". abra+harness git
path EXONERATED (abra stamps head `7ae7b0f7+U` correctly; per-run HEAD=7ae7b0f at deploy).
**HOW to verify (Adversary, cold):**
1. *Recipe policy:* `cd ~/.abra/recipes/discourse && git checkout -q 7ae7b0f76efb && grep -nA3
update_config compose.yml` → `failure_action: rollback`, `order: start-first`. EXPECTED present.
2. *abra exonerated (minimal repro):* scratch ABRA_DIR, base→head checkout, `abra app deploy <d> -C
-o -n --debug` bails at `secret not generated` AFTER logging `app/deploy.go:372 version: taking
chaos version: 7ae7b0f7+U` (HEAD-correct). Procedure: JOURNAL-dstamp "mirror-faithful repro".
3. *Direct rollback evidence:* console `/var/lib/cc-ci-runs/dstamp-repro4.console.log` line
`[DSTAMP] post-redeploy svc inspect …` shows immediately post-redeploy `UpdateStatus.State=
"updating"`, `.Spec…chaos-version=7ae7b0f7+U` (head applied), `.PreviousSpec…chaos-version=
eb96de94+U` (base); the later HC1 read = eb96de94+U after the rollback completes.
4. *Fix present:* `runner/harness/lifecycle.py::assert_upgrade_converged` (+ `update_status_started`)
and its call in `runner/harness/generic.py::perform_upgrade`; `tests/discourse/compose.ccci.yml`
app `deploy.update_config.order: stop-first`. Commits `0cc31a5` + `e9c26c7`.
5. *Fix works:* run `dstamp-fix1` (fresh checkout, STAGES=install,upgrade) → upgrade PASS,
console `upgrade-converged: …UpdateStatus=completed` + `chaos-version=7ae7b0f7+U version=
0.7.0+3.3.1→0.9.0+3.5.0`. (Re-runnable: `RECIPE=discourse PR=2
REF=7ae7b0f76efb2988c1e54956348dc9eeb7812e0b SRC=recipe-maintainers/discourse
STAGES=install,upgrade CCCI_RUN_ID=<id> cc-ci-run runner/run_recipe_ci.py` from a checkout at
`e9c26c7`.)
6. *Blast-radius:* recipes with rollback+start-first = discourse, drone, keycloak, n8n, traefik.
keycloak/n8n upgrade PASS L4 across runs (155/186/187/m2r; 47/54/61/162/197/m2r) ⇒ not affected;
drone/traefik infra (no recipe-CI upgrade tier). Only discourse affected; the general
`assert_upgrade_converged` guard now protects all rollback-policy recipes.
**EXPECTED:** all of 16 hold. **WHERE:** commits 0cc31a5, e9c26c7; runs
`/var/lib/cc-ci-runs/dstamp-{repro1,repro2,repro4,fix1}`; recipe `~/.abra/recipes/discourse`.
HC1 teeth preserved: the commit-match assertion is unchanged; `assert_upgrade_converged` only makes
a swarm rollback an HONEST upgrade failure before HC1 runs (a genuinely undeployable head still
fails). M2 will demonstrate a wrong stamp still FAILs + full-lifecycle green via the `!testme` path.
---
## Root cause detail (evidence)
## ROOT CAUSE (attributed by direct evidence, abra+harness EXONERATED)
The upgrade chaos redeploy applies the **correct** head spec, then swarm **rolls it back** to the
base spec, reverting the `chaos-version` label — masked by the recipe's `start-first` strategy +
the harness's `wait_healthy` (the OLD task keeps serving, so health passes).
Recipe policy (`~/.abra/recipes/discourse/compose.yml`, app service): `deploy.update_config:
{ failure_action: rollback, order: start-first }`, `healthcheck.start_period: 20m`. The heavy
discourse app, started **start-first** (old+new co-resident ≈ 2× memory), intermittently fails
swarm's update monitor on the NEW task → swarm executes `failure_action: rollback` → app service
reverts to PreviousSpec (the base, `chaos-version=eb96de94+U`).
**Direct evidence (run `dstamp-repro4`, console `/var/lib/cc-ci-runs/dstamp-repro4.console.log`,
solo/isolated):** immediately after `chaos_redeploy`, `docker service inspect <stack>_app`:
- `UpdateStatus.State = "updating"`,
- `.Spec.Labels coop-cloud.<stack>.chaos-version = 7ae7b0f7+U` (HEAD applied — abra stamped head
correctly), `.version = 0.9.0+3.5.0`,
- `.PreviousSpec.Labels …chaos-version = eb96de94+U` (the base), `.version = 0.7.0+3.3.1`.
Then `wait_healthy` passes (old task serves under start-first); the new task fails the monitor →
rollback → `.Spec` reverts to `eb96de94+U`; the later HC1 read sees `eb96de94+U` → FAIL with the
misleading "re-checkout failed" message. (`dstamp-repro2`, lighter timing, had NO rollback →
upgrade PASS @ `7ae7b0f7+U`.)
Intermittency (184✓ solo 06-05; m2b/m2p/ab✗ clustered/heavier-load 06-10/11; repro1✗ repro2✓
repro4✗) = whether the new start-first task survives swarm's monitor under the host's momentary
memory pressure. The "since ~06-10 on every run" = the rcust phase ran under heavier resident load
(warm keycloak etc.) so the new task reliably failed → rollback every time. abra version-resolution
is CORRECT (proven: repro2 debug line `taking chaos version: 7ae7b0f7+U` + 3 bail-at-secrets repros);
the per-run git checkout is CORRECT (HEAD=7ae7b0f at deploy, reflog-proven). NOT abra, NOT the
per-run tree, NOT concurrency.
## Fix (in progress) — HC1 keeps its teeth
1. **Reliability (restore true level):** discourse `tests/discourse/compose.ccci.yml` overlay set
the app service `deploy.update_config.order: stop-first` so the new task boots with full memory
(no 2× co-residency) and genuinely becomes healthy → no spurious rollback. The upgrade-to-head
is still really deployed + asserted on head; HC1 unchanged. Documented WHY in the overlay header.
2. **Correctness (honesty, general):** the harness upgrade path detects a swarm rollback after the
chaos redeploy (UpdateStatus.State rollback*/paused, or `.Spec` reverted to `.PreviousSpec`) and
fails the upgrade with the TRUE reason ("head spec applied then swarm-rolled-back: new task
failed the update monitor") instead of the misleading "re-checkout failed". A genuinely
undeployable head still FAILS (teeth preserved).
3. **Blast-radius:** sweep all enrolled recipes for `failure_action: rollback` + start-first heavy
apps with the same latent signature.
## What is established (direct evidence, reproducible)
- **abra is CONSTANT, not the cause.** abra binary `bf6azhpi…-abra-0.13.0-beta` is the store
path for every nixos system generation from system-4 (2026-06-01) through system-11 (now).
No abra change between 06-05 and 06-10.
HOW: `for g in $(ls -d /nix/var/nix/profiles/system-*-link); do readlink -f "$g/sw/bin/abra"; done`
on cc-ci. EXPECTED: all `…bf6azhpi…` from system-4 on.
- **abra's chaos-version = `SmallSHA(git HEAD of the recipe checkout)`** (+`+U` if worktree
dirty). Source: abra@06a57de `cli/app/deploy.go:106,168,365-373` (chaos →
`toDeployVersion = Recipe.ChaosVersion()`), `pkg/recipe/git.go:300-318` (`ChaosVersion` =
`SmallSHA(Head())`), `:483-495` (`Head` = go-git `repo.Head()`). In chaos mode
`Recipe.Ensure` early-returns (`pkg/recipe/git.go:41-43`) — NO env-version re-checkout.
- **The isolated git/abra path stamps CORRECTLY now.** Three faithful reproductions on cc-ci
(scratch ABRA_DIR, fake domain, deploys bail at `secret not generated` AFTER the chaos
version is computed) all log `taking chaos version: 7ae7b0f7` (= PR head), NOT `eb96de9`:
1. `cp -a` canonical recipe + manual tag/head checkout.
2. real non-chaos base deploy (go-git `EnsureVersion` tag checkout) → CLI re-checkout head → chaos.
3. exact `fetch_recipe` replica: clone mirror `recipe-maintainers/discourse` @7ae7b0f +
`git fetch upstream refs/tags/*` → base deploy → re-checkout head → chaos.
HOW (variant 3, re-runnable cold): see JOURNAL-dstamp 2026-06-11 "mirror-faithful repro".
EXPECTED: `DEBU app/deploy.go:372 version: taking chaos version: 7ae7b0f7`.
- **Same ref, solo run was GREEN; clustered runs DRIFTED.** discourse @ ref `7ae7b0f76efb`:
run **184** (2026-06-05 02:17, solo) = **L4, upgrade PASS**; the 06-10/06-11 runs
**m2b-discourse** (06-10 20:54), **m2p-discourse** (06-11 00:44), **ab-discourse-7ae7b0f-oldmain**
(06-11 00:48) = **L1, upgrade FAIL** (`chaos commit 'eb96de94+U', not the intended PR-head
'7ae7b0f76efb' (HC1)`). HOW: `grep -oE '"level": [0-9]+|"upgrade": "[a-z]+"'
/var/lib/cc-ci-runs/{184,m2p-discourse}/results.json`.
- **All same-ref discourse runs share ONE swarm stack.** `naming.app_domain(recipe,pr,ref)` =
`<recipe[:4]>-<6hex(recipe|pr|ref)>.ci.commoninternet.net` → identical for identical
(recipe,pr,ref). The upgrade `chaos_redeploy` bypasses `deploy_app`'s app-domain flock
(`lifecycle.chaos_redeploy` / `generic.perform_upgrade`). LEADING HYPOTHESIS: the 06-10/06-11
drift is a CONCURRENCY ARTIFACT of the clustered rcust-M2 A/B discourse experiments racing on
the shared stack — NOT an abra/recipe/env regression. Under test now.
## In flight
- Implementing the fix (overlay stop-first + harness rollback detection), then a full real run
(all stages) to prove discourse reliably reaches its true level, then the `!testme` drone path.
- Repro evidence runs: `/var/lib/cc-ci-runs/dstamp-repro{1,2,3,4}.console.log` on cc-ci
(repro2 PASS @7ae7b0f7+U; repro4 captured the rollback Spec/PreviousSpec).
## Blocked
- (none)

107
machine-docs/STATUS-kuma.md Normal file
View File

@ -0,0 +1,107 @@
# STATUS — phase `kuma` (uptime-kuma create-a-monitor functional test)
SSOT: `cc-ci-plan/plan-phase-kuma-monitor.md`
## Current state
## DONE
All DoD items satisfied. M1+M2 Adversary PASSes in REVIEW-kuma.md.
- test_monitor_wizard_and_probe: wizard + real probe (Up + Down) in Playwright
- Drone builds #460 + #462 — LEVEL 5, 2× consecutive green (flake check ✓)
- Runtime 2.752.82 s ≪ 90 s budget ✓
- DEFERRED.md "uptime-kuma create-a-monitor" closed ✓
- PARITY.md updated with playwright/ test row ✓
- M1 PASS @2026-06-11T18:26Z, M2 PASS @2026-06-11T18:3xZ
- No standing VETO
## What is claimed
### Approach choice (DECISIONS.md)
Playwright (option b). Justification: python-socketio is NOT available in the cc-ci Nix env
(confirmed: only playwright + pytest in site-packages). Playwright drives the real browser;
Socket.IO is handled transparently. No Nix changes needed.
### Test file
`tests/uptime-kuma/playwright/test_monitor_wizard.py`
### What the test does
1. Completes uptime-kuma 2.2.1 first-run setup wizard (admin create via browser).
2. Creates HTTP monitor targeting the app's own root URL (guaranteed UP at test time).
3. Waits ≤90 s for status badge (`data-testid="monitor-status"`) to show "Up".
4. Asserts important-heartbeat table row exists with a real datetime stamp (proves probe ran).
5. Creates a second monitor targeting `http://127.0.0.1:19999/dead` (dead port → connection refused).
6. Waits ≤60 s for status badge to show "Down" (negative teeth).
### Selectors used (all confirmed in compiled bundle `dist/assets/index-D_mnxLA0.js`)
- Setup: `data-cy="username-input"`, `data-cy="password-input"`, `data-cy="password-repeat-input"`, `data-cy="submit-setup-form"`
- EditMonitor: `data-testid="friendly-name-input"`, `data-testid="url-input"`, `data-testid="save-button"`
- Details: `data-testid="monitor-status"`
- Heartbeat table: `table.table-hover tbody tr` (first row)
### Secret safety
Admin password: 64-char UUID hex, generated per-run. Never printed, never in any assertion error message.
### Probe reality
- "Up" in the status badge comes from `lastHeartbeatList` populated via Socket.IO heartbeat events
(socket.js mixin line 755). Cannot be "Up" unless a real probe completed and the server sent the
heartbeat over the socket.
- Important-heartbeat table row exists: `isFirstBeat` is always `important=true` (server/model/monitor.js
line 1420). Presence of a row with "YYYY-MM-DD HH:mm:ss" timestamp proves the probe ran after monitor
creation.
- Negative teeth: "Down" can only appear after the probe attempted and got connection-refused.
### How to verify (Adversary cold-check)
```bash
# Deploy uptime-kuma against any fresh cc-ci domain, then run:
CCCI_APP_DOMAIN=<domain> RECIPE=uptime-kuma STAGES=custom \
cc-ci-run -m pytest tests/uptime-kuma/playwright/test_monitor_wizard.py -v
# Expected: test_monitor_wizard_and_probe PASSED
# In the Drone-path, it runs under the "custom" tier via run_recipe_ci.py.
```
### Runtime
Local estimate: wizard ~10 s + 2× (navigate+fill+probe) ≤ ~60 s total. Within ≤90 s budget.
### CI evidence (M1)
- Drone build **#460** — uptime-kuma@eb4521cc (PR #3, comment #14349)
- Result: **LEVEL 5** — install/upgrade/backup/restore/custom/lint all PASS
- Custom tier: `functional: 3` (health_check, socketio_handshake, spa_branding) + `playwright: 1` (`test_monitor_wizard`)
- `test_monitor_wizard [pass]` confirmed in stage results
- `flags: {clean_teardown: true, no_secret_leak: true}`
- PR comment posted: git.autonomic.zone/recipe-maintainers/uptime-kuma/pulls/3 shows ✅ passed
- Artifacts: `/var/lib/cc-ci-runs/460/` on cc-ci
### M2 evidence (flake check + DEFERRED closed)
- Drone build **#462** — uptime-kuma@eb4521cc (PR #3, comment #14352)
- Result: **LEVEL 5** — install/upgrade/backup/restore/custom/lint all PASS
- `test_monitor_wizard [pass]` — 2 consecutive green runs (#460 + #462)
- DEFERRED.md entry "2026-05-28 — uptime-kuma create-a-monitor" closed (commit below)
- PARITY.md updated: new row for `tests/uptime-kuma/playwright/test_monitor_wizard.py`
### How to cold-verify M2
```
git pull; cat machine-docs/DEFERRED.md | grep -A2 "uptime-kuma create-a-monitor"
# → "CLOSED @2026-06-11 (Builder, phase kuma)"
cat tests/uptime-kuma/PARITY.md | grep playwright
# → row for test_monitor_wizard.py
cat /var/lib/cc-ci-runs/462/results.json | python3 ...
# → level:5, test_monitor_wizard [pass]
```
### How to cold-verify M1
```
# On Adversary's clone (cc-ci-adv):
git pull; git log --oneline -3 # confirm 8da59cf feat(kuma): implement wizard+monitor Playwright test
# Inspect the test:
cat tests/uptime-kuma/playwright/test_monitor_wizard.py
# Verify CI results:
cat /var/lib/cc-ci-runs/460/results.json | grep -E "level|playwright|wizard|status"
# → level:5, playwright:1, test_monitor_wizard:[pass]
# Check PR comment confirms ✅:
# https://git.autonomic.zone/recipe-maintainers/uptime-kuma/pulls/3
```
## Blocked
(nothing)

View File

@ -0,0 +1,71 @@
# STATUS — Phase lvl5 (L5 lint rung + de-cap)
## DONE
Phase complete 2026-06-11: M1 PASS (cfc87fd) + M2 PASS (13cad1f), both <24h, no VETO.
The 5-rung ladder (L5 = abra recipe lint on the exact tested ref) and the de-capped level
semantics (pass/fail/skip/unver; fails AND unverified rungs block, intentional skips climb;
no cap/cap_reason anywhere) are live on main @ a521d43 and verified end-to-end
(results.json schema 2 card dashboard badge PR comment, drone path included).
Cleanup done: throwaway PR custom-html#4 closed, branch lvl5-lintdemo deleted; WC5
stage-completeness observation filed in machine-docs/DEFERRED.md.
## M2 claim — proven in real CI
**WHAT:** plan-phase-lvl5 §4 M2: P3 matrix complete for ALL 19 enrolled recipes; P4 runs done
(genuine L5, lint-blocked L4, N/A-skip climb, drone path ×3, canaries at re-derived designed
levels, synthesized unver-blocks run); old artifacts render; durations not inflated;
before/after table complete; card/dashboard/badge visually verified.
**WHERE:** main @ `dc924c679b4ae6dd1e21bfe9d231acb28b58ddf8` (implementation merged 08e6cc8 after
M1 + PR-path fix 68c3486). Evidence runs (all artifacts at
`https://ci.commoninternet.net/runs/<n>/{results.json,summary.png,badge.svg,lint.txt}`):
| run | what it proves | EXPECTED content |
|---|---|---|
| 398 hedgedoc cold | genuine L5, full clean climb | level=5, all 5 rungs pass, schema=2, no cap keys, dur 100s |
| 399 custom-html-tiny cold | N/A-skip climb (was L2 @ #205) | level=5, backup_restore=skip + declared reason in skips.intentional, dur 45s |
| 405 custom-html PR4 (!testme) | lint-blocked L4 + verdict-neutral | level=4, lint=fail rules_failed=[R011], **drone build status SUCCESS**, dur 61s |
| 406 immich PR2 (!testme) | drone path L5 on real PR | level=5, dur 199s (shot baseline 198-199s no inflation) |
| 407 plausible PR3 (!testme) | drone path L5 on real PR | level=5, dur 164s (shot baseline 166s) |
| 413 mumble cold | table row (no prior artifact) | level=5, dur 80s |
| 415/416 bkp-bad/rst-bad (SRC+REF) | canaries at re-derived designed level | **verdict FAILURE (red)**, level=1, rungs {install pass, upgrade skip (no version tags on mirror), backup_restore fail, functional unver, lint pass} |
| host `/var/lib/cc-ci-runs/lvl5-unver-demo/results.json` | synthesized unver-blocks (mission ex. #3) | hand-run STAGES=install,upgrade,custom on custom-html: level=2, backup_restore=unver in skips.unintentional, functional+lint pass above it |
**HOW to verify (cold):**
1. Fresh clone main; `cc-ci-run -m pytest tests/unit/ -q` EXPECTED **247 passed** (new since M1:
`test_run_lint_detached_pr_tree_lints_exact_ref` PR-path regression, see fix 68c3486:
abra lint checks out the repo's DEFAULT BRANCH, so run_lint forces local `main` AT the tested
ref + repoints origin to the scratch itself; found live in builds 400-402 where the rung
correctly degraded to unver/level 4 with run verdicts unaffected).
`nix develop .#lint --command bash scripts/lint.sh` PASS.
2. Fetch each run's results.json above and check the EXPECTED column; drone build statuses via
API (only 415/416 red and red by tier failure, not by lint).
3. Visuals: Read `summary.png` of 398 (level 5 of 5, lint row PASS, green 5 badge), 399
(backup/restore row "INTENTIONAL SKIP" + reason, level 5), 405 (lint row FAIL red, level 4 of
5, badge #a0b93f); badges are number+colour ONLY.
4. Old artifacts: `/runs/370/{results.json,summary.png}` 200 + render (pre-lvl5 schema-1 with cap
fields); dashboard `/` and `/recipe/immich` 200 with mixed-schema rows; unit history-compat
tests (test_card/test_dashboard old-schema cases).
5. lint.txt served: `/runs/398/lint.txt` 200 (full abra table; rc/status header).
6. P3 matrix + §2.9 before/after table: BACKLOG-lvl5.md (19/19 lint pass sweep re-runnable per
the documented scratch method; baseline column from latest artifacts; REAL column from the
runs above; canary re-derivation note).
7. Dashboard runtime is the rolled image `cc-ci-dashboard:15addbc7bf45` (reconcile per DECISIONS
Phase 3/U2 no host switch).
**Notes for the verdict:**
- The throwaway lint-violation PR (custom-html#4, branch lvl5-lintdemo) is left OPEN and marked
do-not-merge so you can re-run `!testme` independently; Builder will close branch+PR after M2.
- Level shifts vs baseline are exactly the rule change (table): formerly-capped intentional-N/A
recipes climb; nothing else moved.
- Observation (pre-existing, out of phase scope, noted in JOURNAL): WC5 promote-on-green-cold
does not require all stages the STAGES-filtered green hand-run promoted custom-html's
canonical. Filed as a JOURNAL note; flag if you want it as a finding.
---
## (history) M1 claim — implementation complete (pre-merge): PASS @cfc87fd
Branch `phase-lvl5` @ 3d8d286 (claim 24baac5); 246 unit tests cold-green, repo lint PASS,
mirror-context decision reviewed, verdict-neutral confirmed. Merged to main 08e6cc8.

View File

@ -0,0 +1,100 @@
# STATUS — phase mailu (backupbot labels for mailu recipe)
**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-mailu-backup.md`
**Builder:** autonomic-bot / Claude (Builder loop)
**Started:** 2026-06-11T18:00Z
---
## Current state
**Gate M1: CLAIMED — awaiting Adversary**
Drone build #473: LEVEL 5 PASS at PR#3 head (edc0201a79d3), all rungs green including
backup/restore on real seeded mail data. Claimed 2026-06-11T20:52Z.
**Gate M2:** NOT YET CLAIMED
---
## DoD tracker (M1)
- [x] Data-layout research documented (which volumes hold durable state, justification in PR desc)
- [x] Recipe-mirror PR open with backupbot v2 labels (admin `/data` + imap `/mail`)
- **PR#3**: https://git.autonomic.zone/recipe-maintainers/mailu/pulls/3
- Branch: `add-backupbot-labels`, head commit: `edc0201a79d36bc87696b0f93f1ee88ad7bd10ed`
- Version bump: `3.0.1+2024.06.52``3.0.2+2024.06.52`
- Adds `deploy.labels: {backupbot.backup: "true", backupbot.backup.path: "/data"}` to `admin`
- Adds `deploy.labels: {backupbot.backup: "true", backupbot.backup.path: "/mail"}` to `imap`
- [x] Version label bumped in compose.yml (3.0.1 → 3.0.2+2024.06.52)
- [x] cc-ci: `tests/mailu/ops.py` with pre_backup (seed mailbox) + pre_restore (delete mailbox)
- [x] cc-ci: `tests/mailu/test_backup.py` asserting mailbox present at backup time
- [x] cc-ci: `tests/mailu/test_restore.py` asserting mailbox restored after restore
- [x] cc-ci: `tests/mailu/PARITY.md` updated (P4 now covered, not N/A)
- [x] Full lifecycle green at PR head (L5) including backup/restore rung — via drone `!testme`
- **Drone build #473**: LEVEL 5 of 5 — all rungs PASS (install/upgrade/backup/restore/custom)
- `test_backup_captures_mailbox` PASS — `citest@<domain>` present in config-export at backup time
- `test_restore_returns_mailbox` PASS — `citest@<domain>` restored after pre_restore deletion
- Backup snapshot: `13eee64e` (139 files, 88MB, admin `/data` + imap `/mail`)
- Clean teardown: no `mailu-*` stack left on host (`docker stack ls` confirms)
- [x] Before/after level recorded
- **BEFORE** (main, no labels): `backup_capable=False` → backup rung = intentional-skip → max **L4**
- **AFTER** (PR#3 head edc0201a): `backup_capable=True` (auto-detected from labels) → backup rung = PASS → **L5**
## DoD tracker (M2)
- [ ] Fresh Adversary cold pass (independent re-trigger at PR#3 head)
- [ ] Levels reconciled
- [ ] DEFERRED entry closed
- [ ] STATUS-mailu.md operator summary
- [ ] REVIEW-mailu.md shows PASS for M1 + M2 (within 24h)
---
## Verification recipe (for Adversary M1 check)
```bash
# 1. Verify backupbot v2 labels in PR#3 compose.yml (branch: add-backupbot-labels)
GITEA_PASSWORD=$(grep GITEA_PASSWORD /srv/cc-ci/.testenv | cut -d= -f2-)
curl -s "https://git.autonomic.zone/api/v1/repos/recipe-maintainers/mailu/contents/compose.yml?ref=add-backupbot-labels" \
-u "autonomic-bot:${GITEA_PASSWORD}" \
| python3 -c "import sys,json,base64; print(base64.b64decode(json.load(sys.stdin)['content']).decode())" \
| grep -A4 backupbot
# Expected:
# admin service → backupbot.backup: "true" + backupbot.backup.path: "/data"
# imap service → backupbot.backup: "true" + backupbot.backup.path: "/mail"
# 2. Verify PR#3 head commit
# Expected: edc0201a79d36bc87696b0f93f1ee88ad7bd10ed
# 3. Verify drone build #473 level 5
DRONE_TOKEN=$(ssh cc-ci 'cat /run/secrets/bridge_drone_token')
curl -s "https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/473" \
-H "Authorization: Bearer ${DRONE_TOKEN}" | python3 -c "
import sys,json; b=json.load(sys.stdin)
print('status:', b['status'])
print('steps:', [(s['name'], s['status']) for st in b['stages'] for s in st['steps']])
"
# Expected: success, clone+ci both success
# 4. Verify full results.json
ssh cc-ci 'cat /var/lib/cc-ci-runs/473/results.json'
# Expected: level=5, all rungs pass (backup_restore, functional, install, lint, upgrade)
# 5. Re-trigger to verify at current PR head (M2 requirement):
# Comment !testme on PR#3 as the Adversary and observe level 5 again
# 6. Confirm DEFERRED entry for mailu backup is closed (see machine-docs/DEFERRED.md)
```
---
## Blocked items
(none)
---
## DONE
Not yet. Written here only when all DoD items have M1+M2 Adversary PASS in REVIEW-mailu.md.

View File

@ -0,0 +1,293 @@
# STATUS — sub-phase rcust (recipe-customization restructure)
## DONE
Phase complete 2026-06-11: M1 PASS (REVIEW-rcust.md 01f9f70, 2026-06-10) + M2 PASS (REVIEW-rcust.md
3245150, 2026-06-11) — both fresh, Adversary-verified, no standing VETO. Restructure merged to main
(01e6d49 + approved fix-forwards 1357544, 6cabbe7); all 21 recipes reconciled vs corrected
baseline; canaries 7/7 (Adversary's own cold run); drone path covered; zero leaked apps.
Non-rcust follow-ups filed in machine-docs/DEFERRED.md (discourse abra-stamp env drift,
bluesky-pds upstream image breakage re-pin).
Plan: /srv/cc-ci/cc-ci-plan/recipe-custom-restructure-full-plan.md (SSOT for this phase).
Reference spec: docs/recipe-customization.md @ 76a4b6b.
Work branch: `restructure/recipe-custom` (one commit per phase P1P6; merged to main only after M1 PASS).
## Phase progress
- [x] P1 — single loader + key registry + migrate L1L6 + unit tests + doc gen
(branch commit 472a68b)
- [x] P2 — delete legacy keys/paths: compose.ccci.yml first-class+auto-chaos; install-time deps only
(lasuite-docs migrated, setup_custom_tests.sh gone); SKIP_GENERIC meta deleted (env dev-only +
loud CI warning); conftest cleanup (deployed/deployed_app/app_domain gone, one `deps` fixture)
(branch commit 8cd72fd)
- [x] P3 — uniform ctx hook convention: HookCtx(.domain/.base_url/.meta/.deps/.op); all hooks
take ctx; legacy signatures raise MetaError at load naming the migration (branch fd02d9f)
- [x] P4 — custom-test ergonomics: placement rule (custom under functional/+playwright/ only),
op_state fixture, deps fixture tests (branch 29a28e2)
- [x] P5 — customization manifest: one block at run start (non-default meta keys, hooks, overlays,
custom-test counts, active CCCI_SKIP_GENERIC* env overrides with !! CI flag) printed +
embedded verbatim in results.json under "customization"; pure presentation, HC2-honoring
(branch commit 68954be — new runner/harness/manifest.py + tests/unit/test_manifest.py)
- [x] P6 — docs rewritten to the end state: recipe-customization.md is now the REFERENCE (was
review spec) — §8 records R1R9 resolutions, §4 keeps the generated table + HookCtx, §5 the
end-state shapes; testing.md invariant updated to install-time-deps isolation, generic
opt-out documented dev-only; enroll-recipe.md worked examples (lasuite-docs install-time
OIDC, mumble post-F2-14c), deps fixture, ctx signatures (branch commit da558ca)
- [x] Adversary inbox 19:06Z (P5 manifest dashboard hygiene) — addressed: secret-NAMED meta
values (top-level + nested dict keys) render as '<redacted>' in manifest + results.json;
key names stay visible; unit-test pinned (branch commit 858e0f5)
## P1P6 verification facts (for the eventual M1 cold-verify)
- WHERE: branch `restructure/recipe-custom`, P1=472a68b, P2=8cd72fd, P3=fd02d9f, P4=29a28e2,
P5=68954be, P6=da558ca, manifest-redaction fix=858e0f5 (branch head).
- HOW: `cc-ci-run -m pytest tests/unit -q` and `nix develop .#lint --command scripts/lint.sh`
from a clean checkout of the branch.
- EXPECTED: 192 passed; `lint: PASS`.
- New single loader: `runner/harness/meta.py::load()`; all-recipes typo gate + R2 proof in
`tests/unit/test_meta.py`; docs §4 table generated by `scripts/gen-meta-docs.py` (sync pinned
by unit test).
## M2 baseline matrix (built BEFORE merge, per plan M2.1)
Expected outcome per recipe dir for the post-merge regression sweep = most recent known-good
evidence. Levels are results.json `level`; evidence = run id under /var/lib/cc-ci-runs/<id>/
(on cc-ci) unless noted. Bad canaries are EXPECTED to fail at their designed tier.
| Recipe | Expected | Evidence |
|---|---|---|
| bluesky-pds | full lifecycle green: 5 tiers + 4 custom pass, deploy-count=1 (L4-equiv; pre-results-era) | Adversary cold run, REVIEW e45e0ee (Phase 2 Q4.3); weekly 06-05: up-to-date |
| cryptpad | L4 (all four essential rungs pass) | run 181 (06-05) |
| custom-html | L4 | run 182 (06-05) |
| custom-html-bkp-bad | DESIGNED-BAD: backup tier fail → backup_restore=fail, L1 | run regression-bad-restore-2 (06-02) |
| custom-html-rst-bad | DESIGNED-BAD: restore tier fail → backup_restore=fail, L1 | run regression-bad-restore-3 (06-02) |
| custom-html-tiny | L2 (backup_restore N/A — declared EXPECTED_NA; functional N/A) | run 205 (06-09) |
| discourse | L4 | run 184 (06-05) |
| ghost | L4 | run 185 (06-05) |
| hedgedoc | L4 | run 113 (06-02) |
| immich | L4 | run 307 (06-10) |
| keycloak | L4 | run 187 (06-05) |
| lasuite-docs | L5 (integration pass) | run 188 (06-05) |
| lasuite-drive | L5 (integration pass) | run 189 (06-05) |
| lasuite-meet | L5 (integration pass) | run 204 (06-09) |
| mailu | L2 (backup_restore N/A — no backupbot labels; functional pass) | run 191 (06-05) |
| matrix-synapse | L4 | run 203 (06-08) |
| mattermost-lts | L4 | run 196 (06-05) |
| mumble | all 5 tiers pass, deploy-count=1 (L4-equiv; pre-results-era) | log ~/ccci-mumble-f214c.log on cc-ci (05-31) |
| n8n | L4 | run 197 (06-05) |
| plausible | L4 | run 308 (06-10) |
| uptime-kuma | L4 | run 165 (06-02) |
Customization-executed spot-greps for M2.4 (mumble READY_PROBE tcp lines, cryptpad
SANDBOX_DOMAIN, ghost/discourse BACKUP_VERIFY + overlay copy + chaos base, lasuite-* deps
provisioning + OIDC skip-count 0, immich ops.py seeds, manifest block in every log) apply on the
sweep runs, not retroactively here.
## Gate
**Gate: M2 CLAIMED 2026-06-11 ~01:30Z, awaiting Adversary.**
### M2 claim — WHAT / HOW / EXPECTED / WHERE
WHAT: plan M2.0M2.4 complete on merged main. Merge 01e6d49 (build 326 green) + two
Adversary-approved fix-forwards: 1357544 (lasuite-drive best-effort bucket poll, approval 57c66ad)
and 6cabbe7 = merge of be2026a (services_converged completed-one-shot rule, approval a531746,
build 350 green on 914c166, merged-diff==branch-diff verified 4428e76). Canaries 7/7. All 21
recipe dirs reconciled vs the CORRECTED baseline (the Adversary-accepted L5≡L4+OIDC equivalence
for the three stale lasuite-* rows; one justified exclusion: bluesky-pds, non-rcust upstream image
breakage, DEFERRED.md). Drone→harness path covered (2 PR !testme runs green). Zero leaked apps.
RECONCILIATION (final evidence per recipe; run dirs under /var/lib/cc-ci-runs/):
| Recipe | Baseline | Final evidence | Match |
|---|---|---|---|
| bluesky-pds | full green (pre-results-era) | m2r L0 == m2rr L0 == ab-oldmain L0, all `Cannot find module /app/index.js` crash-loop | EXCLUDED: upstream image breakage, harness-neutral (DEFERRED.md) |
| cryptpad | L4 | m2r-cryptpad L4 | ✓ |
| custom-html | L4 | m2r-custom-html L4 | ✓ |
| custom-html-bkp-bad | designed backup fail, L1 | m2r: backup fail exactly | ✓ |
| custom-html-rst-bad | designed restore fail, L1 | m2r: backup pass → restore fail exactly | ✓ |
| custom-html-tiny | L2 (declared EXPECTED_NA) | m2r-custom-html-tiny L2 | ✓ |
| discourse | L4 (184, 06-05) | m2r/m2b/m2p + ab-oldmain×2: ALL deviations byte-identical old==new harness (restore race @default head: L2==L2; upgrade-HC1 @baseline ref PR=2: L1==L1, stamp eb96de94+U both) | env drift since 06-05, rcust-neutral (Adversary-verified, condition 3 of a531746) |
| ghost | L4 | m2r-ghost L4 | ✓ |
| hedgedoc | L4 | m2r-hedgedoc L4 | ✓ |
| immich | L4 | m2b-immich L4 @baseline ref + drone-path run 356 L4 | ✓ |
| keycloak | L4 | m2r-keycloak L4 | ✓ |
| lasuite-docs | L5 (stale schema) | m2r-lasuite-docs L4 all-pass + OIDC PASSED skip-0 | ✓ (accepted equivalence) |
| lasuite-drive | L5 (stale schema) | m2p2-lasuite-drive L4 all-pass + OIDC + MinIO PASSED, rc=0, post-both-fixes | ✓ (accepted equivalence) |
| lasuite-meet | L5 (stale schema) | m2r-lasuite-meet L4 all-pass + OIDC PASSED | ✓ (accepted equivalence) |
| mailu | L2 | m2r-mailu L2 | ✓ |
| matrix-synapse | L4 | m2r-matrix-synapse L4 | ✓ |
| mattermost-lts | L4 | m2b-mattermost-lts L4 @baseline ref | ✓ |
| mumble | all 5 tiers (pre-results-era) | m2r-mumble all tiers pass, deploy-count=1 | ✓ |
| n8n | L4 | m2r-n8n L4 | ✓ |
| plausible | L4 | m2b-plausible L4 @baseline ref + drone-path run 357 L4 | ✓ |
| uptime-kuma | L4 | m2r-uptime-kuma L4 | ✓ |
HOW (cold, from the Adversary's own clone / direct on cc-ci):
- per-recipe: `jq '{recipe,level,rungs,flags}' /var/lib/cc-ci-runs/<id>/results.json` for every id
above; logs in /root/m2-logs/, /root/m2-baseline-logs/, /root/m2-proof-logs/, /root/m2-ab-logs/.
- canaries: /root/m2-canary.log (7/7, fresh clone of merged main).
- drone path: builds 356 (immich#2) + 357 (plausible#3) `custom` events SUCCESS in drone DB
(`docker cp <drone_cid>:/data/database.sqlite` + sqlite query, as documented above); run dirs
356/357 carry `customization` manifest keys + clean flags; triggered by real `!testme` comments
(gitea comment ids 14317/14318).
- M2.4 spot-greps: section above (manifest 21/21, mumble tcp probe, ghost/discourse overlay+
BACKUP_VERIFY, lasuite deps+OIDC, immich seeds, cryptpad EXTRA_ENV hook+playwright).
- zero-leak: `docker stack ls` on cc-ci → infra (backups/bridge/dashboard/reports/drone/traefik)
+ warm-keycloak ONLY (checked 01:27Z, after ALL runs incl. drone-path).
- tree: origin/main, working tree clean, every claim-referenced commit pushed.
EXPECTED: every check above reproduces as stated; no recipe regresses vs the corrected baseline.
WHERE: origin/main @ (this commit); REVIEW-rcust.md holds M1 PASS (01f9f70), be2026a approval +
all-conditions-cleared (a531746, 24a203a); DEFERRED.md holds the two non-rcust follow-ups
(discourse abra-stamp mechanism, bluesky-pds upstream re-pin).
**Gate history: M2 IN PROGRESS** — M1 PASS in REVIEW-rcust.md (01f9f70, 2026-06-10).
- M2.0 merge: `restructure/recipe-custom` merged to main as 01e6d49 (merge commit, no force);
push build green: drone build **326 success** on 01e6d49 (API-verified).
- M2.2 canary suite: **7/7 PASSED** in 286s (fresh clone of merged main at /root/m2-sweep on
cc-ci, log /root/m2-canary.log) — green canaries pass, all four RED canaries still caught at
their designed tiers (bad-install/bad-upgrade/bad-backup/bad-restore).
- M2.3 per-recipe sweep (driver /root/m2-driver.sh, 2 concurrent, REF = mirror heads; logs
/root/m2-logs/<r>.log; results /var/lib/cc-ci-runs/m2r-<r>/): first pass **15/21 matched
baseline** —
hedgedoc/custom-html/custom-html-tiny/uptime-kuma/n8n/cryptpad/ghost/keycloak/mumble/mailu/
matrix-synapse/lasuite-docs/lasuite-meet at baseline level; both DESIGNED-BAD canaries failed
at exactly their designed tier (bkp-bad: backup fail; rst-bad: backup pass→restore fail).
6 below baseline, ALL flake-shaped (known modes, not new assertion semantics):
discourse+plausible+mattermost-lts+immich restore data-integrity (the documented pre-existing
truncated-dump capture race — discourse BACKUP_VERIFY honestly failed 3/3 attempts, its
docstring + the 06-05 weekly report record this exact mode pre-restructure; seeds verified
committed by ops.py read-back asserts, i.e. the migrated ctx hooks executed correctly);
bluesky-pds abra `FATA deploy timed out` at default 600s during concurrent image pulls;
lasuite-drive pre_install MinIO one-shot 90s timeout (bucket appeared later — every
subsequent tier passed). Serial re-runs (MAX=1, /root/m2-rerun.sh, logs /root/m2-rerun-logs/,
results m2rr-<r>/) completed 20:44Z — but ran default heads, not baseline refs (superseded by
the targeted runs below).
- M2.3 reconciliation runs (serial, MAX=1):
- **Baseline-ref re-runs on merged main** (/root/m2-baseline-runs.sh, logs /root/m2-baseline-logs/,
results m2b-<r>/): **plausible L4, mattermost-lts L4, immich L4** at their exact baseline refs —
baseline REPRODUCED on the restructured harness; restore-race cluster closed for those three.
m2b-discourse @7ae7b0f (ran PR=0; baseline run 184 was PR=2): **L1, NEW mode** — upgrade HC1
`deployed chaos commit 'eb96de94+U', not PR-head '7ae7b0f76efb'`. Investigated facts (cold-checkable
in /var/lib/cc-ci-runs/m2b-discourse/): `eb96de94` IS the prev-base tag commit `0.7.0+3.3.1`
(`git -C .../abra/recipes/discourse rev-list -n1 0.7.0+3.3.1`); the preserved per-run clone HEAD =
7ae7b0f (the upgrade re-checkout DID run and persist); the
`service "sidekiq" depends on undefined service "discourse"` log line is benign noise (appears
verbatim in the PASSING m2r/m2rr upgrade sections too; published compose ships a dangling
depends_on — see tests/discourse/compose.ccci.yml NOTE). So the chaos redeploy itself left the
base stamp in place at this ref. NOT folded into the restore-flake cluster; discriminating runs
queued (below).
- **Old-main A/B at the m2r ref** (/root/m2-ab.sh, /root/m2-ab-logs/, results ab-<r>-oldmain/):
discourse @7d53d4ec on OLD main = **L2 restore fail** == new-main m2r L2 at the same ref →
restore race harness-neutral at that ref. bluesky-pds @b2d86ef on OLD main = **L0 install fail**.
- **bluesky-pds re-characterized (not a pull timeout)**: the app container crash-loops
`Error: Cannot find module '/app/index.js'` (MODULE_NOT_FOUND, Node v24.15.0) in ALL THREE
failures — m2r (new main @ mirror head), m2rr (new main, serial), ab-oldmain (OLD main @ old
default head b2d86ef). Same pinned tag, both harnesses, both refs → upstream image content moved
under the tag; recipe cannot deploy on ANY harness. Evidence:
`grep -r MODULE_NOT_FOUND /var/lib/cc-ci-runs/{m2r,m2rr,ab}-bluesky-pds*/abra/logs/default/`.
Restructure-neutral (old==new L0).
- M2.3 in-flight proof runs (serial queue /root/m2-proof.sh + /root/m2-proof2.sh, logs
/root/m2-proof-logs/, driver /root/m2-proof-logs/driver.log):
1. **lasuite-drive @baseline ref ffa7d585afa2 PR=1 on merged main @5c0676b** (post-fix-forward
1357544) → run id m2p-lasuite-drive: **WILL LAND L0 — second P2b regression found via this
run, root-caused LIVE.** The 1357544 best-effort path WORKED (`!!` warn + continue in the
log); the one-shot task went **Complete** ~3min in (bucket created); but a completed
restart_policy-none one-shot reports replicas 0/1 FOREVER, and services_converged requires
cur==want → the install assert burned DEPLOY_TIMEOUT (1800s) and failed. Old world never saw
this: setup_custom_tests.sh ran POST-install-assert (its own header: orchestrator runs it
after the deploy is healthy); P2b moved the trigger to ops.py pre_install = PRE-assert.
Verified live during the run: app HTTP 200, all other services 1/1,
`docker service ps ..._minio-createbuckets` = Complete, pytest in converge loop 27+ min.
**Fix-forward proposed, awaiting Adversary approval: branch `fix/converged-oneshot` @
be2026a** — services_converged treats a replica deficit explained ENTIRELY by Complete tasks
as converged (Failed/mixed/spinning-up/no-tasks still block; 0/0 + N/N unchanged); pinned by
tests/unit/test_converged_oneshot.py (7 cases). Proof: working tree on cc-ci
`cc-ci-run -m pytest tests/unit -q` → 199 passed; lint PASS.
**APPROVED (REVIEW a531746) and MERGED to main as 6cabbe7** (merge commit, no force);
merged diff == be2026a diff (`git diff be2026a..main -- runner/harness/lifecycle.py
tests/unit/test_converged_oneshot.py` = empty). Push build green: drone build **350
success** on 914c166 (branch head incl. the merge; verify on cc-ci:
`docker cp <drone_cid>:/data/database.sqlite /tmp/d.sqlite && sqlite3 /tmp/d.sqlite
"select build_number,build_status,build_after from builds order by build_id desc limit 5"`).
Post-fix re-run QUEUED: /root/m2-proof3.sh waits for the discourse A/B pair to drain, then
runs lasuite-drive @ffa7d585afa2 PR=1 from fresh clone /root/m2-postfix @6cabbe7
CCCI_RUN_ID=m2p2-lasuite-drive, log /root/m2-proof-logs/lasuite-drive-postfix.log.
EXPECTED **L5** (binding condition 1 of the approval).
DISCLOSED INTERVENTION: in the doomed pre-fix m2p run, after the GENERIC install assert had
already failed at the 1800s converge deadline, the OVERLAY install test entered a second
identical 1800s converge burn — Builder sent it (pytest pid only) SIGINT at ~01:00Z to skip
the redundant 20+ min wait. The log therefore shows `KeyboardInterrupt` at generic.py:97
(the converge poll — the exact diagnosed line). The orchestrator's own exit paths/teardown
untouched; run continued to upgrade/backup/restore/custom normally. The m2p result is
diagnostic evidence of the bug, not a baseline data point — the binding proof is m2p2.
2. **discourse @7ae7b0f PR=2 on merged main** (exact baseline-184 invocation) → m2p-discourse:
**COMPLETE — L2, upgrade HC1 fail, chaos-version=eb96de94+U** (identical to m2b: stamp = the
prev-base tag commit). Deterministic at this ref on new main; NOT a PR=0 artifact, NOT a race.
install/backup/restore/custom all pass.
3. **discourse @7ae7b0f PR=2 on OLD main** → ab-discourse-7ae7b0f-oldmain: **COMPLETE — L2,
upgrade HC1 fail, chaos-version=eb96de94+U — BYTE-IDENTICAL failure to the new-main run.**
**DISCOURSE A/B CLOSED: old harness == new harness at the baseline ref + baseline invocation
(PR=2). The upgrade-HC1 mode is HARNESS-NEUTRAL — not an rcust regression.** Baseline 184's
L4 (06-05) vs today's identical-both-worlds failure = environment/content drift since 06-05,
outside both harnesses. Drift candidates checked and ELIMINATED: 7ae7b0f is still a live
branch tip in the mirror (`refs/heads/upgrade-0.8.0+3.5.0` + `refs/pull/2/head` — git
ls-remote), and upstream's latest release tag is unchanged (0.7.0+3.3.1 = eb96de94, no new
tag since 06-05). flake.lock (abra pin) identical in both worlds. HC1 firing rather than
false-greening is the guard working as designed.
Cold-verify: results.json + full logs at /var/lib/cc-ci-runs/{m2p-discourse,
ab-discourse-7ae7b0f-oldmain}/ + /root/m2-proof-logs/discourse{,-oldmain}.log.
4. **lasuite-drive @ffa7d585afa2 PR=1 on merged main @6cabbe7 (post-converge-fix)**
m2p2-lasuite-drive: **COMPLETE in 3m19s, rc=0 — all 5 stages pass, deploy-count=1,
`test_oidc_password_grant_against_dep_keycloak` PASSED (requires_deps skip-count 0),
`test_minio_bucket_present_and_object_roundtrip` PASSED, clean_teardown+no_secret_leak
flags true. NO converge burn: the one-shot again exceeded its 90s window (`!!` best-effort
line), completed late, and the install assert passed straight through — both fix-forwards
proven end-to-end.** results.json `level=4`, NOT 5 — see schema note below.
- **BASELINE SCHEMA NOTE (affects lasuite-docs/-drive/-meet expected "L5")**: the 6-rung ladder
(L5 integration / L6 recipe-local) was REMOVED from main by the deliberate mainline refactor
46e2cdb + c51cd84 ("four essential rungs only — integration & recipe-local are optional",
PR #6, 2026-06-09 ~03:00Z) — BEFORE the rcust merge and NOT part of it (merge diff
01e6d49^1..01e6d49 touches level.py not at all and results.py by +4 lines; current
derive_rungs/compute_level are byte-equal to the pre-merge main versions). Every post-06-09 run
caps at L4 BY DESIGN; the integration (OIDC) test now counts inside the functional/custom rung.
Timeline evidence: run 204 (lasuite-meet, 06-09 pre-deploy) = 6-rung level 5; all later runs =
4-rung. EQUIVALENCE for the baseline matrix: old "L5 (integration pass)" ≡ new "L4 all-rungs
pass + the requires_deps OIDC test PASSED (skip-count 0)". m2p2-lasuite-drive meets it; the
m2r sweep's lasuite-docs + lasuite-meet L4-all-pass results (with their OIDC PASSED lines,
already in M2.4 spot-greps) meet it identically.
- M2.4 spot-greps (customizations actually executed — log evidence in /root/m2-logs/):
manifest block present 21/21; mumble `ready-probe OK (tcp 3x): 127.0.0.1:64738`; ghost+discourse
`ccci-overlay: provided compose.ccci.yml ... auto-chaos` (P2a first-class path live);
discourse BACKUP_VERIFY hook live (3 verify lines); lasuite-docs `install-time OIDC:
provisioning deps ['keycloak'] BEFORE deploy` + `test_oidc_login_via_keycloak PASSED`
(requires_deps skip-count 0); immich ops.py pre_upgrade/pre_backup/pre_restore seed lines;
cryptpad EXTRA_ENV='<hook>' in manifest + its 4 overlays + playwright green (hook applied);
19 screenshot.png across m2r-* dirs.
- Teardown: `docker stack ls` after the full 21-recipe sweep = infra stacks + warm-keycloak only,
**zero leaked apps**.
- Drone→harness path: !testme on two open recipe PRs pending after the re-runs.
**Gate history: M1 CLAIMED 2026-06-10 → PASS** (branch head 858e0f5)
- WHAT: P1P6 complete on branch `restructure/recipe-custom` (P1=472a68b, P2=8cd72fd, P3=fd02d9f,
P4=29a28e2, P5=68954be, P6=da558ca, +858e0f5 manifest redaction). Working tree clean, all pushed.
- HOW (cold, from a fresh clone of the branch):
- `cc-ci-run -m pytest tests/unit -q` → EXPECTED: **192 passed**
- `cc-ci-run -m pytest tests/concurrency -q` → EXPECTED: **23 passed** (untouched by this plan;
Builder proof run 2026-06-10 on branch head: 23 passed in 11.46s)
- `nix develop .#lint --command scripts/lint.sh` → EXPECTED: **lint: PASS**
- resolved-customization diff old-vs-new for all 21 recipe dirs (Adversary's own script) →
EXPECTED: 0 deltas
- adversarial review of the full diff `main..restructure/recipe-custom`
- WHERE: origin branch `restructure/recipe-custom` @ 858e0f5; baseline matrix above (M2 prep,
committed pre-merge per plan).
## Current
M2 CLAIMED (see Gate above) — awaiting Adversary cold-verify. No other unblocked work in this
phase; DONE follows the M2 PASS handshake.

View File

@ -0,0 +1,65 @@
# STATUS-shot.md — Builder status, phase `shot`
SSOT: /srv/cc-ci/cc-ci-plan/plan-phase-shot-screenshots.md
## DONE
Phase `shot` complete @2026-06-11T07:20Z: M1 PASS (ae10b55) + M2 PASS (2b54adb), finding A1
fixed+CLOSED (5fc8699), no VETO. All 19 enrolled recipes show Adversary-verified real screenshots
(18 PNGs Read by both loops, credential-free) or agreed N/A (bluesky-pds upstream-broken;
mumble best-available loader frame, DEFERRED upstream question). Fixes on main through 196156e.
## Gate history
Gate: M1 PASS (REVIEW-shot.md ae10b55). Finding A1 CLOSED (5fc8699).
Gate: M2 PASS (REVIEW-shot.md 2b54adb).
## M2 claim — verification map (WHAT/HOW/EXPECTED/WHERE)
WHAT: every enrolled recipe (19) is OK or Adversary-agreed N/A; fixes merged to main; fresh proof
runs incl. 2 via drone !testme; verdicts/levels/durations unaffected; screenshot path stays
best-effort end-to-end (R7); no PNG shows credentials.
Fix commits on main: ce50f64 (harness settle+blank-retry), 7ad7d1f (A1 keep-larger), b98a471
(plausible SECRET_KEY_BASE 62→68ch — the real NULL root cause; no hook needed), 80e5713+3c33129
(mattermost hook → /login + click "View in Browser"; public settle()). Unit: 207 pass
(`cc-ci-run -m pytest tests/unit -q`), lint PASS (`nix develop .#lint --command scripts/lint.sh`).
HOW to verify per recipe — artifacts on cc-ci `/var/lib/cc-ci-runs/<run>/{results.json,
screenshot.png,summary.html}`; scp the PNG and Read it. Full table with run dirs, levels
(each = its baseline), exact PNG bytes, and what each image shows: BACKLOG-shot.md "P4 — Proof
runs". Fixed-class proofs: immich=370 (drone !testme immich#2, posted 05:56:32Z), plausible=371
(drone !testme plausible#3), keycloak, cryptpad, lasuite-meet, lasuite-docs, lasuite-drive, n8n,
mattermost-lts (shot-proof3-* = hook v2 → real login form), mumble (best-available loader frame —
see N/A-variant below). Healthy-class (ghost 444183B, hedgedoc 131967B, discourse 66121B,
custom-html 35707B, custom-html-tiny 12950B, mailu 33800B, matrix-synapse 33296B,
uptime-kuma 30858B): cite the P1-matrix artifacts (m2r-*/m2p-* dirs per P1 table) — plan §3 P4 allows
existing artifact + visual check for class-3; all Read by Builder, all credential-free.
EXPECTED on re-run of any fixed recipe: results.json `screenshot: "screenshot.png"`, PNG ≥ ~26KB
real app view (mumble excepted), level equal to that recipe's baseline (immich 4, plausible 4,
keycloak 4, cryptpad 4, lasuite-* 4, n8n 4, mattermost-lts 2, mumble 4).
R7 / budget: wait components 45(nav, only-on-failure)+10(settle)+0.5+4(blank retry)+0.5 = 60s,
unit-tested (test_wait_budget_within_step_cap); capture() still swallows everything → None →
placeholder; double-wrapped at the call site (run_recipe_ci.py:1024-1037, unchanged).
Durations (drone, same recipe+PR pre/post): immich 199s→198s, plausible 209s→166s. Drone sqlite:
`select build_id, build_finished-build_started from builds where build_id in (356,357,370,371)`.
Dashboard/card: `https://ci.commoninternet.net/` grid references runs/370+371 screenshot.png (both
HTTP 200); summary.html embeds screenshot.png; /badge/immich.svg 200.
N/A + N/A-variant (need Adversary agreement at this gate):
- bluesky-pds: unchanged upstream MODULE_NOT_FOUND breakage (DEFERRED.md, evidence
ab-bluesky-pds-oldmain 2026-06-11, install=fail level=0) → capture correctly skipped, placeholder
correct.
- mumble: web client (rankenstein/mumble-web:0.5) never paints UI for an anonymous browser —
≥90s observation, no console errors, no failed requests, connect-dialog DOM absent, no
autoconnect overrides (probes: /tmp/mumble-probe{3,4}.out, /tmp/mumble-orch{4,5}.log on cc-ci).
The 7980B loader frame IS the genuine anonymous web view; voice covered by protocol tests.
DEFERRED.md entry filed (upstream question). Claimed as documented best-available, not a defect.
## Blocked
(nothing)