cc-ci/REVIEW-1b.md

# REVIEW — Phase 1b (review & lint pass) — Adversary ledger

**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1b-review-lint.md`
**Loop state for THIS phase:** STATUS-1b / BACKLOG-1b / JOURNAL-1b (Builder) · **REVIEW-1b (Adversary, this file)** · DECISIONS.md shared.
Phase-1 STATUS.md/BACKLOG.md/REVIEW.md and the Phase-1c `*-1c.md` files are HISTORY — not this phase's state.

This phase the Adversary is **also the white-box reviewer** (§3 checklist), so this ledger holds both
white-box review findings and the eventual cold RL3 re-verification of D1–D10.

DoD I must independently confirm (RL1 lint-in-CI-green · RL2 §3 checklist run, blocking fixed · **RL3
full cold D1–D10 re-verify — the final gate** · RL4 docs). Order per §2: tooling → review fixes → *then*
RL3. **Cardinal rule:** never weaken a test to satisfy a lint/review nit; RL3 must confirm cleanup
softened/skipped/regressed nothing.

---

## Phase-1b orientation @2026-05-27 (Adversary cold start)
- Pulled clean; Phase 1c is signed-off DONE (commit 6d2bc3d). Phase 1b kicked off by operator (manual transition).
- Builder has **not yet seeded** STATUS-1b/BACKLOG-1b/JOURNAL-1b and has not claimed W0. No gate pending.
- I began the independent white-box §3 review immediately (it's my role this phase and needs no Builder gate).

## White-box §3 prep pass #1 @2026-05-27 — over post-1c codebase (PRE-cleanup baseline; advisory until RL3)
Recording the baseline state *before* any W0/W1 cleanup, so I can later confirm the cleanup regressed nothing.

- **Tests are real** — PASS (provisional). Swept all 6 recipe suites (custom-html, lasuite-docs, keycloak,
  matrix-synapse, n8n, cryptpad) × install/upgrade/backup + conftest + runner/harness. No
  `@pytest.mark.skip/xfail/skipif`, no commented-out asserts, no tautologies. Install tests assert real
  app content (matrix: parses `versions` JSON non-empty; keycloak: admin DOM; others: Playwright body).
  Upgrade tests deploy v(n-1) → write marker → upgrade → assert exact marker survives. Backup tests
  establish+verify state → backup → mutate+verify → restore → assert exact pre-mutation state (keycloak
  deletes a realm). **Watch-item (to re-check black-box at RL3):** every upgrade test has a *conditional*
  `pytest.skip()` when no previous published version exists (e.g. custom-html test_upgrade.py:17-18). Valid
  by design, but if it ALWAYS skips, the upgrade stage would be silently fake — RL3 must confirm the
  upgrade stage actually RUNS (prev version found) for ≥1 recipe, not just skips. (1c E2E exercised this.)
- **Server state Nix-declared & idempotent** — PASS (provisional). No `.bootstrapped`/run-once sentinels in
  modules/ or scripts/ (grep clean). Convergence/oneshot pattern per §9 to be re-read fully in pass #2.
- **No footguns / sleep** — PASS (provisional). All `time.sleep()` in runner/harness/lifecycle.py (147,157,
  212,238) + bridge.py (280) are **poll-loop intervals inside `while time.time() < deadline:` loops**, not
  bare readiness waits. `wait_healthy` polls converge-then-HTTP with timeouts. Teardown (lifecycle.py:215)
  is correctly ordered (undeploy → `docker stack rm` fallback → volumes/secrets while .env exists → drop
  .env last), retries volume removal, and **verifies residual is empty (raises TeardownError otherwise)**.
- **No secrets in code/committed files** — PASS (provisional). Grep for inline passwords/tokens/private-key
  blocks across *.py/*.nix/*.sh/*.yml clean (only env/file references + generators). Full leak re-verify
  (incl. published logs + dashboard, and generated app passwords) belongs to RL3 D6.

Still owed in white-box pass #2 (after I read the rest): **harness DRY** (recipe quirks in shared harness,
not per-recipe copy-paste), **log redaction real** (bridge/dashboard/log pipeline), **architecture matches
plan** (layout/§3, poll-primary trigger §4.1, traefik-is-coop-cloud-recipe §4.2; drift → DECISIONS.md).

## W0 (RL1 — lint/format tooling + green) : **PASS** @2026-05-27 (Adversary cold)
Gate claimed in STATUS-1b. Acceptance: clean checkout → `nix develop .#lint --command bash
scripts/lint.sh` → `lint: PASS`; lint stage wired in `.drone.yml` push pipeline. **Verified cold,
independently** (no nix on sandbox; ran on cc-ci over a *pristine* tree, not the Builder's working copy):

- **Cold checkout = exact reviewed SHA.** `git archive 233939a` (= my `origin/main` HEAD) piped to
  cc-ci → `/tmp/ccci-cold` (clean tree, no untracked/cached state, secrets submodule empty as lint
  excludes it). Not cloned from `/root/cc-ci` (that's a non-git plain copy) — archived from my own clone.
- **Lint PASS cold.** `HOME=/root nix develop .#lint --command bash scripts/lint.sh` → **exit 0,
  `lint: PASS`.** All 8 linters ran clean: nixpkgs-fmt (0/14 reformat), statix, deadnix, ruff format
  (32 files), ruff check (all passed), shfmt, shellcheck, yamllint.
- **Stage real, not rigged.** `scripts/lint.sh` genuinely invokes each linter in check mode and
  accumulates a `fail` flag → `exit "$fail"` (correct `set -uo pipefail`, no `-e`, so all run). The
  `.drone.yml` `self-test` push pipeline runs the *exact* command `nix develop .#lint --command bash
  scripts/lint.sh` and FAILs the build on non-zero. Toolchain pinned from nixpkgs in `flake.nix`
  (`devShells.lint`), so CI == local.
- **Gate has TEETH (break-it probe).** Injected violations into the cold tree (a `.py` with
  `import os,sys` + `x=1+2`, and a mis-formatted `.nix`) → re-ran lint → **exit 1, `lint: FAIL`**
  (ruff E401/I001/F401 + nixpkgs-fmt). So the stage is not vacuously green.

Verdict: **W0 PASS.** Builder may proceed to W1.
Advisory (not W0-blocking; re-confirm at RL3): Builder notes the Gitea→Drone *push* webhook is flaky
(§4.1), so the lint stage may not auto-fire as a real Drone build on every push — RL1's intent
("future commits stay clean") depends on that path actually firing. The stage IS wired and proven
green via its exact command; I'll confirm a real push triggers the Drone lint build when I re-verify
M2/D-gates at RL3 (it overlaps). Not filing a finding now — bounded phase, acceptance-as-stated is met.

## White-box §3 pass #2 @2026-05-27 (Adversary, post-W0 formatted code) — RL2 input
Remaining §3 checklist items. **No blocking findings.**

- **Harness is DRY** — PASS. Recipe quirks live in shared harness + per-recipe *declarative* metadata
  (`tests/<recipe>/recipe_meta.py`: HEALTH_PATH/HEALTH_OK/timeouts/EXTRA_ENV), consumed uniformly by
  `tests/conftest.py` (`_recipe_meta`, `deployed`/`deployed_app` fixtures) and
  `runner/harness/lifecycle.py` (`_recipe_extra_env`). **No `if recipe == "..."` branches in the shared
  harness** (the M6.5 no-surgery rule holds). Recipe-specific logic is isolated to that recipe's dir
  (e.g. keycloak `kc_admin.py`, cryptpad's derived SANDBOX_DOMAIN). Only smell: the ~6-8-line `old_app`
  upgrade fixture is copy-pasted across recipes — thin boilerplate over shared metadata; **advisory**,
  not a violation (factoring it would just add another per-recipe injection point). → IDEAS, not blocking.
- **Architecture matches plan** — PASS. §4.1 trigger is **poll-primary** (`bridge/bridge.py` `poll_loop`
  runs unconditionally every ≤60s; webhook is optional + dedup'd by comment id; exact trimmed `!testme`;
  commenter-auth via read-level `GET /orgs/{owner}/members/{user}` 204=allow, fail-closed). §4.2 Traefik
  is the **real coop-cloud/traefik recipe via abra** (`modules/proxy.nix`: `abra recipe fetch/app new
  traefik`, `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`, `LETS_ENCRYPT_ENV=""` → no ACME, cert as
  `ssl_cert`/`ssl_key` swarm secrets) — no hand-rolled traefik.nix. §3 layout matches.
- **Server state Nix-declared & idempotent** — PASS. `modules/proxy.nix` `deploy-proxy` is
  `Type=oneshot`+`RemainAfterExit`, re-runs every activation and converges (insert secret only if
  absent, deploy). No `.bootstrapped`/run-once sentinels anywhere (grep clean, pass #1). Leans on 1c's
  already-proven D8 (byte-identical closure + live throwaway rebuild, no manual post-step).
- **Log redaction is real** — PASS for infra secrets; **one advisory gap to verify behaviorally at
  RL3/D6.** `runner/run_recipe_ci.py` `_redact_values()` reads `/run/secrets/*` (≥8-char values) and
  `run_stage_redacted()` masks them in live-streamed stage output (sorted longest-first → no partial
  leak). **But class-B *generated app passwords* are NOT under `/run/secrets/*`, so they are NOT in the
  `_REDACT` list** — their non-leak rests entirely on the "harness never prints them / abra doesn't echo
  generated ones" assumption (code comment, run_recipe_ci.py:59-60). Also: the runner's *own* stdout
  (the `cc-ci-run …` Drone step) bypasses `run_stage_redacted`. This is exactly what my behavioral D6
  leak test must catch at RL3 (grep published Drone logs **and** the dashboard for a known generated app
  password). Phase-1 D6 passed that test once; recording the white-box shape so RL3 re-checks it, not a
  new blocking finding. → **WATCH-ITEM for RL3/D6.**
- **Readability / docs accuracy** — advisory; defer to RL4 (docs) + the ruff/lint pass already covers
  dead code / style deterministically.

**Net of §3 white-box review (RL2 input): no blocking findings; 2 advisories** (old_app copy-paste →
IDEAS; app-secret redaction → RL3/D6 watch-item). I expect Builder's W1 to be light. I have NOT filed
`[adversary]` BACKLOG items since nothing is blocking — will file if W1/RL3 surfaces a real defect.

## Operator added RL5 + RL6 (plan §7, 2026-05-27) — both BLOCKING for 1b DONE. Noted; verification plan:
- **RL5** (Builder moves; Adversary verifies cold): `modules/`→`nix/modules/`, `hosts/`→`nix/hosts/`;
  `flake.nix`/`flake.lock` STAY at root so build ref `#cc-ci` is unchanged; fix flake internal paths +
  `.drone.yml`/scripts refs; update `docs/architecture.md`. **Verification folds into RL3:** a fresh
  recursive clone must still rebuild **byte-identical to the running system** (toplevel store hash WILL
  change — expected; what must hold is build==running + reproducible). I'll re-confirm cold at RL3.
- **RL6** (coordinated near-END-of-1b): move `STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` →
  `machine-docs/`; **README.md stays at root** (operator decision — human readme, not protocol). Update
  ALL refs (cc-ci-plan plans, AGENTS.md, .drone.yml, scripts). I verify refs updated + nothing broken.
  ⚠ **CAVEAT affecting ME:** the watchdog (`launch.sh`) reads `STATUS-<id>.md`/`REVIEW-<id>.md` at repo
  ROOT for handoffs/transitions — moving breaks it until launch.sh updated + watchdog restarted IN
  LOCKSTEP (orchestrator handles that). So **I keep writing REVIEW-1b.md at root until the coordinated
  cutover**, and at that moment I `git mv` my own REVIEW files (single-writer rule) in lockstep. Will NOT
  move them unilaterally or while a phase transition is pending.

## Status: W0 PASS + §3 white-box pass complete (no blockers). New blocking items RL5/RL6 noted.
DoD for 1b is now **RL1–RL6** (was RL1–RL4). Awaiting Builder gates (W1 review+fixes; RL5 layout move;
then RL3 cold re-verify LAST, now also covering the RL5 byte-identical rebuild). Cardinal rule holds:
cleanup/refactor must not weaken/skip/regress any test — incl. the conditional-upgrade-skip watch-item.