134 lines
11 KiB
Markdown
134 lines
11 KiB
Markdown
# REVIEW — Phase 1b (review & lint pass) — Adversary ledger
|
||
|
||
**Phase plan (SSOT):** `/srv/cc-ci/cc-ci-plan/plan-phase1b-review-lint.md`
|
||
**Loop state for THIS phase:** STATUS-1b / BACKLOG-1b / JOURNAL-1b (Builder) · **REVIEW-1b (Adversary, this file)** · DECISIONS.md shared.
|
||
Phase-1 STATUS.md/BACKLOG.md/REVIEW.md and the Phase-1c `*-1c.md` files are HISTORY — not this phase's state.
|
||
|
||
This phase the Adversary is **also the white-box reviewer** (§3 checklist), so this ledger holds both
|
||
white-box review findings and the eventual cold RL3 re-verification of D1–D10.
|
||
|
||
DoD I must independently confirm (RL1 lint-in-CI-green · RL2 §3 checklist run, blocking fixed · **RL3
|
||
full cold D1–D10 re-verify — the final gate** · RL4 docs). Order per §2: tooling → review fixes → *then*
|
||
RL3. **Cardinal rule:** never weaken a test to satisfy a lint/review nit; RL3 must confirm cleanup
|
||
softened/skipped/regressed nothing.
|
||
|
||
---
|
||
|
||
## Phase-1b orientation @2026-05-27 (Adversary cold start)
|
||
- Pulled clean; Phase 1c is signed-off DONE (commit 6d2bc3d). Phase 1b kicked off by operator (manual transition).
|
||
- Builder has **not yet seeded** STATUS-1b/BACKLOG-1b/JOURNAL-1b and has not claimed W0. No gate pending.
|
||
- I began the independent white-box §3 review immediately (it's my role this phase and needs no Builder gate).
|
||
|
||
## White-box §3 prep pass #1 @2026-05-27 — over post-1c codebase (PRE-cleanup baseline; advisory until RL3)
|
||
Recording the baseline state *before* any W0/W1 cleanup, so I can later confirm the cleanup regressed nothing.
|
||
|
||
- **Tests are real** — PASS (provisional). Swept all 6 recipe suites (custom-html, lasuite-docs, keycloak,
|
||
matrix-synapse, n8n, cryptpad) × install/upgrade/backup + conftest + runner/harness. No
|
||
`@pytest.mark.skip/xfail/skipif`, no commented-out asserts, no tautologies. Install tests assert real
|
||
app content (matrix: parses `versions` JSON non-empty; keycloak: admin DOM; others: Playwright body).
|
||
Upgrade tests deploy v(n-1) → write marker → upgrade → assert exact marker survives. Backup tests
|
||
establish+verify state → backup → mutate+verify → restore → assert exact pre-mutation state (keycloak
|
||
deletes a realm). **Watch-item (to re-check black-box at RL3):** every upgrade test has a *conditional*
|
||
`pytest.skip()` when no previous published version exists (e.g. custom-html test_upgrade.py:17-18). Valid
|
||
by design, but if it ALWAYS skips, the upgrade stage would be silently fake — RL3 must confirm the
|
||
upgrade stage actually RUNS (prev version found) for ≥1 recipe, not just skips. (1c E2E exercised this.)
|
||
- **Server state Nix-declared & idempotent** — PASS (provisional). No `.bootstrapped`/run-once sentinels in
|
||
modules/ or scripts/ (grep clean). Convergence/oneshot pattern per §9 to be re-read fully in pass #2.
|
||
- **No footguns / sleep** — PASS (provisional). All `time.sleep()` in runner/harness/lifecycle.py (147,157,
|
||
212,238) + bridge.py (280) are **poll-loop intervals inside `while time.time() < deadline:` loops**, not
|
||
bare readiness waits. `wait_healthy` polls converge-then-HTTP with timeouts. Teardown (lifecycle.py:215)
|
||
is correctly ordered (undeploy → `docker stack rm` fallback → volumes/secrets while .env exists → drop
|
||
.env last), retries volume removal, and **verifies residual is empty (raises TeardownError otherwise)**.
|
||
- **No secrets in code/committed files** — PASS (provisional). Grep for inline passwords/tokens/private-key
|
||
blocks across *.py/*.nix/*.sh/*.yml clean (only env/file references + generators). Full leak re-verify
|
||
(incl. published logs + dashboard, and generated app passwords) belongs to RL3 D6.
|
||
|
||
Still owed in white-box pass #2 (after I read the rest): **harness DRY** (recipe quirks in shared harness,
|
||
not per-recipe copy-paste), **log redaction real** (bridge/dashboard/log pipeline), **architecture matches
|
||
plan** (layout/§3, poll-primary trigger §4.1, traefik-is-coop-cloud-recipe §4.2; drift → DECISIONS.md).
|
||
|
||
## W0 (RL1 — lint/format tooling + green) : **PASS** @2026-05-27 (Adversary cold)
|
||
Gate claimed in STATUS-1b. Acceptance: clean checkout → `nix develop .#lint --command bash
|
||
scripts/lint.sh` → `lint: PASS`; lint stage wired in `.drone.yml` push pipeline. **Verified cold,
|
||
independently** (no nix on sandbox; ran on cc-ci over a *pristine* tree, not the Builder's working copy):
|
||
|
||
- **Cold checkout = exact reviewed SHA.** `git archive 233939a` (= my `origin/main` HEAD) piped to
|
||
cc-ci → `/tmp/ccci-cold` (clean tree, no untracked/cached state, secrets submodule empty as lint
|
||
excludes it). Not cloned from `/root/cc-ci` (that's a non-git plain copy) — archived from my own clone.
|
||
- **Lint PASS cold.** `HOME=/root nix develop .#lint --command bash scripts/lint.sh` → **exit 0,
|
||
`lint: PASS`.** All 8 linters ran clean: nixpkgs-fmt (0/14 reformat), statix, deadnix, ruff format
|
||
(32 files), ruff check (all passed), shfmt, shellcheck, yamllint.
|
||
- **Stage real, not rigged.** `scripts/lint.sh` genuinely invokes each linter in check mode and
|
||
accumulates a `fail` flag → `exit "$fail"` (correct `set -uo pipefail`, no `-e`, so all run). The
|
||
`.drone.yml` `self-test` push pipeline runs the *exact* command `nix develop .#lint --command bash
|
||
scripts/lint.sh` and FAILs the build on non-zero. Toolchain pinned from nixpkgs in `flake.nix`
|
||
(`devShells.lint`), so CI == local.
|
||
- **Gate has TEETH (break-it probe).** Injected violations into the cold tree (a `.py` with
|
||
`import os,sys` + `x=1+2`, and a mis-formatted `.nix`) → re-ran lint → **exit 1, `lint: FAIL`**
|
||
(ruff E401/I001/F401 + nixpkgs-fmt). So the stage is not vacuously green.
|
||
|
||
Verdict: **W0 PASS.** Builder may proceed to W1.
|
||
Advisory (not W0-blocking; re-confirm at RL3): Builder notes the Gitea→Drone *push* webhook is flaky
|
||
(§4.1), so the lint stage may not auto-fire as a real Drone build on every push — RL1's intent
|
||
("future commits stay clean") depends on that path actually firing. The stage IS wired and proven
|
||
green via its exact command; I'll confirm a real push triggers the Drone lint build when I re-verify
|
||
M2/D-gates at RL3 (it overlaps). Not filing a finding now — bounded phase, acceptance-as-stated is met.
|
||
|
||
## White-box §3 pass #2 @2026-05-27 (Adversary, post-W0 formatted code) — RL2 input
|
||
Remaining §3 checklist items. **No blocking findings.**
|
||
|
||
- **Harness is DRY** — PASS. Recipe quirks live in shared harness + per-recipe *declarative* metadata
|
||
(`tests/<recipe>/recipe_meta.py`: HEALTH_PATH/HEALTH_OK/timeouts/EXTRA_ENV), consumed uniformly by
|
||
`tests/conftest.py` (`_recipe_meta`, `deployed`/`deployed_app` fixtures) and
|
||
`runner/harness/lifecycle.py` (`_recipe_extra_env`). **No `if recipe == "..."` branches in the shared
|
||
harness** (the M6.5 no-surgery rule holds). Recipe-specific logic is isolated to that recipe's dir
|
||
(e.g. keycloak `kc_admin.py`, cryptpad's derived SANDBOX_DOMAIN). Only smell: the ~6-8-line `old_app`
|
||
upgrade fixture is copy-pasted across recipes — thin boilerplate over shared metadata; **advisory**,
|
||
not a violation (factoring it would just add another per-recipe injection point). → IDEAS, not blocking.
|
||
- **Architecture matches plan** — PASS. §4.1 trigger is **poll-primary** (`bridge/bridge.py` `poll_loop`
|
||
runs unconditionally every ≤60s; webhook is optional + dedup'd by comment id; exact trimmed `!testme`;
|
||
commenter-auth via read-level `GET /orgs/{owner}/members/{user}` 204=allow, fail-closed). §4.2 Traefik
|
||
is the **real coop-cloud/traefik recipe via abra** (`modules/proxy.nix`: `abra recipe fetch/app new
|
||
traefik`, `WILDCARDS_ENABLED=1`, `compose.wildcard.yml`, `LETS_ENCRYPT_ENV=""` → no ACME, cert as
|
||
`ssl_cert`/`ssl_key` swarm secrets) — no hand-rolled traefik.nix. §3 layout matches.
|
||
- **Server state Nix-declared & idempotent** — PASS. `modules/proxy.nix` `deploy-proxy` is
|
||
`Type=oneshot`+`RemainAfterExit`, re-runs every activation and converges (insert secret only if
|
||
absent, deploy). No `.bootstrapped`/run-once sentinels anywhere (grep clean, pass #1). Leans on 1c's
|
||
already-proven D8 (byte-identical closure + live throwaway rebuild, no manual post-step).
|
||
- **Log redaction is real** — PASS for infra secrets; **one advisory gap to verify behaviorally at
|
||
RL3/D6.** `runner/run_recipe_ci.py` `_redact_values()` reads `/run/secrets/*` (≥8-char values) and
|
||
`run_stage_redacted()` masks them in live-streamed stage output (sorted longest-first → no partial
|
||
leak). **But class-B *generated app passwords* are NOT under `/run/secrets/*`, so they are NOT in the
|
||
`_REDACT` list** — their non-leak rests entirely on the "harness never prints them / abra doesn't echo
|
||
generated ones" assumption (code comment, run_recipe_ci.py:59-60). Also: the runner's *own* stdout
|
||
(the `cc-ci-run …` Drone step) bypasses `run_stage_redacted`. This is exactly what my behavioral D6
|
||
leak test must catch at RL3 (grep published Drone logs **and** the dashboard for a known generated app
|
||
password). Phase-1 D6 passed that test once; recording the white-box shape so RL3 re-checks it, not a
|
||
new blocking finding. → **WATCH-ITEM for RL3/D6.**
|
||
- **Readability / docs accuracy** — advisory; defer to RL4 (docs) + the ruff/lint pass already covers
|
||
dead code / style deterministically.
|
||
|
||
**Net of §3 white-box review (RL2 input): no blocking findings; 2 advisories** (old_app copy-paste →
|
||
IDEAS; app-secret redaction → RL3/D6 watch-item). I expect Builder's W1 to be light. I have NOT filed
|
||
`[adversary]` BACKLOG items since nothing is blocking — will file if W1/RL3 surfaces a real defect.
|
||
|
||
## Operator added RL5 + RL6 (plan §7, 2026-05-27) — both BLOCKING for 1b DONE. Noted; verification plan:
|
||
- **RL5** (Builder moves; Adversary verifies cold): `modules/`→`nix/modules/`, `hosts/`→`nix/hosts/`;
|
||
`flake.nix`/`flake.lock` STAY at root so build ref `#cc-ci` is unchanged; fix flake internal paths +
|
||
`.drone.yml`/scripts refs; update `docs/architecture.md`. **Verification folds into RL3:** a fresh
|
||
recursive clone must still rebuild **byte-identical to the running system** (toplevel store hash WILL
|
||
change — expected; what must hold is build==running + reproducible). I'll re-confirm cold at RL3.
|
||
- **RL6** (coordinated near-END-of-1b): move `STATUS*/REVIEW*/JOURNAL*/BACKLOG*/DECISIONS.md` →
|
||
`machine-docs/`; **README.md stays at root** (operator decision — human readme, not protocol). Update
|
||
ALL refs (cc-ci-plan plans, AGENTS.md, .drone.yml, scripts). I verify refs updated + nothing broken.
|
||
⚠ **CAVEAT affecting ME:** the watchdog (`launch.sh`) reads `STATUS-<id>.md`/`REVIEW-<id>.md` at repo
|
||
ROOT for handoffs/transitions — moving breaks it until launch.sh updated + watchdog restarted IN
|
||
LOCKSTEP (orchestrator handles that). So **I keep writing REVIEW-1b.md at root until the coordinated
|
||
cutover**, and at that moment I `git mv` my own REVIEW files (single-writer rule) in lockstep. Will NOT
|
||
move them unilaterally or while a phase transition is pending.
|
||
|
||
## Status: W0 PASS + §3 white-box pass complete (no blockers). New blocking items RL5/RL6 noted.
|
||
DoD for 1b is now **RL1–RL6** (was RL1–RL4). Awaiting Builder gates (W1 review+fixes; RL5 layout move;
|
||
then RL3 cold re-verify LAST, now also covering the RL5 byte-identical rebuild). Cardinal rule holds:
|
||
cleanup/refactor must not weaken/skip/regress any test — incl. the conditional-upgrade-skip watch-item.
|