From 5d90cbd576a6cd792bee52ff5c67e2cba486f01d Mon Sep 17 00:00:00 2001 From: autonomic-bot Date: Wed, 27 May 2026 05:11:06 +0100 Subject: [PATCH] Add Phase-1b plan: bounded review & lint pass at the end of Phase 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Before scaling to many recipes: (1) deterministic style/hygiene via linters/formatters (alejandra/statix/deadnix, ruff, shellcheck/shfmt) wired as a .drone.yml stage so commits stay clean; (2) a white-box review checklist with teeth (real tests not health-only/skipped, DRY harness, Nix-declared idempotent bring-up, no footguns/secrets-in-code, architecture matches plan) — blocking fixed, advisory triaged. Bounded pass; never weaken a test for a nit. Phase 2 now follows 1b. Linked in README. Co-Authored-By: Claude Opus 4.7 (1M context) --- cc-ci-plan/README.md | 3 +- cc-ci-plan/plan-phase1b-review-lint.md | 127 +++++++++++++++++++++++++ cc-ci-plan/plan-phase2-recipe-tests.md | 3 +- 3 files changed, 131 insertions(+), 2 deletions(-) create mode 100644 cc-ci-plan/plan-phase1b-review-lint.md diff --git a/cc-ci-plan/README.md b/cc-ci-plan/README.md index 083cf88..eb2d931 100644 --- a/cc-ci-plan/README.md +++ b/cc-ci-plan/README.md @@ -15,7 +15,8 @@ autonomous Claude loops (a Builder and an adversarial Reviewer) running over day | File | Purpose | |---|---| | `plan.md` | The Phase-1 plan (build the CI server). Agents treat it as their single source of truth. | -| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase-1 `## DONE`): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. | +| `plan-phase1b-review-lint.md` | **Phase 1b** (bounded pass at the end of Phase 1): deterministic linting/formatting in CI + a white-box review checklist (real tests, DRY harness, idempotent Nix, no footguns/secrets). | +| `plan-phase2-recipe-tests.md` | **Phase 2** (after Phase 1b): author comprehensive per-recipe tests — port every recipe-maintainer test + ≥2 recipe-specific tests per app. | | `plan-phase2b-test-performance.md` | **Phase 2b** (after Phase 2, before Phase 3): empirically measure where test time goes and reduce it (image cache, readiness tuning, dedup deploys, warm infra, concurrency) — no weakened tests. | | `plan-phase3-results-ux.md` | **Phase 3** (after Phase 2b): beautiful YunoHost-style results — per-run **level**, image-forward PR comment (badge + summary card + app screenshot), polished dashboard. | | `IDEAS.md` | Deferred/future ideas, parked out of current scope. | diff --git a/cc-ci-plan/plan-phase1b-review-lint.md b/cc-ci-plan/plan-phase1b-review-lint.md new file mode 100644 index 0000000..5163d19 --- /dev/null +++ b/cc-ci-plan/plan-phase1b-review-lint.md @@ -0,0 +1,127 @@ +# cc-ci Phase 1b — Review & lint pass (Autonomous Build Plan) + +**Status:** QUEUED — a **bounded** pass that runs after Phase 1 (`plan.md`) reaches `## DONE`, and +**before** Phase 2 (`plan-phase2-recipe-tests.md`). +**Transition:** **manual** (operator kicks it off). +**Builds on:** the complete Phase-1 codebase (flake/modules, `runner/` + harness, the comment-bridge, +dashboard, scripts, the first recipes' tests, docs). +**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7). Here the Adversary also acts as +**white-box reviewer** (reads the code, not just black-box behavior). +**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1b-review-lint.md` + +--- + +## 0. Why this phase (and why it's bounded) + +Before scaling to ~18 recipes (Phase 2), clean and harden the foundation built in Phase 1 — it's far +cheaper to fix the harness/modules now than after every recipe copies their patterns. This is a +**one-time, bounded pass**, not an open-ended loop. Two distinct workstreams that go to two places, +to avoid bikeshedding: + +1. **Style & hygiene → deterministic tooling** (formatters/linters in CI). No judgment calls, no + churn, enforceable going forward. +2. **Code health a linter can't judge → a white-box review checklist with teeth** (§3) — anchored to + plan-relevant invariants, *not* taste. Blocking items get fixed; the rest are advisory. + +Discipline: **good-enough + enforceable**, not gold-plating. Don't reopen settled design; don't +weaken any test in the name of "cleanliness." When in doubt, lint it deterministically or leave it. + +--- + +## 1. Definition of Done (Phase 1b exit condition) + +Terminates when every item holds **and the Adversary/reviewer has independently confirmed** (logged +in `REVIEW.md`): + +- [ ] **RL1 — Lint/format tooling in place + green.** Formatters + linters are added to the repo (a + `lint` entrypoint + a Nix devshell) and wired as a **stage in cc-ci's own `.drone.yml`** so + future commits stay clean. The whole Phase-1 codebase **passes** them. Tooling (defaults; settle + in `DECISIONS.md`): + - **Nix:** `alejandra` (or `nixpkgs-fmt`) format · `statix` (lints) · `deadnix` (dead code). + - **Python** (`runner/`, harness, `bridge/`, `tests/`): `ruff` (lint + format). + - **Shell:** `shellcheck` + `shfmt`. + - **(optional) YAML:** `yamllint` for `.drone.yml`/compose. +- [ ] **RL2 — Review checklist run, blocking items fixed.** The white-box checklist (§3) has been run + over the codebase; every **blocking** finding is fixed; every **advisory** finding is either + fixed or moved to `BACKLOG.md`/`IDEAS.md` as `[idea]` with a one-line rationale. Findings + + resolutions recorded in `REVIEW.md`. +- [ ] **RL3 — Still green, nothing weakened.** After the cleanup, the full Phase-1 Definition of Done + (`plan.md` §2 D1–D10) **still holds** — the Adversary re-confirms a representative `!testme` run + is still green and no test/assertion was softened or skipped to satisfy a lint/review nit. +- [ ] **RL4 — Documented.** `docs/` notes how to run lint/format locally and that CI enforces it; any + accepted deviations from the review checklist are in `DECISIONS.md`. + +When RL1–RL4 hold and are confirmed, write `## DONE` to Phase-1b `STATUS.md`. + +--- + +## 2. Method + +1. **Tooling first.** Add the formatters/linters + a `lint` entrypoint + devshell; run format, then + fix lint findings. Wire the `.drone.yml` lint stage so it's enforced from here on. (Auto-fixable + style should just be auto-fixed — don't deliberate over it.) +2. **Then the review checklist (§3).** Read the code against the checklist; classify each finding + **blocking** vs **advisory**; fix blocking, triage advisory. +3. **Re-verify.** Confirm Phase-1 §2 still passes (RL3) — cleanup must not regress behavior or tests. +4. **Bound it.** Cap iterations; this is a pass, not a rewrite. Record dead-ends/deviations and stop. + +--- + +## 3. The white-box review checklist (teeth, not taste) + +Blocking unless noted; these are *plan-relevant invariants visible only by reading code*: + +- **Tests are real (blocking).** Every recipe/harness test asserts actual app state — not + health-only, not `skip`/`xfail`, not assertions that can't fail. (Pre-empts Phase-2 §7.1 at the + foundation.) +- **Harness is DRY (blocking-ish).** Recipe-specific quirks live in the **shared harness**, not + copy-pasted per recipe (the M6.5 "no harness surgery / no one-off hacks" rule) — so 18 recipes + don't multiply the mess. +- **Server state is Nix-declared & idempotent (blocking).** Bring-up is declarative idempotent + reconciliation (the `swarm-init`/`deploy-proxy` oneshot pattern, §9) — no imperative drift, no + run-once sentinels, no manual post-rebuild steps in `install.md`. +- **No footguns (blocking).** No bare `sleep` for readiness (use polling); correct `--chaos` usage; + no hardcoded versions/domains that break across upstream versions; teardown in `try/finally`; + per-run secrets reused across stages (not regenerated). +- **No secrets in code or committed files (blocking).** Grep + read for tokens/passwords/keys in + source, configs, `.drone.yml`, fixtures (complements the Adversary's behavioral leak test). +- **Architecture matches the plan (advisory→blocking on real drift).** Layout/》 modules follow + `plan.md` §3; the trigger is poll-primary + optional manual webhook (§4.1); Traefik is the + coop-cloud recipe (§4.2); deviations are recorded in `DECISIONS.md` (not silent). +- **Log redaction is real (blocking).** The log/dashboard pipeline scrubs secrets before publish. +- **Readability & error handling (advisory).** Clear names, sensible structure, errors surfaced not + swallowed, dead code removed — advisory; auto-fixable parts belong to the linters, not debate. +- **Docs accurate (advisory).** `install.md` reproduces from scratch; `enroll-recipe.md` matches the + real enroll flow. + +--- + +## 4. Milestones (bounded) + +- **W0 — Tooling + format.** Linters/formatters + `lint` entrypoint + devshell added; codebase + auto-formatted; `.drone.yml` lint stage wired and green. *Accept:* `lint` passes in CI from a clean + checkout. +- **W1 — Review + fixes.** Checklist (§3) run; blocking findings fixed; advisory triaged. *Accept:* + `REVIEW.md` lists findings + resolutions; no blocking item open. +- **W2 — Re-verify + document.** Phase-1 §2 still green; docs updated; deviations in `DECISIONS.md`. + *Accept:* a `!testme` run is still green post-cleanup; flip Phase-1b `STATUS.md` to `## DONE`. + +--- + +## 5. Guardrails + +- **Never weaken a test to satisfy a lint/review nit** (the cardinal rule still wins). +- **Style → tooling, judgment → checklist.** Don't spend agent effort (or loop iterations) on what a + formatter/linter settles deterministically; don't let the review drift into subjective taste. +- **Bounded.** This is a pass, not a refactor project — fix invariants + enforce linting, then stop; + bigger improvements go to `IDEAS.md`, not this phase. +- **Don't reopen settled design.** Architecture decisions from Phase 1 stand unless there's real + drift from the plan; record, don't relitigate. + +--- + +## 6. Open decisions (log in DECISIONS.md) +- Exact formatter choices (alejandra vs nixpkgs-fmt; ruff-format vs black) and lint strictness. +- Whether to add Python **type-checking** (mypy/pyright) now or defer to `IDEAS.md`. +- The precise **blocking vs advisory** split for the checklist. +- Whether the `.drone.yml` lint stage should **fail** the build or just warn initially. diff --git a/cc-ci-plan/plan-phase2-recipe-tests.md b/cc-ci-plan/plan-phase2-recipe-tests.md index 9eaf188..8c5f9df 100644 --- a/cc-ci-plan/plan-phase2-recipe-tests.md +++ b/cc-ci-plan/plan-phase2-recipe-tests.md @@ -1,6 +1,7 @@ # cc-ci Phase 2 — Comprehensive per-recipe test authoring (Autonomous Build Plan) -**Status:** QUEUED — starts only after Phase 1 (`plan.md`) reaches `## DONE`. +**Status:** QUEUED — starts after Phase 1 (`plan.md`) and the Phase-1b review/lint pass +(`plan-phase1b-review-lint.md`) reach `## DONE`. **Builds on:** the Phase-1 cc-ci CI server (`plan.md`). This phase adds **test content**, not infra. **Reference corpus:** `references/recipe-maintainer/` → `/srv/recipe-maintainer/` (the existing, human-maintained recipe tests — the canonical source to port from).