
JOURNAL — Phase 1b (review & lint pass)

Append-only Builder log: what I did + verifying command/output + next. (Adversary logs to REVIEW-1b.)


2026-05-27 — Phase 1b kickoff (first wake)

Read the phase plan (plan-phase1b-review-lint.md) + plan.md §6.1/§7/§9. Confirmed Phase 1c is genuinely DONE (STATUS-1c ## DONE, REVIEW-1c all C1-C7 + E2E PASS, no VETO, ADV-1c-1 closed). Phase 1b state files did not exist — seeded STATUS-1b / BACKLOG-1b / JOURNAL-1b / REVIEW-1b (stub).

Access + environment probes:

  • ssh cc-ci 'hostname && systemctl is-system-running' → nixos / running.
  • Lint tools are NOT in the sandbox and nix is not installed locally, so linting must run on cc-ci (NixOS, nix 2.24.14, flakes enabled). nix build github:NixOS/nixpkgs/<our-pin>#ruff resolves from cache.nixos.org (ruff 0.7.3) → building a lint devshell from the already-pinned nixpkgs is viable with no registry/network surprises. shellcheck-0.10.0 already realized in the host store.

Lint-target inventory: 14 .nix, 32 .py, 1 .sh (scripts/bootstrap-drone-oauth.sh), plus .drone.yml / .sops.yaml YAML. No prior lint/format decisions in DECISIONS.md (clean slate).

Next: W0 — add the lint devshell + entrypoint + tool configs to the flake; auto-format; fix findings; wire the .drone.yml lint stage.

2026-05-27 — W0 built: lint toolchain + format + drone stage

Added (commits 2cede01 format/fixes, 4af427c drone stage, + tooling commits):

  • flake.nix: lint devshell (nix develop .#lint) = nixpkgs-fmt, statix, deadnix, ruff, shellcheck, shfmt, yamllint, built from the already-pinned nixpkgs (no registry/network surprise — nix build <pin>#ruff resolves from cache.nixos.org). Default devshell also gets them.
  • scripts/lint.sh (check / --fix), ruff.toml, .yamllint.yaml.
  • .drone.yml: a lint step in the event: push pipeline running nix develop .#lint --command bash scripts/lint.sh (FAILs the build on any unclean file).

Format/lint cleanup (semantics-preserving): ruff format on all 32 .py; nixpkgs-fmt drone-runner.nix; shfmt scripts; ruff SIM105/SIM115 (contextlib.suppress / with open); statix (merge sops secrets.*, empty-pattern → _); deadnix (drop unused self/lib/overlay final).
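
For reference, a minimal before/after sketch of the two ruff rewrites named above (SIM105 and SIM115). The function names are hypothetical and are not taken from the cc-ci code; they only illustrate why the rewrites are semantics-preserving.

```python
import contextlib
import os


def remove_if_present_before(path):
    # Shape flagged by SIM105: try/except/pass used only to ignore an error.
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def remove_if_present_after(path):
    # SIM105 rewrite: identical behavior, expressed with contextlib.suppress.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)


def read_text_before(path):
    # Shape flagged by SIM115: file opened without a context manager.
    f = open(path, encoding="utf-8")
    data = f.read()
    f.close()
    return data


def read_text_after(path):
    # SIM115 rewrite: the with-block closes the file even if read() raises.
    with open(path, encoding="utf-8") as f:
        return f.read()
```

Both pairs return the same results on the happy path; the "after" forms are shorter and, for the file case, also close the handle when an exception occurs.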

Verification (on cc-ci, clean tar'd checkout /tmp/ccci-lint):

$ nix develop .#lint --command bash scripts/lint.sh
=== Nix — nixpkgs-fmt ===  0 / 14 would have been reformatted
=== Nix — statix ===        (clean)
=== Nix — deadnix ===       (clean)
=== Python — ruff format === 32 files already formatted
=== Python — ruff check ===  All checks passed!
=== Shell — shfmt/shellcheck === (clean)
=== YAML — yamllint ===     (clean)
lint: PASS

nix eval .#nixosConfigurations.cc-ci.config.system.build.toplevel → a derivation (evals OK; the networkd/dhcp warning is pre-existing). Built toplevel 8i3jcad9… differs from running cqym8knjg7… — EXPECTED: bridge.py/dashboard.py (and runner) are cp'd into the store, so the reformat changes their hash. cc-ci will be rebuilt to the formatted closure in W2 before RL3. All Python byte-compiles (store python 3.12.8).

Drone CI note: triggered build #150 via API but that's event=custom (→ recipe-ci pipeline, not the push lint pipeline) — cancelled it. The Gitea→Drone push webhook (hook 211) shows last_status: None and Drone logs show no inbound hook deliveries → the documented flaky webhook (§4.1). Public and canonical (100.90.116.4) Drone build lists are identical, so the gateway routes to canonical cc-ci (no rebuild-VM split). Recorded the flaky-webhook as a pre-existing infra item in DECISIONS.md; the lint stage itself is wired + proven green via the identical command.

Claimed W0 gate (RL1) in STATUS-1b. Next: W1 white-box review checklist over the cleaned codebase.

2026-05-27 — W0 PASS (Adversary cold, RL1) + W1 Builder-side §3 self-review

Adversary logged W0/RL1 PASS (REVIEW-1b): cold checkout of my HEAD 233939a archived to cc-ci, nix develop .#lint --command bash scripts/lint.sh → exit 0 lint: PASS, plus a break-it probe (injected bad .py/.nix → exit 1 lint: FAIL) proving the gate has teeth. Advisory only (flaky push webhook → confirm a real push fires the Drone lint build at RL3); not a finding.
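
The break-it probe can be sketched generically as follows. This is an illustration under assumptions, not the Adversary's actual procedure: gate_has_teeth is a hypothetical helper, and the usage note swaps in Python's compileall as a stand-in checker where the real gate runs scripts/lint.sh inside the lint devshell.

```python
import subprocess
import tempfile
from pathlib import Path


def gate_has_teeth(check_cmd, clean_files, bad_name, bad_content):
    """True iff check_cmd passes on a clean tree and fails once a bad file is injected."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name, content in clean_files.items():
            (root / name).write_text(content, encoding="utf-8")
        # Step 1: the gate must be green on the clean tree.
        clean = subprocess.run(check_cmd, cwd=root, capture_output=True)
        # Step 2: inject a deliberately broken file and re-run the same command.
        (root / bad_name).write_text(bad_content, encoding="utf-8")
        broken = subprocess.run(check_cmd, cwd=root, capture_output=True)
        return clean.returncode == 0 and broken.returncode != 0
```

With the stand-in checker, `gate_has_teeth([sys.executable, "-m", "compileall", "-q", "."], {"ok.py": "x = 1\n"}, "bad.py", "def broken(:\n")` should report that the gate fails on the injected syntax error, while a command that always exits 0 would be reported as toothless.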

W1 — ran the §3 white-box checklist myself (Builder side), to fix anything blocking before the Adversary's RL2 confirmation. Findings over the post-W0 (cleaned) codebase:

  • Tests real (blocking) — holds. (Adversary pass #1 PASS; my W0 cleanup touched only formatting + SIM/contextlib rewrites, no assertion changed.)
  • Harness DRY (blocking-ish) — holds. grep for recipe-name conditionals in the SHARED harness (runner/harness/*.py, run_recipe_ci.py, conftest.py) → NONE. Per-recipe quirks are data: optional tests/<recipe>/recipe_meta.py (HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/HTTP_TIMEOUT) + per-recipe test files (e.g. keycloak kc_admin.py). Enrolling needs no shared-harness edit (D5).
  • Nix idempotent (blocking) — holds (no .bootstrapped sentinels; reconcile oneshots; Adversary pass #1 confirmed).
  • No footguns (blocking) — holds. Every time.sleep() (lifecycle.py 160/170/226/252, bridge.py 304) sits inside a while time.time() < deadline: poll/retry loop (verified each), not a bare readiness wait. --chaos appears ONLY in "never pass it" comments (abra.py). No shell=True.
  • No secrets in code (blocking) — holds (Adversary pass #1 grep clean; full leak re-verify is RL3).
  • Log redaction real (blocking) — holds. run_recipe_ci.py run_stage_redacted() masks any >=8-char /run/secrets/* value from streamed stage output; no secret-named value is print/logged in bridge.py/dashboard.py (grep clean).
  • Architecture matches plan (advisory→blocking on drift) — holds; settled in Phase 1/1c (poll is primary in bridge.py's loop; /hook optional; traefik is the coop-cloud recipe via proxy.nix). No drift; not reopening settled design (guardrail §5).
  • Readability / docs (advisory) — fine; nothing worth churning in a bounded pass.
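
To illustrate the "per-recipe quirks are data" point from the Harness DRY item, here is a minimal sketch of how a shared harness could overlay an optional recipe_meta.py on defaults. The loader name, the default values, and the assumption that HEALTH_OK is a tuple of status codes are all hypothetical; only the four constant names and the tests/<recipe>/recipe_meta.py location come from this journal.

```python
import importlib.util
from pathlib import Path

# Hypothetical defaults; the real harness values are not recorded here.
DEFAULTS = {
    "HEALTH_PATH": "/",
    "HEALTH_OK": (200,),
    "DEPLOY_TIMEOUT": 600,
    "HTTP_TIMEOUT": 30,
}


def load_recipe_meta(tests_root, recipe):
    """Return DEFAULTS overlaid with any constants the recipe's recipe_meta.py defines."""
    meta = dict(DEFAULTS)
    path = Path(tests_root) / recipe / "recipe_meta.py"
    if not path.is_file():
        return meta  # recipe declares no quirks: pure defaults
    spec = importlib.util.spec_from_file_location(f"recipe_meta_{recipe}", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    for key in DEFAULTS:
        if hasattr(module, key):
            meta[key] = getattr(module, key)
    return meta
```

In this shape the shared code never branches on a recipe name, so enrolling a new recipe means adding files under tests/<recipe>/ and nothing else.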

No blocking finding; nothing to fix; no advisory item to file. The Adversary owns the RL2 confirmation and is running its own §3 pass #2 (harness-DRY / redaction / architecture). Awaiting that; W2 (rebuild cc-ci to the formatted closure + request cold RL3 D1-D10) follows once RL2 is confirmed.
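
For context on the No footguns item, the accepted pattern (a sleep that lives inside a deadline-bounded poll loop rather than a bare readiness wait) looks roughly like this. wait_until is a hypothetical helper written for illustration; the real loops live inline in lifecycle.py and bridge.py.

```python
import time


def wait_until(probe, timeout, interval=1.0):
    """Poll probe() until it returns truthy or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if probe():
            return True
        # Sleep only between retries; total wait is bounded by the deadline.
        time.sleep(interval)
    return False
```

A bare `time.sleep(30)` followed by a single check either wastes time when the service is ready early or fails when it is ready late; the loop above returns as soon as the probe succeeds and gives a definite False on timeout.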

2026-05-27 — RL2 clean + RL5 (nix/ consolidation) + W2 switch to cleaned closure

RL2 (Adversary §3 pass #2): no blocking findings; 2 advisories — (a) old_app upgrade-fixture copy-paste across recipes → triaged to IDEAS (per-recipe upgrade tests are by design; sharing is a nicety, not a DRY-blocker); (b) app-secret redaction: the cc-ci-run Drone step path isn't wrapped by run_stage_redacted, so the Adversary will re-run the behavioral D6 leak test at RL3 (grep published Drone logs + dashboard for a known generated app password). My Builder §3 self-review agreed (no blockers). W1 is light/clean.
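
A rough sketch of the two mechanisms discussed here: value-based masking of streamed output, and the behavioral leak check that greps published text for known secrets. This is not the code of run_stage_redacted; the function names and the mask string are hypothetical, and the 8-character minimum follows the threshold noted in the self-review.

```python
from pathlib import Path

MIN_SECRET_LEN = 8  # shorter values are skipped to avoid masking common substrings
MASK = "[REDACTED]"  # hypothetical placeholder text


def load_secret_values(secrets_dir):
    """Collect secret file contents of at least MIN_SECRET_LEN characters."""
    values = set()
    for path in Path(secrets_dir).iterdir():
        if path.is_file():
            value = path.read_text(encoding="utf-8", errors="replace").strip()
            if len(value) >= MIN_SECRET_LEN:
                values.add(value)
    # Longest first, so a secret containing a shorter one is masked whole.
    return sorted(values, key=len, reverse=True)


def redact_line(line, secrets):
    """Mask every occurrence of a known secret value in one line of output."""
    for value in secrets:
        line = line.replace(value, MASK)
    return line


def find_leaks(published_text, secrets):
    """Behavioral check: return the secret values that still appear verbatim."""
    return [value for value in secrets if value in published_text]
```

Masking protects only the paths that are wrapped by it, which is why an unwrapped step still calls for the behavioral check: find_leaks on the published logs should come back empty regardless of which code path produced them. Multi-line values such as keys would need per-line handling that this sketch omits.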

RL5 — consolidate Nix code under nix/ (operator item, plan §7). git mv modules nix/modules, git mv hosts nix/hosts; flake.nix/flake.lock stay at root (#cc-ci unchanged); only flake's internal configuration.nix path + the moved modules' root-relative refs changed (../X → ../../X). Built on cc-ci → toplevel 8i3jcad9… byte-identical to the pre-move build (content-addressed; module .nix not in the runtime closure). Living docs + .drone.yml comment updated to nix/….

W2 — switched canonical cc-ci to the cleaned+RL5 closure so build == running (required before RL3: a fresh clone builds 8i3jcad9; running had to match or the byte-identical-to-running check would fail). Re-synced /root/cc-ci to HEAD, nixos-rebuild switch --flake 'path:/root/cc-ci#cc-ci':

stopping units: deploy-bridge.service, deploy-dashboard.service
sops-install-secrets: Imported …ssh_host_ed25519_key as age key (age1h90utdz…)
starting units: deploy-bridge.service, deploy-dashboard.service

Post-switch health (all green):

  • readlink /run/current-system → 8i3jcad9mrr01558lqckpi26nxn2ra3m-… (== fresh-clone build; was cqym8knjg7… pre-format).
  • systemctl is-system-running → running, 0 failed. deploy-bridge/deploy-dashboard active.
  • 5 stacks up (backups, ccci-bridge, ccci-dashboard, drone, traefik); ccci-bridge_app + ccci-dashboard_app 1/1 with NEW content-hash image tags (reformatted source redeployed).
  • Public via SOCKS proxy → gateway → cc-ci: https://ci.commoninternet.net/ → 200 (<title>cc-ci — Co-op Cloud recipe CI</title>); /badge/custom-html.svg → 200.

Net: RL1 PASS, RL2 clean, RL4 docs landed (README lint section + architecture.md nix/ layout), RL5 done + healthy, running==build==8i3jcad9. Remaining for DONE: RL3 (Adversary cold D1-D10 re-verify, now also covering the RL5 byte-identical rebuild) and RL6 (coordinated machine-docs/ move — LAST, with orchestrator lockstep). Claiming the RL3 gate.

2026-05-27 — push-webhook diagnostic (the RL1 "future commits stay clean" advisory)

Timeboxed root-cause on why pushes don't auto-create a Drone lint build. Fired Gitea's webhook test for the Drone hook (211) while tailing the Drone server logs:

  • POST /repos/recipe-maintainers/cc-ci/hooks/211/tests → Gitea returns 204 (accepted).
  • docker service logs --since 20s drone_…_app → NOTHING — no inbound request logged at all.

So the delivery git.autonomic.zone (Gitea) → drone.ci.commoninternet.net (public gateway) → cc-ci isn't reaching Drone. This is a gateway/network reachability condition, NOT a Drone-side config I can fix — and per §9 the gateway is operator-managed (not ours to reconfigure). Leaving it as the documented pre-existing advisory (hook last_status: None, §4.1). Impact is limited to cc-ci's OWN self-test/lint pipeline auto-firing; recipe-CI triggering is unaffected — the comment-bridge polls Gitea outbound (cc-ci → git.autonomic.zone, the reliable direction), which is the plan's primary trigger (§4.1). The lint stage is wired + proven green via its exact command; manual/API Drone builds work. Not expanding scope to re-engineer the inbound path (bounded pass).

2026-05-27 — RL3 FULL D1-D10 PASS (Adversary cold). Only RL6 (coordinated) left.

Adversary logged RL3 PASS (REVIEW-1b): all D1-D10 re-verified cold on the cleaned+RL5 byte-identical closure (8i3jcad9==running==fresh-clone build), fresh <24h evidence, nothing weakened. Highlights: D1 trigger 20s/8s; D2 install/upgrade/backup green (upgrade actually ran, not skipped) on custom-html + keycloak; D6 leak test 0 hits (8/8 infra + cert/key + generated keycloak admin pw absent from logs/dashboard); D8 fresh-recursive-clone rebuild == running; D10 = 2 fresh category runs (#151 custom-html, #152 keycloak) + carry-forward of the Phase-1 Adversary-verified 6/6 set (byte-identical harness/test/closure). Cardinal-rule PASS. RL1-RL5 Adversary-PASS, no open findings, NO VETO.

→ Flagged the orchestrator (STATUS-1b) that I'm ready for the RL6 coordinated cutover: it updates launch.sh to machine-docs/ paths + restarts the watchdog; on its signal I git mv STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md into machine-docs/ (README stays root), the Adversary moves REVIEW*, I fix the only in-repo refs (README Loop-state + docs/install.md:15), Adversary re-verifies, then I write ## DONE. Holding all root protocol files in place until that signal (moving them early breaks the live watchdog). Loop continues; not idling on a long sleep — short fallback while awaiting the orchestrator go-ahead.

2026-05-27 — RL6 PASS → Phase 1b DONE

Adversary logged RL6 PASS + FINAL SIGN-OFF: all RL1-RL6 Adversary-PASS, NO VETO (it moved its own REVIEW*.md → machine-docs/, re-verified refs + the watchdog resolve_state handoff survived the lockstep cutover). No open [adversary] findings; advisories → IDEAS + the documented push-webhook one.

DONE-handshake conditions (plan §6.1) met: a <24h Adversary PASS for every RL1-RL6 + the full cold D1-D10, no standing ## VETO. Final Builder health: cc-ci running/0-failed, toplevel 8i3jcad9mrr01558lqckpi26nxn2ra3m == fresh-clone build (build==running, byte-identical), 5 stacks up, public https://ci.commoninternet.net/ → 200. Wrote ## DONE to machine-docs/STATUS-1b.md.

Phase 1b is genuinely DONE. The foundation is now: formatted + lint-clean (CI-enforced via the .drone.yml lint stage), all Nix code under nix/ (flake at root, #cc-ci unchanged), multi-agent protocol files under machine-docs/, and every Phase-1 D1-D10 re-verified cold on the cleaned closure with nothing weakened. Builder loop terminating.