# JOURNAL — Phase 1b (review & lint pass)

Append-only Builder log: what I did + verifying command/output + next. (Adversary logs to REVIEW-1b.)

---

## 2026-05-27 — Phase 1b kickoff (first wake)

Read the phase plan (`plan-phase1b-review-lint.md`) + plan.md §6.1/§7/§9. Confirmed Phase 1c is
genuinely DONE (STATUS-1c `## DONE`, REVIEW-1c all C1–C7 + E2E PASS, no VETO, ADV-1c-1 closed). Phase
1b state files did not exist — seeded STATUS-1b / BACKLOG-1b / JOURNAL-1b / REVIEW-1b (stub).

Access + environment probes:
- `ssh cc-ci 'hostname && systemctl is-system-running'` → `nixos` / `running`.
- Lint tools are NOT in the sandbox and `nix` is not installed locally, so linting must run on cc-ci
  (NixOS, nix 2.24.14, flakes enabled). `nix build github:NixOS/nixpkgs/<our-pin>#ruff` resolves from
  cache.nixos.org (ruff 0.7.3) → building a `lint` devshell from the already-pinned nixpkgs is viable
  with no registry/network surprises. shellcheck-0.10.0 already realized in the host store.

Lint-target inventory: 14 `.nix`, 32 `.py`, 1 `.sh` (`scripts/bootstrap-drone-oauth.sh`), plus
`.drone.yml` / `.sops.yaml` YAML. No prior lint/format decisions in DECISIONS.md (clean slate).

Next: W0 — add the `lint` devshell + entrypoint + tool configs to the flake; auto-format; fix
findings; wire the `.drone.yml` lint stage.

## 2026-05-27 — W0 built: lint toolchain + format + drone stage

Added (commits 2cede01 format/fixes, 4af427c drone stage, + tooling commits):
- `flake.nix`: `lint` devshell (`nix develop .#lint`) = nixpkgs-fmt, statix, deadnix, ruff,
  shellcheck, shfmt, yamllint, built from the already-pinned nixpkgs (no registry/network surprise —
  `nix build <pin>#ruff` resolves from cache.nixos.org). Default devshell also gets them.
- `scripts/lint.sh` (check / `--fix`), `ruff.toml`, `.yamllint.yaml`.
- `.drone.yml`: a `lint` step in the `event: push` pipeline running
  `nix develop .#lint --command bash scripts/lint.sh` (FAILs the build on any unclean file).

Format/lint cleanup (semantics-preserving): ruff format on all 32 .py; nixpkgs-fmt drone-runner.nix;
shfmt scripts; ruff SIM105/SIM115 (contextlib.suppress / `with open`); statix (merge sops
`secrets.*`, empty-pattern → `_`); deadnix (drop unused `self`/`lib`/overlay `final`).

Verification (on cc-ci, clean tar'd checkout /tmp/ccci-lint):
```
$ nix develop .#lint --command bash scripts/lint.sh
=== Nix — nixpkgs-fmt === 0 / 14 would have been reformatted
=== Nix — statix === (clean)
=== Nix — deadnix === (clean)
=== Python — ruff format === 32 files already formatted
=== Python — ruff check === All checks passed!
=== Shell — shfmt/shellcheck === (clean)
=== YAML — yamllint === (clean)
lint: PASS
```
`nix eval .#nixosConfigurations.cc-ci.config.system.build.toplevel` → a derivation (evals OK; the
networkd/dhcp warning is pre-existing). Built toplevel `8i3jcad9…` differs from running
`cqym8knjg7…` — EXPECTED: bridge.py/dashboard.py (and runner) are `cp`'d into the store, so the
reformat changes their hash. cc-ci will be rebuilt to the formatted closure in W2 before RL3.
All Python byte-compiles (store python 3.12.8).

Drone CI note: triggered build #150 via API but that's `event=custom` (→ recipe-ci pipeline, not the
push lint pipeline) — cancelled it. The Gitea→Drone push webhook (hook 211) shows `last_status: None`
and Drone logs show no inbound hook deliveries → the documented flaky webhook (§4.1). Public and
canonical (100.90.116.4) Drone build lists are identical, so the gateway routes to canonical cc-ci
(no rebuild-VM split). Recorded the flaky webhook as a pre-existing infra item in DECISIONS.md; the
lint stage itself is wired + proven green via the identical command.

Claimed W0 gate (RL1) in STATUS-1b. Next: W1 white-box review checklist over the cleaned codebase.

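For reference, the gate behaviour described in this entry (run every check, print one status line per tool, fail the build if any check is unclean) can be sketched as below. This is an illustration only: the real entrypoint is the bash script `scripts/lint.sh`, whose internals are not reproduced in this journal, and `run_checks` / `lint_exit_code` are hypothetical names.

```python
import subprocess
import sys


def run_checks(checks):
    """Run every (name, argv) check and return the names of the ones that failed.

    All checks run even after a failure, so one pass reports every unclean tool.
    """
    failed = []
    for name, argv in checks:
        result = subprocess.run(argv, capture_output=True, text=True)
        status = "(clean)" if result.returncode == 0 else "FAIL"
        print(f"=== {name} === {status}")
        if result.returncode != 0:
            failed.append(name)
    return failed


def lint_exit_code(checks):
    """Return 0 if every check passed, 1 otherwise (what a CI step would exit with)."""
    failed = run_checks(checks)
    print("lint: PASS" if not failed else "lint: FAIL")
    return 0 if not failed else 1


if __name__ == "__main__":
    # Stand-in commands (assumption): real checks would invoke the lint tools.
    ok = [sys.executable, "-c", "raise SystemExit(0)"]
    bad = [sys.executable, "-c", "raise SystemExit(1)"]
    sys.exit(lint_exit_code([("stand-in clean", ok), ("stand-in unclean", bad)]))
```

Collecting failures instead of stopping at the first one means a single run lists every tool that needs attention.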
## 2026-05-27 — W0 PASS (Adversary cold, RL1) + W1 Builder-side §3 self-review

Adversary logged **W0/RL1 PASS** (REVIEW-1b): cold checkout of my HEAD `233939a` archived to cc-ci,
`nix develop .#lint --command bash scripts/lint.sh` → exit 0 `lint: PASS`, plus a break-it probe
(injected bad .py/.nix → exit 1 `lint: FAIL`) proving the gate has teeth. Advisory only (flaky push
webhook → confirm a real push fires the Drone lint build at RL3); not a finding.

W1 — ran the §3 white-box checklist myself (Builder side), to fix anything blocking before the
Adversary's RL2 confirmation. Findings over the post-W0 (cleaned) codebase:
- **Tests real (blocking)** — holds. (Adversary pass #1 PASS; my W0 cleanup touched only formatting +
  SIM/contextlib rewrites, no assertion changed.)
- **Harness DRY (blocking-ish)** — holds. `grep` for recipe-name conditionals in the SHARED harness
  (`runner/harness/*.py`, `run_recipe_ci.py`, `conftest.py`) → NONE. Per-recipe quirks are data:
  optional `tests/<recipe>/recipe_meta.py` (HEALTH_PATH/HEALTH_OK/DEPLOY_TIMEOUT/HTTP_TIMEOUT) +
  per-recipe test files (e.g. keycloak `kc_admin.py`). Enrolling needs no shared-harness edit (D5).
- **Nix idempotent (blocking)** — holds (no `.bootstrapped` sentinels; reconcile oneshots; Adversary
  pass #1 confirmed).
- **No footguns (blocking)** — holds. Every `time.sleep()` (lifecycle.py 160/170/226/252,
  bridge.py 304) sits inside a `while time.time() < deadline:` poll/retry loop (verified each), not a
  bare readiness wait. `--chaos` appears ONLY in "never pass it" comments (abra.py). No `shell=True`.
- **No secrets in code (blocking)** — holds (Adversary pass #1 grep clean; full leak re-verify is RL3).
- **Log redaction real (blocking)** — holds. `run_recipe_ci.py` `run_stage_redacted()` masks any >=8-char
  `/run/secrets/*` value from streamed stage output; no secret-named value is printed/logged in
  `bridge.py`/`dashboard.py` (grep clean).
- **Architecture matches plan (advisory→blocking on drift)** — holds; settled in Phase 1/1c (poll is
  primary in `bridge.py`'s loop; `/hook` optional; traefik is the coop-cloud recipe via `proxy.nix`).
  No drift; not reopening settled design (guardrail §5).
- **Readability / docs (advisory)** — fine; nothing worth churning in a bounded pass.

**No blocking finding; nothing to fix; no advisory item to file.** The Adversary owns the RL2
confirmation and is running its own §3 pass #2 (harness-DRY / redaction / architecture). Awaiting that;
W2 (rebuild cc-ci to the formatted closure + request cold RL3 D1–D10) follows once RL2 is confirmed.

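The "No footguns" item in the checklist above rests on one pattern: every sleep sits inside a deadline-bounded poll loop rather than standing alone as a readiness wait. A minimal sketch of that pattern follows; `wait_until` and its parameters are hypothetical names, not code copied from `lifecycle.py` or `bridge.py`.

```python
import time


def wait_until(probe, timeout_s, interval_s=1.0):
    """Poll probe() until it returns truthy or timeout_s elapses.

    Returns True as soon as the probe succeeds, False if the deadline passes.
    Uses time.time() to mirror the loop shape quoted in the checklist;
    time.monotonic() would be the more clock-robust choice.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        if probe():
            return True
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        # Never sleep past the deadline, so the total wait stays bounded.
        time.sleep(min(interval_s, remaining))
    return False
```

A caller would pass a cheap readiness probe (for example an HTTP health check) and treat a `False` return as a timeout failure instead of sleeping for a fixed period and hoping the service is up.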
## 2026-05-27 — RL2 clean + RL5 (nix/ consolidation) + W2 switch to cleaned closure

**RL2 (Adversary §3 pass #2):** no blocking findings; 2 advisories — (a) `old_app` upgrade-fixture
copy-paste across recipes → triaged to IDEAS (per-recipe upgrade tests are by design; sharing is a
nicety, not a DRY-blocker); (b) app-secret redaction: the `cc-ci-run` Drone step path isn't wrapped by
`run_stage_redacted`, so the Adversary will re-run the behavioral D6 leak test at RL3 (grep published
Drone logs + dashboard for a known generated app password). My Builder §3 self-review agreed (no
blockers). W1 is light/clean.

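For context on advisory (b), the kind of masking `run_stage_redacted` is described as doing (replace any secret value of 8 or more characters in streamed output) can be sketched as follows. This is an assumption-laden illustration, not the code in `run_recipe_ci.py`: `maskable`, `load_secret_values`, `redact_line` and the mask string are hypothetical, and multi-line secrets such as certificates are not handled.

```python
from pathlib import Path

MIN_SECRET_LEN = 8  # shorter values are skipped to avoid masking common substrings


def maskable(values, min_len=MIN_SECRET_LEN):
    """Keep unique values long enough to mask, longest first.

    Longest-first ordering ensures a secret that contains another secret
    is replaced as a whole before its shorter substring is considered.
    """
    return sorted({v for v in values if len(v) >= min_len}, key=len, reverse=True)


def load_secret_values(secrets_dir):
    """Read every regular file in secrets_dir and return its maskable contents."""
    raw = []
    for path in sorted(Path(secrets_dir).iterdir()):
        if path.is_file():
            raw.append(path.read_text(errors="replace").strip())
    return maskable(raw)


def redact_line(line, secret_values, mask="[REDACTED]"):
    """Return line with every occurrence of each secret value replaced by mask."""
    for value in secret_values:
        line = line.replace(value, mask)
    return line
```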
**RL5 — consolidate Nix code under `nix/`** (operator item, plan §7). `git mv modules nix/modules`,
`git mv hosts nix/hosts`; flake.nix/flake.lock stay at root (`#cc-ci` unchanged); only flake's
internal configuration.nix path + the moved modules' root-relative refs changed (`../X`→`../../X`).
Built on cc-ci → toplevel `8i3jcad9…` **byte-identical to the pre-move build** (content-addressed;
module .nix not in the runtime closure). Living docs + `.drone.yml` comment updated to `nix/…`.

**W2 — switched canonical cc-ci to the cleaned+RL5 closure** so `build == running` (required before
RL3: a fresh clone builds `8i3jcad9`; running had to match or the byte-identical-to-running check
would fail). Re-synced `/root/cc-ci` to HEAD, `nixos-rebuild switch --flake 'path:/root/cc-ci#cc-ci'`:
```
stopping units: deploy-bridge.service, deploy-dashboard.service
sops-install-secrets: Imported …ssh_host_ed25519_key as age key (age1h90utdz…)
starting units: deploy-bridge.service, deploy-dashboard.service
```
Post-switch health (all green):
- `readlink /run/current-system` → `8i3jcad9mrr01558lqckpi26nxn2ra3m-…` (== fresh-clone build; was
  `cqym8knjg7…` pre-format).
- `systemctl is-system-running` → `running`, **0 failed**. deploy-bridge/deploy-dashboard `active`.
- 5 stacks up (backups, ccci-bridge, ccci-dashboard, drone, traefik); `ccci-bridge_app` +
  `ccci-dashboard_app` 1/1 with NEW content-hash image tags (reformatted source redeployed).
- Public via SOCKS proxy → gateway → cc-ci: `https://ci.commoninternet.net/` → **200**
  (`<title>cc-ci — Co-op Cloud recipe CI</title>`); `/badge/custom-html.svg` → **200**.

Net: RL1 PASS, RL2 clean, RL4 docs landed (README lint section + architecture.md `nix/` layout),
RL5 done + healthy, running==build==`8i3jcad9`. Remaining for DONE: **RL3** (Adversary cold D1–D10
re-verify, now also covering the RL5 byte-identical rebuild) and **RL6** (coordinated machine-docs/
move — LAST, with orchestrator lockstep). Claiming the RL3 gate.

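The `build == running` check used in this entry amounts to comparing the hash component of two Nix store paths: the freshly built toplevel and the `readlink /run/current-system` target. A small sketch, with hypothetical function names and made-up example paths (not the real `8i3jcad9…` path, whose name suffix is elided above):

```python
import re

# /nix/store/<32-char hash>-<name>; the character class is deliberately loose.
STORE_PATH = re.compile(r"^/nix/store/([0-9a-z]{32})-")


def store_hash(path):
    """Return the 32-character hash component of a Nix store path."""
    match = STORE_PATH.match(path.strip())
    if match is None:
        raise ValueError(f"not a Nix store path: {path!r}")
    return match.group(1)


def build_matches_running(built_path, running_path):
    """True when the built toplevel and the running system share a store hash."""
    return store_hash(built_path) == store_hash(running_path)


if __name__ == "__main__":
    # Made-up paths for illustration only.
    built = "/nix/store/" + "a" * 32 + "-example-system"
    running = "/nix/store/" + "b" * 32 + "-example-system"
    print(build_matches_running(built, built))    # same path on both sides
    print(build_matches_running(built, running))  # different hashes
```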
## 2026-05-27 — push-webhook diagnostic (the RL1 "future commits stay clean" advisory)

Timeboxed root-cause on why pushes don't auto-create a Drone lint build. Fired Gitea's webhook test
for the Drone hook (211) while tailing the Drone server logs:
- `POST /repos/recipe-maintainers/cc-ci/hooks/211/tests` → Gitea returns **204** (accepted).
- `docker service logs --since 20s drone_…_app` → **NOTHING** — no inbound request logged at all.

So the delivery `git.autonomic.zone (Gitea) → drone.ci.commoninternet.net (public gateway) → cc-ci`
isn't reaching Drone. This is a **gateway/network reachability** condition, NOT a Drone-side config
I can fix — and per §9 the gateway is operator-managed (not ours to reconfigure). Leaving it as the
documented pre-existing advisory (hook `last_status: None`, §4.1). Impact is limited to cc-ci's OWN
self-test/lint pipeline auto-firing; **recipe-CI triggering is unaffected** — the comment-bridge
polls Gitea *outbound* (cc-ci → git.autonomic.zone, the reliable direction), which is the plan's
primary trigger (§4.1). The lint stage is wired + proven green via its exact command; manual/API
Drone builds work. Not expanding scope to re-engineer the inbound path (bounded pass).

## 2026-05-27 — RL3 FULL D1–D10 PASS (Adversary cold). Only RL6 (coordinated) left.

Adversary logged **RL3 PASS** (REVIEW-1b): all D1–D10 re-verified cold on the cleaned+RL5
byte-identical closure (`8i3jcad9`==running==fresh-clone build), fresh <24h evidence, nothing
weakened. Highlights: D1 trigger 20s/8s; D2 install/upgrade/backup green (upgrade actually ran, not
skipped) on custom-html + keycloak; D6 leak test 0 hits (8/8 infra + cert/key + generated keycloak
admin pw absent from logs/dashboard); D8 fresh-recursive-clone rebuild == running; D10 = 2 fresh
category runs (#151 custom-html, #152 keycloak) + carry-forward of the Phase-1 Adversary-verified
6/6 set (byte-identical harness/test/closure). Cardinal-rule PASS. **RL1–RL5 Adversary-PASS, no open
findings, NO VETO.**

→ Flagged the orchestrator (STATUS-1b) that I'm **ready for the RL6 coordinated cutover**: it updates
`launch.sh` to `machine-docs/` paths + restarts the watchdog; on its signal I `git mv`
STATUS*/BACKLOG*/JOURNAL*/DECISIONS.md into `machine-docs/` (README stays root), the Adversary moves
REVIEW*, I fix the only in-repo refs (README Loop-state + docs/install.md:15), Adversary re-verifies,
then I write `## DONE`. Holding all root protocol files in place until that signal (moving them early
breaks the live watchdog). Loop continues; not idling on a long sleep — short fallback while awaiting
the orchestrator go-ahead.

## 2026-05-27 — RL6 PASS → Phase 1b DONE

Adversary logged **RL6 PASS** + **FINAL SIGN-OFF: all RL1–RL6 Adversary-PASS, NO VETO** (it moved its
own REVIEW*.md → machine-docs/, re-verified refs + the watchdog `resolve_state` handoff survived the
lockstep cutover). No open `[adversary]` findings; advisories → IDEAS + the documented push-webhook one.

DONE-handshake conditions (plan §6.1) met: a <24h Adversary PASS for every RL1–RL6 + the full cold
D1–D10, no standing `## VETO`. Final Builder health: cc-ci `running`/0-failed, toplevel
`8i3jcad9mrr01558lqckpi26nxn2ra3m` == fresh-clone build (build==running, byte-identical), 5 stacks up,
public `https://ci.commoninternet.net/` → 200. Wrote `## DONE` to machine-docs/STATUS-1b.md.

**Phase 1b is genuinely DONE.** The foundation is now: formatted + lint-clean (CI-enforced via the
`.drone.yml` lint stage), all Nix code under `nix/` (flake at root, `#cc-ci` unchanged), multi-agent
protocol files under `machine-docs/`, and every Phase-1 D1–D10 re-verified cold on the cleaned closure
with nothing weakened. Builder loop terminating.