cc-ci/machine-docs/JOURNAL-3.md
autonomic-bot 14aa785f55
journal(3): U3 live-demo start — Drone DB reset discovered, repo reactivated; validating pipeline (build #1 running)
2026-05-31 09:37:21 +00:00


# Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.
## 2026-05-31T05:41Z — Phase-3 bootstrap + orientation
Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9. Oriented on the existing
Phase-1/2 artifacts I'll extend:
- `runner/run_recipe_ci.py`: orchestrates deploy-once → per-tier (install/upgrade/backup/restore/custom),
produces an in-memory `results` dict `{tier: 'pass'|'fail'|'skip'}` printed to Drone logs. **No
results.json, no level, no screenshot today.** Also tracks deploy-count (DG4.1), deps/SSO readiness
(`sso_dep_unverified` → F2-11), teardown errors.
- `bridge/bridge.py`: posts a text PR comment with the Drone run URL; `watch_and_reflect` edits it to
✅/❌ on completion. No image/badge/level.
- `dashboard/dashboard.py`: stdlib HTTP service (swarm OCI image, Nix-built) that polls the **Drone API
only** and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level).
Runs as a container with **no host volume mounts** — relevant for artifact hosting (U0.4).
Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:
- L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass;
L5 SSO/integration (requires_deps tests actually ran + passed — `deps_ready` and not
`sso_dep_unverified`); L6 recipe-local tests pass (D4 — discovered repo-local overlay/custom).
- Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is
genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT block the
climb but caps with a recorded reason ("L4 — no integration surface" etc.) for fairness (§4.1 L5).
- Invariants surfaced as flags not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok),
no-secret-leak ✔.
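
Sketch of the pure `level()` idea. This is hypothetical and not the contents of `runner/harness/level.py`. It assumes strict capping: an N/A rung caps exactly like a FAIL, just with a different reason. It also assumes the per-tier `results` dict uses `'pass'|'fail'|'skip'`. The `integration` and `recipe_local` keys are made-up names for the L5/L6 signals. The caller would be expected to derive `integration` from `deps_ready` / `sso_dep_unverified`.

```python
"""Hypothetical sketch of a pure gap-caps level() function (no I/O)."""
from typing import Dict, List, Optional, Tuple

# Assumed tier-result vocabulary: 'pass' | 'fail' | 'skip' (skip means N/A).
Results = Dict[str, str]

# (rung, label, tiers that must all pass); 'integration' and 'recipe_local'
# are assumed key names, not confirmed harness fields.
LADDER: List[Tuple[int, str, List[str]]] = [
    (1, "install", ["install"]),
    (2, "upgrade", ["upgrade"]),
    (3, "backup/restore", ["backup", "restore"]),
    (4, "functional", ["custom"]),
    (5, "integration", ["integration"]),
    (6, "recipe-local", ["recipe_local"]),
]


def _rung_state(results: Results, tiers: List[str]) -> str:
    """Collapse one or more tier results into a single rung state."""
    states = [results.get(t, "skip") for t in tiers]
    if any(s == "fail" for s in states):
        return "fail"
    if all(s == "pass" for s in states):
        return "pass"
    return "na"  # not fully passed and nothing failed: treat as N/A


def level(results: Results) -> Tuple[int, Optional[str]]:
    """Return (level, cap_reason): the highest L such that every rung <= L passed."""
    reached = 0
    for rung, label, tiers in LADDER:
        state = _rung_state(results, tiers)
        if state != "pass":
            # Strict capping: N/A stops the climb just like FAIL, with its own reason.
            reason = "N/A" if state == "na" else "FAIL"
            return reached, f"L{rung} {label} {reason}"
        reached = rung
    return reached, None
```

With install+upgrade passing and backup/restore skipped, this returns `(2, "L3 backup/restore N/A")`. With all five existing tiers passing and no integration signal, it returns `(4, "L5 integration N/A")`. A failed install yields `(0, "L1 install FAIL")`.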
The Adversary is live (REVIEW-3 @05:42Z). It flagged the Phase-2-DONE prerequisite but is not treating it
as a P3 blocker, and the operator kicked Phase 3 off manually. Proceeding.
### Plan for U0 (foundation)
1. Pure `level()` function in a new `runner/harness/level.py` — unit-testable (no I/O), so I can prove
"L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is
the load-bearing logic; everything else (card, badge, dashboard) just *renders* what it returns.
2. Capture per-test detail: run each tier's pytest with `--junitxml` to a run-scoped dir, parse the
XML (stdlib `xml.etree`) into per-test rows {name, status, ms}. Aggregate per stage.
3. `run_recipe_ci.py` assembles `results.json` {recipe, version, pr, ref, run_id, stages[], level,
level_cap_reason, flags} and writes it to the artifact dir — wrapped so a failure here NEVER changes
the run's exit code (R7: cosmetics never block).
4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
`/runs/<id>/...`. Decide details + record in DECISIONS.
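
Step 2 could look roughly like this: a stdlib-only parser that turns a pytest `--junitxml` file into `{name, status, ms}` rows, plus a per-stage tally. It assumes pytest's usual JUnit layout. In that layout, `<testcase>` elements carry `name` and `time` (in seconds), with `<failure>`, `<error>` or `<skipped>` children. `parse_junit` and `stage_summary` are illustrative names, not the harness's API.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_junit(xml_text: str) -> List[Dict[str, object]]:
    """Turn pytest --junitxml output into per-test rows {name, status, ms}."""
    root = ET.fromstring(xml_text)
    rows: List[Dict[str, object]] = []
    # iter() walks the whole tree, so a <testsuites> wrapper or a bare
    # <testsuite> root are both handled.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        seconds = float(case.get("time", "0") or 0)
        rows.append({"name": case.get("name", "?"), "status": status, "ms": round(seconds * 1000)})
    return rows


def stage_summary(rows: List[Dict[str, object]]) -> Dict[str, int]:
    """Aggregate rows into pass/fail/skip counts for one stage."""
    counts = {"pass": 0, "fail": 0, "skip": 0}
    for row in rows:
        counts[str(row["status"])] += 1
    return counts
```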
## 2026-05-31T06:00Z — U0 complete + CLAIMED
Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding
risk the Adversary flagged at df54693) produces correct levels:
- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass,
backup/restore skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**,
cap "L5 integration N/A". Proves a full clean climb with no SSO surface caps at L4.
Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.
Design notes / WHY:
- Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle
rungs" because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal
guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear
reason rather than shown as L4 — understating is safe, overstating is the cardinal FAIL.
- Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker
plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). The functional-vs-SSO split inside
the custom tier is conservative. Since we can't cheaply tell the two apart, a custom FAIL fails the
functional rung (capping the level at L3), which can never inflate.
- results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is
logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is
the authority (U5).
- "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but
ugly for the card; will prefer the tested version tag for display in U2.
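
The R7 rule in these notes is: decide `overall` first, then run every cosmetic step inside its own swallow-and-log block. It might be sketched as follows. `emit_best_effort`, `write_results` and `finish` are hypothetical helpers. Serialising before opening the file, then renaming into place, is an added assumption. It means a failed dump never leaves a half-written results.json.

```python
import json
import logging
import os
from typing import Callable, Dict

log = logging.getLogger("cc-ci.results")


def emit_best_effort(step: Callable[[], None], what: str) -> bool:
    """Run one cosmetic step; log and swallow any exception (R7)."""
    try:
        step()
        return True
    except Exception:
        log.exception("cosmetic step %r failed; verdict unchanged", what)
        return False


def write_results(path: str, data: Dict) -> None:
    # Serialise first: a non-JSON value raises here, before the disk is touched.
    text = json.dumps(data, indent=2, sort_keys=True)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
    os.replace(tmp, path)  # readers never observe a partially written file


def finish(overall: int, artifact_dir: str, data: Dict) -> int:
    """`overall` is decided before this call; nothing below can change it."""
    results_path = os.path.join(artifact_dir, "results.json")
    emit_best_effort(lambda: write_results(results_path, data), "results.json")
    # A card/badge renderer would get its own, separate emit_best_effort block here.
    return overall
```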
Pre-existing repo lint RED (94 files to reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell):
not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better
than found (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).
## 2026-05-31T06:50Z — U2 render-path de-risked headless on cc-ci (parked at U0 gate)
While U0 is CLAIMED awaiting the Adversary (its cold runs adv-cht=L2 / adv-uk=L4 reproduced my
claimed levels exactly @06:06/06:09 — swarm clean, no orphans), I kept the unblocked U2 render path
moving. Ran a real headless Playwright PNG render on cc-ci of the pure `harness.card` renderers from
two fixtures (a passing L4 uptime-kuma and a failing L0 custom-html-tiny):

    cc-ci-run /tmp/smoke_card.py (renders render_card_html → render_card_png + level_badge_svg)
    pass: png size=119765 badge svg=342B
    fail: png size=56353 badge svg=342B

Pulled both PNGs back and eyeballed them:
- **pass card** — level 4 in a yellow-green badge, full per-stage/per-test ✔ rows with PASS labels,
inline sunflower renders, `clean teardown` + `no secret leak` flags green. Fonts clean (no tofu).
- **fail card** — level 0 in a red badge, install FAIL row, `no screenshot` placeholder shown.
- **No inflation:** the fail card honestly shows L0/red/FAIL; the card computes nothing, it reports
the dict verbatim (cardinal guardrail upheld at the render layer).
This proves the U2 render path (HTML→PNG headless) works on the real cc-ci browser for both pass and
fail runs — the U2 acceptance shape — *before* I wire it into run_recipe_ci.py (which I will not do
until U0 PASSes, to avoid rework if the schema changes).
WIRING CONTRACT noted for U1/U2: the broken-image icon seen on the pass fixture appears only because the
fixture set `screenshot:"screenshot.png"` with no file present. The wiring MUST set
`data["screenshot"]` truthy ONLY when the captured PNG actually exists (screenshot.capture returns
None on failure). Otherwise the card's `show_shot` gate falls back to the `no screenshot` placeholder,
as the fail fixture already proves. No renderer change needed.
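
The wiring contract amounts to a few lines. This sketch assumes `data` is the results dict and that the capture step returns either a filename relative to the run's artifact dir or None. `attach_screenshot` and its injectable `exists` check are hypothetical. `show_shot` here only mirrors the card's gate as described above.

```python
import os
from typing import Callable, Dict, Optional


def attach_screenshot(
    data: Dict,
    artifact_dir: str,
    captured: Optional[str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> Dict:
    """Set data["screenshot"] only when the captured PNG really exists on disk."""
    if captured and exists(os.path.join(artifact_dir, captured)):
        data["screenshot"] = captured
    else:
        # No key at all: the card shows its "no screenshot" placeholder
        # instead of a broken <img>.
        data.pop("screenshot", None)
    return data


def show_shot(data: Dict) -> bool:
    """Mirror of the card's gate: a truthy screenshot field means embed the image."""
    return bool(data.get("screenshot"))
```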
Not claiming U2 — still parked at the U0 gate per §6.1 (no advance past a gate without its PASS).
## 2026-05-31T07:00Z — U0 PASS; U1 (app screenshot) wired + CLAIMED
Adversary cold-verified U0 (REVIEW-3 @18d2bd1: R1 ladder, no inflation, R7-safe emission, no VETO).
Carry-forwards it logged (hard-coded flags scanned at U5; served-URL hosting at U2/U4) are all
expected and U1/U5-scoped, not U0 defects. Proceeded past U0 to U1.
WHY / design notes for U1:
- **Capture point = right after deploy+health/readiness, before any tier runs.** Earliest and cleanest
"freshly installed, working app" state; if a later tier hangs/times out we already have the shot.
The app stays up through all tiers until the single `finally` teardown, so the timing is free.
- **Placed OUTSIDE the deploy try/except**, guarded by `if deploy_ok`. Originally I put it inside the
try right after `deploy_ok=True`; realised that if `capture()` ever raised it would be caught by the
deploy `except` and wrongly flip `deploy_ok=False` (a cosmetic failing the deploy — exactly the R7
violation we forbid). Moved it out so a screenshot issue is structurally incapable of touching the
verdict. `capture()` is also internally all-swallowing, so it's belt-and-suspenders.
- **Secret-safety = landing page by default.** The default shoots `https://<domain>/` (login/landing),
which shows form fields, never a generated secret. uptime-kuma's first-run page is "Create your
admin account" with EMPTY fields — the user sets the password, nothing is displayed. Recipes whose
landing page genuinely needs a post-login view opt in via a `SCREENSHOT` meta hook that owns the
no-credentials-page guarantee; none needed yet. The harness NEVER auto-fills a setup wizard.
- **results.json `screenshot` set only when a file was produced** — so the U2 card's `show_shot` gate
falls back to the "no screenshot" placeholder on failure (the fail fixture already proved this), and
no broken-image icon appears in real runs.
- **Degradation proven**, not asserted: capture against an unreachable host returns None after the 45s
deadline, writes no file, raises nothing (`GRACEFUL_DEGRADATION=True`). The deeper U5 R7 hardening
(kill-the-renderer, broad leak scan over served images/comments) is still the Adversary's at U5.
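
The two structural choices above can be sketched together. First, capture only after a successful deploy, outside the deploy `try`. Second, make the capture swallow everything behind a deadline. `shoot` stands in for whatever drives the headless browser. `capture_best_effort` and `deploy_then_capture` are hypothetical names. One limitation of this sketch: a Python worker thread cannot be killed, so a hung renderer keeps running in the background. That gap is presumably what the U5 "kill-the-renderer" hardening addresses.

```python
import concurrent.futures as cf
import logging
from typing import Callable, Dict, Optional

log = logging.getLogger("cc-ci.screenshot")


def capture_best_effort(
    shoot: Callable[[], Optional[str]], deadline_s: float = 45.0
) -> Optional[str]:
    """Return the screenshot filename, or None on any error or timeout. Never raises."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(shoot).result(timeout=deadline_s)
    except Exception:  # covers the deadline timeout and anything shoot() raised
        log.warning("screenshot skipped", exc_info=True)
        return None
    finally:
        pool.shutdown(wait=False)  # don't block the run on a hung renderer


def deploy_then_capture(
    deploy: Callable[[], None], shoot: Callable[[], Optional[str]]
) -> Dict:
    deploy_ok = False
    try:
        deploy()
        deploy_ok = True
    except Exception:
        log.exception("deploy failed")
    # Deliberately outside the deploy try/except: a screenshot problem
    # cannot flip deploy_ok, so it can never change the verdict.
    shot = capture_best_effort(shoot) if deploy_ok else None
    return {"deploy_ok": deploy_ok, "screenshot": shot}
```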
Verification (all on cc-ci @5fa15d4):
- 38 phase-3 unit tests pass (incl. 4 test_screenshot pure-helper tests).
- uptime-kuma real install run → 30KB screenshot.png of the working UI (empty cred fields), results.json
`screenshot="screenshot.png"`, clean_teardown=true, no orphan service.
- unreachable-host capture → None, no file, no raise.
## 2026-05-31T07:03Z — U2 generation wired + card embeds the REAL screenshot (held, not claimed)
While parked at the U1 gate (claimed d7e812e, awaiting Adversary), kept unblocked U2 work in hand:
wired `card_mod` into run_recipe_ci.py (afe5e51) so each run renders `summary.html` → `summary.png` +
`badge.svg` into the run artifact dir, in a separate best-effort block AFTER results.json is written
(so a card failure can't even look like a results.json failure; both swallow → never touch `overall`,
R7). The card passes `screenshot_rel=data.get("screenshot")` so it embeds the real shot iff one exists.
Proved end-to-end against the REAL u1-uk-shot run data (results.json + screenshot.png): rendered
summary.png (69KB) shows the YunoHost-style card — sunflower, "uptime-kuma" + version, an orange
LEVEL 1 badge, "capped: L2 upgrade N/A", the install/test_serving ✔ PASS rows, clean-teardown +
no-secret-leak flags, AND the real uptime-kuma "Create your admin account" screenshot embedded on the
right. badge.svg 342B. This is the U2 acceptance shape with a real embedded app screenshot — the only
U2 work left for its gate is SERVING these at stable URLs (U2.3, dashboard bind-mount) + showing a
fail run. NOT claiming U2 — still gated behind U1's PASS.
## 2026-05-31T07:25Z — U2 (summary card + badge + serving) wired, deployed, CLAIMED
U1 PASSED (REVIEW-3 @74a6993). Built out U2 end-to-end and rolled the serving layer to production.
WHY / notable decisions:
- **Card generation placed AFTER results.json write, in its own best-effort block** (not the same
try as results.json) so a card-render failure can't masquerade as a results.json failure; both
swallow → never touch `overall` (R7).
- **The card embeds the real screenshot** via `screenshot_rel=data["screenshot"]` (only truthy when
U1 captured a file), so the `show_shot` gate falls back to the "no screenshot" placeholder on a
failed/absent capture — no broken-image icon in real runs.
- **Serving = a new `/runs/<id>/<file>` route on the existing dashboard**, NOT a new service. Strict
allow-list of filenames + `run_id` regex + realpath-inside-runs-dir = three independent traversal
guards (unit-proven locally with `../`, `..`, `/etc`, non-whitelisted names; live-proven on cc-ci).
Runs dir bind-mounted READ-ONLY (dashboard never writes run artifacts).
- **DEPLOY: discovered `#cc-ci` now targets the cc-ci-hetzner migration host** (cloud-init/dhcpcd
hardware) — a `nixos-rebuild build` + `nix store diff-closures` vs the running system showed a big
hardware delta, NOT just my dashboard change. So a full `switch` on the LIVE host would be
wrong/dangerous. Rolled the dashboard via the **module reconcile only** (`docker load` + `docker stack
deploy`, image 466582e0aae0) — zero host-config impact, reversible. Recorded the mechanism +
migration caveat in DECISIONS.md (Phase-3/U2) and warned the Adversary via ADVERSARY-INBOX. This is
the cleanest in-scope way to make the change live without touching the migration-bound host config.
- **Transient 404 during the roll:** right after `docker stack deploy`, Traefik briefly returned its
own 19B 404 for ALL paths (old task down, new task + Traefik re-sync window). Resolved on its own in
~25s → `/` 200, `/runs/...` 200. Noted so it isn't mistaken for a real outage.
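
Sketch of three independent guards for `/runs/<id>/<file>`, in stdlib Python like the dashboard. None of this is copied from `dashboard.py`. The allow-list holds only the four artifacts fetched in the verification below. The `run_id` pattern is an assumption, chosen to admit both `u1-uk-shot` and Drone build numbers. `RUNS_DIR` is a placeholder path.

```python
import os
import re
from typing import Optional

RUNS_DIR = "/srv/runs"  # placeholder; the real bind-mount path may differ
ALLOWED = frozenset({"summary.png", "screenshot.png", "badge.svg", "results.json"})
RUN_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def resolve_artifact(url_path: str, runs_dir: str = RUNS_DIR) -> Optional[str]:
    """Map '/runs/<id>/<file>' to a filesystem path, or None (serve a 404).

    Assumes any query string has already been stripped from url_path.
    """
    parts = url_path.split("/")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "runs":
        return None
    run_id, name = parts[2], parts[3]
    if name not in ALLOWED:  # guard 1: exact filename allow-list
        return None
    if not RUN_ID_RE.fullmatch(run_id):  # guard 2: no dots, no slashes
        return None
    root = os.path.realpath(runs_dir)
    target = os.path.realpath(os.path.join(root, run_id, name))
    if os.path.commonpath([root, target]) != root:  # guard 3: symlinks can't escape
        return None
    return target
```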
Verification (live, post-roll):
- `https://ci.commoninternet.net/runs/u1-uk-shot/summary.png` → 200 image/png 69313B (card w/ real
uptime-kuma screenshot embedded), `…/screenshot.png` 200 30858B, `…/badge.svg` 200, `…/results.json`
200. Traversal/non-whitelisted/nonexistent → 404 (9B = dashboard's own, guard fires).
- 8 test_card unit tests pass; deterministic fail-card render = L0/red/✘/no-screenshot (no inflation).
- `/etc/cc-ci` restored to `main`@fa56f6b (had temporarily checked it out to build).
## 2026-05-31T09:35Z — U3 live demo: discovered Drone DB reset (repo inactive), reactivated
Resuming U3 (bridge code already built+deployed @9a47aa2; deployed bridge image tag `6377f9571f3b`
== sha256(bridge.py), confirmed; dashboard do_HEAD live → A3-1 CLOSED by Adversary @8807240).
To run the U3 live demo (`!testme` → image-forward PR comment) I first validated the trigger path and
hit a real blocker: the bridge log showed `drone trigger failed 404`, and
`GET /api/repos/recipe-maintainers/cc-ci` → 404. Diagnosis: the Drone admin **token is valid** (`/api/user` → 200,
autonomic-bot admin=true) but the **repo was inactive** — Drone's DB was reset (the Hetzner migration;
`created`/`synced` timestamps are all recent ~1780220000). In Phase 1 the repo was activated once via
`POST /api/repos/recipe-maintainers/cc-ci` (JOURNAL.md:258); that activation is NOT Nix-declared
(drone.nix only PATCHes the timeout, which itself assumes the repo is already active), so a DB reset
silently de-registers it and the bridge can't trigger.
Action (in-scope reconfig of my own CI, reversible): `POST /api/user/repos?async=false` (sync, 200) →
`POST /api/repos/recipe-maintainers/cc-ci`**active=true**, config_path=.drone.yml, timeout=60. The
`trusted` flag stays false — irrelevant for the `type: exec` pipeline (trusted only gates privileged
*docker* pipelines). Validated by triggering a custom build directly (same params the bridge sends):
build **#1 → running** within ~10s (exec runner picked it up). Watching it produce /runs/1/ artifacts.
NOTE for hardening backlog (U5/operator): repo activation should be folded into the drone reconcile so
a future DB reset self-heals (`POST /api/repos/<slug>` before the timeout PATCH). Filing in BACKLOG-3.
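
The suggested self-healing order is sync, then activate, then PATCH the timeout. It could be expressed as a small reconcile step against the Drone REST API. This sketch rests on several assumptions. The server URL is a placeholder. `reconcile_requests` and `apply_requests` are hypothetical helpers, not anything in `drone.nix`. Re-POSTing an already-active repo is assumed to be harmless; check that against the deployed Drone version before relying on it.

```python
import json
import urllib.request
from typing import List


def reconcile_requests(
    server: str, token: str, slug: str, timeout_min: int = 60
) -> List[urllib.request.Request]:
    """Build, in order: repo sync, repo activation, timeout PATCH."""
    auth = {"Authorization": f"Bearer {token}"}
    sync = urllib.request.Request(
        f"{server}/api/user/repos?async=false", method="POST", headers=auth
    )
    activate = urllib.request.Request(
        f"{server}/api/repos/{slug}", method="POST", headers=auth
    )
    set_timeout = urllib.request.Request(
        f"{server}/api/repos/{slug}",
        method="PATCH",
        headers={**auth, "Content-Type": "application/json"},
        data=json.dumps({"timeout": timeout_min}).encode(),
    )
    return [sync, activate, set_timeout]


def apply_requests(requests: List[urllib.request.Request]) -> None:
    """Send each request in order; urlopen raises HTTPError on 4xx/5xx."""
    for req in requests:
        with urllib.request.urlopen(req, timeout=30):
            pass
```

Folded into the reconcile, `apply_requests(reconcile_requests(server, token, "recipe-maintainers/cc-ci"))` would replace the bare timeout PATCH rather than run alongside it.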