# Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.
## 2026-05-31T05:41Z — Phase-3 bootstrap + orientation
Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9. Oriented on the existing
Phase-1/2 artifacts I'll extend:

- `runner/run_recipe_ci.py`: orchestrates deploy-once → per-tier (install/upgrade/backup/restore/custom),
  produces an in-memory `results` dict `{tier: 'pass'|'fail'|'skip'}` printed to Drone logs. **No
  results.json, no level, no screenshot today.** Also tracks deploy-count (DG4.1), deps/SSO readiness
  (`sso_dep_unverified` → F2-11), teardown errors.
- `bridge/bridge.py`: posts a text PR comment with the Drone run URL; `watch_and_reflect` edits it to
  ✅/❌ on completion. No image/badge/level.
- `dashboard/dashboard.py`: stdlib HTTP service (swarm OCI image, Nix-built) that polls the **Drone API
  only** and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level).
  Runs as a container with **no host volume mounts**, which matters for artifact hosting (U0.4).

Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:

- L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass;
  L5 SSO/integration (requires_deps tests actually ran + passed: `deps_ready` and not
  `sso_dep_unverified`); L6 recipe-local tests pass (D4: discovered repo-local overlay/custom).
- Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is
  genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT be scored as a
  failure, but it still caps the level, with a recorded reason ("L4 — no integration surface" etc.), for
  fairness (§4.1 L5).
- Invariants surfaced as flags, not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok),
  no-secret-leak ✔.
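Below is a minimal sketch of the gap-caps rule. It assumes the strict capping later adopted in the U0 notes, where an N/A rung caps exactly like a FAIL and only the recorded reason differs. `"skip"` is read as N/A. The tier keys `integration` and `recipe_local` are hypothetical stand-ins for the L5 signal (`deps_ready` and not `sso_dep_unverified`) and the L6 signal (recipe-local tests). The real `level()` may encode them differently.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical encoding: each rung lists the tier results it requires.
# Tier results are "pass" | "fail" | "skip" ("skip" = N/A); a missing tier counts as N/A.
RUNGS = [
    (1, "install", ("install",)),
    (2, "upgrade", ("upgrade",)),
    (3, "backup/restore", ("backup", "restore")),
    (4, "functional", ("custom",)),
    (5, "integration", ("integration",)),    # hypothetical key for the L5 deps/SSO signal
    (6, "recipe-local", ("recipe_local",)),  # hypothetical key for the L6 signal
]


def level(results: dict[str, str]) -> tuple[int, Optional[str]]:
    """Return (level, cap_reason): the highest rung L such that every rung <= L passed."""
    achieved = 0
    for n, name, tiers in RUNGS:
        statuses = [results.get(tier, "skip") for tier in tiers]
        if all(s == "pass" for s in statuses):
            achieved = n
            continue
        # Strict monotonic capping: the first non-passing rung stops the climb,
        # whether it failed or is N/A; only the recorded reason differs.
        why = "fail" if "fail" in statuses else "N/A"
        return achieved, f"L{n} {name} {why}"
    return achieved, None  # every rung passed: nothing caps the level
```

Consider the custom-html-tiny shape (install and upgrade pass, backup/restore skipped). The sketch returns `(2, "L3 backup/restore N/A")`. When all five existing tiers pass, it returns `(4, "L5 integration N/A")`.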
Adversary is live (REVIEW-3 @05:42Z). It flagged the Phase-2-DONE prerequisite but is not treating it as
a P3 blocker; the operator kicked Phase 3 off manually. Proceeding.
### Plan for U0 (foundation)
1. Pure `level()` function in a new `runner/harness/level.py`. It is unit-testable (no I/O), so I can prove
   "L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is
   the load-bearing logic; everything else (card, badge, dashboard) just *renders* what it returns.
2. Capture per-test detail: run each tier's pytest with `--junitxml` to a run-scoped dir, parse the
   XML (stdlib `xml.etree`) into per-test rows {name, status, ms}. Aggregate per stage.
3. `run_recipe_ci.py` assembles `results.json` {recipe, version, pr, ref, run_id, stages[], level,
   level_cap_reason, flags} and writes it to the artifact dir, wrapped so that a failure here NEVER
   changes the run's exit code (R7: cosmetics never block).
4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
   `/runs/<id>/...`. Decide details + record in DECISIONS.
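Here is a stdlib sketch of step 2. It assumes pytest's standard `--junitxml` layout: each `<testcase name=… time=…>` may carry a `<failure>`, `<error>` or `<skipped>` child. `parse_junit` and `stage_status` are hypothetical names. The stage roll-up rule (any fail → fail, else any pass → pass, else skip) is an assumption, not the plan's wording.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def parse_junit(xml_text: str) -> list[dict]:
    """Turn a pytest --junitxml report into per-test rows {name, status, ms}."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() finds <testcase> under <testsuites><testsuite> or under a bare <testsuite> root.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        seconds = float(case.get("time") or 0)  # pytest records seconds as a float string
        rows.append({"name": case.get("name", ""), "status": status, "ms": round(seconds * 1000)})
    return rows


def stage_status(rows: list[dict]) -> str:
    """Assumed roll-up: any fail -> fail; else any pass -> pass; else skip."""
    statuses = {row["status"] for row in rows}
    if "fail" in statuses:
        return "fail"
    if "pass" in statuses:
        return "pass"
    return "skip"
```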
## 2026-05-31T06:00Z — U0 complete + CLAIMED
Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm that the translation layer (the binding
risk the Adversary flagged at df54693) produces correct levels:

- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/restore
  skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**,
  cap "L5 integration N/A". Proves that a full clean climb with no SSO surface caps at L4.

Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.

Design notes / WHY:

- Chose STRICT monotonic capping (N/A caps like FAIL, with a distinct reason) over "N/A transparent for
  middle rungs", because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the
  cardinal guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a
  clear reason rather than shown as L4: understating is safe, overstating is the cardinal FAIL.
- Kept the LEVEL driven by tier results + deps signals (precise, already in hand) rather than per-test
  marker plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). The functional-vs-SSO split
  inside the custom tier is conservative: a custom FAIL fails the functional rung (caps at L3) because we
  can't cheaply tell the two apart, so it never inflates.
- results.json assembly + the narrow leak-scan are wrapped in try/except in main(), so any failure is
  logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is
  the authority (U5).
- The "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but
  ugly for the card; will prefer the tested version tag for display in U2.
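The following is a minimal sketch of the R7 emission pattern, which is also reused later for the U2 card. Each cosmetic step runs in its own best-effort block that logs and swallows errors, and the tier-derived exit code is returned untouched. `best_effort`, `write_results`, `render_card` and `finish` are hypothetical names. `render_card` deliberately crashes here to stand in for a broken renderer.

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("run_recipe_ci")


def best_effort(label, fn, *args):
    """Run a cosmetic step; log and swallow any exception so it can never change the verdict."""
    try:
        return fn(*args)
    except Exception:  # deliberately broad: cosmetics never block (R7)
        log.exception("%s failed (ignored; verdict unchanged)", label)
        return None


def write_results(artifact_dir: str, data: dict) -> str:
    path = os.path.join(artifact_dir, "results.json")
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2)
    return path


def render_card(artifact_dir: str, data: dict) -> None:
    raise RuntimeError("renderer crashed")  # stand-in for a broken card renderer


def finish(overall: int, artifact_dir: str, data: dict) -> int:
    # Separate blocks, results.json first: a card failure cannot look like a results.json failure.
    best_effort("results.json", write_results, artifact_dir, data)
    best_effort("card", render_card, artifact_dir, data)
    return overall  # the exit code comes from the tiers alone


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with tempfile.TemporaryDirectory() as tmp:
        # The card crash is logged with a traceback, but the run keeps its exit code.
        print("exit code:", finish(0, tmp, {"level": 4}))
```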
Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on the CI devshell):
not mine; flagged in STATUS for the operator. My new files are clean, and run_recipe_ci.py is left better
than I found it (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).
## 2026-05-31T06:50Z — U2 render-path de-risked headless on cc-ci (parked at U0 gate)
While U0 is CLAIMED and awaiting the Adversary, I kept the unblocked U2 render path moving. (Its cold
runs adv-cht=L2 / adv-uk=L4 reproduced my claimed levels exactly @06:06/06:09; swarm clean, no
orphans.) Ran a real headless Playwright PNG render on cc-ci of the pure `harness.card` renderers from
two fixtures (a passing L4 uptime-kuma and a failing L0 custom-html-tiny):

    cc-ci-run /tmp/smoke_card.py   (renders render_card_html → render_card_png + level_badge_svg)
    pass: png size=119765 badge svg=342B
    fail: png size=56353 badge svg=342B

Pulled both PNGs back and eyeballed them:

- **pass card**: level 4 in a yellow-green badge, full per-stage/per-test ✔ rows with PASS labels,
  inline sunflower renders, `clean teardown` + `no secret leak` flags green. Fonts clean (no tofu).
- **fail card**: level 0 in a red badge, install FAIL row, `no screenshot` placeholder shown.
- **No inflation:** the fail card honestly shows L0/red/FAIL; the card computes nothing, it reports
  the dict verbatim (cardinal guardrail upheld at the render layer).

This proves that the U2 render path (HTML→PNG headless) works on the real cc-ci browser for both pass and
fail runs, which is the U2 acceptance shape. It does so *before* I wire it into run_recipe_ci.py, which I
will not do until U0 PASSes, to avoid rework if the schema changes.

WIRING CONTRACT noted for U1/U2: the broken-image icon seen on the pass fixture appears only because the
fixture set `screenshot:"screenshot.png"` with no file present. The wiring MUST set
`data["screenshot"]` truthy ONLY when the captured PNG actually exists (screenshot.capture returns
None on failure). When it is not set, the card's `show_shot` gate falls back to the `no screenshot`
placeholder, as the fail fixture already proves. No renderer change needed.
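That wiring contract can be sketched as follows, assuming the capture step hands back a file path or `None`. The key is written only for a file that exists, and the card-side gate keys off its truthiness. `attach_screenshot` and `screenshot_cell` are hypothetical helpers. The real gate is the card's `show_shot`, whose markup is not reproduced here.

```python
import html
import os


def attach_screenshot(data: dict, artifact_dir: str, shot_path) -> dict:
    """Set data["screenshot"] (relative to the artifact dir) only if the PNG really exists."""
    if shot_path and os.path.isfile(shot_path):
        data["screenshot"] = os.path.relpath(shot_path, artifact_dir)
    else:
        data.pop("screenshot", None)  # never leave a dangling filename behind
    return data


def screenshot_cell(data: dict) -> str:
    """Card side: embed the image only when the key is truthy, else show the placeholder."""
    show_shot = bool(data.get("screenshot"))
    if show_shot:
        return f'<img src="{html.escape(data["screenshot"])}" alt="app screenshot">'
    return '<div class="no-shot">no screenshot</div>'
```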
Not claiming U2: still parked at the U0 gate per §6.1 (no advancing past a gate without its PASS).
## 2026-05-31T07:00Z — U0 PASS; U1 (app screenshot) wired + CLAIMED
Adversary cold-verified U0 (REVIEW-3 @18d2bd1: R1 ladder, no inflation, R7-safe emission, no VETO).
The carry-forwards it logged (hard-coded flags scanned at U5; served-URL hosting at U2/U4) are all
expected and U1/U5-scoped, not U0 defects. Proceeded past U0 to U1.

WHY / design notes for U1:

- **Capture point = right after deploy+health/readiness, before any tier runs.** This is the earliest and
  cleanest "freshly installed, working app" state; if a later tier hangs or times out, we already have the
  shot. The app stays up through all tiers until the single `finally` teardown, so the timing costs nothing.
- **Placed OUTSIDE the deploy try/except**, guarded by `if deploy_ok`. Originally I put it inside the
  try, right after `deploy_ok=True`, then realised that if `capture()` ever raised, the exception would be
  caught by the deploy `except` and wrongly flip `deploy_ok=False`. That would be a cosmetic failing the
  deploy, exactly the R7 violation we forbid. Moved it out so a screenshot issue is structurally incapable
  of touching the verdict. `capture()` is also internally all-swallowing, so it's belt-and-suspenders.
- **Secret-safety = landing page by default.** The default shoots `https://<domain>/` (login/landing),
  which shows form fields, never a generated secret. uptime-kuma's first-run page is "Create your
  admin account" with EMPTY fields: the user sets the password, and nothing is displayed. Recipes that
  genuinely need a post-login view instead of the landing page opt in via a `SCREENSHOT` meta hook that
  owns the no-credentials-page guarantee; none needed yet. The harness NEVER auto-fills a setup wizard.
- **results.json `screenshot` set only when a file was produced**, so the U2 card's `show_shot` gate
  falls back to the "no screenshot" placeholder on failure (the fail fixture already proved this), and
  no broken-image icon appears in real runs.
- **Degradation proven**, not asserted: capture against an unreachable host returns None after the 45s
  deadline, writes no file, and raises nothing (`GRACEFUL_DEGRADATION=True`). The deeper U5 R7 hardening
  (kill-the-renderer, broad leak scan over served images/comments) is still the Adversary's at U5.
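This sketch shows the U1 control flow under stated assumptions. `deploy`, `capture` and the tier callables are injected, and the hypothetical `swallowing` wrapper models `capture()`'s internal all-swallowing guarantee (the real `screenshot.capture` returns `None` on failure). The point is placement: the capture sits outside the deploy `try`, guarded by `deploy_ok`, so no screenshot outcome can reach the deploy `except`. Skipping every tier after a failed deploy is a simplification.

```python
import logging

log = logging.getLogger("run_recipe_ci")


def swallowing(fn):
    """Model capture()'s internal guarantee: any exception becomes None."""
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            log.exception("screenshot failed (ignored)")
            return None
    return wrapper


def run(deploy, capture, tiers):
    """Deploy once, screenshot the fresh app, then run the tiers."""
    deploy_ok = False
    try:
        deploy()  # deploy + health/readiness wait
        deploy_ok = True
    except Exception:
        deploy_ok = False  # this except is why capture() must not live inside the try
        log.exception("deploy failed")

    # OUTSIDE the deploy try: nothing capture() does can flip deploy_ok.
    screenshot = capture() if deploy_ok else None

    results = {name: (tier() if deploy_ok else "skip") for name, tier in tiers}
    return deploy_ok, screenshot, results
```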
Verification (all on cc-ci @5fa15d4):

- 38 phase-3 unit tests pass (incl. 4 test_screenshot pure-helper tests).
- uptime-kuma real install run → 30KB screenshot.png of the working UI (empty cred fields), results.json
  `screenshot="screenshot.png"`, clean_teardown=true, no orphan service.
- unreachable-host capture → None, no file, no raise.
## 2026-05-31T07:03Z — U2 generation wired + card embeds the REAL screenshot (held, not claimed)
While parked at the U1 gate (claimed d7e812e, awaiting the Adversary), I kept the unblocked U2 work moving.
I wired `card_mod` into run_recipe_ci.py (afe5e51) so each run renders `summary.html`→`summary.png` +
`badge.svg` into the run artifact dir. This happens in a separate best-effort block AFTER results.json is
written, so a card failure can't even look like a results.json failure. Both blocks swallow errors, so
neither touches `overall` (R7). The card passes `screenshot_rel=data.get("screenshot")`, so it embeds the
real shot iff one exists.

Proved end-to-end against the REAL u1-uk-shot run data (results.json + screenshot.png). The rendered
summary.png (69KB) shows the YunoHost-style card:

- sunflower, "uptime-kuma" + version;
- an orange LEVEL 1 badge with "capped: L2 upgrade N/A";
- the install/test_serving ✔ PASS rows;
- clean-teardown + no-secret-leak flags;
- the real uptime-kuma "Create your admin account" screenshot embedded on the right.

badge.svg is 342B. This is the U2 acceptance shape with a real embedded app screenshot. The only U2 work
left for its gate is SERVING these at stable URLs (U2.3, dashboard bind-mount) + showing a fail run.
NOT claiming U2: still gated behind U1's PASS.
## 2026-05-31T07:25Z — U2 (summary card + badge + serving) wired, deployed, CLAIMED
U1 PASSED (REVIEW-3 @74a6993). Built out U2 end-to-end and rolled the serving layer to production.

WHY / notable decisions:

- **Card generation placed AFTER the results.json write, in its own best-effort block** (not the same
  try as results.json), so a card-render failure can't masquerade as a results.json failure. Both
  swallow errors, so neither touches `overall` (R7).
- **The card embeds the real screenshot** via `screenshot_rel=data.get("screenshot")` (truthy only when
  U1 captured a file), so the `show_shot` gate falls back to the "no screenshot" placeholder on a
  failed/absent capture; no broken-image icon in real runs.
- **Serving = a new `/runs/<id>/<file>` route on the existing dashboard**, NOT a new service. A strict
  allow-list of filenames + a `run_id` regex + realpath-inside-runs-dir = three independent traversal
  guards (unit-proven locally with `../`, `..`, `/etc` and non-whitelisted names; live-proven on cc-ci).
  The runs dir is bind-mounted READ-ONLY (the dashboard never writes run artifacts).
- **DEPLOY: discovered that `#cc-ci` now targets the cc-ci-hetzner migration host** (cloud-init/dhcpcd
  hardware). A `nixos-rebuild build` + `nix store diff-closures` against the running system showed a big
  hardware delta, NOT just my dashboard change, so a full `switch` on the LIVE host would be wrong and
  dangerous. I rolled the dashboard via the **module reconcile only** (`docker load` + `docker stack
  deploy`, image 466582e0aae0): zero host-config impact, reversible. Recorded the mechanism + migration
  caveat in DECISIONS.md (Phase-3/U2) and warned the Adversary via ADVERSARY-INBOX. This is the cleanest
  in-scope way to make the change live without touching the migration-bound host config.
- **Transient 404 during the roll:** right after `docker stack deploy`, Traefik briefly returned its
  own 19B 404 for ALL paths (old task down, new task + Traefik re-sync window). It resolved on its own in
  ~25s → `/` 200, `/runs/...` 200. Noted so it isn't mistaken for a real outage.
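The three traversal guards can be sketched as below. The sketch assumes a four-file allow-list (the names served in the post-roll verification) and a hypothetical `run_id` pattern that admits ids like `u1-uk-shot` and `1`. The dashboard's real regex and list may differ. `safe_path` only decides whether a path is allowed; the caller still serves a 404 when the file does not exist.

```python
import os
import re

# Assumed allow-list: the artifacts observed being served under /runs/<id>/.
ALLOWED = frozenset({"results.json", "summary.png", "screenshot.png", "badge.svg"})
# Hypothetical run_id shape: starts alphanumeric, then letters/digits/_/-; no dots or slashes.
RUN_ID = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def safe_path(runs_dir: str, run_id: str, name: str):
    """Resolve /runs/<run_id>/<name> to a filesystem path, or None (serve a 404)."""
    if name not in ALLOWED:  # guard 1: exact filename allow-list
        return None
    if not RUN_ID.fullmatch(run_id):  # guard 2: run_id shape
        return None
    root = os.path.realpath(runs_dir)
    path = os.path.realpath(os.path.join(root, run_id, name))
    if os.path.commonpath([root, path]) != root:  # guard 3: still inside runs dir after symlinks
        return None
    return path
```

Each guard alone rejects the `..`-style inputs tested above. Keeping all three means a mistake in one (say, a too-loose regex) is still caught by the others.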
Verification (live, post-roll):

- `https://ci.commoninternet.net/runs/u1-uk-shot/summary.png` → 200 image/png 69313B (card with the real
  uptime-kuma screenshot embedded), `…/screenshot.png` 200 30858B, `…/badge.svg` 200, `…/results.json`
  200. Traversal/non-whitelisted/nonexistent → 404 (9B, the dashboard's own 404 rather than Traefik's
  19B one, so the guard is what fires).
- 8 test_card unit tests pass; deterministic fail-card render = L0/red/✘/no-screenshot (no inflation).
- `/etc/cc-ci` restored to `main`@fa56f6b (had temporarily checked it out to build).
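Traefik answers with its own 404 for about 25s during the roll, so a post-roll check is better written as a poll than as a single probe. Below is a minimal sketch using HEAD requests, since the dashboard serves HEAD. `status_of` and `wait_for` are hypothetical helpers, and the 60s deadline and 5s interval are assumptions.

```python
import time
import urllib.error
import urllib.request


def status_of(url: str, timeout: float = 10.0) -> int:
    """HTTP status of a HEAD request; connection-level failures map to 0."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code
    except (urllib.error.URLError, OSError):
        return 0


def wait_for(url, expect=200, deadline_s=60.0, interval_s=5.0,
             probe=status_of, sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until it returns `expect` or the deadline passes; return the last status."""
    end = clock() + deadline_s
    status = probe(url)
    while status != expect and clock() < end:
        sleep(interval_s)
        status = probe(url)
    return status
```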
## 2026-05-31T09:35Z — U3 live demo: discovered Drone DB reset (repo inactive), reactivated
Resuming U3 (bridge code already built+deployed @9a47aa2; deployed bridge image tag `6377f9571f3b`
== sha256(bridge.py), confirmed; dashboard do_HEAD live → A3-1 CLOSED by Adversary @8807240).
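One way to reproduce the tag check is sketched below. It assumes the tag is the first 12 hex characters of the SHA-256 of `bridge.py`. That matches the 12-character `6377f9571f3b`, but how the Nix build actually derives or truncates the tag is not recorded here. `image_tag` and `tag_matches` are hypothetical names.

```python
import hashlib


def image_tag(source: bytes, length: int = 12) -> str:
    """Content-addressed tag: the first `length` hex chars of sha256(source)."""
    return hashlib.sha256(source).hexdigest()[:length]


def tag_matches(path: str, deployed_tag: str) -> bool:
    """True if the file at `path` hashes to the tag that is actually deployed."""
    with open(path, "rb") as fh:
        return image_tag(fh.read(), len(deployed_tag)) == deployed_tag
```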
To run the U3 live demo (`!testme` → image-forward PR comment), I first validated the trigger path and
hit a real blocker: the bridge log showed `drone trigger failed 404`, and
`GET /api/repos/recipe-maintainers/cc-ci` → 404. Diagnosis: the Drone admin **token is valid** (`/api/user`
→ 200, autonomic-bot admin=true) but the **repo was inactive**, because Drone's DB was reset (the Hetzner
migration; `created`/`synced` timestamps are all recent, ~1780220000). In Phase 1 the repo was activated
once via `POST /api/repos/recipe-maintainers/cc-ci` (JOURNAL.md:258). That activation is NOT Nix-declared:
drone.nix only PATCHes the timeout, which itself assumes the repo is already active. So a DB reset
silently de-registers the repo and the bridge can't trigger.
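The diagnosis above separates two failure modes that both surface as a trigger 404. Here is a sketch of that triage as a function of the two probe results (`GET /api/user` and `GET /api/repos/<owner>/<name>`). `diagnose` is hypothetical and its messages are illustrative.

```python
def diagnose(user_status: int, repo_status: int) -> str:
    """Triage a 'drone trigger failed 404' from two probes:
    GET /api/user (is the token good?) and GET /api/repos/<owner>/<name> (is the repo active?)."""
    if user_status in (401, 403):
        return "token invalid or lacking rights"
    if user_status != 200:
        return f"Drone API unhealthy (GET /api/user -> {user_status})"
    if repo_status == 404:
        # Valid token but unknown repo: the case seen after the Drone DB reset.
        return "repo not active in Drone (e.g. DB reset); re-activate it"
    if repo_status == 200:
        return "repo active; look elsewhere (pipeline config, build params)"
    return f"unexpected repo status {repo_status}"
```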
Action (in-scope reconfig of my own CI, reversible): `POST /api/user/repos?async=false` (sync, 200) →
`POST /api/repos/recipe-maintainers/cc-ci` → **active=true**, config_path=.drone.yml, timeout=60. The
`trusted` flag stays false; that is irrelevant for the `type: exec` pipeline, because trusted only gates
privileged *docker* pipelines. Validated by triggering a custom build directly (same params the bridge
sends): build **#1 → running** within ~10s (the exec runner picked it up). Watching it produce /runs/1/
artifacts.

NOTE for hardening backlog (U5/operator): repo activation should be folded into the drone reconcile so
a future DB reset self-heals (`POST /api/repos/<slug>` before the timeout PATCH). Filing in BACKLOG-3.
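Below is a sketch of the proposed self-healing reconcile. It assumes the reconcile first learns whether the repo is active (e.g. from `GET /api/repos/<slug>`). It then issues the same calls used by hand above, with activation strictly before the timeout PATCH. `reconcile_plan` and `to_request` are hypothetical names, the server URL and token are placeholders, and the requests are only built, not sent.

```python
import json
import urllib.request


def reconcile_plan(slug: str, timeout_min: int, active: bool):
    """Ordered Drone API calls; activation precedes the PATCH, which needs an active repo."""
    plan = []
    if not active:
        plan.append(("POST", "/api/user/repos?async=false", None))  # sync repo list from the forge
        plan.append(("POST", f"/api/repos/{slug}", None))            # (re-)activate the repo
    plan.append(("PATCH", f"/api/repos/{slug}", {"timeout": timeout_min}))
    return plan


def to_request(server: str, token: str, method: str, path: str, body):
    """Build (but do not send) an authenticated urllib request for one plan step."""
    data = None if body is None else json.dumps(body).encode()
    req = urllib.request.Request(server.rstrip("/") + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```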