recipe-maintainers/cc-ci

autonomic-bot 14aa785f55

journal(3): U3 live-demo start — Drone DB reset discovered, repo reactivated; validating pipeline (build #1 running)

2026-05-31 09:37:21 +00:00


Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.

2026-05-31T05:41Z — Phase-3 bootstrap + orientation

Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9. Oriented on the existing Phase-1/2 artifacts I'll extend:

runner/run_recipe_ci.py: orchestrates deploy-once → per-tier (install/upgrade/backup/restore/custom), produces an in-memory results dict {tier: 'pass'|'fail'|'skip'} printed to Drone logs. No results.json, no level, no screenshot today. Also tracks deploy-count (DG4.1), deps/SSO readiness (sso_dep_unverified → F2-11), teardown errors.
bridge/bridge.py: posts a text PR comment with the Drone run URL; watch_and_reflect edits it to ✅/❌ on completion. No image/badge/level.
dashboard/dashboard.py: stdlib HTTP service (swarm OCI image, Nix-built) that polls the Drone API only and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level). Runs as a container with no host volume mounts — relevant for artifact hosting (U0.4).

Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:

L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass; L5 SSO/integration (requires_deps tests actually ran + passed — deps_ready and not sso_dep_unverified); L6 recipe-local tests pass (D4 — discovered repo-local overlay/custom).
Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT block the climb but caps with a recorded reason ("L4 — no integration surface" etc.) for fairness (§4.1 L5).
Invariants surfaced as flags not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok), no-secret-leak ✔.

Adversary is live (REVIEW-3 @05:42Z), flagged the Phase-2-DONE prerequisite but is not treating it as a P3 blocker; operator kicked Phase 3 off manually. Proceeding.

Plan for U0 (foundation)

Pure level() function in a new runner/harness/level.py — unit-testable (no I/O), so I can prove "L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is the load-bearing logic; everything else (card, badge, dashboard) just renders what it returns.
Capture per-test detail: run each tier's pytest with --junitxml to a run-scoped dir, parse the XML (stdlib xml.etree) into per-test rows {name, status, ms}. Aggregate per stage.
run_recipe_ci.py assembles results.json {recipe, version, pr, ref, run_id, stages[], level, level_cap_reason, flags} and writes it to the artifact dir — wrapped so a failure here NEVER changes the run's exit code (R7: cosmetics never block).
Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve /runs/<id>/.... Decide details + record in DECISIONS.
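The per-test capture step in the plan above can be sketched as follows. This is a minimal illustration, not the harness code: the function name `parse_junit` and the sample report are hypothetical, and the only assumption about the input is the usual pytest `--junitxml` shape (`<testcase>` elements carrying `name` and `time`, with `<failure>`, `<error>` or `<skipped>` children).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape pytest --junitxml writes.
SAMPLE = """<testsuites>
  <testsuite name="pytest" tests="3">
    <testcase classname="tests.test_install" name="test_serving" time="0.120"/>
    <testcase classname="tests.test_install" name="test_login" time="1.5">
      <failure message="assert 500 == 200">traceback...</failure>
    </testcase>
    <testcase classname="tests.test_install" name="test_sso" time="0.0">
      <skipped message="no deps"/>
    </testcase>
  </testsuite>
</testsuites>"""


def parse_junit(xml_text):
    """Flatten a JUnit XML report into rows of {name, status, ms}."""
    rows = []
    root = ET.fromstring(xml_text)
    # iter() finds <testcase> whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        # JUnit reports seconds; the rows carry integer milliseconds.
        ms = int(round(float(case.get("time", "0")) * 1000))
        rows.append({"name": case.get("name", ""), "status": status, "ms": ms})
    return rows


rows = parse_junit(SAMPLE)
```

Treating `<error>` the same as `<failure>` is a choice made for this sketch; it keeps a collection or fixture error from being displayed as a pass.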

2026-05-31T06:00Z — U0 complete + CLAIMED

Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding risk the Adversary flagged at df54693) produces correct levels:

custom-html-tiny (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/restore skip→N/A, no custom → level=2, cap "L3 backup/restore N/A". Proves gap-caps on real data.
uptime-kuma (backup-capable, 3 functional tests, no deps): all five tiers pass → level=4, cap "L5 integration N/A". Proves a full clean climb with no SSO surface caps at L4.

Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.

Design notes / WHY:

Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle rungs" because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear reason rather than shown as L4 — understating is safe, overstating is the cardinal FAIL.
Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). The functional-vs-SSO split inside the custom tier is conservative: a custom FAIL fails the functional rung (caps L3) since functional and SSO tests can't be cheaply told apart within that tier — never inflates.
results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is logged but never changes overall (R7). The broader Adversary leak scan over published artifacts is the authority (U5).
"version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but ugly for the card; will prefer the tested version tag for display in U2.
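The strict monotonic capping described in these notes can be sketched as a pure function. This is an illustration under assumptions rather than the contents of runner/harness/level.py: the signature, the `RUNGS` table and the wording of the FAIL reason are hypothetical, the N/A reason strings follow the ones quoted in this journal, and the L6 recipe-local rung is left out for brevity.

```python
# Rungs L1-L4 and the tiers each one needs (from the ladder in this journal).
RUNGS = [
    (1, "install", ("install",)),
    (2, "upgrade", ("upgrade",)),
    (3, "backup/restore", ("backup", "restore")),
    (4, "functional", ("custom",)),
]


def level(tiers, deps_ready=False, sso_dep_unverified=False):
    """Return (level, cap_reason) from a {tier: 'pass'|'fail'|'skip'} dict.

    Strict capping: the climb stops at the first rung that is not a pass.
    A skipped (N/A) rung caps exactly like a failed one, with its own reason.
    """
    achieved = 0
    for n, label, names in RUNGS:
        statuses = [tiers.get(name, "skip") for name in names]
        if "fail" in statuses:
            return achieved, f"L{n} {label} FAIL"  # reason wording is assumed
        if "skip" in statuses:
            return achieved, f"L{n} {label} N/A"
        achieved = n
    # L5 needs dependency-backed tests that actually ran and passed.
    if not deps_ready or sso_dep_unverified:
        return achieved, "L5 integration N/A"
    return 5, None  # L6 is not modelled in this sketch


# The two real runs reported in this entry, replayed as inputs.
custom_html_tiny = {"install": "pass", "upgrade": "pass",
                    "backup": "skip", "restore": "skip", "custom": "skip"}
uptime_kuma = dict.fromkeys(
    ["install", "upgrade", "backup", "restore", "custom"], "pass")
```

With these inputs `level(custom_html_tiny)` gives `(2, "L3 backup/restore N/A")` and `level(uptime_kuma)` gives `(4, "L5 integration N/A")`, matching the levels and caps recorded for the two runs.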

Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell): not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better than found (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).

2026-05-31T06:50Z — U2 render-path de-risked headless on cc-ci (parked at U0 gate)

While U0 is CLAIMED awaiting the Adversary (its cold runs adv-cht=L2 / adv-uk=L4 reproduced my claimed levels exactly @06:06/06:09 — swarm clean, no orphans), I kept the unblocked U2 render path moving. Ran a real headless Playwright PNG render on cc-ci of the pure harness.card renderers from two fixtures (a passing L4 uptime-kuma and a failing L0 custom-html-tiny):

cc-ci-run /tmp/smoke_card.py  (renders render_card_html → render_card_png + level_badge_svg)
pass: png size=119765  badge svg=342B
fail: png size=56353   badge svg=342B

Pulled both PNGs back and eyeballed them:

pass card — level 4 in a yellow-green badge, full per-stage/per-test ✔ rows with PASS labels, inline sunflower renders, clean teardown + no secret leak flags green. Fonts clean (no tofu).
fail card — level 0 in a red badge, install FAIL row, the "no screenshot" placeholder shown.
No inflation: the fail card honestly shows L0/red/FAIL; the card computes nothing, it reports the dict verbatim (cardinal guardrail upheld at the render layer).

This proves the U2 render path (HTML→PNG headless) works on the real cc-ci browser for both pass and fail runs — the U2 acceptance shape — before I wire it into run_recipe_ci.py (which I will not do until U0 PASSes, to avoid rework if the schema changes).

WIRING CONTRACT noted for U1/U2: the broken-image icon seen on the pass fixture is only because the fixture set screenshot:"screenshot.png" with no file present. The wiring MUST set data["screenshot"] truthy ONLY when the captured PNG actually exists (screenshot.capture returns None on failure); otherwise the card's show_shot gate falls back to the "no screenshot" placeholder, as the fail fixture already proves. No renderer change needed.

Not claiming U2 — still parked at the U0 gate per §6.1 (no advance past a gate without its PASS).

2026-05-31T07:00Z — U0 PASS; U1 (app screenshot) wired + CLAIMED

Adversary cold-verified U0 (REVIEW-3 @18d2bd1: R1 ladder, no inflation, R7-safe emission, no VETO). Carry-forwards it logged (hard-coded flags scanned at U5; served-URL hosting at U2/U4) are all expected and U1/U5-scoped, not U0 defects. Proceeded past U0 to U1.

WHY / design notes for U1:

Capture point = right after deploy+health/readiness, before any tier runs. Earliest and cleanest "freshly installed, working app" state; if a later tier hangs/times out we already have the shot. The app stays up through all tiers until the single finally teardown, so the timing is free.
Placed OUTSIDE the deploy try/except, guarded by if deploy_ok. Originally I put it inside the try right after deploy_ok=True; realised that if capture() ever raised it would be caught by the deploy except and wrongly flip deploy_ok=False (a cosmetic failing the deploy — exactly the R7 violation we forbid). Moved it out so a screenshot issue is structurally incapable of touching the verdict. capture() is also internally all-swallowing, so it's belt-and-suspenders.
Secret-safety = landing page by default. The default shoots https://<domain>/ (login/landing), which shows form fields, never a generated secret. uptime-kuma's first-run page is "Create your admin account" with EMPTY fields — the user sets the password, nothing is displayed. Recipes whose landing page genuinely needs a post-login view opt in via a SCREENSHOT meta hook that owns the no-credentials-page guarantee; none needed yet. The harness NEVER auto-fills a setup wizard.
results.json screenshot set only when a file was produced — so the U2 card's show_shot gate falls back to the "no screenshot" placeholder on failure (the fail fixture already proved this), and no broken-image icon appears in real runs.
Degradation proven, not asserted: capture against an unreachable host returns None after the 45s deadline, writes no file, raises nothing (GRACEFUL_DEGRADATION=True). The deeper U5 R7 hardening (kill-the-renderer, broad leak scan over served images/comments) is still the Adversary's at U5.
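The placement argument above (capture outside the deploy try/except, guarded by deploy_ok) can be sketched as below. This is a simplified stand-in, not the code in run_recipe_ci.py: `run_once` and its two callables are hypothetical, and the extra try around the capture call is added here only to make the sketch safe on its own, whereas the journal relies on capture() swallowing its own errors.

```python
def run_once(deploy, capture):
    """Deploy, then take a best-effort screenshot that cannot alter the verdict.

    `deploy` and `capture` are hypothetical callables standing in for the
    real deploy step and the screenshot helper.
    """
    deploy_ok = False
    try:
        deploy()
        deploy_ok = True
    except Exception as exc:
        print(f"deploy failed: {exc}")

    # Outside the deploy try/except: an exception raised here can no longer be
    # caught by the deploy handler and flip deploy_ok to False.
    shot = None
    if deploy_ok:
        try:
            shot = capture()  # expected to return a path, or None on failure
        except Exception as exc:
            print(f"screenshot skipped: {exc}")
    return deploy_ok, shot


def broken_capture():
    raise RuntimeError("renderer died")


result = run_once(lambda: None, broken_capture)
```

Here `result` is `(True, None)`: the deploy verdict survives a crashing capture, and the missing screenshot is reported as None so the card can fall back to its placeholder.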

Verification (all on cc-ci @5fa15d4):

38 phase-3 unit tests pass (incl. 4 test_screenshot pure-helper tests).
uptime-kuma real install run → 30KB screenshot.png of the working UI (empty cred fields), results.json screenshot="screenshot.png", clean_teardown=true, no orphan service.
unreachable-host capture → None, no file, no raise.

2026-05-31T07:03Z — U2 generation wired + card embeds the REAL screenshot (held, not claimed)

While parked at the U1 gate (claimed d7e812e, awaiting Adversary), kept unblocked U2 work in hand: wired card_mod into run_recipe_ci.py (afe5e51) so each run renders summary.html→summary.png + badge.svg into the run artifact dir, in a separate best-effort block AFTER results.json is written (so a card failure can't even look like a results.json failure; both swallow → never touch overall, R7). The card passes screenshot_rel=data.get("screenshot") so it embeds the real shot iff one exists.

Proved end-to-end against the REAL u1-uk-shot run data (results.json + screenshot.png): rendered summary.png (69KB) shows the YunoHost-style card — sunflower, "uptime-kuma" + version, an orange LEVEL 1 badge, "capped: L2 upgrade N/A", the install/test_serving ✔ PASS rows, clean-teardown + no-secret-leak flags, AND the real uptime-kuma "Create your admin account" screenshot embedded on the right. badge.svg 342B. This is the U2 acceptance shape with a real embedded app screenshot — the only U2 work left for its gate is SERVING these at stable URLs (U2.3, dashboard bind-mount) + showing a fail run. NOT claiming U2 — still gated behind U1's PASS.

2026-05-31T07:25Z — U2 (summary card + badge + serving) wired, deployed, CLAIMED

U1 PASSED (REVIEW-3 @74a6993). Built out U2 end-to-end and rolled the serving layer to production.

WHY / notable decisions:

Card generation placed AFTER results.json write, in its own best-effort block (not the same try as results.json) so a card-render failure can't masquerade as a results.json failure; both swallow → never touch overall (R7).
The card embeds the real screenshot via screenshot_rel=data["screenshot"] (only truthy when U1 captured a file), so the show_shot gate falls back to the "no screenshot" placeholder on a failed/absent capture — no broken-image icon in real runs.
Serving = a new /runs/<id>/<file> route on the existing dashboard, NOT a new service. Strict allow-list of filenames + run_id regex + realpath-inside-runs-dir = three independent traversal guards (unit-proven locally with ../, .., /etc, non-whitelisted names; live-proven on cc-ci). Runs dir bind-mounted READ-ONLY (dashboard never writes run artifacts).
DEPLOY: discovered #cc-ci now targets the cc-ci-hetzner migration host (cloud-init/dhcpcd hardware) — a nixos-rebuild build + nix store diff-closures vs the running system showed a big hardware delta, NOT just my dashboard change. So a full switch on the LIVE host would be wrong/dangerous. Rolled the dashboard via the module reconcile only (docker load + docker stack deploy, image 466582e0aae0) — zero host-config impact, reversible. Recorded the mechanism + migration caveat in DECISIONS.md (Phase-3/U2) and warned the Adversary via ADVERSARY-INBOX. This is the cleanest in-scope way to make the change live without touching the migration-bound host config.
Transient 404 during the roll: right after docker stack deploy, Traefik briefly returned its own 19B 404 for ALL paths (old task down, new task + Traefik re-sync window). Resolved on its own in ~25s → / 200, /runs/... 200. Noted so it isn't mistaken for a real outage.
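The three independent traversal guards on the /runs/<id>/<file> route can be sketched as one resolver function. This is an illustration, not the dashboard's code: `resolve_artifact`, the exact allow-list and the run-id pattern are assumptions (the file names are the artifacts mentioned in this journal; the regex is a guess that accepts ids such as `u1-uk-shot` and `1`).

```python
import os
import re

# Assumed allow-list: the artifact names this journal mentions.
ALLOWED_FILES = {"results.json", "summary.html", "summary.png",
                 "badge.svg", "screenshot.png"}
# Assumed run-id shape: no dots or slashes can ever match.
RUN_ID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,63}")


def resolve_artifact(runs_dir, run_id, filename):
    """Return the real path of a servable artifact, or None (serve a 404)."""
    # Guard 1: strict file-name allow-list.
    if filename not in ALLOWED_FILES:
        return None
    # Guard 2: run id must match the whole pattern.
    if not RUN_ID_RE.fullmatch(run_id):
        return None
    # Guard 3: after resolving symlinks, the path must stay inside runs_dir.
    base = os.path.realpath(runs_dir)
    path = os.path.realpath(os.path.join(base, run_id, filename))
    if os.path.commonpath([base, path]) != base:
        return None
    return path if os.path.isfile(path) else None
```

Each guard rejects on its own, so a request such as run id `..` or file name `../../etc/passwd` is refused before the filesystem is consulted, and the realpath check still covers a symlink planted inside a run directory.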

Verification (live, post-roll):

https://ci.commoninternet.net/runs/u1-uk-shot/summary.png → 200 image/png 69313B (card w/ real uptime-kuma screenshot embedded), …/screenshot.png 200 30858B, …/badge.svg 200, …/results.json 200. Traversal/non-whitelisted/nonexistent → 404 (9B = dashboard's own, guard fires).
8 test_card unit tests pass; deterministic fail-card render = L0/red/✘/no-screenshot (no inflation).
/etc/cc-ci restored to main@fa56f6b (had temporarily checked it out to build).

2026-05-31T09:35Z — U3 live demo: discovered Drone DB reset (repo inactive), reactivated

Resuming U3 (bridge code already built+deployed @9a47aa2; deployed bridge image tag 6377f9571f3b == sha256(bridge.py), confirmed; dashboard do_HEAD live → A3-1 CLOSED by Adversary @8807240).

To run the U3 live demo (!testme → image-forward PR comment) I first validated the trigger path and hit a real blocker: the bridge log showed drone trigger failed 404, and GET /api/repos/recipe-maintainers/cc-ci → 404. Diagnosis: the Drone admin token is valid (/api/user → 200, autonomic-bot admin=true) but the repo was inactive — Drone's DB was reset (the Hetzner migration; created/synced timestamps are all recent ~1780220000). In Phase 1 the repo was activated once via POST /api/repos/recipe-maintainers/cc-ci (JOURNAL.md:258); that activation is NOT Nix-declared (drone.nix only PATCHes the timeout, which itself assumes the repo is already active), so a DB reset silently de-registers it and the bridge can't trigger.

Action (in-scope reconfig of my own CI, reversible): POST /api/user/repos?async=false (sync, 200) → POST /api/repos/recipe-maintainers/cc-ci → active=true, config_path=.drone.yml, timeout=60. The trusted flag stays false — irrelevant for the type: exec pipeline (trusted only gates privileged docker pipelines). Validated by triggering a custom build directly (same params the bridge sends): build #1 → running within ~10s (exec runner picked it up). Watching it produce /runs/1/ artifacts.

NOTE for hardening backlog (U5/operator): repo activation should be folded into the drone reconcile so a future DB reset self-heals (POST /api/repos/<slug> before the timeout PATCH). Filing in BACKLOG-3.
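The self-healing order proposed in this note can be sketched as follows. This is only a shape for the backlog item, not the reconcile that exists today: `ensure_repo_active` and the injected `request(method, path, body)` transport are hypothetical, and the PATCH path and body are assumptions about what the drone.nix timeout step sends. The GET, sync and activate endpoints are the ones used in this entry.

```python
def ensure_repo_active(request, slug, timeout=60):
    """Activate the repo if Drone no longer knows it, then apply the timeout.

    `request(method, path, body=None)` is a hypothetical transport returning
    (status_code, parsed_json_or_None); a real one would add the admin token.
    Returns True when an activation was performed.
    """
    status, repo = request("GET", f"/api/repos/{slug}")
    activated = False
    if status == 404 or not (repo or {}).get("active"):
        # Sync the repo list first, then activate (same calls as this entry).
        request("POST", "/api/user/repos?async=false")
        status, repo = request("POST", f"/api/repos/{slug}")
        if status != 200:
            raise RuntimeError(f"activation of {slug} failed: HTTP {status}")
        activated = True
    # Only now is the timeout PATCH safe: the repo is known to be active.
    # Path and body are assumed, not taken from drone.nix.
    request("PATCH", f"/api/repos/{slug}", {"timeout": timeout})
    return activated
```

Injecting the transport keeps the ordering logic testable without a live Drone: a fake that answers 404 to the first GET should see the calls GET, POST sync, POST activate, PATCH, in that order.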
