Files
cc-ci/machine-docs/STATUS-3.md

239 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase 3 — Beautiful YunoHost-style results — STATUS
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. DoD = R1R8. Milestones U0U5.
State files (this phase): `machine-docs/{STATUS,BACKLOG,REVIEW,JOURNAL}-3.md`. DECISIONS.md shared.
**WHAT + HOW + EXPECTED + WHERE live here; WHY → JOURNAL-3.md.**
## Phase context
- Phase 2b is `## DONE` (Adversary-verified, no VETO). Phase 3 kicked off **manually by the operator**.
Note for honesty: Phase-2 `## DONE` not yet flipped (REVIEW-2 standing VETO on full Phase-2 DONE
authorization); cross-phase sequencing is an operator call. Adversary concurs it's not a P3 blocker
(REVIEW-3 @05:42Z).
- **Pre-existing repo-wide lint is RED on origin/main** (94 files `ruff format`-dirty + 36 `ruff check`
errors; confirmed on cc-ci CI devshell against clean `origin/main`, ruff 0.7.3). This predates Phase 3
and is NOT introduced by my work — my NEW Phase-3 files are fully `ruff`-clean, and I left
`run_recipe_ci.py` with fewer ruff errors than main (1 vs 4). Flagged for the operator; not a Phase-3
DoD item, and the U0 gate is verified by unit tests + real-run results.json, not repo-wide lint.
---
## Gate: U0 — PASS (Adversary REVIEW-3 @18d2bd1, 2026-05-31; R1 cold-verified, no VETO) (Results schema + level)
**WHAT.** `run_recipe_ci.py` now emits a per-run `results.json` with per-stage AND per-test ✔/✘
breakdown and a computed integer **level** (L0L6, YunoHost gap-caps semantics). DoD R1 (level ladder)
satisfied; U0 milestone acceptance ("level correct for a recipe through L4 and one capped at L2")
demonstrated on two real end-to-end runs.
**WHERE (commits / files).**
- `9773e3f` `runner/harness/level.py` — pure `compute_level(rungs)->(level,cap_reason)` + helpers
`backup_restore_status`, `tier_to_rung`. `tests/unit/test_level.py` (15 tests).
- `52e5d21` `runner/harness/results.py` — JUnit-XML parse, `collect_stages`, `derive_rungs` (the
tier+deps/SSO→rung translation), `build_results`, `write_results`. `tests/unit/test_results.py`
(13 tests). `runner/run_recipe_ci.py` — tiers emit `--junitxml` + append `{tier,source,file,rc,junit}`
records; `main()` assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7),
incl. a narrow self leak-scan of the serialised artifact.
- `757511e` `machine-docs/DECISIONS.md` (Phase-3 section) — the documented ladder + exact rung-mapping
contract `derive_rungs` implements + results.json schema + artifact-hosting decision.
**HOW to verify (cold, from your clone on cc-ci).**
1. **Unit tests** (deterministic; also fuzz-verifiable):
`cc-ci-run -m pytest tests/unit/test_level.py tests/unit/test_results.py -q`
2. **Real-run L2-cap** (stateless, not backup-capable, ≥2 versions):
`RECIPE=custom-html-tiny STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-cht cc-ci-run runner/run_recipe_ci.py`
then read `/var/lib/cc-ci-runs/adv-cht/results.json`.
3. **Real-run L4-pass** (backup-capable, 3 functional tests, no deps):
`RECIPE=uptime-kuma STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-uk cc-ci-run runner/run_recipe_ci.py`
then read `/var/lib/cc-ci-runs/adv-uk/results.json`.
(Compare the `level`/`rungs` against the `results` dict + DECISIONS contract — a level greener than
the tiers would be a FAIL. Verify clean teardown: no orphan `*-pr*`/recipe service after.)
**EXPECTED.**
1. `28 passed`.
2. custom-html-tiny: `level=2`, `level_cap_reason="L3 backup/restore (data integrity) N/A"`,
`rungs={install:pass, upgrade:pass, backup_restore:na, functional:na, integration:na, recipe_local:na}`,
`results={install:pass, upgrade:pass, backup:skip, restore:skip, custom:skip}`,
`flags={clean_teardown:true, no_secret_leak:true}`, stages=[install,upgrade] each w/ per-test rows.
(My run: `/var/lib/cc-ci-runs/u0-cht-L2/results.json`.)
3. uptime-kuma: `level=4`, `level_cap_reason="L5 integration (SSO/OIDC + cross-app) N/A"`,
`rungs={install:pass, upgrade:pass, backup_restore:pass, functional:pass, integration:na, recipe_local:na}`,
all five tiers pass, `flags.clean_teardown=true`, stages=[install,upgrade,backup,restore,custom]
with per-test rows (incl. 3 uptime-kuma functional tests, source `cc-ci`).
(My run: `/var/lib/cc-ci-runs/u0-uk-L4/results.json`.)
These two bracket the gate: a recipe whose functional tests **pass** is still capped at **L2** when a
lower rung (L3 backup) is N/A (gap-caps; never inflates), and a full clean climb with no SSO surface
caps at **L4**.
---
## Gate: U1 — PASS (Adversary REVIEW-3 @74a6993, 2026-05-31; R4 cold-verified, no VETO) (App screenshot)
**WHAT.** The harness now captures a **real Playwright screenshot of the deployed app** while it is
up (after deploy+health/readiness, before any tier mutates state, before teardown) and writes it to
the run artifact dir as `screenshot.png`. The capture is **secret-safe by default** (it shoots the
app **landing page**, never a credentials page; a recipe opts into a post-login view via an optional
`SCREENSHOT` meta hook that owns the no-secret-page guarantee — none used yet). It is **best-effort**:
`capture()` swallows every error and returns `None`, so it NEVER blocks/fails/hangs the run (R7); the
`results.json` `screenshot` field is set to `"screenshot.png"` ONLY when the capture actually produced
a file, else stays `null`. U1 milestone acceptance ("screenshot of a sample recipe shows the working
UI, no secrets") demonstrated on a real uptime-kuma run; graceful-degradation (R7) demonstrated on an
unreachable-domain capture.
**WHERE (commits / files).**
- `5fa15d4` `runner/run_recipe_ci.py` — imports `screenshot as screenshot_mod`; after deploy+readiness
and OUTSIDE the deploy try/except (so a screenshot issue can never flip `deploy_ok`), under
`if deploy_ok:` calls `screenshot_mod.capture(domain, screenshot_path(run_artifact_dir), recipe_meta=meta)`
and sets `screenshot_rel`; passes `screenshot=screenshot_rel` into `build_results(...)`.
- `daa7edd` `runner/harness/screenshot.py``capture()` (default landing-page nav via
`browser.goto_with_retry`, 45s deadline cap; optional `SCREENSHOT` hook), `screenshot_path()`,
`_load_screenshot_hook()`. `tests/unit/test_screenshot.py` (pure helpers; 4 tests).
**HOW to verify (cold, from your clone on cc-ci).**
1. **Pure-helper unit tests:** `cc-ci-run -m pytest tests/unit/test_screenshot.py -q`
2. **Real positive capture** (working UI, no secret): `rm -rf /var/lib/cc-ci-runs/adv-u1 &&
RECIPE=uptime-kuma STAGES=install CCCI_RUN_ID=adv-u1 cc-ci-run runner/run_recipe_ci.py`
then `scp` back `/var/lib/cc-ci-runs/adv-u1/screenshot.png` and EYEBALL it; check
`/var/lib/cc-ci-runs/adv-u1/results.json` has `"screenshot":"screenshot.png"`. Confirm NO orphan
service after (`docker service ls | grep -i uptime` empty = clean teardown).
3. **Graceful degradation (R7)** — capture against an unreachable host returns None, never raises:
`cc-ci-run -c 'import sys; sys.path.insert(0,"runner"); from harness import screenshot as S;
print(S.capture("adv-u1-noexist.ci.commoninternet.net","/tmp/x.png"))'` → prints `None` (≈45s),
no /tmp/x.png produced.
**EXPECTED.**
1. `3 passed` (test_screenshot.py has 3 pure-helper tests; corrected from an earlier "4" over-count
per the Adversary's honest-reporting flag, REVIEW-3 @74a6993 — doc-only, no behavioural impact).
2. `screenshot.png` ~30 KB showing uptime-kuma's **"Uptime Kuma / Create your admin account"**
landing page with **EMPTY** username/password/repeat fields (a setup form — it asks the user to
set a password; it does NOT display any generated secret), i.e. real working app UI, no secret
values. results.json `screenshot="screenshot.png"`, `flags.clean_teardown=true`; no orphan service.
(My run: `/var/lib/cc-ci-runs/u1-uk-shot/{screenshot.png,results.json}`.)
3. `None` returned after the 45s deadline, no file written, no exception — proving a screenshot
failure leaves the run/verdict untouched (cosmetics never block, R7). (My check log: capture
"failed (non-fatal, verdict unaffected)" → `GRACEFUL_DEGRADATION= True`.)
The cardinal Phase-3 invariant for U1: the screenshot is a faithful capture of the live app, never a
credentials page, and its presence/absence never changes the verdict.
---
## Gate: U2 — PASS (Adversary REVIEW-3 @324d84d, 2026-05-31; R3/R6 partial cold-verified, no VETO) (Summary card + badge)
**WHAT.** Each run now renders a **summary card PNG** (recipe+version, level badge, per-stage/per-test
✔/✘ table, embedded **real app screenshot**) and an **SVG level badge**, written into the run artifact
dir and **served at stable URLs** `https://ci.commoninternet.net/runs/<run_id>/{summary.png,badge.svg,
screenshot.png,results.json}`. The card REPORTS results.json verbatim — it computes nothing, so it can
never look greener than the tiers (cardinal invariant). U2 acceptance ("card + badge render correctly
for a pass run AND a fail run") demonstrated: a real PASS run served live; a deterministic FAIL render
shown honest (L0/red/✘/no-screenshot).
**WHERE (commits / files).**
- `afe5e51` `runner/run_recipe_ci.py` — after results.json is written, a separate best-effort block
renders `summary.html`→`summary.png` + `badge.svg` via `harness.card` (passes
`screenshot_rel=data["screenshot"]` so the real shot embeds iff present). R7-wrapped — any failure
is swallowed, never changes `overall`.
- `daa7edd`/`7217e0c`/`8179d3f` `runner/harness/card.py` — pure `render_card_html`, `render_badge_svg`/
`level_badge_svg` (deterministic string builders), `render_card_png` (best-effort Playwright). Inline
SVG sunflower (headless chromium has no colour-emoji font). `tests/unit/test_card.py` (8 tests).
- `fa56f6b` `dashboard/dashboard.py` + `nix/modules/dashboard.nix` — `/runs/<id>/<file>` route
(allow-list + `run_id` regex + realpath-inside-runs-dir traversal guard); `/var/lib/cc-ci-runs`
bind-mounted READ-ONLY into the dashboard swarm service; `CCCI_RUNS_DIR` env.
**HOW to verify (cold).** (See ADVERSARY-INBOX for the deploy gotcha — do NOT `nixos-rebuild switch`
the live host; `#cc-ci` targets the hetzner migration host. U2.3 was rolled via the dashboard module
reconcile only. DECISIONS.md Phase-3/U2 has the `diff-closures` evidence.)
1. **Unit tests:** `cc-ci-run -m pytest tests/unit/test_card.py -q` → `8 passed`.
2. **PASS card served live (real):**
`curl -s -o /tmp/c.png -w '%{http_code} %{content_type} %{size_download}\n'
https://ci.commoninternet.net/runs/u1-uk-shot/summary.png` → `200 image/png ~69313`. Eyeball
`/tmp/c.png`: uptime-kuma, **orange LEVEL 1**, "capped: L2 upgrade N/A", install/test_serving ✔
PASS rows, clean-teardown+no-secret-leak flags, and the **real uptime-kuma screenshot embedded**.
Also `…/screenshot.png` (200 ~30858), `…/badge.svg` (200 image/svg+xml), `…/results.json` (200).
3. **Traversal/whitelist guard:** `…/runs/u1-uk-shot/../../../etc/passwd`, `…/runs/u1-uk-shot/evil.sh`,
`…/runs/nonexist/results.json` → **404** with a **9-byte** body (the dashboard's own "not found",
NOT Traefik's 19-byte 404 — proves the request reached the app and the guard rejected it).
4. **FAIL render is honest (cardinal invariant):** feed the card a fail dict (cmd in ADVERSARY-INBOX
§3) → card shows **level 0**, `level_color(0)` (red), the **✘ FAIL** mark on the install row, and
the **"no screenshot"** placeholder — never greener than the data.
**EXPECTED.** (1) `8 passed`. (2) PASS card 200/image-png/~69KB, embeds the real screenshot, level/marks
match results.json (`u1-uk-shot`: level 1, install pass). (3) all three guarded paths 404 with a 9B
body. (4) fail render: `>0<` (level 0), red colour, ✘ present, "no screenshot" present — no inflation.
The cardinal U2 invariant: the rendered card/level/badge are a faithful, never-greener projection of
results.json + the actual test outcomes, served at a stable URL, generated best-effort so a render
failure never blocks the run.
## Gate: U3 — CLAIMED, awaiting Adversary (YunoHost-style PR comment; R2)
**WHAT.** On a `!testme` run the bridge now posts/updates ONE Gitea PR comment in the YunoHost shape:
on run start a 🌻 + ⏳ **placeholder** ("level pending", live-logs link); on completion it edits the
**SAME** comment in place to 🌻 + a **level badge** image + a **summary card** image, BOTH linked to
the full run, plus full-logs/dashboard links. A re-`!testme` refreshes that same comment (back to ⏳,
then to the new result) — never stacks a new one (R2 "one comment per PR, updated in place"). Falls
back to a compact text verdict if the rendered card isn't served (R7). DoD **R2** satisfied; U3
acceptance ("live on a scratch PR — comment shows badge + card + screenshot, updates on re-run, no
secrets") demonstrated on a real scratch PR. (This also lands R3's "embedded in the comment"
sub-requirement; R3 still needs "in dashboard" at U4.)
**WHERE (commits / files).**
- `9a47aa2` `bridge/bridge.py` — `COMMENT_MARKER` (hidden HTML comment `<!-- cc-ci:testme -->`),
`start_comment_body` (⏳ placeholder), `result_comment_body` (🌻 + badge + card, linked; text
fallback), `find_existing_comment` (marker → update-in-place), `artifact_available` (HEAD existence
check → image-vs-text), `watch_and_reflect` now edits to `result_comment_body`. Card/badge URLs are
`${DASH_URL}/runs/<DRONE_BUILD_NUMBER>/{summary.png,badge.svg}` (run_id == Drone build number, see
`runner/harness/results.py::run_id`).
- `9a47aa2` `dashboard/dashboard.py` — `do_HEAD` (shared `_route` with GET) so HEAD existence-checks +
strict image clients get 200, not 501 (closes Adversary A3-1, already re-verified @8807240).
- `9a47aa2` `tests/unit/test_bridge_trigger.py` — covers placeholder shape, image-forward result,
**text fallback when card missing**, marker-based find/update-in-place.
- **Deployed:** bridge swarm image `cc-ci-bridge:6377f9571f3b` == `sha256(bridge.py)` first-12 (content
tag, confirmed live); dashboard image live with `do_HEAD`.
**HOW to verify (cold, from your clone / the VM).**
1. **Unit tests** (on cc-ci): `cc-ci-run -m pytest tests/unit/test_bridge_trigger.py tests/unit/test_card.py -q` → `15 passed`.
2. **Deployed bridge == source:** `ssh cc-ci 'sha256sum /etc/cc-ci/bridge/bridge.py | cut -c1-12'` →
`6377f9571f3b`; `ssh cc-ci 'docker service ls | grep ccci-bridge'` shows image tag `6377f9571f3b`.
3. **LIVE demo on scratch PR** `recipe-maintainers/custom-html` **PR #2** (recipe == repo name; the
bridge poller, 30s, fires on a NEW `!testme`). The bot comment carrying the marker is **id 13792**:
`curl -s -u "$GITEA_USERNAME:$GITEA_PASSWORD" https://git.autonomic.zone/api/v1/repos/recipe-maintainers/custom-html/issues/comments/13792`
→ body has `<!-- cc-ci:testme -->`, 🌻, `✅ passed`, `[![cc-ci result card](…/runs/4/summary.png)](…/4)`,
`[![level](…/runs/4/badge.svg)](…/4)`, full-logs+dashboard links. (You may post your own `!testme`
on PR #2 — the repo is active in Drone; it will refresh **the same** comment 13792.)
4. **Images render (served):** `for f in summary.png badge.svg screenshot.png results.json; do
curl -s -o /dev/null -w "$f %{http_code}\n" https://ci.commoninternet.net/runs/4/$f; done` → all 200.
5. **Updates in place / no stacking:** the marked-comment set on PR #2 stays exactly `[13792]` across
runs #3 (first `!testme`) and #4 (re-`!testme`); the comment cycled ⏳→result both times. (Filter
comments for `<!-- cc-ci:testme -->` — there is exactly one.)
6. **No secrets:** scan the comment body + `/var/lib/cc-ci-runs/{3,4}/{results.json,summary.html}` for
`password|secret|token|passwd|api_key|privkey|PRIVATE` → only the `no_secret_leak` flag-name matches;
the embedded app screenshot is custom-html's **"Welcome to nginx!"** page (no values).
7. **No inflation:** the card for run #4 shows `level 4` / `capped: L5 integration N/A`, all
install/upgrade/backup/restore/custom rows ✔ — matches `/runs/4/results.json` verbatim.
**EXPECTED.**
1. `15 passed`. 2. tag `6377f9571f3b` both places. 3. comment 13792 body exactly as above (run 4).
4. all four `/runs/4/` files 200 (`summary.png` ~178 KB, `badge.svg` 342 B, `screenshot.png` 35707 B).
5. exactly one marked comment (`13792`); no new comment stacked on re-run. 6. zero real secret hits.
7. card level 4, all rows ✔, == results.json (`recipe=custom-html`, `level=4`, all tiers pass,
`flags.clean_teardown=true,no_secret_leak=true`).
The cardinal U3 invariant: ONE comment per PR, refreshed in place; the embedded card/badge are a
faithful never-greener projection of the run; image-gen failure degrades to text and never blocks the
run or the verdict.
## In flight
(none — U3 claimed; U4 dashboard polish next, unblocked work can start while parked at the U3 gate)
## Blocked
(none)
## Note — Drone repo reactivation (infra, recorded for the Adversary)
The Hetzner-migration Drone DB reset left `recipe-maintainers/cc-ci` **inactive** (bridge log `drone
trigger failed 404`); the bridge can't trigger builds when the repo is inactive. I reactivated it
(in-scope reconfig of my own CI, reversible): `POST /api/user/repos?async=false` then `POST
/api/repos/recipe-maintainers/cc-ci` → `active=true`, config_path `.drone.yml`, timeout 60. This is
why builds #1#4 above exist (counter reset to 1 by the DB reset). Self-heal hardening filed as
BACKLOG-3 U3.3 (fold activation into the drone reconcile) — not a U3 DoD item.