Files
cc-ci/machine-docs/STATUS-3.md

126 lines
8.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase 3 — Beautiful YunoHost-style results — STATUS
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. DoD = R1R8. Milestones U0U5.
State files (this phase): `machine-docs/{STATUS,BACKLOG,REVIEW,JOURNAL}-3.md`. DECISIONS.md shared.
**WHAT + HOW + EXPECTED + WHERE live here; WHY → JOURNAL-3.md.**
## Phase context
- Phase 2b is `## DONE` (Adversary-verified, no VETO). Phase 3 kicked off **manually by the operator**.
Note for honesty: Phase-2 `## DONE` not yet flipped (REVIEW-2 standing VETO on full Phase-2 DONE
authorization); cross-phase sequencing is an operator call. Adversary concurs it's not a P3 blocker
(REVIEW-3 @05:42Z).
- **Pre-existing repo-wide lint is RED on origin/main** (94 files `ruff format`-dirty + 36 `ruff check`
errors; confirmed on cc-ci CI devshell against clean `origin/main`, ruff 0.7.3). This predates Phase 3
and is NOT introduced by my work — my NEW Phase-3 files are fully `ruff`-clean, and I left
`run_recipe_ci.py` with fewer ruff errors than main (1 vs 4). Flagged for the operator; not a Phase-3
DoD item, and the U0 gate is verified by unit tests + real-run results.json, not repo-wide lint.
---
## Gate: U0 — PASS (Adversary REVIEW-3 @18d2bd1, 2026-05-31; R1 cold-verified, no VETO) (Results schema + level)
**WHAT.** `run_recipe_ci.py` now emits a per-run `results.json` with per-stage AND per-test ✔/✘
breakdown and a computed integer **level** (L0L6, YunoHost gap-caps semantics). DoD R1 (level ladder)
satisfied; U0 milestone acceptance ("level correct for a recipe through L4 and one capped at L2")
demonstrated on two real end-to-end runs.
**WHERE (commits / files).**
- `9773e3f` `runner/harness/level.py` — pure `compute_level(rungs)->(level,cap_reason)` + helpers
`backup_restore_status`, `tier_to_rung`. `tests/unit/test_level.py` (15 tests).
- `52e5d21` `runner/harness/results.py` — JUnit-XML parse, `collect_stages`, `derive_rungs` (the
tier+deps/SSO→rung translation), `build_results`, `write_results`. `tests/unit/test_results.py`
(13 tests). `runner/run_recipe_ci.py` — tiers emit `--junitxml` + append `{tier,source,file,rc,junit}`
records; `main()` assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7),
incl. a narrow self leak-scan of the serialised artifact.
- `757511e` `machine-docs/DECISIONS.md` (Phase-3 section) — the documented ladder + exact rung-mapping
contract `derive_rungs` implements + results.json schema + artifact-hosting decision.
**HOW to verify (cold, from your clone on cc-ci).**
1. **Unit tests** (deterministic; also fuzz-verifiable):
`cc-ci-run -m pytest tests/unit/test_level.py tests/unit/test_results.py -q`
2. **Real-run L2-cap** (stateless, not backup-capable, ≥2 versions):
`RECIPE=custom-html-tiny STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-cht cc-ci-run runner/run_recipe_ci.py`
then read `/var/lib/cc-ci-runs/adv-cht/results.json`.
3. **Real-run L4-pass** (backup-capable, 3 functional tests, no deps):
`RECIPE=uptime-kuma STAGES=install,upgrade,backup,restore,custom CCCI_RUN_ID=adv-uk cc-ci-run runner/run_recipe_ci.py`
then read `/var/lib/cc-ci-runs/adv-uk/results.json`.
(Compare the `level`/`rungs` against the `results` dict + DECISIONS contract — a level greener than
the tiers would be a FAIL. Verify clean teardown: no orphan `*-pr*`/recipe service after.)
**EXPECTED.**
1. `28 passed`.
2. custom-html-tiny: `level=2`, `level_cap_reason="L3 backup/restore (data integrity) N/A"`,
`rungs={install:pass, upgrade:pass, backup_restore:na, functional:na, integration:na, recipe_local:na}`,
`results={install:pass, upgrade:pass, backup:skip, restore:skip, custom:skip}`,
`flags={clean_teardown:true, no_secret_leak:true}`, stages=[install,upgrade] each w/ per-test rows.
(My run: `/var/lib/cc-ci-runs/u0-cht-L2/results.json`.)
3. uptime-kuma: `level=4`, `level_cap_reason="L5 integration (SSO/OIDC + cross-app) N/A"`,
`rungs={install:pass, upgrade:pass, backup_restore:pass, functional:pass, integration:na, recipe_local:na}`,
all five tiers pass, `flags.clean_teardown=true`, stages=[install,upgrade,backup,restore,custom]
with per-test rows (incl. 3 uptime-kuma functional tests, source `cc-ci`).
(My run: `/var/lib/cc-ci-runs/u0-uk-L4/results.json`.)
These two bracket the gate: a recipe whose functional tests **pass** is still capped at **L2** when a
lower rung (L3 backup) is N/A (gap-caps; never inflates), and a full clean climb with no SSO surface
caps at **L4**.
---
## Gate: U1 — PASS (Adversary REVIEW-3 @74a6993, 2026-05-31; R4 cold-verified, no VETO) (App screenshot)
**WHAT.** The harness now captures a **real Playwright screenshot of the deployed app** while it is
up (after deploy+health/readiness, before any tier mutates state, before teardown) and writes it to
the run artifact dir as `screenshot.png`. The capture is **secret-safe by default** (it shoots the
app **landing page**, never a credentials page; a recipe opts into a post-login view via an optional
`SCREENSHOT` meta hook that owns the no-secret-page guarantee — none used yet). It is **best-effort**:
`capture()` swallows every error and returns `None`, so it NEVER blocks/fails/hangs the run (R7); the
`results.json` `screenshot` field is set to `"screenshot.png"` ONLY when the capture actually produced
a file, else stays `null`. U1 milestone acceptance ("screenshot of a sample recipe shows the working
UI, no secrets") demonstrated on a real uptime-kuma run; graceful-degradation (R7) demonstrated on an
unreachable-domain capture.
**WHERE (commits / files).**
- `5fa15d4` `runner/run_recipe_ci.py` — imports `screenshot as screenshot_mod`; after deploy+readiness
and OUTSIDE the deploy try/except (so a screenshot issue can never flip `deploy_ok`), under
`if deploy_ok:` calls `screenshot_mod.capture(domain, screenshot_path(run_artifact_dir), recipe_meta=meta)`
and sets `screenshot_rel`; passes `screenshot=screenshot_rel` into `build_results(...)`.
- `daa7edd` `runner/harness/screenshot.py``capture()` (default landing-page nav via
`browser.goto_with_retry`, 45s deadline cap; optional `SCREENSHOT` hook), `screenshot_path()`,
`_load_screenshot_hook()`. `tests/unit/test_screenshot.py` (pure helpers; 4 tests).
**HOW to verify (cold, from your clone on cc-ci).**
1. **Pure-helper unit tests:** `cc-ci-run -m pytest tests/unit/test_screenshot.py -q`
2. **Real positive capture** (working UI, no secret): `rm -rf /var/lib/cc-ci-runs/adv-u1 &&
RECIPE=uptime-kuma STAGES=install CCCI_RUN_ID=adv-u1 cc-ci-run runner/run_recipe_ci.py`
then `scp` back `/var/lib/cc-ci-runs/adv-u1/screenshot.png` and EYEBALL it; check
`/var/lib/cc-ci-runs/adv-u1/results.json` has `"screenshot":"screenshot.png"`. Confirm NO orphan
service after (`docker service ls | grep -i uptime` empty = clean teardown).
3. **Graceful degradation (R7)** — capture against an unreachable host returns None, never raises:
`cc-ci-run -c 'import sys; sys.path.insert(0,"runner"); from harness import screenshot as S;
print(S.capture("adv-u1-noexist.ci.commoninternet.net","/tmp/x.png"))'` → prints `None` (≈45s),
no /tmp/x.png produced.
**EXPECTED.**
1. `3 passed` (test_screenshot.py has 3 pure-helper tests; corrected from an earlier "4" over-count
per the Adversary's honest-reporting flag, REVIEW-3 @74a6993 — doc-only, no behavioural impact).
2. `screenshot.png` ~30 KB showing uptime-kuma's **"Uptime Kuma / Create your admin account"**
landing page with **EMPTY** username/password/repeat fields (a setup form — it asks the user to
set a password; it does NOT display any generated secret), i.e. real working app UI, no secret
values. results.json `screenshot="screenshot.png"`, `flags.clean_teardown=true`; no orphan service.
(My run: `/var/lib/cc-ci-runs/u1-uk-shot/{screenshot.png,results.json}`.)
3. `None` returned after the 45s deadline, no file written, no exception — proving a screenshot
failure leaves the run/verdict untouched (cosmetics never block, R7). (My check log: capture
"failed (non-fatal, verdict unaffected)" → `GRACEFUL_DEGRADATION= True`.)
The cardinal Phase-3 invariant for U1: the screenshot is a faithful capture of the live app, never a
credentials page, and its presence/absence never changes the verdict.
## In flight (next, post-gate)
- U2 — summary card + badge (HTML→PNG via Playwright; SVG level badge; stable URLs). Render path
already de-risked headless on cc-ci for pass+fail fixtures (JOURNAL-3 @06:50Z) — next is wiring the
card/badge generation into the run + serving them. Held until U1 PASSes (no advance past the gate).
## Blocked
(none)