U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all emitted best-effort (never changes verdict, R7). Two real runs bracket the gate: custom-html-tiny=L2 (functional N/A, backup N/A caps at L2) and uptime-kuma=L4 (full climb, no SSO surface caps at L5). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS. Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
72 lines
5.2 KiB
Markdown
# Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.
## 2026-05-31T05:41Z — Phase-3 bootstrap + orientation
Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9, then oriented on the existing Phase-1/2 artifacts I'll extend:

- `runner/run_recipe_ci.py`: orchestrates deploy-once → per-tier (install/upgrade/backup/restore/custom) and produces an in-memory `results` dict `{tier: 'pass'|'fail'|'skip'}` printed to the Drone logs. **No results.json, no level, no screenshot today.** Also tracks deploy-count (DG4.1), deps/SSO readiness (`sso_dep_unverified` → F2-11) and teardown errors.
- `bridge/bridge.py`: posts a text PR comment with the Drone run URL; `watch_and_reflect` edits it to ✅/❌ on completion. No image/badge/level.
- `dashboard/dashboard.py`: stdlib HTTP service (swarm OCI image, Nix-built) that polls **only the Drone API** and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level). Runs as a container with **no host volume mounts**, which matters for artifact hosting (U0.4).

Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:

- L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass; L5 SSO/integration (the requires_deps tests actually ran and passed: `deps_ready` and not `sso_dep_unverified`); L6 recipe-local tests pass (D4: discovered repo-local overlay/custom).
- Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT be scored as a failure, but it still caps the level, with a recorded reason ("L4: no integration surface" etc.) for fairness (§4.1 L5).
- Invariants are surfaced as flags, not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok), no-secret-leak ✔.
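A minimal sketch of the pure mapper, under stated assumptions: tier results are reduced to `pass`/`fail`/`na` rung outcomes, a skipped or absent tier counts as N/A, and FAIL and N/A both cap the climb (the strict variant, where only the recorded reason differs). `Level`, `RUNGS`, `TIER_RUNGS`, `rungs_from_tiers` and the reason strings are hypothetical illustrations, not the contents of `runner/harness/level.py`. L5/L6 depend on the deps/SSO and recipe-local signals, so this sketch simply leaves them N/A unless the caller supplies them.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Rung outcomes after reducing tier results; "na" = genuinely not applicable.
PASS, FAIL, NA = "pass", "fail", "na"

# Hypothetical labels for the L1..L6 ladder, used only in cap reasons.
RUNGS = {1: "install", 2: "upgrade", 3: "backup/restore",
         4: "functional", 5: "integration", 6: "recipe-local"}

# Which runner tiers feed which rung (L5/L6 need deps/SSO and D4 signals instead).
TIER_RUNGS = {1: ("install",), 2: ("upgrade",), 3: ("backup", "restore"), 4: ("custom",)}


@dataclass(frozen=True)
class Level:
    level: int
    cap_reason: Optional[str]  # None only when all six rungs passed


def rungs_from_tiers(tiers: Mapping[str, str]) -> dict[int, str]:
    """Reduce {tier: 'pass'|'fail'|'skip'} to rung outcomes for L1..L4."""
    out = {}
    for rung, names in TIER_RUNGS.items():
        results = [tiers.get(name, "skip") for name in names]
        if "fail" in results:
            out[rung] = FAIL      # any failing tier fails the whole rung
        elif all(r == "pass" for r in results):
            out[rung] = PASS
        else:
            out[rung] = NA        # skipped or absent: treated as N/A
    return out


def level(rungs: Mapping[int, str]) -> Level:
    """Strict gap-caps: the highest L such that every rung <= L passed.

    FAIL and N/A both stop the climb; only the recorded reason differs.
    Rungs missing from the mapping count as N/A, and any outcome other than
    PASS or NA counts as a failure, so the result can understate, never inflate.
    """
    achieved = 0
    for rung in sorted(RUNGS):
        outcome = rungs.get(rung, NA)
        if outcome != PASS:
            why = "N/A" if outcome == NA else "failed"
            return Level(achieved, f"L{rung} {RUNGS[rung]} {why}")
        achieved = rung
    return Level(achieved, None)


# Tier results of the two real runs recorded in this journal:
print(level(rungs_from_tiers({"install": "pass", "upgrade": "pass",
                              "backup": "skip", "restore": "skip"})))
# Level(level=2, cap_reason='L3 backup/restore N/A')
print(level(rungs_from_tiers({t: "pass" for t in
                              ("install", "upgrade", "backup", "restore", "custom")})))
# Level(level=4, cap_reason='L5 integration N/A')
```

A custom-tier FAIL in this sketch yields `Level(3, 'L4 functional failed')`, i.e. the conservative "caps L3" handling of the functional-vs-SSO split.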

Adversary is live (REVIEW-3 @05:42Z); it flagged the Phase-2-DONE prerequisite but is not treating it as a P3 blocker, and the operator kicked Phase 3 off manually. Proceeding.
### Plan for U0 (foundation)
1. Pure `level()` function in a new `runner/harness/level.py`: unit-testable (no I/O), so I can prove "L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is the load-bearing logic; everything else (card, badge, dashboard) just *renders* what it returns.
2. Capture per-test detail: run each tier's pytest with `--junitxml` to a run-scoped dir, parse the XML (stdlib `xml.etree`) into per-test rows {name, status, ms}, and aggregate them per stage.
3. `run_recipe_ci.py` assembles `results.json` {recipe, version, pr, ref, run_id, stages[], level, level_cap_reason, flags} and writes it to the artifact dir, wrapped so a failure here NEVER changes the run's exit code (R7: cosmetics never block).
4. Artifact hosting (U0.4): the runner writes to a host dir; the dashboard bind-mounts it read-only to serve `/runs/<id>/...`. Decide details + record in DECISIONS.
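A sketch of step 2 using only the standard library. It assumes pytest's JUnit XML layout: `<testcase>` elements carrying `name`, `classname` and `time` (in seconds), with a `<failure>`, `<error>` or `<skipped>` child when the test did not simply pass. `parse_junit`, `stage_summary`, the `classname::name` display format and the `SAMPLE` test names are made up for illustration. For a file written by `--junitxml=<path>`, `ET.parse(path).getroot()` would replace `ET.fromstring`.

```python
import xml.etree.ElementTree as ET


def parse_junit(xml_text: str) -> list[dict]:
    """Turn pytest --junitxml output into per-test rows {name, status, ms}."""
    rows = []
    root = ET.fromstring(xml_text)
    # iter() finds <testcase> whether the root is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            status = "fail"
        elif case.find("skipped") is not None:
            status = "skip"
        else:
            status = "pass"
        cls, name = case.get("classname", ""), case.get("name", "?")
        rows.append({
            "name": f"{cls}::{name}" if cls else name,
            "status": status,
            "ms": round(float(case.get("time", "0")) * 1000),  # JUnit time is seconds
        })
    return rows


def stage_summary(rows: list[dict]) -> dict:
    """Aggregate one stage's rows; a single failing test fails the stage."""
    counts = {s: sum(r["status"] == s for r in rows) for s in ("pass", "fail", "skip")}
    if counts["fail"]:
        status = "fail"
    elif counts["pass"]:
        status = "pass"
    else:
        status = "skip"
    return {"status": status, **counts, "ms": sum(r["ms"] for r in rows)}


SAMPLE = """<testsuites><testsuite name="pytest">
  <testcase classname="tests.test_custom" name="test_dashboard_loads" time="0.250"/>
  <testcase classname="tests.test_custom" name="test_monitor_api" time="1.5">
    <failure message="assert 500 == 200"/>
  </testcase>
  <testcase classname="tests.test_custom" name="test_sso_login" time="0">
    <skipped message="deps not ready"/>
  </testcase>
</testsuite></testsuites>"""

rows = parse_junit(SAMPLE)
print(stage_summary(rows))
# {'status': 'fail', 'pass': 1, 'fail': 1, 'skip': 1, 'ms': 1750}
```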
## 2026-05-31T06:00Z — U0 complete + CLAIMED
Implemented U0.1–U0.4. Two real end-to-end runs on cc-ci confirm that the translation layer (the binding risk the Adversary flagged at df54693) produces correct levels:

- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/restore skip→N/A, no custom tests → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**, cap "L5 integration N/A". Proves a full clean climb; with no SSO surface, it caps at L4.

Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps afterwards.
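A sketch of how the two flags could be computed alongside (never inside) the level. It assumes clean teardown means the single deploy DG4.1 expects plus no recorded dependency-teardown errors, and that the narrow leak-scan only checks whether a known secret value appears, raw or JSON-escaped, in the serialized payload. `flags()` and its parameters are hypothetical, and the orphan-app check is not modeled.

```python
import json


def flags(deploy_count: int, teardown_errors: list[str],
          payload: dict, secrets: list[str]) -> dict[str, bool]:
    """Invariants reported next to the level, never folded into it."""
    text = json.dumps(payload, ensure_ascii=False)

    def leaked(secret: str) -> bool:
        # Check the raw value and its JSON-escaped form (quotes, backslashes).
        escaped = json.dumps(secret, ensure_ascii=False)[1:-1]
        return secret in text or escaped in text

    return {
        # Assumption: "clean" = exactly one deploy (DG4.1) and no dep teardown errors.
        "clean_teardown": deploy_count == 1 and not teardown_errors,
        # Empty secrets are skipped: "" is a substring of every string.
        "no_secret_leak": not any(leaked(s) for s in secrets if s),
    }


print(flags(1, [], {"recipe": "uptime-kuma", "level": 4}, ["hunter2"]))
# {'clean_teardown': True, 'no_secret_leak': True}
```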

Design notes / WHY:

- Chose STRICT monotonic capping (N/A caps like FAIL, with a distinct reason) over "N/A transparent for middle rungs", because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear reason rather than shown as L4: understating is safe, overstating is the cardinal FAIL.
- Kept the LEVEL driven by tier results + deps signals (precise, in hand) rather than per-test marker plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). The functional-vs-SSO split inside the custom tier is conservative: since we can't cheaply tell which kind of test failed, any custom FAIL fails the functional rung (caps at L3), which never inflates.
- results.json assembly + the narrow leak-scan are wrapped in try/except in main(), so any failure is logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is the authority (U5).
- The "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but ugly for the card; I'll prefer the tested version tag for display in U2.
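A sketch of the R7 rule for the results.json write: the write is a side effect whose failure is logged and swallowed, and the caller returns `overall` unchanged either way. `write_results_best_effort` and `finish` are hypothetical names, not the actual `main()` code; writing to a temp file and then renaming it with `os.replace` is an added assumption so that a reader of the artifact dir never sees a half-written file.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

log = logging.getLogger("results")


def write_results_best_effort(artifact_dir, payload: dict) -> bool:
    """Write <artifact_dir>/results.json; never raises. Returns True on success."""
    tmp = None
    try:
        out = Path(artifact_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Temp file in the same dir, then an atomic rename over the final name.
        fd, tmp = tempfile.mkstemp(dir=out, prefix=".results.", suffix=".json")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
        os.replace(tmp, out / "results.json")
        tmp = None
        return True
    except Exception:  # cosmetic step: log it, never propagate (R7)
        log.exception("results.json not written; verdict unchanged")
        if tmp is not None:
            try:
                os.unlink(tmp)  # don't leave a partial temp file behind
            except OSError:
                pass
        return False


def finish(overall: int, artifact_dir, payload: dict) -> int:
    """The exit code comes only from the test verdict; the write result is ignored."""
    write_results_best_effort(artifact_dir, payload)
    return overall
```

A failed write (unserializable payload, unwritable dir) leaves any previous results.json in place and still returns the original `overall`.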

Pre-existing repo lint is RED (94 files to reformat + 36 ruff errors on origin/main, ruff 0.7.3 in the CI devshell): not mine, flagged in STATUS for the operator. My new files are clean, and run_recipe_ci.py is left better than I found it (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).