Files
cc-ci/machine-docs/JOURNAL-3.md
autonomic-bot 5b6b378ade claim(3 U0): results.json + level ladder — gate CLAIMED
U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all
emitted best-effort (never changes verdict, R7). Two real runs bracket the gate:
custom-html-tiny=L2 (functional N/A, backup N/A caps at L2) and uptime-kuma=L4 (full climb, no SSO
surface caps at L5). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS.
Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:03:49 +00:00

72 lines
5.2 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Phase 3 — Beautiful YunoHost-style results — JOURNAL (Builder-private reasoning)
SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase3-results-ux.md`. WHY lives here; WHAT/HOW/EXPECTED/WHERE → STATUS-3.
## 2026-05-31T05:41Z — Phase-3 bootstrap + orientation
Read plan-phase3-results-ux.md in full (SSOT) + plan.md §6.1/§7/§9. Oriented on the existing
Phase-1/2 artifacts I'll extend:
- `runner/run_recipe_ci.py`: orchestrates deploy-once → per-tier (install/upgrade/backup/restore/custom),
produces an in-memory `results` dict `{tier: 'pass'|'fail'|'skip'}` printed to Drone logs. **No
results.json, no level, no screenshot today.** Also tracks deploy-count (DG4.1), deps/SSO readiness
(`sso_dep_unverified` → F2-11), teardown errors.
- `bridge/bridge.py`: posts a text PR comment with the Drone run URL; `watch_and_reflect` edits it to
✅/❌ on completion. No image/badge/level.
- `dashboard/dashboard.py`: stdlib HTTP service (swarm OCI image, Nix-built) that polls the **Drone API
only** and renders a latest-per-recipe table + a basic per-recipe SVG badge (Drone status, not level).
Runs as a container with **no host volume mounts** — relevant for artifact hosting (U0.4).
Key Phase-3 mapping insight: the level ladder (§4.1) maps cleanly onto the existing per-tier results:
- L1 install-tier pass; L2 upgrade pass; L3 backup AND restore pass; L4 custom (functional) pass;
L5 SSO/integration (requires_deps tests actually ran + passed — `deps_ready` and not
`sso_dep_unverified`); L6 recipe-local tests pass (D4 — discovered repo-local overlay/custom).
- Gap-caps-level (YunoHost): level = highest rung L such that every rung ≤ L passed. A rung that is
genuinely N/A (e.g. backup not BACKUP_CAPABLE, or no SSO/integration surface) must NOT block the
climb but caps with a recorded reason ("L4 — no integration surface" etc.) for fairness (§4.1 L5).
- Invariants surfaced as flags not levels: clean-teardown ✔ (no dep_teardown_error / DG4.1 ok),
no-secret-leak ✔.
Adversary is live (REVIEW-3 @05:42Z), flagged the Phase-2-DONE prerequisite but is not treating it as
a P3 blocker; operator kicked Phase 3 off manually. Proceeding.
### Plan for U0 (foundation)
1. Pure `level()` function in a new `runner/harness/level.py` — unit-testable (no I/O), so I can prove
"L4-pass" and "L2-cap" semantics cheaply and the Adversary can re-run the unit test cold. This is
the load-bearing logic; everything else (card, badge, dashboard) just *renders* what it returns.
2. Capture per-test detail: run each tier's pytest with `--junitxml` to a run-scoped dir, parse the
XML (stdlib `xml.etree`) into per-test rows {name, status, ms}. Aggregate per stage.
3. `run_recipe_ci.py` assembles `results.json` {recipe, version, pr, ref, run_id, stages[], level,
level_cap_reason, flags} and writes it to the artifact dir — wrapped so a failure here NEVER changes
the run's exit code (R7: cosmetics never block).
4. Artifact hosting (U0.4): runner writes to a host dir; dashboard bind-mounts it read-only to serve
`/runs/<id>/...`. Decide details + record in DECISIONS.
## 2026-05-31T06:00Z — U0 complete + CLAIMED
Implemented U0.1U0.4. Two real end-to-end runs on cc-ci confirm the translation layer (the binding
risk the Adversary flagged at df54693) produces correct levels:
- **custom-html-tiny** (stateless, not backup-capable, ≥2 versions): install+upgrade pass, backup/
restore skip→N/A, no custom → **level=2**, cap "L3 backup/restore N/A". Proves gap-caps on real data.
- **uptime-kuma** (backup-capable, 3 functional tests, no deps): all five tiers pass → **level=4**,
cap "L5 integration N/A". Proves a full clean climb with no SSO surface caps at L4.
Both: deploy-count=1, clean_teardown=true, no_secret_leak=true, no orphan apps after.
Design notes / WHY:
- Chose STRICT monotonic capping (N/A caps like FAIL, distinct reason) over "N/A transparent for middle
rungs" because the only worked example in §4.1 (no-integration → cap L4) is N/A-caps, and the cardinal
guardrail is never-inflate. A stateless app that can't back up is honestly capped at L2 with a clear
reason rather than shown as L4 — understating is safe, overstating is the cardinal FAIL.
- Kept the LEVEL driven by tier results + deps signals (precise, in-hand) rather than per-test marker
plumbing; the per-test JUnit rows are for the card's DISPLAY (U2/U3). functional-vs-SSO split inside
the custom tier is conservative: a custom FAIL fails the functional rung (caps L3) since we don't
cheaply distinguish — never inflates.
- results.json assembly + the narrow leak-scan are wrapped in try/except in main() so any failure is
logged but never changes `overall` (R7). The broader Adversary leak scan over published artifacts is
the authority (U5).
- "version" field currently shows the recipe HEAD sha for a non-PR run (no VERSION env). Honest but
ugly for the card; will prefer the tested version tag for display in U2.
Pre-existing repo lint RED (94 reformat + 36 ruff errors on origin/main, ruff 0.7.3 on CI devshell):
not mine, flagged in STATUS for the operator. My new files are clean; run_recipe_ci.py left better
than found (1 vs 4 errors). NOT reformatting 94 cross-phase files in Phase 3 (out of scope, huge noise).