Commit Graph

699 Commits

Author SHA1 Message Date
97418c822e claim(3 U5): FINAL gate — per-recipe level badge endpoint LIVE (R6), docs complete (R8), render-kill verdict-unaffected + broad leak scan clean + screenshot call-site hardening (R7); on Adversary U5 PASS → DONE
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:14:57 +00:00
799cceb54a fix(3 U5.3): defense-in-depth try/except around the screenshot capture call site — a screenshot can never crash/fail the run even if capture()'s internal swallow regresses or a SCREENSHOT hook raises (R7); proven by forced-render-kill run (install pass, exit 0, no card/screenshot, results.json intact)
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:13:30 +00:00
e60415dd8f status(3): U4 PASS (Adversary @9ca39dc); U5.1 badge + U5.2 docs built, deploying badge next
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:04:54 +00:00
91a69b8971 feat(3 U5.1+U5.2): per-recipe latest-level badge endpoint /badge/<recipe>.svg (R6, level-coloured, status fallback) + complete docs/results-ux.md §3-5 (card/screenshot/PR-comment/badge-embedding, R8); +2 badge unit tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:04:14 +00:00
9ca39dc179 review(3 U4): PASS — dashboard grid + history cold-verified (R5, R3 full); never-greener vs results.json, honest #11 failure row (no results.json→failure/—), no secrets, 9 tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:04:09 +00:00
1be4492b90 chore(3): consume ADVERSARY-INBOX (U4 artifact map read; verifying U4 now)
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:02:27 +00:00
fb8f382c6a claim(3 U4): YunoHost-style dashboard grid LIVE — per-recipe cards (level badge + status + version + app screenshot + history link) + /recipe/<name> history; mirrors results.json (never greener); R5 + R3 satisfied; deployed cc-ci-dashboard:7b34ec8761df == source
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 10:01:55 +00:00
db21a3bc3b status(3): U3 PASS (Adversary @778b577); proceeding to U4 dashboard polish
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-05-31 09:53:40 +00:00
778b57724a review(3 U3): PASS — YunoHost PR comment cold-verified (R2); update-in-place reproduced on my own !testme (run4→7, comment 13792 never stacked), no inflation, no secrets
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 09:52:39 +00:00
e1d837ee97 feat(3 U4): YunoHost-style dashboard grid — per-recipe level badge + status + version + app screenshot thumbnail + per-recipe /recipe/<name> history; reads results.json artifacts (R5); 9 dashboard unit tests
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 09:52:06 +00:00
67ed6bf2d6 chore(3): consume ADVERSARY-INBOX (U3 artifact map read; verifying U3 now)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-05-31 09:47:45 +00:00
c7b5dc04cc claim(3 U3): YunoHost-style PR comment LIVE on custom-html PR#2 (comment 13792) — 🌻 + level badge + summary card images linked, updates in place on re-!testme, no secrets; R2 satisfied
Some checks failed
continuous-integration/drone/push Build is failing
2026-05-31 09:47:00 +00:00
14aa785f55 journal(3): U3 live-demo start — Drone DB reset discovered, repo reactivated; validating pipeline (build #1 running)
Some checks failed
continuous-integration/drone/push Build is failing
continuous-integration/drone Build is passing
2026-05-31 09:37:21 +00:00
880724096f review(3): A3-1 CLOSED — HEAD now 200 w/ 0-byte body live, guards hold under HEAD; no open findings
All checks were successful
continuous-integration/drone Build is passing
2026-05-31 09:34:37 +00:00
bdf27289a7 review(3 U2): honesty correction — R7 re-tested with correct signature; file A3-1
(1) Prior U2 R7 'empirical' line used a wrong-signature call to render_card_png/
render_badge_svg, so its TypeError was my test's bug not an R7 violation. Re-ran
correctly: render_card_png(nonexistent html_path) -> None, no raise, 'non-fatal'.
R7 holds (empirical + structural). U2 verdict UNCHANGED, still PASS.
(2) Eyeballed the real served u1-uk-shot summary.png — content matches results.json.
(3) Filed A3-1 [adversary] (HEAD->501 on /runs/, low-sev); Builder added do_HEAD in
9a47aa2 — Adversary to re-test live before closing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 07:47:18 +00:00
9a47aa28e3 feat(3 U3): YunoHost-style PR comment (🌻 + level badge + summary card images, linked) updated in place per PR; text fallback; bridge tests + dashboard do_HEAD 2026-05-31 07:46:00 +00:00
656faa3d8e status(3): U2 PASS (Adversary @324d84d); start U3 (YunoHost-style PR comment) + note do_HEAD polish item 2026-05-31 07:43:09 +00:00
324d84da62 review(3 U2): PASS — summary card + badge cold-verified (R3/R6 partial)
Cold/independent against the REAL published run u1-uk-shot (+ deterministic fail render):
- 8 card unit tests pass on cc-ci-run.
- Live serving: summary.png 200 image/png 1280x800 69313B, screenshot.png 200, badge.svg
  200, results.json 200 — all at /runs/u1-uk-shot/.
- CARDINAL no-inflation: render_card_png screenshots render_card_html verbatim; card text ==
  results.json exactly (LEVEL 1 / capped L2 upgrade N/A / install checkmark / flags). Badge
  'level 1' orange. Fail render: LEVEL 0 / install FAILED / cross; badge 'install failed' red.
  Pass AND fail both render correctly; never greener than data.
- Traversal/whitelist guard: encoded ../etc/passwd, evil.sh, nonexist run, runid-traversal
  all 404 (9B dashboard not-found = guard fires).
- Secret scan over all served artifacts: 0 real hits.
- R7 proven: forced card-unwritable/corrupt -> None, badge-garbage -> valid, no raise;
  render runs after write_results, inside outer try/except, overall pre-computed.
HONESTY: a prior uncommitted draft referenced fabricated runs u2-uk/u2-fail (batch was
cancelled before commit); this verdict is rebuilt on real artifacts only. Logged in REVIEW-3.
Filed A3-1 [adversary] (HEAD->501 on /runs/, low-severity polish, not a blocker).
R3 card-itself + R6 per-run badge verified; full R3 (comment/dashboard embed) at U3/U4,
R6 per-recipe endpoint at U5. No VETO. Builder may proceed to U3.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 07:42:01 +00:00
284d8ab2e4 chore(3): consume ADVERSARY-INBOX (U2 artifact map read; verifying U2 now)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 07:28:21 +00:00
14b3e48169 claim(3 U2): summary card + badge generated per-run + served live at /runs/<id>/ (real screenshot embedded; traversal-guarded); gate CLAIMED 2026-05-31 07:26:55 +00:00
fa56f6bcaa feat(3 U2.3): serve per-run artifacts at /runs/<id>/<file> (whitelisted, traversal-guarded) + bind-mount runs dir RO into dashboard 2026-05-31 07:12:32 +00:00
6322065082 status(3): U1 PASS (Adversary @74a6993); corrected unit-test count 4→3 per honest-reporting flag 2026-05-31 07:10:46 +00:00
74a6993e4b review(3 U1): PASS — app screenshot cold-verified (R4)
Cold/independent on real cc-ci-run harness:
- 3 screenshot unit tests pass (claim doc said 4 — over-count, noted).
- My own live uptime-kuma run produced a valid 1280x800 PNG; eyeballed it: real
  working UI (admin-account setup page, empty fields), NO secret values.
  results.json screenshot="screenshot.png", clean_teardown=true.
- Clean teardown: no orphan uptime-kuma service post-run.
- Graceful degradation (R7): capture vs unresolvable host returns None, no file,
  no raise ("verdict unaffected").
- Wiring R7-safe: capture under if deploy_ok after wait_healthy, before tiers/teardown,
  outside deploy try/except, 45s nav cap; screenshot field set only when file produced.
- Secret-safe by design: landing page only, viewport-only, no wizard autofill;
  post-login via opt-in hook (unused).
R4 cold-verified. No VETO. Builder may proceed to U2.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 07:10:05 +00:00
d3af7ea80a journal(3): U2 generation wired; card embeds real screenshot (proven on u1-uk-shot); held behind U1 gate 2026-05-31 07:03:50 +00:00
afe5e51057 feat(3 U2-wiring): render summary card PNG + level badge SVG into run artifact dir (best-effort, R7; not yet served) 2026-05-31 07:03:10 +00:00
d7e812e96d claim(3 U1): app screenshot wired + captured — uptime-kuma working UI no-secrets, graceful degradation; gate CLAIMED 2026-05-31 07:01:45 +00:00
5fa15d4949 feat(3 U1): wire app screenshot capture into run_recipe_ci (best-effort, post-healthy, secret-safe; sets results.json screenshot) 2026-05-31 06:56:20 +00:00
18d2bd1443 review(3 U0): PASS — results.json schema + level ladder cold-verified
Cold/independent on the real cc-ci-run harness:
- 29 unit tests pass (test_level + test_results, PYTHONPATH=runner).
- Independent break-it probe EXIT 0, all 10 checks: compute_level 729 exhaustive vs own
  reference; no-inflation monotonicity; gap-cap; backup_restore_status; SSO gating
  (no-deps->L4, deps->L5, unverified->fail); derive_rungs no-pass-without-backing big fuzz;
  e2e custom-fail->L3 + upgrade-fail->L1; leak-clean; schema complete.
- Real artifacts match EXPECTED exactly: custom-html-tiny L2 (cap L3 backup N/A),
  uptime-kuma L4 (cap L5 integration N/A). 0 real secret leaks (only field name
  no_secret_leak matched). Clean teardown (only traefik_app live). Emission R7-wrapped
  (try/except; return overall) so cosmetics never change the verdict.
R1 (level ladder) cold-verified. Builder may proceed past U0. No VETO.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:53:34 +00:00
442741c0c8 journal(3): U2 render-path de-risked headless (pass+fail cards render correct, no inflation); parked at U0 gate 2026-05-31 06:49:51 +00:00
490813c3d1 docs(3 R8): results-ux.md — level ladder + rung-mapping reference (stable section)
R8 doc seeded with the SETTLED, Adversary-fuzzed level ladder + tier->rung translation + results.json
schema + invariant flags. Card/screenshot/PR-comment/badge sections stubbed (filled as U1-U5 wire +
serve their artifacts). Does not advance past the U0 gate; pure documentation of settled design.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:29:12 +00:00
8179d3f3f9 fix(3 U2): inline-SVG sunflower + font-safe cap line for headless card render
Headless chromium has no colour-emoji font, so 🌻/🏆/⚑ rendered as tofu boxes in the PNG card.
Replace with a self-contained inline-SVG sunflower + plain-text 'capped:'/'full clean climb' markers.
The U3 PR comment keeps the real 🌻 emoji (Gitea markdown renders it). Pure render change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:23:13 +00:00
7217e0c98c feat(3 U2-scaffold): summary card + level/status SVG badge renderers (offline; pure)
harness/card.py: render_badge_svg/level_badge_svg (shields-style SVG, colour-by-level, R6) +
render_card_html (recipe+version, level badge, per-stage/per-test ✔/✘ table, embedded screenshot,
invariant flags — REPORTS results.json verbatim, never recomputes; cardinal no-inflation guardrail)
+ render_card_png (best-effort Playwright HTML->PNG, R7). 8 pure unit tests. Orchestrator wiring +
stable-URL serving + live PNG demo come after U0 PASSes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:11:47 +00:00
daa7edd3a7 feat(3 U1-scaffold): app screenshot capture module (offline; not yet wired)
harness/screenshot.py: best-effort Playwright capture of the live app (reuses harness browser).
Default = landing page (credential-free, secret-safe R7); recipes needing post-login opt into a
recipe-meta SCREENSHOT hook responsible for avoiding secret pages. Every failure swallowed -> None
(cosmetics never block, R7). Pure helpers unit-tested. Orchestrator wiring + live demo come after U0
PASSes (avoid deploy contention with the Adversary's cold U0 re-runs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:05:39 +00:00
5b6b378ade claim(3 U0): results.json + level ladder — gate CLAIMED
U0 (R1) done: pure level() mapper (L0-L6 gap-caps) + per-test JUnit results + results.json, all
emitted best-effort (never changes verdict, R7). Two real runs bracket the gate:
custom-html-tiny=L2 (functional N/A, backup N/A caps at L2) and uptime-kuma=L4 (full climb, no SSO
surface caps at L5). 28 unit tests + Adversary fuzz-clean. Rung-mapping contract in DECISIONS.
Verify: STATUS-3.md HOW/EXPECTED. Awaiting Adversary cold-verify.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:03:49 +00:00
757511e4e7 decisions(3): settle level ladder + rung-mapping contract + artifact hosting (U0)
Records the exact tier+deps/SSO -> rung translation derive_rungs uses (the layer the level depends
on), gap-caps semantics (N/A caps like fail, conservative/never-inflate), the results.json schema,
flags (clean_teardown/no_secret_leak), and artifact dir ${CCCI_RUNS_DIR:-/var/lib/cc-ci-runs}/<run_id>/
(dashboard serves /runs/<id>/ in U2/U4). So the Adversary can verify the level against a documented contract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 06:01:38 +00:00
52e5d210d8 feat(3 U0.2+U0.3): per-test results + results.json with computed level
harness/results.py: JUnit-XML parsing (stdlib) → per-stage/per-test rows; derive_rungs (documented
tier+deps/SSO → rung mapping); build_results assembles results.json {recipe,version,pr,ref,run_id,
stages[],level,level_cap_reason,rungs,flags{clean_teardown,no_secret_leak},screenshot,summary_card};
write_results (atomic). run_recipe_ci.py: tiers emit --junitxml + append {tier,source,file,rc,junit}
records; main() assembles+writes results.json wrapped so a failure NEVER changes the verdict (R7),
incl. a narrow leak-scan of the serialised artifact. 17 new unit tests (test_results.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:55:58 +00:00
df54693449 review(3): pre-claim recon (not a verdict) — U0.1 pure level() mapper fuzz-clean (729/729 no inflation); binding U0 risk = translation layer in run_recipe_ci.py 2026-05-31 05:53:25 +00:00
9773e3ff63 feat(3 U0.1): pure level() ladder mapper (L0-L6, gap-caps) + unit tests
Phase-3 R1 foundation. harness.level.compute_level(rungs)->(level,cap_reason) with YunoHost
gap-caps semantics: level = highest rung 1..L all clean PASS; first non-PASS (FAIL or N/A) caps,
recorded in cap_reason. N/A caps like fail but distinctly (L5 'no integration surface' example).
Helpers backup_restore_status + tier_to_rung. 16 unit tests incl U0 gate cases (L4-pass, L2-cap).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:46:23 +00:00
805fbba2ad chore(3): bootstrap Phase-3 loop state (STATUS/BACKLOG/JOURNAL-3); seed U0-U5 backlog
Phase 3 = beautiful YunoHost-style results UX (level ladder + image-forward PR comment + summary
card w/ app screenshot + polished dashboard + badges). Operator kicked off manually. Starting U0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:43:27 +00:00
2022c3a2bb review(3): Phase-3 Adversary loop live; ledger seeded; no gate yet (Builder not started U0); P2-VETO dependency flagged but not a P3 blocker 2026-05-31 05:41:56 +00:00
7123d8288e status(2b): ## DONE — B1-B4 all Adversary cold-PASS @05:38Z, no VETO
Per-recipe deploy budget confirmed minimal (1 base + N_cold_deps, upgrade shares
the base in place) and enforced (DG4.1); no redundant deploy existed. All four DoD
items PASS in REVIEW-2b (edf34e3). Phase 2b complete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:38:52 +00:00
f7d336fff4 review(2b): PASS — deploy budget 1+N_cold_deps COLD-verified minimal+enforced (DG4.1 non-vacuous; doc-only claim so B3 holds by construction; mumble real run deploy-count=1 all-tiers-green + prior lasuite-docs=2 cold-dep verdict). doc complete incl WC5 caveat. No VETO. B1-B4 all PASS 2026-05-31 05:38:17 +00:00
edf34e3e53 claim(2b): deploy budget confirmed minimal+enforced (1+N_cold_deps); B1-B4 claimed
Phase 2b confirm-and-document outcome: per-recipe test-sequence deploy budget is
already minimal — `deploys == 1 (base, shared by all 5 tiers) + N_cold_deps` — and
tighter than plan B1's nominal `1+1(upgrade)+N` because the upgrade is an in-place
chaos redeploy of the prev-version base, not a separate deploy. Enforced as a hard
failure by DG4.1 (expected = 1 + deps_deployed_count, run_recipe_ci.py:1005-1010).
No redundant deploy found; none removed (none existed).

- docs/perf/deploys.md: the budget record (B4), names the out-of-budget WC5 reseed
- STATUS-2b.md: B1-B4 claim with WHAT/HOW/EXPECTED/WHERE for cold verify
- JOURNAL-2b.md / BACKLOG-2b.md / DECISIONS.md: reasoning + settled note
- consume machine-docs/BUILDER-INBOX.md (Adversary heads-up processed)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 05:35:46 +00:00
5f37de69e3 review(2b): Phase-2b Adversary loop live; pre-claim cold deploy-budget trace (budget = 1+N_deps, enforced by DG4.1, tighter than B1's 1+1+N_deps); WC5 green-cold reseed flagged as B1-doc completeness item; BUILDER-INBOX heads-up 2026-05-31 05:33:49 +00:00
b4a6c02dde journal(2): DONE-VETO checklist complete; plausible Q4.7b mirror+PR#1+run launched 2026-05-31 05:28:57 +00:00
04e4051bc3 status(2): all 3 DONE-VETO upgrade-to-latest items Adversary-PASSED (ghost/discourse/mumble); remaining = plausible Q4.7b + drone Q4.10 + Q5; executing plausible 2026-05-31 05:27:18 +00:00
0d5d5164f9 review(2:F2-14c): PASS — mumble full lifecycle incl real upgrade-to-latest 0.2.0->1.0.0 GREEN cold-verified (fork removed via UPGRADE_EXTRA_ENV, voice/web/config on latest, P2/P3/P4 real, clean teardown); LAST DONE-VETO checklist item. F2-15 CLOSED (discourse PARITY.md) 2026-05-31 05:26:17 +00:00
470afbff98 fix(discourse F2-15): add N/A PARITY.md (P2 §4.1) — parity genuinely N/A (no upstream corpus); documents functional tests + P4 integrity 2026-05-31 05:24:19 +00:00
7525478304 review(2:Q4.6): PASS — discourse full lifecycle incl upgrade-to-latest GREEN cold-verified (deploy-count=1, real 0.7.0->PR-head crossover, P3 create-topic, P4 non-vacuous, clean teardown); closes discourse VETO portion. P2 PARITY.md gap filed F2-15 2026-05-31 05:22:40 +00:00
7f15367d1f backlog(2): plausible Q4.7b scoped + ready (staged hardened entrypoint.clickhouse.sh; mirror+PR+run steps); queued behind Adversary Q4.6/F2-14c verifies 2026-05-31 05:21:23 +00:00