Per operator: intentional skips now render like a pass row but labelled
'INTENTIONAL SKIP' (muted green) with the declared reason on the line beneath;
unintentional skips render amber 'UNINTENTIONAL SKIP' with a prompt to add a test
or declare them. The cap line is back to just the level-cap reason (the per-rung
reason now lives in the rows). Labelled, so it never reads as a PASS.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per operator: drop the gap-sensitivity / cap-intent-clause / stale-detection
machinery. Model is now dead simple — recipe_meta.EXPECTED_NA = {rung: reason}
lists the rungs a recipe intentionally skips; ANY rung skipped (N/A) and not in
that list is unintentional.
results.json: replace the 'na' block + level_cap_intent with
skips: { intentional: {rung: reason}, unintentional: [rung] }
plus level_cap_rung (which rung capped). Badge/card derive intentional-vs-
unintentional from whether the capping rung is in the intentional list. Skips
still cap the level (never inflate). custom-html-tiny lists all three rungs it
intentionally skips (backup_restore, integration, recipe_local).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The level badge gains a third segment derived from level_cap_intent:
- amber 'gap?' when the climb was capped by an UNDECLARED gap-sensitive N/A
(backup_restore / functional) — a likely-missing test (unexpected skip)
- muted 'expected' when capped by a DECLARED intentional N/A (reviewed, nothing to fix)
- nothing extra for a clean cap, a full climb, or a real failure.
Font-safe text labels (no emoji) so the SVG renders headless/anywhere. Badge
never inflates — it only annotates the cap the level already reflects.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two changes the operator asked for after noticing custom-html-tiny PR #6 has no
backup/restore or functional coverage:
1) Intentional-vs-accidental N/A. A recipe can now declare
recipe_meta.EXPECTED_NA = {rung: reason} to mark a tier as deliberately not
applicable (e.g. a stateless static server has no backup surface). N/A still
caps the level — the harness never claims a rung it did not verify — but the
run is now annotated 'intentional · <reason>' instead of being indistinguishable
from a forgotten test. An *undeclared* N/A on a gap-sensitive rung
(backup_restore, functional) is surfaced as a 'possible coverage gap', and a
stale EXPECTED_NA (declared N/A but actually exercised) is surfaced too. All
non-blocking (R7): results.json gains level_cap_intent + an block, the
summary card shows the clause, and the CI log prints the gap/stale warnings.
(results.classify_na/cap_intent are pure + unit-tested; level.py untouched.)
custom-html-tiny declares backup_restore intentionally N/A.
2) custom-html-tiny functional test: writes a random file into the served content
volume (via the volume mountpoint, like install_steps.sh, since the SWS image
is shell-less), asserts exact-byte round-trip + a real 404 on a missing path —
proving the static-web-server actually serves the volume, not a 200-everything
fallback.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
nginx:alpine swarm service serving /var/lib/cc-ci-reports behind traefik
(Host(report.ci.commoninternet.net) + wildcard TLS), deployed by a reconcile
oneshot mirroring dashboard.nix. The /recipe-report skill writes the weekly
HTML pages there; nginx serves them live. report.ci.* already resolves
(wildcard *.ci DNS) and is covered by the wildcard cert.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Operator preference: each !testme should get its own comment response so a
re-run is visible in the PR timeline. process_testme now always posts a fresh
⏳ placeholder comment; watch_and_reflect edits THAT comment to the result.
(Was: reuse/edit a single marker comment in place — which made re-runs on an
unchanged head invisible, only updating commit status.)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Having test_backup.py in custom-html-bkp-bad caused both bad-backup and bad-restore
to fail at the backup tier. Create custom-html-rst-bad with its own cc-ci test dir
that has ops.py+test_restore.py but NO test_backup.py, so:
- backup: only generic test_backup_artifact → PASS (snapshot exists)
- restore: pre_restore writes 'mutated', marker stays 'mutated' after restore → FAIL
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
backup-bot-two ignores backupbot.backup.path labels and always backs up
the full volume, making path-based restore-RED infeasible.
New approach: custom-html-bkp-bad has no pre_backup → marker never seeded
→ backup snapshot has no ci-marker.txt. pre_restore writes 'mutated'.
After restore: marker is MISSING or 'mutated' → test_restore_returns_state FAILS.
upgrade=skip (no version tags) is acceptable since passing_tiers_before=[install,backup].
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
No ops.py::pre_backup for custom-html-bkp-bad → ci-marker.txt never seeded.
test_backup_captures_state asserts marker=='original' → MISSING → FAIL → backup=RED.
This works regardless of backupbot label behavior.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
backupbot-two ignores nonexistent backup paths and backs up the whole
volume, making the bad-path approach unreliable. New approach:
- Create recipe-maintainers/custom-html-bkp-bad on Gitea (custom-html
without backupbot.backup=true label) — SHA 4e584063a99a
- Add tests/custom-html-bkp-bad/recipe_meta.py with BACKUP_CAPABLE=True
so the harness runs the backup tier despite auto-detect returning False
- Without a labeled container, backup-bot-two produces no snapshot →
parse_snapshot_id=None → test_backup_artifact fails → backup=RED ✓
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both compose.yml uploads had empty files due to a bash encoding bug.
Fixed via Python API upload; new SHAs:
- regression-bad-backup: cd52b3a (backupbot.backup.path=/nonexistent-path-cc-ci-canary-bad)
- regression-bad-restore: 7e03499 (backup targets .backup-data subdir + command creates it)
Adversary confirmed bad-install ✓ and bad-upgrade ✓ from run artifacts.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- tests/regression/test_canaries.py: replace `from .conftest import ...`
(relative import fails when not a package) with sys.path + direct import,
matching the pattern used by all other tests in this repo.
- Delete machine-docs/BUILDER-INBOX.md (Adversary inbox consumed).
- Update STATUS-regression.md + JOURNAL-regression.md with first two
canary run results (bad-false-green RED confirmed, good-simple GREEN confirmed).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three canaries (@pytest.mark.canary) drive the real cold CI lifecycle:
- good-simple: custom-html-tiny @ main (435df8fc) — fast signal, expects GREEN
- good-significant: lasuite-docs @ main (290a8ad7) — multi-service, expects GREEN
- bad-false-green: custom-html @ v5-stale-docroot (71e7326a) — expects RED
Semantic teeth: beyond exit-code, each test asserts that specific named tests
ran in results.json stages (test_serving, test_serving_and_frontend, test_content_type).
If an assertion is removed, the named test disappears → regression test fails.
Includes conftest (run_recipe_ci helper + stage_has_{passing,failing}_test),
README (cadence policy, how to run, how to add), and phase state files.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Server regression canaries (tests/regression/, pytest -m canary) are
expensive — run them at milestones (polish/review/release), NOT every
commit. Per-recipe lifecycle tests keep their normal per-PR !testme
trigger. Plus the standing 'never weaken a test to pass' guardrail.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ph1: 3 mirrors cold-verified — lasuite-drive/mailu/mumble all HTTP 200,
empty=false, default_branch=main, HEAD SHAs match upstream exactly.
Ph3: POLL_REPOS has 20 repos; all 9 new recipes present + all have tests/.
Ph2: tests authored (recipe_meta.py, test_health_check, test_branding, PARITY.md)
but builds #153/#154 predate authoring (2026-05-28 vs 2026-06-02). Plan requires
!testme green AFTER authoring. Filing A-mirror-1. Phase 4 deploy NOT blocked.
Ph4 operator deploy: OK to proceed. A-mirror-1 must close before Phase 5 DONE.
Supersede the CronCreate/busybox notes: the weekly /upgrade-all now runs
via the reboot-safe cc-ci-upgrade-all systemd timer in the orchestrator
flake. Records the T0 PASS and the schedule move (Mon 23:04 -> Sun 02:00 UTC).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CronCreate mechanism cold-verified: upgrader-cron.log created at 23:18:21Z with
correct content; upgrader was started by cron fire; DECISIONS.md updated.
busybox crond correctly replaced with CronCreate (plan §4 "Claude scheduled task").
All V1-V9 + §4 cron now PASS within 24h. No open findings, no VETOs.
Builder may write ## DONE to STATUS-5.md.