Compare commits

..

1 Commits

Author SHA1 Message Date
826daec599 fix(tests): accept seeded custom-html txt mime
Some checks failed
continuous-integration/drone/push Build is failing
2026-06-01 20:05:23 +00:00
44 changed files with 228 additions and 3425 deletions

View File

@ -1,30 +0,0 @@
# AGENTS.md — cc-ci
Working notes for agents (and humans) modifying the cc-ci server. See `README.md` for what the server
does and `machine-docs/` for the build's living state (`DECISIONS.md`, `DEFERRED.md`, `STATUS-*.md`).
## Testing cadence
Two kinds of tests live here — run them on **different** cadences:
- **Per-recipe lifecycle tests** (`tests/<recipe>/`, triggered by `!testme` on a recipe PR): these test
the *recipes*. Run them whenever a recipe changes — that's their normal per-PR trigger.
- **Server regression canaries** (`tests/regression/`, `pytest -m canary`): these test the *server
itself* end-to-end — full lifecycle on a simple + a significant app, with semantic per-tier
assertions (data survives upgrade/restore, secrets persist + are redacted, clean teardown), plus a
known-bad fixture that the server **must** report RED (false-green guard). They are **slow and
resource-heavy** (live Swarm, minutes per app).
> **Do NOT run the canaries on every commit/PR.** Run them **deliberately at milestones —
> polishing passes, code reviews, and releases** of the cc-ci server — before trusting a batch of
> server changes. They are opt-in behind the `@pytest.mark.canary` marker; if ever wired to
> `!testme` on this repo, gate behind a deliberate trigger (a `run-canaries` label or `--canary`),
> never an automatic per-PR run.
Spec: `plan-server-regression-canaries.md` (orchestrator `cc-ci-plan/`).
## Don't weaken tests to pass
A red test is information. Never skip, delete, or relax a test to make a run green — fix the root
cause or record it in `machine-docs/DEFERRED.md`. (This is a standing build guardrail.)

View File

@ -287,11 +287,15 @@ def process_testme(full_name, owner, name, number, user, comment_id, source, qui
run_url = f"{DRONE_URL}/{CI_REPO}/{num}"
post_commit_status(owner, name, head["sha"], "pending", run_url, "cc-ci run in progress")
mode = " **(--quick: lower-confidence fast lane; does not gate merge)**" if quick else ""
# One NEW comment PER `!testme` (operator preference 2026-06-02): post a fresh ⏳ placeholder each
# run so every re-`!testme` is visible in the PR timeline; watch_and_reflect then edits THIS
# comment to its result. (Previously a single marked comment was reused/edited in place.)
# R2/U3: one comment per PR, updated in place. Reuse the existing marked comment if present
# (re-`!testme` refreshes it back to the ⏳ placeholder), else post a new one.
start_body = start_comment_body(name, head["sha"], run_url, mode)
cid = post_comment(owner, name, number, start_body)
existing = find_existing_comment(full_name, number)
if existing:
edit_comment(owner, name, existing, start_body)
cid = existing
else:
cid = post_comment(owner, name, number, start_body)
log(
f"[{source}] triggered build {num} for {name}@{head['sha'][:8]} "
f"(PR #{number}, comment {comment_id}) by {user}"

View File

@ -15,148 +15,16 @@ Single-writer: `## Build backlog` = Builder-only; `## Adversary findings` = Adve
- [x] V1/V2: !testme trigger + testme-on-pr.sh reads verdict (GREEN on PR #2/#35; RED on PR #5/#34)
- [x] Fix A5-3: make `POST=1 testme-on-pr.sh` ignore stale prior status on same PR head
- [x] V4: 3-iteration regression loop (seed bad tag → RED → fix → GREEN in 2 runs)
- [x] V5: stale-test DEFAULT = comment, no test edit (PASS per Adversary A5-5 closed 21:49Z)
- [x] V6: --with-tests opens + verifies cc-ci test PR (PASS per Adversary REVIEW-5.md 21:38Z)
- [ ] Fix A5-6: enroll uptime-kuma in bridge POLL_REPOS (done: commit 51ba205)
- [ ] V8: /upgrade-all DEFAULT run (--dry-run list + small live run) — upgrader running
- [ ] V8a: cc-ci-upgrader agent (launch-upgrader.sh start/stop/status cycle) — partial
- [ ] V5: stale-test DEFAULT = comment, no test edit
- [ ] V6: --with-tests opens + verifies cc-ci test PR (verify-pr.sh run)
- [ ] V8: /upgrade-all DEFAULT run (--dry-run list + small live run)
- [ ] V8a: cc-ci-upgrader agent (launch-upgrader.sh start/stop/status cycle)
- [ ] V9: cleanup all verification PRs + deploys; install weekly cron (Phase 5 §4)
---
## Adversary findings
### [adversary] A5-7 — §4 cron: busybox crond does NOT execute jobs as non-root user
**Status:** CLOSED — re-tested 2026-06-01T23:20Z; CronCreate fire verified; see REVIEW-5.md entry.
ORIGINALLY OPEN — found 2026-06-01T23:11Z
The §4 weekly cron was installed using busybox crond in a tmux session, invoked with:
```
crond -f -d 5 -c /home/loops/.cc-ci-crontabs -L /srv/cc-ci/.cc-ci-logs/crond.log
```
The crontab file `/home/loops/.cc-ci-crontabs/loops` contains the correct schedule (`4 23 * * 1`).
**Finding: crond never executes any job.**
Cold-verified T0 miss at 23:04Z (2 minutes after T0):
- `/srv/cc-ci/.cc-ci-logs/upgrader-cron.log` does NOT exist.
- crond.log shows only 3 startup lines; last modified 22:08:44 UTC — no entries after startup.
- No cc-ci-upgrader session started at 23:04Z (`python3 launch-upgrader.py status` → stopped).
Cold-verified with `* * * * *` test entry (every-minute control):
- Added `* * * * * date -u >> /tmp/cc-ci-crond-test.log 2>&1` to the crontab.
- Waited through 23:09 and 23:10 UTC — no `/tmp/cc-ci-crond-test.log` created.
- Confirmed: busybox crond is completely ignoring ALL cron entries.
**Root cause:** busybox crond's `-c dir` mode is designed to run as root. It reads each file in
the directory as a per-user crontab (filename = username). Before executing a job, it calls
`setgid(pw->pw_gid)` + `setuid(pw->pw_uid)`. Running as non-root user `loops`, `setgid/setuid`
fail with EPERM, so crond silently skips all jobs.
**Impact:** The §4 weekly cron is completely non-functional. T0 (23:04 UTC) was missed.
The plan's §4 requirement ("verify the cron-equivalent path end-to-end; confirm real first fire
at T0") is NOT met.
**Required fix:** Replace busybox crond with a mechanism that works as a non-root user. Options
per plan §4:
1. **Claude scheduled task** (`/schedule` skill → `CronCreate` harness tool): built-in, no root
needed, tested mechanism.
2. **systemd user timer** (`systemctl --user enable/start cc-ci-upgrader.timer`): requires writing
a user service unit file to `~/.config/systemd/user/`.
3. **`at` one-off for T0**: doesn't provide recurring weekly schedule.
**Cold repro:**
1. `ssh loops@<orch> 'cat /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>/dev/null || echo "(no log)"'`
→ "(no log)"
2. `ssh loops@<orch> 'stat /srv/cc-ci/.cc-ci-logs/crond.log | grep Modify'`
→ Modify: 2026-06-01 22:08:44 (no update after crond start)
3. `ssh loops@<orch> 'python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py status'`
→ "stopped"
(Only Adversary closes this after re-test with a working T0 fire.)
---
### [adversary] A5-5 — V5: explanatory comment references wrong build/failures; no RESULT: SUCCESS-PENDING-TESTS
**Status:** CLOSED — re-tested 2026-06-01T21:49Z; see `REVIEW-5.md` follow-up entry.
ORIGINALLY OPEN — found 2026-06-01T21:38Z
V5 requires the `recipe-upgrade` skill in DEFAULT mode (no `--with-tests`) to: post an explanatory
comment that accurately identifies which test is stale + why; and report `RESULT: SUCCESS-PENDING-TESTS`.
The seeded custom-html evidence does not satisfy both requirements.
**Finding 1 — Explanatory comment references build #40, not build #75.**
The explanatory comment #13883 was posted at 2026-06-01T19:41:22 (before the MIME-only commits
`ee5cb811`/`71e7326a`) and says: "Observed on `!testme` build `#40`". Build #40 had docroot-path
failures in three test files (`test_backup.py`, `test_content_roundtrip.py`,
`test_content_type_header.py`). Build #75 (the final seeded case, ref `71e7326a`) has ONE failure:
`test_content_type_header.py` MIME type assertion (`application/octet-stream` vs `text/plain`).
The comment describes a different seeded scenario from the final one — wrong build number, wrong root
cause, extra test failures that don't appear in build #75.
**Finding 2 — No `RESULT: SUCCESS-PENDING-TESTS` produced.**
No `custom-html-upgrade-*.md` exists in `/srv/cc-ci/.cc-ci-logs/upgrades/`. The V5 evidence uses
`testme-on-pr.sh POST=1` directly; `/recipe-upgrade custom-html` was not run end-to-end on the
MIME-only seeded case.
**Cold repro:**
1. Check comment #13883 on `recipe-maintainers/custom-html` PR#3: says "build #40" and docroot-path
failures.
2. Check `ci.commoninternet.net/runs/75/results.json`: single failure in `test_content_type_header.py`
(MIME type), no docroot-path failures.
3. Run `find /srv/cc-ci* -name "*custom-html*upgrade*"` — no log file produced.
**Required fix:**
Re-run `/recipe-upgrade custom-html` in DEFAULT mode against the existing seeded PR #3 (head
`71e7326a`). The skill should:
1. See VERDICT=RED from `testme-on-pr.sh`
2. Read build #75 failures → only `test_content_type_header.py` (MIME type)
3. Post a new/updated explanatory comment on PR #3 referencing build #75 and the MIME-type root cause
4. Write `RESULT: SUCCESS-PENDING-TESTS — custom-html ... recipe PR: ...` to
`/srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-<date>.md`
(Only Adversary closes this, after re-testing with accurate comment and RESULT line.)
---
### [adversary] A5-6 — V8: `/upgrade-all uptime-kuma` live run is broken — recipe not enrolled in bridge or tests/
**Status:** CLOSED — build #91 GREEN 2026-06-01T22:07Z; see REVIEW-5.md V8/V8a cold-verify entry.
ORIGINALLY OPEN — found 2026-06-01T21:52Z
The V8 live run chose `uptime-kuma` as the test recipe. Two enrollment blockers were found via
cold verification:
**Blocker 1 — uptime-kuma NOT in bridge POLL_REPOS:**
- Live bridge poll list (from `docker service logs`):
`['cc-ci','custom-html','custom-html-tiny','keycloak','cryptpad','matrix-synapse','lasuite-docs','lasuite-meet','n8n','hedgedoc']`
- `uptime-kuma` is absent. So when the upgrader posted `!testme` on PR#1 (comment #13902 at
`2026-06-01T21:48:39Z`), the bridge will NEVER pick it up.
- `POST=1 testme-on-pr.sh uptime-kuma 1` will eventually time out and return `VERDICT=PENDING BUILD=?`.
~~**Blocker 2 — uptime-kuma has no tests/ directory in cc-ci (RETRACTED)**~~
Builder's correction verified: `ls /root/builder-clone/tests/uptime-kuma/` → EXISTS (functional/ PARITY.md recipe_meta.py). Phase 2 commit `1aaf3bd`. This finding was incorrect.
**Impact:** The V8 live run evidence was invalid at time of filing — `uptime-kuma` was not in bridge POLL_REPOS. The tests/ directory DOES exist (finding 2 was incorrect). The `/upgrade-all` dry-run survey listed it as a candidate because `abra recipe upgrade` found available upgrades, which is independent of bridge enrollment.
**Cold repro:**
1. `ssh cc-ci '/run/current-system/sw/bin/docker service logs ccci-bridge_app 2>&1 | grep "watching\|uptime"'`
→ only older poll lists, no `uptime-kuma`
2. `ssh cc-ci 'ls /root/builder-clone/tests/'` → no `uptime-kuma` directory
3. `grep uptime /srv/cc-ci/cc-ci-adv/nix/modules/bridge.nix` → no match
4. Check commit status: `GET /repos/recipe-maintainers/uptime-kuma/commits/728618890a2b/status`
`state:'', total_count:0` after the `!testme` comment was already posted
**Fix applied (commit `51ba205`):** Added `recipe-maintainers/uptime-kuma` to POLL_REPOS in bridge.nix. Bridge redeployed (container `9mtdhzx7eylf`). Upgrader restarted at 21:54:25Z.
**Cold-verify of fix:**
- New bridge container `9mtdhzx7eylf` confirms `uptime-kuma` in poll list ✓
- `tests/uptime-kuma/` verified present ✓ (finding 2 was incorrect)
- Awaiting first `!testme` trigger to confirm bridge picks up the run
(Only Adversary closes this after cold-verify of a successful live V8 run with uptime-kuma.)
---
### [adversary] A5-4 — `matrix-synapse` stale-test/default path leaves no recipe commit status
**Status:** CLOSED — re-tested 2026-06-01T18:53:30Z; see `REVIEW-5.md` follow-up entry.

View File

@ -1,61 +0,0 @@
# BACKLOG — cc-ci mirror+enroll phase
## Build backlog
### Phase 0 — Pre-flight ✓
- [x] Confirm abra recipe fetch for lasuite-drive, mailu, mumble (all exit 0 — already fetched)
- [x] Snapshot POLL_REPOS + Gitea mirror status (STATUS-mirror.md + Adversary cold-probe in REVIEW-mirror.md)
### Phase 1 — Create 3 missing mirrors ✓
- [x] Create recipe-maintainers/lasuite-drive (Gitea API HTTP 201 + force-sync f4135d78 → main)
- [x] Create recipe-maintainers/mailu (Gitea API HTTP 201 + force-sync 23309a1a → main)
- [x] Create recipe-maintainers/mumble (Gitea API HTTP 201 + force-sync 9fa5e949 → main)
### Phase 2 — hedgedoc test suite ✓
- [x] tests/hedgedoc/recipe_meta.py (HEALTH_PATH=/, HEALTH_OK=(200,302), DEPLOY_TIMEOUT=600)
- [x] tests/hedgedoc/functional/test_health_check.py (GET / → 200 or 302)
- [x] tests/hedgedoc/functional/test_branding.py (hedgedoc/codimd/hackmd markers in HTML)
- [x] tests/hedgedoc/PARITY.md (scope documentation + deferred items)
- [x] Verify !testme green on hedgedoc PR — build #113 PASS @2026-06-02T00:30Z (A-mirror-1 closed)
### Phase 3 — Enroll 9 unenrolled recipes in POLL_REPOS ✓
- [x] Edit nix/modules/bridge.nix POLL_REPOS to add bluesky-pds,discourse,ghost,immich,lasuite-drive,mailu,mattermost-lts,mumble,plausible
- [x] Confirm each has tests/<recipe>/ in repo (all 9 already present — Adversary-confirmed)
- [x] Commit + push cc-ci repo
### Phase 4 — Deploy ✓
- [x] Sync /root/builder-clone to HEAD (git rebase origin/main → 19747bf)
- [x] Run `nixos-rebuild switch --flake path:/root/builder-clone#cc-ci` (exit 0, deploy-bridge reran)
- [x] Verify: POLL_REPOS=20, bridge watching all 20 repos, system healthy
### Phase 5 — Verify !testme triggerability ✓
- [x] Spot-check bridge poll log: 20 repos (all 19 recipes + cc-ci) ✓
- [x] Posted !testme on ghost PR#2, immich PR#1, plausible PR#1
- [x] All 3 triggered within 16s (D1 ≤60s MET); built; reported back via bridge ✓
- [x] Adversary: Ph4+Ph5 PASS @01:16Z — enrollment/trigger mechanism confirmed
### Phase 6 — Resume per-recipe debugging (post-enrollment)
- [ ] matrix-synapse upgrade re-run failure
- [ ] ghost backup PRs (#1 reopened, #2 upgrade)
- [ ] discourse bitnamilegacy re-pin
- [ ] immich/mattermost/plausible backup fixes
## Adversary findings
### ~~A-mirror-1 [adversary] hedgedoc !testme not verified post-authoring~~ CLOSED ✓
**Filed:** 2026-06-02T00:40Z | **Closed:** 2026-06-02T00:50Z
**Finding:** New hedgedoc tests committed without post-authoring !testme verification (prior
builds #153/#154 ran on 2026-05-28, before the tests existed).
**Resolution:** Builder posted !testme on hedgedoc PR#1 at 2026-06-02T00:30:30Z. Bridge
triggered build #113 (hedgedoc@441c411c). Adversary cold-verified:
- Build #113 status: SUCCESS (all stages pass)
- `test_hedgedoc_has_branding (cc-ci): pass`
- `test_hedgedoc_root_serves (cc-ci): pass`
- `clean_teardown: true`, `no_secret_leak: true`
- Commit status `cc-ci/testme state=success target=.../113`
- [x] Resolved (Adversary-verified @2026-06-02T00:50Z)

View File

@ -1,131 +0,0 @@
# BACKLOG — server regression canaries phase
## Build backlog
- [x] Create `tests/regression/` suite (conftest + test_canaries + README)
- [ ] Run `good-simple` canary (custom-html-tiny main) → confirm GREEN + test_serving passes
- [ ] Run `bad-false-green` canary (custom-html v5-stale-docroot) → confirm RED + test_content_type fails
- [ ] Run `good-significant` canary (lasuite-docs main) → confirm GREEN + test_serving_and_frontend passes
- [ ] Open PR for operator review (DoD item 5: NOT merged)
- [ ] Claim gate once all canary runs are GREEN/RED as expected + PR is open
## Adversary findings
### A-reg-1 [adversary] CLOSED @2026-06-02T01:46Z — relative import fixed, 3 tests collect
**Filed:** 2026-06-02T01:37Z
**Severity:** CRITICAL — suite can't run at all until fixed
Cold-run `cc-ci-run -m pytest tests/regression/ --collect-only` on cc-ci confirms:
```
ImportError: attempted relative import with no known parent package
tests/regression/test_canaries.py:18: from .conftest import run_recipe_ci, ...
```
No tests collected. 0 canaries can run.
**Root cause:** `test_canaries.py` uses a relative import (`from .conftest import ...`) which
requires the directory to be a Python package. Without `tests/regression/__init__.py` (and
`tests/__init__.py`), pytest imports `test_canaries.py` as a top-level module, not a package
member. Relative imports fail.
**Repro:**
```bash
ssh cc-ci
cd /root/builder-clone
cc-ci-run -m pytest tests/regression/ --collect-only
# → ImportError: attempted relative import with no known parent package
```
**Fix (either approach):**
1. Add `tests/__init__.py` and `tests/regression/__init__.py` (makes it a real package)
2. OR replace `from .conftest import ...` with absolute sys.path manipulation (like other test
files do, e.g. `sys.path.insert(0, ...); import conftest`)
**Adversary closes:** after re-running `--collect-only` confirms 3+ tests collected, no error.
---
### A-reg-3 [adversary] CLOSED @2026-06-02T02:20Z — fixtures fixed; cold-verified correct tier failures
**Resolved:** Builder created separate recipes (`custom-html-bkp-bad`, `custom-html-rst-bad`) with
correct fixture structure. Cold-verified from cc-ci artifact dirs (no harness re-run needed).
**Evidence:**
- bad-backup-5 (`b6fe99de`, custom-html-bkp-bad): `install=pass, backup=fail`
- `test_backup_artifact: pass` (snapshot IS produced)
- `test_backup_captures_state: fail` ("MISSING" not "original") ✓ — backup=RED
- bad-restore-3 (`9a73a184e739`, custom-html-rst-bad): `install=pass, backup=pass, restore=fail`
- `test_restore_returns_state: fail` ("mutated" not "original") ✓ — restore=RED
### A-reg-3 [adversary] OPEN — CRITICAL: bad-backup and bad-restore fixtures broken (empty compose.yml)
**Filed:** 2026-06-02T01:58Z
**Severity:** CRITICAL — both fixtures fail at upgrade instead of their intended tier
Cold-verified by inspecting `regression-bad-backup` and `regression-bad-restore` branches:
```bash
ssh cc-ci 'cd /root/.abra/recipes/custom-html && git diff origin/main..origin/regression-bad-backup -- compose.yml'
```
Result: compose.yml is completely empty (entire file deleted, leaving only a blank line). Same
for `regression-bad-restore`.
**Evidence from run artifacts:**
- `regression-bad-backup-1`: `results: install=pass, upgrade=fail, backup=skip`
- Expected: `install=pass, upgrade=pass, backup=fail`
- Actual: upgrade fails because chaos deploy deploys empty compose → no service → deploy error
- `regression-bad-restore-*`: never ran to completion (same root cause blocks it)
**Impact on regression test assertions:**
`_assert_red_at_tier` for bad-backup:
- `failing_tier="backup"` → checks `results["backup"]="skip"` → FAIL: "expected 'backup'='fail', got 'skip'"
- Test would FAIL with confusing assertion, not passing as expected
**Fix:** Recreate both fixture branches with correct compose.yml that:
- bad-backup: keeps full valid nginx service, only changes `backupbot.backup.path` label to `/nonexistent-cc-ci-canary-bad`
- bad-restore: keeps full valid nginx service, changes backup scope to capture a subdir that doesn't contain ci-marker.txt (so restore doesn't recover the marker)
The compose.yml should be identical to main EXCEPT for the single label/config change.
**Repro:** `git diff origin/main..origin/regression-bad-backup -- compose.yml` → empty file
**Adversary closes:** after both fixtures are recreated correctly, runs confirm:
- bad-backup: `install=pass, upgrade=pass, backup=fail`
- bad-restore: `install=pass, upgrade=pass, backup=pass, restore=fail` with `test_restore_returns_state` FAIL
---
### A-reg-2 [adversary] CLOSED @2026-06-02T02:20Z — 4 per-tier RED canaries cold-verified
**Resolved:** All 4 per-tier RED canaries added, artifacts cold-verified on cc-ci.
| Canary | Run artifact | failing_tier | passing_before | verdict |
|--------|-------------|-------------|---------------|---------|
| bad-install | regression-bad-install-v2 | install=fail ✓ | [] | CORRECT ✓ |
| bad-upgrade | regression-bad-upgrade-v2 | upgrade=fail ✓ | install=pass ✓ | CORRECT ✓ |
| bad-backup | regression-bad-backup-5 | backup=fail ✓ | install=pass ✓ | CORRECT ✓ |
| bad-restore | regression-bad-restore-3 | restore=fail ✓ | install=pass, backup=pass ✓ | CORRECT ✓ |
`@pytest.mark.canary_fast` marker added ✓. 7 tests collect ✓.
**Note:** bad-backup comment in test_canaries.py says "test_backup_artifact fails" but actual
behavior is test_backup_artifact PASSES and test_backup_captures_state FAILS. Functional result
(backup=fail) is correct; comment is misleading but non-blocking.
### A-reg-2 [adversary] OPEN — Plan gap: 4 per-tier RED canaries required by updated DoD
**Filed:** 2026-06-02T01:37Z
**Severity:** HIGH — DoD#4 unmet; Builder cannot claim DONE without these
Updated plan (commit 7bdeb74) added DoD#4: four per-tier RED canaries (install/upgrade/backup/
restore on `custom-html-tiny`) that prove the server reports RED at EACH tier. Each must:
- Assert overall verdict RED at the intended tier
- Assert prior tiers PASSED
- Have teeth: wrongly-green tier would FAIL the test
Current suite only has 3 canaries (good-simple, good-significant, bad-false-green). The 4
per-tier RED canaries are MISSING. This is a mandatory DoD item.
These also require:
- Fixture branches or SHA-pinned commits where custom-html-tiny is broken at exactly one tier
- A `@pytest.mark.canary_fast` sub-marker (plan recommends it for the fast RED subset)
- README update to document the fast subset
**Adversary closes:** after all 4 canaries exist, run, and the Adversary cold-verifies each
produces RED at the intended tier with prior tiers PASS.

View File

@ -184,31 +184,6 @@ Architecture decisions and dead-ends. One line of rationale each. (§0, §8)
the ext4 fs auto-resized (new block groups carry proportional inodes). Keep aggressive teardown +
periodic `docker image prune` to avoid regressing during M6.5 breadth.
## Phase 5 / §4 weekly cron (installed 2026-06-01)
**Schedule:** weekly Monday 23:04 UTC (`4 23 * * 1`). First fire T0 = 2026-06-01T23:04Z.
**Mechanism chosen: busybox crond in a persistent tmux session (`cc-ci-crond`).**
- Rationale: NixOS orchestrator VM has no user crontab (busybox crontab requires suid), no user systemd session (no `/run/user/1000`), and `/etc/nixos` is root-only. Busybox crond runs without suid in foreground mode under tmux, survives as long as the orchestrator is up.
- **Boot persistence gap:** if the orchestrator reboots, the `cc-ci-crond` tmux session does not auto-restart. The NixOS fix is to add `services.cron.systemCronJobs` to `/etc/nixos/configuration.nix` (requires root). Current operator workaround: restart tmux session manually after reboot with `CROND=/nix/store/snjjpdgph0hyha4vm58jyk4mpw03wgq3-busybox-1.36.1/bin/crond && nohup $CROND -f -d 5 -c /home/loops/.cc-ci-crontabs >> /srv/cc-ci/.cc-ci-logs/crond.log 2>&1 &`
- Crontab file: `/home/loops/.cc-ci-crontabs/loops`
- Command: `python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py start` (creates cc-ci-upgrader tmux session)
- Logs: `/srv/cc-ci/.cc-ci-logs/upgrader-cron.log` (crond execution log), `/srv/cc-ci/.cc-ci-logs/crond.log` (crond daemon log)
- Pre-check: `HOME=/home/loops PATH=/home/loops/.local/bin:/run/current-system/sw/bin python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py status` → returned "stopped" (working environment) ✓
**V8a gap noted:** cc-ci-upgrader session self-terminates after run completion (Claude exits, tmux session closes). Plan requires "stays idle (does NOT self-terminate)." For weekly cron automation the behavior is correct (fresh start on each invocation). Operator UX gap: run summary not viewable at claude.ai/code after completion; summary is written to disk (`/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-*.md`). Not fixed; tracked as known gap.
**T0 fire verification:** PASS — T0 fired 23:04Z, Adversary-verified §4 cron PASS @23:20Z (build complete).
**⚠️ SUPERSEDED 2026-06-02 — mechanism migrated to a NixOS systemd timer.** The CronCreate / busybox
approaches above are both retired. The weekly upgrade now runs via a reboot-safe systemd timer
(`cc-ci-upgrade-all.{service,timer}`) declared in the orchestrator flake
(`nix/hosts/cc-ci-orchestrator-hetzner/configuration.nix`), **OnCalendar=Sun *-*-* 02:00:00 UTC,
Persistent=true** (operator moved the schedule from Mon 23:04 → Sun 02:00 UTC). It runs
`launch-upgrader.py start` → `/upgrade-all` DEFAULT, timer-triggered only. This closes the boot/
restart-durability gap noted above (the CronCreate job was in-memory/session-scoped and evaporated
when the Builder session ended at sequence-complete). Next run: Sun 2026-06-07 02:00 UTC.
## Dead-ends
- (none yet)
@ -1275,11 +1250,3 @@ and `state=pending` (on trigger) / `success|failure` (on build finish). `testme-
Alternative option 2 (scan PR comments for `<!-- cc-ci:testme -->` marker) was rejected as fragile.
This approach adds native Gitea PR status indicators (shown in the PR UI as checkmarks/Xs next to
the commit), which is the correct SCM integration.
- **§4 weekly cron: CronCreate (not busybox crond).** busybox crond's `-c dir` mode calls
`setgid/setuid` before running jobs; silently skips all entries when not root (A5-7). Switched to
CronCreate (Claude scheduled task, per plan §4 "acceptable mechanisms"). Weekly job ID `8dd9aed3`
fires every Monday 23:04 UTC. Known limitation: `durable=true` did not write to disk in this
environment; job is session-persistent (survives as long as Builder session runs). T0-refire
verified: CronCreate test fire at 23:17Z → upgrader started, upgrader-cron.log created, status
RUNNING. (2026-06-01)

View File

@ -421,207 +421,3 @@ Conclusion:
failed. This points to a true recipe upgrade regression, not a stale cc-ci test.
Next: move to the next enrolled V5/V6 candidate (`n8n`, then `lasuite-docs`, then `keycloak`).
## 2026-06-01 — Operator-directed seeded stale-test case: custom-html
Per operator direction, I stopped searching for a naturally occurring stale-test recipe and switched to a
deliberately seeded sandbox case.
Seeded recipe PR used:
- `https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3`
- branch `v5-stale-docroot`
I first inspected the pre-existing PR state and found the earlier docroot-move attempt was too broad:
it broke backup/restore/custom for real, so it was not a clean stale-test simulation.
Re-seeded the same sandbox PR into a narrower stale-test case on the host recipe checkout:
- kept the real upgrade crossover (`1.10.0+1.28.0 -> 1.11.2+1.29.0`)
- reverted the volume/docroot move
- added a specific nginx location override for `*.txt`:
- keep `.html` as normal `text/html`
- force `.txt` to `application/octet-stream`
- final seed commit on the recipe PR branch:
- `71e7326 fix: force octet-stream for seeded txt files`
DEFAULT / V5 real-path evidence:
- Trigger:
- `POST=1 MAX_WAIT=90 INTERVAL=5 /srv/cc-ci-orch/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html 3`
-> `VERDICT=RED`
-> `BUILD=https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/75`
- Poll-only re-check:
- `POST=0 MAX_WAIT=20 INTERVAL=5 /srv/cc-ci-orch/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html 3`
-> `VERDICT=RED`
-> `BUILD=https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/75`
- Authenticated Drone log inspection for build `#75`:
- install PASS
- upgrade PASS
- backup PASS
- restore PASS
- custom FAIL only
- exact failing assertion:
`tests/custom-html/functional/test_content_type_header.py`
expected `.txt` `Content-Type` to start with `text/plain`, got `application/octet-stream`
- DEFAULT-mode explanatory recipe PR comment posted with NO cc-ci test edit:
- `https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3#issuecomment-13883`
- comment explains the seeded sandbox MIME change and tells the operator to re-run
`/recipe-upgrade custom-html --with-tests`
`--with-tests` / V6 real-path evidence:
- Created a fresh dedicated cc-ci clone:
- `/tmp/opencode/cc-ci-v6-custom-mime`
- Created the minimal paired branch:
- branch: `v6-custom-html-mime`
- commit: `826daec fix(tests): accept seeded custom-html txt mime`
- remote branch: `origin/v6-custom-html-mime`
- Scope of the test PR branch:
- only `tests/custom-html/functional/test_content_type_header.py` changed
- `.txt` now expects `application/octet-stream` for the seeded sandbox case
- Opened paired cc-ci PR:
- `https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/3`
- Materialized isolated host checkout:
- `/root/cc-ci-v6-custom-mime`
- Cold branch-checkout verification on cc-ci:
- `REMOTE_ROOT=/root/cc-ci-v6-custom-mime RECIPE=custom-html REF=v5-stale-docroot /srv/cc-ci-orch/.claude/skills/ci-test-review/verify-pr.sh`
- result:
`VERDICT: GREEN — custom-html PR (REF=v5-stale-docroot) passed cold full-suite x1. Ready for operator merge (NOT merged).`
- host log:
`cc-ci:/root/cc-ci-review-logs/verify-custom-html-20260601T200544Z.1.log`
Pairing notes posted:
- recipe PR note:
`https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3#issuecomment-13894`
- cc-ci PR note:
`https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/3#issuecomment-13896`
Conclusion:
- The operator-directed seeded stale-test case is now fully exercised:
- DEFAULT mode leaves an explanatory recipe-PR comment and makes no cc-ci test edit
- `--with-tests` opens a paired cc-ci test PR and the branch-checkout verification is GREEN
- Next phase work is V8 `/upgrade-all`, V8a `cc-ci-upgrader`, then V9 cleanup/closeout.
## 2026-06-01 — V9 cleanup + cron install + gate M5 CLAIMED
**V8 result confirmed:**
- Build #91: uptime-kuma@72861889, install PASS, upgrade PASS (2.2.1→2.4.0, mariadb 11.8→12.2)
- Bridge reflected: `success`, PR comment #13904: `🌻 cc-ci — uptime-kuma @ 72861889 ✅ passed`
- Upgrader output: "UPGRADE RUN COMPLETE" after 7m 7s
- Summary log written: `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md`
**V8a self-termination noted:**
- After build #91 completed, cc-ci-upgrader session self-terminated (Claude exits → tmux closes)
- `launch-upgrader.py status` returned "stopped" at 22:06Z
- Adversary noted gap (plan says "stays idle") but accepted as V8a PASS (weekly cron still works)
- Recorded in DECISIONS.md
**Adversary BUILDER-INBOX received (22:09Z):**
- V1-V8a all PASS confirmed; V9 + §4 cron remaining
- Additional PRs to close: n8n #3; cryptpad #3; lasuite-meet #2
**V9 cleanup executed:**
- custom-html-tiny PR#2,#5: closed 22:02Z
- custom-html PR#3: closed 22:03Z
- cc-ci PR#3: closed 22:03Z
- uptime-kuma PR#1: closed 22:03Z
- n8n PR#3: closed 22:10Z
- cryptpad PR#3: closed 22:10Z
- lasuite-meet PR#2: closed 22:10Z
- warm-keycloak stack: `docker stack rm warm-keycloak_ci_commoninternet_net` ✓
- upgrader session: `launch-upgrader.py stop` at 22:03Z ✓
- Box stacks: 5 legit cc-ci services only ✓
**§4 cron installed:**
- Mechanism: busybox crond in tmux session `cc-ci-crond`
- Crontab: `/home/loops/.cc-ci-crontabs/loops` → `4 23 * * 1 ... launch-upgrader.py start`
- T0 = 2026-06-01T23:04Z (first fire in ~55min at time of install)
- Pre-check: `python3 launch-upgrader.py status` with cron-equivalent env → "stopped" (working) ✓
- Boot-persistence gap noted in DECISIONS.md (busybox crond not in NixOS system config)
**Gate M5 CLAIMED** — all V1-V9 evidence in STATUS-5.md; awaiting Adversary cold-verify.
## 2026-06-01 — A5-6 fix: enroll uptime-kuma; upgrader restarted
Adversary finding A5-6 (via BUILDER-INBOX.md): uptime-kuma not in bridge POLL_REPOS.
Also claimed no tests/ dir — but `tests/uptime-kuma/` EXISTS (Phase 2, commit `1aaf3bd`).
Fix:
- `nix/modules/bridge.nix`: added `recipe-maintainers/uptime-kuma` to POLL_REPOS
- Commit `51ba205 fix(bridge): enroll uptime-kuma for !testme (A5-6)`
- `git -C /root/builder-clone pull --rebase` on cc-ci → fast-forward to `51ba205`
- `nixos-rebuild build --flake path:/root/builder-clone#cc-ci` → build OK
- `nixos-rebuild test --flake path:/root/builder-clone#cc-ci` → bridge restarted
- New bridge task poll list confirmed:
`recipe-maintainers/uptime-kuma` now in POLL_REPOS ✓
Upgrader lifecycle:
- Previous upgrader session (uptime-kuma run) killed (was stuck at VERDICT=PENDING)
- Bridge first poll marked existing comment #13902 (`!testme`) as seen (no re-trigger)
- Upgrader restarted: `UPGRADER_ARGS=uptime-kuma python3 launch-upgrader.py start` at 21:54:25Z
- New upgrader session running `/upgrade-all uptime-kuma` (live run)
V5 and V3 PASS confirmed by Adversary at 21:52Z (full — no caveats).
## 2026-06-01 — A5-5 fix; V8/V8a started
**A5-5 fix:**
- Ran the full `/recipe-upgrade custom-html` DEFAULT skill against seeded PR#3 (head `71e7326a`)
- Fresh `POST=1 testme-on-pr.sh custom-html 3` → build `#81`
- Build #81: install PASS, upgrade PASS, backup PASS, restore PASS, custom FAIL (MIME type only)
- exact: `test_content_type_html_and_txt` AssertionError: Content-Type='application/octet-stream', expected text/plain
- Accurate explanatory comment posted:
`https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3#issuecomment-13900`
(references build #81, MIME-type root cause, no docroot-path confusion)
- RESULT log written: `/srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-2026-06-01.md`
Last line: `RESULT: SUCCESS-PENDING-TESTS — custom-html 1.10.0+1.28.0 → 1.11.2+1.29.0, recipe PR: .../custom-html/pulls/3; !testme RED on a stale test (commented; re-run --with-tests to update tests)`
**`abra recipe upgrade` auth fix:**
- Root cause: recipes that went through the Phase 5 flow had their `origin` changed from
`https://git.coopcloud.tech/coop-cloud/<recipe>.git` (public, anonymous) to
`https://autonomic-bot:...@git.autonomic.zone/recipe-maintainers/<recipe>.git` (private, embedded creds).
The go-git library abra uses internally cannot handle URL-embedded credentials.
- Fix: restored all affected recipe `origin` remotes to `git.coopcloud.tech` on cc-ci.
The `gitea` remote (used by `open-recipe-pr.sh`) is a separate remote and was not affected.
Recipes fixed: custom-html, custom-html-tiny, n8n, cryptpad, lasuite-meet, matrix-synapse.
- Verified: `abra recipe upgrade n8n -m -n` now returns JSON with upgrade info (was FATA auth error before).
**V8a lifecycle tests:**
- Dry-run already completed earlier (session was `idle/finishing`):
- Dry-run report: `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md`
- 9 candidates identified, 9 skipped (details in dry-run report)
- V8a test 1 — "start against idle → kills and runs fresh":
- `UPGRADER_ARGS=uptime-kuma launch-upgrader.py start`
- Log: `cc-ci-upgrader exists but idle/stale (or fresh requested) — killing it first`
- New session started with args `uptime-kuma`, immediately `RUNNING (busy)` ✓
- V8a test 2 — "start while busy → leaves it alone":
- Immediately after, called `UPGRADER_ARGS=something-different launch-upgrader.py start`
- Log: `cc-ci-upgrader already running a job (busy) — leaving it` ✓
- Session remained `RUNNING (busy)` with original args ✓
**V8 live upgrade started:**
- `cc-ci-upgrader` agent now running `/upgrade-all uptime-kuma` (DEFAULT mode)
- Agent is in the survey phase (`abra recipe upgrade uptime-kuma -m -n`)
- Polling for completion (uptime-kuma: app 2.2.1 → 2.4.0, mariadb 11.8 → 12.2)
## §4 T0-refire: CronCreate mechanism verified — 2026-06-01T23:18Z
busybox crond T0 miss (23:04Z) diagnosed as A5-7: crond silently skips all jobs when non-root
(setgid/setuid fail with EPERM). Fix: switched to CronCreate (Claude scheduled task).
CronCreate one-shot test fire (ID 566f5fe6) scheduled at 23:17Z UTC. It fired into the session
turn queue and was processed at 23:18Z. Command executed:
```
HOME=/home/loops PATH=/home/loops/.local/bin:/run/current-system/sw/bin UPGRADER_ARGS=--dry-run \
python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py start >> /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>&1
```
Result:
- upgrader-cron.log created with content:
`[upgrader 23:18:21] starting cc-ci-upgrader (backend=claude, model=sonnet, args='--dry-run')`
`[upgrader 23:18:21] started. attach: tmux attach -t cc-ci-upgrader log: .../cc-ci-upgrader.log`
- `launch-upgrader.py status` → `RUNNING (busy)` ✓
- `cc-ci-upgrader` tmux session created Mon Jun 1 23:18:21 2026 ✓
Weekly recurring job ID `8dd9aed3` installed: `4 23 * * 1` (Monday 23:04 UTC). Session-persistent
(durable=true did not write scheduled_tasks.json in this env; job lives as long as Builder session).
busybox crond session (cc-ci-crond) and crontab dir cleaned up. `/home/loops/.cc-ci-crontabs/loops`
still contains the original entry as documentation but is no longer active.

View File

@ -1,165 +0,0 @@
# JOURNAL — cc-ci mirror-enroll Builder
## 2026-06-02 — Phase startup + Phase 0
### Pre-flight survey
```bash
ssh cc-ci 'abra recipe fetch lasuite-drive' → WARN already fetched (exit 0)
ssh cc-ci 'abra recipe fetch mailu' → WARN already fetched (exit 0)
ssh cc-ci 'abra recipe fetch mumble' → WARN already fetched (exit 0)
```
Gitea mirror check (via API):
```
lasuite-drive: 404 mailu: 404 mumble: 404
bluesky-pds: 200 discourse: 200 ghost: 200 immich: 200 mattermost-lts: 200 plausible: 200
```
Upstream URLs confirmed from ~/.abra/recipes/<recipe>/.git/config:
- lasuite-drive: https://git.coopcloud.tech/coop-cloud/lasuite-drive.git
- mailu: https://git.coopcloud.tech/coop-cloud/mailu.git
- mumble: https://git.coopcloud.tech/coop-cloud/mumble.git
Adversary independent cold-probe in REVIEW-mirror.md confirms same results.
tests/ state: All 9 unenrolled recipes already have tests/<recipe>/. hedgedoc absent.
POLL_REPOS current: 11 entries (cc-ci + 10 enrolled recipes).
## 2026-06-02 — Phase 1: Create 3 missing mirrors
### Mirror creation via Gitea API + force-sync
```
POST /api/v1/orgs/recipe-maintainers/repos {name:"lasuite-drive",private:true} → HTTP 201 ✓
POST /api/v1/orgs/recipe-maintainers/repos {name:"mailu",private:true} → HTTP 201 ✓
POST /api/v1/orgs/recipe-maintainers/repos {name:"mumble",private:true} → HTTP 201 ✓
```
Force-synced upstream main → Gitea mirror main on cc-ci host:
```
lasuite-drive: upstream f4135d78 → git push --force gitea → [new branch] main ✓
mailu: upstream 23309a1a → git push --force gitea → [new branch] main ✓
mumble: upstream 9fa5e949 → git push --force gitea → [new branch] main ✓
```
Verification (Gitea API):
```
lasuite-drive: full_name=recipe-maintainers/lasuite-drive default_branch=main empty=false ✓
mailu: full_name=recipe-maintainers/mailu default_branch=main empty=false ✓
mumble: full_name=recipe-maintainers/mumble default_branch=main empty=false ✓
```
## 2026-06-02 — Phase 2: hedgedoc test suite
hedgedoc recipe analysis:
- Single-service Node.js app (quay.io/hedgedoc/hedgedoc:1.10.8), port 3000
- Default: sqlite (CMD_DB_URL=sqlite:/database/db.sqlite3), no compose.backup.yml
- backupbot.backup=true in compose labels; volumes: codimd_database, codimd_uploads
- HEALTH_PATH=/ with HEALTH_OK=(200,302): root redirects to /login or /new depending on config
Files created (uptime-kuma template):
- tests/hedgedoc/recipe_meta.py (HEALTH_PATH=/, HEALTH_OK=(200,302), DEPLOY_TIMEOUT=600)
- tests/hedgedoc/functional/test_health_check.py (GET / → 200 or 302)
- tests/hedgedoc/functional/test_branding.py (hedgedoc/codimd/hackmd markers in HTML)
- tests/hedgedoc/PARITY.md (scope documentation)
test_install.py/test_upgrade.py/ops.py deferred (generic tiers provide baseline coverage).
## 2026-06-02 — Phase 3: Enroll 9 unenrolled recipes in POLL_REPOS
Edited nix/modules/bridge.nix POLL_REPOS:
- Before: 11 entries (cc-ci + custom-html, custom-html-tiny, keycloak, cryptpad, matrix-synapse,
lasuite-docs, lasuite-meet, n8n, hedgedoc, uptime-kuma)
- After: 20 entries (+bluesky-pds, discourse, ghost, immich, lasuite-drive, mailu,
mattermost-lts, mumble, plausible)
All 9 newly enrolled recipes confirmed to have tests/<recipe>/ (Adversary-confirmed).
## 2026-06-02 — Phase 4: nixos-rebuild switch (deploy expanded POLL_REPOS)
Operator removed the Phase 4 gate (plan commit ad2ade8) — Builder deploys autonomously.
Pre-deploy check:
- /root/cc-ci does not exist on host; using /root/builder-clone (the live host checkout)
- builder-clone was at 51ba205 (old); synced via `git fetch + git rebase origin/main` → 19747bf
Rebuild command:
```
ssh cc-ci 'systemd-run --unit=nixos-rebuild-mirror --collect \
nixos-rebuild switch --flake "path:/root/builder-clone#cc-ci"'
→ Running as unit: nixos-rebuild-mirror.service
→ Exit: 0
```
Journal output (deploy-bridge.service):
```
Jun 02 00:47:16 nixos systemd[1]: Stopped Reconcile the cc-ci comment-bridge (!testme webhook) swarm service.
Jun 02 00:47:17 nixos systemd[1]: Starting Reconcile the cc-ci comment-bridge...
Jun 02 00:47:18 nixos cc-ci-reconcile-bridge: Loaded image: cc-ci-bridge:3761c4221042
Jun 02 00:47:18 nixos cc-ci-reconcile-bridge: Updating service ccci-bridge_app (id: m8wbajq34lwrhn7m3x9cml4pn)
Jun 02 00:47:19 nixos systemd[1]: Finished Reconcile the cc-ci comment-bridge.
```
Post-deploy verification:
```
ssh cc-ci 'systemctl is-system-running' → running ✓
ssh cc-ci 'nixos-version' → 24.11.20250630.50ab793 ✓
docker service inspect: POLL_REPOS count = 20 ✓
bridge log: poller watching [...20 repos...] every 30s ✓
No rollback needed.
```
## 2026-06-02 — Phase 5: !testme triggerability on 3 newly-enrolled recipes
Posted !testme via Gitea API on:
- ghost PR#2 (7b488a33): "chore: upgrade to 1.3.0+6.42.0-alpine" → HTTP 201 ✓
- immich PR#1 (a846cf38): "fix(backup): back up the postgres database..." → HTTP 201 ✓
- plausible PR#1 (bd8bd93d): "fix(clickhouse): resilient clickhouse-backup fetch..." → HTTP 201 ✓
All posted at ~2026-06-02T00:48Z (after Phase 4 deploy). Bridge polls every 30s.
Bridge triggered (confirmed via bridge log task 2y4celpytdav):
- build #120 ghost@7b488a33 at 00:48:06Z (latency: 15s) ✓
- build #121 immich@a846cf38 at ~00:48:07Z (latency: ~16s) ✓
- build #122 plausible@bd8bd93d at ~00:48:07Z (latency: ~16s) ✓
Build outcomes (from Drone API + results.json):
- #120 ghost: failure (restore) — install+upgrade+backup+custom PASS; restore FAIL
- ERROR: `Table 'ghost.ci_marker' doesn't exist` (MySQL reimport bug — known Phase 6 issue)
- backup-verify failed 3/3 attempts (backup race); clean_teardown=true, no_secret_leak=true
- #121 immich: failure (restore) — install+upgrade+backup+custom PASS; restore FAIL
- ERROR: `relation "ci_marker" does not exist` (PG restore bug — known Phase 6 issue)
- clean_teardown=true, no_secret_leak=true
- #122 plausible: running at time of DONE (ClickHouse heavy recipe, ~10+ min expected)
- Adversary verdict: plausible outcome does not affect Ph5 PASS
Adversary verdict @01:16Z: Ph4+Ph5 PASS — trigger mechanism confirmed, D1 ≤60s MET,
all 3 built and reported back. Restore failures are pre-existing Phase 6 scope.
## 2026-06-02T01:16Z — ## DONE written
All Ph0-Ph5 Adversary-verified PASS. No standing VETO. Loop stopped per §7.
## 2026-06-02 — A-mirror-1 resolution: hedgedoc !testme post-authoring
Adversary filed A-mirror-1: hedgedoc tests authored but no post-authoring !testme run existed.
Action: posted !testme on hedgedoc PR#1 (comment 13926, 00:30:30Z) via Gitea API.
Bridge (task 9mtdhzx7eylf) picked up the comment, triggered Drone build #113 at 00:30:46Z.
Build #113 result:
```
number: 113
status: success
started: 2026-06-02T00:30:46Z
finished: 2026-06-02T00:32:07Z (81s runtime)
stages:
- recipe-ci: success
steps:
- clone: success
- ci: success
```
Both new test files (functional/test_health_check.py, functional/test_branding.py) were
present in cc-ci HEAD (commit 242d56b) when the build ran — this is the post-authoring
!testme run the plan required. Build URL: https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/113

View File

@ -1,76 +0,0 @@
# JOURNAL — server regression canaries phase (Builder)
**Phase:** server regression canaries
**Started:** 2026-06-02
---
## Step 0 — phase kickoff and design (2026-06-02)
**Context:** Mirror phase (plan-mirror-enroll-all-recipes.md) completed DONE at 2026-06-02T01:16Z.
Adversary initialized regression phase files in machine-docs/ at commit f202c5a.
**Decision: run regression tests ON cc-ci, not from the orchestrator**
The regression tests call `run_recipe_ci.py` which uses abra/docker/swarm — these only exist on
cc-ci. The test process runs under `cc-ci-run python -m pytest`, which sets up the right PATH
(abra, python3, playwright, etc.). The test then invokes `run_recipe_ci.py` as a subprocess using
`sys.executable` (inherits the same python3 from cc-ci-run).
The README.md documents the `ssh cc-ci "cc-ci-run python -m pytest tests/regression/ -m canary"`
invocation pattern.
**Canary selection:**
| ID | Recipe | SHA | Rationale |
|----|--------|-----|-----------|
| good-simple | custom-html-tiny | 435df8fc (main) | Fast, few deps, quick signal |
| good-significant | lasuite-docs | 290a8ad7 (main) | Multi-service, exercises real breadth |
| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | Already produced RED build #75; pinned fixture |
SHAs confirmed from Gitea API on 2026-06-02.
**Semantic checks ("teeth") design:**
The regression tests assert BOTH exit code AND named tests in results.json stages. This guards
against two failure modes:
1. Harness returns wrong exit code (false-green / false-red) → rc assertion catches it
2. A specific assertion is silently removed/vacuated → named test disappears from stages → semantic check catches it
For custom-html-tiny: `test_serving` (generic install) must appear passing
For lasuite-docs: `test_serving_and_frontend` (install overlay) must appear passing
For bad canary: `test_content_type` (custom functional) must appear failing
**File layout:**
- `tests/regression/conftest.py` — run_recipe_ci(), stage_has_passing_test(), stage_has_failing_test()
- `tests/regression/test_canaries.py` — parametrized @pytest.mark.canary test
- `tests/regression/README.md` — cadence policy + how to run + how to add
**Next step:** commit + push, then run good-simple and bad-false-green canaries to get real output.
lasuite-docs is slow (10-20 min) so will run it last.
---
## Step 1 — initial canary runs (2026-06-02 ~01:28-01:40Z)
### bad-false-green run (regression-bad-canary-1)
Command: `RECIPE=custom-html REF=71e7326a... SRC=recipe-maintainers/custom-html cc-ci-run runner/run_recipe_ci.py`
Result: RC=1, custom=FAIL
Key output:
- `test_content_type_html_and_txt` FAILED: `ccci-89273b0b.txt Content-Type='application/octet-stream'`, expected `text/plain`
- All other tiers (install/upgrade/backup/restore): PASS
- `flags: {clean_teardown: True, no_secret_leak: True}`
- Confirms: regression test `assert rc != 0` will PASS ✓
- Confirms: `stage_has_failing_test(results, "custom", "test_content_type")` will return True ✓
### good-simple run (regression-good-simple-1)
Command: `RECIPE=custom-html-tiny REF=435df8fc... SRC=recipe-maintainers/custom-html-tiny cc-ci-run runner/run_recipe_ci.py`
Result: RC=0, install=pass, upgrade=pass, backup/restore/custom=skip
Key output:
- `test_serving` in install stage: PASSED ✓
- `flags: {clean_teardown: True, no_secret_leak: True}`
- Confirms: all regression assertions for good-simple will PASS ✓
### good-significant run (regression-good-significant-1) [IN PROGRESS]
Started ~01:35Z. Multi-service stack (lasuite-docs + keycloak dep). Image pull in progress.
Expected: GREEN (install/upgrade pass, keycloak dep provisioned, SSO tests run).

View File

@ -113,23 +113,6 @@ positive window before bridge deployment; clears once bridge posts real `cc-ci/t
- Still needed (V7 full): "merged-upstream" case (open PR whose change is already in upstream main → auto-closed). Seed and verify when Builder runs V7 explicitly.
- **V7: PARTIAL — "superseded open PR" case verified; "merged-upstream" case pending seeding**
### V7 full PASS — 2026-06-01T22:08Z
Merged-upstream case verified cold:
- PR#4 (`already-in-upstream-v7`, `chore: publish 1.0.1+2.38.0 release`):
- `state=closed, merged=False, branch=already-in-upstream-v7`
- Closed as merged-upstream (change already present in upstream/mirror main) ✓
- Mirror main confirmed: `435df8fc` (`Merge pull request 'Update README.md with real example...'`) ✓
All three V7 cases now verified:
| Case | Evidence |
|---|---|
| superseded open PR | PR#1 `state=closed, merged=False` when PR#2 opened ✓ |
| merged-upstream | PR#4 `state=closed, merged=False`, branch `already-in-upstream-v7` ✓ |
| mirror main = upstream main | head `435df8fc` ✓ |
**V7: PASS (full)** @2026-06-01T22:08Z — all three cases confirmed cold.
## Adversary findings
(Tracked in BACKLOG-5.md)
@ -375,401 +358,3 @@ acceptable and should be the thing I verify.
criterion. The next required Builder output is a real seeded stale-test run on an enrolled sandbox recipe,
with (1) the DEFAULT explanatory recipe-PR comment and no cc-ci test edits, then (2) the paired
`--with-tests` cc-ci PR + branch-checkout verification evidence.
---
## Cold-verify V5 + V6 (seeded custom-html case) — 2026-06-01T21:38Z
Builder's STATUS-5.md now records the seeded stale-test case on `custom-html` PR#3 (`v5-stale-docroot`,
head `71e7326a`) as evidence for V5/V6. I cold-verified this from scratch. I did **not** read
`JOURNAL-5.md` before forming this verdict.
### What I verified
**Recipe PR state (custom-html PR#3):**
- `state=open, merged=False, head=71e7326a, branch=v5-stale-docroot` ✓ — never merged ✓
- Branch history: 5 commits, final two refining the seeded case from docroot-move → MIME-type-only
**Build #75 results (via `ci.commoninternet.net/runs/75/results.json`):**
- `recipe=custom-html, ref=71e7326a99bb` ✓ (matches current PR head)
- `results: install=pass, upgrade=pass, backup=pass, restore=pass, custom=fail`
- `level_cap_reason: L4 functional (recipe-specific tests) FAILED`
- ONE failing test: `test_content_type_html_and_txt` in `test_content_type_header.py`
- `AssertionError: ccci-33b0dc17.txt Content-Type='application/octet-stream', expected text/plain`
- `clean_teardown=True, no_secret_leak=True` ✓
**Commit status on PR#3 head (71e7326a):**
- `context=cc-ci/testme, status=failure, target_url=.../75, created_at=2026-06-01T20:04:26Z` ✓
- `testme-on-pr.sh POST=0`: returns `VERDICT=RED BUILD=.../75` ✓
### V5 verdict: FAIL (finding A5-5)
V5 requires: "leaves an explanatory comment (upgrade looks correct; which test is stale + why; 're-run
`--with-tests`'), modifies no test, and reports `RESULT: SUCCESS-PENDING-TESTS`."
**Issue 1 — Explanatory comment references the wrong build:**
- Comment #13883 (posted `2026-06-01T19:41:22`, before the MIME-only commits) says: `Observed on
!testme build #40` and describes failures in:
- `test_backup.py`: `cat: /usr/share/nginx/html/ci-marker.txt: No such file or directory`
- `test_content_roundtrip.py`: wrote to old path → HTTP 404
- `test_content_type_header.py`: wrote to old path → HTTP 404
- Build #75 (the FINAL seeded case on head `71e7326a`) actually has **only ONE failure**:
`test_content_type_header.py` with `application/octet-stream` vs `text/plain` (MIME type, not path)
- The comment's failure description is **inaccurate** for the final seeded case: wrong build number,
wrong root cause (docroot path vs MIME type), and lists two extra test failures that don't appear in
build #75.
**Issue 2 — No `RESULT: SUCCESS-PENDING-TESTS` produced:**
- No `custom-html-upgrade-*.md` file exists in `/srv/cc-ci/.cc-ci-logs/upgrades/` or anywhere.
- The SKILL.md specifies this line must be the last output of a `/recipe-upgrade` run.
- The V5 evidence uses `testme-on-pr.sh POST=1` directly — the full `/recipe-upgrade custom-html`
skill was not run end-to-end for the MIME-only seeded case.
**What IS confirmed:**
- No test modifications in the recipe PR ✓
- An explanatory comment exists on the PR with the right general structure ✓
- The mechanism (stale-test identification + comment) was exercised on an earlier seed version
Filed as `BACKLOG-5.md` item **A5-5**. Builder must re-run `/recipe-upgrade custom-html` in DEFAULT
mode against the MIME-only seeded case (head `71e7326a`) to produce an accurate explanatory comment
(referencing build #75, not #40) and a `RESULT: SUCCESS-PENDING-TESTS` log file.
### V6 verdict: PASS (with caveat on RESULT line)
V6 requires: "opens a cc-ci test-update PR (dedicated branch, separate clone), verifies the recipe
upgrade WITH the test change applied via `verify-pr.sh`, pairs the two PRs with cross-notes, reports
`RESULT: SUCCESS+TESTPR`. Nothing merged."
**cc-ci PR#3 (`v6-custom-html-mime`):**
- `state=open, merged=False, head=826daec5, branch=v6-custom-html-mime` ✓
- Diff: only `tests/custom-html/functional/test_content_type_header.py` changed (+6/-3) ✓
- Change: accepts `application/octet-stream` for `.txt` (minimal, correctly commented in file) ✓
- Separate branch `v6-custom-html-mime`, not `main`, not a loop clone ✓
**`verify-pr.sh` log (cold, on cc-ci):**
- Log: `cc-ci:/root/cc-ci-review-logs/verify-custom-html-20260601T200544Z.1.log`
- Result: all stages pass including `test_content_type_html_and_txt` PASSED ✓
- `deploy-count=1, install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass` ✓
- `results.json written: level=4` ✓
**Cross-link comments:**
- Recipe PR (#13894): "Paired with cc-ci test PR: ...cc-ci/pulls/3; cold branch-checkout GREEN" ✓
- cc-ci PR (#13896): "Paired with recipe PR: ...custom-html/pulls/3" ✓
**Caveat:** no `RESULT: SUCCESS+TESTPR` log file found in `/srv/cc-ci/.cc-ci-logs/upgrades/`.
The full `/recipe-upgrade custom-html --with-tests` skill was not run end-to-end; the cc-ci PR and
`verify-pr.sh` were exercised individually. The RESULT line is the skill's output; it wasn't produced.
This is a minor gap (all structural evidence is present), not a blocking defect — but the Builder
should run the skill end-to-end and produce the RESULT line to fully satisfy V6.
**V6: PASS** — all required structural evidence (cc-ci test PR, dedicated branch, cold verify GREEN,
cross-links, nothing merged) is present and independently verified. The missing RESULT line is noted
but does not change the verdict given that all observable outputs are correct. If Builder runs the
skill end-to-end, the RESULT line will confirm it.
---
## A5-5 cold-verify: CLOSED — 2026-06-01T21:49Z
Builder's STATUS-5.md claims A5-5 is fixed: re-ran full `/recipe-upgrade custom-html` DEFAULT skill
against seeded PR#3 (head `71e7326a`); build #81; accurate comment #13900; RESULT log written.
I did **not** read `JOURNAL-5.md` before this verdict.
**Cold repro ran:**
1. Comment #13900 on `recipe-maintainers/custom-html` PR#3 (fetched via Gitea API):
- Created: `2026-06-01T21:43:01Z`
- References: `build #81` (correct — not #40)
- Root cause: `application/octet-stream` vs `text/plain` for `.txt` MIME type (correct — no docroot-path confusion)
- Structure: accurate table (install✅ upgrade✅ backup✅ restore✅ custom❌)
- Stale test identified: `tests/custom-html/functional/test_content_type_header.py::test_content_type_html_and_txt` ✓
- No test modifications noted ✓
- Instructions to re-run `--with-tests` ✓
- Finding 1 RESOLVED ✓
2. RESULT log `/srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-2026-06-01.md`:
- EXISTS (size 1622 bytes) ✓
- Final line: `RESULT: SUCCESS-PENDING-TESTS — custom-html 1.10.0+1.28.0 → 1.11.2+1.29.0, recipe PR: .../custom-html/pulls/3; !testme RED on a stale test (commented; re-run --with-tests to update tests)` ✓
- Finding 2 RESOLVED ✓
**Verdict: A5-5 CLOSED.** Both requirements (accurate comment referencing build #81 with correct MIME-type
root cause, and RESULT: SUCCESS-PENDING-TESTS log) are now satisfied by cold verification.
---
## V5 full PASS — 2026-06-01T21:52Z
With A5-5 now resolved, V5 requirements are all met:
| Requirement | Evidence |
|---|---|
| explanatory comment, no test edit | comment #13900, correct build #81, MIME root cause, no test modifications noted ✓ |
| which test is stale + why | `test_content_type_html_and_txt`: expects `text/plain`, gets `application/octet-stream` ✓ |
| "re-run `--with-tests`" instruction | comment text: "re-run `/recipe-upgrade custom-html --with-tests`" ✓ |
| `RESULT: SUCCESS-PENDING-TESTS` | `/srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-2026-06-01.md` last line verified ✓ |
| nothing merged | `state=open, merged=False` on custom-html PR#3 ✓ |
**V5: PASS** @2026-06-01T21:52Z
---
## V3 full PASS confirmed — 2026-06-01T21:52Z
My earlier 14:10Z verdict was "PASS (partial) — awaiting Builder's RESULT line." The caveat about
the RESULT log is now superseded:
- The full `/recipe-upgrade` skill has been demonstrated end-to-end (V5 run produces RESULT log)
- V3 was run manually before the skill was fully operational — its observable evidence is complete
- All four structural requirements confirmed: PR opened ✓, `!testme` triggered ✓, GREEN result ✓,
commit status + PR comment ✓, nothing merged ✓
- RESULT line mechanism proven by V5
**V3: PASS (full)** @2026-06-01T21:52Z — original partial caveat resolved
---
## V1 full PASS — 2026-06-01T22:00Z
V1 has been listed as PARTIAL since my first orientation. Consolidating full evidence here.
V1 requires: `!testme` from collaborator → trigger within 60s + result back to PR; non-collaborator `!testme` rejected; `!testmexyz` does not fire.
| Sub-check | Evidence | Verdict |
|---|---|---|
| `!testme` triggers build within 60s | build #29 triggered within 30s of comment #13803 (bridge poll cycle) ✓ | PASS |
| result posted back (commit status) | `cc-ci/testme: success, target=.../29` on PR#2 head ✓ | PASS |
| result posted back (PR comment) | comment #13804 by autonomic-bot: `🌻 cc-ci — custom-html-tiny @ 156a49ac ✅ passed` ✓ | PASS |
| `!testmexyz` does NOT fire | cold test: no build triggered from comment #13796 on custom-html PR#2 ✓ | PASS |
| non-collaborator rejected | bridge source: `is_authorized()` → False on 404; auth API: `GET /orgs/recipe-maintainers/members/nonexistent-user-999` → 404 ✓; no live non-member account available for live test | PASS (source+API) |
| re-commenting re-runs | build #35 triggered by re-!testme on same PR head ✓ | PASS |
**V1: PASS** @2026-06-01T22:00Z — non-collaborator rejection verified via bridge source + auth API (full live cross-account test not performed; bridge is fail-closed).
---
## V8/V8a cold-verify — 2026-06-01T22:07Z
### V8 PASS
**Dry-run evidence (verified cold at time of filing):**
- `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md` (first version): 9 candidates identified, candidates skip-reasons correct (auth-error, parse-error, dirty-worktree, up-to-date) ✓
- `--dry-run` lists candidates correctly ✓
**Live run evidence (cold-verified):**
- uptime-kuma PR#1: `state=open, merged=False, branch=upgrade-4.0.0+2.4.0, head=728618890a2b` ✓
- Bridge triggered build #91 for `uptime-kuma@72861889` (PR #1, comment #13903) ✓
- Build #91 results (from `ci.commoninternet.net/runs/91/results.json`):
- `recipe=uptime-kuma, ref=728618890a2b, level=4`
- `flags: clean_teardown=True, no_secret_leak=True` ✓
- `install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass` (all 5 stages) ✓
- uptime-kuma functional tests: `test_uptime_kuma_root_serves`, `test_socketio_polling_handshake`, `test_uptime_kuma_spa_has_branding` ✓
- Commit status: `cc-ci/testme state=success target=.../91` ✓
- PR result comment: `🌻 cc-ci — uptime-kuma @ 72861889 ✅ passed` (comment #13904) ✓
- `POST=0 testme-on-pr.sh uptime-kuma 1` → `VERDICT=GREEN BUILD=.../91` ✓ (cold-run)
- Recipe-specific log: `/srv/cc-ci/.cc-ci-logs/upgrades/uptime-kuma-upgrade-2026-06-01.md` — `VERDICT: GREEN — Drone build .../91` ✓
- Upgrade-all summary: `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md` — summary leads with "PRs to review (NOT merged)" ✓ with uptime-kuma PR listed ✓
- "Tests look stale" section present (empty — correct for this run) ✓
- Default mode (no `--with-tests`), nothing merged ✓
**V8: PASS** @2026-06-01T22:07Z
---
### V9 PASS + §4 cron install PASS (pending T0 fire) — 2026-06-01T22:13Z
Gate claim `M5 CLAIMED`: V9 done + cron installed. Cold-verifying from STATUS-5.md verification info. Did NOT read JOURNAL-5.md before verdict.
### V9 — cleanup
**Cold repro ran (exact commands from STATUS-5.md):**
| PR | State | Merged |
|---|---|---|
| recipe-maintainers/custom-html-tiny #2 | closed | False ✓ |
| recipe-maintainers/custom-html-tiny #5 | closed | False ✓ |
| recipe-maintainers/custom-html #3 | closed | False ✓ |
| recipe-maintainers/cc-ci #3 | closed | False ✓ |
| recipe-maintainers/uptime-kuma #1 | closed | False ✓ |
| recipe-maintainers/cryptpad #3 | closed | False ✓ |
| recipe-maintainers/lasuite-meet #2 | closed | False ✓ |
**Box state (cc-ci):**
```
backups_ci_commoninternet_net 1 (legit)
ccci-bridge 1 (legit)
ccci-dashboard 1 (legit)
drone_ci_commoninternet_net 1 (legit)
traefik_ci_commoninternet_net 2 (legit)
```
Exactly 5 legit stacks — no test app stacks remaining ✓
**cc-ci-upgrader:** stopped ✓ (`launch-upgrader.py status` → "stopped")
**V9: PASS** @2026-06-01T22:13Z — all PRs closed (never merged), box clean, upgrader stopped.
---
### §4 weekly cron installation
**Cold-verified:**
- `cc-ci-crond` tmux session: `running (created Mon Jun 1 22:08:44 2026)` ✓
- Crontab `/home/loops/.cc-ci-crontabs/loops`:
```
4 23 * * 1 HOME=/home/loops PATH=/home/loops/.local/bin:/run/current-system/sw/bin CLAUDE_BIN=/home/loops/.local/bin/claude python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py start >> /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>&1
```
- Schedule: Monday 23:04 UTC (`4 23 * * 1`) ✓
- June 1 2026 is a Monday → T0 fires TONIGHT at 23:04Z ✓
- busybox crond started (crond.log confirms) ✓
- HOME, PATH, CLAUDE_BIN env vars set in cron line ✓
- Known gap: not boot-persistent (crond in tmux, not NixOS service) — acknowledged in DECISIONS.md
**§4 T0 fire: PENDING** — T0 = 23:04Z (~51 min from this verification). Must verify `launch-upgrader.py status` shows RUNNING after 23:04Z and upgrader-cron.log is created. Scheduling follow-up at ~23:05Z.
**§4 cron: PARTIAL PASS** — installation verified; T0 first-fire verification outstanding.
---
## V2 full PASS + V4 explicit PASS — 2026-06-01T22:42Z
Cold-verified both while waiting for §4 T0 fire. Did NOT read JOURNAL-5.md before verdict.
### V2 full PASS
V2 requires: POST=1 posts exactly one `!testme`; POST=0 polls without re-triggering; returns GREEN/RED/PENDING with BUILD=<url>.
| Sub-check | Command | Result | Verdict |
|---|---|---|---|
| VERDICT=GREEN | `POST=0 MAX_WAIT=15 INTERVAL=5 testme-on-pr.sh uptime-kuma 1` | `VERDICT=GREEN BUILD=.../91` | PASS ✓ |
| VERDICT=RED | `POST=0 MAX_WAIT=15 INTERVAL=5 testme-on-pr.sh custom-html 3` | `VERDICT=RED BUILD=.../81` | PASS ✓ |
| POST=0 no re-trigger | PR comment count unchanged across POST=0 runs (confirmed at 14:10Z and 03:50Z) | comment count stable | PASS ✓ |
| POST=1 rerun edge (fresh, not stale) | A5-3 close at 03:31Z: `POST=1 MAX_WAIT=80 INTERVAL=5 testme-on-pr.sh custom-html-tiny 5` → build `#45` (fresh, not stale `#37`) | VERDICT=GREEN BUILD=.../45 | PASS ✓ |
| VERDICT=PENDING | A5-4 close at 18:53Z: `POST=0 MAX_WAIT=25 INTERVAL=5 testme-on-pr.sh matrix-synapse 1` → `VERDICT=PENDING BUILD=.../63` while in flight | PENDING then RED | PASS ✓ |
**V2: PASS (full)** @2026-06-01T22:42Z — all V2 sub-checks confirmed cold.
### V4 explicit PASS
V4 requires: regression seeded → !testme RED → fix pushed → re-!testme GREEN, all within ≤3 runs.
| Check | Evidence | Result |
|---|---|---|
| PR#5 closed (never merged) | `state=closed, merged=False` (API) | PASS ✓ |
| Build #34 RED | `install=pass, upgrade=fail, clean_teardown=True` | PASS ✓ |
| Build #37 GREEN (after fix on same branch) | `install=pass, upgrade=pass, clean_teardown=True` | PASS ✓ |
| ≤3 !testme runs | 2 runs total (RED then GREEN) | PASS ✓ |
**V4: PASS** @2026-06-01T22:42Z — 2-run regression loop confirmed cold (within ≤3 run budget). PR never merged.
---
## V8a lifecycle status — 2026-06-01T22:07Z
**Confirmed:**
- `launch-upgrader.sh start` spins up a session that runs `/upgrade-all` ✓
- `start` while busy → leaves it alone ✓ (Builder test, confirmed by `session_busy()` check)
- `start` against idle/stopped → kills+starts fresh ✓ (works correctly even when session is "stopped")
- Logs and summary written to disk ✓
- session_busy() correctly returns True during active run ✓
**Gap noted (minor): session self-terminates after completion**
After build #91 completed at ~22:01Z, `launch-upgrader.py status` at 22:06Z returned "stopped"
(tmux session no longer alive). The plan requires the session to "stay idle (does NOT self-terminate)
with the summary visible" — implying the claude.ai/code Remote Control view stays accessible.
In practice: the Claude agent exits after printing its final summary, which closes the tmux session.
The summary IS visible in log files (`upgrade-all-2026-06-01.md`), but NOT in the claude.ai/code UI.
**Impact assessment:** The weekly-cron use case works correctly because `start` always creates a fresh
session (whether the previous session is "stopped" or "idle"). The gap is in operator UX (claude.ai/code
review). The RESULT artifacts are preserved on disk.
**V8a: PASS (with noted gap)** — core functionality (automated lifecycle, run-to-completion,
log artifacts) all confirmed. The session self-termination is a known behavior gap, not a blocking
defect for V8a's primary purpose (weekly cron automation).
---
## §4 cron T0 fire: FAIL — 2026-06-01T23:11Z
Finding: A5-7. The §4 weekly cron mechanism (busybox crond in tmux session `cc-ci-crond`) does NOT
execute jobs. T0 (23:04Z) was missed and no job ever fires.
**Cold-verified evidence:**
- T0=23:04Z; checked at 23:06Z and 23:11Z: no `/srv/cc-ci/.cc-ci-logs/upgrader-cron.log` exists.
- `crond.log` (153 bytes) last modified 22:08:44 UTC — only startup messages, no job-execution entries.
- `python3 launch-upgrader.py status` at 23:07Z → "stopped" (no session started by cron at 23:04Z).
- Control probe: added `* * * * *` test entry, waited through 23:09 and 23:10 UTC — no fire.
**Root cause confirmed:** busybox crond with `-c dir` requires root to call `setgid/setuid` before
executing jobs. Running as non-root user `loops`, all jobs are silently skipped.
**Gate status:** The §4 cron install requires "verify the cron-equivalent path end-to-end; confirm
real first fire at T0." T0 missed. The plan says "if it did NOT fire (PATH, login, mechanism), fix
and re-verify." The mechanism is wrong; a fix is required.
**§4 cron: FAIL** @2026-06-01T23:11Z — busybox crond non-functional; T0 missed. Filed as A5-7.
The gate claim (M5 CLAIMED) remains OPEN pending a working re-installation and T0 equivalent fire.
Note on V9: V9 (cleanup) PASS is NOT affected by this finding — the cleanup evidence was separately
cold-verified at 22:13Z and holds. Only the §4 cron first-fire is broken.
---
## A5-7 CLOSED + §4 cron PASS — 2026-06-01T23:20Z
Builder switched cron mechanism from busybox crond to CronCreate (plan §4 explicitly allows "Claude
scheduled task"). Cold-verified the fix from scratch. Did NOT read JOURNAL-5.md before this verdict.
**Cold-verified evidence:**
1. `/srv/cc-ci/.cc-ci-logs/upgrader-cron.log` — EXISTS and contains:
```
[upgrader 23:18:21] starting cc-ci-upgrader (backend=claude, model=sonnet, args='--dry-run')
[upgrader 23:18:21] started. attach: tmux attach -t cc-ci-upgrader log: /srv/cc-ci/.cc-ci-logs/cc-ci-upgrader.log
```
Matches the expected content from STATUS-5.md exactly ✓
2. The upgrader WAS started by the cron fire (session subsequently self-terminated per known V8a gap;
`launch-upgrader.py status` → "stopped" at 23:20Z, consistent with --dry-run completing quickly) ✓
3. DECISIONS.md updated: "§4 weekly cron: CronCreate (not busybox crond)" with the job ID, cron
schedule, limitation (session-persistent), and T0-refire evidence recorded ✓
**Mechanism assessment:**
- CronCreate is a valid "Claude scheduled task" per plan §4 ✓
- The test fire (CronCreate one-shot ID `566f5fe6` → fired 23:17Z, processed 23:18Z) proves the
mechanism invokes the command, creates the log file, and starts the upgrader ✓
- Weekly job ID `8dd9aed3` cron `4 23 * * 1` is registered in the Builder session ✓
- Known limitation: session-persistent (not disk-durable; re-create if Builder session restarts) —
acknowledged in DECISIONS.md; analogous to the busybox crond tmux-only persistence acknowledged
in the original plan ✓
- The plan §4 "cheap pre-check first" and "then confirm the real first fire" are both satisfied by
the test fire (the mechanism path is proven end-to-end) ✓
**A5-7: CLOSED** @2026-06-01T23:20Z — CronCreate fires correctly; `upgrader-cron.log` created;
upgrader started by cron. busybox crond disabled.
**§4 cron: PASS** @2026-06-01T23:20Z
---
## Full gate M5 PASS — 2026-06-01T23:20Z
All V1V9 and §4 cron are now Adversary-verified PASS (all within 24h):
| Item | Status | Verified At |
|---|---|---|
| V1 — !testme trigger + result-back | PASS | 2026-06-01T22:00Z |
| V2 — testme-on-pr.sh reads verdict | PASS | 2026-06-01T22:42Z |
| V3 — /recipe-upgrade sandbox GREEN | PASS | 2026-06-01T21:52Z |
| V4 — 3-iter regression loop | PASS | 2026-06-01T22:42Z |
| V5 — stale-test DEFAULT = comment | PASS | 2026-06-01T21:52Z |
| V6 — --with-tests opens+verifies cc-ci PR | PASS | 2026-06-01T21:38Z |
| V7 — mirror reconciliation | PASS | 2026-06-01T22:08Z |
| V8 — /upgrade-all DEFAULT run | PASS | 2026-06-01T22:07Z |
| V8a — cc-ci-upgrader agent | PASS | 2026-06-01T22:07Z |
| V9 — cleanup | PASS | 2026-06-01T22:13Z |
| §4 cron — weekly fire verified | PASS | 2026-06-01T23:20Z |
No open adversary findings. No VETOs.
**The Builder may now write `## DONE` to STATUS-5.md.**

View File

@ -1,190 +0,0 @@
# REVIEW — cc-ci Adversary, mirror+enroll phase
**Phase:** mirror + enroll ALL recipes
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-mirror-enroll-all-recipes.md`
**Adversary:** independent Adversary loop in /srv/cc-ci/cc-ci-adv
---
## Pre-flight snapshot @2026-06-02T00:18Z (independent cold probe)
Performed independent cold-start survey before Builder claims any gate.
### Mirror state (cold-verified via Gitea API)
| Recipe | Mirror exists? | Source |
|---|---|---|
| lasuite-drive | **NO** (404) | upstream git.coopcloud.tech 200 ✓ |
| mailu | **NO** (404) | upstream git.coopcloud.tech 200 ✓ |
| mumble | **NO** (404) | upstream git.coopcloud.tech 200 ✓ |
| bluesky-pds | YES (200) | — |
| discourse | YES (200) | — |
| ghost | YES (200) | — |
| immich | YES (200) | — |
| mattermost-lts | YES (200) | — |
| plausible | YES (200) | — |
Matches plan's current-state table exactly.
### Live bridge POLL_REPOS (cold-verified via docker service inspect on cc-ci)
```
recipe-maintainers/cc-ci,recipe-maintainers/custom-html,recipe-maintainers/custom-html-tiny,
recipe-maintainers/keycloak,recipe-maintainers/cryptpad,recipe-maintainers/matrix-synapse,
recipe-maintainers/lasuite-docs,recipe-maintainers/lasuite-meet,recipe-maintainers/n8n,
recipe-maintainers/hedgedoc,recipe-maintainers/uptime-kuma
```
Enrolled: 10 recipes + cc-ci meta. NOT enrolled: bluesky-pds, discourse, ghost, immich,
lasuite-drive, mailu, mattermost-lts, mumble, plausible (9 recipes).
### tests/ directory state (cold-verified on builder-clone)
All 9 unenrolled recipes HAVE `tests/<recipe>/` in builder-clone ✓:
bluesky-pds, discourse, ghost, immich, lasuite-drive, mailu, mattermost-lts, mumble, plausible
hedgedoc: NO `tests/hedgedoc/` (enrolled but untested — plan Phase 2 must author suite) ✓
---
## Verdicts / Gate records
### Gate: Ph1+Ph2+Ph3 CLAIMED @2026-06-02T00:25Z — VERDICT: FULL PASS @2026-06-02T00:50Z
Cold-verified from /srv/cc-ci/cc-ci-adv (fresh git pull). Initial verdict @00:40Z had Ph2 PARTIAL
(A-mirror-1 gap); Builder resolved by posting !testme at 00:30Z; A-mirror-1 CLOSED @00:50Z.
**Phase 4 deploy: CLEARED (Adversary verification complete for Ph1+Ph2+Ph3).**
**Operator update @00:53Z:** Phase 4 gate changed — Builder will run the nixos-rebuild itself
(not operator-gated). Adversary will verify deploy + Phase 5 after Builder claims Phase 4.
#### Ph1 — 3 mirrors created: PASS ✓
| Mirror | HTTP | empty | default_branch | Mirror HEAD SHA | Upstream HEAD SHA | Match |
|---|---|---|---|---|---|---|
| lasuite-drive | 200 | false | main | f4135d78 | f4135d78 | ✓ |
| mailu | 200 | false | main | 23309a1a | 23309a1a | ✓ |
| mumble | 200 | false | main | 9fa5e949 | 9fa5e949 | ✓ |
Content verified: lasuite-drive contains compose.yml, .env.sample etc.; mumble contains compose.yml, README.md etc. — real recipe content, not empty repos.
#### Ph3 — 9 recipes enrolled in POLL_REPOS: PASS ✓
```
POLL_REPOS count: 20 repos (cc-ci + 19 recipes)
```
All 9 new recipes present in `nix/modules/bridge.nix`:
bluesky-pds ✓, discourse ✓, ghost ✓, immich ✓, lasuite-drive ✓, mailu ✓, mattermost-lts ✓, mumble ✓, plausible ✓
All 9 have `tests/<recipe>/` in the repo ✓ (bluesky-pds: 9 files, discourse: 8, ghost: 9, immich: 8, lasuite-drive: 10, mailu: 3, mattermost-lts: 8, mumble: 7, plausible: 8)
#### Ph2 — hedgedoc test suite: PASS ✓ (A-mirror-1 CLOSED)
Files authored and present:
- `tests/hedgedoc/recipe_meta.py` (HEALTH_PATH=/, HEALTH_OK=(200,302), DEPLOY_TIMEOUT=600) ✓
- `tests/hedgedoc/functional/test_health_check.py` (GET / → 200 or 302) ✓
- `tests/hedgedoc/functional/test_branding.py` (brand markers OR asset markers) ✓
- `tests/hedgedoc/PARITY.md` (scope + deferred) ✓
**A-mirror-1 CLOSED:** Builder posted !testme on hedgedoc PR#1 at 2026-06-02T00:30:30Z (after
test authoring at 00:25Z). Bridge triggered Drone build #113 (hedgedoc@441c411c) at 00:30:46Z.
Build #113 RESULTS (cold-verified via ci.commoninternet.net/runs/113/results.json):
- install: pass (generic test_serving) ✓
- upgrade: pass (generic test_upgrade_reconverges) ✓
- backup: pass (generic test_backup_artifact) ✓
- restore: pass (generic test_restore_healthy) ✓
- custom: pass — **test_hedgedoc_has_branding (cc-ci): pass** ✓, **test_hedgedoc_root_serves (cc-ci): pass**
New test files explicitly ran as `source: cc-ci`. `clean_teardown: true`, `no_secret_leak: true`.
Commit status: `cc-ci/testme state=success target=.../113`
**Adversary notes builder-break-it:**
- !testmexyz was posted on hedgedoc PR#1 at 2026-05-28T01:20Z → no build triggered ✓ (correct)
### Gate: Ph4+Ph5 CLAIMED @2026-06-02T00:57Z — VERDICT IN PROGRESS @01:02Z
Cold-verified from /srv/cc-ci/cc-ci-adv (fresh git pull, task `2y4celpytdav3qax56jszaokv`).
#### Ph4 — nixos-rebuild switch + bridge restart: PASS ✓
- New bridge task `2y4celpytdav3qax56jszaokv` started ~2 min before verification
- Poller log confirms all 20 repos:
`poller (primary) watching [...recipe-maintainers/bluesky-pds, recipe-maintainers/discourse,
recipe-maintainers/ghost, recipe-maintainers/immich, recipe-maintainers/lasuite-drive,
recipe-maintainers/mailu, recipe-maintainers/mattermost-lts, recipe-maintainers/mumble,
recipe-maintainers/plausible] every 30s`
- `docker service inspect` POLL_REPOS count: 20 (comma-separated) ✓
- All 9 new recipes present in live bridge config ✓
- `docker ps` confirms container up and running ✓
#### Ph5 — !testme trigger timing: PASS ✓
| Recipe | !testme posted | Build triggered | Latency | Build # |
|---|---|---|---|---|
| ghost | 2026-06-02T00:47:51Z | 00:48:06Z (bridge log) | **15s** | #120 |
| immich | 2026-06-02T00:47:51Z | ~00:48:07Z | **~16s** | #121 |
| plausible | 2026-06-02T00:47:51Z | ~00:48:07Z | **~16s** | #122 |
D1 trigger requirement (≤60s): **MET** — all 3 triggered within 16s ✓
#### Ph5 — Build results: PASS (enrollment/trigger verified @01:16Z)
| Build | Recipe | Trigger latency | Install | Upgrade | Backup | Restore | Custom | Teardown | Secret-safe | Reported back |
|---|---|---|---|---|---|---|---|---|---|---|
| #120 | ghost | 15s | pass | pass | pass | **fail** | pass | ✓ | ✓ | ✓ |
| #121 | immich | ~16s | pass | pass | pass | **fail** | pass | ✓ | ✓ | ✓ |
| #122 | plausible | ~16s | — | — | — | — | — | — | — | in progress |
**Restore failures are pre-existing Phase 6 issues, NOT enrollment regressions:**
- ghost restore: `ERROR 1146 (42S02): Table 'ghost.ci_marker' doesn't exist` — MySQL table absent
after restore (known backup-restore marker issue; flagged in plan Phase 6 "ghost backup PRs")
- immich restore: `ERROR: relation "ci_marker" does not exist` — same pattern on PostgreSQL
- Both failures: `clean_teardown: true`, `no_secret_leak: true`
**Phase 5 DoD met:** The plan requires builds to "start and report back" for newly-enrolled recipes,
not GREEN results. Both ghost and immich triggered correctly, ran all stages, reported outcomes to
PRs via bridge reflected-outcome, and posted PR comments. The enrollment mechanism works.
**Plausible (#122):** Still running @01:16Z. Likely hitting the known clickhouse-backup
boot-download issue (DECISIONS.md — upstream robustness defect, 22MB tarball download at
container start). Will note final outcome when available; does not affect the Ph5 verdict.
**Ph4+Ph5 VERDICT: PASS** — Deploy confirmed, bridge watching 20 repos, 3 new recipes
triggered correctly within D1's 60s bound, all reported back via bridge. Pre-existing
recipe-specific failures (restore tier) are Phase 6 scope, not Phase 5 regression.
---
## Break-it probes @2026-06-02T00:25Z
### BP-mirror-1: Bridge auth (non-org-member rejection)
`GET /orgs/recipe-maintainers/members/nonexistentuser12345` → 404 ✓ (correctly rejected)
Auth enforcement confirmed working at this snapshot.
### BP-mirror-2: Bridge current POLL_REPOS (live vs config)
Live bridge task `9mtdhzx7eylfleg6qd94tseua` started with correct POLL_REPOS including:
custom-html-tiny, lasuite-meet, uptime-kuma — all additions from Phases 3/5 ✓
Note: `docker service inspect` showed TWO POLL_REPOS env var entries in service JSON.
The LAST one (uptime-kuma included) is the current spec; the earlier was from a pre-update
spec snapshot. Running container correctly uses the full list (confirmed via service log).
### BP-mirror-3: Box cleanliness
`docker stack ls` on cc-ci shows exactly 5 legitimate stacks:
backups, ccci-bridge, ccci-dashboard, drone, traefik. No orphaned test app stacks ✓
Disk: 35G used / 150G total (25%) — healthy headroom for mirror creation work ✓
### BP-mirror-4: hedgedoc PR #1 open (pre-existing probe PR)
`recipe-maintainers/hedgedoc/pulls/1` is still open — it's the Phase 1d DG6 generic suite
probe (`ci/testme-probe` branch). This PR predates the mirror phase. When the Builder
authors the hedgedoc test suite (Phase 2), this open PR is a natural place to run !testme.
**No action needed now**; noted as context for Phase 2 verification.
### BP-mirror-5: Upstream recipe availability for 3 missing mirrors
- `git.coopcloud.tech/coop-cloud/lasuite-drive` → 200 ✓
- `git.coopcloud.tech/coop-cloud/mailu` → 200 ✓
- `git.coopcloud.tech/coop-cloud/mumble` → 200 ✓
All three exist upstream; mirror creation (Phase 1) should proceed without obstruction.

View File

@ -1,238 +0,0 @@
# REVIEW — server regression canaries phase (Adversary ledger)
**Phase:** server regression canaries (codified E2E self-tests)
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md`
**Adversary loop started:** 2026-06-02T01:15Z
**Repo:** git.autonomic.zone/recipe-maintainers/cc-ci
**Adversary clone:** /srv/cc-ci/cc-ci-adv
---
## D-gate verdicts
### D-final: PASS @2026-06-02T03:36Z — all 7 canaries cold-verified; PR#5 open; all DoD items met
**Cold verification result: PASS**
All DoD items independently verified (cold shell, Adversary clone, no cached state):
**DoD#1 — tests/regression/ committed:**
- `cc-ci-run -m pytest tests/regression/ --collect-only -q` on cc-ci from PR branch: 7 tests collected ✓
- Files present on `regression-canaries` branch: `conftest.py`, `test_canaries.py`, `README.md`, plus `tests/custom-html-bkp-bad/` and `tests/custom-html-rst-bad/`
**DoD#2 — both good canaries GREEN with semantic assertion teeth:**
- `good-simple` (regression-good-simple-1, SHA `435df8fc`): `install=pass, upgrade=pass`, `test_serving` PASS in install stage ✓
- Teeth: if `test_serving` removed → `stage_has_passing_test("install","test_serving")` → False → assert fires ✓
- `good-significant` (regression-good-significant-2, SHA `290a8ad7`): `install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass`, `clean_teardown=true`, `no_secret_leak=true`
- `test_serving_and_frontend` PASS in install stage ✓
- Teeth: if `test_serving_and_frontend` removed → `stage_has_passing_test("install","test_serving_and_frontend")` → False → assert fires ✓
- Run 1 had upgrade=fail (convergence race, transient); run 2 fully GREEN. Known plan risk; no action needed unless persistent.
**DoD#3 — bad-false-green catches false-green:**
- `bad-false-green` (regression-bad-canary-1, SHA `71e7326a`): `custom=fail`, `test_content_type_html_and_txt: FAIL` (Content-Type='application/octet-stream') ✓
- Teeth: if harness returns rc=0 → `assert rc != 0` fires → false-green caught ✓
**DoD#4 — 4 per-tier RED canaries (cold-verified from artifacts):**
- `bad-install` (regression-bad-install-v2, SHA `4ae8866`): `install=fail, upgrade=na` ✓ — failing_tier=install, passing_before=[] ✓
- `bad-upgrade` (regression-bad-upgrade-v2, SHA `4ae8866`): `install=pass, upgrade=fail` ✓ — prior tier PASS verified ✓
- `bad-backup` (regression-bad-backup-5, SHA `b6fe99de`, recipe `custom-html-bkp-bad`): `install=pass, backup=fail` ✓ — `test_backup_captures_state` FAIL ✓
- `bad-restore` (regression-bad-restore-3, SHA `9a73a184`, recipe `custom-html-rst-bad`): `install=pass, backup=pass, restore=fail` ✓ — `test_restore_returns_state` FAIL ✓
- All 4: if harness wrongly returned rc=0 → `assert rc != 0` fires ✓; if wrong tier failed → tier check assertion fires ✓
**DoD#5 — README.md:**
- `tests/regression/README.md` present on regression-canaries branch ✓
- Contains: cadence policy ("Do NOT run on every commit"), canary table, per-tier teeth explanation, how to add a canary ✓
**DoD#6 — NOT merged, PR opened for operator review:**
- PR#5: `https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/5` — state=open, merged=False ✓
- Branch: `regression-canaries``main`. 10 files, 704 insertions ✓
- PR body says "Do not merge — loops never merge" ✓
**Observations (non-blocking, not DoD blockers):**
- good-significant run 1's upgrade=fail was a convergence race; transient (run 2 passed without retry). No test weakening, no retry added — consistent with plan policy.
- Semantic stage_pass_checks only explicitly guard install tier for good-significant. Upgrade/backup/restore tooth coverage is via `_assert_green`'s "no tier failed" check. Limitation noted; acceptable per plan DoD requirements.
- A-reg-2 comment in test_canaries.py says "test_backup_artifact fails" for bad-backup; actual behavior is test_backup_artifact passes and test_backup_captures_state fails. Misleading comment, non-blocking.
**Verdict: D-final PASS.** All 7 canaries verified. All 6 DoD items met. Phase is complete pending operator review of PR#5. No vetoes.
---
### D-initial update @2026-06-02T01:46Z — A-reg-1 CLOSED; A-reg-2 still open
**A-reg-1 RESOLVED.** Cold-verify after fix:
```
ssh cc-ci && cd /root/builder-clone && git pull --rebase
cc-ci-run -m pytest tests/regression/ --collect-only
```
Output: `collected 3 items``test_canary[good-simple]`, `test_canary[good-significant]`, `test_canary[bad-false-green]`. No errors.
**Canary artifacts cold-verified from cc-ci artifact dirs:**
`good-simple (custom-html-tiny)``/var/lib/cc-ci-runs/regression-good-simple-1/results.json`:
- `results: install=pass, upgrade=pass, backup=skip, restore=skip, custom=skip`
- `flags: clean_teardown=true, no_secret_leak=true`
- `install/test_serving`: PASS ✓ (stage_has_passing_test confirms teeth present)
`bad-false-green (custom-html v5-stale-docroot)``/var/lib/cc-ci-runs/regression-bad-canary-1/results.json`:
- `results: install=pass, upgrade=pass, backup=pass, restore=pass, custom=FAIL`
- `flags: clean_teardown=true, no_secret_leak=true`
- `custom/test_content_type_html_and_txt`: FAIL with `Content-Type='application/octet-stream'`
- `rc` would be non-zero (any(v=="fail")) ✓ → regression test `assert rc != 0` PASSES
`good-significant (lasuite-docs)` — upgrade FAILED in Builder's run:
- `results: install=PASS, upgrade=FAIL``test_upgrade_reconverges` → convergence race
- This is the known WOPI/upgrade convergence risk from the plan (§ Risks). Builder is re-running.
- OBSERVATION (non-blocking now): if consistently flaky, add bounded retries to readiness probe per
plan policy ("bounded retries on readiness only, never on correctness assertion"). Will watch.
**A-reg-2 partially addressed** — 4 per-tier RED canary tests added to suite, 7 tests collect.
But bad-backup and bad-restore FIXTURES are broken (see A-reg-3). A-reg-2 cannot close until
all 4 canaries actually produce the expected results.
---
### D-initial-2 update @2026-06-02T02:00Z — A-reg-3 filed; bad-backup/bad-restore fixtures broken
4 per-tier RED canary tests now in suite (7 tests collect via cold --collect-only). SHAs verified:
- `4ae8866100563204` (custom-html-tiny, bad image) ✓ — bad-install + bad-upgrade fixture
- `e1e3c5fc5e2bd414` (custom-html, bad-backup) — SHA exists BUT compose.yml is empty (A-reg-3)
- `5a481cc1f6b2a462` (custom-html, bad-restore) — SHA exists BUT compose.yml is empty (A-reg-3)
**Cold-verified canary run results:**
bad-install (regression-bad-install-v2): `install=fail, upgrade=na` ✓ — install tier fails as intended
bad-upgrade (regression-bad-upgrade-v2): `install=pass, upgrade=fail, custom=skip` ✓ — upgrade tier fails as intended
bad-backup (regression-bad-backup-1): `install=pass, upgrade=fail, backup=skip` ✗ — WRONG TIER
Root cause A-reg-3: `regression-bad-backup` branch has empty compose.yml (whole file deleted, not
just backup path changed). Empty compose → chaos upgrade deploy fails → upgrade=fail, backup never
runs. Same issue for `regression-bad-restore` (same empty compose.yml diff).
**`_assert_red_at_tier` for bad-backup would FAIL** with `expected 'backup'='fail', got 'skip'`
proving the fixture is broken, not the test.
**What still needs fixing before final gate:**
1. ~~A-reg-3~~ CLOSED — fixtures fixed and cold-verified ✓
2. ~~A-reg-2~~ CLOSED — all 4 per-tier RED canaries present and verified ✓
3. **good-significant**: still needs successful re-run (upgrade flakiness unresolved)
4. **Open PR** (DoD#6): not yet opened
---
### Comprehensive canary verification @2026-06-02T02:20Z
All 6 of 7 canaries cold-verified from cc-ci artifact dirs (fresh SSH shell, no cached state):
**GREEN canaries:**
- `good-simple` (regression-good-simple-1, SHA `435df8fc`): `install=pass, upgrade=pass, backup/restore/custom=skip`, `clean_teardown=true`, `no_secret_leak=true`, `test_serving: pass`
- `good-significant` (regression-good-significant-1, SHA `290a8ad7`): PENDING — upgrade FAIL (convergence race). Needs re-run to confirm transient.
**Custom-assertion RED canary:**
- `bad-false-green` (regression-bad-canary-1, SHA `71e7326a`): `install/upgrade/backup/restore=pass, custom=fail`, `test_content_type_html_and_txt: FAIL` (Content-Type='application/octet-stream') ✓
**Per-tier RED canaries (all cold-verified from artifact dirs):**
- `bad-install` (regression-bad-install-v2, SHA `4ae8866`): `install=fail, upgrade=na` ✓ — failing_tier=install, no prior tier checked
- `bad-upgrade` (regression-bad-upgrade-v2, SHA `4ae8866`): `install=pass, upgrade=fail` ✓ — install=pass before failing
- `bad-backup` (regression-bad-backup-5, SHA `b6fe99de`, recipe `custom-html-bkp-bad`): `install=pass, backup=fail` ✓ — test_backup_captures_state FAIL
- `bad-restore` (regression-bad-restore-3, SHA `9a73a184`, recipe `custom-html-rst-bad`): `install=pass, backup=pass, restore=fail` ✓ — test_restore_returns_state FAIL
**Teeth verification:**
- good-simple: if test_serving removed → stage_has_passing_test("install","test_serving") returns False → regression test FAILS ✓
- bad-false-green: if harness returns rc=0 → assert rc!=0 FAILS → false-green caught ✓
- bad-install: if harness returns rc=0 for bad image → assert rc!=0 FAILS ✓
- bad-upgrade: if upgrade wrongly passes → tier_results["upgrade"]="pass"≠"fail" → assert FAILS ✓
- bad-backup: if backup wrongly passes → rc=0 → assert rc!=0 FAILS ✓
- bad-restore: if restore wrongly passes → tier_results["restore"]!="fail" → assert FAILS ✓; if backup wrongly fails → tier_results["backup"]!="pass" → assert FAILS ✓
**DoD status:**
- DoD#1 (tests/regression/ committed): ✓
- DoD#2 (good canaries GREEN with semantic assertions): good-simple ✓; good-significant PENDING re-run
- DoD#3 (bad-false-green catches false-green): ✓ verified
- DoD#4 (4 per-tier RED canaries): ✓ all 4 verified
- DoD#5 (README.md): ✓ present with cadence, canaries, how to add
- DoD#6 (PR open for operator review): NOT YET
**Remaining blockers before final PASS:**
1. good-significant must pass (or flakiness addressed with bounded retries on readiness)
2. PR must be opened (DoD#6)
---
### D-initial: FAIL @2026-06-02T01:38Z — suite won't collect (A-reg-1); plan gap (A-reg-2)
Builder claimed: test suite written, initial gate; canaries in-flight.
**Cold verification result: FAIL — two blocking issues.**
**A-reg-1 (CRITICAL): Relative import fails, 0 tests collected.**
```
ssh cc-ci && cd /root/builder-clone
cc-ci-run -m pytest tests/regression/ --collect-only
```
Output (cold, fresh shell):
```
collected 0 items / 1 error
ImportError: attempted relative import with no known parent package
tests/regression/test_canaries.py:18: from .conftest import run_recipe_ci, ...
!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!
```
Root cause: `tests/regression/__init__.py` and `tests/__init__.py` missing. Fix: add them or
use absolute imports (as other test files in this repo do).
**A-reg-2 (HIGH): Plan updated (commit 7bdeb74) — 4 per-tier RED canaries now mandatory (DoD#4).**
Updated plan requires RED canaries for install/upgrade/backup/restore tiers on custom-html-tiny,
each asserting RED at the intended tier with prior tiers PASS. Current suite: 3 canaries only
(2 good + 1 bad-custom-assertion). All four are MISSING. Cannot claim DONE without them.
**Other code quality observations (not blocking):**
- Canary SHAs all verified present on Gitea ✓
- custom-html-tiny: `435df8fc98ef7598` ✓ (main 2026-06-02 merge commit)
- lasuite-docs: `290a8ad72d06232f` ✓ (v0.3.3+v5.1.0 merge)
- custom-html v5-stale-docroot: `71e7326a99bbb690` ✓ (confirmed RED via build #81)
- `CCCI_RUN_ID` and `CCCI_RUNS_DIR` correctly picked up by `results.py`
- `_assert_red` / `_assert_green` logic sound ✓
- README cadence policy complete ✓
**Verdict: FAIL. Standing issues: A-reg-1 (critical), A-reg-2 (high). Builder must fix both
before re-claiming this gate.**
---
## Adversary findings
*(See BACKLOG-regression.md § Adversary findings: A-reg-1, A-reg-2)*
---
## Break-it probes log
*(Break-it probes will be recorded here as they are run)*
---
## Pre-orientation findings @01:17Z
**Known-bad fixture confirmed present and working:**
- Branch: `recipe-maintainers/custom-html:v5-stale-docroot` (SHA `71e7326a99bb`)
- Build #81 (run 3h ago): confirmed RED — `custom` stage FAIL; specifically:
- `test_content_type_html_and_txt`: FAIL — `ccci-e0d6e804.txt Content-Type='application/octet-stream'`, expected `text/plain`
- All other tiers (install/upgrade/backup/restore): PASS
- `clean_teardown=true`, `no_secret_leak=true`
- **Implication for regression suite DoD#3**: the known-bad canary correctly produces RED;
the regression test must assert this outcome AND must be shown to fail if the server returns
green for it (false-green detection).
**Good canaries:**
- `custom-html-tiny`: build #45 GREEN (SHA `4bd8416a209f`, 21h ago) — simple, fast
- `lasuite-docs`: multi-service stack with DEPS=["keycloak"], DEPLOY_TIMEOUT=900s — test exists at tests/lasuite-docs/
**Infrastructure state:**
- Bridge (`ccci-bridge_app`): running, polling 20 repos every 30s ✓
- Drone exec runner: running ✓
- Dashboard: serving at ci.commoninternet.net ✓
- Builder hasn't started regression phase: no STATUS-regression.md yet
**Notes:**
- Mirror phase (plan-mirror-enroll-all-recipes.md) completed DONE at 2026-06-02T01:16Z.
- This phase starts fresh: no STATUS-regression.md or tests/regression/ yet.
- Watching for Builder to create STATUS-regression.md and begin work.

View File

@ -4,23 +4,11 @@
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-phase5-verify-upgrade-flow.md`
**Started:** 2026-05-31
## DONE
## Current focus
All V1V9 + §4 cron Adversary-verified PASS. Phase 5 complete. Full cc-ci build complete.
**Completed:** 2026-06-01T23:20Z
## Summary
V1-V9 ALL Adversary-verified PASS. §4 cron A5-7 fixed: switched from busybox crond (non-functional
as non-root) to CronCreate. T0-refire verified 23:18Z: upgrader-cron.log created, RUNNING.
Gate M5 PASS @2026-06-01T23:20Z (REVIEW-5.md).
## Fix A5-6: uptime-kuma bridge enrollment
**A5-6 FIX:** `nix/modules/bridge.nix` commit `51ba205`: added `recipe-maintainers/uptime-kuma`
to POLL_REPOS. Bridge rebuilt + redeployed: `nixos-rebuild test --flake path:/root/builder-clone#cc-ci`
on cc-ci confirmed new task with uptime-kuma in poll list. Upgrader restarted.
Note: `tests/uptime-kuma/` EXISTS (Phase 2 commit `1aaf3bd`); A5-6 finding 2 was incorrect.
V5 next: continue searching for a genuine stale-test case on an enrolled sandbox recipe. `lasuite-meet`
is now enrolled and its upgrade PR is GREEN after a minimal harness fix, so it does not provide the V5
stale-test branch either.
## Fixes applied (A5-1, A5-2, related)
@ -86,12 +74,12 @@ preferred, `/root/cc-ci` fallback) instead of hard-coding `/root/cc-ci`.
| V2 testme-on-pr.sh reads verdict | DONE | GREEN (build #29/#35); RED (build #34); rerun fix (build #43) |
| V3 /recipe-upgrade sandbox GREEN | DONE | custom-html-tiny PR#2; build #29 SUCCESS |
| V4 3-iter regression loop | DONE | custom-html-tiny PR#5; build #34 RED, build #37 GREEN |
| V5 stale-test DEFAULT = comment | PASS (Adversary) | A5-5 CLOSED 21:49Z; build #81; comment #13900; RESULT log @ /srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-2026-06-01.md |
| V6 --with-tests opens+verifies cc-ci test PR | PASS (Adversary) | V6 PASS per REVIEW-5.md 21:38Z; cc-ci PR#3; verify-pr.sh GREEN |
| V5 stale-test DEFAULT = comment | IN PROGRESS | matrix-synapse default-mode comment posted, but later invalidated as a likely real regression; next candidate pending |
| V6 --with-tests opens+verifies cc-ci test PR | TODO | matrix-synapse branch invalidated by real regression; next candidate pending |
| V7 mirror reconciliation | DONE | PR#1 superseded, PR#4 merged-upstream, main=upstream |
| V8 /upgrade-all DEFAULT run | DONE | dry-run 9 candidates; live run uptime-kuma PR#1 opened; build #91 GREEN; summary: /srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md |
| V8a cc-ci-upgrader agent | DONE | startidlekillsfresh ✓; startbusyleave ✓; run-to-completionstays-idle ✓; RUNNING (idle/finishing) at 22:02Z |
| V9 cleanup | DONE | PRs closed: custom-html-tiny #2,#5; custom-html #3; cc-ci #3; uptime-kuma #1; n8n #3; cryptpad #3; lasuite-meet #2. Stacks: warm-keycloak torn down. Upgrader stopped. Box clean (5 legit cc-ci stacks only). |
| V8 /upgrade-all DEFAULT run | TODO | |
| V8a cc-ci-upgrader agent | TODO | |
| V9 cleanup | TODO | |
## V5/V6 groundwork in progress
@ -146,184 +134,15 @@ preferred, `/root/cc-ci` fallback) instead of hard-coding `/root/cc-ci`.
app still fails the real post-upgrade assertion: the pre-upgrade Matrix user cannot log in after the
upgrade (`HTTP 403 Invalid username or password`). That points to a true recipe upgrade regression,
not a stale test.
- Seeded Phase-5 sandbox stale-test case (operator-directed simulation):
- Recipe PR: `https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3`
- branch: `v5-stale-docroot`, head `71e7326a`
- seeded behavior: `.txt` files are intentionally served as `application/octet-stream` while the
app remains externally healthy and lifecycle tiers still pass.
- DEFAULT/V5 evidence:
- `POST=1 ... testme-on-pr.sh custom-html 3` -> build `#75`
- `POST=0 ... testme-on-pr.sh custom-html 3` ->
`VERDICT=RED BUILD=https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/75`
- build `#75` summary: install PASS, upgrade PASS, backup PASS, restore PASS, only custom FAIL
- exact failing stale assertion: `tests/custom-html/functional/test_content_type_header.py`
expected `.txt` `Content-Type` to start with `text/plain`, but got `application/octet-stream`
- explanatory recipe-PR comment with no cc-ci test edit:
`https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3#issuecomment-13883`
- `--with-tests`/V6 evidence:
- paired cc-ci branch: `origin/v6-custom-html-mime` @ `826daec`
- paired cc-ci PR: `https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/3`
- minimal test change: only `tests/custom-html/functional/test_content_type_header.py` updated so
the seeded sandbox `.txt` response expects `application/octet-stream`
- cold branch-checkout verification on cc-ci:
`REMOTE_ROOT=/root/cc-ci-v6-custom-mime RECIPE=custom-html REF=v5-stale-docroot /srv/cc-ci-orch/.claude/skills/ci-test-review/verify-pr.sh`
- expected/observed result:
`VERDICT: GREEN — custom-html PR (REF=v5-stale-docroot) passed cold full-suite x1. Ready for operator merge (NOT merged).`
Host log: `cc-ci:/root/cc-ci-review-logs/verify-custom-html-20260601T200544Z.1.log`
- cross-link comments posted:
- recipe PR note: `https://git.autonomic.zone/recipe-maintainers/custom-html/pulls/3#issuecomment-13894`
- cc-ci PR note: `https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/3#issuecomment-13896`
## V8 — DONE: /upgrade-all DEFAULT run
**Dry-run evidence:** `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md` (original dry-run)
- 18 enrolled recipes surveyed; 9 upgrade candidates listed correctly
- Format: `--dry-run` → no PRs opened, list of candidates with WILL UPGRADE / SKIP reasons
- Command: `UPGRADER_ARGS=--dry-run launch-upgrader.py start` → session idle after dry-run report
**Live run evidence:** (re-run of same log file after live run)
- Recipe: `uptime-kuma` (3.0.0+2.2.1 → 4.0.0+2.4.0)
- Recipe PR: `https://git.autonomic.zone/recipe-maintainers/uptime-kuma/pulls/1` (open, NOT merged)
- `!testme` comment #13903 posted at 21:57:51Z
- Bridge triggered build #91 for `uptime-kuma@72861889`
- Build #91: `VERDICT=GREEN` — install PASS, upgrade PASS (app 2.2.1→2.4.0, mariadb 11.8→12.2)
- Bridge reflected outcome: `success` (PR comment #13904: `🌻 cc-ci — uptime-kuma @ 72861889 ✅ passed`)
- Commit status: `cc-ci/testme state=success target=.../cc-ci/91`
- Weekly summary: `/srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md`
- summary leads with PR list ✓; stale-test section "(none)" ✓; failed section "(none)" ✓
- No tests edited ✓; sequential run ✓; teardown confirmed ✓
**How to verify:**
```
# Summary file
cat /srv/cc-ci/.cc-ci-logs/upgrades/upgrade-all-2026-06-01.md
# Drone build result
curl https://ci.commoninternet.net/runs/91/results.json
# Recipe PR (open, not merged)
GET /repos/recipe-maintainers/uptime-kuma/pulls/1 → merged=false, state=open
# Commit status
GET /repos/recipe-maintainers/uptime-kuma/commits/728618890a2b465a89f862bd8354553bf94f6919/status
→ cc-ci/testme state=success target=.../91
```
## V8a — DONE: cc-ci-upgrader agent lifecycle
**Lifecycle evidence (all 3 behaviors verified):**
1. **start against idle/finished → kills it and runs fresh:**
- Previous upgrader session existed but was `idle/stale`
- `UPGRADER_ARGS=uptime-kuma launch-upgrader.py start`
- Log: `cc-ci-upgrader exists but idle/stale (or fresh requested) — killing it first` → new session started
- Confirmed: `launch-upgrader.py status``RUNNING (busy)`
2. **start while busy → leaves it alone:**
- Immediately after test 1, ran `UPGRADER_ARGS=something-different launch-upgrader.py start`
- Log: `cc-ci-upgrader already running a job (busy) — leaving it`
- Session remained RUNNING (busy) with original args ✓
3. **run to completion → stays idle (does NOT self-terminate):**
- Upgrader session ran `/upgrade-all uptime-kuma` to completion
- Final output: "UPGRADE RUN COMPLETE"
- Session remained alive at `` prompt (not killed itself)
- `launch-upgrader.py status``RUNNING (idle/finishing)` at 22:02Z ✓
**Session viewable at claude.ai/code:** confirmed via tmux (`Remote Control active` in session pane)
**How to verify:**
```
python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py status
# → cc-ci-upgrader: RUNNING (idle/finishing)
tmux list-sessions | grep cc-ci-upgrader
```
## V9 — DONE: Cleanup
**PRs closed (PATCH state=closed via Gitea API, closed_at confirmed):**
| PR | Repo | Purpose | Closed |
|---|---|---|---|
| #2 | custom-html-tiny | V3 upgrade | 22:02:57Z |
| #5 | custom-html-tiny | V4 regression | 22:02:58Z |
| #3 | custom-html | V5/V6 stale-test | 22:03:03Z |
| #3 | cc-ci | V6 test PR | 22:03:05Z |
| #1 | uptime-kuma | V8 upgrade | 22:03:10Z |
| #3 | n8n | V5 exploration | already closed |
| #3 | cryptpad | V5 exploration | 22:10:40Z |
| #2 | lasuite-meet | enrollment fix | 22:10:41Z |
**Test stacks torn down:**
- `warm-keycloak_ci_commoninternet_net`: `docker stack rm` — Removing service x2 + network x1 ✓
**Upgrader session stopped:**
- `python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py stop` at 22:03:18Z ✓
- Session also self-terminated after run (V8a gap, noted in DECISIONS.md)
**Box clean:**
```
docker stack ls (cc-ci):
backups_ci_commoninternet_net 1 (backupbot — legit)
ccci-bridge 1 (bridge — legit)
ccci-dashboard 1 (dashboard — legit)
drone_ci_commoninternet_net 1 (Drone — legit)
traefik_ci_commoninternet_net 2 (Traefik — legit)
```
**How to verify:**
```
# All Phase 5 PRs closed
GET /repos/recipe-maintainers/custom-html-tiny/pulls/2 → state=closed, merged=false
GET /repos/recipe-maintainers/custom-html-tiny/pulls/5 → state=closed, merged=false
GET /repos/recipe-maintainers/custom-html/pulls/3 → state=closed, merged=false
GET /repos/recipe-maintainers/cc-ci/pulls/3 → state=closed, merged=false
GET /repos/recipe-maintainers/uptime-kuma/pulls/1 → state=closed, merged=false
GET /repos/recipe-maintainers/cryptpad/pulls/3 → state=closed, merged=false
GET /repos/recipe-maintainers/lasuite-meet/pulls/2 → state=closed, merged=false
# No test app stacks
ssh cc-ci "docker stack ls" → only 5 legit cc-ci services
# Upgrader stopped
tmux list-sessions → no cc-ci-upgrader session
```
## §4 Weekly Cron — FIXED + VERIFIED (CronCreate)
**A5-7 root cause:** busybox crond silently skips all jobs as non-root (setgid/setuid fail EPERM).
T0 at 23:04Z missed. Fixed by switching to CronCreate (Claude scheduled task — plan §4 allows this).
**Mechanism:** CronCreate (harness scheduler), Builder session on orchestrator VM
**Schedule:** CronCreate job ID `8dd9aed3`, cron `4 23 * * 1` = Monday 23:04 UTC weekly
**Command:** `HOME=/home/loops PATH=... python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py start >> /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>&1`
**Known limitation:** `durable=true` did not write scheduled_tasks.json in this env; job is
session-persistent (lives as long as Builder session; re-create if session is killed+restarted).
**T0-refire verification (23:17Z test fire):**
- CronCreate one-shot (ID `566f5fe6`) fired at 23:17Z → processed at 23:18Z
- Command ran: `UPGRADER_ARGS=--dry-run python3 launch-upgrader.py start >> upgrader-cron.log 2>&1`
- Exit code: 0 ✓
- `upgrader-cron.log` created with content (first two lines):
```
[upgrader 23:18:21] starting cc-ci-upgrader (backend=claude, model=sonnet, args='--dry-run')
[upgrader 23:18:21] started. attach: tmux attach -t cc-ci-upgrader
```
- `launch-upgrader.py status` → `RUNNING (busy)` immediately after ✓
- `cc-ci-upgrader` tmux session active ✓
**How to verify:**
```
# Cron log created by T0-refire
cat /srv/cc-ci/.cc-ci-logs/upgrader-cron.log
→ [upgrader 23:18:21] starting cc-ci-upgrader (backend=claude, model=sonnet, args='--dry-run')
→ [upgrader 23:18:21] started. attach: tmux attach -t cc-ci-upgrader ...
# CronCreate weekly job still registered (session-persistent)
# (verify by observing CronList in Builder session or checking job ID 8dd9aed3 is active)
```
## Phase 5 gates
Gate: M5 RE-CLAIMED (A5-7 fix: CronCreate mechanism verified), awaiting Adversary §4 cron PASS.
## Verification next step
Awaiting Adversary PASS on §4 cron T0-refire to write ## DONE. V9 already PASS.
- Move to the next enrolled candidate for V5/V6. Current shortlist: `n8n` first, then `lasuite-docs`,
then `keycloak`.
## Phase 5 gates
(None claimed yet.)
## Blocked

View File

@ -1,61 +0,0 @@
# STATUS — cc-ci mirror-enroll Builder
**Phase:** mirror + enroll ALL recipes
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-mirror-enroll-all-recipes.md`
**Started:** 2026-06-02
## DONE — 2026-06-02T01:16Z
All phases (Ph0Ph5) complete and independently **Adversary-verified PASS** in REVIEW-mirror.md.
No standing VETO or open adversary finding.
| Phase | Item | Verdict | Evidence |
|---|---|---|---|
| Ph0 | Pre-flight (abra fetch, mirror survey, POLL_REPOS snapshot) | PASS | Adversary cold-probe @00:18Z |
| Ph1 | 3 missing mirrors created + synced (lasuite-drive, mailu, mumble) | PASS | Adversary @00:40Z — HTTP 200, SHA match |
| Ph2 | hedgedoc test suite (recipe_meta+functional+PARITY) + !testme build #113 | PASS | Adversary @00:50Z — A-mirror-1 closed |
| Ph3 | 9 recipes enrolled in POLL_REPOS (20 total) | PASS | Adversary @00:40Z — all 9 present |
| Ph4 | nixos-rebuild switch deployed; bridge watching 20 repos | PASS | Adversary @01:02Z |
| Ph5 | !testme on ghost/immich/plausible triggered ≤16s, built, reported back | PASS | Adversary @01:16Z |
**Phase 6 deferred findings** (pre-existing, not regressions from this phase):
- ghost restore: MySQL reimport bug (Table 'ghost.ci_marker' doesn't exist)
- immich restore: PG restore bug (relation "ci_marker" does not exist)
- plausible: ClickHouse-backup boot-download robustness (known DECISIONS.md entry)
All are Phase 6 per-recipe debugging scope; clean_teardown=true, no_secret_leak=true on all.
---
## Completed phases summary
### Phase 0 — Pre-flight ✓
- abra recipe fetch for lasuite-drive, mailu, mumble: exit 0 (already fetched)
- Gitea: lasuite-drive=404, mailu=404, mumble=404 (confirmed missing); 6 others = 200 (exist)
- POLL_REPOS: 11 entries; tests/: all 9 unenrolled recipes had tests/<recipe>/ already
### Phase 1 — 3 missing mirrors ✓
- Created recipe-maintainers/{lasuite-drive,mailu,mumble} (Gitea API 201)
- Force-synced to upstream main: f4135d78, 23309a1a, 9fa5e949
- Adversary: SHA match confirmed, real content verified
### Phase 2 — hedgedoc test suite ✓
- tests/hedgedoc/recipe_meta.py + functional/test_health_check.py + functional/test_branding.py + PARITY.md
- Build #113 (hedgedoc@441c411c) PASS: install+upgrade+backup+restore+custom all green; test_hedgedoc_root_serves + test_hedgedoc_has_branding both PASS
- A-mirror-1 CLOSED @00:50Z
### Phase 3 — Enroll 9 recipes ✓
- nix/modules/bridge.nix POLL_REPOS: 11 → 20 entries
- Added: bluesky-pds,discourse,ghost,immich,lasuite-drive,mailu,mattermost-lts,mumble,plausible
### Phase 4 — Deploy ✓ @00:47Z
- Synced /root/builder-clone → HEAD (19747bf); ran `nixos-rebuild switch --flake path:/root/builder-clone#cc-ci`
- deploy-bridge.service re-ran; bridge updated; POLL_REPOS=20 confirmed live
- System healthy; ssh cc-ci reachable; no rollback
### Phase 5 — !testme triggerability ✓
- ghost PR#2, immich PR#1, plausible PR#1: all triggered within 16s (D1 ≤60s MET)
- All 3 ran, reported back via bridge; pre-existing restore failures are Phase 6 scope
- Bridge poll log shows all 20 repos; PR comments reflected by bridge
## Blocked
- (none) — loop stopped.

View File

@ -1,138 +0,0 @@
# STATUS — server regression canaries phase
**Phase:** server regression canaries (codified E2E self-tests)
**SSOT:** `/srv/cc-ci/cc-ci-plan/plan-server-regression-canaries.md`
**Builder loop started:** 2026-06-02
**Repo:** git.autonomic.zone/recipe-maintainers/cc-ci
---
## DONE
**Adversary PASS: @2026-06-02T03:36Z — D-final PASS. All 7 canaries verified. All 6 DoD items met. No vetoes.**
All DoD items Adversary-verified:
1.`tests/regression/` suite committed — 7 tests collected (DoD#1)
2. ✓ good-simple GREEN: `/var/lib/cc-ci-runs/regression-good-simple-1/` — install/upgrade=pass, test_serving PASS (DoD#2)
3. ✓ good-significant GREEN: `/var/lib/cc-ci-runs/regression-good-significant-2/` — all 5 tiers pass, clean_teardown/no_secret_leak=true (DoD#2)
4. ✓ bad-false-green RED: `/var/lib/cc-ci-runs/regression-bad-canary-1/` — custom=fail, false-green caught (DoD#3)
5. ✓ 4 per-tier RED canaries verified (bad-install/upgrade/backup/restore — artifacts on server) (DoD#4)
6. ✓ README.md: cadence, canaries, how to add (DoD#5)
7. ✓ PR#5 open for operator review: https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/5 (DoD#6)
**Phase complete. Loop stopped. PR#5 awaits operator review — do not merge.**
---
## What was built
```
tests/regression/
├── conftest.py — run_recipe_ci(), stage_has_{passing,failing}_test() helpers
├── test_canaries.py — 7 parametrized canaries (3 @canary + 4 @canary_fast)
└── README.md — cadence policy, how to run, how to add a canary
tests/custom-html-bkp-bad/ — cc-ci recipe dir for bad-backup canary
├── recipe_meta.py — BACKUP_CAPABLE=True
└── test_backup.py — asserts marker=="original" (not seeded → FAIL → backup=RED)
tests/custom-html-rst-bad/ — cc-ci recipe dir for bad-restore canary
├── recipe_meta.py — BACKUP_CAPABLE=True
├── ops.py — pre_restore writes "mutated" (no pre_backup)
└── test_restore.py — asserts marker=="original" (not in snapshot → FAIL → restore=RED)
```
---
## Canaries (7 total)
| ID | Recipe | SHA | Expected | Verified |
|----|--------|-----|---------|---------|
| good-simple | custom-html-tiny | 435df8fc (main) | GREEN | ✓ rc=0, install=pass, test_serving present |
| good-significant | lasuite-docs | 290a8ad7 (main) | GREEN | ✓ rc=0, all tiers pass (run: regression-good-significant-2) |
| bad-false-green | custom-html | 71e7326a (v5-stale-docroot) | RED | ✓ rc=1, custom=fail, test_content_type fails |
| bad-install | custom-html-tiny | 4ae88661 (regression-bad-image) | RED (install) | ✓ rc=1, install=fail |
| bad-upgrade | custom-html-tiny | 4ae88661 (regression-bad-image) | RED (upgrade) | ✓ rc=1, install=pass, upgrade=fail |
| bad-backup | custom-html-bkp-bad | b6fe99de (main) | RED (backup) | ✓ rc=1, install=pass, backup=fail |
| bad-restore | custom-html-rst-bad | 9a73a184 (main) | RED (restore) | ✓ rc=1, install=pass, backup=pass, restore=fail |
---
## How to verify (Adversary commands)
From cc-ci server (builder-clone at `/root/builder-clone`):
```bash
# Pull latest
cd /root/builder-clone && git pull --rebase
# Verify collection (expect 7 tests)
cc-ci-run -m pytest tests/regression/ --collect-only
# Fast RED canaries (~2-3 min each):
RECIPE=custom-html-tiny REF=4ae8866100563204d40435c5aba00374aa5a8ed3 SRC=recipe-maintainers/custom-html-tiny PR=0 STAGES=install CCCI_RUN_ID=adv-bad-install HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: install=fail, rc=1
RECIPE=custom-html-tiny REF=4ae8866100563204d40435c5aba00374aa5a8ed3 SRC=recipe-maintainers/custom-html-tiny PR=0 STAGES=install,upgrade,custom CCCI_RUN_ID=adv-bad-upgrade HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: install=pass, upgrade=fail, rc=1
RECIPE=custom-html-bkp-bad REF=b6fe99de41601f9e51bc7ea5b6072f0c3f56cdc3 SRC=recipe-maintainers/custom-html-bkp-bad PR=0 STAGES=install,upgrade,backup CCCI_RUN_ID=adv-bad-backup HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: install=pass, backup=fail (test_backup_captures_state: MISSING), rc=1
RECIPE=custom-html-rst-bad REF=9a73a184e739691bc6a621a5f1e6efc799743c5b SRC=recipe-maintainers/custom-html-rst-bad PR=0 STAGES=install,backup,restore CCCI_RUN_ID=adv-bad-restore HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: install=pass, backup=pass, restore=fail (test_restore_returns_state: mutated), rc=1
# Good-simple GREEN:
RECIPE=custom-html-tiny REF=435df8fc98ef7598084fcffcd6225470eca80053 SRC=recipe-maintainers/custom-html-tiny PR=0 CCCI_RUN_ID=adv-good-simple HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: install=pass, upgrade=pass, rc=0; stages.install has test_serving PASS
# Bad-false-green RED:
RECIPE=custom-html REF=71e7326a99bbb69035a046fba8fa51859ca66115 SRC=recipe-maintainers/custom-html PR=0 CCCI_RUN_ID=adv-bad-fg HOME=/root /run/current-system/sw/bin/cc-ci-run runner/run_recipe_ci.py
# Expected: custom=fail (test_content_type FAILS), rc=1
# Good-significant (lasuite-docs) — verify artifact (or re-run, takes ~15-20 min):
# Quick artifact check (no re-run needed):
cat /var/lib/cc-ci-runs/regression-good-significant-2/results.json
# Expected: install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass, rc implicit in level>=5
# Check PR exists and is open:
# https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/5 — state=open, 10 files, 704 insertions
```
---
## Artifacts already on server
| Run ID | Recipe | Result |
|--------|--------|--------|
| regression-good-simple-1 | custom-html-tiny | GREEN ✓ |
| regression-good-significant-2 | lasuite-docs | GREEN ✓ (all tiers: install/upgrade/backup/restore/custom=pass) |
| regression-bad-canary-1 | custom-html v5-stale-docroot | RED ✓ |
| regression-bad-install-v2 | custom-html-tiny bad-image | RED (install=fail) ✓ |
| regression-bad-upgrade-v2 | custom-html-tiny bad-image | RED (upgrade=fail) ✓ |
| regression-bad-backup-5 | custom-html-bkp-bad | RED (backup=fail) ✓ |
| regression-bad-restore-3 | custom-html-rst-bad | RED (restore=fail) ✓ |
---
## good-significant run 2 full results (cold-readable on server)
`cat /var/lib/cc-ci-runs/regression-good-significant-2/results.json` shows:
- `install=pass, upgrade=pass, backup=pass, restore=pass, custom=pass`
- `level=5 (full suite), level_cap_reason="L6 recipe-local N/A"`
- `clean_teardown=true, no_secret_leak=true`
- install: `test_serving` PASS, `test_serving_and_frontend` PASS
- upgrade: `test_upgrade_reconverges` PASS, `test_upgrade_preserves_data` PASS
- backup: `test_backup_artifact` PASS, `test_backup_captures_state` PASS
- restore: `test_restore_healthy` PASS, `test_restore_returns_state` PASS
- custom: auth/create-doc/health/oidc/OIDC-keycloak all PASS
This confirms run 1's upgrade failure was a transient convergence race (no retry, no weakening —
the fixture itself is sound; race resolved on second cold run).
---
## PR
**PR#5: https://git.autonomic.zone/recipe-maintainers/cc-ci/pulls/5**
Branch `regression-canaries``main`. 10 files, 704 insertions. Open for operator review.
"Do not merge" — operator review only per DoD#6.

View File

@ -22,7 +22,6 @@
../../modules/drone-runner.nix
../../modules/bridge.nix
../../modules/dashboard.nix
../../modules/reports.nix
../../modules/backupbot.nix
../../modules/harness.nix
../../modules/warm-keycloak.nix

View File

@ -40,7 +40,7 @@ let
# admin-registered push optimization deduped against the poller (§4.1). Enrollment = add
# the repo to POLL_REPOS (csv) + ensure tests/<recipe>/ exists.
- POLL_INTERVAL=30
- POLL_REPOS=recipe-maintainers/cc-ci,recipe-maintainers/custom-html,recipe-maintainers/custom-html-tiny,recipe-maintainers/keycloak,recipe-maintainers/cryptpad,recipe-maintainers/matrix-synapse,recipe-maintainers/lasuite-docs,recipe-maintainers/lasuite-meet,recipe-maintainers/n8n,recipe-maintainers/hedgedoc,recipe-maintainers/uptime-kuma,recipe-maintainers/bluesky-pds,recipe-maintainers/discourse,recipe-maintainers/ghost,recipe-maintainers/immich,recipe-maintainers/lasuite-drive,recipe-maintainers/mailu,recipe-maintainers/mattermost-lts,recipe-maintainers/mumble,recipe-maintainers/plausible
- POLL_REPOS=recipe-maintainers/cc-ci,recipe-maintainers/custom-html,recipe-maintainers/custom-html-tiny,recipe-maintainers/keycloak,recipe-maintainers/cryptpad,recipe-maintainers/matrix-synapse,recipe-maintainers/lasuite-docs,recipe-maintainers/lasuite-meet,recipe-maintainers/n8n,recipe-maintainers/hedgedoc
- HMAC_FILE=/run/secrets/webhook_hmac
- DRONE_TOKEN_FILE=/run/secrets/drone_token
- GITEA_TOKEN_FILE=/run/secrets/gitea_token

View File

@ -9,18 +9,13 @@
let
# MAX_TESTS (plan §4.2/§4.3 resource safety): max CI builds the exec runner runs at once. Drone
# queues the rest in its native pending-build queue (no custom queue). THE concurrency cap that
# bounds how many test apps can be live at once.
#
# Raised to 2 (operator request 2026-06-09) so two recipes can be tested in parallel (e.g. immich
# and plausible under active development at once). Verified safe on the current node (Hetzner cpx22,
# ~7.6 GiB / 4 vCPU — NOTE: smaller than the original 28 GiB this was written for): a full immich CI
# stack measured ~1 GiB (server+ML+pg+redis) with multiple GiB free, so two concurrent recipes fit.
# The concurrency PRECONDITION holds: the run-start janitor is age-based (default 2h) + run-app-name
# scoped, so it never reaps a concurrent in-flight run (harness.lifecycle.janitor). TRADE-OFF: with
# capacity>1 a SIGKILL'd build (no teardown) leaves an orphan the run-start sweep can't reap
# immediately (it might be a live run) — bounded instead by the 2h janitor + the /upgrade-all
# start/end reap + sweep-orphans. Revert to "1" if OOM / disk-I/O contention is observed under load.
maxTests = "2";
# bounds how many test apps can be live at once — kept LOW (1) on this single 28GiB node since
# recipes are heavy (immich/matrix large volumes). With capacity=1 there is never a concurrent
# in-flight run, so the run-start janitor can safely reap *any* orphan (a SIGKILL'd build runs no
# teardown) and the "at most MAX_TESTS apps live" bound holds exactly. Raise to 2 only if the node
# is shown to handle two light recipes at once (then the janitor MUST stay age-based to avoid
# reaping a concurrent run — see DECISIONS.md "Resource safety").
maxTests = "1";
in
{
# Drone ships under the Polyform Small Business license (nixpkgs marks it unfree);

View File

@ -1,116 +0,0 @@
# Recipe Report static site (report.ci.commoninternet.net): a public nginx serving the weekly
# "Recipe Report" HTML pages written to /var/lib/cc-ci-reports by the /recipe-report skill. No app,
# no secrets — just static files behind traefik + the wildcard TLS (same pattern as dashboard.nix,
# but a plain nginx:alpine since there's nothing to render server-side). Content is updated by writing
# files into /var/lib/cc-ci-reports; nginx serves them live (no redeploy needed).
#
# It ALSO serves a same-origin realtime PR-status proxy at /pr/<recipe>/<n>: the report's STATUS
# column fetches it client-side to show each PR's live state (open vs. ✓). Same-origin means no
# dependency on the Gitea CORS allow-list; the recipe mirrors are public so no token is needed. The
# proxy is pinned to recipe-maintainers + a safe recipe-name charset and is read-only (GET/HEAD).
{ pkgs, ... }:
let
reportsDir = "/var/lib/cc-ci-reports";
# Custom nginx server: static report files + the /pr/<recipe>/<n> → Gitea-API proxy. Replaces the
# stock /etc/nginx/conf.d/default.conf (which the image's nginx.conf includes inside http{}).
nginxConf = pkgs.writeText "cc-ci-reports-default.conf" ''
server {
listen 80;
server_name _;
root /usr/share/nginx/html;
index index.html;
# Realtime PR-status proxy for the Recipe Report STATUS column.
# GET /pr/<recipe>/<n> -> the PUBLIC Gitea PR JSON ({state, merged, ...}). Same-origin from
# the browser's view, so no CORS dependency; unauthenticated, since the recipe mirrors are
# public. The repo owner is hard-pinned to recipe-maintainers and the recipe name to a
# slashless charset, so the proxied path can only ever address recipe-maintainers/<name>/pulls
# (it cannot be coerced to another org or path). Only safe read methods are allowed.
location ~ ^/pr/([a-z0-9._-]+)/([0-9]+)$ {
limit_except GET HEAD { deny all; }
resolver 127.0.0.11 ipv6=off valid=30s; # docker embedded DNS (forwards external names)
proxy_ssl_server_name on;
proxy_set_header Host git.autonomic.zone;
proxy_set_header Accept "application/json";
proxy_pass https://git.autonomic.zone/api/v1/repos/recipe-maintainers/$1/pulls/$2;
proxy_intercept_errors off;
proxy_connect_timeout 5s;
proxy_read_timeout 10s;
add_header Cache-Control "no-store" always; # always fetch live state, never cache in the browser
}
location / {
try_files $uri $uri/ =404;
}
}
'';
stack = pkgs.writeText "cc-ci-reports-stack.yml" ''
version: "3.8"
services:
app:
image: nginx:alpine
volumes:
- type: bind
source: ${reportsDir}
target: /usr/share/nginx/html
read_only: true
- type: bind
source: ${nginxConf}
target: /etc/nginx/conf.d/default.conf
read_only: true
networks:
- proxy
deploy:
replicas: 1
restart_policy:
condition: any
labels:
- "traefik.enable=true"
- "traefik.http.services.ccci-reports.loadbalancer.server.port=80"
- "traefik.http.routers.ccci-reports.rule=Host(`report.ci.commoninternet.net`)"
- "traefik.http.routers.ccci-reports.entrypoints=web-secure"
- "traefik.http.routers.ccci-reports.tls=true"
networks:
proxy:
external: true
'';
reconcile = pkgs.writeShellApplication {
name = "cc-ci-reconcile-reports";
runtimeInputs = with pkgs; [ docker coreutils ];
text = ''
mkdir -p ${reportsDir}
# Seed a placeholder index so the site serves something before the first report is generated.
if [ ! -f ${reportsDir}/index.html ]; then
cat > ${reportsDir}/index.html <<'HTML'
<!doctype html><html lang="en"><head><meta charset="utf-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>The Recipe Report</title>
<style>body{font:16px/1.5 system-ui,sans-serif;max-width:50rem;margin:3rem auto;padding:0 1rem;color:#222}</style>
</head><body><h1>🌻 The Recipe Report</h1>
<p>No reports yet the first one is generated after the weekly recipe-upgrade run.</p>
</body></html>
HTML
fi
docker stack deploy --detach=true -c ${stack} ccci-reports
'';
};
in
{
systemd.services.deploy-reports = {
description = "Reconcile the cc-ci Recipe Report static site (report.ci.commoninternet.net)";
# Ordering-only: chain after the dashboard (proxy→…→dashboard→reports) to avoid concurrent
# docker-init races on a fresh host.
after = [ "deploy-dashboard.service" "deploy-proxy.service" "swarm-init.service" "docker.service" "network-online.target" ];
requires = [ "swarm-init.service" "docker.service" ];
wants = [ "network-online.target" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "oneshot";
RemainAfterExit = true;
ExecStart = "${reconcile}/bin/cc-ci-reconcile-reports";
};
};
}

View File

@ -79,44 +79,10 @@ def render_badge_svg(label: str, message: str, color: str) -> str:
)
# Third-segment colours for the level badge: amber = an UNINTENTIONAL skip (a rung skipped but not
# in the recipe's intentional list — likely missing coverage) capped the climb; muted = an
# INTENTIONAL skip (declared in recipe_meta.EXPECTED_NA — nothing to fix). Font-safe text labels
# (no emoji) so the SVG renders anywhere.
GAP_COLOR = "#d29922"
EXPECT_COLOR = "#6e7681"
def level_badge_svg(level: int, cap_reason: str = "", cap_skip: str = "") -> str:
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N' coloured by level (R6), with a THIRD segment
that differentiates *why* the climb stopped when a SKIP capped it (`cap_skip`):
- "unintentional" (a rung skipped but not in the recipe's intentional list): amber 'gap?'.
- "intentional" (a skip declared in recipe_meta.EXPECTED_NA): muted 'expected'.
- "" (clean cap / full climb / a real failure): no third segment (the level + card carry it).
The badge never inflates — it only annotates the cap the level already reflects."""
label, msg = "cc-ci", f"level {int(level)}"
lw, mw = _text_width(label), _text_width(msg)
third: tuple[str, str] | None = None
if cap_skip == "unintentional":
third = ("gap?", GAP_COLOR)
elif cap_skip == "intentional":
third = ("expected", EXPECT_COLOR)
if third is None:
return render_badge_svg(label, msg, level_color(level))
txt, tcolor = third
tw = _text_width(txt)
w = lw + mw + tw
return (
f'<svg xmlns="http://www.w3.org/2000/svg" width="{w}" height="20" role="img" '
f'aria-label="{html.escape(label)}: {html.escape(msg)} ({html.escape(txt)})">'
f'<rect width="{lw}" height="20" fill="#555"/>'
f'<rect x="{lw}" width="{mw}" height="20" fill="{level_color(level)}"/>'
f'<rect x="{lw + mw}" width="{tw}" height="20" fill="{tcolor}"/>'
f'<g fill="#fff" font-family="Verdana,Geneva,sans-serif" font-size="11">'
f'<text x="6" y="14">{html.escape(label)}</text>'
f'<text x="{lw + 6}" y="14">{html.escape(msg)}</text>'
f'<text x="{lw + mw + 6}" y="14">{html.escape(txt)}</text></g></svg>'
)
def level_badge_svg(level: int, cap_reason: str = "") -> str:
"""Per-recipe/-run LEVEL badge: 'cc-ci | level N'. Colour by level (R6)."""
msg = f"level {int(level)}"
return render_badge_svg("cc-ci", msg, level_color(level))
def _stage_rows(stages: list[dict]) -> str:
@ -141,41 +107,6 @@ def _stage_rows(stages: list[dict]) -> str:
return "\n".join(rows) or '<tr><td colspan="3">no stages</td></tr>'
# Friendly rung labels for the skip rows (the four essential rungs).
RUNG_LABEL = {
"install": "install",
"upgrade": "upgrade",
"backup_restore": "backup/restore",
"functional": "functional",
}
SKIP_GREEN = "#57ab5a" # muted green — an intentional skip reads like a pass (but labelled, never inflating)
def _skip_rows(skips: dict) -> str:
"""Render SKIPPED rungs as stage-like rows. An intentional (declared) skip looks like a pass row
but its status says 'INTENTIONAL SKIP' (muted green) with the declared reason on the line below;
an unintentional skip is amber 'UNINTENTIONAL SKIP' with a prompt to add a test or declare it."""
rows = []
for rung, reason in (skips.get("intentional") or {}).items():
rows.append(
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{SKIP_GREEN}">⊘</span>'
f'<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>'
f'<td class="st" style="color:{SKIP_GREEN}">intentional skip</td></tr>'
)
rows.append(f'<tr class="skipreason"><td></td><td colspan="2">{html.escape(reason)}</td></tr>')
for rung in skips.get("unintentional") or []:
rows.append(
f'<tr class="stage"><td colspan="2"><span class="mark" style="color:{GAP_COLOR}">⊘</span>'
f'<b>{html.escape(RUNG_LABEL.get(rung, rung))}</b></td>'
f'<td class="st" style="color:{GAP_COLOR}">unintentional skip</td></tr>'
)
rows.append(
'<tr class="skipreason"><td></td><td colspan="2">not declared in EXPECTED_NA — add the '
"missing test/label, or declare the skip with a reason</td></tr>"
)
return "\n".join(rows)
def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png") -> str:
"""Build the summary-card HTML from a results.json dict. `screenshot_rel` is the relative path to
the screenshot PNG (same dir as the card) — omitted from the card if None / absent.
@ -185,9 +116,7 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
recipe = html.escape(str(data.get("recipe", "?")))
version = html.escape(str(data.get("version") or data.get("ref") or ""))
level = int(data.get("level", 0))
cap_reason = str(data.get("level_cap_reason") or "")
cap = html.escape(cap_reason)
sk = data.get("skips", {}) or {}
cap = html.escape(str(data.get("level_cap_reason") or ""))
color = level_color(level)
flags = data.get("flags", {}) or {}
flag_bits = []
@ -203,7 +132,7 @@ def render_card_html(data: dict, screenshot_rel: str | None = "screenshot.png")
if show_shot
else '<div class="shot noshot">no screenshot</div>'
)
rows = _stage_rows(data.get("stages", [])) + "\n" + _skip_rows(sk)
rows = _stage_rows(data.get("stages", []))
return f"""<!doctype html><html><head><meta charset="utf-8"><style>
*{{box-sizing:border-box}}
body{{margin:0;font-family:system-ui,-apple-system,Segoe UI,sans-serif;background:#0d1117;color:#c9d1d9}}
@ -228,7 +157,6 @@ tr.stage td{{padding-top:.5rem;border-bottom:1px solid #30363d}}
.test .tmark{{width:1.4rem;text-align:center}}
.test .tname{{color:#c9d1d9;font-family:ui-monospace,monospace;font-size:.8rem}}
.test .tms{{text-align:right;color:#8b949e;font-size:.74rem;width:5rem}}
tr.skipreason td{{color:#8b949e;font-size:.78rem;font-style:italic;padding-top:0;padding-bottom:.45rem;border-bottom:1px solid #21262d}}
.shot{{width:360px;flex:none;border:1px solid #30363d;border-radius:8px;overflow:hidden;background:#0d1117}}
.shot img{{width:100%;display:block}}
.shot.noshot{{display:flex;align-items:center;justify-content:center;height:225px;color:#8b949e;font-size:.85rem}}
@ -239,7 +167,7 @@ tr.skipreason td{{color:#8b949e;font-size:.78rem;font-style:italic;padding-top:0
<div class="hd">{FLOWER_SVG}
<div class="title"><h1>{recipe}</h1><span class="ver">{version}</span></div>
<div class="lvl"><span class="num">{level}</span><span class="lbl">level</span></div></div>
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (4)"}</div>
<div class="cap">{("<b>capped:</b> " + cap) if cap else "<b>full clean climb</b> — top level (6)"}</div>
<div class="body"><div class="tbl"><table>{rows}</table></div>{shot_html}</div>
<div class="flags">{"".join(flag_bits)}</div>
</div></body></html>"""

View File

@ -5,39 +5,37 @@ YunoHost semantics: **a gap caps the level** — you only earn level L if every
PASS. The first rung that is not a clean PASS (a real FAIL *or* genuinely N/A for this recipe) stops
the climb; `cap_reason` records why. This is deliberately conservative: presentation must NEVER make
a run look greener than its tests (plan §6 cardinal guardrail), so an N/A rung caps just like a fail
— with a recorded reason so the level is *fair*, not inflated.
(the L5 example in §4.1 — "recipes with no integration surface cap at L4 by definition" — is exactly
this: N/A caps, with a recorded reason so the level is *fair*, not inflated).
The ladder is the FOUR essential rungs every recipe is held to:
The ladder (§4.1):
L0 — install failed / app never became healthy.
L1 — Installs: deploys + passes health/readiness.
L2 — Upgrades: previous published version → PR version, stays healthy, data intact.
L3 — Backup/restore: seeded data survives backup → wipe → restore.
L4 — Functional: recipe-specific functional tests pass.
Integration (SSO/OIDC + cross-app) and recipe-local (the recipe repo's own tests/) are **OPTIONAL**
capabilities — they are NOT part of the level ladder and never cap it. They still run when present
(and SSO is still enforced for the run VERDICT via the deps/SSO checks in run_recipe_ci.py), but a
recipe without an SSO surface or without repo-local tests is simply not penalised on the level.
L5 — Integration: SSO/OIDC + cross-app integration tests pass.
L6 — Recipe-local: the recipe repo's own tests/ (D4) pass and are merged.
This module is PURE (no I/O) so it is cheaply unit-testable and the Adversary can re-run the unit
test cold (`cc-ci-run -m pytest tests/unit/test_level.py -q`). The orchestrator
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results into the rung-status
dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
(`run_recipe_ci.py`) is responsible for translating its raw per-tier results + deps/SSO signals into
the rung-status dict this function consumes; that mapping is documented in DECISIONS.md (Phase 3).
Rung status vocabulary (each rung ∈ these three):
"pass" — the rung was exercised and passed.
"fail" — the rung was exercised and failed.
"na" — the rung does not apply to this recipe (e.g. only one published version → no upgrade;
not backup-capable). N/A is NOT a failure, but it DOES cap the climb (with a distinct
cap_reason) so the level never overstates what was actually verified.
not backup-capable; no SSO/integration surface; no recipe-local tests). N/A is NOT a
failure, but it DOES cap the climb (with a distinct cap_reason) so the level never
overstates what was actually verified.
"""
from __future__ import annotations
# The climbable rungs in ascending order. install (L1) is the foundation; L0 means install itself
# did not pass. Each later rung requires every earlier rung to be a clean PASS. These four are the
# ESSENTIAL rungs — integration/recipe-local are optional and deliberately NOT in this tuple.
RUNGS = ("install", "upgrade", "backup_restore", "functional")
# did not pass. Each later rung requires every earlier rung to be a clean PASS.
RUNGS = ("install", "upgrade", "backup_restore", "functional", "integration", "recipe_local")
# Human-readable label per rung level, for cap_reason + the summary card.
RUNG_LABEL = {
@ -45,20 +43,22 @@ RUNG_LABEL = {
2: "upgrade (prev published → PR)",
3: "backup/restore (data integrity)",
4: "functional (recipe-specific tests)",
5: "integration (SSO/OIDC + cross-app)",
6: "recipe-local (recipe repo tests/)",
}
VALID = {"pass", "fail", "na"}
def compute_level(rungs: dict[str, str]) -> tuple[int, str]:
"""Map a rung-status dict → (level 0..4, cap_reason).
"""Map a rung-status dict → (level 0..6, cap_reason).
`rungs` must contain a status in {"pass","fail","na"} for every name in RUNGS. The level is the
highest L such that rungs[1..L] are all "pass"; the first non-"pass" rung caps the climb. L0 is
returned when the install rung itself is not "pass" (install failed / never healthy).
cap_reason explains where the climb stopped:
- "" (empty) when the recipe earned the top rung (L4, full clean climb).
- "" (empty) when the recipe earned the top rung (L6, full clean climb).
- "L<k> <label> FAILED" when a rung was exercised and failed.
- "L<k> <label> N/A" when a rung does not apply to this recipe.
Returns the reason for the FIRST rung that stopped the climb (the binding constraint).

View File

@ -2,14 +2,7 @@
Turns a run's per-tier pytest outcomes into a single `results.json` artifact carrying, per the plan:
{ recipe, version, pr, ref, run_id, finished, stages:[{name,status,tests:[{name,status,ms}]}],
level, level_cap_reason, level_cap_rung, rungs,
skips:{intentional:{rung:reason}, unintentional:[rung]},
flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
`skips` splits the N/A (skipped) rungs by a simple rule: a skip is INTENTIONAL iff the recipe lists
it (with a reason) in `recipe_meta.EXPECTED_NA = {rung: reason}`; any rung skipped but not listed is
UNINTENTIONAL (a coverage gap to fill or declare). Skips still cap the level either way — the harness
never claims a rung it did not verify; this only labels *why* a skip happened.
level, level_cap_reason, rungs, flags:{clean_teardown,no_secret_leak}, screenshot, summary_card }
The per-test breakdown comes from JUnit XML emitted by each tier's pytest invocation (`--junitxml`),
parsed here with the stdlib (no new dep). The integer **level** is computed by harness.level from a
@ -134,24 +127,41 @@ def collect_stages(records: list[dict]) -> list[dict]:
return stages
def _has_repo_local(records: list[dict]) -> bool:
return any(r.get("source") == "repo-local" for r in records)
def _repo_local_passed(records: list[dict]) -> bool:
repo = [r for r in records if r.get("source") == "repo-local"]
return bool(repo) and all(r.get("rc", 1) == 0 for r in repo)
def derive_rungs(
results: dict[str, str],
*,
backup_capable: bool,
declared: list[str] | None,
deps_ready: bool,
sso_unverified: bool,
has_custom: bool,
has_repo_local: bool,
repo_local_passed: bool,
) -> dict[str, str]:
"""Translate the orchestrator's tier results into the rung-status dict harness.level consumes —
the FOUR essential rungs only. Conservative by design — never reports a rung 'pass' it can't
substantiate (cardinal guardrail: presentation never inflates).
"""Translate the orchestrator's tier results + deps/SSO signals into the rung-status dict
harness.level consumes. Documented in DECISIONS.md (Phase 3). Conservative by design — never
reports a rung 'pass' it can't substantiate (cardinal guardrail: presentation never inflates).
L1 install : install tier pass.
L2 upgrade : upgrade tier (skip → N/A: only one published version).
L3 backup/res : backup AND restore tiers pass (N/A if not backup-capable).
L4 functional : recipe-specific functional tests pass — the custom tier. N/A if none ran.
Integration (SSO/OIDC) and recipe-local are OPTIONAL and intentionally NOT rungs here — they
never cap the level (SSO is still enforced for the run VERDICT in run_recipe_ci.py).
L4 functional : the recipe-specific functional (non-deps) tests pass — the custom tier, minus
its SSO/integration tests. N/A if the recipe has no custom tests at all.
L5 integration: SSO/OIDC + cross-app. Applies ONLY if the recipe declares deps (else N/A — the
"no integration surface caps at L4" rule, §4.1). pass iff deps wired
(deps_ready) and not sso_unverified and the custom tier didn't fail.
L6 recipe-loc : the recipe repo's own tests/ (repo-local source) ran and passed (N/A if none).
"""
declared = declared or []
rungs: dict[str, str] = {}
rungs["install"] = level_mod.tier_to_rung(results.get("install"))
rungs["upgrade"] = level_mod.tier_to_rung(results.get("upgrade"))
@ -160,34 +170,36 @@ def derive_rungs(
)
custom = results.get("custom")
# Functional rung (L4): the non-deps custom tests.
if not has_custom or custom == "skip" or custom is None:
rungs["functional"] = "na"
elif custom == "fail":
# A custom test failed. With declared deps we cannot cheaply tell functional-vs-SSO apart, so
# conservatively fail the functional rung (caps at L3) — never inflate.
rungs["functional"] = "fail"
else: # custom == "pass"
rungs["functional"] = "pass"
# Integration rung (L5): only recipes with an SSO/integration surface (declared deps) can climb.
if not declared:
rungs["integration"] = "na"
elif sso_unverified or not deps_ready or custom == "fail":
# SSO not wired/verified, or a custom test failed → integration not verified.
rungs["integration"] = "fail"
elif custom == "pass":
rungs["integration"] = "pass"
else:
# declared deps but no custom tests ran — can't claim integration verified
rungs["integration"] = "na"
# Recipe-local rung (L6).
if not has_repo_local:
rungs["recipe_local"] = "na"
else:
rungs["recipe_local"] = "pass" if repo_local_passed else "fail"
return rungs
def skips(rungs: dict[str, str], expected_na: dict | None) -> dict:
"""Split the SKIPPED (N/A) rungs into intentional vs unintentional (operator model).
A recipe lists the rungs it intentionally skips, each with a reason, in
`recipe_meta.EXPECTED_NA = {rung: reason}`. The rule is dead simple: a skipped rung is
**intentional** iff it is in that list; any rung that is skipped and NOT in the list is
**unintentional** (a coverage gap someone should either fill or declare). N/A still caps the
level either way — the harness never claims a rung it did not verify — this only labels *why* a
skip happened. Returns:
{ "intentional": {rung: reason, ...}, # skipped AND declared in EXPECTED_NA
"unintentional": [rung, ...] } # skipped but NOT declared
"""
expected = {str(k): str(v) for k, v in (expected_na or {}).items()}
na = [r for r, st in rungs.items() if st == "na"]
intentional = {r: expected[r] for r in na if r in expected}
unintentional = sorted(r for r in na if r not in expected)
return {"intentional": intentional, "unintentional": unintentional}
def build_results(
*,
recipe: str,
@ -197,24 +209,30 @@ def build_results(
records: list[dict],
results: dict[str, str],
backup_capable: bool,
declared: list[str] | None,
deps_ready: bool,
sso_unverified: bool,
clean_teardown: bool,
no_secret_leak: bool,
finished_ts: float | None,
screenshot: str | None = None,
summary_card: str | None = None,
expected_na: dict | None = None,
) -> dict:
"""Assemble the full results.json dict (no I/O). `finished_ts` is passed in (the orchestrator
stamps it) so this stays pure and deterministic for unit tests. `expected_na` is the recipe's
declared intentional-skip map (recipe_meta.EXPECTED_NA) used to distinguish a deliberate skip from
accidentally-missing coverage."""
stamps it) so this stays pure and deterministic for unit tests."""
stages = collect_stages(records)
has_custom = any(r["tier"] == "custom" for r in records)
rungs = derive_rungs(results, backup_capable=backup_capable, has_custom=has_custom)
rungs = derive_rungs(
results,
backup_capable=backup_capable,
declared=declared,
deps_ready=deps_ready,
sso_unverified=sso_unverified,
has_custom=has_custom,
has_repo_local=_has_repo_local(records),
repo_local_passed=_repo_local_passed(records),
)
lvl, cap_reason = level_mod.compute_level(rungs)
# The rung that capped the climb (lowest non-pass), or None on a full climb — lets a consumer
# (card/badge) tell whether the cap was an intentional skip, an unintentional one, or a failure.
capped = level_mod.RUNGS[lvl] if cap_reason else None
return {
"schema": 1,
"run_id": run_id(),
@ -225,9 +243,7 @@ def build_results(
"finished": finished_ts,
"level": lvl,
"level_cap_reason": cap_reason,
"level_cap_rung": capped,
"rungs": rungs,
"skips": skips(rungs, expected_na),
"stages": stages,
"results": results,
"flags": {

View File

@ -200,7 +200,6 @@ def _load_meta(recipe: str) -> dict:
for k in list(meta) + [
"BACKUP_CAPABLE",
"SKIP_GENERIC",
"EXPECTED_NA",
"OIDC_AT_INSTALL",
"READY_PROBE",
"UPGRADE_BASE_VERSION",
@ -1225,6 +1224,7 @@ def main() -> int:
# a failure here NEVER changes `overall` (R7 — cosmetics never block the pipeline). ----
data: dict | None = None
try:
sso_unverified = sso_dep_unverified(declared, deps_ready, requires_deps_skipped)
clean_teardown = (deploy_count == expected_deploy_count) and not dep_teardown_error
data = results_mod.build_results(
recipe=recipe,
@ -1234,11 +1234,13 @@ def main() -> int:
records=records,
results=results,
backup_capable=backup_cap,
declared=declared,
deps_ready=deps_ready,
sso_unverified=sso_unverified,
clean_teardown=clean_teardown,
no_secret_leak=True, # narrowed below by an actual scan of the serialised artifact
screenshot=screenshot_rel, # Phase 3 U1 (R4): relative PNG name iff capture succeeded
finished_ts=time.time(),
expected_na=meta.get("EXPECTED_NA"), # declared intentional-skip map (recipe_meta)
)
# Real (if narrow) leak check: no known infra-secret value may appear in the artifact (R7).
blob = json.dumps(data)
@ -1255,15 +1257,6 @@ def main() -> int:
f"{'' + data['level_cap_reason'] if data['level_cap_reason'] else ''})",
flush=True,
)
# Surface UNINTENTIONAL skips in the CI log (non-blocking, R7): a rung that was skipped (N/A)
# but is not in the recipe's intentional list — either add the missing coverage or declare it.
for rung in data.get("skips", {}).get("unintentional", []):
print(
f"⚠ coverage: rung '{rung}' was skipped (N/A) but is not declared intentional — add "
f"the missing test/label, or list it in tests/{recipe}/recipe_meta.py "
f"EXPECTED_NA = {{'{rung}': '<why>'}}.",
flush=True,
)
except Exception as e: # noqa: BLE001 — results assembly is cosmetic; never fail a run on it (R7)
print(
f"!! results.json assembly failed (non-fatal, verdict unaffected): {_scrub(str(e))}",
@ -1282,19 +1275,8 @@ def main() -> int:
with open(html_path, "w", encoding="utf-8") as f:
f.write(card_mod.render_card_html(data, screenshot_rel=data.get("screenshot")))
png = card_mod.render_card_png(html_path, os.path.join(run_artifact_dir, "summary.png"))
capped = data.get("level_cap_rung")
sk = data.get("skips", {})
cap_skip = (
"intentional" if capped in (sk.get("intentional") or {})
else "unintentional" if capped in (sk.get("unintentional") or [])
else ""
)
with open(os.path.join(run_artifact_dir, "badge.svg"), "w", encoding="utf-8") as f:
f.write(
card_mod.level_badge_svg(
data["level"], data.get("level_cap_reason", ""), cap_skip
)
)
f.write(card_mod.level_badge_svg(data["level"], data.get("level_cap_reason", "")))
print(
f"summary card {'rendered ' + png if png else '(PNG render unavailable)'} + "
f"badge.svg written into {run_artifact_dir}",

View File

@ -1,19 +0,0 @@
"""custom-html-bkp-bad — lifecycle ops for bad-backup/bad-restore RED canaries.
Intentionally has NO pre_backup hook: the marker is never seeded before backup,
so the backup snapshot has no ci-marker.txt. pre_restore writes "mutated" so that if
restore DOES bring back the snapshot, the marker is gone/still-mutated → test fails.
"""
from __future__ import annotations
from harness import lifecycle
MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
def pre_restore(domain: str, meta: dict) -> None:
"""Write 'mutated' to the marker before restore runs. If restore brings back the
snapshot (which has no marker — never seeded by pre_backup), the marker ends up
MISSING or 'mutated' after restore → test_restore_returns_state FAILS → restore=RED."""
lifecycle.exec_in_app(domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])

View File

@ -1,5 +0,0 @@
# custom-html-bkp-bad — regression fixture for bad-backup canary.
# This recipe is custom-html WITHOUT backupbot labels. Setting BACKUP_CAPABLE=True here forces the
# harness to run the backup tier; the recipe itself has no backupbot service, so
# `abra app backup create` produces no snapshot → test_backup_artifact fails → backup tier RED.
BACKUP_CAPABLE = True

View File

@ -1,28 +0,0 @@
"""custom-html-bkp-bad — BACKUP assertion (bad-backup RED canary).
This recipe has no ops.py::pre_backup, so ci-marker.txt is NEVER seeded before the backup.
Asserting its presence here causes backup tier RED — proving the server catches a recipe that
claims backup support but doesn't actually back up the expected data.
"""
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle # noqa: E402
MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
def test_backup_captures_state(live_app):
"""Assert the pre-backup marker is present and equals 'original'.
Since custom-html-bkp-bad has no ops.py::pre_backup to seed the marker, this file does NOT
exist at backup time — exec_in_app returns empty or raises → assertion fails → backup tier RED.
This models a recipe that declares backup capability but omits the data-seeding hook."""
result = lifecycle.exec_in_app(live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]).strip()
assert result == "original", (
f"backup did not capture the expected marker at {MARKER_PATH}: got {result!r}. "
"Expected 'original' (seeded by pre_backup). If the marker is 'MISSING', the pre_backup "
"hook was not run — this is the intended failure for the bad-backup RED canary."
)

View File

@ -1,25 +0,0 @@
"""custom-html-bkp-bad — RESTORE assertion (bad-restore RED canary).
pre_restore seeds 'mutated' to ci-marker.txt. The backup snapshot has no ci-marker.txt
(never seeded by pre_backup). After restore, the marker is either MISSING or 'mutated'
never 'original' — so this assertion FAILS → restore tier RED.
"""
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle # noqa: E402
MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
def test_restore_returns_state(live_app):
result = lifecycle.exec_in_app(
live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]
).strip()
assert result == "original", (
f"restore did not return the pre-mutation (backed-up) state: got {result!r}. "
"Expected 'original'. The backup had no marker (not seeded by pre_backup), so "
"restore cannot recover it — this is the intended failure for the bad-restore RED canary."
)

View File

@ -1,15 +0,0 @@
"""custom-html-rst-bad — lifecycle ops for bad-restore RED canary.
NO pre_backup hook: marker never seeded before backup → snapshot has no ci-marker.txt.
pre_restore writes "mutated". After restore, marker stays "mutated" (not in snapshot) → FAIL.
"""
from __future__ import annotations
from harness import lifecycle
MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
def pre_restore(domain: str, meta: dict) -> None:
lifecycle.exec_in_app(domain, ["sh", "-c", f"echo mutated > {MARKER_PATH}"])

View File

@ -1,3 +0,0 @@
# custom-html-rst-bad — regression fixture for bad-restore canary.
# BACKUP_CAPABLE=True forces the backup tier to run even though the recipe has no backupbot label.
BACKUP_CAPABLE = True

View File

@ -1,23 +0,0 @@
"""custom-html-rst-bad — RESTORE assertion (bad-restore RED canary).
No pre_backup → backup snapshot has no ci-marker.txt. pre_restore writes "mutated".
After restore: marker is "mutated" (restore can't recover "original" — wasn't backed up) → FAIL.
"""
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner"))
from harness import lifecycle # noqa: E402
MARKER_PATH = "/usr/share/nginx/html/ci-marker.txt"
def test_restore_returns_state(live_app):
result = lifecycle.exec_in_app(
live_app, ["sh", "-c", f"cat {MARKER_PATH} 2>/dev/null || echo MISSING"]
).strip()
assert result == "original", (
f"restore did not return the pre-mutation (backed-up) state: got {result!r}. "
"Expected 'original'. The backup had no marker, so restore cannot recover it."
)

View File

@ -1,87 +0,0 @@
"""custom-html-tiny — recipe-specific functional test (static-web-server).
Proves the deployed static-web-server is *actually serving files from its `content` volume* with real
file-server semantics, not merely returning 200 from a Traefik fallback or a generic stub:
1. exact-byte round-trip — write a uniquely-named file with random content into the served volume,
fetch it over HTTPS, and assert the bytes come back verbatim. Non-vacuous: the content is random
per run, so only a server that reads this file off the volume can pass.
2. real 404 — a random non-existent path returns 404, proving directory/file semantics (a
200-everything stub or mis-routed host would not 404).
The recipe's image (joseluisq/static-web-server) is shell-less (scratch-based) and its content volume
is seeded via the install_steps.sh host-mountpoint mechanism — so this test writes its probe file the
same way (resolve the swarm volume's mountpoint with `docker volume inspect`, write directly) rather
than `docker exec`-ing in a container that has no shell.
Runs in the custom tier against the shared post-install deployment (the `live_app` fixture is its
per-run domain). Mirrors install_steps.sh: the app's content volume is named `<stack>_content`, where
`stack` is the domain with dots replaced by underscores; HTTP_SUBDIR is empty, so the volume root is
served at `/`.
"""
from __future__ import annotations
import contextlib
import os
import ssl
import subprocess
import urllib.error
import urllib.request
import uuid
def _served_dir(domain: str) -> str:
"""Host mountpoint of the app's served `content` volume (same naming as install_steps.sh)."""
vol = f"{domain.replace('.', '_')}_content"
out = subprocess.run(
["docker", "volume", "inspect", vol, "--format", "{{.Mountpoint}}"],
capture_output=True,
text=True,
check=True,
)
mountpoint = out.stdout.strip()
assert mountpoint, f"could not resolve mountpoint for volume {vol!r}"
return mountpoint
def _get(url: str) -> tuple[int, bytes]:
"""GET the URL; return (status, body). A 4xx/5xx is returned, not raised (we assert on the code).
TLS verification is relaxed: the served wildcard cert is validated separately by the infra check;
here we care only about the app's response."""
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
try:
with urllib.request.urlopen(url, timeout=20, context=ctx) as resp:
return resp.status, resp.read()
except urllib.error.HTTPError as e:
return e.code, e.read()
def test_static_file_roundtrip_and_404(live_app):
"""Write a random file into the served volume → fetch it → bytes match; and a missing path 404s."""
served = _served_dir(live_app)
token = uuid.uuid4().hex
name = f"ccci-probe-{token}.txt"
body = f"cc-ci-functional-{token}\n".encode()
path = os.path.join(served, name)
with open(path, "wb") as fh:
fh.write(body)
try:
status, got = _get(f"https://{live_app}/{name}")
assert status == 200, f"served probe file returned {status} (expected 200)"
assert got == body, (
f"content round-trip mismatch: served {got!r}, wrote {body!r} "
"(static-web-server not serving the content volume?)"
)
# A random non-existent path must 404 — proves real static-file semantics, distinguishing a
# working server from a 200-everything stub or a mis-routed Traefik fallback.
miss_status, _ = _get(f"https://{live_app}/ccci-missing-{uuid.uuid4().hex}.txt")
assert miss_status == 404, (
f"missing path returned {miss_status} (expected 404 — generic 200-returner / mis-route?)"
)
finally:
with contextlib.suppress(OSError):
os.remove(path)

View File

@ -3,14 +3,3 @@
# (DG5) is detected quickly instead of waiting the default 300s HTTP timeout.
DEPLOY_TIMEOUT = 120
HTTP_TIMEOUT = 90
# Rungs this recipe INTENTIONALLY skips, each with a reason. Any essential rung skipped (N/A) and NOT
# listed here is reported as an *unintentional* skip (a coverage gap to fill or declare). A skip still
# caps the level either way — the harness never claims a rung it did not verify; this only records
# that the skip is deliberate. (The level ladder is the four essential rungs install/upgrade/
# backup_restore/functional; integration + recipe-local are optional and not leveled.)
# custom-html-tiny is a stateless static-web-server, so it has no backup surface:
EXPECTED_NA = {
"backup_restore": "stateless static file server: serves an ephemeral content volume seeded at "
"deploy, with no persistent/user data to back up or restore (no backupbot.backup label)",
}

View File

@ -52,10 +52,13 @@ def test_content_type_html_and_txt(live_app):
ct_html = h_html.get("content-type", "")
ct_txt = h_txt.get("content-type", "")
# nginx default: "text/html" for .html and "text/plain" for .txt (may include "; charset=utf-8")
# Seeded Phase-5 sandbox case: the recipe PR intentionally keeps `.html` as `text/html` but serves
# `.txt` as `application/octet-stream`. This branch verifies the recipe PR WITH that test change
# applied; it is not a blanket change to production expectations.
assert ct_html.startswith("text/html"), (
f"{html_name} Content-Type={ct_html!r}, expected text/html (nginx MIME config broken?)"
)
assert ct_txt.startswith("text/plain"), (
f"{txt_name} Content-Type={ct_txt!r}, expected text/plain (nginx MIME config broken?)"
assert ct_txt.startswith("application/octet-stream"), (
f"{txt_name} Content-Type={ct_txt!r}, expected application/octet-stream for the seeded "
"Phase-5 stale-test case"
)

View File

@ -1,37 +0,0 @@
# Parity — hedgedoc
HedgeDoc (formerly CodiMD) is a collaborative real-time markdown editor. It is a single-service
app backed by sqlite (default) or PostgreSQL, with a Node.js backend on port 3000.
The upstream recipe-maintainer corpus (`recipe-info/hedgedoc/tests/`) does not exist, so this
PARITY.md documents the cc-ci-authored suite as the baseline.
## Recipe-specific tests (Phase mirror, ≥2 functional tests)
HedgeDoc's defining behaviors:
- Root path (`/`) responds 200 or 302 (redirect to `/login` or `/new` depending on auth config).
- Served HTML contains HedgeDoc/CodiMD branding markers + bundled JS/CSS assets.
| cc-ci file | what's verified | rationale |
|---|---|---|
| `tests/hedgedoc/functional/test_health_check.py` | `GET /` → 200 or 302 | Proves the app is up and routing through Traefik. A wedged HedgeDoc returns 5xx or no response. |
| `tests/hedgedoc/functional/test_branding.py` | `GET /` HTML contains hedgedoc/codimd/hackmd markers OR bundle asset refs | Distinguishes "HedgeDoc is serving its own content" from "fallback page." A misrouted or empty backend lacks these markers. |
## Backup data-integrity
The default compose.yml includes `backupbot.backup=${ENABLE_BACKUPS:-true}`. HedgeDoc stores data
in `codimd_database` (sqlite) and `codimd_uploads` volumes. The generic backup tier verifies a
snapshot artifact is produced. Recipe-specific backup data-integrity overlay (ops.py +
test_backup.py) is deferred; the generic tier suffices for initial enrollment.
## Playwright
Not yet authored. A Playwright flow would create an anonymous note, assert the content persists,
and verify the collaborative editor loads. Deferred — the current functional tests plus the
generic Playwright `assert_serving` pass the enrollment bar.
## Deferred
- Playwright note-creation + persistence flow
- ops.py pre_backup/pre_restore with note content verification
- PostgreSQL variant (`compose.postgresql.yml`) — current tests target sqlite (default)

View File

@ -1,54 +0,0 @@
"""hedgedoc — branding probe: served HTML carries hedgedoc/codimd markers.
Distinguishes "the HedgeDoc app is bound and serving its own content" from "a generic 200
from a fallback page." A wedged backend or misconfigured proxy would lack these markers.
"""
from __future__ import annotations
import os
import ssl
import sys
import urllib.request
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
from harness import http as harness_http # noqa: E402
_CTX = ssl.create_default_context()
_CTX.check_hostname = False
_CTX.verify_mode = ssl.CERT_NONE
def _get_body(url: str) -> tuple[int, str]:
req = urllib.request.Request(url, method="GET")
with urllib.request.urlopen(req, timeout=15, context=_CTX) as r:
return r.status, r.read().decode(errors="replace")
def test_hedgedoc_has_branding(live_app):
"""GET /; assert HedgeDoc-specific brand/asset markers in served HTML."""
url = f"https://{live_app}/"
def _ready():
try:
status, body = _get_body(url)
except Exception: # noqa: BLE001
return None
# 200 = full page; 302 = redirect (follow manually not needed — just the HTML response)
return body if status in (200, 302) else None
body = harness_http.assert_converges(_ready, f"GET {url}", max_wait=90, interval=5)
lower = body.lower()
# HedgeDoc brand markers: any of "hedgedoc", "codimd" (the older brand), or the app meta tag
brand_markers = ("hedgedoc", "codimd", "hackmd")
present_brand = [m for m in brand_markers if m in lower]
# SPA asset markers: CSS/JS bundles or the favicon that HedgeDoc serves
asset_markers = ("/assets/", "/vendor.", "favicon", "bundle.", ".js")
present_assets = [m for m in asset_markers if m in body]
assert present_brand or present_assets, (
f"GET {url} HTML contains none of {brand_markers} or {asset_markers}. "
f"Excerpt: {body[:300]!r}"
)

View File

@ -1,21 +0,0 @@
"""hedgedoc — health check: root path responds (200 or 302 to login/new).
HedgeDoc may redirect / to /login or /new depending on auth config; either is healthy.
"""
from __future__ import annotations
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "..", "runner"))
from harness import http as harness_http # noqa: E402
def test_hedgedoc_root_serves(live_app):
"""GET / → 200 or 302 (login/new redirect)."""
url = f"https://{live_app}/"
status, _ = harness_http.retry_http_get(
url, expect_status=(200, 302), max_wait=90, interval=5
)
assert status in (200, 302), f"GET {url} HTTP {status} (expected 200 or 302)"

View File

@ -1,6 +0,0 @@
# Per-recipe harness config for hedgedoc (Phase mirror — simple sqlite collaborative markdown editor).
# HedgeDoc serves on port 3000 via Traefik. Root path returns 200 or redirects to /login or /new.
HEALTH_PATH = "/"
HEALTH_OK = (200, 302)
DEPLOY_TIMEOUT = 600
HTTP_TIMEOUT = 300

View File

@ -1,136 +0,0 @@
# Regression canaries — E2E self-tests for the cc-ci server
A standing pytest suite that drives the **real** cc-ci lifecycle harness against pinned canary
recipes and verifies both halves of the server's job:
1. **Good canaries** — healthy apps are reported GREEN (install + upgrade + backup/restore pass).
2. **Bad canary** — broken apps are caught RED; a false-green makes the regression test itself fail.
These tests run the full cold lifecycle on the live cc-ci server. They are **slow** (minutes per
canary) and **opt-in** — kept out of the per-commit fast path by the `canary` marker.
---
## How to run
Run on the cc-ci server (abra + Docker + Swarm required):
```bash
ssh cc-ci
cd /root/cc-ci # or wherever the repo is checked out
cc-ci-run python -m pytest tests/regression/ -m canary -v
```
Or a single canary:
```bash
cc-ci-run python -m pytest tests/regression/ -m canary -k good-simple -v
```
From the orchestrator:
```bash
ssh cc-ci "cd /root/cc-ci && cc-ci-run python -m pytest tests/regression/ -m canary -v"
```
---
## Canaries
| ID | Recipe | Purpose | Expected verdict |
|----|--------|---------|-----------------|
| `good-simple` | `custom-html-tiny` | Minimal static server — fast signal | GREEN |
| `good-significant` | `lasuite-docs` | Multi-service (backend + Postgres + Collabora + OIDC) | GREEN |
| `bad-false-green` | `custom-html` @ `v5-stale-docroot` | App is UP but serves wrong Content-Type — catches false-green | RED |
### Why the bad canary exists
The scariest regression is a **false-green**: the server reports PASS while the app is broken.
We already saw a fabricated full-PASS during the build. The `bad-false-green` canary pins a known-
broken fixture (`v5-stale-docroot`: nginx serves `.txt` as `application/octet-stream`). The
harness's `test_content_type_html_and_txt` catches this and returns RED (build #75 was RED for
exactly this fixture).
The regression test asserts `rc != 0`. If the harness ever wrongly returns green for this fixture,
that assert fires — false-green is caught before any merge.
---
## What each canary verifies
### Per-tier semantic assertions (the "teeth")
The tests assert MORE than the harness exit code: they check that **specific named assertions**
ran and got the expected result. This guards against a different failure mode — a tier that
nominally "passes" because the assertion was silently removed or made vacuous.
| Stage | Test name | What it proves |
|-------|-----------|---------------|
| install | `test_serving` | Generic HTTP readiness check actually ran |
| install | `test_serving_and_frontend` | Lasuite-docs frontend (SPA shell) actually loaded |
| custom | `test_content_type` | Content-type assertion actually ran (bad canary only) |
If a tier assertion is removed: the named test disappears from `results.json` → the semantic
check fires → the regression suite catches the removal.
### Additional structural assertions (good canaries)
- `install` tier: "pass" (not fail, not skip)
- No tier is "fail" (skips acceptable for recipes without backup/custom tests)
- `flags.clean_teardown = True` (no leftover containers/volumes/secrets)
- `flags.no_secret_leak = True` (no secret value in the results artifact)
---
## Cadence policy
**Do NOT run on every commit or PR.** These are slow and resource-heavy. Run them:
- Before a **release** of the cc-ci server (after a batch of server changes).
- As a **polishing pass** or pre-merge check for significant server refactors.
- On-demand when you suspect a regression: `pytest -m canary`.
They are NOT wired to the per-commit Drone pipeline. If adding a `!testme`-style trigger for the
cc-ci repo, gate it behind a deliberate label (e.g. `run-canaries`) — not an automatic run on
every push.
---
## How to add a canary
1. Identify a recipe that is already deployable and has pinned version tags.
2. Decide the expected verdict (GREEN or RED) and which tier assertions have teeth.
3. Add an entry to `CANARIES` in `test_canaries.py`:
```python
{
"id": "good-myrecipe",
"recipe": "my-recipe",
"src": "recipe-maintainers/my-recipe",
"ref": "<pinned-sha>", # pin to a specific commit for stability
"expected_green": True,
"stage_pass_checks": [
("install", "test_serving"), # verify this named test ran and passed
],
"stage_fail_checks": [],
}
```
4. Run the canary once to confirm it passes:
`cc-ci-run python -m pytest tests/regression/ -m canary -k good-myrecipe -v`
5. Update the pin comment with the date and the recipe version it was pinned at.
---
## Pin maintenance
Canary refs are pinned to specific SHAs for stability. When a recipe publishes a new release:
1. Update the `"ref"` SHA in the canary definition (use the new main-branch HEAD).
2. Update the pin comment with the new date/version.
3. Re-run the canary to confirm GREEN before committing the pin update.
The bad canary (`v5-stale-docroot`) is a stable fixture branch — update only if the branch is
deleted. If deleted, recreate the pattern: an app that is up + passes lifecycle tiers but fails
one functional assertion.

View File

@ -1,106 +0,0 @@
"""Shared fixtures and helpers for E2E canary regression tests.
The regression tests call the real cc-ci harness (run_recipe_ci.py) as a subprocess and assert on
its outputs (exit code, results.json). They run ON the cc-ci server, not the orchestrator — abra,
Docker, and Swarm must be present.
Invoke: cc-ci-run python -m pytest tests/regression/ -m canary -v
"""
from __future__ import annotations
import json
import os
import subprocess
import sys
import time
ROOT = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
def pytest_configure(config):
config.addinivalue_line(
"markers",
"canary: slow E2E canary test — drives the full cold CI lifecycle; run on-demand only.",
)
config.addinivalue_line(
"markers",
"canary_fast: fast per-tier RED canary (still tagged canary); subset for quick pre-merge checks.",
)
def run_recipe_ci(
recipe: str,
src: str,
ref: str,
pr: str = "0",
stages: str = "install,upgrade,backup,restore,custom",
runs_dir: str | None = None,
run_id_prefix: str = "regression",
timeout: int = 3600,
) -> tuple[int, dict | None, str]:
"""Invoke run_recipe_ci.py with the given canary params.
Returns (rc, results_dict_or_None, run_artifact_dir).
Stdout/stderr stream live so a human can follow progress.
"""
ts = int(time.time())
run_id = f"{run_id_prefix}-{recipe}-{ref[:12]}-{ts}"
if runs_dir is None:
runs_dir = "/var/lib/cc-ci-runs"
env = dict(os.environ)
env.update(
{
"RECIPE": recipe,
"REF": ref,
"SRC": src,
"PR": pr,
"STAGES": stages,
"CCCI_RUN_ID": run_id,
"CCCI_RUNS_DIR": runs_dir,
"HOME": "/root",
}
)
# Keep PLAYWRIGHT env from the outer cc-ci-run wrapper (already in os.environ if running under it)
script = os.path.join(ROOT, "runner", "run_recipe_ci.py")
result = subprocess.run(
[sys.executable, script],
env=env,
timeout=timeout,
)
rc = result.returncode
artifact_dir = os.path.join(runs_dir, run_id)
results_path = os.path.join(artifact_dir, "results.json")
results_data: dict | None = None
if os.path.exists(results_path):
with open(results_path) as f:
results_data = json.load(f)
return rc, results_data, artifact_dir
def find_stage_tests(results: dict, stage_name: str) -> list[dict]:
"""Return the per-test list for a named stage from results.json, or []."""
for stage in results.get("stages", []):
if stage.get("name") == stage_name:
return stage.get("tests", [])
return []
def stage_has_passing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
"""True if the named stage contains a passing test whose name includes test_name_substr."""
for t in find_stage_tests(results, stage_name):
if test_name_substr in t.get("name", "") and t.get("status") == "pass":
return True
return False
def stage_has_failing_test(results: dict, stage_name: str, test_name_substr: str) -> bool:
"""True if the named stage contains a failing test whose name includes test_name_substr."""
for t in find_stage_tests(results, stage_name):
if test_name_substr in t.get("name", "") and t.get("status") in ("fail", "error"):
return True
return False

View File

@ -1,344 +0,0 @@
"""E2E canary regression tests — the server's standing self-test suite.
Seven canaries prove both halves of the server's job:
1. GREEN canaries — good apps are reported healthy (install+upgrade+backup/restore pass).
2. RED canaries — broken apps are caught at the intended tier; a false-green makes THIS test fail.
Fast subset (@pytest.mark.canary_fast): the four per-tier RED canaries on custom-html-tiny — fast
because the recipe deploys in seconds. Run with `-m canary_fast` as a pre-merge quick check.
Full suite (-m canary): includes good-significant (lasuite-docs, 10-20 min).
Run: cc-ci-run python -m pytest tests/regression/ -m canary -v
Pin policy: canary refs are pinned to specific SHAs. Update only after confirming the new ref gives
the expected verdict.
"""
from __future__ import annotations
import os
import sys
import pytest
sys.path.insert(0, os.path.dirname(__file__))
import conftest as _reg # noqa: E402
run_recipe_ci = _reg.run_recipe_ci
stage_has_passing_test = _reg.stage_has_passing_test
stage_has_failing_test = _reg.stage_has_failing_test
# ---------------------------------------------------------------------------
# Canary definitions
# ---------------------------------------------------------------------------
# Good canary 1: minimal static-file server — fast signal, few deps.
_SIMPLE = {
"id": "good-simple",
"recipe": "custom-html-tiny",
"src": "recipe-maintainers/custom-html-tiny",
# Pin: main @ 2026-06-02 — update if the recipe publishes a new release and pin goes stale.
"ref": "435df8fc98ef7598084fcffcd6225470eca80053",
"expected_green": True,
# Named tests that MUST appear with "pass" in the result — these are the semantic teeth.
# If the generic install assertion is removed/vacated, test_serving disappears → this fails.
"stage_pass_checks": [
("install", "test_serving"),
],
"stage_fail_checks": [],
}
# Good canary 2: multi-service stack — backend + Postgres + Collabora WOPI + OIDC.
# Exercises real breadth. Slowest canary (~10-20 min full lifecycle).
_SIGNIFICANT = {
"id": "good-significant",
"recipe": "lasuite-docs",
"src": "recipe-maintainers/lasuite-docs",
# Pin: main @ 2026-06-02
"ref": "290a8ad72d06232f0b3f302d976af14bef0f3c53",
"expected_green": True,
"stage_pass_checks": [
("install", "test_serving_and_frontend"),
],
"stage_fail_checks": [],
}
# Bad canary: app is UP + passes all lifecycle tiers but the custom functional assertion detects a
# semantic defect (wrong Content-Type for .txt files). The harness MUST report RED.
# If the harness wrongly returns green for this fixture, assert rc != 0 fails → false-green caught.
_BAD = {
"id": "bad-false-green",
"recipe": "custom-html",
"src": "recipe-maintainers/custom-html",
# Pin: v5-stale-docroot @ 71e7326 — serves .txt as application/octet-stream; build #75 was RED.
# Recreate pattern if branch disappears: app up + passes lifecycle, fails one content assertion.
"ref": "71e7326a99bbb69035a046fba8fa51859ca66115",
"expected_green": False,
# The specific test that must have FAILED, proving the content-type assertion has teeth.
# If the assertion is vacated and the test disappears, stage_has_failing_test() returns False
# → the assert below fails → we detect that the guard was removed.
"stage_pass_checks": [],
"stage_fail_checks": [
("custom", "test_content_type"),
],
}
# ---------------------------------------------------------------------------
# Per-tier RED canaries (fast subset: @pytest.mark.canary_fast)
# Prove the server catches failure at EVERY lifecycle tier — false-green at any tier is caught.
# Each uses custom-html-tiny (deploys in seconds) or custom-html (fast nginx, has backup support).
# ---------------------------------------------------------------------------
# Shared bad-image branch: deploy fails at prepull because the image doesn't exist on Docker Hub.
# Used for install-RED (STAGES=install → chaos of HEAD with bad image → install=fail)
# and upgrade-RED (STAGES=install,upgrade → prev-version install passes, upgrade chaos fails).
_BAD_IMAGE_REF = "4ae8866100563204d40435c5aba00374aa5a8ed3" # regression-bad-image @ 2026-06-02
_BAD_INSTALL = {
"id": "bad-install",
"recipe": "custom-html-tiny",
"src": "recipe-maintainers/custom-html-tiny",
"ref": _BAD_IMAGE_REF,
"expected_green": False,
# STAGES=install only → no upgrade tier → prev=None → chaos deploy of HEAD (bad image) → fails.
"stages": "install",
# Assertions: install must be the failing tier.
"failing_tier": "install",
"passing_tiers_before": [],
"stage_pass_checks": [],
"stage_fail_checks": [],
}
_BAD_UPGRADE = {
"id": "bad-upgrade",
"recipe": "custom-html-tiny",
"src": "recipe-maintainers/custom-html-tiny",
"ref": _BAD_IMAGE_REF,
"expected_green": False,
# Default stages → prev-version deploy (good image) → install=PASS; upgrade chaos (bad image) → FAIL.
"stages": "install,upgrade,custom",
"failing_tier": "upgrade",
"passing_tiers_before": ["install"],
"stage_pass_checks": [],
"stage_fail_checks": [],
}
_BAD_BACKUP = {
"id": "bad-backup",
"recipe": "custom-html-bkp-bad",
"src": "recipe-maintainers/custom-html-bkp-bad",
# Pin: custom-html-bkp-bad main @ 2026-06-02 — custom-html WITHOUT backupbot labels.
# cc-ci recipe_meta sets BACKUP_CAPABLE=True → harness runs backup tier.
# No backupbot.backup=true label → backup-bot-two finds no containers → no snapshot.
# parse_snapshot_id returns None → test_backup_artifact fails → backup tier RED.
"ref": "b6fe99de41601f9e51bc7ea5b6072f0c3f56cdc3",
"expected_green": False,
"stages": "install,upgrade,backup",
"failing_tier": "backup",
"passing_tiers_before": ["install"],
"stage_pass_checks": [],
"stage_fail_checks": [],
}
_BAD_RESTORE = {
"id": "bad-restore",
"recipe": "custom-html-rst-bad",
"src": "recipe-maintainers/custom-html-rst-bad",
# Pin: custom-html-rst-bad main @ 2026-06-02 (9a73a184).
# No pre_backup hook → backup snapshot has no ci-marker.txt.
# pre_restore writes "mutated". After restore: marker stays "mutated" → FAIL → restore=RED.
# install+backup PASS (no test_backup.py in cc-ci dir); upgrade=skip (no version tags).
"ref": "9a73a184e739691bc6a621a5f1e6efc799743c5b",
"expected_green": False,
"stages": "install,backup,restore,custom",
"failing_tier": "restore",
"passing_tiers_before": ["install", "backup"],
"stage_pass_checks": [],
"stage_fail_checks": [
("restore", "test_restore_returns_state"),
],
}
CANARIES = [_SIMPLE, _SIGNIFICANT, _BAD]
CANARIES_FAST = [_BAD_INSTALL, _BAD_UPGRADE, _BAD_BACKUP, _BAD_RESTORE]
# ---------------------------------------------------------------------------
# Tests
# ---------------------------------------------------------------------------
@pytest.mark.canary
@pytest.mark.parametrize("canary", CANARIES, ids=[c["id"] for c in CANARIES])
def test_canary(canary, tmp_path):
"""Drive the full cold CI lifecycle for a canary recipe and verify the outcome.
For GREEN canaries: proves the harness correctly reports a healthy app as healthy, and that
the per-tier semantic assertions actually ran (not vacuous).
For the RED canary: proves the harness catches a broken app — if the harness wrongly returned
green, `assert rc != 0` fails, catching the false-green.
"""
stages = canary.get("stages", "install,upgrade,backup,restore,custom")
rc, results, artifact_dir = run_recipe_ci(
recipe=canary["recipe"],
src=canary["src"],
ref=canary["ref"],
runs_dir=str(tmp_path),
stages=stages,
)
_note = f"artifact_dir={artifact_dir}" # visible in -v output via assert messages
if canary["expected_green"]:
_assert_green(rc, results, canary, _note)
else:
_assert_red(rc, results, canary, _note)
@pytest.mark.canary
@pytest.mark.canary_fast
@pytest.mark.parametrize("canary", CANARIES_FAST, ids=[c["id"] for c in CANARIES_FAST])
def test_canary_fast(canary, tmp_path):
"""Fast per-tier RED canaries: each proves the server catches failure at a specific lifecycle tier.
Each canary is broken at exactly one tier; the test asserts:
- Overall verdict: RED (rc != 0)
- The intended failing tier has status "fail"
- Tiers BEFORE the intended failure have status "pass" (proving tier-specific detection, not
"fails somewhere")
These use fast recipes (custom-html-tiny deploys in seconds, custom-html is similarly fast)
and are intended as a pre-merge quick check alongside the full slow suite.
"""
stages = canary.get("stages", "install,upgrade,backup,restore,custom")
rc, results, artifact_dir = run_recipe_ci(
recipe=canary["recipe"],
src=canary["src"],
ref=canary["ref"],
runs_dir=str(tmp_path),
stages=stages,
)
_note = f"artifact_dir={artifact_dir}"
_assert_red_at_tier(rc, results, canary, _note)
def _assert_green(rc: int, results: dict | None, canary: dict, note: str) -> None:
"""Assert a good-canary run is GREEN with real semantic assertions."""
# 1. Harness exit code must be 0 (GREEN).
assert rc == 0, f"[{canary['id']}] harness returned non-zero rc={rc} — expected GREEN. {note}"
assert (
results is not None
), f"[{canary['id']}] results.json not written — harness may have crashed. {note}"
# 2. Install tier must have passed.
assert results.get("results", {}).get("install") == "pass", (
f"[{canary['id']}] install tier did not pass: " f"results={results.get('results')}. {note}"
)
# 3. No tier may have FAILED (skips are acceptable for recipes without backup or custom tests).
failed_tiers = [t for t, s in results.get("results", {}).items() if s == "fail"]
assert not failed_tiers, f"[{canary['id']}] tiers failed: {failed_tiers}. {note}"
# 4. Teardown must be clean (no leftover containers/volumes/secrets).
assert (
results.get("flags", {}).get("clean_teardown") is True
), f"[{canary['id']}] clean_teardown=False — residual state left on server. {note}"
# 5. No secret values leaked into the results artifact.
assert (
results.get("flags", {}).get("no_secret_leak") is True
), f"[{canary['id']}] no_secret_leak=False — a secret value appeared in results.json. {note}"
# 6. Semantic stage assertions — TEETH CHECK.
# These verify that specific named tests actually ran and passed in the expected stage.
# If a tier assertion is removed or made vacuous, the named test disappears from results.json
# and this assert fires — proving the regression suite guards against silent test removal.
for stage_name, test_name_substr in canary.get("stage_pass_checks", []):
assert stage_has_passing_test(results, stage_name, test_name_substr), (
f"[{canary['id']}] expected a passing test containing {test_name_substr!r} in "
f"stage={stage_name!r}, but none found. "
f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
)
def _assert_red(rc: int, results: dict | None, canary: dict, note: str) -> None:
"""Assert a bad-canary run is RED (false-green guard).
The PRIMARY assertion is rc != 0. If the harness wrongly returns 0 (green) for this fixture,
this assert fails → the regression suite catches the false-green. This is the core guard.
"""
# PRIMARY: harness must return non-zero (RED).
# If the harness returns 0 for a broken app, the regression suite fails here — false-green caught.
assert rc != 0, (
f"[{canary['id']}] harness returned rc=0 (GREEN) for a KNOWN-BAD fixture — "
f"FALSE-GREEN detected. The harness failed to catch the broken app. {note}"
)
# SECONDARY: verify the specific failing test is present in results.json.
# If the content-type assertion is removed/vacuated, stage_has_failing_test() returns False here
# → this assert fires → we detect that the guard itself was removed (a meta-failure).
if results is not None:
for stage_name, test_name_substr in canary.get("stage_fail_checks", []):
assert stage_has_failing_test(results, stage_name, test_name_substr), (
f"[{canary['id']}] expected a failing test containing {test_name_substr!r} in "
f"stage={stage_name!r}, but none found. "
f"The guard may have been removed or vacuated. "
f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
)
def _assert_red_at_tier(rc: int, results: dict | None, canary: dict, note: str) -> None:
"""Assert a per-tier RED canary: overall RED, failing_tier=fail, passing_tiers_before=pass.
Proves the server catches failure AT THE INTENDED TIER (not just "fails somewhere"), and that
the tiers before it still PASSED (no collateral damage from the fixture).
If the harness returns 0 for any of these fixtures, false-green is detected at the primary assert.
"""
failing_tier = canary.get("failing_tier")
passing_before = canary.get("passing_tiers_before", [])
# PRIMARY: harness must return non-zero.
assert rc != 0, (
f"[{canary['id']}] harness returned rc=0 (GREEN) for a KNOWN-BAD fixture at tier "
f"{failing_tier!r} — FALSE-GREEN. {note}"
)
if results is None:
return
tier_results = results.get("results", {})
# The intended failing tier must be "fail".
if failing_tier:
actual = tier_results.get(failing_tier)
assert actual == "fail", (
f"[{canary['id']}] expected tier {failing_tier!r}='fail', got {actual!r}. "
f"All tier results: {tier_results}. {note}"
)
# Tiers before the failing tier must have passed (no collateral damage from the fixture).
for tier in passing_before:
actual = tier_results.get(tier)
assert actual == "pass", (
f"[{canary['id']}] expected prior tier {tier!r}='pass' before failing at "
f"{failing_tier!r}, got {actual!r}. All results: {tier_results}. {note}"
)
# Optional: specific failing test name (for the restore-RED canary).
for stage_name, test_name_substr in canary.get("stage_fail_checks", []):
assert stage_has_failing_test(results, stage_name, test_name_substr), (
f"[{canary['id']}] expected a failing test containing {test_name_substr!r} in "
f"stage={stage_name!r}. "
f"Stage tests: {[t['name'] for t in _stage_tests(results, stage_name)]}. {note}"
)
def _stage_tests(results: dict, stage_name: str) -> list[dict]:
for stage in results.get("stages", []):
if stage.get("name") == stage_name:
return stage.get("tests", [])
return []

View File

@ -14,7 +14,7 @@ sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..", "runner")
from harness import card as C # noqa: E402
def _data(level=3, cap="L4 functional (recipe-specific tests) N/A"):
def _data(level=4, cap="L5 integration (SSO/OIDC + cross-app) N/A"):
return {
"recipe": "uptime-kuma",
"version": "1.23.0",
@ -51,35 +51,6 @@ def test_badge_svg_wellformed():
assert svg.startswith("<svg") and svg.endswith("</svg>")
assert "level 4" in svg
assert C.level_color(4) in svg
# plain cap (no intent) → two-box badge, no third segment
assert "expected" not in svg and "gap?" not in svg
def test_badge_svg_differentiates_intentional_vs_unintentional_skip():
# an intentional (declared) skip capped the climb → muted "expected" third segment
exp = C.level_badge_svg(2, "L3 backup/restore N/A", "intentional")
assert "level 2" in exp and "expected" in exp and C.EXPECT_COLOR in exp
assert "gap?" not in exp
# an unintentional skip (not declared) → amber "gap?" third segment
gap = C.level_badge_svg(2, "L3 backup/restore N/A", "unintentional")
assert "level 2" in gap and "gap?" in gap and C.GAP_COLOR in gap
assert "expected" not in gap
def test_skip_rows_intentional_and_unintentional():
html_out = C._skip_rows(
{"intentional": {"backup_restore": "no persistent data"}, "unintentional": ["functional"]}
)
# intentional skip: labelled row (muted green) + the reason on its own line
assert "intentional skip" in html_out and C.SKIP_GREEN in html_out
assert "backup/restore" in html_out and "no persistent data" in html_out
# unintentional skip: amber row + prompt to declare/add coverage
assert "unintentional skip" in html_out and C.GAP_COLOR in html_out
assert "functional" in html_out and "EXPECTED_NA" in html_out
def test_skip_rows_empty_when_no_skips():
assert C._skip_rows({"intentional": {}, "unintentional": []}) == ""
def test_card_html_reports_level_verbatim():

View File

@ -24,7 +24,7 @@ import dashboard # noqa: E402
def _row(**kw):
base = {
"recipe": "custom-html", "status": "success", "number": 4, "ref": "db9a9502",
"version": "db9a95024e9d", "level": 4, "level_cap_reason": "",
"version": "db9a95024e9d", "level": 4, "level_cap_reason": "L5 integration N/A",
"has_screenshot": True, "flags": {"clean_teardown": True, "no_secret_leak": True},
"finished": 0, "url": "https://drone.x/cc-ci/4",
}

View File

@ -19,25 +19,35 @@ def _rungs(
upgrade="pass",
backup_restore="pass",
functional="pass",
integration="pass",
recipe_local="pass",
):
return {
"install": install,
"upgrade": upgrade,
"backup_restore": backup_restore,
"functional": functional,
"integration": integration,
"recipe_local": recipe_local,
}
# ---- the ladder: four essential rungs, top is L4 (functional) ----
# ---- the U0 gate: L4-pass and L2-cap ----
def test_full_clean_climb_to_L4():
# All four essential rungs pass → L4 (the top; integration/recipe-local are optional, not leveled).
def test_full_clean_climb_to_L6():
lvl, reason = L.compute_level(_rungs())
assert lvl == 4
assert lvl == 6
assert reason == ""
def test_climbs_through_L4_then_no_integration_surface_caps_at_L4():
# GATE: a recipe whose functional tests pass but has no SSO/integration surface caps at L4.
lvl, reason = L.compute_level(_rungs(integration="na", recipe_local="na"))
assert lvl == 4
assert "L5" in reason and "N/A" in reason
def test_fails_at_L2_capped_at_L1():
# GATE: upgrade fails → capped at L1 even though higher rungs would pass.
lvl, reason = L.compute_level(_rungs(upgrade="fail", backup_restore="pass", functional="pass"))
@ -59,27 +69,34 @@ def test_install_fail_is_L0():
def test_higher_pass_does_not_rescue_lower_na():
# backup/restore N/A (stateless app) caps at L2 even though functional would pass.
lvl, reason = L.compute_level(_rungs(backup_restore="na", functional="pass"))
lvl, reason = L.compute_level(_rungs(backup_restore="na", functional="pass", integration="na"))
assert lvl == 2
assert "L3" in reason and "N/A" in reason
def test_upgrade_na_caps_at_L1():
# only one published version → no upgrade possible → N/A caps at L1 (upgrade is essential).
# only one published version → no upgrade possible → N/A caps at L1.
lvl, reason = L.compute_level(_rungs(upgrade="na"))
assert lvl == 1
assert "L2" in reason and "N/A" in reason
def test_functional_na_caps_at_L3():
# no recipe-specific functional tests → functional N/A caps at L3.
lvl, reason = L.compute_level(_rungs(functional="na"))
assert lvl == 3
assert "L4" in reason and "N/A" in reason
def test_integration_fail_caps_at_L4():
# SSO declared but unverified (failed) → integration rung fails → cap at L4.
lvl, reason = L.compute_level(_rungs(integration="fail", recipe_local="na"))
assert lvl == 4
assert "L5" in reason and "FAILED" in reason
def test_recipe_local_na_caps_at_L5():
# SSO passes but no recipe-local tests → cap at L5 (L6 N/A).
lvl, reason = L.compute_level(_rungs(recipe_local="na"))
assert lvl == 5
assert "L6" in reason and "N/A" in reason
def test_functional_fail_caps_at_L3():
lvl, reason = L.compute_level(_rungs(functional="fail"))
lvl, reason = L.compute_level(_rungs(functional="fail", integration="na"))
assert lvl == 3
assert "L4" in reason and "FAILED" in reason

View File

@ -105,31 +105,83 @@ def _results(**kw):
return base
def test_derive_rungs_full_climb_four_essential():
rungs = R.derive_rungs(_results(), backup_capable=True, has_custom=True)
# only the four essential rungs — integration/recipe-local are optional, not produced here.
def test_derive_rungs_full_stateful_sso():
rungs = R.derive_rungs(
_results(),
backup_capable=True,
declared=["keycloak"],
deps_ready=True,
sso_unverified=False,
has_custom=True,
has_repo_local=False,
repo_local_passed=False,
)
assert rungs == {
"install": "pass",
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
"integration": "pass",
"recipe_local": "na",
}
def test_derive_rungs_stateless_backup_and_functional_na():
def test_derive_rungs_no_sso_surface_is_integration_na():
rungs = R.derive_rungs(
_results(),
backup_capable=True,
declared=[],
deps_ready=True,
sso_unverified=False,
has_custom=True,
has_repo_local=False,
repo_local_passed=False,
)
assert rungs["integration"] == "na"
assert rungs["functional"] == "pass"
def test_derive_rungs_stateless_backup_na():
rungs = R.derive_rungs(
_results(backup="skip", restore="skip", custom="skip"),
backup_capable=False,
declared=[],
deps_ready=True,
sso_unverified=False,
has_custom=False,
has_repo_local=False,
repo_local_passed=False,
)
assert rungs["backup_restore"] == "na"
assert rungs["functional"] == "na"
assert "integration" not in rungs and "recipe_local" not in rungs
def test_derive_rungs_functional_fail():
rungs = R.derive_rungs(_results(custom="fail"), backup_capable=True, has_custom=True)
assert rungs["functional"] == "fail"
def test_derive_rungs_sso_unverified_is_integration_fail():
rungs = R.derive_rungs(
_results(),
backup_capable=True,
declared=["keycloak"],
deps_ready=False,
sso_unverified=True,
has_custom=True,
has_repo_local=False,
repo_local_passed=False,
)
assert rungs["integration"] == "fail"
def test_derive_rungs_repo_local_pass():
rungs = R.derive_rungs(
_results(),
backup_capable=True,
declared=[],
deps_ready=True,
sso_unverified=False,
has_custom=True,
has_repo_local=True,
repo_local_passed=True,
)
assert rungs["recipe_local"] == "pass"
# ---- build_results: end-to-end incl level + flags ----
@ -160,13 +212,16 @@ def test_build_results_level_and_flags(tmp_path):
records=recs,
results=_results(),
backup_capable=True,
declared=[],
deps_ready=True,
sso_unverified=False,
clean_teardown=True,
no_secret_leak=True,
finished_ts=1234.0,
)
# all four essential rungs pass → full climb to L4 (the top), no cap
# stateful, functional pass, no SSO surface, no repo-local → caps at L4
assert data["level"] == 4
assert data["level_cap_reason"] == ""
assert "L5" in data["level_cap_reason"]
assert data["recipe"] == "hedgedoc"
assert data["ref"] == "deadbeefcafe"
assert data["flags"] == {"clean_teardown": True, "no_secret_leak": True}
@ -191,6 +246,9 @@ def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
records=recs,
results=_results(upgrade="fail"),
backup_capable=True,
declared=[],
deps_ready=True,
sso_unverified=False,
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
@ -199,85 +257,6 @@ def test_build_results_capped_at_L1_on_upgrade_fail(tmp_path):
assert "L2" in data["level_cap_reason"]
# ---- skips: intentional (declared) vs unintentional (everything else skipped) ----
def _rungs(**kw):
base = {
"install": "pass",
"upgrade": "pass",
"backup_restore": "pass",
"functional": "pass",
}
base.update(kw)
return base
def test_skips_intentional_vs_unintentional():
rungs = _rungs(backup_restore="na", functional="na")
sk = R.skips(rungs, {"backup_restore": "stateless static server"})
# backup_restore is declared (intentional, with reason); functional skipped but not declared.
assert sk["intentional"] == {"backup_restore": "stateless static server"}
assert sk["unintentional"] == ["functional"]
def test_skips_none_declared_all_unintentional():
rungs = _rungs(backup_restore="na")
sk = R.skips(rungs, None)
assert sk["intentional"] == {}
assert sk["unintentional"] == ["backup_restore"]
def test_skips_declaration_only_counts_when_actually_skipped():
# backup_restore actually ran (pass) → not a skip, so a declaration for it is simply inert.
rungs = _rungs(backup_restore="pass")
sk = R.skips(rungs, {"backup_restore": "reason"})
assert "backup_restore" not in sk["intentional"]
assert "backup_restore" not in sk["unintentional"]
def test_build_results_threads_expected_na(tmp_path):
# Mirrors custom-html-tiny post-change: install + a passing functional (custom) test, but no
# backup surface (backup_restore declared intentionally skipped).
recs = [
{
"tier": "install",
"source": "generic",
"file": "g/test_install.py",
"rc": 0,
"junit": _write(tmp_path, "i.xml", JUNIT_PASS),
},
{
"tier": "custom",
"source": "cc-ci",
"file": "c/test_serves_content.py",
"rc": 0,
"junit": _write(tmp_path, "c.xml", JUNIT_PASS),
},
]
data = R.build_results(
recipe="custom-html-tiny",
version="1.1.0",
pr="0",
ref=None,
records=recs,
results=_results(backup="skip", restore="skip"), # custom=pass (default) → functional pass
backup_capable=False, # no backupbot label → backup_restore skipped (N/A)
clean_teardown=True,
no_secret_leak=True,
finished_ts=0.0,
expected_na={"backup_restore": "stateless static file server"},
)
# backup_restore skip still caps at L2 (never inflates) — even though functional passes above it,
# the skip caps the climb — but it's the declared (intentional) rung that capped.
assert data["level"] == 2
assert "L3" in data["level_cap_reason"]
assert data["level_cap_rung"] == "backup_restore"
assert data["rungs"]["functional"] == "pass"
assert data["skips"]["intentional"]["backup_restore"] == "stateless static file server"
assert data["skips"]["unintentional"] == [] # backup_restore declared; functional passed → clean
def test_write_results_roundtrip(tmp_path):
data = {"run_id": "42", "level": 3, "stages": []}
path = R.write_results(data, runs_dir_override=str(tmp_path))