CronCreate mechanism cold-verified: upgrader-cron.log created at 23:18:21Z with correct content; upgrader was started by cron fire; DECISIONS.md updated. busybox crond correctly replaced with CronCreate (plan §4 "Claude scheduled task"). All V1-V9 + §4 cron now PASS within 24h. No open findings, no VETOs. Builder may write ## DONE to STATUS-5.md.
264 lines
15 KiB
Markdown
# Phase 5 — BACKLOG

SSOT: `/srv/cc-ci/cc-ci-plan/plan-phase5-verify-upgrade-flow.md`. DoD = V1–V9.

Single-writer: `## Build backlog` = Builder-only; `## Adversary findings` = Adversary-only.

---

## Build backlog

- [x] Create phase 5 state files (STATUS-5.md, BACKLOG-5.md, JOURNAL-5.md)
- [x] Fix A5-2: Add commit status posting to bridge.py (pending on trigger, success/failure on finish)
- [x] Fix A5-1: Add custom-html-tiny to bridge POLL_REPOS; redeploy bridge (cc-ci-bridge:3761c4221042)
- [x] V3: /recipe-upgrade custom-html-tiny end-to-end GREEN (!testme PASS; PR #2 open)
- [x] V7: mirror reconciliation (PR #1 superseded, PR #4 merged-upstream, main force-synced)
- [x] V1/V2: !testme trigger + testme-on-pr.sh reads verdict (GREEN on PR #2/#35; RED on PR #5/#34)
- [x] Fix A5-3: make `POST=1 testme-on-pr.sh` ignore stale prior status on same PR head
- [x] V4: 3-iteration regression loop (seed bad tag → RED → fix → GREEN in 2 runs)
- [x] V5: stale-test DEFAULT = comment, no test edit (PASS per Adversary A5-5 closed 21:49Z)
- [x] V6: --with-tests opens + verifies cc-ci test PR (PASS per Adversary REVIEW-5.md 21:38Z)
- [ ] Fix A5-6: enroll uptime-kuma in bridge POLL_REPOS (done: commit 51ba205)
- [ ] V8: /upgrade-all DEFAULT run (--dry-run list + small live run) — upgrader running
- [ ] V8a: cc-ci-upgrader agent (launch-upgrader.sh start/stop/status cycle) — partial
- [ ] V9: cleanup all verification PRs + deploys; install weekly cron (Phase 5 §4)

---

## Adversary findings

### [adversary] A5-7 — §4 cron: busybox crond does NOT execute jobs as non-root user

**Status:** CLOSED — re-tested 2026-06-01T23:20Z; CronCreate fire verified; see REVIEW-5.md entry.

ORIGINALLY OPEN — found 2026-06-01T23:11Z

The §4 weekly cron was installed using busybox crond in a tmux session, invoked with:

```
crond -f -d 5 -c /home/loops/.cc-ci-crontabs -L /srv/cc-ci/.cc-ci-logs/crond.log
```

The crontab file `/home/loops/.cc-ci-crontabs/loops` contains the correct schedule (`4 23 * * 1`).

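For reference when reading the T0 timestamps in this finding: `4 23 * * 1` means minute 4, hour 23, day-of-week 1 (Monday). The Python sketch below computes the first matching instant after a given time, assuming the crontab is evaluated in UTC (consistent with the 23:04 UTC T0 cited here). `next_weekly_fire` is a hypothetical helper that handles only fixed minute/hour/weekday entries, not general cron syntax.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_fire(after: datetime, weekday: int = 0, hour: int = 23, minute: int = 4) -> datetime:
    """Return the first instant strictly after `after` that matches a weekly
    `MIN HOUR * * DOW` cron entry, evaluated in UTC.

    `weekday` uses Python's numbering (Monday == 0), so cron DOW 1 maps to 0.
    """
    after = after.astimezone(timezone.utc)
    # Candidate: the target time of day on the same calendar day as `after`...
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # ...then shifted forward to the target weekday (0-6 days ahead).
    candidate += timedelta(days=(weekday - candidate.weekday()) % 7)
    # If that instant is not strictly in the future, the next fire is a week later.
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


start = datetime(2026, 6, 1, 22, 8, 44, tzinfo=timezone.utc)  # crond startup, per crond.log mtime
print(next_weekly_fire(start))  # 2026-06-01 23:04:00+00:00
```

Starting from crond's startup (crond.log last modified 2026-06-01 22:08:44 UTC, a Monday), the first fire is 23:04 UTC the same day, which is the T0 that was missed; the following one would be 2026-06-08 23:04 UTC.
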
**Finding: crond never executes any job.**

Cold-verified the T0 (23:04Z) miss, checked 2 minutes after T0:
- `/srv/cc-ci/.cc-ci-logs/upgrader-cron.log` does NOT exist.
- crond.log shows only 3 startup lines; last modified 22:08:44 UTC — no entries after startup.
- No cc-ci-upgrader session started at 23:04Z (`python3 launch-upgrader.py status` → stopped).

Cold-verified with a `* * * * *` test entry (every-minute control):
- Added `* * * * * date -u >> /tmp/cc-ci-crond-test.log 2>&1` to the crontab.
- Waited through 23:09 and 23:10 UTC — no `/tmp/cc-ci-crond-test.log` created.
- Confirmed: busybox crond ignores ALL cron entries.

**Root cause:** busybox crond's `-c dir` mode is designed to run as root. It reads each file in
the directory as a per-user crontab (filename = username). Before executing a job, it calls
`setgid(pw->pw_gid)` + `setuid(pw->pw_uid)`. Running as non-root user `loops`, `setgid/setuid`
fail with EPERM, so crond silently skips all jobs.

**Impact:** The §4 weekly cron is completely non-functional. T0 (23:04 UTC) was missed.
The plan's §4 requirement ("verify the cron-equivalent path end-to-end; confirm real first fire
at T0") is NOT met.

**Required fix:** Replace busybox crond with a mechanism that works as a non-root user. Options
per plan §4:
1. **Claude scheduled task** (`/schedule` skill → `CronCreate` harness tool): built-in, no root
   needed, tested mechanism.
2. **systemd user timer** (`systemctl --user enable/start cc-ci-upgrader.timer`): requires writing
   a user service unit file to `~/.config/systemd/user/`.
3. **`at` one-off for T0**: does not provide a recurring weekly schedule.

**Cold repro:**
1. `ssh loops@<orch> 'cat /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>/dev/null || echo "(no log)"'`
   → "(no log)"
2. `ssh loops@<orch> 'stat /srv/cc-ci/.cc-ci-logs/crond.log | grep Modify'`
   → Modify: 2026-06-01 22:08:44 (no update after crond start)
3. `ssh loops@<orch> 'python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py status'`
   → "stopped"

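Repro steps 1 and 2 reduce to one question: does `upgrader-cron.log` exist, and was it written at or after T0? A minimal Python sketch of that check, assuming it runs directly on the orchestrator (no `ssh`) and that the file's mtime is an acceptable proxy for a fire; `fired_since` is a hypothetical helper, not part of the cc-ci tooling.

```python
import os
from datetime import datetime, timezone


def fired_since(log_path: str, t0: datetime) -> bool:
    """True if `log_path` exists and was last modified at or after `t0`.

    A missing upgrader-cron.log, or one whose mtime predates T0, means the
    scheduled job has not fired since T0 (mtime is only a proxy for a fire).
    """
    try:
        mtime = os.stat(log_path).st_mtime
    except FileNotFoundError:
        return False  # repro step 1: "(no log)"
    return datetime.fromtimestamp(mtime, tz=timezone.utc) >= t0


T0 = datetime(2026, 6, 1, 23, 4, tzinfo=timezone.utc)
print(fired_since("/srv/cc-ci/.cc-ci-logs/upgrader-cron.log", T0))
```

For the state recorded above (no log at all) this returns `False`; it flips to `True` only once a scheduled run has written the log at or after T0.
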
(Only Adversary closes this after re-test with a working T0 fire.)

---

### [adversary] A5-5 — V5: explanatory comment references wrong build/failures; no RESULT: SUCCESS-PENDING-TESTS

**Status:** CLOSED — re-tested 2026-06-01T21:49Z; see `REVIEW-5.md` follow-up entry.

ORIGINALLY OPEN — found 2026-06-01T21:38Z

V5 requires the `recipe-upgrade` skill in DEFAULT mode (no `--with-tests`) to post an explanatory
comment that accurately identifies which test is stale and why, and to report `RESULT: SUCCESS-PENDING-TESTS`.
The seeded custom-html evidence satisfies neither requirement.

**Finding 1 — Explanatory comment references build #40, not build #75.**
The explanatory comment #13883 was posted at 2026-06-01T19:41:22 (before the MIME-only commits
`ee5cb811`/`71e7326a`) and says: "Observed on `!testme` build `#40`". Build #40 had docroot-path
failures in three test files (`test_backup.py`, `test_content_roundtrip.py`,
`test_content_type_header.py`). Build #75 (the final seeded case, ref `71e7326a`) has ONE failure:
`test_content_type_header.py` MIME type assertion (`application/octet-stream` vs `text/plain`).
The comment describes a different seeded scenario from the final one: wrong build number, wrong root
cause, and extra test failures that don't appear in build #75.

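Finding 1 is essentially a consistency check between the comment and the build it should describe. A minimal Python sketch of that check, assuming the comment cites builds as `#<n>` and names test files as `test_*.py` (both patterns are assumptions, and `#<n>` will also match PR or comment numbers, so only the absence of the expected build is flagged); `comment_mismatches` is a hypothetical helper.

```python
import re


def comment_mismatches(comment: str, build: int, failing: set[str]) -> list[str]:
    """List the ways an explanatory comment disagrees with a build's failures.

    Assumed conventions: builds are cited as `#<n>` and test files as
    `test_*.py`. `#<n>` also matches PR/comment numbers, so only the
    absence of the expected build number is treated as a problem.
    """
    problems = []
    cited_builds = {int(n) for n in re.findall(r"#(\d+)", comment)}
    if build not in cited_builds:
        problems.append(f"cites {sorted(cited_builds)}, not #{build}")
    # Any test file the comment blames that did not actually fail in `build`.
    cited_tests = set(re.findall(r"\btest_\w+\.py\b", comment))
    for extra in sorted(cited_tests - failing):
        problems.append(f"names {extra}, which did not fail in #{build}")
    return problems
```

Given build #75's single failing file (`test_content_type_header.py`), a comment that cites `#40` and names the three docroot-path test files would be flagged three times: once for the build number and once each for `test_backup.py` and `test_content_roundtrip.py`.
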
**Finding 2 — No `RESULT: SUCCESS-PENDING-TESTS` produced.**
No `custom-html-upgrade-*.md` exists in `/srv/cc-ci/.cc-ci-logs/upgrades/`. The V5 evidence uses
`POST=1 testme-on-pr.sh` directly; `/recipe-upgrade custom-html` was not run end-to-end on the
MIME-only seeded case.

**Cold repro:**
1. Check comment #13883 on `recipe-maintainers/custom-html` PR#3: says "build #40" and docroot-path
   failures.
2. Check `ci.commoninternet.net/runs/75/results.json`: single failure in `test_content_type_header.py`
   (MIME type), no docroot-path failures.
3. Run `find /srv/cc-ci* -name "*custom-html*upgrade*"`: no log file produced.

**Required fix:**
Re-run `/recipe-upgrade custom-html` in DEFAULT mode against the existing seeded PR #3 (head
`71e7326a`). The skill should:
1. See VERDICT=RED from `testme-on-pr.sh`
2. Read build #75 failures → only `test_content_type_header.py` (MIME type)
3. Post a new/updated explanatory comment on PR #3 referencing build #75 and the MIME-type root cause
4. Write `RESULT: SUCCESS-PENDING-TESTS — custom-html ... recipe PR: ...` to
   `/srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-<date>.md`

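Finding 2 and fix step 4 both come down to whether a `custom-html-upgrade-*.md` log with a `RESULT:` line exists. A minimal Python sketch of that cold check, assuming only the directory and filename pattern given in this finding; `result_lines` is a hypothetical helper.

```python
from pathlib import Path


def result_lines(log_dir: str, recipe: str) -> dict[str, str]:
    """Map each `<recipe>-upgrade-*.md` file in `log_dir` to its first
    `RESULT:` line ("" when the file has none).

    The glob is literal on `<recipe>-upgrade-`, so for "custom-html" it does not
    pick up `custom-html-tiny-upgrade-*.md` logs.
    """
    base = Path(log_dir)
    if not base.is_dir():
        return {}
    found = {}
    for path in sorted(base.glob(f"{recipe}-upgrade-*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        found[path.name] = next((ln for ln in lines if ln.startswith("RESULT:")), "")
    return found


logs = result_lines("/srv/cc-ci/.cc-ci-logs/upgrades", "custom-html")
ok = any(v.startswith("RESULT: SUCCESS-PENDING-TESTS") for v in logs.values())
print("V5 RESULT line present:", ok)
```

An empty mapping reproduces Finding 2; after the fix, at least one value should start with `RESULT: SUCCESS-PENDING-TESTS`.
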
(Only Adversary closes this, after re-testing with an accurate comment and RESULT line.)

---

### [adversary] A5-6 — V8: `/upgrade-all uptime-kuma` live run is broken — recipe not enrolled in bridge or tests/

**Status:** CLOSED — build #91 GREEN 2026-06-01T22:07Z; see REVIEW-5.md V8/V8a cold-verify entry.

ORIGINALLY OPEN — found 2026-06-01T21:52Z

The V8 live run chose `uptime-kuma` as the test recipe. Two enrollment blockers were found via
cold verification:

**Blocker 1 — uptime-kuma NOT in bridge POLL_REPOS:**
- Live bridge poll list (from `docker service logs`):
  `['cc-ci','custom-html','custom-html-tiny','keycloak','cryptpad','matrix-synapse','lasuite-docs','lasuite-meet','n8n','hedgedoc']`
- `uptime-kuma` is absent. So when the upgrader posted `!testme` on PR#1 (comment #13902 at
  `2026-06-01T21:48:39Z`), the bridge would NEVER pick it up.
- `POST=1 testme-on-pr.sh uptime-kuma 1` would eventually time out and return `VERDICT=PENDING BUILD=?`.

~~**Blocker 2 — uptime-kuma has no tests/ directory in cc-ci (RETRACTED)**~~
Builder's correction verified: `ls /root/builder-clone/tests/uptime-kuma/` → EXISTS (functional/ PARITY.md recipe_meta.py). Phase 2 commit `1aaf3bd`. This finding was incorrect.

**Impact:** The V8 live run evidence was invalid at time of filing — `uptime-kuma` was not in bridge POLL_REPOS. The tests/ directory DOES exist (finding 2 was incorrect). The `/upgrade-all` dry-run survey listed it as a candidate because `abra recipe upgrade` found available upgrades, which is independent of bridge enrollment.

**Cold repro:**
1. `ssh cc-ci '/run/current-system/sw/bin/docker service logs ccci-bridge_app 2>&1 | grep "watching\|uptime"'`
   → only older poll lists, no `uptime-kuma`
2. ~~`ssh cc-ci 'ls /root/builder-clone/tests/'` → no `uptime-kuma` directory~~ (retracted with Blocker 2)
3. `grep uptime /srv/cc-ci/cc-ci-adv/nix/modules/bridge.nix` → no match
4. Check commit status: `GET /repos/recipe-maintainers/uptime-kuma/commits/728618890a2b/status`
   → `state:'', total_count:0` after the `!testme` comment was already posted

**Fix applied (commit `51ba205`):** Added `recipe-maintainers/uptime-kuma` to POLL_REPOS in bridge.nix. Bridge redeployed (container `9mtdhzx7eylf`). Upgrader restarted at 21:54:25Z.

**Cold-verify of fix:**
- New bridge container `9mtdhzx7eylf` confirms `uptime-kuma` in poll list ✓
- `tests/uptime-kuma/` verified present ✓ (finding 2 was incorrect)
- Awaiting first `!testme` trigger to confirm bridge picks up the run

(Only Adversary closes this after cold-verify of a successful live V8 run with uptime-kuma.)

---

### [adversary] A5-4 — `matrix-synapse` stale-test/default path leaves no recipe commit status

**Status:** CLOSED — re-tested 2026-06-01T18:53:30Z; see `REVIEW-5.md` follow-up entry.

On the live V5 stale-test candidate `recipe-maintainers/matrix-synapse` PR `#1`, the PR comments show a
terminal failed `!testme` result for build `#53` plus the default-mode explanatory stale-test comment,
but the recipe PR head has **no** `cc-ci/testme` commit status at all. As a result, the helper cannot
read the verdict back from the PR and poll-only returns `PENDING` even though the PR already shows the
terminal outcome.

**Cold repro:**
1. Use `recipe-maintainers/matrix-synapse` PR `#1`, head
   `21e5d84430bdc52f8fa8aa9a40fa5bda8adf06c0`.
2. Confirm PR comments include:
   - failure result comment for build `#53` (`#13872`), and
   - explanatory stale-test comment (`#13877`).
3. Run:
   `POST=0 MAX_WAIT=20 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh matrix-synapse 1`
4. Observe:
   - helper returns `VERDICT=PENDING` and `BUILD=?`;
   - `GET /repos/recipe-maintainers/matrix-synapse/commits/21e5d84430bdc52f8fa8aa9a40fa5bda8adf06c0/status`
     returns `{"state":"","total_count":0,"statuses":null}`.

**Impact:** this breaks the Phase-5 requirement that the upgrade tooling read the verdict back from the
PR on the live stale-test/default path. The comment surface says the run is terminal; the status surface
still says nothing.

**Re-test result:** no longer reproducible on rerun build `#63`. The recipe PR head now shows
`cc-ci/testme` `pending -> failure` with target URL `.../63`, and poll-only returns
`VERDICT=PENDING BUILD=.../63` while in flight, then `VERDICT=RED BUILD=.../63` after completion.

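The read-back exercised in the cold repro and re-test amounts to mapping the `cc-ci/testme` entry of the head SHA's combined status onto the helper's `VERDICT=... BUILD=...` line. A minimal Python sketch of that mapping; it is an illustration, not the actual `testme-on-pr.sh` logic, and both the function name and the state-to-verdict table are assumptions.

```python
def verdict_from_status(combined: dict, context: str = "cc-ci/testme") -> str:
    """Turn a Gitea combined-status response into a `VERDICT=... BUILD=...` line.

    Assumed mapping: success -> GREEN, failure/error -> RED, anything else
    (pending, or no status for `context` at all) -> PENDING.
    """
    statuses = combined.get("statuses") or []  # Gitea can return null here
    mine = [s for s in statuses if s.get("context") == context]
    if not mine:
        return "VERDICT=PENDING BUILD=?"
    # Assumption: the combined view holds at most one (the latest) entry per context.
    state = mine[0].get("state", "")
    build = mine[0].get("target_url") or "?"
    verdict = {"success": "GREEN", "failure": "RED", "error": "RED"}.get(state, "PENDING")
    return f"VERDICT={verdict} BUILD={build}"


print(verdict_from_status({"state": "", "total_count": 0, "statuses": None}))
# VERDICT=PENDING BUILD=?
```

The empty response from the cold repro falls through to `VERDICT=PENDING BUILD=?`, which is exactly the A5-4 symptom; a `failure` status carrying the build URL maps to `VERDICT=RED`, matching the re-test.
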
### [adversary] A5-3 — `POST=1 testme-on-pr.sh` can return a stale prior GREEN on re-runs

**Status:** CLOSED — re-tested 2026-06-01T03:31:30Z; see `REVIEW-5.md` follow-up entry.

The helper currently posts a fresh `!testme`, then polls the recipe PR head's combined commit status.
If that PR head SHA already has a previous successful `cc-ci/testme` status and the bridge has not yet
processed the new comment, the helper exits immediately with the **old** GREEN/build URL instead of a
fresh `PENDING` or the new run's URL.

This is a real Phase-5/V2 correctness bug because re-commenting `!testme` on the same PR head is a
supported path, and the helper is meant to report the verdict for the run it just triggered.

**Cold repro:**
1. Use an open PR whose current head SHA already has `cc-ci/testme: success` from an earlier run.
2. Record the PR comment count.
3. Run:
   `POST=1 MAX_WAIT=40 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html-tiny 5`
4. Observe:
   - the PR comment count increases by exactly one (`3 -> 4` in the reproducer), so one fresh `!testme`
     was posted;
   - the helper returns `VERDICT=GREEN` with the **old** build URL
     `https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/37`;
   - later, the live system shows a new run was actually triggered and reflected on the PR as build
     `#41` (`cc-ci/testme pending -> success`, target URL `/41`).

**Likely fix direction:** after `POST=1`, do not trust a pre-existing terminal status on the same SHA.
Poll for evidence that belongs to the newly-triggered run (e.g. a newer status timestamp, a pending
status after the new comment, or a changed build URL/context generation marker) before returning.

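The fix direction above can be sketched as a poll loop that refuses to return any status predating the new `!testme`. This minimal Python sketch assumes the helper records when it posted the comment (as a timezone-aware datetime), snapshots the `target_url` of the head SHA's existing `cc-ci/testme` status just before posting, and that each status carries an `updated_at` timestamp as Gitea commit-status objects do. `is_fresh` and `wait_for_new_run` are hypothetical names; requiring both a newer timestamp and a changed URL is a deliberately conservative choice.

```python
import time
from datetime import datetime
from typing import Callable, Optional

TERMINAL = ("success", "failure", "error")


def _ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_fresh(status: Optional[dict], posted_at: datetime, old_url: Optional[str]) -> bool:
    """True only if `status` changed after our `!testme` was posted AND points
    at a different build than the status that existed before posting."""
    if status is None:
        return False
    return _ts(status["updated_at"]) > posted_at and status.get("target_url") != old_url


def wait_for_new_run(
    fetch: Callable[[], Optional[dict]],
    posted_at: datetime,
    old_url: Optional[str],
    max_wait: int = 40,
    interval: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll `fetch()` (the head SHA's `cc-ci/testme` status, or None) until a
    fresh, terminal status appears; None on timeout (report PENDING)."""
    for _ in range(max_wait // interval + 1):
        status = fetch()
        if is_fresh(status, posted_at, old_url) and status["state"] in TERMINAL:
            return status
        sleep(interval)
    return None
```

In the A5-3 reproducer, the pre-existing GREEN pointing at `/37` would be skipped and the loop would keep polling until the `/41` run reached a terminal state (or time out and report PENDING).
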
### [adversary] A5-2 — CRITICAL: testme-on-pr.sh cannot read verdicts (commit status vs comment mismatch)

**Status:** CLOSED — re-tested 2026-05-31T19:41:12Z; see `REVIEW-5.md` follow-up entry.

`testme-on-pr.sh` reads Gitea commit statuses on the recipe PR's head SHA. But the bridge NEVER
sets Gitea commit statuses on recipe repos — it only posts PR comments (the YunoHost card+badge).
Drone posts commit statuses on the `cc-ci` repo (its own repo), not on recipe repos.

**Evidence:**
- `GET /repos/recipe-maintainers/custom-html/commits/db9a95024e9d.../status` → `state:'', statuses:0`
- `POST=0 testme-on-pr.sh custom-html 2` → `VERDICT=PENDING BUILD=?` (always, on any known-green PR)
- Bridge source `bridge.py`: no call to `POST /repos/{owner}/{recipe}/statuses/{sha}` anywhere

**Required fix (one of):**
1. (Preferred) Bridge: after triggering a Drone build, POST `state=pending` on the recipe PR's head
   SHA; on build completion, POST `state=success` or `state=failure` with the build URL as
   `target_url`. This makes `testme-on-pr.sh` work unmodified and adds a native SCM status indicator.
2. `testme-on-pr.sh`: scan the recipe PR's comments for the `<!-- cc-ci:testme -->` marker and parse
   the result from the comment body (fragile but avoids bridge changes).

**Repro:** `POST=0 MAX_WAIT=60 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html 2`
→ always `VERDICT=PENDING` even after a green Drone build.

(Only Adversary closes this, after re-testing with a VERDICT=GREEN on a real green build.)

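Fix option 1 for A5-2 amounts to one Gitea API call per state change, addressed by the recipe PR's head SHA. A minimal Python sketch that builds (but does not send) that request with the standard library, assuming an API base URL ending in `/api/v1`, token authentication, and the `cc-ci/testme` context that entries A5-3 and A5-4 show on recipe PR heads. It is not the actual bridge.py code; `status_request` is a hypothetical helper and the host, token and SHA in the example are placeholders.

```python
import json
import urllib.request

STATES = ("pending", "success", "failure", "error")


def status_request(api_base: str, token: str, owner: str, repo: str, sha: str,
                   state: str, target_url: str = "", context: str = "cc-ci/testme",
                   description: str = "") -> urllib.request.Request:
    """Build (but do not send) `POST {api_base}/repos/{owner}/{repo}/statuses/{sha}`."""
    if state not in STATES:
        raise ValueError(f"unsupported commit status state: {state!r}")
    body = {"state": state, "target_url": target_url,
            "context": context, "description": description}
    return urllib.request.Request(
        f"{api_base.rstrip('/')}/repos/{owner}/{repo}/statuses/{sha}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values only: host, token and SHA are hypothetical.
req = status_request("https://git.example.org/api/v1", "TOKEN",
                     "recipe-maintainers", "custom-html", "HEAD_SHA", "pending")
print(req.get_method(), req.full_url)
```

The bridge would send it with `urllib.request.urlopen(req)` once with `state="pending"` when it triggers Drone, and again with `success` or `failure` plus the build URL as `target_url` when the build finishes.
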
### [adversary] A5-1 — custom-html-tiny not in bridge poll list

**Status:** CLOSED — re-tested 2026-05-31T19:41:12Z; see `REVIEW-5.md` follow-up entry.

The Phase 5 plan specifies using `custom-html-tiny` as the sandbox recipe for V3–V8 tests.
However, the bridge's poll list (from live container logs) does NOT include `recipe-maintainers/custom-html-tiny`:

```
poller (primary) watching ['recipe-maintainers/cc-ci', 'recipe-maintainers/custom-html',
'recipe-maintainers/keycloak', 'recipe-maintainers/cryptpad', 'recipe-maintainers/matrix-synapse',
'recipe-maintainers/lasuite-docs', 'recipe-maintainers/n8n', 'recipe-maintainers/hedgedoc'] every 30s
```

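The poll list in that log line looks like a Python list repr, so enrollment can be checked mechanically rather than by eye. A minimal Python sketch, assuming the `poller ... watching [...] every 30s` format shown above; `watched_repos` is a hypothetical helper, and `re.DOTALL` lets it tolerate line breaks like those in the excerpt.

```python
import ast
import re


def watched_repos(log_line: str) -> list[str]:
    """Extract the repo list from a bridge `poller ... watching [...]` log line.

    Assumes the list is printed as a Python list literal, as in the excerpt above.
    """
    match = re.search(r"watching (\[.*?\])", log_line, re.DOTALL)
    if match is None:
        raise ValueError("not a poller 'watching' line")
    repos = ast.literal_eval(match.group(1))  # safe: evaluates literals only
    if not isinstance(repos, list):
        raise ValueError("poll list is not a list literal")
    return repos


line = "poller (primary) watching ['recipe-maintainers/cc-ci', 'recipe-maintainers/custom-html'] every 30s"
print("recipe-maintainers/custom-html-tiny" in watched_repos(line))  # False
```

Run against the full excerpt above, the same membership test is `False` for `recipe-maintainers/custom-html-tiny`, which is this finding.
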
This means `!testme` on a `custom-html-tiny` PR will NOT trigger a Drone build. Either:
1. The builder must add `custom-html-tiny` to the bridge's enrolled repos list (and enroll its tests), OR
2. Use `custom-html` (which IS enrolled) as the sandbox recipe instead, OR
3. The plan's V3–V8 tests must first enroll the sandbox recipe as part of Phase 5 setup

**Repro:** `docker logs ccci-bridge_app.1.<id> 2>&1 | head -3` on cc-ci shows the poll list.

**Impact:** V3, V4, V5, V8 tests using `custom-html-tiny` as sandbox will fail silently (the `!testme`
comment is posted but the bridge never sees it → VERDICT stays PENDING forever).

(Only Adversary closes this after re-test.)