cc-ci/machine-docs/BACKLOG-5.md
autonomic-bot fd48daefc6
review(5): A5-7 CLOSED + §4 cron PASS + full gate M5 PASS @23:20Z
CronCreate mechanism cold-verified: upgrader-cron.log created at 23:18:21Z with
correct content; upgrader was started by cron fire; DECISIONS.md updated.
busybox crond correctly replaced with CronCreate (plan §4 "Claude scheduled task").

All V1-V9 + §4 cron now PASS within 24h. No open findings, no VETOs.
Builder may write ## DONE to STATUS-5.md.
2026-06-01 23:21:45 +00:00


Phase 5 — BACKLOG

SSOT: /srv/cc-ci/cc-ci-plan/plan-phase5-verify-upgrade-flow.md. DoD = V1-V9. Single-writer: ## Build backlog = Builder-only; ## Adversary findings = Adversary-only.


Build backlog

  • Create phase 5 state files (STATUS-5.md, BACKLOG-5.md, JOURNAL-5.md)
  • Fix A5-2: Add commit status posting to bridge.py (pending on trigger, success/failure on finish)
  • Fix A5-1: Add custom-html-tiny to bridge POLL_REPOS; redeploy bridge (cc-ci-bridge:3761c4221042)
  • V3: /recipe-upgrade custom-html-tiny end-to-end GREEN (!testme PASS; PR #2 open)
  • V7: mirror reconciliation (PR #1 superseded, PR #4 merged-upstream, main force-synced)
  • V1/V2: !testme trigger + testme-on-pr.sh reads verdict (GREEN on PR #2/#35; RED on PR #5/#34)
  • Fix A5-3: make POST=1 testme-on-pr.sh ignore stale prior status on same PR head
  • V4: 3-iteration regression loop (seed bad tag → RED → fix → GREEN in 2 runs)
  • V5: stale-test DEFAULT = comment, no test edit (PASS per Adversary A5-5 closed 21:49Z)
  • V6: --with-tests opens + verifies cc-ci test PR (PASS per Adversary REVIEW-5.md 21:38Z)
  • Fix A5-6: enroll uptime-kuma in bridge POLL_REPOS (done: commit 51ba205)
  • V8: /upgrade-all DEFAULT run (--dry-run list + small live run) — upgrader running
  • V8a: cc-ci-upgrader agent (launch-upgrader.sh start/stop/status cycle) — partial
  • V9: cleanup all verification PRs + deploys; install weekly cron (Phase 5 §4)

Adversary findings

[adversary] A5-7 — §4 cron: busybox crond does NOT execute jobs as non-root user

Status: CLOSED — re-tested 2026-06-01T23:20Z; CronCreate fire verified; see REVIEW-5.md entry. ORIGINALLY OPEN — found 2026-06-01T23:11Z

The §4 weekly cron was installed using busybox crond in a tmux session, invoked with:

crond -f -d 5 -c /home/loops/.cc-ci-crontabs -L /srv/cc-ci/.cc-ci-logs/crond.log

The crontab file /home/loops/.cc-ci-crontabs/loops contains the correct schedule (4 23 * * 1).
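As a sanity check on the schedule itself, the sketch below tests whether a timestamp matches a five-field cron expression. It is a minimal illustration, not crond's parser: the function name `cron_matches` is hypothetical, only `*` and single integers are handled, and it assumes the host clock is UTC so that the crontab fields can be compared against a UTC time.

```python
from datetime import datetime, timezone


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches a 5-field cron expression.

    Sketch only: supports "*" and single integers per field.
    Ranges, lists, steps and names are deliberately not handled.
    """
    minute, hour, dom, month, dow = expr.split()
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    cron_dow = (when.weekday() + 1) % 7
    actual = (when.minute, when.hour, when.day, when.month, cron_dow)
    fields = (minute, hour, dom, month, dow)
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# T0 from this finding: Monday 2026-06-01 at 23:04 UTC.
t0 = datetime(2026, 6, 1, 23, 4, tzinfo=timezone.utc)
print(cron_matches("4 23 * * 1", t0))  # True
```

The schedule does match T0, which is consistent with the finding that the fault lies in job execution rather than in the crontab entry.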

Finding: crond never executes any job.

Cold-verified T0 miss at 23:04Z (2 minutes after T0):

  • /srv/cc-ci/.cc-ci-logs/upgrader-cron.log does NOT exist.
  • crond.log shows only 3 startup lines; last modified 22:08:44 UTC — no entries after startup.
  • No cc-ci-upgrader session started at 23:04Z (python3 launch-upgrader.py status → stopped).

Cold-verified with * * * * * test entry (every-minute control):

  • Added * * * * * date -u >> /tmp/cc-ci-crond-test.log 2>&1 to the crontab.
  • Waited through 23:09 and 23:10 UTC — no /tmp/cc-ci-crond-test.log created.
  • Confirmed: busybox crond is completely ignoring ALL cron entries.

Root cause: busybox crond's -c dir mode is designed to run as root. It reads each file in the directory as a per-user crontab (filename = username) and, before executing a job, calls setgid(pw->pw_gid) + setuid(pw->pw_uid). When crond itself runs as the non-root user loops, setgid/setuid fail with EPERM, so crond silently skips all jobs.

Impact: The §4 weekly cron is completely non-functional. T0 (23:04 UTC) was missed. The plan's §4 requirement ("verify the cron-equivalent path end-to-end; confirm real first fire at T0") is NOT met.

Required fix: Replace busybox crond with a mechanism that works as a non-root user. Options per plan §4:

  1. Claude scheduled task (/schedule skill → CronCreate harness tool): built-in, no root needed, tested mechanism.
  2. systemd user timer (systemctl --user enable/start cc-ci-upgrader.timer): requires writing a user service unit file to ~/.config/systemd/user/.
  3. at one-off for T0: doesn't provide recurring weekly schedule.

Cold repro:

  1. ssh loops@<orch> 'cat /srv/cc-ci/.cc-ci-logs/upgrader-cron.log 2>/dev/null || echo "(no log)"' → "(no log)"
  2. ssh loops@<orch> 'stat /srv/cc-ci/.cc-ci-logs/crond.log | grep Modify' → Modify: 2026-06-01 22:08:44 (no update after crond start)
  3. ssh loops@<orch> 'python3 /srv/cc-ci/cc-ci-plan/launch-upgrader.py status' → "stopped"
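The first two repro steps reduce to one question: does the log exist, and was it modified at or after T0? A minimal sketch of that check follows. The helper name `fired_since` is hypothetical, and file mtime is only a proxy for "the job ran", so a real re-test should still read the log content.

```python
import os
from datetime import datetime, timezone


def fired_since(log_path: str, t0: datetime) -> bool:
    """True if log_path exists and was last modified at or after t0.

    `t0` must be timezone-aware. A missing file counts as "did not fire".
    """
    try:
        mtime = os.stat(log_path).st_mtime
    except FileNotFoundError:
        return False
    return datetime.fromtimestamp(mtime, tz=timezone.utc) >= t0


t0 = datetime(2026, 6, 1, 23, 4, tzinfo=timezone.utc)
print(fired_since("/srv/cc-ci/.cc-ci-logs/upgrader-cron.log", t0))
```

Run on the orchestrator, the same call can be pointed at crond.log to confirm that nothing was written after startup.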

(Only Adversary closes this after re-test with a working T0 fire.)


[adversary] A5-5 — V5: explanatory comment references wrong build/failures; no RESULT: SUCCESS-PENDING-TESTS

Status: CLOSED — re-tested 2026-06-01T21:49Z; see REVIEW-5.md follow-up entry. ORIGINALLY OPEN — found 2026-06-01T21:38Z

V5 requires the recipe-upgrade skill in DEFAULT mode (no --with-tests) to: post an explanatory comment that accurately identifies which test is stale + why; and report RESULT: SUCCESS-PENDING-TESTS. The seeded custom-html evidence does not satisfy both requirements.

Finding 1 — Explanatory comment references build #40, not build #75. The explanatory comment #13883 was posted at 2026-06-01T19:41:22 (before the MIME-only commits ee5cb811/71e7326a) and says: "Observed on !testme build #40". Build #40 had docroot-path failures in three test files (test_backup.py, test_content_roundtrip.py, test_content_type_header.py). Build #75 (the final seeded case, ref 71e7326a) has ONE failure: test_content_type_header.py MIME type assertion (application/octet-stream vs text/plain). The comment describes a different seeded scenario from the final one — wrong build number, wrong root cause, extra test failures that don't appear in build #75.

Finding 2 — No RESULT: SUCCESS-PENDING-TESTS produced. No custom-html-upgrade-*.md exists in /srv/cc-ci/.cc-ci-logs/upgrades/. The V5 evidence uses testme-on-pr.sh POST=1 directly; /recipe-upgrade custom-html was not run end-to-end on the MIME-only seeded case.

Cold repro:

  1. Check comment #13883 on recipe-maintainers/custom-html PR#3: says "build #40" and docroot-path failures.
  2. Check ci.commoninternet.net/runs/75/results.json: single failure in test_content_type_header.py (MIME type), no docroot-path failures.
  3. Run find /srv/cc-ci* -name "*custom-html*upgrade*" — no log file produced.

Required fix: Re-run /recipe-upgrade custom-html in DEFAULT mode against the existing seeded PR #3 (head 71e7326a). The skill should:

  1. See VERDICT=RED from testme-on-pr.sh
  2. Read build #75 failures → only test_content_type_header.py (MIME type)
  3. Post a new/updated explanatory comment on PR #3 referencing build #75 and the MIME-type root cause
  4. Write RESULT: SUCCESS-PENDING-TESTS — custom-html ... recipe PR: ... to /srv/cc-ci/.cc-ci-logs/upgrades/custom-html-upgrade-<date>.md
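Step 4 can be verified mechanically by scanning the upgrades directory for a log that carries the expected RESULT line. The sketch below assumes the file-name pattern `<recipe>-upgrade-*.md` given in this finding and that the RESULT line starts at the beginning of a line; the function names are hypothetical.

```python
from pathlib import Path


def result_lines(log_dir: str, recipe: str) -> list[str]:
    """Collect every line starting with "RESULT:" from the recipe's upgrade logs."""
    lines = []
    for path in sorted(Path(log_dir).glob(f"{recipe}-upgrade-*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.startswith("RESULT:"):
                lines.append(line)
    return lines


def has_pending_tests_result(log_dir: str, recipe: str) -> bool:
    """True if any upgrade log reports RESULT: SUCCESS-PENDING-TESTS."""
    return any(
        line.startswith("RESULT: SUCCESS-PENDING-TESTS")
        for line in result_lines(log_dir, recipe)
    )


print(has_pending_tests_result("/srv/cc-ci/.cc-ci-logs/upgrades", "custom-html"))
```

Because the glob is anchored on `<recipe>-upgrade-`, a log for custom-html-tiny is not mistaken for a custom-html log.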

(Only Adversary closes this, after re-testing with accurate comment and RESULT line.)


[adversary] A5-6 — V8: /upgrade-all uptime-kuma live run is broken — recipe not enrolled in bridge or tests/

Status: CLOSED — build #91 GREEN 2026-06-01T22:07Z; see REVIEW-5.md V8/V8a cold-verify entry. ORIGINALLY OPEN — found 2026-06-01T21:52Z

The V8 live run chose uptime-kuma as the test recipe. Two enrollment blockers were found via cold verification:

Blocker 1 — uptime-kuma NOT in bridge POLL_REPOS:

  • Live bridge poll list (from docker service logs): ['cc-ci','custom-html','custom-html-tiny','keycloak','cryptpad','matrix-synapse','lasuite-docs','lasuite-meet','n8n','hedgedoc']
  • uptime-kuma is absent, so the bridge will NEVER pick up the !testme that the upgrader posted on PR#1 (comment #13902 at 2026-06-01T21:48:39Z).
  • POST=1 testme-on-pr.sh uptime-kuma 1 will eventually time out and return VERDICT=PENDING BUILD=?.

Blocker 2 — uptime-kuma has no tests/ directory in cc-ci (RETRACTED) Builder's correction verified: ls /root/builder-clone/tests/uptime-kuma/ → EXISTS (functional/ PARITY.md recipe_meta.py). Phase 2 commit 1aaf3bd. This finding was incorrect.

Impact: The V8 live run evidence was invalid at time of filing — uptime-kuma was not in bridge POLL_REPOS. The tests/ directory DOES exist (finding 2 was incorrect). The /upgrade-all dry-run survey listed it as a candidate because abra recipe upgrade found available upgrades, which is independent of bridge enrollment.

Cold repro:

  1. ssh cc-ci '/run/current-system/sw/bin/docker service logs ccci-bridge_app 2>&1 | grep "watching\|uptime"' → only older poll lists, no uptime-kuma
  2. ssh cc-ci 'ls /root/builder-clone/tests/' → no uptime-kuma directory (RETRACTED: the directory exists; see Blocker 2)
  3. grep uptime /srv/cc-ci/cc-ci-adv/nix/modules/bridge.nix → no match
  4. Check commit status: GET /repos/recipe-maintainers/uptime-kuma/commits/728618890a2b/status → state:'', total_count:0 after the !testme comment was already posted

Fix applied (commit 51ba205): Added recipe-maintainers/uptime-kuma to POLL_REPOS in bridge.nix. Bridge redeployed (container 9mtdhzx7eylf). Upgrader restarted at 21:54:25Z.

Cold-verify of fix:

  • New bridge container 9mtdhzx7eylf confirms uptime-kuma in poll list ✓
  • tests/uptime-kuma/ verified present ✓ (finding 2 was incorrect)
  • Awaiting first !testme trigger to confirm bridge picks up the run

(Only Adversary closes this after cold-verify of a successful live V8 run with uptime-kuma.)


[adversary] A5-4 — matrix-synapse stale-test/default path leaves no recipe commit status

Status: CLOSED — re-tested 2026-06-01T18:53:30Z; see REVIEW-5.md follow-up entry.

On the live V5 stale-test candidate recipe-maintainers/matrix-synapse PR #1, the PR comments show a terminal failed !testme result for build #53 plus the default-mode explanatory stale-test comment, but the recipe PR head has no cc-ci/testme commit status at all. As a result, the helper cannot read the verdict back from the PR and poll-only returns PENDING even though the PR already shows the terminal outcome.

Cold repro:

  1. Use recipe-maintainers/matrix-synapse PR #1, head 21e5d84430bdc52f8fa8aa9a40fa5bda8adf06c0.
  2. Confirm PR comments include:
    • failure result comment for build #53 (#13872), and
    • explanatory stale-test comment (#13877).
  3. Run: POST=0 MAX_WAIT=20 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh matrix-synapse 1
  4. Observe:
    • helper returns VERDICT=PENDING and BUILD=?;
    • GET /repos/recipe-maintainers/matrix-synapse/commits/21e5d84430bdc52f8fa8aa9a40fa5bda8adf06c0/status returns {"state":"","total_count":0,"statuses":null}.

Impact: this breaks the Phase-5 requirement that the upgrade tooling read the verdict back from the PR on the live stale-test/default path. The comment surface says the run is terminal; the status surface still says nothing.

Re-test result: no longer reproducible on rerun build #63. The recipe PR head now shows cc-ci/testme pending -> failure with target URL .../63, and poll-only returns VERDICT=PENDING BUILD=.../63 while in flight, then VERDICT=RED BUILD=.../63 after completion.
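The mapping from the status surface to the helper's output can be sketched as follows. This is not the code of testme-on-pr.sh, only an illustration of the behaviour described above: an empty combined status yields PENDING with BUILD=?, and the latest cc-ci/testme status decides GREEN or RED. The per-status field names (`status`, `context`, `target_url`, `created_at`) are assumptions about the API response shape, and the function name is hypothetical.

```python
from datetime import datetime

# States named in this backlog; anything else is treated as still pending.
VERDICTS = {"success": "GREEN", "failure": "RED"}


def _created(status: dict) -> datetime:
    # Accept a trailing "Z" on Python versions where fromisoformat rejects it.
    return datetime.fromisoformat(status["created_at"].replace("Z", "+00:00"))


def verdict_from_combined(combined: dict, context: str = "cc-ci/testme") -> tuple[str, str]:
    """Return (VERDICT, BUILD) from a combined commit-status response."""
    statuses = combined.get("statuses") or []  # the API may return null here
    ours = [s for s in statuses if s.get("context") == context]
    if not ours:
        return "PENDING", "?"
    latest = max(ours, key=_created)
    state = latest.get("status") or latest.get("state") or ""
    return VERDICTS.get(state, "PENDING"), latest.get("target_url") or "?"


# The response observed in the cold repro:
print(verdict_from_combined({"state": "", "total_count": 0, "statuses": None}))
```

With the repro's response this prints `('PENDING', '?')`, matching the VERDICT=PENDING and BUILD=? that the helper returned.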

[adversary] A5-3 — POST=1 testme-on-pr.sh can return a stale prior GREEN on re-runs

Status: CLOSED — re-tested 2026-06-01T03:31:30Z; see REVIEW-5.md follow-up entry.

The helper currently posts a fresh !testme, then polls the recipe PR head's combined commit status. If that PR head SHA already has a previous successful cc-ci/testme status and the bridge has not yet processed the new comment, the helper exits immediately with the old GREEN/build URL instead of a fresh PENDING or the new run's URL.

This is a real Phase-5/V2 correctness bug because re-commenting !testme on the same PR head is a supported path, and the helper is meant to report the verdict for the run it just triggered.

Cold repro:

  1. Use an open PR whose current head SHA already has cc-ci/testme: success from an earlier run.
  2. Record the PR comment count.
  3. Run: POST=1 MAX_WAIT=40 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html-tiny 5
  4. Observe:
    • the PR comment count increases by exactly one (3 -> 4 in the reproducer), so one fresh !testme was posted;
    • the helper returns VERDICT=GREEN with the old build URL https://drone.ci.commoninternet.net/recipe-maintainers/cc-ci/37;
    • later, the live system shows a new run was actually triggered and reflected on the PR as build #41 (cc-ci/testme pending -> success, target URL /41).

Likely fix direction: after POST=1, do not trust a pre-existing terminal status on the same SHA. Poll for evidence that belongs to the newly-triggered run (e.g. a newer status timestamp, a pending status after the new comment, or a changed build URL/context generation marker) before returning.
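One way to realise the "newer status timestamp" option is to discard every status created before the fresh !testme comment was posted. The sketch below is an illustration under assumptions, not the fix that was applied: the field names `context` and `created_at` are assumed, timestamps are assumed to carry a UTC offset or trailing "Z", and the sample values are invented.

```python
from datetime import datetime, timezone


def _parse(ts: str) -> datetime:
    # Accept a trailing "Z" on Python versions where fromisoformat rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def fresh_statuses(statuses: list[dict], posted_at: datetime,
                   context: str = "cc-ci/testme") -> list[dict]:
    """Keep only statuses for `context` created strictly after `posted_at`.

    `posted_at` is the (timezone-aware) time the new !testme comment was posted.
    """
    return [
        s for s in statuses
        if s.get("context") == context and _parse(s["created_at"]) > posted_at
    ]


# Invented sample: an old GREEN exists, the new run has not reported yet.
posted_at = datetime(2026, 6, 1, 3, 0, 0, tzinfo=timezone.utc)
old_green = {"context": "cc-ci/testme", "status": "success",
             "created_at": "2026-06-01T02:00:00Z"}
print(fresh_statuses([old_green], posted_at))  # [] -> keep polling
```

An empty result means the helper should keep polling instead of returning the stale verdict. Clock skew between the helper and the server is not handled here; comparing against the comment's own server-side timestamp would avoid it.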

[adversary] A5-2 — CRITICAL: testme-on-pr.sh cannot read verdicts (commit status vs comment mismatch)

Status: CLOSED — re-tested 2026-05-31T19:41:12Z; see REVIEW-5.md follow-up entry.

testme-on-pr.sh reads Gitea commit statuses on the recipe PR's head SHA. But the bridge NEVER sets Gitea commit statuses on recipe repos — it only posts PR comments (the YunoHost card+badge). Drone posts commit statuses on the cc-ci repo (its own repo), not on recipe repos.

Evidence:

  • GET /repos/recipe-maintainers/custom-html/commits/db9a95024e9d.../status → state:'', statuses:0
  • POST=0 testme-on-pr.sh custom-html 2 → VERDICT=PENDING BUILD=? (always, on any known-green PR)
  • Bridge source bridge.py: no call to POST /repos/{owner}/{recipe}/statuses/{sha} anywhere

Required fix (one of):

  1. (Preferred) Bridge: after triggering a Drone build, POST state=pending on the recipe PR's head SHA; on build completion, POST state=success or state=failure with the build URL as target_url. This makes testme-on-pr.sh work unmodified and adds a native SCM status indicator.
  2. testme-on-pr.sh: scan the recipe PR's comments for the <!-- cc-ci:testme --> marker and parse the result from the comment body (fragile but avoids bridge changes).
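Option 1 amounts to one small HTTP call per state transition. The sketch below only builds the request so that it can be inspected without touching a server. It is not the bridge's code: the function name, the base URL, the token handling and the `/api/v1` prefix are assumptions, and the `cc-ci/testme` context is taken from the later findings in this file.

```python
import json
import urllib.request

ALLOWED_STATES = {"pending", "success", "failure"}


def build_status_request(base_url: str, owner: str, repo: str, sha: str,
                         state: str, target_url: str, token: str,
                         context: str = "cc-ci/testme") -> urllib.request.Request:
    """Build (but do not send) a POST .../repos/{owner}/{repo}/statuses/{sha}."""
    if state not in ALLOWED_STATES:
        raise ValueError(f"unsupported state: {state!r}")
    url = f"{base_url.rstrip('/')}/api/v1/repos/{owner}/{repo}/statuses/{sha}"
    body = json.dumps(
        {"state": state, "target_url": target_url, "context": context}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"token {token}",
        },
    )


# Placeholder host, SHA and token; pass the result to urllib.request.urlopen to send.
req = build_status_request("https://git.example.invalid", "recipe-maintainers",
                           "custom-html", "0123abcd", "pending",
                           "https://drone.example.invalid/1", "PLACEHOLDER")
print(req.get_method(), req.full_url)
```

The bridge would issue this once with state=pending when it triggers the build and once more with the terminal state when the build finishes.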

Repro: POST=0 MAX_WAIT=60 INTERVAL=5 /srv/cc-ci/.claude/skills/recipe-upgrade/testme-on-pr.sh custom-html 2 → always VERDICT=PENDING even after a green Drone build.

(Only Adversary closes this, after re-testing with a VERDICT=GREEN on a real green build.)

[adversary] A5-1 — custom-html-tiny not in bridge poll list

Status: CLOSED — re-tested 2026-05-31T19:41:12Z; see REVIEW-5.md follow-up entry.

The Phase 5 plan specifies using custom-html-tiny as the sandbox recipe for V3-V8 tests. However the bridge's poll list (from live container logs) does NOT include recipe-maintainers/custom-html-tiny:

poller (primary) watching ['recipe-maintainers/cc-ci', 'recipe-maintainers/custom-html',
'recipe-maintainers/keycloak', 'recipe-maintainers/cryptpad', 'recipe-maintainers/matrix-synapse',
'recipe-maintainers/lasuite-docs', 'recipe-maintainers/n8n', 'recipe-maintainers/hedgedoc'] every 30s

This means !testme on a custom-html-tiny PR will NOT trigger a Drone build. Either:

  1. The builder must add custom-html-tiny to the bridge's enrolled repos list (and enroll its tests), OR
  2. Use custom-html (which IS enrolled) as the sandbox recipe instead, OR
  3. The plan's V3-V8 tests must first enroll the sandbox recipe as part of Phase 5 setup

Repro: docker logs ccci-bridge_app.1.<id> 2>&1 | head -3 on cc-ci shows the poll list.
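The enrollment check in the repro can be scripted by parsing the poll list out of the bridge log. The sketch below assumes the log line keeps the `watching [...] every` shape quoted above and that the list is a valid Python literal; the function names are hypothetical. It uses the most recent matching line, since older poll lists may still be present in the log.

```python
import ast
import re

# Matches the bracketed list in: poller (primary) watching [...] every 30s
WATCHING = re.compile(r"watching (\[.*?\]) every", re.S)


def watched_repos(log_text: str) -> list[str]:
    """Return the most recent poll list found in the bridge log, or []."""
    matches = WATCHING.findall(log_text)
    return ast.literal_eval(matches[-1]) if matches else []


def is_enrolled(log_text: str, recipe: str, org: str = "recipe-maintainers") -> bool:
    """True if the recipe appears in the poll list, with or without the org prefix."""
    repos = watched_repos(log_text)
    return recipe in repos or f"{org}/{recipe}" in repos


log = """poller (primary) watching ['recipe-maintainers/cc-ci', 'recipe-maintainers/custom-html',
'recipe-maintainers/keycloak', 'recipe-maintainers/cryptpad', 'recipe-maintainers/matrix-synapse',
'recipe-maintainers/lasuite-docs', 'recipe-maintainers/n8n', 'recipe-maintainers/hedgedoc'] every 30s"""
print(is_enrolled(log, "custom-html"), is_enrolled(log, "custom-html-tiny"))  # True False
```

Matching whole list entries rather than substrings matters here, because custom-html is a prefix of custom-html-tiny.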

Impact: V3, V4, V5, V8 tests using custom-html-tiny as sandbox will fail silently (the !testme comment is posted but the bridge never sees it → VERDICT stays PENDING forever).

(Only Adversary closes this after re-test.)