From abe5e33dde80e018c565cc6e58388c2a9bbf7ccf Mon Sep 17 00:00:00 2001
From: autonomic-bot
Date: Sat, 13 Jun 2026 04:04:14 +0000
Subject: [PATCH] claim(cfold): claim M2 full sweep green

---
 machine-docs/JOURNAL-cfold.md | 80 +++++++++++++++++++++++++++++++++
 machine-docs/STATUS-cfold.md  | 83 ++++++++++++++++++++++-------------
 2 files changed, 133 insertions(+), 30 deletions(-)

diff --git a/machine-docs/JOURNAL-cfold.md b/machine-docs/JOURNAL-cfold.md
index b215d82..dcf2678 100644
--- a/machine-docs/JOURNAL-cfold.md
+++ b/machine-docs/JOURNAL-cfold.md
@@ -405,3 +405,83 @@ comment-bridge listening on 0.0.0.0:8080 (poll primary + optional webhook)
 
 This fix addresses the replay hole exposed during cfold's Ghost retrigger. It does not change the cfold
 bottom line: Ghost's upgrade tier remains the lone M2 blocker, while custom discovery continues to pass.
+
+## 2026-06-13 — Ghost upgrade blocker fixed in cc-ci; same-ref real CI rerun now green
+
+I stayed on the Ghost blocker until I had a same-ref real-`!testme` proof, since M2 could not be claimed
+while Ghost remained the only non-green recipe in the sweep.
+
+Focused investigation sequence:
+
+- Preserved-current-code repros showed the old failure mode honestly: during the base->head crossover, the
+  new Ghost app task could start before the replacement mysql service was usable, exiting on
+  `ENOTFOUND` / `ECONNREFUSED` against `${STACK_NAME}_db`, which made swarm pause the update before the
+  head spec settled.
+- My first attempt (`restart_policy.delay`) was insufficient because swarm paused the update on the first
+  failed new task before any retry delay could matter.
+- My second attempt (wrapping Ghost in `command: sh -ec ...`) proved the DB wait idea but regressed the
+  base install: it bypassed Ghost's normal docker-entrypoint first-boot path, so the default `source`
+  theme was never seeded and `/` stayed 500 (`The currently active theme "source" is missing`).
+- Final fix: move the DB wait into the app `entrypoint`, then exec the normal
+  `/abra-entrypoint.sh node current/index.js` path. That preserved both the first-boot seeding behavior
+  and the upgrade crossover guard.
+
+The finished overlay in `tests/ghost/compose.ccci.yml` now does three things and nothing more:
+
+1. keep the existing 15m app healthcheck grace,
+2. keep the existing 15m db healthcheck grace,
+3. wait for the DB TCP socket before entering the normal Ghost entrypoint on the base->head crossover.
+
+Verification:
+
+```bash
+$ ssh cc-ci 'jq -r ".results, .stages" /var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json'
+{
+  "install": "pass",
+  "upgrade": "pass"
+}
+[
+  {"name":"install","status":"pass",...},
+  {"name":"upgrade","status":"pass",...},
+  {"name":"lint","status":"pass",...}
+]
+
+$ ssh cc-ci 'tok=$(cat /run/secrets/bridge_drone_token); curl -fsS -H "Authorization: Bearer $tok" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/585 | jq -r "[.number,.status,.after,.params.RECIPE,.params.PR,.params.REF] | @tsv"'
+585 success d44f799de945d0775933aad58726d46509154a64 ghost 5 d42d0f7c7cf9946077a583ffa3f7c96abfe94a77
+
+$ ssh cc-ci 'jq -r "{level,recipe,ref,results,stages:(.stages|map({name,status}))}" /var/lib/cc-ci-runs/585/results.json'
+{
+  "level": 5,
+  "recipe": "ghost",
+  "ref": "d42d0f7c7cf9",
+  "results": {
+    "backup": "pass",
+    "custom": "pass",
+    "install": "pass",
+    "restore": "pass",
+    "upgrade": "pass"
+  },
+  "stages": [
+    {"name":"install","status":"pass"},
+    {"name":"upgrade","status":"pass"},
+    {"name":"backup","status":"pass"},
+    {"name":"restore","status":"pass"},
+    {"name":"custom","status":"pass"},
+    {"name":"lint","status":"pass"}
+  ]
+}
+
+$ ssh cc-ci 'printf "ghost custom junit="; ls /var/lib/cc-ci-runs/585/junit/custom__cc-ci__*.xml | wc -l; printf " ghost upgrade junit="; ls /var/lib/cc-ci-runs/585/junit/upgrade*.xml | wc -l'
+ghost custom junit=4
+ ghost upgrade junit=2
+
+$ ssh cc-ci 'printf "live_pr_apps="; docker stack ls --format "{{.Name}}" | grep -c -- "-pr" || true'
+live_pr_apps=0
+```
+
+Outcome:
+
+- Ghost is no longer the M2 blocker.
+- The real PR-triggered build (`585`) on the same Ghost ref that previously failed (`d42d0f7c`) is now L5.
+- The custom tier remained intact throughout: still 4 canonical custom JUnit files on the green run.
+- With Ghost green and teardown clean, the cfold phase is ready for a formal M2 claim.
diff --git a/machine-docs/STATUS-cfold.md b/machine-docs/STATUS-cfold.md
index 865ca16..978cb84 100644
--- a/machine-docs/STATUS-cfold.md
+++ b/machine-docs/STATUS-cfold.md
@@ -53,16 +53,18 @@ Adversary verdict:
 
 ---
 
-## M2 — IN PROGRESS
+## M2 — CLAIMED, awaiting Adversary
+
+Gate: M2 — CLAIMED, awaiting Adversary
 
 Current work item:
-- full real-CI `!testme` sweep evidence is mostly assembled; one recipe (`ghost`) remains non-green for
-  a cfold-neutral upgrade regression on the recipe/environment side
-- fresh follow-up probes now show the Ghost upgrade failure is not confined to PR #4 / PR #5: a reopened
-  PR #3 at ref `720faa0b` also re-failed twice post-cfold (`568`, `569`) with the same shape
-- the Ghost duplicate-trigger side issue is now root-caused in the bridge source: reopened PRs can replay
-  old pre-bridge-start `!testme` comments that were never seen during startup because the PR was closed
-  at that time; the bridge fix is now pushed and live on `cc-ci` (image tag `eb32876581d9`)
+- full real-CI `!testme` sweep is now green across the enrolled recipe set, including the formerly-blocking
+  Ghost PR head
+- Ghost's upgrade blocker was fixed in cc-ci via the `tests/ghost/compose.ccci.yml` overlay: the app now
+  waits in its entrypoint for the replacement DB socket before starting during the base->head crossover,
+  while preserving Ghost's normal `/abra-entrypoint.sh node current/index.js` boot path
+- bridge replay-guard fix remains live on `cc-ci` (image tag `eb32876581d9`); the Ghost duplicate-trigger
+  side issue is separately closed and no longer affects the cfold sweep result
 
 ### M2 baseline matrix (built from live PR heads + fresh post-cfold evidence)
 
@@ -74,7 +76,7 @@ Current work item:
 | custom-html-tiny | PR #7 `526502ba` | 5 | 1 | build `510` -> L5 |
 | discourse | PR #2 `b7d8a244` | 5 | 3 | build `521` -> L5 |
 | drone | PR #1 `049438e1` | 5 | 1 | build `506` -> L5 |
-| ghost | PR #3 `720faa0b` | 5 | 4 | build `568` -> L1 (upgrade fail) |
+| ghost | PR #5 `d42d0f7c` | 5 | 4 | build `585` -> L5 |
 | hedgedoc | PR #1 `441c411c` | 5 | 2 | build `555` -> L5 |
 | immich | PR #2 `17f1649c` | 5 | 3 | build `522` -> L5 |
 | keycloak | PR #3 `bfe0d16f` | 5 | 3 | build `553` -> L5 |
@@ -89,29 +91,25 @@ Current work item:
 | plausible | PR #3 `709a294d` | 5 | 2 | build `530` -> L5 |
 | uptime-kuma | PR #3 `b0ce7942` | 5 | 4 | build `531` -> L5 |
 
-### Ghost deviation (blocking a formal M2 claim)
+### Ghost closure
 
-`ghost` is the only recipe still preventing an M2 claim.
+`ghost` was the final M2 blocker and is now green on the real `!testme` path.
 
-- Current upgrade PR heads and fresh post-cfold outcomes are all red with the same stage shape:
-  - PR #3 `720faa0b`: builds `568` and `569` -> L1; install/backup/restore/custom/lint pass, upgrade fail
-  - PR #4 `d88f5801`: build `557` -> L1; install/backup/restore/custom pass, upgrade fail
-  - PR #5 `d42d0f7c`: build `559` -> L1; install/backup/restore/custom/lint pass, upgrade fail
-- Focused artifact audit still confirms the strongest same-ref comparison explicitly:
-  historical build `185` (`d42d0f7c7cf9`) had `upgrade=pass`, while fresh build `559` on that same ref
-  has `upgrade=fail` with the canonical `custom` stage still green.
-- The fresh PR #3 rerun adds a second previously-green Ghost upgrade head that now fails the same way,
-  so the blocker is broader than a single Ghost branch and still points away from cfold itself.
-- Side observation from the PR #3 retrigger: a single `!testme` comment at `2026-06-13T00:07:50Z` spawned
-  three new Ghost runs (`568`, `569`, `570`). All three are now red with the same upgrade-only
-  failure.
-- Root cause of the triple-trigger: bridge logs show those three runs were tied to three distinct comment
-  ids on the reopened PR (`14029`, `14032`, `14497`), not one comment processed three times. The poller
-  replayed two historical `!testme` comments that predated the current bridge process because PR #3 was
-  closed during bridge startup and only became visible to the poller after reopen.
-- Conclusion so far: Ghost's current failure is not caused by the `custom/` folder migration; the custom
-  tier still discovers and passes all 4 canonical custom tests, and the regression reproduces across
-  multiple Ghost PR heads as an upgrade convergence failure.
+- Historical failing same-ref comparison remains the strongest pre-fix proof:
+  - build `559` on `d42d0f7c7cf9` -> L1; install/backup/restore/custom/lint pass, upgrade fail
+  - build `585` on `d42d0f7c7cf9` -> L5; install/upgrade/backup/restore/custom/lint pass
+- Root cause of the upgrade failure: during the base->head crossover, Ghost's app task started before the
+  replacement DB service was accepting connections, so the new task exited on `ENOTFOUND`/`ECONNREFUSED`
+  against `${STACK_NAME}_db` and swarm paused the update before the head spec could settle.
+- Fix landed in `cc-ci` commit `d44f799` (`fix(cfold): wait for ghost db in entrypoint`):
+  `tests/ghost/compose.ccci.yml` now keeps the existing 15m app/db healthcheck grace and wraps the app
+  `entrypoint` with a tiny TCP wait that execs the normal `/abra-entrypoint.sh node current/index.js`
+  path only after the DB socket is reachable.
+- Focused same-code-path repro after the fix:
+  - `/var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json` -> `install=pass`, `upgrade=pass`
+  - log `/root/ghost-repro-cfold-3.log` includes
+    `upgrade-converged: ghos-ce3c44_ci_commoninternet_net_app swarm UpdateStatus=completed`
+    and `upgrade->PR-head: head_ref=d42d0f7c chaos-version=d42d0f7c+U version=1.2.0+6.21.2-alpine->1.4.0+6.44.0-alpine`
 
 ### Fresh Adversary state
 
@@ -119,6 +117,31 @@ Current work item:
 - `REVIEW-cfold.md` 2026-06-13T00:23:55Z: cold M2 artifact/teardown audit only, no new finding, no M2 claim
   pending; zero leaked live `-pr` stacks confirmed.
 
+WHAT:
+- M2 is now met: the full real-CI `!testme` recipe sweep is green, the formerly-blocking Ghost recipe is
+  green again on the same PR head that previously failed, custom-tier coverage remains intact, and there
+  are zero leaked live `-pr` stacks.
+
+HOW:
+- `ssh cc-ci 'tok=$(cat /run/secrets/bridge_drone_token); curl -fsS -H "Authorization: Bearer $tok" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/585 | jq -r "[.number,.status,.after,.params.RECIPE,.params.PR,.params.REF] | @tsv"'`
+- `ssh cc-ci 'jq -r "{level,recipe,ref,results,stages:(.stages|map({name,status}))}" /var/lib/cc-ci-runs/585/results.json'`
+- `ssh cc-ci 'printf "ghost custom junit="; ls /var/lib/cc-ci-runs/585/junit/custom__cc-ci__*.xml | wc -l; printf " ghost upgrade junit="; ls /var/lib/cc-ci-runs/585/junit/upgrade*.xml | wc -l'`
+- `ssh cc-ci 'printf "live_pr_apps="; docker stack ls --format "{{.Name}}" | grep -c -- "-pr" || true'`
+- `ssh cc-ci 'jq -r ".results, .stages" /var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json'`
+
+EXPECTED:
+- Drone build query returns build `585`, status `success`, `after=d44f799de945d0775933aad58726d46509154a64`, recipe `ghost`, PR `5`, ref `d42d0f7c7cf9946077a583ffa3f7c96abfe94a77`
+- `results.json` for build `585` shows `level: 5` and `results.install=pass`, `results.upgrade=pass`, `results.backup=pass`, `results.restore=pass`, `results.custom=pass`; stages include `install`, `upgrade`, `backup`, `restore`, `custom`, `lint` all `pass`
+- JUnit counts for build `585`: `ghost custom junit=4`, `ghost upgrade junit=2`
+- Teardown check returns `live_pr_apps=0`
+- Focused repro `ghost-repro-cfold-3` shows `install=pass`, `upgrade=pass`
+
+WHERE:
+- Fix commit: `d44f799` (`fix(cfold): wait for ghost db in entrypoint`)
+- Ghost overlay: `tests/ghost/compose.ccci.yml`
+- Real CI proof: `/var/lib/cc-ci-runs/585/results.json`, `/var/lib/cc-ci-runs/585/junit/`
+- Focused repro proof: `/var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json`, `/root/ghost-repro-cfold-3.log`
+
 ---
 
 ## Baseline (pre-cfold) — custom test count per recipe