JOURNAL — phase cfold
2026-06-11 — Phase cfold start
Investigation findings
Pre-existing test layout:
- 60 files in functional/ subdirs across 20 recipes
- 4 files in playwright/ subdirs (cryptpad, custom-html, uptime-kuma)
- Helper modules to move: _discourse.py, _ghost.py, _mailu.py, _mm.py, _mumble_proto.py, drone/functional/__init__.py
- mailu/test_backup.py, test_restore.py, and ops.py explicitly add functional/ to sys.path; they need updating to custom/
Decision: deprecated aliases
Per plan §2 option (RECOMMENDED): keep recognizing functional/ and playwright/ as deprecated aliases
AND emit a loud one-line warning when a test is found in a deprecated folder. The two options were
warnings.warn() during discovery or a direct print(). Will use print() (stderr) so it shows up in CI
logs without needing to configure warning filters.
Implementation: subdirs = ("custom", "functional", "playwright") — canonical first — and after
finding a test in functional/ or playwright/, emit:
print(f"WARNING [cfold]: test found in deprecated folder '{sub}/' — move to custom/: {path}", flush=True, file=sys.stderr)
This way:
- custom/ is canonical and gets discovered first
- Old folders still work (zero breakage for repo-local tests) but emit a loud warning
- No silent coverage loss possible
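The canonical-first search plus stderr warning can be sketched as follows. This is a minimal illustration, not the contents of runner/harness/discovery.py: the function name discover_custom_tests, the flat tests/<recipe>/<subdir>/test_*.py layout, and the ASCII hyphen in the message are assumptions.

```python
import sys
from pathlib import Path

# Canonical folder first; the other two are deprecated aliases.
SUBDIRS = ("custom", "functional", "playwright")
DEPRECATED = {"functional", "playwright"}


def discover_custom_tests(recipe_dir: Path) -> list[Path]:
    """Hypothetical sketch: collect test_*.py files from a recipe's test folders.

    custom/ is searched first; every test found in a deprecated alias folder
    triggers a one-line warning on stderr so it is visible in CI logs.
    """
    found: list[Path] = []
    for sub in SUBDIRS:
        folder = recipe_dir / sub
        if not folder.is_dir():
            continue
        for path in sorted(folder.glob("test_*.py")):
            if sub in DEPRECATED:
                print(
                    f"WARNING [cfold]: test found in deprecated folder '{sub}/' - move to custom/: {path}",
                    flush=True,
                    file=sys.stderr,
                )
            found.append(path)
    return found
```

Using print() rather than warnings.warn() avoids the default warning filters, which can hide repeated warnings from the same call site.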
2026-06-12 — M1 checkpoint: canonical custom/ layout landed locally
Code/work completed:
- runner/harness/discovery.py: canonical custom/ discovery, deprecated-alias warnings, and a custom_subdir_label() normalization helper.
- runner/harness/manifest.py: custom-test counts now normalize to canonical custom.
- all cc-ci custom tests and helper modules moved from tests/<recipe>/{functional,playwright}/ into tests/<recipe>/custom/.
- helper-import fallout fixed where needed (tests/mailu/{ops.py,test_backup.py,test_restore.py}).
- docs updated to describe custom/ as the canonical layout and explain the alias-compatibility window.
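Below is a sketch of what a custom_subdir_label()-style normalization and the manifest's per-recipe count might look like. Only the helper's name comes from this journal. Its signature, the alias set, and count_custom_tests are hypothetical, and the code assumes paths of the form tests/<recipe>/<subdir>/test_*.py.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Optional

# Folders that belong to the custom tier; only "custom" is canonical.
_CUSTOM_ALIASES = {"custom", "functional", "playwright"}


def custom_subdir_label(subdir: str) -> Optional[str]:
    """Hypothetical: map any custom-tier folder (canonical or alias) to "custom"."""
    return "custom" if subdir in _CUSTOM_ALIASES else None


def count_custom_tests(paths: list[str]) -> Counter:
    """Count custom tests per recipe, regardless of which alias folder holds them."""
    counts: Counter = Counter()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Expect exactly tests/<recipe>/<subdir>/<file>; skip anything else.
        if len(parts) != 4 or parts[0] != "tests":
            continue
        _, recipe, subdir, name = parts
        if not (name.startswith("test_") and name.endswith(".py")):
            continue  # helper modules such as _ghost.py are not tests
        if custom_subdir_label(subdir) == "custom":
            counts[recipe] += 1
    return counts
```

Normalizing the label before counting means a recipe that is only partly migrated still reports one combined custom count.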
Mechanical move summary:
- 64 custom test files relocated into custom/
- helper modules relocated too: _discourse.py, _ghost.py, _mailu.py, _mm.py, _mumble_proto.py, tests/drone/custom/__init__.py
Verification:
nix shell nixpkgs#python312Packages.pytest --command pytest \
tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q
# ..................
# 18 passed in 0.09s
Post-move grep state:
- remaining functional/ and playwright/ matches in live code are intentional: alias-policy docs, deprecated-folder assertions in the unit tests, and discovery comments describing the alias behavior.
- the pre-migration inventory in BACKLOG-cfold.md is intentionally unchanged because it is the M1 baseline record the Adversary will compare against.
2026-06-12 — M1 coverage proof assembled
Verification commands + observed outputs:
$ git ls-files "tests/*/custom/test_*.py" | wc -l
64
$ git ls-files "tests/*/functional/*" "tests/*/playwright/*"
# no output
$ for recipe in bluesky-pds cryptpad custom-html custom-html-tiny discourse drone ghost hedgedoc immich keycloak lasuite-docs lasuite-drive lasuite-meet mailu matrix-synapse mattermost-lts mumble n8n plausible uptime-kuma; do count=$(git ls-files "tests/$recipe/custom/test_*.py" | wc -l); printf "%s %s\n" "$recipe" "$count"; done
bluesky-pds 4
cryptpad 4
custom-html 4
custom-html-tiny 1
discourse 3
drone 1
ghost 4
hedgedoc 2
immich 3
keycloak 3
lasuite-docs 5
lasuite-drive 3
lasuite-meet 3
mailu 3
matrix-synapse 3
mattermost-lts 3
mumble 5
n8n 4
plausible 2
uptime-kuma 4
$ nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_discovery.py tests/unit/test_discovery_phase2.py tests/unit/test_manifest.py -q
..................
18 passed in 0.14s
Conclusion: the migrated tree still contains the exact same 64 custom test files with the same
per-recipe cardinality as the pre-cfold baseline in BACKLOG-cfold.md; only the folder paths changed.
2026-06-12 — Adversary M1 PASS received
Pulled review(cfold): M1 PASS cold verification (4b4d665). Confirmed in REVIEW-cfold.md:
- total canonical custom tests = 64
- old tracked functional/ and playwright/ trees = none
- per-recipe counts match the baseline exactly
- focused unit suite = 18 passed
- deprecated-alias warning probe works
- normalized (recipe, filename) before/after set = exact match (missing [], extra [])
No fix-forward required. Phase advances to M2 baseline assembly.
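The Adversary's set check (per-recipe counts plus a normalized (recipe, filename) diff) can be sketched as follows. This is an illustration, not the Adversary's actual script. The function names are hypothetical, and the code assumes every test sits exactly at tests/<recipe>/<subdir>/<file>.

```python
from collections import Counter
from pathlib import PurePosixPath


def normalize(path: str) -> tuple[str, str]:
    """Reduce tests/<recipe>/<subdir>/<file> to (recipe, file), dropping the folder."""
    parts = PurePosixPath(path).parts
    if len(parts) != 4 or parts[0] != "tests":
        raise ValueError(f"unexpected layout: {path}")
    return parts[1], parts[3]


def compare_layouts(before: list[str], after: list[str]) -> dict:
    """Compare pre- and post-migration test paths, ignoring which folder they live in."""
    b = {normalize(p) for p in before}
    a = {normalize(p) for p in after}
    return {
        "missing": sorted(b - a),  # tests that disappeared in the move
        "extra": sorted(a - b),    # tests that appeared from nowhere
        "per_recipe_match": Counter(r for r, _ in b) == Counter(r for r, _ in a),
        "all_canonical": all(PurePosixPath(p).parts[2] == "custom" for p in after),
    }
```

Because the diff runs on sets, two same-named files in different alias folders of one recipe would collapse into one entry. The per-file `git ls-files ... | wc -l` counts above would still expose that case.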
2026-06-12 — M2 sweep snapshot: 19 fresh greens, Ghost upgrade regression remains
Bootstrap/access re-checks before the live sweep:
$ ssh cc-ci "hostname && whoami && nixos-version"
nixos
root
24.11.20250630.50ab793 (Vicuna)
$ set -a; . /srv/cc-ci/.testenv; set +a; curl -fsS "https://$GITEA_URL/api/v1/version"
{"version":"1.24.2"}
$ getent hosts "probe-$RANDOM.ci.commoninternet.net"
91.98.47.73 probe-4360.ci.commoninternet.net
Open-PR inventory before triggering uncovered recipes showed 16 enrolled repos already had live PRs;
custom-html, keycloak, cryptpad, and mumble did not. I reopened reusable closed PRs for the
first three (custom-html#2, keycloak#3, cryptpad#5) and created a minimal sweep-only mumble#1
probe PR via the Gitea API.
Fresh post-cfold success set gathered from the live server (/var/lib/cc-ci-runs/<build>/results.json):
506 drone L5
510 custom-html-tiny L5
521 discourse L5
522 immich L5
523 lasuite-docs L5
524 lasuite-drive L5
525 lasuite-meet L5
526 mailu L5
527 matrix-synapse L5
528 n8n L5
529 mattermost-lts L5
530 plausible L5
531 uptime-kuma L5
541 custom-html L5
553 keycloak L5
554 cryptpad L5
555 hedgedoc L5
556 bluesky-pds L5
558 mumble L5
Ghost is the lone non-green outlier:
557 ghost PR#4 @ d88f5801 -> L1 (install pass, upgrade fail, backup/restore/custom pass)
559 ghost PR#5 @ d42d0f7c -> L1 (same failure shape on last known-green Ghost head)
185 ghost PR#4 @ d42d0f7c -> L4 / pre-lint-era green baseline on 2026-06-05
The critical Ghost comparison is the same ref d42d0f7c:
- historical build 185 (2026-06-05): upgrade passed at d42d0f7c
- fresh probe build 559 (2026-06-12): the same d42d0f7c now fails upgrade with swarm UpdateStatus='paused'
That isolates the regression from cfold itself. In both fresh Ghost failures (557, 559), the
custom tier still discovered and passed all four tests/ghost/custom/test_*.py files, while the
upgrade op failed before upgrade assertions could run:
!! upgrade op failed: <ghost-domain>: upgrade redeploy did NOT converge to the head spec — swarm UpdateStatus='paused'.
The recipe's app service uses update_config failure_action=rollback/pause; the NEW (head) task failed swarm's update monitor,
so the service reverted/paused and the RUNNING spec is the previous version, not the code under test.
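The harness's convergence check is not shown in this journal. The sketch below assumes it reads the UpdateStatus field from `docker service inspect` output. That field is real, and the Docker Engine API lists its State values as updating, paused, completed, rollback_started, rollback_paused, and rollback_completed. The helper name upgrade_converged is hypothetical.

```python
import json

# UpdateStatus.State values that mean the new spec did not settle.
_FAILED_STATES = {"paused", "rollback_started", "rollback_paused", "rollback_completed"}


def upgrade_converged(inspect_json: str) -> tuple[bool, str]:
    """Decide from `docker service inspect <svc>` output whether a rolling update
    finished on the new spec. Returns (converged, reason)."""
    services = json.loads(inspect_json)  # inspect prints a JSON array
    status = services[0].get("UpdateStatus")
    if status is None:
        # No rolling update was recorded, so an upgrade cannot have converged.
        return False, "no rolling update recorded"
    state = status.get("State", "")
    if state == "completed":
        return True, "update completed"
    if state in _FAILED_STATES:
        return False, f"UpdateStatus={state!r}: {status.get('Message', '')}"
    return False, f"update still in progress ({state!r})"
```

With failure_action=pause, the Ghost failure above ends in the "paused" branch. The running task then still carries the previous spec, which explains why the upgrade assertions never ran.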
Adversary update pulled during this pass:
- review(cfold) commit 93f56ae added only an idle audit entry to REVIEW-cfold.md
- no finding filed
- no M2 PASS yet because no claim(cfold): M2 ... commit exists
2026-06-12 — Follow-up Ghost artifact audit (same-ref historical pass vs fresh fail)
Focused cold checks after the M2 sweep snapshot:
$ ssh cc-ci "jq '{level,recipe,ref,results,rungs,stages:(.stages|map({name,status}))}' /var/lib/cc-ci-runs/185/results.json"
{
"level": 4,
"recipe": "ghost",
"ref": "d42d0f7c7cf9",
"results": {
"backup": "pass",
"custom": "pass",
"install": "pass",
"restore": "pass",
"upgrade": "pass"
},
"rungs": {
"backup_restore": "pass",
"functional": "pass",
"install": "pass",
"integration": "na",
"recipe_local": "na",
"upgrade": "pass"
},
"stages": [
{"name": "install", "status": "pass"},
{"name": "upgrade", "status": "pass"},
{"name": "backup", "status": "pass"},
{"name": "restore", "status": "pass"},
{"name": "custom", "status": "pass"}
]
}
$ ssh cc-ci "jq '{level,recipe,stages:(.stages|map({name,status,summary}))}' /var/lib/cc-ci-runs/559/results.json"
{
"level": 1,
"recipe": "ghost",
"stages": [
{"name": "install", "status": "pass", "summary": null},
{"name": "backup", "status": "pass", "summary": null},
{"name": "restore", "status": "pass", "summary": null},
{"name": "custom", "status": "pass", "summary": null},
{"name": "lint", "status": "pass", "summary": null}
]
}
$ ssh cc-ci "grep -R -n \"start_period\" /var/lib/cc-ci-runs/559/abra/recipes/ghost"
/var/lib/cc-ci-runs/559/abra/recipes/ghost/compose.yml:60: start_period: 15m
/var/lib/cc-ci-runs/559/abra/recipes/ghost/compose.yml:84: start_period: 1m
/var/lib/cc-ci-runs/559/abra/recipes/ghost/compose.ccci.yml:35: start_period: 15m
/var/lib/cc-ci-runs/559/abra/recipes/ghost/compose.ccci.yml:38: start_period: 15m
Conclusion:
- Historical build 185 passed the full Ghost lifecycle on the SAME ref now used in probe build 559 (d42d0f7c7cf9), so the current M2 blocker is not tied to the custom/ folder migration.
- Fresh failing runs still execute the canonical 4-file tests/ghost/custom/ suite and pass every non-upgrade stage; the missing upgrade junit output remains the key symptom.
- The current repo does not show an obvious cfold-local fix to apply: the Ghost-specific overlay is unchanged, the recipe artifact still carries the expected compose.ccci.yml file, and the failure remains in the live upgrade path rather than in discovery or custom-test coverage.
- Net: cfold remains blocked on a cfold-neutral Ghost upgrade regression / flake. No repo-local code change was justified by that audit alone.
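The same-ref comparison between builds 185 and 559 amounts to diffing the stages arrays of two results.json files. Here is a sketch that relies only on the fields visible in the jq output above (level, stages[].name, stages[].status). stage_diff and load_results are hypothetical helpers.

```python
import json


def load_results(path: str) -> dict:
    """Read a cc-ci results.json file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def stage_diff(old: dict, new: dict) -> dict:
    """Report stages that vanished, appeared, or changed status between two runs."""
    old_st = {s["name"]: s["status"] for s in old.get("stages", [])}
    new_st = {s["name"]: s["status"] for s in new.get("stages", [])}
    shared = sorted(set(old_st) & set(new_st))
    return {
        "missing_in_new": sorted(set(old_st) - set(new_st)),
        "added_in_new": sorted(set(new_st) - set(old_st)),
        "status_changed": {n: (old_st[n], new_st[n]) for n in shared if old_st[n] != new_st[n]},
        "level": (old.get("level"), new.get("level")),
    }
```

Run against builds 185 and 559, this would report upgrade as the only stage missing from the fresh run. lint shows up as new only because 185 predates the lint stage.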
2026-06-13 — Ghost PR #3 fresh probe after reopen: same upgrade-only failure, plus duplicate trigger signal
I looked for the smallest allowed M2 step that did not touch recipe code: reuse an existing Ghost PR head
that had historically gone green and rerun it through the live !testme path.
Actions taken:
$ set -a && . /srv/cc-ci/.testenv && set +a
$ curl -fsS -u "$GITEA_USERNAME:$GITEA_PASSWORD" -X PATCH \
-H 'Content-Type: application/json' \
-d '{"state":"open"}' \
"https://$GITEA_URL/api/v1/repos/recipe-maintainers/ghost/pulls/3"
# PR #3 reopened; head remains 720faa0bebc46a34857b2933df1924ccabbd4087
$ curl -fsS -u "$GITEA_USERNAME:$GITEA_PASSWORD" -X POST \
-H 'Content-Type: application/json' \
-d '{"body":"!testme"}' \
"https://$GITEA_URL/api/v1/repos/recipe-maintainers/ghost/issues/3/comments"
# comment 14497 created at 2026-06-13T00:07:50Z
Fresh live outcomes:
$ ssh cc-ci 'jq "{run_id, pr, recipe, ref, level, results, stages: (.stages | map({name,status,summary}))}" /var/lib/cc-ci-runs/568/results.json'
{
"run_id": "568",
"pr": "3",
"recipe": "ghost",
"ref": "720faa0bebc4",
"level": 1,
"results": {
"backup": "pass",
"custom": "pass",
"install": "pass",
"restore": "pass",
"upgrade": "fail"
},
"stages": [
{"name": "install", "status": "pass", "summary": null},
{"name": "backup", "status": "pass", "summary": null},
{"name": "restore", "status": "pass", "summary": null},
{"name": "custom", "status": "pass", "summary": null},
{"name": "lint", "status": "pass", "summary": null}
]
}
$ ssh cc-ci 'jq "{run_id, pr, recipe, ref, level, finished, results, stages: (.stages | map({name,status}))}" /var/lib/cc-ci-runs/569/results.json'
{
"run_id": "569",
"pr": "3",
"recipe": "ghost",
"ref": "720faa0bebc4",
"level": 1,
"finished": 1781309502.5494862,
"results": {
"backup": "pass",
"custom": "pass",
"install": "pass",
"restore": "pass",
"upgrade": "fail"
},
"stages": [
{"name": "install", "status": "pass"},
{"name": "backup", "status": "pass"},
{"name": "restore", "status": "pass"},
{"name": "custom", "status": "pass"},
{"name": "lint", "status": "pass"}
]
}
Comment-stream evidence for duplicate triggers from one !testme:
$ curl -fsS -u "$GITEA_USERNAME:$GITEA_PASSWORD" \
"https://$GITEA_URL/api/v1/repos/recipe-maintainers/ghost/issues/3/comments?limit=20"
# ...
# 14497: !testme (2026-06-13T00:07:50Z)
# 14498: cc-ci failure comment for run 568 (2026-06-13T00:08:05Z)
# 14499: cc-ci in-progress comment for run 569 (2026-06-13T00:08:05Z)
# 14500: cc-ci in-progress comment for run 570 (2026-06-13T00:08:05Z)
Takeaways:
- Ghost is now freshly red post-cfold on three distinct PR heads (720faa0b, d88f5801, d42d0f7c), all with the same upgrade-only failure shape while custom discovery stays green.
- That further weakens any cfold-local explanation; the blocker remains in Ghost's live upgrade path.
- There is also likely a separate trigger-dedupe problem: one !testme comment spawned runs 568, 569, and 570. I did not broaden into a D1 investigation in this loop step because cfold M2 is already hard-blocked by Ghost's repeated upgrade failures, but the evidence is now recorded.
2026-06-13 — Root-caused Ghost triple-trigger replay; bridge fix authored with unit coverage
Pulled the Adversary's latest cfold audit (review(cfold) ddefc96). It was not an M2 verdict or a
finding; it confirmed the sweep is still unclaimable while teardown remains clean (live_pr_apps=0).
I then closed out the duplicate-run side observation from the Ghost PR #3 retrigger.
Evidence:
$ ssh cc-ci 'docker logs --since "2026-06-13T00:07:30" --until "2026-06-13T00:08:30" c54c433972ac 2>&1'
[poll] triggered build 568 for ghost@720faa0b (PR #3, comment 14029) by autonomic-bot
[poll] triggered build 569 for ghost@720faa0b (PR #3, comment 14032) by autonomic-bot
[poll] triggered build 570 for ghost@720faa0b (PR #3, comment 14497) by autonomic-bot
$ ssh cc-ci 'docker service ps ccci-bridge_app --no-trunc'
# single running replica only; no restart near the incident
$ ssh cc-ci 'docker ps --format "{{.ID}} {{.Names}} {{.Status}}" | grep ccci-bridge || true'
c54c433972ac ccci-bridge_app.1.u5msezm603izeyf7kizqxq97j Up 22 hours
Conclusion: this was NOT one comment id deduped incorrectly inside a single process. The poller, following its existing logic, treated THREE distinct comment ids as unseen after PR #3 was reopened:
- 14029 and 14032 were historical !testme comments from when PR #3 had been open earlier.
- PR #3 was closed when the current bridge process started, so those comments were not covered by the startup pass that marks pre-existing comments seen.
- When PR #3 was reopened, the poller saw those old comments for the first time and replayed them, then also processed the fresh comment 14497.
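The [poll] lines above support a quick replay check: group triggered builds by PR head and look at the comment ids. The sketch assumes every trigger line follows the format shown above. triggers_by_head is a hypothetical helper, not part of the bridge.

```python
import re
from collections import defaultdict

# Matches bridge lines such as:
# [poll] triggered build 568 for ghost@720faa0b (PR #3, comment 14029) by autonomic-bot
_TRIGGER = re.compile(
    r"\[poll\] triggered build (?P<build>\d+) for (?P<recipe>[\w.-]+)@(?P<ref>[0-9a-f]+) "
    r"\(PR #(?P<pr>\d+), comment (?P<comment>\d+)\)"
)


def triggers_by_head(log_text: str) -> dict[tuple[str, int, str], list[tuple[int, int]]]:
    """Return only the (recipe, PR, ref) heads that were triggered more than once,
    each with its list of (build, comment id) pairs."""
    groups: dict[tuple[str, int, str], list[tuple[int, int]]] = defaultdict(list)
    for m in _TRIGGER.finditer(log_text):
        key = (m["recipe"], int(m["pr"]), m["ref"])
        groups[key].append((int(m["build"]), int(m["comment"])))
    return {k: v for k, v in groups.items() if len(v) > 1}
```

For the incident above, it groups builds 568, 569, and 570 under ghost PR #3 @ 720faa0b with three different comment ids. That is the replay signature. A dedupe bug would show one comment id triggering several builds.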
Repo fix authored:
- bridge/bridge.py: added _PROCESS_STARTED_AT and _is_preexisting_comment() so the poller now marks any trigger comment older than the current bridge process as already-seen, even if the PR was closed at startup and only becomes visible later via reopen.
- tests/unit/test_bridge_trigger.py: added focused tests for pre-start vs post-start comment handling.
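The journal records only the names _PROCESS_STARTED_AT and _is_preexisting_comment(). The sketch below is my reconstruction of the idea, not the committed code. PROCESS_STARTED_AT, is_preexisting_comment, and should_trigger are stand-in names. The sketch assumes Gitea comment objects carry id, created_at, and body, and that the seen-set lives in memory.

```python
from datetime import datetime, timezone

# Stand-in for the bridge's start timestamp, captured once at import time.
PROCESS_STARTED_AT = datetime.now(timezone.utc)


def is_preexisting_comment(created_at: str, started_at: datetime = PROCESS_STARTED_AT) -> bool:
    """True if the comment was created before this bridge process started."""
    # Older Pythons' fromisoformat() rejects a trailing "Z", so rewrite it as an offset.
    created = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
    return created < started_at


def should_trigger(comment: dict, seen: set, started_at: datetime = PROCESS_STARTED_AT) -> bool:
    """Fire at most once per comment id, and never for comments older than the process."""
    cid = comment["id"]
    if cid in seen:
        return False
    seen.add(cid)  # mark seen unconditionally so a later reopen cannot replay it
    if is_preexisting_comment(comment["created_at"], started_at):
        return False
    return comment["body"].strip() == "!testme"
```

Keying on process start time has a trade-off. A !testme posted while the bridge is down (for example during a redeploy) counts as pre-existing and is ignored, so it has to be re-posted.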
Verification:
$ nix shell nixpkgs#python311Packages.pytest -c pytest tests/unit/test_bridge_trigger.py -q
.......... [100%]
10 passed in 0.04s
$ ssh cc-ci 'nixos-rebuild switch --flake "git+file:///root/cfold-deploy?submodules=1#cc-ci"'
# rebuild succeeded; deploy-bridge.service restarted and rolled the bridge task
$ ssh cc-ci 'docker service inspect ccci-bridge_app --format "{{.Spec.TaskTemplate.ContainerSpec.Image}}"'
cc-ci-bridge:eb32876581d9
$ ssh cc-ci 'curl -fsS https://ci.commoninternet.net/hook/healthz'
ok
$ ssh cc-ci 'docker logs --since 5m 2088e44a0534 2>&1 | sed -n "1,80p"'
poller (primary) watching ['recipe-maintainers/cc-ci', ..., 'recipe-maintainers/drone'] every 30s
comment-bridge listening on 0.0.0.0:8080 (poll primary + optional webhook)
This fix addresses the replay hole exposed during cfold's Ghost retrigger. It does not change the cfold bottom line: Ghost's upgrade tier remains the lone M2 blocker, while custom discovery continues to pass.
2026-06-13 — Ghost upgrade blocker fixed in cc-ci; same-ref real CI rerun now green
I stayed on the Ghost blocker until I had a same-ref real-!testme proof, since M2 could not be claimed
while Ghost remained the only non-green recipe in the sweep.
Focused investigation sequence:
- Repros on the preserved current code faithfully showed the old failure mode: during the base->head crossover, the new Ghost app task could start before the replacement mysql service was usable, exiting on ENOTFOUND/ECONNREFUSED against ${STACK_NAME}_db, which made swarm pause the update before the head spec settled.
- My first attempt (restart_policy.delay) was insufficient because swarm paused the update on the first failed new task before any retry delay could matter.
- My second attempt (wrapping Ghost in command: sh -ec ...) proved the DB-wait idea but regressed the base install: it bypassed Ghost's normal docker-entrypoint first-boot path, so the default source theme was never seeded and / kept returning 500 (The currently active theme "source" is missing).
- Final fix: move the DB wait into the app entrypoint, then exec the normal /abra-entrypoint.sh node current/index.js path. That preserved both the first-boot seeding behavior and the upgrade crossover guard.
The finished overlay in tests/ghost/compose.ccci.yml now does three things and nothing more:
- keep the existing 15m app healthcheck grace,
- keep the existing 15m db healthcheck grace,
- wait for the DB TCP socket before entering the normal Ghost entrypoint on the base->head crossover.
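The journal does not show the overlay's actual entrypoint. Below, the wait-then-exec pattern is sketched in Python for illustration only: the Ghost image is not assumed to ship Python, so the real guard would be written in whatever the image provides. wait_for_tcp, main, and the wait_db.py name are hypothetical. The database host would be ${STACK_NAME}_db, and MySQL's default port 3306 is assumed.

```python
import os
import socket
import sys
import time


def wait_for_tcp(host: str, port: int, timeout: float = 300.0, interval: float = 2.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # covers DNS failures (ENOTFOUND-style) and ECONNREFUSED
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(argv: list[str]) -> None:
    """Hypothetical CLI: wait_db.py <host> <port> <cmd> [args...]"""
    if len(argv) < 3:
        sys.exit("usage: wait_db.py <host> <port> <cmd> [args...]")
    host, port, *cmd = argv
    if not wait_for_tcp(host, int(port)):
        sys.exit(f"database {host}:{port} not reachable")
    # Replace this process with the normal entrypoint, like `exec` in a shell script,
    # so the original first-boot path (theme seeding) still runs unchanged.
    os.execvp(cmd[0], cmd)
```

A script would call main(sys.argv[1:]) with arguments such as the DB host, 3306, /abra-entrypoint.sh, node, and current/index.js. A successful connect only proves the socket is listening, not that MySQL accepts queries. That matches the "wait for the DB TCP socket" guard described above.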
Verification:
$ ssh cc-ci 'jq -r ".results, .stages" /var/lib/cc-ci-runs/ghost-repro-cfold-3/results.json'
{
"install": "pass",
"upgrade": "pass"
}
[
{"name":"install","status":"pass",...},
{"name":"upgrade","status":"pass",...},
{"name":"lint","status":"pass",...}
]
$ ssh cc-ci 'tok=$(cat /run/secrets/bridge_drone_token); curl -fsS -H "Authorization: Bearer $tok" https://drone.ci.commoninternet.net/api/repos/recipe-maintainers/cc-ci/builds/585 | jq -r "[.number,.status,.after,.params.RECIPE,.params.PR,.params.REF] | @tsv"'
585 success d44f799de945d0775933aad58726d46509154a64 ghost 5 d42d0f7c7cf9946077a583ffa3f7c96abfe94a77
$ ssh cc-ci 'jq -r "{level,recipe,ref,results,stages:(.stages|map({name,status}))}" /var/lib/cc-ci-runs/585/results.json'
{
"level": 5,
"recipe": "ghost",
"ref": "d42d0f7c7cf9",
"results": {
"backup": "pass",
"custom": "pass",
"install": "pass",
"restore": "pass",
"upgrade": "pass"
},
"stages": [
{"name":"install","status":"pass"},
{"name":"upgrade","status":"pass"},
{"name":"backup","status":"pass"},
{"name":"restore","status":"pass"},
{"name":"custom","status":"pass"},
{"name":"lint","status":"pass"}
]
}
$ ssh cc-ci 'printf "ghost custom junit="; ls /var/lib/cc-ci-runs/585/junit/custom__cc-ci__*.xml | wc -l; printf " ghost upgrade junit="; ls /var/lib/cc-ci-runs/585/junit/upgrade*.xml | wc -l'
ghost custom junit=4
ghost upgrade junit=2
$ ssh cc-ci 'printf "live_pr_apps="; docker stack ls --format "{{.Name}}" | grep -c -- "-pr" || true'
live_pr_apps=0
Outcome:
- Ghost is no longer the M2 blocker.
- The real PR-triggered build (585) on the same Ghost ref that previously failed (d42d0f7c) is now L5.
- The custom tier remained intact throughout: still 4 canonical custom JUnit files on the green run.
- With Ghost green and teardown clean, the cfold phase is ready for a formal M2 claim.