Files
cc-ci/JOURNAL-conc.md
autonomic-bot ffcf441364
All checks were successful
continuous-integration/drone/push Build is passing
journal(conc): P1-P4 evidence (live smokes on cc-ci) + pre-existing abra app ls FATA observation
2026-06-10 04:21:17 +00:00

50 lines
3.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# JOURNAL — sub-phase conc (Builder, append-only)
## 2026-06-10 — bootstrap
Read concurrency-restructure-full-plan.md (SSOT) + plan.md §6.1/§7/§9. Oriented on the code:
- `runner/harness/lifecycle.py` — recipe flock (l.46), registry (l.6597), deploy_app
registration (l.283), teardown unregister (l.723), three-way janitor (l.726).
- `runner/run_recipe_ci.py``acquire_recipe_lock` call site (l.843), `fetch_recipe` (l.140,
rm-rf + reclone of the shared tree), janitor call sites (l.600 quick, l.932 cold).
- `.drone.yml` — recipe-ci step runs `cc-ci-run runner/run_recipe_ci.py` bare (P1 wraps it),
`concurrency.limit: 2` (P4 removes).
- Greps for P3 fallout: `~/.abra/recipes` referenced in abra.py (recipe_checkout,
has_lightweight_version_tags, recipe_head_commit, recipe_versions), generic.py:28,
lifecycle.prepull_images, run_recipe_ci (fetch_recipe, snapshot_recipe_tests, comment),
warm_reconcile.py:202 (runs OUTSIDE per-run context — keeps default), and
tests/ghost+discourse install_steps.sh (`${HOME}/.abra/recipes/...` — these run INSIDE a
run and copy compose.ccci.yml into the deploy tree, so they must resolve the per-run dir).
- `~/.abra/servers/...` paths are unaffected by design (servers/ is symlinked to the canonical
/root/.abra/servers, so both resolutions land on the same file).
Working setup: state files on main in this clone; code on branch `restructure/concurrency`
via a git worktree at ../cc-ci-conc; test runs on the cc-ci host via /root/builder-clone
(`cc-ci-run -m pytest ...`, `nix develop .#lint`).
## 2026-06-10 — P1P4 landed on restructure/concurrency
- P1 b492f99: harness/lifetime.py (PDEATHSIG+ppid recheck, SIGTERM/SIGALRM→SystemExit funnel
with re-entrancy guard, alarm(3600)); main() installs first; both finally blocks mark
begin_teardown(); .drone.yml setsid+trap wrap. Live smoke on cc-ci (cc-ci-run /tmp/p1-smoke.py):
TERM→rc=143+finally; ALRM→rc=142+finally+deadline log; parent-kill→child TERM'd, teardown ran.
- P2 b302f3a: acquire_app_lock + _probe_and_reap + janitor rewrite; registry deleted. Live smoke
(/tmp/p2-smoke*.py): held lock → "live concurrent run, leaving it", reaped=[]; killed holder →
reap exactly once + lockfile unlinked; waiter blocked during probe-held reap, then re-acquired
on the FRESH inode (probe confirmed held by waiter). Note: a select()-on-fd readline artifact
in my smoke script initially looked like a failure — kernel state was verified directly.
Unlink/recreate race guarded on BOTH sides via fstat/stat st_ino identity checks.
- P3 17ebdf3: per-run ABRA_DIR. Verified abra CLI honors $ABRA_DIR on-host (skeleton probe:
FATAs only on empty servers/; with servers+catalogue symlinks + recipes/ it works and even
auto-clones recipes for `app ls` resolution into the per-run dir). p3-smoke: setup + fetch of
custom-html-tiny landed in /tmp/p3runs/9999/abra/recipes, head commit + versions readable via
abra.recipe_dir(). install_steps.sh path fix justified in DECISIONS.md (conc P3 entry).
Pre-existing observation (NOT mine, unchanged): `abra app ls -S -m -n` currently FATAs
"unable to resolve '0cc57a5a'" under the DEFAULT abra dir too → janitor's abra discovery
yields [] and the docker-service sweep carries discovery. Out of this phase's scope.
- P4 91d3cc7: concurrency.limit removed; maxTests comment states single-knob + new model.
One stale comment line (.drone.yml l.39 "concurrency.limit=2 below") folds into P5.
All four commits: tests/unit 138 passed + lint PASS before each. Next: tests/concurrency suite.