diff --git a/BACKLOG-conc.md b/BACKLOG-conc.md
index 3e3acbc..e3efe0e 100644
--- a/BACKLOG-conc.md
+++ b/BACKLOG-conc.md
@@ -2,18 +2,18 @@
 
 ## Build backlog
 
-- [ ] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
+- [x] P1 lock-lifetime hardening: prctl PDEATHSIG + ppid race check + SIGTERM handler →
   teardown funnel + signal.alarm(3600) hard deadline; .drone.yml setsid/trap wrap; PEP 446
   comment on lock open()
-- [ ] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
+- [x] P2 flock-probe janitor: acquire_app_lock(domain) at register_run_app's call site;
   janitor probes per-domain lockfiles (acquired→reap under probe lock, held→leave, >120min
   mtime→warn); delete registry symbols
-- [ ] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs//abra with servers+catalogue symlinks,
+- [x] P3 per-run ABRA_DIR: /var/lib/cc-ci-runs//abra with servers+catalogue symlinks,
   fresh recipes/; fetch_recipe = plain clone; delete acquire_recipe_lock; route harness
   recipe paths through ABRA_DIR
-- [ ] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
-- [ ] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
-- [ ] P5 docs/concurrency.md rewrite to the new model
+- [x] P4 config cleanup: remove concurrency.limit from .drone.yml; maxTests is the single knob
+- [x] tests/concurrency suite (19 cases, real-kernel flock, explicit invocation only)
+- [x] P5 docs/concurrency.md rewrite to the new model
 
 - [ ] M1 claim (branch complete, both suites + lint green)
 - [ ] M2: merge to main after M1 PASS, push build green, live verification a–d
diff --git a/STATUS-conc.md b/STATUS-conc.md
index 2e301d1..c8a2c3a 100644
--- a/STATUS-conc.md
+++ b/STATUS-conc.md
@@ -5,15 +5,46 @@ Plan: /srv/cc-ci/cc-ci-plan/concurrency-restructure-full-plan.md (SSOT for this
 ## Phase state
 
 - Phase: conc — concurrency restructure (P1–P5 + tests/concurrency)
-- Builder branch: `restructure/concurrency` (code lands there; main untouched until M2 merge)
-- Done on branch: P1 b492f99, P2 b302f3a, P3 17ebdf3, P4 91d3cc7
-- In flight: tests/concurrency suite (19 cases), then P5 spec rewrite
-- Gate: none claimed yet
+- Builder branch: `restructure/concurrency` — COMPLETE (P1–P5 + tests), tip `d3fe9e2`
+- Gate: **M1 — CLAIMED, awaiting Adversary**
+- M2 blocked on M1 PASS (no main merge yet; main untouched by this phase except state files)
 
-## Gates
+## Gate claim: M1 — implementation verified
 
-- M1 (implementation verified): NOT CLAIMED
-- M2 (merged + live-verified): NOT CLAIMED — blocked on M1 PASS
+**WHAT**: Branch `restructure/concurrency` implements the full phase plan: P1 lock-lifetime
+hardening, P2 flock-probe janitor (registry deleted), P3 per-run ABRA_DIR (recipe flock
+deleted), P4 single concurrency knob, P5 spec rewrite, + `tests/concurrency` (20 tests covering
+the 19 plan cases). One commit per phase.
+
+**WHERE**: origin/restructure/concurrency, commits (in order):
+- P1 `b492f99` — harness/lifetime.py guards + .drone.yml setsid/trap wrap
+- P2 `b302f3a` — acquire_app_lock + _probe_and_reap + janitor rewrite; registry symbols deleted
+- P3 `17ebdf3` — setup_run_abra_dir + per-run fetch_recipe + abra.abra_dir()/recipe_dir()
+  routing; acquire_recipe_lock/RECIPE_LOCK_DIR deleted; tests/{ghost,discourse}/install_steps.sh
+  one-line RECIPE_DIR resolution fix (justification: machine-docs/DECISIONS.md "conc P3" entry)
+- P4 `91d3cc7` — concurrency.limit removed from .drone.yml; maxTests comment updated
+- tests `84d90fb` — tests/concurrency/ (real-kernel; NOT in the default unit gate)
+- P5 `d3fe9e2` — docs/concurrency.md rewritten to the new model
+
+**HOW to verify (cold, from your clone)**:
+1. `git fetch && git checkout restructure/concurrency` (tip must be `d3fe9e2`)
+2. `cc-ci-run -m pytest tests/unit -q`
+3. `cc-ci-run -m pytest tests/concurrency -q` (real flocks; uses tmp dirs, reaps its helpers)
+4. `nix develop .#lint --command bash scripts/lint.sh`
+5. dangling-reference grep (expect ZERO hits in code):
+   `grep -rn "register_run_app\|unregister_run_app\|_run_owner_state\|ACTIVE_RUN_DIR\|CCCI_JANITOR_MAX_AGE\|acquire_recipe_lock\|RECIPE_LOCK_DIR\|_stack_age_seconds" --include="*.py" --include="*.nix" --include="*.yml" --include="*.sh" .`
+6. adversarial diff review per phase plan (races, deleted-code fallout, gate integrity vs
+   RUN_APP_RE/warm apps/services_converged, test-suite blind spots vs the 19 cases)
+
+**EXPECTED**:
+- tests/unit: `138 passed`
+- tests/concurrency: `20 passed` (runtime ~10 s; spawns helper subprocesses, cleans them up)
+- lint: `lint: PASS`
+- grep: no hits outside docs/git-history references (docs/concurrency.md lists them as deleted)
+- `pytest tests/unit` does NOT collect tests/concurrency (separate dir, explicit invocation only)
+- gate-integrity notes: RUN_APP_RE, services_converged()/paused-is-settled, teardown_app order,
+  warm/canonical flows untouched; only non-assertion change under tests// is the
+  RECIPE_DIR line in ghost+discourse install_steps.sh (DECISIONS.md "conc P3")
 
 ## Blockers
 