# cc-ci Phase 1e — Generic-harness corrections (Autonomous Build Plan)

**Status:** QUEUED. Runs **after Phase 1d** and **before Phase 2** (`plan-phase2-recipe-tests.md`).
It corrects the **shared generic-test harness** from 1d, so it must land before Phase 2 authors
overlays on top of it.
**Transition:** **manual** (operator kicks it off).
**Builds on:** the Phase-1d generic suite (`runner/run_recipe_ci.py`, `runner/harness/*`,
`tests/_generic/*`, `tests/conftest.py`); see `plan-phase1d-generic-test-suite.md`.
**Owner agents:** same Builder + Adversary loops (`plan.md` §6/§7); Adversary cold-verifies.
**This file's path:** `/srv/cc-ci/cc-ci-plan/plan-phase1e-harness-corrections.md`
**Phase order:** 1c → 1b → 1d → **1e** → 2 → 2b → 3.

---

## 0. Why this phase

An operator review of the 1d generic suite (2026-05-28) found three corrections needed in the
**shared harness**, the foundation every recipe overlay (Phase 2) builds on. Fixing them once, now,
is far cheaper than fixing them after overlays exist. All three are small in code but change
behavior, so each needs a fresh Adversary cold-verification and must not weaken any existing test.

---

## 1. Definition of Done (Phase 1e exit condition)

Phase 1e terminates when every item holds **and the Adversary has independently cold-verified** it
(logged in `machine-docs/REVIEW-1e.md`):

- [ ] **HC1: Upgrade tier upgrades to the code under test (PR head), not a published tag.** The
  upgrade tier deploys the **previous published version** (last release before the PR) and then
  **upgrades to the PR head via `abra app deploy --chaos`** (chaos = the current checkout), so the
  PR's actual changes are exercised by the upgrade path. (§2.1)
- [ ] **HC2: Repo-local (PR-authored) code is not executed unless the recipe is approved.** By
  default the harness runs **only cc-ci-authored** overlays/install-steps (`tests/<recipe>/…`) +
  the generic; PR-authored repo-local `test_*.py` and `install_steps.sh` are **not run**.
  Repo-local code is honored **only for recipes on an explicit cc-ci-maintained approval
  allowlist** (default-deny). (§2.2)
- [ ] **HC3: Generic runs by default (additive); skipping it is explicit.** When a recipe ships an
  overlay for an op, the **generic still runs** alongside it by default; the generic is skipped
  **only** when an explicit env/flag opts out. The baseline floor is never lost silently. (§2.3)
- [ ] **HC4: No regression, cold-verified.** The Adversary re-runs the relevant D1–D10 / DG1–DG8
  acceptance checks from a cold start: nothing is weakened, deploy-once (DG4.1) still holds,
  teardown is still sacred, and the three new behaviors are demonstrated (HC1: a PR-head upgrade
  proven to deploy PR-head; HC2: a repo-local test is *ignored* for a non-approved recipe and *run*
  for an approved one; HC3: generic runs with an overlay present, and is skipped only with the
  opt-out set).

When HC1–HC4 hold and are confirmed, write `## DONE` to `machine-docs/STATUS-1e.md`.

---

## 2. The three corrections
### 2.1 HC1 — Upgrade to the PR head (not a published tag)
Current 1d behavior: deploy the previous published version, then `abra app upgrade` to the **newest
published tag**. Because deploying the prev tag re-checks-out the recipe, the **PR-head code is
never deployed**, so a recipe PR's changes aren't exercised by the upgrade.

Corrected:

1. Deploy the **previous published version** (the last release before the code under test) as the
   "before" state.
2. **Restore the PR-head checkout** (re-checkout the PR ref or re-use the post-fetch snapshot; the
   prev-tag deploy will have reset `~/.abra/recipes/<recipe>`).
3. **Upgrade to it via `abra app deploy --chaos`** (chaos = current checkout = PR head), in place on
   the shared deployment.
4. Assert reconverge + still serving (as today).

- **Adapt the "deployment moved" assertion** (`generic.do_upgrade`): prev→PR-head may *not* bump the
  coop-cloud version label (a PR can change a recipe without a version bump), so also accept an
  image/config change, or assert the running config now matches the PR head. Keep it non-vacuous
  without false-failing a legit unbumped PR.
- **Non-PR `!testme`** (no PR head): "current checkout" = the catalogue current, so upgrade tests
  prev→current, which is still valid.
- Preserve the **deploy-once** spirit: this is still one app deployment mutated in place (the
  prev → chaos redeploy of PR head is the upgrade op, not a fresh second app). Reconcile with the
  DG4.1 deploy-count guard: define whether a chaos redeploy counts as a "deploy" and adjust the
  guard so the legitimate upgrade isn't flagged (e.g. count `abra app new` installs, not in-place
  redeploys).
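
The adapted "deployment moved" assertion can be sketched as a pure comparison of two snapshots. This is a minimal illustration, not the existing `generic.do_upgrade` code: `DeployState`, its field names, `deployment_moved`, and `upgrade_ok` are all hypothetical, and how the harness would actually collect the version label, image list, and config digest is left open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployState:
    """Snapshot of a running deployment (hypothetical shape, for illustration)."""
    version_label: str    # coop-cloud version label reported by the deployment
    images: frozenset     # image references of the running services
    config_digest: str    # digest of the rendered config that is running


def deployment_moved(before: DeployState, after: DeployState) -> bool:
    """True if the label, the images, or the config changed across the upgrade."""
    return (
        before.version_label != after.version_label
        or before.images != after.images
        or before.config_digest != after.config_digest
    )


def upgrade_ok(before: DeployState, after: DeployState, pr_head_digest: str) -> bool:
    """Pass if something observable moved, or the running config matches the PR head."""
    return deployment_moved(before, after) or after.config_digest == pr_head_digest
```

Under these assumptions, a PR that changes an image without bumping the version label still passes, while an "upgrade" that leaves the previous release running and does not match the PR head fails, which keeps the check non-vacuous.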
### 2.2 HC2 — Repo-local trust gate (default-deny; cc-ci overlays only)
`install_steps.sh` and repo-local `test_*.py` are PR-author-controlled code that runs on the CI host
with `/run/secrets/*` present, which is an untrusted-code risk. Operator decision (2026-05-28):

- **Default:** the harness runs **only cc-ci-authored** overlays + install-steps
  (`tests/<recipe>/…`) and the generic. Repo-local (`<recipe-repo>/tests/`) `test_*.py` and
  `install_steps.sh` are **discovered-but-not-executed**.
- **Approved recipes only:** repo-local code is honored **only** when the recipe is on an explicit,
  **cc-ci-maintained approval allowlist** (default-empty ⇒ default-deny). Adding a recipe to the
  allowlist is a deliberate cc-ci-maintainer act after reviewing that recipe's tests.
- Update `discovery.resolve_op` / `custom_tests` / `install_steps` so the **repo-local source is
  only consulted for allowlisted recipes**; otherwise precedence is **cc-ci > generic** only.
- **Open (settle in DECISIONS):** the allowlist's form + location (a checked-in file like
  `tests/repo-local-approved.txt`, or a field in a cc-ci config), and the approval workflow. Keep it
  simple + auditable + in git.
- (Future hardening, → IDEAS, not this phase: sandbox/network-restrict even cc-ci overlays.)
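
The default-deny gate above could look roughly like the following, assuming the allowlist ends up as a plain text file with one recipe name per line (one of the two options still open). `load_allowlist` and `sources_for` are hypothetical helpers, not the existing `discovery` functions, and placing repo-local first for approved recipes is an assumption; the plan only fixes cc-ci > generic for the non-approved case.

```python
from pathlib import Path


def load_allowlist(path: Path) -> frozenset:
    """Read approved recipe names, one per line; '#' comments and blanks are ignored.

    A missing file yields an empty allowlist, which means default-deny.
    """
    if not path.exists():
        return frozenset()
    names = set()
    for line in path.read_text().splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.add(name)
    return frozenset(names)


def sources_for(recipe: str, allowlist: frozenset) -> list:
    """Return the test sources to consult, highest precedence first."""
    sources = ["cc-ci", "generic"]
    if recipe in allowlist:
        # Assumption: approved repo-local code takes precedence over cc-ci overlays.
        sources.insert(0, "repo-local")
    return sources
```

Because the repo-local source never appears in the list for a non-approved recipe, discovery code that iterates over `sources_for(...)` cannot execute PR-authored files by accident.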
### 2.3 HC3 — Generic by default (additive), explicit opt-out
Supersedes 1d's pure-override default. New rule: when a recipe ships an overlay for an op, **both the
generic and the overlay run** for that op by default; the generic is skipped **only** when an
explicit opt-out is set.

- **Opt-out mechanism (propose; settle in DECISIONS):** an env flag `CCCI_SKIP_GENERIC` (all ops) and
  per-op `CCCI_SKIP_GENERIC_<OP>` (e.g. `..._UPGRADE`), settable via the recipe's `recipe_meta.py`
  (a `SKIP_GENERIC` list) so it's declarative per recipe, not a hidden global.
- **Op-vs-assertion split (required by additive + deploy-once):** a mutating op
  (upgrade/backup/restore) must run **once**, then **both** the generic assertions and the overlay
  assertions evaluate the post-op state; never upgrade or back up twice. So refactor the tiers: the
  **orchestrator performs the op once** (the harness owns the op), then runs generic assertions
  (unless opted out) + overlay assertions against the shared post-op deployment. For `install` (no
  op) both assertion sets just run. This keeps deploy-once and one-op-per-tier intact.
- Net effect: the generic "is it actually serving / did the upgrade move / snapshot produced" floor
  is **always** exercised unless a recipe explicitly declares it skips generics. Overlays add; they
  don't silently subtract.
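
A minimal sketch of the additive rule and the op-vs-assertion split, under stated assumptions: the flag names come from the proposal above, but treating the value `"1"` as "set", the `meta_skip` argument (standing in for a `SKIP_GENERIC` list from `recipe_meta.py`), and the `generic_skipped` / `run_tier` helpers are all hypothetical. The point it illustrates is that the mutating op runs exactly once and both assertion sets then inspect the same post-op state.

```python
import os


def generic_skipped(op, env=None, meta_skip=()):
    """True only when an explicit opt-out covers `op`; otherwise the generic runs."""
    env = os.environ if env is None else env
    return (
        env.get("CCCI_SKIP_GENERIC") == "1"                    # all ops
        or env.get(f"CCCI_SKIP_GENERIC_{op.upper()}") == "1"   # this op only
        or op in meta_skip                                     # declared per recipe
    )


def run_tier(op, perform_op, generic_asserts, overlay_asserts, env=None, meta_skip=()):
    """Perform the op once, then run each assertion set against the post-op state.

    `perform_op` is None for tiers without a mutating op (install).
    Returns the names of the assertion sets that ran.
    """
    state = perform_op() if perform_op is not None else None
    ran = []
    if not generic_skipped(op, env, meta_skip):
        generic_asserts(state)
        ran.append("generic")
    if overlay_asserts is not None:
        overlay_asserts(state)
        ran.append("overlay")
    return ran
```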

---

## 3. Method / milestones (bounded)
- **E0: HC2 trust gate.** Gate repo-local behind the approval allowlist (default-deny); cc-ci+generic
  only otherwise. *Accept:* repo-local ignored for a non-approved recipe, run for an approved one.
- **E1: HC3 additive + op/assertion split.** Generic runs alongside overlays by default; op runs
  once; opt-out env skips the generic assertions. *Accept:* overlay + generic both run on one
  deployment; opt-out skips generic; deploy-count still 1.
- **E2: HC1 upgrade-to-PR-head.** prev-release → PR-head via `deploy --chaos`; moved-assertion
  adapted; deploy-count guard reconciled. *Accept:* upgrade demonstrably deploys PR-head.
- **E3: HC4 cold re-verification + docs.** Adversary cold-verifies no regression + the three new
  behaviors; update `docs/` + `machine-docs/DECISIONS.md`; flip `STATUS-1e.md` to `## DONE`.

---

## 4. Guardrails
- **Never weaken a test:** these are correctness/security fixes; the cardinal rule still wins.
- **Default-secure:** repo-local PR code is off unless the recipe is explicitly approved; the
  allowlist lives in git and is auditable.
- **Floor-by-default:** the generic baseline always runs unless a recipe explicitly opts out.
- **Deploy-once preserved:** one app deployment, one teardown; ops run once; reconcile the DG4.1
  guard with the chaos-upgrade redeploy.
- **Bounded:** three fixes + verification, then stop; bigger hardening (sandboxing) → IDEAS.

## 5. Open decisions (log in machine-docs/DECISIONS.md)
- HC2: approval-allowlist form/location + the approval workflow.
- HC3: opt-out flag name/granularity + declaring it via `recipe_meta.py`.
- HC1: how the DG4.1 deploy-count guard treats an in-place chaos upgrade (don't flag the legit op).
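
One possible resolution of the HC1 question, following the suggestion in §2.1 to count `abra app new` installs rather than in-place redeploys. This is a sketch only: it assumes the harness keeps a log of the commands it ran as argv lists, and `count_fresh_deploys` / `check_deploy_once` are hypothetical names. The recipe and domain arguments in the usage below are placeholders.

```python
def count_fresh_deploys(commands):
    """Count commands that create a new app; in-place redeploys are not counted."""
    return sum(1 for cmd in commands if cmd[:3] == ["abra", "app", "new"])


def check_deploy_once(commands):
    """Raise if the run created anything other than exactly one app."""
    n = count_fresh_deploys(commands)
    if n != 1:
        raise AssertionError(f"deploy-once violated: {n} fresh app installs")


# Usage: one install, the prev-release deploy, then the chaos upgrade in place.
log = [
    ["abra", "app", "new", "example-recipe"],
    ["abra", "app", "deploy", "app.example.test"],
    ["abra", "app", "deploy", "app.example.test", "--chaos"],
]
check_deploy_once(log)  # passes: the chaos redeploy is not a second app
```

Whichever rule is chosen, it should be recorded in `machine-docs/DECISIONS.md` so the Adversary verifies the same definition of "deploy" that the guard enforces.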