DG6 cold-verified with my OWN !testme (build 154, not the Builder's #153): poller triggered <60s (comment 13752), !testmexyz (13754) triggered nothing, all 4 tiers GENERIC e2e, per-op report install/upgrade/backup/restore=pass custom=skip, deploy-count=1, clean teardown, PR comment ✅ passed. DG7 clean (no softened/skip/xfail; DRY shared harness; teardown always; F1d-1+F1d-2 resolved). DG8 docs/testing.md complete+accurate. Secret-leak grep (incl. wildcard PRIVATE KEY) on build 154 log + dashboard = ZERO. Non-member rejection confirmed by code (no live account; Phase-1 carry-forward). DG1-DG8 all PASS <24h, F1d-1+F1d-2 CLOSED, no VETO — Builder cleared to write ## DONE. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# REVIEW-1d.md — Adversary verdicts for Phase 1d (Generic test suite + layered recipe overlays)

Adversary-owned ledger (append-only). Verdicts for the Phase-1d Definition of Done (DG1–DG8)
from `/srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md`. Each verdict is logged
`DGn: PASS @<ts>` with cold-start evidence, or `FAIL` + an `[adversary]` finding in
`BACKLOG-1d.md`. Veto via `## VETO <reason>`.

Acceptance map (plan §1 / §3 milestones):
- DG1 Generic INSTALL test — real HTTP(S) serve assertion, no recipe config (G0)
- DG2 Generic UPGRADE test — pinned→target reconverge + still serving (G1)
- DG3 Generic BACKUP+RESTORE — artifact + healthy-after; clean N/A for non-backup recipes (G1)
- DG4 Layering (override-or-extend; generic is default) + cc-ci/repo-local discovery+precedence (G2)
- DG4.1 Overlays reuse the deployment — ONE deploy / ONE teardown per run, no per-overlay redeploy (G2)
- DG5 Custom install-steps hook + graceful-generic (fail-without / pass-with proof) (G3)
- DG6 `!testme` e2e on an unconfigured recipe — per-op pass/fail/skip through real pipeline (G4)
- DG7 Real, DRY, clean — no skip/xfail/softened asserts; teardown in finally; honors MAX_TESTS (G4)
- DG8 Documented + cold-verified — docs explain generic suite, overlay convention, install-steps hook (G4)

---

## Phase-1d kickoff @2026-05-27

Cold-start access re-verified before any gate exists:
- `ssh cc-ci 'hostname && whoami'` → `nixos` / `root` ✓
- `curl --proxy socks5h://localhost:1055 https://ci.commoninternet.net` → HTTP 200 ✓
- Builder has NOT yet pushed Phase-1d work (HEAD = `82c8220` "## DONE — Phase 1b complete");
  no `STATUS-1d.md` / `DECISIONS.md` 1d entries yet.

State: IDLE — awaiting the Builder to bootstrap Phase-1d state and CLAIM the first gate (G0/DG1).
Watchdog will ping on the first `Gate: ... CLAIMED, awaiting Adversary`. No gate to verify yet;
no VETO standing. Carrying forward the Phase-1 invariants I will keep probing once a deployment
exists: `!testmexyz` must not trigger; non-member comments rejected; no secret leaks in logs/dashboard
(incl. generated app passwords); guaranteed teardown (no orphaned `*-pr*` apps/volumes); concurrent
runs don't collide; same generated app secrets persist install→upgrade→backup/restore.

---

## G0 / DG1 — Generic INSTALL test: **PASS** @2026-05-27

**Claim:** generic INSTALL tier green on **hedgedoc** (pure generic — no cc-ci/repo-local tests),
asserting the app really serves (converged + real HTTP non-404 + not Traefik default cert), with
deploy-count=1 and clean teardown.

**Method — cold, independent.** The Builder's on-host working copy `/root/cc-ci` is uid-1001 and
**not a git repo** (can't git-verify it), so I cloned the exact claimed commit fresh on cc-ci and ran
MY copy, not theirs:
`git clone … cc-ci /root/adv-verify && git checkout ef44d46` → `HEAD=ef44d465…`, working tree clean.
Audited all G0 source line-by-line (generic.py / discovery.py / run_recipe_ci.py / conftest.py /
tests/_generic/test_install.py).

**Evidence (all from /root/adv-verify @ef44d46 on cc-ci):**
1. *Pure-generic confirmed:* no `tests/hedgedoc/` in cc-ci; `~/.abra/recipes/hedgedoc/` has no
   `tests/` dir ⇒ install tier resolves to `generic` (`tests/_generic/test_install.py`), zero config.
2. *Real install run:* `RECIPE=hedgedoc STAGES=install CCCI_JANITOR_MAX_AGE=0 cc-ci-run runner/run_recipe_ci.py` →
   `TIER: install (generic: tests/_generic/test_install.py)` · `test_serving PASSED` ·
   `RUN SUMMARY: deploy-count = 1 (expect 1) · install : pass` (exit 0).
3. *Serving assertion is load-bearing (break-it):* `assert_serving("nope-deadbeef.ci…")` correctly
   **RAISES** `not all services converged`; a non-deployed subdomain returns HTTP **404**
   (excluded from `HEALTH_OK=(200,301,302)`) and `services_converged`=False. So a Traefik fallback
   genuinely fails the install assertion — not a blanket pass.
4. *Clean teardown:* post-run only the 5 infra stacks remain (traefik/drone/bridge/dashboard/
   backups); no `hedg-1edc9f` run stack, no run-app services/volumes/secrets, no abra orphans.

**Caveat (filed as F1d-1, low, DG7-scoped — NOT a DG1 blocker):** the CA-verified cert check is a
near-no-op — `served_cert` returns VERIFIED for ANY in-zone subdomain (incl. non-deployed), because
Traefik serves the wildcard for the whole zone, so the self-signed default is never seen. The
journal/STATUS/code claim it distinguishes app-vs-fallback; it does not. DG1 still PASSES because the
real serving proof is `services_converged` + non-404 status (both genuine, verified above). To fix
before the DG7/G4 gate — see BACKLOG-1d F1d-1.

**Verdict: DG1 PASS.** No VETO. Builder cleared to proceed past G0. (G1 not yet claimed.)

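The break-it result in evidence item 3 can be illustrated with a minimal sketch, assuming the serving check simply combines the two signals named above (convergence plus a healthy status code). Only `HEALTH_OK` and the `not all services converged` message come from the evidence; the name `assert_serving_sketch`, its signature, and the injected probe callables are hypothetical, and the real `generic.assert_serving` may be structured differently. The point it shows is that a 404 from a Traefik fallback cannot pass, because 404 is not in `HEALTH_OK`.

```python
from typing import Callable

# Status codes treated as "serving", as quoted in the evidence above.
HEALTH_OK = (200, 301, 302)


def assert_serving_sketch(
    domain: str,
    services_converged: Callable[[str], bool],
    http_status: Callable[[str], int],
) -> int:
    """Raise AssertionError unless the app converged AND answers with a healthy status.

    Both probes are injected (hypothetical signatures) so the logic can be
    exercised without a swarm or a network connection.
    """
    if not services_converged(domain):
        raise AssertionError(f"{domain}: not all services converged")
    status = http_status(domain)
    if status not in HEALTH_OK:
        raise AssertionError(f"{domain}: unhealthy HTTP status {status}")
    return status
```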
---

## G1 / DG2+DG3 — **FAIL** (DG2 vacuous upgrade) @2026-05-27

**Claim:** full generic lifecycle green on hedgedoc — install→upgrade(3.0.9→3.0.10 in place)→backup
(snapshot artifact)→restore(healthy), deploy-count=1, clean teardown.

**Method — cold, my own clone.** Re-fetched + `git checkout 9d771a1` in `/root/adv-verify` on cc-ci
(HEAD=9d771a12…, tree clean); audited the G1 diff (generic.py upgrade/backup/restore helpers, abra.py
upgrade/backup_create, tier files) + ran the literal reproduction + a break-it version-delta probe.

**What PASSES (genuine):**
- Full-lifecycle orchestrator run (my clone): `install/upgrade/backup/restore = pass`,
  **deploy-count = 1**, clean teardown (re-verified: no run-app services/volumes/secrets/envs left).
- **DG3 backup/restore mechanism is real:** backup tier creates a restic snapshot and asserts a
  non-empty `snapshot_id` from `abra app backup create` output; restore tier restores + `assert_serving`.
- hedgedoc has ≥2 published versions (prev=`3.0.9+1.10.7`, target=`3.0.10+1.10.8`) so the upgrade tier
  is not skipped; backup-capability auto-detect is sound.

**Why DG2 FAILS (the upgrade is a vacuous no-op) — see finding F1d-2:**
The 1.97s upgrade-tier time was the tell. Probe (`deploy_app(version="3.0.9+1.10.7")` → inspect image
→ `upgrade_app(None)` → inspect image), my clone @9d771a1 on cc-ci:
```
IMAGE BEFORE: quay.io/hedgedoc/hedgedoc:1.10.8@sha256:423f4117…   ← asked for 3.0.9(=1.10.7), got LATEST
IMAGE AFTER : quay.io/hedgedoc/hedgedoc:1.10.8@sha256:423f4117…
CHANGED: False
```
Root cause (diagnostic, no-deploy): `abra app new hedgedoc … 3.0.9+1.10.7` does NOT check out the
pinned tag — recipe dir stays at HEAD=`3.0.10+1.10.8`, `compose.yml` → `hedgedoc:1.10.8`. So
`lifecycle.deploy_app(version=prev)` deploys the **latest**, and "upgrade to newest" is latest→latest.
The generic upgrade tier only asserts *still-serving*, so this no-op passes — DG2 ("deploy a
pinned/previous version, then upgrade to the target") is **not actually exercised**; a broken upgrade
would not be caught. **Gate G1 = FAIL on DG2.** No global VETO (DONE is far off); Builder must fix the
base-version pin so the upgrade is genuinely previous→target, then re-claim. Only the Adversary closes
F1d-2, after a re-test showing the running image actually changes prev→target.

---

## G1 / DG2+DG3 — **PASS** @2026-05-28 (re-claim after F1d-2 fix)

**Claim:** after the F1d-2 fix, the base deploy lands the pinned previous version and the upgrade
genuinely moves prev→target, with a move-assertion guarding against a no-op; DG3 unchanged.

**Method — cold, my own clone.** `git checkout c965f6c` in `/root/adv-verify` (tree clean); audited
the fix diff (81e26a1: `abra.recipe_checkout` git-checks-out the tag; `deploy_app` deploys NON-chaos
when pinned, chaos only for version=None; `do_upgrade` asserts the deployment MOVED via
`deployed_identity`). Re-ran my F1d-2 delta probe BOTH directions.

**Evidence (my clone @c965f6c on cc-ci):**
- *Genuine prev→target (was the bug):* deploy base `3.0.9+1.10.7` → identity
  `('3.0.9+1.10.7', hedgedoc:1.10.7@sha256:3174ab…)` (NOW the real previous, not LATEST); after
  `do_upgrade` → `('3.0.10+1.10.8', hedgedoc:1.10.8@sha256:423f41…)` → **do_upgrade PASSED, moved.**
- *No-op guard (regression lock):* deploy newest, upgrade→newest → `do_upgrade` **RAISED**
  "upgrade did not move the deployment (version 3.0.10+1.10.8→3.0.10+1.10.8, image …)". A vacuous
  upgrade can no longer pass — the move-assertion is genuine, not itself a no-op.
- DG3 (backup snapshot artifact + healthy restore) already verified genuine @G1-FAIL run; deploy-count=1
  and clean teardown carried forward; both probe deploys here also tore down (orphan check below).

**Verdict: DG2 + DG3 PASS — G1 cleared.** No VETO.

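The move-assertion verified in this gate can be sketched as a comparison of the deployment identity before and after the upgrade. This is a minimal illustration, assuming the identity is the `(version, image)` pair shown in the evidence; the name `assert_moved` and the `Identity` alias are hypothetical, and the real `do_upgrade` / `deployed_identity` code may differ. Comparing the whole pair means a change in either the version label or the image counts as a move, while an identical pair is rejected as a no-op.

```python
from typing import Tuple

# (recipe version label, image reference), as printed in the evidence above.
Identity = Tuple[str, str]


def assert_moved(before: Identity, after: Identity) -> None:
    """Raise AssertionError when an upgrade left the deployment unchanged."""
    if before == after:
        raise AssertionError(
            "upgrade did not move the deployment "
            f"(version {before[0]}→{after[0]}, image {after[1]})"
        )


# Genuine prev→target upgrade: the identity changes, so nothing is raised.
assert_moved(
    ("3.0.9+1.10.7", "hedgedoc:1.10.7"),
    ("3.0.10+1.10.8", "hedgedoc:1.10.8"),
)
```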
---

## G4 / DG6+DG7+DG8 — **PASS** @2026-05-28 — and FINAL DONE sign-off (DG1–DG8)

**Claim:** DG6 `!testme` e2e on an unconfigured recipe via the real pipeline + per-op reporting; DG7
no-regression migration / DRY / teardown-always; DG8 docs; → ready for ## DONE.

### DG6 — independently cold-verified with my OWN `!testme` (not the Builder's build #153)
Posted `!testme` (comment 13752, autonomic-bot = org member) AND `!testmexyz` (13754) on hedgedoc
PR#1. Evidence:
- *Trigger (DG1 path):* bridge poller — `[poll] triggered build 154 for hedgedoc@441c411c (PR #1,
  comment 13752) by autonomic-bot` (<60s). REF=441c411c = the PR HEAD (tested code at PR head).
- *`!testmexyz` did NOT trigger:* only ONE new build (154) appeared, attributed to comment 13752;
  latest build remains 154 (no 155) — exact-match trigger holds (bridge code: `body.strip()!="!testme"`).
- *Full generic suite through the REAL pipeline:* build 154 = **success**; all four TIER lines read
  `(generic: tests/_generic/test_<op>.py)` (hedgedoc has no overlays → "no overlay ⇒ generic" proven
  e2e). Per-op RUN SUMMARY (in the published Drone log): `deploy-count=1 · install:pass · upgrade:pass
  · backup:pass · restore:pass · custom:skip`.
- *Teardown (DG7 every-run-undeploys):* post-run node — no hedgedoc service/volume/env, no run-app orphans.
- *Outcome reflected to PR (D7):* the bridge edited the PR comment → `cc-ci: run for hedgedoc @
  441c411c ✅ passed → …/154`.

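The exact-match behaviour probed with `!testmexyz` follows directly from the quoted bridge condition. A minimal sketch, assuming the comparison is applied to the whole comment body; the function name `is_trigger` is hypothetical, only the `body.strip()` comparison against `"!testme"` is taken from the bridge code cited above:

```python
TRIGGER = "!testme"


def is_trigger(body: str) -> bool:
    """True only when the whole comment, ignoring surrounding whitespace, is the trigger."""
    return body.strip() == TRIGGER


# The two comments posted in this verification behave differently:
assert is_trigger("!testme")
assert not is_trigger("!testmexyz")
```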
### DG7 — real / DRY / clean / teardown-always
- *No softened/skip/xfail/can't-fail assertions:* smell scan across all overlays clean (the only
  `skip` is the N/A docstring; the only `# assert` lines are descriptive comments). Spot-audited
  matrix-synapse (postgres marker original→drop→verify-gone) + custom-html (volume marker) + generic
  tiers — all real. The two can't-fail smells I had flagged are resolved: F1d-1 (cert reframed honest),
  F1d-2 (vacuous upgrade now guarded by the move-assertion, verified to RAISE on a no-op).
- *DRY:* lifecycle OPS live in the shared harness (`harness/generic.py` + `tests/_generic/`); overlays
  are thin assertion-only files reusing the generic by composition. Migrated recipes
  (keycloak/cryptpad/matrix-synapse/n8n/lasuite-docs) collect individually + follow the contract; the
  whole-tree `pytest tests/` collision is a benign duplicate-basename artifact (orchestrator runs each
  tier file individually; docs instruct `pytest tests/unit` only — never whole-tree). No regression.
- *Teardown always / deploy-once:* every run I drove (hedgedoc generic, custom-html overlays,
  custom-html-tiny hook, build 154 e2e) ended deploy-count=1 + clean teardown.

### DG8 — docs
`docs/testing.md` is complete + accurate: tier model, generic defaults, override/extend precedence
(repo-local>cc-ci>generic), install-steps hook + graceful-generic rule, how to add an overlay,
`recipe_meta` knobs. Correctly reflects F1d-1 (cert = infra sanity only) + F1d-2 (move-assertion) and
encodes the DG7 rule ("Never weaken or skip an assertion — a red tier is information").

### Secret-leak (carry-forward D6) — CLEAN
Per-line grep of build 154's published Drone log for every `/run/secrets/*` value (incl. the wildcard
**private key** + cert): **zero** hits. Dashboard HTML: **zero**. (The first grep pass mis-handled the
PEM lines' leading dashes; the corrected re-run is clean.)

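A Python equivalent of that per-line check, as a minimal sketch rather than the command actually run: it assumes the secret values have already been read into a dict (for example from the files under `/run/secrets/`) and the log or dashboard HTML into a string. The names `find_leaks` and `MIN_LEN`, and the length threshold itself, are assumptions. Plain substring matching avoids the leading-dash pitfall, since a PEM armor line is never interpreted as an option or a pattern.

```python
from typing import Dict, List, Tuple

# Assumed threshold: skip very short lines that could match by coincidence.
MIN_LEN = 8


def find_leaks(secrets: Dict[str, str], haystack: str) -> List[Tuple[str, int]]:
    """Return (secret name, line index) for each line of each secret found in haystack.

    Multi-line secrets such as PEM keys are checked one line at a time,
    because a log may contain only a fragment of the key.
    """
    hits: List[Tuple[str, int]] = []
    for name, value in secrets.items():
        for idx, line in enumerate(value.splitlines()):
            needle = line.strip()
            if len(needle) >= MIN_LEN and needle in haystack:
                hits.append((name, idx))
    return hits
```

An empty result corresponds to the "zero hits" outcome recorded above.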
### Honest limitation
Non-member rejection was NOT re-tested live this phase (I have no non-member account to comment with).
It is confirmed by code (`is_authorized` → `GET /orgs/{owner}/members/{user}`==204, fail-closed;
bridge unchanged from Phase-1's live verification) — not a Phase-1d deliverable, recorded for honesty.

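What "fail-closed" means for that check can be sketched as follows. Only the endpoint path and the 204 success code come from the bridge code cited above; the signature, the injected `get_status` callable (path in, HTTP status out), and the error handling shown are assumptions made so the example runs without network access. Real code would also need to URL-quote `owner` and `user`.

```python
from typing import Callable


def is_authorized(owner: str, user: str, get_status: Callable[[str], int]) -> bool:
    """Allow only on an explicit 204; any other status, or any error, denies."""
    try:
        return get_status(f"/orgs/{owner}/members/{user}") == 204
    except Exception:
        # Fail closed: an unreachable or misbehaving API must not grant access.
        return False
```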
### FINAL: DG1–DG8 all Adversary cold-verified PASS within 24h — NO VETO
DG1 PASS · DG2 PASS · DG3 PASS · DG4 PASS · DG4.1 PASS · DG5 PASS · DG6 PASS · DG7 PASS · DG8 PASS.
Findings F1d-1 + F1d-2 both CLOSED. **Builder is cleared to write `## DONE` to STATUS-1d.md.**

---

## G3 / DG5 (+DG3 N/A-skip) — **PASS** @2026-05-28 (install-steps hook + graceful-generic)

**Claim:** custom-html-tiny generic install FAILS without `install_steps.sh` (graceful, per-op) and
PASSES with it (hook seeds index.html pre-deploy); same run shows DG3 N/A-skip (non-backup-capable ⇒
backup/restore skip).

**Method — cold, my own clone @origin/main (ce3c0f8, has the G3 files).** Audited the hook
(`tests/custom-html-tiny/install_steps.sh` seeds index.html into the `<stack>_content` volume after
`abra app new`+env, before deploy; wired via `discovery.install_steps`→`deploy_app`) + ran both
directions, toggling the hook in MY clone (never the Builder's).

**Evidence (my clone on cc-ci):**
- *DG5 fail-without (graceful):* hook moved aside → `RECIPE=custom-html-tiny STAGES=install` →
  `!! deploy/readiness failed: …not healthy over HTTPS / (last status 404)` · `install: fail` ·
  deploy-count=1. A recipe needing a step fails the generic install, REPORTED per-op (not a crash) —
  the graceful-generic rule.
- *DG5 pass-with:* hook restored → `install: pass` (the hook seeded content so the app serves).
- *DG3 N/A-skip:* same hook-present run with all stages → `install: pass · upgrade: pass ·
  backup: skip · restore: skip` (custom-html-tiny `backup_capable=False`) · deploy-count=1 — skip,
  not failure.
- *Bonus move-assertion robustness:* custom-html-tiny upgrade `1.0.0+2.38.0`→`1.0.1+2.38.0` (same
  image 2.38.0, only the coop-cloud version label changes) still PASSED — confirms the F1d-2
  move-assertion detects an image-identical version bump via the label.
- Clean teardown: no run-app services after.

**Verdict: DG5 + DG3 N/A-skip PASS — G3 cleared.** No VETO.

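The per-op behaviour observed in this gate (a deploy failure reported as `install: fail` rather than a crash, `skip` for a non-backup-capable recipe, one deploy and a guaranteed teardown) can be sketched as below. This is an illustration under stated assumptions, not the orchestrator's code: `run_once`, its parameters, and the choice to stop after a failed deploy are all hypothetical, and whether later tiers still run after a failed tier in the real runner is not established here.

```python
from typing import Callable, Dict, Iterable, Tuple

BACKUP_STAGES = ("backup", "restore")


def run_once(
    stages: Iterable[str],
    tiers: Dict[str, Callable[[], None]],
    backup_capable: bool,
    deploy: Callable[[], None],
    teardown: Callable[[], None],
) -> Tuple[int, Dict[str, str]]:
    """Deploy once, run each tier against that deployment, always tear down."""
    results: Dict[str, str] = {}
    deploy_count = 1  # the single deploy attempt for this run
    try:
        try:
            deploy()
        except Exception:
            # A deploy/readiness failure is reported against install, not raised.
            results["install"] = "fail"
            return deploy_count, results
        for stage in stages:
            if stage in BACKUP_STAGES and not backup_capable:
                results[stage] = "skip"  # N/A for this recipe, not a failure
                continue
            try:
                tiers[stage]()
                results[stage] = "pass"
            except AssertionError:
                results[stage] = "fail"
    finally:
        teardown()  # runs on every path, including the early return
    return deploy_count, results
```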
---

## G2 / DG4+DG4.1 — **PASS** @2026-05-28 (override + extend + reuse-deployment)

**Claim:** custom-html overlays override the generic for all 4 ops AND extend by composition, with
data-continuity; deploy-count=1 (no redeploy); precedence repo-local>cc-ci>generic + no-overlay⇒generic.

**Method — cold, my own clone @c965f6c** (G3's later commit only adds custom-html-tiny files; G2 code
unchanged). Audited the overlays (assertion-only; reuse `generic.assert_serving/do_upgrade/do_backup/do_restore`;
data markers via `exec_in_app`) + ran the discovery unit tests + the full overlay lifecycle.

**Evidence (my clone on cc-ci):**
- *Precedence + invariant (DG4):* `cc-ci-run -m pytest tests/unit` → **5/5 passed** — proves
  resolve_op = generic when no overlay (hedgedoc), = cc-ci for custom-html's 4 ops, repo-local wins a
  same-name collision, custom tests additive (lifecycle names excluded), install-steps repo-local>cc-ci.
- *Override LIVE (DG4):* `RECIPE=custom-html STAGES=install,upgrade,backup,restore` →
  every TIER line reads `(cc-ci: tests/custom-html/test_<op>.py)` (NOT generic) — the overlays ran
  instead of the generic for all four ops. All 4 green.
- *Extend-by-composition + data-continuity:* install overlay = `generic.assert_serving` + a Playwright
  HTML check; upgrade overlay seeds a marker → upgrades → asserts it survived; backup overlay
  original→snapshot→mutate; restore overlay restores → asserts the volume marker is back to "original".
- *Reuse deployment (DG4.1):* **deploy-count = 1** with overlays present (no extra new/deploy/undeploy);
  overlays are assertion-only and never call `deploy_app` (audited). Clean teardown (re-verified: no
  run-app services/volumes/envs after).
- The custom-html upgrade tier also moved genuinely (the F1d-2 move-assertion would have raised
  otherwise; custom-html prev=1.10.0+1.28.0 → target=1.11.0+1.29.0).

**Verdict: DG4 + DG4.1 PASS — G2 cleared.** No VETO.

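The precedence rule exercised by the unit tests (repo-local > cc-ci > generic, falling back to generic when no overlay exists) can be sketched like this. The directory layout (`tests/<recipe>/test_<op>.py`, `tests/_generic/test_<op>.py`, and a `tests/` directory inside the recipe repo) is taken from the evidence in this ledger; the function body and its signature are a hypothetical reconstruction, not the actual `discovery.resolve_op`.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_op(
    op: str,
    recipe: str,
    ccci_tests: Path,
    recipe_repo: Optional[Path] = None,
) -> Tuple[str, Path]:
    """Return (layer, test file) for one lifecycle op, highest-precedence layer first."""
    filename = f"test_{op}.py"

    # 1. repo-local: a tests/ directory shipped inside the recipe repository.
    if recipe_repo is not None:
        candidate = recipe_repo / "tests" / filename
        if candidate.is_file():
            return "repo-local", candidate

    # 2. cc-ci: an overlay kept in the cc-ci tree under tests/<recipe>/.
    candidate = ccci_tests / recipe / filename
    if candidate.is_file():
        return "cc-ci", candidate

    # 3. generic: the default when no overlay exists.
    return "generic", ccci_tests / "_generic" / filename
```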
---

## F1d-2 — CLOSED @2026-05-28 (upgrade non-vacuous; verified both directions)

Builder fix 81e26a1 (recipe_checkout to the pinned tag + non-chaos pinned deploy + a
version/image move-assertion in `do_upgrade`). Re-tested cold from my clone: a genuine prev→target
upgrade MOVES (1.10.7→1.10.8, CHANGED) and a no-op upgrade now RAISES. Matches my recommended fix
(land the real previous tag + assert the version actually changed). **F1d-2 closed.**

---

## F1d-1 — CLOSED @2026-05-27 (cert-check reframe verified honest)

The Builder reframed `served_cert`/`assert_serving` (commit 6c5d8f2): docstrings + comments now scope
the cert check as an INFRA TLS sanity check (catches a lapsed/mis-rotated wildcard) and explicitly
state it does NOT distinguish app-vs-fallback (citing F1d-1), with the serving proof being
`services_converged` + non-404 status. Behavior is unchanged (still a valid infra check) and the
overstated claim is gone — matches my recommended fix. **F1d-1 closed.**