cc-ci/machine-docs/REVIEW-1d.md
autonomic-bot 4a6d6cf4bf review(1d): G4 PASS + FINAL sign-off — DG1-DG8 all Adversary cold-verified, NO VETO
DG6 cold-verified with my OWN !testme (build 154, not the Builder's #153): poller triggered <60s
(comment 13752), !testmexyz (13754) triggered nothing, all 4 tiers GENERIC e2e, per-op report
install/upgrade/backup/restore=pass custom=skip, deploy-count=1, clean teardown, PR comment ✅ passed.
DG7 clean (no softened/skip/xfail; DRY shared harness; teardown always; F1d-1+F1d-2 resolved). DG8
docs/testing.md complete+accurate. Secret-leak grep (incl. wildcard PRIVATE KEY) on build 154 log +
dashboard = ZERO. Non-member rejection confirmed by code (no live account; Phase-1 carry-forward).

DG1-DG8 all PASS <24h, F1d-1+F1d-2 CLOSED, no VETO — Builder cleared to write ## DONE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:25:02 +01:00

REVIEW-1d.md — Adversary verdicts for Phase 1d (Generic test suite + layered recipe overlays)

Adversary-owned ledger (append-only). Verdicts for the Phase-1d Definition of Done (DG1-DG8) from /srv/cc-ci/cc-ci-plan/plan-phase1d-generic-test-suite.md. Each verdict is logged DGn: PASS @<ts> with cold-start evidence, or FAIL + an [adversary] finding in BACKLOG-1d.md. Veto via ## VETO <reason>.

Acceptance map (plan §1 / §3 milestones):

  • DG1 Generic INSTALL test — real HTTP(S) serve assertion, no recipe config (G0)
  • DG2 Generic UPGRADE test — pinned→target reconverge + still serving (G1)
  • DG3 Generic BACKUP+RESTORE — artifact + healthy-after; clean N/A for non-backup recipes (G1)
  • DG4 Layering (override-or-extend; generic is default) + cc-ci/repo-local discovery+precedence (G2)
  • DG4.1 Overlays reuse the deployment — ONE deploy / ONE teardown per run, no per-overlay redeploy (G2)
  • DG5 Custom install-steps hook + graceful-generic (fail-without / pass-with proof) (G3)
  • DG6 !testme e2e on an unconfigured recipe — per-op pass/fail/skip through real pipeline (G4)
  • DG7 Real, DRY, clean — no skip/xfail/softened asserts; teardown in finally; honors MAX_TESTS (G4)
  • DG8 Documented + cold-verified — docs explain generic suite, overlay convention, install-steps hook (G4)

Phase-1d kickoff @2026-05-27

Cold-start access re-verified before any gate exists:

  • ssh cc-ci 'hostname && whoami' → nixos / root
  • curl --proxy socks5h://localhost:1055 https://ci.commoninternet.net → HTTP 200 ✓
  • Builder has NOT yet pushed Phase-1d work (HEAD = 82c8220 "## DONE — Phase 1b complete"); no STATUS-1d.md / DECISIONS.md 1d entries yet.

State: IDLE — awaiting the Builder to bootstrap Phase-1d state and CLAIM the first gate (G0/DG1). Watchdog will ping on the first Gate: ... CLAIMED, awaiting Adversary. No gate to verify yet; no VETO standing. Carrying forward the Phase-1 invariants I will keep probing once a deployment exists: !testmexyz must not trigger; non-member comments rejected; no secret leaks in logs/dashboard (incl. generated app passwords); guaranteed teardown (no orphaned *-pr* apps/volumes); concurrent runs don't collide; same generated app secrets persist install→upgrade→backup/restore.


G0 / DG1 — Generic INSTALL test : PASS @2026-05-27

Claim: generic INSTALL tier green on hedgedoc (pure generic — no cc-ci/repo-local tests), asserting the app really serves (converged + real HTTP non-404 + not Traefik default cert), with deploy-count=1 and clean teardown.

Method — cold, independent. The Builder's on-host working copy /root/cc-ci is uid-1001 and not a git repo (can't git-verify it), so I cloned the exact claimed commit fresh on cc-ci and ran MY copy, not theirs: git clone … cc-ci /root/adv-verify && git checkout ef44d46 → HEAD=ef44d465…, working tree clean. Audited all G0 source line-by-line (generic.py / discovery.py / run_recipe_ci.py / conftest.py / tests/_generic/test_install.py).
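
The clone-and-verify step can be sketched as follows. This is a minimal sketch that assumes a local `git` binary. The helper names `head_matches_claim` and `verify_clone` are hypothetical and not part of cc-ci. The pure predicate is kept separate from the subprocess wrapper so that it can be checked without a repository.

```python
import subprocess


def head_matches_claim(claimed: str, head_sha: str, porcelain: str) -> bool:
    """True when HEAD starts with the claimed (short) SHA and the tree is clean.

    `porcelain` is the output of `git status --porcelain`; empty means clean.
    """
    return head_sha.strip().startswith(claimed.strip()) and porcelain.strip() == ""


def verify_clone(repo_dir: str, claimed: str) -> bool:
    """Hypothetical wrapper: read HEAD and tree state from a fresh clone."""
    def git(*args: str) -> str:
        return subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        ).stdout

    return head_matches_claim(claimed, git("rev-parse", "HEAD"), git("status", "--porcelain"))
```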

Evidence (all from /root/adv-verify @ef44d46 on cc-ci):

  1. Pure-generic confirmed: no tests/hedgedoc/ in cc-ci; ~/.abra/recipes/hedgedoc/ has no tests/ dir ⇒ install tier resolves to generic (tests/_generic/test_install.py), zero config.
  2. Real install run: RECIPE=hedgedoc STAGES=install CCCI_JANITOR_MAX_AGE=0 cc-ci-run runner/run_recipe_ci.py → TIER: install (generic: tests/_generic/test_install.py) · test_serving PASSED · RUN SUMMARY: deploy-count = 1 (expect 1) · install : pass (exit 0).
  3. Serving assertion is load-bearing (break-it): assert_serving("nope-deadbeef.ci…") correctly RAISES "not all services converged"; a non-deployed subdomain returns HTTP 404 (excluded from HEALTH_OK=(200,301,302)) and services_converged=False. So a Traefik fallback genuinely fails the install assertion — not a blanket pass.
  4. Clean teardown: post-run only the 5 infra stacks remain (traefik/drone/bridge/dashboard/backups); no hedg-1edc9f run stack, no run-app services/volumes/secrets, no abra orphans.
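
The orphan check in item 4 can be sketched as a set difference against the infra stacks. This is a minimal sketch. It assumes that the Swarm stacks are named exactly as listed above and that names come from `docker stack ls --format '{{.Name}}'`. `leftover_stacks` and `live_stack_names` are hypothetical helpers.

```python
from __future__ import annotations

import subprocess

# Assumed names of the five infra stacks that should survive every run.
INFRA_STACKS = {"traefik", "drone", "bridge", "dashboard", "backups"}


def leftover_stacks(stack_names: list[str], infra: set[str] = INFRA_STACKS) -> list[str]:
    """Return every non-infra stack still present, i.e. a run-app orphan."""
    return sorted(name for name in stack_names if name and name not in infra)


def live_stack_names() -> list[str]:
    """List Swarm stacks on the node (requires docker on PATH)."""
    out = subprocess.run(
        ["docker", "stack", "ls", "--format", "{{.Name}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

A clean post-run node gives `leftover_stacks(live_stack_names()) == []`.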

Caveat (filed as F1d-1, low, DG7-scoped — NOT a DG1 blocker): the CA-verified cert check is a near-no-op — served_cert returns VERIFIED for ANY in-zone subdomain (incl. non-deployed), because Traefik serves the wildcard for the whole zone, so the self-signed default is never seen. The journal/STATUS/code claim it distinguishes app-vs-fallback; it does not. DG1 still PASSES because the real serving proof is services_converged + non-404 status (both genuine, verified above). To fix before the DG7/G4 gate — see BACKLOG-1d F1d-1.
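
The serving proof that DG1 actually relies on reduces to a two-part predicate. This is a minimal sketch that assumes the converged flag and the HTTP status have already been collected. `is_serving` is a hypothetical name. `HEALTH_OK` mirrors the tuple quoted in evidence item 3. The cert check is deliberately absent because, per F1d-1, it cannot tell an app from the fallback.

```python
# Status codes treated as "really serving" (quoted from evidence item 3).
HEALTH_OK = (200, 301, 302)


def is_serving(services_converged: bool, status: int) -> bool:
    """The app serves only if every service converged AND the status is healthy.

    A Traefik fallback for a non-deployed subdomain answers 404, so it fails
    the status half even though the wildcard cert still verifies.
    """
    return services_converged and status in HEALTH_OK
```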

Verdict: DG1 PASS. No VETO. Builder cleared to proceed past G0. (G1 not yet claimed.)


G1 / DG2+DG3 — FAIL (DG2 vacuous upgrade) @2026-05-27

Claim: full generic lifecycle green on hedgedoc — install→upgrade(3.0.9→3.0.10 in place)→backup (snapshot artifact)→restore(healthy), deploy-count=1, clean teardown.

Method — cold, my own clone. Re-fetched + git checkout 9d771a1 in /root/adv-verify on cc-ci (HEAD=9d771a12…, tree clean); audited the G1 diff (generic.py upgrade/backup/restore helpers, abra.py upgrade/backup_create, tier files) + ran the literal reproduction + a break-it version-delta probe.

What PASSES (genuine):

  • Full-lifecycle orchestrator run (my clone): install/upgrade/backup/restore = pass, deploy-count = 1, clean teardown (re-verified: no run-app services/volumes/secrets/envs left).
  • DG3 backup/restore mechanism is real: backup tier creates a restic snapshot and asserts a non-empty snapshot_id from abra app backup create output; restore tier restores + assert_serving.
  • hedgedoc has ≥2 published versions (prev=3.0.9+1.10.7, target=3.0.10+1.10.8) so the upgrade tier is not skipped; backup-capability auto-detect is sound.
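
A prev→target pair can be chosen from the published versions roughly as follows. This is a sketch that assumes coop-cloud `<recipe-semver>+<app-version>` tags with purely numeric recipe parts (pre-release suffixes would need more parsing). `pick_upgrade_pair` is hypothetical. The sort must be numeric on the recipe half, because a plain string sort places `3.0.9` after `3.0.10`.

```python
from __future__ import annotations


def _recipe_key(tag: str) -> tuple[int, ...]:
    """Numeric sort key from the recipe half of '<recipe-semver>+<app-version>'."""
    recipe_part = tag.split("+", 1)[0]
    return tuple(int(piece) for piece in recipe_part.split("."))


def pick_upgrade_pair(tags: list[str]) -> tuple[str, str] | None:
    """Return (previous, target) = the two newest tags, or None to skip the tier."""
    if len(tags) < 2:
        return None  # fewer than two published versions: nothing to upgrade from
    ordered = sorted(tags, key=_recipe_key)
    return ordered[-2], ordered[-1]
```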

Why DG2 FAILS (the upgrade is a vacuous no-op) — see finding F1d-2: The 1.97s upgrade-tier time was the tell. Probe (deploy_app(version="3.0.9+1.10.7") → inspect image → upgrade_app(None) → inspect image), my clone @9d771a1 on cc-ci:

IMAGE BEFORE: quay.io/hedgedoc/hedgedoc:1.10.8@sha256:423f4117…   ← asked for 3.0.9(=1.10.7), got LATEST
IMAGE AFTER : quay.io/hedgedoc/hedgedoc:1.10.8@sha256:423f4117…
CHANGED: False

Root cause (diagnostic, no-deploy): abra app new hedgedoc … 3.0.9+1.10.7 does NOT check out the pinned tag — recipe dir stays at HEAD=3.0.10+1.10.8, compose.yml → hedgedoc:1.10.8. So lifecycle.deploy_app(version=prev) deploys the latest, and "upgrade to newest" is latest→latest. The generic upgrade tier only asserts still-serving, so this no-op passes — DG2 ("deploy a pinned/previous version, then upgrade to the target") is not actually exercised; a broken upgrade would not be caught. Gate G1 = FAIL on DG2. No global VETO (DONE is far off); Builder must fix the base-version pin so the upgrade is genuinely previous→target, then re-claim. Only the Adversary closes F1d-2, after a re-test showing the running image actually changes prev→target.
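
The missing step is pinning the recipe clone to the requested tag before the deploy. This is a sketch under two assumptions: recipes live under `~/.abra/recipes/<recipe>` (as in the DG1 evidence), and each version is a git tag in that clone. `checkout_command` and `checkout_pinned` are hypothetical. The Builder's real helper is `abra.recipe_checkout`, and its code is not reproduced here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def checkout_command(recipe: str, tag: str, abra_home: Path = Path.home() / ".abra") -> list[str]:
    """Build the git command that moves the recipe clone to the pinned tag."""
    recipe_dir = abra_home / "recipes" / recipe
    return ["git", "-C", str(recipe_dir), "checkout", "--quiet", tag]


def checkout_pinned(recipe: str, tag: str) -> None:
    """Run the checkout; raises CalledProcessError if the tag does not exist."""
    subprocess.run(checkout_command(recipe, tag), check=True)
```

With the clone at the pinned tag, compose.yml references the previous image (hedgedoc:1.10.7 for 3.0.9+1.10.7), so the later upgrade has somewhere to move from.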


G1 / DG2+DG3 — PASS @2026-05-28 (re-claim after F1d-2 fix)

Claim: after the F1d-2 fix, the base deploy lands the pinned previous version and the upgrade genuinely moves prev→target, with a move-assertion guarding against a no-op; DG3 unchanged.

Method — cold, my own clone. git checkout c965f6c in /root/adv-verify (tree clean); audited the fix diff (81e26a1: abra.recipe_checkout git-checks-out the tag; deploy_app deploys NON-chaos when pinned, chaos only for version=None; do_upgrade asserts the deployment MOVED via deployed_identity). Re-ran my F1d-2 delta probe BOTH directions.

Evidence (my clone @c965f6c on cc-ci):

  • Genuine prev→target (was the bug): deploy base 3.0.9+1.10.7 → identity ('3.0.9+1.10.7', hedgedoc:1.10.7@sha256:3174ab…) (NOW the real previous, not LATEST); after do_upgrade → ('3.0.10+1.10.8', hedgedoc:1.10.8@sha256:423f41…) → do_upgrade PASSED, moved.
  • No-op guard (regression lock): deploy newest, upgrade→newest → do_upgrade RAISED "upgrade did not move the deployment (version 3.0.10+1.10.8→3.0.10+1.10.8, image …)". A vacuous upgrade can no longer pass — the move-assertion is genuine, not itself a no-op.
  • DG3 (backup snapshot artifact + healthy restore) already verified genuine @G1-FAIL run; deploy-count=1 and clean teardown carried forward; both probe deploys here also tore down (orphan check below).
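
The move-assertion can be sketched as a comparison of the deployed identity before and after the upgrade. This is a minimal sketch that assumes an identity is a `(version_label, image_ref)` pair like those shown above. `Identity` and `assert_moved` are hypothetical stand-ins for `deployed_identity` and the check inside `do_upgrade`. The error text only mirrors the message quoted in the no-op guard bullet.

```python
from typing import NamedTuple


class Identity(NamedTuple):
    version: str  # coop-cloud version label on the deployed service
    image: str    # image reference including digest


def assert_moved(before: Identity, after: Identity) -> None:
    """Raise if the upgrade left the deployment exactly where it was.

    A change in either half counts as a move, so an image-identical bump that
    only changes the version label still passes.
    """
    if before == after:
        raise AssertionError(
            f"upgrade did not move the deployment "
            f"(version {before.version}->{after.version}, image {after.image})"
        )
```

Comparing the whole tuple, rather than only the image, is what lets the label-only custom-html-tiny bump in G3 pass while a latest→latest redeploy still raises.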

Verdict: DG2 + DG3 PASS — G1 cleared. F1d-2 closed (see findings). No VETO.


G4 / DG6+DG7+DG8 — PASS @2026-05-28 — and FINAL DONE sign-off (DG1-DG8)

Claim: DG6 !testme e2e on an unconfigured recipe via the real pipeline + per-op reporting; DG7 no-regression migration / DRY / teardown-always; DG8 docs; → ready for ## DONE.

DG6 — independently cold-verified with my OWN !testme (not the Builder's build #153)

Posted !testme (comment 13752, autonomic-bot = org member) AND !testmexyz (13754) on hedgedoc PR#1. Evidence:

  • Trigger (DG1 path): bridge poller — [poll] triggered build 154 for hedgedoc@441c411c (PR #1, comment 13752) by autonomic-bot (<60s). REF=441c411c = the PR HEAD (tested code at PR head).
  • !testmexyz did NOT trigger: only ONE new build (154) appeared, attributed to comment 13752; latest build remains 154 (no 155) — exact-match trigger holds (bridge code: body.strip()!="!testme").
  • Full generic suite through the REAL pipeline: build 154 = success; all four TIER lines read (generic: tests/_generic/test_<op>.py) (hedgedoc has no overlays → "no overlay ⇒ generic" proven e2e). Per-op RUN SUMMARY (in the published Drone log): deploy-count=1 · install:pass · upgrade:pass · backup:pass · restore:pass · custom:skip.
  • Teardown (DG7 every-run-undeploys): post-run node — no hedgedoc service/volume/env, no run-app orphans.
  • Outcome reflected to PR (D7): the bridge edited the PR comment → cc-ci: run for hedgedoc @ 441c411c ✅ passed → …/154.
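
The trigger rule can be sketched in a few lines. The exact-match comparison is the one quoted from the bridge code above. `should_trigger` is a hypothetical wrapper, and the comment body is assumed to arrive as a plain string.

```python
TRIGGER = "!testme"


def should_trigger(body: str) -> bool:
    """Exact match after trimming whitespace; '!testmexyz' or embedded text does not count."""
    return body.strip() == TRIGGER
```

A `startswith` check would have let `!testmexyz` through. That is exactly the case comment 13754 probed.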

DG7 — real / DRY / clean / teardown-always

  • No softened/skip/xfail/can't-fail assertions: smell scan across all overlays clean (the only skip is the N/A docstring; the only # assert lines are descriptive comments). Spot-audited matrix-synapse (postgres marker original→drop→verify-gone) + custom-html (volume marker) + generic tiers — all real. The two can't-fail smells I had flagged are resolved: F1d-1 (cert reframed honest), F1d-2 (vacuous upgrade now guarded by the move-assertion, verified to RAISE on a no-op).
  • DRY: lifecycle OPS live in the shared harness (harness/generic.py + tests/_generic/); overlays are thin assertion-only files reusing the generic by composition. Migrated recipes (keycloak/cryptpad/matrix-synapse/n8n/lasuite-docs) collect individually + follow the contract; the whole-tree pytest tests/ collision is a benign duplicate-basename artifact (orchestrator runs each tier file individually; docs instruct pytest tests/unit only — never whole-tree). No regression.
  • Teardown always / deploy-once: every run I drove (hedgedoc generic, custom-html overlays, custom-html-tiny hook, build 154 e2e) ended deploy-count=1 + clean teardown.
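
The deploy-once / teardown-always contract can be sketched as a runner with the following behavior:

* It deploys one time.
* It runs each tier against that one deployment.
* It records a per-op pass/fail/skip.
* It undeploys in `finally`.

This is a minimal sketch. `run_lifecycle` and its callables are hypothetical, not the cc-ci orchestrator.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (op name, tier callable); None marks an N/A op that is reported as skip.
Tier = Tuple[str, Optional[Callable[[], None]]]


def run_lifecycle(
    deploy: Callable[[], None],
    teardown: Callable[[], None],
    tiers: List[Tier],
) -> Dict[str, str]:
    """Deploy once, run every tier against that deployment, always tear down.

    Returns {op: 'pass' | 'fail' | 'skip'}; a failing tier is recorded per-op
    instead of aborting the run.
    """
    results: Dict[str, str] = {}
    try:
        deploy()  # the only deploy in the run; tiers and overlays reuse it
        for op, tier in tiers:
            if tier is None:
                results[op] = "skip"
                continue
            try:
                tier()
                results[op] = "pass"
            except Exception:
                results[op] = "fail"
    except Exception:
        # A failed deploy fails every op not already decided (N/A stays skip).
        for op, tier in tiers:
            results.setdefault(op, "skip" if tier is None else "fail")
    finally:
        teardown()  # runs after success, tier failure, and deploy failure alike
    return results
```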

DG8 — docs

docs/testing.md is complete + accurate: tier model, generic defaults, override/extend precedence (repo-local>cc-ci>generic), install-steps hook + graceful-generic rule, how to add an overlay, recipe_meta knobs. Correctly reflects F1d-1 (cert = infra sanity only) + F1d-2 (move-assertion) and encodes the DG7 rule ("Never weaken or skip an assertion — a red tier is information").

Secret-leak (carry-forward D6) — CLEAN

Per-line grep of build 154's published Drone log for every /run/secrets/* value (incl. the wildcard private key + cert): zero hits. Dashboard html: zero. (First grep pass mis-handled the PEM leading-dashes; re-run correctly = clean.)
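
The per-line scan can be sketched with literal substring matching. Literal matching sidesteps the PEM pitfall: a pattern such as `-----BEGIN PRIVATE KEY-----` handed to `grep` without `-e` or `--` can be parsed as an option. This is a minimal sketch under these assumptions:

* Secrets are files under `/run/secrets/`.
* The Drone log has already been fetched as text.
* `find_leaks` and `load_secrets` are hypothetical.
* The 8-character floor is an assumed guard against trivially short lines.

```python
from pathlib import Path
from typing import Dict, List

MIN_LINE = 8  # assumed floor: ignore very short lines to avoid noise matches


def find_leaks(secrets: Dict[str, str], text: str) -> List[str]:
    """Return names of secrets any of whose lines appear verbatim in `text`."""
    leaked = []
    for name, value in secrets.items():
        for line in value.splitlines():
            line = line.strip()
            # Literal containment: no regex and no option parsing of leading dashes.
            if len(line) >= MIN_LINE and line in text:
                leaked.append(name)
                break
    return sorted(leaked)


def load_secrets(root: str = "/run/secrets") -> Dict[str, str]:
    """Read every secret file under `root` (needs access to that path)."""
    return {p.name: p.read_text(errors="replace") for p in Path(root).iterdir() if p.is_file()}
```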

Honest limitation

Non-member rejection was NOT re-tested live this phase (I have no non-member account to comment with). It is confirmed by code (is_authorized → GET /orgs/{owner}/members/{user}==204, fail-closed; bridge unchanged from Phase-1's live verification) — not a Phase-1d deliverable, recorded for honesty.
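
The fail-closed membership gate can be sketched with an injected status fetcher, so the decision logic is testable without a live forge. This is a minimal sketch that assumes Gitea's org-membership endpoint (204 = member). The `/api/v1` base URL, the `token` auth header, `is_authorized`, and `urllib_status` are assumptions for illustration, not the bridge's code.

```python
import urllib.error
import urllib.request
from typing import Callable
from urllib.parse import quote


def is_authorized(fetch_status: Callable[[str], int], owner: str, user: str) -> bool:
    """True only on HTTP 204; any other status or any exception denies (fail-closed)."""
    path = f"/orgs/{quote(owner, safe='')}/members/{quote(user, safe='')}"
    try:
        return fetch_status(path) == 204
    except Exception:
        return False


def urllib_status(base_url: str, token: str) -> Callable[[str], int]:
    """Build a fetcher that GETs base_url + path and returns the HTTP status."""
    def fetch(path: str) -> int:
        req = urllib.request.Request(base_url + path, headers={"Authorization": f"token {token}"})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                return resp.status
        except urllib.error.HTTPError as err:
            return err.code  # e.g. 404 for non-members; other URL errors propagate
    return fetch
```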

FINAL: DG1-DG8 all Adversary cold-verified PASS within 24h — NO VETO

DG1 PASS · DG2 PASS · DG3 PASS · DG4 PASS · DG4.1 PASS · DG5 PASS · DG6 PASS · DG7 PASS · DG8 PASS. Findings F1d-1 + F1d-2 both CLOSED. Builder is cleared to write ## DONE to STATUS-1d.md.


G3 / DG5 (+DG3 N/A-skip) — PASS @2026-05-28 (install-steps hook + graceful-generic)

Claim: custom-html-tiny generic install FAILS without install_steps.sh (graceful, per-op) and PASSES with it (hook seeds index.html pre-deploy); same run shows DG3 N/A-skip (non-backup-capable ⇒ backup/restore skip).

Method — cold, my own clone @origin/main (ce3c0f8, has the G3 files). Audited the hook (tests/custom-html-tiny/install_steps.sh seeds index.html into the <stack>_content volume after abra app new+env, before deploy; wired via discovery.install_steps → deploy_app) + ran both directions, toggling the hook in MY clone (never the Builder's).
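
The hook's place in the deploy sequence can be sketched as an ordered plan. This is a minimal sketch. The step labels and `deploy_plan` are hypothetical, chosen only to show two things: the hook runs after `abra app new` + env and before the deploy, and the slot is simply absent when discovery finds no `install_steps.sh`. How cc-ci actually invokes the script is not described here and is not assumed.

```python
from pathlib import Path
from typing import List, Optional


def deploy_plan(install_steps: Optional[Path]) -> List[str]:
    """Ordered steps for one deploy; the hook slot exists only if a script was found."""
    plan = ["abra app new", "write env"]
    if install_steps is not None:
        # e.g. seed index.html into the <stack>_content volume before first start
        plan.append(f"run {install_steps}")
    plan.append("abra app deploy")
    return plan
```

Because the hook precedes the deploy, a missing hook does not crash anything: the generic install simply deploys an app with nothing to serve, which is the per-op failure recorded below.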

Evidence (my clone on cc-ci):

  • DG5 fail-without (graceful): hook moved aside → RECIPE=custom-html-tiny STAGES=install → !! deploy/readiness failed: …not healthy over HTTPS / (last status 404) · install: fail · deploy-count=1. A recipe needing a step fails the generic install, REPORTED per-op (not a crash) — the graceful-generic rule.
  • DG5 pass-with: hook restored → install: pass (the hook seeded content so the app serves).
  • DG3 N/A-skip: same hook-present run with all stages → install: pass · upgrade: pass · backup: skip · restore: skip (custom-html-tiny backup_capable=False) · deploy-count=1 — skip, not failure.
  • Bonus move-assertion robustness: custom-html-tiny upgrade 1.0.0+2.38.0 → 1.0.1+2.38.0 (same image 2.38.0, only the coop-cloud version label changes) still PASSED — confirms the F1d-2 move-assertion detects an image-identical version bump via the label.
  • Clean teardown: no run-app services after.
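
The per-op skip rules exercised in this run can be sketched as a planning function. This is a minimal sketch that assumes three inputs: the `backup_capable` flag, the number of published versions, and whether any custom tests were discovered. `plan_ops` is hypothetical.

```python
from typing import Dict


def plan_ops(backup_capable: bool, n_versions: int, has_custom: bool) -> Dict[str, str]:
    """Decide 'run' or 'skip' per op; a skip means N/A, never a failure."""
    return {
        "install": "run",
        "upgrade": "run" if n_versions >= 2 else "skip",  # needs a previous version
        "backup": "run" if backup_capable else "skip",
        "restore": "run" if backup_capable else "skip",   # nothing to restore without a backup
        "custom": "run" if has_custom else "skip",
    }
```

For custom-html-tiny (two versions, not backup-capable, no custom tests), this yields the install/upgrade run and backup/restore/custom skip pattern recorded above.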

Verdict: DG5 + DG3 N/A-skip PASS — G3 cleared. No VETO.


G2 / DG4+DG4.1 — PASS @2026-05-28 (override + extend + reuse-deployment)

Claim: custom-html overlays override the generic for all 4 ops AND extend by composition, with data-continuity; deploy-count=1 (no redeploy); precedence repo-local>cc-ci>generic + no-overlay⇒generic.
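
The precedence rule can be sketched as a lookup over the two overlay locations, with the generic tier as the fallback. This is a minimal sketch that represents each location as the set of filenames it contains. `resolve_op` echoes the name cited in the DG4 evidence, but this body is illustrative, not the cc-ci implementation. `custom_tests` and `resolve_install_steps` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

LIFECYCLE = ("install", "upgrade", "backup", "restore")


def resolve_op(op: str, repo_local: Set[str], ccci: Set[str]) -> Tuple[str, str]:
    """Pick the tier file for one lifecycle op: repo-local > cc-ci > generic."""
    name = f"test_{op}.py"
    if name in repo_local:
        return "repo-local", name
    if name in ccci:
        return "cc-ci", name
    return "generic", f"tests/_generic/{name}"


def custom_tests(repo_local: Set[str], ccci: Set[str]) -> List[str]:
    """Additive custom tests: every test_*.py that is not a lifecycle override."""
    reserved = {f"test_{op}.py" for op in LIFECYCLE}
    found = {f for f in repo_local | ccci if f.startswith("test_") and f.endswith(".py")}
    return sorted(found - reserved)


def resolve_install_steps(repo_local: Set[str], ccci: Set[str]) -> Optional[str]:
    """install_steps.sh: repo-local wins over cc-ci; None means a plain generic deploy."""
    for origin, files in (("repo-local", repo_local), ("cc-ci", ccci)):
        if "install_steps.sh" in files:
            return origin
    return None
```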

Method — cold, my own clone @c965f6c (G3's later commit only adds custom-html-tiny files; G2 code unchanged). Audited the overlays (assertion-only; reuse generic.assert_serving/do_upgrade/do_backup/do_restore; data markers via exec_in_app) + ran the discovery unit tests + the full overlay lifecycle.

Evidence (my clone on cc-ci):

  • Precedence + invariant (DG4): cc-ci-run -m pytest tests/unit → 5/5 passed — proves resolve_op = generic when no overlay (hedgedoc), = cc-ci for custom-html's 4 ops, repo-local wins a same-name collision, custom tests additive (lifecycle names excluded), install-steps repo-local>cc-ci.
  • Override LIVE (DG4): RECIPE=custom-html STAGES=install,upgrade,backup,restore → every TIER line reads (cc-ci: tests/custom-html/test_<op>.py) (NOT generic) — the overlays ran instead of the generic for all four ops. All 4 green.
  • Extend-by-composition + data-continuity: install overlay = generic.assert_serving + a Playwright HTML check; upgrade overlay seeds a marker → upgrades → asserts it survived; backup overlay original→snapshot→mutate; restore overlay restores → asserts the volume marker is back to "original".
  • Reuse deployment (DG4.1): deploy-count = 1 with overlays present (no extra new/deploy/undeploy); overlays are assertion-only and never call deploy_app (audited). Clean teardown (re-verified: no run-app services/volumes/envs after).
  • The custom-html upgrade tier also moved genuinely (the F1d-2 move-assertion would have raised otherwise; custom-html prev=1.10.0+1.28.0 → target=1.11.0+1.29.0).
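
The marker pattern behind the backup and restore overlays can be sketched against an in-memory stand-in for the app volume. This is a minimal sketch. `FakeVolume`, its methods, and `backup_restore_roundtrip` are hypothetical stand-ins for `exec_in_app` and the real `do_backup`/`do_restore`. They are used only to show the original→snapshot→mutate→restore sequence that the restore overlay asserts.

```python
import copy
from typing import Dict, List


class FakeVolume:
    """In-memory stand-in for an app volume with restic-like snapshots."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.snapshots: List[Dict[str, str]] = []

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def read(self, path: str) -> str:
        return self.files[path]

    def snapshot(self) -> int:
        self.snapshots.append(copy.deepcopy(self.files))
        return len(self.snapshots) - 1  # index of the new snapshot

    def restore(self, snap_id: int) -> None:
        self.files = copy.deepcopy(self.snapshots[snap_id])


def backup_restore_roundtrip(vol: FakeVolume, path: str = "marker.txt") -> str:
    """Seed, snapshot, mutate, restore; the marker must come back as 'original'."""
    vol.write(path, "original")
    snap = vol.snapshot()       # backup overlay: snapshot while the marker is original
    vol.write(path, "mutated")  # mutate after the snapshot
    vol.restore(snap)           # restore overlay
    assert vol.read(path) == "original", "restore did not bring the marker back"
    return vol.read(path)
```

Mutating after the snapshot is what makes the check meaningful: without it, a restore that did nothing would still find "original".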

Verdict: DG4 + DG4.1 PASS — G2 cleared. No VETO.


F1d-2 — CLOSED @2026-05-28 (upgrade non-vacuous; verified both directions)

Builder fix 81e26a1 (recipe_checkout to the pinned tag + non-chaos pinned deploy + a version/image move-assertion in do_upgrade). Re-tested cold from my clone: a genuine prev→target upgrade MOVES (1.10.7→1.10.8, CHANGED) and a no-op upgrade now RAISES. Matches my recommended fix (land the real previous tag + assert the version actually changed). F1d-2 closed.


F1d-1 — CLOSED @2026-05-27 (cert-check reframe verified honest)

The Builder reframed served_cert/assert_serving (commit 6c5d8f2): docstrings + comments now scope the cert check as an INFRA TLS sanity check (catches a lapsed/mis-rotated wildcard) and explicitly state it does NOT distinguish app-vs-fallback (citing F1d-1), with the serving proof being services_converged + non-404 status. Behavior is unchanged (still a valid infra check) and the overstated claim is gone — matches my recommended fix. F1d-1 closed.