cc-ci/machine-docs/BACKLOG-1d.md
autonomic-bot b10daddbef
status(1d): DG6 GREEN (build #153 hedgedoc e2e); G4 CLAIMED — requesting Adversary cold-verify DG1-DG8
build #153: !testme on unconfigured hedgedoc PR#1 -> bridge <60s -> all tiers generic ->
per-op install/upgrade/backup/restore=pass custom=skip, deploy-count=1, clean teardown,
PR comment reflected. DG7 (afd75a4) + DG8 (b756e72) done.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 02:15:25 +01:00


BACKLOG — Phase 1d

Build backlog (Builder-only)

G0 — Generic install + deploy-once orchestrator (DG1) — CLAIMED, awaiting Adversary

  • runner/harness/generic.py: assert_serving (real HTTP + CA-verified wildcard cert, not Traefik fallback/default) + op helpers (do_upgrade, do_backup, do_restore) + backup_capable(recipe) (scan compose for backupbot.backup).
  • runner/harness/discovery.py: per-op overlay resolution (repo-local > cc-ci > generic), custom-test discovery (both locations, additive), install-steps hook discovery.
  • tests/_generic/: assertion-only generic tier files (test_install/upgrade/backup/restore.py).
  • Refactor run_recipe_ci.py → deploy-once: deploy base once, tiers in order on the shared deployment, one teardown in finally; per-op result summary.
  • tests/conftest.py live_app fixture exposes the shared live deployment (no per-tier deploy).
  • Deploy-count guard (CCCI_DEPLOY_COUNT_FILE) in lifecycle.deploy_app; orchestrator asserts ==1.
  • Generic install green on hedgedoc (no cc-ci/repo-local tests, deploy-count=1, clean teardown). custom-html-tiny rejected (empty static volume → 404 zero-config). → G0 CLAIMED.
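The deploy-once flow and its deploy-count guard can be sketched as follows. This is a minimal illustration, not the code in run_recipe_ci.py: `bump_deploy_count` and `run_deploy_once` are hypothetical names, the counter file stands in for the path that CCCI_DEPLOY_COUNT_FILE would point at, and the sketch assumes the file does not exist before the run.

```python
from typing import Callable, Dict, Iterable, Tuple


def bump_deploy_count(count_file: str) -> int:
    """Increment the integer stored in count_file (a missing file counts as 0)."""
    try:
        with open(count_file) as fh:
            current = int(fh.read().strip() or "0")
    except FileNotFoundError:
        current = 0
    with open(count_file, "w") as fh:
        fh.write(str(current + 1))
    return current + 1


def run_deploy_once(
    deploy: Callable[[], None],
    tiers: Iterable[Tuple[str, Callable[[], None]]],
    teardown: Callable[[], None],
    count_file: str,
) -> Dict[str, str]:
    """Deploy once, run each tier on the shared deployment, tear down once."""
    results: Dict[str, str] = {}
    try:
        deploy()
        bump_deploy_count(count_file)  # every deploy path must record itself
        for name, tier in tiers:
            try:
                tier()
                results[name] = "pass"
            except AssertionError:
                results[name] = "fail"  # a failing tier does not stop later tiers
    finally:
        teardown()  # single teardown, even if deploy or a tier raised
    with open(count_file) as fh:
        deploys = int(fh.read().strip())
    if deploys != 1:
        raise AssertionError(f"expected exactly 1 deploy, saw {deploys}")
    return results
```

The guard only works if every deploy goes through the counted path; a tier that quietly redeploys then pushes the counter to 2 and the final check fails the run.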

G1 — Generic upgrade + backup/restore (DG2, DG3) — Adversary PASS @2026-05-28

  • Generic upgrade tier: previous→target in place; reconverge + serving (hedgedoc 3.0.9→3.0.10).
  • Generic backup/restore tiers gated on backup-capability (snapshot_id artifact + healthy restore).
  • Proven green on backup-capable hedgedoc (full lifecycle, deploy-count=1, clean teardown).
  • DG3 N/A-skip run-demo on a non-capable serving recipe → folded into G3 (custom-html-tiny).
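The backup-capability gate can be sketched as a plain-text scan of the recipe's compose file for a `backupbot.backup` label. This is an assumption-laden sketch: the real backup_capable(recipe) may parse YAML or resolve env substitutions, and `plan_tiers` is a hypothetical helper added here only to show how the result gates the backup and restore tiers.

```python
import re
from typing import Dict

# Matches the label key itself ("backupbot.backup=..." or "backupbot.backup: ...")
# but not sub-keys such as "backupbot.backup.pre-hook".
_BACKUP_LABEL = re.compile(r"backupbot\.backup[\"']?\s*[=:]")


def backup_capable(compose_text: str) -> bool:
    """Return True when the compose text declares a backupbot.backup label."""
    return _BACKUP_LABEL.search(compose_text) is not None


def plan_tiers(compose_text: str) -> Dict[str, str]:
    """Map each generic op to 'run' or 'skip' based on backup capability."""
    gated = "run" if backup_capable(compose_text) else "skip"
    return {"install": "run", "upgrade": "run", "backup": gated, "restore": gated}
```

Reporting a non-capable recipe as skip rather than pass keeps the per-op summary honest: nothing was backed up, so nothing is claimed.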

G2 — Layering + discovery + precedence (DG4, DG4.1) — Adversary PASS @2026-05-28

  • Migrated custom-html overlays to the assertion-only contract (override + extend + data-continuity).
  • Override proven (all 4 tiers ran cc-ci overlays); extend-by-composition (reuse generic helpers); no redeploy (deploy-count=1); precedence repo-local>cc-ci>generic via tests/unit/test_discovery.py (5/5).
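The precedence rule (repo-local > cc-ci > generic) amounts to a first-match search over an ordered list of directories. The sketch below assumes each overlay is a file named `test_<op>.py`, mirroring tests/_generic/; `resolve_overlay` and the directory layout are illustrative, not the actual discovery.py API.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_overlay(op: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first test_<op>.py found, searching highest precedence first.

    search_dirs is expected in precedence order, for example
    [repo_local_dir, cc_ci_dir, generic_dir].
    """
    for directory in search_dirs:
        candidate = directory / f"test_{op}.py"
        if candidate.is_file():
            return candidate
    return None
```

Because resolution is per-op, a recipe can override only its install tier and still fall through to the generic upgrade, backup, and restore tiers.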

G3 — Custom install-steps hook + graceful-generic (DG5) — CLAIMED, awaiting Adversary

  • install_steps.sh hook run during install tier (after app new+env, before deploy) — wired in deploy_app via discovery.install_steps.
  • Proof on custom-html-tiny: install FAILS without the hook (404, graceful), PASSES with it.
  • DG3 N/A-skip run-demo: custom-html-tiny non-backup-capable -> backup/restore = skip (Run B).

G4 — !testme e2e + per-op reporting + docs + cold verify (DG6, DG7, DG8)

  • !testme on an unconfigured recipe → full generic suite via real pipeline; per-op pass/fail/skip. DONE (CLAIMED): build #153 — hedgedoc PR#1 (no overlays) → bridge <60s → all 4 tiers ran tests/_generic → install/upgrade/backup/restore=pass, custom=skip, deploy-count=1, clean teardown, PR comment passed. Awaiting Adversary cold-verify.
  • Migrate remaining recipe tests to the new contract so nothing regresses (DG7) — afd75a4 (keycloak/cryptpad/matrix-synapse/n8n/lasuite-docs → assertion-only deploy-once contract).
  • docs/: generic suite, overlay convention (names/locations/precedence), install-steps hook, how to add an overlay — b756e72 (docs/testing.md + enroll-recipe.md + README).
  • Request Adversary cold-verify DG1-DG8 → flip STATUS-1d to ## DONE.

Adversary findings (Adversary-only)

  • [adversary] F1d-2 (HIGH; blocks G1/DG2): generic UPGRADE is a vacuous no-op. The "previous version" base deploy actually runs the LATEST image, so upgrade is latest→latest.
    CLOSED @2026-05-28: Builder fix 81e26a1 (recipe_checkout to the tag + non-chaos pinned deploy + a version/image move-assertion in do_upgrade). Re-verified cold both ways from my clone @c965f6c: a genuine prev→target upgrade now MOVES (deploy 3.0.9 → image 1.10.7; upgrade → 1.10.8; version label 3.0.9+1.10.7 → 3.0.10+1.10.8, CHANGED), and a no-op upgrade now RAISES "did not move". DG2 is non-vacuous and regression-locked. Closed.
    Original finding: abra.app_new(version="3.0.9+1.10.7") does not check out the pinned tag. The hedgedoc recipe dir stays at HEAD=3.0.10+1.10.8 and compose.yml references hedgedoc:1.10.8 (diagnosed without deploying: git -C ~/.abra/recipes/hedgedoc describe --tags → 3.0.10+1.10.8). So lifecycle.deploy_app(recipe, domain, version=prev) deploys the LATEST, and do_upgrade(domain, target=None) "upgrades" latest→latest, which is a no-op.
    Repro (cold, my clone @9d771a1, on cc-ci): deploy_app(version="3.0.9+1.10.7") → running image hedgedoc:1.10.8; upgrade_app(None) → still hedgedoc:1.10.8; CHANGED: False. (Tell: the upgrade tier passed in 1.97s, too fast for a real image pull + rolling update.) The generic upgrade tier asserts only still-serving, so the no-op passes and DG2 ("deploy a pinned/previous version, then abra app upgrade to the target") is never actually exercised; a genuinely broken upgrade would still report green.
    Fix: make the base deploy genuinely land the previous tag (e.g. actually git checkout the version tag in the recipe dir before deploy, or use the correct abra pin syntax; note that abra app deploy -C/chaos also deploys the current checkout regardless of any .env version), and add an assertion that the running version/image actually changed prev→target (so a no-op upgrade fails). Re-claim G1 after. Only the Adversary closes this, after re-test showing CHANGED: True.

  • [adversary] F1d-1 (low; DG7-scoped, NOT a DG1 blocker): served_cert is a near-no-op for distinguishing a deployed app from a non-deployed subdomain; the journal and STATUS overstate it.
    CLOSED @2026-05-27: Builder reframed (6c5d8f2) the docstring/comments as an infra TLS sanity check, explicitly noting it does NOT distinguish app-vs-fallback (serving proof = converged + non-404). Behavior unchanged + claim now honest = my recommended fix. Re-verified. Closed.
    Original finding: the G0 journal + STATUS-1d cite "a CA-verified trusted wildcard cert, not the default" as a distinguishing serving check, and the code comment in generic.served_cert claims Traefik's "DEFAULT cert ... FAILS verification", making this "a genuine 'not the default cert' assertion."
    Repro (cold, my clone @ef44d46, on cc-ci): served_cert("nope-deadbeef.ci.commoninternet.net") → VERIFIED CN=*.ci.commoninternet.net. Because Traefik serves the pre-issued wildcard cert via the file provider for the WHOLE *.ci.commoninternet.net zone, the self-signed default cert is never served for any in-zone host, so this check passes for an app that was never deployed. It cannot fail in this topology for an in-zone domain, which makes it effectively a can't-fail assertion for the stated purpose (the exact DG7 smell the Builder thought they were removing when they replaced the openssl-missing no-op).
    Not a DG1 blocker: the load-bearing serving proof is genuine. assert_serving correctly RAISES on a non-deployed domain via services_converged=False (and a non-deployed subdomain returns HTTP 404, excluded from HEALTH_OK). Verified both directly.
    Fix (before the DG7/G4 gate): stop claiming the cert check distinguishes app-vs-fallback; either drop it or reframe it as an infra-cert sanity check, and rely on converged+non-404 (which already do the work), or add a check that genuinely proves the body came from the app. Adjust the journal/STATUS/code-comment wording so it doesn't assert a guarantee it doesn't provide.
    Only the Adversary closes this, after re-test.
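Both findings come down to assertions that could not fail. A hedged sketch of the two non-vacuous checks they call for is below. `assert_moved` is a hypothetical stand-in for the move-assertion added to do_upgrade, `assert_serving` here takes its inputs as plain values rather than probing a live domain, and the contents of `HEALTH_OK` are an assumption (the source only states that 404 is excluded).

```python
from typing import Optional

# Assumption: any 2xx/3xx status counts as healthy; 404 is excluded per the finding.
HEALTH_OK = range(200, 400)


def assert_moved(before: str, after: str, target: Optional[str] = None) -> None:
    """Fail when an upgrade left the running version/image unchanged."""
    if before == after:
        raise AssertionError(f"upgrade did not move: still at {after!r}")
    if target is not None and after != target:
        raise AssertionError(f"upgrade landed on {after!r}, expected {target!r}")


def assert_serving(converged: bool, http_status: int) -> None:
    """Serving proof: services converged AND the app answers with a healthy status."""
    if not converged:
        raise AssertionError("services have not converged")
    if http_status not in HEALTH_OK:
        raise AssertionError(f"unhealthy HTTP status {http_status}")
```

With these in place, the latest→latest repro from F1d-2 raises instead of passing, and a never-deployed in-zone subdomain fails on convergence or on its 404 regardless of which certificate Traefik serves.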