git checkout <head_ref> aborted on the untracked install_steps-provided compose.host-ports.yml (which
head_ref tracks). Force-checkout yields the exact ref tree. Also fixes the mumble restore tier: backup
labels exist only in 1.0.0+, so backup/restore are meaningful only after the (now-working) upgrade moves
the app to head_ref. DECISIONS.md updated.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The origin-repoint approach hit go-git 'reference not found' (mirror HEAD→master vs main). Simpler +
robust: detect lightweight version tags (has_lightweight_version_tags, read-only) and, for the pinned
base deploy of such a recipe, use chaos — which SKIPS abra lint (so no R014 FATA) and deploys the
EXPLICITLY-checked-out pinned version (recipe_checkout already ran; chaos uses the current checkout,
so it's the prev version, NOT LATEST — F1d-2's hazard was the missing checkout). No-op / stays pinned
for all-annotated recipes. The upgrade tier's prev→PR-head crossover + HC1 (chaos-version==head_ref)
still hold (verified by the run's upgrade-tier log).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
--bare lacked refs/heads/main, so abra's post-normalize git ops (app secret insert / deploy) failed
'unable to fetch tags: reference not found' when fetching from the repointed local origin. --mirror
copies all refs (heads+tags) → abra fetch OK + R014 passes (both verified).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diagnosed: abra runs git fetch --tags --force from origin before its pinned-deploy lint, so
re-annotating the lightweight tag in place is reverted before R014 runs. Fix: after re-annotating,
clone the recipe to a local bare repo (carrying the annotated tag) and repoint origin at it, so
abra's force-fetch pulls the annotated tag. Verified: abra recipe lint R014 then PASSES and the
annotation sticks. Deployed commit unchanged. No-op for all-annotated recipes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
lasuite-meet upgrade tier failed at the prev-version base deploy: abra's pinned-deploy lint FATA'd on
R014 'only annotated tags used for recipe version' because upstream coop-cloud lasuite-meet ships a
stray LIGHTWEIGHT tag (0.3.0+v1.16.0). chaos deploys skip lint (so install,custom passed) but the
upgrade tier's pinned prev-version deploy lints. New abra.normalize_recipe_tags() re-creates each
lightweight version tag as annotated at the SAME commit (no deployed content changes); called in
lifecycle.deploy_app after recipe_checkout when version is pinned. Idempotent; no-op for all-annotated
recipes (lasuite-drive etc.). Helps any recipe with a stray upstream lightweight tag.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversary cold-verify FAILed Q3.2 (F2-12): the prev→PR-head chaos upgrade's abra converge monitor
FATAs while the NEW collabora 25.04.9.4.1's healthcheck is still in start_period (jail/config init),
even though it converges given swarm's healthcheck retries. My WOPI pre-gate fixed the OLD collabora
being killed mid-boot but not the NEW collabora's convergence. Flaky (3x green for me, 1x fail cold).
Fix (cc-ci-side, stronger verification — not weaker):
- abra.deploy gains no_converge_checks (`-c`); chaos_redeploy passes it for the upgrade op so abra's
impatient monitor no longer FATAs (the stack spec is applied regardless).
- perform_upgrade now OWNS the convergence verification after the redeploy: wait_healthy (services
N/N + app HEALTH_PATH) + new lifecycle.wait_ready_probes (recipe READY_PROBE), bounded by the
recipe DEPLOY_TIMEOUT (generous) not abra's impatient window. meta threaded _perform_op→perform_upgrade.
- recipe_meta READY_PROBE hook (added to _load_meta whitelist): lasuite-drive probes collabora WOPI
discovery (/hosting/discovery on collabora-<domain>) → 200. Called after install deploy AND after
the upgrade redeploy. No-op for recipes without a READY_PROBE.
NOT re-claiming yet — validating the upgrade tier is now reliably green (incl. the slow-collabora
crossover) across multiple runs before re-claiming Q3.2. F2-12 stays open (Adversary-owned).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
F1e-1 (Adversary): exec_in_app silently returned '' on a failed docker exec, flipping a healthy
recipe RED under opt-out (post-backup container cycle, no readiness buffer). Now polls (re-resolve
container + re-exec) until rc==0 or 90s, then RAISES — never masks an exec failure as empty data.
No assertion weakened. Verified: opt-out install,backup,restore on custom-html now PASS.
HC1: head_ref = ref or recipe_head_commit (prefer explicit PR head sha $REF — robust, no git race;
production !testme always sets REF). assert_upgraded, when head_ref known, REQUIRES the deployed
chaos-version commit to MATCH head_ref (direct + non-vacuous proof the PR-head code was deployed; a
stale prev-checkout chaos redeploy fails). Falls back to version/image/chaos move check otherwise.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- deploy_app: checkout the pinned tag + deploy NON-chaos when a version is pinned (chaos only for
version=None / PR-head). Was always -C, which ignored the pin and deployed LATEST -> upgrade no-op.
- do_upgrade: assert the deployment actually MOVED (coop-cloud version label and/or image changed)
via lifecycle.deployed_identity -> a vacuous no-op upgrade can no longer pass (DG2).
- G2: migrate custom-html overlays to the assertion-only contract (override + extend-by-composition
+ data-continuity; split backup/restore). tests/unit/test_discovery.py proves precedence (5/5).
Probe (Adversary's F1d-2 test): hedgedoc deploy-prev=1.10.7 -> upgrade=1.10.8, CHANGED=True.
hedgedoc full generic lifecycle green (install/upgrade/backup/restore, deploy-count=1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- backup artifact: read snapshot_id from 'abra app backup create' output (snapshots needs a TTY);
generic.parse_snapshot_id + do_backup assert it
- restore serving race: lifecycle.http_fetch (one request -> status+body, never raises) +
assert_serving is now a bounded poll (settles a post-op reconverge, no bare sleep); drop wait_serving
- F1d-1 (Adversary, low): reframe served_cert/assert_serving honestly as an INFRA TLS sanity check
(catches a lapsed/mis-rotated wildcard cert), NOT app-vs-fallback (Traefik serves the wildcard
zone-wide); the genuine serving proof is services_converged + non-404 status. Awaiting re-test.
DG1 Adversary PASS @ef44d46. G1 full-lifecycle re-verification in flight.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Diagnosed via instrumented diag: lasuite-docs upgrade reported 'FATA deploy failed' while all 9
services converged 1/1 — abra's convergence poll gives up too early on the slow stop-first roll
(pulling new images). Disable abra's check; the harness wait_healthy + data-survival assertion is
the real, more-patient gate (a genuine failure still fails the test: app never gets healthy).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes surfaced by the first real recipe-ci run through Drone:
- abra app backup/restore now pass -C -o (current checkout, no remote fetch) like
every other recipe-touching call — without -o they fetch recipe tags from the
(private) remote and fail 'authentication required: Unauthorized'.
- fetch_recipe's catalogue path rm's the recipe dir first so a leftover private-mirror
remote from a prior SRC+REF run can't poison version resolution / backup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
D4 snapshots recipe-shipped tests/ and runs them against the live app. abra -C -o
everywhere + token clone for private mirror PRs. keycloak install green with no
harness surgery (D5). docs/enroll-recipe.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3-stage run green (install/upgrade/backup), clean teardown. backupbot deployed
via reconcile oneshot; PTY (script) for abra backup/restore; -m for secret generate
(no value leak). M5 CLAIMED.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>