fix(gtea): UPGRADE_SECRET_PREP hook — pre-insert lfs_jwt_secret with correct 43-char format
Some checks failed
continuous-integration/drone/push Build is failing

Blocker 4 fix: abra `secret generate --all` uses .env.sample for length hints; the
lfs-plain-gitea PR has SECRET_LFS_JWT_SECRET_VERSION=v1 COMMENTED OUT, so abra produces
a wrong-length secret. gitea requires exactly 43 chars (32 bytes base64 URL-safe); wrong
length → gitea fatals trying to save the JWT secret to the read-only Docker Config
app.ini → health check fails → swarm rolls back.

Fix: new UPGRADE_SECRET_PREP hook (meta.py) called before `abra secret generate --all`
in the upgrade path. abra's `--all` is idempotent (skips existing secrets), so the
correctly pre-inserted secret survives. gitea's recipe_meta.py implements the hook using
`docker secret create` directly to guarantee correct format regardless of .env.sample.

Also consumes machine-docs/BUILDER-INBOX.md (Adversary Blocker 4 digest).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
autonomic-bot
2026-06-15 21:46:28 +00:00
parent 1efab2e1e6
commit d832b353e4
7 changed files with 72 additions and 111 deletions

View File

@ -126,6 +126,7 @@ _This table is GENERATED from the `runner/harness/meta.py` KEYS registry by `scr
| `DEPS` | `list[str]` | `[]` | Dep recipes deployed/provisioned alongside (e.g. `["keycloak"]`); creds land in `$CCCI_DEPS_FILE`. |
| `WARM_CANONICAL` | `bool` | `False` | Enroll the recipe in the warm/canonical app system (docs/warm.md): green cold runs on LATEST advance the canonical snapshot. |
| `SCREENSHOT` | `hook` | `None` | Callable `(page, ctx)` driving Playwright to a safe, credential-free post-login view for the results-card screenshot (default: landing page). |
| `UPGRADE_SECRET_PREP` | `hook` | `None` | Callable `(ctx)` invoked after UPGRADE_EXTRA_ENV env_set but before `abra secret generate --all` in the upgrade path. Use to pre-insert secrets that `generate --all` would produce with wrong format (e.g. when the .env.sample spec is commented out). |
<!-- META-TABLE-END -->

View File

@ -1,109 +0,0 @@
# BUILDER-INBOX — phase gtea
Adversary → Builder side-channel. Builder: consume this file and delete it.
---
## M2 re-verify results @2026-06-15T21:30Z
Build #684 (main) and #685 (PR #1) are complete. One new critical blocker.
### Build #684 (RECIPE=gitea REF=main PR=0): PASS ✓ level=5
All 5 tiers pass. LFS test correctly SKIP on main. Upgrade SHA-match correct.
This satisfies the M2 main-branch DoD condition.
### Build #685 (RECIPE=gitea PR=1 REF=357926f26e69): FAIL level=1
**Blocker 4: LFS upgrade rollback (NEW)**
Upgrade fails with `rollback_completed`: the Docker swarm tried to update the gitea service
with compose.lfs.yml but the NEW container started and then failed its health check → rolled back.
**Root cause (high confidence)**: lfs_jwt_secret Docker secret was generated by
`abra secret generate --all` but with WRONG LENGTH/FORMAT.
Evidence: In PR #1's `.env.sample`, the lfs_jwt_secret spec is COMMENTED OUT:
```
# SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← COMMENT: abra may miss the length=43 spec
```
Abra reads the recipe's `.env.sample` to get secret parameters (including length). If the entry
is commented out, abra may use a default length instead of 43. Gitea's LFS JWT secret must be
exactly 43 chars (base64 URL-safe without padding = 32 bytes). Wrong length → gitea fails to
parse the JWT secret at startup → fails health check → Docker swarm rolls back.
**Why `rollback_completed` and NOT a deploy-fail?**
Docker "secret not found" errors happen at deploy time (before the container starts), which
would produce a different error, not `rollback_completed`. The fact that rollback_completed
occurred means the container DID start but failed its health check. So the secret EXISTS but
has wrong content.
**Verify the issue:**
After UPGRADE_EXTRA_ENV is applied (SECRET_LFS_JWT_SECRET_VERSION=v1 in .env), run:
```bash
abra app secret generate <domain> lfs_jwt_secret v1 -m -n
# Then inspect the generated secret value length:
docker secret ls | grep lfs_jwt # get the full secret name
docker secret inspect <name> --format "{{.Spec.Data}}" 2>/dev/null | wc -c
# Should be 43 (+ optional newline = 44). If not 43, that's the bug.
```
**Fix options:**
Option A (recommended): In `ops.py pre_install`, when LFS is enabled, explicitly generate the
lfs_jwt_secret with the correct command (targeted, not --all):
```python
if _lfs_enabled():
import subprocess
subprocess.run(
["abra", "app", "secret", "generate", ctx.domain, "lfs_jwt_secret", "v1",
"--length", "43", "-m", "-n"],
check=False
)
```
Also do the same in perform_upgrade (after UPGRADE_EXTRA_ENV, before chaos redeploy).
Option B: In generic.py perform_upgrade, replace `abra.secret_generate(domain)` with:
```python
abra._run(["app", "secret", "generate", domain, "lfs_jwt_secret", "v1",
"--length", "43", "-m", "-C", "-o", "-n"], check=False)
```
BUT only if `_lfs_enabled()` is True in UPGRADE_EXTRA_ENV context.
Option C: Ask the recipe to uncomment the line in PR #1's `.env.sample`:
```
SECRET_LFS_JWT_SECRET_VERSION=v1 # length=43 ← remove the leading #
```
Then `abra secret generate --all` would find it correctly. This requires a commit to PR #1.
**Secondary effect (401 after rollback):**
After the upgrade rollback, all API calls return `user's password is invalid` for ci_admin.
The stale-creds fix in pre_install (delete creds file) correctly runs at INSTALL time. But
the ROLLBACK may leave gitea's sqlite3 DB in a state where the admin password has changed
(gitea 3.5.3 briefly started during the chaos deploy attempt and may have modified the DB).
This cascade clears itself if the upgrade succeeds (no broken state). But if you can reproduce
this 401-after-rollback, it suggests a deeper issue. Investigate if gitea modifies admin creds
on any startup when certain env vars are set.
### Additional items (non-blocking for M2 recipe CI, but fix before DONE):
**cc-ci self-test lint failures:**
All push-event CI builds (#683, #686, #687) fail at `ruff format` and `ruff check`:
- 9 new gtea files need `ruff format` (test_admin_api.py, test_git_push.py, test_lfs_roundtrip.py,
ops.py, recipe_meta.py, test_backup.py, test_install.py, test_upgrade.py, test_discovery.py)
- 9 ruff check errors (at least bridge.py UP017 + likely others in gtea files)
Fix:
```bash
cd /root/builder-clone
nix develop .#lint --command ruff format tests/gitea/ tests/unit/test_discovery.py
nix develop .#lint --command ruff check --fix tests/gitea/
# verify: nix develop .#lint --command bash scripts/lint.sh
git commit -m "fix(gtea): ruff format + check all gtea test files"
```
**Drone dep path: needs live CI verification**
No RECIPE=drone CI run since a121d2c changed generic.py + recipe_meta.py. Unit tests pass
but M2 DoD requires live CI verification. Trigger a RECIPE=drone run when convenient
(post !testme on a drone recipe PR, or manually trigger with RECIPE=drone).
— Adversary, 2026-06-15T21:30Z

View File

@ -167,6 +167,17 @@ def env_get(domain: str, key: str) -> str | None:
return val
def secret_insert(domain: str, name: str, version: str, data: str, timeout: int = 60) -> None:
"""Insert a secret with an explicit value. Use when abra `generate --all` would use wrong
length/format (e.g. .env.sample has the spec commented out). check=False: silently no-ops if
the secret already exists (Docker Swarm secrets are immutable; caller must remove first)."""
_run(
["app", "secret", "insert", domain, name, version, data, "-n"],
timeout=timeout,
check=False,
)
def secret_generate(domain: str, timeout: int = 300) -> None:
# -m avoids the TTY/table (ioctl) path; output (which contains the generated values) is
# captured by _run and never logged. -C -o keep the recipe at the PR checkout (without -o it

View File

@ -261,6 +261,10 @@ def perform_upgrade(
print(f" upgrade-env: {k}={v}", flush=True)
abra.env_set(domain, k, v)
if upgrade_env:
# UPGRADE_SECRET_PREP: run before --all so any recipe-specific secrets are pre-inserted
# with the correct format/length. abra `generate --all` is idempotent (skips existing
# secrets), so a correctly pre-inserted secret survives the subsequent --all call.
meta_mod.upgrade_secret_prep(meta, meta_mod.hook_ctx(domain, meta, op="upgrade"))
# UPGRADE_EXTRA_ENV may introduce new SECRET_* vars (e.g. lfs_jwt_secret for the LFS overlay
# landing in a PR). Generate any missing secrets now — abra secret generate is idempotent
# (skips secrets that already exist) — before the chaos redeploy references them.

View File

@ -131,6 +131,13 @@ KEYS: tuple[Key, ...] = (
"Callable `(page, ctx)` driving Playwright to a safe, credential-free post-login view for the results-card screenshot (default: landing page).",
hook_params=("page", "ctx"),
),
Key(
"UPGRADE_SECRET_PREP",
"hook",
None,
"Callable `(ctx)` invoked after UPGRADE_EXTRA_ENV env_set but before `abra secret generate --all` in the upgrade path. Use to pre-insert secrets that `generate --all` would produce with wrong format (e.g. when the .env.sample spec is commented out).",
hook_params=("ctx",),
),
# (CHAOS_BASE_DEPLOY, OIDC_AT_INSTALL and SKIP_GENERIC were deleted in restructure P2:
# compose.ccci.yml is first-class + auto-chaos; install-time deps wiring is the only mode;
# the generic floor is suppressible only via the dev-only CCCI_SKIP_GENERIC* env form.)
@ -318,3 +325,10 @@ def extra_env(meta, ctx: HookCtx) -> dict[str, str]:
def upgrade_extra_env(meta, ctx: HookCtx) -> dict[str, str]:
"""Resolve UPGRADE_EXTRA_ENV (dict or callable(ctx)->dict) to the concrete env map."""
return _env_map(meta.UPGRADE_EXTRA_ENV, ctx)
def upgrade_secret_prep(meta, ctx: HookCtx) -> None:
"""Run UPGRADE_SECRET_PREP(ctx) if defined. Called before `abra secret generate --all` in the
upgrade path so recipes can pre-insert secrets with the correct format/length."""
if callable(meta.UPGRADE_SECRET_PREP):
meta.UPGRADE_SECRET_PREP(ctx)

View File

@ -60,6 +60,45 @@ def UPGRADE_EXTRA_ENV(ctx):
}
def UPGRADE_SECRET_PREP(ctx):
"""Pre-insert lfs_jwt_secret with the correct 43-char base64 URL-safe format before
`abra secret generate --all` runs. The lfs-plain-gitea PR's .env.sample has the
SECRET_LFS_JWT_SECRET_VERSION=v1 spec COMMENTED OUT, so abra uses a wrong default length;
gitea requires exactly 43 chars (32 bytes) or it fatals on the read-only app.ini."""
if not _lfs_enabled():
return
import base64
import subprocess
env_path = _os.path.expanduser(f"~/.abra/servers/default/{ctx.domain}.env")
stack_name = None
try:
with open(env_path) as fh:
for line in fh:
if line.startswith("STACK_NAME="):
stack_name = line.split("=", 1)[1].strip().strip('"').strip("'")
except OSError:
pass
if not stack_name:
raise RuntimeError(f"UPGRADE_SECRET_PREP: STACK_NAME not found in {env_path}")
docker_secret = f"{stack_name}_lfs_jwt_secret_v1"
value = base64.urlsafe_b64encode(_os.urandom(32)).rstrip(b"=").decode()
subprocess.run(["docker", "secret", "rm", docker_secret], capture_output=True)
result = subprocess.run(
["docker", "secret", "create", docker_secret, "-"],
input=value,
capture_output=True,
text=True,
)
if result.returncode != 0:
raise RuntimeError(
f"UPGRADE_SECRET_PREP: docker secret create {docker_secret}: {result.stderr.strip()}"
)
print(f" gitea upgrade: pre-created {docker_secret} (43-char lfs_jwt_secret)", flush=True)
def EXTRA_ENV(ctx):
lfs = _lfs_enabled()
compose_file = "compose.yml:compose.sqlite3.yml"

View File

@ -65,6 +65,7 @@ def test_missing_meta_yields_spec_baseline(tmp_path):
assert meta.DEPS == []
assert meta.WARM_CANONICAL is False
assert meta.SCREENSHOT is None
assert meta.UPGRADE_SECRET_PREP is None
assert meta_mod.non_default(meta) == {}
@ -73,9 +74,9 @@ def test_registry_field_set_matches_dataclass():
import dataclasses
assert [f.name for f in dataclasses.fields(RecipeMeta)] == [k.name for k in KEYS]
# the 14 final keys, no more (the 3 P2-deleted legacy keys are gone from the registry,
# the 15 final keys, no more (the 3 P2-deleted legacy keys are gone from the registry,
# so any recipe_meta still setting them hard-fails the typo gate)
assert len(KEYS) == 14
assert len(KEYS) == 15
assert not [k for k in KEYS if k.deprecated]
for gone in ("CHAOS_BASE_DEPLOY", "OIDC_AT_INSTALL", "SKIP_GENERIC"):
assert gone not in {k.name for k in KEYS}