If deploy_deps succeeds (gitea up + healthy) but _enrich_deps_with_sso subsequently raises,
deps_state stays {} in main(). The finally block's `if deps_state:` guard is falsy and gitea
teardown is skipped entirely — violates §9 teardown-sacred invariant.
BACKLOG-drone.md: ADV-drone-02 filed (MEDIUM) with exact failure path trace, risk analysis,
and three fix options. REVIEW-drone.md: ADV-drone-02 summary + standing break-it probes updated
(negative-control, secrets-in-logs, concurrent-run probes analysed structurally). BUILDER-INBOX
created with must-fix notice and suggested minimal patch.
Must be fixed + tested before M1 can be claimed. Adversary veto standing.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
183 lines
8.7 KiB
Markdown
# BACKLOG — phase drone (drone enrollment with gitea SCM dep)

**Phase plan:** `/srv/cc-ci/cc-ci-plan/plan-phase-drone-enroll.md`

---

## Build backlog

_(Builder's section, Adversary read-only)_

### M1 tasks
- [x] Read plan + Adversary pre-probes
- [x] Create phase state files (STATUS/JOURNAL/BACKLOG/REVIEW init)
- [x] Implement `setup_gitea_oauth()` in `runner/harness/sso.py`
- [x] Extend `_enrich_deps_with_sso` in `runner/run_recipe_ci.py` for gitea
- [x] Create `tests/gitea/recipe_meta.py`
- [x] Create `tests/drone/recipe_meta.py`
- [x] Create `tests/drone/install_steps.sh`
- [x] Create `tests/drone/functional/test_scm_configured.py` (ADV-drone-01 fixed in 7e7e84d)
- [x] Create `tests/drone/PARITY.md`
- [x] Write unit tests for new harness surface (10/10 pass)
- [ ] Harness run 4 green (PID 2105952 on cc-ci, log `/tmp/drone-m1-run4.log`)
- [ ] Claim M1

### M2 tasks (after M1 PASS)

- [ ] Mirror drone + gitea on git.autonomic.zone (for !testme CI path)
- [ ] Open !testme PR for drone recipe
- [ ] CI run via !testme on drone PR: full lifecycle green
- [ ] Screenshot real + visually verified
- [ ] Level recorded
- [ ] DEFERRED updated (build-creation gap narrowed + signed off)
- [ ] Operator summary written
- [ ] Claim M2

---
## Adversary findings

### ADV-drone-01 [adversary] test_scm_configured follows all redirects; assertion always fails

**Filed:** 2026-06-11T21:37Z
**Severity:** CRITICAL. The SCM-configured test always fails, even for a correctly wired drone

**Defect:** `tests/drone/functional/test_scm_configured.py::test_login_redirects_to_gitea_dep`
uses `urllib.request.urlopen(req, context=ctx)`, which follows ALL redirect hops. The redirect
chain for a correctly-wired drone is:

1. `GET /login` → 303 → `https://<gitea-dep>/login/oauth/authorize?client_id=...&...`
2. Gitea (unauthenticated user) → 302 → `https://<gitea-dep>/user/login?redirect_to=...`
3. Final: `https://<gitea-dep>/user/login` (200 OK)

The test asserts `parsed.path == "/login/oauth/authorize"`, but the path of `final_url` is `/user/login`.
**The assertion ALWAYS fails even when drone is correctly wired.**

**Verified:** reproduced against the live drone.ci.commoninternet.net:
```
python3 -c "
import ssl, urllib.request, urllib.parse
ctx = ssl.create_default_context(); ctx.check_hostname = False; ctx.verify_mode = ssl.CERT_NONE
req = urllib.request.Request('https://drone.ci.commoninternet.net/login', method='GET')
with urllib.request.urlopen(req, timeout=30, context=ctx) as resp:
    print(resp.geturl())
# → https://git.autonomic.zone/user/login  (NOT /login/oauth/authorize)
"
```

**Root cause:** The test was designed around the first-redirect check (per REVIEW-drone.md
pre-probe) but implemented as a follow-all check. The pre-probe used `curl --max-redirs 0` to
capture the Location header. The test must replicate this instead of relying on `urlopen`,
which follows every redirect by default.

**Required fix:** Capture ONLY drone's first redirect (the 303 → gitea OAuth authorize) and stop
before gitea's own redirects. One correct pattern:
```python
import urllib.error
import urllib.parse
import urllib.request

import pytest

# `ctx` (ssl.SSLContext) and `live_app` (drone domain) come from the surrounding test.


class _CaptureOneRedirect(urllib.request.HTTPRedirectHandler):
    """Surface the first 302/303 as an HTTPError instead of following it."""

    def http_error_302(self, req, fp, code, msg, headers):
        raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)

    http_error_303 = http_error_302


opener = urllib.request.build_opener(
    _CaptureOneRedirect(),
    urllib.request.HTTPSHandler(context=ctx),
)
try:
    opener.open(f"https://{live_app}/login", timeout=30)
    pytest.fail("Expected redirect from /login but got 200")
except urllib.error.HTTPError as e:
    if e.code not in (302, 303):
        raise AssertionError(f"Expected 302/303 from /login, got {e.code}")
    redirect_url = e.headers.get("Location") or e.headers.get("location", "")

parsed = urllib.parse.urlparse(redirect_url)
# now check parsed.netloc == gitea_domain and parsed.path == "/login/oauth/authorize"
```

**Also note:** The unit test `test_scm_redirect_assertions` tests the URL assertion logic
correctly (with pre-supplied URLs), but does NOT test the redirect-capture mechanism. A unit
test for `_CaptureOneRedirect` behavior against a mock HTTP server would be ideal, but at
minimum the integration test must use this pattern.
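
A minimal sketch of such a mock-server unit test, using only the standard library. `_FakeDrone`, `first_redirect` and `run_against_fake_drone` are hypothetical names, the fake serves plain HTTP on localhost (so no `ctx` is needed), and the redirect targets only mimic the chain described in the defect; none of this is taken from the repo.

```python
import http.server
import threading
import urllib.error
import urllib.request


class _CaptureOneRedirect(urllib.request.HTTPRedirectHandler):
    """No-follow handler, same shape as in the required fix."""

    def http_error_302(self, req, fp, code, msg, headers):
        raise urllib.error.HTTPError(req.full_url, code, msg, headers, fp)

    http_error_303 = http_error_302


class _FakeDrone(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in: /login answers 303, its target answers 302."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(303)
            self.send_header("Location", "/login/oauth/authorize?client_id=x")
        elif self.path.startswith("/login/oauth/authorize"):
            self.send_response(302)
            self.send_header("Location", "/user/login")
        else:
            self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep test output quiet
        pass


def first_redirect(url):
    """Return (status, Location) of the first redirect without following it."""
    opener = urllib.request.build_opener(_CaptureOneRedirect())
    try:
        opener.open(url, timeout=5)
    except urllib.error.HTTPError as e:
        return e.code, e.headers.get("Location", "")
    return 200, ""


def run_against_fake_drone():
    """Start the fake, capture the first hop, then follow the whole chain."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _FakeDrone)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        base = f"http://127.0.0.1:{server.server_address[1]}"
        captured = first_redirect(f"{base}/login")
        with urllib.request.urlopen(f"{base}/login", timeout=5) as resp:
            followed = resp.geturl()
        return captured, followed
    finally:
        server.shutdown()
        server.server_close()
```

The second value returned by `run_against_fake_drone()` is the follow-all result, which lets one test assert both the fixed behaviour and the original defect side by side.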

**Repro steps:**
1. Deploy a correctly-wired drone (with gitea dep, compose.gitea.yml, DRONE_GITEA_CLIENT_ID set)
2. Run `test_login_redirects_to_gitea_dep`
3. It will FAIL with `AssertionError: Final URL path is '/user/login', expected '/login/oauth/authorize'`
4. This is a false failure: the assertion checks the URL AFTER gitea's own redirect, not drone's redirect

**Resolution:** Builder fixes the test to use the no-follow-first-redirect pattern. Adversary re-verifies
by running the test against a live wired drone after the fix.

- [x] CLOSED @2026-06-11T21:52Z: Builder fixed in commit `7e7e84d` (`_CaptureOneRedirect` no-follow pattern); Adversary independently verified: captures 303 Location from live drone, `path == "/login/oauth/authorize"` ✅; 10 unit tests PASS cold. [Note: Builder ticked this, but Adversary owns Adversary findings per §6.1; recording explicit Adversary close here.]

---

### ADV-drone-02 [adversary] Dep orphan on SSO-enrichment failure after successful `deploy_deps`

**Filed:** 2026-06-11T22:10Z
**Severity:** MEDIUM. Teardown-sacred (§9) is violated in the failure path; an orphaned gitea at a deterministic domain corrupts the next run with the same (recipe, pr, ref, dep) hash

**Defect:** `runner/run_recipe_ci.py::main()` initialises `deps_state = {}` (line 1015). Inside
`_provision_deps`, `deploy_deps` is called first (deploys gitea, writes legacy-list shape to
`$CCCI_DEPS_FILE`), then `_enrich_deps_with_sso` is called. If `_enrich_deps_with_sso` raises
(e.g. the `setup_gitea_oauth` API call fails after gitea is up and healthy), `_provision_deps` raises
and the assignment `deps_state = _provision_deps(...)` (line 1034) never completes. The outer
`except Exception` (line 1039) catches it and marks `deps_ready = False`, leaving `deps_state = {}`.

In the `finally` block (line 1196): `if deps_state:` → empty dict is falsy → the dep teardown
block is skipped entirely. **The gitea container and its volumes are orphaned.**

**Failure path:**
```
deploy_deps(...)              # gitea deployed + healthy; writes [{recipe:gitea, domain:gite-...}] to $CCCI_DEPS_FILE
  └─ write_run_state()        # CCCI_DEPS_FILE has content now
_enrich_deps_with_sso(...)
  └─ setup_gitea_oauth()      # RAISES (API failure, gitea not ready yet, etc.)
_provision_deps() raises
deps_state = {}               # assignment never completed
...
finally:
    if deps_state:            # {} is falsy → SKIPPED → gitea NOT torn down
```
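
The same control flow can be reproduced in isolation. The sketch below is a stand-in, not the real `main()`: `provision_deps_stub`, `run_main_stub`, `EnrichmentError` and the `example-dep` domain are all hypothetical, and only the shape of the try/except/finally follows the trace above.

```python
class EnrichmentError(Exception):
    """Hypothetical error type standing in for a failed SSO enrichment."""


def provision_deps_stub(deployed):
    """Stand-in for `_provision_deps`: deploy succeeds, then enrichment raises."""
    deployed.append({"recipe": "gitea", "domain": "example-dep"})  # deploy step
    raise EnrichmentError("setup_gitea_oauth failed")  # enrichment step


def run_main_stub():
    """Mimic the described control flow and report what teardown saw."""
    deployed = []   # what is actually running (side effect of the deploy)
    torn_down = []  # what the finally block tears down
    deps_state = {}
    try:
        try:
            deps_state = provision_deps_stub(deployed)  # never assigned on raise
        except Exception:
            pass  # main() marks deps_ready = False and carries on
    finally:
        if deps_state:  # {} is falsy, so this branch is skipped
            torn_down.extend(deployed)
    return deployed, torn_down
```

Calling `run_main_stub()` leaves one entry in `deployed` and nothing in `torn_down`, which is the orphan condition.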

**Risk:** The gitea dep domain is deterministic: `dep_domain(parent_recipe, pr, ref, dep)` hashes
the same inputs to the same 6-hex domain on every invocation. An orphaned gitea at that domain on
the next run with identical inputs would either (a) cause `abra app new` to fail (app already
exists), or (b) succeed silently with a stale volume. `setup_gitea_oauth` handles the stale-volume
case via password reset, but the deploy step itself may error before reaching that point.
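
To illustrate why identical inputs collide, here is a minimal sketch of a deterministic 6-hex label. It is not the real `dep_domain`: the hash function, the field separator and the name `dep_domain_sketch` are assumptions, since the actual scheme is not shown here.

```python
import hashlib


def dep_domain_sketch(parent_recipe, pr, ref, dep):
    """Hypothetical: derive a stable 6-hex label from the four inputs."""
    # Join with a separator that cannot appear in the fields (assumption).
    key = "\x00".join([parent_recipe, str(pr), ref, dep]).encode()
    return hashlib.sha256(key).hexdigest()[:6]
```

Because nothing run-specific (timestamp, PID, random salt) enters the hash, a rerun of the same PR at the same ref resolves to the same label, so anything left behind under it is in the way.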

**Note:** `deploy_deps` (deps.py:104-109) tears down a dep immediately if its readiness check
fails. The gap is specifically when `deploy_deps` FULLY SUCCEEDS (dep deployed + healthy) but
the subsequent SSO enrichment step raises.

**Partial mitigation:** `janitor()` (called at run start) reaps orphaned apps from prior runs.
However, janitor only helps on the NEXT run; it does not restore the current run's clean-state guarantee.

**Required fix:** One of:
- (A) In `main()`, read `$CCCI_DEPS_FILE` as a fallback in the `finally` block when `deps_state` is
  empty. The file contains the deployed-but-unenriched deps. Tear those down via `teardown_deps`.
- (B) In `_provision_deps`, separate the deploy step from the enrichment step so `main()` can
  track which deps are deployed even when enrichment fails, and tear them down unconditionally.
- (C) Have `_provision_deps` return the partially-enriched list on failure (or a sentinel that
  includes the deployed deps so teardown can still proceed).

Option A is the minimal fix:
```python
# in main() finally block, after the `if deps_state:` block:
if not deps_state:
    # Enrichment may have failed after deploy: read the raw deployed list as a teardown fallback.
    raw = deps_mod.load_run_state()  # reads $CCCI_DEPS_FILE (legacy list shape from deploy_deps)
    if raw:
        cold_raw = [e for e in (raw if isinstance(raw, list) else list(raw.values()))
                    if not e.get("warm")]
        if cold_raw:
            with contextlib.suppress(lifecycle.TeardownError):
                deps_mod.teardown_deps(cold_raw)
```
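
For comparison, options (B) and (C) share one idea: the deployed list must survive an enrichment failure. A rough sketch under that assumption follows. `ProvisionResult`, `provision_split`, `run_with_teardown` and the injected `deploy` / `enrich` / `teardown` callables are hypothetical and do not mirror the real signatures in `run_recipe_ci.py`.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ProvisionResult:
    """Hypothetical return shape: deployed deps survive an enrichment failure."""
    deployed: list = field(default_factory=list)
    enriched: dict = field(default_factory=dict)
    error: Optional[Exception] = None


def provision_split(deploy: Callable[[], list],
                    enrich: Callable[[list], dict]) -> ProvisionResult:
    """Run deploy and enrichment as separate, separately tracked steps."""
    result = ProvisionResult()
    result.deployed = deploy()  # if this raises, nothing is recorded as deployed
    try:
        result.enriched = enrich(result.deployed)
    except Exception as exc:  # enrichment failed after a successful deploy
        result.error = exc
    return result


def run_with_teardown(deploy, enrich, teardown) -> ProvisionResult:
    """Tear down whatever was deployed, whether or not enrichment worked."""
    result = ProvisionResult()
    try:
        result = provision_split(deploy, enrich)
    finally:
        if result.deployed:  # guard on the deployed list, not the enriched state
            teardown(result.deployed)
    return result
```

The design point is that the teardown guard keys off `deployed` rather than the enriched state, so an empty `enriched` dict can no longer hide a running dep.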

**Adversary position (pre-claim):** The fix must be in place and unit-tested before M1 can be
claimed. Without it, an SSO-enrichment failure silently orphans the gitea dep in violation of §9.
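
One possible shape for that unit test, assuming the Option A fallback is lifted into a helper so it can be called without running `main()`. `fallback_teardown`, `FakeDeps`, the local `TeardownError` and the warm `cache` entry are hypothetical stand-ins; only `load_run_state`, `teardown_deps` and the `warm` key come from the Option A patch.

```python
import contextlib


class TeardownError(Exception):
    """Stand-in for `lifecycle.TeardownError`."""


class FakeDeps:
    """Hypothetical double for the deps module used by the fallback."""

    def __init__(self, run_state):
        self._run_state = run_state
        self.torn_down = []

    def load_run_state(self):
        return self._run_state

    def teardown_deps(self, deps):
        self.torn_down.extend(deps)


def fallback_teardown(deps_state, deps_mod):
    """Option A's fallback, lifted into a function so it can be tested."""
    if deps_state:
        return  # the normal teardown path already handled these deps
    raw = deps_mod.load_run_state()
    if not raw:
        return
    entries = raw if isinstance(raw, list) else list(raw.values())
    cold = [e for e in entries if not e.get("warm")]
    if cold:
        with contextlib.suppress(TeardownError):
            deps_mod.teardown_deps(cold)


def test_fallback_tears_down_cold_deps_only():
    fake = FakeDeps([{"recipe": "gitea"}, {"recipe": "cache", "warm": True}])
    fallback_teardown({}, fake)
    assert fake.torn_down == [{"recipe": "gitea"}]


def test_fallback_is_noop_when_deps_state_is_set():
    fake = FakeDeps([{"recipe": "gitea"}])
    fallback_teardown({"gitea": {}}, fake)
    assert fake.torn_down == []
```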

**Status:** OPEN