Add token-gated SDLC pipeline approvals by james-in-a-box[bot] · Pull Request #616 · jwbron/egg

james-in-a-box · 2026-02-13T07:23:24Z

Summary

Adds token-gated SDLC pipeline approvals to prevent agents from self-approving phase transitions
egg --sdlc <issue> generates two human-memorizable 3-word tokens, displayed before Claude starts
Phase approvals require correct token entered via /dev/tty (invisible to Claude), validated server-side with SHA-256 + timing-safe comparison
Existing resolve endpoint gated with 403 for token-gated pipelines

Details

Orchestrator changes:

New word list module (~200 words) for token generation
New /api/v1/sdlc-tokens/generate and /approve endpoints with in-memory store
Pipeline.sdlc_token_gated field gates the resolve endpoint
Pipeline creation auto-enables gating when pre-generated tokens exist

Sandbox changes:

--sdlc CLI flag passes EGG_SDLC_ISSUE env var to container
Entrypoint displays tokens, installs root-owned hook (0555), starts settings watchdog
UserPromptSubmit hook intercepts !approve <phase>, reads token from /dev/tty
Auto-start instruction appended to CLAUDE.md

Security mitigations:

Hook script: root-owned, mode 0555 (Claude can't modify)
Settings watchdog: background thread re-adds hook if removed from settings.json
Project-level settings.json redundancy

Issue: none

Test plan:

24 new tests covering word list validation, token generation, approval, rejection, case-insensitivity, cross-phase rejection, model fields
All 21 existing model tests pass
All 43/44 state store tests pass (1 pre-existing failure)
pytest tests/test_sdlc_tokens.py -v — all pass

Authored-by: egg

A Claude agent in an interactive session auto-approved its own SDLC pipeline decision by calling the orchestrator's unauthenticated resolve endpoint directly. This adds token-based approval where only humans can provide the secret, preventing agents from self-approving phase transitions. When `egg --sdlc <issue>` is used, the entrypoint generates two human- memorizable 3-word tokens (e.g., APPLE-HORSE-RIVER), displays them to the human, and installs a UserPromptSubmit hook. Phase approvals require the correct token entered via /dev/tty (invisible to Claude). The orchestrator validates tokens server-side with SHA-256 hashing and timing-safe comparison. The resolve endpoint is gated for token-gated pipelines, returning 403 if called directly. Security mitigations include root-owned hook script (0555), a background watchdog thread monitoring settings.json for hook removal, and project-level settings redundancy.

egg-reviewer

Review: Token-Gated SDLC Pipeline Approvals

I reviewed every changed file. There are several issues ranging from a security vulnerability to correctness bugs and design concerns.

1. SECURITY: Command injection via unsanitized JSON interpolation in shell hook

File: sandbox/.claude/hooks/sdlc-approve.sh:45

-d "{\"pipeline_id\": \"$PIPELINE_ID\", \"phase\": \"$PHASE\", \"token\": \"$TOKEN\"}")

$TOKEN is read from /dev/tty (user input) and interpolated directly into a JSON string passed to curl -d. A user entering a token like FOO", "extra": "bar would break the JSON structure. While $PHASE is constrained by the regex (refine|plan) and $PIPELINE_ID comes from a root-owned file, $TOKEN is arbitrary user input.

The risk is limited because this is a local-only call to the orchestrator, but it's still a JSON injection. Use jq to construct the JSON safely:

JSON_PAYLOAD=$(jq -n \
    --arg pid "$PIPELINE_ID" \
    --arg phase "$PHASE" \
    --arg token "$TOKEN" \
    '{pipeline_id: $pid, phase: $phase, token: $token}')
RESPONSE=$(curl -s -w "\n%{http_code}" -X POST \
    "$ORCH_URL/api/v1/sdlc-tokens/approve" \
    -H "Content-Type: application/json" \
    -d "$JSON_PAYLOAD")

2. SECURITY: Token-gate bypass — fail-open on exceptions in decisions.py

File: orchestrator/routes/decisions.py:300-303

except PipelineNotFoundError:
    pass  # Pipeline may not exist yet, allow resolution
except Exception:
    pass  # Don't block on state store errors

Both except clauses silently allow resolution to proceed. This means if the state store has a transient error (connection timeout, corrupt data, etc.), the token gate is bypassed entirely. For a security control, this should fail closed:

except PipelineNotFoundError:
    pass  # Pipeline may not exist yet, allow resolution
except Exception:
    logger.error("Failed to check token gate", pipeline_id=pipeline_id, exc_info=True)
    return make_error_response(
        "Unable to verify pipeline token gate. Try again.",
        status_code=503,
    )

The PipelineNotFoundError pass-through is reasonable for backwards compatibility with non-SDLC pipelines, but the blanket Exception pass is a security gap.

3. CORRECTNESS: `_resolve_phase_decisions` uses brittle substring matching

File: orchestrator/routes/sdlc_tokens.py:319-322

question_lower = decision.question.lower()
if phase in question_lower:
    queue.resolve_decision(decision.id, f"Approved via SDLC token ({phase})")

This matches the phase name as a substring of the decision's question text. Problems:

A question like "Should we refine the plan?" would match both refine and plan phases.
A question about "planning" would match plan.
A question unrelated to phase transitions that happens to contain the word "plan" (e.g., "What's the test plan?") would be incorrectly auto-resolved.

This needs a more robust matching strategy — either tag decisions with a structured phase field, or use a more specific pattern match (e.g., matching against the decision's context or a dedicated field rather than free-text substring matching on question).

4. CORRECTNESS: `VALID_PHASES` is incomplete / inconsistent with `PipelinePhase`

File: orchestrator/routes/sdlc_tokens.py:164

VALID_PHASES = {"refine", "plan"}

The PipelinePhase enum has four values: refine, plan, implement, pr. The token endpoint only supports two phases. This is presumably intentional (only gate the first two phases), but:

There's no comment explaining why implement and pr are excluded.
The hook regex at sdlc-approve.sh:17 also only matches refine|plan, which is consistent but also undocumented.
If a future phase needs gating, both the Python constant AND the shell regex need updating — this coupling is invisible and fragile.

Add a comment explaining the intentional subset, or derive VALID_PHASES from the enum with explicit exclusions.

5. CORRECTNESS: `/tmp/.egg-sdlc-pipeline-id` is writable by Claude

File: sandbox/entrypoint.py:659-660

Path("/tmp/.egg-sdlc-pipeline-id").write_text(pipeline_id)
os.chmod("/tmp/.egg-sdlc-pipeline-id", 0o444)

The file is set to mode 0444 (read-only), but /tmp is world-writable. The egg user running Claude can delete and recreate this file with a different pipeline ID (since file deletion in /tmp depends on the directory's sticky bit and file ownership, but the file is owned by root, so the sticky bit protects it on Linux — confirmed).

Actually, on Linux with sticky bit on /tmp, only root or the file owner can delete the file. Since the file is created by root (the entrypoint runs as root at this point), the egg user cannot delete it. This is safe. Disregard — this is a non-issue.

However, the file is still world-readable (0444), meaning Claude can read the pipeline ID. The pipeline ID (issue-NNN) is not secret (it's derivable from the issue number), so this is fine.

6. DESIGN: Watchdog has a TOCTOU race and silently overwrites other hooks

File: sandbox/entrypoint.py:739-764

The watchdog re-adds the hook every 5 seconds if removed. Two problems:

a) It reads settings.json, checks for the hook, then writes back the entire file. If Claude Code (or another process) writes to settings.json between the read and write, those changes are lost. This is a classic TOCTOU race.

b) When re-adding, it replaces the entire UserPromptSubmit array:

settings["hooks"]["UserPromptSubmit"] = [
    {"type": "command", "command": hook_path}
]

This discards any other UserPromptSubmit hooks that may have been added. The initial setup code (setup_sdlc_tokens) does the same thing — it replaces rather than appends.

Both the initial setup and the watchdog should append to the existing hook list rather than replacing it:

existing = settings.get("hooks", {}).get("UserPromptSubmit", [])
if not any(h.get("command") == hook_path for h in existing if isinstance(h, dict)):
    existing.append({"type": "command", "command": hook_path})
    settings.setdefault("hooks", {})["UserPromptSubmit"] = existing

7. DESIGN: Watchdog silences all exceptions

File: sandbox/entrypoint.py:766

except Exception:
    pass  # Don't crash watchdog on transient errors

This means JSON parse errors, permission errors, and other persistent failures are silently swallowed forever, every 5 seconds. At minimum, log the error (even at debug level) to provide observability. Consider a backoff to avoid spinning on persistent errors.

8. CORRECTNESS: Word list duplicates reduce entropy (minor)

File: orchestrator/sdlc_wordlist.py

The word list contains 4 duplicates (OLIVE, PEACH, PLUM, BOLT) across categories. The dedup code at the bottom fixes this at import time, but the raw list is misleading. Just remove the duplicates from the source list so the dedup code isn't needed, and the declared list matches what's actually used.

The docstring claims "~200 words" and "200^3 = 8M combinations" but the actual unique count is 210, yielding 210^3 ≈ 9.3M. Minor inaccuracy.

9. DESIGN: In-memory token store doesn't survive orchestrator restart

File: orchestrator/routes/sdlc_tokens.py:162

_token_store: dict[str, dict[str, Any]] = {}

The comment at line 159-161 acknowledges this is intentional for single-session use. However, if the orchestrator process restarts (OOM kill, crash, deployment) while Claude is running, the token gate becomes permanently broken: the Pipeline.sdlc_token_gated flag is True in persistent storage, but the tokens are gone from memory. Every resolve_decision call will return 403, and there's no way to re-generate tokens (409 would fire if the pipeline ID is the same, but the store is empty so it wouldn't — actually, the store being empty means generate would succeed, but there's no mechanism to trigger re-generation).

The pipeline becomes permanently stuck. At minimum, document this failure mode and provide a recovery path (e.g., an endpoint to clear the token-gated flag, or persist tokens).

10. CORRECTNESS: Missing `import` for `time` and `threading` in entrypoint.py context

File: sandbox/entrypoint.py

The new setup_sdlc_tokens and _start_settings_watchdog functions use time.sleep, threading.Thread, json, and shutil. I confirmed these are all imported at the top of the file — this is fine.

11. TESTING: No tests for the decision-gate bypass or the `_resolve_phase_decisions` function

File: orchestrator/tests/test_sdlc_tokens.py

The test file covers token generation, approval, case insensitivity, and model fields. Missing:

No test for the resolve_decision gating logic in decisions.py (the 403 on token-gated pipelines)
No test for _resolve_phase_decisions auto-resolution behavior
No test for the pipeline creation token-gating logic in pipelines.py
No test for the failure mode where _resolve_phase_decisions encounters a non-matching decision (to verify it doesn't resolve unrelated decisions)
No test for the exception handling in _resolve_phase_decisions

The decision-gate bypass (issue #2 above) is particularly important to test — the security boundary should have regression tests.

Summary

Must fix before merge:

JSON injection in shell hook (security)
Fail-open exception handling in decision gate (security)
Brittle substring matching in _resolve_phase_decisions (correctness — could auto-resolve wrong decisions)

Should fix:
4. Document or derive VALID_PHASES relationship to PipelinePhase
6. Watchdog overwrites other hooks and has TOCTOU race
7. Watchdog silences all exceptions
9. No recovery path for orchestrator restart with active token-gated pipeline
11. Missing tests for security-critical paths

Minor:
8. Word list duplicates in source

— Authored by egg

- Fix JSON injection in shell hook by using jq for safe JSON construction - Fail closed (503) on state store errors in decision gate instead of silently allowing resolution to proceed - Replace brittle substring matching in _resolve_phase_decisions with specific prefix matching against the known phase transition question format - Document VALID_PHASES as intentional subset of PipelinePhase with cross-refs to sdlc-approve.sh regex - Fix watchdog and initial setup to append hooks instead of replacing existing UserPromptSubmit hooks (TOCTOU mitigation) - Log watchdog exceptions at debug level instead of silently swallowing - Remove duplicate words from source word list, drop runtime dedup code, fix docstring counts - Add /reset endpoint to recover from orchestrator restart with active token gate, document the failure mode in code comments - Add tests for fail-closed decision gate, phase transition matching, _resolve_phase_decisions auto-resolution, and /reset endpoint

james-in-a-box

Thanks for the thorough review. All "must fix" and "should fix" items have been addressed in 17b0c55:

#1 (SECURITY) — Fixed. Shell hook now uses jq -n to construct JSON payload safely.

#2 (SECURITY) — Fixed. Generic except Exception in decisions.py now returns 503 (fail closed) with error logging, instead of silently allowing resolution.

#3 (CORRECTNESS) — Fixed. Replaced substring matching with prefix matching against the known phase transition question format ("The {phase} phase has completed..."). Added _is_phase_transition_decision() helper with tests covering exact match, wrong phase, and false positives like "What's the test plan?".

#4 (CORRECTNESS) — Fixed. Added comments on VALID_PHASES explaining the intentional subset and cross-referencing the shell hook regex (and vice versa).

#5 — Acknowledged as non-issue per reviewer's own analysis.

#6 (DESIGN) — Fixed. Both initial setup and watchdog now append to the existing UserPromptSubmit hook list instead of replacing it.

#7 (DESIGN) — Fixed. Watchdog now logs exceptions at debug level instead of silently swallowing.

#8 (MINOR) — Fixed. Removed 4 duplicate words (OLIVE, PEACH, PLUM, BOLT) from source list, removed runtime dedup code, corrected docstring to "210 unique" / "210^3 ≈ 9.3M".

#9 (DESIGN) — Fixed. Added /api/v1/sdlc-tokens/reset endpoint to clear the sdlc_token_gated flag and in-memory tokens, providing a recovery path after orchestrator restart. Documented the failure mode in the token store comments.

#10 — Confirmed non-issue per reviewer.

#11 (TESTING) — Fixed. Added 14 new tests covering:

Decision gate fail-closed on state store exception (503)
Decision gate blocking token-gated pipelines (403)
PipelineNotFoundError pass-through
_is_phase_transition_decision matching (5 cases)
_resolve_phase_decisions auto-resolution and non-matching
Exception handling in _resolve_phase_decisions
/reset endpoint (clear, missing id, idempotent)

— Authored by egg

…ct in orchestrator/routes/pipelines.py

james-in-a-box · 2026-02-13T08:02:28Z

Conflict Resolution Summary

Resolved merge conflicts with main:

File	Category	Resolution
`orchestrator/routes/pipelines.py`	Additive + Semantic	Kept main's contract-creation deferral (from #617), preserved PR's token-gating logic

Details

The conflict was caused by #617 (9e4c412) which removed inline contract creation from create_pipeline() and deferred it to _run_pipeline() (so contracts write into per-pipeline worktrees instead of the main repo). This PR had added token-gating logic alongside the original contract creation code.

Resolution: Accepted main's architectural change (contract deferral comment) and preserved the PR's new token-gating block (has_tokens_for_pipeline check + save_pipeline + logging). The token-gating logic is independent of contract creation and works correctly in both locations.

Verification

ruff check orchestrator/routes/pipelines.py — all checks passed
pytest tests/test_sdlc_tokens.py — 38/38 passed
pytest tests/test_models.py tests/test_state_store.py — 64/65 passed (1 pre-existing failure: test_load_invalid_schema_fails, same as before merge)

Please review: The token-gating block now calls store.save_pipeline(pipeline, commit=False) only when tokens are present, whereas before it was called unconditionally (as part of the contract creation flow). This is correct — when no tokens exist, the pipeline doesn't need an extra save since create_pipeline() already persisted it.

— Authored by egg

egg-reviewer

Re-Review: Token-Gated SDLC Pipeline Approvals

Reviewed delta since 07aa010. Most issues from the previous review have been properly addressed. The JSON injection fix, fail-closed exception handling, prefix-based decision matching, hook append logic, watchdog logging, word list cleanup, reset endpoint, and test coverage are all solid improvements.

However, reviewing the new code introduces one security concern that wasn't visible before (it requires understanding the network topology), and there are two remaining issues worth noting.

1. SECURITY: Claude can bypass token gating by calling orchestrator API directly

Files: orchestrator/routes/sdlc_tokens.py (reset, generate endpoints), orchestrator/sandbox_template.py

The sandbox container has direct HTTP access to the orchestrator at 172.32.0.3:9849 on the egg-isolated network (confirmed via sandbox_template.py NO_PROXY config and docker-compose network topology). The orchestrator API has no authentication on any endpoint.

Claude can bypass the entire token-gating mechanism with two curl commands:

# 1. Clear the token gate
curl -X POST http://egg-orchestrator:9849/api/v1/sdlc-tokens/reset \
  -H "Content-Type: application/json" \
  -d '{"pipeline_id": "issue-596"}'

# 2. Directly resolve the decision
curl -X POST http://egg-orchestrator:9849/api/v1/pipelines/issue-596/decisions/<id>/resolve \
  -H "Content-Type: application/json" \
  -d '{"resolution": "approve"}'

Or even simpler — after reset, sdlc_token_gated is False, so the normal decision resolution path works without any token.

This is an architectural issue. The token gate assumes that only the human-facing hook (reading from /dev/tty) can call the /approve endpoint, but nothing prevents Claude from calling orchestrator endpoints directly. Possible mitigations:

Gateway-mediated access: Route orchestrator API calls through the gateway (which already enforces policy on git operations) and block sandbox access to sensitive endpoints like /reset and /generate.
Shared secret: Pass a secret to the hook script (root-owned, unreadable by egg user) and require it on /approve and /reset endpoints.
IP-based restriction: The orchestrator could reject /reset and /generate requests from sandbox container IPs — these should only be callable from the entrypoint (running as root, before Claude starts).

This is the primary concern. Without it, the token gate is security theater — it adds friction but provides no actual enforcement against a determined Claude instance.

2. CORRECTNESS: `reset` endpoint returns 200 even when persistent store update fails

File: orchestrator/routes/sdlc_tokens.py:213-228

try:
    ...
    pipeline.sdlc_token_gated = False
    store.save_pipeline(pipeline)
except Exception as e:
    logger.warning("Failed to clear persistent token gate", ...)

return _make_success("Token gate cleared", ...)

If the persistent store write fails, the in-memory tokens are cleared but Pipeline.sdlc_token_gated remains True in persistent storage. The caller gets a 200 "success" response, but the pipeline is still gated — every subsequent resolve_decision call will return 403, and the pipeline is stuck.

The endpoint should return an error (500 or 503) when the persistent flag cannot be cleared. The in-memory pop should be rolled back or deferred until the persistent write succeeds:

# Clear persistent flag first
try:
    ...
    store.save_pipeline(pipeline)
except Exception as e:
    logger.error(...)
    return _make_error("Failed to clear token gate", 503)

# Only clear in-memory tokens after persistent flag is cleared
_token_store.pop(pipeline_id, None)
return _make_success(...)

3. DESIGN: Watchdog still has TOCTOU race on settings.json (minor, noted for tracking)

File: sandbox/entrypoint.py:739-762

The watchdog reads settings.json, checks for the hook, then writes back the entire file. If Claude Code writes to settings.json between the read and write, those changes are lost. This was noted in the previous review (#6a) and the hook append logic was fixed (no longer replaces other hooks), but the read-modify-write race remains.

This is low-risk in practice (5-second poll interval, writes are fast, Claude Code rarely modifies settings.json). Documenting it here so it's tracked but not blocking.

Previous Issues — Verification

#	Issue	Status
1	JSON injection in shell hook	Fixed — uses `jq -n`
2	Fail-open exception handling	Fixed — returns 503 with logging
3	Brittle substring matching	Fixed — prefix match on known format
4	VALID_PHASES documentation	Fixed — comments added, cross-referenced
6	Watchdog overwrites hooks	Fixed — appends instead of replaces
7	Watchdog silences exceptions	Fixed — logs at debug level
8	Word list duplicates	Fixed — 210 unique, no dedup code
9	Orchestrator restart recovery	Fixed — `/reset` endpoint added
11	Missing tests	Fixed — 14 new tests covering critical paths

Summary

Must fix:

#1: Token gate is bypassable by Claude via direct orchestrator API calls. This undermines the security purpose of the feature.

Should fix:

#2: Reset endpoint returns success when persistent store write fails, leaving pipeline stuck.

Minor (not blocking):

#3: TOCTOU race in watchdog settings.json handling.

— Authored by egg

…open Address re-review feedback on PR #616: #1 (SECURITY): /generate and /reset endpoints now require launcher secret authentication via Bearer token. The entrypoint passes the secret when calling /generate; the secret is stripped from Claude's environment before launch (both via os.environ.pop and env.pop on subprocess copies). This prevents Claude from bypassing token gating by calling orchestrator APIs directly. #2 (CORRECTNESS): Reset endpoint now clears persistent flag BEFORE clearing in-memory tokens. If the persistent store write fails, returns 503 instead of 200, preventing the pipeline from becoming stuck with mismatched state. PipelineNotFoundError is handled separately (allows idempotent resets). Tests: 8 new tests for auth enforcement (401 without auth, wrong secret, missing Bearer prefix, per-endpoint) and reset ordering (503 on store failure, in-memory preserved on failure, both cleared on success). Authored-by: egg

james-in-a-box · 2026-02-13T08:17:05Z

Addressed all re-review feedback in 08cfcf2:

#1 (SECURITY) — Fixed. Chose the shared secret mitigation approach. /generate and /reset endpoints now require Authorization: Bearer <launcher_secret> using the existing EGG_LAUNCHER_SECRET already present in the orchestrator's environment. Three layers of defense:

Endpoint auth: _require_launcher_auth decorator on /generate and /reset validates the secret with constant-time comparison (secrets.compare_digest). Returns 401 if missing, invalid, or unconfigured (fail-closed when no secret exists).
Entrypoint passes secret: setup_sdlc_tokens() reads EGG_LAUNCHER_SECRET and sends it as Bearer token when calling /generate.
Secret stripped from Claude's environment: After setup_sdlc_tokens() runs, the secret is removed from os.environ at the process level AND from the subprocess env copy in both run_interactive() and run_exec(). Claude cannot read it from environment variables.

The /approve endpoint is intentionally not gated on launcher auth — it's already protected by the SDLC token itself (which Claude never sees, since it's read from /dev/tty by the root-owned hook script).

8 new tests cover: no auth → 401, wrong secret → 401, missing Bearer prefix → 401, no secret configured → 401, and /approve works without launcher auth.

#2 (CORRECTNESS) — Fixed. Reset endpoint now clears persistent flag before clearing in-memory tokens. If save_pipeline() fails, returns 503 and leaves in-memory tokens intact. PipelineNotFoundError is handled separately (idempotent reset for pipelines not in persistent storage). 2 new tests verify the ordering and 503 behavior.

#3 (DESIGN) — Acknowledged. TOCTOU race in watchdog is low-risk per the review's own assessment. Not addressed in this commit.

ruff check passes, pytest tests/test_sdlc_tokens.py — 47/47 passed, pytest tests/test_models.py tests/test_state_store.py — 64/65 passed (1 pre-existing failure: test_load_invalid_schema_fails).

— Authored by egg

egg-reviewer

Re-Review: Token-Gated SDLC Pipeline Approvals (`08cfcf2`)

Reviewed the delta since f1cd4be. All three issues from the previous re-review have been addressed.

Previous Issues — Verification

#	Issue	Status
1	Token gate bypassable via direct orchestrator API calls	Fixed — `/generate` and `/reset` require `Authorization: Bearer <launcher_secret>` with constant-time comparison. Fail-closed when no secret is configured. Secret stripped from Claude's environment at three layers (process env, interactive env copy, exec env copy). Orchestrator runs in a separate container, so `/proc` enumeration is not a risk.
2	Reset endpoint returns 200 when persistent store write fails	Fixed — Persistent flag is cleared first; in-memory tokens only cleared on success. Returns 503 on `save_pipeline` failure. `PipelineNotFoundError` handled separately for idempotent resets. Ordering is correct.
3	TOCTOU race in watchdog settings.json	Acknowledged as low-risk (not addressed). Agree with that assessment.

New Code Review

_check_launcher_auth / _require_launcher_auth: Clean implementation. Uses secrets.compare_digest for timing-safe comparison. Correctly denies when EGG_LAUNCHER_SECRET is empty (fail-closed on misconfiguration). Logs auth failures with source IP for observability.

Entrypoint secret stripping: Defense-in-depth with three removal points is appropriate. os.environ.pop in main() runs after setup_sdlc_tokens and before run_interactive/run_exec. Both run_interactive and run_exec also strip from their env copies. No race conditions — the only background thread (settings watchdog) doesn't reference the secret.

setup_sdlc_tokens auth: Reads EGG_LAUNCHER_SECRET and sends as Bearer token. Conditional if launcher_secret: means the headers dict is empty if no secret exists, which would cause the orchestrator to return 401 — correct fail-closed behavior. No issue.

Reset endpoint ordering: save_pipeline before _token_store.pop is the correct order. The PipelineNotFoundError catch allows idempotent resets for pipelines not in persistent storage while still clearing in-memory tokens.

Test coverage: 8 new auth tests covering the key attack vectors (no auth, wrong secret, missing Bearer prefix, no secret configured). Reset tests now mock the state store properly and verify the 503 / in-memory-token-preserved behavior. test_approve_does_not_require_launcher_auth confirms the intended asymmetry.

One minor observation (not blocking): The _set_launcher_secret autouse fixture sets the env var for all tests including TestWordList, which doesn't need it. Harmless — monkeypatch scoping handles cleanup.

No blocking issues found.

— Authored by egg

Remove autouse=True from the module-level fixture so TestWordList (which doesn't interact with SDLC token endpoints) is not affected. Apply the fixture explicitly via usefixtures on each relevant class.

james-in-a-box · 2026-02-13T08:25:25Z

Addressed the minor fixture scoping suggestion in 593b3f6:

Autouse fixture scoping — Removed autouse=True from the module-level _set_launcher_secret fixture. Applied it explicitly via @pytest.mark.usefixtures("_set_launcher_secret") on each test class that interacts with SDLC token endpoints. TestWordList no longer has the unnecessary env var set.

All 47 tests pass, ruff check clean.

— Authored by egg

egg-reviewer

Re-Review: Token-Gated SDLC Pipeline Approvals (`593b3f6`)

Reviewed the delta since 08cfcf2. The only change is the fixture scoping commit — _set_launcher_secret is now applied via @pytest.mark.usefixtures per-class rather than autouse, so TestWordList no longer has the unnecessary env var set.

Previous Issues — Verification

All issues from prior reviews remain properly addressed:

#	Issue	Status
1	JSON injection in shell hook	Fixed — uses `jq -n`
2	Fail-open exception handling in decision gate	Fixed — returns 503 with logging
3	Brittle substring matching	Fixed — prefix match on known format
4	VALID_PHASES documentation	Fixed — comments + cross-references
5	Token gate bypassable via direct API	Fixed — launcher secret auth on `/generate` and `/reset`
6	Watchdog overwrites hooks	Fixed — appends instead of replaces
7	Watchdog silences exceptions	Fixed — logs at debug level
8	Word list duplicates	Fixed — 210 unique, no dedup code
9	Orchestrator restart recovery	Fixed — `/reset` endpoint with persistent-before-memory ordering
10	Reset endpoint returns 200 on persistent store failure	Fixed — returns 503, in-memory tokens preserved
11	Missing tests	Fixed — comprehensive coverage of security-critical paths

New Change Review

The fixture scoping change is correct. TestWordList tests pure word list validation with no endpoint interaction, so it doesn't need EGG_LAUNCHER_SECRET. All other test classes that interact with endpoints retain the @pytest.mark.usefixtures("_set_launcher_secret") marker.

No blocking issues found.

— Authored by egg

james-in-a-box · 2026-02-13T08:29:38Z

egg review completed. View run logs

11 previous review(s) hidden.

Update documentation to reflect changes from PR #616 which added token-gated SDLC pipeline approvals: - Add --sdlc flag to README CLI Reference and Flags tables - Document token-gated approval workflow in SDLC pipeline guide - Add SDLC token endpoints to orchestrator API documentation - Add token-gated approvals section to SDLC Pipeline ADR This security feature prevents agents from self-approving phase transitions in local interactive sessions by requiring human-provided tokens that are never visible to the agent. Authored-by: egg

* docs: Update SDLC docs for token-gated approvals Update documentation to reflect changes from PR #616 which added token-gated SDLC pipeline approvals: - Add --sdlc flag to README CLI Reference and Flags tables - Document token-gated approval workflow in SDLC pipeline guide - Add SDLC token endpoints to orchestrator API documentation - Add token-gated approvals section to SDLC Pipeline ADR This security feature prevents agents from self-approving phase transitions in local interactive sessions by requiring human-provided tokens that are never visible to the agent. Authored-by: egg * docs: Fix security claim — timing-safe comparison prevents timing side-channel attacks, not brute-force attacks --------- Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com> Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com>

egg-reviewer bot suggested changes Feb 13, 2026

View reviewed changes