Add egg-orchestrator for local SDLC pipeline execution#524
Conversation
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
No agent-mode design concerns.
This PR introduces the egg-orchestrator service which manages SDLC pipeline lifecycle locally. From an agent-mode design perspective, the architecture aligns well with the principles in docs/guides/agent-mode-design.md:
Good patterns observed:
- Agent autonomy: Sandboxes are spawned with context (pipeline ID, issue number, agent role) but agents fetch their own data through tools rather than receiving pre-baked diffs or logs
- No structured output requirements: The orchestrator communicates with agents via simple signals (completion, failure) rather than requiring parsed JSON output
- Direct action: Agents post reviews, create PRs, and commit code directly via gateway—no post-processing pipelines parsing agent output
- Objectives over procedures: The
/sdlcskill provides the "what" (start a pipeline for issue N) and lets the orchestrator handle orchestration while agents handle execution
Architecture notes:
- The
SandboxConfigpasses lightweight metadata (session tokens, role, issue number) which helps orient agents without constraining them - Container spawning provides environment setup (proxy, gateway URLs) rather than pre-fetched content
- The decision queue for HITL provides objectives ("approve this draft") rather than procedural instructions
The design keeps orchestration logic in the orchestrator (where it belongs) and lets agents operate autonomously within their sandboxes.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Code Review: PR #524 - Egg Orchestrator
This is a comprehensive review of the orchestrator implementation. I've analyzed all files systematically for security vulnerabilities, correctness issues, race conditions, and design problems.
🔴 Critical Security Issues
1. Container ID Injection Vulnerability in docker_client.py
Location: orchestrator/docker_client.py - logs(), stop(), remove(), and other methods
The container ID is passed directly to Docker API calls without validation. An attacker who can control the container ID could potentially manipulate Docker operations.
def logs(self, container_id: str, ...) -> str:
container = self.client.containers.get(container_id) # No validationFix: Add container ID format validation before API calls:
def _validate_container_id(self, container_id: str) -> None:
if not re.match(r'^[a-zA-Z0-9][a-zA-Z0-9_.-]*$', container_id):
raise ValueError(f"Invalid container ID format: {container_id}")2. Path Traversal Risk in state_store.py
Location: orchestrator/state_store.py - _pipeline_path() method
Pipeline IDs are used directly to construct file paths without proper sanitization:
def _pipeline_path(self, pipeline_id: str) -> Path:
return self.pipelines_dir / f"{pipeline_id}.json"If pipeline_id contains ../, it could write files outside the intended directory.
Fix: Validate and sanitize pipeline IDs:
def _pipeline_path(self, pipeline_id: str) -> Path:
if not re.match(r'^issue-[0-9]+$', pipeline_id):
raise ValueError(f"Invalid pipeline ID format: {pipeline_id}")
path = self.pipelines_dir / f"{pipeline_id}.json"
# Ensure resolved path is still under pipelines_dir
if not path.resolve().is_relative_to(self.pipelines_dir.resolve()):
raise ValueError("Path traversal detected")
return path3. Missing Authentication on API Endpoints
Location: All files in orchestrator/routes/
None of the API endpoints implement authentication or authorization. Any process with network access to the orchestrator can:
- Create/delete pipelines
- Spawn containers
- Resolve HITL decisions
- Modify pipeline state
Fix: Add authentication middleware using gateway session tokens:
def require_auth(f):
@wraps(f)
def decorated(*args, **kwargs):
token = request.headers.get('X-Session-Token')
if not token or not validate_session_token(token):
return jsonify({"error": "Unauthorized"}), 401
return f(*args, **kwargs)
return decorated4. Command Injection in Webhook Handler
Location: orchestrator/webhooks.py - Line ~13450
The issue event handler directly uses issue_number from the webhook payload without validation:
pipeline_id=f"issue-{issue_number}"While not immediately exploitable, a malformed issue number from a spoofed webhook could cause issues.
Fix: Validate issue_number is an integer before use.
🟠 Correctness Issues
5. Race Condition in Container Spawning
Location: orchestrator/container_spawner.py - spawn_container() method
The method reads state, creates container, then updates state without any locking mechanism. Concurrent spawn requests could result in:
- Duplicate containers
- Lost state updates
- Inconsistent pipeline state
Fix: Add a locking mechanism:
_spawn_locks: dict[str, threading.Lock] = {}
def spawn_container(self, ...):
lock_key = f"{pipeline_id}:{agent_role}"
if lock_key not in _spawn_locks:
_spawn_locks[lock_key] = threading.Lock()
with _spawn_locks[lock_key]:
# existing spawn logic6. Race Condition in Phase Transitions
Location: orchestrator/routes/phases.py - advance_phase() endpoint
Multiple concurrent requests to advance phases could result in skipped phases or invalid transitions:
@phases_bp.route("/<pipeline_id>/phases/advance", methods=["POST"])
def advance_phase(pipeline_id: str):
pipeline = store.load_pipeline(pipeline_id)
# No lock between load and save
pipeline.current_phase = next_phase
store.save_pipeline(pipeline)Fix: Implement optimistic locking using a version field on the Pipeline model.
7. Missing Error Handling for Docker Connection Failures
Location: orchestrator/docker_client.py - __init__()
The DockerClient constructor doesn't handle connection failures gracefully:
def __init__(self, ...):
self.client = docker.from_env() # Can raise DockerExceptionFix: Wrap in try/except and provide meaningful error message.
8. Histogram Bucket Counting Bug
Location: orchestrator/metrics.py - Histogram.observe() method
The bucket counting logic appears to count values into ALL buckets where value <= bucket rather than just the appropriate bucket:
for bucket in self.buckets:
if value <= bucket:
self._bucket_counts[bucket] += 1 # Increments multiple bucketsThis is actually the correct cumulative behavior for Prometheus histograms, but the get_bucket_counts() returns cumulative counts which may confuse users expecting per-bucket counts. Consider adding documentation to clarify.
9. Git Commit Can Fail Silently
Location: orchestrator/state_store.py - _commit()
If git operations fail, the error is logged but not raised, which could lead to data loss:
def _commit(self, message: str) -> None:
try:
self._run_git(["add", ...])
self._run_git(["commit", "-m", message])
except Exception as e:
logger.error("Git commit failed", error=str(e))
# Error is swallowed!Fix: Re-raise the exception or implement a retry mechanism with eventual consistency guarantees.
🟡 Robustness Issues
10. Missing Request Validation in All Routes
Location: All routes in orchestrator/routes/
Most endpoints don't validate required fields:
@pipelines_bp.route("/", methods=["POST"])
def create_pipeline():
data = request.get_json()
issue_number = data.get("issue_number") # Could be None
repo = data.get("repo") # Could be NoneFix: Add Pydantic request validation:
class CreatePipelineRequest(BaseModel):
issue_number: int
repo: str
branch: str11. Timeout Not Enforced on Container Operations
Location: orchestrator/docker_client.py - multiple methods
Docker operations can hang indefinitely if a container becomes unresponsive:
def logs(self, container_id: str, tail: int = 100) -> str:
container = self.client.containers.get(container_id)
return container.logs(tail=tail).decode() # No timeoutFix: Add timeout parameters to Docker API calls.
12. Circuit Breaker Has No Persistence
Location: orchestrator/resilience.py - CircuitBreaker class
Circuit breaker state is stored in-memory. If the orchestrator restarts, all circuits reset to CLOSED state, potentially causing cascading failures.
Fix: Persist circuit breaker state to Redis or the state store.
13. HITL Decision Timeout Not Actively Checked
Location: orchestrator/routes/decisions.py
The timeout is stored on decisions but there's no background task to check for and handle timeouts:
@decisions_bp.route("/<pipeline_id>/decisions/check-timeouts", methods=["POST"])
def check_timeouts(pipeline_id: str):This endpoint exists but requires manual invocation.
Fix: Add a background scheduler (APScheduler or similar) to check timeouts periodically.
14. Multi-Agent Executor Missing Error Recovery
Location: orchestrator/multi_agent.py - MultiAgentExecutor
If one agent in a wave fails, there's no clear recovery strategy:
async def execute_wave(self, wave: AgentWave) -> list[AgentExecution]:
results = await asyncio.gather(*tasks, return_exceptions=True)Exceptions are captured but not properly handled - the wave continues without addressing failures.
Fix: Implement proper failure handling with configurable strategies (fail-fast, continue, retry).
🔵 Design Issues
15. Inconsistent Error Response Format
Location: All routes
Some endpoints return {"error": "message"} while others return {"success": false, "message": "..."}:
# containers.py
return jsonify({"error": "Container not found"}), 404
# pipelines.py
return jsonify({"success": False, "message": "Pipeline not found"}), 404Fix: Standardize on a single error response format across all endpoints.
16. Circular Import Risk
Location: orchestrator/webhooks.py
The module imports from events inside handler functions to avoid circular imports:
def _handle_issue_event(payload):
from events import EventType, emit_event # Late importThis is a code smell indicating the module structure may need refactoring.
Fix: Consider a more modular event system with proper dependency injection.
17. Hardcoded Network Configuration
Location: orchestrator/gateway_client.py and orchestrator/sandbox_template.py
Network CIDRs and IPs are hardcoded:
SANDBOX_NETWORK = "172.32.0.0/24"Fix: Move to configuration file or environment variables.
18. Missing Structured Logging Context
Location: Multiple files
Many log statements don't include relevant context:
logger.info("Container spawned") # Missing container_id, pipeline_idFix: Always include relevant identifiers in log statements.
📝 Test Coverage Gaps
19. No Integration Tests
The tests are all unit tests with mocked dependencies. There are no integration tests that verify:
- End-to-end pipeline execution
- Docker container lifecycle
- Gateway integration
- State persistence under concurrent access
20. No Load/Stress Tests
No tests verify behavior under concurrent load or when resources are constrained.
21. Tests Don't Cover Error Paths
Many error scenarios are untested:
- Docker daemon unavailable
- Git repository corrupted
- Network failures to gateway
- Disk full conditions
✅ Positive Notes
- The Pydantic models are well-structured with proper validation
- The event system design is clean and extensible
- Circuit breaker pattern is correctly implemented
- Webhook signature verification is properly done with constant-time comparison
- Test fixtures are well organized
Summary
| Severity | Count |
|---|---|
| Critical Security | 4 |
| Correctness | 5 |
| Robustness | 5 |
| Design | 4 |
| Test Gaps | 3 |
Recommendation: Request changes. The security issues (container ID validation, path traversal, missing auth) must be addressed before merge. The race conditions should also be fixed as they could cause data corruption in production.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
|
@james-in-a-box address this feedback: #524 (review) |
Addressed Review FeedbackI've pushed changes (7671581) addressing the critical security issues and race conditions from the review: ✅ Fixed: Critical Security Issues1. Container ID Injection Vulnerability (docker_client.py)
2. Path Traversal Risk (state_store.py)
✅ Fixed: Race Conditions5. Race Condition in Phase Transitions (routes/phases.py)
✅ Already Addressed: Docker Connection Failures7. Docker Connection Error Handling (docker_client.py)
Route UpdatesAll route files updated with proper exception handling for the new error types:
Remaining Items (not addressed in this commit)The following items from the review are noted but not yet implemented:
I recommend addressing the authentication requirement (#3) as a follow-up, since it requires decisions on the auth mechanism and integration with the gateway session tokens. — Authored by egg |
|
@james-in-a-box pull main into this branch |
Merged main into this branchSuccessfully pulled Changes merged from main:
No merge conflicts occurred. The branch is now up to date with — Authored by egg |
There was a problem hiding this comment.
No agent-mode design concerns with the delta since my last review.
The changes since commit a1c45ddd address security and race condition issues identified in the base code review, and they do so in a way that aligns well with agent-mode design principles:
Good pattern: Infrastructure-level constraints
- Container ID validation and pipeline ID validation are enforced in
docker_client.pyandstate_store.py, not via prompt instructions - Optimistic locking with version conflicts is handled at the state store level, returning proper 409 errors
- Path traversal protection uses resolved path validation, not trust in agent behavior
This follows the core principle from the design guide: "The sandbox is the constraint." These security fixes extend the infrastructure boundary rather than relying on prompt-level restrictions that agents could ignore.
No changes to agent prompts, pre-fetching patterns, or output format requirements were introduced in this delta.
— Authored by egg
There was a problem hiding this comment.
Re-Review: PR #524 - Egg Orchestrator
This re-review examines changes made since commit a1c45ddd.
✅ Issues Properly Addressed
1. Container ID Injection Vulnerability (docker_client.py)
Fixed correctly. The _validate_container_id() function validates container IDs using an appropriate regex pattern that allows:
- 12-64 character hex IDs (short/full Docker IDs)
- Valid container names (alphanumeric, underscore, hyphen, period - not starting with hyphen/period)
The validation is applied to all relevant methods: start_container, stop_container, remove_container, get_container_info, get_container_logs, wait_for_container.
2. Path Traversal Risk (state_store.py)
Fixed correctly. The implementation adds:
_validate_pipeline_id()with strict^issue-[0-9]+$pattern enforcement- Secondary
is_relative_to()check to catch any bypass attempts - Appropriate
InvalidPipelineIdErrorexception
3. Optimistic Locking for Phase Transitions (routes/phases.py)
Fixed correctly. All phase mutation endpoints (advance_phase, start_phase, complete_phase, fail_phase) now:
- Capture
original_versionbefore modifications - Pass
expected_versiontosave_pipeline() - Handle
VersionConflictErrorwith 409 Conflict response
4. Version Field on Pipeline Model (models.py)
Fixed correctly. The version field is added with default=1, ge=1 constraint and proper documentation.
🟠 Remaining Issues from Previous Review (Not Addressed)
These were noted as "not addressed in this commit" in the previous response, which is acceptable, but they should still be tracked:
5. Container Spawner Race Condition (container_spawner.py)
Not addressed. The spawn_agent_container method still has no locking mechanism. Concurrent calls for the same pipeline/role could create duplicate containers. This is a lower priority than the fixed issues, but should be addressed before high-concurrency use.
6. Silent Git Commit Failures (state_store.py:224)
Partially addressed. The _commit_state method now raises GitOperationError if git commands fail, which addresses the concern. However, _run_git catches CalledProcessError and re-raises as GitOperationError, which is appropriate.
🔴 New Issue Found
7. Missing Pipeline ID Validation in routes/signals.py
The signal routes do not validate pipeline IDs before using them. While get_state_store eventually validates in _get_pipeline_path, the error handling doesn't catch InvalidPipelineIdError:
# signals.py:169 - only catches PipelineNotFoundError, not InvalidPipelineIdError
try:
store = get_state_store(repo_path)
pipeline = store.load_pipeline(pipeline_id)
...
except PipelineNotFoundError:
return make_error_response(...)Fix: Add InvalidPipelineIdError to the exception handlers in handle_complete_signal, handle_error_signal, and the batch handler.
🟡 Design Observations
8. Inconsistent Error Response Handling
Routes containers.py, phases.py, pipelines.py, and decisions.py now properly handle InvalidPipelineIdError. However, signals.py was missed. The pattern should be consistently applied across all routes.
9. Route files containers.py, pipelines.py, phases.py, decisions.py
All properly import and handle the new exception types. Good work on consistency.
Summary
| Category | Status |
|---|---|
| Container ID Validation | ✅ Fixed |
| Path Traversal | ✅ Fixed |
| Optimistic Locking | ✅ Fixed |
| Pipeline Model Version | ✅ Fixed |
| Container Spawner Race | |
| Signals.py Error Handling | 🔴 New issue - must fix |
Recommendation: Request changes. The signals.py error handling gap needs to be addressed since it's a simple fix (adding one import and one exception handler to each signal handler) and follows the same pattern already applied to other routes.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
9301879 to
f1254c1
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
No agent-mode design concerns with the delta since my last review.
The changes since commit 930187936f02a8f123bc5f63ab782be3fd7f9f1e include a merge from main and security fixes. From an agent-mode design perspective, these changes actually strengthen alignment with the design guidelines:
Good pattern: Infrastructure-level constraints
The security fixes follow the core principle "The sandbox is the constraint":
- Container ID validation in
docker_client.pyprevents injection at the API layer - Pipeline ID validation in
state_store.pyprevents path traversal at the persistence layer - Optimistic locking with version field handles concurrency at the state management layer
These are infrastructure-level constraints, not prompt-level restrictions the agent could ignore.
No changes to agent prompts or context patterns
The orchestrator continues to pass only lightweight metadata to spawned containers:
- Pipeline ID, issue number, agent role (orienting context)
- Session tokens, proxy URLs (infrastructure credentials)
- No pre-fetched diffs, logs, or file contents
The agents remain free to fetch what they need through their tools.
— Authored by egg
There was a problem hiding this comment.
Re-review: Security and Race Condition Fixes (Delta since 9301879)
I've reviewed the changes since commit 9301879. The commit f1254c1 addresses several issues from my previous review.
✅ Issues Properly Addressed
1. Container ID Validation (docker_client.py)
The validation is well-implemented:
- Pattern
^[a-fA-F0-9]{12,64}$|^[a-zA-Z0-9][a-zA-Z0-9_.-]*$correctly validates both hex IDs (12-64 chars) and container names - Validation is applied to
start_container,stop_container,remove_container,get_container_info,get_container_logs, andwait_for_container - New
InvalidContainerIdErrorexception properly exposed
2. Path Traversal Prevention (state_store.py)
Properly implemented:
- Pattern
^issue-[0-9]+$restricts pipeline IDs to expected format - Secondary
is_relative_to()check ensures resolved path stays withinpipelines_dir - New
InvalidPipelineIdErrorexception
3. Optimistic Locking (state_store.py, routes/phases.py)
Implemented correctly:
versionfield added toPipelinemodel with default=1save_pipeline()acceptsexpected_versionparameterVersionConflictErrorraised on version mismatch- All phase mutation endpoints (
advance_phase,start_phase,complete_phase,fail_phase) use optimistic locking
4. Route Error Handling
New exceptions properly caught in routes:
routes/containers.py: HandlesInvalidContainerIdError(400)routes/pipelines.py: HandlesInvalidPipelineIdError(400)routes/phases.py: HandlesInvalidPipelineIdError(400) andVersionConflictError(409)routes/decisions.py: HandlesInvalidPipelineIdError(400)
🟠 Remaining Issues
1. Missing InvalidContainerIdError Handling in Health Check Endpoint
Location: orchestrator/routes/containers.py:426-447
The /health endpoint calls monitor.check_container_health(container_id), which in turn calls docker_client.get_container_info(container_id). This now raises InvalidContainerIdError, but the route doesn't catch it:
@containers_bp.route("/<pipeline_id>/containers/<container_id>/health", methods=["GET"])
def check_container_health(pipeline_id: str, container_id: str) -> tuple[Response, int]:
monitor = get_container_monitor()
health = monitor.check_container_health(container_id) # Can raise InvalidContainerIdError
return make_success_response("Health checked", data=health)Fix: Add the same InvalidContainerIdError handler as in other endpoints, or catch it in ContainerMonitor.check_container_health().
2. No Tests for Security Validation Functions
The new validation functions and exceptions (_validate_container_id, _validate_pipeline_id, InvalidContainerIdError, InvalidPipelineIdError, VersionConflictError) have no corresponding tests. The test files (test_docker_client.py, test_state_store.py) don't test:
- Invalid container ID formats being rejected
- Path traversal attempts being blocked
- Optimistic locking version conflicts
- Edge cases in the regex patterns
This is a test coverage gap for security-critical code.
3. Container Spawner Race Condition Not Addressed
From my original review (issue #5): The race condition in container spawning (container_spawner.py) was noted but not fixed. The response indicated this would be addressed separately, but it's still a concern for concurrent spawn requests.
📋 Summary
| Item | Status |
|---|---|
| Container ID injection fix | ✅ Fixed |
| Path traversal fix | ✅ Fixed |
| Optimistic locking | ✅ Fixed |
| Route exception handling | |
| Test coverage for security fixes | ❌ Missing |
| Container spawner race condition | ❌ Not addressed (noted for follow-up) |
Recommendation: The health check endpoint bug should be fixed before merge. The test coverage gap is concerning for security code but can be addressed in a follow-up. The container spawner race condition was acknowledged as a separate item.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Add implementation plan for issue #496 introducing egg-orchestrator, a dedicated service for local SDLC pipeline execution. Authored-by: egg
There was a problem hiding this comment.
Contract Verification Re-Review
This is an incremental contract verification for changes since commit a841c27f8b0269e8fa14777df389b7ccfba3b396 (the merge from main).
Delta Since Last Review
One commit has been added:
0945d0c — "Guard start_pipeline against AWAITING_HUMAN and fix worktree cleanup safety"
This commit makes three targeted improvements:
-
orchestrator/routes/pipelines.py: Added guard against starting pipelines inAWAITING_HUMANstate, returning 409 Conflict. This prevents race conditions where the orchestrator might try to start a pipeline while waiting for human approval at a HITL gate. -
sandbox/entrypoint.py: Added working tree validation logic that detects empty worktrees (containing only.git) and re-populates them viagit checkout HEAD -- .. This fixes a startup failure mode where containers could start with empty workspaces. -
gateway/gateway.py: Added safety net to query Docker containers directly (get_active_docker_containers()) during startup cleanup, ensuring orphaned worktrees are cleaned even when session state is stale.
Contract Compliance Verification
No contract violations found in the delta. The changes are defensive bug fixes that improve resilience without changing the contract scope.
| Change | Impact on Contract |
|---|---|
| AWAITING_HUMAN guard | Improves ac-12 (phase transition validation) |
| Worktree re-population | Improves sandbox reliability (no AC change) |
| Docker container safety net | Improves ac-8 (orphaned container cleanup) |
Acceptance Criteria Status Summary
Based on previous contract reviews and my verification:
| Status | Count | Notes |
|---|---|---|
| ✅ Verified | 23 | Core orchestrator functionality implemented |
| ❌ NOT VERIFIED | 10 | Missing files/features documented below |
NOT VERIFIED Criteria (unchanged from previous review)
These acceptance criteria remain unverified due to missing implementations:
- ac-17 (HITL sandbox mode): File
sandbox/egg_lib/hitl_mode.pydoes not exist - ac-21 (Gateway orchestrator callback): File
gateway/orchestrator_api.pydoes not exist - ac-24 (Gateway health includes orchestrator status): No gateway health modification detected
- ac-25 (egg-deploy status shows both services): Partial — needs verification
- ac-27 (sandbox orchestrator_client.py): File
sandbox/egg_lib/orchestrator_client.pydoes not exist - ac-28 (Sandbox reports completion to orchestrator): Not implemented in sandbox entrypoint
- ac-29 (egg_orchestrator package in containers): Partial — needs installation verification
- ac-32 (Remote orchestrator mode handling): Partial — needs verification
- ac-33 (Architecture docs for deployment modes): Documentation missing
Recommendation
Comment — The delta since last review contains only defensive bug fixes with no contract violations. The same 10 acceptance criteria that were NOT VERIFIED in the previous review remain unverified.
The implementation of core orchestrator functionality is substantial and working (23/33 criteria met). The missing items are primarily:
- Sandbox-side orchestrator integration (
hitl_mode.py,orchestrator_client.py) - Gateway-orchestrator callback endpoint
- Documentation for deployment modes
These could be addressed as follow-up work if desired.
— Authored by egg
|
egg contract-verification completed. View run logs 21 previous review(s) hidden. |
There was a problem hiding this comment.
Re-Review: PR #524 - Egg Orchestrator
This re-review examines the delta since commit a841c27f8b0269e8fa14777df389b7ccfba3b396.
✅ New Changes Reviewed
1. Guard start_pipeline Against AWAITING_HUMAN Status
Location: orchestrator/routes/pipelines.py:1931-1935
if pipeline.status == PipelineStatus.AWAITING_HUMAN:
return make_error_response(
f"Pipeline {pipeline_id} is awaiting human approval",
status_code=409,
)Correct implementation. This prevents the race condition where an external caller could try to restart a pipeline while it's blocked waiting for human approval at a HITL gate. The pipeline remains in AWAITING_HUMAN status until the decision is resolved, then transitions back to RUNNING internally (lines 1860-1861).
2. Worktree Cleanup Safety
The worktree cleanup safety is properly ensured by validate_identifier() in gateway/worktree_manager.py:72-91:
def validate_identifier(value: str, name: str) -> None:
if not value:
raise ValueError(f"Invalid {name}: cannot be empty")
if ".." in value:
raise ValueError(f"Invalid {name}: path traversal not allowed")
if not re.match(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$", value):
raise ValueError(f"Invalid {name}: must be alphanumeric with ._- allowed")This validation is called in remove_worktree() (lines 430-434) and create_worktree() (lines 144-148) before any filesystem operations, preventing path traversal attacks via container_id or repo_name.
✅ Previously Identified Issues: Verified Fixed
All blocking issues from prior reviews remain properly addressed:
| Item | Status | Location |
|---|---|---|
| Container ID validation | ✅ Fixed | docker_client.py:97-121 |
| Pipeline ID validation | ✅ Fixed | state_store.py:67-89 |
| Optimistic locking | ✅ Fixed | state_store.py, routes/phases.py |
| Exception handling in routes | ✅ Fixed | All route files |
.env file permissions |
✅ Fixed | compose.py:159-160 |
| Gateway API endpoint paths | ✅ Fixed | gateway_client.py |
| Container spawner environment | ✅ Fixed | Pre-registers session at lines 202-231 |
delete_session_by_container |
✅ Fixed | Uses DELETE /api/v1/sessions/by-container/{id} |
⚠️ Minor Advisory (Not Blocking)
Test Coverage for AWAITING_HUMAN Guard
The new AWAITING_HUMAN guard at line 1931 does not have an explicit integration test. While the existing TestHITLGate tests cover the HITL gate flow and TestPipelineStartIdempotency covers re-starting running/completed pipelines, there's no test that specifically verifies:
- Creating a pipeline
- Starting it
- Letting it reach AWAITING_HUMAN status
- Attempting to start it again via API
- Asserting 409 is returned
This would be a straightforward addition to the existing integration test suite but is not blocking.
Summary
| Item | Status |
|---|---|
start_pipeline AWAITING_HUMAN guard |
✅ Correctly implemented |
| Worktree cleanup identifier validation | ✅ Correctly implemented |
| Previous blocking issues | ✅ All fixed |
| Test coverage for AWAITING_HUMAN guard |
Recommendation: Approve. The new changes correctly guard against starting pipelines in AWAITING_HUMAN state and the worktree cleanup safety is properly enforced via identifier validation. All blocking issues from previous reviews remain properly fixed.
— Authored by egg
|
egg review completed. View run logs 21 previous review(s) hidden. |
Docker creates the container bind mount before the gateway creates the git worktree. Docker's tmpfs mount over .git creates a .git DIRECTORY on the host. When create_worktree() later runs git worktree add, it fails because .git is a directory (git worktrees need .git to be a FILE containing a gitdir: pointer). After the existing shutil.rmtree (which silently fails on active mount points), explicitly check for and remove a leftover .git directory so git worktree add can create its .git file.
There was a problem hiding this comment.
No agent-mode design concerns.
Re-review summary: The branch was rebased since my last review at commit a1c45dd. The new commit dba968c adds a worktree creation fix (removing pre-existing .git directories that Docker creates before the gateway can set up the git worktree).
This change is purely infrastructure-level—it fixes container mount timing, not agent interaction patterns. My previous assessment still applies: the orchestrator passes lightweight metadata (pipeline ID, issue number, agent role) to orient agents without constraining them. Agents fetch their own data through tools rather than receiving pre-baked content.
— Authored by egg
|
egg agent-mode-design completed. View run logs 21 previous review(s) hidden. |
Verify that POST /start returns 409 when a pipeline is in awaiting_human status (paused at a HITL gate). The test creates a pipeline with hitl_gates=True, waits for it to reach awaiting_human, attempts a second start (expects 409), then resolves the gates so the pipeline completes cleanly.
There was a problem hiding this comment.
Re-Review: PR #524 - Egg Orchestrator
This re-review examines changes since commit 0945d0cc9421685f578be4fc91629014700262de.
✅ All Previously Identified Issues Resolved
1. Container Spawner Gateway Environment Bug
Fixed correctly. The spawn_agent_container() method now registers the gateway session and merges gateway environment variables BEFORE container creation (lines 202-225 in container_spawner.py):
# Lines 215-225: Gateway env added BEFORE container creation
gateway_env = self.gateway.get_container_env(
session_token=session_info.session_token,
...
)
# Add gateway environment to container env BEFORE creation
env.update(gateway_env)The container is then created at line 243 with the complete environment including EGG_SESSION_TOKEN, GATEWAY_URL, and proxy configuration.
2. signals.py InvalidPipelineIdError Handling
Fixed correctly. InvalidPipelineIdError is imported at line 37 and properly handled in both handle_complete_signal (lines 194-198) and handle_error_signal (lines 301-305).
3. containers.py Health Check InvalidContainerIdError Handling
Fixed correctly. The check_container_health endpoint (lines 426-454) now catches InvalidContainerIdError and returns 400.
4. compose.py .env File Permissions
Fixed correctly. Line 160 sets 0o600 permissions on the generated .env file:
env_file.chmod(0o600) # Owner read/write only🔵 New Code Review: gateway/worktree_manager.py
This commit adds a worktree manager that handles the Docker/.git directory race condition. The implementation is well-structured:
Security
Path traversal prevention is properly implemented:
validate_identifier()function (lines 79-90) validates container_id and repo_name- Rejects empty values, path traversal (
../), and special characters - Pattern:
^[a-zA-Z0-9][a-zA-Z0-9._-]*$ - Applied consistently before path construction
UID/GID validation:
- Lines 157-160 validate uid/gid are non-negative integers before use in subprocess
- Prevents injection via the chown command
Correctness
The Docker .git directory race condition fix:
- Lines 195-221 detect and handle invalid worktree directories
- If a
.gitdirectory exists (from Docker's tmpfs), it's removed before creating the actual.gitfile - This correctly addresses the issue described in the commit message
Test Coverage
Comprehensive tests in test_worktree_manager.py:
TestValidateIdentifier: Path traversal, empty values, special charactersTestWorktreeManagerDockerGitDir: Specifically tests the Docker pre-creation scenario- Cleanup behavior tests
Summary
| Issue | Status |
|---|---|
| Container spawner gateway env | ✅ Fixed |
| signals.py InvalidPipelineIdError | ✅ Fixed |
| containers.py health check | ✅ Fixed |
| compose.py .env permissions | ✅ Fixed |
| worktree_manager.py security | ✅ Properly implemented |
| worktree_manager.py tests | ✅ Adequate coverage |
All previously identified blocking issues have been properly addressed. The new worktree manager code is well-implemented with appropriate security validation and test coverage.
Recommendation: Approve.
— Authored by egg
There was a problem hiding this comment.
Contract Verification Re-Review
This is an incremental contract verification for changes since commit 0945d0cc9421685f578be4fc91629014700262de.
Delta Since Last Review
One commit has been added:
dba968c — "Fix worktree creation when Docker pre-creates .git directory"
This commit addresses an edge case where Docker creates the container bind mount (including a .git directory via tmpfs) before the gateway creates the git worktree. Since git worktree add expects .git to be a file (containing a gitdir: pointer), not a directory, the worktree creation was failing.
Changes:
gateway/worktree_manager.py: +7 lines — After the existingshutil.rmtree()(which silently fails on active mount points), explicitly check for and remove a leftover.gitdirectorygateway/tests/test_worktree_manager.py: +68 lines — Two new tests covering the bug fix and normal case scenarios
Contract Compliance Verification
No contract violations found. The change is a defensive bug fix that improves worktree creation reliability without changing any contract scope.
| Change | Impact on Contract |
|---|---|
.git directory cleanup |
Improves reliability for ac-7 (sandbox configuration) |
| Test coverage | Regression tests for worktree edge case |
Code Review Notes
The fix is correct and targeted:
- Only runs when
worktree_path.exists()afterrmtree(indicating bind mount blocked full removal) - Only removes
.gitif it exists AND is a directory (won't interfere with valid worktrees) - Uses
ignore_errors=Truefor safety - Tests verify both the fix scenario and normal operation
Acceptance Criteria Status Summary
| Status | Count | Notes |
|---|---|---|
| ✅ Verified | 23 | Core orchestrator functionality (unchanged) |
| ❌ NOT VERIFIED | 10 | Missing files/features (unchanged from previous review) |
NOT VERIFIED Criteria (unchanged)
These acceptance criteria remain unverified due to missing implementations:
- ac-17 — HITL sandbox mode:
sandbox/egg_lib/hitl_mode.pydoes not exist - ac-21 — Gateway orchestrator callback:
gateway/orchestrator_api.pydoes not exist - ac-24 — Gateway health includes orchestrator status: Not implemented
- ac-25 — egg-deploy status shows both services: Partial
- ac-27 — sandbox orchestrator_client.py: File does not exist
- ac-28 — Sandbox reports completion to orchestrator: Not implemented
- ac-29 — egg_orchestrator package in containers: Needs verification
- ac-32 — Remote orchestrator mode handling: Partial
- ac-33 — Architecture docs for deployment modes: Documentation missing
Recommendation
Comment — The delta contains only a targeted bug fix with proper test coverage and no contract violations. The same 10 acceptance criteria that were NOT VERIFIED in the previous review remain unverified. The core orchestrator implementation (23/33 criteria) is working as designed.
— Authored by egg
|
egg review completed. View run logs 21 previous review(s) hidden. |
|
egg contract-verification completed. View run logs 21 previous review(s) hidden. |
There was a problem hiding this comment.
No agent-mode design concerns with the delta since my last review.
The changes since commit 885e674 introduce several new features:
-
Multi-reviewer architecture: The pipeline now spawns typed reviewer containers (unified, agent-design, code, contract) for each phase. This aligns well with agent-mode principles — each reviewer gets a focused objective and posts verdicts directly to
.egg-state/reviews/. -
Review prompts: The
_build_review_prompt()function provides context (phase, pipeline ID, reviewer type) without pre-fetching large diffs. Reviewers are told to read draft files themselves, which aligns with guideline #1 (avoid unnecessary pre-fetching). -
Checker/autofixer agents: The implement phase now spawns checker and autofixer agents. The autofixer receives a lightweight failure summary (~3-4 lines) but is instructed to read the full results file itself. This is appropriate orienting context, not excessive pre-fetching.
-
Local pipelines include PR phase: Local-mode pipelines now run all 4 phases (including PR). The gateway mode is switched to
publicfor the PR phase only, enabling push access. This is enforced at the session level in the gateway, not via prompt instructions — correctly following guideline #5 (sandbox enforcement over prompt-level security). -
JSON verdict format for reviewers: Reviewers write a small JSON verdict file (
{verdict, summary, feedback}). This is appropriate because the verdict is consumed by the orchestrator (machine-readable), not directly by humans. The actual review comments should still be posted in natural language to the PR. -
HITL gates: The
hitl_gatesconfig option pauses for human approval after refine and plan phases. Decisions are surfaced via the status endpoint, keeping humans in the loop without constraining agent exploration.
The integration tests demonstrate the architecture working end-to-end with mock sandbox containers that write appropriate artifacts.
— Authored by egg
|
egg agent-mode-design completed. View run logs 21 previous review(s) hidden. |
There was a problem hiding this comment.
Incremental Contract Verification Review
Delta reviewed: dba968c..35147ac (1 commit)
New Commit Analysis
35147ac Add integration test for AWAITING_HUMAN start_pipeline guard
The new commit adds a comprehensive integration test that verifies the AWAITING_HUMAN guard in start_pipeline(). The test:
- Creates a pipeline with
hitl_gates=True - Starts the pipeline and waits for
awaiting_humanstatus - Attempts to start again and verifies 409 response
- Verifies error message contains "awaiting human"
- Resolves gates and waits for clean completion
This test strengthens contract compliance for acceptance criteria related to HITL integration (ac-16 through ac-20), specifically validating the interaction between HITL gates and pipeline state management.
Contract Verification Status
The 33 acceptance criteria remain unverified in the contract file. This is expected — my role as REVIEWER is to verify criteria and mark them verified using egg-contract verify-criterion. However, the egg-contract CLI has an import error in this environment:
ImportError: attempted relative import with no known parent package
Implementation Assessment
Based on code review of the delta:
- No regressions identified — The new test is additive and doesn't modify any existing logic
- Test quality is high — Uses proper fixtures, handles cleanup, validates both status codes and error messages
- No security concerns — The test file contains no vulnerabilities
Recommendation
The incremental change is correct and improves test coverage. Once the egg-contract CLI is functional, the following criteria should be verified based on the existing implementation:
| Criterion | Status | Evidence |
|---|---|---|
| ac-16 | Ready to verify | routes/decisions.py implements POST/GET endpoints |
| ac-18 | Ready to verify | routes/decisions.py implements resolve endpoint |
| ac-4 | Ready to verify | routes/health.py, routes/pipelines.py exist |
The PR continues to make solid progress toward full contract compliance.
— Authored by egg
|
egg contract-verification completed. View run logs 21 previous review(s) hidden. |
Update documentation to reflect egg-orchestrator component added in d1d4440: - Add orchestrator/ directory to STRUCTURE.md with detailed file listing - Add orchestrator to architecture component table - Update deployment guide network topology diagram - Add orchestrator to component documentation index - Include new local_pipeline integration tests Triggered by: PR #524 (d1d4440) Authored-by: egg
* docs: Add orchestrator to project structure docs Update documentation to reflect egg-orchestrator component added in d1d4440: - Add orchestrator/ directory to STRUCTURE.md with detailed file listing - Add orchestrator to architecture component table - Update deployment guide network topology diagram - Add orchestrator to component documentation index - Include new local_pipeline integration tests Triggered by: PR #524 (d1d4440) Authored-by: egg * docs: Address review feedback on orchestrator documentation - Add missing requirements.txt to orchestrator structure listing - Update component docs header text for link consistency Authored-by: egg --------- Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com> Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
The SDLC pipeline has been fully migrated to the local distributed orchestrator (PR #524). This removes the now-superseded GitHub Actions implementation: 8 workflow files, 3 supporting scripts, the checks/ directory, 14 action prompt builders, and associated tests. All documentation is updated to reference the orchestrator equivalents. Issue: #545 Authored-by: egg
* Add analysis for issue #545: Remove GitHub Actions SDLC architecture * Add implementation plan for issue #545: Remove GitHub Actions SDLC architecture * Remove GitHub Actions SDLC workflows and supporting scripts The SDLC pipeline has been fully migrated to the local distributed orchestrator (PR #524). This removes the now-superseded GitHub Actions implementation: 8 workflow files, 3 supporting scripts, the checks/ directory, 14 action prompt builders, and associated tests. All documentation is updated to reference the orchestrator equivalents. Issue: #545 Authored-by: egg * Add implement phase check results * Fix lint errors and stale test for invalid schema validation - Fix 8 auto-fixable ruff errors: remove unused imports (F401), update typing imports to collections.abc (UP035), sort import block (I001), remove unused variable (F841) - Fix 6 E741 ambiguous variable names: rename `l` to `line` in list comprehensions across SSE test files - Fix test_load_invalid_schema_fails: Pipeline model fields became optional in #554, so {"id": "issue-9998"} is now valid. Updated test to use an invalid enum value for status instead. Authored-by: egg * Add implement phase check results for issue-545 * Add unified review verdict for issue-545 implement phase * Add agent-design review verdict for issue-545 implement phase * Add code review verdict for issue-545 implement phase Reviewed security, correctness, robustness, and design across the v2 checkpoint system, session manager changes, workflow removals, and orchestrator updates. No critical issues introduced by this diff. Authored-by: egg * Add contract review verdict for issue-545 implement phase * Address review feedback: remove stale references and fix docstring Remove dangling references to deleted files (sdlc-work-loop.yml, build-sdlc-prompt.sh) from comments and docstrings. Fix redundant phrasing in _populate_contract_from_plan docstring. PR description updated separately to correct inaccurate claims about action/ and config/ directory removal. * Restore PR-operational checks, contract validator, and prompt builder Restores files that were incorrectly removed as part of the SDLC cleanup: - .github/scripts/checks/ directory (check_fixer, lint_check, test_check, merge_conflict_check, draft_validation_check, plan_yaml_check, base, run_check) - .github/workflows/on-pull-request-contract-verify.yml - action/build-contract-verification-prompt.sh - tests/scripts/test_checks.py Removes deployment_check from run_check.py registry (SDLC-only, correctly deleted). Updates docs, README, and test-action.yml shellcheck to include restored files. * Remove stale check-deployment definition from phase defaults The deployment_check.py script was deleted during merge conflict resolution (it depended on deleted .github/scripts/checks/ base infrastructure from the GHA SDLC removal). The CheckDefinition referencing it remained as a dangling reference from PR #653. Remove it to avoid a potential ValueError if run_check.py tries to dispatch an unknown 'deployment' check. * Address review feedback: fix EDITOR handling, add subprocess timeouts, add reconnection backoff - Fix _launch_editor to use shlex.split for multi-word $EDITOR values (e.g. "code --wait", "vim +10") - Add timeout=30 to all subprocess.run calls in _commit_statefiles_to_worktree to prevent indefinite hangs on git lock contention - Add exponential backoff (1s-30s) and max retry limit (20) to watch_pipeline reconnection loop to prevent resource exhaustion - Use consistent word-boundary regex matching in _detect_phase for all phase keywords instead of mixing substring and regex strategies - Narrow _find_repo_path exception handler from bare Exception to specific subprocess/OS error types Authored-by: egg --------- Co-authored-by: egg <egg@localhost> Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com> Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com>
Summary
Introduces egg-orchestrator, a new Docker container service that manages the
full SDLC pipeline lifecycle locally, replacing GitHub Actions orchestration.
The orchestrator stores state in git (consistent with contracts), manages
sandbox container lifecycle via Docker API, supports multi-agent parallel
execution, and integrates HITL decisions through Claude Code sessions.
This latest commit adds 14 integration tests (all passing) and fixes three
sandbox container spawning bugs that caused immediate failure.
Bug fixes
The orchestrator's
_run_pipeline()was spawning sandbox containers thatimmediately failed for three reasons:
Missing repo volume mount — Docker socket operates on the host daemon,
so volume mounts need host paths, not container paths. Added
EGG_HOST_REPOS_DIRenv var to pass the host-side repo path.Missing
GATEWAY_URL— Sandbox entrypoint health check defaulted tohttp://egg-gateway:9848(DNS name) which doesn't resolve from a containeron the isolated network. Now passes the IP-based URL from the orchestrator.
No command passed — Without explicit args, the sandbox entrypoint
launched interactive Claude Code (no TTY attached → immediate error). Now
builds a
claude --printcommand with phase-specific prompt matching theGHA SDLC workflow flags (
--verbose,--output-format stream-json,--max-turns 200).Additionally: container logs are now captured before cleanup, so failure
messages include the last 10 lines of sandbox output for debugging.
Integration tests (14 passing)
test_create_local_pipelinetest_start_local_pipeline_completestest_local_pipeline_no_pr_phasetest_local_pipeline_container_failuretest_failure_error_includes_container_logstest_sandbox_receives_pipeline_envtest_sandbox_receives_gateway_urltest_sandbox_receives_repo_volumetest_start_running_pipeline_returns_409test_start_completed_pipeline_returns_409test_issue_pipeline_includes_pr_phasetest_pipeline_crudtest_gateway_local_mode_blocks_pushtest_failed_pipeline_returns_409Known issues (analysis, not yet fixed)
1. Missing
EGG_SESSION_TOKENfor local-mode containersSymptom: The sandbox agent hits
ERROR: EGG_SESSION_TOKEN not setwhentrying to use git operations inside the container.
Root cause:
container_spawner.py:200hasif mode != "local" and repos:which explicitly skips gateway session registration for local pipelines. The
original rationale was "local pipelines don't need gateway access," but:
EGG_SESSION_TOKENsession level — without a session, there's no enforcement
Fix: Remove the
mode != "local"guard and always register sessions.For local pipelines without an explicit repo, detect the repo from the
mounted volume or pass a default. The gateway already handles mode="local"
sessions correctly (blocks push, allows read).
2. OOM kill on memory-constrained systems (exit code 137)
Symptom: Sandbox container exits with code 137, empty logs, container
already gone before log capture.
Root cause: Exit code 137 = SIGKILL (128 + 9), typically from the kernel
OOM killer. Spawning a full sandbox container (~400MB for Claude Code) on a
system that's already under memory pressure triggers OOM. Observed on a 32GB
system with ~1GB free RAM and swap nearly exhausted (Firefox was consuming
~15GB).
Analysis: The sandbox entrypoint runs 11 setup phases before launching
Claude Code. Most are necessary even for local mode (worktree isolation
prevents stomping on the user's working tree). Reducing peak memory without
skipping critical phases requires investigation into:
OOMKilledflag after exit code 137 fora clear diagnostic message instead of mysterious "Container exited with
code 137"
fail early with actionable guidance
made lighter without compromising isolation guarantees
Closes: #496