Phase 6: Release preparation #7
Closed
james-in-a-box[bot] wants to merge 1 commit into jib/phase-4-cli from
- Update CHANGELOG with comprehensive feature list from phases 1-4
- Rename package to egg-sandbox for PyPI (egg is likely taken)
- Add PyPI classifiers and keywords for discoverability
- Add topic categories (Security, Code Generators)

Authored-by: jib
This was referenced Feb 7, 2026
james-in-a-box bot added a commit that referenced this pull request on Feb 11, 2026

- Include truncated raw_partial_input in tool_use blocks when JSON parsing fails, for debugging incomplete streaming responses (re-review issue #1)
- Add docstring notes documenting:
  - Tool result matching order-dependency (original issue #5)
  - Force push checkpoint behavior (original issue #7)
  - Shared transcript for multi-commit pushes (original issue #8)
  - container_id fallback from filename (original issue #12)
- Fix import sorting in gateway.py (ruff auto-fix)

Authored-by: egg
jwbron pushed a commit that referenced this pull request on Feb 11, 2026

…pt capture (#517)

* Initialize SDLC contract for issue #509
* Draft analysis for issue #509: checkpoint migration
* Update refine review state (cycle 1)
* Approve and advance to plan phase for issue #509
* Draft plan for issue #509: per-commit checkpoints and API proxy capture
* Update plan review state (cycle 1)
* Approve and advance to implement phase for issue #509
* Populate contract tasks from plan for issue #509
* Implement per-commit checkpoint granularity (Phase 1)

  Add support for creating one checkpoint per commit instead of one per push:
  - Add get_commits_in_push() helper to enumerate commits between old/new SHAs
  - Add capture_and_store_checkpoints_for_push() for multi-commit checkpoint creation
  - Modify gateway push handler to get remote ref before push and iterate commits
  - Each checkpoint now has push_sha pointing to the tip commit for traceability
  - Graceful fallback: if rev-list fails, creates single checkpoint for tip commit

  This is Phase 1 of issue #509 (per-commit checkpoints + API proxy capture). Transcript source is still Claude Code JSONL (to be migrated in Phase 3).

  Authored-by: egg
* Implement API proxy transcript capture (Phase 2)

  Add instrumentation to the Anthropic API proxy to capture request/response pairs for checkpoint creation:
  - New TranscriptBuffer class manages per-session buffer files in /tmp/egg-transcripts/
  - Fixed 10MB buffer rotation drops oldest entries when size limit exceeded
  - Non-streaming responses captured with full content and usage
  - Streaming SSE responses reassembled to extract content blocks
  - Buffer entry schema documented with timestamps, model, messages, tools, usage
  - Thread-safe file writes with file locking
  - Global buffer cache for efficient access by container ID

  This is Phase 2 of issue #509 (per-commit checkpoints + API proxy capture). Phase 3 will migrate checkpoint extraction to use this buffer instead of Claude Code JSONL files.

  Authored-by: egg
* Migrate transcript source to API proxy buffer (Phase 3)

  Replace Claude Code JSONL extraction with API proxy buffer reading:
  - Add extract_transcript_from_proxy_buffer() and related extraction functions
  - Update CheckpointHandler.capture_checkpoint() to use proxy buffer instead of JSONL
  - Remove CLAUDE_PROJECTS_DIR constant (no longer needed)
  - Maintain same return types for backward compatibility

  The proxy buffer provides a stable API-based source for transcripts, eliminating dependency on Claude Code's internal file format which can change without notice.

  This is Phase 3 of issue #509. Phase 4 will clean up deprecated JSONL code and add buffer cleanup on session end.

  Authored-by: egg
* Cleanup and documentation for proxy buffer architecture (Phase 4)

  - Remove all Claude Code JSONL-specific code from transcript_extractor.py (file now only contains proxy buffer extraction functions)
  - Add buffer cleanup on session end in session_manager.py
  - Update CLI help text to describe per-commit checkpoint behavior
  - Add architecture documentation in checkpoint_handler.py explaining the transcript flow from API proxy to checkpoint storage

  Authored-by: egg
* Update autofix attempts for issue #509
* Address review feedback on PR #517 transcript capture

  Fixes critical and correctness issues raised in review:

  1. Race condition in buffer rotation (Critical #1):
     - Use exclusive file lock during entire rotation operation
     - Write to temp file then atomically replace with os.replace()
     - Clean up temp file on error
  2. Memory accumulation for streaming (Critical #2):
     - Cap collected chunks at 10MB to prevent resource exhaustion
     - Log when capture is truncated due to size limit
  3. Missing error handling for failed API responses (Critical #3):
     - Capture 4xx/5xx responses with status code and error message
     - Pass status_code to capture function for proper handling
  4. Missing input_tokens from message_start (Correctness #4):
     - Extract input_tokens, cache_read_input_tokens from message_start
     - message_delta only contains output_tokens per SSE spec
  5. Error events not handled in SSE parsing (Correctness #4):
     - Handle 'error' event type and capture as error content block
     - Add input_parse_error flag when tool_use JSON fails to parse
  6. Unsafe path construction (Correctness #6):
     - Validate container_id against path traversal characters
     - Verify resolved path is within buffer directory

  Minor fixes:
  - Move `import time` to module level (Minor #10)
  - Set buffer directory permissions to 0o700 (Minor #11)

  Authored-by: egg
* Address re-review feedback: add raw_partial_input and doc comments

  - Include truncated raw_partial_input in tool_use blocks when JSON parsing fails, for debugging incomplete streaming responses (re-review issue #1)
  - Add docstring notes documenting:
    - Tool result matching order-dependency (original issue #5)
    - Force push checkpoint behavior (original issue #7)
    - Shared transcript for multi-commit pushes (original issue #8)
    - container_id fallback from filename (original issue #12)
  - Fix import sorting in gateway.py (ruff auto-fix)

  Authored-by: egg
* Address minor review observations: add constant and logging

  - Add RAW_INPUT_TRUNCATE_SIZE constant for the 1000 char truncation threshold used when preserving raw tool input on parse failure
  - Add warning log in transcript_extractor when extracting tool calls with input_parse_error flag, surfacing streaming parse failures during checkpoint extraction

  Authored-by: egg
* Add test coverage for input_parse_error handling path

  Adds tests for the input_parse_error and raw_partial_input handling that was added for incomplete streaming JSON responses. This addresses the non-blocking observation from the PR review to ensure the defensive code path is documented through tests and protected from regressions.

  - TestParseSSEResponse: Tests that incomplete tool_use JSON sets input_parse_error flag and preserves truncated raw_partial_input
  - TestExtractToolCallsFromProxyBuffer: Tests that extraction logs a warning when input_parse_error is present and truncates the raw input preview appropriately

  Authored-by: egg

---------

Co-authored-by: james-in-a-box[bot] <2365503+james-in-a-box[bot]@users.noreply.github.com>
Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
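The "Unsafe path construction" fix in the commit above describes two checks: reject traversal characters in container_id, then confirm the resolved path stays inside the buffer directory. A minimal sketch of that idea follows. The function name `buffer_path_for` and the `.jsonl` suffix are assumptions for illustration; only the /tmp/egg-transcripts/ directory comes from the commit message, and the real implementation may differ.

```python
from pathlib import Path

# Buffer root named in the commit message; treated here as a default only.
BUFFER_DIR = Path("/tmp/egg-transcripts")


def buffer_path_for(container_id: str, buffer_dir: Path = BUFFER_DIR) -> Path:
    """Return the buffer file path for a container, rejecting traversal attempts.

    Hypothetical helper: the name and the ".jsonl" suffix are illustrative.
    """
    # Check 1: refuse empty IDs and characters that can change directories.
    bad_chars = ("/", "\\", "\0")
    if not container_id or ".." in container_id or any(c in container_id for c in bad_chars):
        raise ValueError(f"invalid container_id: {container_id!r}")

    root = buffer_dir.resolve()
    candidate = (root / f"{container_id}.jsonl").resolve()

    # Check 2: after resolving symlinks, the path must still be under the root.
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes buffer directory: {candidate}")
    return candidate
```

Doing both checks is deliberate: the character filter gives a clear error for obvious attacks, while the resolved-path comparison also covers cases the filter cannot see, such as symlinks inside the buffer directory.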
6 tasks
james-in-a-box bot added a commit that referenced this pull request on Feb 12, 2026

Critical fix (AC-28):
- Refactor run_interactive() and run_exec() to use subprocess.run() instead of os.execvpe() so entrypoint regains control after process exit and can signal completion to orchestrator

Code quality fixes:
- Use OrchestratorClient in entrypoint instead of raw urllib (#2)
- Add thread-safe singleton pattern with double-checked locking (#3)
- Add progress_percent validation (0-100) to ProgressData (#4)
- Standardize health check timeout to 5s, signal ops to 10s (#5)
- Preserve response body before JSON parsing in error handling (#7)
- Add warning log when using fallback constants (#9)
- Move ENV_AGENT_ROLE import to module level in detection.py (#10)
- Fix docstring mismatch in gateway _check_orchestrator_connectivity (#11)
- Export get_orchestrator_client from package __init__.py

Authored-by: egg
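The "thread-safe singleton pattern with double-checked locking" item above can be sketched roughly as follows. The names OrchestratorClient and get_orchestrator_client appear in the commit message, but the constructor signature, the default URL, and the module-level variables here are assumptions; the class below is a stand-in, not the real client.

```python
import threading
from typing import Optional


class OrchestratorClient:
    """Stand-in for the real client; only construction matters for this sketch."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


_client: Optional[OrchestratorClient] = None
_client_lock = threading.Lock()


def get_orchestrator_client(base_url: str = "http://orchestrator.invalid") -> OrchestratorClient:
    """Return the shared client, creating it at most once across threads."""
    global _client
    # First check without the lock: cheap fast path once the client exists.
    if _client is None:
        with _client_lock:
            # Second check under the lock: another thread may have created
            # the client while this one was waiting to acquire it.
            if _client is None:
                _client = OrchestratorClient(base_url)
    return _client
```

The second check is what makes the pattern safe: without it, two threads that both pass the first check would each construct a client.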
4 tasks
jwbron added a commit that referenced this pull request on Feb 12, 2026

#556)

* Initialize SDLC contract for issue #544
* Draft analysis for issue #544

  Analyze the five remaining orchestrator integration items:
  - AC-24: Gateway health reports orchestrator connectivity
  - AC-27: Typed sandbox-to-orchestrator API client
  - AC-28: Sandbox orchestrator mode detection and completion reporting
  - AC-29: shared/egg_orchestrator/ shared package
  - AC-33: Orchestrator architecture documentation

  Recommends hybrid approach following existing patterns. Includes HITL decisions for completion signaling method and shared types scope.

  Authored-by: egg
* Update refine review state (cycle 1)
* Approve and advance to plan phase for issue #544
* Draft plan for issue #544: orchestrator integration items
* Update plan review state (cycle 1)
* Approve and advance to implement phase for issue #544
* Populate contract tasks from plan for issue #544
* Add shared/egg_orchestrator/ package (AC-29)

  Create shared package for orchestrator types and utilities used by both gateway and sandbox containers. This provides:
  - OrchestratorClient: Typed client for sandbox-to-orchestrator communication
  - Orchestrator mode detection utilities (is_orchestrator_mode)
  - Deployment mode enum (LOCAL, REMOTE_SINGLE, DISTRIBUTED)
  - Signal types for completion reporting (complete, progress, error, heartbeat)
  - Constants and configuration for orchestrator connectivity

  The package is designed to be imported by both containers without creating circular dependencies. It uses only standard library for HTTP communication to avoid adding dependencies.

  Authored-by: egg
* Add orchestrator connectivity to gateway health endpoint (AC-24)

  The /api/v1/health endpoint now reports orchestrator connectivity status when an orchestrator URL is configured via EGG_ORCHESTRATOR_URL.

  The response includes:
  - orchestrator.configured: Whether orchestrator URL is set
  - orchestrator.reachable: Whether the orchestrator responded to health check
  - orchestrator.status: Orchestrator health status if reachable
  - orchestrator.error: Error message if unreachable

  This enables monitoring tools to verify the full gateway-orchestrator communication path during distributed deployments.

  Authored-by: egg
* Add sandbox orchestrator mode detection and completion reporting (AC-28)

  The sandbox entrypoint now detects when running in orchestrator mode (vs interactive/local) and signals completion/error back to the orchestrator on container exit.

  Detection is based on:
  - EGG_ORCHESTRATOR_MODE environment variable (explicit)
  - EGG_PIPELINE_ID environment variable (implicit)
  - EGG_ORCHESTRATOR_URL environment variable (implicit)

  On exit, the sandbox sends a signal to the orchestrator's signal API:
  - "complete" signal for successful exit (exit code 0)
  - "error" signal for failed exit with error message

  This enables the orchestrator to track sandbox lifecycle without polling container status, and to coordinate multi-agent workflows.

  Authored-by: egg
* Add orchestrator architecture documentation (AC-33)

  Document the three deployment modes and how components interact:
  1. Local mode: Interactive sandbox with gateway sidecar
  2. Remote-single mode: Orchestrator spawns single managed sandbox
  3. Distributed mode: Orchestrator coordinates multiple agent sandboxes

  Includes:
  - Network architecture diagrams (ASCII)
  - Component interaction details
  - Signal flow for sandbox lifecycle
  - Environment variable reference
  - Shared package documentation

  Also updates architecture README to reference the new orchestrator doc and adds egg_orchestrator to the components table.

  Authored-by: egg
* Update autofix attempts for issue #544
* Address review feedback on orchestrator integration PR

  Critical fix (AC-28):
  - Refactor run_interactive() and run_exec() to use subprocess.run() instead of os.execvpe() so entrypoint regains control after process exit and can signal completion to orchestrator

  Code quality fixes:
  - Use OrchestratorClient in entrypoint instead of raw urllib (#2)
  - Add thread-safe singleton pattern with double-checked locking (#3)
  - Add progress_percent validation (0-100) to ProgressData (#4)
  - Standardize health check timeout to 5s, signal ops to 10s (#5)
  - Preserve response body before JSON parsing in error handling (#7)
  - Add warning log when using fallback constants (#9)
  - Move ENV_AGENT_ROLE import to module level in detection.py (#10)
  - Fix docstring mismatch in gateway _check_orchestrator_connectivity (#11)
  - Export get_orchestrator_client from package __init__.py

  Authored-by: egg
* Address review feedback on orchestrator integration PR

  Key changes addressing reviewer feedback:

  1. Signal handler logic (#5): Track subprocess completion state to report correct exit codes. If SIGTERM arrives before subprocess completes, report interrupted status (128+signum) instead of always success (0).
  2. TTY handling (#4): Add explicit stdin/stdout/stderr to subprocess.run() calls to ensure consistent terminal behavior after the switch from os.execvpe() to subprocess.run().
  3. Security: Unknown phase fail-closed (#1): Change phase_filter to block files for unknown phases instead of allowing by default. This prevents bypass via invalid phase strings.
  4. Security: Path escape validation (#2): Add validation in _normalize_path to block paths that escape the repository (e.g., ../../../etc/passwd).
  5. py.typed marker file (#1): Add empty py.typed file for PEP 561 type checking support in egg_orchestrator package.
  6. Test coverage: Add comprehensive tests for:
     - egg_orchestrator types, client, detection
     - Entrypoint orchestrator mode and subprocess handling
     - Gateway health orchestrator connectivity
     - Phase filter unknown phase blocking and path escape validation

  Authored-by: egg
* Address contract verification feedback for AC-22, AC-23, AC-25, AC-13, AC-20

  Fix acceptance criteria verification issues:
  - AC-22: Change orchestrator health check timeout from 5s to 2s as specified
  - AC-23: Include URL field in orchestrator health response when configured
  - AC-25: Add test for orchestrator unreachable case (connection failure)
  - AC-13: Add HTTP response tests for signal methods with mocked responses
  - AC-20: Add tests verifying signals are sent on normal exit and error exit

  Authored-by: egg
* Add url field verification to orchestrator health test

  The test_health_check_orchestrator_reachable test was mocking _check_orchestrator_connectivity without including the url field that the actual implementation returns. Updated the mock and added an assertion to verify the url field is present.

  Authored-by: egg

---------

Co-authored-by: james-in-a-box[bot] <2365503+james-in-a-box[bot]@users.noreply.github.com>
Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com>
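Two points from the review-feedback commits above lend themselves to a short sketch: subprocess.run() returns control to the entrypoint (os.execvpe() replaces the process and never returns on success), and an interrupted run should be reported as 128+signum rather than 0. The helper names `exit_code_to_report` and `run_child` below are hypothetical, as is the exact decision logic; this illustrates the convention rather than the project's entrypoint code.

```python
import signal
import subprocess
import sys
from typing import IO, Optional, Sequence, Union

Stream = Union[None, int, IO]


def exit_code_to_report(returncode: Optional[int], received_signal: Optional[int]) -> int:
    """Choose the exit code to report once the entrypoint regains control."""
    # The child never finished but we were signalled: report 128 + signum,
    # following the usual shell convention for signal-terminated processes.
    if returncode is None:
        return 128 + received_signal if received_signal is not None else 1
    # subprocess reports "killed by signal N" as -N; map it to 128 + N too.
    if returncode < 0:
        return 128 + (-returncode)
    return returncode


def run_child(cmd: Sequence[str], stdin: Stream = None,
              stdout: Stream = None, stderr: Stream = None) -> int:
    """Run cmd and return its reportable exit code.

    Passing None inherits the parent's handle; the commit describes passing
    stdin/stdout/stderr explicitly to keep terminal behavior consistent.
    """
    completed = subprocess.run(cmd, stdin=stdin, stdout=stdout, stderr=stderr)
    return exit_code_to_report(completed.returncode, None)
```

A caller would typically record the signal number in a handler installed with signal.signal() and pass it as `received_signal` when the subprocess result is not yet available.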
6 tasks
james-in-a-box bot pushed a commit that referenced this pull request on Feb 13, 2026

…add tests

- Extract validate_checks() helper in repo_config.py to centralize check validation logic (fixes #2/#3: duplicated validation, dead code)
- Add validation of deserialized EGG_REPO_CHECKS in orchestrator to prevent KeyError on malformed data (fixes #1: missing env var validation)
- Use validate_checks() from compose.py instead of inline duplication
- Remove leading underscores from local variables in pipelines.py (fixes #5)
- Add tests for _build_checker_prompt and _build_autofix_prompt with repo_checks parameter (fixes #6: missing prompt builder tests)
- Remove redundant sys.path manipulation from test file (fixes #7)
This was referenced Feb 13, 2026
jwbron added a commit that referenced this pull request on Feb 13, 2026

* Add per-repo check commands for multi-repo SDLC pipeline
* Address review feedback: validate env var checks, deduplicate logic, add tests

  - Extract validate_checks() helper in repo_config.py to centralize check validation logic (fixes #2/#3: duplicated validation, dead code)
  - Add validation of deserialized EGG_REPO_CHECKS in orchestrator to prevent KeyError on malformed data (fixes #1: missing env var validation)
  - Use validate_checks() from compose.py instead of inline duplication
  - Remove leading underscores from local variables in pipelines.py (fixes #5)
  - Add tests for _build_checker_prompt and _build_autofix_prompt with repo_checks parameter (fixes #6: missing prompt builder tests)
  - Remove redundant sys.path manipulation from test file (fixes #7)
* Centralize validate_checks in shared/egg_config/validators

  Move validate_checks() to shared/egg_config/validators.py as the single canonical definition. The orchestrator, config, and compose modules all import from this shared location, eliminating the duplicated inline validation logic flagged in review.

  Authored-by: egg
* Separate validate_checks import into its own try/except block

  The validate_checks import was bundled into the same try/except as the network constants (ORCHESTRATOR_*_IP, ORCHESTRATOR_PORT). If egg_config.validators failed to import while egg_config constants succeeded, the except block would overwrite the real constants with hardcoded fallbacks. Use separate try/except blocks so each import has an independent fallback, matching the pattern in repo_config.py.
* Add per-repo check commands to setup flow
* Gate repo checks config behind top-level prompt; add integration test

---------

Co-authored-by: egg <egg@localhost>
Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com>
Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com>
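The "separate try/except" commit above can be pictured with the sketch below: each import has its own fallback, so a failure in egg_config.validators cannot overwrite constants that imported fine. The module paths follow the commit message, but the fallback port value, the fallback validator, and its assumed entry schema ("name" and "command" string keys) are placeholders invented for illustration.

```python
from typing import Any, Dict, List

# Import 1: network constants, with their own fallback.
try:
    from egg_config import ORCHESTRATOR_PORT  # real constant when the package is available
except ImportError:
    ORCHESTRATOR_PORT = 0  # placeholder; the actual fallback value is not given in the commit

# Import 2: the validator, isolated so its failure leaves ORCHESTRATOR_PORT untouched.
try:
    from egg_config.validators import validate_checks
except ImportError:
    def validate_checks(checks: Any) -> List[Dict[str, str]]:
        """Hypothetical fallback: keep only entries that look well formed."""
        if not isinstance(checks, list):
            return []
        return [
            c for c in checks
            if isinstance(c, dict)
            and isinstance(c.get("name"), str)
            and isinstance(c.get("command"), str)
        ]
```

Had both imports shared one try block, an ImportError from the second would have run the except clause and replaced the already-imported constant with the hardcoded fallback, which is the bug the commit describes.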
This was referenced Feb 14, 2026
This was referenced Feb 17, 2026
james-in-a-box bot pushed a commit that referenced this pull request on Feb 22, 2026

- Fix #1: Resolve GitHub token from source_repo query param when repo_path is the scratch repo (no git remote). New _resolve_checkpoint_token helper uses get_token_for_repo(source_repo) as fallback, and all checkpoint endpoints pass the token explicitly.
- Fix #2: Add threading lock to _ensure_checkpoint_scratch_repo to prevent TOCTOU race when concurrent requests create the scratch repo.
- Fix #3: Add os.path.isdir check for session.last_repo_path so nonexistent sandbox paths fall through to the scratch repo.
- Fix #4: Move scratch dir from /tmp/ to /home/egg/.egg-worktrees/ which is within ALLOWED_REPO_PATHS.
- Fix #5: Refactor _get_checkpoint_repo_from_args to return (checkpoint_repo, source_repo) tuple, eliminating duplicate git remote calls in _add_checkpoint_resolution_params.
- Fix #6: Add 11 gateway-side unit tests covering source_repo resolution, scratch repo creation, repo path fallthrough, and token resolution fallback.
- Fix #7: Handle trailing slashes in _get_source_repo regex.
jwbron added a commit that referenced this pull request on Feb 22, 2026

* Fix checkpoint CLI returning empty results in sandbox

  The checkpoint CLI failed to resolve checkpoint_repo inside sandbox containers because repositories.yaml is only mounted on the gateway. When the CLI couldn't resolve checkpoint_repo locally, it sent HTTP requests without it. The gateway then couldn't auto-detect it either because the sandbox's repo path doesn't exist on the gateway filesystem.

  Changes:
  - CLI: Extract _get_source_repo() helper and pass source_repo param to gateway when checkpoint_repo can't be resolved locally
  - Gateway: _resolve_checkpoint_repo() uses source_repo param as fallback for config lookup when path-based auto-detection fails
  - Gateway: _resolve_repo_path_for_checkpoints() falls through to session/env/scratch-repo fallbacks when sandbox path doesn't exist locally instead of hard-returning None

  Fixes #884
* Address review feedback for checkpoint sandbox fix

  - Fix #1: Resolve GitHub token from source_repo query param when repo_path is the scratch repo (no git remote). New _resolve_checkpoint_token helper uses get_token_for_repo(source_repo) as fallback, and all checkpoint endpoints pass the token explicitly.
  - Fix #2: Add threading lock to _ensure_checkpoint_scratch_repo to prevent TOCTOU race when concurrent requests create the scratch repo.
  - Fix #3: Add os.path.isdir check for session.last_repo_path so nonexistent sandbox paths fall through to the scratch repo.
  - Fix #4: Move scratch dir from /tmp/ to /home/egg/.egg-worktrees/ which is within ALLOWED_REPO_PATHS.
  - Fix #5: Refactor _get_checkpoint_repo_from_args to return (checkpoint_repo, source_repo) tuple, eliminating duplicate git remote calls in _add_checkpoint_resolution_params.
  - Fix #6: Add 11 gateway-side unit tests covering source_repo resolution, scratch repo creation, repo path fallthrough, and token resolution fallback.
  - Fix #7: Handle trailing slashes in _get_source_repo regex.
* Address non-blocking review feedback on checkpoint CLI sandbox fix

  - Fix trailing slash regex in checkpoint_handler.py (3 instances) to match the CLI fix already applied; handles URLs like https://github.com/org/repo/ (review item A)
  - Add missing assertion for source_repo in test_returns_none_on_unexpected_error (review item B)
  - Tighten owner/repo regex in gateway.py to reject names starting with dots or hyphens, matching GitHub's actual naming rules (review item C)

---------

Co-authored-by: egg <egg@localhost>
Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com>
Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com>
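The regex fixes above (tolerate a trailing slash, reject owner/repo names that start with a dot or hyphen) might look something like the sketch below. The pattern and the function name `parse_source_repo` are assumptions written for this illustration; they are not the expressions used in checkpoint_handler.py or gateway.py, and the character classes are a simplification of GitHub's naming rules.

```python
import re
from typing import Optional

# Assumed name rule: first character alphanumeric or underscore, then
# alphanumerics, underscores, dots, or hyphens.
_NAME = r"[A-Za-z0-9_][A-Za-z0-9_.-]*"

# The repo part is lazy ("*?") so an optional ".git" suffix and an optional
# trailing slash are left for the tail of the pattern to consume.
_REMOTE_RE = re.compile(
    rf"github\.com[:/](?P<owner>{_NAME})/(?P<repo>[A-Za-z0-9_][A-Za-z0-9_.-]*?)(?:\.git)?/?$"
)


def parse_source_repo(remote_url: str) -> Optional[str]:
    """Return "owner/repo" for a GitHub remote URL, or None if it does not match."""
    match = _REMOTE_RE.search(remote_url.strip())
    if match is None:
        return None
    return f"{match.group('owner')}/{match.group('repo')}"
```

Both HTTPS and SSH-style remotes are handled because the separator after the host is matched as either ":" or "/".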
james-in-a-box bot pushed a commit that referenced this pull request on Feb 25, 2026

- Fix free-text command interception: use /q and /s prefixes instead of bare q/s to avoid swallowing legitimate single-letter answers (issue #4)
- Add TOCTOU race condition comment on _resolve_contract_decision documenting the known limitation and future fix path (issue #1)
- Add clarifying comment on pipeline_id dual role at call site (issue #5)
- Add unit tests for _handle_contract_questions: option-based selection, free-text input, skip (/s), quit (/q), EOFError, KeyboardInterrupt, resolve failure, and literal q/s as valid answers (issues #3/#7)
- Add integration test for full [q] → answer → approve flow through the phase gate (issue #7)
jwbron pushed a commit that referenced this pull request on Feb 25, 2026

* Bridge contract HITL decisions to CLI phase gate

  Contract decisions created by agents via `egg-contract add-decision` were invisible in local mode because the CLI only polled orchestrator decisions. This adds a CLI-level bridge that reads pending decisions directly from the contract JSON file and surfaces them as an interactive [q] option in the phase gate menu. Approving with unanswered questions triggers a warning.

  Issue: #908
* Address review feedback on contract decision bridge

  - Fix free-text command interception: use /q and /s prefixes instead of bare q/s to avoid swallowing legitimate single-letter answers (issue #4)
  - Add TOCTOU race condition comment on _resolve_contract_decision documenting the known limitation and future fix path (issue #1)
  - Add clarifying comment on pipeline_id dual role at call site (issue #5)
  - Add unit tests for _handle_contract_questions: option-based selection, free-text input, skip (/s), quit (/q), EOFError, KeyboardInterrupt, resolve failure, and literal q/s as valid answers (issues #3/#7)
  - Add integration test for full [q] → answer → approve flow through the phase gate (issue #7)
* Handle EOF in option-based contract questions, add multi-question test

  Address non-blocking suggestions from re-review:
  - Handle _prompt_choice returning "c" on EOF/KeyboardInterrupt in the option-based path of _handle_contract_questions. Previously this would fall through to int("c") and raise ValueError. Now treats "c" as quit.
  - Add test for EOF during option-based questions to prevent regression.
  - Add multi-question test (answer one, skip another) to exercise the iteration logic with 2+ pending decisions.

---------

Co-authored-by: egg <egg@localhost>
Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com>
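The free-text fix described above (reserve /q and /s as commands so a literal "q" or "s" remains a valid answer) can be illustrated with a small classifier. The function name `classify_free_text` and the returned labels are hypothetical; the real _handle_contract_questions logic is not shown in this conversation.

```python
from typing import Tuple


def classify_free_text(raw: str) -> Tuple[str, str]:
    """Classify one line of free-text input as a command or an answer.

    Returns (kind, text), where kind is "quit", "skip", or "answer".
    """
    text = raw.strip()
    # Only the slash-prefixed forms are commands.
    if text == "/q":
        return ("quit", "")
    if text == "/s":
        return ("skip", "")
    # Everything else, including a bare "q" or "s", is a real answer.
    return ("answer", text)
```

With bare q/s as commands, a question whose correct answer is the single letter "s" could never be answered; the slash prefix removes that collision.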
james-in-a-box bot pushed a commit that referenced this pull request on Feb 26, 2026

- Fix operator precedence bug in test assertion (issue #1)
- Restore auto_create_pr field as deprecated for backwards compat (issue #2)
- Remove dead local-mode PR phase prompt code (issue #3)
- Set work_started_at on phase_execution in auto-PR path (issue #4)
- Remove unused mock handler variables (issue #5)
- Document create_pr error contract difference from other temp-session methods (issue #7)
- Upgrade pre-PR push failure log level from WARNING to ERROR (issue #8)
jwbron pushed a commit that referenced this pull request on Feb 26, 2026

* Auto-create PR in orchestrator, skip agent spawn for PR phase
* Fix checks: apply automated formatting fixes
* Remove auto_create_pr opt-out, always auto-create PR in orchestrator
* Fix lint: remove unused PipelineConfig import
* Address review feedback on auto-PR creation

  - Fix operator precedence bug in test assertion (issue #1)
  - Restore auto_create_pr field as deprecated for backwards compat (issue #2)
  - Remove dead local-mode PR phase prompt code (issue #3)
  - Set work_started_at on phase_execution in auto-PR path (issue #4)
  - Remove unused mock handler variables (issue #5)
  - Document create_pr error contract difference from other temp-session methods (issue #7)
  - Upgrade pre-PR push failure log level from WARNING to ERROR (issue #8)

---------

Co-authored-by: egg <egg@localhost>
Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
Co-authored-by: egg-reviewer[bot] <261018737+egg-reviewer[bot]@users.noreply.github.com>
This was referenced Feb 26, 2026
Summary
Phase 6 of the egg extraction: prepare for initial release.
Changes
CHANGELOG.md:
pyproject.toml:
egg-sandbox (PyPI namespace)
Validation Plan
Authored-by: jib