Extract shared container-launch command builder by james-in-a-box[bot] · Pull Request #159 · jwbron/egg

james-in-a-box · 2026-02-06T05:52:12Z

Extract shared build_sandbox_docker_cmd() for container launches

Container-launch command construction was duplicated across three call
sites (run_claude(), exec_in_new_container(), E2E run_claude_structured()),
causing divergence bugs like missing --add-host and wrong env var names.
The test harness was also missing PRIVATE_MODE, DNS lockdown, proxy
config, RUNTIME_UID/GID, CONTAINER_ID, and --security-opt.

This introduces a new shared/egg_container module with a pure
build_sandbox_docker_cmd() function that all three call sites now use.
The function builds the base docker run argument list with all
security-critical flags (network, DNS, proxy, session token, etc.)
while callers insert their own lifecycle flags, mounts, and commands
via documented conventions (cmd[2:2], cmd[-1:-1], cmd.extend()).

_get_container_network_config() in runtime.py now returns a
ContainerNetworkConfig dataclass instead of a raw tuple, making the
interface self-documenting.

Issue: none

Test plan:

26 new unit tests cover private/public mode, optional params, and caller insertion conventions
Full test suite passes (1049/1049)
ruff check passes on all modified files
Verify via CI workflow_dispatch E2E after merge

Authored-by: egg

jwbron · 2026-02-06T05:53:19Z

@james-in-a-box fix merge conflicts on this branch

james-in-a-box · 2026-02-06T06:03:06Z

Merge conflicts resolved. Rebased onto main (dcfe0a8 from PR #157 which removed redundant docs directories).

Resolution summary:

File	Resolution
`.github/workflows/on-pull-request.yml`	Kept main's version (unrelated to this PR)
`README.md`, `action/.{md,sh}`, `config/`, `docs/.md`, `gateway/`, `sandbox/README.md`	Kept main's version (PR #157 changes)
`sandbox/egg_lib/setup_flow.py`	Kept main's GitHub docs URL (local docs file was removed in #157)
`pyproject.toml`	Added `shared/egg_container` to packages list
`shared/pyproject.toml`	Added `egg_container*` to packages.find include
`integration_tests/conftest.py`	Merged both: PR's `build_sandbox_docker_cmd()` usage + main's E2E credential passthrough and timeout handling
`sandbox/egg_lib/runtime.py`	Merged: PR's refactoring to use `ContainerNetworkConfig` dataclass and `build_sandbox_docker_cmd()`

The PR's new shared/egg_container/ module and tests were added cleanly.

— Authored by egg

james-in-a-box · 2026-02-06T06:03:24Z

egg finished successfully. View run logs

jwbron · 2026-02-06T18:34:22Z

@james-in-a-box review this pr

james-in-a-box

Review: LGTM

I've reviewed the implementation and it looks good. The refactoring successfully consolidates the duplicated container-launch command construction into a single, well-tested function.

What I reviewed

shared/egg_container/__init__.py

Clean, pure function with keyword-only arguments for safety
ContainerNetworkConfig dataclass provides a self-documenting interface
Handles both private and public modes correctly:
- Private: DNS lockdown (--dns 0.0.0.0), proxy env vars, PRIVATE_MODE=true
- Public: No DNS restriction, no proxy vars, PRIVATE_MODE=false
Extension points (extra_env, extra_args) allow callers to add context-specific configuration
Good docstring documents the caller insertion conventions (cmd[2:2], cmd[-1:-1], cmd.extend())

sandbox/egg_lib/runtime.py

_get_container_network_config() now returns ContainerNetworkConfig instead of a raw tuple
Both run_claude() and exec_in_new_container() use the shared builder
Caller-specific env vars (timing, auth, task correlation) are passed via extra_env
Mount args and lifecycle flags inserted using documented conventions

integration_tests/conftest.py

run_claude_structured() migrated to use build_sandbox_docker_cmd()
Properly constructs ContainerNetworkConfig with test network values
Session token and credentials passed correctly

tests/shared/egg_container/test_build_cmd.py

26 tests covering:
- Basic structure (docker run, image placement)
- Security options and network configuration
- Optional parameters (IP, session token, UID/GID)
- Private vs public mode behavior
- Caller insertion conventions
All tests pass

Verification

$ pytest tests/shared/egg_container/ -v
26 passed in 0.60s

$ ruff check shared/egg_container/__init__.py sandbox/egg_lib/runtime.py integration_tests/conftest.py tests/shared/egg_container/test_build_cmd.py
All checks passed!

The 5 test errors in tests/sandbox/test_git_isolation.py are pre-existing (git init failures unrelated to this PR).

— Authored by egg

james-in-a-box · 2026-02-06T18:38:29Z

egg finished successfully. View run logs

Container-launch command construction was duplicated across three call sites (run_claude, exec_in_new_container, E2E conftest), causing divergence bugs like missing --add-host and wrong env var names. The test harness was also missing PRIVATE_MODE, DNS lockdown, proxy config, RUNTIME_UID/GID, CONTAINER_ID, and --security-opt. This extracts a single build_sandbox_docker_cmd() into a new shared/egg_container module that all three sites now call, with 26 unit tests covering private/public mode, optional params, and the caller insertion conventions.

james-in-a-box

AI Code Review

Review completed but no structured output found.

This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

jwbron · 2026-02-06T18:47:49Z

@james-in-a-box review this pr

james-in-a-box

Code Review

Summary

This PR extracts a shared build_sandbox_docker_cmd() function into a new shared/egg_container module to eliminate duplicated container-launch command construction across three call sites. The change addresses real divergence bugs where different launch sites had inconsistent flags.

What I Reviewed

shared/egg_container/__init__.py - New shared module
tests/shared/egg_container/test_build_cmd.py - Unit tests (26 tests)
sandbox/egg_lib/runtime.py - Integration in production launcher
integration_tests/conftest.py - Integration in test harness
pyproject.toml and shared/pyproject.toml - Package configuration

Findings

Strengths:

Well-designed API - The build_sandbox_docker_cmd() function uses keyword-only arguments with sensible defaults, making call sites self-documenting. The ContainerNetworkConfig dataclass cleanly bundles network parameters.
Clear conventions documented - The docstring explains the caller insertion patterns (cmd[2:2] for lifecycle flags, cmd[-1:-1] for mounts, cmd.extend() for commands) with concrete examples.
Comprehensive test coverage - 26 unit tests covering:
- Basic structure (docker run prefix, image suffix)
- Security options (--security-opt label=disable)
- Optional parameters (IP, session token, UID/GID)
- Private mode (DNS lockdown, proxy env vars, PRIVATE_MODE=true)
- Public mode (no DNS lockdown, no proxy vars, PRIVATE_MODE=false)
- Caller conventions for list manipulation
Mode-specific network config - The _get_container_network_config() helper in runtime.py returns a ContainerNetworkConfig dataclass instead of a raw tuple, making the interface self-documenting and preventing field-order bugs.
Proxy fallback logic - When proxy_url is None in private mode, the function correctly falls back to http://{gateway_hostname}:3129.

No blocking issues found. The implementation is clean and the tests are thorough.

Minor Observations (non-blocking)

Dict ordering for extra_env - The extra_env dict iteration order is deterministic in Python 3.7+, so the env var order in the command will be consistent. This is fine.
Test helper functions - _private_config() and _public_config() in the tests use **overrides pattern effectively for test clarity.

Verification

✓ 26/26 unit tests pass (tests/shared/egg_container/test_build_cmd.py)
✓ ruff check passes on all modified files
✓ Module imports correctly (PYTHONPATH=shared)

Recommendation: Approve and merge after CI passes.

— Authored by egg

james-in-a-box · 2026-02-06T18:49:49Z

egg finished successfully. View run logs

jwbron · 2026-02-06T19:15:09Z

@james-in-a-box address the minor issues above and fix lint issues

james-in-a-box · 2026-02-06T19:17:13Z

Fixed ruff format issues in integration_tests/conftest.py and shared/egg_container/__init__.py. Lint checks should now pass.

— Authored by egg

james-in-a-box · 2026-02-06T19:17:23Z

egg finished successfully. View run logs

james-in-a-box

AI Code Review

Review completed but no structured output found.

This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

james-in-a-box

Thorough audit of whether this PR unifies the three sandbox container launch paths (local interactive via run_claude(), headless/GHA via exec_in_new_container(), and E2E tests via run_claude_structured()).

What's working well

The shared build_sandbox_docker_cmd() successfully centralizes the core docker command construction. Before this PR, the test path in conftest.py was missing 6 critical flags/env vars that production had:

--security-opt label=disable
--name
-e CONTAINER_ID
-e RUNTIME_UID/GID
-e PRIVATE_MODE=true/false
--dns 0.0.0.0 + proxy env vars (private mode)

All fixed now. The ContainerNetworkConfig dataclass replacing the raw tuple is also a nice improvement.

Remaining gaps

1. Tests don't set ANTHROPIC_AUTH_METHOD — Production callers always set this env var (api_key or oauth), but run_claude_structured() only passes ANTHROPIC_OAUTH_TOKEN directly and never sets ANTHROPIC_AUTH_METHOD. If sandbox startup logic branches on this env var, tests are exercising a different code path than production. Consider adding it to the test's extra_env.

2. Network config construction still diverges — Production callers go through _get_container_network_config() → ContainerNetworkConfig, while the test constructs ContainerNetworkConfig manually from EggStack values. If _get_container_network_config() is ever updated (e.g., adds a new field), the test won't pick it up automatically. Worth documenting this coupling.

3. Tests only exercise private mode — run_claude_structured() hardcodes repo_mode="private". The unit tests for build_sandbox_docker_cmd cover public mode, but the integration test never launches a sandbox container in public mode. If the shared builder has a public-mode bug in practice, E2E tests won't catch it.

4. The cmd[2:2] insertion convention is fragile — All callers rely on the builder always starting with ["docker", "run", ...] and insert lifecycle flags at index 2. This is documented and tested, but it's an implicit contract. If the builder's structure changes, all callers silently break. Not blocking, but worth noting as a maintenance risk.

5. Tests skip container_ip (no session IP binding) — Production always allocates a static --ip for gateway request verification. Tests bypass this, so they don't replicate the production security model for IP-bound sessions.

Verdict

The PR substantially addresses the startup unification goal. The security-critical flags and env vars are now consistent across all three paths. Gaps 1 and 2 are the most actionable — the rest are reasonable test-vs-production trade-offs.

— Authored by egg

jwbron · 2026-02-06T19:37:09Z

@james-in-a-box go ahead and address the issues in this comment: #159 (review)

Addresses two gaps from the audit review: 1. Tests now set ANTHROPIC_AUTH_METHOD=oauth to match production behavior. Production callers always set this env var (api_key or oauth), but the test was only passing ANTHROPIC_OAUTH_TOKEN. This ensures sandbox startup code branches the same way in tests as in production. 2. Added docstring to run_claude_structured() documenting the network config coupling. The test constructs ContainerNetworkConfig manually from EggStack values rather than calling _get_container_network_config(). If that function changes, the test must be updated to match. Authored-by: egg

james-in-a-box

Addressed the two most actionable gaps from the review:

Gap #1 (ANTHROPIC_AUTH_METHOD): run_claude_structured() now sets ANTHROPIC_AUTH_METHOD=oauth in extra_env. This ensures the sandbox startup code branches the same way in tests as in production.

Gap #2 (Network config coupling): Added a docstring section to run_claude_structured() documenting that it constructs ContainerNetworkConfig manually from EggStack values rather than calling _get_container_network_config(). If that function changes, the test must be updated to match.

Gaps 3-5 (public mode E2E coverage, cmd[2:2] fragility, container IP binding) are noted as reasonable test-vs-production trade-offs and not blocking.

— Authored by egg

james-in-a-box · 2026-02-06T19:39:27Z

egg finished successfully. View run logs

james-in-a-box

AI Code Review

Review completed but no structured output found.

This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

jwbron · 2026-02-06T19:41:30Z

@james-in-a-box go ahead and address points 4 and 5 above. Create an issue for 3. #159 (review)

Gap #4 (cmd[2:2] fragility): Add LIFECYCLE_FLAGS_INDEX constant to make the implicit contract explicit. Callers now reference the module constant instead of hardcoding the magic number. Added a test that verifies the constant value, so changes to the builder's structure will fail tests. Gap #5 (container_ip): run_claude_structured() now allocates a static IP for test containers via _allocate_test_container_ip(). This matches production where sessions are bound to specific container IPs for request verification. Gap #3 (public mode E2E): Created issue #168 to track adding E2E test coverage for public mode container launches. Authored-by: egg

james-in-a-box · 2026-02-06T19:46:01Z

Addressed the review feedback from #159 (review):

Gap #4 (cmd[2:2] fragility): Added LIFECYCLE_FLAGS_INDEX constant to shared/egg_container/__init__.py. All callers now use this constant instead of hardcoding 2. Added a test that verifies the constant value, so any changes to the builder's structure will cause test failures rather than silent caller breakage.

Gap #5 (container_ip): run_claude_structured() now allocates a static IP via _allocate_test_container_ip() and passes it to build_sandbox_docker_cmd(). Test containers now have static IPs for session binding, matching production behavior.

Gap #3 (public mode E2E): Created issue #168 to track adding E2E test coverage for public mode container launches.

— Authored by egg

james-in-a-box · 2026-02-06T19:46:13Z

egg finished successfully. View run logs

james-in-a-box

AI Code Review

Review completed but no structured output found.

This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

james-in-a-box bot force-pushed the egg/extract-container-cmd-builder branch from 1699b8c to 8d5f38d Compare February 6, 2026 06:02

james-in-a-box bot commented Feb 6, 2026

View reviewed changes

james-in-a-box bot force-pushed the egg/extract-container-cmd-builder branch from 8d5f38d to 8ccda2b Compare February 6, 2026 18:43

james-in-a-box bot commented Feb 6, 2026

View reviewed changes

Fix ruff format issues

c144301

james-in-a-box bot commented Feb 6, 2026

View reviewed changes

This was referenced Feb 6, 2026

Add docker/claude invocation linter #167

Merged

Add E2E test coverage for public mode container launches #168

Open

james-in-a-box bot commented Feb 6, 2026

View reviewed changes

jwbron approved these changes Feb 6, 2026

View reviewed changes

jwbron merged commit 5c75222 into main Feb 6, 2026
5 checks passed

Conversation

james-in-a-box bot commented Feb 6, 2026

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Review: LGTM

What I reviewed

Verification

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

AI Code Review

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Code Review

Summary

What I Reviewed

Findings

Minor Observations (non-blocking)

Verification

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

AI Code Review

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

What's working well

Remaining gaps

Verdict

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

AI Code Review

Uh oh!

jwbron commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot commented Feb 6, 2026

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

AI Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development