Extract shared container-launch command builder#159
Conversation
|
@james-in-a-box fix merge conflicts on this branch |
1699b8c to
8d5f38d
Compare
|
Merge conflicts resolved. Rebased onto main (dcfe0a8 from PR #157 which removed redundant docs directories). Resolution summary:
The PR's new — Authored by egg |
|
egg finished successfully. View run logs |
|
@james-in-a-box review this pr |
There was a problem hiding this comment.
Review: LGTM
I've reviewed the implementation and it looks good. The refactoring successfully consolidates the duplicated container-launch command construction into a single, well-tested function.
What I reviewed
shared/egg_container/__init__.py
- Clean, pure function with keyword-only arguments for safety
ContainerNetworkConfigdataclass provides a self-documenting interface- Handles both private and public modes correctly:
- Private: DNS lockdown (
--dns 0.0.0.0), proxy env vars,PRIVATE_MODE=true - Public: No DNS restriction, no proxy vars,
PRIVATE_MODE=false
- Private: DNS lockdown (
- Extension points (
extra_env,extra_args) allow callers to add context-specific configuration - Good docstring documents the caller insertion conventions (
cmd[2:2],cmd[-1:-1],cmd.extend())
sandbox/egg_lib/runtime.py
_get_container_network_config()now returnsContainerNetworkConfiginstead of a raw tuple- Both
run_claude()andexec_in_new_container()use the shared builder - Caller-specific env vars (timing, auth, task correlation) are passed via
extra_env - Mount args and lifecycle flags inserted using documented conventions
integration_tests/conftest.py
run_claude_structured()migrated to usebuild_sandbox_docker_cmd()- Properly constructs
ContainerNetworkConfigwith test network values - Session token and credentials passed correctly
tests/shared/egg_container/test_build_cmd.py
- 26 tests covering:
- Basic structure (docker run, image placement)
- Security options and network configuration
- Optional parameters (IP, session token, UID/GID)
- Private vs public mode behavior
- Caller insertion conventions
- All tests pass
Verification
$ pytest tests/shared/egg_container/ -v
26 passed in 0.60s
$ ruff check shared/egg_container/__init__.py sandbox/egg_lib/runtime.py integration_tests/conftest.py tests/shared/egg_container/test_build_cmd.py
All checks passed!
The 5 test errors in tests/sandbox/test_git_isolation.py are pre-existing (git init failures unrelated to this PR).
— Authored by egg
|
egg finished successfully. View run logs |
Container-launch command construction was duplicated across three call sites (run_claude, exec_in_new_container, E2E conftest), causing divergence bugs like missing --add-host and wrong env var names. The test harness was also missing PRIVATE_MODE, DNS lockdown, proxy config, RUNTIME_UID/GID, CONTAINER_ID, and --security-opt. This extracts a single build_sandbox_docker_cmd() into a new shared/egg_container module that all three sites now call, with 26 unit tests covering private/public mode, optional params, and the caller insertion conventions.
8d5f38d to
8ccda2b
Compare
|
@james-in-a-box review this pr |
There was a problem hiding this comment.
Code Review
Summary
This PR extracts a shared build_sandbox_docker_cmd() function into a new shared/egg_container module to eliminate duplicated container-launch command construction across three call sites. The change addresses real divergence bugs where different launch sites had inconsistent flags.
What I Reviewed
shared/egg_container/__init__.py- New shared moduletests/shared/egg_container/test_build_cmd.py- Unit tests (26 tests)sandbox/egg_lib/runtime.py- Integration in production launcherintegration_tests/conftest.py- Integration in test harnesspyproject.tomlandshared/pyproject.toml- Package configuration
Findings
Strengths:
-
Well-designed API - The
build_sandbox_docker_cmd()function uses keyword-only arguments with sensible defaults, making call sites self-documenting. TheContainerNetworkConfigdataclass cleanly bundles network parameters. -
Clear conventions documented - The docstring explains the caller insertion patterns (
cmd[2:2]for lifecycle flags,cmd[-1:-1]for mounts,cmd.extend()for commands) with concrete examples. -
Comprehensive test coverage - 26 unit tests covering:
- Basic structure (docker run prefix, image suffix)
- Security options (
--security-opt label=disable) - Optional parameters (IP, session token, UID/GID)
- Private mode (DNS lockdown, proxy env vars,
PRIVATE_MODE=true) - Public mode (no DNS lockdown, no proxy vars,
PRIVATE_MODE=false) - Caller conventions for list manipulation
-
Mode-specific network config - The
_get_container_network_config()helper inruntime.pyreturns aContainerNetworkConfigdataclass instead of a raw tuple, making the interface self-documenting and preventing field-order bugs. -
Proxy fallback logic - When
proxy_urlisNonein private mode, the function correctly falls back tohttp://{gateway_hostname}:3129.
No blocking issues found. The implementation is clean and the tests are thorough.
Minor Observations (non-blocking)
-
Dict ordering for extra_env - The
extra_envdict iteration order is deterministic in Python 3.7+, so the env var order in the command will be consistent. This is fine. -
Test helper functions -
_private_config()and_public_config()in the tests use**overridespattern effectively for test clarity.
Verification
✓ 26/26 unit tests pass (tests/shared/egg_container/test_build_cmd.py)
✓ ruff check passes on all modified files
✓ Module imports correctly (PYTHONPATH=shared)
Recommendation: Approve and merge after CI passes.
— Authored by egg
|
egg finished successfully. View run logs |
|
@james-in-a-box address the minor issues above and fix lint issues |
|
Fixed ruff format issues in — Authored by egg |
|
egg finished successfully. View run logs |
There was a problem hiding this comment.
Thorough audit of whether this PR unifies the three sandbox container launch paths (local interactive via run_claude(), headless/GHA via exec_in_new_container(), and E2E tests via run_claude_structured()).
What's working well
The shared build_sandbox_docker_cmd() successfully centralizes the core docker command construction. Before this PR, the test path in conftest.py was missing 6 critical flags/env vars that production had:
--security-opt label=disable--name-e CONTAINER_ID-e RUNTIME_UID/GID-e PRIVATE_MODE=true/false--dns 0.0.0.0+ proxy env vars (private mode)
All fixed now. The ContainerNetworkConfig dataclass replacing the raw tuple is also a nice improvement.
Remaining gaps
1. Tests don't set ANTHROPIC_AUTH_METHOD — Production callers always set this env var (api_key or oauth), but run_claude_structured() only passes ANTHROPIC_OAUTH_TOKEN directly and never sets ANTHROPIC_AUTH_METHOD. If sandbox startup logic branches on this env var, tests are exercising a different code path than production. Consider adding it to the test's extra_env.
2. Network config construction still diverges — Production callers go through _get_container_network_config() → ContainerNetworkConfig, while the test constructs ContainerNetworkConfig manually from EggStack values. If _get_container_network_config() is ever updated (e.g., adds a new field), the test won't pick it up automatically. Worth documenting this coupling.
3. Tests only exercise private mode — run_claude_structured() hardcodes repo_mode="private". The unit tests for build_sandbox_docker_cmd cover public mode, but the integration test never launches a sandbox container in public mode. If the shared builder has a public-mode bug in practice, E2E tests won't catch it.
4. The cmd[2:2] insertion convention is fragile — All callers rely on the builder always starting with ["docker", "run", ...] and insert lifecycle flags at index 2. This is documented and tested, but it's an implicit contract. If the builder's structure changes, all callers silently break. Not blocking, but worth noting as a maintenance risk.
5. Tests skip container_ip (no session IP binding) — Production always allocates a static --ip for gateway request verification. Tests bypass this, so they don't replicate the production security model for IP-bound sessions.
Verdict
The PR substantially addresses the startup unification goal. The security-critical flags and env vars are now consistent across all three paths. Gaps 1 and 2 are the most actionable — the rest are reasonable test-vs-production trade-offs.
— Authored by egg
|
@james-in-a-box go ahead and address the issues in this comment: #159 (review) |
Addresses two gaps from the audit review: 1. Tests now set ANTHROPIC_AUTH_METHOD=oauth to match production behavior. Production callers always set this env var (api_key or oauth), but the test was only passing ANTHROPIC_OAUTH_TOKEN. This ensures sandbox startup code branches the same way in tests as in production. 2. Added docstring to run_claude_structured() documenting the network config coupling. The test constructs ContainerNetworkConfig manually from EggStack values rather than calling _get_container_network_config(). If that function changes, the test must be updated to match. Authored-by: egg
There was a problem hiding this comment.
Addressed the two most actionable gaps from the review:
Gap #1 (ANTHROPIC_AUTH_METHOD): run_claude_structured() now sets ANTHROPIC_AUTH_METHOD=oauth in extra_env. This ensures the sandbox startup code branches the same way in tests as in production.
Gap #2 (Network config coupling): Added a docstring section to run_claude_structured() documenting that it constructs ContainerNetworkConfig manually from EggStack values rather than calling _get_container_network_config(). If that function changes, the test must be updated to match.
Gaps 3-5 (public mode E2E coverage, cmd[2:2] fragility, container IP binding) are noted as reasonable test-vs-production trade-offs and not blocking.
— Authored by egg
|
egg finished successfully. View run logs |
|
@james-in-a-box go ahead and address points 4 and 5 above. Create an issue for 3. #159 (review) |
Gap #4 (cmd[2:2] fragility): Add LIFECYCLE_FLAGS_INDEX constant to make the implicit contract explicit. Callers now reference the module constant instead of hardcoding the magic number. Added a test that verifies the constant value, so changes to the builder's structure will fail tests. Gap #5 (container_ip): run_claude_structured() now allocates a static IP for test containers via _allocate_test_container_ip(). This matches production where sessions are bound to specific container IPs for request verification. Gap #3 (public mode E2E): Created issue #168 to track adding E2E test coverage for public mode container launches. Authored-by: egg
|
Addressed the review feedback from #159 (review): Gap #4 (cmd[2:2] fragility): Added Gap #5 (container_ip): Gap #3 (public mode E2E): Created issue #168 to track adding E2E test coverage for public mode container launches. — Authored by egg |
|
egg finished successfully. View run logs |
Extract shared build_sandbox_docker_cmd() for container launches
Container-launch command construction was duplicated across three call
sites (
run_claude(),exec_in_new_container(), E2Erun_claude_structured()),causing divergence bugs like missing
--add-hostand wrong env var names.The test harness was also missing
PRIVATE_MODE, DNS lockdown, proxyconfig,
RUNTIME_UID/GID,CONTAINER_ID, and--security-opt.This introduces a new
shared/egg_containermodule with a purebuild_sandbox_docker_cmd()function that all three call sites now use.The function builds the base
docker runargument list with allsecurity-critical flags (network, DNS, proxy, session token, etc.)
while callers insert their own lifecycle flags, mounts, and commands
via documented conventions (
cmd[2:2],cmd[-1:-1],cmd.extend())._get_container_network_config()inruntime.pynow returns aContainerNetworkConfigdataclass instead of a raw tuple, making theinterface self-documenting.
Issue: none
Test plan:
ruff checkpasses on all modified filesworkflow_dispatchE2E after mergeAuthored-by: egg