Skip to content

Extract shared container-launch command builder#159

Merged
jwbron merged 4 commits intomainfrom
egg/extract-container-cmd-builder
Feb 6, 2026
Merged

Extract shared container-launch command builder#159
jwbron merged 4 commits intomainfrom
egg/extract-container-cmd-builder

Conversation

@james-in-a-box
Copy link
Contributor

@james-in-a-box james-in-a-box bot commented Feb 6, 2026

Extract shared build_sandbox_docker_cmd() for container launches

Container-launch command construction was duplicated across three call
sites (run_claude(), exec_in_new_container(), E2E run_claude_structured()),
causing divergence bugs like missing --add-host and wrong env var names.
The test harness was also missing PRIVATE_MODE, DNS lockdown, proxy
config, RUNTIME_UID/GID, CONTAINER_ID, and --security-opt.

This introduces a new shared/egg_container module with a pure
build_sandbox_docker_cmd() function that all three call sites now use.
The function builds the base docker run argument list with all
security-critical flags (network, DNS, proxy, session token, etc.)
while callers insert their own lifecycle flags, mounts, and commands
via documented conventions (cmd[2:2], cmd[-1:-1], cmd.extend()).

_get_container_network_config() in runtime.py now returns a
ContainerNetworkConfig dataclass instead of a raw tuple, making the
interface self-documenting.

Issue: none

Test plan:

  • 26 new unit tests cover private/public mode, optional params, and caller insertion conventions
  • Full test suite passes (1049/1049)
  • ruff check passes on all modified files
  • Verify via CI workflow_dispatch E2E after merge

Authored-by: egg

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box fix merge conflicts on this branch

@james-in-a-box james-in-a-box bot force-pushed the egg/extract-container-cmd-builder branch from 1699b8c to 8d5f38d Compare February 6, 2026 06:02
@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

Merge conflicts resolved. Rebased onto main (dcfe0a8 from PR #157 which removed redundant docs directories).

Resolution summary:

File Resolution
.github/workflows/on-pull-request.yml Kept main's version (unrelated to this PR)
README.md, action/*.{md,sh}, config/*, docs/*.md, gateway/*, sandbox/README.md Kept main's version (PR #157 changes)
sandbox/egg_lib/setup_flow.py Kept main's GitHub docs URL (local docs file was removed in #157)
pyproject.toml Added shared/egg_container to packages list
shared/pyproject.toml Added egg_container* to packages.find include
integration_tests/conftest.py Merged both: PR's build_sandbox_docker_cmd() usage + main's E2E credential passthrough and timeout handling
sandbox/egg_lib/runtime.py Merged: PR's refactoring to use ContainerNetworkConfig dataclass and build_sandbox_docker_cmd()

The PR's new shared/egg_container/ module and tests were added cleanly.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box review this pr

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: LGTM

I've reviewed the implementation and it looks good. The refactoring successfully consolidates the duplicated container-launch command construction into a single, well-tested function.

What I reviewed

shared/egg_container/__init__.py

  • Clean, pure function with keyword-only arguments for safety
  • ContainerNetworkConfig dataclass provides a self-documenting interface
  • Handles both private and public modes correctly:
    • Private: DNS lockdown (--dns 0.0.0.0), proxy env vars, PRIVATE_MODE=true
    • Public: No DNS restriction, no proxy vars, PRIVATE_MODE=false
  • Extension points (extra_env, extra_args) allow callers to add context-specific configuration
  • Good docstring documents the caller insertion conventions (cmd[2:2], cmd[-1:-1], cmd.extend())

sandbox/egg_lib/runtime.py

  • _get_container_network_config() now returns ContainerNetworkConfig instead of a raw tuple
  • Both run_claude() and exec_in_new_container() use the shared builder
  • Caller-specific env vars (timing, auth, task correlation) are passed via extra_env
  • Mount args and lifecycle flags inserted using documented conventions

integration_tests/conftest.py

  • run_claude_structured() migrated to use build_sandbox_docker_cmd()
  • Properly constructs ContainerNetworkConfig with test network values
  • Session token and credentials passed correctly

tests/shared/egg_container/test_build_cmd.py

  • 26 tests covering:
    • Basic structure (docker run, image placement)
    • Security options and network configuration
    • Optional parameters (IP, session token, UID/GID)
    • Private vs public mode behavior
    • Caller insertion conventions
  • All tests pass

Verification

$ pytest tests/shared/egg_container/ -v
26 passed in 0.60s

$ ruff check shared/egg_container/__init__.py sandbox/egg_lib/runtime.py integration_tests/conftest.py tests/shared/egg_container/test_build_cmd.py
All checks passed!

The 5 test errors in tests/sandbox/test_git_isolation.py are pre-existing (git init failures unrelated to this PR).

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

Container-launch command construction was duplicated across three call
sites (run_claude, exec_in_new_container, E2E conftest), causing
divergence bugs like missing --add-host and wrong env var names.
The test harness was also missing PRIVATE_MODE, DNS lockdown, proxy
config, RUNTIME_UID/GID, CONTAINER_ID, and --security-opt.

This extracts a single build_sandbox_docker_cmd() into a new
shared/egg_container module that all three sites now call, with 26
unit tests covering private/public mode, optional params, and the
caller insertion conventions.
@james-in-a-box james-in-a-box bot force-pushed the egg/extract-container-cmd-builder branch from 8d5f38d to 8ccda2b Compare February 6, 2026 18:43
Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Review completed but no structured output found.


This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box review this pr

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Summary

This PR extracts a shared build_sandbox_docker_cmd() function into a new shared/egg_container module to eliminate duplicated container-launch command construction across three call sites. The change addresses real divergence bugs where different launch sites had inconsistent flags.

What I Reviewed

  • shared/egg_container/__init__.py - New shared module
  • tests/shared/egg_container/test_build_cmd.py - Unit tests (26 tests)
  • sandbox/egg_lib/runtime.py - Integration in production launcher
  • integration_tests/conftest.py - Integration in test harness
  • pyproject.toml and shared/pyproject.toml - Package configuration

Findings

Strengths:

  1. Well-designed API - The build_sandbox_docker_cmd() function uses keyword-only arguments with sensible defaults, making call sites self-documenting. The ContainerNetworkConfig dataclass cleanly bundles network parameters.

  2. Clear conventions documented - The docstring explains the caller insertion patterns (cmd[2:2] for lifecycle flags, cmd[-1:-1] for mounts, cmd.extend() for commands) with concrete examples.

  3. Comprehensive test coverage - 26 unit tests covering:

    • Basic structure (docker run prefix, image suffix)
    • Security options (--security-opt label=disable)
    • Optional parameters (IP, session token, UID/GID)
    • Private mode (DNS lockdown, proxy env vars, PRIVATE_MODE=true)
    • Public mode (no DNS lockdown, no proxy vars, PRIVATE_MODE=false)
    • Caller conventions for list manipulation
  4. Mode-specific network config - The _get_container_network_config() helper in runtime.py returns a ContainerNetworkConfig dataclass instead of a raw tuple, making the interface self-documenting and preventing field-order bugs.

  5. Proxy fallback logic - When proxy_url is None in private mode, the function correctly falls back to http://{gateway_hostname}:3129.

No blocking issues found. The implementation is clean and the tests are thorough.

Minor Observations (non-blocking)

  1. Dict ordering for extra_env - The extra_env dict iteration order is deterministic in Python 3.7+, so the env var order in the command will be consistent. This is fine.

  2. Test helper functions - _private_config() and _public_config() in the tests use **overrides pattern effectively for test clarity.

Verification

✓ 26/26 unit tests pass (tests/shared/egg_container/test_build_cmd.py)
✓ ruff check passes on all modified files
✓ Module imports correctly (PYTHONPATH=shared)

Recommendation: Approve and merge after CI passes.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box address the minor issues above and fix lint issues

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

Fixed ruff format issues in integration_tests/conftest.py and shared/egg_container/__init__.py. Lint checks should now pass.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Review completed but no structured output found.


This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough audit of whether this PR unifies the three sandbox container launch paths (local interactive via run_claude(), headless/GHA via exec_in_new_container(), and E2E tests via run_claude_structured()).

What's working well

The shared build_sandbox_docker_cmd() successfully centralizes the core docker command construction. Before this PR, the test path in conftest.py was missing 6 critical flags/env vars that production had:

  • --security-opt label=disable
  • --name
  • -e CONTAINER_ID
  • -e RUNTIME_UID/GID
  • -e PRIVATE_MODE=true/false
  • --dns 0.0.0.0 + proxy env vars (private mode)

All fixed now. The ContainerNetworkConfig dataclass replacing the raw tuple is also a nice improvement.

Remaining gaps

1. Tests don't set ANTHROPIC_AUTH_METHOD — Production callers always set this env var (api_key or oauth), but run_claude_structured() only passes ANTHROPIC_OAUTH_TOKEN directly and never sets ANTHROPIC_AUTH_METHOD. If sandbox startup logic branches on this env var, tests are exercising a different code path than production. Consider adding it to the test's extra_env.

2. Network config construction still diverges — Production callers go through _get_container_network_config()ContainerNetworkConfig, while the test constructs ContainerNetworkConfig manually from EggStack values. If _get_container_network_config() is ever updated (e.g., adds a new field), the test won't pick it up automatically. Worth documenting this coupling.

3. Tests only exercise private moderun_claude_structured() hardcodes repo_mode="private". The unit tests for build_sandbox_docker_cmd cover public mode, but the integration test never launches a sandbox container in public mode. If the shared builder has a public-mode bug in practice, E2E tests won't catch it.

4. The cmd[2:2] insertion convention is fragile — All callers rely on the builder always starting with ["docker", "run", ...] and insert lifecycle flags at index 2. This is documented and tested, but it's an implicit contract. If the builder's structure changes, all callers silently break. Not blocking, but worth noting as a maintenance risk.

5. Tests skip container_ip (no session IP binding) — Production always allocates a static --ip for gateway request verification. Tests bypass this, so they don't replicate the production security model for IP-bound sessions.

Verdict

The PR substantially addresses the startup unification goal. The security-critical flags and env vars are now consistent across all three paths. Gaps 1 and 2 are the most actionable — the rest are reasonable test-vs-production trade-offs.

— Authored by egg

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box go ahead and address the issues in this comment: #159 (review)

Addresses two gaps from the audit review:

1. Tests now set ANTHROPIC_AUTH_METHOD=oauth to match production behavior.
   Production callers always set this env var (api_key or oauth), but the
   test was only passing ANTHROPIC_OAUTH_TOKEN. This ensures sandbox
   startup code branches the same way in tests as in production.

2. Added docstring to run_claude_structured() documenting the network
   config coupling. The test constructs ContainerNetworkConfig manually
   from EggStack values rather than calling _get_container_network_config().
   If that function changes, the test must be updated to match.

Authored-by: egg
Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the two most actionable gaps from the review:

Gap #1 (ANTHROPIC_AUTH_METHOD): run_claude_structured() now sets ANTHROPIC_AUTH_METHOD=oauth in extra_env. This ensures the sandbox startup code branches the same way in tests as in production.

Gap #2 (Network config coupling): Added a docstring section to run_claude_structured() documenting that it constructs ContainerNetworkConfig manually from EggStack values rather than calling _get_container_network_config(). If that function changes, the test must be updated to match.

Gaps 3-5 (public mode E2E coverage, cmd[2:2] fragility, container IP binding) are noted as reasonable test-vs-production trade-offs and not blocking.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Review completed but no structured output found.


This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box go ahead and address points 4 and 5 above. Create an issue for 3. #159 (review)

Gap #4 (cmd[2:2] fragility): Add LIFECYCLE_FLAGS_INDEX constant to make
the implicit contract explicit. Callers now reference the module constant
instead of hardcoding the magic number. Added a test that verifies the
constant value, so changes to the builder's structure will fail tests.

Gap #5 (container_ip): run_claude_structured() now allocates a static IP
for test containers via _allocate_test_container_ip(). This matches
production where sessions are bound to specific container IPs for request
verification.

Gap #3 (public mode E2E): Created issue #168 to track adding E2E test
coverage for public mode container launches.

Authored-by: egg
@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

Addressed the review feedback from #159 (review):

Gap #4 (cmd[2:2] fragility): Added LIFECYCLE_FLAGS_INDEX constant to shared/egg_container/__init__.py. All callers now use this constant instead of hardcoding 2. Added a test that verifies the constant value, so any changes to the builder's structure will cause test failures rather than silent caller breakage.

Gap #5 (container_ip): run_claude_structured() now allocates a static IP via _allocate_test_container_ip() and passes it to build_sandbox_docker_cmd(). Test containers now have static IPs for session binding, matching production behavior.

Gap #3 (public mode E2E): Created issue #168 to track adding E2E test coverage for public mode container launches.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Review completed but no structured output found.


This is an automated review. Please evaluate suggestions carefully.

— Authored by egg

@jwbron jwbron merged commit 5c75222 into main Feb 6, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant