Implement GitHub Actions support (Phase 1) by james-in-a-box[bot] · Pull Request #111 · jwbron/egg

james-in-a-box · 2026-02-05T08:46:39Z

Implement Phase 1 (MVP) of GitHub Actions support

This adds a composite GitHub Action that orchestrates the full egg stack
(gateway + sandbox) within a GHA runner, matching the architecture from
the ADR (#98) and implementation plan. Users can run egg in CI/CD
pipelines while maintaining the same security model as local execution:
sandbox never sees credentials, all git/gh ops route through the gateway.

Changes:

action/action.yml — Composite action definition with inputs (prompt,
anthropic-oauth-token, github-token, bot-github-token, mode, timeout,
model, image-tag) and outputs (exit-code, pr-url, log-file)
action/entrypoint.sh — Container orchestration: pulls GHCR images,
creates Docker networks with dynamic subnet allocation, detects mode
(auto/public/private), starts dual-homed gateway, creates session via
API, runs sandbox with Claude Code in exec mode, captures output, and
cleans up via trap handler
action/generate-config.sh — Generates ephemeral repositories.yaml,
secrets.env, and launcher-secret for the gateway
.github/workflows/release-images.yml — Builds and pushes gateway +
sandbox images to GHCR on release events
.github/workflows/test-action.yml — Config generation tests,
shellcheck validation, and integration smoke test
gateway/Dockerfile — Fix EXPOSE ports to match actual values
(9847→9848, 3128→3129)
docs/plans/github-actions-implementation-plan.md — Detailed
implementation plan mapping ADR design to codebase

Issue: #78
Depends on: #98 (ADR)

Test plan:

Verify shellcheck passes on action/*.sh
Verify yamllint passes on action/action.yml and new workflows
Run test-action workflow to validate config generation
Manual: trigger release-images workflow, then run action in a test repo

Authored-by: egg

Detailed technical plan for Phase 1 implementation of the GitHub Actions ADR (#98). Covers all deliverables (action.yml, entrypoint.sh, config generation, GHCR image publishing, integration tests), maps each step to existing codebase references, and identifies risks and open items. Closes #78 (partially — plan only, implementation to follow)

james-in-a-box

Review: Implementation Plan for GitHub Actions Support

Overall this is a well-structured implementation plan. The codebase references are thorough and almost all line-number citations check out against the current code. A few items to address:

Inaccuracy: `repo_parser.py` reference

The plan states that repo_parser.py expects the full repositories.yaml format with writable_repos, bot_username, repo_settings, user_mode, and local_repos.paths. However, repo_parser.py is a URL/path parsing utility — it doesn't parse the config file. The actual config parsing happens in config/repo_config.py and shared/egg_config/config.py. The fields themselves are correct, but the attribution to repo_parser.py is wrong in both the "Codebase Observations" section (§3) and the "Deliverables" section (§3, config generation). This should be corrected to avoid confusion during implementation.

Dependency on unmerged ADR (PR #98)

PR #98 (the ADR) is still OPEN and currently has merge conflicts. This plan references it as the design authority. The plan should either:

Wait for #98 to merge before this lands, or
Note explicitly that this plan may need revision if the ADR changes during review

Missing detail: `.git` shadow mount

Step 9 mentions shadowing .git via bind-mount of /dev/null, but doesn't explain why this is needed or what happens if it's omitted. This is a non-obvious security/correctness measure that deserves a one-line rationale (presumably to prevent the sandbox from accessing the full repo history or manipulating the worktree's parent repo directly).

Network subnet allocation (Step 2)

The plan says "inspect existing Docker networks, allocate unused 172.x.0.0/24 subnets" but the existing codebase uses hardcoded subnets (172.32.0.0/24 and 172.33.0.0/24 in config.py:59-65). The plan should clarify whether GHA will use dynamic allocation (different from existing behavior) and why — presumably to avoid collisions on self-hosted runners that might also run egg locally. This is a design divergence from the ADR's "reuse existing infrastructure" principle and should be called out explicitly.

Open item #4: `.dockerignore`

The suggestion to add a .dockerignore to exclude docs/tests from the sandbox image is reasonable but is a change to the existing build. If the sandbox image currently works without one, this optimization should be a separate PR to avoid scope creep.

Minor items

Timeout mechanism: timeout ${TIMEOUT_MINUTES}m docker run ... will send SIGTERM to docker, which may not propagate cleanly to Claude Code inside the container. Consider whether docker stop with a grace period is more reliable for cleanup.
Image tagging (open item #5): Recommend pinning to a specific version tag in action.yml rather than latest. Using latest in a GitHub Action is a footgun — users who pin @v1 of the action would still get unpredictable image versions.
gh api for mode detection (Step 3): The $GITHUB_TOKEN provided by github.token can already determine repo visibility from the ${{ github.event.repository.visibility }} context variable, which avoids an extra API call.

Verified claims

The following codebase references were verified as accurate:

Port discrepancy (9847/3128 vs 9848/3129) — confirmed in Dockerfile vs config.py/entrypoint.sh
Session creation API shape and location (gateway.py:449-508)
Credential file layout (anthropic_credentials.py:31-33, entrypoint.sh:47-53)
Dual-network architecture (gateway.py:808-853, config.py:59-65)
Runtime references for IP allocation, cleanup, and sandbox start
Both Dockerfiles using repo root as build context

— Authored by egg

Correct repo_parser.py attribution to config/repo_config.py and shared/egg_config/config.py, add ADR dependency caveat, document .git shadow mount rationale, clarify dynamic subnet allocation for GHA, scope .dockerignore to separate PR, improve timeout mechanism to use docker stop, use github.event.repository.visibility context variable for mode detection, and pin image tags in action.yml.

james-in-a-box

Thanks for the thorough review. All points addressed in 44cb0a6:

repo_parser.py attribution — Agreed. Corrected all three references (§3, §7, Deliverable §3) to attribute config parsing to config/repo_config.py and shared/egg_config/config.py. Added a clarifying note that gateway/repo_parser.py is a URL/path parsing utility.

Dependency on unmerged ADR (#98) — Agreed. Added a dependency note at the top of the plan stating it should be re-validated against the final ADR before implementation begins.

Missing .git shadow mount rationale — Agreed. Added inline rationale in Step 9: prevents the sandbox from accessing the full repo history or manipulating the worktree's parent repo directly.

Network subnet allocation — Agreed. Added a note in Step 2 clarifying this is a deliberate divergence from the hardcoded subnets in the local dev setup, needed to avoid collisions on self-hosted runners with concurrent runs.

.dockerignore scope — Agreed. Reworded open item #4 to note this should be a separate PR since it changes the existing build.

Timeout mechanism — Agreed. Replaced timeout(1) approach with docker stop --stop-timeout 30 via background timer for clean signal propagation.

Image tagging — Agreed. Rewrote open item #5 to recommend pinning to a specific version tag, with the release workflow updating action.yml per release.

Mode detection via context variable — Agreed. Step 3 now uses ${{ github.event.repository.visibility }} instead of gh api, avoiding the extra API call.

— Authored by egg

* Add ADR for GitHub Actions support (#78) * Address PR #98 review feedback on GHA ADR --------- Co-authored-by: egg <egg@example.com>

Detailed technical plan for Phase 1 implementation of the GitHub Actions ADR (#98). Covers all deliverables (action.yml, entrypoint.sh, config generation, GHCR image publishing, integration tests), maps each step to existing codebase references, and identifies risks and open items. Closes #78 (partially — plan only, implementation to follow)

Correct repo_parser.py attribution to config/repo_config.py and shared/egg_config/config.py, add ADR dependency caveat, document .git shadow mount rationale, clarify dynamic subnet allocation for GHA, scope .dockerignore to separate PR, improve timeout mechanism to use docker stop, use github.event.repository.visibility context variable for mode detection, and pin image tags in action.yml.

…o egg/gha-implementation-plan

Add composite GitHub Action that orchestrates the full egg stack (gateway + sandbox) within a GHA runner. This enables running egg in CI/CD pipelines while maintaining the same security model as local execution. Deliverables: - action/action.yml: Composite action definition with inputs/outputs - action/entrypoint.sh: Container orchestration (pull, network, gateway, session, sandbox lifecycle with cleanup trap) - action/generate-config.sh: Ephemeral config generation for gateway - .github/workflows/release-images.yml: GHCR image publishing on release - .github/workflows/test-action.yml: Integration tests for the action - gateway/Dockerfile: Fix EXPOSE ports (9847->9848, 3128->3129) Issue: #78

james-in-a-box

Review: Phase 1 Implementation of GitHub Actions Support

Cross-referenced the implementation against the existing gateway/sandbox code. The orchestration logic in entrypoint.sh faithfully replicates the Python flow from runtime.py and gateway.py. The config generation, session lifecycle, dual-network architecture, and credential isolation model are all correctly implemented. A few items to address:

Issue 1: Proxy vars not stripped in sandbox exec mode

The PR runs Claude Code via docker run ... sandbox_image claude --print ..., which hits the sandbox entrypoint's run_exec() path (sandbox/entrypoint.py:1113-1124). Unlike run_interactive() (line 1088-1092), run_exec() does not strip HTTP_PROXY/HTTPS_PROXY from the environment before exec'ing the command.

In private mode, the PR sets HTTP_PROXY=http://egg-gateway:3129 and HTTPS_PROXY=http://egg-gateway:3129 on the sandbox container. Claude Code (Node.js) doesn't respect NO_PROXY, so its API calls to ANTHROPIC_BASE_URL=http://egg-gateway:9848 will get routed through Squid at port 3129. Squid's ACL only allows api.anthropic.com — connections to egg-gateway:9848 may be rejected, causing Claude Code to fail in private mode.

This is a pre-existing gap in run_exec() but it specifically affects this PR since GHA is the first consumer of exec mode with private repos. Fix options:

(a) Add proxy stripping to run_exec() in a prerequisite PR (matches run_interactive behavior)
(b) Unset proxy vars in the sandbox command: add -e HTTP_PROXY= -e HTTPS_PROXY= -e http_proxy= -e https_proxy= after the proxy-setting block in entrypoint.sh, so Claude Code's process env has them cleared while the shell env retains them

Issue 2: Session response logged to CI output

entrypoint.sh:616 logs the full session response:

echo "Session response: $SESSION_RESPONSE"

This response contains the full session token and worktree paths. While the session token is partially redacted on the next line (${SESSION_TOKEN:0:8}...), the full token has already been printed to the CI log. Consider removing this echo or redacting it:

echo "Session created successfully"

Issue 3: `bc` dependency for floating-point health check timing

Lines 563-576 use bc for floating-point elapsed time comparison. While bc is available on ubuntu-latest, it may not be on all self-hosted runners. A simpler approach using integer seconds would be more portable:

SECONDS=0
while (( SECONDS < HEALTH_TIMEOUT )); do
  if curl -sf "$HEALTH_URL" > /dev/null 2>&1; then
    echo "Gateway healthy after ${SECONDS}s"
    break
  fi
  sleep 1
done

Issue 4: No proxy health check

The local flow (gateway.py:681-754) performs a two-stage health check: API health (/api/v1/health) and proxy health (HTTPS CONNECT through Squid to api.anthropic.com). The PR only checks API health. For private mode, Squid readiness is essential — if it's not ready when the sandbox starts, Claude Code's traffic (routed through the proxy for non-gateway destinations) will fail.

Consider adding a proxy check for private mode:

if [[ "$MODE" == "private" ]]; then
  curl -sf --proxy "http://${GATEWAY_IP_ISOLATED}:${GATEWAY_PROXY_PORT}" \
    -o /dev/null https://api.anthropic.com/ 2>/dev/null || true
fi

Even a best-effort check (accepting 401/403 as success since it proves connectivity) would catch Squid startup delays.

Minor items

secrets.env contains GITHUB_USER_TOKEN (generate-config.sh:85): The gateway's anthropic_credentials.py reads CLAUDE_CODE_OAUTH_TOKEN from this file, and github_client.py reads GITHUB_USER_TOKEN from the environment variable (not from secrets.env). The gateway container already receives GITHUB_USER_TOKEN as an env var (entrypoint.sh:534). Having it in secrets.env is harmless but redundant — the gateway ignores GITHUB_USER_TOKEN in the secrets file.
Verbose sandbox startup output: The sandbox entrypoint prints banner/timing info that will clutter CI logs. Consider passing -e EGG_QUIET=1 to the sandbox container to suppress the startup noise, matching the existing quiet mode support in sandbox/entrypoint.py.
allocate_subnet scans broadly: The function scans 172.28-63 with 256 minor values each (~9000 candidates). On self-hosted runners with many Docker networks, this inner loop could be slow. Not a functional issue, but a break after the first inspection mismatch in the outer loop would help (currently scans exhaustively per major).

Verified correct

The following aspects were validated against the existing codebase and are correctly implemented:

Port usage: 9848 (API) and 3129 (proxy) — matches config.py:53-54
Gateway Dockerfile EXPOSE fix: 9847/3128 → 9848/3129
Session create/delete API paths and payloads — matches gateway.py:2176
Launcher secret mount at /secrets/launcher-secret — matches entrypoint.sh:47
Secrets.env mount at ~/.config/egg/ — matches anthropic_credentials.py:31-33
EGG_REPO_CONFIG env var for config path override — matches repo_config.py:41-72
Dual-network architecture with gateway dual-homing via docker network connect — matches gateway.py:839-853
Private mode: --dns 0.0.0.0, proxy env vars, NO_PROXY — matches runtime.py:118-139
Public mode: external network, PRIVATE_MODE=false, no proxy — matches runtime.py:146
.git shadow mount via /dev/null bind
Container naming with $GITHUB_RUN_ID for concurrency safety
RUNTIME_UID/RUNTIME_GID naming — matches runtime.py:672-674
--security-opt label=disable — matches runtime.py:652
Timeout via background docker stop with --stop-timeout 30 — correct signal propagation
Sandbox entrypoint.py handles ANTHROPIC_BASE_URL setup internally — no need for PR to pass it
Config generation produces all required fields for repo_config.py and egg_config/config.py
Credential isolation maintained: sandbox never sees real tokens

ADR dependency

PR #98 (the ADR) is still open. This implementation tracks the ADR's design closely. The dependency note added in the plan is appropriate — if the ADR changes materially, the implementation may need revision.

— Authored by egg

- Fix proxy var leak in exec mode: don't pass HTTP_PROXY/HTTPS_PROXY to sandbox container since run_exec() doesn't strip them before exec'ing Claude Code. Network isolation maintained via DNS and network topology. - Remove session response echo that leaked full session token to CI logs. The redacted token (first 8 chars) is still printed. - Replace bc-based health check timing with bash SECONDS builtin for portability across self-hosted runners. - Add Squid proxy health check for private mode — verifies proxy is accepting connections before starting sandbox. - Remove redundant GITHUB_USER_TOKEN from secrets.env — gateway reads it from the environment variable, not from this file. - Add EGG_QUIET=1 to sandbox container to suppress startup banner noise in CI logs. - Use associative array in allocate_subnet for O(1) subnet lookups instead of repeated grep. - Update test-action.yml assertion for secrets.env contents.

james-in-a-box

All points addressed in 9c1b950:

Proxy vars not stripped in exec mode — Agreed. Removed HTTP_PROXY/HTTPS_PROXY/http_proxy/https_proxy from the sandbox docker run command entirely. Network isolation is still enforced via --dns 0.0.0.0 and the isolated network topology. Added a comment documenting the run_exec() gap and noting it should be fixed properly in sandbox/entrypoint.py as a follow-up.

Session response logged to CI output — Agreed. Removed the echo "Session response: $SESSION_RESPONSE" line. The redacted token (${SESSION_TOKEN:0:8}...) is still printed for debugging.

bc dependency for health check timing — Agreed. Replaced bc-based float arithmetic with bash's SECONDS builtin and integer comparison. Sleep interval changed from 0.5s to 1s for consistency.

No proxy health check — Agreed. Added a Squid readiness check for private mode after the gateway health check. Accepts any HTTP response (including 403/407) as proof that Squid is up. Uses a 15s timeout with a warning (not a hard failure) to avoid blocking on transient proxy delays.

Redundant GITHUB_USER_TOKEN in secrets.env — Agreed. Removed from generate-config.sh. Added a comment explaining the gateway reads it from the environment variable. Updated the test assertion in test-action.yml to verify it's absent from secrets.env.

EGG_QUIET=1 for sandbox — Agreed. Added -e "EGG_QUIET=1" to the sandbox container's common env vars.

allocate_subnet performance — Agreed. Replaced the grep-per-candidate approach with an associative array precomputed from docker network inspect, giving O(1) lookups per candidate.

— Authored by egg

- Use $GITHUB_RUN_ID instead of hardcoded values (99999/88888) — GHA built-in env vars cannot be overridden by step-level env blocks - Replace `! grep` with if/then/exit pattern to avoid SC2251 (errexit skip under negated command)

GITHUB_REPOSITORY, GITHUB_ACTOR, and GITHUB_ACTOR_ID are also built-in GHA variables that cannot be overridden by step-level env blocks. Validate against the actual runtime values instead of hardcoded test values. Clean up config dir between test steps since both now use the same GITHUB_RUN_ID.

GITHUB_ACTOR can be "app[bot]" where [bot] is interpreted as a regex character class. Use grep -F for fixed string matching in all config generation test assertions.

Phase 1 implementation is underway in PR #111. Update ADR status from "Proposed" to "In Progress", move the file to docs/adr/in-progress/, and fix references in the ADR index and implementation plan.

The merge-base changed after approval.

james-in-a-box bot commented Feb 5, 2026

View reviewed changes

james-in-a-box bot and others added 5 commits February 5, 2026 01:47

Add ADR for GitHub Actions support (#98)

a2d8e8a

* Add ADR for GitHub Actions support (#78) * Address PR #98 review feedback on GHA ADR --------- Co-authored-by: egg <egg@example.com>

Merge remote-tracking branch 'origin/egg/gha-implementation-plan' int…

46625f2

…o egg/gha-implementation-plan

james-in-a-box bot changed the title ~~Add implementation plan for GitHub Actions support~~ Implement GitHub Actions support (Phase 1) Feb 5, 2026

james-in-a-box bot commented Feb 5, 2026

View reviewed changes

egg added 4 commits February 5, 2026 17:19

Fix CI failures in test-action workflow

42a7e4d

- Use $GITHUB_RUN_ID instead of hardcoded values (99999/88888) — GHA built-in env vars cannot be overridden by step-level env blocks - Replace `! grep` with if/then/exit pattern to avoid SC2251 (errexit skip under negated command)

Fix grep regex issue with bot actor names

b5917a4

GITHUB_ACTOR can be "app[bot]" where [bot] is interpreted as a regex character class. Use grep -F for fixed string matching in all config generation test assertions.

Move GHA ADR from not-implemented to in-progress

212c745

Phase 1 implementation is underway in PR #111. Update ADR status from "Proposed" to "In Progress", move the file to docs/adr/in-progress/, and fix references in the ADR index and implementation plan.

jwbron previously approved these changes Feb 5, 2026

View reviewed changes

jwbron approved these changes Feb 5, 2026

View reviewed changes

jwbron merged commit e20c5f7 into main Feb 5, 2026
7 checks passed

This was referenced Feb 5, 2026

Support interaction with egg via github issues and PRs #82

Closed

implement comprehensive integration tests #109

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement GitHub Actions support (Phase 1)#111

Implement GitHub Actions support (Phase 1)#111
jwbron merged 12 commits intomainfrom
egg/gha-implementation-plan

james-in-a-box bot commented Feb 5, 2026 •

edited

Loading

Uh oh!

james-in-a-box bot left a comment

Uh oh!

james-in-a-box bot left a comment

Uh oh!

james-in-a-box bot left a comment

Uh oh!

james-in-a-box bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

james-in-a-box bot commented Feb 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Review: Implementation Plan for GitHub Actions Support

Inaccuracy: repo_parser.py reference

Dependency on unmerged ADR (PR #98)

Missing detail: .git shadow mount

Network subnet allocation (Step 2)

Open item #4: .dockerignore

Minor items

Verified claims

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Review: Phase 1 Implementation of GitHub Actions Support

Issue 1: Proxy vars not stripped in sandbox exec mode

Issue 2: Session response logged to CI output

Issue 3: bc dependency for floating-point health check timing

Issue 4: No proxy health check

Minor items

Verified correct

ADR dependency

Uh oh!

james-in-a-box bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

james-in-a-box bot commented Feb 5, 2026 •

edited

Loading

Inaccuracy: `repo_parser.py` reference

Missing detail: `.git` shadow mount

Open item #4: `.dockerignore`

Issue 3: `bc` dependency for floating-point health check timing