Add plan for AI-powered code review bots #140
james-in-a-box[bot] wants to merge 3 commits into `main` from
Conversation
@james-in-a-box review this plan
Plan Review
Reviewed the plan against the existing egg Action infrastructure (`action/action.yml`, `build-mention-prompt.sh`, `on-mention.yml`, gateway allowlists). Overall this is well-structured and the phased approach is sound. A few observations:
What works well
- Reusing the egg Action is the right call. The orchestration, credential isolation, and network modes are already production-tested. Adding a new trigger workflow is far less risky than building webhook infrastructure.
- Single-agent over multi-agent is the correct starting point. The justification (latency, cost, debugging complexity) is accurate. Multi-agent can be layered on later if single-pass proves insufficient.
- `cancel-in-progress: true` for the review workflow — correct. Unlike `on-mention.yml` (where canceling an in-progress task would lose work), stale reviews on superseded commits have no value.
- Advisory-only `COMMENT` events — right default. Giving an AI reviewer veto power (via `REQUEST_CHANGES`) should require significant confidence in false positive rates first.
- Trusted checkout pattern — good that this carries over from `on-mention.yml`. Building the prompt from `main` prevents a malicious PR from tampering with the prompt assembly script.
Issues to address
- **Prompt output mechanism.** The plan shows `build-review-prompt.sh` setting `steps.prompt.outputs.prompt`, which is passed to the egg Action's `prompt` input. `build-mention-prompt.sh` writes to `$GITHUB_OUTPUT` using a heredoc delimiter. With the proposed truncation limits (100K chars overall prompt), this approaches GitHub Actions' output size limits — `$GITHUB_OUTPUT` has practical limits around 1 MB, but multiline outputs can be fragile at that size. Consider writing the prompt to a temp file and passing the path instead, or compressing it. This is a real implementation concern that should be addressed in the plan.
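The temp-file alternative can be sketched roughly as follows; the `review-prompt.txt` filename, the `prompt-file` output name, and the `pr.diff` input are illustrative assumptions, not names from the plan:

```shell
# Hypothetical sketch: write the assembled prompt to a temp file and expose
# only its short, fixed-size path via GITHUB_OUTPUT, sidestepping the
# multiline-output size limits entirely.
PROMPT_FILE="${RUNNER_TEMP:-/tmp}/review-prompt.txt"
{
  echo "Review the following pull request diff:"
  cat pr.diff 2>/dev/null || echo "(diff unavailable)"   # assumed earlier step
} > "$PROMPT_FILE"
echo "prompt-file=$PROMPT_FILE" >> "${GITHUB_OUTPUT:-/dev/null}"
```

The egg Action would then read the prompt from the path rather than from an inline input.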
- **`gh api` comment posting syntax.** The example in §1.3:

  ```
  gh api repos/{owner}/{repo}/pulls/{pr}/reviews \
    -f 'comments[][path]'="$FILE" \
    -f 'comments[][position]'="$LINE" \
    -f 'comments[][body]'="$COMMENT"
  ```

  This array syntax doesn't work with `gh api`. The GitHub API expects a JSON body with a `comments` array. This will need to use `--input` with a constructed JSON payload (via `jq`) or `--raw-field` with properly formatted JSON. The plan should show the correct approach since `post-review-comments.sh` is a core deliverable.
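A `jq`-constructed payload along these lines would satisfy the API's shape (the file path, line number, and comment text are sample values; the `gh` call is shown but not executed since it needs auth and a real PR):

```shell
# Hypothetical sketch: build the review payload as real JSON with jq, then
# pipe it to gh api via --input.
FILE="src/app.py"; LINE=12; COMMENT="Possible null dereference."
jq -n \
  --arg path "$FILE" \
  --argjson line "$LINE" \
  --arg body "$COMMENT" \
  '{event: "COMMENT",
    body: "Automated review",
    comments: [{path: $path, line: $line, side: "RIGHT", body: $body}]}' \
  > payload.json
# Posting would then be:
# gh api "repos/$OWNER/$REPO/pulls/$PR_NUMBER/reviews" --method POST --input payload.json
```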
- **Diff position vs. line number.** The prompt asks Claude to output `"line": <line_number_in_diff>`, and the comment poster uses `position` in the API call. The GitHub Reviews API `position` field is the line index within the diff hunk, not the file line number. This mapping is non-trivial — Claude would need to output the diff-relative position, or `post-review-comments.sh` would need to compute it from the diff. Alternatively, use the newer `subject_type: "line"` with `line` and `side` parameters (available since 2022), which accept actual file line numbers. This distinction should be explicit in the plan, because getting it wrong means every inline comment lands on the wrong line.
- **Missing `log-file` output handling.** The workflow uses `steps.egg.outputs.log-file` (referenced in §1.3), but `post-review-comments.sh` needs to know where to read Claude's output. The plan should specify how the egg Action exposes this — the existing `action.yml` outputs `log-file`, so this should work, but the plan should confirm the format (raw text with embedded JSON blocks? pure JSON?). The prompt asks for structured JSON, but Claude may wrap it in markdown code fences or include preamble text. The parser needs to handle this robustly.
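A tolerant parser could cover both cases — pure JSON, or JSON wrapped in a fenced block with preamble. This is a sketch (the function name and filenames are hypothetical, not from the plan):

```shell
# Hypothetical extractor: accept a log that is already valid JSON, or one
# where the JSON sits inside the first ```json fenced block.
extract_review_json() {
  local log="$1"
  if jq -e . "$log" >/dev/null 2>&1; then
    cat "$log"            # whole file is valid JSON; pass it through
    return 0
  fi
  # Otherwise emit the body of the first ```json fenced block (may be empty).
  awk '/^```json[[:space:]]*$/ {inblk=1; next}
       /^```[[:space:]]*$/     {if (inblk) exit}
       inblk' "$log"
}
```

A final fallback — posting the raw output as a plain comment when neither path yields valid JSON — would wrap this function.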
- **`ready_for_review` trigger gap.** The workflow triggers on `ready_for_review` (draft → ready), which is good, but it doesn't trigger on `reopened`. If someone closes and reopens a PR, no review runs. Minor, but worth noting whether this is intentional.
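A hypothetical trigger list for the new workflow, with `reopened` included, might look like:

```yaml
# Illustrative fragment for on-pull-request.yml (event list is an assumption).
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
```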
Suggestions
- **Phase 3.2 (Incremental Review)** says "diff between previous head and new head" on `synchronize` events. This information is available in the `pull_request` `synchronize` event payload as `before` and `after` SHAs. Worth noting that in the plan, since it makes the implementation straightforward — no need to store state between runs.
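The incremental diff then reduces to a single `git diff` over those two SHAs. A sketch (in a workflow, the arguments would come from `github.event.before` and `github.event.after`; the function name is illustrative):

```shell
# Hypothetical sketch: diff only what changed since the last push.
incremental_diff() {
  local before="$1" after="$2"
  git diff "$before".."$after"
}
```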
- **Open Question 2 (comment threading):** I'd recommend resolving old review comments and posting fresh. The GitHub API supports `DISMISS`ing reviews, or you can resolve individual comment threads. This avoids clutter while still preserving history in the timeline. The plan should take a position here rather than leaving it open, since it affects the Phase 1 implementation of `post-review-comments.sh`.
- **Open Question 4 (model selection):** The existing `action.yml` already has a `model` input. The workflow can parameterize this. A reasonable default: use Haiku for PRs under N changed files (e.g., 5) and Opus for larger ones. This is easy to add in Phase 1 since the infrastructure already supports it.
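The selection logic is a few lines; this sketch uses illustrative model names and a default threshold of 5 (the changed-file count could come from, e.g., `git diff --name-only base..head | wc -l`):

```shell
# Hypothetical sketch: choose a model by changed-file count.
# Model names and threshold are illustrative, not values from the plan.
pick_model() {
  local changed_files="$1" threshold="${2:-5}"
  if [ "$changed_files" -le "$threshold" ]; then
    echo "haiku"   # small/fast model for small PRs
  else
    echo "opus"    # larger model for bigger diffs
  fi
}
```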
Verdict
The plan is solid and feasible. The Phase 1 scope (3 new files + 1 workflow) is appropriately sized. The issues above (#1-4) are implementation details that need resolution before coding starts, but none of them changes the architecture. I'd recommend addressing the diff position mapping (issue #3) and the `gh api` JSON payload format (issue #2) in the plan itself, since those are design decisions, not just implementation details.
— Authored by egg
egg finished successfully.

@james-in-a-box address the above feedback
Addressed all review feedback. Summary of changes:
Issues resolved:
- **Prompt output mechanism (issue #1).** Replaced the `$GITHUB_OUTPUT` multiline approach with a temp file (`$RUNNER_TEMP/review-prompt.txt`). The workflow now passes a `prompt-file` path to the egg Action instead of inline prompt text. This avoids size limits entirely.
- **`gh api` comment posting syntax (issue #2).** Replaced the invalid `-f 'comments[][path]'` array syntax with a `jq`-constructed JSON payload piped via `--input -`. The plan now shows the correct approach.
- **Diff position vs. line number (issue #3).** Switched from the legacy `position` field to the newer `subject_type: "line"` API with `line` and `side` parameters. The prompt now explicitly instructs Claude to output file line numbers (not diff-relative positions), and the comment poster sets `side: "RIGHT"` for all comments. This avoids diff-position mapping entirely.
- **Log file format and parsing (issue #4).** Added a section specifying that Claude's output may include markdown code fences, preamble text, or multiple JSON blocks. The parser tries structured extraction first, falls back to scanning for individual JSON blocks, then falls back to posting raw output as a plain comment.
- **`reopened` trigger (issue #5).** Added `reopened` to the workflow's `pull_request.types` list.
Suggestions addressed:
- **Phase 3.2 incremental review:** Noted that the `pull_request` `synchronize` payload includes `before` and `after` SHAs (`github.event.before` / `github.event.after`), eliminating the need for external state storage.
- **Open Question 2 (comment threading):** Resolved — dismiss the bot's previous review via the GitHub API and post a fresh one. This preserves history while keeping the PR timeline clean.
- **Open Question 4 (model selection):** Resolved — default to Haiku for PRs with ≤5 changed files, Opus for larger PRs. The threshold is configurable via `.egg/review-rules.md`. The workflow passes the model as an output from the prompt builder step.
— Authored by egg
egg finished successfully.

@james-in-a-box implement phase 1 of this plan and open a PR.
Phase 1 implementation is ready in PR #146. The implementation includes:
Note: The workflow file could not be pushed due to GitHub App permission restrictions (
— Authored by egg

egg finished successfully.
Add section 2.4 documenting a deep-review mode that gives the bot direct PR access for multi-turn analysis. Unlike the structured output approach in Phase 1, deep review allows:
- Running tests to validate suspected issues
- Exploring the codebase beyond the changed files
- Multi-turn investigation chains
- Posting inline code suggestions using GitHub suggestion blocks
- Direct PR interaction via the `gh` CLI

Includes an architecture diagram, a comparison table with Phase 1, guardrails (time limits, comment limits, no self-approval), the trigger mechanism (`@Egg deep-review`), and implementation notes.

Authored-by: egg
@james-in-a-box move forward with phase 2. Be sure to pull in from main. |
egg run failed (exit code: unknown).

@james-in-a-box move forward with phase 2. Be sure to pull in from main.
Phase 2 implementation is now ready in PR #152. The implementation includes:
All four specialized review modes from the plan are now implemented:
— Authored by egg

egg finished successfully.

see #161. We're going to rework the strategy here.
Add implementation plan for AI-powered code review bots (#134).
The plan proposes a phased approach starting with a single-agent GitHub
Action reviewer that runs automatically on PRs. It reuses the existing
egg Action infrastructure (gateway + sandbox orchestration) rather than
building new webhook or bot infrastructure.
Phase 1 (MVP): Single-pass reviewer combining security, standards, and quality checks. Three new files: `build-review-prompt.sh`, `post-review-comments.sh`, and an `on-pull-request.yml` workflow.
Phase 2: Specialized review modes (security-focused, plan verification, bounded-context "outsider" review).
Phase 3: False positive management, incremental review, metrics.
Phase 4: AI lintbot integration (semantic naming, logic correctness, API usage antipatterns).
Key design decisions:
- `.egg/review-rules.md`

Issue: #134
Related: #70, #77
Test plan:
Authored-by: egg