Add plan for AI-powered code review bots #140
james-in-a-box[bot] wants to merge 3 commits into `main` from
Conversation
@james-in-a-box review this plan
Plan Review
Reviewed the plan against the existing egg Action infrastructure (`action/action.yml`, `build-mention-prompt.sh`, `on-mention.yml`, gateway allowlists). Overall this is well-structured and the phased approach is sound. A few observations:
What works well
- Reusing the egg Action is the right call. The orchestration, credential isolation, and network modes are already production-tested. Adding a new trigger workflow is far less risky than building webhook infrastructure.
- Single-agent over multi-agent is the correct starting point. The justification (latency, cost, debugging complexity) is accurate. Multi-agent can be layered on later if single-pass proves insufficient.
- `cancel-in-progress: true` for the review workflow — correct. Unlike `on-mention.yml` (where canceling an in-progress task would lose work), stale reviews on superseded commits have no value.
- Advisory-only `COMMENT` events — right default. Giving an AI reviewer veto power (via `REQUEST_CHANGES`) should require significant confidence in false positive rates first.
- Trusted checkout pattern — good that this carries over from `on-mention.yml`. Building the prompt from `main` prevents a malicious PR from tampering with the prompt assembly script.
Issues to address
- **Prompt output mechanism.** The plan shows `build-review-prompt.sh` setting `steps.prompt.outputs.prompt`, which is passed to the egg Action's `prompt` input. `build-mention-prompt.sh` writes to `$GITHUB_OUTPUT` using a heredoc delimiter. With the proposed truncation limits (100K chars overall prompt), this approaches GitHub Actions' output size limits — `$GITHUB_OUTPUT` has practical limits around 1 MB, but multiline outputs can be fragile at that size. Consider writing the prompt to a temp file and passing the path instead, or compressing it. This is a real implementation concern that should be addressed in the plan.
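The temp-file alternative can be sketched roughly as follows; the `review-prompt.txt` filename, the `prompt-file` output name, and the `pr.diff` input are illustrative assumptions, not names from the plan:

```shell
# Hypothetical sketch: write the assembled prompt to a temp file and expose
# only its short, fixed-size path via GITHUB_OUTPUT, sidestepping the
# multiline-output size limits entirely.
PROMPT_FILE="${RUNNER_TEMP:-/tmp}/review-prompt.txt"
{
  echo "Review the following pull request diff:"
  cat pr.diff 2>/dev/null || echo "(diff unavailable)"   # assumed earlier step
} > "$PROMPT_FILE"
echo "prompt-file=$PROMPT_FILE" >> "${GITHUB_OUTPUT:-/dev/null}"
```

The egg Action would then read the prompt from the path rather than from an inline input.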
- **`gh api` comment posting syntax.** The example in §1.3:

  ```
  gh api repos/{owner}/{repo}/pulls/{pr}/reviews \
    -f 'comments[][path]'="$FILE" \
    -f 'comments[][position]'="$LINE" \
    -f 'comments[][body]'="$COMMENT"
  ```

  This array syntax doesn't work with `gh api`. The GitHub API expects a JSON body with a `comments` array. This will need to use `--input` with a constructed JSON payload (via `jq`) or `--raw-field` with properly formatted JSON. The plan should show the correct approach since `post-review-comments.sh` is a core deliverable.
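A `jq`-constructed payload along these lines would satisfy the API's shape (the file path, line number, and comment text are sample values; the `gh` call is shown but not executed since it needs auth and a real PR):

```shell
# Hypothetical sketch: build the review payload as real JSON with jq, then
# pipe it to gh api via --input.
FILE="src/app.py"; LINE=12; COMMENT="Possible null dereference."
jq -n \
  --arg path "$FILE" \
  --argjson line "$LINE" \
  --arg body "$COMMENT" \
  '{event: "COMMENT",
    body: "Automated review",
    comments: [{path: $path, line: $line, side: "RIGHT", body: $body}]}' \
  > payload.json
# Posting would then be:
# gh api "repos/$OWNER/$REPO/pulls/$PR_NUMBER/reviews" --method POST --input payload.json
```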
- **Diff position vs. line number.** The prompt asks Claude to output `"line": <line_number_in_diff>`, and the comment poster uses `position` in the API call. The GitHub Reviews API `position` field is the line index within the diff hunk, not the file line number. This mapping is non-trivial — Claude would need to output the diff-relative position, or `post-review-comments.sh` would need to compute it from the diff. Alternatively, use the newer `subject_type: "line"` with `line` and `side` parameters (available since 2022), which accept actual file line numbers. This distinction should be explicit in the plan, because getting it wrong means every inline comment lands on the wrong line.
- **Missing `log-file` output handling.** The workflow uses `steps.egg.outputs.log-file` (referenced in §1.3), but `post-review-comments.sh` needs to know where to read Claude's output. The plan should specify how the egg Action exposes this — the existing `action.yml` outputs `log-file`, so this should work, but the plan should confirm the format (raw text with embedded JSON blocks? pure JSON?). The prompt asks for structured JSON, but Claude may wrap it in markdown code fences or include preamble text. The parser needs to handle this robustly.
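A tolerant parser could cover both cases — pure JSON, or JSON wrapped in a fenced block with preamble. This is a sketch (the function name and filenames are hypothetical, not from the plan):

```shell
# Hypothetical extractor: accept a log that is already valid JSON, or one
# where the JSON sits inside the first ```json fenced block.
extract_review_json() {
  local log="$1"
  if jq -e . "$log" >/dev/null 2>&1; then
    cat "$log"            # whole file is valid JSON; pass it through
    return 0
  fi
  # Otherwise emit the body of the first ```json fenced block (may be empty).
  awk '/^```json[[:space:]]*$/ {inblk=1; next}
       /^```[[:space:]]*$/     {if (inblk) exit}
       inblk' "$log"
}
```

A final fallback — posting the raw output as a plain comment when neither path yields valid JSON — would wrap this function.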
- **`ready_for_review` trigger gap.** The workflow triggers on `ready_for_review` (draft → ready), which is good, but it doesn't trigger on `reopened`. If someone closes and reopens a PR, no review runs. Minor, but worth noting whether this is intentional.
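A hypothetical trigger list for the new workflow, with `reopened` included, might look like:

```yaml
# Illustrative fragment for on-pull-request.yml (event list is an assumption).
on:
  pull_request:
    types: [opened, synchronize, reopened, ready_for_review]
```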
Suggestions
- **Phase 3.2 (Incremental Review)** says "diff between previous head and new head" on `synchronize` events. This information is available in the `pull_request` `synchronize` event payload as `before` and `after` SHAs. Worth noting that in the plan, since it makes the implementation straightforward — no need to store state between runs.
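The incremental diff then reduces to a single `git diff` over those two SHAs. A sketch (in a workflow, the arguments would come from `github.event.before` and `github.event.after`; the function name is illustrative):

```shell
# Hypothetical sketch: diff only what changed since the last push.
incremental_diff() {
  local before="$1" after="$2"
  git diff "$before".."$after"
}
```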
- **Open Question 2 (comment threading):** I'd recommend resolving old review comments and posting fresh. The GitHub API supports `DISMISS`ing reviews, or you can resolve individual comment threads. This avoids clutter while still preserving history in the timeline. The plan should take a position here rather than leaving it open, since it affects the Phase 1 implementation of `post-review-comments.sh`.
- **Open Question 4 (model selection):** The existing `action.yml` already has a `model` input. The workflow can parameterize this. A reasonable default: use Haiku for PRs under N changed files (e.g., 5) and Opus for larger ones. This is easy to add in Phase 1 since the infrastructure already supports it.
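The selection logic is a few lines; this sketch uses illustrative model names and a default threshold of 5 (the changed-file count could come from, e.g., `git diff --name-only base..head | wc -l`):

```shell
# Hypothetical sketch: choose a model by changed-file count.
# Model names and threshold are illustrative, not values from the plan.
pick_model() {
  local changed_files="$1" threshold="${2:-5}"
  if [ "$changed_files" -le "$threshold" ]; then
    echo "haiku"   # small/fast model for small PRs
  else
    echo "opus"    # larger model for bigger diffs
  fi
}
```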
Verdict
The plan is solid and feasible. The Phase 1 scope (3 new files + 1 workflow) is appropriately sized. The issues above (#1-4) are implementation details that need resolution before coding starts, but none of them changes the architecture. I'd recommend addressing the diff position mapping (issue #3) and the `gh api` JSON payload format (issue #2) in the plan itself, since those are design decisions, not just implementation details.
— Authored by egg
egg finished successfully.

@james-in-a-box address the above feedback
Addressed all review feedback. Summary of changes:
Issues resolved:
- **Prompt output mechanism (issue #1).** Replaced the `$GITHUB_OUTPUT` multiline approach with a temp file (`$RUNNER_TEMP/review-prompt.txt`). The workflow now passes a `prompt-file` path to the egg Action instead of inline prompt text. This avoids size limits entirely.
- **`gh api` comment posting syntax (issue #2).** Replaced the invalid `-f 'comments[][path]'` array syntax with a `jq`-constructed JSON payload piped via `--input -`. The plan now shows the correct approach.
- **Diff position vs. line number (issue #3).** Switched from the legacy `position` field to the newer `subject_type: "line"` API with `line` and `side` parameters. The prompt now explicitly instructs Claude to output file line numbers (not diff-relative positions), and the comment poster sets `side: "RIGHT"` for all comments. This avoids diff-position mapping entirely.
- **Log file format and parsing (issue #4).** Added a section specifying that Claude's output may include markdown code fences, preamble text, or multiple JSON blocks. The parser tries structured extraction first, falls back to scanning for individual JSON blocks, then falls back to posting raw output as a plain comment.
- **`reopened` trigger (issue #5).** Added `reopened` to the workflow's `pull_request.types` list.
Suggestions addressed:
- **Phase 3.2 incremental review:** Noted that the `pull_request` `synchronize` payload includes `before` and `after` SHAs (`github.event.before` / `github.event.after`), eliminating the need for external state storage.
- **Open Question 2 (comment threading):** Resolved — dismiss the bot's previous review via the GitHub API and post a fresh one. This preserves history while keeping the PR timeline clean.
- **Open Question 4 (model selection):** Resolved — default to Haiku for PRs with ≤5 changed files, Opus for larger PRs. The threshold is configurable via `.egg/review-rules.md`. The workflow passes the model as an output from the prompt builder step.
— Authored by egg
egg finished successfully.

@james-in-a-box implement phase 1 of this plan and open a PR.
Phase 1 implementation is ready in PR #146. The implementation includes:
Note: The workflow file could not be pushed due to GitHub App permission restrictions (
— Authored by egg

egg finished successfully.
Add section 2.4 documenting a deep-review mode that gives the bot direct PR access for multi-turn analysis. Unlike the structured output approach in Phase 1, deep review allows:
- Running tests to validate suspected issues
- Exploring the codebase beyond the changed files
- Multi-turn investigation chains
- Posting inline code suggestions using GitHub suggestion blocks
- Direct PR interaction via the `gh` CLI

Includes an architecture diagram, a comparison table with Phase 1, guardrails (time limits, comment limits, no self-approval), the trigger mechanism (`@Egg deep-review`), and implementation notes.

Authored-by: egg
@james-in-a-box move forward with phase 2. Be sure to pull in from main. |
egg run failed (exit code: unknown).

@james-in-a-box move forward with phase 2. Be sure to pull in from main.
Phase 2 implementation is now ready in PR #152. The implementation includes:
All four specialized review modes from the plan are now implemented:
— Authored by egg

egg finished successfully.

see #161. We're going to rework the strategy here.
Add implementation plan for AI-powered code review bots (#134).
The plan proposes a phased approach starting with a single-agent GitHub
Action reviewer that runs automatically on PRs. It reuses the existing
egg Action infrastructure (gateway + sandbox orchestration) rather than
building new webhook or bot infrastructure.
Phase 1 (MVP): Single-pass reviewer combining security, standards, and quality checks. Three new files: `build-review-prompt.sh`, `post-review-comments.sh`, and an `on-pull-request.yml` workflow.
Phase 2: Specialized review modes (security-focused, plan verification, bounded-context "outsider" review).
Phase 3: False positive management, incremental review, metrics.
Phase 4: AI lintbot integration (semantic naming, logic correctness, API usage antipatterns).
Key design decisions:
- `.egg/review-rules.md`

Issue: #134
Related: #70, #77
Test plan:
Authored-by: egg