Skip to content

Add plan for AI-powered code review bots#140

Closed
james-in-a-box[bot] wants to merge 3 commits intomainfrom
james-in-a-box/134-review-bots-plan
Closed

Add plan for AI-powered code review bots#140
james-in-a-box[bot] wants to merge 3 commits intomainfrom
james-in-a-box/134-review-bots-plan

Conversation

@james-in-a-box
Copy link
Contributor

@james-in-a-box james-in-a-box bot commented Feb 5, 2026

Add implementation plan for AI-powered code review bots (#134).

The plan proposes a phased approach starting with a single-agent GitHub
Action reviewer that runs automatically on PRs. It reuses the existing
egg Action infrastructure (gateway + sandbox orchestration) rather than
building new webhook or bot infrastructure.

Phase 1 (MVP): Single-pass reviewer combining security, standards,
and quality checks. Three new files: build-review-prompt.sh,
post-review-comments.sh, and an on-pull-request.yml workflow.

Phase 2: Specialized review modes (security-focused, plan
verification, bounded-context "outsider" review).

Phase 3: False positive management, incremental review, metrics.

Phase 4: AI lintbot integration (semantic naming, logic correctness,
API usage antipatterns).

Key design decisions:

  • Single-agent over multi-agent (simpler, sufficient for code review)
  • GitHub Actions over webhook server (reuses existing infra)
  • Advisory comments only (COMMENT, not REQUEST_CHANGES)
  • Per-repo customization via .egg/review-rules.md

Issue: #134
Related: #70, #77

Test plan:

  • Review the plan document for completeness and feasibility
  • Validate against existing egg Action infrastructure
  • Confirm design aligns with issue requirements

Authored-by: egg

@jwbron
Copy link
Owner

jwbron commented Feb 5, 2026

@james-in-a-box review this plan

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plan Review

Reviewed the plan against the existing egg Action infrastructure (action/action.yml, build-mention-prompt.sh, on-mention.yml, gateway allowlists). Overall this is well-structured and the phased approach is sound. A few observations:

What works well

  • Reusing the egg Action is the right call. The orchestration, credential isolation, and network modes are already production-tested. Adding a new trigger workflow is far less risky than building webhook infrastructure.
  • Single-agent over multi-agent is the correct starting point. The justification (latency, cost, debugging complexity) is accurate. Multi-agent can be layered on later if single-pass proves insufficient.
  • cancel-in-progress: true for the review workflow — correct. Unlike on-mention.yml (where canceling an in-progress task would lose work), stale reviews on superseded commits have no value.
  • Advisory-only COMMENT events — right default. Giving an AI reviewer veto power (via REQUEST_CHANGES) should require significant confidence in false positive rates first.
  • Trusted checkout pattern — good that this carries over from on-mention.yml. Building the prompt from main prevents a malicious PR from tampering with the prompt assembly script.

Issues to address

  1. Prompt output mechanism. The plan shows build-review-prompt.sh setting steps.prompt.outputs.prompt, which is passed to the egg Action's prompt input. build-mention-prompt.sh writes to $GITHUB_OUTPUT using a heredoc delimiter. With the proposed truncation limits (100K chars overall prompt), this will hit GitHub Actions' output size limits — GITHUB_OUTPUT has practical limits around 1 MB but multiline outputs can be fragile at that size. Consider writing the prompt to a temp file and passing the path instead, or compressing it. This is a real implementation concern that should be addressed in the plan.

  2. gh api comment posting syntax. The example in §1.3:

    gh api repos/{owner}/{repo}/pulls/{pr}/reviews \
      -f 'comments[][path]'="$FILE" \
      -f 'comments[][position]'="$LINE" \
      -f 'comments[][body]'="$COMMENT"

    This array syntax doesn't work with gh api. The GitHub API expects a JSON body with a comments array. This will need to use --input with a constructed JSON payload (via jq) or --raw-field with properly formatted JSON. The plan should show the correct approach since post-review-comments.sh is a core deliverable.

  3. Diff position vs. line number. The prompt asks Claude to output "line": <line_number_in_diff>, and the comment poster uses position in the API call. The GitHub Reviews API position field is the line index within the diff hunk (not the file line number). This mapping is non-trivial — Claude would need to output the diff-relative position, or post-review-comments.sh needs to compute it from the diff. Alternatively, use the newer subject_type: "line" with line and side parameters (available since 2022), which accept actual file line numbers. This distinction should be explicit in the plan because getting it wrong means every inline comment lands on the wrong line.

  4. Missing log-file output handling. The workflow uses steps.egg.outputs.log-file (referenced in §1.3), but post-review-comments.sh needs to know where to read Claude's output. The plan should specify how the egg Action exposes this — the existing action.yml outputs log-file, so this should work, but the plan should confirm the format (raw text with embedded JSON blocks? pure JSON?). The prompt asks for structured JSON, but Claude may wrap it in markdown code fences or include preamble text. The parser needs to handle this robustly.

  5. ready_for_review trigger gap. The workflow triggers on ready_for_review (draft → ready), which is good. But it doesn't trigger on reopened. If someone closes and reopens a PR, no review runs. Minor, but worth noting whether this is intentional.

Suggestions

  • Phase 3.2 (Incremental Review) says "diff between previous head and new head" on synchronize events. This information is available in the pull_request.synchronize event payload as before and after SHAs. Worth noting that in the plan since it makes the implementation straightforward — no need to store state between runs.

  • Open Question 2 (comment threading): I'd recommend resolving old review comments and posting fresh. The GitHub API supports DISMISSing reviews, or you can resolve individual comment threads. This avoids clutter while still preserving history in the timeline. The plan should take a position here rather than leaving it open, since it affects the Phase 1 implementation of post-review-comments.sh.

  • Open Question 4 (model selection): The existing action.yml already has a model input. The workflow can parameterize this. A reasonable default: use Haiku for PRs under N changed files (e.g., 5), Opus for larger ones. This is easy to add in Phase 1 since the infrastructure already supports it.

Verdict

The plan is solid and feasible. The Phase 1 scope (3 new files + 1 workflow) is appropriately sized. The issues above (#1-4) are implementation details that need resolution before coding starts, but none of them change the architecture. I'd recommend addressing the diff position mapping (issue #3) and the gh api JSON payload format (issue #2) in the plan itself, since those are design decisions, not just implementation details.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 5, 2026

egg finished successfully. View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 5, 2026

@james-in-a-box address the above feedback

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all review feedback. Summary of changes:

Issues resolved:

  1. Prompt output mechanism (issue #1). Replaced $GITHUB_OUTPUT multiline approach with a temp file ($RUNNER_TEMP/review-prompt.txt). The workflow now passes a prompt-file path to the egg Action instead of inline prompt text. This avoids size limits entirely.

  2. gh api comment posting syntax (issue #2). Replaced the invalid -f 'comments[][path]' array syntax with a jq-constructed JSON payload piped via --input -. The plan now shows the correct approach.

  3. Diff position vs. line number (issue #3). Switched from the legacy position field to the newer subject_type: "line" API with line and side parameters. The prompt now explicitly instructs Claude to output file line numbers (not diff-relative positions), and the comment poster sets side: "RIGHT" for all comments. This avoids diff-position mapping entirely.

  4. Log file format and parsing (issue #4). Added a section specifying that Claude's output may include markdown code fences, preamble text, or multiple JSON blocks. The parser tries structured extraction first, falls back to scanning for individual JSON blocks, then falls back to posting raw output as a plain comment.

  5. reopened trigger (issue #5). Added reopened to the workflow's pull_request.types list.

Suggestions addressed:

  • Phase 3.2 incremental review: Noted that pull_request.synchronize payload includes before and after SHAs (github.event.before/github.event.after), eliminating the need for external state storage.

  • Open Question 2 (comment threading): Resolved — dismiss the bot's previous review via the GitHub API and post a fresh one. This preserves history while keeping the PR timeline clean.

  • Open Question 4 (model selection): Resolved — default to Haiku for PRs with ≤5 changed files, Opus for larger PRs. Threshold is configurable via .egg/review-rules.md. The workflow passes the model as an output from the prompt builder step.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 5, 2026

egg finished successfully. View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box implement phase 1 of this plan and open a PR.

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

Phase 1 implementation is ready in PR #146.

The implementation includes:

  • action/build-review-prompt.sh — builds structured review prompts from PR metadata, diffs, and file contents
  • action/post-review-comments.sh — parses Claude's JSON output and posts GitHub PR reviews with inline comments
  • prompt-file input added to the egg action for large prompts
  • .github/workflows/on-pull-request.yml workflow

Note: The workflow file could not be pushed due to GitHub App permission restrictions (workflows permission required). It's committed locally and documented in the PR — a human will need to merge it manually or copy the contents.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

Add section 2.4 documenting a deep-review mode that gives the bot direct
PR access for multi-turn analysis. Unlike the structured output approach
in Phase 1, deep review allows:

- Running tests to validate suspected issues
- Exploring the codebase beyond the changed files
- Multi-turn investigation chains
- Posting inline code suggestions using GitHub suggestion blocks
- Direct PR interaction via gh CLI

Includes architecture diagram, comparison table with Phase 1, guardrails
(time limits, comment limits, no self-approval), trigger mechanism
(@Egg deep-review), and implementation notes.

Authored-by: egg
@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box move forward with phase 2. Be sure to pull in from main.

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg run failed (exit code: unknown). View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

@james-in-a-box move forward with phase 2. Be sure to pull in from main.

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

Phase 2 implementation is now ready in PR #152.

The implementation includes:

  • action/prompts/security-review.md — Security-focused review prompt
  • action/prompts/plan-verify.md — Plan verification review prompt
  • action/prompts/outsider-review.md — Outsider/clarity review prompt
  • action/build-deep-review-prompt.sh — Deep review prompt builder
  • Updated build-review-prompt.sh for specialized mode support
  • Updated on-pull-request.yml with workflow_dispatch inputs
  • Updated action.yml with review-mode input

All four specialized review modes from the plan are now implemented:

  1. Security-focused review (auto-triggers on security-sensitive changes)
  2. Plan verification review (compares PR against linked issue)
  3. Outsider review (reviews without project context)
  4. Deep review mode (multi-turn analysis with test execution)

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

james-in-a-box bot commented Feb 6, 2026

egg finished successfully. View run logs

@jwbron
Copy link
Owner

jwbron commented Feb 6, 2026

see #161. We're going to rework the strategy here.

@jwbron jwbron closed this Feb 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant