Skip to content

Unify prompt criteria and trim CLAUDE.md for non-SDLC tasks (#659)#725

Merged
jwbron merged 15 commits intomainfrom
egg/issue-659-implement
Feb 15, 2026
Merged

Unify prompt criteria and trim CLAUDE.md for non-SDLC tasks (#659)#725
jwbron merged 15 commits intomainfrom
egg/issue-659-implement

Conversation

@jwbron
Copy link
Owner

@jwbron jwbron commented Feb 15, 2026

Summary

  • Unifies prompt criteria across GHA bash scripts and the Python orchestrator by extracting shared review criteria into shared/prompts/*.md files (code-review, autofixer, agent-design, contract-review)
  • Trims CLAUDE.md for non-SDLC agents by conditionally excluding contract.md and orchestrator.md sections when EGG_PIPELINE_ID is not set, reducing ~6 KB of irrelevant context
  • Wires shared criteria into all prompt builders — both shell scripts (action/build-*.sh) and orchestrator (routes/pipelines.py) now read from the same source files with fallback to inline defaults
  • Adds comprehensive tests for prompt builders and entrypoint CLAUDE.md assembly

Pipeline artifacts included

This PR includes SDLC pipeline state files from the issue-659 run:

  • Contract: .egg-state/contracts/659.json
  • Analysis: .egg-state/drafts/659-analysis.md
  • Plan outputs: .egg-state/drafts/659-plan-*.json (architect, risk-analyst, task-planner)
  • Implement outputs: .egg-state/agent-outputs/integrator-output.json, .egg-state/checks/implement-results.json

Note: The implementation pipeline completed (coder, tester, documenter, integrator all passed) but the review phase was not reached due to token limits.

Closes #659

Test plan

  • Verify shared prompt files are loaded correctly by shell scripts (tests/action/test_build_*.py)
  • Verify orchestrator reads shared criteria (orchestrator/tests/test_pipeline_prompts.py)
  • Verify CLAUDE.md excludes contract/orchestrator sections for non-SDLC agents (tests/sandbox/test_entrypoint.py)
  • Verify fallback behavior when shared files are missing

🤖 Generated with Claude Code

egg-orchestrator added 14 commits February 15, 2026 09:50
Extracts review criteria (code review, agent-design, contract verification,
autofixer rules) from duplicated inline content in GHA bash scripts and
orchestrator Python functions into shared markdown files under
shared/prompts/. Both systems now read from the same source of truth with
fallback to inline defaults for rollout safety.

GHA scripts use script-relative paths ($SCRIPT_DIR/../shared/prompts/) to
read criteria from the trusted checkout, preserving the security model.
User override files (.egg/*-rules.md) retain priority over shared defaults.

Conditionally excludes contract.md and orchestrator.md from CLAUDE.md when
EGG_PIPELINE_ID is not set, reducing token usage for non-SDLC invocations.

Removes dead code in _build_phase_prompt() where is_local branches for
refine and plan phases produced identical output.

Adds orchestrator Dockerfile COPY for shared/prompts/ to make criteria
available in the production container.

Authored-by: egg
Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No agent-mode design concerns.

This PR aligns well with agent-mode design principles:

  • Shared infrastructure, not pre-fetching: The shared criteria files are small markdown guidelines (~1-2KB each) that orient the agent on what to review. This matches the "good pattern: shared infrastructure" called out in docs/guides/agent-mode-design.md.
  • CLAUDE.md trimming: Conditionally excluding irrelevant SDLC sections for non-pipeline agents reduces cognitive noise without constraining agent flexibility.
  • No anti-patterns introduced: No large pre-fetched data, no structured output for humans, no post-processing pipelines, no rigid procedures.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review: Unify prompt criteria and trim CLAUDE.md (#659)

The overall approach is sound — extracting shared criteria into markdown files is the right level of abstraction, and the conditional CLAUDE.md assembly is a clean win. However, there are several issues that should be addressed before merging.


1. repo_path not propagated through _get_review_criteria_for_type() — user overrides silently ignored in orchestrator reviews

File: orchestrator/routes/pipelines.py lines 990-1005

_get_code_review_criteria() and _get_contract_review_criteria() now accept repo_path and use it to load .egg/review-rules.md user overrides. However, the dispatcher _get_review_criteria_for_type() calls them without repo_path:

def _get_review_criteria_for_type(reviewer_type: str, phase: str) -> str:
    ...
    elif reviewer_type == "code":
        return _get_code_review_criteria()   # ← no repo_path
    elif reviewer_type == "contract":
        return _get_contract_review_criteria()  # ← no repo_path

This is called from _build_review_prompt() at line 1166, which also has no repo_path parameter. The user-override mechanism that was added to the orchestrator is unreachable through the production code path. The feature only works when calling _get_code_review_criteria(repo_path=...) directly (i.e., only in tests).

This is either dead code or a missing plumbing fix. Either propagate repo_path through _get_review_criteria_for_type() and _build_review_prompt() to make the feature work, or remove the repo_path parameter from these functions and document that user overrides are GHA-only (matching pre-existing behavior).

2. _build_autofix_prompt hardcodes repo_path=None, making user overrides unreachable

File: orchestrator/routes/pipelines.py lines 2431-2434

autofixer_rules = _read_shared_criteria(
    "autofixer-rules.md",
    user_override="autofixer-rules.md",
    repo_path=None,  # ← always None
)

The function receives a repo parameter (line 2381) that is available, but the call hardcodes repo_path=None. Passing user_override="autofixer-rules.md" is pointless since the override check is gated by if user_override and repo_path:. This is misleading — it looks like user overrides are supported when they aren't.

Either pass the repo parameter through (after computing a usable repo path), or remove the user_override argument to avoid the false promise.

3. Empty user override file returns empty string, silently producing an empty criteria section

File: orchestrator/routes/pipelines.py lines 826-829

if user_override and repo_path:
    override_path = Path(repo_path) / ".egg" / user_override
    if override_path.is_file():
        return override_path.read_text()  # returns "" for empty file

If a user creates an empty .egg/review-rules.md file (e.g., by accident, or to "clear" defaults), _read_shared_criteria returns "" (empty string). This passes the if content is not None: check in callers, producing a review prompt with an empty criteria section. The test at line 2190-2203 actually documents this behavior as expected.

An empty criteria section in a review prompt would produce a low-quality review. Consider treating empty override files the same as absent ones: if override_path.is_file() and override_path.stat().st_size > 0:, or add a check at the caller: if content is not None and content.strip():.

4. action/README.md — paragraph inserted inside a markdown table breaks rendering

File: action/README.md

The diff inserts a paragraph between table rows:

| `build-contract-verification-prompt.sh` | Builds prompts for contract verification reviews |

Prompt builders load review criteria from `shared/prompts/` ...
| **Convention Documents** | |
| `review-conventions.md` | Code review conventions and guidelines |

This breaks the markdown table. The parser will end the table at the blank line, render the paragraph, then render the remaining rows as a malformed second table (no header row/separator). Move the paragraph either before or after the entire table, or make it a note below the table.

5. Test for Docker path fallback is a no-op

File: orchestrator/tests/test_pipeline_prompts.py lines 2158-2188

The test_docker_path_fallback method has multiple abandoned patching attempts (including with ... pass blocks) and ends with a simple assert _read_shared_criteria("definitely-does-not-exist.md") is None, which is just a duplicate of test_returns_none_for_missing_file. The test name says it tests Docker path fallback, but it doesn't actually test that code path. Either write a working test for the Docker fallback or remove this method.

6. Path.__new__ patching in entrypoint tests is fragile

File: tests/sandbox/test_entrypoint.py lines 3224-3232, 3267-3275, etc.

The tests patch Path.__new__ to redirect /opt/claude-rules to a temp directory. This is brittle — it intercepts all Path construction globally, which could interfere with other Path operations during test execution (file reads, symlink creation, etc.). A more robust approach would be to extract the rules directory path as a parameter or module-level constant in setup_agent_rules() so tests can override it cleanly. This is a pre-existing design issue that the tests work around, so it's not blocking, but worth noting as tech debt.

7. Minor: Missing trailing newline in shared prompt files

Files: shared/prompts/agent-design-criteria.md, shared/prompts/autofixer-rules.md, shared/prompts/code-review-criteria.md, shared/prompts/contract-review-criteria.md

These files don't end with a trailing newline. When concatenated into prompts (especially via cat in shell scripts), this can cause the last line to merge with subsequent content. Not critical since markdown rendering is forgiving, but conventional practice is to end files with a newline.


Summary

Issues #1 and #2 are correctness bugs: the user-override feature for orchestrator reviews is advertised (parameters exist, tests pass) but is unreachable through production code paths. Issue #4 is a documentation rendering bug. Issue #3 is an edge case that could produce degraded behavior. Issues #5-7 are minor.

The core design — shared markdown files with three-tier lookup (user override → source tree → Docker) and inline fallback — is well-structured and the shell script changes are clean.

— Authored by egg

@james-in-a-box

This comment has been minimized.

james-in-a-box[bot]
james-in-a-box bot previously approved these changes Feb 15, 2026
Copy link
Contributor

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contract Verification Review

Verified PR #725 against the SDLC contract for issue #659. The implementation covers all 5 phases and 18 tasks from the task planner contract.

Acceptance Criteria Verification

Criterion Description Status
ac-1 Shared prompt criteria files exist in shared/prompts/ Verified
ac-2 GHA prompt scripts read criteria from shared files with fallback Verified
ac-3 Orchestrator criteria functions read from shared files with fallback Verified (minor gap noted)
ac-4 .egg/*-rules.md user-override mechanism works in both systems Verified (minor gap noted)
ac-5 CLAUDE.md excludes contract.md and orchestrator.md for non-SDLC Verified
ac-6 Dead code (redundant is_local branches) removed Verified
ac-7 All tests pass; new tests added for shared file loading, fallback, conditional assembly Verified

Detailed Findings

AC-1: Shared Prompt Files — All 4 files exist and contain the correct content:

  • shared/prompts/code-review-criteria.md — 4 review categories + How to Review + Skip sections
  • shared/prompts/agent-design-criteria.md — 5 anti-patterns + What to Skip section
  • shared/prompts/contract-review-criteria.md — Contract verification checklist (intentionally output-format-agnostic per file header)
  • shared/prompts/autofixer-rules.md — Auto-fixable + report-only categories with lock file and build failure items

AC-2: GHA Scripts — All 4 scripts correctly implement the shared file → inline fallback pattern:

  • build-review-prompt.sh — shared file + .egg/review-rules.md override + fallback ✓
  • build-autofixer-prompt.sh — shared file + .egg/autofixer-rules.md override + fallback ✓
  • build-agent-mode-design-review-prompt.sh — shared file + fallback (no override, as designed) ✓
  • build-contract-verification-prompt.sh — shared file + .egg/contract-rules.md override + fallback ✓

AC-3 & AC-4: Orchestrator_read_shared_criteria() helper correctly implements 3-tier lookup (user override → source tree → Docker path). All criteria functions use it correctly, except:

Advisory (non-blocking): _build_autofix_prompt() at orchestrator/routes/pipelines.py:2434 passes repo_path=None to _read_shared_criteria(), which means the .egg/autofixer-rules.md user override will never be found (the guard if user_override and repo_path: short-circuits). The repo parameter (GitHub identifier like jwbron/egg) is available but not the filesystem path. Also missing a logger.warning() when falling back to inline defaults, unlike the other three criteria functions. The shared file itself is still found via source tree / Docker path, so primary functionality is unaffected.

AC-5: Conditional CLAUDE.mdsandbox/entrypoint.py:setup_agent_rules() correctly checks os.environ.get("EGG_PIPELINE_ID") and conditionally extends rules_order with ["contract.md", "orchestrator.md"].

AC-6: Dead Code — Refine and plan phase branches consolidated into single if is_local and phase in ("refine", "plan") block, eliminating duplicate code.

AC-7: Testsimplement-results.json confirms 3366 passed, 0 failed. New test coverage includes:

  • tests/action/test_build_review_prompt.py:TestSharedCriteriaLoading — shared file, fallback, override
  • tests/action/test_build_agent_mode_design_review_prompt.py:TestSharedCriteriaLoading — shared file, fallback
  • orchestrator/tests/test_pipeline_prompts.py:TestReadSharedCriteria + TestSharedPromptFileContent — comprehensive coverage of helper function and all criteria functions
  • tests/sandbox/test_entrypoint.py:TestSetupAgentRules — conditional assembly with/without EGG_PIPELINE_ID

Additional Notes

  • Dockerfile correctly includes COPY shared/prompts/ ./prompts/ for production container
  • All agent executions (coder, tester, documenter, integrator) completed successfully
  • No contract violations found

Verdict

All 7 acceptance criteria are verified. The repo_path=None issue in _build_autofix_prompt is a minor gap that doesn't affect core functionality (shared file is still found; only user override is broken for this one function). This can be addressed in a follow-up.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed all review feedback in efedf6a:

#1. repo_path not propagated through review criteria dispatcherAgreed. Added repo_path parameter to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both call sites (multi-agent prompt builder and parallel reviewer spawner) now pass str(worktree_repo_path) through the chain, making user overrides reachable in production.

#2. _build_autofix_prompt hardcodes repo_path=NoneAgreed. Added repo_path parameter to _build_autofix_prompt() and updated the call site to pass str(worktree_repo_path). Also added the missing logger.warning() for the inline fallback path (noted in the contract review).

#3. Empty user override file returns empty stringAgreed. Added st_size > 0 check to _read_shared_criteria() so empty override files are treated as absent. Updated the test to verify the fallback behavior.

#4. README table broken by inline paragraphAgreed. Moved the paragraph below the table.

#5. Docker path fallback test is a no-opAgreed. Rewrote the test to mock Path.is_file and Path.read_text for the specific Docker path, verifying the fallback actually reaches and reads from /app/prompts/.

#6. Path.__new__ patching in entrypoint tests — Acknowledged as tech debt. Not addressed in this commit.

#7. Missing trailing newlines in shared prompt files — Verified all four files already end with \n. No change needed.

— Authored by egg

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No agent-mode design concerns in the delta since last review.

The changes (commit efedf6a) address prior code review feedback: repo_path propagation through the criteria dispatch chain, empty override file handling, and test fixes. All of this is standard parameter plumbing for shared infrastructure — a good pattern per the agent-mode design guidelines.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contract Re-verification: PR #725 (Incremental)

Re-verified PR #725 at commit efedf6ae8b9977025e9067d2c9a27a0940274b44, delta from previously reviewed 84ecb662fa2216b902b7ccada7ad575016449286.

Delta Summary

One commit (efedf6a) addressing 5 of 7 issues raised in the prior code review. Changes span 3 files:

  • orchestrator/routes/pipelines.pyrepo_path parameter propagation + empty override guard
  • orchestrator/tests/test_pipeline_prompts.py — Rewritten Docker fallback + empty override tests
  • action/README.md — Table formatting fix

Acceptance Criteria Re-verification

Criterion Description Status Notes
ac-1 Shared prompt criteria files exist in shared/prompts/ Verified No regression — files unchanged in delta
ac-2 GHA prompt scripts read criteria from shared files with fallback Verified No regression — scripts unchanged in delta
ac-3 Orchestrator criteria functions read from shared files with fallback Verified Strengthened — repo_path now propagated through full call chain
ac-4 .egg/*-rules.md user-override mechanism works in both systems Verified Fixed — previously unreachable in orchestrator; now plumbed through _get_review_criteria_for_type()_build_review_prompt()_build_agent_prompt()_run_multi_agent_phase() and parallel reviewer spawner
ac-5 CLAUDE.md excludes contract.md and orchestrator.md for non-SDLC Verified No regression — unchanged in delta
ac-6 Dead code removed Verified No regression — unchanged in delta
ac-7 All tests pass; new tests added Verified 36/36 tests pass in test_pipeline_prompts.py. Docker fallback test rewritten to actually exercise the code path. Empty override test updated to verify fallback behavior.

Detailed Verification of Delta Changes

Issue #1 (repo_path propagation) — Fixed correctly.

  • _get_review_criteria_for_type() now accepts repo_path: str | None = None and passes it to _get_code_review_criteria() and _get_contract_review_criteria()
  • _build_review_prompt() accepts and passes repo_path to the criteria dispatcher
  • _build_agent_prompt() accepts and passes repo_path to _build_review_prompt()
  • Both call sites in _run_pipeline() pass str(worktree_repo_path) — the multi-agent phase builder and the parallel reviewer spawner

Issue #2 (autofix repo_path=None) — Fixed correctly.

  • _build_autofix_prompt() now accepts repo_path: str | None = None and passes it to _read_shared_criteria() instead of None
  • Call site in _run_pipeline() passes str(worktree_repo_path)
  • Added logger.warning("Shared autofixer-rules.md not found, using inline fallback") for the inline fallback path

Issue #3 (empty override file) — Fixed correctly.

  • _read_shared_criteria() now checks override_path.stat().st_size > 0 in addition to is_file()
  • Empty override files are treated as absent and fall through to shared/Docker/inline paths

Issue #4 (README table) — Fixed correctly.

  • Paragraph moved below the complete table; table now renders correctly

Issue #5 (Docker fallback test) — Fixed correctly.

  • Test now mocks Path.is_file and Path.read_text to simulate the Docker path (/app/prompts/) being available while the source tree path is absent
  • Asserts the Docker content is returned

Previously Noted Items (Unchanged)

  • Issue #6 (Path.new patching) — Acknowledged as tech debt; not addressed (appropriate for follow-up)
  • Issue #7 (trailing newlines) — Verified pre-existing; no change needed

Regression Check

No regressions found. All 7 previously verified acceptance criteria remain valid. The delta strictly addresses prior review feedback without introducing new functionality or breaking existing behavior.

Verdict

All acceptance criteria verified. Prior review issues #1#5 addressed correctly. No contract violations.

— Authored by egg

Copy link
Contributor

@egg-reviewer egg-reviewer bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review: All previous issues addressed

Reviewed the delta from 84ecb66 to efedf6a. All five actionable issues from the previous review have been properly fixed:

#1. repo_path propagation — Fixed correctly. repo_path parameter added to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both production call sites (_run_multi_agent_phase at line 1990 and _spawn_reviewer at line 3333) pass str(worktree_repo_path). User overrides are now reachable through all code paths.

#2. _build_autofix_prompt repo_path=None — Fixed. repo_path parameter added to the function signature, call site at line 3242 passes str(worktree_repo_path). Missing logger.warning() also added for the inline fallback path.

#3. Empty override file handling — Fixed with override_path.stat().st_size > 0 check at line 828. Empty .egg/ override files now fall through to shared criteria. Test updated to verify fallback behavior.

#4. README table rendering — Fixed. Paragraph moved below the table.

#5. Docker path fallback test — Rewritten. Now actually tests the Docker fallback path by mocking Path.is_file and Path.read_text for specific paths. The test verifies that when the source tree path doesn't exist, _read_shared_criteria falls through to /app/prompts/.

No new issues found in the delta. The plumbing is complete — verified all callers of _build_review_prompt (2 sites) and _build_agent_prompt (1 site) pass repo_path. The _spawn_reviewer closure correctly captures worktree_repo_path via default argument, and worktree_repo_path is never reassigned after the closure is defined.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box
Copy link
Contributor

egg review completed. View run logs

5 previous review(s) hidden.

@jwbron jwbron merged commit 0d2d870 into main Feb 15, 2026
33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tune claude.md files and prompts

1 participant