Unify prompt criteria and trim CLAUDE.md for non-SDLC tasks (#659) by jwbron · Pull Request #725 · jwbron/egg

jwbron · 2026-02-15T17:53:57Z

Summary

Unifies prompt criteria across GHA bash scripts and the Python orchestrator by extracting shared review criteria into shared/prompts/*.md files (code-review, autofixer, agent-design, contract-review)
Trims CLAUDE.md for non-SDLC agents by conditionally excluding contract.md and orchestrator.md sections when EGG_PIPELINE_ID is not set, reducing ~6 KB of irrelevant context
Wires shared criteria into all prompt builders — both shell scripts (action/build-*.sh) and orchestrator (routes/pipelines.py) now read from the same source files with fallback to inline defaults
Adds comprehensive tests for prompt builders and entrypoint CLAUDE.md assembly

Pipeline artifacts included

This PR includes SDLC pipeline state files from the issue-659 run:

Contract: .egg-state/contracts/659.json
Analysis: .egg-state/drafts/659-analysis.md
Plan outputs: .egg-state/drafts/659-plan-*.json (architect, risk-analyst, task-planner)
Implement outputs: .egg-state/agent-outputs/integrator-output.json, .egg-state/checks/implement-results.json

Note: The implementation pipeline completed (coder, tester, documenter, integrator all passed) but the review phase was not reached due to token limits.

Closes #659

Test plan

Verify shared prompt files are loaded correctly by shell scripts (tests/action/test_build_*.py)
Verify orchestrator reads shared criteria (orchestrator/tests/test_pipeline_prompts.py)
Verify CLAUDE.md excludes contract/orchestrator sections for non-SDLC agents (tests/sandbox/test_entrypoint.py)
Verify fallback behavior when shared files are missing

🤖 Generated with Claude Code

Extracts review criteria (code review, agent-design, contract verification, autofixer rules) from duplicated inline content in GHA bash scripts and orchestrator Python functions into shared markdown files under shared/prompts/. Both systems now read from the same source of truth with fallback to inline defaults for rollout safety. GHA scripts use script-relative paths ($SCRIPT_DIR/../shared/prompts/) to read criteria from the trusted checkout, preserving the security model. User override files (.egg/*-rules.md) retain priority over shared defaults. Conditionally excludes contract.md and orchestrator.md from CLAUDE.md when EGG_PIPELINE_ID is not set, reducing token usage for non-SDLC invocations. Removes dead code in _build_phase_prompt() where is_local branches for refine and plan phases produced identical output. Adds orchestrator Dockerfile COPY for shared/prompts/ to make criteria available in the production container. Authored-by: egg

egg-reviewer

No agent-mode design concerns.

This PR aligns well with agent-mode design principles:

Shared infrastructure, not pre-fetching: The shared criteria files are small markdown guidelines (~1-2KB each) that orient the agent on what to review. This matches the "good pattern: shared infrastructure" called out in docs/guides/agent-mode-design.md.
CLAUDE.md trimming: Conditionally excluding irrelevant SDLC sections for non-pipeline agents reduces cognitive noise without constraining agent flexibility.
No anti-patterns introduced: No large pre-fetched data, no structured output for humans, no post-processing pipelines, no rigid procedures.

— Authored by egg

egg-reviewer

Review: Unify prompt criteria and trim CLAUDE.md (#659)

The overall approach is sound — extracting shared criteria into markdown files is the right level of abstraction, and the conditional CLAUDE.md assembly is a clean win. However, there are several issues that should be addressed before merging.

1. `repo_path` not propagated through `_get_review_criteria_for_type()` — user overrides silently ignored in orchestrator reviews

File: orchestrator/routes/pipelines.py lines 990-1005

_get_code_review_criteria() and _get_contract_review_criteria() now accept repo_path and use it to load .egg/review-rules.md user overrides. However, the dispatcher _get_review_criteria_for_type() calls them without repo_path:

def _get_review_criteria_for_type(reviewer_type: str, phase: str) -> str:
    ...
    elif reviewer_type == "code":
        return _get_code_review_criteria()   # ← no repo_path
    elif reviewer_type == "contract":
        return _get_contract_review_criteria()  # ← no repo_path

This is called from _build_review_prompt() at line 1166, which also has no repo_path parameter. The user-override mechanism that was added to the orchestrator is unreachable through the production code path. The feature only works when calling _get_code_review_criteria(repo_path=...) directly (i.e., only in tests).

This is either dead code or a missing plumbing fix. Either propagate repo_path through _get_review_criteria_for_type() and _build_review_prompt() to make the feature work, or remove the repo_path parameter from these functions and document that user overrides are GHA-only (matching pre-existing behavior).

2. `_build_autofix_prompt` hardcodes `repo_path=None`, making user overrides unreachable

File: orchestrator/routes/pipelines.py lines 2431-2434

autofixer_rules = _read_shared_criteria(
    "autofixer-rules.md",
    user_override="autofixer-rules.md",
    repo_path=None,  # ← always None
)

The function receives a repo parameter (line 2381) that is available, but the call hardcodes repo_path=None. Passing user_override="autofixer-rules.md" is pointless since the override check is gated by if user_override and repo_path:. This is misleading — it looks like user overrides are supported when they aren't.

Either pass the repo parameter through (after computing a usable repo path), or remove the user_override argument to avoid the false promise.

3. Empty user override file returns empty string, silently producing an empty criteria section

File: orchestrator/routes/pipelines.py lines 826-829

if user_override and repo_path:
    override_path = Path(repo_path) / ".egg" / user_override
    if override_path.is_file():
        return override_path.read_text()  # returns "" for empty file

If a user creates an empty .egg/review-rules.md file (e.g., by accident, or to "clear" defaults), _read_shared_criteria returns "" (empty string). This passes the if content is not None: check in callers, producing a review prompt with an empty criteria section. The test at line 2190-2203 actually documents this behavior as expected.

An empty criteria section in a review prompt would produce a low-quality review. Consider treating empty override files the same as absent ones: if override_path.is_file() and override_path.stat().st_size > 0:, or add a check at the caller: if content is not None and content.strip():.

4. `action/README.md` — paragraph inserted inside a markdown table breaks rendering

File: action/README.md

The diff inserts a paragraph between table rows:

| `build-contract-verification-prompt.sh` | Builds prompts for contract verification reviews |

Prompt builders load review criteria from `shared/prompts/` ...
| **Convention Documents** | |
| `review-conventions.md` | Code review conventions and guidelines |

This breaks the markdown table. The parser will end the table at the blank line, render the paragraph, then render the remaining rows as a malformed second table (no header row/separator). Move the paragraph either before or after the entire table, or make it a note below the table.

5. Test for Docker path fallback is a no-op

File: orchestrator/tests/test_pipeline_prompts.py lines 2158-2188

The test_docker_path_fallback method has multiple abandoned patching attempts (including with ... pass blocks) and ends with a simple assert _read_shared_criteria("definitely-does-not-exist.md") is None, which is just a duplicate of test_returns_none_for_missing_file. The test name says it tests Docker path fallback, but it doesn't actually test that code path. Either write a working test for the Docker fallback or remove this method.

6. `Path.new` patching in entrypoint tests is fragile

File: tests/sandbox/test_entrypoint.py lines 3224-3232, 3267-3275, etc.

The tests patch Path.__new__ to redirect /opt/claude-rules to a temp directory. This is brittle — it intercepts all Path construction globally, which could interfere with other Path operations during test execution (file reads, symlink creation, etc.). A more robust approach would be to extract the rules directory path as a parameter or module-level constant in setup_agent_rules() so tests can override it cleanly. This is a pre-existing design issue that the tests work around, so it's not blocking, but worth noting as tech debt.

7. Minor: Missing trailing newline in shared prompt files

Files: shared/prompts/agent-design-criteria.md, shared/prompts/autofixer-rules.md, shared/prompts/code-review-criteria.md, shared/prompts/contract-review-criteria.md

These files don't end with a trailing newline. When concatenated into prompts (especially via cat in shell scripts), this can cause the last line to merge with subsequent content. Not critical since markdown rendering is forgiving, but conventional practice is to end files with a newline.

Summary

Issues #1 and #2 are correctness bugs: the user-override feature for orchestrator reviews is advertised (parameters exist, tests pass) but is unreachable through production code paths. Issue #4 is a documentation rendering bug. Issue #3 is an edge case that could produce degraded behavior. Issues #5-7 are minor.

The core design — shared markdown files with three-tier lookup (user override → source tree → Docker) and inline fallback — is well-structured and the shell script changes are clean.

— Authored by egg

james-in-a-box

Contract Verification Review

Verified PR #725 against the SDLC contract for issue #659. The implementation covers all 5 phases and 18 tasks from the task planner contract.

Acceptance Criteria Verification

Criterion	Description	Status
ac-1	Shared prompt criteria files exist in `shared/prompts/`	Verified
ac-2	GHA prompt scripts read criteria from shared files with fallback	Verified
ac-3	Orchestrator criteria functions read from shared files with fallback	Verified (minor gap noted)
ac-4	`.egg/*-rules.md` user-override mechanism works in both systems	Verified (minor gap noted)
ac-5	CLAUDE.md excludes `contract.md` and `orchestrator.md` for non-SDLC	Verified
ac-6	Dead code (redundant `is_local` branches) removed	Verified
ac-7	All tests pass; new tests added for shared file loading, fallback, conditional assembly	Verified

Detailed Findings

AC-1: Shared Prompt Files — All 4 files exist and contain the correct content:

shared/prompts/code-review-criteria.md — 4 review categories + How to Review + Skip sections
shared/prompts/agent-design-criteria.md — 5 anti-patterns + What to Skip section
shared/prompts/contract-review-criteria.md — Contract verification checklist (intentionally output-format-agnostic per file header)
shared/prompts/autofixer-rules.md — Auto-fixable + report-only categories with lock file and build failure items

AC-2: GHA Scripts — All 4 scripts correctly implement the shared file → inline fallback pattern:

build-review-prompt.sh — shared file + .egg/review-rules.md override + fallback ✓
build-autofixer-prompt.sh — shared file + .egg/autofixer-rules.md override + fallback ✓
build-agent-mode-design-review-prompt.sh — shared file + fallback (no override, as designed) ✓
build-contract-verification-prompt.sh — shared file + .egg/contract-rules.md override + fallback ✓

AC-3 & AC-4: Orchestrator — _read_shared_criteria() helper correctly implements 3-tier lookup (user override → source tree → Docker path). All criteria functions use it correctly, except:

Advisory (non-blocking): _build_autofix_prompt() at orchestrator/routes/pipelines.py:2434 passes repo_path=None to _read_shared_criteria(), which means the .egg/autofixer-rules.md user override will never be found (the guard if user_override and repo_path: short-circuits). The repo parameter (GitHub identifier like jwbron/egg) is available but not the filesystem path. Also missing a logger.warning() when falling back to inline defaults, unlike the other three criteria functions. The shared file itself is still found via source tree / Docker path, so primary functionality is unaffected.

AC-5: Conditional CLAUDE.md — sandbox/entrypoint.py:setup_agent_rules() correctly checks os.environ.get("EGG_PIPELINE_ID") and conditionally extends rules_order with ["contract.md", "orchestrator.md"].

AC-6: Dead Code — Refine and plan phase branches consolidated into single if is_local and phase in ("refine", "plan") block, eliminating duplicate code.

AC-7: Tests — implement-results.json confirms 3366 passed, 0 failed. New test coverage includes:

tests/action/test_build_review_prompt.py:TestSharedCriteriaLoading — shared file, fallback, override
tests/action/test_build_agent_mode_design_review_prompt.py:TestSharedCriteriaLoading — shared file, fallback
orchestrator/tests/test_pipeline_prompts.py:TestReadSharedCriteria + TestSharedPromptFileContent — comprehensive coverage of helper function and all criteria functions
tests/sandbox/test_entrypoint.py:TestSetupAgentRules — conditional assembly with/without EGG_PIPELINE_ID

Additional Notes

Dockerfile correctly includes COPY shared/prompts/ ./prompts/ for production container
All agent executions (coder, tester, documenter, integrator) completed successfully
No contract violations found

Verdict

All 7 acceptance criteria are verified. The repo_path=None issue in _build_autofix_prompt is a minor gap that doesn't affect core functionality (shared file is still found; only user override is broken for this one function). This can be addressed in a follow-up.

— Authored by egg

james-in-a-box

Addressed all review feedback in efedf6a:

#1. repo_path not propagated through review criteria dispatcher — Agreed. Added repo_path parameter to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both call sites (multi-agent prompt builder and parallel reviewer spawner) now pass str(worktree_repo_path) through the chain, making user overrides reachable in production.

#2. _build_autofix_prompt hardcodes repo_path=None — Agreed. Added repo_path parameter to _build_autofix_prompt() and updated the call site to pass str(worktree_repo_path). Also added the missing logger.warning() for the inline fallback path (noted in the contract review).

#3. Empty user override file returns empty string — Agreed. Added st_size > 0 check to _read_shared_criteria() so empty override files are treated as absent. Updated the test to verify the fallback behavior.

#4. README table broken by inline paragraph — Agreed. Moved the paragraph below the table.

#5. Docker path fallback test is a no-op — Agreed. Rewrote the test to mock Path.is_file and Path.read_text for the specific Docker path, verifying the fallback actually reaches and reads from /app/prompts/.

#6. Path.__new__ patching in entrypoint tests — Acknowledged as tech debt. Not addressed in this commit.

#7. Missing trailing newlines in shared prompt files — Verified all four files already end with \n. No change needed.

— Authored by egg

egg-reviewer

No agent-mode design concerns in the delta since last review.

The changes (commit efedf6a) address prior code review feedback: repo_path propagation through the criteria dispatch chain, empty override file handling, and test fixes. All of this is standard parameter plumbing for shared infrastructure — a good pattern per the agent-mode design guidelines.

— Authored by egg

james-in-a-box

Contract Re-verification: PR #725 (Incremental)

Re-verified PR #725 at commit efedf6ae8b9977025e9067d2c9a27a0940274b44, delta from previously reviewed 84ecb662fa2216b902b7ccada7ad575016449286.

Delta Summary

One commit (efedf6a) addressing 5 of 7 issues raised in the prior code review. Changes span 3 files:

orchestrator/routes/pipelines.py — repo_path parameter propagation + empty override guard
orchestrator/tests/test_pipeline_prompts.py — Rewritten Docker fallback + empty override tests
action/README.md — Table formatting fix

Acceptance Criteria Re-verification

Criterion	Description	Status	Notes
ac-1	Shared prompt criteria files exist in `shared/prompts/`	Verified	No regression — files unchanged in delta
ac-2	GHA prompt scripts read criteria from shared files with fallback	Verified	No regression — scripts unchanged in delta
ac-3	Orchestrator criteria functions read from shared files with fallback	Verified	Strengthened — `repo_path` now propagated through full call chain
ac-4	`.egg/*-rules.md` user-override mechanism works in both systems	Verified	Fixed — previously unreachable in orchestrator; now plumbed through `_get_review_criteria_for_type()` → `_build_review_prompt()` → `_build_agent_prompt()` → `_run_multi_agent_phase()` and parallel reviewer spawner
ac-5	CLAUDE.md excludes `contract.md` and `orchestrator.md` for non-SDLC	Verified	No regression — unchanged in delta
ac-6	Dead code removed	Verified	No regression — unchanged in delta
ac-7	All tests pass; new tests added	Verified	36/36 tests pass in `test_pipeline_prompts.py`. Docker fallback test rewritten to actually exercise the code path. Empty override test updated to verify fallback behavior.

Detailed Verification of Delta Changes

Issue #1 (repo_path propagation) — Fixed correctly.

_get_review_criteria_for_type() now accepts repo_path: str | None = None and passes it to _get_code_review_criteria() and _get_contract_review_criteria()
_build_review_prompt() accepts and passes repo_path to the criteria dispatcher
_build_agent_prompt() accepts and passes repo_path to _build_review_prompt()
Both call sites in _run_pipeline() pass str(worktree_repo_path) — the multi-agent phase builder and the parallel reviewer spawner

Issue #2 (autofix repo_path=None) — Fixed correctly.

_build_autofix_prompt() now accepts repo_path: str | None = None and passes it to _read_shared_criteria() instead of None
Call site in _run_pipeline() passes str(worktree_repo_path)
Added logger.warning("Shared autofixer-rules.md not found, using inline fallback") for the inline fallback path

Issue #3 (empty override file) — Fixed correctly.

_read_shared_criteria() now checks override_path.stat().st_size > 0 in addition to is_file()
Empty override files are treated as absent and fall through to shared/Docker/inline paths

Issue #4 (README table) — Fixed correctly.

Paragraph moved below the complete table; table now renders correctly

Issue #5 (Docker fallback test) — Fixed correctly.

Test now mocks Path.is_file and Path.read_text to simulate the Docker path (/app/prompts/) being available while the source tree path is absent
Asserts the Docker content is returned

Previously Noted Items (Unchanged)

Issue #6 (Path.new patching) — Acknowledged as tech debt; not addressed (appropriate for follow-up)
Issue #7 (trailing newlines) — Verified pre-existing; no change needed

Regression Check

No regressions found. All 7 previously verified acceptance criteria remain valid. The delta strictly addresses prior review feedback without introducing new functionality or breaking existing behavior.

Verdict

All acceptance criteria verified. Prior review issues #1–#5 addressed correctly. No contract violations.

— Authored by egg

egg-reviewer

Re-review: All previous issues addressed

Reviewed the delta from 84ecb66 to efedf6a. All five actionable issues from the previous review have been properly fixed:

#1. repo_path propagation — Fixed correctly. repo_path parameter added to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both production call sites (_run_multi_agent_phase at line 1990 and _spawn_reviewer at line 3333) pass str(worktree_repo_path). User overrides are now reachable through all code paths.

#2. _build_autofix_prompt repo_path=None — Fixed. repo_path parameter added to the function signature, call site at line 3242 passes str(worktree_repo_path). Missing logger.warning() also added for the inline fallback path.

#3. Empty override file handling — Fixed with override_path.stat().st_size > 0 check at line 828. Empty .egg/ override files now fall through to shared criteria. Test updated to verify fallback behavior.

#4. README table rendering — Fixed. Paragraph moved below the table.

#5. Docker path fallback test — Rewritten. Now actually tests the Docker fallback path by mocking Path.is_file and Path.read_text for specific paths. The test verifies that when the source tree path doesn't exist, _read_shared_criteria falls through to /app/prompts/.

No new issues found in the delta. The plumbing is complete — verified all callers of _build_review_prompt (2 sites) and _build_agent_prompt (1 site) pass repo_path. The _spawn_reviewer closure correctly captures worktree_repo_path via default argument, and worktree_repo_path is never reassigned after the closure is defined.

— Authored by egg

james-in-a-box · 2026-02-15T18:23:22Z

egg review completed. View run logs

5 previous review(s) hidden.

egg-orchestrator added 14 commits February 15, 2026 09:50

Initialize SDLC contract for issue #659

25709fe

Add analysis draft for issue #659: tune claude.md and prompts

0c6765e

Add analysis draft for issue #659: tune claude.md and prompts

4f3a97f

Add refine analysis for prompt and claude.md audit (#659)

87803b7

Add architect analysis for issue #659 plan phase

9da5d47

Add plan phase agent outputs for issue #659

7e0fd91

Document shared prompt criteria and update CLI references

e1caed6

Add tests for shared prompt criteria and shell script changes

b1aa132

Fix unused variable lint error in test_pipeline_prompts

eb68621

Add integrator output for issue #659 implement phase

c2d1736

Add implement phase check results (all passed)

0b88562

Persist statefiles after implement phase

fc6b56e

Persist statefiles after pr phase

84ecb66

egg-reviewer bot approved these changes Feb 15, 2026

View reviewed changes