Unify prompt criteria and trim CLAUDE.md for non-SDLC tasks (#659)#725
Unify prompt criteria and trim CLAUDE.md for non-SDLC tasks (#659)#725
Conversation
Extracts review criteria (code review, agent-design, contract verification, autofixer rules) from duplicated inline content in GHA bash scripts and orchestrator Python functions into shared markdown files under shared/prompts/. Both systems now read from the same source of truth with fallback to inline defaults for rollout safety. GHA scripts use script-relative paths ($SCRIPT_DIR/../shared/prompts/) to read criteria from the trusted checkout, preserving the security model. User override files (.egg/*-rules.md) retain priority over shared defaults. Conditionally excludes contract.md and orchestrator.md from CLAUDE.md when EGG_PIPELINE_ID is not set, reducing token usage for non-SDLC invocations. Removes dead code in _build_phase_prompt() where is_local branches for refine and plan phases produced identical output. Adds orchestrator Dockerfile COPY for shared/prompts/ to make criteria available in the production container. Authored-by: egg
There was a problem hiding this comment.
No agent-mode design concerns.
This PR aligns well with agent-mode design principles:
- Shared infrastructure, not pre-fetching: The shared criteria files are small markdown guidelines (~1-2KB each) that orient the agent on what to review. This matches the "good pattern: shared infrastructure" called out in
docs/guides/agent-mode-design.md. - CLAUDE.md trimming: Conditionally excluding irrelevant SDLC sections for non-pipeline agents reduces cognitive noise without constraining agent flexibility.
- No anti-patterns introduced: No large pre-fetched data, no structured output for humans, no post-processing pipelines, no rigid procedures.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Review: Unify prompt criteria and trim CLAUDE.md (#659)
The overall approach is sound — extracting shared criteria into markdown files is the right level of abstraction, and the conditional CLAUDE.md assembly is a clean win. However, there are several issues that should be addressed before merging.
1. repo_path not propagated through _get_review_criteria_for_type() — user overrides silently ignored in orchestrator reviews
File: orchestrator/routes/pipelines.py lines 990-1005
_get_code_review_criteria() and _get_contract_review_criteria() now accept repo_path and use it to load .egg/review-rules.md user overrides. However, the dispatcher _get_review_criteria_for_type() calls them without repo_path:
def _get_review_criteria_for_type(reviewer_type: str, phase: str) -> str:
...
elif reviewer_type == "code":
return _get_code_review_criteria() # ← no repo_path
elif reviewer_type == "contract":
return _get_contract_review_criteria() # ← no repo_pathThis is called from _build_review_prompt() at line 1166, which also has no repo_path parameter. The user-override mechanism that was added to the orchestrator is unreachable through the production code path. The feature only works when calling _get_code_review_criteria(repo_path=...) directly (i.e., only in tests).
This is either dead code or a missing plumbing fix. Either propagate repo_path through _get_review_criteria_for_type() and _build_review_prompt() to make the feature work, or remove the repo_path parameter from these functions and document that user overrides are GHA-only (matching pre-existing behavior).
2. _build_autofix_prompt hardcodes repo_path=None, making user overrides unreachable
File: orchestrator/routes/pipelines.py lines 2431-2434
autofixer_rules = _read_shared_criteria(
"autofixer-rules.md",
user_override="autofixer-rules.md",
repo_path=None, # ← always None
)The function receives a repo parameter (line 2381) that is available, but the call hardcodes repo_path=None. Passing user_override="autofixer-rules.md" is pointless since the override check is gated by if user_override and repo_path:. This is misleading — it looks like user overrides are supported when they aren't.
Either pass the repo parameter through (after computing a usable repo path), or remove the user_override argument to avoid the false promise.
3. Empty user override file returns empty string, silently producing an empty criteria section
File: orchestrator/routes/pipelines.py lines 826-829
if user_override and repo_path:
override_path = Path(repo_path) / ".egg" / user_override
if override_path.is_file():
return override_path.read_text() # returns "" for empty fileIf a user creates an empty .egg/review-rules.md file (e.g., by accident, or to "clear" defaults), _read_shared_criteria returns "" (empty string). This passes the if content is not None: check in callers, producing a review prompt with an empty criteria section. The test at line 2190-2203 actually documents this behavior as expected.
An empty criteria section in a review prompt would produce a low-quality review. Consider treating empty override files the same as absent ones: if override_path.is_file() and override_path.stat().st_size > 0:, or add a check at the caller: if content is not None and content.strip():.
4. action/README.md — paragraph inserted inside a markdown table breaks rendering
File: action/README.md
The diff inserts a paragraph between table rows:
| `build-contract-verification-prompt.sh` | Builds prompts for contract verification reviews |
Prompt builders load review criteria from `shared/prompts/` ...
| **Convention Documents** | |
| `review-conventions.md` | Code review conventions and guidelines |This breaks the markdown table. The parser will end the table at the blank line, render the paragraph, then render the remaining rows as a malformed second table (no header row/separator). Move the paragraph either before or after the entire table, or make it a note below the table.
5. Test for Docker path fallback is a no-op
File: orchestrator/tests/test_pipeline_prompts.py lines 2158-2188
The test_docker_path_fallback method has multiple abandoned patching attempts (including with ... pass blocks) and ends with a simple assert _read_shared_criteria("definitely-does-not-exist.md") is None, which is just a duplicate of test_returns_none_for_missing_file. The test name says it tests Docker path fallback, but it doesn't actually test that code path. Either write a working test for the Docker fallback or remove this method.
6. Path.__new__ patching in entrypoint tests is fragile
File: tests/sandbox/test_entrypoint.py lines 3224-3232, 3267-3275, etc.
The tests patch Path.__new__ to redirect /opt/claude-rules to a temp directory. This is brittle — it intercepts all Path construction globally, which could interfere with other Path operations during test execution (file reads, symlink creation, etc.). A more robust approach would be to extract the rules directory path as a parameter or module-level constant in setup_agent_rules() so tests can override it cleanly. This is a pre-existing design issue that the tests work around, so it's not blocking, but worth noting as tech debt.
7. Minor: Missing trailing newline in shared prompt files
Files: shared/prompts/agent-design-criteria.md, shared/prompts/autofixer-rules.md, shared/prompts/code-review-criteria.md, shared/prompts/contract-review-criteria.md
These files don't end with a trailing newline. When concatenated into prompts (especially via cat in shell scripts), this can cause the last line to merge with subsequent content. Not critical since markdown rendering is forgiving, but conventional practice is to end files with a newline.
Summary
Issues #1 and #2 are correctness bugs: the user-override feature for orchestrator reviews is advertised (parameters exist, tests pass) but is unreachable through production code paths. Issue #4 is a documentation rendering bug. Issue #3 is an edge case that could produce degraded behavior. Issues #5-7 are minor.
The core design — shared markdown files with three-tier lookup (user override → source tree → Docker) and inline fallback — is well-structured and the shell script changes are clean.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Contract Verification Review
Verified PR #725 against the SDLC contract for issue #659. The implementation covers all 5 phases and 18 tasks from the task planner contract.
Acceptance Criteria Verification
| Criterion | Description | Status |
|---|---|---|
| ac-1 | Shared prompt criteria files exist in shared/prompts/ |
Verified |
| ac-2 | GHA prompt scripts read criteria from shared files with fallback | Verified |
| ac-3 | Orchestrator criteria functions read from shared files with fallback | Verified (minor gap noted) |
| ac-4 | .egg/*-rules.md user-override mechanism works in both systems |
Verified (minor gap noted) |
| ac-5 | CLAUDE.md excludes contract.md and orchestrator.md for non-SDLC |
Verified |
| ac-6 | Dead code (redundant is_local branches) removed |
Verified |
| ac-7 | All tests pass; new tests added for shared file loading, fallback, conditional assembly | Verified |
Detailed Findings
AC-1: Shared Prompt Files — All 4 files exist and contain the correct content:
shared/prompts/code-review-criteria.md— 4 review categories + How to Review + Skip sectionsshared/prompts/agent-design-criteria.md— 5 anti-patterns + What to Skip sectionshared/prompts/contract-review-criteria.md— Contract verification checklist (intentionally output-format-agnostic per file header)shared/prompts/autofixer-rules.md— Auto-fixable + report-only categories with lock file and build failure items
AC-2: GHA Scripts — All 4 scripts correctly implement the shared file → inline fallback pattern:
build-review-prompt.sh— shared file +.egg/review-rules.mdoverride + fallback ✓build-autofixer-prompt.sh— shared file +.egg/autofixer-rules.mdoverride + fallback ✓build-agent-mode-design-review-prompt.sh— shared file + fallback (no override, as designed) ✓build-contract-verification-prompt.sh— shared file +.egg/contract-rules.mdoverride + fallback ✓
AC-3 & AC-4: Orchestrator — _read_shared_criteria() helper correctly implements 3-tier lookup (user override → source tree → Docker path). All criteria functions use it correctly, except:
Advisory (non-blocking):
_build_autofix_prompt()atorchestrator/routes/pipelines.py:2434passesrepo_path=Noneto_read_shared_criteria(), which means the.egg/autofixer-rules.mduser override will never be found (the guardif user_override and repo_path:short-circuits). Therepoparameter (GitHub identifier likejwbron/egg) is available but not the filesystem path. Also missing alogger.warning()when falling back to inline defaults, unlike the other three criteria functions. The shared file itself is still found via source tree / Docker path, so primary functionality is unaffected.
AC-5: Conditional CLAUDE.md — sandbox/entrypoint.py:setup_agent_rules() correctly checks os.environ.get("EGG_PIPELINE_ID") and conditionally extends rules_order with ["contract.md", "orchestrator.md"].
AC-6: Dead Code — Refine and plan phase branches consolidated into single if is_local and phase in ("refine", "plan") block, eliminating duplicate code.
AC-7: Tests — implement-results.json confirms 3366 passed, 0 failed. New test coverage includes:
tests/action/test_build_review_prompt.py:TestSharedCriteriaLoading— shared file, fallback, overridetests/action/test_build_agent_mode_design_review_prompt.py:TestSharedCriteriaLoading— shared file, fallbackorchestrator/tests/test_pipeline_prompts.py:TestReadSharedCriteria+TestSharedPromptFileContent— comprehensive coverage of helper function and all criteria functionstests/sandbox/test_entrypoint.py:TestSetupAgentRules— conditional assembly with/withoutEGG_PIPELINE_ID
Additional Notes
- Dockerfile correctly includes
COPY shared/prompts/ ./prompts/for production container - All agent executions (coder, tester, documenter, integrator) completed successfully
- No contract violations found
Verdict
All 7 acceptance criteria are verified. The repo_path=None issue in _build_autofix_prompt is a minor gap that doesn't affect core functionality (shared file is still found; only user override is broken for this one function). This can be addressed in a follow-up.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Addressed all review feedback in efedf6a:
#1. repo_path not propagated through review criteria dispatcher — Agreed. Added repo_path parameter to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both call sites (multi-agent prompt builder and parallel reviewer spawner) now pass str(worktree_repo_path) through the chain, making user overrides reachable in production.
#2. _build_autofix_prompt hardcodes repo_path=None — Agreed. Added repo_path parameter to _build_autofix_prompt() and updated the call site to pass str(worktree_repo_path). Also added the missing logger.warning() for the inline fallback path (noted in the contract review).
#3. Empty user override file returns empty string — Agreed. Added st_size > 0 check to _read_shared_criteria() so empty override files are treated as absent. Updated the test to verify the fallback behavior.
#4. README table broken by inline paragraph — Agreed. Moved the paragraph below the table.
#5. Docker path fallback test is a no-op — Agreed. Rewrote the test to mock Path.is_file and Path.read_text for the specific Docker path, verifying the fallback actually reaches and reads from /app/prompts/.
#6. Path.__new__ patching in entrypoint tests — Acknowledged as tech debt. Not addressed in this commit.
#7. Missing trailing newlines in shared prompt files — Verified all four files already end with \n. No change needed.
— Authored by egg
There was a problem hiding this comment.
No agent-mode design concerns in the delta since last review.
The changes (commit efedf6a) address prior code review feedback: repo_path propagation through the criteria dispatch chain, empty override file handling, and test fixes. All of this is standard parameter plumbing for shared infrastructure — a good pattern per the agent-mode design guidelines.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Contract Re-verification: PR #725 (Incremental)
Re-verified PR #725 at commit efedf6ae8b9977025e9067d2c9a27a0940274b44, delta from previously reviewed 84ecb662fa2216b902b7ccada7ad575016449286.
Delta Summary
One commit (efedf6a) addressing 5 of 7 issues raised in the prior code review. Changes span 3 files:
orchestrator/routes/pipelines.py—repo_pathparameter propagation + empty override guardorchestrator/tests/test_pipeline_prompts.py— Rewritten Docker fallback + empty override testsaction/README.md— Table formatting fix
Acceptance Criteria Re-verification
| Criterion | Description | Status | Notes |
|---|---|---|---|
| ac-1 | Shared prompt criteria files exist in shared/prompts/ |
Verified | No regression — files unchanged in delta |
| ac-2 | GHA prompt scripts read criteria from shared files with fallback | Verified | No regression — scripts unchanged in delta |
| ac-3 | Orchestrator criteria functions read from shared files with fallback | Verified | Strengthened — repo_path now propagated through full call chain |
| ac-4 | .egg/*-rules.md user-override mechanism works in both systems |
Verified | Fixed — previously unreachable in orchestrator; now plumbed through _get_review_criteria_for_type() → _build_review_prompt() → _build_agent_prompt() → _run_multi_agent_phase() and parallel reviewer spawner |
| ac-5 | CLAUDE.md excludes contract.md and orchestrator.md for non-SDLC |
Verified | No regression — unchanged in delta |
| ac-6 | Dead code removed | Verified | No regression — unchanged in delta |
| ac-7 | All tests pass; new tests added | Verified | 36/36 tests pass in test_pipeline_prompts.py. Docker fallback test rewritten to actually exercise the code path. Empty override test updated to verify fallback behavior. |
Detailed Verification of Delta Changes
Issue #1 (repo_path propagation) — Fixed correctly.
_get_review_criteria_for_type()now acceptsrepo_path: str | None = Noneand passes it to_get_code_review_criteria()and_get_contract_review_criteria()_build_review_prompt()accepts and passesrepo_pathto the criteria dispatcher_build_agent_prompt()accepts and passesrepo_pathto_build_review_prompt()- Both call sites in
_run_pipeline()passstr(worktree_repo_path)— the multi-agent phase builder and the parallel reviewer spawner
Issue #2 (autofix repo_path=None) — Fixed correctly.
_build_autofix_prompt()now acceptsrepo_path: str | None = Noneand passes it to_read_shared_criteria()instead ofNone- Call site in
_run_pipeline()passesstr(worktree_repo_path) - Added
logger.warning("Shared autofixer-rules.md not found, using inline fallback")for the inline fallback path
Issue #3 (empty override file) — Fixed correctly.
_read_shared_criteria()now checksoverride_path.stat().st_size > 0in addition tois_file()- Empty override files are treated as absent and fall through to shared/Docker/inline paths
Issue #4 (README table) — Fixed correctly.
- Paragraph moved below the complete table; table now renders correctly
Issue #5 (Docker fallback test) — Fixed correctly.
- Test now mocks
Path.is_fileandPath.read_textto simulate the Docker path (/app/prompts/) being available while the source tree path is absent - Asserts the Docker content is returned
Previously Noted Items (Unchanged)
- Issue #6 (Path.new patching) — Acknowledged as tech debt; not addressed (appropriate for follow-up)
- Issue #7 (trailing newlines) — Verified pre-existing; no change needed
Regression Check
No regressions found. All 7 previously verified acceptance criteria remain valid. The delta strictly addresses prior review feedback without introducing new functionality or breaking existing behavior.
Verdict
All acceptance criteria verified. Prior review issues #1–#5 addressed correctly. No contract violations.
— Authored by egg
There was a problem hiding this comment.
Re-review: All previous issues addressed
Reviewed the delta from 84ecb66 to efedf6a. All five actionable issues from the previous review have been properly fixed:
#1. repo_path propagation — Fixed correctly. repo_path parameter added to _get_review_criteria_for_type(), _build_review_prompt(), and _build_agent_prompt(). Both production call sites (_run_multi_agent_phase at line 1990 and _spawn_reviewer at line 3333) pass str(worktree_repo_path). User overrides are now reachable through all code paths.
#2. _build_autofix_prompt repo_path=None — Fixed. repo_path parameter added to the function signature, call site at line 3242 passes str(worktree_repo_path). Missing logger.warning() also added for the inline fallback path.
#3. Empty override file handling — Fixed with override_path.stat().st_size > 0 check at line 828. Empty .egg/ override files now fall through to shared criteria. Test updated to verify fallback behavior.
#4. README table rendering — Fixed. Paragraph moved below the table.
#5. Docker path fallback test — Rewritten. Now actually tests the Docker fallback path by mocking Path.is_file and Path.read_text for specific paths. The test verifies that when the source tree path doesn't exist, _read_shared_criteria falls through to /app/prompts/.
No new issues found in the delta. The plumbing is complete — verified all callers of _build_review_prompt (2 sites) and _build_agent_prompt (1 site) pass repo_path. The _spawn_reviewer closure correctly captures worktree_repo_path via default argument, and worktree_repo_path is never reassigned after the closure is defined.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
|
egg review completed. View run logs 5 previous review(s) hidden. |
Summary
shared/prompts/*.mdfiles (code-review, autofixer, agent-design, contract-review)contract.mdandorchestrator.mdsections whenEGG_PIPELINE_IDis not set, reducing ~6 KB of irrelevant contextaction/build-*.sh) and orchestrator (routes/pipelines.py) now read from the same source files with fallback to inline defaultsPipeline artifacts included
This PR includes SDLC pipeline state files from the issue-659 run:
.egg-state/contracts/659.json.egg-state/drafts/659-analysis.md.egg-state/drafts/659-plan-*.json(architect, risk-analyst, task-planner).egg-state/agent-outputs/integrator-output.json,.egg-state/checks/implement-results.jsonNote: The implementation pipeline completed (coder, tester, documenter, integrator all passed) but the review phase was not reached due to token limits.
Closes #659
Test plan
tests/action/test_build_*.py)orchestrator/tests/test_pipeline_prompts.py)tests/sandbox/test_entrypoint.py)🤖 Generated with Claude Code