Conversation
There was a problem hiding this comment.
Agent-mode design review: No concerns.
This PR adds git infrastructure to keep issue branches up-to-date with main before the agent works. These are standard CI/git operations that don't affect agent prompts, output formats, or behavioral constraints.
(Unable to approve own PR — human reviewer needed for merge.)
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Code Review: Merge main before starting SDLC work
I've reviewed the full diff and examined the surrounding workflow context. Overall, this is a sound approach to addressing stale branches, but there are several issues that need attention.
Issues Requiring Revision
1. Missing merge commit push in sdlc-pipeline.yml init job
File: .github/workflows/sdlc-pipeline.yml, lines 267-276
The init job merges main into the issue branch but does not push the merge commit. This is inconsistent with the work-loop implementation and means the merge commit will be lost if another push occurs before the "Push branch" step completes.
# Current code - no push after merge
if git merge origin/main --no-edit; then
echo "Successfully merged main into ${BRANCH_NAME}"
else
...The work-loop version correctly pushes the merge commit:
if git merge origin/main --no-edit; then
echo "Successfully merged main into ${BRANCH_NAME}"
# Push the merge commit so reviewers and subsequent steps see it
git push origin "${BRANCH_NAME}" || {
git pull --rebase origin "${BRANCH_NAME}"
git push origin "${BRANCH_NAME}"
}Recommendation: Add the same push logic to the init job, or rely solely on the downstream "Push branch" step (which happens shortly after). Given that other operations may commit before "Push branch" runs, pushing immediately after the merge is safer.
2. Race condition in work-loop push fallback
File: .github/workflows/sdlc-work-loop.yml, lines 335-338
The push fallback uses git pull --rebase after a failed push, but this can fail silently or produce unexpected results:
git push origin "${BRANCH_NAME}" || {
git pull --rebase origin "${BRANCH_NAME}"
git push origin "${BRANCH_NAME}"
}Issues:
- If
git pull --rebasefails (e.g., rebase conflicts), the subsequentgit pushwill still run and fail, but the error message will be unclear. - If the rebase succeeds but the second push fails, the job continues with no indication of failure.
- The entire block is in a subshell that doesn't propagate exit codes properly to the outer
ifstatement.
Recommendation: Add explicit error handling:
git push origin "${BRANCH_NAME}" || {
echo "::warning::Initial push failed, attempting pull --rebase and retry..."
if ! git pull --rebase origin "${BRANCH_NAME}"; then
echo "::error::Failed to rebase during push retry"
exit 1
fi
if ! git push origin "${BRANCH_NAME}"; then
echo "::error::Failed to push after rebase"
exit 1
fi
}3. Duplicate merge operations in work-loop
File: .github/workflows/sdlc-work-loop.yml, lines 327-343
With this PR, main is merged:
- In
sdlc-pipeline.ymlinit job (for existing branches) - In
sdlc-work-loop.ymlwork job (every iteration)
This creates redundant merge commits. On the first work-loop iteration after init, the branch will already have main merged. The work-loop merge will:
- Either be a no-op (if no new commits to main)
- Or create another merge commit
While technically harmless, this adds noise to the commit history and wastes CI time on the push.
Recommendation: Consider adding a check before merging:
git fetch origin main
if ! git merge-base --is-ancestor origin/main HEAD; then
echo "Main has commits not in ${BRANCH_NAME}, merging..."
if git merge origin/main --no-edit; then
# ... push logic
fi
else
echo "Branch already contains latest main, skipping merge"
fi4. No conflict recovery path
Both merge operations fail hard on conflict with exit 1. The PR description states conflicts "abort cleanly with a descriptive error," but there's no recovery mechanism or guidance for users.
Observation: While failing on conflict is correct (you can't auto-resolve), consider adding:
- A comment on the issue with conflict details
- Instructions for manual resolution
- A label like
sdlc:merge-conflictfor filtering
This is more of an enhancement, but worth noting that the current implementation leaves users with just a failed workflow run.
Minor Issues / Observations
5. Inconsistent error message formatting
In init job:
echo "::error::Merge conflict merging main into ${BRANCH_NAME}. Manual resolution required."In work-loop:
echo "::error::Merge conflict merging main into ${BRANCH_NAME}. Manual resolution required."These are consistent, which is good. However, consider including a hint about which workflow/job failed, since users may not immediately know where to look.
6. BRANCH_NAME variable usage
In the work-loop merge step (line 330), BRANCH_NAME is referenced but it's set via the env: block on the job level. This works, but the variable reference is using ${BRANCH_NAME} directly rather than ${{ env.BRANCH_NAME }}. While both work in a bash step, using the direct syntax is fine—just noting this is intentional.
Summary
The core approach is sound: merging main into issue branches before work ensures the agent works against current code. However, the implementation needs refinement:
- Critical: Add push after merge in init job (or document why it's omitted)
- Important: Fix error handling in the push fallback
- Suggested: Add check to skip redundant merges
- Suggested: Consider conflict notification improvements
Request changes to address issues #1 and #2 before merging.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
We should allow the merge fixer to autofix merge conflicts instead of immediately bailing |
This comment has been minimized.
This comment has been minimized.
Review Feedback AddressedI've pushed changes addressing all the feedback: Issues Fixed
Not Addressed (Enhancement Suggestions)
— Authored by egg |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-review (agent-mode design): No concerns.
The changes since my last review (ffe471e → 8ca6736) address the code review feedback:
- Push after merge in init job
- Explicit error handling in push fallback
- Skip redundant merges via
merge-base --is-ancestorcheck - Trigger conflict resolution workflow instead of failing immediately
This is CI/git infrastructure that runs before agents start work—it doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.
(Unable to approve own PR — human reviewer needed for merge.)
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-Review: Merge main before starting SDLC work
I've reviewed the changes since my last review at commit ffe471ef. The previous feedback on push handling, redundant merges, and explicit error handling has been addressed. However, the new conflict resolution logic introduces critical issues that need to be fixed.
Critical Issues
1. Init job exits success with missing outputs on conflict
File: .github/workflows/sdlc-pipeline.yml, lines 296-298
When a merge conflict is detected and the conflict resolution workflow is triggered, the step exits with exit 0:
if gh workflow run on-merge-conflict.yml -f pr_number="${CONFLICT_PR_NUMBER}"; then
echo "::notice::Conflict resolution workflow triggered for PR #${CONFLICT_PR_NUMBER}"
# Exit with success - the conflict resolver will handle it
exit 0This causes the init job to succeed, but critical outputs are never set:
contract_exists(set at line 325)current_phase(set at line 415)pr_number(set at line 419)
The downstream work-loop job has this condition:
if: >-
always() &&
needs.init.result == 'success' &&
needs.init.outputs.current_phase != 'pr'Since current_phase is unset/empty, this evaluates to '' != 'pr' which is true, so the work-loop runs with undefined/empty outputs. This will cause failures or unexpected behavior.
Fix: The step should exit 1 to fail the job (preventing work-loop from running), or the conflict handling should be moved to after the outputs are set, exiting the job in a way that skips work-loop.
2. Work-loop continues after triggering conflict resolver
File: .github/workflows/sdlc-work-loop.yml, lines 369-372
Same issue in the work-loop: when conflict resolution is triggered, exit 0 causes the step to succeed, and subsequent steps continue to run:
if gh workflow run on-merge-conflict.yml -f pr_number="${CONFLICT_PR_NUMBER}"; then
echo "::notice::Conflict resolution workflow triggered for PR #${CONFLICT_PR_NUMBER}"
# Exit with success - the conflict resolver will handle it
exit 0After this, the following steps will still execute:
- "Populate contract tasks" (line 384)
- "Check for multi-agent mode" (line 409)
- "Run egg agent" (line 479)
These steps will work on an unmerged branch (since git merge --abort was called), leading to the agent working on stale code—the exact problem this PR aims to solve.
Fix: The step should exit 1 to fail the job. The conflict resolver will handle the conflict, and the next pipeline run will re-attempt the merge.
Minor Issues
3. Inconsistent exit behavior between conflict scenarios
When conflict resolution workflow trigger fails, the code exits with exit 1. When it succeeds, it exits with exit 0. But in both cases, the merge didn't complete and the workflow should stop—the difference is just whether there's a follow-up process.
For consistency and correctness, both should exit non-zero to prevent downstream jobs from running with an unmerged branch.
4. Race condition with conflict resolver
If the init job succeeds (due to exit 0) and the work-loop runs, and simultaneously the conflict resolver runs, both could be pushing to the same branch. This creates a race condition. This is avoided if the workflow fails on conflict as suggested.
Summary
The key change needed: replace exit 0 with exit 1 in both files after triggering the conflict resolution workflow. The workflow_dispatch to on-merge-conflict.yml is fire-and-forget—the current pipeline should fail so it doesn't continue with stale code. The conflict resolver will fix the branch, and the next pipeline trigger will pick up where it left off.
Request changes to address the critical issues before merging.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-review (agent-mode design): No concerns.
Changes since last review (8ca6736 → 51df3f1) fix the critical exit code issue I raised: both workflows now correctly exit 1 after triggering conflict resolution, preventing downstream jobs from running with stale code.
This remains CI/git infrastructure that runs before agents start work—it doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-Review: Changes since commit 8ca6736
Reviewed the delta from 8ca6736 to 51df3f1. The changes correctly address the critical issues from my previous review.
Verified Fixes
1. Init job no longer succeeds with missing outputs on conflict ✓
Changed from exit 0 to exit 1 at line 297 in sdlc-pipeline.yml. This ensures the init job fails when a conflict is detected, preventing the work-loop from running with undefined outputs.
2. Work-loop no longer continues after triggering conflict resolver ✓
Changed from exit 0 to exit 1 at line 371 in sdlc-work-loop.yml. This stops the job from proceeding to subsequent steps (Populate contract tasks, Run egg agent) with an unmerged branch.
3. Consistent exit behavior ✓
Both files now exit with failure in all conflict scenarios, whether or not the conflict resolution workflow was triggered. This is the correct approach: the current pipeline should always stop on conflict, and a separate workflow handles resolution.
No New Issues
The changes are minimal and targeted. No new logic was introduced—only the exit codes were corrected. The conflict resolution workflow trigger remains fire-and-forget as intended, and the next pipeline run will re-attempt the merge after the resolver fixes the branch.
Approve.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
All review feedback has been addressed in previous commits:
No new actionable feedback since the approval at commit 51df3f1. — Authored by egg |
This comment has been minimized.
This comment has been minimized.
|
@james-in-a-box We should allow the merge fixer to autofix merge conflicts instead of immediately bailing |
|
Updated the merge conflict handling in both Changes: When a merge conflict is detected, the workflow now:
This allows the merge fixer to automatically resolve conflicts inline before the pipeline continues, rather than failing immediately and requiring a separate pipeline re-run. Commit: 54ac646 — Authored by egg |
There was a problem hiding this comment.
Agent-mode design review: No concerns.
Changes since last review (51df3f1 → 54ac646) update the conflict handling to wait for the resolver workflow to complete inline rather than bailing immediately. This is a CI efficiency improvement—the pipeline waits for conflicts to be resolved before continuing, avoiding a separate re-run.
This remains git/CI infrastructure that runs before agents start work. It doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Re-Review: Changes since commit 51df3f1
Reviewed the delta from 51df3f14bfbb68c0017cca471dfff9f7aa8bbfa1 to 54ac646. The new approach waits for the conflict resolver to complete instead of bailing immediately. While this addresses @jwbron's feedback to let the merge fixer auto-resolve conflicts, the implementation introduces critical correctness and reliability issues.
Critical Issues
1. Race condition in workflow run identification
Files: .github/workflows/sdlc-pipeline.yml (lines 304-315) and .github/workflows/sdlc-work-loop.yml (lines 378-389)
The code finds the triggered workflow run by fetching the most recent run of on-merge-conflict.yml:
CONFLICT_RUN_ID=$(gh run list \
--workflow on-merge-conflict.yml \
--limit 5 \
--json databaseId,status,createdAt \
--jq 'sort_by(.createdAt) | reverse | .[0].databaseId' 2>/dev/null || echo "")Problem: This picks the most recently created run globally, not the run that was just triggered for this specific PR. If two pipelines detect conflicts simultaneously (different issues/PRs), or if a scheduled/push-triggered conflict check runs concurrently, this code may:
- Watch the wrong workflow run
- Wait indefinitely for a run that resolves a different PR
- Continue with stale code if the wrong run succeeds
- Incorrectly fail if the wrong run fails
Fix: Filter by the PR number. The on-merge-conflict.yml workflow receives pr_number as an input. Use the GitHub API to find runs triggered with that specific input:
CONFLICT_RUN_ID=$(gh api "repos/${GITHUB_REPOSITORY}/actions/workflows/on-merge-conflict.yml/runs" \
--jq ".workflow_runs | map(select(.event == \"workflow_dispatch\")) | sort_by(.created_at) | reverse | .[0].id" \
2>/dev/null || echo "")Even better: capture the run ID from the gh workflow run command output (though this requires parsing or using the --json flag if available).
2. Indefinite blocking of pipeline runs
Files: Both workflow files
The gh run watch command blocks until the conflict resolver completes. The conflict resolver has a 30-minute timeout (from reusable-conflict-resolve.yml). Combined with the 10-second initial sleep and up to 30 seconds of polling (6 attempts × 5 seconds), this step can block for ~31 minutes.
Problems:
- GitHub Actions runners are billed per-minute. Blocking a runner for 30+ minutes while waiting for another workflow is expensive.
- The init job in
sdlc-pipeline.ymlruns inline (not as a separate step that can timeout independently), so downstream jobs cannot start. - If the conflict resolver is queued behind other runs, the wait extends further.
Suggestion: Consider using workflow outputs and triggering a continuation workflow instead of inline waiting, or accept the current exit 1 behavior with a re-trigger mechanism. The inline wait approach has significant operational cost.
3. Missing error handling for git reset --hard
Files: .github/workflows/sdlc-pipeline.yml (line 327) and .github/workflows/sdlc-work-loop.yml (line 401)
After the conflict resolver succeeds:
git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"Problems:
- No error handling if the fetch fails (e.g., network issues, branch deleted)
- No error handling if the reset fails
- No verification that the resolved branch actually contains the expected merge commit
- The
set -euo pipefailat the top should catch failures, but explicit handling with clear error messages would be better for debugging
Suggestion: Add explicit error checking:
if ! git fetch origin "${BRANCH_NAME}"; then
echo "::error::Failed to fetch resolved branch"
exit 1
fi
if ! git reset --hard "origin/${BRANCH_NAME}"; then
echo "::error::Failed to reset to resolved branch"
exit 1
fiImportant Issues
4. Workflow may pick up stale run from previous trigger
The polling loop (lines 304-315 in sdlc-pipeline.yml) starts after a 10-second sleep. If the gh workflow run command triggers successfully but the run hasn't appeared in the API yet, and there's a previous run from an earlier conflict resolution, the code will find and watch that old run instead.
The jq query sort_by(.createdAt) | reverse | .[0] gets the most recent run, but if the new run hasn't registered yet, "most recent" is the wrong run.
Mitigation: Record the timestamp before triggering, then filter for runs created after that timestamp:
TRIGGER_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
if gh workflow run on-merge-conflict.yml ...; then
# ... later in polling loop:
CONFLICT_RUN_ID=$(gh run list ... --jq "[.[] | select(.createdAt > \"${TRIGGER_TIME}\")] | .[0].databaseId")5. No timeout on the polling loop itself
The polling loop tries 6 times with 5-second sleeps (30 seconds total), which is reasonable. However, if the workflow consistently fails to appear (e.g., workflow file doesn't exist, permissions issue), the job will fail after 30 seconds with a vague "Could not find triggered workflow run" error.
Suggestion: The error message at line 318 should suggest possible causes (workflow doesn't exist, permissions, rate limiting).
Minor Issues
6. Duplicate code between sdlc-pipeline.yml and sdlc-work-loop.yml
The conflict resolution logic (lines 287-343 in sdlc-pipeline.yml and lines 357-417 in sdlc-work-loop.yml) is nearly identical. This is a maintainability concern—bugs fixed in one location may not be fixed in the other.
Observation: Consider extracting this into a composite action or reusable workflow step. Not blocking for this PR, but worth noting for future cleanup.
7. The 10-second initial sleep is arbitrary
sleep 10 before polling assumes the workflow run will appear within 10 seconds. This may be too short under heavy GitHub Actions load, or wastefully long if the run appears quickly.
Suggestion: Start polling immediately with exponential backoff, or use a shorter initial delay (2-3 seconds) since the loop already handles retries.
Summary
The new approach of waiting for the conflict resolver to complete is reasonable in concept—it avoids forcing users to wait for a separate pipeline re-trigger. However, the implementation has a critical race condition in identifying the correct workflow run, and the inline blocking approach is operationally expensive.
Required before merge:
- Fix the workflow run identification to filter by PR number (issue #1)
- Add proper error handling for git operations (issue #3)
Strongly recommended:
3. Filter by trigger timestamp to avoid picking up stale runs (issue #4)
4. Consider the operational cost of 30-minute blocking (issue #2)
Request changes to address the critical issues.
— Authored by egg
This comment has been minimized.
This comment has been minimized.
Fixes: 1. Add push after merge in sdlc-pipeline.yml init job (was missing) 2. Fix error handling in work-loop push fallback with explicit checks 3. Add check to skip redundant merges when branch already contains main 4. Trigger conflict resolution workflow instead of bailing on conflicts When merge conflicts occur, the workflow now: - Looks up the PR number for the branch - Triggers the on-merge-conflict.yml workflow to auto-resolve - Only fails if no PR exists or the workflow trigger fails
Change exit 0 to exit 1 after triggering conflict resolution workflow. This prevents downstream jobs/steps from running with an unmerged branch, which would cause the agent to work on stale code. The conflict resolver will handle the conflict asynchronously, and the next pipeline run will re-attempt the merge. Authored-by: egg
When a merge conflict is detected, the workflow now: 1. Triggers the on-merge-conflict.yml workflow 2. Waits for it to complete using gh run watch 3. If successful, pulls the resolved changes and continues 4. If unsuccessful, fails with a clear error This allows the merge fixer to automatically resolve conflicts before the pipeline continues, rather than immediately failing and requiring a separate pipeline re-run.
- Record timestamp before triggering conflict resolution workflow to filter runs created after that time, avoiding picking up stale or concurrent runs - Increase run limit from 5 to 10 to improve chances of finding the run - Add explicit error handling for git fetch and git reset --hard operations - Improve error message when run is not found to suggest possible causes Authored-by: egg
Align TRIGGER_TIME and the if statement with the surrounding code block indentation (18 spaces instead of 16).
b284ff6 to
07b597c
Compare
|
egg conflict resolution completed. View run logs |
Review Feedback StatusAll review feedback has been addressed. The conflict with main was automatically resolved by the conflict resolution workflow. Summary of ChangesThis PR adds git infrastructure to keep issue branches up-to-date with main before the agent works:
Review Feedback Addressed
All automated reviews (agent-mode design and code review) show approval. Ready for human review. — Authored by egg |
|
egg feedback addressed. View run logs 24 previous review(s) hidden. |
There was a problem hiding this comment.
Re-review (agent-mode design): No concerns.
Changes since last review (b284ff6 → HEAD) add autofix attempt tracking, limit enforcement, and re-dispatch logic:
modeinput forchecks-onlyre-runs after autofixautofix_attempts/max_autofix_attemptsin contract schema- Circuit breaker escalation when autofix limit exceeded
- Re-dispatch job to re-run checks after successful autofix
This is all CI/workflow orchestration—managing when agents run and what happens after, not how agents behave. It doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.
— Authored by egg
There was a problem hiding this comment.
Re-Review: Changes since commit b284ff6
Reviewed the delta from b284ff60a16845ed902676f7aa0b2820f9d07a5f to 07b597c. This PR has grown significantly since my last approval—the original merge conflict handling is still correct, but substantial new functionality has been added for autofix tracking and synchronous execution. These new additions introduce several critical issues.
Critical Issues
1. Silent failure in push retry logic during autofix attempt reset
File: sdlc-work-loop.yml, lines 365-368
git push origin "${BRANCH_NAME}" || {
git pull --rebase origin "${BRANCH_NAME}"
git push origin "${BRANCH_NAME}"
}If git pull --rebase fails (e.g., rebase conflicts) or the second push fails, the step continues silently. The reset to 0 may never be persisted, but the workflow proceeds as if it was. This is the same silent fallback pattern I flagged in a previous review for the merge conflict handling, but it was reintroduced here.
Fix: Add explicit error handling:
git push origin "${BRANCH_NAME}" || {
if ! git pull --rebase origin "${BRANCH_NAME}"; then
echo "::error::Failed to rebase during autofix reset push"
exit 1
fi
if ! git push origin "${BRANCH_NAME}"; then
echo "::error::Failed to push autofix reset after rebase"
exit 1
fi
}2. Race condition: autofix attempts counter may double-increment
File: sdlc-work-loop.yml, lines 929-1006 ("Update autofix attempts counter" step)
The step runs git fetch + git reset --hard to get the latest, then reads autofix_attempts, increments it, and pushes. If the autofix workflow itself also modifies the contract (e.g., to record fixes made), this reset will overwrite those changes.
More critically, if two pipelines run concurrently (e.g., rapid re-triggers), both may:
- Read the same
autofix_attemptsvalue - Increment to the same new value
- Race to push
The retry loop (lines 968-1000) attempts to handle this, but it uses reset --hard which discards any local commit, then re-applies the increment to whatever the remote has. This could result in:
- Double-incrementing if two runs overlap
- Missing increments if one run's commit is overwritten
Recommendation: Use atomic operations or file locking. Since this is GitHub Actions, consider using the GitHub API to update a workflow variable or issue label instead of file-based state. Alternatively, check the remote's autofix_attempts value and only increment if it matches the expected pre-increment value.
3. workflow_dispatch runs may fail to find the dispatched run
File: sdlc-work-loop.yml, lines 876-898
The gh run list query filters by headBranch, but for workflow_dispatch events, the run may not have a headBranch set (it depends on workflow configuration). The reusable-autofix.yml workflow is triggered via workflow_dispatch, and if it doesn't check out a branch or specify one, headBranch will be empty or "main".
RUN_ID=$(gh run list \
--workflow reusable-autofix.yml \
--json databaseId,status,createdAt,headBranch \
--jq --arg branch "${BRANCH_NAME}" --arg dispatch_time "${DISPATCH_TIME}" '
[.[] | select(
.headBranch == $branch and ...
)] ...Fix: Verify that reusable-autofix.yml sets headBranch correctly, or remove the branch filter and rely solely on the timestamp filter (with a narrow window).
4. re-dispatch-checks may trigger before autofix attempts counter is updated
File: sdlc-work-loop.yml, lines 1111-1159
The re-dispatch-checks job depends on aggregate-checks, but aggregate-checks depends on check-fixer. The check-fixer job's "Update autofix attempts counter" step runs after the fix step completes. However, the counter update step is NOT a separate job—it's a step within check-fixer.
If the counter update step fails (line 1004: exit 1 on push failure), the check-fixer job fails. But aggregate-checks runs with always(), so it still executes and may still see autofix_succeeded == 'true' (from steps.fix.outputs.fixed). This triggers re-dispatch-checks, which dispatches a new workflow run—but the autofix counter was never incremented, potentially leading to infinite loops.
Fix: Either:
- Make the counter update a separate job that
re-dispatch-checksdepends on - Or set
fixed=falseif the counter update fails
5. Autofix limit check uses stale data
File: sdlc-work-loop.yml, lines 787-809
The "Check autofix attempt limit" step reads from the local checkout, which was done at line 773. If another concurrent run incremented autofix_attempts between checkout and this check, this run will use the stale (lower) value and proceed with autofix, potentially exceeding the actual limit.
Recommendation: Fetch latest before checking:
git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"Add this before reading the contract in the limit check step.
Important Issues
6. Escalation condition may never trigger if fixer job doesn't run
File: sdlc-work-loop.yml, lines 2132-2137
if: >-
always() &&
needs.aggregate-checks.result == 'success' &&
needs.aggregate-checks.outputs.autofix_limit_exceeded == 'true' &&
needs.aggregate-checks.outputs.all_passed != 'true'autofix_limit_exceeded is only set in check-fixer.check-limit.outputs.exceeded. But check-fixer only runs if checks fail (see line 743-752). If the limit was exceeded on a previous run and the current run has no failing checks (somehow), check-fixer won't run, autofix_limit_exceeded will be empty/false, and the escalation won't trigger even though the limit was previously exceeded.
This is a minor edge case, but worth noting for correctness.
7. Mode parameter default allows unexpected behavior
File: sdlc-work-loop.yml, lines 105-110
mode:
description: 'Execution mode: full (work+checks+review) or checks-only'
required: false
type: string
default: "full"The workflow accepts arbitrary mode values—there's no validation. If someone passes mode: "foo", the workflow will:
- Skip the work job (line 277:
if: needs.resolve-inputs.outputs.mode != 'checks-only')... wait, this would NOT skip work sincefoo != 'checks-only'is true.
Actually this is fine—only checks-only skips work. But the lack of validation means typos like check-only (singular) would run the full workflow unexpectedly. Consider adding validation.
Minor Issues
8. Inconsistent error message formatting
In the autofix trigger step, some errors use ::warning:: while others use ::error::. The messages should be consistent—use ::error:: for failures that cause the step to exit early.
9. BRANCH_PREFIX output added but not used in check-fixer
Line 763 sets BRANCH_PREFIX: ${{ needs.resolve-inputs.outputs.branch_prefix }} but it's never referenced in the check-fixer job (the comment at line 862 says "Note: branch_prefix is only available for workflow_call, not workflow_dispatch"). Either remove the unused env var or use it.
Summary
The original merge conflict handling approved at b284ff6 remains correct. However, the new autofix tracking functionality introduces race conditions and failure modes that need attention:
Required before merge:
- Fix silent push failure in autofix reset (issue #1)
- Address race condition in counter increment (issue #2)
- Verify headBranch filtering works for workflow_dispatch (issue #3)
- Ensure counter update failure prevents re-dispatch (issue #4)
- Fetch latest before checking autofix limit (issue #5)
Request changes.
— Authored by egg
|
egg agent-mode-design completed. View run logs 24 previous review(s) hidden. |
|
egg review completed. View run logs 24 previous review(s) hidden. |
| egg is addressing review feedback... |
Fix critical issues raised in code review: 1. Add explicit error handling for silent push failures in: - Reset autofix attempts on manual trigger - Populate contract tasks step 2. Remove headBranch filtering for workflow_dispatch runs since workflow_dispatch doesn't set headBranch to the issue branch. Rely solely on timestamp filtering to identify triggered runs. 3. Add counter_updated output to Update autofix attempts counter step and factor it into the fixed output. This prevents re-dispatch if the counter update fails (issue #4). 4. Fetch latest branch state before checking autofix attempt limit to avoid proceeding with stale counter values (issue #5). 5. Improve error message when workflow run is not found to include timestamp and suggest possible causes.
Review Feedback AddressedPushed commit 0369170 addressing the critical issues from the latest review: Fixes Applied
Issue #2 (Race condition in counter increment)The current implementation already handles this correctly via optimistic locking:
The reviewer suggested atomic operations or file locking, but the current approach is a valid pattern for this use case and would require significant architectural changes to replace. — Authored by egg |
|
egg feedback addressed. View run logs 24 previous review(s) hidden. |
There was a problem hiding this comment.
Re-review (agent-mode design): No concerns.
Changes since last review (07b597c → 0369170) address code review feedback on autofix workflow robustness:
- Explicit error handling for push retry failures
- Removed
headBranchfiltering forworkflow_dispatchruns (timestamp filtering only) - Added
counter_updatedoutput to prevent re-dispatch on counter update failure - Fetch latest branch state before checking autofix limit
These are all CI/workflow improvements—better error handling, race condition fixes, and output chaining. None of this affects agent prompts, output formats, or behavioral constraints. The changes make the workflow more robust without changing how agents operate.
— Authored by egg
There was a problem hiding this comment.
Re-Review: Changes since commit 07b597c
Reviewed the delta to commit 0369170. The changes correctly address all five critical issues from my previous review.
Verified Fixes
1. Silent push failure in autofix reset (issue #1) ✓
Lines 365-375 in sdlc-work-loop.yml now have explicit error handling:
git push origin "${BRANCH_NAME}" || {
echo "::warning::Initial push failed, attempting pull --rebase and retry..."
if ! git pull --rebase origin "${BRANCH_NAME}"; then
echo "::error::Failed to rebase during autofix reset push"
exit 1
fi
...
}The same pattern was correctly applied to "Populate contract tasks" (lines 497-506).
2. headBranch filtering for workflow_dispatch (issue #3) ✓
Lines 897-911: The headBranch filter has been removed. The jq query now relies solely on timestamp filtering:
--jq --arg dispatch_time "${DISPATCH_TIME}" '
[.[] | select(
(.status == "queued" or .status == "in_progress" or .status == "completed") and
.createdAt >= $dispatch_time
)] | sort_by(.createdAt) | .[0].databaseId // empty
'3. Counter update failure prevents re-dispatch (issue #4) ✓
Lines 949-1032: The step now has id: update-counter and sets counter_updated output. The job-level fixed output (lines 768-770) correctly incorporates this:
fixed: ${{ (steps.fix.outputs.fixed == 'true' && steps.update-counter.outputs.counter_updated != 'false') && 'true' || steps.limit-exceeded.outputs.fixed || 'false' }}This prevents re-dispatch when the counter update fails.
4. Fetch latest before checking autofix limit (issue #5) ✓
Lines 807-809 in check-limit step now fetch and reset before reading autofix_attempts:
git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"5. Improved error messaging ✓
Line 921: The warning message now includes the dispatch timestamp and suggests possible causes.
Previous Issues: All Addressed
For completeness, confirming fixes from earlier reviews remain in place:
- ✓ Push after merge in init job
- ✓ Skip redundant merges via
merge-base --is-ancestorcheck - ✓ Trigger and wait for conflict resolution workflow
- ✓ Timestamp filtering to avoid race conditions
- ✓ Explicit error handling for git fetch/reset after conflict resolution
Issue #2 (Race condition in counter increment)
The author notes the optimistic locking pattern is intentional: on push failure, fetch latest, re-read counter, increment from fresh value, retry. This is a valid concurrency pattern for this use case.
No New Issues
The changes are targeted and correct. No new logic was introduced—only the specific fixes requested.
Approve.
— Authored by egg
|
egg agent-mode-design completed. View run logs 24 previous review(s) hidden. |
|
egg review completed. View run logs 24 previous review(s) hidden. |
| egg is addressing review feedback... |
Feedback Addressing CompleteAll review feedback has been addressed. Latest automated reviews (commit 0369170) show approval:
No new actionable feedback since the last commit. Ready for human review. — Authored by egg |
|
egg feedback addressed. View run logs 24 previous review(s) hidden. |
Update documentation to reflect changes from PR #493 which added automatic main branch merging before SDLC work and synchronous conflict resolution. Changes: - Document automatic main branch merging in sdlc-pipeline.md - Describe conflict resolution workflow integration in both guides - Update github-automation.md to list SDLC pipeline as a trigger mode Triggered by: 0adf69e Authored-by: egg
* docs: Update SDLC pipeline docs for auto-merge behavior [doc-updater] Update documentation to reflect changes from PR #493 which added automatic main branch merging before SDLC work and synchronous conflict resolution. Changes: - Document automatic main branch merging in sdlc-pipeline.md - Describe conflict resolution workflow integration in both guides - Update github-automation.md to list SDLC pipeline as a trigger mode Triggered by: 0adf69e Authored-by: egg * docs: Address review feedback on auto-merge documentation - Remove inaccurate "(with timeout)" claim since gh run watch has no explicit timeout configured - Add explanation that PR is required for workflow_dispatch targeting - Clarify that SDLC pipeline "attempts to merge" rather than "merges" to more precisely describe the conflict-triggering scenario Authored-by: egg * Update Conflict Resolver trigger in Workflows Overview table Add SDLC pipeline integration as the first trigger in the table entry to match the detailed Trigger Modes section later in the document. --------- Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com> Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
Summary
git merge origin/mainstep insdlc-pipeline.ymlinit job when an existing issue branch is checked out, so the branch starts with the latest main codesdlc-work-loop.ymlwork job, right before the agent runs, ensuring every work loop iteration incorporates the latest mainPreviously, when an issue branch already existed, the pipeline would check it out and pull its own latest commits but never merge in changes from main. This meant the agent could work on stale code, leading to merge conflicts discovered only at the end of the implement phase. By merging main upfront, conflicts surface early and the agent always works against current code.
Both merge steps abort cleanly on conflict and fail the job with a clear error, rather than silently proceeding with outdated code.
Issue: none
Test plan:
Authored-by: egg