Skip to content

Merge main before starting SDLC work#493

Merged
jwbron merged 7 commits intomainfrom
egg/merge-main-before-work
Feb 11, 2026
Merged

Merge main before starting SDLC work#493
jwbron merged 7 commits intomainfrom
egg/merge-main-before-work

Conversation

@james-in-a-box
Copy link
Contributor

Summary

  • Add a git merge origin/main step in sdlc-pipeline.yml init job when an existing issue branch is checked out, so the branch starts with the latest main code
  • Add a "Merge latest main into issue branch" step in sdlc-work-loop.yml work job, right before the agent runs, ensuring every work loop iteration incorporates the latest main

Previously, when an issue branch already existed, the pipeline would check it out and pull its own latest commits but never merge in changes from main. This meant the agent could work on stale code, leading to merge conflicts discovered only at the end of the implement phase. By merging main upfront, conflicts surface early and the agent always works against current code.

Both merge steps abort cleanly on conflict and fail the job with a clear error, rather than silently proceeding with outdated code.

Issue: none

Test plan:

  • Verify the pipeline initializes correctly for a new issue (new branch path unchanged)
  • Verify an existing issue branch gets main merged in during init
  • Verify the work loop merges main before the agent starts work
  • Verify merge conflicts abort cleanly with a descriptive error

Authored-by: egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent-mode design review: No concerns.

This PR adds git infrastructure to keep issue branches up-to-date with main before the agent works. These are standard CI/git operations that don't affect agent prompts, output formats, or behavioral constraints.

(Unable to approve own PR — human reviewer needed for merge.)

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Merge main before starting SDLC work

I've reviewed the full diff and examined the surrounding workflow context. Overall, this is a sound approach to addressing stale branches, but there are several issues that need attention.

Issues Requiring Revision

1. Missing merge commit push in sdlc-pipeline.yml init job

File: .github/workflows/sdlc-pipeline.yml, lines 267-276

The init job merges main into the issue branch but does not push the merge commit. This is inconsistent with the work-loop implementation and means the merge commit will be lost if another push occurs before the "Push branch" step completes.

# Current code - no push after merge
if git merge origin/main --no-edit; then
  echo "Successfully merged main into ${BRANCH_NAME}"
else
  ...

The work-loop version correctly pushes the merge commit:

if git merge origin/main --no-edit; then
  echo "Successfully merged main into ${BRANCH_NAME}"
  # Push the merge commit so reviewers and subsequent steps see it
  git push origin "${BRANCH_NAME}" || {
    git pull --rebase origin "${BRANCH_NAME}"
    git push origin "${BRANCH_NAME}"
  }

Recommendation: Add the same push logic to the init job, or rely solely on the downstream "Push branch" step (which happens shortly after). Given that other operations may commit before "Push branch" runs, pushing immediately after the merge is safer.

2. Race condition in work-loop push fallback

File: .github/workflows/sdlc-work-loop.yml, lines 335-338

The push fallback uses git pull --rebase after a failed push, but this can fail silently or produce unexpected results:

git push origin "${BRANCH_NAME}" || {
  git pull --rebase origin "${BRANCH_NAME}"
  git push origin "${BRANCH_NAME}"
}

Issues:

  • If git pull --rebase fails (e.g., rebase conflicts), the subsequent git push will still run and fail, but the error message will be unclear.
  • If the rebase succeeds but the second push fails, the job continues with no indication of failure.
  • The entire block is in a subshell that doesn't propagate exit codes properly to the outer if statement.

Recommendation: Add explicit error handling:

git push origin "${BRANCH_NAME}" || {
  echo "::warning::Initial push failed, attempting pull --rebase and retry..."
  if ! git pull --rebase origin "${BRANCH_NAME}"; then
    echo "::error::Failed to rebase during push retry"
    exit 1
  fi
  if ! git push origin "${BRANCH_NAME}"; then
    echo "::error::Failed to push after rebase"
    exit 1
  fi
}

3. Duplicate merge operations in work-loop

File: .github/workflows/sdlc-work-loop.yml, lines 327-343

With this PR, main is merged:

  1. In sdlc-pipeline.yml init job (for existing branches)
  2. In sdlc-work-loop.yml work job (every iteration)

This creates redundant merge commits. On the first work-loop iteration after init, the branch will already have main merged. The work-loop merge will:

  • Either be a no-op (if no new commits to main)
  • Or create another merge commit

While technically harmless, this adds noise to the commit history and wastes CI time on the push.

Recommendation: Consider adding a check before merging:

git fetch origin main
if ! git merge-base --is-ancestor origin/main HEAD; then
  echo "Main has commits not in ${BRANCH_NAME}, merging..."
  if git merge origin/main --no-edit; then
    # ... push logic
  fi
else
  echo "Branch already contains latest main, skipping merge"
fi

4. No conflict recovery path

Both merge operations fail hard on conflict with exit 1. The PR description states conflicts "abort cleanly with a descriptive error," but there's no recovery mechanism or guidance for users.

Observation: While failing on conflict is correct (you can't auto-resolve), consider adding:

  • A comment on the issue with conflict details
  • Instructions for manual resolution
  • A label like sdlc:merge-conflict for filtering

This is more of an enhancement, but worth noting that the current implementation leaves users with just a failed workflow run.

Minor Issues / Observations

5. Inconsistent error message formatting

In init job:

echo "::error::Merge conflict merging main into ${BRANCH_NAME}. Manual resolution required."

In work-loop:

echo "::error::Merge conflict merging main into ${BRANCH_NAME}. Manual resolution required."

These are consistent, which is good. However, consider including a hint about which workflow/job failed, since users may not immediately know where to look.

6. BRANCH_NAME variable usage

In the work-loop merge step (line 330), BRANCH_NAME is referenced but it's set via the env: block on the job level. This works, but the variable reference is using ${BRANCH_NAME} directly rather than ${{ env.BRANCH_NAME }}. While both work in a bash step, using the direct syntax is fine—just noting this is intentional.

Summary

The core approach is sound: merging main into issue branches before work ensures the agent works against current code. However, the implementation needs refinement:

  1. Critical: Add push after merge in init job (or document why it's omitted)
  2. Important: Fix error handling in the push fallback
  3. Suggested: Add check to skip redundant merges
  4. Suggested: Consider conflict notification improvements

Request changes to address issues #1 and #2 before merging.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@jwbron
Copy link
Owner

jwbron commented Feb 10, 2026

We should allow the merge fixer to autofix merge conflicts instead of immediately bailing

@james-in-a-box

This comment has been minimized.

@james-in-a-box
Copy link
Contributor Author

Review Feedback Addressed

I've pushed changes addressing all the feedback:

Issues Fixed

  1. Missing merge commit push in init job - Added push logic with retry handling after successful merge in sdlc-pipeline.yml.

  2. Race condition in work-loop push fallback - Replaced the silent fallback with explicit error handling:

    git push origin "${BRANCH_NAME}" || {
      echo "::warning::Initial push failed, attempting pull --rebase and retry..."
      if ! git pull --rebase origin "${BRANCH_NAME}"; then
        echo "::error::Failed to rebase during push retry"
        exit 1
      fi
      if ! git push origin "${BRANCH_NAME}"; then
        echo "::error::Failed to push after rebase"
        exit 1
      fi
    }
  3. Duplicate merge operations - Added git merge-base --is-ancestor check to skip merge when branch already contains main:

    if git merge-base --is-ancestor origin/main HEAD; then
      echo "Branch already contains latest main, skipping merge"
    fi
  4. Conflict resolution (per @jwbron's feedback) - Instead of immediately bailing on conflicts, the workflow now triggers the existing on-merge-conflict.yml workflow:

    • Looks up the PR number for the branch via gh pr list
    • Triggers conflict resolution workflow if PR exists
    • Only fails if no PR exists or workflow trigger fails

Not Addressed (Enhancement Suggestions)

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (agent-mode design): No concerns.

The changes since my last review (ffe471e8ca6736) address the code review feedback:

  • Push after merge in init job
  • Explicit error handling in push fallback
  • Skip redundant merges via merge-base --is-ancestor check
  • Trigger conflict resolution workflow instead of failing immediately

This is CI/git infrastructure that runs before agents start work—it doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.

(Unable to approve own PR — human reviewer needed for merge.)

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Merge main before starting SDLC work

I've reviewed the changes since my last review at commit ffe471ef. The previous feedback on push handling, redundant merges, and explicit error handling has been addressed. However, the new conflict resolution logic introduces critical issues that need to be fixed.

Critical Issues

1. Init job exits success with missing outputs on conflict

File: .github/workflows/sdlc-pipeline.yml, lines 296-298

When a merge conflict is detected and the conflict resolution workflow is triggered, the step exits with exit 0:

if gh workflow run on-merge-conflict.yml -f pr_number="${CONFLICT_PR_NUMBER}"; then
  echo "::notice::Conflict resolution workflow triggered for PR #${CONFLICT_PR_NUMBER}"
  # Exit with success - the conflict resolver will handle it
  exit 0

This causes the init job to succeed, but critical outputs are never set:

  • contract_exists (set at line 325)
  • current_phase (set at line 415)
  • pr_number (set at line 419)

The downstream work-loop job has this condition:

if: >-
  always() &&
  needs.init.result == 'success' &&
  needs.init.outputs.current_phase != 'pr'

Since current_phase is unset/empty, this evaluates to '' != 'pr' which is true, so the work-loop runs with undefined/empty outputs. This will cause failures or unexpected behavior.

Fix: The step should exit 1 to fail the job (preventing work-loop from running), or the conflict handling should be moved to after the outputs are set, exiting the job in a way that skips work-loop.

2. Work-loop continues after triggering conflict resolver

File: .github/workflows/sdlc-work-loop.yml, lines 369-372

Same issue in the work-loop: when conflict resolution is triggered, exit 0 causes the step to succeed, and subsequent steps continue to run:

if gh workflow run on-merge-conflict.yml -f pr_number="${CONFLICT_PR_NUMBER}"; then
  echo "::notice::Conflict resolution workflow triggered for PR #${CONFLICT_PR_NUMBER}"
  # Exit with success - the conflict resolver will handle it
  exit 0

After this, the following steps will still execute:

  • "Populate contract tasks" (line 384)
  • "Check for multi-agent mode" (line 409)
  • "Run egg agent" (line 479)

These steps will work on an unmerged branch (since git merge --abort was called), leading to the agent working on stale code—the exact problem this PR aims to solve.

Fix: The step should exit 1 to fail the job. The conflict resolver will handle the conflict, and the next pipeline run will re-attempt the merge.

Minor Issues

3. Inconsistent exit behavior between conflict scenarios

When conflict resolution workflow trigger fails, the code exits with exit 1. When it succeeds, it exits with exit 0. But in both cases, the merge didn't complete and the workflow should stop—the difference is just whether there's a follow-up process.

For consistency and correctness, both should exit non-zero to prevent downstream jobs from running with an unmerged branch.

4. Race condition with conflict resolver

If the init job succeeds (due to exit 0) and the work-loop runs, and simultaneously the conflict resolver runs, both could be pushing to the same branch. This creates a race condition. This is avoided if the workflow fails on conflict as suggested.

Summary

The key change needed: replace exit 0 with exit 1 in both files after triggering the conflict resolution workflow. The workflow_dispatch to on-merge-conflict.yml is fire-and-forget—the current pipeline should fail so it doesn't continue with stale code. The conflict resolver will fix the branch, and the next pipeline trigger will pick up where it left off.

Request changes to address the critical issues before merging.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (agent-mode design): No concerns.

Changes since last review (8ca673651df3f1) fix the critical exit code issue I raised: both workflows now correctly exit 1 after triggering conflict resolution, preventing downstream jobs from running with stale code.

This remains CI/git infrastructure that runs before agents start work—it doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Changes since commit 8ca6736

Reviewed the delta from 8ca6736 to 51df3f1. The changes correctly address the critical issues from my previous review.

Verified Fixes

1. Init job no longer succeeds with missing outputs on conflict

Changed from exit 0 to exit 1 at line 297 in sdlc-pipeline.yml. This ensures the init job fails when a conflict is detected, preventing the work-loop from running with undefined outputs.

2. Work-loop no longer continues after triggering conflict resolver

Changed from exit 0 to exit 1 at line 371 in sdlc-work-loop.yml. This stops the job from proceeding to subsequent steps (Populate contract tasks, Run egg agent) with an unmerged branch.

3. Consistent exit behavior

Both files now exit with failure in all conflict scenarios, whether or not the conflict resolution workflow was triggered. This is the correct approach: the current pipeline should always stop on conflict, and a separate workflow handles resolution.

No New Issues

The changes are minimal and targeted. No new logic was introduced—only the exit codes were corrected. The conflict resolution workflow trigger remains fire-and-forget as intended, and the next pipeline run will re-attempt the merge after the resolver fixes the branch.

Approve.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@james-in-a-box

This comment has been minimized.

@james-in-a-box
Copy link
Contributor Author

All review feedback has been addressed in previous commits:

  1. @jwbron's feedback (allow merge fixer to autofix conflicts): Addressed in commit 8ca6736 - workflow now triggers on-merge-conflict.yml instead of immediately bailing
  2. Bot review feedback: All issues fixed in commits 8ca6736 and 51df3f1

No new actionable feedback since the approval at commit 51df3f1.

— Authored by egg

@james-in-a-box

This comment has been minimized.

@jwbron
Copy link
Owner

jwbron commented Feb 10, 2026

@james-in-a-box We should allow the merge fixer to autofix merge conflicts instead of immediately bailing

@james-in-a-box
Copy link
Contributor Author

Updated the merge conflict handling in both sdlc-pipeline.yml and sdlc-work-loop.yml to wait for the conflict resolver to complete instead of immediately bailing.

Changes:

When a merge conflict is detected, the workflow now:

  1. Triggers on-merge-conflict.yml
  2. Waits for it to complete using gh run watch --exit-status
  3. If successful, pulls the resolved changes and continues with the pipeline
  4. If unsuccessful, fails with a clear error

This allows the merge fixer to automatically resolve conflicts inline before the pipeline continues, rather than failing immediately and requiring a separate pipeline re-run.

Commit: 54ac646

— Authored by egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agent-mode design review: No concerns.

Changes since last review (51df3f154ac646) update the conflict handling to wait for the resolver workflow to complete inline rather than bailing immediately. This is a CI efficiency improvement—the pipeline waits for conflicts to be resolved before continuing, avoiding a separate re-run.

This remains git/CI infrastructure that runs before agents start work. It doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.

— Authored by egg

@james-in-a-box

This comment has been minimized.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Changes since commit 51df3f1

Reviewed the delta from 51df3f14bfbb68c0017cca471dfff9f7aa8bbfa1 to 54ac646. The new approach waits for the conflict resolver to complete instead of bailing immediately. While this addresses @jwbron's feedback to let the merge fixer auto-resolve conflicts, the implementation introduces critical correctness and reliability issues.

Critical Issues

1. Race condition in workflow run identification

Files: .github/workflows/sdlc-pipeline.yml (lines 304-315) and .github/workflows/sdlc-work-loop.yml (lines 378-389)

The code finds the triggered workflow run by fetching the most recent run of on-merge-conflict.yml:

CONFLICT_RUN_ID=$(gh run list \
  --workflow on-merge-conflict.yml \
  --limit 5 \
  --json databaseId,status,createdAt \
  --jq 'sort_by(.createdAt) | reverse | .[0].databaseId' 2>/dev/null || echo "")

Problem: This picks the most recently created run globally, not the run that was just triggered for this specific PR. If two pipelines detect conflicts simultaneously (different issues/PRs), or if a scheduled/push-triggered conflict check runs concurrently, this code may:

  1. Watch the wrong workflow run
  2. Wait indefinitely for a run that resolves a different PR
  3. Continue with stale code if the wrong run succeeds
  4. Incorrectly fail if the wrong run fails

Fix: Filter by the PR number. The on-merge-conflict.yml workflow receives pr_number as an input. Use the GitHub API to find runs triggered with that specific input:

CONFLICT_RUN_ID=$(gh api "repos/${GITHUB_REPOSITORY}/actions/workflows/on-merge-conflict.yml/runs" \
  --jq ".workflow_runs | map(select(.event == \"workflow_dispatch\")) | sort_by(.created_at) | reverse | .[0].id" \
  2>/dev/null || echo "")

Even better: capture the run ID from the gh workflow run command output (though this requires parsing or using the --json flag if available).

2. Indefinite blocking of pipeline runs

Files: Both workflow files

The gh run watch command blocks until the conflict resolver completes. The conflict resolver has a 30-minute timeout (from reusable-conflict-resolve.yml). Combined with the 10-second initial sleep and up to 30 seconds of polling (6 attempts × 5 seconds), this step can block for ~31 minutes.

Problems:

  • GitHub Actions runners are billed per-minute. Blocking a runner for 30+ minutes while waiting for another workflow is expensive.
  • The init job in sdlc-pipeline.yml runs inline (not as a separate step that can timeout independently), so downstream jobs cannot start.
  • If the conflict resolver is queued behind other runs, the wait extends further.

Suggestion: Consider using workflow outputs and triggering a continuation workflow instead of inline waiting, or accept the current exit 1 behavior with a re-trigger mechanism. The inline wait approach has significant operational cost.

3. Missing error handling for git reset --hard

Files: .github/workflows/sdlc-pipeline.yml (line 327) and .github/workflows/sdlc-work-loop.yml (line 401)

After the conflict resolver succeeds:

git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"

Problems:

  • No error handling if the fetch fails (e.g., network issues, branch deleted)
  • No error handling if the reset fails
  • No verification that the resolved branch actually contains the expected merge commit
  • The set -euo pipefail at the top should catch failures, but explicit handling with clear error messages would be better for debugging

Suggestion: Add explicit error checking:

if ! git fetch origin "${BRANCH_NAME}"; then
  echo "::error::Failed to fetch resolved branch"
  exit 1
fi
if ! git reset --hard "origin/${BRANCH_NAME}"; then
  echo "::error::Failed to reset to resolved branch"
  exit 1
fi

Important Issues

4. Workflow may pick up stale run from previous trigger

The polling loop (lines 304-315 in sdlc-pipeline.yml) starts after a 10-second sleep. If the gh workflow run command triggers successfully but the run hasn't appeared in the API yet, and there's a previous run from an earlier conflict resolution, the code will find and watch that old run instead.

The jq query sort_by(.createdAt) | reverse | .[0] gets the most recent run, but if the new run hasn't registered yet, "most recent" is the wrong run.

Mitigation: Record the timestamp before triggering, then filter for runs created after that timestamp:

TRIGGER_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
if gh workflow run on-merge-conflict.yml ...; then
  # ... later in polling loop:
  CONFLICT_RUN_ID=$(gh run list ... --jq "[.[] | select(.createdAt > \"${TRIGGER_TIME}\")] | .[0].databaseId")

5. No timeout on the polling loop itself

The polling loop tries 6 times with 5-second sleeps (30 seconds total), which is reasonable. However, if the workflow consistently fails to appear (e.g., workflow file doesn't exist, permissions issue), the job will fail after 30 seconds with a vague "Could not find triggered workflow run" error.

Suggestion: The error message at line 318 should suggest possible causes (workflow doesn't exist, permissions, rate limiting).

Minor Issues

6. Duplicate code between sdlc-pipeline.yml and sdlc-work-loop.yml

The conflict resolution logic (lines 287-343 in sdlc-pipeline.yml and lines 357-417 in sdlc-work-loop.yml) is nearly identical. This is a maintainability concern—bugs fixed in one location may not be fixed in the other.

Observation: Consider extracting this into a composite action or reusable workflow step. Not blocking for this PR, but worth noting for future cleanup.

7. The 10-second initial sleep is arbitrary

sleep 10 before polling assumes the workflow run will appear within 10 seconds. This may be too short under heavy GitHub Actions load, or wastefully long if the run appears quickly.

Suggestion: Start polling immediately with exponential backoff, or use a shorter initial delay (2-3 seconds) since the loop already handles retries.

Summary

The new approach of waiting for the conflict resolver to complete is reasonable in concept—it avoids forcing users to wait for a separate pipeline re-trigger. However, the implementation has a critical race condition in identifying the correct workflow run, and the inline blocking approach is operationally expensive.

Required before merge:

  1. Fix the workflow run identification to filter by PR number (issue #1)
  2. Add proper error handling for git operations (issue #3)

Strongly recommended:
3. Filter by trigger timestamp to avoid picking up stale runs (issue #4)
4. Consider the operational cost of 30-minute blocking (issue #2)

Request changes to address the critical issues.

— Authored by egg

@james-in-a-box

This comment has been minimized.

egg and others added 6 commits February 10, 2026 23:31
Fixes:
1. Add push after merge in sdlc-pipeline.yml init job (was missing)
2. Fix error handling in work-loop push fallback with explicit checks
3. Add check to skip redundant merges when branch already contains main
4. Trigger conflict resolution workflow instead of bailing on conflicts

When merge conflicts occur, the workflow now:
- Looks up the PR number for the branch
- Triggers the on-merge-conflict.yml workflow to auto-resolve
- Only fails if no PR exists or the workflow trigger fails
Change exit 0 to exit 1 after triggering conflict resolution workflow.
This prevents downstream jobs/steps from running with an unmerged branch,
which would cause the agent to work on stale code.

The conflict resolver will handle the conflict asynchronously, and the
next pipeline run will re-attempt the merge.

Authored-by: egg
When a merge conflict is detected, the workflow now:
1. Triggers the on-merge-conflict.yml workflow
2. Waits for it to complete using gh run watch
3. If successful, pulls the resolved changes and continues
4. If unsuccessful, fails with a clear error

This allows the merge fixer to automatically resolve conflicts before
the pipeline continues, rather than immediately failing and requiring
a separate pipeline re-run.
- Record timestamp before triggering conflict resolution workflow to filter
  runs created after that time, avoiding picking up stale or concurrent runs
- Increase run limit from 5 to 10 to improve chances of finding the run
- Add explicit error handling for git fetch and git reset --hard operations
- Improve error message when run is not found to suggest possible causes

Authored-by: egg
Align TRIGGER_TIME and the if statement with the surrounding code
block indentation (18 spaces instead of 16).
@james-in-a-box james-in-a-box bot force-pushed the egg/merge-main-before-work branch from b284ff6 to 07b597c Compare February 10, 2026 23:32
@james-in-a-box
Copy link
Contributor Author

egg conflict resolution completed. View run logs

@james-in-a-box
Copy link
Contributor Author

Review Feedback Status

All review feedback has been addressed. The conflict with main was automatically resolved by the conflict resolution workflow.

Summary of Changes

This PR adds git infrastructure to keep issue branches up-to-date with main before the agent works:

  1. sdlc-pipeline.yml: Merges main into existing issue branches in the init job
  2. sdlc-work-loop.yml: Merges main into issue branches before each work iteration

Review Feedback Addressed

  • ✓ Push after merge in init job
  • ✓ Explicit error handling in push fallback
  • ✓ Skip redundant merges via merge-base --is-ancestor check
  • ✓ Trigger conflict resolution workflow on conflicts
  • ✓ Wait for conflict resolver to complete inline (per @jwbron's feedback)
  • ✓ Timestamp filtering to avoid race condition in workflow run identification
  • ✓ Explicit error handling for git fetch/reset operations
  • ✓ Improved error messaging
  • ✓ Indentation fix in sdlc-pipeline.yml

All automated reviews (agent-mode design and code review) show approval. Ready for human review.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg feedback addressed. View run logs

24 previous review(s) hidden.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (agent-mode design): No concerns.

Changes since last review (b284ff6 → HEAD) add autofix attempt tracking, limit enforcement, and re-dispatch logic:

  • mode input for checks-only re-runs after autofix
  • autofix_attempts / max_autofix_attempts in contract schema
  • Circuit breaker escalation when autofix limit exceeded
  • Re-dispatch job to re-run checks after successful autofix

This is all CI/workflow orchestration—managing when agents run and what happens after, not how agents behave. It doesn't affect agent prompts, output formats, or behavioral constraints. No agent-mode design issues.

— Authored by egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Changes since commit b284ff6

Reviewed the delta from b284ff60a16845ed902676f7aa0b2820f9d07a5f to 07b597c. This PR has grown significantly since my last approval—the original merge conflict handling is still correct, but substantial new functionality has been added for autofix tracking and synchronous execution. These new additions introduce several critical issues.

Critical Issues

1. Silent failure in push retry logic during autofix attempt reset

File: sdlc-work-loop.yml, lines 365-368

git push origin "${BRANCH_NAME}" || {
  git pull --rebase origin "${BRANCH_NAME}"
  git push origin "${BRANCH_NAME}"
}

If git pull --rebase fails (e.g., rebase conflicts) or the second push fails, the step continues silently. The reset to 0 may never be persisted, but the workflow proceeds as if it was. This is the same silent fallback pattern I flagged in a previous review for the merge conflict handling, but it was reintroduced here.

Fix: Add explicit error handling:

git push origin "${BRANCH_NAME}" || {
  if ! git pull --rebase origin "${BRANCH_NAME}"; then
    echo "::error::Failed to rebase during autofix reset push"
    exit 1
  fi
  if ! git push origin "${BRANCH_NAME}"; then
    echo "::error::Failed to push autofix reset after rebase"
    exit 1
  fi
}

2. Race condition: autofix attempts counter may double-increment

File: sdlc-work-loop.yml, lines 929-1006 ("Update autofix attempts counter" step)

The step runs git fetch + git reset --hard to get the latest, then reads autofix_attempts, increments it, and pushes. If the autofix workflow itself also modifies the contract (e.g., to record fixes made), this reset will overwrite those changes.

More critically, if two pipelines run concurrently (e.g., rapid re-triggers), both may:

  1. Read the same autofix_attempts value
  2. Increment to the same new value
  3. Race to push

The retry loop (lines 968-1000) attempts to handle this, but it uses reset --hard which discards any local commit, then re-applies the increment to whatever the remote has. This could result in:

  • Double-incrementing if two runs overlap
  • Missing increments if one run's commit is overwritten

Recommendation: Use atomic operations or file locking. Since this is GitHub Actions, consider using the GitHub API to update a workflow variable or issue label instead of file-based state. Alternatively, check the remote's autofix_attempts value and only increment if it matches the expected pre-increment value.

3. workflow_dispatch runs may fail to find the dispatched run

File: sdlc-work-loop.yml, lines 876-898

The gh run list query filters by headBranch, but for workflow_dispatch events, the run may not have a headBranch set (it depends on workflow configuration). The reusable-autofix.yml workflow is triggered via workflow_dispatch, and if it doesn't check out a branch or specify one, headBranch will be empty or "main".

RUN_ID=$(gh run list \
  --workflow reusable-autofix.yml \
  --json databaseId,status,createdAt,headBranch \
  --jq --arg branch "${BRANCH_NAME}" --arg dispatch_time "${DISPATCH_TIME}" '
    [.[] | select(
      .headBranch == $branch and ...
    )] ...

Fix: Verify that reusable-autofix.yml sets headBranch correctly, or remove the branch filter and rely solely on the timestamp filter (with a narrow window).

4. re-dispatch-checks may trigger before autofix attempts counter is updated

File: sdlc-work-loop.yml, lines 1111-1159

The re-dispatch-checks job depends on aggregate-checks, but aggregate-checks depends on check-fixer. The check-fixer job's "Update autofix attempts counter" step runs after the fix step completes. However, the counter update step is NOT a separate job—it's a step within check-fixer.

If the counter update step fails (line 1004: exit 1 on push failure), the check-fixer job fails. But aggregate-checks runs with always(), so it still executes and may still see autofix_succeeded == 'true' (from steps.fix.outputs.fixed). This triggers re-dispatch-checks, which dispatches a new workflow run—but the autofix counter was never incremented, potentially leading to infinite loops.

Fix: Either:

  1. Make the counter update a separate job that re-dispatch-checks depends on
  2. Or set fixed=false if the counter update fails

5. Autofix limit check uses stale data

File: sdlc-work-loop.yml, lines 787-809

The "Check autofix attempt limit" step reads from the local checkout, which was done at line 773. If another concurrent run incremented autofix_attempts between checkout and this check, this run will use the stale (lower) value and proceed with autofix, potentially exceeding the actual limit.

Recommendation: Fetch latest before checking:

git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"

Add this before reading the contract in the limit check step.

Important Issues

6. Escalation condition may never trigger if fixer job doesn't run

File: sdlc-work-loop.yml, lines 2132-2137

if: >-
  always() &&
  needs.aggregate-checks.result == 'success' &&
  needs.aggregate-checks.outputs.autofix_limit_exceeded == 'true' &&
  needs.aggregate-checks.outputs.all_passed != 'true'

autofix_limit_exceeded is only set in check-fixer.check-limit.outputs.exceeded. But check-fixer only runs if checks fail (see line 743-752). If the limit was exceeded on a previous run and the current run has no failing checks (somehow), check-fixer won't run, autofix_limit_exceeded will be empty/false, and the escalation won't trigger even though the limit was previously exceeded.

This is a minor edge case, but worth noting for correctness.

7. Mode parameter default allows unexpected behavior

File: sdlc-work-loop.yml, lines 105-110

mode:
  description: 'Execution mode: full (work+checks+review) or checks-only'
  required: false
  type: string
  default: "full"

The workflow accepts arbitrary mode values—there's no validation. If someone passes mode: "foo", the workflow will:

  • Skip the work job (line 277: if: needs.resolve-inputs.outputs.mode != 'checks-only')... wait, this would NOT skip work since foo != 'checks-only' is true.

Actually this is fine—only checks-only skips work. But the lack of validation means typos like check-only (singular) would run the full workflow unexpectedly. Consider adding validation.

Minor Issues

8. Inconsistent error message formatting

In the autofix trigger step, some errors use ::warning:: while others use ::error::. The messages should be consistent—use ::error:: for failures that cause the step to exit early.

9. BRANCH_PREFIX output added but not used in check-fixer

Line 763 sets BRANCH_PREFIX: ${{ needs.resolve-inputs.outputs.branch_prefix }} but it's never referenced in the check-fixer job (the comment at line 862 says "Note: branch_prefix is only available for workflow_call, not workflow_dispatch"). Either remove the unused env var or use it.

Summary

The original merge conflict handling approved at b284ff6 remains correct. However, the new autofix tracking functionality introduces race conditions and failure modes that need attention:

Required before merge:

  1. Fix silent push failure in autofix reset (issue #1)
  2. Address race condition in counter increment (issue #2)
  3. Verify headBranch filtering works for workflow_dispatch (issue #3)
  4. Ensure counter update failure prevents re-dispatch (issue #4)
  5. Fetch latest before checking autofix limit (issue #5)

Request changes.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg agent-mode-design completed. View run logs

24 previous review(s) hidden.

@james-in-a-box
Copy link
Contributor Author

egg review completed. View run logs

24 previous review(s) hidden.

@james-in-a-box
Copy link
Contributor Author

egg is addressing review feedback...

Fix critical issues raised in code review:

1. Add explicit error handling for silent push failures in:
   - Reset autofix attempts on manual trigger
   - Populate contract tasks step

2. Remove headBranch filtering for workflow_dispatch runs since
   workflow_dispatch doesn't set headBranch to the issue branch.
   Rely solely on timestamp filtering to identify triggered runs.

3. Add counter_updated output to Update autofix attempts counter step
   and factor it into the fixed output. This prevents re-dispatch if
   the counter update fails (issue #4).

4. Fetch latest branch state before checking autofix attempt limit
   to avoid proceeding with stale counter values (issue #5).

5. Improve error message when workflow run is not found to include
   timestamp and suggest possible causes.
@james-in-a-box
Copy link
Contributor Author

Review Feedback Addressed

Pushed commit 0369170 addressing the critical issues from the latest review:

Fixes Applied

  1. Silent push failure in autofix reset (issue Phases 1-2: Repository setup, docs, and gateway extraction (partial) #1)

    • Added explicit error handling in the "Reset autofix attempts on manual trigger" step
    • Also fixed the same pattern in "Populate contract tasks" step for consistency
  2. headBranch filtering for workflow_dispatch (issue Phase 1.5: Documentation extraction #3)

    • Removed headBranch filter since workflow_dispatch doesn't set it to the issue branch
    • Now relies solely on timestamp filtering to identify the triggered run
    • Improved error message to include timestamp and suggest possible causes
  3. Counter update failure prevents re-dispatch (issue Phase 2: Gateway module extraction #4)

    • Added id: update-counter and counter_updated output to the update step
    • Modified fixed output to only be true if both autofix succeeded AND counter was updated:
      fixed: ${{ (steps.fix.outputs.fixed == 'true' && steps.update-counter.outputs.counter_updated != 'false') && 'true' || ... }}
  4. Fetch latest before checking autofix limit (issue Phase 3: Container extraction #5)

    • Added git fetch and git reset --hard before reading autofix_attempts
    • Ensures concurrent updates from other runs are visible

Issue #2 (Race condition in counter increment)

The current implementation already handles this correctly via optimistic locking:

  • On push failure, it fetches latest, resets hard, re-reads the counter, increments from the new value, and retries
  • This ensures that if run A pushes first, run B will see A's increment and increment from there

The reviewer suggested atomic operations or file locking, but the current approach is a valid pattern for this use case and would require significant architectural changes to replace.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg feedback addressed. View run logs

24 previous review(s) hidden.

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review (agent-mode design): No concerns.

Changes since last review (07b597c0369170) address code review feedback on autofix workflow robustness:

  • Explicit error handling for push retry failures
  • Removed headBranch filtering for workflow_dispatch runs (timestamp filtering only)
  • Added counter_updated output to prevent re-dispatch on counter update failure
  • Fetch latest branch state before checking autofix limit

These are all CI/workflow improvements—better error handling, race condition fixes, and output chaining. None of this affects agent prompts, output formats, or behavioral constraints. The changes make the workflow more robust without changing how agents operate.

— Authored by egg

Copy link
Contributor Author

@james-in-a-box james-in-a-box bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review: Changes since commit 07b597c

Reviewed the delta to commit 0369170. The changes correctly address all five critical issues from my previous review.

Verified Fixes

1. Silent push failure in autofix reset (issue #1)

Lines 365-375 in sdlc-work-loop.yml now have explicit error handling:

git push origin "${BRANCH_NAME}" || {
  echo "::warning::Initial push failed, attempting pull --rebase and retry..."
  if ! git pull --rebase origin "${BRANCH_NAME}"; then
    echo "::error::Failed to rebase during autofix reset push"
    exit 1
  fi
  ...
}

The same pattern was correctly applied to "Populate contract tasks" (lines 497-506).

2. headBranch filtering for workflow_dispatch (issue #3)

Lines 897-911: The headBranch filter has been removed. The jq query now relies solely on timestamp filtering:

--jq --arg dispatch_time "${DISPATCH_TIME}" '
  [.[] | select(
    (.status == "queued" or .status == "in_progress" or .status == "completed") and
    .createdAt >= $dispatch_time
  )] | sort_by(.createdAt) | .[0].databaseId // empty
'

3. Counter update failure prevents re-dispatch (issue #4)

Lines 949-1032: The step now has id: update-counter and sets counter_updated output. The job-level fixed output (lines 768-770) correctly incorporates this:

fixed: ${{ (steps.fix.outputs.fixed == 'true' && steps.update-counter.outputs.counter_updated != 'false') && 'true' || steps.limit-exceeded.outputs.fixed || 'false' }}

This prevents re-dispatch when the counter update fails.

4. Fetch latest before checking autofix limit (issue #5)

Lines 807-809 in check-limit step now fetch and reset before reading autofix_attempts:

git fetch origin "${BRANCH_NAME}"
git reset --hard "origin/${BRANCH_NAME}"

5. Improved error messaging

Line 921: The warning message now includes the dispatch timestamp and suggests possible causes.

Previous Issues: All Addressed

For completeness, confirming fixes from earlier reviews remain in place:

  • ✓ Push after merge in init job
  • ✓ Skip redundant merges via merge-base --is-ancestor check
  • ✓ Trigger and wait for conflict resolution workflow
  • ✓ Timestamp filtering to avoid race conditions
  • ✓ Explicit error handling for git fetch/reset after conflict resolution

Issue #2 (Race condition in counter increment)

The author notes the optimistic locking pattern is intentional: on push failure, fetch latest, re-read counter, increment from fresh value, retry. This is a valid concurrency pattern for this use case.

No New Issues

The changes are targeted and correct. No new logic was introduced—only the specific fixes requested.

Approve.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg agent-mode-design completed. View run logs

24 previous review(s) hidden.

@james-in-a-box
Copy link
Contributor Author

egg review completed. View run logs

24 previous review(s) hidden.

@james-in-a-box
Copy link
Contributor Author

egg is addressing review feedback...

@james-in-a-box
Copy link
Contributor Author

Feedback Addressing Complete

All review feedback has been addressed. Latest automated reviews (commit 0369170) show approval:

  • ✓ Agent-mode design review: No concerns
  • ✓ Code review: Approve

No new actionable feedback since the last commit. Ready for human review.

— Authored by egg

@james-in-a-box
Copy link
Contributor Author

egg feedback addressed. View run logs

24 previous review(s) hidden.

@jwbron jwbron merged commit 0adf69e into main Feb 11, 2026
32 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 11, 2026
Update documentation to reflect changes from PR #493 which added automatic
main branch merging before SDLC work and synchronous conflict resolution.

Changes:
- Document automatic main branch merging in sdlc-pipeline.md
- Describe conflict resolution workflow integration in both guides
- Update github-automation.md to list SDLC pipeline as a trigger mode

Triggered by: 0adf69e

Authored-by: egg
jwbron added a commit that referenced this pull request Feb 12, 2026
* docs: Update SDLC pipeline docs for auto-merge behavior [doc-updater]

Update documentation to reflect changes from PR #493 which added automatic
main branch merging before SDLC work and synchronous conflict resolution.

Changes:
- Document automatic main branch merging in sdlc-pipeline.md
- Describe conflict resolution workflow integration in both guides
- Update github-automation.md to list SDLC pipeline as a trigger mode

Triggered by: 0adf69e

Authored-by: egg

* docs: Address review feedback on auto-merge documentation

- Remove inaccurate "(with timeout)" claim since gh run watch has no
  explicit timeout configured
- Add explanation that PR is required for workflow_dispatch targeting
- Clarify that SDLC pipeline "attempts to merge" rather than "merges"
  to more precisely describe the conflict-triggering scenario

Authored-by: egg

* Update Conflict Resolver trigger in Workflows Overview table

Add SDLC pipeline integration as the first trigger in the table entry to
match the detailed Trigger Modes section later in the document.

---------

Co-authored-by: jwbron <8340608+jwbron@users.noreply.github.com>
Co-authored-by: james-in-a-box[bot] <246424927+james-in-a-box[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant