Skip to content

Conversation

@locriandev
Copy link
Contributor

images:health - Enable immediate failure notifications for OKD

Summary

Modified the images:health command to notify on the first build failure for OKD variant, bypassing the 3-attempt grace period used for OCP builds.

Changes

  • Updated doozer/doozerlib/cli/images_health.py:163 to skip the "success within last 3 attempts" check when variant is BuildVariant.OKD
  • Added clarifying comment explaining OKD's immediate notification behavior

Rationale

OKD builds are slower than OCP builds. With the previous 3-attempt grace period, by the time a notification was sent, significant time had already passed. Immediate notifications allow faster response to build failures.

Impact

  • OKD builds: Now receive LATEST_ATTEMPT_FAILED concern on first failure
  • OCP builds: No change - continues to suppress notifications when there's been a success within the last 3 attempts

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 19, 2026

Walkthrough

The notification skip condition in the health check logic was refined to handle OKD variants differently. Previously, notifications were skipped when the latest successful build occurred within the last three attempts. Now, this skip is disabled for OKD builds, ensuring OKD variants always notify on first failure while non-OKD variants retain the original behavior.

Changes

Cohort / File(s) Summary
Notification Skip Logic
doozer/doozerlib/cli/images_health.py
Modified condition to add OKD guard: skip only applies when variant is not OKD AND latest success index is within last three attempts. Ensures OKD builds always notify on first failure.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@joepvd joepvd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 19, 2026
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 19, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joepvd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 19, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
doozer/doozerlib/cli/images_health.py (1)

163-169: Critical logic bug: Missing guard causes false positives for OKD.

The modified condition lacks a check for latest_success_idx > 0. When latest_success_idx == 0 (latest build succeeded), for OKD:

  1. Line 140-146 adds LATEST_BUILD_SUCCEEDED concern
  2. Line 163 condition evaluates to False (since variant is OKD)
  3. Falls through to else block, incorrectly adding LATEST_ATTEMPT_FAILED

Similarly, when latest_success_idx == -1 (never succeeded), OKD would add both FAILING_AT_LEAST_FOR and LATEST_ATTEMPT_FAILED.

🐛 Proposed fix
-        if self.variant is not BuildVariant.OKD and latest_success_idx <= 3:
+        if latest_success_idx > 0 and self.variant is not BuildVariant.OKD and latest_success_idx <= 3:
             # The latest attempt was a failure, but there was a success within the last 3 attempts: skip notification
             # Note: For OKD, we always notify on first failure, so this check is skipped
             self.logger.info(
                 f'Latest attempt for {image_meta.distgit_key} failed, but the one before it succeeded, skipping notification.'
             )
             return
 
-        else:
+        elif latest_success_idx > 0:
             # The latest attempt was a failure, and the last success was more than 3 attempts ago: add a concern

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 19, 2026

@locriandev: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/security a242887 link false /test security

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@locriandev locriandev merged commit 4a175aa into openshift-eng:main Jan 19, 2026
2 of 4 checks passed
@locriandev locriandev deleted the okd-health branch January 19, 2026 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants