
Fix UpdateHyperNode rebuild logic#5076

Open
hzxuzhonghu wants to merge 1 commit into master from fix/update-hypernode-rebuild

Conversation

@hzxuzhonghu
Member

@hzxuzhonghu hzxuzhonghu commented Mar 4, 2026

What type of PR is this?

/kind cleanup

Also makes an optimization: only rebuild when a HyperNode's members change.

What this PR does / why we need it:

Fixes UpdateHyperNode to properly apply spec changes, restore regex/label rebuild behavior, and rebuild caches when members change so scheduler topology state stays consistent. Updates related tests accordingly.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Copilot AI review requested due to automatic review settings March 4, 2026 01:15
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from hzxuzhonghu. For more information see the Kubernetes Code Review Process.

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 4, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a bug in the UpdateHyperNode logic to ensure the scheduler's topology state remains consistent and accurate. It optimizes the update process by introducing intelligent conditional rebuilds, preventing full cache recomputations when only metadata changes, while guaranteeing that changes to HyperNode specifications, member lists, or dynamic selectors correctly trigger necessary updates. This leads to more efficient and reliable HyperNode management within the scheduler.

Highlights

  • Refactored HyperNode Update Logic: The UpdateHyperNode function has been significantly refactored to introduce conditional rebuilds. It now uses reflect.DeepEqual to determine if the HyperNode's spec, members, or tier have changed, avoiding unnecessary full cache rebuilds.
  • Improved Handling of Regex/Label Members: A new mechanism ensures that HyperNodes with regex or label-matching members always trigger a rebuild, even if their spec hasn't changed, to account for dynamic cluster node state changes that could affect their resolved node set.
  • Enhanced Parent-Child Relationship Management: The updateParent function now returns a set of 'freed' members whose parent pointers were cleared. This allows for immediate rebuilding of other HyperNodes that might claim these newly available members, resolving potential adoption issues.
  • Streamlined Cache Rebuild Process: The rebuildCache and updateAncestors functions have been updated to accept the current list of nodes as an argument, preventing redundant node list fetches. Error tracking via builtErrHyperNode has been removed.
  • Expanded Test Coverage: New test cases have been added to validate the conditional rebuild logic for scenarios involving no spec change, regex members, tier-only changes, and member changes, ensuring the robustness of the updated UpdateHyperNode behavior.
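
The rebuild decision described in the highlights above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: memberSelector, hasRegexOrLabelMember, and shouldRebuild are hypothetical stand-ins, and the real HyperNode member selector type is richer than three plain fields.

```go
package main

import "fmt"

// memberSelector is a simplified, hypothetical stand-in for a HyperNode
// member selector. It is an assumption made for illustration only.
type memberSelector struct {
	Exact  string            // exact member name
	Regex  string            // regular expression over member names
	Labels map[string]string // label selector
}

// hasRegexOrLabelMember reports whether any member is resolved dynamically
// against live cluster state (regex or label match).
func hasRegexOrLabelMember(members []memberSelector) bool {
	for _, m := range members {
		if m.Regex != "" || len(m.Labels) > 0 {
			return true
		}
	}
	return false
}

// shouldRebuild captures the fast path: skip the rebuild only when the spec
// is unchanged and every member is an exact match.
func shouldRebuild(specChanged bool, members []memberSelector) bool {
	return specChanged || hasRegexOrLabelMember(members)
}

func main() {
	exact := []memberSelector{{Exact: "node-0"}, {Exact: "node-1"}}
	regex := []memberSelector{{Regex: "^node-[0-9]+$"}}

	fmt.Println(shouldRebuild(false, exact)) // false: metadata-only update
	fmt.Println(shouldRebuild(false, regex)) // true: resolved set may have drifted
	fmt.Println(shouldRebuild(true, exact))  // true: spec changed
}
```

The point of the dynamic-selector check is that an unchanged spec does not imply an unchanged resolved node set when membership depends on node names or labels present in the cluster.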

Changelog
  • pkg/scheduler/api/hyper_node_info.go
    • Imported the reflect package for deep equality comparisons.
    • Removed the builtErrHyperNode field from the HyperNodesInfo struct.
    • Refactored UpdateHyperNode to conditionally apply spec changes and rebuild caches based on specChanged, membersChanged, and tierChanged flags.
    • Added a fast-path in UpdateHyperNode to skip rebuilds for exact-match HyperNodes when the spec is unchanged.
    • Introduced the hyperNodeHasRegexOrLabelMember function to identify HyperNodes with dynamic member selectors.
    • Modified updateParent to return a set of freed member names.
    • Removed hni.builtErrHyperNode assignments from addChild and resetParent functions.
    • Added the hyperNodesThatClaimMember function to find HyperNodes claiming a specific member.
    • Updated updateAncestors to fetch the node list once and pass it to rebuildCache.
    • Modified rebuildCache to accept a list of nodes and an ancestor set, and removed internal node list fetching.
  • pkg/scheduler/api/hyper_node_info_test.go
    • Imported fmt and sync/atomic packages.
    • Added TestHyperNodesInfo_UpdateHyperNode_NoSpecChange to verify no rebuild occurs for exact-match HyperNodes with unchanged specs.
    • Added TestHyperNodesInfo_UpdateHyperNode_RegexMember_AlwaysRebuilds to confirm rebuilds for regex-matching HyperNodes even with unchanged specs.
    • Added TestHyperNodesInfo_UpdateHyperNode_TierOnlyChange to test rebuild behavior when only the tier changes.
    • Added TestHyperNodesInfo_UpdateHyperNode_MembersChangeTriggersRebuild to validate rebuilds upon member list changes.
    • Removed the correctHyperNode field from the test struct in TestHyperNodesInfo_UpdateHyperNode_MultipleParents.
    • Added a fmt.Printf statement for debugging in TestHyperNodesInfo_UpdateHyperNode_MultipleParents.
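
The "freed members" idea from the changelog (updateParent returning the members it released, and hyperNodesThatClaimMember finding who else lists them) might look something like the sketch below. The helper names freedMembers and claimers, and the use of plain string slices for member lists, are assumptions for illustration; the PR's actual functions operate on the scheduler's own types.

```go
package main

import (
	"fmt"
	"sort"
)

// freedMembers returns the members listed in oldMembers that no longer
// appear in newMembers. Hypothetical helper, not the PR's updateParent.
func freedMembers(oldMembers, newMembers []string) map[string]struct{} {
	keep := make(map[string]struct{}, len(newMembers))
	for _, m := range newMembers {
		keep[m] = struct{}{}
	}
	freed := make(map[string]struct{})
	for _, m := range oldMembers {
		if _, ok := keep[m]; !ok {
			freed[m] = struct{}{}
		}
	}
	return freed
}

// claimers returns the sorted names of HyperNodes, other than except, whose
// member list contains member. Hypothetical helper for illustration.
func claimers(membersByHyperNode map[string][]string, member, except string) []string {
	var out []string
	for name, members := range membersByHyperNode {
		if name == except {
			continue
		}
		for _, m := range members {
			if m == member {
				out = append(out, name)
				break
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	specs := map[string][]string{
		"hn-a": {"node-0"},           // hn-a after dropping node-1
		"hn-b": {"node-1", "node-2"}, // hn-b also lists node-1
	}
	freed := freedMembers([]string{"node-0", "node-1"}, specs["hn-a"])
	for m := range freed {
		// Each HyperNode returned here would need its cache rebuilt.
		fmt.Println(m, "freed; rebuild:", claimers(specs, m, "hn-a"))
	}
}
```

In this example node-1 is the only freed member and hn-b is its only other claimer, so hn-b is the one HyperNode that would be rebuilt to adopt it.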

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request provides a significant and well-implemented refactoring of the UpdateHyperNode logic. The changes correctly handle spec modifications, member updates, and tier changes, introducing a performance-optimizing fast-path for updates that don't require a full cache rebuild. The new logic properly manages hypernodes with regex or label selectors that depend on the live cluster state. A key improvement is the handling of member removal, which now triggers a rebuild for other hypernodes that might adopt the freed members, resolving potential parent-child relationship issues. The accompanying tests are thorough and cover the new scenarios effectively. My review identified one minor issue: a leftover debug print statement in a test file that should be removed.

}
assert.Equal(t, false, hni.Ready())

fmt.Printf("%v\n", hni.HyperNodesInfo())

medium

This fmt.Printf appears to be a leftover debugging statement. It should be removed to keep the test output clean.

Contributor

Copilot AI left a comment

Pull request overview

Fixes UpdateHyperNode cache rebuild behavior so scheduler topology state stays consistent when HyperNode specs change, regex/label selectors are involved, or members are released and can be adopted by other HyperNodes.

Changes:

  • Add spec-diffing and a fast-path to skip rebuilds when spec is unchanged and members are exact-match only.
  • Reintroduce “always rebuild” behavior for regex/label selectors and rebuild caches when members change (including rebuilding claimers of newly-freed members).
  • Update and expand UpdateHyperNode unit tests to cover no-op updates, tier-only updates, regex rebuilds, and member-change rebuilds.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File | Description
pkg/scheduler/api/hyper_node_info.go | Refactors UpdateHyperNode rebuild logic, adds regex/label rebuild detection, rebuilds affected claimers, and threads a pre-listed node slice into rebuild functions.
pkg/scheduler/api/hyper_node_info_test.go | Adds focused tests for new rebuild semantics and adjusts existing multiple-parent test coverage.

Comment on lines +342 to +343
fmt.Printf("%v\n", hni.HyperNodesInfo())

Copilot AI Mar 4, 2026

Avoid printing directly to stdout in tests (fmt.Printf). This adds noisy output to go test runs and forces an extra import. If you need diagnostics, use t.Logf/t.Log (or remove the line entirely) and drop the fmt import.

Member Author

Forgot to delete it.

Contributor

Please delete it, this PR looks overall good to me.

@hzxuzhonghu
Member Author

cc @Monokaix

Contributor

@hajnalmt hajnalmt left a comment

This PR looks overall good to me 😊 This is a really great improvement!
I had 2 minor comments, but they are really minor.

tierChanged := true
if exists && old.HyperNode != nil {
oldSpec := old.HyperNode.Spec
specChanged = !reflect.DeepEqual(oldSpec, hn.Spec)
Contributor

@hajnalmt hajnalmt Mar 4, 2026

I feel that reflect is comparatively slow here and a little overkill for a struct with 3 fields. Also, these flags are dependent: specChanged will be true whenever membersChanged or tierChanged is true. I would prefer to use the k8s equality library for this instead of reflect.

oldSpec := old.HyperNode.Spec
newSpec := hn.Spec
tierChanged := oldSpec.Tier != newSpec.Tier
membersChanged := !equality.Semantic.DeepEqual(oldSpec.Members, newSpec.Members)
specChanged := tierChanged || membersChanged || oldSpec.TierName != newSpec.TierName

This is somewhat more exact.
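
A self-contained sketch of the field-wise comparison suggested above is shown below. It assumes simplified stand-in types (member, spec) rather than the real HyperNode API types, and it uses a hand-written membersEqual loop in place of equality.Semantic.DeepEqual so that it runs with the standard library alone. Like the semantic comparison, the loop treats a nil and an empty member list as equal.

```go
package main

import "fmt"

// member and spec are hypothetical, simplified stand-ins for the HyperNode
// member and spec types; they exist only to make the sketch runnable.
type member struct {
	Type     string
	Selector string
}

type spec struct {
	Tier     int
	TierName string
	Members  []member
}

// membersEqual compares two member lists element by element. It stands in
// for equality.Semantic.DeepEqual and treats nil and empty lists as equal.
func membersEqual(a, b []member) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// diffSpec derives the three flags so that specChanged is always implied by
// tierChanged or membersChanged, as the review comment points out.
func diffSpec(oldSpec, newSpec spec) (tierChanged, membersChanged, specChanged bool) {
	tierChanged = oldSpec.Tier != newSpec.Tier
	membersChanged = !membersEqual(oldSpec.Members, newSpec.Members)
	specChanged = tierChanged || membersChanged || oldSpec.TierName != newSpec.TierName
	return
}

func main() {
	oldSpec := spec{Tier: 1, Members: []member{{Type: "Node", Selector: "node-0"}}}
	newSpec := oldSpec
	newSpec.Tier = 2 // tier-only change

	tier, members, changed := diffSpec(oldSpec, newSpec)
	fmt.Println(tier, members, changed) // true false true
}
```

Deriving specChanged from the other two flags keeps the three values consistent by construction, instead of relying on separate deep comparisons to agree.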

Member Author

Makes sense.

@hzxuzhonghu
Member Author

Thank you for your review, @hajnalmt. I will try to address all the comments.

Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
@hzxuzhonghu hzxuzhonghu force-pushed the fix/update-hypernode-rebuild branch from 5992776 to 2f2fad4 Compare March 16, 2026 12:19

Labels

size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

4 participants