Support configuring priority for NodeGroup when used by queue#5110
Support configuring priority for NodeGroup when used by queue#5110ruanwenjun wants to merge 1 commit intovolcano-sh:masterfrom
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the nodegroup scheduling plugin, allowing users to define explicit priority for preferred nodegroups. By enabling the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
34fb991 to
d4cbdca
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a new feature to allow prioritizing preferred nodegroups based on their order in the configuration. The changes are well-implemented across the documentation, application logic, and tests. I've included one suggestion to optimize the performance of the scoring logic, which is in a hot path.
| n := len(q.queueGroupAffinityPreferredList) | ||
| for i, preferred := range q.queueGroupAffinityPreferredList { | ||
| if preferred == group { | ||
| nodeScore += 0.5 * BaseScore * float64(n-i) / float64(n) | ||
| break | ||
| } | ||
| } |
There was a problem hiding this comment.
For performance optimization, consider using a map for O(1) lookups instead of iterating through the queueGroupAffinityPreferredList slice in the score function. This function can be a hot path in the scheduler.
You can implement this with the following changes:
-
Add a map to the
queueGroupAffinitystruct to store the index of each preferred nodegroup:// in queueGroupAffinity struct queueGroupAffinityPreferredMap map[string]int
-
Populate this map in
newQueueGroupAffinityafter populating the list:// in newQueueGroupAffinity function affinity.queueGroupAffinityPreferredList = nodeGroupAffinity.PreferredDuringSchedulingIgnoredDuringExecution affinity.queueGroupAffinityPreferredMap = make(map[string]int, len(affinity.queueGroupAffinityPreferredList)) for i, group := range affinity.queueGroupAffinityPreferredList { affinity.queueGroupAffinityPreferredMap[group] = i }
-
Use the map for a direct lookup in the
scorefunction as suggested below.
if i, ok := q.queueGroupAffinityPreferredMap[group]; ok {
n := len(q.queueGroupAffinityPreferredList)
nodeScore += 0.5 * BaseScore * float64(n-i) / float64(n)
}There was a problem hiding this comment.
Pull request overview
Adds an optional “preferred nodegroup ordering” mode to the nodegroup scheduler plugin so that queues can prioritize earlier entries in preferredDuringSchedulingIgnoredDuringExecution when scoring nodes.
Changes:
- Add
enablePreferredOrderplugin argument and use list-order-based scoring for preferred nodegroups when enabled. - Extend nodegroup unit tests to cover ordered preferred scoring.
- Document the new argument and scoring formula in user guide and design docs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/scheduler/plugins/nodegroup/nodegroup.go | Introduces enablePreferredOrder, stores preferred list order, and updates scoring logic to be position-based. |
| pkg/scheduler/plugins/nodegroup/nodegroup_test.go | Adds a new test case validating ordered preferred scoring results. |
| docs/user-guide/how_to_use_nodegroup_plugin.md | Documents how to enable and configure ordered preferred nodegroup scoring. |
| docs/design/node-group.md | Updates design doc to describe the new ordered scoring behavior and configuration. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| 2. affinity.nodeGroupAffinity.preferredDuringSchedulingIgnoredDuringExecution, soft constraints, such as `nlp = nodegroup1`, it means that task in queue=nlp runs on nodegroup1 first, but if the resources of nodegroup1 is insufficient, it can also run on other nodegroups. Combine rule1 and rule2, task in queue=nlp runs on nodegroup1 first, but if the resources of nodegroup1 is insufficient, it can also run on nodegroup2. When `enablePreferredOrder` is enabled in plugin arguments, the order of the list determines priority — earlier entries get higher scores (score formula: `0.5 * BaseScore * (N - i) / N`). By default, all preferred nodegroups receive the same score. | ||
| 3. affinity.nodeGroupAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution, hard constraints, such as `nlp = nodegroup1`, it means that task in queue=nlp can run on any nodegroups but nodegroup1. | ||
| 4. affinity.nodeGroupAntiAffinity.preferredDuringSchedulingIgnoredDuringExecution, soft constraints, such as `nlp = nodegroup1`, it means that task in queue=nlp runs on any other nodegroups, but if the resources of other nodegroup is insufficient, it can also run on nodegroup1. |
d4cbdca to
decb9b7
Compare
decb9b7 to
d969067
Compare
What type of PR is this?
feature
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #5109
Special notes for your reviewer:
Does this PR introduce a user-facing change?
NONE, this is complated with the latest version.