[Proposal] Gate-Controlled Scheduling for Cluster Autoscalers Compatibility#4727
[Proposal] Gate-Controlled Scheduling for Cluster Autoscalers Compatibility#4727devzizu wants to merge 30 commits intovolcano-sh:masterfrom
Conversation
|
Welcome @devzizu! It looks like this is your first PR to volcano-sh/volcano 🎉 |
c7f8885 to
a58e67a
Compare
a58e67a to
5f0ed53
Compare
|
@kingeasternsun @hajnalmt Do you have time help @devzizu to take a look at this proposal? |
|
/cc @hajnalmt Sure I am starting to review it 👍 |
There was a problem hiding this comment.
Thank you for this great proposal @devzizu.
I am looking forward to the implementation, great idea and solid design change.
I had mostly minor remarks except the capacity plugin change.
What I see is that the pods that are marked scheduling gated by volcano after this change with the annotation they can maybe considered to be inqueue but straight out to be considered allocated seems a too significant change.
Wouldn’t it be sufficient to adjust the DeductSchGatedResources function so that the tasks that are only volcano scheduling gated won't be deducted when hasOnlyVolcanoSchedulingGate returns true? This approach would also maintain compatibility with the overcommit plugin, which I’m slightly concerned about since it heavily depends on scheduling gate behavior.
--- I want to remark that this would mean that these annotated pods would move to inqueue and would never be in the scope of the enqueue action if every other scheduling gate is removed, which is probably what we want.
Thank you for the contribution and the idea once more!
61ee592 to
1f0df13
Compare
hajnalmt
left a comment
There was a problem hiding this comment.
We are getting there! Keep up the great work 😊
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Great Work!
Please squash your commits, and sign the DCO! I think this is quite ready.
Do you need any help with the implementation? We will need to add an e2e test for this too.
I think the it shall happen in a different PR though as the design is quite big.
03bfa14 to
42003fa
Compare
|
/cc @JesseStutler I think we can pull this in release-1.15 |
42003fa to
e035957
Compare
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
Signed-off-by: devzizu <jazevedo960@gmail.com>
20c2601 to
db8f346
Compare
Signed-off-by: devzizu <jazevedo960@gmail.com>
|
@devzizu Hi Pedro, sorry, I want to look back at why we chose to use scheduling gate instead of taking a new condition like adding an Unallocatable, instead of letting pods with insufficient queue resources continue to add an Unscheduable condition, because it feels like the intrusive modification of schedulingGate is a bit too much, even though we can control it with a switch. But it is inevitable that there will be more switch control code in the future, I am not sure if this is a better solution, sorry to look back at our original design at this time, but I suddenly thought that we have changed so much code, why not try to solve it from the condition? WDYT, however, it's not very friendly to your current job, it's just a suggestion of mine, and you're welcome to disagree with me at any time. |
IIRC, we did not considered following that path as it would introduce a new non-standard reason name in the condition (i.e., WDYT? |
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
This design proposal addresses an issue where Volcano incorrectly signals cluster autoscalers (e.g., CA or Karpenter) to scale up nodes even when pods are only waiting for queue capacity, not cluster resources. Currently, Volcano marks all unallocated pods as
Unschedulableregardless of the reason, causing autoscalers to interpret queue constraints as insufficient node capacity.This proposal introduces an opt-in feature using Kubernetes
schedulingGatesto hide queue-constrained pods from autoscalers, ensuring scale-up operations only trigger for legitimate node-fit failures. This aims to prevent unnecessary infrastructure costs and resource waste.Which issue(s) this PR fixes:
Fixes #4710