devices: add support for first_available device prioritisation #27391
Draft
chrisboulton wants to merge 1 commit into hashicorp:main
Member
Hi @chrisboulton and thanks for raising this PR, adding all the detail, and clearly having read our documentation. Given the size of the addition, I think a good first step would be to open an issue where we can better discuss the use cases and design specifics. I'll be able to raise this internally and get the right people involved to try and move it forward. That being said, a quick glance by a few of us indicates we do like this idea, so we would be keen to see it progress.
Author
(note: this is in a bit of a draft state right now - that said, I'd love feedback from HashiCorp on the chances of having something like this incorporated and how to best align it to y'alls design goals - a rough first pass at the design would be amazing -- don't spend a lot of time on the changes themselves until we're happy with the design and I've done more of my own homework)
This PR introduces a new `first_available` block for device requests in Nomad job specifications. This enables more flexible device scheduling by allowing you to specify a prioritized list of device reservation sizes, where the scheduler attempts each option in order and selects the first one that can be fulfilled.

This is particularly useful in heterogeneous clusters with varying device types (such as a bunch of different GPU models) where you want to prioritize one type of GPU over another, but to carry the workload you need to reserve a different number of devices (GPUs).

A concrete example: I've got a workload which fits on a single 96GB GH200, but if I don't have that available I can also carry it on two H100s with 80GB memory each. I want to be able to do this in one job, and have Nomad figure out what the resource reservation should be. Today, this needs multiple jobs or multiple task groups, because `device` only accepts a single reservation size (`count`).

To support this, the following is introduced:
With a job configuration like this, Nomad will first try to schedule the workload on a GH200. If that's not available, it will then try to schedule on two H100 SDMs. If that's not available, it will fail the job.
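To make the fallback order concrete, here is a minimal Go sketch of "first match wins" selection. It is not the code from this PR: the `Option` type, the `pickFirstAvailable` function, and the per-model free-device map are all hypothetical names made up for illustration, and a real scheduler would evaluate constraints and devices per node rather than against a single map.

```go
package main

import "fmt"

// Option is a hypothetical stand-in for one entry in a first_available list:
// a device model to match and the number of devices to reserve.
type Option struct {
	Model string
	Count int
}

// pickFirstAvailable walks the options in declared order and returns the
// first one whose requested count fits within the free devices of that model.
// The boolean is false when no option can be fulfilled.
func pickFirstAvailable(opts []Option, free map[string]int) (Option, bool) {
	for _, o := range opts {
		if free[o.Model] >= o.Count {
			return o, true
		}
	}
	return Option{}, false
}

func main() {
	// Prefer one GH200, fall back to two H100s (the example from the description).
	opts := []Option{
		{Model: "GH200", Count: 1},
		{Model: "H100", Count: 2},
	}

	// Assumed inventory: no GH200 free, four H100s free.
	free := map[string]int{"GH200": 0, "H100": 4}

	if o, ok := pickFirstAvailable(opts, free); ok {
		fmt.Printf("reserve %d x %s\n", o.Count, o.Model) // prints "reserve 2 x H100"
	} else {
		fmt.Println("no option can be fulfilled")
	}
}
```

The ordering of the slice is the only thing that encodes priority here, which mirrors the idea that the first option that can be fulfilled is selected and later options are never considered.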
`count`, `affinity`, and `constraint` without `first_available` are supported as before.

Implementation Notes
I'm open to feedback on the implementation of this -- this was just the first take that came to mind.
`first_available` is an ordered list of options where the first match wins.
Inside `first_available`, `constraint` is supported, which lets you perform the additional filtering.
`first_available` and `count` are mutually exclusive at the `device` level.

Alternative Approach
Would it make sense to have a syntax like this instead, where the constraints are specified inline instead of in their own `constraint` block?

Testing Notes
I've gone in and added a bunch of E2E tests for this as a first pass - these cover the existing device scheduling functionality and the new `first_available` functionality.

I've only run these tests locally - I've not used the Terraform E2E test suite, and am mostly certain (given I let Claude do the work almost exclusively for the tests) that at least the TF test setup needs some work, but otherwise the tests themselves are passing and seem to do the right thing.
AI Use
I noticed a new callout for this in the contributing guidelines, so to call it out: