@Sa4dUs Sa4dUs commented Dec 27, 2025

Allows modifying the workgroup and thread grid dimensions directly from the intrinsic call.

core::intrinsics::offload(_kernel_1, [256, 1, 1], [32, 1, 1], (x,))
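To make the two array arguments concrete, here is a minimal sketch of the launch geometry they describe. It assumes the first array counts workgroups per axis and the second counts threads per workgroup per axis; the PR text does not spell out that order, so treat it as an assumption. `total_threads` is a hypothetical helper, not part of the intrinsic.

```rust
/// Total number of threads described by a pair of 3D launch dimensions.
///
/// Assumption (not confirmed by the PR text): `workgroups` is the number of
/// workgroups per axis and `threads_per_group` is the number of threads per
/// workgroup per axis. Returns `None` if the product overflows `u64`.
fn total_threads(workgroups: [u32; 3], threads_per_group: [u32; 3]) -> Option<u64> {
    workgroups
        .iter()
        .chain(threads_per_group.iter())
        // Multiply all six extents, bailing out on overflow.
        .try_fold(1u64, |acc, &d| acc.checked_mul(u64::from(d)))
}
```

Under that assumption, the call above with `[256, 1, 1]` and `[32, 1, 1]` describes 256 * 32 = 8192 threads laid out along one axis.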

r? @ZuseZ4

rustbot commented Dec 27, 2025

The rustc-dev-guide subtree was changed. If this PR only touches the dev guide consider submitting a PR directly to rust-lang/rustc-dev-guide otherwise thank you for updating the dev guide with your changes.

cc @BoxyUwU, @jieyouxu, @Kobzol, @tshepang

Some changes occurred to the intrinsics. Make sure the CTFE / Miri interpreter
gets adapted for the changes, if necessary.

cc @rust-lang/miri, @RalfJung, @oli-obk, @lcnr

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-rustc-dev-guide Area: rustc-dev-guide labels Dec 27, 2025
@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Dec 27, 2025
@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Dec 27, 2025

ZuseZ4 commented Dec 27, 2025

@kevinsala it would be nice to have an example in our docs where giving the same dimensions as a runtime value (e.g. via the command line) vs. giving them at compile time (like in the example here) makes a measurable perf difference. I assume it's hard to artificially come up with an example exactly for that, or do you know how to get one?
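For reference, the two variants being compared could look roughly like the sketch below. `launch` is a hypothetical stand-in for the kernel launch (it is not the real intrinsic and only returns the thread count), and the `"X,Y,Z"` argument format is likewise an assumption made for illustration. It shows only the difference in where the dimensions come from and says nothing about the resulting performance.

```rust
/// Hypothetical stand-in for a kernel launch. It only reports how many
/// threads the given dimensions describe (saturating on overflow).
fn launch(workgroups: [u32; 3], threads: [u32; 3]) -> u64 {
    workgroups
        .iter()
        .chain(threads.iter())
        .fold(1u64, |acc, &d| acc.saturating_mul(u64::from(d)))
}

// Variant 1: dimensions are compile-time constants, visible to the compiler.
const WORKGROUPS: [u32; 3] = [256, 1, 1];
const THREADS: [u32; 3] = [32, 1, 1];

fn launch_static() -> u64 {
    launch(WORKGROUPS, THREADS)
}

/// Parse a string of the (assumed) form "X,Y,Z" into three dimensions.
fn parse_dims(s: &str) -> Option<[u32; 3]> {
    let mut parts = s.split(',').map(|p| p.trim().parse::<u32>());
    let dims = [parts.next()?.ok()?, parts.next()?.ok()?, parts.next()?.ok()?];
    // Reject trailing components such as "1,2,3,4".
    if parts.next().is_some() {
        return None;
    }
    Some(dims)
}

// Variant 2: dimensions are only known at runtime, e.g. from
// `std::env::args().skip(1).collect::<Vec<_>>()`.
fn launch_dynamic(args: &[String]) -> Option<u64> {
    let workgroups = parse_dims(args.first()?)?;
    let threads = parse_dims(args.get(1)?)?;
    Some(launch(workgroups, threads))
}
```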

@kevinsala

@ZuseZ4 I'll try to find an example.

@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from efcf026 to 330d170 Compare December 28, 2025 10:05

ZuseZ4 commented Dec 30, 2025

As per discussion with offload devs, this should be u32 not i32, otherwise lgtm for now.

Once we have a proper macro frontend for our intrinsic we can also consider changing it to [Option<NonZeroU32>; 3], but until then the interface would be way too cumbersome to call manually, and [u32; 3] is the best approximation. Certainly better than having the hardcoded values from one benchmark.
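To illustrate the trade-off, here is a small sketch of what spelling out the `Option<NonZeroU32>` form by hand looks like next to the plain `u32` form. The `Dims` alias, `example_dims`, and `to_raw` are hypothetical names, and treating `None` as "use 1 on this axis" is an assumption made only for this example; the thread does not say what `None` would mean.

```rust
use std::num::NonZeroU32;

/// Hypothetical alias for the richer dimension type mentioned above.
type Dims = [Option<NonZeroU32>; 3];

/// What a manual call site would have to write for "256 x 1 x 1",
/// compared to the plain `[256, 1, 1]` of the `[u32; 3]` form.
fn example_dims() -> Dims {
    // `NonZeroU32::new` returns `None` for 0, so every axis needs a wrapper.
    [NonZeroU32::new(256), None, None]
}

/// Lower the richer type to the plain `[u32; 3]` form.
/// Assumption: an unspecified axis (`None`) defaults to 1.
fn to_raw(dims: Dims) -> [u32; 3] {
    dims.map(|d| d.map_or(1, NonZeroU32::get))
}
```

A macro frontend could generate the wrapped form from plain literals, which is why the richer type only becomes practical once such a frontend exists.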

@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from 330d170 to 6e0d0da Compare December 30, 2025 21:24
@Sa4dUs Sa4dUs force-pushed the offload-intrinsic2 branch from 6e0d0da to 33d39a9 Compare December 31, 2025 11:35

rustbot commented Dec 31, 2025

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@ZuseZ4 ZuseZ4 mentioned this pull request Dec 31, 2025