Add more misc. changes from candle fork by EricLBuehler · Pull Request #3196 · huggingface/candle

EricLBuehler · 2025-11-17T20:10:11Z

indexed_moe_forward (fast path for ggml quants)
Improved usability of Context
Add full attn support for Metal SDPA
Fix bug w/ FlashAttn f16
Add necessary metal Device apis
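For readers unfamiliar with the first item: later in this thread indexed_moe_forward is described as effectively a grouped gemm. The following is a minimal, unoptimized sketch of that idea in plain Rust, with no quantization and no GPU. The function name `indexed_moe_reference`, the nested-`Vec` layout, and the argument order are all assumptions made for illustration; they are not candle's API or the kernel's actual memory layout.

```rust
/// Naive reference for an "indexed" mixture-of-experts matmul.
/// Hypothetical layout (an assumption, not candle's):
/// - `weights[e]` is expert `e`'s weight matrix, shape [out_dim][in_dim]
/// - `inputs[t]` is token `t`'s activation vector, length in_dim
/// - `expert_ids[t]` lists the experts selected for token `t`
/// Returns `out[t][k] = weights[expert_ids[t][k]] * inputs[t]`.
pub fn indexed_moe_reference(
    weights: &[Vec<Vec<f32>>],
    inputs: &[Vec<f32>],
    expert_ids: &[Vec<usize>],
) -> Result<Vec<Vec<Vec<f32>>>, String> {
    if inputs.len() != expert_ids.len() {
        return Err(format!(
            "got {} tokens but {} index rows",
            inputs.len(),
            expert_ids.len()
        ));
    }
    let mut out = Vec::with_capacity(inputs.len());
    for (x, ids) in inputs.iter().zip(expert_ids) {
        let mut per_token = Vec::with_capacity(ids.len());
        for &e in ids {
            // Each selected expert contributes one small matrix-vector product;
            // a grouped gemm batches these products instead of looping.
            let w = weights
                .get(e)
                .ok_or_else(|| format!("expert index {} out of range", e))?;
            let mut y = Vec::with_capacity(w.len());
            for row in w {
                if row.len() != x.len() {
                    return Err(format!(
                        "weight row has {} columns, input has {}",
                        row.len(),
                        x.len()
                    ));
                }
                y.push(row.iter().zip(x).map(|(a, b)| a * b).sum::<f32>());
            }
            per_token.push(y);
        }
        out.push(per_token);
    }
    Ok(out)
}
```

A fast path would fuse these per-expert products into a single kernel launch over quantized weights; the loop above only pins down the arithmetic being computed.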

ivarflakstad

Excellent stuff

candle-core/src/metal_backend/device.rs

ivarflakstad · 2025-11-19T10:08:49Z

candle-core/src/quantized/cuda.rs

+            crate::bail!(
+                "The given quantized dtype {:?} is not supported for indexed_moe_forward!",
+                self.dtype()
+            );


Just thinking out loud here. It would be nice to have automatic fallback to an approach that isn't as optimized, but still valid. Perhaps returning Result<Option<(CudaStorage, crate::Shape)>> is a decent starting point?
If None then fallback?

I'm not suggesting we add this in this PR, of course.

This might work. The issue is that indexed_moe_forward is effectively a grouped gemm, so we'd need existing infrastructure to run a grouped gemm.

Regardless, providing a grouped gemm functionality will be very useful!
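The `Result<Option<...>>` idea floated above can be sketched without any CUDA types. In this hedged example, `Ok(None)` signals "no optimized kernel for this dtype" and the caller falls back, while `Err` is kept for real failures. `QuantKind`, `fast_path`, `slow_path`, and `forward` are hypothetical names, and the arithmetic is a placeholder; none of this is candle's actual code.

```rust
/// Hypothetical stand-in for a quantized dtype; not candle's `GgmlDType`.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum QuantKind {
    Q4,
    Q8,
    Exotic,
}

/// Fast path: `Ok(None)` means "no optimized kernel for this dtype",
/// while `Err` is reserved for real failures such as invalid input.
pub fn fast_path(kind: QuantKind, xs: &[f32]) -> Result<Option<Vec<f32>>, String> {
    if xs.is_empty() {
        return Err("empty input".to_string());
    }
    match kind {
        // Placeholder computation standing in for the optimized kernel.
        QuantKind::Q4 | QuantKind::Q8 => Ok(Some(xs.iter().map(|x| x * 2.0).collect())),
        QuantKind::Exotic => Ok(None),
    }
}

/// Slower but always-valid path (placeholder computation with the same result).
pub fn slow_path(xs: &[f32]) -> Vec<f32> {
    xs.iter().map(|x| x + x).collect()
}

/// Caller: try the fast path, fall back only when it reports `None`.
/// The returned label records which path produced the output.
pub fn forward(kind: QuantKind, xs: &[f32]) -> Result<(Vec<f32>, &'static str), String> {
    match fast_path(kind, xs)? {
        Some(out) => Ok((out, "fast")),
        None => Ok((slow_path(xs), "fallback")),
    }
}
```

As noted in the reply, the hard part in practice is the slow path itself: a valid fallback for indexed_moe_forward would need some grouped gemm implementation to call.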

candle-core/src/error.rs

candle-core/src/tensor.rs

candle-flash-attn/src/lib.rs

candle-kernels/src/quantized.cu

candle-metal-kernels/src/kernels/sdpa.rs

candle-metal-kernels/src/metal/device.rs

candle-core/src/error.rs

EricLBuehler · 2025-11-20T01:17:43Z

Addressed the review comments; the new_private_buffer method is now implemented correctly.

candle-core/src/metal_backend/device.rs


haricot · 2025-11-24T13:15:15Z

On the Ubuntu CI runner, the linker seems to have crashed due to lack of memory.
A possible resolution:

In the workflow .github/workflows/rust-ci.yml:

      # Add lld install, optional if lld is already present on runners
      - name: Install lld (Linux only)
        if: runner.os == 'Linux'
        run: sudo apt-get update && sudo apt-get install -y lld

      # The change: Add RUSTFLAGS for Linux to use linker-features
      - name: Run tests (with lld on Linux)
        if: runner.os == 'Linux'
        env:
          RUSTFLAGS: "-Clinker-features=-lld"
        run: cargo test --workspace

      # Existing Windows and Mac steps (unchanged)
      - name: Run tests (Windows & macOS)
        if: runner.os != 'Linux'
        run: cargo test --workspace

Add to .cargo/config.toml:
[target.x86_64-unknown-linux-gnu]
rustflags = ["-Clinker-features=-lld"]

ivarflakstad

Lgtm! 🔥
Same wrt CI here as well

ivarflakstad · 2025-11-24T15:07:35Z

@haricot yeah we can try setting the flag in CI.
I'd prefer to not have candle be opinionated wrt the linker, so I'd rather not add it to the config if we can avoid it.
I assume there are some (possibly obscure) reasons why it is not the default.

EricLBuehler marked this pull request as ready for review November 17, 2025 22:53

ivarflakstad reviewed Nov 19, 2025


EricLBuehler requested a review from ivarflakstad November 20, 2025 01:16

ivarflakstad reviewed Nov 20, 2025


candle-core/src/metal_backend/device.rs (outdated)

EricLBuehler and others added 14 commits November 21, 2025 06:20

Merge with fork

178987a

Co-authored-by Guoqing Bao <topon@outlook.com>

Update sdpa

d4dab0c

Fix flash attn bf16 case

0ee2bc8

Metal fixes

bc9030c

Add metal methods

fd2b563

Add new_private_buffer

00689f5

Fix metal tests

60e297a

Format

dc80e40

Apply review comments

15591ff

Update CI (#3194)

5d1dbd6

* Update CI
* I have no clue what was going on with this maturin file, but I don't like it
* update cuda container options
* Add compute cap to cuda wf
* Fix rust toolchain call
* update cuda ci runner and bindgen_cuda

Add initial support for imatrix quantization (#3193)

a372a14

add clear kv cache to quantized qwen3 weights (#3189)

1bb1c93

Fix metal bug

cb4a042

Apply review comments

bdb66f2

EricLBuehler force-pushed the misc_fork_updates branch from 4c3f2be to bdb66f2 Compare November 21, 2025 11:22

EricLBuehler requested a review from ivarflakstad November 21, 2025 11:23

EricLBuehler added 2 commits November 21, 2025 06:24

Merge branch 'main' into misc_fork_updates

2536e75

Fix merge

d21b0a7

ivarflakstad approved these changes Nov 24, 2025


haricot and others added 2 commits November 25, 2025 11:10

Add lld installation and test steps for Linux (#3213)

7fe9d49

Merge branch 'main' into misc_fork_updates

921a80b

EricLBuehler merged commit 95ea453 into main Nov 25, 2025
10 checks passed

EricLBuehler deleted the misc_fork_updates branch November 25, 2025 18:39

SpenserCai mentioned this pull request Dec 24, 2025

Add Z-Image Text-to-Image Generation Support #3261

Merged

7 tasks

EricLBuehler merged 18 commits into main from misc_fork_updates

4 participants
