Skip to content

Add more misc. changes from candle fork#3196

Merged
EricLBuehler merged 18 commits intomainfrom
misc_fork_updates
Nov 25, 2025
Merged

Add more misc. changes from candle fork#3196
EricLBuehler merged 18 commits intomainfrom
misc_fork_updates

Conversation

@EricLBuehler
Copy link
Member

@EricLBuehler EricLBuehler commented Nov 17, 2025

  • indexed_moe_forward (fast path for ggml quants)
  • Improved usability of Context
  • Add full attn support for Metal SDPA
  • Fix bug w/ FlashAttn f16
  • Add necessary metal Device apis

@EricLBuehler EricLBuehler marked this pull request as ready for review November 17, 2025 22:53
Copy link
Member

@ivarflakstad ivarflakstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent stuff

Comment on lines +488 to +491
crate::bail!(
"The given quantized dtype {:?} is not supported for indexed_moe_forward!",
self.dtype()
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud here. It would be nice to have automatic fallback to an approach that isn't as optimized, but still valid. Perhaps returning Result<Option<(CudaStorage, crate::Shape)>> is a decent starting point?
If None then fallback?

Not thinking we add this in this PR ofc.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might work, the issue is that effectively indexed_moe_forward is a grouped gemm so we'd need existing infrastructure to run a grouped gemm.

Regardless, providing a grouped gemm functionality will be very useful!

@EricLBuehler
Copy link
Member Author

Addressed the review comments, the new_private_buffer method is now implemented correctly.

EricLBuehler and others added 14 commits November 21, 2025 06:20
Co-authored-by Guoqing Bao <topon@outlook.com>
* Update CI

* I have no clue what was going on with this maturin file, but I don't like it

* update cuda container options

* Add compute cap to cuda wf

* Fix rust toolchain call

* update cuda ci runner and bindgen_cuda
@haricot
Copy link
Contributor

haricot commented Nov 24, 2025

for ci ubuntu, the linker seems to have crashed due to lack of memory.
maybe possible resolution:

  1. workflow .github/workflows/rust-ci.yml:
      # Add lld install, optional if lld is already present on runners
      - name: Install lld (Linux only)
        if: runner.os == 'Linux'
        run: sudo apt-get update && sudo apt-get install -y lld

      # The change: Add RUSTFLAGS for Linux to use linker-features
      - name: Run tests (with lld on Linux)
        if: runner.os == 'Linux'
        env:
          RUSTFLAGS: "-Clinker-features=-lld"
        run: cargo test --workspace

      # Existing Windows and Mac steps (unchanged)
      - name: Run tests (Windows & macOS)
        if: runner.os != 'Linux'
        run: cargo test --workspace
  1. add : .cargo/config.toml
    [target.x86_64-unknown-linux-gnu] rustflags = ["-Clinker-features=-lld"]

Copy link
Member

@ivarflakstad ivarflakstad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! 🔥
Same wrt CI here as well

@ivarflakstad
Copy link
Member

@haricot yeah we can try setting the flag in CI.
I'd prefer to not have candle be opinionated wrt the linker, so I'd rather not add it to the config if we can avoid it.
I assume there are some (possibly obscure) reasons why it is not the default.

@EricLBuehler EricLBuehler merged commit 95ea453 into main Nov 25, 2025
10 checks passed
@EricLBuehler EricLBuehler deleted the misc_fork_updates branch November 25, 2025 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants