Add more misc. changes from candle fork #3196
crate::bail!(
    "The given quantized dtype {:?} is not supported for indexed_moe_forward!",
    self.dtype()
);
Just thinking out loud here. It would be nice to have automatic fallback to an approach that isn't as optimized, but still valid. Perhaps returning Result<Option<(CudaStorage, crate::Shape)>> is a decent starting point?
If None then fallback?
Not thinking we add this in this PR ofc.
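A minimal sketch of the `Result<Option<...>>` idea, for illustration only. Every name here (`QuantDtype`, `fast_path`, `slow_path`, `forward`) is a hypothetical stand-in, not candle's API, and plain `Vec<f32>` replaces `(CudaStorage, crate::Shape)`. The point is that `Ok(None)` means "no optimized kernel for this dtype" while `Err` stays reserved for real failures.

```rust
// Hypothetical stand-ins; these are not candle's real types or functions.
#[derive(Debug, Clone, Copy, PartialEq)]
enum QuantDtype {
    Q4K,
    Q6K,
    Q2K,
}

type Result<T> = std::result::Result<T, String>;

/// Optimized path. Returns `Ok(None)` when the dtype has no fast kernel,
/// so the caller can fall back instead of bailing.
fn fast_path(dtype: QuantDtype, x: &[f32]) -> Result<Option<Vec<f32>>> {
    match dtype {
        // Pretend only these dtypes have an optimized kernel.
        QuantDtype::Q4K | QuantDtype::Q6K => Ok(Some(x.iter().map(|v| v * 2.0).collect())),
        _ => Ok(None),
    }
}

/// Slower but always-valid path (same toy computation as the fast path).
fn slow_path(x: &[f32]) -> Result<Vec<f32>> {
    Ok(x.iter().map(|v| v * 2.0).collect())
}

/// Try the fast path first; on `None`, fall back to the slow path.
fn forward(dtype: QuantDtype, x: &[f32]) -> Result<Vec<f32>> {
    match fast_path(dtype, x)? {
        Some(out) => Ok(out),
        None => slow_path(x),
    }
}
```

With this shape, `forward(QuantDtype::Q2K, ..)` would succeed through the slow path rather than returning an "unsupported dtype" error.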
This might work. The issue is that indexed_moe_forward is effectively a grouped gemm, so a fallback would need existing infrastructure to run a grouped gemm.
Regardless, providing a grouped gemm functionality will be very useful!
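To make "grouped gemm" concrete, here is a naive CPU reference under stated assumptions: each token is routed to one expert, every expert owns a row-major `[n, k]` weight matrix, and tokens are bucketed per expert before multiplying. The function name, argument layout, and single-expert routing are all assumptions for illustration, not how candle's kernel works.

```rust
/// Naive grouped-GEMM reference (hypothetical helper, not candle's API).
/// `x` is row-major [n_tokens, k]; `weights[e]` is row-major [n, k] for expert `e`;
/// `expert_ids[t]` selects the expert for token `t`. Returns row-major [n_tokens, n].
fn grouped_gemm_reference(
    x: &[f32],
    expert_ids: &[usize],
    weights: &[Vec<f32>],
    k: usize,
    n: usize,
) -> Vec<f32> {
    let n_tokens = expert_ids.len();
    assert_eq!(x.len(), n_tokens * k);

    // Bucket token indices by expert: one (variable-sized) group per expert.
    let mut groups: Vec<Vec<usize>> = vec![Vec::new(); weights.len()];
    for (t, &e) in expert_ids.iter().enumerate() {
        groups[e].push(t);
    }

    let mut y = vec![0.0f32; n_tokens * n];
    // One independent matmul per group; a real grouped gemm would batch these.
    for (e, tokens) in groups.iter().enumerate() {
        let w = &weights[e];
        assert_eq!(w.len(), n * k);
        for &t in tokens {
            for row in 0..n {
                let mut acc = 0.0f32;
                for col in 0..k {
                    acc += w[row * k + col] * x[t * k + col];
                }
                y[t * n + row] = acc;
            }
        }
    }
    y
}
```

The groups can have different sizes (including zero), which is what separates this from an ordinary batched matmul.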
Addressed the review comments, the
Co-authored-by: Guoqing Bao <topon@outlook.com>
* Update CI
* I have no clue what was going on with this maturin file, but I don't like it
* update cuda container options
* Add compute cap to cuda wf
* Fix rust toolchain call
* update cuda ci runner and bindgen_cuda
4c3f2be to bdb66f2
For the Ubuntu CI job, the linker seems to have crashed due to lack of memory.
ivarflakstad left a comment
Lgtm! 🔥
Same wrt CI here as well
@haricot yeah we can try setting the flag in CI.
indexed_moe_forward (fast path for ggml quants)
ContextDeviceapis