Releases: gittool/llama.cpp

b6122

09 Aug 15:42
34c9d76

CUDA: add attention sinks for tile and wmma (#15178)

* CUDA: add attention sinks for tile and wmma

* Review: formatting changes + remove syncthreads from tile + remove warp_reduce_max from wmma

b6117

08 Aug 08:45
1425f58

CUDA: attention sinks for mma FlashAttention (#15157)