A collection of high-performance CUDA kernels and experiments for learning and optimizing GPU compute primitives.
Updated Jan 21, 2026 - Cuda
CUDA HPC Kernel Optimization Textbook: the complete optimization path from Naive to Tensor Core, covering GEMM, FlashAttention, and quantization.