Stars
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Supporting code for the blog post on modular manifolds.
🚀 Efficient implementations of state-of-the-art linear attention models
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Triton-based distributed compiler for parallel systems
Fast and memory-efficient exact attention (usage sketch after this list)
Ongoing research on training transformer models at scale
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating-point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
SGLang is a high-performance serving framework for large language models and multimodal models.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Analysis of computation-communication overlap in DeepSeek V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Tile primitives for speedy kernels
Important concepts in numerical linear algebra and related areas
CUDA Templates and Python DSLs for High-Performance Linear Algebra
A PyTorch native platform for training generative AI models
Development repository for the Triton language and compiler (kernel sketch after this list)
Tensors and Dynamic neural networks in Python with strong GPU acceleration (autograd sketch after this list)
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more (transformation sketch after this list)
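
The exact-attention entry above reads like the FlashAttention project; assuming that, a minimal usage sketch via the package's `flash_attn_func` entry point follows. The shapes, dtype, and `causal=True` choice are illustrative, not prescribed:

```python
import torch
from flash_attn import flash_attn_func

# Illustrative dimensions; flash-attn expects (batch, seqlen, nheads, headdim)
# half-precision tensors on a CUDA device.
batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Exact (not approximate) attention, computed without materializing the
# full seqlen x seqlen score matrix in GPU memory.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```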
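
For the Triton entry, a minimal sketch of the block-programming model: a vector-add kernel. `add_kernel`, the block size, and the launch grid are illustrative choices, not anything the repo prescribes:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per block of 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```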
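
For the PyTorch entry, a tiny sketch of what "Dynamic neural networks" means in practice: the autograd graph is recorded as ordinary Python executes, so control flow can depend on tensor values. The toy expression is made up:

```python
import torch

x = torch.randn(3, requires_grad=True)
# The branch taken is decided at runtime; autograd records whichever ops ran.
y = (x.relu() ** 2).sum() if x.sum() > 0 else (x ** 2).sum()
y.backward()   # reverse-mode autodiff through the recorded graph
print(x.grad)  # dy/dx

# The same code runs GPU-accelerated by moving tensors, e.g. x.to("cuda").
```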
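
For the JAX entry, a sketch stacking the three transformations its description names: differentiate, vectorize, and JIT-compile. `loss` and every shape here are made up for illustration:

```python
import jax
import jax.numpy as jnp

def loss(w, x):
    # Toy scalar loss standing in for any pure Python+NumPy-style function.
    return jnp.sum(jnp.tanh(x @ w) ** 2)

grad_loss = jax.grad(loss)                        # differentiate w.r.t. w
batched = jax.vmap(grad_loss, in_axes=(None, 0))  # vectorize over a batch of x
fast = jax.jit(batched)                           # compile for CPU/GPU/TPU

w = jnp.ones((4, 2))
xs = jnp.ones((8, 3, 4))   # batch of 8 inputs, each (3, 4)
print(fast(w, xs).shape)   # (8, 4, 2): one gradient per batch element
```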