Starred repositories
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
A hardware-aware guide to data structures for system software engineers.
An easy-to-understand TensorOp Matmul tutorial
MoE training for Me and You and maybe other people
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Building blocks for foundation models.
Machine Learning library for the emerging Mojo/Python ecosystem
torchcomms: a modern PyTorch communications API
Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling
A workload for deploying LLM inference services on Kubernetes
Inference server benchmarking tool
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
Ship correct and fast LLM kernels to PyTorch
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…
My learning notes for ML SYS.
A massively parallel, high-level programming language
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Generate Python ctypes classes from C headers. Requires LLVM clang.
Performance-portable, length-agnostic SIMD with runtime dispatch