Starred repositories
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Making large AI models cheaper, faster and more accessible
You like pytorch? You like micrograd? You love tinygrad! ❤️
SGLang is a high-performance serving framework for large language models and multimodal models.
Machine Learning Engineering Open Book
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
🏔️ A knowledge base of academic papers in the social sciences, economics, mathematics, game theory, philosophy, and systems engineering from National Taiwan University, the National University of Singapore, Waseda University, the University of Tokyo, Academia Sinica (Taiwan), and key Chinese universities and research institutions.
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
FlashInfer: Kernel Library for LLM Serving
Performance-optimized AI inference on your GPUs: unlock it by selecting and tuning the optimal inference engine for your model.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
A library for accelerating Transformer models on NVIDIA GPUs, including support for 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada, and Blackwell GPUs, to provide better performance…
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Several simple examples showing popular neural network toolkits calling custom CUDA operators.
Automatically collect PoC or EXP code from GitHub by CVE ID.
FlagGems is an operator library for large language models implemented in the Triton Language.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Visual Studio Code project/compile_commands.json generator for Linux kernel sources and out-of-tree modules
GLake: optimizing GPU memory management and IO transmission.
