- Beijing, China
Starred repositories
A framework for efficient model inference with omni-modality models
HAMi-core compiles libvgpu.so, which enforces hard limits on GPU usage inside containers
System-level intelligent router for Mixture-of-Models in the cloud, in the data center, and at the edge
Fast and memory-efficient exact attention
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
A collection of modern C++ libraries, including coro_http, coro_rpc, compile-time reflection, struct_pack, struct_json, struct_xml, struct_pb, easylog, async_simple, etc.
SGLang is a high-performance serving framework for large language models and multimodal models.
FlashInfer: Kernel Library for LLM Serving
KV cache store for distributed LLM inference
verl: Volcano Engine Reinforcement Learning for LLMs
A Datacenter Scale Distributed Inference Serving Framework
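gopy generates a CPython extension module from a Go package.
AIInfra (AI infrastructure) refers to the full stack that supports training and inference of large AI models, from low-level hardware such as chips up to the upper-layer software stack.
AISystem mainly covers AI systems, i.e. the full stack of low-level AI technologies, including AI chips, AI compilers, and AI inference and training frameworks.
KAI Scheduler is an open-source, Kubernetes-native scheduler for large-scale AI workloads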
LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A lightweight data processing framework built on DuckDB and 3FS.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
Cost-efficient and pluggable infrastructure components for GenAI inference
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A high-throughput and memory-efficient inference and serving engine for LLMs
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Development repository for the Triton language and compiler
Kubernetes WithOut Kubelet - Simulates thousands of Nodes and Clusters.
DLRover: An Automatic Distributed Deep Learning System

