Stars
[AAAI 2026] QiMeng-Kernel: Macro-Thinking Micro-Coding Paradigm for LLM-Based High-Performance GPU Kernel Generation
Implementation of the NeurIPS 2024 oral paper: Divide-and-Conquer Meets Consensus: Unleashing the Power of Functions in Code Generation
[OSDI'25] QiMeng-Xpiler: Transcompiling Tensor Programs for Deep Learning Systems with a Neural-Symbolic Approach
[ICML '25] Throughput-oriented multi-turn inference engine for KernelBench
[DEPRECATED] Moved to ROCm/rocm-libraries repo
FlagGems is an operator library for large language models implemented in the Triton Language.
Fast and memory-efficient exact attention
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
SGLang is a high-performance serving framework for large language models and multimodal models.
My learning notes on ML systems (MLSys).
A minimal, responsive, and feature-rich Jekyll theme for technical writing.
Optimized softmax implementations in Triton for a variety of cases
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Admin template built with Vue 3.x, Vite 5.x, TypeScript, Vuetify 3.x, and ChatGPT