Stars
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours!
The absolute trainer to light up AI agents.
Understanding R1-Zero-Like Training: A Critical Perspective
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
A collection of Triton-based sparse and quantized attention kernels
Distributed parallel 3D-Causal-VAE for efficient training and inference
High-performance inference engine for diffusion models
Toolchain built around Megatron-LM for distributed training
A curated list of recent papers on efficient attention for video diffusion models, covering sparsification, quantization, caching, and more.
Paper list for the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Utility scripts for PyTorch (e.g., making Perfetto show kernels that would otherwise disappear from traces, a memory profiler that understands lower-level allocations such as NCCL's, ...)
Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.
slime is an LLM post-training framework for RL Scaling.
CUDA Matrix Multiplication Optimization
Fast CUDA matrix multiplication from scratch (the naive baseline such kernels iterate on is sketched after this list)
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
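
For context on the two CUDA GEMM entries above: both build up from the same starting point, a naive kernel in which each thread computes one element of C directly from global memory. Below is a minimal sketch of that baseline; the kernel name, matrix shapes, and launch configuration are illustrative and not taken from either repo.

// Hypothetical minimal sketch: the naive global-memory SGEMM that GEMM
// optimization repos typically use as their baseline before adding tiling,
// shared memory, register blocking, and vectorized loads.
#include <cuda_runtime.h>
#include <cstdio>

// C = A * B, with A (M x K), B (K x N), C (M x N), all row-major.
__global__ void sgemm_naive(int M, int N, int K,
                            const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y; // row of C
    int col = blockIdx.x * blockDim.x + threadIdx.x; // column of C
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)          // one full dot product per thread,
            acc += A[row * K + k] * B[k * N + col]; // all reads from global memory
        C[row * N + col] = acc;
    }
}

int main() {
    const int M = 256, N = 256, K = 256;
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 2.0f;

    dim3 block(16, 16); // 256 threads, one output element each
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, A, B, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * K); // 512.0
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}

Compile with: nvcc sgemm_naive.cu -o sgemm_naive. The repos above then replace the per-thread global-memory dot product with shared-memory tiling and register blocking to close the gap to cuBLAS.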

