Stars
🚀🚀 [LLM] Train a 26M-parameter GPT fully from scratch in just 2 hours! 🌏
Community maintained hardware plugin for vLLM on Ascend
Fully open reproduction of DeepSeek-R1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
Simba2017 / EasyR1
Forked from hiyouga/EasyR1 — EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
The PYthoN General UnIt Test geNerator is a test-generation tool for Python
verl: Volcano Engine Reinforcement Learning for LLMs
Minimal reproduction of DeepSeek R1-Zero
Everything about the SmolLM and SmolVLM family of models
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
21 Lessons, Get Started Building with Generative AI
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
The Open Cookbook for Top-Tier Code Large Language Model
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization.
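The core of BPE tokenization is repeatedly merging the most frequent adjacent pair of tokens into a new token. A minimal sketch of that merge step (an illustration only, not the repo's code):

```python
from collections import Counter

def most_frequent_pair(ids):
    # Count adjacent token pairs; return the most common one.
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    # Replace every occurrence of `pair` with `new_id`.
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list("aaabdaaabac".encode("utf-8"))
pair = most_frequent_pair(ids)   # (97, 97): "aa" is the most frequent pair
ids = merge(ids, pair, 256)      # introduce token 256 for "aa"
```

Training a full tokenizer just repeats this merge step until the target vocabulary size is reached.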
Code for the paper "Language Models are Unsupervised Multitask Learners"
llama3 implementation one matrix multiplication at a time