LLM
[TMLR 2024] Efficient Large Language Models: A Survey
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…
ReLE Benchmark: Chinese AI large-model capability evaluation (continuously updated). Currently covers 335 large models, including commercial models such as chatgpt, gpt-5.2, o4-mini, Google gemini-3-pro, Claude-4.5, Baidu ERNIE-X1.1, ERNIE-5.0-Thinking, qwen3-max, Baichuan, iFlytek Spark, and SenseTime SenseChat, as well as kimi-k2, ernie4.5, minimax-M2, deepseek-…
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
An open source AI wearable device that captures what you say and hear in the real world and then transcribes and stores it on your own server. You can then chat with Adeus using the app, and it wil…
🚀🚀 Train a small 26M-parameter GPT completely from scratch in just 2 hours!
FlashMLA: Efficient Multi-head Latent Attention Kernels
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
LLM Frontend for Power Users.
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek-R1, Qwen3, Gemma 3, TTS 2x faster with 70% less VRAM.
Integrate the DeepSeek API into popular software.
verl: Volcano Engine Reinforcement Learning for LLMs
TradingAgents: Multi-Agent LLM Financial Trading Framework
