Stars
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2h! 🌏
The absolute trainer to light up AI agents.
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
🔎 📈 🐍 💰 Backtest trading strategies in Python.
PyTorch implementations of deep reinforcement learning algorithms and environments
📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉
A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
slime is an LLM post-training framework for RL Scaling.
Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)
Implementations of basic RL algorithms with minimal lines of code! (PyTorch-based)
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
Minimalistic 4D-parallelism distributed training framework for educational purposes
A fork to add multimodal model training to open-r1
Scalable toolkit for efficient model reinforcement
Understanding R1-Zero-Like Training: A Critical Perspective
Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…
📚 A curated list of Awesome Diffusion Inference Papers with Code: Sampling, Cache, Quantization, Parallelism, etc. 🎉
A library to analyze PyTorch traces.
✨First Open-Source R1-like Video-LLM [2025/02/18]
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

