43 stars written in Python

🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏

Python 38,262 4,586 Updated Jan 18, 2026

The absolute trainer to light up AI agents.

Python 11,886 973 Updated Jan 27, 2026

Nano vLLM

Python 11,210 1,468 Updated Nov 3, 2025

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python 8,979 974 Updated Jul 8, 2025

Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!

Python 8,452 696 Updated Jan 29, 2026

🔎 📈 🐍 💰 Backtest trading strategies in Python.

Python 7,856 1,388 Updated Dec 20, 2025

PyTorch implementations of deep reinforcement learning algorithms and environments

Python 5,921 1,211 Updated Jul 25, 2024

Open-source unified multimodal model

Python 5,606 494 Updated Oct 27, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

Python 4,950 337 Updated Jan 18, 2026

NanoGPT (124M) in 2 minutes

Python 4,507 593 Updated Jan 29, 2026

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,733 262 Updated Dec 18, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,577 471 Updated Jan 29, 2026

Clean, Robust, and Unified PyTorch implementation of popular Deep Reinforcement Learning (DRL) algorithms (Q-learning, Duel DDQN, PER, C51, Noisy DQN, PPO, DDPG, TD3, SAC, ASL)

Python 3,257 389 Updated Jun 11, 2025

Implementations of basic RL algorithms in minimal lines of code (PyTorch-based).

Python 3,132 485 Updated Apr 22, 2023

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,728 207 Updated Jan 29, 2026

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python 2,037 163 Updated Aug 26, 2025

A fork to add multimodal model training to open-r1

Python 1,442 70 Updated Feb 8, 2025

Scalable toolkit for efficient model reinforcement

Python 1,270 231 Updated Jan 29, 2026

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,199 56 Updated Aug 27, 2025

Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs

Python 923 54 Updated Nov 27, 2025

Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.

Python 839 53 Updated May 14, 2025

[ICLR2026] This is the first paper to explore how to effectively use R1-like RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-start initialization and RL training to incen…

Python 753 20 Updated Jan 26, 2026

LeetGPU Challenges

Python 606 50 Updated Jan 26, 2026

PyTorch-native post-training at scale

Python 602 82 Updated Jan 29, 2026

📚 A curated list of Awesome Diffusion Inference Papers with Code: Sampling, Cache, Quantization, Parallelism, etc. 🎉

Python 512 25 Updated Jan 18, 2026

A library to analyze PyTorch traces.

Python 464 78 Updated Jan 22, 2026

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 380 13 Updated Feb 23, 2025

A scalable asynchronous reinforcement learning implementation with in-flight weight updates.

Python 357 34 Updated Jan 27, 2026

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Python 214 9 Updated Sep 26, 2025