Stars
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours!
The absolute trainer to light up AI agents.
Understanding R1-Zero-Like Training: A Critical Perspective
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
A collection of Triton-based sparse and quantized attention kernels
Distributed parallel 3D-Causal-VAE for efficient training and inference
High-performance inference engine for diffusion models
Toolchain built around Megatron-LM for distributed training
A curated list of recent papers on efficient attention for video diffusion models, covering sparsification, quantization, caching, and more.
Paper list for the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
Utility scripts for PyTorch (e.g., making Perfetto show kernels that would otherwise disappear from traces, a memory profiler that understands lower-level allocations such as NCCL's, ...)
Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.
slime is an LLM post-training framework for RL Scaling.
CUDA Matrix Multiplication Optimization
Fast CUDA matrix multiplication from scratch (the naive baseline such kernels iterate on is sketched after this list)
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
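
For context on the two CUDA GEMM entries above: both build up from the same starting point, a naive kernel in which each thread computes one element of C directly from global memory. Below is a minimal sketch of that baseline; the kernel name, matrix shapes, and launch configuration are illustrative and not taken from either repo.

// Hypothetical minimal sketch: the naive global-memory SGEMM that GEMM
// optimization repos typically use as their baseline before adding tiling,
// shared memory, register blocking, and vectorized loads.
#include <cuda_runtime.h>
#include <cstdio>

// C = A * B, with A (M x K), B (K x N), C (M x N), all row-major.
__global__ void sgemm_naive(int M, int N, int K,
                            const float *A, const float *B, float *C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y; // row of C
    int col = blockIdx.x * blockDim.x + threadIdx.x; // column of C
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; ++k)          // one full dot product per thread,
            acc += A[row * K + k] * B[k * N + col]; // all reads from global memory
        C[row * N + col] = acc;
    }
}

int main() {
    const int M = 256, N = 256, K = 256;
    float *A, *B, *C;
    cudaMallocManaged(&A, M * K * sizeof(float));
    cudaMallocManaged(&B, K * N * sizeof(float));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = 1.0f;
    for (int i = 0; i < K * N; ++i) B[i] = 2.0f;

    dim3 block(16, 16); // 256 threads, one output element each
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, A, B, C);
    cudaDeviceSynchronize();

    printf("C[0] = %f (expected %f)\n", C[0], 2.0f * K); // 512.0
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}

Compile with: nvcc sgemm_naive.cu -o sgemm_naive. The repos above then replace the per-thread global-memory dot product with shared-memory tiling and register blocking to close the gap to cuBLAS.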

