🎯
Focusing

Organizations

@RiseAI-Sys


🚀🚀 [Large Models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours! 🌏

Python 38,257 4,586 Updated Jan 18, 2026

NanoGPT (124M) in 2 minutes

Python 4,506 593 Updated Jan 29, 2026

The absolute trainer to light up AI agents.

Python 11,883 972 Updated Jan 27, 2026

vLLM Daily Summarization of Merged PRs

39 3 Updated Jan 29, 2026

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,199 56 Updated Aug 27, 2025

PyTorch-native post-training at scale

Python 601 82 Updated Jan 29, 2026

LeetGPU Challenges

Python 606 50 Updated Jan 26, 2026

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

277 5 Updated Dec 1, 2025

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

390 34 Updated Nov 11, 2025

Triton based sparse quantization attention kernel collection

Python 38 4 Updated Aug 29, 2025

Distributed parallel 3D-Causal-VAE for efficient training and inference

Python 46 3 Updated Aug 20, 2025

High performance inference engine for diffusion models

Python 103 3 Updated Sep 5, 2025

Toolchain built around the Megatron-LM for Distributed Training

Python 84 5 Updated Dec 7, 2025

Rust crates for XetHub

Rust 77 18 Updated Oct 16, 2024

A curated list of recent papers on efficient video attention for video diffusion models, covering sparsification, quantization, caching, and related techniques.

55 4 Updated Oct 27, 2025

Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

424 15 Updated Jul 3, 2025

Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocations such as NCCL, ...)

Python 82 7 Updated Sep 11, 2025

Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.

Python 102 16 Updated Jan 29, 2026

Making Flux go brrr on GPUs.

Python 159 16 Updated Jan 5, 2026

Fastest kernels written from scratch

Cuda 529 64 Updated Sep 18, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,577 470 Updated Jan 29, 2026

Puzzles for learning Triton

Jupyter Notebook 2,268 189 Updated Nov 18, 2024

Yinghan's Code Sample

Cuda 364 62 Updated Jul 25, 2022

CUDA Matrix Multiplication Optimization

Cuda 256 24 Updated Jul 19, 2024

Fast CUDA matrix multiplication from scratch

Cuda 1,035 156 Updated Sep 2, 2025

Easier, quicker command-line CUDA profiling

Shell 44 3 Updated Sep 17, 2024

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 188 51 Updated Jan 28, 2026