🧮 A collection of resources to learn mathematics for machine learning

5,580 599 Updated Jan 24, 2023

Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs

HTML 784 112 Updated Dec 22, 2025

Efficient Triton Kernels for LLM Training

Python 5,977 455 Updated Dec 25, 2025

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,033 103 Updated Dec 30, 2024

Triton implementation of Flash Attention 2.0

Python 47 6 Updated Jul 31, 2023

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,285 106 Updated Dec 15, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,998 779 Updated Dec 23, 2025

A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations

Python 16,255 1,193 Updated Dec 25, 2025

Minimal hackable GRPO implementation

Python 307 42 Updated Jan 31, 2025

Implementation of papers in 100 lines of code.

Python 1,686 181 Updated Nov 13, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,653 840 Updated Dec 18, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 12,518 1,536 Updated Apr 24, 2025

Instead of running one environment at a time or one per thread, run everything in batch using numpy on a single core.

Jupyter Notebook 5 2 Updated Feb 19, 2018

Fully open reproduction of DeepSeek-R1

Python 25,755 2,407 Updated Nov 24, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,119 339 Updated Dec 25, 2025

A PyTorch native platform for training generative AI models

Python 4,873 652 Updated Dec 24, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,031 588 Updated Dec 22, 2025

FlagGems is an operator library for large language models implemented in the Triton Language.

Python 809 184 Updated Dec 25, 2025

PyTorch native quantization and sparsity for training and inference

Python 2,592 389 Updated Dec 25, 2025

Development repository for the Triton language and compiler

MLIR 17,933 2,469 Updated Dec 25, 2025

Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.

Cuda 397 52 Updated Jan 2, 2025

TORCH_LOGS parser for PT2

Rust 70 22 Updated Nov 10, 2025

A very simple shared memory dict implementation

Python 173 23 Updated Aug 26, 2022

Simple, minimal implementation of the Mamba SSM in one file of PyTorch.

Python 2,904 215 Updated Mar 8, 2024

Seamless operability between C++11 and Python

C++ 17,575 2,253 Updated Dec 25, 2025

Implementation of Denoising Diffusion Probabilistic Model in PyTorch

Python 10,322 1,255 Updated Aug 4, 2025

Denoising Diffusion Probabilistic Models

Python 4,949 464 Updated Aug 29, 2023

A minimal PyTorch implementation of probabilistic diffusion models for 2D datasets.

Jupyter Notebook 968 77 Updated May 7, 2024

An open source implementation of CLIP.

Python 13,160 1,221 Updated Nov 4, 2025

PyTorch Implementation of OpenAI's Image GPT

Python 260 33 Updated Oct 3, 2023