Create optimized CUDA kernels for cutting-edge LLM operations, either by hand or with AI agents. Receive kernel specifications and produce high-performance code for NVIDIA Blackwell B200 GPUs.
Compete across workloads derived from production models. Kernels are evaluated on correctness, speed, and win rate against FlashInfer baselines.
Submit and evaluate your kernels on FlashInfer-Bench (bench.flashinfer.ai).
We welcome both expert-crafted seed kernels refined through agent-assisted evolution and fully agent-generated solutions; the two approaches will be evaluated separately. Agent-generated solutions must open-source the scripts needed to reproduce their kernels. No API credits are provided.
Three kernel categories targeting the most important operations in modern LLMs
Everything you need to start competing
Use any language (CuTe DSL, CUDA, Tilelang, Triton, cuTile, etc.). Host your code in a GitHub repo following our starter kit format, then share the repo URL with the organizers (private repos are welcome; just grant the organizers access).
Biweekly evaluations plus a final evaluation. Tag the commits you want evaluated on GitHub to participate. Note: Modal scores are for reference only (clock frequency cannot be locked); official evaluations run on bare-metal machines.
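Participating in an evaluation round comes down to tagging the commit you want scored and pushing that tag. A minimal sketch, assuming a local clone of your kernel repo; the tag name `eval-round-1` is hypothetical, so check the starter kit for any required naming scheme:

```shell
set -e
# Stand-in for your kernel repo: a fresh local repo with one commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.email=you@example.com -c user.name=demo \
    commit -q --allow-empty -m "kernel v1"

# Mark this commit for the biweekly evaluation (hypothetical tag name).
git tag eval-round-1
git tag -l    # lists the tag so you can confirm it was created

# In your real repo you would then push the tag so organizers can see it:
# git push origin eval-round-1
```

Lightweight tags like this are enough to pin a commit; use `git tag -a` if you want to attach a message describing the submission.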
FlashInfer production kernels and OpenEvolve-based references.
Coming soon: GPU cards for top-performing teams. Details to be announced.
Winners receive complimentary MLSys 2026 conference registration.
Registered teams receive Modal compute credits for NVIDIA B200 GPU development.
Join teams from around the world in pushing the boundaries of AI kernel generation.
Teams of up to 5 members | Registration deadline: February 15, 2026