Starred repositories
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
High-Throughput, Cost-Effective Billion-Scale Vector Search with a Single GPU [to appear in SIGMOD'26]
Simulator code of the paper "Dissecting and Modeling the Architecture of Modern GPU Cores"
Unofficial description of the CUDA assembly (SASS) instruction sets.
Smart pointers for the (GNU) C programming language
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
A Coverage Explorer for Reverse Engineers
Implementation of "Beyond Classification: Inferring Function Names in Stripped Binaries via Domain Adapted LLMs" (NDSS'25)
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Tinymist [ˈtaɪni mɪst] is an integrated language service for Typst [taɪpst].
💫 Toolkit to help you get started with Spec-Driven Development
An extremely fast Python type checker and language server, written in Rust.
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
This patch removes the restriction on the maximum number of simultaneous NVENC video encoding sessions that Nvidia imposes on consumer-grade GPUs.
Deploy headless browsers in Docker. Run on our cloud or bring your own. Free for non-commercial uses.
Hook library that enables screen sharing with Tencent Wemeet on Linux Wayland, without the need for virtual cameras.
A fast type checker and language server for Python
GDB-compatible RISC-V Debugger for CH32V003 that runs on a Raspberry Pi Pico
Open Source Inventory Management System
FPGA implementation of a CDR targeting a Xilinx Kintex-7 for data rates up to 250 MHz
Test of the USB3 IP Core from Daisho on a Xilinx device





