Stars
Fast and accurate automatic speech recognition (ASR) for edge devices
Spec-driven development (SDD) for AI coding assistants.
[ICLR 2026] GRAPE: Group Representational Position Encoding (https://arxiv.org/abs/2512.07805)
DeepEP: an efficient expert-parallel communication library
Pretraining and inference code for a large-scale depth-recurrent language model
A comprehensive library for computational molecular biology
A simple but fast implementation of matrix multiplication in CUDA.
Accelerated First Order Parallel Associative Scan
GPU programming related news and material links
Streamlit — A faster way to build and share data apps.
Low Precision Arithmetic Simulation in PyTorch
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
RARS: RISC-V Assembler and Runtime Simulator
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Production infrastructure for machine learning at scale
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Pretrain and finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
ClearML - Auto-Magical CI/CD to streamline your AI workload. Experiment Management, Data Management, Pipeline, Orchestration, Scheduling & Serving in one MLOps/LLMOps solution
Open source tools to generate mypy stubs from protobufs
A framework for managing and maintaining multi-language pre-commit hooks.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Full resolution images of the O RLY book covers made by The Practical Dev

