Anyone may include a project in this list if it uses huggingface/candle.
Official Candle extensions for more specialized kernels, typically without backward equivalents but faster than raw Candle expressions
CublasLt matmul operation for the Candle ML framework with support for bias and Relu/Gelu fusing
Cross-platform browser ML framework leveraging WebGPU for inference with support for Whisper, Phi models, and quantization
Fused Layer Norm operation adapted from Flash Attention with support for dropout, residual, RMSNorm, and hidden dimensions up to 8192
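The RMSNorm variant such a fused kernel computes can be sketched in plain scalar Rust. This is a hypothetical illustration of the formula only, not the project's API, and it omits the dropout/residual fusion the kernel provides:

```rust
/// RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i
/// (scalar reference sketch; a fused kernel computes this in one pass on GPU)
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

fn main() {
    println!("{:?}", rms_norm(&[1.0, 2.0, 3.0], &[1.0, 1.0, 1.0], 1e-5));
}
```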
Optimized rotary embeddings implementation adapted from vLLM project for efficient positional encoding in transformers
Blazingly fast LLM inference platform with all-in-one multimodal workflow support for text, vision, audio, speech, image, and embeddings
Efficient platform for inference and serving local LLMs with OpenAI compatible API server
Efficient and ergonomic LoRA implementation for Candle with out-of-the-box support for many models
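The core LoRA idea is to keep the frozen weight W and add a low-rank update scaled by alpha/r, so the adapted forward pass is y = Wx + (alpha/r)·B(Ax). A minimal scalar Rust sketch of that formula (hypothetical helper names, not this library's API):

```rust
/// LoRA forward for one linear layer: y = W x + (alpha / r) * B (A x)
/// Shapes: W is [out, in], A is [r, in], B is [out, r].
fn lora_forward(
    w: &[Vec<f32>], a: &[Vec<f32>], b: &[Vec<f32>],
    alpha: f32, x: &[f32],
) -> Vec<f32> {
    let scale = alpha / a.len() as f32; // a.len() == rank r
    // A x : the low-rank projection, shape [r].
    let ax: Vec<f32> = a.iter()
        .map(|row| row.iter().zip(x).map(|(p, q)| p * q).sum())
        .collect();
    w.iter().zip(b)
        .map(|(wrow, brow)| {
            let base: f32 = wrow.iter().zip(x).map(|(p, q)| p * q).sum();
            let delta: f32 = brow.iter().zip(&ax).map(|(p, q)| p * q).sum();
            base + scale * delta
        })
        .collect()
}

fn main() {
    // With B initialized to zero (standard LoRA init), y == W x.
    let w = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let a = vec![vec![1.0, 1.0]];
    let b = vec![vec![0.0], vec![0.0]];
    println!("{:?}", lora_forward(&w, &a, &b, 16.0, &[3.0, 4.0]));
}
```

Because B starts at zero, the adapter is a no-op at initialization and only the small A/B matrices are trained.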
Sampling techniques for Candle including multinomial, top-k, top-p, logprobs, repeat penalty, and logit bias
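Top-k sampling, one of the techniques listed, keeps only the k largest logits, renormalizes with a temperature-scaled softmax, and draws from the result. A minimal self-contained Rust sketch (the uniform value is passed in for determinism; this is an illustration of the algorithm, not the crate's interface):

```rust
/// Top-k sampling: keep the k largest logits, softmax with temperature,
/// then sample by inverse CDF using a uniform value u in [0, 1).
fn sample_top_k(logits: &[f32], k: usize, temperature: f32, u: f32) -> usize {
    // Sort token indices by descending logit and keep the top k.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);
    // Temperature-scaled softmax over the kept logits (max-subtracted for stability).
    let max = logits[idx[0]];
    let exps: Vec<f32> = idx.iter()
        .map(|&i| ((logits[i] - max) / temperature).exp())
        .collect();
    let total: f32 = exps.iter().sum();
    // Inverse-CDF draw over the k candidates.
    let mut acc = 0.0;
    for (j, &i) in idx.iter().enumerate() {
        acc += exps[j] / total;
        if u < acc {
            return i;
        }
    }
    *idx.last().unwrap()
}

fn main() {
    // With k = 1 this degenerates to greedy decoding.
    println!("{}", sample_top_k(&[0.0, 3.0, 1.0], 1, 1.0, 0.5));
}
```

Top-p (nucleus) sampling works the same way except the cutoff is the smallest prefix of the sorted distribution whose cumulative probability exceeds p, rather than a fixed k.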
Extension library adding PyTorch functions not currently available in Candle
Runtime for quantized ML inference using WGPU to run models on any accelerator natively or in the browser
Collection of optimizers including SGD with momentum, AdaGrad, AdaDelta, AdaMax, Adam, AdamW, NAdam, RAdam, and RMSprop
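The simplest of these, SGD with momentum, maintains a velocity buffer per parameter: v ← μ·v + g, then p ← p − lr·v. A minimal Rust sketch of one update step (hypothetical function, shown only to illustrate the update rule):

```rust
/// One SGD-with-momentum step: v <- mu * v + g ; p <- p - lr * v
fn sgd_momentum_step(
    params: &mut [f32], velocity: &mut [f32], grads: &[f32],
    lr: f32, momentum: f32,
) {
    for ((p, v), g) in params.iter_mut().zip(velocity.iter_mut()).zip(grads) {
        *v = momentum * *v + g; // accumulate exponentially decayed gradient
        *p -= lr * *v;          // descend along the velocity
    }
}

fn main() {
    let mut p = [1.0_f32];
    let mut v = [0.0_f32];
    sgd_momentum_step(&mut p, &mut v, &[0.5], 0.1, 0.9);
    println!("{:?}", p); // first step equals plain SGD since v started at 0
}
```

Adaptive methods such as Adam and AdamW add per-parameter second-moment estimates on top of this basic scheme.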
The atoma-infer repository provides optimized infrastructure for serving Large Language Model (LLM) inference. It relies on highly optimized KV cache memory management through block pagination, as in PagedAttention and FlashAttention-2. The codebase is written mostly in Rust, enabling safe and highly optimized inference request scheduling.
Candle backend for the Burn deep learning framework, enabling Burn to leverage Candle's performance
Simple CUDA or CPU powered library for creating vector embeddings using Candle and Hugging Face models
Candle-based sentence embedder library and server with OpenAI-compatible API for sentence transformers
Distributed LLM and Stable Diffusion inference framework for mobile, desktop and server with support for LLaMA3
Port of Candle ML framework to Enflame GCU platform for deep learning inference on Enflame hardware accelerators
Large language model inference and chat service framework for Enflame GCU built on Candle-GCU and candle-vllm
Rust native library for Gen AI workflows providing seamless access to RAG pipelines and embeddings
RWKV models and inference implementation with quantization support in Candle for efficient recurrent neural networks
Integration layer between ONNX Runtime and Candle for hardware-accelerated inference with cross-platform support
Kyutai's Moshi speech-text foundation model implementation in Rust/Candle with int8 and bf16 quantization for full-duplex spoken dialogue
Text To Speech interface implemented in pure Rust using Candle over Axum with a Tauri/Leptos WASM frontend
LLM chat interface implemented in pure Rust using Candle over Axum WebSockets with an SQL database and Tauri/Leptos WASM frontend
candle-video is a Rust-native implementation of video generation models, targeting deployment scenarios where startup time, binary size, and memory efficiency matter. It provides inference for state-of-the-art text-to-video models without requiring a Python runtime.
Tutorial project demonstrating Llama inference on GPU using Candle with GGUF format support
Core infrastructure for confidential computing in distributed AI systems using Candle for inference operations
Demo projects showcasing Candle capabilities on GPU instances (AWS Deep Learning AMI)
24/7 local screen and audio recording application using Candle for OCR, voice activity detection, and AI-powered analysis
Basic RWKV implementation in Rust supporting 32, 8 and 4 bit quantized evaluation with PyTorch and SafeTensors model loading
Almost-pure Rust TTS engine with experimental Candle/Torch/Tract support for loading pre-trained SpeedySpeech models
Detailed tutorial showing how to convert PyTorch models to Candle
This project aims to provide Rust code that follows the incredible text, Build An LLM From Scratch by Sebastian Raschka. The book provides arguably the clearest step-by-step walkthrough for building a GPT-style LLM. Listed below are the titles of the book's seven chapters.