ML
Large Language Model Text Generation Inference
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and cont…
Claude Engineer is an interactive command-line interface (CLI) that leverages the power of Anthropic's Claude-3.5-Sonnet model to assist with software development tasks. This framework enables Claud…
Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
A Survey on Large Language Model-Based Game Agents
The official Python client for the Hugging Face Hub.
Optimizing inference proxy for LLMs
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
A PyTorch native platform for training generative AI models
PyTorch native quantization and sparsity for training and inference
A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train OpenAI gpt-oss, DeepSeek, Qwen, Llama, Gemma, TTS 2x faster with 70% less VRAM.
Entropy Based Sampling and Parallel CoT Decoding
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
A collection of projects designed to help developers quickly get started with building deployable applications using the Claude API
Prometheus exporter for Starlette and FastAPI
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Curated list of datasets and tools for post-training.
The reinforcement learning training code for AgiBot X1.



