Stars
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
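For a flavor of what such a from-scratch build starts with, here is a minimal sketch of the next-token-prediction objective and training loop a GPT-style model is trained on. It is an illustrative example, not code from the repository; the vocabulary size, toy model, and random data are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy "bigram" language model: the simplest from-scratch starting point,
# predicting the next token from the current one. Vocabulary size, batch
# shape, and the random training data are placeholder assumptions.
vocab_size = 256
model = nn.Embedding(vocab_size, vocab_size)   # per-token logits for the next token
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

data = torch.randint(0, vocab_size, (4, 33))   # fake byte-level batch
inputs, targets = data[:, :-1], data[:, 1:]    # shift by one: predict token t+1 from token t

for step in range(100):
    logits = model(inputs)                     # (batch, time, vocab)
    loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```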
DRANET is a Kubernetes Network Driver that uses Dynamic Resource Allocation (DRA) to deliver high-performance networking for demanding applications in Kubernetes.
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
Demo project showing a single Rust codebase running on CPU and directly on GPUs
Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration
KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
MCP Server for kubernetes management commands
A Datacenter Scale Distributed Inference Serving Framework
Fast and memory-efficient exact attention
FlashMLA: Efficient Multi-head Latent Attention Kernels
llama3 implementation one matrix multiplication at a time
Work through llama3 inference step by step: grasp the core concepts, master the derivation, and implement the code.
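In that spirit, a minimal sketch of a single causal attention head built one matrix multiplication at a time. The shapes and random weights below are toy placeholders for illustration, not values from the repository.

```python
import torch

torch.manual_seed(0)
seq_len, d_model, d_head = 6, 32, 16

x = torch.randn(seq_len, d_model)        # token embeddings (toy values)
w_q = torch.randn(d_model, d_head)       # projection weights (toy values)
w_k = torch.randn(d_model, d_head)
w_v = torch.randn(d_model, d_head)

q = x @ w_q                              # matmul 1: queries  (seq, d_head)
k = x @ w_k                              # matmul 2: keys
v = x @ w_v                              # matmul 3: values

scores = (q @ k.T) / d_head ** 0.5       # matmul 4: scaled attention scores
mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
scores = scores.masked_fill(~mask, float("-inf"))   # causal mask: no peeking ahead
weights = torch.softmax(scores, dim=-1)

out = weights @ v                        # matmul 5: weighted sum of values
print(out.shape)                         # torch.Size([6, 16])
```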
SGLang is a high-performance serving framework for large language models and multimodal models.
Analyzes resource usage and performance characteristics of running containers.
Add-on agent to generate and expose cluster-level metrics.
SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.
An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.
A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deployment.
A Cloud Native Batch System (Project under CNCF)
🍃 MINT-1T: A one trillion token multimodal interleaved dataset.
A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% of tasks (pass@1) on SWE-bench lite and 46.2% of tasks (pass@1) on SWE-bench verified with…
Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.
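As an illustration of what an OpenAI-compatible server enables, the standard openai Python client can simply be pointed at the local endpoint. The base URL, API key, and model name below are placeholder assumptions, not values from the project.

```python
from openai import OpenAI

# Point the official OpenAI client at a local OpenAI-compatible server.
# URL, key, and model name are placeholders; substitute your own.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="local-model",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```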
Templates to deploy a serverless Minecraft Server on demand in AWS


