🏠
Working from home
  • nvidia
  • Dallas, TX
  • UTC -06:00
  • X @sara4dev

Organizations

@meygam

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 81,923 12,279 Updated Dec 27, 2025
Python 711 76 Updated Dec 21, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 781 208 Updated Dec 29, 2025

DRANET is a Kubernetes Network Driver that uses Dynamic Resource Allocation (DRA) to deliver high-performance networking for demanding applications in Kubernetes.

Go 159 24 Updated Dec 9, 2025

Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚

Python 38,854 2,607 Updated Dec 29, 2025

kernels, of the mega variety

Python 636 35 Updated Sep 28, 2025

Demo project showing a single Rust codebase running on CPU and directly on GPUs

Rust 466 12 Updated Aug 8, 2025

Open, Multi-Cloud, Multi-Cluster Kubernetes Orchestration

Go 5,217 1,035 Updated Dec 29, 2025

KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale

Go 1,043 128 Updated Dec 25, 2025

MCP Server for kubernetes management commands

TypeScript 1,232 207 Updated Dec 28, 2025

Blazingly fast LLM inference.

Rust 6,312 498 Updated Dec 19, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,696 756 Updated Dec 29, 2025

Fast and memory-efficient exact attention

Python 21,344 2,257 Updated Dec 26, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,941 924 Updated Dec 15, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,205 1,289 Updated May 23, 2024

Achieve the llama3 inference step-by-step, grasp the core concepts, master the process derivation, implement the code.

Jupyter Notebook 613 51 Updated Feb 24, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 22,059 3,897 Updated Dec 29, 2025

tooling to verify resources associated to containers

Go 1 Updated Jan 29, 2025

Analyzes resource usage and performance characteristics of running containers.

Go 18,716 2,437 Updated Dec 25, 2025

Add-on agent to generate and expose cluster-level metrics.

Go 6,023 2,145 Updated Dec 29, 2025

SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines.

C++ 1,792 77 Updated Jun 16, 2025

Docker-based inference engine for AMD GPUs

Python 231 8 Updated Oct 7, 2024

An Operator for deployment and maintenance of NVIDIA NIMs and NeMo microservices in a Kubernetes environment.

Go 140 36 Updated Dec 17, 2025

A collection of YAML files, Helm Charts, Operator code, and guides to act as an example reference implementation for NVIDIA NIM deployment.

Jupyter Notebook 216 92 Updated Dec 18, 2025

A Cloud Native Batch System (Project under CNCF)

Go 5,201 1,243 Updated Dec 29, 2025

🍃 MINT-1T: A one trillion token multimodal interleaved dataset.

829 18 Updated Jul 31, 2024

A project structure aware autonomous software engineer aiming for autonomous program improvement. Resolved 37.3% tasks (pass@1) in SWE-bench lite and 46.2% tasks (pass@1) in SWE-bench verified with…

Python 3,044 333 Updated Apr 24, 2025

Efficient platform for inference and serving local LLMs including an OpenAI compatible API server.

Rust 552 64 Updated Dec 24, 2025

Templates to deploy a serverless Minecraft Server on demand in AWS

TypeScript 1,782 132 Updated Aug 4, 2023