Swayam ❤️ Open Source

Organizations

@numpy @microsoft @Azure @conda-forge @The-Asynchronous-Lab

Starred repositories

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

517 42 Updated Jan 15, 2025

A hardware-aware guide to data structures for system software engineers.

1,173 84 Updated Dec 17, 2025

An easy-to-understand TensorOp Matmul tutorial

C++ 403 52 Updated Oct 10, 2025

MoE training for Me and You and maybe other people

Python 313 27 Updated Jan 3, 2026

A Quirky Assortment of CuTe Kernels

Python 734 67 Updated Dec 31, 2025

NanoGPT (124M) in 3 minutes

Python 4,082 544 Updated Jan 3, 2026

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ 1,342 196 Updated Apr 14, 2025

Building blocks for foundation models.

587 28 Updated Jan 3, 2024

Machine Learning library for the emerging Mojo/Python ecosystem

Python 302 11 Updated Jan 3, 2026

torchcomms: a modern PyTorch communications API

C++ 315 52 Updated Jan 3, 2026

PyTorch-native post-training at scale

Python 584 72 Updated Jan 3, 2026

PyTorch Single Controller

Rust 938 123 Updated Jan 2, 2026

RLP: Reinforcement as a Pretraining Objective

220 13 Updated Oct 5, 2025

Dion optimizer algorithm

Python 412 42 Updated Dec 23, 2025

Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

Go 134 31 Updated Dec 31, 2025

A workload for deploying LLM inference services on Kubernetes

Go 153 39 Updated Dec 25, 2025

Inference server benchmarking tool

Rust 134 24 Updated Oct 2, 2025

🔀 yet another mixture of experts

Python 22 2 Updated Sep 19, 2025

The Art of Debugging

Python 1,190 60 Updated Dec 20, 2025

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python 882 72 Updated Dec 23, 2025

Ship correct and fast LLM kernels to PyTorch

Python 128 15 Updated Dec 18, 2025

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 153 15 Updated Nov 11, 2025

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ 432 50 Updated Dec 31, 2025

My learning notes for ML SYS.

Python 4,864 316 Updated Jan 3, 2026

A massively parallel, high-level programming language

Rust 19,127 470 Updated Jun 3, 2025

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 6,170 568 Updated Aug 22, 2025

Generate Python ctypes classes from C headers. Requires LLVM clang.

Python 246 65 Updated Feb 19, 2025

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ 5,237 398 Updated Jan 2, 2026