SwayamInSync

Swayam ❤️ Open Source

I saw a dream, and so this story began ✨

132 followers · 196 following

Achievements

x3 x2 x2

Organizations

Starred repositories

KnowingNothing / compiler-and-arch

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

517 stars · 42 forks · Updated Jan 15, 2025

djiangtw / data-structures-in-practice-public

A hardware-aware guide to data structures for system software engineers.

1,173 stars · 84 forks · Updated Dec 17, 2025

KnowingNothing / MatmulTutorial

An Easy-to-understand TensorOp Matmul Tutorial

C++ · 403 stars · 52 forks · Updated Oct 10, 2025

Noumena-Network / nmoe

MoE training for Me and You and maybe other people

Python · 313 stars · 27 forks · Updated Jan 3, 2026

LearningInfiniTensor / TinyInfiniTensor

C++ · 39 stars · 201 forks · Updated Jan 8, 2025

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python · 734 stars · 67 forks · Updated Dec 31, 2025

KellerJordan / modded-nanogpt

NanoGPT (124M) in 3 minutes

Python · 4,082 stars · 544 forks · Updated Jan 3, 2026

tensor-compiler / taco

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

C++ · 1,342 stars · 196 forks · Updated Apr 14, 2025

HazyResearch / aisys-building-blocks

Building blocks for foundation models.

587 stars · 28 forks · Updated Jan 3, 2024

nabla-ml / nabla

Machine Learning library for the emerging Mojo/Python ecosystem

Python · 302 stars · 11 forks · Updated Jan 3, 2026

meta-pytorch / torchcomms

torchcomms: a modern PyTorch communications API

C++ · 315 stars · 52 forks · Updated Jan 3, 2026

meta-pytorch / torchforge

PyTorch-native post-training at scale

Python · 584 stars · 72 forks · Updated Jan 3, 2026

meta-pytorch / monarch

PyTorch Single Controller

Rust · 938 stars · 123 forks · Updated Jan 2, 2026

NVlabs / RLP

RLP: Reinforcement as a Pretraining Objective

220 stars · 13 forks · Updated Oct 5, 2025

deepseek-ai / DeepSeek-V3.2-Exp

Python · 1,401 stars · 118 forks · Updated Nov 18, 2025

microsoft / dion

Dion optimizer algorithm

Python · 412 stars · 42 forks · Updated Dec 23, 2025

ai-dynamo / grove

Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

Go · 134 stars · 31 forks · Updated Dec 31, 2025

sgl-project / rbg

A workload for deploying LLM inference services on Kubernetes

Go · 153 stars · 39 forks · Updated Dec 25, 2025

huggingface / inference-benchmarker

Inference server benchmarking tool

Rust · 134 stars · 24 forks · Updated Oct 2, 2025

drbh / yamoe

🔀 yet another mixture of experts

Python · 22 stars · 2 forks · Updated Sep 19, 2025

stas00 / the-art-of-debugging

The Art of Debugging

Python · 1,190 stars · 60 forks · Updated Dec 20, 2025

MoonshotAI / checkpoint-engine

Checkpoint-engine is a simple middleware to update model weights in LLM inference engines

Python · 882 stars · 72 forks · Updated Dec 23, 2025

meta-pytorch / BackendBench

Ship correct and fast LLM kernels to PyTorch

Python · 128 stars · 15 forks · Updated Dec 18, 2025

IST-DASLab / qutlass

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ · 153 stars · 15 forks · Updated Nov 11, 2025

NVIDIA / nvshmem

NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmer…

C++ · 432 stars · 50 forks · Updated Dec 31, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes for ML SYS.

Python · 4,864 stars · 316 forks · Updated Jan 3, 2026

HigherOrderCO / Bend

A massively parallel, high-level programming language

Rust · 19,127 stars · 470 forks · Updated Jun 3, 2025

meta-pytorch / gpt-fast

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python · 6,170 stars · 568 forks · Updated Aug 22, 2025

trolldbois / ctypeslib

Generate Python ctypes classes from C headers. Requires LLVM clang.

Python · 246 stars · 65 forks · Updated Feb 19, 2025

google / highway

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ · 5,237 stars · 398 forks · Updated Jan 2, 2026

Lists (9)

Compilers and Interpreters

Cool CUDA libraries

CUDA/CUTLASS/NCCL

DType

JAX

Optimized Kernels

Reading Collection

Torch

TuneX
