- PaddlePaddle
- Shanghai
- 15:17 (UTC +08:00)
- https://enigmatisms.github.io/owner-info/
Lists (17)
- C++: Excellent C++ projects.
- CUDA: Some CUDA repos.
- DeepLearning: Deep learning based algorithms and repos.
- Game Engine: Engine engineering.
- Graphics: Computer graphics related.
- HPC: High performance computing.
- Interesting: Interesting little projects or games.
- Learning: Excellent learning resources.
- Lightning: Fast! Make your code faster.
- LLMs: Interesting LLMs.
- Math Algos: Mathematically based algorithms.
- NeRF&GS: NeRF and 3D Gaussian.
- Python: Interesting Python repos and utilities.
- Re-implementation: "My-own" series.
- Rust: Repos about the Rust programming language.
- SLAM: Repos of SLAM.
- Utilities: Speed boosters (CUDA) and useful repos.
Starred repositories
qqr is an RL training framework for open-ended agents.
A tool for bandwidth measurements on NVIDIA GPUs.
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
Machine Learning Engineering Open Book
High-performance, light-weight C++ LLM and VLM Inference Software for Physical AI
Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers"
CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning
Open-source release accompanying Gao et al. 2025
verl: Volcano Engine Reinforcement Learning for LLMs
Accelerating MoE with IO and Tile-aware Optimizations
Latest 2021 roundup of recommended reading for engineers: computer science, software technology, entrepreneurship, ideas and philosophy, mathematics, and biographies.
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Helpful kernel tutorials and examples for tile-based GPU programming
An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.
An early research stage expert-parallel load balancer for MoE models based on linear programming.
A cinematic Git commit replay tool for the terminal, turning your Git history into a living, animated story.
Enjoy the magic of Diffusion models!
Large-Area Fabrication-Aware Computational Diffractive Optics (SIGGRAPH Asia & TOG 2025)
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
Unifying 3D Mesh Generation with Language Models
A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
Core Functional Library for Distributed Training