Stars
Masked Depth Modeling for Spatial Perception
Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
The implementation for ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
The simplest, fastest repository for training/finetuning small-sized VLMs.
verl: Volcano Engine Reinforcement Learning for LLMs
Reference PyTorch implementation and models for DINOv3
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Renderer for the harmony response format to be used with gpt-oss
Kimi K2 is the large language model series developed by Moonshot AI team
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data (ACM MM '25)
[CoRL 2025] Real2Render2Real: Scaling Robot Data Without Dynamics Simulation or Robot Hardware
Realtime & high-frequency control interfaces for the YuMi IRB 14000 bi-manual robot arm including manual tele-operation and autonomous Diffusion Policy controllers
A Modular Toolkit for Robot Kinematic Optimization
Visual Imitation Enables Contextual Humanoid Control. CoRL 2025, Best Student Paper Award.
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
This repository provides a valuable reference for researchers in the field of multimodality. Start exploring RL-based reasoning MLLMs here!
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
NVIDIA Isaac GR00T N1.6 - A Foundation Model for Generalist Robots.
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A Data Streaming Library for Efficient Neural Network Training
FlashInfer: Kernel Library for LLM Serving
Official implementation of Continuous 3D Perception Model with Persistent State


