SenseTime Research
- Shenzhen
Stars
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
This repository contains the official implementation of "The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion".
The repository provides code for running inference with the SAM 3D Body Model (3DB), links for downloading the trained model checkpoints and datasets, and example notebooks that show how to use the…
⚡️SwanLab - an open-source, modern-design AI training tracking and visualization tool. Supports Cloud / Self-hosted use. Integrated with PyTorch / Transformers / verl / LLaMA Factory / ms-swift / U…
🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Text-driven human motion generation surveys, datasets and models.
MotionGPT3: Human Motion as a Second Modality, a MoT-based framework for unified motion understanding and generation
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", SIGGRAPH Asia 2024
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
4DHumans: Reconstructing and Tracking Humans with Transformers
Project page for End-to-end Recovery of Human Shape and Pose
Official implementation of CVPR2020 paper "VIBE: Video Inference for Human Body Pose and Shape Estimation"
Multi-Joint dynamics with Contact. A general purpose physics simulator.
Automated, hardware-independent Hand-Eye Calibration
Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots
Open Source framework for voice and multimodal conversational AI
robomimic: A Modular Framework for Robot Learning from Demonstration
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The …
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.