Meta
- United States
Stars
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations
Reference PyTorch implementation and models for DINOv3
Stereo4D dataset and processing code
[CVPR'25 Highlight] Official repository of Sonata: Self-Supervised Learning of Reliable Point Representations
Pointcept: Perceive the world with sparse points, a codebase for point cloud perception research. Latest works: Concerto (NeurIPS'25), Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral)
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos, CVPR 2025
projectaria_tools is a C++/Python open-source toolkit to interact with Project Aria data.
Pretrain, finetune ANY AI model of ANY size on 1 or 10,000+ GPUs with zero code changes.
Cramming the training of a (BERT-type) language model into limited compute.
Robust Speech Recognition via Large-Scale Weak Supervision
A library for differentiable nonlinear optimization
Aria data tools provide the open-source toolkit in C++ and Python to interact with data from Project Aria
[TensorFlow] Official implementation of CVPR'20 oral paper - D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features https://arxiv.org/abs/2003.03164
Cost Volume Pyramid Based Depth Inference for Multi-View Stereo (CVPR 2020 Oral)
[CVPR 2020] Local Class-Specific and Global Image-Level Generative Adversarial Networks for Semantic-Guided Scene Generation
[BMVC 2020 Oral] Bipartite Graph Reasoning GANs for Person Image Generation
[ECCV 2020] XingGAN for Person Image Generation
TensorFlow implementation of GeoDesc (ECCV'18), ContextDesc (CVPR'19) and ASLFeat (CVPR'20)
Implementation of CVPR'20 paper - ASLFeat: Learning Local Features of Accurate Shape and Localization
KFNet: Learning Temporal Camera Relocalization using Kalman Filtering (CVPR 2020 Oral)
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks
Implementation of ICCV19 Paper "Learning Two-View Correspondences and Geometry Using Order-Aware Network"
Code for "Point-based Multi-view Stereo Network" (ICCV 2019 Oral) & "Visibility-aware Point-based Multi-view Stereo Network" (TPAMI)
AttentionGAN for Unpaired Image-to-Image Translation & Multi-Domain Image-to-Image Translation
[CVPR 2019 Oral] Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation


