Starred repositories
Masked Depth Modeling for Spatial Perception
[ICLR 2026] π^3: Permutation-Equivariant Visual Geometry Learning
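"America Against America" (《美国反对美国》) was written by Wang Huning from his observations during a visit to the United States in the late 1980s. China's admiration for the West, and for the United States in particular, ran very high in those years, so it is impressive to find a scholar with such a clear understanding as early as the 1980s. Because only a poor-quality scanned PDF was available online, I set out to convert it into a modern text format using OCR plus proofreading by eye ("human OCR"). The conversion is now fully complete.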
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Easy Data Preparation with latest LLMs-based Operators and Pipelines.
Deep Learning Book Chinese Translation
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
Simple and readable code for training and sampling from diffusion models
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
[SIGGRAPH Asia 2025 (ACM TOG)] AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
Towards Scalable Pre-training of Visual Tokenizers for Generation
Integrating MinerU2.5 into FiftyOne as a Remote Source Zoo Model
Implementing sam3 for images as a Remote Source Zoo Model in FiftyOne
Unifying Variational Autoencoder (VAE) implementations in PyTorch (NeurIPS 2022)
The absolute trainer to light up AI agents.
[CVPR 2025 Highlight] Real-time dense scene reconstruction with SLAM3R
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
[CVPRW oral 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
Contains the public resources for the Hands on GenAI book
Native Multimodal Models are World Learners