Stars
[ICLR 2026] Learning to Parallel: Accelerating Diffusion Large Language Models via Learnable Parallel Decoding
Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!
PyTorch implementation of PTQ4DiT https://arxiv.org/abs/2405.16005
A Framework of Small-scale Large Multimodal Models
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Open-Sora: Democratizing Efficient Video Production for All
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, all in a user-friendly interface.
Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT workshop
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters
maxin-cn / Latte
Forked from Vchitect/Latte. The official implementation of Latte: Latent Diffusion Transformer for Video Generation.
[ICCV 2025] QuEST: Efficient Finetuning for Low-bit Diffusion Models
Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Strong and Open Vision Language Assistant for Mobile Devices
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Activation-aware Singular Value Decomposition for Compressing Large Language Models
(Work in progress) PyTorch implementation of supervised and Deep Q-Learning EWC (Elastic Weight Consolidation), introduced in "Overcoming Catastrophic Forgetting in Neural Networks"
PB-LLM: Partially Binarized Large Language Models
[ICCV2023] Dataset Quantization
The official GitHub page for the survey paper "A Survey of Large Language Models".
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
Implementation of Post-training Quantization on Diffusion Models (CVPR 2023)
Reorder-based post-training quantization for large language models
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
The official PyTorch implementation of the ICLR2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
The code for the Network Binarization via Contrastive Learning, which has been accepted to ECCV 2022.
PyTorch 1.0 implementation of the approximate Earth Mover's Distance
Using pre-trained Diffusion models as priors for inference tasks

