A PyTorch-native and Flexible Inference Engine with
Hybrid Cache Acceleration and Parallelism for 🤗DiTs
| Baseline | SCM S S* | SCM F D* | SCM U D* | +TS | +compile | +FP8* |
|---|---|---|---|---|---|---|
| 24.85s | 15.4s | 11.4s | 8.2s | 8.2s | 🎉7.1s | 🎉4.5s |
Scheme: DBCache + SCM (steps computation mask) + TS (TaylorSeer) + FP8*, FLUX.1-Dev on L20x1. Legend: S*: static cache; D*: dynamic cache; S: Slow; F: Fast; U: Ultra Fast; TS: TaylorSeer; FP8*: FP8 DQ + Sage.
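As a rough mental model of what SCM (steps computation mask) means (a toy sketch, not cache-dit's actual implementation), think of a per-step boolean schedule over the denoising loop: masked-on steps run the full transformer, masked-off steps reuse cached features. The `make_steps_mask` helper and its parameters below are hypothetical.

```python
def make_steps_mask(num_steps: int, warmup: int = 4, compute_every: int = 3) -> list[bool]:
    """Toy steps-computation mask: always compute the first `warmup` steps
    (early steps shape the image most), then fully compute only every
    `compute_every`-th step; the remaining steps reuse cached features."""
    mask = []
    for step in range(num_steps):
        compute = step < warmup or (step - warmup) % compute_every == 0
        mask.append(compute)
    return mask

# 12 denoising steps: only the True entries pay the full transformer cost.
mask = make_steps_mask(num_steps=12, warmup=4, compute_every=3)
```

A static cache would fix this schedule up front; a dynamic cache would instead decide per step at runtime, e.g. from a residual-difference signal.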
We are excited to announce that 🎉v1.1.0 of cache-dit has finally been released! It brings 🔥Context Parallelism and 🔥Tensor Parallelism to cache-dit, making it a PyTorch-native and flexible inference engine for 🤗DiTs. Key features: Unified Cache APIs, Forward Pattern Matching, Block Adapter, DBCache, DBPrune, Cache CFG, TaylorSeer, SCM, Context Parallelism (w/ UAA), Tensor Parallelism, and 🎉SOTA performance.
You can install the stable release of cache-dit from PyPI, or the latest development version from GitHub:

```bash
pip3 install -U cache-dit
# Optionally, install the latest diffusers from source:
pip3 install git+https://github.com/huggingface/diffusers.git
```

Then try:

```python
>>> import cache_dit
>>> from diffusers import DiffusionPipeline
>>> pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image")  # Can be any diffusion pipeline
>>> cache_dit.enable_cache(pipe)  # One line of code with default cache options.
>>> output = pipe(...)  # Just call the pipe as normal.
>>> stats = cache_dit.summary(pipe)  # Then, get a summary of cache acceleration stats.
>>> cache_dit.disable_cache(pipe)  # Disable cache and run the original pipe.
```

- 🎉Full 🤗Diffusers Support: Notably, cache-dit now supports nearly all of Diffusers' DiT-based pipelines, covering 30+ series and ~100+ pipelines: 🔥FLUX, 🔥Qwen-Image, 🔥Z-image, 🔥LongCat-Image, 🔥Wan, etc.
- 🎉Extremely Easy to Use: In most cases, you only need one line of code: `cache_dit.enable_cache(...)`. After calling this API, just use the pipeline as normal.
- 🎉State-of-the-Art Performance: Compared with other algorithms, cache-dit achieves SOTA results with a 7.4x↑🎉 speedup on ClipScore! Surprisingly, its DBCache also works for extremely few-step distilled models.
- 🎉Compatibility with Other Optimizations: Designed to work seamlessly with torch.compile, Quantization, CPU or Sequential Offloading, 🔥Context Parallelism, 🔥Tensor Parallelism, etc.
- 🎉Hybrid Cache Acceleration: Now supports hybrid Block-wise Cache + Calibrator schemes. DBCache acts as the Indicator to decide when to cache, while the Calibrator decides how to cache.
- 🎉HTTP Serving Support: Built-in HTTP serving capabilities for production deployment with simple REST API. Supports text-to-image, image editing, text/image-to-video, and LoRA.
- 🎉Ecosystem Integration: Joined the Diffusers community as the first DiTs' cache acceleration framework for 🤗diffusers, 🔥SGLang Diffusion, 🔥vLLM-Omni and 🔥stable-diffusion.cpp.
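As a rough mental model of the Indicator/Calibrator split described above (a toy sketch, not cache-dit's actual code): the indicator compares the current residual against the cached one and decides whether full recomputation is needed; when it is skipped, a calibrator estimates the current residual from cached history, here via a first-order extrapolation in the spirit of TaylorSeer. All function names and the threshold value are hypothetical.

```python
import numpy as np

def should_compute(curr_probe, cached_probe, rel_threshold=0.05):
    """Indicator (toy): recompute only when the probe residual drifts
    too far from the cached one, relative to its magnitude."""
    diff = np.linalg.norm(curr_probe - cached_probe)
    base = np.linalg.norm(cached_probe) + 1e-8
    return diff / base > rel_threshold

def taylor_extrapolate(prev, prev_prev):
    """Calibrator (toy): first-order extrapolation from the two most
    recent cached residuals, in the spirit of TaylorSeer."""
    return prev + (prev - prev_prev)

# Small drift -> the indicator says "reuse cache"; the calibrator then
# estimates the current residual from history instead of recomputing.
cached = np.array([1.0, 2.0])
current = np.array([1.001, 2.0])
if not should_compute(current, cached):
    estimate = taylor_extrapolate(prev=current, prev_prev=cached)
```

The point of the split is that "when to cache" (a cheap comparison) and "how to cache" (a cheap extrapolation) can be tuned independently, which is what makes the hybrid Block-wise Cache + Calibrator schemes composable.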
Tip
✅: supported; ✖️: not yet supported; Q: nunchaku; C-P: Context Parallelism; T-P: Tensor Parallelism; TE-P: Text Encoder Parallelism; CN-P: ControlNet Parallelism; VE-P: VAE Parallelism.
| 📚Model | Cache | C-P | T-P | TE-P | CN-P | VE-P |
|---|---|---|---|---|---|---|
| 🔥LongCat-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🔥LongCat-Image-Edit | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🔥Z-Image-Turbo | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🔥Z-Image-ControlNet | ✅ | ✅ | ✅ | ✅ | ✅ | ✖️ |
| 🔥Ovis-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🔥HunyuanVideo-1.5 | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🔥FLUX.2-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉FLUX.1-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉FLUX.1-Fill-dev | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Edit | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-ControlNet | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Edit-Lightning | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Wan-2.2 T2V/ITV | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Wan-2.2 VACE | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Wan-2.1 T2V/ITV | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Wan-2.1 VACE | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉HunyuanImage | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉HunyuanVideo | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉FLUX.1-dev Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉FLUX.1-Fill-dev Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Edit Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Lightning Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Qwen-Image-Edit-Lightning Q | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉SkyReelsV2 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉LongCatVideo | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉ChronoEdit | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Kandinsky-5 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉PRX-T2I | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉LTXVideo | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉CogVideoX | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉CogVideoX-1.5 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉CogView4 | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉CogView3Plus | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉PixArt Sigma | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉PixArt Alpha | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Chroma-HD | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉VisualCloze | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉ConsisID | ✅ | ✅ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Mochi | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Lumina 1/2 | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉HiDream | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉HunyuanDiT | ✅ | ✖️ | ✅ | ✅ | ✖️ | ✖️ |
| 🎉Sana | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Bria | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉DiT-XL | ✅ | ✅ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Allegro | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Cosmos | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉OmniGen | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉EasyAnimate | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉StableDiffusion-3 | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉Amused | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
| 🎉AuraFlow | ✅ | ✖️ | ✖️ | ✅ | ✖️ | ✖️ |
🎉Now, cache-dit covers almost all of Diffusers' DiT pipelines🎉
🔥Qwen-Image | Qwen-Image-Edit | Qwen-Image-Edit-Plus 🔥
🔥FLUX.1 | Qwen-Image-Lightning 4/8 Steps | Wan 2.1 | Wan 2.2 🔥
🔥HunyuanImage-2.1 | HunyuanVideo | HunyuanDiT | HiDream | AuraFlow🔥
🔥CogView3Plus | CogView4 | LTXVideo | CogVideoX | CogVideoX 1.5 | ConsisID🔥
🔥Cosmos | SkyReelsV2 | VisualCloze | OmniGen 1/2 | Lumina 1/2 | PixArt🔥
🔥Chroma | Sana | Allegro | Mochi | SD 3/3.5 | Amused | ... | DiT-XL🔥
🔥Wan2.2 MoE | +cache-dit:2.0x↑🎉 | HunyuanVideo | +cache-dit:2.1x↑🎉
🔥Qwen-Image | +cache-dit:1.8x↑🎉 | FLUX.1-dev | +cache-dit:2.1x↑🎉
🔥Qwen...Lightning | +cache-dit:1.14x↑🎉 | HunyuanImage | +cache-dit:1.7x↑🎉
🔥Qwen-Image-Edit | Input w/o Edit | Baseline | +cache-dit:1.6x↑🎉 | 1.9x↑🎉
🔥FLUX-Kontext-dev | Baseline | +cache-dit:1.3x↑🎉 | 1.7x↑🎉 | 2.0x↑ 🎉
🔥HiDream-I1 | +cache-dit:1.9x↑🎉 | CogView4 | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥CogView3 | +cache-dit:1.5x↑🎉 | 2.0x↑🎉| Chroma1-HD | +cache-dit:1.9x↑🎉
🔥Mochi-1-preview | +cache-dit:1.8x↑🎉 | SkyReelsV2 | +cache-dit:1.6x↑🎉
🔥VisualCloze-512 | Model | Cloth | Baseline | +cache-dit:1.4x↑🎉 | 1.7x↑🎉
🔥LTX-Video-0.9.7 | +cache-dit:1.7x↑🎉 | CogVideoX1.5 | +cache-dit:2.0x↑🎉
🔥OmniGen-v1 | +cache-dit:1.5x↑🎉 | 3.3x↑🎉 | Lumina2 | +cache-dit:1.9x↑🎉
🔥Allegro | +cache-dit:1.36x↑🎉 | AuraFlow-v0.3 | +cache-dit:2.27x↑🎉
🔥Sana | +cache-dit:1.3x↑🎉 | 1.6x↑🎉| PixArt-Sigma | +cache-dit:2.3x↑🎉
🔥PixArt-Alpha | +cache-dit:1.6x↑🎉 | 1.8x↑🎉| SD 3.5 | +cache-dit:2.5x↑🎉
🔥Amused | +cache-dit:1.1x↑🎉 | 1.2x↑🎉 | DiT-XL-256 | +cache-dit:1.8x↑🎉
- 📊Examples - The easiest way to enable hybrid cache acceleration and parallelism for DiTs with cache-dit is to start with our examples for popular models: FLUX, Z-Image, Qwen-Image, Wan, etc.
- 🌐HTTP Serving - Deploy cache-dit models with HTTP API for text-to-image, image editing, multi-image editing, and text/image-to-video generation.
- 🎉User Guide - For more advanced features, please refer to the 🎉User_Guide.md for details.
- ❓FAQ - Frequently asked questions including attention backend configuration, troubleshooting, and optimization tips.
- ⚙️Installation
- 🔥Supported DiTs
- 🔥Benchmarks
- 🎉Unified Cache APIs
- ⚡️DBCache: Dual Block Cache
- ⚡️DBPrune: Dynamic Block Prune
- ⚡️Hybrid Cache CFG
- 🔥Hybrid TaylorSeer Calibrator
- 🤖SCM: Steps Computation Masking
- ⚡️Hybrid Context Parallelism
- 🤖UAA: Ulysses Anything Attention
- 🤖Async Ulysses QKV Projection
- 🤖Async FP8 Ulysses Attention
- ⚡️Hybrid Tensor Parallelism
- 🤖Parallelize Text Encoder
- 🤖Low-bits Quantization
- 🤖How to use FP8 Attention
- 🛠Metrics Command Line
- ⚙️Torch Compile
- 📊Torch Profiler Usage
- 📚API Documents
How to contribute? Star ⭐️ this repo to support us or check CONTRIBUTE.md.
Here is a curated list of open-source projects integrating cache-dit, including popular repositories such as jetson-containers, flux-fast, sdnext, 🔥stable-diffusion.cpp, 🔥vLLM-Omni, and 🔥SGLang Diffusion. 🎉cache-dit has also been recommended by many well-known open-source projects: 🔥Z-Image, 🔥Wan 2.2, 🔥Qwen-Image, 🔥LongCat-Video, Qwen-Image-Lightning, Kandinsky-5, LeMiCa, 🤗diffusers, HelloGitHub, and GiantPandaLLM.
Special thanks to vipshop's Computer Vision AI Team for supporting the documentation, testing, and production-level deployment of this project. We learned from the design of, and reused code from, the following projects: 🤗diffusers, SGLang, ParaAttention, xDiT, TaylorSeer, and LeMiCa.
@misc{cache-dit2025,
  title={cache-dit: A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs},
  url={https://github.com/vipshop/cache-dit.git},
  note={Open-source software available at https://github.com/vipshop/cache-dit.git},
  author={DefTruth and vipshop.com},
  year={2025}
}





