Official PyTorch implementation of "Rethinking Garment Conditioning in Diffusion-based Virtual Try-On"
- [2025.12.22] 🎉 Inference code and pre-trained models released!
- [2025.11.24] The paper is available on arXiv.
Re-CatVTON is an efficient single-UNet diffusion-based virtual try-on (VTON) framework that revisits how garment information should be used to condition the denoising process.
conda create -n recatvton python=3.12
conda activate recatvton
git clone https://github.com/Levinna/Re-CatVTON.git
cd Re-CatVTON
pip install -r requirements.txt

We trained and tested Re-CatVTON on Python 3.12 and PyTorch 2.8.0 with CUDA 12.9.
The VITON-HD or DressCode dataset is required for inference.
You can generate agnostic masks for the DressCode dataset with preprocess_agnostic_mask.py in the thirdparty folder (credit to CatVTON!):
cd thirdparty
CUDA_VISIBLE_DEVICES=0 python preprocess_agnostic_mask.py \
--data_root_path /path/to/DressCode

Option 1: Load from HuggingFace Hub
python inference_recatvton.py \
--hf_repo levinna/Re-CatVTON \
--hf_subfolder VITON-HD/checkpoint-16000/unet \
--dataset_name vitonhd \
--data_root_path /path/to/VITON-HD \
--output_dir ./output \
--batch_size 16 \
--mixed_precision bf16

Option 2: Load from local path
# First, download the model
hf download levinna/Re-CatVTON --local-dir ./checkpoints # or huggingface-cli download
# Then run inference
python inference_recatvton.py \
--base_model_path ./checkpoints/VITON-HD/checkpoint-16000 \
--dataset_name vitonhd \
--data_root_path /path/to/VITON-HD \
--output_dir ./output \
--batch_size 16 \
--mixed_precision bf16

If your GPU does not support bf16, you can try fp16 or fp32.
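
The precision fallback above can be automated. The following is a minimal sketch, not part of the repository: the helper names `choose_mixed_precision` and `detect_mixed_precision` are hypothetical, and the preference order (bf16, then fp16, then fp32) is an assumption rather than a documented recommendation.

```python
def choose_mixed_precision(cuda_available: bool, bf16_supported: bool) -> str:
    """Pick a value for --mixed_precision (hypothetical helper)."""
    if not cuda_available:
        # No GPU detected: assume full precision is the safest choice.
        return "fp32"
    if bf16_supported:
        return "bf16"
    # Assumed order: try fp16 before falling back to fp32 by hand.
    return "fp16"


def detect_mixed_precision() -> str:
    """Query PyTorch for GPU capabilities; returns fp32 if torch is missing."""
    try:
        import torch
    except ImportError:
        return "fp32"
    cuda = torch.cuda.is_available()
    bf16 = cuda and torch.cuda.is_bf16_supported()
    return choose_mixed_precision(cuda, bf16)
```

The string returned by `detect_mixed_precision()` can then be passed to `--mixed_precision`. If fp16 produces visibly degraded outputs on your hardware, fp32 remains the conservative option.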
| Dataset | HF Subfolder | Resolution |
|---|---|---|
| VITON-HD | VITON-HD/checkpoint-16000/unet | 512×384 |
| DressCode | DressCode/checkpoint-32000/unet | 512×384 |
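
To avoid retyping the subfolder for each dataset, the Option 1 command can be assembled programmatically. This is a sketch only: `build_inference_command` is a hypothetical helper, the flags are the ones shown in the commands above, and the `--dataset_name` value is left to the caller because only `vitonhd` appears in this README.

```python
import shlex

# Subfolders copied from the checkpoint table above.
HF_SUBFOLDERS = {
    "VITON-HD": "VITON-HD/checkpoint-16000/unet",
    "DressCode": "DressCode/checkpoint-32000/unet",
}


def build_inference_command(dataset, dataset_name, data_root_path,
                            output_dir="./output", batch_size=16,
                            mixed_precision="bf16"):
    """Return the Option 1 (HuggingFace Hub) command as an argument list."""
    if dataset not in HF_SUBFOLDERS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [
        "python", "inference_recatvton.py",
        "--hf_repo", "levinna/Re-CatVTON",
        "--hf_subfolder", HF_SUBFOLDERS[dataset],
        "--dataset_name", dataset_name,  # only "vitonhd" is confirmed here
        "--data_root_path", data_root_path,
        "--output_dir", output_dir,
        "--batch_size", str(batch_size),
        "--mixed_precision", mixed_precision,
    ]


cmd = build_inference_command("VITON-HD", "vitonhd", "/path/to/VITON-HD")
print(shlex.join(cmd))  # shell-quoted string, ready to paste or log
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.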
| Argument | Default | Description |
|---|---|---|
| --sampler | ddim | Sampler type: ddim, ddpm, unipc, dpmpp |
| --num_inference_steps | 50 | Number of diffusion steps |
| --guidance_scale | 2.5 | CFG guidance scale |
| --repaint | True | Blend result with original background |
| --eval_pair | True | Evaluate on paired split |
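
For readers unfamiliar with `--guidance_scale`: classifier-free guidance (CFG) usually combines an unconditional and a conditional noise prediction as `uncond + scale * (cond - uncond)`. The sketch below illustrates that standard formula on plain Python lists. It is not taken from this repository's pipeline, `cfg_combine` is a hypothetical name, and how Re-CatVTON builds the two predictions is not described here.

```python
def cfg_combine(uncond, cond, guidance_scale=2.5):
    """Standard CFG mix: uncond + scale * (cond - uncond), element-wise."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


# A scale of 1.0 returns the conditional prediction unchanged;
# larger values push the result further along the conditional direction.
print(cfg_combine([0.0, 1.0], [1.0, 3.0], guidance_scale=2.5))
```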
Recommended steps per sampler:
- ddim: 50 steps (main results)
- unipc: 30 steps
- dpmpp: 25 steps
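
The recommendations above can be kept in one place when scripting runs with several samplers. This is a minimal sketch with a hypothetical helper, `sampler_args`. No recommendation is listed for ddpm, so the sketch assumes the `--num_inference_steps` default of 50 for it.

```python
# Step counts from the recommendations above.
RECOMMENDED_STEPS = {"ddim": 50, "unipc": 30, "dpmpp": 25}
SUPPORTED_SAMPLERS = ("ddim", "ddpm", "unipc", "dpmpp")
DEFAULT_STEPS = 50  # default of --num_inference_steps; assumed for ddpm


def sampler_args(sampler):
    """Return the --sampler / --num_inference_steps arguments for a sampler."""
    if sampler not in SUPPORTED_SAMPLERS:
        raise ValueError(f"unsupported sampler: {sampler!r}")
    steps = RECOMMENDED_STEPS.get(sampler, DEFAULT_STEPS)
    return ["--sampler", sampler, "--num_inference_steps", str(steps)]


for name in SUPPORTED_SAMPLERS:
    print(" ".join(sampler_args(name)))
```

The returned list can be appended to an `inference_recatvton.py` argument list before launching the script.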
| Model | FID ↓ | KID ↓ | LPIPS ↓ | Params (M) |
|---|---|---|---|---|
| CatVTON | 5.888 | 0.513 | 0.061 | 859.5 |
| Leffa | 4.540 | 0.050 | 0.048 | 1802.7 |
| Re-CatVTON (Ours) | 4.438 | 0.010 | 0.047 | 859.5 |
Comparison on the VITON-HD paired setting.
Re-CatVTON/
├── thirdparty/
│ ├── SCHP/
│ ├── DensePose/
│ ├── cloth_masker.py
│ ├── preprocess_agnostic_mask.py
│ └── preprocess_agnostic_mask.sh
├── model/
│ ├── attn_processor.py
│ ├── pipeline.py
│ └── utils.py
├── assets/
├── inference_recatvton.py
├── inference_recatvton.sh
├── vton_datasets.py
├── evaluation.py
├── evaluation.sh
├── requirements.txt
├── LICENSE
└── README.md
- Release inference code
- Release pre-trained models
- HuggingFace Demo
- ComfyUI Support
- Code: CC-BY-NC-SA 4.0
- Model Weights: CC-BY-NC 4.0

Note: The model weights are licensed under CC-BY-NC 4.0 due to the non-commercial usage constraints of the VITON-HD and DressCode datasets.
This project is built upon Diffusers and uses Stable Diffusion v1.5 Inpainting as the base model.
For a fair comparison, our inference data pipeline and evaluation protocol follow those of CatVTON and Leffa.
We thank all the contributors of these projects for their excellent work.
If you find our work helpful, please consider citing:
@article{na2025rethinking,
title={Rethinking Garment Conditioning in Diffusion-based Virtual Try-On},
author={Na, Kihyun and Choi, Jinyoung and Kim, Injung},
journal={arXiv preprint arXiv:2511.18775},
year={2025}
}