Official repo for DiffArtist: Towards Structure and Appearance Controllable Image Stylization
DiffArtist is a training-free text-driven image stylization method that stylize in both structure and appearance. You give an image and input a prompt describing the desired style, DiffArtist give you the stylized image in that style. The semantics of the original image and the style is harmonically integrated with the style, and you can easily control the structure and appearance-level style strength.
No need to train, no need to download any ControNets or LoRAs. Just use a pretrained Stable Diffusion.
🔥Jul 05, 2025. DiffArtist is accepted to ACM MM 2025!
🔥Apr 23, 2025. Updated paper, added more comparisons and analysis for the dual controllability in structure and appearance.
🔥Dec 24, 2024. Updated paper, added more comparisons and analysis.
🔥Sep 21, 2024. Add config file for playground-v2 (experimental).
🔥Jul 22, 2024. The paper and inference code is released.
🔥Jul 30, 2024. Updated huggingface demo, thanks for fffiloni!
Clone the repository:
git clone https://github.com/songrise/Artist
Create a virtual environment and install dependencies:
conda create -n artist python=3.8
conda activate artist
pip install -r requirements.txt
For the first time you execute the code, you need to wait for the download of the Stable Diffusion model from the Hugging Face repository.
Run the following command to start the gradio interface:
python injection_main.py --mode app
Visit http://localhost:7860 in your browser to access the interface.
Notice that for some input image you may need to adjust the parameters to have the best result.
You can also run the following command to stylize an image in the command line:
python injection_main.py --mode cli --image_dir data/example/1.png --prompt "A B&W pencil sketch, detailed cross-hatching" --config example_config.yaml
Aside from the Stable Diffusino model 2.1, we now provide a config file for the playground-v2 model, located in ./example_config_playground.yaml. Note that this feature is still experimental. Compared with SD 2.1, it can have better performance on some image/prompt pairs, but it may also have worse performance on some other pairs. Some good examples are shown below:
If you find this repo useful, please consider cite it as this updated version, (the older title was Artist: Aesthetically controllable text-driven stylization without training)
@misc{jiang2024diffartist,
title={DiffArtist: Towards Structure and Appearance Controllable Image Stylization},
author={Ruixiang Jiang and Changwen Chen},
year={2024},
eprint={2407.15842},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

