Official implementation of UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy

UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy

Overview

UniCalli is a unified diffusion framework for column-level generation and recognition of Chinese calligraphy. Existing methods either generate isolated characters or trade calligraphic correctness for page-level synthesis. UniCalli instead handles recognition and generation in a single model, which the authors report improves both stylistic fidelity and structural accuracy.

Key Features

  • Unified Architecture: First framework to unify column-level calligraphy generation and recognition
  • Multi-Master Styles: Supports diverse calligraphic styles, including Wang Xizhi, Yan Zhenqing, Ouyang Xun, etc.
  • Densely Annotated Data: Trained on a large-scale calligraphy dataset with detailed annotations

Licence

For academic research and non-commercial use only. For commercial use, please contact the authors.

TODO List

  • Model Release - Base version without pred_box
  • Inference Code
  • Interactive Demo
  • Dataset Release
  • Training Code

Getting Started

Installation

git clone https://github.com/EnVision-Research/UniCalli.git
cd UniCalli
pip install -r requirements.txt

Download Model

Download the pretrained model from Hugging Face:

# Using huggingface-cli
huggingface-cli download TSXu/UniCalli-base unicalli-base_cleaned.bin --local-dir ./checkpoints

Or from ModelScope:

# Using modelscope
pip install modelscope
python -c "from modelscope import snapshot_download; snapshot_download('tianshuo/UniCalli-base', local_dir='./checkpoints')"

Download Other Components

The model also requires the following additional components:

# InternVL3-1B:
https://huggingface.co/OpenGVLab/InternVL3-1B

# Fangzheng TTF:
https://www.fonts.net.cn/font-31659110985.html
MD5: 579e8932d773f5f58ebb2c643aa89ba9
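
The MD5 listed above can be checked once the font is downloaded. The following is a minimal sketch using Python's standard `hashlib`; the file name `fangzheng.ttf` is a placeholder assumption, since the name of the downloaded file is not stated here, and the helper functions are illustrative rather than part of the repository.

```python
import hashlib
from pathlib import Path

# Expected digest, copied from the "Fangzheng TTF" entry above.
EXPECTED_MD5 = "579e8932d773f5f58ebb2c643aa89ba9"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def font_matches(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    font_path = Path("fangzheng.ttf")  # hypothetical file name
    if font_path.exists():
        print("MD5 OK" if font_matches(font_path) else "MD5 mismatch")
    else:
        print(f"{font_path} not found")
```

A mismatch most likely means a different font file or an incomplete download.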

Usage

Use the Python API directly:

from inference import CalligraphyGenerator

generator = CalligraphyGenerator(
    model_name="flux-dev",
    device="cuda",
    offload=False,
    intern_vlm_path="path/to/InternVL3-1B",
    checkpoint_path="unicalli-base_cleaned.bin",
    font_descriptions_path='dataset/chirography.json',
    author_descriptions_path='dataset/calligraphy_styles_en.json'
)

image, cond_img = generator.generate(
    text="生日快乐喵",  # Must be 5 characters
    font_style="楷",    # 楷(Regular)/草(Cursive)/行(Running)
    author="赵佶",    # Or None to use synthetic style
    save_path="output.png",
    num_steps=39,
    seed=1128293374,
)

Using DeepSpeed for Memory Optimization

For large models or limited GPU memory, you can use DeepSpeed ZeRO:

from inference import CalligraphyGenerator

generator = CalligraphyGenerator(
    model_name="flux-dev",
    device="cuda",
    offload=False,  # DeepSpeed manages memory
    intern_vlm_path="path/to/InternVL3-1B",
    checkpoint_path="unicalli-base_cleaned.bin",
    font_descriptions_path='dataset/chirography.json',
    author_descriptions_path='dataset/calligraphy_styles_en.json',
    use_deepspeed=True,
    deepspeed_config="ds_config_zero2.json"
)

image, cond_img = generator.generate(
    text="生日快乐喵",  # Must be 5 characters
    font_style="楷",    # 楷(Regular)/草(Cursive)/行(Running)
    author="赵佶",    # Or None to use synthetic style
    save_path="output.png",
    num_steps=39,
    seed=1128293374,
)
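
Before launching, it can be useful to confirm which ZeRO stage the config file actually declares. The sketch below assumes `ds_config_zero2.json` follows the usual DeepSpeed JSON layout, with the stage stored under `zero_optimization.stage`; the file's contents are not shown in this README, and `zero_stage` is a hypothetical helper, not part of the repository.

```python
import json


def zero_stage(config_path):
    """Return the ZeRO stage declared in a DeepSpeed JSON config, or None."""
    with open(config_path, encoding="utf-8") as fh:
        config = json.load(fh)
    # Assumption: the stage lives under the "zero_optimization" section.
    return config.get("zero_optimization", {}).get("stage")
```

For a file named `ds_config_zero2.json`, one would expect `zero_stage("ds_config_zero2.json")` to return `2`; a result of `None` means the section is missing.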

Supported Font Styles

  • 楷 (Regular Script / Kaishu): Standard, block-style characters
  • 行 (Running Script / Xingshu): Semi-cursive, flowing style
  • 草 (Cursive Script / Caoshu): Highly cursive, artistic style
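
Since `font_style` takes a single Chinese character, a small lookup can catch typos before a long generation run. This is a sketch only: `FONT_STYLES` and `check_font_style` are hypothetical helpers built from the list above, and whether `generate` performs its own validation is not documented here.

```python
# Style keys and English names as listed in this README.
FONT_STYLES = {
    "楷": "Regular Script (Kaishu)",
    "行": "Running Script (Xingshu)",
    "草": "Cursive Script (Caoshu)",
}


def check_font_style(font_style):
    """Return the English name for a supported style key, or raise ValueError."""
    try:
        return FONT_STYLES[font_style]
    except KeyError:
        supported = ", ".join(FONT_STYLES)
        raise ValueError(
            f"Unsupported font_style {font_style!r}; expected one of: {supported}"
        ) from None
```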

Supported Calligraphy Masters

The model supports the styles of various historical calligraphy masters, including:

  • 王羲之 (Wang Xizhi) - "Sage of Calligraphy"
  • 颜真卿 (Yan Zhenqing) - Tang Dynasty master
  • 欧阳询 (Ouyang Xun) - One of the Four Great Masters of regular script
  • 赵佶 (Zhao Ji, Emperor Huizong of Song) - Song Dynasty emperor and calligrapher
  • And many more...

You can also use author=None to generate in a synthetic, averaged style.
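
To compare several masters on the same text, the calls can be prepared in a loop. The sketch below only builds the keyword arguments, using the parameter names from the Usage example; `build_requests` is a hypothetical helper, and it assumes author names are passed in Chinese, as `"赵佶"` is in that example.

```python
def build_requests(text, font_style, authors, seed=0, num_steps=39):
    """Build one keyword-argument dict per author for generator.generate()."""
    requests = []
    for author in authors:
        # author=None selects the synthetic, averaged style.
        label = author if author is not None else "synthetic"
        requests.append(
            dict(
                text=text,
                font_style=font_style,
                author=author,
                save_path=f"output_{label}.png",
                num_steps=num_steps,
                seed=seed,
            )
        )
    return requests


requests = build_requests("生日快乐喵", "楷", ["王羲之", "赵佶", None], seed=1128293374)
# With a loaded generator (see Usage), each request would be run as:
# for kwargs in requests:
#     image, cond_img = generator.generate(**kwargs)
```

Keeping the seed fixed across authors makes the outputs easier to compare side by side.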

Model Details

  • Base Architecture: FLUX diffusion model
  • Model Size: ~23GB
  • Input: Text (5 characters), font style, author style
  • Output: Column-level calligraphy image
  • Training Data: Large-scale Chinese calligraphy dataset with dense annotations
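
Because the input is fixed at 5 characters, longer passages have to be cut into columns by the caller. How the authors intend longer text to be handled is not stated, so the following is one possible sketch: `split_into_columns` is a hypothetical helper that returns the full 5-character columns and leaves any shorter tail for the caller to deal with.

```python
COLUMN_LENGTH = 5  # from "Input: Text (5 characters)"


def split_into_columns(text, size=COLUMN_LENGTH):
    """Split text into full `size`-character columns plus any leftover tail."""
    full = len(text) - len(text) % size
    columns = [text[i:i + size] for i in range(0, full, size)]
    remainder = text[full:]
    return columns, remainder
```

Each returned column can then be passed as `text` to `generate`; a non-empty remainder signals that the passage does not divide evenly into columns.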

Citation

If you find UniCalli useful in your research, please consider citing:

@article{xu2025unicalli,
  title={UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy},
  author={Xu, Tianshuo and Wang, Kai and Chen, Zhifei and Wu, Leyi and Wen, Tianshui and Chao, Fei and Chen, Ying-Cong},
  journal={arXiv preprint arXiv:2025.13745},
  year={2025}
}

Acknowledgments

This work builds upon the FLUX architecture and benefits from the rich heritage of Chinese calligraphy. We thank the calligraphy masters whose works made this research possible.
