Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. By taming identity-preserving generators, the framework features two key components: (i) Iterative Identity Discovery, which extracts cohesive character identities, and (ii) Re-denoising Identity Injection, which re-denoises images to inject identities while preserving desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.
We find that identity-preserving generators possess a well-constructed identity space, where identity representation can be obtained by aggregating character image embeddings.
We find that identity-preserving generators tend to exhibit degraded text alignment: the generated images may deviate from their text prompts.
IdentityStory comprises two core techniques to address challenges in human-centric story generation:
(i) Iterative Identity Discovery: We find that identity-preserving generators possess a well-constructed identity space, where identity representation can be obtained by aggregating character image embeddings. After generating diverse character images from descriptions and projecting them into the identity space, we use Singular Value Decomposition (SVD) to iteratively filter out low-relevance embeddings and extract cohesive identities.
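The iterative filtering step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the filtering ratio, number of iterations, and the use of the top singular direction as the relevance criterion are all assumptions, and mean pooling stands in for whatever aggregation the identity space actually uses.

```python
import numpy as np

def iterative_identity_discovery(embeddings, keep_ratio=0.8, n_iters=3):
    """Sketch of SVD-based iterative identity extraction (details assumed).

    embeddings: (N, D) array of character image embeddings projected
    into the generator's identity space.
    """
    # Normalize so relevance scores are comparable across images.
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    idx = np.arange(len(emb))
    for _ in range(n_iters):
        # The top right-singular vector captures the dominant shared
        # (identity) direction of the current embedding set.
        _, _, vt = np.linalg.svd(emb[idx], full_matrices=False)
        relevance = np.abs(emb[idx] @ vt[0])
        # Drop the least identity-relevant embeddings each iteration.
        keep = relevance >= np.quantile(relevance, 1.0 - keep_ratio)
        idx = idx[keep]
    # Aggregate the retained embeddings into one cohesive identity.
    identity = emb[idx].mean(axis=0)
    return identity / np.linalg.norm(identity), idx
```

On a set of embeddings where most images depict the same person and a few are off-identity outliers, the outliers receive low relevance to the dominant singular direction and are pruned, leaving a cleaner aggregated identity vector.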
(ii) Re-denoising Identity Injection: To address text alignment degradation of identity-preserving generators, we first use a general generator to create a more text-aligned prototype image. Meanwhile, we cache noisy images during generation to preserve environmental semantics and segment the prototype image to extract character layouts. Using a progressive masking strategy, we then re-denoise with identity-preserving generators to inject identities.
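The re-denoising loop can be sketched roughly as below. This is a hedged illustration under assumptions: `denoise_step` is a stub for one step of the identity-preserving generator's sampler, the cached latents stand in for the general generator's noisy images, and the linear ramp-up of the mask weight is an assumed schedule, not the paper's actual progressive masking strategy.

```python
import numpy as np

def re_denoise_with_identity(cached_latents, char_mask, denoise_step, n_steps):
    """Sketch of Re-denoising Identity Injection (schedule assumed).

    cached_latents: list of n_steps + 1 noisy latents cached while the
        general generator produced the text-aligned prototype image.
    char_mask: character-layout mask obtained by segmenting the prototype.
    denoise_step(z, t): one denoising step of the identity-preserving
        generator (stub -- a real system would call its diffusion sampler).
    """
    z = cached_latents[0]  # start from the cached fully-noised latent
    for t in range(n_steps):
        z_id = denoise_step(z, t)
        # Progressive masking: the character region increasingly follows the
        # identity-preserving generator, while the region outside the mask is
        # re-anchored to the cached trajectory so the environmental semantics
        # of the prototype image are preserved.
        w = min(1.0, (t + 1) / (0.5 * n_steps))  # assumed ramp-up schedule
        z = w * char_mask * z_id + (1.0 - w * char_mask) * cached_latents[t + 1]
    return z
```

The key design point this sketch illustrates is the decoupling: text alignment and scene layout come from the cached general-generator trajectory, while face identity is injected only inside the segmented character region.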
The results of automatic metrics demonstrate IdentityStory's overall superior performance, especially in face similarity (Face-Sim). The best and second-best results are marked in bold and underlined.
Compared to other methods, our IdentityStory exhibits remarkable performance in handling human-centric scenarios, enabling consistent generation of human characters with only text as input. Zoom in for a better view.
Infinite-length Story Generation. Owing to its decoupled design, IdentityStory supports infinite-length story generation, maintaining consistent character identities and coherent narratives across long sequences.
@article{zhou2025identitystory,
title={IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation},
author={Zhou, Donghao and Lin, Jingyu and Shen, Guibao and Liu, Quande and Gao, Jialin and Liu, Lihao and Du, Lan and Chen, Cunjian and Fu, Chi-Wing and Hu, Xiaowei and Heng, Pheng-Ann},
journal={arXiv preprint arXiv:2512.23519},
year={2025}
}