IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation

1The Chinese University of Hong Kong 2Monash University 3The Hong Kong University of Science and Technology (Guangzhou)
4Kling Team, Kuaishou Technology 5Amazon 6South China University of Technology
*Equal contribution. Corresponding authors.
Accepted by AAAI 2026

Abstract

Recent visual generative models enable story generation with consistent characters from text, but human-centric story generation faces additional challenges, such as maintaining detailed and diverse human face consistency and coordinating multiple characters across different images. This paper presents IdentityStory, a framework for human-centric story generation that ensures consistent character identity across multiple sequential images. To tame identity-preserving generators, the framework introduces two key components: (i) Iterative Identity Discovery, which extracts cohesive character identities, and (ii) Re-denoising Identity Injection, which re-denoises images to inject identities while preserving the desired context. Experiments on the ConsiStory-Human benchmark demonstrate that IdentityStory outperforms existing methods, particularly in face consistency, and supports multi-character combinations. The framework also shows strong potential for applications such as infinite-length story generation and dynamic character composition.

Human-Centric Story Generation

IdentityStory relies solely on text to generate a series of images that consistently depict human characters and faithfully align with the text prompts, outperforming state-of-the-art methods.

[Figure: teaser]

Findings

Finding 1: Identity Space

We find that identity-preserving generators possess a well-constructed identity space, where identity representation can be obtained by aggregating character image embeddings.
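The aggregation step can be illustrated with a minimal numpy sketch. This is not the paper's implementation; the 64-dimensional embeddings and the mean-of-normalized-vectors aggregation are assumptions for illustration.

```python
import numpy as np

def aggregate_identity(embeddings: np.ndarray) -> np.ndarray:
    """Aggregate per-image character embeddings into one identity vector.

    embeddings: (N, D) array of image embeddings of the same character.
    Returns a unit-norm (D,) identity representation.
    """
    # L2-normalize each embedding so no single image dominates the mean.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    identity = normed.mean(axis=0)
    return identity / np.linalg.norm(identity)

# Toy example (hypothetical data): five noisy views of one identity.
rng = np.random.default_rng(0)
base = rng.normal(size=64)
views = base + 0.1 * rng.normal(size=(5, 64))
identity = aggregate_identity(views)
```

With views that share a dominant direction, the aggregate stays close to that direction, which is the property the identity space relies on.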

[Figure: identity embeddings]

Finding 2: Text Alignment Degradation

We find that identity-preserving generators tend to exhibit degraded performance on text alignment, where the generated images may deviate from the text prompts.

[Figure: text alignment degradation]

Methodology

IdentityStory comprises two core techniques to address challenges in human-centric story generation:

(i) Iterative Identity Discovery: We find that identity-preserving generators possess a well-constructed identity space, where an identity representation can be obtained by aggregating character image embeddings. After generating diverse character images from descriptions and projecting them into the identity space, we use Singular Value Decomposition (SVD) to iteratively filter out low-relevance embeddings and extract cohesive identities.
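The SVD-based filtering loop above can be sketched as follows. This is a simplified stand-in, not the paper's code: the relevance score (alignment with the top singular direction), the `keep` fraction, and the round count are all assumptions.

```python
import numpy as np

def iterative_identity_discovery(embeddings, keep=0.6, rounds=2):
    """Iteratively drop low-relevance embeddings via SVD (sketch).

    embeddings: (N, D) image embeddings of one character.
    keep: fraction of embeddings retained per round (assumed heuristic).
    Returns (unit-norm identity vector, indices of retained embeddings).
    """
    idx = np.arange(len(embeddings))
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    for _ in range(rounds):
        # The top right-singular vector captures the dominant shared
        # (identity) direction of the current embedding subset.
        _, _, vt = np.linalg.svd(E[idx], full_matrices=False)
        principal = vt[0]
        # Relevance = alignment with the identity direction (sign-invariant).
        scores = np.abs(E[idx] @ principal)
        n_keep = max(2, int(np.ceil(keep * len(idx))))
        idx = idx[np.argsort(scores)[::-1][:n_keep]]
    identity = E[idx].mean(axis=0)
    return identity / np.linalg.norm(identity), idx

# Toy demo: six consistent views plus two outliers at indices 6 and 7.
rng = np.random.default_rng(1)
base = rng.normal(size=32)
inliers = base + 0.15 * rng.normal(size=(6, 32))
outliers = rng.normal(size=(2, 32))
identity, kept = iterative_identity_discovery(np.vstack([inliers, outliers]))
```

The outliers score low against the principal direction and are filtered out in the first round, leaving a cohesive identity.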

(ii) Re-denoising Identity Injection: To address the text alignment degradation of identity-preserving generators, we first use a general generator to create a more text-aligned prototype image. Meanwhile, we cache the noisy images produced during generation to preserve environmental semantics, and we segment the prototype image to extract character layouts. Using a progressive masking strategy, we then re-denoise with identity-preserving generators to inject the identities.
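The control flow of this step can be sketched with a toy numpy loop. Everything here is a hypothetical simplification: `denoise_step` stands in for one identity-conditioned denoising update, the latents are plain arrays, and the linear mask-tightening schedule is an assumption, not the paper's schedule.

```python
import numpy as np

def redenoise_inject(cached_latents, char_mask, denoise_step, T=10):
    """Re-denoise a prototype to inject a character identity (sketch).

    cached_latents: list of T noisy latents saved while generating the
        prototype with a general generator (environmental context).
    char_mask: (H, W) soft mask in [0, 1] over the character region,
        obtained by segmenting the prototype image.
    denoise_step: stand-in for one identity-conditioned denoising step.
    """
    x = cached_latents[0]
    for t in range(T):
        x = denoise_step(x, t)  # identity-preserving update
        # Progressive masking: start loose, tighten toward the exact
        # character region as denoising proceeds (assumed linear schedule).
        thresh = 0.1 + 0.8 * t / (T - 1)
        m = (char_mask >= thresh).astype(x.dtype)
        # Keep the new trajectory inside the character region; restore
        # cached latents elsewhere to preserve the prototype's context.
        x = m * x + (1 - m) * cached_latents[t]
    return x

# Toy demo: left half of a 4x4 latent is the character region.
mask = np.zeros((4, 4)); mask[:, :2] = 1.0
cached = [np.full((4, 4), float(t)) for t in range(10)]
out = redenoise_inject(cached, mask, lambda x, t: x + 1.0, T=10)
```

In the demo, the character region follows the identity-conditioned updates while the background is repeatedly restored from the cached latents, mirroring how the injection preserves the prototype's environment.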


[Figure: pipeline]

Experiments

Quantitative Comparisons

Automatic metrics demonstrate IdentityStory's overall superior performance, especially in face similarity (Face-Sim). The best and second-best results are marked in bold and underlined, respectively.

[Figure: quantitative comparisons]

Qualitative Comparisons

Compared to other methods, IdentityStory exhibits remarkable performance in handling human-centric scenarios, enabling consistent generation of human characters with only text as input. Zoom in for a better view.

More Applications

Infinite-length Story Generation. Thanks to its decoupled design, IdentityStory supports infinite-length story generation, maintaining consistent character identities and coherent narratives across long sequences.
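Why decoupling enables arbitrarily long stories can be shown with a minimal sketch: the identity is discovered once, then injected into each new frame independently. Both `generate_prototype` and `inject_identity` are hypothetical stand-ins for the two stages, not the paper's API.

```python
def generate_story(prompts, identity, generate_prototype, inject_identity):
    """Sketch: identity discovery is decoupled from per-frame generation,
    so a fixed identity can be injected into any number of new frames
    without re-discovery -- the story can grow without bound."""
    frames = []
    for prompt in prompts:
        proto = generate_prototype(prompt)           # text-aligned prototype
        frames.append(inject_identity(proto, identity))
    return frames

# Toy demo with string stand-ins for images.
frames = generate_story(
    ["dawn scene", "noon scene"], "-id",
    generate_prototype=str.upper,
    inject_identity=lambda proto, ident: proto + ident)
```

Appending further prompts simply extends the loop; the identity input never changes, which is what keeps characters consistent across a sequence of any length.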

BibTeX

@article{zhou2025identitystory, 
    title={IdentityStory: Taming Your Identity-Preserving Generator for Human-Centric Story Generation}, 
    author={Zhou, Donghao and Lin, Jingyu and Shen, Guibao and Liu, Quande and Gao, Jialin and Liu, Lihao and Du, Lan and Chen, Cunjian and Fu, Chi-Wing and Hu, Xiaowei and Heng, Pheng-Ann}, 
    journal={arXiv preprint arXiv:2512.23519}, 
    year={2025} 
}