Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment

Lijie Liu*, Tianxiang Ma*, Bingchuan Li*†, Zhuowei Chen*, Jiawei Liu, Qian He, Xinglong Wu
* Equal contribution, † Project lead
Intelligent Creation Team, ByteDance

Overview

Phantom is a unified video generation framework for single- and multi-subject references, built on existing text-to-video and image-to-video architectures. It achieves cross-modal alignment by redesigning the joint text-image injection model and training it on text-image-video triplet data. It also emphasizes subject consistency when generating humans, which improves ID-preserving video generation.

Comparative Results 🆚

  • Identity-Preserving Video Generation. [image]
  • Single-Reference Subject-to-Video Generation. [image]
  • Multi-Reference Subject-to-Video Generation. [image]

BibTeX

@article{liu2025phantom,
  title={Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment},
  author={Liu, Lijie and Ma, Tianxiang and Li, Bingchuan and Chen, Zhuowei and Liu, Jiawei and He, Qian and Wu, Xinglong},
  journal={arXiv preprint arXiv:2502.11079},
  year={2025}
}
