
🦄 PoseAnything: Universal Pose-guided Video Generation with Part-aware Temporal Coherence


πŸ“ Introduction

PoseAnything is a universal pose-guided video generation framework. It enables high-quality video generation for both human and non-human characters from arbitrary skeletal inputs.


📅 Time Schedule

No.  Content                          State
1    Model Enhanced Using Human Data  ✅
2    Release Training Code            ✅
3    XPose Dataset Release            Hugging Face

πŸ› οΈ Installation Guide

1. 📂 Clone Repository

git clone https://github.com/Ryan-w2024/PoseAnything.git
cd PoseAnything

2. 🐍 Environment Setup

Install with conda

conda create -n poseanything python=3.10
conda activate poseanything
pip install -e .
pip install flash_attn --no-build-isolation

3. 💾 Model Weights Download

Use the following command to download the model weights:

pip install "huggingface_hub[cli]"
huggingface-cli download Wan-AI/Wan2.2-TI2V-5B --local-dir ./models/Wan2.2-TI2V-5B
huggingface-cli download Ryan241005/PoseAnything --local-dir ./models/Pony

After downloading, the weight files should be organized as:

PoseAnything/
├── models/
│     ├── Wan2.2-TI2V-5B/
│     │       ├── models_t5_umt5-xxl-enc-bf16.pth
│     │       ├── Wan2.2_VAE.pth
│     │       └── ...
│     └── Pony/
│           ├── diffusion_pytorch_model-00001-of-00002.safetensors
│           ├── diffusion_pytorch_model-00002-of-00002.safetensors
│           └── ...
└── ...
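A quick way to confirm the layout above is in place is a small sanity check. This helper is our own sketch (not part of the repo); the file names are taken from the tree above:

```python
import os

# Hypothetical helper: verify that the key weight files from the layout
# above exist under `root`. Returns the list of missing relative paths.
def check_weights(root):
    expected = [
        "models/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth",
        "models/Wan2.2-TI2V-5B/Wan2.2_VAE.pth",
        "models/Pony/diffusion_pytorch_model-00001-of-00002.safetensors",
        "models/Pony/diffusion_pytorch_model-00002-of-00002.safetensors",
    ]
    return [p for p in expected if not os.path.isfile(os.path.join(root, p))]
```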

💻 Quick Start: Inference

To run PoseAnything, you need to extract the masked image of the target subject based on the first frame and skeleton. You can either place the masked image directly in DATA_DIR/video, or use the following example script for automatic extraction:

cd Extractor
bash mask.sh # May need to downgrade transformers to 4.40.2

The data will then be formatted as follows:

DATA_DIR/
├── first_frame/
│      └── {file_name}.png
├── skeleton_image/
│      └── {file_name}/
│              ├── 000.png
│              ├── 001.png
│              └── 002.png
├── video/
│      └── {file_name}_id.png
└── ...
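The layout above maps each sample name to a fixed set of input paths. The helper below mirrors that pattern; the function itself is ours, while the path pattern comes from the tree:

```python
import os

# Hypothetical helper: given DATA_DIR and a sample name, return the input
# paths the inference scripts are expected to read, per the layout above.
def sample_paths(data_dir, file_name):
    return {
        "first_frame": os.path.join(data_dir, "first_frame", f"{file_name}.png"),
        "skeleton_dir": os.path.join(data_dir, "skeleton_image", file_name),
        "masked_image": os.path.join(data_dir, "video", f"{file_name}_id.png"),
    }
```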

You can then run the demo with the provided example script:

bash test.sh

If you wish to test the version without the PTC module, run the following command (a masked image is not required):

bash test_without_ptc.sh

✔ Tip: PoseAnything supports arbitrary skeleton inputs. For strong skeletal conditions (large motion or high-density input), we suggest a smaller CFG scale, or no CFG, for natural output. For weak skeletal conditions (small motion or low-density input), increase the CFG scale to strengthen fitting to the pose.
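The tip above follows from the standard classifier-free guidance formula, where the scale extrapolates from the unconditional toward the pose-conditioned prediction. This is a generic sketch of the formula, not the repo's exact code:

```python
import numpy as np

# Standard classifier-free guidance: guided = uncond + s * (cond - uncond).
# s = 1.0 recovers the conditional prediction; larger s pushes the output
# harder toward the pose condition (useful for weak skeletal inputs), while
# smaller s stays closer to the unconditional prediction.
def cfg_combine(uncond, cond, scale):
    return uncond + scale * (cond - uncond)
```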

To test on the TikTok dataset, refer to the script below:

bash test_tiktok.sh

Demo Showcase

skeleton_1.mp4 / result_1.mp4
skeleton_2.mp4 / result_2.mp4
skeleton_3.mp4 / result_3.mp4
skeleton_4.mp4 / result_4.mp4

🚀 Training

We provide training scripts for the DiT module, which exclude the PTC (Part-aware Temporal Coherence) module to reduce VRAM consumption. You need to modify the checkpoint beforehand to adapt the channel count of the patchify module to the newly added skeleton input:

python update_weight.py
bash train_without_ptc.sh
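The channel adaptation performed by update_weight.py is presumably of the following form (our assumption, not necessarily the repo's exact code): append zero-initialized input channels to the patchify projection so that, at initialization, the pretrained layer behaves exactly as before and the skeleton input is ignored.

```python
import numpy as np

# Expand a patchify conv weight of shape (out_ch, in_ch, k, k) to accept
# `extra` additional input channels, zero-initializing the new slice so the
# layer initially ignores the skeleton input. Shapes are illustrative.
def expand_in_channels(weight, extra):
    out_ch, in_ch, kh, kw = weight.shape
    pad = np.zeros((out_ch, extra, kh, kw), dtype=weight.dtype)
    return np.concatenate([weight, pad], axis=1)
```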

Since the PTC module adds significant VRAM overhead, we suggest using the DeepSpeed framework for optimization.

πŸ—ƒοΈ Data Process

We also provide the code for automated skeleton extraction, which is built on BlumNet and Grounded-SAM-2.

Installation Guide

To avoid conflicts, we highly recommend creating a new Conda environment.

cd Extractor
conda create -n extractor python=3.10
conda activate extractor
pip install -r requirement.txt

# Compile CUDA operators (as required by BlumNet) 
cd BlumNet/models/ops
sh ./make.sh
cd ../../../

Download Weights

Please download the weights for BlumNet and Grounded-SAM-2 following the instructions in the corresponding repositories, then run:

cp -r Grounded/sam2 ./

Usage

To automate skeleton extraction from your own video data, you must first provide the paths to the videos to be processed and their corresponding captions, as shown in ../data/example/raw_metadata.csv. To run on the example data:

bash run.sh
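To prepare your own inputs, you would build a CSV like ../data/example/raw_metadata.csv. The column names below (video_path, caption) are assumptions for illustration; check the example file for the real schema:

```python
import csv

# Write a minimal metadata CSV pairing each video path with its caption.
# Column names are hypothetical; match them to data/example/raw_metadata.csv.
def write_metadata(path, rows):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["video_path", "caption"])
        writer.writeheader()
        writer.writerows(rows)
```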

📧 Acknowledgement

Our implementation is based on DiffSynth-Studio, BlumNet, and Grounded-SAM-2. Thanks for their remarkable contributions and released code! If we have missed any open-source projects or related articles, please let us know and we will update the acknowledgements promptly.
