RuyangFan/chameleon_gitlab

chameleon

A Digital Human Project with Generative AI.

Related papers
  • MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation

Paper links:
https://arxiv.org/abs/2403.19144
https://paperswithcode.com/paper/moditalker-motion-disentangled-diffusion
https://ku-cvlab.github.io/MoDiTalker/
https://github.com/KU-CVLAB/MoDiTalker

1 Environment setup

1.1 Python environment


<1> chameleon
cd chameleon
pip install -r requirements.txt
pip install -U edge-tts==6.1.12

<2> MoDiTalker
pip install p_tqdm

<3> AniPortrait
pip install -U diffusers==0.24.0 imageio==2.33.0 imageio-ffmpeg==0.4.9 omegaconf==2.2.3 ffmpeg-python==0.2.0
(If a package turns out to be missing at runtime, install it by referring to Zejun-Yang_AniPortrait/requirements.txt.)
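
Since several of the packages above are pinned to exact versions, a quick check of what is actually installed can catch mismatches before a long run. The following is a minimal sketch using only the standard library. The helper name `check_pins` is hypothetical, and the pin table simply repeats the versions from the pip commands above.

```python
from importlib import metadata

# Versions copied from the pip commands above.
PINS = {
    "edge-tts": "6.1.12",
    "diffusers": "0.24.0",
    "imageio": "2.33.0",
    "imageio-ffmpeg": "0.4.9",
    "omegaconf": "2.2.3",
    "ffmpeg-python": "0.2.0",
}

def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems

if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every pinned package is present at the listed version.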

1.2 Pretrained models


<1> chameleon

* Location of the face_alignment / 3drecon pretrained models (copy them to your local machine):
user@192.168.58.238:
~/lk/mdls/digitalhuman/deep_3drecon/
~/lk/mdls/digitalhuman/fa/

Copy the models into the expected directories:
cd chameleon/processes
cp -R ~/lk/mdls/digitalhuman/deep_3drecon/BFM deep_3drecon/
cp -R ~/lk/mdls/digitalhuman/deep_3drecon/checkpoints deep_3drecon/
cp -R ~/lk/mdls/digitalhuman/fa/fan deep_3drecon/


<2> MoDiTalker
Location of the HuBERT pretrained model (copy it to your local machine):
user@192.168.58.238:
/home/user/mnt/sdg1/mdls/models--facebook--hubert-large-ls960-ft

(Update the code to match your local copy: KU-CVLAB_MoDiTalker/data/data_utils/preprocess/process_audio_lk.py L:14
  chameleon/projects/animake/process_audio_hubert.py L:14)

<3> AniPortrait

Location of the pretrained models (copy them to your local machine):
user@192.168.58.238:
pretrained_base_model_path: '/home/user/mnt/sdg1/mdls/models--runwayml--stable-diffusion-v1-5'
pretrained_vae_path: '/home/user/mnt/sdg1/mdls/models--stabilityai--sd-vae-ft-mse'
image_encoder_path: '/home/user/mnt/sdg1/mdls/models--lambdalabs--sd-image-variations-diffusers/image_encoder'
mm_path: '/home/user/mnt/sdg1/mdls/models--guoyww--animatediff/mm_sd_v15_v2.ckpt'

(Remember to update the corresponding paths in: Zejun-Yang_AniPortrait/configs/prompts/animation_trnfa41.yaml)


1.3 Self-trained models

<1> chameleon
user@192.168.58.238:
/data/likun/outs/chameleon/aniptrt/trn_fa_51/
/data/likun/outs/chameleon/aniptrt/trn_fa_51/stage1_bk/*-600000.pth
/data/likun/outs/chameleon/aniptrt/trn_fa_51/stage2_bk/*.pth


<2> MoDiTalker
AToM:
user@192.168.58.238:
/home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt


<3> AniPortrait
user@192.168.58.238:
denoising_unet_path: "/home/user/mnt/sdg1/outs/digitalhuman/aniportrait/trn_fa_41/stage1_41_bk/denoising_unet-300000.pth"
reference_unet_path: "/home/user/mnt/sdg1/outs/digitalhuman/aniportrait/trn_fa_41/stage1_41_bk/reference_unet-300000.pth"
pose_guider_path: "/home/user/mnt/sdg1/outs/digitalhuman/aniportrait/trn_fa_41/stage1_41_bk/pose_guider-300000.pth"
motion_module_path: "/home/user/mnt/sdg1/outs/digitalhuman/aniportrait/trn_fa_41/stage2_bk/motion_module-400000.pth"

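(Remember to update the corresponding paths in: chameleon/configs/prompts/animation_trnfa41.yaml)

Note: the paths of the other models are set in the same file, chameleon/configs/prompts/animation_trnfa41.yaml, and should be updated there as well.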
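
Because every model location has to be edited by hand in the config, it can help to confirm that each configured path exists before launching inference. The sketch below is only an illustration: the function name `missing_paths` is hypothetical, the key names are the ones listed above, and it assumes the config has already been loaded into a plain dict (for example with omegaconf, which is among the installed packages) with these keys at the top level.

```python
import os

# Key names as they appear in the model lists above.
MODEL_KEYS = [
    "pretrained_base_model_path",
    "pretrained_vae_path",
    "image_encoder_path",
    "mm_path",
    "denoising_unet_path",
    "reference_unet_path",
    "pose_guider_path",
    "motion_module_path",
]

def missing_paths(cfg, keys=MODEL_KEYS):
    """Return the keys whose value is unset or does not exist on disk."""
    missing = []
    for key in keys:
        path = cfg.get(key)
        if not path or not os.path.exists(path):
            missing.append(key)
    return missing
```

Calling `missing_paths(cfg)` on the loaded config returns an empty list when every listed model file or directory is in place.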

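2 Data preprocessing

Several open-source datasets are used: HDTF, CelebV-HQ, and VFHQ.

Data locations:
user@192.168.58.238:/data/likun/data/data/HDTF/
user@192.168.58.238:/data/likun/data/data/celebv_hq/
user@192.168.58.238:/data/likun/data/data/VFHQ/

Image and video data must be preprocessed before training. For the detailed commands, see: chameleon/processes/README.md

Unify the video frame rate (fps)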

python -u preprocess/unify_fps_vid.py \
  --load_video_path /data/likun/data/data/HDTF/HDTF-FACE \
  --save_video_path /data/likun/data/data/HDTF/HDTF_fps25 \
  --start_idx 0  --end_idx 9999 \
  --fps 25 \
  --batch_size 1 \
  --num_workers 10 \
  &> /data/likun/data/data/HDTF/HDTF_fps25__log__mp.txt
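
The internals of unify_fps_vid.py are not shown in this document. As a rough illustration of what frame-rate unification involves, the sketch below builds an ffmpeg command that re-encodes one video at a fixed output rate using ffmpeg's `-r` output option. The helper name `ffmpeg_fps_cmd` is hypothetical, and the use of ffmpeg is an assumption rather than a description of the script.

```python
from pathlib import PurePosixPath

def ffmpeg_fps_cmd(src, dst_dir, fps=25):
    """Build an ffmpeg argument list that writes `src` into `dst_dir` at `fps`."""
    src = PurePosixPath(src)
    dst = PurePosixPath(dst_dir) / src.name  # keep the original file name
    # -y overwrites an existing output; -r placed before the output sets its frame rate.
    return ["ffmpeg", "-y", "-i", str(src), "-r", str(fps), str(dst)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is available on the PATH.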

Extract the audio track from the videos

python -u preprocess/pick_audio_from_video.py \
  --load_video_path /data/likun/data/data/HDTF/HDTF_fps25 \
  --save_audio_path /data/likun/data/data/HDTF/HDTF_fps25_wavs \
  --start_idx 0  --end_idx 9999 \
  --batch_size 1 \
  --num_workers 10 \
  &> /data/likun/data/data/HDTF/HDTF_fps25_wavs__log.txt

Extract frames from the videos and save them as images

python -u preprocess/video2frame_celebv.py \
  --load_video_path /data/likun/data/data/HDTF/HDTF_fps25 \
  --save_images_path /data/likun/data/data/HDTF/HDTF_fps25_frame \
  &> /data/likun/data/data/HDTF/HDTF_fps25_frame__log.txt

Resize the images

python -u preprocess/prcs_img_resize.py \
  /data/likun/data/data/HDTF/HDTF_fps25_frame \
  /data/likun/data/data/HDTF/HDTF_fps25_frame_256 \
  256 ".jpg" \
  &> /data/likun/data/data/HDTF/HDTF_fps25_frame_256__log.txt

Extract 2D and 3D facial keypoints from the images

python -u preprocess/process_video_3dmm_rollback_fa3drec_mp.py \
  --audio_ok_file_path "" \
  --hdtf_frames_path /data/likun/data/data/HDTF/HDTF_fps25_frame_256 \
  --image_wh 256 \
  --if_save_img 1 \
  --mp_np 8 \
  --saving_path /data/likun/data/data/HDTF/HDTF_fps25_frame_256_kps_23_unfcpsok_img \
  &> /data/likun/data/data/HDTF/HDTF_fps25_frame_256_kps_23_unfcpsok_img__log_1.txt
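
Each preprocessing command reads the directory written by an earlier one, so the directory names form a small pipeline. The sketch below records that chain for HDTF using the directory names from the commands above. The function name `hdtf_stage_dirs` and the stage labels are hypothetical, chosen only for illustration.

```python
from pathlib import PurePosixPath

def hdtf_stage_dirs(root="/data/likun/data/data/HDTF"):
    """Map each preprocessing stage to the directory used in the commands above."""
    root = PurePosixPath(root)
    return {
        "raw_videos": root / "HDTF-FACE",
        "fps25_videos": root / "HDTF_fps25",
        "wavs": root / "HDTF_fps25_wavs",
        "frames": root / "HDTF_fps25_frame",
        "frames_256": root / "HDTF_fps25_frame_256",
        "keypoints": root / "HDTF_fps25_frame_256_kps_23_unfcpsok_img",
    }

# (input stage, output stage) for each command, in the order they are run.
STEPS = [
    ("raw_videos", "fps25_videos"),   # unify fps
    ("fps25_videos", "wavs"),         # extract audio
    ("fps25_videos", "frames"),       # extract frames
    ("frames", "frames_256"),         # resize
    ("frames_256", "keypoints"),      # 2D/3D keypoints
]

if __name__ == "__main__":
    dirs = hdtf_stage_dirs()
    for src, dst in STEPS:
        print(f"{dirs[src]} -> {dirs[dst]}")
```

Passing a different `root` gives the same layout for another copy of the dataset, assuming the same directory naming is kept.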

3 Inference
  • Code updates

<1> chameleon

chameleon/projects/animake/main_1.py   L:51 L:59

<2> MoDiTalker

<3> AniPortrait
Zejun-Yang_AniPortrait/scripts/pose2vid_fa_tr_lk.py  L:41

3.1 Step-by-step execution (see chameleon/projects/animake/CMDS_1.md for details)

3.1.1 Generate facial keypoints

This step generates the facial keypoints for an input image.

(pt210g118) user@8a100svr3:~/lk/proj/mypj/chameleon/processes$ 
python preprocess/prcs_img_ldmk_fa_diff.py \
  --image_path "/home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/00000000.png" \
  --saving_path "/home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/" \
  --if_save_img 1 \
  --image_wh 256

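Parameters:
  • --image_path: input image
  • --saving_path: output directory
  • --if_save_img: whether to save the related image results (set to 1 in the command above)
  • --image_wh: image resolution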

3.1.2 Generate audio features

(pt210g118) user@8a100svr3:~/lk/proj/pydl/digitalhuman/KU-CVLAB_MoDiTalker/data/data_utils$ 
python preprocess/process_audio_lk.py \
  --audio /home/user/mnt/sdg1/data/wavs/237-134500-0003.wav \
  --save_sample_dir /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_smpl \
  --save_hubert_dir /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_hbrt

3.1.3 Generate the keypoint sequence from the audio features

(pt210g118) user@8a100svr3:~/lk/proj/pydl/digitalhuman/KU-CVLAB_MoDiTalker/AToM$ 
CUDA_VISIBLE_DEVICES=3 python inference_lk.py \
  --data_root /home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/for_atom \
  --cond_kps_path /home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/face-centric/unposed/00000000.png.npy \
  --hubert_path /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_hbrt/16000/237-134500-0003.npy \
  --audio_wav_path /home/user/mnt/sdg1/data/wavs/237-134500-0003.wav \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt

3.1.4 Generate the video from the keypoint sequence

(pt210g118) user@8a100svr3:~/lk/proj/pydl/digitalhuman/Zejun-Yang_AniPortrait$ 
python -m scripts.pose2vid_fa_tr_lk \
  --config configs/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --ref_image_path "/home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "/home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_kpts/frontalized_npy/00000000.png/atom_0.npy" \
  --tgt_audio_path /home/user/mnt/sdg1/data/wavs/237-134500-0003.wav \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/aniportrait/inf_p2v_trn32_21
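
When the steps are run one at a time, the output of each step has to be passed by hand to the next. Judging only from the example paths in the commands above, the HuBERT features appear under a `16000/` subdirectory of `--save_hubert_dir`, and the keypoint sequence appears under `frontalized_npy/<image name>/atom_0.npy` inside `--save_dir`. The sketch below reproduces that naming. Both helper names are hypothetical, and the layout is inferred from the examples rather than confirmed from the scripts.

```python
from pathlib import PurePosixPath

def hubert_feature_path(audio_wav_path, save_hubert_dir, sample_rate=16000):
    """Expected --hubert_path for step 3.1.3, given the inputs of step 3.1.2."""
    stem = PurePosixPath(audio_wav_path).stem  # file name without ".wav"
    return str(PurePosixPath(save_hubert_dir) / str(sample_rate) / f"{stem}.npy")

def atom_keypoints_path(save_dir, ref_image_name="00000000.png", index=0):
    """Expected --tgt_pose_path for step 3.1.4, given --save_dir of step 3.1.3."""
    return str(
        PurePosixPath(save_dir) / "frontalized_npy" / ref_image_name / f"atom_{index}.npy"
    )
```

With the arguments from the commands above, these helpers return the same `--hubert_path` and `--tgt_pose_path` values that the commands use.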

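3.2 Single-command execution

(see chameleon/projects/animake/CMDS_3.md for details)

Main script: chameleon/projects/animake/main_1.py

(Note: update main_1.py L:51 and L:59 to match the locations of your MoDiTalker and AniPortrait code directories.)

3.2.1 Generate a video from existing audio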

python main_1.py \
  \
  --input_audio /home/user/mnt/sdg1/data/wavs/121-121726-0000.wav \
  --save_sample_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_smpl \
  --save_hubert_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_hbrt \
  \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config /home/user/lk/proj/pydl/digitalhuman/Zejun-Yang_AniPortrait/configs/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --device "cuda" \
  --ref_image_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256/Clip+_OUh5xHwjqs+P0+C2+F23393-23551/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12

3.2.2 Generate speech from text, then generate a video

python main_1.py \
  \
  --tts_txt "a novel framework for generating high-quality animation driven by audio and a reference portrait image." \
  --save_sample_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_smpl \
  --save_hubert_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_hbrt \
  \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config /home/user/lk/proj/pydl/digitalhuman/Zejun-Yang_AniPortrait/configs/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --device "cuda" \
  --ref_image_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256/Clip+_OUh5xHwjqs+P0+C2+F23393-23551/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12

3.2.3 Generate a video from a preprocessed keypoint sequence

python main_1.py \
  \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config /home/user/lk/proj/pydl/digitalhuman/Zejun-Yang_AniPortrait/configs/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --device "cuda" \
  --ref_image_path "/home/user/mnt/sdg1/data/imgs/face_11/ref_images/solo.png" \
  --ref_pose_path "" \
  --tgt_pose_type "fcup" \
  --tgt_pose_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256_kps_42_unfcpsok_img/face-centric/unposed/Clip+_pcoGxTYEKk+P0+C0+F106-276/" \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12

3.3 Single-command execution with main_2.py (see chameleon/projects/animake/CMDS_21.md for details)

Main script: chameleon/projects/animake/main_2.py

3.3.1 Generate a video from existing audio

python main_2.py \
  \
  --input_audio /home/user/mnt/sdg1/data/wavs/121-121726-0000.wav \
  --save_sample_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_smpl \
  --save_hubert_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_hbrt \
  \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config ../../configs/aniptrt/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --device "cuda" \
  --ref_image_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256/Clip+_OUh5xHwjqs+P0+C2+F23393-23551/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12

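Parameters:
  • --input_audio: input audio WAV file
  • --save_sample_dir: directory for the audio after sample-rate conversion
  • --save_hubert_dir: directory for the extracted audio features
  • --save_dir: directory for the generated keypoint sequence
  • --checkpoint: path to the audio-to-keypoint-sequence model
  • --config: configuration file for the keypoint-sequence-to-image-sequence model; the model weights are specified in this file
  • --ref_image_path: path to the reference image
  • --out_save_path: directory where the generated video is saved

Or, as a shorter command: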

python main_2.py \
  \
  --input_audio /data/likun/data/data/wavs/LJ050-0180.wav \
  \
  --checkpoint /data/likun/outs/chameleon/moditalk/trn_31_lrs3/exp/weights/train-2000.pt \
  \
  --config ../../configs/aniptrt/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --ref_image_path "/data/likun/data/data/face/FFHQ/FFHQ512x512_p1/00004.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /data/likun/outs/chameleon/animake/main_2_42_41

3.3.2 Generate speech from text, then generate a video

python main_2.py \
  \
  --tts_txt "a novel framework for generating high-quality animation driven by audio and a reference portrait image." \
  --save_sample_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_smpl \
  --save_hubert_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_hbrt \
  \
  --save_dir /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12/12_kpts \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config /home/user/lk/proj/pydl/digitalhuman/Zejun-Yang_AniPortrait/configs/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --device "cuda" \
  --ref_image_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256/Clip+_OUh5xHwjqs+P0+C2+F23393-23551/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /home/user/mnt/sdg1/outs/digitalhuman/chameleon/animake/main_1_41_12

Parameters:
  • --tts_txt: input text

Or, as a shorter command:

python main_2.py \
  \
  --tts_txt "a novel framework for generating high-quality animation driven by audio and a reference portrait image." \
  \
  --checkpoint /data/likun/outs/chameleon/moditalk/trn_31_lrs3/exp/weights/train-2000.pt \
  \
  --config ../../configs/aniptrt/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --ref_image_path "/home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_path "" \
  --out_save_path /data/likun/outs/chameleon/animake/main_2_43_24

3.3.3 Generate a video from a preprocessed keypoint sequence

python main_2.py \
  \
  --checkpoint /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/trn_19_atm_1/exp/weights/train-2000.pt \
  \
  --config ../../configs/aniptrt/prompts/animation_trnfa41.yaml \
  -W 256 \
  -H 256 \
  -L 0 \
  --fps 25 \
  --ref_image_path "/home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/00000000.png" \
  --ref_pose_path "" \
  --tgt_pose_type "fcup" \
  --tgt_pose_path "/home/user/mnt/sdh1/data/digiman/VFHQ_512_p100_256_kps_42_unfcpsok_img/face-centric/unposed/Clip+_pcoGxTYEKk+P0+C0+F106-276/" \
  --out_save_path /data/likun/outs/chameleon/animake/main_2_51_11

Parameters:
  • --tgt_pose_type: type of the keypoint sequence
  • --tgt_pose_path: directory of the pre-generated keypoint sequence
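
The three variants above differ only in where the driving signal comes from: `--input_audio`, `--tts_txt`, or `--tgt_pose_path`. How main_2.py dispatches on these arguments is not shown in this document. The sketch below is one hypothetical way to validate a command line before launching, assuming exactly one of the three should be non-empty. The function name `pick_input_mode` and the mode labels are made up for illustration.

```python
def pick_input_mode(input_audio="", tts_txt="", tgt_pose_path=""):
    """Return which of the three documented input sources is set.

    Assumption (not confirmed by the scripts): exactly one source is given,
    and an empty string means "not given", as in the commands above.
    """
    given = {
        "audio": bool(input_audio),
        "tts": bool(tts_txt),
        "keypoints": bool(tgt_pose_path),
    }
    chosen = [mode for mode, present in given.items() if present]
    if len(chosen) != 1:
        raise ValueError(f"expected exactly one input source, got {chosen or 'none'}")
    return chosen[0]
```

For example, `pick_input_mode(tts_txt="hello")` selects the text-to-speech route, while passing both audio and text raises an error instead of silently preferring one.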

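4 Model training

4.1 Audio-to-keypoint-sequence model

The audio-to-keypoint-sequence model is trained with the architecture and method of the MoDiTalker project.
Project directory: chameleon/projects/moditalk/
For details, see: chameleon/projects/moditalk/README.md

Data: LRS3 (preprocessed)
Location: user@192.168.58.238:/data/likun/data/data/wav2lib/lrs3/lrs3_tmp/

  • Training command: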
CUDA_VISIBLE_DEVICES=4,5,6,7 \
torchrun --nproc_per_node=4  --master-port=30021 \
  train_ddp.py \
    --batch_size 128 \
    --epochs 2000 \
    --feature_type jukebox \
    --save_interval 1 \
    --processed_data_dir /data/likun/data/data/wav2lib/lrs3/lrs3_tmp \
    --project /data/likun/outs/chameleon/moditalk/trn_31_lrs3
  • Test command:
CUDA_VISIBLE_DEVICES=1 \
python inference_lk.py \
  --cond_kps_path /home/user/mnt/sdg1/data/imgs/VFHQ_512_p100_256_kps_fortest_31/Clip+_pcoGxTYEKk+P0+C0+F106-276/face-centric/unposed/00000000.png.npy \
  --hubert_path /home/user/mnt/sdg1/outs/digitalhuman/mobitalker/inf_ddp_atom_en_21/12_hbrt/16000/237-134500-0003.npy \
  --audio_wav_path /home/user/mnt/sdg1/data/wavs/237-134500-0003.wav \
  --save_dir /data/likun/outs/chameleon/moditalk/inf_ddp_atom_en_31/ \
  --checkpoint /data/likun/outs/chameleon/moditalk/trn_31_lrs3/exp/weights/train-2000.pt
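
The test command loads a checkpoint that the training command wrote under its `--project` directory. Judging from the two commands above, the weights land in `<project>/exp/weights/train-<epoch>.pt`. The sketch below reproduces that naming. The helper name `atom_checkpoint_path` is hypothetical, and the `exp` directory name is taken from the example path and assumed to be a default.

```python
from pathlib import PurePosixPath

def atom_checkpoint_path(project, epoch, exp_name="exp"):
    """Expected checkpoint file for a given --project directory and epoch."""
    return str(PurePosixPath(project) / exp_name / "weights" / f"train-{epoch}.pt")
```

With `--project /data/likun/outs/chameleon/moditalk/trn_31_lrs3` and epoch 2000, this yields the `--checkpoint` value used in the test command.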

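4.2 Keypoint-sequence-to-video model

The keypoint-sequence-to-video model is trained with the architecture and method of the AniPortrait project.
Project directory: chameleon/projects/aniptrt/
For details, see: chameleon/projects/aniptrt/README.md

Training is split into two stages so that different capabilities of the model are trained step by step. See the AniPortrait technical paper for details.

4.2.1 Data

Several open-source datasets are used: HDTF, CelebV-HQ, and VFHQ.

Data locations:
user@192.168.58.238:/data/likun/data/data/HDTF/
user@192.168.58.238:/data/likun/data/data/celebv_hq/
user@192.168.58.238:/data/likun/data/data/VFHQ/

For details, see the configuration files:
chameleon/configs/aniptrt/train/stage1_fa_mlt.yaml
chameleon/configs/aniptrt/train/stage2_fa_mlt.yaml

4.2.2 Train stage 1

First edit the configuration file chameleon/configs/aniptrt/train/stage1_fa_mlt.yaml, mainly the dataset directories and the locations where the training logs and outputs are saved.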

CUDA_VISIBLE_DEVICES=4,5,6,7 \
    accelerate launch --main_process_port 29502 \
    train_stage_1_fa_tm.py --config ../../configs/aniptrt/train/stage1_fa_mlt.yaml

4.2.3 Train stage 2

First edit the configuration file chameleon/configs/aniptrt/train/stage2_fa_mlt.yaml, mainly the dataset directories and the locations where the training logs and outputs are saved.

CUDA_VISIBLE_DEVICES=0,1,2,3 \
    accelerate launch --main_process_port 29522 \
    train_stage_2_fa.py --config ../../configs/aniptrt/train/stage2_fa_mlt.yaml
