Dataset · Checkpoints · Website · Paper
git clone --branch main --single-branch https://github.com/Dou-Yiming/hearing_hands.git
cd hearing_hands
conda env create -f environment.yml
Download the dataset from this link, then extract it:
tar -xvf dataset.tar.gz video2audio/data/dataset
Download the pretrained checkpoints from this link, then extract them:
tar -xvf checkpoints.tar.gz ./
mkdir -p video2audio/logs; mv checkpoints/sarf_full video2audio/logs
mv checkpoints/adm_checkpoints video2audio/ldm/adm/checkpoints
mv checkpoints/vocoder_checkpoints/* video2audio/ldm/vocoder/bigvgan/checkpoints
rm checkpoints.tar.gz
Run inference:
sh scripts/inference.sh
Train the model:
sh scripts/train.sh
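If either script cannot find its data or weights, it may help to confirm that the setup commands left everything where they were supposed to. The following is a minimal sketch, not part of the repository: the function name `check_layout` is hypothetical, and the four paths are simply the destinations named in the setup commands, assuming each ends up as a directory under the repository root.

```shell
#!/bin/sh
# Sanity check (illustrative sketch, not shipped with hearing_hands).
# Reports which of the directories named in the setup commands exist.
# check_layout is a hypothetical helper; the paths are assumptions taken
# from the tar/mv destinations in this README.
check_layout() {
    root="${1:-.}"      # repository root, defaults to the current directory
    missing=0
    for dir in \
        video2audio/data/dataset \
        video2audio/logs/sarf_full \
        video2audio/ldm/adm/checkpoints \
        video2audio/ldm/vocoder/bigvgan/checkpoints
    do
        if [ -d "$root/$dir" ]; then
            echo "ok      $dir"
        else
            echo "missing $dir"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing directories (0 means all present).
    return "$missing"
}

# Run from the repository root; the hint is printed only if something is absent.
check_layout . || echo "Some paths are missing; re-run the corresponding setup step."
```

The check only tests that the directories exist; it does not validate their contents, so an incomplete download would still pass.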
If you find our work useful, please consider citing:
@inproceedings{dou2025hearing,
title={Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes},
author={Dou, Yiming and Oh, Wonseok and Luo, Yuqing and Loquercio, Antonio and Owens, Andrew},
booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
pages={1795--1804},
year={2025}
}