PyTorch implementation for the paper "Compositional Physical Reasoning of Objects and Events from Videos". More details and visualization results can be found on the project page.
Compositional Physical Reasoning of Objects and Events from Videos
Zhenfang Chen, Shilong Dong, Kexin Yi, Yunzhu Li, Mingyu Ding, Antonio Torralba, Joshua B. Tenenbaum, Chuang Gan
Prerequisites
- Python 3
- PyTorch 1.0 or higher, with NVIDIA CUDA Support
- Other required Python packages, as specified by requirements.txt.
Installation
- Install Jacinle: Clone the package, and add the bin path to your global PATH environment variable:
git clone https://github.com/vacancy/Jacinle --recursive
export PATH=<path_to_jacinle>/bin:$PATH
- Clone this repository:
git clone https://github.com/zfchenUnique/DCL-Release.git --recursive
- Create a conda environment for NS-CL, and install the requirements. This includes the required Python packages from both Jacinle and NS-CL. Most of the required packages are included in the built-in anaconda package.
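A minimal sketch of this step; the environment name comphy and the Python 3.8 pin are illustrative assumptions, not requirements taken from the repo:
conda create -n comphy python=3.8
conda activate comphy
pip install -r requirements.txt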
Data Preparation
- Download videos, video annotations, questions and answers, and object proposals from the official website.
- Transform videos into ".png" frames with ffmpeg.
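For illustration only (the directory layout and file names below are assumptions, not part of the dataset release), frames can be extracted with a command along these lines:
mkdir -p frames/sim_00000
ffmpeg -i videos/sim_00000.mp4 frames/sim_00000/frame_%05d.png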
Step-by-step Training
- Step 1: Download the proposals from the region proposal network and extract object trajectories for the train and val sets:
bash scripts/script_gen_tubes.sh
- Step 2: Train a concept learner with descriptive and explanatory questions for static concepts (i.e., color, shape, and material):
bash scripts/comphy_train_pcr_stage1 <GPU_ID> <DATA_DIR>
- Step 3: Extract static attributes and refine object trajectories.
Extract static attributes:
bash scripts/script_extract_attribute.sh
Refine object trajectories:
bash scripts/script_gen_tubes_refine.sh
- Step 4: Train the PCR model for stage-2 learning:
bash script/script_comphy_train_pcr_stage2.sh <GPU_ID> <DATA_DIR> <STAGE1_MODEL_DIR>
- Step 5: Train the PCR model for stage-3 learning:
bash script/script_comphy_train_pcr_stage3.sh <GPU_ID> <DATA_DIR> <STAGE2_MODEL_DIR>
Real-World Scenario Dataset
- Step 1: Download training videos, validation videos, and related questions from Google Drive (see the download sketch after this list).
- Step 2: Finetune a pretrained PCR model on the Real-World Scenario dataset.
bash script/script_real_world_dataset_finetune.sh <GPU_ID> <DATA_DIR> <STAGE2_MODEL_DIR>
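One possible way to fetch the Google Drive files from the command line is sketched below with gdown; the file ID placeholder and target paths are hypothetical, not taken from the release:
pip install gdown
gdown <GOOGLE_DRIVE_FILE_ID> -O real_world_train_videos.zip
unzip real_world_train_videos.zip -d data/real_world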
Citation
If you find this repo useful in your research, please consider citing:
@misc{chen2024compositionalphysicalreasoningobjects,
title={Compositional Physical Reasoning of Objects and Events from Videos},
author={Zhenfang Chen and Shilong Dong and Kexin Yi and Yunzhu Li and Mingyu Ding and Antonio Torralba and Joshua B. Tenenbaum and Chuang Gan},
year={2024},
eprint={2408.02687},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2408.02687},
}
