Snap&Spot: Leveraging Large Vision Language Models for Train-Free Zero-Shot Localization of Unusual Activities in Video
Texas A&M University
- Clone this repository:

```shell
git clone https://github.com/Hasnat79/Snap_n_Spot
cd Snap_n_Spot
```

- Initialize the submodules (foundation_models):

```shell
git submodule update --init --recursive
```
To install the necessary dependencies, run:

```shell
conda create -n snap
conda activate snap
pip install -r requirements.txt
```
The `/data` directory contains the Charades-STA and UAG-OOPS annotation files. `oops_video/val` contains the videos of the UAG-OOPS dataset, and `charades-sta` contains the videos of the Charades-STA dataset.
```shell
cd src
python feature_extraction.py
```

- generates BLIP-2 features for the videos in the data directory in NumPy format
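Downstream scripts can then load these saved features directly. A minimal sketch of what that looks like, assuming one `.npy` file per video holding a `(num_frames, feature_dim)` array of BLIP-2 embeddings (the exact layout produced by `feature_extraction.py` may differ):

```python
import numpy as np

# Stand-in for a real extracted feature file: 120 sampled frames,
# each with a 768-dim BLIP-2 embedding (dimensions are illustrative).
features = np.random.rand(120, 768).astype(np.float32)
np.save("example_video.npy", features)

# Loading the features back for inference.
loaded = np.load("example_video.npy")
print(loaded.shape)  # (120, 768): one embedding per sampled frame
```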
## 🧠 Methodology

A Colab demo is also available.
```shell
cd src
python infer_snap.py --dataset uag_oops
```

- generates the metrics for zero-shot unusual activity localization on the UAG-OOPS dataset using the Snap&Spot pipeline.
Expected output format:

```
R@0.3: 0.6620967741935484
R@0.5: 0.49489247311827955
R@0.7: 0.23951612903225805
```
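The R@tIoU numbers count a prediction as correct when its temporal IoU with the ground-truth segment meets the threshold. A minimal sketch of that metric (function names are ours, not from the repo):

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, threshold):
    """Fraction of examples whose predicted span has IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

preds = [(2.0, 6.0), (10.0, 14.0)]
gts = [(3.0, 7.0), (10.0, 12.0)]
print(recall_at_iou(preds, gts, 0.5))  # → 1.0 (IoUs are 0.6 and 0.5)
```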
```shell
python demo.py \
  --video_path "/Snap_n_Spot/data/oops_video/34 Funny Kid Nominees - FailArmy Hall Of Fame (May 2017)0.mp4" \
  --query "A guy jumps onto a bed where his son is. When the guy jumps, the son flies up and hits the wall."
```
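Conceptually, a train-free pipeline like this scores each frame's vision-language embedding against the text query and picks a high-scoring contiguous span. The sketch below illustrates that idea with cosine similarity over stand-in embeddings; none of the names or thresholds come from the repo, and the actual Snap&Spot scoring may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real BLIP-2 outputs: 10 frame embeddings and 1 query embedding.
frame_feats = rng.normal(size=(10, 4))
query_feat = rng.normal(size=4)

# Cosine similarity of every frame to the query.
f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
q = query_feat / np.linalg.norm(query_feat)
scores = f @ q

def best_span(scores, thr):
    """Contiguous run of frames scoring >= thr with the largest total score."""
    best, cur_start, cur_sum, best_sum = None, None, 0.0, -np.inf
    for i, s in enumerate(scores):
        if s >= thr:
            if cur_start is None:
                cur_start, cur_sum = i, 0.0
            cur_sum += s
            if cur_sum > best_sum:
                best_sum, best = cur_sum, (cur_start, i)
        else:
            cur_start = None
    return best

span = best_span(scores, scores.mean())
print(span)  # (start_frame, end_frame) indices of the predicted segment
```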
To evaluate on a custom dataset:

- set up the dataset in the data_configs file
- generate the features using the feature_extraction.py file
- run the evaluation using the evaluate.py file

Note: make sure the paths are updated correctly in the config file and inside the scripts.
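For orientation, a custom dataset entry might look like the following. This is a hypothetical sketch: the key names and structure are assumptions, so check the repo's actual data_configs file for the real schema:

```python
# Hypothetical data_configs entry -- the real keys in the repo may differ.
DATASETS = {
    "my_dataset": {
        "video_dir": "data/my_dataset/videos",          # raw video files
        "annotation_file": "data/my_dataset/ann.json",  # query + (start, end) per clip
        "feature_dir": "data/my_dataset/features",      # .npy output of feature_extraction.py
    }
}

print(sorted(DATASETS["my_dataset"]))
```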
