Snap&Spot: Leveraging Large Vision Language Models for Train-Free Zero-Shot Localization of Unusual Activities in Video
Texas A&M University
- Clone this repository:

```shell
git clone https://github.com/Hasnat79/Snap_n_Spot
cd Snap_n_Spot
```

- Initialize the submodules (foundation_models):

```shell
git submodule update --init --recursive
```
To install the necessary dependencies, run:

```shell
conda create -n snap
conda activate snap
pip install -r requirements.txt
```
The `/data` directory contains the Charades-STA and UAG-OOPS annotation files. `oops_video/val` contains the videos of the UAG-OOPS dataset, and `charades-sta` contains the videos of the Charades-STA dataset.
```shell
cd src
python feature_extraction.py
```

- generates BLIP-2 features for the videos in the data directory in NumPy format
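Downstream scripts can then load these saved features directly. A minimal sketch of what that looks like, assuming one `.npy` file per video holding a `(num_frames, feature_dim)` array of BLIP-2 embeddings (the exact layout produced by `feature_extraction.py` may differ):

```python
import numpy as np

# Stand-in for a real extracted feature file: 120 sampled frames,
# each with a 768-dim BLIP-2 embedding (dimensions are illustrative).
features = np.random.rand(120, 768).astype(np.float32)
np.save("example_video.npy", features)

# Loading the features back for inference.
loaded = np.load("example_video.npy")
print(loaded.shape)  # (120, 768): one embedding per sampled frame
```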
## 🧠 Methodology

A Colab demo is also available.
```shell
cd src
python infer_snap.py --dataset uag_oops
```

- generates the metrics for zero-shot unusual activity localization on the UAG-OOPS dataset using the Snap&Spot pipeline.
Expected output format:

```
R@0.3: 0.6620967741935484
R@0.5: 0.49489247311827955
R@0.7: 0.23951612903225805
```
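The R@tIoU numbers count a prediction as correct when its temporal IoU with the ground-truth segment meets the threshold. A minimal sketch of that metric (function names are ours, not from the repo):

```python
def temporal_iou(pred, gt):
    """IoU between two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, threshold):
    """Fraction of examples whose predicted span has IoU >= threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

preds = [(2.0, 6.0), (10.0, 14.0)]
gts = [(3.0, 7.0), (10.0, 12.0)]
print(recall_at_iou(preds, gts, 0.5))  # → 1.0 (IoUs are 0.6 and 0.5)
```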
```shell
python demo.py \
  --video_path "/Snap_n_Spot/data/oops_video/34 Funny Kid Nominees - FailArmy Hall Of Fame (May 2017)0.mp4" \
  --query "A guy jumps onto a bed where his son is. When the guy jumps, the son flies up and hits the wall."
```
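Conceptually, a train-free pipeline like this scores each frame's vision-language embedding against the text query and picks a high-scoring contiguous span. The sketch below illustrates that idea with cosine similarity over stand-in embeddings; none of the names or thresholds come from the repo, and the actual Snap&Spot scoring may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for real BLIP-2 outputs: 10 frame embeddings and 1 query embedding.
frame_feats = rng.normal(size=(10, 4))
query_feat = rng.normal(size=4)

# Cosine similarity of every frame to the query.
f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
q = query_feat / np.linalg.norm(query_feat)
scores = f @ q

def best_span(scores, thr):
    """Contiguous run of frames scoring >= thr with the largest total score."""
    best, cur_start, cur_sum, best_sum = None, None, 0.0, -np.inf
    for i, s in enumerate(scores):
        if s >= thr:
            if cur_start is None:
                cur_start, cur_sum = i, 0.0
            cur_sum += s
            if cur_sum > best_sum:
                best_sum, best = cur_sum, (cur_start, i)
        else:
            cur_start = None
    return best

span = best_span(scores, scores.mean())
print(span)  # (start_frame, end_frame) indices of the predicted segment
```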
To evaluate on a custom dataset:

- set up the dataset in the data_configs file
- generate the features using the feature_extraction.py file
- run the evaluation using the evaluate.py file

Note: make sure the paths are updated correctly in the config file and inside the scripts.
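For orientation, a custom dataset entry might look like the following. This is a hypothetical sketch: the key names and structure are assumptions, so check the repo's actual data_configs file for the real schema:

```python
# Hypothetical data_configs entry -- the real keys in the repo may differ.
DATASETS = {
    "my_dataset": {
        "video_dir": "data/my_dataset/videos",          # raw video files
        "annotation_file": "data/my_dataset/ann.json",  # query + (start, end) per clip
        "feature_dir": "data/my_dataset/features",      # .npy output of feature_extraction.py
    }
}

print(sorted(DATASETS["my_dataset"]))
```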
