This repository contains code and experiments for the bachelor thesis "Evaluating Temporal Context for Robustness to Perturbations in Video Object Detection Models". It explores temporal attention mechanisms for object detection in video sequences using the UAV (Unmanned Aerial Vehicle) datasets VisDrone and XS-VID. The project evaluates the performance of the temporal attention models TRANSVOD and YOLOV as well as their baselines MSDA and YOLOX.
The repository supports:
- 🔧 Environment setup
- 📦 Dataset structure generation & preprocessing
- 🔥 Model setup and training
- 📊 Evaluation and result visualization
- ♻️ FAIR and reproducible experiment design
| Component | Description |
|---|---|
| 🎯 Goal | Investigate the influence of temporal attention on complex datasets and on model robustness |
| 🧪 Datasets | VisDrone (drone-based object detection), XS-VID (small-object dataset) |
| 🧠 Framework | CV Deep learning (PyTorch-based, CUDA) |
| 💡 Output | Thesis, evaluation metrics (mAP, AP), trained models, prediction results, examples |
| ♻️ FAIR compliance | Code stored in Git, data in RDM repository (10.70124/mv76r-r8x04), fully documented |
```
.
├─ code/      # Python code for setup, dataset building
├─ setup/     # Setup script and environment information
├─ data/      # Data folder with dataset information and examples
├─ results/   # Summary metrics & figures (full outputs in RDM repository)
├─ README.md  # This file
├─ scripts/   # Easy-to-use scripts for training, evaluation, and result generation
└─ .gitignore
```
Before running the setup, ensure that you have:
- NVIDIA GPU supporting CUDA 11.3
- At least 50 GB of free storage space
- A Linux-based system (recommended) with `sudo` permissions
- Conda or Python 3.8+ installed
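A quick, optional way to verify these prerequisites (assuming `nvidia-smi`, `df`, and either Conda or Python are on the `PATH`):

```bash
# Optional sanity check for the prerequisites above
nvidia-smi                             # GPU visible, driver compatible with CUDA 11.3?
df -h .                                # roughly 50 GB free on this filesystem?
conda --version || python3 --version   # Conda or Python 3.8+ available?
```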
1. Clone the repository including submodules (if you already cloned without them, see the note after this list):

   ```bash
   git clone --recurse-submodules https://github.com/mozi30/TemporalAttentionPlayground.git
   ```
2. Navigate to the setup directory:

   ```bash
   cd setup
   ```
3. Configure the required paths by editing the `setup.env` file:

   ```bash
   nano setup.env
   ```

   Adjust the values according to your system (a hypothetical sketch follows this list), for example:
   - Base environment path
   - Dataset storage location
   - Output directory for annotations
   - Directory for model weights
4. Run the setup script:

   ```bash
   sudo bash setup-env.sh
   ```

   This script will:
   - Install required environments and dependencies (e.g., msda, YOLOX)
   - Download datasets from the specified storage location
   - Generate annotations in the correct format for training and evaluation
   - Download base model weights
   - Finalize the environment for model training and evaluation
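As a rough sketch of the kind of values `setup.env` expects (the variable names below are hypothetical; use the keys actually present in the file shipped with the repository):

```bash
# Hypothetical setup.env sketch: only the kinds of paths are illustrated here;
# the actual variable names are defined in the file itself.
BASE_ENV_PATH=/opt/conda/envs          # base environment path
DATASET_DIR=/data/datasets             # where VisDrone / XS-VID will be stored
ANNOTATION_OUT_DIR=/data/annotations   # output directory for generated annotations
WEIGHTS_DIR=/data/weights              # directory for downloaded model weights
```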
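If the repository was cloned without `--recurse-submodules` (step 1), the submodules can still be fetched afterwards:

```bash
# Fetch and initialize all submodules after a plain clone
git submodule update --init --recursive
```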
You are ready to start working 🚀🚀🚀
For more information on training and evaluation, see `scripts/README.md`.
This project is designed according to the FAIR principles (Findable, Accessible, Interoperable, Reusable). The experiment follows the data lifecycle described in the Data Management Plan (DMP).
- Code is version-controlled in this Git repository.
- Full experimental results, dataset annotations, and trained model weights will be published in the TU Wien Research Data Repository (DOI: 10.70124/mv76r-r8x04).
- Each dataset used (VisDrone, XS-VID) is referenced with its official source and citation.
Further information can be found in `data/README.md`.
- Code and lightweight experiment results are openly available in this repository.
- Full datasets and large result files (e.g., video sequences, large model checkpoints) are available via RDM repository access, according to their respective licenses.
- Repository includes clear instructions on how to obtain and prepare input data.
- Standard formats are used whenever possible (JSON, YAML, COCO-style annotations, PNG/JPEG images); a sketch of a COCO-style entry follows this list.
- Data processing follows structured Python pipelines.
- Configuration files (e.g., `.env`, YAML) allow reproducibility of the setup.
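To make "COCO-style" concrete, a minimal annotation file has roughly the following shape (all values here are illustrative, not taken from the actual dataset files; `bbox` is `[x, y, width, height]` in pixels):

```json
{
  "images": [
    {"id": 1, "file_name": "seq01/frame_000001.jpg", "width": 1920, "height": 1080}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 2, "bbox": [100, 200, 40, 30], "area": 1200, "iscrowd": 0}
  ],
  "categories": [
    {"id": 2, "name": "car"}
  ]
}
```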
- Code will be provided under the MIT License (see the `LICENSE` file).
- Produced experimental data will be shared under MIT for XS-VID and CC BY-NC-SA 3.0 for VisDrone.
- Detailed metadata is provided in:
  - `data/README.md` – dataset structure and label mapping
  - `results/README.md` – explanation of metrics and result files
  - `scripts/` – runnable experiment scripts
  - DMP (deposited to Zenodo)
A detailed DMP following Science Europe Guidelines has been created for this project. It includes:
- Data sources and licensing
- Data processing workflow and reproducibility
- Storage, backup, and access strategy
- Metadata and documentation standards
- FAIR self-assessment and steps taken
The DMP will be deposited on Zenodo as part of the Intro RDM – DMPs 2025 collection (embargoed until deadline).
🔹 DMP Title: DMP: Temporal Context in Computer Vision Detection Model
🔹 DOI: 10.5281/zenodo.17771932
| Component | License |
|---|---|
| Code | MIT License |
| Produced Data & Results | CC BY-NC-SA 3.0 for results derived from VisDrone, MIT for XS-VID |
| Input Data | Per dataset terms (VisDrone: CC BY-NC-SA 3.0, XS-VID: MIT) |
The appropriate license files will be added to the repository and confirmed in the DMP.