ReKV

Official PyTorch code of "Streaming Video Question-Answering with In-context Video KV-Cache Retrieval", ICLR 2025.

Abstract

We propose ReKV, a novel, training-free approach that integrates seamlessly with existing Video Large Language Models (Video-LLMs) to enable efficient streaming video question-answering (StreamingVQA).

Traditional VideoQA systems struggle with long videos, as they must process the entire video before responding to queries, and repeat this process for each new question. In contrast, our approach analyzes long videos in a streaming fashion, allowing for prompt responses as soon as user queries are received.

  • Building on a common Video-LLM, we first incorporate a sliding-window attention mechanism, ensuring that input frames attend to a limited number of preceding frames, thereby reducing computational overhead.
  • To prevent information loss, we store processed video key-value caches (KV-Caches) in RAM and disk, reloading them into GPU memory as needed.
  • Additionally, we introduce a retrieval method that leverages an external retriever or the parameters within Video-LLMs to retrieve only query-relevant KV-Caches, ensuring both efficiency and accuracy in question answering.

ReKV separates video analysis and question answering across different processes and GPUs, significantly improving the efficiency of StreamingVQA. Comprehensive experiments validate the efficacy and practicality of our approach, which substantially improves efficiency and applicability over existing VideoQA models.
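
The retrieval step described above can be pictured roughly as follows. This is a minimal sketch that assumes one representative key per cached KV block and cosine-similarity scoring; the function name retrieve_blocks, its signature, and the default retrieve size are illustrative assumptions, not the repository's actual API (ReKV can also reuse the Video-LLM's own parameters for scoring, as noted above).

import torch

def retrieve_blocks(query_emb, block_keys, kv_blocks, top_k=64, device="cuda"):
    """Hypothetical sketch: select the cached KV blocks most relevant to a query.

    query_emb:  (d,) embedding of the user question
    block_keys: (num_blocks, d) one representative key per cached block
    kv_blocks:  list of (K, V) tensors kept in RAM / on disk
    """
    # Score every cached block against the query embedding.
    scores = torch.nn.functional.cosine_similarity(
        block_keys, query_emb.unsqueeze(0), dim=-1
    )
    # Keep only the top-k most relevant blocks.
    top = scores.topk(min(top_k, len(kv_blocks))).indices.tolist()
    # Reload just those blocks onto the GPU for answering.
    return [(k.to(device), v.to(device)) for k, v in (kv_blocks[i] for i in sorted(top))]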

Directory Structure

.
├── data        processed benchmarks
├── model       code for integrating ReKV with various Video-LLMs
├── model_zoo   pretrained Video-LLM checkpoints
├── results     evaluation results
└── video_qa    code for StreamingVQA & OfflineVQA

Preparation

Our setup: Ubuntu 22.04, CUDA 12.6, 8x Nvidia H800 (80GB)

  • Clone this repo: git clone https://github.com/Becomebright/ReKV.git
  • Prepare the conda environment: bash prepare.sh
  • Download pretrained Video-LLMs under model_zoo/
  • Download benchmarks under data/
    • MLVU-dev-mc
    • QAEgo4D-test-mc
    • EgoSchema-full
    • ActivityNet-QA
    • RVS
    • CGBench
    • The data/ folder should be arranged as:
      data
      ├── activitynet_qa
      │   ├── test.json
      │   └── videos
      ├── cgbench
      │   ├── full_mc.json
      │   └── videos
      ├── egoschema
      │   ├── full.json
      │   └── videos
      ├── mlvu
      │   ├── dev_debug_mc.json
      │   └── videos
      ├── qaego4d
      │   ├── test_mc.json
      │   └── videos
      └── rvs
          ├── ego
          │   ├── ego4d_oe.json
          │   └── videos
          └── movie
              ├── movienet_oe.json
              └── videos
      
  • Increase the memory map limit for processes (needed for offloading KV-Caches): sudo sysctl -w vm.max_map_count=262144
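
The map-count limit matters because offloaded KV-Caches are typically exposed to the process as memory-mapped files, and every mapping counts against vm.max_map_count. A rough illustration of the pattern (the file name, tensor shape, and block layout are made up for this example and are not the repository's actual format):

import numpy as np

# Hypothetical example: spill one KV-Cache block to disk, then map it back lazily.
# Each np.memmap creates a new memory mapping, so a long video split into many
# blocks can exhaust the Linux default vm.max_map_count (65530) without the sysctl above.
shape, dtype = (2, 32, 1024, 128), np.float16   # (K/V, heads, tokens, head_dim) -- illustrative

block = np.random.rand(*shape).astype(dtype)
block.tofile("kv_block_000.bin")                # offload to disk

mapped = np.memmap("kv_block_000.bin", dtype=dtype, mode="r", shape=shape)
print(mapped[0, 0, :4, :2])                     # pages are read from disk only when accessed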

Evaluation

# Number of processes used for parallel evaluation.
# Normally, set it to the number of GPUs on your machine.
# However, llava_ov_72b needs 4x 80GB GPUs, so set num_chunks to num_gpus // 4.
num_chunks=8

# Supported models: llava_ov_0.5b llava_ov_7b llava_ov_72b video_llava_7b longva_7b
model=llava_ov_0.5b

# Supported datasets: qaego4d egoschema cgbench mlvu activitynet_qa rvs_ego rvs_movie
# MLVU contains an extremely long video (~9 hr). Remove it from the annotation file if your system doesn't have enough RAM.
dataset=qaego4d

python -m video_qa.run_eval \
    --num_chunks $num_chunks \
    --model ${model} \
    --dataset ${dataset} \
    --sample_fps 0.5 \
    --n_local 15000 \
    --retrieve_size 64
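
To sweep several benchmarks with the same model, the command above can be wrapped in a simple loop (a convenience sketch, not a script shipped with the repo):

for dataset in qaego4d egoschema mlvu; do
    python -m video_qa.run_eval \
        --num_chunks $num_chunks \
        --model ${model} \
        --dataset ${dataset} \
        --sample_fps 0.5 \
        --n_local 15000 \
        --retrieve_size 64
done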

Citation

@inproceedings{di2025rekv,
  title={Streaming Video Question-Answering with In-context Video KV-Cache Retrieval},
  author={Di, Shangzhe and Yu, Zhelun and Zhang, Guanghao and Li, Haoyuan and Cheng, Hao and Li, Bolin and He, Wanggui and Shu, Fangxun and Jiang, Hao and others},
  booktitle={ICLR},
  year={2025}
}

Acknowledgements

Our code is based on InfLLM, StreamingLLM, and Flash-VStream.
