

LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs

Python 409 21 Updated Dec 20, 2025

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Python 815 53 Updated Oct 15, 2025

Implementations and experimentation on mHC by DeepSeek - https://arxiv.org/abs/2512.24880

Python 202 15 Updated Jan 4, 2026

A Paper List for Humanoid Robot Learning.

1,491 70 Updated Jan 6, 2026

Official Implementation of MambaMia (AAAI-26 Oral)

3 Updated Dec 21, 2025

Official repository of the paper "Does audio matter for modern video-LLMs and their benchmarks?"

3 Updated Nov 24, 2025

An open-source implementation for fine-tuning the Qwen-VL series by Alibaba Cloud.

Python 1,565 189 Updated Dec 19, 2025

Query-aware Token Selector (QTSplus), a lightweight yet powerful visual token selection module that serves as an information gate between the vision encoder and LLMs.

Python 129 9 Updated Nov 29, 2025

Official PyTorch Code of ReKV (ICLR'25)

Python 88 6 Updated Nov 4, 2025

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Python 492 14 Updated Nov 18, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,258 408 Updated Jan 9, 2026

A framework for efficient model inference with omni-modality models

Python 2,074 271 Updated Jan 10, 2026

ChatDev 2.0: Dev All through LLM-powered Multi-Agent Collaboration

Python 28,148 3,558 Updated Jan 10, 2026

Ring attention implementation with flash attention

Python 961 93 Updated Sep 10, 2025

🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training

Python 225 10 Updated Nov 21, 2025

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 616 66 Updated Nov 26, 2025

Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

Python 83 Updated Dec 1, 2025

Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries

33 1 Updated Nov 19, 2025

[NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models

Python 22 1 Updated Nov 30, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,372 271 Updated Jan 9, 2026

GLM-4.6V/4.5V/4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Python 2,112 145 Updated Dec 18, 2025

SAIL-RL: Guiding MLLMs in When and How to Think via Dual-Reward RL Tuning

9 Updated Nov 8, 2025

An efficient video loader for deep learning with smart shuffling that's super easy to digest

C++ 2,394 214 Updated Jul 17, 2024

Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"

Python 127 7 Updated Dec 18, 2025

video-SALMONN 2 is a powerful audio-visual large language model (LLM) that generates high-quality audio-visual video captions, which is developed by the Department of Electronic Engineering at Tsin…

Python 136 15 Updated Dec 22, 2025

Native Multimodal Models are World Learners

Python 1,395 53 Updated Dec 30, 2025

Code for the Molmo Vision-Language Model

Python 854 83 Updated Dec 12, 2024

A simple, unified multimodal models training engine. Lean, flexible, and built for hacking at scale.

Python 693 27 Updated Jan 9, 2026

State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!

Jupyter Notebook 2,065 139 Updated Dec 18, 2025

The best ChatGPT that $100 can buy.

Python 40,038 5,134 Updated Jan 8, 2026