
Ovis-Image is a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints.

Python 296 16 Updated Dec 21, 2025

[CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation

Python 1,341 126 Updated Dec 19, 2025

A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Python 446 14 Updated Dec 2, 2025

Awesome Unified Multimodal Models

1,022 32 Updated Aug 17, 2025

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python 989 111 Updated Mar 4, 2024

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Python 211 9 Updated Sep 26, 2025

[NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies

Python 224 4 Updated Apr 14, 2025

Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding"

Python 53 5 Updated Dec 12, 2024

Valley is a cutting-edge multimodal large model designed to handle a variety of tasks involving text, images, and video data.

Python 268 14 Updated Dec 2, 2025

Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.

Python 32 3 Updated Feb 26, 2025

The code repository for "Wings: Learning Multimodal LLMs without Text-only Forgetting" [NeurIPS 2024]

Python 24 1 Updated Dec 28, 2024

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Python 3 Updated Nov 4, 2024
Python 3 Updated Nov 14, 2024

Agentic ADK is an Agent application development framework launched by Alibaba International AI Business, based on Google-ADK and Ali-LangEngine.

Java 638 120 Updated Nov 24, 2025

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Python 677 43 Updated Jan 29, 2025

Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vectors

Python 264 14 Updated Feb 17, 2025
Python 187 10 Updated Feb 6, 2025

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3, Qwen3-MoE, DeepSeek-R1, GLM4.5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, …

Python 12,020 1,106 Updated Jan 7, 2026

[ECCV 2024] Official Implementation of An Incremental Unified Framework for Small Defect Inspection

Python 45 1 Updated Feb 17, 2025

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile 96,905 10,742 Updated Jan 5, 2026

[ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Python 116 4 Updated Jul 18, 2024

🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

Python 77 3 Updated Jun 12, 2025

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 9,679 752 Updated Sep 22, 2025
Python 7 Updated Aug 16, 2024

[NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.

Python 128 12 Updated May 16, 2025

Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

Python 92 9 Updated Jun 28, 2024

[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Python 137 16 Updated Apr 22, 2025

When do we not need larger vision models?

Python 413 15 Updated Feb 8, 2025

Data and Code for ACL 2021 Paper "Inter-GPS: Interpretable Geometry Problem Solving with Formal Language and Symbolic Reasoning"

Python 171 34 Updated Mar 29, 2025
Python 12 Updated May 9, 2023