Stars
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
Enforce the output format (JSON Schema, regex, etc.) of a language model
A series of technical reports on Slow Thinking with LLMs
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
A fork to add multimodal model training to open-r1
Witness the aha moment of VLM with less than $3.
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
[EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models
qnguyen3 / nanoLLaVA (forked from BAAI-DCAI/Bunny)
World's Smallest Vision-Language Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
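Chinese and English sensitive words, language detection, location and carrier lookup for Chinese and foreign mobile/landline numbers, gender inference from names, mobile number extraction, ID card number extraction, email extraction, Chinese and Japanese name databases, Chinese abbreviation database, character-decomposition dictionary, word sentiment values, stop words, reactionary word list, violence and terrorism word list, Traditional/Simplified Chinese conversion, English approximation of Chinese pronunciation, Wang Feng lyrics generator, occupation name lexicon, synonym lexicon, antonym lexicon, negation word lexicon, car brand lexicon, auto parts lexicon, segmentation of continuous English text, various Chinese word vectors, comprehensive list of company names, classical poetry corpus, IT lexicon, finance lexicon, idiom lexicon, place name lexicon, …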
WebChatGPT: A browser extension that augments your ChatGPT prompts with web results.
OpenHGNN: an open-source toolkit for heterogeneous graph neural networks, based on DGL.
A plug-and-play library for parameter-efficient tuning (Delta Tuning)
A Unified Library for Parameter-Efficient and Modular Transfer Learning
Transfer learning / domain adaptation / domain generalization / multi-task learning, etc.: papers, code, datasets, applications, tutorials.
A collection of classic and cutting-edge industry papers and resources on recommendation and advertising / Must-read Papers on Recommendation System and CTR Prediction