Starred repositories
🎬 卡卡字幕助手 | VideoCaptioner - An LLM-based subtitle assistant that handles the full video subtitling workflow: subtitle generation, sentence segmentation, correction, and translation. An LLM-powered tool for easy and efficient video subtitling.
Anthropic's Interactive Prompt Engineering Tutorial
A free, open source, and extensible speech-to-text application that works completely offline.
We write your reusable computer vision tools. 💜
Reference PyTorch implementation and models for DINOv3
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
[NeurIPS 2025] Official implementation of "XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation".
A unified inference and post-training framework for accelerated video generation.
An inference and training framework for multi-image input with Flux Kontext dev
A PyTorch native platform for training generative AI models
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
The ultimate training toolkit for finetuning diffusion models
[ICCV 2025] Official implementation of the paper "VACE: All-in-One Video Creation and Editing"
This node preserves image quality by selectively merging only the changed regions from AI-generated edits back into the original image.
Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models by Value Sign Flip
Repo for SeedVR2 & SeedVR (CVPR 2025 Highlight)
Use Claude Code as the foundation for coding infrastructure, allowing you to decide how to interact with the model while enjoying updates from Anthropic.
LBM: Latent Bridge Matching for Fast Image-to-Image Translation ✨ (ICCV 2025 Highlight)
Context engineering is the new vibe coding - it's the way to actually make AI coding assistants work. Claude Code is the best for this, so that's what this repo is centered around, but you can apply…
[NeurIPS 2025] JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
An open-source AI agent that brings the power of Gemini directly into your terminal.
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"
[AAAI 2026] VisionReward: Fine-Grained Multi-Dimensional Human Preference Learning for Image and Video Generation
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences for Text-to-image Generation
CLIP+MLP Aesthetic Score Predictor