Stars
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
Wan: Open and Advanced Large-Scale Video Generative Models
A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Text Normalization & Inverse Text Normalization
Multi-lingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
PyTorch implementation of some attentions for Deep Learning Researchers.
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
AI-based tool for removing hard-coded subtitles and text-like watermarks from videos or pictures, producing output files at the original resolution. Runs locally; no third-party API required.
APISR: Anime Production Inspired Real-World Anime Super-Resolution (CVPR 2024)
[WIP] Layer Diffusion for WebUI (via Forge)
Transparent Image Layer Diffusion using Latent Transparency
Unofficial implementation of I2VGenXL for ComfyUI
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
Translate manga/images: one-click translation of text in all kinds of images. https://cotrans.touhou.ai/ (no longer working)
A state-of-the-art open visual language model | Multimodal pretrained model
Official implementation of AnimateDiff.
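This resource supports the author's Python image processing articles on CSDN. It mainly contains Python code implementing algorithms for image processing, image recognition, image classification, and more. I hope you find it helpful; let's keep at it together.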
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
