Stars
Nano Banana (nanobanana), GPT-5 (GPT5), GPT-4o (GPT4o) image prompts, Nanobanana prompts
real time face swap and one-click video deepfake with only a single image
[SIGGRAPH Asia 2025] DreamO: A Unified Framework for Image Customization
Implementation of "EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer"(ICCV2025)
Multi/Single UAV(unmanned aerial vehicle) path planning based on deep reinforcement learning
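Path planning algorithms: A* (A-star heuristic search), Hybrid A*, Dijkstra, GBFS (greedy best-first search), DFS (depth-first search), BFS (breadth-first search), and others; vehicle path planning algorithms; 小黑子 path planning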
RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
[ICCV2025] UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Image composition toolbox: everything you want to know about image composition or object insertion
ControlNet++: All-in-one ControlNet for image generations and editing!
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Noise suppression using deep filtering
Awesome-LLM: a curated list of Large Language Models
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Singing voice conversion based on Whisper, with LoRA for singing voice cloning
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent application based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Multilingual Voice Understanding Model
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Open-source, accurate and easy-to-use video speech recognition & clipping tool, with LLM-based AI clipping integrated.
This converter converts between multiple Uyghur scripts: ULS (Uyghur Latin Script), UAS (Uyghur Arabic Script), CTS (Common Turkic Script), UCS (Uyghur Cyrillic Script) and Uyghur Yengi (new) Script.
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
vits2 backbone with multilingual-bert
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

