Stars
Bash is all you need - A nano Claude Code-like "agent harness", built from 0 to 1
The most powerful and modular diffusion model GUI, API and backend with a graph/nodes interface.
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Self-supervised key estimation model that matches the performance of supervised state-of-the-art models.
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
The most powerful local music generation model that outperforms most commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
Robust Singing Voice Transcription and MIDI Extraction
Suno API with JWT token authentication support
Use API to call the music generation AI of suno.ai, and easily integrate it into agents like GPTs.
🚀 "Large model": train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour! 🌏
Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization".
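A source-code walkthrough of MiniMind, a lightweight large language model, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more
🚀🚀 "Large model": train a small 26M-parameter GPT completely from scratch in just 2 hours! 🌏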
Encode and decode audio samples to/from continuous and discrete compressed representations!
HeartMuLa Official Repo: The Most Powerful Open-Source Music Generation Model of 2026
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
AnyAccomp: Generalizable accompaniment generation for vocals and solo instruments, powered by a quantized melodic bottleneck.
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
Official PyTorch Implementation of "Diffusion Transformers with Representation Autoencoders"
A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation