A tutorial for Sound Source Localization researchers and practitioners. The purpose of this repo is to organize the world’s resources for Sound Source Localization, and make them universally access…

50 12 Updated Mar 17, 2023

snakers4 / silero-vad

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 7,976 720 Updated Dec 30, 2025

ftshijt / speech_evaluation

A toolkit dedicated to speech evaluation.

Python 24 4 Updated Sep 26, 2024

idootop / mi-gpt

🏠 Connect your XiaoAi speaker to ChatGPT and Doubao, and turn it into your own personal voice assistant.

TypeScript 12,063 1,613 Updated Sep 10, 2025

microsoft / markitdown

Python tool for converting files and office documents to Markdown.

Python 85,554 4,947 Updated Jan 8, 2026

crewAIInc / crewAI-tools

Extend the capabilities of your CrewAI agents with Tools

Python 1,349 493 Updated Oct 23, 2025

78 / xiaozhi-esp32

An MCP-based chatbot

C++ 23,431 4,935 Updated Jan 20, 2026

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

Python 3,850 314 Updated Aug 14, 2025

ddlBoJack / Speech-Resources

Labs, companies, resources, internships, and more in the speech field; recommendations and self-nominations are welcome.

594 69 Updated Nov 13, 2024

jhauret / eben

Repo for source code of EBEN: Extreme Bandwidth Extension Network

Python 76 11 Updated May 21, 2025

YoungJay0612 / Single-Channel-Speech-Enhancement

Keeps track of good articles on speech processing, mainly on speech enhancement, including speech denoising, speech dereverberation, AEC, AGC, etc.

47 6 Updated Jul 17, 2024

bytedance / piano_transcription

Python 1,933 232 Updated Aug 18, 2023

CarmiShimon / Phase-Aware-Deep-Speech-Enhancement

Phase Aware Deep Speech Enhancement - Pytorch

Python 7 Updated Jul 6, 2022

libAudioFlux / audioFlux

A library for audio and music analysis, feature extraction.

C 3,242 150 Updated May 24, 2024

gmalivenko / pytorch2keras

PyTorch to Keras model converter

Python 863 143 Updated Dec 8, 2022

vb000 / Waveformer

A deep neural network architecture for low-latency audio processing

Python 323 34 Updated Aug 15, 2023

funcwj / conv-tasnet

A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" (see recipes in aps framework https://github.com/funcwj/aps)

Python 218 61 Updated Jul 6, 2023

YunyangZeng / TAPLoss

Python 66 12 Updated Jun 27, 2023

XiaoMi / mace

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

C++ 5,030 825 Updated Jun 17, 2024

NVIDIA-NeMo / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

凌逆战 LXP-Never

Starred repositories

acoustic-model

speech-synthesis

voice-activity-detection

noise-reduction

lpc

beamforming

personalized-speech-enhancement

speech-enhancement