Starred repositories
✍ WeChat Markdown Editor | A highly minimalist WeChat Markdown editor: supports Markdown syntax, custom theme styles, content management, multiple image hosts, an AI assistant, and more
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
A set of tools that gives agents powerful capabilities.
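🥢 Cook like Laoxiangji 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the "Laoxiangji Dish Traceability Report" and has been summarized, edited, and organized. CookLikeHOC.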
Parallelised rsync - using GNU parallel
A general purpose task-agnostic speech augmentation policy
Public documentation of Documentation.AI — an AI‑native documentation platform to ship beautiful, AI‑ready product and API docs.
Voice activity detection (VAD) papers and code (from 198*~) and their classification.
A toolkit for speaker diarization.
Causality Check in Frame-online Speech Separation
A tutorial for Sound Source Localization researchers and practitioners. The purpose of this repo is to organize the world’s resources for Sound Source Localization, and make them universally access…
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
A toolkit dedicated to speech evaluation.
Python tool for converting files and office documents to Markdown.
Extend the capabilities of your CrewAI agents with Tools
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Repo for source code of EBEN: Extreme Bandwidth Extension Network
Keeps track of good articles on speech processing, mainly on speech enhancement, including speech denoising, speech dereverberation, AEC, AGC, etc.
Phase Aware Deep Speech Enhancement - PyTorch
A library for audio and music analysis, feature extraction.
A deep neural network architecture for low-latency audio processing
A PyTorch implementation of "TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation" (see recipes in aps framework https://github.com/funcwj/aps)
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)


