- National Taiwan University
- Taiwan
- https://daniellin94144.github.io/
Stars
A real-time and multilingual speech translation model
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
PaperBanana: Automating Academic Illustration For AI Scientists
TTS model capable of streaming conversational audio in realtime.
Qwen3-TTS is an open-source series of TTS models developed by the Qwen team at Alibaba Cloud, supporting stable, expressive, and streaming speech generation, free-form voice design, and vivid voice…
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS …
X-Talk is an open-source full-duplex cascaded spoken dialogue system framework enabling low-latency, interruptible, and human-like speech interaction with a lightweight, pure-Python, production-rea…
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Post-training with Tinker
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
davidbrowne17 / csm-streaming
Forked from SesameAILabs/csm. Realtime demo, streaming, and finetuning code for CSM
Fast and memory-efficient exact attention
Qwen2.5-Omni fine-tuned on MNV-17 dataset for nonverbal vocalization recognition
🟣 LLM interview questions and answers to help you prepare for your next machine learning and data science interview in 2026.
Official code of ICML 2025 paper "NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction"
This repo is meant to serve as a guide for Machine Learning/AI technical interviews.
Mini-Omni-Reasoner: a real-time speech reasoning framework that interleaves silent reasoning tokens with spoken response tokens (“thinking-in-speaking”), exploiting the LLM–audio throughput gap to …
End-to-end realtime stack for connecting humans and AI
Foundational model for human-like, expressive TTS
EMO-Reasoning: Benchmarking Emotional Reasoning Capabilities in Spoken Dialogue Systems
