I got my master’s degree from Harbin Institute of Technology (HIT), supervised by Dr. Guangming Lu and Dr. Wenjie Pei, and my bachelor’s degree from South China Normal University (SCNU).
I am now a member of the Next-gen Kaldi team at Xiaomi (via the Xiaomi Future Star program), under the supervision of Dr. Daniel Povey, an IEEE Fellow renowned in speech processing. I am a core contributor to the open-source speech recognition project icefall, with main contributions in advanced model architectures and training recipes.
My research mainly focuses on speech and audio modeling (recognition, generation, and separation), with publications in reputable conferences and journals such as ICLR, TASLP, ICASSP, INTERSPEECH, ASRU, and NeurIPS. I also serve as a reviewer for ICLR 2025, ICLR 2026, ICML 2026, and ACL 2026.
🔥 News
- 2026.01: 🎉🎉 One paper (Flow2GAN) is accepted by ICLR 2026.
- 2025.09: 🎉🎉 One paper (TransMLA) is accepted by NeurIPS 2025 (Spotlight, Top 3.19%).
- 2025.08: 🎉🎉 One paper (ZipVoice) is accepted by ASRU 2025.
- 2025.01: 🎉🎉 One paper (CR-CTC) is accepted by ICLR 2025.
- 2024.01: 🎉🎉 One paper (Zipformer) is accepted by ICLR 2024 (Oral, Top 1.2%).
- 2023.12: 🎉🎉 Two papers (PromptASR, Libriheavy) are accepted by ICASSP 2024.
- 2023.05: 🎉🎉 Two papers (Delay-penalized CTC, Blank-regularized CTC) are accepted by INTERSPEECH 2023.
- 2023.02: 🎉🎉 Three papers (Delay-penalized transducer, MVQ, Fast decoding) are accepted by ICASSP 2023.
- 2022.06: 🎉🎉 One paper (Pruned RNN-T) is accepted by INTERSPEECH 2022.
- 2022.02: 😄😄 I joined the Xiaomi Next-gen Kaldi team, under the supervision of Dr. Daniel Povey.
- 2022.01: 🎉🎉 One paper (Stepwise-Refining Speech Separation) is accepted by TASLP 2022.
📝 Publications
- [ICLR 2026] Flow2GAN: Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-step High-Fidelity Audio Generation, Z Yao, W Kang, H Zhu, L Guo, L Ye, F Kuang, W Zhuang, Z Li, Z Han, L Lin, D Povey
- [ICLR 2025] CR-CTC: Consistency regularization on CTC for improved speech recognition, Z Yao, W Kang, X Yang, F Kuang, L Guo, H Zhu, Z Jin, Z Li, L Lin, D Povey
- [ICLR 2024 Oral, Top 1.2%] Zipformer: A faster and better encoder for automatic speech recognition, Z Yao, L Guo, X Yang, W Kang, F Kuang, Y Yang, Z Jin, L Lin, D Povey
- [INTERSPEECH 2023] Delay-penalized CTC implemented based on Finite State Transducer, Z Yao*, W Kang*, F Kuang, L Guo, X Yang, Y Yang, L Lin, D Povey
- [ICASSP 2023] Delay-penalized transducer for low-latency streaming ASR, W Kang*, Z Yao*, F Kuang, L Guo, X Yang, L Lin, P Żelasko, D Povey
- [TASLP 2022] Stepwise-refining speech separation network via fine-grained encoding in high-order latent domain, Z Yao, W Pei, F Chen, G Lu, D Zhang
- [NeurIPS 2025 Spotlight, Top 3.19%] TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup, F Meng, P Tang, Z Yao, X Sun, M Zhang
- [ASRU 2025] ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching, H Zhu, W Kang, Z Yao, L Guo, F Kuang, Z Li, W Zhuang, L Lin, D Povey
- [ICASSP 2024] PromptASR for contextualized ASR with controllable style, X Yang, W Kang, Z Yao, Y Yang, L Guo, F Kuang, L Lin, D Povey
- [ICASSP 2024] Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context, W Kang, X Yang, Z Yao, F Kuang, Y Yang, L Guo, L Lin, D Povey
- [INTERSPEECH 2023] Blank-regularized CTC for Frame Skipping in Neural Transducer, Y Yang, X Yang, L Guo, Z Yao, W Kang, F Kuang, L Lin, X Chen, D Povey
- [ICASSP 2023] Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation, L Guo, X Yang, Q Wang, Y Kong, Z Yao, F Cui, F Kuang, W Kang, L Lin, M Luo, P Żelasko, D Povey
- [ICASSP 2023] Fast and parallel decoding for transducer, W Kang, L Guo, F Kuang, L Lin, M Luo, Z Yao, X Yang, P Żelasko, D Povey
- [INTERSPEECH 2022] Pruned RNN-T for fast, memory-efficient ASR training, F Kuang, L Guo, W Kang, L Lin, M Luo, Z Yao, D Povey
📖 Education
- 2019.09 - 2022.01, Harbin Institute of Technology, Master of Engineering in Computer Technology
- 2015.09 - 2019.06, South China Normal University, Bachelor of Engineering in Software Engineering
💬 Invited Talks
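- 2025, RTE Forum, Hardware and On-Device Models session: Flow2GAN, efficient and high-quality audio generation
- 2023, CCF Technical Committee on Speech Dialogue and Auditory Processing, Speech Algorithm Technology Exchange Salon: Emission delay regularization for streaming speech recognition
- 2022, CCF Technical Committee on Speech Dialogue and Auditory Processing, AI Industry Salon: The Reworked Conformer model and a distillation scheme based on multi-codebook quantization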
💻 Internships
- 2021.06 - 2021.08, Tencent, Shenzhen, China.