Google Scholar GitHub Email: kifish.pro@gmail.com
Experience
- 2023.04-present LLM Researcher at ByteDance Seed LLM
- 2021.07-2023.03 NLP Algorithm Engineer at Kuaishou MMU
Publications
2025
Xingwei Qu, Shaowen Wang, Zihao Huang, Kai Hua, Fan Yin, Rui-Jie Zhu, Jundong Zhou, Qiyang Min, Zihao Wang, Yizhi Li, Tianyu Zhang, He Xing, Zheng Zhang, Yuxuan Song, Tianyu Zheng, Zhiyuan Zeng, Chenghua Lin, Ge Zhang, Wenhao Huang. Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space. arXiv:2512.24617, 2025.10
- Great Team Collaboration
- We propose Dynamic Large Concept Models (DLCM), a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient (a minimal illustrative sketch of the compression idea follows below).
- I designed and constructed the training data entirely from open-source data.
- arXiv Hugging Face
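A minimal, hedged sketch of the token-to-concept compression idea: a boundary predictor reads token hidden states, tokens between predicted boundaries are mean-pooled into "concept" vectors, and a smaller encoder reasons over the compressed sequence. The module names, sizes, hard 0.5 boundary threshold, and pooling rule below are illustrative assumptions, not the DLCM architecture itself.

```python
# Illustrative sketch only: boundary head, pooling rule, and concept encoder
# are placeholder choices, not the paper's actual components.
import torch
import torch.nn as nn

class ConceptCompressor(nn.Module):
    def __init__(self, d_model: int = 512):
        super().__init__()
        # Scores whether a semantic boundary falls after each token.
        self.boundary_head = nn.Linear(d_model, 1)
        # Small "concept-level" model that reasons over the compressed sequence.
        self.concept_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=2,
        )

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, d_model) hidden states from a token-level encoder.
        boundary_prob = torch.sigmoid(self.boundary_head(token_states)).squeeze(-1)
        concepts = []
        for b in range(token_states.size(0)):
            # Hard boundaries for illustration; the real model learns them end to end.
            cuts = (boundary_prob[b] > 0.5).nonzero(as_tuple=True)[0].tolist()
            starts = [0] + [c + 1 for c in cuts]
            ends = cuts + [token_states.size(1) - 1]
            segs = [token_states[b, s:e + 1].mean(dim=0)
                    for s, e in zip(starts, ends) if s <= e]
            concepts.append(torch.stack(segs))
        # Pad concept sequences to a common length (padding left unmasked here
        # for brevity) and run the concept-level encoder over them.
        padded = nn.utils.rnn.pad_sequence(concepts, batch_first=True)
        return self.concept_encoder(padded)  # (batch, num_concepts, d_model)

x = torch.randn(2, 32, 512)        # toy token hidden states
print(ConceptCompressor()(x).shape)
```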
Rui-Jie Zhu, Zixuan Wang, Kai Hua, Tianyu Zhang, Ziniu Li, Haoran Que, Boyi Wei, Zixin Wen, Fan Yin, He Xing, Lu Li, Jiajun Shi, Kaijing Ma, Shanda Li, Taylor Kergan, Andrew Smith, Xingwei Qu, Mude Hui, Bohong Wu, Qiyang Min, Hongzhi Huang, Xun Zhou, Wei Ye, Jiaheng Liu, Jian Yang, Yunfeng Shi, Chenghua Lin, Enduo Zhao, Tianle Cai, Ge Zhang, Wenhao Huang, Yoshua Bengio, Jason Eshraghian. Scaling Latent Reasoning via Looped Language Models. arXiv:2510.25741, 2025.10
- Great Team Collaboration
- We scale up Looped Language Models (LoopLM) to 2.6 billion parameters and complete pretraining on 7.7 trillion open-source tokens following a multi-stage data recipe encompassing Pretraining, Continual Training (CT), Long-CT, and Mid-Training. The resulting model is on par with SOTA language models 2–3× its size. We open-source all model weights and the data recipe.
- I designed and curated all pretraining data mixtures from open-source data and provided key insights throughout the pretraining process.
- Project Page arXiv Twitter Hugging Face 机器之心
Kai Hua, Steven Wu, Ge Zhang. AttentionInfluence: Adopting Attention Head Influence for Weak-to-Strong Pretraining Data Selection. arXiv:2505.07293, 2025.05
- LLM Pretraining Data Selection (Idea Originator & Project Leader)
- We propose AttentionInfluence, a training-free and supervision-free method for reasoning-centric data selection. By masking attention heads in a small pretrained model and measuring the resulting loss differences, we identify reasoning-intensive data that significantly improves the performance of larger models. Applied to a 7B model, our approach yields consistent gains on benchmarks such as MMLU, GSM8K, and HumanEval, demonstrating an effective weak-to-strong scaling path for reasoning-focused pretraining (a minimal illustrative sketch of the scoring idea follows below).
- arXiv Twitter 量子位 Community Reproduction
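A minimal, hedged sketch of the loss-difference scoring idea: each candidate document is scored by how much a small reference model's language-modeling loss increases when selected attention heads are masked. GPT-2 stands in for the small pretrained model here, and the head list and selection threshold are made-up placeholders; the paper's actual model, head-identification procedure, and hyperparameters differ.

```python
# Illustrative sketch only: GPT-2, HEADS_TO_MASK, and the 0.05 threshold are
# assumptions for demonstration, not the paper's actual setup.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

# Hypothetical (layer, head) pairs to disable, e.g. heads previously found to
# do retrieval/induction-style work.
HEADS_TO_MASK = [(5, 1), (7, 2), (9, 10)]

def lm_loss(text, head_mask=None):
    """Average next-token loss of the reference model on `text`."""
    enc = tokenizer(text, return_tensors="pt", truncation=True,
                    max_length=1024).to(device)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"], head_mask=head_mask)
    return out.loss.item()

def influence_score(text):
    """Loss increase when the chosen heads are masked: larger values suggest
    the document leans more on those heads, used as a proxy for
    reasoning-intensive content."""
    mask = torch.ones(model.config.n_layer, model.config.n_head, device=device)
    for layer, head in HEADS_TO_MASK:
        mask[layer, head] = 0.0
    return lm_loss(text, head_mask=mask) - lm_loss(text)

# Keep documents whose score exceeds an (assumed) threshold.
docs = [
    "Let x be the number of apples. If x + 3 = 7, then x = 4.",
    "The weather was pleasant and everyone enjoyed the picnic.",
]
selected = [d for d in docs if influence_score(d) > 0.05]
print(selected)
```

The snippet only shows the per-document score; in the paper this signal is applied at scale to re-select pretraining data for the larger model.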
NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents. arXiv:2512.12730, 2025.12
- Great Team Collaboration
- Contributed to discussion and cross-team cooperation
- Labeled example cases for the benchmark
- arXiv Hugging Face Twitter
Seed VLM&LLM Team. Seed-1.8, Technical Report, 2025.12
- VLM&LLM (Team Collaboration)
- Provided long-context (128K/512K) CT data and long-context evaluation
- GitHub
Seed Model&LLM&VLM Team. Seed-VWN, Technical Report, 2025.11
- Model&LLM&VLM (Team Collaboration)
- Provided long-context (128K/512K) CT data and long-context evaluation
- arXiv
Seed LLM Team. Seed OSS 36B, Open Source Model, 2025.08
- LLM Code/Pretrain (Team Collaboration)
- Led the text mid-training and long-context (128K/512K) CT
- Hugging Face 量子位
Seed LLM&VLM Team. Seed-1.6, Technical Blog, 2025.06
- LLM&VLM Pretrain (Team Collaboration)
- Led the multimodal long-context (128K/512K) CT
- Technical Blog 机器之心
Seed VLM&LLM Team. Seed1.5-VL Technical Report. arXiv:2505.07062, 2025.05
Seed LLM Team. Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning. arXiv:2504.13914. 2025.04
2024
Chongyang Tao, Tao Shen, Shen Gao, Junshuo Zhang, Zhen Li, Kai Hua, Zhengwei Tao, Shuai Ma. LLMs are Also Effective Embedding Models: An In-depth Overview. arXiv:2412.12591, 2024.12
- arXiv
2020
Kai Hua, Zhiyuan Feng, Chongyang Tao, Rui Yan, Lu Zhang. Learning to Detect Relevant Contexts and Knowledge for Response Selection in Retrieval-based Dialogue Systems. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM 2020), 2020.10
- arXiv