Blogs
2026
Bing He, Rui Sun, Zhan Shi, Hanqing Lu, From Llama-2 Replication to DeepSeek V3.2 & R1: Revisiting DeepSeek’s 13 Research Papers for the Complete Technical Evolution of DeepSeek Base and Reasoning Models, Blog link, Feb 2026
Blog Summary: Based on DeepSeek’s 13 papers, this blog post traces DeepSeek’s complete technical evolution from Jan 2024 to Jan 2026 across two major arcs: 1) the base model lineage (DeepSeek LLM → MoE → V2 → V3 → V3.2) and 2) the reasoning model lineage (Coder → Math → Coder-V2 → Prover → Prover-V1.5 → R1 → Prover-V2 → Math-V2). What emerges is a story of incremental compounding — where each paper’s contribution, however modest in isolation, becomes a critical building block for what follows.
2025
Bing He, Rui Sun, Zhan Shi, Building and Aligning LLM using User Data and RL, Blog link, Nov 2025
Blog Summary: This blog discusses the challenges and solutions in building and aligning large language models (LLMs) using user data and reinforcement learning (RL). It walks through an end-to-end alignment stack for an open-ended dialogue model: from low-level training stability in Mixture-of-Experts (MoE) architectures to FP8 serving, supervised fine-tuning (SFT) on user chats, and finally RL driven by engagement and retention signals.
Zhan Shi, Rui Sun, Bing He, Awesome Cursor Training Tutorial, Blog Link, Nov 2025
Blog Summary: A comprehensive guide, based on Cursor's Ray Summit 2025 talk, to how Cursor scales reinforcement learning (RL) for AI-powered code assistance. Cursor uses a sophisticated RL pipeline that trains AI models to become better coding assistants by learning from real-world interactions in a production-like environment. The system consists of three main components working in harmony: (1) Trainer: handles model training with custom optimizations; (2) Inference: manages model deployment and load balancing; (3) Environment: provides a production-grade agent server for realistic training.
Zhan Shi, Rui Sun, Bing He, The Hidden Language of AI: How Chat Templates Reveal the Evolution of LLMs, Blog Link, Oct 2025
Blog Summary: The blog explains that “chat templates”—the hidden formatting that turns role-tagged messages into the exact token strings models were trained on—have quietly driven LLMs’ evolution from plain text completion to multi-turn assistants to tool-enabled agents. It traces the shift from early prompt hacks to role labels and system prompts (which enforce persona and rules), then shows how different ecosystems (OpenAI, Llama, Mistral, Qwen) adopted incompatible templates, prompting Hugging Face’s Jinja-based standardization. It also offers practical guidance, showing that using the wrong template degrades quality, highlighting the token economics of template design, and forecasting how templates will expand to handle tool use, multi-agent threads, million-token contexts, and richer modalities.
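To make the chat-template idea concrete, here is a minimal sketch of what such a template does, using ChatML-style role markers of the kind adopted by the Qwen ecosystem. This is an illustrative toy, not any model's actual template; real templates are defined per model, typically as Jinja in the tokenizer configuration:

```python
# Minimal sketch of a chat template: turn role-tagged messages into
# the single formatted string a model was trained to continue.
def apply_chat_template(messages, add_generation_prompt=True):
    parts = []
    for msg in messages:
        # Each turn is wrapped in explicit role markers (ChatML-style).
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Cue the model that the assistant speaks next.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a chat template?"},
]
print(apply_chat_template(messages))
```

A model fine-tuned on a different template (e.g. Llama's `[INST]` tags) would treat these markers as ordinary text, which is exactly why feeding a model the wrong template degrades quality.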
