The University of North Carolina · National University · CFAR, Agency for Science, Technology and Research · University of California · Harvard University
Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation, yet how this structure can be leveraged to control generation diversity remains underexplored. We identify that Diffusion-LMs exhibit a temporal division of labor: early denoising steps determine global semantic structure, while later steps focus on local lexical refinement.
Building on this insight, we propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching by injecting perturbations early in the process and annealing them over time. TAPS significantly improves output diversity across creative writing and mathematical reasoning benchmarks without compromising generation quality.
Figure 1: Comparison between our method (TAPS) and base models. The visualization shows how TAPS encourages the model to explore distinct narrative paths (e.g., different stories about a knight) rather than converging on a single output.
The core intuition behind TAPS is to exploit the "temporal division of labor" observed in diffusion models: standard Diffusion-LMs condition every denoising step on a fixed prompt embedding, so repeated runs tend to converge on near-identical outputs. TAPS instead perturbs the conditioning signal with a magnitude that decays over the denoising process.
Formally, given a context embedding \( E \), we define the perturbed conditioning at inference step \( t \) as:
$$ \tilde{E}^{(t)} = E + \sigma(t) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) $$
where \( \sigma(t) \) follows a monotonically decreasing annealing schedule, ensuring that noise is strong in early stages for diversity and fades to zero in later stages for quality.
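As a concrete illustration, below is a minimal PyTorch sketch of this noise-injection step. The linear decay schedule and the helper names (`anneal_sigma`, `perturb_embedding`, `sigma_max`) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def anneal_sigma(step: int, T: int, sigma_max: float = 1.0) -> float:
    """Monotonically decreasing noise scale over T inference steps.
    A linear schedule is assumed here; any schedule that decays to zero works."""
    return sigma_max * (1.0 - step / T)

def perturb_embedding(E: torch.Tensor, step: int, T: int,
                      sigma_max: float = 1.0) -> torch.Tensor:
    """Add time-annealed Gaussian noise eps ~ N(0, I) to the context embedding E."""
    eps = torch.randn_like(E)
    return E + anneal_sigma(step, T, sigma_max) * eps
```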
To prevent excessive drift and maintain instruction adherence, we combine a norm-preserving projection with a mixing strategy: the perturbed embedding \( \tilde{E}^{(t)} \) is first rescaled so that its norm matches that of \( E \), yielding \( E' \), and we then interpolate between \( E' \) and the original embedding \( E \) using a coefficient \( \psi \):
$$ \hat{E} = \psi E' + (1-\psi)E $$
This allows us to balance semantic exploration with fidelity to the original prompt.
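One possible realization of this step is sketched below, assuming the norm-preserving projection rescales the perturbed embedding to the norm of \( E \); `psi = 0.7` is an illustrative value, not a reported setting.

```python
import torch

def mix_with_prompt(E: torch.Tensor, E_tilde: torch.Tensor,
                    psi: float = 0.7) -> torch.Tensor:
    """Norm-preserving projection followed by interpolation with the original embedding."""
    # Rescale the perturbed embedding so its norm matches the original (assumed reading of E').
    scale = E.norm(dim=-1, keepdim=True) / (E_tilde.norm(dim=-1, keepdim=True) + 1e-8)
    E_prime = E_tilde * scale
    # Interpolate: psi controls the exploration / prompt-fidelity trade-off.
    return psi * E_prime + (1.0 - psi) * E
```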
Algorithm 1: The complete TAPS sampling procedure, illustrating the noise injection, annealing schedule, and quality preservation mechanisms integrated into the diffusion loop.
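To make the integration concrete, the sketch below reuses `perturb_embedding` and `mix_with_prompt` from above inside a generic reverse-diffusion loop; `denoiser(x_t, t, cond=...)` is a stand-in for the model's reverse step, not a real API.

```python
def taps_sample(denoiser, E, x_T, T: int, sigma_max: float = 1.0, psi: float = 0.7):
    """Hypothetical TAPS loop: perturb the prompt embedding at every step with an
    annealed magnitude, then take one reverse-diffusion step on the perturbed condition."""
    x_t = x_T
    for step in range(T):                                    # step 0 = earliest stage
        t = T - step                                         # diffusion time index, T -> 1
        E_tilde = perturb_embedding(E, step, T, sigma_max)   # strong noise early, ~0 late
        E_hat = mix_with_prompt(E, E_tilde, psi)             # preserve prompt fidelity
        x_t = denoiser(x_t, t, cond=E_hat)                   # placeholder reverse step
    return x_t
```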
Figure 2: A conceptual comparison of the inference process. (A) Standard Diffusion-LMs use a static prompt embedding. (B) TAPS injects time-annealed noise into the prompt embedding, enabling diverse generation trajectories.
We evaluated TAPS on NoveltyBench, WritingPrompts, and GSM8K. Across these benchmarks, TAPS consistently outperforms standard decoding strategies (Top-k, Top-p, Min-p) in the diversity-quality trade-off.
Figure 3: Radar plots on NoveltyBench showing multi-aspect quality comparison. TAPS (blue line) demonstrates superior performance in Creativity and subjective rankings while maintaining factual knowledge.
Table 2: Diversity and quality comparison on WritingPrompts and Arena-Hard-Auto. TAPS achieves the highest scores in semantic diversity (Sent-BERT) and lexical diversity (EAD).
To understand how TAPS works, we projected the generated samples into a 2D space at different denoising stages.
Figure 5: Toy experiment on semantic branching. While the standard model collapses to a single mode, TAPS maintains a broad semantic coverage from the early stages through to the final output.
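For reference, a minimal sketch of one way to produce such a projection (PCA via scikit-learn) is shown below; the projection method actually used for Figure 5 may differ.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_samples_2d(sample_embeddings: np.ndarray) -> np.ndarray:
    """Reduce per-sample embeddings (n_samples, dim) collected at one denoising
    stage to 2D for visualization. PCA is one common choice; t-SNE/UMAP also work."""
    return PCA(n_components=2).fit_transform(sample_embeddings)
```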
If you find our work useful, please cite our paper:
@article{wu2026taps,
title={Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models},
author={Wu, Jingxuan and Wan, Zhenglin and Yu, Xingrui and Yang, Yuzhe and Huang, Yiqiao and Tsang, Ivor and You, Yang},
journal={arXiv preprint},
year={2026}
}