The University of North Carolina · National University · CFAR, Agency for Science, Technology and Research · University of California · Harvard University
Diffusion language models (Diffusion-LMs) introduce an explicit temporal dimension into text generation, yet how this structure can be leveraged to control generation diversity remains underexplored. We identify that Diffusion-LMs exhibit a temporal division of labor: early denoising steps determine global semantic structure, while later steps focus on local lexical refinement.
Building on this insight, we propose Time-Annealed Perturbation Sampling (TAPS), a training-free inference strategy that encourages semantic branching by injecting perturbations early in the process and annealing them over time. TAPS significantly improves output diversity across creative writing and mathematical reasoning benchmarks without compromising generation quality.
Figure 1: Comparison between our method (TAPS) and base models. The visualization shows how TAPS encourages the model to explore distinct narrative paths (e.g., different stories about a knight) rather than converging on a single output.
The core intuition behind TAPS is to exploit the "temporal division of labor" observed in diffusion models: standard Diffusion-LMs condition every denoising step on a fixed prompt embedding, so repeated runs tend to converge on near-identical outputs. TAPS instead perturbs the conditioning signal with a magnitude that decays over the denoising process.
Formally, given a context embedding \( E \), we define the perturbed conditioning at inference step \( t \) as:
$$ \tilde{E}^{(t)} = E + \sigma(t) \cdot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I) $$
where \( \sigma(t) \) follows a monotonically decreasing annealing schedule, ensuring that noise is strong in early stages for diversity and fades to zero in later stages for quality.
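As a concrete illustration, below is a minimal PyTorch sketch of this noise-injection step. The linear decay schedule and the helper names (`anneal_sigma`, `perturb_embedding`, `sigma_max`) are illustrative assumptions, not the paper's exact implementation.

```python
import torch

def anneal_sigma(step: int, T: int, sigma_max: float = 1.0) -> float:
    """Monotonically decreasing noise scale over T inference steps.
    A linear schedule is assumed here; any schedule that decays to zero works."""
    return sigma_max * (1.0 - step / T)

def perturb_embedding(E: torch.Tensor, step: int, T: int,
                      sigma_max: float = 1.0) -> torch.Tensor:
    """Add time-annealed Gaussian noise eps ~ N(0, I) to the context embedding E."""
    eps = torch.randn_like(E)
    return E + anneal_sigma(step, T, sigma_max) * eps
```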
To prevent excessive drift and maintain instruction adherence, we combine a norm-preserving projection with a mixing strategy: the perturbed embedding \( \tilde{E}^{(t)} \) is first rescaled so that its norm matches that of \( E \), yielding \( E' \), and we then interpolate between \( E' \) and the original embedding \( E \) using a coefficient \( \psi \):
$$ \hat{E} = \psi E' + (1-\psi)E $$
This allows us to balance semantic exploration with fidelity to the original prompt.
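One possible realization of this step is sketched below, assuming the norm-preserving projection rescales the perturbed embedding to the norm of \( E \); `psi = 0.7` is an illustrative value, not a reported setting.

```python
import torch

def mix_with_prompt(E: torch.Tensor, E_tilde: torch.Tensor,
                    psi: float = 0.7) -> torch.Tensor:
    """Norm-preserving projection followed by interpolation with the original embedding."""
    # Rescale the perturbed embedding so its norm matches the original (assumed reading of E').
    scale = E.norm(dim=-1, keepdim=True) / (E_tilde.norm(dim=-1, keepdim=True) + 1e-8)
    E_prime = E_tilde * scale
    # Interpolate: psi controls the exploration / prompt-fidelity trade-off.
    return psi * E_prime + (1.0 - psi) * E
```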
Algorithm 1: The complete TAPS sampling procedure, illustrating the noise injection, annealing schedule, and quality preservation mechanisms integrated into the diffusion loop.
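To make the integration concrete, the sketch below reuses `perturb_embedding` and `mix_with_prompt` from above inside a generic reverse-diffusion loop; `denoiser(x_t, t, cond=...)` is a stand-in for the model's reverse step, not a real API.

```python
def taps_sample(denoiser, E, x_T, T: int, sigma_max: float = 1.0, psi: float = 0.7):
    """Hypothetical TAPS loop: perturb the prompt embedding at every step with an
    annealed magnitude, then take one reverse-diffusion step on the perturbed condition."""
    x_t = x_T
    for step in range(T):                                    # step 0 = earliest stage
        t = T - step                                         # diffusion time index, T -> 1
        E_tilde = perturb_embedding(E, step, T, sigma_max)   # strong noise early, ~0 late
        E_hat = mix_with_prompt(E, E_tilde, psi)             # preserve prompt fidelity
        x_t = denoiser(x_t, t, cond=E_hat)                   # placeholder reverse step
    return x_t
```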
Figure 2: A conceptual comparison of the inference process. (A) Standard Diffusion-LMs use a static prompt embedding. (B) TAPS injects time-annealed noise into the prompt embedding, enabling diverse generation trajectories.
We evaluated TAPS on NoveltyBench, WritingPrompts, and GSM8K. Across these benchmarks, TAPS consistently outperforms standard decoding strategies (Top-k, Top-p, Min-p) in the diversity-quality trade-off.
Figure 3: Radar plots on NoveltyBench showing multi-aspect quality comparison. TAPS (blue line) demonstrates superior performance in Creativity and subjective rankings while maintaining factual knowledge.
Table 2: Diversity and quality comparison on WritingPrompts and Arena-Hard-Auto. TAPS achieves the highest scores in semantic diversity (Sent-BERT) and lexical diversity (EAD).
To understand how TAPS works, we projected the generated samples into a 2D space at different denoising stages.
Figure 5: Toy experiment on semantic branching. While the standard model collapses to a single mode, TAPS maintains a broad semantic coverage from the early stages through to the final output.
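For reference, a minimal sketch of one way to produce such a projection (PCA via scikit-learn) is shown below; the projection method actually used for Figure 5 may differ.

```python
import numpy as np
from sklearn.decomposition import PCA

def project_samples_2d(sample_embeddings: np.ndarray) -> np.ndarray:
    """Reduce per-sample embeddings (n_samples, dim) collected at one denoising
    stage to 2D for visualization. PCA is one common choice; t-SNE/UMAP also work."""
    return PCA(n_components=2).fit_transform(sample_embeddings)
```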
If you find our work useful, please cite our paper:
@article{wu2026taps,
title={Time-Annealed Perturbation Sampling: Diverse Generation for Diffusion Language Models},
author={Wu, Jingxuan and Wan, Zhenglin and Yu, Xingrui and Yang, Yuzhe and Huang, Yiqiao and Tsang, Ivor and You, Yang},
journal={arXiv preprint},
year={2026}
}