Autoregressive large language models (AR-LLMs) have achieved remarkable success, but their inherently sequential decoding process remains a fundamental bottleneck for efficient inference. Diffusion large language models (DLLMs), with bidirectional modeling and parallel token generation, offer a promising alternative that breaks this token-by-token limitation. Yet despite rapid progress, the practical inference efficiency of current DLLMs remains unclear. From a verification perspective, this survey establishes a systematic taxonomy of existing acceleration methods, benchmarks representative techniques under a unified experimental setting, and further evaluates strong strategy combinations to quantify the gap between mainstream DLLM inference methods and state-of-the-art AR baselines. In particular, our overall analysis shows that the parallel decoding efficiency of DLLMs still lags significantly behind that of AR-LLMs under inference acceleration. We provide an in-depth experimental analysis of the underlying trade-offs among generation quality, latency, and system compatibility, and release a standardized evaluation benchmark to the community. We also summarize the remaining bottlenecks and outline future directions for more practical and competitive DLLM inference. Code is available at \url{https://github.com/haoyun-jiang/DLLM-AccelEval}.