Submitted: 15 August 2025
Posted: 19 August 2025
Abstract
Keywords:
1. Introduction
2. Mathematical Foundations of Quantization for Generative AI Models
2.1. Quantization as a Mapping Function
- $x$ is a real-valued input (e.g., a model weight or activation),
- $z$ is the quantization offset (often the minimum of the range),
- $s$ is the quantization step size or scale factor,
- $q_{\min}$ and $q_{\max}$ define the lower and upper bounds of the quantized integer range (e.g., $[0, 255]$ for 8-bit unsigned integers),
- $\lfloor \cdot \rceil$ denotes the rounding operator, often chosen as round-to-nearest or stochastic rounding [10].
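The mapping described by the list above can be sketched in a few lines of Python. The defining equation did not survive in this text, so the form clip(round((x - offset) / scale), q_min, q_max) is an assumption that is consistent with the listed quantities. The names `quantize`, `dequantize`, `scale`, and `offset` are illustrative and are not taken from the paper.

```python
import math
import random


def quantize(x, scale, offset, q_min=0, q_max=255, rng=None):
    """Map a real value to an integer code in [q_min, q_max].

    Assumed form: clip(round((x - offset) / scale), q_min, q_max).
    With rng=None the rounding is round-to-nearest; otherwise it is stochastic.
    """
    t = (x - offset) / scale
    if rng is None:
        q = round(t)  # Python rounds exact halves to the nearest even integer
    else:
        low = math.floor(t)
        # Round up with probability equal to the fractional part of t.
        q = low + (1 if rng.random() < t - low else 0)
    return max(q_min, min(q_max, q))


def dequantize(q, scale, offset):
    """Approximate reconstruction of the real value from its integer code."""
    return q * scale + offset


values = [-1.0, -0.25, 0.0, 0.5, 1.0]
offset = min(values)                       # offset taken as the range minimum
scale = (max(values) - min(values)) / 255  # step size for 8-bit unsigned codes

codes = [quantize(v, scale, offset) for v in values]
recon = [dequantize(q, scale, offset) for q in codes]
max_err = max(abs(v - r) for v, r in zip(values, recon))

print(codes)
print(max_err <= scale / 2 + 1e-12)  # in-range values err by at most half a step
```

Values outside the calibrated range are clipped to `q_min` or `q_max`, so their reconstruction error is not bounded by half a step. The stochastic branch can be exercised by passing a seeded generator, e.g. `quantize(0.3, scale, offset, rng=random.Random(0))`.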
2.2. Error Characterization and Impact on Generative Models
2.3. Quantization Granularity and Precision Schemes
- Weights: usually quantized offline [15].
- Activations: often requiring dynamic or range-aware calibration [16].
- Gradients: in quantization-aware training (QAT), gradient quantization may be employed to enable low-precision backpropagation.
- 8-bit (INT8): The most common target for efficient inference, offering a good trade-off between compression and performance.
- 4-bit (INT4): Provides further compression but with higher sensitivity; requires advanced calibration or retraining.
- Mixed-precision: Different layers or components are quantized at different bitwidths, chosen via heuristic, data-driven, or learned policies [17].
- Adaptive precision: Precision is adjusted dynamically based on model confidence, entropy of output distribution, or computational budget [18].
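To make the bit-width trade-off in the list above concrete, the following sketch compares the reconstruction error of the same synthetic weights at 8 and 4 bits. It assumes symmetric uniform quantization with a max-abs scale, which is only one of several possible schemes. The helper `pick_bits` is a hypothetical error-budget heuristic that stands in for a mixed-precision policy; it is not a method from the paper.

```python
import random


def fake_quantize(values, bits):
    """Symmetric uniform quantize-then-dequantize at the given bit-width."""
    q_max = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / q_max if max_abs > 0 else 1.0
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pick_bits(values, budget, candidates=(4, 8)):
    """Hypothetical policy: smallest bit-width whose MSE fits the budget."""
    for bits in sorted(candidates):
        if mse(values, fake_quantize(values, bits)) <= budget:
            return bits
    return max(candidates)


rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(4096)]  # synthetic stand-in for a layer

errors = {bits: mse(weights, fake_quantize(weights, bits)) for bits in (8, 4)}
for bits, err in errors.items():
    print(f"INT{bits}: mse={err:.2e}")

print(pick_bits(weights, budget=1e-3))  # a tight budget forces the wider format
print(pick_bits(weights, budget=1.0))   # a loose budget admits the narrower one
```

Applying `pick_bits` layer by layer yields a per-layer bit-width assignment, which is the basic shape of a heuristic mixed-precision scheme.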
2.4. Statistical Calibration and Range Estimation
- Clipping-based methods: Define a clipping threshold $T$ such that the quantization range is limited to $[-T, T]$, or to $[\mu - k\sigma, \mu + k\sigma]$ for Gaussian-distributed activations [19].
- KL-divergence minimization: Choose quantization boundaries to minimize the Kullback-Leibler divergence between the original and quantized distributions.
- Percentile-based heuristics: Use percentiles (e.g., 99.9%) of the activation histogram to exclude outliers [20].
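The effect of the range estimators listed above can be illustrated on synthetic activations that contain a few large outliers. The sketch below is a minimal illustration under stated assumptions: the data are Gaussian samples plus injected outliers, the k-sigma range is an assumed reading of the Gaussian case, and the function names are hypothetical. KL-divergence calibration is omitted for brevity.

```python
import math
import random
import statistics


def minmax_threshold(samples):
    """Largest absolute value: no clipping, but sensitive to outliers."""
    return max(abs(s) for s in samples)


def percentile_threshold(samples, pct=99.9):
    """Nearest-rank percentile of |x|; values above it would be clipped."""
    mags = sorted(abs(s) for s in samples)
    rank = max(1, math.ceil(pct / 100.0 * len(mags)))
    return mags[rank - 1]


def sigma_range(samples, k=3.0):
    """Range [mu - k*sigma, mu + k*sigma] from the sample mean and std."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return mu - k * sigma, mu + k * sigma


rng = random.Random(0)
# 10,000 Gaussian activations plus five artificial outliers at 50.0
activations = [rng.gauss(0.0, 1.0) for _ in range(10_000)] + [50.0] * 5

t_minmax = minmax_threshold(activations)
t_pct = percentile_threshold(activations, 99.9)
lo, hi = sigma_range(activations, k=3.0)

# Step size of a symmetric 8-bit quantizer implied by each threshold
step_minmax = t_minmax / 127
step_pct = t_pct / 127

print(f"min-max threshold:    {t_minmax:.2f} (step {step_minmax:.4f})")
print(f"99.9th percentile:    {t_pct:.2f} (step {step_pct:.4f})")
print(f"3-sigma range:        [{lo:.2f}, {hi:.2f}]")
```

The min-max threshold is set entirely by the outliers, so most integer codes are spent on a range that almost no activation occupies. The percentile threshold ignores them and gives a much finer step for the bulk of the distribution, at the cost of clipping the outliers.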
2.5. Layer-Wise Sensitivity and Hessian-Aware Quantization
2.6. Quantization and Attention Mechanisms
2.7. Quantization Noise Propagation
2.8. Summary
3. Taxonomy of Quantization Techniques for Generative AI Models
4. Empirical Evaluation and Benchmarking of Quantization Methods
| Method | Type | Bit-width | Granularity | Calibration / Training | Notable Features |
|---|---|---|---|---|---|
| Naive PTQ | Post-training | 8-bit | Per-tensor | Min-max or percentile stats | Simple, fast; poor performance on large GenAI models |
| GPTQ | Post-training | 4–8 bit | Block-wise | Hessian-based, no retraining | Uses second-order info to minimize quantization error |
| AWQ | Post-training | 4-bit | Per-channel | Weight-scaling, outlier handling | Improved outlier robustness and downstream generation |
| QAT | Training-time | Any (e.g., 8/4-bit) | Per-channel or mixed | Full training loop with fake quant ops | High fidelity, expensive training overhead |
| LQ-Nets | Training-time | Variable | Learned group-wise | Optimized quantizer via gradient descent | Non-uniform quantization learned jointly with model |
| AdaQuant | Post-training | Adaptive (4–8 bit) | Mixed-precision | Loss-based selection | Selective quantization of layers to maintain accuracy |
| OCS (Outlier Channel Splitting) | Post-training | 4–8 bit | Channel-level | Static clipping or learned thresholds | Splits large-magnitude channels to reduce outlier error |
| ZeroQuant | Post-training | 4-bit | Layer-wise | Activation range via representative data | Transformer-specific, zero-point adjusted scaling |
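The Granularity column in the table distinguishes, among other options, per-tensor from per-channel scaling. The sketch below is an illustration under stated assumptions rather than a reproduction of any listed method: it uses symmetric 8-bit quantization with max-abs scales on a synthetic two-channel weight matrix whose rows differ in magnitude by three orders. All names are hypothetical.

```python
import random


def fake_quantize(values, scale, q_max=127):
    """Symmetric quantize-then-dequantize of one row with a given scale."""
    return [max(-q_max, min(q_max, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
# Two output channels with very different magnitudes (synthetic data)
weight = [
    [rng.uniform(-0.01, 0.01) for _ in range(256)],
    [rng.uniform(-10.0, 10.0) for _ in range(256)],
]

q_max = 127

# Per-tensor: a single scale shared by every channel
tensor_scale = max(abs(w) for row in weight for w in row) / q_max
per_tensor = [fake_quantize(row, tensor_scale) for row in weight]

# Per-channel: one scale per output channel (row)
channel_scales = [max(abs(w) for w in row) / q_max for row in weight]
per_channel = [fake_quantize(row, s) for row, s in zip(weight, channel_scales)]

err_tensor = [mse(r, q) for r, q in zip(weight, per_tensor)]
err_channel = [mse(r, q) for r, q in zip(weight, per_channel)]

for i in range(len(weight)):
    print(f"channel {i}: per-tensor mse={err_tensor[i]:.2e}, "
          f"per-channel mse={err_channel[i]:.2e}")
```

With a shared scale, the small-magnitude channel collapses entirely to zero because every weight is smaller than half a quantization step. Per-channel scales preserve it, while the large-magnitude channel is unaffected because it determines the shared scale in both cases.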
5. Implementation and Deployment Considerations for Quantized Generative Models
6. Challenges, Limitations, and Open Problems in Quantizing Generative Models
7. Future Directions and Research Opportunities
8. Conclusions
References
- Cao, H.; Tan, C.; Gao, Z.; Xu, Y.; Chen, G.; Heng, P.A.; Li, S.Z. A Survey on Generative Diffusion Models. IEEE Transactions on Knowledge and Data Engineering 2024, 36, 2814–2830.
- Vignac, C.; Krawczuk, I.; Siraudin, A.; Wang, B.; Cevher, V.; Frossard, P. DiGress: Discrete Denoising diffusion for graph generation. In Proceedings of the Eleventh International Conference on Learning Representations; 2022.
- Kim, J.; Halabi, M.E.; Ji, M.; Song, H.O. LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging. arXiv 2024, arXiv:2406.12837.
- Barratt, S.; Sharma, R. A Note on the Inception Score, 2018, [arXiv:stat.ML/1801.01973].
- Zhang, K.; Yang, X.; Wang, W.Y.; Li, L. ReDi: efficient learning-free diffusion inference via trajectory retrieval. In Proceedings of the International Conference on Machine Learning. PMLR; 2023; pp. 41770–41785.
- Lee, S.g.; Kim, H.; Shin, C.; Tan, X.; Liu, C.; Meng, Q.; Qin, T.; Chen, W.; Yoon, S.; Liu, T.Y. PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior. In Proceedings of the International Conference on Learning Representations; 2021.
- Song, Y.; Ermon, S. Improved techniques for training score-based generative models. Advances in neural information processing systems 2020, 33, 12438–12448.
- Ulhaq, A.; Akhtar, N.; Pogrebna, G. Efficient diffusion models for vision: A survey. arXiv 2022, arXiv:2210.09292.
- Li, X.; Lai, Z.; Xu, L.; Guo, J.; Cao, L.; Zhang, S.; Dai, B.; Ji, R. Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion. arXiv 2024, arXiv:2405.09874.
- Croitoru, F.A.; Hondru, V.; Ionescu, R.T.; Shah, M. Diffusion Models in Vision: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 2023, 45, 10850–10869.
- Ma, H.; Zhang, L.; Zhu, X.; Feng, J. Accelerating score-based generative models with preconditioned diffusion sampling. In Proceedings of the European Conference on Computer Vision. Springer; 2022; pp. 1–16.
- Nichol, A.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models, 2021, [arXiv:cs.LG/2102.09672].
- Park, J.; Kwon, G.; Ye, J.C. ED-NeRF: Efficient Text-Guided Editing of 3D Scene using Latent Space NeRF. arXiv 2023, arXiv:2310.02712.
- Lin, Y.; Zhang, T.; Sun, P.; Li, Z.; Zhou, S. Fq-vit: Post-training quantization for fully quantized vision transformer. arXiv 2021, arXiv:2111.13824.
- Fang, J.; Zhao, S. A Unified Sequence Parallelism Approach for Long Context Generative AI. arXiv 2024, arXiv:2405.07719.
- Fang, G.; Ma, X.; Wang, X. Structural pruning for diffusion models. In Proceedings of the Advances in Neural Information Processing Systems; 2023.
- Yu, S.; Kwak, S.; Jang, H.; Jeong, J.; Huang, J.; Shin, J.; Xie, S. Representation alignment for generation: Training diffusion transformers is easier than you think. arXiv 2024, arXiv:2410.06940.
- Lovelace, J.; Kishore, V.; Wan, C.; Shekhtman, E.; Weinberger, K.Q. Latent diffusion for language generation. Advances in Neural Information Processing Systems 2024, 36.
- So, J.; Lee, J.; Ahn, D.; Kim, H.; Park, E. Temporal dynamic quantization for diffusion models. Advances in Neural Information Processing Systems 2024, 36.
- Ma, H.; Yang, J.; Huang, H. Taming diffusion model for exemplar-based image translation. Computational Visual Media 2024, 10, 1031–1043.
- Lee, S.; Lin, Z.; Fanti, G. Improving the Training of Rectified Flows. arXiv 2024, arXiv:2405.20320.
- Hegde, S.; Batra, S.; Zentner, K.; Sukhatme, G. Generating behaviorally diverse policies with latent diffusion models. Advances in Neural Information Processing Systems 2023, 36, 7541–7554.
- Tang, Z.; Gu, S.; Wang, C.; Zhang, T.; Bao, J.; Chen, D.; Guo, B. Volumediffusion: Flexible text-to-3d generation with efficient volumetric encoder. arXiv 2023, arXiv:2312.11459.
- Contributors, O. OneDiff: An out-of-the-box acceleration library for diffusion models. https://github.com/siliconflow/onediff, 2022.
- Zhu, Y.; Liu, X.; Liu, Q. SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow. arXiv 2024, arXiv:2407.12718.
- Luo, W. A comprehensive survey on knowledge distillation of diffusion models. arXiv 2023, arXiv:2304.04262.
- Yan, H.; Liu, X.; Pan, J.; Liew, J.H.; Liu, Q.; Feng, J. Perflow: Piecewise rectified flow as universal plug-and-play accelerator. arXiv 2024, arXiv:2405.07510.
- Rabin, J.; Peyré, G.; Delon, J.; Bernot, M. Wasserstein barycenter and its application to texture mixing. In Proceedings of the Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29–June 2, 2011, Revised Selected Papers 3. Springer, 2012; pp. 435–446.
- Shang, Y.; Yuan, Z.; Xie, B.; Wu, B.; Yan, Y. Post-training quantization on diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 1972–1981.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2018, [arXiv:cs.LG/1706.08500].
- Poole, B.; Jain, A.; Barron, J.T.; Mildenhall, B. DreamFusion: Text-to-3D using 2D Diffusion. arXiv 2022.
- Hochbruck, M.; Ostermann, A. Exponential integrators. Acta Numerica 2010, 19, 209–286.
- Zhou, W.; Dou, Z.; Cao, Z.; Liao, Z.; Wang, J.; Wang, W.; Liu, Y.; Komura, T.; Wang, W.; Liu, L. Emdm: Efficient motion diffusion model for fast, high-quality motion generation.
- Wang, J.; Fang, J.; Li, A.; Yang, P. PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models. arXiv 2024, arXiv:2405.14430.
- Ma, J.; Chen, C.; Xie, Q.; Lu, H. PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation. arXiv 2023, arXiv:2311.17086.
- Wang, X.; Zhang, S.; Zhang, H.; Liu, Y.; Zhang, Y.; Gao, C.; Sang, N. Videolcm: Video latent consistency model. arXiv 2023, arXiv:2312.09109.
- Xu, X.; Wang, Z.; Zhang, G.; Wang, K.; Shi, H. Versatile diffusion: Text, images and variations all in one diffusion model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 7754–7765.
- Gholami, A.; Kim, S.; Dong, Z.; Yao, Z.; Mahoney, M.W.; Keutzer, K. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision; Chapman and Hall/CRC, 2022; pp. 291–326.
- Yang, Y.; Dai, X.; Wang, J.; Zhang, P.; Zhang, H. Efficient Quantization Strategies for Latent Diffusion Models, 2023, [arXiv:cs.CV/2312.05431].
- Mou, C.; Wang, X.; Xie, L.; Wu, Y.; Zhang, J.; Qi, Z.; Shan, Y. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In Proceedings of the AAAI Conference on Artificial Intelligence, 2024, Vol.
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Advances in neural information processing systems 2020, 33, 6840–6851.
- Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-based generative modeling through stochastic differential equations. arXiv 2020, arXiv:2011.13456.
- Qin, C.; Zhang, S.; Yu, N.; Feng, Y.; Yang, X.; Zhou, Y.; Wang, H.; Niebles, J.C.; Xiong, C.; Savarese, S.; et al. Unicontrol: A unified diffusion model for controllable visual generation in the wild. arXiv 2023, arXiv:2305.11147.
- Rasley, J.; Rajbhandari, S.; Ruwase, O.; He, Y. Deepspeed: System optimizations enable training deep learning models with over 100 billion parameters. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining; 2020; pp. 3505–3506.
- Roessle, B.; Müller, N.; Porzi, L.; Rota Bulò, S.; Kontschieder, P.; Dai, A.; Nießner, M. L3dg: Latent 3d gaussian diffusion. In Proceedings of the SIGGRAPH Asia 2024 Conference Papers; 2024; pp. 1–11.
- Robertson, S.; Zaragoza, H.; et al. The probabilistic relevance framework: BM25 and beyond. Foundations and Trends® in Information Retrieval 2009, 3, 333–389.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library, 2019, [arXiv:cs.LG/1912.01703].
- Shang, Y.; Yuan, Z.; Xie, B.; Wu, B.; Yan, Y. Post-training Quantization on Diffusion Models. In Proceedings of the CVPR; 2023.
- Yu, S.; Sohn, K.; Kim, S.; Shin, J. Video probabilistic diffusion models in projected latent space. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 18456–18466.
- Vahdat, A.; Kreis, K.; Kautz, J. Score-based generative modeling in latent space. Advances in neural information processing systems 2021, 34, 11287–11302.
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference, 2017, [arXiv:cs.LG/1712.05877].
- Jiang, M.; Bai, Y.; Cornman, A.; Davis, C.; Huang, X.; Jeon, H.; Kulshrestha, S.; Lambert, J.; Li, S.; Zhou, X.; et al. Scenediffuser: Efficient and controllable driving simulation initialization and rollout. Advances in Neural Information Processing Systems 2024, 37, 55729–55760.
- Chen, S.; Sun, P.; Song, Y.; Luo, P. Diffusiondet: Diffusion model for object detection. In Proceedings of the IEEE/CVF international conference on computer vision, 2023; pp. 19830–19843.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International conference on machine learning. PMLR; 2021; pp. 8748–8763.
- Tian, Y.; Jia, Z.; Luo, Z.; Wang, Y.; Wu, C. DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines, 2024, [arXiv:cs.DC/2405.01248].
- De Bortoli, V.; Thornton, J.; Heng, J.; Doucet, A. Diffusion schrödinger bridge with applications to score-based generative modeling. Advances in Neural Information Processing Systems 2021, 34, 17695–17709.
- Kim, B.; Ye, J.C. Denoising mcmc for accelerating diffusion-based generative models. arXiv 2022, arXiv:2209.14593.
- Fang, J.; Pan, J.; Wang, J.; Li, A.; Sun, X. PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference. arXiv 2024, arXiv:2405.14430.
- Zand, M.; Etemad, A.; Greenspan, M. Diffusion models with deterministic normalizing flow priors. arXiv 2023, arXiv:2309.01274.
- Zhang, H.; Zhang, J.; Srinivasan, B.; Shen, Z.; Qin, X.; Faloutsos, C.; Rangwala, H.; Karypis, G. Mixed-type tabular data synthesis with score-based diffusion in latent space. arXiv 2023, arXiv:2310.09656.
- Li, Y.; Xu, S.; Cao, X.; Zhang, B.; Sun, X. Q-DM: An Efficient Low-bit Quantized Diffusion Model. In Proceedings of the NeurIPS 2023, October 2023.
- Shen, M.; Chen, P.; Ye, P.; Xia, G.; Chen, T.; Bouganis, C.S.; Zhao, Y. MD-DiT: Step-aware Mixture-of-Depths for Efficient Diffusion Transformers. In Proceedings of the Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning.
- Nagel, M.; Amjad, R.A.; Van Baalen, M.; Louizos, C.; Blankevoort, T. Up or down? Adaptive rounding for post-training quantization. In Proceedings of the International Conference on Machine Learning. PMLR; 2020; pp. 7197–7206.
- Lin, J.; Liu, J.; Zhu, J.; Xi, Y.; Liu, C.; Zhang, Y.; Yu, Y.; Zhang, W. A Survey on Diffusion Models for Recommender Systems. arXiv 2024, arXiv:2409.05033.
- Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the design space of diffusion-based generative models. Advances in neural information processing systems 2022, 35, 26565–26577.
- Melnik, A.; Ljubljanac, M.; Lu, C.; Yan, Q.; Ren, W.; Ritter, H. Video diffusion models: A survey. arXiv 2024, arXiv:2405.03150.
- Chen, C.; Deng, F.; Kawaguchi, K.; Gulcehre, C.; Ahn, S. Simple hierarchical planning with diffusion. arXiv 2024, arXiv:2401.02644.
- Luhman, E.; Luhman, T. Knowledge distillation in iterative generative models for improved sampling speed. arXiv 2021, arXiv:2101.02388.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models, 2020, [arXiv:cs.LG/2006.11239].
- Luo, S.; Tan, Y.; Patil, S.; Gu, D.; von Platen, P.; Passos, A.; Huang, L.; Li, J.; Zhao, H. Lcm-lora: A universal stable-diffusion acceleration module. arXiv 2023, arXiv:2311.05556.
- He, Y.; Liu, J.; Wu, W.; Zhou, H.; Zhuang, B. EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models, 2023, [arXiv:cs.CV/2310.03270].
- Li, M.; Cai, T.; Cao, J.; Zhang, Q.; Cai, H.; Bai, J.; Jia, Y.; Li, K.; Han, S. Distrifusion: Distributed parallel inference for high-resolution diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 7183–7193.
- Wang, C.; Wang, Z.; Xu, X.; Tang, Y.; Zhou, J.; Lu, J. Towards Accurate Data-free Quantization for Diffusion Models, 2023, [arXiv:cs.CV/2305.18723].
- Van Den Oord, A.; Vinyals, O.; et al. Neural discrete representation learning. Advances in neural information processing systems 2017, 30.
- Tang, S.; Wang, Y.; Ding, C.; Liang, Y.; Li, Y.; Xu, D. Deediff: Dynamic uncertainty-aware early exiting for accelerating diffusion model generation, 2023.
- Yu, Y.; Zhu, S.; Qin, H.; Li, H. BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion. arXiv 2024, arXiv:2401.16764.
- Lin, J.; Tang, J.; Tang, H.; Yang, S.; Dang, X.; Gan, C.; Han, S. AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration, 2023, [arXiv:cs.CL/2306.00978].
- Dohmatob, E.; Feng, Y.; Subramonian, A.; Kempe, J. Strong model collapse. arXiv 2024, arXiv:2410.04840.
- Sabour, A.; Fidler, S.; Kreis, K. Align your steps: Optimizing sampling schedules in diffusion models. arXiv 2024, arXiv:2404.14507.
- Zniyed, Y.; Nguyen, T.P.; et al. Enhanced network compression through tensor decompositions and pruning. IEEE Transactions on Neural Networks and Learning Systems 2024, 36, 4358–4370.
- Liu, J.; Niu, L.; Yuan, Z.; Yang, D.; Wang, X.; Liu, W. PD-Quant: Post-Training Quantization based on Prediction Difference Metric, 2023, [arXiv:cs.CV/2212.07048].
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models, 2021, [arXiv:cs.CV/2112.10752].
- Ran, L.; Cun, X.; Liu, J.W.; Zhao, R.; Zijie, S.; Wang, X.; Keppo, J.; Shou, M.Z. X-adapter: Adding universal compatibility of plugins for upgraded diffusion model. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 8775–8784.
- Aiello, E.; Valsesia, D.; Magli, E. Fast inference in denoising diffusion models via mmd finetuning. IEEE Access 2024.
- Xiao, G.; Lin, J.; Seznec, M.; Wu, H.; Demouth, J.; Han, S. SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models, 2023, [arXiv:cs.CL/2211.10438].
- Castells, T.; Song, H.K.; Kim, B.K.; Choi, S. LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 821–830.
- Song, Y.; Ermon, S. Generative modeling by estimating gradients of the data distribution. Advances in neural information processing systems 2019, 32.
- Xu, Y.; Deng, M.; Cheng, X.; Tian, Y.; Liu, Z.; Jaakkola, T. Restart sampling for improving generative processes. Advances in Neural Information Processing Systems 2023, 36, 76806–76838.
- Song, Y.; Dhariwal, P. Improved techniques for training consistency models. arXiv 2023, arXiv:2310.14189.
- Kong, Z.; Ping, W.; Huang, J.; Zhao, K.; Catanzaro, B. Diffwave: A versatile diffusion model for audio synthesis. arXiv 2020, arXiv:2009.09761.
- Yuan, J.; Li, X.; Cheng, C.; Liu, J.; Guo, R.; Cai, S.; Yao, C.; Yang, F.; Yi, X.; Wu, C.; et al. Oneflow: Redesign the distributed deep learning framework from scratch. arXiv 2021, arXiv:2110.15032.
- Liu, L.; Ren, Y.; Lin, Z.; Zhao, Z. Pseudo Numerical Methods for Diffusion Models on Manifolds, 2022, [arXiv:cs.CV/2202.09778].
- Moon, T.; Choi, M.; Yun, E.; Yoon, J.; Lee, G.; Cho, J.; Lee, J. A simple early exiting framework for accelerated sampling in diffusion models. arXiv 2024, arXiv:2408.05927.
- Zhang, H.; Wu, Z.; Xing, Z.; Shao, J.; Jiang, Y.G. Adadiff: Adaptive step selection for fast diffusion. arXiv 2023, arXiv:2311.14768.
- Chen, N.; Zhang, Y.; Zen, H.; Weiss, R.J.; Norouzi, M.; Chan, W. Wavegrad: Estimating gradients for waveform generation. arXiv 2020, arXiv:2009.00713.
- You, Y.; Zhou, R.; Park, J.; Xu, H.; Tian, C.; Wang, Z.; Shen, Y. Latent 3d graph diffusion. International Conference on Learning Representations (ICLR), 2024.
- Wu, Z.; Zhou, P.; Yi, X.; Yuan, X.; Zhang, H. Consistent3d: Towards consistent high-fidelity text-to-3d generation with deterministic sampling prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 9892–9902.
- Kim, D.; Kim, Y.; Kwon, S.J.; Kang, W.; Moon, I.C. Refining generative process with discriminator guidance in score-based diffusion models. arXiv 2022, arXiv:2211.17091.
- Peng, M.; Chen, K.; Guo, X.; Zhang, Q.; Lu, H.; Zhong, H.; Chen, D.; Zhu, M.; Yang, H. Diffusion Models for Intelligent Transportation Systems: A Survey. arXiv 2024, arXiv:2409.15816.
- Mo, S. Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs. arXiv 2024, arXiv:2406.05038.
- Ma, X.; Fang, G.; Mi, M.B.; Wang, X. Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching. arXiv 2024, arXiv:2406.01733.
- Lee, T.; Kwon, S.; Kim, T. Grid Diffusion Models for Text-to-Video Generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp.
- Tang, Z.; Tang, J.; Luo, H.; Wang, F.; Chang, T.H. Accelerating parallel sampling of diffusion models. In Proceedings of the Forty-first International Conference on Machine Learning; 2024.
- Ho, J.; Salimans, T.; Gritsenko, A.; Chan, W.; Norouzi, M.; Fleet, D.J. Video diffusion models. Advances in Neural Information Processing Systems 2022, 35, 8633–8646.
- Xu, S.; Li, Y.; Lin, M.; Gao, P.; Guo, G.; Lu, J.; Zhang, B. 2023, [arXiv:cs.CV/2304.00253].
- Esser, S.K.; McKinstry, J.L.; Bablani, D.; Appuswamy, R.; Modha, D.S. Learned step size quantization. arXiv 2019, arXiv:1902.08153.
- Yang, G.; Xie, Y.; Xue, Z.J.; Chang, S.E.; Li, Y.; Dong, P.; Lei, J.; Xie, W.; Wang, Y.; Lin, X.; et al. SDA: Low-Bit Stable Diffusion Acceleration on Edge FPGAs, 2023.
- Wang, Z.; Lu, C.; Wang, Y.; Bao, F.; Li, C.; Su, H.; Zhu, J. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. Advances in Neural Information Processing Systems 2024, 36.
- Ye, H.; Zhang, J.; Liu, S.; Han, X.; Yang, W. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv 2023, arXiv:2308.06721.
- Li, M.; Lin, Y.; Zhang, Z.; Cai, T.; Li, X.; Guo, J.; Xie, E.; Meng, C.; Zhu, J.Y.; Han, S. 2025, [arXiv:cs.CV/2411.05007].

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).