Submitted:
27 March 2026
Posted:
31 March 2026
Abstract
Keywords:
1. Introduction
- We introduce a unified conceptual framework that interprets language-driven IR as an interaction-centric paradigm, revealing how language models impact restoration behavior beyond architectural modifications.
- We systematically categorize VLM-/MLLM-based IR methods through a functional role-oriented taxonomy, clarifying distinct interaction mechanisms including representation-level integration, optimization-level coupling, decision-driven control, and cross-level paradigms.
- We critically analyze VLM-/MLLM-based IQA, highlighting its conceptual distinctions from conventional fidelity metrics and clarifying challenges related to evaluation reliability, calibration stability, and semantic bias.
- We summarize the restoration datasets used in VLM-/MLLM-based frameworks and analyze their limitations and emerging requirements from a language-driven perspective, emphasizing the need for semantically enriched, language-aware benchmarks. We also provide comparisons between conventional frameworks and VLM-/MLLM-based methods across different settings.
- We investigate open challenges posed by language-integrated restoration systems and outline promising directions for future research that bridge multimodal reasoning, visual perception, and restoration optimization.
2. Background
2.1. Image Restoration

2.2. Multimodal Language Models
2.3. Image Quality Assessment
2.4. Relevant Surveys
3. Methodology
3.1. Overview of Language-Driven Restoration and Taxonomy Definition

- Representation-Level Coupling: LM outputs modify forward feature representations without altering the optimization objective or execution logic. This includes embedding replacement and semantic conditioning.
- Optimization-Level Coupling: LM outputs reshape the training objective by defining differentiable supervision signals or scalar reward functions, thereby altering optimization dynamics.
- Decision-Level Coupling: LM outputs regulate the execution structure of the restoration pipeline, such as task decomposition, module scheduling, or control policies.
- Cross-Level Coupling: LM outputs act at multiple interaction depths simultaneously within a unified framework, combining representation, optimization, and/or decision mechanisms.
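The four coupling levels above can be made concrete with a toy sketch. This is only an illustration under stated assumptions, not the implementation of any surveyed method: images and features are plain lists of floats, the LM outputs (`scale`, `shift`, `lm_score`, `plan`) are supplied by hand rather than produced by a real model, and all function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def representation_coupling(features: Vector, scale: Vector, shift: Vector) -> Vector:
    """Representation level: LM-derived scale/shift vectors modulate forward
    features (a FiLM-style semantic conditioning, assumed for illustration).
    The loss and the execution order are untouched."""
    return [s * f + b for f, s, b in zip(features, scale, shift)]


def optimization_coupling(restored: Vector, target: Vector,
                          lm_score: float, weight: float = 0.1) -> float:
    """Optimization level: an LM-derived scalar score in [0, 1] is added to
    the training objective as a penalty next to a mean-squared fidelity term."""
    fidelity = sum((r - t) ** 2 for r, t in zip(restored, target)) / len(restored)
    return fidelity + weight * (1.0 - lm_score)


def decision_coupling(plan: Sequence[str],
                      tools: Dict[str, Callable[[Vector], Vector]],
                      image: Vector) -> Vector:
    """Decision level: an LM-produced plan (ordered tool names) decides which
    restoration modules run and in what order."""
    for name in plan:
        image = tools[name](image)
    return image


# Hypothetical stand-ins for restoration modules.
tools = {
    "denoise": lambda x: [0.5 * v for v in x],
    "brighten": lambda x: [v + 1.0 for v in x],
}

# Cross-level coupling: the same system uses more than one mechanism at once,
# here a decision-level plan followed by representation-level conditioning.
planned = decision_coupling(["denoise", "brighten"], tools, [2.0, 4.0])
conditioned = representation_coupling(planned, scale=[2.0, 0.5], shift=[0.0, 1.0])
loss = optimization_coupling(conditioned, target=[4.0, 2.5], lm_score=0.5, weight=0.2)
```

The split mirrors the taxonomy: the first function changes only what flows forward, the second changes only what is minimized, and the third changes only which modules execute. Reordering `plan` changes the output without touching features or loss, which is what distinguishes decision-level control from the other two.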
3.2. VLM-Based Image Restoration
3.3. MLLM-Based Image Restoration
3.4. Cross-Level Hybrid Systems
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Experimental Results
5. Discussion and Open Challenges
5.1. Generalization and Robustness
5.2. Computational Efficiency
5.3. Cross-Paradigm Trade-Offs
5.4. Evaluation Reliability
5.5. Dataset Design for VLM-/MLLM-Based IR
5.6. Leveraging Multimodal Data and High-Dimensional Representations
5.7. Ethics and Trustworthiness
6. Conclusions
References
- Jiang, B.; Li, J.; Lu, Y.; Cai, Q.; Song, H.; Lu, G. Efficient image denoising using deep learning: A brief survey. Information Fusion 2025, 103013. [Google Scholar] [CrossRef]
- Tian, C.; Xu, Y.; Zuo, W. Image denoising using deep CNN with batch renormalization. Neural Networks 2020, 121, 461–473. [Google Scholar] [CrossRef]
- Zhang, K.; Zuo, W.; Chen, Y.; Meng, D.; Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE transactions on image processing 2017, 26, 3142–3155. [Google Scholar] [CrossRef] [PubMed]
- Chen, X.; Pan, J.; Dong, J.; Tang, J. Towards unified deep image deraining: A survey and a new benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025. [Google Scholar]
- Chen, X.; Li, H.; Li, M.; Pan, J. Learning a sparse transformer network for effective image deraining. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 5896–5905. [Google Scholar]
- Xiao, J.; Fu, X.; Liu, A.; Wu, F.; Zha, Z.J. Image de-raining transformer. IEEE transactions on pattern analysis and machine intelligence 2022, 45, 12978–12995. [Google Scholar] [CrossRef]
- Gui, J.; Cong, X.; Cao, Y.; Ren, W.; Zhang, J.; Zhang, J.; Cao, J.; Tao, D. A comprehensive survey and taxonomy on single image dehazing based on deep learning. ACM Computing Surveys 2023, 55, 1–37. [Google Scholar] [CrossRef]
- Tsai, F.J.; Peng, Y.T.; Lin, Y.Y.; Lin, C.W. PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing. In Proceedings of the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 5591–5600. [Google Scholar]
- Yu, H.; Huang, J.; Zheng, K.; Zhao, F. High-quality image dehazing with diffusion model. arXiv 2023, arXiv:2308.11949. [Google Scholar]
- Quan, Y.; Tan, X.; Huang, Y.; Xu, Y.; Ji, H. Image desnowing via deep invertible separation. IEEE Transactions on Circuits and Systems for Video Technology 2023, 33, 3133–3144. [Google Scholar] [CrossRef]
- Guo, X.; Wang, X.; Fu, X.; Zha, Z.J. Deep unfolding network for image desnowing with snow shape prior. IEEE Transactions on Circuits and Systems for Video Technology 2025. [Google Scholar]
- Liu, Y.F.; Jaw, D.W.; Huang, S.C.; Hwang, J.N. Desnownet: Context-aware deep network for snow removal. IEEE Transactions on Image Processing 2018, 27, 3064–3073. [Google Scholar] [CrossRef]
- Xiang, Y.; Zhou, H.; Li, C.; Sun, F.; Li, Z.; Xie, Y. Deep learning in motion deblurring: current status, benchmarks and future prospects. The Visual Computer 2025, 41, 3801–3827. [Google Scholar] [CrossRef]
- Abuolaim, A.; Brown, M.S. Defocus deblurring using dual-pixel data. In Proceedings of the European conference on computer vision, 2020; Springer; pp. 111–126. [Google Scholar]
- Nah, S.; Hyun Kim, T.; Mu Lee, K. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition, 2017; pp. 3883–3891. [Google Scholar]
- Li, C.; Guo, C.; Han, L.; Jiang, J.; Cheng, M.M.; Gu, J.; Loy, C.C. Low-light image and video enhancement using deep learning: A survey. IEEE transactions on pattern analysis and machine intelligence 2021, 44, 9396–9416. [Google Scholar] [CrossRef]
- Wei, C.; Wang, W.; Yang, W.; Liu, J. Deep Retinex Decomposition for Low-Light Enhancement. In Proceedings of the BMVC, 2018. [Google Scholar]
- Yang, W.; Wang, W.; Huang, H.; Wang, S.; Liu, J. Sparse gradient regularized deep retinex network for robust low-light image enhancement. IEEE Transactions on Image Processing 2021, 30, 2072–2086. [Google Scholar] [CrossRef] [PubMed]
- Wang, Z.; Chen, J.; Hoi, S.C. Deep learning for image super-resolution: A survey. IEEE transactions on pattern analysis and machine intelligence 2020, 43, 3365–3387. [Google Scholar] [CrossRef]
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 1905–1914. [Google Scholar]
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023; pp. 22367–22377. [Google Scholar]
- Zhang, W.; Dong, L.; Pan, X.; Zou, P.; Qin, L.; Xu, W. A survey of restoration and enhancement for underwater images. IEEE Access 2019, 7, 182259–182279. [Google Scholar] [CrossRef]
- Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE transactions on image processing 2019, 29, 4376–4389. [Google Scholar] [CrossRef]
- Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE robotics and automation letters 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
- Kermany, D.S.; Goldbaum, M.; Cai, W.; Valentim, C.C.; Liang, H.; Baxter, S.L.; McKeown, A.; Yang, G.; Wu, X.; Yan, F.; et al. Identifying medical diagnoses and treatable diseases by image-based deep learning. cell 2018, 172, 1122–1131. [Google Scholar] [CrossRef] [PubMed]
- McCollough, C.H.; Bartley, A.C.; Carter, R.E.; Chen, B.; Drees, T.A.; Edwards, P.; Holmes, D.R., III; Huang, A.E.; Khan, F.; Leng, S.; et al. Low-dose CT for the detection and classification of metastatic liver lesions: results of the 2016 low dose CT grand challenge. Medical physics 2017, 44, e339–e352. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 2012, 25. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in neural information processing systems 2017, 30. [Google Scholar]
- Gu, A.; Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. In Proceedings of the First conference on language modeling, 2024. [Google Scholar]
- Cai, Y.; Bian, H.; Lin, J.; Wang, H.; Timofte, R.; Zhang, Y. Retinexformer: One-stage retinex-based transformer for low-light image enhancement. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2023; pp. 12504–12513. [Google Scholar]
- Jiang, J.; Zuo, Z.; Wu, G.; Jiang, K.; Liu, X. A survey on all-in-one image restoration: Taxonomy, evaluation and future trends. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025. [Google Scholar]
- Li, R.; Tan, R.T.; Cheong, L.F. All in one bad weather removal using architectural search. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020; pp. 3175–3185. [Google Scholar]
- Potlapalli, V.; Zamir, S.W.; Khan, S.H.; Shahbaz Khan, F. Promptir: Prompting for all-in-one image restoration. Advances in Neural Information Processing Systems 2023, 36, 71275–71293. [Google Scholar]
- Conde, M.V.; Geigle, G.; Timofte, R. Instructir: High-quality image restoration following human instructions. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 1–21. [Google Scholar]
- Guan, C.; Yoshie, O. CLIP-driven rain perception: Adaptive deraining with pattern-aware network routing and mask-guided cross-attention. arXiv 2025, arXiv:2506.01366. [Google Scholar] [CrossRef]
- Lin, Y.; Lin, Z.; Chen, H.; Pan, P.; Li, C.; Chen, S.; Wen, K.; Jin, Y.; Li, W.; Ding, X. Jarvisir: Elevating autonomous driving perception with intelligent image restoration. In Proceedings of the Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 22369–22380. [Google Scholar]
- Zhu, K.; Gu, J.; You, Z.; Qiao, Y.; Dong, C. An intelligent agentic system for complex image restoration problems. arXiv 2024, arXiv:2410.17809. [Google Scholar] [CrossRef]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Huynh-Thu, Q.; Ghanbari, M. Scope of validity of PSNR in image/video quality assessment. Electronics letters 2008, 44, 800–801. [Google Scholar] [CrossRef]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition, 2018; pp. 586–595. [Google Scholar]
- Wang, J.; Chan, K.C.; Loy, C.C. Exploring clip for assessing the look and feel of images. In Proceedings of the AAAI conference on artificial intelligence, 2023; Vol. 37, pp. 2555–2563. [Google Scholar]
- Hessel, J.; Holtzman, A.; Forbes, M.; Le Bras, R.; Choi, Y. Clipscore: A reference-free evaluation metric for image captioning. In Proceedings of the Proceedings of the 2021 conference on empirical methods in natural language processing, 2021; pp. 7514–7528. [Google Scholar]
- Agnolucci, L.; Galteri, L.; Bertini, M. Quality-aware image-text alignment for opinion-unaware image quality assessment. arXiv 2024, arXiv:2403.11176. [Google Scholar]
- Liu, H.; Li, C.; Wu, Q.; Lee, Y.J. Visual instruction tuning. Advances in neural information processing systems 2023, 36, 34892–34916. [Google Scholar]
- You, Z.; Cai, X.; Gu, J.; Xue, T.; Dong, C. Teaching large language models to regress accurate image quality scores using score distribution. In Proceedings of the Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 14483–14494. [Google Scholar]
- Wu, H.; Zhang, Z.; Zhang, W.; Chen, C.; Liao, L.; Li, C.; Gao, Y.; Wang, A.; Zhang, E.; Sun, W.; et al. Q-align: Teaching lmms for visual scoring via discrete text-defined levels. arXiv 2023, arXiv:2312.17090. [Google Scholar]
- Zhang, Z.; Wu, H.; Jia, Z.; Lin, W.; Zhai, G. Teaching lmms for image quality scoring and interpreting. arXiv 2025, arXiv:2503.09197. [Google Scholar] [CrossRef]
- Zhu, H.; Tian, Y.; Ding, K.; Chen, B.; Chen, B.; Wang, S.; Lin, W. Agenticiqa: An agentic framework for adaptive and interpretable image quality assessment. arXiv 2025, arXiv:2509.26006. [Google Scholar] [CrossRef]
- Su, J.; Xu, B.; Yin, H. A survey of deep learning approaches to image restoration. Neurocomputing 2022, 487, 46–65. [Google Scholar] [CrossRef]
- Wang, L.; Zhou, W.; Wang, C.; Lam, K.M.; Su, Z.; Pan, J. Deep Learning-Driven Ultra-High-Definition Image Restoration: A Survey. arXiv 2025, arXiv:2505.16161. [Google Scholar]
- Zhang, J.; Huang, J.; Jin, S.; Lu, S. Vision-language models for vision tasks: A survey. IEEE transactions on pattern analysis and machine intelligence 2024, 46, 5625–5644. [Google Scholar] [CrossRef] [PubMed]
- Suhr, A.; Zhou, S.; Zhang, A.; Zhang, I.; Bai, H.; Artzi, Y. A corpus for reasoning about natural language grounded in photographs. In Proceedings of the Proceedings of the 57th annual meeting of the association for computational linguistics, 2019; pp. 6418–6428. [Google Scholar]
- Xu, J.; Li, H.; Liang, Z.; Zhang, D.; Zhang, L. Real-world noisy image denoising: A new benchmark. arXiv 2018, arXiv:1804.02603. [Google Scholar] [CrossRef]
- Guo, Y.; Xiao, X.; Chang, Y.; Deng, S.; Yan, L. From sky to the ground: A large-scale benchmark and simple baseline towards real rain removal. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2023; pp. 12097–12107. [Google Scholar]
- Ancuti, C.O.; Ancuti, C.; Timofte, R. NH-HAZE: An image dehazing benchmark with non-homogeneous hazy and haze-free images. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, 2020; pp. 444–445. [Google Scholar]
- Lai, J.; Chen, S.; Lin, Y.; Ye, T.; Liu, Y.; Fei, S.; Xing, Z.; Wu, H.; Wang, W.; Zhu, L. SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization. In Proceedings of the Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 4302–4312. [Google Scholar]
- Hai, J.; Xuan, Z.; Yang, R.; Hao, Y.; Zou, F.; Lin, F.; Han, S. R2rnet: Low-light image enhancement via real-low to real-normal network. Journal of Visual Communication and Image Representation 2023, 90, 103712. [Google Scholar] [CrossRef]
- Zuo, Y.; Zheng, Q.; Wu, M.; Jiang, X.; Li, R.; Wang, J.; Zhang, Y.; Mai, G.; Wang, L.V.; Zou, J.; et al. 4kagent: agentic any image to 4k super-resolution. arXiv 2025, arXiv:2507.07105. [Google Scholar] [CrossRef]
- Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater single image color restoration using haze-lines and a new quantitative dataset. IEEE transactions on pattern analysis and machine intelligence 2020, 43, 2822–2837. [Google Scholar] [CrossRef]
- Knoll, F.; Zbontar, J.; Sriram, A.; Muckley, M.J.; Bruno, M.; Defazio, A.; Parente, M.; Geras, K.J.; Katsnelson, J.; Chandarana, H.; et al. fastMRI: A publicly available raw k-space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning. Radiology: Artificial Intelligence 2020, 2, e190007. [Google Scholar] [CrossRef]
- Kong, X.; Dong, C.; Zhang, L. Towards effective multiple-in-one image restoration: A sequential and prompt learning strategy. arXiv 2024, arXiv:2401.03379. [Google Scholar] [CrossRef]
- Wang, Z.; Cun, X.; Bao, J.; Zhou, W.; Liu, J.; Li, H. Uformer: A general u-shaped transformer for image restoration. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022; pp. 17683–17693. [Google Scholar]
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022; pp. 5728–5739. [Google Scholar]
- Liu, M.; Cui, Y.; Liu, X.; Strand, L.; Yin, H.; Knoll, A. Drfir: A dimensionality reduction framework for all-in-one image restoration in spatial and frequency domains. Expert Systems with Applications 2025, 128959. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, H.; Wang, G.; Zhang, Q.; Zhang, L.; Du, B. UniUIR: Considering Underwater Image Restoration as An All-in-One Learner. arXiv 2025, arXiv:2501.12981. [Google Scholar] [CrossRef]
- Zhang, X.; Zhang, H.; Wang, G.; Zhang, Q.; Zhang, L. ClearAIR: A Human-Visual-Perception-Inspired All-in-One Image Restoration. arXiv 2026, arXiv:2601.02763. [Google Scholar] [CrossRef]
- Zeng, H.; Wang, X.; Chen, Y.; Su, J.; Liu, J. Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks. In Proceedings of the Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 7524–7533. [Google Scholar]
- Jin, X.; Shi, Y.; Xia, B.; Yang, W. Llmra: Multi-modal large language model based restoration assistant. arXiv 2024, arXiv:2401.11401. [Google Scholar] [CrossRef]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Advances in neural information processing systems 2020, 33, 1877–1901. [Google Scholar]
- Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; Al-Dahle, A.; Letman, A.; Mathur, A.; Schelten, A.; Vaughan, A.; et al. The llama 3 herd of models. arXiv 2024, arXiv:2407.21783. [Google Scholar] [CrossRef]
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; Fan, Y.; Ge, W.; Han, Y.; Huang, F.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609. [Google Scholar] [CrossRef]
- Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; Lu, C.; Zhao, C.; Deng, C.; Zhang, C.; Ruan, C.; et al. Deepseek-v3 technical report. arXiv 2024, arXiv:2412.19437. [Google Scholar]
- Luo, H.; Bao, J.; Wu, Y.; He, X.; Li, T. Segclip: Patch aggregation with learnable centers for open-vocabulary semantic segmentation. In Proceedings of the International Conference on Machine Learning. PMLR, 2023; pp. 23033–23044. [Google Scholar]
- Jia, C.; Yang, Y.; Xia, Y.; Chen, Y.T.; Parekh, Z.; Pham, H.; Le, Q.; Sung, Y.H.; Li, Z.; Duerig, T. Scaling up visual and vision-language representation learning with noisy text supervision. In Proceedings of the International conference on machine learning. PMLR, 2021; pp. 4904–4916. [Google Scholar]
- Yao, L.; Han, J.; Wen, Y.; Liang, X.; Xu, D.; Zhang, W.; Li, Z.; Xu, C.; Xu, H. Detclip: Dictionary-enriched visual-concept paralleled pre-training for open-world detection. Advances in Neural Information Processing Systems 2022, 35, 9125–9138. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar] [CrossRef]
- Zhang, B.; Li, K.; Cheng, Z.; Hu, Z.; Yuan, Y.; Chen, G.; Leng, S.; Jiang, Y.; Zhang, H.; Li, X.; et al. Videollama 3: Frontier multimodal foundation models for image and video understanding. arXiv 2025, arXiv:2501.13106. [Google Scholar]
- Wang, P.; Bai, S.; Tan, S.; Wang, S.; Fan, Z.; Bai, J.; Chen, K.; Liu, X.; Wang, J.; Ge, W.; et al. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution. arXiv 2024, arXiv:2409.12191. [Google Scholar]
- Bai, S.; Chen, K.; Liu, X.; Wang, J.; Ge, W.; Song, S.; Dang, K.; Wang, P.; Wang, S.; Tang, J.; et al. Qwen2. 5-vl technical report. arXiv 2025, arXiv:2502.13923. [Google Scholar]
- Team, G.; Anil, R.; Borgeaud, S.; Alayrac, J.B.; Yu, J.; Soricut, R.; Schalkwyk, J.; Dai, A.M.; Hauth, A.; Millican, K.; et al. Gemini: a family of highly capable multimodal models. arXiv 2023, arXiv:2312.11805. [Google Scholar] [CrossRef]
- Zhai, G.; Min, X. Perceptual image quality assessment: a survey. Science China Information Sciences 2020, 63, 211301. [Google Scholar] [CrossRef]
- Ding, K.; Ma, K.; Wang, S.; Simoncelli, E.P. Image quality assessment: Unifying structure and texture similarity. IEEE transactions on pattern analysis and machine intelligence 2020, 44, 2567–2581. [Google Scholar] [CrossRef] [PubMed]
- Zheng, H.; Yang, H.; Fu, J.; Zha, Z.J.; Luo, J. Learning conditional knowledge distillation for degraded-reference image quality assessment. In Proceedings of the Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 10242–10251. [Google Scholar]
- Lao, S.; Gong, Y.; Shi, S.; Yang, S.; Wu, T.; Wang, J.; Xia, W.; Yang, Y. Attentions help cnns see better: Attention-based hybrid image quality assessment network. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022; pp. 1140–1149. [Google Scholar]
- Chen, C.; Mo, J.; Hou, J.; Wu, H.; Liao, L.; Sun, W.; Yan, Q.; Lin, W. Topiq: A top-down approach from semantics to distortions for image quality assessment. IEEE Transactions on Image Processing 2024, 33, 2404–2418. [Google Scholar] [CrossRef]
- Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Transactions on image processing 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal processing letters 2012, 20, 209–212. [Google Scholar] [CrossRef]
- Venkatanath, N.; Praneeth, D.; Sumohana, S.C.; Swarup, S.M.; et al. Blind image quality evaluation using perception based features. In Proceedings of the 2015 twenty first national conference on communications (NCC), 2015; IEEE; pp. 1–6. [Google Scholar]
- Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. Musiq: Multi-scale image quality transformer. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2021; pp. 5148–5157. [Google Scholar]
- Talebi, H.; Milanfar, P. NIMA: Neural image assessment. IEEE transactions on image processing 2018, 27, 3998–4011. [Google Scholar] [CrossRef]
- Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. Maniqa: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022; pp. 1191–1200. [Google Scholar]
- Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020; pp. 3667–3676. [Google Scholar]
- Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition, 2014; pp. 1733–1740. [Google Scholar]
- Wu, H.; Zhang, Z.; Zhang, E.; Chen, C.; Liao, L.; Wang, A.; Xu, K.; Li, C.; Hou, J.; Zhai, G.; et al. Q-instruct: Improving low-level visual abilities for multi-modality foundation models. In Proceedings of the Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024; pp. 25490–25500. [Google Scholar]
- Zhu, H.; Wu, H.; Li, Y.; Zhang, Z.; Chen, B.; Zhu, L.; Fang, Y.; Zhai, G.; Lin, W.; Wang, S. Adaptive image quality assessment via teaching large multimodal model to compare. Advances in Neural Information Processing Systems 2024, 37, 32611–32629. [Google Scholar]
- You, Z.; Li, Z.; Gu, J.; Yin, Z.; Xue, T.; Dong, C. Depicting beyond scores: Advancing image quality assessment through multi-modal language models. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 259–276. [Google Scholar]
- Chen, Z.; Wang, J.; Wang, W.; Xu, S.; Xiong, H.; Zeng, Y.; Guo, J.; Wang, S.; Yuan, C.; Li, B.; et al. Seagull: No-reference image quality assessment for regions of interest via vision-language instruction tuning. arXiv 2024, arXiv:2411.10161. [Google Scholar]
- Chen, C.; Yang, S.; Wu, H.; Liao, L.; Zhang, Z.; Wang, A.; Sun, W.; Yan, Q.; Lin, W. Q-ground: Image quality grounding with large multi-modality models. In Proceedings of the Proceedings of the 32nd ACM International Conference on Multimedia, 2024; pp. 486–495. [Google Scholar]
- Tang, Z.; Yang, S.; Peng, B.; Wang, Z.; Dong, J. Revisiting MLLM Based Image Quality Assessment: Errors and Remedy. arXiv 2025, arXiv:2511.07812. [Google Scholar] [CrossRef]
- Li, X.; Ren, Y.; Jin, X.; Lan, C.; Wang, X.; Zeng, W.; Wang, X.; Chen, Z. Diffusion models for image restoration and enhancement: a comprehensive survey. International Journal of Computer Vision 2025, 133, 8078–8108. [Google Scholar] [CrossRef]
- Cheng, J.; Liang, D.; Tan, S. Transfer clip for generalizable image denoising. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 25974–25984. [Google Scholar]
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International conference on machine learning. PmLR, 2021; pp. 8748–8763. [Google Scholar]
- Yang, H.; Pan, L.; Yang, Y.; Hartley, R.; Liu, M. Ldp: Language-driven dual-pixel image defocus deblurring network. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 24078–24087. [Google Scholar]
- Chen, Z.; Chen, T.; Wang, C.; Gao, Q.; Niu, C.; Wang, G.; Shan, H. Low-dose CT denoising with language-engaged dual-space alignment. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2024; pp. 3088–3091. [Google Scholar]
- Chen, Z.; Chen, T.; Wang, C.; Gao, Q.; Xie, H.; Niu, C.; Wang, G.; Shan, H. LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models. IEEE Transactions on Radiation and Plasma Medical Sciences 2025. [Google Scholar] [CrossRef]
- Hu, B.; Liu, H.; Zheng, Z.; Liu, P. CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution. IEEE Transactions on Multimedia 2025. [Google Scholar] [CrossRef]
- Duan, H.; Min, X.; Wu, S.; Shen, W.; Zhai, G. Uniprocessor: a text-induced unified low-level image processor. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 180–199. [Google Scholar]
- Li, J.; Wang, Y.; Yan, J.; Zhang, R.; Yang, B. MdaIF: Robust One-Stop Multi-Degradation-Aware Image Fusion with Language-Driven Semantics. arXiv 2025, arXiv:2511.12525. [Google Scholar] [CrossRef]
- Mao, J.; Yang, Y.; Yin, X.; Shao, L.; Tang, H. AllRestorer: All-in-One Transformer for Image Restoration under Composite Degradations. arXiv 2024, arXiv:2411.10708. [Google Scholar] [CrossRef]
- Wu, Z.; Chen, Y.; Yokoya, N.; He, W. MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration. arXiv 2025, arXiv:2503.09131. [Google Scholar]
- Lee, C.M.; Cheng, C.H.; Lin, Y.F.; Cheng, Y.C.; Liao, W.T.; Hsu, C.C.; Yang, F.E.; Wang, Y.C.F. Prompthsi: Universal hyperspectral image restoration framework for composite degradation. arXiv e-prints 2024, arXiv–2411. [Google Scholar]
- Luo, Z.; Gustafsson, F.K.; Zhao, Z.; Sjölund, J.; Schön, T.B. Controlling vision-language models for multi-task image restoration. arXiv 2023, arXiv:2310.01018. [Google Scholar]
- Jiang, Y.; Zhang, Z.; Xue, T.; Gu, J. Autodir: Automatic all-in-one image restoration with latent diffusion. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 340–359. [Google Scholar]
- Qi, C.; Tu, Z.; Ye, K.; Delbracio, M.; Milanfar, P.; Chen, Q.; Talebi, H. Spire: Semantic prompt-driven image restoration. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 446–464. [Google Scholar]
- Ai, Y.; Huang, H.; Zhou, X.; Wang, J.; He, R. Multimodal prompt perceiver: Empower adaptiveness generalizability and fidelity for all-in-one image restoration. In Proceedings of the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 25432–25444. [Google Scholar]
- Tu, Y.; Yan, Q.; Niu, A.; Tang, J. TPGDiff: Hierarchical Triple-Prior Guided Diffusion for Image Restoration. arXiv 2026, arXiv:2601.20306. [Google Scholar] [CrossRef]
- Zhang, Z.; Lei, J.; Peng, B.; Zhu, J.; Xu, L.; Huang, Q. Advancing Real-World Stereoscopic Image Super-Resolution via Vision-Language Model. IEEE Transactions on Image Processing 2025. [Google Scholar]
- Yang, S.; Ding, M.; Wu, Y.; Li, Z.; Zhang, J. Implicit neural representation for cooperative low-light image enhancement. In Proceedings of the Proceedings of the IEEE/CVF international conference on computer vision, 2023; pp. 12918–12927. [Google Scholar]
- Li, B.; Li, X.; Zhu, H.; Jin, Y.; Feng, R.; Zhang, Z.; Chen, Z. SeD: Semantic-aware discriminator for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 25784–25795.
- Song, W.; Liu, C.; Di Mauro, M.; Liotta, A. Unsupervised Underwater Image Enhancement Combining Imaging Restoration and Prompt Learning. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024; Springer; pp. 421–434.
- Li, Y.; Fan, H.; Hu, R.; Feichtenhofer, C.; He, K. Scaling language-image pre-training via masking. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 23390–23400.
- Zhang, X.; Ma, J.; Wang, G.; Zhang, Q.; Zhang, H.; Zhang, L. Perceive-IR: Learning to perceive degradation better for all-in-one image restoration. IEEE Transactions on Image Processing 2025.
- Sun, X.; Wang, L.; Wang, C.; Jin, Y.; Lam, K.M.; Su, Z.; Yang, Y.; Pan, J. Adapting Large VLMs with Iterative and Manual Instructions for Generative Low-light Enhancement. arXiv 2025, arXiv:2507.18064.
- Zhou, H.; Dong, W.; Liu, X.; Zhang, Y.; Zhai, G.; Chen, J. Low-light image enhancement via generative perceptual priors. In Proceedings of the AAAI Conference on Artificial Intelligence, 2025; Vol. 39, pp. 10752–10760.
- Chen, H.; Li, W.; Gu, J.; Ren, J.; Chen, S.; Ye, T.; Pei, R.; Zhou, K.; Song, F.; Zhu, L. RestoreAgent: Autonomous image restoration agent via multimodal large language models. Advances in Neural Information Processing Systems 2024, 37, 110643–110666.
- Zhou, Y.; Cao, J.; Zhang, Z.; Wen, F.; Jiang, Y.; Jia, J.; Liu, X.; Min, X.; Zhai, G. Q-Agent: Quality-Driven Chain-of-Thought Image Restoration Agent through Robust Multimodal Large Language Model. arXiv 2025, arXiv:2504.07148.
- Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Xia, F.; Chi, E.; Le, Q.V.; Zhou, D.; et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems 2022, 35, 24824–24837.
- Jiang, X.; Li, G.; Chen, B.; Zhang, J. Multi-Agent Image Restoration. arXiv 2025, arXiv:2503.09403.
- Li, B.; Li, X.; Lu, Y.; Chen, Z. Hybrid agents for image restoration. arXiv 2025, arXiv:2503.10120.
- Wei, Y.; Zhang, Z.; Ren, J.; Xu, X.; Hong, R.; Yang, Y.; Yan, S.; Wang, M. Clarity ChatGPT: An interactive and adaptive processing system for image restoration and enhancement. arXiv 2023, arXiv:2311.11695.
- Wang, T.; Xia, P.; Li, B.; Jiang, P.T.; Kong, Z.; Zhang, K.; Lu, T.; Luo, W. MOERL: When Mixture-of-Experts Meet Reinforcement Learning for Adverse Weather Image Restoration. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 13673–13683.
- Lu, J.; Wu, Y.; Zhao, Z.; Wang, H.; Jimenez, F.; Majeedi, A.; Fu, Y. SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback. arXiv 2025, arXiv:2512.18599.
- Hugging Face. Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Models. https://huggingface.co/blog/idefics, 2023.
- Yu, Y.; Zeng, Z.; Hua, H.; Fu, J.; Luo, J. PromptFix: You prompt and we fix the photo. arXiv 2024, arXiv:2405.16785.
- Franzen, R. Kodak lossless true color image suite, 1999.
- Zhang, L.; Wu, X.; Buades, A.; Li, X. Color demosaicking by local directional interpolation and nonlocal adaptive thresholding. Journal of Electronic Imaging 2011, 20, 023016.
- Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV), 2001; IEEE; Vol. 2, pp. 416–423.
- Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015; pp. 5197–5206.
- Agustsson, E.; Timofte, R. NTIRE 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017; pp. 126–135.
- Abdelhamed, A.; Lin, S.; Brown, M.S. A high-quality denoising dataset for smartphone cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 1692–1700.
- Ma, K.; Duanmu, Z.; Wu, Q.; Wang, Z.; Yong, H.; Li, H.; Zhang, L. Waterloo exploration database: New challenges for image quality assessment models. IEEE Transactions on Image Processing 2016, 26, 1004–1016.
- Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 2010, 33, 898–916.
- Yang, W.; Tan, R.T.; Feng, J.; Liu, J.; Guo, Z.; Yan, S. Deep joint rain detection and removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp. 1357–1366.
- Zhang, H.; Sindagi, V.; Patel, V.M. Image de-raining using a conditional generative adversarial network. IEEE Transactions on Circuits and Systems for Video Technology 2019, 30, 3943–3956.
- Fu, X.; Huang, J.; Zeng, D.; Huang, Y.; Ding, X.; Paisley, J. Removing rain from single images via a deep detail network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; pp. 3855–3863.
- Qian, R.; Tan, R.T.; Yang, W.; Su, J.; Liu, J. Attentive generative adversarial network for raindrop removal from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 2482–2491.
- Li, R.; Cheong, L.F.; Tan, R.T. Heavy rain image restoration: Integrating physics model and conditional adversarial learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; pp. 1633–1642.
- Quan, R.; Yu, X.; Liang, Y.; Yang, Y. Removing raindrops and rain streaks in one go. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021; pp. 9147–9156.
- Huang, H.; Luo, M.; He, R. Memory uncertainty learning for real-world single image deraining. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022, 45, 3446–3460.
- Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision 2018, 126, 973–992.
- Sakaridis, C.; Wang, H.; Li, K.; Zurbrügg, R.; Jadon, A.; Abbeloos, W.; Reino, D.O.; Van Gool, L.; Dai, D. ACDC: The adverse conditions dataset with correspondences for robust semantic driving scene perception. arXiv 2021, arXiv:2104.13395.
- Li, B.; Ren, W.; Fu, D.; Tao, D.; Feng, D.; Zeng, W.; Wang, Z. Benchmarking single-image dehazing and beyond. IEEE Transactions on Image Processing 2018, 28, 492–505.
- Ancuti, C.O.; Ancuti, C.; Sbert, M.; Timofte, R. Dense-haze: A benchmark for image dehazing with dense-haze and haze-free images. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), 2019; IEEE; pp. 1014–1018.
- Punnappurath, A.; Abuolaim, A.; Afifi, M.; Brown, M.S. Modeling defocus-disparity in dual-pixel sensors. In Proceedings of the 2020 IEEE International Conference on Computational Photography (ICCP), 2020; IEEE; pp. 1–12.
- Pan, L.; Chowdhury, S.; Hartley, R.; Liu, M.; Zhang, H.; Li, H. Dual pixel exploration: Simultaneous depth estimation and image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021; pp. 4340–4349.
- Abuolaim, A.; Delbracio, M.; Kelly, D.; Brown, M.S.; Milanfar, P. Learning to reduce defocus blur by realistically modeling dual-pixel data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 2289–2298.
- Lee, C.; Lee, C.; Kim, C.S. Contrast enhancement based on layered difference representation of 2D histograms. IEEE Transactions on Image Processing 2013, 22, 5372–5384.
- Wang, S.; Zheng, J.; Hu, H.M.; Li, B. Naturalness preserved enhancement algorithm for non-uniform illumination images. IEEE Transactions on Image Processing 2013, 22, 3538–3548.
- Vonikakis, V.; Kouskouridas, R.; Gasteratos, A. On the evaluation of illumination compensation algorithms. Multimedia Tools and Applications 2018, 77, 9211–9231.
- Ma, K.; Zeng, K.; Wang, Z. Perceptual quality assessment for multi-exposure image fusion. IEEE Transactions on Image Processing 2015, 24, 3345–3356.
- Cai, J.; Gu, S.; Zhang, L. Learning a deep single image contrast enhancer from multi-exposure images. IEEE Transactions on Image Processing 2018, 27, 2049–2062.
- Guo, X.; Li, Y.; Ling, H. LIME: Low-light image enhancement via illumination map estimation. IEEE Transactions on Image Processing 2016, 26, 982–993.
- Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Transactions on Circuits and Systems for Video Technology 2020, 30, 4861–4875.
- Bevilacqua, M.; Roumy, A.; Guillemot, C.; Alberi-Morel, M.L. Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In Proceedings of the British Machine Vision Conference (BMVC), 2012.
- Zeyde, R.; Elad, M.; Protter, M. On single image scale-up using sparse-representations. In Proceedings of the International Conference on Curves and Surfaces, 2010; Springer; pp. 711–730.
- Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using Manga109 dataset. Multimedia Tools and Applications 2017, 76, 21811–21838.
- Cheng, D.; Price, B.; Cohen, S.; Brown, M.S. Beyond white: Ground truth colors for color constancy correction. In Proceedings of the IEEE International Conference on Computer Vision, 2015; pp. 298–306.
- Cai, J.; Zeng, H.; Yong, H.; Cao, Z.; Zhang, L. Toward real-world single image super-resolution: A new benchmark and a new model. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019; pp. 3086–3095.
- Wei, P.; Xie, Z.; Lu, H.; Zhan, Z.; Ye, Q.; Zuo, W.; Lin, L. Component divide-and-conquer for real-world image super-resolution. In Proceedings of the European Conference on Computer Vision, 2020; Springer; pp. 101–117.
- Wu, R.; Yang, T.; Sun, L.; Zhang, Z.; Li, S.; Zhang, L. SeeSR: Towards semantics-aware real-world image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024; pp. 25456–25467.
- Zhang, K.; Liang, J.; Van Gool, L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021; pp. 4791–4800.
- Wang, Z.J.; Montoya, E.; Munechika, D.; Yang, H.; Hoover, B.; Chau, D.H. DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023; pp. 893–911.
- Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Transactions on Geoscience and Remote Sensing 2017, 55, 3965–3981.
- Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS Journal of Photogrammetry and Remote Sensing 2020, 159, 296–307.
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; pp. 3974–3983.
- Jia, F.; Tan, L.; Wang, G.; Jia, C.; Chen, Z. A super-resolution network using channel attention retention for pathology images. PeerJ Computer Science 2023, 9, e1196.
- FUJIFILM Healthcare Europe.; SonoSkills. US-CASE: Ultrasound Cases Dataset, 2025.
- Tang, L.; Yuan, J.; Zhang, H.; Jiang, X.; Ma, J. PIAFusion: A progressive infrared and visible image fusion network based on illumination aware. Information Fusion 2022, 83, 79–92.
- Liu, J.; Liu, Z.; Wu, G.; Ma, L.; Liu, R.; Zhong, W.; Luo, Z.; Fan, X. Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 8115–8124.
- Guo, Y.; Gao, Y.; Lu, Y.; Zhu, H.; Liu, R.W.; He, S. OneRestore: A universal restoration framework for composite degradation. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 255–272.
- Zhou, Y.; Ren, D.; Emerton, N.; Lim, S.; Large, T. Image restoration for under-display camera. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021; pp. 9179–9188.
- Lin, C.H.; Hsu, C.C.; Young, S.S.; Hsieh, C.Y.; Tai, S.C. QRCODE: Quasi-residual convex deep network for fusing misaligned hyperspectral and multispectral images. IEEE Transactions on Geoscience and Remote Sensing 2024, 62, 1–15.
- Arad, B.; Timofte, R.; Yahel, R.; Morag, N.; Bernat, A.; Cai, Y.; Lin, J.; Lin, Z.; Wang, H.; Zhang, Y.; et al. NTIRE 2022 spectral recovery challenge and data set. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 863–881.
- Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing 2011, 20, 2378–2386.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems 2017, 30.
- Du, Q.; Younan, N.H.; King, R.; Shah, V.P. On the performance evaluation of pan-sharpening techniques. IEEE Geoscience and Remote Sensing Letters 2007, 4, 518–522.
- Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
- Ying, Z.; Niu, H.; Gupta, P.; Mahajan, D.; Ghadiyaram, D.; Bovik, A. From patches to pictures (PaQ-2-PiQ): Mapping the perceptual space of picture quality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020; pp. 3575–3585.
- Zhang, W.; Ma, K.; Yan, J.; Deng, D.; Wang, Z. Blind image quality assessment using a deep bilinear convolutional neural network. IEEE Transactions on Circuits and Systems for Video Technology 2020, 30, 36–47.
- Zhou, H.; Tang, L.; Yang, R.; Qin, G.; Zhang, Y.; Li, Y.; Li, X.; Hu, R.; Zhai, G. UniQA: Unified vision-language pre-training for image quality and aesthetic assessment. arXiv 2024, arXiv:2406.01069.
- Zhang, W.; Zhai, G.; Wei, Y.; Yang, X.; Ma, K. Blind image quality assessment via vision-language correspondence: A multitask learning perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 14071–14081.
- Li, X.; Huang, Z.; Zhang, Y.; Shen, Y.; Li, K.; Zheng, X.; Cao, L.; Ji, R. Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 10442–10452.
- Chen, Z.; Qin, H.; Wang, J.; Yuan, C.; Li, B.; Hu, W.; Wang, L. PromptIQA: Boosting the performance and generalization for no-reference image quality assessment via prompts. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 247–264.
- Kwon, D.; Kim, D.; Ki, S.; Jo, Y.; Lee, H.E.; Kim, S.J. ATTIQA: Generalizable image quality feature extractor using attribute-aware pretraining. In Proceedings of the Asian Conference on Computer Vision, 2024; pp. 4526–4543.
- Dong, G.; Liao, X.; Li, M.; Guo, G.; Ren, C. Exploring semantic feature discrimination for perceptual image super-resolution and opinion-unaware no-reference image quality assessment. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 28176–28187.
- Liu, K.; Zhang, Z.; Li, W.; Pei, R.; Song, F.; Liu, X.; Kong, L.; Zhang, Y. Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment. arXiv 2024, arXiv:2410.02505.
- Li, W.; Zhang, X.; Zhao, S.; Zhang, Y.; Li, J.; Zhang, L.; Zhang, J. Q-Insight: Understanding image quality via visual reinforcement learning. arXiv 2025, arXiv:2503.22679.
- You, Z.; Gu, J.; Li, Z.; Cai, X.; Zhu, K.; Dong, C.; Xue, T. Descriptive image quality assessment in the wild. arXiv 2024, arXiv:2405.18842.
- Chen, Z.; Hu, B.; Niu, C.; Chen, T.; Li, Y.; Shan, H.; Wang, G. IQAGPT: Computed tomography image quality assessment with vision-language and ChatGPT models. Visual Computing for Industry, Biomedicine, and Art 2024, 7, 20.
- Xie, W.; Dai, R.; Ding, R.; Liu, K.; Chu, X.; Hou, X.; Wen, J. Q-Hawkeye: Reliable Visual Policy Optimization for Image Quality Assessment. arXiv 2026, arXiv:2601.22920.
- Zhou, J.; Liu, C.; Jiang, Q.; Fu, X.; Hou, J.; Li, X. Semantic Contrast for Domain-Robust Underwater Image Quality Assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence 2026.
- Li, X.; Zhang, Z.; Xu, Z.; Xu, S.; Min, X.; Chen, Y.; Zhai, G. Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework. arXiv 2026, arXiv:2601.20689.
- Cai, Z.; Zhang, J.; Yuan, X.; Jiang, P.T.; Chen, W.; Tang, B.; Yao, L.; Wang, Q.; Chen, J.; Li, B. Q-Ponder: A unified training pipeline for reasoning-based visual quality assessment. arXiv 2025, arXiv:2506.05384.
- Rifa, K.R.; Zhang, J.; Imran, A. CAP-IQA: Context-Aware Prompt-Guided CT Image Quality Assessment. arXiv 2026, arXiv:2601.01613.
- Wu, H.; Zhu, H.; Zhang, Z.; Zhang, E.; Chen, C.; Liao, L.; Li, C.; Wang, A.; Sun, W.; Yan, Q.; et al. Towards open-ended visual quality comparison. In Proceedings of the European Conference on Computer Vision, 2024; Springer; pp. 360–377.
- Wu, H.; Zhang, Z.; Zhang, E.; Chen, C.; Liao, L.; Wang, A.; Li, C.; Sun, W.; Yan, Q.; Zhai, G.; et al. Q-Bench: A benchmark for general-purpose foundation models on low-level vision. arXiv 2023, arXiv:2309.14181.
- Zhang, Z.; Wu, H.; Zhang, E.; Zhai, G.; Lin, W. Q-Bench+: A Benchmark for Multi-Modal Foundation Models on Low-Level Vision From Single Images to Pairs. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024, 46, 10404–10418.
- Zhao, S.; Zhang, X.; Li, W.; Li, J.; Zhang, L.; Xue, T.; Zhang, J. Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment. arXiv 2025, arXiv:2510.11369.
- Kendall, M.G. A new measure of rank correlation. Biometrika 1938, 30, 81–93.
- Tinsley, H.E.; Weiss, D.J. Interrater reliability and agreement of subjective judgments. Journal of Counseling Psychology 1975, 22, 358.
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002; pp. 311–318.
- Lin, C.Y. ROUGE: A package for automatic evaluation of summaries. In Proceedings of Text Summarization Branches Out, 2004; pp. 74–81.
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005; pp. 65–72.
- Vedantam, R.; Lawrence Zitnick, C.; Parikh, D. CIDEr: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015; pp. 4566–4575.
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023; pp. 4015–4026.
- Sun, S.; Ren, W.; Wang, T.; Cao, X. Rethinking image restoration for object detection. Advances in Neural Information Processing Systems 2022, 35, 4461–4474.
- Zhu, H.; Sui, X.; Chen, B.; Liu, X.; Chen, P.; Fang, Y.; Wang, S. 2AFC prompting of large multimodal models for image quality assessment. IEEE Transactions on Circuits and Systems for Video Technology 2024.
- Yi, X.; Xu, H.; Zhang, H.; Tang, L.; Ma, J. Diff-Retinex++: Retinex-driven reinforced diffusion model for low-light image enhancement. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025.
- Li, B.; Liu, X.; Hu, P.; Wu, Z.; Lv, J.; Peng, X. All-in-one image restoration for unknown corruption. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022; pp. 17452–17462.
- Zhang, J.; Huang, J.; Yao, M.; Yang, Z.; Yu, H.; Zhou, M.; Zhao, F. Ingredient-oriented multi-degradation learning for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023; pp. 5825–5835.
- Cui, Y.; Zamir, S.W.; Khan, S.; Knoll, A.; Shah, M.; Khan, F.S. AdaIR: Adaptive all-in-one image restoration via frequency mining and modulation. In Proceedings of the 13th International Conference on Learning Representations (ICLR), 2025; pp. 57335–57356.
- Wu, G.; Jiang, J.; Jiang, K.; Liu, X.; Nie, L. DSwinIR: Rethinking Window-Based Attention for Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence 2025.
- Cui, Y.; Ren, W.; Shi, B.; Knoll, A. Visual-in-Visual: A Unified and Efficient Baseline for Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence 2026.
- Wang, C.; Fan, H.; Yang, H.; Karimi, S.; Yao, L.; Yang, Y. Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025; pp. 23539–23550.
- Wang, T.; Zhang, K.; Shen, T.; Luo, W.; Stenger, B.; Lu, T. Ultra-high-definition low-light image enhancement: A benchmark and transformer-based method. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023; Vol. 37, pp. 2654–2662.
- Zhang, T.; Liu, P.; Lu, Y.; Cai, M.; Zhang, Z.; Zhang, Z.; Zhou, Q. CWNet: Causal wavelet network for low-light image enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025; pp. 8789–8799.
- Deng, X.; Dragotti, P.L. Deep convolutional neural network for multi-modal image restoration and fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence 2020, 43, 3333–3348.
- Wang, Z.; Wu, Y.; Li, D.; Li, G.; Zhu, P.; Zhang, Z.; Jiang, R. LiDAR-assisted image restoration for extreme low-light conditions. Knowledge-Based Systems 2025, 316, 113382.
- Janjua, M.K.; Ghasemabadi, A.; Zhang, K.; Salameh, M.; Gao, C.; Niu, D. Grounding Degradations in Natural Language for All-In-One Video Restoration. arXiv 2025, arXiv:2507.14851.
Short Biography of Authors
Mingyu Liu (Student Member, IEEE) is currently a Ph.D. candidate in the Chair of Robotics, Artificial Intelligence and Real-time Systems at the Technical University of Munich (TUM), Germany. He received dual master’s degrees in Electrical and Computer Engineering from TUM and in Electronics and Communication Engineering from Tongji University, China. His research interests include computer vision in autonomous driving, deep learning, and artificial intelligence.
Haozhan Shu received his bachelor’s degree from Harbin Institute of Technology and is currently pursuing his M.Sc. degree in Design and Engineering at the Technical University of Munich (TUM). His research interests include image restoration and autonomous driving.
Yuning Cui (Student Member, IEEE) received the B.Eng. degree from Central South University, China, in 2016, and the M.Eng. degree from National University of Defense Technology, China, in 2018. He is currently working towards the Ph.D. degree at the Chair of Robotics, Artificial Intelligence and Real-time Systems within the School of Computation, Information and Technology at the Technical University of Munich. His research interest lies in image restoration.
Xingcheng Zhou is currently a Ph.D. student in the Chair of Robotics, Artificial Intelligence and Real-time Systems at the Technical University of Munich (TUM), Germany. He completed his M.Sc. in Electrical and Computer Engineering at the Technical University of Munich in 2021. Before starting at TUM, he worked as an Industrial AI Researcher at Siemens. His current research interests include computer vision, autonomous driving, and vision language models.
Hu Cao is currently a professor at the School of Automation, Southeast University. He was a research associate at the Chair of Robotics, Artificial Intelligence, and Real-Time Systems of the Technical University of Munich (TUM), where he has also been working as a research assistant since October 2019. He received his Ph.D. degree in computer engineering from the Technical University of Munich (TUM) in 2023. During his studies, he stayed abroad at ETH Zürich and the University of Hong Kong (HKU), where he was involved in developing algorithms for dense prediction (classification, detection, and segmentation), autonomous driving, and robotic grasping. His current research interests include robotics, machine learning, computer vision, event-based vision, and embodied AI.
Wenqi Ren (Senior Member, IEEE) received the Ph.D. degree from Tianjin University, China, in 2017. From 2015 to 2016, he worked with Prof. Ming-Hsuan Yang as a joint training student in the Electrical Engineering and Computer Science Department, University of California at Merced. He is currently a Professor with the School of Cyber Science and Technology, Shenzhen Campus, Sun Yat-sen University. His research interests include image processing and high-level vision problems.
Boxin Shi (Senior Member, IEEE) received the B.E. degree from the Beijing University of Posts and Telecommunications in 2007, the M.E. degree from Peking University in 2010, and the Ph.D. degree from the University of Tokyo in 2013. He is currently a Boya Young Fellow Associate Professor (with tenure) and Research Professor with Peking University, where he leads the Camera Intelligence Lab. Before joining PKU, he did research with MIT Media Lab, Singapore University of Technology and Design, Nanyang Technological University, and the National Institute of Advanced Industrial Science and Technology, from 2013 to 2017. His papers were awarded Best Paper Runner-Up at CVPR 2024 and ICCP 2015 and selected as a Best Paper candidate at ICCV 2015. He is an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence and International Journal of Computer Vision, and an area chair of CVPR/ICCV/ECCV. His research interests include computational photography and computer vision.
Alois C. Knoll (Fellow, IEEE) received the M.Sc. degree in electrical/communications engineering from the University of Stuttgart, Stuttgart, Germany, in 1985, and the Ph.D. degree (summa cum laude) in computer science from the Technical University of Berlin (TU Berlin), Berlin, Germany, in 1988. He was with the Faculty of the Computer Science Department, TU Berlin, until 1993. He joined Bielefeld University, Bielefeld, Germany, as a Full Professor, where he served as the Director of the Technical Informatics Research Group until 2001. Since 2001, he has been a Professor with the Department of Informatics, Technical University of Munich (TUM), Munich. He was also on the Board of Directors of the Central Institute of Medical Technology, TUM (IMETUM). From 2004 to 2006, he was an Executive Director of the Institute of Computer Science, TUM. His research interests include cognitive, medical robotics, multi-agent systems, data fusion, adaptive systems, multimedia information retrieval, model-driven development of embedded systems with applications to automotive software and electric transportation, and simulation systems for robotics and traffic. He was a member of the EU’s Highest Advisory Board on Information Technology, Information Society Technology Advisory Group (ISTAG), and its Future and Emerging Technologies (FET) subgroup from 2007 to 2009.




| Task | Dataset | Year | Type | Domain | Training/Testing | Description |
|---|---|---|---|---|---|---|
| Denoising | Kodak24 [135] | 1999 | r | Natural | -/24 | Clean color images |
| | McMaster [136] | 2011 | r | Natural | -/18 | 18 high-quality color images |
| | CBSD68 [137] | 2001 | r | Natural | -/68 | 68 clean natural images with different noise levels |
| | Urban100 [138] | 2015 | r | Natural | -/100 | 100 high-resolution urban scenes with repetitive structures |
| | DIV2K [139] | 2017 | r | Natural | 800/100 | 1000 high-resolution images |
| | SIDD [140] | 2018 | r | Natural | -/160 | Real-noise image pairs with clean ground truth |
| | PolyU [53] | 2018 | r | Natural | -/40 | Real-noise paired dataset with 40 scenes |
| | WED [141] | 2016 | r&s | Natural | 4744/- | Waterloo Exploration Database |
| | BSD400 [142] | 2010 | r | Natural | 400/- | Training subset from BSD500 |
| | Mayo-2016 [26] | 2016 | r | Medical | 4800/1136 | Paired normal-dose and simulated quarter-dose abdominal CT |
| Deraining | Rain100L [143] | 2017 | s | Natural | 200/100 | Images with light rain effect |
| | Rain100H [143] | 2017 | s | Natural | 1800/100 | Images with heavy rain conditions |
| | Rain800 [144] | 2019 | s | Natural | 700/100 | Images with diverse rain patterns |
| | Rain1400 [145] | 2017 | s | Natural | 12600/1400 | 14 rain streak types |
| | Raindrop [146] | 2018 | r | Natural | 1069/58 | A paired raindrop dataset captured using dual identical glass setups |
| | Outdoor-Rain [147] | 2019 | r&s | Natural | 9000/1500 | A synthetic outdoor rain dataset with streak and accumulation effects |
| | RainDS [148] | 2021 | r&s | Natural | -/5800 | Paired deraining dataset organized as a 4-image set |
| | SSID [149] | 2022 | r&s | Natural | 47600/200 | Semi-supervised image deraining sets |
| | LHP [54] | 2023 | r | Natural | 2100/300 | Largest paired real rain dataset with image resolution |
| Dehazing | FoggyCityscapes [150] | 2018 | s | Natural | 2975/1525 | Paired foggy and clear images |
| | ACDC [151] | 2021 | r | Natural | 1600/2400 | Real-world images captured under adverse conditions |
| | RESIDE [152] | 2018 | r | Natural | 86125/4842 | Real and synthetic data across indoor and outdoor scenarios |
| | NH-HAZE [55] | 2020 | r | Natural | 45/5 | A real paired outdoor dehazing set with non-homogeneous haze |
| | Dense-Haze [153] | 2019 | r | Natural | 45/5 | A real paired dehazing dataset for dense, homogeneous haze |
| Desnowing | RealSnow10K [56] | 2025 | r | Natural | 6406/1047 | Real-world snow removal dataset |
| | Snow100K-L [12] | 2018 | s | Natural | 1872/601 | A single-image snow removal benchmark |
| Deblurring | DPD-blur [14] | 2020 | r | Natural | 350/150 | 500 real defocus blur image pairs |
| | DPD-disp [154] | 2020 | r | Natural | -/350 | Evaluated by reusing the checkpoints trained on the DPD-blur dataset |
| | DDD-syn [155] | 2021 | s | Natural | 10000/1000 | Synthetic deblurring dataset with paired blurry and sharp images |
| | RDPD [156] | 2021 | s | Natural | 18000/1000 | Images captured using a dual-pixel camera |
| | GoPro [15] | 2017 | r&s | Natural | 2103/1111 | Paired images generated from real high-frame-rate GoPro videos |
| LLIE | LOL-v1 [17] | 2018 | r | Natural | 485/15 | Paired low-light and normal-light images captured under controlled conditions |
| | LSRW [57] | 2023 | r | Natural | 445/50 | Paired low-light LR with normal-light HR |
| | DICM [157] | 2013 | r | Natural | -/64 | Low-light images without ground truth for visual comparison |
| | NPE [158] | 2013 | r | Natural | -/85 | Unpaired low-light images |
| | VV [159] | 2018 | r | Natural | -/24 | 24 real-world unpaired low-light images |
| | LOL-v2-real [18] | 2021 | r | Natural | 689/100 | Real paired low-light sets |
| | LOL-v2-syn [18] | 2021 | s | Natural | 900/100 | Synthetic paired low-light sets |
| | MEF [160] | 2015 | r | Natural | -/17 | Multiple images with different exposure levels for the same scene |
| | SICE [161] | 2018 | r | Natural | 360/229 | Multiple reference images of different enhancement levels |
| | LIME [162] | 2016 | r | Natural | -/10 | 10 images without ground truth |
| Underwater | EUVP [24] | 2019 | r | Underwater | 20000/- | Includes both paired and unpaired samples |
| | UIEB [23] | 2019 | r | Underwater | 800/90 | Underwater image enhancement benchmark |
| | RUIE [163] | 2020 | r | Underwater | -/4230 | Real-world underwater image enhancement |
| Super Resolution | Set5 [164] | 2012 | r | Natural | -/5 | 5 real-world natural images |
| | Set14 [165] | 2010 | r | Natural | -/14 | 14 real-world natural images |
| | Manga109 [166] | 2017 | r | Natural | -/109 | 109 real-world manga images |
| | CelebA [167] | 2015 | r | Natural | 162770/19867 | Images with 40 binary attributes |
| | RealSR [168] | 2019 | r | Natural | -/35 | Real-world low- and high-resolution image pairs |
| | DrealSR [169] | 2020 | r | Natural | -/93 | 93 aligned LR-HR image pairs |
| | DIV2K-Val [170] | 2024 | r | Natural | -/100 | 3K patches from the DIV2K validation set |
| | RealSRSet [171] | 2021 | r | Natural | -/20 | Images captured in practical scenarios |
| DIV4K-50 [58] | 2024 | r | Natural | -/50 | distorted images paired with counterparts | |
| DiffusionDB [172] | 2023 | s | Natural | -/100 | Text-to-image prompt gallery sets | |
| AID [173] | 2017 | r | Natural | -/135 | Aerial image dataset | |
| DIOR [174] | 2019 | r | Natural | -/154 | Object detection in optical remote sensing images | |
| DOTA [175] | 2018 | r | Natural | -/183 | Dataset for object detection in aerial images | |
| bcSR [176] | 2023 | r | Medical | -/200 | Pathology images patches from breast cancer whole slide images | |
| US-Case [177] | 2025 | r | Medical | -/111 | Ultrasound cases | |
| Composite | PromptFix [134] | 2024 | r&s | Natural | 101320/- | Paired input–goal–instruction triplets spanning 7 tasks |
| MiO100 [61] | 2024 | r&s | Natural | -/700 | Each image is degraded with 7 single degradation types | |
| AgenticIR [37] | 2025 | r&s | Natural | -/1440 | 16 mixed-degradation combinations (2–3 types) | |
| CleanBench [36] | 2025 | r&s | Natural | 150000/80000 | A large-scale, high-quality instruction-response | |
| MSRS [178] | 2022 | r | Natural | 1163/361 | A multi-spectral IR-VIS paired set | |
| FMB [179] | 2023 | r | Natural | 1220/280 | 1500 aligned pairs | |
| CDD-11 [180] | 2024 | r&s | Natural | 13013/2200 | images selected from the RAISE dataset | |
| TOLED [181] | 2021 | r | Natural | 240/30 | A real paired under-display camera restoration set | |
| AVIRIS [182] | 2024 | r | HSI | 1678/200 | Airborne visible/infrared imaging spectrometer | |
| ARAD [183] | 2022 | r | HSI | 1000/- | A large natural spectral image set |

| Category | Sub-category | Representative Metrics | GT | Usage |
|---|---|---|---|---|
| Full-Reference | Non-learning-based | PSNR, SSIM [38], FSIM [184], MAE, MSE, RMSE, ERGAS [186] | ✓ | Pixel-level fidelity or structural consistency evaluation |
| | Learning-based | LPIPS [40], DISTS [82], CKDN [83], AHIQ [84], TOPIQ-FR [85] | ✓ | Feature-based perceptual similarity |
| | Distribution-based | FID [185] | ✓ | Feature-space distribution alignment |
| No-Reference | Hand-crafted | BRISQUE [86], NIQE [87], PIQE [88], LOE [158], PI [187] | × | Blind perceptual quality estimation |
| | Learning-based | MUSIQ [89], MANIQA [91], NIMA [90], HyperIQA [92], PAQ2-PIQ [188], DBCNN [189], TOPIQ-NR [85], CNNIQA [93] | × | Learning-based NR-IQA |
| | VLM-based | CLIP-IQA [41], QualiCLIP [43], UNIQA [190], LIQE [191], GRMP-IQA [192], PromptIQA [193], ATTIQA [194], SFD [195] | × | Vision-language-aligned perceptual quality evaluation |
| | MLLM-based | DeQA-Score [45], Q-Align [46], Compare2Score [95], Dog-IQA [196], Q-Insight [197], DepictQA [96], DepictQA-Wild [198], IQAGPT [199], Q-Hawkeye [200], Q-Ground [98], Q-Scorer [99], AgenticIQA [48], SEAGULL [97], SCUIA [201], LEAF [202], Q-Ponder [203], CAP-IQA [204], Co-Instruct [205], Q-Instruct [94], Q-Bench [206], Q-Bench+ [207], RALI [208] | × | Language-grounded perceptual reasoning, preference modeling, and quality scoring |
| Evaluation Paradigms | Human-aligned | PLCC, SRCC, KRCC [209], Weighted Kappa [210], Percent Agreement | × | Correlation with human subjective quality perception |
| | Task-oriented | Precision, Recall, F1, mIoU, Accuracy [96] | ✓ | Downstream task performance |
| | Text-based | BLEU-N [211], ROUGE-L [212], METEOR [213], CIDEr [214] | ✓ | Textual or semantic fidelity evaluation |
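
To make two of the metric families in the table above concrete, the following minimal sketch computes a full-reference fidelity score (MSE and PSNR) and the human-alignment correlations (PLCC and SRCC) in plain Python. The pixel values, predicted scores, and mean opinion scores are toy numbers invented for illustration, and the function names are hypothetical rather than taken from any surveyed toolkit. A peak value of 255 assumes 8-bit images.

```python
import math

def mse(ref, out):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((r - o) ** 2 for r, o in zip(ref, out)) / len(ref)

def psnr(ref, out, peak=255.0):
    """PSNR in dB; infinite when the two inputs are identical."""
    err = mse(ref, out)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def srcc(x, y):
    """Spearman rank correlation, computed as PLCC on the ranks."""
    return plcc(ranks(x), ranks(y))

# Toy data (illustrative only, not drawn from any benchmark).
reference = [52, 55, 61, 59]          # ground-truth pixel values
restored = [50, 55, 63, 60]           # restored pixel values
predicted = [0.2, 0.5, 0.9, 0.7]      # quality scores from a hypothetical IQA model
mos = [1.0, 2.0, 5.0, 4.5]            # human mean opinion scores

print(f"MSE  = {mse(reference, restored):.2f}")
print(f"PSNR = {psnr(reference, restored):.2f} dB")
print(f"PLCC = {plcc(predicted, mos):.3f}")
print(f"SRCC = {srcc(predicted, mos):.3f}")
```

In this toy case the predicted scores preserve the ordering of the human scores exactly, so SRCC reaches 1 while PLCC stays below 1 because the relationship is not perfectly linear. This is one reason the two correlations are usually reported together.
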

| Method | Venue | Params | Deraining (Rain100L [143]) PSNR | Deraining (Rain100L [143]) SSIM | Denoising (BSD68 [137], σ=15) PSNR | Denoising (BSD68 [137], σ=15) SSIM | Denoising (BSD68 [137], σ=25) PSNR | Denoising (BSD68 [137], σ=25) SSIM | Denoising (BSD68 [137], σ=50) PSNR | Denoising (BSD68 [137], σ=50) SSIM | Dehazing (SOTS [152]) PSNR | Dehazing (SOTS [152]) SSIM | Average PSNR | Average SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AirNet [219] | CVPR’22 | 9M | 34.90 | 0.967 | 33.92 | 0.933 | 31.26 | 0.888 | 28.00 | 0.797 | 27.94 | 0.962 | 31.20 | 0.910 |
| IDR [220] | CVPR’23 | 15M | 36.03 | 0.971 | 33.89 | 0.931 | 31.32 | 0.884 | 28.04 | 0.798 | 29.87 | 0.970 | 31.83 | 0.911 |
| PromptIR [33] | NeurIPS’23 | 33M | 36.37 | 0.972 | 33.98 | 0.933 | 31.31 | 0.888 | 28.06 | 0.799 | 30.58 | 0.974 | 32.06 | 0.913 |
| AdaIR [221] | ICLR’25 | 29M | 38.64 | 0.983 | 34.12 | 0.934 | 31.45 | 0.892 | 28.19 | 0.802 | 31.06 | 0.980 | 32.69 | 0.918 |
| DSwinIR [222] | T-PAMI’25 | 24M | 37.73 | 0.983 | 34.12 | 0.933 | 31.59 | 0.890 | 28.31 | 0.803 | 31.86 | 0.980 | 32.72 | 0.917 |
| VIVNet [223] | T-PAMI’26 | 7.42M | 38.47 | 0.983 | 34.16 | 0.936 | 31.50 | 0.893 | 28.24 | 0.806 | 32.19 | 0.982 | 32.91 | 0.920 |
| InstructIR-3D [34] | ECCV’24 | 16M | 37.98 | 0.978 | 34.15 | 0.933 | 31.52 | 0.890 | 28.30 | 0.803 | 30.22 | 0.959 | 32.43 | 0.913 |
| VLU-Net [67] | CVPR’25 | 35M | 38.93 | 0.984 | 34.13 | 0.935 | 31.48 | 0.892 | 28.23 | 0.804 | 30.71 | 0.980 | 32.70 | 0.919 |
| Perceive-IR [122] | T-IP’25 | 42M | 38.29 | 0.980 | 34.13 | 0.934 | 31.53 | 0.890 | 28.31 | 0.804 | 30.87 | 0.975 | 32.63 | 0.917 |
| ClearAIR [66] | AAAI’26 | 31M | 38.61 | 0.984 | 34.18 | 0.935 | 31.50 | 0.891 | 28.31 | 0.804 | 31.08 | 0.981 | 32.74 | 0.919 |

| Method | Venue | Params | Dehazing (SOTS [152]) PSNR | Dehazing (SOTS [152]) SSIM | Deraining (Rain100L [143]) PSNR | Deraining (Rain100L [143]) SSIM | Denoising (BSD68 [137]) PSNR | Denoising (BSD68 [137]) SSIM | Deblurring (GoPro [15]) PSNR | Deblurring (GoPro [15]) SSIM | LLIE (LOL [17]) PSNR | LLIE (LOL [17]) SSIM | Average PSNR | Average SSIM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AirNet [219] | CVPR’22 | 9M | 21.04 | 0.884 | 32.98 | 0.951 | 30.91 | 0.882 | 24.35 | 0.781 | 18.18 | 0.735 | 25.49 | 0.847 |
| IDR [220] | CVPR’23 | 15M | 25.24 | 0.943 | 35.63 | 0.965 | 31.60 | 0.887 | 27.87 | 0.846 | 21.34 | 0.826 | 28.34 | 0.893 |
| PromptIR [33] | NeurIPS’23 | 33M | 26.54 | 0.949 | 36.37 | 0.970 | 31.47 | 0.886 | 28.71 | 0.881 | 22.68 | 0.832 | 29.15 | 0.904 |
| AdaIR [221] | ICLR’25 | 29M | 30.53 | 0.978 | 38.02 | 0.981 | 31.35 | 0.888 | 28.12 | 0.858 | 23.00 | 0.845 | 30.20 | 0.910 |
| DSwinIR [222] | T-PAMI’25 | 24M | 30.09 | 0.975 | 37.77 | 0.982 | 31.34 | 0.885 | 29.17 | 0.879 | 22.64 | 0.843 | 30.20 | 0.913 |
| VIVNet [223] | T-PAMI’26 | 7.42M | 31.85 | 0.982 | 38.67 | 0.984 | 31.46 | 0.892 | 28.50 | 0.866 | 23.03 | 0.857 | 30.70 | 0.916 |
| DA-CLIP [112] | ICLR’24 | 125M | 26.28 | 0.939 | 35.91 | 0.972 | 25.77 | 0.653 | 28.81 | 0.882 | 22.57 | 0.832 | 29.23 | 0.898 |
| DiffRes [224] | CVPR’25 | 45M | 27.23 | 0.958 | 37.25 | 0.979 | 32.07 | 0.890 | 29.33 | 0.883 | 23.13 | 0.843 | 29.78 | 0.911 |
| InstructIR-5D [34] | ECCV’24 | 16M | 27.10 | 0.956 | 36.84 | 0.973 | 31.40 | 0.887 | 29.40 | 0.886 | 23.00 | 0.836 | 29.55 | 0.907 |
| VLU-Net [67] | CVPR’25 | 35M | 30.84 | 0.980 | 38.54 | 0.982 | 31.43 | 0.891 | 27.46 | 0.840 | 22.29 | 0.833 | 30.11 | 0.905 |
| Perceive-IR [122] | T-IP’25 | 42M | 28.19 | 0.964 | 37.25 | 0.977 | 31.44 | 0.887 | 29.46 | 0.886 | 22.88 | 0.833 | 29.84 | 0.909 |
| ClearAIR [66] | AAAI’26 | 31M | 30.12 | 0.978 | 38.20 | 0.982 | 31.53 | 0.888 | 29.67 | 0.887 | 22.83 | 0.846 | 30.45 | 0.916 |
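
The Average column in the five-task comparison above can be sanity-checked with a few lines of arithmetic. The sketch below assumes that Average is the unweighted mean of the five per-task scores, which the table does not state explicitly. It uses the AirNet [219] row as input. The variable and function names are illustrative, and the task labels in the dictionary are shorthand for the table columns.

```python
# (PSNR, SSIM) pairs copied from the AirNet [219] row of the five-task table.
airnet_scores = {
    "Dehazing (SOTS)": (21.04, 0.884),
    "Deraining (Rain100L)": (32.98, 0.951),
    "Denoising": (30.91, 0.882),
    "Deblurring (GoPro)": (24.35, 0.781),
    "LLIE (LOL)": (18.18, 0.735),
}
reported_average = (25.49, 0.847)  # Average column of the same row

def mean_scores(scores):
    """Unweighted mean of PSNR and SSIM over all tasks (an assumption)."""
    n = len(scores)
    mean_psnr = sum(p for p, _ in scores.values()) / n
    mean_ssim = sum(s for _, s in scores.values()) / n
    return mean_psnr, mean_ssim

avg_psnr, avg_ssim = mean_scores(airnet_scores)
print(f"recomputed: {avg_psnr:.2f} / {avg_ssim:.3f}")
print(f"reported:   {reported_average[0]:.2f} / {reported_average[1]:.3f}")
```

Because published averages are often computed from unrounded per-task values, differences in the last digit are expected. A larger gap for a given row may indicate a transcription issue or a different averaging protocol.
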
| Method | Venue | PSNR | SSIM |
|---|---|---|---|
| RetinexFormer [30] | ICCV’23 | 25.16 | 0.845 |
| LLFormer [225] | AAAI’23 | 23.65 | 0.8163 |
| CWNet [226] | ICCV’25 | 23.60 | 0.8496 |
| RetinexDiff++ [218] | T-PAMI’25 | 24.67 | 0.867 |
| LLMRA [68] | ECCV’24 | 23.30 | 0.846 |
| DA-CLIP [112] | ICLR’24 | 23.40 | 0.811 |
| DiffRes [224] | CVPR’25 | 24.55 | 0.839 |
| Perceive-IR [122] | T-IP’25 | 23.79 | 0.841 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).