Submitted: 07 July 2025
Posted: 08 July 2025
Abstract
Keywords:
1. Introduction
- Handcrafted feature-based methods. An input image can be quantified by a compact feature vector, which serves as the input to a regression model for quality score prediction. For example, Chen et al. extract handcrafted features by computing maximum gradient variations and local entropy across multiple image channels, and combine them to assess image sharpness [10]. Yu et al. treat the outputs of multiple indicators as mid-level features and explore various regression models for predicting perceptual quality [11]. Wu et al. extract multi-stage semantic features from a pre-trained deep network and refine them with a multi-level channel attention module to improve prediction accuracy [12]. Similarly, Chen and Yu use a pre-trained deep network as a fixed feature extractor and evaluate a range of regression models on the BIQA task [13]. These feature-based approaches [10,11,12,13,14] are computationally efficient and can compress a UHD image into a low-dimensional representation, enabling fast quality prediction. However, their reliance on handcrafted or shallow features often limits their representational capacity, making it difficult to capture the rich visual structures and complex distortions present in UHD images.
- Patch-based deep learning methods. Many deep learning-based approaches adopt a patch-based strategy, in which numerous sub-regions (patches) are sampled from an input image and fed into a neural network. The image quality score is then predicted through end-to-end learning that jointly optimizes hierarchical feature representations and network parameters. For instance, Yu et al. develop a shallow convolutional neural network (CNN) in which randomly cropped patches from each image are used to train the network by minimizing a cost function with respect to the corresponding global image quality score [15,16]. Bianco et al. explore features extracted from pre-trained networks, as well as fine-tuning strategies tailored to the BIQA task, and obtain the final image quality score by average-pooling the scores predicted for multiple patches [17]. Ma et al. propose a multi-task, end-to-end framework that simultaneously predicts distortion types and image quality using two sub-networks [18]. Su et al. introduce a hyper-network architecture that divides the BIQA process into three stages: content understanding, perception rule learning, and self-adaptive score regression [19]. While these patch-based methods [15,16,17,18,19] learn visual features in a data-efficient manner, they typically assign the same global quality score to all patches regardless of local variation. This simplification overlooks the spatial heterogeneity and region-specific distortions that are especially pronounced in UHD images.
- Transformer-based methods. To fully exploit image content and mitigate the negative effects of cropping or resizing, BIQA methods based on the Transformer and its variants [20,21,22] have been proposed. These models leverage self-attention to capture both global and local dependencies, which is especially important for handling the complex structures in high-resolution images. For instance, Ke et al. introduce a multi-scale image quality Transformer that takes full-resolution images as input. The model represents image quality at multiple granularities with varying sizes and aspect ratios, and incorporates a hash-based spatial and scale-aware embedding to support positional encoding in the multi-scale representation [23]. Qin et al. fine-tune a pre-trained vision backbone and introduce a Transformer decoder to extract quality-aware features. They further propose an attention panel that enhances performance and reduces prediction uncertainty [24]. Yang et al. design a model that uses image patches for feature extraction, applies channel-wise self-attention, and incorporates a scale factor to model the interaction between global context and local details; the final quality score is computed as a weighted aggregation of patch-level scores [25]. Pan et al. design a semantic attention module for refining quality-perceptual features and introduce a perceptual rule learning module tailored to image content, bringing image semantics into the BIQA process [26]. Despite their superior performance, these Transformer-based models [23,24,25,26] often require substantial computational resources for training and fine-tuning. This high computational cost hinders their widespread application, particularly in real-time or resource-constrained environments.
- Large multi-modal model-based methods. Large multi-modal models (LMMMs) offer promising opportunities for advancing BIQA by integrating visual and textual information into a rich representation of image quality. By leveraging language understanding alongside visual perception, these models can incorporate subjective reasoning, descriptive feedback, and contextual knowledge into the assessment process. For example, You et al. construct a multi-functional BIQA framework that includes both subjective scoring and comparison tasks, and develop an LMMM capable of interpreting user-provided explanations and reasoning to inform the final quality predictions [27]. Zhu et al. train an LMMM across diverse datasets to compare the perceptual quality of multiple anchor images; the final quality scores are derived via maximum a posteriori estimation from a predicted comparison probability matrix [28]. Chen et al. further extend the capabilities of LMMMs by incorporating detailed visual quality analysis from multiple modalities, including the image itself, quality-related textual descriptions, and distortion segmentation. They use multi-scale feature learning to support quality-related question answering and region-specific distortion detection via text prompts [29]. Kwon et al. use an LMMM to generate a large number of attribute-aware pseudo-labels, which allows rich, representative image quality attributes to be learned by fine-tuning on large image datasets; this quality-related knowledge enables several real-world applications [30]. While these LMMM-based frameworks significantly enhance the flexibility and interpretability of BIQA systems, they come with substantial costs. Training such models requires massive amounts of annotated image-text data and high-performance computational resources [27,28,29,30].
In addition, the inference time of LMMMs is often longer than that of conventional deep learning models, which limits their practical deployment in real-time or resource-constrained scenarios.
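
Two of the strategies reviewed above differ only in how per-patch predictions are combined into one image score: average pooling of patch scores [17] and a weighted aggregation of patch-level scores [25]. The minimal Python sketch below contrasts the two. The function names and example numbers are hypothetical; in a learned model such as [25] the weights would be produced by the network rather than supplied by hand.

```python
from typing import Sequence


def average_pool(patch_scores: Sequence[float]) -> float:
    """Image score as the plain mean of per-patch predictions."""
    if not patch_scores:
        raise ValueError("need at least one patch score")
    return sum(patch_scores) / len(patch_scores)


def weighted_pool(patch_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Image score as a normalised weighted sum of per-patch predictions.

    Weights are assumed non-negative; larger weights let the
    corresponding patches dominate the final score.
    """
    if not patch_scores or len(patch_scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(patch_scores, weights)) / total


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.2, 0.85]   # hypothetical per-patch predictions
    weights = [1.0, 1.0, 0.1, 1.0]   # hypothetical weights: down-weight patch 3
    print(average_pool(scores))
    print(weighted_pool(scores, weights))
```

With uniform weights the two functions agree; the weighted form only differs when some patches are judged more informative than others.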
- A novel BIQA framework, namely SUper-resolved Pseudo References In Dual-branch Embedding (SURPRIDE), is proposed; it leverages super-resolution (SR) reconstruction as a self-supervised transformation to generate external quality representations.
- A dual-branch network with a hybrid loss function is implemented. It jointly models intrinsic quality features from the distorted image and comparative cues from the generated pseudo-reference. The hybrid loss function combines cosine similarity and mean squared error (MSE), allowing the network to learn from both absolute quality indicators and relational differences between the input patch pairs.
- Comprehensive experiments are conducted on multiple BIQA benchmarks, including UHD, high-definition (HD), and standard-definition (SD) image datasets. The results demonstrate that SURPRIDE achieves superior or competitive performance compared to state-of-the-art (SOTA) works.
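
The hybrid loss in the second contribution pairs an absolute term (MSE) with a relational term (cosine similarity). Its exact formulation is the subject of Section 2.4 and is not reproduced in this part of the text, so the following is only a minimal sketch under stated assumptions: both terms are computed over a batch of predicted scores and their MOS labels, the cosine term enters as 1 - cos, and a hypothetical weight `lam` balances the two. It illustrates the idea and is not the authors' loss.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error: penalises absolute deviations from the MOS."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a: Sequence[float], b: Sequence[float], eps: float = 1e-12) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / max(norm_a * norm_b, eps)


def hybrid_loss(pred: Sequence[float], target: Sequence[float], lam: float = 0.5) -> float:
    """Assumed form: MSE + lam * (1 - cosine similarity) over one batch."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    return mse(pred, target) + lam * (1.0 - cosine_similarity(pred, target))


if __name__ == "__main__":
    predicted = [0.62, 0.71, 0.55]  # hypothetical batch of predictions
    mos = [0.60, 0.75, 0.50]        # hypothetical MOS labels
    print(hybrid_loss(predicted, mos))
```

Because cosine similarity is unchanged when all predictions are multiplied by the same positive factor, that term rewards getting the relative pattern across the batch right, while the MSE term anchors the absolute scale.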
2. The Proposed SURPRIDE Framework
2.1. One Phenomenon Observed in Image Quality Degradation
2.2. The Proposed Framework
2.2.1. Patch Preparation
2.3. A Dual-Branch Network Architecture
2.4. The Proposed Hybrid Loss Function
3. Materials and Methods
3.1. Databases
3.2. Involved BIQA Models
3.3. Experimental Design
4. Results
4.1. Performance on the UHD-IQA Image Database
4.2. Ablation Studies on the UHD-IQA Image Database
4.2.1. Setting of the Dual-Branch Networks
4.2.2. Determination of the Input Image Sizes
4.2.3. Effect of SR Methods and Scaling Factors
4.2.4. Optimization of Parameters and Configurations
4.2.5. Effect of the Basic Patch Sizes
4.2.6. Effect of the SR Branch
4.2.7. Effect of the Loss Function
4.3. Performance on Several Other Image Databases
4.3.1. Results on Another UHD Image Database
4.3.2. Results on Two HD Image Databases
4.3.3. Results on Two SD Image Databases
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations

| Abbreviation | Definition |
|---|---|
| BIQA | blind image quality assessment |
| UHD | ultra-high-definition |
| SR | super-resolution |
| SURPRIDE | SUper-resolved Pseudo References In Dual-branch Embedding |
| CNN | convolutional neural network |
| LMMM | large multi-modal model |
| ViT | Vision Transformer |
| RNN | recurrent neural network |
| MSE | mean squared error |
| SOTA | state-of-the-art |
| HD | high-definition |
| SD | standard-definition |
| MOS | mean opinion score |
| FC | fully connected |
| PLCC | Pearson’s linear correlation coefficient |
| SRCC | Spearman rank-order correlation coefficient |


References

1. Dai, G.; Wang, Z.; Li, Y.; Chen, Q.; Yu, S.; Xie, Y. Evaluation of no-reference models to assess image sharpness. In Proceedings of the 2017 IEEE International Conference on Information and Automation (ICIA). IEEE, 2017, pp. 683–687.
2. Zhai, G.; Min, X. Perceptual image quality assessment: A survey. Science China Information Sciences 2020, 63(11), 1–52.
3. Yang, P.; Sturtz, J.; Qingge, L. Progress in blind image quality assessment: A brief review. Mathematics 2023, 11, 2766.
4. Chen, L.; Jiang, F.; Zhang, H.; Wu, S.; Yu, S.; Xie, Y. Edge preservation ratio for image sharpness assessment. In Proceedings of the 2016 12th World Congress on Intelligent Control and Automation (WCICA). IEEE, 2016, pp. 1377–1381.
5. Lang, S.; Liu, X.; Zhou, M.; Luo, J.; Pu, H.; Zhuang, X.; Wang, J.; Wei, X.; Zhang, T.; Feng, Y.; et al. A full-reference image quality assessment method via deep meta-learning and conformer. IEEE Transactions on Broadcasting 2023, 70, 316–324.
6. Soundararajan, R.; Bovik, A.C. RRED indices: Reduced reference entropic differencing for image quality assessment. IEEE Transactions on Image Processing 2011, 21, 517–526.
7. Wu, J.; Lin, W.; Shi, G.; Liu, A. Reduced-reference image quality assessment with visual information fidelity. IEEE Transactions on Multimedia 2013, 15, 1700–1705.
8. Ghadiyaram, D.; Bovik, A.C. Massive online crowdsourced study of subjective and objective picture quality. IEEE Transactions on Image Processing 2015, 25, 372–387.
9. Hosu, V.; Agnolucci, L.; Wiedemann, O.; Iso, D.; Saupe, D. UHD-IQA benchmark database: Pushing the boundaries of blind photo quality assessment. In Proceedings of the European Conference on Computer Vision. Springer, 2025, pp. 467–482.
10. Chen, J.; Li, S.; Lin, L. A no-reference blurred colourful image quality assessment method based on dual maximum local information. IET Signal Processing 2021, 15(9), 597–611.
11. Yu, S.; Wang, J.; Gu, J.; Jin, M.; Ma, Y.; Yang, L.; Li, J. A hybrid indicator for realistic blurred image quality assessment. Journal of Visual Communication and Image Representation 2023, 94, 103848.
12. Wu, W.; Huang, D.; Yao, Y.; Shen, Z.; Zhang, H.; Yan, C.; Zheng, B. Feature rectification and enhancement for no-reference image quality assessment. Journal of Visual Communication and Image Representation 2024, 98, 104030.
13. Chen, Z.; Yu, S. Taylor expansion-based Kolmogorov-Arnold network for blind image quality assessment. arXiv preprint arXiv:2505.21592, 2025.
14. Yu, S.; Chen, Z.; Yang, Z.; Gu, J.; Feng, B.; Sun, Q. Exploring Kolmogorov-Arnold networks for realistic image sharpness assessment. In Proceedings of the 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2025, pp. 1–5.
15. Yu, S.; Jiang, F.; Li, L.; Xie, Y. CNN-GRNN for image sharpness assessment. In Proceedings of the Asian Conference on Computer Vision, 2016, pp. 50–61.
16. Yu, S.; Wu, S.; Wang, L.; Jiang, F.; Xie, Y.; Li, L. A shallow convolutional neural network for blind image sharpness assessment. PLoS ONE 2017, 12(5), e0176632.
17. Bianco, S.; Celona, L.; Napoletano, P.; Schettini, R. On the use of deep learning for blind image quality assessment. Signal, Image and Video Processing 2018, 12, 355–362.
18. Ma, K.; Liu, W.; Zhang, K.; Duanmu, Z.; Wang, Z.; Zuo, W. End-to-end blind image quality assessment using deep neural networks. IEEE Transactions on Image Processing 2017, 27, 1202–1213.
19. Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y. Blindly assess image quality in the wild guided by a self-adaptive hyper network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3667–3676.
20. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in Neural Information Processing Systems 2017, 30.
21. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020.
22. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
23. Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 5148–5157.
24. Qin, G.; Hu, R.; Liu, Y.; Zheng, X.; Liu, H.; Li, X.; Zhang, Y. Data-efficient image quality assessment with attention-panel decoder. In Proceedings of the AAAI Conference on Artificial Intelligence, 2023, Vol. 37, pp. 2091–2100.
25. Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. MANIQA: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1191–1200.
26. Pan, L.; Zhang, X.; Xie, F.; Zhang, H.; Zheng, Y. SGIQA: Semantic-guided no-reference image quality assessment. IEEE Transactions on Broadcasting 2024.
27. You, Z.; Gu, J.; Li, Z.; Cai, X.; Zhu, K.; Dong, C.; Xue, T. Descriptive image quality assessment in the wild. arXiv preprint arXiv:2405.18842, 2024.
28. Zhu, H.; Wu, H.; Li, Y.; Zhang, Z.; Chen, B.; Zhu, L.; Fang, Y.; Zhai, G.; Lin, W.; Wang, S. Adaptive image quality assessment via teaching large multimodal model to compare. arXiv preprint arXiv:2405.19298, 2024.
29. Chen, C.; Yang, S.; Wu, H.; Liao, L.; Zhang, Z.; Wang, A.; Sun, W.; Yan, Q.; Lin, W. Q-Ground: Image quality grounding with large multi-modality models. In Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 486–495.
30. Kwon, D.; Kim, D.; Ki, S.; Jo, Y.; Lee, H.E.; Kim, S.J. ATTIQA: Generalizable image quality feature extractor using attribute-aware pretraining. In Proceedings of the Asian Conference on Computer Vision, 2024, pp. 4526–4543.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
32. Bengio, Y.; Simard, P.; Frasconi, P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks 1994, 5, 157–166.
33. Huang, H.; Wan, Q.; Korhonen, J. High resolution image quality database. In Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024.
34. Sun, W.; Zhang, W.; Cao, Y.; Cao, L.; Jia, J.; Chen, Z.; Zhang, Z.; Min, X.; Zhai, G. Assessing UHD image quality from aesthetics, distortions, and saliency. In Proceedings of the European Conference on Computer Vision. Springer, 2025, pp. 109–126.
35. Tan, X.; Zhang, J.; Quan, Y.; Li, J.; Wu, Y.; Bian, Z. Highly efficient no-reference 4K video quality assessment with full-pixel covering sampling and training strategy. In Proceedings of the 32nd ACM International Conference on Multimedia, 2024, pp. 9913–9922.
36. Stern, M.K.; Johnson, J.H. Just noticeable difference. The Corsini Encyclopedia of Psychology 2010, pp. 1–2.
37. Yu, A.; Grauman, K. Just noticeable differences in visual attributes. In Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2416–2424.
38. Wu, J.; Li, L.; Dong, W.; Shi, G.; Lin, W.; Kuo, C.C.J. Enhanced just noticeable difference model for images with pattern complexity. IEEE Transactions on Image Processing 2017, 26, 2682–2693.
39. Ferzli, R.; Karam, L.J. A no-reference objective image sharpness metric based on the notion of just noticeable blur (JNB). IEEE Transactions on Image Processing 2009, 18, 717–728.
40. Ahmed, N.; Asif, S. BIQ2021: A large-scale blind image quality assessment database. Journal of Electronic Imaging 2022, 31, 053010.
41. Virtanen, T.; Nuutinen, M.; Vaahteranoksa, M.; Oittinen, P.; Häkkinen, J. CID2013: A database for evaluating no-reference image quality assessment algorithms. IEEE Transactions on Image Processing 2014, 24, 390–402.
42. Hosu, V.; Lin, H.; Sziranyi, T.; Saupe, D. KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing 2020, 29, 4041–4056.
43. Hosu, V.; Conde, M.V.; Agnolucci, L.; Barman, N.; Zadtootaghaj, S.; Timofte, R.; Sun, W.; Zhang, W.; Cao, Y.; Cao, L.; et al. AIM 2024 challenge on UHD blind photo quality assessment. In Proceedings of the European Conference on Computer Vision. Springer, 2025, pp. 261–286.
44. Zhu, H.; Li, L.; Wu, J.; Dong, W.; Shi, G. MetaIQA: Deep meta-learning for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 14143–14152.
45. Pan, Z.; Zhang, H.; Lei, J.; Fang, Y.; Shao, X.; Ling, N.; Kwong, S. DACNN: Blind image quality assessment via a distortion-aware convolutional neural network. IEEE Transactions on Circuits and Systems for Video Technology 2022, 32, 7518–7531.
46. Gao, Y.; Min, X.; Cao, Y.; Liu, X.; Zhai, G. No-reference image quality assessment: Obtain MOS from image quality score distribution. IEEE Transactions on Circuits and Systems for Video Technology 2024.
47. Zhao, W.; Li, M.; Xu, L.; Sun, Y.; Zhao, Z.; Zhai, Y. A multi-branch network with multi-layer feature fusion for no-reference image quality assessment. IEEE Transactions on Instrumentation and Measurement 2024.
48. Saha, A.; Mishra, S.; Bovik, A.C. Re-IQA: Unsupervised learning for image quality assessment in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 5846–5855.
49. Golestaneh, S.A.; Dadsetan, S.; Kitani, K.M. No-reference image quality assessment via transformers, relative ranking, and self-consistency. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 1220–1230.
50. Shin, N.H.; Lee, S.H.; Kim, C.S. Blind image quality assessment based on geometric order learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 12799–12808.
51. Chen, Z.; Wang, J.; Li, B.; Yuan, C.; Hu, W.; Liu, J.; Li, P.; Wang, Y.; Zhang, Y.; Zhang, C. GMC-IQA: Exploiting global-correlation and mean-opinion consistency for no-reference image quality assessment. arXiv preprint arXiv:2401.10511, 2024.
52. Chen, Z.; Qin, H.; Wang, J.; Yuan, C.; Li, B.; Hu, W.; Wang, L. PromptIQA: Boosting the performance and generalization for no-reference image quality assessment via prompts. In Proceedings of the European Conference on Computer Vision. Springer, 2024, pp. 247–264.
53. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jégou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the International Conference on Machine Learning. PMLR, 2021, pp. 10347–10357.
54. Liu, Z.; Mao, H.; Wu, C.Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 11976–11986.
55. Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SwinIR with fast Fourier convolution and improved training for image super-resolution. arXiv preprint arXiv:2208.11247, 2022.
56. Chen, X.; Wang, X.; Zhang, W.; Kong, X.; Qiao, Y.; Zhou, J.; Dong, C. HAT: Hybrid attention transformer for image restoration. arXiv preprint arXiv:2309.05239, 2023.
57. Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017.
58. Murray, N.; Marchesotti, L.; Perronnin, F. AVA: A large-scale database for aesthetic visual analysis. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012, pp. 2408–2415.
59. Huang, Z.; Liu, H.; Jia, Z.; Zhang, S.; Zhang, Y.; Liu, S. Texture dominated no-reference quality assessment for high resolution image by multi-scale mechanism. Neurocomputing 2025, 636, 130003.
60. Yang, Y.; Liu, C.; Wu, H.; Yu, D. A quality assessment algorithm for no-reference images based on transfer learning. PeerJ Computer Science 2025, 11, e2654.
61. Valicharla, S.K.; Li, X.; Greenleaf, J.; Turcotte, R.; Hayes, C.; Park, Y.L. Precision detection and assessment of ash death and decline caused by the emerald ash borer using drones and deep learning. Plants 2023, 12, 798.
62. König, M.; Seeböck, P.; Gerendas, B.S.; Mylonas, G.; Winklhofer, R.; Dimakopoulou, I.; Schmidt-Erfurth, U.M. Quality assessment of colour fundus and fluorescein angiography images using deep learning. British Journal of Ophthalmology 2024, 108, 98–104.
63. Yao, J.; Yang, B.; Wang, X. Reconstruction vs. generation: Taming optimization dilemma in latent diffusion models. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 15703–15712.
64. Zhou, Y.; Ye, Y.; Zhang, P.; Wei, X.; Chen, M. Exact fusion via feature distribution matching for few-shot image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 8383–8392.
65. Yu, S.; Meng, J.; Fan, W.; Chen, Y.; Zhu, B.; Yu, H.; Xie, Y.; Sun, Q. Speech emotion recognition using dual-stream representation and cross-attention fusion. Electronics 2024, 13, 2191.
66. He, H.; Zhang, J.; Cai, Y.; Chen, H.; Hu, X.; Gan, Z.; Wang, Y.; Wang, C.; Wu, Y.; Xie, L. MobileMamba: Lightweight multi-receptive visual Mamba network. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 4497–4507.
67. Yue, Z.; Liao, K.; Loy, C.C. Arbitrary-steps image super-resolution via diffusion inversion. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 23153–23163.
68. Zhang, L.; You, W.; Shi, K.; Gu, S. Uncertainty-guided perturbation for image super-resolution diffusion model. In Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 17980–17989.

(1) Pre-trained DeiT model: https://huggingface.co/facebook/deit-base-distilled-patch16-384

(2) Pre-trained ConvNeXt model: https://huggingface.co/facebook/convnext-base-384-22k-1k





| Database | Year | No. of images | Resolution | MOS range | Cropped resolution |
|---|---|---|---|---|---|
| UHD-IQA [9] | 2024 | 6,073 | ≈ 3840×2160 | [0,1] | 2560×1440 |
| HRIQ [33] | 2024 | 1,120 | 2880×2160 | [0,5] | 2560×1440 |
| CID [41] | 2014 | 474 | 1600×1200 | [0,100] | 1600×1200 |
| KonIQ-10k [42] | 2020 | 10,073 | 1024×768 | [0,5] | 1024×768 |
| CLIVE [8] | 2015 | 1,162 | ≈ 500×500 | [0,100] | 496×496 |
| BIQ2021 [40] | 2021 | 12,000 | ≈ 512×512 | [0,1] | 512×512 |
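
The last column of the table gives the cropped resolution used for each database, e.g. 2560×1440 for UHD-IQA [9]. Where the crop window is placed is not specified in the table, so the sketch below assumes a centred crop and only computes its box. The function name is hypothetical; the returned (left, upper, right, lower) tuple follows the box convention of Pillow's `Image.crop`.

```python
from typing import Tuple


def center_crop_box(width: int, height: int, crop_w: int, crop_h: int) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) of a centred crop_w x crop_h window.

    Assumes the crop is taken from the image centre. If the image is
    smaller than the crop along an axis, that axis is left uncropped.
    """
    crop_w = min(crop_w, width)
    crop_h = min(crop_h, height)
    left = (width - crop_w) // 2
    upper = (height - crop_h) // 2
    return left, upper, left + crop_w, upper + crop_h


if __name__ == "__main__":
    # A nominal 3840x2160 UHD-IQA image cropped to 2560x1440
    # gives the box (640, 360, 3200, 1800).
    print(center_crop_box(3840, 2160, 2560, 1440))
```

For databases whose cropped resolution equals the native resolution (CID, KonIQ-10k), the box simply covers the whole image.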

| Method | PLCC (testing set) | SRCC (testing set) | PLCC (validation set) | SRCC (validation set) |
|---|---|---|---|---|
| SJTU | 0.7985 | 0.8463 | 0.8238 | 0.8169 |
| GS-PIQA | 0.7925 | 0.8297 | 0.8192 | 0.8092 |
| CIPLAB | 0.7995 | 0.8354 | 0.8136 | 0.8063 |
| EQCNet | 0.7682 | 0.7954 | 0.8285 | 0.8234 |
| MobileNet-IQA | 0.7558 | 0.7883 | 0.7831 | 0.7757 |
| NF-RegNets | 0.7222 | 0.7715 | 0.7968 | 0.7897 |
| CLIP-IQA | 0.7116 | 0.7305 | 0.7069 | 0.6918 |
| ICL | 0.5206 | 0.5166 | 0.5217 | 0.5101 |
| SURPRIDE (ours) | 0.7755 | 0.8133 | 0.7983 | 0.7930 |

| Original branch (rows) / SR branch (columns) | DeiT [53] PLCC | DeiT [53] SRCC | ConvNeXt [54] PLCC | ConvNeXt [54] SRCC |
|---|---|---|---|---|
| DeiT [53] | 0.7188 | 0.7390 | 0.6769 | 0.7054 |
| ConvNeXt [54] | 0.7695 | 0.8073 | 0.7755 | 0.8133 |
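
In the backbone comparison above, each branch has its own backbone, and the strongest pairing uses ConvNeXt [54] for both. How the two branches are fused is not described in this section, so the sketch below assumes the simplest option: the feature vectors of the distorted patch and of its SR pseudo-reference are concatenated and passed through a single fully connected (FC) layer. `toy_features` is a hypothetical two-number stand-in for a real backbone, and the FC weights are made up for illustration.

```python
from typing import Callable, List, Sequence

# A "backbone" here is any function mapping a flattened patch to a feature list.
Backbone = Callable[[Sequence[float]], List[float]]


def toy_features(patch: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a CNN/Transformer backbone.

    Returns [mean intensity, mean absolute difference between neighbours],
    i.e. a crude brightness cue and a crude sharpness cue.
    """
    mean = sum(patch) / len(patch)
    diffs = [abs(b - a) for a, b in zip(patch, patch[1:])]
    sharpness = sum(diffs) / len(diffs) if diffs else 0.0
    return [mean, sharpness]


def dual_branch_score(
    distorted: Sequence[float],
    pseudo_ref: Sequence[float],
    original_branch: Backbone,
    sr_branch: Backbone,
    fc_weights: Sequence[float],
    fc_bias: float,
) -> float:
    """Assumed fusion: concatenate both feature vectors, then one FC layer."""
    fused = original_branch(distorted) + sr_branch(pseudo_ref)  # list concatenation
    if len(fused) != len(fc_weights):
        raise ValueError("fc_weights must match the fused feature length")
    return sum(w * f for w, f in zip(fc_weights, fused)) + fc_bias


if __name__ == "__main__":
    distorted = [0.2, 0.4, 0.2, 0.4]   # hypothetical flattened patch
    pseudo_ref = [0.1, 0.5, 0.1, 0.5]  # hypothetical SR output for that patch
    # Hypothetical weights that compare the sharpness cue of the two branches.
    score = dual_branch_score(distorted, pseudo_ref, toy_features, toy_features,
                              fc_weights=[0.0, 1.0, 0.0, -1.0], fc_bias=1.0)
    print(score)
```

Passing the same `toy_features` function to both branches is only for brevity; the table above shows that the choice of backbone for each branch affects the results.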

| Dual branches | Input size | UHD-IQA [9] PLCC | UHD-IQA [9] SRCC |
|---|---|---|---|
| ConvNeXt | 224 × 224 | 0.7510 | 0.7847 |
| ConvNeXt | 384 × 384 | 0.7755 | 0.8133 |

| SR method | Scaling factor | UHD-IQA [9] PLCC | UHD-IQA [9] SRCC |
|---|---|---|---|
| SwinFIR [55] | × 2 | 0.7722 | 0.8098 |
| SwinFIR [55] | × 4 | 0.7755 | 0.8133 |
| HAT [56] | × 2 | 0.7665 | 0.8111 |
| HAT [56] | × 4 | 0.7765 | 0.8095 |

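
The scaling factor sets how much larger the pseudo-reference is than the patch it is generated from. SwinFIR [55] and HAT [56] are learned networks that cannot be reproduced in a few lines, so the sketch below substitutes plain nearest-neighbour upsampling on a 2-D list of pixel values. It illustrates only the shape change (an h × w patch becomes (s·h) × (s·w) for factor s); unlike a learned SR model, it synthesises no new detail.

```python
from typing import List, Sequence


def nearest_upsample(patch: Sequence[Sequence[float]], scale: int) -> List[List[float]]:
    """Upsample a 2-D patch by an integer factor with nearest-neighbour copying.

    A placeholder for a learned SR model: the output shape is
    (scale * h, scale * w), but no new detail is synthesised.
    """
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    out: List[List[float]] = []
    for row in patch:
        wide = [v for v in row for _ in range(scale)]  # repeat each pixel horizontally
        for _ in range(scale):                         # repeat each row vertically
            out.append(list(wide))
    return out


if __name__ == "__main__":
    patch = [[1, 2], [3, 4]]  # hypothetical 2x2 patch
    for row in nearest_upsample(patch, 2):
        print(row)
```
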

| PLCC/SRCC | PLCC/SRCC | PLCC/SRCC | PLCC/SRCC | PLCC/SRCC |
|---|---|---|---|---|
| 0.766/0.801 | 0.781/0.823 | 0.776/0.813 | 0.762/0.799 | 0.774/0.813 |
| 0.751/0.793 | 0.763/0.804 | 0.769/0.807 | 0.774/0.811 | 0.773/0.816 |
| 0.758/0.796 | 0.778/0.811 | 0.781/0.816 | 0.784/0.819 | 0.774/0.818 |

| Patch size | UHD-IQA [9] PLCC | UHD-IQA [9] SRCC |
|---|---|---|
| 8 × 8 | 0.7741 | 0.8039 |
| 16 × 16 | 0.7755 | 0.8133 |
| 32 × 32 | 0.7720 | 0.8140 |

| Input size | SR branch | UHD-IQA [9] PLCC | UHD-IQA [9] SRCC | CLIVE [8] PLCC | CLIVE [8] SRCC | KonIQ-10k [42] PLCC | KonIQ-10k [42] SRCC |
|---|---|---|---|---|---|---|---|
| 224 × 224 | w/o | 0.7514 | 0.7855 | 0.8241 | 0.7993 | 0.9200 | 0.9082 |
| 224 × 224 | w | 0.7510 | 0.7847 | 0.8764 | 0.8431 | 0.9234 | 0.9113 |
| 384 × 384 | w/o | 0.7682 | 0.8029 | 0.8948 | 0.8551 | 0.9358 | 0.9258 |
| 384 × 384 | w | 0.7755 | 0.8133 | 0.9024 | 0.8662 | 0.9360 | 0.9269 |

| Input size | | UHD-IQA [9] PLCC | UHD-IQA [9] SRCC | CLIVE [8] PLCC | CLIVE [8] SRCC | KonIQ-10k [42] PLCC | KonIQ-10k [42] SRCC |
|---|---|---|---|---|---|---|---|
| 224 × 224 | w/o | 0.7002 | 0.7327 | 0.8492 | 0.8255 | 0.9232 | 0.9104 |
| 224 × 224 | w | 0.7510 | 0.7847 | 0.8764 | 0.8431 | 0.9234 | 0.9113 |
| 384 × 384 | w/o | 0.7701 | 0.8077 | 0.8990 | 0.8642 | 0.9389 | 0.9299 |
| 384 × 384 | w | 0.7755 | 0.8133 | 0.9024 | 0.8662 | 0.9360 | 0.9269 |

| Method | Year | HRIQ [33] PLCC | HRIQ [33] SRCC |
|---|---|---|---|
| HyperIQA [19] | 2020 | 0.848 | 0.848 |
| MANIQA [25] | 2022 | 0.824 | 0.824 |
| HR-BIQA [33] | 2024 | 0.925 | 0.920 |
| TD-HRNet [59] | 2025 | 0.856 | 0.861 |
| SURPRIDE (ours) | 2025 | 0.882 | 0.873 |

| Method | Year | CID [41] PLCC | CID [41] SRCC |
|---|---|---|---|
| MetaIQA [44] | 2020 | 0.7840 | 0.7660 |
| DACNN [45] | 2022 | 0.9280 | 0.9060 |
| GCN-IQD [46] | 2023 | 0.9211 | 0.9095 |
| MFFNet [47] | 2024 | 0.9560 | 0.9530 |
| SURPRIDE (ours) | 2025 | 0.9635 | 0.9647 |

| Method | Year | KonIQ-10k [42] PLCC | KonIQ-10k [42] SRCC |
|---|---|---|---|
| HyperIQA [19] | 2020 | 0.9170 | 0.9060 |
| TReS [49] | 2022 | 0.9280 | 0.9150 |
| ReIQA [48] | 2023 | 0.9230 | 0.9140 |
| QCN [50] | 2024 | 0.9450 | 0.9340 |
| ATTIQA [30] | 2024 | 0.9520 | 0.9420 |
| GMC-IQA [51] | 2024 | 0.9471 | 0.9325 |
| Prompt-IQA [52] | 2024 | 0.9430 | 0.9287 |
| SGIQA [26] | 2024 | 0.9510 | 0.9420 |
| SURPRIDE (ours) | 2025 | 0.9360 | 0.9269 |

| Method | Year | CLIVE [8] PLCC | CLIVE [8] SRCC |
|---|---|---|---|
| HyperIQA [19] | 2020 | 0.8820 | 0.8590 |
| TReS [49] | 2022 | 0.8770 | 0.8460 |
| ReIQA [48] | 2023 | 0.8540 | 0.8400 |
| ATTIQA [30] | 2024 | 0.9160 | 0.8980 |
| GMC-IQA [51] | 2024 | 0.9225 | 0.9062 |
| Prompt-IQA [52] | 2024 | 0.9280 | 0.9125 |
| SGIQA [26] | 2024 | 0.9160 | 0.8940 |
| QCN [50] | 2024 | 0.8930 | 0.8750 |
| SURPRIDE (ours) | 2025 | 0.9024 | 0.8662 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).