1. Introduction
Super-resolution (SR) has gained significant attention in recent years, particularly in medical imaging applications, where the resolution of acquired images is often limited by hardware constraints, time limitations, and patient comfort considerations. Traditional medical imaging modalities such as Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) produce images at a resolution that can restrict the level of detail observable for diagnostic purposes. Increasing the resolution of these images through hardware improvements is often costly and impractical. As a result, computational techniques like SR have emerged as a powerful alternative, allowing high-resolution (HR) images to be reconstructed from low-resolution (LR) inputs without the need for expensive hardware [1,2,3].
The principle behind SR methods is to overcome these hardware limitations by leveraging redundant information from multiple LR images or sequences, often involving complex algorithms such as regularization methods and machine learning models [4]. Various approaches, from classic interpolation methods to more advanced neural network-based models, have been employed to enhance the quality of medical images in terms of spatial resolution, signal-to-noise ratio (SNR), and edge preservation [5].
In medical imaging, SR is particularly valuable because it enhances the quality of images used in diagnostic processes. For instance, MRI scans are used to assess various medical conditions, and improving their resolution can lead to more accurate diagnoses. Using SR techniques like Wiener filter regularization [1] or edge-preserving high-frequency regularization [3] allows for better visual quality in images without increasing acquisition costs or hardware requirements. Furthermore, neural network-based SR methods [2] have demonstrated promising results in improving image quality with reduced computational time, making them suitable for real-time applications in clinical settings.
The concept of SR in medical imaging has evolved significantly over the years, with various methodologies proposed to tackle the challenges of resolution enhancement. One of the early applications of SR to MRI was proposed by Peled et al. [6], where an Iterative Back-Projection (IBP) method was used to enhance MRI images of human white matter fiber tracts. While this method showed some promise, it was limited by the use of synthetic image data, which does not fully capture the complexities of real-world medical imaging. Subsequent work by Scheffler [7] addressed this limitation by highlighting the importance of utilizing original image data for more reliable SR reconstruction.
More recently, the integration of machine learning techniques into SR models has shown great promise. For example, a method combining iterative regularization with feed-forward neural networks was proposed by Babu et al. [2], yielding improved results over previous methods due to its capability to handle noise and produce clearer, higher-resolution images. This method demonstrates the potential of neural networks to enhance SR models by reducing computational complexity while maintaining high image quality.
Bayesian methods have also been a major area of exploration in SR research [8,9,10,11,12]. Aguena et al. [1] introduced a Bayesian approach to MRI SR, which employed a Wiener filter to regularize the iterative solution. This method achieved notable improvements in both noise reduction and edge preservation. Similarly, Ben-Ezra et al. [4] proposed a regularized SR framework for brain MRI, incorporating domain-specific knowledge to improve the quality of SR reconstructions. Their approach outperformed traditional maximum a posteriori (MAP) estimators in terms of both edge clarity and overall image quality.
Moreover, Ahmadi and Salari [3] proposed a high-frequency regularization technique that combines edge-preserving methods with traditional SR models. Their approach allows for enhanced edge definition in MRI images without the need for image segmentation, offering a computationally efficient solution suitable for clinical applications.
The Accelerated Proximal Gradient Method (APGM), as outlined in [13], is a well-known optimization technique commonly used for solving inverse problems in imaging, including SR. APGM accelerates the convergence of proximal gradient methods, which are widely adopted for SR tasks involving regularization. Its primary strength lies in its speed: it converges more quickly than traditional gradient methods, making it suitable for large-scale imaging problems. However, APGM’s effectiveness is heavily dependent on the choice of regularizer, which influences how well the method balances smoothness and sharpness in the reconstructed image. Poorly chosen regularizers can introduce artifacts or excessively smooth the image. While APGM is flexible and powerful, it requires careful tuning to achieve optimal results, especially when handling high-frequency details.
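For concreteness, the sketch below illustrates the accelerated proximal gradient iteration with Nesterov-style momentum. It is a generic outline rather than the implementation of [13]; `grad_f`, `prox_g`, the step size, and the iteration count are placeholders. For SR, `grad_f` would be the gradient of the data-fidelity term and `prox_g` the proximal operator of the chosen regularizer.

```python
import numpy as np

def apgm(grad_f, prox_g, x0, step, n_iter=100):
    """Accelerated proximal gradient: x_{k+1} = prox_g(z_k - step * grad_f(z_k)),
    where z_k is an extrapolated point built with Nesterov momentum."""
    x_prev = x0.copy()
    z = x0.copy()
    t = 1.0
    for _ in range(n_iter):
        x = prox_g(z - step * grad_f(z), step)          # proximal (regularization) step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x + ((t - 1.0) / t_next) * (x - x_prev)     # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev
```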
Block-matching and 3D filtering (BM3D), a renowned denoising algorithm discussed in [14], uses a collaborative filtering approach to reduce noise while preserving image structures. For super-resolution tasks, BM3D can act as a regularizer that effectively manages noise without compromising edges and textures. Its block-matching mechanism compares similar patches in the image, applying 3D filtering to reduce noise in these matched blocks. Although BM3D excels at preserving textures and fine details in natural images, its computational complexity can be high, particularly when dealing with large images or complex noise patterns. Additionally, the block-matching process may struggle in scenarios where image structures do not align well with the blocks, leading to potential loss of detail in areas with intricate textures.
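As an illustration of using BM3D as an off-the-shelf denoiser, the snippet below relies on the community `bm3d` Python package (not necessarily the implementation referenced above); the image and noise level are made-up values.

```python
import numpy as np
import bm3d  # pip install bm3d

rng = np.random.default_rng(0)
clean = rng.random((128, 128))                        # stand-in grayscale image in [0, 1]
noisy = clean + rng.normal(0.0, 0.05, clean.shape)    # additive Gaussian noise

# Collaborative filtering on groups of similar blocks; sigma_psd is the assumed noise std.
denoised = bm3d.bm3d(noisy, sigma_psd=0.05)
```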
Gu et al. [15] built on BM3D’s block-matching idea by introducing weighted nuclear norm minimization (WNNM), which further improved denoising performance and reinforced block matching as a versatile and widely adopted tool in image processing. Although BM3D is primarily an image denoising algorithm rather than a typical regularization technique, denoising often relies on regularization to reduce noise and improve image quality. BM3D utilizes collaborative filtering and 3D transform-domain techniques to achieve denoising, making it more aligned with advanced signal processing than conventional regularization methods.
Total variation (TV) regularization, a widely used technique in inverse problems, aims to promote sparsity in image gradients, leading to smoother regions while preserving sharp edges. TV regularization is known for its simplicity and its ability to retain edge information, making it a popular choice in SR tasks. However, TV regularization often suffers from the staircasing effect, where smooth regions of the image appear blocky or exhibit artificial edges. Over-regularization can further result in a loss of fine details, which limits the technique’s applicability in images with rich textures or high-frequency content [16].
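As a brief illustration, TV-regularized denoising is available off the shelf in scikit-image; the snippet below uses an arbitrary example weight rather than a tuned value.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

rng = np.random.default_rng(0)
noisy = rng.random((128, 128)) + rng.normal(0.0, 0.05, (128, 128))

# Larger `weight` -> stronger smoothing (and more risk of staircasing / detail loss).
tv_denoised = denoise_tv_chambolle(noisy, weight=0.1)
```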
Rapid and Accurate Image Super Resolution (RAISR), introduced in [17], is a learning-based SR method that is both computationally efficient and fast. It works by learning filters that are adaptive to local image features, such as gradients and edge orientations. Unlike deep learning-based methods that often require significant computational resources, RAISR is lightweight and quick, making it an attractive option for real-time applications. Despite its efficiency, RAISR tends to fall short when compared to more advanced SR methods like deep neural networks in terms of recovering high-frequency details. Its performance is highly dependent on the quality of the learned filters, and it may struggle with images that have complex structures or varying noise levels.
PPPV1, as described in [18], is a video super-resolution method based on the Plug-and-Play (PnP) framework. This method iteratively refines images, ensuring the gradual recovery of fine details over multiple iterations. A key feature of PPPV1 is its reliance on a denoising module, originally based on DnCNN (Denoising Convolutional Neural Network), to remove noise from the image during each step of the reconstruction process. While DnCNN is effective at reducing noise, it sometimes introduces oversmoothing, especially in high-frequency regions where texture and fine details are critical. The proposed improvement to this method involves replacing DnCNN with a custom prior for denoising. This change allows for more control over detail preservation and texture recovery, potentially reducing the risk of oversmoothing. A well-designed custom prior can provide a better balance between noise suppression and sharpness, which could lead to more accurate and visually appealing results, particularly in areas with intricate patterns or high-frequency details.
In addition to neural network-based and Bayesian approaches, convex optimization methods have been explored. Kawamura et al. [5] applied convex optimization techniques to MRI SR, producing state-of-the-art results by carefully balancing noise suppression and detail preservation.
Overall, the field of SR in medical imaging is rapidly advancing, with numerous approaches showing great potential in improving diagnostic imaging and reducing the need for high-cost imaging hardware. The next phase of research will likely focus on integrating these various techniques into more robust, real-time systems suitable for clinical environments.
In this paper, we propose an improved PPP regularization method for MRI super-resolution, utilizing an effective prior specifically designed for denoising and handling motion between frames. Building upon the foundation of our previous method (PPPV1 [18]), our approach incorporates an innovative denoiser that significantly enhances performance. Unlike traditional methods that primarily focus on denoising, our method integrates these advances into an MRI super-resolution framework.
2. Materials and Methods
The acquisition model we are assuming is
$$
\mathbf{y} = \mathbf{W}\mathbf{x} + \mathbf{n},
$$
where:
- $\mathbf{y}$ is the full set of low-resolution (LR) frames, described as $\mathbf{y} = [\mathbf{y}_1^T, \mathbf{y}_2^T, \ldots, \mathbf{y}_p^T]^T$, where $\mathbf{y}_1, \ldots, \mathbf{y}_p$ are the $p$ LR images. Each observed LR image is of size $N_1 \times N_2$, and the $k$th LR image is denoted in lexicographic notation as the vector $\mathbf{y}_k$, for $k = 1, \ldots, p$.
- $\mathbf{x}$ is the desired high-resolution (HR) image, of size $L_1 N_1 \times L_2 N_2$, written in lexicographic notation as the vector $\mathbf{x}$, where $L_1$ and $L_2$ represent the up-sampling factors in the horizontal and vertical directions, respectively.
- $\mathbf{n} = [\mathbf{n}_1^T, \ldots, \mathbf{n}_p^T]^T$, where $\mathbf{n}_k$ is the noise vector for frame $k$ and contains independent zero-mean Gaussian random variables.
- $\mathbf{W}$ is the degradation matrix, which performs the operations of blur, rigid transformation, and subsampling.

Assuming that each LR image is corrupted by additive noise, we can then represent the observation model as [19]
$$
\mathbf{y}_k = \mathbf{S}\mathbf{H}\mathbf{T}_k\mathbf{x} + \mathbf{n}_k,
$$
where $\mathbf{T}_k$ is the matrix that performs the rigid transformation, $\mathbf{H}$ represents a blur matrix, and $\mathbf{S}$ is a subsampling matrix. In our case $\mathbf{H} = \mathbf{I}$, since we assumed no added blur on the video frames.
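For intuition, the following NumPy/SciPy sketch (our own illustration, not code from this work; the image size, rotation angle, shift, subsampling factor, and noise level are arbitrary example values) generates one LR frame from an HR image according to this observation model with $\mathbf{H} = \mathbf{I}$:

```python
import numpy as np
from scipy.ndimage import rotate, shift

def degrade(hr, angle_deg, translation, factor, noise_std, rng):
    """Apply a rigid transform (rotation + translation), subsample by `factor`,
    and add zero-mean Gaussian noise, i.e., y_k = S T_k x + n_k with H = I."""
    warped = rotate(hr, angle_deg, reshape=False, order=1)       # T_k: rotation
    warped = shift(warped, translation, order=1)                 # T_k: translation
    lr = warped[::factor, ::factor]                              # S: subsampling
    return lr + rng.normal(0.0, noise_std, size=lr.shape)        # n_k: Gaussian noise

rng = np.random.default_rng(0)
hr = rng.random((256, 256))                                      # stand-in HR image
lr_k = degrade(hr, angle_deg=1.5, translation=(0.7, -0.3), factor=2,
               noise_std=0.01, rng=rng)
```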
The goal is to find the estimate $\hat{\mathbf{x}}$ of the HR image $\mathbf{x}$ from the $p$ LR images $\mathbf{y}_k$ by minimizing the cost function
$$
J(\mathbf{x}) = f(\mathbf{x}) + \lambda R(\mathbf{x}),
$$
where $f(\mathbf{x})$ is the “fidelity to the data” term and $R(\mathbf{x})$ is the regularization term, which offers some prior knowledge about $\mathbf{x}$. In this study, we adopt the Plug-and-Play Priors approach, in which the ADMM algorithm is modified so that the proximal operator related to $R(\mathbf{x})$ is replaced by a denoiser that solves the problem of Eq. (5). The denoiser used is based on the work by Chantas et al. [20].
The following outlines the algorithm we propose:
- 1. The first step of our algorithm is to evaluate the term $\mathbf{T}_k$ from Equation (3) by using rigid registration. Rigid registration, also known as rigid body registration or rigid transformation, is a fundamental technique in medical image processing and computer vision. It is used to align two images by performing translations and rotations while preserving the shape and size of the structures within the images [22].
In a 2D plane, a rigid transformation can be represented using a $3 \times 3$ matrix, often referred to as the transformation matrix. For example, a 2D translation can be represented as [23]:
$$
\mathbf{T}_{\mathrm{trans}} = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{bmatrix},
$$
where $t_x$ and $t_y$ are the displacements along the horizontal and vertical axes. Rotation and reflection matrices can also be formulated similarly. The result of the rigid transformation is represented as an affine transformation matrix. This matrix captures the translation and rotation parameters applied to the original image [23].
We assume that one of the LR images, $\mathbf{y}_c$ (typically the middle one), is produced from the HR image $\mathbf{x}$ by applying only downsampling, without transformation. Thus, $\mathbf{W}_c = \mathbf{S}$. The rigid transformation is calculated between $\mathbf{y}_c$ and each of the remaining LR images. Following that, we get $\mathbf{W}_k = \mathbf{S}\mathbf{T}_k$ for the remaining images.
- 2. The second step employs the PnP-ADMM technique. We execute PnP-ADMM, following the procedure outlined in Algorithm 1 until convergence, in order to minimize the problem described by Eq. (4). The initial HR image guess, $\mathbf{x}^{(0)}$, is generated from the observed LR data using the pseudo-inverse of the degradation matrix. Here, $D$ represents the denoising operator, introduced and discussed in Section 2.1, and $g$ is formulated as the data-fidelity term, $g(\mathbf{x}) = \sum_{k=1}^{p}\|\mathbf{y}_k - \mathbf{W}_k\mathbf{x}\|^{2}$.
|
Algorithm 1 PnP-ADMM [24] |
- 1: Initialize $\mathbf{x}^{(0)}$, $\mathbf{v}^{(0)} = \mathbf{x}^{(0)}$, $\mathbf{u}^{(0)} = \mathbf{0}$, and $\rho > 0$
- 2: for $t = 0, 1, 2, \ldots$ until convergence do
- 3: $\mathbf{x}^{(t+1)} = \arg\min_{\mathbf{x}} \; g(\mathbf{x}) + \frac{\rho}{2}\|\mathbf{x} - (\mathbf{v}^{(t)} - \mathbf{u}^{(t)})\|^{2}$
- 4: $\mathbf{v}^{(t+1)} = D(\mathbf{x}^{(t+1)} + \mathbf{u}^{(t)})$
- 5: $\mathbf{u}^{(t+1)} = \mathbf{u}^{(t)} + \mathbf{x}^{(t+1)} - \mathbf{v}^{(t+1)}$
- 6: end for
- 7: return $\mathbf{v}^{(t+1)}$
|
We next explain the modification made to the standard ADMM algorithm to obtain PnP-ADMM. Line 4 of the standard ADMM is
$$
\mathbf{v}^{(t+1)} = \operatorname{prox}_{R/\rho}\big(\mathbf{x}^{(t+1)} + \mathbf{u}^{(t)}\big).
$$
In the PnP-ADMM, this proximal operator is replaced by a denoiser $D$ that solves the problem of removing zero-mean Gaussian noise of variance $\sigma^{2}$ from the intermediate image $\mathbf{z} = \mathbf{x}^{(t+1)} + \mathbf{u}^{(t)}$. It can be shown that the Maximum A Posteriori (MAP) estimator of the clean image under the prior $p(\mathbf{v}) \propto \exp\{-R(\mathbf{v})\}$ is exactly this proximal operator:
$$
\hat{\mathbf{v}} = \arg\min_{\mathbf{v}} \frac{1}{2\sigma^{2}}\|\mathbf{v} - \mathbf{z}\|^{2} + R(\mathbf{v}) = \operatorname{prox}_{\sigma^{2}R}(\mathbf{z}),
$$
for $\sigma^{2} = 1/\rho$.
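To make the structure of the iteration concrete, the following NumPy sketch of Algorithm 1 is a simplified illustration rather than the SCICO-based implementation used in this work; the `denoise` callable, the operator lists `W_list`/`Wt_list`, the inner gradient solver, and the step sizes are placeholders.

```python
import numpy as np

def pnp_admm(y_list, W_list, Wt_list, denoise, x0, rho=1.0, n_iter=30,
             n_inner=10, step=1e-3):
    """Skeleton of PnP-ADMM: line 4 of the standard ADMM (the proximal step of
    the regularizer) is replaced by the plug-in `denoise` callable.
    W_list / Wt_list hold the per-frame forward operators and their adjoints."""
    x, v, u = x0.copy(), x0.copy(), np.zeros_like(x0)
    for _ in range(n_iter):
        # Line 3: approximately minimize g(x) + (rho/2)||x - (v - u)||^2 by gradient descent
        for _ in range(n_inner):
            grad = sum(Wt(W(x) - y) for y, W, Wt in zip(y_list, W_list, Wt_list))
            x = x - step * (2.0 * grad + rho * (x - (v - u)))
        v = denoise(x + u)   # Line 4: denoising step (the plug-and-play prior)
        u = u + x - v        # Line 5: scaled dual-variable update
    return v
```

In practice, the x-update (line 3) is solved more accurately than with this naive gradient loop, and `denoise` stands in for the operator $D$ of Section 2.1.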
2.1. The Denoising Algorithm
In this section, we describe the algorithm we use to implement the denoising step of Eq. (6). The algorithm is a simplification of that proposed in [20]: it is formulated in a probabilistic (Variational Bayes) context and utilizes an effective prior distribution, which we briefly describe next.
2.1.1. The Prior Distribution
The prior distribution we employ for the denoising step was proposed in [20] for single-image super-resolution. Its distribution parameters are real-positive, and it is built around a similarity measure between two patches with center pixels $w$ and $w + d$, respectively. This distribution is obtained after integrating out the hidden variables of the prior in [20]. However, this form is never explicitly used (it is not necessary) in the optimization algorithm; we show it here only for simplicity of presentation. Indeed, it enables us to interpret the prior in a deterministic context, analogous to the penalty function imposed on the video frames; see Equation (6).
We introduce a similarity measure between two image patches whose central pixels are $w$ and $w + d$, respectively. The complete set of pixel coordinates is represented by $\mathcal{W}$, and $d$ denotes the integer displacement between the center pixels of the two patches. For measuring similarity, we employ a weighted Euclidean norm to quantify the difference between the patch centered at $w$ and the patch centered at $w + d$, as follows:
$$
\rho(w, d) = \|\mathbf{Q}_d\mathbf{x}\|_{\mathbf{A}_w}^{2},
$$
where the weighted norm is defined by
$$
\|\mathbf{z}\|_{\mathbf{A}_w}^{2} = \mathbf{a}_w^{T}\mathbf{z}^{2},
$$
and $\mathbf{z}^{2}$ indicates the vector obtained by squaring each element of $\mathbf{z}$.
$\mathbf{Q}_d$ represents the difference operator, an $N \times N$ matrix, such that the $i$-th component of $\mathbf{Q}_d\mathbf{x}$ equals $x_i - x_{i+d}$ for all $i \in \mathcal{W}$, with circular handling of the image boundaries. The matrix $\mathbf{A}_w$ is an $N \times N$ diagonal matrix whose diagonal elements corresponding to the pixels of the patch centered at $w$, denoted $\mathcal{N}_w$, are the only non-zero values; specifically, $[\mathbf{A}_w]_{ii} = 0$ for all $i$ not in $\mathcal{N}_w$. Lastly, we denote by $\mathbf{a}_w$ the $N \times 1$ vector containing the diagonal of $\mathbf{A}_w$, i.e., the weights of the weighted norm: the closer a pixel is to the central pixel of the patch, the larger its weight.
The norm defined by (8) retains its value even if the summation in (8) runs over only the subset $\mathcal{N}_w$ instead of the full set $\mathcal{W}$, since the weights are zero for pixels outside $\mathcal{N}_w$. However, we use the full summation range over $\mathcal{W}$ to enable fast computations with the Fast Fourier Transform, as explained next.
The distance between the patch centered at $w$ and an arbitrary patch centered at $w + d$ is $\rho(w, d)$. The image circularly shifted by $d$ is denoted simply by $\mathbf{x}_d$ from now on, so that $\mathbf{Q}_d\mathbf{x} = \mathbf{x} - \mathbf{x}_d$. The formula (8) for calculating $\rho(w, d)$, expressed in terms of $\mathbf{x}_d$, is
$$
\rho(w, d) = \mathbf{a}_w^{T}(\mathbf{x} - \mathbf{x}_d)^{2}.
$$
Clearly, the values of $\rho(w, d)$ for all $w$'s are the result of the correlation between the patch weights and $(\mathbf{x} - \mathbf{x}_d)^{2}$, since their indices always differ by the constant $w$. To calculate the correlation required for the super-resolution technique discussed in the following section, we use the Fast Fourier Transform (FFT). This approach decreases the computational complexity of the algorithm from $O(N^{2})$, typical for correlation calculations, to $O(N\log N)$, which is the complexity of multiplication in the DFT (Discrete Fourier Transform) domain.
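A minimal NumPy sketch of this FFT-based computation follows; it is an illustration only, and the Gaussian weight kernel, patch size, and displacement used below are arbitrary example choices rather than the settings of our algorithm.

```python
import numpy as np

def patch_distances(x, d, a_kernel):
    """For every center pixel w, compute rho(w, d): the weighted sum of squared
    differences between the patch around w and the patch around w + d.
    `a_kernel` is an image-sized array holding the patch weights, centered at
    the origin (with wrap-around), so the computation is a circular correlation."""
    x_d = np.roll(x, shift=d, axis=(0, 1))             # circular shift of the image by d
    e = (x - x_d) ** 2                                 # element-wise squared difference
    # circular correlation via the FFT: O(N log N) instead of O(N^2)
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(a_kernel)) * np.fft.fft2(e)))

# example: 5x5 Gaussian-like weights, larger near the patch center
x = np.random.default_rng(0).random((128, 128))
offs = np.arange(-2, 3)
g = np.exp(-0.5 * (offs[:, None] ** 2 + offs[None, :] ** 2))
a_kernel = np.zeros_like(x)
a_kernel[np.ix_(offs, offs)] = g / g.sum()             # weights placed around the origin
rho = patch_distances(x, d=(3, 1), a_kernel=a_kernel)  # rho(w, d) for every center w
```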
2.1.2. Denoising in PnP-ADMM
Next, we describe the algorithm we employ in the PnP-ADMM context of Algorithm 1, and specifically for the denoising step (line 4). The algorithm we employ, as a denoising sub-problem of the general super-resolution algorithm (Algorithm 1), is in essence a special case of the VBPS algorithm in [
20], where there is no blurring nor decimation. Mathematically speaking, this means that the imaging operator
is the
identity matrix
, as shown in line 8 of Algorithm 2.
|
Algorithm 2: Variational Bayes Patch Similarity Denoising |
Input: Noisy image.
Output: Denoised image.
Initialization: Image initial estimate: set the regularization parameter obtained from [21] and take as the initial estimate the corresponding regularized (super-resolved) image. Parameter selection: set the prior parameters, the maximum number of iterations, and the error tolerance err.
- 1: while the maximum number of iterations has not been reached AND the error exceeds err do
- 2: for every displacement $d$ do
- 3:
- 4: for every $w$ do
- 5: calculate the expectations of the model's hidden random variables, using the $\rho(w, d)$ in (8) computed with the image estimate of the previous iteration,
- 6: calculate the corresponding quantities for all $w$ and $d$,
- 7: form the associated term by convolution,
- 8: set the imaging operator to the $N \times N$ identity matrix $\mathbf{I}$,
- 9: obtain the new image estimate by solving the resulting linear system with the Conjugate Gradients algorithm.
- 10: end for
- 11: end for
- 12: end while
- 13: $T = t$; return the final image estimate.
|
More specifically, the imaging model assumed for the denoising step is a simplified form of Eq. (2.1) in [20], because the imaging operator is now just the identity matrix (i.e., no blur or decimation). Also, in this form, the intermediate image $\mathbf{x}^{(t+1)} + \mathbf{u}^{(t)}$ has the role of the “noisy image”, and the uncorrupted image is the one to be estimated by the denoising algorithm. In parallel with the imaging model, we assume the image model, i.e., the prior distribution introduced above and given by Eq. (5); this is in essence the prior distribution for the uncorrupted image to be estimated via the denoising procedure. This means that Algorithm 2 is the result of adopting both the imaging model mentioned above and the prior (5). Lastly, note that the denoising Algorithm 2 selects the noise variance automatically in the initialization step, among other parameters.
We implemented our method in SCICO [25], an open-source library for computational imaging that includes implementations of several algorithms.
To evaluate our method and compare our results to the previously proposed method, we used the widely used, publicly available dataset The Cancer Imaging Archive (TCIA) [26]. Specifically, we conducted experiments using a dataset of LR brain MRI images and a corresponding HR reference dataset.
3. Results
Our method with the effective prior achieved notable improvements in image quality, as demonstrated by Figure 1 and Figure 2.
To objectively evaluate the effectiveness of our improved technique, we calculated the PSNR and conducted comparisons with both alternative approaches and enhanced versions of our own method. Specifically, we compared against PPPV1 [18], APGM (accelerated proximal gradient method) [13], BM3D (block-matching and 3D filtering) [14], total variation [16], RAISR (Rapid and Accurate Image Super Resolution) [17], and MIRNetv2 [27], as well as the pseudo-inverse and the denoised pseudo-inverse images. The difference between PPPV1 and the currently proposed method is that we now use a custom prior instead of DnCNN for the denoising, while the rigid transformation step remains the same. The outcomes, detailed in Table 1, demonstrate that our method surpasses the others in delivering higher image quality.
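For reference, PSNR here denotes the standard peak signal-to-noise ratio; a minimal implementation (assuming images scaled to a known data range) is:

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between an HR reference and an SR estimate."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)
```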
The Wilcoxon signed-rank test was used to compare the PSNR values of the proposed method with the respective values for the PPPV1, pseudo-inverse, denoised pseudo-inverse, APGM, BM3D, and TV methods; RAISR and MIRNetv2 were excluded because no per-slice data was available for them. The results of these statistical tests are shown in Figure 3 and indicate statistically significant differences between the proposed PPP method and the other six methods.
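A paired comparison of this kind can be reproduced with SciPy's implementation of the test; the sketch below uses made-up PSNR values rather than the study's data.

```python
from scipy.stats import wilcoxon

# Paired per-slice PSNR values (dB) -- illustrative numbers only
psnr_proposed = [31.2, 30.8, 32.1, 31.7, 30.9, 31.5]
psnr_pppv1    = [30.1, 29.9, 31.0, 30.6, 30.2, 30.4]

stat, p_value = wilcoxon(psnr_proposed, psnr_pppv1)  # paired, non-parametric test
print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.4f}")
```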
Considering the perceptual quality of the frames, Table 2 shows that our method gives the best results, outperforming the other methods. This outcome demonstrates the robustness and effectiveness of our method in enhancing the perceptual quality of the super-resolved videos for this specific dataset.