DCT-Domain Detail Image Enhancement for More Resolved Images

This paper develops a detail image signal enhancement method that makes images perceived as clearer and more resolved, and is therefore more effective for higher-resolution displays. We observe that enhancing local variant signals makes images more vivid, and that revealing granular signals harmoniously embedded on the local variant signals makes images appear more resolved. Based on this observation, we develop a method that not only emphasizes the local variant signals by scaling up the frequency energy in accordance with human visual perception, but also strengthens the granular signals by embedding alpha-rooting enhanced frequency components. The proposed energy scaling method emphasizes detail signals in texture images and rarely boosts noisy signals in plain images. In addition, to avoid local ringing artifacts, the proposed method adjusts the enhancement direction to be parallel to the underlying image signal direction. Subjective and objective quality evaluations verified that the developed method makes images perceived as clearer and more highly resolved.


Introduction
Humans perceive sharper images as clearer, and perceive images containing finely resolved signals as higher-resolution images even at the same resolution. Therefore, as image content is increasingly produced at higher quality and presented on higher-resolution displays, state-of-the-art detail enhancement methods need to make images both clearer and finely resolved. Fig. 1 compares the original image with one that is clearer and another that is both clearer and more finely resolved. For better understanding, one-dimensional horizontal signals of the images are plotted. The image in Fig. 1.(b) is clearer than the original because the magnified local variations sharpen it. In the image in Fig. 1.(c), the local variations are magnified and, simultaneously, the granular variations indicated by circles are embedded while the contour of the local variant signals is preserved. The granular signals are the most finely resolved signals that an image display can represent. Owing to the granular variations, humans tend to perceive the image in Fig. 1(c) as having the highest resolution.
Because existing detail enhancement methods only increase local contrast, without distinguishing the different effects of local variant and granular signals, they enhance only image sharpness. Therefore, a detail enhancement method is needed that magnifies local variations while emphasizing granular signals in harmony with the local variant signals.
Detail enhancement methods can be roughly categorized into spatial domain, frequency domain, and learning-based methods. Spatial domain methods focus primarily on boosting local variant signals. Kou et al. separated signals of different resolutions in the wavelet transform domain and then increased the energy ratio of the high-resolution signals [3]. Because granular signals are tiny compared with local variant signals, these methods cannot technically extract the granular signals and thus rarely enhance them. Therefore, although spatial domain methods usually increase local contrast so that images become clearer, they rarely make images finely resolved.
Using their ability to decompose image signals into frequency components, frequency domain methods have focused on increasing the frequency energy of detail signals. The multi-band energy scaling method (MESM) developed by Tang et al. recursively scales up the frequency energy ratio in the discrete cosine transform (DCT) domain as the frequency band increases [4]. This method sharpens local variant signals in a manner consistent with human visual perception, but rarely makes images seem more resolved. The alpha-rooting method raises frequency magnitudes to a fractional power, boosting each component inversely to its original energy [5]; it reveals granular signals well but often produces noisy signals. Celik applied the DCT to the entire image to exploit extremely fine frequency resolution [6]. This method weights higher-frequency components more heavily in proportion to the global variation. Because it processes detail signals globally, it may leave textures insufficiently enhanced or boost noise excessively. Moreover, computing the DCT over an entire image is not easy in practical systems. Therefore, existing frequency domain methods either rarely reveal granular signals or tend to produce noisy signals when they do reveal them.
More recently, learning-based image enhancement methods have been developed. Yan et al. applied a convolutional neural network (CNN) trained on images enhanced by algorithms or human experts [7]. The network mainly improves image visibility rather than enhancing detail image signals. Gharbi et al. designed a bilateral CNN that separately learns global and local variant signals to achieve real-time processing even on mobile devices [8]. The method mainly improves global contrast and color brightness but is limited in generating or inferring granular signals. Chen et al. proposed a GAN-based image enhancement network that overcomes the poor convergence that often occurs in GAN training [9]. Because GAN-based methods are unsupervised approaches, they inherently risk producing unnatural image signals. Moreover, learning-based methods generally require heavier computation than model-based methods.
We develop a frequency domain method that enhances an image so that it is perceived as both clearer and of higher resolution, unlike existing methods that only make images clearer. The proposed method further decomposes detail image signals into local variant and granular signals. To increase the sharpness of local variant signals, we devise a recursive frequency energy scaling-up method based on the perceptual contrast model, which indicates the visual sensitivity to detail signals in the frequency domain. While scaling up the frequency energy, we enhance the frequency components by alpha rooting to embed the granular signals harmoniously on the local variant signals. We also design the energy scaler to emphasize detail image signals in texture images and to suppress the amplification of noisy patterns in plain images. Additionally, to reduce ringing artifacts, we devise a method that tunes the enhancement direction to be parallel to the signal direction in the DCT domain.
The remainder of this paper is organized as follows. Section 2 discusses the perceptual contrast measure in the DCT domain. Section 3 proposes the perceptual contrast increment method that recursively modifies DCT coefficients and presents a method to avoid artifacts and noise boosting. Section 4 evaluates the performance of the proposed method against existing enhancement methods and analyzes the artifacts they cause. Section 5 concludes the paper.

DCT-Domain Human Perceptual Contrast
Many psychological and physiological studies have reported that human visual neurons respond to visual signals as frequency components; thus, human visual perception is primarily affected by the frequency energy distribution of images [10,11]. In the frequency domain, image signal components are also efficiently separated and robustly processed. Therefore, we adopt the DCT as the enhancement platform.
Let f(i, j) be the pixel value at position (i, j) of an N × N DCT block. The DCT coefficient F(u, v) is obtained as
$$F(u,v) = C(u)\,C(v)\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} f(i,j)\cos\frac{(2i+1)u\pi}{2N}\cos\frac{(2j+1)v\pi}{2N}, \qquad C(k)=\begin{cases}\sqrt{1/N}, & k=0,\\ \sqrt{2/N}, & k>0.\end{cases}$$
When the ratio of the viewing distance to the display height is R_d and the number of vertical pixels of the displayed image is Pix, the DCT frequency (u, v) is converted into the spatial frequency ω(u, v), in cycles/degree, under actual viewing conditions as follows:
$$\omega(u,v) = \frac{1}{2N}\sqrt{\left(\frac{u}{\theta}\right)^{2}+\left(\frac{v}{\theta}\right)^{2}}, \qquad \theta = 2\arctan\!\left(\frac{1}{2\,R_d\cdot Pix}\right),$$
where θ is the visual angle subtended by one pixel, expressed in degrees.
Several studies have found that human visual sensitivity varies with spatial frequency, is highest at 3∼5 cycles/degree, and is higher in the vertical and horizontal directions than in diagonal directions because of the oblique effect. These studies have also modeled the visual sensitivity in the DCT domain, referred to as the contrast sensitivity function (CSF) [12], [13]; we denote the DCT-domain CSF at (u, v) by CSF(u, v).
In addition to the CSF, human visual sensitivity at a specific frequency is also affected by the frequency energy distribution of the underlying image. Haun and Peli measured human visual sensitivity to stimuli of different frequencies and orientations and derived a visual sensitivity model called perceptual contrast (PC) [10,11]. The PC at (u, v), denoted PC(u, v), depends on CSF(u, v), the frequency energy of F(u, v), and the background energy B(u, v) at (u, v). The background energy with respect to F(u, v) is the accumulated energy of the frequency components lower than (u, v) [11], [4]; that is,
$$B(u,v) = \sum_{(k,l)\in\Omega(u,v)} F^{2}(k,l),$$
where Ω(u, v) denotes the set of frequency components lower than (u, v). The PC indicates that human visual sensitivity is higher for frequency components with a larger CSF value, lower background energy, and larger frequency energy [10,11]. Whereas existing contrast measures quantify image signal variations in the spatial domain, the PC measures how strongly humans actually perceive each frequency component and thus provides a contrast measure better matched to human visual perception.
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 13 August 2021 doi:10.20944/preprints202108.0286.v1
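The block transform and the viewing-condition conversion above can be sketched in NumPy as follows. This is a minimal sketch, assuming the orthonormal DCT-II and the pixel visual angle θ = 2·arctan(1/(2·R_d·Pix)) written above; the helper names are hypothetical, and the band ordering used for B(u, v) (a component is "lower" when its u + v is smaller) is an illustrative assumption, not the paper's stated definition.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: entry [u, i] = C(u) * cos((2i + 1) u pi / (2n))."""
    i = np.arange(n)
    u = i.reshape(-1, 1)
    c = np.full((n, 1), np.sqrt(2.0 / n))
    c[0, 0] = np.sqrt(1.0 / n)
    return c * np.cos((2 * i + 1) * u * np.pi / (2 * n))


def block_dct(block: np.ndarray) -> np.ndarray:
    """2-D DCT of an N x N block, F = D f D^T (separable row/column transform)."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T


def spatial_frequency(n: int, r_d: float, pix: int) -> np.ndarray:
    """omega(u, v) in cycles/degree for an N x N DCT under the given viewing condition."""
    # Visual angle of one pixel in degrees (viewing distance = r_d * display height).
    theta = np.degrees(2.0 * np.arctan(1.0 / (2.0 * r_d * pix)))
    u = np.arange(n).reshape(-1, 1)
    v = np.arange(n).reshape(1, -1)
    return np.sqrt((u / theta) ** 2 + (v / theta) ** 2) / (2.0 * n)


def background_energy(coeffs: np.ndarray) -> np.ndarray:
    """B(u, v): energy of all components in lower bands (ASSUMPTION: band index = u + v)."""
    n = coeffs.shape[0]
    band = np.add.outer(np.arange(n), np.arange(n))
    energy = coeffs ** 2
    out = np.zeros_like(energy)
    for b in range(2 * n - 1):
        out[band == b] = energy[band < b].sum()
    return out


# Usage: a flat 8 x 8 block on a UHD panel (2160 rows) viewed at 1.5 display heights.
F = block_dct(np.full((8, 8), 128.0))
omega = spatial_frequency(8, r_d=1.5, pix=2160)
B = background_energy(F)
```

For a flat block only the DC coefficient is non-zero, so every AC component sees the DC energy as its background; textured blocks spread energy into higher bands and raise B(u, v) accordingly.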

Development of human perception oriented detail image enhancement
We develop a detail image signal enhancement method that recursively increases the perceptual contrast (PC) while intensifying the granular signals. In addition, to avoid ringing artifacts, we devise a method that adjusts the enhancement direction to be parallel to the image signal direction.

Perceptual Contrast (PC) based Energy Scaling Method
Human visual perception generally prefers images with higher visual sensitivity [14]. To increase the perceived visual sensitivity at each frequency component, we propose a method that recursively scales up the PC as the frequency proceeds from the low to the high band.
The original and enhanced DCT coefficients are denoted by F(u, v) and $\tilde{F}(u, v)$, respectively. Likewise, the perceptual contrasts at (u, v) of the original and enhanced images are denoted by PC(u, v) and $\widetilde{PC}(u, v)$, respectively. Introducing the PC-enhancing scaler λ (> 1), $\widetilde{PC}(u, v)$ is related to PC(u, v) as
$$\widetilde{PC}(u,v) = \lambda\, PC(u,v).$$
By substituting the PC definition into this relation, the enhanced coefficient is expressed in terms of R(u, v), the energy scaling factor. To enhance the granular signals, we exploit the alpha-rooting method, which emphasizes the energy of high-frequency components. The frequency component enhanced by alpha rooting is
$$F_{\alpha}(u,v) = |F(u,v)|^{\alpha-1}\,F(u,v), \qquad 0<\alpha<1,$$
where α is the enhancement factor. As α approaches 0, the higher-frequency components are emphasized more strongly and more granular signals are generated. R(u, v) is then redefined to embed the alpha-rooting enhancement and is recursively updated as the frequency increases. The role of R(u, v) is to adapt the enhancement to the characteristics of the image signals. Since CSF(u, v) is large at the middle-frequency components corresponding to local variant image signals, R(u, v) correspondingly takes large values in these bands, so that it primarily enhances the local variant signals and sharpens the image. Because the alpha-rooting enhanced frequency component is embedded into R(u, v), the granular signals become more visible, while the noise that enhancing high-frequency signals would otherwise amplify is suppressed. Thus, in texture images, which typically have large energy in the middle- and high-frequency components, R(u, v) enhances the detail signals while revealing the granular signals. In plain images containing few detail image signals, the frequency energies of the detail signals are much smaller than CSF(u, v), and CSF(u, v) dominates over the background energy.
Therefore, R(u, v) becomes approximately 1 in all frequency bands, so the method rarely amplifies the noise that enhancing plain image signals would otherwise produce. Fig. 2 compares R(u, v) for texture and plain images and shows the corresponding detail-enhanced images. In the texture image shown in Fig. 2.(a), the proposed method not only increases the local variations but also embeds granular signals in the local variant signals. In the plain image shown in Fig. 2.(b), the proposed method barely alters the image signals and produces no noisy image signals.
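The effect of the enhancement factor α can be illustrated on a few toy DCT magnitudes. The sketch below is a minimal illustration assuming the textbook alpha-rooting form |F|^(α-1)·F given above (for real DCT coefficients the phase reduces to the sign); it does not reproduce the recursive update of R(u, v), and the function names are hypothetical.

```python
import numpy as np


def alpha_root(coeffs: np.ndarray, alpha: float) -> np.ndarray:
    """Alpha rooting |F|^(alpha - 1) * F, computed as sign(F) * |F|^alpha (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    return np.sign(coeffs) * np.abs(coeffs) ** alpha


def relative_spectrum(coeffs: np.ndarray) -> np.ndarray:
    """Magnitudes normalized by the largest one, so spectral shapes can be compared."""
    mag = np.abs(coeffs)
    return mag / mag.max()


# Toy coefficients ordered from low to high frequency, with decaying energy.
F = np.array([100.0, -10.0, 1.0])
for alpha in (1.0, 0.5, 0.25):
    print(alpha, relative_spectrum(alpha_root(F, alpha)))
```

The weakest (highest-frequency) coefficient grows from 1% of the strongest at α = 1 to 10% at α = 0.5 and about 32% at α = 0.25, which is why a smaller α reveals more granular detail and, applied alone, also more noise.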

Signal direction adaptive enhancement
We develop a DCT-domain method to remove ringing artifacts. When directional image structures, such as edges, are not processed parallel to their directions, overshooting occurs and appears as ringing patterns. The ringing artifact becomes more apparent as the enhancement direction becomes more perpendicular to the image direction. To prevent ringing artifacts, the enhancement direction must be parallel to the signal direction.
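The overshoot described above can be reproduced on a 1-D profile taken perpendicular to an edge. The minimal sketch below uses an arbitrary gain that rises linearly with frequency (a hypothetical stand-in, not the paper's scaler), applies it to the DCT coefficients of an 8-sample step, and inverts the transform. A profile taken along the same edge is constant, carries no AC energy, and is therefore left unchanged by the same gain.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; its transpose is the inverse transform."""
    i = np.arange(n)
    u = i.reshape(-1, 1)
    c = np.full((n, 1), np.sqrt(2.0 / n))
    c[0, 0] = np.sqrt(1.0 / n)
    return c * np.cos((2 * i + 1) * u * np.pi / (2 * n))


def boost_across(signal: np.ndarray, slope: float = 0.5) -> np.ndarray:
    """Scale DCT coefficient u by 1 + slope * u (DC untouched), then invert."""
    d = dct_matrix(signal.size)
    coeffs = d @ signal
    gain = 1.0 + slope * np.arange(signal.size)
    return d.T @ (gain * coeffs)


step = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # profile across a vertical edge
flat = np.ones(8)                                         # profile along the same edge
print(np.round(boost_across(step), 3))
print(np.round(boost_across(flat), 3))
```

The enhanced step rises above 1, dips below 0, and is no longer monotonic next to the edge, which is the ringing pattern; the flat profile is returned unchanged.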
When an image signal is oriented more vertically, the energy of its DCT coefficients is more concentrated in the first row. Conversely, the DCT energy of more horizontally oriented signals is concentrated in the first column, and for diagonal signals the DCT coefficients are symmetric. Therefore, the magnitudes of the first-column and first-row DCT coefficients are equivalent to the gradients in the vertical and horizontal directions [15,16]. Let $\nabla_{ver}$ and $\nabla_{hor}$ be the vertical and horizontal gradients obtained from these coefficient magnitudes. As an image signal slants closer to the vertical direction, $\nabla_{ver}$ becomes larger than $\nabla_{hor}$; when an image signal is oriented diagonally, $\nabla_{ver} = \nabla_{hor}$. To adjust the energy scaling direction, the DCT coefficients are decomposed into horizontal and vertical directions, and recursive PC scaling factors are obtained separately for each direction, yielding the direction-wise enhanced DCT coefficients $\bar{F}_{hor}(u, v)$ and $\bar{F}_{ver}(u, v)$. To steer the PC scaling direction parallel to the image signal direction, we weight the vertically and horizontally enhanced DCT coefficients by the corresponding gradients. The proposed direction-adaptive enhanced DCT coefficients are
$$\hat{F}(u,v) = \frac{\nabla_{ver}}{\nabla}\,\bar{F}_{ver}(u,v) + \frac{\nabla_{hor}}{\nabla}\,\bar{F}_{hor}(u,v),$$
where $\nabla = \nabla_{ver} + \nabla_{hor}$. The energies of the DC and low-frequency bands control the overall brightness of a DCT block. Changing the energy in the DC and low-frequency bands induces brightness discontinuities between adjacent blocks, known as block artifacts. Existing methods avoid block artifacts by not scaling the energies of the DC and the first three frequency bands [17]. Following these methods, we set λ = 1, that is, no scaling, for the DC and the first three frequency bands. Fig. 3 shows edge images enhanced by the MESM and the proposed method. Whereas the MESM produces ringing artifacts along the sail edges, the proposed method tunes the DCT-coefficient scaling direction to be parallel with the sail edges and avoids visible ringing artifacts.
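The gradient weighting and the protected low-frequency bands can be sketched as follows. This is a minimal sketch under stated assumptions: ∇_ver and ∇_hor are taken as sums of first-row and first-column AC magnitudes (first row for ∇_ver, following the statement that vertically oriented signals concentrate energy there), frequency bands are indexed by u + v, and the direction-wise enhanced coefficients are supplied by the caller because their recursive form is not reproduced here. All function names are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (rows are basis vectors)."""
    i = np.arange(n)
    u = i.reshape(-1, 1)
    c = np.full((n, 1), np.sqrt(2.0 / n))
    c[0, 0] = np.sqrt(1.0 / n)
    return c * np.cos((2 * i + 1) * u * np.pi / (2 * n))


def directional_gradients(coeffs: np.ndarray) -> tuple[float, float]:
    """ASSUMED measures: grad_ver from the first row, grad_hor from the first column."""
    grad_ver = float(np.abs(coeffs[0, 1:]).sum())
    grad_hor = float(np.abs(coeffs[1:, 0]).sum())
    return grad_ver, grad_hor


def blend_directional(f_ver, f_hor, grad_ver, grad_hor, eps=1e-12):
    """(grad_ver / total) * f_ver + (grad_hor / total) * f_hor, total = grad_ver + grad_hor."""
    total = grad_ver + grad_hor
    if total < eps:  # flat block: no dominant direction, so weight both equally
        return 0.5 * (f_ver + f_hor)
    return (grad_ver / total) * f_ver + (grad_hor / total) * f_hor


def lambda_map(n: int, lam: float, protected_bands: int = 3) -> np.ndarray:
    """lambda = 1 (no scaling) for DC and the first protected bands (ASSUMED band = u + v)."""
    band = np.add.outer(np.arange(n), np.arange(n))
    return np.where(band <= protected_bands, 1.0, lam)


# A vertical edge: columns 4..7 are bright in every row.
d = dct_matrix(8)
edge = np.zeros((8, 8))
edge[:, 4:] = 1.0
coeffs = d @ edge @ d.T
print(directional_gradients(coeffs))
```

For the vertical edge all AC energy lies in the first row, so nearly the whole weight goes to the vertically enhanced coefficients, and the blend stays along the edge instead of across it.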

Outline of the proposed method
In the detail enhancement part, the frequency energies are recursively scaled up in accordance with the human-perceived contrast. The proposed method designs the energy scaler R(u, v) from the perceptual contrast, embedding the alpha-rooting enhanced frequency components. R(u, v) enhances the detail image signals in texture images while suppressing the enhancement of noisy signals in plain images. Additionally, since the alpha-rooting enhanced frequency components are embedded into the perceptual contrast, the granular signals are enhanced in a way better suited to human visual perception.
In the artifact reduction part, the proposed method measures the vertical and horizontal gradients in the DCT domain. To reduce ringing artifacts, it adjusts the frequency energy scaling direction to be parallel to the image signal direction by weighting the vertically and horizontally enhanced DCT coefficients with the corresponding gradients.

Experiment and Discussion
We evaluate the performance of the proposed method through objective and subjective image quality evaluations and an analysis of artifact occurrence. The test images are the Ultra HD images from [18].

Objective and subjective image quality evaluation
The performance of the proposed method was evaluated against recently developed local contrast enhancement methods, including unsharp masking [2], content adaptive image detail enhancement (CAIDE) [1], the multiband energy scaling method (MESM) [17], and the coefficient weight method (CWM) [6]. The unsharp masking method enhances only the detail signals after decomposing the image signals into detail and base signals. The CAIDE exploits global optimization to enhance the detail signals by minimizing the less …
For objective quality evaluation, both image-signal-feature-based and learning-based metrics are used. As a feature-based metric, we adopt the cumulative probability of blur detection (CPBD), which has been reported to produce more stable image quality scores. CPBD is a no-reference image quality assessment metric that collectively measures attributes such as blurring, ringing, and sharpness and yields a numerical image quality score; the closer the score is to 1, the better the image quality perceived by humans. As a deep-learning-based metric, we use the full-reference deep image quality assessment network (DeepIQA) [19]. DeepIQA trains a neural network on subjective image quality to predict the DMOS values of original and enhanced image pairs. Because DeepIQA exploits a perceptual sensitivity map based on human visual perception, it evaluates image quality in accordance with human visual perception. Positive DeepIQA values indicate the degree of quality improvement of the processed images over the original images.
For the subjective evaluation, we followed the stimulus-comparison method with categorical judgment recommended by ITU-R BT.500-11 [20]. Two identical Ultra HD (3840×2160) monitors were placed side by side at a viewing distance of 1.2 m under an ambient illumination of 200 lux. Twenty subjects were invited to compare the quality of the 20 original images with that of the images enhanced by the proposed method and the existing methods. The images were shown adjacent to each other to guarantee blindness. Subjects assigned each pair a score in the range [-3, 3]: a score of -3 indicates that the left sequence has significantly better visual quality than the right sequence, a score of 3 indicates that the right sequence has significantly better visual quality than the left sequence, and a score of 0 indicates that no difference in visual quality is perceived. Table 1 lists the scores of each method. Note that CPBD and DeepIQA are objective measures, whereas MOS is a subjective measure. Although expressed on different scales, the scores show similar patterns, which supports the reliability of the experimental results. Because human perception is described more precisely in the frequency domain, frequency domain methods usually outperform spatial domain methods. Among the frequency domain methods, the proposed method consistently performs competitively on most test images, and it achieves higher scores on images containing more detail image signals. The MOS values of the proposed method and the CWM were in the superior group, and the proposed method performs best especially on Arial and Boat, which contain many textures. Consequently, the proposed method achieves superior perceptual performance across various test images compared with the existing methods. Fig. 5 compares the images enhanced by each method.
The images enhanced by the frequency domain methods appear sharper than those enhanced by the spatial domain methods. While increasing sharpness, the proposed method also embeds granular signals onto the texture signals, so the images it enhances have the most finely resolved detail signals. Consistently, in the MOS test, the subjects selected the image enhanced by the proposed method as the highest-resolution image. Fig. 6 analyzes how each method enhances detail image signals; for better understanding, the horizontal signals of the enhanced images are plotted in one dimension. As shown in the rectangles, the frequency domain methods intensify local variant signals more than the spatial domain methods, which explains their better sharpness. The signals in the circles are the granular signals embedded in the local variant signals. Such granular signals make textures more finely resolved, so that humans perceive the images as having a higher resolution. As shown by the circles in Fig. 6(e) and (f), mainly the proposed method and the CWM generate such signals. The CWM simply emphasizes higher-frequency components over the entire image and thus may generate granular signals that appear as noise in plain images. In contrast, as discussed in Section 3.1, the proposed method increases the high-frequency components according to R(u, v) and therefore does not produce noisy granular signals in plain images.
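As an illustration of how comparison scores such as the MOS values in Table 1 can be summarized, the sketch below computes the mean score and the 95% confidence-interval half-width δ = 1.96·S/√N that ITU-R BT.500 uses for mean opinion scores. It is a minimal sketch that assumes the enhanced image was shown on the right, so positive scores favor the enhancement; the function name is hypothetical and the scores in the usage line are made up for illustration, not data from this study.

```python
import math
from statistics import mean, stdev


def mos_with_ci(scores: list[float]) -> tuple[float, float]:
    """Return (mean score, 95% CI half-width) for comparison scores in [-3, 3]."""
    if len(scores) < 2:
        raise ValueError("at least two scores are needed for a confidence interval")
    if any(not -3 <= s <= 3 for s in scores):
        raise ValueError("scores must lie in [-3, 3]")
    m = mean(scores)
    # Sample standard deviation S (N - 1 in the denominator), as in BT.500.
    delta = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return m, delta


# Hypothetical scores from four subjects for one image pair (not experimental data).
print(mos_with_ci([1, 2, 3, 2]))
```

Reporting δ alongside each MOS makes it possible to judge whether two methods that fall in the same "superior group" actually differ beyond subject-to-subject variation.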

Artifact Analysis
The major artifacts of frequency domain methods are ringing artifacts and noise boosting. Fig. 7 compares the enhanced results in plain, edge, and texture regions.
In texture regions such as grass areas, both the proposed method and the MESM enhance the texture image signals. In plain regions such as cloud areas, however, the MESM produces noisy signals with a mosaic pattern because it increases the frequency energy without considering the image signal features, whereas the proposed method produces no visible noise. This indicates that, through the energy scaler R(u, v), the proposed method properly controls the enhancement according to the underlying image. In edge regions such as wall boundaries, the MESM produces ringing patterns along the edges because it enhances image signals in all directions, including the direction perpendicular to the edge. The proposed method, in contrast, adjusts the enhancement direction to be parallel to the edge direction and rarely generates ringing artifacts.

Conclusion
We exploited the human perceptual contrast, which measures the sensitivity of human visual perception to frequency components. Based on this measure, we developed a frequency energy scaling-up method that not only emphasizes the local variant signals but also strengthens the granular signals embedded in them. Additionally, we developed a method that controls the enhancement strength according to the characteristics of the underlying image signals. To reduce ringing artifacts, we devised a method that adjusts the enhancement direction to be parallel to the signal direction in the DCT domain. As a result, the developed method enhances images so that they are perceived as clearer and finely resolved while avoiding visible artifacts.