LLMSeamCarver: LLM-Enhanced Content-Aware Image Resizing

Dong Liu; Yanxuan Yu; Lianghao Tan; Wenjun Wu; Bide Zhao; Bingjie Lu

doi:10.20944/preprints202412.1897.v4

Submitted: 05 January 2025

Posted: 06 January 2025


Abstract

This paper introduces LLMSeamCarver, an LLM-enhanced methodology for image resizing. Traditional seam carving relies on static, pre-defined parameters; LLMSeamCarver addresses this limitation by using an LLM to make image resizing dynamic and user-controlled. The inclusion of LLMs facilitates dynamic parameter tuning and adaptive energy function adjustment, enhancing the overall robustness and efficiency of image resizing. LLMSeamCarver thus offers a versatile tool for producing high-quality resized images.

Keywords: efficient machine learning; LLMs in vision

Subject: Computer Science and Mathematics - Artificial Intelligence and Machine Learning

I. Introduction

LLMSeamCarver is an image resizing method designed to maintain important image details during resizing. By integrating LLMs, LLMSeamCarver goes beyond traditional methods, enabling a smarter, context-aware resizing process. LLMs enhance tasks such as region prioritization, interpolation, edge detection, and energy calculation by analyzing image context or textual inputs. This allows LLMSeamCarver to preserve crucial image features like faces and text while optimizing resizing for different scenarios.

This paper explores how LLMs improve the accuracy and efficiency of image resizing.

II. Background

i. Image Resizing and LLMs

LLMSeamCarver enhances the traditional seam carving method by incorporating LLMs. The integration of LLMs allows for dynamic resizing of images based on real-time context and user input, rather than relying on fixed parameters. The LLM dynamically optimizes the energy functions that determine how seams are selected and removed, making the resizing process adaptive and context-sensitive. This approach allows for superior preservation of image details and better quality when resizing for specific tasks, such as creating thumbnails or preparing images for various screen sizes.

LLMs contribute by adjusting the parameters based on the content of the image and the desired effect, improving the quality of resized images while providing flexibility. The dynamic optimization of the resizing process using LLMs represents a major leap in the flexibility and efficiency of image resizing techniques.

ii. LLM-Related Work

Recent research in LLMs has demonstrated their potential in a variety of domains, including text, image, and video generation.

LLMs have also shown the potential to enhance traditional image processing workflows through advanced model architectures and optimization techniques. For instance, researchers have developed methods that optimize convolutional neural network (CNN) layers for feature extraction in image processing tasks, thereby improving the performance of deep learning models in image classification and segmentation [2,22]. Zhang et al. (2024) explored the use of LLMs in multi-modal fusion networks, enabling the integration of both visual and textual information, which enhances image analysis tasks [10,11,24,26].

Additionally, work on efficient algorithm design [1,6,24,27,30] and efficient LLMs [17,19] shows promise for efficient model design with LLMs. Through dynamic optimization, LLMs allow for more context-aware resizing by adjusting energy functions during the process. This flexibility ensures that fine-grained details are preserved, which is especially crucial for tasks like content-aware resizing. Studies on text-to-image models have demonstrated how LLMs can modify images based on contextual prompts [7,13,15,16,28], providing further advancements in content-aware image resizing.

iii. Image and Vision Generation Work

The application of deep learning techniques in image and vision generation has seen remarkable progress in recent years [3,4,9,21,23,25,26]. Deep convolutional networks have been effectively utilized for texture classification, achieving high accuracy in distinguishing fine-grained patterns and structures (Bu et al., 2019) [1]. These approaches significantly enhance detail preservation during image resizing by maintaining textures and edges, ensuring that visual fidelity is retained across transformations.

Moreover, advances in multi-modal fusion networks and techniques for image-driven predictions—as demonstrated by Dan et al. (2024)—illustrate how artificial intelligence can process and modify images in real-time [3,14,26]. These methods integrate data from diverse sources, facilitating applications such as enhanced video editing and real-time object tracking.

Additionally, model compression is increasingly favored from both model optimization and system design perspectives, enabling efficient resource usage without sacrificing performance [8,18,20]. These innovations support dynamic, user-controlled visual generation, opening new possibilities for customizable and interactive media content creation.

iv. Image Resizing and Seam Carving Research

The traditional seam carving method was proposed by Avidan and Shamir (2007), and later studies have enhanced it. Kiess (2014) introduced improved edge preservation methods within seam carving [12], which is crucial for ensuring that resized images do not suffer from visible distortions along object boundaries. Zhang (2015) compared classic image resizing methods and found that seam carving provided superior results when compared to simpler resizing techniques, particularly in terms of detail preservation [29].

Frankovich (2011) further advanced seam carving by integrating energy gradient functionals to enhance the carving process, providing even more control over the resizing operation [5].

III. Functionality

LLMSeamCarver leverages LLM-Augmented methods to ensure adaptive, high-quality image resizing while preserving both structural and semantic integrity. The key functionalities are:

LLM-Augmented Region Prioritization: LLMs analyze semantics or textual inputs to prioritize key regions, ensuring critical areas (e.g., faces, text) are preserved.
LLM-Augmented Bicubic Interpolation: LLMs optimize bicubic interpolation for high-quality enlargements, adjusting parameters based on context or user input.
LLM-Augmented LC Algorithm: LLMs adapt the LC algorithm by adjusting weights, ensuring the preservation of important image features during resizing.
LLM-Augmented Canny Edge Detection: LLMs guide Canny edge detection to refine boundaries, enhancing clarity and accuracy based on contextual analysis.
LLM-Augmented Hough Transformation: LLMs strengthen the Hough transformation, detecting structural lines and ensuring the preservation of geometric features.
LLM-Augmented Absolute Energy Function: LLMs dynamically adjust energy maps to improve seam selection for more precise resizing.
LLM-Augmented Dual Energy Model: LLMs refine energy functions, enhancing flexibility and ensuring effective seam carving across various use cases.

IV. LLM-Guided Region Prioritization

To enhance seam carving, we propose a formalized method in which LLMs compute the semantic importance $S (x, y)$ for each pixel, directly modifying the energy map $E (x, y)$. This ensures content-aware resizing with minimal disruption to critical features.

i. Semantic Importance Scoring by LLMs

The semantic importance $S (x, y)$ is computed as:

S (x, y) = g (h (I), q (D)),

(1)

where:

$h (I)$ : Image embedding derived from input image I using a vision feature extractor.
$q (D)$ : Text embedding derived from optional user description D using a language transformer.
g: A cross-modal scoring function combining $h (I)$ and $q (D)$ , implemented via attention mechanisms.

If no description D is provided, $q (D)$ defaults to a generic embedding, allowing g to focus solely on image features.

ii. Energy Map Adjustment

The refined energy map $E^{'} (x, y)$ integrates semantic importance into the standard energy formulation:

\begin{aligned} E^{'} (x, y) & = E (x, y) + α \cdot S (x, y), \\ E (x, y) & = \sqrt{(\nabla I_{x} (x, y))^{2} + (\nabla I_{y} (x, y))^{2}}, \end{aligned}

(2)

where:

$\nabla I_{x} (x, y), \nabla I_{y} (x, y)$ : Gradients in x- and y-directions.
$S (x, y)$ : Semantic score indicating the importance of pixel $(x, y)$ .
$α$ : Weighting factor balancing pixel-based energy and semantic importance.

Because seams follow low-energy paths, the added term $α \cdot S (x, y)$ ensures that higher semantic scores reduce the likelihood of important regions being removed during seam carving.
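The adjustment in Eq. (2) can be illustrated with a minimal sketch in plain Python. The function names, the clamped-border central differences, and the hand-written score map are assumptions made for illustration; in the paper the scores $S (x, y)$ come from the LLM.

```python
import math

def gradient_energy(img):
    """Gradient-magnitude energy E(x, y) of a grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    energy = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences with clamped borders (an assumed boundary rule).
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            energy[y][x] = math.sqrt(gx * gx + gy * gy)
    return energy

def refine_energy(energy, semantic, alpha=1.0):
    """E'(x, y) = E(x, y) + alpha * S(x, y), with S expected in [0, 1]."""
    return [[e + alpha * s for e, s in zip(e_row, s_row)]
            for e_row, s_row in zip(energy, semantic)]

# Flat 3x3 image: the gradient energy is zero everywhere.
image = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
# Hypothetical semantic map marking the centre pixel as important.
scores = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
refined = refine_energy(gradient_energy(image), scores, alpha=5.0)
```

In this toy case the gradient term alone cannot distinguish any pixel, so the semantic term is the only thing that protects the centre pixel from removal.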

iii. Semantic Score Calculation with LLMs

To compute $S (x, y)$, the LLM processes image embeddings $h (I)$ and textual embeddings $q (D)$:

h (I) = VisionEncoder (I),

(3)

q (D) = TextEncoder (D),

(4)

S (x, y) = Attention (h (I), q (D)),

(5)

where:

VisionEncoder: Extracts regional features from I (e.g., object locations, edges).
TextEncoder: Encodes user-provided descriptions into contextual embeddings.
Attention: Combines $h (I)$ and $q (D)$ to assign $S (x, y)$ based on pixel relevance.

The score $S (x, y)$ is normalized to the range $[0, 1]$ for compatibility with $E (x, y)$.

iv. Cumulative Energy Map Update

The cumulative energy map $E^{'} (x, y)$ propagates semantic adjustments through the seam carving process. For a given pixel $(i, j)$, the update is:

\begin{aligned} E^{'} (i, j) = {} & E (i, j) + α \cdot S (i, j) \\ & + \min \{ C_{L} (i, j) + E^{'} (i - 1, j - 1), \\ & \qquad C_{R} (i, j) + E^{'} (i - 1, j + 1), \\ & \qquad C_{U} (i, j) + E^{'} (i - 1, j) \}, \end{aligned}

(6)

where:

$C_{L}, C_{R}, C_{U}$ : Left, right, and upward cost terms adjusted with $S (x, y)$ .
min: Ensures the optimal seam path minimizes distortion of high-priority regions.
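The recurrence in Eq. (6) is a dynamic program over rows. The sketch below is one possible reading of it, with the cost terms $C_{L}, C_{R}, C_{U}$ set to zero (an assumption, since their exact form is not given) and with hypothetical function names. It takes an already refined energy map and backtracks the minimum-energy vertical seam.

```python
def cumulative_energy(energy):
    """Cumulative map: each cell adds the cheapest of its three upper neighbours.

    The cost terms C_L, C_R, C_U of Eq. (6) are taken as zero (an assumption).
    """
    h, w = len(energy), len(energy[0])
    cum = [row[:] for row in energy]
    for i in range(1, h):
        for j in range(w):
            lo, hi = max(j - 1, 0), min(j + 1, w - 1)
            cum[i][j] += min(cum[i - 1][lo:hi + 1])
    return cum

def find_vertical_seam(energy):
    """Return one column index per row for the minimum-energy vertical seam."""
    cum = cumulative_energy(energy)
    h, w = len(cum), len(cum[0])
    # Start from the cheapest cell in the last row and walk upwards.
    j = min(range(w), key=lambda c: cum[-1][c])
    seam = [j]
    for i in range(h - 1, 0, -1):
        lo, hi = max(j - 1, 0), min(j + 1, w - 1)
        j = min(range(lo, hi + 1), key=lambda c: cum[i - 1][c])
        seam.append(j)
    return seam[::-1]

# Refined energy E + alpha * S: the middle column is "important" (high energy).
refined = [
    [1.0, 9.0, 2.0],
    [1.0, 9.0, 2.0],
    [1.0, 9.0, 2.0],
]
seam = find_vertical_seam(refined)
```

The seam runs down the left column, avoiding the high-energy middle column that stands in for a semantically important region.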

V. LLM-Augmented Bicubic Interpolation

The LLM-Augmented Bicubic Interpolation method enhances traditional bicubic interpolation by incorporating semantic importance scores $S (x, y)$ derived from LLMs. This approach ensures content-aware resizing, prioritizing regions of high semantic value such as faces, text, and objects.

i. Traditional Bicubic Interpolation

In standard bicubic interpolation, the value of a pixel at position $(x, y)$ is computed using a 4x4 grid of neighboring pixels. The interpolation weights $w (x)$ are determined by the relative distances between the target pixel and its neighbors:

w (x) = \begin{cases} (a + 2) | x |^{3} - (a + 3) | x |^{2} + 1, & \text{if } | x | \leq 1, \\ a | x |^{3} - 5 a | x |^{2} + 8 a | x | - 4 a, & \text{if } 1 < | x | < 2, \\ 0, & \text{otherwise}, \end{cases}

(7)

where a is a constant (commonly $a = -0.5$) controlling the interpolation smoothness.

The pixel value at position $(X, Y)$ is computed as:

B (X, Y) = \sum_{i = 0}^{3} \sum_{j = 0}^{3} a_{i j} \cdot w (x - X_{i}) \cdot w (y - Y_{j}),

(8)

where $a_{i j}$ are the pixel intensities in the 4x4 grid.
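A minimal sketch of the kernel in Eq. (7) and the separable sum in Eq. (8) follows. The function names and the convention that the four samples sit at positions -1, 0, 1, 2 relative to the target are assumptions made for illustration.

```python
def cubic_weight(x, a=-0.5):
    """Bicubic convolution kernel w(x) from Eq. (7)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def interpolate_1d(samples, t, a=-0.5):
    """Interpolate between samples[1] and samples[2] at fraction t in [0, 1].

    samples holds four consecutive values located at positions -1, 0, 1, 2.
    """
    return sum(v * cubic_weight(t - pos, a)
               for v, pos in zip(samples, (-1, 0, 1, 2)))

def interpolate_2d(grid, tx, ty, a=-0.5):
    """Separable bicubic interpolation over a 4x4 grid (rows indexed by y)."""
    rows = [interpolate_1d(row, tx, a) for row in grid]
    return interpolate_1d(rows, ty, a)

# The four weights at any fractional offset sum to one.
weights = [cubic_weight(0.3 - pos) for pos in (-1, 0, 1, 2)]
midpoint = interpolate_1d([0.0, 1.0, 2.0, 3.0], 0.5)
```

Because the weights sum to one, a constant image stays constant under interpolation, and with $a = -0.5$ a linear ramp is reproduced exactly.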

ii. LLM-Augmented Interpolation

In the augmented method, the interpolation weights are modified to incorporate semantic importance:

w^{'} (x) = w (x) \cdot (1 + β \cdot S (x, y)),

(9)

where:

$S (x, y)$ : Semantic importance score for pixel $(x, y)$ , computed by the LLM.
$β$ : Scalar factor controlling the influence of $S (x, y)$ on the interpolation process.

The updated pixel value is then calculated as:

B^{'} (X, Y) = \sum_{i = 0}^{3} \sum_{j = 0}^{3} a_{i j} \cdot w^{'} (x - X_{i}) \cdot w^{'} (y - Y_{j}),

(10)

ensuring higher weights for regions with greater semantic importance.
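The effect of Eq. (9) on a single row of four neighbors can be sketched as below. The per-neighbor score list, the function names, and in particular the optional renormalization step are assumptions added here: Eq. (9) as written scales each weight, so the scaled weights no longer sum to one unless they are renormalized.

```python
def cubic_weight(x, a=-0.5):
    """Bicubic kernel of Eq. (7)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def semantic_weights(t, scores, beta=0.5, normalize=True):
    """Weights w'(x) = w(x) * (1 + beta * S) for neighbours at -1, 0, 1, 2.

    scores holds one semantic score in [0, 1] per neighbour. Renormalising so
    the weights sum to one is an assumption of this sketch, not part of Eq. (9).
    """
    raw = [cubic_weight(t - pos) * (1 + beta * s)
           for pos, s in zip((-1, 0, 1, 2), scores)]
    if not normalize:
        return raw
    total = sum(raw)
    return [w / total for w in raw]

plain = semantic_weights(0.5, [0, 0, 0, 0])
boosted = semantic_weights(0.5, [0, 1, 0, 0], beta=0.5)
unnormalized = semantic_weights(0.5, [0, 1, 0, 0], beta=0.5, normalize=False)
```

With renormalization, the neighbor marked as important gains influence at the expense of the others while overall brightness is preserved; without it, the interpolated value is scaled up as well.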

iii. Semantic Importance Calculation

The semantic score $S (x, y)$ is derived using the LLM:

S (x, y) = f_{LLM} (I, D),

(11)

where:

I: Input image.
D: Optional user-provided description specifying priorities (e.g., "preserve faces").
$f_{LLM}$ : A function combining image embeddings $h (I)$ and text embeddings $q (D)$ through an attention mechanism:

$S (x, y) = Attention (h (I), q (D)) .$

(12)

iv. Cumulative Interpolation Update

The augmented interpolation integrates the adjusted weights:

f^{'} (x, y) = \sum_{i = 0}^{3} \sum_{j = 0}^{3} f (x_{i}, y_{j}) \cdot w^{'} (x - x_{i}) \cdot w^{'} (y - y_{j}),

(13)

where $w^{'} (x)$ and $w^{'} (y)$ ensure greater emphasis on semantically significant regions.

VI. LLM-Augmented LC (Loyalty-Clarity) Policy

The LLM-Augmented LC Policy improves contrast-based resizing by integrating semantic importance parameters from LLMs. These parameters ensure that important regions, such as faces and text, receive higher priority during resizing.

i. Global Contrast with Semantic Guidance

We calculate the global contrast of a pixel $I_{k}$ as:

C^{'} (I_{k}) = \sum_{\forall I_{i} \in I} ∥ I_{k} - I_{i} ∥ + λ \cdot \sum_{r \in R} w_{r} \cdot ∥ I_{k} - I_{r} ∥,

(14)

where:

$λ$ scales the influence of semantic importance, computed by the LLM.
$w_{r}$ represents the semantic weight of region r, also derived from the LLM.
R is the set of semantically significant regions.

This formulation integrates semantic relevance into the contrast computation, preserving critical regions.

ii. Frequency-Based Contrast Refinement

The traditional frequency-based contrast uses intensity distributions:

C (I_{k}) = \sum_{n = 0}^{255} f_{n} \cdot ∥ a_{m} - a_{n} ∥,

(15)

where $f_{n}$ is the frequency of intensity $a_{n}$ and $a_{m}$ is the intensity of pixel $I_{k}$. We extend this by including semantic importance:

C^{'} (I_{k}) = C (I_{k}) + λ \cdot \sum_{r \in R} w_{r} \cdot f_{r} \cdot ∥ a_{m} - a_{r} ∥,

(16)

where:

$f_{r}$ adjusts the frequency of region r based on the LLM’s analysis.
$w_{r}$ weights the region according to its semantic importance.
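Eqs. (15) and (16) can be sketched with a histogram, since the contrast depends only on a pixel's intensity. The function names, the tuple layout for regions, and the reading of $a_{r}$ as a representative intensity of region r are assumptions; the region parameters would come from the LLM and are supplied by hand here.

```python
def lc_contrast_table(pixels, levels=256):
    """Map each intensity a_m to C = sum_n f_n * |a_m - a_n| (Eq. 15)."""
    freq = [0] * levels
    for p in pixels:
        freq[p] += 1
    return [sum(f * abs(m - n) for n, f in enumerate(freq) if f)
            for m in range(levels)]

def semantic_contrast(pixels, regions, lam=0.5, levels=256):
    """Add the semantic term of Eq. (16) to the histogram contrast.

    regions is a list of (w_r, f_r, a_r) tuples: semantic weight, adjusted
    frequency, and representative intensity of region r (assumed layout).
    """
    base = lc_contrast_table(pixels, levels)
    return [c + lam * sum(w * f * abs(m - a) for w, f, a in regions)
            for m, c in enumerate(base)]

pixels = [0, 0, 0, 255]  # flattened 8-bit grayscale image
base = lc_contrast_table(pixels)
boosted = semantic_contrast(pixels, regions=[(1.0, 2.0, 255)], lam=0.5)
```

The table has one entry per intensity level, so the contrast of every pixel is obtained by a lookup rather than by comparing it with every other pixel.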

iii. LLM Parameter Computation

The LLM computes $w_{r}$, $f_{r}$, and $λ$ through embeddings and attention mechanisms:

Image Embeddings $h (I)$ : A vision encoder extracts pixel-level and global features:

$h (I) = VisionEncoder (I) .$

(17)
Text Embeddings $q (D)$ : A text encoder processes user descriptions:

$q (D) = TextEncoder (D) .$

(18)
Spatial Embeddings $ξ_{r}$ : Positional embeddings represent region-specific attributes:

$ξ_{r} = PositionalEmbedding (r) .$

(19)
Semantic Weights $w_{r}$ : Attention mechanisms combine embeddings:

$w_{r} = Attention (h (I), q (D), ξ_{r}) .$

(20)
Frequency Adjustment $f_{r}$ : The LLM refines the frequency distribution:

$f_{r} = \frac{\sum_{p \in r} f (p)}{Area (r)} .$

(21)
Scaling Factor $λ$ : A sigmoid function ensures $λ \in [0, 1]$ :

$λ = σ (g (h (I), q (D))) .$

(22)

iv. Contrast-Based Resizing Decision

The adjusted contrast $C^{'} (I_{k})$ guides the resizing process. The seam path is chosen to minimize distortion in semantically important regions:

\min \{ \sum_{k \in P} C^{'} (I_{k}) \},

(23)

where P represents the seam path.

Figure 1. Outlier detected by the LLM-LC algorithm

VII. LLM-Augmented Canny Edge Detection

The LLM-Augmented Canny Edge Detection integrates semantic importance parameters $S_{LLM}$, $σ_{LLM}$, and $T_{LLM}$ derived from a Large Language Model (LLM) to enhance edge detection. This approach ensures that edges in critical regions, such as faces or text, are prioritized during resizing.

i. Gaussian Filtering

The image is smoothed using a Gaussian filter:

G (x, y) = \frac{1}{2 π σ^{2}} e^{- \frac{x^{2} + y^{2}}{2 σ^{2}}} .

(24)

The LLM adjusts the smoothing factor $σ$ for semantically important regions:

σ_{LLM, r} = σ \cdot (1 + α \cdot S_{LLM, r}),

(25)

where:

$S_{LLM, r}$ : Semantic importance score for region r.
$α$ : Scaling factor for the semantic influence.

ii. Gradient Calculation

Gradients are computed using the Sobel operator:

G (i, j) = \sqrt{G_{x} {(i, j)}^{2} + G_{y} {(i, j)}^{2}},

(26)

θ (i, j) = arctan (\frac{G_{y} (i, j)}{G_{x} (i, j)}),

(27)

where $G_{x} (i, j)$ and $G_{y} (i, j)$ are the horizontal and vertical derivatives.

The LLM adjusts the gradient magnitude for each pixel:

G_{LLM} (i, j) = G (i, j) \cdot S_{LLM} (i, j) .

(28)
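Eqs. (26) to (28) can be sketched in plain Python as follows. The function names and the decision to leave border pixels at zero are assumptions, and `math.atan2` is used in place of the plain arctangent of Eq. (27) so that the case $G_{x} = 0$ does not divide by zero.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_at(img, i, j):
    """Return (G_x, G_y) at interior pixel (row i, column j)."""
    gx = gy = 0
    for di in range(3):
        for dj in range(3):
            v = img[i + di - 1][j + dj - 1]
            gx += SOBEL_X[di][dj] * v
            gy += SOBEL_Y[di][dj] * v
    return gx, gy

def weighted_gradient(img, scores):
    """G_LLM(i, j) = G(i, j) * S_LLM(i, j) and the direction theta(i, j).

    Only interior pixels are processed; borders stay at zero (an assumption).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    theta = [[0.0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            gx, gy = sobel_at(img, i, j)
            mag[i][j] = math.hypot(gx, gy) * scores[i][j]
            theta[i][j] = math.atan2(gy, gx)
    return mag, theta

# Vertical step edge; the second column carries a lower hypothetical score.
img = [[0, 0, 10, 10]] * 4
scores = [[1.0, 0.5, 1.0, 1.0]] * 4
mag, theta = weighted_gradient(img, scores)
```

Both interior columns see the same raw gradient of 40, but the lower score halves the response in the second column, which is how the semantic weighting changes which edges survive later thresholding.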

iii. LLM Semantic Parameter Calculation

The semantic scores $S_{LLM} (i, j)$, smoothing factor $σ_{LLM, r}$, and thresholds $T_{LLM, high}$ and $T_{LLM, low}$ are computed as follows:

h (I) = VisionEncoder (I),

(29)

q (D) = TextEncoder (D),

(30)

ξ_{i, j} = PositionalEmbedding (i, j),

(31)

S_{LLM} (i, j) = Attention (h (I), q (D), ξ_{i, j}),

(32)

σ_{LLM, r} = σ \cdot (1 + α \cdot S_{LLM, r}),

(33)

T_{LLM, high} = T_{high} \cdot (1 + β \cdot S_{LLM, r}),

(34)

T_{LLM, low} = T_{low} \cdot (1 + β \cdot S_{LLM, r}) .

(35)

iv. Edge Refinement

Edges are refined using non-maximum suppression and hysteresis thresholding. The final edge map is defined as:

G_{LLM} (i, j) = \begin{cases} 0, & \text{if } G_{LLM} (i, j) < T_{LLM, high}, \\ G_{LLM} (i, j), & \text{otherwise} . \end{cases}

(36)
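A small sketch of the threshold adaptation in Eqs. (34) and (35) and the cut-off in Eq. (36) is given below. The function names and sample values are hypothetical, and the sketch applies only the high threshold, exactly as Eq. (36) is written; non-maximum suppression and the low-threshold hysteresis pass are not shown.

```python
def adapted_thresholds(t_low, t_high, score, beta=0.5):
    """Eqs. (34)-(35): scale both thresholds by (1 + beta * S_LLM,r)."""
    scale = 1 + beta * score
    return t_low * scale, t_high * scale

def threshold_edges(mag, t_high):
    """Eq. (36): zero every gradient magnitude below the high threshold."""
    return [[g if g >= t_high else 0.0 for g in row] for row in mag]

# Region score 1.0 with beta = 0.5 scales the base thresholds by 1.5.
t_low, t_high = adapted_thresholds(50.0, 100.0, score=1.0, beta=0.5)
edges = threshold_edges([[90.0, 160.0], [149.0, 150.0]], t_high)
```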

Figure 2. Original Image

Figure 3. Edges Detected by the Canny Detector (with LLM Augmentation)

VIII. LLM-Augmented Hough Transformation

The LLM-Augmented Hough Transformation integrates semantic importance $S_{LLM} (x, y)$ derived from LLMs to enhance line detection. This ensures that key structural features, such as text and faces, are preserved during image resizing.

i. Mathematical Formulation

The Hough Transformation maps edge pixels $(x, y)$ to Hough space. For each pixel, the radial distance r is computed as:

r = x \cos (θ) + y \sin (θ),

(37)

where $θ$ is the angle of the line in Hough space.

In the LLM-Augmented Hough Transformation, the accumulator $A (r, θ)$ is updated as:

A (r, θ) \leftarrow A (r, θ) + S_{LLM} (x, y),

(38)

where:

$A (r, θ)$ : Accumulator value for the line parameterized by $(r, θ)$ .
$S_{LLM} (x, y)$ : Semantic importance score computed for pixel $(x, y)$ .

ii. Semantic Score Calculation

The semantic score $S_{LLM} (x, y)$ is computed as:

h (I) = VisionEncoder (I),

(39)

q (D) = TextEncoder (D),

(40)

ξ_{x, y} = PositionalEmbedding (x, y),

(41)

S_{LLM} (x, y) = Attention (h (I), q (D), ξ_{x, y}) .

(42)

iii. LLM-Augmented Hough Transformation Algorithm

Algorithm 1 LLM-Augmented Hough Transformation

Require: $I, D$: Input image I, optional description D.
Ensure: $L$: Detected lines with semantic weighting.

$E \leftarrow EdgeDetector (I)$
$A (r, θ) \leftarrow 0$
for $(x, y) \in E$ do
    $S_{LLM} (x, y) \leftarrow Attention (h (I), q (D), ξ_{x, y})$
    for $θ \in [0, π]$ do
        $r \leftarrow x \cos (θ) + y \sin (θ)$
        $A (r, θ) \leftarrow A (r, θ) + S_{LLM} (x, y)$
    end for
end for
$P \leftarrow PeakDetector (A)$
$L \leftarrow MapToImageSpace (P)$
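The accumulation loop of Algorithm 1 can be sketched in plain Python as below. The edge detector and the attention-based scoring are replaced by a hand-written pixel list and score dictionary, and the function names, the angular resolution, and the rounding of r to integer bins are assumptions made for illustration.

```python
import math

def weighted_hough(edge_pixels, scores, n_theta=180):
    """Accumulate A(r, theta) += S_LLM(x, y) for every edge pixel.

    edge_pixels: iterable of (x, y); scores: dict mapping (x, y) to a score.
    Returns a dict keyed by (r, theta_index), with r rounded to an integer.
    """
    acc = {}
    for (x, y) in edge_pixels:
        # A default weight of 1 recovers the classic, unweighted transform.
        s = scores.get((x, y), 1.0)
        for k in range(n_theta):
            theta = math.pi * k / n_theta
            r = round(x * math.cos(theta) + y * math.sin(theta))
            acc[(r, k)] = acc.get((r, k), 0.0) + s
    return acc

def strongest_line(acc, n_theta=180):
    """Return (r, theta, votes) for the accumulator cell with the most votes."""
    (r, k), votes = max(acc.items(), key=lambda item: item[1])
    return r, math.pi * k / n_theta, votes

# Forty collinear edge pixels on the vertical line x = 3, each scored 0.5.
pixels = [(3, y) for y in range(40)]
scores = {p: 0.5 for p in pixels}
r, theta, votes = strongest_line(weighted_hough(pixels, scores))
```

Because votes are weighted by the score, a short line through a high-scoring region can outrank a longer line through a low-scoring one, which is the intended effect of Eq. (38).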

iv. Threshold Adaptation

To refine line detection, thresholds for peak detection are dynamically adjusted using the average semantic score $S_{avg} (r, θ)$:

T_{LLM} (r, θ) = T_{base} \cdot (1 + β \cdot S_{avg} (r, θ)),

(43)

where:

$T_{LLM} (r, θ)$ : Adaptive threshold for $(r, θ)$ .
$S_{avg} (r, θ)$ : Average semantic score for lines contributing to $(r, θ)$ .
$β$ : Scaling factor controlling semantic influence.

Figure 4. Original Image

Figure 5. Lines Detected by LLM-Augmented Hough Transformation

Figure 6. Prominent Lines Detected by LLM-Augmented Hough Transformation

IX. LLM-Augmented Absolute Energy Equation

The LLM-Augmented Absolute Energy Equation refines energy calculations by integrating semantic weights $S (x, y)$ derived from LLMs. This ensures seam carving preserves critical features such as text and faces.

i. Semantic Weighting

The LLM assigns a semantic score $S (x, y)$ to each pixel:

S (x, y) = LLM (I, (x, y), context),

(44)

where I is the input image and context includes features like object and text importance. The score $S (x, y) \in [0, 1]$ scales pixel importance, with higher values for semantically significant regions.

ii. Gradient Refinement

The original energy gradient:

e (I) = |\frac{\partial I}{\partial x}| + |\frac{\partial I}{\partial y}|

is modified as:

e_{LLM} (I) = (|\frac{\partial I}{\partial x}| \cdot S_{x} (x, y)) + (|\frac{\partial I}{\partial y}| \cdot S_{y} (x, y)),

(45)

where $S_{x} (x, y)$ and $S_{y} (x, y)$ are direction-specific weights computed by the LLM.
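Eq. (45) can be illustrated with forward differences on a tiny grayscale image. The function name, the zero difference used at the last row and column, and the hand-written weight maps are assumptions; in the paper the direction-specific weights are produced by the LLM.

```python
def absolute_energy(img, sx, sy):
    """e_LLM = |dI/dx| * S_x + |dI/dy| * S_y using forward differences.

    The last column/row use a zero difference (an assumed boundary rule).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(img[y][x + 1] - img[y][x]) if x + 1 < w else 0
            dy = abs(img[y + 1][x] - img[y][x]) if y + 1 < h else 0
            out[y][x] = dx * sx[y][x] + dy * sy[y][x]
    return out

img = [[0, 4], [2, 4]]
ones = [[1.0, 1.0], [1.0, 1.0]]
half_x = [[0.5, 0.5], [0.5, 0.5]]
plain = absolute_energy(img, ones, ones)     # weights of 1 give the original e(I)
damped = absolute_energy(img, half_x, ones)  # horizontal gradients count half
```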

iii. Cumulative Energy Update

The cumulative energy function integrates $S (x, y)$ into the seam carving process:

\begin{aligned} e_{LLM} (i, j) = {} & e (i, j) + | e (i, j + 1) - e (i, j) | \cdot S_{x} (i, j) \\ & + | e (i + 1, j) - e (i, j) | \cdot S_{y} (i, j) \\ & + \min \{ C_{L} (i, j) + e (i - 1, j - 1), \\ & \qquad C_{R} (i, j) + e (i - 1, j + 1), \\ & \qquad C_{U} (i, j) + e (i - 1, j) \} . \end{aligned}

(46)

Here, $C_{L}$, $C_{R}$, and $C_{U}$ are adjusted by $S (x, y)$ to prioritize semantically significant pixels.

X. LLM-Augmented Dual Gradient Energy Equation

The proposed LLM-Augmented Dual Gradient Energy Equation refines edge detection by dynamically adjusting numerical differentiation and gradient computation. LLMs provide context-aware corrections for each computational step.

i. Numerical Differentiation with LLM Adjustments

Taylor expansions are dynamically adjusted with LLM corrections. The forward expansion is expressed as:

f (x + Δ x) = f (x) + Δ x f^{'} (x) + \frac{(Δ x)^{2}}{2} f^{''} (x) + Δ_{LLM},

(47)

where $Δ_{LLM}$ includes context-aware corrections predicted by the LLM. Similarly, for the backward expansion:

f (x - Δ x) = f (x) - Δ x f^{'} (x) + \frac{(Δ x)^{2}}{2} f^{''} (x) + Δ_{LLM} .

(48)

LLMs adapt $Δ x$ dynamically based on local image gradients and refine higher-order terms to reduce numerical error.
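As a short worked step, the two expansions lead to the usual finite-difference estimates of the derivative. The derivation below sets the correction term $Δ_{LLM}$ aside (an assumption made here for clarity) and keeps only the standard Taylor terms.

```latex
% Forward difference, from the forward expansion alone:
\frac{f(x + \Delta x) - f(x)}{\Delta x}
  = f'(x) + \frac{\Delta x}{2} f''(x) + O\big((\Delta x)^{2}\big)

% Central difference, from subtracting the backward from the forward expansion
% (the even-order terms cancel):
f(x + \Delta x) - f(x - \Delta x) = 2\,\Delta x\, f'(x) + O\big((\Delta x)^{3}\big)
\quad\Longrightarrow\quad
f'(x) = \frac{f(x + \Delta x) - f(x - \Delta x)}{2\,\Delta x} + O\big((\Delta x)^{2}\big)
```

The forward difference has a leading error proportional to $Δ x$, while the central difference has one proportional to $(Δ x)^{2}$, which is why the step size matters for the accuracy of the gradient estimate.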

ii. Gradient Approximation with Adaptive Refinements

Gradient approximations in the x- and y-directions incorporate corrections from LLMs:

f_{x} (x, y) \approx \frac{f (x + δ, y) - f (x, y)}{δ} + Δ f_{x},

(49)

f_{y} (x, y) \approx \frac{f (x, y + δ) - f (x, y)}{δ} + Δ f_{y} .

(50)

Here, $Δ f_{x}$ and $Δ f_{y}$ are LLM-predicted corrections based on local edge strength and texture complexity. The LLM also adapts $δ$ to handle regions with high-gradient variations.

iii. Energy Calculation with LLM Refinements

The energy of a pixel is computed using the LLM-enhanced gradients for each RGB channel. For the x-direction:

Δ_{x}^{2} (x, y) = \sum_{C \in \{R, G, B\}} (C_{x} (x, y) + Δ C_{x} (x, y))^{2},

(51)

and similarly for the y-direction:

Δ_{y}^{2} (x, y) = \sum_{C \in \{R, G, B\}} (C_{y} (x, y) + Δ C_{y} (x, y))^{2} .

(52)

The total energy is:

E (x, y) = \sqrt{Δ_{x}^{2} (x, y) + Δ_{y}^{2} (x, y)} .

(53)

LLM contributions include predicting $Δ C_{x} (x, y)$ and $Δ C_{y} (x, y)$ to improve accuracy and dynamically adjusting channel weights for better feature preservation.
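Eqs. (51) to (53) can be sketched for an RGB image as follows. The function name, the clamped-border central differences, and the optional correction arrays standing in for the LLM-predicted terms $Δ C_{x}$ and $Δ C_{y}$ are assumptions; with the corrections omitted the sketch reduces to the classic dual-gradient energy.

```python
import math

def dual_gradient_energy(img, corr_x=None, corr_y=None):
    """E = sqrt(Dx^2 + Dy^2), summing squared channel gradients over R, G, B.

    img[y][x] is an (r, g, b) tuple. corr_x / corr_y optionally hold per-pixel
    (dr, dg, db) corrections (hypothetical stand-ins for the LLM output).
    """
    h, w = len(img), len(img[0])
    zero = (0.0, 0.0, 0.0)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            cx = corr_x[y][x] if corr_x else zero
            cy = corr_y[y][x] if corr_y else zero
            dx2 = sum((r - l + c) ** 2 for l, r, c in zip(left, right, cx))
            dy2 = sum((d - u + c) ** 2 for u, d, c in zip(up, down, cy))
            out[y][x] = math.sqrt(dx2 + dy2)
    return out

# Two identical rows with a colour step between the second and third column.
img = [
    [(0, 0, 0), (0, 0, 0), (3, 4, 0)],
    [(0, 0, 0), (0, 0, 0), (3, 4, 0)],
]
energy = dual_gradient_energy(img)
```

Summing over all three channels means a change in hue with constant brightness still produces energy, which a grayscale gradient could miss.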

XI. Result Evaluation

The evaluation of the proposed LLM-augmented methods focuses on three primary dimensions: semantic preservation, visual quality, and computational efficiency. Each method’s contribution to image resizing is assessed quantitatively and qualitatively, providing a comprehensive understanding of its effectiveness.

i. Evaluation Metrics

The following metrics are used to evaluate all methods:

Semantic Preservation ($S$): Measures the alignment of detected features or preserved regions with semantically significant areas, as defined by the LLM:

$S = \frac{\sum_{i \in F_{\text{detected}}} S_{LLM} (i)}{\sum_{i \in F_{\text{ground truth}}} S_{LLM} (i)},$

(54)

where $S_{LLM} (i)$ is the semantic importance score for feature i.
Visual Quality ($V$): Evaluates the perceptual quality of resized images using metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index):

$V_{PSNR} = 10 \log_{10} (\frac{MAX^{2}}{MSE}),$

(55)

$V_{SSIM} = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{x y} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})} .$

(56)
Computational Efficiency ($C$): Measures the average runtime per image:

$C = \frac{\text{Total Runtime}}{\text{Number of Images}} .$

(57)
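The metrics in Eqs. (55) to (57) can be computed with a few lines of plain Python. This is a minimal sketch: the function names are hypothetical, SSIM is evaluated once over the whole image rather than over sliding windows, and the constants $C_{1} = (0.01 \cdot MAX)^{2}$ and $C_{2} = (0.03 \cdot MAX)^{2}$ are commonly used defaults that the text does not specify.

```python
import math

def _flatten(img):
    return [float(v) for row in img for v in row]

def psnr(ref, out, max_value=255.0):
    """Eq. (55): PSNR in dB; returns infinity for identical images."""
    a, b = _flatten(ref), _flatten(out)
    mse = sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

def global_ssim(ref, out, max_value=255.0):
    """Eq. (56) evaluated once over the whole image (no sliding window)."""
    a, b = _flatten(ref), _flatten(out)
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((p - mu_a) ** 2 for p in a) / n
    var_b = sum((q - mu_b) ** 2 for q in b) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(a, b)) / n
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2  # assumed constants
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def mean_runtime(total_seconds, n_images):
    """Eq. (57): average runtime per image."""
    return total_seconds / n_images

ref = [[0, 255], [0, 255]]
shifted = [[25.5, 229.5], [25.5, 229.5]]  # every pixel off by 25.5
```

Here every pixel of `shifted` differs from `ref` by 25.5, so the MSE is 650.25 and the PSNR is 20 dB for 8-bit data.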

ii. Results and Discussion

The results, including accuracy metrics and error rates for each sub-experiment, are provided below. The experiment revealed that LLM-augmented resizing methods led to superior performance in image classification, particularly in cases where maintaining fine image details was critical.

Figure 7. Error and accuracy of a sub-experiment showing the improvements from LLM-augmented methods

Illustrative examples of images processed by different methods, including LLM-augmented techniques, are shown below, highlighting the visual differences in the resized images and how LLM-augmented methods contribute to better feature preservation.

Figure 8. Image processed by the LLM-Bicubic Method

Figure 9. Image processed by the LLM-Absolute Energy Method

Figure 10. Image processed by the LLM-Canny Method

Figure 11. Image processed by the LLM-Dual Gradient Energy Method

Figure 12. Image processed by the LLM-Hough Transformation Method

Figure 13. Image processed by the LLM-LC Method

iii. Conclusion

These experiments indicate that LLM-augmented methods improve image resizing. LLMSeamCarver can preserve finer image details, resulting in improved semantic preservation, visual quality, and computational efficiency.

References

[1] Xingyuan Bu, Yuwei Wu, Zhi Gao, and Yunde Jia. Deep convolutional network with locality and sparsity constraints for texture classification. Pattern Recognition, 91:34–46, 2019.
[2] A. Chaurasia and E. Culurciello. Linknet: Exploiting encoder representations for efficient semantic segmentation. arXiv preprint arXiv:1707.03718, 2017. URL: https://arxiv.org/abs/1707.03718.
[3] Han-Cheng Dan, Zhetao Huang, Bingjie Lu, and Mengyu Li. Image-driven prediction system: Automatic extraction of aggregate gradation of pavement core samples integrating deep learning and interactive image processing framework. Construction and Building Materials, 453:139056, 2024.
[4] A. Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. International Conference on Learning Representations (ICLR), 2021. URL: https://arxiv.org/abs/2010.11929.
[5] Roderick Frankovich. Enhanced seam carving: Energy gradient functionals and resizing control. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 2157–2160, 2011.
[6] Fusen Guo, Huadong Mo, Jianzhang Wu, Lei Pan, Hailing Zhou, Zhibo Zhang, Lin Li, and Fengling Huang. A hybrid stacking model for enhanced short-term load forecasting. Electronics, 13(14):2719, 2024.
[7] Yue Guo, Shiqi Chen, Ronghui Zhan, Wei Wang, and Jun Zhang. Lmsd-yolo: A lightweight yolo algorithm for multi-scale sar ship detection. Remote Sensing, 14(19):4801, 2022.
[8] S. Han, J. Pool, J. Tran, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. In International Conference on Learning Representations (ICLR), 2016. URL: https://arxiv.org/abs/1510.00149.
[9] J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), pages 6840–6851, 2020. URL: https://arxiv.org/abs/2006.11239.
[10] Zhuohuan Hu, Fu Lei, Yuxin Fan, Zong Ke, Ge Shi, and Zichao Li. Research on financial multi-asset portfolio risk prediction model based on convolutional neural networks and image processing. arXiv preprint arXiv:2412.03618, 2024.
[11] Zong Ke and Yuchen Yin. Tail risk alert based on conditional autoregressive var by regression quantiles and machine learning algorithms. arXiv.org, 2024. URL: https://arxiv.org/abs/2412.06193.
[12] Holger Kiess. Improved edge preservation in seam carving for image resizing. Computer Graphics Forum, 33(2):421–429, 2014.
[13] Zhixin Lai, Jing Wu, Suiyao Chen, Yucheng Zhou, and Naira Hovakimyan. Residual-based language models are free boosters for biomedical imaging. 2024. URL: https://arxiv.org/abs/2403.17343.
[14] H. Li et al. Densefuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 30:300–312, 2021. https://doi.org/10.1109/TIP.2021.3059619.
[15] Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, and Bo Hong. Optimizing automated picking systems in warehouse robots using machine learning. arXiv preprint arXiv:2408.16633, 2024.
[16] Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, and Hongsheng Li. Ecnet: Effective controllable text-to-image diffusion models. 2024. URL: https://arxiv.org/abs/2403.18417, arXiv:2403.18417.
[17] Dong Liu. Contemporary model compression on large language models inference. arXiv preprint arXiv:2409.01990, 2024.
[18] Dong Liu. Mt2st: Adaptive multi-task to single-task learning. arXiv preprint arXiv:2406.18038, 2024.
[19] Dong Liu, Meng Jiang, and Kaiser Pister. Llmeasyquant–an easy to use toolkit for llm quantization. arXiv preprint arXiv:2406.19657, 2024.
[20] Dong Liu, Roger Waleffe, Meng Jiang, and Shivaram Venkataraman. Graphsnapshot: Graph machine learning acceleration with fast storage and retrieval. arXiv preprint arXiv:2406.17918, 2024.
[21] Junran Peng, Xingyuan Bu, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, and Junjie Yan. Large-scale object detection in the wild from imbalanced multi-labels. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9709–9718, 2020.
[22] Meta AI Research. Efficientsam: Fast segmentation everything model. arXiv preprint arXiv:2312.00860, 2024. URL: https://arxiv.org/abs/2312.00860.
[23] Z. Wang et al. Controlnet: Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023. URL: https://arxiv.org/abs/2302.05543.
[24] Chunya Wu, Zhuoyu Yu, and Dexuan Song. Window views psychological effects on indoor thermal perception: A comparison experiment based on virtual reality environments. E3S Web of Conferences, 546:02003, 2024.
[25] Wenjun Wu. Alphanetv4: Alpha mining model. arXiv preprint arXiv:2411.04409, 2024.
[26] Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, and Danqing Ma. A multimodal fusion network for student emotion recognition based on transformer and tensor product. 2024. URL: https://arxiv.org/abs/2403.08511, arXiv:2403.08511.
[27] Jun Xiang, Jun Chen, and Yanchao Liu. Hybrid multiscale search for dynamic planning of multi-agent drone traffic. Journal of Guidance, Control, and Dynamics, 46(10):1963–1974, 2023.
[28] Wangjiaxuan Xin, Kanlun Wang, Zhe Fu, and Lina Zhou. Let community rules be reflected in online content moderation. 2024. URL: https://arxiv.org/abs/2408.12035, arXiv:2408.12035.
[29] Wei Zhang, Changxu Wu, and Xiang Li. Comparison of image resizing techniques: A case study of seam carving vs. traditional resizing methods. Journal of Visual Communication and Image Representation, 29:149–158, 2015. https://doi.org/10.1016/j.jvcir.2015.05.010.
[30] Zhibo Zhang, Pengfei Li, Ahmed Y Al Hammadi, Fusen Guo, Ernesto Damiani, and Chan Yeob Yeun. Reputation-based federated learning defense to mitigate threats in eeg signal classification. In 2024 16th International Conference on Computer and Automation Engineering (ICCAE), pages 173–180. IEEE, 2024.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permit the free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
