1. Introduction
LLMSeamCarver is an image resizing method designed to maintain important image details throughout the resizing process. By integrating LLMs, it goes beyond traditional methods, enabling a smarter, context-aware resizing process. LLMs enhance tasks such as region prioritization, interpolation, edge detection, and energy calculation by analyzing image context or textual inputs. This allows LLMSeamCarver to preserve crucial image features, such as faces and text, while optimizing resizing for different scenarios.
This paper explores how LLMs improve the accuracy and efficiency of image resizing.
2. Background
2.1. Image Resizing and LLMs
LLMSeamCarver enhances the traditional seam carving method by incorporating LLMs. The integration of LLMs allows for dynamic resizing of images based on real-time context and user input, rather than relying on fixed parameters. The LLM dynamically optimizes the energy functions that determine how seams are selected and removed, making the resizing process adaptive and context-sensitive. This approach allows for superior preservation of image details and better quality when resizing for specific tasks, such as creating thumbnails or preparing images for various screen sizes.
LLMs contribute by adjusting the parameters based on the content of the image and the desired effect, improving the quality of resized images while providing flexibility. The dynamic optimization of the resizing process using LLMs represents a major leap in the flexibility and efficiency of image resizing techniques.
2.2. LLM-Related Work
Recent research in LLMs has demonstrated their potential in a variety of domains, including text, image, and video generation.
LLMs have also shown the power and potential to enhance traditional image processing workflows through advanced model architectures and optimization techniques. For instance, researchers have developed methods that optimize convolutional neural network (CNN) layers for feature extraction in image processing tasks, thereby improving the performance of deep learning models in image classification and segmentation [2,22]. Zhang et al. (2024) explored the use of LLMs in multi-modal fusion networks, enabling the integration of both visual and textual information, which enhances image analysis tasks [10,11,24,26].
Additionally, efficient algorithm design [1,6,24,27,30] and efficient LLMs [17,18] have shown promise for efficient model design with LLMs. Through dynamic optimization, LLMs allow for more context-aware resizing by adjusting energy functions during the process. This flexibility ensures that fine-grained details are preserved, which is especially crucial for tasks like content-aware resizing. Studies on text-to-image models have demonstrated how LLMs can modify images based on contextual prompts [7,13,15,16,28], providing further advancements in content-aware image resizing.
2.3. Image and Vision Generation Work
The application of deep learning techniques in image and vision generation has seen remarkable progress in recent years [3,4,9,21,23,25,26]. Deep convolutional networks have been effectively utilized for texture classification, achieving high accuracy in distinguishing fine-grained patterns and structures (Bu et al., 2019) [1]. These approaches significantly enhance detail preservation during image resizing by maintaining textures and edges, ensuring that visual fidelity is retained across transformations.
Moreover, advances in multi-modal fusion networks and techniques for image-driven prediction, as demonstrated by Dan et al. (2024), illustrate how artificial intelligence can process and modify images in real time [3,14,26]. These methods integrate data from diverse sources, facilitating applications such as enhanced video editing and real-time object tracking.
Additionally, model compression is increasingly favored from both model-optimization and system-design perspectives, enabling efficient resource usage without sacrificing performance [8,19,20]. These innovations support dynamic, user-controlled visual generation, opening new possibilities for customizable and interactive media content creation.
2.4. Image Resizing and Seam Carving Research
The traditional seam carving method was proposed by Avidan and Shamir (2007); since then, other studies have contributed to enhancing it. Kiess (2014) introduced improved edge-preservation methods within seam carving [12], which is crucial for ensuring that resized images do not suffer visible distortions along object boundaries. Zhang (2015) compared classic image resizing methods and found that seam carving provided superior results compared to simpler resizing techniques, particularly in terms of detail preservation [29]. Frankovich (2011) further advanced seam carving by integrating energy gradient functionals into the carving process, providing even more control over the resizing operation [5].
3. Functionality
LLMSeamCarver leverages LLM-Augmented methods to ensure adaptive, high-quality image resizing while preserving both structural and semantic integrity. The key functionalities are:
LLM-Augmented Region Prioritization: LLMs analyze semantics or textual inputs to prioritize key regions, ensuring critical areas (e.g., faces, text) are preserved.
LLM-Augmented Bicubic Interpolation: LLMs optimize bicubic interpolation for high-quality enlargements, adjusting parameters based on context or user input.
LLM-Augmented LC Algorithm: LLMs adapt the LC algorithm by adjusting weights, ensuring the preservation of important image features during resizing.
LLM-Augmented Canny Edge Detection: LLMs guide Canny edge detection to refine boundaries, enhancing clarity and accuracy based on contextual analysis.
LLM-Augmented Hough Transformation: LLMs strengthen the Hough transformation, detecting structural lines and ensuring the preservation of geometric features.
LLM-Augmented Absolute Energy Function: LLMs dynamically adjust energy maps to improve seam selection for more precise resizing.
LLM-Augmented Dual Energy Model: LLMs refine energy functions, enhancing flexibility and ensuring effective seam carving across various use cases.
4. LLM-Guided Region Prioritization
To enhance seam carving, we propose a formalized method in which an LLM computes a semantic importance score for each pixel, directly modifying the energy map $e'(x, y)$. This ensures content-aware resizing with minimal disruption to critical features.
4.1. Semantic Importance Scoring by LLMs
The semantic importance $S(x, y)$ is computed as:
$$S(x, y) = g(E_{img}, E_{txt})$$
where:
$E_{img}$: Image embedding derived from input image I using a vision feature extractor.
$E_{txt}$: Text embedding derived from optional user description D using a language transformer.
g: A cross-modal scoring function combining $E_{img}$ and $E_{txt}$, implemented via attention mechanisms.
If no description D is provided, $E_{txt}$ defaults to a generic embedding, allowing g to focus solely on image features.
4.2. Energy Map Adjustment
The refined energy map $e'(x, y)$ integrates semantic importance into the standard energy formulation:
$$e'(x, y) = \left| \frac{\partial I}{\partial x} \right| + \left| \frac{\partial I}{\partial y} \right| + \lambda \cdot S(x, y)$$
where:
$\partial I / \partial x$, $\partial I / \partial y$: Gradients in the x- and y-directions.
$S(x, y)$: Semantic score indicating the importance of pixel $(x, y)$.
$\lambda$: Weighting factor balancing pixel-based energy and semantic importance.
This formulation ensures that higher semantic scores raise the energy of important regions, reducing the likelihood that they are removed during seam carving.
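To make the adjustment concrete, the following is a minimal NumPy sketch, not the authors' implementation: the `semantic` array stands in for the LLM-derived map $S(x, y)$ of Section 4.3, and `lam` plays the role of $\lambda$.

```python
import numpy as np

def refined_energy_map(gray: np.ndarray, semantic: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Gradient-magnitude energy augmented with a semantic score map.

    gray     : 2-D float array, the grayscale image.
    semantic : 2-D float array in [0, 1]; a placeholder for the
               LLM-derived scores S(x, y) described above.
    lam      : weighting factor balancing pixel energy and semantics.
    """
    # |dI/dx| + |dI/dy| via central differences (np.gradient returns
    # the row-axis derivative first, i.e. the y-direction).
    dy, dx = np.gradient(gray)
    base_energy = np.abs(dx) + np.abs(dy)
    # Higher semantic scores raise the energy, so seams avoid those pixels.
    return base_energy + lam * semantic
```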
4.3. Semantic Score Calculation with LLMs
To compute $S(x, y)$, the LLM processes image embeddings $E_{img}$ and textual embeddings $E_{txt}$:
$$S(x, y) = \text{Attention}\big(\text{VisionEncoder}(I), \text{TextEncoder}(D)\big)$$
where:
VisionEncoder: Extracts regional features from I (e.g., object locations, edges).
TextEncoder: Encodes user-provided descriptions into contextual embeddings.
Attention: Combines $E_{img}$ and $E_{txt}$ to assign $S(x, y)$ based on pixel relevance.
The score is normalized to the range $[0, 1]$ for compatibility with $e'(x, y)$.
4.4. Cumulative Energy Map Update
The cumulative energy map M propagates semantic adjustments through the seam carving process. For a given pixel $(i, j)$, the update is:
$$M(i, j) = e'(i, j) + \min\big(M(i-1, j-1) + C_L(i, j),\; M(i-1, j) + C_U(i, j),\; M(i-1, j+1) + C_R(i, j)\big)$$
where:
$C_L$, $C_R$, $C_U$: Left, right, and upward cost terms adjusted with $S(x, y)$.
min: Ensures the optimal seam path minimizes distortion of high-priority regions.
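Below is a minimal sketch of this dynamic program and the seam backtrack. For brevity it omits the forward-looking cost terms $C_L$, $C_U$, $C_R$ (effectively setting them to zero); `e` is the refined energy map from the previous sketch.

```python
import numpy as np

def cumulative_energy(e: np.ndarray) -> np.ndarray:
    """M(i, j) = e'(i, j) + min(upper-left, up, upper-right), cost terms omitted."""
    h, w = e.shape
    M = e.astype(float)
    for i in range(1, h):
        up = M[i - 1]
        left = np.concatenate(([np.inf], up[:-1]))   # M(i-1, j-1)
        right = np.concatenate((up[1:], [np.inf]))   # M(i-1, j+1)
        M[i] += np.minimum(np.minimum(left, up), right)
    return M

def min_seam(M: np.ndarray) -> np.ndarray:
    """Backtrack the minimum-cost vertical seam from the cumulative map."""
    h, w = M.shape
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(M[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(M[i, lo:hi]))
    return seam
```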
5. LLM-Augmented Bicubic Interpolation
The LLM-Augmented Bicubic Interpolation method enhances traditional bicubic interpolation by incorporating semantic importance scores derived from LLMs. This approach ensures content-aware resizing, prioritizing regions of high semantic value such as faces, text, and objects.
5.1. Traditional Bicubic Interpolation
In standard bicubic interpolation, the value of a pixel at position $(x, y)$ is computed using a 4x4 grid of neighboring pixels. The interpolation weights $W(d)$ are determined by the relative distances between the target pixel and its neighbors:
$$W(d) = \begin{cases} (a + 2)|d|^3 - (a + 3)|d|^2 + 1, & |d| \le 1 \\ a|d|^3 - 5a|d|^2 + 8a|d| - 4a, & 1 < |d| < 2 \\ 0, & \text{otherwise} \end{cases}$$
where a is a constant (commonly $a = -0.5$) controlling the interpolation smoothness.
The pixel value at position $(x, y)$ is computed as:
$$P(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} W(x_i)\, W(y_j)\, p_{ij}$$
where $p_{ij}$ are the pixel intensities in the 4x4 grid.
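The kernel above is the standard Keys cubic-convolution kernel; a direct NumPy transcription follows as a sketch.

```python
import numpy as np

def bicubic_weight(d: np.ndarray, a: float = -0.5) -> np.ndarray:
    """Cubic convolution kernel W(d) with smoothness constant a."""
    d = np.abs(np.asarray(d, dtype=float))
    w = np.zeros_like(d)
    near = d <= 1
    far = (d > 1) & (d < 2)
    w[near] = (a + 2) * d[near] ** 3 - (a + 3) * d[near] ** 2 + 1
    w[far] = a * d[far] ** 3 - 5 * a * d[far] ** 2 + 8 * a * d[far] - 4 * a
    return w
```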
5.2. LLM-Augmented Interpolation
In the augmented method, the interpolation weights are modified to incorporate semantic importance:
$$W'(x_i, y_j) = W(x_i)\, W(y_j)\, \big(1 + \beta\, S(x_i, y_j)\big)$$
where:
$S(x_i, y_j)$: Semantic importance score for pixel $(x_i, y_j)$, computed by the LLM.
$\beta$: Scalar factor controlling the influence of $S(x_i, y_j)$ on the interpolation process.
The updated pixel value is then calculated as:
$$P'(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} W'(x_i, y_j)\, p_{ij}$$
ensuring higher weights for regions with greater semantic importance.
5.3. Semantic Importance Calculation
The semantic score $S(x, y)$ is derived using the LLM, following the same formulation as Section 4.3:
$$S(x, y) = \text{Attention}\big(\text{VisionEncoder}(I), \text{TextEncoder}(D)\big)$$
5.4. Cumulative Interpolation Update
The augmented interpolation integrates the adjusted weights:
$$P'(x, y) = \sum_{i=0}^{3} \sum_{j=0}^{3} W(x_i)\, W(y_j)\, \big(1 + \beta\, S(x_i, y_j)\big)\, p_{ij}$$
where $\beta$ and $S(x_i, y_j)$ ensure greater emphasis on semantically significant regions.
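A per-pixel sketch of the augmented interpolation is given below, reusing the `bicubic_weight` helper from Section 5.1. The `sem` patch is a placeholder for LLM scores, and the final renormalization (so the weights sum to one) is an assumption of this sketch to keep brightness stable, not something specified in the text.

```python
import numpy as np

def llm_bicubic_pixel(patch, dxs, dys, sem, beta=0.3, a=-0.5):
    """One output pixel from a 4x4 neighborhood with semantic re-weighting.

    patch    : 4x4 array of neighbor intensities p_ij (rows j, cols i).
    dxs, dys : length-4 signed distances from the target position to the
               neighbor columns / rows.
    sem      : 4x4 array of placeholder LLM scores S(x_i, y_j) in [0, 1].
    beta     : scalar controlling the semantic influence on the weights.
    """
    wx = bicubic_weight(np.asarray(dxs, float), a)          # W(x_i)
    wy = bicubic_weight(np.asarray(dys, float), a)          # W(y_j)
    W = np.outer(wy, wx) * (1.0 + beta * np.asarray(sem))   # W'(x_i, y_j)
    # Renormalization keeps the overall intensity scale stable (sketch-only
    # assumption).
    return float(np.sum(W * np.asarray(patch)) / np.sum(W))
```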
6. LLM-Augmented LC (Loyalty-Clarity) Policy
The LLM-Augmented LC Policy improves contrast-based resizing by integrating semantic importance parameters from LLMs. These parameters ensure that important regions, such as faces and text, receive higher priority during resizing.
6.1. Global Contrast with Semantic Guidance
We calculate the global contrast of a pixel $(x, y)$ as:
$$C'(x, y) = \sum_{(x', y') \in I} \big\| I(x, y) - I(x', y') \big\| \cdot \Big( 1 + \gamma \sum_{r \in R} w_r \cdot \mathbb{1}\big[(x, y) \in r\big] \Big)$$
where:
$\gamma$ scales the influence of semantic importance, computed by the LLM.
$w_r$ represents the semantic weight of region r, also derived from the LLM.
R is the set of semantically significant regions.
This formulation integrates semantic relevance into the contrast computation, preserving critical regions.
6.2. Frequency-Based Contrast Refinement
The traditional frequency-based contrast uses intensity distributions:
$$C(I_p) = \sum_{n=0}^{255} f_n \, \big\| I_p - I_n \big\|$$
where $f_n$ is the frequency of intensity $I_n$. We extend this by including semantic importance:
$$C'(x, y) = C(I_{x,y}) \cdot \big(1 + \gamma\, S(x, y)\big)$$
where $S(x, y)$ is the per-pixel semantic score and $\gamma$ is the scaling factor defined above.
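The frequency-based form admits an efficient histogram implementation; the sketch below computes it for an 8-bit grayscale image, with `sem` again a placeholder for the LLM score map and `gamma` for the scaling factor.

```python
import numpy as np

def lc_contrast(gray_u8: np.ndarray, sem: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Histogram-based global contrast with a per-pixel semantic boost.

    gray_u8 : 2-D uint8 image.
    sem     : 2-D float array of placeholder LLM scores S(x, y) in [0, 1].
    gamma   : scaling factor for semantic influence.
    """
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(float)
    levels = np.arange(256, dtype=float)
    # For each intensity v: sum_n f_n * |v - n|, built once as a 256-entry
    # lookup table, then applied per pixel.
    table = np.abs(levels[:, None] - levels[None, :]) @ hist
    contrast = table[gray_u8]
    return contrast * (1.0 + gamma * sem)
```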
6.3. LLM Parameter Computation
The LLM computes the semantic parameters through embeddings and attention mechanisms:
Image Embeddings $E_{img}$: A vision encoder extracts pixel-level and global features: $E_{img} = \text{VisionEncoder}(I)$.
Text Embeddings $E_{txt}$: A text encoder processes user descriptions: $E_{txt} = \text{TextEncoder}(D)$.
Spatial Embeddings $E_{pos}$: Positional embeddings represent region-specific attributes: $E_{pos} = \text{PosEncoder}(r)$.
Semantic Weights $w_r$: Attention mechanisms combine the embeddings: $w_r = \text{Attention}(E_{img}, E_{txt}, E_{pos})$.
Frequency Adjustment: The LLM refines the frequency distribution, yielding adjusted frequencies $f'_n$.
Scaling Factor $\gamma$: A sigmoid function ensures $\gamma \in (0, 1)$: $\gamma = \sigma(z)$, where z is a scalar predicted by the LLM.
6.4. Contrast-Based Resizing Decision
The adjusted contrast $C'(x, y)$ guides the resizing process. The seam path is chosen to minimize distortion in semantically important regions:
$$P^* = \arg\min_{P} \sum_{(x, y) \in P} C'(x, y)$$
where P represents a seam path.
7. LLM-Augmented Canny Line Detection
The LLM-Augmented Canny Line Detection integrates semantic importance parameters derived from a Large Language Model (LLM) to enhance edge detection. This approach ensures that edges in critical regions, such as faces or text, are prioritized during resizing.
Figure 1. Outlier detected by the LLM-LC algorithm.
7.1. Gaussian Filtering
The image is smoothed using a Gaussian filter:
$$I_s = G_{\sigma} * I, \qquad G_{\sigma}(x, y) = \frac{1}{2\pi\sigma^2} \exp\!\Big(-\frac{x^2 + y^2}{2\sigma^2}\Big)$$
The LLM adjusts the smoothing factor $\sigma$ for semantically important regions:
$$\sigma'(x, y) = \sigma \cdot \big(1 - \kappa\, S(x, y)\big)$$
where $S(x, y)$ is the semantic score and $\kappa$ controls how strongly smoothing is reduced in important regions to preserve their detail.
7.2. Gradient Calculation
Gradients are computed using the Sobel operator:
$$G = \sqrt{G_x^2 + G_y^2}$$
where $G_x$ and $G_y$ are the horizontal and vertical derivatives.
The LLM adjusts the gradient magnitude for each pixel:
$$G'(x, y) = G(x, y) \cdot \big(1 + \lambda\, S(x, y)\big)$$
7.3. LLM Semantic Parameter Calculation
The semantic scores $S(x, y)$, smoothing factor $\sigma'$, and thresholds $T_{low}$, $T_{high}$ are computed as follows:
$$S(x, y) = \text{Attention}\big(\text{VisionEncoder}(I), \text{TextEncoder}(D)\big)$$
$$T'_{low} = T_{low} \cdot \big(1 - \mu\, \bar{S}\big), \qquad T'_{high} = T_{high} \cdot \big(1 - \mu\, \bar{S}\big)$$
where $\bar{S}$ is the mean semantic score of the region under consideration and $\mu$ scales the semantic influence, lowering the thresholds so that edges in important regions are retained.
7.4. Edge Refinement
Edges are refined using non-maximum suppression and hysteresis thresholding. The final edge map is defined as:
$$E(x, y) = \begin{cases} 1, & G'(x, y) \ge T'_{high} \\ 1, & T'_{low} \le G'(x, y) < T'_{high} \text{ and connected to a strong edge} \\ 0, & \text{otherwise} \end{cases}$$
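The sketch below approximates this pipeline with OpenCV. Because cv2.Canny only accepts global thresholds, the per-pixel adjustment is collapsed into a single mean semantic score `sem_mean`; this simplification, and the parameter `mu`, are assumptions of the sketch rather than the paper's exact procedure.

```python
import cv2
import numpy as np

def llm_canny(gray_u8: np.ndarray, sem_mean: float,
              t_low: int = 50, t_high: int = 150, mu: float = 0.4) -> np.ndarray:
    """Canny edges with thresholds relaxed by a scalar semantic score.

    gray_u8  : 2-D uint8 image.
    sem_mean : mean LLM semantic score in [0, 1] (placeholder value).
    """
    blurred = cv2.GaussianBlur(gray_u8, (5, 5), sigmaX=1.4)
    # Higher semantic importance lowers both thresholds, so more edges
    # survive in important content.
    lo = int(t_low * (1.0 - mu * sem_mean))
    hi = int(t_high * (1.0 - mu * sem_mean))
    return cv2.Canny(blurred, lo, hi)
```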
Figure 2. Original Image.
Figure 3. Edges Detected by the Canny Detector (with LLM Augmentation).
8. LLM-Augmented Hough Transformation
The LLM-Augmented Hough Transformation integrates semantic importance derived from LLMs to enhance line detection. This ensures that key structural features, such as text and faces, are preserved during image resizing.
8.1. Mathematical Formulation
The Hough Transformation maps edge pixels $(x, y)$ to Hough space. For each pixel, the radial distance r is computed as:
$$r = x \cos\theta + y \sin\theta$$
where $\theta$ is the angle of the line in Hough space.
In the LLM-Augmented Hough Transformation, the accumulator $A(r, \theta)$ is updated as:
$$A(r, \theta) \leftarrow A(r, \theta) + 1 + \lambda\, S(x, y)$$
with $\lambda$ a scaling factor controlling the semantic influence, where:
$A(r, \theta)$: Accumulator value for the line parameterized by $(r, \theta)$.
$S(x, y)$: Semantic importance score computed for pixel $(x, y)$.
8.2. Semantic Score Calculation
The semantic score $S(x, y)$ is computed as in Section 4.3:
$$S(x, y) = \text{Attention}\big(\text{VisionEncoder}(I), \text{TextEncoder}(D)\big)$$
8.3. LLM-Augmented Hough Transformation Algorithm
Algorithm 1: LLM-Augmented Hough Transformation
Require: Input image I, optional description D.
Ensure: Detected lines $\mathcal{L}$ with semantic weighting.
1: Compute the edge map of I.
2: Compute semantic scores $S(x, y)$ from I and D.
3: for each edge pixel $(x, y)$ do
4:   $s \leftarrow S(x, y)$
5:   for each angle $\theta$ in the discretized range do
6:     $r \leftarrow x \cos\theta + y \sin\theta$
7:     $A(r, \theta) \leftarrow A(r, \theta) + 1 + \lambda s$
8:   end for
9: end for
10: Apply the adaptive thresholds of Section 8.4 to $A(r, \theta)$.
11: return $\mathcal{L}$, the lines whose accumulator values exceed the threshold.
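A compact NumPy sketch of the weighted voting loop (steps 3 to 9) follows, vectorized over $\theta$. The `sem` array is a placeholder for the LLM scores; the offset by the image diagonal keeps the r indices non-negative.

```python
import numpy as np

def llm_hough(edges: np.ndarray, sem: np.ndarray, lam: float = 1.0,
              n_theta: int = 180):
    """Hough accumulator where each vote is weighted by 1 + lam * S(x, y).

    edges : 2-D boolean edge map.
    sem   : 2-D float array of placeholder LLM scores in [0, 1].
    """
    h, w = edges.shape
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.ceil(np.hypot(h, w)))
    acc = np.zeros((2 * diag, n_theta))
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    ys, xs = np.nonzero(edges)
    for x, y, s in zip(xs, ys, sem[ys, xs]):
        # r = x cos(theta) + y sin(theta), shifted so indices start at 0.
        rs = np.round(x * cos_t + y * sin_t).astype(int) + diag
        acc[rs, np.arange(n_theta)] += 1.0 + lam * s
    return acc, thetas, diag
```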
8.4. Threshold Adaptation
To refine line detection, thresholds for peak detection are dynamically adjusted using the average semantic score $\bar{S}(r, \theta)$:
$$T(r, \theta) = T_0 \cdot \big(1 - \eta\, \bar{S}(r, \theta)\big)$$
with $T_0$ the base threshold, where:
$T(r, \theta)$: Adaptive threshold for $A(r, \theta)$.
$\bar{S}(r, \theta)$: Average semantic score for lines contributing to $A(r, \theta)$.
$\eta$: Scaling factor controlling semantic influence.
Figure 4. Original Image.
Figure 5. Lines Detected by LLM-Augmented Hough Transformation.
Figure 6. Prominent Lines Detected by LLM-Augmented Hough Transformation.
9. LLM-Augmented Absolute Energy Equation
The LLM-Augmented Absolute Energy Equation refines energy calculations by integrating semantic weights derived from LLMs. This ensures seam carving preserves critical features such as text and faces.
9.1. Semantic Weighting
The LLM assigns a semantic score $S(x, y)$ to each pixel:
$$S(x, y) = \text{LLM}(I, \text{context})$$
where I is the input image and the context includes features like object and text importance. $S(x, y)$ scales pixel importance, with higher values for semantically significant regions.
9.2. Gradient Refinement
The original energy gradient:
$$e(x, y) = \left| \frac{\partial I}{\partial x} \right| + \left| \frac{\partial I}{\partial y} \right|$$
is modified as:
$$e'(x, y) = w_x(x, y) \left| \frac{\partial I}{\partial x} \right| + w_y(x, y) \left| \frac{\partial I}{\partial y} \right|$$
where $w_x$ and $w_y$ are direction-specific weights computed by the LLM.
9.3. Cumulative Energy Update
The cumulative energy function integrates
into the seam carving process:
Here, are adjusted by to prioritize semantically significant pixels.
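A sketch of the weighted gradient term is given below; the direction weights `wx` and `wy` are stand-ins for LLM-predicted values and default to ones, which recovers the ordinary absolute energy. The result feeds into the `cumulative_energy` routine sketched in Section 4.4.

```python
import numpy as np

def llm_absolute_energy(gray: np.ndarray,
                        wx: np.ndarray | None = None,
                        wy: np.ndarray | None = None) -> np.ndarray:
    """e'(x, y) = w_x |dI/dx| + w_y |dI/dy| with per-pixel direction weights.

    wx, wy : 2-D arrays of direction-specific weights; placeholders for the
             LLM-predicted values, defaulting to the unweighted energy.
    """
    dy, dx = np.gradient(gray)
    wx = np.ones_like(gray) if wx is None else wx
    wy = np.ones_like(gray) if wy is None else wy
    return wx * np.abs(dx) + wy * np.abs(dy)
```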
10. LLM-Augmented Dual Gradient Energy Equation
The proposed LLM-Augmented Dual Gradient Energy Equation refines edge detection by dynamically adjusting numerical differentiation and gradient computation. LLMs provide context-aware corrections for each computational step.
10.1. Numerical Differentiation with LLM Adjustments
Taylor expansions are dynamically adjusted with LLM corrections. The forward expansion is expressed as:
$$I(x + h) = I(x) + h\, I'(x) + \frac{h^2}{2} I''(x) + \epsilon_{\text{LLM}}(x)$$
where $\epsilon_{\text{LLM}}(x)$ includes context-aware corrections predicted by the LLM. Similarly, for the backward expansion:
$$I(x - h) = I(x) - h\, I'(x) + \frac{h^2}{2} I''(x) + \epsilon_{\text{LLM}}(x)$$
The LLM adapts $\epsilon_{\text{LLM}}$ dynamically based on local image gradients and refines higher-order terms to reduce numerical error.
10.2. Gradient Approximation with Adaptive Refinements
Gradient approximations in the x- and y-directions incorporate corrections from the LLM:
$$\frac{\partial I}{\partial x} \approx \frac{I(x+1, y) - I(x-1, y)}{2} + \Delta_x(x, y), \qquad \frac{\partial I}{\partial y} \approx \frac{I(x, y+1) - I(x, y-1)}{2} + \Delta_y(x, y)$$
Here, $\Delta_x$ and $\Delta_y$ are LLM-predicted corrections based on local edge strength and texture complexity. The LLM also adapts these corrections to handle regions with high gradient variation.
10.3. Energy Calculation with LLM Refinements
The energy of a pixel is computed using the LLM-enhanced gradients for each RGB channel. For the x-direction:
$$E_x(x, y) = \sum_{c \in \{R, G, B\}} w_c \left( \frac{\partial I_c}{\partial x} + \Delta_x^{c} \right)^{2}$$
and similarly for the y-direction:
$$E_y(x, y) = \sum_{c \in \{R, G, B\}} w_c \left( \frac{\partial I_c}{\partial y} + \Delta_y^{c} \right)^{2}$$
with total energy $E(x, y) = E_x(x, y) + E_y(x, y)$. LLM contributions include predicting $\Delta_x^{c}$ and $\Delta_y^{c}$ to improve accuracy and dynamically adjusting the channel weights $w_c$ for better feature preservation.
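The following sketch computes the dual-gradient energy per RGB channel; uniform channel weights and constant gradient corrections stand in for the LLM-predicted $w_c$, $\Delta_x$, and $\Delta_y$, purely for illustration.

```python
import numpy as np

def dual_gradient_energy(img: np.ndarray,
                         channel_w=(1.0, 1.0, 1.0),
                         dx_corr: float = 0.0,
                         dy_corr: float = 0.0) -> np.ndarray:
    """Per-channel squared-gradient (dual-gradient) energy.

    img       : H x W x 3 float array (RGB).
    channel_w : per-channel weights w_c (placeholders for LLM values).
    dx_corr, dy_corr : additive gradient corrections standing in for the
                       LLM terms Delta_x, Delta_y.
    """
    energy = np.zeros(img.shape[:2])
    for c, w in enumerate(channel_w):
        dy, dx = np.gradient(img[..., c])
        energy += w * ((dx + dx_corr) ** 2 + (dy + dy_corr) ** 2)
    return energy
```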
11. Result Evaluation
The evaluation of the proposed LLM-augmented methods focuses on three primary dimensions: semantic preservation, visual quality, and computational efficiency. Each method’s contribution to image resizing is assessed quantitatively and qualitatively, providing a comprehensive understanding of its effectiveness.
11.1. Evaluation Metrics
The following metrics are used to evaluate all methods:
Semantic Preservation ($S_p$): Measures the alignment of detected features or preserved regions with semantically significant areas, as defined by the LLM:
$$S_p = \frac{\sum_i s_i \cdot \mathbb{1}[\text{feature } i \text{ is preserved}]}{\sum_i s_i}$$
where $s_i$ is the semantic importance score for feature i.
Visual Quality ($V_q$): Evaluates the perceptual quality of resized images using metrics such as PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index).
Computational Efficiency ($C_e$): Measures the average runtime per image:
$$C_e = \frac{1}{N} \sum_{k=1}^{N} t_k$$
where $t_k$ is the runtime for image k and N is the number of images.
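The visual-quality and efficiency metrics can be computed with scikit-image, as sketched below for uint8 RGB images of equal shape. $S_p$ is omitted here because it requires the model-specific LLM scores $s_i$.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(original: np.ndarray, processed: np.ndarray,
             runtimes: list[float]) -> dict:
    """PSNR/SSIM against a reference plus mean runtime per image (C_e).

    original, processed : uint8 RGB arrays of the same shape (the
                          processed image warped back to reference size).
    runtimes            : per-image runtimes t_k in seconds.
    """
    psnr = peak_signal_noise_ratio(original, processed)
    ssim = structural_similarity(original, processed, channel_axis=-1)
    return {"PSNR": psnr, "SSIM": ssim, "C_e": float(np.mean(runtimes))}
```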
11.2. Results and Discussion
The results, including accuracy metrics and error rates for each sub-experiment, are provided below. The experiment revealed that LLM-augmented resizing methods led to superior performance in image classification, particularly in cases where maintaining fine image details was critical.
Figure 7. Error and accuracy of a sub-experiment showing the improvements from LLM-augmented methods.
Illustrative examples of images processed by different methods, including LLM-augmented techniques, are shown below, highlighting the visual differences in the resized images and how LLM-augmented methods contribute to better feature preservation.
Figure 8. Image processed by the LLM-Bicubic Method.
Figure 9. Image processed by the LLM-Absolute Energy Method.
Figure 10. Image processed by the LLM-Canny Method.
Figure 11. Image processed by the LLM-Dual Gradient Energy Method.
Figure 12. Image processed by the LLM-Hough Transformation Method.
Figure 13. Image processed by the LLM-LC Method.
11.3. Conclusion
This experiment demonstrates the significant improvements achieved by LLM-augmented methods in image resizing. LLMSeamCarver preserves finer image details, resulting in improved semantic preservation, visual quality, and computational efficiency.
References
1. Xingyuan Bu, Yuwei Wu, Zhi Gao, and Yunde Jia. Deep convolutional network with locality and sparsity constraints for texture classification. Pattern Recognition, 91:34–46, 2019.
2. A. Chaurasia and E. Culurciello. LinkNet: Exploiting encoder representations for efficient semantic segmentation. arXiv preprint arXiv:1707.03718, 2017. URL: https://arxiv.org/abs/1707.03718.
3. Han-Cheng Dan, Zhetao Huang, Bingjie Lu, and Mengyu Li. Image-driven prediction system: Automatic extraction of aggregate gradation of pavement core samples integrating deep learning and interactive image processing framework. Construction and Building Materials, 453:139056, 2024.
4. A. Dosovitskiy et al. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR), 2021. URL: https://arxiv.org/abs/2010.11929.
5. Roderick Frankovich. Enhanced seam carving: Energy gradient functionals and resizing control. In Proceedings of the IEEE International Conference on Image Processing (ICIP), pages 2157–2160, 2011.
6. Fusen Guo, Huadong Mo, Jianzhang Wu, Lei Pan, Hailing Zhou, Zhibo Zhang, Lin Li, and Fengling Huang. A hybrid stacking model for enhanced short-term load forecasting. Electronics, 13(14):2719, 2024.
7. Yue Guo, Shiqi Chen, Ronghui Zhan, Wei Wang, and Jun Zhang. LMSD-YOLO: A lightweight YOLO algorithm for multi-scale SAR ship detection. Remote Sensing, 14(19):4801, 2022.
8. S. Han, J. Pool, J. Tran, and W. J. Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In International Conference on Learning Representations (ICLR), 2016. URL: https://arxiv.org/abs/1510.00149.
9. J. Ho, A. Jain, and P. Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), pages 6840–6851, 2020. URL: https://arxiv.org/abs/2006.11239.
10. Zhuohuan Hu, Fu Lei, Yuxin Fan, Zong Ke, Ge Shi, and Zichao Li. Research on financial multi-asset portfolio risk prediction model based on convolutional neural networks and image processing. arXiv preprint arXiv:2412.03618, 2024.
11. Zong Ke and Yuchen Yin. Tail risk alert based on conditional autoregressive VaR by regression quantiles and machine learning algorithms. arXiv preprint arXiv:2412.06193, 2024. URL: https://arxiv.org/abs/2412.06193.
12. Holger Kiess. Improved edge preservation in seam carving for image resizing. Computer Graphics Forum, 33(2):421–429, 2014. doi:10.1111/cgf.12320.
13. Zhixin Lai, Jing Wu, Suiyao Chen, Yucheng Zhou, and Naira Hovakimyan. Residual-based language models are free boosters for biomedical imaging. arXiv preprint arXiv:2403.17343, 2024. URL: https://arxiv.org/abs/2403.17343.
14. H. Li et al. DenseFuse: A fusion approach to infrared and visible images. IEEE Transactions on Image Processing, 30:300–312, 2021.
15. Keqin Li, Jin Wang, Xubo Wu, Xirui Peng, Runmian Chang, Xiaoyu Deng, Yiwen Kang, Yue Yang, Fanghao Ni, and Bo Hong. Optimizing automated picking systems in warehouse robots using machine learning. arXiv preprint arXiv:2408.16633, 2024.
16. Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, and Hongsheng Li. ECNet: Effective controllable text-to-image diffusion models. arXiv preprint arXiv:2403.18417, 2024. URL: https://arxiv.org/abs/2403.18417.
17. Dong Liu, Zhixin Lai, Yite Wang, Jing Wu, Yanxuan Yu, Zhongwei Wan, Benjamin Lengerich, and Ying Nian Wu. Efficient large foundation model inference: A perspective from model and system co-design. arXiv preprint arXiv:2409.01990, 2024. URL: https://arxiv.org/abs/2409.01990.
18. Dong Liu and Kaiser Pister. LLMEasyQuant: An easy-to-use toolkit for LLM quantization. arXiv preprint arXiv:2406.19657, 2024. URL: https://arxiv.org/abs/2406.19657.
19. Dong Liu, Roger Waleffe, Meng Jiang, and Shivaram Venkataraman. GraphSnapshot: Graph machine learning acceleration with fast storage and retrieval. arXiv preprint arXiv:2406.17918, 2024. URL: https://arxiv.org/abs/2406.17918.
20. Dong Liu and Yanxuan Yu. MT2ST: Adaptive multi-task to single-task learning. arXiv preprint arXiv:2406.18038, 2024. URL: https://arxiv.org/abs/2406.18038.
21. Junran Peng, Xingyuan Bu, Ming Sun, Zhaoxiang Zhang, Tieniu Tan, and Junjie Yan. Large-scale object detection in the wild from imbalanced multi-labels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9709–9718, 2020.
22. Meta AI Research. EfficientSAM: Fast segmentation everything model. arXiv preprint arXiv:2312.00860, 2024. URL: https://arxiv.org/abs/2312.00860.
23. Z. Wang et al. ControlNet: Adding conditional control to text-to-image diffusion models. arXiv preprint arXiv:2302.05543, 2023. URL: https://arxiv.org/abs/2302.05543.
24. hunya Wu, Zhuoyu Yu, and Dexuan Song. Window views psychological effects on indoor thermal perception: A comparison experiment based on virtual reality environments. E3S Web of Conferences, 546:02003, 2024.
25. Wenjun Wu. AlphaNetV4: Alpha mining model. arXiv preprint arXiv:2411.04409, 2024.
26. Ao Xiang, Zongqing Qi, Han Wang, Qin Yang, and Danqing Ma. A multimodal fusion network for student emotion recognition based on transformer and tensor product. arXiv preprint arXiv:2403.08511, 2024. URL: https://arxiv.org/abs/2403.08511.
27. Jun Xiang, Jun Chen, and Yanchao Liu. Hybrid multiscale search for dynamic planning of multi-agent drone traffic. Journal of Guidance, Control, and Dynamics, 46(10):1963–1974, 2023.
28. Wangjiaxuan Xin, Kanlun Wang, Zhe Fu, and Lina Zhou. Let community rules be reflected in online content moderation. arXiv preprint arXiv:2408.12035, 2024. URL: https://arxiv.org/abs/2408.12035.
29. Wei Zhang, Changxu Wu, and Xiang Li. Comparison of image resizing techniques: A case study of seam carving vs. traditional resizing methods. Journal of Visual Communication and Image Representation, 29:149–158, 2015.
30. Zhibo Zhang, Pengfei Li, Ahmed Y. Al Hammadi, Fusen Guo, Ernesto Damiani, and Chan Yeob Yeun. Reputation-based federated learning defense to mitigate threats in EEG signal classification. In 2024 16th International Conference on Computer and Automation Engineering (ICCAE), pages 173–180. IEEE, 2024.