Submitted: 19 February 2025
Posted: 20 February 2025
Abstract
Keywords:
I. Introduction
II. Background and Related Work
A. Fine-Tuning of Large Language Models
- High computational cost: Updating billions of parameters requires extensive GPU memory and processing power [24].
- Storage inefficiency: Each fine-tuned model requires storing a full copy of the modified weights, making it infeasible to maintain multiple task-specific models [25].
- Catastrophic forgetting: Adapting a model to one task may degrade its performance on previously learned tasks if not handled carefully.
B. Parameter-Efficient Fine-Tuning (PEFT) Approaches
1) Adapter Layers
2) Prompt Tuning and Prefix Tuning
3) BitFit
4) Low-Rank Adaptation (LoRA)
C. Evolution of LoRA and Its Adoption in NLP
- Natural Language Understanding (NLU): Tasks such as sentiment analysis, named entity recognition, and text classification benefit from LoRA’s ability to fine-tune LLMs efficiently.
- Text Generation: LoRA has been integrated into large autoregressive models like GPT to improve domain-specific text generation while maintaining fluency and coherence [37].
- Multimodal Applications: Recent work has extended LoRA to multimodal models, enabling efficient adaptation of vision-language models for tasks such as image captioning and visual question answering.
D. Summary
III. Mathematical Foundations of LoRA
A. Low-Rank Decomposition in Neural Networks
B. Parameter Efficiency and Complexity Reduction
C. Integration with Transformer Architectures
D. Rank Selection and Performance Trade-Offs
E. Comparison with Other Fine-Tuning Methods
- Storage efficiency: Since only the low-rank matrices are stored, multiple task-specific adaptations can be maintained without redundant full model copies (a worked example of the savings follows this list).
- Reduced computational cost: Training requires fewer parameters to be updated, leading to faster convergence and lower memory consumption [56].
- Preservation of pre-trained knowledge: By keeping the original model weights frozen, LoRA avoids catastrophic forgetting and enables easy model reversibility [57].
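To make these savings concrete, recall that LoRA freezes the pre-trained weight matrix $W_0 \in \mathbb{R}^{d \times k}$ and trains only a low-rank update $\Delta W = BA$. The worked example below uses representative dimensions (chosen for illustration, not taken from any reported experiment) to show the reduction in trainable parameters.

```latex
% LoRA update applied to a frozen weight matrix (rank r << min(d, k))
W = W_0 + \Delta W = W_0 + \frac{\alpha}{r} B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k}.

% Trainable parameters per adapted matrix: full fine-tuning vs. LoRA
\underbrace{d \cdot k}_{\text{full update}}
\quad \text{vs.} \quad
\underbrace{r\,(d + k)}_{\text{LoRA}}

% Example: d = k = 4096 (a typical attention projection), r = 8
d \cdot k = 4096 \times 4096 \approx 16.8\,\text{M},
\qquad
r\,(d + k) = 8 \times 8192 \approx 65.5\,\text{K}
\;\;(\approx 0.4\% \text{ of the full update}).
```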
F. Summary
IV. Practical Implementation of LoRA
A. Integrating LoRA into Deep Learning Frameworks
- Hugging Face Transformers: The Hugging Face library provides APIs to integrate LoRA with models such as GPT, BERT, and T5, enabling efficient fine-tuning with minimal modifications [62].
- PyTorch LoRA Implementations: Several PyTorch-based implementations, such as peft (Parameter-Efficient Fine-Tuning), provide easy-to-use modules for applying LoRA to transformer layers (see the sketch after this list).
- TensorFlow and JAX Support: Although less common, LoRA implementations exist for TensorFlow and JAX, allowing for efficient adaptation of LLMs within these ecosystems.
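As a minimal sketch of this workflow (the base model, target modules, and hyperparameter values below are illustrative placeholders rather than recommendations), LoRA can be attached to a Hugging Face causal language model with the peft library as follows:

```python
# Minimal sketch: wrapping a pre-trained causal LM with LoRA via Hugging Face peft.
# Model name, target modules, and hyperparameters are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor alpha
    lora_dropout=0.05,
    target_modules=["c_attn"],  # fused QKV projection in GPT-2
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA matrices A and B are trainable
```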
B. Training Strategies for LoRA
1) Optimizing the Learning Rate
2) Gradient Accumulation and Mixed Precision Training
3) Task-Specific Adaptation
C. Evaluation and Benchmarking
- Perplexity (PPL): Measures how well the fine-tuned model predicts held-out test data; it is commonly used in language modeling tasks [73] (a short computation sketch follows this list).
- Accuracy and F1-score: Standard metrics for classification tasks, such as sentiment analysis or named entity recognition [74].
- BLEU and ROUGE scores: Used for text generation and summarization tasks to evaluate output quality [75].
- Computational efficiency: GPU memory usage, training speed, and inference latency are key factors in evaluating LoRA’s efficiency [76].
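For reference, perplexity is the exponential of the average token-level negative log-likelihood on held-out text. The sketch below, which assumes a Hugging Face-style causal language model and tokenizer as in the earlier example, computes it for a single text sample:

```python
# Illustrative perplexity computation for a causal language model.
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, tokenizer, text: str) -> float:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    logits = model(input_ids).logits[:, :-1, :]   # predictions for positions 1..T
    targets = input_ids[:, 1:]                    # tokens actually observed there
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # (tokens, vocab)
        targets.reshape(-1),                      # (tokens,)
    )
    return torch.exp(nll).item()                  # PPL = exp(mean NLL)
```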
D. Real-World Applications of LoRA
1) Natural Language Processing (NLP)
- Chatbots and Virtual Assistants: LoRA enables fast adaptation of conversational AI models to specific industries (e.g., healthcare, customer service) [78].
- Machine Translation: By fine-tuning pre-trained models like mBART, LoRA improves translation quality without excessive computational costs [79].
- Legal and Financial Text Processing: LoRA has been used to adapt LLMs for specialized jargon-heavy domains, such as legal document summarization [80].
2) Computer Vision
3) Biomedical and Healthcare Applications
4) Code Generation and Programming Assistance
E. Challenges and Best Practices
- Rank Selection: Choosing an appropriate rank r is crucial for maintaining a balance between efficiency and expressiveness.
- Memory Efficiency: While LoRA reduces training costs, inference efficiency remains an area of active research [86]; one common mitigation, merging the low-rank update into the base weights, is sketched after this list.
- Hybrid Fine-Tuning Approaches: Combining LoRA with other techniques, such as prompt tuning and adapter layers, can further improve performance [87].
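Regarding inference efficiency, one widely used mitigation is to fold the trained low-rank update back into the frozen weight before deployment, so that serving requires only a single dense matrix multiplication (the peft library exposes merge_and_unload() for this purpose). The snippet below is a schematic of the merge on one linear layer, with random tensors standing in for trained matrices:

```python
# Schematic merge of a LoRA update into a frozen linear weight so that
# inference incurs no extra latency (shapes and values are illustrative only).
import torch

d, k, r, alpha = 1024, 1024, 8, 16
W0 = torch.randn(d, k)           # frozen pre-trained weight
B = torch.randn(d, r) * 0.01     # trained LoRA "up" matrix (zero-initialized before training)
A = torch.randn(r, k) * 0.01     # trained LoRA "down" matrix

# W' = W0 + (alpha / r) * B @ A  -- identical outputs, single matmul at inference
W_merged = W0 + (alpha / r) * (B @ A)

x = torch.randn(1, k)
unmerged = x @ W0.T + (alpha / r) * ((x @ A.T) @ B.T)
assert torch.allclose(x @ W_merged.T, unmerged, atol=1e-3)
```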
F. Summary
V. Recent Advancements and Ongoing Research
A. Hybrid Approaches: Combining LoRA with Other Fine-Tuning Techniques
1) LoRA + Prompt Tuning
2) LoRA + Prefix Tuning
3) LoRA + Adapter Layers
B. Adaptive Rank Selection and Dynamic LoRA
- Layer-wise Rank Allocation: Instead of assigning a uniform rank to all transformer layers, models can be optimized by using higher ranks in critical layers (e.g., deeper attention layers) and lower ranks in less important layers [100] (an illustrative rank schedule is sketched after this list).
- Task-Specific Rank Optimization: Algorithms such as evolutionary search or reinforcement learning can be employed to find optimal rank configurations for different tasks [101].
- Sparse LoRA: Some studies propose sparsifying the low-rank matrices to further reduce computational requirements while preserving model accuracy [102].
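As a purely illustrative sketch of layer-wise rank allocation, the function below assigns higher LoRA ranks to deeper layers and lower ranks to shallower ones; the module names and the linear schedule are assumptions for demonstration rather than a published recipe (recent versions of peft accept per-module ranks through the rank_pattern argument of LoraConfig):

```python
# Hypothetical layer-wise rank schedule: deeper transformer layers receive
# higher LoRA ranks, shallower layers lower ranks (illustrative policy only).
def rank_schedule(num_layers: int, min_rank: int = 4, max_rank: int = 16) -> dict:
    ranks = {}
    for layer in range(num_layers):
        depth = layer / max(num_layers - 1, 1)       # 0.0 (first) .. 1.0 (last)
        rank = round(min_rank + depth * (max_rank - min_rank))
        # Keys mirror typical attention-projection module names (hypothetical).
        ranks[f"layers.{layer}.self_attn.q_proj"] = rank
        ranks[f"layers.{layer}.self_attn.v_proj"] = rank
    return ranks

schedule = rank_schedule(num_layers=12)
print(schedule["layers.0.self_attn.q_proj"])    # 4  (lowest rank, first layer)
print(schedule["layers.11.self_attn.q_proj"])   # 16 (highest rank, last layer)
# e.g. LoraConfig(r=8, rank_pattern=schedule, ...) in recent peft versions
```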
C. LoRA for Multimodal and Cross-Domain Applications
1) LoRA for Vision-Language Models
2) LoRA for Speech and Audio Processing
3) LoRA for Reinforcement Learning and Robotics
D. Optimizing LoRA for Efficient Inference
1) Quantized LoRA
2) Fusion of LoRA Adapters
3) LoRA for On-Device AI
E. Theoretical Insights into LoRA’s Effectiveness
- LoRA and Model Overparameterization: Research suggests that large language models contain redundant parameters, making them well-suited for low-rank adaptations [119].
- Information Flow in LoRA-Modified Networks: Studies analyzing LoRA-modified transformers indicate that low-rank updates primarily affect key subspaces responsible for task-specific information [120].
- Optimization Landscapes with LoRA: Some researchers have analyzed LoRA’s impact on the optimization landscape, showing that it enables more stable convergence compared to full fine-tuning [121].
F. Challenges and Future Directions
- LoRA for Highly Specialized Tasks: While LoRA works well for many tasks, certain applications requiring extensive parameter updates may benefit from hybrid approaches [123].
- Reducing Inference Overhead: Although LoRA is efficient during training, methods to optimize inference without introducing additional latency remain an open research question [124].
- Automated LoRA Configuration: Developing algorithms that automatically determine the optimal rank and layer placement for LoRA in different architectures can further enhance its usability [125].
- Expanding LoRA Beyond Transformers: Most research has focused on transformers, but exploring LoRA’s applicability to other architectures, such as CNNs and RNNs, could broaden its impact.
G. Summary
VI. Conclusion and Future Perspectives
A. Key Takeaways
- Parameter Efficiency: LoRA drastically reduces the number of trainable parameters by decomposing weight updates into low-rank matrices, making fine-tuning feasible for large-scale models.
- Computational and Memory Benefits: By keeping the original model weights frozen, LoRA significantly lowers the GPU memory footprint and accelerates training compared to full fine-tuning [131].
- Seamless Integration: LoRA has been successfully integrated into widely used deep learning frameworks such as PyTorch and Hugging Face Transformers, facilitating its adoption by the research and industry communities.
- Hybrid and Adaptive Techniques: Recent advancements, such as combining LoRA with prompt tuning, adapter layers, and dynamic rank selection, have further improved its flexibility and effectiveness [132].
- Multimodal and Cross-Domain Applications: LoRA has extended beyond NLP and is now being explored in vision-language models, speech processing, and even reinforcement learning.
- Inference-Time Considerations: While LoRA optimizes training efficiency, reducing inference overhead remains an important area of research.
B. Future Directions
1) Automated LoRA Optimization
2) Reducing Inference Overhead
3) Expanding LoRA Beyond Transformers
4) Continual Learning and On-Device Adaptation
5) LoRA for Foundation Models
C. Final Thoughts
References
- Konstantinidis, T.; Iacovides, G.; Xu, M.; Constantinides, T.G.; Mandic, D.P. Finllama: Financial sentiment classification for algorithmic trading applications. arXiv 2024, arXiv:2403.12285. [Google Scholar]
- Zhu, Y.; Wichers, N.; Lin, C.; Wang, X.; Chen, T.; Shu, L.; Lu, H.; Liu, C.; Luo, L.; Chen, J.; et al. Sira: Sparse mixture of low rank adaptation. arXiv 2023, arXiv:2311.09179. [Google Scholar]
- Chen, T.; Ding, T.; Yadav, B.; Zharkov, I.; Liang, L. Lorashear: Efficient large language model structured pruning and knowledge recovery. arXiv 2023, arXiv:2310.18356. [Google Scholar]
- Chen, Y.; Qian, S.; Tang, H.; Lai, X.; Liu, Z.; Han, S.; Jia, J. Longlora: Efficient fine-tuning of long-context large language models. arXiv 2023, arXiv:2309.12307. [Google Scholar]
- Zhang, H. Sinklora: Enhanced efficiency and chat capabilities for long-context large language models. arXiv 2024, arXiv:2406.05678. [Google Scholar]
- He, J.; Zhou, C.; Ma, X.; Berg-Kirkpatrick, T.; Neubig, G. Towards a unified view of parameter-efficient transfer learning. In International Conference on Learning Representations; 2022. [Google Scholar]
- Meng, X.; Dai, D.; Luo, W.; Yang, Z.; Wu, S.; Wang, X.; Wang, P.; Dong, Q.; Chen, L.; Sui, Z. Periodiclora: Breaking the low-rank bottleneck in lora optimization. arXiv 2024, arXiv:2402.16141. [Google Scholar]
- Wang, H.; Xiao, Z.; Li, Y.; Wang, S.; Chen, G.; Chen, Y. Milora: Harnessing minor singular components for parameter-efficient llm finetuning. arXiv 2024, arXiv:2406.09044. [Google Scholar]
- Zhang, F.; Pilanci, M. Riemannian preconditioned lora for fine-tuning foundation models. arXiv 2024, arXiv:2402.02347. [Google Scholar]
- Gao, C.; Chen, K.; Rao, J.; Sun, B.; Liu, R.; Peng, D.; Zhang, Y.; Guo, X.; Yang, J.; Subrahmanian, V.S. Higher layers need more lora experts. arXiv 2024, arXiv:2402.08562. [Google Scholar]
- Gong, Y.; Zhan, Z.; Jin, Q.; Li, Y.; Idelbayev, Y.; Liu, X.; Zharkov, A.; Aberman, K.; Tulyakov, S.; Wang, Y.; et al. E2gan: Efficient training of efficient gans for image-to-image translation. arXiv 2024, arXiv:2401.06127. [Google Scholar]
- Qin, H.; Ma, X.; Zheng, X.; Li, X.; Zhang, Y.; Liu, S.; Luo, J.; Liu, X.; Magno, M. Accurate lora-finetuning quantization of llms via information retention. arXiv 2024, arXiv:2402.05445. [Google Scholar]
- Yadav, P.; Choshen, L.; Raffel, C.; Bansal, M. Compeft: Compression for communicating parameter efficient updates via sparsification and quantization. arXiv 2023, arXiv:2311.13171. [Google Scholar]
- Asadi, N.; Beitollahi, M.; Khalil, Y.H.; Li, Y.; Zhang, G.; Chen, X. Does combining parameter-efficient modules improve few-shot transfer accuracy? arXiv 2024, arXiv:2402.15414. [Google Scholar]
- Zhang, M.; Chen, H.; Shen, C.; Yang, Z.; Ou, L.; Yu, X.; Zhuang, B. Loraprune: Pruning meets low-rank parameter-efficient fine-tuning. arXiv 2023, arXiv:2305.18403. [Google Scholar]
- Hendrycks, D.; Burns, C.; Basart, S.; Zou, A.; Mazeika, M.; Song, D.; Steinhardt, J. Measuring massive multitask language understanding. arXiv 2020, arXiv:2009.03300. [Google Scholar]
- Zniyed, Y.; Nguyen, T.P.; et al. Efficient tensor decomposition-based filter pruning. Neural Networks 2024, 178, 106393. [Google Scholar]
- Liu, Z.; Lyn, J.; Zhu, W.; Tian, X.; Graham, Y. Alora: Allocating low-rank adaptation for fine-tuning large language models. arXiv 2024, arXiv:2403.16187. [Google Scholar]
- Xu, Y.; Xie, L.; Gu, X.; Chen, X.; Chang, H.; Zhang, H.; Chen, Z.; Zhang, X.; Tian, Q. Qa-lora: Quantization-aware low-rank adaptation of large language models. arXiv 2023, arXiv:2309.14717. [Google Scholar]
- Zhang, Y.; Wang, M.; Wu, Y.; Tiwari, P.; Li, Q.; Wang, B.; Qin, J. Dialoguellm: Context and emotion knowledge-tuned large language models for emotion recognition in conversations. arXiv 2024, arXiv:2310.11374. [Google Scholar]
- Wang, H.; Xiang, X.; Fan, Y.; Xue, J. Customizing 360-degree panoramas through text-to-image diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; 2024; pp. 4933–4943. [Google Scholar]
- Zhang, F.; Li, L.; Chen, J.; Jiang, Z.; Wang, B.; Qian, Y. Increlora: Incremental parameter allocation method for parameter-efficient fine-tuning. arXiv 2023, arXiv:2308.12043. [Google Scholar]
- Hu, Y.; Xie, Y.; Wang, T.; Chen, M.; Pan, Z. Structure-aware low-rank adaptation for parameter-efficient fine-tuning. Mathematics 2023, 11, 4317. [Google Scholar] [CrossRef]
- Ma, Y.; Fan, Y.; Ji, J.; Wang, H.; Sun, X.; Jiang, G.; Shu, A.; Ji, R. X-dreamer: Creating high-quality 3d content by bridging the domain gap between text-to-2d and text-to-3d generation. arXiv 2023, arXiv:2312.00085. [Google Scholar]
- He, X.; Li, C.; Zhang, P.; Yang, J.; Wang, X.E. Parameter-efficient model adaptation for vision transformers. In Thirty-Seventh AAAI Conference on Artificial Intelligence; 2023; pp. 817–825. [Google Scholar]
- Belofsky, J. Token-level adaptation of lora adapters for downstream task generalization. In 6th Artificial Intelligence and Cloud Computing Conference; 2023; pp. 168–172. [Google Scholar]
- Suri, K.; Mishra, P.; Saha, S.; Singh, A. Suryakiran at mediqa-sum 2023: Leveraging lora for clinical dialogue summarization. In Working Notes of the Conference and Labs of the Evaluation Forum; 2023; pp. 1720–1735. [Google Scholar]
- Li, S.; Lu, H.; Wu, T.; Yu, M.; Weng, Q.; Chen, X.; Shan, Y.; Yuan, B.; Wang, W. Caraserve: Cpu-assisted and rank-aware lora serving for generative llm inference. arXiv 2024, arXiv:2401.11240. [Google Scholar]
- Li, S. Diffstyler: Diffusion-based localized image style transfer. arXiv 2024, arXiv:2403.18461. [Google Scholar]
- Miles, R.; Reddy, P.; Elezi, I.; Deng, J. Velora: Memory efficient training using rank-1 sub-token projections. arXiv 2024, arXiv:2405.17991. [Google Scholar]
- Pan, R.; Liu, X.; Diao, S.; Pi, R.; Zhang, J.; Han, C.; Zhang, T. LISA: layerwise importance sampling for memory-efficient large language model fine-tuning. arXiv 2024, arXiv:2403.17919. [Google Scholar]
- Frank, M.; Wolfe, P.; et al. An algorithm for quadratic programming. Naval Research Logistics Quarterly 1956, 3, 95–110. [Google Scholar] [CrossRef]
- Wang, A.; Islam, M.; Xu, M.; Zhang, Y.; Ren, H. SAM meets robotic surgery: An empirical study on generalization, robustness and adaptation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; 2023; pp. 234–244. [Google Scholar]
- Gema, A.P.; Daines, L.; Minervini, P.; Alex, B. Parameter-efficient fine-tuning of llama for the clinical domain. arXiv 2023, arXiv:2307.03042. [Google Scholar]
- Sui, Y.; Yin, M.; Gong, Y.; Xiao, J.; Phan, H.; Yuan, B. ELRT: efficient low-rank training for compact convolutional neural networks. arXiv 2024, arXiv:2401.10341. [Google Scholar]
- Kim, S.; Yang, H.; Kim, Y.; Hong, Y.; Park, E. Hydra: Multi-head low-rank adaptation for parameter efficient fine-tuning. Neural Networks 2024, 106414. [Google Scholar] [CrossRef]
- Bhatti, A.; Parmar, S.; Lee, S. SM70: A large language model for medical devices. arXiv 2023, arXiv:2312.06974. [Google Scholar]
- Sun, Y.; Li, Z.; Li, Y.; Ding, B. Improving LoRA in privacy-preserving federated learning. arXiv 2024, arXiv:2403.12313. [Google Scholar]
- Liu, Y.; An, C.; Qiu, X. Y-tuning: An efficient tuning paradigm for large-scale pre-trained models via label representation learning. Frontiers of Computer Science 2024, 18, 184320. [Google Scholar] [CrossRef]
- Li, Y.; Yu, Y.; Liang, C.; He, P.; Karampatziakis, N.; Chen, W.; Zhao, T. Loftq: Lora-fine-tuning-aware quantization for large language models. arXiv 2023, arXiv:2310.08659. [Google Scholar]
- Smith, J.S.; Cascante-Bonilla, P.; Arbelle, A.; Kim, D.; Panda, R.; Cox, D.D.; Yang, D.; Kira, Z.; Feris, R.; Karlinsky, L. Construct-vl: Data-free continual structured VL concepts learning. In IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2023; pp. 14994–15004. [Google Scholar]
- Han, A.; Li, J.; Huang, W.; Hong, M.; Takeda, A.; Jawanpuria, P.; Mishra, B. Sltrain: a sparse plus low-rank approach for parameter and memory efficient pretraining. arXiv 2024, arXiv:2406.02214. [Google Scholar]
- Ayupov, S.; Chirkova, N. Parameter-efficient finetuning of transformers for source code. arXiv 2022, arXiv:2212.05901. [Google Scholar]
- Huang, T.; Zeng, Y.; Zhang, Z.; Xu, W.; Xu, H.; Xu, S.; Lau, R.W.H.; Zuo, W. Dreamcontrol: Control-based text-to-3d generation with 3d self-prior. arXiv 2023, arXiv:2312.06439. [Google Scholar]
- Blattmann, A.; Dockhorn, T.; Kulal, S.; Mendelevitch, D.; Kilian, M.; Lorenz, D.; Levi, Y.; English, Z.; Voleti, V.; Letts, A.; et al. Stable video diffusion: Scaling latent video diffusion models to large datasets. arXiv 2023, arXiv:2311.15127. [Google Scholar]
- Liu, S.; Keung, J.; Yang, Z.; Liu, F.; Zhou, Q.; Liao, Y. Delving into parameter-efficient fine-tuning in code change learning: An empirical study. arXiv 2024, arXiv:2402.06247. [Google Scholar]
- Zhang, R.; Qiang, R.; Somayajula, S.A.; Xie, P. Autolora: Automatically tuning matrix ranks in low-rank adaptation based on meta learning. arXiv 2024, arXiv:2403.09113. [Google Scholar]
- Bornheim, T.; Grieger, N.; Blaneck, P.G.; Bialonski, S. Speaker attribution in german parliamentary debates with qlora-adapted large language models. arXiv 2024, arXiv:2309.09902. [Google Scholar] [CrossRef]
- Yeo, J.H.; Han, S.; Kim, M.; Ro, Y.M. Where visual speech meets language: VSP-LLM framework for efficient and context-aware visual speech processing. arXiv 2024, arXiv:2402.15151. [Google Scholar]
- Qiang, R.; Zhang, R.; Xie, P. Bilora: A bi-level optimization framework for overfitting-resilient low-rank adaptation of large pre-trained models. arXiv 2024, arXiv:2403.13037. [Google Scholar]
- Gallego-Posada, J.; Ramirez, J.; Erraqabi, A.; Bengio, Y.; Lacoste-Julien, S. Controlled sparsity via constrained optimization or: How I learned to stop tuning penalties and love constraints. In Annual Conference on Neural Information Processing Systems; 2022. [Google Scholar]
- Zhai, Y.; Zhang, H.; Lei, Y.; Yu, Y.; Xu, K.; Feng, D.; Ding, B.; Wang, H. Uncertainty-penalized reinforcement learning from human feedback with diverse reward lora ensembles. arXiv 2024, arXiv:2401.00243. [Google Scholar]
- Jin, F.; Liu, Y.; Tan, Y. Derivative-free optimization for low-rank adaptation in large language models. arXiv 2024, arXiv:2403.01754. [Google Scholar] [CrossRef]
- Jang, U.; Lee, J.D.; Ryu, E.K. Lora training in the NTK regime has no spurious local minima. arXiv 2024, arXiv:2402.11867. [Google Scholar]
- Shen, Y.; Xu, Z.; Wang, Q.; Cheng, Y.; Yin, W.; Huang, L. Multimodal instruction tuning with conditional mixture of lora. arXiv 2024, arXiv:2402.15896. [Google Scholar]
- Lee, A.N.; Hunter, C.J.; Ruiz, N. Platypus: Quick, cheap, and powerful refinement of llms. arXiv 2023, arXiv:2308.07317. [Google Scholar]
- Zhou, H.; Lu, X.; Xu, W.; Zhu, C.; Zhao, T. Lora-drop: Efficient lora parameter pruning based on output evaluation. arXiv 2024, arXiv:2402.07721. [Google Scholar]
- Zi, B.; Qi, X.; Wang, L.; Wang, J.; Wong, K.; Zhang, L. Delta-lora: Fine-tuning high-rank parameters with the delta of low-rank matrices. arXiv 2023, arXiv:2309.02411. [Google Scholar]
- Sun, J.; Fu, D.; Hu, Y.; Wang, S.; Rassin, R.; Juan, D.-C.; Alon, D.; Herrmann, C.; van Steenkiste, S.; Krishna, R.; et al. Dreamsync: Aligning text-to-image generation with image understanding feedback. In Synthetic Data for Computer Vision Workshop@ CVPR 2024; 2023. [Google Scholar]
- Luo, S.; Tan, Y.; Patil, S.; Gu, D.; von Platen, P.; Passos, A.; Huang, L.; Li, J.; Zhao, H. Lcm-lora: A universal stable-diffusion acceleration module. arXiv 2023, arXiv:2311.05556. [Google Scholar]
- Meng, F.; Wang, Z.; Zhang, M. Pissa: Principal singular values and singular vectors adaptation of large language models. arXiv 2024, arXiv:2404.02948. [Google Scholar]
- Ye, M.; Fang, X.; Du, B.; Yuen, P.C.; Tao, D. Heterogeneous federated learning: State-of-the-art and research challenges. ACM Computing Surveys 2024, 56, 79. [Google Scholar] [CrossRef]
- Li, H.; Koto, F.; Wu, M.; Aji, A.F.; Baldwin, T. Bactrian-x: Multilingual replicable instruction-following models with low-rank adaptation. arXiv 2023, arXiv:2305.15011. [Google Scholar]
- Valipour, M.; Rezagholizadeh, M.; Kobyzev, I.; Ghodsi, A. Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv 2022, arXiv:2210.07558. [Google Scholar]
- Sidahmed, H.; Phatale, S.; Hutcheson, A.; Lin, Z.; Chen, Z.; Yu, Z.; Jin, J.; Komarytsia, R.; Ahlheim, C.; Zhu, Y.; et al. Perl: Parameter efficient reinforcement learning from human feedback. arXiv 2024, arXiv:2403.10704. [Google Scholar]
- Sun, Y.; Li, M.; Cao, Y.; Wang, K.; Wang, W.; Zeng, X.; Zhao, R. To be or not to be? an exploration of continuously controllable prompt engineering. arXiv 2023, arXiv:2311.09773. [Google Scholar]
- Quan, S. Dmoerm: Recipes of mixture-of-experts for effective reward modeling. arXiv 2024, arXiv:2403.01197. [Google Scholar]
- Zhang, L.; Zhang, L.; Shi, S.; Chu, X.; Li, B. Lora-fa: Memory-efficient low-rank adaptation for large language models fine-tuning. arXiv 2023, arXiv:2308.03303. [Google Scholar]
- Wang, X.; Aitchison, L.; Rudolph, M. Lora ensembles for large language model fine-tuning. arXiv 2023, arXiv:2310.00035. [Google Scholar]
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. Proceedings of North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT; 2019; pp. 4171–4186. [Google Scholar]
- Zhang, Y.; Wang, J.; Yu, L.; Xu, D.; Zhang, X. Personalized lora for human-centered text understanding. In Thirty-Eighth AAAI Conference on Artificial Intelligence; 2024; pp. 19588–19596. [Google Scholar]
- Zaken, E.B.; Goldberg, Y.; Ravfogel, S. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers); 2022; pp. 1–9. [Google Scholar]
- Zhu, Y.; Yang, X.; Wu, Y.; Zhang, W. Parameter-efficient fine-tuning with layer pruning on free-text sequence-to-sequence modeling. arXiv 2023, arXiv:2305.08285. [Google Scholar]
- Yang, A.X.; Robeyns, M.; Wang, X.; Aitchison, L. Bayesian low-rank adaptation for large language models. arXiv 2023, arXiv:2308.13111. [Google Scholar]
- Chen, L.; Ye, Z.; Wu, Y.; Zhuo, D.; Ceze, L.; Krishnamurthy, A. Punica: Multi-tenant lora serving. Proceedings of Machine Learning and Systems; 2024; pp. 1–13. [Google Scholar]
- Ding, H.; Gao, J.; Yuan, Y.; Wang, Q. Samlp: A customized segment anything model for license plate detection. arXiv 2024, arXiv:2401.06374. [Google Scholar]
- Zeng, Y.; Lee, K. The expressive power of low-rank adaptation. arXiv 2023, arXiv:2310.17513. [Google Scholar]
- Khandelwal, A. Infusion: Inject and attention fusion for multi concept zero-shot text-based video editing. In Proceedings of the IEEE/CVF International Conference on Computer Vision; 2023; pp. 3017–3026. [Google Scholar]
- Louizos, C.; Welling, M.; Kingma, D.P. Learning sparse neural networks through l0 regularization. arXiv 2017, arXiv:1712.01312. [Google Scholar]
- Liu, Q.; Wu, X.; Zhao, X.; Zhu, Y.; Xu, D.; Tian, F.; Zheng, Y. Moelora: An moe-based parameter efficient fine-tuning method for multi-task medical applications. arXiv 2023, arXiv:2310.18339. [Google Scholar]
- Yang, S.; Zhou, Y.; Liu, Z.; Loy, C.C. Rerender A video: Zero-shot text-guided video-to-video translation. In SIGGRAPH Asia 2023 Conference Papers; 2023; pp. 1–11. [Google Scholar]
- Liu, S.; Wang, C.; Yin, H.; Molchanov, P.; Wang, Y.F.; Cheng, K.; Chen, M. Dora: Weight-decomposed low-rank adaptation. arXiv 2024, arXiv:2402.09353. [Google Scholar]
- Ding, N.; Lv, X.; Wang, Q.; Chen, Y.; Zhou, B.; Liu, Z.; Sun, M. Sparse low-rank adaptation of pre-trained language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; 2023; pp. 4133–4145. [Google Scholar]
- Fomenko, V.; Yu, H.; Lee, J.; Hsieh, S.; Chen, W. A note on lora. arXiv 2024, arXiv:2404.05086. [Google Scholar]
- Zhang, S.; Chen, Z.; Chen, S.; Shen, Y.; Sun, Z.; Gan, C. Improving reinforcement learning from human feedback with efficient reward model ensemble. arXiv 2024, arXiv:2401.16635. [Google Scholar]
- Zhang, J.; Chen, S.; Liu, J.; He, J. Composing parameter-efficient modules with arithmetic operations. arXiv 2023, arXiv:2306.14870. [Google Scholar]
- Ding, N.; Qin, Y.; Yang, G.; Wei, F.; Yang, Z.; Su, Y.; Hu, S.; Chen, Y.; Chan, C.; Chen, W.; et al. Parameter-efficient fine-tuning of large-scale pre-trained language models. Nat. Mach. Intell. 2023, 5, 220–235. [Google Scholar] [CrossRef]
- Ren, P.; Shi, C.; Wu, S.; Zhang, M.; Ren, Z.; de Rijke, M.; Chen, Z.; Pei, J. Mini-ensemble low-rank adapters for parameter-efficient fine-tuning. arXiv 2024, arXiv:2402.17263. [Google Scholar]
- Feng, W.; Zhu, L.; Yu, L. Cheap lunch for medical image segmentation by fine-tuning SAM on few exemplars. arXiv 2023, arXiv:2308.14133. [Google Scholar]
- Yang, H.; Wang, Y.; Xu, X.; Zhang, H.; Bian, Y. Can we trust llms? Mitigate overconfidence bias in llms through knowledge transfer. arXiv 2024, arXiv:2405.16856. [Google Scholar]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. Lora: Low-rank adaptation of large language models. In The Tenth International Conference on Learning Representations; 2022. [Google Scholar]
- Jiang, W.; Lin, B.; Shi, H.; Zhang, Y.; Li, Z.; Kwok, J.T. Effective and parameter-efficient reusing fine-tuned models. arXiv 2023, arXiv:2310.01886. [Google Scholar]
- Bałazy, K.; Banaei, M.; Aberer, K.; Tabor, J. Lora-xs: Low-rank adaptation with extremely small number of parameters. arXiv 2024, arXiv:2405.17604. [Google Scholar]
- Lialin, V.; Muckatira, S.; Shivagunde, N.; Rumshisky, A. Relora: High-rank training through low-rank updates. In The Twelfth International Conference on Learning Representations; 2023. [Google Scholar]
- Zhao, J.; Zhang, Z.; Chen, B.; Wang, Z.; Anandkumar, A.; Tian, Y. Galore: Memory-efficient LLM training by gradient low-rank projection. arXiv 2024, arXiv:2403.03507. [Google Scholar]
- Yan, Y.; Tang, S.; Shi, Z.; Yang, Q. FeDeRA: Efficient fine-tuning of language models in federated learning leveraging weight decomposition. arXiv 2024, arXiv:2404.18848. [Google Scholar]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Liu, T.; Low, B.K.H. Goat: Fine-tuned llama outperforms GPT-4 on arithmetic tasks. arXiv 2023, arXiv:2305.14201. [Google Scholar]
- Woo, S.; Park, B.; Kim, B.; Jo, M.; Kwon, S.; Jeon, D.; Lee, D. Dropbp: Accelerating fine-tuning of large language models by dropping backward propagation. arXiv 2024, arXiv:2402.17812. [Google Scholar]
- Malladi, S.; Wettig, A.; Yu, D.; Chen, D.; Arora, S. A kernel-based view of language model fine-tuning. In International Conference on Machine Learning; 2023; pp. 23610–23641. [Google Scholar]
- Yu, K.; Liu, J.; Feng, M.; Cui, M.; Xie, X. Boosting3d: High-fidelity image-to-3d by boosting 2d diffusion prior to 3d prior with progressive learning. arXiv 2023, arXiv:2311.13617. [Google Scholar]
- Chitale, R.; Vaidya, A.; Kane, A.; Ghotkar, A. Task arithmetic with lora for continual learning. arXiv 2023, arXiv:2311.02428. [Google Scholar]
- Yang, J. Longqlora: Efficient and effective method to extend context length of large language models. arXiv 2023, arXiv:2311.04879. [Google Scholar]
- Zhao, Z.; Gan, L.; Wang, G.; Zhou, W.; Yang, H.; Kuang, K.; Wu, F. Loraretriever: Input-aware lora retrieval and composition for mixed tasks in the wild. arXiv 2024, arXiv:2402.09997. [Google Scholar]
- Toma, A.; Lawler, P.R.; Ba, J.; Krishnan, R.G.; Rubin, B.B.; Wang, B. Clinical camel: An open-source expert-level medical language model with dialogue-based knowledge encoding. arXiv 2023, arXiv:2305.12031. [Google Scholar]
- Roberson, R.; Kaki, G.; Trivedi, A. Analyzing the effectiveness of large language models on text-to-sql synthesis. arXiv 2024, arXiv:2401.12379. [Google Scholar]
- Chen, Z.; Huang, H.; Andrusenko, A.; Hrinchuk, O.; Puvvada, K.C.; Li, J.; Ghosh, S.; Balam, J.; Ginsburg, B. SALM: speech-augmented language model with in-context learning for speech recognition and translation. arXiv 2023, arXiv:2310.09424. [Google Scholar]
- Yi, L.; Yu, H.; Wang, G.; Liu, X.; Li, X. pFedLoRA: Model-Heterogeneous Personalized Federated Learning with LoRA Tuning. arXiv 2023, arXiv:2310.13283. [Google Scholar]
- Guo, Y.; Yang, C.; Rao, A.; Wang, Y.; Qiao, Y.; Lin, D.; Dai, B. Animatediff: Animate your personalized text-to-image diffusion models without specific tuning. arXiv 2023, arXiv:2307.04725. [Google Scholar]
- Geshkovski, B.; Letrouit, C.; Polyanskiy, Y.; Rigollet, P. The emergence of clusters in self-attention dynamics. In Annual Conference on Neural Information Processing Systems; 2023. [Google Scholar]
- Sun, S.; Gupta, D.; Iyyer, M. Exploring the impact of low-rank adaptation on the performance, efficiency, and regularization of RLHF. arXiv 2023, arXiv:2309.09055. [Google Scholar]
- Wu, Y.; Xiang, Y.; Huo, S.; Gong, Y.; Liang, P. Lora-sp: streamlined partial parameter adaptation for resource efficient fine-tuning of large language models. In Third International Conference on Algorithms, Microchips, and Network Applications; 2024; pp. 488–496. [Google Scholar]
- Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Wu, Z.; Chang, B.; Sun, X.; Xu, J.; Li, L.; Sui, Z. A survey for in-context learning. arXiv 2023, arXiv:2301.00234. [Google Scholar]
- Jeon, H.; Kim, Y.; Kim, J.-J. L4q: Parameter efficient quantization-aware training on large language models via lora-wise lsq. arXiv 2024, arXiv:2402.04902. [Google Scholar]
- Deng, Y.; Wang, R.; Zhang, Y.; Tai, Y.; Tang, C. Dragvideo: Interactive drag-style video editing. arXiv 2023, arXiv:2312.02216. [Google Scholar]
- Chen, Z.; Wang, Z.; Wang, Z.; Liu, H.; Yin, Z.; Liu, S.; Sheng, L.; Ouyang, W.; Qiao, Y.; Shao, J. Octavius: Mitigating task interference in mllms via moe. arXiv 2023, arXiv:2311.02684. [Google Scholar]
- Zniyed, Y.; Nguyen, T.P.; et al. Enhanced network compression through tensor decompositions and pruning. IEEE Transactions on Neural Networks and Learning Systems 2024. [Google Scholar]
- Goodfellow, I.J.; Bengio, Y.; Courville, A.C. Deep Learning, ser. Adaptive computation and machine learning; MIT Press, 2016. [Google Scholar]
- Hansen, N.; Ostermeier, A. Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation. Proceedings of IEEE international conference on evolutionary computation; 1996; pp. 312–317. [Google Scholar]
- Wang, S.; Chen, L.; Jiang, J.; Xue, B.; Kong, L.; Wu, C. Lora meets dropout under a unified framework. arXiv 2024, arXiv:2403.00812. [Google Scholar]
- Geshkovski, B.; Letrouit, C.; Polyanskiy, Y.; Rigollet, P. A mathematical perspective on transformers. arXiv 2023, arXiv:2312.10794. [Google Scholar]
- Zhong, M.; Shen, Y.; Wang, S.; Lu, Y.; Jiao, Y.; Ouyang, S.; Yu, D.; Han, J.; Chen, W. Multi-lora composition for image generation. arXiv 2024, arXiv:2402.16843. [Google Scholar]
- Sheng, Y.; Cao, S.; Li, D.; Hooper, C.; Lee, N.; Yang, S.; Chou, C.; Zhu, B.; Zheng, L.; Keutzer, K.; et al. S-lora: Serving thousands of concurrent lora adapters. arXiv 2023, arXiv:2311.03285. [Google Scholar]
- Li, J.; Lei, Y.; Bian, Y.; Cheng, D.; Ding, Z.; Jiang, C. Ra-cfgpt: Chinese financial assistant with retrieval-augmented large language model. Frontiers of Computer Science 2024, 18, 185350. [Google Scholar] [CrossRef]
- Yoo, S.; Kim, K.; Kim, V.G.; Sung, M. As-plausible-as-possible: Plausibility-aware mesh deformation using 2d diffusion priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2024; pp. 4315–4324. [Google Scholar]
- Tan, W.; Zhang, W.; Liu, S.; Zheng, L.; Wang, X.; An, B. True knowledge comes from practice: Aligning llms with embodied environments via reinforcement learning. arXiv 2024, arXiv:2401.14151. [Google Scholar]
- Qi, Z.; Tan, X.; Shi, S.; Qu, C.; Xu, Y.; Qi, Y. PILLOW: enhancing efficient instruction fine-tuning via prompt matching. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track; 2023; pp. 471–482. [Google Scholar]
- Gou, Y.; Liu, Z.; Chen, K.; Hong, L.; Xu, H.; Li, A.; Yeung, D.; Kwok, J.T.; Zhang, Y. Mixture of cluster-conditional lora experts for vision-language instruction tuning. arXiv 2023, arXiv:2312.12379. [Google Scholar]
- Aghajanyan, A.; Gupta, S.; Zettlemoyer, L. Intrinsic dimensionality explains the effectiveness of language model fine-tuning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing; 2021; pp. 7319–7328. [Google Scholar]
- Biderman, D.; Ortiz, J.J.G.; Portes, J.; Paul, M.; Greengard, P.; Jennings, C.; King, D.; Havens, S.; Chiley, V.; Frankle, J.; et al. Lora learns less and forgets less. arXiv 2024, arXiv:2405.09673. [Google Scholar]
- Ge, Y.; Ge, Y.; Zeng, Z.; Wang, X.; Shan, Y. Planting a SEED of vision in large language model. arXiv 2023, arXiv:2307.08041. [Google Scholar]
- Kopiczko, D.J.; Blankevoort, T.; Asano, Y.M. Vera: Vector-based random matrix adaptation. arXiv 2023, arXiv:2310.11454. [Google Scholar]
- Salimans, T.; Kingma, D.P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In Advances in neural information processing systems; 2016; p. 901. [Google Scholar]
- Yang, A.X.; Robeyns, M.; Coste, T.; Wang, J.; Bou-Ammar, H.; Aitchison, L. Bayesian reward models for LLM alignment. arXiv 2024, arXiv:2402.13210. [Google Scholar]
- Sakaguchi, K.; Bras, R.L.; Bhagavatula, C.; Choi, Y. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM 2021, 64, 99–106. [Google Scholar] [CrossRef]
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Guo, Q.; Wang, M.; et al. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997. [Google Scholar]
- Ye, Z.; Lovell, L.; Faramarzi, A.; Ninic, J. Sam-based instance segmentation models for the automation of structural damage detection. arXiv 2024, arXiv:2401.15266. [Google Scholar] [CrossRef]
- Wang, A.; Singh, A.; Michael, J.; Hill, F.; Levy, O.; Bowman, S.R. Glue: A multi-task benchmark and analysis platform for natural language understanding. arXiv 2018, arXiv:1804.07461. [Google Scholar]
- Shi, J.; Hua, H. Space narrative: Generating images and 3d scenes of chinese garden from text using deep learning. In xArch-creativity in the age of digital reproduction symposium; 2023; pp. 236–243. [Google Scholar]
- Liao, B.; Monz, C. Apiq: Finetuning of 2-bit quantized large language model. arXiv 2024, arXiv:2402.05147. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).