Submitted: 21 March 2026
Posted: 23 March 2026
Abstract
Keywords:
1. Introduction
- Memory representation describes where and in what form information resides: token-level memory in the input context, intermediate latent memory as inference-time states (e.g., Key–Value caches), and parameter-level memory in model weights via adaptation or editing.
- Memory management describes how memory is operated over time to satisfy task requirements under practical constraints. Across representations, we observe a shared interface of three core operations: memory construction (what to store and how to structure it), memory update (how to maintain, consolidate, or remove stored content), and memory query (how to select and integrate relevant information during inference).
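
The shared interface of construction, update, and query can be illustrated with a minimal sketch. It assumes a toy token-level memory that holds plain-text entries and ranks them by word overlap; the class name `TokenMemory` and its method signatures are hypothetical and not taken from any surveyed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TokenMemory:
    """Toy token-level memory (hypothetical): plain-text entries keyed by id."""

    entries: Dict[str, str] = field(default_factory=dict)

    def construct(self, key: str, text: str) -> None:
        # Memory construction: decide what to store and under which key.
        self.entries[key] = text

    def update(self, key: str, text: Optional[str]) -> None:
        # Memory update: overwrite an entry, or remove it when text is None.
        if text is None:
            self.entries.pop(key, None)
        else:
            self.entries[key] = text

    def query(self, question: str, top_k: int = 1) -> List[str]:
        # Memory query: rank entries by word overlap with the question.
        q_words = set(question.lower().split())
        ranked = sorted(
            self.entries.values(),
            key=lambda t: len(q_words & set(t.lower().split())),
            reverse=True,
        )
        return ranked[:top_k]


mem = TokenMemory()
mem.construct("trip", "User plans a trip to Paris")
mem.construct("pet", "User has a cat named Miso")
top = mem.query("Paris trip")
```

A latent or parameter-level memory would expose the same three operations but realize them differently, for example as cache eviction or weight editing rather than dictionary writes.
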
2. Natural Language Tokens as Memory

2.1. Retrieval-Augmented Generation (RAG)
2.2. Agentic Memory

3. Intermediate Latent as Memory

3.1. KV Cache as Memory
3.2. Other Vectors as Memory

4. Parameter as Memory


5. Discussion

6. Conclusions
Limitations
Appendix A. Related Surveys
Appendix B. Overview of Human and LLM Memory and Taxonomy
Appendix B.1. Human Memory

| Memory Type | Key Function/Characteristics | Duration/Capacity |
|---|---|---|
| Sensory Memory | Brief buffer for incoming sensory information (visual, auditory, etc.) | Milliseconds to a few seconds |
| Working Memory (WM) | Transient active store for manipulating information; supports complex cognitive operations (reasoning, language) | Tens of seconds to minutes; limited items |
| Short-Term Memory (STM) | Temporary holding of information before transfer to LTM or forgetting | Tens of seconds to minutes; limited items |
| Long-Term Memory (LTM) | Stores information for extended periods; large capacity and durability | Minutes to decades; vast capacity |
| Declarative (Explicit) | Consciously recalled facts and events | Minutes to decades; vast capacity |
| Episodic Memory | Personal experiences, specific events with contextual details | Minutes to decades |
| Semantic Memory | General world knowledge, facts, concepts, language | Minutes to decades |
| Non-Declarative (Implicit) | Unconscious learning: skills, habits, priming, conditioning | Acquired slowly, long-lasting |
Appendix B.2. LLM Memory

Question: Can you remind me about the trip I mentioned planning to Paris?

| Memory Index | Example Memory | Role |
|---|---|---|
| Vector Database | Embedding match - Conversation: "User plans a trip to Paris in September 2025, interested in visiting the Louvre and Eiffel Tower." | Enables semantic retrieval by matching the query’s meaning to stored embeddings, useful for broad or vague queries. |
| Time Index | On July 15, 2025, at 14:30, the user said, "I’m planning a trip to Paris next month and want to see the Louvre." | Organizes memories chronologically, ideal for queries referencing recent or specific dates. |
| Username Index | User "JaneDoe123" discussed a Paris trip, mentioning a preference for art museums. | Ensures personalization by linking memories to a specific user, enhancing relevance. |
| Event Name Index | Event "Paris Trip 2025": User plans to visit Paris, focusing on cultural landmarks. | Tags memories with specific events, allowing precise retrieval for event-related queries. |
| Story Index | Story "Jane’s European Adventure": Includes a chapter on planning a Paris trip, with details about booking a hotel near the Seine. | Structures memories as narratives, preserving context across related interactions. |
| Place Index | Place "Paris, France": User mentioned visiting the Eiffel Tower and dining at a café in Montmartre. | Associates memories with locations, enabling spatial queries about specific places. |
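
The metadata indexes in the table above (time, username, event name, place) can be sketched as filters over structured memory records. This is an illustrative assumption rather than the implementation of any surveyed system: `MemoryRecord` and `lookup` are hypothetical names, the second record is invented for contrast, and the vector-database and story indexes are omitted because they would need an embedding model or narrative structure.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class MemoryRecord:
    text: str
    user: str                    # username index
    timestamp: datetime          # time index
    event: Optional[str] = None  # event name index
    place: Optional[str] = None  # place index


def lookup(
    records: List[MemoryRecord],
    user: Optional[str] = None,
    event: Optional[str] = None,
    place: Optional[str] = None,
    since: Optional[datetime] = None,
) -> List[MemoryRecord]:
    """Return records that satisfy every supplied filter, oldest first."""
    hits = [
        r for r in records
        if (user is None or r.user == user)
        and (event is None or r.event == event)
        and (place is None or r.place == place)
        and (since is None or r.timestamp >= since)
    ]
    return sorted(hits, key=lambda r: r.timestamp)


records = [
    MemoryRecord(
        text="I'm planning a trip to Paris next month and want to see the Louvre.",
        user="JaneDoe123",
        timestamp=datetime(2025, 7, 15, 14, 30),
        event="Paris Trip 2025",
        place="Paris, France",
    ),
    # Hypothetical unrelated record, added only to show filtering.
    MemoryRecord(
        text="Asked for a pasta recipe.",
        user="JaneDoe123",
        timestamp=datetime(2025, 6, 2, 9, 0),
    ),
]

paris_hits = lookup(records, user="JaneDoe123", place="Paris, France")
```

Combining filters narrows retrieval to one user and one place before any semantic matching is applied, which is one way such indexes could complement embedding search.
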
Appendix B.3. Taxonomy of Memory Implementations
| Implementation Ways | Memory Type | Forgetting Pretrained Knowledge | Memory Scalability | Explainability | Serving Costs |
|---|---|---|---|---|---|
| In-context Learning | Short Term | No | High | High | Low |
| In-context Learning | Long Term | No | Weak | High | Low |
| Parameter by Training | Short Term | Weak | High | Low | Medium |
| Parameter by Training | Long Term | Severe | High | Low | High |

| Implementation Ways | Memory Type | Forgetting Pretrained Knowledge | Explainability | Costs | Knowledge Match Degree |
|---|---|---|---|---|---|
| In-context Learning | Procedural | No | High | Low | Low |
| In-context Learning | Episodic | No | High | Low | High |
| In-context Learning | Semantic | No | High | Low | High |
| Parameter by Training | Procedural | Possible | Low | High | High |
| Parameter by Training | Episodic | Possible | Low | High | Moderate |
| Parameter by Training | Semantic | Possible | Low | High | Moderate |
Appendix B.4. How Human Memory Benefits LLM Agentic Applications
Appendix C. Future Directions
Appendix D. Taxonomy of Different Memory
References
- Qian, C.; Cong, X.; Yang, C.; Chen, W.; Su, Y.; Xu, J.; Liu, Z.; Sun, M. Communicative agents for software development. arXiv preprint arXiv:2307.07924 2023.
- Gao, C.; Lan, X.; Lu, Z.; Mao, J.; Piao, J.; Wang, H.; Jin, D.; Li, Y. S3: Social-network simulation system with large language model-empowered agents. arxiv 2023, [arXiv:cs.AI/arXiv:2307.14984].
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. arxiv 2023, [arXiv:cs.AI/arXiv:2308.11432].
- Gao, M., T. Lu, K. Yu, A. Byerly, and D. Khashabi. 2024. Insights into LLM Long-Context Failures: When Transformers Know but Don’t Tell. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024. Edited by Y. Al-Onaizan, M. Bansal and Y.N. Chen. Miami, Florida, USA. [Google Scholar]
- Zhang, Z.; Bo, X.; Ma, C.; Li, R.; Chen, X.; Dai, Q.; Zhu, J.; Dong, Z.; Wen, J.R. A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501 2024.
- Wu, Y.; Liang, S.; Zhang, C.; Wang, Y.; Zhang, Y.; Guo, H.; Tang, R.; Liu, Y. From Human Memory to AI Memory: A Survey on Memory Mechanisms in the Era of LLMs, 2025, [arXiv:cs.IR/2504.15965].
- Shinwari, H.U.K.; Usama, M. Memory-Augmented Architecture for Long-Term Context Handling in Large Language Models. arXiv preprint arXiv:2506.18271 2025.
- Maharana, A.; Lee, D.H.; Tulyakov, S.; Bansal, M.; Barbieri, F.; Fang, Y. Evaluating very long-term conversational memory of llm agents. arXiv preprint arXiv:2402.17753 2024. arXiv:2402.17753.
- Wan, L.; Ma, W. StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns. arXiv preprint arXiv:2506.13356 2025.
- Shinn, N., F. Cassano, A. Gopinath, K.R. Narasimhan, and S. Yao. Reflexion: Language agents with verbal reinforcement learning. Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023. [Google Scholar]
- Zhong, W.; Guo, L.; Gao, Q.; Wang, Y. Memorybank: Enhancing large language models with long-term memory. arXiv preprint arXiv:2305.10250 2023.
- Modarressi, A.; Imani, A.; Fayyaz, M.; Schütze, H. Ret-llm: Towards a general read-write memory for large language models. arxiv 2023, [arXiv:cs.AI/arXiv:2305.14322].
- Zhong, W., L. Guo, Q. Gao, H. Ye, and Y. Wang. 2024. Memorybank: Enhancing large language models with long-term memory. Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38: 19724–19731. [Google Scholar] [CrossRef]
- Qian, H.; Zhang, P.; Liu, Z.; Mao, K.; Dou, Z. Memorag: Moving towards next-gen rag via memory-inspired knowledge discovery. arXiv preprint arXiv:2409.05591 2024.
- Baddeley, A. Working memory, thought, and action; Vol. 45, OuP Oxford, 2007.
- Budson, A.E., and E.A. Kensinger. 2023. Why we forget and how to remember better: the science behind memory. Oxford University Press. [Google Scholar]
- Zhang, Z., Q. Dai, X. Bo, C. Ma, R. Li, X. Chen, J. Zhu, Z. Dong, and J.R. Wen. 2025. A Survey on the Memory Mechanism of Large Language Model based Agents. In ACM Trans. Inf. Syst. Just Accepted. [Google Scholar] [CrossRef]
- Huang, Y., J. Xu, J. Lai, Z. Jiang, T. Chen, Z. Li, Y. Yao, X. Ma, L. Yang, H. Chen, and et al. 2023. Advancing transformer architecture in long-context large language models: A comprehensive survey. arXiv arXiv:2311.12351. [Google Scholar]
- Jiang, X., F. Li, H. Zhao, J. Wang, J. Shao, S. Xu, S. Zhang, W. Chen, X. Tang, Y. Chen, and et al. 2024. Long term memory: The foundation of ai self-evolution. arXiv, 2410.15665. [Google Scholar] [CrossRef]
- Liu, J.; Qiu, Z.; Li, Z.; Dai, Q.; Zhu, J.; Hu, M.; Yang, M.; King, I. A Survey of Personalized Large Language Models: Progress and Future Directions. arXiv preprint arXiv:2502.11528 2025.
- Pan, J.; Li, G. A Survey of LLM Inference Systems, 2025, [arXiv:cs.DB/2506.21901].
- LI, H.; Li, Y.; Tian, A.; Tang, T.; Xu, Z.; Chen, X.; HU, N.; Dong, W.; Qing, L.; Chen, L. A Survey on Large Language Model Acceleration based on KV Cache Management. Transactions on Machine Learning Research 2025.
- Luohe, S., H. Zhang, Y. Yao, Z. Li, and et al. Keep the Cost Down: A Review on Methods to Optimize LLM’s KV-Cache Consumption. Proceedings of the First Conference on Language Modeling.
- Liu, N.F., K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. 2024. Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics 12: 157–173. [Google Scholar] [CrossRef]
- Li, X.; Nie, E.; Liang, S. From Classification to Generation: Insights into Crosslingual Retrieval Augmented ICL. arXiv preprint arXiv:2311.06595 2023.
- Yan, S.Q.; Gu, J.C.; Zhu, Y.; Ling, Z.H. Corrective Retrieval Augmented Generation. arXiv preprint arXiv:2401.15884 2024.
- Zha, L.; Zhou, J.; Li, L.; Wang, R.; Huang, Q.; Yang, S.; Yuan, J.; Su, C.; Li, X.; Su, A.; et al. Tablegpt: Towards unifying tables, nature language and commands into one gpt. arXiv preprint arXiv:2307.08674 2023.
- Luo, Z.; Xu, C.; Zhao, P.; Geng, X.; Tao, C.; Ma, J.; Lin, Q.; Jiang, D. Augmented Large Language Models with Parametric Knowledge Guiding. arXiv preprint arXiv:2305.04757 2023.
- Wang, X.; Yang, Q.; Qiu, Y.; Liang, J.; He, Q.; Gu, Z.; Xiao, Y.; Wang, W. KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases. arXiv preprint arXiv:2308.11761 2023.
- He, X.; Tian, Y.; Sun, Y.; Chawla, N.V.; Laurent, T.; LeCun, Y.; Bresson, X.; Hooi, B. G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. arXiv preprint arXiv:2402.07630 2024.
- Wang, Y.; Li, P.; Sun, M.; Liu, Y. Self-Knowledge Guided Retrieval Augmentation for Large Language Models. arXiv preprint arXiv:2310.05002 2023.
- Yu, W.; Iter, D.; Wang, S.; Xu, Y.; Ju, M.; Sanyal, S.; Zhu, C.; Zeng, M.; Jiang, M. Generate rather than retrieve: Large language models are strong context generators. arXiv preprint arXiv:2209.10063 2022.
- Cheng, X.; Luo, D.; Chen, X.; Liu, L.; Zhao, D.; Yan, R. Lift Yourself Up: Retrieval-augmented Text Generation with Self Memory. arXiv preprint arXiv:2305.02437 2023.
- Shi, F.; Chen, X.; Misra, K.; Scales, N.; Dohan, D.; Chi, E.H.; Schärli, N.; Zhou, D. Large language models can be easily distracted by irrelevant context. In Proceedings of the International Conference on Machine Learning. PMLR, 2023, pp. 31210–31227.
- Yu, W.; Zhang, H.; Pan, X.; Ma, K.; Wang, H.; Yu, D. Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models. arXiv preprint arXiv:2311.09210 2023.
- Chen, T.; Wang, H.; Chen, S.; Yu, W.; Ma, K.; Zhao, X.; Yu, D.; Zhang, H. Dense X Retrieval: What Retrieval Granularity Should We Use? arXiv preprint arXiv:2312.06648 2023.
- Jin, B.; Zeng, H.; Wang, G.; Chen, X.; Wei, T.; Li, R.; Wang, Z.; Li, Z.; Li, Y.; Lu, H.; et al. Language Models As Semantic Indexers. arXiv preprint arXiv:2310.07815 2023.
- Wang, L.; Yang, N.; Wei, F. Learning to retrieve in-context examples for large language models. arXiv preprint arXiv:2307.07164 2023.
- Teja, R. 2023. Evaluating the Ideal Chunk Size for a RAG System using LlamaIndex. Available online: https://www.llamaindex.ai/blog/evaluating-the-ideal-chunk-size-for-a-rag-system-using-llamaindex-6207e5d3fec5.
- Langchain. 2023. Recursively split by character. Available online: https://python.langchain.com/docs/modules/data_connection/document_transformers/recursive_text_splitter.
- Yang, S. 2023. Advanced RAG 01: Small-to-Big Retrieval. Available online: https://towardsdatascience.com/advanced-rag-01-small-to-big-retrieval-172181b396d4.
- Gao, L.; Ma, X.; Lin, J.; Callan, J. Precise zero-shot dense retrieval without relevance labels. arXiv preprint arXiv:2212.10496 2022.
- Wang, Y.; Lipka, N.; Rossi, R.A.; Siu, A.; Zhang, R.; Derr, T. Knowledge graph prompting for multi-document question answering. arXiv preprint arXiv:2308.11730 2023.
- Li, X.; Li, J. AnglE-optimized Text Embeddings. arXiv preprint arXiv:2309.12871 2023.
- VoyageAI. 2023. Voyage’s embedding models. Available online: https://docs.voyageai.com/embeddings/.
- BAAI. 2023. FlagEmbedding. Available online: https://github.com/FlagOpen/FlagEmbedding.
- Zhang, Z.; Feng, Y.; Zhang, M. LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers, 2025, [arXiv:cs.CL/2502.18139].
- Zhou, D.; Schärli, N.; Hou, L.; Wei, J.; Scales, N.; Wang, X.; Schuurmans, D.; Cui, C.; Bousquet, O.; Le, Q.; et al. Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, 2023, [arXiv:cs.AI/2205.10625].
- Dhuliawala, S.; Komeili, M.; Xu, J.; Raileanu, R.; Li, X.; Celikyilmaz, A.; Weston, J. Chain-of-verification reduces hallucination in large language models. arXiv preprint arXiv:2309.11495 2023.
- Ma, X.; Gong, Y.; He, P.; Zhao, H.; Duan, N. Query Rewriting for Retrieval-Augmented Large Language Models. arXiv preprint arXiv:2305.14283 2023.
- Peng, W.; Li, G.; Jiang, Y.; Wang, Z.; Ou, D.; Zeng, X.; Chen, E.; et al. Large language model based long-tail query rewriting in taobao search. arXiv preprint arXiv:2311.03758 2023.
- Zheng, H.S.; Mishra, S.; Chen, X.; Cheng, H.T.; Chi, E.H.; Le, Q.V.; Zhou, D. Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models, 2024, [arXiv:cs.LG/2310.06117]. arXiv:cs.
- Wang, C.; Liu, X.; Liu, Y.; Zhu, Y.; Mo, X.; Jiang, J.; Chen, H. When to Reason: Semantic Router for vLLM. arXiv preprint arXiv:2510.08731 2025.
- Sun, Z.; Wang, X.; Tay, Y.; Yang, Y.; Zhou, D. Recitation-augmented language models. arXiv preprint arXiv:2210.01296 2022.
- Khattab, O.; Santhanam, K.; Li, X.L.; Hall, D.; Liang, P.; Potts, C.; Zaharia, M. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv preprint arXiv:2212.14024 2022.
- Jiang, Z.; Xu, F.F.; Gao, L.; Sun, Z.; Liu, Q.; Dwivedi-Yu, J.; Yang, Y.; Callan, J.; Neubig, G. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983 2023.
- Asai, A.; Wu, Z.; Wang, Y.; Sil, A.; Hajishirzi, H. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. arXiv preprint arXiv:2310.11511 2023.
- Ke, Z.; Kong, W.; Li, C.; Zhang, M.; Mei, Q.; Bendersky, M. Bridging the Preference Gap between Retrievers and LLMs. arXiv preprint arXiv:2401.06954 2024.
- Lin, X.V.; Chen, X.; Chen, M.; Shi, W.; Lomeli, M.; James, R.; Rodriguez, P.; Kahn, J.; Szilvasy, G.; Lewis, M.; et al. RA-DIT: Retrieval-Augmented Dual Instruction Tuning. arXiv preprint arXiv:2310.01352 2023.
- Salama, R.; Cai, J.; Yuan, M.; Currey, A.; Sunkara, M.; Zhang, Y.; Benajiba, Y. Meminsight: Autonomous memory augmentation for llm agents. arXiv preprint arXiv:2503.21760 2025.
- Xi, Y., W. Liu, J. Lin, B. Chen, R. Tang, W. Zhang, and Y. Yu. Memocrs: Memory-enhanced sequential conversational recommender systems with large language models. Proceedings of the Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024; pp. 2585–2595. [Google Scholar]
- Pan, Z.; Wu, Q.; Jiang, H.; Luo, X.; Cheng, H.; Li, D.; Yang, Y.; Lin, C.Y.; Zhao, H.V.; Qiu, L.; et al. On memory construction and retrieval for personalized conversational agents. arXiv preprint arXiv:2502.05589 2025.
- mem0ai. mem0: The memory layer for personalized ai. mem0.ai, 2024.
- Nie, Y., H. Huang, W. Wei, and X.L. Mao. 2022. Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network. In Proceedings of the Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Abu Dhabi, United Arab Emirates, Edited by Y. Goldberg, Z. Kozareva and Y. Zhang. pp. 5036–5047. [Google Scholar] [CrossRef]
- Li, S., Y. He, H. Guo, X. Bu, G. Bai, J. Liu, J. Liu, X. Qu, Y. Li, W. Ouyang, and et al. 2024. GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024. Edited by Y. Al-Onaizan, M. Bansal and Y.N. Chen. Miami, Florida, USA: pp. 12758–12786. [Google Scholar] [CrossRef]
- Gutiérrez, B.J., Y. Shu, Y. Gu, M. Yasunaga, and Y. Su. Hipporag: Neurobiologically inspired long-term memory for large language models. Proceedings of the The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. [Google Scholar]
- Wu, D.; Wang, H.; Yu, W.; Zhang, Y.; Chang, K.W.; Yu, D. Longmemeval: Benchmarking chat assistants on long-term interactive memory. arXiv preprint arXiv:2410.10813 2024.
- iunn Ong, K.T., N. Kim, M. Gwak, H. Chae, T. Kwon, Y. Jo, S. won Hwang, D. Lee, and J. Yeo. Towards Lifelong Dialogue Agents via Timeline-based Memory Management. Proceedings of the Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Mexico City, Mexico, 2025. [Google Scholar]
- Jiang, H., Q. Wu, C.Y. Lin, Y. Yang, and L. Qiu. 2023. LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models. In Proceedings of the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Singapore, Edited by H. Bouamor, J. Pino and K. Bali. pp. 13358–13376. [Google Scholar] [CrossRef]
- Chevalier, A., A. Wettig, A. Ajith, and D. Chen. 2023. Adapting Language Models to Compress Contexts. In Proceedings of the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Singapore, Edited by H. Bouamor, J. Pino and K. Bali. pp. 3829–3846. [Google Scholar] [CrossRef]
- Liu, J., L. Li, T. Xiang, B. Wang, and Y. Qian. 2023. TCRA-LLM: Token Compression Retrieval Augmented Large Language Model for Inference Cost Reduction. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023. Edited by H. Bouamor, J. Pino and K. Bali. Singapore: pp. 9796–9810. [Google Scholar] [CrossRef]
- Gim, I., G. Chen, S.s. Lee, N. Sarda, A. Khandelwal, and L. Zhong. 2024. Prompt cache: Modular attention reuse for low-latency inference. Proceedings of Machine Learning and Systems 6: 325–338. [Google Scholar]
- Wang, Q., Y. Fu, Y. Cao, S. Wang, Z. Tian, and L. Ding. 2025. Recursively summarizing enables long-term dialogue memory in large language models. Neurocomputing 639: 130193. [Google Scholar] [CrossRef]
- Lu, J.; An, S.; Lin, M.; Pergola, G.; He, Y.; Yin, D.; Sun, X.; Wu, Y. Memochat: Tuning llms to use memos for consistent long-range open-domain conversation. arXiv preprint arXiv:2308.08239 2023.
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629 2022.
- Shinn, N., F. Cassano, A. Gopinath, K. Narasimhan, and S. Yao. 2024. Reflexion: Language agents with verbal reinforcement learning. Advances in Neural Information Processing Systems 36. [Google Scholar]
- Yang, L.; Yu, Z.; Zhang, T.; Cao, S.; Xu, M.; Zhang, W.; Gonzalez, J.E.; Cui, B. Buffer of thoughts: Thought-augmented reasoning with large language models. arXiv preprint arXiv:2406.04271 2024.
- Wang, Z.Z.; Mao, J.; Fried, D.; Neubig, G. Agent Workflow Memory, 2024, [arXiv:cs.CL/2409.07429].
- Liu, L.; Yang, X.; Shen, Y.; Hu, B.; Zhang, Z.; Gu, J.; Zhang, G. Think-in-memory: Recalling and post-thinking enable llms with long-term memory. arXiv preprint arXiv:2311.08719 2023.
- Zhu, X.; Chen, Y.; Tian, H.; Tao, C.; Su, W.; Yang, C.; Huang, G.; Li, B.; Lu, L.; Wang, X.; et al. Ghost in the minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144 2023.
- Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291 2023.
- Yao, W.; Heinecke, S.; Niebles, J.C.; Liu, Z.; Feng, Y.; Xue, L.; Murthy, R.; Chen, Z.; Zhang, J.; Arpit, D.; et al. Retroformer: Retrospective large language agents with policy gradient optimization. arXiv preprint arXiv:2308.02151 2023.
- Zhao, A., D. Huang, Q. Xu, M. Lin, Y.J. Liu, and G. Huang. 2024. Expel: Llm agents are experiential learners. Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence Vol. 38: 19632–19642. [Google Scholar] [CrossRef]
- Li, H.; Yang, C.; Zhang, A.; Deng, Y.; Wang, X.; Chua, T.S. Hello again! llm-powered personalized agent for long-term dialogue. arXiv preprint arXiv:2406.05925 2024.
- Xu, W.; Liang, Z.; Mei, K.; Gao, H.; Tan, J.; Zhang, Y. A-mem: Agentic memory for llm agents. arXiv preprint arXiv:2502.12110 2025.
- Kadavy, D. 2021. Digital Zettelkasten: Principles, Methods, and Examples. Kadavy, Inc. [Google Scholar]
- Zheng, L., R. Wang, X. Wang, and B. An. Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control. Proceedings of the Proceedings of the International Conference on Learning Representations (ICLR), 2024. [Google Scholar]
- Samsami, M.R., A. Zholus, J. Rajendran, and S. Chandar. Mastering Memory Tasks with World Models. Proceedings of the The Twelfth International Conference on Learning Representations, 2024. [Google Scholar]
- Wang, B.; Liang, X.; Yang, J.; Huang, H.; Wu, S.; Wu, P.; Lu, L.; Ma, Z.; Li, Z. Enhancing Large Language Model with Self-Controlled Memory Framework, 2024, [arXiv:cs.CL/2304.13343]. arXiv.
- Bae, S., D. Kwak, S. Kang, M.Y. Lee, S. Kim, Y. Jeong, H. Kim, S.W. Lee, W. Park, and N. Sung. 2022. Keep Me Updated! Memory Management in Long-term Conversations. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022. Edited by Y. Goldberg, Z. Kozareva and Y. Zhang. Abu Dhabi, United Arab Emirates: pp. 3769–3787. [Google Scholar] [CrossRef]
- Wang, Q., Y. Fu, Y. Cao, S. Wang, Z. Tian, and L. Ding. 2025. Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models. Neurocomputing, 130193. [Google Scholar] [CrossRef]
- Kim, S.H.; Ka, K.; Jo, Y.; Hwang, S.w.; Lee, D.; Yeo, J. Ever-Evolving Memory by Blending and Refining the Past. arXiv preprint arXiv:2403.04787 2024.
- Sun, H.; Cai, H.; Wang, B.; Hou, Y.; Wei, X.; Wang, S.; Zhang, Y.; Yin, D. Towards Verifiable Text Generation with Evolving Memory and Self-Reflection. In Proceedings of the Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing; Al-Onaizan, Y.; Bansal, M.; Chen, Y.N., Eds., Miami, Florida, USA, 2024; pp. 8211–8227.
- Jiang, Z.; Xu, F.; Gao, L.; Sun, Z.; Liu, Q.; Dwivedi-Yu, J.; Yang, Y.; Callan, J.; Neubig, G. Active Retrieval Augmented Generation. In Proceedings of the Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing; Bouamor, H.; Pino, J.; Bali, K., Eds., Singapore, 2023; pp. 7969–7992. [CrossRef]
- Jang, Y.; Lee, K.i.; Bae, H.; Lee, H.; Jung, K. IterCQR: Iterative Conversational Query Reformulation with Retrieval Guidance. In Proceedings of the Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Duh, K.; Gomez, H.; Bethard, S., Eds., Mexico City, Mexico, 2024; pp. 8121–8138. [CrossRef]
- Du, Y.; Wang, H.; Zhao, Z.; Liang, B.; Wang, B.; Zhong, W.; Wang, Z.; Wong, K.F. PerLTQA: A Personal Long-Term Memory Dataset for Memory Classification, Retrieval, and Synthesis in Question Answering, 2024, [arXiv:cs.CL/2402.16288].
- Jang, J.; Boo, M.; Kim, H. Conversation chronicles: Towards diverse temporal and relational dynamics in multi-session conversations. arXiv preprint arXiv:2310.13420 2023.
- Xu, J.; Szlam, A.; Weston, J. Beyond goldfish memory: Long-term open-domain conversation. arXiv preprint arXiv:2107.07567 2021.
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models, 2023, [arXiv:cs.CL/2302.13971].
- Touvron, H.; Martin, L.; Stone, K.; et al. Llama 2: Open Foundation and Fine-Tuned Chat Models, 2023, [arXiv:cs.CL/2307.09288].
- Grattafiori, A.; Dubey, A.; Jauhri, A.; Pandey, A.; Kadian, A.; et al. The Llama 3 Herd of Models, 2024, [arXiv:cs.AI/2407.21783].
- DeepSeek-AI.; Guo, D.; Yang, D.; Zhang, H.; Song, J.; et al. DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, 2025, [arXiv:cs.CL/2501.12948].
- DeepSeek-AI.; Liu, A.; Feng, B.; Xue, B.; Wang, B.; Wu, B.; et al. DeepSeek-V3 Technical Report, 2025, [arXiv:cs.CL/2412.19437].
- Xiao, G.; Tian, Y.; Chen, B.; Han, S.; Lewis, M. Efficient Streaming Language Models with Attention Sinks. In Proceedings of the The Twelfth International Conference on Learning Representations, 2024.
- Han, C.; Wang, Q.; Peng, H.; Xiong, W.; Chen, Y.; Ji, H.; Wang, S. LM-Infinite: Zero-Shot Extreme Length Generalization for Large Language Models. In Proceedings of the Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers); Duh, K.; Gomez, H.; Bethard, S., Eds., Mexico City, Mexico, 2024; pp. 3991–4008. 1. [CrossRef]
- Wu, H.; Tu, K. Layer-Condensed KV Cache for Efficient Inference of Large Language Models. In Proceedings of the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.W.; Martins, A.; Srikumar, V., Eds., Bangkok, Thailand, 2024; pp. 11175–11188. Proceedings of the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. [CrossRef]
- Zhang, Z.; Sheng, Y.; Zhou, T.; Chen, T.; Zheng, L.; Cai, R.; Song, Z.; Tian, Y.; Re, C.; Barrett, C.; et al. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. In Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Ge, S.; Zhang, Y.; Liu, L.; Zhang, M.; Han, J.; Gao, J. Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. In Proceedings of the The Twelfth International Conference on Learning Representations, 2024.
- Hao, Y.; Zhai, M.; Hajimirsadeghi, H.; Hosseini, S.; Tung, F. Radar: Fast Long-Context Decoding for Any Transformer. In Proceedings of the The Thirteenth International Conference on Learning Representations, 2025.
- Chen, Y.; Wang, G.; Shang, J.; Cui, S.; Zhang, Z.; Liu, T.; Wang, S.; Sun, Y.; Yu, D.; Wu, H. NACL: A General and Effective KV Cache Eviction Framework for LLM at Inference Time. In Proceedings of the Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.W.; Martins, A.; Srikumar, V., Eds., Bangkok, Thailand, 2024; pp. 7913–7926. [CrossRef]
- Liu, Z.; Desai, A.; Liao, F.; Wang, W.; Xie, V.; Xu, Z.; Kyrillidis, A.; Shrivastava, A. Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time. In Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023.
- Devoto, A., Y. Zhao, S. Scardapane, and P. Minervini. A Simple and Effective L_2 Norm-Based Strategy for KV Cache Compression. Proceedings of the Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 2024; pp. 18476–18499. [Google Scholar] [CrossRef]
- Yao, Y.; Li, Z.; Zhao, H. SirLLM: Streaming Infinite Retentive LLM, 2024, [arXiv:cs.CL/2405.12528].
- Jiang, Y., H. Wang, L. Xie, H. Zhao, C. Zhang, H. Qian, and J.C. Lui. 2024. D-LLM: A Token Adaptive Computing Resource Allocation Strategy for Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems. Edited by A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak and C. Zhang. Curran Associates, Inc.: Vol. 37, pp. 1725–1749. [Google Scholar]
- Zhong, M.; Liu, X.; Zhang, C.; Lei, Y.; Gao, Y.; Hu, Y.; Chen, K.; Zhang, M. ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty, 2024, [arXiv:cs.CL/2412.09036].
- Zhou, X.; Wang, W.; Zeng, M.; Guo, J.; Liu, X.; Shen, L.; Zhang, M.; Ding, L. DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs, 2025, [arXiv:cs.CL/2412.14838].
- Liu, A., J. Liu, Z. Pan, Y. He, G. Haffari, and B. Zhuang. 2024. MiniCache: KV Cache Compression in Depth Dimension for Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems. Edited by A. Globerson, L. Mackey, D. Belgrave, A. Fan, U. Paquet, J. Tomczak and C. Zhang. Curran Associates, Inc.: Vol. 37, pp. 139997–140031. [Google Scholar]
- Kim, M.; Shim, K.; Choi, J.; Chang, S. InfiniPot: Infinite Context Processing on Memory-Constrained LLMs. In Proceedings of the Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing; Al-Onaizan, Y.; Bansal, M.; Chen, Y.N., Eds., Miami, Florida, USA, 2024; pp. 16046–16060. [CrossRef]
- Agarwal, S.; Acun, B.; Hosmer, B.; Elhoushi, M.; Lee, Y.; Venkataraman, S.; Papailiopoulos, D.; Wu, C.J. CHAI: Clustered Head Attention for Efficient LLM Inference. In Proceedings of the Proceedings of the 41st International Conference on Machine Learning; Salakhutdinov, R.; Kolter, Z.; Heller, K.; Weller, A.; Oliver, N.; Scarlett, J.; Berkenkamp, F., Eds. PMLR, 21–27 Jul 2024, Vol. 235, Proceedings of Machine Learning Research, pp. 291–312.
- Zhang, P., Z. Liu, S. Xiao, N. Shao, Q. Ye, and Z. Dou. Long Context Compression with Activation Beacon. Proceedings of the The Thirteenth International Conference on Learning Representations, 2025. [Google Scholar]
- Liu, Y.; Li, H.; Cheng, Y.; Ray, S.; Huang, Y.; Zhang, Q.; Du, K.; Yao, J.; Lu, S.; Ananthanarayanan, G.; et al. CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving, 2024, [arXiv:cs.NI/2310.07240]. arXiv:cs.
- Liu, X.; Tang, Z.; Dong, P.; Li, Z.; Liu, Y.; Li, B.; Hu, X.; Chu, X. ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference, 2025, [arXiv:cs.CL/2502.00299].
- Zhu, Y.; Falahati, A.; Yang, D.H.; Amiri, M.M. SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching, 2025, [arXiv:cs.CL/2504.00970].
- Liu, Z.; Yuan, J.; Jin, H.; Zhong, S.; Xu, Z.; Braverman, V.; Chen, B.; Hu, X. KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache. In Proceedings of the International Conference on Machine Learning, ICML 2024. PMLR, 2024, pp. 32332–32344.
- Duanmu, H.; Yuan, Z.; Li, X.; Duan, J.; Zhang, X.; Lin, D. SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models, 2024, [arXiv:cs.LG/2405.06219].
- Dong, S.; Cheng, W.; Qin, J.; Wang, W. QAQ: Quality Adaptive Quantization for LLM KV Cache, 2024, [arXiv:cs.CL/2403.04643].
- Hooper, C., S. Kim, H. Mohammadzadeh, M.W. Mahoney, S. Shao, K. Keutzer, and A. Gholami. 2024. KVQuant: Towards 10 million context length LLM inference with KV cache quantization. Advances in Neural Information Processing Systems, NeurIPS 37: 1270–1303. [Google Scholar]
- Zhang, T., J. Yi, Z. Xu, and A. Shrivastava. 2024. KV cache is 1 bit per channel: Efficient large language model inference with coupled quantization. Advances in Neural Information Processing Systems, NeurIPS 37: 3304–3331. [Google Scholar]
- Li, Z.; Xiao, C.; Wang, Y.; Liu, X.; Tang, Z.; Lu, B.; Yang, M.; Chen, X.; Chu, X. AnTKV: Anchor Token-Aware Sub-Bit Vector Quantization for KV Cache in Large Language Models, 2025, [arXiv:cs.CL/2506.19505].
- Lin, Y.; Tang, H.; Yang, S.; Zhang, Z.; Xiao, G.; Gan, C.; Han, S. QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving, 2025, [arXiv:cs.CL/2405.04532].
- Yang, J.Y.; Kim, B.; Bae, J.; Kwon, B.; Park, G.; Yang, E.; Kwon, S.J.; Lee, D. No Token Left Behind: Reliable KV Cache Compression via Importance-Aware Mixed Precision Quantization, 2024, [arXiv:cs.LG/2402.18096].
- Dong, H.; Yang, X.; Zhang, Z.; Wang, Z.; Chi, Y.; Chen, B. Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference. In Proceedings of the 41st International Conference on Machine Learning; Salakhutdinov, R.; Kolter, Z.; Heller, K.; Weller, A.; Oliver, N.; Scarlett, J.; Berkenkamp, F., Eds. PMLR, 21–27 Jul 2024, Vol. 235, Proceedings of Machine Learning Research, pp. 11437–11452.
- Saxena, U.; Saha, G.; Choudhary, S.; Roy, K. Eigen Attention: Attention in Low-Rank Space for KV Cache Compression. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024; Al-Onaizan, Y.; Bansal, M.; Chen, Y.N., Eds., Miami, Florida, USA, 2024; pp. 15332–15344. [CrossRef]
- Kang, H.; Zhang, Q.; Kundu, S.; Jeong, G.; Liu, Z.; Krishna, T.; Zhao, T. GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM, 2024, [arXiv:cs.LG/2403.05527].
- Sheng, Y.; Zheng, L.; Yuan, B.; Li, Z.; Ryabinin, M.; Chen, B.; Liang, P.; Ré, C.; Stoica, I.; Zhang, C. FlexGen: high-throughput generative inference of large language models with a single GPU. In Proceedings of the 40th International Conference on Machine Learning. JMLR.org, 2023, ICML’23.
- Zhao, Y.; Lin, C.Y.; Zhu, K.; Ye, Z.; Chen, L.; Zheng, S.; Ceze, L.; Krishnamurthy, A.; Chen, T.; Kasikci, B. Atom: Low-Bit Quantization for Efficient and Accurate LLM Serving. In Proceedings of the MLSys, 2024.
- He, Y.; Zhang, L.; Wu, W.; Liu, J.; Zhou, H.; Zhuang, B. ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification. In Proceedings of the Advances in Neural Information Processing Systems; Globerson, A.; Mackey, L.; Belgrave, D.; Fan, A.; Paquet, U.; Tomczak, J.; Zhang, C., Eds. Curran Associates, Inc., 2024, Vol. 37, pp. 68287–68307.
- Chen, S.; Jiang, R.; Yu, D.; Xu, J.; Chao, M.; Meng, F.; Jiang, C.; Xu, W.; Liu, H. KVDirect: Distributed Disaggregated LLM Inference, 2024, [arXiv:cs.DC/2501.14743].
- Li, W.; Jiang, G.; Ding, X.; Tao, Z.; Hao, C.; Xu, C.; Zhang, Y.; Wang, H. FlowKV: A Disaggregated Inference Framework with Low-Latency KV Cache Transfer and Load-Aware Scheduling, 2025, [arXiv:cs.DC/2504.03775].
- Yang, D., X. Han, Y. Gao, Y. Hu, S. Zhang, and H. Zhao. PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 2024; pp. 3258–3270. [Google Scholar] [CrossRef]
- Zhu, Y., Z. Tang, X. Liu, A. Li, B. Li, X. Chu, and B. Han. OracleKV: Oracle Guidance for Question-Independent KV Cache Compression. Proceedings of the ICML 2025 Workshop on Long-Context Foundation Models, 2025. [Google Scholar]
- Tang, J.; Zhao, Y.; Zhu, K.; Xiao, G.; Kasikci, B.; Han, S. QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference. In Proceedings of the 41st International Conference on Machine Learning. PMLR, 21–27 Jul 2024, Vol. 235, Proceedings of Machine Learning Research, pp. 47901–47911.
- Wu, W.; Pan, Z.; Wang, C.; Chen, L.; Bai, Y.; Wang, T.; Fu, K.; Wang, Z.; Xiong, H. TokenSelect: Efficient Long-Context Inference and Length Extrapolation for LLMs via Dynamic Token-Level KV Cache Selection, 2025, [arXiv:cs.CL/2411.02886].
- Leviathan, Y., M. Kalman, and Y. Matias. Selective Attention Improves Transformer. Proceedings of the Thirteenth International Conference on Learning Representations, 2025.
- Liu, D.; Chen, M.; Lu, B.; Jiang, H.; Han, Z.; Zhang, Q.; Chen, Q.; Zhang, C.; Ding, B.; Zhang, K.; et al. RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval, 2024, [arXiv:cs.LG/2409.10516].
- Zheng, L.; Yin, L.; Xie, Z.; Sun, C.; Huang, J.; Yu, C.H.; Cao, S.; Kozyrakis, C.; Stoica, I.; Gonzalez, J.E.; et al. SGLang: Efficient Execution of Structured Language Model Programs, 2024, [arXiv:cs.AI/2312.07104].
- Ye, L.; Tao, Z.; Huang, Y.; Li, Y. ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition, 2024, [arXiv:cs.LG/2402.15220].
- Yang, H.; Zhang, R.; Huang, M.; Wang, W.; Tang, Y.; Li, Y.; Liu, Y.; Zhang, D. KVShare: An LLM Service System with Efficient and Effective Multi-Tenant KV Cache Reuse, 2025, [arXiv:cs.CL/2503.16525].
- Agarwal, S.; Sundaresan, S.; Mitra, S.; Mahapatra, D.; Gupta, A.; Sharma, R.; Kapu, N.J.; Yu, T.; Saini, S. Cache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented Generation, 2025, [arXiv:cs.DC/2502.15734].
- Tan, X.; Jiang, Y.; Yang, Y.; Xu, H. Teola: Towards End-to-End Optimization of LLM-based Applications, 2025, [arXiv:cs.DC/2407.00326].
- Yang, J.; Hou, B.; Wei, W.; Bao, Y.; Chang, S. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse, 2025, [arXiv:cs.CL/2502.16002].
- Yao, J.; Li, H.; Liu, Y.; Ray, S.; Cheng, Y.; Zhang, Q.; Du, K.; Lu, S.; Jiang, J. CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion, 2025, [arXiv:cs.LG/2405.16444].
- Hu, J.; Huang, W.; Wang, W.; Wang, H.; Hu, T.; Zhang, Q.; Feng, H.; Chen, X.; Shan, Y.; Xie, T. EPIC: Efficient Position-Independent Caching for Serving Large Language Models, 2025, [arXiv:cs.LG/2410.15332].
- Zhu, Q.; Zhang, L.; Xu, Q.; Long, C.; Zhang, J. SubGCache: Accelerating Graph-based RAG with Subgraph-level KV Cache, 2025, [arXiv:cs.LG/2505.10951].
- An, Y.; Cheng, Y.; Park, S.J.; Jiang, J. HyperRAG: Enhancing Quality-Efficiency Tradeoffs in Retrieval-Augmented Generation with Reranker KV-Cache Reuse, 2025, [arXiv:cs.CL/2504.02921].
- Jiang, W.; Subramanian, S.; Graves, C.; Alonso, G.; Yazdanbakhsh, A.; Dadu, V. RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving, 2025, [arXiv:cs.IR/2503.14649].
- Wu, Y.; Rabe, M.N.; Hutchins, D.; Szegedy, C. Memorizing transformers. arXiv preprint arXiv:2203.08913 2022.
- Tworkowski, S., K. Staniszewski, M. Pacek, Y. Wu, H. Michalewski, and P. Miłoś. 2023. Focused Transformer: Contrastive Training for Context Scaling. In Proceedings of the Advances in Neural Information Processing Systems. Edited by A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt and S. Levine. Curran Associates, Inc.: Vol. 36, pp. 42661–42688.
- Di, S., Z. Yu, G. Zhang, H. Li, T. Zhong, H. Cheng, B. Li, W. He, F. Shu, and H. Jiang. Streaming Video Question-Answering with In-context Video KV-Cache Retrieval. Proceedings of the Thirteenth International Conference on Learning Representations, 2025.
- Al Adel, A., and M.S. Burtsev. Memory transformer with hierarchical attention for long document processing. Proceedings of the 2021 International Conference Engineering and Telecommunication, 2021; pp. 1–7. [Google Scholar] [CrossRef]
- Khandelwal, U.; Levy, O.; Jurafsky, D.; Zettlemoyer, L.; Lewis, M. Generalization through memorization: Nearest neighbor language models. arXiv preprint arXiv:1911.00172 2019.
- Packer, C.; Wooders, S.; Lin, K.; Fang, V.; Patil, S.G.; Stoica, I.; Gonzalez, J.E. Memgpt: Towards llms as operating systems. arXiv preprint arXiv:2310.08560 2023.
- Safaya, A.; Yuret, D. Neurocache: Efficient vector retrieval for long-range language modeling. arXiv preprint arXiv:2407.02486 2024.
- He, Z.; Karlinsky, L.; Kim, D.; McAuley, J.; Krotov, D.; Feris, R. Camelot: Towards large language models with training-free consolidated associative memory. arXiv preprint arXiv:2402.13449 2024.
- Li, Z.; Song, S.; Xi, C.; Wang, H.; Tang, C.; Niu, S.; Chen, D.; Yang, J.; Li, C.; Yu, Q.; et al. Memos: A memory os for ai system. arXiv preprint arXiv:2507.03724 2025.
- Yang, H.; Lin, Z.; Wang, W.; Wu, H.; Li, Z.; Tang, B.; Wei, W.; Wang, J.; Tang, Z.; Song, S.; et al. Memory3: Language Modeling with Explicit Memory. arXiv preprint arXiv:2407.01178 2024.
- Dathathri, S., A. Madotto, J. Lan, J. Hung, E. Frank, P. Molino, J. Yosinski, and R. Liu. Plug and Play Language Models: A Simple Approach to Controlled Text Generation. Proceedings of the International Conference on Learning Representations.
- Turner, A.M.; Thiergart, L.; Leech, G.; Udell, D.; Vazquez, J.J.; Mini, U.; MacDiarmid, M. Steering language models with activation engineering. arXiv preprint arXiv:2308.10248 2023.
- Liu, S.; Ye, H.; Xing, L.; Zou, J. In-context vectors: Making in context learning more effective and controllable through latent space steering. arXiv 2023, [2311.06668].
- Zou, A.; Phan, L.; Chen, S.; Campbell, J.; Guo, P.; Ren, R.; Pan, A.; Yin, X.; Mazeika, M.; Dombrowski, A.K.; et al. Representation engineering: A top-down approach to ai transparency. arXiv 2023, [2310.01405].
- Arditi, A.; Obeso, O.; Syed, A.; Paleka, D.; Panickssery, N.; Gurnee, W.; Nanda, N. Refusal in language models is mediated by a single direction. arXiv 2024, [2406.11717].
- Chughtai, B., and L. Bushnaq. Activation space interpretability may be doomed, 2025.
- Subramani, N.; Suresh, N.; Peters, M.E. Extracting latent steering vectors from pretrained language models. arXiv 2022, [2205.05124].
- Hernandez, E.; Li, B.Z.; Andreas, J. Inspecting and editing knowledge representations in language models. arXiv 2023, [2304.00740].
- Dunefsky, J.; Cohan, A. Investigating generalization of one-shot LLM steering vectors. arXiv preprint arXiv:2502.18862 2025.
- Mack, A.; Turner, A. Mechanistically eliciting latent behaviors in language models, 2024.
- Li, K., O. Patel, F. Viégas, H. Pfister, and M. Wattenberg. Inference-time intervention: Eliciting truthful answers from a language model. Proceedings of the Advances in Neural Information Processing Systems, 2024, Vol. 36. [Google Scholar]
- Turner, A., M. Kurzeja, D. Orr, and D. Elson. 2025. Steering Gemini using BiDPO vectors.
- Cao, Y.; Zhang, T.; Cao, B.; Yin, Z.; Lin, L.; Ma, F.; Chen, J. Personalized steering of large language models: Versatile steering vectors through bi-directional preference optimization. arXiv 2024, [2406.00045].
- Allen-Zhu, Z., and Y. Li. Physics of language models: part 3.1, knowledge storage and extraction. Proceedings of the 41st International Conference on Machine Learning, 2024; pp. 1067–1077.
- Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. Extracting Training Data from Large Language Models, 2021, [arXiv:cs.CR/2012.07805].
- Lee, K.; Ippolito, D.; Nystrom, A.; Zhang, C.; Eck, D.; Callison-Burch, C.; Carlini, N. Deduplicating Training Data Makes Language Models Better. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2022.
- Kandpal, N.; Wallace, E.; Raffel, C. Deduplicating Training Data Mitigates Privacy Risks in Language Models, 2022, [arXiv:cs.CR/2202.06539].
- Carlini, N.; Ippolito, D.; Jagielski, M.; Lee, K.; Tramer, F.; Zhang, C. Quantifying Memorization Across Neural Language Models, 2023, [arXiv:cs.LG/2202.07646].
- Wang, Z.; Bao, R.; Wu, Y.; Taylor, J.; Xiao, C.; Zheng, F.; Jiang, W.; Gao, S.; Zhang, Y. Unlocking Memorization in Large Language Models with Dynamic Soft Prompting. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing; Al-Onaizan, Y.; Bansal, M.; Chen, Y.N., Eds., Miami, Florida, USA, 2024; pp. 9782–9796.
- Tirumala, K.; Markosyan, A.H.; Zettlemoyer, L.; Aghajanyan, A. Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models, 2022, [arXiv:cs.CL/2205.10770].
- Freeman, J.; Rippe, C.; Debenedetti, E.; Andriushchenko, M. Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit, 2024, [arXiv:cs.LG/2412.06370].
- Geva, M.; Schuster, R.; Berant, J.; Levy, O. Transformer feed-forward layers are key-value memories. arXiv 2021, [2109.04554].
- Dai, D.; Dong, L.; Hao, Y.; Sui, Z.; Chang, B.; Wei, F. Knowledge neurons in pretrained transformers. arXiv 2021, [2104.08696].
- Wang, L.; Zhang, X.; Su, H.; Zhu, J. A comprehensive survey of continual learning: Theory, method and application. IEEE Transactions on Pattern Analysis and Machine Intelligence 2024.
- Kirkpatrick, J., R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A.A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al. 2017. Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences 114: 3521–3526.
- Feng, Y.; Chu, X.; Xu, Y.; Shi, G.; Liu, B.; Wu, X.M. TaSL: Continual Dialog State Tracking via Task Skill Localization and Consolidation. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers); Ku, L.W.; Martins, A.; Srikumar, V., Eds., Bangkok, Thailand, 2024; pp. 1266–1279.
- Wang, Y., X. Liu, X. Chen, S. O’Brien, J. Wu, and J. McAuley. Self-Updatable Large Language Models by Integrating Context into Model Parameters. Proceedings of the Thirteenth International Conference on Learning Representations.
- Wu, Y., H. Wang, P. Zhao, Y. Zheng, Y. Wei, and L.K. Huang. Mitigating catastrophic forgetting in online continual learning by modeling previous task interrelations via pareto optimization. Proceedings of the Forty-first International Conference on Machine Learning, 2024. [Google Scholar]
- Mehta, S.V.; Gupta, J.; Tay, Y.; Dehghani, M.; Tran, V.Q.; Rao, J.; Najork, M.; Strubell, E.; Metzler, D. DSI++: Updating transformer memory with new documents. arXiv preprint arXiv:2212.09744 2022.
- Wang, Y.; Han, C.; Wu, T.; He, X.; Zhou, W.; Sadeq, N.; Chen, X.; He, Z.; Wang, W.; Haffari, G.; et al. Towards lifespan cognitive systems. arXiv preprint arXiv:2409.13265 2024.
- Han, Z.; Gao, C.; Liu, J.; Zhang, J.; Zhang, S.Q. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv preprint arXiv:2403.14608 2024.
- Shao, Y., L. Li, J. Dai, and X. Qiu. Character-llm: A trainable agent for role-playing.
- Shang, J.; Zheng, Z.; Wei, J.; Ying, X.; Tao, F.; Team, M. Ai-native memory: A pathway from llms towards agi. arXiv preprint arXiv:2406.18312 2024.
- Liu, W.; Zhang, R.; Zhou, A.; Gao, F.; Liu, J. Echo: A large language model with temporal episodic memory. arXiv preprint arXiv:2502.16090 2025.
- McMahan, B., E. Moore, D. Ramage, S. Hampson, and B.A. y Arcas. Communication-efficient learning of deep networks from decentralized data. Proceedings of the AISTATS. PMLR, 2017; pp. 1273–1282. [Google Scholar]
- Marczak, D., B. Twardowski, T. Trzciński, and S. Cygert. MagMax: Leveraging Model Merging for Seamless Continual Learning. Proceedings of the ECCV, 2024. [Google Scholar]
- Lee, N., T. Ajanthan, and P. Torr. SNIP: Single-Shot Network Pruning Based on Connection Sensitivity. Proceedings of the International Conference on Learning Representations, 2019.
- Qu, Z., X. Li, R. Duan, Y. Liu, B. Tang, and Z. Lu. Generalized federated learning via sharpness aware minimization. Proceedings of the International Conference on Machine Learning. PMLR, 2022; pp. 18250–18280. [Google Scholar]
- Matena, M.S., and C.A. Raffel. 2022. Merging models with fisher-weighted averaging. NeurIPS 35: 17703–17716. [Google Scholar]
- Yadav, P.; Tam, D.; Choshen, L.; Raffel, C.; Bansal, M. TIES-Merging: Resolving Interference When Merging Models 2023. [2306.01708].
- Yu, L.; Yu, B.; Yu, H.; Huang, F.; Li, Y. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. ICML 2024, [2311.03099].
- Davari, M.; Belilovsky, E. Model breadcrumbs: Scaling multi-task model merging with sparse masks. arXiv preprint arXiv:2312.06795 2023.
- Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; Dean, J. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer 2017. [1701.06538].
- Muqeeth, M.; Liu, H.; Raffel, C. Soft merging of experts with adaptive routing. TMLR 2024.
- Ilharco, G.; Ribeiro, M.T.; Wortsman, M.; Gururangan, S.; Schmidt, L.; Hajishirzi, H.; Farhadi, A. Editing Models with Task Arithmetic 2022. [2212.04089].
- Yang, E.; Wang, Z.; Shen, L.; Liu, S.; Guo, G.; Wang, X.; Tao, D. AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR 2024, [2310.02575].
- Lu, Z.; Fan, C.; Wei, W.; Qu, X.; Chen, D.; Cheng, Y. Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging. NeurIPS 2024, [2406.15479].
- Wang, S., Y. Zhu, H. Liu, Z. Zheng, C. Chen, and J. Li. 2024. Knowledge editing for large language models: A survey. ACM Computing Surveys 57: 1–37. [Google Scholar] [CrossRef]
- Wang, Y.; Gao, Y.; Chen, X.; Jiang, H.; Li, S.; Yang, J.; Yin, Q.; Li, Z.; Li, X.; Yin, B.; et al. Memoryllm: Towards self-updatable large language models. arXiv preprint arXiv:2402.04624 2024.
- Wang, P., Z. Li, N. Zhang, Z. Xu, Y. Yao, Y. Jiang, P. Xie, F. Huang, and H. Chen. 2024. Wise: Rethinking the knowledge memory for lifelong model editing of large language models. Advances in Neural Information Processing Systems 37: 53764–53797. [Google Scholar]
- Nasr, M.; Carlini, N.; Hayase, J.; Jagielski, M.; Cooper, A.F.; Ippolito, D.; Choquette-Choo, C.A.; Wallace, E.; Tramèr, F.; Lee, K. Scalable Extraction of Training Data from (Production) Language Models, 2023, [arXiv:cs.LG/2311.17035].
- Ippolito, D.; Tramer, F.; Nasr, M.; Zhang, C.; Jagielski, M.; Lee, K.; Choquette Choo, C.; Carlini, N. Preventing Generation of Verbatim Memorization in Language Models Gives a False Sense of Privacy. In Proceedings of the 16th International Natural Language Generation Conference; Keet, C.M.; Lee, H.Y.; Zarrieß, S., Eds., Prague, Czechia, 2023; pp. 28–53.
- Biderman, S.; Prashanth, U.S.; Sutawika, L.; Schoelkopf, H.; Anthony, Q.; Purohit, S.; Raff, E. Emergent and Predictable Memorization in Large Language Models, 2023, [arXiv:cs.CL/2304.11158].
- Tulving, E., and W. Donaldson. 1972. Episodic and semantic memory. Academic Press. [Google Scholar]
- Begg, I. 1984. Tulving’s memory [Review of the book Elements of episodic memory, by E. Tulving]. Canadian Journal of Psychology / Revue canadienne de psychologie 38: 144–147. [Google Scholar] [CrossRef]
- Squire, L.R. Memory and brain systems: 1969–2009. J Neurosci 2009, 29, 12711–12716. doi: 10.1523/JNEUROSCI.3575-09.2009.
- Xiao, G.; Tian, Y.; Chen, B.; Han, S.; Lewis, M. Efficient Streaming Language Models with Attention Sinks. arXiv 2023.
- Meng, K., A.S. Sharma, A.J. Andonian, Y. Belinkov, and D. Bau. Mass-Editing Memory in a Transformer. Proceedings of the Eleventh International Conference on Learning Representations, 2023.
- Zhang, Z., Y. Sheng, T. Zhou, T. Chen, L. Zheng, R. Cai, Z. Song, Y. Tian, C. Ré, C. Barrett, et al. 2023. H2O: Heavy-hitter oracle for efficient generative inference of large language models. Advances in Neural Information Processing Systems 36: 34661–34710.
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The rise and potential of large language model based agents: A survey. arXiv 2023, [arXiv:cs.AI/2309.07864].
- Zhao, P.; Jin, Z.; Cheng, N. An in-depth survey of large language model-based artificial intelligence agents. arXiv 2023, [2309.14365].
- Cheng, Y.; Zhang, C.; Zhang, Z.; Meng, X.; Hong, S.; Li, W.; Wang, Z.; Wang, Z.; Yin, F.; Zhao, J.; et al. Exploring large language model based intelligent agents: Definitions, methods, and prospects. arXiv 2024, [2401.03428].
- Ge, Y.; Ren, Y.; Hua, W.; Xu, S.; Tan, J.; Zhang, Y. Llm as os (llmao), agents as apps: Envisioning aios, agents and the aios-agent ecosystem. arXiv 2023, [2312.03815].
- Durante, Z.; Huang, Q.; Wake, N.; Gong, R.; Park, J.S.; Sarkar, B.; Taori, R.; Noda, Y.; Terzopoulos, D.; Choi, Y.; et al. Agent ai: Surveying the horizons of multimodal interaction. arXiv 2024, [2401.03568].
- Huang, X.; Liu, W.; Chen, X.; Wang, X.; Wang, H.; Lian, D.; Wang, Y.; Tang, R.; Chen, E. Understanding the planning of llm agents: A survey. arXiv 2024, [2402.02716].
- Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N.V.; Wiest, O.; Zhang, X. Large language model based multi-agents: A survey of progress and challenges. arXiv 2024, [2402.01680].
- Li, Y.; Wen, H.; Wang, W.; Li, X.; Yuan, Y.; Liu, G.; Liu, J.; Xu, W.; Wang, X.; Sun, Y.; et al. Personal llm agents: Insights and survey about the capability, efficiency and security. arXiv 2024, [2401.05459].
- Zhu, Y.; Yuan, H.; Wang, S.; Liu, J.; Liu, W.; Deng, C.; Dou, Z.; Wen, J.R. Large language models for information retrieval: A survey. arXiv 2023, [2308.07107].
- Xu, D.; Chen, W.; Peng, W.; Zhang, C.; Xu, T.; Zhao, X.; Wu, X.; Zheng, Y.; Chen, E. Large language models for generative information extraction: A survey. arXiv 2023, [2312.17617].
- Li, L.; Zhang, Y.; Liu, D.; Chen, L. Large language models for generative recommendation: A survey and visionary discussions. arXiv 2023, [2309.01157].
- Lin, J.; Dai, X.; Xi, Y.; Liu, W.; Chen, B.; Li, X.; Zhu, C.; Guo, H.; Yu, Y.; Tang, R.; et al. How can recommender systems benefit from large language models: A survey. arXiv 2023, [2306.05817].
- Wang, W.; Lin, X.; Feng, F.; He, X.; Chua, T.S. Generative recommendation: Towards next-generation recommender paradigm. arXiv 2023, [2304.03516].
- Fan, A.; Gokkaya, B.; Harman, M.; Lyubarskiy, M.; Sengupta, S.; Yoo, S.; Zhang, J.M. Large language models for software engineering: Survey and open problems. arXiv 2023, [2310.03533].
- Wang, J.; Huang, Y.; Chen, C.; Liu, Z.; Wang, S.; Wang, Q. Software testing with large language models: Survey, landscape, and vision. IEEE Transactions on Software Engineering 2024.
- Zheng, Z.; Ning, K.; Wang, Y.; Zhang, J.; Zheng, D.; Ye, M.; Chen, J. A survey of large language models for code: Evolution, benchmarking, and future trends. arXiv 2023, [2311.10372].
- Zeng, F.; Gan, W.; Wang, Y.; Liu, N.; Yu, P.S. Large language models for robotics: A survey. arXiv 2023, [2311.07226].
- Cui, C.; Ma, Y.; Cao, X.; Ye, W.; Zhou, Y.; Liang, K.; Chen, J.; Lu, J.; Yang, Z.; Liao, K.D.; et al. A survey on multimodal large language models for autonomous driving. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 958–979.
- Yang, Z.; Jia, X.; Li, H.; Yan, J. A survey of large language models for autonomous driving. arXiv 2023, [2311.01043].
- He, K.; Mao, R.; Lin, Q.; Ruan, Y.; Lan, X.; Feng, M.; Cambria, E. A survey of large language models for healthcare: from data, technology, and applications to accountability and ethics. arXiv 2023, [2310.05694].
- Zhou, H.; Gu, B.; Zou, X.; Li, Y.; Chen, S.S.; Zhou, P.; Liu, J.; Hua, Y.; Mao, C.; Wu, X.; et al. A survey of large language models in medicine: Progress, application, and challenge. arXiv 2023, [2311.05112].
- Wang, B.; Xie, Q.; Pei, J.; Chen, Z.; Tiwari, P.; Li, Z.; Fu, J. Pre-trained language models in biomedical domain: A systematic survey. ACM Computing Surveys 2023, 56, 1–52.
- Li, Y., S. Wang, H. Ding, and H. Chen. Large language models in finance: A survey. Proceedings of the Fourth ACM International Conference on AI in Finance, 2023; pp. 374–382.
- He, T.; Fu, G.; Yu, Y.; Wang, F.; Li, J.; Zhao, Q.; Song, C.; Qi, H.; Luo, D.; Zou, H.; et al. Towards a psychological generalist ai: A survey of current applications of large language models and future prospects. arXiv 2023, [2312.04578].
- He, Z.; Lin, W.; Zheng, H.; Zhang, F.; Jones, M.W.; Aitchison, L.; Xu, X.; Liu, M.; Kristensson, P.O.; Shen, J. Human-inspired Perspectives: A Survey on AI Long-term Memory. arXiv preprint arXiv:2411.00489 2024.
- Du, Y.; Huang, W.; Zheng, D.; Wang, Z.; Montella, S.; Lapata, M.; Wong, K.F.; Pan, J.Z. Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions, 2025, [arXiv:cs.CL/2505.00675].
- Zeng, R.; Fang, J.; Liu, S.; Meng, Z. On the structural memory of llm agents. arXiv preprint arXiv:2412.15266 2024.
- Hatalis, K.; Christou, D.; Myers, J.; Jones, S.; Lambert, K.; Amos-Binks, A.; Dannenhauer, Z.; Dannenhauer, D. Memory Matters: The Need to Improve Long-Term Memory in LLM-Agents. Proceedings of the AAAI Symposium Series 2024, 2.
- Zhou, Z.; Ning, X.; Hong, K.; Fu, T.; Xu, J.; Li, S.; Lou, Y.; Wang, L.; Yuan, Z.; Li, X.; et al. A survey on efficient inference for large language models. arXiv preprint arXiv:2404.14294 2024.
- Shan, L.; Luo, S.; Zhu, Z.; Yuan, Y.; Wu, Y. Cognitive memory in large language models. arXiv preprint arXiv:2504.02441 2025.
- Baddeley, A.D.; Hitch, G. Working Memory; Academic Press, 1974; Vol. 8, Psychology of Learning and Motivation, pp. 47–89. [CrossRef]
- Sridhar, S., A. Khamaj, and M. Asthana. 2023. Cognitive neuroscience perspective on memory: overview and summary. Frontiers in human neuroscience 17: 1217093. [Google Scholar] [CrossRef]
- Sherwood, L., R.T. Kell, and C. Ward. 2004. Human physiology: from cells to systems. Thomson/Brooks/Cole. [Google Scholar]
- Weng, L. 2023. Llm-powered autonomous agents. lilianweng.github.io. [Google Scholar]
- Solso, R.L., and J. Kagan. 1979. Cognitive psychology. Houghton Mifflin Harcourt P. [Google Scholar]
- Craik, F.I., and R.S. Lockhart. 1972. Levels of processing: A framework for memory research. Journal of verbal learning and verbal behavior 11: 671–684. [Google Scholar] [CrossRef]
- Leydesdorff, S. 2017. Memory cultures: Memory, subjectivity and recognition. Routledge. [Google Scholar]
- Johnson-Laird, P.N. Mental models: Towards a cognitive science of language, inference, and consciousness; Vol. 6, Harvard University Press, 1983.
- Laird, J.E. 2019. The Soar cognitive architecture. MIT press. [Google Scholar]
- Sun, R. 2001. Duality of the mind: A bottom-up approach toward cognition. Psychology Press. [Google Scholar]
- Sutton, R.S., and A.G. Barto. 2018. Reinforcement learning: An introduction. MIT press. [Google Scholar]
- Zheng, L., R. Wang, X. Wang, and B. An. Synapse: Trajectory-as-exemplar prompting with memory for computer control. Proceedings of the NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023. [Google Scholar]
- Montazeralghaem, A., H. Zamani, and J. Allan. A reinforcement learning framework for relevance feedback. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020; pp. 59–68.
- Zhu, X.; Chen, Y.; Tian, H.; Tao, C.; Su, W.; Yang, C.; Huang, G.; Li, B.; Lu, L.; Wang, X.; et al. Ghost in the minecraft: Generally capable agents for open-world enviroments via large language models with text-based knowledge and memory. arXiv preprint arXiv:2305.17144 2023.
- Zhao, A.; Huang, D.; Xu, Q.; Lin, M.; Liu, Y.J.; Huang, G. Expel: Llm agents are experiential learners. arXiv preprint arXiv:2308.10144 2023.
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners, 2020, [arXiv:cs.CL/2005.14165].
- Qwen; Yang, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; et al. Qwen2.5 Technical Report, 2025, [arXiv:cs.CL/2412.15115].
- Bai, J.; Bai, S.; Chu, Y.; Cui, Z.; Dang, K.; Deng, X.; et al. Qwen Technical Report, 2023, [arXiv:cs.CL/2309.16609].
- Meng, F.; Tang, P.; Tang, X.; Yao, Z.; Sun, X.; Zhang, M. TransMLA: Multi-Head Latent Attention Is All You Need, 2025, [arXiv:cs.LG/2502.07864].
- Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2017; NIPS’17, pp. 6000–6010.
- Dai, D., Y. Sun, L. Dong, Y. Hao, S. Ma, Z. Sui, and F. Wei. Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 2023; pp. 4005–4019. [Google Scholar]
- Hahn, M.; Goyal, N. A theory of emergent in-context learning as implicit structure induction. arXiv 2023, arXiv:2303.07971.
- Garg, S.; Tsipras, D.; Liang, P.S.; Valiant, G. What can transformers learn in-context? A case study of simple function classes. In Proceedings of the Advances in Neural Information Processing Systems, 2022, Vol. 35, pp. 30583–30598.
- Min, S.; Lyu, X.; Holtzman, A.; Artetxe, M.; Lewis, M.; Hajishirzi, H.; Zettlemoyer, L. Rethinking the role of demonstrations: What makes in-context learning work? In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 2022; pp. 11048–11064.
- Pan, J., T. Gao, H. Chen, and D. Chen. What in-context learning “learns” in-context: Disentangling task recognition and task learning. Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, 2023; pp. 8298–8319. [Google Scholar]
- Singh, A.K., S.C. Chan, T. Moskovitz, E. Grant, A.M. Saxe, and F. Hill. The transient nature of emergent in-context learning in transformers. Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023. [Google Scholar]
- Raventos, A., M. Paul, F. Chen, and S. Ganguli. Pretraining task diversity and the emergence of non-bayesian in-context learning for regression. Proceedings of the Thirty-seventh Conference on Neural Information Processing Systems, 2023. [Google Scholar]
- Shen, L.; Mishra, A.; Khashabi, D. Do pretrained transformers learn in-context by gradient descent? arXiv 2024, arXiv:2310.08540.
- Liu, X., H. Chen, X. Hu, and X. Chu. FlowKV: Enhancing Multi-Turn Conversational Coherence in LLMs via Isolated Key-Value Cache Management. Proceedings of the First Workshop on Multi-Turn Interactions in Large Language Models, 2025. [Google Scholar]
- Li, Y.; Huang, Y.; Yang, B.; Venkitesh, B.; Locatelli, A.; Ye, H.; Cai, T.; Lewis, P.; Chen, D. SnapKV: LLM Knows What You are Looking for Before Generation. arXiv preprint arXiv:2404.14469 2024.
- Zhang, Z., Y. Sheng, T. Zhou, T. Chen, L. Zheng, R. Cai, Z. Song, Y. Tian, C. Ré, C. Barrett, et al. 2023. H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models. In Proceedings of the Advances in Neural Information Processing Systems. Edited by A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt and S. Levine. Curran Associates, Inc.: Vol. 36, pp. 34661–34710.
- Lu, J.; An, S.; Lin, M.; Pergola, G.; He, Y.; Yin, D.; Sun, X.; Wu, Y. Memochat: Tuning llms to use memos for consistent long-range open-domain conversation. arXiv preprint arXiv:2308.08239 2023.
- Wang, Q., Z. Tang, and B. He. Can LLM Simulations Truly Reflect Humanity? A Deep Dive. Proceedings of the The Fourth Blogpost Track at ICLR 2025, 2025. [Google Scholar]
- Wang, L., J. Zhang, H. Yang, Z. Chen, J. Tang, Z. Zhang, X. Chen, Y. Lin, R. Song, W.X. Zhao, et al. 2023. When large language model based agent meets user behavior analysis: A novel user simulation paradigm.
- Jin, B.; Yoon, J.; Han, J.; Arik, S.Ö. Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. CoRR 2024, abs/2410.05983, [2410.05983]. [CrossRef]
- Shi, K., X. Sun, Q. Li, and G. Xu. 2024. Compressing Long Context for Enhancing RAG with AMR-based Concept Distillation. CoRR abs/2405.03085: 2405.03085. [Google Scholar] [CrossRef]
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. CoRR 2020, abs/2004.05150, [2004.05150].
- Guo, M., J. Ainslie, D.C. Uthus, S. Ontañón, J. Ni, Y. Sung, and Y. Yang. 2022. LongT5: Efficient Text-To-Text Transformer for Long Sequences. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022. Seattle, WA, United States, Edited by M. Carpuat, M. de Marneffe and I.V.M. Ruíz. Association for Computational Linguistics, July 10-15, pp. 724–736. [Google Scholar] [CrossRef]
- Jin, H., Y. Zhang, D. Meng, J. Wang, and J. Tan. 2024. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. CoRR abs/2403.02901: 2403.02901. [Google Scholar] [CrossRef]
- Zhu, Y., H. Yuan, S. Wang, J. Liu, W. Liu, C. Deng, Z. Dou, and J. Wen. 2023. Large Language Models for Information Retrieval: A Survey. CoRR abs/2308.07107: [2308.07107]. [Google Scholar] [CrossRef]
- Wang, L., N. Yang, X. Huang, L. Yang, R. Majumder, and F. Wei. 2024. Improving Text Embeddings with Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics. Bangkok, Thailand, Edited by L. Ku, A. Martins and V. Srikumar. Association for Computational Linguistics, August 11-16, Volume 1, pp. 11897–11916. [Google Scholar] [CrossRef]
- OpenAI. New Embedding Models and API Updates, 2024. Accessed: 2024-01-25.
- Günther, M., J. Ong, I. Mohr, A. Abdessalem, T. Abel, M.K. Akram, S. Guzman, G. Mastrapas, S. Sturua, B. Wang, et al. 2023. Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents. CoRR abs/2310.19923: 2310.19923. [Google Scholar] [CrossRef]
- Chen, J., S. Xiao, P. Zhang, K. Luo, D. Lian, and Z. Liu. 2024. BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation. CoRR abs/2402.03216: 2402.03216. [Google Scholar] [CrossRef]
- Zhu, D.; Wang, L.; Yang, N.; Song, Y.; Wu, W.; Wei, F.; Li, S. LongEmbed: Extending Embedding Models for Long Context Retrieval. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024, Miami, FL, USA, November 12-16, 2024; Al-Onaizan, Y.; Bansal, M.; Chen, Y., Eds. Association for Computational Linguistics, 2024, pp. 802–816.
- Saad-Falcon, J.; Fu, D.Y.; Arora, S.; Guha, N.; Ré, C. Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT. In Proceedings of the Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, 2024. OpenReview.net, 2024.
- Herold, C.; Ney, H. Improving Long Context Document-Level Machine Translation. CoRR 2023, abs/2306.05183, [2306.05183]. [CrossRef]
- Wang, L., Z. Du, W. Jiao, C. Lyu, J. Pang, L. Cui, K. Song, D.F. Wong, S. Shi, and Z. Tu. 2024. Benchmarking and Improving Long-Text Translation with Large Language Models. Proceedings of the Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11-16, 2024, Edited by L. Ku, A. Martins and V. Srikumar. Association for Computational Linguistics: pp. 7175–7187. [Google Scholar] [CrossRef]
- Lyu, C., Z. Du, J. Xu, Y. Duan, M. Wu, T. Lynn, A.F. Aji, D.F. Wong, and L. Wang. 2024. A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC/COLING 2024 ELRA and ICCL. Torino, Italy, Edited by N. Calzolari, M. Kan, V. Hoste, A. Lenci, S. Sakti and N. Xue. May 20-25, pp. 1339–1352. [Google Scholar]
- OpenAI. Memory and New Controls for ChatGPT, 2024. Accessed: 2024-02-13.
- Inflection. I’m Pi, Your personal AI. https://inflection.ai/, 2023.
- Character AI. Character AI. Retrieved September 14, 2023 from https://character.ai/, 2023.
- Ai, T. Talkie | AI-Native Character Community, 2024.
- Lee, G., V. Hartmann, J. Park, D. Papailiopoulos, and K. Lee. 2023. Prompted LLMs as Chatbot Modules for Long Open-domain Conversation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023. Toronto, Canada, Edited by A. Rogers, J.L. Boyd-Graber and N. Okazaki. Association for Computational Linguistics, July 9-14, pp. 4536–4554. [Google Scholar] [CrossRef]
- Zhong, W., L. Guo, Q. Gao, H. Ye, and Y. Wang. 2024. MemoryBank: Enhancing Large Language Models with Long-Term Memory. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence, IAAI 2024, Fourteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2014, February 20-27, 2024. Vancouver, Canada, Edited by M.J. Wooldridge, J.G. Dy and S. Natarajan. AAAI Press: pp. 19724–19731. [Google Scholar] [CrossRef]
- Wang, W.; Dong, L.; Cheng, H.; Liu, X.; Yan, X.; Gao, J.; Wei, F. Augmenting Language Models with Long-Term Memory. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023; Oh, A.; Naumann, T.; Globerson, A.; Saenko, K.; Hardt, M.; Levine, S., Eds., 2023.
- Wang, X.; Salmani, M.; Omidi, P.; Ren, X.; Rezagholizadeh, M.; Eshaghi, A. Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, South Korea, August 3-9, 2024. ijcai.org, 2024, pp. 8299–8307.
- Tsai, Y.; Liu, M.; Ren, H. Rtlfixer: Automatically fixing rtl syntax errors with large language models. arXiv preprint arXiv:2311.16543 2023.
- Chen, D.; Wang, H.; Huo, Y.; Li, Y.; Zhang, H. Gamegpt: Multi-agent collaborative framework for game development. arXiv preprint arXiv:2310.08067 2023.
- Li, Y.; Zhang, Y.; Sun, L. Metaagents: Simulating interactions of human behaviors for llm-based task-oriented coordination via collaborative generative agents. arXiv preprint arXiv:2310.06500 2023.
- Zhang, K.; Li, J.; Li, G.; Shi, X.; Jin, Z. Codeagent: Enhancing code generation with tool-integrated agent systems for real-world repo-level coding challenges. arXiv preprint arXiv:2401.07339 2024.
- Lozhkov, A., R. Li, L.B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y. Wei, et al. 2024. StarCoder 2 and The Stack v2: The Next Generation. CoRR abs/2402.19173: 2402.19173. [Google Scholar] [CrossRef]
- Hui, B.; Yang, J.; Cui, Z.; Yang, J.; Liu, D.; Zhang, L.; Liu, T.; Zhang, J.; Yu, B.; Dang, K.; et al. Qwen2.5-Coder Technical Report. CoRR 2024, abs/2409.12186, [2409.12186]. [CrossRef]
- Mishra, M.; Stallone, M.; Zhang, G.; Shen, Y.; Prasad, A.; Soria, A.M.; Merler, M.; Selvam, P.; Surendran, S.; Singh, S.; et al. Granite Code Models: A Family of Open Foundation Models for Code Intelligence. CoRR 2024, abs/2405.04324, [2405.04324]. [CrossRef]
- GitHub. GitHub Copilot. 2022.
- Anysphere. Cursor - The AI Code Editor. https://www.cursor.com/en, 2025.
- Shao, Y.; Li, L.; Dai, J.; Qiu, X. Character-llm: A trainable agent for role-playing. arXiv preprint arXiv:2310.10158 2023.
- Kaiya, Z.; Naim, M.; Kondic, J.; Cortes, M.; Ge, J.; Luo, S.; Yang, G.R.; Ahn, A. Lyfe agents: Generative agents for low-cost real-time social interactions. arXiv 2023, [2310.02172].
- Li, N.; Gao, C.; Li, Y.; Liao, Q. Large language model-empowered agents for simulating macroeconomic activities. arXiv 2023, [2310.10436].
- Hua, W.; Fan, L.; Li, L.; Mei, K.; Ji, J.; Ge, Y.; Hemphill, L.; Zhang, Y. War and peace (waragent): Large language model-based multi-agent simulation of world wars. arXiv preprint arXiv:2311.17227 2023.
- Lee, G.; Hartmann, V.; Park, J.; Papailiopoulos, D.; Lee, K. Prompted llms as chatbot modules for long open-domain conversation. arXiv preprint arXiv:2305.04533 2023.
- Pan, H.; Zhai, Z.; Yuan, H.; Lv, Y.; Fu, R.; Liu, M.; Wang, Z.; Qin, B. Kwaiagents: Generalized information-seeking agent system with large language models. arXiv 2023, [2312.04889].
- Wu, Q.; Bansal, G.; Zhang, J.; Wu, Y.; Zhang, S.; Zhu, E.; Li, B.; Jiang, L.; Zhang, X.; Wang, C. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155 2023.
- Gao, S.; Chen, X.; Li, P.; Ren, Z.; Bing, L.; Zhao, D.; Yan, R. Abstractive Text Summarization by Incorporating Reader Comments. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. AAAI Press, 2019, pp. 6399–6406. [CrossRef]
- Kapoor, S.; Henderson, P.; Narayanan, A. Promises and pitfalls of artificial intelligence for legal applications. CoRR 2024, abs/2402.01656, [2402.01656]. [CrossRef]
- Fan, Y.; Sun, H.; Xue, K.; Zhang, X.; Zhang, S.; Ruan, T. MedOdyssey: A Medical Domain Benchmark for Long Context Evaluation Up to 200K Tokens. CoRR 2024, abs/2406.15019, [2406.15019]. [CrossRef]
- Reddy, V.; Koncel-Kedziorski, R.; Lai, V.D.; Krumdick, M.; Lovering, C.; Tanner, C. DocFinQA: A Long-Context Financial Reasoning Dataset. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 - Short Papers, Bangkok, Thailand, August 11-16, 2024; Ku, L.; Martins, A.; Srikumar, V., Eds. Association for Computational Linguistics, 2024, pp. 445–458.
- Masry, A.; Hajian, A. LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents. CoRR 2024, abs/2401.15050, [2401.15050]. [CrossRef]
- Nie, Y.; Kong, Y.; Dong, X.; Mulvey, J.M.; Poor, H.V.; Wen, Q.; Zohren, S. A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges. CoRR 2024, abs/2406.11903, [2406.11903]. [CrossRef]
- Hilgert, L.; Liu, D.; Niehues, J. Evaluating and Training Long-Context Large Language Models for Question Answering on Scientific Papers. In Proceedings of the 1st Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U); Kumar, S.; Balachandran, V.; Park, C.Y.; Shi, W.; Hayati, S.A.; Tsvetkov, Y.; Smith, N.; Hajishirzi, H.; Kang, D.; Jurgens, D., Eds., Miami, Florida, USA, 2024; pp. 220–236. [CrossRef]
- Shao, B., and J. Yan. 2024. A long-context language model for deciphering and generating bacteriophage genomes. Nature Communications 15: 9392. [Google Scholar] [CrossRef]
- Wang, H.; Liu, C.; Xi, N.; Qiang, Z.; Zhao, S.; Qin, B.; Liu, T. Huatuo: Tuning llama model with chinese medical knowledge. arXiv preprint arXiv:2304.06975 2023.
- Xiong, H.; Wang, S.; Zhu, Y.; Zhao, Z.; Liu, Y.; Wang, Q.; Shen, D. Doctorglm: Fine-tuning your chinese doctor is not a herculean task. arXiv preprint arXiv:2304.01097 2023.
- Liu, Z.; Zhong, A.; Li, Y.; Yang, L.; Ju, C.; Wu, Z.; Ma, C.; Shu, P.; Chen, C.; Kim, S.; et al. Radiology-gpt: A large language model for radiology. arXiv preprint arXiv:2306.08666 2023.
- Wang, H.; Zhao, S.; Qiang, Z.; Li, Z.; Xi, N.; Du, Y.; Cai, M.; Guo, H.; Chen, Y.; Xu, H.; et al. Knowledge-tuning large language models with structured medical knowledge bases for reliable response generation in chinese. arXiv preprint arXiv:2309.04175 2023.
- Yunxiang, L.; Zihan, L.; Kai, Z.; Ruilong, D.; You, Z. Chatdoctor: A medical chat model fine-tuned on llama model using medical domain knowledge. arXiv preprint arXiv:2303.14070 2023.
- Chen, K.; Li, J.; Wang, K.; Du, Y.; Yu, J.; Lu, J.; Li, L.; Qiu, J.; Pan, J.; Huang, Y.; et al. Chemist-x: Large language model-empowered agent for reaction condition recommendation in chemical synthesis 2024.
- Zhao, Z.; Ma, D.; Chen, L.; Sun, L.; Li, Z.; Xu, H.; Zhu, Z.; Zhu, S.; Fan, S.; Shen, G.; et al. Chemdfm: Dialogue foundation model for chemistry. arXiv preprint arXiv:2401.14818 2024.
- Chen, Z.Y., F.K. Xie, M. Wan, Y. Yuan, M. Liu, Z.G. Wang, S. Meng, and Y.G. Wang. 2023. Matchat: A large language model and application service platform for materials science. Chinese Physics B 32: 118104. [Google Scholar] [CrossRef]
- Wang, Z.; Liu, Z.; Zhang, Y.; Zhong, A.; Fan, L.; Wu, L.; Wen, Q. Rcagent: Cloud root cause analysis by autonomous agents with tool-augmented large language models. arXiv 2023, [2310.16340].
- Qiang, Z.; Wang, W.; Taylor, K. Agent-om: Leveraging large language models for ontology matching. arXiv 2023, [2312.00326].
- Pope, R.; Douglas, S.; Chowdhery, A.; Devlin, J.; Bradbury, J.; Levskaya, A.; Heek, J.; Xiao, K.; Agrawal, S.; Dean, J. Efficiently Scaling Transformer Inference, 2022, [arXiv:cs.LG/2211.05102].
- Lewis, P., E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.t. Yih, T. Rocktäschel, et al. 2020. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in neural information processing systems 33: 9459–9474. [Google Scholar]
- Park, J.S., J. O’Brien, C.J. Cai, M.R. Morris, P. Liang, and M.S. Bernstein. Generative agents: Interactive simulacra of human behavior. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023; pp. 1–22. [Google Scholar]
- Wulf, W.A., and S.A. McKee. 1995. Hitting the memory wall: implications of the obvious. SIGARCH Comput. Archit. News 23: 20–24. [Google Scholar] [CrossRef]
- Jouppi, N.P., C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers, et al. 2017. In-Datacenter Performance Analysis of a Tensor Processing Unit. SIGARCH Comput. Archit. News 45: 1–12. [Google Scholar] [CrossRef]
- Dao, T., D.Y. Fu, S. Ermon, A. Rudra, and C. Ré. FLASHATTENTION: fast and memory-efficient exact attention with IO-awareness. Proceedings of the 36th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2022; p. NIPS ’22. [Google Scholar]
- Yu, G.I., J.S. Jeong, G.W. Kim, S. Kim, and B.G. Chun. Orca: A Distributed Serving System for Transformer-Based Generative Models. Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), Carlsbad, CA, 2022; pp. 521–538. [Google Scholar]
- Zhong, Y., S. Liu, J. Chen, J. Hu, Y. Zhu, X. Liu, X. Jin, and H. Zhang. DistServe: disaggregating prefill and decoding for goodput-optimized large language model serving. Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, USA, 2024; p. OSDI’24. [Google Scholar]
- Brown, T.B., B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2020; p. NIPS ’20. [Google Scholar]
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models, 2022, [arXiv:cs.LG/2108.07258].
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models, 2021, [arXiv:cs.CL/2106.09685].
- Kwon, W., Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C.H. Yu, J. Gonzalez, H. Zhang, and I. Stoica. Efficient Memory Management for Large Language Model Serving with PagedAttention. Proceedings of the 29th Symposium on Operating Systems Principles, New York, NY, USA, 2023, vol. SOSP ’23, pp. 611–626. [Google Scholar] [CrossRef]
- Xiao, G.; Tian, Y.; Chen, B.; Han, S.; Lewis, M. Efficient Streaming Language Models with Attention Sinks, 2024, [arXiv:cs.CL/2309.17453].
- Karpukhin, V.; Oguz, B.; Min, S.; Lewis, P.S.; Wu, L.; Edunov, S.; Chen, D.; Yih, W.t. Dense passage retrieval for open-domain question answering. In Proceedings of the EMNLP (1), 2020, pp. 6769–6781.
- Khattab, O.; Zaharia, M. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA, 2020; SIGIR ’20, p. 39–48. [CrossRef]
- Izacard, G., P. Lewis, M. Lomeli, L. Hosseini, F. Petroni, T. Schick, J. Dwivedi-Yu, A. Joulin, S. Riedel, and E. Grave. 2023. Atlas: few-shot learning with retrieval augmented language models. J. Mach. Learn. Res. 24. [Google Scholar]
- Fu, Y.; Xue, L.; Huang, Y.; Brabete, A.O.; Ustiugov, D.; Patel, Y.; Mai, L. ServerlessLLM: low-latency serverless inference for large language models. In Proceedings of the 18th USENIX Conference on Operating Systems Design and Implementation, USA, 2024; OSDI’24.
- Jin, H.; Wu, Y. CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration. In Proceedings of the 2025 IEEE International Conference on Web Services (ICWS), 2025, pp. 316–323. [CrossRef]
- Murre, J.M., and J. Dros. 2015. Replication and analysis of ebbinghaus’ forgetting curve. PloS one 10: e0120644. [Google Scholar] [CrossRef]
- Wang, Z.Z.; Mao, J.; Fried, D.; Neubig, G. Agent workflow memory. arXiv preprint arXiv:2409.07429 2024.
- Jhunjhunwala, D.; Wang, S.; Joshi, G. FedFisher: Leveraging Fisher Information for One-Shot Federated Learning. In Proceedings of the AISTATS. PMLR, 2024, pp. 1612–1620.
- Daheim, N.; Möllenhoff, T.; Ponti, E.; Gurevych, I.; Khan, M.E. Model Merging by Uncertainty-Based Gradient Matching. In Proceedings of the ICLR, 2024.
- Yu, L.; Yu, B.; Yu, H.; Huang, F.; Li, Y. Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. ICML 2024.
- Wang, K.; Dimitriadis, N.; Ortiz-Jimenez, G.; Fleuret, F.; Frossard, P. Localizing Task Information for Improved Model Merging and Compression. ICML 2024.
- Lu, Z.; Fan, C.; Wei, W.; Qu, X.; Chen, D.; Cheng, Y. Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging. arXiv preprint arXiv:2406.15479 2024.
- Tang, A.; Shen, L.; Luo, Y.; Yin, N.; Zhang, L.; Tao, D. Merging Multi-Task Models via Weight-Ensembling Mixture of Experts. ICML 2024.
| 1 | For LLMs, we do not discuss sensory memory in detail, as LLMs primarily operate on text. |

| Memory Representation | Preferred Knowledge: Retention | Preferred Knowledge: Functional | Management Challenge: Construction | Management Challenge: Update | Management Challenge: Query | Strategic Gain (Return) |
|---|---|---|---|---|---|---|
| Token-level | Short-term | Episodic + Semantic | Medium | Medium | High | Editability |
| Intermediate latent | Short-term | Episodic | Low | High | Low | Efficiency |
| Parameter-level | Long-term | Procedural + Semantic | High | High | Low | Persistence |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).




