Submitted:
15 January 2026
Posted:
16 January 2026
Abstract
Keywords:
I. Introduction
- We empirically evaluate two meta-parameters (model size and temperature) of a prompting approach to slang meaning inference. Our experimental results show that increasing model size yields limited improvement, while higher temperature settings can reduce task accuracy.
- We propose a novel application of the chain-of-thought approach, integrated with a greedy search algorithm, to enhance language models’ inference capabilities. This extends chain-of-thought prompting beyond its traditional use in arithmetic reasoning tasks and demonstrates its effectiveness in improving models’ general inference abilities.
II. Related Work
III. Methodology
A. Problem Formulation
B. Greedy Search-Guided Chain-of-Thought
C. Step-by-Step Prompting
| Algorithm 1 Category Inference |
|---|
| Inputs: Context: the usage example; Slang: the target slang term; K: total number of candidates to be generated; Prompt: prompt instructing the LLM to generate categories. |
| Output: maxTuple: the category with the highest score |
| 1: maxTuple ← (null, 0) |
| 2: thoughts ← LLMInfer(Context, Slang, K, Prompt) |
| 3: for (category, score) in thoughts do |
| 4: if maxTuple[1] < score then |
| 5: maxTuple ← (category, score) |
| 6: end if |
| 7: end for |
| 8: return maxTuple |

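The greedy selection in Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `llm_infer` stands in for LLMInfer, and `fake_llm_infer`, its candidate categories, its scores, and the example sentence are all hypothetical placeholders for a real model call.

```python
from typing import Callable, List, Optional, Tuple

# A "thought" is one candidate category with the score the LLM assigned to it.
Thought = Tuple[str, float]


def infer_category(
    context: str,
    slang: str,
    k: int,
    prompt: str,
    llm_infer: Callable[[str, str, int, str], List[Thought]],
) -> Tuple[Optional[str], float]:
    """Return the (category, score) pair with the highest score (Algorithm 1)."""
    max_tuple: Tuple[Optional[str], float] = (None, 0.0)
    thoughts = llm_infer(context, slang, k, prompt)
    for category, score in thoughts:
        # Greedy step: keep only the best candidate seen so far.
        if max_tuple[1] < score:
            max_tuple = (category, score)
    return max_tuple


def fake_llm_infer(context: str, slang: str, k: int, prompt: str) -> List[Thought]:
    """Hypothetical stand-in for an LLM call; returns up to k scored categories."""
    candidates = [("emotion", 0.4), ("money", 0.7), ("food", 0.2)]
    return candidates[:k]


best = infer_category(
    "That jacket cost me fifty bones.",
    "bones",
    3,
    "List likely categories with scores.",
    fake_llm_infer,
)
print(best)  # ('money', 0.7)
```

Because only the single best category is carried forward, the later steps explore one branch rather than a full tree of candidates, which is what makes the search greedy.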
| Algorithm 2 Essential Meaning Generation |
|---|
| Inputs: Category: the selected category; Slang: the target slang term; K: total number of candidates to be generated; Prompt: prompt instructing the LLM to generate meanings. |
| Output: meaningList: a list of generated meanings with confidence scores |
| 1: meaningList ← [] |
| 2: meaningList ← LLMInfer(Category, Slang, K, Prompt) |
| 3: return meaningList |

| Algorithm 3 Context Coherence Check |
|---|
| Inputs: Context: the usage example; meaningList: a list of generated meanings with confidence scores; Prompt: prompt instructing the LLM to check compatibility with the original context. |
| Output: selectedMeaning: the selected meaning |
| 1: selectedMeaning ← null |
| 2: finalScore ← 0 |
| 3: confidenceScore ← null |
| 4: for (meaning, score) in meaningList do |
| 5: confidenceScore ← LLMInfer(Context, meaning, Prompt) |
| 6: if finalScore < (confidenceScore * 0.6 + score * 0.4) then |
| 7: finalScore ← confidenceScore * 0.6 + score * 0.4 |
| 8: selectedMeaning ← meaning |
| 9: end if |
| 10: end for |
| 11: return selectedMeaning |

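The weighted selection in Algorithm 3 can be sketched in Python as below. The weights 0.6 and 0.4 come from the algorithm; everything else is an assumption made for illustration: `llm_infer` stands in for LLMInfer, and `fake_coherence`, its scores, and the example meanings are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def select_meaning(
    context: str,
    meaning_list: List[Tuple[str, float]],
    prompt: str,
    llm_infer: Callable[[str, str, str], float],
    w_context: float = 0.6,
    w_generation: float = 0.4,
) -> Optional[str]:
    """Pick the meaning with the best weighted score (Algorithm 3)."""
    selected_meaning: Optional[str] = None
    final_score = 0.0
    for meaning, score in meaning_list:
        # Score from the context coherence check for this candidate meaning.
        confidence_score = llm_infer(context, meaning, prompt)
        # Combine the coherence score with the score from meaning generation.
        combined = confidence_score * w_context + score * w_generation
        if final_score < combined:
            final_score = combined
            selected_meaning = meaning
    return selected_meaning


def fake_coherence(context: str, meaning: str, prompt: str) -> float:
    """Hypothetical stand-in for the LLM coherence check."""
    scores = {"dollars": 0.9, "skeleton parts": 0.1}
    return scores.get(meaning, 0.0)


chosen = select_meaning(
    "That jacket cost me fifty bones.",
    [("dollars", 0.5), ("skeleton parts", 0.9)],
    "Rate how well this meaning fits the context.",
    fake_coherence,
)
print(chosen)  # dollars
```

In this made-up example the candidate with the lower generation score still wins (0.74 against 0.42), because the coherence check carries the larger weight.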
IV. Experiment and Analysis
A. Experimental Setting
B. Metrics
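The results table reports ROUGE-L precision, recall, and F1. As a reminder of how these quantities are conventionally defined, the sketch below computes them from the longest common subsequence (LCS) of the candidate and reference token sequences. It is a minimal illustration under stated assumptions: lowercased whitespace tokenization and the example sentences are hypothetical, and the exact ROUGE package and settings used in the experiments are not specified here.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) based on the LCS of the two texts."""
    cand = candidate.lower().split()  # assumption: whitespace tokenization
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = rouge_l("a large amount of money", "a lot of money")
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.6 0.75 0.667
```

Here the LCS is "a of money" (3 tokens), so precision is 3/5 and recall is 3/4.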
C. Experimental Results
- a) Limited Impact from Model Size and Temperature Setting: Comparing models of different sizes, we observe that larger models do not necessarily perform better, as shown in Table 1. Interestingly, smaller models such as GPT-4o-mini and Qwen2-7B-Instruct achieve higher ROUGE-L F1 scores than their larger counterparts despite lower SimCSE scores.

- b) Improved Accuracy with Greedy Search-Guided Chain-of-Thought: Given the superior overall performance of Qwen2-7B-Instruct with the temperature set to 0.3, we proceeded to the second experiment, which implements our proposed approach.
V. Conclusions
Appendix A









| Models | ROUGE-L F1 | ROUGE-L Precision | ROUGE-L Recall | SimCSE |
|---|---|---|---|---|
| GPT-4o | 0.225 | 0.149 | 0.171 | 0.736 |
| GPT-4o-mini | 0.299 | 0.166 | 0.250 | 0.727 |
| Qwen2.5-72B | 0.123 | 0.250 | 0.199 | 0.715 |
| Qwen2-7B-Instruct | 0.170 | 0.322 | 0.222 | 0.696 |
| DeepSeek-V3 | 0.235 | 0.166 | 0.300 | 0.726 |
| DeepSeek-R1-Distill-Llama-8B | 0.166 | 0.111 | 0.222 | 0.696 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).