Submitted: 23 September 2024
Posted: 24 September 2024
Abstract
Keywords:
1. Introduction
2. Related Work
3. Methodology
3.1. Text Preprocessing and Formatting
- Start and End Tokens: These tokens clearly demarcate the beginning and end of each dialogue. This helps the model to correctly interpret the scope of the conversation and ensures that it processes each dialogue turn within the defined boundaries.
- System Prompts: System prompts are used to provide the model with initial context or instructions. For example, a system prompt may include information about the role of the user and the expected type of interaction. This helps in setting the stage for the model’s responses and aligning them with the desired outcome of the dialogue.
- Few-Shot Examples: Few-shot examples are included to demonstrate the format and type of interactions expected. These examples help the model to understand the context and generate appropriate responses based on the given examples. They act as a reference point for the model, improving its performance by showing it what a typical dialogue might look like.
- Turn Formatting: Each turn in the dialogue, whether a user input or a model response, is formatted according to a specific template. This ensures consistency in the dialogue structure, making it easier for the model to parse and generate responses. The turns are interleaved in a way that maintains the flow of conversation.
- State Management: The preprocessing step also involves managing the state of the dialogue. This includes tracking the sequence of turns and ensuring that the context is preserved across multiple interactions. Proper state management helps in maintaining coherence and relevance in the model’s responses.
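The preprocessing steps above can be sketched as follows; the token strings, role names, and template layout are illustrative assumptions rather than the exact ones used in the pipeline.

```python
# Illustrative sketch of the dialogue formatting scheme: start/end tokens,
# a system prompt, few-shot examples, and consistently formatted turns.
# Token strings and the template are assumptions for demonstration.
START_TOKEN = "<start_of_turn>"
END_TOKEN = "<end_of_turn>"

def format_turn(role, text):
    """Wrap a single dialogue turn in start/end tokens."""
    return f"{START_TOKEN}{role}\n{text}{END_TOKEN}\n"

def build_prompt(system_prompt, few_shot_examples, history):
    """Assemble system prompt, few-shot examples, and the turn history
    (the dialogue state) into one consistently formatted input string."""
    parts = [format_turn("system", system_prompt)]
    for user_msg, model_msg in few_shot_examples:
        parts.append(format_turn("user", user_msg))
        parts.append(format_turn("model", model_msg))
    for role, text in history:  # state: ordered list of prior turns
        parts.append(format_turn(role, text))
    return "".join(parts)

prompt = build_prompt(
    "You are playing 20 Questions. Answer only yes or no.",
    [("Is it an animal?", "no")],
    [("user", "Is it man-made?")],
)
```

Because every turn is wrapped the same way, the model can rely on the token boundaries to delimit the scope of each turn, as described above.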
3.2. Agent Methods
- Initialization: The agent is initialized as $A = \mathrm{Agent}(\text{model}, \text{formatter}, \text{state})$, where model is the pre-trained language model, formatter is the text formatting method, and state represents the current state of the dialogue. Initialization involves loading the pre-trained model and setting up the formatter with the appropriate configuration to manage the dialogue state.
- Dialogue Processing: For a given input $x$, the formatted input is $x' = \mathrm{formatter}(\text{state}, x)$, and the agent generates a response $r = \mathrm{model}(x')$. This step formats the dialogue to ensure consistency and generates responses based on the context and previous turns.
- Response Parsing: The agent parses the model's response to extract meaningful information, $\text{info} = \mathrm{parse}(r)$, and updates the dialogue state accordingly.
The questioner agent is responsible for:
- Initializing Dialogues: Setting up the initial context and prompts.
- Question Generation: Formulating questions that narrow down the possible answers based on the current state and previous responses.
- State Management: Continuously updating the state with new information to refine future questions.
The answerer agent is responsible for:
- Initializing Dialogues: Preparing the agent with the secret word and setting up the initial state.
- Response Generation: Providing accurate yes/no answers to the questions asked.
- State Management: Tracking the questions and answers to ensure consistency and accuracy in responses.
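The agent interface described above can be sketched as below; the class and method names are illustrative assumptions, and the language model and formatter are stubbed as plain callables.

```python
# Sketch of the agent methods in Section 3.2: initialization, dialogue
# processing, and response parsing with state management. Names are
# illustrative assumptions; the model is any callable prompt -> text.
class Agent:
    def __init__(self, model, formatter):
        self.model = model          # pre-trained language model (callable)
        self.formatter = formatter  # text formatting method (Section 3.1)
        self.state = []             # dialogue state: list of (role, text)

    def step(self, user_input):
        """Format the input, generate a response, parse it, update state."""
        formatted = self.formatter(self.state, user_input)
        raw = self.model(formatted)       # r = model(formatted input)
        response = self.parse(raw)
        self.state.append(("user", user_input))
        self.state.append(("model", response))
        return response

    def parse(self, raw):
        """Extract the meaningful part of the model output."""
        return raw.strip().split("\n")[0]

class Answerer(Agent):
    """Holds the secret word and answers yes/no questions about it."""
    def __init__(self, model, formatter, secret_word):
        super().__init__(model, formatter)
        self.secret_word = secret_word
```

A questioner agent would subclass `Agent` in the same way, with its `step` formulating the next question from the accumulated state.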
3.3. Model Loading, Pretraining, and Configuration
- Model Loading: The model is loaded as $M = \mathrm{load\_model}(\text{name}, \text{config})$, where name identifies the pre-trained model and config includes the necessary configurations. The configuration file specifies the model architecture, tokenization details, and any pre-training objectives. Loading the model involves setting up the computational environment and ensuring all dependencies are properly installed.
- Pretraining: The models are further pretrained on specific datasets to enhance their performance for the 20 Questions game, fine-tuning the model parameters to better handle the nuances of the game: $M' = \mathrm{pretrain}(M, \text{dataset})$, where dataset represents the data used for pretraining. The fine-tuning process includes adjusting the learning rate, optimizing hyperparameters, and running multiple training epochs to improve model accuracy and response quality.
- Prompt Definition: A prompt template $P$ is defined to guide the model during the pre-training phase. The prompt includes system messages explaining the game rules and few-shot examples demonstrating the expected interaction between questioner and answerer. Crafting effective prompts involves selecting representative examples that cover a wide range of potential scenarios in the game, thereby improving the model's adaptability.
- Game Environment Initialization: The game environment was created using Kaggle's environment tools, where agents were configured to play as questioners or answerers. The environment configuration specifies the number of episodes, the time limits for each turn, and other relevant parameters. This setup ensures that the game runs smoothly and that the agents can interact within a controlled, reproducible framework.
- Agent Configuration: Each agent is assigned a role (questioner or answerer) and initialized with the appropriate model. Initialization involves loading the model weights, setting up the prompt, and configuring the agent to handle dialogue turns. This step includes defining the agent's behavior scripts and ensuring they adhere to the rules of the game.
- Game Execution: The game runs for a specified number of episodes, and the performance of the agents is recorded. During execution, the agents interact by generating questions and answers, aiming to identify the secret word in the fewest turns possible. Performance metrics such as accuracy, number of questions asked, and response time are recorded and analyzed to evaluate the effectiveness of each model.
- Debugging and Optimization: Throughout the process, extensive debugging and optimization are conducted to refine the models and improve their performance. This includes adjusting parameters, refining prompts, and iteratively testing the models in various scenarios to ensure robustness and reliability.
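The game-execution step can be illustrated with a minimal episode loop. The turn protocol, guess format, and turn limit below are simplifying assumptions, and the agents are stubbed as plain functions rather than the model-backed agents of Section 3.2.

```python
# Minimal sketch of one 20 Questions episode. The "Guess:" convention for
# final guesses and the 20-turn limit are assumptions for illustration.
def run_episode(questioner, answerer, secret_word, max_turns=20):
    """Run one episode; return (solved, turns_used)."""
    history = []  # list of (question, answer) pairs seen so far
    for turn in range(1, max_turns + 1):
        question = questioner(history)
        if question.startswith("Guess:"):
            guess = question[len("Guess:"):].strip()
            if guess == secret_word:
                return True, turn
            history.append((question, "no"))
            continue
        answer = answerer(question, secret_word)
        history.append((question, answer))
    return False, max_turns

# Stub agents for demonstration: a scripted questioner that guesses from a
# fixed list, and an answerer that checks for the secret word literally.
def scripted_questioner(history):
    return "Guess: " + ["cat", "duck"][len(history)]

def yes_no_answerer(question, secret):
    return "yes" if secret in question else "no"

result = run_episode(scripted_questioner, yes_no_answerer, "duck")
# result is (True, 2): the secret word is found on the second guess
```

Recording `turns_used` across episodes gives the number-of-questions metric mentioned above; accuracy and response time would be logged around the same loop.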
3.4. Advanced Strategy Implementation
- Probabilistic Reasoning: The probability of each potential word being the correct answer was updated after each question based on the responses received. This probabilistic approach allows the model to dynamically adjust its hypotheses as new information becomes available. Bayesian updating was used to formalize this process: $P(w \mid r) = \frac{P(r \mid w)\,P(w)}{P(r)}$, where $P(w \mid r)$ is the posterior probability of word $w$ given the response $r$, $P(r \mid w)$ is the likelihood of receiving response $r$ if $w$ is the correct word, $P(w)$ is the prior probability of $w$ being the correct word, and $P(r)$ is the marginal likelihood of receiving response $r$. This approach ensures that the model continually refines its predictions based on the most recent data.
- Decision-Making Algorithm: An advanced decision-making algorithm was designed to select the next question that maximizes the expected information gain. This strategy aims to ask questions that are most likely to reduce uncertainty about the correct word. The expected information gain $I(q)$ for a question $q$ is calculated as $I(q) = -\sum_{r \in R} P(r \mid q) \log_2 P(r \mid q)$, where $P(r \mid q)$ is the probability of receiving response $r$ given question $q$, and $R$ is the set of all possible responses. This calculation helps the model choose questions that are expected to yield the most informative answers, thereby optimizing the question selection process and improving the efficiency of narrowing down the possible answers.
- Heuristic Adjustments: In addition to probabilistic reasoning and decision-making algorithms, heuristic adjustments were made to improve performance. These adjustments include prioritizing certain types of questions based on empirical data, balancing exploration and exploitation strategies, and using domain-specific knowledge to guide the questioning process. Heuristics provide a practical approach to refine the model’s decision-making in scenarios where purely probabilistic methods may be insufficient or computationally expensive.
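A minimal sketch of the Bayesian update and the entropy-based question scoring described above, assuming a finite candidate word list and deterministic yes/no answers; the helper names are illustrative.

```python
import math

def bayes_update(prior, likelihood, response):
    """Posterior P(w|r) proportional to P(r|w) * P(w), normalized over all
    candidate words; the normalizer is the marginal likelihood P(r)."""
    unnorm = {w: likelihood(response, w) * p for w, p in prior.items()}
    z = sum(unnorm.values())  # marginal likelihood P(r)
    return {w: p / z for w, p in unnorm.items()}

def info_gain(prior, answers_yes):
    """Entropy of the yes/no response distribution for a question, where
    answers_yes(w) says whether word w would produce a 'yes'. A question
    that splits the probability mass evenly scores the full 1 bit."""
    p_yes = sum(p for w, p in prior.items() if answers_yes(w))
    gain = 0.0
    for p in (p_yes, 1.0 - p_yes):
        if p > 0:
            gain -= p * math.log2(p)
    return gain
```

Scoring every candidate question with `info_gain` and asking the maximizer, then applying `bayes_update` to the answer, implements the select-update cycle described in this subsection.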
3.5. Performance Metrics
4. Experiments and Results
4.1. Experimental Setup
4.2. Results Analysis
4.3. Comparative Analysis
5. Conclusions
References
| Model | Score |
|---|---|
| Self-defined Agent + Pre-trained Model | 651 |
| Self-defined Agent + Gemma 2b + Pre-trained Model | 692 |
| Self-defined Agent + ChatGPT + Pre-trained Model | 702 |
| Self-defined Agent + Llama-3-8b + Pre-trained Model | 741 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).