Submitted: 27 June 2024
Posted: 28 June 2024
Abstract
Keywords:
1. Introduction
1.1. Related Work
1.2. Contributions
- Two-phase conversation workflow. We implement a two-phase conversation system that mimics the workflow of real expert consultants and mitigates the limitations of RAG. Because documents are chunked, traditional RAG approaches risk feeding the conversational engine incomplete or misleading information (see footnote 1), which can lead to hallucinations or suboptimal matches. Our solution splits the conversational flow into two phases. In the first phase, the chatbot searches a set of tender summary cards, which allows the system to evaluate multiple funding initiatives comprehensively. The LLM’s reasoning abilities then filter out irrelevant documents, so that the user receives the most accurate and relevant funding opportunities.
- Italian speaking capabilities. We enhance our chatbot’s effectiveness by deploying state-of-the-art LLMs pre-trained on large multi-language text corpora, ensuring robust Italian language support.
- Pertinent visual recommendations. We incorporate an ad-hoc user interface powered by an LLM-based filter with function-calling capabilities. This feature triggers a second conversation phase where users can delve into the details of the most promising tenders.
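The two-phase flow described in the contributions above can be sketched as follows. This is a minimal illustration, not the system's implementation: the `Tender` class, `phase_one`, `phase_two`, and `keyword_filter` are hypothetical names, a bag-of-words cosine similarity stands in for the vector search, and a keyword-overlap rule stands in for the LLM-based relevance filter.

```python
from collections import Counter
from dataclasses import dataclass
import math
import re


@dataclass
class Tender:
    tender_id: str
    summary: str    # short summary card, searched in phase 1
    full_text: str  # full document, consulted only in phase 2


def _bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def phase_one(query, tenders, is_relevant, top_k=3):
    """Rank summary cards against the query, then keep those the filter accepts."""
    q = _bag_of_words(query)
    ranked = sorted(
        tenders,
        key=lambda t: _cosine(q, _bag_of_words(t.summary)),
        reverse=True,
    )
    return [t for t in ranked[:top_k] if is_relevant(query, t)]


def phase_two(selected: Tender, question: str) -> str:
    """Build the context for a detailed question about the tender the user picked."""
    return f"Tender {selected.tender_id}\n{selected.full_text}\n\nQuestion: {question}"


def keyword_filter(query: str, tender: Tender) -> bool:
    # Assumption: stand-in for the LLM filter. Accept a card when it shares
    # at least one word longer than four letters with the query.
    long_words = {w for w in _bag_of_words(query) if len(w) > 4}
    return bool(long_words & set(_bag_of_words(tender.summary)))


tenders = [
    Tender("T1", "Grants for small agricultural companies investing in irrigation",
           "Eligibility and deadlines for the irrigation grant."),
    Tender("T2", "Funding for digital innovation in manufacturing startups",
           "Eligibility and deadlines for the digital innovation call."),
    Tender("T3", "Support for tourism businesses renovating hotels",
           "Eligibility and deadlines for the hotel renovation scheme."),
]
query = "We are a manufacturing startup looking for digital innovation funding"

candidates = phase_one(query, tenders, keyword_filter)
context = phase_two(candidates[0], "What are the application requirements?")
```

In this sketch, phase one only ever reads the short summary cards, so several tenders can be compared at once; the full text of a tender enters the context only after the user has picked one of the filtered candidates.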
2. Material And Methods
2.1. Chatbot Architecture
2.2. 2-Phase Conversation
2.3. Technical Motivations
2.4. Back-end Architecture
2.5. Large Language Model (LLM)


2.6. Vector Database
2.7. Summarizer

2.8. Knowledge Base Structure
2.9. Information Retrieval System
2.10. Challenges and Solutions
2.11. Evaluation
- Accuracy in identifying the tender that best matches the characteristics of the end user (objective).
- Average perceived quality of each response, based on a scale containing five different levels of satisfaction (subjective).
- Average perceived quality of an entire conversation, based on a scale containing five different levels of satisfaction (subjective).
- Each evaluator will briefly study the summary sheet (about a paragraph of text) of the assigned tender.
- The evaluator will then start a conversation with the chatbot, simulating the behavior of a user whose characteristics exactly match the requirements of the tender in question. Specifically, the evaluator will fill out the form shown by the web application’s interface correctly and as completely as possible.
- The chatbot will generate an initial response, which will be evaluated through the specific interface (5 emojis) provided by the application. All subsequent responses will also be evaluated individually in the same manner.
- The evaluator will continue to interact with the chatbot for up to 3 iterations, or until the following condition occurs: the chatbot identifies the optimal tender, presents it in the response, and displays it in the follow-up buttons. When this condition is met, the evaluator records the successful match in a designated spreadsheet, along with the number of iterations performed (including the form, hence +1), and proceeds by pressing the button associated with the identified tender. If the condition is not met within the maximum number of iterations, the evaluator records the failure of the tender-client match in the spreadsheet, and the conversation is considered ended.
- If the conversation enters the second phase, the evaluator continues to interact with the chatbot for up to 3 iterations, asking more detailed questions about the tender in question (e.g., application requirements, support in submitting the funding application).
- At the end of the conversation, the evaluator provides an overall satisfaction rating through the interface (5 emojis) in the application’s sidebar, optionally attaching a summary comment on the service experience. The evaluation is saved by clicking the "Submit" button, which triggers a visual confirmation.
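The three metrics and the protocol above can be tied together in a small sketch. It rests on assumptions: `ConversationRecord` and `summarize` are hypothetical names, the five emojis are mapped to integers 1 to 5, and the per-response average is pooled over all rated responses. The subjective averages skip conversations with missing ratings.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class ConversationRecord:
    matched: bool                # optimal tender identified within the iteration limit
    iterations: Optional[int]    # iterations to the match, form included (+1); None on failure
    response_ratings: List[int] = field(default_factory=list)  # one 1..5 rating per response
    overall_rating: Optional[int] = None                       # 1..5, None if not submitted


def summarize(records: List[ConversationRecord]) -> dict:
    """Compute match accuracy and the two average satisfaction scores."""
    ratings = [r for rec in records for r in rec.response_ratings]
    overall = [rec.overall_rating for rec in records if rec.overall_rating is not None]
    return {
        "match_accuracy": sum(rec.matched for rec in records) / len(records),
        "mean_response_rating": mean(ratings) if ratings else None,
        "mean_conversation_rating": mean(overall) if overall else None,
    }


# Illustrative records only; these are not results from the study.
records = [
    ConversationRecord(True, 2, [4, 5], 5),
    ConversationRecord(False, None, [2, 3, 3], None),
    ConversationRecord(True, 1, [5], 4),
]
summary = summarize(records)
```

Keeping the objective match flag separate from the ratings lets the accuracy be computed over every conversation, while the two subjective averages use only the conversations where ratings were actually recorded.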
3. Results
4. Discussion
5. Conclusion
Appendix A Chatbot’s User Interface

Appendix B Prompts’ Translations




| 1 | This claim is correct under the assumption that knowledge base documents exceed the context length of the deployed LLM. |
| 2 | The last two metrics were computed on conversations for which the required data was available. |





Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).