1. Introduction
Chatbots as an application of human-computer interaction have gained significant importance in the last few years and undergone remarkable developments. The growing interest in chatbots can be attributed to the proliferation of mobile devices over the past decade. As these smart devices, such as smartphones, have become increasingly popular, so too have the applications that run on them, including chatbots. Consequently, chatbots have transformed the ways in which humans interact with technology and have created new opportunities for organizations and businesses to engage with their clients. These programs are designed to perform a wide range of tasks in an automated manner, making them a useful tool in various settings [
1]. Chatbots, also called conversational agents, can be defined as conversational or interactive agents that provide an instant response to the user [
2,
3]. A computer program or artificial intelligence tool that conducts a conversation using auditory or textual methods.
Chatbots, or conversational agents, have been around since the development of the first application in 1960 called ELIZA by Joseph Weizenbaum at MIT. ELIZA used natural language processing (NLP) to recognize patterns and act as a therapist[
4]. Other notable examples include ALICE, developed by Richard Wallace in 1995, which uses Artificial Intelligence Markup Language (AIML) for pattern matching to recognize inputs and generate responses [
5]. Many current chatbots are based on ALICE's framework, such as HeX [
6] and Claude [
7] which were also developed based on standard pattern matching. HeX introduces a brand-new topic based on a certain probability[
8].
In the past, chatbots were primarily used in the customer service industry, but in recent years, they have become more sophisticated, able to understand and respond to more complex inputs. They are now used in a wide range of industries, such as healthcare [
9,
10], finance [
11,
12] ,customer service [
13,
14], individualized support via intelligent audio device [
15], and education such as using chatbots to learn Computer Programming concepts [
3,
16,
17]. The majority of chatbots are task-oriented and open-domain [
18], and are now integral parts of many companies' customer service and support operations, with their use expected to increase in the coming years [
19].
Educational institutions widely use chatbots to provide information and assistance related to education and learning. They can be implemented in a variety of settings, including schools, universities, and online learning platforms, as mobile web applications that support learning. Through chatbots, students can instantly access standardized details, such as course materials [
20], practice questions and answers [
21,
22], evaluation criteria [
23,
24], assignment due dates, advice [
25], campus path direction [
26], and study materials. Such systems can improve student engagement and support while reducing lecturers' administrative workload, allowing them to devote more time to curriculum development [
20]. Chatbots not only help students develop their interaction skills but also assist teaching faculty by automating some of their tasks [
27]. Integrating chatbots with education improves connectivity, efficiency, and reduces uncertainty in interactions [
28,
29], resulting in focused, personalized, and result-oriented online learning experiences [
20], which are vital for today's educational institutions. Furthermore, chatbots increase the level of support provided to each student, reducing the likelihood of ineffective learning and dropouts [
30,
31].
Chatbots, whether domain-specific or open-domain, usually undergo a training process that involves teaching them to understand and respond to user inputs in a natural and human-like manner. This is achieved through the use of machine learning techniques, including natural language processing (NLP) and deep learning.
There are various Natural Language Processing (NLP) models that can be utilized to develop open-domain chatbots. These models, such as BERT, RoBERTa, GPT-2, and GPT-3, are trained on large amounts of general-domain text data and then fine-tuned on smaller, task-specific datasets to adapt them to the particular task of an open-domain chatbot [
32,
33,
34]. Commonly used datasets, such as pushshift.io Reddit [
35] and Empathetic Dialogues [
36], are used to train the weights of a Transformer encoder-decoder, as seen in state-of-the-art chatbots such as Meena [
37] and BlenderBot [
38].
However, these models rely solely on facts provided in the training dataset and do not incorporate external knowledge sources to enhance their generation capabilities. This static language modeling approach fails to account for the dynamic nature of the world, where new information is constantly being updated. In other words, these models are trained based on the data collected at the time of their creation, and the knowledge acquired is static and not adaptive to changes. Other studies [
39,
40,
41] have explored the possibility of knowledge selection from a small set of knowledge, without employing a retrieval step or search engine, as was done in this study. Notably, using search engines for machine translation tasks has been shown to produce effective results [
42].
The Wizard of Wikipedia task [
43], involves conversations grounded in Wikipedia and utilizes a TFIDF retrieval model to find relevant information from the database. Similarly, our previous study [
44] employed the Wikipedia API to retrieve related knowledge from Wikipedia and answer user queries. Those studies are perhaps the closest to this research work. However, the approach in this study is more comprehensive as it leverages all publicly available information on the internet, thereby enabling the chatbot to cover a wider range of user queries.
To keep an open-domain chatbot's knowledge base current, it is important to continuously update it with new information. One way to achieve this is through web scraping techniques that gather data from sources like news websites, social media, and forums. APIs can also be used to access real-time data from external sources. By incorporating new information in real-time, chatbots can maintain an up-to-date knowledge base that is responsive to user needs.
The primary objective of this research is to develop a mechanism that can handle the ever-changing nature of acquired knowledge and empower a question-answering chatbot to respond to queries from diverse fields. To achieve this objective, the study will explore the potential of contemporary Internet search tools, especially search engines and their features. The research will integrate data science and machine learning algorithms to retrieve and analyze knowledge from the world wide web, creating a dynamic knowledge base that can be leveraged by an open-domain QA chatbot.
Furthermore, the research will investigate the potential of utilizing dynamic-related questions or follow-up questions retrieved from search engines to enhance student engagement with the chatbot. This strategy aims to encourage students to seek additional information about specific queries in a higher education setting, thereby promoting active information seeking. Incorporating dynamic follow-up questions is expected to improve the chatbot's effectiveness in addressing a wide range of queries and promoting active information-seeking among students. To answer the research questions, a pilot study will be conducted to evaluate the effectiveness of the developed chatbot in an educational setting. Additionally, the next chapter of the research will provide a detailed explanation of the architecture of the developed chatbot, including the technologies utilized.
2. Chatbot Architecture
This section describes the architecture of the developed question answering chatbot. Essentially, chatbot architecture describes how a chatbot system is structured, including the components and technologies used in its creation, and how these components interact to provide a seamless conversational experience for the user. Overall, architectures for chatbots are dependent upon their specific use cases and functionality requirements.
The type of chatbot developed in this study is a question-answering (QA) chatbot named Kucko Version 2. A question-answering (QA) chatbot is a type of chatbot designed to provide answers to user questions in natural language. The main goal of a QA chatbot is to provide accurate and relevant information to the user in real-time. Natural language processing (NLP) and machine learning techniques are typically used by QA chatbots to understand the user's question and retrieve the most relevant answer from its knowledge base. The knowledge base can be a structured database, unstructured text, or a combination of both. The main challenge of the question answering chatbot is its limited knowledge base [
44]. The architecture of the chatbot consists of a combination of various technical and functional components that work together to provide an appropriate and up-to-date response to user queries. These components work together to enable a chatbot to understand user input, retrieve information from its knowledge base, and generate an appropriate response in real-time. In general the developed chatbot is composed of four components: User Interface, Natural Language Processing, Information Retrieval, and Response Generation. In each of those components, a variety of technologies have been used emphasizing their accessibility and public availability. As the primary part of this research, Information Retrieval focuses on the mechanism for retrieving dynamic knowledge from the world wide web which we refer to as the internet wizard.
Figure 1.
Chatbot Architecture.
Figure 1.
Chatbot Architecture.
2.1. User Interface
It is the first and very crucial component where the user initiates communication with the chatbot. It is a visual element that allows users to interact with the chatbot and provides a means for the chatbot to communicate with the user. During the development of the chatbot, a key priority was placed on utilizing publicly available technologies and platforms. Specifically, the user interface of the chatbot was constructed using the Facebook Messenger API, which offers a range of tools and APIs designed to facilitate the creation of chatbots for the Facebook Messenger application. Through the use of the Facebook Messenger API, developers are able to develop customized chatbots that operate within the Facebook Messenger environment, allowing users to initiate conversations and engage with the chatbot.
Using the Facebook Messenger API enables the creation of chatbots that interact with users by sending and receiving messages, using templates and quick replies, and creating persistent menus. Also receive real-time updates about events that occur within the Messenger app, such as message delivery and receipt, via webhooks. The Facebook Messenger API is accessible through a RESTful interface, making it easy for developers to integrate with their existing systems and workflows. The API is designed to be scalable and flexible, allowing developers to build complex chatbots and conversational experiences with ease. A chatbot's UI components are an integral part of its overall design and functionality and should be carefully considered in order to provide a user-friendly and effective experience. Statistics show that social media users and mobile users heavily use Facebook Messenger. Also, the Facebook Messenger platform proved to be a powerful, sophisticated, and user-friendly platform to interact with the chatbot [
44].
In January 2023, Facebook boasted a staggering 2.963 billion monthly active users, positioning it as the most active social media platform worldwide. Furthermore, Facebook Messenger, which remains inaccessible in China, amassed at least 931.0 million users globally in the same month. This impressive figure translates to a 14.9% penetration rate among individuals aged 13 and above who currently use Facebook Messenger, securing its place as the seventh most active social media platform globally [
45,
46].
The development of the chatbot involved the use of several Facebook Messenger APIs to facilitate user interactions. The Messenger messaging API was utilized to handle the exchange of messages between the user and the chatbot. This API, provided by the Facebook Messenger Platform, enables the chatbot to send messages to users through Facebook Messenger and receive responses in return.
To enhance the messaging experience and provide users with a more interactive experience, the Button Template API has been integrated with chatbot responses to user queries. This template has been used to provide the message recipient with a list of related questions to the query submitted by the user. We call this list Related Questions which have been generated by the knowledge retrieval component and the user can post back those questions simply by clicking the button referring to the desired question.
In addition, the Customer Feedback Template has been used to provide users with the option of rating the chatbot response. This template has been associated with each chatbot response, which we call it Rate Answer section. The CSAT (Customer Satisfaction Score) has been used which allows users to rate the chatbot response from 1 to 5 stars.
2.2. Information Retrieval
In a chatbot architecture, the Information Retrieval (IR) component searches the chatbot's knowledge base for the most relevant data to address the user's query. It is a crucial part of a Question Answering (QA) chatbot, as it fundamentally governs the quality of responses proffered by the chatbot. The chatbot employs two approaches for information retrieval: keyword matching and a primary approach called the Internet Wizard, which leverages the vast repository of the World Wide Web to obtain pertinent information.
2.2.1. Keyword Matching
This is the first, simple and straightforward approach used in information retrieval of the developed open-domain QA chatbot. The basic idea behind keyword matching is to compare the user's input with a predefined list of keywords or phrases to determine the most relevant response. For the Keyword Matching approach, the Wit.ai NLP model was applied to detect keywords in user queries and MongoDB was used to store the keywords and corresponding answers in a structured database as the knowledgebase. The chatbot's knowledge base in this case MongoDB is pre-populated with a set of answers, and each answer is associated with a set of keywords that describe its content. When the user asks a question, the chatbot performs a keyword search through the wit.ai API to find the most relevant information in its knowledge base.
The keyword matching approach is used in such cases when the answer to a specific keyword or list of keywords needs to be hardcoded in the chatbot knowledge base for example questions related to the personality of the chatbot such as "What is your name?" otherwise the second approach which is Internet Wizard is used to answer user queries.
To facilitate the addition of keywords and answers to the MongoDB and Wit.ai databases, a user-friendly and sophisticated online platform has been developed. This platform streamlines the process of updating the knowledge base by eliminating the need for coding skills, making it easy and straightforward to maintain the database over time.
2.2.2. Internet Wizard
Internet Wizard is the dominant approach of the IR component of the developed open-domain QA chatbot and knowledgebase of the chatbot are heavily depend on this approach. Its called internet Wizard because it uses the world wide web knowledge to retrieve appropriate answer for the user quires. The world wide web (WWW) is a vast and constantly growing network of information, making it a vital resource for an open-domain chatbot knowledgebase. This approach combines several technologies, including data mining and machine learning, to find related documents on the web, index them, and process the content to locate knowledge that can be used for answering user inquiries.
The first step in this approach is searching the internet to find the most related documents to the user's queries. As a matter of fact, one of the most effective ways to access this information is through the use of search engines. Search engines are a fundamental component of the World Wide Web and provide users with an efficient means of accessing information. They have become the primary source of information for many individuals, who turn to them first when seeking information. Search engines are complex systems that use an interface to process and arrange documents based on their relevance to a given query [
47]. In light of this fact, the approach used the most popular search engine worldwide known as Google and utilized its features. Google's popularity and effectiveness make it a go-to tool for information retrieval, as its uses a proprietary algorithm known as PageRank to rank web pages based on their relevance to a user's query. PageRank considers the link structure of the web and assesses an individual page's value based on it, facilitating efficient and expedient information retrieval [
48]. Additionally, the vast size of Google's index, which was estimated to be over 55-billion individual web pages as of 2022 [
49], makes it a powerful tool for discovering information on a wide range of topics.
As part of the information retrieval (IR) component, a web scraping technique is implemented to interact with Google search results and extract pertinent content from featured snippets, the Google Knowledge Graph, and organic search results. This approach allows for the retrieval of relevant information directly associated with user queries.
The Internet wizard mechanism relies on Google-featured snippets as the primary source of information for answering user queries. Featured snippets are the latest enhancement to Search Engine Results Pages (SERPs), which display relevant information from web pages along with their source URL above organic search results. This provides users with quick and accurate answers to their queries, improving their search experience and efficiency [
50]. To generate its featured snippets, Google uses a complex set of algorithms. The exact algorithms used by Google are not publicly disclosed. However studies show several factors that influence the generation of featured snippets including the relevance and quality of the content, the source of the content, rank of the page, the user's search query, the page structure and formatting, multiple keyword inclusion in the website's content, and the use of different keyword locations, such as headings, titles, URLs, paragraphs, image ALT, link, and the frequency of the query [
51,
52].
The Internet Wizard leverages the Google Knowledge Graph as the second source of knowledge for responding to user queries. The Google Knowledge Graph is a semantic search engine that aims to provide more relevant and comprehensive information on entities, including people, places, things, and events. This is achieved by integrating structured data from multiple sources and processing it using machine learning algorithms to identify new relationships. By representing information using entities and their relationships, the Knowledge Graph provides users with constantly evolving and updating information. However, maintaining the vast amount of data and ensuring its completeness and accuracy presents significant challenges[
53].
During the next phase of Internet Wizard, if both the Google featured snippet and Google knowledge graph fail to provide appropriate answers to user queries, the next step will utilize Google organic search results.Google organic search results are the web pages that appear in response to a user's search query and are ranked based on relevance and popularity. These results are generated by Google's algorithms, which use complex mathematical formulas to determine the relevance and popularity of web pages. The algorithms take into account a variety of factors, including the content of the web page, the number of links pointing to the page, the relevance of the page to the search query, and the overall quality of the website. Google's organic search results are powered by the PageRank and RankBrain algorithms. PageRank analyzes the number and quality of links pointing to a web page and assigns a relevance score, while RankBrain uses machine learning to provide more relevant and accurate results for complex and conversational searches [
54,
55].
The mechanism takes the first five organic search results pages and feeds their content to the TF-IDF algorithm after removing HTML tags. TF-IDF is a statistical method used in information retrieval and natural language processing to determine the importance of a word in a document or corpus. It calculates word relevance based on its frequency in a document (term frequency) and rarity in the entire corpus (inverse document frequency), resulting in a score that reflects both. TF-IDF is widely used in information retrieval, text classification, and text clustering to assess document relevance[
56] [
57]. The TF-IDF algorithm selects the most relevant candidate document from the top five ranked pages, based on the user's query, to generate a response to the query in the next step.
The candidate document and user query are inputted into a TensorFlow Question and Answer (QA) model after being retrieved by TF-IDF. This model uses a pre-trained BERT model fine-tuned on the SQuAD 2.0 dataset to answer user questions based on the candidate document's content. TensorFlow is an open-source library developed by Google Brain [
58], has emerged as a powerful platform for QA tasks due to its scalability and flexibility. A widely used approach for QA tasks is to use pre-trained transformer models, such as BERT [
33] , to encode input text and questions into a fixed-length representation and use a classifier to predict the answer span. TensorFlow provides pre-built models and libraries for implementing BERT and other transformer models, making it easy to fine-tune these models for QA tasks [
59]. The use of pre-trained models reduces the amount of labeled data needed for training and improves performance by leveraging a large amount of general-purpose language information learned during pre-training.
In the event that the TensorFlow question answer model is unable to provide an answer based on the content of the candidate document, then the link to the candidate document with the default text "The following link provides a detailed answer" will be provided as the answer to the user's query. User can click the link to open the document inside the chat window without opening an external browser app. This document usually contains the most relevant information and explanation for the user's query.
2.3. NLP
Chatbot applications require Natural Language Processing (NLP) as an essential component to understand and interpret the user's language. NLP techniques include text classification, named entity recognition, sentiment analysis, and language translation, which are performed using rule-based methods, machine learning algorithms, and deep learning techniques[
60]. For the developed chatbot, the wit.ai NLP platform was employed to handle the first part of the NLP component. Wit.ai is a popular NLP platform that makes it easy for developers to incorporate natural language processing into their applications, particularly chatbots which uses a combination of rule-based and machine learning-based techniques to process natural language input and extract information like entities and intents [
61] . This information can then be used to trigger actions within an application or generate a response. The platform also enables developers to train customized models that can recognize specific entities, intents, and actions within a particular domain. As the expected user query structure was similar to our previous research, we used the same wit.ai application previously created and trained for that study.
The wit.ai model was trained to detect entities or more specifically keywords in each user query. Later, these entities were used to find the answer in the database.
The Tensorflow Question and Answer model, based on the BERT model, was also used in the chatbot architecture to provide answers to user queries. This model is created based on the BERT (Bidirectional Encoder Representations from Transformers) model and fine-tuned on SQuAD 2.0 dataset. BERT, developed by Google, is a transformer-based model that has been fine-tuned for a wide range of NLP tasks and has set new state-of-the-art performance on several benchmark datasets [
33].
2.4. Response Generation Component
This component is responsible for generating responses to user input and is a critical element in determining the effectiveness of the chatbot. This component is tightly integrated with the Information retrieval component. The response-generating component uses Facebook messenger APIs more specifically the pages_messaging API to send the response generated from the IR component. The Response Generation Component integrates two methods from the IR component: the Retrieval-based model using keyword matching to select a suitable response from a database, and the Generative model called Internet Wizard, which leverages Google snippets, knowledge graphs, TF-ID algorithm, and Tensorflow Question and Answer model to generate responses.
In addition, the Response Generation Component adds a list of the three related questions to the response generated by the IR component for each chatbot response. Associated questions are the most related questions that people have asked about the same topic or keyword on Google search engine. The Messaging API's button template is used to create a button for each related question, enabling users to easily ask follow-up questions by selecting the appropriate button.
It is hypothesized that this feature of the chatbot will enable users to dig deeper into the query in mind and become more informed about related topics, which will likely lead to more engagement between the chatbot and the user. Furthermore, the Customer Feedback Template Messaging API is used in this component to gather feedback from users in real-time. This template is associated with each chatbot response and is known as the "Rate the Answer" section. When the user clicks the "Rate the Answer" button, a rating window appears that allows users to rate their satisfaction with the chatbot response on a scale of 1 to 5 using the Customer Satisfaction Score (CSAT) metric. CSAT is a commonly used metric for measuring user satisfaction through surveys, providing businesses with a quick and easy way to monitor customer satisfaction over time [
62]. The ratings collected are saved in the MongoDB database for analysis of chatbot response quality and user satisfaction.
Figure 2.
Examples of Chatbot Responses to User Queries.
Figure 2.
Examples of Chatbot Responses to User Queries.
3. Pilot Study
To evaluate the performance of the implemented mechanism and the open-domain question QA chatbot features, a pilot study was conducted. Participants from the first-year Programming course at Eotvos Lorand University were enlisted, with encouragement to utilize the chatbot for any questions they had regarding the programming course, other courses, or unrelated topics. This course covers the basics of algorithms and programming and uses C# as the programming language. The study featured 35 participants of different nationalities, including 13 females and 22 males. It was carried out between December 1, 2022 and January 15, 2023.
3.1. Data collection
The data generated from the QA chatbot and user interaction were stored in two MongoDB collections (data and feedbacks) on a virtual node provided by ELTE where the chatbot code was deployed. MongoDB, a document-oriented NoSQL database system, stored the data in collections of JSON-like documents in BSON format.
The collected data involved different metrics, such as the user's query and the origin, which could either be entered by the user at the chatbot prompt or derived from the Related Questions feature of the chatbot. This metric is used for checking the effectiveness of the Related Questions feature of the chatbot in motivating the user to explore and discover more knowledge about the topic. The data further encompassed a distinct variable used to distinguish whether the answer obtained from the World Wide Web was directly from Google features, such as Google featured snippets or Google Knowledge Graph, or from a Tensorflow QA model dependent on the candidate document selected by TF-IDF. Furthermore, a variable was assigned to keep track of all the web documents used by the mechanism to generate the answer. Finally, another variable was created to capture the user's rating and satisfaction with the chatbot's response.
3.2. Evaluation Method
Chatbot technology has undergone continuous evolution, and it is expected to continue improving in the foreseeable future. Despite their widespread use, the standard for evaluating chatbots has not been firmly established and may become obsolete as technology advances. Even so, the evaluation of chatbots is crucial to their development and success. In this research the User Satisfaction Evaluation method has been used to evaluate the mechanism and features of the developed chatbot.
User satisfaction is a widely adopted evaluation method in chatbot assessment, where users interact with the chatbot and rate their satisfaction using a Likert scale or a similar measure [
63]. User satisfaction can be measured at two levels: session and turn level. At the session level, user satisfaction is measured based on the overall experience of the entire interaction between the user and the chatbot in one session. Turn level user satisfaction is a metric that reflects a user's satisfaction with each turn in the conversation. The turn level is defined as the interaction between a user and a chatbot, starting with a user message and ending in a chatbot response.
This research uses Turn Level Evaluation to enable users to evaluate each chatbot's response. This approach is a commonly used approach for assessing user satisfaction with open-domain QA chatbots, which allows users to express their preferences and opinions for each interaction. This method is preferred due to the diverse nature of topics that users may ask about, which requires a more flexible and independent evaluation approach. Studies have shown that incorporating user feedback through turn-level evaluations can improve chatbot performance and user engagement over time. The effectiveness and reliability of the turn level evaluation method for chatbot evaluation has been documented in several studies [
64,
65,
66].
The turn-level user satisfaction evaluation method in this research is implemented using the Customer Feedback Template Messaging API. In this context a template known as the "Rate the Answer" section has been incorporated into each chatbot response. This template prompts the user to rate their satisfaction with the chatbot response by presenting a rating window when the user clicks the "Rate the Answer" button. The rating scale ranges from 1 to 5 and is based on the Customer Satisfaction Score (CSAT) methodology.
The CSAT methodology is a widely adopted approach within the customer service industry to gauge customer satisfaction levels with a given product or service. This approach involves administering surveys to obtain a rating score, which can be numerical or symbolic (e.g., stars or smiley faces). The score ranges from 1 to 5, where higher values indicate higher satisfaction. The CSAT methodology has found applicability within the field of human-computer interaction (HCI) in assessing the usability and overall user experience of software applications including chatbots and conversational agents. CSAT scores serve as a valuable tool in assessing user interface effectiveness and holistic user experience [
67].
Figure 3.
Evaluating chatbot responses with CSAT.
Figure 3.
Evaluating chatbot responses with CSAT.
4. Data Analysis and Results
4.1. Analysis of a Chatbot's User Queries
The chatbot received a total of 1132 queries from 35 users. Out of these, 20 queries were about the chatbot's personality, such as "Who are you?" and "Where are you from?" The chatbot's local knowledge base had hardcoded answers for these queries, such as "I am designed and developed by the Kuckó lab team at Eötvös Loránd University. Ask questions about other topics! Such as: What is a higher-order function in functional programming?".
The other 1112 questions were diverse and covered a wide range of topics. Popular subjects included programming and coding concepts, mathematics, science and technology, health, and sports. Other miscellaneous topics were also asked about. It is noteworthy that while a significant number of inquiries received were related to course content, particularly programming and mathematics, many students posed questions regarding a diverse range of topics. For instance, one student asked about capitalism and the 1776 revolution, while another inquired, "Is there any pharmacy in elte lagymanyos campus?" and reached out for help with depression, writing "I am depressed can you help me?" . Other diverse questions such as How is Ukrainian beetroot soup called? When is the final match of World Cup 2022? Is VW atlas bigger than Telluride? These queries indicate that the chatbot was being used for a variety of purposes beyond course content, highlighting the need for chatbots to be capable of addressing a wide range of topics.
To address these queries, the chatbot utilized the Internet Wizard mechanism from the World Wide Web. Out of the 1112 queries, 890 answers were generated using Google feature snippets and the Google Knowledge Graph, while the remaining 222 responses were produced with the aid of the TF-IDF and Tensorflow modules.
Figure 4 presents a graphical representation of this information.
4.2. Analysis of Web Documents Used by Internet Wizard Mechanism
The Internet Wizard sub-component of the chatbot uses 763 unique web documents to generate answers. Each of these documents was employed at least once to generate answers to user queries. Analysis of resource metrics revealed that Wikipedia was the most frequently used web source with a record count of 116. This suggests that users who accessed the chatbot were using it as a source of general knowledge and information. This is followed by the URL
https://www.simplilearn.com/tutorials/c-tutorial/, which has a record count of 27, which indicates that users were interested in programming and mathematical topics. It is noteworthy that the URLs with the highest record counts are predominantly educational resources. For instance,
https://www.javatpoint.com/pyhton-factorial-number and
https://www.programiz.com/c-programming, which have 8 and 7 record counts respectively. This suggests that users were inquiring about information on how to solve programming and math problems related to their courses. The other websites on the list are more specialized and focus on particular topics such as information technology, science, sports, health, and many other diverse topics. These entries suggest that users use the chatbot to answer questions about a diverse range of interests beyond just programming and math. In
Figure 2, the top twenty most frequently used web documents by the chatbot to generate answers to user queries were identified and classified into separate categories in the pie chart. The remaining web documents, which were referenced less frequently and therefore demonstrated a high degree of diversity, have been aggregated into a collective category labeled "others." This category represents web documents that were used between one and three times, consisting of 743 unique web documents. This finding indicates that the Internet Wizard mechanism utilized a wide range of web documents to generate responses to user queries, which can be attributed to the diversity of the queries' topics.
Figure 5.
Analysis of Web Documents used by Internet Wizard Mechanism.
Figure 5.
Analysis of Web Documents used by Internet Wizard Mechanism.
4.3. Distribution of Queries in the Chatbot
The distribution of queries in
Figure 6 indicates that the Related Questions Feature was the predominant source, accounting for 55.6% of the records, compared to User-input questions, which accounted for 44.4% of the records. This implies that a substantial proportion of the questions asked were generated through the Related Questions Feature, indicating its ability to capture users' attention and stimulate their interest in seeking further knowledge about the original query.
The higher number of "Related Questions Feature" questions compared to "User-input question" questions further suggests that users were more likely to use the chatbot as an information source and seek follow-up questions. These results emphasize the usefulness of the Related Questions Feature and its contribution to the overall efficacy of the chatbot. It's possible that the user-friendly nature of the Related Questions Feature contributed to its frequent usage.
4.4. Feedback Analysis of Chatbot Responses
Based on the feedback collected through the Rating the Answer feature, it can be observed that out of 1,222 chatbot responses, 821 were rated by users. Of these, 511 answers were provided for queries generated by the Related Questions Feature, while 310 answers were provided for questions inputted by users. It is important to note that the Rating the Answer feature was not available for the hardcoded answers in the local database used for persona-related questions, where a keyword-matching mechanism was employed.
Analysis of the feedback collection documents presented in
Table 1 suggests that the majority of chatbot responses were sourced from Google Featured Snippets and Knowledge Graphs. Specifically, for questions generated by the Related Questions Feature, 453 answers were provided from this source, compared to 238 answers provided for questions inputted by users.
Similarly, for questions inputted by users, most answers were sourced from Google Featured Snippets and Knowledge Graphs, with only 72 coming from TF-IDF and TensorFlow. This finding indicates that Google's Featured Snippets and Knowledge Graph sections of the Internet Wizard mechanism played a significant role in providing answers.
The feedback ratings for the answers were generally high, with an average rating of 4.69 for questions sourced from the Related Questions Feature and Google Featured Snippets and Knowledge Graphs, and an average rating of 4.70 for questions inputted by users and sourced from the same source.
Figure 7 depicts that the majority of responses sourced from Google Featured Snippets and Knowledge Graphs received a rating of 5. However, when TF-IDF and TensorFlow were used as sources of answers for user-input questions, the average feedback rating decreased to 3.81. Responses sourced from TF-IDF and TensorFlow received lower ratings compared to those sourced from Google's Featured Snippets and Knowledge Graphs, as illustrated in
Figure 7, indicating a lower level of satisfaction with the answers provided by these sources.
Regarding the number of records, the highest count of 453 was for answers sourced from Google Featured Snippets and Knowledge Graphs for questions generated by the Related Questions Feature. The lowest count of 58 was for answers sourced from TF-IDF and TensorFlow for questions generated by the Related Questions Feature. The average feedback rate was 4.58, indicating that the majority of users were highly satisfied with the chatbot's responses. A feedback rating of 4 or above is generally considered positive, indicative of the chatbot's ability to accurately answer a wide range of questions and provide users with the information they need satisfactorily.
Overall, the results suggest that Google Featured Snippets and Knowledge Graphs were the primary sources for generating answers in the Internet Wizard mechanism. The high feedback ratings for answers sourced from this section Internet Wizard mechanism indicate that users were generally satisfied with the answers provided. Based on the feedback analysis, the Internet Wizard mechanism appears to be generally effective at answering user queries with a high level of user satisfaction. However, there is potential for improvement, particularly with regard to the use of TF-IDF and TensorFlow as sources of generating answers.
4.5. Comparison to statical language model
The Internet Wizard extracts information from the vast World Wide Web and continually updates its knowledge base in real-time without requiring any fees. This allows the chatbot to provide flexible and adaptable responses to queries that it has not been explicitly trained on, thus providing a more flexible and adaptable solution. Unlike statistical language models, the dynamic knowledge retrieval of the Internet Wizard provides a diverse and up-to-date knowledge base
Statistical language models, including large ones like GPT, are limited by their training data and have a static and time-limited knowledge base. This means that they are unable to respond to queries based on information acquired after a specific time period, such as 2021. For instance,
Figure 5 shows that ChatGPT struggles to answer a question about an event that happened in 2022. However, the Internet Wizard mechanism is effective in answering such queries by continuously retrieving the most recent and relevant information from the World Wide Web, as shown in
Figure 6.
The integration of the Internet Wizard mechanism with GPT can serve as a solution to the limitations of a static knowledge base in language models. By continuously retrieving the most recent and relevant information available on the web, the Internet Wizard mechanism enables the model to expand its knowledge base, resulting in more comprehensive and up-to-date responses. This combination of dynamic knowledge retrieval and advanced language modeling can result in a chatbot that provides accurate and informative responses to a wide range of user queries in a human-like style.
Figure 1.
(a) Real-time Knowledge Base of the Internet Wizard Mechanism;(b) Temporal Knowledge Base of the GPT-3.5 Language Model
Figure 1.
(a) Real-time Knowledge Base of the Internet Wizard Mechanism;(b) Temporal Knowledge Base of the GPT-3.5 Language Model
5. Conclusions, limitations, and future works
5.1. Conclusion
This study assessed the effectiveness of the Internet Wizard mechanism for improving the knowledge base of an open-domain QA chatbot. By employing search engine features, data mining techniques, and a machine learning module, the Internet Wizard mechanism dynamically retrieved related knowledge from the web, enabling the chatbot to generate responses to a wide range of topics.
The study demonstrated that the use of Google Featured Snippets and Knowledge Graphs played a crucial role in providing accurate responses to user queries, resulting in high levels of user satisfaction with the generated answers. Moreover, the chatbot and the Internet Wizard mechanism handled both educational and non-educational topics satisfactorily, as evidenced by the high rate of feedback and user satisfaction.
The use of dynamic knowledge-retrieving approaches, such as the Internet Wizard mechanism, offers a powerful tool for generating answers to a wide range of questions. This approach can complement the strengths of statistical language models, providing a comprehensive and flexible solution for natural language processing tasks. The success of the chatbot in addressing a broad range of topics underscores the potential of such mechanisms in developing a dynamic knowledge base for an open-domain QA chatbot.
In addition, the study highlights the importance of using the Related Questions feature, as it contributed to the majority of the questions asked by users indicating that users were more inclined to leverage the chatbot as an information source and seek follow-up questions. The findings suggest that a chatbot that can handle a wide variety of topics, leverages the Related Questions Feature, and provides accurate answers, can be a valuable tool for students seeking information beyond their coursework.
In conclusion, the utilization of the Internet Wizard mechanism, combined with the Related Questions feature, offers a promising approach for developing a dynamic knowledge base for an open-domain QA chatbot that can handle a wide range of topics. The success of the chatbot in addressing a broad range of questions, along with the high rate of feedback and user satisfaction, highlights the potential of this approach for providing accurate and informative responses to users seeking information.
5.2. Limitation
The present study demonstrates the efficacy of the Internet Wizard mechanism in augmenting the chatbot's proficiency in handling diverse topics. However, inconsistent performance was observed in the sections that employed TF-IDF and TensorFlow.
Specifically, the TensorFlow model intermittently failed to predict accurate responses based on the candidate document generated by TF-IDF.
This limitation may be attributed to either a paucity of pertinent information in the document content or the requirement for further refinement of the web document context through text fine-tuning techniques. To address these limitations and enhance the performance of the Internet Wizard mechanism, several potential solutions can be considered. One approach involves retrieving pertinent knowledge for the user's query from multiple documents and synthesizing them into a comprehensive knowledge base that can be fed into the TensorFlow model. Additionally, fine-tuning the TensorFlow model itself for specific tasks with relevant training data could potentially improve its precision and effectiveness.
5.3. Future Work
In terms of future research directions, there exists an opportunity to investigate the potential benefits of integrating the Internet Wizard mechanism with advanced natural language processing techniques, such as neural and Large Language Models (LLMs). This could serve to further enhance the chatbot's overall performance, thereby offering a more human-like chatting experience to the user, while also providing access to a dynamic and up-to-date knowledge base. Additionally, the potential solutions that have been suggested for addressing the limitations of the Internet Wizard mechanism could be explored in further research.