Submitted:
06 January 2025
Posted:
07 January 2025
Abstract
Keywords:
MSC: 68T07; 03E72; 90B50
1. Introduction
- Lack of trust – Users may hesitate to fully trust AI-powered chatbots, particularly when dealing with sensitive information or complex interactions. Building trust in the accuracy, security and reliability of these systems is critical for their broader acceptance.
- Limited understanding and awareness – Many users are unfamiliar with the capabilities and benefits of GAI chatbots. This lack of knowledge or understanding about how they function and what they offer may hinder adoption.
- User experience and satisfaction – Poorly designed chatbots can lead to unsatisfactory user experiences. Frustrating interactions or failure to resolve queries effectively may discourage continued use.
- Cost and ROI – Developing and maintaining GAI chatbots can be expensive, particularly for small and medium-sized enterprises. Organizations must carefully assess the return on investment (ROI) and weigh costs against potential benefits.
- Ethical and bias concerns – GAI chatbots are only as reliable and fair as the data they are trained on, which can sometimes perpetuate biases or unfair practices. Ensuring chatbots are ethical, unbiased and inclusive is important for their acceptance and broader implementation.
- Analysis and categorization of existing multi-criteria approaches for AI chatbot selection, classified by the techniques used and the types of estimates employed (numeric, interval, linguistic values, as well as crisp and fuzzy numbers). These approaches are then grouped into three main categories based on complexity (number of multi-criteria techniques), flexibility (type of fuzziness) and iterativeness (single or repeated data processing).
- Development of a theoretical framework for ranking GAI chatbots using both single and hybrid methods with crisp and fuzzy estimates. Single methods rely on one weight determination or ranking approach, while hybrid methods combine multiple approaches. The framework also incorporates complementary techniques such as fuzzy interval arithmetic operations, robustness analysis and sensitivity analysis to enhance decision-making and benchmark rankings. Additionally, it proposes a newly developed 3D distance metric to improve the efficiency of the hesitant Fermatean fuzzy group TOPSIS method, enabling more effective multi-criteria comparisons of chatbot features.
- Creation of static and dynamic rankings of an AI chatbot dataset via single or repeated multi-criteria decision analysis. In static rankings, experts’ opinions serve as inputs for the decision matrices, whereas dynamic rankings measure user attitudes—potentially informed by behavior or survey data. Comparative analyses with other multi-criteria baselines underscore both the effectiveness and reliability of the proposed methods.
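The contributions above mention sensitivity analysis, but this excerpt does not fix a procedure. As a minimal sketch, assuming a crisp weighted sum model (WSM) and one-at-a-time weight perturbation with renormalization (both are our assumptions, not necessarily the author's procedure), the Python below checks whether a ranking survives changes in individual criterion weights. All function names and the 3 × 2 ratings matrix are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def wsm_scores(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Weighted sum of ratings that are already normalized and benefit-oriented."""
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]


def ranks(scores: Sequence[float]) -> List[int]:
    """Competition ranking: rank 1 = highest score; equal scores share a rank."""
    return [1 + sum(other > s for other in scores) for s in scores]


def perturb_weight(weights: Sequence[float], j: int, delta: float) -> List[float]:
    """Shift the weight of criterion j by delta (clipped at 0) and renormalize to sum 1."""
    w = list(weights)
    w[j] = max(0.0, w[j] + delta)
    total = sum(w)
    return [x / total for x in w]


def one_at_a_time_sensitivity(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    deltas: Sequence[float],
) -> Tuple[List[int], Dict[Tuple[int, float], bool]]:
    """For each criterion j and shift delta, record whether the base ranking survives."""
    base = ranks(wsm_scores(matrix, weights))
    stable: Dict[Tuple[int, float], bool] = {}
    for j in range(len(weights)):
        for d in deltas:
            perturbed = ranks(wsm_scores(matrix, perturb_weight(weights, j, d)))
            stable[(j, d)] = perturbed == base
    return base, stable


# Hypothetical data: 3 alternatives x 2 criteria, ratings already in [0, 1].
M = [[0.9, 0.7], [0.6, 0.8], [0.2, 0.3]]
W = [0.5, 0.5]
base, stable = one_at_a_time_sensitivity(M, W, deltas=[-0.3, 0.3])
print(base, all(stable.values()))
print(ranks(wsm_scores(M, perturb_weight(W, 1, 2.0))))
```

In this toy case the base ranking holds for ±0.3 shifts, while a large shift toward the second criterion swaps the first two alternatives. Reporting the smallest shift that changes the ranking is one common way to summarize robustness.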
2. Related Work
2.1. Literature Review on MCDM Methods for GAI Chatbot Evaluation
- Limited handling of inaccurate attribute estimates – Few studies, such as those by Chakrabortty et al. [15], Pandey et al. [18] and Ojo et al. [21], effectively address imprecise attribute estimates. Since AI chatbot evaluations often depend on subjective factors, assessments should involve expert groups utilizing classic fuzzy numbers or their advanced variants.
- Non-iterative fuzzy solutions – Existing fuzzy methodologies typically implement only one or two MCDM methods in a single, non-iterative procedure.
2.2. Chatbot Evaluation Criteria
- Conversational ability evaluates the chatbot’s capacity to understand and generate natural language responses, ensuring context-aware, coherent, and human-like interactions.
- User experience measures ease of use, intuitiveness, and satisfaction, focusing on design, accessibility, and the chatbot’s ability to meet user needs effectively.
- Integration capability assesses how seamlessly the chatbot integrates with existing tools, platforms, or workflows, enhancing usability and productivity.
- Price considers the affordability of the chatbot, evaluating its cost relative to its features, functionality and overall value.
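In TOPSIS-style methods, the direction of each criterion determines its ideal values. Conversational ability, user experience and integration capability are naturally "more is better", whereas price is "less is better". The sketch below assumes that the four criteria map to C1 to C4 in the order listed and that only price is a cost criterion (the criterion-type row of the decision matrix is empty in this version). The numbers are hypothetical, and the linear max/min normalization is one common choice, not necessarily the author's.

```python
from typing import List, Sequence, Tuple


def ideal_solutions(
    matrix: Sequence[Sequence[float]], is_benefit: Sequence[bool]
) -> Tuple[List[float], List[float]]:
    """Positive ideal: best value per column (max for benefit, min for cost).
    Negative ideal: worst value per column."""
    cols = list(zip(*matrix))
    pis = [max(c) if b else min(c) for c, b in zip(cols, is_benefit)]
    nis = [min(c) if b else max(c) for c, b in zip(cols, is_benefit)]
    return pis, nis


def linear_normalize(
    matrix: Sequence[Sequence[float]], is_benefit: Sequence[bool]
) -> List[List[float]]:
    """Linear normalization into (0, 1]: x / max for benefit columns and
    min / x for cost columns (assumes strictly positive entries)."""
    cols = list(zip(*matrix))
    norm_cols = []
    for col, benefit in zip(cols, is_benefit):
        if benefit:
            top = max(col)
            norm_cols.append([x / top for x in col])
        else:
            low = min(col)
            norm_cols.append([low / x for x in col])
    return [list(row) for row in zip(*norm_cols)]


# Hypothetical data: C1-C3 rated on a 1-5 scale, C4 = monthly price in USD.
X = [[5, 4, 5, 20.0], [4, 4, 5, 20.0], [3, 3, 3, 10.0]]
TYPES = [True, True, True, False]  # assumption: only price (C4) is a cost criterion
pis, nis = ideal_solutions(X, TYPES)
N = linear_normalize(X, TYPES)
print(pis, nis)
print(N[2])
```

After normalization every column is "higher is better", so the cheapest option receives 1.0 on C4 even though its raw value is the smallest.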
2.3. State-of-the-Art of the Most Widely Used GAI Chatbots
3. Methodological Framework for GAI Chatbot Selection
3.1. Interval-Valued Hesitant Fermatean Fuzzy Numbers–Some Basic Definitions and Operations
3.2. TOPSIS in IVHFFNs Environment
3.3. Theoretical Framework for GAI Chatbot Selection
4. A Case Study of Quality-Based Evaluation of GAI Chatbots
5. Conclusions
- Incorporation of interval-valued membership and non-membership grades, along with interval-valued hesitancy degrees in the evaluation process.
- Integration of a Minkowski distance-based family of metrics, enabling flexible and accurate distance calculations tailored to various data types.
- Consideration of the lengths of belongingness, non-belongingness, and hesitancy intervals in distance calculations, ensuring a comprehensive assessment of each criterion’s impact.
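The three features listed above can be illustrated with a distance between interval-valued hesitant Fermatean fuzzy numbers (IVHFFNs). The sketch below is not the paper's 3D metric, whose formula is not reproduced in this excerpt. It is a hypothetical construction assuming that (i) hesitant sets of unequal length are padded by repeating their largest interval, (ii) all bounds enter cubed, as in the Fermatean constraint, and (iii) the hesitancy interval is derived from the mean cubed bounds. The Minkowski parameter `p` generates the family of metrics, and the VH, H and VL values come from the linguistic scale used in the case study.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[Sequence[Interval], Sequence[Interval]]  # (membership, non-membership)


def _pad(a: Sequence[Interval], b: Sequence[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Sort two hesitant sets and pad the shorter one by repeating its largest
    interval (an optimistic convention; other padding rules exist)."""
    a, b = sorted(a), sorted(b)
    n = max(len(a), len(b))
    return a + [a[-1]] * (n - len(a)), b + [b[-1]] * (n - len(b))


def _cubed_terms(x: Interval, y: Interval) -> List[float]:
    """Differences of cubed lower bounds, cubed upper bounds and cubed interval lengths."""
    (xl, xu), (yl, yu) = [(lo ** 3, hi ** 3) for lo, hi in (x, y)]
    return [abs(xl - yl), abs(xu - yu), abs((xu - xl) - (yu - yl))]


def _hesitancy_cubed(v: IVHFFN) -> Interval:
    """Cubed hesitancy interval [1 - M_U - N_U, 1 - M_L - N_L] built from the
    mean cubed membership (M) and non-membership (N) bounds."""
    mu, nu = v
    m_l = sum(lo ** 3 for lo, _ in mu) / len(mu)
    m_u = sum(hi ** 3 for _, hi in mu) / len(mu)
    n_l = sum(lo ** 3 for lo, _ in nu) / len(nu)
    n_u = sum(hi ** 3 for _, hi in nu) / len(nu)
    return 1 - m_u - n_u, 1 - m_l - n_l


def ivhffn_distance(a: IVHFFN, b: IVHFFN, p: float = 2.0) -> float:
    """Normalized Minkowski-type distance over membership, non-membership and
    hesitancy components; p = 1 is Hamming-like, p = 2 Euclidean-like."""
    if p < 1:
        raise ValueError("p must be >= 1")
    terms: List[float] = []
    for part in (0, 1):  # 0 = membership sets, 1 = non-membership sets
        xs, ys = _pad(a[part], b[part])
        for x, y in zip(xs, ys):
            terms += _cubed_terms(x, y)
    (hl, hu), (kl, ku) = _hesitancy_cubed(a), _hesitancy_cubed(b)
    terms += [abs(hl - kl), abs(hu - ku), abs((hu - hl) - (ku - kl))]
    return (sum(t ** p for t in terms) / len(terms)) ** (1 / p)


# Linguistic terms VH, H and VL as (membership intervals, non-membership intervals).
VH = ([(0.9, 0.95), (0.9, 0.99)], [(0.01, 0.1), (0.06, 0.15)])
H = ([(0.7, 0.8), (0.8, 0.9)], [(0.1, 0.2)])
VL = ([(0.1, 0.2), (0.3, 0.4)], [(0.7, 0.8), (0.75, 0.85)])
print(ivhffn_distance(VH, H), ivhffn_distance(VH, VL))
```

As expected, VH lies much closer to H than to VL. Changing the padding rule (for example, repeating the smallest interval instead) changes the numbers, so any such convention should be stated alongside the results.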
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Bulchand-Gidumal, J. Impact of artificial intelligence in travel, tourism, and hospitality. In Handbook of e-Tourism, pp. 1943–1962. Cham: Springer International Publishing, 2022.
- Obaid, A.J.; Bhushan, B.; Rajest, S.S., Eds. Advanced Applications of Generative AI and Natural Language Processing Models. IGI Global, 2023.
- Al-Amin, M.; Ali, M.S.; Salam, A.; Khan, A.; Ali, A.; Ullah, A.; …; Chowdhury, S.K. History of generative Artificial Intelligence (AI) chatbots: Past, present, and future development. arXiv preprint arXiv:2402.05122, 2024. Available online: https://arxiv.org/abs/2402.05122 (accessed on 1 January 2025).
- Yenduri, G.; Srivastava, G.; Maddikunta, P.K.R.; Jhaveri, R.H.; Wang, W.; Vasilakos, A.V.; Gadekallu, T.R. Generative pre-trained transformer: A comprehensive review on enabling technologies, potential applications, emerging challenges, and future directions. arXiv preprint 2023, arXiv:2305.10435.
- Saka, A.; Taiwo, R.; Saka, N.; Salami, B.A.; Ajayi, S.; Akande, K.; Kazemi, H. GPT models in construction industry: Opportunities, limitations, and a use case validation. Developments in the Built Environment 2023, 100300.
- Dwivedi, Y.K.; Pandey, N.; Currie, W.; Micu, A. Leveraging ChatGPT and other generative artificial intelligence (AI)-based applications in the hospitality and tourism industry: Practices, challenges and research agenda. International Journal of Contemporary Hospitality Management 2024, 36(1), 1–12.
- Chen, B.; Wu, Z.; Zhao, R. From fiction to fact: The growing role of generative AI in business and finance. Journal of Chinese Economic and Business Studies 2023, 21(4), 471–496.
- Ghaffari, S.; Yousefimehr, B.; Ghatee, M. Generative-AI in E-Commerce: Use-Cases and Implementations. In Proceedings of the 2024 20th CSI International Symposium on Artificial Intelligence and Signal Processing (AISP), February 2024; pp. 1–5, IEEE.
- Al Naqbi, H.; Bahroun, Z.; Ahmed, V. Enhancing Work Productivity through Generative Artificial Intelligence: A Comprehensive Literature Review. Sustainability 2024, 16(3), 1166.
- Statista. Chatbot market worldwide 2016–2025. Available online: https://www.statista.com/statistics/656596/worldwide-chatbot-market/ (accessed on 30 June 2024).
- Gartner. Gartner Says More Than 80% of Enterprises Will Have Used Generative AI APIs or Deployed Generative AI-Enabled Applications by 2026. Available online: https://www.gartner.com/en/newsroom/press-releases/2023-10-11-gartner-says-more-than-80-percent-of-enterprises-will-have-used-generative-ai-apis-or-deployed-generative-ai-enabled-applications-by-2026 (accessed on 30 June 2024).
- Wang, K.; Ying, Z.; Goswami, S.S.; Yin, Y.; Zhao, Y. Investigating the role of artificial intelligence technologies in the construction industry using a Delphi-ANP-TOPSIS hybrid MCDM concept under a fuzzy environment. Sustainability 2023, 15(15), 11848.
- Alshahrani, R.; Yenugula, M.; Algethami, H.; Alharbi, F.; Goswami, S.S.; Naveed, Q.N.; Zahmatkesh, S. Establishing the fuzzy integrated hybrid MCDM framework to identify the key barriers to implementing artificial intelligence-enabled sustainable cloud system in an IT industry. Expert Systems with Applications 2024, 238, 121732.
- Mishra, A.R.; Liu, P.; Rani, P. COPRAS method based on interval-valued hesitant Fermatean fuzzy sets and its application in selecting desalination technology. Applied Soft Computing 2022, 119, 108570.
- Chakrabortty, R.K.; Abdel-Basset, M.; Ali, A.M. A multi-criteria decision analysis model for selecting an optimum customer service chatbot under uncertainty. Decision Analytics Journal 2023, 6, 100168.
- Santa Barletta, V.; Caivano, D.; Colizzi, L.; Dimauro, G.; Piattini, M. Clinical-chatbot AHP evaluation based on “quality in use” of ISO/IEC 25010. International Journal of Medical Informatics 2023, 170, 104951.
- Singh, C.; Dash, M.K.; Sahu, R.; Singh, G. Evaluating Critical Success Factors for Acceptance of Digital Assistants for Online Shopping Using Grey–DEMATEL. International Journal of Human–Computer Interaction 2023, 1–15.
- Pandey, M.; Litoriya, R.; Pandey, P. Indicators of AI in Automation: An Evaluation Using Intuitionistic Fuzzy DEMATEL Method with Special Reference to Chat GPT. Wireless Personal Communications 2024, 134, 445–465. Available online: https://link.springer.com/article/10.1007/s11277-024-10917-7 (accessed on 30 June 2024).
- Pathak, A.; Bansal, V. Factors Influencing the Readiness for Artificial Intelligence Adoption in Indian Insurance Organizations. In Transfer, Diffusion and Adoption of Next-Generation Digital Technologies, S.K. Sharma; Y.K. Dwivedi; B. Metri; B. Lal; A. Elbanna, Eds.; IFIP Advances in Information and Communication Technology, vol 698, Springer, Cham, 2024, pp. 384–397. Available online: https://link.springer.com/chapter/10.1007/978-3-031-50192-0_5.
- Wiangkham, A.; Vongvit, R. Comparative Analysis of MCDM Methods for Prioritizing Influential Factors of Chatgpt Adoption in Higher Education. 2024. Available online: https://ssrn.com/abstract=5040810.
- Ojo, Y.; Davids, V.; Oni, O.; Odoemene, M.; Idowu-Collin, P.; Eyeregba, U. A multi-criteria approach for evaluating the use of AI for matching patients to optimal mental health treatment plans. Reading Time 2024, 05–09. Available online: https://worldscientificnews.com/wp-content/uploads/2024/04/WSN-1932-2024-201-222.pdf.
- Chatbot Arena. Available online: https://lmarena.ai (accessed on 1 January 2025).
- Artificial Analysis. Available online: https://artificialanalysis.ai/ (accessed on 1 January 2025).
- Parasuraman, A.; Zeithaml, V.A.; Berry, L.L. SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. Journal of Retailing 1988, 64(1), 12–40.
- ISO/IEC. Systems and software engineering — Systems and software Quality Requirements and Evaluation (SQuaRE) — Product quality model. International Organization for Standardization (ISO), Geneva, Switzerland, 2023. Available online: https://www.iso.org/standard/78176.html (accessed on 1 January 2025).
- Davis, F.D. Perceived usefulness, perceived ease of use, and user acceptance of information technology. MIS Quarterly 1989, 319–340.
- Verhoef, P.C.; Lemon, K.N.; Parasuraman, A.; Roggeveen, A.; Tsiros, M.; Schlesinger, L.A. Customer experience creation: Determinants, dynamics and management strategies. Journal of Retailing 2009, 85(1), 31–41.
- Venkatesh, V.; Morris, M.G.; Davis, G.B.; Davis, F.D. User Acceptance of Information Technology: Toward a Unified View. MIS Quarterly 2003, 27(3), 425–478.
- Tornatzky, L.G.; Fleischer, M. The Processes of Technological Innovation. Lexington Books, 1990.
- Yusof, M.M.; Kuljis, J.; Papazafeiropoulou, A.; Stergioulas, L.K. An Evaluation Framework for Health Information Systems: Human, Organization and Technology-Fit Factors (HOT-Fit). International Journal of Medical Informatics 2008, 77(6), 386–398.
- Pan, C.; Banerjee, J.S.; De, D.; Sarigiannidis, P.; Chakraborty, A.; Bhattacharyya, S. ChatGPT: A OpenAI platform for society 5.0. In Proceedings of the Doctoral Symposium on Human Centered Computing, Singapore, February 2023; pp. 384–397, Springer Nature Singapore.
- Stratton, J. An Introduction to Microsoft Copilot. In Copilot for Microsoft 365: Harness the Power of Generative AI in the Microsoft Apps You Use Every Day, pp. 19–35. Berkeley, CA: Apress, 2024.
- Saeidnia, H.R. Welcome to the Gemini era: Google DeepMind and the information industry. Library Hi Tech News 2023, ahead-of-print.
- Priyanshu, A.; Maurya, Y.; Hong, Z. AI Governance and Accountability: An Analysis of Anthropic’s Claude. arXiv preprint 2024, arXiv:2407.01557.
- Deike, M. Evaluating the performance of ChatGPT and Perplexity AI in Business Reference. Journal of Business & Finance Librarianship 2024, 29(2), 125–154.
- Hwang, C.L.; Yoon, K. Multiple Attribute Decision Making: Methods and Applications A State-of-the-Art Survey. Springer, 1981, vol. 186.
- Zadeh, L.A. The concept of a linguistic variable and its application to approximate reasoning—I. Information Sciences 1975, 8(3), 199–249.
- Torra, V. Hesitant fuzzy sets. International Journal of Intelligent Systems 2010, 25(6), 529–539.
- Senapati, T.; Yager, R.R. Fermatean fuzzy sets. Journal of Ambient Intelligence and Humanized Computing 2020, 11, 663–674.

| Reference | Methodology | Application area | Alternatives | Criteria (number) | Ranking validation |
|---|---|---|---|---|---|
| Chakrabortty et al. 2023 [15] | SVN AHP-CoCoSo | Telecommunication business | Eight chatbots for customer service | Security, Speed, Responsiveness, Satisfaction, Reliability, Assurance, Tangibility, Engagement, Empathy (9) | SVN MABAC, Pythagorean fuzzy CoCoSo, Interval-valued neutrosophic TOPSIS |
| Santa Barletta et al. 2023 [16] | AHP | Medical care | Two clinical chatbots | Effectiveness, Efficiency, Satisfaction, Freedom from risk, Context coverage in three functional dimensions (5 criteria groups) | Super Decisions software |
| Singh et al. 2023 [17] | Grey-DEMATEL | Online retail | Only criteria weights | Social influence, Enjoyment, Performance, Ease of use, Usefulness, Social presence, Anxiety, Trust, Rapport, Privacy risk, Social isolation, Sense of control, Compatibility (12) | Sensitivity analysis in three scenarios |
| Pandey et al. 2024 [18] | Intuitionistic fuzzy DEMATEL | GAI chatbot challenges | Only criteria weights | Hallucination*, Bias, Language learning, Real-world harm*, Proprietary LLMs*, AI problems*, Disruption, Jobs at risk, Educational system problems, Training data amount*, Unknown threats, Ethical and legal implications (12) | MAE of issues using classical DEMATEL and fuzzy DEMATEL |
| Pathak and Bansal 2024 [19] | Rough SWARA | Insurance | Only criteria weights | Technology (7), Organization (6), Environment (3), Individual (4) criteria groups | Sensitivity analysis |
| Wiangkham and Vongvit 2024 [20] | WSM, ANN with SHAP and LIME | Higher education | Only criteria weights | Usage (4), Agent (3), Technical (4), Trust (3) related criteria groups | WMAPE for ANN models |
| Ojo et al. 2024 [21] | Triangular fuzzy TOPSIS | Medical care | Six AI alternatives | Privacy protection, Treatment effectiveness, Explainability, Costs, Regulatory compliance, Ethical implications (6) | Comparative analysis |

| Feature | ChatGPT | Copilot | Gemini | Claude | Perplexity |
|---|---|---|---|---|---|
| Foundation LLM(s) | GPT-o1, GPT-4o | GPT-4o | Gemini 2.0 Flash, Gemini 1.5 Pro | Claude 3.5 Sonnet, Claude 3.5 Haiku | Sonar Small, Sonar Large |
| Features | Web browsing, code execution, image generation, custom GPTs for tailored interactions | Coding assistance, task automation, integration with MS products | Multimodal data processing, integration with Google services, advanced reasoning capabilities | Safety, ethical considerations, handling extensive context for in-depth analyses | Information retrieval, real-time web search capabilities, user-friendly interfaces |
| Advantages | Versatility across tasks, including content creation, coding assistance and data analysis | Deep integration with MS’s ecosystem; excels in coding support and task automation within MS applications | Large-context reasoning, multimodal data processing and integration with Google services | Management of extensive context windows, suited to processing large documents and complex conversations | Quick information retrieval and concise answers; functions as an AI-powered search assistant |
| Context length (tokens) | 128K | 128K | 1M, 2M | 200K | 131K |
| Integration | Available as an API, browser and mobile app | Integrated into MS products (web, Windows, mobile) and code editors (Visual Studio, GitHub) | Integrated into Google Workspace and other Google services | Available via API and standalone applications | Accessible through web interface and browser extensions |
| Price | Free tier available; Plus and Pro subscriptions at $20/month and $200/month for priority access and additional features | Integrated into MS’s ecosystem; pricing varies with the specific application and subscription model; Pro at $20/month | Free and premium versions; advanced plan priced at $19/month | Free tier with limited daily messages; Pro plan at $20/month offering enhanced capabilities | Free access with basic functionalities; Pro version at $20/month for advanced features |
| Real-time access | Yes, can browse the internet to provide current information | Yes, accesses real-time data from the web | Yes, designed for real-time interactions and data retrieval | Limited, primarily relies on training data with some real-time capabilities in advanced versions | Yes, provides up-to-date information from the web |

| Alternative / Criterion | C1 | C2 | C3 | C4 |
|---|---|---|---|---|
| A1 | VH | H | VH | H |
| A2 | H | H | VH | H |
| A3 | H | H | M | L |
| A4 | M | M | M | L |
| A5 | M | M | L | H |
| Criterion type | | | | |

| Linguistic term | IVHFFN: {membership intervals}, {non-membership intervals} |
|---|---|
| Very Low (VL) | {(0.1, 0.2), (0.3, 0.4)}, {(0.7, 0.8), (0.75, 0.85)} |
| Low (L) | {(0.3, 0.4), (0.5, 0.6)}, {(0.5, 0.6), (0.55, 0.65)} |
| Medium (M) | {(0.5, 0.6), (0.7, 0.8), (0.75, 0.9)}, {(0.3, 0.4), (0.35, 0.45)} |
| High (H) | {(0.7, 0.8), (0.8, 0.9)}, {(0.1, 0.2)} |
| Very High (VH) | {(0.9, 0.95), (0.9, 0.99)}, {(0.01, 0.1), (0.06, 0.15)} |

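The linguistic scale above can be checked mechanically before it is used. A minimal sketch follows, assuming the admissibility test for an IVHFFN is that bounds are ordered within [0, 1] and that the largest upper membership and largest upper non-membership satisfy (μ^U)^3 + (ν^U)^3 ≤ 1, the Fermatean counterpart of the intuitionistic and Pythagorean conditions. The check also shows why a Fermatean scale is needed here: for VH, 0.99^2 + 0.15^2 ≈ 1.0026 violates the Pythagorean condition, while 0.99^3 + 0.15^3 ≈ 0.974 satisfies the Fermatean one. The names `SCALE`, `RATINGS`, `is_fermatean` and `to_ivhffn_matrix` are hypothetical; the values are transcribed from the two tables above.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]
IVHFFN = Tuple[List[Interval], List[Interval]]  # (membership intervals, non-membership intervals)

# Linguistic scale transcribed from the table above.
SCALE: Dict[str, IVHFFN] = {
    "VL": ([(0.1, 0.2), (0.3, 0.4)], [(0.7, 0.8), (0.75, 0.85)]),
    "L": ([(0.3, 0.4), (0.5, 0.6)], [(0.5, 0.6), (0.55, 0.65)]),
    "M": ([(0.5, 0.6), (0.7, 0.8), (0.75, 0.9)], [(0.3, 0.4), (0.35, 0.45)]),
    "H": ([(0.7, 0.8), (0.8, 0.9)], [(0.1, 0.2)]),
    "VH": ([(0.9, 0.95), (0.9, 0.99)], [(0.01, 0.1), (0.06, 0.15)]),
}

# Linguistic decision matrix (alternatives A1-A5 x criteria C1-C4).
RATINGS: Dict[str, List[str]] = {
    "A1": ["VH", "H", "VH", "H"],
    "A2": ["H", "H", "VH", "H"],
    "A3": ["H", "H", "M", "L"],
    "A4": ["M", "M", "M", "L"],
    "A5": ["M", "M", "L", "H"],
}


def is_fermatean(v: IVHFFN) -> bool:
    """Assumed admissibility test: ordered bounds in [0, 1] and
    (max upper membership)^3 + (max upper non-membership)^3 <= 1."""
    mu, nu = v
    ordered = all(0.0 <= lo <= hi <= 1.0 for lo, hi in list(mu) + list(nu))
    return ordered and max(hi for _, hi in mu) ** 3 + max(hi for _, hi in nu) ** 3 <= 1.0


def to_ivhffn_matrix(
    ratings: Dict[str, Sequence[str]], scale: Dict[str, IVHFFN]
) -> Dict[str, List[IVHFFN]]:
    """Replace every linguistic rating with its IVHFFN from the scale."""
    return {alt: [scale[term] for term in row] for alt, row in ratings.items()}


matrix = to_ivhffn_matrix(RATINGS, SCALE)
print(all(is_fermatean(v) for v in SCALE.values()))
print(is_fermatean(([(0.8, 0.9)], [(0.5, 0.6)])), is_fermatean(([(0.9, 0.95)], [(0.5, 0.6)])))
```

The second print contrasts a pair that is Fermatean but not Pythagorean (0.9 and 0.6) with one that fails both conditions (0.95 and 0.6).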
| Method | Measure | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|---|
| IVHFFNs TOPSIS | Score | 0.45 | 0.39 | 0.34 | 0.30 | 0.30 |
| | Rank | 1 | 2 | 3 | 4 | 4 |

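In standard TOPSIS (Hwang and Yoon), each alternative is scored by its relative closeness CC_i = D_i^- / (D_i^+ + D_i^-), where D_i^+ and D_i^- are its distances to the positive and negative ideal solutions. Assuming the scores above are such closeness values (the excerpt does not state this explicitly), the sketch below shows the final step and a "1224" competition ranking that lets A4 and A5 share rank 4. Rounding to two decimals before comparing is an assumption chosen to match the precision of the table; `closeness` and `competition_rank` are hypothetical names.

```python
from typing import List, Sequence


def closeness(d_pos: Sequence[float], d_neg: Sequence[float]) -> List[float]:
    """TOPSIS relative closeness CC_i = D_i^- / (D_i^+ + D_i^-), where D_i^+ and
    D_i^- are distances to the positive and negative ideal solutions."""
    out = []
    for dp, dn in zip(d_pos, d_neg):
        if dp + dn == 0:
            raise ValueError("both distances are zero: the two ideal solutions coincide")
        out.append(dn / (dp + dn))
    return out


def competition_rank(scores: Sequence[float], ndigits: int = 2) -> List[int]:
    """'1224' ranking: rank 1 = largest score; scores equal after rounding to
    `ndigits` decimals share a rank, and the following rank is skipped."""
    r = [round(s, ndigits) for s in scores]
    return [1 + sum(o > s for o in r) for s in r]


reported = [0.45, 0.39, 0.34, 0.30, 0.30]  # IVHFFNs TOPSIS scores for A1..A5
print(competition_rank(reported))
print(closeness([0.1, 0.3], [0.3, 0.1]))  # hypothetical distances
```

A closeness of 1 would mean the alternative coincides with the positive ideal, and 0 that it coincides with the negative ideal.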
| Alternative | WSM score | WSM rank | TFNs WSM score | TFNs WSM rank | EDAS score | EDAS rank | TOPSIS score | TOPSIS rank |
|---|---|---|---|---|---|---|---|---|
| A1 | 0.40 | 1 | 0.19 | 1 | 0.67 | 1 | 0.65 | 1 |
| A2 | 0.36 | 2 | 0.17 | 3 | 0.58 | 3 | 0.60 | 3 |
| A3 | 0.36 | 2 | 0.18 | 2 | 0.64 | 2 | 0.54 | 2 |
| A4 | 0.20 | 4 | 0.10 | 4 | 0.42 | 5 | 0.22 | 5 |
| A5 | 0.16 | 5 | 0.08 | 5 | 0.50 | 4 | 0.0 | 4 |
| Benchmark | 0.95 | | 0.85 | | 0.95 | | | |

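Agreement between two rankings of the same alternatives is commonly summarized with Spearman's rank correlation. The excerpt does not define how the Benchmark row was computed, so the sketch below is not claimed to reproduce it. It only shows the classic shortcut formula ρ = 1 − 6 Σ d_i² / (n(n² − 1)) applied to the IVHFFNs TOPSIS ranks and the WSM ranks reported above. With tied ranks the shortcut is only approximate; the tie-corrected version is Pearson's r computed on average ranks. `spearman_rho` is a hypothetical name.

```python
from typing import Sequence


def spearman_rho(r1: Sequence[float], r2: Sequence[float]) -> float:
    """Classic shortcut rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)).
    Exact for rankings without ties; approximate when ties are present."""
    n = len(r1)
    if n != len(r2) or n < 2:
        raise ValueError("rankings must have equal length >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))


proposed = [1, 2, 3, 4, 4]  # IVHFFNs TOPSIS ranks for A1..A5
wsm = [1, 2, 2, 4, 5]       # WSM ranks for A1..A5
print(spearman_rho(proposed, wsm))
```

Here the squared rank differences sum to 2, so ρ = 1 − 12/120 = 0.9. Identical rankings give 1 and fully reversed rankings give −1.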
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
