Submitted: 11 April 2026
Posted: 14 April 2026
Abstract
Keywords:
1. Introduction
2. Literature Review
3. Materials and Methods
- a set of business-relevant decision criteria,
- criterion weights established by organizational experts,
- prompt evaluation by a classification model,
- the SAW multicriteria decision-support method,
- routing of the query to either a less expensive or a more capable model.
- C1—required substantive accuracy (describes how high the correctness and precision of the response must be for the outcome to be business-useful; the greater the required accuracy, the greater the need to use a more advanced model),
- C2—risk of the business consequences of error (describes the potential effects of generating an incomplete, misleading, or incorrect response; an error in a draft marketing post has different significance than an error in a compliance analysis, HR policy, communication with a strategic client, or the interpretation of a document),
- C3—required depth of reasoning (describes whether the task requires simple information processing or multi-step reasoning, synthesis, and logical analysis; the greater the depth of reasoning required, the greater the likelihood that the organization will prefer a more powerful model),
- C4—sensitivity to processing cost (describes how important cost savings are from the company’s perspective for a given type of query; not every task requires maximizing quality at any cost. In many large-scale processes, unit cost is the priority),
- C5—task sensitivity to response time (describes how important it is to obtain the result quickly; in operational, contact-intensive, or high-volume tasks, speed may be just as important as quality),
- C6—required standardization and compliance of the response (describes whether the response must strictly conform to the adopted style, structure, company policy, or communication standard; in organizations, a large part of AI’s value derives not from creativity, but from repeatability, consistency, and the scalability of communication),
- C7—required creativity/openness of generation (describes the extent to which the task requires a creative, non-standard, or exploratory approach).
- a vector of responses originating exclusively from the less expensive model (always cheap strategy),
- a vector of responses originating exclusively from the more expensive model (always strong strategy),
- a vector of responses resulting from prompt routing using the multicriteria model (SAW-based routing).
1. Sufficiency Rate (SR): the basic effectiveness indicator of the compared models.
2. Average Cost per Prompt (ACP): the average cost of handling a single query, derived from the total strategy cost.
3. Cost per Sufficient Response (CSR): how much the organization pays on average to obtain one sufficient response. This is a key indicator when comparing the analyzed models.
4. Incremental Cost of Sufficiency Gain (ICSG): the incremental cost of improving sufficiency (e.g., relative to cheap-only).
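The four indicators can be computed from three aggregates per strategy: the number of prompts, the number of sufficient responses, and the total cost. The following is a minimal sketch rather than the authors' implementation. The `StrategyResult` container and the function names are hypothetical, and the ICSG formula (extra cost divided by extra sufficient responses relative to the cheap-only baseline) is an assumption, chosen because it is consistent with the values reported in the results tables.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StrategyResult:
    """Aggregate outcome of one strategy over a prompt set (hypothetical container)."""
    name: str
    n_prompts: int
    n_sufficient: int
    total_cost_usd: float


def sufficiency_rate(r: StrategyResult) -> float:
    """SR: share of prompts that received a sufficient response."""
    return r.n_sufficient / r.n_prompts


def average_cost_per_prompt(r: StrategyResult) -> float:
    """ACP: total strategy cost divided by the number of prompts."""
    return r.total_cost_usd / r.n_prompts


def cost_per_sufficient_response(r: StrategyResult) -> Optional[float]:
    """CSR: total cost divided by the number of sufficient responses."""
    if r.n_sufficient == 0:
        return None  # undefined when nothing was sufficient
    return r.total_cost_usd / r.n_sufficient


def incremental_cost_of_sufficiency_gain(
    r: StrategyResult, baseline: StrategyResult
) -> Optional[float]:
    """ICSG (assumed form): extra cost per extra sufficient response vs. baseline."""
    gain = r.n_sufficient - baseline.n_sufficient
    if gain <= 0:
        return None  # no sufficiency gain over the baseline
    return (r.total_cost_usd - baseline.total_cost_usd) / gain


# Aggregates taken from the results table (100 prompts per strategy).
cheap = StrategyResult("Cheap only", 100, 50, 0.80)
strong = StrategyResult("Strong only", 100, 85, 2.00)
routing = StrategyResult("Routing", 100, 85, 1.52)

for r in (cheap, strong, routing):
    print(
        r.name,
        sufficiency_rate(r),
        average_cost_per_prompt(r),
        cost_per_sufficient_response(r),
        incremental_cost_of_sufficiency_gain(r, cheap),
    )
```

Under these definitions, the aggregates from the results table yield values that round to the reported ACP, CSR, and ICSG figures. The cheap-only strategy has no ICSG because it is its own baseline.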
4. Results
- C1—required substantive accuracy,
- C2—risk of the business consequences of error,
- C3—required depth of reasoning,
- C4—sensitivity to processing cost,
- C5—task sensitivity to response time,
- C6—required standardization and compliance of the response,
- C7—required creativity/openness of generation.
- a vector of responses originating exclusively from the less expensive model (always cheap strategy),
- a vector of responses originating exclusively from the more expensive model (always strong strategy),
- a vector of responses resulting from prompt routing using the multicriteria model (SAW-based routing).
5. Conclusions
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A. System Prompt for Prompt Evaluation
- Do not inflate scores just because the task sounds professional.
- Typical operational, communicative, and template-based tasks should NOT automatically receive high C1/C2/C3 scores.
- High standardization (C6) → cheaper model.
- High cost sensitivity (C4) and time sensitivity (C5) → cheaper model.
- Assess the NEED for escalation, not the general “importance” of the task.
- Simple summaries, emails, lists, messages → low C1/C2/C3, high C4/C5/C6
- Legal, strategic, high-risk tasks → higher C1/C2/C3
- Do not assign multiple criteria a score of 5 without a clear justification.
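The rules above imply some simple checks that can be applied to the classifier's output before it is passed to the SAW step. The sketch below is only an illustration under stated assumptions: the classifier is assumed to return a JSON object with integer ratings for C1 to C7 on a 1 to 5 scale, and the function name `parse_ratings` and the `max_fives` threshold are hypothetical choices, not part of the system prompt.

```python
import json
from typing import Dict, List, Tuple

CRITERIA = ("C1", "C2", "C3", "C4", "C5", "C6", "C7")


def parse_ratings(raw: str, max_fives: int = 2) -> Tuple[Dict[str, int], List[str]]:
    """Parse and sanity-check one rating vector (assumed JSON format).

    Returns the ratings plus a list of warnings for human review.
    Raises ValueError if a rating is missing or outside the 1-5 scale.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with keys C1..C7")

    ratings: Dict[str, int] = {}
    for criterion in CRITERIA:
        value = data.get(criterion)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{criterion}: missing or non-integer rating")
        if not 1 <= value <= 5:
            raise ValueError(f"{criterion}: rating {value} outside 1-5")
        ratings[criterion] = value

    warnings: List[str] = []
    fives = [c for c in CRITERIA if ratings[c] == 5]
    if len(fives) > max_fives:
        # Mirrors the rule against assigning many 5s without justification.
        warnings.append("many criteria rated 5: " + ", ".join(fives))
    return ratings, warnings


ratings, warnings = parse_ratings(
    '{"C1": 4, "C2": 3, "C3": 2, "C4": 1, "C5": 4, "C6": 3, "C7": 1}'
)
print(ratings, warnings)
```

A warning does not reject the rating vector; it only flags it, since the prompt allows several scores of 5 when a clear justification exists.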
| Criterion | Weight | Value [%] |
|---|---|---|
| C1 Accuracy | 0.2591 | 25.9078 |
| C4 Cost_Sensitivity | 0.2086 | 20.8578 |
| C5 Time_Sensitivity | 0.1894 | 18.9366 |
| C2 Business_Risk | 0.1337 | 13.3739 |
| C6 Standardization | 0.0922 | 9.2207 |
| C3 Reasoning_Depth | 0.0874 | 8.7351 |
| C7 Creativity | 0.0297 | 2.9680 |
| Prompt No. | Prompt Content |
|---|---|
| 1 | Read the following excerpt from a quarterly report and prepare a 5-point summary for the Chief Operating Officer. Focus exclusively on the key conclusions regarding sales, costs, and profitability. Do not exceed 150 words. “In the second quarter of 2025, the company’s revenues amounted to PLN 18.4 million and were 6.8% higher than in the corresponding period of the previous year. Sales growth was recorded primarily in the subscription services segment, whose share in the revenue structure increased from 34% to 41%. At the same time, operating costs increased by 11.9%, mainly as a result of higher wages, energy costs, and expenditures on customer acquisition campaigns. Gross margin declined from 29.7% to 26.1%, while net profitability fell from 11.4% to 8.9%. The Management Board indicated that the current level of marketing costs is temporary in nature and should translate into sales growth over the next two quarters.” |
| 2 | Write a professional email message to a strategic client explaining a two-week delay in the implementation of the system. The message should be polite, specific, and reassuring. It should include: the reason for the delay, a corrective action plan, and a proposal for a brief status meeting. Situation: the implementation of a reporting platform for a client from the logistics sector was scheduled to be completed by 12 May, but during testing a problem was detected with the integration of data from the warehouse management system. The technical team has prepared a fix, and the new plan assumes completion of the implementation by 26 May. The client is important to the company and expects regular communication. |
| 3 | Propose 5 ideas for the main slogan of a campaign promoting a new advisory service for small and medium-sized enterprises. The slogans should be modern, professional, and emphasize time savings as well as better business decisions. Then, for each slogan, add one short sentence explaining its meaning. Service description: the company offers subscription-based online business advisory services, including sales data analysis, operational consultations, and recommendations for process improvements in small and medium-sized enterprises. |
| Prompt No. | C1 | C2 | C3 | C4 | C5 | C6 | C7 |
|---|---|---|---|---|---|---|---|
| 1 | 5 | 4 | 3 | 2 | 4 | 3 | 1 |
| 2 | 5 | 5 | 4 | 2 | 4 | 5 | 2 |
| 3 | 4 | 4 | 4 | 5 | 3 | 3 | 2 |
| … | … | … | … | … | … | … | … |
| 99 | 5 | 5 | 5 | 3 | 4 | 4 | 2 |
| 100 | 4 | 3 | 2 | 1 | 4 | 3 | 1 |
| Prompt id | Cheap score | Strong score | Score gap | Selected model |
|---|---|---|---|---|
| 1 | 2.389 | 3.611 | 1.222 | strong |
| 2 | 2.322 | 3.678 | 1.356 | strong |
| 3 | 2.966 | 3.034 | 0.068 | strong |
| 4 | 3.883 | 2.117 | -1.766 | cheap |
| 5 | 3.572 | 2.428 | -1.145 | cheap |
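The cheap and strong scores in this table can be approximated with a simple additive weighting over the seven criterion ratings. The sketch below is an illustration, not the authors' code. It assumes that high ratings on C4, C5, and C6 favor the cheaper model (as the system prompt in Appendix A suggests), that the remaining criteria favor the stronger model, and that a rating x on the 1 to 5 scale counts as 6 − x for the opposite model. This convention is inferred rather than stated in the source; with the rounded weights it agrees with the tabulated scores for prompts 1 to 3 to within a few thousandths. Routing a zero gap to the strong model is also an assumption.

```python
from typing import Dict, Tuple

# Expert weights from the criterion-weight table (rounded to four decimals).
WEIGHTS: Dict[str, float] = {
    "C1": 0.2591, "C2": 0.1337, "C3": 0.0874, "C4": 0.2086,
    "C5": 0.1894, "C6": 0.0922, "C7": 0.0297,
}

# ASSUMPTION: high ratings on these criteria argue for the cheaper model.
ECONOMY_CRITERIA = {"C4", "C5", "C6"}
SCALE_MIN, SCALE_MAX = 1, 5


def saw_scores(ratings: Dict[str, int]) -> Tuple[float, float]:
    """Return (cheap_score, strong_score) as weighted sums of the ratings."""
    cheap = 0.0
    strong = 0.0
    for criterion, weight in WEIGHTS.items():
        x = ratings[criterion]
        if not SCALE_MIN <= x <= SCALE_MAX:
            raise ValueError(f"{criterion}: rating {x} outside the scale")
        mirrored = SCALE_MIN + SCALE_MAX - x  # 6 - x on a 1-5 scale
        if criterion in ECONOMY_CRITERIA:
            cheap += weight * x
            strong += weight * mirrored
        else:
            strong += weight * x
            cheap += weight * mirrored
    return cheap, strong


def route(ratings: Dict[str, int]) -> str:
    """Pick the model with the higher score (ties go to strong, by assumption)."""
    cheap, strong = saw_scores(ratings)
    return "strong" if strong - cheap >= 0 else "cheap"


# Ratings for prompt 1 from the prompt-rating table.
prompt_1 = {"C1": 5, "C2": 4, "C3": 3, "C4": 2, "C5": 4, "C6": 3, "C7": 1}
cheap_score, strong_score = saw_scores(prompt_1)
print(round(cheap_score, 3), round(strong_score, 3), route(prompt_1))
```

Because the two scores mirror each other, they always sum to roughly 6 for a weight vector that sums to 1, which matches the pattern visible in every row of the table.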
| Prompt id | Model cheap-only | Model strong-only | Model routing (selected) |
|---|---|---|---|
| 1 | sufficient | sufficient | sufficient (strong) |
| 2 | non-sufficient | sufficient | sufficient (strong) |
| 3 | sufficient | sufficient | sufficient (strong) |
| 4 | sufficient | sufficient | sufficient (cheap) |
| 5 | non-sufficient | non-sufficient | non-sufficient (cheap) |
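In the rows above, the routing outcome for each prompt is simply the outcome of whichever model the router selected. A minimal sketch of that bookkeeping follows, using only the five rows shown; the function and variable names are hypothetical.

```python
# Rows copied from the excerpt above:
# (prompt id, cheap-only label, strong-only label, model selected by routing)
ROWS = [
    (1, "sufficient", "sufficient", "strong"),
    (2, "non-sufficient", "sufficient", "strong"),
    (3, "sufficient", "sufficient", "strong"),
    (4, "sufficient", "sufficient", "cheap"),
    (5, "non-sufficient", "non-sufficient", "cheap"),
]


def routing_label(cheap_label: str, strong_label: str, selected: str) -> str:
    """Return the sufficiency label of the response the router actually used."""
    if selected == "strong":
        return strong_label
    if selected == "cheap":
        return cheap_label
    raise ValueError(f"unknown model: {selected}")


# Assemble the routing response vector and count sufficient responses.
routing_vector = [routing_label(c, s, m) for _, c, s, m in ROWS]
n_sufficient = sum(label == "sufficient" for label in routing_vector)
print(routing_vector, n_sufficient)
```

Prompt 2 shows where routing helps (escalation turns a non-sufficient answer into a sufficient one), while prompt 5 shows a case that neither model handles sufficiently.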
| Model | Number of sufficient responses | Sufficiency rate (SR) | Total cost (USD) |
|---|---|---|---|
| Cheap only | 50 | 50% | 0.8000 |
| Strong only | 85 | 85% | 2.0000 |
| Routing | 85 | 85% | 1.5200 |
| Model | Average cost per prompt (ACP, USD) | Cost per sufficient response (CSR, USD) | Incremental cost of sufficiency gain vs. cheap only (ICSG, USD) |
|---|---|---|---|
| Cheap only | 0.0080 | 0.0160 | - |
| Strong only | 0.0200 | 0.0235 | 0.0343 |
| Routing | 0.0152 | 0.0179 | 0.0206 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).