Submitted:
28 September 2025
Posted:
29 September 2025
Abstract
Keywords:
1. Introduction
2. Business Process Modeling and the Emergence of LLM-Based Tools
- BA-Copilot is a professional assistant that generates editable BPMN diagrams using the BPMN.io toolkit. It generates process models directly in .bpmn format, focusing on structured outputs and practical usability.
- BPMN-Chatbot, an academic prototype from the University of Klagenfurt, explores the potential of natural language interfaces in process modeling, leveraging LLM capabilities.
- Camunda BPMN Copilot is an open-source solution integrated into the Camunda Modeler, enabling prompt-based generation of BPMN models within a widely used modeling environment.
- Nala2BPMN, developed by Bonitasoft, automates the transformation of natural language inputs into BPMN diagrams, focusing on accelerating the design phase.
- ProMoAI is a lightweight research prototype built using Streamlit, designed to demonstrate prompt-to-BPMN generation with minimal user setup.
3. Research Methodology
4. Evaluation Framework
4.1. Quality Measures Selection
4.2. The 3C Evaluation System
Correctness Criteria
- No syntactic/behavioral violations (High): The model must not contain deadlocks, livelocks, or infinite loops.
- No structural errors (High): All elements must be connected and part of a valid flow; there should be no isolated nodes.
- No semantic violations (Moderate): The model must accurately represent the described process logic. This means that the sequence and type of elements in the BPMN model should reflect the intended process behavior.
- No redundant flows/elements (Moderate): Avoid unnecessary components that do not serve a functional purpose.
- All gateway splits must be followed by matching joins (Low): Logical consistency in control flow must be maintained.
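The structural criterion above (no isolated nodes) lends itself to an automated pre-check on the exported .bpmn file. The following is a minimal sketch, not the evaluation procedure described in this paper. It assumes the model is available as BPMN 2.0 XML and treats any event, task, or gateway without an attached sequence flow as isolated. The sample model and the helper name `isolated_nodes` are hypothetical.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical minimal model: the task "archive" has no sequence flow attached.
SAMPLE_BPMN = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="start" name="Application received"/>
    <task id="check" name="Check documents"/>
    <task id="archive" name="Archive application"/>
    <endEvent id="end" name="Licence renewed"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="check"/>
    <sequenceFlow id="f2" sourceRef="check" targetRef="end"/>
  </process>
</definitions>"""

# Simplifying assumption: flow nodes are recognised by the suffix of their tag name.
FLOW_NODE_SUFFIXES = ("Event", "task", "Task", "Gateway", "subProcess", "callActivity")


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def isolated_nodes(bpmn_xml: str) -> list[str]:
    """Return ids of flow nodes that are neither source nor target of any sequence flow."""
    root = ET.fromstring(bpmn_xml)
    connected: set[str] = set()
    nodes: list[str] = []
    for element in root.iter():
        kind = local_name(element.tag)
        if kind == "sequenceFlow":
            connected.add(element.get("sourceRef"))
            connected.add(element.get("targetRef"))
        elif kind.endswith(FLOW_NODE_SUFFIXES) and element.get("id"):
            nodes.append(element.get("id"))
    return [node_id for node_id in nodes if node_id not in connected]


print(isolated_nodes(SAMPLE_BPMN))
```

A check of this kind only covers the "isolated nodes" part of the criterion; behavioral problems such as deadlocks or livelocks require a soundness analysis that this sketch does not attempt.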
Completeness Criteria
- All control flow elements from the prompt are modeled (High).
- The process includes start and end events (High).
- All decision outcomes are modeled (Moderate): Each decision point must have all expected branches (e.g., Yes/No).
- Exception handling is modeled where relevant (Moderate).
- All gateway splits must be followed by matching joins (Low): Logical consistency in control flow must be maintained.
Clarity Criteria
- No misleading or incorrect labels (High): Activity, event, and gateway labels must clearly and accurately reflect their function.
- No unlabeled elements (High): All elements, especially tasks, gateways, and events, must be named to clarify their role in the process.
- No diagram layout issues (Moderate): Models should use consistent spacing and avoid zigzag flows to ensure visual readability.
- No overlapping flows (Moderate): Sequence flows should not cross unnecessarily.
- All outgoing arcs of (X)OR-splits are labeled (Low): Paths from decision gateways must be labeled with conditions to indicate logic.
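Two of the Clarity criteria above (no unlabeled elements, and labeled outgoing arcs of (X)OR-splits) can be approximated mechanically. The sketch below is illustrative only and is not the scoring method used in this study. It assumes BPMN 2.0 XML input, treats a missing or blank `name` attribute as "unlabeled", and considers a gateway a split when it has more than one outgoing sequence flow. The sample model and the function name `clarity_findings` are hypothetical.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"

# Hypothetical model: task "request" and flow "f3" are missing labels.
SAMPLE_BPMN = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="start" name="Application received"/>
    <exclusiveGateway id="gw" name="Documents complete?"/>
    <task id="issue" name="Issue licence"/>
    <task id="request"/>
    <endEvent id="end" name="Application handled"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="gw"/>
    <sequenceFlow id="f2" name="Yes" sourceRef="gw" targetRef="issue"/>
    <sequenceFlow id="f3" sourceRef="gw" targetRef="request"/>
    <sequenceFlow id="f4" sourceRef="issue" targetRef="end"/>
    <sequenceFlow id="f5" sourceRef="request" targetRef="end"/>
  </process>
</definitions>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def has_label(element: ET.Element) -> bool:
    """An element counts as labeled when its 'name' attribute is non-blank."""
    return bool((element.get("name") or "").strip())


def clarity_findings(bpmn_xml: str) -> tuple[list[str], list[str]]:
    """Return (ids of unlabeled nodes, ids of unlabeled arcs leaving (X)OR-splits)."""
    root = ET.fromstring(bpmn_xml)
    nodes: dict[str, ET.Element] = {}
    flows: list[ET.Element] = []
    for element in root.iter():
        kind = local_name(element.tag)
        if kind == "sequenceFlow":
            flows.append(element)
        elif kind.endswith(("Event", "task", "Task", "Gateway")):
            nodes[element.get("id")] = element

    unlabeled_nodes = [nid for nid, el in nodes.items() if not has_label(el)]

    unlabeled_arcs: list[str] = []
    for nid, el in nodes.items():
        if local_name(el.tag) not in ("exclusiveGateway", "inclusiveGateway"):
            continue
        outgoing = [f for f in flows if f.get("sourceRef") == nid]
        if len(outgoing) > 1:  # more than one outgoing flow: the gateway is a split
            unlabeled_arcs += [f.get("id") for f in outgoing if not has_label(f)]
    return unlabeled_nodes, unlabeled_arcs


print(clarity_findings(SAMPLE_BPMN))
```

Whether a label is misleading, or whether the layout is readable, still requires human judgment; only the presence of labels is checked here.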
5. Design of Experiments
5.1. Selection and Extension of the Process Scenario
- Event-driven conditional waiting: A conditional waiting mechanism was added, described as: “The process waits until one of two events occurs: either the applicant submits the missing documents, or 15 days pass without submission”.
- Recurring automated reminders: A recurring non-interrupting reminder mechanism was incorporated, stated as: “As long as the license remains unissued, an automated reminder is sent to the responsible official every 5 working days”.
5.2. Prompt Variations
- Prompt 1 (Step-by-step instructions): A clearly enumerated list that explicitly outlines each step of the process, providing a highly structured input format.
- Prompt 2 (Paragraph description): An unstructured narrative paragraph describing the overall process flow without explicit enumeration, requiring the tools to implicitly infer and reconstruct the underlying structure.
- Prompt 3 (Noisy paragraph): Similar to Prompt 2, but augmented with additional, irrelevant contextual information (i.e., noise). This variant specifically evaluates the tools’ ability to filter out non-essential content and accurately represent only the relevant process logic in the generated BPMN model.
5.3. Model Generation
5.4. Evaluation Procedure
- Criteria Organization: The sheet is divided into the three selected quality dimensions: Clarity, Correctness, and Completeness. Under each dimension, the specific quality criteria are listed (e.g., “No misleading or incorrect labels”, “No structural errors”, “The process includes start and end events”). Each criterion is associated with an importance level (High, Moderate, or Low) and a corresponding weight (3, 2, or 1, respectively).
- Evaluation Entries: For each LLM-based tool (BA-Copilot, BPMN-Chatbot, Camunda BPMN Copilot, Nala2BPMN, and ProMoAI), the scoring sheet records a binary outcome (True/False) for each quality criterion under each prompt (Prompt 1, Prompt 2, Prompt 3). A “True” entry indicates that the model satisfied the criterion, while a “False” entry indicates the presence of an issue. For example, if a model contained clearly labeled activities with no ambiguity, the criterion “No misleading or incorrect labels” would be marked as True, contributing positively to the model’s Clarity score.
- Visual Evidence: Under each prompt, the corresponding BPMN model URL from [32] is referenced, providing traceability to the specific model version evaluated.
- Score Calculation: For each prompt and each quality dimension, a raw score is computed by summing the weights of all criteria marked as True. This raw score is then divided by the maximum possible score for that dimension and rescaled, resulting in a Normalized Score ranging from 0 (lowest) to 5 (highest).
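The score calculation can be illustrated with a short sketch. It uses the weights stated above (High = 3, Moderate = 2, Low = 1) and the importance levels of the five Clarity criteria from Section 4.2, which give a maximum raw score of 11. The multiplication by 5 is an assumption inferred from the stated 0 to 5 range, and the True/False outcomes in the example are hypothetical, not taken from the scoring sheet.

```python
WEIGHTS = {"High": 3, "Moderate": 2, "Low": 1}

# Importance levels of the five Clarity criteria, in the order listed in Section 4.2.
CLARITY_LEVELS = ["High", "High", "Moderate", "Moderate", "Low"]


def normalized_score(levels: list[str], outcomes: list[bool], scale: float = 5.0) -> float:
    """Weighted share of satisfied criteria, rescaled to the range 0..scale."""
    if len(levels) != len(outcomes):
        raise ValueError("one outcome is required per criterion")
    max_score = sum(WEIGHTS[level] for level in levels)
    raw_score = sum(WEIGHTS[level] for level, ok in zip(levels, outcomes) if ok)
    return scale * raw_score / max_score


# Hypothetical outcomes: every criterion satisfied except one of moderate importance.
example_outcomes = [True, True, True, False, True]
example_score = normalized_score(CLARITY_LEVELS, example_outcomes)
print(round(example_score, 2))  # raw score 9 of 11, i.e. 5 * 9 / 11 = 4.09
```

Because every dimension has the same mix of importance levels, a single failed criterion lowers the normalized score by 5/11, 10/11, or 15/11 points depending on its weight.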
6. Results
6.1. Selection of Process for Evaluation
6.2. Descriptive Statistics
6.3. Correlation Between Quality Dimensions
6.4. Performance Across Individual Quality Criteria
7. Results
Funding
Conflicts of Interest
References
- Raiaan, M.A.K.; Mukta, M.S.H.; Fatema, K.; Fahad, N.M.; Sakib, S.; Mim, M.M.J.; Ahmad, J.; Ali, M.E.; Azam, S. A Review on Large Language Models: Architectures, Applications, Taxonomies, Open Issues and Challenges. IEEE Access 2024, 12, 26839–26874.
- Vidgof, M.; Bachhofner, S.; Mendling, J. Large Language Models for Business Process Management: Opportunities and Challenges. In Business Process Management Forum; Di Francescomarino, C., Burattin, A., Janiesch, C., Sadiq, S., Eds.; Lecture Notes in Business Information Processing; Springer Nature Switzerland: Cham, 2023; Vol. 490, pp. 107–123. ISBN 978-3-031-41622-4.
- Laue, R.; Kirchner, K.; Lantow, B.; Edwards, K. Bridging the Gap Between Business Process Modellers and Domain Experts by Variability Patterns. In Transactions on Pattern Languages of Programming V; Wallingford, E., Zdun, U., Kohls, C., Eds.; Springer: Berlin, Heidelberg, 2025; pp. 190–225. ISBN 978-3-662-70810-1.
- Grohs, M.; Abb, L.; Elsayed, N.; Rehse, J.-R. Large Language Models Can Accomplish Business Process Management Tasks. In Business Process Management Workshops; De Weerdt, J., Pufahl, L., Eds.; Lecture Notes in Business Information Processing; Springer Nature Switzerland: Cham, 2024; Vol. 492, pp. 453–465. ISBN 978-3-031-50973-5.
- Kourani, H.; Berti, A.; Schuster, D.; Van Der Aalst, W.M.P. Process Modeling with Large Language Models. In Enterprise, Business-Process and Information Systems Modeling; Van Der Aa, H., Bork, D., Schmidt, R., Sturm, A., Eds.; Lecture Notes in Business Information Processing; Springer Nature Switzerland: Cham, 2024; Vol. 511, pp. 229–244. ISBN 978-3-031-61006-6.
- Dumas, M.; La Rosa, M.; Mendling, J.; Reijers, H.A. Fundamentals of Business Process Management; 2nd ed.; Springer-Verlag: Berlin Heidelberg, 2018; ISBN 978-3-662-56508-7.
- Swanson, L. An Information-Processing Model of Maintenance Management. International Journal of Production Economics 2003, 83, 45–64.
- Moreira, S.A.S.; Dallavalle, S. Unraveling the Trends in Business Process Management: A Comprehensive Bibliometric Analysis of Management and Business Literature. Business Process Management Journal 2024, 30, 2541–2563.
- Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Akhtar, N.; Barnes, N.; Mian, A. A Comprehensive Overview of Large Language Models 2024.
- Nousias, N.; Tsakalidis, G.; Vergidis, K. Not yet Another BPM Lifecycle: A Synthesis of Existing Approaches Using BPMN. Information and Software Technology 2024, 171, 107471.
- Grikštaitė, J. Business Process Modelling and Simulation: Advantages and Disadvantages. Global Academic Society Journal: Social Science Insight 2008, 1, 4–14.
- Nousias, N.; Tsakalidis, G.; Michoulis, G.; Petridou, S.; Vergidis, K. A Process-Aware Approach for Blockchain-Based Verification of Academic Qualifications. Simulation Modelling Practice and Theory 2022, 121, 102642.
- Tsakalidis, G.; Vergidis, K.; Delias, P.; Vlachopoulou, M. A Conceptual Business Process Entity with Lifecycle and Compliance Alignment. In Proceedings of the International Conference on Decision Support System Technology; Springer, 2019; pp. 70–82.
- Object Management Group (OMG). About the Business Process Model And Notation Specification Version 2.0. 2013.
- Yahya, F.; Boukadi, K.; Ben-Abdallah, H. Improving the Quality of Business Process Models: Lesson Learned from the State of the Art. Business Process Management Journal 2018, 25, 1357–1376.
- Gokul, A. LLMs and AI: Understanding Its Reach and Impact. Preprints 2023.
- Yang, J.; Jin, H.; Tang, R.; Han, X.; Feng, Q.; Jiang, H.; Zhong, S.; Yin, B.; Hu, X. Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond. ACM Trans. Knowl. Discov. Data 2024, 18, 1–32.
- Lai, J.; Gan, W.; Wu, J.; Qi, Z.; Yu, P.S. Large Language Models in Law: A Survey. AI Open 2024.
- Feng, Z.; Hu, G.; Li, B.; Wang, J. Unleashing the Power of ChatGPT in Finance Research: Opportunities and Challenges. Financ Innov 2025, 11, 93.
- Firaina, R.; Sulisworo, D. Exploring the Usage of ChatGPT in Higher Education: Frequency and Impact on Productivity. Buletin Edukasi Indonesia 2023, 2, 39–46.
- Javaid, M.; Haleem, A.; Singh, R.P. ChatGPT for Healthcare Services: An Emerging Stage for an Innovative Perspective. BenchCouncil Transactions on Benchmarks, Standards and Evaluations 2023, 3, 100105.
- Szilágyi, R.; Tóth, M. Use of LLM for SMEs, Opportunities and Challenges. Journal of Agricultural Informatics 2023, 14.
- Wolf, V.; Maier, C. ChatGPT Usage in Everyday Life: A Motivation-Theoretic Mixed-Methods Study. International Journal of Information Management 2024, 79, 102821.
- Chen, Y.; Liu, Y.; Yan, J.; Bai, X.; Zhong, M.; Yang, Y.; Yang, Z.; Zhu, C.; Zhang, Y. See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses 2024.
- Klievtsova, N.; Benzin, J.-V.; Kampik, T.; Mangler, J.; Rinderle-Ma, S. Conversational Process Modelling: State of the Art, Applications, and Implications in Practice. In Business Process Management Forum; Di Francescomarino, C., Burattin, A., Janiesch, C., Sadiq, S., Eds.; Lecture Notes in Business Information Processing; Springer Nature Switzerland: Cham, 2023; Vol. 490, pp. 319–336. ISBN 978-3-031-41622-4.
- Kourani, H.; Berti, A.; Schuster, D.; Van Der Aalst, W.M.P. Evaluating Large Language Models on Business Process Modeling: Framework, Benchmark, and Self-Improvement Analysis 2024.
- National Registry of Administrative Public Services. 664541: Renewal of a Driving Licence (All Categories). Available online: https://en.mitos.gov.gr (accessed on 18 May 2025).
- Sánchez-González, L.; García, F.; Ruiz, F.; Piattini, M. Toward a Quality Framework for Business Process Models. International Journal of Cooperative Information Systems 2013, 22, 1350003.
- Mendling, J. Managing Structural and Textual Quality of Business Process Models. In Data-Driven Process Discovery and Analysis; Cudre-Mauroux, P., Ceravolo, P., Gašević, D., Eds.; Lecture Notes in Business Information Processing; Springer Berlin Heidelberg: Berlin, Heidelberg, 2013; Vol. 162, pp. 100–111. ISBN 978-3-642-40918-9.
- Mendling, J.; Strembeck, M.; Recker, J. Factors of Process Model Comprehension—Findings from a Series of Experiments. Decision Support Systems 2012, 53, 195–206.
- Vanderfeesten, I.; Cardoso, J.; Mendling, J.; Reijers, H.A.; Van der Aalst, W. Quality Metrics for Business Process Models. BPM and Workflow Handbook 2007, 144, 179–190.
- BPM-UOM. BPM-UOM/Llm_based_tools_for_process_modeling. 2025.
- Klievtsova, N.; Benzin, J.-V.; Mangler, J.; Kampik, T.; Rinderle-Ma, S. Process Modeler vs. Chatbot: Is Generative AI Taking over Process Modeling? In Process Mining Workshops; Delgado, A., Slaats, T., Eds.; Lecture Notes in Business Information Processing; Springer Nature Switzerland: Cham, 2025; Vol. 533, pp. 637–649. ISBN 978-3-031-82224-7.
- Nour Eldin, A.; Assy, N.; Anesini, O.; Dalmas, B.; Gaaloul, W. Nala2BPMN: Automating BPMN Model Generation with Large Language Models. In Cooperative Information Systems; Comuzzi, M., Grigori, D., Sellami, M., Zhou, Z., Eds.; Lecture Notes in Computer Science; Springer Nature Switzerland: Cham, 2025; Vol. 15506, pp. 398–404. ISBN 978-3-031-81374-0.
- Bennoit, C.; Greff, T.; Baum, D.; Bajwa, I.A. Identifying Use Cases for Large Language Models in the Business Process Management Lifecycle. In Proceedings of the 2024 26th International Conference on Business Informatics (CBI), IEEE; 2024; pp. 256–263.





| Score | Quality Level | Interpretation |
|---|---|---|
| 5 | Very High | No issues observed. The model is complete, correct, and clearly structured. |
| 4 | High | Only one minor issue is present (low or moderate importance). Overall quality is generally acceptable. |
| 3 | Moderate | One major issue or a combination of moderate and minor issues. The model is usable but requires improvement. |
| 2 | Low | Several issues, including at least one major problem. Overall quality is significantly impaired. |
| 1 | Very Low | Multiple major issues are present, including two or more of high importance. The model lacks usability and reliability. |

| Metric | A | B | C | D | E | Mean | St. Dev. |
|---|---|---|---|---|---|---|---|
| Clarity | 2.42 | 3.64 | 4.09 | 3.18 | 0.91 | 2.85 | 1.12 |
| Correctness | 1.52 | 2.73 | 3.33 | 1.97 | 1.36 | 2.18 | 0.75 |
| Completeness | 3.33 | 0.61 | 4.55 | 4.55 | 0.45 | 2.70 | 1.82 |
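The Mean and St. Dev. columns can be recomputed from the per-tool scores in the table above. The sketch below is a reader-side consistency check, not part of the original analysis. It assumes that the spread is a population standard deviation (dividing by n), which is an inference from the reported numbers rather than something the text states. Because the inputs are already rounded to two decimals, the recomputed values may differ from the table in the last digit.

```python
from statistics import mean, pstdev

# Per-tool scores (tools A..E) copied from the descriptive statistics table.
SCORES = {
    "Clarity": [2.42, 3.64, 4.09, 3.18, 0.91],
    "Correctness": [1.52, 2.73, 3.33, 1.97, 1.36],
    "Completeness": [3.33, 0.61, 4.55, 4.55, 0.45],
}

# (Mean, St. Dev.) as reported in the same table.
REPORTED = {
    "Clarity": (2.85, 1.12),
    "Correctness": (2.18, 0.75),
    "Completeness": (2.70, 1.82),
}


def summarize(values: list[float]) -> tuple[float, float]:
    """Return the arithmetic mean and the population standard deviation."""
    return mean(values), pstdev(values)


for metric, values in SCORES.items():
    m, s = summarize(values)
    reported_mean, reported_sd = REPORTED[metric]
    print(f"{metric}: mean={m:.3f} (reported {reported_mean}), "
          f"st_dev={s:.3f} (reported {reported_sd})")
```

Using the sample standard deviation (`statistics.stdev`, dividing by n - 1) instead would give noticeably larger values than those in the table, which is why the population form is assumed here.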

| Metric | Clarity | Correctness | Completeness |
|---|---|---|---|
| Clarity | 1.000 | 0.882 | 0.525 |
| Correctness | 0.882 | 1.000 | 0.297 |
| Completeness | 0.525 | 0.297 | 1.000 |
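The correlation matrix above can be approximated from the five per-tool scores of each dimension. The sketch below assumes that the coefficients are Pearson correlations computed across tools A to E; the source does not name the coefficient, so this is an assumption that happens to agree with the reported values. The function name `pearson` is illustrative.

```python
from math import sqrt

# Per-tool scores (tools A..E) copied from the descriptive statistics table.
SCORES = {
    "Clarity": [2.42, 3.64, 4.09, 3.18, 0.91],
    "Correctness": [1.52, 2.73, 3.33, 1.97, 1.36],
    "Completeness": [3.33, 0.61, 4.55, 4.55, 0.45],
}


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("samples must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    covariance_sum = sum(a * b for a, b in zip(dx, dy))
    return covariance_sum / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


pairs = [
    ("Clarity", "Correctness"),
    ("Clarity", "Completeness"),
    ("Correctness", "Completeness"),
]
for first, second in pairs:
    r = pearson(SCORES[first], SCORES[second])
    print(f"{first} vs {second}: r = {r:.3f}")
```

With only five tools, each coefficient rests on five data points, so the values are best read as descriptive rather than as statistically robust estimates.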
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).