Submitted: 19 September 2025
Posted: 22 September 2025
Abstract
Keywords:
1. Introduction
2. Theoretical Concepts
2.1. ISO/IEC 25010:2023
2.2. Behavior-Driven Development (BDD)
2.3. Large Language Model (LLM)
3. Related Work
4. Experiment
4.1. Experiment Execution

4.2. Evaluated Metric
5. Results
5.1. Q1—What Is the Similarity of Responses Between ChatGPT and Gemini?
- Metric: Similarity coefficient.
- Alternative Hypothesis (H1): The responses generated by the LLMs are not similar to each other.
- Null Hypothesis (HN1): The responses generated by the LLMs are similar to each other.
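The outline names a similarity coefficient as the Q1 metric but does not state here which coefficient is used. As an illustration only, the sketch below assumes a Jaccard coefficient over word sets; the function names and the two sample responses are hypothetical and are not taken from the experiment.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(first: str, second: str) -> float:
    """Return |A ∩ B| / |A ∪ B| for the word sets of two responses.

    Assumption: Jaccard is only one possible similarity coefficient;
    the study may rely on a different one.
    """
    tokens_a, tokens_b = tokenize(first), tokenize(second)
    if not tokens_a and not tokens_b:
        # Two empty responses are treated as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Hypothetical responses, invented for illustration.
chatgpt_response = "Given a registered user When the user logs in Then access is granted"
gemini_response = "Given a registered user When the user signs in Then access is granted"

score = jaccard_similarity(chatgpt_response, gemini_response)
print(f"Jaccard similarity: {score:.3f}")
```

A score close to 1 indicates largely overlapping vocabulary, and a score close to 0 indicates little overlap. A set-based coefficient ignores word order, so it is a coarse proxy for how similar two generated scenarios are.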
5.2. Q2—How Can BDD Be Used to Ensure Quality Related to the Security Non-Functional Requirements Through an LLM?
- Requirements creation and validation: ChatGPT and Gemini help write non-functional requirements, such as security, clearly and precisely.
- Requirements gathering: During this stage, ChatGPT and Gemini can transform complex information supplied in the prompt into precise specifications, from which test scenarios aligned with the ISO/IEC 25010:2023 standard are generated through BDD.
- Documentation automation: ChatGPT and Gemini facilitate the generation of technical documentation and test scenarios.
- Productivity and consistency: Automating documentation can save time and reduce manual effort, and consistent text generation helps maintain the quality of requirements.
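The points above can be sketched in code under stated assumptions. The prompt wording, the function names, and the sample scenario below are hypothetical and do not reproduce the prompts or outputs of the experiment. The sketch builds a prompt for a security requirement and checks that a returned scenario follows the Given/When/Then order used in BDD.

```python
import re

# Hypothetical prompt template; the wording is an assumption,
# not the prompt used in the study.
PROMPT_TEMPLATE = (
    "Write one BDD scenario in Given/When/Then form for the security "
    "non-functional requirement below, aligned with ISO/IEC 25010:2023.\n"
    "Requirement: {requirement}"
)


def build_prompt(requirement: str) -> str:
    """Fill the template with a single requirement statement."""
    return PROMPT_TEMPLATE.format(requirement=requirement.strip())


def has_given_when_then(scenario: str) -> bool:
    """Check that Given, When, and Then steps appear in that order.

    Lines starting with other keywords (for example "And") are ignored.
    """
    keywords = []
    for line in scenario.splitlines():
        match = re.match(r"\s*(Given|When|Then)\b", line)
        if match is None:
            continue
        # Collapse consecutive repeats of the same keyword.
        if not keywords or keywords[-1] != match.group(1):
            keywords.append(match.group(1))
    return keywords == ["Given", "When", "Then"]


# Hypothetical LLM output, invented for illustration.
sample_scenario = """\
Scenario: Lock account after repeated failed logins
  Given a registered user with a valid account
  When the user enters a wrong password five times
  Then the account is locked
  And a security event is recorded
"""

prompt = build_prompt("Accounts must be locked after repeated failed logins.")
is_well_formed = has_given_when_then(sample_scenario)
print(prompt)
print("Well-formed scenario:", is_well_formed)
```

A structural check of this kind only confirms the scenario's shape. Whether the scenario actually covers the intended security requirement still requires human review.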
6. Discussion
7. Threats to Validity
7.1. Construction Validity
7.2. Internal Validity
7.3. External Validity
7.4. Conclusion Validity
8. Conclusion
Acknowledgments


| Id | Question | Justification |
|---|---|---|
| Q1 | What is the similarity of responses between ChatGPT and Gemini? | Present possible differences between these LLMs. |
| Q2 | How can BDD be used to ensure quality related to the security non-functional requirement through an LLM? | Explain how this framework could contribute to eliciting non-functional requirements. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).