Submitted: 03 February 2023
Posted: 06 February 2023
Abstract
Keywords:
1. Introduction
1.1. Importance of Requirements Engineering
- Necessary: capable of conveying what is necessary to achieve the required system functionalities, while being compliant with regulations
- Clear: able to convey the desired goal to the stakeholders by being simple and concise
- Traceable: able to be traced back to higher-level specifications, and vice versa
- Verifiable: able to be verified through processes such as analysis, inspection, demonstration, and test
- Complete: able to yield a system that meets the client’s needs while complying with regulatory standards
1.2. Shift towards Model-based Systems Engineering
1.3. Research focus and expected contributions
2. Background
2.1. Natural Language Processing for Requirements Engineering (NLP4RE)
2.2. Natural Language Processing (NLP) & Language Models (LMs)
2.2.1. Bidirectional Encoder Representations from Transformers (BERT)
- BERTBASE: contains 12 encoder blocks with a hidden size of 768, and 12 self-attention heads (total of 110M parameters)
- BERTLARGE: contains 24 encoder blocks with a hidden size of 1024, and 16 self-attention heads (total of 340M parameters)
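The parameter totals quoted for the two variants can be roughly reproduced from the layer counts and hidden sizes alone. The sketch below is a back-of-the-envelope count, not the authors' code. It assumes values that are not stated in this section: a 30,522-token WordPiece vocabulary, 512 position embeddings, 2 segment embeddings, a feed-forward width of four times the hidden size, and a pooler layer on top of the encoder. The number of self-attention heads does not change the count, because the heads split the hidden dimension rather than add to it.

```python
def bert_param_count(num_layers, hidden, vocab_size=30522,
                     max_positions=512, type_vocab=2):
    """Approximate trainable parameters of a BERT encoder with pooler.

    vocab_size, max_positions, type_vocab and the 4x feed-forward width
    are assumptions taken from the standard uncased BERT configuration.
    """
    ffn = 4 * hidden          # assumed feed-forward (intermediate) width
    layer_norm = 2 * hidden   # one scale and one bias vector
    # Token, position and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm
    # Query, key, value and output projections (weights + biases), then LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm
    # Two dense layers of the feed-forward block, then LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm
    # Dense layer applied to the [CLS] hidden state.
    pooler = hidden * hidden + hidden
    return embeddings + num_layers * (attention + feed_forward) + pooler


base = bert_param_count(num_layers=12, hidden=768)
large = bert_param_count(num_layers=24, hidden=1024)
print(f"BERT-BASE:  {base / 1e6:.1f}M parameters")   # 109.5M
print(f"BERT-LARGE: {large / 1e6:.1f}M parameters")  # 335.1M
```

Under these assumptions the counts come out near 110M and 335M, which is in line with the rounded figures usually quoted for the two models.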
2.2.2. Zero-shot Text Classification
3. Research Gaps & Objectives
- Creation of a labeled aerospace requirements corpus: Aerospace requirements are collected from Parts 23 and 25 of Title 14 of the Code of Federal Regulations (CFRs) and annotated. The annotation involves tagging each requirement with its “type” (e.g., functional, performance, interface, or design).
- Identification of aerospace requirements of interest: Several variants of the BERT LM exist for text classification; however, these models are trained on data such as Tweets, sentiment data, and movie reviews. Hence, there is a need for a model that is capable of classifying requirements. This work adds the capability to identify three types of requirements: Design requirements, Functional requirements, and Performance requirements.
- Fine-tuning of BERT for aerospace requirements classification: The annotated aerospace requirements are used for fine-tuning the BERTBASE-UNCASED LM. Metrics such as precision, recall, and F1 score are used to assess the model performance.
4. Technical Approach
4.1. Data Collection, Cleaning, and Annotation
4.2. Preparing the dataset for fine-tuning BERTBASE-UNCASED
- [CLS]: This token is added to the beginning of every sequence of text and its final hidden state contains the aggregate sequence representation for the entire sequence, which is then used for the sequence classification task.
- [SEP]: This token is used to separate one sequence from the next and is needed for the Next-Sentence-Prediction (NSP) task. Since the aerospace requirements used for this research are single sentences, this token was not used.
- [PAD]: This token is used to make sure that all the input sequences are of the same length. The maximum length for the input sequences was set to 100 after examining the distribution of lengths of all sequences in the training set (Figure 9). All sequences shorter than the maximum length are post-padded with [PAD] tokens until the sequence length equals the maximum length. Sequences longer than the maximum length are truncated.
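The handling of the special tokens described above can be illustrated with a small pure-Python sketch. It is not the authors' preprocessing code: the IDs 0 and 101 are the [PAD] and [CLS] entries of the bert-base-uncased vocabulary, the input token IDs are arbitrary placeholders, and counting [CLS] towards the maximum length of 100 is an assumption. Following the text, no [SEP] token is appended.

```python
PAD_ID = 0     # [PAD] in the bert-base-uncased vocabulary
CLS_ID = 101   # [CLS] in the bert-base-uncased vocabulary
MAX_LEN = 100  # maximum sequence length reported in the text


def prepare_sequence(token_ids, max_len=MAX_LEN):
    """Prepend [CLS], truncate to max_len, then post-pad with [PAD].

    Returns the padded IDs and an attention mask that is 1 for real
    tokens and 0 for padding.
    """
    ids = [CLS_ID] + list(token_ids)
    ids = ids[:max_len]                 # truncate sequences that are too long
    attention_mask = [1] * len(ids)
    padding = max_len - len(ids)        # number of [PAD] tokens to append
    return ids + [PAD_ID] * padding, attention_mask + [0] * padding


# Placeholder token IDs; a real pipeline would obtain them from a tokenizer.
short_ids, short_mask = prepare_sequence([2169, 2501, 2121])
long_ids, long_mask = prepare_sequence(range(1000, 1150))

print(len(short_ids), sum(short_mask))  # 100 4
print(len(long_ids), sum(long_mask))    # 100 100
```

The attention mask is included because padded positions are normally masked out so that they do not contribute to self-attention.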
4.3. Fine-Tuning BERTBASE-UNCASED
5. Results
5.1. aeroBERT-Classifier Performance
5.2. Comparison between aeroBERT-Classifier and bart-large-mnli
6. Conclusions & Future Work
7. Other details
Author Contributions:
- Archana Tikayat Ray: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing—original draft preparation, Writing—review and editing
- Bjorn F. Cole: Conceptualization, Data curation, Writing—review and editing
- Olivia J. Pinon Fischer: Conceptualization, Writing—review and editing
- Ryan T. White: Methodology, Writing—review and editing
- Dimitri N. Mavris: Writing—review and editing
Data Availability Statement
Acknowledgments
Abbreviations
| BERT | Bidirectional Encoder Representations from Transformers |
| CFR | Code of Federal Regulations |
| COND | Condition |
| FAA | Federal Aviation Administration |
| FAR | Federal Aviation Regulations |
| GPT | Generative Pre-trained Transformer |
| INCOSE | International Council on Systems Engineering |
| LM | Language Model |
| LOC | Location (Entity label) |
| MBSE | Model-Based Systems Engineering |
| MISC | Miscellaneous |
| MNLI | Multi-Genre Natural Language Inference |
| NE | Named Entity |
| NER | Named Entity Recognition |
| NL | Natural Language |
| NLI | Natural Language Inference |
| NLP | Natural Language Processing |
| NLP4RE | Natural Language Processing for Requirements Engineering |
| ORG | Organization (Entity label) |
| RE | Requirements Engineering |
| RES | Resource (Entity label) |
| SME | Subject Matter Expert |
| SOTA | State Of The Art |
| SYS | System (Entity label) |
| SysML | Systems Modeling Language |
| UML | Unified Modeling Language |
| ZSL | Zero-shot learning |
References
- Guide to the Systems Engineering Body of Knowledge; BKCASE Editorial Board, INCOSE, 2020; p. 945.
- INCOSE. INCOSE INFRASTRUCTURE WORKING GROUP Charter (accessed Jan. 10, 2023). pp. 3–5.
- NASA. Appendix C: How to Write a Good Requirement (accessed Jan. 05, 2022). pp. 115–119.
- Firesmith, D. Are your requirements complete? J. Object Technol. 2005, 4, 27–44. [CrossRef]
- NASA. 2.1 The Common Technical Processes and the SE Engine (accessed Jan. 10, 2023).
- Nuseibeh, B.; Easterbrook, S. Requirements Engineering: A Roadmap. Proceedings of the Conference on The Future of Software Engineering; Association for Computing Machinery: New York, NY, USA, 2000; ICSE ’00, p. 35–46. [CrossRef]
- Regnell, B.; Svensson, R.B.; Wnuk, K. Can we beat the complexity of very large-scale requirements engineering? International Working Conference on Requirements Engineering: Foundation for Software Quality. Springer, 2008, pp. 123–128.
- Firesmith, D. Common Requirements Problems, Their Negative Consequences, and the Industry Best Practices to Help Solve Them. J. Object Technol. 2007, 6, 17–33. [CrossRef]
- Haskins, B.; Stecklein, J.; Dick, B.; Moroney, G.; Lovell, R.; Dabney, J. 8.4. 2 error cost escalation through the project life cycle. INCOSE International Symposium. Wiley Online Library, 2004, Vol. 14, pp. 1723–1737. [CrossRef]
- Bell, T.E.; Thayer, T.A. Software requirements: Are they really a problem? Proceedings of the 2nd international conference on Software engineering, 1976, pp. 61–68.
- Dalpiaz, F.; Ferrari, A.; Franch, X.; Palomares, C. Natural language processing for requirements engineering: The best is yet to come. IEEE software 2018, 35, 115–119. [CrossRef]
- Ramos, A.L.; Ferreira, J.V.; Barceló, J. Model-based systems engineering: An emerging approach for modern systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2011, 42, 101–111. [CrossRef]
- Estefan, J.A.; others. Survey of model-based systems engineering (MBSE) methodologies. Incose MBSE Focus Group 2007, 25, 1–12.
- Jacobson, L.; Booch, J.R.G. The unified modeling language reference manual 2021.
- Ballard, M.; Peak, R.; Cimtalay, S.; Mavris, D.N. Bidirectional Text-to-Model Element Requirement Transformation. IEEE Aerospace Conference 2020, pp. 1–14. [CrossRef]
- Lemazurier, L.; Chapurlat, V.; Grossetête, A. An MBSE approach to pass from requirements to functional architecture. IFAC-PapersOnLine 2017, 50, 7260–7265. [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019. arXiv:cs.CL/1810.04805.
- Ferrari, A.; Dell’Orletta, F.; Esuli, A.; Gervasi, V.; Gnesi, S. Natural Language Requirements Processing: A 4D Vision. IEEE Softw. 2017, 34, 28–35. [CrossRef]
- Abbott, R.J.; Moorhead, D. Software requirements and specifications: A survey of needs and languages. Journal of Systems and Software 1981, 2, 297–316. [CrossRef]
- Luisa, M.; Mariangela, F.; Pierluigi, N.I. Market research for requirements analysis using linguistic tools. Requirements Engineering 2004, 9, 40–56. [CrossRef]
- Manning, C.; Surdeanu, M.; Bauer, J.; Finkel, J.; Bethard, S.; McClosky, D. The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations; Association for Computational Linguistics: Baltimore, Maryland, 2014; pp. 55–60. [CrossRef]
- Natural Language Toolkit. https://www.nltk.org/. (accessed: 01.10.2023).
- spaCy. https://spacy.io/. (accessed: 01.10.2023).
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Advances in neural information processing systems 2017, 30.
- Lewis, M.; Liu, Y.; Goyal, N.; Ghazvininejad, M.; Mohamed, A.; Levy, O.; Stoyanov, V.; Zettlemoyer, L. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. CoRR 2019, abs/1910.13461. arXiv:1910.13461.
- Zhao, L.; Alhoshan, W.; Ferrari, A.; Letsholo, K.J.; Ajagbe, M.A.; Chioasca, E.V.; Batista-Navarro, R.T. Natural language processing for requirements engineering: A systematic mapping study. ACM Computing Surveys (CSUR) 2021, 54, 1–41. [CrossRef]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv 2019, abs/1910.01108. arXiv:1910.01108.
- Goldberg, Y. Neural network methods for natural language processing. Synthesis lectures on human language technologies 2017, 10, 1–309.
- Jurafsky, D.; Martin, J.H. Speech and language processing (draft) 2021.
- Niesler, T.R.; Woodland, P.C. A variable-length category-based n-gram language model. 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings. IEEE, 1996, Vol. 1, pp. 164–167. [CrossRef]
- Bengio, Y.; Ducharme, R.; Vincent, P. A Neural Probabilistic Language Model. Advances in Neural Information Processing Systems; Leen, T.; Dietterich, T.; Tresp, V., Eds. MIT Press, 2000, Vol. 13.
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 2013. arXiv:1301.3781.
- Graves, A. Generating Sequences With Recurrent Neural Networks. [1308.0850v5]. arXiv:1308.0850.
- Cho, K.; van Merrienboer, B.; Çaglar Gülçehre.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. EMNLP, 2014.
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings; Bengio, Y.; LeCun, Y., Eds., 2015.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Advances in neural information processing systems 2014, 27.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I.; others. Language models are unsupervised multitask learners. OpenAI blog 2019, 1, 9.
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D. Language Models are Few-Shot Learners. Advances in Neural Information Processing Systems; Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.; Lin, H., Eds. Curran Associates, Inc., 2020, Vol. 33, pp. 1877–1901.
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Journal of Machine Learning Research 2020, 21, 1–67.
- Sun, C.; Qiu, X.; Xu, Y.; Huang, X. How to fine-tune bert for text classification? China national conference on Chinese computational linguistics. Springer, 2019, pp. 194–206.
- Alammar, J. The Illustrated BERT, ELMo, and co. (How NLP Cracked Transfer Learning). https://jalammar.github.io/illustrated-bert/.
- Ray, A.T.; Pinon-Fischer, O.J.; Mavris, D.N.; White, R.T.; Cole, B.F., aeroBERT-NER: Named-Entity Recognition for Aerospace Requirements Engineering using BERT. In AIAA SCITECH 2023 Forum. [CrossRef]
- Dima, A.; Lukens, S.; Hodkiewicz, M.; Sexton, T.; Brundage, M.P. Adapting natural language processing for technical text. Applied AI Letters 2021, 2, e33. [CrossRef] [PubMed]
- Sharir, O.; Peleg, B.; Shoham, Y. The cost of training nlp models: A concise overview. arXiv preprint arXiv:2004.08900 2020. arXiv:2004.08900.
- Dai, A.M.; Le, Q.V. Semi-supervised sequence learning. Advances in neural information processing systems 2015, 28.
- Peters, M.E.; Ammar, W.; Bhagavatula, C.; Power, R. Semi-supervised sequence tagging with bidirectional language models. arXiv preprint arXiv:1705.00108 2017. arXiv:1705.00108.
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding with unsupervised learning 2018.
- Howard, J.; Ruder, S. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146 2018. arXiv:1801.06146.
- Hugging Face. https://huggingface.co/. (accessed: 01.10.2023).
- Alammar, J. The Illustrated Transformer. https://jalammar.github.io/illustrated-transformer/.
- Cleland-Huang, J.; Mazrouee, S.; Liguo, H.; Port, D. nfr. [CrossRef]
- Hey, T.; Keim, J.; Koziolek, A.; Tichy, W.F. NoRBERT: Transfer learning for requirements classification. 2020 IEEE 28th International Requirements Engineering Conference (RE). IEEE, 2020, pp. 169–179.
- Zero-Shot Learning in Modern NLP. https://joeddav.github.io/blog/2020/05/29/ZSL.html. (accessed: 01.10.2023).
- Yin, W.; Hay, J.; Roth, D. Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach. CoRR 2019, abs/1909.00161, [1909.00161].
- Williams, A.; Nangia, N.; Bowman, S. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Association for Computational Linguistics, 2018, pp. 1112–1122.
- Alhoshan, W.; Zhao, L.; Ferrari, A.; Letsholo, K.J. A Zero-Shot Learning Approach to Classifying Requirements: A Preliminary Study. Requirements Engineering: Foundation for Software Quality; Gervasi, V.; Vogelsang, A., Eds.; Springer International Publishing: Cham, 2022; pp. 52–59.
- Beltagy, I.; Lo, K.; Cohan, A. SciBERT: A pretrained language model for scientific text. arXiv preprint arXiv:1903.10676 2019. arXiv:1903.10676.
- Araci, D. Finbert: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063 2019. arXiv:1908.10063.
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020, 36, 1234–1240. [CrossRef] [PubMed]
- Alsentzer, E.; Murphy, J.R.; Boag, W.; Weng, W.H.; Jin, D.; Naumann, T.; McDermott, M. Publicly available clinical BERT embeddings. arXiv preprint arXiv:1904.03323 2019. arXiv:1904.03323.
- Lee, J.S.; Hsiang, J. Patentbert: Patent classification with fine-tuning a pre-trained bert model. arXiv preprint arXiv:1906.02124 2019. arXiv:1906.02124.
- Wheatcraft, L.S. Everything you wanted to know about interfaces, but were afraid to ask. https://reqexperts.com/wp-content/uploads/2016/04/Wheatcraft-Interfaces-061511.pdf.
- Spacey, J. 11 Examples of Quality Requirements. https://simplicable.com/new/quality-requirements.
- Loshchilov, I.; Hutter, F., Decoupled Weight Decay Regularization; arXiv, 2017. [CrossRef]
- Fundamentals of Systems Engineering: Requirements Definition. Available online: https://ocw.mit.edu/courses/16-842-fundamentals-of-systems-engineering-fall-2015/7f2bc41156a04ecb94a6c04546f122af_MIT16_842F15_Ses2_Req.pdf (accessed on 1 October 2022).











| Serial No. | Requirements |
| --- | --- |
| 1 | The product shall be available for use 24 hours per day 365 days per year. |
| 2 | The product shall synchronize with the office system every hour. |
| 3 | System shall let existing customers log into the website with their email address and password in under 5 seconds. |
| 4 | The product should be able to be used by 90% of novice users on the Internet. |
| 5 | The ratings shall be from a scale of 1-10. |

| Serial No. | Name of resource |
| --- | --- |
| 1 | Part 23: Airworthiness Standards: Normal, Utility, Acrobatic and Commuter Airplanes |
| 2 | Part 25: Airworthiness Standards: Transport Category Airplanes |

| 14 CFR §23.2145(a) | Requirements created |
| --- | --- |
| Airplanes not certified for aerobatics must - | |
| (1) Have static longitudinal, lateral, and directional stability in normal operations; | Requirement 1: Airplanes not certified for aerobatics must have static longitudinal, lateral, and directional stability in normal operations. |
| (2) Have dynamic short period and Dutch roll stability in normal operations; and | Requirement 2: Airplanes not certified for aerobatics must have dynamic short period and Dutch roll stability in normal operations. |
| (3) Provide stable control force feedback throughout the operating envelope. | Requirement 3: Airplanes not certified for aerobatics must provide stable control force feedback throughout the operating envelope. |
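
The table above turns one CFR paragraph with enumerated sub-items into stand-alone requirements by joining the shared stem with each item. A minimal sketch of that expansion follows. The function name `expand_requirements` is hypothetical, and the section does not say whether the authors performed this step by hand or automatically.

```python
import re


def expand_requirements(stem, items):
    """Join a shared stem with each enumerated item to form full sentences."""
    stem = stem.rstrip("- ")  # drop the trailing dash that introduces the list
    requirements = []
    for item in items:
        body = re.sub(r"^\(\w+\)\s*", "", item.strip())        # remove "(1)", "(a)", ...
        body = re.sub(r";\s*(?:and|or)?\s*$|\.\s*$", "", body)  # remove list punctuation
        body = body[0].lower() + body[1:]                       # continue the stem sentence
        requirements.append(f"{stem} {body}.")
    return requirements


stem = "Airplanes not certified for aerobatics must -"
items = [
    "(1) Have static longitudinal, lateral, and directional stability in normal operations;",
    "(2) Have dynamic short period and Dutch roll stability in normal operations; and",
    "(3) Provide stable control force feedback throughout the operating envelope.",
]
for number, requirement in enumerate(expand_requirements(stem, items), start=1):
    print(f"Requirement {number}: {requirement}")
```

Paragraphs with nested sub-items or exceptions would need more care than this sketch provides.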

| Original Symbol | Modified text/symbol | Example |
| --- | --- | --- |
| § | Section | §25.531 → Section 25.531 |
| §§ | Sections | §§25.619 through 25.625 → Sections 25.619 through 25.625 |
| Dot (‘.’) used in section numbers | Dash (‘-’) | Section 25.531 → Section 25-531 |
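
The three substitutions in the table above can be sketched with one regular expression. This is an illustrative implementation rather than the authors' cleaning script: the function name `clean_section_refs` is hypothetical, and restricting the dot-to-dash replacement to numbers that directly follow a section symbol (so that ordinary decimals are left alone) is an assumption.

```python
import re

# A section symbol (single or double) followed by one or more section numbers,
# optionally chained with "through", "and" or "or".
_SECTION_REF = re.compile(
    r"(§{1,2})\s*(\d+\.\d+(?:\s+(?:through|and|or)\s+\d+\.\d+)*)"
)


def clean_section_refs(text):
    """Spell out section symbols and replace dots in section numbers with dashes."""
    def _rewrite(match):
        word = "Sections" if match.group(1) == "§§" else "Section"
        numbers = match.group(2).replace(".", "-")
        return f"{word} {numbers}"

    return _SECTION_REF.sub(_rewrite, text)


print(clean_section_refs("§25.531"))                   # Section 25-531
print(clean_section_refs("§§25.619 through 25.625"))   # Sections 25-619 through 25-625
```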

| Requirement Type | Definition | Example |
| --- | --- | --- |
| Design | Dictates “how” a system should be designed given certain technical standards and specifications. | Trim control systems must be designed to prevent creeping in flight. |
| Functional | Defines the functions that need to be performed by a system in order to accomplish the desired system functionality. | Each cockpit voice recorder shall record voice communications of flightcrew members on the flight deck. |
| Performance | Defines “how well” a system needs to perform a certain function. | The airplane must be free from flutter, control reversal, and divergence for any configuration and condition of operation. |
| Interface | Defines the interaction between systems [62]. | Each flight recorder shall be supplied with airspeed data. |
| Environmental | Defines the environment in which the system must function. | The exhaust system, including exhaust heat exchangers for each powerplant or auxiliary power unit, must be designed to prevent likely hazards from heat, corrosion, or blockage. |
| Quality | Describes the quality, reliability, consistency, availability, usability, maintainability, and materials and ingredients of a system [63]. | Internal panes must be made of nonsplintering material. |

| Original Interface Requirement | Modified Requirement “type”/category |
| --- | --- |
| Each flight recorder shall be supplied with airspeed data. | The airplane shall supply the flight recorder with airspeed data. [Functional Requirement] |
| Each flight recorder shall be supplied with directional data. | The airplane shall supply the flight recorder with directional data. [Functional Requirement] |
| The state estimates supplied to the flight recorder shall meet the aircraft-level system requirements and the functionality specified in Section 23-2500. | The state estimates supplied to the flight recorder shall meet the aircraft level system requirements and the functionality specified in Section 23-2500. [Design Requirement] |

| Requirements | Label |
| --- | --- |
| Each cockpit voice recorder shall record voice communications transmitted from or received in the airplane by radio. | 1 |
| Each recorder container must be either bright orange or bright yellow. | 0 |
| Single-engine airplanes, not certified for aerobatics, must not have a tendency to inadvertently depart controlled flight. | 2 |
| Each part of the airplane must have adequate provisions for ventilation and drainage. | 0 |
| Each baggage and cargo compartment must have a means to prevent the contents of the compartment from becoming a hazard by impacting occupants or shifting. | 1 |

| Requirement type | Training set count | Test set count |
| --- | --- | --- |
| Design (0) | 136 | 13 |
| Functional (1) | 89 | 10 |
| Performance (2) | 54 | 8 |
| Total | 279 | 31 |
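
The class balance and the size of the held-out set can be read directly off the counts above. The short sketch below only restates that arithmetic; the variable names are illustrative.

```python
# (training count, test count) per requirement type, copied from the table above.
counts = {
    "Design": (136, 13),
    "Functional": (89, 10),
    "Performance": (54, 8),
}

train_total = sum(train for train, _ in counts.values())
test_total = sum(test for _, test in counts.values())
test_fraction = test_total / (train_total + test_total)

for name, (train, test) in counts.items():
    # Share of the training set per class, and fraction of each class held out.
    print(f"{name:<12} train share {train / train_total:.1%}, "
          f"held out {test / (train + test):.1%}")

print(f"Total: {train_total} training, {test_total} test "
      f"({test_fraction:.0%} of all requirements held out)")
```

Design requirements make up roughly half of the training set, so the per-class metrics are more informative than overall accuracy alone.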

| Requirement type | Precision | Recall | F1 score |
| --- | --- | --- | --- |
| Design (0) | 0.80 | 0.92 | 0.86 |
| Functional (1) | 0.89 | 0.80 | 0.84 |
| Performance (2) | 0.86 | 0.75 | 0.80 |
| Average | 0.85 | 0.82 | 0.83 |
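
The F1 scores and the "Average" row above are consistent with the harmonic mean of precision and recall per class, followed by an unweighted (macro) average across the three classes. The check below is a sketch using only the rounded table values. That the authors used macro rather than weighted averaging is an inference from the numbers, not something stated in the section.

```python
# (precision, recall) per class, copied from the table above.
reported = {
    "Design": (0.80, 0.92),
    "Functional": (0.89, 0.80),
    "Performance": (0.86, 0.75),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_class = {name: f1_score(p, r) for name, (p, r) in reported.items()}

n = len(reported)
macro_precision = sum(p for p, _ in reported.values()) / n
macro_recall = sum(r for _, r in reported.values()) / n
macro_f1 = sum(f1_by_class.values()) / n

for name, score in f1_by_class.items():
    print(f"{name:<12} F1 = {score:.2f}")
print(f"Macro average: P = {macro_precision:.2f}, "
      f"R = {macro_recall:.2f}, F1 = {macro_f1:.2f}")
```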

| Requirements | Actual | Predicted |
| --- | --- | --- |
| The installed powerplant must operate without any hazardous characteristics during normal and emergency operation within the range of operating limitations for the airplane and the engine. | 2 | 1 |
| Each flight recorder must be installed so that it remains powered for as long as possible without jeopardizing emergency operation of the airplane. | 0 | 2 |
| The microphone must be so located and, if necessary, the preamplifiers and filters of the recorder must be so adjusted or supplemented, so that the intelligibility of the recorded communications is as high as practicable when recorded under flight cockpit noise conditions and played back. | 2 | 0 |
| A means to extinguish fire within a fire zone, except a combustion heater fire zone, must be provided for any fire zone embedded within the fuselage, which must also include a redundant means to extinguish fire. | 1 | 0 |
| Thermal/acoustic materials in the fuselage, must not be a flame propagation hazard. | 1 | 0 |
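
As a consistency check, the test-set counts per class (13 Design, 10 Functional, 8 Performance) and the five misclassified requirements listed above are enough to rebuild a confusion matrix and recompute per-class precision and recall. The sketch assumes that the five rows are the complete set of test-set errors. The section does not state this explicitly, but the recomputed values match the reported metrics.

```python
# Test-set size per label: 0 = Design, 1 = Functional, 2 = Performance.
test_counts = {0: 13, 1: 10, 2: 8}

# (actual, predicted) pairs for the misclassified requirements listed above.
misclassified = [(2, 1), (0, 2), (2, 0), (1, 0), (1, 0)]

labels = sorted(test_counts)
confusion = {actual: {pred: 0 for pred in labels} for actual in labels}
for actual, pred in misclassified:
    confusion[actual][pred] += 1
for actual in labels:
    # Everything not listed as an error is assumed to be classified correctly.
    confusion[actual][actual] = test_counts[actual] - sum(confusion[actual].values())


def precision(label):
    """Correct predictions of `label` over all predictions of `label`."""
    return confusion[label][label] / sum(confusion[a][label] for a in labels)


def recall(label):
    """Correct predictions of `label` over all true instances of `label`."""
    return confusion[label][label] / test_counts[label]


accuracy = sum(confusion[l][l] for l in labels) / sum(test_counts.values())

for label in labels:
    print(f"label {label}: precision = {precision(label):.2f}, "
          f"recall = {recall(label):.2f}")
print(f"accuracy = {accuracy:.2f}")
```

Three of the five errors are requirements predicted as Design (label 0), which is also the largest class in the training set.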

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).