Submitted: 28 August 2025
Posted: 29 August 2025
Abstract

Keywords:
1. Introduction
- How accurately can BERT, CNN, and LSTM classify aviation safety incidents into predefined injury severity categories (Nil, Minor, Serious, Fatal) using unstructured textual data?
- How does the performance of BERT compare to CNN and LSTM in terms of classification accuracy, recall, and precision when applied to aviation safety reports?
- What are the relative strengths and limitations of each model in processing aviation incident narratives, and which model demonstrates the highest effectiveness in automating classification tasks?
2. Related Work
2.1. Machine Learning Approaches for Aviation Safety Data
2.2. Deep Learning Models for Text Classification
2.3. Transformer Models in Text Classification
2.4. Comparative Studies of Machine Learning Models in Aviation Safety
3. Materials and Methods
3.1. Data Acquisition
3.2. Data Pre-Processing
3.3. Model Architectures
3.3.1. BERT
3.3.2. CNN
3.3.3. LSTMs
3.4. Experimental Setup
Hyperparameter Tuning
3.5. Performance Metrics
4. Results
4.1. Evaluation of BERT Model Performance
5. Ablation Study
5.1. LSTM Ablation
5.2. CNN Ablation
5.3. BERT Ablation
5.4. Discussion
5.5. Limitations
6. Conclusion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Full Form |
|---|---|
| AI | Artificial Intelligence |
| AUC | Area Under the Curve |
| ASN | Aviation Safety Network |
| ASRS | Aviation Safety Reporting System |
| ATSB | Australian Transport Safety Bureau |
| BiLSTM | Bidirectional Long Short-Term Memory |
| BERT | Bidirectional Encoder Representations from Transformers |
| CBRNN | Convolutional Bidirectional Recurrent Neural Network |
| CNN | Convolutional Neural Network |
| GRU | Gated Recurrent Unit |
| ICAO | International Civil Aviation Organization |
| LSTM | Long Short-Term Memory |
| ML | Machine Learning |
| NLP | Natural Language Processing |
| NLU | Natural Language Understanding |
| NTSB | National Transportation Safety Board |
| RNN | Recurrent Neural Network |
| SVM | Support Vector Machine |
| TF-IDF | Term Frequency-Inverse Document Frequency |




| Model | Hyperparameter | Tuned Values | Optimal Value |
|---|---|---|---|
| LSTM | Number of Layers | [1, 2, 3] | 2 |
| | Hidden Units | [64, 128, 256] | 128 |
| | Dropout Rate | [0.2, 0.3, 0.5] | 0.3 |
| | Learning Rate | [0.001, 0.0005, 0.0001] | 0.0005 |
| | Batch Size | [16, 32, 64] | 32 |
| | Optimizer | [Adam, RMSprop, SGD] | Adam |
| CNN | Number of Filters | [32, 64, 128] | 64 |
| | Kernel Size | [3, 5, 7] | 5 |
| | Pooling Type | [MaxPooling, AveragePooling] | MaxPooling |
| | Dropout Rate | [0.2, 0.3, 0.5] | 0.3 |
| | Learning Rate | [0.001, 0.0005, 0.0001] | 0.001 |
| | Batch Size | [16, 32, 64] | 32 |
| | Optimizer | [Adam, RMSprop, SGD] | Adam |
| BERT | Pretrained Model | [BERT-Base, BERT-Large] | BERT-Base |
| | Learning Rate | [3e-5, 5e-5, 1e-4] | 3e-5 |
| | Batch Size | [8, 16, 32] | 16 |
| | Epochs | [3, 5, 10] | 5 |
| | Max Sequence Length | [128, 256, 512] | 256 |
| | Optimizer | [AdamW, SGD] | AdamW |

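
The search spaces in the hyperparameter table can be written down as a plain configuration, which makes the size of each search explicit. The following is a minimal sketch, not the authors' tuning code: the dictionary keys (`num_layers`, `hidden_units`, and so on) are hypothetical names chosen for illustration, and it assumes an exhaustive grid over the tuned values, which the source does not state.

```python
from itertools import product

# Tuned values transcribed from the hyperparameter table.
# Key names are hypothetical; only the values come from the source.
SEARCH_SPACE = {
    "LSTM": {
        "num_layers": [1, 2, 3],
        "hidden_units": [64, 128, 256],
        "dropout": [0.2, 0.3, 0.5],
        "learning_rate": [0.001, 0.0005, 0.0001],
        "batch_size": [16, 32, 64],
        "optimizer": ["Adam", "RMSprop", "SGD"],
    },
    "CNN": {
        "num_filters": [32, 64, 128],
        "kernel_size": [3, 5, 7],
        "pooling": ["MaxPooling", "AveragePooling"],
        "dropout": [0.2, 0.3, 0.5],
        "learning_rate": [0.001, 0.0005, 0.0001],
        "batch_size": [16, 32, 64],
        "optimizer": ["Adam", "RMSprop", "SGD"],
    },
    "BERT": {
        "pretrained_model": ["BERT-Base", "BERT-Large"],
        "learning_rate": [3e-5, 5e-5, 1e-4],
        "batch_size": [8, 16, 32],
        "epochs": [3, 5, 10],
        "max_seq_length": [128, 256, 512],
        "optimizer": ["AdamW", "SGD"],
    },
}

# Optimal values reported in the same table.
OPTIMAL = {
    "LSTM": {"num_layers": 2, "hidden_units": 128, "dropout": 0.3,
             "learning_rate": 0.0005, "batch_size": 32, "optimizer": "Adam"},
    "CNN": {"num_filters": 64, "kernel_size": 5, "pooling": "MaxPooling",
            "dropout": 0.3, "learning_rate": 0.001, "batch_size": 32,
            "optimizer": "Adam"},
    "BERT": {"pretrained_model": "BERT-Base", "learning_rate": 3e-5,
             "batch_size": 16, "epochs": 5, "max_seq_length": 256,
             "optimizer": "AdamW"},
}


def iter_configs(space):
    """Yield every combination of tuned values as a dict (full grid)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_size(space):
    """Number of configurations in an exhaustive grid over `space`."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


if __name__ == "__main__":
    for model, space in SEARCH_SPACE.items():
        print(model, grid_size(space), "candidate configurations")
```

Under the exhaustive-grid assumption the LSTM, CNN, and BERT spaces contain 729, 1458, and 324 configurations respectively, so a random or staged search may be the more practical reading of the table.
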
| Metrics used | Formula | Evaluation focus |
|---|---|---|
| Precision (p) | $p = \frac{TP}{TP + FP}$ | Fraction of patterns predicted as positive that actually belong to the positive class. |
| Recall (r) | $r = \frac{TP}{TP + FN}$ | Fraction of positive patterns that are correctly classified. |
| F1-score (F) | $F = \frac{2pr}{p + r}$ | Harmonic mean of precision and recall. |
| Accuracy (acc) | $acc = \frac{TP + TN}{TP + TN + FP + FN}$ | Ratio of correctly predicted instances to the total number of instances evaluated. |

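
The four metrics described in the table above follow directly from the confusion-matrix counts (TP, FP, FN, TN). A minimal sketch using the standard definitions is given below; the function names and the example counts are illustrative assumptions, not values from the study.

```python
def precision(tp: int, fp: int) -> float:
    """Correctly predicted positives over all predicted positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Correctly predicted positives over all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions over all evaluated instances."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical counts for one class, for illustration only.
    tp, fp, fn, tn = 8, 2, 4, 86
    p, r = precision(tp, fp), recall(tp, fn)
    print(f"precision={p:.4f} recall={r:.4f} "
          f"f1={f1_score(p, r):.4f} accuracy={accuracy(tp, tn, fp, fn):.4f}")
```

The zero-denominator guards return 0.0 by convention; this matters for rare classes, where a model may never predict the class at all.
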
| | Predicted Value: Negative | Predicted Value: Positive |
|---|---|---|
| Actual Value: Negative | TN | FP |
| Actual Value: Positive | FN | TP |

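
For a four-class problem such as injury severity (Nil, Minor, Serious, Fatal), the binary layout above is usually applied one class at a time, treating that class as positive and the other three as negative. The sketch below illustrates this one-vs-rest counting under that assumption; the label lists are made-up toy data, not records from the dataset.

```python
CLASSES = ["Nil", "Minor", "Serious", "Fatal"]


def one_vs_rest_counts(y_true, y_pred, positive):
    """Count TN, FP, FN, TP for one class treated as the positive class."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            counts["TP"] += 1
        elif predicted == positive:
            counts["FP"] += 1  # predicted positive, actually another class
        elif actual == positive:
            counts["FN"] += 1  # missed an actual positive
        else:
            counts["TN"] += 1
    return counts


if __name__ == "__main__":
    # Toy labels for illustration only.
    y_true = ["Nil", "Nil", "Minor", "Serious", "Fatal", "Minor"]
    y_pred = ["Nil", "Minor", "Minor", "Serious", "Serious", "Minor"]
    for cls in CLASSES:
        print(cls, one_vs_rest_counts(y_true, y_pred, cls))
```

Each per-class dictionary sums to the number of evaluated instances, which is a quick sanity check when building the counts by hand.
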
| Model | Metric | Nil | Minor | Fatal | Serious | Macro Average | Weighted Average | Accuracy |
|---|---|---|---|---|---|---|---|---|
| LSTM | Precision | 0.9942 | 0.7452 | 0.9143 | 0.9268 | 0.8951 | 0.9898 | 0.9901 |
| | Recall | 0.9964 | 0.7178 | 0.7111 | 0.7917 | 0.8043 | 0.9901 | |
| | F1-Score | 0.9953 | 0.7312 | 0.8000 | 0.8539 | 0.8451 | 0.9898 | |
| CNN | Precision | 0.9955 | 0.7006 | 0.9474 | 0.8776 | 0.8802 | 0.9902 | 0.9899 |
| | Recall | 0.9947 | 0.7607 | 0.8000 | 0.8958 | 0.8628 | 0.9899 | |
| | F1-Score | 0.9951 | 0.7294 | 0.8675 | 0.8866 | 0.8696 | 0.9900 | |
| BERT | Precision | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| | Recall | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |
| | F1-Score | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | |

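
The macro averages in the results table can be checked as the unweighted mean of the four per-class scores. The sketch below recomputes them from the tabulated values, in the table's column order (Nil, Minor, Fatal, Serious). The weighted averages cannot be recomputed the same way, because they depend on per-class sample counts (supports) that are not given in this section; `weighted_average` is therefore shown only with hypothetical supports.

```python
# Per-class scores transcribed from the results table.
# Column order: Nil, Minor, Fatal, Serious.
PER_CLASS = {
    "LSTM": {
        "precision": [0.9942, 0.7452, 0.9143, 0.9268],
        "recall": [0.9964, 0.7178, 0.7111, 0.7917],
        "f1": [0.9953, 0.7312, 0.8000, 0.8539],
    },
    "CNN": {
        "precision": [0.9955, 0.7006, 0.9474, 0.8776],
        "recall": [0.9947, 0.7607, 0.8000, 0.8958],
        "f1": [0.9951, 0.7294, 0.8675, 0.8866],
    },
    "BERT": {
        "precision": [1.00, 1.00, 1.00, 1.00],
        "recall": [1.00, 1.00, 1.00, 1.00],
        "f1": [1.00, 1.00, 1.00, 1.00],
    },
}

# Macro averages as reported in the same table.
REPORTED_MACRO = {
    "LSTM": {"precision": 0.8951, "recall": 0.8043, "f1": 0.8451},
    "CNN": {"precision": 0.8802, "recall": 0.8628, "f1": 0.8696},
    "BERT": {"precision": 1.00, "recall": 1.00, "f1": 1.00},
}


def macro_average(scores):
    """Unweighted mean: every class counts equally."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Mean weighted by the number of samples in each class."""
    return sum(s * n for s, n in zip(scores, supports)) / sum(supports)


if __name__ == "__main__":
    for model, metrics in PER_CLASS.items():
        for name, scores in metrics.items():
            computed = macro_average(scores)
            reported = REPORTED_MACRO[model][name]
            print(f"{model:5s} {name:9s} computed={computed:.4f} "
                  f"reported={reported:.4f}")
```

The recomputed values agree with the reported macro averages to within rounding of the four-decimal inputs. The gap between the macro and weighted averages for LSTM and CNN is what one would expect if the Nil class dominates the test set, since a weighted average is pulled toward the score of the largest class.
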
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).